ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 97
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES/ Laboratoire d’Optique Electronique du Centre National de la Recherche Scientifique Toulouse, France
ASSOCIATE EDITORS
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
TOM MULVEY Department of Electronic Engineering and Applied Physics Aston University Birmingham, United Kingdom
Advances in
Imaging and Electron Physics EDITED BY PETER W. HAWKES CEMES / Laboratoire d'Optique Electronique du Centre National de la Recherche Scientifique Toulouse, France
VOLUME 97
ACADEMIC PRESS San Diego London Boston New York Sydney Tokyo Toronto
This book is printed on acid-free paper. Copyright © 1996 by ACADEMIC PRESS All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Academic Press, Inc. 525 B Street, Suite 1900, San Diego, California 92101-4495, USA http://www.apnet.com
Academic Press Limited 24-28 Oval Road, London NW1 7DX, UK http://www.hbuk.co.uk/ap/ International Standard Serial Number: 1076-5670 International Standard Book Number: 0-12-014739-4 PRINTED IN THE UNITED STATES OF AMERICA 97 98 99 00 01 BC 9 8 7 6 5 4 3 2 1
CONTENTS

CONTRIBUTORS . . . ix
PREFACE . . . xi

Image Representation with Gabor Wavelets and Its Applications
RAFAEL NAVARRO, ANTONIO TABERNERO, AND GABRIEL CRISTÓBAL
I. Introduction . . . 2
II. Joint Space-Frequency Representations and Wavelets . . . 8
III. Gabor Schemes of Representation . . . 19
IV. Vision Modeling . . . 37
V. Image Coding, Enhancement, and Reconstruction . . . 50
VI. Image Analysis and Machine Vision . . . 61
VII. Conclusion . . . 75
References . . . 79

Models and Algorithms for Edge-Preserving Image Reconstruction
L. BEDINI, I. GERACE, E. SALERNO, AND A. TONAZZINI
I. Introduction . . . 86
II. Inverse Problem, Image Reconstruction, and Regularization . . . 94
III. Bayesian Approach . . . 98
IV. Image Models and Markov Random Fields . . . 104
V. Algorithms . . . 118
VI. Constraining an Implicit Line Process . . . 129
VII. Determining the Free Parameters . . . 141
VIII. Some Applications . . . 153
IX. Conclusions . . . 181
References . . . 184

Successive Approximation Wavelet Vector Quantization for Image and Video Coding
E. A. B. DA SILVA AND D. G. SAMPSON
I. Introduction . . . 191
II. Wavelets . . . 195
III. Successive Approximation Quantization . . . 205
IV. Successive Approximation Wavelet Lattice Vector Quantization . . . 221
V. Application to Image and Video Coding . . . 226
VI. Conclusions . . . 252
References . . . 253

Quantum Theory of the Optics of Charged Particles
R. JAGANNATHAN AND S. A. KHAN
I. Introduction . . . 257
II. Scalar Theory of Charged-Particle Wave Optics . . . 259
III. Spinor Theory of Charged-Particle Wave Optics . . . 322
IV. Concluding Remarks . . . 336
References . . . 356

Ultrahigh-Order Canonical Aberration Calculation and Integration Transformation in Rotationally Symmetric Magnetic and Electrostatic Lenses
JIYE XIMEN
I. Introduction . . . 360
II. Power-Series Expansions for Hamiltonian Functions and Eikonals in Magnetic Lenses . . . 361
III. Generalized Integration Transformation on Eikonals Independent of (r x p) in Magnetic Lenses . . . 369
IV. Canonical Aberrations up to the Ninth-Order Approximation in Magnetic Lenses . . . 381
V. Generalized Integration Transformation on Eikonals Associated with (r x p) in Magnetic Lenses . . . 389
VI. Eikonal Integration Transformation in Glaser's Bell-Shaped Magnetic Field . . . 393
VII. Generalized Integration Transformation on Eikonals in Electrostatic Lenses . . . 396
VIII. Conclusion . . . 403
References . . . 407

Erratum and Addendum for Physical Information and the Derivation of Electron Physics
B. ROY FRIEDEN . . . 409

INDEX . . . 413
CONTRIBUTORS Numbers in parentheses indicate the pages on which the authors’ contributions begin.
L. BEDINI (85), CNR Istituto di Elaborazione della Informazione, I-56126 Pisa, Italy
GABRIEL CRISTÓBAL (1), Instituto de Óptica "Daza de Valdés" (CSIC), 28006 Madrid, Spain
EDUARDO A. B. DA SILVA (191), Depto de Electronica, Universidade Federal do Rio de Janeiro, Cep 21945-970 Rio de Janeiro, Brazil
B. ROY FRIEDEN (409), Optical Sciences Center, University of Arizona, Tucson, Arizona 85721
I. GERACE (85), CNR Istituto di Elaborazione della Informazione, I-56126 Pisa, Italy
R. JAGANNATHAN (257), Institute of Mathematical Sciences, CIT Campus, Tharamani, Madras 600113, India
S. A. KHAN (257), Institute of Mathematical Sciences, CIT Campus, Tharamani, Madras 600113, India
RAFAEL NAVARRO (1), Instituto de Óptica "Daza de Valdés" (CSIC), 28006 Madrid, Spain
E. SALERNO (85), CNR Istituto di Elaborazione della Informazione, I-56126 Pisa, Italy
DEMITRIOS G. SAMPSON (191), Zographou, Athens 15772, Greece
ANTONIO TABERNERO (1), Facultad de Informática, Universidad Politécnica de Madrid, 28660 Madrid, Spain
ANNA TONAZZINI (85), CNR Istituto di Elaborazione della Informazione, I-56126 Pisa, Italy
JIYE XIMEN (359), Department of Radio Electronics, Peking University, Beijing 100871, People's Republic of China
PREFACE
This volume contains three contributions from image science and two from electron optics. It concludes with an erratum and addendum to the chapter by Frieden that appeared in volume 90 (1995). Although it is not usual to publish errata in this serial, for the simple reason that readers are not likely to be aware of subsequent corrections, I have made an exception here because of the importance and wide-ranging nature of the work reported by Frieden. I am convinced that his ideas will be recognized by our successors as a major advance in theoretical physics, and it therefore seemed reasonable to ensure that they are expressed correctly here.

Two chapters examine different aspects of wavelets. R. Navarro, A. Tabernero, and G. Cristóbal describe image representation using Gabor wavelets, with sections on vision modeling, on image coding, enhancement, and reconstruction, and on image analysis and machine vision. E. A. B. da Silva and D. G. Sampson discuss successive approximation wavelet vector quantization for image and video coding, a most interesting use of wavelets of great practical importance.

The chapter on image science by L. Bedini, I. Gerace, E. Salerno, and A. Tonazzini deals with a very common problem in image processing: How can images be restored without suppressing small features of interest, notably edges? This question raises deep and difficult issues of regularization, which we meet in most ill-posed problems. The authors analyze these and discuss in detail some ways of solving them.

The chapter by R. Jagannathan and S. A. Khan is really a complete monograph on a little-studied question, namely the development of electron optics when the spin of the electron is not neglected. Generally, electron optics is developed from the everyday Schrödinger equation, as though the electron had no spin; although this is certainly justified in virtually all practical situations, it is intellectually frustrating that this approximation does not emerge as a special case of a more general theory based on the Dirac equation. This study goes a long way toward remedying this situation, and I am delighted to include it here.

We conclude with a shorter chapter by J.-Y. Ximen, whose work has already appeared as a supplement to this serial. This is concerned with higher-order aberrations of electron lenses.
I am most grateful to all these authors for the work and time they have devoted to their contributions, and I conclude as usual with a list of forthcoming contributions.

Peter W. Hawkes
FORTHCOMING CONTRIBUTIONS

Nanofabrication (H. Ahmed and W. Chen)
Finite-element methods for eddy-current problems (R. Albanese and G. Rubinacci)
Use of the hypermatrix (D. Antzoulatos)
Image processing with signal dependent noise (H. H. Arsenault)
The Wigner distribution (M. J. Bastiaans)
Hexagon-based image processing (S. B. M. Bell)
Microscopic imaging with mass-selected secondary ions (M. T. Bernius)
Modern map methods for particle optics (M. Berz and colleagues)
Cadmium selenide field-effect transistors and display (T. P. Brody, A. van Calster, and J. F. Farrell)
ODE methods (J. C. Butcher)
Electron microscopy in mineralogy and geology (P. E. Champness)
Electron-beam deflection in color cathode-ray tubes (B. Dasgupta)
Fuzzy morphology (E. R. Dougherty and D. Sinha)
The study of dynamic phenomena in solids using field emission (M. Drechsler)
Gabor filters and texture analysis (J. M. H. Du Buf)
Miniaturization in electron optics (A. Feinerman)
Liquid metal ion sources (R. G. Forbes)
The critical-voltage effect (A. Fox)
Stack filtering (M. Gabbouj)
Median filters (N. C. Gallagher and E. Coyle)
Quantitative particle modeling (D. Greenspan, vol. 98)
Structural analysis of quasicrystals (K. Hiraga)
Formal polynomials for image processing (A. Imiya)
Contrast transfer and crystal images (K. Ishizuka)
Morphological scale-spaces (P. Jackway, vol. 98)
Optical interconnects (M. A. Karim and K. M. Iftekharuddin)
Surface relief (J. J. Koenderink and A. J. van Doorn)
Spin-polarized SEM (K. Koike)
Sideband imaging (W. Krakow)
The recursive dyadic Green's function for ferrite circulators (C. M. Krowne, vol. 98)
Near-field optical imaging (A. Lewis)
Vector transformation (W. Li)
SAGCM InP/InGaAs avalanche photodiodes for optical fiber communications (C. L. F. Ma, M. J. Deen, and L. E. Tarof)
SEM image processing (N. C. MacDonald)
Electron holography and Lorentz microscopy of magnetic materials (M. Mankos, M. R. Scheinfein, and J. M. Cowley, vol. 98)
Electron holography of electrostatic fields (G. Matteucci, G. F. Missiroli, and G. Pozzi)
The dual de Broglie wave (M. Molski)
Electronic tools in parapsychology (R. L. Morris)
Phase-space treatment of photon beams (G. Nemes)
Aspects of mirror electron microscopy (S. Nepijko)
The imaging plate and its applications (T. Oikawa and N. Mori, vol. 99)
Representation of image operators (B. Olstad)
Z-contrast in materials science (S. J. Pennycook)
HDTV (E. Petajan)
The wave-particle dualism (H. Rauch)
Electron holography (D. Saldin)
Space-variant image restoration (A. de Santis)
X-ray microscopy (G. Schmahl)
Accelerator mass spectroscopy (J. P. F. Sellschop)
Applications of mathematical morphology (J. Serra)
Set-theoretic methods in image processing (M. I. Sezan)
Focus-deflection systems and their applications (T. Soma)
Mosaic color filters for imaging devices (T. Sugiura, K. Masui, K. Yamamoto, and M. Tni)
New developments in ferroelectrics (J. Toulouse)
Electron gun optics (Y. Uchikawa)
Very high resolution electron microscopy (D. van Dyck)
Morphology on graphs (L. Vincent)
ADVANCES IN IMAGING AND ELECTRON PHYSICS VOL. 97
Image Representation with Gabor Wavelets and Its Applications

RAFAEL NAVARRO
Instituto de Óptica "Daza de Valdés" (CSIC), Serrano 121, 28006 Madrid, Spain

ANTONIO TABERNERO
Facultad de Informática, Universidad Politécnica de Madrid, Boadilla del Monte, 28660 Madrid, Spain

and

GABRIEL CRISTÓBAL
Instituto de Óptica "Daza de Valdés" (CSIC), Serrano 121, 28006 Madrid, Spain
I. Introduction . . . 2
II. Joint Space-Frequency Representations and Wavelets . . . 8
   A. Joint Representations, Wigner Distribution, Spectrogram, and Block Transforms . . . 8
   B. Wavelets . . . 11
   C. Multiresolution Pyramids . . . 13
   D. Vision-Oriented Models . . . 16
III. Gabor Schemes of Representation . . . 19
   A. Exact Gabor Expansion for a Continuous Signal . . . 23
   B. Gabor Expansion of Discrete Signals . . . 30
   C. Quasicomplete Gabor Transform . . . 34
IV. Vision Modeling . . . 37
   A. Image Representation in the Visual Cortex . . . 37
   B. Gabor Functions and the RFs of Cortical Cells . . . 41
   C. Sampling in the Human Visual System . . . 45
V. Image Coding, Enhancement, and Reconstruction . . . 50
   A. Image Coding and Compression . . . 50
   B. Image Enhancement and Reconstruction . . . 54
VI. Image Analysis and Machine Vision . . . 61
   A. Edge Detection . . . 63
   B. Texture Analysis . . . 64
   C. Motion Analysis . . . 72
   D. Stereo . . . 74
VII. Conclusion . . . 75
References . . . 79
Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.
I. INTRODUCTION

In image analysis and processing, there is a classical choice between spatial and frequency domain representations. The former, consisting of a two-dimensional (2D) array of pixels, is the standard way to represent discrete images. This is the typical format used for acquisition and display, but it is also common for storage and processing. Space representations appear in a natural way, and they are important for shape analysis, object localization, and description (either photometric or morphologic) of the scene. There is much processing that can be done in the space domain: histogram modification, pixel and neighbor operations, and many others. On the other hand, there are many tasks that we can perform more naturally in the Fourier (spatial frequency) domain, such as filtering and correlations. These two representations have very important complementary advantages that we often have to combine when developing practical applications. An interesting example is our own visual system, which has to perform a variety of complex tasks in real time and in parallel, processing nonstationary signals.

Figure 1 (redrawn from Bartelt et al., 1980) illustrates this problem with a simple nonstationary 1D temporal signal. (It is straightforward to extend the following discussion to 2D images or even 3D signals, such as image sequences.) The four panels show different ways of representing a signal corresponding to two consecutive musical notes. The upper left panel shows the signal as we would see it displayed on an oscilloscope. Here we can appreciate the temporal evolution, namely, the periodic oscillations of the wave and the transition from one note to the next. Although this representation is complete, it is hard to say much at a glance about the exact frequencies of the notes. The Fourier spectrum (upper right) provides an accurate global description of the frequency content of the signal, but it does not tell us much about the timing and order of the notes. Although either of these two descriptions may be very useful for sound engineers, the music performer would prefer a stave (bottom left) of a musical score, that is, a conjoint representation in time (t axis) and frequency (log ν axis). The Wigner distribution function (Wigner, 1932), bottom right, provides a complete mathematical description of the joint time-frequency domain (Jacobson and Wechsler, 1988), but at the cost of very high redundancy (doubling the dimension of the signal). Regular sampling of a signal with N elements in the spatial (or frequency) domain will require N² samples in the conjoint domain defined by the Wigner distribution to be stored and analyzed. Although this high degree of redundancy may be necessary in
FIGURE 1. Four different descriptions of the same signal: time domain (upper left); frequency domain (upper right); conjoint: stave (lower left) and Wigner distribution function (lower right). Reprinted with permission from Bartelt et al., The Wigner distribution function and its optical production, Optics Comm. 32, 32-38. Copyright 1980, Elsevier Sci. Ltd., The Boulevard, Langford Lane, Kidlington OX5 1GB, UK.
some especially difficult problems (Cristóbal et al., 1990), such an expensive redundancy cannot be afforded in general, particularly in vision and image processing tasks (2D or 3D signals). The musician will prefer a conjoint but compact (and meaningful) code, like the stave: only two samples (notes) are required to represent the signal in the example of Fig. 1. Such a conjoint but compact code is more likely to be found in biology, combining usefulness with maximum economy. A possible approach to building a representation with these advantages is to sample the conjoint domain optimally, trying to diminish redundancy without losing information. The uncertainty principle tells us that there exists a limit for joint (space-frequency) localization (Gabor, 1946; Daugman, 1985); that is, if we apply fine sampling in the space (or time) domain, we must then apply coarse frequency sampling, and vice versa. The uncertainty product limits the minimum area for sampling the conjoint domain. Gabor, in his Theory of Communication (1946), observed that Gaussian wave packets (Gabor wavelets or Gabor functions) minimize such conjoint
FIGURE 2. Two ways of sampling the conjoint time-frequency domain, with sampling units having constant area: homogeneous (left); adapting the aspect ratio to the spatial-frequency band (right).
uncertainty, being optimal sampling units, or logons, of the conjoint domain. The left panel of Fig. 2 shows the "classical" way of homogeneously sampling this 2D space-frequency conjoint domain. The right panel represents smarter sampling, such as that used in wavelet or multiscale pyramid representations and presumably by our own visual system. Here, the sampling area is kept constant, but the aspect ratio of the sampling units changes from one frequency level to the next. This is smarter sampling because it takes into account that low-frequency features will tend to occupy a large temporal (or spatial) interval, requiring rather coarse sampling, whereas high frequencies require fine temporal (or spatial) sampling. In both cases, the sampling density is very important. Critical sampling (Nyquist) will produce the minimum number of linearly independent elements (N) to have a complete representation of the signal; a lower sampling density will cause aliasing artifacts, whereas oversampling will produce a redundant representation (this will be further discussed later). One of the most exciting features of wavelet and similar representations is that they appear to be useful for almost every signal processing application (1D acoustic signals, 2D images, or 3D sequences), including the modeling of biological systems. However, despite several early developments of the basic theory (Haar, 1910), only in the 1980s were the first applications to image processing published. Wigner (1932) introduced a complete joint representation of the phase space in quantum mechanics; Gabor (1946) proposed Gaussian wave packets, logons or information quanta, for optimally packing information. Cohen (1966) developed a generalized framework for phase space distribution functions, showing that
most of these conjoint image representations belong to a large class of bilinear distributions. Any given representation is obtained by choosing an appropriate kernel in the generalized distribution. Until recently, these theoretical developments were not accompanied by practical applications in signal processing. Apart from the availability of much cheaper and more powerful computers, several factors have accelerated this field in the 1980s and 1990s. On the one hand, Gabor functions were successfully applied to model the responses of simple cells in the brain's visual cortex, in both 1D (Marčelja, 1980) and 2D (Daugman, 1980). On the other hand, Bastiaans (1981) and Morlet et al. (1982) provided the theoretical basis for a practical implementation of the Gabor and other expansions. Further generalizations of the Gabor expansion (Daugman, 1988; Porat and Zeevi, 1988) and the development of wavelet theory (Grossman and Morlet, 1984; Meyer, 1988; Mallat, 1989b; Daubechies, 1990) have opened broad fields of applications. In particular, wavelet theory has constituted a unifying framework, merging ideas coming from mathematics, physics, and engineering. One of the most important applications has been to image coding and compression, because of its technological relevance. In fact, many conjoint schemes of representation, such as multiresolution pyramids (Burt and Adelson, 1983) or the discrete cosine transform (Rao, 1990) used in the Joint Photographic Experts Group (JPEG) and Moving Picture Experts Group (MPEG) image and video standards, were specifically directed to image compression.

A Gabor function, or Gaussian wave packet, is a complex exponential with a Gaussian modulation or envelope. From now on, we will use the variable t (time) for the 1D case and x, y for 2D (despite the fact that this review is mainly focused on 2D images, it is simpler and more convenient to use a 1D formulation that can be easily generalized to the 2D case). In one dimension, the mathematical expression of a Gabor function is

g_{t0,ω0}(t) = a exp[-α²(t - t0)²] exp[i(ω0 t + φ)].   (1)
The two labels t0, ω0 stand for the temporal and frequency localization or tuning. The parameter α determines the half-width of the Gaussian envelope, and φ is the phase offset of the complex exponential. The most characteristic property of the Gabor functions is that they have the same mathematical expression in both domains. The Fourier transform of g_{t0,ω0}(t) will be

G_{t0,ω0}(ω) = (a√π/α) exp[-(ω - ω0)²/(4α²)] exp[-i(ω t0 - φ′)],   (2)

where φ′ = ω0 t0 + φ. This property, which allows fast implementations in either the space or frequency domain, along with their optimal localization
(Gabor, 1946), will yield a series of interesting applications. Moreover, by changing a single parameter, the bandwidth α, we can continuously shift the time-frequency (or, in 2D, the space/spatial-frequency) localization from one domain to the other. For instance, visual models (as well as those for most applications) use fine spatial sampling (high localization) and coarse sampling of the spatial-frequency domain (see Section IV). In addition to the two possible computer implementations, in the space (or time) domain and in the Fourier domain (Navarro and Tabernero, 1990), Bastiaans (1982) proposed a parallel optical generation of the Gabor expansion. Subsequently, several authors (Freysz et al., 1990; Li and Zhang, 1992; Sheng et al., 1992) reported optical implementations. In the two-dimensional case, it is common to use Cartesian spatial coordinates but polar coordinates for the spatial-frequency domain:

g_{x0,y0,f0,θ0}(x, y) = exp{i[2πf0(x cos θ0 + y sin θ0) + φ]} gauss(x - x0, y - y0),   (3a)

where the Gaussian envelope has the form

gauss(x, y) = a exp(-α²[(x cos θ0 + y sin θ0)² + γ²(x sin θ0 - y cos θ0)²]).   (3b)
The four labels x0, y0, f0, θ0 stand for the spatial and frequency localization. The parameters α and γ define the bandwidth and aspect ratio of the Gaussian envelope, respectively (we have restricted the Gaussian to have its principal axis along the θ0 direction); φ is again the phase offset.

Apart from the interesting properties mentioned previously, Gaussian wave packets (or wavelets), GWs, also have some drawbacks. Their lack of orthogonality makes the computation of the expansion coefficients difficult. A possible solution is to find a biorthogonal companion basis that facilitates the computation of the coefficients for the exact reconstruction of the signal (Bastiaans, 1981). This solution is computationally expensive, and the interpolating biorthogonal functions can have a rather complicated shape. Several practical solutions for finding the expansion coefficients have been proposed, such as the use of a relaxation network (Daugman, 1988). By oversampling the signal to some degree, we can obtain dual functions more similar in shape to the Gabor basis (Daubechies, 1990). The redundancy inherent in oversampling is, of course, a bad property for coding and compression applications. However, for control systems, redundancy and lack of orthogonality are desirable properties that are necessary for robustness. Biological vision (and sensory systems in general) lacks orthogonality, producing a redundancy that is highly expensive, this being the price of robustness. The use of redundant sampling permits us to design quasicomplete Gabor representations (Navarro and Tabernero, 1991) that are simple, robust, and fast to implement, providing reconstructions with a high signal-to-noise ratio (SNR) and high visual quality. A minor drawback is that Gabor functions are not pure passband, which is a basic requirement for being an admissible wavelet (their DC response is nevertheless very small: less than 0.002 for a 2D, one-octave bandwidth Gabor function).

These drawbacks have motivated the search for other basis functions, orthogonal when possible. This, along with the wide range (still increasing) of applications and the merging of ideas from different fields, has produced the appearance of many different schemes of image representation in the literature (we will review the most representative schemes in Section II, before focusing on GWs in Section III). Almost every author seems to have a favorite scheme and basis function, depending on his or her area of interest, personal background, etc. In our case, there are several reasons why GWs (Gabor functions) constitute our favorite basis for image representation. Apart from optimal joint localization (as pointed out by Gabor), the good behavior of Gaussians, and robustness, perhaps the most interesting property is that they probably have the broadest field of application. For a given application (for example, coding, edge detection, or motion analysis) one can find and implement an optimal basis function. For instance, Canny (1986) has shown that Gaussian derivatives are optimal for edge detection in noisy environments. Gabor functions are probably not optimal for most applications, but they perform well in almost all cases and in most of them are even nearly optimal. This can be explained intuitively in terms of the central limit theorem (Papoulis, 1989), i.e., that the cumulative convolution of many different kernels will result in a Gaussian convolution. The following is not a rigorous argument but only an intuitive one: the good fit obtained with GWs to the responses of cortical neurons could be, roughly speaking, a consequence of the central limit theorem, in the sense that from the retina to the primary visual cortex there is a series of successive neural networks. In a rough linear approach, we can realize each neural layer as a discrete convolution. Thus, the global effect would be approximately equivalent to a single Gaussian channel. Although this idea is far from having a rigorous demonstration, it has been applied to the implementation of multiscale Gabor filtering (Rao and Ben-Arie, 1993). On the other hand, with the central limit theorem in mind, one could expect that when trying to optimize a basis function (a filter) for many different tasks simultaneously, the resulting filter would tend to show a Gaussian envelope.
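To make Eqs. (3a)-(3b) and the idea of a multiscale, multi-orientation Gabor scheme concrete, the following NumPy sketch builds a single 2D Gabor function and a small bank of channel responses computed in the Fourier domain. It is only an illustration: the function and parameter names are mine, and the grid size, frequencies, orientations, and the bandwidth rule are assumed values, not the quasicomplete scheme of Navarro and Tabernero (1991).

import numpy as np

def gabor_2d(size, x0, y0, f0, theta0, alpha, gamma=1.0, phi=0.0, a=1.0):
    # Complex 2D Gabor function following Eqs. (3a)-(3b): a complex carrier of
    # frequency f0 and orientation theta0, times a Gaussian envelope of bandwidth
    # alpha and aspect ratio gamma centered at (x0, y0), with phase offset phi.
    y, x = np.mgrid[0:size, 0:size].astype(float)
    xr = (x - x0) * np.cos(theta0) + (y - y0) * np.sin(theta0)
    yr = (x - x0) * np.sin(theta0) - (y - y0) * np.cos(theta0)
    envelope = a * np.exp(-alpha ** 2 * (xr ** 2 + gamma ** 2 * yr ** 2))      # Eq. (3b)
    carrier = np.exp(1j * (2 * np.pi * f0 * (x * np.cos(theta0)
                                             + y * np.sin(theta0)) + phi))    # Eq. (3a)
    return carrier * envelope

def gabor_responses(image, frequencies, orientations):
    # Channel responses of a square image for every (frequency, orientation) pair,
    # computed as products in the Fourier domain (convolution theorem).
    n = image.shape[0]
    F = np.fft.fft2(image)
    out = {}
    for f0 in frequencies:
        alpha = f0  # assumption: bandwidth proportional to frequency (constant octave bandwidth)
        for theta0 in orientations:
            g = gabor_2d(n, n // 2, n // 2, f0, theta0, alpha)
            G = np.fft.fft2(np.fft.ifftshift(g))
            out[(f0, theta0)] = np.fft.ifft2(F * G)
    return out

# Example: four octave-spaced channels by four orientations on a random test image.
image = np.random.rand(128, 128)
channels = gabor_responses(image, frequencies=[1/4, 1/8, 1/16, 1/32],
                           orientations=[0, np.pi/4, np.pi/2, 3*np.pi/4])

Thresholding or quantizing channel responses of this kind is the starting point for the coding and analysis applications reviewed in Sections V and VI.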
The field of application of GWs and similar schemes of image representation is huge and continuously increasing. They are highly useful in almost every problem of image processing, coding, enhancement, and analysis, and in low- to mid-level vision (including the modeling of biological vision). Moreover, multiscale and wavelet representations have provided important breakthroughs in image understanding and analysis. Furthermore, Gabor functions are a widely used tool for visual testing in psychophysical and physiological studies. Gaussian envelopes are very common in grating stimuli to measure contrast sensitivity, to study shape, texture, and motion perception (Caelli and Moraglia, 1985; Sagi, 1990; Geri et al., 1995; Watson and Turano, 1995), or to model brightness perception (du Buf, 1995). Although these applications are beyond the scope of this review, we want to mention them because of their increasing relevance. All these facts suggest that GWs are especially suitable for building general-purpose environments for image processing, analysis, and artificial vision systems. Here, we have classified the most relevant applications in three groups: modeling of early processing in the human visual system in Section IV; applications to image coding, enhancement, and reconstruction in Section V; and applications to image analysis and machine vision in Section VI. Prior to these applications, we review the main conjoint image representations in Section II, and then Section III specifically treats Gabor representations.

II. JOINT SPACE-FREQUENCY REPRESENTATIONS AND WAVELETS

A. Joint Representations, Wigner Distribution, Spectrogram, and Block Transforms

Stationary signals or processes are statistically invariant over space or time (e.g., white noise or sinusoids), and thus we can apply a global description or analysis to them (e.g., the Fourier transform). As in the example of Fig. 1, an image composed of several differently textured objects will be nonstationary. Images can also be affected by nonstationary processes. For instance, optical defocus will produce a spatially invariant blur in the case of a flat object that is perpendicular to the optical axis of the camera. However, in the 3D world, defocus will vary with the distance from the object to the camera, and hence it will be nonstationary in general. The result is a spatially variant blur that we cannot describe as a conventional convolution. Spatially variant signals and processes can be better characterized by conjoint time-frequency or space/spatial-frequency representations.
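The distinction matters in practice: a shift-invariant blur is a single convolution, whereas a depth-dependent defocus needs a kernel that changes from sample to sample. The toy sketch below is my own illustration, with an arbitrary linear law for the blur width, and simply makes that difference explicit in 1D.

import numpy as np

def space_variant_blur(signal, sigma_of_t):
    # Blur a 1D signal with a Gaussian whose width sigma_of_t[i] depends on position i.
    # Because the kernel changes with position, this cannot be written as a single convolution.
    n = len(signal)
    t = np.arange(n)
    out = np.zeros(n)
    for i in range(n):
        kernel = np.exp(-0.5 * ((t - i) / sigma_of_t[i]) ** 2)
        out[i] = np.sum(kernel * signal) / np.sum(kernel)
    return out

# Example: blur grows linearly with position, as for objects receding from the focal plane.
x = np.random.rand(512)
blurred = space_variant_blur(x, sigma_of_t=1.0 + 0.02 * np.arange(512))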
1. Wigner Distribution Function

Wigner (1932) introduced a bilinear distribution as a conjoint representation of the phase space in quantum mechanics. Later, Ville (1948) derived the same (Wigner or Wigner-Ville) distribution in the field of signal processing. As we have mentioned before, we will be using the variable t for the 1D case (equivalent expressions can be derived for the 2D spatial domain or higher dimensions). For a continuous and integrable signal f(t), the symmetric Wigner distribution (WD) is given by (Claasen and Mecklenbräuker, 1980)

W_f(t, ω) = ∫ f(t + s/2) f*(t - s/2) exp(-iωs) ds,   (4)
where s is the integrating variable, ω is the frequency variable, and f* stands for the complex conjugate of f. The WD belongs to the Cohen class of bilinear distributions (Cohen, 1966), in which each member is obtained by introducing a particular kernel, φ(ξ, τ), in the generalized distribution (Jacobson and Wechsler, 1988). These bilinear distributions, C(t, ω), can be expressed as the 2D Fourier transform of weighted versions of the ambiguity function:

C(t, ω) = (1/2π) ∫∫ φ(ξ, τ) A(ξ, τ) exp[-i(ξt + τω)] dξ dτ,   (5)
where A(ξ, τ) is the ambiguity function

A(ξ, τ) = ∫ f(s + τ/2) f*(s - τ/2) exp(iξs) ds.   (6)
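As a numerical illustration of Eq. (4) (not code from the chapter), the following naive O(N²) sketch evaluates a discrete Wigner-Ville distribution. Running it on a two-tone signal like the one in Fig. 1 also makes visible the cross-terms discussed below, midway between the two components.

import numpy as np

def wigner_ville(f):
    # Naive discrete counterpart of Eq. (4): W[t, k] = FFT over s of f(t + s) * conj(f(t - s)).
    # f should be a complex (analytic) 1D signal; the resulting distribution is real.
    n = len(f)
    W = np.zeros((n, n))
    for t in range(n):
        smax = min(t, n - 1 - t)              # lags that stay inside the signal support
        r = np.zeros(n, dtype=complex)
        for s in range(-smax, smax + 1):
            r[s % n] = f[t + s] * np.conj(f[t - s])
        W[t] = np.fft.fft(r).real
    return W

# Example: two Gaussian-windowed tones, the two "notes" of Fig. 1.
t = np.arange(256)
note = lambda w0, t0: np.exp(1j * w0 * t) * np.exp(-((t - t0) / 20.0) ** 2)
W = wigner_ville(note(0.3 * np.pi, 64) + note(0.6 * np.pi, 192))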
The Wigner distribution, because of its bilinear definition, contains cross-terms, complicating its interpretation, especially in pattern recognition applications.

2. Complex Spectrogram

Another way to obtain a conjoint representation is through the complex spectrogram, which can be expressed as a windowed Fourier transform:
F(t, ω) = ∫ w(s - t) f(s) exp(-iωs) ds,   (7)
where w(s) is the window that introduces localization in time (or space). The signal can be recovered from the complex spectrogram by the inversion formula (Helstrom, 1966):

f(s) = (1/2π) ∫∫ F(t, ω) w(s - t) exp(iωs) dt dω,   (8)

assuming a real window of unit energy, ∫ w²(s) ds = 1.
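A direct discrete transcription of Eq. (7) with a Gaussian window is straightforward; the sketch below is only illustrative (the function name, window parameterization, and sampling lattice are mine). With that window, each coefficient is exactly a projection onto a Gabor function, as Eq. (10) states below.

import numpy as np

def complex_spectrogram(f, sigma, times, omegas):
    # Discrete version of Eq. (7): F(t, w) = sum_s w(s - t) f(s) exp(-i w s),
    # here with a Gaussian window of width sigma, evaluated on a lattice of
    # time positions `times` and angular frequencies `omegas`.
    s = np.arange(len(f))
    F = np.zeros((len(times), len(omegas)), dtype=complex)
    for i, t in enumerate(times):
        window = np.exp(-0.5 * ((s - t) / sigma) ** 2)
        for j, w in enumerate(omegas):
            F[i, j] = np.sum(window * f * np.exp(-1j * w * s))
    return F

# Example: sample the joint domain on a coarse lattice (cf. the sampling schemes of Fig. 2).
f = np.random.rand(512)
F = complex_spectrogram(f, sigma=8.0,
                        times=np.arange(0, 512, 16),
                        omegas=2 * np.pi * np.arange(32) / 64.0)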
The Wigner-Ville distribution can be considered as a particular case of the complex spectrogram, where the shifting window is the signal itself (complex conjugated). Both the spectrogram and the Wigner-Ville distribution belong to the Cohen class (with kernels φ = W_w(t, ω) and φ = 1, respectively), are conjoint, complete, and invertible representations, but at the cost of high redundancy. When the window w(s) is a Gaussian, we can make a simple change, calling

g_{t,ω}(s) = w(s - t) exp(iωs).   (9)
Then g_{t,ω}(s) is a Gabor function, and Eq. (7) becomes

F(t, ω) = ∫ g*_{t,ω}(s) f(s) ds = ⟨f, g_{t,ω}⟩.   (10)
Therefore, we can obtain the "gaussian" complex spectrogram at any given point (t, ω) as the inner product between the signal f and a localized Gabor function. The decomposition of a signal into its projections on a set of displaced and modulated versions of a kernel function appears in quantum optics and other areas of physics. The elements of the set {g_{t,ω}(s)} are the coherent states associated with the Weyl-Heisenberg group that sample the phase space (t, ω). The spectrogram of Eq. (10) provides information about the energy content of the signal at (t, ω), because the inner product captures similarities between the signal f and the "probe" function g_{t,ω} that is localized in the joint domain. To recover the signal in the continuous case, we rewrite Eq. (8) as

f(s) = (1/2π) ∫∫ ⟨f, g_{t,ω}⟩ g_{t,ω}(s) dt dω.   (11)
The window function does not need to be Gaussian in general. However, as we said in the Introduction, Gabor functions have the advantage of maximum joint localization; i.e., they achieve the lower bound of the joint uncertainty. This has also been demonstrated in the 2D case for separable Gabor functions (Daugman, 1985). Signal uncertainty is commonly defined in terms of the variances of the marginal energy distributions associated with the signal and its Fourier transform. An alternative definition of informational uncertainty (Leipnik, 1959) has been introduced in terms of the entropy of the joint density function. Interestingly, Leipnik (1960) found that Gabor functions (among others) are entropy-minimizing signals. [See Stork and Wilson (1990) for a more recent discussion of alternative metrics or measures of joint localization.]
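The claim that Gaussian wave packets reach the lower bound of the variance-based uncertainty product can be checked numerically. The sketch below is an illustration with arbitrary parameters, using the convention in which the bound is Δt Δω = 1/2: a Gabor function lands essentially on the bound, while a rectangular-windowed packet does not.

import numpy as np

def uncertainty_product(g, dt=1.0):
    # Delta_t * Delta_omega computed from the variances of the marginal energy
    # distributions |g(t)|^2 and |G(omega)|^2, as described in the text.
    t = np.arange(len(g)) * dt
    pt = np.abs(g) ** 2
    pt /= pt.sum()
    var_t = np.sum(pt * (t - np.sum(pt * t)) ** 2)
    G = np.fft.fft(g)
    w = 2 * np.pi * np.fft.fftfreq(len(g), d=dt)
    pw = np.abs(G) ** 2
    pw /= pw.sum()
    var_w = np.sum(pw * (w - np.sum(pw * w)) ** 2)
    return np.sqrt(var_t * var_w)

t = np.arange(-256, 256)
carrier = np.exp(1j * 0.2 * np.pi * t)
print(uncertainty_product(np.exp(-(t / 40.0) ** 2) * carrier))   # close to the lower bound 0.5
print(uncertainty_product((np.abs(t) < 40) * carrier))           # clearly above the bound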
IMAGE REPRESENTATION WITH GABOR WAVELETS
11
3. Block Transforms Both the WD and the complex spectrogram involve high redundancy and permit exact recovery of the signal in the continuous case. In practical signal processing applications, we have to work with a discrete number of samples. In the Fourier transform, the complex exponentials constitute the basis functions, in both the continuous and discrete cases. For the latter case, signal recovery is guaranteed for band-limited signals with a sampling frequency greater than or equal to the Nyquist frequency. The W D also permits signal recovery in the discrete case (Claasen and Mecklenbrauker, 1980). In the case of the discrete spectrogram, with a discrete number of windows, image reconstruction is guaranteed only under certain conditions (this will be discussed in Section 111). When looking for a complete but compact discrete joint image representation, one can think of dividing the signal into nonoverlapping blocks and independently processing each block (contrary to the case of overlapping continuously shifted windows). Each block is a localized (in space, time, etc.) portion of the signal. Then if we apply an invertible transform to each block, we will be able to recover the signal whenever the set of blocks is complete. This is the origin of a series of block transforms, of which the discrete cosine transform (DCT) is the most representative example (Rao, 1990). Current standards for image and video compression are based on the DCT. However, the sharp discontinuities between image blocks may produce ringing and other artifacts after quantization, specially at low-bit-rate transmission, that are visually annoying. We can eliminate these artifacts by duplicating the number of blocks, in what is called the lapped orthogonal transform (LOT) (Malvar, 1989). This is a typical example of oversampling, which generates a linear dependence (redundancy) that improves robustness (this is discussed further in Section 111). We will see later that if we apply a blocklike decomposition in the Fourier domain, we can obtain a multiscale or multiresolution transform. In block transforms orthogonality is guaranteed, but there is not a good joint localization.
B. Wavelets In wavelet theory, the signal is represented with a set of basis functions that sample the conjoint domain time-frequency (or space/spatialfrequency), providing a local frequency representation with a resolution matched to each scale, so that
12
RAFAEL NAVARRO ET AL.
where are the basis functions and ci are the coefficients that constitute the representation in that basis. The key idea of a wavelet transform is that the basis functions are obtained by translations and dilations of a unique wavelet. A wavelet transform can be viewed as a decomposition into a set of frequency channels having the same bandwidth on a logarithmic scale. The application of wavelets to signal and image processing is recent (Mallat, 1989b; Daubechies, 19901, but their mathematical origins date back in 1910, with the Haar (1910) orthogonal basis functions. After Gabor's seminal Theory of Communication (1946), wavelets and similar ideas were used in solving differential equations, harmonic analysis, theory of coherent states, computer graphics, engineering applications, etc. [See, for instance, Chui (1992a, 1992b), Daubechies (19921, Meyer (1993), and Fournier (1994) for reviews on wavelets.] Grossman and Morlet (1984) introduced the name wavelet (continuous case) in the context of geophysics. Then the idea of multiresolution analysis was incorporated along with a systematic theoretical background (Meyer, 1988, 1993; Mallat, 1989b). In the continuous 1D case, the general expression of a wavelet basis function is
where the translation and dilation coefficients ( b and a, respectively) of the basic function vary continuously. In electrical engineering, this is called a "constant" Q resonant analysis. The continuous wavelet transform W of a function f E L2(%),i.e., square integrable, is
The basis function q must satisfy the admissibility condition of finite energy (Mallat, 1989a). This implies that its Fourier transform is pure bandpass having a zero DC response @(O) = 0. Thus, the function q must oscillate above and below zero as a wave packet, which is the origin of the name wavelet. The wavelet transform (WT) has a series of important properties. We list only a few of them. The WT is an isometry, up to a proportional coefficient, L 2 ( % )+ L2(9ti'X 8)(Grossman and Morlet, 1984). It can be discretized by sampling both the scale (frequency) and position (space or time) parameters as shown in Fig. 2b. Another property is that wavelets easily characterize local regularity, which is interesting in texture analysis. In the discrete case, more interesting in signal processing, there exist
IMAGE REPRESENTATION WITH GABOR WAVELETS
13
necessary and sufficient conditions that the basis functions have to meet so that the WT has an inverse (Daubechies, 1992). A specially interesting class of discrete basis functions is orthogonal wavelets. A large class of orthogonal wavelets can be related to quadrature mirror filters (Mallat, 1989b). There are important desirable properties of wavelets that are not fully compatible with orthogonality, namely, small (or finite at least) spatial support, linear phase (symmetry), and smoothness. This last property is very important in signal representation to avoid annoying artifacts, such as ringing and aliasing. The mathematical description of smoothness has been made in terms of the number of vanishing moments (Meyer, 1993), which determines the convergence rate or wavelet approximation to a smooth function. Finite impulse response (small support) is necessary for having spatial localization. Among these desirable features, orthogonality is a very restrictive condition that may be relaxed to meet other important properties, such as better joint localization. In particular, the use of linearly dependent (redundant) biorthogonal basis functions (Daubechies, 1990) makes it possible to meet smoothness, symmetry, and localization requirements while keeping most of the interesting properties derived from orthogonality.
C. Multiresolution Pyramids Multiresolution pyramids are a different approach to joint representations (Burt and Adelson, 1983). The basic idea is similar to that of the block transforms but applied to the frequency domain. Let (W;(w)}be a set of windows that completely cover the Fourier domain, i.e., C W & w ) = 1. Then we can decompose the Fourier transform F of the signal in a series of bands so that f(t)
=
xfi:(t)= i
1 2T
F ( w ) [ W ; ( o ) e ’ ” ‘ ]d w . --m
Here we have represented the signal as the sum of filtered versions, f$), one for each window (band). This produces a representation that is localized in space (or time) and frequency (depending on the width of the window). The product within the bracket is a sort of Fourier (complex) wavelet that forms a complete basis. The set of windows {W;(w)}can be implemented as a bank of filters. Mallat (1989a) has shown that there exists a one-to-one correspondence between the coefficients of a wavelet expansion and those of multiresolution pyramid representations, as illustrated in Fig. 3. This is done through a mother wavelet and a scaling function 4. Figure 4 shows an example of a scaling function in both spatial
14
RAFAEL NAVARRO ET AL.
t
Sliding Window g(t 1
FIGURE3. The Fourier-windowed transform (STn3 as a filter bank. If the window is a Gaussian the modulated filter bank produces a Gabor transform. The output of this bank can be plotted on a joint diagram as in Fig. 2. The entries in any column represent the DFT of the corresponding batch of data. Each row represents the contribution to each harmonic from the bank filter. Redrawn by permission from Rioul and Vetterli, Wavelets and signal processing. IEEE Signal Proc. Mug. 8, 14-38. Copyright 1991 IEEE.
and Fourier domains as well as its associated wavelet function, also in both domains. The basic idea is to split the signal into its lower and higher frequency components. One of the main applications of multiresolution representations is in coding and compression, in which each frequency band is sampled to achieve a maximum rate of compression. The name pyramid comes from the fact that the sampling rate depends on the bandwidth of each particular subband (Tanimoto and Pavlidis, 1975). Therefore, if we put the samples of each band on top of the previous one we obtain a pyramid. There are basically two different strategies for sampling. Critical sampling is used to eliminate redundancy so that the conjoint representation has no more samples than the original signal. Although we can obtain higher rates of compression with critical sampling, it has an important cost. Namely, we end up with a representation that is not robust (losing a single sample will cause very disturbing effects) and that is not translational invariant (i.e., a small displacement of the signal will produce a representation that is completely different), which preclude its application to vision (Simoncelli et al., 1992). In some applications, it is possible to solve the translation dependence by a circular shift of the data (Coifman and Donoho, 1995). However, a much more robust representation is obtained by Nyquist
15
IMAGE REPRESENTATION WITH GABOR WAVELETS t
1
0
-5
0.4
0.2
0
5
0
x
1, -10
-B
0
I
10
w
L
w
4. Example of a scding function 4 ( x ) (upper left) and its transfer function I&W) (lower left), along with the impulse response of the associated wavelet filter $ ( X I (upper right) and its Fourier transform $(o)(lower right). Redrawn by permission from Mallat, A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Patt. Anal. Machine Intell. 11, 674-693. Copyright 1989 IEEE. FIGURE
sampling of each band, i.e., taking samples with a frequency double the maximum frequency present in the band. The result will be a shiftable and robust multiscale transform, at the cost of some redundancy. One practical problem is that of designing filters with a finite impulse response, simultaneously having good frequency resolution. One solution is to use quadrature mirror filters consisting of couples of low pass and high pass that are in phase quadrature (Esteban and Galand, 1977). This constitutes an orthogonal basis that permits obtaining good localization in both domains, avoiding aliasing artifacts, and obtaining an exact reconstruction of the signal.
16
RAFAEL NAVARRO ET AL.
RGURE5. Original image (a); wavelet transform pyramid with biorthogonal basis functions (b); recovered image (c) after thresholded coefficients (d).
The extension to 2D (for application to image processing) of most of the analysis done above in 1D is straightforward. Figure 5 shows an example of a multiscale wavelet transform (b) of a woman’s portrait (a), including the application to compression: after thresholding the coefficients as explained in Section V,A (d) and image recovered (c) from (d). D. Viwn-Oriented Models
One striking fact about joint multiscale representations and wavelets is that a similar representation has been found in the human visual system
IMAGE REPRESENTATION WITH GABOR WAVELETS
17
(see Section IV). Marr (1982) and co-workers established the basis for the modern theory of computational vision defining the primal sketch. It consisted of detecting edges (abrupt changes in the gray levels of images) by applying a Laplacian of a Gaussian operator and then extracting the zero crossings. This is done at different scales (resolutions). Using scaled versions of this operator, Burt and Adelson (1983) constructed the Laplacian pyramid. Each layer is constructed by duplicating the size of the Laplacian operator, so that both the peak frequency an the bandwidth are divided by 2. In their particular pyramid implementation, they first obtained low pass-filtered versions of the image using Gaussian filters, then subtracted the results from the previous version. Then they subsampled the low pass-filtered version and repeated the process several times. Consequently, the Nyquist sampling of low pass-filtered versions of the image gives (1/2)* less samples, producing the pyramid scheme. This yields an overcomplete representation with 4/3 more coefficients than the original image. One important experimental finding in human vision is orientation selectivity, which is not captured by the Laplacian pyramid. Consequently, Daugman (1980) used 2D Gabor functions (GFs) to fit experimental data, and Watson (1983) implemented a computational model of visual image representation with GFs. By sampling the frequency domain in a lossless and polar-separable way, Watson (1987a) introduced an oriented pyramid called the cortex transform that permitted a complete representation of the image. The filters, four orientations by four frequencies plus low-pass and high-pass residuals, are constructed in the Fourier domain as the product of a circularly symmetric dom filter with an orientation selectivity fan filter (see Fig. 6a). The impulse response of the cortex filter (Fig. 6b) roughly resembles a 2D Gabor function with ringing artifacts. Marr (1982), Young (1985, 1987), and others have proposed Gaussian derivatives (GDs) as an alternative to Gabor functions for modeling the receptive fields of simple cortical cells. Figure 7 shows the four first derivatives in lD, and their frequency responses, Go, G,, G,, and G,, respectively correspond to the Gaussian and its first, second, and third derivatives. Cauchy filters (Klein and Levi, 1985) or even Hermite polynomials with a Gaussian envelope (Martens, 1990a, 1990b) have also been used but to a much smaller extent. Gabor functions turn out to be a particular case of Hermite polynomials when the degree of the polynomial tends to infinity. GDs are commonly used in the literatiire as an alternative to Gabor functions, having very similar properties but with the additional advantage of being pure bandpass (i.e., meeting the admissibility condition of wavelets), but at the cost of lower flexibility, i.e., fixed orientations, etc. (GDs are orthogonal only when centered on a fiycd origin of coordinates,
18
RAFAEL NAVARRO ET AL.
FIGURE6. Construction of a cortex filter in the frequency domain: (a) dom filter; (b) fan filter; (c) the cortex filter as the product of a dom and a fan filter; (d) the spatial impulse response of a cortex filter resembling Gabor function. Reprinted by permission from Watson. The cortex transform: rapid computation of simulated neural images. Comp. W .G ~ p h . Image h c . 39,311-327. Copyright 1987 Academic Press, Orlando, FL.
but under translation they lose their orthogonality). To solve this problem, steerable filters can be synthesized in any arbitrary orientation as a linear combination of a set of basis filters (Freeman and Adelson, 1991). Figure 8 shows examples of steerable filters constructed from the second derivatives of a Gaussian, G,, and their quadrature pairs H,. Figure 9 illustrates the design of steerable filters in the Fourier domain. Based on steerable filters, Simoncelli et al. (1992) have proposed a shiftable multiscale transform. Perona (1995) has developed a method for generating deformable kernels to model early vision. Trying to improve the biological plausibility of spatial sampling, Watson and Ahumada (1989) proposed a hexagonal-oriented quadrature pyramid,
IMAGE REPRESENTATION WITH GABOR WAVELETS
19
FIGURE 7. Gaussian derivative wavelets (left) along with their frequency responses (right). g o , . . . ,g, correspond to a Gaussian and the first, second, and third derivatives, respectively.
with basis functions that are orthogonal, self-similar, and localized in space and spatial frequency. However, this scheme has some unrealistic features such as multiple orientation selectivity. In summary, a large variety of schemes of image representation have appeared in different fields of application, including vision modeling. In particular, Fig. 10 shows 1D profiles and frequency responses for Gabor functions with different frequency tuning. We have mentioned Gabor functions briefly in this section, but we will give a detailed analysis next. For a thorough comparative evaluation and optimal filter design for several of the more used decomposition techniques see Akansu and Haddad (1992). 111. GABORSCHEMES OF REPRESENTATION
To introduce Gabor schemes of representation, let us consider the question of reconstructing a signal from a sampled version of the complex spectrogram (Section 11,A). It was shown [Eq. (1011 that a sample of the spectrogram at time t and frequency w could be seen as the projection of
20
RAFAEL NAVARRO ET AL.
FIGURE8. G, and H, quadrature pair basis filters (rows a and d) that span the space of all rotations of their respective filters. G, and H2 have the same frequency response (rows b and e) but a 90" shifted phase (quadrature). Rows (c) and (f) show equivalent x-y separable basis functions. Reprinted by permission from Freeman and Adelson. The design and use of steerable filters for image analysis, enhancement, and wavelet representation. IEEE Trans. Pan. Anul. Mach. Intell. 13,891-906. Copyright 1991 IEEE.
the signal onto a modulated and displaced version of the window, g,, Js). Instead of a continuum, we now have only a discrete set of functions: { g n m ( s ) )= { g n T , m w ( s )= } (w(s - nT)eimwS},with n , m integers,
( 16) that sample the joint domain at points (nT,mW). Recovering the signal from the sampled spectrogram is equivalent to reconstruct f(s> from its
21
IMAGE REPRESENTATION WITH GABOR WAVELETS
C
d
f
e
FIGURE9. Design of a steerable digital filter in the frequency domain. (a) The desired radial frequency distribution; (b) the corresponding angularly symmetric 2D frequency response obtained through frequency transformation. The resulting responses of the four steerable filters (c)-(f) are obtained by multiplying by cos3(v- Oil. Reprinted by permission from Freeman and Adelson. The design and use of steerable filters for image analysis, enhancement, and wavelet representation. IEEE Trans. Pan. Anal. Mach. Infell. 13,891-906, Fig. 6, p. 895. Copyright 1991 IEEE.
projections on that set, that is, with a summation on the indexes (n,rn> instead of a double integral in t and o.Another related problem would be to express f(s) as a linear combination of the set of functions {gnm(s)}:
In the continuous case, Eq. (11) provides us with the answer to both questions, as it uses both the projections (f,g,, and the functions g,, to recover the signal. One could say that in that case, the same set of functions is used for the analysis (obtaining the projections) and synthesis (regenerating the signal). As we will see, that is not true, in general, when one counts only on a discrete number of projections. In that case, expressing f(s) as an expansion of a set of functions may constitute a problem different from using the projections of f(s) onto that set to recover it. These two problems are closely related, as we will see in Section II1,A. The Gabor expansion arises when the basis functions gnm(s)in Eq. (17) are obtained by displacements and modulations of a Gaussian window function [ w ( s ) ,in Eq. (16)]. Gabor (1946) based his choice of the window on the fact that the Gaussian has minimal support in the joint domain. Later, Daugman (1985) showed that this was also the case in 2D.
>,
22
RAFAEL NAVARRO ET AL.
o.:m
0 -1
-5
0
5
-05
0
5
FIGURE10. One-dimensional Gabor functions (left) and their frequency responses (right). The peak frequencies (O,f,, 2fl, and 4fl) and bandwidths correspond to a multiscale logarithmic scheme.
There are many possibilities when designing a Gabor expansion. Apart from choosing the width of the Gaussian envelope (which determines the resolution in both domains) and the phase of the complex exponential, the key issue is to decide the sampling intervals T (time or space) and W (frequency) that govern the degree of overlap of the “sampling” function g,,(s). Intuitively, it seems clear that a sampling lattice too sparse ( T , W large) will not allow exact reconstruction of the signal. The original choice of Gabor (1946) was TW = 2a,which corresponds to the Nyquist density. This is the minimum required to preserve all the information and, therefore, is called the critical sampling case. Schemes with TW < 2a correspond to oversampling. For a fixed 7” we can continuously vary the ratio T / W depending on whether we want more resolution in one or the other domain. The main problem of this expansion is the lack of orthogonality of the Gabor functions, which makes the computation of the expansion coefficients an, difficult. The task is trivial when the set {g,,,(s)) is orthogonal, because in that case the coefficient an, are the projections onto the same set of functions; that is, the analysis and synthesis windows are the same.
An orthogonal set is generated, for example, if the window function is a rectangular pulse, as in block transforms. However, that window is not well localized in the frequency domain (as opposed to Gaussian windows), and therefore the coefficient a_nm may capture components of the signal far from the desired frequency mW. Unfortunately, this is a general drawback of orthogonal sets. The Low-Balian theorem (see Daubechies, 1990) states that no orthogonal set of functions can be generated from a window that is well localized in both domains. Therefore, as we mentioned before, joint localization and orthogonality of the set of functions are properties that cannot be met simultaneously. Much work has been done to overcome the problem of the lack of orthogonality of the Gabor functions, developing efficient ways to compute or approximate the expansion coefficients a_nm. This will be the subject of most of this section.

A. Exact Gabor Expansion for a Continuous Signal

Here we shall follow the theoretical formulation of Daubechies (1990) and review the main approaches to solving the continuous case. For this purpose, we will introduce the so-called biorthogonal functions (Bastiaans, 1981, 1985; Porat and Zeevi, 1988) and the related Zak transform (Eizinger, 1988; Zeevi and Gertner, 1992; Bastiaans, 1994). We delay until the next subsection the discussion of the discrete case, where the computation of the coefficients is transformed into solving a linear system of (many) equations. For simplicity we restrict the discussion to the 1D case. All integrals and sums are in Z unless otherwise stated. Following Daubechies, given a set of coherent states [displaced and modulated versions of a seed function, Eq. (16)], which we will call {\psi_i(s)} (for simplicity we consider just one index), we define an operator T that maps a function f(s) in L^2 (square-integrable functions) into the sequence formed by its projections on the coherent states:

T(f(s)) = \{ \langle f, \psi_i \rangle \},   (18)

and the corresponding operator T^*, which reverses the process, mapping a sequence of coefficients {c_i} into a function:

T^*(\{c_i\}) = \sum_i c_i\, \psi_i(s).   (19)
Now, if we define the operator S = T^*T (the frame operator), this new operator maps L^2 into L^2: T computes the projections of f(s) onto the set {\psi_i(s)}, and T^* reconstructs a function g(s) from the resulting sequence. However, in general g(s) \ne f(s); i.e., the operator S is not the identity. Consequently, trying to regenerate a function from its projections will not always recover the original signal.
This is, in operator notation, the already known fact that in general we cannot compute the coefficients of the Gabor expansion by simply calculating the inner products, as the set of Gabor functions is not orthogonal. To be able to reconstruct the signal, apart from T being a one-to-one map, in practice stability is also required. This means that if two signals g(s) and f(s) are similar, their sequences should be close too. Mathematically, we want

A \|f\|^2 \le \sum_i |\langle f, \psi_i \rangle|^2 \le B \|f\|^2, with A > 0, B < \infty,   (20)

so that if \|f - g\| \to 0, the sum of the squared differences of the projections also tends to zero. The foregoing condition can be expressed using operator notation as

A I \le S \le B I,   (21)
with I the identity operator. A set of functions that generates an operator S complying with the foregoing conditions is said to form a frame (Duffin and Schaeffer, 1952). The constants A, B are called frame bounds and determine some important properties. A frame can be seen as a generalization of the concept of a linear basis in a Hilbert space, being able to generate the space, but leaving, in general, "too many" vectors. An irreducible frame will be a basis with linearly independent elements; otherwise, the frame is redundant, with elements that are not linearly independent. There are two advantages in using redundant frames. First, redundant frames are not orthogonal, and as we mentioned before (Low-Balian theorem), relaxing the orthogonality condition permits elements with better localization. Second, the linear dependence of the elements of a redundant frame implies robustness, in the sense that the combination of elements can "do the work" of another element that is lost, destroyed, etc. Orthogonal bases are a particular case of nonredundant, linearly independent frames whose functions present bad localization properties. Using S, we can construct a dual set of functions that also constitutes a frame:

\tilde{\psi}_i(s) = S^{-1} \psi_i(s).   (22)
The dual frame is very useful because, if we denote the corresponding dual (frame) operator by \tilde{S}, then

\tilde{S} S = S \tilde{S} = I,   (23)

and \tilde{T}^* T = T^* \tilde{T} = I. In practice, this means that if we generate a sequence from a function using T or \tilde{T}, by applying the corresponding adjoint
operator in the inverse operation [as in Eq. (19)], we now recover the signal:

f(s) = \sum_i \langle f, \tilde{\psi}_i \rangle\, \psi_i(s),   (24a)

f(s) = \sum_i \langle f, \psi_i \rangle\, \tilde{\psi}_i(s).   (24b)
Equation (24a) corresponds to the Gabor expansion when {\psi_i} is a set of Gabor functions. It shows that the coefficients of the expansion are to be computed as projections not on the original set but on the dual one. On the other hand, Eq. (24b) tells us how to recover the signal from samples of the spectrogram \langle f, \psi_i \rangle. Therefore, both equations provide the answer to the two problems stated at the beginning of the chapter, showing that they are closely related through the concept of the dual set. If we use a particular synthesis window, we must use the corresponding dual function for the analysis, and vice versa. It is important to note that if the frame is redundant, another set of functions in the space can be obtained that could play the role of the dual set and lead to the reconstruction formulas of Eq. (24), having then multiple expansions. The Gabor expansion can then be easily computed if we can generate the dual set. This can be found from Eq. (22), using the following expansion for S^{-1}:

S^{-1} = \frac{2}{A+B} \sum_{k=0}^{\infty} \left( I - \frac{2}{A+B} S \right)^k.   (25)
The convergence of this series depends on the frame bounds A and B. One can distinguish three cases:

A = B = 1: from Eq. (21), we see that S = I, and consequently \tilde{\psi}_i = \psi_i. The synthesis and analysis windows are the same, and therefore the frame constitutes an orthogonal basis.

A = B: "tight" frames; in this case S = AI, and the dual set is the same, except for a constant.

A \ne B: general case. If A \approx B the frame is said to be "snug."
In the first two cases, it is trivial to recover the signal from its projections. The case of snug frames is important because when B/A \approx 1 we have good convergence in Eq. (25) and the dual set is not too different from the original. Consequently, these snug frames provide good direct
recovery of the signal, to a first approximation, from its projections on the frame functions; i.e., taking only the term k = 0 in Eq. (25):

f(s) \approx \frac{2}{A+B} \sum_i \langle f, \psi_i \rangle\, \psi_i(s).   (26)
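For a finite, discrete set of coherent states, the frame bounds and the approximation of Eq. (26) can be checked directly: if the sampled atoms are stored as columns of a matrix, the operator S of Eq. (21) becomes a Hermitian matrix whose extreme eigenvalues are A and B. A minimal sketch (our own; the Gaussian width, lattice, and signal are arbitrary choices):

```python
import numpy as np

def gabor_dictionary(N, T, W, sigma):
    """Columns are sampled Gabor atoms g_i[k] = w(k - nT) exp(i m W k)."""
    s = np.arange(N)
    cols = []
    for nT in range(0, N, T):
        w = np.exp(-0.5 * ((s - nT) / sigma) ** 2)
        for m in range(int(round(2 * np.pi / W))):
            cols.append(w * np.exp(1j * m * W * s))
    return np.column_stack(cols)                 # shape (N, M)

N, T, W, sigma = 64, 4, 2 * np.pi / 8, 3.0       # TW = pi: oversampling by 2
D = gabor_dictionary(N, T, W, sigma)

# Frame operator S f = sum_i <f, g_i> g_i  =  D D^H f  (an N x N Hermitian matrix).
S = D @ D.conj().T
eig = np.linalg.eigvalsh(S)
A, B = eig[0], eig[-1]
print("frame bounds A = %.3f, B = %.3f (B/A = %.2f)" % (A, B, B / A))

# k = 0 term of Eq. (25), i.e., the approximation of Eq. (26).
f = np.random.randn(N)
c = D.conj().T @ f                               # projections <f, g_i>
f0 = (2.0 / (A + B)) * (D @ c)
print("relative error of Eq. (26):", np.linalg.norm(f - f0) / np.linalg.norm(f))
```

The closer B/A is to 1 (the snugger the frame), the smaller the reported error of the direct, first-order recovery.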
An adequate choice of the seed function from which the set is generated allows us to build up a tight frame with all its convenient properties (Daubechies et al., 1986). However, for the Gabor expansion the seed function is a Gaussian. Thus, we have to study the conditions under which a frame can be developed and try to make it a tight or, at least, snug frame. The parameters that we can vary in the Gabor expansion are the sampling rates in time (T) and frequency (W). If TW > 2\pi, the sampling lattice is too loose, and the signal cannot be completely recovered (Bargmann et al., 1971). The choice in many schemes (e.g., Gabor, 1946; Bastiaans, 1981) was then TW = 2\pi, i.e., the critical sampling corresponding to the Nyquist density, where the coefficients of the expansion can be associated with the degrees of freedom of the signal. However, as Gaussians are smooth functions, the Low-Balian theorem states that they cannot constitute a frame for this critical sampling. In practice this means that with critical sampling the reconstruction, although possible in theory, will lack stability, reducing its interest in many practical applications. Finally, when TW < 2\pi (i.e., oversampling the signal) we can obtain a frame (although not a convenient "tight" frame). However, by increasing the amount of oversampling (WT = \pi, WT = \pi/2, and so on), we can get snug enough frames (with B/A \approx 1). Fortunately, oversampling by a factor of 2 or 4 (over the Nyquist frequency) is enough in practice, although stability will improve as we increase oversampling. For a given value of TW < 2\pi, any choice of T, W will generate a frame, but we can see intuitively that large values of T will create bad spatial sampling, and the reciprocal is true for large W. In general, the best results (that is, the snugger frames) are obtained by adjusting T to the width of the Gaussian. If we desired to increase the temporal resolution, we should reduce both T and the Gaussian width. The evolution of the dual functions for different degrees of oversampling (TW = 2\pi\lambda) was studied by Daubechies (1990) for the continuous case and is shown in Fig. 11. Similar studies for the discrete case can be found in Wexler and Raz (1990), Qian et al. (1992), and Qian and Chen (1993). Moderate oversampling (by a factor of 4; \lambda = 0.25) is enough to have dual functions with very little difference with respect to the original Gaussian envelope (except for scaling). With this modest redundancy it is possible to obtain an approximate direct reconstruction of the signal, after
Eq. (26). As we move closer to the critical sampling (\lambda \to 1), the dual functions depart from the Gaussian shape, and consequently this approximation will no longer hold. For critical sampling, we obtain a dual function that is non-square-integrable (lower right panel in Fig. 11), reflecting the unstable nature of the reconstruction. The convergence of the dual functions (the analysis window) to the original Gaussians (the synthesis window) as oversampling increases admits an intuitive explanation. We can consider oversampling as an intermediate step between the reconstruction of the signal from the continuous spectrogram [Eq. (11), corresponding to an infinitely dense sampling lattice] and the critical sampling [using the sparsest possible lattice in Eq. (17)]. Whereas in the latter case the analysis and synthesis windows can be very different, in the former they are proportional by a factor 2\pi [Eq. (10)]. Therefore, by increasing oversampling we can get a more robust representation with dual functions closer to Gaussians.
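Numerically, the dual functions of Eq. (22) are obtained by applying the inverse of the (finite) frame operator to each atom; doing this for increasing oversampling reproduces qualitatively the behavior of Fig. 11, with the dual window approaching a scaled Gaussian. A rough sketch of our own (arbitrary Gaussian width, truncated rather than periodized windows):

```python
import numpy as np

def gabor_dictionary(N, T, n_freqs, sigma):
    """Columns: g[k] = w(k - nT) exp(2*pi*i*m*k/n_freqs) for all shifts nT and all m."""
    s = np.arange(N)
    cols = [np.exp(-0.5 * ((s - nT) / sigma) ** 2) * np.exp(2j * np.pi * m * s / n_freqs)
            for nT in range(0, N, T) for m in range(n_freqs)]
    return np.column_stack(cols)

N, sigma = 64, 3.0
for T, n_freqs in [(4, 32), (4, 16), (8, 16)]:    # redundancy factors 8, 4, 2
    D = gabor_dictionary(N, T, n_freqs, sigma)
    S = D @ D.conj().T                            # frame operator
    duals = np.linalg.solve(S, D)                 # dual atoms, Eq. (22)
    idx = (N // 2 // T) * n_freqs                 # atom centred in the signal, m = 0
    d = duals[:, idx].real
    g = D[:, idx].real
    d, g = d / np.linalg.norm(d), g / np.linalg.norm(g)
    print("redundancy %4.1f : || dual - Gaussian || = %.3f"
          % (n_freqs / T, np.linalg.norm(d - g)))
```

The printed distance shrinks as the redundancy grows, which is the numerical counterpart of the "pseudo-orthogonal" regime discussed later in the discrete case.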
1. Biorthogonal Functions

Bastiaans (1981) introduced the idea of using functions that were biorthogonal to the set of Gabor functions, a concept that Daubechies (1990) later generalized in the context of dual frames. Indeed, these biorthogonal functions turn out to be the dual set, because Eq. (22) implies biorthogonality among the dual sets:

\langle \psi_i, \tilde{\psi}_j \rangle = \delta_{ij}.   (27)
FIGURE 11. Evolution of the dual function of a Gaussian obtained through Eq. (25) for different sampling rates (TW = 2\pi\lambda; \lambda = 0.25, 0.375, 0.5, 0.75, 0.95, and 1.0). Reprinted by permission from Daubechies. Ten lectures on wavelets. Copyright 1992 Society for Industrial and Applied Mathematics, Philadelphia.
Bastiaans (1981) was the first to use this relationship to compute the analytical expression of the biorthogonal functions in the Gabor expansion (for the critical sampling TW = 2\pi). From a Gaussian synthesis window w(s) of width T [Eq. (28)], he obtained the corresponding biorthogonal analysis window \gamma(s) of the dual set [Eq. (29)], whose expression involves the constant K_0 = 1.8540746. He also showed that the biorthogonal or dual set can be generated by translations and modulations of \gamma(s). Both functions, w(s) and \gamma(s), are shown in Fig. 12. The dual function is badly localized in time (Low-Balian theorem). Consequently, the coefficients a_nm will capture information from the signal far from the time nT. Moreover, its odd behavior, with sharp spikes, will cause stability problems in practice [some considerations of the effect of quantization in the Gabor expansion can be found in Porat and Zeevi (1988)]. In spite of this problem, the use of biorthogonal functions is widespread in the computation of the Gabor
FIGURE 12. Gaussian window (solid line) and its corresponding biorthogonal function (dotted line) for the critical sampling case. The analytical expression of this function was first obtained by Bastiaans (1981).
expansion [Bastiaans (1981, 1985) in 1D and Porat and Zeevi (1988) in 2D]. Some authors (Kritikos and Farnum, 1987) have introduced approximations that facilitate the computation of the biorthogonal function for a more general type of window.

2. The Zak Transform
In this section we introduce the Zak transform (ZT), first proposed in the context of solid-state physics and the reciprocal lattice (Zak, 1967). This transform, another joint representation of a signal, permits reformulating the problem without the explicit use of biorthogonal or dual functions. The Zak transform of f(t) is the Fourier transform of the sequence {f(t + mT)}:

\hat{f}(t, \omega) = \sum_m f(t + mT)\, e^{-im\omega T}.   (30)

\hat{f}(t, \omega) is periodic in \omega with period \Omega = 2\pi/T and quasiperiodic in t, so that

\hat{f}(t + kT, \omega + m\Omega) = \hat{f}(t, \omega)\, e^{ik\omega T}.   (31)

The Zak transform maps the information contained in f(t) into a square of area T \times \Omega = 2\pi in the joint domain. This again is related to the Nyquist sampling density. Bastiaans (1985) implicitly used the Zak transform, but Eizinger (1988; Eizinger et al., 1989) was the first to apply it explicitly to the Gabor expansion. Later work has been done on implementing the ZT in the discrete case (Zeevi and Gertner, 1992; Bastiaans, 1994). The ZT is interesting here because it permits us to translate the biorthogonality relationship of Eq. (27) into a product:

\langle w_i(t), \gamma_j(t) \rangle = \delta_{ij} \;\Longleftrightarrow\; T\, \hat{w}(t, \omega)\, \hat{\gamma}^*(t, \omega) = 1.   (32)
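For finite sequences, the discrete implementations of the ZT cited above reduce to one FFT per time offset. A small sketch of our own (for a length-N signal, with L samples per translation step and N = L*K):

```python
import numpy as np

def discrete_zak(f, L):
    """Discrete Zak transform of a length-N signal, N = L*K:
    Z[m, l] = sum_k f[l + k*L] * exp(-2j*pi*m*k/K),  l = 0..L-1, m = 0..K-1."""
    N = len(f)
    K = N // L
    assert K * L == N
    return np.fft.fft(f.reshape(K, L), axis=0)    # FFT over the translation index k

# Zak transform of a (truncated) Gaussian window, translation step T = L samples
N, L = 64, 8
t = np.arange(N)
w = np.exp(-0.5 * ((t - N / 2) / 6.0) ** 2)
Zw = discrete_zak(w, L)
print("Zak-domain magnitude, min / max:", np.abs(Zw).min(), np.abs(Zw).max())
# A (near-)zero of Zw is the numerical signature of the instability of the
# critically sampled Gabor expansion discussed in the text.
```

The grid sizes and window width here are arbitrary; the point is only that the whole transform is a single reshaped FFT.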
We can then invert the ZT of the Gaussian window (provided it has no zeros) to compute the ZT of the desired dual biorthogonal function. Finally, by computing the inverse ZT one obtains the desired dual function. However, we do not need to obtain these dual functions explicitly. Taking Zak transforms on both sides of the Gabor expansion [Eq. (17)] we obtain:

\hat{f}(t, \omega) = \hat{w}(t, \omega) \sum_n \sum_m a_{nm}\, e^{-in\omega T}\, e^{im\Omega t}.   (33)

Thus, the Fourier transform \hat{a}(t, \omega) of the sequence of coefficients a_nm is given by

\hat{a}(t, \omega) = \hat{f}(t, \omega) / \hat{w}(t, \omega),   (34)
and from it, the coefficients a_nm can be obtained via the inverse Fourier transform of a sequence. If the Zak transform of the window function, \hat{w}(t, \omega), has zeros, it means that nonzero functions \hat{h}(t, \omega) can be built in such a way that \hat{w}(t, \omega)\,\hat{h}(t, \omega) = 0. Then the h(t, \omega) are homogeneous solutions of the biorthogonal equation [Eq. (32)], and consequently adding them to the biorthogonal window generates new permissible reconstruction functions, causing a multiplicity of solutions. Similarly to what happened in the discussion in terms of frames, one can take advantage of the multiplicity of solutions when oversampling to generate well-behaved dual functions using the ZT (Zibulski and Zeevi, 1993).

B. Gabor Expansion of Discrete Signals

In the foregoing analysis we have considered continuous signals to be expanded through a set of continuous functions. In digital signal processing, an analog-to-digital (A/D) converter will sample the input function, which will be known only at a finite number of points (N). Therefore, in practice we will be mostly interested in the Gabor expansion of discrete signals, which will be distinguished using bracket notation. As is usual when working with finite discrete sequences, we assume that we are dealing with a periodic sequence of period N. In the discrete case the Gabor expansion becomes a system of linear equations. Let us suppose that M Gabor functions are being used for the expansion. Again, in what follows, we will use a single index to denote the different Gabor functions, although two (for 1D signals) or four (for 2D signals) should be used to indicate the sampling of the joint domain (nT, mW). As before, we want to find a set of coefficients {c_i} so that the Gabor expansion is as close as possible to the original signal f[k] (in the N sampled points) according to some criterion:
f[k] = \sum_{i=1}^{M} c_i\, g_i[k], \qquad k = 0, \ldots, N - 1,   (35a)
that we can rewrite as a vector-matrix product:

F = GC,   (35b)
where F is a vector whose N components are the input samples, and C is the vector formed by the M coefficients of the expansion (the unknowns). G is an N \times M matrix whose M columns correspond to the M Gabor functions sampled at the N points. A common criterion of similarity is
the least-squares error, where the goal is to minimize the norm of the error vector:

(F - GC)^{\dagger} (F - GC),   (36)

where the superscript \dagger denotes the transpose of the complex conjugate. Again, using an orthogonal basis, the solution C could be easily built through the projections on the basis vectors (now using a discrete version of the inner product). However, since Gabor functions are not orthogonal, one has to solve the well-known set of normal equations:

G^{\dagger} G\, C = G^{\dagger} F,   (37)
where now G^{\dagger}G is an M \times M square matrix. The main advantage of this least-squares approach is that the above result is a rather general solution: it does not depend on the particular type of expansion or sampling used. The solution obtained is the closest to the original signal in the least-squares sense. We need only solve a (large) linear system with N equations (the number of samples) and M unknowns (the number of Gabor functions used to reconstruct the signal). Next we review the main approaches to practically solving Eq. (37).

1. Iterative Methods: Daugman's Neural Network
The first method for calculating the Gabor expansion by solving a linear system of equations was developed by Daugman (1988) and applied to 2D signals (images). He pointed out the difficulties of directly solving the system: for a typical 256 \times 256 pixel image, we would need to solve a system of at least 65,536 equations. However, he also noted that the joint localization of Gabor functions would lead to very sparse matrices, thus opening the door to special techniques. Daugman utilized a neural network, implementing a steepest descent method to minimize the cost function of Eq. (36). From an initial guess of the coefficients he reconstructed the image and computed the resulting error. Then, each coefficient is updated by an amount proportional to the inner product of its corresponding Gabor function and the reconstruction error. As there is only one minimum, the net cannot be trapped in local minima, and convergence toward the desired coefficients is ensured. As this procedure is based on iterative methods for solving a linear system, its convergence can be improved using standard techniques in numerical analysis (Braithwaite and Beddoes, 1992). Figure 13a shows an image of 256 \times 256 pixels, and Fig. 13b presents the 4D coefficient set {a_nmrs} of the Gabor expansion computed by Daugman. The sampling intervals in the spatial domain were 16 pixels in both directions, so that all the coefficients corresponding to different frequencies are grouped in the figure around the spatial sampling positions (16n, 16m).
FIGURE 13. (a) Image Lena, 256 \times 256 pixels. (b) Coefficients of the Gabor expansion of (a) computed using Daugman's neural network. Reproduced by permission from Daugman. Complete discrete 2D Gabor transform by neural networks for image analysis and compression. IEEE Trans. Acoust., Speech, Signal Process. 36, 1169-1179. Copyright 1988 IEEE.
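The update rule just described — each coefficient changed in proportion to the inner product of its Gabor function with the current reconstruction error — is ordinary gradient descent on the quadratic cost of Eq. (36). A toy 1D sketch of our own (the dictionary, step size, and iteration count are arbitrary illustrative choices, not Daugman's network parameters):

```python
import numpy as np

def gabor_dictionary(N, T, n_freqs, sigma):
    s = np.arange(N)
    cols = [np.exp(-0.5 * ((s - nT) / sigma) ** 2) * np.exp(2j * np.pi * m * s / n_freqs)
            for nT in range(0, N, T) for m in range(n_freqs)]
    return np.column_stack(cols)                       # N x M

N = 64
G = gabor_dictionary(N, T=8, n_freqs=16, sigma=3.0)    # modestly oversampled, M = 128
k = np.arange(N)
f = np.cos(2 * np.pi * 5 * k / N) * np.exp(-((k - 32) / 10.0) ** 2)   # test signal

c = np.zeros(G.shape[1], dtype=complex)                # initial guess of the coefficients
step = 1.0 / np.linalg.norm(G, 2) ** 2                 # safe step size for gradient descent
for it in range(500):
    err = f - G @ c                                    # reconstruction error
    c = c + step * (G.conj().T @ err)                  # update ~ <g_i, error>
print("final relative error:", np.linalg.norm(f - G @ c) / np.linalg.norm(f))
```

Because the cost has a single minimum, the iteration converges regardless of the initial guess, exactly as stated above; ill-conditioning of the Gabor set only slows it down.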
2. Direct Methods for the Inversion of G^{\dagger}G
Despite the large size of the resulting linear system, the approach of trying to solve it directly is not as inefficient as one might think. Once the particular set of Gabor functions that will be used to sample the joint domain has been chosen, the matrix G that defines the system is fixed, independent of the input. Then the bulk of the process, the inversion of the matrix, has to be done only once. Most approaches consist of factoring the matrix of Eq. (37) using, for instance, QR decomposition (Lau et al., 1993), singular value decomposition (Ebrahimi and Kunt, 1991), or Toeplitz matrices (Yao, 1993). Once this very time-consuming task (on the order of N^3 operations, with N the number of coefficients) is accomplished, the computation of the Gabor expansion for a particular signal is a much faster process (requiring N^2 operations), because it involves only a matrix multiplication. In the critical sampling case N = M, the coefficients can be obtained as C = G^{-1}F, although the ill-conditioning of G may force the use of singular value decomposition techniques to invert it (Ebrahimi and Kunt, 1991).
On the other hand, undersampling leads to an overdetermined system (M < N), where there are too few functions (M) to account completely for the degrees of freedom of the signal (N). The signal cannot be recovered exactly, but a best approximation in the least-squares sense can be obtained [after Eq. (37)] by applying the pseudoinverse (G^{\dagger}G)^{-1}G^{\dagger} to the input F. It has been pointed out (Genossar and Porat, 1992) that trying to represent a signal optimally with an incomplete set of Gabor functions is not equivalent to computing a complete Gabor transform and then dropping unwanted coefficients. In the latter case, the reconstruction obtained is not the closest to the original signal in the least-squares sense, because of the lack of orthogonality of the Gabor functions. This could have important consequences for image compression: if we decide not to code certain Gabor coefficients, it is advisable to compute the Gabor expansion from the beginning as an undersampled case, using only the subset of Gabor functions whose coefficients we intend to use. Finally, for oversampled expansions, we have an underdetermined system, with more degrees of freedom (M) than restrictions (N), and one ends up with multiple solutions. Note that multiple solutions indicate multiple sets of biorthogonal functions that can be used in the reconstruction (as happened in the continuous case). In this case, the pseudoinverse can also be used to find the particular solution of minimal norm. Whereas the preceding methods aim to solve Eq. (37) to compute directly the coefficients of the expansion, other approaches express the (discrete) biorthogonality condition [Eq. (27)] as another system of linear equations (Wexler and Raz, 1990) to obtain the dual functions. Again, for a given window and sampling scheme, they have only to compute the dual functions once and then calculate the coefficients through fast inner products. As before, in an oversampled scheme they find a system with multiple solutions. The additional degrees of freedom (due to underdetermination) can be used to impose further realistic or useful constraints on the solutions, such as reduced spatial support, better joint localization, or similarity with the original window (Wexler and Raz, 1990; Qian et al., 1992). Qian and Chen (1993) showed that the minimum-norm solution obtained through the pseudoinverse gives a biorthogonal function that is the closest to the original window. The same idea appears again: oversampling with a nonorthogonal set gives rise to multiple solutions, which permits finding biorthogonal functions with better behavior and a more robust Gabor expansion. Moreover, when the dual function is close to the original, we can approximate the coefficients of the Gabor expansion directly by the inner products \langle f, \psi_i \rangle, instead of using the theoretical biorthogonal functions. Some authors call this situation a pseudo-orthogonal scheme (equivalent to a snug frame in the continuous case).
C. Quasicomplete Gabor Transform

In the previous sections it was shown how the lack of orthogonality of the Gabor functions causes some difficulties in computing an exact Gabor expansion, which increases the computational cost. Although the signal analysis (or decomposition) was relatively easy, based on computing inner products (projections), more complex biorthogonal or dual functions were needed for an exact reconstruction. Nevertheless, in many applications of image processing and machine vision, we are mostly interested in the analysis of the image, with exact recovery being a secondary issue. These two facts together lead us to the idea of developing a simplified version of the Gabor expansion, centered on the analysis and based on multiresolution pyramids (Section II,C). Interestingly, inspiration for this approach comes from the human visual system (see Section IV), which seems to use a highly redundant, nonorthogonal, and incomplete representation (e.g., attenuation of low and high spatial frequencies, sparse sampling in the periphery). For these reasons, Gabor schemes of representation that are inspired by our visual system do not always yield an exact reconstruction (Watson, 1983; Navarro and Tabernero, 1991). Relaxing the constraint of an exact image reconstruction allows us to design a simpler quasicomplete representation, much more appropriate for real-time applications. Furthermore, despite the loose nature of these schemes, we will show methods for improving the recovery of the signal.

1. A Gabor Scheme Based on the Human Visual System
For most of the foregoing analysis, we have assumed regular sampling (nT, m W ) of the conjoint domain. However, in the visual system one finds frequency channels roughly distributed in octaves and with a bandwidth that is proportional to their peak frequency (Wilson and Bergen, 1979). This has inspired schemes with nonuniform sampling (Daugman, 1988; Porat and Zeevi, 1988) similar to the wavelet representations, where the complete set of basis functions is obtained by translation, scaling, and rotations of a mother Gabor function. Biological image representation is, of course, not so simple (see Section IV), but Gabor wavelets are a good model in the sense that they capture most of its functional features. The theory underlying visual models is that of multiresolution pyramids (Section II,C), where now the functions that try to cover the Fourier domain are called channels of the visual system. This approach allows easy and direct image recovery by simply adding together the different channels as in pyramid decomposition (Burt and Adelson, 1983), avoiding the problems associated with finding the exact synthesis window. The degree of redun-
dancy of these models depends on the overlapping of the different channels. A minimal redundancy could be achieved if the channels were perfect bandpass filters (at the cost of other undesirable effects such as ringing artifacts). However, the visual system seems to work with smooth Gabor-like channels instead, which produce a more robust representation (at the cost of higher redundancy). For other alternative vision-oriented models see Section II,D and Section IV. In these models it is common to adapt some of the terminology used in visual psychophysics and physiology. For example, the real or imaginary part of a 2D Gabor function [Eq. (1)] is often called a receptive field, and it is characterized by five labels, g_{x_0, y_0, f_0, \theta_0, p}, that correspond to the spatial (x, y) and frequency (f, \theta) localization, and parity p = 0, 1 (even, odd). In terms of multiresolution pyramids, we can group all the receptive fields having the same frequency and orientation (and sometimes parity) to construct a visual channel that consists of a Gaussian (or a couple of Gaussians, symmetric or antisymmetric depending on the parity) in the Fourier domain. A difference from the formulation given in the previous sections is that now we do not have a coherent phase (common origin) for all the receptive fields within a channel (as happened with a set of coherent states), giving a more biologically plausible model. Following these considerations, Navarro and Tabernero (1991) proposed a quasicomplete Gabor scheme of image representation inspired by the human visual system. Similarly to the cortex transform (Watson, 1987a), it consists of four frequency channels with peak frequencies distributed in octaves, f_0 = f_N/2, f_N/4, f_N/8, and f_N/16 (f_N is the spatial sampling Nyquist frequency), and four orientation channels (with orientations at 0°, 45°, 90°, and 135°). In order to obtain an optimal covering of the Fourier domain, i.e., to approximately satisfy the condition \sum_i G_i(\omega) = 1 [and thus be able to apply Eq. (15) for an approximate reconstruction of the image], we have set the parameter a = 0.71 f_0. This produces, on a log scale, a constant bandwidth of 1 octave (radial) and an angular bandwidth of 0.71 radians (close enough to the optimum value of \pi/4). The resulting representation is redundant (by roughly a factor of 5), thus increasing robustness. Figure 14 shows the 16 basic Gabor functions (real parts) in the spatial domain (a) and the coverage achieved in the frequency domain (b). To improve this coverage, we include a low-pass filter (one Gabor function centered at the DC zero frequency). Eventually it can be useful to incorporate a high-pass residual that permits us to exactly meet the condition \sum_i G_i(\omega) = 1 and consequently leads to an exact reconstruction of the image. For each of these channels, we apply a Nyquist sampling of the spatial domain, resulting in a typical multiresolution pyramid. In this way, we can implement the computation of the inner products [Eq. (8)] of
FIGURE 14. (a) Basic 4 X 4 Gabor filters depicted in the spatial domain, corresponding to four orientations (columns) and four frequencies (rows); (b) coverage of the Fourier plane with the set of Gabor functions.
the Gabor expansion as a series of 4 (frequencies) \times 4 (orientations) \times 2 (parities) convolutions in a pyramid implementation. Therefore, instead of doubling the size of the filters to obtain the next lower frequency channel, we apply the same set of filters to a decimated (by a factor of 2) version of the image (with a previous low-pass filtering to avoid aliasing). Another interesting advantage of Gabor functions is their duality in both domains, which allows efficient implementations in either of them.
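A sketch of such a bank, built directly in the Fourier plane, is given below. It is our own simplified parametrization (each channel is a Gaussian centred at its peak frequency, with radial and angular widths chosen to approximate the 1-octave and 0.71-rad bandwidths quoted above); the exact constants of Navarro and Tabernero are not reproduced here.

```python
import numpy as np

def gabor_bank(size):
    """Frequency-domain responses of a 4-frequency x 4-orientation Gabor-like
    bank plus a low-pass residual, for a square image of side `size`."""
    fy, fx = np.meshgrid(np.fft.fftfreq(size), np.fft.fftfreq(size), indexing="ij")
    f_nyq = 0.5                                   # Nyquist frequency in cycles/sample
    radius = np.hypot(fx, fy)
    angle = np.arctan2(fy, fx)
    bank = []
    for f0 in [f_nyq / 2, f_nyq / 4, f_nyq / 8, f_nyq / 16]:
        sigma_r = 0.35 * f0                       # ~1-octave radial width (assumed value)
        for theta in np.deg2rad([0, 45, 90, 135]):
            d_ang = np.angle(np.exp(2j * (angle - theta))) / 2.0   # angular distance mod pi
            bank.append(np.exp(-0.5 * ((radius - f0) / sigma_r) ** 2
                               - 0.5 * (d_ang / 0.71) ** 2))
    lowpass = np.exp(-0.5 * (radius / (f_nyq / 24)) ** 2)
    return bank, lowpass

bank, lowpass = gabor_bank(128)
coverage = sum(bank) + lowpass
# Without a high-pass residual the corners of the Fourier plane stay uncovered,
# which is why the text adds a (1 - sum of channels) term for exact recovery.
print("coverage: min %.3f, max %.3f" % (coverage.min(), coverage.max()))
```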
Spatial domain implementations, apart from the biological plausibility (no evidence for neural Fourier transforms in the visual system), present additional advantages, such as the possibility of restricted application to a region of interest, as opposed to the global nature of a Fourier domain implementation. Furthermore, we can turn the 2D convolutions into cascaded 1D convolutions [by applying simple trigonometric rules: sin(a + b) = sin(a)cos(b) + cos(a)sin(b) (Heeger, 1987)]. In this way, Gabor convolutions can be implemented faster in the spatial domain than through the conventional fast Fourier transform (FFT). Nestares et al. (1995) have developed an optimized implementation of this Gabor scheme with 1D convolution masks of 11 elements, as the result of a trade-off between fidelity to the frequency response of the channels and low computational cost. In the design of the masks, the filters are forced to be pure bandpass (zero DC response). Other authors multiply them by a power of a cosine function to guarantee zero DC response (Heitger et al., 1992). Figure 15 shows the resulting responses of the (even) Gabor channels for the Einstein face test image (upper right). Each channel captures the image features corresponding to its frequency and orientation. On the upper left we have included the response to the high-pass residual, (1 - \sum_i G_i(\omega)), to be used when we want to reconstruct the image exactly; the low-pass channel is on the lower left. To reconstruct the image, we have only to apply Eq. (15): adding together all the Gabor channels plus the low-pass and (optionally) high-pass residuals. Figure 16 shows different possible reconstructions with increasing quality. If we add only the Gabor channels plus the low-pass residual we get a reconstruction (a) that is not perfect, due to computational errors in the pyramid implementation. This can be greatly improved with a simple equalization, by assigning (fixed) weights to the different channels to obtain a flatter transfer function. As a result, the reconstruction so obtained (b) has an SNR of 24 dB and a high visual quality. Finally, by introducing the high-frequency residual, the reconstruction (c) is visually indistinguishable from the original (d), having an SNR of 28 dB (Nestares et al., 1995).
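The cascaded 1D convolutions mentioned above follow from the fact that, for an isotropic Gaussian envelope, cos(u_0 x + v_0 y) splits into products of 1D factors. The following sketch is our own minimal version of that trick, with short, arbitrary 1D masks; it is not the optimized 11-tap implementation of Nestares et al.

```python
import numpy as np
from scipy.ndimage import convolve1d

def gabor_even_separable(img, f0, theta, sigma, radius=8):
    """Even (cosine) Gabor response computed with four 1D convolutions,
    using cos(a + b) = cos(a)cos(b) - sin(a)sin(b), a = u0*x, b = v0*y."""
    u0, v0 = 2 * np.pi * f0 * np.cos(theta), 2 * np.pi * f0 * np.sin(theta)
    t = np.arange(-radius, radius + 1)
    g = np.exp(-0.5 * (t / sigma) ** 2)
    cx, sx = g * np.cos(u0 * t), g * np.sin(u0 * t)     # 1D masks along x
    cy, sy = g * np.cos(v0 * t), g * np.sin(v0 * t)     # 1D masks along y
    cc = convolve1d(convolve1d(img, cx, axis=1), cy, axis=0)
    ss = convolve1d(convolve1d(img, sx, axis=1), sy, axis=0)
    return cc - ss

img = np.random.rand(64, 64)
resp = gabor_even_separable(img, f0=0.125, theta=np.deg2rad(45), sigma=3.0)
print(resp.shape)
```

The odd (sine) response is obtained the same way from the complementary identity, so each oriented channel costs four 1D convolutions instead of one large 2D one.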
I v . VISION MODELING A. Image Representation in the Visual Cortex
In the previous sections, we have presented the Gabor expansion and other related wavelets as convenient and powerful mathematical tools for the analysis and representation of images in the conjoint space. During
FIGURE 15. Einstein image (256 \times 256) and its decomposition in four frequency channels (rows) and four orientation channels (columns) using the set of filters shown in Fig. 14. Both the high- and low-pass residuals are shown on the left, in the upper and lower rows, respectively.
FIGURE 16. Image reconstructions with increasing quality. From left to right and from top to bottom: direct reconstruction adding up the 16 Gabor channels and the low-pass residual, reconstruction with channel equalization, reconstruction also using the high-pass residual, and finally the original image.
that discussion we hinted at the connection between Gabor schemes and the visual system. In this section we further analyze the relationship, reviewing physiological and psychophysical data that support it. There are several general features of the functional architecture of the human visual system that remind us of many aspects treated in the previous sections. Modularity and parallelism are basic features of that architecture. Different parts (optics, photoreceptor array, groups of neurons, etc.) constitute specialized modules that process the information sequentially or in parallel, with many feedbacks and interactions among them. It seems likely that there are different independent (although interacting) channels that process color, shape, motion, stereo, etc. (Livingstone and Hubel, 1988). Moreover, the image representation of
some parts of the visual field is highly redundant, an aspect that carries robustness and stability (as we have largely discussed in connection with Gabor representations and wavelets). In a living system, redundancy is also necessary to maintain functionality, even after losing a percentage of units. The seminal works of Hubel and Wiesel (1962) in physiology and Campbell and Robson (1968) in psychophysics suggested that some type of bandpass filtering existed in the visual pathway. Later works confirmed that the early visual processing included an analysis of the image through a series of orientation-selective (Thomas and Gille, 1979) and frequency-selective (Campbell and Robson, 1968) mechanisms or channels. The physiological substrate for this processing was thought to be cells in the striate cortex, whose bandpass spatial response could be described by the Fourier transform of their receptive fields. Each neuron in the visual pathway has an associated area on the retina, within which the neuron presents some sensitivity to a light stimulus. The receptive field (RF) of a neuron is usually described as the cell response to a small spot of light on the retina, as a 2D function of the position of the spot. It can show positive areas where the light increases the response (excitation) and negative zones where the light reduces the activity of the neuron (inhibition). The RF is the equivalent of the impulse response of a linear system (although neurons are nonlinear in general, the response of simple cells can be considered linear in a first approximation) (Movshon et al., 1978a). Therefore, the response R of a neuron will be the inner product of the stimulus, or input image, i(x, y), with its RF g_{x_0, y_0, f_0, \theta_0, p}(x, y):

R = \langle i(x, y),\, g_{x_0, y_0, f_0, \theta_0, p}(x, y) \rangle.   (38)
The labels stand for the spatial localization, frequency tuning, and parity of the receptive field. This expression reminds us of the computation of the expansion of a signal (the image) into a set of functions (the RFs), suggesting a joint representation in the visual system (provided that the RFs adequately sample the conjoint domain). Computer modeling is very useful when trying to understand the properties of receptive fields, the visual representation, etc. In particular, it is important to find the mathematical function (if any) that fits the RFs best. This is somehow equivalent to choosing the basic window in the expansion of a signal. One of the first proposals, by Rodieck (1965), used a difference of Gaussians (DOG, or "Mexican hat") to fit the receptive field of retinal ganglion cells. It was not until 1980 (Marčelja, 1980; Daugman, 1980) that Gabor functions were proposed to describe the receptive fields of cortical units. Apart from a good fit to the data, these works suggested the
fascinating question of the visual system evolving toward an "optimal" packing of information (Gabor's logons). At this point, it is important to note that there is no experimental evidence supporting further visual processing to obtain the coefficients of an exact Gabor expansion. This is consistent with the fact that in the visual system there is no need to reconstruct the input signal, and it suggests an analysis-oriented multiscale model more similar to those presented in Sections II,D and III,C. This connects with another relevant feature of the representation of images in the visual cortex: the sampling of the conjoint domain, which is related to the choice of the sampling intervals T, W. The sampling in the visual system combines many of the aspects discussed in the previous section; it is redundant in some areas, while heavily undersampling others. Such undersampling, along with the existence of many visual illusions, or the bandpass response, suggests a noncomplete representation. There is also a fundamental asymmetry between the very fine sampling of the spatial domain and the much coarser sampling of the frequency domain. These aspects show that the sampling lattice is far from being Cartesian but rather corresponds to a log-polar grid in both the spatial and frequency domains (Rolls and Cowey, 1970; Rovamo et al., 1978). We want to stress the fact that the mathematical models on which we will comment in the rest of this chapter can hardly capture the variety and richness of biological systems. Nevertheless, they are a key stage in the understanding of the complex visual system. We will focus on the aspects of the models more related to Gabor or similar schemes, namely, the modeling of RF profiles and the sampling strategies. Other very important aspects, such as nonlinear stages, the role of noise, or decision-making processes (involved in detection, discrimination, etc.), are outside the scope of this review.
B. Gabor Functions and the RFs of Cortical Cells

Hubel and Wiesel (1962) found two different types of cells in the striate cortex. Simple cells showed RFs with clear excitatory and inhibitory areas and a linear response (Movshon et al., 1978a), whereas complex cells lacked distinct excitatory/inhibitory zones in their RFs and presented a nonlinear behavior (Movshon et al., 1978b). Simple cells directly receive the visual information from the lateral geniculate nucleus (LGN), and therefore they seem to constitute the first step in the image representation and processing in the visual cortex. Their RFs show similarities to Gabor functions, and their response can be approximated by Eq. (38), as they have a mostly
linear behavior. The modeling of complex cells is more difficult, and from now on we will refer only to simple cells. By correlating the responses of simple units (obtained by physiological methods) with the presented stimulus, it is possible to measure their frequency and orientation tuning (peak frequency, orientation, and bandwidth) and spatial support (RF size). Since the first experiments (Hubel and Wiesel, 1962), the orientation selectivity of cortical units was clear, as opposed to retinal ganglion cells in earlier stages of the visual pathway. Further studies showed many cortical units with narrow frequency tuning (Campbell et al., 1969; Maffei and Fiorentini, 1973). These 1D results hinted that a cell might respond only to a small region in the 2D spatial-frequency plane. As each cell also had a limited RF size, its response would correspond to an individual information logon in the 4D joint space. Studies with larger populations of neurons (e.g., Movshon et al., 1978a,b,c; DeValois et al., 1982a,b; etc.) have provided average data. The mean orientation bandwidth (measured as the width at half-height of the response) is about 35°. The frequency bandwidth is roughly proportional to the peak frequency. Consequently, bandwidth is usually expressed in octaves, varying between 0.7 and 2.5 octaves, with an average of 1.4. This gives an aspect ratio (ratio between the frequency and orientation bandwidths) of about 1.5. Marčelja (1980) reported good 1D fits of Gabor functions to experimentally measured RFs. She was especially keen to note that although many other schemes could provide a joint representation of a signal, the visual system seemed to have chosen the one with the best localization in both domains. Pollen and Ronner (1983) added further support to this idea, finding couples of cortical units whose RFs were identical except for a 90° shift in phase; i.e., they were in phase quadrature. This strongly suggests that a couple of these neurons could represent the real and imaginary parts of a complex Gabor function. They found pairs of cells without a definite parity, indicating an arbitrary phase offset in the complex Gabor function, but still keeping the quadrature phase relationship. Nevertheless, apart from this study, there exists little additional evidence in the physiological literature to support the idea that these couples of cells in phase quadrature are always present across the spatial and frequency domains. More recently, Jones and Palmer (1987) presented a more extensive study in two dimensions. Figure 17 compares experimental data for three individual neurons in the primary visual cortex of cats (upper row) with a least-squares fit to Gabor functions (middle row) and the difference between them (lower row). In many cases, the deviations from the Gabor model were within experimental errors.
FIGURE 17. A comparison of three RF profiles obtained physiologically (first row) and the corresponding fits to Gabor functions (middle row). The error for each case is shown below. From Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A 2, 1160-1169. Copyright 1985 Optical Society of America. Reprinted by permission.
Psychophysical studies can also help in inferring the shape of RFs. These studies suggest the existence of frequency channels with about 1 octave bandwidth and an aspect ratio of about 1.5 or 2, which roughly agree with the physiological findings. Figure 18 shows the shape of the psychophysical channels found in four subjects using masking techniques (from Daugman, 1984), which can also be modeled as 2D Gabor functions in the Fourier domain. Nevertheless, psychophysical experiments reflect the global activity of many underlying cortical units (plus some decision-making process), and therefore it is harder to extrapolate from these data the shape of the RFs of individual cells. Despite the previous considerations, several objections have been raised against Gabor functions as models of RFs of simple cells. They are not pure bandpass (nonzero DC response), and consequently they do not decay fast enough at low frequencies, showing a lack of symmetry compared with real RFs (Hawken and Parker, 1987). Some "modified" Gabor schemes have been proposed to meet this objection, such as using "Gabor" functions that are Gaussians on a log scale in frequency (Field, 1987). Another problem is that the equispacing of zero crossings in the GF (due to the
periodicity of the sinusoids) is often not consistent with the experimental placement of zero crossings (Webster and DeValois, 1985). Consequently, other similar wavelets have been used to model the RFs of simple cells. We will review here the most commonly used: the difference of Gaussians (DOG) and the Gaussian derivatives, G_n.

1. Difference of Gaussians (DOG)
Rodieck (1965) and Enroth-Cugell and Robson (1966) first proposed DOGs, i.e., the difference of two Gaussians centered at the same point, to model the RFs of retinal ganglion cells. The Fourier transform of a DOG is another DOG, with the frequency bandwidths of both Gaussians being the inverse of their original spatial bandwidths. These functions are
FIGURE 18. Isoamplitude contours of psychophysical channels found in four subjects using a masking experiment. The tuning surfaces were sliced at half amplitude, thus representing the bandwidth of the channels. Reprinted from Daugman, J. G. Spatial visual channels in the Fourier plane. Vision Res. 24, 891-910. Copyright 1984, with kind permission from Elsevier Science Ltd., The Boulevard, Langford Lane, Kidlington OX5 1GB, UK.
rotationally symmetric and have even parity, and therefore they can fit only ganglion cells that have no orientation selectivity. Hawken and Parker (1987) used linear combinations of several DOGs with some separation between their centers [difference of offset Gaussians (DOOGs), first proposed by MacLeod and Rosenfeld (1974)] to model the RFs of simple cortical neurons. Their model is physiologically plausible, because simple cells receive their inputs from LGN cells, which can in turn be modeled by simple DOGs. They reported that this model was better than either simple DOGs, Gabor functions, or the second derivative of a Gaussian, being able to fit even cells with "bumps" and other irregularities in their frequency response. Nevertheless, such a good fit was attained at the cost of using a larger number of parameters.

2. Gaussian Derivatives (G_n)

The Laplacian of a Gaussian was introduced to model LGN cells (Marr and Hildreth, 1980). Later, Young (1985, 1987, 1993) further considered the successive derivatives of a Gaussian, G_n, claiming that they provided a better fit to experimental data than either Gabor or DOG functions. Gaussian derivatives have the advantage of being pure bandpass, which produces a better fit of the low-frequency part of the cell's response (Hawken and Parker, 1987; Young, 1993). From experimental data alone, it is not yet clear which of these competing models we should choose. Moreover, connections have been found among them: combinations of DOOGs can be almost identical to Gaussian derivatives (Young, 1985), and from the latter we can generate Gabor-like functions with a proper choice of parameters. Therefore, Gabor functions can be seen as an elegant simplification (very few fitting parameters providing good results) of more realistic models (requiring a larger number of parameters). Along this line, Hawken and Parker (1987) remarked that a subset of the RFs studied could be fitted with great accuracy using only Gabor functions, although their more sophisticated DOOG model could fit a wider range of RFs. In conclusion, although more sophisticated models can be more accurate, the simplicity of Gabor schemes makes them especially appropriate for computer modeling of image representations in the visual cortex.
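For reference, the three families of RF models compared in this subsection are easy to write down side by side. The sketch below simply evaluates 1D profiles with arbitrary parameters (our own choices, not fitted to any physiological data) and reports their DC response, the property emphasized in the text.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 601)            # arbitrary spatial axis (degrees)
dx = x[1] - x[0]

# Gabor: Gaussian envelope times a sinusoid (small but nonzero DC response)
gabor = np.exp(-x**2 / (2 * 0.6**2)) * np.cos(2 * np.pi * 0.8 * x)

# DOG: difference of two concentric Gaussians (centre-surround, no orientation)
dog = np.exp(-x**2 / (2 * 0.4**2)) - 0.6 * np.exp(-x**2 / (2 * 0.9**2))

# Gaussian derivative G_n (here n = 2): pure bandpass, exactly zero DC response
sigma = 0.6
g2 = (x**2 / sigma**4 - 1 / sigma**2) * np.exp(-x**2 / (2 * sigma**2))

for name, h in [("Gabor", gabor), ("DOG", dog), ("G_2", g2)]:
    dc = h.sum() * dx                      # integral ~ response at zero frequency
    print("%-5s  DC response = % .4e" % (name, dc))
```

The printout illustrates the point made above: the second Gaussian derivative integrates to zero, whereas the plain Gabor function retains a small residual DC component.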
C. Sampling in the Human Visual System

Here we come back to the number and distribution in the conjoint domain of the sampling units in the visual system (Kulikowski et al., 1982; Sakitt and Barlow, 1982). Perhaps one of the most striking illustrations of this
joint sampling was the modular arrangement of the neurons in the visual cortex (first reported in 1962 by Hubel and Wiesel). An electrode penetrating at a right angle encountered cells with the same orientation tuning (and receiving inputs from the same eye). Further studies with radioactive markers confirmed that the cortex was divided into modules (hypercolumns), each of them receiving inputs from a particular region of the visual field. Within each module, depending on the direction in which the electrode moves, one can find cells tuned to different spatial frequencies and orientations. This topography clearly shows the underlying joint representation: each module is the result of a particular sampling point in space and contains the sampling units of the frequency domain. We now review the main aspects of the distribution of these sampling units, in both frequency and space.

1. Frequency-Orientation

The frequency bandwidth is commonly measured in octaves, which suggests aspects of a typical wavelet or multiresolution sampling (Section II). In those representations, the units tuned to the higher frequencies have smaller RFs than those tuned to the lower frequencies. This wavelet type of representation has important advantages (Kulikowski et al., 1982). A regular sampling lattice (the one proposed originally by Gabor, as shown in Fig. 2a) would be better for the detection of simple gratings, but natural scenes tend to show a 1/f decay in their power spectra (Navarro et al., 1987). Consequently, a logarithmic sampling of frequency is optimal for natural images, because it produces a nearly constant response across frequencies (Field, 1987). Models including logarithmic sampling were first used to predict experimental results of contrast sensitivity with fairly good results (Campbell and Robson, 1968; MacLeod and Rosenfeld, 1974; Wilson and Bergen, 1979; Watson, 1983). Later on, Gabor schemes inspired by the visual system were proposed for specific applications such as texture analysis (Turner, 1986; Malik and Perona, 1990) and image coding and compression (Watson, 1987b; Daugman, 1988). We will review these applications in detail in the next sections. In almost all cases, these models consider only a limited number of frequency channels (typically four to six) with a constant bandwidth in octaves. For each radial frequency, there is a similar number of orientation channels, constituting a log-polar sampling of the frequency domain. Of course, the visual system does not exactly follow a simple wavelet scheme. For instance, cortical cells tuned to high frequencies show smaller bandwidths (in octaves) than those tuned to lower frequencies (DeValois
et al., 1982a). Therefore, simple cells do have different numbers of excitatory and inhibitory areas depending on their peak frequency and are not just scaled versions of one another. Furthermore, neurons with the same peak frequency may have RFs differing in size. In addition, there are cells tuned to almost a continuum of frequencies, as opposed to the typical four or five channels considered in the models. Finally, in most models the density of sampling units is adjusted to the bandwidth of the channels to permit a complete (Watson, 1987a) or quasicomplete (Navarro and Tabernero, 1991) representation. The visual system clearly differs from such an ideal situation, and most of the cortical units are tuned to intermediate frequencies (2-6 cycles/deg), at which we are most sensitive, instead of being tuned to the highest frequencies. This is a clear example of the nonrealistic simplifications introduced by the models in order to facilitate computer implementations.
2. Spatial Sampling

An important feature of the cortical representation is that the topographi-
cal organization of the original image is maintained (although greatly distorted) through many stages in the visual pathway. Considering cones and rods, the retinal sampling is fairly regular, but in photopic conditions only cones are active (rods are saturated), and their density decays steeply with eccentricity, even within the fovea (Curcio et al., 1990). At the next level, the ganglion cell density drops even more drastically (Curcio and Allen, 1990). Therefore, much more information (per area) is transmitted from the fovea than from the periphery to the cortex. Van Essen et al. (1984) showed that a further emphasis on central vision takes place in the cortex, with almost half of the cortical units being devoted to the fovea. Figure 19 (from Van Essen et al., 1984) shows the retinocortical mapping of an "average" striate cortex, i.e., the position of individual cells in the cortex labeled with the eccentricity in the visual field of their corresponding RFs. Figure 20 (from Kelly, 1990) illustrates this situation with a test image (a) that has been distorted and redrawn in (b) to show how it might be mapped in the visual cortex. In this case, it has been assumed that the observer is looking at a point between the two eyes of the man. The result is that most of the distorted figure corresponds to the head of that man. Mathematically, such a distortion corresponds to the log-polar sampling [log(r), \theta], which is a particular case of conformal mapping (Weiman and Chaikin, 1979; Braccini et al., 1982). In the fovea (and parafoveal region) this log-polar sampling approach is not valid anymore (as the map presents a discontinuity at the origin), and more accurate modeling is needed (Tabernero and Navarro, 1993a).
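The [log(r), \theta] mapping can be sketched in a few lines by resampling an image on a log-polar grid around the fixation point. The version below is our own nearest-neighbour illustration; modeling the foveal region properly requires the more careful treatment cited above.

```python
import numpy as np

def logpolar_resample(img, n_rings=64, n_wedges=128, r_min=2.0):
    """Nearest-neighbour resampling of `img` onto a [log(r), theta] grid
    centred on the image centre (the 'fixation point')."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    # ring radii grow exponentially: constant sampling step in log(r)
    radii = np.exp(np.linspace(np.log(r_min), np.log(r_max), n_rings))
    thetas = np.linspace(0, 2 * np.pi, n_wedges, endpoint=False)
    rr, tt = np.meshgrid(radii, thetas, indexing="ij")
    ys = np.clip(np.round(cy + rr * np.sin(tt)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(tt)).astype(int), 0, w - 1)
    return img[ys, xs]                     # shape (n_rings, n_wedges)

img = np.random.rand(256, 256)
cortical = logpolar_resample(img)
print("log-polar 'cortical' image:", cortical.shape)
```

Rows of the output correspond to eccentricity (in log units) and columns to polar angle, mimicking the distortion illustrated in Figs. 19 and 20.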
FIGURE 19. Localization of cells with RFs at different eccentricities in the macaque cortex, showing the mapping between the visual field and the cortex. From Van Essen, D. C., Newsome, W. T., and Maunsell, J. H. R., The visual field representation in striate cortex of the macaque monkey. Vision Res. 24, 429-448. Copyright 1984, reprinted with kind permission from Elsevier Science Ltd., The Boulevard, Langford Lane, Kidlington OX5 1GB, UK.
Contrary to the schemes studied in the preceding sections, where under- and oversampling were presented as opposed strategies, the spatial sampling in the visual system combines a redundant oversampling in the fovea with a drastic undersampling in the periphery. The log-polar spatial sampling (similar to the log-polar sampling of the frequency domain, but much finer) seems to be the solution chosen by the visual system as a trade-off to obtain simultaneously high resolution (in a limited area) and a wide visual field. This inhomogeneous scheme of representation may be a very useful strategy in artificial vision applications.
We would like to end this section by mentioning that both the shape of the RFs and their cortical organization could be the result (at least in part) of learning during the development of the visual system, because it is unlikely that the whole cell wiring could be predetermined genetically. Instead, the evidence tends to support the idea that an initial rough connectivity pattern refines to a more precise one through activity-dependent synaptic modification. This process can be altered if normal development is impaired (Stryker and Harris, 1986). Most of the proposed models use associative Hebbian learning techniques (Hebb, 1949). Synapses are strengthened if there exists a correlation between the activities of the connected neurons (Miller et al., 1989). Inhibition processes among adjacent cells would ensure that neighboring units would pick up different features of the input, thus leading to the organization of cortical modules. Sanger (1989) showed that these techniques, when applied to natural
FIGURE20. Graphical illustration of the “retinocortical” projection: the original image (a) has been distorted (b) according to the point of fixation, to represent the dedication of the cortex to different eccentricities. Reprinted by permission from Kelly, D. H. Retinocortical processing of spatial patterns. Proc. SPIE 1249, 90-117. Copyright 1990 SPIE.
FIGURE 21. Some receptive fields obtained through the combination of Hebbian learning selecting the principal features of the training images (low-passed noise) plus an anti-Hebbian process that prevents all the RFs from converging to a similar shape. A translation-invariant algorithm is used to ensure the similarity of the RFs within a channel. We can appreciate different orientations and frequencies in the resulting RFs.
images, generated RFs that closely resembled windowed sinusoids. Therefore, if a similar process took place in the visual cortex, that could explain the Gabor-like form of the receptive fields, including frequency and orientation selectivity, and even phase quadrature pairs (Yuille ef al., 1989; Ahumada and Tabernero, 1992). For a review of these models with the supporting evidence and limitations see Miller (1990). Figure 21 shows the organization and shape of receptive fields resulting from a neural network learned with these rules. Each panel corresponds to a channel; i.e., it contains a set of RFs, all of them roughly tuned to the same frequency, orientation, and parity but differing in their spatial position.
V. IMAGECODING, ENHANCEMENT, AND RECONSTRUCTION In this section we review several applications of Gaussian (and similar) wavelets to several problems of image processing: coding and compression, image enhancement, restoration from noise or blur degradations, or 3D reconstruction. We also mention some related applications such as image fusion or the development of image quality metrics that take into account the response of the human visual system. These measurements of image quality are crucial to evaluating the results provided by the techniques described in this section. A. Image Coding and Compression
This is an important classical problem, in which wavelets, vision-oriented models, and similar approaches have shown to be highly useful. Digital image compression techniques are essential for efficiently storing and transmitting information. Examples of areas of application are consumer
IMAGE REPRESENTATION WITH GABOR WAVELETS
51
imaging, medical, remote sensing, graphic arts, facsimile, high-definition
TV (HDTV), and teleconferencing. Compression is usually achieved by removing the redundancy inherent in natural images, which tend to show a high degree of correlation between neighbor pixels. In color images, there is an even larger correlation between different color planes. Finally, in moving sequences, it is common to find high redundancy between adjacent frames. The technological importance of image compression led to the development of standards, such as JBIG (Joint Bilevel Imaging Group) for bilevel images (halftones, documents, etc.), JPEG (Joint Photographic Expert Group) for still gray-level images (photographs, etc.), and MPEG (Moving Picture Expert Group) for sequential coding (digital video, etc.). Image compression consists of three steps: mapping (decomposition or transformation), quantization, and coding. The two last steps are common to most signal compression techniques. Quantization is always an irreversible process that must be done carefully to avoid reconstruction artifacts. Traditionally, quantifiers have been designed based on leastsquares error or similar criteria, but lately they also take into account the visual perception of the human observer (Watson, 1987b, 1993). According to Shannon’s information theory (Shannon, 1948), one can obtain higher compression by using vector quantization (VQ). VQ maps a sequence of vectors (subimages) to a sequence of indices according to a codebook, or library of reference vectors. The resulting sequence of indices is much more compact than the original vector, so it is stored instead of the original vectors. The design of the codebook is the key issue in VQ, and in particular Antonini et al. (1992) have proposed a multiresolution design. After quantization, the size of the final coding will depend on the entropy of the quantized signal. The Huffman code approaches the theoretical minimum bit/pixel ratio predicted by the entropy of the signal, and therefore it is widely used (Jain, 1989). Here we will focus on the initial mapping step, which is necessary for decorrelating the signal. The two main approaches are predictive coding and transform coding. A typical example of the first approach is differential pulse code modulation (DPCM) in which only the prediction error (difference between the test image and the value predicted from its neighbors) is quantized. In transform coding, decorrelation is accomplished by some reversible linear transform. For instance, we have already mentioned the decorrelating properties of wavelet, multiscale, or Gabor decomposition. Other examples include block transforms, e.g., the DCT used in the JPEG compression standard or the Karhunen-Lohe transform (KLT). Burt and Adelson (1983) designed the Laplacian pyramid (see Section II,C) as an efficient way to remove spatial correlation. This method
52
RAFAEL NAVARRO ET AL.
combines predictive and transform analysis and can be considered the origin of the actual multiresolution techniques. The prediction error at each scale is computed as the difference between the image and a low-pass version after applying the Gaussian filter. The Laplacian pyramid had scale (frequency) but not orientation selectivity. Further multiresolution schemes incorporated orientation selectivity (Watson, 198%). Related subband coding has been a very active field of research (Woods and O’Neil, 1986; Simoncelli and Adelson, 1990), and wavelet transforms have also been applied to image compression (Antonini et al., 1992; Devore et al., 1992; Lewis and Knowles, 1992). The Gabor expansion (see Section III,B) was first applied to image compression by Daugman (1988) and Porat and Zeevi (1988). It provides a very compact representation which notably reduces the entropy of the data. However, due to the lack of orthogonality of Gabor functions, obtaining the coefficients is computationally expensive (as discussed in Section 111). Further practical improvements of Daugman’s relaxation neural network (1988) have been introduced, such as the use of a cortical relaxation network (Pattison, 1992) or successive overrelaxation iterations and look-up table techniques (Wang and Yan, 1993). The high decorrelating properties and visual significance of the Gabor expansion are illustrated in Fig. 22 (unpublished material kindly provided by J. Daugman). Four attempts at reconstructing Lena (Fig. 13a) are shown, using an increasing number of coefficients (obtained through the Daugman’s relaxation network with log-polar sampling of the frequency domain) from a Gabor expansion, which weight their corresponding Gabor functions [Eq. (3511. The coefficients have been sorted according to their magnitude. The first attempt (upper left) was done with only 25 coefficients, which, of course, are too few. With only 100 coefficients (upper right) we start to appreciate some important features, such as the eyes, and with 500 coefficients (lower left) we can basically recognize the original image. Finally, 10,000 coefficients ( < 1/6 of the total number, 65,536) provide a reconstruction with enough visual quality. A simple implementation of the Gabor expansion has been applied to still image and video compression (Ebrahimi et al., 1990; Ebrahimi and Kunt, 1991). Wang and Yan (1992) have proposed a Gabor-DCT transform, using cosine elementary functions (instead of complex exponentials) with a Gaussian envelope. Because the computation of the Gabor-DCT is computationally expensive, they propose a sampling strategy similar to the DCT, in which only the coefficients to be coded are computed. The wavelet transform can be equivalent used for image compression. As in the example of Fig. 22, it is usual to sort the coefficients, discarding those with lower absolute (or energy) values (Anderson, 1992). Froment
IMAGE REPRESENTATION WITH GABOR WAVELETS
53
FIGURE 22. Partial reconstructions of Lena from a limited number of coefficients of its Gabor expansion (after sorting the coefficients according to their energy). From left to right and top to bottom, the reconstructions are obtained by using the first 25, 100,500, and 10,ooO coefficients out of 65,536. Courtesy John Daugman, unpublished.
and Mallat (1992) suggested coding only edge information, so that they reconstruct the image by combining multiscale edge information contained in the wavelet transform. As a first step, they considered only important edge points by thresholding edges (by length) at a previously determined scale. Then for the rest of scales they kept the wavelet coefficients at the same positions. The complete wavelet transform is obtained by alternative projection in an affine space (Mallat and Zhong, 1992). Figure 23 shows the reconstruction of Lena (left) obtained from the important edges found at the second level of the pyramid. The resulting image preserves most of its original edges, but texture has almost disappeared.
54
RAFAEL NAVARRO ET AL.
FIGURE23. Image reconstruction (left) after a wavelet edge detection and thresholded process (right). Reprinted by permission from Froment and Mallat (19921, “Wavelets: a tutorial in theory and applications,” Chui, C. K., Ed., Academic Press, San Diego, CA. Copyright 1992 Academic Press, Inc.
An interesting example of practical application is a new standard for a fingerprint database. The FBI has recently adopted a standard for (8-bit gray-scale) fingerprint image compression based on the use of a 64-band wavelet transform followed by an entropy coder. The compression rates are about 20:l (Bradley and Brislawn, 1993). Figure 24 compares the quality of the reconstruction for different compression schemes with the test image Lena (Hilton et al., 1994). The peak signal-to-noise ratio (PSNR; computed from the root-mean-square error of the reconstructed image) is plotted against compression rate. The PSNR decays with the rate of compression. For low compression ratios (less than 251) JPEG performs better, but for compression ratios above 30:1 the quality of the JPEG reconstruction rapidly deteriorates. These rms estimates do not consider visual criteria. More appropriate image quality metrics will be analyzed below. For instance, knowledge of the color perception mechanisms is important for developing visually efficient compression methods (Martinez-Uriegas et al., 1993). In fact, all perceptual aspects of spatial, color, and motion (or even stereo) information are important for image coding and compression (Watson, 1990).
B. Image Enhancement and Reconstruction Image capture always involves degradations, such as optical blur and detection noise. Most image forming processes can be described by the following expression (assuming an additive noise n and a space invariant
55
IMAGE REPRESENTATION WITH GABOR WAVELETS Comparison of Wavelet Compression Methods
42
I
I
I
I
I
I
I
1
40
38 36 34 32 30
28 26 24
+
22 20
8
I
1
16 32 64 128 Compression Ratio FIGURE24. Comparison of the quality of the reconstructions of Lena (512 X 512; 8 bpp) by different wavelets and JPEG. The Y axis represents the peak signal-to-noise ratio, in dB, given by PSNR = 2Olog1,(255/RMSE). The graph shows the influence of the particular wavelet and the encoding technique chosen. The zero-tree coder perform best and the biorthogonal basis performs better than Daubechies' W6. VLC, variable-length coder; FLC, fixed-length coder. Reprinted by permission from Hilton et al. Compressing still and moving images with wavelets. Multimedia Syst. 2, 218-277. Copyright 1994 Springer-Verlag, Berlin. 4
impulse response HI: i ( X , Y ) =H ( x , y ) @ .(x,y) + n ( x , y ) (39) where i ( x , y ) is the image and o ( x , y ) is the original object. The goal of image enhancement is to improve the image to facilitate feature detection, recognition, or analysis by human observers and lately also by artificial vision systems. Typical enhancement techniques are noise reduction, histogram stretching, contrast and edge enhancement, pseudocoloring, etc. The term restoration usually refers to an a posteriori compensation from known degradations, i.e., recovering the original object o ( x , y ) from the observed image i ( x , y ) . Nevertheless, a "blind" restoration may be possible under certain assumptions, with only generic knowledge of the nature of the degradation (Stockman et al., 1975; Navarro et al., 1987; Cristobal and Navarro, 1994a). Lindenbaum et al. (1994) have pointed out that Gabor was the first to propose an image deblurring method based on a directionsensitive filtering (Gabor, 1965). Traditionally, image enhancement and restoration (GonzGlez, 1986) used global space-invariant operations, in either the domain space (e.g., histogram stretching) or frequency domain (e.g., attenuating low frequencies). However, like most natural signals,
56
RAFAEL NAVARRO ET AL.
degradations tend to be nonstationary, i.e., spatially variant (Fahnestock and Schowenderdt, 19831, thus requiring a local description in both domains. Peli (1987) has proposed a method for adaptive enhancement using polar-separable fan filters (similar to those shown in Fig. 6b); Freeman and Adelson (1991) have applied steerable filters, and Toet (1992) used a multiresolution Gaussian pyramid. Figure 25 is an example (from Freeman and Adelson, 1991) in which an input angiogram (a) is enhanced by first filtering with a Gaussian derivative and then dividing the result by a local average of its absolute value. 1. Denoising The traditional approach to remove noise is based on simple low-pass filtering or neighbor averaging. Wavelet representations permit more efficient noise removal while preserving high frequencies. Mallat and Hwang (1992) developed a method for noise removal based on analyzing the evolution of maxima of the wavelet transform across scales (similarly to the edge coding by Mallat and Zhong, 1992). This method suppresses a large portion of noise but also removes most of the texture in the image (as in Fig. 23). By applying nonlinear soft thresholding to the coefficients of the wavelet transform, Donoho (1995) introduced a nearly optimal method for removing additive Gaussian white noise that is also applicable to a variety of noises (Lang et al., 1995). This method is based on the fact that in a wavelet transform most of the energy is concentrated in a limited number of coefficients. He has shown that the following thresholding:
is the optimal nonlinear transformation to obtain a smooth estimate of the signal. The threshold 6 is estimated from the observed image. Figure 5 illustrates the effect of thresholding the coefficients of a wavelet transform. The use of critical sampling, which may be very convenient in image compression, has important drawbacks in image enhancement or denoising applications (instability, ringing, or resolution loss). Using redundant transforms, it is possible to avoid artifacts (Coifman and Donoho, 1995; Lang et af., 1995). Based on the redundant, quasicomplete Gabor representation (Navarro and Tabernero, 1991) described in Section III,C, Cristobal and Navarro (1994a, b) have developed a blind restoration method for noise removal. This method is based on locally estimating the departure of the test (corrupted) image from an ideal prototype (uncorrupted). If the degradation consists of additive noise (Brownian fractal), then they estimate (locally) its magnitude at each pyramid level and
FIGURE25. (a) Digital cardiac angiogram; (b) after local contrast enhancement through filtering along the dominant orientation with the second derivative of a Gaussian; (c) isotropic bandpass filtering from (a). Reprinted by permission from Freeman et al. The design and use of steerable filters for image analysis, enhancement, and wavelet representation. IEEE Trans. Putt. Anal. Mach. Intell. 13, 891-906. Copyright 1991 IEEE. The original authors acknowledge P. Granfors of G. E. Medical Systems for providing the digital angiogram.
58
RAFAEL NAVARRO ET AL.
subtract it so that the resulting Gabor magnitude corresponds to that of the image prototype. The idea of an image prototype is based on the fact that natural scenes tend to show a common pattern of decay in their amplitude spectra ( - l/f), so we can expect our test image to behave similarly. This idea was previously applied to blind deconvolution of astronomical images (Navarro et al., 19871, as well as to defining image quality metrics (Nil1 and Bouzas, 1992). Figure 26 illustrates the noise removal application: a digital 512 X 512 aerial image (a) is artificially corrupted (b) by adding a large amount of computer-generated Brownian fractal noise (SNR = 1:lO). The result of denoising has been further enhanced by conventional histogram stretching (c). This method preserves information about edges and texture but has a high-pass effect, which causes some edge enhancement that may facilitate object recognition, at the cost of losing the original photometric information. Similar procedures, using an image prototype and the multiscale Gabor representation, have been developed and successfully applied to image restoration from spacevariant blur (defocus) and homomorphic enhancement (Cristobal and Navarro, 1994b). 2. Image Fusion
The goal of image fusion is to integrate information coming from different sensors into a single image (Ehlers, 1991). In the particular case of producing a single monocrome image from two images coming from different spectral ranges, multiresolution subband merging methods (Toet, 1992) provide very good results. Figure 27 shows results of image fusion by the method of Santamaria and G6mez (1993), based on a multiscale Gabor representation (Navarro and Tabernero, 1991). The two input images (upper panels) are two nuclear magnetic resonance images of a coronal slice of a human head. The left image was taken to see the bone structures having conspicuous details to be used as reference landmarks by the surgeons. The image on the right shows soft structures (brain, etc.) with a tumor corresponding to the bright spot above the ear. These two images are decomposed into a Gabor pyramid (see Fig. 15). Then a single reconstruction (lower panel) is obtained by choosing, for each pair of Gabor samples (left and right), that with the higher energy. The reconstruction shows all the significant visual information contained in the two input images, and in this way the surgeon can better estimate the exact position of the tumor. This example corresponds to a real and successful surgery that was carried out in the spring of 1995 at the “Gregorio Marafi6n” Hospital, Madrid, Spain.
IMAGE REPRESENTATION WITH GABOR WAVELETS
59
FIGURE 26. (a) Aerial image of a “road” intersection (512 X 512; 8 bpp); (b) simulated “foggv” test image obtained by a linear combination of (a) and computer-generated fractal noise (S/N= 1:lO); (c) enhanced image after multiscale decomposition. Adapted from Crist6bal and Navarro. Blind and adaptative image restoration in the framework of a multiscale Gabor representation. IEEE Symposium on Time-Frequency Time-Scale Analysis, Philadelphia, pp. 306-309. Copyright 1994 IEEE.
Lately, a more general framework for image fusion has been developed based on a QMF wavelet transform, including several metrics for the evaluation of the results provided by different fusion methods (Li et al., 1995). 3. 3 0 Image Reconstruction
3D image reconstruction, such as computed tomography, involves a huge computational cost. The image forming process can be described by an
60
RAFAEL NAVARRO ET AL.
FIGURE27. Image fusion of two nuclear magnetic resonance images of a coronal slice of a human head (upper panels). The left image mainly shows bone structures with conspicuous details used as reference landmarks by the surgeons; the right image shows inner soft structures (brain, etc.) with a tumor corresponding to the bright spot above the ear. The fused image, obtained from a multiscale Gabor decomposition, contains all the relevant infonnation from both images, which was very helpful for successful surgery. Courtesy J. Santamarfa and M. T. Gbmez, SENER, Ingenieria y Sistemas, Madrid, Spain.
expression similar to Eq. (391, in which each projection p is
p=Ro+n (41) where R is the Radon transform of the 3D object o and n is the noise. The object o can be reconstructed from a number of projections if their Fourier transform reasonably samples the 3D Fourier domain. Nevertheless, reconstructing a 3D signal from a finite set of 2D projections is an inverse ill-posed problem, requiring regularized iterative algorithms. Solving this problem in a multiscale (Bhatia et af., 1993) or wavelet domain (Peyrin et af., 1993; Olson and DeStefano, 1994), presents important advantages, including computational efficiency. Thresholding the sparse wavelet coefficients causes a great majority of coefficients to be zero, which is a very important advantage when dealing with 3D signals (BlancFeraud et af., 1994).
IMAGE REPRESENTATION WITH GABOR WAVELETS
61
4. Image Quality Metrics
Evaluation of the techniques of image compression, enhancement, etc., requires the development of tools for measuring image quality. This an open issue that still lacks a satisfactory answer. The traditional engineering approach considers the root-mean square (RMS) error (or mean squared difference) between an original undergraded image and the resulting image (decompressed, enhanced, reconstructed, etc.). However, this and similar parameters do not consider that, in most applications, the final step is visual perception by a human observer. Consequently, image quality metrics should consider the observer’s visual response. Watson (198%) applied visual criteria for image compression using the cortex transform. Peli (1990) defined local contrast as the ratio of the bandpass filtered image to the low-pass filtered version that contains all the energy below that band. Nil1 and Bouzas (1992) derived an image quality parameter by weighting the power spectra of the image by the human contrast sensitivity. Teo and Heeger (1994) proposed a method also based on empirical data about human vision. They apply a visual model consisting of (1) a bank of filters tuned to different orientations and frequencies, (2) normalization of the responses, and (3) a detection mechanism simulated by a simple squared error norm. Figure 28 illustrates that the standard RMS error is often not a good measure of visual image quality. The two images in the central row have identical RMS error, but they suffer dissimilar degradation, producing different visual quality. The bottom row shows the errors: difference between the degraded image and the original in the upper panel. Perceptual aspects of image coding, as well as the measurement of visual quality, are extensively discussed in a recent collective book (edited by Watson, 1993).
VI.
IMAGEANALYSIS AND MACHINE VISION
We review in this section some relevant applications of Gabor functions, and similar wavelets, to image analysis and machine vision. There is a very extensive and continuously increasing literature in this field that we can hardly embrace here. For this reason, we will focus our review on a few tasks, typical in low to middle-level vision. We will not include here higher level analysis (such as shape from texture gradients and shape from motion), even though this is a very active field of research (see, for instance, Haralick and Shapiro, 1992, 1993; for a review). There are several reasons why Gaussian wavelets (GWs) are so useful for machine vision (in the term GW we will include wavelets having a
R G U R28. ~ Two examples of distorted images. Central row: the left panel shows a test image with PSNR = 24.9 dB and the right panel a test image with PSNR = 24.3 dB. The original undistorted image is shown on top. Both distorted test images show nearly identical PSNRs, whereas the perceptual image quality measures are 0.69 and 1.78, respectively. The lower row displays the perceptual distortion measures corresponding to the test images. Reprinted by permission from Teo and Heeger. Perceptual image distortion. Roc. SHE Human Vlwn Digital Display 2179,127-141. Copyright 1994 SPIE.
IMAGE REPRESENTATION WITH GABOR WAVELETS
63
Gaussian envelope, such as Gabor functions and Gaussian derivatives). As mentioned in the Introduction, GWs are multipurpose. Although different authors propose a variety of filtering schemes (which are similar in general), Gabor functions perform well in most applications, providing in many cases nearly optimal results. Keeping this in mind, we will include here different applications no matter which particular filter or basis function was used by the author, as in most cases changing the original basis by a GW (or vice versa) will not change the essential facts. The great success of GWs in computer vision applications is not surprising, if we remember that they are a good model for image representation in the visual cortex (Section IV). Biological visual systems have to be multipurpose. They need to perform a wide variety of tasks, most of the time in parallel, while at the same time having to optimally exploit limited resources. The extraordinarily good performance achieved by our visual system suggests that emulating biology could be a good engineering strategy in designing machine vision applications. In the following, we describe applications to edge extraction, texture description, and motion and stereo analysis, as well as a few practical pattern recognition applications. A. Edge Detection
Extracting the edges of an image is one of the earliest and most paradigmatic approaches in image understanding and computer vision. Abrupt changes of the gray level in images are both relatively easy to detect and visually meaningful, as they usually correspond to the edges of the objects in the scene. Many authors have considered edge detection as a first stage before accomplishing further analysis (shape, texture, stereo, or motion). Marr’s primal sketch (Marr, 1982) is probably the most representative example of this kind of approach. We can briefly summarize a rather extensive literature on edge detection with a couple of basic ideas. The straightforward way to detect edges is by applying derivative operators: a sharp edge will present a local maximum in its first derivative and a zero crossing in its second derivative. However, due to the instability inherent in numerical differentiation, it is usually necessary to smooth the image beforehand, typically by convolution with a Gaussian. But the derivative of a convolution product is equivalent to the convolution of the function with the derivative of the kernel. Consequently, efficient edge detection can be done through a convolution with Gaussian derivatives. Marr and Hildreth (1980) proposed the Laplacian of a Gaussian operator, the so-called Mexican hat filter. Later, Canny (1986) showed that the first derivative of a
64
RAFAEL NAVARRO ET AL.
Gaussian is close to the optimal edge detector, simultaneously meeting the two basic requirements of good localization and high sensitivity. These two, first and second Gaussian derivative, operators are probably the most widely accepted basic approaches to edge detection. Although further sophistication may be introduced to avoid false alarms, such as introducing cellular automata, neural networks, etc., these are beyond the scope of this review. It turns out that wavelets constitute very interesting edge detectors. Gaussian derivatives are a particular family of GWs commonly used in image representation and visual modeling (see Sections I1 and IV). Interestingly, first and second Gaussian derivative edge detectors have odd and even symmetry, respectively, such as the basis functions used in many wavelet schemes. This reminds us of the arrangement found in the visual cortex, with couples of neurons in phase quadrature (Pollen and Ronner, 1983). In fact, by combining both the even and odd filters (zero crossing and maxima), one can increase the robustness of the edge detection. This is one in a series of examples in which GWs are involved in the solution of an image analysis problem. Although Gaussian derivatives are the optimal edge detectors, other similar wavelets can also provide good performance, which is close to optimal in many cases. In particular, Gabor functions provide excellent results (Mehrotra et al., 1992). Figure 29 (from Mehrotra et al., 1992) shows an illustrative example. The efficiency achieved here by an odd Gabor filter is not far from that provided by the Canny edge detector. Marr (1982) hypothesized that further visual analysis (shape, texture, stereo, motion) would be based on processing edges, i.e., on the primal sketch. However, Daugman (1987) pointed out that the computation of Laplacian zero-crossings (primal sketch) is not necessary for most image analysis tasks. Furthermore, if we consider only zero crossings, we are discarding a great deal of useful information. Hummel and Marriot (1989) pointed out that reconstructing a signal from zero crossings in a scale space is possible but unstable (the reconstruction was stable when the gradient data along the zero crossing was also included.) The following paragraphs illustrate how current approaches try to use all the information available in the image.
B. Texture Analysis Texture analysis has been one of the most active field of application of GWs. Most natural images are composed of patches of textures that we perceive as uniform within each zone. Although there is not yet a unique
IMAGE REPRESENTATION WITH GABOR WAVELETS
65
FIGURE29. Example of edge detection with Gabor functions. (a) Input image; (b) absolute response of a Gabor odd filter; (c) all the local peaks in the response without thresholding; (d) edges detected after appropriately thresholding the responses shown in (b). From Mehrotra et al. Gabor filter-based edge detection. Pattern Recogn. 25, 1479-1494. Copyright 1992, reprinted with kind permission from Elsevier Science Ltd., The Boulevard, Langford Lane, Kidlington OX5 IGB, UK.
satisfactory definition, textures are made of a set of local structural elements that are repeated somehow across the texture, producing a sensation of uniformity. Consequently, texture analysis requires combining local analysis for the basic textural elements and then a global analysis looking for uniformity within a texture and discriminating between different textures. The joint localization of GWs makes them a very interesting tool for a local description of textures. Current models of texture perception (Sutter et af., 1989; Malik and Perona, 1990) use Gaussians or similar
66
RAFAEL NAVARRO ET AL.
wavelets, which basically extract the local second-order statistics, such as autocorrelation or power spectrum [although other structural models have been proposed (Julesz et al., 197311. In a multiresolution approach, texture segmentation typically consists of several basic stages (Clark and Bovik, 1989; Turner, 1986; Fogel and Sagi, 1989; Porat and Zeevi, 1989; Bovik et al., 1990; Malik and Perona, 1990; Tabernero and Navarro, 1990, 1993b; Jain and Farrokhinia, 1991; Dunn and Higgins, 1994): 0
0
Application of a bank of Gabor (or similar) filters tuned to different frequencies and orientations, with even and odd parities [although Malik and Perona (1990) used only even filters]. One or more nonlinearities (energy, modulus, halfwave rectification, sigmoids, etc.). Spatial local average of the responses. Further optional processing (lateral interactions, opponency, thresholding, etc.). The final result is a set of local descriptors to be used in texture discrimination, segmentation, classification, or even learning (Greenspan et al., 1994). In most of the referenced models the phase is ignored, even though it has been pointed out that Gabor phase may be important in texture discrimination (du Buf, 1990; du Buf and Heitkamper, 1991).
To illustrate the performance achieved by GWs with practical examples we will use here the particular model described in Section III,C, in a simple, but efficient approach, directly using the Gabor filter responses as texture descriptors (Tabernero and Navarro, 1990, 1993b; Navarro et al., 1995). This is quasicomplete representation, and therefore we can use this texture description for either analysis, coding, or synthesis. Figure 30a shows a texture (wood) and the mean descriptor matrix with the responses (averaged over all pixels of the texture) of the four frequencies (rows) by four orientations (columns) Gabor filters. The elements of the matrix correspond to the moduli of the complex responses. Here, these responses have been normalized to the DC of the image (dividing by the low-pass frequency channel), so that the result is a contrast measure, independent of the local gray-level values on the image. In this particular example, the directionality of this texture is clearly visible in the first column of the matrix (bold), showing much higher values. This feature matrix can be used for segmentation and classification. Figure 31 shows two examples of image segmentation using a classical k-means clustering algorithm. In the upper row, the test image (left) consists of two textures, wood and cotton canvas; in the lower row the textures are sea and sand. The direct result of clustering is displayed in the
IMAGE REPRESENTATION WITH GABOR WAVELETS
67
a
FIGURE30. Texture “wood” (a) and its mean (averaged over all pixels) set of Gabor contrast descriptors (b).
FIGURE31. Two examples of image segmentation through the Gabor contrast descriptors. In the upper row the test image contains the textures “wood” and “cotton canvas.” The textures in the lower row are “sea” and “sand.” In both cases, the center panel shows the direct result of a clustering (k-means) algorithm. In the right panels a mode filtering has reduced errors.
68
RAFAEL NAVARRO ET AL.
middle column. The panel on the right corresponds to simple postprocessing consisting of mode filtering to eliminate isolated errors. Bayesian classification is a higher level task in which each sample must be assigned to one of a set of possible classes, requiring previous training with known textures. In the examples shown in Fig. 32, the task was to directly assign every pixel of the test image (containing one, two, or four textures) to one of the four possible classes. In all three cases, more than 85% of the pixels were correctly classified (this is far above the chance level for a random classification between four alternatives: 25%). After a mode filtering, the result is close to 100% (lower row). There is only a place (see middle column in the figure) in the boundary between wood and cotton canvas where a piece of cotton was missclassified as sand. Going back to Fig. 30, a 45" rotation of the texture will produce a shift of columns in the matrix. Similarly, stretching or shrinking the texture by a factor of 2 will cause a change of scale, shifting the rows up or down, respectively. We can take advantage of these properties by using this feature matrix for invariant texture classification under changes of scale and orientation. Nevertheless, large changes of scale may change the textural appearance of the image, unless the texture is a fractal (Tabernero and Navarro, 1990).
FIGURE32. Bayesian classification (lower row)of the pixels of three test images containing one, two, and four textures (upper row) from a set of four possible textures, using the Gabor contrast descriptors. Isolated errors (rnisclassified pixels) have been eliminated by mode filtering.
IMAGE REPRESENTATION WITH GABOR WAVELETS
69
1. Fractal Dimension The ideas of texture, roughness, and fractal dimension are related. The problem of estimating the fractal dimension of an image can be considered as a particular case within texture analysis. Textures are not fractals in general, but some textures look like fractals. Figure 33 shows samples of two-dimensional Brownian fractals generated by a stochastic process, which are characterized by having a (l/f)Y power spectrum, with y being linearly related to the fractal dimension. Among the different methods proposed to estimate the fractal dimension, Heeger and Pentland (1986) used Gabor functions to test for a fractal scale-invariant regularity across space and time in turbulent motion. Subsequently, other authors have developed methods to measure the fractal dimension in 2D images with Gabor functions (Tabernero and Navarro, 1990; Super and Bovik, 19911, or even with optical implementations of the Mexican hat filter (Freysz et al., 1990). Following our model, if we compute the Gabor feature matrix (as in Fig. 30b) of a Brownian fractal, then the four values in each column will lie on a straight line when displayed against the corresponding peak frequency of the channel in a log-log plot: log H = k - y log f. For isotropic fractals, we can just average the four columns. The slope of this straight line is linearly related to the fractal dimension. Figure 33 shows four computergenerated (isotropic) Brownian fractals with increasing fractal dimension: 2.1, 2.3, 2.6, and 2.9, respectively. Figure 34 displays the result of a
FIGURE33. Computer-generated Brownian fractals with increasing fractal dimension: 2.1, 2.3, 2.6, and 2.9, from left to right and top to bottom, respectively.
70
RAFAFiL NAVARRO ET AL. 0.7
0.6 0.5 0.4
B
2 0.3 v)
0.2
0.1
0 -0.1
2
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2.9
3
Fractal Dimension FIGURE34. Linear relationship between the fractal dimension D and the slope m. The slope is obtained by least-squares fitting of the 4 X 4 Gabor contrast descriptors as a function of their corresponding peak frequency on a log-log scale. This result has been empirically obtained with a set of computer-generated fractal textures such as those shown in Fig. 33.
computer simulation, demonstrating empirically the linear relationship between the fractal dimension D and the slope m. A least-squares fit ( r = 0.94) gives the following result: D = 2.88 - 1.19m. This equation permits estimating the fractal dimension D from the slope m, obtain by fitting the data extracted from the image. 2. Texture Synthesis
As we said before, Gabor functions permit a quasicomplete representation, so that we can invert the above procedure to synthesize textures. Porat and Zeevi (1989) created simple synthetic textures combining Gabor filters. More recently Navarro et al. (1995) have implemented a new procedure to code pure textures with a fixed number of only 69 parameters. From this feature set, we can produce a realistic synthetic copy with
IMAGE REPRESENTATION WITH GABOR WAVELETS
71
the same visual appearance as the original. The 69 parameters code the average response of the Gabor channels (3 parameters: energy and effective bandwidths, along the’two axes, per channel), the low-pass residual (5 parameters), and the gray-level histogram of the original image (subsampled to 16 parameters). The synthesis process consists of generating a series of 16 random noises that are filtered with the corresponding Gabor functions and appropriately merged to obtain the synthetic texture. The last stage is to impose the desired histogram. Figure 35 shows two characteristic results with the textures “wood” (upper) and “text” (lower).
FIOURE 35. Two examples of texture synthesis: “wood” (upper) and “text” (lower). The original image is on the left, and the synthetic texture is on the right.
72
RAFAEL NAVARRO ET AL.
In both cases the original texture is on the left and the synthetic texture on the right. There is a high visual resemblance between the original and synthetic textures in the first example (wood). However, when trying to synthesize a piece of text from a stochastic process, the results looks like ancient deteriorated writing (with a sort of funny pixelated characters). This approach works well with low-structured textures but fails with highly structured plus disordered textures. C. Motion Analysis
Biological vision systems are highly specialized in motion detection and analysis because of its vital relevance in navigation, hunting, defense, etc. Motion perception and analysis are very important goals of artificial vision, although biological systems are still far beyond in performance and capabilities. Current models of motion perception (Adelson and Bergen, 1985; Watson and Ahumada, 1985; Chapters 1, 8, and 16 in Landay and Movshon, 1991) are based on spatiotemporal3D ( x , y, t ) filtering by Gabor or similar wavelets. If we consider a spatiotemporal separable model: g ( x , y ) h ( t ) , then the impulse response of the temporal filter h ( t ) is not a symmetric Gabor function. The biological response has some delay, typical of a causal filter; Le., causal filters do not affect past frames. The temporal impulse response, h(t), is somehow similar in shape to a first derivative operator (or odd part of a Gabor function), but apart from the delay, it is not symmetric (Watson and Ahumada, 1985). These filters sample the 3D frequency space and selectively respond to a limited range of velocities (velocity tuning). A higher level stage in motion analysis is to extract the field of velocities in the image, the optical flow. This is a difficult problem in general, and many different methods and practical implementations have been proposed since the first optical flow algorithm by Horn and Schunk (1981). There are two main approaches: those based on the original differential method (Horn and Schunk, 1981) and those based on 3D frequency analysis through Gaussian or similar 3D wavelets (Heeger, 1987). Nevertheless, these two schemes turn out to be equivalent (Jahne, 1991). The filtering approach was motivated by visual models and is illustrated in Fig. 36. The diagram shows the time course of a 1D bar object (its length corresponds to that of the shadowed area) moving with constant velocity (given by the l/slope). An odd symmetric filter tuned to that velocity (i.e., parallel to the slope) and placed at the edge of the object will give a maximum response to this particular motion. Therefore a set of these filters, covering the conjoint 2D space-time/frequency domain, as shown in Fig. 37 (from Heeger, 19871, will also code velocity. This figure
IMAGE REPRESENTATION WITH GABOR WAVELETS
73
FIGURE36. Motion detection by an odd-symmetricfilter placed at the edge of a moving bar (shadowed). It produces maximum response when it has the same orientation in the space-time domain as that corresponding to the velocity of the bar.
can be realized as one of the possible 3D generalizations of the 2D sampling of the Fourier domain depicted in Fig. 14b. In Fig. 37 and 3D log-polar sampling is generalized to a cylindrical sampling (log f,8, o),i.e., including linear sampling along the temporal frequency, o, axis. The sampling units are 3D Gaussians (Gabor functions). Later, other models used a sampling based on spherical instead of cylindrical coordinates (Simoncelli, 1993). In the differential approach the gray-level distribution of a moving image is considered as a fluid. Then we can intuitively think that, to a first approximation, from one instant to the next, there are differential motions but not global changes; i.e., its global differential dI(x,y,t) = 0 (this is equivalent to considering only the first term in a Taylor series expansion):
This equation relates the spatiotemporal gradient of the image with the velocity (u,,uy). It is a single equation with two unknowns, and thus we need additional constraints for solving it. A possible constraint is local smoothness in space (Horn and Schunk, 1981) or across scales (Battiti et al., 1991). Another possibility is to apply a GW decomposition in such a way that we have an equation, similar to Eq. (421, for each filtered version of the image, which yields an overdetermined system of equations (Weber
74
RAFAEL NAVARRO ET AL.
FIGURE37. ’helve motion-sensitive Gabor energy filters. They are positioned in pairs on a cylinder in the spatiotemporal-frequency domain (temporal-frequency axis pointing up). The plane represents the power spectrum of a translating texture; the tilt depends on velocity. From Heeger. Model for the extraction of image flow. 1. Opt. SOC. Am. A 4, 1455-1471. Copyright 1987 Optical Society of America. Reprinted by permission.
and Malik, 1995). This approach combines both the filtering and differential methods. Simoncelli (1993) has developed an elegant algorithm based on a generalized formulation (J%hne, 1991). Figure 38 (from Heeger, 1987) is an example corresponding to the computer-generated sequence “flight through Yosemite valley.” It shows one frame of the sequence (a), the actual optical flow field (b), the optical flow obtained with Heeger’s algorithm (c), and the error (d), i.e., the difference between (b) and (c). As we can see, optical flow algorithms produce noisy results (due to the aperture problem, failure of the differential approach, etc.).
D. Stereo In stereo vision we use the displacements (disparities) between points corresponding to the same object produced by the perspective differences between the stereo pair of images. There are many similarities between
IMAGE REPRESENTATION WITH GABOR WAVELETS
75
finding corresponding points and disparities in stereo and optical flow computations. The main difference is that in the former, displacements are produced simultaneously (instead of sequentially). In addition, stereo displacements are one-dimensional, restricted to the direction passing through the center of the two eyes (although in the human eye, the spherical projection also causes small vertical disparities). Another difference is that in optical flow extraction, we can use information from a continuous sequence, whereas in stereo we are always restricted to two simultaneous frames. In principle, it is possible to adapt optical flow algorithms to this particular problem. However, computational models for stereo vision have historically appeared before motion models, as a matching problem. Among the different models, Marr and Poggio (1976, 1979) used a Mexican hat (Laplacian of a Gaussian) wavelet to find correspondences and developed a very efficient cooperative algorithm. Figure 39 shows the evolution of their iterative algorithm. The algorithm reconstructs depth from a random-dot stereogram test (at the top). The different shades of gray represent different disparity values and hence different depths. Lately, wavelet transforms have also been applied to disparity analysis (Djamdji and Bijaoui, 1995). Apart from these general problems, GWs are being applied to an increasing number of specific practical problems in pattern recognition. They range from localization of the address block in envelopes (Jain and Bhattacharjee, 1992) to character recognition (Shustorovich, 19941, face recognition (Lourens et al., 1994; Gross and Koch, 1995), or even real-time personal identification based on iris analysis (Daugman, 1993; Daugman and Downing, 1995).
VII. CONCLUSION Conjoint representations, such as wavelet and multiscale representations, have produced a real breakthrough in both understanding the nature of the image representation in the visual cortex and opening the way to solve many problems in image analysis and low-level machine vision applications. Among the different representations, those based on Gabor functions have some advantages: optimal localization in both (space and frequency) domains, robustness, and the complete duality of the representation in both domains. In particular, linear dependent (oversampled or redundant) representations are quite robust to partial losses of information. The main drawback of Gabor functions is the lack of orthogonality, which makes the exact computation of the coefficients difficult. Relaxing
FIGURE38. Illustrative example that corresponds to a computer-generated realistic image sequence "flight through Yosemite valley." It displays one frame of the sequence (a), the actual known flow field (b), the optical flow obtained with Heeger's algorithm (c), and the
.. ................
d
~
-
.
.
,
L
-
C
-
C
~
C
*
.
.
.
.
.
,
-
~
C
-
C
-
.
.
,
-
. . . . . . . . . . . . . . . . . . ............ .. .. .. .. ., .. .. .. .*.....-. -. .. .. -. .- .. .. .. .. .. .. .. . . . - - , ......... . . . ----. . . . . . . . . '
.
.
.
.
.
.
C
C
4
C
,
,
C
.
.................... . .
-
C
,
C
.
C
C
c-
I - -
C
C
.
.
.
.
.
.
.
.
.
.
c
*
-
C
c
C
..
C
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ...............................
............................... ........................ .................... ...........................
~
L
-
-
r
.
-
-
.
.
I
/
P
/
/
l
/
~
l
.
.
j
.
.
.
t
.
~
#
#
.
.
>
I
.
,
.
.
.
.
.
.
,
.
.
,
.
.
.
.
.
,
,
#
*
.
*
,
.
~
f
f
.
.
.
.
.
.
.
.
.
,
.
.
.
.
.
.
-
.
//lit?...............,. 8
0
1
1
r
*
.
.
.
.
.
.
. . . . . . . .
........................ ......................
*
.* .*... .. .. . ( . ~ . . ( . l . . . . . l. .. . . . . . . . . .-.... .....*. .. \ \\ (\ \\ \. .~ (. .. .. .. -. .. .. ... .. . . ... .. .. .. .. .. .. ......
,
C
l
f
t
.
(
.
t
.
.
I
.
.
\
.
\
.
\
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
_
_
.
.
1
error (d): difference between (b) and (c). From Heeger. Model for the extraction of image flow. J . Opt. SOC. Am. A 4, 1455-1471. Copyright 1987 Optical Society of America. Reprinted by permission.
78
RAFAEL NAVARRO ET AL.
FIGURE39. Decoding of a random-dot stereogram by a cooperative algorithm. The stereogram appears at the top. The algorithm gradually reveals the structure through a few iterations: 0, 1, 2, 3, 4, 5, 6, 8, and 14. The different shades of gray represent different disparity values. From Marr. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Freeman, New York. Copyright 1982 W. H. Freeman and Company. Used with permission of W.H. Freeman and Co.
some requirements (completeness, orthogonality) permits us to design quasicomplete redundant schemes which may constitute an interesting alternative for real-time implementations. There is a number of open issues that will certainly forward new research in the field. Apart from the choice of the basis functions and the sampling strategies (log-polar, Cartesian, etc.), we want to remark on two of them here. On the one hand there
IMAGE REPRESENTATION WITH GABOR WAVELETS
79
is still a lack of experimental data about important aspects of the visual representation. For instance, the finding of couples of neurons in phase quadrature (Pollen and Ronner, 1983) still lacks further confirmation. On the other hand, we still need to develop widely accepted metrics and standards to evaluate objectively the quality of images and representations. These metrics must consider both how the visual system will respond to the image under test and which is the visually relevant information contained in that image.
ACKNOWLEDGMENTS This work has been partially supported by the Spanish CICYT under grant TIC94-0849. We especially thank Dr. John Daugman for a critical revision of the manuscript and Oscar Nestares and Javier Portilla for their kind collaboration in preparing several figures and graphics.
REFERENCES Adelson, E. H.,and Bergen, J. R. (1985). J. Opt. SOC. Am. A 2, 284-299. Ahumada, A., and Tabernero, A. (1992). OSA Annual Meeting Technical Digest 23, 130-131. Akansu, A. N., and Haddad, R. A. (1992). “Multiresolution Signal Decomposition.” Academic Press, Boston. Anderson, P. (1992). Wavelet transforms and image compression. MsSci. thesis, Chalmers University of Technology, Goteborg, Sweden. Antonini, M., Barlaud, M., Mathieu, P., and Daubechies, I. (1992). IEEE Trans. Image Process. 1, 205-220. Bargmann, V . , Buttera, P., Girardello, L., and Klauder, J. R. (1971). Rep. Math. Phys. 2, 221 -228. Bartelt, H. O., Brenner, K. H., and Lohmann, A. W. (1980). Opt. Comm. 32, 32-38. Bastiaans, M. J. (1981). Opt. Engineer. 20, 594-598. Bastiaans, M. J. (1982). Opt. Acra 29, 1349-1357. Bastiaans, M. J. (1985). IEEE Trans. ASSP 33, 868-873. Bastiaans, M. J. (1994). Appl. Opf.33,5241-5255. Battiti, R., Amaldi, E., and Koch, C. (1991). Int. J. Comput. Vision 6, 133-145. Bhatia, M., Karl,, W. C., and Willsky, A. S. (1993). Proc. SPIE 2034,58-69. Blanc-Fkraud, L., Charbonnier, P., Lobel, P., and Barlaud, M. (1994). h c . IEEE ICASSP, pp. 491-494, Adelaide, Australia. Bovik, A. C., Clark, M., and Geisler, W. S. (1990). IEEE Trans. PAMI 12, 55-73. Braccini, C. Gambardella, G., Sandini, G., and Tagliasco, V. (1982). Biol. Cyber. 44, 47-58. Bradley, J . N., and Brislawn, C. M. (1993). Proc. SPIE 1961, 293-304. Braithwaite, R. N., and Beddoes, M. P. (1992). IEEE Trans. Image Process. 1, 243-234. Burt, P. L., and Adelson, E. H. (1983). IEEE Trans. Comm. 31, 532-540. Caelli, T., and Moraglia, G. (1985). Vision Res. 25, 671-684. Campbell, F. W., and Robson, J. G. (1968). J . Physiol. (Lond.) 197, 551-566.
80
RAFAEL NAVARRO ET AL.
ADVANCES IN IMAGING AND ELECTRON PHYSICS VOL. 97
Models and Algorithms for Edge-Preserving Image Reconstruction

L. BEDINI, I. GERACE, E. SALERNO, AND A. TONAZZINI

Consiglio Nazionale delle Ricerche, Istituto di Elaborazione della Informazione, Via Santa Maria 46, I-56126 Pisa, Italy
I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
   A. Regularization and Smoothness . . . . . . . . . . . . . . . . . . . 86
   B. Accounting for Discontinuities . . . . . . . . . . . . . . . . . . . 89
   C. Edge-Preserving Reconstruction Algorithms . . . . . . . . . . . . . 91
   D. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
II. Inverse Problem, Image Reconstruction, and Regularization . . . . . . 94
   A. Objects, Observations, and the Direct Problem . . . . . . . . . . . 94
   B. Data and the Inverse Problem . . . . . . . . . . . . . . . . . . . . 95
   C. Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
III. Bayesian Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 98
   A. Composition of States of Information . . . . . . . . . . . . . . . . 98
   B. Solving the Inverse Problem . . . . . . . . . . . . . . . . . . . . . 99
   C. Optimal Estimators Based on Cost Functions . . . . . . . . . . . . . 101
   D. The Gaussian Case . . . . . . . . . . . . . . . . . . . . . . . . . . 103
IV. Image Models and Markov Random Fields . . . . . . . . . . . . . . . . 104
   A. MRFs and Gibbs Distributions . . . . . . . . . . . . . . . . . . . . 106
   B. Introducing Discontinuities . . . . . . . . . . . . . . . . . . . . . 108
V. Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
   A. Monte Carlo Methods for Marginal Modes and Averages . . . . . . . . 119
   B. Stochastic Relaxation for MAP Estimation . . . . . . . . . . . . . . 120
   C. Suboptimal Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 124
VI. Constraining an Implicit Line Process . . . . . . . . . . . . . . . . 129
   A. Mean Field Approximation . . . . . . . . . . . . . . . . . . . . . . 131
   B. Extended GNC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
   C. Sigmoidal Approximation . . . . . . . . . . . . . . . . . . . . . . . 137
VII. Determining the Free Parameters . . . . . . . . . . . . . . . . . . . 141
   A. Regularization Parameter . . . . . . . . . . . . . . . . . . . . . . 143
   B. MRF Hyperparameters . . . . . . . . . . . . . . . . . . . . . . . . . 146
   C. Parameter Estimation from Training Data . . . . . . . . . . . . . . 149
VIII. Some Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 153
   A. Explicit Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
   B. Implicit Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
IX. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
I. INTRODUCTION

Image restoration and reconstruction are fundamental in image processing and computer vision. Indeed, besides being very important per se, they are preliminary steps for recognition and classification and can be considered representative of a wide class of tasks performed in the early stages of biological and artificial vision. As is well known, these are ill-posed problems: a unique and stable solution cannot be found from the observed data alone, so regularization techniques are always needed. The rationale is to force some physically plausible constraints on the solutions by exploiting a priori information.

The most common constraint is to assume globally smooth solutions. Although this may render the problem well-posed, it was evident from the start that the results were not satisfactory, especially when working with images where abrupt changes are present in the intensity. As these discontinuities play a crucial role in coding the information present in the images, many researchers tried to introduce some refinements in the regularization techniques to preserve them. One idea was to introduce, as constraints, functions that vary locally with the intensity gradient so as to weaken the smoothness constraint where it has no physical meaning. Another approach, closely related to the first, is to consider discontinuities as explicit unknowns of the problem and to introduce constraints on their geometry. In both approaches, the computation is extremely complex, and thus several algorithms have been proposed to find the solution within feasible computation times.

In this chapter we begin with Tikhonov's regularization theory and then formalize the edge-preserving reconstruction and restoration problems in a probabilistic framework. We review the main approaches proposed to force locally varying smoothness on the solutions, together with the related computation schemes. We also report some of our results in the fields of restoration of noisy and blurred or sparse images and of image reconstruction from projections.

A. Regularization and Smoothness
From a mathematical point of view, image restoration and reconstruction, as well as most problems of early vision, are inverse and ill-posed in the sense defined by Hadamard (Poggio et al., 1985; Bertero et al., 1988). This means that the existence, uniqueness, and stability of the solution cannot be guaranteed (see Courant and Hilbert, 1962). This is due to the fact that information is lost in the transformation from the image to the data,
especially in applications where only a small number of noisy measurements are available. To compensate for this lack of information, a priori knowledge should be exploited to “regularize” the problem, that is, to make the problem well-posed and well-conditioned, so that a unique and stable solution can be computed (Tikhonov, 1963; Tikhonov and Arsenin, 1977). In general, a priori knowledge consists of some regularity features for the solution and certain statistical properties of the noise. One approach to regularization consists of introducing a cost functional, which is obtained by adding stabilizers, expressing various constraints on the solution, to the term expressing data consistency. Each stabilizer is weighted by an appropriate parameter. The solution is then found as the minimizer of this functional (Poggio et al., 1985; Bertero et al., 1988). A number of different stabilizers have been proposed; their choice is related to the implicit model assumed for the solution. In most cases, such models are smooth in some sense, as they introduce constraints on global smoothness measurements. In standard regularization theory (Tikhonov and Arsenin, 1977), quadratic stabilizers, related to linear combinations of derivatives of the solution, are used. It has been proved that this is equivalent to restricting the solution space to generalized splines, whose order depends on the orders of the derivatives (Reinsch, 1967; Poggio et al., 1985). Another classical stabilizer is entropy, which leads to maximum entropy methods. Many authors insisted on the superiority of the entropy stabilizer over any different choice. Maximum entropy has indeed two indisputably appealing properties. First, it forces the solution to be always positive. Second, it yields the most uniform solution consistent with the data, ensuring that the image features result from the data and are not artifacts. For this reason, maximum entropy methods have been extensively studied and used in image restoration/reconstruction problems (Minerbo, 1979; Burch et al., 1983; Gull and Skilling, 1984; Frieden, 1985). In Leahy and Goutis (1986) and Leahy and Tonazzini (1986), the model-based interpretation of regularization methods was well formalized and the explicit form of the model for the solution was given for a set of typical stabilizers. This interpretation shows that no stabilizer can be considered superior to the others, which means that it should be chosen from our prior expectations about the features of the solution. Another approach to regularization, which proves to be intimately related to the variational approaches, is the Bayesian approach. The solution and the data are considered as random variables and all kinds of information as a suitable probability density, from which some optimal solution must be extracted. The reconstruction problem is thus transformed into an inference problem (Jaynes, 1968, 1982; Backus, 1970;
Franklin, 1970). Tarantola (1987) proposed a general inverse problem theory, completely based on Bayesian criteria and fully developed for discrete images and data. Tarantola argued that any existing inversion algorithm can be embedded in this theory, once the appropriate density functions and estimation criterion have been established. Tarantola’s theory enables a deep insight into inverse problems, and it can also be used to interpret or compare different results or algorithms. However, translating each state of information into an appropriate density function is one of the difficulties of this theory. Once this has been done, the so-called prior density, expressing the extra information, is combined with the likelihood function, derived from the measurements and from the data model, thus resulting in the posterior density. This can be maximized, to give the maximum a posteriori (MAP) estimate, or used to derive other estimates. One solution is to look for the estimate that minimizes the expected value of a suitable error function, such as the MPM (maxima of the posterior marginals) and the TPM (thresholded posterior means) estimates. These estimates minimize the expected value of the total number of incorrectly estimated elements and the sum of the related square errors, respectively. Whereas MPM and TPM have a pure probabilistic interpretation, the MAP estimate can be seen as a generalization of the variational approach described above. Indeed, a cost functional can always be seen as the negative exponent (posterior energy) of an exponential form expressing a posterior density. Thus, minimizing a cost functional is equivalent to maximizing a posterior density. From this point of view, the stabilizer can also be seen as the negative logarithm of the prior and the prior as the exponential of the negative stabilizer. By virtue of this equivalence, hereafter the terms “cost functional” and “energy” will be used indifferently. In many cases, the cost functional is convex. This means that standard descent algorithms can be used to find the unique minimum (Scales, 1985). Nevertheless, because the dimension of the space where the optimization is performed is the same as the image size (typically 256 X 256 pixels or more), the cost for implementing these techniques is very high. This is especially true when the cost functional is highly nonquadratic, as in the case of the entropy stabilizer. Neural networks could be a powerful tool for solving convex, even nonquadratic, optimization problems. This is related to the ability of a stable continuous system to reach an equilibrium state, which is the minimum of an associated Liapunov function (La Salle and Lefschetz, 1961). Electrical analog models of neural networks have been proposed as a basis for their practical implementation (Poggio and Koch, 1985; Poggio, 1985; Koch et al., 1986). The computation power of these circuits is based
on the high connectivity typical of neural systems and on the convergence speed of analog electric circuits in reaching stable states. In Bedini and Tonazzini (1990, 1992), we suggested using the Hopfield neural network model (Hopfield, 1982, 1984, 1985; Hopfield and Tank, 1986) to effectively solve the problem of the restoration of blurred and noisy images.

Another problem arising in the variational approach to regularization is the choice of the parameters in the cost functional. Regarding a convex cost functional as the Lagrangian associated with a constrained minimization problem and the parameters as the Lagrange multipliers, the necessary conditions for the minimum also specify equations to be satisfied by the Lagrange multipliers (Luenberger, 1969). In many cases, however, the solution of these equations is a formidable computational problem, because they are nonlinear. Bedini et al. (1991) proposed a different method that allows the Lagrange multipliers to be estimated at a relatively low cost. The method is based on the primal-dual theory for solving convex optimization problems (Luenberger, 1984). The original, primal, problem is reformulated in an equivalent form so that the related dual problem can be solved through a single unconstrained maximization. The solution of the original problem is then related to the solution of the dual problem through a model that depends on the particular stabilizer adopted. The method has been derived for the restoration of blurred, noisy images and for different kinds of stabilizers, such as cross-entropy and energy.

An alternative, which is common in standard regularization, is merely to consider the regularization parameters as weights that balance data consistency and a priori information in the cost functional. The choice of these weights is still a critical task, as they considerably affect the quality of the reconstructions, so that some objective criteria should be devised to determine them. Some of these criteria are shown in Tikhonov and Arsenin (1977), Golub et al. (1979), and De Mol (1992).
B. Accounting for Discontinuities

The smoothness constraint has its validity in image processing because physical images are usually smoothly varying. However, object boundaries, occlusions, textures, and shadows can cause discontinuities in image intensity. Ordinary regularization techniques normally fail in these situations, because the smoothness properties of the stabilizers cannot be varied over the reconstruction domain.

The first attempts to take discontinuities into account were made by using stabilizers that are nonquadratic functions of the image gradients. Their aim was to encourage smoothing within homogeneous regions without
excessively penalizing the high gradients that occur at the boundaries between different regions. Some authors suggested using nonquadratic yet still convex stabilizers (Besag, 1989; Green, 1990), and others proposed nonconvex stabilizers that have a finite asymptotic behavior at infinity (Geman and McClure, 1985, 1987; Blake and Zisserman, 1987a, b; Geman and Reynolds, 1992).

Another approach to treating image discontinuities is to augment the cost functional by means of the explicit introduction of a line process. Terzopoulos (1986, 1988) introduced a class of "controlled-continuity" stabilizers that preserve the discontinuities by spatially controlling the smoothness of the image. Discontinuities must be located in advance (Terzopoulos, 1986; March, 1988) or can be considered unknowns of the problem (Terzopoulos, 1988; March, 1989). Blake and Zisserman (1987a, b) and Mumford and Shah (1989) proposed variational techniques to optimize a functional where the interaction between the intensity field and the unknown discontinuity set is described by a particular weak membrane energy. Blake and Zisserman assumed a discrete image model and proposed a graduated nonconvexity (GNC) algorithm to recover a piecewise smooth solution image, by first eliminating the binary line process from the weak membrane energy. In a continuous setting, Mumford and Shah established a cost functional with a singular part representing the discontinuity set. The numerical optimization of such a functional is a very difficult problem. March (1992) uses the Γ-convergence theory (see De Giorgi, 1977; Ambrosio and Tortorelli, 1990) to put the cost functional of Mumford and Shah into a more tractable form. The singular cost functional is transformed into an elliptic functional, which can be minimized by standard numerical techniques. The discontinuity set can be precisely located by means of a sequence of smooth functions, converging to Terzopoulos' continuity control function.

Geman and Geman (1984) outlined the advantages of using a Bayesian approach in which the image is modeled as a pair of Markov random fields (MRFs). One of them (the intensity process) represents the multilevel field of the pixel values; the other (the line process) is a two-level field representing the discontinuities. The Clifford-Hammersley theorem establishes the equivalence between MRFs and Gibbs distributions. The local correlations between image elements can thus be expressed in the form of Gibbs priors. This is particularly useful, as it allows many image features to be accounted for by introducing simple local terms into the prior. It is also flexible in exploiting certain physical and geometrical constraints on the lines, such as their smoothness and connection features. In this approach, the solution to the reconstruction problem has generally been defined as a MAP estimate, although other estimators have been
proposed (Marroquin, 1984; Marroquin et al., 1987). Nevertheless, the nonconvexity of the cost functionals arising in these cases entails using minimization algorithms with extremely high complexity. Many attempts have been made to reduce the complexity of nonconvex optimization or to devise algorithms at feasible costs. As already seen, when the stabilizer acts locally on the image intensity and satisfies certain properties, the discontinuities can be preserved without introducing extra variables. Geman and Reynolds (1992) established a strict relationship between explicit and implicit line treatment. They derived a “duality theorem,” which relates a class of primal energies with implicit lines and a class of dual energies with explicit lines. Primal and dual energies are equivalent, in that their global minima over the intensity variables coincide. The duality theorem states the conditions under which this equivalence exists and gives a tool for deriving the dual from the primal. Implicit line treatment is interesting because efficient, although suboptimal, deterministic algorithms can be used instead of the stochastic algorithms usually required for explicit line treatment. These algorithms have mainly been developed for noninteracting lines. This means that significant constraints are not enforced on line geometry. On the other hand, prior information on the interactions between lines is available in many reconstruction problems. For instance, the discontinuities associated with edge contours are often connected (hysteresis) and thin (nonmaximum suppression). Introducing this information into the problem would greatly improve the quality of the reconstructed image. Thus, some authors proposed approaches which are suitable for treating interacting lines in implicit form. They found that some approximations are needed. Geiger and Girosi (1991) proposed the mean field theory to average out the binary line process from the weak membrane energy and derived some ad hoc approximations to enforce the line continuation constraint. In the context of a GNC algorithm for image reconstruction, we adopted a different approximation, which permits connected and thin lines to be obtained (Bedini et al., 1994a).
C. Edge-Preserving Reconstruction Algorithms
As already seen, in regularization techniques involving discontinuities the cost functional is not convex, and the usual descent algorithms do not ensure that the global minimum will be found. In principle, the computation thus has to be performed by using stochastic relaxation algorithms, to
avoid local minima at which descent algorithms would get stuck. Most of these algorithms employ simulated annealing techniques. Despite their convergence properties, these algorithms are computationally very heavy, owing to the size of the problems treated and the number of iterations required. Two main strategies have been adopted to reduce the complexity of the problem and/or the execution times. In the first, the problem is transformed or approximated to permit the use of totally or partially deterministic algorithms. In the second, parallel algorithms are studied, for use on general-purpose or dedicated parallel architectures. With an application for the restoration of blurred and noisy images with explicit line treatment, Geman and Geman (1984) proposed finding the MAP estimate through a parallel Gibbs sampler algorithm. This is possible in image restoration, in that the posterior distribution is still Gibbsian, with relatively small neighborhoods. Parallel approaches to image reconstruction from projections (or any other problem that has the degradation operator with a broad support) are not so direct, because the posterior probability is no longer Gibbsian. In Bedini and Tonazzini (1992) a mixed-annealing algorithm is proposed for the parallel computation of the MAP estimate, which is suitable for both image restoration and reconstruction. In this algorithm, the minimization is performed by an annealing scheme, which can be considered as the cooperation of two computational blocks. The first performs a quadratic minimization over the continuous intensity variables and can be effectively implemented using a linear neural network. The second block updates the binary line process by means of a Gibbs sampler and can be implemented by a grid of processors working in parallel. In this hybrid architecture, the linear neural network would support most of the computation. Iterated conditional modes (ICM) is a deterministic algorithm proposed by Besag (1986) for discrete intensity fields. ICM approximates the MAP estimate by computing the maximum of the posterior probability of each image element, conditioned on the values assumed by all other elements at the previous iteration. Extensions to continuous intensity fields are interesting when simple closed forms for the solutions can be derived. ICM can also be used to update the binary line process, for instance, within a mixed-annealing scheme. In these cases, however, better results can be obtained by a slight modification of ICM, called iterated conditional averages (ICA), which can prevent the (continuous) line process from converging faster than the intensity process (Johnson et al., 1991). Blake and Zisserman (1987a, b) derived their cost functional for implicit line treatment by eliminating the binary line process from the weak membrane energy. Because it remains nonconvex, these researchers developed a special optimization algorithm, which is based on the minimization
of a series of approximations of the original cost functional. The minimizations can be performed by standard gradient descents. This is the graduated nonconvexity algorithm, which aroused much interest for its simplicity and reduced computational complexity compared with stochastic approaches. In their mean field annealing approach, Geiger and Girosi (1991) provided a parametric family of energy functions, converging to the same cost functional as Blake and Zisserman’s, and showed that, when applied to this family, GNC can be seen as a deterministic annealing. They approximated the global minimum by iteratively minimizing the energy functions through the solution of deterministic equations. Geman and Yang (1994) proposed a linear algebraic method for implementing regularization with implicit discontinuities. When suitable auxiliary variables are introduced, the posterior distribution becomes Gaussian in the intensity variables, with a block circulant covariance matrix. A simulated annealing algorithm with simultaneous updating of all the pixels can thus be designed using fast Fourier transform (FFT) techniques. In emission tomography, iterative deterministic algorithms based on the expectation-maximization method have been adopted for maximum likelihood estimation (Dempster et al., 1977). Since expectation-maximization exploits knowledge on the random nature of the physical data generation process, these algorithms produce better reconstructions than those of the classical methods, such as filtered backprojection. Generalized expectation-maximization (GEM) algorithms have also been proposed to solve the MAP problem that arises when Gibbs priors are introduced to stabilize the solutions. These techniques can address discontinuities either implicitly or explicitly (Hebert and Leahy, 1989; Gindi et al., 1991; Leahy and Yan, 1991). Like mixed annealing, GEM can handle mixed continuous and binary variables by splitting each iteration into two independent steps, one acting on the continuous variables and the other on the binary variables. For this reason, GEM permits the incorporation of various forms of interactions among discontinuities. GEM algorithms have also been derived for transmission tomography from Poisson data (Lange and Carson, 1984). In Salerno et al. (1993) and Bedini et al. (1994d) we considered the GEM approach for transmission tomography, assuming a Gaussian data model and explicit discontinuities.
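To make the flavor of these deterministic schemes concrete, the fragment below sketches an ICM-style update for the simplest possible setting: denoising (A equal to the identity) with a purely quadratic smoothness prior and no line process. The energy form, the parameter values, and the function name are illustrative assumptions for this sketch only, not the algorithms cited above.

```python
import numpy as np

def icm_denoise(g, sigma2=0.01, lam=4.0, n_sweeps=20):
    """ICM-style sweeps for denoising with posterior energy
    E(x) = sum_s (x_s - g_s)^2 / (2*sigma2) + (lam/2) * sum over 4-connected
    neighbouring pairs <s,t> of (x_s - x_t)^2.
    For brevity all pixels are updated simultaneously (a Jacobi-style sweep,
    with periodic borders via np.roll) instead of strictly site by site."""
    x = g.astype(float).copy()
    for _ in range(n_sweeps):
        nb_sum = (np.roll(x, 1, axis=0) + np.roll(x, -1, axis=0) +
                  np.roll(x, 1, axis=1) + np.roll(x, -1, axis=1))
        # conditional mode of each pixel given its neighbours and its datum
        x = (g / sigma2 + lam * nb_sum) / (1.0 / sigma2 + 4.0 * lam)
    return x

# toy usage: a noisy piecewise-constant image
rng = np.random.default_rng(0)
clean = np.zeros((32, 32)); clean[8:24, 8:24] = 1.0
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
restored = icm_denoise(noisy)
```

With a Gaussian likelihood and a quadratic prior, each conditional mode is available in closed form, which is what makes ICM so cheap; for the nonconvex, edge-preserving energies discussed in this chapter the same sweep only reaches a local minimum, which is why the annealing, GNC, and GEM strategies recalled above are needed.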
D. Overview

This chapter is a review (by no means complete) of edge-preserving regularization for image reconstruction and restoration. It obviously reflects
our particular point of view and is particularly influenced by our experience in this field. We first describe regularization and the problem of discontinuities in the deterministic and Bayesian approaches (Sections II and III). In Sections IV and V, we introduce our point of view on image reconstruction, which is Bayesian and considers discrete images and data. Some of the issues that are still open in this field, namely the introduction of constraints in the discontinuity set and parameter estimation, are treated in Sections VI and VII, respectively. In Section VIII, we show some applications of the techniques and algorithms described, taken from our previous work, regarding image restoration and tomography. We present the results of some experiments, showing the influence of different methods for introducing the discontinuities in the image model.
II. INVERSE PROBLEM, IMAGE RECONSTRUCTION, AND REGULARIZATION
In this section, we define the inverse problem of image reconstruction and restoration and introduce an approach for its solution, which is developed in the following sections. Although the physical processes involved and the numerical difficulties in implementing and executing the algorithms are very different in different cases, reconstruction and restoration are formally analogous and can be treated in a unified framework. For this reason, we speak of image reconstruction, meaning both the problems addressed in this chapter. Both the image and the data to be measured can be modeled as continuous or discrete functions. In this respect, the data generation model can be continuous-continuous, continuous-discrete, or discrete-discrete (Andrews and Hunt, 1977). However, most works assume a discrete-discrete model, so we too will refer to this type of model, although the formal development of the theory is often more general. In this section in particular, all the relations are valid for any kind of model.

A. Objects, Observations, and the Direct Problem
Consider an N-dimensional vector space X that contains the images to be reconstructed, x, whose elements are the image pixels; we call it object space. Suppose that there exists an operator, 𝒜, related to some physical process, which maps the object space onto another M-dimensional vector
space, Y (the observation space); namely, it transforms each x ∈ X into a unique element y ∈ Y, so that y = 𝒜(x). The study of the physical system consists of deriving a model for 𝒜. Because the physical process may be very complex or even partially unknown, this model is generally an approximated operator, A. For example, in image restoration A could be a convolution operator, with a certain point spread function; in tomographic reconstruction, A could be the Radon transform. In these two examples, the model A is linear and approximates sufficiently well the real process. This situation is very frequent in practical applications. A linear approximation for the true operator 𝒜 enables the exploitation of the very powerful tools of linear algebra.

Computing y from x will be referred to as the direct problem, because the goal is to obtain the effect, or response, of a known physical system to the input object, or stimulus. In a causal world, the solution of the direct problem is unique, but there will always be some uncertainty. In fact, operator A introduces an error n₁ (the model error), because, as already mentioned, some physical approximations are introduced to model the data generation process. It is
y = Ax + n₁.   (1)
B. Data and the Inverse Problem

Suppose now that (as always happens) our measurement system introduces additional uncertainty in the data, and call g the vector obtained through the measurement system. Then

g = y + n₂ = Ax + n,   (2)
where n is the system noise, generated jointly by the model error, n₁, and the measurement error, n₂. The inverse problem of image reconstruction consists of estimating x from g and A. This can be done by imposing data consistency, i.e., by searching for an x such that Ax is not too far from g. However, this is always an ill-posed problem, in the sense of Hadamard, in that the existence, uniqueness, and stability of the solution are never guaranteed (Tikhonov and Arsenin, 1977). Existence is not assured because the system noise can cause incompatibility between data and images. Uniqueness is not assured because A is usually a noninvertible operator. In the continuous-continuous or continuous-discrete model, stability is related to the continuity of the inverse operator. In the discrete-discrete model, a continuous generalized inverse operator always exists, so that the problem
is always well-posed. However, the inverse operator is generally ill-conditioned, and finite (although very small) errors in the data may be highly amplified in the solutions.

C. Regularization
The means of overcoming ill-posedness and ill-conditioning is regularization, i.e., any technique for obtaining a unique and stable solution by studying a well-posed restriction of an originally ill-posed problem. This restriction is always knowledge based. Note that the data set is not the only piece of information we have on the problem and that even the classical reconstruction and restoration algorithms, which do not make explicit use of any knowledge but the data set g, actually impose very strong constraints on the solutions. These constraints are always based on implicit assumptions on the solutions. We will show below how additional information can be explicitly introduced into the problem to obtain a regularized solution.

Let us start with data consistency, and suppose we have a criterion to determine whether Ax is too far from the data; that is, we know a constant K such that any estimate for x must satisfy the following condition:

‖g − Ax‖² ≤ K,   (3)
where the squared norm term (any norm in the data space) is called residual. Intuitively, K is smaller for lower noise and, in the limit, goes to zero for a noise-free system; K can be defined more precisely if we introduce particular assumptions on the noise. The set defined by Eq. (3) contains the so-called feasible solutions of the problem. As already said, condition (3) alone often establishes an ill-posed problem.

Using the constrained least-squares approach (Andrews and Hunt, 1977), image reconstruction can be reformulated as a variational problem, and its solution can be computed by optimizing suitable cost functions over the set of feasible solutions. These functions model our prior information on the solution to be sought and can be seen as particular norms in the object space. Let C(x) be one of these functions. Minimizing C(x) subject to Eq. (3), if C is convex, leads to a unique solution, which is also the solution of an equivalent unconstrained minimization problem:

x̂ = arg minₓ ‖g − Ax‖² + λC(x),   (4)
for a particular value of the Lagrange multiplier λ. In standard regularization (Tikhonov and Arsenin, 1977) this estimate has another interpretation:
the solution image results from a compromise between the residual, related to data consistency, and the cost functional, now called stabilizer, which enforces regularity in the solution. In this case, the nonnegative regularization parameter, λ, is no longer a Lagrange multiplier and determines this compromise, in that its value can be tuned to balance the effect of data and regularity on the solution. Intuitively, λ should be large if the data set is heavily corrupted by noise and small otherwise. In Section VII, we shall see some selection strategies for the regularization parameter. Here are some examples of the stabilizers:
C₁(x) = ‖x‖²,   (5)

C₂(x) = ‖∇x‖²,   (6)

C₃(x) = Σᵢ xᵢ log xᵢ.   (7)
When used in Eq. (4), stabilizer C₁ prevents the solution from having a large energy, stabilizer C₂ prevents the solution gradient from having a large energy, and stabilizer C₃ gives the so-called maximum entropy solution. These three functionals, and many others, enforce global constraints on the solution: they require a particular property of the solution to be satisfied everywhere in its support region. From physical considerations (e.g., coherence of matter), this property is often smoothness. For the case of C₂, for example, first-order smoothness means no large gradient magnitudes. Higher order smoothness can be obtained using the Laplacian or higher order differential operators. For C₃, it can be proved (see Jaynes, 1982; Burch et al., 1983) that the solution obtained is maximally flat, compatible with data consistency.

The use of global constraints has been proved to give unique and stable solutions in image reconstruction, but normally the smoothness assumption is not valid everywhere. In fact, any physical image can be assumed piecewise smooth, i.e., composed of a number of connected regions where smoothness is verified, separated from one another by a set of curvilinear boundaries, where the smoothness constraint has no physical meaning. Reconstructing one such image using a global smoothness stabilizer leads to a solution that is oversmoothed across the boundaries. In image analysis, whatever its purpose, boundary detection and location are very important tasks. It is thus clear that a reconstructed image with no boundary information is unacceptable for most applications.

The way to reconstruct images while preserving the boundaries (edge-preserving image reconstruction) is to adopt stabilizers that act locally on the image to be reconstructed. Although we could also define local
stabilizers in the present context, we prefer to introduce them in the framework of the Bayesian techniques, which give a more comprehensive point of view on image reconstruction and to which the standard regularization approach shown here can be brought back in particular cases.
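As a concrete illustration of Eq. (4) with the quadratic stabilizers (5) and (6), the sketch below solves the unconstrained problem in closed form through its normal equations. The toy blur operator, the value of λ, and the function name are assumptions made only for this example.

```python
import numpy as np

def regularized_reconstruction(A, g, lam, D=None):
    """Minimise ||g - A x||^2 + lam * ||D x||^2, i.e. Eq. (4) with a quadratic
    stabilizer. D = None gives the energy stabilizer C1(x) = ||x||^2; passing a
    first-difference matrix gives the gradient stabilizer C2(x) = ||grad x||^2.
    The minimiser solves the normal equations (A^T A + lam D^T D) x = A^T g."""
    n = A.shape[1]
    if D is None:
        D = np.eye(n)
    return np.linalg.solve(A.T @ A + lam * (D.T @ D), A.T @ g)

# toy usage: a blurred, noisy 1-D signal
rng = np.random.default_rng(0)
n = 64
x_true = np.zeros(n); x_true[20:40] = 1.0            # piecewise-constant object
A = np.array([[np.exp(-0.5 * ((i - j) / 2.0) ** 2)   # simple Gaussian blur
               for j in range(n)] for i in range(n)])
A /= A.sum(axis=1, keepdims=True)
g = A @ x_true + 0.01 * rng.standard_normal(n)       # data, as in Eq. (2)
D = (np.eye(n) - np.eye(n, k=1))[:-1]                 # first differences for C2
x_hat = regularized_reconstruction(A, g, lam=0.1, D=D)
```

Because the stabilizer is quadratic, the estimate depends linearly on the data; with the gradient stabilizer the step edges of the toy object come out oversmoothed, which is precisely the limitation that the edge-preserving models of the following sections address.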
III. BAYESIAN APPROACH
According to the general inverse problem theory as presented by Tarantola (1987), the Bayesian approach to the solution of any inverse problem is based on considering any piece of information on the problem as a measure density function on the appropriate space. In particular, any measure density becomes a probability density if it can be normalized, and each parameter of the problem is considered as a random variable, with its probability function. In this theory, the solution to an inverse problem is the joint density (or posterior) obtained by composing the densities for the model information with those for prior knowledge on the solution, the data, and the measurement system. Using the posterior as the final state of information, image reconstruction can be treated as a parameter estimation problem. The theory is fully developed for the discrete case. The generalization for a continuous model is not complete because not all the properties of probability densities can be extended to the infinite-dimensional case. What follows in this section should thus be considered valid for a discrete-discrete model.

In the following, we briefly outline the ideas and the formulas of the composition of states of information and the theory of estimation. We also show a particular case in which the result is the same as the one obtained by standard regularization. We use the same notation and functional space definitions introduced in the previous section; a complete treatment can be found in Tarantola's textbook.
A. Composition of States of Information
Let us consider a sample space Ω, mapped, as usual, onto a vector space M, isomorphic to R^N, whose elements are N-vectors describing the events in Ω. We can establish a probability measure in Ω, described by a probability density in M. This probability density, say f, is a state of information on M, and each subset in M has its measurable information
content on the basis of f. The way to solve an inference problem is to collect all possible states of information on M and derive the final state of information from them. If m is the generic element of M, the formula for the composition of two states of information, related to the densities f₁(m) and f₂(m), is the following:

f(m) = f₁(m) f₂(m) / μ(m).   (8)
This can be shown to be a generalization of the AND logical operator. The normalizing density μ(m) is called noninformative or zero information density and enables the final information content to be kept independent of possible changes of coordinates in M or, equivalently, of different parametrizations of the space Ω (see also Jaynes, 1968).
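A toy numerical reading of Eq. (8), for a parameter taking five discrete values, may help; the particular densities and the uniform noninformative density below are arbitrary assumptions chosen only to show the mechanics.

```python
import numpy as np

# Two states of information on a discrete parameter m with 5 possible values,
# composed as in Eq. (8): f(m) proportional to f1(m) * f2(m) / mu(m).
f1 = np.array([0.05, 0.10, 0.40, 0.35, 0.10])   # e.g. information from the data
f2 = np.array([0.20, 0.30, 0.30, 0.15, 0.05])   # e.g. prior information
mu = np.full(5, 1.0 / 5.0)                       # noninformative (uniform) density

f = f1 * f2 / mu
f /= f.sum()                                     # normalise to a probability
```

With a uniform μ(m) the rule reduces to a normalized product of the two densities, which is the usual probabilistic counterpart of the logical AND.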
B. Solving the Inverse Problem

We shall use Eq. (8) to solve our inverse problem after we have translated all our information into probability densities. In this case, the vector space M is the Cartesian product of the object and the observation spaces. The vector m will thus be obtained by juxtaposing vectors x and y.

1. Theoretical Information
Let us consider the direct problem; that is, let us try to model the operator that maps images into data in the form of a probability density. With reference to Eq. (1), because of the presence of the model error, it appears that the relationship between y and x can be described only in probabilistic terms. This can be done by using a conditional probability θ(y|x). From this density, we derive the joint density of observations and objects, assuming at this stage no prior knowledge on the real object:

Θ(y, x) = θ(y|x) μ_X(x),   (9)
where μ_X is the zero information density for the object. For example, if the error is null or negligible, we have

θ(y|x) = δ(y − Ax),   (10)
where δ(·) is the Dirac delta function. Another example, the Gaussian observation model, will be shown in Section III,D.
2. Prior Information
Let us now introduce a probability density function that contains all kinds of prior information on the problem. Assuming independence between a priori information on objects and observations, we have:

ρ(y, x) = ρ_Y(y) ρ_X(x).   (11)
The density ρ_Y represents the prior state of information on the observations, which includes knowledge of the data set g; the density ρ_X represents prior knowledge on the solution. If we assume no prior information on the values of the observations, we have:

ρ_Y(y) = ν(g|y) μ_Y(y),   (12)
where g is the measured data vector, μ_Y is the zero information density for the observations, and ν(g|y) is the probability density of the data, given the observations, which is characteristic of the measurement system used. This formula is a generalization of the first equality in Eq. (2). As far as the density ρ_X (hereafter called prior) is concerned, we have already said that it should be designed to enforce smoothness on the solution. Sections IV, VI, and VII deal with the determination of useful priors for the solution.

3. Posterior Density
Let us now apply Eq. (8) to compose the information contained in (9) and (11); also using (12), we obtain:

σ(x, y) = ρ_X(x) ν(g|y) θ(y|x),   (13)
which does not depend on the unspecified zero information densities. The density function σ contains all the posterior information about both the object and the observations and can be considered as the solution of the inverse problem. If we look for an estimate of x, we should evaluate the marginal posterior density for that vector:

σ_X(x) = ρ_X(x) ∫ ν(g|y) θ(y|x) dy.   (14)
Since the estimation of y is out of our scope, we shall consider Eq. (14) to be the solution of the reconstruction problem. The unessential subscript X
will hereafter be omitted from the notations of the posterior marginal σ_X and of the prior density ρ_X.
C. Optimal Estimators Based on Cost Functions

The posterior σ(x) is the ultimate state of information, from which we should find an optimal estimate for the object x. To this end, we must choose the particular optimality criterion to follow. A good approach to the representation of optimality criteria is to introduce cost functions. Let us define a distance function, C(x*, x), between two different objects x* and x, and call it the cost of x* with respect to x. Let x_opt be the image to be estimated, and let us try to evaluate an x* that minimizes the cost C(x*, x_opt). Unfortunately, the only information we have on the solution is contained in σ(x), and thus C(x*, x_opt) cannot be directly computed. However, if σ(x) is reasonably concentrated around x_opt, we can assume C(x*, x_opt) to be equal to the expected value of C(x*, x) according to σ(x). Assuming that the image pixels have discrete values, we have

E_σ{C(x*, x)} = Σ_{x∈M} C(x*, x) σ(x),   (15)
where the summation range M is now the set of all possible configurations assumed by x. Let us now introduce a property that may help to minimize (15) over x*. It can be proved (Marroquin, 1985) that, if

C(x*, x) = Σᵢ Cᵢ(xᵢ*, xᵢ),   i = 1, 2, ..., N,   (16)
with

Cᵢ(a, b) = 0 for a = b,   Cᵢ(a, b) > 0 otherwise,   (17)
then

E_σ{Cᵢ(xᵢ*, xᵢ)} = Σ_{xᵢ∈Λ} Cᵢ(xᵢ*, xᵢ) σᵢ(xᵢ),   (18)
where Λ is the discrete set of the values assumed by the ith pixel, and σᵢ is the related marginal density. In other words, if the cost is an additive function over all the image pixels, under condition (17), the minimizer of (15) over x* can be found by minimizing separately the expectations of the Cᵢ's with respect to the related marginals. The difficulty inherent in this approach is the calculation of the marginals, in that each of them is obtained by summing σ(x) over a huge high-dimensional space.
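Assuming the marginals σᵢ are somehow available (in practice they must themselves be estimated), the additive-cost argument above reduces estimation to independent per-pixel operations, as in the toy sketch below, which anticipates the MPM and TPM criteria made precise in the next subsections. The marginal values and grey levels are made up purely for illustration.

```python
import numpy as np

# Toy marginal posteriors sigma_i(x_i) for 4 pixels over the grey levels {0,1,2,3}.
levels = np.array([0, 1, 2, 3])
marginals = np.array([
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.15, 0.40, 0.40],
    [0.25, 0.25, 0.25, 0.25],
])

# MPM estimate: per-pixel maximiser of the marginal
# (optimal for the "count of wrong pixels" cost).
x_mpm = levels[np.argmax(marginals, axis=1)]

# TPM estimate: posterior mean per pixel, thresholded to the nearest grey level
# (optimal for the sum of squared per-pixel errors).
means = marginals @ levels
x_tpm = levels[np.argmin(np.abs(means[:, None] - levels[None, :]), axis=1)]
```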
The particular choice of the cost function will influence the estimate of the image and will clarify exactly how this estimate is optimal. Below, we introduce three different cost functions with the related estimation criteria.

1. Maximizer of the Posterior Marginals

The following cost function:

C(x*, x_opt) = Σᵢ [1 − δ(xᵢ*, x_opt,i)]   (19)
is the count of the pixels of x* that are different from the corresponding pixels of x_opt. Observe that this function has the property (16)-(17), and thus it can be minimized separately for each i. The expectation of each term is the sum of the marginal minus its value for xᵢ = xᵢ*. The minimizers of the costs per pixel are thus the maximizers of the marginals. The derived estimate is thus called the maximizer of the posterior marginals (MPM).

2. Marginal Posterior Mean
The cost function:

C(x*, x_opt) = Σᵢ (xᵢ* − x_opt,i)²   (20)

measures the total squared distance of x* from x_opt. Once again, the minimization can be made using the marginals (18). Let us show the expected cost of the ith pixel:

E{Cᵢ(xᵢ*, xᵢ)} = Σ_{xᵢ∈Λ} (xᵢ* − xᵢ)² σᵢ(xᵢ) = xᵢ*² − 2⟨xᵢ⟩xᵢ* + ⟨xᵢ²⟩,   (21)
where ⟨xᵢ⟩ denotes the expectation of the ith pixel. The minimizer of this expression is obtained by choosing for xᵢ* the discrete value nearest to ⟨xᵢ⟩. It is easy to verify that the expectation of Cᵢ in this case is the variance of the pixel value over the marginal σᵢ.

3. Maximum a Posteriori Probability
The function

C(x*, x_opt) = 0 for x* = x_opt,   C(x*, x_opt) = 1 for x* ≠ x_opt   (22)

assigns the same cost to any image different from x_opt. It is easy to verify that, using Eq. (22) in Eq. (15), we obtain the sum of the posterior density,
minus σ(x*). The minimum cost will then be obtained if x* is the maximizer of σ(x); the cost function (22) thus defines the well-known maximum a posteriori (MAP) estimation criterion.

D. The Gaussian Case
Now, let us show how the Bayesian approach can lead to functionals similar to those met in standard regularization, Eqs. (4)-(7). In fact, this can be achieved if the observation and data models are considered to be Gaussian. We adopt here the same notation introduced in Section III,B. Let us assume that the observations are related to the true image through operator A, up to a Gaussian deviation with variance σ₁²:

θ(y|x) = (2πσ₁²)^(−M/2) exp[−‖y − Ax‖²/(2σ₁²)],   (23)
where M is the number of observation samples. Furthermore, let us suppose that the measured data are affected by a Gaussian error with variance σ₂²:

ν(g|y) = (2πσ₂²)^(−M/2) exp[−‖g − y‖²/(2σ₂²)].   (24)
If we use Eqs. (23) and (24) in Eq. (14) to calculate the joint posterior, we find the convolution between two Gaussian densities, which is another Gaussian density with variance σ² = σ₁² + σ₂², multiplied by the prior density ρ:
σ(x) = (2πσ²)^(−M/2) exp[−‖g − Ax‖²/(2σ²)] ρ(x).   (25)

Note that this equation can be seen as the posterior density f(x|g), obtained by the Bayes rule:

f(x|g) = f(g|x) f(x) / f(g),   (26)
ignoring the denominator, which is constant in x. Equation (14) is thus a generalization of the Bayes rule. The function which appears in Eq. (14) as an integral becomes the usual likelihood density only when the data and the measurement models are Gaussian.
Let us take the negative logarithm of Eq. (25), ignoring the constant terms:

−log σ(x) = ‖g − Ax‖²/(2σ²) − log ρ(x) = E(x).   (27)
Maximizing the posterior σ is equivalent to minimizing the negative log-posterior, or posterior energy, E(x). Observe that the minimization of (27) gives the same solution as Eq. (4), with λ = 2σ² and C(x) = −log ρ(x). Thus, the Gaussian assumptions (23) and (24) can reduce the MAP estimation to the standard regularization formula (4). The introduction of a stabilizer in standard regularization, as presented in the previous section, can thus be interpreted as a particular choice of the log-prior in a Bayesian setting.

If ρ(x) is assumed to be constant, the MAP estimation is reduced to the minimization of the square norm term, derived from the Gaussian likelihood function. This is the well-known maximum likelihood (ML) criterion, which leads, for example, to the least squares and pseudoinverse solutions. ML solutions are often very unstable, and special stopping criteria have been developed for the iterative optimization algorithms to avoid too many artifacts in the images (Veklerov and Llacer, 1987). Another strategy for obtaining a stable solution is to give the prior ρ(x) a suitable form. In the previous section, we showed three global log-prior functions; in the next section, we will introduce a general framework to establish local, edge-preserving, priors.
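The equivalence just stated can be checked numerically: with a Gaussian likelihood and a prior ρ(x) ∝ exp[−C(x)], the posterior energy of Eq. (27) and the functional of Eq. (4) evaluated with λ = 2σ² differ only by the constant factor 2σ², so they share the same minimizer. The random operator, the quadratic C(x), and the variable names below are assumptions made only for this check.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
g = rng.standard_normal(8)
sigma2 = 0.5
lam = 2.0 * sigma2                      # lambda = 2 sigma^2, as noted in the text

def C(x):                               # a stabilizer, i.e. a negative log-prior
    return np.sum(x ** 2)

def posterior_energy(x):                # E(x) of Eq. (27), up to an additive constant
    return np.sum((g - A @ x) ** 2) / (2.0 * sigma2) + C(x)

def regularization_functional(x):       # Eq. (4) objective with lambda = 2 sigma^2
    return np.sum((g - A @ x) ** 2) + lam * C(x)

x = rng.standard_normal(5)
# The two objectives agree up to the constant factor 2 sigma^2,
# so minimizing one minimizes the other.
assert np.isclose(regularization_functional(x), 2.0 * sigma2 * posterior_energy(x))
```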
IV. IMAGE MODELS AND MARKOV RANDOM FIELDS
We have shown that the Bayesian approach to regularization offers a way to express our prior knowledge in the form of prior models for the solution. As the aim of image reconstruction is to recover an intensity map that represents the spatial distribution of some physical quantity, these models should be able to describe the complex nature of an image. In particular, they should be able to take into account a variety of attributes which, although related to the behavior of the intensity, have an independent characterization. These attributes can be, for instance, intensity discontinuities, texture types, and connected components. Introducing a priori knowledge available on such attributes would help the reconstruction task and would make the reconstructed images more suitable for image analysis. We have already stressed the importance of taking discon-
tinuities into account. In this section, we introduce a general theoretical framework underlying edge-preserving image reconstruction, and we define a class of image models which are suitable for describing the behavior of both the intensity and the discontinuity fields. These models should provide a common characterization for a wide set of images, without overconstraining the features about which we have no prior knowledge. Moreover, the dependence among the field elements should be local, in that the correlation between two elements which are far enough away from each other is not expected to be large in real images. This feature can facilitate the implementation of distributed and parallel reconstruction algorithms.

In the literature on image modeling, both deterministic (Andrews and Hunt, 1977) and stochastic (Jeng and Woods, 1988) models have been considered. However, the most common is the class of stochastic models, in which images are considered to be sample functions of an array of random variables (random field). Much research has been devoted to Markov random field (MRF) models on finite lattices (Geman and Geman, 1984; Derin and Kelly, 1989; Jeng and Woods, 1991). Indeed, this class of models can describe complex image structures, where both intensity pixels and discontinuities can be treated.

Let us start by characterizing the intensity attribute alone. In this case, if we deal with an N × N discrete image, the lattice will be the N × N pixel grid, and the site set S will be any arrangement of the grid, for instance S = {(i, j), i = 1, 2, ..., N; j = 1, 2, ..., N}. On the site set, we can define a neighborhood system 𝒢 = {𝒢_s, s ∈ S}.
If s and t are two distinct sites of the lattice, their neighborhoods, gs and g,, respectively, have the following properties: 1. segs 2. s E g,* t
E
PS.
In particular, we can define a homogeneous neighborhood system where the neighborhood of each site (i, j) is the following set:
and r is the order of the neighborhood. Associated with the neighborhood system, there is a set iZ? of cliques. A clique, C,is a single site or a subset of sites such that any two distinct sites
106
L. BEDINI ET AL.
in C are neighbors. The cliques are thus uniquely determined by the neighborhood system chosen. Let us call X a family of random variables associated with S, X = {X,, s E S), whose values x , lie on a common discrete or continuous set A, and call R the set of all the possible configurations
R
=
{x,:x,
E
A,s E S ) .
In our case, each x , represents the value assumed by X, at site s. X is an MRF with respect to the couple (S, 3 ) if and only i f p(x) > 0
vx E R ,
p ( x , I x , , V t # s) = p ( x , I x , , V t
E
(28a)
g,) V s E S , Vx E R , (28b)
where p(x) is the probability density of the random vector X. In other words, the conditional density of the element in s, given all the other elements, actually depends only on the values of its neighbors.
A. MRFs and Gibbs Distributions Any process satisfying Eq. (28a) is completely determined by specifying all the conditional densities. Nevertheless, specifying the conditional densities and deriving the related joint probability is not straightforward. This problem is overcome if the process also satisfies Eq. (28b), i.e., if the process is an MRF. Indeed, by virtue of the Clifford-Hammersley theorem, the joint density of an MRF is that of a Gibbs process, with the same neighborhood system:
where 2 is the normalizing constant, U(x) is called energy function, p is a positive parameter which controls the “peaking” of the distribution, and the potentials VJx) are functions supported on the cliques of the field. This important result allows the joint density of an MRF to be directly derived by specifying the potentials instead of the conditional probabilities. This makes it very easy to model and constrain the local behavior of an MRF with specified neighborhood and clique systems.
EDGE-PRESERVING IMAGE RECONSTRUCTION
107
Assuming an MRF image model and a linear and Gaussian data model, the posterior density (25) and the posterior energy (27) become:
and E(x)=
llg - AX112
+ -,V ( x )
2a2 P respectively. Once again, it is immediate to recognize in Eq. (31) the typical form of the cost functionals derived by the constrained optimization and standard regularization approaches, considering 2 u 2//3 as the regularization parameter. In fact, when U(x) is chosen as a measure for the image derivatives of a given order, it represents a roughness penalty, and Eq. (31) exactly corresponds to the cost functional used in standard regularization. This choice is correctly interpretable as the energy associated with a Gibbs distribution, in that it refers to a neighborhood system whose cliques are the sets of sites needed to compute locally the partial derivatives of a predefined order. In this case, U(x) can be written as
are finite-difference approximations to the kth order (k = where @(XI 0,1,2,3, ...) partial derivatives of x, expressed as functions of the pixels in C (Geman and Reynolds, 1992). Note that, for k = 0 and k = 1, respectively, Eq. (32) is reduced to the energy stabilizer in Eq. (5) and to the gradient stabilizer in Eq. (6). The entropy penalty in Eq. (7) can be seen as an energy associated with a Gibbs Distribution whose only nonzero potential cliques are those made by single elements. Another approach to establishing a prior energy consists of adopting the class of homogeneous Gaussian MRFs (GMRFs). In this subclass of models, the interaction among neighboring pixels can be given by means of the following parameteric model: x '91 . .=
c
hk,l xi-k, j-1
+ ni, j
(33)
k,leN
where: (i) N = {(k,I ) I0 < k 2 + 1' 5 r), r the order of the neighborhood, is the coefficient support region for the space-invariant neighborhood for each pixel.
108
L. BEDINI ET AL.
(ii) (hkJ are a suitable set of coefficients. (iii) ni, is a Gaussian, zero-mean, random field satisfying the following covariance constraint:
E[ni,jn,,,]
=
c'
-hi-k,j-,u2
if ( i , j ) = ( m , n ) , if ( i - m ,j - n ) E N, (34) otherwise.
It is straightforward to verify that a homogeneous GMRF has a Gibbs distribution, as in Eq. (29). In the case of first-order neighborhoods, the cliques can contain one or two sites, and the potentials V,(x) are given by
B. Introducing Discontinuities Although formally expressed as sums of local functions, the prior energies considered above force a global smoothness constraint on the image, due to the propagation of the smoothing throughout the image domain. This drawback can be overcome by introducing prior energies that permit us to locally break or relax the smoothness constraint where discontinuities are likely to occur. This can be accomplished by introducing the discontinuities as explicit auxiliary variables of the problem. The same goal can be reached using particular stabilizers that are able to preserve discontinuities without treating extra variables. In both cases, this can be done in the context of MRF models. 1. Implicit Treatment
In Eq. (32) each potential is a quadratic function of the partial derivatives. This function has the desirable properties of being positive, even, finite in zero, and increasing with the magnitude of its argument. Nevertheless, it increasingly penalizes high differences between neighboring pixels, and this prevents the discontinuities from being recovered. This can be avoided by replacing the quadratic function with a nonquadratic function which retains the good properties mentioned above but allows sharp transitions between distinct regions to be preserved. The general form of the prior
EDGE-PRESERVING IMAGE RECONSTRUCTION
109
energy in this case is
where a and A are positive parameters. Function C#J (also called neighbor interaction function; see Blake and Zisserman, 1987a) is a positive and increasing function of the derivatives @(x), thus enforcing a kth order smoothness constraint on x. Its form, however, can be chosen so as to relax this constraint where it is more likely to have a discontinuity. Various forms for 9 have been proposed in the literature, with different features concerning convexity and asymptotic behaviors. In particular, the following nonmnvex functions (Geman and McClure, 1985; Blake and Zisserman, 1987a; Gindi et al., 1991; Geman and Reynolds, 1992):
share the two properties that their limit at infinity is finite and that c # ~ ( \ / r ) is concave. The plots of these three functions are reported in Fig. 1. When these functions are used in Eq. (36), their meaning can be easily understood. They encourage neighboring pixels to have similar values if the derivatives are lower than A. Beyond this value, a further increase in the derivatives is allowed, with a relatively small increase in the penalty. The differences between neighboring pixels within smooth regions are thus penalized without excessively penalizing the larger differences occurring at the boundaries between different regions of the image. Other nonquadratic, but convex, functions are +‘l(t)
t)
=
Itl,
logmsh( t ) ,
(40)
(41) again shown in Fig. 1. The first was proposed by Besag (1989) and the second by Green (1990) for Bayesian reconstruction in emission tomography. The behavior at infinity of (40) and (41) is linear; discontinuities are thus allowed, but excessive intensity jumps are penalized. The effect of these functions is thus a compromise between a parabola and an asymptotically finite stabilizer such as those shown in Eqs. (37)-(39). Note that the convexity of functions and 45 facilitates the global minimization of the posterior energy. m5(
+,,
=
110
L. BEDINI ET AL. I .25
1.25
I
I
I
-
92w
1
1 -
0.75 0.5
-
0.25
-
0 0
1
2
3
4
0
1
2
3
4
0.5 0.25 0
2.5
5
7.5
10
FIGURE1. Plots of the neighbor interaction functions for implicit line treatment of Eqs. (37), (38), (391, (40) (dotted line), and (41).
All the functions shown above can take discontinuities into account, without introducing extra variables. However, no information on the geometrical line structure can be introduced, at least not straightforwardly. For example, these functions do not distinguish between isolated discontinuities, possibly due to peaks of noise, and connected discontinuities that are part of an object boundary. Blake and Zisserman argue that the truncated parabola of Eq. (37) has a natural hysteresis property; i.e., it tends to promote unbroken edges, without any need to impose additional penalties on line endings. Below, we will show that a more manageable and comprehensive way of treating geometrically constrained discontinuities is to consider them as explicit unknowns of the problem.
2. Explicit Treatment In this approach, the original image is regarded as a pair of interacting MRFs, (X,L), where X is the matrix of the pixel intensities and L is a new field, associated with the discontinuities. We call X the intensityprocess and L the line process.
EDGE-PRESERVING IMAGE RECONSTRUCTION
-----
---I I I I O
O
O
O
111
T
o l o l o l o l o FIGURE2. Grid of intensity and line elements.
In the simplest case L will be made of binary elements, with values 1 (‘‘line on”) and 0 (‘‘line off”). More generally, the line elements can be associated with continuous values, for instance in the range [0,1]. Typically, the line elements are localized in a rectangular interpixel grid and are distinguished into vertical and horizontal elements (Fig. 2). Under this assumption, L will be given by L, and L,, the ( N - 1) x N and N x ( N - 1) random matrices associated with the horizontal and vertical line elements, respectively. The values assumed by the elements of L, and L,. will be denoted by hi, and u,,j , respectively. The set of sites for the global field (X,L)will be given by the union of the intensity and line sites. The configuration space will be the set of pairs (x,l). In this case the neighborhood system Y!. and the related clique system I: must be defined on the mixed set of sites, allowing adjacent pixels and lines to be neighbors. Thus the prior distribution for (X,L) is
where
is the prior energy function. The potentials can be defined on homogeneous cliques (made of intensity sites alone or line sites alone) or mixed cliques (made by intensity and line sites). The general form (42b) thus admits the following decomposition:
112
L. BEDINI ET AL.
where U,<x>models the local constraints of the intensity process, U,(x,I) enforces the dependence between pixel intensities and line element configurations, and U,(I) represents the mutual relationships among neighboring line elements. Besides making the introduction of constraints on the line geometry simple and direct, this approach also allows cooperative processing; for instance, simultaneous reconstruction and edge detection can be performed. The above theory was principally stated by Geman and Geman, who referred to discrete MRFs. Because in many applications it is useful to consider the intensity field Gaussian, compound Gauss-Markov (CGM) models were proposed which combine a continuous intensity process with a discrete (binary) line process. As in the simple GMRFs, interaction among intensity pixels can be expressed in parametric form. However, the coefficients and the noise process depend on the values locally assumed by the line field. Equation (33) is thus modified as follows: x 1. . 1. =
C
'k,/(')Xi-k,j-/
+ ni,j(i)
(43)
k,lcN
where the coefficients hk,.,(l)are controlled by the line process, and the noise process is a conditionally Gaussian noise whose variance is controlled by 1. Computational properties for a broad subclass of CGM models are given in Jeng and Woods (1990). We focus on the class of piecewise smooth images, with connected and thin binary discontinuities. For this class, the neighborhood system in Fig. 3, which is of first order with respect to the intensity sites, is considered sufficient. In Fig. 4 we show two size-two line cliques, one size-four line clique, and two size-three mixed cliques. The possible configurations, up to rotation, for these line cliques are shown in Fig. 5.
I,) --loI a
-Ilo Io1
I
b
0
0 1 ° 1 0 0 C
FIGURE3. First-order neighborhood system for (a) horizontal line element; (b) vertical line element; (c) intensity element.
113
EDGE-PRESERVING IMAGE RECONSTRUCTION
a b C d e FIGURE4. Cliques for the neighborhood system in Fig. 3: (a and h) size-two line cliques; (c) size-four line clique; (d and e) size-three mixed cliques.
The mixed cliques shown in Fig. 4 allow us to enforce a local first-order I): smoothness constraint, according to the following expression for U3(x,
where A and a are positive parameters and flrepresents a threshold on the gradient, above which a discontinuity is likely to be created. In other words, the term U3(x,I)in the prior encourages solutions with discontinuities where the horizontal or vertical gradient is higher than the threshold and are smoothly varying elsewhere. Considered individually, A is a regularization parameter that promotes smoothing in the absence of discontinuities, and (Y represents the cost of creating a discontinuity, so as to prevent the creation of too many discontinuities. If a higher order neighborhood system is chosen for the intensity elements, Eq. (44) can be generalized to take into account higher order
0
0
0
0
a
0
0
0
-
0
0 1 0
0
0 1 0
0
b
0
0
9
7
1
0
0 1 0
e
d
C
-0 1 0
00
0
lo
1.1
h
i
f
FIGURE5 . Possible configurations, up to rotation, for the line cliques of Fig. 4 (a) no line; (b) termination; (c) turn; (d) straight continuation; (e) T-junction; (0cross; (g) no line; (h) single line; (i) double line.
114
L. BEDINI ET AL.
derivatives:
U3(x,l)= CE 0
A[D~(X)]~(~ - 1,) + al,
(45)
where 1, is the line element related to the clique C. The term U,(I) should reflect our prior expectations concerning the structure of discontinuities; for example, we know that lines are generally sparse and connected. A very general form for U,(l) is a table of values related to all the possible configurations for the line cliques. It can be used when there are not too many configurations. A low value of U,(l) makes the corresponding configuration more likely; conversely, the greater U,(I) is, the more unlikely the configuration will be. For instance, it is reasonable to associate a high value with the line termination configuration, in order to impose a low probability of abrupt line endings. For the line cliques in Fig. 4, the term U,(I) has the following form:
where Vl, V,, and V3 are tabular functions associated with all the possible configurations of the size-four and size-two cliques, respectively. To enforce constraints such as the favoring of line continuation and the inhibition of adjacent parallel lines, simple analytic forms for U,(I), which still refer to the same line cliques, can be given as alternatives. One of these forms is
EDGE-PRESERVING IMAGE RECONSTRUCTION
115
where the first and second terms penalize the formation of adjacent parallel lines (double lines), the next six terms favor the formation of continuous lines, and the last five terms penalize the formation of branches and crosses. The values of the parameters pi should thus be negative and can be chosen to give different probabilities to straight lines and turns. 3. Duality Theorem
Let us consider the general form of the posterior energy with explicit lines: E(x,I)
llg =
-
AxI12
2a2
+ U(x,l),
which should be minimized if a MAP estimation is required. If we define a function F ( x ) such that: F(x)
=
inf E ( x , I), I
then arg min F( x)
= x*
,
X
where (x* ,I* )
=
arg min E( x, I). (x, I)
Thus, the search for the global minimum of the posterior energy function E(x, I) can be restricted to the set of pairs (x, I*(x)), where, for each x, I*(x> is the minimum of the energy function over 1. In particular, given the structure of E(x,I), it is
where U ( x ) = inf U ( x ,I). I
For particular forms of U(x,I),U(x) can be computed analytically, and F(x) becomes a function that addresses the discontinuities implicitly rather than explicitly. Considering a prior in the form U3(x,l) in Eq. (44), and exploiting the independence of the line elements, Blake and Zisserman (1987a) calculated a U(x) in the form (36), where the neighbor interaction function 4 is the truncated parabola (37). Geman and Reynolds (1992) generalized this result and derived a duality theorem to establish the conditions on U(x) and U(x,I) for which Eq. (50) holds, so that minimizing E(x,l) is equivalent to minimizing F(x). They started from a prior energy U(x), called primal, in the form of Eq. (36).
116
L. BEDINI ET AL.
Their duality theorem shows sufficient conditions, to be satisfied by 4, for the existence of a function U(x, I) satisfying Eq. (50). U(x,I) is called dual and contains an explicit “line process’’ I, suitably correlated to the intensity process. The theorem is formulated as follows. Given a function 4(1) with the following properties in [0, +m): 1. +(O) = 0. 2. 4tfi) is concave. 3. lim, + - + ( t ) = CY < ~
+m.
then there exist two functions t ( b ) and $ ( b ) defined on an interval [O, MI such that
and satisfying the following properties: 1. 2. 3. 4. 5.
$ ( b ) is decreasing. $(O) = ff.
$(MI = 0. ( ( b ) is increasing. ( ( 0 ) = 0.
The geometrical proof of the theorem is based on the fact that ~$(\/r)is the lower envelope of a one-parameter family of straight lines y = rnt + q, where rn = t ( b ) and q = $(b). Thus, if 4(\/r) is strictly concave, then $ ( b ) is strictly decreasing, t ( b ) is strictly increasing, and M is the right-hand derivative of 4(fi) at the origin. The theorem allows us to define the dual prior energy U(x, I) in the form:
with the property that F(x), as defined in Eq. (49), can be seen as the minimum of E(x,I), as defined in Eq. (481, with respect to 1. Thus, if (x*,l*) is the minimizer of E, then x* is the minimizer of F, and the problems of minimizing functions (48) and (49) are, in this sense, equivalent. In the dual posterior energy E, the line process I is directly associated with the intensity discontinuities. As in F(x), the term [@<x)l2 enforces a magnitude limitation on the value of the kth derivative, that is, a smoothness constraint on x. Nevertheless, the increasing function &-) weakens this constraint where 1, assumes a low value, thus marking the presence of
EDGE-PRESERVING IMAGE RECONSTRUCTION
117
a discontinuity. The function $ has the effect of “balancing” the energy, thus preventing too many lines from being created. Note that, whereas so far the line elements have been considered as being binary, here they can assume continuous nonnegative values in 10, MI. Each of the three neighbor interaction functions in Eqs. (37)-(39) satisfies the requirements of the duality theorem and then leads to a corresponding dual prior energy of the type (521, for suitable forms of the functions 5 and r(l. When + ( t ) is the truncated parabola (37), Geman and Reynolds showed that Eq. (51) is satisfied with
(53a)
5(b) = b , +(b)
=
0 I b I 1;
(1 - b),
the infimum is reached with
and the line process is thus binary. In case (38) it is 5(b) +(b)
=
(b - 2 1 6
=
b,
+ l),
0 I b I 1,
from which the minimizer of Eq. (51) is b*
1
=
(t2
+ 1)2
*
For Eq. (39), Geman and Reynolds computed
and
(53b)
118
L. BEDINI ET AL.
Aubert et af. (1994) extended the duality theorem by relaxing the assumptions on 4. They proved that a function whose behavior at infinity is at most linear may act equally well in preserving discontinuities from excessive smoothing. It can easily be verified that function (41) satisfies the hypotheses of this extended theorem. The duality theory was derived for noninteracting line processes. If interaction terms between two or more discontinuities have to be included in the image model, to encourage or penalize particular line configurations, the discontinuities should be treated explicitly. Alternatively, if we want to maintain implicit lines (for instance, for computational purposes), we must suitably approximate E(x,l) to eliminate the line process. This subject will be developed in Section VI.
V. ALGORITHMS In the previous sections, we showed how the image reconstruction and restoration problems become estimation problems, in which the expectation of a particular cost function over the posterior density must be minimized. We showed three cost functions, leading to the maxima of the posterior marginals (MPM), the marginal posterior means, and the joint posterior mode (MAP) criteria. The posterior densities to be treated are rather complicated functions, especially when edge-preserving priors are used. In particular, they are generally nonconvex and depend on a very large number of variables. The MAP criterion, which requires the maximization of the joint posterior, must thus be implemented as a nonconvex optimization algorithm. The minimization of the two other cost functions shown in Section II1,C is reduced, for the MPM criterion, to the maximization of many one-dimensional functions and, for the marginal mean criterion, to the computation of many averages over one-dimensional densities. However, these one-dimensional densities are obtained by integrating the joint density over a high-dimensional domain. Stochastic methods have been studied to obtain good solutions in all these cases, but they are not the only approaches that can be used. For example, for MAP, deterministic algorithms are often more efficient than stochastic ones and can reach nearly optimal solutions. In this section, we describe stochastic methods for computing the estimates shown in Section III,C and two particular deterministic strategies for MAP estimation. Unless otherwise specified, field x can be
EDGE-PRESERVING IMAGE RECONSTRUCTION
119
interpreted both as the only intensity process and as the coupled intensity-line process.
A . Monte Carlo Methods for Marginal Modes and Averages Because of the high dimension of the joint posterior d x ) , the integrations needed to derive the marginals u i ( x i ) cannot be performed by means of deterministic numerical integration procedures, in that a very large integration grid would have to be used, and this would result in a very inefficient algorithm from a computational viewpoint. Stochastic integration methods can reduce by several orders of magnitude the number of points at which the function must be evaluated. The basic idea of stochastic integration is that a much more efficient integration grid can be obtained by sampling at a higher rate where the integrand assumes higher values. Following a Monte Carlo integration method, this is accomplished by randomly generating points of the integration domain, in accordance with a suitable probability density (see Hammersley and Handscomb, 1985). In our case, this density is d x ) ; we thus need a procedure for generating random samples of the integration space distributed as d x ) . A suitable procedure is the well-known Metropolis algorithm (Metropolis et al., 19531, which simulates the evolution of a multivariate physical system at its thermal equilibrium. Given a density function in the form d x ) = exp{ -E(x)}, the elements of x are visited in any order, and a random value is generated for each of them. Let x* be the state of x when an update for the value of a pixel is proposed, and let x k be the previous state. The value A E = E ( x * ) - E ( x k ) is then calculated; if it is negative, that is, if the update lessens the total energy of the system, the update is accepted, and we let x k + ’ = x*. If the energy change is positive, the update of the pixel is accepted with probability exp( -A.E}. In practice, a random number 7,uniformly distributed in [0,1), is generated and, if T < exp{ - A E } , the update is accepted; otherwise, the proposed update is refused, and x k + ’ = x k . It can be proved that the successive updates of x are a homogeneous Markov chain whose stationary state is distributed with density d x ) . In practice, after a sufficiently large number, say h, of updates, the system reaches the “thermal equilibrium,” and the successive updates of x are distributed in accordance with a ( x ) . In Geman and Geman (19841, another technique for drawing a sample from a probability density is shown: this is the so-called Gibbs sampler algorithm. Geman and Geman proved that, if a ( x ) is a Gibbs distribution, a random sample distributed as u ( x ) can be drawn by updating each pixel
120
L. BEDINI ET AL.
on the basis of its local conditional probability, which depends only on the state of its neighborhood. This means that the algorithm can be easily parallelized. Using a Metropolis algorithm to generate random samples, d f ) ,of the integration space and a Monte Carlo integration method, the posterior marginal for the ith pixel is
that is, for each value of x i , the marginal is given by the relative frequency of that value among the samples generated after equilibrium. From Eq. (59), the MPM estimate is given by x i = arg max a'( a)
Vi,
(I
and the marginal means are
The algorithm (61) for the estimation by the marginal means criterion is called the thresholded posterior mean (TPM), because each pixel is estimated as the mean of the values it assumes after a threshold number, h , of updates.
B. Stochastic Relawtion for MAP Estimation We said that the Metropolis algorithm is capable of generating points of the integration domain distributed according to d x ) . It is interesting to see that the exploration of the state space in this case is such that transitions with increasing energy are allowed with nonzero probability. We can control the generation of increasing energy transitions so as to skip the local energy minima and reach a global minimum. This can be accomplished by introducing a parameter to control the peaking of the posterior. This is the principle underlying the simulated annealing minimization algorithm, which is described below. 1. Simulated Annealing
Simulated annealing is the numerical counterpart of the thermal evolution of a physical system, characterized by a large number of permissible energy states, during an annealing process. The system is suitably heated, so that
EDGE-PRESERVING IMAGE RECONSTRUCTION
121
virtually any state has the same probability of occurring. The temperature is then lowered very slowly, so that the system passes through a series of states of thermal equilibrium, until it “freezes” in a minimum energy state. If the cooling schedule is too fast, the system reaches a state corresponding to a local energy minimum. The numerical procedure generates a nonuniform Markov chain whose stationary state converges to the uniform distribution over the modes of a specified energy function. Let us modify the posterior density (30) by a “temperature” parameter T:
Observe that, if the temperature is high, the density is practically flat over its domain, and all the changes proposed by a Metropolis algorithm are accepted. When the temperature goes to zero, it can be proved that (62) becomes uniform on the set of the global energy minima and zero elsewhere. If we start from high temperatures and reach the thermal equilibrium for several, slowing decreasing, values of the temperature, we are guaranteed to reach a global energy minimum. Suitable convergence criteria for simulated annealing can be found in Geman and Geman (1984) and Aarts and Korst (1989). We do not report here theoretical criteria for convergence in distribution, as they lead to computational requirements that cannot be fulfilled by any feasible procedure. However, practical criteria for reaching a good solution can be established and validated experimentally, although they do not guarantee convergence in distribution. In order to obtain feasible annealing algorithms, we must choose: 0 0
0
An initial value, To, for the temperature.
A criterion for deciding whether quasiequilibrium is reached at each temperature. A suitable cooling schedule. A criterion for determining the final temperature value and stopping the iterations.
For example, Kirkpatrick et al. (1983) proposed the following rules: 0
0
The initial temperature value is established by starting with quite a high temperature and gradually increasing it until the ratio between the numbers of accepted and proposed transitions is practically one. The iteration at temperature Tk should be long enough to permit the quasiequilibrium to be reached. This situation is reached after at least
122
L. BEDINI ET AL.
a fixed number of transitions have been accepted. The transitions are accepted with a decreasing probability for decreasing temperatures, so the iteration might be too long at low temperatures. This difficulty is overcome by fixing a maximum number of iterations per temperature value. The following cooling schedule is proposed: Tk+l = aTk,
0
where a is a real constant in the range [0.8, 0.991. The algorithm is terminated when, for a certain number of consecutive temperature values, the energy remains unaltered.
Other practical rules have been proposed by Aarts and van Laarhoven (see Aarts and Korst, 1989) and Garnero et al. (1991). 2. Mixed Annealing Introducing a line process in the image model, as shown in Section IV,B,2, raises the computational burden of minimizing the posterior. In cases in which the posterior is a convex function of the intensity process alone, the stochastic procedure of simulated annealing can be modified in order to become less expensive. In particular, in Marroquin et al. (1987), a mixed procedure is proposed, where stochastic steps are alternated with deterministic steps that support almost all the computational load of the minimization. In Bedini and Tonazzini (1992) and Bedini et al. (1993a) we propose a similar procedure (mixed annealing), with applications to image restoration and image reconstruction from projections, respectively. The minimization with respect to the intensity process is performed with a standard conjugate gradient algorithm, while the line elements are updated with a Gibbs sampler algorithm. Because the posterior energy E(x,I) is a convex function of x for any fixed configuration of the line process, the search for the global minimum can be restricted to the set of configurations (x*(I), I), where x*(l) is the minimizer of the posterior energy with the line process fixed at I (optimal conditional estimate). The nonconvex posterior energy restricted to this set is now a Gibbs energy, with the same neighborhood system as that of the prior. E(x*(I), I) can be thus minimized by an annealing scheme in which the random samples can be drawn by a Gibbs sampler. In Bedini et al. (1993a) we described this annealing procedure as follows: 1. An initial temperature, To, and a cooling schedule are chosen; the number N,)of iterations spent at temperature 7'0 and an initial guess
I, for the line process are given.
EDGE-PRESERVING IMAGE RECONSTRUCTION
123
2. For each temperature Tk and number of iterations N,, starting from the line configuration I,, the Gibbs sampler draws a line process sample I,+ I , with density
3. New values Tk+ and Nk+ are selected, and the algorithm repeats steps 2 and 3, until the stop criterion is satisfied. Note that, at each update of a single line element by the Gibbs sampler, a new optimal conditional estimate x* should be evaluated. For this reason, the algorithm is still very expensive, but its most expensive part is the deterministic one. Due to the small size of the neighborhoods, and because the line process is binary, the cost of drawing a sample from the line process is low. In Bedini and Tonazzini (19921, we proposed a possible dedicated architecture to perform the mixed-annealing algorithm. It is based on an analog Hopfield neural network and a grid of digital processors. The neural network performs the deterministic minimization over the intensity process. The digital grid, which receives the intensity process as its input, implements the Gibbs sampler algorithm. We also proposed an approximated version of the same algorithm, which experimentally gave good results and can be implemented on conventional hardware. Let 1; be a generic configuration generated by the Gibbs sampler starting from 1, and differing in only a few elements from I,. Let us assume that E(x* (I; ), I;) = E(x* (I,), I; ). Step 2 in the mixed-annealing algorithm can thus be split as follows: 2a. Compute x*(l,) for a given I,. 2b. Starting from I,, the Gibbs sampler draws a line process sample with density
Assuming that a complete scan of the line grid slightly modifies the status of the line process, using Eq. (64) rather than Eq. (63) can considerably reduce the computational cost of the algorithm. In Section VIII, we show some results obtained using this approximated algorithm.
124
L. BEDINI ET AL.
C. Suboptimal Algorithms
In Section IV,B,l we said that the advantages of treating discontinuities implicitly are mainly computational. In fact, by eliminating the line process we can devise deterministic minimization algorithms, which are generally cheaper than stochastic ones, although they give no theoretical guarantee of global convergence. Note that the practical annealing algorithms do not give such a guarantee either. In this section, we present two classes of algorithms that can be totally or partially deterministic and can also treat explicit discontinuities, by eliminating the line process using the duality theory or by splitting a minimization step as in the mixed-annealing procedure. 1. Graduated Nonconvexity An implicitly edge-preserving neighbor interaction function, e.g., one of
those shown in Section IV,B,l, eliminates the explicit line process from the posterior, but the posterior is still a nonconvex function. Blake and Zisserman (1987a) derived a fully deterministic algorithm to minimize a posterior energy with a neighbor interaction function of the type (37); this algorithm can also be extended to other types of functions. Blake (1989) proved its lower complexity when compared with stochastic relaxation techniques. This algorithm is called graduated nonconvexity (GNC). It is based on a sequence of approximations, FP(x), of the posterior energy in the form of Eq. (49), depending on a real parameter, p E [O,p*l, and such that Fo(x) = F(x) and FP*(x) is a convex function. Gradient descent algorithms are then applied to minimize the modified posterior energies, for decreasing values of p . starting from p = p * . The starting point of each minimization is the minimizer found for the previous value of p. If F P * is already a good approximation of F, then the algorithm can reach from the first iteration a point that is close to the desired global minimizer. As FP approaches F, this estimate is refined, so that a good approximation of the global minimizer can be obtained. Once the suboptimal estimate for the intensity x has been obtained, an explicit line process can be recovered from the neighbor interaction function, as shown in Section IV,C. The fundamental step for the implementation of a GNC procedure is the creation of a sequence of approximations such as those described above. This step is specific to any particular posterior energy. For the usual Gaussian likelihood and the implicit weak membrane model, Blake and
EDGE-PRESERVING IMAGE RECONSTRUCTION
125
Zisserman built the following primal energy:
with +(t)
=
min( a ,At').
(66)
To construct their GNC procedure, they substituted this neighbor interaction function with the following series of piecewise polynomials
\a
otherwise,
with C*
c = -
P
9
2 r'="(T+:),
and
a q=-.
Ar
Figure 6 shows a typical diagram for a function in the form (67).
0.5 1 I .5 2 FIGURE6. Typical plot of a function in the form (67),which approximates function (37) (see Fig. 1 for a comparison). 0
126
L. BEDlNl ET AL.
Gerace (1992) derived a series of approximations for a function of the type (39). These are
I.’
+4
otherwise
with aAp - p’(Ap
a A’
r=
2 p ( Ap
4=
+ a)‘ ’
Ap
+a)
+a
It is straightforward to verify that, for p = 0, Eq. (67) is reduced to Eq. (66) and Eq. (68) to a function of the type (39). The important issue is now the search for a value p* such that the resulting FP* is convex. In Blake and Zisserman (1987a1, this is done by “balancing” the positive second derivative of the likelihood term against the negative second derivative of +P*. If c# is designed to satisfy &pP*
at2
2 -c*
vt,
then the Hessian of FP* is positive definite. In their case, Blake and Zisserman found the condition 0 < c* < 1/(8a2). In practice, c* should be chosen such that +“* is as close as possible to and this leads to c* = 1/(8a2). The application of the same criterion to Eq. (68) leads to the condition (Gerace, 1992)
+,
c*
2
A 2a
and
for generic values of h and a. In Fig. 7, we show an example of function (68). Recently More and Wu (1995) outlined a general method for obtaining a family of approximations for an objective function. This method is based on applying the Gaussian filter to the objective function. The role of parameter p is played here by the variance of the Gaussian kernel. Under fairly general assumptions, it can be proved that a value of the variance
EDGE-PRESERVING IMAGE RECONSTRUCTION I
I
1
1
I
1
I
I
127
4JP(t)
I
0.5
0
6 8 10 FIGURE 7. Typical plot of a function in the form (68), which approximates function (39) (see Fig. 1 for a comparison). 0
2
4
always exists beyond which the filtered function is convex; this value plays the role of p*. Applying the Gaussian filter to a general function is not easy; however, it becomes easier if the function assumes a particular form. 2. Generalized Expectation Maximization The expectation-maximization(EM) approach was proposed to solve maximum-likelihood estimation problems (Dempster et al., 1977) and has been applied to tomographic reconstruction problems (see Shepp and Vardi, 1982; Lange and Carson, 1984; Levitan and Herman, 1987; Hebert and Leahy, 1989; Green, 1990; Hebert and Gopal, 1992). Note that, in these applications, the posterior density is not Gibbsian, because the integral relationship between the data and the image gives rise to neighborhoods that coincide with the entire image. This means that the computation required for maximum likelihood or MAP is much more complex than the one required for other image processing applications. In particular, the large “posterior” neighborhood prevents the application of efficient parallel procedures. The EM approach was devised as a tool to develop effective optimization algorithms, although their convergence is guaranteed only on a stationary point of the objective function. This approach consists of reformulating the estimation problem in a new parameter space, where the optimization is simpler than the one to be performed in the original space. The key point of the procedure is, in fact, to define this auxiliary space; in practice, this definition is often suggested by physical considerations,
128
L. BEDINI ET AL.
although a physical relationship between the original and the auxiliary variables is not actually needed. Suppose we have a sample space G of observable data, denoted the incomplete data space. We use a particular realization g E G to estimate a solution x, by maximizing the likelihood function f(g 1x1. The direct maximization of function f may be difficult. The basic idea is to solve our problem in a new sample space, 2, called the complete data space, where this problem is simpler. Space Z does not need any particular property, and its elements do not need to be observable quantities. The only requisite is that the elements of G must be obtained by an explicit many-to-one relation from the elements of Z. Let us denote with z an element of 2, with /(z 1x1 the likelihood function in the new space, and with h : Z + G the many-to-one relation. The EM procedure consists of iteratively finding the maximizer over x of the likelihood f, making use of the associated density /. To obtain this result, we establish an initial guess xo and perform iteratively the following two steps: E-step: calculate the expectation of the function lod/'(z I x)] conditioned by the observed data g and the current estimate X":
&[log[P(z Ix)llg,x"]. (71a) M-step: calculate the new estimate x " + I as the maximizer of (71a):
These two steps can be interpreted intuitively as follows. We cannot directly maximize the function log[/(z 1x11, because it has been derived in an unobservable space. We are thus able only to maximize its expectation conditioned to the knowledge of the observed data and the current estimate of the unknowns. The proof that the successive estimates of x converge to a stationary point of f<s 1x1 can be found in Dempster et al. (1977). This property remains valid even if the M-step is not actually a maximization but is substituted by the following: M'-step: determine a new estimate x"" such that The EM algorithm, modified by substituting Eq. (71b) with Eq. (72), is known as the generalized ~pectation-maximization(GEM) algorithm. Its convergence properties are roughly the same as those of the original EM, except the convergence rate, which is slower in GEM. For the choice between the two approaches, one must bear in mind that the higher
EDGE-PRESERVING IMAGE RECONSTRUCTION
129
convergence rate of EM is normally paid in computational complexity, which is higher for step (71b) than for step (72). The choice should thus be guided by considerations of the complexity in evaluating the objective function. The EM/GEM strategy can be applied to a MAP problem by simply adding the log-prior - U(x) to the function to be maximized (or increased) in the M-step (or M'-step). A GEM strategy can also be applied in cases of posteriors with explicit line processes (see Bedini et al., 1994d1, by splitting the M-step as follows: M,-step: find an x"+ such that: E,[log[J(z
IX"+
')I Ig, x " ] - U(X"+
2 E,[log[/'(zIx")]lg,x"]
M,-step: find an 1""
1,
I")
- U(x",I")
(73)
such that
E,[log[/(
2
Ix"+
')I Ig,x"] - U(X"+
1,
2 ~ ~ [ l o g [ / ' ( z l x " + ~ ) ] l g , x "-]
I"+
1)
U(X"+l,l").(74)
These relations are easily verified, bearing in mind that 0
The line process, as defined in Section IV,B,2, is unobservable and thus does not appear in the likelihood function /'. The log-prior does not depend on either the complete or the incomplete data.
Note that the M-step is split almost as in mixed annealing. In Section VIII, we show an application of this strategy, where the M,-step is performed stochastically. AN IMPLICIT LINEPROCESS VI. CONSTRAINING
So far our analysis of edge-preserving reconstruction techniques has identified two major requirements: (1) feasible image models that can capture all available information, and (2) efficient, even if only near-optimal, algorithms. It would appear that none of the methods reviewed here can deal with both these requirements. Specifically, the relatively efficient deterministic algorithms have all been derived for models that treat implicit discontinuities and do not allow for self-interactions. This prevents the exploitation of the important piece of prior information regarding the geometrical features of the discontinuities. For instance, we know that in most real images discontinuities tend to form connected, thin, and closed
130
L. BEDINI ET AL.
curves, which occasionally have sharp direction changes (corner points, crosses, T-junctions, etc.). This can be seen as a sort of local smoothness property of the line field, which recursively extends the one of the intensity field. This fact has been well described and handled in a continuous setting (March, 1992). As already stressed, the MRF-based approach with explicit lines is very good at modeling constrained discontinuities, but at the price of high nonconvexity for the related energy functions. Moreover, because these functions have mixed, continuous, and binary variables, it is difficult to devise an optimization scheme besides stochastic relaxation. Modeling constraints on the discontinuity field becomes much more complicated when the line process is addressed implicitly. In fact, all the neighbor interaction functions proposed so far can only predict the presence or the absence of a discontinuity, and a formal relationship among the implicit and explicit approaches has been found only for noninteracting lines. Since disregarding evident and well-defined properties of the field to be reconstructed is not in the spirit of regularization, some attempts have been made to deal with the problem of addressing a constrained line process while maintaining the computational advantages of deterministic minimization. All the methods reviewed below are based on considering a model with explicit and self-interacting discontinuities and then on adopting suitable approximations of the prior energy U(x,I) that allow the elimination of the line process itself, thus resulting in the implicit line treatment. These methods have been designed for very simple self-interactions of the lines, such as expressing a line continuation constraint or a penalization of parallel adjacent lines. The line continuation constraint, also referred to as the hysteresis property, is characterized by the following form of U2W: '2(')
C h l , J h l , J +Elf f
C
I.J
1.1
-
=
"l,J'l+1,J9
(75)
where parameter E takes positive values in [0,1) and controls the amount of line propagation. The price to be paid to create a discontinuity is decreased by E(Y when a discontinuity at a neighboring site is present. The discontinuity field is generally thin, in the sense that multiple responses to a single edge are not feasible (nonmaximum suppression) (Canny, 1986). Thus, to penalize the formation of adjacent parallel lines, the following form can be chosen for U2(1):
Uz(1)
=
+Y
C
hi.jht+l,j
I7J
where y can now take any positive value.
+Y
C 1,J
ui,,ui,j+]*
(76)
131
EDGE-PRESERVING IMAGE RECONSTRUCTION
Below, we will restrict our analysis to the line continuation constraint; parallel line inhibition is a slight modification of the former. A. Mean Field Approximation Geiger and Girosi (19911, in the framework of statistical mechanics and mean field techniques, derived an approximated solution for both the intensity and the discontinuity fields, which is also suitable for treating self-interacting discontinuities. They first considered MRF models without interactions of the line process and used the mean field theory to obtain mean statistical values for the intensity field x and for the line process 1. These values are actually functions of the data and the partition function 2. Because of the practical difficulties with computing the partition function, they proposed to eliminate the line process I from 2 and then derived some approximations to obtain a set of nonlinear equations with reduced complexity. They called these equations deterministic to stress the deterministic character of the whole procedure. More specifically, they considered the function Z in the form:
z
=
C exp[ -E(x,I)/T]
(77)
x, I
where the energy E(x,I) is the sum of the data term and the prior energy U(x,I) which, in this case, consists of the mixed term alone. Owing to the independence of the single line elements, the summation on I can be computed analytically and this results in a new expression for Z, where the mixed term now assumes the form of a function U,(X) depending on the temperature T. Assuming that the fluctuations around the mean values are small, suitable approximations can be adopted to obtain the following set of equations for the mean values X i , j , j , and Ei, j .
zj,
21.1 . . = g1.1 ..-2u2A[(X.1.1. - X i , j + l ) ( l +(ii,j
-Ei+l,j)(l
-
-zi,j)
-Ei,j) - ( X i , j - l -Xi,j)(l
-
(Xi-1.j
X.1 . 1.)(1 - % j - , , j ) ] ,
(78a)
1
h 1.1 . .=
-u.
-
-Ei,j-l)
.
1.1
=
(78b) 1
( 78c)
132
L. BEDINI ET AL.
This set of equations can be solved using a fast, parallel, and iterative scheme. Note that the family of functions &(XI can be regarded as a family of approximating functions to be used in a GNC algorithm, with the neighborhood interaction function given by:
( f
~ $ ~ (=t A) t 2 - T In 1 + exp - -(
[
cy
- At2)
)].
(79)
In particular, it is straightfonvard to verify that becomes the truncated parabola, when T goes to zero. Gerace (1992) showed that, using Eqs. (78), a suitably large value of T exists such that UT(x)is convex, as required by the GNC algorithm. Geiger and Girosi also showed that the case of interacting discontinuities can be treated by simply augmenting the prior energy with a term U2(l). They explicitly considered the case in which the line continuation constraint is enforced [Eq. (791 and again derived a set of deterministic equations in the form (78a) for the mean values Zi,j , while for the mean values and Ei, they obtained:
xi,
1
The main drawback in this case is that, for each iteration, the transcendental equations (80) must be solved. In Geiger and Girosi (1989), ad hoc approximations are adopted to obtain a local version of Eqs. (80). Similar equations can be derived when different constraints are considered, for instance to enforce constraints which penalize the formation of adjacent parallel lines [Eq. (7611.
B. ExtendedGNC A different way to implicitly manage the line continuation constraint is given in Bedini et al. (1994a, b). It consists of approximating U2(I) in Eq.
133
EDGE-PRESERVING IMAGE RECONSTRUCTION
(75) in the following way:
where
In practice, this assumption means that we approximate the true values of with the functions 5 ( x i ,j - ! - x i + j - 1 the line elements hi,j - and 0;and t ( x i - l , j - x i - I , j + l ) which depend only on the intensity gradients across the line elements themselves. The approximated energy is 9
which can easily be minimized over I to give the following primal energy:
+
C 4( x i , j
- xi + 1 , j
9
xi. j + 1 - xi + 1, j + 1 )
(84)
3
i,j
where the neighbor interaction function
4(ll,t2)= min A ( l - s ) t : S
+ as - & a s t ( t 2 ) ,
tl,t2E R,s
does not depend on the particular site (i, j ) and is given by
E
(0,l)
134
L. BEDlNl ET AL.
with
The term \/a(1 - & ) / A is called suprathreshold. If the intensity gradient is greater than the threshold a line element will be created; if the intensity gradient is lower than the suprathreshold, the smoothness constraint will become active; if the gradient value falls between the suprathreshold and the threshold, the creation of a line will depend on the gradient across a neighboring line element. In Fig. 8, a surface plot of function (85) is reported for particular values of the parameters a,A, and
m,
E.
Analogous results can be obtained when U,(I) assumes a form which inhibits line parallelism; it suffices to substitute -&a with y in Eqs. (83) and (8617 ( X , , , t I - x, i I , , + I ) with ( X , + l . , - X,+Z.,)’ and ( X i + l . , - x, t I., + I ) with ( x , , , + I - x ,,,, 2 ) in (811, (83) and (84). To minimize the primal energy (84) using a GNC algorithm, a convex approximation F* (x) must first be provided, by constructing an appropriate neighbor interaction function &*. Following the criterion given by Blake and Zisserman (1987a1, in Bedini et al. (1994a) we proved that, when A = I, if 4* satisfies
a *&*
-(t,,t*) at;
2 -c*
where 0 < c* < 1/32a2, then the Hessian of F* is positive definite. For adjacent parallel line inhibition, the same result holds with 0 < c* < 1/24a2. In practice, to have &* as close as possible to 4, we set c* = 1/32a2. We first derived a two-parameter family of energy functions F ( p > ” ) , which are continuous and continuously differentiable, and identified from them a convex function, by applying inequalities (87). The F ( P * u )are ~ constructed by replacing 4 in (84) by suitable neighbor interaction func-
EDGE-PRESERVING IMAGE RECONSTRUCTION
135
FIGURE8. Typical surface plot of the neighbor interaction function (85).
tions 4 ( P v u ) , whose definition follows the same criteria adopted to derive functions 4“ in the previous section. In formulas, we have
otherwise, (88)
m,
where s = + P ( t ) is given by Eq. (67), with c* = 1 / 3 2 a 2 , and 4,P(t)is the same as 4P(t>,with a substituted by a ( l - E ) . Function U ( P , ~ ) ( ~is) given by
In Fig. 9, a surface plot of function (88) is shown, for particular values of the parameters A, a, E , p , and u. The neighbor interaction function 4(tl,t 2 ) can be recovered from Eqs. (88) and (89) in the limit of p to 0 and u to \lcu/h. The upper bounds for both p and u are given by those values p* and u* for which the corresponding F* = F ( P * * ’ * ) is found to be
136
L. BEDINI ET AL.
FIGURE9. Typical surface plot of a function in the form (881, which approximates function (85) (see Fig. 8 for a comparison).
convex. In Bedini et al. (1994a) we proved that p* and u* must satisfy the following inequalities: p* 2 1,
(90a)
2 ( 6-
d
m
6 ( u * - s)
2p*A
+ c*
) 9
8A s c * ( u * - s)
-
In order to find suitable values for u* and p* from these inequalities, we ) first chose u* such that (90b) is verified; then, substituting u* in ( 9 0 ~ and (90d), we looked for a p* 2 1 that verifies them. Such a p* will always exist because, in the limit of p* to infinity, the left-hand terms of both (904 and (90d) go to infinity. Moreover, the greater the u* chosen, the smaller the p* needed. The GNC algorithm begins by minimizing F* = F(P**'*).Then p is For every decreased from p* to 0, while u is decreased from u* to fl. value of p and u, F(P,") is minimized by a gradient descent, starting with ) . call this algorithm extendedthe local minimum of the previous F ( P * UWe GNC (E-GNC).
137
EDGE-PRESERVING IMAGE RECONSTRUCTION
C. Sigmoidal Approximation In Bedini et al. (1995), we proposed a way to eliminate the line process from a generic energy function E(x, I) of the following form:
+A
C ( x i , j -xi+*,j)*(1 i.j
- hi,j) + a
C h i . j + a C ui,j i,j
i,j
C Q h ( h i . j ~ h m , n ~ u m , n ~ ( ~ ~ Nn h) ( i y j ) ) i.j
+ C Qo(ui, j ' h m , n urn,,, ( m , n)
E
N ( i 9
d),
(91)
i7i
where the sixth and seventh term express constraints on the configurations of the discontinuities, through self-interactions of any order of the lines. The order of the interactions is determined by the size of the neighborhoods, Nh and N,, adopted for the generic horizontal and vertical line elements, respectively. We suggested substituting each line element in (91) with a function qT(r),depending on a parameter T and with values in [O, 11, where t is some measure related to the local intensity gradient. Function q T ( t ) is chosen with the following properties: (a) For any T > 0, q,(t) is increasing and continuously differentiable. (b) as T goes to zero, q T ( t ) converges to the function
where 8 is the step threshold, which may be site dependent. We chose our functions q T ( t )in the family of the sigmoid functions and assumed the difference between two adjacent pixels as a measure of the local intensity gradient. For the generic horizontal line element, hi,j , we set
138
L. BEDINI ET AL.
and, for the generic vertical line element, ui,j , 1
(92b) In Fig. 10, plots of the sigmoid function q T ( t )are reported for different values of temperature T and for 0 = 60. Given Eqs. (92a) and (92b), Eq. (91) assumes the form:
'- 4
T = 1000 1r = 2000 1r = 3000
p
0.a-
0.60.4-
t
O
0
20
, 40
60
b 80
FIGURE10. m i c a 1 plots of the sigmoid function of temperature.
, 100 120 q T ( t )of Eqs. (92) for different values
EDGE-PRESERVING IMAGE RECONSTRUCTION
139
ensures that there exists a function F(x), in general nonconvex and nondifferentiable, which is the limit of the sequence F,(x) as T goes to zero. Moreover, a value for T always exists such that the corresponding FT(x) is convex. Indeed, it can be immediately verified that, when T goes to infinity, the sigmoid function converges to the constant 1/2. In practice, a finite value, even if it is high, for T*, such that F J x ) is convex can be found, following the general criterion established by (87). The application of this criterion under the nonrestrictive hypothesis that the discrete gradient of x is bounded, leads to an inequality for T* which depends only on A and a. These conditions permit the application of GNC-type algorithms, based on the successive minimization of the various F,(x) via gradient descent techniques. Note the form that F ( x ) assumes in particular cases. Let us first assume Q,, = Q, = 0; in this case, Eq. (91) becomes the weak membrane energy. The extended form of Eq. (93) is now given by llg - AxlI2 FT(X) =
2a2 . j ) ~ e x -((x;,j p[ -xi+l.j)2 e 2 ) ) / ~+] a + cA(Xi+j- ~1i ++ Iexp[ -((xi,j r+l.j)2- e 2 ) / ~ ] -
-x.
i. j
which, in the limit of T to zero, becomes:
where +(t)
=
(
it2
if I ~
m.
(96)
is the truncated parabola if 0 = In Section V, we showed the families of approximations derived by Blake and Zisserman for this neighbor interaction function and, in Section VI,A, those proposed by Geiger and Girosi. Thus, when no constraints are
140
L. BEDINI ET AL.
forced on the line configurations, our method gives a different way to obtain approximations for the truncated parabolas. Let us consider now the line continuation constraint, i.e., Qh
=
C hi,jhi,
j +1 9
i ,j
Qv
=
C ui,jui+1,j* i ,j
The extended form of Eq. (93) is now given by FdX)
Ik - Axil2 =
2a2
&ff
(97)
EDGE-PRESERVING IMAGE RECONSTRUCTION
In the limit for T to zero, Eq. (98) becomes
where
4 ( t )is the same as in Eq. (961, and
with -
8)
if It21 < 0, otherwise.
As shown in Eqs. (961, (1001, and (1011, parameter 0 defines a threshold for the intensity gradient above which the smoothness constraint is broken. In the cases of more complicated and higher order interactions, it is difficult to explain the meaning of F(x) with respect to the line process. However, from the cases examined, it appear that the model proposed here for the implicit treatment of the lines can be considered as an extension of the one derived from the weak membrane. VII. DETERMINING THE FREEPARAMETERS All the methods described above have been developed assuming that the parameters appearing in the cost functionals are known exactly. These parameters are the variance of the noise CT the regularization parameter, A, and, more generally, all the free parameters (also referred to as hypetparurneters) needed to specify completely the MRF image model. Most works on image reconstruction assume that the noise variance and the degradation operator A, related to the imaging system, are known. This means that the likelihood function is completely specified. Without this assumption, the reconstruction problem would result in the so-called blind restoration problem, which requires techniques of system identification. As this is beyond the scope of this chapter, we consider as a valid assumption in our case the a priori knowledge of A and CT’. Similar
’,
142
L. BEDINI ET AL.
considerations cannot be applied to the free parameters in the MRF model, in that usually we do not have enough information on the true image to determine them satisfactorily. Let q denote the vector of all these hyperparameters, including the regularization parameter. Estimating q is a critical issue in image reconstruction and restoration methods, as it considerably affects the solutions obtained. Let us consider some simple cases. When q contains only parameter A, as in standard regularization or maximum entropy, it is easy to verify that, by modifying its value, very different solutions can be obtained, ranging from the ultrarough leastsquares solution, when A = 0, to the ultrasmooth solution, when A goes to +a. Moreover, experiments showed that the best value for A is very sensitive to the stabilizer adopted, the image structure, and the amount of noise affecting the data. When MRF models with noninteracting discontinuities are adopted, another fundamental parameter is the gradient threshold for detecting a line. Its value should be related to the effective minimum value of those horizontal and vertical gradients that, in the true image, correspond to the edges to be preserved. In practice, an exact value of the threshold is not available for real images. Bounds can, however, be obtained, considering the resolution of the imaging system and the variance of the noise affecting the data. In many applications, it could be useful to consider the threshold as a dynamic parameter, whose value can change through the various regions of the image. When a line continuation constraint is included in the model, a positive paramter E , less than one, is needed to control the propagation of the lines [see Eq. (7511. Its effect is described through the following example. If the threshold is chosen to be lower than the minimum intensity gradient present in the original image, but comparable to the variance of the noise, even the finest discontinuities would be detected; however, the noise cannot be removed completely. In this situation, any nonzero value for E would be unfeasible, as it would further propagate the incorrect edges. Increasing the threshold removes the spurious edges but can also prevent the detection of the edges corresponding to the lowest discontinuities in the original image. This causes an oversmoothed reconstruction. In this case, it would be useful to give E a nonzero values, say E = 0.5, as this favors the continuation of the correct edges. However, if E is too high, the reconstruction deteriorates, owing to the excessive proliferation of the edges. Most methods for parameter estimation are based on recursively reconstructing the image using the current parameter estimates and then using it to estimate a new set of parameters. Another approach is to determine the hyperparameters “off line” from training data, that is, from one or
EDGE-PRESERVING IMAGE RECONSTRUCTION
143
more sample images which represent the class of the expected solutions. Once the estimation has been performed, the reconstruction task is accomplished assuming the hyperparameters to be known. Below, we briefly overview the most common methods developed in both approaches. A. Regularization Parameter
In the simplest methods for regularizing ill-posed problems, the cost functional to be minimized depends only on the smoothing parameter A. Several data-driven methods have been described in the literature for its estimation. In standard regularization with quadratic stabilizers C(x) = IlCx I’, where C is a positive-definite matrix, the solution depends on A through the equation x( A)
=
(A%
+ ACTC)-’ATg.
( 102)
As mentioned in Section 11, the oldest idea for determining the regularization parameter is to consider it as a Lagrange multiplier of an equivalent constrained problem (Andrews and Hunt, 1977; Luenberger, 1984). This approach leads to the two following criteria (Bertero et al., 1988): 1. Among all x such that llCx11’ IE, find the one that minimizes IIAx - 811.’ Using the method of Lagrange multipliers the solution to this problem has the form (1021, where A is the unique solution of the equation IICx(A)II’ = E. The resulting solution is the one that satisfies the constraint and best fits the data. 2. Among all x such that IlAx - 811’ 5 E , find the one that minimizes 11Cx11’. Again, the solution t o this problem has the form (102), where A is the unique solution of IIAx(A) - 811’ = E . This is also called Morozou’s discrepancy principle (Morozov, 1966). The solution is the one which is sufficiently close to the data and is most regular. Unfortunately, in both cases 1 and 2, the calculation of A is not straightforward, as it involves solving nonlinear equations. A third criterion, which gives an explicit expression for A, consists of looking for an x(A) in the form (102) that satisfies both the constraints IICx11’ IE and 11 A x - 811’ IE . This has been shown to be equivalent to taking A = E / E (Miller, 1970). This solution is a compromise between regularization and closeness to the data. The difficulty in this case is to get estimates of E and E. Again for quadratic stabilizers, statistical criteria have been proposed based on minimizing an average risk depending on A (Kay, 1988; Thomp-
144
L. BEDINI ET AL.
son et al., 1991). In particular, considering a loss measure, QMA),x), between x(A) and the true solution x, the value of A which minimizes the expectation of Q conditioned on the data would be a reasonable choice. The risk for estimation is defined as E(IMA) - x1I2}, while the risk for prediction is defined as E{IIAx(A) - Axil2]. Unfortunately, these risks depend on the true solution, which is unknown, so they have to be estimated from the data. This leads to a number of criteria, the most popular being the chi-squared, the equivalent degrees of freedom, and the generalized cross-validation (Golub et al., 1979; Hall and Titterington, 1987). In the chi-squared criterion, A is defined as the solution, &HI, of the equation: RSS( A)
=
IIAx( A) - g1I2 = N u 2 .
(103)
This is due to the fact that, in the presence of Gaussian noise, the residual IlAx - g1I2 follows a chi-squared distribution with N degrees of freedom, where N is the size of x. It is immediate to recognize that the method is equivalent to the constrained optimization method 2 when E = Nu2 (Hunt, 1973). Nevertheless, the computation of x(A) by Eq. (102) involves a loss in degrees of freedom for the residual, which has been evaluated. On the basis of this consideration, the so-called equivalent degrees of freedom criterion has been proposed, in which A is computed as the solution, AEDF, of the equation: RSS( A)
= u 2 EDF(
A)
( 104)
where EDF(A) = N - tr[K(A)] and H A ) = A(A% + ACTC)-'AT. Thompson et al. (1991) proved that A,,, produces solutions that are smoother than the ones produced by A,,,. This is confirmed by the fact that, since the residual is a monotonic increasing function of A, it is &HI > AEDF. Generalized cross-ualidation consists of removing a data point gi and predicting it by using the solution computed from the remaining data. As an estimate for A, the value A,, that minimizes the mean squared error between all the data and their predictions is chosen. Unlike chi-squared and equivalent degrees of freedom, generalized cross-validation does not require knowledge of the noise variance. Moreover, it has been shown that this estimate converges to the one produced by the risk for prediction criterion when N is large. In formulas it is A,,
=
arg min A RSS( A) / [EDF( A)]
(105)
EDGE-PRESERVING IMAGE RECONSTRUCTION
145
When substituted in Eq. (1041, AGcv gives a data-based estimate of the unknown noise variance and, as long as this estimate is reliable, this implies that A,, and A,,, are comparable (Thompson et al., 1991). As for the Lagrange multiplier, the computation of these estimates entails the iterative solution of nonlinear equations and the computation of the solution x(A) at each iteration. An explicit expression for x(A) can, however, be found in terms of the eigenvalues and the eigenvectors of the matrices involved in Eq. (102). In most cases, these matrices are blockTopelitz and thus diagonalizable by discrete FFT when approximated as block-circulant matrices (Hunt, 1973; Andrews and Hunt, 1977; Kay, 1988). These computations can be accomplished only once, off line; moreover, solving the equations for A, both the estimation of the regularization parameter and the optimal reconstruction for x are simultaneously obtained. Alternative approaches have been derived to estimate the regularization parameter when the stabilizer is nonquadratic, as in the case of implicitly addressed discontinuities. They are mainly based on maximum likelihood (ML) estimations, assuming a uniform probability distribution for the regularization parameter (Geman and McClure, 1985). Let us rewrite the prior energy in Eq. (36) as
U(X)
=
aV(x),
where the known threshold A is incorporated in V(x). If the true solution, 2, of the reconstruction problem were known, the maximizer of the prior probability given 2 would be given by the unique solution of the equation:
( 106) whose left-hand side is a monotonic decreasing function of a, which is independent of the data and can be tabulated off line, for a number of values of a. The maximizer of the posterior probability given 2 is the solution of the equation
Ea[V/(x)]
=
V(9,
E a [ V ( x )181 = V ( 2 ) . (107) Combining Eqs. (106) and (1071, we have the following equation: Ea [ V(x)1
= ‘a
[ v(x>181
9
( 108)
which has to be used in real cases, where the true solution is not known in advance. To solve Eq. (1081, an iterative procedure can be devised in which, given the current estimate of cr, the right-hand side is computed by drawing samples of x from the posterior, and then a new value of a is computed by solving the resulting single-solution equation (Geman and McClure, 1987).
146
L. BEDlNl El' AL.
This procedure, which, however, presents extremely high complexity for the tabulation of En[V(x)l, permits the estimation of the regularization parameter and, simultaneously, the computation of the solution according to some Bayesian criterion. As we will see in the next section, this method is a particular instance of a more general method in which ML estimations of all the hyperparameters and MAP image reconstructions are performed alternately. 0. MRF Hypeparameters
Let us assume now a general image model, characterized by a prior energy U,(x) of any form. The subscript q has been introduced to highlight the dependence of U on the hyperparameters. Unless specified, x stands for both the intensity process alone and the intensity process plus the line process. When reconstruction has to be performed by MAP estimation, an approach to estimating the hyperparameters could be to assume a uniform prior probability for q and then estimate at the same time the hyperparameters and the object x, by maximizing the posterior distribution p(x Ig,q). In this case, however, the data probability density, p(g), cannot be left apart, since it is also dependent on q. The computations required are thus impracticable. A different distribution, which has been shown to be relatively easier to compute, is p(x, g 19). On this basis, x and q are given by the solution of the following problem:
The joint maximization (109) is still a difficult task, so that some strategies must be found to reduce its computation cost. Besag (1986) and Lakshmanan and Derin (19891, among others, adopted the following suboptimal iterative procedure: x k = arg maxp(x,g ISk),
(110a)
X
q k + ' = arg maxp(xk,g Is).
( 1lob)
9
Starting from an initial guess qo and iterating steps (110a) and (110b), a sequence ( x k , q k ) which converges to a local maximum of p ( x , g lq) is obtained. In this sense, (110) is weaker than (109); however, if (x*,q*) is the solution of Eqs. (1101, x* is the MAP estimate of x based on g and q*, and q* is the ML estimate of q based on x*. In fact, it is P(x,g Is*)
=P(X
Ig,q*) P ( g lq*).
(111)
EDGE-PRESERVING IMAGE RECONSTRUCTION
147
Since p ( g I q*) is independent of x, the global maximum of p(x, g I q* with respect to x coincides with the global maximum of the posterior energy p(xl g, q* ). Moreover, it is P(X*,
g Is)
=P(g
Ix* ) P(X* 19).
(112)
Since p(g I x * ) is independent of q, the global maximum of p(x*, g 19) with respect to q coincides with the global maximum of the prior distribution p(x* 1s). The same considerations hold at each stage of the iterative procedure (110). Thus, to obtain (x*, q*) several MAP estimates for x and several ML estimates for q need to be computed. For the MAP estimate, depending on the form of the prior Uq(x) and on the explicit or implicit presence of a line process, one of the algorithms described in Sections V and VI can be adopted. For image restoration with implicit lines, Besag (1986) proposed carrying out a single cycle of ICM to approximate the solution of Eq. (110a) at each iteration, and Lakshmanan and Derin (1989) proposed simulated annealing with an application to image segmentation. The main problem in solving (110) is to find an efficient algorithm to compute the ML estimate of q in step (110b). Indeed, the difficulty with Eq. (110b) is related to the presence in the prior distribution of the normalizing constant which depends not linearly on q. More specifically, it is (113a) ( 113b) X
One possible approach is to compute the gradient of p(x* Iq) with respect to q and set it to zero. This results in the following set of equations: which expresses the necessary conditions for a vector q to be a maximum of the distribution (113). It is straightforward to verify that Eq. (114) is reduced to Eq. (106) in the case where vector q consists of a single parameter. Although the normalizing constant does not explicitly appear in Eq. (114), the computation of the expectation of the left-hand side still requires a summation over x whenever q is updated. In the coding method (Besag, 1974; Cross and Jain, 1983), instead of maximizing the prior distribution, q is estimated by maximizing the conditional likelihood:
148
L. BEDINI ET AL.
where M ,called “coding,” is a set of sites which does not contain any pair of neighbors in the MRF sense. For example, considering the intensity process alone as a first-order MRF, at least two disjoint codings can be defined, corresponding to the checker-board partition of the sites. Maximizing the conditional likelihoods related to different codings gives different estimates of the hyperparameter vector. The final estimate can be obtained by averaging these estimates. In this method, the effort in computing the normalizing constants is reduced, since, for each distribution in (119, the summation required to compute the normalizing constant is made on the values of x i and not on all possible configurations for x. An extension of the coding methods is the maximum pseudolikelihood estimation (MPL) (Besag, 1986; Lakshmanan and Derin, 19891, in which the product in (115) is extended over all sites. Although this is not a true likelihood, in many cases of parameter estimation, MPL is a good approximation of the ML estimate, and it is also consistent, in the sense that it converges in probability to the ML estimate when the size of the image increases. Nevertheless, most works proposing MPL for the estimation of the MRF hyperparameters are in the context of image segmentation, in which the image pixels can assume only a few values. Moreover, these works do not consider, at least not explicitly, the line process. In our opinion, the introduction of the discontinuities into the MRF model considerably complicates the problem of parameter estimation, in that intensity and line elements are highly correlated, thus preventing the pseudolikelihood from being a good approximation of the original prior distribution. An alternative, although similar, approach consists of maximizing the marginal distribution p ( g I q), which is obtained from p(x, g I q) integrating out the image field x: P(g
I4 = C P b Ix,q)p(x 1s).
(116)
X
The marginal in Eq. (116) can be maximized using the EM algorithm, considering g as the incomplete data set and the pair (x, g> as the complete data set. In this particular case, the many-to-one relationship between x and g is probabilistic and governed by the likelihood p(g Ix, 9). Thus, at the (k + 1)th iteration the two steps in the EM algorithm become:
M-step: compute qk+
=
arg max Q(q I q k )
I
9
EDGE-PRESERVING IMAGE RECONSTRUCTION
149
In the absence of an analytical expression for Q, the E-step must be computed by sampling x from p(x Ig,qk). Since this is impossible in practice, some pseudolikelihood approximation for p ( x I q>must be adopted.
C. Parameter Estimation from TrainingData The ML methods described above attempt to determine the best model parameters for a given reconstruction task, i.e., when a specific set of observed data is available. Another approach is to determine the hyperparameters that make the MRF model suitable for describing a given class of images. This is accomplished by estimating the hyperparameters from one or more sample images which represent that class. This approach differs from the previous one, because an image belonging to the class of interest is used instead of the current estimate x k , given some specific data. To study the effectiveness of MRF models for textured images, Cross and Jain (1983) used the codings method to estimate the model parameters from natural texture samples. They observed little variations in the estimates over various codings and a better fitting of the model to microtextures than to regular textures. Derin and Elliott (1987) computed the histograms of local configurations in sample images and then solved a set of equations by a standard least-squares technique. To model positron emission tomography (PET) images of the brain, Levitan et al. (1995) proposed using an MPL estimate to determine the parameter that controls the overall correlation strength among neighboring pixels, assuming that the other free parameters can be heuristically selected on the basis of general principles. A special instance of parameter estimation from training data is the study of the properties of the MRF model for the explicit line process alone. Derin and Giiler (1990) tested an MRF line model whose Gibbs energy is given by the term LI,(l) in Eq. (471, but without multiple-edge penalization. They first reduced the number of parameters by assuming an isotropic line field and then computed an MPL estimation by using a multistart deterministic relaxation algorithm. They carried out several experiments. In the first set of tests they compared the realizations of the line process obtained by using the Gibbs sampler with different sets of parameters and studied the effect of varying a parameter while keeping the others fixed. In the second set, they sampled from a prior distribution characterized by certain parameters and then used this sample to estimate the parameters themselves via MPL. A comparison of the values of the parameters, as well as a comparison of the realizations obtained, showed good agreement between true and estimated parameters. In the third set, they estimated the parameters from both hand-drawn edge images and the edge maps obtained using a thresholded Sobel operator on natural images.
150
L. BEDINI ET AL.
In this case, only comparisons between the original edge maps and realizations from the estimated prior could be performed. However, these comparisons showed promising statistical similarity between the edge maps. We did a similar set of experiments for a method of parameter estimation based on the learning capabilities of a stochastic neural network, namely the generalized Botlzmann machine with multiple interactions. This method will be described in some detail in the next subsection. 1. Learning the Hypelparameters
In Bedini et af. (1993b, 19944 we proposed a unified framework for dealing with the two main problems that arise in the MAP restoration of images with explicit discontinuities: MRF parameter estimation and nonconvex optimization. This framework is given by a particular model of a neural network, the generalized Boftzmann machine (GBM) with multiple interactions, which has interesting learning capabilities and computational power. In the standard quadratic Boltzmann machine (BM), interactions between neurons are limited to pairs of neurons with binary states (Hinton et af., 19841, whereas a generalized BM is a network of discrete state units interconnected according to a system of cliques (Azencott, 1990, 1992). The degree of activity of each clique C is measured by an interaction potential Vc(x), which is an arbitrary function of the clique configuration. Associated with each clique C there is a real parameter wc called clique weight. All the clique weights can be organized into a vector w.The global configuration of the BM has energy:
An asynchronous stochastic dynamics can be defined so that the machine has at its equilibrium a Gibbs probability distribution on the set of configurations given by
where Z T is the partition function, and T is the temperature. Owing to the equivalence between MRFs and Gibbs distributions, this BM model can be considered as a special case of an MRF. The asynchronous dynamics of a BM is naturally suitable for a parallel implementation. It can also be implemented sequentially: at each instant, only one of the neurons attempts to modify its state, and all neurons are
EDGE-PRESERVING IMAGE RECONSTRUCTION
151
visited periodically following a preassigned but arbitrary order. Single neurons are updated randomly by selecting the next state according to its conditional probability. This update is local, as it involves only the states of the neurons in its neighborhood. The similarity between this model of BM and MRFs is clear. In fact, like the realizations of the MRFs, the configurations of the BM can be generated, in accordance with the rule defined above, using the Metropolis algorithm or the Gibbs sampler. Moreover, if the temperature T is suitably decreased, the Boltzmann machine can reach a configuration of maximum probability, corresponding to the global minimum of U(x). For this GBM, we defined two sets of units: the input neurons corresponding to the elements of the intensity process x, and the output neurons corresponding to the elements of the line process 1. For every clique C, a potential V,(x,I) and a weight w, were defined, so that the sum of all the weighted potentials coincided with the energy U(x,I) associated with the MRF image model, for instance, Eq. (44) plus Eq. (47). The weights in the network are thus the parameters of the Gibbs distribution or, equivalently, of the MRF model. When the weights are predefined, if all input and output units run free, the GBM can be used to draw samples from the prior probability of the MRF model. If the input units x are clamped, the GBM can be used to compute realizations of the line process alone; that is, it can be considered as an edge finder. Alternatively, when coupled with a deterministic procedure that gives the input x, in the context of a mixed-annealing or GEM algorithm, the GBM can be left to evolve at a fixed temperature so that it behaves like a Metropolis algorithm or a Gibbs sampler and, at equilibrium, it gives an update of the line process. When the parameters of the prior energy are not known in advance, the GBM can be trained to learn the weights by examples, constituted by pairs of inputs x and outputs I that agree with our knowledge about the solution. More precisely, w is adjusted during a preliminary training phase, alternating between clamped and undamped example presentations. In the clamped period, one or several examples are presented successively, and all the units are clamped onto the examples. In the undamped period, only the input units are clamped onto the examples; the output units evolve freely, according to the stochastic dynamics. A learning algorithm is a procedure which minimizes a distance c$(w) between the asymptotic distributions PYX) and p ( x ) of the configurations in the clamped and undamped phases, respectively. In particular, the vector of the optimal weights should satisfy the zero gradient condition. This can be obtained by a gradient descent algorithm, which updates the current values of w by the quantity Aw, proportional to the negative gradient of c$(w).
152
L. BEDlNl ET AL.
Various choices for t$(w) are possible, leading to different learning rules (Azencott, 1992). We referred to the diuergence measure, introduced by Hinton et al. (1984) (see also Aarts and Korst, 1989):
which leads to the following learning rule:
where E,, and Euncldenote the expectations of Vc in the clamped and unclamped phases, respectively, and the small gain 7, known as the learning rate, is an appropriate convergence factor. In practice, the expectations in Eq. (121) cannot be computed exactly; they can only be estimated by averaging the clique activity Vc during a monitoring period after the GBM has reached equilibrium. The computations are thus only local. Nevertheless, these estimates contain some noise, because they are obtained using only a finite number of examples and a finite monitoring time. The experiments performed on the GBM described above were mostly designed to highlight the performance of the learning algorithm. In an initial set of experiments we tested the learning algorithm in the ideal case in which the examples are drawn from some predefined distribution. In other words, different prior energy functions were predefined, by assigning different values to the model parameters and then to the weights of the associated GBMs. For each choice of the weights, a set of samples for all the units was generated by letting the corresponding GBM run freely at a constant temperature. Each set of samples was then used to train the same GBM, with initial weights chosen randomly. As a measure of convergence of the learning algorithm, the mean squared error between the theoretical weights and the current weights at each iteration was chosen. To reduce the computation cost of the experiment, we restricted the GBM model to the case of one-dimensional signals (Bedini et al., 1993b). In the second set of experiments, we assumed that the parameters of the prior energy were unknown, so that we had no predefined criteria for selecting the set of training examples. We assumed only that we had some examples of piecewise smooth images plus the related edge maps. The BM was then trained on this set of examples, and the weights were computed. In this case, the convergence properties of the learning algorithm for examples drawn from known distributions were no longer confirmed. Specifically, the weight values at convergence were found to depend on the initial values. However, in all the experiments we found that different sets of weights, corresponding to different initial values, always produced the
EDGE-PRESERVING IMAGE RECONSTRUCTION
153
same edge maps. This behavior of the learning algorithm could be related to the interdependence of the parameters in synthetic and real image models. Moreover, in our experiments we found that the weights had different convergence rates. This is typical in the BM learning algorithm. In particular, the weights related to clique configurations that are presented rarely to the BM converge more slowly; in the limit case of configurations occurring very seldom, the weights tend to diverge. To prevent an unbounded increase in the weights, the computation of the unclamped expectations was made only on locations where a line element is present in the example being examined. This strategy was very useful for correctly determining those weights, which must be negative, associated with line configurations that are very likely, and then have to be promoted. Moreover, while this strategy does not affect the generality of the algorithm, it greatly reduces the computations required. In Howard and Moran (1993) the same learning rule described above was augmented by a temperature variation schedule, in an attempt to keep the parameters bounded during training. However, Howard and Moran considered MRF models for the intensity process alone, without introducing the line process.
VIII. SOMEAPPLICATIONS In this section, we report some experimental results drawn from our activity in the field of image reconstruction/restoration. Some of these results come from theoretical investigations; others were obtained in a more practical context. We were dealing with three main applications: image reconstruction from projections, image deblurring, and image reconstruction from noisy, sparse or dense, data. In the various cases, we tested different image models and different algorithms. In all cases our guideline was to use image discontinuities, in order to prove that adopting a priori information about their geometry can significantly help the reconstruction task. For the sake of clarity, we will subdivide these experimental results into two groups. In the first, we consider the use of image models with explicit, binary line processes and employ simulated annealing, mixedannealing, and GEM algorithms for the MAP estimation. In the second, we consider image models that implicitly refer discontinuities and employ fully deterministic GNC-type algorithms.
154
L. BEDINI ET AL.
The results highlight that the quality of reconstructions depend both on the model adopted and on a good choice of parameters, whose number increases as the complexity of constraints increases. In many practical applications, simple guidelines for selecting these parameters are not yet available, and determining their values by means of trial-and-error procedures is very expensive. Moreover, as reported in Section VII, the methods proposed for parameter estimation are recent, and their effectiveness has been shown only for estimating the regularization parameter A. In many cases, it can thus be simpler, although less effective, not to use line configuration constraints, so as to have a reduced set of parameters to be determined. Thus, we also report results obtained when no line configuration constraints are forced on the solution. A. Explicit Lines
The set of experiments described below regards the restoration of blurred and noisy images and the tomographic reconstruction from a few projection data. These experiments were aimed at verifying the efficiency of the introduction of different types of constraints on the line element configurations. We address an explicit binary line process, because, as already highlighted, in this context the formulation of even complex self-interactions of the lines is relatively easy. From a theoretical point of view, the convergence to the global minimum of the mixed-variable cost functionals was proved only for the simulated annealing algorithm. Criteria for obtaining good approximations in finite time were also proposed (Aarts and Korst, 1989). It is well known that implementing simulated annealing algorithms on sequential machines is impractical in most cases. Nevertheless, the availability of parallel machines makes it possible to implement them with reasonable computation times. The choice of simulated annealing provides us with a tool to evaluate the models adopted, which is not affected by the limits of suboptimal algorithms. In particular, we exploited the computational power of the parallel machine Cray T3D, based on 64 processors, to implement a parallel version of simulated annealing, according to the practical schedule suggested by Aarts and van Laarhoven (Aarts and Korst, 1989). The hopefully optimal reconstructions obtained by this algorithm can be used to evaluate, through a comparison of the quality, the performances of the other algorithms designed for mixed variables nonconvex optimization, namely GEM and mixed annealing. These algorithms were implemented on sequential machines; the mixed annealing was also implemented on Cray T3D.
EDGE-PRESERVING IMAGE RECONSTRUCTION
155
1. Deblum'ng and Noise Removal
For the restoration of blurred and noisy images, we considered small (128 X 128) synthetic images, degraded by the convolution of the original images with a low-pass kernel, defined by a uniform mask, and by adding an uncorrelated Gaussian noise of zero mean and variance u '. Within this assumption, the data generation model is linear, given by Eq. (2), where matrix A is a block-Toeplitz matrix, which expresses, in the lexicographic notation, the convolution operator. We consider a piecewise smooth model for the image with different constraints on the line field. This results in a posterior energy of the general form:
where U(x,I) is given by U,(x, I) of Eq. (44) plus different forms for U2(1), derived from Eq. (47). In particular, we first considered a noninteracting line field, by setting U2(I) = 0. In this case the posterior energy is reduced to the weak membrane. As an alternative, we also considered a line process constrained to be connected, by setting, in Eq. (471, yi = 0, ei= 0, and K = 0, and connected and thin, by setting, in Eq. (471, ei= 0 and K =
0.
The simulated annealing algorithm was implemented according to the following rules:
As the initial temperature we chose To = 100. 0
0
The cooling schedule was Tk = 0.95Tk-,. The Markov chain for each Tk consisted of 80 complete updates of all the intensity and line elements, performed by visiting the image in raster scan mode.
The results obtained are shown in Figs. 11-14. Figure 11 shows the original 128 X 128 synthetic image, with dynamic range [0, 2551 (Fig. lla), a blurred and noisy version (Fig. llb) obtained by convolution with a uniform mask of size 3 X 3, and adding Gaussian noise with u = 10, and the image obtained by minimizing posterior energy (122), keeping all line elements fixed to zero, and using A = 0.024 (Fig. llc). As is well known, this reconstruction corresponds to the solution produced by standard regularization with first-order derivatives. This solution shows the typical performance of the methods that do not account for discontinuities, i.e., the good noise removal but, at the same time, the excessive smoothing across the edges. Figures 12a and 12b, Figs. 13a and 13b, and Figs. 14a and 14b show the reconstructions and the corresponding edge maps obtained from the degraded image of Fig. llb, by adopting the posterior energy of
156
L. BEDINI ET AL.
C
F I ~ U R11. E Restoration without Lines: (a) original synthetic image; (b) degraded image, obtained by convolving the original with a 3 X 3 uniform mask and adding Gaussian noise with u = 10; (c) reconstructionwith standard regularization.
Eq. (1221, and the three choices for U&I) described above. We started by considering the most complete form for U2(l), i.e., the one that incorporates the constraints of line continuation and line thinness. We used the following values for the parameters: h = 0.024, a = 4.5, y1 = yz = 1.8, p1 = p4= - 1.8, pz= p3 = pS = pa= -0.9. Figures 12a and 12b show the intensity map and the corresponding edge map obtained in this case. The intensity map, when compared with Fig. llc, highlights the usefulness of introducing a suitably constrained line process. Indeed, besides good noise removal, we also obtained a complete removal of the blur. Moreover, the edge map obtained as a by-product can be used for recognition and classification. Since the original image was synthetic, the choice of parame-
EDGE-PRESERVING IMAGE RECONSTRUCTION
157
a b FIGURE12. Restoration of Fig. l l b with explicit lines, continuation, and thinness constraints: (a) intensity map; (b) edge map.
ters was relatively easy, although it was done by trial and error. In the next figures, we show the results obtained by keeping the same parameters and removing, in turn, the thinness constraint (Figs. 13a and 13b) an the thinness plus continuation constraints (Figs. 14a and 14b). It is easy to verify that the intensity and the edges deteriorate when the line constraints are dropped. Probably, we could have gotten better results in these cases too, by suitably selecting a new set of parameters for each experiment. Nevertheless the choice of this set would have been more critical and thus more difficult to do by trial and error.
a b FIGURE13. Restoration of Fig. l l b with explicit lines, continuation constraint: (a) intensity map; (b) edge map.
158
L. BEDINI ET AL.
b
a
FIGURE14. Restoration of Fig. llb with explicit, unconstrained lines: (a) intensity map; (b) edge map.
For all the experiments, the computation was stopped when the temperature reached the value T = 1, after 85 iterations. The computation time, approximately the same for all the experiments, was about 10 minutes. The mixed-annealing algorithm, described in Section V,B,3, was implemented by using the same posterior energies, with the same parameters and the same cooling schedule adopted for simulated annealing. The deterministic minimization step of E(x, I), with respect to x, was performed by means of a conjugate gradient. In the stochastic step, a sample of the line process is drawn by repeatedly updating each line element on the basis of its local conditional probability, given the values of the image elements in its neighborhood. In our case, the local conditional probability for the horizontal line element hi, is
P ( h i , j = 1 Ix,I)
=
exp( -E(x, 1')/T) exp( -E(x, I')/T) exp( -E(x, I')/T) ' (123)
+
where 1' and I' are two configurations of line elements differing only in the value of hi,j , which is 0 in 1' and 1 in 1'. Due to the properties of the Gibbs distributions, Q. (123) can be simplified as follows:
P ( h i , j = 1 Ix,l)
=
1 + exp
"c
1
T c:h,, j'c
9
(U1) -
(124)
W)]
where Vc(l) and V,(O) are the potential values associated with clique C when the element hi, of the clique itself is 1 or 0, respectively, and the
EDGE-PRESERVING IMAGE RECONSTRUCTION
159
other elements are fixed. A similar expression can easily be derived for the vertical line element ui,j . We found that mixed annealing converges faster than simulated annealing but makes the cooling more critical. If the cooling is not slow enough, the line elements may become “frozen” right from the first iterations. To avoid this, we adopted a variable threshold, which decreases as long as the temperature decreases. In our experiments, starting from a high value, we decreased the threshold until it reached the same constant value adopted in simulated annealing. We thus obtained reconstructions that were almost identical to those shown in Figs. 11-14, with a reduction in the computation time on Cray T3D of about 80%. In Bedini and Tonazzini (19921, we reported results of image deblurring obtained with mixed annealing. In that case, because a parallel computer was not available, the results were obtained by adopting quite heavy approximations, in order to reduce the computation times drastically. In particular, we kept the temperature fixed at a relatively low value. In this case the algorithm becomes suboptimal, although in any case the results were satisfactory after a few iterations. 2. Transmission Tomography Tomographic reconstruction from projections can be formulated in the same way as the restoration problem. As examples, we consider the application of mixed-annealing and GEM algorithms to X-ray transmission tomography. In this case, the matrix A which appears in the linear data generation model Eq. (2) is the Radon matrix R, whose element rk,, is the length of the intersection between the kth projection path the lth pixel of the lexicographically ordered image vector. Despite the particular nature of the X-ray photon emission process, we assumed a Gaussian noise model (see Manbeck, 1992). As the posterior energy, we chose Eq. (122), where U(x,1) is given by U&x, I) in Eq. (44) plus U,(l) in Eq. (461, expressed in the form of a table of potential values. We first consider a set of numerical experiments to test the performance of mixed annealing for this application. These experiments are reported in Bedini et al. (1993a). Since only a sequential computer was available at that time, we implemented the mixed-annealing algorithm keeping the temperature fured along all the iterations. The test image we used in each case is shown in Fig. 15. Its shape and intensity values, in the range [O., 0.41, were suggested by the classical Shepp and Logan head phantom. We numerically generated projection data from this image and corrupted the various data sets with different amounts of Gaussian noise.
160
L. BEDINI ET AL.
RGURE
15. Test image for tomographic reconstruction.
Figure 16 shows the results obtained using a data set of 30 projections (uniformly spaced over the entire T angle), with 151 samples per projection, corrupted by Gaussian noise with 40 dB SNR. The reconstruction with mixed annealing was obtained after 10 iterations, using A = 150, a = 0.345,and T = 0.7. Because the test image was piecewise constant, with clearly separate uniform regions, we did not need to vary the threshold. The values of the line potentials adopted for the possible clique configurations in Fig. 5 are shown in Table I. The results of filtered backprojection (FBP) and algebraic reconstruction (ART) are shown for comparison. We also show the image resulting from the first iteration of mixed annealing, as it is computed assuming all zero line elements and is thus comparable with the result of the standard Tikhonov regularization technique. Figure 17 shows the results of ART and FBP, for 150 projections and the same SNR.As synthetic quality indexes for the reconstructions, we took the normalized root-mean-square distances from the original (Herman, 1980); their values are reported in Table I1 for the various cases shown. As can easily be seen, the distance for ART reconstruction from 150 projections is comparable with that of the proposed mixed-annealing reconstruction from 30 projections. In other words, reconstructions of comparable quality have been obtained by our method from as much as one fifth of the data set needed by ART. Although the effectiveness of the local continuity constraint and the edge-preserving features have also been found for image restoration, the possibility of data reduction is specific to the application of the MRF approach to image reconstruction.
EDGE-PRESERVING IMAGE RECONSTRUCTION
161
d
C
FIGURE16. Reconstructions of Fig. 15 from 30 projections and 40 dB SNR (a) mixedannealing reconstruction after 10 iterations; (b) FBP reconstruction; (c) ART reconstruction; (d) reconstruction after the first iteration of the mixed annealing (all zero line elements). TABLE I POTENTIAL VALUES FOR THE RESULTS IN FIG.16 Configuration No line Termination
Turn Continuation T Cross
V, 0.0 3.0 0.0 0.0 1.25 1.75
Configuration
No line Single element Double line
v* v3 0.0 0.0 2.25
162
L. BEDINI ET AL.
a b FIOURE17. Reconstructions by ART (a) and FBP (b) from 150 projections and 40 dB SNR.
Let us now consider the application of GEM algorithms to the same tomographic reconstruction problem. This topic is described in Salerno er al. (1993) and Bedini et al. (1994d). As seen in Section V,C,2, the application of the GEM strategy entails defining a complete data set, which is specific to the problem at hand. In our case, we chose the following family of independent random variables:
+ 4.
(125) each variable is related to the jth pixel and the ith ray path; 4.,jis a zero-mean Gaussian variable, with variance qiTj such that Zi,
i
4 , j= 4.
= ri,j X j
and
j;
uiiTj = u 2 ,
i
( 126)
where 4. is the ith component of the random noise vector N, and each sum is performed over all the pixels crossed by the ith ray path. The TABLE I1 NORMMIZED RooT-MeAN-SOUARE DISTANCES
Reconstruction
Mixed annealing FBP ART Standard regularization FBP ART
30 projections 30 projections 30 projections 30 projections 150 projections 150 projections
Distance 0.0625 0.5313 0.2578 0.2891 0.4141 0.1016
163
EDGE-PRESERVING IMAGE RECONSTRUCTION
many-to-one relationship between unobservable and observable data is Gi =
C Zi,j
Vi.
i
(127)
From Eqs. (126) and (127), the likelihood function for each unobservable data point is: P ( z ~Ix,I), ~ a exp
( zi,j
- ri,
V i ,j
2 ui;j
( 128)
and, for the independence of the Zi,j's,
+ terms independent of x.
(129)
To compute the conditional expectation Ez{log P(z I x, I) I g, xk,I k ) from Eq. (129), we need an expression for Ez{zi, I g i , xk,l k } . By using the second relation in Eq. (126) and bearing in mind that, in our case, Ez(zi, I x, I} = ri,j x j , the following expression holds for the conditional expectation of the variable Zi, (see Salerno et al., 1993): U.L.
Ez[~i,jIgi,xk,Ik} = r 1.1 . .x!I + + ( g i -
Cri,nxL),
(130)
n
with which we can write the conditional expectation of log P(z I x, I):
which, as expected, is independent of both I and Ik. If we let r? .
aI . =
u2
C r.l i
and
ui.j
6:
=
C(gi - C r i , n x f ) r i , j (132) i n
then Q(x,l I x k , l k )
1 =
C ( ( a j x f + 6j")xj - : x f }
J
i
-
U(x,l) (133)
164
L. BEDINI ET AL.
where U(x,I) is given by Eq. (44) plus Eq. (46). To compute the aj's, we need to specify the ui,j's. Like the definition of Z,this specification does not need a physical meaning; the only constraints are those shown in Eq. (126). We subdivided the noise variance into terms proportional to the squared values of the lengths rj,j :
where li is the set of the indexes of the pixels crossed by the ith ray path. Let us now examine the MI-step.As Q(x, Ik I xk, Ik) is a concave quadratic function in x, its maximum can be found by setting its gradient to zero and solving the resulting linear system. In practice, following the GEM approach, we look for an estimate x k + ' that only increases the value of Q. This is achieved by computing a fixed number of iterations of a Gauss-Seidel scheme. The resulting formula, incorporating both the E-step and the MI-step, is the following:
where we substituted the lexicographic index J, appearing in Eqs. (132) and (133), with the matrix index I , rn. In the M2-step, we look for an I k + l such that Q ( x k + l , Ik+' Ixk,lk) > Q(xk+', Ik Ixk, Ik). Because only the prior energy term in Q depends on I, we can equivalently look for an I k + l such that U ( x k + ' , I k + l ) < U ( x k + l Ik). , To this end, we adopted a simulated annealing scheme. If this procedure is carried out until convergence, it finds the global minimum of U(xk+ 0, which is nonconvex with respect to 1. However, this gives rise to two problems, because simulated annealing is a computationally intensive algorithm and, as already observed, it may freeze the lines after a few iterations. Both these difficulties can be dealt with by performing a fixed number of iterations of the simulated annealing for each M2-step. This fixed number should in any case guarantee that the prior decreases. Moreover, the M2-step can be accomplished once for every N complete M,-steps, with N evaluated experimentally. Another strategy for control-
',
EDGE-PRESERVING IMAGE RECONSTRUCTION
165
ling the growth of the line process is to allow threshold to start with a large value and then gradually decrease during the iterative process, for a coarse to fine detection of the discontinuities. To evaluate experimentally the performance of GEM in this case we carried out several simulations with computer-generated projection data and compared GEM with mixed annealing. We used two synthetic test images. One of them is the one shown in Fig. 15; the other was obtained from a real tomographic image and is shown in Fig. 19a. Figures 18a and 18b show the GEM reconstruction (intensity and line processes, respectively) of the object shown in Fig. 15 from the same data set used in the mixed-annealing experiments. Starting with a uniform image, the algorithm ran for 40 iterations keeping all the line elements at zero. The algorithm then ran for 360 iterations including the line process. For each iteration, we performed 300 iterations of the Gauss-Seidel scheme in the MI-step and 500 iterations of simulated annealing in the M,-step. We set A to 0.2 and u z to 0.01. The values for the line potentials V,, V,, and V, are shown in Table 111. The threshold initially at 0.045, was gradually decreased to 0.016 by suitably decreasing a. This allowed us to detect most of the discontinuities in the image intensity and to avoid the spurious lines that would have been detected starting with a low value of the threshold. In the case shown, the root-mean-square distance from the original is 0.0522. As can be seen by comparing the corresponding results for the mixed-annealing algorithm, the two reconstructions have a similar quality.
w,
FIGURE18. GEM reconstruction of Fig. 15 from 30 projections and 40 dB SNR (a) intensity map; (b) edge map.
166
L. BEDINI ET AL.
TABLE I11 POTENTIAL VALUES FOR THE RESULT^ IN
FIGS.18 AND 19
Configuration
V,
Configuration
v2 = v,
No line Termination Turn Continuation T Cross
0.0 0.0003 0.0 0.0
No line Single element Double line
0.0 0.0 0.02
0.0005
0.0015
The generation of the data set and the reconstruction, for the test image in Fig. 19a, again with values in [0.,0.4], were carried out under the same conditions as in the previous case. In this case, however, the total number of iterations was 440 and the threshold was decreased to 0.02. The results are shown in Figs. 19c and 19d. The normalized root-mean-square distance from the original is now 0.17304. To demonstrate the influence of the explicit introduction of a line process, in Fig. 19b we show the reconstruction obtained from the same data by 440 iterations of the GEM procedure without lines. The same parameters have been kept. Excessive smoothing across the discontinuities is evident. In this case, the value of the normalized root-mean-square distance is 0.18069. Another aspect to be investigated is the influence of the self-interaction term for the line process. In Figs. 19e and 19f, we show the result obtained from the same data set and in the same conditions as in Figs. 19b and 19c, except for the removal of the line potentials from the prior. As can be seen, the intensity reconstruction looks good, but the line field recovered shows many unwanted double or triple lines, as well as crosses and isolated line elements. B. implicit Lines
When the lines are implicitly referred, expressing complex constraints on line configurations is usually difficult; only simple constraints can thus be used in practical applications. We considered the models expressed by the two neighbor interaction functions described in Sections IV,B and VI,C, respectively, using first-order derivatives. The first model is an extension of the truncated parabola and is suitable for introducing the constraints of line continuation or line thinness. The second is based on a sigmoidal approximation of the binary line elements explicitly appearing in cost
EDGE-PRESERVING IMAGE RECONSTRUCTION
167
FIGURE19. GEM reconstruction of a test image obtained from a real CT reconstruction, from 30 projections and 40 dB SNR: (a) test image; (b) reconstruction without lines; (c) reconstruction with constrained lines; (d) edge map; (e) reconstruction with unconstrained lines; (0edge map.
168
L. BEDINI ET AL.
functionals of the form of Eq. (122) and could express more complex constraints. In our experiments, we focused on the line continuation constraint alone. The set of experiments performed on synthetic images shows the effectiveness of using this constraint. However, in restoring real images we often had difficulty in determining the values of the model parameters, when line continuation constraints were considered. This is caused by the fact that, as already noted, these values have to be determined by trial and error, with impractical computation costs. Thus, in several cases, we tried solutions without line continuation constraints, adopting image models described by some of the neighbor interaction functions reported in Section IV,B,l, again for the implicit line treatment. Most of our experiments concern image reconstruction from noisy data. In this case, matrix A in Eq. (2) is either the identity matrix, when the data are dense, or a diagonal matrix with zeros and ones as elements, when the data are sparse. In the latter case, the task is to remove the noise and to fill in the missing data, by exploiting the smoothness properties of the neighbor interaction functions used. To minimize the nonconvex posterior energies, we applied the various GNC-type algorithms described in Sections V,C,l, VI,B, and V1,C. All these algorithms were implemented on sequential machines. In all the trials we found that, for a 128 X 128 image, the algorithms stop after about 10 iterations consuming about 400 CPU seconds on an IBM-3081 computer. We considered both real and synthetic test images. In particular, we considered synthetic piecewise smooth images with constant and planar regions, real images of printed characters (which can be roughly considered piecewise constant), and real images of faces, landscapes, and sculptures. These last images can hardly be considered as piecewise smooth and would need at least the adoption of higher order derivatives. However, these images are useful for testing the performance of the adopted neighbor interaction functions when applied to images of a general type. The degraded images were obtained by adding the usual Gaussian noise to the originals and, for sparse data, by randomly selecting a percentage of the pixels. In the latter case, the remaining pixels were considered to be missing. For the choice of the parameters in the neighbor interaction functions, we adopted empirical criteria, based on the quality of the reconstructions and on the neighbor interaction function adopted. The quality of the reconstructed images was evaluated by computing the root-mean-square error (MSE) between the reconstruction itself and the original image. The value of the cost functional in the reconstructed images was also computed as a measure of convergence.
169
EDGE-PRESERVING IMAGE RECONSTRUCTION
1. Serf-InteractingLines
We first considered the line continuation constraint and the following cost functional [see Eq. (84)l:
+C
+(xi,j
--x. i +
1, j
7
Xi. j + 1
- xi+ 1, j + I )
( 136)
i.j
As the neighbor interaction function we adopted [see Eqs. (85) and (8611:
with
Approximating 4 following Eqs. (88) and (89) leads to the so-called E-GNC algorithm described in Section V1,B. We started from values p* and u* chosen according to inequalities (901, in order to obtain the first convex approximation of F. The schedules adopted for decreasing p and u were the following:
$$p_k = \frac{p^{*}}{k^{n}}, \qquad k = 1, 2, 3, \ldots, k_{\max} - 1, \tag{138}$$
where k is the current iteration and k_max is the maximum number of iterations. The convergence properties of the algorithm when using Eq. (138) were evaluated for different values of the integer n and on different types of images. As an index of convergence, we computed the MSE between the original and the reconstructed images at each iteration. Typical plots of this error as functions of kⁿ are shown in Fig. 20 for n = 1 (continuous line) and n = 2 (crosses). The experiments produced similar MSEs for n = 1 and n = 2, thus highlighting that the MSE substantially depends on p_k alone. In particular, we found that the MSE can be considered stationary for k > 100, when n = 1, and for k > 10 when
n = 2.

FIGURE 20. MSE between the original and the reconstructed images as a function of kⁿ, for n = 1 (continuous line) and n = 2 (crosses).

In the following experiments, we adopted n = 2 and k_max = 11, thus obtaining a considerable reduction in the number of iterations. We also considered the neighbor interaction function given by Eqs. (100) and (101), reported here as Eqs. (139a) and (139b) (defined piecewise according to whether |t₁| < θ or |t₂| < θ, with a suprathreshold that depends on (1 − ε)),
which derives from approximating each line element with a parametric sigmoid function of the gradient. This argument was developed in Section VI,C. In this case the approximations of F(x) to be used in a GNC-type algorithm are given by Eq. (98) as T varies. When using these approximations, an initial value T* of temperature is sought for which the corresponding F_{T*}(x) is convex. This value is then gradually lowered to zero. In this sense, the algorithm can be seen as a deterministic annealing. We tested the following annealing schedule for the temperature:

$$T_k = \frac{T^{*}}{k^{n}}, \qquad k = 1, 2, 3, \ldots, k_{\max}. \tag{140}$$
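The two schedules (138) and (140) share the same structure: a value that makes the approximated energy convex is chosen first and then decreased as a power of the iteration index, with the partially minimized image passed from one approximation to the next. The following Python sketch shows that outer loop; it is our own illustration, and minimize_F is a stand-in for the inner minimizer actually used (for example, the conjugate gradient algorithm mentioned below), which the user is assumed to supply.

import numpy as np

def gnc_outer_loop(y, minimize_F, p_star, k_max, n=2):
    """Generic GNC / deterministic-annealing driver.
    minimize_F(x0, p) is assumed to minimize the approximation F_p(x) of the
    nonconvex posterior energy starting from x0. The control parameter follows
    the power-law schedule p_k = p* / k**n of Eq. (138); Eq. (140) is identical
    with the temperature T in place of p."""
    x = y.copy()                        # start from the data; the first approximation is convex
    for k in range(1, k_max):           # k = 1, 2, ..., k_max - 1
        p_k = p_star / float(k ** n)    # decreasing schedule
        x = minimize_F(x, p_k)          # track the minimum as the approximation is sharpened
    return x

# Usage sketch with a dummy inner minimizer (a single gradient step on a quadratic data term):
y = np.random.default_rng(1).normal(size=(8, 8))
dummy = lambda x0, p: x0 - 0.1 * (x0 - y)   # placeholder for the true minimizer of F_p
x_hat = gnc_outer_loop(y, dummy, p_star=4.0, k_max=11, n=2)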
The convergence properties of the algorithm when using this schedule were evaluated for different values of the integer n and on different types of original images. No substantial differences in convergence were found for n equal to 3, 2, or 1; thus, all the experiments were performed with n = 3 and k_max = 20. For the minimization of each F_{p_k}(x) or F_{T_k}(x) we applied a conjugate gradient algorithm.

With reference to the synthetic image shown in Fig. 21a, some experiments with E-GNC were performed to test the influence of parameter ε on the reconstruction. We considered dense data obtained by corrupting the original image with Gaussian noise with σ = 25 (Fig. 21b). In the first experiment, we assumed λ = 0.051 and α = 19.2, which give a very low threshold, 19.38. As can be seen from Figs. 21c and 21d, assuming ε = 0, this choice yields a good reconstruction of the true edges, but many spurious edges are also created. Indeed, the threshold is lower than the minimum intensity gradient in the original image, thus allowing even the finest discontinuities to be detected, but at the same time it is also comparable with the standard deviation of the noise, so the noise cannot be removed completely. In this situation, any value of parameter ε different from zero would be unfeasible, as it would further propagate the incorrect edges. Increasing the threshold to 30.4, with λ = 0.051, α = 47.2, and ε = 0, removes the spurious edges but also prevents the edges corresponding to the lowest discontinuities in the original image from being detected. This causes oversmoothing in the reconstruction (see Figs. 21e and 21f). It would be preferable to adopt an ε different from zero, such as ε = 0.5, as this would favor the continuation of the correct edges (see Figs. 21g and 21h). However, if ε is too high, the reconstruction shows an excessive proliferation of edges. This effect is shown in Fig. 22, where a complete sequence of reconstructions, corresponding to values of ε ranging from zero to 0.75, is presented. For this test, the original image in Fig. 22a was degraded by additive noise of standard deviation σ = 40 (Fig. 22b), and the parameters α and λ were chosen equal to 7.18 and 0.0077, respectively, corresponding to a threshold of 30.44.

In real images, the minimum value of the intensity gradient can be nearly zero, although it is not always desirable to create discontinuities where the gradient is very small. Since the suprathreshold goes to zero as ε goes to one, values of ε close to one favor the creation of edges in correspondence with such small gradients, thus producing an excessively large number of edges. This effect is stronger in the presence of noise. Conversely, a low value of ε reduces the number of lines, thus producing an oversmoothed reconstruction. Experiments performed on real images showed that better results can be
FIGURE 21. E-GNC reconstructions for different values of the threshold and of parameter ε: (a) original Mondrian image; (b) original plus noise (σ = 25); (c) reconstruction with low threshold and no line continuation (ε = 0); (d) edge map; (e) reconstruction with high threshold and no line continuation (ε = 0); (f) edge map; (g) reconstruction with high threshold and line continuation (ε = 0.5); (h) edge map.
FIGURE 22. Effect of varying the amount of line propagation in E-GNC for a synthetic image: (a) original image; (b) original plus noise (σ = 40); (c) reconstruction with ε = 0; (d) reconstruction with ε = 0.25; (e) reconstruction with ε = 0.5; (f) reconstruction with ε = 0.75.
obtained by adopting an intermediate value of ε, such as ε = 0.25. In this way, the price for creating a line is reduced by one quarter when there is a discontinuity at a neighboring site. The first image processed was a 128 × 128 image of printed characters, artificially degraded by randomly selecting 50% of the original image and adding uncorrelated Gaussian
noise with σ = 12. In Fig. 23, the original, degraded, and reconstructed images, plus the line elements, are shown for λ = 1.7 × 10⁻⁴, α = 0.041, ε = 0.25 (threshold = 15.57). In another example, a 200 × 200 real image of the Leaning Tower of Pisa was artificially degraded by randomly selecting 50% of the original image and adding uncorrelated Gaussian noise with σ = 12. In Figs. 24a and 24b, the original and degraded images are reported. In Fig. 24c, the reconstructed image is shown, obtained for λ = 1.2 × 10⁻…, α = 1.3 × 10⁻…, and ε = 0.25. The experiments performed when the neighbor interaction function is given by Eq. (139) were again designed to analyze the effectiveness of using self-interactions between lines and are similar to those presented above. An example is shown in Fig. 25. We considered dense data obtained
FIGURE 23. E-GNC reconstruction of a real image. (Top left) Original image; (top right) randomly selected 50% of the original plus noise (σ = 12); for display purposes the missing data are filled with white dots; (bottom left) reconstructed intensity map; (bottom right) edge map.
FIGURE 24. E-GNC reconstruction of a real image: (a) original image; (b) randomly selected 50% of the original plus noise (σ = 12); for display purposes the missing data are filled with white dots; (c) reconstructed intensity map.
by corrupting an original 128 × 128 step image (Fig. 25a) with the addition of Gaussian noise with standard deviation σ = 25 (Fig. 25b). We first assumed a low threshold θ = 46 and ε = 0 (Fig. 25c). In this case, too, this choice creates many spurious edges. Increasing the threshold to 65 removes the spurious edges but also prevents the edges from being correctly detected along the step, causing an oversmoothed reconstruction (see Fig. 25d). Using the same threshold with ε = 0.5, the continuation of the correct edges is obtained (see Fig. 25e). The line continuation constraint was then used to restore a real 128 × 128 image of printed characters (Fig. 26a), artificially degraded by adding
FIGURE 25. Reconstructions using sigmoidal approximations, with different values of the threshold and of parameter ε: (a) original step image; (b) original plus noise (σ = 25); (c) reconstruction with low threshold and no line continuation (ε = 0); (d) reconstruction with high threshold and no line continuation (ε = 0); (e) reconstruction with high threshold and line continuation (ε = 0.5).
Gaussian zero-mean noise with standard deviation σ = 25 (Fig. 26b). We used λ = 0.02, α = 64, θ = √(α/λ), and ε = 0.5; the initial temperature was T* = 7000 (Fig. 26d). For comparison, the reconstruction obtained without the line continuation constraint (ε = 0) is shown in Fig. 26c; here the best parameters were λ = 0.02, α = 8, and θ = √(α/λ). Note that,
FIGURE 26. Reconstructions using sigmoidal approximations: (a) original real image; (b) original plus noise (σ = 25); (c) reconstruction with no line continuation (ε = 0); (d) reconstruction with line continuation (ε = 0.5).
although in both cases the noise is well removed, in the second case the finer details of some of the printed characters are lost, whereas they were perfectly reconstructed in the first case. This effect can be seen more clearly in Fig. 27, which shows an enlarged section of the same sequence of images as in Fig. 26. Figure 27 looks smoother than Fig. 26 because of the zoom algorithm used.

2. Results without Line Constraints

The results reported above, in agreement with those obtained for explicit lines, clearly highlight the importance of suitably constraining the line process. The availability of more constraints, and thus of more information, makes the choice of the correct parameters less critical. Indeed, as we have seen, even an imperfect value of the threshold can be corrected by the addition of a line continuation constraint. Nevertheless, selecting the parameters by trial and error can be very expensive. In many practical applications, a trade-off between an optimal choice of parameters and the
FIGURE 27. Enlarged section of the same sequence of images as in Fig. 26; the smoother appearance is due to the zoom algorithm used.
computation cost must be found. Some authors have claimed that satisfactory results can also be obtained with unconstrained discontinuities (see, e.g., Geman and Reynolds, 1992). In particular, Blake and Zisserman (1987a) argue that a sort of hysteresis property is implicit in their truncated parabola. In the case of implicit, unconstrained lines, we deal with two parameters alone, λ and α, and thus the cost of a trial-and-error choice is certainly reduced. Nevertheless, the reconstruction is very sensitive to even small variations in their values. This is because, especially for images that are not exactly piecewise smooth, it is very hard to find values that are suitable for the whole image. Probably the best choice would be to consider α as a space-variant parameter or, at least, to decrease it during the processing, as we did for the explicit line treatment. This could be combined with the use of second- or third-order derivatives, as suggested by Geman and Reynolds (1992).
For noninteracting discontinuities, the posterior energy considered is

$$F(\mathbf{x}) = \|\mathbf{y} - A\mathbf{x}\|^{2} + \sum_{i,j}\Big[\phi\big(x_{i,j}-x_{i+1,j}\big) + \phi\big(x_{i,j}-x_{i,j+1}\big)\Big], \tag{141}$$

with the following neighbor interaction functions [see Eqs. (37) and (39)]: the truncated parabola

$$\phi_{1}(t) = \begin{cases}\lambda t^{2}, & \text{if } |t| < \sqrt{\alpha/\lambda},\\ \alpha, & \text{otherwise},\end{cases} \tag{142}$$

and the function of Eq. (39) for graduated discontinuities, reported here as Eq. (143). We recall that, while φ₁ refers to a binary line process, φ₂ refers to a graduated line process. For φ₂, we adopted the approximations (68). The start value p* for parameter p was chosen following Eq. (70). The schedule adopted for decreasing p was the following:
Pk = P * -
P* kmax
-
(k-l),
1
k = 1 , 2,..., k,,,,
(144)
where k is the current iteration and k_max is the maximum number of iterations. In particular, we set k_max = [p*]. This schedule was found to be more effective than the usual one, based on successive reductions p → p/2. For φ₁ we used the approximations (67), provided by Blake and Zisserman (1987a), and (94), the latter corresponding to a parametric sigmoidal approximation for each line element. When using Eq. (67), p* was chosen equal to 1, and the schedule for decreasing p was (144), with k_max = 10. When using the approximations in Eq. (94), an initial value T* of temperature is sought for which the corresponding F_{T*}(x) is convex. This value is then lowered to zero following Eq. (140).

As an example, Fig. 28 shows the reconstruction obtained by using the truncated parabola and first-order derivatives, with approximations as in Eq. (94). The original image was obtained by digitizing a detail from the picture of a sculpture (Fig. 28a). The degraded image was obtained by randomly selecting 50% of the original pixels and adding Gaussian noise with zero mean and standard deviation σ = 12 (Fig. 28b). The corresponding reconstruction in Fig. 28c was obtained with λ = 0.086, α = 260.4, θ = √(α/λ), and T* = 7000. The reconstruction is fairly good, although in some areas it looks a bit more "stylized" than the original. This effect is probably due to the use of a constant threshold for the whole image.
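To make the roles of λ, α, and the threshold √(α/λ) concrete, the following Python fragment (our illustration, not the authors' code) evaluates the truncated parabola of Eq. (142) on the first-order differences of an image and sums it into a posterior energy of the general form used here for Eq. (141); the mask argument is a hypothetical stand-in for the diagonal matrix A of the sparse-data case.

import numpy as np

def truncated_parabola(t, lam, alpha):
    """Eq. (142): lambda*t^2 below the threshold sqrt(alpha/lambda), alpha above it."""
    threshold = np.sqrt(alpha / lam)
    return np.where(np.abs(t) < threshold, lam * t * t, alpha)

def posterior_energy(x, y, mask, lam, alpha):
    """Data term plus truncated-parabola interactions on the horizontal and vertical
    first-order differences (the noninteracting-line energy assumed here for Eq. (141))."""
    data_term = np.sum((mask * (y - x)) ** 2)
    dv = x[1:, :] - x[:-1, :]          # vertical first-order differences
    dh = x[:, 1:] - x[:, :-1]          # horizontal first-order differences
    prior = truncated_parabola(dv, lam, alpha).sum() + truncated_parabola(dh, lam, alpha).sum()
    return data_term + prior

# With lam = 0.045 and alpha = 3.4 (the values quoted for Fig. 29) the threshold is about 8.7.
print(np.sqrt(3.4 / 0.045))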
FIGURE 28. Reconstruction with the truncated parabola and sigmoidal approximations: (a) original real image; (b) randomly selected 50% of the original plus noise (σ = 12); for display purposes the missing data are filled with white dots; (c) reconstructed intensity map.
In another experiment, we compared the performance of the truncated parabola (142) with that of function (143), which refers to graduated discontinuities, when applied to the same degraded image. For each function, we used the best set of parameters that we found, which were different in each case. The main result was that the use of graduated discontinuities performs better in reconstructing the planar regions of the images. This is confirmed by the reconstructions in Fig. 29, obtained from an image degraded by adding Gaussian noise with σ = 30 and randomly removing 50% of the pixels. Note that with function (143) (λ = 0.045, α = 3.4) the planar regions of the image are well reconstructed (bottom left). On the contrary, the same planar areas become almost constant in the reconstruction with the truncated parabola (λ = 0.125, α = 1.1) (bottom right).
Figure 30 shows the reconstructions of the image of a face (Caravaggio's), degraded by adding Gaussian noise with σ = 12 and randomly removing 50% of the pixels. The reconstruction with function (143) (bottom left) was obtained by using λ = 0.35 and α = 3.47, and the reconstruction with the truncated parabola (bottom right) was obtained by using λ = 0.014 and α = 2.8. From the results obtained, we are reasonably convinced that the use of graduated discontinuities can partially compensate for the absence of line continuation constraints and/or of a space-variant threshold.
IX. CONCLUSIONS

This work originated as a review of our activity in optimization techniques for edge-preserving image reconstruction and restoration during the last few years. Although it should not be considered a complete state-of-the-art review, we have tried to place our work in the general context of the research in this area.
FIGURE 29. Surface plot of reconstructions with the truncated parabola and the neighbor interaction function of Eq. (143). (Top left) Original image; (top right) randomly selected 50% of the original plus noise (σ = 30); (bottom left) reconstruction with the function of Eq. (143); (bottom right) reconstruction with the truncated parabola.
FIGURE 30. Reconstruction with the truncated parabola and the neighbor interaction function of Eq. (143). (Top left) Original image; (top right) randomly selected 50% of the original plus noise (σ = 12); for display purposes the missing data are filled with white dots; (bottom left) reconstruction with the function of Eq. (143); (bottom right) reconstruction with the truncated parabola.
We have addressed two particular problems: image restoration and tomographic reconstruction. We chose to consider both of them because they are closely related and can be formulated in the same way. These problems arise in many fields; tomographic reconstruction is principally of interest in medicine, but it is also becoming increasingly important in other fields, such as the nondestructive quality control of materials. Moreover, these problems are prototypical of a large class of problems in image processing and computer vision in which it is important to preserve the
image features related to intensity discontinuities. The most common approaches proposed in the literature are based on MRF models for the image. Their success lies in the flexibility of MRFs in modeling images with discontinuities and in introducing suitable constraints both on the image intensity and on the line field. Following these approaches, the problem is formalized in a Bayesian framework and the solution is usually obtained as the global minimum of a generally nonconvex posterior energy.

We have considered several MRF models for treating discontinuities both explicitly and implicitly, and we have reported the main algorithms proposed to obtain the solution. In particular, we have highlighted models which allow line configuration constraints to be enforced in the solution. For the models that treat the discontinuities explicitly, even very complex constraints can be handled, whereas in models with implicit discontinuities, forcing complex constraints is very difficult; in this case, only simple constraints, such as line continuation, can be introduced.

Many experiments have been performed with both synthetic and real images. The results showed that the use of constraints on the line configurations always improved image quality. They also highlighted that in many practical applications the line continuation constraint seems to be the most important one. Because this constraint can be treated using implicit discontinuities, the algorithms referring implicitly to the discontinuities acquire particular importance because of their relatively low complexity. On the basis of our experience, models that do not allow for self-interacting lines can also give satisfactory results in many practical applications. This is particularly true when graduated discontinuities are used. Nevertheless, this entails a fine tuning of the model parameters, namely the regularization parameter and the threshold for creating a line, and considerably reduces the robustness of the methods with respect to small variations in the parameters. On the other hand, despite the many methods proposed in the literature for estimating the model parameters from the data or from examples, none of them at present can be considered effective for large problems. For these reasons, we believe that parameter estimation should still receive particular attention in the future.

Another important aspect that we have focused on in this review is the high complexity of the reconstruction algorithms. Complexity is strictly connected to the nonconvexity of the posterior energy and to the usually very high number of variables to be estimated. The computation time required is still impractical for real-time applications when sequential computers are used. Fortunately, most of the algorithms considered are intrinsically parallel, and substantial advantages can be obtained by using different types of parallel architectures, including massively parallel
architectures such as neural networks. However, the study of suitable architectures is still an open issue. When real-time processing is not required, we believe that several algorithms reported in this review could have practical applications. In particular, mixed-annealing algorithms can run on general-purpose parallel machines, such as the Cray T3D, in about 2-5 minutes, depending on the image size. Algorithms that treat the discontinuities implicitly usually require less than 10 minutes even on sequential machines.
ACKNOWLEDGMENTS

We are grateful to all those who have worked with us in these last few years. In particular, we should like to thank Franco Caroti-Ghelli for helpful discussions and Lucio Benvenuti, Simone Pandolfi, Maria Pepe, and Xiaoyu Qiao for their valuable contributions. Special thanks to Enrico Fantini and Alberto Ribolini for their software support. This work has been partially supported by the North-East Italy Inter-University Computing Center (CINECA), which placed some free computing time on the Cray T3D at our disposal.
REFERENCES

Aarts, E., and Korst, J. (1989). "Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing." Wiley, Chichester, UK.
Ambrosio, L., and Tortorelli, V. M. (1990). Approximations of functionals depending on jumps by elliptic functionals via Γ-convergence. Comm. Pure Appl. Math. 43, 999-1036.
Andrews, H. C., and Hunt, B. R. (1977). "Digital Image Restoration." Prentice-Hall, Englewood Cliffs, NJ.
Aubert, G., Barlaud, M., Blanc-Féraud, L., and Charbonnier, P. (1994). Deterministic edge-preserving regularization in computed imaging. Université de Nice-Sophia Antipolis, France, Laboratoire I3S, Rapport de Recherche no. 94-01.
Azencott, R. (1990). Synchronous Boltzmann machines and Gibbs fields: learning algorithms. In "Neurocomputing" (F. Fogelman and J. Hérault, Eds.), NATO ASI F 68, pp. 51-64.
Azencott, R. (1992). Boltzmann machines: high-order interactions and synchronous learning. In "Stochastic Models in Image Analysis" (P. Barone and A. Frigessi, Eds.), pp. 14-45. Lecture Notes in Statistics. Springer-Verlag, New York.
Backus, G. (1970). Inference from inadequate and inaccurate data, Parts I, II and III. Proc. Natl. Acad. Sci. USA 65, 1-105, 281-287; 67, 282-289.
Bedini, L., and Tonazzini, A. (1990). Neural networks use in maximum entropy image restoration. Image Vision Comput. 8, 108-114.
Bedini, L., and Tonazzini, A. (1992). Image restoration preserving discontinuities: the Bayesian approach and neural networks. Image Vision Comput. 10, 108-118.
Bedini, L., Fantini, E., and Tonazzini, A. (1991). A dual approach to regularization in image restoration. Pattern Recogn. Lett. 12, 687-692.
Bedini, L., Benvenuti, L., Salerno, E., and Tonazzini, A. (1993a). A mixed-annealing algorithm for edge preserving image reconstruction using a limited number of projections. Signal Process. 32, 397-408.
Bedini, L., Pandolfi, S., and Tonazzini, A. (1993b). Training a Boltzmann machine for edge-preserving image restoration. In "Neural and Stochastic Methods in Image and Signal Processing" (S. S. Chen, Ed.), Proc. SPIE 2032, pp. 192-202.
Bedini, L., Pepe, M. G., Salerno, E., and Tonazzini, A. (1993c). Non-convex optimization for image reconstruction with implicitly referred discontinuities. In "Image Processing: Theory and Applications" (G. Vernazza, A. N. Venetsanopoulos, and C. Braccini, Eds.), pp. 263-266. Elsevier, Amsterdam.
Bedini, L., Gerace, I., and Tonazzini, A. (1994a). A deterministic algorithm for reconstructing images with interacting discontinuities. CVGIP: Graphical Models Image Process. 56, 109-123.
Bedini, L., Gerace, I., and Tonazzini, A. (1994b). A GNC algorithm for constrained image reconstruction with continuous-valued line processes. Pattern Recogn. Lett. 15, 907-918.
Bedini, L., Qiao, X., and Tonazzini, A. (1994c). Using a generalized Boltzmann machine in edge-preserving image restoration. IEI-CNR, Pisa, Italy, internal report B4-40.
Bedini, L., Salerno, E., and Tonazzini, A. (1994d). Edge-preserving tomographic reconstruction from Gaussian data using a Gibbs prior and a generalized expectation-maximization algorithm. Int. J. Imaging Syst. Technol. 5, 231-238.
Bedini, L., Gerace, I., and Tonazzini, A. (1995). Sigmoidal approximations for self-interacting line processes in edge-preserving image restoration. Pattern Recogn. Lett. 16, 1011-1022.
Bertero, M., Poggio, T., and Torre, V. (1988). Ill-posed problems in early vision. IEEE Proc. 76, 869-889.
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). J. Royal Statist. Soc. Ser. B 36, 192-236.
Besag, J. (1986). On the statistical analysis of dirty pictures. J. Royal Statist. Soc. Ser. B 48, 259-302.
Besag, J. (1989). Towards Bayesian image analysis. J. Appl. Statist. 16, 395-407.
Blake, A. (1989). Comparison of the efficiency of deterministic and stochastic algorithms for visual reconstruction. IEEE Trans. Pattern Anal. Machine Intell. 11, 2-12.
Blake, A., and Zisserman, A. (1987a). "Visual Reconstruction." MIT Press, Cambridge, MA.
Blake, A., and Zisserman, A. (1987b). Localising discontinuities using weak continuity constraints. Pattern Recogn. Lett. 6, 51-59.
Burch, S. F., Gull, S. F., and Skilling, J. (1983). Image restoration by a powerful maximum entropy method. Comput. Vision, Graphics, Image Process. 23, 113-128.
Canny, J. (1986). A computational approach to edge detection. IEEE Trans. Pattern Anal. Machine Intell. PAMI-8, 679-698.
Courant, R., and Hilbert, D. (1962). "Methods of Mathematical Physics." Interscience, London.
Cross, G. R., and Jain, A. K. (1983). Markov random field texture models. IEEE Trans. Pattern Anal. Machine Intell. 5, 25-39.
De Giorgi, E. (1977). Γ-convergenza e G-convergenza. Boll. Unione Matematica Italiana 14-A, 213-220.
De Mol, C. (1992). A critical survey of regularized inversion methods. In "Inverse Problems in Scattering and Imaging" (M. Bertero and E. R. Pike, Eds.), pp. 345-370. Adam Hilger, Bristol.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Stat. Soc. B 39, 1-38.
Derin, H., and Elliott, H. (1987). Modeling and segmentation of noisy and textured images using Gibbs random fields. IEEE Trans. Pattern Anal. Machine Intell. 9, 39-55.
Derin, H., and Güler, S. (1990). Realizations and parameter estimation for line processes. Proc. ICASSP 90, pp. 2213-2216. IEEE, New York.
Derin, H., and Kelly, P. A. (1989). Discrete-index Markov-type random processes. IEEE Proc. 77, 1485-1510.
Franklin, J. N. (1970). Well posed stochastic extensions of ill posed linear problems. J. Math. Anal. Appl. 31, 682-716.
Frieden, B. R. (1985). Dice, entropy, and likelihood. IEEE Proc. 73, 1764-1770.
Garnero, L., Franchois, A., Hugonin, J.-P., Pichot, C., and Joachimowicz, N. (1991). Microwave imaging: complex permittivity reconstruction by simulated annealing. IEEE Trans. Microwave Theory Techn. MTT-39, 1801-1807.
Geiger, D., and Girosi, F. (1989). Parallel and deterministic algorithms for MRFs: surface reconstruction and integration. Artificial Intelligence Laboratory, MIT, AI Memo 1114.
Geiger, D., and Girosi, F. (1991). Parallel and deterministic algorithms for MRFs: surface reconstruction. IEEE Trans. Pattern Anal. Machine Intell. PAMI-13, 401-412.
Geman, S., and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell. PAMI-6, 721-740.
Geman, S., and McClure, D. E. (1985). Bayesian image analysis: an application to single photon emission tomography. Proc. Am. Stat. Assoc. Stat. Comp. Sect., pp. 12-18.
Geman, S., and McClure, D. E. (1987). Statistical methods for tomographic image reconstruction. Proceedings of the 46th Session of the ISI. Bull. ISI 52, 1-18.
Geman, D., and Reynolds, G. (1992). Constrained restoration and the recovery of discontinuities. IEEE Trans. Pattern Anal. Machine Intell. PAMI-14, 367-383.
Geman, D., and Yang, C. (1994). Nonlinear image recovery with half-quadratic regularization and FFT. Preprint.
Gerace, I. (1992). Algoritmi deterministici per la ricostruzione di immagini che presentano discontinuità. Thesis in Computer Science, University of Pisa, Italy, 1991-1992.
Gindi, G., Lee, M., Rangarajan, A., and Zubal, I. G. (1991). Bayesian reconstruction of functional images using registered anatomical images as priors. In "Information Processing in Medical Imaging" (A. C. F. Colchester and D. J. Hawkes, Eds.), pp. 121-131. Springer-Verlag, New York.
Golub, G. H., Heath, M., and Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21, 215-223.
Green, P. J. (1990). Bayesian reconstructions from emission tomography data using a modified EM algorithm. IEEE Trans. Med. Imag. 9, 84-93.
Gull, S., and Skilling, J. (1984). Maximum entropy method in image processing. IEE Proc. F-131, 646-659.
Hall, P., and Titterington, D. M. (1987). Common structure of techniques for choosing smoothing parameters in regression problems. J. Royal Stat. Soc. B 49, 184-198.
Hammersley, J. M., and Handscomb, D. C. (1985). "Monte Carlo Methods." Methuen, London.
Hebert, T. J., and Gopal, S. S. (1992). The GEM MAP algorithm with 3-D SPECT system response. IEEE Trans. Med. Imag. 11(1), 81-90.
Hebert, T., and Leahy, R. (1989). A generalized EM algorithm for 3-D Bayesian reconstruction from Poisson data using Gibbs priors. IEEE Trans. Med. Imag. 8, 194-202.
Hebert, T. J., and Leahy, R. (1992). Statistic-based MAP image reconstruction from Poisson data using Gibbs priors. IEEE Trans. Signal Process. 40, 2290-2302.
Herman, G. T. (1980). "Image Reconstruction from Projections: The Fundamentals of Computerized Tomography." Academic Press, London.
Hinton, G. E., Sejnowski, T. J., and Ackley, D. H. (1984). Boltzmann machines: constraint satisfaction networks that learn. Carnegie-Mellon University, Technical Report CMU-CS-84-119.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79, 2554-2558.
Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proc. Natl. Acad. Sci. USA 81, 3088-3092.
Hopfield, J. J. (1985). Neural computation of decisions in optimization problems. Biol. Cybern. 52, 141-152.
Hopfield, J. J., and Tank, D. W. (1986). Computing with neural circuits: a model. Science 233, 625-633.
Howard, D., and Moran, W. (1993). Self annealing when learning a Markov random field image model. In "Complex Systems: From Biology to Computation" (D. G. Green and T. Bossomaier, Eds.), pp. 327-340. IOS Press, Amsterdam.
Hunt, B. R. (1973). The application of constrained least squares estimation to image restoration by digital computer. IEEE Trans. Comput. 22, 805-812.
Hunt, B. R. (1977). Bayesian methods in nonlinear digital image restoration. IEEE Trans. Comput. 26, 219-229.
Jaynes, E. T. (1968). Prior probabilities. IEEE Trans. Syst. Sci. Cybern. SSC-4(3), 227-241.
Jaynes, E. T. (1982). On the rationale of maximum-entropy methods. IEEE Proc. 70(9), 939-952.
Jeng, F. C., and Woods, J. W. (1988). Inhomogeneous Gaussian image models for image estimation and restoration. IEEE Trans. Acoust. Speech Signal Process. 36, 1305-1312.
Jeng, F. C., and Woods, J. W. (1990). Simulated annealing in compound Gaussian random fields. IEEE Trans. Inform. Theory 36, 94-107.
Jeng, F. C., and Woods, J. W. (1991). Compound Gauss-Markov random fields for image estimation. IEEE Trans. Signal Process. 39, 683-697.
Johnson, R. W., and Shore, J. E. (1981). Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Inform. Theory 27, 472-482.
Johnson, R. W., and Shore, J. E. (1983). Comments and correction to "Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy." IEEE Trans. Inform. Theory 29, 942-943.
Johnson, V. E., Wong, W. H., Hu, X., and Chen, C. (1991). Bayesian restoration of PET images using Gibbs priors. In "Information Processing in Medical Imaging" (D. A. Ortendahl and J. Llacer, Eds.), pp. 15-28. Wiley-Liss, New York.
Kay, J. W. (1988). On the choice of regularization parameter in image restoration. In "Springer Lecture Notes in Computer Science," Vol. 301, pp. 587-596. Springer-Verlag, New York.
Kikuchi, R., and Soffer, B. H. (1977). Maximum entropy image restoration. I. The entropy expression. J. Opt. Soc. Am. 67, 1656-1665.
Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983). Optimization by simulated annealing. Science 220, 671-680.
Koch, C., Marroquin, J., and Yuille, A. (1986). Analog 'neuronal' networks in early vision. Proc. Natl. Acad. Sci. USA 83, 4263-4267.
Lakshmanan, S., and Derin, H. (1989). Simultaneous parameter estimation and segmentation of Gibbs random fields using simulated annealing. IEEE Trans. Pattern Anal. Machine Intell. 11, 799-813.
Lange, K., and Carson, R. (1984). EM reconstruction algorithms for emission and transmission tomography. J. Comput. Assist. Tomogr. 8, 306-316.
La Salle, J., and Lefschetz, S. (1961). "Stability by Liapunov's Direct Method with Applications." Academic Press, London.
Leahy, R. M., and Goutis, C. E. (1986). An optimal technique for constraint-based image restoration and reconstruction. IEEE Trans. Acoust., Speech Signal Process. 34, 1629-1642.
Leahy, R. M., and Tonazzini, A. (1986). Maximum entropy signal restoration from short data records. In Proc. 8th IASTED Int. Symp. MECO '86, Taormina, Italy (G. Messina and M. H. Hamza, Eds.), pp. 195-199. Acta Press, Anaheim, CA.
Leahy, R., and Yan, X. (1991). Incorporation of anatomical MR data for improved functional imaging with PET. In "Information Processing in Medical Imaging" (A. C. F. Colchester and D. J. Hawkes, Eds.), pp. 105-120. Springer-Verlag, New York.
Levitan, E., and Herman, G. T. (1987). A maximum a posteriori probability expectation maximization algorithm for image reconstruction in emission tomography. IEEE Trans. Med. Imag. 6(3), 185-192.
Levitan, E., Chan, M., and Herman, G. T. (1995). Image-modeling Gibbs priors. Graphical Models Image Process. 57, 117-130.
Luenberger, D. G. (1969). "Optimization by Vector Space Methods." Wiley, New York.
Luenberger, D. G. (1984). "Linear and Nonlinear Programming," 2nd ed. Addison-Wesley, Reading, MA.
Manbeck, K. M. (1992). On Gaussian approximation to the Poisson distribution in image processing. Reports in Pattern Theory, no. 157, Division of Applied Mathematics, Brown University, Providence, RI.
March, R. (1988). Computation of stereo disparity using regularization. Pattern Recogn. Lett. 8, 181-187.
March, R. (1989). A regularization model for stereo vision with controlled discontinuity. Pattern Recogn. Lett. 10, 259-263.
March, R. (1992). Visual reconstruction with discontinuities using variational methods. Image Vision Comput. 10, 30-38.
Marroquin, J., Mitter, S., and Poggio, T. (1987). Probabilistic solution of ill-posed problems in computational vision. J. Am. Stat. Assoc. 82, 76-89.
Marroquin, J. L. (1984). Surface reconstruction preserving discontinuities. MIT, Artificial Intelligence Laboratory, AI Memo 792.
Marroquin, J. L. (1985). Probabilistic solution of inverse problems. Ph.D. thesis, MIT, T.R. 860.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., and Teller, E. (1953). Equations of state calculations by fast computing machines. J. Chem. Phys. 21, 1087-1091.
Miller, K. (1970). Least squares methods for ill-posed problems with a prescribed bound. SIAM J. Math. Anal. 1, 52-74.
Minerbo, G. (1979). MENT: a maximum entropy algorithm for reconstructing a source from projection data. Computer Graphics Image Process. 10, 48-68.
Moré, J. J., and Wu, Z. (1995). Global continuation for distance geometry problems. Argonne National Laboratory, Preprint MCS-P505-0395.
Morozov, V. A. (1966). On the solution of functional equations by the method of regularization. Soviet Math. Dokl. 7, 414-417.
Mumford, D., and Shah, J. (1989). Optimal approximations by piecewise smooth functions and associated variational problems. Comm. Pure Appl. Math. 42, 577-685.
Poggio, T. (1985). Early vision: from computational structure to algorithms and parallel hardware. Comput. Vision Graph. Image Process. 31, 139-155.
Poggio, T., and Koch, C. (1985). Ill-posed problems in early vision: from computational theory to analogue networks. Proc. Roy. Soc. Lond. B 226, 303-323.
Poggio, T., Torre, V., and Koch, C. (1985). Computational vision and regularization theory. Nature 317, 314-319.
Reinsch, C. H. (1967). Smoothing by spline functions. Numer. Math. 10, 177-183.
Salerno, E., Bedini, L., Benvenuti, L., and Tonazzini, A. (1993). GEM algorithm for edge-preserving reconstruction in transmission tomography from Gaussian data. In "Mathematical Methods in Medical Imaging II" (D. C. Wilson and J. N. Wilson, Eds.), Proc. SPIE 2035, pp. 156-165.
Scales, L. E. (1985). "Introduction to Non-linear Optimization." Macmillan, New York.
Shepp, L. A., and Vardi, Y. (1982). Maximum likelihood reconstruction for emission tomography. IEEE Trans. Med. Imag. 1(2), 113-122.
Shore, J. E., and Johnson, R. W. (1981). Properties of cross-entropy minimization. IEEE Trans. Inform. Theory 27, 472-482.
Tarantola, A. (1987). "Inverse Problem Theory." Elsevier, Amsterdam.
Terzopoulos, D. (1986). Regularization of inverse visual problems involving discontinuities. IEEE Trans. Pattern Anal. Mach. Intell. 8, 413-424.
Terzopoulos, D. (1988). The computation of visible-surface representations. IEEE Trans. Pattern Anal. Machine Intell. 10(4), 417-438.
Thompson, A. M., Brown, J. C., Kay, J. W., and Titterington, D. M. (1991). A study of methods of choosing the smoothing parameter in image restoration by regularization. IEEE Trans. Pattern Anal. Mach. Intell. 13, 326-337.
Tikhonov, A. N. (1963). Solution of incorrectly formulated problems and the regularization method. Soviet Math. Dokl. 4, 1035-1038.
Tikhonov, A. N., and Arsenin, V. Y. (1977). "Solutions of Ill-posed Problems." Wiley, Washington, DC.
Titterington, D. M. (1984). The maximum entropy method for data analysis, plus Skilling replies. Nature 312, 381-382.
Trussell, H. J. (1980). The relationship between image restoration by the maximum a posteriori method and a maximum entropy method. IEEE Trans. Acoust., Speech Signal Process. 28, 114-117.
Veklerov, E., and Llacer, J. (1987). Stopping rule for the MLE algorithm based on statistical hypothesis testing. IEEE Trans. Med. Imag. 6, 313-319.
Wernecke, S. J., and D'Addario, L. R. (1977). Maximum entropy image reconstruction. IEEE Trans. Comput. 26, 351-364.
Zhao, Y., Zhuang, X., Atlas, L., and Anderson, L. (1992). Parameter estimation and restoration of noisy images using Gibbs distributions in hidden Markov models. CVGIP: Graphical Models Image Process. 54, 187-197.
Successive Approximation Wavelet Vector Quantization for Image and Video Coding

E. A. B. DA SILVA AND D. G. SAMPSON*

Department of Electronic Systems Engineering, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, England

* E-mail addresses: [email protected] and [email protected].
I. Introduction . . . 191
II. Wavelets . . . 195
   A. Introduction to Wavelet Transforms . . . 195
   B. Application of Wavelet Transforms in Image Compression . . . 198
   C. Motivations for Using Successive Approximation Quantization in Wavelet Transform Coding . . . 203
III. Successive Approximation Quantization . . . 205
   A. Successive Approximation Quantization . . . 205
   B. Selection of the Orientation Codebook . . . 214
   C. Regular Lattices . . . 216
IV. Successive Approximation Wavelet Lattice Vector Quantization . . . 221
   A. Description of the Coding Algorithm . . . 221
V. Application to Image and Video Coding . . . 226
   A. Still Image Coding . . . 226
   B. Low-Bit-Rate Video Coding . . . 232
VI. Conclusions . . . 252
References . . . 253
I. INTRODUCTION

Visual information is an important factor in human activities. Until recently, storage and transmission of pictorial information, such as photography, cinema, television, and video, were restricted to conventional analog methods. However, it has been recognized that improving the means of picture representation and processing can have a significant impact on several consumer applications. As a result, there has been a growing demand for digital pictures during the last two decades. Still and moving digital image signals produce a vast amount of data. For example, a single frame of a super-high-definition image occupies around 12 Mbytes of memory, whereas 30 minutes of uncompressed video of digital
broadcast TV quality would require around 38 gigabytes of data to be stored. Hence, the storage requirements of uncompressed still and moving image files are very high compared with those of text files. On the other hand, even with low-cost storage devices available, the speed of data transfer can impose another limitation on real-time processing of video signals. A full-resolution broadcast TV video signal requires a data rate as high as 21 Mbytes/s. Real-time transmission of videoconference and videophone pictures over the existing communication channels [Public Switched Telephone Network (PSTN) or narrowband Integrated Services Digital Network (ISDN)] also requires a considerable reduction of the original bulk of image data. Therefore, considering the current stage of technology in storage devices, computational power, and telecommunication networks, compression of image and video data is an essential part of digital imaging and multimedia systems (Netravali and Haskell, 1995; Zhang et al., 1995).

Compression of still and moving image data is possible due to the considerable amount of redundant and irrelevant information that exists in digital images. The main objectives of an efficient image coding algorithm can be defined as (i) the reduction of signal redundancy and (ii) the removal of irrelevant information (Jayant, 1994). Figure 1 illustrates a general framework for digital coding systems. It can be decomposed into two parts: the encoder and the decoder. Each part consists of four distinct functions that sometimes can be linked together. The encoder consists of preprocessing, the analysis stage of the original signal representation, quantization, and codeword encoding. The decoder consists of codeword decoding, inverse quantization, synthesis of the reconstructed signal, and postprocessing.
FIGURE 1. General framework of a digital coding system. The encoder performs preprocessing, signal representation (analysis), quantization, and codeword encoding; the compressed bit stream is sent over the channel to the decoder, which performs codeword decoding, inverse quantization, signal representation (synthesis), and postprocessing.
Pre- and postprocessing modules are usually not considered as part of the coding algorithm; however, they can be important features of a compression system. Preprocessing may involve image format conversion (e.g., from CCIR-601 to CIF, from CIF to QCIF), color space conversion (e.g., from RGB to YUV), or spatial/temporal filtering (e.g., to remove camera noise from an image sequence). Postprocessing can involve the inverse image format or color space conversions, or some filtering to reduce perceptually annoying artifacts in the decoded images.

Representation of the Signal. This stage of the compression algorithm aims to remove the redundant and irrelevant information from the original image signal by representing the image in a different form, more suitable for compression. Ideally, one expects to compress the maximum possible amount of perceptually important information into a small fraction of parameters that will be further processed during the next stage (quantization). The existing methods for redundancy removal can be classified into three general categories: (i) predictive methods (e.g., DPCM), (ii) transform methods (e.g., subband, wavelet, DCT), and (iii) model-based methods. They can be combined into hybrid schemes to obtain more efficient signal representation.

Quantization. The representation methods described in the previous section attempt to place most of the perceptually important information into a few parameters; however, they do not actually compress the original signal. To achieve compression, it is necessary to perform some type of quantization. Quantization can be defined as the mapping of a set of input samples with M distinct values into a finite set of N (N < M) discrete output or quantization values, referred to as the quantization alphabet or codebook. There are two main categories of quantization (a small illustrative sketch follows this list):

• Scalar quantization (SQ), where each input sample is individually quantized.
• Vector quantization (VQ), where a group of input samples is quantized as one entity.
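As a concrete, purely illustrative contrast between the two categories (the step size and the three-entry codebook below are assumptions made for the example, not values from the chapter), the following Python sketch quantizes the same block of samples first with a uniform scalar quantizer and then with a nearest-neighbor vector quantizer:

import numpy as np

def scalar_quantize(samples, step):
    """Uniform scalar quantization: each sample is mapped independently to the nearest multiple of step."""
    return step * np.round(samples / step)

def vector_quantize(block, codebook):
    """Vector quantization: the whole block is mapped to the nearest codevector (Euclidean distance)."""
    distances = np.linalg.norm(codebook - block, axis=1)
    index = int(np.argmin(distances))            # only this index needs to be transmitted
    return index, codebook[index]

samples = np.array([0.9, -1.4, 2.2, 0.1])
print(scalar_quantize(samples, step=1.0))        # each of the input values handled on its own

codebook = np.array([[0.0, 0.0, 0.0, 0.0],       # hypothetical codebook with N = 3 codevectors
                     [1.0, -1.0, 2.0, 0.0],
                     [-1.0, 1.0, -2.0, 0.0]])
print(vector_quantize(samples, codebook))        # the block is coded as a single entity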
Vector quantization, which is a generalization of scalar quantization, can offer certain advantages over scalar quantization. This comes from the fact that blocks of samples are jointly processed, and therefore dependencies among neighboring data as well as properties of k-dimensional space can be exploited (Gersho and Gray, 1991).

Coding. During this stage, binary codewords are assigned to the quantization symbols (which can be either scalars or vectors) produced during the quantization process, in order to form the actual data bit stream that will be stored or transmitted. In general, some form of entropy coding is
performed at this stage. Essentially, entropy coding techniques represent more likely units with a smaller number of bits than less likely units, obtaining as a result a reduction in the average number of bits (Bell et al., 1990). Two popular forms of entropy coding are the following (a small sketch of Huffman coding is given after this list):

• Huffman coding, where each symbol is assigned a binary codeword whose number of bits is approximately equal to the logarithm of the inverse of the probability of occurrence of the symbol.
• Arithmetic coding, where the whole message is represented by a real number. As more symbols are added to the message, the "precision" in the representation of the real number increases. Arithmetic coding is a powerful technique for achieving coding rates very close to the entropy of the source (Witten et al., 1987).
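As an illustration of the first of these two techniques (our own sketch, not taken from the chapter; the toy alphabet and probabilities are assumptions), the following Python fragment builds a binary Huffman code with a heap and shows that less likely symbols receive longer codewords:

import heapq
import itertools

def huffman_code(probabilities):
    """Build a binary Huffman code for {symbol: probability}; returns {symbol: codeword}."""
    counter = itertools.count()                       # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {s: ""}) for s, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)            # merge the two least probable subtrees
        p1, _, code1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

# Toy alphabet: the most likely symbol gets the shortest codeword.
print(huffman_code({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}))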
Most image compression standards to date, i.e., JPEG (Pennebaker and Mitchell, 1993), H.261 (ITU-T, 1990), and MPEG (Le Gall, 1992), use the discrete cosine transform (DCT) at the representation stage. However, wavelet transforms, a relatively recent development from functional analysis (Daubechies, 1991), are anticipated to replace the DCT in future image and data compression products, due to their good compression characteristics and the less annoying coding artifacts they produce. We have developed a coding scheme based on wavelet transforms and lattice vector quantization, which we refer to as successive approximation wavelet lattice vector quantization. According to this method, groups of wavelet coefficients are successively refined by a series of vectors having decreasing magnitudes and orientations chosen from a finite codebook. The performance of this coding method has been investigated for still image coding (da Silva et al., 1995) and for low-bit-rate video coding (Sampson et al., 1995).

In this chapter we review the successive approximation wavelet lattice vector quantization method and its application to image and video data compression. Section II gives a brief introduction to wavelet transforms, emphasizing their application to image compression. Section III discusses successive approximation quantization: first the scalar case is described, and then the extension to vectors is addressed. Successive approximation using vectors is analyzed and conditions for convergence are derived. Based on the conclusions of this analysis, criteria for the selection of the orientation codebook are determined. Regular lattices are then investigated as orientation codebooks in successive approximation vector quantization. Section IV describes a method for wavelet image coding based on successive approximation lattice vector quantization (SA-W-LVQ). Section V describes the application of SA-W-LVQ to still image and video coding. Finally, Section VI presents the conclusions.
II. WAVELETS

A. Introduction to Wavelet Transforms
A wavelet transform (WT) is the decomposition of a signal into a set of basis functions consisting of contractions, expansions, and translations of a mother function ψ(t), called the wavelet (Daubechies, 1991). Any function x(t) ∈ L²(ℝ), the space of square integrable functions in ℝ, can be expressed as

$$x(t) = \sum_{m}\sum_{n} \tilde{x}_{m,n}\, 2^{-m/2}\, \tilde{\psi}\big(2^{-m}t - n\big). \tag{1}$$

The x̃_{m,n} are the coefficients of a discrete biorthogonal wavelet transform of x(t). The functions ψ(t) and ψ̃(t) are called the analysis and synthesis wavelets. In order to understand the implementation of a wavelet transform of a digital signal, one has to consider, besides the analysis and synthesis wavelets, the analysis and synthesis scaling functions, φ(t) and φ̃(t), respectively (Vetterli and Herley, 1992). They should be such that they constitute a biorthogonal set, i.e., for m ∈ ℤ:
$$\langle \phi(t), \tilde{\phi}(t-m)\rangle = \delta_m, \tag{2}$$

$$\langle \psi(t), \tilde{\psi}(t-m)\rangle = \delta_m, \tag{3}$$

$$\langle \phi(t), \tilde{\psi}(t-m)\rangle = \langle \psi(t), \tilde{\phi}(t-m)\rangle = 0, \tag{4}$$

where

$$\delta_m = \begin{cases} 1, & m = 0,\\ 0, & \text{elsewhere}. \end{cases} \tag{5}$$
Also, the two-scale equations relating the scaling functions and the wavelets to the filter coefficients h_{0n}, g_{0n}, h_{1n}, and g_{1n} [Eqs. (7)-(10)] have to be satisfied.
Assuming that the functions have finite support (Daubechies, 1988), the z-transforms of h_{0n}, g_{0n}, h_{1n}, and g_{1n} in Eqs. (7)-(10), that is, the functions H₀(z), G₀(z), H₁(z), and G₁(z), respectively, must satisfy the following equations (Vetterli and Herley, 1992):

$$H_0(z)G_0(z) + H_0(-z)G_0(-z) = 2, \tag{11}$$

$$G_0(z) = z^{2m-1} H_1(-z), \tag{12}$$

$$G_1(z) = -z^{2m-1} H_0(-z). \tag{13}$$
For the biorthogonal functions to be reasonably smooth, H₀(z) and G₀(z) need to have the characteristics of low-pass filters, and H₁(z) and G₁(z) of high-pass filters (Daubechies, 1988; Vetterli and Herley, 1992). In order to compute the discrete biorthogonal wavelet transform of a signal x(t) from its digital representation, x_{0,n}, it is assumed that x_{0,n} is equivalent to the coefficients of the projection of x(t) onto the space generated by the functions φ̃(t − n), that is, x_{0,n} = ⟨x(t), φ(t − n)⟩ [Eq. (14)].
The wavelet transform of x(t), x̃_{m,n}, can be computed by the following recursion for m ≥ 1 (Mallat, 1989; Vetterli and Herley, 1992):

$$x_{m,n} = \sum_{k} h_{0k}\, x_{m-1,\,2n-k}, \tag{15}$$

$$\tilde{x}_{m,n} = \sum_{k} h_{1k}\, x_{m-1,\,2n-k}. \tag{16}$$
The digital representation of the signal can be recovered from its wavelet transform coefficients by another recursion:

$$x_{m-1,n} = \sum_{k} x_{m,k}\, g_{0,\,n-2k} + \sum_{k} \tilde{x}_{m,k}\, g_{1,\,n-2k}. \tag{17}$$
Equations (15) and (16) imply that the coefficients of a wavelet transform can be obtained from the digital representation of the signal by filtering it with H₀(z) and H₁(z) and subsampling both outputs by a factor of 2 (Crochiere and Rabiner, 1983). Recursive application of the above to the subsampled output of H₀(z) results in an octave-band subband decomposition of the signal. Conversely, Eq. (17) implies that the digital representation of the signal can be recovered from its wavelet transform coefficients by a subband synthesis process. This process is illustrated in Fig. 2. This is equivalent to saying that a wavelet transform can be implemented via a subband analysis and synthesis process, on an octave basis.
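A minimal sketch of this filter-and-subsample interpretation is given below in Python/NumPy; it is our own illustration, using the orthogonal Haar filters as a stand-in for a generic biorthogonal set (h0, h1, g0, g1), and the choice of subsampling phase is likewise an assumption of the example. One level of the analysis of Eqs. (15)-(16) is computed and the signal is recovered with the synthesis of Eq. (17).

import numpy as np

h0 = np.array([1.0, 1.0]) / np.sqrt(2.0)    # analysis low-pass (Haar, used as a stand-in)
h1 = np.array([1.0, -1.0]) / np.sqrt(2.0)   # analysis high-pass
g0 = np.array([1.0, 1.0]) / np.sqrt(2.0)    # synthesis low-pass
g1 = np.array([-1.0, 1.0]) / np.sqrt(2.0)   # synthesis high-pass

def analysis(x):
    """One level of Eqs. (15)-(16): filter with h0 and h1, keep every second output sample."""
    low = np.convolve(x, h0)[1::2]
    high = np.convolve(x, h1)[1::2]
    return low, high

def synthesis(low, high, length):
    """One level of Eq. (17): upsample by 2, filter with g0 and g1, and add the two branches."""
    up_low = np.zeros(2 * len(low)); up_low[::2] = low
    up_high = np.zeros(2 * len(high)); up_high[::2] = high
    return (np.convolve(up_low, g0) + np.convolve(up_high, g1))[:length]

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0])
low, high = analysis(x)
print(np.allclose(synthesis(low, high, len(x)), x))   # True: the signal is recovered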
FIGURE 2. (a) Computation of the discrete wavelet transform; (b) recovery of the digital signal from the wavelet coefficients.
Two-Dimensional Wavelet Transforms. One way to generate two-dimensional wavelet transforms is to apply the recursions of Eqs. (15) and (16) to both the rows and the columns of a digital image. This is referred to as a separable two-dimensional wavelet transform (see Fig. 3a). Alternatively, nonseparable wavelet transforms can be generated. Examples can be found in Kovačević and Vetterli (1992) and Tay and Kingsbury (1993).
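Along these lines, one stage of a separable two-dimensional decomposition can be sketched (again only as an illustration, with the Haar pairing written out directly) by filtering the rows and then the columns of the image; the naming of the two mixed bands as horizontal or vertical detail is a convention assumed here rather than taken from the text.

import numpy as np

def haar_rows(a):
    """One-dimensional Haar analysis applied along the rows (pairs of neighboring columns)."""
    low = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2.0)
    high = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2.0)
    return low, high

def separable_stage(image):
    """One stage of a separable 2-D wavelet transform: rows first, then columns.
    Returns the low-pass band and the three oriented detail bands of Fig. 3a."""
    row_low, row_high = haar_rows(image)
    L, H = haar_rows(row_low.T)        # filtering the columns = filtering the rows of the transpose
    V, D = haar_rows(row_high.T)
    return L.T, H.T, V.T, D.T

image = np.arange(64.0).reshape(8, 8)
bands = separable_stage(image)
print([b.shape for b in bands])        # four 4x4 bands from an 8x8 image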
B. Application of Wavelet Transforms in Image Compression

The wavelet transform, like any other transform or frequency subdivision scheme, essentially generates coefficients that are less correlated than the original image samples. It is not a coding technique by itself, because the coefficients still need to be coded. The advantage of a wavelet transform is that its coefficients, being less correlated, are easier to code than the image pixels. Therefore, the development of image data compression systems employing wavelet transforms requires devising efficient techniques for coding wavelet coefficients. In this section, we briefly describe some of the main techniques used for this purpose. First, the characteristics of wavelet transforms which are most relevant to image coding will be described. They are as follows:

Variable Spatial and Frequency Resolutions. Wavelet transforms can perform a space-frequency decomposition of an image with arbitrary spatial and frequency resolutions (Rioul and Vetterli, 1991). This makes them particularly suitable for analyzing or coding real images. Real images usually have features of different sizes, which demand the variable frequency and spatial resolutions of the wavelet transform.

Possibility of Adaptation to HVS Characteristics. Similarly to any frequency decomposition system, wavelet transforms permit bit allocation among the different channels to be made according to the human visual system (HVS) frequency response (Macq, 1992). This is known as noise shaping and can provide a perceptually optimum distribution of the coding distortions among the coefficients.

Similarities among Bands. A two-dimensional separable n-stage wavelet transform can be represented as in Fig. 3a, where L_i, V_i, H_i, and D_i are, respectively, the low-pass, vertical, horizontal, and diagonal bands generated after an i-stage transformation. Figure 3b shows a three-stage wavelet transform of the Lena test image. It can be observed that all the vertical bands look like scaled versions of each other, the same being true for the horizontal and diagonal bands. This is equivalent to their edges being at approximately the same corresponding positions. This implies that the nonsignificant coefficients from bands of the same orientation tend to be in the same corresponding locations. Figure 4 shows corresponding coefficients in bands V_i, i = 1, ..., 4. This similarity can be stated more precisely as follows: in Fig. 4, if the coefficient b_k(i, j) in band B_k is zero, it is likely that the coefficients b_{k−1}(2i, 2j), b_{k−1}(2i + 1, 2j), b_{k−1}(2i, 2j + 1), and b_{k−1}(2i + 1, 2j + 1) in band B_{k−1} will also be zero, where B can be V, H, or D (Lewis and Knowles, 1992). The similarity among the bands can be exploited in the coding of wavelet coefficients, by using information in band B_k to make some
FIGURE 3. (a) Two-dimensional n-stage wavelet transform; (b) three-stage wavelet transform of the Lena image. From da Silva, E. A. B., Sampson, D. G., and Ghanbari, M., A successive approximation vector quantizer for wavelet transform image coding, IEEE Trans. Image Process. 5 (2), 299-310. © 1995 IEEE.
FIGURE 4. Corresponding coefficients in bands V_i, i = 1, ..., 4.
prediction about the coefficients in band B_{k−1}, or vice versa. Therefore, wavelet image coding techniques can be characterized not only in terms of how the coefficients are coded but also in terms of whether or not the similarities among the bands are exploited. The most popular wavelet coding methods can be categorized according to:

• whether the bands are coded independently of each other or their similarities are exploited;
• the coefficient quantization and coding technique.

1. Methods in Which Bands Are Independently Coded

a. Scalar Quantization of Coefficients. In this category, each band of coefficients is coded independently of the others by using scalar quantization techniques. Gharavi and Tabatabai (1988) were among the first to use
this approach successfully. They employ a two-stage wavelet transform. Referring to Fig. 3, the band L₂ is coded using DPCM and a nonuniform scalar quantizer, followed by variable-length coding. The remaining bands are coded using uniform scalar quantizers with a dead zone, followed by a combination of run-length and variable-length coding. Run-length coding is an efficient method for coding a small number of significant coefficients scattered among many nonsignificant ones. This makes the coding of the nonsignificant information quite efficient. Run-length coding is often considered in wavelet image coding because the AC bands (H, V, and D) have in general many nonsignificant coefficients (refer to Fig. 3b), which can be quantized as zero, leading to large zero runs and making possible the efficient addressing of the nonzero coefficients.

A different approach is used in DeVore et al. (1992) and Argast et al. (1993), where the wavelet coefficients are quantized in such a way that, at each bit rate, it is guaranteed that the distortion in each coefficient will be smaller than a given threshold. The value of the threshold controls the bit rate. The quantized coefficients are then encoded using an arithmetic coder. This approach guarantees that the ringing distortions (given by the superimposition of the synthesis wavelet, scaled by the quantization error, on the coded image) in all regions of the image will not exceed a maximum. This is perceptually important because even if only a single coefficient has a large quantization error, this error can be spread over a region of the image, leading to an annoying artifact.

A more involved coding method is employed in Efstratiadis et al. (1992), where the wavelet coefficients are scalar quantized and then scanned using a space-filling curve. The addressing of the nonzero coefficients is tackled by a technique known as partition priority coding (PPC) (Huang et al., 1992), which divides the coefficients into classes according to their magnitudes and efficiently encodes their values and positions within each class. While providing efficient addressing of the nonzero coefficients, this technique enables the prioritization of the coding of the most important wavelet coefficients (that is, the ones with the highest magnitudes), which also leads to increased coding efficiency. The magnitude and position information produced by the PPC is encoded using an adaptive arithmetic coder.
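The dead-zone uniform quantizer mentioned above can be written in a few lines. The Python sketch below is an illustration with an arbitrary step size and dead-zone width, not the quantizer actually specified by Gharavi and Tabatabai; it maps every coefficient whose magnitude falls inside the dead zone to zero, which is what produces the long zero runs exploited by the run-length coder.

import numpy as np

def dead_zone_quantize(coeffs, step, dead_zone):
    """Uniform quantizer with a dead zone: the index is 0 inside the dead zone,
    otherwise the magnitude beyond the dead zone is uniformly quantized with the given step."""
    indices = np.sign(coeffs) * np.maximum(0, np.floor((np.abs(coeffs) - dead_zone) / step) + 1)
    return indices.astype(int)

def dequantize(indices, step, dead_zone):
    """Reconstruction at the center of each decision interval (zero for the dead zone)."""
    return np.sign(indices) * (dead_zone + (np.abs(indices) - 0.5) * step) * (indices != 0)

c = np.array([-7.2, -0.4, 0.0, 0.3, 1.1, 5.9])
idx = dead_zone_quantize(c, step=2.0, dead_zone=1.0)
print(idx, dequantize(idx, step=2.0, dead_zone=1.0))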
b. Vector Quantization of Coefficients. A key issue in the performance of wavelet coding techniques is the addressing of the significant coefficients. One way to deal with this problem is by using vector quantization. Besides of its inherent efficiency for coding the significant coefficients, it can offer the extra advantage of coding groups of nonsignificant coefficients as zero vectors. This reduces the amount of bits necessary to code
202
E. A. B. DA SILVA AND D. G . SAMPSON
the nonsignificant coefficients in the bands, which can result in efficient addressing of the significant coefficients. In Antonini et al. (1992), the low-frequency band ( L , ) is scalar quantized and coded by PCM, 8 bits/pixel, while the remaining bands are coded using VQ. Different VQ codebooks are designed for each orientation, referred to as multiresolation codebooks. Because the bands in each orientation have strong structural characteristics (either horizontal, vertical, or diagonal details), VQ codebooks can be efficiently designed to match these characteristics. A different approach is adopted in Li and Zhang (1994). The wavelet transform is applied to vectors instead of pixels; i.e., the algebraic operations of convolution and subsampling are not carried out on individual pixels but on groups of pixels. The authors claim that this type of vector wauelet transfom reduces the correlation among vectors while preserving the correlations inside the vectors, which makes VQ more efficient for each vector. 2. Methodr Which Exploit Similarities among Bands a. Scalar Quantization of Coeficients. In this category of methods, the coefficients are scalar quantized and the similarities among bands are exploited in order to provide efficient coding of the significant information in the bands. The most effective of these methods use the concept of zero-trees, which can provide efficient addressing of the nonzero coefficients. Zero-trees can be understood by referring to Fig. 3a. When bk(i,j ) in band E J E = H, V,D) and all its corresponding coefficients in bands E,, r < k, are zero (or have magnitudes below a certain threshold), instead of transmitting all these zero values, one can just indicate that bk(i,j ) is a zero-tree root. Because, due to the similarity among the bands of same orientation, this is a likely event, the savings in terms of bit rate can be significant. Lewis and Knowles (1992) were among the first to use zero-trees in wavelet image coding. According to their scheme a tree of corresponding coefficients is indicated as a zero-tree according to a cost function based on human visual system properties. The decision is made in such a way that only information based on previously encoded coefficients is used, which enables the decoding to take place without the need for any overhead. The disadvantage of this method is that one has to “guess” whether a coefficient in a higher frequency band is significant or not based on information in lower frequency bands, which can lead to wrong decisions about the existence of a zero-tree root and, therefore, to the loss of significant image detail.
WAVELET VECTOR QUANTIZATION
203
In Shapiro (1993) and Said and Pearlman (1993), the zero-tree roots are combined with successive approximation scalar quantization. According to this approach, the wavelet coefficients are refined in several passes and the most important information is transmitted first. This is equivalent to a bit-plane coding of the wavelet coefficients using zero-tree roots to address significant information efficiently. The locations of the zero-tree roots as well as the values of the bit-planes are encoded with an arithmetic coder. As the successive approximation process enables control of the level of distortion of the wavelet coefficients, it has the extra ability to perform the optimal bit allocation among the bands in a straightforward way. Moreover, in each pass, it guarantees that a certain level of distortion will not be exceeded by each wavelet coefficient, which is perceptually important (DeVore et al., 1992). In Naveen and Woods (1993), a method referred to as finite state scalar quantization is employed for wavelet transform coding. According to this method, the state of the quantizer at a given input sample determines its step size. The state of the quantizer depends on a classification of quantizer outputs, as well as on the interdependences among bands. Therefore, the exploitation of the similarities among bands is intrinsic in the method. The quantizer indices are encoded by an arithmetic coder.
b. Vector Quantization of Coeficients. By using vector quantization, the similarities among bands can be exploited in alternative ways. The most common is the one found in Renaud and Smith (1990), where vectors are formed with corresponding coefficients from bands of the same orientation. This approach aims to take into consideration the interband dependences. However, the rate-distortion results obtained for coding still images are moderate. In Mohsenian and Nasrabadi (1994), a method similar to the zero-tree based ones is used. An edge detector is applied to the reconstructed low-frequency band ( L , ) to identify the position of significant information in the AC bands (edges). Then the significant information in the AC bands is coded by VQ, with the vectors formed by taking one coefficient from each AC band, while the information in the low-frequency bands is coded by intraband VQ. C. Motivationsfor Using Successive Approximation Quantizationin Wavelet Transfomt Coding From the discussion so far, a number of observations can be made regarding the features that can be advantageous for wavelet coders. The first one is related to the exploitation of the similarities among the bands of same orientation. Wavelet coding schemes which exploit these similari-
204
E. A. B. DA SlLVA AND D. G . SAMPSON
ties tend to give better rate-distortion performance than others. Among these, the ones which employ the concept of zero-trees are simple and efficient. Another advantageous feature for wavelet coders is related to the ability to control the levels of distortion in each band. Assuming that the distortion function of the quantizer is exponential, the optimum bit allocation is achieved when the “measured” distortions contributed by each band in the final image are equal. The “measurement” process can in general be modeled as a weighting of the bands with a factor which is dependent on the band and on the distortion metric used (for example, mean square error, human visual system properties). Therefore, if the coding process can control the levels of distortion in each band, it can achieve the optimum bit allocation naturally. More precisely, if a band k has distortion per coefficient dk and a measurement weight w k , the optimum bit allocation occurs when w,d,=C, k = O ,..., K - 1 , ( 18) where C is a constant which depends on the overall bit rate and the image is decomposed by the wavelet transform into K bands. Thus, optimum bit allocation can be achieved by setting the distortions in the bands according to Eq. (18). Note that, even if HVS characteristics are to be taken into account, this affects only the computation of the weights w k , and the resulting perceptually optimum bit allocation can be achieved in the same way. Finally, a quantization error in a wavelet coefficient is equivalent to the superposition, on the reconstructed image, of the corresponding synthesis wavelet scaled by the error. This implies that the quantization distortion of a coefficient will be spread over the reconstructed image by the synthesis wavelet. Therefore, the distortion of a coefficient corresponding, for example, to an edge will not be confined to that edge but will be spread over an area of the image, which can cause annoying ringing artifacts. One consequence of this fact is that edge masking effects cannot easily be exploited when coding wavelet coefficients. In addition, even if only a single coefficient in the whole image is poorly quantized, this coefficient’s quantization error can be quite annoying because it will be spread over an area of the image. Therefore, it is important that a wavelet coder can guarantee that the quantization distortions in every single coefficient are compatible with the aimed level of image quality. Summarizing, it is advantageous for wavelet transform coders to meet the following requirements: (i) To exploit the similarities among the bands of same orientation. (ii) To be able to set an arbitrary average level of distortion for each band.
WAVELET VECTOR QUANTIZATION
205
(iii) To be able to guarantee that the quantization distortion of each coefficient does not exceed a certain maximum. Requirement (i) can be satisfied by using any of the techniques described in Section II,B,2. One way to achieve requirements (ii) and (iii) is to code the wavelet coefficients in successive passes, whereby in each pass the quantization error is further refined. This causes the distortion in each coefficient to be compatible with the refinement level of the current pass, which satisfies requirement (iii). Also, the average distortion in each band is determined by the refinement level of the current pass. This implies that all the bands will tend to have the same level of distortion. However, in order to achieve optimum bit allocation and thus satisfy requirement (ii), Eq. (18) has to be satisfied. Because such a refinement process can guarantee that the average distortions in all the bands are equal, in order to satisfy Eq. (18) it suffices to multiply all coefficients in band k by the measurement weight w, prior to quantization and divide the same coefficients by w, after quantization. This is so because, if the distortion in all bands is d, in band k, after division by w,, the effective distortion will be d, = d/w,, which implies that dkwk is the same for all bands. As seen in Section II,B,2, Shapiro (1993) and Said and Pearlman (1993) propose methods which incorporate successive approximation scalar quantization that implements the refinement of the wavelet coefficients in the way described above and thus satisfies the requirements (ii) and (iii). Also, they use the concept of zero-trees in order to exploit the similarities among bands of same orientation. A method has been proposed in da Silva et al. (1995) which incorporates all these concepts using vector instead of scalar quantization. 111. SUCCESSIVE APPROXIMATION QUANTIZATION A. Successive Approximation Quantization
1. The Scalar Case
The successive approximation of a scalar quantity is equivalent to the approximation of a length L by a series of yardsticks of progressively smaller lengths. The number of yardsticks, or passes of the approximation process, depends on the required level of error. Figure 5 illustrates an example of this process, where the yardstick lengths are halved at each pass. The process begins by choosing an initial yardstick length 1 such that 1 > L. As can be inferred from this figure, after each pass the error magnitude is bounded by the yardstick length, which becomes smaller at each pass (for example, when the yardstick length is 1/2" the error is
206
E. A. B. DA SILVA AND D. G . SAMPSON
L 0 0 0
..-..------FIGURE5. Successive approximation of a coefficient for the scalar case. From da Silva, E. A. B., Sampson, D. G., and Ghanbari, M., A successive approximationvector quantizer for wavelet transform image coding, IEEE Trans. Image Process. 5 (21, 299-310. 6 1995 IEEE.
bounded by 1 / 2 9 . Then, by increasing the number of passes, the error in the representation of L can be made arbitrarily small. With reference to Fig. 5, the length L can be expressed as
Therefore, given an initial yardstick length I , a length L can be represented as a string of “ + ” and “ - ” symbols. As each symbol “ ” or “ - ” is added, the precision in the representation of L increases, and thus the distortion level decreases. In essence, this process is equivalent to the binary representation of real numbers. Each number can be represented by a string of “0’s’’and “1’s” and by increasing the number of digits, the error in the representation can be made arbitrarily small. Assuming that a is the scaling factor which determines the decrease of the yardstick magnitude at each pass, it can be shown that this type of successive approximation process converges whenever a 2 0.5. As the magnitude of the approximation error at pass n is bounded by the the current yardstick length, a“1,one can say that the smaller the value of a the smaller the number of passes required to achieve a certain distortion level. This implies that for maximum coding efficiency, the minimum possible value of a should be used (i.e., a = 0.5); that is, the yardstick length should be halved at each pass.
+
207
WAVELET VECTOR QUANTIZATION
2. Extension of the Successive Approximation Process to Vectors a. The Trivial Case. As seen in the previous sections, methods based on successive approximation of the wavelet coefficients have a good potential for image coding applications. Those successive approximation processes are equivalent to the approximation of a scalar quantity by yardsticks of progressively decreasing lengths, as depicted in Fig. 5. It would be interesting if one could generalize successive approximation processes to k-dimensional vectors instead of scalars, such that some of the potential advantages offered by vector quantization over scalar quantization could be exploited together with the advantages offered by successive approximation processes. A straightforward way of achieving this is to approximate successively the components of the vectors by yardsticks of decreasing lengths, considering in each pass the vectors of decreasing magnitudes whose components are the yardstick lengths corresponding to each component of the original vector. This is illustrated in Fig. 6 using a two-dimensional example.
-
X
-
\ ~
112
FIGURE 6. Illustration of the successive approximation of a two-dimensional vector [according to Eq. (23)l.
208
E. A. B. DA SILVA AND D. G . SAMPSON
Assume that the two components u1 and u2 of a vector v = ( u l , u ,) can be represented as a sum of decreasing yardstick length 1/2“ as follows: u,
=
u2=
1 2
1 4
1 8
1 16
1 3 2 ’
+ l - - + - + - - - + - ... 1 1 1 1 -1 - - - - + - + - + 2 4 8 16
1 -...
3 2 ,
Putting these equations in a vector form, we have:
+/)I
+
L( -1)‘ 16 + I
+1)l
+
L(
+i( 8 +1
+
L( +l)I 32 + I
+
....
This can be rewritten as:
+’(
8 +1
16 + 1
- 1( + I ) ( + 32 + 1
...(
This equation can be interpreted as the representation of a vector v as a sum of vectors of decreasing lengths and varying orientations in the two-dimensional space. Since each of the vectors composed of 1’s and - 1’s has magnitude fi,the vector in pass n has magnitude l f i / 2 , - ‘ . The orientations are chosen from the vectors u1 = (1 l), u2 = (1 -0, u3 = ( - 1 l), and u4 = ( - 1 - 1). The approximation in Eq. (23) is depicted in Fig. 6. Therefore, given an initial “vector yardstick” length, a two-dimensional vector can be represented as a string clc2 c, *.., where c, belongs to the four-symbol alphabet {sl,s2, s3,s&, with si corresponding to the orientation vector ui . Hence, if c, = si, this indicates that in pass n the orientation of the vector yardstick is the one of vector ui. Generalizing into k-dimensional space, one can say that a k-dimensional vector can be approximated by a series of vector yardsticks of decreasing lengths and orientations chosen from a codebook composed of vectors vi having the form Vi =
where pij E {O, 11, j
=
(( - 1)P“ ... ( - 1)
1 , . ..,k
PIk)
(24)
WAVELET VECTOR QUANTIZATION
209
As mentioned previously, the successive approximation process of each vector component is guaranteed to converge provided that the yardstick lengths decrease by a maximum factor of 2 in each pass. For the vector case, this implies that, using the codebook in Eq. (241, the vector successive approximation process converges if the vector yardstick magnitudes decrease also by a maximum value of 2 in each pass. The approach described in this section is a trivial extension of the scalar successive approximation into vectors. We refer to the orientation codebooks defined in Eq. (24) as trivial codebooks. However, the following questions may arise: (a) Are there other k-dimensional codebooks which can lead to a more efficient successive approximation process? (b) Which conditions does a codebook have to satisfy for the vector successive approximation process to be possible? The next subsection will address these questions.
b. The General Case. In general, a k-dimensional vector can be represented by two parameters, namely, 0
0
Its magnitude Ikll, which is a scalar component that corresponds to the norm of the vector, and Its orientation in k-dimensional space, u, = x/llxll, which is a kdimensional vector with unit energy, i.e., llu,ll = 1.
A k-dimensional vector x can be approximately represented by a series of vectors of decreasing magnitudes and given orientations in k-dimensional space. Figure 7 illustrates this process, where it is assumed that IlZll is a given magnitude value larger than or equal to the maximum magnitude of any vector in a given set of input vectors X = {x,; n = 1. . .}, i.e., IlZll 2 llxlImax. First, the original vector lkll is approximated by a vector v1 with a magnitude which is only a fraction of the original magnitude, i.e., llvlll = allZll (where a < 1.0 and a E W + )and an orientation vector y1 that is selected from a finite set of orientation code vectors, Y = {yi : Ib;.ll = 1; i = 1,2, ..., N},so that the inner product between x and y, takes a maximum value. Then the residual error which is the difference between the original vector x and its first approximation v1 = allZlbl is further refined with a new vector vz = a211Z1$,. At the second stage the magnitude of the approximation vector, llvzll, is reduced con.yred to the maximum magnitude value 11Z11, by a factor larger than the one used at the first stage (e.g., a'). This process is repeated until a certain criterion is satisfied. This criterion could be either to approach the original input vector at a certain error level or to exhaust a ceitain number of stages.
210
E. A. B. DA SILVA AND D. G. SAMPSON
c
FIGURE 7. Analysis of convergence for the k-dimensional case. From da Silva, E. A. B., Sampson, D. G., and Ghanbari, M., A successive approximation vector quantizer for wavelet transform image coding, IEEE Trans. Image Process. 5 (2), 299-310. 0 1995 IEEE.
This method for successive vector approximation is employed for the development of a multistage VQ, referred to as successive approximation vector quantization (SA-VQ). The basic idea in SA-VQ is that, at each quantization stage s, any input vector (or residual vector, after the first stage) x s i which has energy larger than a given threshold value T, is represented by a given magnitude R , (which is related only to the index of the quantization stage, rather than the actual energy of the input vector) and an orientation code vector which is selected from the orientation codebook Y to give a maximum inner product with x S i . For a given set of input vectors ( x i ; i = 1 , 2 , . .. MI, the operation of SA-VQ can be described as follows:
where xsi is the input vector at quantization stage s with index i Ilxsill is the magnitude of x s i T, is the magnitude threshold at stage s R , is the reconstruction magnitude at stage s ysj is the best-matched orientation code vector for x s i , selected from the orientation codebook Y , such that: y s j , y s , , ~ Y , j # n ; j , n = 1,... , 2N . X s i ' Ysj 2 X s i ' Ysn 1
211
WAVELET VECTOR QUANTIZATION
Hence, the SA-VQ is designed using three sets of parameters: (i) The set of threshold magnitudes {T,; s = 1,2.. .). (ii) the set of reconstruction magnitudes { R , ; s = 1,2. . . ). (iii) The finite set of orientation code vectors Y = {yi : ltyill 1,2,. .., N).
=
1, i
=
Following the description of successive vector approximation illustrated in Fig. 7, the threshold magnitudes T, can be selected as T, = a s I I X I I m a x , where Ilxllmax is the maximum magnitude in the set of the original input vectors. The reconstruction magnitudes can be defined, in general, as R , = PT,; however, it is assumed that p = 1, so that the reconstruction magnitude is equal to the magnitude threshold at any stage s. In this case, the two main design considerations in SA-VQ involve: (i) The selection of the scaling factor a. (ii) The selection of the orientation code vectors to be included in the orientation codebook. 3. Conditionsfor Convergence
The problem of the successive approximation of a given input vector by a series of vectors with decreasing magnitudes is examined. The main aim of this analysis is to investigate sufficient conditions under which this approximation scheme converges to a minimum error; i.e., the original vector is almost perfectly reconstructed by its final approximation vector, provided that a sufficiently large number of stages is allowed. Based on the conclusions from the analysis of the convergence problem, some criteria regarding the design of the orientation codebook in SA-VQ will be derived. For the formulation of the convergence problem in successive vector approximation the following suppositions are made:
(I) For a given vector x, the threshold magnitude at each stage s is assumed to be given by T, = c~~llxII,,,,~,where I k I I m a x is the maximum Euclidean norm of the entire set of input vectors and the constant factor a is in the range 0.5 Ia I1.0. This scaling factor is referred to as an approximationscaling factor. (11) The reconstruction magnitude R, of the approximation vector v, at each stage s is equal to the threshold magnitude at this stage, i.e., R, = T,. Thus, the v, is formed as the product of the current reconstruction magnitude R , and the best-matched orientation code vector y,.
212
E. A. B. D A SILVA AND D. G . SAMPSON
Y is used at all stages of SA-VQ. The orientation codebook is built so that the angle between any possible vector and its closest orientation code vector is upper bounded by Omax. Hence, at each stage the maximum error is introduced when the residual vector is approximated by a vector with error in orientation fl,,,,,.
(111) The same orientation codebook
Sufficient conditions to guarantee the convergence of the successive approximation by a finite set of orientation vectors of decreasing lengths can be derived by evaluating the worst case. This can be illustrated in Fig. 7, assuming that at each stage the selected orientation code vector gives the maximum error in the orientation. From supposition (111) this implies that dl = 8, = *.- = Omax. Moreover, convergence can be assumed if a vector with zero magnitude can be approximated with arbitrary precision after a sufficiently large number of passes irrespective of the initial reconstruction magnitude. In Fig. 7 this is equivalent to replacing llxll by the initial reconstruction magnitude R , = 11Z11. Assuming that the initial approximation is
llroll = 11z11,
(26)
after m passes the magnitude of the residual vector is given by: where Ilr,ll is the norm of the residual vector r, at stage s, a is the approximation scaling factor, and Omax is the maximum angle between any given vector and its closest available orientation code vector. Using the recursive formula in Eq. (271, with the initial condition Ilr,ll = IIZII, we can compute the residual vector magnitudes after each stage, llrsll,s = 1,2,. . .,n, for any given pair (a,emax).Convergence of the vector successive approximation scheme is equivalent to: lim Ilr.II
n-m
=
0.
(28)
We assume that convergence occurs when the improvement in the approximation after pass n is less than a small fraction of the magnitude of the original vector, that is, An -
IlZll
< E
where A,, = Illrnll- llrn-llll,and e, in the graphs shown in Fig. 8 is Equation (27) is used to find the value of the convergence scalingfacror a, for any Omax in the range 0" I Omax < 90", such that the scheme converges for any a 2 Z, where 0.5 s a < 1.0. Figure 8a gives the values
u)48
1024
512
256 E
128
64
32
16
0
10
20
30
40
e
50
60
70
80
90
(b) FIGURE8. Plots of 0 versus (a) convergence scaling factor, E ; (b) number of iterations required for convergence, 7. From da Silva, E. A. B., Sampson, D. G., and Ghanbari, M., A successive approximation vector quantizer for wavelet transform image coding, IEEE Trans. Image Process. 5 (21, 299-310. 0 1995 IEEE.
214
E. A. B. DA SILVA AND D. G. SAMPSON
of the convergence scaling factor E for angles Om,, in the range 0" IOm,, < 90". Figure 8b shows Om,, plotted against the number of iterations q required for convergence when a = E . From the results illustrated in Fig. 8 one can conclude that for Omax values up to 82" this successive approximation scheme is guaranteed to converge provided that a suitable value of a is chosen. For example, for Om,, = 0", which is equivalent to the scalar case, convergence is guaranteed for a I0.5. For Om,, > 0", convergence requires that (Y > 0.5. Indeed, the larger Om,, is, the larger (Y must be to ensure convergence. Also, as Om,, increases, so does the number of iterations q. As exemplified in Section III,A,2 for the two-dimensional case, in a vector successive approximation process, given an initial yardstick length, a vector can be represented as a string of symbols clc2 c, where symbol c, indicates which orientation code vector is used in pass n. Then more iterations mean that more symbols c, are necessary to represent a vector for a given accuracy. Therefore, if only this factor is considered, the selected orientation codebook should be such that Om,, is as small as possible. Nevertheless, another fact must be taken into account: if the vector dimension is k, a single vector represents a group of k samples. Since n passes using an orientation codebook of N symbols can generate N" different strings, without entropy coding each string can be represented by nlog, N bits. Hence, each sample can be represented by (n log,N)/k bits. Therefore, the choice of the vector dimension involves a compromise. On one hand, the use of a larger vector dimension contributes to decreasing the number of bits/sample. On the other hand, it can lead to larger values of Om,, which require a larger number of iterations to achieve a certain distortion, contributing to increase the number of bits/sample. This means that the use of codebooks having larger values of Om,, can be advantageous if the resulting increase in the number of iterations is compensated by a larger vector dimension. .a*,
B. Selection of the Orientation Codebook Following the discussion of the condition for convergence in the successive vector approximation scheme described in the previous section, some considerations regarding the design of the orientation codebook can be inferred. These are summarized as follows. First, it has been assumed that the orientation code vectors have unit energy. Any orientation codebook can fulfill this requirement after the code vectors are properly scaled. More important, in supposition (1111, it was assumed that the maximum error in the orientation introduced by approximation at any stage is
WAVELET VECTOR QUANTIZATION
215
bounded by a given value Om,,. Moreover, the graph in Fig. 8b indicates that a small value of Omax is desirable, because it implies fast convergence to an arbitrary error. From this point of view, the main requirement in the design of the orientation codebook is to guarantee a certain value of Omax,which is as small as possible. Therefore, the convergence conditions do not impose any particular requirements regarding the location of the individual code vectors in the k-dimensional space. As a result, orientation codebooks with a regular structure can be a reasonable choice for successive approximation vector quantization. There is no apparent reason for designing a nonregular orientation codebook through a training process. In this case some orientation code vectors will be close to each other, while others will be separated by a larger angle. The value of Omax would be extracted from these regions. Thus, it can be argued that a uniform codebook with the same Omax and a smaller number of code vectors can be employed. This is exemplified in Fig. 9 for the two-dimensional case. Figure 9a shows a nonregular codebook with seven vectors and Omax of 60" and Fig. 9b shows a regular codebook with the same Omax and just three vectors. The trivial codebooks described in Section III,A,2 are clearly regular and are therefore suitable for use in successive approximation of vectors. However, it is worth investigating whether other regular codebooks can lead to smaller values of Omax for a given dimension. Codebooks based on regular lattices are good candidates because they can offer a good trade-off between Omax and the codebook population due to their space-packing
(4
(b)
FIGURE9. (a) Nonregular codebook with seven vectors and Omax = 60"; (b) regular codebook with three vectors and Omax = 60".
216
E. A. B. DA SILVA AND D. G. SAMPSON
properties (Conway and Sloane, 1988; Gibson and Sayood, 1988). In addition to their well-known and well-defined structural properties, lattice codebooks offer the advantage of simple and fast encoding algorithms (Sampson and Ghanbari, 1993), which can lead to efficient implementation. In the following subsection, a brief description of regular lattices will be made, and their properties which are relevant to the vector successive approximation problem will be analyzed. C. Regular Lattices 1. Definitions A regular lattice is a discrete set of points in the k-dimensional Euclidean
space W kwhich can be generated by the integral linear combination of a given set of basis vectors. Hence, a k-dimensional lattice L, is defined as a subset of real space W k , such that: L,
=
{y € W k : y
= UIUl
+ azuz +
+a,u,)
(30) where { u i } is a set of linearly independent vectors that span L,, called basis vectors of lattice Lk, and { a i } is the set of integers which specify a particular point in lattice L,, known as coefficients of the basis vectors. Regular lattices were originally investigated in the context of sphere packing. Sphere packing is concerned with the densest way of arranging k-dimensional, identical, nonoverlapping spheres in the real space (Conway and Sloane, 1988). More formally, a sphere packing P, of radius p consists of an infinite set of points (y1,y2,...) in the Euclidean space S k , such that the minimum distance between any two points is not smaller than double the radius of the packing (Sloane, 1981): I & dist(yi,yj) = ( y i W - y j W ) * 2 2p, Vi z j . (31)
d
.**
w=l
Thus, a sphere packing is described by specifying the centers {yi), i = 1, 2,. .. and the radius p of the k-dimensional spheres. Lattice packing is a sphere packing in which the sphere centers are points of a particular lattice. It is intuitive to suppose that the lattices which correspond to the densest sphere packing in k-dimensional space tend to give the best trade-off between the values of Om,, and the codebook population. In fact, this is one of the main motivations for the study of regular lattices in the context of successive approximation of vectors. Next, some regular lattices will be described, in particular the ones which give the best sphere packings in dimensions k = 4, 8, and 16.
217
WAVELET VECTOR QUANTIZATION
2. Important Regular Lattices An important category of lattices is that of the root lattices, namely Z,(k > I), A,(k > I), D,(k > 3), E,(k = 6,7,8), and the Barnes-Wall A16 and the Leech AZ4,which have been shown to offer the best lattice packing of their space (Conway and Sloane, 1988). Definitions of the lattices considered for the successive approximation of vectors are given below.
The Integer Lattice 2,. The integer or cubic lattice Z,(k > 1) is defined as the set of k-dimensional vectors with all integer components z k = {y = (y, y, .-.yk) : yi E 3 1 where Z is the set of integer numbers. Lattice z k gives the simplest structure of points in g k and , most regular lattices can be generated from zk. The Lattice D,. The k-dimensional lattice D , (k > 3) is defined by spanning the integer lattice zk and retaining those points y in z k which have coordinates with an even sum:
D,=
i
1
k
y : y , E Z A ~ y i = O ( m o d 2 ). i= 1
(32)
D , is the “backbone” of other more complex lattices that give the most dense sphere packing at high dimensions, namely the Gosset E, and the Barnes-Wall A,,. The Lattice Ek (k = 6, 7,s). The most dense lattices in k = 6, 7 , and 8 dimensions are the members of the Ek ( k = 6,7,8) family. Among them the Gosset E, is particularly useful due to its symmetrical structure. It can be defined as the union of two subset of points, the lattice D , and the coset ( D , + $1:
i +2
E, = D ,U D ,
- ,
where 2
=
)
-------( 21 21 12 12 12 12 12 12 ‘ (33)
The Bames-Wall Lattice A I 6 . The Barnes-Wall lattice A16 is the most dense lattice at k = 16 dimensions. A16 can be conveniently defined as the . scaled lattice 2 D l , is the set of union of 32 cosets of the lattice 2 D 1 6 The even coordinate points in z,, such that the sum of the coordinates is a multiple of 4. Thus, A,, is defined as: 32
=
u
ICi
+ 2D16)
(34)
i= I
where the coset representatives ci are codewords of the rows of the Hadamard matrix HI, and its complementary HI, after changing 1’s to 0’s
218
E. A. B. DA SILVA AND D. G. SAMPSON
and -1’s to 1’s. Therefore, A16 can be decomposed into 32 subsets of points based on the Hadamard matrix rows:
A16 = {2D16+ (
~
~
~
0U {2D16 0 +) (1111111111111111)} }
u{ 2 0 1 6+ (0101010101010101)) u {20,, U{2D,,
+ (~ O ~ O ~ O ~ O ~ O ~ O ~ O ~ O + (0011001100110011)) U (2Ol6 + (1100110011001100)}
U {2D1,
+ (0110100110010110)) U {2D1, + (1001011001101001)). (35)
3. Lattice Codebooksfor Successive Approximation of Vectors
The construction of orientation codebooks using regular lattices requires that the original infinite lattice (as defined in the previous section) is truncated, so that a finite set of lattice points is used. A convenient way to create lattice codebooks is by considering all lattice points with the same I , norm. Truncation of the infinite lattice can then be achieved by considering only the lattice points with 1, norm within two given threshold values. In general, the points of a given regular lattice are distributed on the surface of successive, concentric, k-dimensional hypershells centered at the origin, so that all lattice points at the same shell have the same 1,-norm. Hence, the mrhshell S, of a given lattice Lk is the set of all Lk points at the same distance from the origin, r ( L k ,m): Lk) : {y Lk : I b l l r = r( Lk m ) } (36) where Ib.11, = [Es_,lyjlrll’r is the 1, norm of y. The shells have pyramidal shape for r = 1 (l,-norm) and spherical shape for r = 2 (l,-norm). The exact number of Lk-lattice points at any shell, for the most important regular lattices, can be calculated by using the theta functions (Conway and Sloane, 1982b) or the recently developed Nu functions (Barlaud et al., 1994) for spherical and pyramidal shells, respectively. In this analysis, only the case of r = 2 (Euclidean distance) will be considered. Two types of lattice codebooks are considered for the orientation codebook in successive approximation vector quantization: sm(
(i) Single-shell lattice codebooks, which are built by taking all lattice points from a single shell s m ( L k )of a given lattice Lk: y( Lk 9 s m ) =
{sj
sm(
Lk) ; l b j l l
= r( Lk
sm);
i = 1,2,***,N(Lk,sm)} (37)
WAVELET VECTOR QUANTIZATION
219
(ii) Multishell lattice codebooks, which are built as the union of the ,,,(Lk) of a given lattice L,: lattice points from M shells S,, ~
where r ( L k , S r nis ) the radius of the lattice shell S,(L,), and N ( L , , S,) denotes the population of the shell. From the discussion in Section III,B, one can infer that that the following parameters play a key role in the efficiency of a particular lattice orientation codebook for successive approximation of vectors: (a) The dimension of the lattice points. (b) The population of the lattice points on the codebook, MY(’,, SrnM)). (c) The maximum possible angle between any input vector and its SmM)). closest code vector, Omax(Y(Lk, Table I summarizes the parameters of shells of regular lattices which give the best lattice packing at dimensions k = 4, 8, and 16 (Conway and Sloane, 1988), together with the parameters for the “trivial” orientation codebooks defined by Eq. (24) in Section III,A,2. The maximum possible angle Omax between any input vector and its closest code vector has been computed exhaustively by a numerical method. From this table, one can make the following observations: 0
0
Confirming what was intuitively expected, for each vector dimension, codebooks based on regular lattices have a much smaller Omax than the one of the corresponding trivial codebook. This suggests that it is advantageous to use codebooks based on regular lattices for the successive approximation of vectors. Considering codebooks generated from different sets of shells of the same regular lattice, the more vectors a codebook has, the smaller are the values of Omax and the number of approximation stages required for convergence (Fig. 8b). Therefore, the choice of the codebook population depends on a trade-off between the savings provided by the reduction in the number of stages and the increase in the number of bits required to represent a larger codebook population.
220
E. A. B. DA SILVA AND D. G . SAMPSON TABLE I
PARAMETERS OF THE REGULAR LATrICES WITH BESTPACKING IN DIMENSIONS k = 4,8,16, TOGETHER WITH THE EQUIVALENTPARAMETERS FOR THE TRIVIAL ORIENTATION CODEBOOKS OF THE SAME DIMENSIONS [T4. T8, AND T 1 6 - S ~EQ. ~ (24)]
Lattice type, Lk
Shell index, m,
+ ... + m ,
Population,
N(Lk,s,,,,
+ ." fm,)
0 4
1
24
0 4
2 1+2 -
24 48 16
0 4
T4 E8 E8 E8
E8 E8
TR A 16 T16
1 2 3 1+2 1+2+3 -
2 -
240 2,160 6,720
2,400 9,120 256 4,320 65,536
Maximum actual angle, 4nax
45" 45" 32" 60" 45" 45" 35"
32" 29" 69" 55" 76"
As the codebook dimension increases, so does the codebook population and the values of em,,; however, this is counterbalanced by the savings in the number of bits resulting from the dimensionality increase. Another advantage of the use of codebooks based on regular lattices is the fast encoding process. When the codebook is built based on a finite set of lattice vectors, nearest-neighbor (NN) search is carried out only among a limited number of code vectors (depending on the properties of the particular lattice), as opposed to the exhaustive full codebook search of the conventional clustering VQ. Conway and Sloane (1982a) have developed fast and simple NN algorithms for all the important regular lattices. Their algorithms exploit the symmetry of the root lattices to find the closest lattice point for a given input vector with a minimum computational effort, assuming an infinite lattice. However, in the case of finite codebooks made from sets of shells from a regular lattice, modifications to the original algorithms are needed to deal with the points outside the boundary regions of a truncated lattice. Such modifications are beyond the scope of the present analysis, but they can be found in Jeong and Gibson (1993), Sampson and Ghanbari (19931, and Barlaud et al. (1994). In the next section an image coding method based on the successive approximation of vectors is proposed, and the performance of the different codebooks in table I is assessed.
WAVELET VECTOR QUANTIZATION
221
IV. SUCCESSIVE APPROXIMATION WAVELET LATTICE VECTOR QUANTIZATION
In this section we describe a method for wavelet image coding based on successive approximation lattice vector quantization. This method will be referred to as successive approximation wavelet lattice vector quantization (SA-W-LVQ). This is an extension of the embedded wavelet zero-tree (EZW) coding algorithm developed by Shapiro (1993) using SA-VQ instead of successive approximation scalar quantization (SA-SQ).
A. Description of the CodingAlgorithm SA-W-LVQ succeeds the EZW algorithm through two main modifications. The first one is that vectors of coefficients are processed instead of individual coefficients. A vector of coefficients is considered as insignificant (treated as zero) if its energy is smaller than a threshold. A zero-tree root occurs when a vector and all its corresponding vectors in bands of the same orientation are insignificant. The second important modification is that the yardstick length, instead of being halved at each pass, is multiplied by a factor a 2 0.5. The exact value of a depends on the codebook used (see discussion in Section 111,C). The basic principles of SA-W-LVQ are outlined in Fig. 10. First, the mean value of the original image is computed and extracted from the image. An M-stage wavelet transform is then applied to the zero-mean X X Y image resulting into an image decomposition such as the one and Dk,k = 1,2,. .. ,M , shown in Fig. 11. Each subimage, L , and vk,Hk, is partitioned to n X m blocks of wavelet coefficients. For the formation of these vectors, a different scanning is used according to the orientation of the particular band (Gharavi and Tabatabai, 1988). This scanning process is illustrated in Fig. 11. Suppose that:
(i) A given orientation codebook Y based on the innermost shells of a given lattice Lk is employed. The codebook consists of N lattice code vectors and it is characterized by a known Om,, value. (ii) A given value for the approximation scaling factor a is selected according to the Om,, of the lattice codebook Y. (iii) The reconstruction magnitude R , at each stage s is equal to the threshold magnitude T,.
222
E. A. B. DA SILVA AND D. G . SAMPSON
FIGURE10. Block diagram of the SA-W-LVQalgorithm.
(iv) The initial threshold magnitude is set as TI = a l k l l m a x where I k I I m a x represents the maximum magnitude in the input set of vectors ( x i ; i = 1,2,. ..,XY/nm). That is,
k,
WAVELET VECTOR QUANTIZATION
223
;oo
Vn
0
0
0
Hn
Dn
FIGURE11. Scanning of each block of coefficients to form the vectors in bands of different orientations.
The coding algorithm employed to code the wavelet transform coefficient vectors can then be described as follows: 1. The image mean is computed and extracted from the image. 2. An M-stage wavelet transform is applied to the zero mean image. 3. Each band of wavelet coefficients is divided into n X m blocks forming vectors of dimension nm. Depending on the band considered, the scanning of the blocks to form a vector is different. The scanning is vertical in the vertical bands, horizontal in the horizontal bands, and zigzag in the diagonal bands (Fig. 11). 4. The maximum magnitude llxlImaxof the vectors of wavelet coefficients is computed. 5. Initially, the threshold magnitude T is set to alJXllmax, where the value of a is dependent on the Omax value of the selected orientation codebook. 6. A list of the positions of the vectors, called the dominant fist is generated. This list determines the order in which the vectors are
224
E. A. B. DA SILVA AND D. G. SAMPSON
scanned. It must be such that vectors from a lower frequency band (higher scale) are always scanned before the ones from higher frequency bands. Two empty lists of vector positions, called the subordinate list and the temporary list, are also generated. 7. The wavelet transform of the image is scanned, and if a vector of wavelet coefficients has magnitude smaller than the threshold magnitude T it is reconstructed as zero. Otherwise, it is reconstructed as its closest orientation code vector scaled with magnitude T . 8. Dominant pass: The reconstructed coefficients are scanned again, according to the order in the dominant list, generating a string of symbols as follows: If a reconstructed vector is nonzero, a C (coded vector symbol) is added to the string and the position of this vector is appended to the subordinate list. If a reconstructed vector is zero, its position is appended to the temporary list. In the case of a zero reconstructed vector, two different symbols can be appended to the string. If all its corresponding vectors in bands of same orientation and higher frequencies are zero, a zero-tree root (ZT) is added to the string, and its corresponding vectors are removed from the dominant list and added to the temporary list (since they are already known to be zero, they do not need to be scanned). Otherwise, an isolated zero ( Z ) is added to the string. An exception to this is the lowest frequency band ( L , in Fig. 3a), where a zero tree root is equivalent to all the corresponding vectors in all bands being zero. As the string generated from the three-symbol alphabet (C,Z T , and Z ) is being produced it is encoded into a bitstream by an adaptive arithmetic coder (Witten et al., 19871, whose model is updated with three symbols at the beginning of this pass. However, during the scanning of the highest frequency bands ( H I , V , , and D, in Fig. 41, no zero-tree roots can be generated. Therefore, just before the scanning of the first vectors of these bands the model of the arithmetic coder is updated with two symbols (C and 2). 9. The dominant list is scanned again and the indices of the vectors marked as C are encoded into the bitstream by the arithmetic coder, whose model is reinitialized at the beginning of this pass to have as many symbols as the population of the orientation codebook. 10. The threshold magnitude T is multiplied by the approximation scaling factor a. 11. Subordinatepass: The vectors which have been previously marked as C are rescanned and refined according to the order in the subordinate list. In the refinement process the difference between the original and the nonzero reconstructed vectors is coded using the new yardstick length. As the indices of the new orientation code vectors are produced they are also encoded into the bitstream via
WAVELET VECTOR QUANTIZATION
225
the arithmetic coder (whose model had already been initialized at the beginning of pass 9 to have as many symbols as the number of orientation code vectors). 12. The subordinate list is reordered so that the vectors whose reconstructed magnitudes have higher values come first. 13. The dominant list is replaced by the temporary list, and the temporary list is emptied. 14. The whole process is repeated from pass 7. It stops at any point when the size of the bitstream exceeds the desired bit rate budget. SA-W-LVQ uses a similar strategy to EZW to increase the number of zero-tree roots. In the dominant pass only the reconstructed vectors which are still in the dominant list can be modified. Therefore, in order to increase the number of zero-tree roots, vectors not present in the dominant list can be considered as zero for determining whether a zero vector is either a zero-tree root or an isolated zero. The overhead information is similar to the one used in the EZW algorithm, with the addition of one more byte in the header, specifying the value of a used. Thus, for monochrome images, it has 11 bytes as follows: the value of a (1 byte), the number of stages (1 byte), the image dimensions (4 bytes), the image mean (1 byte), and the initial value of the yardstick length (4 bytes). Also, as in the EZW algorithm, the decoder can track the encoder provided that its initial dominant list is identical to the one from the encoder. Several common features are shared by SA-W-LVQ and EZW. Among them, we can highlight the following: (a) Use of zero-trees, which exploit the similarities among bands of same orientation. (b) A certain distortion level (which is defined by the current magnitude threshold) is guaranteed at each quantization stage. This enables SA-W-LVQ to satisfy requirements (ii) and (iii) in Subsection II,C. (c) Encoding of the image data with priority given to the most important information, which is made possible by the successive approximation process. One characteristic, however, is not shared entirely by the two coders. In the EZW coder, only four symbols are encoded into the bitstream (‘‘ + ”, “-”, Z T , and Z ) . However, in the SA-W-LVQ coder, besides the three symbols used to localize the significant vectors (C,Z T , and Z ) , the indices of the code vectors are also encoded. In the case of the encoding of the three symbols C, Z T , and Z , the arithmetic coder, due to the small number of symbols, can adapt very quickly to the source statistics and therefore be as efficient as in the EZW coder. However, in the case of the indices of the code vectors, which can be on the order of thousands, this
226
E. A. B. DA SILVA AND D. G . SAMPSON
efficiency is greatly reduced. For example, using the orientation codebook based on the first shell of the lattice A,6, there are 4320 code vectors. Nevertheless, this reduction in efficiency is compensated by the savings of vector over scalar quantization (each k-dimensional vector corresponds to k coefficients). For example, when k-dimensional vectors are used, there are roughly k times less ZT and Z symbols than in the scalar case, and therefore the savings in the representation of the localization of the significant information can be roughly reduced by k. v . APPLICATION TO IMAGE A N D VIDEO CODING
A. Still Image Coding
In this section, the performance of SA-W-LVQ for still image coding using various orientation codebooks from Table I is assessed. Also, SA-W-LVQ rate-distortion results are compared with those of the EZW algorithm. 1. Details of the Coder Used
The wavelet transform used in the simulations was a five-stage separable two-dimensional biorthogonal wavelet based on the biorthogonal filter bank described in Table 11, which was shown to give good subjective results (da Silva and Ghanbari, 1994). As in the EZW coder, the biorthogonal TABLE I1 COEFFICIENTs OF THE USED
HJz) 27 26
25
24
23 22
2' 20
2-1 Z-2
2-3 2-4
z-5 2-6 2-7
.00000000 .00000000 .02005660 - .01115792 - .14261994 .04480910 33891217 33891217 .04480910 - .14261994 - ,01115792 .02005660 .00000000 .00000000 .00000000
FILTERBANK
GJz)
H,(z)
.00000000 - .00599119 - ,00333303 ,03609141 .00976279 - .07237464 .22230811 A1353655 .81353655 .22230811 - .07237464 .00976279 .03609141 - .00333303 - .a5991 19
- .00599119 .00333303 .03609141 - .00976279 - .07237464 - .22230811 A1353655 - 31353655 .22230811 .07237464 .00976279 - .03609141 - .00333303 .00599119 .00000000
G,(Z) .00000000 .00000000 .00000000
- .02005660 -.01115792 .14261994 .04480910 - .58891216 .58891216 - .04480910 - .14261994 .01115792 .02005660 .00000000 .00000000
WAVELET VECTOR QUANTIZATION
227
filters were normalized such that the optimum bit allocation is achieved through the distortion equalization among the bands provided by the successive approximation process [see Eq. (18)l.
2. Comparison between Different Lattice Codebooks In the first experiment, the performance of different orientation codebooks was evaluated. These codebooks are built by using the lattices with the the best known space-packing properties in dimensions k = 4, 8, and 16, as well as the "trivial" orientation codebooks in the same dimensions. The parameters of these codebooks are tabulated in Table I. The test image in this experiment was monochrome Lena 256 x 256. First, the best value of the approximation scaling factor (Y is estimated for various orientation codebooks. Figures 12 and 13 show the peak signal-to-noise ratio (PSNR) performance against a,obtained by orientation codebooks based on the trivial orientation codebooks for dimensions 4 and 8, as well as spherical shells of lattices D4,E8 and A16. The bit rate used was 0.5 bit/pixel. Before analyzing the results, it is important to observe that the values of (Y obtained from Omax in Fig. 8a are for the worst case. The worst case is when, at each approximation stage, the error in orientation is equal to Om,,. Therefore, it is expected that the optimum values of (Y found for the different codebooks are smaller than ones drawn by cross-corresponding Table I and Fig. 8a. Table I11 shows the optimum values of (Y for each of the orientation codebooks used along with the corresponding PSNRs. The worst case values of (Y (Fig. 8a) are also shown. The following observations can be made from Figs. 12 and 13 and Table 111:
1. In all cases the PSNR curve reaches its peak for a value of (Y which is well inside what was expected from table I and Fig. 8a (see Table 111). 2. As expected, the performance of the trivial codebooks is well below those of the codebooks derived from the regular lattices which provide the best sphere packing in the same dimensions (D4, E8, and AI6). This confirms the importance of the Omax for orientation codebooks. For example, from Table I, the first shell of D4has 24 code vectors and Omax = 45", against the 16 code vectors and a Omax= 60" of the trivial orientation codebook T4. Because the smaller population of T4 contributes higher efficiency, the worse performance of SA-W-LVQ using T4 compared with with that using D,-shell 1 highlights the strong influence of Omaxon the efficiency of SA-W-LVQ.
4-Dlmenslonal Orlentatlm Codebooks I -
, ..... 1.... ...-.
."
.........
.........
\, i.
0.55
0.6
0.85
0.7
0.75 alpha
0.8
0.85
0.9
0.95
0.9
0.95
1
(4 EDlmenrlonal Orlentation Codebdu
0.5
0.55
0.6
0.85
0.7
0.75 alpha
0.8
0.85
(b)
FIGURE12. Performance of SA-W-LVQversus (Y values for the Lena 256 X 256 image at 0.5 bit/pixel and different orientation codebooks: (a) four-dimensional orientation codebooks-trivial and D4 lattice, shells 1, 2, and 1 + 2; (b) eight-dimensional orientation codebooks-trivial and Es lattice, shells 1, 2, 3, 1 + 2, and 1 + 2 3.
+
16-Dlmenslonal Odentatlon Codebook
0.5
1
1
I
1
I
I
I
I
I
0.55
0.6
0.65
0.7
0.75 alpha
0.8
0.85
0.9
0.95
I
I
I
1
(4 Flrst Shells
33
I
1
I
1
I
I
-
D4
32 31
30 29
8
f
28 27 26 25 ::
24
.........................................................
23
..............'................... .......................................
22
I
I
.......................................
I
1
;..,
I
.
I I
,
.
,
(
;.:.
._................................................................
.. ., . . ........... ...#................'... . . . . ..................... :
'.
:
I
,
1
.
I
alpha
(b) FIGURE13. Performance of SA-W-LVQ versus a values for the Lena 256 x 256 image for 0.5 bit/pixel and different orientation codebooks: (a) A I 6 lattice, shell 1; (b) D4,E,, and A,6 lattices. From da Silva, E. A. B., Sampson, D. G., and Ghanbari, M., A successive approximation vector quantizer for wavelet transform image coding, IEEE Trans. Image Process. 5 (21, 299-310. 0 1995 IEEE.
230
E. A. B. DA SILVA AND D. G. SAMPSON TABLE 111 VALUES OF (Y FOR SEVERAL ORIENTATION CODEBOOKS, TOGETHER WITH THE WORST-CASE VALUES OF a ACCORDING TO THE CORRESPONDING em,, (FIG.8A) AND PSNR (DB),FOR THE LENA256 X 256 TEST IMAGE
OpnMUM
Lattice type, Lk
a
a
worst case
optimum
PSNR (dB)
0.71 0.71 0.59 0.87 0.71 0.71 0.62 0.59 0.58 0.93 0.82
0.55 0.56 0.56 0.63 0.60 0.53 0.53 0.53 0.54 0.69 0.62
31.90 31.77 31.82 31.57 32.15 31.35 31.04 31.53 31.06 30.84 32.45
Shell index m,
+ ... + m , 1 2 1+2
1 2 3 1+2 1+2+3
2
3. The best performance in all dimensions is given by the first shells of the lattices D4, E,, and h16, despite the fact that the values of Omax for the first shells are in general larger than those for higher shells. This is due to the trade-off between the number of stages necessary for convergence (given by Omax) and the codebook population (for the same lattice, the smaller the value of Omax the larger the codebook populationsee Table 0.' 4. From Fig. 13b one can see that the orientation codebook formed by the first shell of the lattice A16 is the one which gives the best performance among the codebooks analyzed. This can be checked in Figs. 14a and 14b, which show the rate-distortion curves for the Lena 256 x 256 and Lena 512 X 512 test images. These results highlight the trade-off between the values of Om,,, codebook population, and vector dimension. 3. Comparison with the EZW Coder After selecting the best scaling factor (Y for the three lattice-based orientation codebooks, the coding performance of the proposed method is I Besides the fact that a smaller codebook population contributes to a less costly representation of the data (see the discussion in Subsection III,A,3), in SA-W-LVQ there is another factor relating the coding performance to the codebook population. Since the indices of the vectors are encoded using an arithmetic coder, a smaller codebook population leads to quicker adaptation of the arithmetic coder to the statistics of the vectors, which contributes to a further increase in efficiency (Witten ef al., 1987)
-
Flnt shell8 LENA 256x256
45
1
I
1
I
1
1
I
I
....
..... ..........->:-------. :
* ......
40
....
.-/.
.
D4 -
€6
L16
1
30
......
25
.....
20
..........
0
0.2
0.4
0.6
0.8
..
, .....
..........,............
-
15
.....
.............
. . . . .
35
............
I
1
1 rate(bpp)
(4 45
40
35
30
26
20
02
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
mWbpp)
(b) FIGURE14. Rate-distortion performance of SA-W-LVQ using the first shells of the D4, E,, and Al, lattices: (a) for the Lena 256
X
256 image; (b) for the Lena 512
X
512 image.
232
E. A. B. DA SILVA AND D. G . SAMPSON
compared against that of the EZW coder. Table IV shows values of the PSNR obtained with SA-W-LVQ for the first shells of D4,&, and L16 lattices along with the performance of the EZW method for the following test images: Lena 256 X 256, Lena 512 X 512, and the set of five ISO/ITU-T test images (Barbara, Boats, Girl, Gold, and Zelda), of dimensions 512 X 512, at a rate of 0.4 bit/pixel. In order to serve as a reference, PSNR results for the JPEG coder (Wallace, 1992) are also included. It can be seen that the SA-W-LVQ algorithm using the first shells of the & and A16 lattices outperforms the EZW algorithm for all test images. Figures 15a and 15b show the rate-distortion curves of SA-W-LVQ using AI6-shell 1 and EZW, for the test images Lena 256 X 256 and Lena 512 X 512, respectively. We can see that SA-W-LVQ provides better performance than the EZW algorithm at all bit rates. In order to provide an appreciation of the image quality obtained using SA-W-LVQ, Fig. 16 shows the test images Lena 256 X 256 and Barbara coded at a compression ratio of 20:l (0.4 bit/pixel) using SA-W-LVQ with the AI6-shell 1 orientation codebook.
B. Low-Bit-Rate Video Coding 1. Introduction The ITU-T (formerly C C Im ) recommendation H.261 has defined a coding scheme operating at integer multiples of 64 kbit/s, which is suitable for videophone and videoconference applications (ITU-T 1990). This is a hybrid DPCM/DCT coder, where the motion-compensated interframe prediction (MCIP) error image is partitioned into blocks of 8 x 8 pixels that are transformed using a two-dimensional DCT. The transform coefTABLE IV PSNR PERFORMANCE OF SA-W-LVQ (DB)USING THE FIRSTSHELLS OF THE D4, E8, AND 1\16 LATTICES FOR SEVERAL TEST IMAGES AT A RATE OF 0.4 BIT/PIXEL COMPARED WITH THE EZW ALGORITHM AND JPEG
Test image Barbara Boats Girl Gold Zelda Lena 256 Lena 512
0 4
E8
A16
EZW
JPEG
29.36 34.19 35.27 31.01 38.43 30.13 35.17
30.60 34.18 35.91 32.76 39.36 30.15 35.86
30.90 35.24 36.12 32.61 39.44 30.29 36.09
29.03 34.29 35.14 32.48 39.08 30.06 35.02
27.27 32.63 33.98 31.38 37.16 28.07 33.42
/ /--
.........,.r :.. .................................. SA-W-VOL16 E m
--
_<--
-
2
2.5
(4 LENA 512x512
45
/ __.---
.
_---
_/--
_/-
,__---
..... .........A ...........
40
S N-VO L16
E m
-
...................................
35
P
e.
h 25
20
0
0.5
1
blt rate (bpp)
1.5
2
5
(b) FIGURE15. Rate-distortion performance of SA-W-LVQ using the first shell of the A,6 lattice and of the EZW algorithm: (a) for the Lena 256 512 X 512 image.
X
256 image; (b) for the Lena
FIGURE16. Test images coded at 0.4 bit/pixel using SA-W-LVQwith the lattice AI6: (a) Lena 256 x 256; (b) Barbara. From da Silva, E.A. B., Sampson, D. G., and Ghanbari, M., A successive approximation vector quantizer for wavelet transform image coding, IEEE Tmns. Image Rwcess. 5 (Z), 299-310. Q 1995 IEEE.
WAVELET VECTOR QUANTIZATION
235
ficients are quantized with a uniform scalar quantizer and encoded with a combination of run-length and variable-length coding. Although this video coder is now widely accepted, its picture quality suffers from noticeable blocking artifacts at low bit rates. For this reason, investigations on other video coding methods are conducted aiming to exploit the potential of tools such as subband/wavelet transform and vector quantization for video coding applications (Woods and Naveen, 1989; Westerink et al., 1990; Fadzil and Dennis, 1991; Gharavi, 1991; Zhang and Zafar, 1992; Mersereau et al., 1993; Ohta and Nogasi, 1993). The success of these techniques is mainly evaluated by the picture quality of the reconstructed frames at low bit rates, where the block transform codecs perform poorly. In this section we present a coding method designed for low-bit-rate coding of video signals. It consists of two main parts: (i) The motion estimation/compensation, where the overlapped block matching algorithm is employed. (ii) The successive approximation wavelet lattice vector quantization coding algorithm, which is used to code the motion-compensated interframe prediction error images. 2. Motion Estimation Using Overlapped Block Matching
a. Introduction-Motivation. Temporal redundancy between successive image frames can be removed by taking into consideration the displacements of moving objects. Thus, motion estimation/compensation is an important part of any video coder (Musmann et al., 1985). Block matching (BM) motion estimation has been widely used in video coding applications and fast implementations have been suggested (Ghanbari, 1990). In the conventional BM algorithms, the image is partitioned into nonoverlapping, rectangular blocks of size N X N pixels. Each block in the current frame is then compared against all N X N blocks within a specified search area of size ( N + 2s + 1) X ( N + 2s + 1) in the previous frame. The search window is located at the center of the current block at the previous frame. The best matched position is found either by minimizing a distortion cost function (e.g., the mean square error or the mean absolute difference) or by maximizing a correlation function (e.g., the cross-correlation) between the two blocks. The location of the best matched block determines the motion vector D = (dx,dy) which indicates the displacement of the current block in relation to the previous frame. The maximum allowed displacement is Its. Figure 17 illustrates the basic operation in a conventional block matching motion estimation. Although BM motion estimation has proved very successful in reducing the energy (and consequently the amount of data) of the interframe
236
E. A. B. DA SILVA AND D. G. SAMPSON
previous decoded frame
FIGURE17. Block matching motion estimation.
prediction error, it fails to estimate the true motion present in the scene. This is partly due to the unrealistic assumption of purely translational motion and partly due to the limited image information provided by the confined search area. The motion vectors of neighboring blocks may point toward different directions due to the irregularities which occur in the motion field. As a result of these discontinuities in the motion field, considerable blocking artifacts are introduced into the prediction error image. Figure 19a illustrates an example of the prediction error image obtained after conventional block matching motion compensation. It can be observed that the boundaries of adjacent blocks are noticeable. For block transform codecs, possible blocking artifacts present in the prediction error image do not significantly deteriorate the coding performance, provided that the motion compensation block dimensions are multiples of the transform block ones. However, blockiness can have a considerable effect on the coding efficiency of subband/wavelet codecs, where the entire image rather than a small subimage is transformed. In this case, blocking artifacts on the boundaries of the motion blocks are translated to a large signal energy in the high-frequency bands. Most subband/wavelet coding techniques tend to spend a large proportion of their bit rate budget just for coding this high-frequency information, which is a waste because it is not part of the original scene.
237
WAVELET VECTOR QUANTIZATION
Hence, it is important to employ a motion estimation/compensation technique that does not lead to blocking artifacts. For this reason, a variation of block matching in which the blocks are overlapped with each other is employed. This method is referred to as overlapped block matching (OBM) motion compensation. Overlapped motion compensation methods were first proposed Watanabe and Suzuki (1991). In this paper the overlapped windows have been used only during the computation of the predictive image and not for the selection of the motion vectors. The authors have used their OBM motion compensation scheme as part of the MPEG-1 video coder. Young and Kingsbury (1993) investigated spatial and frequency domain OBM in connection with lapped orthogonal transforms. Ohta and Nogasi (1993) have used OBM in their wavelet transform video coder. b. Description of the Algorithm. The basic algorithm of the overlapped block matching motion estimation/compensation employed in this chapter can be described as follows. A. Overlapped Block Matching Motion Estimation 1. The current image frame is divided into Overlapped blocks of size M X M that are located around a core block of N X N pixels. Qpically, the size of each dimension of the overlapped blocks is chosen to be double the size of the core blocks, i.e., M = 2 N. 2. For each M x M overlapped block, an area of fs pixels around the block is searched to select the best motion vector. Hence, the total size of the search area is (N + 2s + 1) X ( N + 2s l), and the maximum displacement is D&{ = fs. Selection of the best motion vector is conducted in two stages. 2.1. For each candidate displacement vector Di = (hi, dy,), i = 1,2,. ..,(2s + 1)*, the following steps are performed: 2.1.1 The prediction error block is formed as the difference between the translated block in the previous locally decoded (reference) frame and the current block. That is,
+
E ( x , y ) = C ( X , Y ) - P ( x + &Y + dY), x=xo+n, n = 0 , 1 , ..., M - 1, y=yo+m, m = 0 , 1 , ...,M - 1
where ( x , y ) indicates the pixel coordinates at the image grid; ( x o ,y o ) indicates the starting coordinates of the current overlapped block;
238
E. A. B. DA SILVA AND D. G . SAMPSON
E(x, y ) represents the prediction error at the position (X,Y);
C ( x , y ) represents the pixel intensity value at position ( x , y ) in the current frame; P ( x , y ) represents the pixel intensity value at the position ( x , y ) in the decoded previous frame, and (du,, dy,) represent the displacement in the horizontal and vertical directions, respectively. 2.1.2 The prediction error block is weighted with a window function W ( n ,rn) E W ( x , y )= W ( n , m ) E ( x , y )
where E W ( x ,y ) represents the weighted prediction error, and W(n,rn) represents the weighting coefficients that correspond to position ( n , rn). 2.1.3 The average energy of the weighted prediction error block is calculated from
which corresponds to the mean squared error made by the prediction based on the displacement vector Di, after weighting with W(m,n). 2.2. The motion vector is selected to give minimum weighted prediction error. 3. Store the motion vectors for each overlapped block, Di, i= 1,2,. ..,X Y / N 2 , where X and Y represent the vertical and the horizontal image frame size. B. Ouerlapped Block Matching Motion Compensation 1. Same as step 1 in OBM motion estimation. M X M overlapped block form the prediction block by carrying out the following steps. 2.1 The motion-compensated prediction for the current block is formed by taking the block in the locally decoded previous (reference) frame indicated by the estimated displacement vector,
2. For each
PD(x,Y) = P ( ~ + h , ~ + d y )
239
WAVELET VEmOR QUANTIZATION
where P&, y) represents the motion-compensated prediction at ( x , y) and D = (h,dy) is the estimated displacement vector. 2.2 The prediction block is weighted with the same window function used during the motion estimation process. That is, P.r(x,y)
=
W(n,m)P,(x,y)
where P,"(x, y) represents the weighted motion compensated prediction at ( x , y). 3. The windowed overlapped prediction blocks are summed up. Figure 18 illustrates the operation of motion compensation using overlapped block matching. The window function must be selected such that it does not change the intensity values of the original image after weighting the pixels and adding the weighted values in the areas where two or more blocks are overlapped. In our experiments, we have used the raised-cosine function:
W(n,m)=ms2(;i)cosz( -
z),
where n,m = - N . .
.,N .
(39)
FIGURE18. Overlapped block matching motion compensation. From Sarnpson, D. G., da Silva, E. A. B., and Ghanbari, M., Low-bit-rate video coding using wavelet vector quantization, IEE Proc. Part I, Vuwn, Image Signal Pmcess. 142(3), 141-148.6 IEE.
240
E. A. B. DA SILVA AND D. G . SAMPSON
Figure 19b illustrates the prediction error image obtained by OBM. It is clear by comparing the images in Figs. 19a and 19b that the MCIP error image created by OBM-MC is virtually free from blocking effects. In Section V,B,3, we will demonstrate that this leads into significant improvement of the coding performance achieved by the presented wavelet video codec. 3. Experiments-Simulation Results In this section, the performance of the OBM-SAWLVQ video codec for low-bit-rate video coding applications is evaluated. The basic block diagram of the encoder is shown in Fig. 20.;The main elements of the codec are the overlapped block matching moti6n estimation/compensation (described in Section V,B,2) and the succesbive approximation wavelet lattice vector quantization algorithm (describedl in Section IV,A). The two-dimensional wavelet transform used in the experiments presented in this chapter is implemented by the biorthogonal filter bank described in Table 11. A set of experiments have been conducted to test the performance of different parts of the codec. For these experiments, three standard test image sequences, namely, Miss America, Claire, and Salesman, suitable for videophone and videoconference applications have been used. The original pictures are in common intermediate format (CIF) (ITU-T 1990) with frame rate 30 frames/s. For all experiments, a frame rate of 10 frames/s was used by simply subsampling the original sequence by a factor of 3 in the temporal domain without any additional processing. In all cases, the first frame of the sequence is coded using intraframe SA-W-LVQ scheme at 0.5 bit/pixel. An important feature of SA-W-LVQ is that the wavelet coefficient vectors are scanned according to their reconstructed values and the vectors with the higher energy are coded first (see Section IV). This guarantees that the bit rate budget will be used with priority to the image data that would result into maximum distortion. This is important for low-bit-rate video coding applications. For the experiments presented in this chapter a constant bit rate is assumed by allocating a fixed number of bits for each frame (equal to the ratio of the data rate to the frame rate). This eliminates the need of a buffer for smoothing out the bit rate variation. a. Selection of the Number of Stages of the Wavelet Decomposition. First, the effect of the number of stages of the wavelet transform in the coding performance is investigated experimentally. Figure 21 shows the average peak signal-to-noise ratio (PSNR)’ for all three test image sequences The average PSNR over k frames is calculated by l O l 0 g ~ ~ ( 2 5 5 ~ / M S Ewhere ,~~~,) MSE,,,,, = (I/k)E:= I MSE;,,,,.
b
FIGURE19. Magnified part of the prediction error image obtained using: (a) the conventional block matching; (b) the overlapped block matching. From Sampson, D. G., da Silva, E. A. B., and Ghanbari, M., Low-bit-rate video coding using wavelet vector quantization, IEE Proc. Pari I , Viion, Image Signal Process. 142(3), 141-148. 0 IEE.
242
E. A. B. DA SILVA AND D. G. SAMPSON
Vldeo Input
I
Overlapped Block Matching Motlon Compensation I
+ inverse Wavelet Tranfotm
Approxlmatlon Wavelet Lattice VQ
t
Bltstream
Successive Approxlmation Wavelet Lattice VQ
A
Channel FIGURE20. Block diagram of the OBM-SAWLVQ encoder.
coded at 64 kbit/s using two- to five-stage decomposition. For this experiment, the A,,-based lattice codebook is employed. A 256 X 256 window of the original CIF pictures is used. The results presented in Fig. 21 do not reveal any significant differences between the various stages. In all cases the improvement in average PSNR of the best over the worse case is within a fraction of 1.0 dB (0.3-0.6 dB). Nevertheless, the two-stage one gives better performance for all three test sequences. Hence, a two-stage wavelet transform has been used for the rest of the simulations, because it can be directly implemented with the original CIF image frames, as well as having faster implementation than the others.
WAVELET VECTOR QUANTIZATION
+Miss America, 10 Hr. 64 WIvsec 41.2
'
243
\41.1!
8
41.0.
40.8
.
33.8
=z 8
33.6.
e
33.4.
-a-
6ab8mn, 10 Hz, 64 WWwc
33.2:
8
33.0 stg2
rtg-3
sIg4
sg-5
number of stages
(4 FIGURE21. Average PSNR performance of different number of stages of the WT: (a) Miss America, 10 Hz, 64 kbit/s; (b) Claire, 10 Hz, 64 kbit/s; (c) Salesman, 10 Hz, 64 kbit/s.
244
E. A. B. DA SILVA AND D. G . SAMPSON
b. Comparison between Various Lattice Codebooks. The performance of the orientation codebooks built based on the innermost shells of the lattices is tested. The parameters of these codebooks, related to SA-LVQ, are given in Table I. The selection of the approximation scaling factor (Y for each lattice codebook is discussed first. From the plot shown in Fig. 8a, the value of the convergence scaling factor CY can be found provided that Omax is known for a given codebook. This value represents the approximation scaling factor needed to guarantee c~nvergence,~ assuming that at each quantization stage the worse-case error is made. In practice, however, a smaller value of (Y can be selected, because it is not realistic to assume that the maximum error always occurs. The selection of the appropriate scaling factor is very important for SA-LVQ, because it affects the number of stages required to achieve a certain distortion threshold level for a given codebook. The approximation scaling factor is selected experimentally, for the lattice codebooks used in our experiments, by calculating the peak signalto-noise ratio for the (Y values in the range 0.5 to 1.0. For the rest of the experiments, the optimum (Y value shown in Table I11 for each lattice codebook is used. Figure 22 shows the average PSNR obtained by different lattice codebooks for the three test sequences coded at 64 kbit/s. These results demonstrate the robust performance of the codec; i.e., there are only small variations between the various codebooks. It also highlights the fact that the performance of SA-W-LVQ is a trade-off between three factors, namely, the vector dimension, the population of the orientation codebook, and the value calculated for the particular lattice codebook. Although lattice codebooks of smaller population (like the ones based on lower order shells) may offer some advantages during their encoding with the adaptive arithmetic coder, they typically require a larger number of stages to attain a certain distortion target, due to the larger Omax value. On the other hand, although codebooks operating at larger dimensional blocks (e.g., k = 8 or 16) involve larger codebook sizes, the total number of bits used for the classification of the block and the encoding of the selected indices is shared by a larger number of samples. Finally, another factor that may affect the evaluation of the overall performance of the codec is the computational complexity required for the selection of the closest lattice code vector. In general, Al, involves a considerably more complex nearest-neighbor selection algorithm than E, That is, an almost perfect reconstructionof the original signal is achieved provided that a sufficiently large number of approximation stages is available.
245
WAVELET VECTOR QUANTIZATION
- 42.1
Miss America, 10 Hz,64 kblVsec
42.0 41.9
-
41.8
-
41.5
-
41.4 Dqll
D412
D4/1+2
Eel1
Eel2
Eel3
E8/1+2 E8/1+2+3
L16
lattice codebook
. +Claire 10 Hz,64 kbiVsec
39.6
*,* Dqll
D4l2
D4/1+2
Eel1
Eel2
E8/3
E811+2 E811+2+3
L16
lattice codebook
34.2,
g U 5 n h
e
34.1
34.033.9
0)
2
I
33.8 33.7
I
. I
+ salesman 10Hz. 84kbWsec
(4 FIGURE22. Average PSNR performance of different lattice codebooks:(a) Miss America, 10 Hz, 64 kbit/s; (b) Clair, 10 Hz, 64 kbit/s; (c) Salesman, 10 Hz,64 kbit/s.
246
E. A. B. DA SILVA AND D. G . SAMPSON
and D4. For the rest of the experiments, the E, - shell 1 + 2 lattice codebook was employed, which has given on average the best PSNR performance and also has a reasonable codebook size ( N = 2160). c. Comparison between the Overlapped and the Conventional Block Matching Motion Compensation. The efficiency of the overlapped block matching motion compensation (OBM-MC) algorithm, described in Section V,B,2, for the SA-W-LVQ video codec is investigated. Figure 23 shows the PSNR performance of the codec for Miss America, Claire, and Salesman at 64 kbit/s using the OBM and the conventional BM motion compensation. In both cases, the core motion compensation block is 16 X 16 and a search area of f15 pixels is assumed. Hence, the same numbers of motion vectors are created. In the OBM-MC, the size of the overlapped blocks is 32 X 32 and the window function is the two-dimensional raised cosine given by Eq. (39). In the simulation results presented in this section the absolute values of the horizontal and vertical components of the motion vectors (including zero ones) are coded using an adaptive arithmetic coder (Witten et al., 1987) and embedded into the bitstream. This may not the most efficient way to code the motion vectors; nevertheless, it allows a common strategy for both the OBM and the conventional BM. The plots in Fig. 23 illustrate the improvement in PSNR obtained by employing the OBM instead of the conventional BM. This improvement is also reflected in the picture quality of the reconstructed frames, following the reasons explained in Section V,B,2. In case the OBM is employed instead of the conventional BM, it is interesting to see that for typical videophone sequences (i.e., Miss America and Claire) small PSNR degradation is observed as the sequence advances to higher order frames. d. Comparison with Other Low-Bit-Rate video Codecs. The performance of the presented video coding scheme is compared with the RM.8 implementation of the ITU-T recommendation H.261 and other video codecs proposed for low-bit-rate video coding applications. Figure 23 illustrates the PSNR improvement achieved by the OBM-SAWLVQ over the RM.8 simulations. The picture quality of the reconstructed frames is demonstrated in Figs. 24 and 25. Figures 24a and 24b show frame 73 of Claire and Figs. 25a and 25b show frame 43 of Salesman all coded by OBM-SAWLVQ and RM.8, respectively, at 64 kbit/s. It is evident from these figures that the picture quality of the OBM-SALWVQ coded pictures is good and free of the annoying blocking effects of RM.8 coded images. The OBM-SAWLVQ video codec offers constant bit rate, with no need for a buffer, yet, remarkably, the PSNR fluctuations from frame to frame are reasonably small. This is due to the fact that the SA-W-LVQ
Framc Number
(4 FIGURE23. Performance comparison between overlapped and conventional block matching: (a) Miss America, 10 Hz, 64 kbit/s; (b) Claire, 10 Hz, 64 kbit/s; (c) Salesman, 10
Hz,64 kbit/s.
FIGURE 24. Reconstructed image frames: (a) frame 73, Claire, CIF, 10 Hz, 64 kbit/s using OBM-SAWLVQ;(b) frame 73, Claire, CIF, 10 Hz,64 kbit/s using H.261-RM.8.From Sampson, D. G., da Silva, E. A. B., and Ghanbari, M., Low-bit-ratevideo coding using wavelet vector quantization, IEE Proc. Part I, Viiion, Image Signal Aocess. 142(3), 141-148. (D IEE.
FIGURE25. Reconstructed image frames: (a) frame 43, Salesman, CIF, 10 Hz, 64 kbit/s using OBM-SAWLVQ, (b) frame 43, Salesman, CIF, 10 Hz, 64 kbit/s using H.261-RM.8. From Sampson, D. G., da Silva, E. A. B., and Ghanbari, M., Low-bit-rate video coding using wavelet vector quantization, IEE Proc. Part I , Vuion, Image Signal Process. 142(3), 141-148. 8 IEE.
250
E. A. B. DA SILVA AND D. G. SAMPSON
always codes the most important image data first. Moreover, there is relatively small quantization error accumulation as the image sequence is advanced to higher order frames. Tables V to VII compare the average PSNR obtained for the three test image sequences coded at 64 kbit/s by various low-bit-rate subband/wavelet video codecs reported in the literature. It can be claimed that the scheme described in this chapter achieves coding performance that comparable to that of the best known reported simulation results at low-bit-rate video coding. Ohta and Nogasi (1993) have presented a video codec based on overlapped block matching and wavelet transform coding of the motion-compensated prediction error. According to their algorithm, referred to as OBM-WT in Tables V to VII, an OBM algorithm (similar to the one described in Section V,B,2) is used and the MCIP error images are wavelet transformed. A uniform scalar quantizer with dead zone is used to code the wavelet coefficients. The authors suggest an alternative to zero-trees based on run-length coding. They describe a scanning process which is implemented across the bands by first considering a single coefficient at the same position from bands L , , V , , H,, and D, and then the four corresponding coefficients from H,, V,,D,, up to the 2M coefficients from YM,HM, D M bands, before returning to the next single coefficient in L , . The aim of this scanning technique is to create a large number of zero runs (since the quantized coefficients at high-frequency bands tend to have zero values), so that the wavelet coefficients can be efficiently coded with run-length and variable-length coding. It must be pointed out that this coding scheme shares the same motion compensation/estimation with the video codec presented here. However, an improvement of 1.55 and 2.83 dB in average PSNR is obtained by using OBM-SAWLVQ instead of the OBM-WT codec for Salesman and Miss America, respectively, coded at 64 kbit/s.
TABLE V VIDEO CODECS FOR CUIRE AT PERFORMANCE COMPARISON WITH OTHER LOW-BIT-RATE 10 FRAMES/S Spatial resolution
Bit rate (kbit/s)
Average PSNR (dB)
Method
Reference
~~
352 X 288 352 X 288
64 64
36.64 39.59
H26 1-RM8 OBM-SAWLVQ
-
25 1
WAVELET VECTOR QUANTIZATION TABLE VI
PERFORMANCE COMPARISON W I T H &'HER LOW-BIT-RATE VIDEO CODECS FOR SALESMAN AT 10 FRAMES/S Spatial Bit rate resolution (kbit/s)
Average PSNR (dB) ~~
352 X 352 X 352 X 352 X
288 288 288 288
700 64 64 64
33.90 32.50 31.90 34.05
Method
Reference
~
Edge/Sub VQ OBM-WT H261-RM8 OBM-SAWLVQ
(Mohsenian and Nasrabadi, 1994) (Ohta and Nogasi, 1993) -
Bhutani and Pearlman (1993) have tested the performance of a video codec in which the image frames are decomposed by a five-level subband, and pel-recursive motion compensation is performed on each subband separately. The authors have used the embedded zero-tree wavelet codec proposed by Shapiro (1993) to code the frame difference subimages. They have reported that their codec (referred to as zero-tree in Table VII), achieves an average PSNR of 40.52 dB for Miss America coded at 332 kbit/s. For the same test image OBM-SAWLVQ gives an average PSNR of 41.98 dB at 64 kbit/s. Mohsenian and Nasrabadi (1994) have suggested an edge-based subband video codec, in which edge detection is employed to extract edge information in the baseband image, aiming to predict the locations of the significant coefficients that correspond to temporal changes between the frames. In this codec, the authors have used a simple block matching motion compensation and seven-band decomposition of the MCIP error images. Mersereau et al. (1993) have also used a simple BM-MC algorithm and split the MCIP error image into 16 uniform subbands. An energy-based TABLE VII WITH OTHER LOW-BIT-RATEVIDEOCODECSFOR PERFORMANCE COMPARISON Miss AMERICA AT 10 FRAMES/S
Spatial resolution 256 256 352 352 352 352 352
X X X X
X X X
256 256 288 288 288 288 288
Bit rate (kbit/s)
Average PSNR (dB)
64 64 332
41.80
260 64 64 64
38.10 40.52 38.63 39.15 40.33 41.98
Method
Reference
Subband/VQ OBM-SAWLVQ Zero-tree Edge/Sub VQ OBM-WT H261-RM8 OBM-SAWLVQ
(Mersereiau et al., 1993) -
(Bhutani and Pearlman, 1993) (Mohsenian and Nasrabadi, 1994) (Ohta and Nogasi, 1993) -
252
E. A. B. DA SILVA AND D. G. SAMPSON
selection rule is then applied to select the vectors, which are coded with a two-stage multistage vector quantizer. The locations of the coded vectors at each stage are encoded using run-length coding. VI. CONCLUSIONS In this chapter we have described investigations related to successive approximation wavelet lattice vector quantization and its application to image and video data compression. The basic idea of successive approximation vector quantization (SA-VQ) is that groups of samples are successively refined by a series of vectors with decreasing magnitudes. This method belongs to the framework of multistage vector quantization. Conventional multistage vector quantization techniques require the design of different codebooks for each quantization stage and a complex process for the selection of the optimum combination of the final code vectors (Gersho and Gray, 1991). The successive approximation vector quantization described in this chapter uses a single codebook in all quantization stages. We demonstrated that the main design requirement of the SA-VQ codebooks is related only to the maximum error in orientation (determined by em,, 1. Furthermore, the use of lattice orientation codebooks with regular structure is justified because they can offer a good compromise between Om,, and the codebook population and vector dimension. Lattice codebooks also have the advantage of fast techniques for the selection of the closest orientation code vector. In SA-W-LVQ, successive approximation lattice vector quantization has been applied to coding wavelet transform coefficients incorporating (i) the prediction of the nonsignificant information in different subband images using zero-tree roots and (ii) arithmetic coding of both the addressing information and the lattice code vector indices. This method ensures that the available bit rate is spent with priority given to the most significant information of the wavelet coefficients. The SA-W-LVQ coding algorithm has been employed for still image coding and low-bit-rate video coding. Simulation results demonstrate that this technique obtains coding performance comparable to that of the state-of-the-art image and video codecs. ACKNOWLEDGMENTS The work of E. A. B. da Silva was supported in part by Universidade Federal do Rio de Janeiro and Conselho Nacional de Desenvolvimento Cientifico e Tecnolbgico, Brazil, under grant 200885/91-0.
WAVELET VECTOR QUANTIZATION
253
REFERENCES Antonini, M., Barlaud, M., Mathieu, P., and Daubechies, I. (1992, April). Image coding using wavelet transform. IEEE Trans. Image Process. 1(2), 205-220. Argast, J., Rampton, M., Qiu, X.,and Moon, T. (1993). Image compression with the wavelet transform. Proc. SPIE 93 2094, 1347-1356. Barlaud, M., Sole, P., Gaidon, T., Antonini, M., and Mathieu, P. (1994, July). Pyramidal lattice vector quantization for multiscale image coding. IEEE Trans. Image Process. 3(4), 367-381. Bell, T. C., Cleary, J. G., and Witten, I. H. (1990). “Text Compression.” Prentice Hall, Englewood Cliffs, NJ. Bhutani, G., and Pearlman, W. A. (1993). Image sequence coding using the zero-tree method. Proc. SPIE 93 2094, 463-471. Conway, J. H., and Sloane, N. J. A. (1982a). Fast quantizing and decoding algorithms for lattice quantizers and codes. IEEE Trans. Inform. Theory IT-28,227-232. Conway, J. H., and Sloane, N. J. A. (1982b). Voronoi regions of lattices, second moments of polytopes, and quantization. IEEE Trans. Inform. Theory IT-28, 211-226. Conway, J. H., and Sloane, N. J. A. (1988). “Sphere Packings, Lattices and Groups.” Springer-Verlag, New York. Crochiere, R. E., and Rabiner, L. R. (1983). “Multirate Digital Signal Processing.” PrenticeHall, Englewood Cliffs, NJ. da Silva, E. A. B., and Ghanbari, M. (1994, September). Linear phase wavelet transforms for low bit rate image coding. In “VII European Signal Processing Conference-EUSIPCO94,” Edinburgh, UK. pp. 1222- 1225. European Association for Signal Processing. da Silva, E. A. B., Sampson, D. G., and Ghanbari, M. (1995). A successive approximation vector quantizer for wavelet transform image coding. IEEE Trans. Image Process., Special Issue on Vector Quantization 5(2), 299-310. Daubechies, I. (1988). Orthonormal bases of compactly supported wavelets. Commun. Pure Appl. Math. XLI,909-996. Daubechies, I. (1991). “Ten Lectures on Wavelets.” Society for Industrial and Applied Mathematics, Philadelphia. DeVore, R. A., Jaweth, B., and Lucier, B. J. (1992, March). Image compression through wavelet transform coding. IEEE Trans. Inform. Theory 38(2), 719-746. Efstratiadis, S. N., Rouchouze, B., and Kunt, M. (1992, April). Image compression using subband/wavelet transform and adaptive multiple distribution entropy coding. In “SPIE Conference on Visual Information Processing 92,” Orlando, FL, Vol. 1075, pp. 753-764. SPIE-Int. Soc. Opt. Eng., Bellingham, WA. Fadzil, M. H. A., and Dennis, T. J. (1991). Video subband VQ coding at 64 kbit/s using short kernel filter banks with an improved motion estimation technique. Signal Process. Image Commun. 3,3-21. Gersho, A., and Gray, R. M. (1991). “Vector Quantization and Signal Compression.” KIuwer Academic Publishers, New York. Ghanbari, M. (1990). The cross-search algorithm for motion estimation. IEEE Trans. Commun. COM38,950-953. Gharavi, H. (1991). Subband coding of video signals. In “Subband Coding of Images (J. W. Woods, Ed.), Chap. 6. Kluwer Academic Publishers, Boston. Gharavi, H.,and Tabatabai, A. (1988, February). Sub-hand coding of monochrome and color images. IEEE Trans. Circuits Syst. 35(2), 207-214.
254
E. A. B. DA SILVA AND D. G. SAMPSON
Gibson, J. D., and Sayood, K. (1988). Lattice quantization. In “Advances in Electronics and Electron Physics,” Vol. 72, (P. Hawkes, Ed.), pp. 259-330. Academic Press, New York. Huang, Y., Dreizen, H. M., and Galatsanos, N. P. (1992, October). Prioritized DCT for compression and progressive transmission of images. IEEE Trans. Image Process. 1(4), 477-487. ITU-T. (1990). Recommendation H.261, Video codec for audio visual services at p X 64 kbit/s. Jayant, N. (1994). “Image Coding Based on Human Visual Models in Image Processing.” Academic Press, San Diego. Jeong, D, G., and Gibson, J. D. (1993, May). Uniform and piecewise uniform lattice vector quantization for memoryless gaussian and laplacian sources. IEEE Trans. Inform. Theory IT-39, 786-804. KovaEeviC, J., and Vetterli, M. (1992, March). Nonseparable multidimensional perfect reconstruction filter banks and wavelet bases for L P . IEEE Trans. Inform. Theory 38(2), 533-555. Le Gall, D. (1992, April). The MPEG video compression algorithm. Image Commun. 42). 129- 140. Lewis, A. S., and Knowles, G. (1992, April). Image compression using the 2-D wavelet transform. IEEE Trans. Image Process. 1(2), 244-250. Li, W., and Zhang, Y.-Q. (1994, August). A study of vector transform coding of subband-decomposed images. IEEE Trans. Circuits Syst. Wdeo Technol. 44). 383-391. Macq, B. (1992, June). Weighted optimum bit allocations to orthogonal transforms for picture coding. IEEE J . Selected Areas Commun. 1061, 875-883. Mallat, S. G. (1989, July). A theory of multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Machine Intel. 11(7), 674-693. Mersereau, R. M., Smith, M. T. J., Kim, C. S., Kossentini, F., and Truong, K. K. (1993). Vector quantization for video data compression. In “Motion Analysis and Image Sequence Processing” (M. 1. Sezan and R. L. Lagendijk, Eds.), Chap. 9. Kluwer Academic Publishers, Boston. Mohsenian, N., and Nasrabadi, N. M. (1994, February). Edge-based subband VQ techniques for images and video. IEEE Trans. Circuits Syst. V i i o Technol. 40,53-67. Musmann, H., Pirsch, P., and Garllet, H. (1985, April). Advances in picture coding. IEEE Proc. 73(4), 631-670. Naveen, T., and Woods, J. W. (1993, April). Subband finite state scalar quantization. In “Proceedingsof the 1993 ICASSP Conference, Minneapolis, Minnesota,” pp. V-613-V-615. IEEE-Int. SOC.Opt. Eng., Bellingham, WA. Netravali, A., and Haskell, B. (1995). “Digital Pictures. Representation and Compression,” 2nd ed. Plenum Press, New York. Ohta, M., and Nogasi, S. (1993, December). Hybrid picture coding with wavelet transform and overlapped motion-compensated interframe prediction coding. IEEE Trans. Signal Process. SP-41(12), 3416-3423. Pennebaker, W. B., and Mitchell, J. L. (1993). “JPEG Still Image Data Compression Standard.” Van Nostrand Reinhold, New York. Renaud, P. J., and Smith, M. J. T. (1990). Recursive subband image coding with adaptive prediction and finite state vector quantization. Signal Process. 20, 25-42. Rioul, O., and Vetterli, M. (1991, October). Wavelets and signal processing. IEEE Signal Process. Mag. 14-38. Said, A., and Pearlman, W. A. (1993). Image compression using the spatial-orientation tree. In “Proceedings of the 1993 IEEE International Symposium on Circuits and Systems, Chicago, 11” pp. 279-282. IEEE-Int. Soc. Opt. Eng.,Bellingham, WA.
WAVELET VECTOR QUANTIZATION
255
Sampson, D. G., and Ghanbari, M. (1993, February). Fast lattice-based gain-shape vector quantisation for image sequence coding. IEE Proc. Part I , Commun. Speech Vision 140(1), 55-65. Sampson, D. G., da Silval, E. A. B., and Ghanbari, M. (1995). Low bit rate video coding using wavelet vector quantization. IEE Proc. Part I: Vision, Image Signal Process. 142(3), 141-148. Shapiro, J. M. (1993, December). Embedded image coding using zerotrees of wavelet coefficients. IEEE Trans. Acoustics, Speeeh Signal Process. 41(12), 3445-3462. Sloane, N. J. A. (1981). Tables of sphere packings and spherical codes. IEEE Trans. Inform. Theory IT-27,327-338. Tay, D. B., and Kingsbury, N. G. (1993, October). Flexible design of multidimensional perfect reconstruction FIR 2-band filters using transformations of variables. IEEE Trans. Image Process. 2(4), 466-480. Vetterli, M., and Herley. C. (1992, September). Wavelets and filters banks: theory and design. IEEE Trans. Signal Process. 40(9), 2207-2232. Wallace, G. K. (1992, February). The JPEG still picture compression standard. IEEE Trans. Consumer Electronics 3801, 1-2. Watanabe, H., and Suzuki, Y. (1991). Windowed motion compensation. SPIE V u a l Commun. Image Process. 1605,582-589. Westerink, P. H., Biemond, J., and Muller, F. (1990). Subband coding of image sequences at low bit rates. Signal Process. Image Commun. 2, 441-448. Witten, I. H., Neal, R. M., and Cleary, J. G. (1987, June). Arithmetic coding for data compression. Commun. ACM 30,520-540. Woods, J. W., and Naveen, T. (1989). Subband encoding of video sequences. In “SPIE Conference on Visual Communications and Image Processing IV,” Vol. 1199. Young, R. W., and Kingsbury, N. G. (1993, December). Frequency domain motion estimation using a complex lapped transform. IEEE Trans. Image Process. 2(1), 2-17. Zhang, Y.-Q., and Zafar, S. (1992, September). Motion-compensated wavelet transform coding for color video compression. IEEE Trans. Circuits Sysr. Video Technol. 2(3), 285-296. Zhang, Y. Q., Li, W. P., and Liou, M. L., Eds. (1995, February). Special Issue on Advances in Image and Video Compression, Proc. IEEE, Vol. 83.
This Page Intentionally Left Blank
ADVANCES IN IMAGING AND ELECTRON PHYSICS. VOL. 97
Quantum Theory of the Optics of Charged Particles R. JAGANNATHAN AND S. A. KHAN The Institute of Mathematical Sciences, C.I.T.Campus, Tharamani, Madras 600113, India
......................................... 257 ..................... 259 A. General Formalism: Systems with Straight Optic Axis, . . . . . . . . . . . . . . .259 B. Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279 111. Spinor Theory of Charged-Particle Wave Optics. .................... 322 A. General Formalism: Systems with Straight Optic Axis . . . . . . . . . . . . . . . .322 B. Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .330 IV. Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336 Appendix: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339 A. The Feshbach-Villars Form of the Klein-Gordon Equation . . . . . . . . . . . .339 B. The Foldy-Wouthuysen Representation of the Dirac Equation . . . . . . . . . . 341 C. TheMagnusFormula.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347 D. Green’s Function for the Free Nonrelativistic Particle . . . . . . . . . . . . . . .350 E. Matrix Element of the Rotation Operator ...................... 351 F. Green’s Function for a System with Time-Dependent Quadratic Hamiltonian . . 351 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356 I. Introduction
11. Scalar Theory of Charged-Particle Wave Optics
I. INTRODUCTION Those starting to study electron optics, or charge-particle optics, now are very fortunate indeed; the three-volume encyclopedic text book of Hawkes and Kasper (1989a, b, 1994) is available to show the various aspects of the subject in proper perspective, presenting thorough and comprehensive analysis with detailed guidance to the literature. In particular, in a single volume (Hawkes and Kasper, 1994) we find a detailed and up-to-date account of the fundamentals of the developments of the past three decades in “electron wave optics” where in whole branches of the subject have emerged since the appearance of the late Walter Glaser’s classical article (Glaser, 1956), which was the last attempt to cover systematically the whole of electron optics. The present chapter is a modest attempt to supplement the excellent survey of the analytic treatment of the scalar theory of electron wave optics given by Hawkes and Kasper (1994) with a complementary, alge257
Copyright 0 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.
258
R. JAGANNATHAN AND S. A. KHAN
braic, approach to the topic, based on recent work (Khan and Jagannathan, 1994, 1995) and give a fairly self-contained account of the spinor theory of electron wave optics, based on the Dirac equation, briefly mentioned by Hawkes and Kasper (1994, Section 56.4). We are concerned mainly with the initial chapters of Hawkes and Kasper (1994). As pointed out rightly by Hawkes and Kasper (19941, the traditional theory based on the relativistic scalar equation has been found to be adequate for almost all practical purposes in electron microscopy although the Dirac equation is the proper basic equation. After some preliminary studies by Rubinowicz (1934, 1957, 1963, 19651, Durand (19531, and PhanVan-Loc (1953, 1954, 1955, 1958a, b, 1960) on the use of the Dirac equation in electron optics, it was only in the last decade that Ferwerda et al. (1986a, b) first reopened the question of using the spinor wavefunction for understanding electron optical images. Essentially, Ferwerda et al. (1986a, b) found after a thorough analysis that the use of the scalar Klein-Gordon wavefunction in electron microscopy could be vindicated because a scalar approximation of the Dirac spinor theory would be justifiable under the conditions obtaining in present-day electron microscopes. Subsequently, the development of the spinor electron optics is being pursued (Jagannathan et al., 1989; Jagannathan, 1990; Khan and Jagannathan, 1993) mainly due to a desire to understand how the Dirac equation, the equation for electrons, explains electron “optics.” Of course, there is also the hope that any better understanding of the way the scalar theory becomes such an excellent approximation of the spinor theory in electron microscopy may eventually be of some practical use in certain situations. To this end, we analyze the small differences between the Klein-Gordon theory and the Dirac theory and spell them out up to leading order approximations. The formalism of spinor theory, using essentially an algebraic approach, has been extended (Khan and Jagannathan, 1994,1995) to the scalar theory, using a two-component formalism, suitable for treating the forward and backward propagating beams separately, analogous to the Feshbach-Villars representation of the KleinGordon theory. The traditional geometrical charged-particle optics is obtained in the classical limit of the quantum theory. It is found that there are interesting additional small contributions to the classical aberrations, arising from quantum mechanics even at the scalar level. Of course, in the Dirac theory the spinor nature of the wavefunction modifies further, although only minutely, the various optical characteristics of the system. In the classical limit the algebraic approach of our theory tends to the Lie algebraic treatment pioneered by Dragt et al. (e.g., see Dragt and Forest, 1986; Dragt et al., 1988; Rangarajan et al., 1990; Ryne and Dragt, 1991; see also Forest et al., 1989, Forest and Hirata, 1992).
OPTICS OF CHARGED PARTICLES
259
In our treatment, at the level of single-particle dynamics, we consider, as is usual, a beam of identical charged particles as a mere collection of point particles without any mutual interactions (including the statistical interactions). The electromagnetic field is treated as classical. We take the charge of the particle, generally, to be q and electron optics is obtained when q = -e.
11. SCALAR THEORY OF CHARGED-PARTICLE WAVE OPTICS
A . General Formalism: Systems with Straight Optic Axis
We are concerned with a beam of identical particles of charge q and rest mass m , passing through a stationary electromagnetic field, constituting a lens or any other such charged-particle optical system. We shall assume the beam to be monoenergetic, with all its constituent particles having the same positive energy, say, moc2 + E; E is the dynamic part of the total energy of the particle. Let {Ah), 4(r)) be the four-potential corresponding to the time-independent electromagnetic field of the system. In passing through this system the energy of the particle will be conserved and so E is time independent. We shall assume the system to be, in practice, localized in a definite region of space so that regions “outside” the system can be considered field free. Then, E can be taken to be the kinetic energy of the particle entering the system from the field-free “input” region. On leaving the system the particle will enter the field-free “output” region with the same kinetic energy E because the scattering by the time-independent system conserves the energy. Disregarding the spin of the particle, if any, we can consider its quantum mechanics to be governed by the scalar Klein-Gordon equation
where c is the speed of light and
Since we are dealing with a time-independent system we can take the wavefunction of the beam particle in the form
q,(,., t )
= e-i(muc2+~~r/fi
*(r).
(3)
260
R. JAGANNATHAN AND S. A. KHAN
Then Eq. (1) becomes
If the particle starts off from its source with a small initial kinetic energy (kinetic emission energy) and attains the kinetic energy E by acceleration through a potential reaching a constant value c$o in the field-free input region just outside the system then we can write E
E
=E -
q40.
(5)
Substituting this expression for E in Eq. (4), we get
defines the total electrostatic potential of the system including the constant value about which it varies inside the system. Since E is usually very small (in the case of electron microscopes it is only 1 eV) one can always drop the term g 2 / c 2 compared to 2&m0 in the right-hand side of Eq. (6) and write
-
Introducing the notations
and
OPTICS OF CHARGED PARTICLES
26 1
Eq. (8) becomes, after dividing by 2m0 throughout,
which is, with q = -e, the basic electron optical scalar wave equation of Hawkes and Kasper (1994) [see Eq. (56.11k note that their E is our E ] . Rewriting Eq. (11) as
+2+(r)
=
(2~m(r) 2moq@(r)}+(r),
( 12)
and defining
it is seen that Eq. (11) takes the Helmholtz-like form
{S2+ k2(r))+(r) = 0.
(15) Equation (11) [or equivalently Eq. (1511 is the basis for understanding the electron optical image formation including all the significant aspects of the process. For a complete account of the traditional analytic treatment, with detailed bibliography on the various stages of development of the theory, the reader is referred to Hawkes and Kasper (1994). Equation (14) defines the wavenumber k(r) of the particle corresponding to the kinetic momentum m = hk in accordance with the de Broglie relation. Let us note that in terms of E, the kinetic energy of the incident particle in the field-free input region of the system, the wavenumber k(r) is 1
E2
exactly, according to the relativistic relation E
+ m2,c4j1/2- moc2 + q4(r).
( 17) Then, it is seen that Eq. (4) also has the same Helmholtz-like form as Eq. (15) with the definition of k(r) as in Eq. (16) instead of Eq. (14). Substituting E = E - q40 and +(r) = @(r) - 4o in Eq. (16), it is seen that the definition of k(r) in Eq. (16) becomes the same as that in Eq. (14) under the approximation of neglecting e 2 / c 2 compared to 2 c m O .Thus, we shall = {c2h2k2(r)
262
R. JAGANNATHAN AND S . A. KHAN
take Eq. (151, the basic scalar wave equation, as the starting point for understanding the quantum mechanics of charged-particle beam transport through optical elements. We shall analyze Eq. (15) with k(r) denoting, in general, the exact expression in Eq. (16) where E is the kinetic energy of the incident particle in the field-free input region of the system and +(r) is the variation of the electrostatic potential inside the system over the constant value doit has in the input region just outside the system. From here, we take a path following our recent work (Khan and Jagannathan, 1994, 1999, using an algebraic approach, different from the traditional path using an analytic approach (see Hawkes and Kasper, 1994). We shall make contact with the basic results of the traditional theory whenever necessary. To proceed further, we have to choose the system more specifically. We shall consider the system to have a straight optic axis along the z-axis of a Cartesian coordinate frame and consider the charged-particle beam to be monoenergetic, quasiparaxial, and moving close to the +z-direction. Let it be assumed that the system is located between the xy-planes with the z-coordinates zin and zOut;i.e., the system field is practically zero in the input region ( z 5 zin) and the output region ( z 2 zoUt).By input and output regions we mean the regions outside the system and close to it. The constant wavenumber of the incident particle in the input region is given by Po k(r, ,z
(18)
as is seen by putting d(r) = 0 in Eq. (16). After elastic scattering by the system the particle will emerge in the field-free output region with the same value of the wavenumber, namely, k,; that is, k(r I ,z > zOut)also has the value k,. Since the beam is supposed to be monoenergetic, the wavevector k, of any particle of it will have the same magnitude
irrespective of its direction. Quasiparaxiality of the input beam implies that
OPTICS OF CHARGED PARTICLES
263
Since we consider the optical system to be such that the input “beam” emerges in the output region again as a beam, the relation
k,(r)2
+ k,(r)2 = k:
(r) < k,(r)2
(21)
will be assumed to hold throughout the propagation of the beam. Further, since we are always concerned only with the forward propagating beam close to the +z-direction, the beam wavefunction we consider, throughout the transport of the beam, would be a packet, or linear combination, of only those plane waves corresponding to wavevectors satisfying the conditions
k t 4 k:, k, > 0. (22) Our aim is to relate the beam wavefunction in the field-free output region,
+out(r)
=
+(rl
- 22
%It),
(23)
to the beam wavefunction in the field-free input region, +in(r) =
Jl(r1 , z
SZin),
(24)
so that the values of the observable beam characteristics in the output region can be related to their values in the input region using the wavefunction. To this end, the most desirable starting point would be a z-evolution equation for Jl(r I ,z ) linear in d / d z . So, first, we cast Eq. (15) in such a form using a method similar to the way in which the Klein-Gordon equation is written in the Feshbach-Villars form (linear in a/&), unlike the Klein-Gordon equation (quadratic in d / d t ) . (See Appendix A for the Feshbach-Villars form of the Klein-Gordon equation.) Let us write Eq. (16) as
k(r) defining
1/2
=
{,ti- i 2 ( r ) )
,
(25)
264
R. JAGANNATHAN AND S. A. KHAN
Now, let
Then, Eq. (27) is equivalent to
--(-k , i
p4)(
;;)
d - i
dz
with fi: = fi; + 8;.Hereafter, for notational convenience we shall not usually indicate the r-dependence of $, A, k , f , etc. Next, define
(:I)
$1
+
*2
I ( $1
-
*2)
1 =
Consequently, Eq. (29) can be written as
265
OPTICS OF CHARGED PARTICLES
Multiplying Eq. (31) throughout by -1 and taking the A, term on the left-hand side to the right-hand side we get
--(
i *+ k , d z 4-
)=A(;:),
( 32)
with
A = -a,+9+2,
(33)
where 1 is the 2 x 2 identity matrix and oy and a- are, respectively, the y and z components of the triplet of Pauli matrices
Let us note that Z-? has been partitioned, apart from the leading term -u,,into an:‘even’’ term k? which does not couple $+ and Jr- and an “odd” term 5‘ which couples JI+ and In order to understand 3. (32) better let us see what it means in the case of propagation of the beam through free space. In free space, with +(r) = 0 and A = (O,O, 01, we have RZ(r)= 0 and
+-.
”(*+)
4 k, dz
JI-
where V: = d 2 / d x 2 kOy,k,, 1, namely,
=
-1-7v: 2ko 1 -V: 2ki
+ d2/dy2. A
--v: 2ki 1
1 + -v; 2ki
plane wave with a given k,
=
(k,,,
266
R. JAGANNATHAN AND
S. A. KHAN
is associated with
It can easily be checked that this
(?) satisfies Eq. (37). For a quasi-
paraxial beam moving close to the +z-direction, with k,, > 0 and k,, = k,, it is clear from Eq. (39) that $+P $-. By extending this observation it can be seen easily that for any wavepacket of the form
$(r)
=
/d3kocp(k0)+k,,(r),
/d3k0tcp(k0)~*= 1
with lkol = k , , k , , = k , , k , , > 0, (40) representing a monochromatic quasiparaxial beam moving close to the ~. +z-direction,
=i
!d% cp(k0) #ko. + (r) ~ d % c p o ( k d ~-(r) ~,, with lkol = k,, k,, = k,, k,, > 0, (41)
is such that $+(r) P $-(r). Thus, in general, in the representation of Eq. (32) we should expect $+ to be large compared to (I- for any monochromatic quasiparaxial beam passing close to the +z-direction through the system supporting beam propagation.
OPTICS OF CHARGED PARTICLES
267
The purpose of casting Eq. (27) in the form of Eq. (32) will be obvious now when we compare the latter with the form of the Dirac equation
where
and
01 =
( a x ,ay,a,) and
P are the 4
X 4
Dirac matrices given by
As is well known, for any positive energy Dirac q in the non-relativistic situation (In1< mot) the upper components (Wu)Aarelarge compared to the lower components (*,I. The “even” operator &YDAdoes not couple the large and small components and the “odd” operator does couple them. Using mainly the algebraic property
PgD
= -$DP
(48)
the Foldy-Wouthuysen technique expands the Dirac Hamiltonian H D in a series with l/moc as the expansion parameter. This leads to a good understanding of the nonrelativistic limit of the Dirac equation by showing how the Dirac Hamiltonian can be seen as consisting of a nonrelativistic part and a systematic series of correction terms depending on the extent of deviation from the nonrelativistic situation (see Appendix B for a risum6 of the original Foldy-Wouthuysen theory). The analogy between Eq. (32) and the Dirac equation should be clear now. The correspondences are: positive forward propagation of the beam close to the +r-direction energy Dirac particle, paraxial beam (1- I I hk) nonrelativistic motion (In14 mot), deviation from paraxial condition (aberrating system) deviation from nyrelativistic situation (relativistic motion). Also, it may be noted that in H [sce Eq. (33)] a, plays the role of P and analogously to Eq. (48) we have a,@= -@a,. Hence, by applying a Foldy-Wouthuysen-
- - -
268
R. JAGANNATHAN AND S .
A.
KHAN
like technique to Eq. (32) it should be possible to analyze the beam propagation in a systematic way up to any desired order of approximation. In the Foldy-WouthFysen theory a series of transformation! are used to ?ake the odd term in H D as small as desired: in the present HD [Eq. (4411 BD is of first order in l/m,c$or +/rn,c) and the transformations applied lead to representations of HD in which the corresponding odd terms 5ontain only s2ccessive higher powers of l/m,c so that one can choose an HD with an 19, as small as desired for the purpose of approximation. We shall apply this !echnique to Eq. (32) to arrive at a representation in which the odd term B pill be as small as we would like. We can label the order of smallness of B by the lowest power of l/k,, the expansion parameter in this context, highzr order smallness corresponding to higher power. The smallness of B in Eq. (32) is of order l/k& Following the Foldy-Wouthuysen method, let us define
Equation (32) is now transformed into
where
1
-
P),
OPTICS OF CHARGED PARTICLES
269
with the odd term of order l/ki [see Appendix B for details of the calculation leading to Eqs. (50)-(53)]. By an2ther transformation 9f the same type as in Eq. (49) with b replaced by @(') we can transform H") to a I%(2) with an &2) of order l/k$ Since the even part of H(')already represents a sufficiently good approximation for our purpose, we shall not continue this process further. Hence, we write
with H"' = - a,
dropping the odd term. Note that Let us now look at
(2;)
+ $(I)
[a,h] =
-
a.
corresponding to the plane wave +,$)
(55)
in
free space. We get
1 1
-V2
4ki
+it!-
1 4ki
-V;
1
showing that +i'd,+P for a quasiparaxial beam. This result easily extends to the wavepacket of the form in Eq. (40). Thus, in general, we can take +$I) P +)I! in Eq. (54) for the beam wavefunctions of interest to us.
270
R. JAGANNATHAN AND S. A. KHAN
We can express this property that compared to $$I), as
$$'I
P
$?, or essentially $!) = 0
Then Eq. (54) can be further approximated to read
with
We want to get the z-evolution equation for $(r). To this end, let us now retrace the transformations and rewrite Eq. (58) in terms of which $,
=
$ [see Eq. (28)].Substituting in Eq. (58)
we get (see Appendix B for details of the calculation)
=
A.(
;;),
(::)
in
27 1
OPTICS OF CHARGED PARTICLES
where
Since h, has nonzero entries only along the diagonal, Eq. (61) describes the z-evolution of and J12 independently. We are interested in the z-evolution of II,= and hence we have i dl(l - - = I?+,
+,
ko
with
-
4 -1 - -A, hk0
dz
1 2ki
- -(D:
,
-i2)
1 dZ
dz
1 + -(b: Ski
2
-
P)
1
-
-([(b: 16ki
-
[b;, x4A , ] dz
dz
P),
272
R. JAGANNATHAN AND
S. A. KHAN
Multiplying Eq. (63) throughout by hk, we get
. a*
*
=&"* O '
lh-
dz
with
8 = -Po
- qA,
1 + -(+t +p)
2 Po
+ --+; +e2) - - ( [ ( a t 1
2
8Po
1
+jq,
16Po"
;it
where* we have used the re1a:ions p o = hk,, g 2 = h2k2, and = - h2D; . Now, we can identify as the optical Hamiltonian corresponding to ih d / d z or -fi, ( = - the canonical momentum in the z-direction). Since j d2rl Jl(r I ,z)I2, representing the probability of finding the particle in the xy-plane located at z , need not be a constant along the z-axis (only l&d2rlJl(rI ,z)I2 = 1 ) the z-evolution of +(rL , z ) , given by Eq. (651, is not necessarily unitary. This implies that representing ih d / d z need not be hermitian. Actually, one should expect a loss of intensity of the forward propagating beam along the z-axis since there will, in general, be reflections at the bouqdaries of the system. So it is not surprising to find non-hermitian terms in above [the term l / p i ; the general pattern in the above expansion procedure is that among the terms (even powers of l / p o ) alternate terms are nonhermitian]. Of course, the effect of these nonhermitian terms can be Txpected to be quite sm$l and negligible. Hence, we can approximate Zo,fucher, to a hermitian W, by dropping the nonhermitian terms (or, taking W, = +A?:)); later on, we shall analyze the small influence these nonhermitian terms have on the optics of the system. Thus, we write
8
-
<
4%
-
273
OPTICS O F CHARGED PARTICLES
with
A, =
-Po -qA,
1
+ -(+? 2PO
+JP)
Taking 8 I = -ihV, - q A , , Eq. (67) becomes the basic scalar Schrodinger- equation of charged-particle optics; if we drop the terms l/p: in W, the resulting Schrodinger equation corresponds essentially to Glaser’s equation, for the monochromatic case, including the relativistic correction (see Hawkes and Kasper, 1994, Section 58.1) and the third-order l/p& dropped are essentially certain very small aberrations. The terms quantum corr5ction terms which, besides the non-Hermitian terms, would not appear if W, is obtained by quantizing directly the classical (geometrical) optical Hamiltonian as was done by Glaser to get the nonrelativistic paraxial Schrodinger equation; in other words, such terms disappear in the classical limit as we shall see later. Glaser’s npnrelativistic paraxial Schrodinger equation is obtained by dropping from W, also the third-order aberration term ( - l/p:) and taking p o and FZ to be given by their and 2m0q4(r), respectively; these values are nonrelativistic values obtained by the nonrelativistic approximations of Eq. (18) and Eq. (26) using the relations E 4 moc2 and Iq+(r)l 4 moc2.Under this nonrelativistic approximation we have 1 k2(r) = -hZ { 2 m o ( E - q4W)) (69)
-
-
-4
and then it is seen that Eq. (15) is equivalent to the nonrelativistic approximation of the Klein-Gordon equation [Eq. (4)], namely, the nonrelativistic Schrodinger equation
8
[Eq. (6611, Having obtained the basic optical Hamiltonian operator we can now proceed to get the desired relation between I(lin and lout using the well-known techniques of quantum mechanics.
274
R. JAGANNATHAN AND S .
+
...
A. KHAN
( 72)
9
where i is the identity operator and 9'denotes the path-ordered exponential. For our purpose,)he most convenient form of the expression for the z-evolution operator ~'CZ(~), z(')), or the z-propagator, is 9(2(2), ~(1)) =
with
exp
(73)
275
OPTICS OF CHARGED PARTICLES
as given by the Magnus formula (see Appendix C for details). It may be noted that when is hermitian, or when it is approximated t? A,, f becomes hermitiap and g j s unitary. In this case, we shall denote T 2nd $ is approximated to W,, or respectively, by T and U. Thus, when corresponding to Eq. (67), we have
I I)( P) ) = q P), & I ) ) II)( z(1)) ) ,
(75)
where
c(
z(2),z(1)) =
if
exp - -f( 2 ( 2 ) ,
with
+
(76)
...
(77) In order to understand the electron optical image formation we should work with the coordinate representation in the Schrodinger picture. So, we write Eq. (711, the integral form of the optical Schrodinger equation [Eq. (691, as I)(r(:), z @ ) )= / d z r ( ' )G(ry), z(');r(;), z('))#(ry), z ( ' ) ) ,
(78)
where the Green's function G(ry), I('); r(l),z('))is given by G(r(:), z ( ~ )ry), ; z(l)) = (r(:)I,@z(*),Z('))lr(:)) =
/ d2r
~
2
(r* I
- .':))@
z(2),I (1) ) S 2 ( r ,
-r':)).
(79)
As is well known, this is onlx a formal expression of the Green's function as the matrix element of 9and its computation is, generally, quite a
276
R. JAGANNATHAN AND S. A. KHAN
difficult task beyond the case of paraxial approximation. The paraxia! approximation, as we shall see later, would correspond to approximating T to a quadratic expression in r I and fi I . There is a semiclassical formula for evaluating G which is approximate, in general, except in special cases like the paraxial approximation. This formula is given in terms of the classical eikonal function (i.e., the optical action function) corresponding to the particle trajectory from the point (ry), z(l))to (r(12),z")). This is how the classical (geometrical optics) solutions for the trajectories of the electron in the given system help us understand the wave mechanics of electron optical image formation by providing an expression (exact or approximate, depending on the situation) for the Green's function, or the optical transfer function (see Hawkes and Kasper, 1994, Chapters 58 and 59). The extension of Glaser's treatment of electron optical imaging, based on the nonrelativistic Schrodinger equation, to a relativistic theory based on the Klein-Gordon equation (Ferwerda et al., 1986a, b) is essentially through this semiclassical formula where, in the relativistic context, the classical relativistic dynamics is to be used to compute the eikonal function. When we want to relate the values of the quantum averages of the observables of the beam at two different points along the axis of the system we have to use essentially the Heisenberg picture. The quantum average, or the expectatiop value, of any observable, say 0, associated with the hermitian operator 0 is given as follows: for the state l$(z)) at the xy-plane at the point z
with the notations
( + ( z ) ~ + ( z )=>/ d 2 r q * ( r . , z ) + ( r l
Sometimes we shall denote ( O ) ( z )by relation in Eq. (711, we have
J).
(82)
( 6 ) < z )also. Now, in view of
the
OPTICS OF CHARGED PARTICLES
277
leading to the required transfer map giving the expectation values of the observables in the plane at z(’) in terms of their values in the plane at if). It should be noted that (O)(Z(’))is real even if the transfer operator 9is nonunitary. We will be using Eqs. (78) and (79) to understand diffraction in the field-free space and electron optical image formation using the round magnetic lens. Equation (83) will be the basis for our understanding of the focusing properties of electron lenses. Later, we shall see how this formalism, developed for the scalar wavefunction so far, is generalized to the case of the Dirac spinor wavefunction. It will be seen later that, in the classical limit, Eq.(83) leads to the Lie algebraic treatment of geometrical charged-particle optics pioneered by Dragt et al. (e.g., see Dragt and Forest, 1986; Dragt et al., 1988; Rangarajan et al., 1990; Ryne and Dragt, 1991; see also Forest et al., 1989; Forest and Hirata, 1992). The differential form of Eq. (83) corresponds to the Heisenberg equation of motion for the observables and would lead to the trajectory equations of geometrical optics in the classical limit in accordance with the correspondence principle (Ehrenfest’s theorem). Finally, it may also be noted that taking A = (0, 0,O)and writing k’(r) = oZnZ(r)/c2= o’(n; - i?’(r))/c’, where w is the circular frequency and n(r) is the refractive index, Eq. (27) becomes the Helmholtz equation of light optics and the above formalism should be suitable for use in certain situations, in the theory of graded index fibers, for example. In fact, wherever one studies single-frequency propagating rays making small angles with the principal direction of propagation the above method could be used-in ocean acoustics, for example (see Patton, 1986, and references therein). We shall now proceed to consider the applications of the above scalar theory. Before that, let us close this section with the following observations. We have assumed throughout, in the above treatment, an ideal picture of the beam with respect to its energy, namely, a monoenergetic (or monochromatic) beam [Eq. (311. In practice, the beam purity may be imperfect for various reasons: the initial distribution of emission kinetic energies, inelastic scattering by the specimen (object being imaged), the Boersch effect (widening of the energy spread by the interaction between the beam particles), fluctuations in the accelerating potential and the parameters of the system field, etc. So, in general, the beam wavefunction will be such that we can write
278
R. JAGANNATHAN AND S. A. KHAN
with qin(r,t )
=
q ( r ,z I zin,t ) 1
-
(2.rrh 1312
L
d3p +(p) e i [ ~ . -r (
m d + E ( P W I/ h
,
> /po+Ap P’PO-AP
d”I+(p)l2
=
1, (85)
where E ( p ) is the (kinetic) energy of the particle of momentum p entering the system, from the input region, which will be conserved. Substituting the form of the wavefunction given by Eq. (84) in Eq. (l),we find that each )(IE(,,)(r) satisfies Eq. (41, the time-independent Klein-Gordon (or Schrodinger) equation, because the system field is static. Then, it follows that, in this case, Eq. (78) generalizes to
-/
P O + ~ /P ~ ~z ~ ( Ie -)i ( m o c 2 + E ( p ) ) t / f i
Po-AP
X
G(ry), z(’);r(’) I , z(’); p ) +E ( P )(ry),d ’ ) ) , (86)
with G(r(’) I *z ( ~ )ry), ; z(’);p ) as the Green’s function coFesponding t? the momentum p [i.e., defined by Eq. (79) with p o = p in 3 defining fl.In the quasimonoenergetic case, when A p z: 0, it is clear that we can write Eq. (86) as Y(ry), z ( ~ )t ,)
3
=
/ d’r‘’) G(r(:), z ( ~ )ry); ,z(’);p o )
/ d2r(’)G(r(:), z(’);r(!), z(’);po)q(r(j),z(’),t),
(87)
as implied by Eq. (78). In electron microscopy the purpose is to deduce ‘Pin (the object wavefunction) as far as possible from whatever knowledge can be had about (the image wavefunction). How best this is achieved also depends how best the approximation in Eq. (87) holds or how best, in practice, the chromatic and other aberrations can be reduced. The extent of chromatic aberration is indicated by the deviation in the transfer map ((rI >, (p I ))(object plane) -,((r I ), (p ))(image plane) due to the energy
OPTICS OF CHARGED PARTICLES
279
spread and the small variations in the field parameters (for details see Hawkes and Kasper, 1989a, b, 1994). Throughout this chapter, we shall be concerned only with monochromatic beams, since the extension of the theory to nonmonochromatic case, based on Eq. (86), is, in principle, straightforward. The other aberrations which are to be controlled to get the image clearer are the geometrical aberrations due to the beam not being ideally paraxial. Another crucial factor limiting the resolution of the image is diffraction arising at the edge of the aperture, confining the electron beam, located in the field-free space behind the lens. B. Applications 1. Free Propagation:Diffraction
Let us consider the propagation of a monoenergetic quasiparaxial beam of particles of any single kind (or a quasiparaxial monochromatic photon beam) in free space. Now, our system is an infinite slab of free space situated perpendicular to the z-axis between the coordinates zin and zOut. The corresponding optical Hamiltonian is
as seen from Eqs. (66) and (68) by putting A = (O,O,O), +(r) = 0, and, consequently, i;: = fi: , p 2 = 0. Under the paraxial approximation,
A, = A 0 . p = -po + - P I ,- 2 2Po
retaining only terms up to second order in 3, in view of the paraxiality condition Ip I I 4 p o assumed to be valid for all particles of the beam. Then, with z(*)= zin and z(’) = zout,Eq. (75) becomes
1 @out)
= ciD,p< zout
9
Zin
I@in ),
(90)
with
,
Az =zOut- z i n ,
280
R. JAGANNATHAN AND S. A. KHAN
where the subscripts p and D indicate, respectively, paraxial approximation and drift in free space. In coordinate representation
The matrix element (r I ,outlexp{-(i/h) Az((l/2po)fi:))C I , i n ) is readily calculated and is just the well-known Green's function of a nonrelativistic free particle of mass po moving in the xy-plane and corresponding to a time interval Az (see Appendix D for details of the calculation). Then, we have GD,p(r
I .out
3
zout;
I , i n , 'in)
Now, Eq. (92) becomes + ( r I .out, zout)
which is the well-known Fresnel diffractional formula [see Hawkes and Kasper, 1994, Eq. (59.17)]; A, is the de Broglie wavelength 2rh/p0. Here, the zin-plane(the xy-plane at z = zin)is the plane of the diffracting object and the zout-plane (the xy-plane at z = zOut)is the observation plane; +(r I , i n , zin)is the wavefunction on the exit side of the diffracting object
28 1
OPTICS OF CHARGED PARTICLES
and I+(r I,out, ~,,,~)1~ gives the intensity distribution of the diffraction pattern at the observation plane. It is clear that the approximation of A, as in Eq. (89), dropping terms of order higher than second in 3, , essentially corresponds to the traditional approximation used in deriving the Fresnel diffraction formula from the general Kirchhoff s result. We can also recognize now that Eqs. (71)-(74), (78), and (79), along with Eq. (661, represent, in operator form, the general theory of charged-particle diffraction in presence of electromagnetic fields (for more details of diffraction theory, see Hawkes and Kasper, 1994, Chapters 59 and 60). Actually, the free propagation case can be treated exactly. The expression for the Hamiltonian A, given above, in Eq. (88), is an approximation obtained by quantizing the for the exact result, A,, = classical expression for -p, for a particle of momentum p o ; this exact result will be obtained in the infinite series form in our approach also if we continue the Foldy-Wouthuysen transformation process up to all orders. Hence the exact form of the Green's function is given by the matrix element of exp((i/ii) Az and it can be shown that an explicit evaluation of this matrix element is possible leading to the well-known exact scalar wave Green's function for the plane (for details see Dragt, 1995). Let us now work out the transfer maps for the expectation values of the transverse coordinates (r ,) and their conjugate momenta (p ,) in the case of free propagation. Let a particle of the input beam be associated with a wavepacket +(r I ,z ) having (r ,)in = (r I )(zi,) and (p ,)in = (p ,k i n ) as the average values of the transverse coordinates and momenta at the 2,-plane. In the geometrical optics picture the particle corresponds to a ray intersecting the z,,-plane at the point r ,= (r ,)in with the gradient dr I /dz = p / p , = p ,/ p o = (p ,>in/po;the paraxiality condition is seen to be dx/& 4 1, dy/dz 4 1. From the formula of Eq. (83), and using Eqs. (76) and (77), we get
d m ,
i n )
,
(r,
(P,
)out =
)O"t
( r , )(zout)
-
( q i n l e ( i / h ) A ~ f i or I e - ( i / h ) A z f i ,
=
(P,
kl,
(96)
l+in),
(97)
)(tout)
- (+inle(i/fi)Azk$ with
[+in),
,e - ( i / f i ) A z f i ,
as given in Eq. (88). Using the relation
282
R. JAGANNATHAN AND S. A. KHAN
valid for any pair of operators
(a,61, we have from Eqs. (96) and (97)
confirming the rectilinear propagation law for the free ray. In matrix form, we have
giving the familiar transfer matrix for free propagation in terms of the traditional ray variables (r ,dr I /&I. Taking p z = p o , we can write
giving the transfer matrix in terms of the canonical phase-space variables (r ,p I) [see Ximen (1991) for a treatment of geometrical electron optics using the canonical variables (r ,p I)I. 2. The Axially Symmetric Magnetic Lens: Electron Optical Imaging The axially symmetric magnetic lens, or the round magnetic lens, is the central part of any electron microscope. It comprises an axially symmetric magnetic field and there is no electric field. In practice, the round magnetic lenses of electron microscopy are convergent lenses. The four vector potential of any round magnetic lens can be taken as +(r)
=
0,
283
OPTICS OF CHARGED PARTICLES
with
1 B""(z ) - ... (105) 8 192 where B o ( z ) = B ( z ) , B Y z ) = dB(z)/dz, B"(z) = d Z B ( z ) / d z 2 ,B"'(z) = d3B(z)/dz3,B""(z)= d4E(z)/dz4,., . , and, in general, B(2")(z)= d(2n)B(z)/dz(2") (for more details, see Hawkes and Kasper, 1989a, b). The corresponding magnetic field, rotationally symmetric with respect to the optic axis of the system (z-axis), is 1
= B( z )
B,
=B(z) -
- -r: B"( z ) + - r t
1
-riB"(z) 4
1 + -r:B""(z) 64
9
-
... ,
( 106)
as given by B = V X A; it may be noted that the function B ( z ) completely characterizes the field. In general, we shall use the notation f ' ( z ) = df(z)/dz, f"(z) = d2f(z)/dz2,..., for any f(z). To understand the single-stage imaging by the simple round magnetic lens, we shall first use the paraxial approximation and, later, look at the aberrations due to deviations from such an ideal situation. The paraxial approximation entails dropping from the Hamiltonian terms of order higher than second in 18 I I/po, in view of the condition Jp,I/po 4 1, and approximating the magnetic field by
(:
1 B = Bo = - - x B ' ( z ) , - - y2 B ' ( z ) , B ( z ) ) ,
( 107)
keeping only the leading terms in the expressions [Eq. (10611 for its components, considering that only the lowest order terms in r , would contribute to the effective field felt by the particles of the paraxial beam; the subscript 0 indicates that only the first term (n = 0) in Eq. (105) has been retained in this approximation. The corresponding approximation of the vector potential is
The practical boundaries of the lens, say, z , and z , , are determined by where the function B ( z ) becomes negligible, i.e., B ( z < z , ) = 0 and B ( z > z , ) = 0; for the lens system zin= zI and zOut= z,.
284
R. JAGANNATHAN AND S. A. KHAN
We are now concerned with a monoenergetic quasiparaxial beam (with IpI =pa for all its constituent charged particles, electrons in the case of
electron microscopy) illuminating the specimen (object to be imaged), being scattered elastically and going through a size-limiting aperture. The wavefunction of the beam exiting from the aperture is our Jlin, and will be the final image wavefunction. In this case lset us denote in and out, respectively, by “0” and “i” meaning object and image. So, Jlin at the exit plane of the aperture, z,-plane, will be denoted by Jl(r I ,, z , ) and Jlobbut, or the image wavefunction, at the zi-plane (to be determined) will be denoted by Jl(r I , i , zi). We shall assume the lens field to be such that a picture of our complete system is: object wavefunction (at 2,)-free space-lens (I, s z s 2,)-free space-image wavefunction (at zi). The corresponding optical Hamiltonian, under the paraxial approximation, is obtained by substituting +(r) = 0 (or 3’ = 0 ) and the above expression for A [Eq. (108)l in Eq. (66) and retaining only terms of order up to two in (r I ,$ The resulting Hamiltonian is: A
.
=
-Po
7
+ - a’, + -1q 2 B 2 ( z ) r :
2 Po ih 1
- 4B(z)iz)
4
+-(4p;
- ZB(z)B’(z)r:
0 -0
in the lens region ( z , I z I z r ) , forr
z,,
2q
- qB’(z)i,
where #
’(’)(
(110)
and
i,= x&
(111) - yax is the z-component of the angular momentum operator i = r X fi. It is obvious that when we compute the transfer operator y= exp(-(i/fL)T) [Eqs. (73) and (74)], using the above Hamiltonian [Eq. (10911, the third (nonhermitian) part l/pi and the fourth (4ermitian) part l/pi will not contribute to the leading term, /2 dz%,p, since these parts contain only terms which are proportional to B ’ ( z )or ( B 2 Y ( z ) and B ( z ) has been take? to vanish at z , and zi. By the same argument, their contributions to 9 from the commutator terms in Eq. (74) can be expected to be quite small when we compute the transfer operator from a plane at z < z, to a plane at z > z,. Thus, we shall take the Hamiltonian
-
-
285
OPTICS OF CHARGED PARTICLES
of the system to be
with B ( z ) being norzero only in the lens region (z, Iz Izr). Another way to see that the terms l/pi and l/pi in the above*,, may be dropped is to observe that these are proportional to changes in B ( z ) or B 2 ( z )over a distance of the order of only a de Broglie wavelength. With the notations
-
A
-0,
-
1
P =
let us write the Hamiltonian
-b;
2Po
1
+ ~ P o F ( Z ) ‘2 , ,
A,,, as A
-
-Po - e ‘ ( z ) L , +
wo,p’ A
(116)
Note that if z represents time is the Hamiltonian of the isotropic harmonic oscillator in the xy-plane with mass p o and time-dependent circular frequency Thus, modulo the constant ( - p o l and the rotation (- O’(z)L,) terms, the optical Hamiltonian governing the evolution of the paraxial beam along the z-axis is like the Hamiltonian of a two-dimensional isotropic harmonic oscillator with time-dependent frequency. This connection between beam optics and harmonic oscillator is of course well known [e.g., see Agarwal and Simon (19941, in the context of light optics, and Dattoli et al. (1993), in the context of charged-particle optics]. Since A , is hermitian, the corresponding transfer operator will be unitary as in Eqs. (76) and (77). Consequently, in this case, the total intensity of the beam at any plane is a conserved quantity:
dm.
j c ~ r ~ + ,(2 r1 1~ = ~ constant.
( 117)
Let us now consider the transfer operator fip(z,z,) for z > z;, ultimately our interest will be to d5termine the z,-plan$, wher? the image is formed, and the corresponding Up(z,,2,). Since p o , L,, and W,, commute with each other, we can write
fip(z,zo)
= e(i/h)po(z-z,)e(;/fi)e(z.r,)i,fi - p ( Z ,
Z0)Y
(118)
286
R. JAGANNATHAN AND S. A. KHAN
where e(z, 2,)
=
/Ldz e y z )
(119)
Z"
and dP(z,z , ) = exp( -Ji/fL)$,(z, z,)), the transfer operator correspondilng to the Hamiltonian is to be computed using Eq. (77) with W, replaced by I&,. Then, + ( r , , z ) = (d2r0Gp(r1
,
~
~
~
I
,
O
~
~
O
~
+
(120) ~ ~ I
with the Green's function Gp(r, J;rL*o9zo) =
(r I 1fiP< z ,zo Ir I ,,)
=
e(i/*)po(z-zo)/d2f(rI
~e(i/*)e(z,zo)izls~
~ $ ( z ,zo)lrI ,,).
(121)
First, let us note that (r I le(i/*)e(z*zo)Lzl? I ), the matrix element of the operator for rotation around the z-axis through an angle O(z, z,), is given by (see Appendix E for details) ( r I l ~ ( i / * ) e ( z , z o ) L ~=~S2(rI ~I) (e)
(122)
- fI),
where I',
( e < z , z , > )=
( ~ ( ~ ( ~ , Z O ) ) , Y ( ~ ( Z , Z O ) ) )
.
(123)
Substituting the result of Eq. (122) in Eq. (1211, we get Gp(rI ,z ; r ,,, z , )
=
e ( i ~ ' ) p o ( z - z o )I ~p (e(( rz , z , ) ) , z ; r I ,,, z , ) , ( 124)
where Gp(r * ( e( z z , )) z ; 9
9
I .o 9 2,)
=
(r I ( e( Z,Z,) )lop( z
9
2,)
Ir I ,o ) (125)
is the Green's-function corresponding to the time-dependent-oscillator-like Hamiltonian No, p. The exact expressions for the transfer operator op(z, 2,) and the Green's function Gp(rI (O(z,z,)), z ; r I ,,, z,), which, in fact, take into account all the terms in the infinite series in Eq. (771, can be written down. We shall
,
O
~
~
OPTICS OF CHARGED PARTICLES
287
closely follow the prescription by Wolf (19811, which can be used for getting the evolution operator and the corresponding Green’s function for any system with a time-dependent Hamiltonian quadratic in (r I ,fi ) (see Appendix F for some details). This is possible because of the Lie algebraic structure generated by the operators {r: ,at ,r -fiI + fi I -r I). The results in this case are:
if hp(z , z o ) # 0, (127) Gp(rJ.
where
(e(z’z0))~z;rI,09z0)
288
R. JAGANNATHAN AND
S. A. KHAN
with gp(z, z , ) and hp(z,z,) as two linearly independent solutions of either ( x and y) component of the equation r", ( z ) satisfying the initial conditions
+ F( z)r
g p ( z o , z o )= N p ( Z o J o ) and the relation
1,
=
I
( z ) = 0,
hp(Zo,Zo)=g;(Zo,Zo)
( 130) =
0, (131)
for any z 2 2., (132) gp(z, z,)Mp(z, 2,) - h p ( z ,z,)g;(z, 2,) = 1, As we shall see soon, Eq. (130) is the classical paraxial ray equation for the beam, modulo a rotation about the z-axis; with z interpreted as time, Eq. (130) is the equation of motion for the classical system associated with the p , namely, the isotropic two-dimensional harmonic oscillaHamiltonian tor with the time-dependent circular frequency Now, from Eqs. (120), (1241, (127), and (1281, it follows that
h0,
*(rl
dm.
9 2 )
if hp(z , 2 , )
f
0, (133)
if hp(z , z,)
=
0, (134)
representing the well-known general law of propagation of the paraxial beam wavefunction in the case of a round magnetic lens (Glaser, 1952, 1956; Glaser and Schiske, 1953). Equation (133) is precisely same as Eq. (58.42) of Hawkes and Kasper (1994) except for the inclusion of the extra phase factor e i 2 " ( z - z ~and ) ~ Athe ~ Larmor rotation factor in the final z-plane; these extra factors would not appear if we remove the axial phase
289
OPTICS OF CHARGED PARTICLES
factor in the beginning itself and introduce a rotated coordinate frame as is usually done. We shall not elaborate on the well-konwn practical uses of the general propagation law [Eq. (13311: it may just be mentioned that Eq. (133) is the basis for the development of Fourier transform techniques in the theory of electron optical imaging process (for details, see Hawkes and Kasper, 1994). As is clear from Eq. (1341, if h p ( z , z o )vanishes at, say, z = zi, i.e., hp(zi,z,) = 0, then we can write 1
.
+ ( r l , i , z i ) = -e'Yo(ZI*zo)+(rI,i(s)/M,z,), M withM=gp(zi,z,),6= e(zi,zo), 2lr ~ o ( z i , z o )= -[(zi
-20)
+gb(zi,~o)r:,i/2~],
(135)
A0
1
I+(rl,i,zi)12 =
zI+(r, ,i(s)/M,Zo)l
2
.
( 136)
This demonstrates that the plane at zi, where h p ( z , z , ) vanishes, is the image plane and the intensity distribution at the object plane is reproduced exactly at the image plane with the magnification M = gp(zi,z , ) [see Eq. (58.41) of Hawkes and Kasper, 19941 and the rotation through an angle
a = e(zi,z,)
=
1d z e y z ) zi
20
As is well known, the general phenomenon of Larmor precession of any charges particle in a magnetic field is responsible for this image rotation obtained in a single-stage electron optical imaging using a round magnetic lens. It may also be noted that the total intensity is conserved: obviously,
290
R. JAGANNATHAN AND
S. A. KHAN
We shall assume the strength of the lens field, or the value of ~ ( z )to , be such that the first zero of hp(z, zo) is at z = zi > z,. Then, as we shall see below, M is negative as should be in the case of a convergent lens forming a real inverted image. So far, we have looked at imaging by paraxial beam from the point of view of the Schrodinger picture. Let us now look at this single-stage Gaussian imaging using the Heisenberg picture, i.e., through the transfer maps ((r )(zo), (p )(zo)) + ((r )(z), ( p )(z)). Using Eqs. (83), (98), (118), and (126), we get
,
(I. )(z)
,
,
,
(*(z,)lfi;(z, z0).,4(z, z0)l@(zo)) zo)lfi;(z, = <*( - zo)e-(i/we(z.zo)L*r
=
~.)i,c -p( z, o)l*(zo))
x e(i/~e(z. - (~
-
(z*z~)r~
( z o ) ~ e - ( i / ~ ) ~ ( z . ~ o ) ~ z ~ ~
x &(I, s ) e( i / h)ecz, z,)izl J,(,~) ) =
(Ik(zo)le-(i/fi)e(z. z 0 ) L
x (gp(z, zo)r, +hp(z, zo)$ , / p o ) e ~ i ~ ~ ~ e ~ z ~ z ~ ~ L =
(*(zo)lgp(z,zo)r* (-@(z,zo))
+ hp(z, zo)B* ( - e ( z ) ) / ~ o J * ( z o ) ) = gp(G zo)(r,
( - e ( z *zo)))(zo)
+ h p ( G ~ O ) ( P *( - ~ ( z ~ ~ o ) ) ) ( ~ o ) / ~ o ~ (139) with
< r , ( - e(z9 zo)))(zo) = (COS e ( z , z,)(x>(z,) -sin
+ sin e ( z , zo)(y)(zo), e(z, zo)(x)(zo) + cos e(z, zo)(y)(zo)),
(P* ( - e ( z * zo)))(zo) = (COS e ( z , ~ ~ ) < p , > (+ z sin ~ ) e ( z , zo)(p,)(z0), -sin e(z, zo)(px)(zo) + cos e ( z , z,)( py)(z0)). (140) Similarly, we have
(P, )(z)
= pogb(z9 zo)(r.
(-e(z9 zo)))(zo)
+ h'p(z,zO)(~l(-e(z,zO)))(zO)'
(I41)
29 1
OPTICS OF CHARGED PARTICLES
At the image plane at z = zi, where hp(zi,z,) Eqs. (139) and (141) becomes
=
0, the transfer map in
(r, >(zi) = M ( r , ( - 6 ) ) ( z , ) ,
(P, )(zi) =PogL(zi,zo)(r, (-'))(zo)
+
(P, (-'))(zo)/MS
(142)
where 6 is given in Eq. (137), M = gp(zi,z,), and 1/M = h',(zi, z,) [see Eq. (132)l. The content of Eq. (142) is essentially the same as that of Eq. (136); i.e., at the image plane a point-to-point, or stigmatic, image of the object is obtained and the image is magnified M times and rotated through an angle 6. It may also be verified directly that, as implied by Eq. (142), (r, )(zi)
=
/d2ri r1,iI#(rl,i,zi)12
=
- /d2ri r , , i l # ( r ~ , i ( 6 ) / M , z ~ ) 1 2
1
M2
= M/d2ro
r l ,,( -6)I#(r*
,o*
z,)12
= M(r. ( -*))(zo). (143) Let us now see how (r )(z) and (p )(z) evolve along the z-axis. Since
,
d
i ,
A
- Up(2, 2,) dZ
a ,
=
- ~WOUP(Z z o.) ,
dZ U ,'( 2, 2,) =
it follows that d -(r I >(2)
dz
d
-(PI dz
=
1
,
nU,'( 2, z,)A,,
x(i + ( z , ) ~ f i ~ ( 2,)z , [Qo,p,r ,] fip(z9z,)I+(z,)), 1
(144)
(145)
> ( z ) = -fi ~ + ~ ~ o ) l f i , + ~ ~ , ~ o ~ [ Q , , p ~ ~ l (146) ]fip~~~~o
Explicitly, these equations of motion, Eqs. (145) and (146), become
292
R. JAGANNATHAN AND
T(Z) =
P(Z) =
I
0 0 -F(z) 0
-eyz) 0
S. A. KHAN
1 0 0 0
0 0 0 -F(z)
0 1 0 0
o
0
0
0
0
ep(z)
(149) *
where
-
-sin O( z , 2,) 0
cos f3( z , z , )
0
0
0
cos e( z , z , )
sin e( z , z , )
’
OPTICS OF CHARGED PARTICLES
293
If we now go to a rotated coordiante system such that we can write
with ( X , Y ) and (Px, P y ) respectively as the components of position and momentum in the new coordinate frame, then Eq. (150) takes the form
Note that xyz and XYz frames coincide at z = z o .Then, the equations of motion for ( R , ) ( z ) = ( ( X ) ( z ) ,( Y ) ( z ) ) and (P, ) ( z > = ((Px)(z), ( P y ) ( z ) )become
From Eq. (156) it follows that
which represent the paraxial equations of motion with reference to the rotated coordinate frame; now, compare Eq. (158) with Eq. (130). Equation (159) is not independent of Eq. (158) since it is just the consequence of the relation ( d / d z ) ( R ,) ( z ) = (P, ) ( z ) / p o [see Eq. (15611, and a solution for (R ,) ( z ) yields a solution for (P, ) ( z ) .
294
R. JAGANNATHAN A N D S. A. KHAN
Equation (156) suggests that, due to its linearity, we can write its solution, in general, as
where, as already mentioned above, the second relation follows from the first assumption in view of the first relation of Eq. (1561, namely, ( d / d r ) ( R , ) ( z ) = (P, )(z)/po.Substituting the first relation of Eq. (160) in Eq. (158) it follows from the independence of (R ) ( z , ) and (P, ) ( z , ) that
,
g;(zJo)
+ F ( z ) g , ( z , z , ) = 0,
h”,z,zo) + F ( z ) h , ( z , z , ) = 0
(161)
Since at z = z , the matrix in Eq. (160) should become identity we get the initial conditions for gp(z,z,) and hp(z, 2,) as
gp(z,,z,) = ~ p ( z o , z o=) 1,
=g6(z0,z,)
hp(Z,,Z,)
=
0. (162)
In other words, g ( z , z , ) and h,(z, z,) are two linearly independent solutions of either (X or Y)component of Eq. (158) subject to the initial conditions in Eq. (162). From the constancy of the Wronskian of any pair of independent solutions of a second order differential equation of the type in Eq. (158) we get g p b ZO)h‘,(ZY
20)
-h
p b
= gp(z0, Zo)h’p(zo, 2,)
zo)gb(z,
20)
- hp(Z0,
Z O k b ( Z 0 , 20) =
1,
for any z 2 zo. (163) Thus, it is seen that the solutions of Eq. (1301, g,(z, z , ) and h,(z, z,), contained in Eqs. (126)-(128), (1331, and (134) can be obtained by integrating Eq. (156). Note that we can formally integrate Eq. (156) by applying the formula in Eq. (72) in view of the analogy between Eq. (65) and Eq. (156): the amtrix in Eq. (160) can be obtained using Eq. (72) by replacing (-(i/h)$,) by the matrix in Eq. (156). The result obtained gives g,(z, 2,) and h,(z, 2), as infinite series expressions in terms of F ( z ) . Then, with
OPTICS OF CHARGED PARTICLES
295
and 9 ( z , z,) as given by Eq. (153), Eq. (150) is seen to be the matrix form of Eqs. (139)-(141). This establishes the correspondence between the transfer operators in the Schrodinger picture and the transfer matrices in the Heisenberg picture: e(i/fi)Wz9zo)Lz
fip(z, z o )
+9( z , z , ) ,
= e(i/fi)e(z.4zfi-p(G =
~
(
~
,
~
o
)
20)
~
Explicitly, g,(
2, 2,) =
1 - /z&l 20
/"'&F( 20
-
@p(zJo) - , . % ( Z 7 Z O ) ,
2)
s,(2 9 2 0 )
=( z~& z, , ~) ao( z)J , ) .
(165)
296
R. JAGANNATHAN AND S. A. KHAN
x / z 3 d z z jz2dz1F( zl)( 20
2,
-zo)}+
... .
(169)
zo
It is easy to verify directly that these expressions for gp(z, z,) and hp(z, 2,) satisfy Eqs. (161) and (162). The transfer operator defined by Eqs. (71)-(74) [or Eqs. (75)-(77)1 is an ordered product of the transfer operators for successive infinitesimal distances from the initial point to the final point, an expression of the Huygens principle in operator form. Hence it can be written as an ordered product of transfer operators for successive finite distances covering the entire distance from the initial point to the final point. Thus, we can write cp(Z,Zo)
c
> z,,
= ~ ~ , p ( ~ , ~ r ) ~ ~ , p ( ~ r , z ~ ) ~ D , pwith ( ~ ] 2,, ~ o Z) ,I , ~
(170)
where D refers to the drift in the field-free space and L refers to the propagation through the lens field. Consequently, one has
$(rl
9 2 )
= / ~ 2 ~ r / ~ z ~ ~ / ~ , 2z ;~r ~ o , ~, ~~ z ,1 )p ( ~ ~
x GL.p(r I , I f 21;
x
I , I , 21)
, 1+(r I ,o
~ D , p (1 r , I , 21; r I ,n z o
9
zo)
*
(171)
297
OPTICS OF CHARGED PARTICLES
Using the direct product notation for matrices,
where
a= e(t,z,)
=
e(Zl,rr).
Since F ( z ) = 0 outside the lens region, we have, from Eqs. (166)-(169),
with
Il as the 2 x 2 identity matrix. For the lens region
=
(gp*L gP,L
"IL)
h'P,L
8
R(6),
(175)
with g , , = gp(zr,zI), h , , = hp(zr,q), g;,, = g$z, Z ~ ) I ~ = ~lip,, ,, = h',(z, Z ~ ) I ~ = ~ , Then, . substituting Eqs. (17314175) in Eq. (170) we get the
298
R. JAGANNATHAN AND S. A. KHAN
identity
If we now substitute
then Eq. (177) becomes the familiar lens equation
u1 + ,1= T 1,
(179)
with the focal length f given by
Equation (178) shows that the principal planes from which the object distance ( u ) and the image distance ( u ) are to be measured in the case of a thick lens are situated at
OPTICS OF CHARGED PARTICLES
299
The explicit expression for the focal length is now obtained from Eqs. (168) and (180):
To understand the behavior of this expression [Eq. (18211 for the focal length, let us consider the idealized model in which B ( z ) = B = constant in the lens region and 0 outside. Then l/f = (qB/2p0) sin(qBw/2po) where w = (zr - z , ) is the width, or thickness, of the lens. This shows that the focal length is always nonnegative to start with and is then periodic with respect to the variation of the field strength. Thus, the round magnetic lens is convergent up to a certain strength of the field beyond which it belongs to the class of divergent lenses, although this terminology is never used due to the fact that the divergent character is really the result of very strong convergence (see Hawkes and Kasper, 1989a, p. 229). In practice, the common round magnetic lenses used in electron microscopy are convergent. The paraxial transfer matrix from the object plane to the image plane now takes the form
as is seen by simplifying Eq. (176) for z = zi using Eqs. (177)-(180). Note that in our notation both u and u are positive and M is negative, indicating the inverted nature of the image, as should be in the case of imaging by a convergent lens. Another observation is in order. When the object is moved to -03, i.e., u + w, u is just f. Hence, the focus is situated at zF = zPi + f = zr -k fgp,*.I (184) Now, with the object situated at any z , < z , the transfer matrix from the object plane to the back focal plane becomes (185) as is seen by substituting z
= zF
in Eq. (176) and simplifying using Eqs.
300
R. JAGANNATHAN AND S. A. KHAN
(178), (180), and (184). The corresponding wave transfer relation in Eq. (133) shows that, apart from unimportant phase factor and constant multiplicative factor, the wavefunction in the back focal plane is equal to an inverse Fourier transform of the object wavefunction at z , < z , (see Hawkes and Kasper, 1994, pp. 1248-1249 for more details). Let us now consider the lens field to be weak such that l:'dzF(z)
4
w = 2, - 2,.
l/w,
( 186)
I
Note that 1: d z F ( z ) has the dimension of reciprocal length and for the weak lens it is considered to be very small compared to the reciprocal of the characteristic length of the lens, namely, its width. In such a case, the formula for the focal length [Eq. (182)] can be approximated to give 1
- = /:"F(
q2
q2
1
7 dzB2(Z ) (187) 4Po Z I 4Po --m f l which, first derived by Busch (1927), is known as Busch's formula for a thin axially symmetric magnetic lens (see Hawkes and Kasper, 1989a, Chapters 16 and 17 for details of the classical theory of paraxial, or Gaussian, optics of rotationally symmetric electron lenses). A weak lens is said to be thin since in this case f*w ( 188) as seen from Eqs. (186) and (187). For the thin lens the transfer matrix can be approximated as Z) = 7 /"dzB2( z ) =
1 1 - - ( z - zp)
f
1
--
f
1
--(zp - z,)(z
f
1 1 - -(
f
1
- zp)
Zp - z o )
1 with zp = ~ ( z+ ,2,). (189)
In this case the two principal planes collapse into a single principal plane at the center of the lens. If imaging occurs at z = zi for a given zo then u = zp - z , and u = zi - zp satisfy the lens equation l/u + l / u = l/f
OPTICS OF CHARGED PARTICLES
301
and the transfer matrix from the object plane to the image plane becomes
( -y/f l;M)
8 R(6), with
M
=
-v/u.
From the structure of the
transfer matrix in Eq. (189) it is clear that apart from rotation and drifts through field-free regions in the front and back of the lens the effect of a thin lens is essentially described by the transfer matrix
(-bf
:)
which, as seen from Eq. (1341, corresponds to multiplication of the wavefunction by the phase factor exp(-(i?r/A,f)r:) as is well known. As has been emphasized by Hawkes and Kasper (1994) (see pp. 1249-12501, although the attractive paraxial theory is in full agreement with the corresponding classical corpuscular theory it is certainly wrong, since the inevitable lens aberrations and all diffraction at the beam-confining apertures are neglected. Let us now look at the aberrations due to the beam not being ideally paraxial.ATo this end, we shall treat the nonparaxial teArmsin the Hamiltonian W, as perturbations to the paraxial Hamiltonian W , and use the well-known technique of time-dependent perturbation theory in quantum mechanics utilizing the so-called interaction picture. In the classical limit this treatment tends to the similar approach pioneered by Dragt et al. (e.g., see Dragt and Forest, 1986; Dragt et al. 1988; Rangarajan et al., 1990, Ryne and Dragt, 1991; see also Forest et al., 1989; Forest and Hirata, 1992) for dealing with the geometrical optics of aberrating charged-particle optical systems. When the beam deviates from the ideal pacaxial condition, as is usually the case in practice, we have to retain in [Eq. (66)] terms of order higher than second in (r I ,fi *). Thus, going beyond tbe paraxial approximation, the next approximation entails retaining the 3 terms of order up to fourth in (r I ,fi I ). To this end, we substitute in [Eq. (6611
<
<
4(r) = 0,
( 190)
1
,z),-xII(rI , z ) , O 2 1 withn(r, , z ) = B ( z ) - -r:B”(z), (191) 8 expand, and approximate as desired. As before we shall neglect !hose nonhermitian and hermitian terms which do not contribute to ;/ dz< for z , < z , and z > 2,. Then, we get, up to unimportant additional constant factors which give rise only to multiplicative phase factors,
302
where
R. JAGANNATHAN AND S. A. KHAN
A,,is the paraxial Hamiltonian in Eq. (112) and =
1
1
1
- T a j j ; L z - - a 2 ( $ , *r, + r , 8Po 2Po 8Po 1 + - U3' ( j 5 ; r: + r : p J + -( U" - 4U3)Lzr; 8
8Po
~/m
Po =qB(z) . (193) - a a " ) r : , with a = 8 2Po Note that A,,, is the sum of -po and a homogeneous quadratic polynois to indicate that it is a mial in (r, ,$,). The subscript (4) in homog:neous fourth-degree polynomial in (r ,,fi ) and the superscript (4) in Hi4)indicates that it contains terms of order up ,to four in (r ,,$ ). Absence of odd-degree homogeneous polynomials in W, is a consequence of the rotational symmetry around the z-axis. Now, the z-evolution equation for the system is +-(a4
,
,
Let
I+@)
=
fip(z,zo)l+l(z))
so that Eq. (194) becomes
From Eqs. (192) and (194141961, we get d
ih -I+'(Z)) dZ
= @,,(4)~#1(
z)),
with Afi,(4) =
f i J,Z(p(')4(z,0'),0z
' O ) ,
where the superscript I denotes the so-called interaction picture. Integrating Eq. (1971, we have I+'(Z))
=
q:)(z, zo)l+l(zo))= q:)(z, zo)l+(zo))9
(195)
OPTICS OF CHARGED PARTICLES
with 4:)('7'0)
-;
eq(
/zdzAb,(4)(z)). zo
303
(200)
where we have disregarded all the commutator terms in the formula for fi [Eqs. (75)-(76)] since they lead to polynomials of degree higher than four in (rL ,B,>. Using the result
[see Eqs. (139)-(141)1, with fip = fip(z,zo), g = gp(z7z o ) , h = h&z, Zo)? g' = g&, z0), h' = h&, zo), and 8 = O(z, z,), and Eqs. (193) and (198), we find, after considerable but straightforward algebra, that
304
R. JAGANNATHAN AND S. A. KHAN
where { A ,B }
= AB
+ BA and 7/ l
C(Z,2,)
=
z
&{( a 4 - a a ” ) h 4 + 2a2h2hr2- W4},
LO
51 / z &{( a 4- aa”)gh3+ a’(gh)’hh’ + g’h’’),
K ( z , zo) =
20
[:&(
k(z,z,) =
1
A ( z , 2,)
=
5/
2
( i1 a ” -
1
&{( a 4 - a a ” ) g 2 h 2
zo
+2a2gg’hh‘ + g”h‘’ - a’), a ( z , z , ) =/‘&(($a’’ LO
1
F( z , 2,)
=
12&((
a 4-
- a 3 ) g h - ag’h’),
aa”)g2h2
zo
+ g”h” + 2 a 2 ) , 1 z = 5 / dz{( a4 - a a ” ) g 3 h + a2gg‘(gh)’+ gl’h’), +(u2(g2h’2 + g”h2)
D( z , z , )
=0
E( Z , 2,)
=
1 z 5 / &{( a 4 - a a ” ) g 4 + 2ag2g’’ + g’4}.
(203)
LO
From Eqs. (195) and (199), we have
I$(+
~p(~,z,)q:)(z,z,)I$(z,)~ (204) which represents the generalization of the paraxial propagation law in Eq. (1331, corresponding to the inclusion of the lowest order aberrations. Now the transfer map becomes (L’L
=
)(3)(Z) =
(q:ppk fipq:))(zo), (@C:$ cpqi))(
(205)
(p 1 )(3)( 2) = I 2,) (206) with = 4 : ) ( z ,z,) and ( * - - ) ( z , ) = ($(z,)l-.I$(zJ). The subscript (3) indicates that the correction to the paraxial (or first order) result
4;)
OPTICS OF CHARGED PARTICLES
305
incorporated involves up to third-order polynomials in (r I ,$ I). Explicitly, (X)(3)(4
where the geometrical aberrations, or the deviations from the paraxial
involving expectation values of homogeneous third-order polynomials in (rI ,$,). Hence the subscript (3) for ( A x ) ( ~ ) ( z )( ,A Y ) ( ~ ) ( z )etc., , and the
306
R. J A G A N N A M AND S. A. KHAN
name third-order aberrations, Note that, here, we are retaining only the single coymutator terms in the application of the formula in Eq. (98) to compute V,$&, qi;yqi), etc., since the remaining multiple commutator terms lead to polynomials in (r ,,fi I 1 which are only of degree 2 5 and are to be ignored in order to be consistent with the fact that we have retained only terms up,fo fourth order in (r ,,fi I 1 in the Hamiltonian and the transfer operator U. Obviously, the plane at which the influence of aberrations is to be known is the image plane at z = zi:
where
308
R. JAGANNATHAN AND S. A. KHAN
A =A(zi,z0),
Q
=Q(Z,,Z,), F =F(zi,z0),
D
d
= d( z i ,z 0 ) ,
= D( z i ,z 0 ) ,
E
= E(
zi, z 0 ) .
(214)
With reference to the aberrations of position [Eqs. (210) and (211)l the constants C,, K, k , A , Q, F, D , and d are known as the aberration coefficients corresponding, respectively, to spherical aberration, coma, anisotropic coma, astigmatism, anisotropic astigmatism, curvature of field, distortion, and anisotropic distortion [see Hawkes and Kasper (1989a) for a detailed picture of the effects of these geometrical aberrations on the quality of the image and the classical methods of computation of these aberrations; see Ximen (1991) for a treatment of the classical theory of geometrical aberrations using position, momentum, and the Hamiltonian equations of motion]. The gradient aberrations [Eqs. (212) and (21311 do not affect the single-stage image but should be taken into account as the input to the next stage when the lens forms a part of a complex imaging system. It is interesting to note the following symmetry of the nine
309
OPTICS OF CHARGED PARTICLES
aberration coefficients: under the exchange g h, the coefficients transform as C,c)E, K c) D, k t)d , A t)F , and a remains invariant. To see the connection A c) F we have to use the relation gh’ - hg’ = 1. Introducing the notations u =x
+ iy,
u
= ($x
+ i&)/po,
(215)
the above transfer map [Eqs. (209)-(213)1 can be written in a compact matrix form (see Hawkes and Kasper, 1989a, Chapter 27, for the aberration matrices in the classical context) as follows:
1 0
0 1
c, ik-K
2K ia-2A
D+id
2A+ia
2k -a -a
2d
X
F id-D
K+ik\ -F
I
310
R. JAGANNATHAN AND S . A. KHAN
Let us now look at the wavefunction in the image plane. From Eq. (204) we have
x/d2ro(rI,i(g)/MI~:)(zi,Zo)Irr
x +(rI ,O)
,o)
(217)
20).
lq&,
When there are no aberrations (r I,i(S) / M zo)lrI , ) = 6’(r + , - r l , i( S ) / M ) and hence one has the stigmatic imaging as seen earlier from Eqs. (135) and (136). It is clear from Eq. (217) that when aberrations are present the resultant intensity distribution in the image plane will represent only a blurred and distorted version of the image. Usually, is approximated by keeping only the most dominant aberration term, namely the spherical aberration term, which is independent of the position of the object point [see Eqs. (210) and (211)l. An important result to be recalled here in this connection is the celebrated Scherzer theorem (Scherzer, 1936) which shows that the spherical aberration coefficient C, is always positive and cannot be reduced below some minimum value governed by practical limitations [to obtain this result from the expression for C given in Eq. (2031, see Dragt and Forest (198611. Attempts to correct this aberration have a long history since Scherzer’s theorem and there seems to be much to be achieved yet in this direction (see Hawkes and Kasper, 1989a, b, 1994). Let us also note that, in practice, there are further modifications required to be incorporated in the general propagation law [Eq. (204)] (for details of practical transfer theory, using Fourier transform techniques, and aspects of the influence of diffraction and
4:)
OPTICS OF CHARGED PARTICLES
31 1
aberrations on resolution in electron microscopy, see Hawkes and Kasper, 1994, Chaper 65). For example, one has to take into account the following aspects: (i) the specimen may not be exactly in the plane conjugate to the (fixed) image plane so that a drift factor (&,) of the type in Eq. (91) with a suitable Az (known as defocus; Scherzer, 1949) will have to be considered in defining the actual object wavefunction, and (ii) the diffraction by the beam-confining aperture behind the lens. Now, we have to emphasize an important aspect of the aberrations as revealed by the quantum theory in contrast to the classical theory. We have identified the quantum mechanical expectation values ( r ,) ( z ) and (p I ) ( z ) / p , as the classical ray variables associated with position and gradient of the ray intersecting the z-plane. Then, with the expressions for the various aberration coefficients being the same as their respective classical expressions (of course, under the approximations considered), Eqs. (210)-(213) correspond exactly to the classical expressions for aberrations of position and gradient provided we can replace (fix$; ), ({fix,$ 2, - r , + r , .$,I), < ( x , i ” , ) , etc., respectively, by ( ~ , ) ( ( p , + ) ~( p , ) ), 4((~>(p,> +~( y ) ( p , ) ( p y ) ) , ~ ( ( X ) ( ( P , +) ~(p,,)’)), etc. But, that cannot beAdone. In quactum mechanics, in general, for any observable 0, (+lf(O)l+) =f((+lO)l+)) only when the state I+) is an eigenstate of 0 and, for any two observables, say 0, and O,, only when the state I +) is a simultaneous eigenstate of both 0, and 0, can we have (i,hlf(6,,6,>1+) =f((+16,1+),(+(1621*)). It is thus clear that for the wavepackets involved in electron optical imaging the above-mentioned replacement is not allowed. As a result we see that the aberrations depend not on& on ( r I ) and (p ) but also on the higher order central moments of the wavepackets. Thus, for example, contrary to the classical wisdom, coma, astigmatism, etc. cannot vanish when the object point lies on the axis. As an illustration, ( ( r . ,B:}Xz,), one of the terms contributing to consider the term coma [see Eqs. (210) and (211)] which, being linear in position, is the dominant aberration next to the spherical aberration. The corresponding classical term, ((dX/dzl2 + ( d y / d z ) 2 ) r , at z,, vanishes obviously for an object point on the axis. But, for a quantum wavepacket with (r I ) ( z , ) = (0,O) the value of ( ( r ,fi:})(z,) need not be zero since it is not linear in (r, )(zo). More explicitly, we can write, with S r , = r , - ( r ,) and
,
N
,
a, B, =
-(P,),
312
R. JAGANNATHAN AND S . A. KHAN
=
2(r, > ( z O ) < p > l (z0)’ + 2(r1 ) ( Z o ) ( ( W 2
+((ar,
+ 2(ISr,
,(sax)’ 9
+
+
(vy)’)(~o)
(VY)’})GO)
~ ~ J ( ~ o ) ( P A ~ o )
+ 2({% w y } ) ( ~ o ) ( P y ) ( ~ o ) ~ (218) showing clearly that this coma term is not necessarily zero for an object point on the axis, i.e., when (r )(zo) = (0,O).Equation (218) also shows how this coma term for off-axis points ((r ) ( z o )# (0, )) also depends on the higher order central moments of the wavepacket besides the position ((r I ) ( z o ) )and the slope ((p )(zo)/p,,) of the corresponding classical ray. When an aperture is introduced in the path of the beam to limit the transverse momentum spread one will be introducing un3
,
d m ,
Jm)
Ay = certainties in position coordinates (Ax = and hence the corres onding momentum uncertainties (Ap, = Apy = in accordance with Heisenberg’s uncertainty principle, and this would influence the aberrations. However, the schemes for corrections of the aberrations may not be affected very much since these schemes depend only on the matching of the aberration coefficients and the quantum mechanical expressions for these coefficients turn out to be, under the approximations considered, the same as the classical expressions. Before closing, we have to consider a few other points: If we go beyond the approximation in Eq. (66) to include higher order terms in then, in general, we will have
dm’,
d ~ ) ,
2,
n
OPTICS OF CHARGED PARTICLES
313
where ~ & ] ( Z , Z , ) is to be calculated using the formula in Eqs. (76) and (77) keeping in the corresponding ?&&, z,) only terms of order up to 2n in (r I ,fi ,). Using Eq. (2201, aberration beyond the third order can be computed following the same procedure used above for studying the third-order aberrations. Here again, in the application of Eq. (98) to calculate the transfer maps for r I and @ I [Eqs. (205) and (206)l the series of commutators on the right-hand side of Eq. (98) should be truncated in such a way that only terms of order up to (2n - 1) in (rI ,$,I are retained. Comments on the effect! of the hermitian and nonhermitian terms dropped from to obtain W, in Eq. (192) [or in general in Eq. (219)] are in order. The hermitian terms we have been dropping are terms of nonclassical origin proportional to powers of A, such that they will vanish in the geometrical optics (or classical) limit when we make the replacments fi I = - ihV, + p , and A, + 0. Under the approximation considered above the terms dropped are
8
where the superscript (A,) indicates the explicit A, dependence. Taking into account the influence of the above terms [Eqs. (221) and (222)] is straightforward. Note that A:?) is a paraxial term and should be added to A , while computing the paraxial transfer operator $ ( z , z,)JEq. (126)]. Using the prescription outlined in Appendix F, one gets for gp(z,z , ) the same expression as in Eq. (126), but having (g$z, z o ) - Aig,(z, z,)F’(z)/ 161~’)and (h’’(z, 2,) - Aih,(z, zo)F’(z)/l6.rr2) instead of g$z, z , ) and h’,,(z,z,), respectively, and, with g,(z, z , ) and h,(z, z , ) satisfying the modified paraxial equation
r’y
+
(
F(z)
-
A: 4 F”(z)- -
16w2 2561~~ replacing Eq.(1301, and the initial conditions
314
R. JAGANNATHAN AND S. A. KHAN
The relation g,h’, - h,gl, = 1 is true at all z as before. Consequently, the paraxial properties of the system are slightly modified and the changes are easily computed. Since the additions are proportional to powers of A, they are essentially small compared to the clas$cal parts and vanish in, the geometrical optics limit (A, + 0,.The term W?$ has to be added to W0,(4) to compute the corresponding L& and this leads to the modification of the aberration coefficients. For example, the modified spherical aberration coefficient turns out to be
l’iI
c, = - 1‘dz
-
2
(ff4 -
ffff”)
z,
A4, +y ( a4a” + 3 2 ~
[ +[
+
(Y(Y’~CY”
1
+ a2a’”’’’)h4 A:
~ ( ~ C Y-Cah” X‘ ad“‘) - 7 f f 4 :4 1 6 ~
1
3 A: 2 a 2 + y ( ~ ” f ’ h2N2 ’ 32-
d 2
1
h3h’
where h = h,(z, z , ) satisfies Eqs. (223) and (224). Since the nonclassical A,-dependent contributions to are very small compared to the dominant classical part, Scherzer’s theorem would not be affected. Let us now consider the nonhermitian term
c,
5
which is really an %ntihermitianterm. Since it a paraxial term its effect will be to modify V,(z, z , ) when we add it to H,,. If we retain any such antihermitian term in the paraxial Hamiltonian the reulstant transfer operator .5$z, z , ) obtained using the formula in Eqs. (73) and (741, will, in general, have the form
315
OPTICS OF CHARGED PARTICLES
where i,(z, z , ) and ZA(z, z,) are, respectively, the hermitian and antihermitian correction terms to theAmainpart ?(z, zok it may be npted that any term of the type ( i / h ) [ A ,B ] is hermitian when both A and k are hermitian or antihermitian and such a term is antihermitian if one of the two operators is hermitian and the other is antihermitian. When $,p(z, 2,) is used to calculate the transfer maps
(a,&
( r l >(2,)
-+
( r L>(z)
it is seen that the hermitian correction term modifies the paraxial map while the antihermitian correction term leads to an overall real scaling factor 1/( $ ~ z o ~ l e ~ ~ * i ' ~ ~ f iaffecting ~ l + ~ z othe ~ ) ,image magnification, as a consequence of nonconservation of intensity, and contributes to ,e ( - i i ~ / cand ) e(-i'~/*)'$ aberrations since the terms like e(-i'A/fi)tr e(-i'A/*)lead on expansion, respectively, to hermitian terms of the fork r + nonlinear terms in (r ,, and $ + nonlinear terms in ( r , ) only. In the present case, the term .$Ao) in Eq. (226) does not lead to any hermitian correction term (note that pd?;)(z(')), ~;;)(Z(~))I = 0 for any z(l) and d2))and its contribution to the optics is only through the antihermitian correction term affecting the conservation of intensity and adding to the aberrations. Since the effects of the &-dependent hermitian and antihermitian terms are quite small, as found above in a preliminary analysis, we in proposed that all such terms may be treated as perturbations and clubbed with the aberration terms to be dealt with using the interaction picture. In the computation of the corresponding transfer operator &-dependent terms may be retained up to any desired order of accuracy. Thus, for example, in the present case, to obtain the effects of the terms @$),A$& and 4,"$ we may replace z,) in Eq. (204) by a %r(z, z,) which is to be computed by using the formula in Eqs. (73) and (74) with = A:,,, + fi(Ao)l 0. P + @,?if +d,$)' and keeping the commutator terms up to the desired level of accuracy in terms of powers of A, and such that the resultant polynomial in (r ,$ 1 is only of order four. It should also be
-
8
qi)(z,
8
316
R. JAGANNATHAN AND S. A. KHAN
noted that the precise forms of the A,-dependent correction terms depend on the order of approximation, in terms of powers of p , , chosen to expand the Hamiltonian H in Eq. (32) to arrive at the optical Hamiltonian W, in Eq.(66). We shall not elaborate on this topic further since the calculations are straightforward [more details will be available elsewhere (Khan, 199611. 3. Some Other Examples In this section, we consider a few other examples of the application of the general formalism of the scalar theory of charged-particle wave optics. The examples we shall treat briefly are the magnetic quadrupole lens, the axially symmetric electrostatic lens, and the electrostatic quadrupole lens (see Hawkes and Kasper, 1989a, b, for the practical utilizations of these lenses). The straight optic axes of these lenses are assumed to be in the z-direction. Let us briefly recapitulate the essential aspects of the general framework of the theory. We are interested in the study of the z-evolution of a quasimonochromatic quasiparaxial charged-particle b5am being transported through the lens system. The Hamiltonian Z, of the system, governing the z-evolution of the beam through the optical Schrodinger equation
can be written as
where p , is the magnitude of the design momentum corresponding to the mean kinetic energy with which a constituent particle of the quasimonoenergetic beam enters the systlem, from the field-free input region, in a path close to the + z direction, W,,is the hermitiy paraxial Hamiltonian [in general a quadratic expression in (r I ,fi W,,is the hermitian aberration (or perturbation) Hamiltonian [a polynomial of degree > 2 in (r I ,fiL)], and 2$Ao) is a sum of hermitian and antihermitian expressions with explicit A, dependence containing paraxial as well as nonpara$ai terms. In the geoemtrical optics limit (A, + 0)2$^0) vanishes, unlike W,, and Aqa, which tend to the corresonding classical expressions in this limit. From Eqs. (229) and (2301, we have that l@(z)> = @ z ,
Z,)l@(Z,)),
(231)
OPTICS OF CHARGED PARTICLES
317
with
where aexp( -(i/tL)lL dz( 1)) is the path-ordered exponential to be computed using Eqs. (73) and (74). When A , is a sum of r: , {r I .fi I +fi I -rI), and , as in the case of the examples we are considering, Up(z,z o ) may be computed “exactly,” in the same form as in Eq. (126) and as exactly as gp(z,z , ) and hp(z,z , ) can be tbtained, using the procedure outlined in Appendix F. The expression for $ ( z , z,) can be calculated up to the desired order of approximation consistent with the approgmation made in obtaining the nonparaxial and A,,-dependent parts of &”, in Eq. (230). Then, using Eq. (230, the behavior of the system can be understood by analyzing the average values of r l and p l at any final plane of interest, namely,
at
(235)
in terms of the state l@(zo)) at any desired initial plane. Wh-en .$z,zo) in Eq. (231) is approximated by the dominenat paraxial part, Up(z,zo), alone [see Eq. (23211 one gets the ideal, or the desirable, behavior of the system expected on the basis of the paraxial (or Gaussian) geometrical optics: in this case, the transfer map ((r I )(zo),((p I ) ( z o ) ) + ((r I Xz), ((p ) ( z ) ) , for any z-plane, is linear in ((r )(zo), ((p A )(zo)). Here, we shall treat briefly the magnetic quadrupole lens, the electrostatic round lens, and the electrostatic quadrupole lens according the scheme outlined above. In each case, we shall explicitly consider only the ideal behavior of the system in order to identify the essential characteristics of the system. The deviations from the ideal behavior leading to the various classical and nonclassical aberration effects as well as A,-dependent corrections to the paraxial optics can be studied exactly in the same
318
R. JAGANNATHAN AND S. A. KHAN
way as in the case of the magnetic round lens, which has been treated above the some detail.
a. Magnetic Quadmpole Lens. Let us consider the ideal magnetic quadrupole lens, with the optic axis along the z-direction, consisting of the field B = ( - Q ~ Y -, Q m x , o ) , constant in the lens region ( zI Iz Iz,), (236) Qm= 0 outside the lens region (2z,)
(
corresponding to the vector potential
Since there is no electric field in the lens region we can take +(r) Then, from Eq. (66) the optical Hamiltonian & is obtained as
& = -Po + A P + kio,a 0.
+/@*o),
=
0.
(238)
Since A,,, is independent of z , the exact expression for the unitary paraxial transfer operator can be immediately written down: with Az = (z - z,),
319
OPTICS OF CHARGED PARTICLES
analogous to Eq. (170) in the case of the round lens. The corresponding paraxial transfer map for (r I ,p I) becomes
TXL
cosh( @ w ) =
\ @ sinh(@
w)
I
o=(:
:),
1
-sinh( @ w )
@ cash(@ w ) 1
in( fiw )
cos( @ w )
~ , ( d ) =1 ( ~ d1 ) ,
K=-
(244) Po
It is readily seen from this map that the lens is divergent (convergent) in the xz-plane and convergent (divergent) in yz-plane when K > 0 ( K < 0). In other words, a line focus is produced by the quadrupole lens. In the weak field case, when w 2 4 1/IKI [note that K has the dimension of (length)-'] the lens can be considered as a thin lens with the focal lengths given by 1 1 -= -(245) f(X) f ( Y ) 3 -wK. Study of deviations from the ideal behavior [Eq. (244)l due to
A,,
and
2@*0) is straightforward using the scheme outlined above [Eqs. (231)-(235)1
and we shall not consider it here. In the field of electron optical technology, for particle energies in the range of tens or hundreds of kilovolts up to a few megavolts, quadrupole lenses are used, if at all, as components in abelration-correcting units for round lenses and in devices required to produce a line focus. Quadrupole lenses are strong focusing: their fields exert a force directly on the electrons, toward or away from the axis, whereas in round magnetic lenses, the focusing force is more indirect, arising from the coupling between B,
320
R. JAGANNATHAN AND S. A. KHAN
and the aximuthal component of the electron velocity. So it is mainly at higher energies, where round lenses are too weak, that the strong focusing quadrupole lenses are exploited to provide the principal focusing field (see Hawkes and Kasper, 1989a, b for more details). Magnetic quadrupole lenses are the main components in beam transport systems in particle accelerators [for details see, e.g., Month and Turner (1989) and the textbooks by Conte and MacKay (1990, and Wiedemann (1993, 1995) and references therein]. b. The Axially Symmetric Electrostatic Lens. An electrostatic round lens, with axis along the z-direction, consists of the electric field corresponding to the potential
inside the lens region ( z , s z 5 zr).Outside the lens + ( z ) = 0. Using this value of 4(r) in Eqs. (26) and (66), with A = (0, 0, O), the optical Hamiltonian of the lens takes the form,
OPTICS OF CHARGED PARTICLES
321
The unitary paraxial transfer operator l&(z,z,) can be obtained as outlined in Appendix F, in terms of minus the first term ( -pol which contributes only a multiplicative phase factor to the wavefunction. In this case, unlike the situation for the magnetic round lens, the coefficient of is seen to depend on 2. The calculation is straightforward and the paraxial transfer map reproduces the well-known classical results (see Hawkes and Kasper, 1989a). Here we have just demonstrated that @ can be brought to the general form, as required by Eq. (2301, for application of the scheme of calculation outlined above. It may be noted that we have assumed the lens potential +(r I ,z ) to vanish outside the lens region. In other words, we have considered the unipotential (einzel) lens having the same constant potential at both the object and the image side. There is no loss of generality in this assumption of our scheme, since the so-called immersion lens, with two different constant potentials at the object and the image sides, can also be treated using the same scheme simply by considering the right boundary (zr) of the lens to be removed to infinity and including the constant value of the potential on the image side in the definition of 4(r I ,2 ) .
A,,
c. The Electrostatic Quadrupole Lens. For the ideal electrostatic quadrupole lens with 1
44r) = z Q e ( x 2 - y 2 ) , constant in the lens region 0 outside the lens region
% = -Po
( 2 ,Iz Iz , ) ,
(z<
+ fio,p+ A,,,,
Z ~ , Z> z r ) ,
(252)
(253)
322
R. JAGANNATHAN AND S. A. KHAN
1
9=
E
+ moc2
,
qQ, l =-
CPO CPO and there ae! no A,-dependent terms-up to this approximation. Simply by comparing W,,in Eq. (253) with the W,,of the magnetic quadrupole lens [Eq. (23911 it is immediately seen that a thin electrostatic quadrupole lens, of thickness w = z, - z , , has focal lengths given by
1
-=---
1
I
wqQe(E + mot')
(257) CZP,2 Again, it is straightforward to study the deviations from the ideal behavior using the scheme outlined above. f(Z)
f‘Y’
111. SPINOR THEORY OF CHARGED-PARTICLE WAVEOPTICS
A. General Formalism: Systems with Straight Optic Axis
The developments of a formalism of spinor electron optics (Jagannathan et al., 1989; Jagannathan, 1990; Khan and Jagannathan, 1993) has been mainly due to a desire to understand electron optics entirely on the basis of the Dirac equation, the equation for electrons, since in the context of electron microscopy the approximation of the Dirac theory to the scalar Klein-Gordon theory seems to be well justified (Ferwerda et al., 1986a, b), under the conditions obtaining in present-day electron microscopes, and accelerator optics is almost completely based on classical electrodynamics (see e.g., Month and Turner, 1989; Conte and MacKay, 1991; Wiedemann 1993, 1995, and references therein). The algebraic structure of this spinor formalism of electron optics, built with a Foldy-Wouthuysen-like transformation technique, was later found (Khan and Jagannathan, 1994, 1995) to be useful in treating the scalar theory of charged-particle wave optics based on a Feshbach-Villars-like representation, as we have already seen in the earlier sections. Now, we shall present the essential details of the wave optics of the Dirac particles (spin- particles) in the case of systems with straight optic axis along the z-axis and demonstrate its application by considering the magnetic round lens and the magnetic quadrupole lens.
OPTICS OF CHARGED PARTICLES
323
We shall use the same notations as in the previous sections, for describing the optical system, the wavefunction (now, with four components), the Hamiltonian (now, a 4 X 4 matrix), etc., which will be clear from the context. Let us start with the time-dependent Dirac equation written in the dimensionless form
A
H,
=
p
+ 8,.,+ 8,, A
As is well known, in the nonrelativistic situation (In1+ mot), for any positive-energy the upper components are large compared to the lower components The even operator 8, does not couple q,, and q , and the odd operator g, couples them. Further, one has to note the algebraic relations
pgD = -gDp,
(262) Let us consider the optical system under study to be located between the planes z = zI and z = z,. Any positive-energy spinor wavefunction obeying Eq. (258) and representing an almost paraxial quasimonoenergetic Dirac particle beam being transported through the system in the +z-direction would be of the form
P
=
A p (Po,
lpl,
E(p) =
p&D = & D p .
+ d n , IPII ' P
P
=
(P, , P ,
=
+Fz)
(264)
324
R. JAGANNATHAN AND S. A. KHAN
p p ( l a + ( P ) 1 2+ lU-(P)12)
=
(265)
1,
where {u *(p) exp[(i/hXp * r - E(p)t)]) are the standard positive-energy free-particle plane-wave spinors (see, e.g., Bjorken and Drell, 1964). We are interested in relating the scattering state wavefunction 9 ( r I ,z; t ) at different planes along the z-axis. To this end, we shall assume the relation
+(r I ,z(2); p )
=
ck ld2r(1)(ry)l.$k(z(2),
z ( l ) ;p)Ir(l))+k(r(j), z(l);p
j,k
=
),
1 , 2 , 3 , 4 , (266)
for +(r I ,z ; p ) , such that we have
I*(
z(*), t ) )
=
in the paraxial case ( Ap
z ( ~ )z('); , po)l*( z(');t ) ) ,
2:
0). (267)
It is obvious that the desired z-propagator z('); pol, corresponding to p o t the mean value of p for the beam, is to be gotten by integrating for z-evolution the time-independent equation q4J
QL.+.
P-m,cz+-a,
m0c
(
ih-
m0c d
az
)I
+qA, $ ( r l , z ; p o ) = 0,
obtained for Wr, t ) = +(r I ,z ; pol expi - ( i / h ) E ( p o ) t ) . Now, multiplying Eq. (268) by mOca,/pOthroughout from the left and rearranging the
325
OPTICS OF CHARGED PARTICLES
terms we get
where E is the kinetic energy of the beam particle entering the system from the field-free input region [i.e., moc2+ E = E(po)l. Noting that, with I as the 4 X 4 identity matrix, 1 1 - ( I + X % ) P X % ( I - xa,) = P , $ 1 + x % ) ( l - xu,) = 2 (272) 1 9
let us define a transformation *-b*'=
M
M+,
1
=
-(Z
a
Then, V ,I satisfies a Dirac-like equation ih, d+' - (MAM- 1) +' 27T dz
=
+ xa,).
(273)
A'+',
(274)
7)44
.+
1 - -@,, Po 1 I - ,moq4JXcYz,
E
+ mOc2
8 =- p
CPO
A
1 8 = -xa Po
Po
(277)
7)= CPO
For a monoenergetic quasiparaxial beam, with IpI = p o , p z > 0, and pz = p,,, entering the optical system from the field-free input region, +' has its upper pair of components large compared to the lower pair of components as can be v:rifiFd using the form of i+h(! ,z 5 z , ) given in Eqs. !264) aFd (265). In H', 8 is an even operator, B is an odd operator, PU = -BP, and Pg = &3, Now we can apply the Foldy-Wouthuysen transformation
326
R. JAGANNATHAN AND S. A. KHAN
technique to reduce the strength of the odd operator 8^ to any desired level taking l/p, as the expansion parameter. The first transformation
leads to the result
= @- -pg2 - 21
= - : p2 i
81
[[ 8,
[@,@I A
ih, ( d g ) ) ] 1 ++ -g4, (282) 21r d z 8
A
2($))
[2,@] +
1 ,
- -83. 3
(283)
There are a few technical points to note here. The Hamiltonian fi' is not hermitian: this is related to the fact that Cg_ ld2rl+$r I ,z)I2 need not be conserved along the z-axis. The transformation in Eq. (279) is not unitary. The equations (279)-(283) can be written down from the corresponding equations of Appendix B [(B6) and (BlO)] simply by using the analogy t + z , moc2 + -1, and h + hO/27r, which follows from a comparison of Eqs. (274) and (275) with Eqs. (Bl) and (B2). It may also be noted that having the equations in dimensionless form is helpFI for symbolic algebraic manipulations in the above calculations. Now, contains only higher power of l/po compared to @. The second transformation,
leads to the result
327
OPTICS OF CHARGED PARTICLES
with g2containing only higher powers of l/p, compared to another such transformation,
2,. After
we have
fi(3) =
+ g3+ g3,
-p
(290)
g3= gl(c + g2,g+ g2), g3= g1(2+ &,&
--$
g2),
(291)
with g3containing only hig$er powers of l/po compared to at this stage and omitting 83,we can write
ih, -272
go)= 2 - -pg2 1 2
+ - 1p
fii(3)+(3),
dz -
[(
1 8, A
8
( ,. [ ..
g2.Stopping
“)I]
”,g] + ih,( A
2a
+ in,(
dz
6 4 + [8,8]- dg)]’) A
2a
8
dz
(293)
It can be shown that the above transformations make the lower composuccessively smaller and smaller compared to the upper nents of components for a quasiparaxial beam moving in the +z-direction. In other words, one can write
+
p+‘3’
Now, tis) is found to be of the form
+‘3’.
(294)
328
R. JAGANNATHAN AND S. A. KHAN
Taking into account Eq. (2941, we can approximate Eq. (289) further, getting
To enable physical interpretation directly in terms of the familiar Dirac wavefunction let us return to the original Dirac representation by retracing the transformations: .
+(3)
+
.
*
*
+ = M-le-iSle-iS2e-i.f3
(3)
~-1~-i5$,(3) 9
(297)
1
il+ i2+ i3- ?([i1,i2] + [i1,i3] + [i2,i3]) 1
--“i1,i2],i3] 4 . *.*
Implementing this inverse transformation in Eq. (2961, with calculation done up to the desired level of accuracy in terms of l/p, (here, up to l/pi), and, finally multiplying throughout by p o we get
a ih ~ I J ~ ( Z )=%I+(z)), ) e-i.Gp,Aei4
-i h - i y -
ei9)
az
(299)
)M
The resulting optical Hamiltonian of the Dirac particle will have the form
% = -Po +
+ l171,a
+&(*a)
+2@*o*q
(301)
where A,,,, A,,,, and are scalar terms ( - I ) and % ( * o * ~ ) is a 4 x 4 matrix term which also vanishes in the limit A, + 0, like Now, the performance of the optical system under study, corresponding to the assumed values of the potentials +(r) and Ah), can be calculated using the
329
OPTICS OF CHARGED PARTICLES
same scheme $s in Eqs. $231)-(235); the matrix term 2$Ao* can also be clubbed with W,,and q ( * O ) and treatingysing the interaction picture. It is found that the optical Hamiltonians W0)in the Klein-Gordon theory and the Dirac theory do not differ in their 'classical'parts (Ao,, + A,,). Thus the Klein-Gordon t4eory without the term &(Ao) and the Dirac theory without the terms and 2 $ A o v u ) are identical, effectively, as seen below. Note that for an observable 0 of the Dirac particle, with the corresponding hermitian operator 6 given in a 4 x 4 matrix form, the expectation value is defined by (O)(Z) =
-
( @( 2 ) l a @( 2))
(@(z)l@( 2))
l d 2 r $ * ( r l 7 ~ ) 6 j k @ k ( r l9 2 ) c;=,/ d 2 r * * ( r * , Z ) * ( ' L , z )
c;,k=l
Hence, the map ( 0 ) ( z o ) 4 ( O ) ( z )becomes ( O ) (z )
'
(302)
330
R. JAGANNATHAN AND S. A. KHAN
-
When the terms and are dropped from the Dirac optical Hamiltonian it becomes I and the corresponding transfer operator also becomes I with respect to the spinor index: i.e.,$Jz, z,) = $2, z,)Sj,. Then, although all four components of (+,, t,b2, &, +,,I contribute to the averages of r I ,p I ,etc., as seen from the above definitions, one can think of them as due to a single component effectively, since the contributions from the four components cannot be identified individually in the final results. Thus, in this case, there would be no difference between the “classical” transfer map for ((c I ) ( z ) ,( p I ) ( z ) )[Eqs. (205)-(208)] and the corresponding transfer map in the Dirac theory. In this sense, the Dirac theory and the Klein-Gordon theory are identical scalar theories when A,-dependent terms are ignored in the Klein-Gordon theory and A,-dependent scalar and matrix terms are ignored in the Dirac theory. We shall consider below, very briefly, a few specific examples of the above formalism of the Dirac theory of charged-particle wave optics.
-
+ +,
B. Applications
1. Free Propagation: Difiaction
For a monoenergetic quasiparaxial Dirac beam propagating in free space along the +z-direction Eq. (274) reads
with
(p o 9 ’ y = ( p i - J q z . Thus, p o d f = -po p + xa I -fi I can be identified with the classical optical Hamiltonian for free propagation of a monoenergetic quasiparaxial beam, with the square root taken in the Dirac way. Although in the present case it may look as if one can take such a square root using only the three 2 X 2 Pauli mmatrices, it is necessary to use the 4 X 4 Dirac matrices in order to take into account the two-component spin and the propagations in the forward and backward directions along the z-axis considered separately. It can be verified that for the paraxial planewave solutions of Eq. (306) corresponding to forward propagation in the + z direction, with p , > 0 and Ip I I 4 pr = p o t the upper pair of components are large compared to the lower pair of components, analogous to
d m ,
33 1
OPTICS OF CHARGED PARTICLES
the nonrelativistic positive-energy solutions of the free-particle Dirac equation. In the same way as the free-particle Dirac Hamiltonian can be diagonalized by a Foldy-Wouthuysen transformation (see Appendix B) the odd part in fit can be completely removed by a transformation: with
we have
1
=
--(I&E)P.
(310)
Po
Now, invoking the fact that JI" will have lower components very small compared to the upper components in the quasiparaxial situation, we can write iho a+"
21T dz
Po
Then, making the inverse transformation
+ = M-leBxal'iLe
$
(312)
9
Eq. (311) becomes
2 -( d m ) =
1
3
-Po
+ @-2Pll ;
1
+ 8Po
+
'.*
9
(314)
332
R. JAGANNATHAN AND S. A. KHAN
+
exactly as in the scalar case [see Eq. (8811 except for the fact that now has four components. Then it is obvious that the diffraction pattern due to a quasiparaxial Dirac-particle beam will be the superposition of the patterns due to the four individual components (JI,, J12, J13, +J of the spinor representing the beam: for a highly paraxial beam the intensity distribution of the diffraction pattern at the xy-plane at z will be given by [see Eq. (9511
+
where the plane of the diffra$ting object is at z,. It is clear th:t when the presence of a fieldAmakes &", acquire a matrix component ~ ( A o O . u )the l transfer operator f l z , z , ) would have a nontrivial matrix structure leading to interference between the diffracted amplitudes (+,, +b2, +3, JIJ. When the monoenergetic beam is not sufficiently paraxial to allow the approximations made above one can directly use the free z-evolution equation
obtained by setting Eq. (3161, we have
I+(.)>
=
C#J =
0 and A
=
(0, 0,O) in Eqs. (26914271). Integrating
il:
exp - Az(poPxa, + i(ZXbY- zybx)))l+(zo)L
AZ = (Z - z,),
(318)
the general law of propagation of the free Dirac wavefunction in the +z-direction, showing the subtle way in which the Dirac equation mixes up the spinor components [for some detailed studies on the optics of general free Dirac waves, in particular, diffraction, see Rubinowicz (1934, 1957, 1963,1965), Durand (19531, and Phan-Van-Loc (1953,1954,1955,1958a, b, 196011.
OpfICS OF CHARGED PARTICLES
333
2. The Axial& Symmetric Magnetic Lens
In this case, following the procedure of obtaining & as outlined above, we get A
‘% = -Po + H 0 , p A
+
Ao,a
+$(AD)
+ 2 ( A D ’ d ,
(319)
335
OPTICS OF CHARGED PARTICLES
+
@oAo a”( 2 ) I iP0 A: a( 2 ) a’”(2 ) 3 2 ~ 64r2
( --
Po A: + ,xPaz( 64P
2 a ’ ( $r:
1 2
- -a’( 2 ) Q”( 2)r:
i
-
Comparing with the scalar case it is seen that the difference in the scalar part ( I ) lies only in the A,-dependent term. Thus as already noted, even the scalar approximation of the Dirac theory is, in principle, different from the Klein-Gordon theory, although it is only a slight difference exhibited in the A,-dependent terms. The matrix part in $ in the Dirac theory, ( * o S u ) , adds to the deviation from the Klein-Gordon theory. Without further ado, let us just note that the position aberration ( 8 r 1 2 ( 3 J ~ 0 ) gets additional contributions of every type from the matrix part g(Ao.u). For example, the additional spherical aberration type of contribution is
&
where h is the “classical” h,(z, zo). Obviously, such a contribution, with unequal weights for the four spinor components, would depend on the nature of I+I(z,)) with respect to spin.
336
R. JAGANNATHAN AND S. A. KHAN
3. The Magnetic Quadrupole Lens
Now, for the ideal magnetic quadrupole lens,
is different Again, it is seen that, the &-dependent scalar term, R(Ao) from the corresponding one in the Klein-Gordon theory.
IV. CONCLUDING REMARKS In fine, we have reviewed the quantum mechanics of charged-particle beam transport through optical systems with a straight optic axis at the single-particle level. To this end, we have used an algebraic approach which molds the wave equation into a form suitable for treating quasimonoenergetic quasiparaxial beams propagating in the forward direction along the axis of the system. We have considered both the Klein-Gordon theory and the Dirac theory with examples. In particular, we have dealt with the magnetic round lens and the magnetic quadrupole lens in some detail. It is found that in the treatment of any system a scalar approxima-
OPTICS OF CHARGED PARTICLES
337
tion of the Dirac spinor theory would differ from the Klein-Gordon theory, but with the difference being only in terms proportional to powers of the de Broglie wavelength such that in practical electron optical devices there is no significant difference between the two treatments. The spin-dependent contributions in the Dirac theory are also found to be proportional only to powers of the de Broglie wavelength. So the contributions to the optics from such terms dependent on the de Broglie wavelength and spin could be expected to be visible only at very low energies. This vindicates the conclusion of Ferwerda et al., (1986a, b) that the reduction of the Dirac theory to the Klein-Gordon theory is justified in electron microscopy. Perhaps the extra contributions of the Dirac theory could be relevant for low-energy electron microscopy (LEEM) where the electron energies are only in the range 1-100 eV (see Bauer, 1994, for a review of LEEM). Regarding some other approaches to the quantum mechanics of charged-particle optics, we note the following: a path integral approach to the spinor electron optics has been proposed (Liiiares, 1993); a formal scalar quantum theory of charged-particle optics has also been developed with a Schrodinger-like basic equation in which the beam emittance plays the role of h (Dattoli et al., 1993). In the context of probing the small differences between the KleinGordon and Dirac theories, another aspect that should perhaps be taken into account is the question of proper definition of the position operator in relativistic quantum theory related to the problem of localization (Newton and Wigner, 1949). It should be interesting to study the transfer maps using the various proposals for the position operators for the spin-0 and spin- particles (e.g., see Barut and Rqczka, 1986). Throughout the discussion we have kept in mind only the application of charged-particle beams in the low-energy region compared to accelerator physics. However, the frameworks of the scalar and spinor theories described above are applicable to accelerator optics as well. In particular, the formalism we have discussed should be well suited for studying the quantum mechanical features of accelerator optics, since its structure has been adapted to handling beam propagation problems (for a quantum mechanical analysis of low-energy beam transport using the nonrelativistic Schrodinger equation see Conte and Pusterla, 1990). Also, as is well known, in accelerator optics the spin dynamics of beam particles is traditionally dealt with using the semiclassical Thomas-BargmannMichel-Telegdi equation (see, e.g., Montague, 1984). As has been shown by Ternov (19901, it is possible to derive this traditional approach to spin dynamics from the Dirac equation and also to get a quantum generalization of it. It should be worthwhile to study spin dynamics using the beam optical representation of the Dirac theory described above.
338
R. JAGANNATHAN AND S. A. KHAN
An important omission in the discussion is the study of systems with a curved optic axis such as bending magnets, which are essential components of charged-particle beam devices (see Hawkes and Kasper, 1989b, Part X). In these cases, the coordinate system used will have to be naturally the one adapted to the geometry, or the classical design orbit, of the system. Then in the scalar theory one has to start with the Klein-Gordon equation written in the suitably chosen curvilinear coordinate system and the two-component form of the wavefunction will have to be introduced in such a way that one component describes the beam propagating in the forward direction along the curved optic axis and the other component describes the beam moving in the backward direction. Starting with such a two-component representation one can follow exactly the same approach as above using the Foldy-Wouthuysen technique, to filter out the needed equation for the forward-propagating beam. The rest of the analysis will follow the same scheme of calculations as described above. Similarly, for the Dirac theory we can start with the Dirac equation written using the chosen set of curvilinear coordinates following the method of construction of the Dirac equation in a generally covariant form (see, e.g., Brill and Wheeler, 1957). Then the treatment of the given system follows in the same way, via the Foldy-Wouthuysen transformations, as discussed above (for some preliminary work along these lines, see Jagannathan, 1990). There are also other important omissions from our account of the quantum mechanics of particle optics: coherence, holography,. . . . For such matters we refer the reader to Hawkes and Kasper (1994). Any physical system is a quantum system. If it exhibits classical behavior, it should be understandable as the result of an approximation of a suitably formulated quantum theory. We have seen that the classical mechanics of charged-particle optics, or the geometrical charged-particle optics, follows from identifying, Ci. la Ehrenfest, the quantum expectation values of observables, like r I ,p I , and polynomials in (r I ,p I), with the corresponding classical ray variables. The quantum corrections to the classical theory, at the lowest level of approximation, leaving out the effects depending on the de Broglie wavelength and spin (if # 01, arise from the dependence of the aberrations on not only the quantum averages of r I and p I but also the higher order central moments of polynomials in (r I ,p I1. This implies, for example, that the off-axis aberrations, considered to vanish for an object point on the axis according to the classical theory, would not vanish, strictly speaking, due to the quantum corrections. Another way in which the classical theory can be recovered from the quantum theory is to describe the action of the transfer operator on the quantum operators, in the Heisenberg picture, in the classical language using the correspondence
OPTICS OF CHARGED PARTICLES
339
principle by which we make the replacements h + 0, the quantum operators + the classical observables, the commutator brackets ( ( l / i h ) [ A ,i l l + the classical Poisson brackets ((A, B)). Then, the formalism described tends to the Lie algebraic approach to the geometrical charged-particle optics pioneered by Dragt et al. (e.g., see Dragt and Forest, 1986; Dragt et al., 1988; Rangarajan et a!., 1990; Ryne and Draft, 1991; see also Forest et al., 1989; Forest and Hirata, 1992). In the context of understanding the classical theory of charged-particle optics on the basis of the quantum theory, it should also be mentioned that a phase-space approach to the quantum theory of charged-particle optics, using the Wigner function, may prove useful. Use of the Wigner function in the scalar theory of paraxial electron optics has been found (Jagannathan and Khan, 1995) to have attractive features (see also Castafio, 1988, 1989; Castafio et al., 1991; Polo et al., 1992; Hawkes and Kasper, 1994, Chapter 78 and references therein). In this connection, it may also be noted that the Wigner function can be extended to the relativistic case in a natural gauge-covariant way using an operator formalism and such an approach admits a straightforward second quantization leading directly to a many-body theory (Elze and Heinz, 1989). It should be worthwhile to see how such an approach can be used in the quantum theory of charged-particle beam optics so that one can take into account the many-body effects also.
APPENDIX A. The Feshbach-Villars Form of the Klein-Gordon Equation
The method we have followed to cast the time-independent Klein-Gordon equation into a beam optical form linear in d / d z , suitable for a systematic study, through successive approximations, using the Foldy-Wouthuysenlike transformation technique borrowed from the Dirac theory, is similar to the way the time-dependent Klein-Gordon equation is transformed (Feshbach and Villars, 1958) to the Schrodinger form, containing only a first-order time derivative, in order to study its nonrelativistic limit using the Foldy-Wouthuysen technique (see, e.g., Bjorken and Drell, 1964). Defining d
a=-*, dt
340
R. JAGANNATHAN AND S. A. KHAN
the free particle Klein-Gordon equation is written as
Introducing the linear combinations ih
iii
the Klein-Gordon equation is seen to be equivalent to a pair of coupled differential equations:
Equation (A41 can be written in a two-component language as
with the Feshbach-Villars Hamiltonian for the free particle, bY
kow,given
s2u, + '-ay. s2 +-
= moc2uz
2m0
2m0
For a free nonrelativistic particle with kinetic energy 4 moc2 it is seen that is large compared to Y -. In presence of an electromagnetic field, the interaction is introduced through the minimal coupling
*+
OPTICS OF CHARGED PARTICLES
341
The corresponding Feshbach-Villars form of the Klein-Gordon equation becomes
I?'"
=
moc2uz+ 2 + 2,
As in the free-particle case, in the nonrelativistic situation 1I'+ is large compared to 1I'-. The even term 2f does not couple 1I'+ and 1I'- whereas and 1I'-. Starting from Eq. (A@, the nonrelativistic limit of the Klein-Gordon equation, with various correction terms, can be understood using the Foldy-Wouthuysen technique (see, e.g., Bjorken and Drell, 1964). It is clear that we have just adopted the above technique for studying the z-evolution of the Klein-Gordon wavefunction of a charged-particle beam in an optical system comprising a static electromagnetic field. The additional feature of our formalism is the extra approximation of dropping a- in an intermediate stage to take into account the fact that we are interested only in the forward-propagating beam along the z-direction.
2 is odd, which couples 1I'+
B. The F o e - WouthuysenRepresentation of the Dirac Equation The main framework of the formalism of charged-particle wave optics, used here for both the scalar theory and the spinor theory, is based on the transformation technique of the Foldy-Wouthuysen theory which casts the Dirac equation in a form displaying the different interaction terms between the Dirac particle and an applied electromagnetic field in a nonrelativistic and easily interpretable form (Foldy and Wouthuysen, 1950; see also Pryce, 1948; Tani, 1951; see Acharya and Sudarshan, 1960, for a
342
R. JAGANNATHAN AND S . A. KHAN
general discussion of the role of Foldy-Wouthuysen-type transformations in particle interpretation of relativistic wave equations). In the FoldyWouthuysen theory the Dirac equation is decoupled through a canonical transformation into two two-component equations: one reduces to the Pauli equation in the nonrelativistic limit and the other describes the negative-energy states. Analogously, in the optical formalism the aim has been to filter out from the nonrelativistic Schrodinger equation, or the Klein-Gordon equation, or the Dirac equation, the part which describes the evolution of the charged-particle beam along the axis of an optical system comprising a stationary electromagnetic field, using the FoldyWouthuysen technique. Let us describe here briefly the standard Foldy-Wouthuysen theory so that the way it has been adopted for the purposes of the above studies in charged-particle wave optics will be clear. The Dirac equation in presence of an electromagnetic field is
=
moc*p + 2 + g,
(B2)
with Z? = qc$ and = c a 3. In the nonrelativistic situation the upper pair of components of the Dirac spinor q are large compared to the lower pair of components. The operator k? which does not couple the large and small components of q is called even and 2 is called an odd operator which couples the large to the small components. Note that
p&=
-&,
pk? = Z?p.
033)
Now the search is for a unitary transformation, q + the equation for V does not contain any odd operator. In the free particle case (with 4 = 0 and 3 = Wouthuysen transformation is given by
=
OW,such that
a> such
9+
=
a Foldy-
OPTICS OF CHARGED PARTICLES
343
This transformation eliminates the odd part completely from the free-particle Dirac Hamiltonian reducing it to the diagonal form: jh
-*‘ d
at
= eif(moc$
+
fi)e-ifq’
In the general case, when the electron is in a time-dependent electromagnetic field it is not possible to construct an exp(i$) which removes the odd operators from the transformed Hamiltonian completely. Therefore, one has to be content with a nonrelativistic expansion of the transformed Hamiltonian in a power series in l/m,c2 keeping terms through any desired order. Note that in the nonrelativistic case, when IpI 4 m0c2 th? transformation operator fiF = exp(iS^) with s^ = -iP&/2moc2, where @ = c a fi is the odd part of the free Hamiltonian. So in the general case we can start with the transformation
-
Then, the equation for
W1) is
344
R. JAGANNATHAN AND
S. A.
KHAN
where we have used the identity
Now, using Eq. (98) and the identity “ a 1 -( e - i ( l ) ) = (1 + A ( t ) + - i ( t ) z dt 2!
x =
(
1
x
-( d
at
1
1
1
-A(t) + -A(ty 2!
-
1 -.) -A(t), 3!
1
1
+A(t)+ -A(t)z + -A(t), 2! 3! dA(t) --
dt
1 3!
--(
1 +2!
1
+ -3!~ < t ) 3
{
dA(t) -A(t)
*
dt
***
1 dmd t
A
+A(t)-
dA(t) dA(t) ---a(t)2 + A(t)-A(t) dt at
1
A
”””)
+A(t )z - ...)
3
-[ A(t),-4 dA(
-a&) - 1 at 2! 1
--
3!
&),
- -1 &), 4! with
A = i$,
we find
at
t)
[A(t),%I] [A W ,[AO),*I] .
(B8)
OPTICS OF CHARGED PARTICLES
345
Substituting in Eq. (B9), AD = moc2/3+ & + d,simplifying the right-hand side using the relations P& = -& and P& = @, and collecting everything together, we have
f i g ) = moc2P + &,
+ dl,
with $?,and 2, obeying the relations /3d, = -dl P and P&, = g1/3 exactly like & and d.It is seen that while the term d in I-?, is of orper zero with respect to the expyxion paramfter l/moc2 [i.e., U = O ( ( l / m , ~ ~ ) ~the ) ] odd part of H g ) , namely U,, tontains only terms of order l/moc2 and higher powers of l/moc2 [i.e., Hl = O((l/moc2))1. To reduce the strength of the odd terms further in the transformed Hamiltonian a second Foldy-Wouthuysen transformation is applied with the same prescription: .\1r(2) = e&.\Ir(l)
%=--
i
&,
After this transformation,
a
ifc -*(a at
= f i WD. \ I r ( 2 ) ,
f i g ) = moc2P + g2+ i2,
where, now, g2= O((l/moc2)2). After the third transformation
A
A
A
g3= g2= 8,,
where
A
@3
”’)
(
= - [g2,g2] + i h - at 2m,c2
(B14)
’
S3= O ( ( l / m , ~ ~ ) ~So) .neglecting g3, 1
fig) = m,c$ + ,i+? -pb’ 2m,c2 -
[ (
a81
-8,[ b,4 + ih 1
8m2,c4
at
ItAm$ybe noted that starting with the second transformation successive ( g ,d)pairs can be obtained recursively using the rule
and retai$ng only the r$evant terms of desired order at each step. With B = qc#J and d = c a - 6,the final reduced Hamiltonian [Eq. (B15)I is, to the order calculated,
--
”’
8rn;c’
divE
with the individual terms having direct physical interpretations. The terms in the first parentheses result from the expansion of showing the effect of the relativistic mass increase. The second and third terms are the electrostatic and magnetic dipole energies. The next two terms, taken together (for hermiticity), contain the spin-orbit interaction.
-4
347
OPTICS OF CHARGED PARTICLES
The last term, the so-called Darwin term, is attributed to the zitterbewegung (trembling motion) of the Dirac particle: because of the rapid coordinate fluctuations over distances of the order of the Compton wavelength (2.rrh/m0c) the particle sees a somewhat smeared-out electrical potential. It is clear that the Foldy-Wouthuysen transformation technique expands the Dirac Hamiltonian as a power series in the parameter l/moc2 enabling the use of a systematic approximation procedure for studying the deviations from the nonrelativistic situation. Noting the analogy between the nonrelativistic particle dynamics and the paraxial optics, the idea of the Foldy-Wouthuysen form of the Dirac theory has been adopted to study the paraxial optics and deviations from it by first casting the relevant wave equation in a beam optical form resembling exactly the Dirac equation [Eqs. (Bl)-(B2)] in all respects [i.e., a multicomponent 1J' having the upper half of its components large compa!ed to the lower 5omponents and the Hamiltonian having an even part (g),an odd part (a),a suitable expansion parameter characterizing the dominant forward propagation and a leading term with a /Mike coefficient commuting with i? and anticommuting with g].The additional feature of our formalism is to return finally to the original representation after making an extra approximation, dropping p from the final reduced optical Hamiltonian, taking into account the fact that we are interested only in the forward-propagating beam.
C. The Magnus Formula The Magnus formula is the continuous analogue of the famous Baker-Campbell-Hausdorff (BCH) formula
aeLi
=
,a+ Li + l/Z[ a. Li]+
1/12([
a,[ A,i l l + [ [ a, Li],B])+.
' '
(C1)
Let it be required to solve the differential equation d -dut ( t )
=i(t)u(t)
(C2)
to get u ( T ) at T > to, given the value of u(to).For an infinitesimal A t , we can write ~ ( t , , ~ t = )eA~a(~o)u(t,). (C3) Iterating this solution we have u( to + 2 A t ) = e A f ~ ( f ~ + A l ) e A l ~t ( f ~ ) ~ ( 01, U ( t o+ 3 A t ) = e A l a ( ~ , + 2 A ~ ) e A l a ( l , + A ' ) e A ~ a ( ' o ) u( t o ) , and so on. (C4)
+
348
R. JAGANNATHAN AND S. A. KHAN
If T = to + N A t we would have
Thus, u ( T ) is given by computing the product in Eq. (C5) using successively the BCH formula [Eq. (Cl)] and considering the limit At 4 0, N + 03 such that N A t = T - to. The resulting expression is the Magnus formula (Magnus, 1954): u( T ) =
*
T , t,)u( t o )
9
To see how Eq. (C?) is obtained let us substitute the assumed foTm of the solution, u ( t ) = S t , to)u(to),in Eq. (C2). Then it seems that St,t o ) obeys the equation
a ,4t,to)
=i
*to,to)
=
f.
(C7)
Introducing an iteration parameter A in Eq. (C7), let d
t o ; A)
-atq t ,
q r o , t o ;A)
=
=i,
A a ( t ) $ t , t o ; A),
(C8)
q t , t o ; i ) =*t,to).
(C9)
Assume a solution of Eq. (C8) to be of the form *t,t,;
A)
= en(l*lo;A)
(C W
with m
n ( t , t oA) ;
A"An(t,to),
=
A , , ( t o , t o )= 0
n=l
Now, using the identity (see Wilcox, 1967)
for all n. (C11)
OPTICS OF CHARGED PARTICLES
349
one has
a
~ ‘ ~ p u . r 0at ; w
’ t O ’. A ) ~ - $ ~ ( ~ . M = A )A a ( t ) .
(C13)
Substituting in Eq. (C13) the series expression for M t , to; A) [Eq. (Cll)], expanding the left-hand side using Eq. (981, integrating, and equating the coefficient of AJ on both sides, we get, recursively, the equations for A,(t, to),A&, to), etc. For j = 1
and hence
Similarly,
Then, the Magnus formula in Eq. (C6) follows from Eqs. (C9)-(Cll). Equation (74) we have in the context of z-evolution follows from the above discussion with the identification t + z , to + z(’), T + z(’), and a(t) -(i/WZ?JZ).
For more details on the exponential solutions of linear differential equations, related operator techniques, and applications to physical problems the reader is referred to Wilcox (19671, Bellman and Vasudevan (1986), Dattoli et al. (1993), and the references therein.
350
R. JAGANNATHAN AND S . A. KHAN
D. Green’s Function for the Nonrelativistic Free Particle
For a nonrelativistic free particle of mass m moving in one dimension the Schrodinger equation is
a
ih-T(x,t) at
2
H = -. 2m
=A*(x,t),
The corresponding Green’s function, or the propagator, given by ~ ( ~ ’ , xt ,’t ;)
=
(X’~e-(i/h~~‘= - ~(X’le-(i/hXI’-,)BI/2m )~lX) Ix) ( D 2 )
is such that *(x ’ , t ’ )
=
/&G(
XI,
t ‘ ; x , t)*( x , I ) .
(D3)
The expression for G(x’,1’; x , t ) in Eq. (D2) can be evaluated easily using the momentum representation. The explicit calculation is as follows: with (t’ - t ) = A t , (Xile-(i/fr)Al(B~/2m)lX)
=
// dp’ dp( xp Ip’ )( p’ l e - ( i / * ) A f ( B f
p )(p l x )
/’m)~
=
2Th
/dp’exp(
(;
p‘(x’ - x )
At2m
m( x’ - x ) At
1
2Th
im( x’ - x ) ~ =2-f.exp( 2 h A t 1
-
)
OPTICS OF CHARGED PARTICLES
im( x’ - x)’
1
=Gexp{ 2hAt
] / dP’ exp {
351
--P2
2mh
Equation (94) is the two-dimensional generalization of this result. Since the variables x and y are separable for the free motion in the xy-plane, it follows that
G(r’L , t ‘ ; r , , t )
imlr’, - r I I 2h At
m =
(2aihA t )
E. Matrix Element of the Rotation Operator
The required matrix element of-the rotation operator around the z-axis through an angle 6 , ( r le(i/*)aLzlf ), can be easily calculated using the cylindrical coordinate system x = p cos 0, y = p sin 0, z = z. Then,
e(i/h)aLz+(r I ,z )
,
= ,$i/hP(xBy-YBx)+(r, 2 ) =
efl(d/de)+(p , 0 , z )
=+(rl(6),z)9 (El) where r (6) = ( x cos 6 - y sin 6, x sin 6 + u cos 6). Using this result, we get =+(p,0+6,z)
(r’, l e ( i / f ’ f l L z l f L )
=
/d’ra’* ( r l
) e ( i / h ) a i , a 2 (rl - E L )
F. Green’sFunctionfor a System with Time-Dependeni Quadratic Hamiltonian Let it be required to compute the Green’s function for a system in one dimension obeying the Schrodinger equation
352
R. JAGANNATHAN AND S . A. KHAN
and the required Green’s function is given by G ( x , t ;x ’ , t ’ ) =
( x l 8 ( t ,t’)lx’)
(F5)
such that
9(x , t )
=
1&G( x ,
t ; x’, t ’ ) 9 ( x’, 1’).
(F6)
From the fact that the operators ( x 2 , $,: {xFx + j&r)) are closed under commutation, leading to the Lie algebra,
[ x 2 , fi:]
= 2in(xix
[ x 2 , ( x i x + &}] [ ( x i x + a x ) , ij:] it is clear that
+b p ) ,
= 4itlr2, = 4ini:,
O(r,t ’ ) in Eq. (F4) can be written in the form
f i ( t , t # ) = exp - z [ u ( t 7 r ’ ) &
(
i
+b(t,t’)(xbX +
a x ) + ~ ( ? , r ’ ) x ~ ] } (F8) ,
where u(t, t’), b(t, t ‘ ) , and c ( t , t ’ ) are infinite series expressions in terms of A(t), B(t), and C(t). The precise form of Eq. (F8) can be obtained as
353
OPTICS OF CHARGED PARTICLES
follows. Substituting the relation I+(t)> =
fi(t,t‘)~+(t’)>
(F9)
in Eq. (Fl) it is seen that d
ih -atf i ( t , t ’ )
=A(t)fi(t,t’),
f i ( r f , t l ) = i.
(F10)
This implies that
+b(t,t’){& =
( A ( r ) $ j + B(t){&
+ a x } + c(t,t’)x2]
+ b p ) + c(t)x2)
+b(t,t’){xix + f i x x )
+ c(r,tl)x21)). ( ~ 1 1 )
The algebra in Eq. (F7) can now be used to relate a, b, and c with A, B, and C. Following Wolf (1981), we shall spell out these relations as follows: with the parametrization a=-
QP 2Sk
Q ( a - 6) Q’
=
c=
4sin cp ’
-- QY 2 sin Q ’
1 c o s q = -2( f f + S ) , (F12) it is seen that d t , rf), P ( t , r’), y ( t , 0,and S ( t , t ’ ) satisfy the equations A & - A & + [4A(AC - 8 a ( t ’ , t ’ )= 1, ffg -
p ( t ’ , t ’ ) = 0,
2 ) -
2AB
&(t’,t’) =
pff
-
2A
=
+ 2 k s I f f = 0, 2B(tf),
(F13)
0,
j ( t ‘ , t ’ )= 2 ~ ( r ‘ ) ,
(F14)
354
R. JAGANNATHAN AND
S. A. KHAN
1) +
1 + - ( a ( t , t ’ ) - S(t,t’)){xfi, +fi,x} - y ( t , t ’ ) x 2 2
(F17)
where a,P, y , S satisfy Eqs. (F13HF16) and cos cp = $(a 8 ) . With A , B, and C being real it is seen that a,P, y , 6,and cp are real, implying that D is unitary. To find the Green’s function [Eq. (F5)], the following observation helps: $(r, t ’ ) generates a real linear canonical transformation of the conjugate pair ( x , fix>as
In other words,
x q t , t ’ )IJI ) = D( f ,t
a (t , t ‘ )x
I)(
+ P( t , t ‘ ) f i x ) lJI >,
fixD(f,f’)lJI)= D ( r , t ‘ ) ( y ( t , r ’ ) x + q t , t ’ ) f i x ) l J I ) ,
(F19)
for any I+). Writing out Eq. (F19) explicitly in terms of matrix elements it is possible to solve for G(x,1; x’, t’) = (xl&, t ’ ) l x ’ ) up to a multiplicative constant phase factor (see Wolf, 1979, 1981, for details of the solution). The result is G(x , t ; x‘, t ’ ) =
1
4
m
i
[ a(t,t’)x’2
- 2xx‘ if
and
+ S(r,r’)x’]
P(t,t’)
+0
(F20)
355
OPTICS OF CHARGED PARTICLES
where 6(x’ - x / a ) is Dirac delta function. In the two-dimensional case, if the Hamiltonian is of the form =A(t)fi:
+ B ( t ) { r , .fi, +fil
.rl)
+ C(t)r:
,
(F22)
then, because of the independence of the corresponding x and y motions leading to the separation of the variables x and y, the above results are extended in a straightfomard manner corresponding to the replacements x 2 - + r ~ , f i ~ + ,f{ i~~f i ~ + f i , x ) + { r , . f+i .f i l * r l ) . In the case of the round magnetic lens taking ( t , t’) as ( z , z,), one has in -0. ll?lp [see Eq. (1191
showing that g,(z, z , ) and h,(z, z , ) are two linearly independent solutions of the paraxial equation [Eq. (130)l with the initial conditions as in Eq. (131). Since g,(z, z,) and h,(z, 2,) are a pair of solutions of the same second-order differential equation [see Eqs. (F25) and (F26)], it follows from the Wronskian relation, gh’ - hg’ = 1, that 6(z7 z,) = h’,(z, 2,). Substituting in Eq. (F17) a = g,, P = h,/po, y = peg;, 6 = hk, and cos q = (g, + h’,)/2 we get Eq. (126). With 27rh = Aopo, Eqs. (127) and (128) follow obviously from Eqs. (F20) and (F21) since G(rl ,rll) = G(x, x ’ ) G ( y , y’) where G ( y ,y’) is obtained from G(x,x ’ ) by just replacing x and x’ by y and y’, respectively. It is clear that if the A,-dependent paraxial hermitian term &$) in Eq. (221) is also added to h,,, while computing the paraxial optics of the round magnetic lens, then one has nonzero B ( z ) given by AiF’(z)/327r2. Consequent changes in Eqs. (F25)-(F28) lead to the results briefly discussed following Eq. (221).
356
R. JAGANNATHAN AND S. A. KHAN
ACKNOWLEDGMENTS One of us (R. J.) is grateful to Professor E. C. G. Sudarshan for suggesting the project of investigating the quantum theory of electron optics on the basis of the Dirac equation, fruitful discussion, and collaboration during the initial period of the work. He wishes also to acknowledge thankfully the benefit of collaboration he had, during the initial work, with Professor N. Mukunda and Professor R. Simon. Later, both of us have had the benefit of many discussions with Professor R. Simon on several aspects of the analogy between electron optics and photon optics. Thanks are due to Professor N. R. Ranganathan from R. J. for help in understanding the work of Phan-Van-Loc. We are thankful to Professor R. Ramachandran, Professor G. Rajasekaran, and Professor K. Srinivasa Rao for kind encouragement. It is a pleasure to thank Professor P. W. Hawkes: first one of us (R. J.) and then later both of us are enjoying his continued encouragement in the project of developing the quantum theory of charged-particle optics, along the lines of the work presented here, which has led us to write this chapter. We are thankful to him for a critical reading of the manuscript and very useful suggestions. Finally, thanks are due to the three volumes of Hawkes and Kasper for a large part of our education in electron optics. Note added in proof In the context of accelerator physics, the formalism of this chapter has been applied to study the beam optics of spin-1/2 particles with anamolous magnetic moment, leading to a unified quantum mechanical treatment of orbital motion, Stern-Gerlach effect, and the Thomas-BMT spin dynamics, based on the Dirac equation (Conte, M., Jagannathan, R., Khan, S. A., and Pusterla, M., 1996). Beam optics of the Dirac particle with anamolous magnetic moment. Preprint: INFN/AE-96/06 and IMSc-96/03/07).
REFERENCES Acharya, R., and Sudarshan, E. C. G. (1960). J . Math. Phys. 1,532-536. Agarwal, G. S., and Simon, R. (1994). Optics Comm. 110, 23-26. Barut, A. O., and Rgczka, R. (1986). “Theory of Group Representations and Applications.” World Scientific, Singapore. Bauer, E. (1994). Rep. Frog. Phys. 57,895-938. Bellman, R., and Vasudevan, R. (1986). “Wave Propagation: An Invariant Imbedding Approach.” D. Reidel, Dordrecht. Bjorken, J. D., and Drell, S. D. (1964). “Relativistic Quantum Mechanics.” McGraw-Hill, New York. Brill, D. R., and Wheeler, J. A. (1957). Reu. Mod. Phys. 29, 465-479. Busch, H. (1927). Arch. Elebotech. 18, 583-594.
OPTICS OF CHARGED PARTICLES
357
Castaiio, V. M. (1988). Optik 81,35-37. Castaiio, V. M. (1989). In “Computer Simulation of Electron Microscope Diffraction and Images” (W. Krakow, and M. A. OKeefe, Eds.), pp. 33-41. Minerals, Metals and Materials Society, Warrendale, PA. Castaiio, V. M., Viuez-Polo, G., and Gutierrez-Castrej6n, R. (1991). In “Signal and Image Processing in Microscopy and Microanalysis,” Proceedings of the 10th Pfefferkorn Conference, Cambridge (P. W. Hawkes, W. 0. Saxton, and M. A. OKeefe, Eds.), pp. 414-422 (Scanning Microsc., Suppl. 6, published 1994). Conte, M., and MacKay, W. W. (1991). “An Introduction to the Physics of Particle Accelerators.” World Scientific, Singapore. Conte, M., and Pusterla, M. (1990). I1 Nuovo Cimento 103, 1087-1090. Dattoli, G., Renieri, A., and Torre, A. (1993). “Lectures on the Free Electron Laser Theory and Related Topics.” World Scientific, Singapore. Dragt, A. J. (1996). Lie algebraic methods for ray and wave optics. University of Maryland Report, in preparation., Draft, A. J., and Forest, E. (1986). Adu. Electron. Electron Phys. 67,65-120. Dragt, A. J., Neri, F., Rangarajan, G., Douglas, D. R., Healy, L. M., and Ryne, R. D. (1988). Annu. Rev. Nucl. Part. Sci. 38, 455-496. Durand, E. (1953). C. R . Acad. Sci. Paris 236, 1337-1339. Eke, H. T., and Heinz, U. (1989). Phys. Rep. 183,81-135. Ferwerda, H. A., Hoenders, B. J., and Slump, C. H. (1986a) Opt. Acta 33, 145-157. Ferwerda, H. A., Hoenders, B. J., and Slump, C. H. (1986b) Opt. Acra 33, 159-183. Feshbach, H., and Villars, F. (1958). Reu. Mod.Phys. 30, 24-45. Foldy, L;L., and Wouthuysen, S. A. (1950). Phys. Rev. 78, 29-36. Forest, E., and Hirata, K. (1992). A contemporary guide to beam dynamics. KEK Report 92-12,,National Laboratory for High Energy Physics, Tsukuba, Japan. Forest, E., Ben, M., and Irwin, J. (1989). Part. Accel. 24, 91-107. Glaser, W. (1952). “Grundlagen der Elektronenoptik.” Springer, Vienna. Glaser, W. (1956). Elektronen und Ionenoptik. In “Handbuch der Physik” (S. Fliigge, Ed.), Vol. 33, pp. 123-395. Springer, Berlin. Glaser, W., and Schiske, P. (1953). Ann. Physik 12, 240-266 and 267-280. Hawkes, P. W., and Kasper, E. (1989a). “Principles of Electron Optics,’’ Vol. 1, “Basic Geometrical Optics.” Academic Press, London. Hawkes, P. W., and Kasper, E. (1989b). “Principles of Electron Optics,” Vol. 2, “Applied Geometrical Optics.” Academic Press, London. Hawkes, P. W.,and Kasper, E. (1994). “Principles of Electron Optics,” Vol. 3, “Wave Optics.” Academic Press, London. Jagannathan, R. (1990). Phys. Rev. A42, 6674-6689; corrigendum, ibid. A44 (1991) 7856. Jagannathan, R., and Khan, S. A. (1995). In “Selected Topics in Mathematical Physics-Professor R. Vasudevan Memorial Volume” (R. Sridhar, K. Srinivasa Rao, and V. Lakshminarayanan, Eds.), pp. 308-321. Allied Publishers, Delhi, Madras. Jagannathan, R., Simon, R., Sudarshan, E. C. G., and Mukunda, N. (1989). Phys. Lett. A134, 457-464. Khan, S. A. (1996). Quantum theory of charged-particle beam optics, Ph.D. Thesis, University of Madras (to be submitted). Khan, S. A., and Jagannathan, R. (1993). Theory of relativistic electron beam transport based on the Dirac equation. Presented at the 3rd National Seminar on Physics and Technology of Particle Accelerators and Their Applications, November 1993, Calcutta, India. Khan, S. A., and Jagannathan, R. (1994). Quantum mechanics of charged-particle beam optics: an operator approach. Presented at the JSPS-KEK International Spring School on
358
R. JAGANNATHAN AND S. A. KHAN
High Energy Ion Beams-Novel Beam Techniques and Their Applications, March 1994, Japan. Khan, S. A., and Jagannathan, R. (1995). Phys. Rev. E51, 2510-2515. Lifiares, J. (1993). In “Lectures on Path Integration: Trieste” (A. Cerdeira, S. Lundqvist, D. Mugnai, A. Ranfagni, V. Sayakanit, and L. S. Schulman, Eds.), pp. 378-397. World Scientific, Singapore. Magnus, W. (1954). Comm. Pure. Appl. Math. 7, 649-673. Montague, B. W. (1984). Phys. Rep. 113, 1-96. Month, M., and Turner, S., Eds. (1989). “Frontiers of Particle Beams: Observation, Diagnosis and Correction.” Springer-Verlag, Berlin. Newton, T. D., and Wigner, E. P. (1949). Reu. Mod. Phys. 21, 400-406. Patton, R. S. (1986). In “Path Integrals from meV to MeV” (M. Gutmiller, A. Inomata, J. R. Klauder, and L. Streit, Eds.), pp. 98-115. World Scientific, Singapore. Phan-Van-Loc (1953). C. R . Acad. Sci. Paris 237,649-651. Phan-Van-Loc (1954). C . R . Acad. Sci. Paris 238, 2494-2496. Phan-Van-Loc (1955). Ann. Fac. Sci. Uniu. Toulouse 18, 178-192. Phan-Van-Loc (1958a). Interpretation physique de I’expression mathkmatique du principe de Huygens en theirie de I’electron de Dirac. Cuhiers Physique No. 97,12, 327-340. Phan-Van-Loc (1958b). C . R . Acad. Sci. Paris 246,388-390. Phan-Van-Loc (1960). Principes de Huygens en thkorie de I’electron de Dirac. Thesis, Toulouse. Polo, G. V., Acosta, D. R., and Castaiio, V. M. (1992). Optik 89, 181-183. Pryce, M. H. L. (1948). Proc. Roy. SOC.Ser. A 195,62-81. Rangarajan, G., Dragt, A. J., and Neri, F. (1990). Pat?. Accel. 28, 119-124. Rubinowicz, A. (1934). Actu Phys. Polon. 3, 143-163. Rubinowicz, A. (1957). “Die Beugungswelle in der Kirchhoffschen Theorie der Beugung.” Pahstwowe Wydawnictwo Naukowe, Warsaw; (1966). 2nd ed. PWN, Warsaw and Springer, Berlin. Rubinowicz, A. (1963). Actu Phys. Polon. 23, 727-744. Rubinowicz, A. (1965). The Miyamoto-Wolf diffraction wave. Prog. Opt. 4, 199-240. See Part 11, Section 3: Diffraction wave in the Kirchhoff theory of Dirac electron waves. Ryne, R. D., and Dragt, A. J. (1991). Part. Accel. 35, 129-165. Scherzer, 0. (1936). Z. Physik 101,593-603. Scherzer, 0.(1949). J . Appl. Phys. 20, 20-29. Tani, S. (1951). Prog. Theor. Phys. 6,267-285. Ternov, I. M. (1990). Sou. Phys. JETP 71,654-656. Ximen, J. (1991). Adu. Electron. Electron Phys. 81, 231-277. Weidemann, H. (1993). “Particle Accelerator Physics: Basic Principles and Linear Beam Dynamics.” Springer-Verlag, Berlin. Wiedemann, H. (1995). “Particle Accelerator Physics 11: Nonlinear and Higher-Order Beam Dynamics.” Springer-Verlag, Berlin. Wilcox, R. M. (1967). J. Math. Phys. 8, 962-982. Wolf, K. B. (1979). “Integral Transformation in Science and Engineering” Plenum, New York. Wolf, K. B. (1981). S W J . Appl. Math. 40,419-431.
ADVANCES IN IMAGING AND ELECTRON PHYSICS. VOL. 97
Ultrahigh-Order Canonical Aberration Calculation and Integration Transformation in Rotationally Symmetric Magnetic and Electrostatic Lenses JIYE XIMEN Department ofRadw.Electronics. Peking Unwersiw Betjing, People’s Republic of China
........................................ 360 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361 . . . . . . . . . . . . . .361 . . . . . . . . . . . . . .366
I. Introduction 11. Power-Series Expansions for Hamiltonian Functions and Eikonals in Magnetic Lenses A. Hamiltonian Function and Its Power-Series Expansion B. Dimensionless Eikonal and Its Power-Series Expansion I11. Generalized Integration Transformation on Eikonals Independent of (r X p) in Magnetic Lenses A . Normalized Fourth-order Eikonal in Terms of T4 and S4 B. Normalized Sixth-order Eikonal in Terms of T6 and S, C. Normalized Eighth-order Eikonal in Terms of T, and S, D . Normalized Tenth-order Eikonal in Terms of T,,and S,, IV . Canonical Aberrations up to the Ninth-Order Approximation in Magnetic Lenses V . Generalized Integration Transformation on Eikonals Associated with (r X p) in Magnetic Lenses A. Normalized Fourth-Order Hamiltonian Function in Terms of
................................ 369 . . . . . . . . . . . . . 373 . . . . . . . . . . . . . . 373 . . . . . . . . . . . . .374 . . . . . . . . . . . .377
......................................
381
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389 TM.T22. S4,. S2, . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390 B. Normalized Sixth-Order Hamiltonian Function in Terms of Tm.T42. T24. S60. S4,. S24 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390 C. Normalized Eighth-Order Hamiltonian Function in Terms of 391 Tm.T62.T44.T26.S,,. S62. S44. S26 ......................... D. Normalized Tenth-Order Hamiltonian Function in Terms of Ti,,. Tg2.T64. T46.T28.Sl,. S,,. S64. S46. . . . . . . . . . . . . . . . . . . . 391 VI . Eikonal Integration Transformation in Glaser’s Bell-Shaped Magnetic Field . . . . 393 VII. Generalized Integration Transformation on Eikonals in Electrostatic Lenses . . . 396 A. Normalized Fourth-Order Eikonal in Terms of T4and S, . . . . . . . . . . . . 398 B. Normalized Sixth-Order Eikonal in Terms of T6and S, . . . . . . . . . . . . .398 C. Normalized Eighth-Order Eikonal in Terms of T, and S, . . . . . . . . . . . . 399 D . Normalized Tenth-Order Eikonal in Terms of T,,and S,, . . . . . . . . . . . . 400 VIII . Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407 5’2,
359
. .
Copyright Q 1996 by Academic Press. Inc All rights of reproduction in any form reserved
360
JIYE XIMEN
I. INTRODUCTION In the literature, Glaser and Scherzer systematically developed electron optical aberration theory (Glaser, 1933a, b, 1952; Scherzer, 1936a, b). The theory for calculating higher order geometrical and chromatic aberrations was further investigated by Sturrock (1955) and Plies and Typke (1978). The author (Ximen, 1983, 1986) utilized the eikonal method to investigate electron and ion optical systems. Hawkes and Kasper (1989) published their encyclopedic book in which original articles about aberration theory were reviewed in detail. In order to improve the performances of electron optical instruments it is necessary to investigate higher order geometric and chromatic aberrations. The author has developed the canonical aberration theory (Ximen, 1990a, b, 1991). In the canonical theory, electron trajectories have been described in the Hamiltonian representation using generalized position and momentum representations, i.e., (r, p), and thus electron optical aberrations have been derived for up to the ninth-order approximation in rotationally symmetrical magnetic and electrostatic systems (Ximen, 1995). Therefore, in the present study, we will perform a generalized integration transformation on Hamiltonian representation eikonals and thus obtain a set of different-order normalized eikonals, which are position dependent and momentum independent. This powerful technique enables us to greatly simplify Hamiltonian representation eikonals, and thus electron optical aberrations can be ingeniously expressed in compact form for up to the ninth-order approximation in both rotationally symmetric magnetic and electrostatic systems. This progress facilitates numerically calculating ultrahigh-order canonical aberrations in practical rotationally symmetric magnetic and electrostatic systems. In Sections II-VI, we will deal with rotationally symmetric pure magnetic systems, and in Section VII we will discuss rotationally symmetric pure electrostatic systems. Section II,A derives power-series expansions of different-order Hamiltonian functions for up to the tenth-order approximation in a rotationally symmetric magnetic system. In Section II,B, we transform physical quantities into corresponding dimensionless ones and thus derive the dimensionless eikonal function and its power-series expansion for up to the tenth-order approximation. In Section I11 we perform a generalized integration transformation on eikonals, independent of a constant product (r X p), and then derive a set of different-order normalized eikonals, which are position dependent and momentum independent. In Section IV we completely calculate isotropic intrinsic and combined aberrations, containing the zero power of the product (r x p), in up to the
361
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
ninth-order approximation by performing gradient operations on corresponding-order normalized eikonal functions. Therefore, third, fifth, seventh, and ninth-order canonical position and momentum aberrations have been expressed in concise and explicit form. In Section V we further carry out the generalized integration transformation on eikonals, associated with the constant product (r X p), and then derive a set of normalized eikonals. Therefore, one can calculate anisotropic intrinsic and combined aberrations, containing the odd power of the product (r x p), in up to the ninth-order approximation. In Section VI, the technique of integration transformation on eikonals is applied to Glaser's bell-shaped magnetic field, and thus ultrahigh-order position and momentum aberrations can be completely calculated and expressed in analytical formulae. In Section VII, we further perform such a generalized integration transformation on eikonals in rotationally symmetric electrostatic systems and then derive a set of different-order normalized eikonals, which are position dependent and momentum independent. Thus we can also calculate intrinsic and combined aberrations by a method similar to that shown in Section IV. Finally, in Section VIII we draw some conclusions about canonical aberration theory and discuss its possible applications and future developments.
11.
POWER-SERIES EXPANSIONS FOR HAMILTONIAN FUNCTIONS AND EIKONALS IN MAGNETIC LENSES
A. Hamiltonian Function and Its Power-SeriesExpansion
Based on the theory of classic mechanics (Arnold, 1978; Goldstein, 19801, the canonical aberration theory has been discussed in the author's previous papers (Ximen 1990a, b, 1991, 1995). In the present section, a rotationally symmetrical pure magnetic system will be discussed. Let the axial coordinate be z and the transversal components of the position vector and the momentum vector be r = ( x , y) and p = ( p x ,p,,), respectively. The Hamiltonian function H is defined as follows (Glaser, 1952; Sturrock, 1955; Hawkes and Kasper, 1989; Ximen, 1990a, b, 1991, 1995):
A (pep) + 2q1/' -(r r
1/2
X
p)
+ qA2])
(1)
362
JIYE XIMEN
-
where r = ( - y , x ) , r = ( x 2 + y 2 ) ’ / ’ , 11 = (-e/2m) > 0, and the accelerating voltage V = constant. The magnetic vector potential A will be expanded into the following power series (Glaser, 1952; Ximen, 1983,1986, 1995): A 1 - = - B o ( Z ) - b2B:(2)(r-r) -I-b4BA4)(r.r)2- b 6 B f ’ ( r ‘ r ) 3 r 2 + b , B f ) ( r . r)4 + -.* (2) where the magnetic vector potential has only an azimuthal component A, Bo(z)is the axial distribution of magnetic induction, and (r * r) = x 2 + y 2 . 1 1 1 b4 = b b2 = -, 16 12 x 32 ’ - 6 x 24 x 128’ 1 b, = (3) 24 X 120 X 512 We can expand the Hamiltonian function H into a power series (Ximen, 1995): H = Ho + H2 + H4 + H6 H , + H I , + * * . (4) where H , = -V1I2. In a previous chapter (Ximen, 1999, different-order Hamiltonian functions H2,(n = 1,2,3,4,5) have been derived. In a rotationally symmetric pure magnetic systems, it can be shown that the quasivector product (r x p) is conserved.
+
r’ =
dH
-= l { V dP
+
[(p-p)
2
A 2q1I2-(r r X p)
-1/2
+ gA2])
r -1/2
A
dA2
r
Because in such a magnetic system, d/adA/r) and d A 2 / d r are vectors along the r-direction, we get (r x p)’= r’ x p + r x p ’ = 0, (r X p) = dynamic constant = (r X p), . (5) Therefore, the product (r X p) can be treated as a constant product in arbitrary-order approximation.
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
363
1. Second- and Fourth-Order Hamiltonian Functions
H2 = M(r r) L H 4 = - ( r’r) 4
+ -N
+ -(r*r)2
+ 2Q(r X p),
V
(P * P) V
(6)
N (P PI’ + -4 v’ *
where L , M, N , P , Q, K are field-distribution functions for describing third-order aberrations (Glaser, 1952; Ximen, 1983, 1986, 1995):
+ 7 B i ) - B:
1
,
2. Sixth-OrderHamiltonian Function H~ = ~ , ( - rr13 + L,(r r)
2
(P - P) + Ll(r V
(P * P)’ (P * p)’ r) -+ LoV‘ v3
364
JIYE XIMEN
where L,, L,, L,, Lo,M I , M,,M3, N,, N,,N3 are field-distribution functions for characterizing fifth-order aberrations (Ximen, 1995). Only L,, L,, L,, Lo are given; others will be seen in the Appendix.
L,
=
3 3(V" + qB,2) -E M, 64V'I2
3. Eighth-Order Hamiltonian Function 4 3 ( P ' P) H , = J 4 ( r . r ) + J 3 ( r . r ) -+ J2(r V
+J,(r-I-)-
(P P)4 v3 + Jo- v4
( P P)' e
*
where field-distribution functions can be found in a previous chapter (Ximen, 1995). Only J4, J3, J,, J , , J , are given; others will be seen in the
ULTRAHIGH-ORDERCANONICAL ABERRATIONS
Appendix.
qBOB$,@
-
2 X (12)’
128
X
L’ 32
:’(- + L 2 M + L 3 N ) ,
J 3 = -v1/‘
LN
J2 = -
M’
4. Tenth-Order Harniltonian Function
HI, = 15(r* r)
5
(P PI’ (P - P) + 13(r*r)3 + 14(r - r) V V’ 4
2
(P * P)’
(P PI4
+ 1 2 ( r + r )-+Zl(r-r)-
v3
(P PI5 v4 + zo- v 5 *
*
365
366
JIYE XIMEN
+ + where field-distribution functions can be found in a previous chapter (Ximen, 1995). Only f,, f4,f3, I,, I , , f, are given; others will be seen in Appendix.
Bf’ + 2 x 24vBo + x 120 x 512
71( BA4’)2 +J4M+2 x (12 x 32)’
r4 = - J4 N + J3M + +2 f3 = - J , N + J , M + I,
=
- JZN
I,
=
Y), +Y),
L3N 4
L,M ++ 2
L2N
L,M
++ JIM + 4 2
-
4
B. Dimensionless Eikonal and Its Power-Series Expansion Based on canonical theory in electron optics (Ximen, 1991, 1993, canonical eikonal functions (i.e., functions of optical length) in up to the tenthorder approximation can be defined: eZn =
L H 2 , , dz
(n
=
2,3,4,5).
(15)
367
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
Having strictly proved in Eq. ( 5 ) that the quasivector product (r X p) is constant, we can divide el, into two parts: one is independent of (r X p) and the other is associated with (r X p). From now on we will investigate the independent-of-(r X p) part of eZn(n= 2,3,4,5); the associated-with(r X p) part of e2,, will be discussed in Section V. For computational simplicity, we will transform physical quantities into corresponding dimensionless ones. This transformation is symbolized by following expressions, i.e., the former physical quantity -+ the latter dimensionless quantity.
r/a
+r,
Jn
V1/2
+
J,(n
z/a
=
+z,
0,1,2,3,4),
d/d( z/a)
-+
d/dz,
'n
v1/2 + I ,
(n
=
p/V'/'
+
p,
0,1,2,3,4,5), (16)
where a is a characteristic length and Bo is the maximum value of the axial distribution of magnetic induction in the magnetic lens, while k 2 is a dimensionless magnetic lens-strength parameter (Glaser, 1952). Magnetic field distribution functions b , L, M , N , L,, J,, and I,, are functions of the dimensionless z coordinate, and thus the prime denotes the derivative with respect to 2.
1. Dimensionless Expansions of H,, H4, and e4 H2 = M ( r - r ) + N ( p . p )
( 17)
368
JIYE XIMEN
2. Dimensionless Expansions of H6 and e6
L,
1
=
L,
-(3k4b4 - k’bb”), 16
=
3 -k2b2, 16
Lo
1
=
16.
(23)
3. Dimensionless Expansions of Ha and
J
’
15 -k4b4 - 64 -
-
3 -k2bb”, 64
J
5 - -k2b2, -
4. Dimensionless Expansions of H I , and el,
32
J
-
-.5
128
(26)
369
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
+
+
1 kZbffb(6)+ k bb@' 20 x (12 x 16)2 2( 6 x 32)2
1
1
2( 6
X
k 2 (b ( 4 ) ) 2 , 32)2
12 X
I
-
3 -
35 -k6b6 128
35 I - -k4b4 - 128
1 kZpb(4)k2bb(6), ( 16)2 2(8 x 12)2
1
-
15 128
- -k4b3btf -
5 -k2bb", 128
+
1
16 X
3 k2bb'4' + k 2 (b)', 16 X 64 32
35 256
I, = - k 2 b 2 ,
7 - 256
I -
-.
(29)
111. GENERALIZED INTEGRATION TRANSFORMATION ON EIKONALS INDEPENDENT OF (I' X
p) IN
MAGNETIC
LENSES
In the present section, we will perform a generalized integration transformation on eikonals and thus obtain a set of different-order normalized eikonals, which are position dependent and momentum independent. This powerful technique was first introduced by Seman in order to simplify third-order aberration coefficients in electrostatic and magnetic round lenses (Seman, 1955, 1958; also Hawkes and Kasper, 1989). Afterward, this technique was also applied to simplify third-order aberration coefficients in orthogonal systems consisting of suitably orientated magnetic and electrostatic quadrupoles and round electrostatic lens (Hawkes, 1966/67; also Hawkes and Kasper, 1989). We will utilize this transformation tech-
370
JlYE XIMEN
nique for greatly simplifying up to the tenth-order eikonals in rotationally symmetrical pure magnetic systems. The basic integration transformation on eikonals can be expressed as follows:
Properly choosing the integration factor T2",we obtain a series of normalized eikonals, which are position dependent and momentum independent. 5
5
5
(31) n=2
n=2
By comparing Eq. (30) with Eq. (31), the integration transformation can be expressed by
+ H8 + H l o ) - (T4+ T6 + Ta+ TI")' = s 4 r 4 + S6r6 + Sar8+ S,,r'O.
( H4 + H6
(32)
According to classic mechanics (Arnold, 1978; Goldstein, 1980), the canonical equation is the electron's trajectory equation.
r'
=
dH
-,
p'=
dH --*
(33)
dr
dP
In the ultrahigh-order aberration calculations, the key point is that the derivatives r' and p' must retain necessary high-order terms in the total derivative [T2,,l.Therefore, we get:
r'
dH = -=
dP
d
-(H2 dP
+ H4 + H6 + H , +
dH d p' = - - = - -( H2 dr dr
a s . ) ,
+ H4 + H6 + H8+
-.*
).
(34)
Obviously, in the third-order aberration calculations (Seman, 1955, 1958; Hawkes, 1966/67; Hawkes and Kasper, 19891, the derivatives r' and p' are
371
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
simply given by the following Gaussian trajectory equation: d
r'
=
-( H 2 ) = p, dP
(35)
For formulary simplicity, scalar expressions will be utilized instead of vector ones; for example,
(r r)" = r Z n ,
(36)
( p * p)" = p2".
By using Eqs. (32)-(34), the total derivative of the integration factor can expressed by: dT2n + -p'
aT2n [T2,y = ?r'
3P
dT2,
dT2, d H
dT2, dH ap d r
dT2,
+ -= -- - -- +-. dr
dZ
ap
dZ
(37) We introduce a Poisson bracket (Arnold, 1978; Goldstein, 1980; Ximen, 19911, which can be equivalently expressed by a determinant:
[T2,,H]=
dT2, dH d T 2 , dH -- - -- = d r ap ap ar
-
- I
Thus Eq. (37) can be rewritten as follows:
Therefore, Eq. (32) can be expressed in terms of a determinant: (H4+H6+H8+H10)
d(T4
+
**.
dr
-dH2 + dr -
d(
d ( H4
dP
+
+H8)
dr
+T l o )
T4 + dz
a( T4 + ... + Tl0)
+Tlo)
=
s4r4
-dH2 + dr
d ( H4
+
**-
dr
+ S6r6 + s8r8+ S,,r'O.
+Hg)
312
JIYE XIMEN
Equation (40) can be classified into four different-order normalized Hamiltonian equations:
H4
T4
-
dz
- S4r4,
T8
dz
- S8r8, (43)
In the conventional third-order aberration calculations (Seman, 1955, 1958; Hawkes, 1966/67; Hawkes and Kasper, 19891, the first-order derivative equation (35) and the fourth-order normalized Hamiltonian equation (41) are universally utilized. It is the author’s contribution that the ultrahigh-order derivative equation (34) and the ultrahigh-order Poisson brackets [T2,,,H,,],see Eqs. (4214441, have been introduced into the generalized integration transformation on eikonals for deriving ultrahigh-order canonical aberrations. In the remaining paragraphs, we will use the computer software MATHEMATICA to derive T,, and S2,,. However, laborious derivations will be omitted; only final results will be provided. For mathematical simplicity, we will define a 3-term polynomial P3, a 6-term polynomial p6, and a 12-term polynomial P,, as follows: P3( q4,q5,q6)
+ q5k2bb”+ q6k2br2,
= q4k4b4
(45)
373
ULTRAHIGH-ORDER CANONICAL ABERRATIONS P 6 ( q 7 7 ~ 8 7 q 9 7 ~ 1 0 7 q 1 1 , q 1 2 ) =q7k6b6 +q8k4b3b" +qgk 4 b 2 ~ 2
+ q10k2bb(4) + q11k2(b")'+ qI2k2b'b", (46) p l 2 ~ ~ l 3 ~ ~ l 4 ~ ~ l S ~ ~ l 6 ~ ~ l 7 ~ ~ l 8 ~ ~ l 9 ~ ~ 2 O ~ ~ 2 l 7 ~ 2 2 7 ~ 2
+ ql4k6b5b"+ ql,k6b4bt2+ q16k4b2(btt)2 + q17k4b3b(4) + qI8k4b2b'b"+ q1,k4bbt2b"+ qZ0k4(b'), + q21k2b"b(4) + qZ2k2bb@) + q23k2b'b'5'+ qZ4k2(b''')2, (47)
= qI3k8b8
where qi(i = 4,5,6,7,8,.. . ,13,14,. . .,24) are dummy constant coefficients to be specified. A. Normalized Fourth-OrderEikonal in Terms of T4 and S,
According to conventional integration transformation (Seman, 1955, 1958; Hawkes, 1966/67; Hawkes and Kasper, 1989), we obtain: 1 T4 = -p3r 8
5
+ -Mpr3 12
5 - -M'r4 48
1
= -p3r
8
5
+ -k2b2pr3 24
-
5 -k2bb'r4. 48 (48)
By substituting Eqs. (19), (20) into Eq. (411, we obtain the fourth-order normalized function S,:
5 12
1
S - - L + -Mk2b2
, - 4
5 + -MIf 48 €4 =
=
1 5 -k4b4 + -k2p2 3 48
-
1 -k2bb", 48
[ S d 4 d~ + [T,]:,.
(49) (50)
B. Normalized Sixth-OrderEikonal in Terms of T6 and S,
In Eq. (421, the modified sixth-order Hamiltonian function is defined as: ( h h ) 6= H6 - [T,, H , ] L33
=
5 -k2bb', 24
L
+ L2,p2r4 + Ll,prs + LO6r6,(51)
= L3,p3r3
1 1 LZ4 = -k4b4 - -k2bb" , 6 4
1 1 - -k6b6 - -k4b3b" 06-6 6
+ -k
128
2( b
)2
+
5 Lls = -k4b3b', 24 1 -k2bb(,', (52) 192
374
JIYE XIMEN
T6 =
5 -M‘p2r4 96
+ P3( A,,
A,, A6)prS
C. Normalized Eighth-OrderEikonal in Terms of T, and S ,
In Eq. (431, the modified eighth-order Hamiltonian function is defined as: (hh)8
= H8
-
tT4, H6i
- LT6,
H4I
+ J6,p6r2 + J5,p5r3+ J4,p4r4+ J35p3rS + J2,p2r6 + J,,pr7 + Jo8r8,
= J,,pa
(59)
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
1 J B o =-128 ’
J 62
-
1 32
- -k2b2,
5 JS3 = -k2bb’, 96
375
376
JIYE XIMEN
According to Eq. (431, constant coefficients p can be specified:
7
11 p2=-, 12
C L 3 = - 3
cL4=
--7
19 640
Substituting Eq. (63) into Eqs. (61) and (621, we obtain: 1 T8 = - - p 7 r 128
-
11 7 -k 2b2p5r3+ -k2bb’p4r4 384 256
383 + ( =k4b3b‘
23 k2bHf’)p2r6 17 k2b’b” - --
121 + ( =k6b6
- -k 4 b 2 ~ 2- 80640 k b3b”
9
+ -k2( 2240
2560
4608
1783
689 26880
b”)2+
67
89 k bb(‘))pr7
k4b’b” + 32256 40320
689 391 k 2 b b +-107520 k 4 b ( b ’ ) 3- 322560
377
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
S
=
8
1 -k8b8 7
-
4409 20160
-k6bsb”
3113
1089 b t ) 2+ -k4b2( 17920
- -k6b4( 20160
b”)2
4559 419 871 k4b3b(4)+ -k462b’bf” - -k b ( b’ ) b” +-322560 64512 107520 689 479 k4( b f ) 4+ -k2bttb(4) 107520 430080
--
+
61 258048
-k
bb(6)
193 391 k2b’b‘S) + +-215040 322560 k 2 (bm)2, =
‘8
:/ S8r8
dZ
(65)
+ [ T8]za.
(66)
D. Normalized Tenth-OrderEikonal in Terms of T,, and S,,
In Eq. (441,the modified tenth-order Hamiltonian function is defined as: = HIO
(hh)10
-
+ 1
-128 ’
=
I
Iss
I 82 11
46
27 2560
961 7680
- -k6b6 - 320
1603 13440
+ -k4b2u2
=
(67)
1 --k2bb‘ 384
bb”
59 7680
- -k2bb“’ +
2917 23040
-k4b3b” 23 9216
67 11520
- -k2b’bf” - -k2bb(4)
689 26880
-k‘b’b’ - -k4b(b‘)3 391
I,,
85 +k 1536
- -k2b’b”
17 5120
=
5 2b2, 128
- -k
1536
- -kZ(b”)’
I,,
H4]
+ -k 2 b f 2
13 -k4b3b’ 1920
3
I
=
31
- --k4b4 - 192 =
- [T8,
H6i
+ ~,,p’r~ + z7,p7r3 + ~,,p‘r‘ + ~,,p~r’ + z,,p3r7 + 128p2r8+ ~ , , p r +~ Iolorlo
= I,,,p’o
I,,,
- fT6,
[T49 H 8 ]
713 +-80640k2b”b” + -k2b’b(4) 322560
53 2016
- -k4b3bf” +
89
4307 26880
- -k b2b’b”
k2bb(”, 64512
378
JIYE XIMEN
649 1581 6 4 12 719 -k8b8 + k b b -k b5bt1 4480 17920 10752 33 37 9 + -k4b2( b”)’ + -k4b3b(4) -k 4bb’ 2bll 1792 10752 2560 67 4 2 lbllt - 7 7 -k2b“b(4) k2bb@), 13440 3072 18432 937 28733 3001 k6 b5blll k6b4&bt1 I,, = -k8b7b113440 161280 161280 689 187 383 k4b2b“b“ + -k4bbt( b“)2 - -k6b3( b1)3+ 26880 10752 15360 821 89 k4b2btb(4) + -k4b3b(5), 107520 64512 1471 899 27079 kloblo - -k8b6(b1)’ -1010 = 13440 53760 161280k 8b7b” 899 3293 67 6b4( b”)’ + -k6b4btb(3) + -k6b3(b’)2b” + -k 53760 46080 80640 937 1 + -k6656(4) - -k4( b’)’( b”)’ 107520 2048 417 67 1 - -k4b( - -k4bb’b”b” - -k4b( b’)’b(4) 71680 80640 3072 191 4 2 I1 ( 4 ) 13 1 -k b b b -k4b3b(6)+ -k 2 (b(4))2 35840 55296 73728 1 1 kZb”b(6)+ 73728 737280 k2bb@), (68) I,
=
-
-
-
+-
+-
-
379
ULTRAHIGH-ORDERCANONICAL ABERRATIONS
1 40 specified: According to Eq. (44)’ constant coefficients v can be 14 9 1
-[ k2b2Z3,]’. (70)
v2=--,
v5 =
v4*
u3=--
29
v7=
--
15360 ’ 97
--
3840 ’
2240 ’
161280 ’
1920 ’
39
-
2560 ’ 599
1
Ul0 =
v6 =
u5* = vg =
v11 =
16 ’
1
1391
=
v4=--
40 ’
-
40320 ’ 1
--
v6*
1
=
512 ’ 41
v9=
--5376 ’ 1
v12 =
-80640 ’
53760 ’ 7649 23983 Y15 = -v14 = - v13 = 725760 ’ 725760 ’ 40320 ’ 4693 2123 1699 v18 = U16 = 387072 ’ = 1935360 ’ 967680 ’ 10957 31 545 v20 = v21 = v19 = 1548288 ’ 215040 ’ 1935360 ’ 79 1423 193 u23 = V24 = vz2 = 1161216. 3870720 ’ 3224320 ’ (71) 67
380
JIYE XIMEN
Substituting Eq. (71) into Eqs. (69) and (70), we obtain: T,,
=
T,,p9r + Tl3p7r3+ T6,p6r4+ Tssp5rS+ T4,p4r6+ T3,p3r7
+ T,p2r8 + T,,pr9 + Tolorlo,
(72)
where coefficients T91, T,3,...,T,,, Toloare derived as follows: T91
=--
1
128 '
7
T73 =
- -k2b2,
192 9 TM = -k2bb', 512 1 1 29 T - --k4b4 + -k2H2 + -k2bb", " - 16 1920 3840 1391 49 13 T4 = -k4b3b' - -k2b'b" - -k2bb"', 23040 15360 5120 97 41 599 T37 = - m k 6 b 6 - -k4b2U2 + 40320 k 4b3b" 5376 1 1 1 - -k2( btt)2 - -k 2 H b + k bb@), 53760 80640 161280 11701 2719 197 k6bsb' - k4b2b'b" - -k4636" T, = 161280 107520 30720 93 79 k 2 bll blll - -k4b(bt)3 + 71680 129024 143 443 + -k2b'b(4) + 2580480 k2bb(5), 5 16096 67 7649 23983 T,, = -k8b8 -k6bSb" - k6b4(b')2 40320 725760 725760 4693 2123 1699 k 4 b 2 w b +-967680 k 4 b 2 (b")' + 1935360 k4b3bC4)+ 387072 10957 31 4 H 4 545 k2vb(4) + 1935360 k 4 b ( b t ) 2 b t+t 215040k ( ) - 1548288 1423 193 79 kZbb(6)k2b'b'5' - 1161216 k2(b ) 2 , 23224320 3870720
381
ULTRAHIGH-ORDERCANONICAL ABERRATIONS
4433
5417
To10
=
1733
k6b5blll k6b4b’b“ - 268800 kgb7b’ - 403200 829440
631 k4bb‘(b“)2 1612800 1973 2423 12073 + 38707200k4b2b’b(4)+ 38707200 kdbJb(5)- 19353600k4(b’)3b” 27947 1423 k4b(b’)2b” 232243200 k2bb(7) 19353600 2267 2581 1037 k2bb(4), kZb”b(5)+ 46448640 232243200 b’b(6) 25804800 (73) 1 86563 180163 75097 k6b2(b’)4 S = -k’oblo - k8b6(b’)’ - =k8b7b“ 2419200 9 453600 34039 211213 +-967680 k6b3(b‘)2btr+ k6b4(b”)2+ 93499 k6 b4 Ub(3) 2419200 7257600 667 711 34549 k”sb(4) + 4 (b ’ 12 ( b ” ) 2+ 322560k4(b ’ ) 3 b + 2903040 k 716800 2861 893 1093 k4bb‘b”b + 1075200k4b(b’)2b(4) -k4b(b”)3+ 2419200 460800 59851 133 k4b2b’b(5) 30239 k4b2b”b(4)- k4b2( b‘fl)258060800 9676800 460800 1637 5167 10421 k 2 b(5) 2 b(4) k4b3b(6)29030400 46448640 ( )’ - 58060800 143 - 313 k2b”b(6)k2b’b(7)277 k2bb@), (74) 8294400 8294400 58060800 301783
2921
+ 29030400k6b3(btl3+ 6451200k4b2bllblll
+
+
+
+
~~~
(75)
Iv.
CANONICAL hERRATIONS U P TO THE NINTH-ORDER APPROXIMATION IN MAGNETIC LENSES
In Section 111, by performing a generalized integration transformation on eikonals, we have succeeded in deriving a set of different-order normalized eikonals, which are position dependent and momentum independent.
382
JIYE XIMEN
These normalized eikonals greatly facilitate deriving up to the ninth-order canonical aberrations in rotationally symmetrical pure magnetic systems. In the first-order approximation, substituting H , into Eq. (331, we obtain the first-order Gaussian trajectory equation, i.e., Eq. (35). By solving that equation and using the convention of dimensionless notations, i.e., Eq. (161, the first-order trajectory and momentum can be expressed as follows: rg = rpra
+ rupa,
pg = rhra
+ rLPa,
(76)
where r,, ra are well-known particular solutions to Gaussian trajectory equation, which satisfy following initial conditions at 2,. r a ( z a )= 0,
rL(za) = 1,
r p ( z a )= 1,
r ; ( z a ) = 0. (77)
Meanwhile the Wronsky determinant is given by (r;rp - rurh) = 1.
t 78)
In Eq. (761, the subscript g indicates the first-order Gaussian quantity. Since the first-order Gaussian quantities rg and pg can be tacitly understood, the subscript g will be omitted hereafter. According to canonical aberration theory, knowing eikonal functions e2,. . ., e10 enables us to calculate both intrinsic and combined aberrations in up to the ninth-order approximation by means of a gradient operation on corresponding eikonal function (Ximen, 1991, 1995).
where
eaZ and
T,, are defined by
T = T4
TaL= [TI:,,
+ T6 + T, + T10.
(81)
By utilizing Eq. (76), gradient operators are given as follows: d
d
a
- =rs- +rh-, 'P dra dr
d
a
d
-- -rU--+rL-. aPa dr 'P
(82)
383
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
Because only a position parameter but no momentum parameter appears in the normalized eikonal integral eaz, the gradient operator d / d p in Eq. (82) onto eaz will vanish.
as la -dz + la z
Ar
=
-ra
Ap
=
- r A j za
rp
dS
rp
dr
as
z
r,
dr
dz
dS
z
r p - d z + r i / z n r a - d zd r dr d
1
+ ri - (dT a z ) . aPa
By means of Eq. (781, gradient operations onto Taz can be simplified:
a
d
(84)
Canonical aberrations A r and A p at any observation plane z can be rewritten:
z
A P = -rh/
as
rp-dz+rb za dr
zI:[
dS
za
.
(85)
At Gaussian image plane z b , we obtain that r,(zb) = 0 and r p ( z b )= m . Thus canonical aberrations A d z b ) and A p ( z b ) at Gaussian image plane z b can be derived:
Therefore, the most important advantage of the generalized integration transformation is that canonical aberrations simply depend on gradients
384
JlYE XIMEN
d S / d r , d T / d p , d T / d r , and that the gradient d S / d p is identically vanishing. The canonical aberrations can be solved by the successive approximation procedures (Ximen, 1991, 1995). rltotalexactvalue = r
+ Ar3 + Ar5 + Ar7 + At99
pltotalexactvalue = p + Ap3
+ ApS + Ap7 + Ap99
(87)
where ( r , p ) is first-order position and momentum for the Gaussian electron trajectory, while (Ar3,Ap,), (Ar5,Ap5), (At-?,Ap,), ( A r 9 ,A p 9 ) are third-, fifth-, seventh-, and ninth-order position and momentum aberrations, respectively. Taking consideration of Eq. (801, we obtain different-order gradients:
-as4 - r - 4S4( r dr
dS6r6 -=
dr
+ Ar3 + Ar, +
6s6(r
+ Ar, + Ar5)’,
dS,r8 - 8S8( r dr
--
+ Ar3)?,
dS,orlo - 10s,,(r)~. dr
--
According to Eq. (go), the gradient a S / d r can be derived by summing up all terms in the above-mentioned Eq. (88). dS
= 4r3S, + 6[2S,(Ar3)rz+ S 6 r 5 ] dr
+ 2[6S4(Ar,)’r + 6S4(A r 5 ) r z+
3rd- and 5th-order terms Ar3)r4 + 4S,r7] 7th-order terms
+ ( 4 S 4 [( Ar3)3+ 6( Ar3)(Ar5)r + 3( Ar,)r2] +30S6[2(Ar,)’r3 + ( A r 5 ) r 4 ]+ 56S8(Ar3)r6+ lOSI0r9). 9th-order terms (89)
385
ULTRAHIGH-ORDERCANONICAL ABERRATIONS
Substituting Eq. (89) into Eq. (851, we obtain third-, fifth-, seventh-, and ninth-order position and momentum aberrations, respectively.
Ar5 = -r, l : r p { 6 [ 2 S 4 Ar,)r2 (
+ S6r’]} dz
+ rp[:r,{6[2S4(Ar3)r2 + S , r 5 ] } d z + Ap,
=
-rL /*rp{6[2S4(Ar3)r2+ S 6 r 5 ] dz } m
+ rb[1r,{6[2S4(Ar3)rz + S 6 r 5 ] dz } -
dT ’p
,
1115th
’1
-
,
(91)
IA5th
Ar7 = -ra”p12[6S4(Ar,)2r
+ 6S4(Ar,)rz
+ 15S6(Ar,)r4 + 4 S 8 r 7 ] )dz + r p L r , ( 2 [ 6 S 4 ( A r , ) 2 r+ 6S4(Ar5)r2 + 15S6(Ar3)r4+ 4S,r7]) dz + Ap7 = - r L / z r p ( 2 [ 6 S 4 ( A r 3 ) 2 r+ 6S4(Ar,)r2 za
+ 15S,( Ar3)r4+ 4 S 8 r 7 ] )dz + r;Lr,(2[6S4(Ar3)2r + 6 S 4 ( A r 5 ) r z
[$1’
I
za 7th
,
386 Ar,
JIYE XIMEN =
-ra/:rp(4S4[(Ar,)’
+ 6(Ar3)(Ar5)r+ 3(Ar7)r2]
+30S6[2(Ar,)’r’
+ (Ar5)r4]
+56S8(Ar3)r6 + 10S,,r9) dz
+ rpLrm(4S4[(Ard3 + 6(Ard(ArS)r + 3(Ar7)r2] +30S6[2(Ar3)2r3+ (Ar5)r4] + 56S8(Ar,)r6 + 10S,,r9) dz
[GI~J,~; dT
+ Ap,
=
+ 6(Ar3)(Ar5)r+ 3(Ar,)r2] +30S6[2(Ar3)’r3+ (Ar5)r4] +56S,( Ar3)r6+ 10S,,r9) dz
-rL /’rp(4S4[(Ar3)’ za
+ rbllra(4S4[(Ar3)3+ 6(Ar3)(Ar5)r+ 3(Ar7)r2] +30S,[2(A1-,)~r’+ (Ar5)r4] +56S8(Ar,)r6
+ 10S10r9)dz -
[:]:j,t;
(93)
In Eqs. (90)-(931, the braces ( } indicate that the entry must assume its Gaussian value and lower order aberration values. The primary third-order aberrations in Eq. (90) exactly agree with those presented in the literature (Glaser, 1952; Seman, 1958; Ximen, 1983, 1986; Hawkes and Kasper, 1989). The higher and ultrahigh-order aberrations in Eqs. (91)-(93) essentially coincide with those provided in the author’s previous chapter (Ximen, 1993, but the present results are much far more compact and usable than the original ones (Ximen, 1995). It is to be emphasized that Eqs. (91)-(93) describe not only intrinsic aberrations (e.g., S6r5,S8r7,Sl0r9 appear in the integrands) but also combined aberrations (e.g., lower order aberrations Ar,, Ar5,Ar7 exist in the integrands). Finally, we will discuss how to deal with the gradients aT/dp, dT/dr at the boundary planes (za, z ) or at the object and image planes ( z a ,2,). The total integration factor T = T4 + T6 + T8 + T I , is a very complicated function with respect to z, r, and p-see Eqs. (48), (561, (641, (72)-but it only assumes its boundary values and has nothing to do with laborious
387
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
integrations with respect to z. In principle, all integration factors T4,T6, Ta,T I , can be calculated numerically. However, if the boundary planes ( z a ,z ) or the object and image planes (zo, z,) are assumed to be in field-free space, where 6 = 6' = 6" = = 6") = 0, then the field-free total integration factor T* can be very much simplified as follows:
[
T,*,= - p r
-
- pl 7 r 128
-
- p19 r 128
1:.
1
=
1
[( i P 3EP7 -
-
1
128'
)r]:a'
(94)
By procedures similar to those for deriving d S / d r in Eqs. (88)-(89), we can derive gradients d T * / d p and - d T * / d r at boundary planes (zo, z). dT* 3 3 3rd- and 5th-order terms -= - p 2 r + 7(A p 3 ) p r JP 8 7 3 3 - -p6r -(Ap3)'r + - ( A p 5 ) p r 128 8 4
[
+
+
[
+
3
3
+ z( Ar3)( A p 3 ) p + i( Ars)p2] +
[
7th-order terms
21 64
9
- -( A p 3 ) p 5 r
- -par
128
3
3
7
+ - p P 3 ) ( A P s ) r + q ( A P 7 ) P r - ,,,(Ar3)p6 3
3
+ p 3 ) ( A P 3 ) 2+ q ( A r s ) ( A P s ) P
3
3
+ q ( A r s ) ( A p 3 ) p+ i ( A r 7 ) p 2 ] , dT* --=
dr
+
[ [
[8
(Ap3)p2]
-
3 i(Ap3)2p
3
1 - 8p3 -
1 Ep7
1 + -P9 128
7th-order terms
-
-
3 q(AP3)(APs)P
1 3 (AP~) ~, ( A p 7 ) p 2 ] . 8
--
(95)
3rd- and 5th-order terms
7
+ --(AP3)P6 128
9th-order terms
9th-order terms (96)
388
JIYE XIMEN
Substituting Eqs. ( 9 9 4 9 6 ) into Eqs. (9014931, we obtain field-free boundary values of d T * / d p , - d T * / d r in different-order aberrations.
Combining Eqs. (901493) with Eqs. (97)-(1001, we obtain third-, fifth-, seventh-, and ninth-order canonical position and momentum aberrations in concise and explicit form.
389
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
v. GENERALIZED b"EGRATI0N TRANSFORMATION OF EIKONALS ASSOCIATED WITH (r X
p) IN MAGNETIC LENSES
In this section, we will investigate the associated-with-(r X p) part of H 2 , and e2Jn = 2,3,4,5); meanwhile the product (r X p) can be treated as a constant in an arbitrary-order approximation. There is no need to perform the integration transformation for different powers of the pure product (r X p) in H 2 n ,which can be collected as follows:
H r x p= K(r
X
p)2 + N3(r X p)'
+ JN3(rX
+ ZN3(rX P ) ~ .(101)
According to Eqs. (714141, we may rewrite H , J n = 2,3,4,5) in order to indicate explicitly and symbolically their relationships with respect to the product (r x p), but neglecting pure (r x p) terms. H 4 = H40 H6 = H60
+ H42(r
H8 = HE, + H62(r HIO
=
+ H82(r
X
+ H22(r
( 102)
P),
P) + H24(r
P)2,
P) + H44(rX P ) + ~ H26(r
P) + H64(r
PI2 + H46(r
(103) P),,
(104)
PI3 + H28(r
PI4*
X
(105)
In comparison with Eqs. (7)-(14), one can derive above-mentioned Hamiltonian functions: (i) H?. is defined by field-distribution functions L, M ,N; H22 defined by P, Q. (11) H60 defined by L,, L,, L,, Lo; H 4 2 defined by M,, M , , M,;H24 defined by N , , N,. (iii) HE, defined by J 4 , J,, J 2 , J , , J,; H62 defined by J L 3 , J L 2 ,JL,, JLo; H44defined by J M i , J M 2 ,J M 3 ; H26 defined by J N l ,J N 2 . (iv) HI,, defined by I,,&, I,, I,, I,, I,; HE2 defined by IJ4,IJ39 I J 2 9 I,(); H64 defined by I L 3 , I L 2 , I L 1 , ILO; H46 defined by ZMl,IM2,Z,,,,; H28 defined by I N 1 ,I N 2 . In the Appendix, we will list the above-mentioned field-distribution functions other than those presented in Eqs. (201, (23), (26), (29) (Ximen, 1995). Correspondingly, integration factors T2, and qj can be defined by: T4 T6
= T60
T~ = T~~+ ~ =
+ T82(r
= T40
+ T22(r
( 106)
P),
+ T42(r P) + T24(r P)', ( 107) ~ x p) ~ + (~ r~ x p)' ~ +( ~ r ~x p13, ~ ( (108) r
PI + T64(r
P>2 + T46(r
PI3 +
T28(r
PI4* ( 109)
By mathematical procedures similar to those shown in Section 111, i.e., substituting Eqs. (102)-(109) into Eqs. (41)-(44), we can derive a series of equations for calculating both Tiand S i j (i, j = 2,4,6,8).
390
JIYE XIMEN
A. Normalized Fourth-OrderHamiltonian Function in Terms of T40
9
T22 9 s40
9
s22
Obviously, Ta and S4, exactly coincide with T4 and S4 as presented in Section II1,A. Moreover, we can derive dimensionless H , from Eqs. (7) and (8): = Pr2
H,
p
=
+ Qp2,
1 1 -k3b3 -kb” 2 8 ’
Q =
1 -kb. 2
(110)
Consequently, one can derive T, and S,, by the following equation:
H22 -
1 1 T2, = ?kbrp - -kb’r2, 4
S,
1 + -kb” 8
= k3b3
(112)
B. Normalized Sixth-OrderHamiltonian Function in Terms of T60 9 T42
9
T24
s60
s42
9
s24
Obviously, Tm and S,, exactly coincide with T6 and Section 1II.B.
aT24 aT24 -
dr
dp
dH, dr
dH2 ap
-
s 6
as presented in
391
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
C. Normalized Eighth-OrderHamiltonian Function in Terms of T80 9 T62
7
T44
7
T26
7
'80
7
'62
7
s44
7
s26
Obviously, T,, and S,, exactly coixide with T, and S, as presented in Section II1,C. H62
- [ T 2 2 9 H601 - iT407 H 4 2 1 -
-
dr
dp
dH2 -
dH2 -
ar
aT,
-dz
H221 - [ q 2 9 H 4 ~ I
= S,r4,
dp aT26 aT26 -
ar dH2 dr
ap dH2 ap
a
- - =T26
dz
S2,r2.
D. Normalized Tenth-Order Hamiltonian Function in Terms of T I O O , T 8 2 , T647 T467 T28, '1007
'827 '649 '467 ' 2 ,
Obviously, TI,, and S,,, exactly coincide with T I , and S,, as presented in Section II1,D.
- [T22, -[T80,
- [T407 H221
- [T629
H621 H401
- iT607 H4,1
-
[T429 H601
392
aT64 aT64 -
ar aH2 ar
ap aH2 ap
- - aT, =
dz
s64r6,
(119)
aT46 aT46 -
ar
aH2 dr
ap dH2 dp
a T46 az
- s46r4,
In summary, we may classify generalized integration transformation into five groups: (i) TZ, S22;TZ4,S24; T26, s26; T28, satisfy the integration transformation similar to that shown in Eqs. (111)-(112). (ii) Tm,S40; T42, S42;T44,S44;T46,s 4 6 satisfy the similar integration transformation as shown in Section III,A. (iii) Tho, s60; T62, S62; Ta4,S,, satisfy the integration transformation similar to that shown in Section II1,B. (iv) TEo,SEo;T82, s 8 2 satisfy the integration transformation similar to that shown in Section III,C. (v) Tlo0,S,,, satisfy exactly the same integration transformation for TI,, S,, as shown in Section III,D. Therefore, in principle, one can calculate both intrinsic and combined aberrations in up to the ninth-order approximation, including isotropic aberrations containing the zero or even power of the product (r x p), and anisotropic aberrations containing the odd power of the product (r X p).
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
393
VI. EIKONAL INTEGRATION TRANSFORMATION IN GLASER’S BELL-SHAPED MAGNETIC FIELD In Glaser’s bell-shaped magnetic field (Glaser, 1952; Ximen 1983, 1986; Hawkes and Kasper, 19891, the axial distribution of magnetic induction can be expressed by an analytical formula: B(z) =
BO
1
+ (2/a)2 ’
where Bo is the maximum value of the axial magnetic induction at the center of the lens ( 2 = 01, a is the half-width of the magnetic field. Glaser’s bell-shaped magnetic field is a very important theoretical model, because not only can its Gaussian trajectory equation be solved analytically, but also its primary-third-order aberrations can be exactly expressed by analytical formulae (Glaser, 1952). Based on the results provided in the present study, one can confidently conclude that higher and ultrahigh-order aberrations in Glaser’s bell-shaped magnetic field can also be completely expressed in analytical formulae. In fact, by using the convention of dimensionless notations, i.e., Eq. (16)’ the dimensionless axial distribution of magnetic induction b ( z ) and its derivatives can be derived as follows: 1
I
b(2) = 1 +z2’ 22
b ’ ( z )= (1
+ z2)2 ’
394
JIYE XIMEN
b‘6’(z )
4 6 0 8 0 ~ ~ 5 7 6 0 0 ~ ~ 172802’
=
(1 bC”( z )
-
+ z2)7
(1
+
6 ’
6 4 5 1 2 0 ~ ~ 9676802’
= -
+ zz))”
(1 b@’(z )
=
+
(1
(1
z2)
+ z2)7
-
720
-
+ 2)’
(1
403200~~ (1
+z2)
6 +
+
z2)4’
403202 (1
+ z2)5
’
10321920~~ 18063360~~ 9 6 7 6 8 0 0 ~ ~
-
(1
-
1612800~’
(1
+
(1
+z2)9
+
+z2)6
+
Z’f
+ 2)’
(1
40320
(1
+z2)5’
In a bell-shaped magnetic field, the Gaussian trajectory equation can be derived from Eq. (35): r’ = p ,
p’
rff+
k’
-k2b2r,
=
(1 + z ’ )
’ r = 0,
where k’ is a dimensionless lens-strength parameter (Glaser, 1952). Substituting z
=
cp = arcctg z,
ctg cp,
z,
=
ctg Q,,
cp, =
arcctg z , (125)
and defining w = (1
+ k’)l’*
one can obtain two particular solutions of Eq. (124): rJz) rs( z )
=
sin cp,
1
rL(z) = rb( z )
=
w
sin Q,
sin cp,[
w
cos P,
+-[--0cos 0
=
-
1 sin w(cp w sin Q, sin Q
sin cp
cos w ( cp
p,)
Q,) 9
( cp - q,) +-cos pa sin wsin cp
9
0
[ - w cos w ( cp - (p,) sin cp + sin w ( cp - 9,) cos cp] ,
sin w ( cp - 9,) sin cp w ( cp
+ cos o(Q - cp,)
- pa) sin cp
cos ‘91
+ sin w( cp - q,) cos c p ] . (127)
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
395
Obviously, particular solutions r&), rs ( z ) satisfy initial conditions Eq. (77) and Wronsky determinant Eq. (78). By using Eq. (761, the first-order trajectory and momentum can be expressed as follows: rg = rsra + rapa,
p g = rhr,
+ rLpa.
( 128)
It is to be noted that, substituting Eq. (123) into Eqs. (491, (571, (65), and (741, we obtain field distribution functions S,, s6, s8, S,, in correspondingorder normalized eikonals.
s4 =
k’ 24(1
k’ s6 =
1440( 1
+ z’)
+ 2’) ( -201
4(1
+ 8k2 + 7z2),
+ 740k’ + 288k4
+3510z2 - 2769k2z2- 2769z4), kZ s8 =
40320(1
+ z’)
( -9018
(130)
+ 23478k’ + 17636k4+ 5760k6
+252396z2 - 220449k’~’ - 77812k4z2
+
+
- 5 4 5 6 3 4 ~ ~ 174819k2z4 160632z6), (131)
s,,
k’ =
3628800( 1 + z ’ ) l 0 X(
-968895
+ 2195451k’ + 2303748k4 + 1441304k6+ 403200k’
+41960700z2 - 37379520k2z2- 23633514k4z2- 7093928k6z2
+
+ 22493010k4z4
- 174877650~~76761495k2z4
+ 147583980~~ - 26072910k2z6- 228903752’).
(132)
Evidently, by substituting Eqs. (12914132) into Eqs. (9014931, higher and ultrahigh-order position and momentum aberrations in Glaser’s bell-shaped magnetic field can be completely calculated and expressed in analytical formulae, From Eqs. (129)-(132), it is to be seen that field-distribution functions S,, decrease rapidly with increase of the number n(n = 2,3,4,5). Therefore, weights of higher ad ultrahigh-order aberrations with respect to the total aberration decrease remarkably. It is expected that these theoretical results are useful for estimating effects of ultrahigh-order aberrations in magnetic lenses.
396
JIYE XIMEN
VII. GENERALIZED hTEGRATION TRANSFORMATION ON EIKONAIS IN ELECTROSTATIC LENSES In the present section, a rotationally symmetric pure electrostatic system will be discussed. The Hamiltonian function H is defined and expanded as follows (Glaser, 1952; Sturrock, 1955; Hawkes and Kasper, 1989; Ximen 1990a, b, 1991, 1995):
H = -{4-(P-P)p2,
+
H = H , + H , + H4 H6 + + H,, + . (133) In order to establish the canonical aberration theory in up to the tenthorder approximation, the electrostatic potential 4 is expanded into power series (Glaser, 1952; Ximen, 1983, 1986, 1995):
4
=
V(Z)
-a
- + a , ~ ( ~ ) (rl2r - a , ~ ( ~ )- (r13 r r14 - uloV1o)(r- r)’ + , (134)
, ~ ( z ) ( r r)
+ a,V’)(r 1
a2=q,
*
*.-
1
1 a4=64,
U6 =
36 X 64 ’
1 1 (135) ‘lo = 36 x 64 x 64 x 100 ’ 36 x 64 x 64 ’ where V ( z ) is the axial distribution of the electrostatic potential, and a, =
ff,
I
-j/1/2.
In a rotationally symmetric pure electrostatic system, an electron trajectory is not rotated by a magnetic field, thus there is no (r X p) term in the Hamiltonian function. Therefore, the Hamiltonian function can be simplified and expressed in physical units instead of in dimensional form. In order to describe the canonical aberration theory in up to the tenth-order approximation, we have to list all nonvanishing field-distribution functions with respect to H2 in Eq. (61, H4 in Eq. (71, H6 in Eq. (91, H8 in Eq. (11) and H , , in Eq. (13) as follows (Ximen, 1995):
ULTRAHIGH-ORDERCANONICAL, ABERRATIONS
I -
1
- 24576V5/2
397
( 105Vrr3- 45W”V‘4’ + 2V2V‘6’),
The Gaussian trajectory equation in an electrostatic system is given by: d d V” p’ = - - d (H 2 ) = - 2 M r = -r’ = - ( H , ) = p/~’/2, r 4V/’/2I-. dP ( 140)
398
JIYE XIMEN
In following paragraphs, we have further performed a generalized integration transformation on eikonals in a rotationally symmetric electrostatic system, and then derived a set of different-order normalized eikonals, which are position-dependent and momentum-independent. Thus we can also calculate intrinsic and combined aberrations by the same method as shown in Section IV. A. Normalized Fourth-OrderEikonal in Terms of T4 and S,
According to Eqs. (71, (40, and (136) we obtain: T4 = t 3 1 p 3 r+ t 2 , p 2 r 2+ t1,pr3 + tO4r4, 1
t31 =
8V’
t,,
V’
=
16V3/’ ’
ti3 =
(141)
1 V” V t 2 - 7)’ 32( V +
( 142)
t,=-
Obviously, these results coincide with those presented in the literature (Seman, 1955, 1958). B. Normalized Sixth-Order Eikonal in Terms of T, and S,
According to Eqs. (9), (421, and (137) we obtain:
+ tZ4p2r4+ t,,prS + tO6r6, 1 t,, = -( - 2VI2 + W ”,} 48V3
T6 = t4,p4r2+ t,,p’r3 V’ t42 =
--
t,,
-(210V4 - 33OW”V”
=
32V5/’ ’
1 7680V4
(144)
+ 84V’V”’
+39V2V’V‘3’ - 26V3V4’}, t,
=
1 (840V” - 2040W’3V” 948V2V’V’’’ 46080V9/’ +408V2V’2V‘3’- 182V3V”V‘3’- 95V’V’V‘4’ + 26V4V”’}, (145)
+
399
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
s,
1
=
(756OVt6 - 22O50Wf4Vf‘+ 16080V2V’2V’f2 92160V ‘ ‘ I 2
+ 6120V2V‘3V(3’- 5853V3V’V’V(3’ +364V4(V(3))2- 1191V3v12V(4)+ 296V4VfV(4) - 1464V3V”’
+216V4V’V(’) - 32V5V(6)).
(146)
C. Normalized Eighth-Order Eikonal in Terms of T, and S , According to Eqs. (10, (431, and (138) we obtain:
T,
=
t,,
+ t62p6r2+ t,,p5r3 + t4,p4r4 + t3Sp3r5+ t2,p2r6+ tl,pr7 + to,r8,
t,,p’r
1 128V3 ’
= --
t,,
t44 =
t3’
=
=
I62 =
-
( 147)
5 V’ 256 V 7/2 ’
1 1536V4
- -( 2 1 v 2 + 5W“),
1 2048V9/2 ( -4OW‘V”
1 ( -840V4 61440V5
+ 9V2V3’},
+ 9 O W 2 V ” + 34V2V”’
+219V2V‘V‘,) - 46V3V‘4’), t26 =
1 (-756OV” 368640V “I2
+ 11760W’3V” - 549OV2V’V’‘
- 1065V2V’2V‘3’ + 1091V3Vf’V‘3)+ 249V3V’V‘4’ - 58V4V‘5’), 1 t17 =
5160960V
( -8316OVf6 - 1638O0W4V” - 95760V2V’2V’f2 + 11574V3Vf’, - 30975V2V’3V‘3’ + 31774V3V’V’fV(3) -2182V4( V(,))’ -672V4V’V(5)
+ 6435V3V’2V(4)- 5806V4V”V(4)
+ 556V5V@)),
400 t,
JIYE XIMEN =
1 4128768OVl3/’ X
( -49896OVf7 + 119448OW5V” - 804720V2V‘3V”2 + 119520V3V’V” - 287700V2V f 4F3’+ 338040V3V 2V”V‘3’ -40211V4V”2V(3)- 36138V4V’(V ( 3 ) ) 2+ 65400V3V’3V(4) -57873V4V/‘V“V4’+ 9330V5V”’V‘4’ - 7779V4V’2V‘5’ +3888V5V”V”’
s -
-
+ 2068VsV‘V‘6’ - 556V6V7’),
(148)
( - 6486480V” + 19792080W’6V“ - 18461520V2V’4V’”2+ 5186160V’V’2V”3
82575360V15/2
- 136800V4V”4- 4978260V2V’5V/‘3’
+ 7762860 V
V’ V“V 3 ) 2130135V4V’V” V ( 3 ) -856770V4Vf2( + 224392V5Vf’(V ( 3 ) ) 2
+962640V3V’4V(4)- 1236345V4V’2V‘rV(4) + 105744V5V‘f2V(4) 275184V5V‘V(3)V(4) -7404V6( V(4))2- 169695V4Vr3V(5)
+
+ 155838VsV’V”V‘s’ - 26436V6V‘3’V‘5’ +25122V5V”2V(6’- 4O88V6V”V6’ -4692V6VV‘7’
+ 832V7V8’).
( 149)
D. Normalized Tenth-Order Eikonal in Terms of T,, and S,, According to Eqs. (13), (44) and (139) we obtain:
+ t6,p6r4+ t S 5 p 5 r 5 + t4,p4r6+ t37p3r7+ t z a p 2 r 8+ t,,pr9 + tO1OrlO,(150) 1 7 vr 1 = -t73 = - { V 2+ W ” } , t,, = 1 2 8 ~ ’4 512V9I2’ 128V5 TI,
t91
=
t , , p 9 r + t8,p8r2 + t,,p7r3
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
‘*
=
1 { -6174V2V‘V’’2 73728OVl3l2
+
+ 2008V3V”V‘3) +891V3V’V‘4’
t37 =
1 (4158OVf6 - 99540W’4V‘’ 5160960V7 -4464V3V‘”
40 1
- 144V4V‘5’},
+ 35532V2V’zV’’2
+ 15582V2V/’3V‘3’+ 1116V3V’v”V(3’
+ -915V4V‘V5) + 86V5P6)),
-917V4( V ( 3 ) ) 2 2715V3V’2V(4)- 3 6 2 ~ ~ v ” v ( ~ )
t28 =
1 27525120V15/2
x (36O360Vt7 - 99792OW”V”
+ 756000V2~’3v2
+ 214200V2V’4V‘3’ - 217448V3V’2V’’V‘3’ - 22428V3V3V4’ +49345V4V”2V(3)+ 9468V4V’( V3))’ + 33579V4V‘V’’V‘4) - 4796V5V‘3’V‘4’ - 1047V4V’2V(5) -3030VsV”V‘5’ - 338V5V’P6’ + 128V6V‘”), - 188664V3V’V”3
1 ‘19 =
1486356480V8
x (16216200Vt8 - 51060240W’6V” + 48036240V2V’4V’2 - 15150240V3V’2V’’3+ 852912V4Vtt4+ 13056120V2V’5V(3) - 18966528V3V’3v“V(3)
+ 6302001V4V/’V”2Vv‘3’
+ 1503516V4V’’( V ( 3 ) ) -2 618588V5V”(W3))’ - 1573236V3V’4V‘4’+ 2256231V4V’zVzV(4) -532200VsVt’2V(4) - 422562VsV’V(3)V(4)+ 49944v6( ~ ( ~ 9 ’ + 112581V4V’3V/‘5’- 240192V5V‘V’’V(5’ + 46956V6V‘3’V‘5’ - 25668VsV’2V(6) + 42528V6V”V(6) +3180V6V‘V‘’’ - 2728V7V‘8)),
402
JIYE XIMEN
1 to10 =
14863564800V I 7 l 2 X
(1297296OOVf9- 459459000W’7V“ + 5 189184oOV V” V 2- 207431280V V’3V” + 19266912V4V’V”4 + 129396960V2V’6V(3) - 240“5120V3V’4V’’V‘3’ + 107398704V4V’2V ” 2 V ( 3 ) - 5629134V5v”3V(3) + 24980592V4Vt3(Y ( 3 ) ) 2 - 15910194V5V’V”( V ( 3 ) ) 2 + 618588V6( V 0 ) ) 3 - 17747100V3V’5V‘4’ + 26229000V4V”V” P4) - 6790878V’V’
V4)- 5645673V’ V” V(3)V(4)
+ 1595250V6V”V(3”c/(4)+ 361422V6V’(V(4))2
+2023560V4V‘4V(5)- 306”75V5V’2V’’V(5) +418062V6Vf’2V(5)+ 756666V6V’V(3)V(5) - 92916‘v7V4’V‘’’ - 360945V5V’3V‘6’ + 424050V6L”V”V(6’ - 84444V7V3)V6’+ 32028V6V‘2V(7)- 22236V7V”V‘7’
- 1O948V7V’V@)+ 2728V8V9)), 1 ‘lo
=
(151)
29727129600V 1912 X
(2205403200V”0 - 9145936800W‘8V“
+ 129080952O0V2Vr6Vff2- 7196500080V3V’4V”3 + 1318651488V4V’2V’’4- 28353024V’V”’ + 2601078480V2V’7V(3)- 6203652840V3V’5~‘’V(3) + 4031128836V4V’3V”2V(3)- 585020205V5V’V”3V‘3’ + 7O4915568V4Vf4(V3))’- 683332146V5V2V~‘(V39’ + 62109492V6V”2( + 34913328V6V‘( V ( 3 ) ) 3
+ 851509260V4V’4V”V‘4’ + 14061024V6V’”3V‘4’ - 186324579V5V’3V‘3’V(4)+ 11”46580V6V’V’’V(3’V(4) - 6509268V7( V(3))2Vc4) + 12166956V6V2(V(4))2 - 1461864V7Vf(V(4))2 + 53706240V4V’5V(5) - 439043220V3V’6V‘4’
- 383907699V’V’2V”2V‘4’
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
403
- 89523000V5V’3V’fV(5)+ 26601246V6V’V“2V(5)
+ 21200226V6V’*Y(3)V(5) - 61413OOV’ V ” V ( 3 ) V ( 5 ) -3116808V7V’V(4)V(5) + 185832V8( V ( 5 ) ) 2 - 5515335V5V‘4V‘6’
+ 8821530V6V’2V ” V ( 6 )
- 595344V V” V @ ) 24182O4V V’V‘3’V‘6’
+ 73200V8V(4)V(6)+ 882030V6V’3V(7)
- 1027020V7V’V’’V(7’+ 213360V8V‘3’V(7) -1
+
2 2 1 0 0 ~ ~ ~i 4~ 9~ 2~ 8( ~~ ~) ~ ” ~ ( ~ )
+ 24624V8V’V(9)- 4448V9V(*0)).
(152)
So far, in rotationally symmetric pure electrostatic systems, we have performed a generalized integration transformation on eikonals and derived a set of different-order normalized eikonals, which are position dependent and momentum independent. These normalized eikonals greatly facilitate calculating intrinsic and combined aberrations by the same method as shown in Section IV. However, it is to be emphasized that, for rotationally symmetric electrostatic lenses, only isotropic aberrations exist, but no anisotropic aberration appears. VIII. CONCLUSION Based on the ultrahigh-order canonical aberration theory (Ximen, 19951, we have derived the power-series expressions for Hamiltonian functions up to the tenth-order approximation in rotationally symmetric magnetic and electrostatic systems. In the ultrahigh-order abberation calculations, the key point is that the derivatives r’ and p’ must retain necessary high-order terms in the total derivative of the integration factor T,,. It is the author’s contribution that the ultrahigh-order derivative equation (34) and the ultrahigh-order Poisson brackets [T,, ,H,, I have been introduced into the generalized integration transformation on eikonals for deriving ultrahighorder canonical aberrations. For investigating magnetic systems, by transforming physical quantities into corresponding dimensionless ones, we have derived the canonical power-series expressions for dimensionless eikonal functions up to the tenth-order approximation. Obviously, in power-series expressions of Hamiltonian functions, dimensionless field-distribution functions with the even power of the magnetic field describe isotropic aberrations, and the
404
JIYE XIMEN
field-distribution functions with the odd power of the magnetic field describe anisotropic aberrations. We have successfully performed a series of generalized integration transformations on eikonals independent of the constant product (r X p) and on eikonals associated with the constant product (r X p), thus obtaining a set of different-order normalized eikonals, which are position dependent and momentum independent. According to canonical aberration theory, knowing different-order eikonal functions enables us to calculate both intrinsic and combined aberrations up to the ninth-order approximation by means of a gradient operation on the corresponding-order eikonal function in a rotationally symmetric magnetic system. Because normalized eikonals are position dependent and momentum independent, it is much easier to performing their higher and ultrahigh-order gradient operations. Therefore, in principle, we can calculate not only isotropic but also anisotropic, intrinsic, and combined aberrations in up to the ninth-order approximation. Precisely speaking, third-, fifth-, seventh-, and ninth-order canonical position and momentum aberrations have been completely expressed in concise and explicit form. By a similar theoretical method, we have also performed a series of generalized integration transformation on eikonals in electrostatic systems, thus obtaining a set of different-order normalized eikonals which are position dependent and momentum independent. Therefore, we can calculate intrinsic and combined aberrations in up to the ninth-order approximation by means of a gradient operation on a corresponding-order eikonal function in a rotationally symmetric electrostatic system. It is to be emphasized that this progress facilitates numerically calculating ultrahigh-order canonical aberrations in practical rotationally symmetrical magnetic and electrostatic systems. As an application, we have calculated higher and ultrahigh-order position and momentum aberrations and expressed them in analytical formulae for Glaser’s bell-shaped magnetic field. For such a bell-shaped magnetic field, weights of higher and ultrahigh-order aberrations with respect to the total aberration decrease remarkably with increase of the aberration order n ( n = 3,5,7,9). It is expected that the present theoretical results will be useful for estimating effects of ultrahigh-order aberrations in magnetic lenses. The canonical aberration theory has several main advantages: the momentum aberrations are much simpler than the same-order slope aberrations; the normalized eikonal expressions enable us to calculate position and momentum aberrations, including axial and off-axial aberrations, at any observation plane in magnetic or electrostatic systems with rectilinear or curvilinear axes. In principle, the canonical aberration theory can be utilized to calculate higher than ninth-order canonical aberrations, including intrinsic and combined position and momentum aberrations, in rota-
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
405
tionally symmetric magnetic or electrostatic systems. It is evident that the calculation of ultrahigh-order canonical aberrations is very complicated. However, the theoretical features of the canonical aberration theory, i.e., its conciseness and simplicity, position dependence but momentum independence to a certain extent, symmetrical property, and recursive structure, give us the attractive possibility of calculating ultrahigh-order canonical aberrations with the computer software MATHEMATICA.
APPENDIX In Eqs. (201, (231, (26), and (29), dimensionless field-distribution functions L, M, N,L,, J,,, I,, have been presented in detail. Based on the previous chapter (Ximen, 1995), we will list other dimensionless field-distribution functions of H2,(n = 2,3,4,5) that appeared in Eqs. (7), (9), (111, and (13).
(a) N,
=
3 1 - k 4 b 4 - -k2bb", 4 8
JL1=
JM, =
-
1 -kb" ' 64
JL, =
15 3 1 -k6b6 - -k4b3bIf + - k 2 ( 16 8 128
JM2 =
JNl =
15 -k3b3 16
3 N - -k2b2, 2 - 4
15 1 -k4b4 - -k2bb" , 8 8
5 3 -k5b5 - -k3b2b" , 4 16
1 N - - k 3 b 3 , (A3) 3-2
5 -kb, 16
(A41
1 b f t ) 2+ -k2bb(4), 192 15 JM3 = -16 k2b2,
5 Jn2 = -k3b3, 4
(W 5
JN3 =
- k 4 b 4 , (A6) 8
406
JIYE XIMEN
Obviously, dimensionless field-distribution functions with the even power of the magnetic field kb describe isotropic aberrations, and dimensionless field-distribution functions with the odd power of the magnetic field kb describe anisotropic aberrations. ACKNOWLEDGMENT
This work was supported by the Doctoral Program Foundation of the Institute of Higher Education of China.
ULTRAHIGH-ORDER CANONICAL ABERRATIONS
407
REFERENCES Arnold, V. I. (1978). “Mathematical Method of Classical Mechanics.” Springer-Verlag, New York. Glaser, W. (1933a). Z. Physik 81, 647. Glaser, W. (1933b). Z. Physik 83, 104. Glaser, W. (1952). “Grundlagen der Elektronenoptik.” Springer, Vienna. Goldstein, H. (1980). “Classical Mechanics,” 2nd ed. Addison-Wesley, Reading, MA. Hawkes, P. W. (1966/67). Optik 24, 252-262, 275-282. Hawkes, P. W., and Kasper, E. (1989). “Principles of Electron Optics.” Academic Press, London. Plies, E., and ’Qpke, D. (1978). Z. Naturforsch. 33a, 1361. Scherzer, 0. (1936a). Z. Physik 101, 23. Scherzer, 0. (1936b). Z. Physik 101, 593. Seman, 0. I. (1955). Trudy Inst. Fir. Astron. Akad. Nauk Eston SSR No. 2, 3-29, 30-49. Seman. 0. I. (1958). “The Theoretical Basis of Electron Optics.” Higher Education Press, Beijing. Sturrock, P. A. (1955). “Static and Dynamic Electron Optics.” University Press, Cambridge. Ximen, J. (1983). “Principles of Electron and Ion Optics and Introduction to Aberration Theory.” Science Press, Beijing. Ximen, J. (1986). Aberration theory in electron and ion optics. In “Advances in Electronics and Electron Physics” (P. W. Hawkes, Ed.), Suppl. 17. Academic Press, New York. Ximen, J. (1990a). Oprik 84, 83. Ximen, J. (1990b). J . Appl. Phys. 68,5963. Ximen, J. (1991). Canonical theory in electron optics. I n “Advances in Electronics and Electron Physics” (P. W. Hawkes and B. Kazan, Eds.), Vol. 81, p. 231. Academic Press, Orlando, FL. Ximen, J. (1995). Canonical aberration theory in electron optics up to ultrahigh-order approximation. In “Advances in Imaging and Electron Physics” (P. W. Hawkes, Ed.), Vol. 91, p. 1. Academic Press, San Diego.
This Page Intentionally Left Blank
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL.97
Erratum and Addendum for Physical Information and the Derivation of Electron Physics B. ROY FRIEDEN Optical Sciences Center, University of Arizona Tucson, Ariwna 85721
Soon after the publication of Frieden (1999, it was found that some key equations were off by a factor of c, the speed of light. The corrected equations lead to a new physical interpretation of Fisher information I. Also, some improvements have been made in the physical model for the information approach that is the basis for the chapter. These will be briefly mentioned. Equation (VII.19a) should have an extra factor of c,
Correspondingly, Eqs. (VII.19b) should read
Then Eq. (VII.20) reads
I
=
($)//dpdEP(p,E)
(
-p2
;:)
+-
(VII.20)
and Eq. (VII.21) becomes (VII .21) The lack of a c in the first factor then obviates the following remark about c five lines below: “In the first factor, quantity c is shown elsewhere (Section IX) to be constant.” The key result of these corrections is as follows. Information Z in Eq. (VII.26) becomes
I =J
=
(2rnc/iQ2 = (2/2qZ7 409
(VII .26)
Copyright 0 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.
410
B. ROY FRIEDEN
where 2’is the Compton wavelength for the particle. Now, by Eq. (III.lOb) of the chapter, I relates to the minimum mean square error of estimation of the particle four-position, e&, as ezff = I/z.
(1II.lOb)
Hence, Eq. (VII.26) predicts that the minimum root-mean-square error e is one-half the Compton wavelength. This is reasonable, since the Compton wavelength is a limiting resolution length in the measurement of particle position. The upshot is that the information-based derivation (now) makes a reasonable prediction on resolution, as well as deriving the Klein-Gordon and Dirac equations (the main thrust of the chapter).

The improvements in the model for the information procedure are twofold. The first is as follows. It previously had to be assumed axiomatically that the total physical information (I - J) is zero at its extremum. In fact it was recently found (Frieden and Soffer, 1995) that the zero, and the extremization, may be explained on the basis of a zero-sum game of information transfer that transpires between the data measurer and nature. The information I preexisting in the data has to come from somewhere. That "somewhere" is the physical phenomenon (nature) underlying the measurement. Nature's version of I is the information form J. Thus, whereas the data information I is expressed abstractly as Eq. (VI.7),
\[
I = 4\sum_{n=1}^{4}\int d\mathbf{r}\;\nabla q_{n}(\mathbf{r})\cdot\nabla q_{n}(\mathbf{r}), \qquad (VI.7)
\]
in terms of the "mode functions" q_n defining the probability density p, nature's information J is I expressed in terms of the physical parameters governing the measurement. Since I = J the game is zero sum, and since the measurer and nature both "want" to maximize their information states, the variation δ(I - J) = 0, as required.¹

The second improvement in the theory lies in the physical basis for the form (VI.7). It previously had to be assumed that the mode functions are in an idealized state during the single gedanken measurement that underlies the theory. This state was called the "characteristic state" and corresponds to the situation where the mode functions q_n(r) have no overlap of their support regions r. [Such mode functions allow for an additivity of information I, as expressed by the summation in form (VI.7).] Unfortunately, the characteristic state is unphysical in many problems, such as the quantum mechanical free-field particle in a box.

¹ Most recently, this was found to follow from the perturbation of the system wave function at the measurement. See B. R. Frieden and B. H. Soffer, "Extreme physical information as a natural process," Phys. Rev. E (submitted).
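To make the role of the no-overlap assumption concrete, here is a brief sketch (added here, not in the original; it uses the standard shift-invariant Fisher form I = ∫ dr |∇p|²/p and takes p = Σ_n q_n², consistent with the chapter's description of the mode functions). If each q_n vanishes outside its own region R_n and the R_n are disjoint, then on R_n one has p = q_n² and ∇p = 2 q_n ∇q_n, so
\[
I = \int d\mathbf{r}\,\frac{|\nabla p|^{2}}{p}
  = \sum_{n}\int_{R_{n}} d\mathbf{r}\,\frac{4\,q_{n}^{2}\,|\nabla q_{n}|^{2}}{q_{n}^{2}}
  = 4\sum_{n}\int d\mathbf{r}\,\nabla q_{n}\cdot\nabla q_{n},
\]
which is the additive form (VI.7).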
We recently found² that the same form (VI.7) follows if, instead of the single gedanken measurement, many independent measurements of the desired parameter are made. At measurement n, the system is in the "prepared" state q_n. Modes q_n are physically realizable, since they are the solutions to the very differential equation (Schrödinger wave equation, Dirac equation, etc.) that the information procedure derives. In this way, the unphysical assumption of nonoverlap of modes q_n(r) is avoided. The theory has been significantly strengthened in this way.
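Why independence restores the same form can be sketched as follows (again an addition for clarity, not part of the original; it uses the standard additivity of Fisher information over independent observations and takes the density at measurement n to be p_n = q_n²):
\[
I = \sum_{n} I_{n},
\qquad
I_{n} = \int d\mathbf{r}\,\frac{|\nabla p_{n}|^{2}}{p_{n}}
      = 4\int d\mathbf{r}\,\nabla q_{n}\cdot\nabla q_{n},
\]
so the total information is again 4 Σ_n ∫ dr ∇q_n · ∇q_n, i.e., form (VI.7), with no assumption about overlapping supports.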
REFERENCES

Frieden, B. R. (1995). Physical information and the derivation of electron physics. In "Advances in Imaging and Electron Physics" (P. W. Hawkes, Ed.), Vol. 90, pp. 123-204. Academic Press, San Diego.

Frieden, B. R., and Soffer, B. H. (1995). Lagrangians of physics and the game of Fisher-information transfer. Phys. Rev. E 52, 2274.
² B. R. Frieden and W. J. Cocke, "Foundation for Fisher information-based derivations of physical laws," Phys. Rev. E (in press).
Index
A

Accelerators, optics, 337
Algebraic reconstruction (ART), image reconstruction, 160-162
Algorithms
  block matching algorithms, 235-237
  compression algorithm, 193
  edge-preserving reconstruction algorithms, 91-93, 118-129
  extended-GNC algorithm, 132-136, 171-175
  generalized expectation-maximization algorithm, 93, 127-129, 153, 162-166
  graduated nonconvexity algorithm, 90, 91, 93, 124-127, 153, 168-175
  EZW algorithm, 221, 232
  Gibbs sampler algorithm, 119-120, 123
  image discontinuities, 89-91, 108-118
  low-bit-rate video coding, 237-240
  Metropolis algorithm, 119, 120
  mixed annealing minimization algorithm, 122-123, 158
  overlapped block matching algorithm, 235, 237-252
  SA-W-LVQ algorithm, 220-226
  simulated annealing minimization algorithm, 120-122, 155
  suboptimal algorithms, 124-127
Approximation scaling factor, 211
Approximation vector, 211
Arithmetic coding, 194

B

Baker-Campbell-Hausdorff formula, 347
Barnes-Wall lattice, 217-218
Bayesian approach, regularization, 87-88, 98-104
Bayesian classification, pixels, 68
Biorthogonal functions, Gabor expansion, 27-29
Blind restoration problem, 141
Block matching algorithms, motion estimation, 235-237
Block transforms, joint space-frequency representation, 11
Boltzmann machine (BM), 150-151, 153
C

Canonical aberration theory, ultrahigh order, 360-406
Characteristic state, 410
Charged-particle wave optics, 257-259, 336-339
  Feshbach-Villars form of Klein-Gordon equation, 263, 322, 339-341
  Foldy-Wouthuysen representation of Dirac equation, 267-269, 322, 341-347
  Green's function
    for nonrelativistic free particle, 280, 350-351
    for system with time-dependent quadratic Hamiltonian, 351-355
  Klein-Gordon equation, 259, 276, 337, 338
    Feshbach-Villars form, 263, 322, 339-341
  Magnus formula, 347-349
  matrix element of rotation operator, 351
  scalar theory, 316-317
    axially symmetric electrostatic lenses, 320-321
    axially symmetric magnetic lenses, 282-316
    electrostatic quadrupole lenses, 321-322
    free propagation, 279-282
    general formalism, 259-279
    magnetic quadrupole lenses, 317-320
  spinor theory, 258
    axially symmetric magnetic lenses, 333-335
    free propagation, 330-332
    general formalism, 322-330
    magnetic quadrupole lenses, 336
Clifford-Hammersley theorem, 90
Cliques, neighborhood system, 105-106
Codebooks
  lattice codebooks, 218-220, 227-230
  multiresolution codebooks, 202
  regular lattices, 218-220
  successive approximation quantization, 214-216
Coding, see Image coding
Combined aberrations, 360-361
Complex spectrogram, conjoint image representation, 9-10
Compression, see Image compression
Compression algorithm, 193
Compton wavelength, 410
Computed tomography, image formation, 59-60
Conjoint image representation, 2-4, 5
  Gabor wavelets, 19-37
Continuous signals, exact Gabor expansion, 23-30
Cost functions, 88
  optimal estimators based on, 101-103

D

Daugman's neural network, image reconstruction, 31, 52
DCT, 51, 52, 194
Deblurring, image reconstruction, 155-159
Decoder, digital coding, 192
Denoising, image enhancement, 56-58
Difference of Gaussian (DOG), receptive field, 44-45
Differential pulse code modulation (DPCM), 51
Diffraction, charged-particle beam
  scalar theory, 279-282
  spinor theory, 330-332
Digital coding, 192-194
Dirac equation, Foldy-Wouthuysen representation, 267-269, 322, 341-347
Discontinuities
  image processing, 89-91
  image reconstruction, 108-118
    duality theorem, 91, 115-118
    explicit lines, 110-115, 154-166
    implicit lines, 108-110, 166-181
    line continuation constraint, 130-141, 142
Discrete cosine transform, 11
Discrete signals, exact Gabor expansion, 30-33
Discrete spectrogram, 11
Duality theorem, image processing, 91, 115-118

E

Edge detection
  Gabor functions, 7
  wavelets, 63-64
Edge-preserving reconstruction algorithms, 91-93, 118-129
  extended GNC algorithm, 132-136, 171-175
  generalized expectation-maximization (GEM) algorithm, 93, 127-129, 153, 162-166
  graduated nonconvexity (GNC) algorithm, 90, 91, 93, 124-127, 153, 168-175
Electron optics
  canonical aberration theory, 360-406
  lenses
    axially symmetric electrostatic lenses, 320-321
    axially symmetric magnetic lenses, 282-316, 333-335
    magnetic quadrupole lenses, 317-320, 336
Electron wave optics, see Charged-particle wave optics
Electrostatic lenses
  charged-particle wave optics, 320-321
  integration transformation, 396-403
Encoder, digital coding, 192
Entropy coding, 193-194
Expectation-maximization (EM) approach, image processing, 127-129
Explicit lines, image reconstruction, 110-115, 154-166
Extended-GNC (E-GNC) algorithm, 132-136, 171-175
EZW algorithm, 221, 232

F

Filtered backprojection (FBP), image reconstruction, 160-162
Fingerprint database, image compression, 54
Finite state scalar quantization, 203
Fisher information, new interpretation, 409
Foldy-Wouthuysen representation, Dirac equation, 267-269, 322, 341-347
Fractal dimension, image, 69-70
Free propagation, charged-particle beam
  scalar theory, 279-282
  spinor theory, 330-332

G
Gabor-DCT transform, 52
Gabor expansion
  biorthogonal functions, 27-29
  exact Gabor expansion, 23-30
  image enhancement, 54-55
  quasicomplete, 34-37
Gabor functions (Gabor wavelets, Gaussian wave packets, GW), 3-4, 5, 7
  applications, 78
  human visual system modeling, 41-45
  continuous signal, 23-27
    biorthogonal functions, 27-29
    Zak transform, 29-30
  discrete signals, 30-33
    Daugman's neural network, 31-32
    direct method, 32-33
  drawbacks, 6-7
  image analysis and machine vision, 61-78
  image coding, 50-54
  image enhancement, 54-59
  image reconstruction, 59-60
  machine vision, 61-66
  mathematical expression, 5
  orthogonality, 6, 11, 13, 22
  quasicomplete Gabor transform, 34-37
  receptive field of visual cortical cells, 41-45
  vision modeling, 17, 34-35, 41-45
Gabor transform, quasicomplete, 34-37
Gaussian derivatives
  edge detection, 64
  vision modeling, 17, 19, 45
Gaussian Markov random fields (GMRFs), 107
Gaussian wavelets
  machine vision, 61, 63
  texture analysis, 64-68
Gaussian wave packets, see Gabor functions
Generalized Boltzmann machine (GBM), 150-151
Generalized expectation-maximization (GEM) algorithm
  image processing, 93, 153
  tomographic reconstruction, 162-166
Generalized integration transformation, eikonals
  electrostatic lenses, 396-403
  magnetic lenses, 369-381, 389-392
Gibbs distributions, Markov random fields, 106-108
Gibbs sampler algorithm, 119-120, 123
Graduated nonconvexity (GNC) algorithm, image processing, 90, 91, 93, 124-127, 153, 168-175
Green's function
  for nonrelativistic free particle, 280, 350-351
  for system with time-dependent quadratic Hamiltonian, 351-355

H

Hadamard matrix, 217-218
Hexagonal-oriented quadrature pyramid, joint space-frequency representations, 19, 20
Huffman coding, 51, 194
Human vision
  Gabor functions, 17, 34-37, 41-45
  joint representations, 16-19, 37-50
  receptive field, 40-44
  sampling, 45-50
Hyperparameters
  MRF hyperparameters, 146-149
  regularization, 141-143

I

Image analysis, 61-63
  edge detection, 7, 63-64
  motion analysis, 72-74
  stereovision, 74-75, 76-78
  texture analysis, 64-72
Image coding
  algorithms
    EZW coding, 221, 232
    SA-W-LVQ, 221-226
  arithmetic coding, 194
  digital coding, 192-194
  entropy coding, 193-194
  Gabor expansion, 50-54
  Huffman coding, 51, 194
  low-bit-rate video coding, 232-252
  partition priority coding, 201
  predictive coding, 51, 193
  regularization, 147-148
  still images, 226-232
  transform coding, 51, 193
  wavelets, 198-205
Image compression, 192
  applications, 50-51, 54
  fingerprint database, 54
  methods, 51
  standards, 51, 194
  wavelet transforms, 52-53, 194, 198-205
Image deblurring, 155-159
Image discontinuities, see Discontinuities
Image enhancement
  denoising, 56-58
  Gabor expansion, 54-55
  image fusion, 58-59
Image fusion, image enhancement, 58-59
Image processing
  discontinuities, 89-91, 108-118
    duality theorem, 91, 115-118
    explicit lines, 110-115, 154-166
    implicit lines, 108-110, 166-181
    line continuation constraint, 130-141, 142
  duality theorem, 91, 115-118
  expectation-maximization (EM) approach, 127-129
  generalized expectation-maximization (GEM) algorithm, 93, 127-129, 153, 162-166
  graduated nonconvexity (GNC) algorithm, 90, 91, 93, 124-127, 153, 168-175
  iterated conditional modes, 92
  theory, 2-3
Image quality, measuring, 61
Image reconstruction, 86-87, 181-184
  algebraic reconstruction (ART), 160-162
  applications, 153-154
    explicit lines, 110-115, 154-155
    implicit lines, 91, 108-110, 166-181
  blind restoration problem, 141
  Daugman's neural network, 31, 52
  deblurring, 155-159
  discontinuities
    duality theorem, 91, 115-118
    explicit treatment, 110-115, 154-166
    implicit treatment, 108-110, 166-181
    line continuation constraint, 130-141, 142
  edge-preserving algorithms, 91-93, 118-129
    extended GNC algorithm, 132-136, 171-175
    GEM algorithm, 93, 127-129, 153, 162-166
    GNC algorithm, 90, 91, 93, 124-127, 153, 168-175
  edge-preserving regularization, 93-94
    theory, 104-118
  filtered backprojection, 160-162
  inverse problem, 94-98, 99-101
  regularization, 87-89
    Bayesian approach, 87-88, 98-104
    discontinuities, 89-91, 108-118
    inverse problem, 94-98, 99-101
  three-dimensional, 59-60
  tomographic reconstruction, 159-166
Image representation, 75, 78-79
  Gabor schemes, 19-23
    continuous signals, 23-30
    discrete signals, 30-33
    quasicomplete Gabor transform, 34-37
  image analysis, 61-63
    edge detection, 7, 63-64
    motion analysis, 72-74
    stereovision, 74-75, 76-78
    texture analysis, 64-72
  image coding, see Image coding
  image compression, 192
    applications, 50-51, 54
    fingerprint database, 54
    methods, 51
    standards, 51, 194
    wavelet transform, 52-53, 194, 198-205
  image enhancement and reconstruction, 37, 54-56
    denoising, 56-58
    Gabor expansion, 54-55
    image fusion, 58-59
    image quality metrics, 10, 61
    three-dimensional reconstruction, 59-60
  joint space-frequency representations, 3, 8
    block transforms, 11
    complex spectrogram, 9-10
    multiresolution pyramids, 13-16
    vision-oriented models, 16-19
    wavelets, 11-13
    Wigner distribution function, 9
  machine vision, 61-78
  orthogonality, 6-7, 11, 13, 22
  theory, 2-7
  vision modeling
    Gabor functions, 17, 34-37, 41-45
    sampling in human vision, 45-50
  visual cortex image representation, 37-41
Implicit lines
  image processing, 91
  image reconstruction, 108-110, 166-181
Informational uncertainty, 10
Integration transformation
  electrostatic lenses, 396-403
  Glaser's bell-shaped magnetic field, 393-395
  magnetic lenses, 369-381, 389-392
Inverse problem, image reconstruction, 94-98, 99-101
Isolated zero, 224
Isotropic intrinsic aberrations, 360-361
Iterated conditional modes (ICM), image processing, 92

J

Joint space-frequency representations, 3, 8
  block transforms, 11
  complex spectrogram, 9-10
  multiresolution pyramids, 13-16
  vision-oriented models, 16-19
  wavelets, 11-13
  Wigner distribution function, 9
JPEG, 51, 54

K

Klein-Gordon equation
  charged-particle wave optics, 259, 276, 337, 338
  Feshbach-Villars form, 263, 322, 339-341

L

Laplacian pyramid, image compression, 51-52
Lapped orthogonal transform, 11
Lattice codebooks, 218-220, 227-230
Lattice packing, 216
Lattice vector quantization, 194
Likelihood function, 88
Line continuation constraint, 130
  extended GNC, 132-136
  mean field approximation, 131-132
  sigmoidal approximation, 137-141
Logons, 4
Low-Balian theorem, 23, 28
Low-bit-rate video coding, 232-252
  algorithm, 237-240

M

Machine vision, 61-78
  Gabor function, 61-66
  Gaussian wavelets, 61, 63
Magnetic lenses
  canonical aberrations, 381-388
  charged-particle wave optics
    axially symmetric lenses, 282-316, 333-335
    quadrupole lenses, 317-320, 336
  integration transformation, 369-381, 389-392
  power-series expansions
    eikonal, 366-369
    Hamiltonian function, 361-366
Magnus formula, 347-349
MAP (maximum a posteriori) estimate, 88
  edge-preserving algorithm, 92, 102-103, 104
Mapping, 51, 193
  image compression, 51-52
Marginal posterior mean, cost function, 102
Markov random fields (MRFs)
  Gibbs distributions, 106-108
  image processing, 90, 105
Maxima of the posterior marginals estimate, see MPM estimate
Maximum a posteriori estimate, see MAP estimate
Maximum likelihood (ML) criterion, 104, 145, 149
Maximum pseudolikelihood (MPL) estimate, 148
Metropolis algorithm, 119, 120
Mixed annealing minimization algorithm, 122-123, 158
ML criterion, see Maximum likelihood criterion
Modularity, human visual system, 39
Monte Carlo methods, image regularization, 119-120
Morozov's discrepancy principle, 143
Motion analysis, vision systems, 72-74
MPL, see Maximum pseudolikelihood estimate
MPM (maxima of the posterior marginals) estimate, 88, 102
MRFs, see Markov random fields
Multiresolution codebooks, 202
Multiresolution pyramids, joint space-frequency representation, 13-16
Multishell lattice codebooks, 219

N

Neighborhood system, 105
Neighbor interaction function, 109
Neural networks
  Daugman's neural network, 31, 52
  generalized Boltzmann machine (GBM), 150-151
  optimization, 88-89
Neuron, receptive field (RF), 40-41
Noise removal
  deblurring, 155-159
  image enhancement, 56-58
Noise shaping, 198

O
Optics
  accelerators, see Accelerators, optics
  charged particles, see Charged-particle wave optics
Optimization, neural networks, 88-89
Orthogonality, image representation, 6-7, 11, 13, 22
Orthogonal wavelets, 13-16
Overlapped block matching algorithm, 235, 237-252

P

Parallelism, human visual system, 39
Partition priority coding (PPC), 201
Physical information, erratum and addendum, 409-411
Posterior density, 100-101
Postprocessing, digital coding, 192-193
Predictive coding, 51, 193
Preprocessing, digital coding, 192-193
Primal sketch, 64
Prior density, 88
Prior information, 100
Probability density function, states of information, 98-101
Propagation, charged-particle beam
  scalar theory, 279-282
  spinor theory, 330-332
Psychophysics, vision modeling, 35, 39, 40, 43

Q

Quadrature pyramid, joint space-frequency representations, 19, 20
Quantization
  defined, 51, 193
  image compression, 51
  scalar quantization, 193, 194
    finite state scalar quantization, 203
    wavelets, 200-201, 202-203
  successive approximation quantization
    convergence, 211-214
    orientation codebook, 214-216
    scalar case, 205-206
    vectors, 207-211
  successive approximation wavelet lattice vector quantization (SA-W-LVQ), 191-194, 252
    coding algorithm, 220-226
    image coding, 226-232
    theory, 193-220
    video coding, 232-252
  vector quantization, 51, 193
    wavelets, 201-202, 203
Quantum theory
  charged-particle wave optics, 257-259, 336-339
    aberrations, 311-312
    scalar theory, 259-322
    spinor theory, 258, 322-336
Quasicomplete Gabor transform, 34-37

R

Receptive field (RF)
  Gabor function, 41-43
  neuron, 40-41
Reconstruction, see Image reconstruction
Redundancy, temporal redundancy, 235
Redundancy removal, 192, 193
Regularization, 87-89
  Bayesian approach, 87-88, 98-104
  discontinuities, 89-91, 108-118
    duality theorem, 115-118
    dual theorem, 91
    explicit treatment, 110-115, 154-166
    implicit treatment, 108-110, 166-181
    line continuation constraint, 130-141, 142
  edge-preserving algorithms, 91-93, 118-129
    extended GNC algorithm, 132-136, 171-175
    GEM algorithm, 93, 127-129, 153, 162-166
    GNC algorithm, 90, 91, 93, 124-127, 153, 168-175
  edge-preserving regularization, 93-94
    Markov random fields, 90, 105, 106-108
    theory, 104-118
  hyperparameters, 141-143
  inverse problem, 96-98, 99-101
    Gaussian case, 103-104
    optimal estimators based on cost functions, 101-103
    posterior density, 100-101
    prior information, 100
    states of information, 98-101
Regularization parameter, 96, 143-146
Regular lattices, 216
Risk for estimation, 144
Risk for protection, 144

S
Sampling
  vision modeling, 3-4, 45-50
  visual cortex, 45-50
SA-W-LVQ, see Successive approximation wavelet lattice vector quantization
Scalar quantization, 193, 194
  finite state scalar quantization, 203
  wavelets, 200-201, 202-203
Scalar theory
  charged-particle wave optics, 316-317
    axially symmetric electrostatic lenses, 320-321
    axially symmetric magnetic lenses, 282-316
    electrostatic quadrupole lenses, 321-322
    free propagation, 279-282
    general formalism, 259-279
    magnetic quadrupole lenses, 317-320
Signal processing, 192
  compression, 51, 192-194
  digital coding, 192-194
  Gabor functions, 5
  theory, 2-3
Signal redundancy, 192
Signal uncertainty, 10
Simulated annealing minimization algorithm, 120-122, 155
Single-shell lattice codebooks, 218
Smoothness, image processing, 89, 97
Spatial sampling, visual cortex, 47-50
Spectrogram
  complex spectrogram, 9-10
  discrete spectrogram, 11
  reconstructing signal from, 19
Sphere packing, 216
Spin dynamics, 337
Spinor theory
  charged-particle wave optics, 258
    axially symmetric magnetic lenses, 333-335
    free propagation, 330-332
    general formalism, 322-330
    magnetic quadrupole lenses, 336
States of information, regularization, 98-99
Stereo vision, 74-75
Still image coding, 226-232
Stochastic integration, image regularization, 119
Suboptimal algorithms, 124-127
Successive approximation quantization
  convergence, 211-214
  orientation codebook, 214-216
  scalar case, 205-206
  vectors, 207-211
Successive approximation wavelet lattice vector quantization (SA-W-LVQ), 191-194, 252
  coding algorithm, 220-226
  image coding, 226-232
  theory, 193-220
    successive approximation quantization, 205-220
    wavelet transforms, 195-205
  video coding, 232-252

T

Temporal redundancy, 235
Texture analysis, Gaussian wavelets, 64-68
Three-dimensional image reconstruction, 59-60
Tomography
  image formation, 59-60
  image reconstruction, 159-166
TPM (thresholded posterior means) estimate, 88, 120
Transform coding, 51, 193
Two-dimensional wavelet transforms, 197

U

Ultrahigh-order canonical aberration theory, 360-406
Uncertainty, informational uncertainty, 10

V

Vector quantization, 51, 193
  successive approximation wavelet vector quantization (SA-W-LVQ), 191-194
    coding algorithm, 220-226
    image coding, 226-232
    theory, 193-220
    video coding, 232-252
  wavelets, 201-202, 203
Vector wavelet transform, 202
Video coding, low-bit-rate, 232-252
Video signals, 192
Vision modeling
  Gabor functions, 17, 34-37, 41-45
  joint representations, 16-19, 37-50
  receptive field, 40-44
  sampling, 3-4, 45-50
Visual cortex
  image representation, 37, 39-45
  sampling, 45-50
Visual psychophysics, 35, 39, 40, 43

W

Wavelet coefficients
  scalar quantization, 200-201, 202-203
  vector quantization, 201-202, 203
Wavelets
  edge detection, 63-64
  signal and image processing, 3-5, 11-13, 52-53
Wavelet transforms, 12, 52-53, 194
  defined, 195
  image compression, 52-53, 194, 198-205
  theory, 195-197
  two-dimensional, 197
Wigner distribution function, 2-3, 9

X

X-ray transmission tomography, image reconstruction, 159

Z

Zak transform, Gabor expansion, 29-30
Zero-tree root, 224
Zero-trees, 202-203