ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 111
Advances in Imaging and Electron Physics
EDITED BY PETER W. HAWKES
CEMES/Laboratoire d’Optique Electronique du Centre National de la Recherche Scientifique, Toulouse, France
VOLUME 111
San Diego San Francisco New York Boston London Sydney Tokyo
This book is printed on acid-free paper.

Copyright © 1999 by Academic Press

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher’s consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per-copy fee through the Copyright Clearance Center, Inc. (222 Rosewood Drive, Danvers, Massachusetts 01923), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-1998 chapters are as shown on the title pages: if no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/99 $30.00

ACADEMIC PRESS
A Harcourt Science and Technology Company
525 B. St. Suite 1900, San Diego, California 92101-4495, USA
http://www.apnet.com

Academic Press
24–28 Oval Road, London NW1 7DX, UK
http://www.hbuk.co.uk/ap/

International Standard Serial Number: 1076-5670
International Standard Book Number: 0-12-014753-X

Typeset by Laser Words, Madras, India
Printed in the United States of America
99 00 01 02 03 BB 9 8 7 6 5 4 3 2 1
CONTENTS

Contributors
Preface
Forthcoming Contributions

Number Theoretic Transforms and their Applications in Image Processing
S. BOUSSAKTA AND A. G. J. HOLT
  I. Introduction
  II. Number Theoretic Transforms and Their Application to Convolutions
  III. Application of Two-dimensional Fermat Number Transforms to the Calculation of 2-D Convolutions
  IV. 2-D NTTs of Periodic Functions and Their Applications
  V. Another Family of Number Theoretic Transforms using the Mersenne Numbers (NMNTs)
  VI. Combination of 2-D NTTs Using the 2-D MRC Suitable for Parallel Image-Processing Applications
  VII. Hardware Implementations
  VIII. Conclusions
  References
  List of Abbreviations and Symbols
  Acknowledgments

On the Electron-Optical Properties of the ZrO/W Schottky Electron Emitter
M. J. FRANSEN, Th. L. VAN ROOY, P. C. TIEMEIJER, M. H. F. OVERWIJK, J. S. FABER, AND P. KRUIT
  I. Introduction
  II. Electron Emission Theory
  III. Boersch Effect for Electron Emitters
  IV. Experimental Methods and Systems
  V. Experiments on ZrO/W Schottky Emitters
  VI. Applications to Other Emitters
  VII. Conclusions
  References

The Size of Objects in Natural and Artificial Images
LUIS ALVAREZ, YANN GOUSSEAU, AND JEAN-MICHEL MOREL
  I. Introduction
  II. Statistics of Natural Images: A Review
  III. Sizes of Sections in Natural Images
  IV. Size of Sections and the BV Norm of Natural Images
  V. The Dead Leaves Model
  VI. A Short Review of Texture Synthesis by Mathematical Methods
  VII. Some Principles for the Synthesis of Abstract Natural Images
  VIII. Appendix: Matheron’s Dead Leaves Model
  References
  Acknowledgements

Reconstruction of Nuclear Magnetic Resonance Imaging Data from Non-Cartesian Grids
GORDON E. SARTY
  I. Introduction
  II. Gridding Reconstruction and Direct Reconstruction
  III. Natural k-Plane Coordinate Systems
  IV. Integrating Curve Band-pass Operators
  V. Curve Band-pass Operator Point-Spread Functions
  VI. Direct Reconstruction Using Natural k-Plane Coordinates
  VII. Conclusion
  References

An Integrated Approach to Computational Vision
SIBEL TARI
  I. Introduction
  II. Review of the Edge-Strength Function
  III. Geometry of the Edge-Strength Function
  IV. Extraction and Representation of Perceptual Information
  V. Shape Skeleton and Shape Segmentation (2-D Case)
  VI. Coloring Nested Symmetries in Higher Dimensions
  Abstract
  References

Index
CONTRIBUTORS
Numbers in parentheses indicate the pages on which the author’s contribution begins.
LUIS ALVAREZ (167), Departamento de Informatica y Sistemas, Universidad de Las Palmas, Campus de Tafira, Las Palmas, Spain, 35017.
S. BOUSSAKTA (1), Section of Electrical and Electronic Engineering, University of Teesside, Middlesbrough, Cleveland, United Kingdom, TS1 3BA.
J. S. FABER (91), Philips Electron Optics/FEI, Eindhoven, The Netherlands.
M. J. FRANSEN (91), Philips Analytical, Lelyweg 1, Almelo, The Netherlands, 7602 EA.
YANN GOUSSEAU (167), Centre de Mathematiques et leurs Applications, ENS Cachan, 61 av. du President Wilson, Cachan Cedex, France, 94235.
A. G. J. HOLT (1), Department of Electrical and Electronic Engineering, University of Newcastle, Newcastle upon Tyne, United Kingdom, NE1 7RU.
P. KRUIT (91), Delft University of Technology, Department of Applied Physics, Delft, The Netherlands.
JEAN-MICHEL MOREL (167), Centre de Mathematiques et leurs Applications, ENS Cachan, 61 av. du President Wilson, Cachan Cedex, France, 94235.
M. H. F. OVERWIJK (91), Philips Analytical, Lelyweg 1, Almelo, The Netherlands, 7602 EA.
GORDON E. SARTY (243), University of Saskatchewan, Department of Medical Imaging, Royal University Hospital, 103 Hospital Drive, Saskatoon, Saskatchewan, Canada, S7N 0W8.
SIBEL TARI (327), Department of Engineering Sciences, Middle East Technical University, Ankara, Turkey, 06531.
TH. L. VAN ROOY (91), Philips Analytical, Lelyweg 1, Almelo, The Netherlands, 7602 EA.
PREFACE

Four of the five contributions to this volume, some of which form short monographs on their topic, cover various aspects of image character, image manipulation and computer vision, while the fifth is a detailed study of the Schottky electron emitter.

We begin with a presentation of number-theoretic transforms and their utility in image processing by S. Boussakta and A.G.J. Holt. Both the basic theory and some account of the appropriate hardware are included, which makes this a self-contained study of an interesting area. A family of transforms based on the Mersenne numbers, introduced by the authors, is described fully. The literature on these questions is already extensive and the attractive features of these integer transforms are well known, but they are only slowly gaining acceptance. I hope that this survey will lead more scientists to try them out.

The second contribution, by M.J. Fransen, T.L. van Rooy, P.C. Tiemeijer, M.H.F. Overwijk, J.S. Faber and P. Kruit, provides much new information about a surprisingly little studied type of electron source, the Schottky emitter consisting of a thin layer of zirconium oxide on tungsten. Although such emitters are widely used and their properties and advantages are known in general terms, there have been very few systematic studies and no reliable sets of measurements are available. “Are the present values of brightness and energy spread of the ZrO/W ... emitter in the electron optical instruments in which these sources are employed in accordance with theoretical values? ... What are the optimal values of the tip diameter, field strength, and emitter temperature for a certain application? ... Is it possible to estimate the energy width at all field strengths with a simple model? ... What is the field on the emitter surface?” These are the questions to which the authors of this chapter provide answers, and I have no doubt that it will rapidly become the standard reference for these emitters.
Next, we have a study by L. Alvarez, Y. Gousseau and J.-M. Morel of scaling phenomena and image statistics, which provides a readable account of material that has hitherto been scattered over numerous publications. The seven sections have been organized in such a way that they can be read separately, although of course ‘the whole is greater than the sum of its parts.’ The chapter opens with a review of the statistics of natural images and continues with a discussion of the notion of size. The BV (bounded variation) norm is then introduced, after which a long section explores the dead-leaves model. Next comes a short review of mathematical methods of performing texture synthesis, and the seventh section presents some principles for the synthesis of abstract images. The authors return to Matheron’s dead-leaves model in an appendix. The subject is thus very fully covered and here too, I am sure that students of these questions will frequently consult this authoritative survey.

We then turn to a problem that has arisen recently in magnetic resonance imaging, where image acquisition is accelerated by replacing the traditional Cartesian grid by more complicated patterns such as spirals or rosettes. How are the resulting data to be exploited? G.E. Sarty opens his chapter with a general introduction that will enable non-specialists to understand the nature of the problem and its practical importance. He then turns to gridding reconstruction and moves on to k-plane coordinate systems. Two sections then explain the notions of curve bandpass operators and the associated point-spread functions, and a final section describes direct reconstruction using natural k-plane coordinates. Although this may seem of rather specialist interest, it should be remembered that data are acquired in the spatial frequency plane in other areas too (radio astronomy is the obvious example), so the techniques presented are of direct interest beyond the world of MRI.

The volume concludes with a short contribution by S. Tari on computer vision based on the edge-strength function. The relation between the human visual system and perception and the information gathered by a machine is examined, and several ways of identifying shape and edges are explored.

I thank most warmly all the contributors to this volume for the time and trouble they have devoted to making their material accessible to readers less familiar with it, and close with a list of chapters planned for future volumes.

Peter W. Hawkes
FORTHCOMING CONTRIBUTIONS

D. Antzoulatos: Use of the hypermatrix
N. D. Black, R. Millar, M. Kunt, F. Ziliani and M. Reid: Second generation image coding
N. Bonnet: Artificial intelligence and pattern recognition in microscope image processing
G. Borgefors: Distance transforms
A. van den Bos and A. Dekker: Resolution
P. G. Casazza: Frames
J. A. Dayton: Microwave tubes in space
E. R. Dougherty and D. Sinha: Fuzzy morphology
J. M. H. Du Buf: Gabor filters and texture analysis
R. G. Forbes: Liquid metal ion sources
E. Förster and F. N. Chukhovsky: X-ray optics
A. Fox: The critical-voltage effect
M. Gabbouj: Stack filtering
W. C. Henneberger: The Aharonov–Bohm effect
M. I. Herrera and L. Brú: The development of electron microscopy in Spain
K. Ishizuka: Contrast transfer and crystal images
C. Jeffries: Conservation laws in electromagnetics
M. Jourlin and J.-C. Pinoli: Logarithmic image processing
E. Kasper: Numerical methods in particle optics
A. Khursheed: Scanning electron microscope design
G. Kögel: Positron microscopy
K. Koike: Spin-polarized SEM
W. Krakow: Sideband imaging
A. van de Laak-Tijssen, E. Coets and T. Mulvey: Memoir of J. B. Le Poole
L. J. Latecki: Well-composed sets
W. Li: Vector transformation
C. Mattiussi: The finite volume, finite element and finite difference methods
S. Mikoshiba and F. L. Curzo: Plasma displays
R. L. Morri: Electronic tools in parapsychology
J. G. Nagy: Restoration of images with space-variant blur
P. D. Nellist and S. J. Pennycook: Z-contrast in the STEM and its applications
M. A. O’Keef: Electron image simulation
G. Nemes: Phase-space treatment of photon beams
B. Olstad: Representation of image operators
C. Passow: Geometric methods of treating energy transport phenomena
E. Petajan: HDTV
F. A. Ponce: Nitride semiconductors for high-brightness blue and green light emission
J. W. Rabalais: Scattering and recoil imaging and spectrometry
H. Rauch: The wave-particle dualism
D. Saldin: Electron holography
G. Schmahl: X-ray microscopy
J. P. F. Sellschop: Accelerator mass spectroscopy
S. Shirai: CRT gun design methods
T. Soma: Focus-deflection systems and their applications
I. Talmo: Study of complex fluids by transmission electron microscopy
J. Toulouse: New developments in ferroelectrics
T. Tsutsui and Z. Dechun: Organic electroluminescence, materials and devices
Y. Uchikawa: Electron gun optics
D. van Dyck: Very high resolution electron microscopy
J. S. Villarrubia: Mathematical morphology and scanned probe microscopy
L. Vincent: Morphology on graphs
N. White: Multi-photon microscopy
J. B. Wilburn: Generalized ranked-order filters
C. D. Wright and E. W. Hill: Magnetic force microscopy
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 111
Number Theoretic Transforms and their Applications in Image Processing

S. BOUSSAKTA¹ AND A. G. J. HOLT²

¹ Section of Electrical and Electronic Engineering, University of Teesside, Middlesbrough, Cleveland, TS1 3BA, U.K.
² Department of Electrical and Electronic Engineering, University of Newcastle, Newcastle upon Tyne, NE1 7RU, U.K.
I. Introduction
II. Number Theoretic Transforms and Their Application to Convolutions
  A. Definition of NTTs
  B. General Constraints for Choosing NTTs for Digital Image and Signal Processing
  C. Choice of the Modulus and Kernel
    1. Mersenne Transforms
    2. Fermat Number Transforms
  D. Considerations of Transforms and Word Lengths
    1. Word Length
  E. Complex Convolution Using the NTTs
III. Application of Two-Dimensional Fermat Number Transforms to the Calculation of 2-D Convolutions
    1. Definition of the Two-Dimensional Fermat Number Transform (2-D FNT)
    2. The Relationship between Transform Size and Modulus for 2-D FNT
    3. Arithmetic Complexity of the 2-D Fermat Number Transforms
    4. The Convolution Property of the 2-D FNT
    5. Examples
  A. Two-Dimensional Convolution with NTTs without Matrix Transpose and without Overlap
    1. The Twiddle Factor α^{qs}
      a. Computation of the 2-D NTT
      b. Example of 2-D Autoconvolution Obtained from Simulations
  B. Number Domain Analysis in Two Dimensions with Possible Applications
    1. Simple Images in the 2-D NTT Domain
    2. Mathematical Development
ADVANCES IN IMAGING AND ELECTRON PHYSICS, Volume 111. ISBN 0-12-014753-X. Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved. ISSN 1076-5670/99 $30.00
    3. Square Images
      a. Square of Side L = 2
      b. Square of Side L = 3
      c. Square of Side L = 4
      d. Square of Side L = 5
      e. Square of Side L = 6
      f. Square of Side L = 7
      g. Square of Side L = 8
    4. Rectangular Images
    5. Summary of the Results
    6. Generalization and Possible Applications
    7. The Effect of Translation on the Zero Pattern
    8. Possible Applications
    9. Summary
IV. 2-D NTTs of Periodic Functions and Their Applications
  A. Analysis of the 2-D FNT of Periodic Structures
    1. Case 1: T_r = T_c
    2. Generalization for 2-D FNT
    3. The Effect of Data Errors on 2-D FNT
    4. Summary of the Results
  B. Applications
  C. Implementation of the Inspection Method
    1. Implementation with F_3
    2. Implementation Using 1-D FNT
    3. Number Theoretic Transform with Independent Length and Modulus
    4. Summary
V. Another Family of Number Theoretic Transforms using the Mersenne Numbers (NMNTs)
    1. Transform Definition
      a. The Forward Transform
      b. The Inverse Transform
    2. Properties of the β Function
    3. Calculation of 1-D Convolutions
    4. Fast Algorithms for This Transform
      a. Radix-2 Algorithm Derivation
      b. Arithmetic Complexity
  B. The 2-D New Mersenne Number Transform (2-D NMNT)
    1. Definition of the 2-D Transform
    2. Relationship Between Transform Sizes and Modulus
    3. Calculation of 2-D Convolutions Using the 2-D NMNT
    4. Multidimensional Transform
      a. The M-D Forward Transform
      b. The M-D Inverse Transform
    5. The Calculation of Multidimensional Convolution
  C. The Separable Two-Dimensional and Multidimensional Mersenne Transforms
    1. Definition of the Separable Two-Dimensional Transform (2-D NMNT)
      a. The Forward Transform
      b. The Inverse Transform
    2. Arithmetic Complexity
    3. The Calculation of 2-D Convolution
    4. Generalization of This Transform
      a. The Forward Multidimensional Transform
      b. The Inverse Multidimensional Transform
    5. Summary
VI. Combination of 2-D NTTs Using the 2-D MRC Suitable for Parallel Image-Processing Applications
  A. Introduction
  B. Two-Dimensional Composite Transform (2-D CNTT)
  C. Calculation of the 2-D Cyclic Convolution Using This Method
  D. Two-Dimensional Mixed Radix Conversion (2-D MRC)
  E. Combination of the 2-D NMNT and 2-D FNT for Two Moduli
  F. Advantages of This Method
VII. Hardware Implementations
  A. TTL-based Designs
  B. Pipelined Designs
  C. Vector Radix Method
VIII. Conclusions
References
List of Abbreviations and Symbols
Acknowledgments
I. INTRODUCTION

This chapter briefly introduces the concepts of number theoretic transforms (NTTs) and their applications to 2-D convolutions. The use of NTTs for 2-D convolutions is discussed, and a method for calculating them without matrix transpose and without overlap is explained with examples. Although the transform domain for 2-D NTTs is not as simple to interpret as for some other transforms, the 2-D Fermat number transform (FNT) domain does have special patterns of zeros and nonzero values that could find applications. These are discussed for a number of simple images and also for 2-D periodic structures. The effects of small errors in the periodic data on the 2-D FNT are discussed and applications are considered. Another family of transforms based upon the Mersenne numbers is introduced together with its separable form. The transform is extended to the multidimensional case. It is also shown how this transform can be combined with the 2-D FNT, using the 2-D mixed radix conversion, to provide extended dynamic range and a high degree of parallelism using simple and convenient moduli, leading to an efficient method for parallel image-processing applications. Because of the somewhat inconvenient word lengths required by some NTTs, the use of VLSI, where the word length can be chosen to suit the design, has had a dramatic effect on circuit design for NTTs. A number of designs for pipeline FNT processors, including a vector radix structure, are discussed, and block diagrams for their operation are included.

II. NUMBER THEORETIC TRANSFORMS AND THEIR APPLICATION TO CONVOLUTIONS

Number theoretic techniques have been discussed by many authors, including Pollard [1971] and Rader [1972a, b], and their application to digital signal processing has been surveyed in an illuminating paper by Jullien [1991] and in other references [McClellan and Rader, 1970; Rabiner and Gold, 1975; Agarwal and Burrus, 1974a, b, c; King et al., 1989; Gregory and Krishnamurthy, 1984; Martens, 1983; Reed et al., 1977; Schroeder, 1990; Myers, 1990]. Here we briefly discuss some aspects of the theory of number theoretic transforms and their applications to image processing.

The well-known fast Fourier transform (FFT) method for calculating the discrete Fourier transform (DFT) may involve significant amounts of round-off error [Oppenheim and Weinstein, 1972; Meyer, 1989] and also requires the storage or generation of the complex basis functions (i.e., sine and cosine), which must be rounded. Thus, exact results for the FFT are not possible on a digital machine. This encouraged the investigation of other transforms that retain the circular convolution property (CCP) and also reduce rounding errors and computational load.
This search has led to the introduction of a new family of transforms defined in finite fields. These are known as number theoretic transforms (NTTs).

A. Definition of NTTs

Let $F$ be a prime integer and $\alpha$ be a root of unity of order $N$; then the NTT defined in the Galois field GF($F$), modulo $F$, is given by:

$$X_k = \sum_{n=0}^{N-1} x_n \alpha^{nk} \bmod F, \qquad k = 0, 1, 2, \ldots, N-1 \tag{1}$$

Since $N$ and $F$ are chosen to be relatively prime, $N^{-1}$ exists and an inverse can be defined:

$$x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \alpha^{-nk} \bmod F, \qquad n = 0, 1, 2, \ldots, N-1 \tag{2}$$

NTTs in general have the cyclic convolution property and can be used to calculate convolutions by the same method as the FFT. NTTs have some advantages over the FFT:

1. NTTs do not need any manipulation of the trigonometric functions (sine and cosine).
2. They are calculated modulo an integer and hence exact results are possible.
3. Multiplication-free transforms can be achieved through an appropriate choice of the transform parameters.

B. General Constraints for Choosing NTTs for Digital Image and Signal Processing

Using transforms defined in finite fields means that the problem of the accumulation of round-off errors resulting from the use of the DFT is now solved. All arithmetic operations are in integer form, carried out modulo some integer $F$, so no rounding or truncation errors occur, and because all results are residue reduced, no overflow occurs either. This makes exact results in the calculation of convolution and correlation possible, which is a desirable feature for any transform. Thus, the transforms introduce no additional computational noise, which may be important in critical cases.

If the desirable features of NTTs are considered, the ideal could be as follows:

1. A simple binary representation for the modulus, which simplifies arithmetic operations.
2. In order to avoid complex multiplications, the multiplication by powers of $\alpha$ in the NTT should be as simple as possible. The case $\alpha = 2$ meets this condition.
3. The number of arithmetic operations required should be small.
4. The transform should have the cyclic convolution property.

Besides these desirable features, the following conditions must be met in defining any NTTs:
1. $N$ and $F$ must be relatively prime so that $N^{-1}$ exists, ensuring that the inverse transform exists.
2. The kernel $\alpha$ must be a root of unity of order $N$; i.e., $N$ is the least positive integer such that $\alpha^N = 1 \bmod F$ and $\alpha^t \neq 1$ for $0 < t < N$.
3. In the case of $F$ composite, $(1 - \alpha^t)$ and $F$ must be mutually prime for $0 < t < N$, or $N$ should divide $\gcd(p_1 - 1, p_2 - 1, \ldots, p_k - 1)$, where the $p_i$ are the prime factors of $F$ ($F = p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k}$) [Agarwal and Burrus, 1974a].

The condition that $\alpha$ must be a root of unity of order $N$ ensures the CCP and causes the rigid relationship between the modulus, the transform length, and the kernel $\alpha$. This condition can be neglected when the NTTs are used for applications other than convolution.

C. Choice of the Modulus and Kernel

The parameters that must be chosen for NTTs are the modulus $F$, the transform length $N$, and the basis function $\alpha$. Frequently the choice starts by selecting an $F$ suitable for computation on binary processors (requiring a number of bits compatible with the processor used and a representation with few nonzero binary bits in order to facilitate the arithmetic operations), followed by the selection of suitable values for $N$ and $\alpha$.

1. Mersenne Transforms

These transforms have as modulus the Mersenne primes $F = 2^p - 1$, with $p$ being prime. This transform has the cyclic convolution property and can calculate cyclic convolutions. Although the Mersenne number transforms (MNTs) have simple arithmetic operations and kernel, the transform length $p$ or $2p$ (corresponding to $\alpha = 2$ or $\alpha = -2$, respectively) is not highly composite, making the application of fast algorithms impossible because $p$ is prime [Rader, 1972a]. Transforms that solve these problems will be introduced in a later section.

2. Fermat Number Transforms

The limitations of Mersenne transforms in terms of transform length flexibility and the lack of fast algorithms led Rader [1972a] and Agarwal and Burrus [1974a] to consider primes of the form $F_t = 2^b + 1$ with $b = 2^t$. This type of modulus, known as the Fermat numbers ($F_t$, $t = 0, 1, 2, \ldots$), is suitable for NTTs. Transforms defined in these fields or rings are known as Fermat number transforms (FNTs). Only the first five Fermat numbers ($F_0, \ldots, F_4$) are known to be primes. For these cases we can define transforms of length up to $F_t - 1$. For any Fermat number, it is always possible to define transforms for any $N$ that is a power of 2 up to $N = 2^{t+2}$ for composite Fermat numbers, and for any $N$ that is a power of 2 less than $F_t$ for prime Fermat numbers. These transforms can be implemented readily using radix-2, radix-4 [Brigham, 1974], split-radix [Duhamel and Hollmann, 1984], and similar decimation-in-time or decimation-in-frequency FFT-type algorithms.

It is desirable to have the multiplication by $\alpha$ and its powers as simple as possible, preferably with $\alpha = 2$ or a power of 2. If $\alpha = 2$, then the multiplication by $\alpha$ reduces to bit shifts, giving the FNTs an advantage in speed and cost. It can be shown that 2 is a root of order $N = 2^{t+1}$ mod $F_t$. Hence a transform with the simplest possible kernel exists in the finite field defined modulo the Fermat numbers. Applications of the FNT are given in Siu and Constantinides [1983, 1984], Marshal and Soraghan [1988], Morandi et al. [1988], Boussakta and Holt [1988], Boussakta [1990], Smith [1983], Marir [1986], Reed et al. [1977], Reed and Truong [1978], and Schroeder [1990].

D. Considerations of Transforms and Word Lengths

The condition for NTTs to have the cyclic convolution property has created a rigid relationship between the modulus, the transform length, and the kernel chosen. In fact, the maximum length for a multiplierless FNT is 256. This is well suited for moderate-length convolutions and image filtering. However, a problem arises when a longer transform length is needed, e.g., if the sequences to be processed are naturally long. Several solutions are available. Agarwal and Burrus [1974b] used the scheme proposed by Rader [1975] of mapping one-dimensional convolution to two-dimensional convolution.
Other authors proposed the use of mixed-radix NTTs with 2 as a root of unity and composite transform lengths at the expense of using moduli with more complex arithmetic [Duhamel and Hollman, 1982] or at the expense of reducing the dynamic range [Robanov et al., 1998; Li and Peterson, 1990]. Another solution is to use a kernel that is different from 2, e.g., 3. Using this as the basis function all the transform lengths up to the modulus (for prime numbers) are possible and the transform length problem is considerably alleviated. As is shown in Agarwal and Burrus [1974c], values of N up to 65,536 are possible with ˛ D 3. Owing to the advances in fabrication technology of integrated circuits (ICs), the multiplication operation is not as great a problem as it once was, and fast implementation of FNTs with ˛ 6D 2 is possible in VLSI. Compared with the FFT, the FNT provides a simpler butterfly structure involving one single integer multiplier and two integer adders per butterfly against four real multipliers and six real adders in the case
8
S. BOUSSAKTA AND A. G. J. HOLT
of the FFT. The use of look-up tables can also remove the need for fast multiplications. It is also possible to use a kernel which is two or a power of two in most stages and general multiplications in the other stages. This was used by Agarwal and Burrus [1974c] and Reed and Truong [1978] and can be generalized to higher-order roots. Unusual-length NTTs [Parker and Benaissa, 1995] may be derived using recursive extensions of Rader’s theorem. See also Lee and Lu [1988] and Liu, Huizhu, and Lee [1991]. 1. Word Length With NTTs all operations are calculated modulo an integer F. Thus, an integer number is correctly presented only if its absolute value does not exceed F/2. When calculating the convolution of two sequences x(n) and h(n) in a finite field or a ring, the amplitude of x(n) and h(n) must be scaled so that the convolution result, jynj, does not exceed F/2 [Agarwal and Burrus, 1974a, c]. Using the polar representation of the Fermat prime field [Alfredsen, 1994] has shown that this representation provides suitable choices of the transform kernel for which multiplication by power of the kernel mod 2b C 1 are carried out by one modulo addition. As pointed out by P. R. Chevillat [1978], the signal-to-noise ratio (SNR) of an NTT filter shows no explicit dependence on the transform length as is the case for the FFT, where, owing to scaling and round-off error accumulation, doubling the length reduces the SNR by more than 3 dB. E. Complex Convolution Using NTTs The calculation of complex convolution (CC) arises in radar and sonar, for example. The calculation is usually performed in the complex field using the FFT. However, the same procedure can be evaluated in a finite ring of complex integers ZF , where all the arithmetic operations are performed as in a normal complex field with the real and imaginary parts evaluated modulo an integer F. 
Using NTTs to calculate the CC gives the same result as the conventional methods, provided the conditions for the dynamic range are met for both real and imaginary parts [Vegh and Leibovitz, 1976; Reed, 1975]. There are many proposed complex NTTs, such as the complex pseudo Fermat number (CPFNT), complex Mersenne (CMNT), and complex pseudo Mersenne (CPMNT) transforms, which are attractive for certain lengths [Nussbaumer, 1977] and complex NTTs based on quadratic residue number systems [Krishnan et al., 1986a, b]. We emphasize the FNTs here because such transforms have been tested in both hardware and software and found to be efficient. They can be used to compute the complex convolution [Nussbaumer, 1976].
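As a minimal sketch of the arithmetic described above (not the CPFNT, CMNT, or any transform cited here), the following Python fragment evaluates a short complex cyclic convolution directly, reducing the real and imaginary parts modulo F, and compares it with ordinary complex arithmetic. The function names `to_signed` and `complex_cyclic_conv_mod`, the test sequences, and the choice F = 2^16 + 1 are assumptions made for illustration only. The two results agree because every output magnitude stays below F/2.

```python
F = 2**16 + 1  # F_4 = 65537, chosen here only for illustration


def to_signed(r, modulus=F):
    """Map a residue in 0..modulus-1 to the symmetric range around zero."""
    return r - modulus if r > modulus // 2 else r


def complex_cyclic_conv_mod(x, h, modulus=F):
    """Cyclic convolution of sequences of (re, im) integer pairs,
    with real and imaginary parts each reduced modulo `modulus`."""
    n = len(x)
    out = []
    for k in range(n):
        re = im = 0
        for i in range(n):
            a, b = x[i]
            c, d = h[(k - i) % n]
            re = (re + a * c - b * d) % modulus   # real part of (a+jb)(c+jd)
            im = (im + a * d + b * c) % modulus   # imaginary part
        out.append((to_signed(re, modulus), to_signed(im, modulus)))
    return out


x = [(3, -2), (1, 4), (-5, 0), (2, 2)]
h = [(1, 1), (0, -3), (2, 0), (-1, 1)]
y_mod = complex_cyclic_conv_mod(x, h)

# Reference: the same convolution in ordinary complex arithmetic.
xc = [complex(a, b) for a, b in x]
hc = [complex(a, b) for a, b in h]
y_ref = [sum(xc[i] * hc[(k - i) % len(xc)] for i in range(len(xc)))
         for k in range(len(xc))]
```

The mapping in `to_signed` is what makes negative results recoverable; if the inputs were scaled so that an output exceeded F/2 in magnitude, the two results would no longer agree.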
III. APPLICATION OF 2-D FERMAT NUMBER TRANSFORMS TO THE CALCULATION OF 2-D CONVOLUTIONS

The calculation of 2-D convolutions and correlations in the time domain is expensive in time even for relatively small filter sizes; hence transform methods are of interest. Multiplierless 2-D FNTs have been shown to be well suited for 2-D convolutions, where many restrictions imposed upon them in one dimension are either lifted or are of less effect. The reason for this, as was pointed out by Rader [1975], is that the 2-D filters are relatively small in size and magnitude. This is compatible with the 2-D FNT transform lengths, which can be doubled using the method in [Boussakta and Holt, 1991]; hence the FNTs are found to be suitable for 2-D filtering [Vanser Kraats and Venetsanopoulos, 1982; Shakaff, Pajayakrit, and Holt, 1988; Marir, Shakaff, and Holt, 1985; Ahmed and Nundararajan, 1987; Boussakta, 1990; Marir, 1986]. In this section the application of 2-D NTTs to the calculation of 2-D convolutions and correlations is discussed with examples, together with the related problems encountered.

1. Definition of the Two-Dimensional Fermat Number Transform (2-D FNT)

The 2-D FNT of an image $x(m, n)$ of dimension $M \times N$ may be written as

$$X(k, l) = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} x(m, n)\, \alpha_1^{mk}\, \alpha_2^{nl} \bmod F_t \tag{3}$$

$$k = 0, 1, \ldots, M-1, \qquad l = 0, 1, \ldots, N-1$$
Here $\alpha_1$ and $\alpha_2$ are roots of unity of order M and N, respectively. It is also possible to calculate the transform using one prime number along the rows and another prime number along the columns, although for practical reasons the modulus is usually kept the same for both dimensions. The inverse transform is given by

$$x(m, n) = \frac{1}{MN} \sum_{k=0}^{M-1} \sum_{l=0}^{N-1} X(k, l)\, \alpha_1^{-mk}\, \alpha_2^{-nl} \bmod F_t \tag{4}$$

$$m = 0, 1, \ldots, M-1, \qquad n = 0, 1, \ldots, N-1$$
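Equations (3) and (4) can be checked numerically with a short sketch. The Python code below is an illustrative direct evaluation (row pass followed by column pass), not a fast algorithm; the function names `fnt2d` and `ifnt2d` and the test image are assumptions introduced here. It uses $F_3 = 257$ with $\alpha_1 = \alpha_2 = 2$, which is a root of unity of order 16, and relies on Python 3.8+ for modular inverses via `pow(a, -1, F)`.

```python
F3 = 2**8 + 1   # Fermat number F_3 = 257
N = 16          # 2 has order 16 modulo 257, so 16 x 16 transforms are possible
ALPHA = 2


def fnt2d(x, a1, a2, F):
    """Direct evaluation of Equation (3): transform rows, then columns."""
    M, Nn = len(x), len(x[0])
    # Row pass: sum over n with kernel a2
    G = [[sum(x[m][n] * pow(a2, n * l, F) for n in range(Nn)) % F
          for l in range(Nn)] for m in range(M)]
    # Column pass: sum over m with kernel a1
    return [[sum(G[m][l] * pow(a1, m * k, F) for m in range(M)) % F
             for l in range(Nn)] for k in range(M)]


def ifnt2d(X, a1, a2, F):
    """Equation (4): same structure with inverse kernels and a 1/(MN) scale."""
    M, Nn = len(X), len(X[0])
    inv_mn = pow(M * Nn, -1, F)
    Y = fnt2d(X, pow(a1, -1, F), pow(a2, -1, F), F)
    return [[v * inv_mn % F for v in row] for row in Y]


x = [[(3 * m + 5 * n) % 7 for n in range(N)] for m in range(N)]
X = fnt2d(x, ALPHA, ALPHA, F3)
x_back = ifnt2d(X, ALPHA, ALPHA, F3)
```

With $\alpha = 2$ every multiplication by a kernel power is a bit shift in hardware; the sketch uses general modular multiplication only for brevity.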
2. The Relationship Between Transform Size and Modulus for the 2-D FNT

The condition under which 2-D NTTs have the 2-D cyclic convolution property creates a rigid relationship between the modulus, the transform size, and the kernel chosen. Table 1 shows the relationship between the transform size and the modulus. From $\alpha_1$ and $\alpha_2$ of Table 1, the values of $\alpha_1$ and $\alpha_2$ for any transform size that is a power of 2 up to $N_{max} \times N_{max}$ can be calculated.
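TABLE 1
THE RELATIONSHIP BETWEEN TRANSFORM SIZE AND THE MODULUS FOR THE 2-D FNT.

$F_t$ | $N \times N$ for $\alpha_1 = \alpha_2 = 2$ | $N \times N$ for $\alpha_1 = \alpha_2 = \sqrt{2}$ | $N_{max} \times N_{max}$ | $\alpha = \alpha_1 = \alpha_2$ for $N_{max} \times N_{max}$
$2^8 + 1 = F_3$ | $16 \times 16$ | $32 \times 32$ | $2^8 \times 2^8$ | $3$
$2^{16} + 1 = F_4$ | $32 \times 32$ | $64 \times 64$ | $2^{16} \times 2^{16}$ | $3$
$2^{32} + 1 = F_5$ | $64 \times 64$ | $128 \times 128$ | $2^7 \times 2^7$ | $\sqrt{2}$
$2^{64} + 1 = F_6$ | $128 \times 128$ | $256 \times 256$ | $2^8 \times 2^8$ | $\sqrt{2}$

This is similar to a table for the 1-dimensional case (Agarwal and Burrus [1974c]) (© 1974, IEEE).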
3. Arithmetic Complexity of the 2-D Fermat Number Transforms

For certain transform sizes, the calculation of the 2-D Fermat number transforms will involve shifts and additions only. Therefore, the arithmetic complexity of this transform will depend on the transform sizes and the modulus chosen, as shown in Table 2. Note in Table 2 that the number of multiplications includes shifts and trivial multiplications by $\pm 1$.

TABLE 2
NUMBER OF ARITHMETIC OPERATIONS NEEDED, FOR TRANSFORM SIZE N × N, BY THE 2-D FNT.

Modulus | $N_{shift} \times N_{shift}$ for $\alpha_1 = \alpha_2 = 2$ | Number of shifts for $\alpha_1 = \alpha_2 = 2$ | Number of additions for $\alpha_1 = \alpha_2 = 2$ | $N_{max} \times N_{max}$ for $\alpha_1 = \alpha_2 = 3$ | Number of multiplications for $\alpha_1 = \alpha_2 = 3$ | Number of additions for $\alpha_1 = \alpha_2 = 3$
$2^8 + 1$ | $16 \times 16$ | $N^2 \log_2 N$ | $2N^2 \log_2 N$ | $256 \times 256$ | $N^2 \log_2 N$ | $2N^2 \log_2 N$
$2^{16} + 1$ | $32 \times 32$ | $N^2 \log_2 N$ | $2N^2 \log_2 N$ | $2^{16} \times 2^{16}$ | $N^2 \log_2 N$ | $2N^2 \log_2 N$
$2^{32} + 1$ | $64 \times 64$ | $N^2 \log_2 N$ | $2N^2 \log_2 N$ | - | - | -
$2^{64} + 1$ | $128 \times 128$ | $N^2 \log_2 N$ | $2N^2 \log_2 N$ | - | - | -

$N_{shift} \times N_{shift}$ = maximum transform size involving shifts and additions only. For $\alpha_1 = \alpha_2 = \sqrt{2}$, the transform sizes are $2N_{shift} \times 2N_{shift}$, but the number of shifts and adds are slightly higher. $N_{max} \times N_{max}$ = maximum transform size involving multiplication.

4. The Convolution Property of the 2-D FNT

The two-dimensional Fermat number transforms have the 2-D cyclic convolution property and hence can be used for the calculation of 2-D convolution/correlation as follows:

$$y_c(m, n) = \text{2-D INFNT}\{\text{2-D FNT}[x(m, n)] \cdot \text{2-D FNT}[h(m, n)]\} \tag{5}$$

where

$$y_c(m, n) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} x(i, j)\, h(\langle m - i \rangle_M, \langle n - j \rangle_N) \tag{6}$$

and $\langle \cdot \rangle_M$ means modulo M. In the 2-D circular convolution, the indices of the rows and the columns are calculated modulo M and modulo N, respectively. The linear convolution of an $M_1 \times N_1$ image and an $L_1 \times L_2$ filter can be obtained through the circular counterpart only if

$$M \geq M_1 + L_1 - 1 \quad \text{and} \quad N \geq N_1 + L_2 - 1 \tag{7}$$
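A small numerical sketch of Equations (5) to (7) follows. It is an illustration under stated assumptions rather than the authors' implementation: the helper names (`fnt2d`, `cyclic_conv2d_fnt`, `pad`, `linear_conv2d`), the 3 × 3 test image, and the 2 × 2 filter are hypothetical, and the kernel 16 is used because it has order 4 modulo 257, which permits 4 × 4 transforms. The transform is evaluated directly, not by a fast algorithm.

```python
F = 257
ALPHA = 16  # 16 has order 4 modulo 257 (16**2 = -1), so 4 x 4 transforms work


def fnt2d(x, a, F):
    """Direct square 2-D NTT: row pass, then column pass."""
    n = len(x)
    rows = [[sum(x[m][j] * pow(a, j * l, F) for j in range(n)) % F
             for l in range(n)] for m in range(n)]
    return [[sum(rows[m][l] * pow(a, m * k, F) for m in range(n)) % F
             for l in range(n)] for k in range(n)]


def cyclic_conv2d_fnt(x, h, a, F):
    """Equation (5): inverse transform of the pointwise product of transforms."""
    n = len(x)
    X, H = fnt2d(x, a, F), fnt2d(h, a, F)
    Y = [[X[k][l] * H[k][l] % F for l in range(n)] for k in range(n)]
    y = fnt2d(Y, pow(a, -1, F), F)
    inv = pow(n * n, -1, F)
    return [[v * inv % F for v in row] for row in y]


def pad(a, n):
    """Zero-pad a small array to n x n."""
    return [[a[i][j] if i < len(a) and j < len(a[0]) else 0
             for j in range(n)] for i in range(n)]


def linear_conv2d(x, h):
    """Ordinary (non-cyclic) 2-D convolution, used as the reference."""
    R, C = len(x) + len(h) - 1, len(x[0]) + len(h[0]) - 1
    y = [[0] * C for _ in range(R)]
    for i, xr in enumerate(x):
        for j, xv in enumerate(xr):
            for p, hr in enumerate(h):
                for q, hv in enumerate(hr):
                    y[i + p][j + q] += xv * hv
    return y


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # M1 x N1 = 3 x 3
filt = [[1, 1], [1, 1]]                     # L1 x L2 = 2 x 2
# M = N = 4 satisfies Equation (7): 4 >= 3 + 2 - 1
y = cyclic_conv2d_fnt(pad(image, 4), pad(filt, 4), ALPHA, F)

# A 4 x 4 image with the same filter violates Equation (7): wraparound occurs.
big = [[1] * 4 for _ in range(4)]
y_wrap = cyclic_conv2d_fnt(big, pad(filt, 4), ALPHA, F)
```

In the first case the cyclic result equals the linear convolution; in the second the corner value is 4 instead of 1 because contributions wrap around the array edges.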
Thus, the two images should be padded with zeros to form extended images, as follows:

$$x(m, n) = \begin{cases} x(m, n) & \text{for } m = 0, 1, 2, \ldots, M_1 - 1;\ n = 0, 1, 2, \ldots, N_1 - 1 \\ 0 & \text{for } M_1 \leq m < M \text{ or } N_1 \leq n < N \end{cases} \tag{8}$$

$$h(m, n) = \begin{cases} h(m, n) & \text{for } m = 0, 1, 2, \ldots, L_1 - 1;\ n = 0, 1, 2, \ldots, L_2 - 1 \\ 0 & \text{for } L_1 \leq m < M \text{ or } L_2 \leq n < N \end{cases} \tag{9}$$

5. Examples

Figure 1 presents an example of self-convolving a square image of constant intensity, shown in Figure 1(a). When the sequences are sufficiently padded with zero values, the well-known symmetric pyramid shape is the result. High intensity in the middle of the picture is presented in gray levels as white; dark indicates low intensity, as shown toward the edges in Figure 1(b). When the array size is increased beyond the bounds set by Equation (7), wraparound error results, which is indicated by loss of the symmetric shape of the picture. (The top and right sides are missing, and the lower and left sides are in error; Figure 1(c).) Also, for the convolution calculated by means of 2-D NTTs to be equivalent to the ordinary convolution, the input data must be scaled so that the output does not exceed the modulus. Figure 2(a) shows a linear 2-D convolution verifying this condition, and Figure 2(b) shows an overflow indicated by varying intensity from white to black and vice versa.

FIGURE 1. Wraparound errors in 2-D convolution. (a) Square image of constant intensity. (b) Exact self-convolution. (c) Inexact self-convolution.

A. Two-Dimensional Convolution with NTTs without Matrix Transpose and without Overlap

A problem arises in two-dimensional processing using the convolution method because of the large information content of the images (of the order of 1000 × 1000 discrete points). It may be necessary to determine the convolution of a filter with a large signal by successive convolutions of the filter with sections of the image. The well-known overlap-save technique for two-dimensional image processing requires that the image be divided into sections and the original image size extended with a frame of zeros, as described earlier. Each extended section is convolved with the filter, which has also been extended with zeros. Only a limited number of the results of these convolutions are retained as useful. This process is repeated for all the sections until the whole image is convolved. Hunt [1972] has determined the size of the optimum sections for use with this method.
FIGURE 2. Calculation of 2-D convolution using 2-D NTTs. (a) Exact linear convolution. (b) Inexact linear convolution due to integer overflow.

Hunt's approach follows that of Helms [1967] in minimizing the quantity

$$B = \frac{N_1 N_2 (\log_2 N_1 + \log_2 N_2)}{(N_1 - L_1 + 1)(N_2 - L_2 + 1)} \tag{10}$$
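Equation (10) is easy to evaluate numerically. The sketch below, with the hypothetical helper names `hunt_cost` and `best_section`, searches power-of-two section sizes $N_1 \times N_2$ for a given $L_1 \times L_2$ filter; the search range (up to $2^{12}$) is an assumption made for illustration and is not taken from Hunt's table.

```python
from math import log2


def hunt_cost(n1, n2, l1, l2):
    """The quantity B of Equation (10) for an n1 x n2 section and l1 x l2 filter."""
    return n1 * n2 * (log2(n1) + log2(n2)) / ((n1 - l1 + 1) * (n2 - l2 + 1))


def best_section(l1, l2, max_exp=12):
    """Power-of-two section size minimizing B (search range is an assumption)."""
    sizes = [2**e for e in range(1, max_exp + 1)]
    candidates = [(hunt_cost(n1, n2, l1, l2), n1, n2)
                  for n1 in sizes if n1 >= l1
                  for n2 in sizes if n2 >= l2]
    return min(candidates)[1:]


section = best_section(30, 30)
```

For a 30 × 30 filter this search returns 512 × 512, with B close to 20.2, while 256 × 256 and 1024 × 1024 sections give slightly larger values.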
In Equation (10), $N_1$ and $N_2$ are the dimensions of the sections to be optimized and $L_1$ and $L_2$ are the filter dimensions. B is minimized and the results are tabulated. The constraint is that $N_1$ and $N_2$ are to be integer powers of 2. For a 30 × 30 filter, Hunt's table gives the optimum section as 512 × 512 in dimension. Image arrays of the form $2^{2u} \times 2^{2u}$ may be convolved with a filter without recourse to sectioning. Also, the process may be computed without the need for matrix transposition [Marir, Shakaff, and Holt, 1985]. Consider images of 1024 × 1024 pixels. The linear convolution $y = x * h$ is exactly a circular convolution and can be computed using two forward NTTs and one inverse NTT if the relation $N \geq L + H - 1$ is fulfilled, where N is the transform length, L is the length of the signal, and H is the filter length. The convolution becomes

$$y = \text{INV-NTT}[\text{NTT}(x) \cdot \text{NTT}(h)] \tag{11}$$

The two-dimensional NTT can be written in its general form as follows:

$$X(k, l) = \sum_{m=0}^{N_1 - 1} \sum_{n=0}^{N_2 - 1} x(m, n)\, \alpha_1^{\langle mk \rangle_1}\, \alpha_2^{\langle nl \rangle_2} \bmod F \tag{12}$$

where $\langle \cdot \rangle_1$ stands for mod $N_1$ and $\langle \cdot \rangle_2$ stands for mod $N_2$. Here $\alpha_1$ and $\alpha_2$ are roots of unity of order $N_1$ and $N_2$, respectively. We assume a square input matrix $x(m, n)$, where $N_1 = N_2 = N$, and let $\alpha_1 = \alpha_2 = \alpha$, giving

$$X(k, l) = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} x(m, n)\, \alpha^{\langle mk \rangle}\, \alpha^{\langle nl \rangle} \bmod F \tag{13}$$
The arrays that recur in this case are of the form $2^{2u} \times 2^{2u}$ and of the order of 1000 × 1000 points. We treat the case u = 5, giving an image array of 1024 × 1024. The value of u is arbitrary and the development is valid for any other value of u. With the dimensions defined, Equation (13) can be written as

$$X(k, l) = \sum_{m=0}^{1023} \sum_{n=0}^{1023} x(m, n)\, \alpha^{\langle mk \rangle}\, \alpha^{\langle nl \rangle} \bmod F \tag{14}$$

So far the value of $\alpha$ has not been specified, but it must satisfy $\alpha^{1024} = 1 \bmod F_4$ and $\alpha^t \neq 1$ if $0 < t < 1024$. We write Equation (14) in the form

$$X(k, l) = \sum_{m=0}^{1023} \alpha^{\langle mk \rangle} \sum_{n=0}^{1023} x(m, n)\, \alpha^{\langle nl \rangle} \bmod F \tag{15}$$

$$k, l = 0, 1, \ldots, 1023$$

Clearly the two-dimensional FNT is merely a one-dimensional FNT followed by another one-dimensional FNT. If it is assumed that the data are stored on a disk and randomly available in rows, then let the rows be first processed as follows. Let

$$G(m, l) = \sum_{n=0}^{1023} x(m, n)\, \alpha^{\langle nl \rangle} \bmod F \tag{16}$$

$$m, l = 0, 1, \ldots, 1023$$

Equation (16) can be written as

$$G_m(l) = \sum_{n=0}^{1023} x_m(n)\, \alpha^{\langle nl \rangle} \bmod F \tag{17}$$

$$m, l = 0, 1, \ldots, 1023$$

Equation (17) shows the one-dimensional nature of the transform to be computed.
Let

$$l = 32p + q, \qquad n = 32r + s, \qquad p, q, r, s = 0, 1, 2, \ldots, 31$$

Then Equation (17) becomes

$$G_m(p, q) = \sum_{s=0}^{31} \sum_{r=0}^{31} x_m(s, r)\, \alpha^{\langle 32ps \rangle}\, \alpha^{\langle 32rq \rangle}\, \alpha^{\langle qs \rangle} \bmod F \tag{18}$$
because $\alpha^{\langle 32p \cdot 32r \rangle} = \alpha^{\langle 1024pr \rangle} = 1 \bmod F_4$. Note that the submatrix $x_m(s, r)$ is made up of elements of $x_m(n)$ ordered columnwise. This is done for practical reasons and causes the result $G_m(p, q)$ to be ordered in a rowwise fashion. Consider the following relations:

$$\alpha^{1024} = (\alpha^{32})^{32} = 1 \bmod F_4 \quad \text{and} \quad 2^{32} = 1 \bmod F_4$$

Comparing the two relations, it is clear that if among all the $\alpha$s that satisfy $\alpha^{1024} = 1 \bmod F_4$ there is at least one that satisfies $\alpha^{32} = 2 \bmod F_4$, then the 1024-length NTT will reduce to a 32-length FNT. In fact, there are 32 such $\alpha$s. The first, $\alpha = 5851$, was found by a computer search. All the values of $\alpha$ that satisfy $\alpha^{32} = 2 \bmod F_4$ are listed here; the remaining 31 are derived from the first by $\alpha_i = 5851 \times 2^i \bmod F_4$, $i = 1, 2, \ldots, 31$.

Values of $\alpha$:
5851 11702 23404 46808 28079 56158 46779 28021
56042 46547 27557 55114 44691 23845 47690 29843
59686 53835 42133 18729 37458 9379 18758 37516
9495 18990 37980 10423 20846 41692 17847 35694
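The list of kernels can be regenerated and checked with a few lines of Python. This is only a verification sketch; the variable names `alphas` and `solutions` are introduced here for illustration.

```python
F4 = 2**16 + 1  # Fermat prime F_4 = 65537

# alpha_i = 5851 * 2**i mod F4 for i = 0..31 (i = 0 is the value found by search)
alphas = [5851 * pow(2, i, F4) % F4 for i in range(32)]

# Exhaustive search for every alpha with alpha**32 == 2 (mod F4)
solutions = [a for a in range(1, F4) if pow(a, 32, F4) == 2]
```

The exhaustive search finds exactly the 32 listed values, with 5851 the smallest. Because $\alpha^{512} = 2^{16} = -1 \bmod F_4$, each of them has order exactly 1024.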
Equation (18), therefore, becomes

$$G_m(p, q) = \sum_{s=0}^{31} \sum_{r=0}^{31} x_m(s, r)\, 2^{ps}\, 2^{rq}\, \alpha^{\langle qs \rangle} \bmod F \tag{19}$$

which, apart from the twiddle factor $\alpha^{\langle qs \rangle}$, is exactly a two-dimensional 32-length FNT, provided it can be shown that the exponents ps and rq are computed modulo 32.

1. If $ps < 32$, then $ps \bmod 32 = ps$.
2. If $ps = 32$, then $2^{ps} = 1 \bmod F_4 = 2^{ps \bmod 32}$.
3. If $ps > 32$, then $ps = (ps \bmod 32) + 32t$ (t = integer) and $2^{ps} = 2^{ps \bmod 32} \cdot 2^{32t} = 2^{ps \bmod 32}$.

Equation (19) is then written in the form

$$G_m(p, q) = \sum_{s=0}^{31} 2^{|ps|}\, \alpha^{\langle qs \rangle} \sum_{r=0}^{31} x_m(s, r)\, 2^{|rq|} \bmod F \tag{20}$$

$$p, q = 0, 1, \ldots, 31$$

where $|\cdot|$ stands for mod 32.

1. The Twiddle Factor $\alpha^{qs}$

The twiddle factor arises from the transformation of the one-dimensional process into a two-dimensional process and acts on the result of the first one-dimensional FNT only. It also creates two problems, one of storage and one of time. Referring to Equation (20), let

$$U_m(s, q) = \sum_{r=0}^{31} x_m(s, r)\, 2^{|rq|} \bmod F \tag{21}$$

$$s, q = 0, 1, \ldots, 31$$

which represents a 32-length FNT. Next compute

$$U_m^*(s, q) = U_m(s, q)\, \alpha^{\langle qs \rangle} \bmod F \tag{22}$$

Equation (22) represents a pointwise multiplication of two 32 × 32 matrices. Finally,

$$G_m(p, q) = \sum_{s=0}^{31} U_m^*(s, q)\, 2^{|ps|} \bmod F \tag{23}$$

which represents another 32-length FNT.
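The three steps of Equations (21) to (23) can be compared against a direct evaluation of Equation (17). The following Python sketch does this for one row of length 1024, assuming $\alpha = 5851$ and $F = F_4$. The function names `ntt_direct` and `ntt_32x32` and the test data are hypothetical, and the 32-length FNTs are written as plain sums with powers of 2 rather than as shift-and-add butterflies.

```python
F4 = 2**16 + 1
ALPHA = 5851        # alpha**32 == 2 (mod F4); order 1024
N, R = 1024, 32


def ntt_direct(x):
    """Length-1024 NTT evaluated straight from Equation (17), O(N^2)."""
    w = [pow(ALPHA, e, F4) for e in range(N)]
    return [sum(x[n] * w[(n * l) % N] for n in range(N)) % F4 for l in range(N)]


def ntt_32x32(x):
    """Same transform via Equations (21)-(23), with n = 32r + s and l = 32p + q."""
    two = [pow(2, e, F4) for e in range(R)]   # powers of 2: word shifts in hardware
    # Equation (21): 32-length FNT over r for each s
    U = [[sum(x[R * r + s] * two[(r * q) % R] for r in range(R)) % F4
          for q in range(R)] for s in range(R)]
    # Equation (22): pointwise multiplication by the twiddle factors alpha^(qs)
    Us = [[U[s][q] * pow(ALPHA, q * s, F4) % F4 for q in range(R)]
          for s in range(R)]
    # Equation (23): 32-length FNT over s for each q
    G = [0] * N
    for p in range(R):
        for q in range(R):
            G[R * p + q] = sum(Us[s][q] * two[(p * s) % R] for s in range(R)) % F4
    return G


x = [(7 * i * i + 3 * i + 1) % 100 for i in range(N)]
G_fast = ntt_32x32(x)
G_ref = ntt_direct(x)
```

Only step (22) involves general multiplications; steps (21) and (23) use powers of 2 exclusively, which is the point of choosing $\alpha$ with $\alpha^{32} = 2$.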
Because the problems are associated with the twiddle factor $\alpha^{\langle qs \rangle}$, consider how storage and time are reduced for that factor. There are 1024 sample values of $\alpha^{\langle qs \rangle}$ to be stored, but if it is noted that the matrix is symmetric, i.e., $\alpha^{\langle qs \rangle} = \alpha^{\langle sq \rangle}$, then savings in memory can be obtained at the price of a careful addressing scheme. The number of sample values to be stored can be further reduced. The obvious reduction is made in the main diagonal, whenever $\alpha^{i \cdot i} = \alpha^{s \cdot q}$, with $s \neq q$. (For example, $\alpha^{2 \cdot 2} = \alpha^{1 \cdot 4}$, which is already stored.) These cases and others bring the number of distinct values to be stored down to 339, which represents some 30% of the original space required. If each sample value (known prior to the computation of the two-dimensional FNT) has a 16-bit representation, only a 1 k ROM is necessary, with each $\alpha^{\langle qs \rangle}$ occupying two adjacent memory locations. This reduction in memory space is not matched by the same reduction in the number of multiplications by the twiddle factor; but if the time taken to compute a 32-length FNT is greater than or equal to the time needed to perform 32 multiplications, no time is wasted at all during the twiddling operation if parallel processing is used. After the first row of $x_m(s, r)$ has been transformed and while the second row is being processed, each element of the result of the previous operation is twiddled by $\alpha^{\langle qs \rangle}$; in other words, first compute

$$U_m(0, q) = \sum_{r=0}^{31} x_m(0, r)\, 2^{|rq|} \bmod F \tag{24}$$

$$q = 0, 1, \ldots, 31$$

Then compute

$$U_m(1, q) = \sum_{r=0}^{31} x_m(1, r)\, 2^{|rq|} \bmod F \tag{25}$$

and

$$U_m^*(0, q) = U_m(0, q)\, \alpha^{\langle q \cdot 0 \rangle} \bmod F \tag{26}$$

until

$$U_m(31, q) = \sum_{r=0}^{31} x_m(31, r)\, 2^{|rq|} \bmod F \tag{27}$$

and

$$U_m^*(30, q) = U_m(30, q)\, \alpha^{\langle q \cdot 30 \rangle} \bmod F \tag{28}$$

This means that the calculation of Equation (23) cannot be undertaken until $U_m^*(31, q)$ has been computed. But only $U_m^*(31, 0)$ is needed in the first FNT along the columns, and $U_m^*(31, 0) = U_m(31, 0)\, \alpha^{\langle 31 \cdot 0 \rangle} \bmod F = U_m(31, 0)$.
Thus, to compute Equation (23) for q = 0, the result of the last operation in Equation (22) is not required, and the two can be computed simultaneously. No twiddling occurs afterwards, and hence no time has been wasted. So far, it has been shown how the rows of a large input array can be transformed using a small-length FNT that needs no multiplications but only additions and word shifts (multiplication by powers of 2). The columns of the input array follow the same development, but because it was assumed that the data are available only in rows, the whole matrix would normally have to be transposed. However, if 32 rows of the input array can be accommodated in fast RAM, the whole two-dimensional FNT of the 1024 × 1024 image array can be computed using a 32-length FNT without transposing the array. Each row and each column of the 1024 × 1024 matrix is broken up into a 32 × 32 submatrix, where data are ordered columnwise. This is important, especially when processing the columns of the whole array. Consider the first column: when broken into a 32 × 32 submatrix, the first row of this submatrix contains the elements of the initial column spaced 32 samples apart. Hence, if the input array is $x(m, n)$, $m, n = 0, \ldots, 1023$, the first column would be broken up into the 32 × 32 submatrix

$$A = \begin{bmatrix} x(0, 0) & x(32, 0) & x(64, 0) & \cdots & x(992, 0) \\ x(1, 0) & x(33, 0) & x(65, 0) & \cdots & x(993, 0) \\ \vdots & \vdots & \vdots & & \vdots \\ x(31, 0) & x(63, 0) & x(95, 0) & \cdots & x(1023, 0) \end{bmatrix} \tag{29}$$
The first row of A contains the elements of $x(m, 0)$ spaced 32 samples apart. Hence, if 32 rows of the input array are chosen such that each of the 1024 columns represents the first row of each of the 1024 submatrices, after processing each of the 32 rows read into fast memories, an extra 1024 32-length FNTs can be computed without extra input/output (I/O) operations. The twiddling operation following these 1024 FNTs consists of the multiplication of each 32-sample array by the first row of $\alpha^{\langle qs \rangle}$. The next 32 rows of the input array also result in an extra 1024 FNTs, all of them being the second row of each 32 × 32 submatrix. When all the 1024 rows of the input array have been read, processed and written back, not only will the NTT have been computed along the rows, but also a part of the NTT will have been done along the columns. Another pass on a different set of 32 rows will terminate the computation of the two-dimensional NTT. This development is shown shortly on a 4 × 4 matrix. The number of I/O operations (each I/O operation consists of the transfer of 32 rows of data between disk and RAMs and back) is 32 for each dimension. An extra 31 I/O operations are needed at the end for the permutation of the rows of the result (it is assumed that the processing is done in place, i.e., the data are transferred back to the original position after computation).
Given enough fast memory, a large input image can be NTT transformed using a 32-length FNT without matrix transposing. Also, this fast memory allows a convolution to be computed without the need to use overlap sectioning methods, even when the filter dimensions are small. The two-dimensional convolution is computed using two forward NTTs and one inverse NTT. Thus, the two-dimensional NTT of the filter is computed and stored on a disk; the same disk will accommodate the two-dimensional NTT of the picture to be filtered. Both arrays are of dimension 1024 × 1024. The fast memory is then used to accommodate 16 rows of each array, and the pointwise multiplication is computed. The result is then written back on to the disk while another pair of 16 rows is read into RAMs. The disk then contains the two-dimensional NTT of the convolution. The inverse two-dimensional NTT follows exactly the same development as the forward two-dimensional NTT. In fact, time-reversing every row (column) before computing the FNTs will enable the same algorithm and the same twiddle factors to be used. The advantages of this method are as follows:

1. The two-dimensional convolution using the overlap techniques is not efficient unless optimal sectioning is used, and referring to Hunt [1972], the optimal sections in this case must be of the order of 512 × 512, which means that in this case eight times more memory would be needed.
2. Using this optimal sectioning, the advantage of using a small-length transform that has the attributes of the FNT is lost.
3. The method gives exact results (NTT property).

a. Computation of the 2-D NTT. Suppose that the matrix

$$A = \begin{bmatrix} a_0 & a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 & a_7 \\ a_8 & a_9 & a_{10} & a_{11} \\ a_{12} & a_{13} & a_{14} & a_{15} \end{bmatrix} \tag{30}$$
is stored on a disk and is accessible by rows only. Suppose also that enough fast memory is available to accommodate two rows of A; then the two rows $[a_0\ a_1\ a_2\ a_3]$ and $[a_8\ a_9\ a_{10}\ a_{11}]$ are read into RAMs and each row is broken up into a 2 × 2 submatrix, with data ordered columnwise:

$$\begin{bmatrix} a_0 & a_2 \\ a_1 & a_3 \end{bmatrix} \qquad \begin{bmatrix} a_8 & a_{10} \\ a_9 & a_{11} \end{bmatrix}$$

Each submatrix is NTT transformed to give

$$\begin{bmatrix} b_0 & b_1 \\ b_2 & b_3 \end{bmatrix} \qquad \begin{bmatrix} b_8 & b_9 \\ b_{10} & b_{11} \end{bmatrix}$$

Writing them back into a row format gives

$$\text{row 1: } [b_0\ b_1\ b_2\ b_3] \qquad \text{row 3: } [b_8\ b_9\ b_{10}\ b_{11}] \tag{31}$$

Doing the same for the other two rows of A yields

$$\text{row 2: } [b_4\ b_5\ b_6\ b_7] \qquad \text{row 4: } [b_{12}\ b_{13}\ b_{14}\ b_{15}] \tag{32}$$
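Therefore, the NTT along the rows of A is given by B:

$$B = \begin{bmatrix} b_0 & b_1 & b_2 & b_3 \\ b_4 & b_5 & b_6 & b_7 \\ b_8 & b_9 & b_{10} & b_{11} \\ b_{12} & b_{13} & b_{14} & b_{15} \end{bmatrix} \quad \text{and} \quad B^t = \begin{bmatrix} b_0 & b_4 & b_8 & b_{12} \\ b_1 & b_5 & b_9 & b_{13} \\ b_2 & b_6 & b_{10} & b_{14} \\ b_3 & b_7 & b_{11} & b_{15} \end{bmatrix} \tag{33}$$

To compute the NTT along the columns of B (rows of $B^t$), the procedure would be

$$\text{row 1 of } B^t\text{: } [b_0\ b_4\ b_8\ b_{12}] \qquad \text{row 3: } [b_2\ b_6\ b_{10}\ b_{14}]$$

Breaking up into submatrices gives

$$\begin{bmatrix} b_0 & b_8 \\ b_4 & b_{12} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} b_2 & b_{10} \\ b_6 & b_{14} \end{bmatrix} \tag{34}$$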
From Equation (34), it is clear that the rows to be transformed were already available in Equations (31) and (32) and hence would have been processed then, without the need to perform a matrix transposition. Referring to Eklundh's fast algorithm for matrix transposing [Eklundh, 1972], with the same number of rows read into computer memory from a disk, the number of I/Os necessary to transpose a 4 × 4 matrix is equal to 4. To compute a two-dimensional NTT, two matrix transpositions are needed, one to compute along the columns and the other to get back to the correct order, giving a total number of eight I/Os. The method described here uses only five I/Os: four to compute the two-dimensional NTT and one to permute the rows.

b. Example of 2-D Autoconvolution Obtained from Simulations. Modulus = $2^8 + 1 = 257$, $\alpha = 35$, array size = 4 × 4 (padded with zeros to form 64 × 64), magnitude = 2.
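Input array:

$$x(m, n) = \begin{bmatrix} 2 & 2 & 2 & 2 & 0 & 0 & \cdots & 0 \\ 2 & 2 & 2 & 2 & 0 & 0 & \cdots & 0 \\ 2 & 2 & 2 & 2 & 0 & 0 & \cdots & 0 \\ 2 & 2 & 2 & 2 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 \\ \vdots & & & & & & & \vdots \\ 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 \end{bmatrix}_{64 \times 64} \tag{35}$$

In this case the two-dimensional FNT is given by

$$X(k, l) = \sum_{m=0}^{63} \sum_{n=0}^{63} x(m, n)\, 35^{\langle mk \rangle}\, 35^{\langle nl \rangle} \bmod 257 \tag{36}$$

$$= \sum_{m=0}^{63} 35^{\langle mk \rangle} \sum_{n=0}^{63} x(m, n)\, 35^{\langle nl \rangle} \bmod 257 \tag{37}$$

$$G(m, l) = \sum_{n=0}^{63} x(m, n)\, 35^{\langle nl \rangle} \bmod 257 \tag{38}$$

or

$$G_m(l) = \sum_{n=0}^{63} x_m(n)\, 35^{\langle nl \rangle} \bmod 257 \tag{39}$$

Here $\langle \cdot \rangle$ stands for mod 64. If $l = 8p + q$ and $n = 8r + s$, then

$$G_m(p, q) = \sum_{s=0}^{7} \sum_{r=0}^{7} x_m(s, r)\, 35^{\langle 8ps \rangle}\, 35^{\langle 8rq \rangle}\, 35^{\langle qs \rangle} \bmod 257 \tag{40}$$

but $35^8 \bmod 257 = 4$; therefore

$$G_m(p, q) = \sum_{s=0}^{7} \sum_{r=0}^{7} x_m(s, r)\, 4^{|ps|}\, 4^{|rq|}\, 35^{\langle qs \rangle} \bmod 257 \tag{41}$$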
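The following example shows the three steps involved in computing $G_0(p, q)$:

1. $$U_0(s, q) = \sum_{r=0}^{7} x_0(s, r)\, 4^{|rq|} \bmod 257 \tag{42}$$

2. $$U_0^*(s, q) = U_0(s, q) \times 35^{\langle sq \rangle} \bmod 257 \tag{43}$$

3. $$G_0(p, q) = \sum_{s=0}^{7} U_0^*(s, q)\, 4^{|ps|} \bmod 257 \tag{44}$$

The first row of the 64 × 64 matrix is $x_0(m) = [2\ 2\ 2\ 2\ 0\ 0 \ldots 0\ 0\ 0]$.

$$x_0(s, r) = \begin{bmatrix} 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \tag{45}$$

$x_0(m)$ broken up into an 8 × 8 submatrix.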
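$$U_0(s, q) = \begin{bmatrix} 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \tag{46}$$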
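FNT of $x_0(s, r)$ computed along the rows.

$$\alpha^{\langle sq \rangle} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 35 & 197 & 213 & 2 & 70 & 137 & 154 \\ 1 & 197 & 2 & 137 & 251 & 236 & 135 & 85 \\ 1 & 213 & 137 & 36 & 135 & 28 & 29 & 169 \\ 1 & 2 & 251 & 135 & 165 & 237 & 214 & 170 \\ 1 & 70 & 236 & 28 & 237 & 119 & 99 & 219 \\ 1 & 137 & 135 & 29 & 214 & 99 & 33 & 105 \\ 1 & 154 & 85 & 169 & 170 & 219 & 105 & 239 \end{bmatrix} \tag{47}$$

Only 26 distinct values out of 64.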
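The storage counts quoted for the twiddle factors can be reproduced by counting distinct exponents $q \cdot s$. The sketch below uses a hypothetical helper `distinct_twiddle_exponents`. It relies on the fact that $\alpha$ has order 64 in the 8 × 8 case and 1024 in the 32 × 32 case, so distinct exponents below the order give distinct residues.

```python
def distinct_twiddle_exponents(r):
    """Distinct exponents q*s for q, s = 0..r-1 (the trivial exponent 0 included)."""
    return sorted({q * s for q in range(r) for s in range(r)})


# 8 x 8 case of this example: alpha = 35 modulo 257 has order 64
exps_8 = distinct_twiddle_exponents(8)
values_8 = {pow(35, e, 257) for e in exps_8}

# 32 x 32 case of the 1024-length transform discussed earlier
exps_32 = distinct_twiddle_exponents(32)
```

For the 8 × 8 case this gives 26 distinct exponents and hence 26 distinct twiddle values. For the 32 × 32 case it gives 340 exponents: the 339 non-trivial ones quoted earlier plus the exponent 0, whose factor is simply 1.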
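$$U_0^*(s, q) = \begin{bmatrix} 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 2 & 70 & 137 & 169 & 4 & 140 & 17 & 51 \\ 2 & 137 & 4 & 17 & 245 & 215 & 13 & 170 \\ 2 & 169 & 17 & 72 & 13 & 56 & 58 & 81 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \tag{48}$$

$$G_0(p, q) = \begin{bmatrix} 8 & 121 & 160 & 3 & 7 & 156 & 90 & 47 \\ 170 & 183 & 160 & 161 & 144 & 133 & 135 & 143 \\ 0 & 80 & 119 & 252 & 127 & 103 & 104 & 123 \\ 106 & 139 & 36 & 40 & 245 & 92 & 86 & 99 \\ 0 & 157 & 109 & 35 & 230 & 21 & 197 & 40 \\ 155 & 93 & 229 & 130 & 247 & 69 & 28 & 161 \\ 0 & 164 & 134 & 232 & 158 & 242 & 131 & 55 \\ 91 & 107 & 97 & 191 & 143 & 228 & 16 & 119 \end{bmatrix} \tag{49}$$

$G_0(p, q)$ is the two-dimensional FNT of $x_0(s, r)$. Next,

$$G_0(l) = [8\ 121\ 160\ 3\ 7\ 156\ 90\ 47\ 170\ 183\ 160\ \ldots\ 228\ 16\ 119] \tag{50}$$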
This method was used to compute the autoconvolution $y(m, n) = x(m, n) * x(m, n)$ from $Y(k, l) = |X(k, l)|^2$:

$$y(m, n) = \begin{bmatrix} 4 & 8 & 12 & 16 & 12 & 8 & 4 & 0 & \cdots & 0 \\ 8 & 16 & 24 & 32 & 24 & 16 & 8 & 0 & \cdots & 0 \\ 12 & 24 & 36 & 48 & 36 & 24 & 12 & 0 & \cdots & 0 \\ 16 & 32 & 48 & 64 & 48 & 32 & 16 & 0 & \cdots & 0 \\ 12 & 24 & 36 & 48 & 36 & 24 & 12 & 0 & \cdots & 0 \\ 8 & 16 & 24 & 32 & 24 & 16 & 8 & 0 & \cdots & 0 \\ 4 & 8 & 12 & 16 & 12 & 8 & 4 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 \\ \vdots & & & & & & & & & \vdots \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 \end{bmatrix}_{64 \times 64} \tag{51}$$

Images of approximate dimensions 1000 × 1000 are common, so the 1024 × 1024 two-dimensional NTT is not as peculiar as it might appear. Clearly the convolution can be computed efficiently using a single transform length, needing no matrix transposition and not using overlap sectioning techniques.
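The autoconvolution of Equation (51) can be reproduced with a short in-memory sketch using the stated parameters (modulus 257, $\alpha = 35$, 64 × 64 array). The code below evaluates the transforms directly and uses an ordinary in-memory transpose, so it illustrates the arithmetic only, not the disk-based, transposition-free schedule described above. The names `ntt_rows`, `transpose`, and `fnt2d` are hypothetical. The sketch checks only the first three entries of the row transform, 8, 121, and 160, which agree with Equation (50).

```python
F, ALPHA, N = 257, 35, 64   # 35 has order 64 modulo 257 (35**4 = 2)


def ntt_rows(a, w):
    """Length-N NTT of every row of a, with kernel w, evaluated directly."""
    tab = [pow(w, e, F) for e in range(N)]
    return [[sum(row[n] * tab[(n * l) % N] for n in range(N)) % F
             for l in range(N)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def fnt2d(a, w):
    """Rows, then columns (via an in-memory transpose, for brevity only)."""
    return transpose(ntt_rows(transpose(ntt_rows(a, w)), w))


# 4 x 4 block of intensity 2, zero-padded to 64 x 64 as in Equation (35)
x = [[2 if m < 4 and n < 4 else 0 for n in range(N)] for m in range(N)]

row0 = ntt_rows(x, ALPHA)[0]              # transform of the first row
X = fnt2d(x, ALPHA)
Y = [[v * v % F for v in row] for row in X]   # pointwise square in the NTT domain
inv_n2 = pow(N * N, -1, F)
y = [[v * inv_n2 % F for v in row] for row in fnt2d(Y, pow(ALPHA, -1, F))]
```

The nonzero part of `y` is the 7 × 7 pyramid $4\,c(m)\,c(n)$ with $c = [1, 2, 3, 4, 3, 2, 1]$; its peak value of 64 is below 257/2, so no overflow occurs.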
B. Number Domain Analysis in Two Dimensions with Possible Applications

NTTs have been mainly applied to the calculation of convolutions (filtering) and correlations. Until recently no other applications were thought to exist, because of the residue number system, where the same quantities are congruent rather than equal. Hence the concept of magnitude, as in normal fields, does not hold. NTTs are calculated by retaining the residues only, which has led to difficulty in trying to relate the information content in the number domain generated by NTTs to the time representation of signals and images, such as that of the Fourier transform correspondence between frequency and time domain. The aim of this section is to ease that difficulty. First, the patterns that emerge in the 2-D NTTs of elementary images are shown. Also, the patterns of 2-D NTTs of periodic images are considered. Such patterns are regular in form, indicating that 2-D NTTs are, in fact, very well structured. This gives rise to the possibility of their application in pattern recognition.

1. Simple Images in the 2-D NTT Domain

Although the transform domain for NTTs is not as simple to interpret as for some other transforms, the 2-D FNT domain for two-dimensional data does have special patterns of zero and nonzero values that could find applications. For objects such as squares, rectangles, and triangles, patterns of zeros appear in the 2-D FNT domain as a signature of the geometrical shapes [Marir, 1986; Boussakta, 1990]. In this section uses and limitations of these patterns are considered. Although the features of zeros can be formulated into equations, they are not unique. Thus, they do not characterize a unique shape. Additional pixels may be needed. However, where there is a need for classifying a closed set of objects, the zero pattern may offer a simple pattern classifier, which can be used for industrial detection problems, for example [Hollinghum, 1984; Wallace, 1988].
The zero patterns in the 2-D NTT domain are invariant under translation, and the 2-D NTT domain rotates with the rotation of the original picture.

2. Mathematical Development

In the following development, $x(m, n)$ is assumed to have a uniform intensity. The 2-D NTT transform defined in Equation (3) becomes

$$X(k, l) = x(m, n) \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \alpha_1^{mk}\, \alpha_2^{nl} \bmod F \tag{52}$$

3. Square Images

For a square image of side L with $i_1$ and $i_2$ defining the position of the first pixel, Equation (52) becomes

$$X(k, l) = x(m, n) \sum_{m=i_1}^{L_1' - 1} \sum_{n=i_2}^{L_2' - 1} \alpha_1^{mk}\, \alpha_2^{nl} \bmod F \tag{53}$$

where $L_1' = L_1 + i_1$ and $L_2' = L_2 + i_2$. For a square of side L, $L_1 = L_2 = L$ and

$$X(k, l) = x(m, n)\, \alpha_1^{i_1 k}\, \alpha_2^{i_2 l} \sum_{m=0}^{L_1 - 1} \sum_{n=0}^{L_2 - 1} \alpha_1^{mk}\, \alpha_2^{nl} \bmod F \tag{54}$$

$$X(k, l) = A(k, l)\, B(k, l) \bmod F \tag{55}$$

where

$$A(k, l) = x(m, n)\, \alpha_1^{i_1 k}\, \alpha_2^{i_2 l} \bmod F \tag{56}$$

$$B(k, l) = \sum_{m=0}^{L_1 - 1} \sum_{n=0}^{L_2 - 1} \alpha_1^{mk}\, \alpha_2^{nl} \bmod F \tag{57}$$
$A(k, l)$ is in general different from zero and depends only on the intensity and the position of the first nonzero pixel in the image data. In real arithmetic Equation (57) can never be zero for positive $\alpha_1$ and $\alpha_2$. However, in finite fields it takes zero values for particular values of k and l [Boussakta, 1990]. This gives a regular pattern of zeros that can be used as a signature for the feature under consideration. We study simple objects; the results can be extended to more complicated shapes utilizing the linearity property of the 2-D NTTs for possible use in pattern recognition.

a. Square Element of Side L = 2. For a square of side L = 2 and constant intensity (see Figure 3(a)), Equation (57) becomes

$$B(k, l) = \sum_{m=0}^{1} \sum_{n=0}^{1} \alpha_1^{mk}\, \alpha_2^{nl} \bmod F \tag{58}$$

To find the pattern of regular zeros, Equation (58) is set to equal zero. It is then solved modulo F as follows:

$$B(k, l) = 0 \tag{59}$$

which is equivalent to

$$(1 + \alpha_1^{k})(1 + \alpha_2^{l}) = 0 \bmod F \tag{60}$$
FIGURE 3. Square images of sides 2, 3, 4 and their 2-D FNTs. (a) Square image of side 2. (b) 2-D FNT transform of (a). (c) Square image of side 3. (d) 2-D FNT transform of (c). (e) Square image of side 4. (f) 2-D FNT transform of (e).
In finite fields Equation (60) has two solutions:

$$k = M/2 \quad \text{and} \quad l = N/2 \tag{61}$$

Thus, the NTT of this square contains one row and one column of zero values in the middle, which divide it into four parts, as shown in Figure 3(b). Note that no particular modulus or transform length is specified, although in the following we use $F_3$ with the transform dimensions M = N = 16 and $\alpha = 2$. Unless otherwise stated, both the image data and the 2-D FNT are displayed using 32 grey levels, ranging from black for zero intensity to white for the maximum intensity.

b. Square of Side L = 3. See Figure 3(c). Equation (57) becomes

$$B(k, l) = \sum_{m=0}^{2} \sum_{n=0}^{2} \alpha_1^{mk}\, \alpha_2^{nl} \bmod F \tag{62}$$

$B(k, l) = 0 \bmod F$ is equivalent to

$$(1 + \alpha_1^{k} + \alpha_1^{2k})(1 + \alpha_2^{l} + \alpha_2^{2l}) = 0 \bmod F \tag{63}$$
Whatever the values of k and l, Equation (63) has no solution because neither $(\alpha_1^{k} + \alpha_1^{2k})$ nor $(\alpha_2^{l} + \alpha_2^{2l})$ can be equal to $-1$. Hence, there is no regular pattern of zeros (no whole column or row of zero values) in the NTT of a square image of size 3 × 3. This is confirmed in Figure 3(d). However, if we look at the NTT as a whole, other elements can be selected as a signature of this square. For example, if the pattern of 2 is selected, it gives a regular pattern, forming a circle with one value at the center.

c. Square of Side L = 4. See Figure 3(e).

$$B(k, l) = \sum_{m=0}^{3} \sum_{n=0}^{3} \alpha_1^{mk}\, \alpha_2^{nl} \bmod F \tag{64}$$

$$B(k, l) = (1 + \alpha_1^{k} + \alpha_1^{2k} + \alpha_1^{3k})(1 + \alpha_2^{l} + \alpha_2^{2l} + \alpha_2^{3l}) = 0 \bmod F \tag{65}$$

$$1 + \alpha_1^{k} + \alpha_1^{2k} + \alpha_1^{3k} = 0 \quad \text{for } k = M(2i + 1)/2 \text{ and } k = M(2i + 1)/4 \tag{66}$$

$$1 + \alpha_2^{l} + \alpha_2^{2l} + \alpha_2^{3l} = 0 \quad \text{for } l = N(2j + 1)/2 \text{ and } l = N(2j + 1)/4 \tag{67}$$

where i and j are integers such that k and l are less than M and N, respectively. For N = M = 16, the solutions are k, l = 4, 8, 12. Hence, the pattern of a square image with side L = 4 divides the NTT image into 16 parts, as shown in Figure 3(f).

d. Square of Side L = 5. See Figure 4(a).

$$B(k, l) = \sum_{m=0}^{4} \sum_{n=0}^{4} \alpha_1^{mk}\, \alpha_2^{nl} = 0 \bmod F \tag{68}$$
This equation has no solution; thus a square image of side L D 5 has no regular pattern of zeros, as shown in Figure 4(b). However, it has a special distribution of other intensities such as 2 that is different from all previous features. e. Square of Side L D 6. Bk, l D
5 5
See Figure 4(c). nl ˛mk 1 ˛2 D 0
mod F
69
mD0 nD0
D 1 C
˛3k 1 1
C
˛3l 2
2 2
nl ˛mk 1 ˛2 D 0
mod F
70
mD0 nD0
The double sum was shown to be different from zero. Thus, the regular zero values of Bk, l are the same as the zeros of: 3l 1 C ˛3k 1 1 C ˛2 D 0
mod F
71
which gives: k D M2i C 1/6 and l D N2j C 1/6
72
where i, j are integers such that k and l are also integers and less than M and N, respectively. For M D N D 16, there are two solutions, corresponding to k D 8 and l D 8
73
Thus, a square image of side L D 6 is characterized by one row and one column of zeros in the middle of the NTT image dividing it into four parts, which is the same as a square image of size 2 ð 2. This regular pattern of zeros is not unique, and other pixels are needed to make the difference. This does not become a problem because there is a clear difference in the images as a whole. Consider the dc values (the first pixels at k D 0 and l D 0). As shown in Figure 3(b) and Figure 4(d), these pixels are clearly different. Also,
NUMBER THEORETIC TRANSFORMS AND THEIR APPLICATIONS
(a)
(b)
(c)
(d)
(e)
29
(f)
FIGURE 4. Square images of sides 5, 6, 7 and their 2-D FNTs. (a) Square image of side 5. (b) 2-D FNT transform of part (a). (c) Square image of side 6. (d) 2-D FNT transform of part (c). (e) Square image of side 7. (f) 2-D FNT transform of part (e).
30
S. BOUSSAKTA AND A. G. J. HOLT
the distribution of other low-intensity pixels presented as black can result in a notable difference. Hence, a few extra pixels are sufficient to distinguish between these squares. f. Square of Side L D 7.
See Figure 4(e).
$$B(k,l) = \sum_{m=0}^{6}\sum_{n=0}^{6} \alpha_1^{mk}\,\alpha_2^{nl} \pmod{F} \tag{74}$$
$B(k,l) = 0$ has no solution. Thus, as Figure 4(f) illustrates, there is no regular pattern of zeros in the NTT of this image. However, the pixels of value 2 are well distributed, giving a diagonally parallel pattern of 2s that uniquely characterizes this shape, as shown in Figure 4(f).
g. Square of Side $L = 8$. See Figure 5(a).
$$B(k,l) = \sum_{m=0}^{7}\sum_{n=0}^{7} \alpha_1^{mk}\,\alpha_2^{nl} = 0 \pmod{F} \tag{75}$$
which is equivalent to
$$\left(1+\alpha_1^{4k}\right)\left(1+\alpha_1^{2k}\right)\left(1+\alpha_1^{k}\right)\left(1+\alpha_2^{4l}\right)\left(1+\alpha_2^{2l}\right)\left(1+\alpha_2^{l}\right) = 0 \pmod{F} \tag{76}$$
This equation has fourteen solutions, corresponding to
$$\begin{aligned}
k &= M(2i+1)/8, & l &= N(2j+1)/8,\\
k &= M(2i+1)/4, & l &= N(2j+1)/4,\\
k &= M(2i+1)/2, & l &= N(2j+1)/2
\end{aligned} \tag{78}$$
If $N \times M = 16 \times 16$, then
$$k = 2, 4, 6, 8, 10, 12, 14 \qquad l = 2, 4, 6, 8, 10, 12, 14$$
Therefore, seven rows and seven columns of a regular pattern of zeros are present in the NTT of a square image of size $8 \times 8$. This is shown in Figure 5(b). The same development has been carried out for squares of other sizes. The results for these are shown in Figures 5–7.
4. Rectangular Images
The same development for different rectangles shows that this pattern obeys the same rules as that of the squares. Taking $M \neq N$ in the development
presented previously will give the case for different rectangles. Thus, as for the square images, a rectangle with odd length and odd width has no zero pattern, and a rectangle of length $L$ and width $W$ shares the same pattern of zeros along rows and along columns with the square of side $L$ and the square of side $W$, respectively. In addition, a rectangle may have the same pattern of zeros as a square, which necessitates the comparison of other pixels to discern a difference.
FIGURE 5. Square images of sides 8, 10, 11 and their 2-D FNTs. (a) Square image of side 8. (b) 2-D FNT transform of part (a). (c) Square image of side 10. (d) 2-D FNT transform of part (c). (e) Square image of side 11. (f) 2-D FNT transform of part (e).
5. Summary of the Results
We may draw the following conclusions:
1. All squares having sides formed from odd numbers of pixels and all rectangles with odd lengths and odd widths have no pattern of zeros but a regular pattern of other intensities; as an example consider the value of 2.
2. Some patterns of zeros are common to several shapes, for example, the pattern of $2 \times 2$, $6 \times 6$, $10 \times 10$, and $14 \times 14$ or the pattern of $4 \times 4$ and $12 \times 12$. Therefore, other elements in the 2-D NTT domain should be selected to tell such shapes apart if needed. Few pixels are needed to distinguish between the patterns. For example, all cases previously stated as having similar 2-D NTTs have a different dc value, which may be used to identify each of them. Other intensities, such as the pattern of 2s, can also be used whenever the pattern of zeros is absent or extra pixels are needed to tell the shapes apart.
3. The pattern of zeros, as will be shown later, is invariant under translation and rotates with the rotation of the picture.
In all the figures just given, the modulus is chosen to be $F_3$. The zero pattern is presented as a black background and the nonzero pattern is grey using 32 levels, which range from black to white.
6. Generalization and Possible Applications
Because the NTTs are linear, generalization to images of higher complexity follows the same analysis. This is done for more complicated shapes, such as triangles and alphabetic characters [Boussakta, 1990; Marir, 1986].
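The zero patterns summarized in Section 5 can be checked numerically. The following is a minimal sketch, assuming $\alpha_1 = \alpha_2 = 2$, $M = N = 16$, and $F = F_3 = 257$, and a square of unit intensity placed at the origin, so that $B(k,l)$ factors into a row sum times a column sum. The helper names `line_sum` and `zero_lines` are illustrative only.

```python
F = 257      # Fermat number F3, the modulus used for the figures
ALPHA = 2    # root of unity of order 16 modulo 257
M = 16       # transform length along each dimension

def line_sum(side, k):
    """S(k) = sum over m = 0..side-1 of ALPHA^(m*k), reduced modulo F."""
    return sum(pow(ALPHA, m * k, F) for m in range(side)) % F

def zero_lines(side):
    """Indices k with S(k) = 0. Because B(k, l) = S(k) * S(l) for a square at
    the origin, these are the rows (and columns) of zeros in its 2-D FNT."""
    return [k for k in range(M) if line_sum(side, k) == 0]

for side in range(1, 17):
    print(f"side {side:2d}: zero rows/columns at {zero_lines(side)}")
```

Under these assumptions, odd sides give an empty list, while sides 2, 6, 10, and 14 share one list and sides 4 and 12 share another, in line with conclusions 1 and 2.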
FIGURE 6. Square images of sides 12, 13, 14 and their 2-D FNTs. (a) Square image of side 12. (b) 2-D FNT transform of part (a). (c) Square image of side 13. (d) 2-D FNT transform of part (c). (e) Square image of side 14. (f) 2-D FNT transform of part (e).
FIGURE 7. Square images of sides 15 and 16 and their 2-D FNTs. (a) Square image of side 15. (b) 2-D FNT of part (a). (c) Square image of side 16. (d) 2-D FNT of part (c).
In Figure 8(a) a triangle of equal sides is shown in the top left corner. Figure 8(b) shows only the pattern of zeros as white on a black background, which appears in the transform domain to form two triangles placed base to base. This shape is also diagonally symmetric. In Figure 8(c), the same triangle is displaced and reoriented. Figure 8(d) shows the pattern of zeros. It is clear that the same zero pattern exists (its orientation follows the input image), and the
presence of the triangle can be detected by counting the few zero values. If the data are in diminished-one representation, the detection of zero values in the 2-D FNT domain is easily done by testing only the most significant bit. This may be used, for example, in pattern recognition.
FIGURE 8. Simple shapes and their 2-D FNTs. (a) A $3 \times 3$ triangle. (b) Zero pattern in the 2-D FNT of part (a). (c) $3 \times 3$ triangle displaced and reoriented. (d) Zero pattern in the 2-D FNT of part (c). (e) A square of side 4 translated into the middle. (f) Zero pattern of the 2-D FNT of part (e), which is the same as Figure 3(f).
7. The Effect of Translation on the Zero Pattern
If the squares or triangles presented so far are translated along the rows or the columns or in both directions, the coordinates will be changed from $i_1$ and $i_2$ to $i_1 + a$ and $i_2 + b$, where $a$ represents the translation along rows and $b$ the translation along columns. Then Equation (53) becomes
$$X(k,l) = x(m,n)\sum_{m=i_1+a}^{i_1+a+L_1-1}\ \sum_{n=i_2+b}^{i_2+b+L_2-1} \alpha_1^{mk}\,\alpha_2^{nl} \pmod{F} \tag{79}$$
$$X(k,l) = x(m,n)\,\alpha_1^{k(i_1+a)}\,\alpha_2^{l(i_2+b)}\sum_{m=0}^{L_1-1}\sum_{n=0}^{L_2-1} \alpha_1^{mk}\,\alpha_2^{nl} = A_2(k,l)\,B(k,l) \tag{80}$$
where
$$A_2(k,l) = x(m,n)\,\alpha_1^{ka}\,\alpha_2^{lb}\,\alpha_1^{ki_1}\,\alpha_2^{li_2} \pmod{F} = A(k,l)\,\alpha_1^{ka}\,\alpha_2^{lb} \pmod{F} \tag{81}$$
From Equation (81) we can see that translation is equivalent to multiplying each pixel of the transform by a corresponding factor:
$$\alpha_1^{ka}\,\alpha_2^{lb} \pmod{F} \tag{82}$$
However, the equation whose solutions give rise to the pattern of zeros is unchanged:
$$B(k,l) = \sum_{m=0}^{L_1-1}\sum_{n=0}^{L_2-1} \alpha_1^{mk}\,\alpha_2^{nl} \pmod{F} \tag{83}$$
which is the same as Equation (57). Hence the pattern of zeros that appears in the 2-D NTT domain remains unchanged. This is true for all kinds of images and all NTTs. Therefore, the method described here is insensitive to translation. This is an important feature for industrial inspection problems, for instance. Figure 8(e) shows a $4 \times 4$ square translated to the middle, and Figure 8(f) shows its 2-D FNT, which has the same pattern of zeros as Figure 3(f).
8. Possible Applications
The patterns of zeros show the behavior in the 2-D FNT domain with a view to the use of NTTs for purposes other than convolutions. The results may be important in pattern-recognition applications such as automatic isolated machine character recognition. In Figure 9(a), (c), and (e), letters are simulated; their 2-D NTTs are presented in Figure 9(b), (d), and (f), respectively, where only the zero values are displayed as white on different backgrounds. It is clear that the pattern of zeros is unique for all letters considered here. Counting all or some of the zero values can lead to the identification of each letter. Thus, the complexity of the process is reduced to the calculation of the 2-D FNT of each letter. It is not necessary to calculate and display the whole transform; only the zero values are needed.
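The translation property of Section 7 can be illustrated with a direct (slow) 2-D FNT. This is only a sketch under the assumptions $\alpha_1 = \alpha_2 = 2$, $M = N = 16$, $F = 257$; the function names `fnt2d`, `square`, and `zero_set` are hypothetical helpers, and the offsets $a = b = 6$ are an arbitrary choice.

```python
F, ALPHA, N = 257, 2, 16

def fnt2d(x):
    """Direct 2-D FNT: X(k,l) = sum_m sum_n x(m,n) ALPHA^(mk) ALPHA^(nl) mod F."""
    # First transform each row over n, then combine the rows over m.
    rows = [[sum(x[m][n] * pow(ALPHA, n * l, F) for n in range(N)) % F
             for l in range(N)] for m in range(N)]
    return [[sum(rows[m][l] * pow(ALPHA, m * k, F) for m in range(N)) % F
             for l in range(N)] for k in range(N)]

def square(side, a=0, b=0):
    """N x N image holding a square of ones whose first pixel is at row a, column b."""
    return [[1 if a <= m < a + side and b <= n < b + side else 0
             for n in range(N)] for m in range(N)]

def zero_set(X):
    """Coordinates (k, l) of the zero-valued transform pixels."""
    return {(k, l) for k in range(N) for l in range(N) if X[k][l] == 0}

corner = zero_set(fnt2d(square(4)))
middle = zero_set(fnt2d(square(4, a=6, b=6)))
print("same zero pattern:", corner == middle, "| zero pixels:", len(corner))
```

The two zero sets coincide because the translation only multiplies each transform pixel by the nonzero factor of Equation (82).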
FIGURE 9. Some characters and their 2-D FNT domain patterns. (a) Letter T. (b) Zero pattern in the 2-D FNT of the letter T. (c) Letter F. (d) Zero pattern in the 2-D FNT of the letter F. (e) Letter E. (f) Zero pattern in the 2-D FNT of the letter E.
9. Summary
Although the number domain generated by the 2-D NTTs is not always simple to interpret, 2-D NTTs are meaningful. The study of some of the patterns of zeros that appear in the number domain for 2-D NTTs shows that, even though the patterns may be similar for some features, they might be used effectively in pattern-recognition problems where there is a need only for classifying a closed set of objects, such as those that occur in character recognition. If the process is complicated and the zero pattern is not clear, other pixels in the NTT domain can be selected to show the difference between different patterns. This decision is problem dependent and can be made in the training session that precedes any pattern-recognition process [Hollingum, 1984; Wallace, 1988; Bongard, 1970].
IV. 2-D NTTS OF PERIODIC FUNCTIONS AND THEIR APPLICATIONS
Some of the properties of the transform domain of the 2-D NTT and its possible applications are considered. Particular attention is given to the 2-D FNT and to data of $16 \times 16$ pixel size. However, the results presented would apply to other picture sizes [Boussakta, 1990; Boussakta, Shakaff, Marir and Holt, 1988]. The transform domain for 2-D NTTs is not as simple to interpret as a frequency spectrum. However, for the periodic 2-D data patterns considered, the transform does have a well-structured form. A number of specific examples of 2-D FNTs of various periodic patterns are considered, and both the data patterns and their FNTs are displayed. A general result is given for 2-D FNTs, showing the maximum number of nonzero values that appear in the transforms of given data patterns. These results are generalized for 2-D FNTs, and it is shown that similar results hold for other 2-D NTTs. For periodic 2-D data the transform contains many zero-valued entries: these are true zeros, because no rounding or truncation errors are introduced by the 2-D FNT calculation.
It follows that the 2-D FNT computation of such data is simplified. The 2-D FNT of periodic data has a regular pattern, and any small imperfection in the data that interferes with its periodic nature can destroy the regularity of the 2-D FNT pattern. This is shown to be so for a number of examples where a small (one-pixel) change in otherwise periodic data produces a marked change in the resulting 2-D FNT patterns [Boussakta, 1990; Boussakta, Shakaff, Marir and Holt, 1988].
A. Analysis of the 2-D FNT of Periodic Structures
Consider periodic patterns. The structure of the 2-D FNT is formulated into simple equations, which lead to a general result. An image $x(m,n)$ is periodic with period $T_r \times T_c$ if the pixel $x(m,n)$ is equal to the pixel $x(m + T_r, n + T_c)$, where $T_r$ is the period along rows and $T_c$ is the period along columns. An image of
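period $8 \times 8$ is shown in Figure 10(a). Although the following development assumes that $\alpha_1 = \alpha_2 = 2$, $M = N = 16$, and $F = F_3 = 257$, the theory is applicable to any case where $M$ and $N$ are highly composite.
1. Case 1: $T_r = T_c$
Let the period be $T_r \times T_c = M/2 \times N/2 = 8 \times 8$: $x(m,n) = x(m+8, n+8)$. The data are shown in Figure 10(a). The 2-D FNT of an image $x(m,n)$ from Equation (3) is
$$X(k,l) = \sum_{m=0}^{15}\sum_{n=0}^{15} x(m,n)\,2^{mk}\,2^{nl} \pmod{F_3} \tag{84}$$
Interchanging the summations, performing them first with respect to $m$, Equation (84) can be written as
$$X(k,l) = \sum_{n=0}^{15} 2^{nl}\left[\sum_{m=0}^{7} x(m,n)\,2^{mk} + \sum_{m=8}^{15} x(m,n)\,2^{mk}\right] \pmod{F_3} \tag{85}$$
Because the image data are periodic along columns with $T_r = 8$, Equation (85) can be written as
$$X(k,l) = \sum_{n=0}^{15} 2^{nl}\left(1 + 2^{8k}\right)\sum_{m=0}^{7} x(m,n)\,2^{mk} \pmod{F_3} \tag{86}$$
Performing the same step along rows gives
\begin{align}
X(k,l) &= \left(1 + 2^{8k}\right)\left(1 + 2^{8l}\right)\sum_{m=0}^{7}\sum_{n=0}^{7} x(m,n)\,2^{mk}\,2^{nl} \pmod{F_3} \notag\\
&= A(k)\cdot A(l)\sum_{m=0}^{7}\sum_{n=0}^{7} x(m,n)\,2^{mk}\,2^{nl} \pmod{F_3} \tag{87}
\end{align}
where
$$A(k) = 1 + 2^{8k} \pmod{F_3} \tag{88}$$
$$A(l) = 1 + 2^{8l} \pmod{F_3} \tag{89}$$
and
$$A(k) = \begin{cases} 2 & \text{if } k = 2i,\ i = 0, 1, 2, \ldots, 7\\ 0 & \text{otherwise} \end{cases} \pmod{F_3} \tag{90}$$
$$A(l) = \begin{cases} 2 & \text{if } l = 2j,\ j = 0, 1, 2, \ldots, 7\\ 0 & \text{otherwise} \end{cases} \pmod{F_3} \tag{91}$$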
(a)
1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
2 3 4 5 6 7 8 9 2 3 4 5 6 7 8 9
3 4 5 6 7 8 9 1 3 4 5 6 7 8 9 1
4 5 6 7 8 9 3 2 4 5 6 7 8 9 3 2
5 6 7 8 9 2 4 6 5 6 7 8 9 2 4 6
6 7 8 9 1 2 3 4 6 7 8 9 1 2 3 4
7 8 9 3 6 9 2 1 7 8 9 3 6 9 2 1
8 9 1 2 4 6 8 1 8 9 1 2 4 6 8 1
1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
2 3 4 5 6 7 8 9 2 3 4 5 6 7 8 9
3 4 5 6 7 8 9 1 3 4 5 6 7 8 9 1
4 5 6 7 8 9 3 2 4 5 6 7 8 9 3 2
5 6 7 8 9 2 4 6 5 6 7 8 9 2 4 6
6 7 8 9 1 2 3 4 6 7 8 9 1 2 3 4
7 8 9 3 6 9 2 1 7 8 9 3 6 9 2 1
8 9 1 2 4 6 8 1 8 9 1 2 4 6 8 1
(b)
 67   0  61   0  40   0 221   0   0   0 245   0 177   0  84   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
144   0 110   0 117   0 205   0 105   0 181   0  87   0 250   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 44   0 192   0 243   0 144   0 159   0 115   0  29   0 239   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
160   0 171   0 202   0  37   0  21   0  29   0 253   0  53   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 16   0 194   0 189   0 119   0  56   0  93   0  60   0 173   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 26   0 164   0 150   0 134   0  95   0 247   0  54   0   4   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
173   0 205   0  36   0  30   0  26   0 183   0 222   0  17   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  8   0 102   0 188   0  36   0 181   0  38   0  10   0  38   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
FIGURE 10. Periodic image with $T_r \times T_c = 8 \times 8$ and its 2-D FNT. (a) Periodic image with $T_r \times T_c = 8 \times 8$. (b) 2-D FNT of part (a).
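The structure of Figure 10(b) can be checked from the definition in Equation (84). The sketch below assumes the $8 \times 8$ tile read from Figure 10(a), $\alpha = 2$, and $F_3 = 257$; `fnt2d` is an illustrative direct implementation, not an optimized algorithm.

```python
F, ALPHA, N = 257, 2, 16

# One 8 x 8 period of the image in Figure 10(a).
TILE = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [2, 3, 4, 5, 6, 7, 8, 9],
    [3, 4, 5, 6, 7, 8, 9, 1],
    [4, 5, 6, 7, 8, 9, 3, 2],
    [5, 6, 7, 8, 9, 2, 4, 6],
    [6, 7, 8, 9, 1, 2, 3, 4],
    [7, 8, 9, 3, 6, 9, 2, 1],
    [8, 9, 1, 2, 4, 6, 8, 1],
]

def fnt2d(x):
    """Direct 2-D FNT: X(k,l) = sum_m sum_n x(m,n) 2^(mk) 2^(nl) mod 257."""
    rows = [[sum(x[m][n] * pow(ALPHA, n * l, F) for n in range(N)) % F
             for l in range(N)] for m in range(N)]
    return [[sum(rows[m][l] * pow(ALPHA, m * k, F) for m in range(N)) % F
             for l in range(N)] for k in range(N)]

image = [[TILE[m % 8][n % 8] for n in range(N)] for m in range(N)]
X = fnt2d(image)

# The factors A(k) and A(l) of Equations (90) and (91) vanish for odd indices.
A = [(1 + pow(ALPHA, 8 * k, F)) % F for k in range(N)]
nonzero = [(k, l) for k in range(N) for l in range(N) if X[k][l]]
print("A(k):", A)
print("nonzero pixels:", len(nonzero), "| dc value:", X[0][0])
```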
Hence,
$$X(k,l) = \begin{cases} 4\displaystyle\sum_{m=0}^{7}\sum_{n=0}^{7} x(m,n)\,2^{mk}\,2^{nl} & \text{if } k = 2i,\ l = 2j,\ \ i, j = 0, 1, 2, \ldots, 7\\ 0 & \text{otherwise} \end{cases} \pmod{F_3} \tag{92}$$
Thus, period $T_r \times T_c = 8 \times 8$ leads to a maximum of 64 nonzero elements for an image of $16 \times 16$. Figure 10(b) illustrates this. For other periods and cases when the images have different periodicities along rows and columns, the results are tabulated in Tables 3 and 4.
2. Generalization for 2-D FNT
Let $x(m,n)$ be any periodic image with period $T_r \times T_c$ and $F_t$ be any modulus. Then the 2-D FNT of the data is given by
\begin{align}
X(k,l) &= \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} x(m,n)\,2^{mk}\,2^{nl} \pmod{F_t} \notag\\
&= \sum_{m=0}^{M-1} 2^{mk}\sum_{n=0}^{N-1} x(m,n)\,2^{nl} \pmod{F_t} \tag{93}
\end{align}
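TABLE 3
CASE 1: $T_r = T_c$.

| $T_r \times T_c$ | Equations of Nonzero Pixels | Maximum Number of Nonzero Pixels |
|---|---|---|
| $4 \times 4$ | $X(k,l) = 16\sum_{m=0}^{3}\sum_{n=0}^{3} x(m,n)\,2^{mk}\,2^{nl} \pmod{F_3}$ for $k = 4i$; $l = 4j$; $i = 0, 1, 2, 3$; $j = 0, 1, 2, 3$; $X(k,l) = 0$ otherwise | 16 |
| $2 \times 2$ | $X(k,l) = 64\sum_{m=0}^{1}\sum_{n=0}^{1} x(m,n)\,2^{mk}\,2^{nl} \pmod{F_3}$ for $k = 8i$; $l = 8j$; $i, j = 0, 1$; $X(k,l) = 0$ otherwise | 4 |

TABLE 4
CASE 2: $T_r \neq T_c$.

| $T_r \times T_c$ | Equations of Nonzero Pixels | Maximum Number of Nonzero Pixels |
|---|---|---|
| $2 \times 8$ | $X(k,l) = 16\sum_{m=0}^{1}\sum_{n=0}^{7} x(m,n)\,2^{mk}\,2^{nl} \pmod{F_3}$ if $k = 8i$; $i = 0, 1$; $l = 2j$; $j = 0, 1, \ldots, 7$; $X(k,l) = 0$ otherwise | 16 |
| $4 \times 8$ | $X(k,l) = 8\sum_{m=0}^{3}\sum_{n=0}^{7} x(m,n)\,2^{mk}\,2^{nl} \pmod{F_3}$ if $k = 4i$; $l = 2j$; $i = 0, 1, \ldots, 3$; $j = 0, 1, \ldots, 7$; $X(k,l) = 0$ otherwise | 32 |
| $2 \times 4$ | $X(k,l) = 32\sum_{m=0}^{1}\sum_{n=0}^{3} x(m,n)\,2^{mk}\,2^{nl} \pmod{F_3}$ if $k = 8i$; $l = 4j$; $i = 0, 1$; $j = 0, 1, \ldots, 3$; $X(k,l) = 0$ otherwise | 8 |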
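Summing along rows gives
$$X(k,l) = \sum_{m=0}^{M-1} 2^{mk}\left[\sum_{n=0}^{T_c-1} x(m,n)\,2^{nl} + \sum_{n=T_c}^{2T_c-1} x(m,n)\,2^{nl} + \cdots + \sum_{n=N-T_c}^{N-1} x(m,n)\,2^{nl}\right] \pmod{F_t} \tag{94}$$
Because $x(m + T_r, n + T_c) = x(m,n)$, Equation (94) can be written as
$$X(k,l) = \sum_{m=0}^{M-1} 2^{mk}\left(1 + 2^{T_c l} + 2^{2T_c l} + \cdots + 2^{(N-T_c)l}\right)\sum_{n=0}^{T_c-1} x(m,n)\,2^{nl} \pmod{F_t} \tag{95}$$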
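Summing along columns with respect to the period $T_r$, Equation (95) can be written as
$$X(k,l) = \left(1 + 2^{T_r k} + 2^{2T_r k} + \cdots + 2^{(M-T_r)k}\right)\left(1 + 2^{T_c l} + 2^{2T_c l} + \cdots + 2^{(N-T_c)l}\right)\sum_{m=0}^{T_r-1}\sum_{n=0}^{T_c-1} x(m,n)\,2^{mk}\,2^{nl} \pmod{F_t} \tag{96}$$
Factoring Equation (96) gives
$$X(k,l) = A(k)\cdot A(l)\sum_{m=0}^{T_r-1}\sum_{n=0}^{T_c-1} x(m,n)\,2^{mk}\,2^{nl} \pmod{F_t} \tag{97}$$
where
$$A(k) = \prod_{q=1}^{\log_2(M/T_r)}\left(1 + 2^{(M/E)k}\right);\quad E = 2^q \pmod{F_t} \tag{98}$$
$$A(l) = \prod_{p=1}^{\log_2(N/T_c)}\left(1 + 2^{(N/G)l}\right);\quad G = 2^p \pmod{F_t} \tag{99}$$
From Equation (98) and Equation (99):
$$A(k) = \begin{cases} M/T_r & \text{if } k = Mi/T_r,\ i = 0, 1, 2, \ldots, T_r - 1\\ 0 & \text{otherwise} \end{cases} \pmod{F_t} \tag{100}$$
$$A(l) = \begin{cases} N/T_c & \text{if } l = Nj/T_c,\ j = 0, 1, 2, \ldots, T_c - 1\\ 0 & \text{otherwise} \end{cases} \pmod{F_t} \tag{101}$$
Therefore,
$$X(k,l) = \begin{cases} (M/T_r)(N/T_c)\displaystyle\sum_{m=0}^{T_r-1}\sum_{n=0}^{T_c-1} x(m,n)\,2^{mk}\,2^{nl} \pmod{F_t} & \text{if } k = Mi/T_r,\ l = Nj/T_c,\\ & i = 0, 1, 2, \ldots, T_r - 1,\ j = 0, 1, 2, \ldots, T_c - 1\\ 0 & \text{otherwise} \end{cases} \tag{102}$$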
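Equation (102) can be compared against a direct evaluation of Equation (93). The following sketch assumes $\alpha = 2$, $M = N = 16$, $F_t = F_3 = 257$, and a randomly filled tile with $T_r = 4$, $T_c = 8$; the function names are hypothetical, and the random tile is only test data.

```python
import random

F, ALPHA, M, N = 257, 2, 16, 16

def fnt2d(x):
    """Direct 2-D FNT of an M x N image (Equation (93))."""
    rows = [[sum(x[m][n] * pow(ALPHA, n * l, F) for n in range(N)) % F
             for l in range(N)] for m in range(M)]
    return [[sum(rows[m][l] * pow(ALPHA, m * k, F) for m in range(M)) % F
             for l in range(N)] for k in range(M)]

def periodic_image(tile):
    """Tile a Tr x Tc block over the M x N image."""
    Tr, Tc = len(tile), len(tile[0])
    return [[tile[m % Tr][n % Tc] for n in range(N)] for m in range(M)]

def reduced_fnt(tile):
    """Right-hand side of Equation (102): only Tr * Tc pixels are computed,
    each from a sum over a single period; all other pixels are zero."""
    Tr, Tc = len(tile), len(tile[0])
    scale = (M // Tr) * (N // Tc)
    X = [[0] * N for _ in range(M)]
    for i in range(Tr):
        for j in range(Tc):
            k, l = M * i // Tr, N * j // Tc
            X[k][l] = scale * sum(tile[m][n] * pow(ALPHA, m * k + n * l, F)
                                  for m in range(Tr) for n in range(Tc)) % F
    return X

random.seed(0)
tile = [[random.randrange(10) for _ in range(8)] for _ in range(4)]  # Tr = 4, Tc = 8
full = fnt2d(periodic_image(tile))
print("Equation (102) matches the direct transform:", full == reduced_fnt(tile))
```

The reduced computation touches $T_r \times T_c = 32$ transform pixels instead of 256, which is the saving the derivation points to.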
Theorem. If an image $x(m,n)$ is periodic with period $T_r \times T_c$, the number of nonzero pixels in its 2-D FNT is less than or equal to the product $T_r \times T_c$. These nonzero pixels are determined by the following equation:
$$X(k,l) = (M/T_r)(N/T_c)\sum_{m=0}^{T_r-1}\sum_{n=0}^{T_c-1} x(m,n)\,\alpha_1^{mk}\,\alpha_2^{nl} \pmod{F_t} \tag{103}$$
with $k = Mi/T_r$; $l = Nj/T_c$; $i = 0, 1, 2, \ldots, T_r - 1$; $j = 0, 1, 2, \ldots, T_c - 1$.
The result in Equation (103) is derived for Fermat number transforms. It can be extended to other number theoretic transforms.
3. The Effect of Data Errors on 2-D FNT
We have considered cases where the data are periodic. Now consider the effect on the 2-D FNT of small errors in the data that will produce departures from periodicity in one period of the data while all the other data remain periodic. Figure 11(a) shows an example where the data are all periodic with $T_r = T_c = 2$, except for a single pixel in row 2 column 5, where the expected data value 7 is replaced by the value 2, which may be regarded as an error. Clearly this single error disturbs the periodicity of the data pattern. If the actual data including the error were to be subtracted from the expected periodic data patterns, the error would be detected by the only nonzero value in the difference. However, this one difference pixel would be the only indication of the occurrence of an error. The effect of the single error upon the 2-D FNT transform domain is very much more obvious than it is in the nontransformed data. This is shown in Figure 11(b), which, in contrast to the transform of periodic data given in Figure 10(b), has no zero values at all; although a pattern can be detected in the transform elements, Figure 11(b) displays none of the regular structure with zeros that is apparent in the transforms of periodic data. The regular structure of the FNT has been completely altered by the single error that has been introduced into the data. It follows that, although the existence of small errors in a periodic pattern may be detected by eye after a subtraction between the actual and the expected or standard data patterns, it becomes much more obvious when the 2-D FNT of the data is observed.
4. Summary of the Results
From the preceding development, the following results are noted:
1. The 2-D FNTs of periodic images are well structured.
2. The position and the values of the nonzero pixels can be precisely determined.
3. The amount of computation required to determine the 2-D FNT of periodic images can be drastically reduced by using the preceding results. For example, for an $M \times N$ image with period $T_r \times T_c$, only $T_r \times T_c$ pixels need to be computed instead of $M \times N$. In extreme cases, such as a periodic image of $128 \times 128 = 16{,}384$ pixels with period $T_r \times T_c = 2 \times 2$, the computation of 16,384 pixels is reduced to only 4 pixels. A slight imperfection in the period of the image data will destroy the highly regular pattern that arises due to the periodicity, and hence the presence of defects can be easily detected from the dramatic change in the 2-D FNT produced by even a one-pixel error.
(a)
3 7 3 7 3 7 3 7 3 7 3 7 3 7 3 7
4 9 4 9 4 9 4 9 4 9 4 9 4 9 4 9
3 7 3 7 3 2 3 7 3 7 3 7 3 7 3 7
4 9 4 9 4 9 4 9 4 9 4 9 4 9 4 9
3 7 3 7 3 7 3 7 3 7 3 7 3 7 3 7
4 9 4 9 4 9 4 9 4 9 4 9 4 9 4 9
3 7 3 7 3 7 3 7 3 7 3 7 3 7 3 7
4 9 4 9 4 9 4 9 4 9 4 9 4 9 4 9
3 7 3 7 3 7 3 7 3 7 3 7 3 7 3 7
4 9 4 9 4 9 4 9 4 9 4 9 4 9 4 9
3 7 3 7 3 7 3 7 3 7 3 7 3 7 3 7
4 9 4 9 4 9 4 9 4 9 4 9 4 9 4 9
3 7 3 7 3 7 3 7 3 7 3 7 3 7 3 7
4 9 4 9 4 9 4 9 4 9 4 9 4 9 4 9
3 7 3 7 3 7 3 7 3 7 3 7 3 7 3 7
4 9 4 9 4 9 4 9 4 9 4 9 4 9 4 9
(b)
182  97  20 126 177  10  63 217 200 160 237 131  80 247 194  40
237 131  80 247 194  40 252  97  20 126 177  10  63 217   5 160
177  10  63 217   5 160 237 131  80 247 194  40 252  97  20 126
194  40 252  97  20 126 177  10  63 217   5 160 237 131  80 247
  5 160 237 131  80 247 194  40 252  97  20 126 177  10  63 217
 20 126 177  10  63 217   5 160 237 131  80 247 194  40 252  97
 80 247 194  40 252  97  20 126 177  10  63 217   5 160 237 131
 63 217   5 160 237 131  80 247 194  40 252  97  20 126 177  10
 60  97  20 126 177  10  63 217  69 160 237 131  80 247 194  40
237 131  80 247 194  40 252  97  20 126 177  10  63 217   5 160
177  10  63 217   5 160 237 131  80 247 194  40 252  97  20 126
194  40 252  97  20 126 177  10  63 217   5 160 237 131  80 247
  5 160 237 131  80 247 194  40 252  97  20 126 177  10  63 217
 20 126 177  10  63 217   5 160 237 131  80 247 194  40 252  97
 80 247 194  40 252  97  20 126 177  10  63 217   5 160 237 131
 63 217   5 160 237 131  80 247 194  40 252  97  20 126 177  10
FIGURE 11. (a) Periodic image where one pixel is in error. (b) Two-dimensional FNT, showing that the regular pattern is destroyed.
B. Applications
The 2-D FNTs can be used to detect defects, especially small defects, in periodic structures. This result would appear to be applicable to several detection problems. A possible application could be in the process for integrated or film circuits, where photomasks are used for fabrication steps. Typically, these photomasks possess a repeated or periodic pattern. Before being used the masks must be carefully inspected for any possible defects, which, if not detected, would result in the manufacture of faulty circuits. In this case, it may be possible to apply the 2-D FNT method to readily show the existence of a mask defect. The inspection process begins by taking a digitized image (of dimension $M \times N$) of the photomask followed by the computation of its 2-D FNT. The defect could be detected by the visual examination of the transforms of the ideal and faulty masks. This effect is illustrated in Figure 12(a) and (b), where the data simulating a very simple photomask as blocks of nine 1s and having a periodic structure are shown with their 2-D FNT. The transform clearly shows the regular pattern with a small number of nonzero elements that is to be expected.
Figure 13(a) shows the same simulated photomask data as in Figure 12(a), with the addition of a single error introduced at the boundary between two blocks of 1s. Figure 13(b) shows the 2-D FNT of the photomask with the error, and it is clear that it is drastically different from the 2-D FNT of the error-free mask in Figure 12(b). It follows that a single error in a photomask could readily be detected by eye when comparison is made between the 2-D FNTs of a standard mask and the one under inspection. Use can also be made of the fact that it will be necessary to compare only a few elements in the 2-D FNTs in order to detect the existence of an error. In Figure 12(b) the 2-D FNT of an error-free mask has many rows and columns of zero-valued elements.
FIGURE 12. (a) Perfect photomask of $64 \times 64$. (b) 2-D FNT of the perfect photomask. The regular white pixels indicate nonzero data.
In order to detect the presence of the error in Figure 13(a), it is necessary to examine (automatically) only the element where row 7 and column 7 meet and note that its value is nonzero. The same method can, of course, be applied using any other zero-valued row and column in the 2-D FNT of the periodic, error-free photomask. Because some zero values do occur in the 2-D FNT of a mask containing an error (see Figure 13(b)), it would be advisable to check the values or the sum of a small number of elements on the same row and column where a zero element is expected, as at $(7, 7)$ in the earlier example. The number of nonzero elements in the 2-D FNT is small, especially for data with $T_r \times T_c \ll M \times N$, which is typically the case with a photomask.
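The effect of a single-pixel error described in Section A.3 can be reproduced on the small example of Figure 11. This is a sketch only, assuming the $2 \times 2$ tile and the error position of Figure 11(a), with $\alpha = 2$ and $F_3 = 257$; probing the element at $(7, 7)$ mirrors the check suggested above for the photomask, and the helper names are illustrative.

```python
F, ALPHA, N = 257, 2, 16

def fnt2d(x):
    """Direct 2-D FNT: X(k,l) = sum_m sum_n x(m,n) 2^(mk) 2^(nl) mod 257."""
    rows = [[sum(x[m][n] * pow(ALPHA, n * l, F) for n in range(N)) % F
             for l in range(N)] for m in range(N)]
    return [[sum(rows[m][l] * pow(ALPHA, m * k, F) for m in range(N)) % F
             for l in range(N)] for k in range(N)]

def zero_count(X):
    """Number of zero-valued pixels in a transform."""
    return sum(v == 0 for row in X for v in row)

TILE = [[3, 7], [4, 9]]
good = [[TILE[m % 2][n % 2] for n in range(N)] for m in range(N)]
bad = [row[:] for row in good]
bad[2][5] = 2   # the single error of Figure 11(a): the value 7 replaced by 2

X_good, X_bad = fnt2d(good), fnt2d(bad)
print("zeros in FNT of periodic data:   ", zero_count(X_good))
print("zeros in FNT of data with error: ", zero_count(X_bad))
print("probe at (7, 7):", X_good[7][7], "versus", X_bad[7][7])
```

A single probe where a zero is expected is enough in this example; as the text notes, checking a few such elements is safer in general.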
FIGURE 13. (a) Defective photomask with just one pixel error. (b) 2-D FNT of part (a) showing a completely different FNT pattern of the defective mask.
Therefore, the detection process may require comparison of only a few 2-D FNT elements. Any differences between the 2-D FNT values being compared automatically imply a defective mask. If it is required, the precise location of the defect may be determined either by a subtraction operation between the data for the mask under test and a standard mask or by taking the inverse 2-D FNT of the difference between the 2-D FNT values for the standard and the faulty mask. The same result could, of course, be obtained by subtracting the data in Figure 12(a) from that in Figure 13(a). Note that in its present form, this defect-detection technique is a highly sensitive go/no-go test. If required, it
should be possible to reduce the sensitivity by, for example, grouping the data pixels before calculating the transform or by considering the value of the transform element at $k = 0$, $l = 0$, where this value can be constrained to be less than the modulus (for example, by thresholding the input image) and then checking that this value lies within some predefined limit. Referring back to Figures 12(b) and 13(b), comparing the two values at location $k = 0$, $l = 0$ indicates that the difference between the two images is 1, which is equal to the total error. The example quoted requires considerable further investigation and tests before it can be regarded as practical. It is not the aim of this chapter to investigate this application; however, it does indicate possible applications of number theoretic transforms that do not appear to have been explored before.
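Locating a defect through the inverse 2-D FNT of the difference between two transforms can be sketched as follows. The example reuses the small periodic pattern of Figure 11 rather than a photomask, assumes $\alpha = 2$ and $F_3 = 257$, and relies on Python 3.8 or later for modular inverses via `pow(x, -1, F)`; all function names are hypothetical.

```python
F, ALPHA, N = 257, 2, 16

def ntt2d(x, root):
    """Direct 2-D transform with kernel root^(mk + nl), reduced modulo F."""
    rows = [[sum(x[m][n] * pow(root, n * l, F) for n in range(N)) % F
             for l in range(N)] for m in range(N)]
    return [[sum(rows[m][l] * pow(root, m * k, F) for m in range(N)) % F
             for l in range(N)] for k in range(N)]

def inverse_ntt2d(X):
    """Inverse transform: same kernel with ALPHA^-1, scaled by (N*N)^-1 mod F."""
    scale = pow(N * N, -1, F)
    y = ntt2d(X, pow(ALPHA, -1, F))
    return [[scale * v % F for v in row] for row in y]

TILE = [[3, 7], [4, 9]]
standard = [[TILE[m % 2][n % 2] for n in range(N)] for m in range(N)]
tested = [row[:] for row in standard]
tested[2][5] = 2   # defect as in Figure 11(a)

X_std, X_tst = ntt2d(standard, ALPHA), ntt2d(tested, ALPHA)
diff = [[(X_tst[k][l] - X_std[k][l]) % F for l in range(N)] for k in range(N)]
residual = inverse_ntt2d(diff)

# Each entry is (row, column, error value modulo F).
defects = [(m, n, residual[m][n])
           for m in range(N) for n in range(N) if residual[m][n]]
print(defects)
```

The error value is reported modulo $F$, so a change of $-5$ appears as 252.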
C. Implementation of the Inspection Method
It is helpful to select a transform with a suitable length that can be implemented sufficiently fast and with small complexity. The 2-D FNT meets these requirements. The maximum transform size for a multiplierless 2-D FNT, as shown in Table 1, is $256 \times 256$, with $\alpha = \sqrt{2}$ and modulus $F = F_6$. This modulus needs 65 bits to be represented, which is much larger than required in the applications of pattern recognition or in industrial inspection, where the image is usually thresholded to an appropriate level (8 bits or less in many cases). For images up to $256 \times 256$ the best modulus seems to be $F_3$ or $F_4$.
1. Implementation with $F_3$
The maximum multiplierless transform length with the modulus $F_3 = 257$ is 32, corresponding to $\alpha = \sqrt{2}$, and can be increased by using roots of unity of higher order. In fact, the transform length can be extended to 64, 128, and 256 if $\alpha$ is chosen to be the fourth, the eighth, or the sixteenth root, which correspond to 35, 42, and 88, respectively. If the image is large, it can be segmented into small subimages and each subimage inspected individually. The maximum transform length that can be attained is 256 for the modulus $F_3 = 257$. This seems to be ideally suited for images up to $256 \times 256$. However, this value of $\alpha$ requires normal integer multiplications along the last three stages of the algorithm used to calculate the 2-D FNT, involving three integer multiplications per pixel. This can be done quickly by using a look-up table. In fact, in the finite field defined by the modulus $F_3 = 257$, the number of elements is just 256. These elements must be multiplied by all powers of $\alpha$, ranging from 0 to 127. Hence, with a look-up table, all
multiplications can be stored in a memory of size 32k bytes, which is fairly small. Thus, all multiplications are eliminated and replaced by a look-up table.
2. Implementation Using 1-D FNT
The inspection process begins by taking a digitized image of dimension $N \times M$ of the photomask or periodic image, followed by the computation of its 2-D FNT. Because the inspection of the photomask is based on the pattern of zeros and not on the complete image, it is unnecessary to compute the whole transform. A partial computation of the 2-D FNT can lead to the same result. In Figure 14(a) and (b), the 1-D FNT along rows and along columns is computed and shows that the image is periodic along all rows and along all columns (using the result found in one dimension, the periodicity of each column can be easily detected). Figure 15(a) shows a photomask containing four errors. In Figure 15(b) the errors are indicated by rows of nonzero pixels showing the existence of these errors. The errors may lie anywhere in those rows. Thus, the location of the errors is narrowed and can be detected either by scanning those rows or by taking the 2-D FNT along columns, which leads to columns of nonzero pixels (Figure 15(c)), indicating the absence of periodicity along those columns. The errors are located at the intersection of the previous rows and these columns (the nonzero values are presented as white and the zero values as black background) [Boussakta, 1990].
3. Number Theoretic Transform with Independent Length and Modulus
Previously, the main concern has been 2-D FNTs that possess fast algorithms and can be computed with simple arithmetic operations. All operations are performed in a finite field with the arithmetic carried out modulo a Fermat prime, which requires a $(b + 1)$-bit representation. The pattern of zeros is true for all 2-D NTTs, where $N$ is highly composite. For the detection of errors the convolution property, which limits transform lengths, is no longer needed.
This allows other 2-D NTTs to be considered as candidate transforms for error detection. The one-dimensional number theoretic Walsh transform was introduced in Thomas, Larsen, and Keller [1983] and extended to two dimensions in Boussakta [1990], but, because of its lack of the cyclic convolution property, which is a major feature for a transform to be used in digital signal processing, this transform finds few applications. In this case the most important things are the transform itself and the transformed data rather than the convolution property. This type of transform has the advantage that it has no rigid relationship between the transform length and the modulus.
FIGURE 14. (a) 2-D FNT along the columns of a perfect photomask; the discontinuous white pattern indicates perfect periodicity along this dimension. (b) 2-D FNT along the rows of a perfect photomask; the discontinuous white pattern indicates perfect periodicity along the rows.
The two-dimensional number theoretic Walsh transform of length $N \times N$ of image data $x(m,n)$ is given by
$$W(k,l) = (1/N)\sum_{m=0}^{N-1}\sum_{n=0}^{N-1} x(m,n)\,w(k,m)\,w(l,n) \pmod{F} \tag{104}$$
$$k = 0, 1, 2, \ldots, N-1 \qquad l = 0, 1, 2, \ldots, N-1$$
where $w(k,m)$ is the well-known Walsh function.
FIGURE 15. (a) A faulty photomask with four pixels in error. (b) The 2-D FNT along the rows of part (a), with four white bars locating the faults within the rows. (c) The 2-D FNT along the columns of part (a), with three white bars locating the faults within the columns.
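The row and column localization of Section C.2, illustrated in Figure 15, can be sketched with 1-D FNTs. The example below is an assumption-laden toy: a $16 \times 16$ pattern of period 2 (not the photomask of the figure), four hypothetical defects occupying four rows and three columns, $\alpha = 2$, and $F_3 = 257$.

```python
F, ALPHA, N, T = 257, 2, 16, 2   # T is the period along rows and along columns

def fnt1d(seq):
    """Direct 1-D FNT: X(l) = sum_n seq(n) 2^(nl) mod 257."""
    return [sum(v * pow(ALPHA, n * l, F) for n, v in enumerate(seq)) % F
            for l in range(N)]

def breaks_period(seq):
    """True if the 1-D FNT is nonzero at an index where a sequence of
    period T must give zero (indices that are not multiples of N / T)."""
    X = fnt1d(seq)
    return any(X[l] for l in range(N) if l % (N // T))

TILE = [[3, 7], [4, 9]]
mask = [[TILE[m % 2][n % 2] for n in range(N)] for m in range(N)]
# Four hypothetical defects: (row, column) -> faulty value.
for (m, n), value in {(2, 5): 2, (6, 5): 2, (9, 11): 0, (12, 3): 1}.items():
    mask[m][n] = value

bad_rows = [m for m in range(N) if breaks_period(mask[m])]
bad_cols = [n for n in range(N)
            if breaks_period([mask[m][n] for m in range(N)])]
candidates = [(m, n) for m in bad_rows for n in bad_cols]
print("rows:", bad_rows, "columns:", bad_cols, "candidates:", len(candidates))
```

The intersections of flagged rows and columns contain the defects but may also include extra candidates, so a final scan of those few positions would still be needed.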
The inverse transform is similar to the forward and is given by
$$x(m,n) = (1/N)\sum_{k=0}^{N-1}\sum_{l=0}^{N-1} W(k,l)\,w(k,m)\,w(l,n) \pmod{F} \tag{105}$$
$$m = 0, 1, 2, \ldots, N-1 \qquad n = 0, 1, 2, \ldots, N-1$$
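Equations (104) and (105) can be exercised with a direct implementation. The sketch below assumes $N = 8$, $F = 257$, and the Walsh function in Hadamard (natural) ordering, $w(k,m) = (-1)^{\text{number of common 1 bits of } k \text{ and } m}$; the text does not state the ordering, so this choice is an assumption. Python 3.8 or later is assumed for `pow(N, -1, F)`.

```python
F, N = 257, 8
N_INV = pow(N, -1, F)   # 1/N modulo F; exists because F is odd and N is a power of 2

def walsh(k, m):
    """Walsh function in Hadamard order (assumed ordering): +1 or -1."""
    return -1 if bin(k & m).count("1") % 2 else 1

def nt_walsh2d(x):
    """Equation (104). Because w is symmetric, the same routine also
    evaluates the inverse transform of Equation (105)."""
    return [[N_INV * sum(x[m][n] * walsh(k, m) * walsh(l, n)
                         for m in range(N) for n in range(N)) % F
             for l in range(N)] for k in range(N)]

image = [[(3 * m + 5 * n) % 10 for n in range(N)] for m in range(N)]  # test data
W = nt_walsh2d(image)
restored = nt_walsh2d(W)
print("round trip recovers the image:", restored == image)
```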
The constant factor $1/N^2$ is split between the forward and the inverse transform to make them symmetric. This transform involves only additions and subtractions. The forward and the inverse Walsh transforms are taken modulo $F$; hence for the inverse to exist, $F$ should be relatively prime to $N$ ($N$ is a power of two); the best choice for $F$ is a modulus with efficient arithmetic, such as Fermat and Mersenne primes. Equation (104) can be calculated by an efficient FFT-type algorithm involving $N^2 \log_2 N^2$ operations (an operation is either one addition or one subtraction). The fast Walsh transform can be computed by the same program used to compute the 2-D FNT. The only difference is that all powers of $\alpha$ are set equal to one for the Walsh transform. The two transforms have a close relationship and each can be calculated from the other through a block diagonal transformation matrix [Boussakta and Holt, 1989].
4. Summary
The analysis of the 2-D FNTs of periodic structures leads to a rule that simplifies calculations and determines the position and values of nonzero elements in the transform domain, showing that 2-D NTTs are well structured. Some new applications of 2-D FNTs are suggested that indicate the number domain can be utilized for applications other than convolution. Integrated circuit designs and microprocessor implementations for 2-D FNTs should make this transform more readily available for use in systems. The transform-length constraint for the 2-D FNT is not a great problem; in fact, several solutions are proposed, allowing the processing of sufficiently large images while retaining the advantages of the FNT, such as fast computation and simple arithmetic. Other transforms, such as the number theoretic Walsh, may also be used.
V. ANOTHER FAMILY OF NUMBER THEORETIC TRANSFORMS USING THE MERSENNE NUMBERS (NMNTS)
So far only number theoretic transforms with Fourier-like structure have been considered; in this section another family of transforms is introduced.
These transforms are defined modulo Mersenne numbers, yet they have long transform lengths that are powers of 2 and admit FFT-type algorithms [Boussakta and Holt, 1992; Boussakta and Holt, 1994a; Boussakta and Holt, 1994b].

1. Transform Definition

The transform described here has the cyclic convolution property (CCP) and is defined modulo the Mersenne numbers. It is suitable for fast algorithms, with a convenient word length and arithmetic.
S. BOUSSAKTA AND A. G. J. HOLT
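a. The Forward Transform. The 1-D forward transform of a sequence $x(n)$ of length $N$ is defined as [Boussakta and Holt, 1995a]

$$X(k) = \left\langle \sum_{n=0}^{N-1} x(n)\,\beta(nk) \right\rangle_{M_p}, \qquad k = 0, 1, 2, \ldots, N-1 \tag{106}$$

where $\langle \cdot \rangle_{M_p}$ denotes reduction modulo $M_p$, and $M_p$ is a Mersenne prime equal to $2^p - 1$. The kernel is

$$\beta(n) = \beta_1(n) + \beta_2(n) \tag{107}$$

$$\beta_1(n) = \langle \mathrm{Re}(\alpha_1 + j\alpha_2)^n \rangle_{M_p} \quad \text{and} \quad \beta_2(n) = \langle \mathrm{Im}(\alpha_1 + j\alpha_2)^n \rangle_{M_p} \tag{108}$$

Also

$$\alpha_1 = \pm\langle 2^q \rangle_{M_p}; \qquad \alpha_2 = \pm\langle 3^q \rangle_{M_p}; \qquad q = 2^{p-2} \tag{109}$$

In Equation (109), $\alpha_1 + j\alpha_2$ is of order $N = 2^{p+1}$. For transform length $N/d$, where $d$ is an integer power of 2, $\beta_1$ and $\beta_2$ are given by

$$\beta_1(n) = \langle \mathrm{Re}((\alpha_1 + j\alpha_2)^d)^n \rangle_{M_p} \quad \text{and} \quad \beta_2(n) = \langle \mathrm{Im}((\alpha_1 + j\alpha_2)^d)^n \rangle_{M_p} \tag{110}$$

$\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ denote the real and imaginary parts of the enclosed term, respectively. The transform defined in Equation (106) uses a real kernel, leading to a real transform. Also, this transform has the same form of inverse as its forward transform.

b. The Inverse Transform. The form of the inverse transform is the same as that of the forward transform, except for a factor $1/N$:

$$x(n) = \left\langle \frac{1}{N} \sum_{k=0}^{N-1} X(k)\,\beta(nk) \right\rangle_{M_p}, \qquad n = 0, 1, 2, \ldots, N-1 \tag{111}$$

Because $M_p$ is an odd number and $N$ is a power of 2, the inverse of $N$ modulo $M_p$ always exists. Also, the factor $1/N$ could be split between the forward and the inverse transforms to give them both exactly the same form.

2. Properties of the $\beta$ Function

Consider the relation between $\beta(\cdot)$ and the kernels of the Fourier and Hartley transforms in order to clarify the properties of $\beta(\cdot) = \beta_1(\cdot) + \beta_2(\cdot)$.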
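As an example, take the values of $\beta_1(nk)$ and $\beta_2(nk)$ calculated for $(N, M_p, \alpha_1, \alpha_2) = (8, 31, 4, 4)$:

$$\beta_1(nk) = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 4 & 0 & -4 & -1 & -4 & 0 & 4 \\
1 & 0 & -1 & 0 & 1 & 0 & -1 & 0 \\
1 & -4 & 0 & 4 & -1 & 4 & 0 & -4 \\
1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \\
1 & -4 & 0 & 4 & -1 & 4 & 0 & -4 \\
1 & 0 & -1 & 0 & 1 & 0 & -1 & 0 \\
1 & 4 & 0 & -4 & -1 & -4 & 0 & 4
\end{bmatrix}
\qquad
\beta_2(nk) = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 4 & 1 & 4 & 0 & -4 & -1 & -4 \\
0 & 1 & 0 & -1 & 0 & 1 & 0 & -1 \\
0 & 4 & -1 & 4 & 0 & -4 & 1 & -4 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -4 & 1 & -4 & 0 & 4 & -1 & 4 \\
0 & -1 & 0 & 1 & 0 & -1 & 0 & 1 \\
0 & -4 & -1 & -4 & 0 & 4 & 1 & 4
\end{bmatrix} \tag{112}$$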
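The kernel values of Equation (112) and the transform pair of Equations (106) and (111) can be checked numerically. The following is a minimal Python sketch, not the authors' program: the helper names `kernel`, `signed`, and `nmnt` are hypothetical, the transform is evaluated by direct summation rather than by a fast algorithm, and `pow(N, -1, Mp)` assumes Python 3.8 or later.

```python
def kernel(alpha1, alpha2, N, Mp):
    """Return (beta1, beta2) with beta1[n] = Re(g^n) and beta2[n] = Im(g^n) mod Mp,
    where g = alpha1 + j*alpha2 and j*j = -1."""
    beta1, beta2 = [1], [0]
    for _ in range(N - 1):
        re, im = beta1[-1], beta2[-1]
        beta1.append((re * alpha1 - im * alpha2) % Mp)
        beta2.append((re * alpha2 + im * alpha1) % Mp)
    return beta1, beta2

def signed(v, Mp):
    """Map a residue in [0, Mp) to the symmetric range around zero."""
    return v - Mp if v > Mp // 2 else v

def nmnt(x, beta, Mp, scale=1):
    """Direct O(N^2) evaluation of Eq. (106); scale = 1/N mod Mp gives Eq. (111)."""
    N = len(x)
    return [scale * sum(x[n] * beta[(n * k) % N] for n in range(N)) % Mp
            for k in range(N)]

N, Mp = 8, 31
beta1, beta2 = kernel(4, 4, N, Mp)
beta = [(c + s) % Mp for c, s in zip(beta1, beta2)]

# Row n = 1 of the two matrices in Eq. (112), written with signed residues.
row_beta1 = [signed(beta1[k], Mp) for k in range(N)]
row_beta2 = [signed(beta2[k], Mp) for k in range(N)]

# Forward transform followed by the inverse (same kernel, extra factor 1/N).
x = [3, 1, 4, 1, 5, 9, 2, 6]
X = nmnt(x, beta, Mp)
x_back = nmnt(X, beta, Mp, scale=pow(N, -1, Mp))
```

The signed representation is only a display choice; the arithmetic itself stays in the range 0 to $M_p - 1$.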
From these matrices it can be seen that $\beta_1(\cdot)$ and $\beta_2(\cdot)$ have the same signs and the same one and zero values as the cosine and sine in the DFT or DHT. Relations similar to those existing between the trigonometric functions $\mathrm{cas}(\cdot)$, cosine, and sine exist for $\beta(\cdot)$, $\beta_1(\cdot)$, and $\beta_2(\cdot)$.

Table 5 gives the transform parameters for some different moduli and maximum transform lengths $N_{max} = 2^p$. From $\alpha_1$ and $\alpha_2$ of Table 5, the values of $\alpha_1$ and $\alpha_2$ for any power-of-2 transform length up to $N_{max}$ can be calculated.

3. Calculation of 1-D Convolutions

Let $x(n)$ and $h(n)$ be two sequences of length $N_1$ and $N_2$, respectively. Then their linear convolution is an $(N_1 + N_2 - 1)$-point sequence. The new transform, with a length $N \ge N_1 + N_2 - 1$ to which both sequences are zero-padded, can be used to calculate the convolution.

TABLE 5
TRANSFORM PARAMETERS FOR THE NMNT.

Modulus ($M_p$)           $N_{max}$    $\alpha_1$ (for $N_{max}$)    $\alpha_2$ (for $N_{max}$)
$2^5 - 1 = 31$            32           5                             10
$2^7 - 1 = 127$           128          5                             22
$2^{13} - 1 = 8,191$      8,192        5                             5,381
$2^{17} - 1 = 131,071$    131,072      5                             94,889
$2^{19} - 1 = 524,287$    524,288      5                             46,561
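The entries of Table 5 can be reproduced from Equation (109). The sketch below is an illustration under stated assumptions: it takes the positive signs in Equation (109), builds the element of order $2^{p+1}$, and squares it (that is, $d = 2$ in Equation (110)) to obtain the parameters for $N_{max} = 2^p$. The helper names `cmul`, `cpow`, and `nmnt_parameters` are hypothetical.

```python
def cmul(a, b, M):
    """Multiply a = (re, im) and b = (re, im) as complex integers modulo M."""
    return ((a[0] * b[0] - a[1] * b[1]) % M, (a[0] * b[1] + a[1] * b[0]) % M)

def cpow(a, e, M):
    """Raise a complex integer to a non-negative power modulo M by repeated squaring."""
    result = (1, 0)
    while e:
        if e & 1:
            result = cmul(result, a, M)
        a = cmul(a, a, M)
        e >>= 1
    return result

def nmnt_parameters(p):
    """Return (Mp, (alpha1, alpha2)) for the maximum length N_max = 2^p."""
    Mp = 2 ** p - 1
    q = 2 ** (p - 2)
    root = (pow(2, q, Mp), pow(3, q, Mp))   # order 2^(p+1), Eq. (109) with + signs
    return Mp, cmul(root, root, Mp)          # d = 2 halves the order to 2^p

params = {p: nmnt_parameters(p) for p in (5, 7, 13, 17, 19)}
# params[5] is (31, (5, 10)) and params[7] is (127, (5, 22)), as in Table 5.
```

Because Equation (109) allows either sign, a computed $\alpha_2$ may appear as the negative (modulo $M_p$) of a tabulated value; both choices generate an element of the same order.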
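$$y(n) = x(n) * h(n) = \mathrm{transform}\{X(k) \odot H_{ev}(k) + X(N-k) \odot H_{od}(k)\} = \mathrm{transform}\{X(k) \otimes H(k)\} \tag{113}$$

where $\odot$ is point-by-point multiplication, $\otimes$ is defined by Equation (113), and $H_{ev}(k)$ and $H_{od}(k)$ stand for the even and odd parts of $H(k)$, respectively, and are given by

$$H_{ev}(k) = \langle \{H(k) + H(N-k)\}/2 \rangle_{M_p} \quad \text{and} \quad H_{od}(k) = \langle \{H(k) - H(N-k)\}/2 \rangle_{M_p} \tag{114}$$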
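Equations (113) and (114) can be exercised on a small case. The following sketch is an assumption-laden illustration rather than the authors' program: it uses direct $O(N^2)$ transforms, the hypothetical helper names `beta_kernel`, `nmnt`, and `cyclic_convolve_nmnt`, and the parameters $(N, M_p, \alpha_1, \alpha_2) = (8, 31, 4, 4)$. The division by 2 is carried out as a multiplication by the modular inverse of 2.

```python
def beta_kernel(alpha1, alpha2, N, Mp):
    """beta[n] = Re(g^n) + Im(g^n) mod Mp for g = alpha1 + j*alpha2."""
    re, im, beta = 1, 0, []
    for _ in range(N):
        beta.append((re + im) % Mp)
        re, im = (re * alpha1 - im * alpha2) % Mp, (re * alpha2 + im * alpha1) % Mp
    return beta

def nmnt(x, beta, Mp):
    """Direct evaluation of Eq. (106)."""
    N = len(x)
    return [sum(x[n] * beta[(n * k) % N] for n in range(N)) % Mp for k in range(N)]

def cyclic_convolve_nmnt(x, h, alpha1, alpha2, Mp):
    """Cyclic convolution modulo Mp via Eqs. (113) and (114)."""
    N = len(x)
    beta = beta_kernel(alpha1, alpha2, N, Mp)
    X, H = nmnt(x, beta, Mp), nmnt(h, beta, Mp)
    half = pow(2, -1, Mp)   # 1/2 mod Mp, equal to 2^(p-1) when Mp = 2^p - 1
    H_ev = [(H[k] + H[-k % N]) * half % Mp for k in range(N)]
    H_od = [(H[k] - H[-k % N]) * half % Mp for k in range(N)]
    Y = [(X[k] * H_ev[k] + X[-k % N] * H_od[k]) % Mp for k in range(N)]
    n_inv = pow(N, -1, Mp)
    return [v * n_inv % Mp for v in nmnt(Y, beta, Mp)]

# Two length-3 sequences zero-padded to N = 8 >= 3 + 3 - 1.
x = [1, 2, 3, 0, 0, 0, 0, 0]
h = [1, 1, 1, 0, 0, 0, 0, 0]
y = cyclic_convolve_nmnt(x, h, 4, 4, 31)   # linear convolution: 1, 3, 6, 5, 3
```

All output values here stay below $M_p/2$, so no overflow correction is needed.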
The division by 2 in Equation (114) is a division modulo $M_p$, that is, a multiplication by $\langle 1/2 \rangle_{M_p} = 2^{p-1}$. Usually the filter impulse response $h(n)$ is known a priori, and hence Equation (114) can be precalculated and stored. Also, $x(n)$ and $h(n)$ must be scaled so that the convolution result $|y(n)|$ does not exceed $M_p/2$.

4. Fast Algorithms for This Transform

The length of the new transform is a power of 2. Fast algorithms such as the radix-2, radix-4, and split-radix algorithms [Rabiner and Gold, 1975; Duhamel and Hollman, 1982] could be applied for its evaluation.

a. Radix-2 Algorithm Derivation. As an example, a radix-2 algorithm is derived. Dividing the input sequence into its odd and even parts, Equation (106) can be decomposed as

$$X(k) = \langle X_{2n}(k) + X_2(k) \rangle_{M_p} \tag{115}$$

$$X_{2n}(k) = \left\langle \sum_{n=0}^{N/2-1} x(2n)\,\beta(2nk) \right\rangle_{M_p} \tag{116}$$

$$X_2(k) = \left\langle \sum_{n=0}^{N/2-1} x(2n+1)\,\beta((2n+1)k) \right\rangle_{M_p} \tag{117}$$

$$\beta((2n+1)k) = \langle \beta_1(k)\beta(2nk) + \beta_2(k)\beta(-2nk) \rangle_{M_p} \tag{118}$$

$X_{2n}(k)$ can be identified as an $N/2$-point transform, and Equation (117) can be written as

$$X_2(k) = \left\langle \beta_1(k) \sum_{n=0}^{N/2-1} x(2n+1)\,\beta(2nk) + \beta_2(k) \sum_{n=0}^{N/2-1} x(2n+1)\,\beta(-2nk) \right\rangle_{M_p} \tag{119}$$
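Combining Equations (115) and (119) leads to the recursive formula for this transform:

$$X(k) = \langle X_{2n}(k) + X_{2n+1}(k)\beta_1(k) + X_{2n+1}(N-k)\beta_2(k) \rangle_{M_p}, \qquad k = 0, 1, 2, \ldots, N-1 \tag{120}$$

where

$$X_{2n+1}(k) = \left\langle \sum_{n=0}^{N/2-1} x(2n+1)\,\beta(2nk) \right\rangle_{M_p} \tag{121}$$

Combining the four points $X(k)$, $X(N/2-k)$, $X(k+N/2)$, and $X(N-k)$ gives an in-place radix-2 algorithm:

$$X(k) = \langle X_{2n}(k) + [X_{2n+1}(k)\beta_1(k) + X_{2n+1}(N-k)\beta_2(k)] \rangle_{M_p} \tag{122}$$

$$X(k+N/2) = \langle X_{2n}(k) - [X_{2n+1}(k)\beta_1(k) + X_{2n+1}(N-k)\beta_2(k)] \rangle_{M_p} \tag{123}$$

$$X(N/2-k) = \langle X_{2n}(N/2-k) + [X_{2n+1}(N/2+k)\beta_2(k) - X_{2n+1}(N-k)\beta_1(k)] \rangle_{M_p} \tag{124}$$

$$X(N-k) = \langle X_{2n}(N/2-k) - [X_{2n+1}(N/2+k)\beta_2(k) - X_{2n+1}(N-k)\beta_1(k)] \rangle_{M_p} \tag{125}$$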
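The recursion of Equation (120) can be sketched directly. The code below is a simple recursive illustration, not the in-place butterfly of Equations (122) to (125): the function names `kernel`, `nmnt_direct`, and `nmnt_radix2` are hypothetical, and the half-length kernels are obtained by taking every second kernel value, which corresponds to squaring $\alpha_1 + j\alpha_2$.

```python
def kernel(alpha1, alpha2, N, Mp):
    """Return (beta1, beta2): real and imaginary parts of g^n mod Mp, g = alpha1 + j*alpha2."""
    beta1, beta2 = [1], [0]
    for _ in range(N - 1):
        re, im = beta1[-1], beta2[-1]
        beta1.append((re * alpha1 - im * alpha2) % Mp)
        beta2.append((re * alpha2 + im * alpha1) % Mp)
    return beta1, beta2

def nmnt_direct(x, beta1, beta2, Mp):
    """Reference O(N^2) evaluation of Eq. (106)."""
    N = len(x)
    return [sum(x[n] * (beta1[n * k % N] + beta2[n * k % N]) for n in range(N)) % Mp
            for k in range(N)]

def nmnt_radix2(x, beta1, beta2, Mp):
    """Recursive decimation-in-time evaluation following Eq. (120)."""
    N = len(x)
    if N == 1:
        return [x[0] % Mp]
    half = N // 2
    # The N/2-point sub-transforms use the kernel values at even indices.
    even = nmnt_radix2(x[0::2], beta1[0::2], beta2[0::2], Mp)
    odd = nmnt_radix2(x[1::2], beta1[0::2], beta2[0::2], Mp)
    # Sub-transforms are periodic with period N/2, so indices are reduced mod N/2.
    return [(even[k % half] + odd[k % half] * beta1[k] + odd[-k % half] * beta2[k]) % Mp
            for k in range(N)]

N, Mp = 8, 31
beta1, beta2 = kernel(4, 4, N, Mp)
x = [3, 1, 4, 1, 5, 9, 2, 6]
X_fast = nmnt_radix2(x, beta1, beta2, Mp)
X_ref = nmnt_direct(x, beta1, beta2, Mp)
```

A production implementation would iterate in place and treat the trivial twiddle factors separately, as discussed under arithmetic complexity.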
b. Arithmetic Complexity. A single butterfly is shown in Figure 16. It calculates four points and requires four integer multiplications and six additions. Therefore, the calculation of the whole transform using a single butterfly involves $N \log_2 N$ multiplications (including the multiplications by $\pm 1$) and $\frac{3}{2} N \log_2 N$ additions. Using different butterflies, the number of multiplications can be reduced quite significantly [Boussakta and Holt, 1995a]. In Table 6 the shifts are due to the fact that the stages of order three and higher include the twiddle factors $\beta_1(N/8)$ and $\beta_2(N/8)$, which are both equal to $2^{(p-1)/2}$.

TABLE 6
NUMBER OF OPERATIONS FOR DIFFERENT NUMBERS OF BUTTERFLIES.

Number of Butterflies Used    Number of Multiplications    Number of Shifts      Number of Additions
1                             $N \log_2 N$                 -                     $\frac{3}{2} N \log_2 N$
2                             $N \log_2 N - 2N + 2$        -                     $\frac{3}{2} N \log_2 N - N + 1$
3                             $N \log_2 N - 3N + 4$        -                     $\frac{3}{2} N \log_2 N - \frac{3}{2} N + 2$
5                             $N \log_2 N - 4N + 8$        $\frac{1}{2} N - 2$   $\frac{3}{2} N \log_2 N - \frac{3}{2} N + 2$
[Figure 16: butterfly with inputs and outputs $X(k)$, $X(N/2-k)$, $X(N/2+k)$, and $X(N-k)$ and twiddle factors $\beta_1(k)$ and $\beta_2(k)$.]
FIGURE 16. Four-point butterfly.
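B. The 2-D New Mersenne Number Transform (2-D NMNT)

The new transform can be readily extended to two-dimensional and multidimensional cases.

1. Definition of the 2-D Transform

The transform pair is defined as follows. The 2-D forward transform of a 2-D input image $x(n, m)$ of dimensions $N \times N$ is defined as [Boussakta and Holt, 1995a]:

$$X(k, l) = \left\langle \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} x(n, m)\,\beta(nk, ml) \right\rangle_{M_p}, \qquad k = 0, 1, 2, \ldots, N-1, \quad l = 0, 1, 2, \ldots, N-1 \tag{126}$$

$$\beta(n, m) = \beta_1(n, m) + \beta_2(n, m) \tag{127}$$

$$\beta_1(n, m) = \langle \mathrm{Re}(\alpha_1 + j\alpha_2)^{n+m} \rangle_{M_p} \quad \text{and} \quad \beta_2(n, m) = \langle \mathrm{Im}(\alpha_1 + j\alpha_2)^{n+m} \rangle_{M_p} \tag{128}$$

$\alpha_1$ and $\alpha_2$ are as defined for the 1-D case. For transform dimensions $N/d \times N/d$, $\beta_1$ and $\beta_2$ are given by

$$\beta_1(n, m) = \langle \mathrm{Re}((\alpha_1 + j\alpha_2)^d)^{n+m} \rangle_{M_p} \quad \text{and} \quad \beta_2(n, m) = \langle \mathrm{Im}((\alpha_1 + j\alpha_2)^d)^{n+m} \rangle_{M_p} \tag{129}$$

$\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ denote the real and imaginary parts of the enclosed term, respectively. The two-dimensional inverse transform has the same form as the forward transform except for a factor $1/N^2$:

$$x(n, m) = \left\langle \frac{1}{N^2} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} X(k, l)\,\beta(nk, ml) \right\rangle_{M_p}, \qquad n = 0, 1, 2, \ldots, N-1, \quad m = 0, 1, 2, \ldots, N-1 \tag{130}$$

Because $M_p$ is an odd number, the inverse of $N^2$ always exists.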
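TABLE 7
THE RELATIONSHIP BETWEEN THE TRANSFORM SIZE AND THE MODULUS FOR THE 2-D NMNT.

Modulus ($M_p$)    $N_{max} \times N_{max}$     $\alpha_1$    $\alpha_2$
$2^5 - 1$          $2^5 \times 2^5$             5             10
$2^7 - 1$          $2^7 \times 2^7$             5             22
$2^{13} - 1$       $2^{13} \times 2^{13}$       5             5,381
$2^{17} - 1$       $2^{17} \times 2^{17}$       5             94,889
$2^{19} - 1$       $2^{19} \times 2^{19}$       5             46,561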
2. Relationship Between Transform Sizes and Modulus

The choice of the modulus depends on the dynamic range and the transform size needed, as shown in Table 7. From $\alpha_1$ and $\alpha_2$ of Table 7, the values of $\alpha_1$ and $\alpha_2$ for any power-of-2 transform size up to $N_{max} \times N_{max}$ can be calculated. As an example, a number of different transform parameters for the modulus $2^{13} - 1$ are given in Table 8.

3. Calculation of 2-D Convolutions Using the 2-D NMNT

Similar to the one-dimensional transform, the two-dimensional transform has the 2-D cyclic convolution property and hence can be used for the calculation of 2-D convolution/correlation for image-processing applications:

$$y(n, m) = \mathrm{transform}\{X(k, l) \odot H_{ev}(k, l) + X(N-k, N-l) \odot H_{od}(k, l)\} = \mathrm{transform}\{X(k, l) \otimes H(k, l)\} \tag{131}$$
TABLE 8
TRANSFORM PARAMETERS FOR THE MODULUS $2^{13} - 1$.

Transform Sizes       $\alpha_1$    $\alpha_2$
8 × 8                 64            64
16 × 16               1,735         812
32 × 32               42            7,010
64 × 64               2,498         3,389
128 × 128             4,634         2,338
256 × 256             336           1,198
512 × 512             3,039         3,779
1024 × 1024           253           2,016
2048 × 2048           3,390         1,624
4096 × 4096           49            4,664
8192 × 8192           5             2,810
where $H_{ev}(k, l)$ and $H_{od}(k, l)$ stand for the even and odd parts of $H(k, l)$, respectively, and are given by

$$H_{ev}(k, l) = \langle \{H(k, l) + H(N-k, N-l)\}/2 \rangle_{M_p} \quad \text{and} \quad H_{od}(k, l) = \langle \{H(k, l) - H(N-k, N-l)\}/2 \rangle_{M_p} \tag{132}$$

Finally, in order to ensure a meaningful result free from overflow effects, the input and filter arrays must be properly scaled such that

$$y(n, m) < M_p \tag{133}$$
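The 2-D convolution of Equations (131) and (132) can be tried on a small array. This is a minimal sketch under stated assumptions: transforms are evaluated by direct summation over the kernel $\beta(nk + ml)$, the names `beta_kernel`, `nmnt2d`, and `convolve2d_nmnt` are hypothetical, and the parameters are $N = 4$, $M_p = 31$, $(\alpha_1, \alpha_2) = (0, 1)$, which is $(4 + 4j)^2$ modulo 31 and has order 4.

```python
def beta_kernel(alpha1, alpha2, N, Mp):
    """beta[t] = Re(g^t) + Im(g^t) mod Mp for g = alpha1 + j*alpha2."""
    re, im, beta = 1, 0, []
    for _ in range(N):
        beta.append((re + im) % Mp)
        re, im = (re * alpha1 - im * alpha2) % Mp, (re * alpha2 + im * alpha1) % Mp
    return beta

def nmnt2d(x, beta, Mp):
    """Direct evaluation of Eq. (126) with kernel beta(nk + ml)."""
    N = len(x)
    return [[sum(x[n][m] * beta[(n * k + m * l) % N]
                 for n in range(N) for m in range(N)) % Mp
             for l in range(N)] for k in range(N)]

def convolve2d_nmnt(x, h, alpha1, alpha2, Mp):
    """2-D cyclic convolution modulo Mp via Eqs. (131) and (132)."""
    N = len(x)
    beta = beta_kernel(alpha1, alpha2, N, Mp)
    X, H = nmnt2d(x, beta, Mp), nmnt2d(h, beta, Mp)
    half = pow(2, -1, Mp)
    Y = [[0] * N for _ in range(N)]
    for k in range(N):
        for l in range(N):
            H_ref = H[-k % N][-l % N]            # H(N-k, N-l)
            H_ev = (H[k][l] + H_ref) * half
            H_od = (H[k][l] - H_ref) * half
            Y[k][l] = (X[k][l] * H_ev + X[-k % N][-l % N] * H_od) % Mp
    scale = pow(N * N, -1, Mp)                   # factor 1/N^2 of Eq. (130)
    return [[v * scale % Mp for v in row] for row in nmnt2d(Y, beta, Mp)]

# A 2 x 2 block of ones, zero-padded to 4 x 4, convolved with itself.
x = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
y = convolve2d_nmnt(x, x, 0, 1, 31)
```

The direct transform costs $O(N^4)$ operations and is used here only to keep the sketch short.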
The procedure for the calculation of 2-D convolution/correlation using this transform is shown in Figure 17.

4. Multidimensional Transform

The multidimensional forward and inverse transforms are defined next.

a. The M-D Forward Transform. The multidimensional (M-D) forward transform of $x(n_1, n_2, \ldots, n_M)$ of dimensions $N \times N \times \cdots \times N$ is defined as

$$X(k_1, k_2, \ldots, k_M) = \left\langle \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} \cdots \sum_{n_M=0}^{N-1} x(n_1, n_2, \ldots, n_M)\,\beta(n_1 k_1, n_2 k_2, \ldots, n_M k_M) \right\rangle_{M_p}, \qquad k_i = 0, 1, 2, \ldots, N-1, \quad i = 1, 2, \ldots, M \tag{134}$$

$$\beta(n_1, n_2, \ldots, n_M) = \beta_1(n_1, n_2, \ldots, n_M) + \beta_2(n_1, n_2, \ldots, n_M) \tag{135}$$

$$\beta_1(n_1, n_2, \ldots, n_M) = \left\langle \mathrm{Re}(\alpha_1 + j\alpha_2)^{\sum_{i=1}^{M} n_i} \right\rangle_{M_p} \tag{136}$$
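[Figure 17: block diagram. $x(n, m)$ and $h(n, m)$ each pass through a 2-D transform to give $X(k, l)$ and $H(k, l)$; these are combined into $Y(k, l)$, and a further 2-D transform gives $y(n, m)$.]
FIGURE 17. The 2-D convolution using the new 2-D transform.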
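$$\beta_2(n_1, n_2, \ldots, n_M) = \left\langle \mathrm{Im}(\alpha_1 + j\alpha_2)^{\sum_{i=1}^{M} n_i} \right\rangle_{M_p} \tag{137}$$

For transform length $N/d \times \cdots \times N/d$, $\beta_1$ and $\beta_2$ are given by

$$\beta_1(n_1, n_2, \ldots, n_M) = \left\langle \mathrm{Re}((\alpha_1 + j\alpha_2)^d)^{\sum_{i=1}^{M} n_i} \right\rangle_{M_p} \tag{138}$$

$$\beta_2(n_1, n_2, \ldots, n_M) = \left\langle \mathrm{Im}((\alpha_1 + j\alpha_2)^d)^{\sum_{i=1}^{M} n_i} \right\rangle_{M_p} \tag{139}$$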
b. The M-D Inverse Transform. The M-dimensional inverse transform is given by:

$$x(n_1, n_2, \ldots, n_M) = \left\langle \frac{1}{N^M} \sum_{k_1=0}^{N-1} \sum_{k_2=0}^{N-1} \cdots \sum_{k_M=0}^{N-1} X(k_1, k_2, \ldots, k_M)\,\beta(n_1 k_1, n_2 k_2, \ldots, n_M k_M) \right\rangle_{M_p}, \qquad n_i = 0, 1, 2, \ldots, N-1, \quad i = 1, 2, \ldots, M \tag{140}$$

5. The Calculation of Multidimensional Convolution

The M-D convolution property of this transform is given as:

$$\begin{aligned}
y(n_1, n_2, \ldots, n_M) &= x(n_1, n_2, \ldots, n_M) * h(n_1, n_2, \ldots, n_M) \\
&= \mathrm{transform}\{X(k_1, k_2, \ldots, k_M) \odot H_{ev}(k_1, k_2, \ldots, k_M) + X(N-k_1, N-k_2, \ldots, N-k_M) \odot H_{od}(k_1, k_2, \ldots, k_M)\} \\
&= \mathrm{transform}\{X(k_1, k_2, \ldots, k_M) \otimes H(k_1, k_2, \ldots, k_M)\}
\end{aligned} \tag{141}$$

where $H_{ev}$ and $H_{od}$ stand for the even and odd parts of $H$, respectively.
C. The Separable Two-Dimensional and Multidimensional Mersenne Transforms (2-D NMNT)

In some cases a separable transform, where the transform can be calculated using the row-column method, is preferable. One advantage of using a separable transform in two or more dimensions is the simplicity of reusing algorithms and programs developed for the 1-D case. Therefore, it is the aim of this section to present a separable version of the transform introduced in Equation (126).

1. Definition of the Separable Two-Dimensional Transform (2-D NMNT)

Although the transform defined in Equation (126) is the true two-dimensional extension of the 1-D transform defined in Equation (106), Equation (126) is not separable because

$$\beta(n, m) \ne \beta(n)\beta(m) \tag{142}$$

so it cannot be computed by the row-column method, in which the 2-D transform computation is broken into a series of 1-D transforms executed first along the rows and then along the columns, or vice versa. It is preferable to use a separable transform and hence define the 2-D transform as follows.

a. The Forward Transform. The 2-D separable forward transform of $x(n, m)$ of dimensions $N \times N$ is defined as [Boussakta and Holt, 1995b]

$$X(k, l) = \left\langle \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} x(n, m)\,\beta(nk)\,\beta(ml) \right\rangle_{M_p}, \qquad k = 0, 1, 2, \ldots, N-1, \quad l = 0, 1, 2, \ldots, N-1 \tag{143}$$

where

$$\beta(n) = \beta_1(n) + \beta_2(n), \qquad \beta(m) = \beta_1(m) + \beta_2(m) \tag{144}$$

$$\beta_1(n) = \langle \mathrm{Re}(\alpha_1 + j\alpha_2)^n \rangle_{M_p}, \qquad \beta_1(m) = \langle \mathrm{Re}(\alpha_1 + j\alpha_2)^m \rangle_{M_p} \tag{145}$$

$$\beta_2(n) = \langle \mathrm{Im}(\alpha_1 + j\alpha_2)^n \rangle_{M_p}, \qquad \beta_2(m) = \langle \mathrm{Im}(\alpha_1 + j\alpha_2)^m \rangle_{M_p} \tag{146}$$

For transform dimensions $N/d \times N/d$, $\beta_1$ and $\beta_2$ are given by

$$\beta_1(n) = \langle \mathrm{Re}((\alpha_1 + j\alpha_2)^d)^n \rangle_{M_p}, \qquad \beta_2(m) = \langle \mathrm{Im}((\alpha_1 + j\alpha_2)^d)^m \rangle_{M_p} \tag{147}$$

$\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ stand for the real and imaginary parts, respectively.
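b. The Inverse Transform. The transform defined in Equation (143) has a similar inverse, defined as

$$x(n, m) = \left\langle \frac{1}{N^2} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} X(k, l)\,\beta(nk)\,\beta(ml) \right\rangle_{M_p}, \qquad n = 0, 1, 2, \ldots, N-1, \quad m = 0, 1, 2, \ldots, N-1 \tag{148}$$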
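The row-column evaluation of the separable pair can be sketched as follows. This is an illustration under assumptions, not the authors' program: the names `beta_kernel`, `nmnt`, `nmnt2d_separable`, and `nmnt2d_direct` are hypothetical, the 1-D transform is evaluated by direct summation, and the parameters are $(N, M_p, \alpha_1, \alpha_2) = (8, 31, 4, 4)$.

```python
def beta_kernel(alpha1, alpha2, N, Mp):
    """beta[t] = Re(g^t) + Im(g^t) mod Mp for g = alpha1 + j*alpha2."""
    re, im, beta = 1, 0, []
    for _ in range(N):
        beta.append((re + im) % Mp)
        re, im = (re * alpha1 - im * alpha2) % Mp, (re * alpha2 + im * alpha1) % Mp
    return beta

def nmnt(x, beta, Mp):
    """1-D transform of Eq. (106) by direct summation."""
    N = len(x)
    return [sum(x[n] * beta[(n * k) % N] for n in range(N)) % Mp for k in range(N)]

def nmnt2d_separable(x, beta, Mp):
    """Row-column method: 1-D transforms along rows, then along columns."""
    rows = [nmnt(row, beta, Mp) for row in x]                  # index m -> l
    cols = [nmnt(list(col), beta, Mp) for col in zip(*rows)]   # index n -> k
    return [list(r) for r in zip(*cols)]                       # back to X[k][l]

def nmnt2d_direct(x, beta, Mp):
    """Double sum of Eq. (143), used here only as a reference."""
    N = len(x)
    return [[sum(x[n][m] * beta[(n * k) % N] * beta[(m * l) % N]
                 for n in range(N) for m in range(N)) % Mp
             for l in range(N)] for k in range(N)]

N, Mp = 8, 31
beta = beta_kernel(4, 4, N, Mp)
x = [[(3 * n + 5 * m) % 7 for m in range(N)] for n in range(N)]
X = nmnt2d_separable(x, beta, Mp)

# Inverse: the same row-column transform followed by the factor 1/N^2.
scale = pow(N * N, -1, Mp)
x_back = [[v * scale % Mp for v in row] for row in nmnt2d_separable(X, beta, Mp)]
```

The inequality of Equation (142) can be seen in the same setting: here `beta[2]` is 1, whereas `beta[1] * beta[1] % Mp` is 2.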
Equation (143) and Equation (148) form a transform pair, and they are both separable and exact.

2. Arithmetic Complexity

The 1-D transform can be called successively along the rows and then along the columns using the row-column method. For an image size of $N \times N$, the number of arithmetic operations using different numbers of butterflies to remove trivial multiplications is given in Table 9. Stages of order 3 and higher include the twiddle factors $\beta_1(N/8)$ and $\beta_2(N/8)$, which are both equal to a power of 2 and can be calculated using shifts only [Boussakta and Holt, 1995a].

3. The Calculation of 2-D Convolution

The two-dimensional transform defined in Equations (143) and (148) has the 2-D cyclic convolution property. Consequently, it can be used in the calculation of 2-D convolutions or 2-D correlations for image-processing purposes without the effect of rounding errors or truncation. Let (with all indices taken modulo $N$)

$$H_{ev}(k, l) = [H(k, l) + H(-k, -l)]/2 \tag{149}$$

$$H_{od}(k, l) = [H(k, l) - H(-k, -l)]/2 \tag{150}$$

$$H_{aev}(k, l) = [H(k, -l) + H(-k, l)]/2 \tag{151}$$

$$H_{aod}(k, l) = [H(k, -l) - H(-k, l)]/2 \tag{152}$$
TABLE 9
ARITHMETIC COMPLEXITY FOR THE 2-D SEPARABLE NEW MERSENNE TRANSFORM.

Number of Butterflies Used    Number of Multiplications                           Number of Shifts    Number of Additions
1                             $\frac{3}{2} N^2 \log_2 N$                          -                   $\frac{7}{2} N^2 \log_2 N$
2                             $\frac{3}{2} N^2 \log_2 N - 3N^2 + 3N$              -                   $\frac{7}{2} N^2 \log_2 N - 3N^2 + 3N$
3                             $\frac{3}{2} N^2 \log_2 N - \frac{9}{2} N^2 + 6N$   -                   $\frac{7}{2} N^2 \log_2 N - \frac{9}{2} N^2 + 6N$
5                             $\frac{3}{2} N^2 \log_2 N - 6N^2 + 12N$             $N^2 - 4N$          $\frac{7}{2} N^2 \log_2 N - 5N^2 + 8N$
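and

$$A = [H_{ev}(k, l) + H_{aev}(k, l)]/2 \tag{153}$$

$$B = [H_{ev}(k, l) - H_{aev}(k, l)]/2 \tag{154}$$

$$C = [H_{od}(k, l) + H_{aod}(k, l)]/2 \tag{155}$$

$$D = [H_{od}(k, l) - H_{aod}(k, l)]/2 \tag{156}$$

The convolution is given by

$$Y(k, l) = [X(k, l)\,A + X(-k, -l)\,B + X(-k, l)\,C + X(k, -l)\,D] \bmod M_p = X(k, l) \circledast H(k, l) \tag{157}$$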
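The four-term combination for the separable transform can be checked on a small asymmetric example. The sketch below is an illustration only: the placement of the negated indices in `H_ev`, `H_od`, `H_aev`, `H_aod` and in the four `X` terms is an assumption written out explicitly in the code (indices are taken modulo $N$), and the names `beta_kernel`, `nmnt2d_separable`, and `convolve2d_separable` are hypothetical.

```python
def beta_kernel(alpha1, alpha2, N, Mp):
    """beta[t] = Re(g^t) + Im(g^t) mod Mp for g = alpha1 + j*alpha2."""
    re, im, beta = 1, 0, []
    for _ in range(N):
        beta.append((re + im) % Mp)
        re, im = (re * alpha1 - im * alpha2) % Mp, (re * alpha2 + im * alpha1) % Mp
    return beta

def nmnt2d_separable(x, beta, Mp):
    """Separable 2-D transform with kernel beta(nk) * beta(ml), by direct summation."""
    N = len(x)
    return [[sum(x[n][m] * beta[(n * k) % N] * beta[(m * l) % N]
                 for n in range(N) for m in range(N)) % Mp
             for l in range(N)] for k in range(N)]

def convolve2d_separable(x, h, alpha1, alpha2, Mp):
    """2-D cyclic convolution modulo Mp using the quantities A, B, C, D."""
    N = len(x)
    beta = beta_kernel(alpha1, alpha2, N, Mp)
    X, H = nmnt2d_separable(x, beta, Mp), nmnt2d_separable(h, beta, Mp)
    half = pow(2, -1, Mp)
    Y = [[0] * N for _ in range(N)]
    for k in range(N):
        for l in range(N):
            mk, ml = -k % N, -l % N              # negated indices modulo N
            H_ev = (H[k][l] + H[mk][ml]) * half
            H_od = (H[k][l] - H[mk][ml]) * half
            H_aev = (H[k][ml] + H[mk][l]) * half
            H_aod = (H[k][ml] - H[mk][l]) * half
            A = (H_ev + H_aev) * half
            B = (H_ev - H_aev) * half
            C = (H_od + H_aod) * half
            D = (H_od - H_aod) * half
            Y[k][l] = (X[k][l] * A + X[mk][ml] * B + X[mk][l] * C + X[k][ml] * D) % Mp
    scale = pow(N * N, -1, Mp)
    return [[v * scale % Mp for v in row] for row in nmnt2d_separable(Y, beta, Mp)]

x = [[1, 2, 0, 0], [3, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
h = [[1, 1, 0, 0], [0, 2, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
y = convolve2d_separable(x, h, 0, 1, 31)   # N = 4, g = j of order 4 modulo 31
```

In practice A, B, C, and D depend only on the filter, so they would be computed once and stored.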
In real applications, to reduce the number of arithmetic operations, the quantities $A$, $B$, $C$, and $D$ are precalculated. A listing of the program to calculate 2-D convolution using this transform is available upon request from the authors (S. Boussakta).

4. Generalization of this Transform

The transform can be easily extended to the multidimensional case; owing to the separability of the kernel, it can be calculated using a 1-D transform applied along each direction.

a. The Forward Multidimensional Transform. The M-dimensional (M-D) forward transform of $x(n_1, n_2, \ldots, n_M)$ of dimensions $N \times N \times \cdots \times N$ is defined as

$$X(k_1, k_2, \ldots, k_M) = \left\langle \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} \cdots \sum_{n_M=0}^{N-1} x(n_1, n_2, \ldots, n_M) \prod_{i=1}^{M} \beta(n_i k_i) \right\rangle_{M_p}, \qquad k_i = 0, 1, 2, \ldots, N-1, \quad i = 1, 2, \ldots, M \tag{158}$$

$$\beta(n_i) = \beta_1(n_i) + \beta_2(n_i) \tag{159}$$

$$\beta_1(n_i) = \langle \mathrm{Re}(\alpha_1 + j\alpha_2)^{n_i} \rangle_{M_p} \tag{160}$$

$$\beta_2(n_i) = \langle \mathrm{Im}(\alpha_1 + j\alpha_2)^{n_i} \rangle_{M_p} \tag{161}$$

$\alpha_1$ and $\alpha_2$ are as defined for the 1-D case.

b. The Inverse Multidimensional Transform. The form of the inverse transform is the same as that of the forward transform except for a factor of $1/N^M$:

$$x(n_1, n_2, \ldots, n_M) = \left\langle \frac{1}{N^M} \sum_{k_1=0}^{N-1} \sum_{k_2=0}^{N-1} \cdots \sum_{k_M=0}^{N-1} X(k_1, k_2, \ldots, k_M) \prod_{i=1}^{M} \beta(n_i k_i) \right\rangle_{M_p}, \qquad n_i = 0, 1, 2, \ldots, N-1, \quad i = 1, 2, \ldots, M \tag{162}$$
D. Summary

In this section another family of transforms was introduced. The transforms were defined modulo the Mersenne numbers, where arithmetic operations are simple; the transform length is long and equal to a power of 2, making it amenable to fast algorithms. It has the cyclic convolution property and hence can be used for the calculations of digital filtering without introducing additional processing noise. The transform was shown for the two-dimensional and multidimensional cases. It is worth mentioning that the emphasis is put on Mersenne primes in order to give long transform lengths with simple arithmetic and residue reduction; however, the theory could be extended to rings defined modulo Mersenne numbers and some other rings.

VI. COMBINATION OF 2-D NTTS USING THE 2-D MRC SUITABLE FOR PARALLEL IMAGE-PROCESSING APPLICATIONS

A. Introduction

Two-dimensional digital convolution and correlation are widely used for digital image-processing applications such as image filtering, enhancement, and recognition. Because the number of arithmetic operations is very large and the demand for real-time, high-resolution images is ever increasing, their computation becomes extensive; hence the need for parallel image-processing algorithms for high speed and high throughput is becoming more and more important. It is the aim of this section to introduce a method for the calculation of 2-D convolution and correlation for image-processing applications. The technique combines a recently developed 2-D transform based on the Mersenne numbers with the 2-D Fermat number transform, using the 2-D mixed radix conversion. The resulting combination uses fast two-dimensional residue transforms, which can be implemented in parallel for high speed and high throughput rate [Boussakta, 1999; Boussakta and Holt, 1996].¹

¹ This section includes material reprinted with permission from Elsevier Science.
The advantage of this method is that it achieves a large dynamic range, great parallelism, and simple modulo arithmetic. The 2-D convolution and correlation are calculated modulo a large integer number, giving a sufficient dynamic range, but all calculations are performed modulo smaller Mersenne and Fermat numbers, where arithmetic operations are known to be much easier than with other moduli (equivalent to 1's complement). This combination provides both simple arithmetic operations and more balanced use of hardware and software. Thus it is suitable for parallel calculation of two-dimensional digital convolution/correlation. The 2-D FNT and 2-D NMNT were described in Sections III and V, respectively.

B. Two-Dimensional Composite Transform (2-D CNTT)

The transform sizes of the 2-D transforms in Equation (3) and Equation (143) are powers of 2. Also, both transforms have the 2-D CCP. Hence, using the 2-D MRC, they can be combined for greater dynamic range, greater parallelism, and greater speed [Boussakta, 1999]. Table 10 shows the most practical Fermat and Mersenne numbers that can be used for this method, with the corresponding transform sizes. Depending on the dynamic range required, any set of numbers from Table 10 could be used and combined by this technique. The transform size must exist in each chosen modulus. When mixing Fermat and Mersenne numbers, the choice of the word length becomes more flexible and the difference in the number of bits between different moduli can be small, leading to more balanced parallel residue transforms.

TABLE 10
USEFUL MERSENNE AND FERMAT NUMBERS.
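$M_p = 2^p - 1$ (Mersenne primes)    Possible transform sizes (powers of 2) using Mersenne primes, $N \times N = 2^i \times 2^i$, $i \le p$
$2^5 - 1$                            $N \times N = 2^i \times 2^i$, $i \le 5$
$2^7 - 1$                            $N \times N = 2^i \times 2^i$, $i \le 7$
$2^{13} - 1$                         $N \times N = 2^i \times 2^i$, $i \le 13$
$2^{17} - 1$                         $N \times N = 2^i \times 2^i$, $i \le 17$
$2^{19} - 1$                         $N \times N = 2^i \times 2^i$, $i \le 19$
$2^{31} - 1$                         $N \times N = 2^i \times 2^i$, $i \le 31$
$2^{61} - 1$                         $N \times N = 2^i \times 2^i$, $i \le 61$

$F_t = 2^{2^t} + 1$ (Fermat numbers)    Possible transform sizes using Fermat numbers, $N \times N = 2^i \times 2^i$, $i \le 2^t$
$2^4 + 1$                               $N \times N = 2^i \times 2^i$, $i \le 4$
$2^8 + 1$                               $N \times N = 2^i \times 2^i$, $i \le 8$
$2^{16} + 1$                            $N \times N = 2^i \times 2^i$, $i \le 16$
$2^{32} + 1$                            $N \times N = 2^i \times 2^i$, $i \le 7$
$2^{64} + 1$                            $N \times N = 2^i \times 2^i$, $i \le 8$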
For example, the set $(2^7 - 1,\ 2^8 + 1)$ will give a dynamic range of nearly 15 bits and a transform size up to $2^7 \times 2^7$, whereas the set $(2^{16} + 1,\ 2^{17} - 1)$ will give a dynamic range of nearly 33 bits and a transform size up to $2^{16} \times 2^{16}$.

C. Calculation of the 2-D Cyclic Convolution Using this Method

The transforms defined in Equation (3) and Equation (143) have the 2-D cyclic convolution property (2-D CCP). Hence, to calculate the $N \times N$ cyclic convolution modulo $M$, denoted by $y(n, m)$, the $N \times N$ cyclic convolutions modulo each $m_i$ are calculated in parallel; then, using the 2-D MRC, the convolution $y(n, m)$ can be constructed, as shown in Figure 18 for two moduli.

D. Two-Dimensional Mixed Radix Conversion (2-D MRC)

The 2-D mixed radix conversion of $y(n, m)$ is $(z_L(n, m), z_{L-1}(n, m), \ldots, z_1(n, m))$, where the $z_i(n, m)$ satisfy Equation (163):

$$y(n, m) = z_L(n, m) \prod_{i=1}^{L-1} m_i + z_{L-1}(n, m) \prod_{i=1}^{L-2} m_i + \cdots + z_3(n, m)\,m_1 m_2 + z_2(n, m)\,m_1 + z_1(n, m) \tag{163}$$

In 2-D mixed radix conversion, the $z_i(n, m)$ can be easily generated from the residue digits (in our case the 2-D residue convolutions) as follows:

$$z_1(n, m) = \langle y(n, m) \rangle_{m_1} = y_1(n, m) \tag{164}$$

$$z_2(n, m) = \left\langle m_1^{-1} (y(n, m) - z_1(n, m)) \right\rangle_{m_2} \tag{165}$$

$$z_3(n, m) = \left\langle m_2^{-1} \left( m_1^{-1} (y(n, m) - z_1(n, m)) - z_2(n, m) \right) \right\rangle_{m_3} \tag{166}$$

$$\vdots$$
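The digit-by-digit construction of Equations (163) to (166) can be sketched for a single pixel. The function name `mixed_radix_convert` is hypothetical, the residues are assumed to be already reduced modulo pairwise coprime moduli, and `pow(m, -1, n)` assumes Python 3.8 or later; for an image the same steps would be applied to every $(n, m)$ position independently.

```python
def mixed_radix_convert(residues, moduli):
    """Return (digits, value): the mixed radix digits z_1..z_L and the value
    rebuilt from them as in Eq. (163). residues[i] is the value modulo moduli[i]."""
    digits = []
    for i, (y_i, m_i) in enumerate(zip(residues, moduli)):
        t = y_i % m_i
        # Peel off the digits already known, one modulus at a time.
        for z_j, m_j in zip(digits, moduli[:i]):
            t = (t - z_j) * pow(m_j, -1, m_i) % m_i
        digits.append(t)
    value, weight = 0, 1
    for z, m in zip(digits, moduli):
        value += z * weight      # weight is the product of the preceding moduli
        weight *= m
    return digits, value

# One pixel with residues 253 (mod 257) and 8 (mod 127).
digits, y = mixed_radix_convert([253, 8], [257, 127])
```

Only arithmetic modulo the individual $m_i$ appears in the digit computation; the large modulus $M$ enters only through the final weighted sum.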
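[Figure 18: block diagram. The input $x(n, m)$ is reduced to $x_1(n, m) = x(n, m) \bmod F_t$ and $x_2(n, m) = x(n, m) \bmod M_p$. The first branch uses the 2-D FNT, $H_1(n, m)$, and the 2-D IFNT to give $y_1(n, m)$; the second uses the 2-D NMNT, $H_2(n, m)$, and the 2-D NMNT to give $y_2(n, m)$. The 2-D MRC combines $y_1(n, m)$ and $y_2(n, m)$ into $y(n, m)$.]
FIGURE 18. The calculation of 2-D convolution using the new method using two moduli.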
[Figure 19: block diagram. The input $x(n, m)$ is reduced to $x_i(n, m) = x(n, m) \bmod m_i$, $i = 1, 2, \ldots, n$, where each $m_i$ is either a Mersenne or a Fermat number, and $h_i(n, m) = h(n, m) \bmod m_i$. Each branch applies a 2-D CNTT (either the 2-D FNT or the 2-D NMNT), combines the result with $H_i(n, m)$, and applies a second 2-D CNTT to give $y_i(n, m)$. The 2-D MRC combines $y_1(n, m), \ldots, y_n(n, m)$ into $y(n, m)$.]
FIGURE 19. The calculation of 2-D convolution using the new method for n moduli.
From Equations (163)-(166), it can be seen that the calculation of the 2-D convolution depends only on the smaller residue moduli $m_i$, which are in this case the popular Fermat and Mersenne numbers. Therefore, the calculation of the convolution modulo $M$ can be carried out by performing the calculations with respect to two or more relatively prime moduli $(m_1, m_2, \ldots, m_k)$. Then the convolution modulo $M = m_1 \times m_2 \times \cdots \times m_k$ is constructed using the 2-D MRC. This does not involve difficult and time-consuming arithmetic operations and residue reduction modulo $M$. The process of calculating a 2-D convolution and correlation using this method is shown in Figure 19 for n moduli.

E. Combination of the 2-D NMNT and 2-D FNT for Two Moduli

For the purpose of illustration and without loss of generality, only two moduli are used in Table 11. From Table 11, it can be seen that larger transform blocks are obtainable when the 2-D new Mersenne number-based transform is used, as shown in Figure 18 for two moduli.
TABLE 11
POSSIBLE TRANSFORM SIZES USING THIS METHOD.

The Modulus ($M$)               Possible Transform Sizes Using This Method
$(2^7 - 1)(2^8 + 1)$            $N \times N = 2^i \times 2^i$, $i \le 7$
$(2^{13} - 1)(2^{16} + 1)$      $N \times N = 2^i \times 2^i$, $i \le 13$
$(2^{17} - 1)(2^{16} + 1)$      $N \times N = 2^i \times 2^i$, $i \le 16$
$(2^{31} - 1)(2^{32} + 1)$      $N \times N = 2^i \times 2^i$, $i \le 7$
$(2^{61} - 1)(2^{64} + 1)$      $N \times N = 2^i \times 2^i$, $i \le 8$
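Note that for greater dynamic range, more than two moduli can be combined. However, the transform size will be limited by the modulus with the smallest transform size.

Example

Two moduli, $m_1 = F_3 = 257$ and $m_2 = M_7 = 127$, are chosen for this example. This choice gives a dynamic range of $M = 257 \times 127 = 32,639 \approx 15$ bits. The corresponding parameters are given in Table 12. Using these two moduli, a two-dimensional transform size up to $N \times N = 128 \times 128$ could be chosen. The input image is chosen to be a simple $8 \times 8$ square of constant value 4, padded with zeros so that the result is $16 \times 16$, as follows:

$$x(n, m) = \begin{bmatrix}
4 & 4 & 4 & 4 & 4 & 4 & 4 & 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
4 & 4 & 4 & 4 & 4 & 4 & 4 & 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
4 & 4 & 4 & 4 & 4 & 4 & 4 & 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
4 & 4 & 4 & 4 & 4 & 4 & 4 & 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
4 & 4 & 4 & 4 & 4 & 4 & 4 & 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
4 & 4 & 4 & 4 & 4 & 4 & 4 & 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
4 & 4 & 4 & 4 & 4 & 4 & 4 & 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
4 & 4 & 4 & 4 & 4 & 4 & 4 & 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$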
The 2-D filter is chosen to be the same: 4 4 4 4 4 4 4 4 hn, m D 0 0 0 0 0 0 0 0
4 4 4 4 4 4 4 4 0 0 0 0 0 0 0 0
4 4 4 4 4 4 4 4 0 0 0 0 0 0 0 0
4 4 4 4 4 4 4 4 0 0 0 0 0 0 0 0
4 4 4 4 4 4 4 4 0 0 0 0 0 0 0 0
4 4 4 4 4 4 4 4 0 0 0 0 0 0 0 0
4 4 4 4 4 4 4 4 0 0 0 0 0 0 0 0
4 4 4 4 4 4 4 4 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The residue convolutions are calculated using the 2-D Fermat number and the 2-D new Mersenne-based transforms with a transform size of $16 \times 16$. The 2-D residue convolution $y_1(n, m)$ using the 2-D FNT with parameters $(F_t, \alpha, N \times N) = (257, 2, 16 \times 16)$ is given by

$$y_1(n, m) = \begin{bmatrix}
16 & 32 & 48 & 64 & 80 & 96 & 112 & 128 & 112 & 96 & 80 & 64 & 48 & 32 & 16 & 0 \\
32 & 64 & 96 & 128 & 160 & 192 & 224 & 256 & 224 & 192 & 160 & 128 & 96 & 64 & 32 & 0 \\
48 & 96 & 144 & 192 & 240 & 31 & 79 & 127 & 79 & 31 & 240 & 192 & 144 & 96 & 48 & 0 \\
64 & 128 & 192 & 256 & 63 & 127 & 191 & 255 & 191 & 127 & 63 & 256 & 192 & 128 & 64 & 0 \\
80 & 160 & 240 & 63 & 143 & 223 & 46 & 126 & 46 & 223 & 143 & 63 & 240 & 160 & 80 & 0 \\
96 & 192 & 31 & 127 & 223 & 62 & 158 & 254 & 158 & 62 & 223 & 127 & 31 & 192 & 96 & 0 \\
112 & 224 & 79 & 191 & 46 & 158 & 13 & 125 & 13 & 158 & 46 & 191 & 79 & 224 & 112 & 0 \\
128 & 256 & 127 & 255 & 126 & 254 & 125 & 253 & 125 & 254 & 126 & 255 & 127 & 256 & 128 & 0 \\
112 & 224 & 79 & 191 & 46 & 158 & 13 & 125 & 13 & 158 & 46 & 191 & 79 & 224 & 112 & 0 \\
96 & 192 & 31 & 127 & 223 & 62 & 158 & 254 & 158 & 62 & 223 & 127 & 31 & 192 & 96 & 0 \\
80 & 160 & 240 & 63 & 143 & 223 & 46 & 126 & 46 & 223 & 143 & 63 & 240 & 160 & 80 & 0 \\
64 & 128 & 192 & 256 & 63 & 127 & 191 & 255 & 191 & 127 & 63 & 256 & 192 & 128 & 64 & 0 \\
48 & 96 & 144 & 192 & 240 & 31 & 79 & 127 & 79 & 31 & 240 & 192 & 144 & 96 & 48 & 0 \\
32 & 64 & 96 & 128 & 160 & 192 & 224 & 256 & 224 & 192 & 160 & 128 & 96 & 64 & 32 & 0 \\
16 & 32 & 48 & 64 & 80 & 96 & 112 & 128 & 112 & 96 & 80 & 64 & 48 & 32 & 16 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

Because the modulus 257 does not offer a sufficient dynamic range, the 2-D residue convolution calculated using the 2-D FNT is in error. The pixels in error are those whose true convolution value exceeds $m_1 - 1 = 256$.
TABLE 12
TRANSFORM PARAMETERS FOR COMBINING 257 ($2^8 + 1$) AND 127 ($2^7 - 1$).

Transform Sizes    $m_1 = F_3 = 257$: $\alpha$    $m_2 = M_7 = 127$: $\alpha_1$    $m_2 = M_7 = 127$: $\alpha_2$
8 × 8              4                              8                                8
16 × 16            2                              106                              24
32 × 32            $\sqrt{2}$                     102                              30
64 × 64            81                             49                               34
128 × 128          9                              5                                22
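The second 2-D residue convolution $y_2(n, m)$, using the 2-D NMNT with parameters $(M_p, \alpha_1, \alpha_2, N \times N) = (127, 21, 24, 16 \times 16)$, is

$$y_2(n, m) = \begin{bmatrix}
16 & 32 & 48 & 64 & 80 & 96 & 112 & 1 & 112 & 96 & 80 & 64 & 48 & 32 & 16 & 0 \\
32 & 64 & 96 & 1 & 33 & 65 & 97 & 2 & 97 & 65 & 33 & 1 & 96 & 64 & 32 & 0 \\
48 & 96 & 17 & 65 & 113 & 34 & 82 & 3 & 82 & 34 & 113 & 65 & 17 & 96 & 48 & 0 \\
64 & 1 & 65 & 2 & 66 & 3 & 67 & 4 & 67 & 3 & 66 & 2 & 65 & 1 & 64 & 0 \\
80 & 33 & 113 & 66 & 19 & 99 & 52 & 5 & 52 & 99 & 19 & 66 & 113 & 33 & 80 & 0 \\
96 & 65 & 34 & 3 & 99 & 68 & 37 & 6 & 37 & 68 & 99 & 3 & 34 & 65 & 96 & 0 \\
112 & 97 & 82 & 67 & 52 & 37 & 22 & 7 & 22 & 37 & 52 & 67 & 82 & 97 & 112 & 0 \\
1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 7 & 6 & 5 & 4 & 3 & 2 & 1 & 0 \\
112 & 97 & 82 & 67 & 52 & 37 & 22 & 7 & 22 & 37 & 52 & 67 & 82 & 97 & 112 & 0 \\
96 & 65 & 34 & 3 & 99 & 68 & 37 & 6 & 37 & 68 & 99 & 3 & 34 & 65 & 96 & 0 \\
80 & 33 & 113 & 66 & 19 & 99 & 52 & 5 & 52 & 99 & 19 & 66 & 113 & 33 & 80 & 0 \\
64 & 1 & 65 & 2 & 66 & 3 & 67 & 4 & 67 & 3 & 66 & 2 & 65 & 1 & 64 & 0 \\
48 & 96 & 17 & 65 & 113 & 34 & 82 & 3 & 82 & 34 & 113 & 65 & 17 & 96 & 48 & 0 \\
32 & 64 & 96 & 1 & 33 & 65 & 97 & 2 & 97 & 65 & 33 & 1 & 96 & 64 & 32 & 0 \\
16 & 32 & 48 & 64 & 80 & 96 & 112 & 1 & 112 & 96 & 80 & 64 & 48 & 32 & 16 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

Again, because the modulus 127 does not offer a sufficient dynamic range, the 2-D residue convolution calculated using the 2-D NMNT is in error. The pixels in error are those whose true convolution value exceeds $m_2 - 1 = 126$. Now applying the 2-D MRC to this example gives

$$y(n, m) = m_1 z_2(n, m) + z_1(n, m) = 257\,z_2(n, m) + z_1(n, m) \tag{167}$$

with

$$z_1(n, m) = \langle y(n, m) \rangle_{m_1} = y_1(n, m) \tag{168}$$

$$z_2(n, m) = \left\langle m_1^{-1} (y(n, m) - z_1(n, m)) \right\rangle_{m_2} = \left\langle m_1^{-1} (y_2(n, m) - y_1(n, m)) \right\rangle_{m_2} = \langle 85\,(y_2(n, m) - y_1(n, m)) \rangle_{127} \tag{169}$$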
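The example can be reproduced end to end with a short script. This sketch makes one substitution that should be kept in mind: the residue convolutions are computed by direct summation modulo 257 and 127 rather than with the 2-D FNT and 2-D NMNT, which gives the same residues but not the same operation count. The function name `cyclic_convolve2d` is hypothetical.

```python
def cyclic_convolve2d(x, h, modulus):
    """Direct 2-D cyclic convolution with all results reduced modulo `modulus`."""
    N = len(x)
    return [[sum(x[i][j] * h[(n - i) % N][(m - j) % N]
                 for i in range(N) for j in range(N)) % modulus
             for m in range(N)] for n in range(N)]

N, m1, m2 = 16, 257, 127

# 8 x 8 square of value 4, zero-padded to 16 x 16; the filter is the same array.
x = [[4 if n < 8 and m < 8 else 0 for m in range(N)] for n in range(N)]
h = [row[:] for row in x]

y1 = cyclic_convolve2d(x, h, m1)   # residue convolution modulo 257
y2 = cyclic_convolve2d(x, h, m2)   # residue convolution modulo 127

# Two-modulus mixed radix conversion, Eqs. (167)-(169).
m1_inv = pow(m1, -1, m2)           # inverse of 257 modulo 127
y = [[m1 * (m1_inv * (y2[n][m] - y1[n][m]) % m2) + y1[n][m]
      for m in range(N)] for n in range(N)]
```

At the peak position $(7, 7)$ the two residues are 253 and 8, and the conversion recovers 1024, which neither modulus could represent on its own.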
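Substituting $y_1(n, m)$ and $y_2(n, m)$ with their values in Equations (167), (168), and (169) gives the correct 2-D convolution result $y(n, m)$:

$$y(n, m) = \begin{bmatrix}
16 & 32 & 48 & 64 & 80 & 96 & 112 & 128 & 112 & 96 & 80 & 64 & 48 & 32 & 16 & 0 \\
32 & 64 & 96 & 128 & 160 & 192 & 224 & 256 & 224 & 192 & 160 & 128 & 96 & 64 & 32 & 0 \\
48 & 96 & 144 & 192 & 240 & 288 & 336 & 384 & 336 & 288 & 240 & 192 & 144 & 96 & 48 & 0 \\
64 & 128 & 192 & 256 & 320 & 384 & 448 & 512 & 448 & 384 & 320 & 256 & 192 & 128 & 64 & 0 \\
80 & 160 & 240 & 320 & 400 & 480 & 560 & 640 & 560 & 480 & 400 & 320 & 240 & 160 & 80 & 0 \\
96 & 192 & 288 & 384 & 480 & 576 & 672 & 768 & 672 & 576 & 480 & 384 & 288 & 192 & 96 & 0 \\
112 & 224 & 336 & 448 & 560 & 672 & 784 & 896 & 784 & 672 & 560 & 448 & 336 & 224 & 112 & 0 \\
128 & 256 & 384 & 512 & 640 & 768 & 896 & 1024 & 896 & 768 & 640 & 512 & 384 & 256 & 128 & 0 \\
112 & 224 & 336 & 448 & 560 & 672 & 784 & 896 & 784 & 672 & 560 & 448 & 336 & 224 & 112 & 0 \\
96 & 192 & 288 & 384 & 480 & 576 & 672 & 768 & 672 & 576 & 480 & 384 & 288 & 192 & 96 & 0 \\
80 & 160 & 240 & 320 & 400 & 480 & 560 & 640 & 560 & 480 & 400 & 320 & 240 & 160 & 80 & 0 \\
64 & 128 & 192 & 256 & 320 & 384 & 448 & 512 & 448 & 384 & 320 & 256 & 192 & 128 & 64 & 0 \\
48 & 96 & 144 & 192 & 240 & 288 & 336 & 384 & 336 & 288 & 240 & 192 & 144 & 96 & 48 & 0 \\
32 & 64 & 96 & 128 & 160 & 192 & 224 & 256 & 224 & 192 & 160 & 128 & 96 & 64 & 32 & 0 \\
16 & 32 & 48 & 64 & 80 & 96 & 112 & 128 & 112 & 96 & 80 & 64 & 48 & 32 & 16 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$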
From this example, it can be seen that the convolution is calculated modulo $M = 257 \times 127$, which has a large dynamic range, but arithmetic operations are carried out modulo 257 and 127, so they are simpler and faster. Also, it is not a problem if the residue convolutions $y_i(n, m)$ overflow; the final result is correct provided there is no overflow modulo $M$.

F. Advantages of This Method

1. The two-dimensional residue transforms are totally independent and hence can be calculated in parallel for high-speed processing and high throughput rate.
2. The method allows the dynamic range to be sufficiently large, but all operations are calculated using small moduli.
3. The residue moduli can be chosen to be the Mersenne and Fermat numbers, where arithmetic operations are known to be easier than when using other moduli.
4. The transform sizes of the residue transforms are equal to a power of 2; hence they are suitable for computation using fast algorithms such as radix-2 × 2, radix-4 × 4, split radix, etc.
5. The method is free from rounding and truncation errors.
VII. HARDWARE IMPLEMENTATIONS

A. TTL-Based Designs

The implementation of NTTs has been greatly influenced by developments in DSP hardware. Implementations by McClellan and Rader [1976] and Smith [1983] using the TTL circuits available at the time proved the operation of FNTs. Later both TTL and gate array devices were used [Shakaff, Pajayakrit, and Holt, 1988] in an investigation of the FNT as a block-mode image-filtering tool. A block diagram of the overall system is shown in Figure 20. It consists of a single-board, 16-bit MC68000 processor with VME interface, special-purpose 64-point FNT hardware, a modulo $F_4$ multiplier, an auxiliary TMS32010 coprocessor also programmed to compute modulo $F_4$ multiplications, fast memory banks for the transform process, and the essential image-capture, -storage, and -display hardware.

The whole image-filtering process is controlled by the MC68000. This includes uploading the digitized image from the frame grabber, performing the overlap-save sectioning procedures, managing the movement of data blocks to and from the FNT hardware during the 2-D FNT computations, initiating the FNT and other hardware, controlling the multiplication process, and, finally, displaying the results.

The hardware was microcoded TTL circuitry operated as a slave processor to a host MC68000. It was designed to compute a 64-point transform using modulus $F_4$, with $\sqrt{2}$ as the basis function. A 17-bit word length is required, but arithmetic involving the extra bit is avoided by using the simplified binary operations described by Leibowitz [1976], with data being coded in the “diminished-1” representation.

[Figure 20: block diagram showing the MC68000 with VME interface and PIA ports, RAM, decoders, the frame grabber and camera, the frame display and TV monitor, the modulo $F_4$ multiplier, the TMS32010 with 17-bit-wide memory banks and their control, and the FNT hardware, connected through the VME bus interface with start/done and Start FNT/End FNT handshake signals.]
FIGURE 20. Outline diagram of FNT used as block-mode image-filtering tool.

B. Pipelined Designs

An NMOS VLSI circuit, a set of which can be cascaded to form a 32-bit FNT operation over $F_4$, is described in Towers, Pajayakrit, and Holt [1987] and Alfredsen [1996]. With the addition of a modulo $F_4$ multiplier, a fast convolver/correlator can be constructed. The design comprises one complete section of a pipelined transformer and is novel in that it can be programmed to function at any point in a forward or inverse pipeline, thus allowing the construction of a pipelined convolver or correlator using identical chips. This overcomes the difficulty of fitting a complete pipeline onto one chip without resorting to the use of several designs. See also Truong et al. [1983] and Truong et al. [1982].

The pipeline sections described in Towers, Pajayakrit, and Holt [1987] were implemented in the NMOS technology available at the time. Later designs were based on CMOS technology when it became available [Bouridane et al., 1989; Jullien, 1991]. The pipeline designs for FNTs described require multipliers modulo a Fermat number. Two VLSI designs for diminished-1 multiplication modulo a Fermat number are given in Benaissa et al. [1988a,b], Jullien [1980], Sander, El-Guibaly, and Antinou [1993], and Ashur, Ibrahim, and Aggoun [1994].

A CMOS-based VLSI design giving a significant improvement in speed of operation and a higher data rate is described in Benaissa, Dlay, and Holt [1991a]. It is based on the use of three-input modulo $F_4$ adders and uses the decimation-in-time algorithm. Using a 2.5-µm process, it has predicted clock and sample rates of 16 MHz and 8 MHz, respectively, and can be cascaded to perform a 64-point FNT/IFNT.
Alternative approaches to the problems of VLSI implementations of number theoretic transforms are given by Wigley and Jullien [1992], Nagpal, Jullien, and Miller [1983], Arambepola [1989], and Gudvangen [1997].

C. Vector Radix Method

An effective algorithm for computing the 2-D FFT as an alternative to the conventional row-column decomposition has been presented by Rivard [1977] and Harris et al. [1977]. Known as the vector radix algorithm, it saves 25% of the complex multiplications and avoids the matrix-transpose operation. In this method the decimation is performed in both rows and columns simultaneously.
NUMBER THEORETIC TRANSFORMS AND THEIR APPLICATIONS
The 2-D FFT is decomposed successively into smaller 2-D FFTs until only trivial 2-D FFTs need to be evaluated. This algorithm can be applied to the 2-D FNT. An approach using the vector radix algorithm for the FNT and a CMOS VLSI design of a pipeline programmable vector radix FNT computational unit is described in Benaissa, Dlay, and Holt [1991b]. Only a programmable delay commutator chip is required to complete the 2-D convolution system. This allows a substantial reduction of the computational time required for image filtering via the FNT to be achieved at the expense of more hardware complexity. Figure 21 shows a block diagram for an N × N pipeline vector radix 2 × 2 FNT. It involves programmable variable-length delays, commutators, and 2-D radix 2 × 2 FNT butterflies. An important feature of this architecture is that only two types of circuit are used, the butterfly and the variable-length delay. The data flow and interstage reordering of an 8 × 8 pipeline vector radix FNT are illustrated in Figure 22; additional information is given in Figure 8 of the reference.³
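The successive decomposition described above can be sketched in software. The following is a minimal recursive decimation-in-time model, not the pipelined hardware of the cited design. It assumes the 2-D transform is defined as X(k, l) = Σ x(m, n) α^(mk + nl) mod $F_4$, with α a power of 2 of order N, and it counts three twiddle multiplications and eight additions per 2 × 2 butterfly. The function names and the `counts` dictionary are illustrative.

```python
F4 = 2**16 + 1   # Fermat number F_4 (prime)

def fnt2d_direct(x, alpha):
    """Reference 2-D transform evaluated straight from the definition."""
    n = len(x)
    return [[sum(x[m][q] * pow(alpha, m * k + q * l, F4)
                 for m in range(n) for q in range(n)) % F4
             for l in range(n)] for k in range(n)]

def fnt2d_vector_radix(x, alpha, counts):
    """Vector radix 2 x 2 decimation in rows and columns simultaneously."""
    n = len(x)
    if n == 1:
        return [[x[0][0] % F4]]
    h = n // 2
    a2 = alpha * alpha % F4
    # sub[i][j]: transform of the samples with row parity i, column parity j
    sub = [[fnt2d_vector_radix([row[j::2] for row in x[i::2]], a2, counts)
            for j in (0, 1)] for i in (0, 1)]
    out = [[0] * n for _ in range(n)]
    for k in range(h):
        for l in range(h):
            a = sub[0][0][k][l]
            b = sub[0][1][k][l] * pow(alpha, l, F4) % F4      # multiplier 1
            c = sub[1][0][k][l] * pow(alpha, k, F4) % F4      # multiplier 2
            d = sub[1][1][k][l] * pow(alpha, k + l, F4) % F4  # multiplier 3
            p, q, r, s = a + b, a - b, c + d, c - d           # 4 add/sub
            out[k][l] = (p + r) % F4                          # 4 more add/sub
            out[k][l + h] = (q + s) % F4
            out[k + h][l] = (p - r) % F4
            out[k + h][l + h] = (q - s) % F4
            counts["mul"] += 3
            counts["add"] += 8
    return out

N = 8
alpha = pow(2, 32 // N, F4)          # 2 has order 32 modulo F_4
x = [[(3 * m + 5 * q + m * q) % 7 for q in range(N)] for m in range(N)]
counts = {"mul": 0, "add": 0}
X = fnt2d_vector_radix(x, alpha, counts)
```

For N = 8 the counters reach 144 multiplications and 384 additions, that is, (3/4)N² log₂ N and 2N² log₂ N. In hardware the multiplications by powers of 2 would be shifts rather than calls to `pow`.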
FIGURE 21. N × N pipeline vector radix 2 × 2 butterfly. [Block diagram not reproduced; it shows delay elements, commutators, and butterflies connected between INPUT and OUTPUT.]
FIGURE 22. Data pattern through delay commutators for 8 × 8 pipeline vector radix 2-D FNT. [Diagram not reproduced; it shows commutators 1 to 3 and butterflies 1 to 3 between INPUT and OUTPUT.]
³ This section includes material reprinted with permission from Elsevier Science.
The basic radix 2 × 2 butterfly is shown in Figure 23. Because N is a power of 2, the decimation procedure requires $\log_2 N$ stages, each stage consisting of $N^2/4$ butterflies and each butterfly involving three multiplications (shifts) and eight additions. Consequently the numbers of multiplications and additions performed in the computation of an N × N radix 2 × 2 FNT are $N_{mul} = \frac{3}{4} N^2 \log_2 N$ and $N_{add} = 2N^2 \log_2 N$, respectively. The number of multiplications is, therefore, 25% less than that required for a row-column decomposition ($N^2 \log_2 N$), but the number of additions remains the same.

In order to translate a DSP application into silicon, the most important decision to make is to select a suitable architecture. This selection depends mainly on the following points: first, on the modularity, parallelism, and data-flow mechanism of the application; second, on the throughput requirements; and third, on the feature size of the accessible technology. One of the most effective architectures for VLSI implementation of DSP applications is the pipeline scheme, due to its inherent modularity and suitability for high-throughput applications.

Modulus $F_4$ was chosen for convenient implementation. The computational unit is the main part of the 2-D transform. It was designed to function in both forward and inverse directions of the transform and according to the decimation stage of the vector radix algorithm under the control of externally supplied rank and direction signals. This approach allows a whole 2-D FNT butterfly section to be implemented on one chip, and a set of these are cascaded with delay commutator circuits and a multiplier to build a 2-D FNT convolver. A block diagram of the 2-D computational unit is shown in Figure 24; it consists of the basic circuits of an FNT butterfly. The arithmetic used is the diminished-1 arithmetic. The design can be divided into four major parts, as follows.

1. Addition/subtraction part: This part comprises four adders and four subtractors, implemented using diminished-1 arithmetic and two-input carry-look-ahead logic.
2. Multiplication by powers of 2 part: This part comprises three multipliers. Each multiplier is implemented with a modified barrel shifter capable of shifting the input data left (positive) or right (negative) under the control of an exponent generator driven by a programmable logic array finite-state machine (PLAFSM) based counter and a shifter. The exponent generator generates the exponents required in the multiplication, according to the different stages of the vector radix algorithm. The sequences of exponents for the three multipliers in the different stages of both the forward and inverse transforms using the vector radix algorithm are given in Tables 13, 14, and 15.
FIGURE 23. Basic radix 2 × 2 butterfly. [Diagram not reproduced; inputs x(m, n), x(m, n+N/2), x(m+N/2, n), x(m+N/2, n+N/2); outputs X(k, l), X(k, l+N/2), X(k+N/2, l), X(k+N/2, l+N/2).]
FIGURE 24. Block diagram of a pipeline vector radix 2-D FNT computational unit. [Diagram not reproduced. Recoverable labels: INPUT and OUTPUT ports; adders and negators; pipeline registers reg1 to reg26; three "multiply by powers of two" blocks, each with an exponent generator (Exp-gen); FSM controllers; control signals ENB, SCAN, RSTT, RSTC, DIR1, DIR2, RANK, ENBC1, ENBC2, ENBC3, ENBAR, and L01 to L04.]
3. Pipeline/testing registers: These registers divide the system into subblocks of reduced propagation delay, allowing a higher throughput. They were also designed to act as scan-path registers during the testing mode.
4. The control part: This part controls all the operations of the chip. It consists of five programmable-logic-array (PLA) based finite-state-machine (FSM) controllers.

The first PLAFSM generates the LOAD and ENABLE signals used to multiplex the inputs and outputs in order to reduce the pin count. Because of multiplexing, the internal data rate is a quarter of the external clock and data rates. Therefore, the clocks of the different blocks in the chip are gated with the ENABLE signal, which controls all the clocked elements in the design. This PLAFSM can be reset by an external RESET line, and during testing mode it freezes the operation of the chip under the control of the external signal SCAN.

Each of the three multipliers (barrel shifters) is controlled by a PLAFSM counter that generates the required sequence of exponents for the first stage in both forward and inverse directions. The exponents are then fed to a simple shifter to generate the sequences corresponding to the remaining ranks under the control of the externally supplied RANK signals. Each PLAFSM is clocked under the control of an ENABLE-COUNT (ENBC) signal generated by an additional PLAFSM. The use of a simple shifter and an extra PLAFSM significantly reduces the complexity of the PLAFSM counters. The last PLAFSM is used to generate the ENABLE-COUNT signals (ENBC1, ENBC2, ENBC3) required to control the three PLAFSM counters.

As shown in Table 13, the sequence of exponents required for the first multiplier in the first stage consists of 16 iterations of the sequence {0, 1, ..., 15} for the forward direction and of its bit-reversed sequence for the inverse direction. Therefore, a direct use of a PLAFSM counter to generate all the exponents would result in a very large and slow, and hence impractical, PLAFSM. A more viable approach consists of using an additional PLAFSM that generates an ENABLE-COUNT signal (ENBC1 for multiplier 1) that controls the iteration of the much shorter sequence {0, 1, ..., 15} 16 times, allowing the use of a much simpler PLAFSM counter.

For the second multiplier, Table 14 shows that the exponent is incremented only once every 16 cycles. Hence, ENBC2 is used to hold the count for 16 cycles at a time. For the third multiplier, Table 15 shows that the sequence of exponents is slightly more complicated. Not only are higher exponents required (up to 30), but the sequence also restarts from an incremented exponent after each 16 cycles. Therefore, ENBC3 controls the counting over each 16 cycles, and the PLAFSM counter of the third multiplier is programmed to start counting from an incremented state after each 16 cycles.
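The rank-0 exponent streams described above can be modelled with two nested counters, one advanced every cycle and one advanced every 16 cycles. The sketch below is a software illustration only, not the PLAFSM logic. The forward streams follow the description in the text; the bit-reversed forms used for the inverse direction of the second and third multipliers are an assumption inferred from the rows that remain visible in Tables 14 and 15. The function names are hypothetical.

```python
def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of value."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out

def rank0_exponents(inverse=False):
    """Return the 256-cycle exponent streams for multipliers 1, 2, and 3."""
    m1, m2, m3 = [], [], []
    for k in range(16):          # slow counter: advanced once every 16 cycles
        for l in range(16):      # fast counter: advanced every cycle
            if inverse:
                m1.append(bit_reverse(l, 4))      # bit-reversed 0..15
                m2.append(bit_reverse(k, 4))      # assumed from Table 14
                m3.append(bit_reverse(k + l, 5))  # assumed from Table 15
            else:
                m1.append(l)                      # 0..15, repeated 16 times
                m2.append(k)                      # held for 16 cycles
                m3.append(k + l)                  # restarts one higher each time
    return m1, m2, m3

fwd1, fwd2, fwd3 = rank0_exponents()
inv1, inv2, inv3 = rank0_exponents(inverse=True)
```

The third forward stream peaks at 15 + 15 = 30, which is why the third multiplier needs the larger exponent range noted in the text.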
TABLE 13
EXPONENT SEQUENCES FOR THE FIRST MULTIPLIER

Rank  Dir.   Sequence of Exponents
0     Forw.  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 (16 times)
0     Inv.   0 8 4 12 2 10 6 14 1 9 5 13 3 11 7 15 (16 times)
1     Forw.  0 2 4 6 8 10 12 14 0 2 4 6 8 10 12 14 (16 times)
1     Inv.   0 0 8 8 4 4 12 12 2 2 10 10 6 6 14 14 (16 times)
2     Forw.  0 4 8 12 0 4 8 12 0 4 8 12 0 4 8 12 (16 times)
2     Inv.   0 0 0 0 8 8 8 8 4 4 4 4 12 12 12 12 (16 times)
3     Forw.  0 8 0 8 0 8 0 8 0 8 0 8 0 8 0 8 (16 times)
3     Inv.   0 0 0 0 0 0 0 0 8 8 8 8 8 8 8 8 (16 times)
4     Forw.  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 (16 times)
4     Inv.   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 (16 times)
The final layout of the vector radix computational unit section was designed in 2.5-μm CMOS. The predicted clock rate was 18 MHz, and the estimated power consumption was 0.3 W. Due to multiplexing, the sample rate in a 16 × 16-point impulse response overlap-save convolver would be 4.5 MHz.

In order to complete the whole pipeline 2-D FNT, a design for a programmable delay commutator similar to that by Swartzlander and Hallnor [1985] was required. The design was programmable to take on different delay lengths according to the appropriate rank and direction of the pipeline. Table 16 shows the lengths of the delays required in a 32 × 32-point forward and inverse vector radix pipeline transformer. It is clear that each pipeline section requires two programmable delay commutators, a prebutterfly and a postbutterfly chip. Consequently, the number of chips required to implement a 32 × 32-point vector radix FNT transformer would be 5 × 3 = 15. The CMOS designs discussed were for implementation with 2.5-μm technology. This work was carried out before submicron technology became available.

VIII. CONCLUSIONS

Transforms defined in finite fields and rings with all arithmetic performed modulo an integer, which do not involve complex arithmetic or rounding errors in calculation, have appreciable advantages for image processing and other applications. The 2-D NTTs are well suited to the calculation of two-dimensional convolutions and correlations. The choices of the transform size, N × N, the modulus, F, and the basis function, α, used in the 2-D NTT have been widely
TABLE 14
EXPONENT SEQUENCES FOR THE SECOND MULTIPLIER
(each exponent listed is applied for 16 consecutive cycles; "..." marks entries elided in the original)

Rank  Forward Direction      Inverse Direction
0     0, 1, 2, 3, ..., 15    0, 8, 4, 12, ..., 15
1     0, 2, 4, ..., 14       0, 0, 8, ..., 14
2     0, 4, 8, ..., 12       0, 0, 0, ..., 12
3     0, 8, 0, ..., 8        0, 0, 0, ..., 8
4     0, 0, 0, ..., 0        0, 0, 0, ..., 0
TABLE 15
SEQUENCE OF EXPONENTS FOR THE THIRD MULTIPLIER
(each row lists the exponents for 16 consecutive cycles; "..." marks rows elided in the original)

Sequence of Exponents for Forward Direction

Rank 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
...
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30

Rank 1
0 2 4 6 8 10 12 14 0 2 4 6 8 10 12 14
2 4 6 8 10 12 14 0 2 4 6 8 10 12 14 0
4 6 8 10 12 14 0 2 4 6 8 10 12 14 0 2
...
14 0 2 4 6 8 10 12 14 0 2 4 6 8 10 12

Rank 2
0 4 8 12 0 4 8 12 0 4 8 12 0 4 8 12
4 8 12 0 4 8 12 0 4 8 12 0 4 8 12 0
8 12 0 4 8 12 0 4 8 12 0 4 8 12 0 4
...
12 0 4 8 12 0 4 8 12 0 4 8 12 0 4 8

Rank 3
0 8 0 8 0 8 0 8 0 8 0 8 0 8 0 8
8 0 8 0 8 0 8 0 8 0 8 0 8 0 8 0
...
8 0 8 0 8 0 8 0 8 0 8 0 8 0 8 0

Rank 4
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Sequence of Exponents for Inverse Direction

Rank 0
0 16 8 24 4 20 12 28 2 18 10 26 6 22 14 30
16 8 24 4 20 12 28 2 18 10 26 6 22 14 30 1
8 24 4 20 12 28 2 18 10 26 6 22 14 30 1 17
...
30 1 17 9 25 5 21 13 29 3 19 11 27 7 23 15

Rank 1
0 0 8 8 4 4 12 12 2 2 10 10 6 6 14 14
8 8 4 4 12 12 2 2 10 10 6 6 14 14 0 0
4 4 12 12 2 2 10 10 6 6 14 14 0 0 8 8
...
14 14 0 0 8 8 4 4 12 12 2 2 10 10 6 6

Rank 2
0 0 0 0 8 8 8 8 4 4 4 4 12 12 12 12
8 8 8 8 4 4 4 4 12 12 12 12 0 0 0 0
4 4 4 4 12 12 12 12 0 0 0 0 8 8 8 8
...
12 12 12 12 0 0 0 0 8 8 8 8 4 4 4 4

Rank 3
0 0 0 0 0 0 0 0 8 8 8 8 8 8 8 8
8 8 8 8 8 8 8 8 0 0 0 0 0 0 0 0
...
8 8 8 8 8 8 8 8 0 0 0 0 0 0 0 0

Rank 4
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
TABLE 16
DELAY LENGTHS FOR A PIPELINE 32 × 32-POINT VECTOR RADIX TRANSFORMER

Forward Direction (DIR = 0)

Rank  Prebutterfly Delay Lengths  Postbutterfly Delay Lengths
0     528, 512, 16                8, 256, 264
1     264, 256, 8                 4, 128, 132
2     132, 128, 4                 2, 64, 66
3     66, 64, 2                   1, 32, 33
4     33, 32, 1                   0, 0, 0

Inverse Direction (DIR = 1)

Rank  Prebutterfly Delay Lengths  Postbutterfly Delay Lengths
0     0, 0, 0                     33, 32, 1
1     1, 32, 33                   66, 64, 2
2     2, 64, 66                   132, 128, 4
3     4, 128, 132                 264, 256, 8
4     8, 256, 264                 528, 512, 16
examined. The 2-D FNTs are shown to be suitable 2-D NTTs for the applications considered. These transforms have a highly composite transform length (a power of 2) and hence are amenable to fast algorithms, are capable of performing exact calculations, and retain simple arithmetic and hardware implementation if the kernel is properly selected. The transform size is limited if the 2-D FNT is used with α = 2. Relaxing this condition for the whole transform or for some stages allows the transform length to be increased.

Although the number domain generated by the NTTs is not always as simple to interpret as that of the FFT, NTTs are meaningful and well structured. The study of some of the patterns of zeros that appear in the number domain for 2-D NTTs shows that, even though this pattern may be similar for some features, it might be used effectively in pattern-recognition problems where there is a need for classifying only a closed set of objects, such as occurs in character recognition. If the process is complicated and the zero pattern may not be reliable, other pixels in the 2-D NTT domain can be selected to show the difference between different patterns. This is problem dependent and can be designed for the specific problem in the training session that precedes the pattern-recognition process.

The 2-D NTTs offer an exceedingly sensitive method whereby small errors in a nominally periodic pattern may be detected. The exact location of the pattern error may be determined by brief inspection of the transform. Also, it is not always necessary to calculate the complete transform. Analysis of the 2-D FNTs of periodic structures leads to a rule that simplifies the calculations and determines the position and values of nonzero elements in the transform domain, showing that 2-D NTTs are well structured. Some new applications of 2-D FNTs are suggested that indicate that the number domain can be utilized for applications other than convolution.
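The sensitivity to errors in a periodic pattern can be seen in a small numerical experiment. The sketch below is an illustration under assumptions, not the authors' procedure: it takes the 2-D transform to be X(k, l) = Σ x(m, n) α^(mk + nl) mod $F_4$, uses an 8 × 8 image built from an arbitrary 2 × 2 tile, and then corrupts one pixel. All names and data values are hypothetical.

```python
F4 = 2**16 + 1   # Fermat number F_4 (prime)

def fnt2d(x, alpha):
    """2-D transform evaluated directly from the assumed definition."""
    n = len(x)
    return [[sum(x[m][q] * pow(alpha, m * k + q * l, F4)
                 for m in range(n) for q in range(n)) % F4
             for l in range(n)] for k in range(n)]

N, P = 8, 2                          # image size and pattern period
alpha = pow(2, 32 // N, F4)          # 2 has order 32 modulo F_4
tile = [[5, 9], [2, 7]]              # arbitrary 2 x 2 tile
pattern = [[tile[m % P][q % P] for q in range(N)] for m in range(N)]

X = fnt2d(pattern, alpha)
nonzero = {(k, l) for k in range(N) for l in range(N) if X[k][l] != 0}

# Introduce a single-pixel error of +1 at row 3, column 5.
faulty = [row[:] for row in pattern]
faulty[3][5] += 1
Y = fnt2d(faulty, alpha)
changed = sum(Y[k][l] != X[k][l] for k in range(N) for l in range(N))
```

For the perfect pattern, nonzero values appear only where both indices are multiples of N/P = 4. The single-pixel error alters every one of the 64 transform values, and the formerly zero entries Y[1][0] = α³ and Y[0][1] = α⁵ reveal the row and column of the fault.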
Integrated circuit designs and microprocessor implementations for 2-D FNTs should make this transform more readily available for use in systems. The transform-size constraint for the 2-D FNT is not a great problem; in fact, several solutions are proposed, allowing the processing of sufficiently large
images while retaining the advantages of the FNT, such as fast computation and simple arithmetic. Other transforms, such as the number theoretic Walsh transform, may also be used in these applications.

Another real transform defined modulo the Mersenne numbers, which provides long transform lengths equal to a power of 2, is introduced in Section V. This is done by dropping the condition that α should be a power of 2 and using a new definition of NTTs that departs from the usual Fourier-like definition. This transform is suitable for fast algorithms. It has the cyclic convolution property and hence can be applied to the calculation of convolutions and correlations. The transform is extended to the separable case, making it a good candidate for the calculation of 2-D convolutions for image-processing applications.

Section VI describes how the new 2-D transform is combined with the 2-D FNT using the 2-D mixed radix conversion. The resulting transform uses fast, small residue transforms that can be implemented in parallel for high speed and high throughput rate. This method allows the combination of the Fermat and Mersenne number-based transforms with their moduli chosen to be conveniently close to one another, thus making more efficient use of the hardware and software than had previously been possible. The method is suitable for the calculation of 2-D convolutions and leads to increased dynamic range and to the convenient use of parallel operation. Also, all arithmetic operations are carried out modulo the Fermat and Mersenne numbers, which offer simple arithmetic.

VLSI is an ideal medium for the implementation of transforms based upon NTTs and especially FNTs, because the moduli used are of the form $2^b + 1$, where $b = 2^t$, and so the word lengths required in the system are $2^t + 1$ bits. The peculiar number of bits required presents no difficulty in a custom VLSI design but makes the use of standard 16- or 32-bit processors quite inefficient.
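The gain in dynamic range from combining a Fermat and a Mersenne modulus can be illustrated for a single value with a two-modulus mixed radix conversion. The sketch below is a generic illustration, not the scheme of Section VI. The Mersenne prime $2^{17} - 1$ is chosen here only because it lies close to $F_4$, and the function name `mrc2` is hypothetical. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
F4 = 2**16 + 1       # Fermat number F_4 = 65537
M17 = 2**17 - 1      # Mersenne prime 131071 (an assumed choice of modulus)

def mrc2(r1, r2, m1=F4, m2=M17):
    """Recover x in [0, m1*m2) from r1 = x mod m1 and r2 = x mod m2."""
    inv = pow(m1, -1, m2)            # inverse of m1 modulo m2
    d2 = ((r2 - r1) * inv) % m2      # second mixed-radix digit
    return r1 + m1 * d2              # x = d1 + m1*d2, with d1 = r1

# A value too large for either modulus alone is recovered exactly.
x = 1234567890
recovered = mrc2(x % F4, x % M17)
```

Each residue channel works with words of about 17 bits, while the combined range is the product of the two moduli, roughly $2^{33}$.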
A brief discussion is given in Section VII of VLSI implementations for 2-D FNTs for image processing. The pipeline structure that can alleviate the computational burden for implementing a row-column 2-D FNT is discussed. Another approach using the vector radix algorithm is described, and a CMOS VLSI design of a pipeline programmable vector radix 2-D FNT computational unit is presented. Only a programmable delay commutator chip is required to complete the 2-D convolution system. This is more efficient than the pipeline row-column structure. Thus, VLSI allows the 2-D FNT and other 2-D NTTs to be used in image-processing systems where high-accuracy and high-speed filtering is required. It is suggested that the 2-D NTTs offer a means of eliminating the noise arising from rounding and truncations in the use of other transforms and, using VLSI implementations, can provide the fast operations necessary for image processing. They are also used (for example, in Hsue [1995]) to avoid
round-off error in ill-conditioned problems arising in the solution of Toeplitz systems of equations.

REFERENCES

Agarwal, R. C., and Burrus, C. S. (1974a). Number theoretic transforms to implement fast digital convolution, Proc. IEEE 63(4), 550–560.
Agarwal, R. C., and Burrus, C. S. (1974b). Fast one-dimensional digital convolution by multidimensional techniques, IEEE Trans. Acoust., Speech and Signal Processing ASSP-22(1), 1–10, Feb.
Agarwal, R. C., and Burrus, C. S. (1974c). Fast convolution using Fermat number transforms with applications to digital filtering, IEEE Trans. Acoustics, Speech and Signal Processing ASSP-22(2), 87–97.
Agarwal, R. C., and Burrus, C. S. (1975). Number theoretic transforms to implement fast digital convolution, Proc. IEEE 63(4), 550–560.
Agarwal, R. C., and Cooley, J. W. (1974). New algorithms for digital convolution, IEEE Trans. Acoustics, Speech and Signal Processing ASSP-25(5), 392–410.
Ahmed, M. O., and Nundararajan, D. (1987). A fast implementation of two-dimensional convolution algorithm for image processing applications, IEEE Trans. on Circuits and Syst. CAS-34, 577–579, May.
Alfredsen, L. (1994). A fast Fermat number transform for long sequences, IEEE Seventh European Signal Processing Conference, pp. 1579–1581, September, Edinburgh, U.K.
Alfredsen, L. (1996). VLSI architectures and arithmetic operations with applications to the Fermat number transform, Linkoping Studies in Science and Technology (425), Linkoping, Sweden.
Arambepola, B. (1989). VLSI architectures for convolver design using number theoretic transforms, Electron. Lett. 25(23), 1604–1606.
Ashur, A. S., Ibrahim, M. K., and Aggoun, A. (1994). Area-time efficient diminished-1 multiplier for Fermat number transforms, Electron. Lett. 30(20), 1640–1641, Sept.
Benaissa, M., Bouridane, A., Dlay, S. S., and Holt, A. G. J. (1988). Diminished-1 multiplier for a fast convolver and correlator using the Fermat number transform, IEE Proc. G, Electron. Circuits and Syst. 135(5), 187–193.
Benaissa, M., Dlay, S. S., and Holt, A. G. J. (1991a). CMOS VLSI design of a high-speed Fermat number transform-based convolver/correlator using three-input adders, IEE Proc. G 138(2), 182–190.
Benaissa, M., Dlay, S. S., and Holt, A. G. J. (1991b). VLSI implementation issues for the 2-D Fermat number transform, Signal Processing 23, 257–272.
Benaissa, M., Pajayakrit, A., Dlay, S. S., and Holt, A. G. J. (1988). VLSI design for diminished-1 multiplication of integers modulo a Fermat number, IEE Proc. E, Computers and Digital Techniques 135(3), 161–164.
Bongard, M. (1970). Pattern Recognition, New York, Washington: Spartan Books.
Bouridane, A., Pajayakrit, A., Dlay, S. S., and Holt, A. G. J. (1989). CMOS VLSI circuit of pipeline sections for 32- and 64-point Fermat number transformers, Integration, the VLSI Journal 8, 51–64.
Boussakta, S. (1990). Algorithms and development of the number theoretic and related fast transforms with applications, Ph.D. diss., University of Newcastle upon Tyne, U.K.
Boussakta, S. (1999). A novel method for parallel image processing applications, Journal of Systems Architecture 45(11).
Boussakta, S., and Holt, A. G. J. (1988). Calculation of the discrete Hartley transform via the Fermat number transform using a VLSI chip, IEE Proc. G 135(3), 101–103.
Boussakta, S., and Holt, A. G. J. (1989). Relationship between the Fermat number transform and the Walsh-Hadamard transform, IEE Proc. 136(Pt. G, no. 4), 191–204, August.
Boussakta, S., and Holt, A. G. J. (1991). Calculating linear from circular convolutions/correlations, Proc. Inst. of Acoustics 13(Pt. 9), 76–85.
Boussakta, S., and Holt, A. G. J. (1992). New number theoretic transform, Electronics Letters 28(18), 1683–1684, August.
Boussakta, S., and Holt, A. G. J. (1993a). New two-dimensional transform, Electronics Letters 29(11), 949–950, May 17.
Boussakta, S., and Holt, A. G. J. (1993b). New fast symmetric long transform using Mersenne primes and its applications, Conference Proceedings of UDT, Undersea Defence Technology, Palais des Festivals et des Congres, Cannes, France, June 15–17, pp. 552–555.
Boussakta, S., and Holt, A. G. J. (1994a). Filters based on a new transform, UDT, Conference Proceedings, Wembley, London, July 5–7, pp. 314–319.
Boussakta, S., and Holt, A. G. J. (1994b). Filtering employing a new transform, IEEE Proc. Conference OCEANS94, Brest, France, September 13–16, Vol. 1, pp. 547–553.
Boussakta, S., and Holt, A. G. J. (1995a). A new transform using the Mersenne numbers, Proc. IEE, Vis. Image Signal Processing 142(6), 381–388, Dec.
Boussakta, S., and Holt, A. G. J. (1995b). A new separable transform, Proc. IEE Vision, Image and Signal Processing 142(1), 27–30, February.
Boussakta, S., and Holt, A. G. J. (1996). A novel combination of NTTs using the MRC, Signal Processing Journal 54, 94–98.
Boussakta, S., Shakaff, A. Y., Marir, F., and Holt, A. G. J. (1988). Number theoretic transforms of periodic structures and their applications, IEE Proceedings 135(Pt. G, no. 2), 83–96, April.
Brigham, E. O. (1974). The Fast Fourier Transform. Englewood Cliffs, N.J.: Prentice Hall.
Chevillat, P. R. (1978). Transform domain digital filtering with number theoretic transforms and limited wordlengths, IEEE Trans. Acoustics, Speech and Signal Processing ASSP-26(4), 284–290, Aug.
Duhamel, P., and Hollman, H. (1982). Number theoretic transforms with 2 as a root of unity, Electronics Lett. 18, 978–980, Oct. 28.
Duhamel, P., and Hollman, H. (1984). Split-radix FFT algorithm, Electronics Lett. (20), 14–16.
Eklundh, J. O. (1972). A fast computer method for matrix transposing, IEEE Trans. C-21, 801–803.
Gregory, R. T., and Krishnamurthy, E. V. (1984). Methods and Applications of Error-free Computation. New York: Springer-Verlag.
Gudvangen, S. (1997). A class of sliding Fermat number transforms that admit a tradeoff between complexity and input-output, IEEE Trans. on Signal Processing 45(12), 3094–3096, Dec.
Harris, D. B., McClellan, J. H., Chan, D. S., and Scheussler, H. W. (1977). Vector radix fast Fourier transform, IEEE Int. Conf. ASSP, rec., 548–551.
Helms, H. D. (1967). Fast Fourier transform method of computing difference equations and simulating filters, IEEE Trans. on Audio and Electroacoustics AU-15(2), 85–90, June.
Hollingum, J. (1984). Machine Vision: The Eyes of Automation. U.K.: IFS Publications.
Hunt, B. R. (1972). Minimizing the computation time for using the technique of sectioning for digital filtering of pictures, IEEE Trans. on Computers (Nov), 1219–1222.
Hsue, J. (1995). Fast algorithms for solving Toeplitz systems of equations using number theoretic transforms, Signal Processing 44(1), 89–101, June.
Jullien, G. A. (1980). Implementation of multiplication modulo a prime number with applications to number theoretic transforms, IEEE Trans. Comput. C-29(10), 899–905, Oct.
Jullien, G. A. (1991). Number theoretic techniques in digital signal processing, Advances in Electronics and Electron Physics 80, 69–161.
King, R., Ahmadi, M., Gorgui-Naguib, R., Kwabwe, A., and Azimi-Sadjdi, M. (1989). Digital Filtering in One and Two Dimensions, London: Plenum Press.
Krishnan, R., Jullien, G. A., and Miller, W. (1986a). Implementation of complex number theoretic transforms using quadratic residue number systems, IEEE Trans. on Circuits and Systems CAS-33(8), 759–765, Aug.
Krishnan, R., Jullien, G. A., and Miller, W. (1986b). The modified quadratic residue number system (MQRNS) for complex high-speed signal processing, IEEE Trans. on Circuits and Systems CAS-33(8), 325–327, Mar.
Lee, S. C., and Lu, H. (1988). Fast convolution using generalised Fermat/Mersenne number transforms, Proc. IEEE, ICASSP, CH2561-9/88/0000-1910.
Leibowitz, L. (1976). A simplified binary arithmetic for Fermat number transform, IEEE Trans. ASSP-24(5), 199–201, Oct.
Li, W., and Peterson, A. M. (1990). FIR filtering by the modified Fermat number transform, IEEE Trans. Acoust., Speech, Signal Processing 38(9), 1641–1645, Sept.
Liu, Huizhu, and Lee, C. (1991). A new approach to solve the sequence-length constraint problem in circular convolution using number theoretic transform, IEEE Trans. on Signal Processing 39(6), 1314–1321, June.
Marir, F. (1986). The application of number theoretic transforms to two-dimensional convolutions and adaptive filtering, Ph.D. diss., University of Newcastle upon Tyne, U.K.
Marir, F., Shakaff, A. Y., and Holt, A. G. J. (1985). Two-dimensional convolution using number theoretic transforms without matrix transposition and without overlap, IEE Proc. G, Electron. Circuits and Syst. 132(5), 211–215.
Marshal, S., and Soraghan, J. J. (1988). Shape analysis for object recognition using number theoretic transforms, IEEE Proc. International Conference on Acoustics, Speech and Signal Processing ICASSP.88, CH2561-9/88/0000-0964, pp. 964–967.
Martens, J. B. (1983). Number theoretic transforms for the calculation of convolutions, IEEE Trans. Acoustics, Speech and Signal Processing ASSP-31(4), 969–978, Aug.
McClellan, J. H., and Rader, C. M. (1970). Number Theory in Digital Signal Processing, Englewood Cliffs, N.J.: Prentice Hall.
McClellan, J. H., and Rader, C. M. (1976). Hardware realization of a Fermat number transform, IEEE Trans. Acoust., Speech and Signal Processing ASSP-24, 216–225.
Meyer, R. (1989). Error analysis and comparison of FFT implementation structures, IEEE Int. Conference, Acoustics, Speech and Signal Processing ICASSP.89, Glasgow, Scotland, 888–891.
Morandi, C., Piazza, F., and Dolcetti, A. (1988). Image registration using Fermat transforms, Electronics Letters 24(11), 678–680, May.
Myers, D. G. (1990). Digital Signal Processing: Efficient Convolution and Transform Techniques, New York: Prentice Hall.
Nagpal, H. K., Jullien, G. A., and Miller, W. C. (1983). Processor architectures for two-dimensional convolvers using a single multiplexed computational element with finite field arithmetic, IEEE Trans. on Computers C-32(11), 989–1001, Nov.
Nussbaumer, H. J. (1976). Complex convolution via Fermat number transforms, IBM J. Res. Develop. 20, 282–284.
Nussbaumer, H. J. (1977). Relative evaluation of various number theoretic transforms for digital filtering applications, IEEE Trans. on Acoustics, Speech and Signal Processing ASSP-26, 83–93.
Oppenheim, A. V., and Weinstein, C. J. (1972). Effects of finite register length in digital filtering and fast Fourier transform, Proc. IEEE 60(8), 957–976, Aug.
Parker, M. G., and Benaissa, M. (1995). Unusual length number theoretic transforms using recursive extensions of Rader's algorithm, IEE Proc. Vis. Image Signal Process 142(1), 31–34, Feb.
88
S. BOUSSAKTA AND A. G. J. HOLT
NUMBER THEORETIC TRANSFORMS AND THEIR APPLICATIONS
LIST OF ABBREVIATIONS AND SYMBOLS

CCP: Cyclic convolution property
[symbol illegible]: Pointwise multiplication
DFT: Discrete Fourier transform
FFT: Fast Fourier transform
$\alpha$: $N$th root of unity, kernel, basis function
NTT: Number theoretic transform
$F$: Modulus
$N$: Transform length
$N_{\max}$: Maximum transform length
FNT: Fermat number transform
INFNT: Inverse FNT
GF[F], ZF: Finite field or ring
GF[F]C, Zcm: Complex finite field or ring
gcd: Greatest common divisor
2-D: Two dimensional
MNT: Mersenne number transform
$F_t$: $t$th Fermat number
$M_p$: $p$th Mersenne number
DIF: Decimation in frequency
$y_L(n, m)$: Two-dimensional linear convolution
$y_c(n, m)$: Two-dimensional circular convolution
$T$: Time period
$N_s \times N_s$: Number of nonzero values in 2-D NTT domain
2-D FFT: Two-dimensional fast Fourier transform
2-D NTT: Two-dimensional number theoretic transform
2-D MRC: Two-dimensional mixed radix conversion
2-D CCP: Two-dimensional cyclic convolution property
2-D FNT: Two-dimensional Fermat number transform
$M$: Overall modulus, the dynamic range ($M = m_1 \times m_2 \times m_3 \times \cdots \times m_k$)
2-D MNT: Two-dimensional Mersenne number transform
2-D NMNT: Two-dimensional new Mersenne number transform
$\beta(\cdot,\cdot)$: Transform kernel for the 2-D NMNT
$\alpha_1^{nk}\alpha_2^{ml}$: Transform kernel for the 2-D FNT or the 2-D MNT
[symbol illegible]: An operation, defined in Equation (113)
[symbol illegible]: An operation, defined in Equation (157)

ACKNOWLEDGEMENTS
The authors are pleased to acknowledge the support of research grants from SERC and MOD as well as financial support from the Ministry of Higher Education of Algeria and the Royal Thai Navy. They also thank their colleagues in the Department of Electrical and Electronic Engineering at Newcastle University for helpful discussions and Professor D.J. Kinniment for the use of VLSI design software. Also, Dr. S. Boussakta thanks the University of Teesside for allowing him to write this chapter. The authors have pleasure in acknowledging the help in checking sections of the proof by Dr. M. Benaissa and Dr. F. Mavir.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 111
On the Electron-Optical Properties of the ZrO/W Schottky Electron Emitter

M. J. FRANSEN (1, 2), TH. L. VAN ROOY (1), P. C. TIEMEIJER (1), M. H. F. OVERWIJK (1), J. S. FABER (3), AND P. KRUIT (2)

(1) Philips Research Laboratories, Prof. Holstlaan 4, 5656 AA Eindhoven, The Netherlands. Philips Analytical, Lelyweg 1, 7602 EA Almelo, The Netherlands.
(2) Delft University of Technology, Department of Applied Physics, Delft, The Netherlands.
(3) Philips Electron Optics/FEI, Eindhoven, The Netherlands.
I. Introduction
   A. General Information
   B. The Schottky Emitter
II. Electron Emission Theory
   A. Qualitative Description of the Emission Process
   B. Calculation of Distribution Functions
   C. Emission at High Temperature and Low Field
      1. Thermionic Emission
      2. Schottky Emission
      3. Extended-Schottky Emission; The Parabolic-Barrier Approximation
   D. Emission at Low Temperature and High Field
      1. Thermal-Field Emission
   E. Bridging the Gap Between Thermionic and Field Emission
      1. Determination of the Parameter Space
      2. Current Density
      3. Total Energy Distribution
      4. Brightness
III. Boersch Effect for Electron Emitters
   A. Application of Jansen's Theory in the Emitter Region
   B. Knauer's Model for Energy Broadening
IV. Experimental Methods and Systems
   A. Measurement of the Energy Distribution
   B. Measurement of Brightness
   C. Indirect Determination of Energy Distribution and Brightness
   D. Measurement of Emitter Stability
V. Experiments on ZrO/W Schottky Emitters
   A. Energy Spread of the Schottky Emitter
      1. Energy Spread of a 0.9 µm Schottky Emitter
      2. Energy Spread of a 0.3 µm Schottky Emitter
      3. Intermezzo: Energy Spread of the Heated Tungsten Cathode
      4. Boersch Effect for Schottky Emitters
   B. Brightness of the Schottky Emitter
   C. Emission Stability
VI. Application to Other Emitters
   A. Electron-optical Parameters of a 0.5 µm Schottky Emitter
   B. HfO/W versus ZrO/W
VII. Conclusions
References

ADVANCES IN IMAGING AND ELECTRON PHYSICS, Volume 111. ISBN 0-12-014753-X. Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved. ISSN 1076-5670/99 $30.00
I. INTRODUCTION

A. General Information

One of the most successful high-brightness electron source types is the ZrO/W electron emitter, also known as the Schottky emitter. It consists of a tungsten emitter covered with a thin layer of zirconium oxide. This source is employed in many commercially available electron-beam instruments, such as electron microscopes [Otten, 1994] and electron-beam lithography machines [Koek et al., 1993]. For a recent review of the Schottky emitter, see Swanson and Schwind [1997]. Remarkably, the parameters characterizing the source for use in electron-beam instruments, namely the spread in energy of the emitted particles and the reduced brightness, had still not been measured accurately, in spite of the valuable information these quantities give to the designer of electron-optical instruments. In the present work we intend to answer some questions concerning the electron-optical properties of the Schottky emitter.

Are the present values of brightness and energy spread of the ZrO/W Schottky electron emitter in the electron-optical instruments in which these sources are employed in accordance with theoretical values? Currently, the accepted values for reduced brightness and energy spread of the Schottky emitter are $2 \cdot 10^7$ A/(m² sr V) [Otten, 1994] and 0.8 eV [Reimer, 1993], respectively. Are these values correct?

Is it possible to optimize the source? What are the optimal values of the tip diameter, field strength, and emitter temperature for a certain application? As an example, let us consider low-voltage scanning electron microscopy. The performance of such a microscope is limited by the aberrations of the objective lens [Joy and Joy, 1996], especially by the chromatic aberration. It seems advantageous to operate the emitter under such a condition that the energy spread is minimized. In this way, the contribution of the chromatic aberration is lower, and the optimum half opening angle of the beam can be somewhat larger. Such a choice, however, might have a negative influence on the brightness and thus on the current of the probe. With a good knowledge of how energy spread and brightness depend on the temperature of the emitter and the field strength on its surface, it is possible to optimize the source for such a system.

Is it possible to estimate the energy width at all field strengths with a simple model? For an application in which high brightness is of prime importance, for instance, in electron-beam lithography, it is advantageous to operate the source at the highest extraction field possible. Up to now, there has been no simple analytical theory available to answer this question.

The interpretation of experimental results is complicated by the influence of statistical electron-electron interactions on the energy spread and brightness. Is it possible to predict the order of magnitude of these effects for a given configuration? Swanson and Schwind [1997] indicate that longitudinal electron-electron interactions (Boersch effect) can cause a considerable energy broadening of several eV at high emission current. Can we separate the intrinsic energy spread resulting from the emission process from the additional broadening caused by the Boersch effect? Is there a simple model for an estimation of the energy broadening for a given emitter configuration?

What is the field on the emitter surface? It is a generally accepted practice to use Schottky plots of experimental current-voltage data for determining the field factor $\beta = F/V$ describing the linear relation between the extraction voltage, $V$, and the field on the emitter surface, $F$. The implicit assumption that is made with this technique is that the emitter is operating in the Schottky emission regime.
It is hard to judge the correctness of this assumption from a Schottky plot alone. Is there an alternative for this procedure? In general, can we characterize the Schottky electron source parameters in the electron-beam instrument in situ?

The work described in this chapter is a part of a Ph.D. study [Fransen, 1999] on the electron-optical properties of three emitter types. Aside from the ZrO/W emitter described here, this study deals with the ultrasharp ⟨111⟩-oriented tungsten field emitter and the use of individual carbon nanotubes as monochromatic field emitters.

B. The Schottky Emitter

The remarkably stable field emission current from a tungsten emitter covered with a thin layer of zirconium (Zr) was reported by several groups investigating the physical nature of adsorption on metal surfaces [Shrednik, 1961; Fursei and Shakirova, 1966; Swanson and Crouser, 1969]. The adsorption of Zr on W occurs preferentially on the (100)-face of tungsten. A flat facet is grown on top of an electrochemically etched ⟨100⟩-oriented tungsten wire by heating the tip while applying an electric field. The combination of this large facet and the low work function of the adsorbed layer results in a stable emission current at moderate field strength. Noise phenomena observed with cold field emitters are reduced when using large emitting areas. It was soon recognized that this unique feature of Zr on W overcomes the intrinsic instabilities of a metallic field emitter. However, the size of a facet is set by a dynamic equilibrium between field and surface tension, and it is known that it can take several hours to reach this equilibrium, which is a drawback. At the end of this chapter we discuss the remaining, long-term emission current instabilities inherent to this source type in more detail.

In early papers, in which the properties of the Zr layer on the emitter surface were studied, it was mentioned that "a poorly outgassed (Zr) source" [Collins and Blott, 1968; Swanson and Crouser, 1969] was necessary for the formation of the desired surface layer with low work function. Subsequent studies [Danielson and Swanson, 1979] revealed that the resulting layer was a compound of Zr, O, and W, with a work function of about 2.8 eV. For long-term operation of the source, it is necessary to keep the emitter at an elevated temperature of 1800 K, so that surface diffusion of fresh Zr over the emitter surface occurs. Because of this high temperature, the properties of the ZrO/W emitter are described using thermionic emission theory, in which the lowering of the potential barrier by the high electric field is taken into account. This lowering of the effective work function is known as the Schottky effect. We review the Schottky emission theory in Section II. The widespread use of this emission model to describe the properties of the ZrO/W electron emitter has led to the name Schottky emitter.
At a temperature of 1800 K, thermionic emission from other tungsten faces becomes important as well. For this reason a screening electrode, the suppressor, is usually mounted around the shank of the ZrO/W emitter. In Figure 1(a) a scanning electron microscope (SEM) image is shown of a ZrO/W emitter, manufactured by FEI and used in FEI/Philips electron microscopes. The emitter tip protrudes about 0.25 mm out of the suppressor. In Figure 1(b) the apex of the emitter is shown with a higher magnification, showing the flat facet. The faceted end-form of the Schottky emitter causes some effects unique to this emitter type; e.g., emission from the edges of the facet causes a bright ring around the uniform emission pattern for increased electric fields [Tuggle, 1984]. This ring is typically situated about 7° from the optical axis. In most applications only the central spot is used, by confining the beam to approximately 4° with an aperture in the extraction electrode.

FIGURE 1. Scanning electron micrographs of a ZrO/W (Schottky) emitter employed in electron microscopes. (a) The emitter tip with the suppressor. (b) The tip itself. The flat facet at the apex of the emitter can clearly be recognized.

Depending on the etching conditions and facet-formation procedure, emitters with a radius of curvature at the apex ranging from 0.1 µm to 2 µm can be fabricated. Emitters with a small radius of curvature require a lower extraction voltage for the same emission current density. However, in applications where a high angular current density from the source is required, the field applied to the sharp emitter has to be higher to obtain the same angular current density, because the emitting surface is smaller. The choice of the optimum emitter radius depends strongly on the type of application; e.g., Philips transmission electron microscopes (TEMs) are equipped with emitters with an apex radius of 0.9 µm, whereas for SEMs sharper tips on the order of 0.5 µm are used. For special applications, such as miniaturized electron columns, even sharper emitters are used [Kim et al., 1997].
II. ELECTRON EMISSION THEORY

The physical process of spontaneous electron emission from a metal into a vacuum depends on three parameters: the work function of the material under consideration, the temperature of the emitter, and the electric field strength on the emitter surface. Of these three, the last two can be chosen freely once the emitter material is chosen. Unfortunately, there is no general analytical expression describing the emission process for all possible work functions, temperatures, and fields. Approximations exist for two cases: high temperature with low field, and low temperature with high field. In the transition regime between these two, no analytical theory is available. We start with a qualitative description of the approximations and review the equations for the emitted current density, total energy distribution, and reduced brightness in each regime of validity in subsequent sections. The last section is devoted to the limits of validity of the different emission regimes, and we discuss our efforts to describe the energy distribution in the gap between the thermionic and field emission theories analytically. Expressions for the current density and brightness are then obtained with a simple numerical integration.

A. Qualitative Description of the Emission Process

In this section, we give a brief description of the physical model of spontaneous electron emission from a metallic emitter into vacuum. We assume that there is no band structure in the metal and use the free-electron approximation. Consider the interface between a conductor and vacuum at zero temperature $T$, drawn in the right-hand-side plot of Figure 2. In the metal, electrons occupy energy levels. At $T = 0$, the energy levels of the metal are filled up to the Fermi level, $E_F$, and empty above. This distribution function is drawn in the left-hand-side plot of Figure 2 with a dashed line. Electrons are kept inside the metal because a potential difference exists between the metal and the vacuum: the work function $W$. This value depends on the material under consideration and its crystallographic orientation. It ranges from about 2.5 eV to 6 eV for pure metals [Lide, 1991–2]. Most metals have a work function around 4.5 eV, including ⟨111⟩-oriented tungsten. The work function of a metal can be lowered with a suitable surface layer, such as with ZrO, yielding 2.8 eV, or with HfO, which results in a work function of 3.0 eV.

FIGURE 2. Simple model of the interface between metal and vacuum. The filling of energy levels in the metal is determined by the Fermi-Dirac distribution function, drawn at the left-hand side of the interface. At 0 K, the electron energy levels are filled up to the Fermi level, $E_F$, and empty above. The difference, $W$, between the vacuum potential and the Fermi level is referred to as the work function. (Plot parameters: $W$ = 2.8 eV, $T$ = 1800 K, fields of 0.6 V/nm and 1.2 V/nm. Vertical axis: energy relative to $E_F$ (eV). Left panel: $f(E,T)$. Right panel: distance from the metal-vacuum interface (nm), with the barrier lowering $\Delta W$ indicated.)

For nonzero temperatures, the step function drawn in Figure 2 smooths out. The distribution of electrons with energy $E$ and temperature $T$ over the energy levels in the metal is given by the well-known Fermi-Dirac function:

$$f(E,T) = \frac{1}{1 + \exp[(E - E_F + V_0)/k_b T]} \tag{1}$$

with $k_b$ Boltzmann's constant. It is drawn in Figure 2 on the left-hand side with a solid line for a temperature of 1800 K. The temperature dependence of the Fermi level is neglected. The parameter $V_0$ determines the position of the zero level of the energy scale. We will choose the zero level at the Fermi level (as in Figure 2), at the top of the barrier, or at absolute zero energy, whichever makes the integration easiest. At sufficiently high temperatures, electron states above the work function are occupied, and emission of electrons into vacuum becomes possible. This process is known as thermionic emission.

The emission current from a given material can be greatly increased by the application of an electric field to the emitter surface, which causes an effective decrease of the work function. The field is generated by applying a suitable voltage difference between the emitting surface and a nearby electrode, commonly denoted as the extractor. In order to limit the size of the emitting area and to further increase the local field at the emitter surface, the emitter is generally shaped into a sharp point.

The strong field has two effects on the emission process. The first effect can be seen directly in the plot on the right-hand side of Figure 2, where the function $V(z)$, i.e., the potential as a function of the distance $z$ from the interface, is drawn for two values of the electric field (0.6 V/nm and 1.2 V/nm). $V(z)$ is determined by the work function $W$, the effect of the electric field $F$, and the image charge potential. A work function of 2.8 eV is taken. The potential barrier, and thus the effective work function, is lowered by an amount $\Delta W$. This is known as the Schottky effect, and the enhanced emission is called Schottky emission. As a second effect, the width of the barrier becomes smaller for increasing fields. If the barrier width is sufficiently small, quantum mechanical tunneling of electrons through the potential barrier occurs.
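As an illustration of Equation (1), the following minimal sketch evaluates the Fermi-Dirac occupation for the parameters quoted in the text ($W$ = 2.8 eV, $T$ = 1800 K). The function name `fermi_dirac`, its argument names, and the choice of electronvolts as the energy unit are assumptions made for this example only; they do not come from the chapter.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def fermi_dirac(energy_ev, temperature_k, e_fermi_ev=0.0, v0_ev=0.0):
    """Occupation f(E, T) of Equation (1); energies in eV, T > 0 in kelvin.

    With the defaults (E_F = 0, V_0 = 0) the energy is measured from the
    Fermi level, which is the zero used in Figure 2.
    """
    x = (energy_ev - e_fermi_ev + v0_ev) / (K_B_EV * temperature_k)
    # Use an algebraically equivalent form that cannot overflow for large |x|.
    if x >= 0.0:
        ex = math.exp(-x)
        return ex / (1.0 + ex)
    return 1.0 / (1.0 + math.exp(x))


if __name__ == "__main__":
    # Occupation at the vacuum level of a ZrO/W surface (2.8 eV above E_F).
    print(fermi_dirac(2.8, 1800.0))
    # Boltzmann tail exp(-W / k_b T), the approximation used for thermionic emission.
    print(math.exp(-2.8 / (K_B_EV * 1800.0)))
```

The two printed numbers agree closely, which is why the unit term in the denominator of Equation (1) can be dropped when only energies near the top of the barrier matter.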
For high temperatures, when there is a substantial occupation of energy levels at the top of the barrier, electron tunneling will mainly occur there. The total emitted current is a combination of thermionic electrons originating from above the top of the barrier and electrons tunneling through the barrier. This emission regime is referred to as extended-Schottky emission. For low temperatures, tunneling of electrons occurs only around the Fermi level, because higher energy levels are unoccupied. This process is known as field emission. The effect of higher temperatures on the tunneling current is included in the thermal-field emission model.

Analytical expressions for the emission current have been published either for a high emitter temperature together with a weak electric field or for a high field at low temperature. We discuss these expressions in the next two sections. There is no overlap between the two approaches: in the intermediate region, no simple analytical theory existed until now. We show that it is nevertheless possible to describe the energy distribution of the electrons in this transition region with a simple analytical expression, so that only one integration step is required to obtain the emission current density and the brightness.

B. Calculation of Distribution Functions

Here and in the following two sections we review the emission theory more quantitatively using some review books and papers, such as Principles of Electron Optics by Hawkes and Kasper [1989a, b], a cold field emitter review by Swanson and Bell [1973], and a paper by Shimoyama and Maruse [1984] considering theoretical expressions for the maximum axial brightness. In this review, all quantities are in SI units. As a starting point for the discussion, we take Equation (44.8) from Hawkes and Kasper [1989a], defining the three-dimensional current density element $d^3\mathbf{j}$ as

$$d^3\mathbf{j} = \frac{2e}{m h^3}\,\mathbf{p}\, f(E,T)\, D(U_n)\, d^3p \tag{2}$$

In this relation, $\mathbf{p}$ is the three-dimensional momentum vector, $f(E,T)$ describes the distribution of electrons in the metal, and $D(U_n)$ is the transmission coefficient, the wave-mechanical coefficient describing the fraction of electrons incident on the surface that can pass the potential barrier. We decompose the momentum vector $\mathbf{p}$ into a component normal to the surface, $p_n$, and a transverse component, $p_t$. The energy associated with these components is denoted as $U_n$ and $U_t$, respectively.
In this chapter we use the terms normal energy and tangential energy as shorthand for the energy associated with the normal (or tangential) component of the momentum vector. Note that the transmission coefficient is a function only of the normal energy, $U_n$. In most practical situations, the curvature of the emitter surface can be neglected when considering effects close to the surface; consequently, the metal-vacuum interface is taken as a plane. We define a Cartesian coordinate system with the $z$-axis normal to the emitter surface. We consider the current in the $z$-direction only; consequently the three-dimensional current density element simplifies to $d^3 j_z$. In the next sections we omit the $z$-index and simply write $d^3 j$. The potential $V$ is now a function of $z$ only, and the normal and transverse components of the energy, $U_n$ and $U_t$, are conserved. The following conservation laws apply:

$$\frac{1}{2m}\left(p_x^2 + p_y^2\right) = U_t = \text{constant}$$
$$\frac{1}{2m}\,p_z^2 + V(z) = U_n = \text{constant}$$
$$U_t + U_n = E = \text{constant} \tag{3}$$

where $E$ is the total energy. We now introduce polar or cylindrical coordinates to obtain distributions of the normal energy, $dj/dU_n$, the tangential energy, $dj/dU_t$, or the total energy, $dj/dE$. Integration of these distribution functions yields the emission current density, $j$. The distribution $dj/dE$ describes the total energy spread of the emitted beam. We express the momenta in polar coordinates:

$$p_x = \sqrt{2m[E - V(z)]}\,\sin\theta\cos\varphi$$
$$p_y = \sqrt{2m[E - V(z)]}\,\sin\theta\sin\varphi$$
$$p_z = \sqrt{2m[E - V(z)]}\,\cos\theta \tag{4}$$

With these equations, we calculate the volume element in momentum space at each plane $z$ as

$$d^3p = p^2\,dp\,d\Omega = m\sqrt{2m[E - V(z)]}\,\sin\theta\,dE\,d\theta\,d\varphi \tag{5}$$
After substitution of Equation (5) into Equation (2), we can calculate the total energy distribution by integration over all emission angles ($0 < \varphi < 2\pi$, $0 < \theta < \pi/2$), once a suitable expression for the transmission coefficient is found. The tangential energy distribution $dj/dU_t$ is used to find an expression for the maximum brightness. Following Shimoyama and Maruse [1984], the reduced axial brightness, $B_r$, of the source is defined in SI units as

$$B_r = \frac{e}{\pi}\left.\frac{dj}{dU_t}\right|_{U_t = 0} \tag{6}$$

We express the momenta in cylindrical coordinates:

$$p_x = \sqrt{2mU_t}\,\cos\varphi$$
$$p_y = \sqrt{2mU_t}\,\sin\varphi$$
$$p_z = \sqrt{2m[U_n - V(z)]} \tag{7}$$

The volume element in momentum space at each plane $z$ is given as

$$d^3p = m^2\,p_z^{-1}\,dU_n\,dU_t\,d\varphi \tag{8}$$
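As a brief check, both volume elements can be recovered from the conservation laws of Equation (3) alone. The following steps are a sketch of that reasoning; $p_t$ denotes the magnitude of the transverse momentum, as introduced above.

```latex
% Polar coordinates: at fixed z, E - V(z) = p^2 / 2m, hence p\,dp = m\,dE.
d^3p = p^2\,dp\,\sin\theta\,d\theta\,d\varphi
     = p\,(m\,dE)\,\sin\theta\,d\theta\,d\varphi
     = m\sqrt{2m[E - V(z)]}\,\sin\theta\,dE\,d\theta\,d\varphi

% Cylindrical coordinates: U_t = p_t^2 / 2m gives p_t\,dp_t = m\,dU_t, and
% U_n - V(z) = p_z^2 / 2m gives dp_z = m\,dU_n / p_z.
d^3p = p_t\,dp_t\,d\varphi\,dp_z
     = (m\,dU_t)\,d\varphi\,\frac{m\,dU_n}{p_z}
     = m^2\,p_z^{-1}\,dU_n\,dU_t\,d\varphi
```

The factor $p_z^{-1}$ in the cylindrical form cancels against the $p_z$ in the current density element, which is what makes the tangential energy distribution easy to integrate.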
After substitution of Equation (8) into Equation (2), the tangential energy distribution can be calculated by integration over $\varphi$ and $U_n$. In order to do so, we again need an expression for the transmission coefficient, $D(U_n)$. The normal energy distribution, $dj/dU_n$, can play a role in the situation when the beam coming from a flat emitter is limited with a narrow aperture in the extractor. For pointed cathodes, however, where the electrons gain most of their kinetic energy in the vicinity of the emitter, such an aperture has to have an extremely small diameter in order to observe this effect, and it is not necessary to consider it in the remainder of this chapter.

A general expression for the transmission coefficient requires the solution of the Schrödinger equation to obtain the tunneling probability of electrons through the barrier as a function of energy. For the barrier shapes encountered in field emission, this is possible only with numerical techniques [El-Kareh, Wolfe and Wolfe, 1977]. Several analytical approximations are known in the literature for certain regimes of field and temperature. We discussed these regimes qualitatively in the first part of this section. In the next sections the analytical approximations are described in more detail.

C. Emission at High Temperature and Low Field

1. Thermionic Emission

For pure thermionic electron emission, the field $F = 0$. Only electrons occupying energy levels above the work function $W$ can leave the metal. The origin of the energy scale is chosen at the top of the potential barrier. $V(z)$ becomes

$$V(z) = \begin{cases} -(E_F + W) & \text{for } z < 0 \\ 0 & \text{for } z \geq 0 \end{cases} \tag{9}$$

The transmission factor $D(U_n)$ is simply

$$D(U_n) = \begin{cases} 0 & \text{for } U_n < 0 \\ 1 & \text{for } U_n \geq 0 \end{cases} \tag{10}$$

So, the emission current density, $d^3 j$, is found to be

$$d^3 j = \frac{e p \cos\theta}{m}\,\frac{2}{h^3}\, f(E,T)\, D(U_n)\, d^3p \simeq \frac{4me}{h^3}\, E \exp\left(-\frac{E + W}{k_b T}\right) \sin\theta\cos\theta\, dE\, d\theta\, d\varphi \tag{11}$$
in which $f(E,T)$, the Fermi-Dirac distribution describing the energy distribution of electrons in the metal, is simplified by omitting the unit term in the denominator of Equation (1). This approximation is allowed, because the only energies of interest are located around the top of the potential barrier, far away from the Fermi level. The total energy distribution for thermionic emission, $dj_T/dE$, is then easily calculated. The result is

$$dj_T = \frac{4\pi me}{h^3}\, E \exp\left(-\frac{E + W}{k_b T}\right) dE \tag{12}$$

Integration over all energies ($0 < E < \infty$) yields the emitted current density:

$$j_T = \frac{4\pi me}{h^3}\,(k_b T)^2 \exp\left(-\frac{W}{k_b T}\right) \tag{13}$$
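To make the step from Equation (12) to Equation (13) concrete, the sketch below evaluates the closed form and compares it with a direct midpoint-rule integration of the energy distribution. The function names, the integration cutoff at 60 $k_b T$, and the number of steps are assumptions chosen for this illustration; the physical constants are standard SI values.

```python
import math

E_CHARGE = 1.602176634e-19     # elementary charge, C
M_ELECTRON = 9.1093837015e-31  # electron rest mass, kg
H_PLANCK = 6.62607015e-34      # Planck constant, J s
K_B = 1.380649e-23             # Boltzmann constant, J/K

# Common prefactor 4 pi m e / h^3 of Equations (12) and (13).
PREFACTOR = 4.0 * math.pi * M_ELECTRON * E_CHARGE / H_PLANCK**3


def thermionic_current_density(work_function_ev, temperature_k):
    """Closed form of Equation (13); returns j_T in A/m^2."""
    kt = K_B * temperature_k
    w = work_function_ev * E_CHARGE
    return PREFACTOR * kt**2 * math.exp(-w / kt)


def thermionic_current_density_numeric(work_function_ev, temperature_k,
                                       steps=20000, upper_limit_kt=60.0):
    """Midpoint-rule integral of Equation (12) over 0 < E < upper_limit_kt * k_b T."""
    kt = K_B * temperature_k
    w = work_function_ev * E_CHARGE
    d_e = upper_limit_kt * kt / steps
    total = 0.0
    for i in range(steps):
        energy = (i + 0.5) * d_e  # midpoint of the i-th energy bin, in J
        total += energy * math.exp(-(energy + w) / kt)
    return PREFACTOR * total * d_e


if __name__ == "__main__":
    print(thermionic_current_density(2.8, 1800.0))
    print(thermionic_current_density_numeric(2.8, 1800.0))
```

For $W$ = 2.8 eV and $T$ = 1800 K both routes give a zero-field current density of a few times $10^4$ A/m², which is only the thermionic baseline and not the operating current density of a Schottky emitter under field.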
The thermionic emission equation in this form was derived for the first time by Dushman [1923], and corrected the empirical equation $A\sqrt{T}\exp(-c/T)$ found by Richardson [1902, 1912]. A plot of $\ln(j_T/T^2)$ versus $1/T$ is generally referred to as a Richardson-Dushman plot. From the slope of this plot a value for the work function can be obtained. For the calculation of the maximum axial brightness of the thermionic emitter, the tangential energy distribution, $dj_T/dU_t$, is required. We write $d^3j$ as

$$d^3 j = \frac{e p_z}{m}\,\frac{2}{h^3} \exp\!\left(-\frac{U_n + U_t + W}{k_bT}\right) m^2 p_z^{-1}\, dU_n\, dU_t\, d\varphi \tag{14}$$

and integrate over $U_n$ and $\varphi$:

$$dj_T = \frac{4\pi me}{h^3}\, k_bT \exp\!\left(-\frac{U_t + W}{k_bT}\right) dU_t \tag{15}$$
After substitution of this result into Equation (6) the thermionic maximum axial brightness becomes

$$B_{r,T} = \frac{4me^2}{h^3}\, k_bT \exp\!\left(-\frac{W}{k_bT}\right) = \frac{e j_T}{\pi k_bT} \tag{16}$$
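As a rough numerical illustration of Equations (13) and (16), the sketch below evaluates the zero-field current density and the corresponding brightness. The function names and the example values (a work function of 2.8 eV at 1800 K) are chosen here for illustration only, and SI constants are assumed throughout.

```python
import math

# Physical constants in SI units
E_CHARGE = 1.602176634e-19   # elementary charge (C)
M_E = 9.1093837015e-31       # electron mass (kg)
H = 6.62607015e-34           # Planck constant (J s)
K_B = 1.380649e-23           # Boltzmann constant (J/K)


def thermionic_current_density(work_function_ev, temperature_k):
    """Equation (13): zero-field thermionic current density in A/m^2."""
    kt = K_B * temperature_k
    w = work_function_ev * E_CHARGE
    return 4 * math.pi * M_E * E_CHARGE / H**3 * kt**2 * math.exp(-w / kt)


def thermionic_brightness(work_function_ev, temperature_k):
    """Equation (16): e * j_T / (pi * k_b * T), in A/(m^2 sr V)."""
    j_t = thermionic_current_density(work_function_ev, temperature_k)
    return E_CHARGE * j_t / (math.pi * K_B * temperature_k)


if __name__ == "__main__":
    # Illustrative values only: W = 2.8 eV, T = 1800 K
    print("j_T  =", thermionic_current_density(2.8, 1800.0), "A/m^2")
    print("B_rT =", thermionic_brightness(2.8, 1800.0), "A/(m^2 sr V)")
```

Setting the work function to zero and the temperature to 1 K returns the prefactor of $T^2$ in Equation (13), which is a convenient check on the constants.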
Note that in a practical application of such a thermionic emitter, a weak field is applied in order to prevent the buildup of a space-charge cloud that causes a saturation of the current drawn from the source.
2. Schottky Emission

When the field is increased, its influence on the effective work function can no longer be neglected. In order to calculate the potential distribution $V(z)$, it is necessary to include the potential of image charges in the model. This yields a potential function made up of three components: the work function, $W$, the effect of the electric field, $F$, and the image charge potential:

$$V(z) = \begin{cases} 0 & \text{for } z < 0 \\ W - eFz - \dfrac{e^2}{16\pi\varepsilon_0 z} & \text{for } z > 0 \end{cases} \tag{17}$$

with $\varepsilon_0$ the permittivity of free space. The zero of potential energy is taken at the Fermi level, $E_F$. In Figure 2, this function is plotted for two fields: 0.6 V/nm and 1.2 V/nm. It can be seen that the effective barrier height is lowered as the field increases. The top of the potential barrier is located at

$$z_m = \sqrt{\frac{e}{16\pi\varepsilon_0 F}} \tag{18}$$

and

$$V_m = V(z_m) = W - \Delta W = W - \sqrt{\frac{e^3 F}{4\pi\varepsilon_0}} \tag{19}$$

The origin of the energy scale for the calculation of the total energy distribution is again chosen at the top of the barrier. The transmission coefficient is given by

$$D(U_n) = \begin{cases} 0 & \text{for } E < E_F + V_m \\ 1 & \text{for } E > E_F + V_m \end{cases} \tag{20}$$
The calculation of the total energy distribution, the emission current density, the tangential energy distribution, and the brightness in the Schottky regime is completely analogous to that of the thermionic emission case considered earlier. The total energy distribution is found to be

$$dj_S = \frac{4\pi me}{h^3}\, E \exp\!\left(-\frac{E + V_m}{k_bT}\right) dE \tag{21}$$

and the emission current density is given by

$$j_S = \frac{4\pi me}{h^3}\,(k_bT)^2 \exp\!\left(-\frac{W - \Delta W}{k_bT}\right) = \exp\!\left(\frac{\Delta W}{k_bT}\right) j_T \tag{22}$$
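To give a feeling for the size of the Schottky effect, the following sketch evaluates the barrier position, the barrier lowering $\Delta W$, and the enhancement factor $\exp(\Delta W/k_bT)$ of Equation (22) for the two fields quoted above. The helper names are hypothetical, and the temperature of 1800 K is an illustrative choice rather than a value tied to Figure 2.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge (C)
K_B = 1.380649e-23           # Boltzmann constant (J/K)
EPS0 = 8.8541878128e-12      # permittivity of free space (F/m)


def barrier_position(field):
    """Equation (18): position z_m of the barrier top in m (field in V/m)."""
    return math.sqrt(E_CHARGE / (16 * math.pi * EPS0 * field))


def barrier_lowering_ev(field):
    """Equation (19): Schottky lowering Delta W, expressed in eV."""
    return math.sqrt(E_CHARGE * field / (4 * math.pi * EPS0))


def schottky_enhancement(field, temperature_k):
    """Equation (22): ratio j_S / j_T = exp(Delta W / k_b T)."""
    delta_w = barrier_lowering_ev(field) * E_CHARGE
    return math.exp(delta_w / (K_B * temperature_k))


if __name__ == "__main__":
    for field in (0.6e9, 1.2e9):          # 0.6 V/nm and 1.2 V/nm
        print(field, barrier_position(field),
              barrier_lowering_ev(field),
              schottky_enhancement(field, 1800.0))
```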
The enhancement of the emission current with respect to the zero-field thermionic case resulting from the lowering of the potential barrier was first described by Schottky [1914, 1923] and is known as the Schottky effect. As long as electron tunneling can be neglected, plotting $\ln j$ as a function of $\sqrt{F}$ results in a straight line: the Schottky plot. We show an example of such a plot in Section IV. The tangential energy distribution is given by

$$dj_S = \frac{4\pi me}{h^3}\, k_bT \exp\!\left(-\frac{U_t + W - \Delta W}{k_bT}\right) dU_t \tag{23}$$

and the reduced axial brightness, $B_{r,S}$, by

$$B_{r,S} = \frac{4me^2}{h^3}\, k_bT \exp\!\left(-\frac{W - \Delta W}{k_bT}\right) = \frac{e j_S}{\pi k_bT} \tag{24}$$
3. Extended-Schottky Emission; the Parabolic-Barrier Approximation

When the field at the emitter surface is further increased, electron tunneling can no longer be neglected. Usually, the transmission coefficient is evaluated according to the Wentzel-Kramers-Brillouin (WKB) approximation as

$$D(U_n) = \frac{1}{1 + \exp(G)} \tag{25}$$

with $G$ the Gamov exponent:

$$G(U_n) = \frac{2}{\hbar}\int_{z_1}^{z_2} \sqrt{2m\,[V(z) - U_n]}\; dz \tag{26}$$

The limits $z_1$ and $z_2$ of the integral correspond to the intersection of $U_n$ with $V(z)$, as indicated in Figure 2. The Gamov exponent can be integrated analytically. The resulting expression, however, contains elliptic integrals, which is not convenient for subsequent integrations. We discuss this further in the next section. In the present situation, it is sufficient to use an approximation that is valid only near the top of the potential barrier. The barrier function, Equation (17), is Taylor-approximated with a second-order polynomial, as first proposed by Christov [1966]:

$$V_a(z) \approx V_m - \frac{e^2}{16\pi\varepsilon_0 z_m^3}\,(z - z_m)^2 \tag{27}$$

This parabolic function is drawn in Figure 2 with a dashed line. Using this approximation, Equation (26) can be written in closed form, yielding:

$$G(U_n) = \frac{V_m - U_n}{\kappa} \tag{28}$$

with

$$\kappa = \frac{\hbar}{\pi\sqrt{m}}\,(4\pi\varepsilon_0 e F^3)^{1/4} \equiv c_1 F^{3/4} \tag{29}$$
Using this Gamov exponent, the transmission coefficient can now also be used above the potential barrier, unlike Equation (26). We discuss the error that is introduced with this approach in Section II.E. With the choice of the origin of the energy scale at the top of the potential barrier, the current density, $d^3j$, can be integrated over all emission angles, yielding the total energy distribution in the parabolic-barrier approximation, $dj_{PBA}$:

$$dj_{PBA} = \frac{4\pi me}{h^3}\,\frac{\kappa \ln\left[1 + \exp(E/\kappa)\right]}{1 + \exp\left[(E + W - \Delta W)/k_bT\right]}\; dE \tag{30}$$
The total energy distribution in the extended-Schottky model, $dj_{ES}$, is found after neglecting the unit term in the denominator of the Fermi-Dirac distribution function. This simplification is necessary in order to find an expression for the emission current density in closed form. Integration of $dj_{ES}$ from $E = -\infty$ to $\infty$, using a tabulated integral [Gröbner and Hofreiter, 1961a], yields:

$$j_{ES} = \frac{4\pi me}{h^3}\,(k_bT)^2 \exp\!\left(-\frac{W - \Delta W}{k_bT}\right) \frac{\pi q}{\sin \pi q} = j_S\,\frac{\pi q}{\sin \pi q} \tag{31}$$
with $q = \kappa/k_bT = c_1 F^{3/4}/k_bT$. For low fields $q$ approaches zero and the current density calculated with the extended-Schottky model reduces to the Schottky model. As the field increases and/or the temperature decreases, $q$ rises and the current density increases due to the contribution of the tunnel effect. When $q$ approaches unity Equation (31) diverges, due to the omission of the unit term in the denominator of the Fermi-Dirac distribution function, $f(E,T)$. A consequence of the significant number of electrons that tunnel through the potential barrier is that a Schottky plot loses its value. A Schottky plot of experimental current-voltage data will yield a line that is approximately straight, with a small upward curvature. The calculation of the transverse energy distribution, $dj/dU_t$, and thus the maximum axial brightness in the extended-Schottky regime proceeds along the same lines as outlined previously. When, as before, the unit term in the denominator of the Fermi-Dirac distribution function is neglected, an analytical expression for the reduced axial brightness in the extended-Schottky model,

$$B_{r,ES} = \frac{e j_{ES}}{\pi k_bT} \tag{32}$$

can be derived. Use has been made of a tabulated integral [Gröbner and Hofreiter, 1961b]. Again, this equation is valid only for small values of $q$. In practice, an upper limit for $q$ of 0.7 is used. For a combination of temperature, work function, and field that exceeds this value, no analytical theory exists. We pay more attention to this regime in the last part of this section. In the next section, analytical expressions for the energy distribution, current density, and brightness are obtained for low temperatures and high fields.

D. Emission at Low Temperature and High Field

1. Thermal-Field Emission

At room temperature or below, energy levels close to the top of the potential barrier are empty. Electrons can escape from the metal only by tunneling. The tunneling probability of electrons is largest for the highest occupied energy levels, close to the Fermi level, $E_F$. We will now integrate the Gamov exponent, Equation (26), directly, yielding

$$G(U_n) = \frac{4\sqrt{2m\,(W - U_n)^3}}{3e\hbar F}\, v(y) \tag{33}$$

with $y = \Delta W/(W - U_n)$. The function $v(y)$ results from the inclusion of the image charge effect in the expression for the potential barrier, Equation (17). It contains elliptic integrals of the first and second kind, $K(k)$ and $E(k)$:

$$v(y) = \sqrt{1 + y}\,\left[E(k) - yK(k)\right] \tag{34}$$
with $k = \sqrt{(1 - y)/(1 + y)}$. In order to find an expression for the Gamov exponent that is simple enough to enable subsequent calculations, Equation (33) is Taylor-expanded around the Fermi level, $E_F$, up to the first derivative. A new variable, $\epsilon = U_n - E_F$, is used for ease of calculation. This yields the following expression:

$$G(U_n) = c - \frac{\epsilon}{d} \tag{35}$$

with

$$d = \frac{e\hbar F}{2\,t(y_0)\sqrt{2mW}} \tag{36}$$

$$c = \frac{4\sqrt{2mW^3}}{3e\hbar F}\, v(y_0) = \frac{bW}{d} \tag{37}$$

$y_0 = \Delta W/W$, and the function

$$t(y) = v(y) - \tfrac{2}{3}\, y\, v'(y) \tag{38}$$

Finally, the parameter $b$ is given by

$$b = \frac{2v(y_0)}{3t(y_0)} \tag{39}$$
Both $v(y)$ and $t(y)$ are slowly varying functions. In a practical situation, $b \approx 0.6$, $v(y) \approx 1 - y^{1.69}$, and the function $t$ obeys the relation $t(y) \approx 1 + 0.1107\,y^{1.33}$ [Hawkes and Kasper, 1989a]. For tungsten, with a work function of 4.5 eV, typical field strengths are on the order of 2–7 V/nm, which causes $v(y)$ to vary between 0.8 and 0.45 and $t(y)$ to vary between 1.03 and 1.07. Substitution of Equation (35) into Equation (25) yields a transmission function $D(U_n)$ given by

$$D(U_n) = \frac{1}{1 + \exp(c - \epsilon/d)} \tag{40}$$
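The quoted ranges of $v(y)$ and $t(y)$ for tungsten can be checked with a few lines of code. The sketch below uses the power-law approximations cited from Hawkes and Kasper together with $y_0 = \Delta W/W$; the function names are illustrative only, and the elliptic-integral form of Equation (34) is not evaluated.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge (C)
EPS0 = 8.8541878128e-12      # permittivity of free space (F/m)


def y0(field, work_function_ev):
    """y_0 = Delta W / W, with Delta W from Equation (19) (field in V/m)."""
    delta_w_ev = math.sqrt(E_CHARGE * field / (4 * math.pi * EPS0))
    return delta_w_ev / work_function_ev


def v_approx(y):
    """Approximation v(y) ~ 1 - y**1.69."""
    return 1.0 - y**1.69


def t_approx(y):
    """Approximation t(y) ~ 1 + 0.1107 * y**1.33."""
    return 1.0 + 0.1107 * y**1.33


def b_parameter(y):
    """Equation (39): b = 2 v(y0) / (3 t(y0))."""
    return 2.0 * v_approx(y) / (3.0 * t_approx(y))


if __name__ == "__main__":
    for field in (2e9, 7e9):              # 2 V/nm and 7 V/nm, W = 4.5 eV
        y = y0(field, 4.5)
        print(field, y, v_approx(y), t_approx(y), b_parameter(y))
```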
Because the emitted electrons originate mainly from the vicinity of the Fermi level, the transmission coefficient is simplified to $\exp(-c + \epsilon/d)$. This result enables further analytical evaluation of the distribution functions and the total emission current density. The current density element, $d^3j$, is written as

$$d^3 j = \frac{4me}{h^3}\,\frac{E - V(z)}{1 + \exp(E/k_bT)} \times \exp\!\left[\frac{E\cos^2\theta + V(z)\sin^2\theta - bW}{d}\right] \times \sin\theta\cos\theta\; dE\, d\theta\, d\varphi \tag{41}$$
We follow Kasper’s approach [Kasper, 1982] and evaluate the integral at a fixed $z$-value, at $V = E_F$. By assuming that $\cos^2\theta \approx 1$, i.e., small values for the angle $\theta$, the integral can be further simplified. $E$ is chosen to be zero at the Fermi level. After performing the integrations over $\theta$ and $\varphi$, the following expression for the total energy distribution is obtained:

$$dj_{TF}(E) = \frac{4\pi me}{h^3}\, d\, \exp\!\left(-\frac{bW}{d}\right) \frac{\exp(E/d)}{1 + \exp(E/k_bT)}\; dE \tag{42}$$

This expression exhibits a strong similarity with the total energy distribution in the extended-Schottky model, Equation (30) with the unit term in the denominator of the distribution function omitted. In this case, however, the transmission coefficient itself is simplified instead of $f(E,T)$. This is necessary in order to perform the integration over $E$ analytically. Integrating over $E$ finally yields

$$j_{TF} = \frac{4\pi me}{h^3}\, d^2 \exp\!\left(-\frac{bW}{d}\right) \frac{\pi p}{\sin \pi p} \tag{43}$$

Again, use was made of a tabulated integral [Gröbner and Hofreiter, 1961c]. The parameter $p = k_bT/d$ is a measure for the additional number of electrons that is emitted from above the Fermi level. For emitters that are operated at room temperature or below, the factor $\pi p/\sin \pi p$ is usually omitted, because it is close to unity. For practical values of the field strength, the resulting error is less than 6% at room temperature. The zero-temperature equation was first derived by Fowler and Nordheim [1928], who also showed that this equation corresponded well to the experimental results reported by Millikan and Lauritsen [1928]. A plot of $\ln(j/F^2)$ as a function of $1/F$ yields a straight line if $T = 0$, the so-called Fowler-Nordheim plot. It can be used to extract the emitting area, $A = I/j$, and the field factor, $\beta = F/V$, from experimental $I$, $V$ data. The effect of a nonzero temperature on the emission equation was investigated by Young [1959]. Again, this equation can be used only for $p$-values substantially below unity. In practice an upper limit of 0.7 is used. For combinations of field, work function, and temperature that exceed this value, no analytical theory exists. In the next section, we show that it is possible to describe this region with a theory that is partly analytical. With this theory, it is easy to study the behavior of the emission current, energy distribution, and brightness in the transition region.

The reduced brightness in the thermal-field regime can be calculated by substituting Equations (1), (40), and (8) into (2). Again, the unit term in the denominator of the transmission coefficient is neglected. A tabulated integral is used for the integration over $U_n$ [Gröbner and Hofreiter, 1961d]. The reduced brightness in the thermal-field emission regime, $B_{r,TF}$, is given by

$$B_{r,TF} = \frac{e j_{TF}}{\pi d} \tag{44}$$
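A minimal sketch of how Equations (36), (39), (43), and (44) could be chained numerically is given below. It assumes the power-law approximations $v(y) \approx 1 - y^{1.69}$ and $t(y) \approx 1 + 0.1107\,y^{1.33}$ quoted earlier, SI units, and a hypothetical function name; it illustrates the structure of the formulas and is not a validated emission code.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge (C)
M_E = 9.1093837015e-31       # electron mass (kg)
H = 6.62607015e-34           # Planck constant (J s)
HBAR = H / (2 * math.pi)
K_B = 1.380649e-23           # Boltzmann constant (J/K)
EPS0 = 8.8541878128e-12      # permittivity of free space (F/m)


def thermal_field_emission(field, work_function_ev, temperature_k):
    """Return (j_TF, p, B_rTF) from Equations (43), p = k_b T / d, and (44)."""
    w = work_function_ev * E_CHARGE
    # y_0 = Delta W / W with Delta W from Equation (19)
    y = math.sqrt(E_CHARGE**3 * field / (4 * math.pi * EPS0)) / w
    v = 1.0 - y**1.69                  # assumed approximation for v(y)
    t = 1.0 + 0.1107 * y**1.33         # assumed approximation for t(y)
    d = E_CHARGE * HBAR * field / (2 * t * math.sqrt(2 * M_E * w))  # Eq. (36)
    b = 2 * v / (3 * t)                                             # Eq. (39)
    p = K_B * temperature_k / d
    if p >= 1.0:
        raise ValueError("p >= 1: Equation (43) diverges")
    # Temperature factor pi p / sin(pi p); equals 1 in the limit T = 0
    factor = 1.0 if p == 0.0 else math.pi * p / math.sin(math.pi * p)
    j_tf = 4 * math.pi * M_E * E_CHARGE / H**3 * d**2 * math.exp(-b * w / d) * factor
    brightness = E_CHARGE * j_tf / (math.pi * d)                    # Eq. (44)
    return j_tf, p, brightness


if __name__ == "__main__":
    # Illustrative case: tungsten (W = 4.5 eV) at 5 V/nm
    print(thermal_field_emission(5e9, 4.5, 0.0))
    print(thermal_field_emission(5e9, 4.5, 293.0))
```

Comparing the two printed current densities shows directly how much the factor $\pi p/\sin \pi p$ contributes at room temperature for the chosen field.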
Again, the resulting expression is valid for $p < 0.7$.

E. Bridging the Gap Between Thermionic and Field Emission

In this section we consider the consequences for the emission current density, total energy distribution, and brightness of a certain choice of the electric field, $F$, the temperature, $T$, and the work function, $W$. First, we investigate the limiting fields and temperatures for which the emission equations in the previous sections can be used. The discussion in this section qualitatively follows Swanson and Bell’s extensive review on metallic field emission [1973], but focuses on the transition regime between the extended-Schottky model and the thermal-field emission model. Up to now, the current density, energy spread, and brightness in this regime could be calculated only numerically. In the following, it is shown that the transition regime can be described well with an analytical expression for the energy spread, which can be easily integrated numerically in order to obtain the current density.

1. Determination of the Parameter Space

In this section, the boundaries between the emission regimes presented in the previous section are discussed. We start with the transition from Schottky emission to extended-Schottky emission. We propose to define the critical field strength, $F_{c1}$, between the Schottky regime and the extended-Schottky regime as the field at which 10% of the total emission current density is due to tunneling. In other words, $\pi q/\sin(\pi q) = 1.1$, i.e., $q \approx 0.24$, yielding

$$F_{c1} = \left(\frac{\pi q \sqrt{m}\, k_b}{\hbar\,\sqrt[4]{4\pi\varepsilon_0 e}}\right)^{4/3} T^{4/3} = 1.74\cdot 10^4\, T^{4/3}\ \text{[V/m]} \tag{45}$$
This boundary value is thus a function of temperature only. Similarly, the upper limit for the field in the extended-Schottky regime is reached for $q = 0.7$, corresponding to

$$F_{c2} = 6.84\cdot 10^4\, T^{4/3}\ \text{[V/m]} \tag{46}$$

The minimum value of the field for which the thermal-field emission model is valid is reached for $p = 0.7$, or

$$F_{c3} = \frac{2k_bT\, t(y_0)\sqrt{2mW}}{e\hbar p} \approx 1.06\cdot 10^6 \sqrt{W}\, T\ \text{[V/m]} \tag{47}$$

with $W$ in eV and $T$ in K. We take the boundary field $F_{c4}$ separating the thermal-field regime from the cold-field regime as the value for which less than 10% of the total emission current is due to emission above the Fermi level, $E_F$. This means that $p \approx 0.24$ and

$$F_{c4} = 3.10\cdot 10^6 \sqrt{W}\, T\ \text{[V/m]} \tag{48}$$

again with $W$ in eV.
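The four boundary fields lend themselves to a simple lookup. The sketch below uses only the numerical coefficients quoted in Equations (45) to (48), with $T$ in K, $W$ in eV, and fields in V/m; the function names and the regime labels returned as strings are choices made here for illustration.

```python
import math


def critical_fields(work_function_ev, temperature_k):
    """Return (F_c1, F_c2, F_c3, F_c4) in V/m from Equations (45)-(48)."""
    t43 = temperature_k ** (4.0 / 3.0)
    sqrt_w_t = math.sqrt(work_function_ev) * temperature_k
    return (1.74e4 * t43, 6.84e4 * t43, 1.06e6 * sqrt_w_t, 3.10e6 * sqrt_w_t)


def emission_regime(field, work_function_ev, temperature_k):
    """Name the emission regime for a given field (V/m), W (eV), and T (K)."""
    fc1, fc2, fc3, fc4 = critical_fields(work_function_ev, temperature_k)
    if field < fc1:
        return "Schottky"
    if field < fc2:
        return "extended-Schottky"
    if field < fc3:
        return "transition"
    if field < fc4:
        return "thermal-field"
    return "cold field"


if __name__ == "__main__":
    # ZrO/W emitter: W = 2.8 eV, T = 1800 K
    print(critical_fields(2.8, 1800.0))
    for field in (2.5e8, 1.0e9, 2.0e9, 5.0e9, 1.2e10):
        print(field, emission_regime(field, 2.8, 1800.0))
```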
[Figure 3: applied field (V/m) versus cathode temperature (K) for W = 2.8 eV (ZrO/W emitter), with regions labeled Schottky emission regime, extended-Schottky emission regime, transition regime, thermal-field emission regime, and cold field emission regime.]
FIGURE 3. Emission regimes as a function of field and temperature for an emitter work function of 2.8 eV.
In Figure 3 the limits of validity of the different regimes are sketched as a function of the electric field and emitter temperature. The work function of the cathode was taken to be $W = 2.8$ eV, characteristic of the ZrO/W emitter. The temperature axis of the plot runs from 100 K to 3680 K, the latter value being the melting temperature of tungsten. As an upper value of the field axis, the field $F$ for which $W = \Delta W$ is used, in other words, the field at which the top of the potential barrier equals $E_F$ [Swanson and Bell, 1973]. This map of the parameter space of the ZrO/W emitter can be used to investigate the effect of a change in the field and/or the emitter temperature. Between the analytical emission theories, a transition region exists between temperature-dominated emission and emission in which electron tunneling dominates. In the next paragraphs we investigate the possibility of finding a preferably simple expression to be used in this “forbidden” region.

2. Current Density

In Figure 4 the emission current density, $j$, is shown as a function of the field strength, $F$, for the ZrO/W electron emitter with a work function of 2.8 eV. The emitter temperature is taken to be 1800 K. The dotted vertical lines correspond to the critical field values $F_c$ that were defined in the previous section.
[Figure 4: emission current density (A/m²) versus electric field strength (V/m) for W = 2.8 eV (ZrO/W emitter); curves for T = 1800 K and T = 0 K, with the critical fields Fc1, Fc2, Fc3, and Fc4 marked and the labels Schottky regime, extended-Schottky, parabolic-barrier approximation, thermal-field regime, and cold field emission.]
FIGURE 4. Calculated current density of the Schottky emitter as a function of the electric field strength. The work function is taken as 2.8 eV.
In the plot, the current density according to the Schottky theory, Equation (22), is shown with a solid line. In the extended-Schottky regime, the emission current increases more strongly than predicted by Schottky theory, as a result of electron tunneling through the top of the potential barrier. The current density in the extended-Schottky regime, Equation (31), is drawn with a dashed line. Above the critical field strength, $F_{c2}$, it can be seen that the function diverges. Similarly, the current density in the thermal-field emission model, given by Equation (43) and drawn in Figure 4 with a dashed-dotted line, diverges when the field is below $F_{c3}$. At $T = 1800$ K and $W = 2.8$ eV, the boundary between thermal-field and field emission is $F_{c4} = 9.3\cdot 10^9$ V/m. It can be seen that the current density in the thermal-field emission regime meets the zero-temperature field-emission current density here (drawn with long dashes).

In the regime between extended-Schottky and thermal-field, no analytical expression for the current density has been found. The expressions for the current density diverge as soon as $p$ or $q$ exceeds 0.7. The current density should, of course, be a smoothly varying function of field and temperature. Theoretical values for the current density in the transition region have only been obtained numerically [El-Kareh, Wolfe and Wolfe, 1977].

Let us consider the approximations carried out in the previous sections in more detail. In order to find a simple expression for the emission current density, we take the total energy distribution in the extended-Schottky model, Equation (30). It was necessary to omit the unit term in the denominator of the Fermi-Dirac distribution function to obtain Equation (31). It is clear that $j_{ES} \to \infty$ with increasing field strength. We can leave the unit term intact and integrate Equation (30) numerically. In Figure 4 we plot the resulting current density $j_{PBA}$ with a dashed-triple-dotted line. Above $q \approx 0.7$ the current density calculated with the extended-Schottky model diverges. The numerical integration of Equation (30), however, yields a current density that rises steadily with the field and differs by only a factor of $\sim 1.5$ from the current density in the thermal-field regime at fields exceeding $F_{c3} = 3.2\cdot 10^9$ V/m.

The accuracy of the current density $j_{PBA}$ in the parabolic-barrier approximation was further checked with a full numerical solution of the Schrödinger equation for calculating the transmission coefficient [Fransen et al., 1998]. The current density resulting from this numerical evaluation is drawn in Figure 4 with a dotted line. In the remainder of this chapter, we refer to the numerical integration of Equation (30) as the parabolic-barrier approximation and to the full numerical evaluation of the Schrödinger equation, yielding an exact transmission coefficient, as the numerical model.

It is interesting to compare the outcome of the full numerical calculation with the analytical approximations. The largest difference between the two calculations is in the Schottky regime, where the current density calculated with the numerical model is about twice as high as the Schottky emission current, $j_S$. This error results from the fact that in the Schottky regime the tunnel current is taken to be zero. The numerical calculation indicates that even in the Schottky regime tunneling takes place. In the transition regime, the parabolic-barrier approximation overestimates the current density by about 30%.
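The numerical integration of Equation (30) can be sketched in a few lines. The version below is an illustration under stated assumptions rather than the authors' implementation: it takes the transmission coefficient as $1/(1 + \exp(-E/\kappa))$ with the factor $1/\pi$ absorbed in $\kappa$ (so that $q = \kappa/k_bT$), writes the integrand as $\kappa \ln[1 + \exp(E/\kappa)]$ times the full Fermi-Dirac factor, and uses a plain trapezoidal rule with hypothetical function names. The closed form of Equation (31) is included for comparison at low $q$.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge (C)
M_E = 9.1093837015e-31       # electron mass (kg)
H = 6.62607015e-34           # Planck constant (J s)
HBAR = H / (2 * math.pi)
K_B = 1.380649e-23           # Boltzmann constant (J/K)
EPS0 = 8.8541878128e-12      # permittivity of free space (F/m)
PREFACTOR = 4 * math.pi * M_E * E_CHARGE / H**3


def barrier_lowering(field):
    """Schottky lowering Delta W in J, Equation (19)."""
    return math.sqrt(E_CHARGE**3 * field / (4 * math.pi * EPS0))


def kappa(field):
    """kappa = c_1 F^(3/4) in J; ASSUMPTION: the factor 1/pi is included."""
    return HBAR * (4 * math.pi * EPS0 * E_CHARGE * field**3) ** 0.25 / (
        math.pi * math.sqrt(M_E))


def log1pexp(x):
    """Overflow-safe ln(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def fermi(x):
    """Overflow-safe 1 / (1 + exp(x))."""
    if x > 0:
        ex = math.exp(-x)
        return ex / (1 + ex)
    return 1 / (1 + math.exp(x))


def j_extended_schottky(field, work_function_ev, temperature_k):
    """Closed form, Equation (31); only meaningful for q well below 1."""
    kt = K_B * temperature_k
    q = kappa(field) / kt
    if q >= 1.0:
        raise ValueError("q >= 1: Equation (31) diverges")
    barrier = work_function_ev * E_CHARGE - barrier_lowering(field)
    j_s = PREFACTOR * kt**2 * math.exp(-barrier / kt)
    return j_s * math.pi * q / math.sin(math.pi * q)


def j_parabolic_barrier(field, work_function_ev, temperature_k, steps=20000):
    """Trapezoidal integration of Equation (30) with the unit term kept."""
    kt = K_B * temperature_k
    kap = kappa(field)
    barrier = work_function_ev * E_CHARGE - barrier_lowering(field)
    lo = -barrier - 50 * max(kap, kt)   # well below the Fermi level
    hi = 50 * kt                        # well above the barrier top
    h = (hi - lo) / steps

    def integrand(energy):
        return kap * log1pexp(energy / kap) * fermi((energy + barrier) / kt)

    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, steps):
        total += integrand(lo + i * h)
    return PREFACTOR * total * h


if __name__ == "__main__":
    for field in (5e8, 1e9, 2.5e9, 5e9):   # W = 2.8 eV, T = 1800 K
        print(field, j_parabolic_barrier(field, 2.8, 1800.0))
```

At low fields the two functions should agree closely, because the unit term in the Fermi-Dirac denominator is then negligible; at higher fields only the numerical integral remains finite.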
From Figure 4 it is clear that the integration of Equation (30) yields a result that describes the emission current density in the transition regime quite reasonably. Let us discuss the use of Equation (30) for the total energy distribution in the transition regime.

3. Total Energy Distribution

The physical interpretation of the transition region between the extended-Schottky regime and the thermal-field regime becomes clear if one considers the behavior of the total energy distribution as a function of the applied field. In Figure 5 we plot these energy distributions in the different regimes. All distributions are scaled to the same height. The temperature of the emitter is 1800 K. Figure 5(a) shows the total energy distribution at zero field, thermionic emission, given by Equation (12). The trailing edge of the spectrum is determined by the temperature of the emitter; the sharp leading edge results from the work function, taken as 2.8 eV (ZrO/W emitter). In Figure 5(b) the Schottky effect is shown, plotted using Equation (21): the effective barrier height is lowered
[Figure 5: stacked panels of total energy distributions versus energy relative to the Fermi level (eV), from −2 to 4 eV: (a) thermionic emission, (b) Schottky emission, (c) extended-Schottky emission, (d) parabolic-barrier approximation, (e) thermal-field emission.]
FIGURE 5. Calculated total energy distributions, plotted for different emission regimes. As the field increases, the regime changes from (a) thermionic emission, (b) to Schottky, and (c) to extended-Schottky emission. (d) The changes in shape and position in the transition region are displayed. (e) The total energy distribution in the thermal-field regime is shown for comparison. The work function is W = 2.8 eV and the emitter temperature is 1800 K.
as a result of the field, and the spectrum shifts downwards in energy as a function of the field. The field ranges from $2.5\cdot 10^8$ V/m to $1.25\cdot 10^9$ V/m. Only the first value of the field is below the critical field $F_{c1}$ separating the Schottky regime from the extended-Schottky regime. In Figure 5(c) the same energy distributions are redrawn, but now they are plotted with the extended-Schottky equation for the total energy distribution, Equation (30), without the unit term in the denominator of the Fermi-Dirac distribution function. The effect of omitting the tunneling process on the energy spectrum is clear from a comparison of Figure 5(b) with 5(c). In Figure 5(d) we plot the total energy distribution in the parabolic-barrier approximation, Equation (30), in its full form. The field ranges from $2.5\cdot 10^8$ V/m to $4\cdot 10^9$ V/m in steps of $2.5\cdot 10^8$ V/m. In a small range of fields, from $1.75\cdot 10^9$ V/m to $2.75\cdot 10^9$ V/m, the position and shape of the total energy distribution change drastically. In this range, the peak of the distribution shifts rapidly from the top of the potential barrier, characteristic of emission that is mainly thermionic, toward the Fermi level, $E_F$, where field emission is dominant.

At this point in the discussion, the reader may question the accuracy of the parabolic-barrier approximation in describing the strong changes of the spectrum in the transition regime. In the previous section we demonstrated that the approximation is useful for the integral of the spectrum: the current density, $j_{PBA}$. Let us now compare the full width at half maximum (FWHM) of the distributions calculated with the parabolic-barrier approximation with a numerical evaluation of the transmission coefficient and the total energy distribution [Fransen et al., 1998]. In Figure 6 the dependence of the FWHM on the electric field strength is plotted.
The sudden increase in width of the energy distribution in the transition regime, as also observed in Figure 5(d), is in agreement with the full numerical calculation. For larger values of the work function, this increase in width is even more pronounced. Tungsten, having a work function of 4.5 eV, shows a maximum FWHM value of 2.8 eV at $T = 1800$ K and $F = 2.4\cdot 10^9$ V/m. The dependence of the FWHM on the field is described by the parabolic-barrier approximation within 6% of the numerical calculation for an emitter having a work function of 2.8 eV.

As a further check of the accuracy of the parabolic-barrier approximation, we compare the position of the maximum of the energy distribution, calculated with the parabolic-barrier approximation, with that found by the full numerical calculation. In Figure 7 the peak position with respect to the Fermi level, $E_F$, is plotted as a function of the electric field. The graph indicates that the parabolic-barrier approximation can safely be used to evaluate the position of peak maxima as a function of the field. We use the parabolic-barrier approximation in Sections V and VI.
[Figure 6: FWHM energy spread (eV), 0.0 to 1.2, versus electric field strength (V/m) for W = 2.8 eV; curves for the parabolic-barrier approximation and the numerical calculation.]
FIGURE 6. The FWHM of the calculated total energy distribution, plotted as a function of the electric field in the parabolic-barrier approximation. For comparison, the outcome of a full numerical calculation is also shown.
[Figure 7: position of maximum (eV), −1 to 3, versus electric field strength (V/m) for W = 2.8 eV; curves for the parabolic-barrier approximation and the numerical calculation.]
FIGURE 7. Calculated position of the peak of the total energy distribution as a function of the field strength resulting from the parabolic-barrier approximation and the numerical evaluation of the transmission coefficient.
4. Brightness

Finally, we discuss the influence of temperature, field, and work function on the brightness of an emitter. Let us consider the reduced brightness as a function of the electric field. We consider two source types. The first is a ZrO/W emitter, which has a work function of 2.8 eV and is operated at $T = 1800$ K. The second is a tungsten field emitter with a work function of 4.5 eV, operated at room temperature. In Figure 8 we show these dependencies. Unfortunately, there is no analytical expression for the brightness in the transition region between extended-Schottky and thermal-field. However, a good approximation of the brightness in the transition regime is given by

$$B_{r,PBA} = \frac{e j_{PBA}}{\pi k_bT} \tag{49}$$

The emission current density in the transition regime, $j_{PBA}$, is obtained by integration of Equation (30) in its full form. In Figure 8, the brightness in the transition regime is plotted with a dashed line. It forms a smooth transition between the brightness in the extended-Schottky regime and the brightness in the thermal-field regime, as expected.
[Figure 8: reduced brightness (A/m² sr V) versus electric field strength (V/m); curves for W = 2.8 eV, T = 1800 K and for W = 4.5 eV, T = 293 K.]
FIGURE 8. Calculated reduced brightness for the ZrO/W electron emitter and the cold tungsten field emitter.
III. BOERSCH EFFECT FOR ELECTRON EMITTERS

At low beam currents, brightness and stability are uniquely determined by the source. At increasing currents, however, the influence of the repulsive forces between individual electrons in the beam becomes so strong that brightness is decreased and energy spread is increased. These two adverse effects on beam quality are generally referred to as Coulomb interactions. An estimate of the magnitude of the statistical Coulomb interactions in a charged particle beam instrument requires the use of Monte Carlo techniques for evaluation in the whole system, a tedious job. Recently, analytical approximations to the statistical Coulomb interactions were derived, valid for a large range of beam currents [Jansen, 1990]. A summary was written by Kruit and Jansen [1997]. With the expressions described there, it is possible to quickly evaluate the magnitude of the repulsive Coulomb forces in field-free beams. In our experiments, to be described later in this chapter, we observe a broadening of the energy spread of the beam as a result of the Coulomb repulsion effect. We therefore limit ourselves to the Coulomb forces that operate along the beam axis, the Boersch effect. In this section, we describe two analytical approximations to the statistical electron-electron interactions. It is outside the scope of this chapter to discuss the theory behind these approximations in full detail; consequently, only the essential equations for the Boersch effect are given.

A. Application of Jansen’s Theory in the Emitter Region

In the theory of Jansen [Jansen, 1990; Kruit and Jansen, 1997], an electron-optical instrument is modeled as a series of thin lenses, with drift spaces in between. In each drift space, a crossover is present. In a first approach, we simply take the cathode-anode region as field-free and assume that the emitted particles gain all energy very close to the source.
We thus modify Jansen’s model by taking only the region of the beam after the crossover into account, so we have to divide his original expressions by 2. A schematic drawing of the beam geometry is shown in Figure 9: the crossover radius is denoted by $r_c$ and $I$ is the current in the beam. The conical beam segment has a length $L$ and a half-opening angle $\alpha$. The potential $V$ of the electrons is taken constant in the whole beam segment; in other words, the region close to the apex of the tip where the electrons gain most of their kinetic energy is not modeled. In the analytical approximations to the statistical Boersch effect in a narrow crossover [Jansen, 1990], four regimes are distinguished that can dominate the energy broadening: the Gaussian, Holtsmarkian, Lorentzian, and pencil beam regimes. In each regime a certain type of interaction dominates. The average broadening resulting from electron-electron interactions is given by $\Delta E$.

[Figure 9: conical beam segment of length L and half-opening angle α behind a crossover of radius rc, with beam current I and potential V.]
FIGURE 9. Simple model for the estimation of the Boersch effect in the source region. The figure equals Figure 5 from Kruit and Jansen [1997], but here only the beam segment after the crossover is taken into account.

Two measures for the width of a distribution are used: the smallest full width containing 50% of the current (FW50) and the full width at half maximum (FWHM) of the energy distribution. After filling in parameters specific for electrons and considering that $E = eV$ in our case, Equations (79–82) from Kruit and Jansen [1997], divided by 2, simplify to

$$\Delta E_{FW50G} = 208.1\, I^{1/2} V^{1/4} \quad \text{for } r_c \simeq \frac{2.26\cdot 10^{-9}}{\alpha^2 V}\ \text{[m]},^{*} \tag{50}$$

$$\Delta E_{FW50H} = 4.88\,\frac{I^{2/3}}{r_c^{1/3}\,\alpha\, V^{1/3}} \tag{51}$$

$$\Delta E_{FW50L} = 1.90\cdot 10^5\,\frac{I}{\alpha V^{1/2}} \tag{52}$$

$$\Delta E_{FW50P} = 1.29\cdot 10^{18}\,\frac{I^2 L}{V} \tag{53}$$

In these equations, the broadening is given in electronvolts (eV). For the FWHM energy broadening, the prefactors are 363.3, 7.23, $1.90\cdot 10^5$, and $2.25\cdot 10^{17}$, respectively. In order to obtain a general expression for the energy broadening, the contributions can be added, using the following rule [Kruit and Jansen, 1997]:

$$\frac{1}{\Delta E_{FW50}} = \left(\frac{1}{\Delta E_{FW50G}^4} + \frac{1}{\Delta E_{FW50H}^4} + \frac{1}{\Delta E_{FW50L}^4} + \frac{1}{\Delta E_{FW50P}^4}\right)^{1/4} \tag{54}$$

* Note that in Kruit and Jansen [1997] the dependency of Equation (79) on the current density is incorrectly given as $I^{1/4}$. This is a typing error; the graphical material is correct.
The correct power of the addition was determined by Jansen [1990] by comparison of the analytical expressions with a Monte Carlo simulation. The broadening can be plotted as a function of the emission current in a double-logarithmic plot, which shows the region of validity of each term. As an example, we take the parameters for a ZrO/W Schottky emitter with an apex radius of curvature of 0.9 µm. Figure 10 is plotted for an extraction voltage of 5 kV, a half-opening angle $\alpha$ of 7°, and a facet radius $r_c = 0.12$ µm. The beam length, $L$, is 2.8 mm. The dotted lines are the FWHM broadening values obtained for the individual regimes, and the solid line is an addition of these contributions using Equation (54). The dashed line is an addition of the FW50 energy widths. In most practical applications, the current coming from the facet varies between 1 and 50 µA, showing that the dominant regimes for the ZrO/W electron emitter are the Holtsmark regime and the Lorentz regime. Because the beam length, $L$, does not appear in the equations for these regimes, its magnitude is unimportant. A few years ago, an estimation of the energy broadening in the vicinity of a point source was carried out by Elswijk, van Rooy, and Schiller [1995].
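The four regime expressions and the addition rule can be evaluated together in a short script. The sketch below is illustrative only: the functional forms are a reading of Equations (50) to (54) in which the Gaussian term scales as $I^{1/2}V^{1/4}$, the Holtsmarkian term as $I^{2/3}/(r_c^{1/3}\alpha V^{1/3})$, the Lorentzian term as $I/(\alpha V^{1/2})$, and the pencil beam term as $I^2L/V$, which should be treated as assumptions. SI units are used with $\alpha$ in radians, and all function and dictionary names are hypothetical.

```python
import math

# Prefactors quoted in the text for the two width measures (result in eV)
FW50 = {"gaussian": 208.1, "holtsmark": 4.88, "lorentz": 1.90e5, "pencil": 1.29e18}
FWHM = {"gaussian": 363.3, "holtsmark": 7.23, "lorentz": 1.90e5, "pencil": 2.25e17}


def regime_broadenings(current, voltage, alpha, r_c, length, prefactors):
    """Energy broadening (eV) of each regime; forms as assumed in the lead-in."""
    return {
        "gaussian": prefactors["gaussian"] * current**0.5 * voltage**0.25,
        "holtsmark": prefactors["holtsmark"] * current ** (2 / 3)
        / (r_c ** (1 / 3) * alpha * voltage ** (1 / 3)),
        "lorentz": prefactors["lorentz"] * current / (alpha * math.sqrt(voltage)),
        "pencil": prefactors["pencil"] * current**2 * length / voltage,
    }


def combined_broadening(parts):
    """Addition rule of Equation (54): inverse fourth powers are summed."""
    return sum(value**-4 for value in parts.values()) ** -0.25


if __name__ == "__main__":
    # Example parameters from the text: 5 kV, 7 degrees, r_c = 0.12 um, L = 2.8 mm
    alpha = math.radians(7.0)
    for current in (1e-6, 10e-6, 50e-6):
        parts = regime_broadenings(current, 5000.0, alpha, 0.12e-6, 2.8e-3, FWHM)
        print(current, parts, combined_broadening(parts))
```

With this addition rule the smallest individual contribution dominates the total, which is why the regime with the lowest curve in a double-logarithmic plot sets the overall broadening.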
[Figure 10: FWHM and FW50 energy spread (eV), 0.001 to 100, versus emission current (10⁻⁶ A), 0.1 to 1000; curves labeled Gaussian regime, Holtsmarkian regime, Lorentzian regime, and pencil beam regime.]
FIGURE 10. The four interaction regimes and their influence on the calculated energy broadening due to the Boersch effect as a function of the emitted current. The contribution to the FWHM energy spread for each regime is drawn with dotted lines. The solid line is the overall FWHM energy broadening. The dashed line shows the total broadening using the FW50 prefactors.
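The regime expressions and the addition rule can be checked with a few lines of code. The Python sketch below evaluates Equations (50) to (53) with the FWHM prefactors and combines them with Equation (54) for the parameters quoted for Figure 10. The function names and the dictionary layout are illustrative choices, not part of the original work, and the powers of $V$ follow the reading of the printed equations stated in the comments, so the numbers are indicative only.

```python
import math

# Prefactors quoted in the text (broadening in eV, SI units for all inputs).
FW50_PREFACTORS = {"gaussian": 208.1, "holtsmark": 4.88,
                   "lorentzian": 1.90e5, "pencil": 1.29e18}
FWHM_PREFACTORS = {"gaussian": 363.3, "holtsmark": 7.23,
                   "lorentzian": 1.90e5, "pencil": 2.25e17}


def regime_widths(current, voltage, alpha, r_c, length, prefactors):
    """Energy broadening (eV) per regime; assumed forms of Eqs. (50)-(53)."""
    return {
        # Gaussian: I^(1/2) * V^(1/4)
        "gaussian": prefactors["gaussian"] * current ** 0.5 * voltage ** 0.25,
        # Holtsmark: I^(2/3) / (r_c^(1/3) * alpha * V^(1/3))
        "holtsmark": prefactors["holtsmark"] * current ** (2 / 3)
        / (r_c ** (1 / 3) * alpha * voltage ** (1 / 3)),
        # Lorentzian: I / (alpha * V^(1/2))
        "lorentzian": prefactors["lorentzian"] * current
        / (alpha * voltage ** 0.5),
        # Pencil beam: I^2 * L / V
        "pencil": prefactors["pencil"] * current ** 2 * length / voltage,
    }


def combined_width(widths):
    """Addition rule of Eq. (54): the inverse fourth powers add."""
    return sum(w ** -4 for w in widths.values()) ** -0.25


# Figure 10 parameters; the 10 microampere current is an arbitrary sample.
widths = regime_widths(current=10e-6, voltage=5e3, alpha=math.radians(7.0),
                       r_c=0.12e-6, length=2.8e-3,
                       prefactors=FWHM_PREFACTORS)
total = combined_width(widths)
for name, value in sorted(widths.items(), key=lambda item: item[1]):
    print(f"{name:10s} {value:10.3f} eV")
print(f"combined   {total:10.3f} eV")
```

Because inverse fourth powers are added, the smallest regime values control the result; for this sample current these are the Holtsmark and Lorentzian terms.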
THE ELECTRON-OPTICAL PROPERTIES OF THE ZRO/W SCHOTTKY
Elswijk calculated the energy broadening using Monte Carlo techniques and compared his results with an analytical model proposed by Loeffler [1969]. According to this model, the FWHM energy broadening is given (in eV) by

\[ \Delta E = 4.75\cdot 10^{4}\, \frac{I}{\alpha V^{1/2}} \tag{55} \]

a factor of 4 lower than the energy broadening in the Lorentzian regime given by Equation (52). Apart from the prefactor, Elswijk’s equation can be seen as a subset of the equations given by Jansen.

B. Knauer’s Model for Energy Broadening

A weak point of the model described in the previous section is the assumption that the kinetic energy of the electrons is constant throughout the emitter region. A simple way to include the nonconstant velocity of the electrons in a model is to assume a spherical electric field around the emitter [Knauer, 1981; Modinos and Xanthakis, 1991]. In this case, the apex radius of curvature of the emitter tip, instead of the size of the crossover, $r_c$, becomes the size parameter in the calculation. Knauer explicitly evaluates the magnitude of the correction to a calculation in which the electron velocity is taken as constant and finds that the energy broadening is a factor of 1.2 higher with the spherical acceleration field. The final expression for the energy broadening is quite similar to the energy broadening according to Jansen in the Holtsmark regime, Equation (51), except for the different angle dependency and prefactor:

\[ \Delta E_{kn} = 7.43\, \frac{I^{2/3}}{V^{1/3}\, \alpha^{4/3}\, r_{tip}^{1/3}} \tag{56} \]
Again, the broadening is given in eV. The parameter $r_{tip}$ is the radius of curvature of the emitter apex. The dependency on the half-opening angle $\alpha$ differs only slightly from that in Jansen’s equation for the Holtsmark regime, so it will be difficult to determine the correct dependency with an experiment. With Knauer’s equation and the expressions from the previous section, we can investigate the relevance of these theories for describing the results of experiments.

IV. EXPERIMENTAL METHODS AND SYSTEMS

In this section we discuss the experimental setups and methods that we used for the determination of the energy distribution, reduced brightness, and the stability of a ZrO/W emitter.
A. Measurement of the Energy Distribution

The energy of charged particles is usually analyzed by measuring the deflection of the particles in electrostatic or magnetic fields [Young and Kuyatt, 1968; Isaacson and Gomer, 1978]. Alternatively, retarding electric fields can be used [Simpson, 1961; Bell and Swanson, 1979]. This last option is not viable for high-intensity beams, because the retardation close to zero velocity causes additional energy broadening by electron-electron interactions (Boersch effect). Magnetic deflection is, in general, used only in applications where the voltages on electrodes for electrostatic deflection would be impractically high, such as spectrometers for electron energy-loss spectroscopy in the transmission electron microscope. Energy analysis of electrons or ions with deflecting electrostatic fields is widely used in surface analysis techniques, such as Auger electron spectroscopy (AES) and X-ray photo-electron spectroscopy (XPS). The deflecting field can be applied either between two coaxial cylinders, such as in the cylindrical mirror analyzer, or between two cylindrical or spherical electrodes. The advantage of using spherical electrodes is that in this case the instrument is double-focusing: the focusing action occurs in the dispersive as well as in the nondispersive direction [Ballu, 1980]. Another advantage of spherical capacitor analyzers is that these instruments are commercially available. We used a VSW Class 150 electron energy analyzer [VSW, 1995], modified on a few points to improve the electron-optical resolution of the instrument. The analyzer was mounted on a dedicated UHV system. During experiments, the residual gas pressure in the vacuum system was about $2\cdot 10^{-8}$ Pa. The dispersive power of the analyzer increases when the incoming electrons are decelerated before entering the energy-separating region. This is done using a retarding electrostatic lens system. Because of the low energy of the electrons, it is necessary to apply µ-metal shielding against stray fields.

In Figure 11 a schematic drawing of the energy analyzer is shown. The electron current per energy interval is determined by scanning the voltages on the hemispheres and measuring the transmitted electron current using a channeltron, which is an electron multiplier capable of detecting single electrons. The scan voltages on the deflector are controlled via a personal computer (PC). While the voltages on the hemispheres are scanned, pulses of the channeltron are read out using an interface board and are stored on the PC. A retractable fluorescent screen is placed in front of the entrance lens of the analyzer. Electrons can enter the analyzer through a probe hole in the screen. The hole has a diameter of 2 mm. The distance between emitter and screen is about 10 cm. A viewport above the entrance lens column enables alignment of the emitter and the apertures with respect to the analyzer with the aid of a laser beam. The laser is mounted on top of the system. It can be aligned with respect to the analyzer with shift-and-tilt manipulators.
[Figure 11 drawing labels: laser, deflector apertures, entrance lens, probe hole apertures, fluorescent screen, emitter module.]
FIGURE 11. Schematic drawing of the energy analyzer. The emitter module is placed opposite to the entrance lens, which decelerates the electrons before they enter the deflector. A channeltron behind the exit slit of the deflector is used to count individual electrons. Apertures along the beam path are used to control the beam current and/or the resolution of the spectrometer.
Let us take a closer look at the electron-optical resolution of such an analyzer. The resolution is determined by the size of the spot at the entrance and exit sides of the hemispheres, the half-opening angle of the beam, the initial velocity of the electrons, and the mean radius of the hemispheres. This last value cannot be easily changed; it is 150 mm in our case. Sets of apertures at the entrance and exit side of the deflector determine the beam size. For secondary electron emission spectroscopy, such as AES and XPS, the electron current is low and the apertures must be relatively large. For measurement of the energy distribution of high-brightness sources, the size of the apertures can be decreased, which improves the resolution. In our experiments, we mainly used circular apertures of 1.0, 0.3, and 0.1 mm. For alignment of the source with respect to the analyzer, we also used large slits of 4 mm × 12 mm. The electron energy analyzer should yield reliable results for a large range of currents. Especially at higher emission currents, the beam current in the retarding lens system and in the analyzer should still be low enough to prevent a broadening of the initial energy distribution by electron-electron interactions (Boersch effect). We therefore mounted a set of probe hole apertures in front
of the entrance lens system. The size of the probe hole apertures ranged from 5 µm to 1 mm. When the probe hole aperture is changed, the acceptance angle of the entrance lens changes, and thus the half-angle of the beam that enters between the hemispheres changes too. The electron-optical energy resolution, $\Delta E$, of such a spherical capacitor analyzer is a function of the pass energy, $E_p$, the half-opening angle of the beam entering the analyzer, $\alpha$, the mean radius, $R$, between the two hemispheres, and the mean width of the entrance and exit apertures of the analyzer, $d$ [Ballu, 1980]:

\[ \Delta E = E_p \left( \frac{d}{2R} + \frac{\alpha^{2}}{4} \right) \tag{57} \]

The opening angle $\alpha$ depends on the lens magnification $M$ and the acceptance angle of the entrance lens. This last parameter is determined by the size of the probe-hole aperture in front of the lens. The magnification of the entrance lens is about unity, so $\alpha$ is governed by the ratio $E/E_p$, with $E$ the kinetic energy of the electrons entering the analyzer. For an analyzer aperture size of 0.3 mm, a probe hole aperture size of 0.4 mm, and a pass energy of 20 eV, the electron-optical resolution, using Equation (57), is 40 meV. The stability of the control electronics is of the same order of magnitude, so by adding both contributions quadratically, the analyzer broadening will be 57 meV, provided that the beam path is shielded adequately against stray fields. We determined the resolution of the analyzer with a method described by Young and Kuyatt [1968]. In this method, the energy distribution of a metallic field emitter is measured at a low emitter temperature. The electron-energy distribution is now close to a step function, and so the high-energy tail of the energy distribution is almost absent. A broadening of the energy distribution as a result of the finite resolution of the analyzer can best be observed at the steepest edges in the spectrum. In our experimental setup, the emitter can be cooled to 77 K.
Figure 12 shows such a spectrum, obtained from a cooled ⟨111⟩-oriented tungsten field emitter, operated at 657 V and an emission current of 25 nA. The experimental spectrum is fitted with Equation (42). The fit is drawn with a dashed line. The solid line is a convolution of the fit with a Gaussian function, describing the energy broadening resulting from the finite resolution of the analyzer. We determine the resolution of the spectrometer from the sharp, temperature-controlled right edge of the spectrum. The best fit with the experimental data is found when the Gaussian broadening has a full width at half-maximum of 53 meV. This broadening is thus due to the complete experimental system, i.e., the power supply of the tip, the mechanical stability of the experimental setup, the influence of stray fields, and the electronics and electron optics of the analyzer.
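The quadratic addition used for the analyzer broadening is a one-line calculation. The sketch below reproduces only the arithmetic stated in the text (40 meV electron-optical resolution and an electronics contribution taken to be equally large); the function name `quadrature_sum` is an illustrative choice, and treating both contributions as independent is an assumption.

```python
import math


def quadrature_sum(*widths):
    """Add independent broadening contributions quadratically."""
    return math.sqrt(sum(w * w for w in widths))


# Electron-optical resolution and electronics stability, both in meV.
# The second value is assumed equal to the first ("same order of magnitude").
analyzer_broadening = quadrature_sum(40.0, 40.0)
print(f"estimated analyzer broadening: {analyzer_broadening:.1f} meV")
```

The result rounds to the 57 meV quoted in the text, which is close to the 53 meV found experimentally for the complete system.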
[Figure 12 plots, labeled Vext = 680 V: upper panel, counts (arb. units) on a logarithmic scale versus energy relative to Fermi level (eV), −3 to 1; lower panel, counts (arb. units) on a linear scale from 0.0 to 1.2 versus energy relative to Fermi level (eV), −3 to 1, annotated FWHM = 0.27 eV.]
FIGURE 12. Energy distribution of the electrons emitted by a cooled ⟨111⟩-oriented tungsten field emitter, used for analyzer testing. The upper plot shows the spectrum on a logarithmic scale; the lower plot has a linear y-scale. The dashed line is a fit through the spectrum using Equation (42). The solid line is a convolution with a Gaussian function that describes the additional broadening resulting from the analyzer. From the fit with the temperature edge of the experimental data, we find that the analyzer resolution is 53 meV.
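To see why a steep edge is a convenient probe of the analyzer resolution, consider the idealized case of a perfect step convolved with a Gaussian. This is a simplification introduced here for illustration: the actual fit uses Equation (42) at 77 K, not a perfect step, and the helper names below are hypothetical. For a Gaussian-broadened step, the energy interval between the 88% and 12% levels of the edge is very nearly the FWHM of the Gaussian.

```python
import math


def gaussian_sigma(fwhm):
    """Standard deviation of a Gaussian with the given FWHM."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def broadened_step(energy, fwhm):
    """Ideal edge (1 below zero energy, 0 above) convolved with a Gaussian."""
    sigma = gaussian_sigma(fwhm)
    return 0.5 * math.erfc(energy / (sigma * math.sqrt(2.0)))


def edge_width(fwhm, low=0.12, high=0.88):
    """Energy interval between the `high` and `low` levels of the edge."""

    def energy_at(level):
        # Bisection; broadened_step decreases monotonically with energy.
        left, right = -10.0 * fwhm, 10.0 * fwhm
        for _ in range(200):
            middle = 0.5 * (left + right)
            if broadened_step(middle, fwhm) > level:
                left = middle
            else:
                right = middle
        return 0.5 * (left + right)

    return energy_at(low) - energy_at(high)


width = edge_width(0.053)  # 53 meV Gaussian, expressed in eV
print(f"12%-88% edge width: {1e3 * width:.1f} meV for a 53 meV FWHM")
```

In a real spectrum the thermal width of the edge adds to this value, which is why the emitter is cooled and the convolution is fitted rather than read off directly.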
B. Measurement of Brightness

The other important parameter for application of an electron source in an electron microscope is the reduced brightness, defined as

\[ B_r = \lim_{\Delta a \to 0}\, \lim_{\Delta\Omega \to 0} \frac{\Delta I}{\Delta a\, \Delta\Omega\, V_{beam}} \tag{58} \]

in which $\Delta I$ is the current in a beam segment with an infinitesimally small solid angle, $\Delta\Omega$, and an infinitesimally small area, $\Delta a$, measured at a beam potential $V_{beam}$. In electron microscopy, the opening angles are small and $\Delta\Omega = \pi\alpha^{2}$, with $\alpha$ the half-opening angle. The brightness is a conserved quantity in an aberration-free electron-optical system, as long as electron-electron interactions are negligible. In an ideal system, its magnitude is thus determined by the source, so it is possible to measure the brightness at an arbitrary position in the beam, for instance, in the specimen plane of the transmission electron microscope. Several methods are described in the literature for determining the brightness of an electron source. The simplest method is a measurement of the current, area, and half-opening angle of the focused probe [Fransen et al., 1999; Lauer et al., 1985]. We used this method for a determination of the reduced brightness of a ZrO/W emitter with a radius of curvature of 0.9 µm. Alternative methods exist for measuring emitter brightness, such as the use of an electron biprism [Speidel and Kurz, 1977; Lauer et al., 1985] or the use of the decay in intensity of Fresnel fringes near edges in a partly opaque, partly transparent specimen [van der Mast, 1975]. This last method, however, cannot be used directly in the TEM, as discussed in more detail by Fransen [1999]. We used a Philips CM300ST FEG transmission electron microscope, operated at 300 kV beam energy. Because the brightness is most important for probe-forming applications, the microscope is operated in the nanoprobe mode, i.e., the mode in which the condenser system of the microscope is optimized to form a very small demagnified image of the source on the specimen for a given beam current. A sketch of the instrument is shown in Figure 13. To determine its diameter, the focused probe was magnified on a CCD camera with the projection system of the microscope.
The current in the probe is measured on the fluorescent screen of the microscope. The current readings from the screen are calibrated with a Faraday cup prior to the experiment. An aperture in the second lens of the condenser system determines the half-opening angle $\alpha$ of the beam at the specimen position. This opening angle can be deduced from the size of the disc in the back-focal plane of the objective lens of the microscope. In determining these three parameters, the measurement of the probe size requires most care, because contributions resulting from lens aberrations and
[Figure 13 drawing labels, top to bottom: source plane; condenser system (demagnifying optics, half-opening angle α); specimen plane (without specimen); projection system (magnifying optics); recording plane (fluorescent screen), CCD camera.]
FIGURE 13. Schematic drawing of the elements in a probe-forming TEM.
electron diffraction need to be taken into account. As long as these effects do not dominate, they can be corrected for. A convenient method for separating contributions from aberrations and diffraction from the contribution of the geometrical demagnified image of the source is the root-power-sum method [Barth and Kruit, 1996]. In short, the method proceeds as follows: the total probe size, $d_p$, is a summation of four contributions. These are the geometrical probe size, $d_g$, the spherical aberration size, $d_s$, the chromatic aberration size, $d_c$, and the electron diffraction size (Airy disk), $d_{dif}$. The geometrical probe size, $d_g$, is related to the source size by the demagnification of the system. The virtual source current density has a Gaussian distribution [Hawkes and Kasper, 1989b]. An important issue in the root-power-sum method is the use of the correct width measures for the contributions to the probe size. As shown by Barth and Kruit [1996], one should use the smallest full width containing 50% of the current in the distribution (FW50) instead of the commonly used ones. The FW50 contributions from aberrations and diffraction are given by

\[ d_s = 0.177\, C_s \alpha^{3}, \qquad d_c = K_c C_c \frac{\Delta V}{V_{beam}}\, \alpha, \qquad \text{and} \qquad d_{dif} = 0.54\, \frac{\lambda}{\alpha} \tag{59} \]
The aberration coefficients, $C_s$ and $C_c$, of the SuperTwin objective lens of the microscope are 1.3 mm and 1.6 mm, respectively. Because the probe size is measured on the CCD, we have to take the aberrations from the condenser part as well as the objective part of the SuperTwin lens into account. We added both contributions linearly. At $V_{beam} = 300$ kV, the electron wavelength is $\lambda = 1.97$ pm. According to Barth and Nykerk [1999], the factor $K_c$ depends on the ratio between the FW50 and the FWHM and varies around 0.4–0.6. In the transmission electron microscope, the chromatic aberration of the objective lens can be safely neglected at high acceleration voltages and small $\alpha$.
Without this term, the contributions add up as

\[ d_p = \left[ \left( d_s^{4} + d_{dif}^{4} \right)^{1.3/4} + d_g^{1.3} \right]^{1/1.3} \tag{60} \]
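Equations (58) to (60) can be chained to go from a measured probe to a brightness value. The Python sketch below inverts Equation (60) for the geometrical probe size and then evaluates the reduced brightness. The measured probe size, the half-opening angle, and the probe current are hypothetical numbers chosen only to make the example run; treating the geometrical spot as a disc of diameter $d_g$ that carries the full probe current is likewise a simplifying assumption, and only the objective-lens $C_s$ from the text is used.

```python
import math


def aberration_sizes(c_s, wavelength, alpha):
    """FW50 spherical-aberration and diffraction sizes of Eq. (59), in m."""
    d_s = 0.177 * c_s * alpha ** 3
    d_dif = 0.54 * wavelength / alpha
    return d_s, d_dif


def total_probe_size(d_g, d_s, d_dif):
    """Root-power-sum of Eq. (60), chromatic term neglected."""
    mixed = (d_s ** 4 + d_dif ** 4) ** (1.3 / 4)
    return (mixed + d_g ** 1.3) ** (1 / 1.3)


def geometric_probe_size(d_p, d_s, d_dif):
    """Invert Eq. (60) for d_g given the measured total probe size d_p."""
    mixed = (d_s ** 4 + d_dif ** 4) ** (1.3 / 4)
    remainder = d_p ** 1.3 - mixed
    if remainder <= 0.0:
        raise ValueError("aberrations and diffraction exceed the probe size")
    return remainder ** (1 / 1.3)


def reduced_brightness(current, d_g, alpha, v_beam):
    """Eq. (58) for a disc of diameter d_g and solid angle pi * alpha^2."""
    area = math.pi * (d_g / 2.0) ** 2
    solid_angle = math.pi * alpha ** 2
    return current / (area * solid_angle * v_beam)


# C_s and wavelength from the text; the other values are hypothetical.
d_s, d_dif = aberration_sizes(c_s=1.3e-3, wavelength=1.97e-12, alpha=5e-3)
d_g = geometric_probe_size(d_p=1.0e-9, d_s=d_s, d_dif=d_dif)
b_r = reduced_brightness(current=1.0e-10, d_g=d_g, alpha=5e-3, v_beam=300e3)
print(f"d_g = {d_g * 1e9:.2f} nm, B_r = {b_r:.2e} A/(m^2 sr V)")
```

The guard in `geometric_probe_size` reflects the remark in the text that the correction is only meaningful as long as aberrations and diffraction do not dominate the measured probe.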
The correct powers in this addition were derived by Barth and Kruit [1996] by comparison with ray tracing. Once $d_g$, $\alpha$, and $I$ are found, the brightness of the beam can be calculated. In a SEM, a method similar to the one described in this section can be used. In this case the probe size must be determined by scanning the focused probe over a knife edge while recording the transmitted current. The method described in this section can also be used to determine if stray fields, mechanical vibrations, and transverse electron-electron interactions (trajectory displacement) limit the performance of an electron-beam instrument by determining brightness values at a range of extraction voltages and comparing them with theory.

C. Indirect Determination of Energy Distribution and Brightness

Section II showed that in an emission regime where a high emitter temperature is accompanied by a strong electric field, the emission properties of the emitter can change rapidly from Schottky emission to thermal-field emission. This results in a strong rise of the energy spread in the transition regime. As a result, it is essential to know the field at the emitter surface for an accurate calculation of the emission properties. The linear relation between the extraction voltage $V_{ext}$ and the electric field at the surface of the emitter $F$ is expressed as $F = \beta V_{ext}$, in which the factor $\beta$ is a complicated function of several geometrical and electrical parameters. The task of estimating the energy spread and the maximum brightness for certain operating conditions reduces simply to a determination of the field factor $\beta$ and the emitter temperature. There are several possibilities for obtaining a value for $\beta$. First, $\beta$ can be found from a calculation of the potential distribution in the space between emitter, suppressor, and anode. In principle, accurate results are possible by using suitable numerical methods [Kang et al., 1985].
However, it is known that the emitter shape may change during operation, which influences the field factor. It is, at least in principle, also possible to determine the field factor from current-voltage relations, similar to the use of the slope of the Fowler-Nordheim plot in the case of cold-field emission. In many cases [Swanson and Schwind, 1997; Kim et al., 1997], Schottky emission theory is applied, and thus a value for $\beta$ is found from the slope of the plot of $\ln I$ as a function of $\sqrt{V_{ext}}$; see
[Figure 14 plot: ln(Icol), from −17 to −12, versus V^0.5, from 50 to 90; curves for 1907 K, 1850 K, 1780 K, 1689 K, 1520 K, and 1478 K.]
FIGURE 14. Schottky plot of a ZrO/W emitter with an apex radius of curvature of 0.9 µm, obtained for six different temperatures.
Equation (22). An example of such a plot is shown in Figure 14. Linear least-squares fits through the data points yield $\beta$-values of $(1.2 \pm 0.1)\cdot 10^{5}$ m$^{-1}$ for the five highest temperatures. Only the value at 1478 K is significantly higher ($2\cdot 10^{5}$ m$^{-1}$). A closer examination of Figure 14 reveals that only the plot at 1907 K shows a true straight-line behavior; at the other temperatures an upward curvature of the experimental data points can be seen. This upward curvature is an indication that the use of the Schottky equation for the current density is incorrect and that the extended-Schottky model should be used instead. Swanson and Schwind [1997] propose a method of using the small upward curvature of such a plot for estimating the magnitude of the term $\pi q/\sin(\pi q)$ in the emitted current density in the extended-Schottky model. We feel, however, that there are better methods available for determining the field factor $\beta$. In Section V, Equation (30) is fitted to series of experimental energy distributions, obtained as a function of the extraction voltage. From the shape and position of the spectra, it is possible to obtain an accurate value for $\beta$.

D. Measurement of Emitter Stability

In order to measure the emission stability of electron emitters as a function of time, we used a simple vacuum system, as sketched in Figure 15. It contains
[Figure 15 drawing labels: mass spectrometer and vacuum gauge, retractable fluorescent screen, Ti sublimator, emitter, collector, connections to ion getter pump and turbomolecular pump.]
FIGURE 15. The experimental system used for monitoring electron emission as a function of time.
a ZrO/W emitter, together with its module. Opposite to the emitter module is a Faraday cup. The lens electrode of the module is used to focus the beam into the Faraday cup. The emission current was measured once every 10 s. A computer was used for logging the emission data to a file. We show some results in the next section.
V. EXPERIMENTS ON ZrO/W SCHOTTKY EMITTERS

A. Energy Spread of the Schottky Emitter

The first authors who studied the energy spread of the ZrO/W Schottky emitter in considerable detail were Bell and Swanson [1979]. In their paper, the authors present measurements on the total energy distribution, which show a considerable broadening at increasing emission current, resulting from the Boersch effect. The broadening is so large that it is hard to separate the intrinsic energy broadening resulting from the emission process from the electron-electron interactions. Energy broadening in the retarding-field analyzer used in Bell and Swanson’s experiments is probably the most important contributor to the large energy widths published. In the present work, we discuss energy distributions obtained with a deflection-type energy analyzer. In the first section, we describe our own experiments on a ZrO/W emitter with an apex radius of curvature of 0.9 µm. We use the experimental results to find an accurate estimate of the field factor $\beta$ for this emitter-extractor system and separate the intrinsic energy broadening from the additional broadening resulting from electron-electron interactions. In the
second section, we evaluate the experimental energy distributions published by Kim et al. [1997a, b], who measured energy distributions of the electrons emitted from a ZrO/W emitter with an apex radius of curvature of 0.3 µm. We now use the parabolic-barrier approximation to explain these experimental results and separate again the intrinsic energy broadening from the Boersch effect. Finally, we demonstrate the usefulness of the parabolic-barrier approximation for $\beta$-determination on experimental results taken from a paper by Gadzuk and Plummer [1971], who measured the energy distribution of a tungsten field emitter as a function of the field at 1570 K.

1. Energy Spread of a 0.9 µm Schottky Emitter

In order to measure the total energy distribution, we used the electron-energy analyzer discussed in Section IV.A. The Schottky emitter was mounted in an emitter module, which is shown in Figure 16, and placed in front of the entrance lens of the analyzer. The emitter was spot-welded onto a filament ribbon, which can be heated resistively. The heating current in the experiments ranged from 1.9 to 2.3 A, corresponding to emitter temperatures between 1200 and 1900 K, as determined with an optical pyrometer. Due to the wire resistance, the potential of the emitter tip will change slightly with varying heating current. The emitter is surrounded by three electrodes: the suppressor, the extractor, and the lens electrode. These electrodes are controlled with power supplies floating on the beam potential, $V_{beam}$, which was 400 V in all experiments. During the experiments the suppressor was kept at a constant voltage of 500 V with respect to the emitter. The extractor voltage varied between 3000 and 6000 V, relative to the emitter voltage. The upper value of the extraction voltage was set by the range of the high voltage supplies used. The lens electrode is meant for keeping the illumination area on the fluorescent screen constant when changing the extraction voltage.
In this setup, the extraction field on the emitter can be varied while the beam potential and, thus, the energy window of the analyzer remain constant. The emission current leaving the module, i.e., the beam current, is measured by observing the voltage drop over a resistor $R$ at the grounded side of the beam voltage power supply. As a result of this, the potential of the module changes slightly when the emission current changes. In the experimental results presented here this effect is corrected for. The power supplies shown in Figure 16 are computer-controlled; together with the computer-controlled electron energy analyzer, this makes it possible to obtain energy distributions for varying temperature and extraction voltage automatically. Using this setup, we measured energy distributions of a ZrO/W electron emitter with a radius of curvature of 0.9 µm at nine different temperatures ranging from 1200 K to 1900 K and seven extraction voltages ranging from 3000 V to 6000 V.
FIGURE 16. Photograph and schematic drawing of the emitter module, mounted on a five-axis manipulator in order to measure the energy distribution. In the schematic drawing the electrical connections to the electrodes of the emitter module are shown.
All 63 spectra were fitted with the total energy distributions given by the extended-Schottky model, Equation (30), in which the unit term in the denominator of the Fermi-Dirac distribution function is removed, and with the full numerical evaluation of the transmission coefficient, discussed in Section II, using only two free parameters for the fit. The first parameter is the factor $\beta$, describing the relation between the extraction voltage and the field on the emitter surface. We assume that this factor stays constant; in other words, we do not assume a change in emitter geometry during the experiment. We checked the validity of this assumption as follows. In scanning electron micrographs taken before and after the experiment, no significant changes in emitter shape were observed, even though the tip was cycled repeatedly from 1200 K to 1900 K while varying the electric field. Note that this assumption may not always be valid. We consider this problem in some more detail when we discuss measurements on the stability of the emitter. The second fit parameter is the offset from zero energy. For each temperature setting it is necessary to take a different offset value to account for the shift in voltage induced by the voltage drop of the heating supply of the emitter. Also, the value for the beam energy, 400 eV in this case, is included in this offset. Because this offset is not important in the following discussion, its value is not reported. For the extended-Schottky model the best fit was obtained for $\beta = 1.9\cdot 10^{5}$ m$^{-1}$. With the numerical model a somewhat higher value of $2.2\cdot 10^{5}$ m$^{-1}$, accompanied by a slightly different offset, was necessary for a good fit. The difference between the two models is probably caused by the fact that the WKB transmission coefficient is overestimated below the top of the potential barrier in the extended-Schottky model, as compared to the transmission coefficient found directly by numerical integration.
Let us compare these results with a determination of the field factor from a Schottky plot. The current-voltage measurements plotted in Figure 14 were recorded in between the energy distribution experiments described here. We found that $\beta = 1.2\cdot 10^{5}$ m$^{-1}$, a considerably lower value; note the sensitive dependency of the energy distribution on the field as plotted in Figure 5. It demonstrates that the Schottky plot cannot be used directly to estimate the field factor. As an example of the quality of the fits, consider Figure 17. The temperature of 1780 K, as measured with an optical pyrometer, is close to the optimum operating temperature of the Schottky emitter (1800 K). It is clear from the graph that the fit of the simulations to the experimental data is good, especially at low extraction voltages. We can check this by plotting the experimental data and the fitted theory logarithmically. The resulting curves are shown in Figure 18. The logarithmic plots show that far below the top of the barrier, the assumption of a parabolic barrier shape causes an overestimation of the emission current with respect to the numerical calculation.
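For reference, the Schottky-plot route to $\beta$ mentioned above amounts to a straight-line fit. The sketch below assumes the standard Schottky relation, in which the barrier is lowered by $\sqrt{e^{3}F/(4\pi\varepsilon_0)}$ with $F = \beta V_{ext}$, so that $\ln I$ is linear in $\sqrt{V_{ext}}$ (taken here to be what Equation (22) expresses). The data are synthetic, generated from an assumed $\beta$ and an arbitrary intercept, and the function names are illustrative. The example shows only the mechanics of the conversion from slope to $\beta$, not the experimental analysis.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge (C)
K_BOLTZMANN = 1.380649e-23   # Boltzmann constant (J/K)
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity (F/m)

# Schottky lowering in volts equals SCHOTTKY_COEFF * sqrt(F), F in V/m.
SCHOTTKY_COEFF = math.sqrt(E_CHARGE / (4.0 * math.pi * EPSILON_0))


def schottky_slope(beta, temperature):
    """Slope of ln(I) versus sqrt(V_ext) for pure Schottky emission."""
    kt_in_ev = K_BOLTZMANN * temperature / E_CHARGE
    return SCHOTTKY_COEFF * math.sqrt(beta) / kt_in_ev


def beta_from_slope(slope, temperature):
    """Invert schottky_slope for the field factor beta (1/m)."""
    kt_in_ev = K_BOLTZMANN * temperature / E_CHARGE
    return (slope * kt_in_ev / SCHOTTKY_COEFF) ** 2


def fit_slope(xs, ys):
    """Slope of the least-squares straight line through (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic Schottky plot: assumed beta and an arbitrary intercept of -17.
temperature = 1907.0
assumed_beta = 1.2e5
voltages = [3000.0 + 500.0 * k for k in range(7)]
sqrt_v = [math.sqrt(v) for v in voltages]
ln_current = [-17.0 + schottky_slope(assumed_beta, temperature) * x
              for x in sqrt_v]

slope = fit_slope(sqrt_v, ln_current)
recovered_beta = beta_from_slope(slope, temperature)
print(f"slope = {slope:.4f} per sqrt(V), beta = {recovered_beta:.3e} 1/m")
```

Any curvature in the measured plot, as seen in Figure 14 at the lower temperatures, violates the straight-line assumption of this fit, which is consistent with the conclusion in the text that the Schottky plot cannot be used directly to estimate the field factor.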
[Figure 17 plot, T = 1780 K: stacked spectra for Vext = 3000 V to 6000 V in steps of 500 V versus kinetic energy (eV), 391 to 395; legend: experimental results, numerical calculation (βnum = 2.2·10^5 m^-1), extended-Schottky model (βext = 1.9·10^5 m^-1).]
FIGURE 17. Energy distributions for a Schottky emitter with a radius of curvature of 0.9 µm. The extraction voltages range from 3 kV to 6 kV, and the emitter temperature is 1780 K. The experimental spectra are fitted with the extended-Schottky model, using $\beta = 1.9\cdot 10^{5}$ m$^{-1}$ (dashed line), and the numerical model, with $\beta = 2.2\cdot 10^{5}$ m$^{-1}$ (thin line).
Both plots show that, at higher extraction voltages, the experimental energy distribution is broader than the simulated spectrum. We believe that this is due to electron-electron interactions. In order to check this assumption, we show spectra taken at the highest temperature in our experiment, 1907 K (Figures 19 and 20) and at the lowest temperature of 1210 K (Figures 21 and 22).
[Figure 18 plot: intensity (arb. units), logarithmic axis from 10^-4 to 10^6, versus kinetic energy (eV), 391 to 395.]
FIGURE 18. Figure 17 plotted logarithmically.
At 1907 K, the broadening of the experimental spectra with respect to the calculations becomes evident at medium extraction voltages, supporting the assumption that this effect is due to e-e interactions. This broadening is discussed in more detail in Section V.A.4. In contrast to this, at the lowest temperature we do not expect additional broadening. In this case, however, the value of q increases, and the accuracy of the extended-Schottky model for electron emission should decrease. Figures 21 and 22 show energy spectra obtained at 1210 K. Especially in the logarithmic plot, it is clear that the extended-Schottky model is not able to
[Figure 19 plot, T = 1907 K: stacked spectra for Vext = 3000 V to 6000 V in steps of 500 V versus kinetic energy (eV), 391 to 395; legend: experimental results, numerical calculation (βnum = 2.2·10^5 m^-1), extended-Schottky model (βext = 1.9·10^5 m^-1).]
FIGURE 19. Energy distributions for a Schottky emitter with a radius of curvature of 0.9 µm. The emitter temperature is 1907 K. Note the deviation of the experimental energy spread from the simulations for increasing extraction voltages.
follow the experimental distribution far below the top of the potential barrier. In this region however, the intensity is low. In Figure 22 the energy distribution is drawn in the parabolic-barrier approximation, i.e., Equation (30) in its full form, with a dotted line, in order to show the difference. A comparison reveals that the difference for this range of fields occurs only in a region
[Figure 20 plot: intensity (arb. units), logarithmic axis from 10^-4 to 10^6, versus kinetic energy (eV), 391 to 395.]
FIGURE 20. Figure 19 plotted logarithmically.
where the intensity is low, so we can safely use Equation (31) for calculating the emission current density for this emitter. Note that this is not allowed at higher field strengths. In Figure 23 the full width at half maximum (FWHM) of each experimental spectrum is plotted as a function of the calculated emission current density. The solid lines in Figure 23 represent FWHM values for the same conditions as in the experiment, as calculated with the total energy distribution in the extended-Schottky model, $dj_{ES}$. At low current densities, the experimental FWHM values correspond reasonably to the calculated energy spreads. For current densities exceeding $j_{ES} = 10^{8}$ A/m$^{2}$, the experimental spread deviates
[Figure 21 plot: experimental results, numerical calculation (βnum = 2.2 × 10⁵ m⁻¹), and extended-Schottky model (βext = 1.9 × 10⁵ m⁻¹) at T = 1210 K, for Vext = 3000 V to 6000 V in steps of 500 V. Horizontal axis: kinetic energy (eV).]
FIGURE 21. Energy distributions for a Schottky emitter with a radius of curvature of 0.9 μm. The emitter temperature is 1210 K.
from the calculated FWHM values considerably, again due to the Boersch effect.
2. Energy Spread of a 0.3 μm Schottky Emitter
Energy spectra of a Schottky emitter with a radius of curvature of 0.3 μm were published by the IBM group [Kim et al., 1997a, b]. These spectra exhibit large FWHM values accompanied by a strong change in the shape of
[Figure 22 plot: intensity (arb. units), logarithmic scale, versus kinetic energy (eV).]
FIGURE 22. Figure 21 plotted logarithmically. The dotted line is the energy distribution calculated with Equation (30) in its full form, for comparison with the distributions calculated with the extended-Schottky model.
the spectra for increasing fields. Figure 24 shows experimental energy spectra, taken from Figure 3 of the paper by Kim et al. [1997a], at an emitter temperature of 1800 K. Let us analyze these spectra by fitting the parabolic-barrier approximation and the numerical model to them. Again, only two fit parameters are used: the field factor β and an offset to the graphs accounting for the beam potential. In this case, it was necessary to use Equation (30) in its exact form. For the parabolic-barrier approximation, the best fit was found for β = 8.2 × 10⁵ m⁻¹, and for the numerical model β = 9.7 × 10⁵ m⁻¹ yielded the best results. By fitting each spectrum with an
[Figure 23 plot: FWHM energy spread (eV) versus emission current density (A/m², logarithmic scale from 10⁴ to 10⁹), with curves for emitter temperatures of 1210, 1300, 1400, 1478, 1520, 1689, 1780, 1850, and 1907 K.]
FIGURE 23. FWHM energy widths for the Schottky emitter with a radius of curvature of 0.9 μm, plotted as a function of the calculated current density using Equation (31) and β = 1.9 × 10⁵ m⁻¹. The solid lines are calculated FWHM values. It can be seen that for current densities exceeding 10⁸ A/m², the width of the calculated spectra is smaller than the width of the measured ones.
individual β, the fit becomes slightly better, the necessary variation in β being below 10%. Note that the difference between this value for β and our own result from the previous section is due not only to the variation in emitter radius, but also to the dimensions of the emitter module, especially the distance between emitter and extractor. At the lowest fields in Figure 24 the fit of both models is in good agreement with the experimental results. For increasing fields, a symmetrical broadening of the experimental energy distribution with respect to the numerical model occurs. We did not succeed in explaining this effect with a different choice for β. We assume, as before, that this additional broadening is due to electron-electron interactions. The IBM results at a lower emitter temperature, taken from Figure 5 of the paper by Kim et al. [1997a], support this conclusion. Figure 25 shows these results fitted with the parabolic-barrier approximation and the numerical model. A slightly different value for β was necessary to obtain the best fit. The graphs illustrate the limit of usefulness of the parabolic-barrier approximation in representing the exact shape of the energy distribution: the peak shift is accurate, but the leading edge of the spectrum is distorted, because the parabolic approximation of the barrier becomes too inaccurate. Figures 6 and 7 showed, however, that two parameters of practical importance, the position of the spectrum and the width of the distribution, are well described by the energy distribution in the parabolic-barrier approximation. This fact is used to evaluate the experimental data from Kim et al. [1997b]. In this paper, a series of energy distributions is shown in which the width
[Figure 24 plot: experimental results, numerical calculation (βnum = 8.9 × 10⁵ m⁻¹), and parabolic-barrier approximation (βPBA = 8.2 × 10⁵ m⁻¹) at T = 1800 K, for Vext = 1500, 2000, 2300, 2400, and 2500 V. Horizontal axis: kinetic energy (eV).]
FIGURE 24. Energy distributions of a Schottky emitter with an apex radius of curvature of 0.3 μm, taken from the paper by Kim et al. [1997a]. The experimental results were fitted with the parabolic-barrier approximation and the numerical model, both described in Section II.
of the distribution decreases for increasing extraction voltage. It can readily be shown with the parabolic-barrier approximation that this is a result of the emission process. Figure 26 shows experimental full width at half maximum values of the energy distributions taken from the paper by Kim et al. [1997b]. The FWHM values calculated from the parabolic-barrier approximation are drawn with a straight line. The only parameter used for the fit is the field factor β. Again, we find a good fit for β = 8.2 × 10⁵ m⁻¹. For the highest emitter temperature, the experimental energy spread is larger than expected from theory, because of the Boersch effect. At the lowest temperature of 1330 K, it can be seen that the energy spread decreases for increasing field strength, consistent with theory. A similar plot can be made of the peak position of the distributions as a function of the field. Especially in the transition regime between thermionic and field emission, the position of the spectrum is sensitive to the field strength.
[Figure 25 plot: experimental results, numerical calculation, and parabolic-barrier approximation at T = 1450 K. Horizontal axis: kinetic energy (eV). Fitted field factors per spectrum:
Vext = 2000 V: βnum = 9.7 × 10⁵ m⁻¹, βPBA = 8.5 × 10⁵ m⁻¹
Vext = 1900 V: βnum = 9.0 × 10⁵ m⁻¹, βPBA = 8.1 × 10⁵ m⁻¹
Vext = 1800 V: βnum = 9.0 × 10⁵ m⁻¹, βPBA = 7.8 × 10⁵ m⁻¹
Vext = 1590 V: βnum = 9.0 × 10⁵ m⁻¹, βPBA = 7.6 × 10⁵ m⁻¹
Vext = 1000 V: βnum = 9.5 × 10⁵ m⁻¹, βPBA = 7.7 × 10⁵ m⁻¹]
FIGURE 25. Energy distributions of a Schottky emitter with an apex radius of curvature of 0.3 μm at 1450 K, taken from the paper by Kim et al. [1997a]. The same fitting procedure was used as in Figure 24. For the low emission current density associated with lower emitter temperatures, it can be seen that the Boersch effect is absent.
Figure 27 compares the peak position of the spectra from the paper by Kim et al. [1997b] with the peak maxima calculated from the parabolic-barrier approximation. As an additional fit parameter, an offset from the beam potential to a position relative to the Fermi level was again used. For each emitter temperature we had to find an individual offset value, suggesting that the heater supply is located asymmetrically, similar to the authors’ experimental setup, sketched in Figure 16. It can be seen that the experimental results follow the sudden shift in position of the spectrum in the transition regime well. The results demonstrate that it should be possible to determine the field factor β with a very simple energy analyzer, for instance a retarding-field analyzer. Although the energy broadening resulting from electron-electron interactions (Boersch effect) in such an analyzer will be considerable, the position of the maximum of the distribution is rather insensitive to the symmetrical energy-broadening effect.
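The two-parameter fit described here (a field factor plus an energy offset) can be sketched as a simple grid scan. This is only an illustration under stated assumptions, not the authors' procedure: `peak_model` is a hypothetical stand-in for the peak position that would follow from Equation (30), which is not reproduced in this sketch, and the toy linear model and all numbers in the usage lines are invented for demonstration.

```python
def fit_beta_and_offset(voltages, measured_positions, peak_model, beta_grid):
    """Scan candidate field factors and return (beta, offset, sse) of the best fit.

    For every candidate beta the field is taken as F = beta * V, the peak
    position is predicted with the supplied model, and the energy offset
    that minimises the squared error (the mean residual) is applied.
    """
    best = None
    for beta in beta_grid:
        predicted = [peak_model(beta * v) for v in voltages]
        residuals = [m - p for m, p in zip(measured_positions, predicted)]
        offset = sum(residuals) / len(residuals)  # least-squares offset
        sse = sum((r - offset) ** 2 for r in residuals)
        if best is None or sse < best[2]:
            best = (beta, offset, sse)
    return best


# Hypothetical toy model (NOT Equation (30)): peak position linear in the field.
def toy_peak_model(field):
    return 1.0e-9 * field


voltages = [1500.0, 2000.0, 2300.0, 2400.0, 2500.0]  # extraction voltages in V
synthetic = [toy_peak_model(8.2e5 * v) + 0.3 for v in voltages]  # invented data
beta_grid = [b * 1.0e4 for b in range(70, 91)]  # 7.0e5 ... 9.0e5 m^-1
best_beta, best_offset, best_sse = fit_beta_and_offset(
    voltages, synthetic, toy_peak_model, beta_grid
)
```

Removing the offset analytically for each candidate keeps the scan one-dimensional, which mirrors the observation that a separate offset had to be found for each emitter temperature.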
[Figure 26 plot: four panels (T = 1800 K, 1700 K, 1450 K, 1330 K) showing FWHM energy spread (eV) versus electric field strength (V/m, logarithmic scale from 10⁸ to 10¹⁰).]
FIGURE 26. Full width at half maximum of the energy distributions presented in the paper by Kim et al. [1997b]. The lines are calculated with Equation (30). The only fit parameter is the field factor β, taken as 8.2 × 10⁵ m⁻¹.
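Since the comparisons in this section rest on FWHM values read from sampled spectra, a minimal sketch of such an extraction may be helpful. It assumes a single-peaked spectrum and locates the half-maximum crossings by linear interpolation; the function name and the synthetic Gaussian test spectrum are illustrative choices, not taken from the original analysis.

```python
import math


def fwhm(energies, intensities):
    """Full width at half maximum of a single-peaked sampled spectrum."""
    if len(energies) != len(intensities) or len(energies) < 3:
        raise ValueError("need at least three (energy, intensity) samples")
    peak = max(range(len(intensities)), key=lambda i: intensities[i])
    half = intensities[peak] / 2.0

    def crossing(i_in, i_out):
        # Interpolate between a sample at or above half maximum (i_in)
        # and its neighbour below half maximum (i_out).
        e_in, e_out = energies[i_in], energies[i_out]
        y_in, y_out = intensities[i_in], intensities[i_out]
        return e_in + (half - y_in) * (e_out - e_in) / (y_out - y_in)

    # Walk outwards from the peak while the samples stay above half maximum.
    left = peak
    while left > 0 and intensities[left - 1] >= half:
        left -= 1
    right = peak
    while right < len(intensities) - 1 and intensities[right + 1] >= half:
        right += 1
    if left == 0 or right == len(intensities) - 1:
        raise ValueError("spectrum does not drop below half maximum on both sides")
    return crossing(right, right + 1) - crossing(left, left - 1)


# Synthetic Gaussian spectrum (invented data) centred at 393 eV, sigma = 0.2 eV.
energies = [391.0 + 0.01 * i for i in range(401)]
spectrum = [math.exp(-0.5 * ((e - 393.0) / 0.2) ** 2) for e in energies]
width = fwhm(energies, spectrum)
```

For a Gaussian the result should approach 2·sqrt(2 ln 2)·σ; for noisy experimental spectra some smoothing before the crossing search would probably be needed.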
3. Intermezzo: Energy Spread of the Heated Tungsten Cathode
Experimental determinations of the total energy distribution in the transition regime between thermionic and field emission are rare in the literature. Apart from the recent paper by Kim et al. [1997b], there is one earlier paper by Gadzuk and Plummer [1971], who measured the energy distribution of a tungsten emitter as a function of the field strength at an emitter temperature of 1570 K. In the paper, the authors show that the experimental results correspond well with theoretical spectra, in which a modified Gamov factor was used to describe the tunnel probability near the top of the barrier. With this method, it is necessary to integrate the Gamov factor numerically in order to obtain an energy distribution. The analytical expression found with the assumption of a parabolic barrier, however, is also well suited to explaining the measured energy width and position of the maximum for this emitter type, as will be demonstrated. Consider the full width at half maximum values and peak positions from Figure 2(a) of Gadzuk and Plummer’s paper. We plot these values in Figures 28 and 29. We fit the data points with the calculated FWHM using Equation (30). The work function is taken as 4.5 eV, as normal for tungsten. The only fit
[Figure 27 plot: four panels (T = 1800 K, 1700 K, 1450 K, 1330 K) showing the position relative to the Fermi level (eV) versus electric field strength (V/m, logarithmic scale from 10⁸ to 10¹⁰).]
FIGURE 27. The position of the maximum of the energy distribution as a function of temperature and field. The lines are the positions calculated from Equation (30). The experimental values are taken from the paper by Kim et al. [1997b]. Again, the field factor β is taken as 8.2 × 10⁵ m⁻¹. An additional fit parameter used was the offset from the beam energy to a position relative to the Fermi level.
parameter is the field factor β. The best fit is found for β = 1.85 × 10⁶ m⁻¹. The plot of the FWHM shows that the energy width can rise by a factor of five in the transition regime, as confirmed experimentally. The good correspondence of the results with theory, even at the highest currents, shows that additional broadening due to the Boersch effect is negligible for this emitter in the range of fields considered. For emitters with a high work function, the sudden shift in peak position in the transition regime is even more pronounced than for the ZrO/W emitter. Again, the shift of the peak of the energy distribution can be used for a determination of the field factor β. From a fit of the peak positions calculated with the aid of Equation (30) to the experimental data we find that β = 1.9 × 10⁶ m⁻¹. It was necessary to apply a small shift of 0.15 eV to the experimental peak positions for a good fit in the y-direction of the plot. This shows that an evaluation of the peak shift as a function of the extraction voltage can well be used to extract a value for the field factor β.
[Figure 28 plot: FWHM energy spread (eV) versus electric field strength (V/m, logarithmic scale from 10⁸ to 10¹⁰).]
FIGURE 28. Experimental FWHM energy widths of a tungsten emitter operated at 1570 K, taken from the paper by Gadzuk and Plummer [1971] and fitted with the theoretical widths calculated from Equation (30).
4. Boersch Effect for Schottky Emitters
Let us take a closer look at the broadening of the experimental energy distributions for higher j. In Figure 19 the effect is clearly visible. It is necessary to check the assumption that this broadening is due to the Boersch effect. Section III discussed two approximations for estimating this effect. The first one is a modification of Jansen’s theory for the Boersch effect in a beam segment with a crossover [Kruit and Jansen, 1997], in which the velocity of the electrons is taken as constant. The second is the theory by Knauer [1981], which takes the acceleration of electrons in the vicinity of the emitter into account. To compare the measured magnitude of the Boersch effect with Jansen’s expressions, we need to find values for the beam length, L, the emission current, I, the half-opening angle of the source, α, and the radius of the emitting area, rc. Note that rc is not equal to the emitter apex radius of curvature rtip that is used in Knauer’s model. These parameters were elucidated in Figure 9. The beam length, L, is simply the distance between the tip apex and the beam-limiting aperture in the extractor and equals 2.8 mm. We cannot use the total emitted current as a value for I, because it is well known that a large part of the current of a ZrO/W emitter is emitted in off-axis directions [Swanson and Schwind, 1997]. Instead, we estimate the current, If, in the
[Figure 29 plot: position relative to the Fermi level (eV) versus electric field strength (V/m, logarithmic scale from 10⁸ to 10¹⁰).]
FIGURE 29. Experimental positions of the maximum of energy distributions of a tungsten emitter operated at 1570 K, taken from the paper by Gadzuk and Plummer [1971] and fitted with the theoretical positions calculated from Equation (30).
central spot coming from the flat facet on the apex of the emitter. In our total energy-distribution measurements we recorded the angular current density leaving the emitter module. The facet current If can be calculated from these data using the fact that the half-opening angle α of the beam coming from the facet is 7° [Tuggle, 1984]. Finally, the crossover radius, rc, is calculated from the emission area A = If/jES. For q-values exceeding 0.7, jPBA must be used. From the experimental data we find that the crossover radius is close to 0.1 μm. We take the mean value of our experimental data: rc = 0.12 μm. In the analytical approximations to the statistical Boersch effect in a narrow crossover, Jansen distinguishes four regimes that can dominate the energy broadening: Gaussian, Holtsmarkian, Lorentzian, and pencil beam. Section III showed that the dominant regimes for the ZrO/W electron emitter are the Holtsmarkian regime and the Lorentzian regime. For a comparison of the results obtained from the calculations with experimental values, the results of Figure 23 are replotted in Figure 30, but now as a function of the extraction voltage. The four highest temperatures are plotted. The intrinsic energy spread is drawn with a solid line. The additional broadening is shown in the lower part of the graphs. The outcome of Jansen’s equations is drawn with a dashed line, whereas Knauer’s model is drawn with
[Figure 30 plot: four panels (T = 1689 K, 1780 K, 1850 K, 1907 K) showing FWHM energy spread (eV) versus extraction voltage (kV).]
FIGURE 30. Experimental (individual points) and calculated FWHM energy spread of the ZrO/W Schottky emitter with a radius of curvature of 0.9 μm. The intrinsic energy spread is drawn with a solid line; the lower dashed and dotted curves are the energy-broadening curves in Jansen’s and Knauer’s models, respectively. The upper dashed and dotted lines are quadratic additions of the broadening values to the intrinsic energy distribution.
a dotted line. The additional energy broadening due to the Boersch effect is added quadratically to the energy width predicted by the emission theory and is also plotted with dashed and dotted lines. For the addition of the four components of Jansen’s broadening theory, we used the addition rule given in Section III. In Figure 31 the experimental FW50 energy spreads are compared with theoretical values, with and without the Boersch effect. It is clear from Figures 30 and 31 that both theories can be used to describe the magnitude of the Boersch effect in the vicinity of the ZrO/W emitter. The difference between Jansen’s theory and the experimentally observed broadening is less than 14%. With Knauer’s theory the largest difference is 18%. For sharper emitters, we found that Knauer’s model has a larger range of validity [Fransen, 1999]. We believe that the presence of large field gradients in the vicinity of very small emitters with an apex radius of curvature below 10 nm causes a misestimation of the correct broadening regime in Jansen’s model. More theoretical work has to be done on this subject.
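The quadratic addition of the Boersch broadening to the intrinsic width can be written down in a few lines. The sketch below is only an illustration of that addition rule; the function names are hypothetical, the numerical values are invented rather than taken from Figures 30 and 31, and the separate rule for combining the four regimes of Jansen's theory (Section III) is not reproduced.

```python
import math


def total_energy_spread(intrinsic_ev, boersch_ev):
    """Quadratic (root-sum-square) addition of two energy widths in eV."""
    return math.hypot(intrinsic_ev, boersch_ev)


def boersch_contribution(measured_ev, intrinsic_ev):
    """Broadening implied by a measured width, assuming quadratic addition."""
    if measured_ev < intrinsic_ev:
        raise ValueError("measured width is smaller than the intrinsic width")
    return math.sqrt(measured_ev ** 2 - intrinsic_ev ** 2)


# Invented example values in eV, for illustration only.
predicted_width = total_energy_spread(0.30, 0.40)
implied_boersch = boersch_contribution(0.50, 0.30)
```

The second function inverts the same relation, which is one way to estimate the additional broadening from an experimental width once the intrinsic spread is known from the emission theory.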
[Figure 31 plot: four panels (T = 1689 K, 1780 K, 1850 K, 1907 K) showing FW50 energy spread (eV) versus extraction voltage (kV).]
FIGURE 31. A repeat of Figure 30, but now the FW50 values are calculated as a function of the extraction voltage. Only the results from Jansen’s model are shown.
The accuracy of the estimation of the Boersch effect for the ZrO/W emitter can be tested further by using the results presented by Kim et al. [1997a] in combination with the analysis of these results in Section V.A.2. The estimation of the Boersch effect follows the same route as indicated in the previous paragraphs, with one minor difference: instead of jES, the current density in the parabolic-barrier approximation, jPBA, is used, obtained by numerical integration of Equation (30). The facet current If is calculated with the assumption that the half-angle of the emission cone coming from the facet is the same as in our own experiments (7°). With these assumptions, we arrive at a facet radius, rc, of only 15 nm. The resulting FWHM energy spreads at 1800 K are plotted in Figure 32. Again, the energy broadening obtained with Knauer’s model is drawn with a dotted line, and Jansen’s model is drawn with a dashed line. The broadening is added quadratically to the intrinsic energy broadening. Also in this case the calculated FWHM values correspond well to the experimental energy spreads. The error in the estimation is below 10%. In this case, it can be seen that the intrinsic energy spread decreases when the extraction voltage exceeds 2.6 kV. However, the Boersch effect is so strong that this will not be visible in the experiment.
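The chain from angular current density to facet current, emission area, and facet radius can be sketched as follows. The sketch assumes a uniform angular current density over the emission cone and a circular emitting area, neither of which is stated explicitly in the text; the function names and the input numbers in the usage lines are hypothetical.

```python
import math


def cone_solid_angle(half_angle_rad):
    """Solid angle (sr) of a cone with the given half-opening angle."""
    return 2.0 * math.pi * (1.0 - math.cos(half_angle_rad))


def facet_current(angular_current_density, half_angle_rad):
    """Facet current (A), assuming a uniform angular current density (A/sr)."""
    return angular_current_density * cone_solid_angle(half_angle_rad)


def emission_radius(facet_current_a, current_density):
    """Radius (m) of a circular area A = I_f / j carrying the facet current."""
    area = facet_current_a / current_density
    return math.sqrt(area / math.pi)


# Hypothetical inputs: 0.2 mA/sr angular current density, j = 1e8 A/m^2.
half_angle = math.radians(7.0)           # half-opening angle quoted in the text
i_facet = facet_current(2.0e-4, half_angle)
r_c = emission_radius(i_facet, 1.0e8)    # radius in metres
```

With inputs of this order the radius comes out as a fraction of a micrometre; the actual values depend on the measured angular current density and on whether jES or jPBA is the appropriate current density.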
[Figure 32 plot: FWHM energy spread (eV) versus extraction voltage (kV, 1.0 to 3.0) at T = 1800 K.]
FIGURE 32. A repeat of Figure 30, but now the Schottky emitter has an apex radius of curvature of 0.3 μm. The data were taken from the paper by Kim et al. [1997a].
B. Brightness of the Schottky Emitter
In spite of the importance of the reduced brightness for electron microscopy, experimental determinations of this quantity are rare in the literature. We are aware of two previous attempts in which the brightness of the ZrO/W emitter was determined experimentally. Otten [1994] reported a reduced brightness of 2 × 10⁷ A/(m² sr V), measured in the TEM on an emitter with an apex radius of curvature of 0.9 μm. Samoto et al. [1985] measured a value of 4 × 10⁷ A/(m² sr V), using an emitter with an apex radius of curvature of 0.8 μm. In both cases the brightness was determined from probe-size and probe-current measurements. For the cold-field emitter, more results are available. The highest brightness value for a cold-field emitter measured in the electron microscope was reported by Speidel and Kurz [1997]. The brightness was calculated from the decrease in modulation depth of hologram fringes, produced with a biprism, as a function of the illumination angle. The emitter was operated at 2.5 kV, yielding an emission current of 1 μA. The authors found, for a tungsten ⟨310⟩-oriented tip operated at room temperature, that Br = 2 × 10⁸ A/(m² sr V). The experiment was repeated by Lauer, Hanszen, and Ade [1985] and Hanszen, Lauer, and Ade [1985], who determined a reduced brightness of 2 × 10⁶ A/(m² sr V) with the
same method and 2 × 10⁷ A/(m² sr V) from probe-size and probe-current measurements in STEM mode. According to these authors, the difference between the methods results from stray fields. It is difficult to estimate the magnitude of such disturbances, because for calculating the theoretical brightness it is necessary to know the field on the emitter surface. In the authors’ experiment an emitter was used with the same apex radius of curvature as for the energy-distribution measurements, in a similar module. For a comparison of the experimental brightness values with theory we can therefore use the same value for β, 1.9 × 10⁵ m⁻¹. Let us describe our experimental results in some more detail. The current in the spot was measured on the fluorescent screen of the microscope. For extraction voltages ranging from 2.75 kV to 4 kV, the probe current varied between 0.07 nA and 0.3 nA. From the size of the spot in the diffraction plane of the microscope, we found a half-opening angle of the beam of 6.2 mrad. The magnification of the projection system was 1.2 × 10⁶. In Figure 33(a) we show an enlarged image of the probe. In order to improve visibility in print, a gamma correction of 3 is applied to Figure 33(a). In Figure 33(b) a contour plot of the same data is shown, without gamma correction. The contours are plotted at 10%, 20%, . . ., of the maximum intensity. Between successive images, the beam is blanked in order to protect the camera. For short exposure times, the influence of the beam blanker is visible in the low-intensity asymmetric features in the contour plot. Its influence was minimized by taking the smallest cross section through the spot, perpendicular to the low-intensity tail. The direction of such a cross section is shown in Figure 33(b) with a dashed line. In Figure 34 the cross section is plotted.
[Figure 33 images: CCD image (left) and contour plot (right) of the focused probe; both axes in pixel number (0 to 150); scale bar 0.5 nm.]
FIGURE 33. A typical CCD image of the focused probe (left), together with a contour plot (right). A gamma correction of 3 was used for the CCD image. The lines drawn in the contour plot correspond to intensities relative to the maximum of 10%, 20%, and so on. The asymmetries in the spot at low intensities are an artifact due to the beam blanker, visible at short exposure times of the camera. A one-dimensional cross section (denoted with a dashed line) is drawn through the profile in such a way that the asymmetries are minimized.
[Figure 34 plot: intensity on CCD versus position in specimen plane (nm, 0.0 to 2.0); fitted FWHM = 0.38 nm.]
FIGURE 34. The one-dimensional cross section along the dashed line through the contour plot of the spot shown in Figure 33. The Gaussian fitted through the experimental data points demonstrates that the simplification of taking the one-dimensional cross section of the beam is justified.
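The equality of FW50 and FWHM for a round two-dimensional Gaussian spot, on which this simplification relies, can be checked numerically. The sketch below integrates the radial intensity profile with a midpoint rule; the function names and the value of σ are illustrative assumptions only.

```python
import math


def gaussian_fwhm(sigma):
    """FWHM of a one-dimensional cross section through a Gaussian spot."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma


def enclosed_fraction(diameter, sigma, steps=20000):
    """Fraction of the current of a round 2-D Gaussian inside a given diameter.

    The radial integral of exp(-r^2 / (2 sigma^2)) * r is evaluated with the
    midpoint rule and normalised by its value over all radii, sigma^2.
    """
    radius = diameter / 2.0
    dr = radius / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += math.exp(-r * r / (2.0 * sigma * sigma)) * r * dr
    return total / (sigma * sigma)


sigma_nm = 0.16                      # illustrative spot parameter in nm
fraction_inside_fwhm = enclosed_fraction(gaussian_fwhm(sigma_nm), sigma_nm)
```

The enclosed fraction within a diameter equal to the one-dimensional FWHM comes out at one half, which is the defining property of the FW50; for non-Gaussian profiles the two measures generally differ.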
The spot image shown in Figure 33 is not purely a demagnified image of the source, but it is enlarged by the influence of lens aberrations and electron diffraction. These effects are corrected for with the root-power-sum method [Barth and Kruit, 1996] described in Section IV.B. In order to use this method, the two-dimensional FW50 of the focused probe must be found. The determination of this quantity is simplified considerably if the spot profile is Gaussian, because for a two-dimensional Gaussian beam, the FW50 equals the FWHM of a one-dimensional cross section. In Figure 34 a Gaussian curve has been fitted through the one-dimensional cross section of the spot shown in Figure 33. Although this method is an approximation of the correct way of determining the two-dimensional FW50 from Figure 33, the minor differences between the measured spot profile and the Gaussian fit indicate that, in this case, the error in using the FWHM of a one-dimensional cross section is insignificant for the determination of dp. Typical 1-D FWHM spot sizes are about 0.4 nm. The geometrical probe size, dg, can now be calculated by inverting Equation (60). The reduced brightness can then be calculated with the use of the well-known experimental definition [Barth and Kruit, 1996], in which the total current in the probe, I, is divided by the FW50 probe area, 0.25πdg², the solid angle, Ω = πα², and the acceleration voltage, V.
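As a worked illustration of this experimental definition, the sketch below evaluates Br = I / (0.25π dg² · πα² · V). The probe current and half-opening angle are of the order quoted in the text; the geometrical probe size and the acceleration voltage are hypothetical placeholders, because neither the result of inverting Equation (60) nor the acceleration voltage is given here.

```python
import math


def reduced_brightness(probe_current, geometric_probe_size, half_angle, voltage):
    """Reduced brightness in A/(m^2 sr V).

    probe_current        -- total current in the probe (A)
    geometric_probe_size -- FW50 geometrical probe diameter d_g (m)
    half_angle           -- half-opening angle of the beam (rad)
    voltage              -- acceleration voltage (V)
    """
    probe_area = 0.25 * math.pi * geometric_probe_size ** 2
    solid_angle = math.pi * half_angle ** 2
    return probe_current / (probe_area * solid_angle * voltage)


b_r = reduced_brightness(
    probe_current=0.3e-9,          # 0.3 nA, upper end of the quoted range
    geometric_probe_size=0.3e-9,   # hypothetical d_g after correction
    half_angle=6.2e-3,             # 6.2 mrad, as quoted in the text
    voltage=200.0e3,               # hypothetical acceleration voltage
)
```

Because the brightness scales with the inverse square of dg, the root-power-sum correction from the measured dp to dg has a strong influence on the result.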
[Figure 35 plot: reduced brightness (A/(m² sr V), logarithmic scale from 10⁶ to 10⁹) versus extraction voltage (kV, 2 to 5); theory curve for β = 1.9 × 10⁵ m⁻¹.]
FIGURE 35. Reduced brightness as a function of the extraction voltage. Measured data points are plotted without (•) and with (△) root-power-sum correction. The maximum axial brightness predicted by the emission theory is plotted with a solid line. A field factor β = 1.9 × 10⁵ m⁻¹ is assumed.
In Figure 35 we plot the reduced brightness without (•) and with (△) the root-power-sum correction as a function of the extraction voltage. It can be seen that the uncorrected experimental results are of the same order of magnitude as those reported by Otten [1994] and Samoto et al. [1985]. To calculate the theoretical reduced brightness as a function of the extraction voltage, F = βV is substituted into Equation (32), using the value for β that was found in Section V.A.1. The resulting brightness values are drawn with a solid line in Figure 35. The experimentally determined values of the reduced brightness, corrected for lens aberrations and electron diffraction, are close to the maximum axial brightness in the extended-Schottky model. To our knowledge, this is the first quantitative confirmation of this equation. The close correspondence strongly suggests that the influence of external disturbances, like stray fields and mechanical vibrations, was negligible in this experiment. Furthermore, the fact that the agreement between the experimental results and the emission theory is good even at the highest extraction voltages demonstrates that transverse electron-electron interactions, which would decrease the experimental brightness, are negligible in this experiment. It may well be, however, that at higher emission currents, trajectory displacement is present [Tiemeijer, 1999].
Finally, the experimental brightness of the ZrO/W emitter in the transmission electron microscope is comparable with the results reported for cold field emitters in similar instruments.
C. Emission Stability
One of the most important advantages of the Schottky emitter is the relatively large emitting area, combined with a high operating temperature. Because of this, impurities and inhomogeneities on the surface do not have a large influence on the total emission current and are quickly diffused away and annealed. However, it is well known that the shape of the emitter under an applied field may not be stable, because of the dynamic equilibrium between the surface tension on the emitter and the electric field [Swanson and Schwind, 1997]. A change in the electric field can cause a slow drift of the size and shape of the emitting facet, which may last until a new equilibrium state is found. The high temperature of the Schottky emitter helps to speed up this process considerably. We studied this effect in a small vacuum system, sketched in Figure 15, in which a module containing a ZrO/W emitter with a radius of curvature of 0.53 μm was mounted (see Figure 16) facing a Faraday cup. Once every 10 s, the current leaving the emitter module and the current entering the Faraday cup are measured, and the data are stored in a personal computer. After the first 3 days, during which small changes to the operating parameters of the tip were made, the heating current was fixed to 2.4 A and the extraction voltage to 5.2 kV. During the startup, the emission pattern was recorded a few times. Figure 36 shows these micrographs. The concentric dark rings in the first micrographs are well known [Tuggle, 1984] and are thought to be due to the build-up process of a new facet. Indeed, after a few hours of emission, the concentric rings have disappeared, as demonstrated by Figure 36(c).
[Figure 36 micrographs: panels (a), (b), and (c).]
FIGURE 36. Emission patterns of the ZrO/W Schottky emitter during start-up. In the first patterns, dark concentric rings are visible.
We recorded the emission current leaving the module and the current entering the Faraday cup. Our simple data-logging system did not allow the temporal changes of the emission current to be observed while the experiment was running. A few Schottky plots were made at irregular intervals, and some more micrographs of the emission pattern were taken. In the micrographs, an evenly illuminated spot was visible. After 14 days of continuous operation, the emitter was stopped and the current leaving the emitter module was plotted. Figure 37 shows the emitter stability as a function of time. The results are surprising: after 3 days of stable operation, the emission current gradually rises, until a maximum is reached. Within a short time, the emission current drops from 10 to 5 μA, after which the process repeats itself. As time goes by, the distance between the dips increases, and the duration of the effect increases as well. Figure 38 zooms in on these instabilities. The dips in the stability show a striking similarity when observed on a smaller time scale: all feature a decrease in current followed by one short-time dip, after which the current increases again. During this increase, more short-time depressions are observed. After a period of gradual increase, the current suddenly stabilizes, and the emitter resumes its desired mode of operation. The behavior just described is known in the literature as ring collapse [Swanson and Schwind, 1997]. A schematic drawing of the effect is shown in Figure 39. It was taken from page 16 of [Tuggle, 1984]. If the field on the emitter during operation is too low, it tends to reshape itself toward a larger radius of curvature. This means that material is transported from the center of the emitter toward the sides. The scanning electron micrographs shown in Figure 40 support this: after operation, the emitter radius has increased.
Furthermore, the diameter of the cylindrical part of the emitter is larger, whereas the length of this part has become shorter. If we calculate the volume in each cylindrical part (ignoring the spherical end), we find that the amount of material in both volumes is the same. This confirms that material is transported from the apex of the tip toward the shank, causing a gradual increase in emitter radius. Note that this effect is not directly visible in the emission current: the field decreases, but the emitting area gets larger. The explanation of all the small features in Figure 38 still awaits further study. The optimum field strength, Fopt, that balances the surface tension, γ, is given by

$$F_{\mathrm{opt}} = \sqrt{\frac{4k\gamma}{\varepsilon_0 r}} \qquad (61)$$

in which k is a form factor. According to Barbour et al. [1960], k = 0.5 for metallic emitters. The surface tension was estimated by Swanson and Schwind [1997] to be 1.41 N/m, yielding an optimal field strength of 8 × 10⁸ V/m. Let
[Figure 37 plot: five panels showing the current (μA, 0 to 12) versus time (h), covering 0 to 360 h in consecutive 72-hour segments.]
FIGURE 37. The beam current leaving the ZrO/W Schottky emitter module, recorded over 15 days.
[Figure 38 plot: current including offset (μA, 0 to 60) versus time (minutes, −60 to 60).]
FIGURE 38. The dips observed in Figure 37 plotted on a smaller time scale. An arbitrary offset is added to the emission current in order to separate the individual events from each other.
FIGURE 39. Schematic drawing of facet formation during emitter operation, panels (a) to (d).
FIGURE 40. The tip apex of the Schottky emitter used for the stability experiment before (a) and after (b) operation. It is clear from the micrographs that the emitter radius has increased. Note that this effect is not directly visible in the emission current: the field decreases, but the emitting area gets larger.
us estimate the field strength in the stability experiment described here. The value of β in this experiment is around 3.0 × 10⁵ m⁻¹. The procedure for estimating β is described in the next section. At 5.2 kV, this means that F = 1.6 × 10⁹ V/m. It is clear from the experimental data that even at this field strength emitter dulling occurs, indicating that Swanson’s estimate is too low. More work is necessary in order to quantify the relation between F_opt and r. Once such a relation is found for one emitter, the optimum field strength for other emitter radii is simply calculated with the aid of Equation (61). At this point in the discussion the reader may question the importance of the ring-collapse effect. A wrongly chosen extraction field may cause an increase of the emitter radius, but the emitting area gets larger as well, so the
emission current is not strongly influenced. However, during a ring collapse the properties of the beam change considerably. Figure 39(c) shows that just before the flat facet is restored, a small protrusion is left on the terrace. It is known from high-voltage theory that at such small protrusions the field can locally be higher by a factor of three. Section II showed that, in the extended-Schottky regime and the transition regime, the energy spread depends strongly on the applied field. The energy distribution of the total beam is, of course, the sum of the energy distribution of the electrons emitted by the protrusion and the distribution of the electrons emitted by the rest of the emitter area. The Schottky effect causes a shift between these distributions, and the resulting energy spread may be very large, which is detrimental to the performance of the electron microscope. Furthermore, the dependence of the emitted current on the extraction voltage changes strongly, and this is reflected in the slope of the Schottky plot. Attempts to determine the field factor β will yield erroneous results. The slopes of the Schottky plots taken just before a collapse support this. From the Schottky plot taken after 24 h of emission, a value for β of 1.8 × 10⁵ m⁻¹ was determined, which seems to be rather low when compared to the value of 1.9 × 10⁵ m⁻¹ for a blunter emitter, but one should realize that estimates of β from Schottky plots tend to be too low. The Schottky plot recorded after 174 h yields β = 7.2 × 10⁵ m⁻¹, and after 247 h, β = 4.1 × 10⁵ m⁻¹. Both results were obtained just before a collapse occurred. Once again, this experiment shows that firm conclusions about the emitter properties cannot be drawn from a single Schottky plot. The ring-collapse effect can also influence experiments for determining the field factor β from series of energy distributions.
In fact, we also measured the energy distribution of the electrons emitted from a ZrO/W emitter with an apex radius of curvature of 0.5 μm, as a function of temperature and field. Unlike the experiments discussed in Section V.A.1, not all spectra could be fitted with a single value for the field factor β. Large energy widths, which could not be reproduced, were sometimes measured. It is likely that the emitter shape changed during the experiment, in the same way as described before. It is clear that for reliable operation the field strength should be chosen in such a way that ring collapses do not occur. If it is chosen properly, the periodic depressions in the emitted current and the large increase in energy spread accompanying a ring collapse are prevented.
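The field estimates used in this discussion can be retraced numerically. The following Python sketch evaluates Equation (61) and the relation F = βV that the quoted numbers imply. The emitter radius of 0.5 μm is an assumption made here (it is not stated explicitly for the stability experiment, but it reproduces the quoted optimum field of about 8 × 10⁸ V/m), and the function names are illustrative only.

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity in F/m

def optimum_field(radius_m, surface_tension=1.41, form_factor=0.5):
    """Equation (61): F_opt = sqrt(4 k gamma / (eps0 r)), result in V/m."""
    return math.sqrt(4.0 * form_factor * surface_tension / (EPSILON_0 * radius_m))

def applied_field(beta_per_m, voltage_v):
    """Field at the emitter apex, assuming F = beta * V."""
    return beta_per_m * voltage_v

# Assumed apex radius of 0.5 um; gamma = 1.41 N/m and k = 0.5 as quoted in the text.
f_opt = optimum_field(0.5e-6)
# beta = 3.0e5 1/m and V = 5.2 kV, the values quoted for the stability experiment.
f_exp = applied_field(3.0e5, 5.2e3)

print(f"F_opt = {f_opt:.2e} V/m")
print(f"F     = {f_exp:.2e} V/m")
print(f"F / F_opt = {f_exp / f_opt:.2f}")
```

Under these assumptions the applied field comes out roughly a factor of two above the optimum of Equation (61), which is the situation in which emitter dulling was nevertheless observed.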
VI. APPLICATION TO OTHER EMITTERS

The previous section demonstrated the usefulness of the expressions for the energy distribution, brightness, and Boersch effect. These expressions can also
be used on other emitter shapes and types. We deal with two examples. First, the energy spread of a Schottky emitter with a smaller radius of curvature than the one previously measured, under similar operating conditions, is estimated. Then, the consequences of changing the ZrO coating to another one, for instance HfO, are described.

A. Electron-optical Parameters of a 0.5 μm Schottky Emitter

The previous sections showed that the field on the emitter surface (and thus the intrinsic energy spread and brightness) can be determined accurately from experimentally determined energy distributions of Schottky emitters. Here, we use these results to see whether we can predict the effect of changes in the emitter parameters, such as the emitter radius. Let us consider the energy distribution of a Schottky emitter with a radius of curvature of 0.5 μm. We assume that it is mounted in the same module as used for the 0.9 μm emitter, so that, apart from its radius, all geometrical dimensions are the same. An empirical model for the influence of geometry changes on the field factor is given by Gomer [1961]. In this model, originally derived by Charbonnier, $\beta = (k r_{\mathrm{tip}})^{-1}$ and $k = C\,\sqrt[3]{\alpha}\,(x/r_{\mathrm{tip}})^{0.13}$, with x the emitter-anode distance and α the half-angle of the cone-shaped emitter. Charbonnier found that the constant C equals 0.59 when α is expressed in degrees. For the Schottky emitter with its additional suppressor electrode, the value of this constant is not correct, but the parameter dependency can be used to derive $\beta_{0.5} = (r_{0.9}/r_{0.5})^{0.87}\,\beta_{0.9}$, yielding $\beta_{0.5} = 3.2 \times 10^{5}\ \mathrm{m^{-1}}$. A more elaborate model, specifically designed for the Schottky emitter, is described by Swanson and Schwind [1997]. In this model, the dependency of β on r is given as $\beta = K r^{-0.76}$, in which the constant K is a complicated function of the distances between suppressor, tip and anode, the ratio between anode voltage and suppressor voltage and the ratio between facet size and emitter radius.
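The two radius scalings described above can be compared directly. The sketch below assumes a reference field factor of 1.9 × 10⁵ m⁻¹ for the 0.9 μm emitter (the value quoted elsewhere in this chapter) and that only the apex radius changes, so that β scales as a pure power of r. The function name is hypothetical.

```python
def scale_field_factor(beta_ref, r_ref, r_new, exponent):
    """Scale beta from radius r_ref to r_new, assuming beta is proportional
    to r**(-exponent) with all other geometry unchanged."""
    return beta_ref * (r_ref / r_new) ** exponent

BETA_09 = 1.9e5  # 1/m, field factor assumed for the 0.9 um emitter

# Exponent 0.87 follows from beta = 1/(k r) with k proportional to r**(-0.13).
beta_charbonnier = scale_field_factor(BETA_09, 0.9e-6, 0.5e-6, 0.87)
# Exponent 0.76 is the radius dependence of the Swanson and Schwind model.
beta_swanson = scale_field_factor(BETA_09, 0.9e-6, 0.5e-6, 0.76)

print(f"Charbonnier scaling:         beta = {beta_charbonnier:.2e} 1/m")
print(f"Swanson and Schwind scaling: beta = {beta_swanson:.2e} 1/m")
```

Both exponents give a field factor close to 3 × 10⁵ m⁻¹ for the 0.5 μm emitter; the two models differ by roughly 7% at this radius.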
Using the relation of Swanson and Schwind, we find that $\beta_{0.5} = 3.0 \times 10^{5}\ \mathrm{m^{-1}}$. With such an estimate for β, the magnitude of the Boersch effect can be calculated with the aid of Jansen’s and Knauer’s equations. Again, the half-opening angle of the beam is assumed to be 7°, and the crossover radius, r_c, is a factor 5/9 smaller than that found for the ZrO/W emitter with a radius of curvature of 0.9 μm. Figure 41 shows the expected full width at half maximum energy spread as a function of a typical range of extraction voltages. Recent experiments by one of the authors [Tiemeijer, 1998] are in good agreement with these results. This value for the field factor can be used to estimate the reduced brightness of a ZrO/W emitter with an apex radius of curvature of 0.5 μm. According to Figure 8, this may imply that a brightness of 6 × 10⁹ A/(m² sr V) is achievable for such an emitter, operated at 1800 K and 5 kV. Such a high value may, however, not be measured in practice, as a result of trajectory displacement
[Figure 41 plot: FWHM energy spread (eV, 0 to 1.5) versus extraction voltage (kV, 2 to 8), T = 1800 K.]
FIGURE 41. Full width at half maximum energy spread as a function of the extraction voltage, plotted for the ZrO/W emitter with an apex radius of curvature of 0.5 μm. The solid line is the intrinsic energy spread, and the dashed and dotted lines are the expected energy broadening values according to Jansen’s and Knauer’s theory, respectively.
at high emission-current densities. Recent calculations by one of the present authors [Tiemeijer, 1999], however, indicate that the trajectory displacement effect in the emitter region is small, even at high emission current densities. It is clear that more experimental work has to be done in order to verify the correctness of this value.

B. HfO/W versus ZrO/W

As another example, we consider the influence of a change of the work function of the emitter. A disadvantage of the ZrO/W electron emitter is the large thermionic emission current from the shank of the tip. These electrons, originating far off-axis, may be scattered elastically or inelastically inside the emitter module, and part of these may enter the electron-optical column with an energy lower than that of the primary beam. This effect has important consequences for the quality of the beam. Near the specimen, a weak intensity pattern with a fourfold symmetry can be observed around the central spot. As a solution for this, HfO instead of ZrO is proposed as a work-function-lowering coating. As a result of the higher work function, the thermionic emission current will
be less intense. The work function of HfO is 3.0 eV, 0.2 eV higher than that of the ZrO/W emitter. In this section we study the consequences of the increase in work function on parameters such as current density, total energy distribution, Boersch effect, and brightness. For comparison, we use the field range in which the ZrO/W emitter is usually operated: 5 × 10⁸ V/m for the 0.9 μm radius of curvature TEM emitter at 3 kV and ≈ 2 × 10⁹ V/m for the 0.3 μm emitter at the highest-reported extraction voltage. For both emitters, a temperature of 1800 K is assumed. The parabolic-barrier approximation, Equation (30), is used for the calculation of current density and total energy distribution. We compare the two emitters for two situations: equal current density, i.e., a higher field for the HfO/W emitter, and equal field. Let us consider these two possibilities in more detail, starting with the current density. Figure 42 plots the ratio of the ZrO/W to the HfO/W current density. For thermionic and Schottky emission, this ratio is a constant, given by $\exp[(W_{\mathrm{HfO/W}} - W_{\mathrm{ZrO/W}})/k_B T] = 3.63$. Thus, it can be concluded that the current emitted from off-axis positions is lowered by the same factor. At the apex of the tip, however, the field is higher, and a part of the current is due to tunneling through the top of the potential barrier. This tunnel process, however, is almost the same for both emitters, so the relative loss in current is not as large when the field is kept constant, as can be seen in Figure 42.
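The constant ratio in the thermionic and Schottky regimes is a Boltzmann factor in the work-function difference. A minimal check in Python, assuming the work-function difference of 0.2 eV and the temperature of 1800 K stated in the text (the function name is illustrative):

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def thermionic_ratio(delta_w_ev, temperature_k):
    """Ratio of current densities of two emitters whose work functions
    differ by delta_w_ev, in the regime where tunneling is negligible."""
    return math.exp(delta_w_ev / (K_B_EV * temperature_k))

ratio = thermionic_ratio(0.2, 1800.0)
print(f"j_ZrO/W / j_HfO/W = {ratio:.2f}")
```

The result agrees with the factor of 3.63 quoted above. Because the exponent scales as 1/T, the same 0.2 eV difference would produce a larger ratio at a lower operating temperature.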
[Figure 42 plot: j_ZrO/W / j_HfO/W (0 to 4) versus electric field strength (V/m, 10⁸ to 10¹⁰).]
FIGURE 42. Ratio of the emission current density of the ZrO/W emitter to that of the HfO/W emitter, as a function of the electric field strength. In the Schottky regime, the ratio is a constant. As the importance of tunneling increases, the ratio becomes smaller.
We have determined the increase in field strength necessary for maintaining the same current density when changing the ZrO coating to HfO graphically, from a plot of the current density as a function of the field strength. It turns out that the correction can be described simply as

$$F_{\mathrm{HfO/W}} = F_{\mathrm{ZrO/W}} + 2.8 \times 10^{8} \qquad (62)$$

with F in V/m. As an example, let us consider the 0.9 μm radius of curvature ZrO/W emitter that was discussed in Section V.A.1. A typical extraction voltage for this emitter type is 4.5 kV and the corresponding field strength is 8.6 × 10⁸ V/m, so with an HfO coating the current would be lower by a factor of 3.63. In order to obtain the same current, the extraction voltage has to be raised to 5.97 kV. If the sharp ZrO/W emitter with its apex radius of curvature of 0.3 μm, discussed in Section V.A.2, is replaced by an HfO/W emitter, the current would be lower by a factor of 3.34, for an extraction voltage of 2500 V and β = 8.2 × 10⁵ m⁻¹. An increase in voltage of only 341 V is necessary to achieve the same current density. The shape of the total energy distribution of the HfO/W emitter is equal to that of the ZrO/W emitter in the Schottky and extended-Schottky regimes, as can be observed in Figure 43. In the transition regime between thermionic and
[Figure 43 plot: FWHM energy spread (eV, 0 to 1.5) versus electric field strength (V/m, 10⁸ to 10¹⁰), with curves for the ZrO/W emitter and the HfO/W emitter.]
FIGURE 43. FWHM energy distribution as a function of the electric field strength for the ZrO/W and HfO/W electron emitter.
field emission, the energy spread will be somewhat larger, due to the 0.2 eV larger distance between the Fermi level, E_F, and the top of the potential barrier. If a ZrO/W emitter is exchanged for an HfO/W emitter and the emission current density should be the same, the energy spread of the latter is larger, at least at fields below 2 × 10⁹ V/m. For the ZrO/W 0.9 μm emitter at 4.5 kV, the FWHM energy spread is 0.49 eV. For the same current density, at 5.97 kV, the HfO/W energy spread is 0.57 eV. For an emitter with a smaller radius of curvature, the field is higher, and the effect of a change in emitter material is more drastic. Consider a ZrO/W emitter with an apex radius of curvature of 0.3 μm. At V = 2.5 kV, the intrinsic energy spread of the ZrO/W emitter is 1.14 eV. For the HfO/W emitter, this value increases to 1.25 eV. At 2.84 kV, the intrinsic energy spread of the HfO/W emitter is about 1.33 eV. For fields exceeding 2 × 10⁹ V/m, the energy spread will decrease if the coating is changed from ZrO to HfO and the emission current is kept the same. Such a situation, however, will probably not occur because of the Boersch effect. To estimate the change in magnitude of the Boersch effect when the work function is changed, consider two emitter types, the 0.9 μm and the 0.3 μm Schottky emitter. Figure 44 shows a plot of the intrinsic energy width of the 0.9 μm ZrO/W emitter and the HfO/W emitter with a solid line. Because the field strength is low for this emitter type, the intrinsic spread of both emitters is equal, and the Boersch effect is not so strong, especially for the HfO/W emitter, because the current density is low. The magnitude of the Boersch effect is calculated with Knauer’s theory and Jansen’s model for the energy broadening. The results produced by these theories are plotted with dotted and dashed lines, respectively. The resulting energy spread is obtained by adding the intrinsic energy width and the broadening quadratically.
The parameters mentioned in Section V.A.4 are used: β = 1.9 × 10⁵ m⁻¹, α = 7°, r_c = 0.12 μm, and L = 2.8 mm. In this way, the only difference between the emitters is the magnitude of the facet current I_f. We make a worst-case estimation of the energy broadening and use Knauer’s theory. For an extraction voltage of 4.5 kV, the energy spread of the ZrO/W emitter is 0.53 eV. For the HfO/W emitter 0.49 eV is found, due to the lower contribution by the Boersch effect. If the HfO/W emitter is operated at 5.97 kV in order to compensate for the loss in current, the energy width is 0.61 eV. For the sharp Schottky emitter with an apex radius of curvature of 0.3 μm, the plot is somewhat different, as shown in Figure 45. Again, we use the parameters from Section V.A.4: β = 8.2 × 10⁵ m⁻¹ and r_c = 15 nm. The intrinsic energy width of the ZrO/W emitter is represented by the lower solid line; the upper solid line is the intrinsic energy spread of the HfO/W emitter. The lower set of dashed and dotted lines corresponds to the sharp HfO/W emitter, calculated with Jansen’s and Knauer’s broadening model, respectively. For an extraction voltage of 2.5 kV we find with Knauer’s broadening model
[Figure 44 plot: FWHM energy spread (eV, 0 to 1.5) versus extraction voltage (kV, 2 to 8), with curves for ZrO/W and HfO/W.]
FIGURE 44. Energy widths for the 0.9 μm ZrO/W and HfO/W emitters. The intrinsic energy width is equal for both and is drawn with a solid line. The dotted lines are calculated with Knauer’s model for energy broadening, and the dashed lines with Jansen’s theory.
a width of 1.38 eV for the ZrO/W emitter and 1.30 eV for the HfO/W emitter. An increase of the voltage in order to obtain the same current density with the HfO/W emitter yields a distribution having a width of 1.49 eV. Due to the lower current density of the hafniated emitter, its brightness will also be lower. The brightness is linearly dependent on the emission current density, so Figure 42 can be used directly to estimate the change in brightness when an HfO/W emitter is employed instead of a ZrO/W emitter.
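The voltage corrections and energy widths quoted in this section can be retraced with a short Python sketch. It assumes F = βV, so that the field shift of Equation (62) maps to a voltage shift of 2.8 × 10⁸/β, and it inverts the quadratic addition of intrinsic width and broadening to recover the broadening term. The implied broadening values are derived here for illustration and are not quoted in the text; the function names are hypothetical.

```python
import math

FIELD_SHIFT = 2.8e8  # V/m, additive field correction of Equation (62)

def compensating_voltage(voltage_v, beta_per_m, field_shift=FIELD_SHIFT):
    """Extraction voltage at which the HfO/W emitter reaches the ZrO/W
    current density, assuming F = beta * V."""
    return voltage_v + field_shift / beta_per_m

def implied_broadening(total_ev, intrinsic_ev):
    """Broadening that, added in quadrature to the intrinsic width,
    reproduces the total width."""
    return math.sqrt(total_ev**2 - intrinsic_ev**2)

# 0.9 um emitter: 4.5 kV, beta = 1.9e5 1/m; 0.3 um emitter: 2.5 kV, beta = 8.2e5 1/m
v_blunt = compensating_voltage(4500.0, 1.9e5)
v_sharp = compensating_voltage(2500.0, 8.2e5)

# ZrO/W widths quoted in the text (intrinsic, total with Knauer broadening)
b_blunt = implied_broadening(0.53, 0.49)
b_sharp = implied_broadening(1.38, 1.14)

print(f"0.9 um emitter: raise voltage to {v_blunt / 1e3:.2f} kV")
print(f"0.3 um emitter: raise voltage by {v_sharp - 2500.0:.0f} V")
print(f"implied broadening, 0.9 um ZrO/W: {b_blunt:.2f} eV")
print(f"implied broadening, 0.3 um ZrO/W: {b_sharp:.2f} eV")
```

The voltages reproduce the 5.97 kV and 341 V given above. The much larger implied broadening for the sharp emitter is consistent with the statement that the Boersch effect is weak for the 0.9 μm emitter.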
VII. CONCLUSIONS

Let us return to the Introduction and see whether we have answered the questions that were posed there. We have shown that the generally accepted values for energy spread and brightness need some correction. It is possible to operate a ZrO/W electron emitter in such a way that one can achieve a brightness of 1.2 × 10⁸ A/(m² sr V) combined with a full width at half maximum energy spread of 0.48 eV. An increase of the electric-field strength will result in a higher brightness, accompanied by an increase in energy spread. The calculations indicate, for
[Figure 45 plot: FWHM energy spread (eV, 0 to 2.0) versus extraction voltage (kV, 1.0 to 3.0), with curves for ZrO/W and HfO/W.]
FIGURE 45. A repeat of Figure 44, but for ZrO/W and HfO/W emitters with an apex radius of curvature of 0.3 μm.
instance, that it may be possible to operate a ZrO/W emitter with an apex radius of curvature of 0.5 μm at a brightness of 6 × 10⁹ A/(m² sr V) and an energy width of 1.2 eV. Currently, no experimental data are available on the transverse electron-electron interactions (trajectory displacement) in the emitter region, so it is not possible to say whether such a high brightness value can be reached in practice. Recent calculations by one of the authors indicate, however, that the trajectory displacement effect in the emitter region is small. More experimental work is necessary in order to clarify this subject. The broadening of the energy distribution at high electric-field strengths is partly caused by the emission process itself and partly by statistical electron-electron interactions in the beam, the Boersch effect. These two effects can be separated well using simple analytical models. A comparison of these models with experimental results obtained in the authors’ experimental setup, as well as with results taken from the literature, gives confidence in using these models for optimizing Schottky electron emitters for certain applications. In order to optimize a ZrO/W emitter it is necessary to determine the field on the emitter surface accurately. It is possible to determine the field factor β from an analysis of the peak position of the energy distribution as a function of the extraction voltage, measured with a simple energy analyzer. The authors have a strong preference for this method over the use of Schottky plots.
REFERENCES

Ballu, Y. (1980). High resolution electron spectroscopy, supplement 13B of Advances in Electronics and Electron Physics. New York: Academic Press, pp. 257–381.
Barbour, J. P., Charbonnier, F. M., Dolan, W. W., Dyke, W. P., Martin, E. E., and Trolan, J. K. (1960). Determination of the surface tension and surface migration constants for tungsten, Phys. Rev. 117(6), 1452–1459.
Barth, J. E., and Nykerk, M. D. (1999). Dependence of the chromatic aberration spot size on the form of the energy distribution of the charged particles. Submitted for publication in the Proceedings of CPO5.
Barth, J. E., and Kruit, P. (1996). Addition of different contributions to the charged particle probe size, Optik 101(3), 101–109.
Bell, A. E., and Swanson, L. W. (1979). Total energy distributions of field-emitted electrons at high current density, Phys. Rev. B 19(7), 3353–3364.
Christov, S. G. (1966). General theory of electron emission from metals, Phys. Stat. Sol. 17(11), 11–26.
Collins, R. A., and Blott, B. H. (1968). The adsorption and nucleation of zirconium on tungsten field emitters, Surface Science 10, 349–368.
Danielson, L. R., and Swanson, L. W. (1979). High temperature coadsorption study of zirconium and oxygen on the W(100) crystal face. Surface Science 88, 14–30.
Dushman, S. (1923). Electron emission from metals as a function of temperature, Phys. Rev. 21, 623–636.
El-Kareh, A. B., Wolfe, J. C., and Wolfe, J. E. (1977). Contribution to the general analysis of field emission, J. Appl. Phys. 48(11), 4749–4753.
Elswijk, H. B., van Rooy, Th. L., and Schiller, C. (1995). Energy broadening by Coulomb interactions of an electron beam emitted from a point source. J. Vac. Sci. Technol. B 13(3), 1037–1043.
Fowler, R. H., and Nordheim, L. (1928). Electron emission in intense electric fields. Proc. Roy. Soc. London A 119, 173–181.
Fransen, M. J. (1999). Towards high-brightness, monochromatic electron sources, Ph.D. diss. Delft University of Technology. Published by M. J. Fransen, ISBN 90-9012398-9.
Fransen, M. J., Faber, J. S., van Rooy, Th. L., Tiemeijer, P. C., and Kruit, P. (1998). Experimental evaluation of the extended-Schottky model for ZrO/W electron emission. J. Vac. Sci. Technol. B 16(4), 2063–2072.
Fransen, M. J., Overwijk, M. H. F., and Kruit, P. (1999). Brightness measurements of a ZrO/W Schottky electron emitter in a Transmission Electron Microscope. Presented at IVESC 1998, accepted for publication in Applied Surface Science.
Fursei, G. N., and Shakirova, S. A. (1966). Localization of field emission in small solid angles, Soviet Physics – Technical Physics 11(6), 827–832.
Gadzuk, J. W., and Plummer, E. W. (1971). Energy distributions for thermal field emission. Phys. Rev. B 3(7), 2125–2129.
Gomer, R. (1961). Field emission and field ionization. Cambridge, Mass.: Harvard University Press. Reprinted by the American Institute of Physics (2nd ed. 1993).
Gröbner, W., and Hofreiter, N. (1961a). Integraltafel. II Teil: Bestimmte Integrale, 3rd ed. Wien: Springer-Verlag, p. 83, Equation 4a.
Gröbner, W., and Hofreiter, N. (1961b). Integraltafel. II Teil: Bestimmte Integrale, 3rd ed. Wien: Springer-Verlag, p. 177, Equation 13d.
Gröbner, W., and Hofreiter, N. (1961c). Integraltafel. II Teil: Bestimmte Integrale, 3rd ed. Wien: Springer-Verlag, p. 53, Equation 9.
Gröbner, W., and Hofreiter, N. (1961d). Integraltafel. II Teil: Bestimmte Integrale, 3rd ed. Wien: Springer-Verlag, p. 177, Equation 13e.
Hanszen, K.-J., Lauer, R., and Ade, G. (1985). Possibilities of brightness measurements in nonrotationally symmetrical electron beams by measuring the current density and determining the decrease of contrast in electron interferograms or micrographs, Optik 71(2), 64–72.
Hawkes, P. W., and Kasper, E. (1989a). Principles of electron optics Vol. 2: Applied geometrical optics. Chapter 44: Theory of electron emission. New York: Academic Press.
Hawkes, P. W., and Kasper, E. (1989b). Principles of electron optics Vol. 2: Applied geometrical optics. Chapter 47: Brightness. New York: Academic Press.
Isaacson, M., and Gomer, R. (1978). Extended range field emission spectroscopy. Appl. Phys. 15, 253–256.
Jansen, G. H. (1990). Coulomb interactions in particle beams, suppl. 21 in Advances in electronics and electron physics. New York: Academic Press.
Joy, D. C., and Joy, C. S. (1996). Low voltage scanning electron microscopy, Micron 27(3–4), 247–263, part VI.
Kang, N. K., Tuggle, D. W., and Swanson, L. W. (1983). A numerical analysis of the electric field and trajectories with and without the effect of space charge for a field electron source. Optik 33(4), 313–331.
Kasper, E. (1982). Field electron emission systems, volume 8 of Advances in Optical and Electron Microscopy. New York: Academic Press, pp. 207–260.
Kim, H. S., Yu, M. L., Thomson, M. G. R., Kratschmer, E., and Chang, T. H. P. (1997a). Energy distributions of Zr/O/W Schottky electron emission, J. Appl. Phys. 81(1), 461–465.
Kim, H. S., Yu, M. L., Thomson, M. G. R., Kratschmer, E., and Chang, T. H. P. (1997b). Performance of Zr/O/W Schottky emitters at reduced temperatures, J. Vac. Sci. Technol. B 15(6), 2284–2288.
Knauer, W. (1981). Energy broadening in field emitted electron and ion beams, Optik 59(4), 335–354.
Koek, B., Chisholm, Th., Davey, J. P., Romijn, J., and van Run, A. J. (1993). A Schottky-emitter electron source for wide range lithography applications. Jap. J. Appl. Phys. 32(Part 1, no. 12B), 5982–5987.
Kruit, P., and Jansen, G. H. (1997). Handbook of charged particle optics. Chapter 7: Space charge and statistical Coulomb effects. New York: CRC Press.
Lauer, R., Hanszen, K.-J., and Ade, G. (1985). Analyse des interferometrischen Verfahrens zur Messung des Richtstrahlwerts von Elektronenkanonen, number APh-24 in PTB-Bericht. Physikalisch-Technischen Bundesanstalt.
Lide, D. R. (1991–2). CRC Handbook of Chemistry and Physics, 72nd ed. New York: CRC Press.
Loeffler, K. H. (1969). Energy-spread generation in electron-optical instruments, Z. Angew. Phys. 27, 145–149.
Millikan, R. A., and Lauritsen, C. C. (1928). Relations of field-currents to thermionic-currents. Proc. Nat. Acad. Sci. of the USA 14, 45–49.
Modinos, A., and Xanthakis, J. P. (1991). Energy-broadening of field-emitted electrons due to Coulomb collisions, Surface Science 249, 373–378.
Otten, M. T. (1994). Performance of the CM200 FEG, in Proceedings ICEM 13-Paris, 235–236.
Reimer, L. (1993). Image Formation in Low-Voltage Scanning Electron Microscopy. SPIE Optical Engineering Press, Bellingham, Washington.
Richardson, O. W. (1902). Negative radiation from hot platinum, Proc. Cambridge Philos. Soc. 11, 286–295.
Richardson, O. W. (1912). The electron theory of contact electromotive force and thermoelectricity. Philos. Mag. XXIII, 263–278.
Samoto, N., Shimizu, R., Hashimoto, H., Tamura, N., Gamo, K., and Namba, S. (1985). A stable high-brightness electron gun with Zr/W tip for nanometer lithography. I. Emission properties in Schottky- and thermal field-emission regions, Jap. J. Appl. Phys. 24(6), 766–771.
Schottky, W. (1914). Über den Einfluss von Strukturwirkungen, besonders der Thomsonschen Bildkraft, auf die Elektronenemission der Metalle, Physik. Zeitschr. XV, 872–878.
Schottky, W. (1923). Über kalte und warme Elektronenentladungen, Zeit. Phys. 14, 63–106.
Shimoyama, H., and Maruse, S. (1984). Theoretical considerations on electron optical brightness for thermionic, field and T-F emissions, Ultramicroscopy 15, 239–254.
Shrednik, V. N. (1961). Investigation of atomic layers of zirconium on the faces of a tungsten crystal by means of electron and ion projectors. Soviet Physics – Solid State 3(6), 1268–1279.
Simpson, J. A. Design of retarding field energy analyzers, Rev. Sci. Instrum. 32(12), 1283–1293.
Speidel, R., and Kurz, D. (1977). Richtstrahlwertmessungen an einem Strahlerzeugungssystem mit Feldemissionskathode, Optik 49(2), 173–185.
Swanson, L. W., and Bell, A. E. (1973). Recent advances in field electron microscopy of metals, volume 32 of Advances in Electronics and Electron Physics, Appendix I. New York: Academic Press, 296–304.
Swanson, L. W., and Crouser, L. C. (1969). Angular confinement of field electron and ion emission, J. Appl. Phys. 40(12), 4741–4749.
Swanson, L. W., and Schwind, G. A. (1997). Handbook of Charged Particle Optics. Chapter 2: A review of the ZrO/W Schottky cathode. New York: CRC Press.
Tiemeijer, P. C. Unpublished experimental results.
Tiemeijer, P. C. (Awaiting publication). Measurement of Coulomb interactions in an electron beam monochromator. Presented at TARA 1998, submitted to Ultramicroscopy.
Tuggle, D. W. (1984). Emission characteristics and electron optical properties of the ZrO/W point cathode, Ph.D. diss., Oregon Graduate Center.
van der Mast, K. D. (1975). A laser heated Schottky emission gun for electron microscopy, Ph.D. diss., Delft University of Technology.
VSW (Vacuum Science Workshop) Ltd. (1995). Lyme Green Business Park, Macclesfield, Cheshire, U.K.
Young, R. D. (1959). Theoretical total-energy distribution of field-emitted electrons, Phys. Rev. 113(1), 110–114.
Young, R. D., and Kuyatt, C. E. (1968). Resolution determination in field emission energy analyzers, Rev. Sci. Instrum. 39(10), 1477–1480.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 111

The Size of Objects in Natural and Artificial Images

LUIS ALVAREZ¹, YANN GOUSSEAU², AND JEAN-MICHEL MOREL³

¹ Departamento de Informatica y Sistemas, Universidad de Las Palmas, Campus de Tafira, 35017 Las Palmas, Spain. [email protected]
² Centre de Mathématiques et leurs Applications, ENS Cachan, 61 av. du Président Wilson, 94235 Cachan Cedex, France. [email protected]
³ Centre de Mathématiques et leurs Applications, ENS Cachan, 61 av. du Président Wilson, 94235 Cachan Cedex, France. [email protected]
I. Introduction
II. Statistics of Natural Images: A Review
    A. Motivations
    B. First-Order Statistics
    C. Second-Order Statistics
        1. Covariance and Power Spectrum
        2. Other Second-Order Statistics
    D. Linear Decomposition of Images
    E. Scale Invariance in Natural Images
III. Sizes of Sections in Natural Images
    A. The Distribution of Areas
    B. Digital Photographs
    C. Other Types of Images
    D. The Distribution of Boundary Lengths
    E. The Distribution of Intercept Lengths
IV. Size of Sections and the BV Norm of Natural Images
    A. A Lower Bound for the BV Norm
    B. Application to Natural Images
V. The Dead Leaves Model
    A. Random Closed and Compact Sets of ℝⁿ
    B. The Dead Leaves Model
    C. The Geometric Covariogram
    D. The Distribution of the Intercepts in the Dead Leaves Model
    E. The Stability of the Power Law
    F. The Size Distribution of the Relief Grains
    G. The Average Area After Occlusion: Numerical Experiments
VI. A Short Review of Texture Synthesis by Mathematical Methods
    A. The Normal Vectors Perturbation Method
    B. Solid Textures
    C. Reaction-Diffusion Textures
    D. Spot Noise Textures
    E. Fractal Models
        1. Fractional Brownian Motion
        2. Midpoints Displacement Method
        3. The Spectral Synthesis Method
        4. Iterated Function Systems
    F. Random Fields Models
        1. Markov Models
        2. An Example: The Autobinomial Model
        3. The Model of Gagalowicz and Ma
    G. Multiscale Synthesis Model
    H. The FRAME Model
VII. Some Principles for the Synthesis of Abstract Natural Images
    A. Structural Laws
    B. Scale and Perspective Laws
    C. Synthesis Rules and “Synthetic Worlds”
VIII. Appendix: Matheron’s Dead Leaves Model
    A. Poisson Point Processes
    B. The Boolean Model
    C. The Dead Leaves Model in ℝ²
References
Acknowledgements

Volume 111, ISBN 0-12-014753-X. ADVANCES IN IMAGING AND ELECTRON PHYSICS. Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved. ISSN 1076-5670/99 $30.00
ABSTRACT
This paper is partly written as a review, in which we discuss old and new image analysis and synthesis methods. We also introduce a new method for analyzing scaling phenomena in natural images and draw some consequences as to whether natural images belong to the space of functions with bounded variation. In some sense, our analysis computes the size distribution of objects in an image. By using the dead leaves model, we study the influence of occlusion on size distribution, and prove compatibility with our experimental results.
I. INTRODUCTION
There are many statistical studies of natural images, i.e., of pictures of scenes encountered in the surrounding world. Most of the studies are concerned with
THE SIZE OF OBJECTS IN NATURAL AND ARTIFICIAL IMAGES
first- or second-order statistics (through the power spectrum, the covariances, the cooccurrences) or with additive decompositions of images. Those statistical studies are partially matched by synthesis models, especially for texture synthesis, which try to compose an image either by imposing some second-order statistics, or by adding (wavelet-like) basis functions while maintaining some statistical requirements. Our aim in this paper is first to give an analytical account of those analysis and synthesis methods, in Sections II and VI, respectively. Then, we will try to complete the picture by introducing a new way of analyzing scaling phenomena in an image in Section III. We shall show that in natural images, there is a constant form for the size distribution. The definitions of sizes we consider are of several types: area, boundary length, and length of intersections with lines. An experimental program that we performed on many photographs of very diverse natural scenes indicates that the size distribution of homogeneous parts in images has a very peculiar form. We define in Section III what we mean by homogeneous parts: the connected components of image domains where contrast does not exceed a certain threshold. We can roughly define the size distribution of objects in an image as the size distribution of these parts. Our experiments show that the number of homogeneous parts as a function of their size $s$ obeys the law
$$\mathrm{card}\{\text{homogeneous regions with size } s\} = \frac{K}{s^{\alpha}}$$
where $K$ is an image-dependent constant. When the size $s$ denotes the area, in most photographs, $\alpha$ is close to 2. As a consequence of the size power law, some information can be obtained about the “natural” function space for images, as shown in Section IV: we focus our attention on the space BV of functions with bounded variation. We are in a position to tell when a given image is not in this space, provided the observed size distribution model remains true at smaller (not observable) scales as well. In Section V, we present a powerful but scarcely used model, the dead leaves model of Matheron, and analyze the effect of occlusion on the size distribution of objects. There is no obvious equivalence between the way we define regions in our experimental program and regions in the dead leaves model. However, we show that, for some notions of size, the power law is preserved by occlusions in the dead leaves model. In order to strengthen the link between the dead leaves model and our analysis, we also compute size distributions in synthetic images. We tried several principles for the synthesis, some being generalizations of the dead leaves model definition and some completely different, such as the exclusion principle. New kinds of textures can be generated by those models. We display several examples of “natural” and “nonnatural” abstract images in Section VII. For the purpose of simplicity, we tried to avoid any precise
references in this short introduction. Instead, each section is documented and can be read fairly independently.
II. STATISTICS OF NATURAL IMAGES: A REVIEW
A. Motivations
A digital, gray level image may be seen as the realization of a random vector of size $M \times N$ taking values in a discrete set $V = \{1, \ldots, C\}$. For typical values such as $M = N = C = 256$, the number of possible realizations, $C^{MN} = 2^{524{,}288}$, is huge. Obviously, “natural images”, i.e., digital photographs of natural scenes, form only a small subset of all possible realizations. Looking at random realizations of such vectors is enough to be convinced of this fact. Natural images are highly improbable events. It is therefore interesting to look for statistical characteristics of such images: what are the relationships between gray level values at distant pixels? Is it possible to define a probability law for natural images? In what follows, $I$ is a digital image; for $x \in M \times N$, $I(x) \in V$ is the gray level value at pixel $x$; $\Pr(A)$ stands for the probability of the event $A$; and $E$ is the mathematical expectation. In addition to the legitimate interest in the complicated and fairly unknown structure of images, there are some strong motivations to investigate such statistics. The first one comes from biological vision and has initiated most of the work in the field. Indeed, it is likely that visual systems are adapted to the type of scenes they deal with (which may be very different from one species to another). H. Barlow [1961] suggested that the goal of sensory coding is to reduce redundancies in the input signals, so that studying those redundancies (for example, dependency between adjacent pixels) gives insight into the way our brain extracts information from the huge amount of data it receives through the eyes. A second motivation arises from the Bayesian framework. Suppose we want to describe an image by a set of parameters $\{p_1, \ldots, p_n\}$. In order to estimate these parameters, given the image $I$, we make use of Bayes' rule
$$\Pr(\{p_i\} \mid I) = \frac{\Pr(I \mid \{p_i\})\,\Pr(\{p_i\})}{\Pr(I)}$$
which indeed presupposes the existence of a probability distribution $\Pr$ on images. This can for instance be useful for denoising (through the search for a good approximation of the “true” image knowing a noisy, and possibly convolved, image) if we know characteristics of $\Pr$. A third motivation is image compression. Imagine a very simple type of image where each realization is
either black or white; then only one bit will be needed to encode any image. More generally, information theory tells us that the compression rate is related to the entropy of the image $I$,
$$E(I) = -\sum_{\text{all } I} \Pr(I) \log \Pr(I).$$
Shannon’s source coding theorem asserts that in order to compress (many) images with almost no loss of information, the number of bits needed is almost equal to the entropy. Moreover, trying to improve this bound leads to a drastic loss of information. Codes to achieve such a compression are given by information theory but are not feasible in practice, and the measure of loss may be poorly adapted to images. However, statistics of images may give insight into the type of compression schemes that are to be used. We also show in the present paper (see Section IV) that statistics of images may account for the type of functional space to which images belong. The study of the statistics of images is not a new subject. It started with television signals studies [Kretzmer, 1952; Deriugin, 1956]. The subject has gained considerable interest during the last 10 years. In all studies, images are assumed to be translation invariant, allowing the relationships between pixels to depend on distance only, even though this assumption is not strictly observed. When dealing with statistics of natural images, two different approaches are possible: either assuming that a single image is large enough to evaluate parameters on it, by making ergodicity assumptions, or averaging the quantity under study over a large number of images. The first method enables us to study differences between images and may make a physical explanation easier, whereas the second is more robust and permits the estimation of outliers, which are not numerous enough in a single image. In order to study natural scenes, one may be interested in studying the light intensity arriving on the lens and thus in having a linear relationship between gray level and intensity. Moreover, one may want to have a histogram centered at 0 and invariant under multiplication of the gray levels by a constant. One way to achieve that is to study $\log(I/I_0)$ instead of $I$, where $I_0$ is the mean gray level.
B. First-Order Statistics
The simplest thing we can study from an image is the gray level histogram, i.e., the distribution of gray levels in any pixel (under translation invariance each pixel is equivalent). Several authors [Ruderman and Bialek, 1994; Huang and
Mumford, 1999] found linear tails and a peaked central value when plotting log(number of occurrences of gray level) versus log(gray level). They worked on images of logarithms of initial gray level values for the previously mentioned reasons. Those linear tails indicate that the histogram is not Gaussian, showing by the contrapositive of the central limit theorem that values in different pixels are correlated. Moreover, the distribution is not symmetric, which could be due to the presence of large parts of sky in many images, as suggested in Huang and Mumford [1999]. Although looking at the histogram could be interesting for biological studies when dealing with calibrated images, it is not of much use for studying the morphological structure of images. It is well known that we can strongly modify the histogram of an image without affecting its perception.
C. Second-Order Statistics
1. Covariance and Power Spectrum
In this section, $I$ is assumed to be a continuous function from $\mathbb{R}^2$ into $\mathbb{R}$. Under the assumption of translation invariance, the covariance of an image $I$ is defined as (assuming the mean is zero)
$$C(x) = E(I(0)\,I(x))$$
where 0 is an arbitrary origin and $x$ a location in the image. If $\hat{I}$ is the Fourier transform of the image ($I$ is viewed as a continuous function belonging to $L^2$ and $\hat{I}$ is a function of the complex variable $\omega$), the power spectrum is defined as
$$P(\omega) = |\hat{I}(\omega)|^2$$
and we have the Wiener-Khintchin formula
$$P(\omega) = \hat{C}(\omega).$$
It is well known that, when averaged over all directions, the power spectrum of natural images,
$$f(t) = \frac{1}{2\pi} \int_0^{2\pi} P(t e^{i\theta})\, d\theta,$$
has been experimentally found to have the form
$$f(t) \simeq \frac{C}{t^{\alpha}}$$
with $\alpha$ close to 2 [Deriugin, 1956; Field, 1987; Burton and Moorhead, 1987]. This phenomenon is also observed with the logarithm of images [Ruderman and Bialek, 1994]. It should be mentioned that the form of $P(\omega)$ is strongly dependent on the direction [Baddeley, 1997], and that there are large differences in the exponent $\alpha$ of the power law from one image to another [Tolhurst et al., 1992]. Moreover, the Fourier transform is a nonlocal operation, and consequently the power spectrum may emphasize a particular aspect of the image, such as the presence of an edge. In the presence of a single very sharp edge (for instance, at the horizon), the exponent of the power law gives no account of other statistics of the image; see Voss and Wyatt [1993]. A very interesting fact about this power law is that it is implied by scale invariance (as stressed by Field [1987]). Indeed, if there is the same energy for frequencies $\omega$ between $t$ and $t'$, $t < |\omega| < t'$, and between $at$ and $at'$, $at < |\omega| < at'$, then
$$\int_t^{t'} u f(u)\, du = \int_{at}^{at'} u f(u)\, du.$$
But this must be true for all $t$ and $t'$, so that (substituting $u \to au$ in the right-hand integral) $f(au) = (1/a^2) f(u)$ for all $u$, $a$, and therefore $f(u) = C/u^2$. It is worth noticing that the same reasoning about sounds leads to $P(t) = C/t$, a property of the so-called 1/f-noise, which is also observed [Voss and Clarke, 1975]. Assuming this power law for the power spectrum and because of the Wiener-Khintchin relation, the correlation should be of the form $C(x) = A + B \log|x|$. It is worth mentioning that if $P(\omega)$ is a power law with an exponent $\alpha \neq 2$, then the covariance should be of the form $C(x) = A + B|x|^{-(2-\alpha)}$ with the signs of $A$ and $B$ ensuring its decay when $|x|$ increases. Those types of covariances have been observed, at least asymptotically [Ruderman, 1994, 1997; Baddeley, 1997; Huang and Mumford, 1999].
2. Other Second-Order Statistics
The covariances of an image do not characterize the complete second-order statistics. Those statistics are fully described by the cooccurrence function:
$$K(i, j, x) = \Pr(I(0) = i \text{ and } I(x) = j)$$
where $x$ is a site in the image and $i$, $j$ are gray levels. Gagalowicz and Ma [1985] made use of related quantities to synthesize textures, as explained in Section VI.F.3. As a particular case of cooccurrences, Huang and Mumford [1999] look at the logarithm of the difference of gray levels between two adjacent pixels, working with the logarithm of the images. The observed distribution has a very peculiar form, with long tails. Once more this emphasizes the fact that images are non-Gaussian. The large tails of the distribution denote the presence of edges in the images.
D. Linear Decomposition of Images
Other types of studies deal with the decomposition of images into a linear combination of basis images. An image $I$ is considered as a random vector of size $M \times N$ and is written
$$I = \sum_{i=1}^{n} a_i A_i$$
where $\{a_i\}$ are random variables and $\{A_i\}$ are fixed images. The $A_i$'s are determined by imposing some conditions on the $a_i$'s and are then used as a statistical description of the images. In the principal components analysis, the $a_i$'s are required to be pairwise uncorrelated. The $A_i$'s are obtained as eigenvectors of the covariance matrix. For natural images, they are nonlocal and resemble a Fourier basis [Olshausen and Field, 1996; Bell and Sejnowski, 1997]. In the independent components analysis, the $a_i$'s must be pairwise independent. Such a decomposition may not exist, and the $A_i$'s are approximated by numerical schemes. Bell and Sejnowski [1997] used a neural network approach to compute the independent components of natural images (actually on small $12 \times 12$ patches extracted from four natural images) and obtained vectors that are localized and oriented and resemble edge-detector filters. Those vectors also resemble receptive fields of simple cells in the visual cortex, confirming the hypothesis of Barlow mentioned in the introduction. Another decomposition is studied in [Olshausen and Field, 1996]. The $a_i$'s should minimize
$$E = \Big\| I - \sum_i a_i A_i \Big\|^2 + \lambda \sum_i S\!\left(\frac{a_i}{\sigma_i}\right)$$
with $\sigma_i^2 = E(a_i^2)$, $\lambda$ a constant, and $S$ some increasing real function. The idea is to find $A_i$'s that cost as little as possible with respect to the cost function $S$. As in the case of independent components analysis, the $A_i$'s are localized and oriented.
E. Scale Invariance in Natural Images
We do not attempt to give a precise definition of scale invariance, but we use the term for the property of images to have a zoom-invariant probability law. Scale invariance has been observed in several ways in natural images. First of all, the simple fact of being able to observe many common statistics on images representing arbitrarily chosen scenes (thus seen from arbitrary distance and with any size of objects) is a strong clue to scale invariance. Secondly, as we previously mentioned, the observed form of the power spectrum is the only possible one for scale invariant images. Moreover, many statistics have been found to be preserved when block-averaging images. This is the case of the histogram and covariances [Ruderman and Bialek, 1994; Ruderman, 1994] and is also the case for the gray value differences between adjacent pixels [Mumford, Zhu, and Gidas, 1997]. It must be emphasized that this is the case even though the procedures to obtain smaller images were very crude: small filtering followed by subsampling. Actually, images are not exactly scale invariant. We already mentioned that the exponent of the power law decay of the power spectrum is not always 2. Moreover, when block-averaging the images, statistics are preserved by slightly renormalizing the image, i.e., by multiplying it by a constant at each blowdown, which is a way to measure the departure of images from scale invariance [Mumford, Zhu, and Gidas, 1997]. Our results about areas of sections also confirm this scale invariance, as is shown in Section III.
III. SIZES OF SECTIONS IN NATURAL IMAGES
As reviewed in the previous section, the known statistical properties of natural images are mainly first- or second-order statistics and statistics of additive decompositions of images. We have a different approach, working in the image domain on items that can have a straightforward visual interpretation. The goal of our experiments is to study the areas of homogeneous regions in digital images. We shall look at the area distribution of the connected components of bilevels of various digital photographs, and see that those distributions are very similar to one another. Indeed, the observed values are very close to functions $A/\mathrm{area}^{\alpha}$, where $\alpha$ is a real number around 2, and $A$ is an image-dependent constant. That is,
$$\mathrm{card}\{\text{homogeneous regions with area } a\} = \frac{A}{a^{\alpha}}.$$
We shall also study the distribution of the lengths of the boundaries of those connected components of bilevel sets and also observe a power law, with a
different exponent. Finally, the same result is true for the distribution of the lengths of the intersections of those components with lines of fixed direction. From the first two of those statistics, we are in a position to tell when images are not in the space of functions with bounded variation, BV, as is shown in Section IV.
A. The Distribution of Areas
We now make clearer what we mean by a homogeneous region of an image. We begin by equalizing the image histogram and uniformly quantizing it in the following way. We consider a digital image $I$ of size $H \times L$, with $G$ gray levels, and write $I(i, j)$ for the gray level at pixel $(i, j)$. Let $k$ be an integer less than $G$. Let $N_1$ be the first integer such that more than $HL/k$ pixels have a gray level less than $N_1$; then $N_2$, the first integer such that more than $2HL/k$ pixels have a gray level less than $N_2$; and so on for $N_3, \ldots, N_k = G$, defined in the same way, this sequence being possibly constant from some point on. For $l$ varying from 1 to $k$, let $I_l$ be the binary image with $I_l(i, j) = 1$ if $I(i, j) \in [N_{l-1}, N_l)$ and $I_l(i, j) = 0$ otherwise. We call those images the k-bilevels of $I$. Each bilevel image represents a quantization level of the equalized image. Next, we look at the area histogram of the connected components of the bilevels. For $s$ an integer varying from 0 to $HL$, let $f(s)$ be the number of connected components with area $s$ of the set of 1-valued pixels, in any of the k-bilevels of $I$. We consider both 4-connectivity (each pixel has 4 neighbors: up, down, right, left) and 8-connectivity (we add the diagonal neighbors, so that each pixel has 8 neighbors). We computed the function $f$ on many digital photographs. We did not attempt to use a single source of images; the digitized images either are scanned photographs or are from a digital camera, with diverse optical systems and exposures.
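The construction of the k-bilevels and of the area histogram $f$ can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the program used for the experiments: the image is assumed to be a list of rows of integer gray levels, the names `quantile_thresholds` and `area_histogram` are hypothetical, the lower bound $N_0$ is taken to be below every gray level, and the last threshold is set to the maximum gray level plus one so that every pixel falls in exactly one bilevel.

```python
from bisect import bisect_right
from collections import Counter, deque


def quantile_thresholds(image, k):
    """Return [N_1, ..., N_k]: N_l is the first integer such that more than
    l*H*L/k pixels have a gray level less than N_l (assumption: N_k = max + 1)."""
    pixels = sorted(v for row in image for v in row)
    total = len(pixels)
    # pixels[(l*total)//k] + 1 is the smallest N with count(pixels < N) > l*total/k.
    thresholds = [pixels[(l * total) // k] + 1 for l in range(1, k)]
    thresholds.append(pixels[-1] + 1)
    return thresholds


def area_histogram(image, k, connectivity=8):
    """Return a Counter f with f[s] = number of connected components of area s,
    counted over all k-bilevels of the image."""
    height, width = len(image), len(image[0])
    thresholds = quantile_thresholds(image, k)
    # labels[i][j] = index of the bilevel containing pixel (i, j).
    labels = [[bisect_right(thresholds, v) for v in row] for row in image]
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    seen = [[False] * width for _ in range(height)]
    f = Counter()
    for i in range(height):
        for j in range(width):
            if seen[i][j]:
                continue
            # Breadth-first flood fill restricted to pixels of the same bilevel.
            label, area = labels[i][j], 0
            seen[i][j] = True
            queue = deque([(i, j)])
            while queue:
                a, b = queue.popleft()
                area += 1
                for da, db in steps:
                    u, v = a + da, b + db
                    if (0 <= u < height and 0 <= v < width
                            and not seen[u][v] and labels[u][v] == label):
                        seen[u][v] = True
                        queue.append((u, v))
            f[area] += 1
    return f
```

Flood-filling the label image is equivalent to extracting the connected components of each binary bilevel separately, since two neighboring pixels are joined only when they belong to the same bilevel.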
Those functions are of the form $f(s) = C/s^{\alpha}$, with $C$ a constant and $\alpha$ a real number close to 2, for values of $s$ in a certain range and reasonable values of $k$ (basically between 4 and 30). The observed fit is excellent, as can be seen from Figure 2, which actually corresponds to one of the worst cases we observed. For fixed $k$, we consider the set of points
$$S = \{(\log s, \log f(s)),\ 1 \le s \le T_{\max}\}$$
where $T_{\max} + 1$ is the smallest value of $s$ such that $f(s) = 0$. We perform a linear regression on this set $S$ to find the straight line (in the log-log coordinates) $g(\log i) = A - \alpha \log i$ the closest to $S$ in the least-squares sense and write $E$ for the least-squares error. The choice to consider only the data
$s \le T_{\max}$ is motivated by the fact that $f$ takes mainly the values 0 and 1 beyond some point, which makes it difficult to fit $f$ to a reference function. Thus we minimize the energy
$$E(\alpha, A) = \sum_{i=1}^{T_{\max}} \left(\log f(i) - A + \alpha \log i\right)^2.$$
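The minimization of $E(\alpha, A)$ is an ordinary least-squares line fit in log-log coordinates, which has a closed-form solution. The following Python sketch illustrates it under stated assumptions: `f` is a mapping from size to count (for instance the output of an area histogram), the function names are hypothetical, and the returned error is the raw sum of squares written above (any normalization used for the tabulated values of $E$ is not specified here). The optional `t_min` argument restricts the fit to $T_{\min} \le s \le T_{\max}$.

```python
import math


def default_t_max(f):
    """T_max such that T_max + 1 is the smallest s >= 1 with f(s) = 0."""
    s = 1
    while f.get(s, 0) > 0:
        s += 1
    return s - 1


def fit_power_law(f, t_min=1, t_max=None):
    """Least-squares fit of log f(s) = A - alpha * log s for t_min <= s <= t_max.
    Returns (alpha, A, E, t_max), with E the sum of squared residuals."""
    if t_max is None:
        t_max = default_t_max(f)
    sizes = [s for s in range(t_min, t_max + 1) if f.get(s, 0) > 0]
    if len(sizes) < 2:
        raise ValueError("need at least two sizes with f(s) > 0")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(f[s]) for s in sizes]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    alpha = -slope                    # the fitted line is A - alpha * log s
    A = mean_y - slope * mean_x
    E = sum((y - A + alpha * x) ** 2 for x, y in zip(xs, ys))
    return alpha, A, E, t_max
```

For an exact power law such as $f(s) = 3600/s^2$ on $1 \le s \le 6$, the fit recovers $\alpha = 2$ and $A = \log 3600$ up to rounding.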
B. Digital Photographs
We present the results for pictures having different scales and textures (see Figures 1–7) in Tables 1–7. The two texture images in Figures 6 and 7 are from Brodatz’s album. The value of $\alpha$ seems to be related to the amount of texture in the image; the more textured the image, the bigger the value of $\alpha$. Typically, for photographs of natural scenes, the value of $\alpha$ is between 1.5 and 3 (the values close to 3 being reached for images such as the baboon (Figure 3), which present textured areas), whereas for textures it is typically between 2.5 and 3.5. We also performed the linear regression on sets of points
$$S_{T_{\min}} = \{(\log s, \log f(s)),\ T_{\min} \le s \le T_{\max}\}$$
FIGURE 1. Baobab, 665 × 1024, 119 gray levels.
[Log-log plot: size of c.c. (horizontal axis) versus number of c.c. (vertical axis).]
FIGURE 2. Function $f$ (area distribution) for the baobab image (Figure 1), $k = 16$, 8-connectivity, $T_{\max} = 182$.
FIGURE 3. Baboon, 512 × 512, 224 gray levels.
for various values of Tmin to show that the fit of S to the power law was not forced by small areas only, and moreover that if the contribution of E mainly comes from the large areas, the value of ˛ computed with those large areas was close to the initial value. The results for the image of the baobab are shown in Table 3.
FIGURE 4. Airport, 510 × 341, 256 gray levels.
[Log-log plot: size of c.c. (horizontal axis) versus number of c.c. (vertical axis).]
FIGURE 5. Function $f$ for the airport image (Figure 4), $k = 16$, 8-connectivity, $T_{\max} = 92$.
C. Other Types of Images
In order to see whether the power law is in some sense characteristic of digital photographs, we computed histograms of areas of bilevels for other types of images. We looked at noise images, white or correlated; text images; and synthesized images consisting of the superposition of simple shapes. White noise images, i.e., images in which the gray level values at distinct pixels are independent random variables, present a histogram of the form $f(s) = \exp(-Cs)$, with $C$ a constant. We observed this fact on two different kinds of white noises: uniform and Gaussian. Thus the value of $E$ is large for those images, and so is $\alpha$, as presented in Table 8.
FIGURE 6. Texture1, 256 × 256, 153 gray levels.
FIGURE 7. Texture2, 256 × 256, 118 gray levels.
TABLE 1. DIFFERENT VALUES OF THE QUANTIZATION NUMBER k FOR THE BAOBAB IMAGE, 8-CONNECTIVITY. AREA DISTRIBUTION IS $f(s) = A s^{-\alpha}$, $T_{\max}$ IS THE MAXIMAL CONSIDERED AREA.
Image     k     $\alpha$   E      $T_{\max}$   A
Baobab    20    1.98       0.24   109          11.2
Baobab    16    1.98       0.45   182          11.0
Baobab    12    1.85       0.31   111          10.3
Baobab    8     1.67       0.27   82           9.3
Text images produced by a text editor lead, as one would guess, to a histogram consisting of isolated peaks, whose height is not directly related to the value of $s$. Thus the fit to the power law is poor; see Figure 8 and Table 9. Concerning the synthesized images, we observed two distinct behaviors. First, if we consider superposition of simple shapes (disks, rectangles, polygons) with
TABLE 2. DIFFERENT VALUES OF k FOR THE BAOBAB IMAGE, 4-CONNECTIVITY.
Image     k     $\alpha$   E      $T_{\max}$   A
Baobab    20    2.15       0.32   146          11.7
Baobab    16    2.06       0.38   160          11.4
Baobab    12    1.96       0.30   106          10.8
Baobab    8     1.86       0.40   133          10.1
TABLE 3. DIFFERENT VALUES OF $T_{\min}$ FOR THE BAOBAB IMAGE, $k = 16$, 8-CONNECTIVITY.
Image     $T_{\min}$   $\alpha$   E      A
Baobab    5            2.04       0.44   11.3
Baobab    10           2.07       0.44   11.4
Baobab    20           2.09       0.46   11.5
Baobab    50           2.08       0.50   11.5
TABLE 4. DIFFERENT VALUES OF k FOR THE BABOON IMAGE, 8-CONNECTIVITY.
Image     k     $\alpha$   E      $T_{\max}$   A
Baboon    20    2.55       0.30   70           11.7
Baboon    16    2.38       0.33   82           11.3
Baboon    12    2.42       0.47   78           11.4
Baboon    8     2.35       0.41   76           11.2
TABLE 5. DIFFERENT VALUES OF k FOR THE AIRPORT IMAGE (FIGURE 4), 8-CONNECTIVITY.
Image      k     $\alpha$   E      $T_{\max}$   A
Airport    20    1.95       0.43   92           9.7
Airport    16    1.93       0.47   92           9.6
Airport    12    1.79       0.38   95           9.0
Airport    8     1.83       0.35   63           8.7
various laws of sizes and possibly made of subobjects, the function $f(s)$ is often well fitted by a power law, but with an $\alpha$ close to 1. This is quite difficult to predict by calculations, and we refer to the section presenting the dead leaves model for the mathematical analysis of those phenomena. We present two results. In the first one we considered 50 images produced by the superposition of disks of the same size. Each disk has a random color uniformly drawn between 0 and 255, and its center follows a uniform law in the image (actually in a slightly enlarged
TABLE 6. DIFFERENT VALUES OF k FOR THE TEXTURE1 IMAGE (FIGURE 6), 8-CONNECTIVITY.
Image       k     $\alpha$   E      $T_{\max}$   A
Texture1    20    3.59       0.66   19           11.5
Texture1    16    3.56       0.66   26           11.7
Texture1    12    3.34       0.67   31           11.6
Texture1    8     2.44       0.51   25           10.4
TABLE 7. DIFFERENT VALUES OF k FOR THE TEXTURE2 IMAGE (FIGURE 7), 8-CONNECTIVITY.
Image       k     $\alpha$   E      $T_{\max}$   Slope
Texture2    20    3.42       0.43   28           11.3
Texture2    16    3.17       0.45   29           11.1
Texture2    12    2.91       0.34   34           10.9
Texture2    8     2.61       0.49   38           10.7
TABLE 8. $k = 16$ FOR WHITE NOISES, 8-CONNECTIVITY.
Image             $\alpha$   E      $T_{\max}$   A
Uniform noise     5.00       0.93   11           13.5
Gaussian noise    4.70       0.88   11           13.4
FIGURE 8. Text, 703 × 615, 2 gray levels.
image). In the section about the dead leaves model we used the same image to compute average sizes, but with a different analysis to avoid the union of objects and boundary problems. The values of fs for each s were then added over all images, and the function f fitted by a power law as before. We then performed the same analysis with disks having an area distributed according to a power law with exponent 2, and a radius between 20 and 80. See Table 10.
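The synthesis of such disk images can be sketched as follows. This is a minimal illustration and not the authors' program; several choices are assumptions made only for the example: the area is drawn from a density proportional to $a^{-2}$ by inverting its cumulative distribution, a finite number `n_disks` of disks is painted one over the other on a background of gray level 0, the domain of the centers is enlarged by `r_max` on each side, and the function names are hypothetical.

```python
import math
import random


def sample_radius(rng, r_min=20.0, r_max=80.0):
    """Draw a radius whose disk area has density proportional to area**(-2)
    on [pi*r_min**2, pi*r_max**2], by inverting the cumulative distribution."""
    inv_a_min = 1.0 / (math.pi * r_min ** 2)
    inv_a_max = 1.0 / (math.pi * r_max ** 2)
    u = rng.random()
    area = 1.0 / (inv_a_min - u * (inv_a_min - inv_a_max))
    return math.sqrt(area / math.pi)


def dead_leaves(height, width, n_disks, seed=0, r_min=20.0, r_max=80.0):
    """Superpose n_disks disks with uniform random gray levels in 0..255.
    Later disks hide earlier ones (assumption: finite painter-style stacking)."""
    rng = random.Random(seed)
    image = [[0] * width for _ in range(height)]
    for _ in range(n_disks):
        r = sample_radius(rng, r_min, r_max)
        # Assumption: centers are uniform in the image enlarged by r_max.
        cy = rng.uniform(-r_max, height + r_max)
        cx = rng.uniform(-r_max, width + r_max)
        gray = rng.randint(0, 255)
        for i in range(max(0, math.floor(cy - r)), min(height, math.ceil(cy + r) + 1)):
            for j in range(max(0, math.floor(cx - r)), min(width, math.ceil(cx + r) + 1)):
                if (i - cy) ** 2 + (j - cx) ** 2 <= r * r:
                    image[i][j] = gray
    return image
```

Setting `r_min == r_max` is not supported by this sampler as written in spirit (the distribution degenerates), so disks of a single size would simply use a constant radius instead of `sample_radius`.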
TABLE 9. $k = 2$ FOR THE TEXT IMAGE (FIGURE 8).
Image    Connectivity   $\alpha$   E      $T_{\max}$   A
Text     8              0.84       1.15   17           4.0
Text     4              1.70       0.81   59           8.1
TABLE 10. VARIOUS SYNTHESIZED IMAGES, 8-CONNECTIVITY, $k = 16$.
Image               $\alpha$   E      $T_{\max}$   A
Disks               0.86       0.29   1000         9.0
Disks, power law    0.80       0.39   807          6.9
Eyes (Figure 9)     1.66       0.31   686          12.4
Furthermore, if we do the same with more complicated figures, having holes and asperities, the observed law is a power law as well, but with an $\alpha$ close to 2. In the example presented in Figure 9, the basic shape is itself the connected component of the bilevel of a photograph, actually a part of a fly eye. It must be emphasized that discretization effects are very strong in these types of complicated shapes, but if we perform a morphological closing of size 1 pixel on the basic shapes, the results are not qualitatively affected, which tends to minimize the effect of discretization. Last but not least, we present the results obtained when analyzing a correlated noise. We performed convolutions between white noises and a Gaussian
$$\frac{1}{\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{\sigma^2}\right)$$
FIGURE 9. Dead leaves model, where all shapes have the same size, 900 × 900, 256 gray levels.
TABLE 11. UNIFORM WHITE NOISE IMAGE, AFTER CONVOLUTION WITH A GAUSSIAN OF PARAMETER $\sigma$, $k = 12$, 8-CONNECTIVITY.
Image                              $\sigma$   $\alpha$   E      $T_{\max}$   A
Noise convolved with a Gaussian    0.71       3.88       0.60   26           13.3
Noise convolved with a Gaussian    0.85       3.57       0.50   30           13.1
Noise convolved with a Gaussian    1.12       3.27       0.55   39           13.0
Noise convolved with a Gaussian    1.58       2.87       0.47   59           12.6
where $\sigma$ is a variable parameter. This was done by multiplication in the frequency domain. Such a convolution can be seen as a crude approximation of the effect of an optical lens. The results we obtain for those images were similar to the ones for digital photographs of textures. We present the results obtained in the case of the white uniform noise in Table 11. We also tested the effect of the convolution with Bessel functions (Fourier transform of disks), and the results were very similar. Those “nonnatural images” lead to two remarks about the $1/\mathrm{area}^2$ law. First, this law does not characterize natural images, even though a correlated noise looks similar to a natural texture, and the complicated synthesized images may look familiar to us. Second, the size law could be related to the way the optical photographic device captures the image, as suggested by the behavior of noise convolved with a Gaussian. More precisely, we observed that the convolution with a Gaussian increases the value of $\alpha$ for images where the initial $\alpha$ is small (such as text and synthetic images) whereas it tends to decrease its value when it is initially bigger than 2 (noises). Another, more satisfactory explanation of this power law relies on the assumption that natural images are scale invariant so that all observed statistics should be scale (zoom) invariant. This simple assumption yields the $1/\mathrm{area}^2$ law. Indeed, if we suppose that the total area occupied by regions having an area between $A$ and $A'$ is the same as the total area occupied by regions with area between $tA$ and $tA'$, for all $t$, $A$, $A'$, then the same reasoning as presented for the power spectrum (see Section II.C.1) leads to the power law with exponent 2.
D. The Distribution of Boundary Lengths
We performed exactly the same analysis on the boundary lengths of connected components of bilevels as we did before on areas of those components. As a discrete definition of the boundary length of a discrete connected set $S$ (8-connectivity), we chose to count the pixels not belonging to $S$ that are neighbors of some pixel of $S$ in the 4-connectivity sense. There are many other ways to define discrete boundary length. We tried several other methods that gave basically
the same results as the one we detail here. The notations $k$, $E$, $A$, $T_{\min}$, and $T_{\max}$ refer to the same quantities as before; $\beta$ now stands for the exponent of the power law whose fit to the boundary length distribution is the best in the least-squares sense. We chose $T_{\min} = 10$, because some small values for the boundary lengths are attained only for regions touching the border of the image. The fit to the power law is again very good, and the exponent $\beta$ is usually between 2 and 3. We present the results for the two images of the baobab and the baboon in Tables 12 and 13. We note that $\beta \simeq 2\alpha - 1$ corresponds to connected components of bilevel sets satisfying on average a decent isoperimetric ratio,
$$c \le \frac{\mathrm{area}^{1/2}}{\text{boundary length}} \le C.$$
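The relation $\beta \simeq 2\alpha - 1$ can be checked by a short change of variables. The sketch below is heuristic and rests on simplifying assumptions that are not part of the experiments: sizes are treated as continuous variables, and the boundary length $p$ of a region of area $a$ is taken to be exactly proportional to $a^{1/2}$, with a hypothetical constant $\kappa$.

```latex
% Assumption: p = \kappa\, a^{1/2}, i.e. a = p^2 / \kappa^2 (\kappa is a hypothetical constant).
% f(a) = A a^{-\alpha} is the area distribution, g(p) the boundary length distribution.
\begin{aligned}
g(p)\,dp &= f(a)\,da, \qquad a = \frac{p^{2}}{\kappa^{2}}, \qquad da = \frac{2p}{\kappa^{2}}\,dp,\\
g(p) &= A\left(\frac{p^{2}}{\kappa^{2}}\right)^{-\alpha}\frac{2p}{\kappa^{2}}
      = 2A\,\kappa^{2\alpha-2}\,p^{-(2\alpha-1)} .
\end{aligned}
```

Under these assumptions the boundary length distribution is again a power law, with exponent $\beta = 2\alpha - 1$. As a comparison with the measured values, for the baobab image with $k = 16$, Table 1 gives $\alpha = 1.98$, so that $2\alpha - 1 = 2.96$, whereas Table 12 gives $\beta = 2.37$.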
This is not the case in general, except for some images of textures.
E. The Distribution of Intercept Lengths
We also studied the distribution of the lengths of the intersections of the bilevel sets with lines of fixed direction. This was motivated by the fact that this quantity, in an object-based model of images, is easily amenable to calculations (see Section V.D). We once more observed a power law, which, according to the previously mentioned model, is compatible with a power law hypothesis
TABLE 12. BOUNDARY LENGTHS FOR THE BAOBAB IMAGE, WITH DIFFERENT VALUES OF THE QUANTIZATION NUMBER k.
Image     k     $\beta$   E      $T_{\max}$   A
Baobab    20    2.39      0.34   176          13.2
Baobab    16    2.37      0.35   154          13.0
Baobab    12    2.26      0.25   125          12.3
Baobab    8     2.23      0.35   111          11.6
TABLE 13. BOUNDARY LENGTHS FOR THE BABOON IMAGE, WITH DIFFERENT VALUES OF THE QUANTIZATION NUMBER k.
Image     k     $\beta$   E      $T_{\max}$   A
Baboon    20    3.02      0.28   82           14.2
Baboon    16    2.91      0.35   81           13.9
Baboon    12    2.93      0.41   100          14.0
Baboon    8     2.89      0.33   96           13.8
TABLE 14. INTERCEPT LENGTHS FOR THE BAOBAB IMAGE, WITH DIFFERENT VALUES OF THE QUANTIZATION NUMBER k.
Image     k     $\beta$   E      $T_{\max}$   A
Baobab    20    2.72      0.42   88           13.3
Baobab    16    2.60      0.37   77           13.0
Baobab    12    2.55      0.47   118          13.1
Baobab    8     2.33      0.44   124          12.6
TABLE 15 INTERCEPT LENGTHS FOR THE BABOON IMAGE, WITH DIFFERENT VALUES OF THE QUANTIZATION NUMBER k. Image
k
ˇ
E
Tmax
A
Baboon Baboon Baboon Baboon
20 16 12 8
3.79 3.58 3.37 3.23
0.47 0.35 0.41 0.58
24 27 29 44
12.9 12.8 12.8 13.0
for the size distribution of objects, even if the model is not isotropic. See Section V.E. However, it should be mentioned that the result is strongly varying with the direction. The experiments we present here are concerned with the horizontal direction; see Tables 14 and 15. The observed values for the exponent of the power law are typically between 2 and 3.5.
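The two discrete measurements described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a region is represented as a set of (row, column) pixel pairs, the function names `boundary_length` and `horizontal_intercepts` are hypothetical, and neighbors falling outside the image are counted like any other pixel (no special treatment of the image border).

```python
def boundary_length(region):
    """Count the pixels outside `region` that are 4-neighbors of one of its pixels."""
    region = set(region)
    outside = set()
    for (r, c) in region:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            q = (r + dr, c + dc)
            if q not in region:
                outside.add(q)
    return len(outside)


def horizontal_intercepts(region):
    """Sorted lengths of the maximal horizontal runs of pixels in `region`."""
    region = set(region)
    lengths = []
    for (r, c) in region:
        if (r, c - 1) not in region:      # (r, c) is the left end of a run
            n = 1
            while (r, c + n) in region:
                n += 1
            lengths.append(n)
    return sorted(lengths)
```

For a 2 x 2 square of pixels, the first function returns 8 and the second returns two intercepts of length 2. Histograms of these quantities over all sections are what the power law is fitted to.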
IV. SIZE OF SECTIONS AND THE BV NORM OF NATURAL IMAGES

The aim of this section is to give a computational tool to decide whether an image can belong to the space BV of functions with bounded variation. The BV assumption for natural images is far ranging, from image restoration [Rudin, 1987; Rudin, Osher, and Fatemi, 1992] to image compression.

A. A Lower Bound for the BV Norm

We consider $I \in BV(\Omega)$, a bounded image belonging to the space of functions with bounded variation [Ziemer, 1989; Evans and Gariepy, 1992] on a domain $\Omega \subset \mathbb{R}^2$ (e.g., rectangular). For $\lambda \in \mathbb{R}$, define the level set of $I$ with level $\lambda$ by $\chi_\lambda I = \{x, I(x) \ge \lambda\}$.
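Recall that a function is of bounded variation if, for almost every $\lambda \in \mathbb{R}$, the level set $\chi_\lambda I$ is a set with finite perimeter and, denoting this perimeter by $\mathrm{per}(\chi_\lambda I)$ (for a precise definition of the perimeter and the essential boundary, refer to Evans and Gariepy [1992]),

$$\|I\|_{BV} = \int_{\mathbb{R}} \mathrm{per}(\chi_\lambda I)\, d\lambda. \tag{1}$$

(By the coarea formula [Evans and Gariepy, 1992], we also have $\|I\|_{BV} = \int_\Omega |DI|$.) In addition, by the classical isoperimetric inequality, we have for every set $O$ with finite perimeter,

$$\mathrm{per}(O) \ge 2\pi^{1/2} |O|^{1/2} \tag{2}$$

where $|O|$ denotes the Lebesgue measure of $O$. In the following, we shall consider sections of the image. We always assume that the image $I$ satisfies $0 \le I(x) \le C$. We first fix two parameters $\lambda$, $\mu$, with $0 \le \mu \le \lambda$. For any $n \in \mathbb{N}$, we consider the bilevel sets of $I$

$$\{x,\ \mu + (n-1)\lambda \le I(x) < \mu + n\lambda\} = \chi_{\mu+(n-1)\lambda} I \setminus \chi_{\mu+n\lambda} I.$$

We call $(\lambda, \mu)$-section of $I$ any set that is a connected component of a bilevel set $\chi_{\mu+(n-1)\lambda} I \setminus \chi_{\mu+n\lambda} I$ for some $n$. We denote each one of them by $S_{\lambda,\mu,i}$ for $i \in J(\lambda,\mu)$, a set of indices. Notice that the $(\lambda,\mu)$-sections are disjoint and their union is the image domain $\Omega$,

$$\bigcup_{i \in J(\lambda,\mu)} S_{\lambda,\mu,i} = \Omega. \tag{3}$$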
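On a digital image, the sections defined above can be extracted by quantizing the gray levels and labeling connected components. The following Python sketch is one possible way to do it, assuming the image is given as a list of rows of numbers and using 8-connectivity as in the discrete experiments; the function name `sections` is hypothetical.

```python
from math import floor


def sections(image, lam, mu):
    """Return the (lam, mu)-sections of `image` as lists of (row, col) pixels."""
    rows, cols = len(image), len(image[0])
    # Index of the bilevel set containing each pixel: floor((I - mu) / lam).
    level = [[floor((v - mu) / lam) for v in row] for row in image]
    seen = [[False] * cols for _ in range(rows)]
    result = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            seen[r0][c0] = True
            stack, component = [(r0, c0)], []
            while stack:                      # flood fill with 8-connectivity
                r, c = stack.pop()
                component.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and not seen[rr][cc]
                                and level[rr][cc] == level[r0][c0]):
                            seen[rr][cc] = True
                            stack.append((rr, cc))
            result.append(component)
    return result
```

The returned components are disjoint and cover every pixel, which is the discrete counterpart of (3). Their sizes give the areas whose distribution is studied next.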
There are several ways to define the connected components of a set with finite perimeter, because such a set is defined up to a set with zero Lebesgue measure. We denote by $\mathcal{H}^1$ the one-dimensional Hausdorff measure, i.e., the length. In the following, we call a Jordan curve a simple closed curve of $\mathbb{R}^2$, i.e., the range of a continuous map $c : [0, 1] \to \mathbb{R}^2$, such that $c(s) \ne c(t)$ for all $0 < s < t < 1$, and $c(0) = c(1)$. A Jordan curve defines two and only two connected components (in the usual sense) of $\mathbb{R}^2 \setminus c([0, 1])$, one bounded and one unbounded. We shall say that a Jordan curve separates two points $x$ and $y$ if they do not belong to the same connected component of $\mathbb{R}^2 \setminus c([0, 1])$. One can prove [De Giorgi, 1954; Caselles and Morel, 1998] that a definition of connected components for a set with finite perimeter permits the following statements:
Theorem 1 (and definition) Let $O$ be a set with finite perimeter.
(i) The essential boundary of $O$ consists, up to a set of zero $\mathcal{H}^1$-measure, of a countable set of noncrossing simple rectifiable closed curves $c_j$ with finite length such that $\mathrm{per}(O) = \sum_j \mathcal{H}^1(c_j)$.
(ii) Two points are in the same connected component of $O$ if and only if for any representation of the essential boundary by a family of Jordan curves of the preceding kind, $c_j$, they are not separated by one of the $c_j$.
(iii) With this definition, the perimeter of a set with finite perimeter is the sum of the perimeters of its connected components.

We denote by $J_n \subset J(\lambda,\mu)$ the set of indices of sections that are connected components of $\chi_{\mu+(n-1)\lambda} I \setminus \chi_{\mu+n\lambda} I$. As an obvious consequence of Theorem 1, we have the following.

Corollary 1

$$\mathrm{per}(\chi_{\mu+(n-1)\lambda} I \setminus \chi_{\mu+n\lambda} I) = \sum_{i \in J_n} \mathrm{per}(S_{\lambda,\mu,i}).$$

When $A$ is a set with finite perimeter, we have [Evans and Gariepy, 1992] $\mathrm{per}(A) = \|\mathbb{1}_A\|_{BV}$, where $\mathbb{1}_A$ is the characteristic function of the set $A$.

Lemma 1 If $B \subset A$ are two sets with finite perimeter, then $\mathrm{per}(A \setminus B) \le \mathrm{per}(A) + \mathrm{per}(B)$.

Proof Indeed, by the subadditivity of the BV norm, we deduce from $\mathbb{1}_{A \setminus B} = \mathbb{1}_A - \mathbb{1}_B$ that $\mathrm{per}(A \setminus B) \le \mathrm{per}(A) + \mathrm{per}(B)$.

In the following theorem, we analyze the statistics of sizes of sections. We fix $\lambda$, i.e., the overall contrast of considered sections, and for each $0 \le \mu \le \lambda$, we count all sections $S_{\lambda,\mu,i}$ that have an area between $s$ and $s + ds$. In other terms we consider the integer $\mathrm{card}\{i,\ s \le |S_{\lambda,\mu,i}| \le s + ds\}$.
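We average this number over all $\mu$'s in $[0, \lambda]$, and assume that this average number has a density $f(\lambda, s)$ with respect to $s$. In other terms,

$$\frac{1}{\lambda} \int_0^\lambda \mathrm{card}\{i,\ s \le |S_{\lambda,\mu,i}| \le s + ds\}\, d\mu = f(\lambda, s)\, ds. \tag{4}$$

Theorem 2 Assume that there exists some $\lambda > 0$ such that (4) holds, i.e., the average number of sections with area $s$, for $0 \le \mu \le \lambda$, has a density $f(\lambda, s)$. Then there is a constant $c$, not depending on $I$, such that

$$\|I\|_{BV} \ge c\,\lambda \int_0^{|\Omega|} s^{1/2} f(\lambda, s)\, ds. \tag{5}$$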
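Proof Applying Corollary 1 and Lemma 1,

$$\begin{aligned}
\|I\|_{BV} &= \int_{\mathbb{R}} \mathrm{per}\{x,\ I(x) \ge t\}\, dt \\
&= \frac{1}{2} \left( \int_{\mathbb{R}} \mathrm{per}\{x,\ I(x) \ge t\}\, dt + \int_{\mathbb{R}} \mathrm{per}\{x,\ I(x) \ge t + \lambda\}\, dt \right) \\
&\ge \frac{1}{2} \int_{\mathbb{R}} \mathrm{per}(\chi_t I \setminus \chi_{t+\lambda} I)\, dt = \frac{1}{2} \sum_{n \in \mathbb{Z}} \int_{n\lambda}^{(n+1)\lambda} \mathrm{per}(\chi_t I \setminus \chi_{t+\lambda} I)\, dt \\
&= \frac{1}{2} \sum_{n \in \mathbb{Z}} \int_0^\lambda \mathrm{per}(\chi_{\mu+(n-1)\lambda} I \setminus \chi_{\mu+n\lambda} I)\, d\mu \\
&= \frac{1}{2} \int_0^\lambda \sum_{i \in J(\lambda,\mu)} \mathrm{per}(S_{\lambda,\mu,i})\, d\mu.
\end{aligned}$$

By isoperimetric inequality (2), we therefore obtain

$$\|I\|_{BV} \ge \pi^{1/2} \int_0^\lambda \sum_{i \in J(\lambda,\mu)} |S_{\lambda,\mu,i}|^{1/2}\, d\mu.$$

Applying the Fubini-Tonelli theorem, some slicing, and Assumption (4), we get

$$\begin{aligned}
\|I\|_{BV} &\ge \pi^{1/2} \int_0^\lambda d\mu \int_0^{|\Omega|} \mathrm{card}\{i \in J(\lambda,\mu),\ s \le |S_{\lambda,\mu,i}| \le s + ds\}\, s^{1/2} \\
&= \pi^{1/2} \int_0^{|\Omega|} \int_0^\lambda d\mu\ \mathrm{card}\{i \in J(\lambda,\mu),\ s \le |S_{\lambda,\mu,i}| \le s + ds\}\, s^{1/2} \\
&= \pi^{1/2}\, \lambda \int_0^{|\Omega|} s^{1/2} f(\lambda, s)\, ds.
\end{aligned}$$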
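The intermediate inequality of the proof suggests a direct numerical estimate: average the sum of the square roots of the section areas over a few sampled offsets $\mu$, then multiply by $\pi^{1/2}\lambda$. The Python sketch below assumes the section areas have already been measured (in pixels) for each sampled offset; the function name `bv_lower_bound` and the input layout are illustrative assumptions, not part of the original experiments.

```python
from math import pi, sqrt


def bv_lower_bound(areas_per_offset, lam):
    """Estimate sqrt(pi) * integral over mu in [0, lam] of sum_i |S_i|**0.5.

    `areas_per_offset` holds one list of section areas per sampled offset mu;
    the integral is approximated by lam times the average over the samples.
    """
    sums = [sum(sqrt(a) for a in areas) for areas in areas_per_offset]
    return sqrt(pi) * lam * sum(sums) / len(sums)
```

With more sampled offsets the average approaches the integral over $[0, \lambda]$. The value remains a lower bound only up to the discretization of areas by pixel counts.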
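We can repeat the preceding analysis by assuming now that $\mathrm{card}\{i,\ p \le \mathrm{per}(S_{\lambda,\mu,i}) \le p + dp\}$ has an average density $g(\lambda, p)$ with respect to $p$; i.e.,

$$\frac{1}{\lambda} \int_0^\lambda \mathrm{card}\{i,\ p \le \mathrm{per}(S_{\lambda,\mu,i}) \le p + dp\}\, d\mu = g(\lambda, p)\, dp. \tag{6}$$

Then we have the analog of Theorem 2 for the perimeters of sections.

Theorem 3 Assume that there exists some $\lambda > 0$ such that (6) holds, i.e., the average number of sections with perimeter $p$, for $0 \le \mu \le \lambda$, has a density $g(\lambda, p)$. Then

$$\|I\|_{BV} \ge \frac{\lambda}{2} \int_0^{+\infty} p\, g(\lambda, p)\, dp. \tag{7}$$

Proof By proceeding as in the proof of Theorem 2, we have again

$$\|I\|_{BV} \ge \frac{1}{2} \int_0^\lambda \sum_{i \in J(\lambda,\mu)} \mathrm{per}(S_{\lambda,\mu,i})\, d\mu.$$

Applying again some slicing and Assumption (6) we get

$$\|I\|_{BV} \ge \frac{1}{2} \int_0^\lambda d\mu \int_0^{+\infty} \mathrm{card}\{i \in J(\lambda,\mu),\ p \le \mathrm{per}(S_{\lambda,\mu,i}) \le p + dp\}\, p = \frac{\lambda}{2} \int_0^{+\infty} p\, g(\lambda, p)\, dp.$$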
B. Application to Natural Images

In this section, we draw the consequences of Theorems 2 and 3 for the images analyzed in Section III. According to the results of that section, we can assume that the considered images satisfy

$$f(\lambda, s) = \frac{C}{s^\alpha} \tag{8}$$

$$g(\lambda, p) = \frac{C}{p^\beta} \tag{9}$$

for some constants $\alpha > 0$, $\beta > 0$. This law has been experimentally checked for several values of $\lambda = 256/k$, $k$ ranging from 8 to 20. We also checked that
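the value of $\alpha$ was almost not modified when the bilevels were not defined from gray level 0, but from some gray level less than $256/k$ (i.e., in the continuous model, for different values of $\mu$). By Theorem 2 we have

$$\|I\|_{BV} \ge c\,\lambda \int_0^{|\Omega|} \frac{C s^{1/2}}{s^\alpha}\, ds = +\infty \quad \text{if } \alpha > \frac{3}{2}$$

and in the same way, by Theorem 3,

$$\|I\|_{BV} \ge \frac{\lambda}{2} \int_0^{+\infty} \frac{C p}{p^\beta}\, dp = +\infty \quad \text{if } \beta > 2.$$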
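The divergence comes from the behavior of the integrands near 0. As a small numerical illustration (a sketch, not part of the original experiments), the truncated integral of $x^{-e}$ over $[\varepsilon, 1]$ has a closed form; here $e = \alpha - 1/2$ for the area bound and $e = \beta - 1$ for the perimeter bound. The function name `truncated_integral` is hypothetical.

```python
from math import log


def truncated_integral(exponent, eps, upper=1.0):
    """Closed form of the integral of x**(-exponent) over [eps, upper]."""
    if exponent == 1.0:
        return log(upper / eps)
    p = 1.0 - exponent
    return (upper ** p - eps ** p) / p
```

With `exponent = 0.5` (that is, $\alpha = 1$) the value stays below 2 however small `eps` is, whereas with `exponent = 1.5` (that is, $\alpha = 2$) it grows like `eps ** -0.5` as the lower cutoff shrinks.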
Thus if we admit that (8) and (9) indeed hold for natural images when $s \to 0$, as is indicated by the experiments of Section III, we determine that the considered images are not in BV if $\alpha > 3/2$ or $\beta > 2$. Notice, however, that $\alpha > 2$, which happens for several of the considered images, is not compatible with a finite image area, since then $\int s\, ds / s^\alpha = +\infty$. As suggested to us by Vicent Caselles and Stéphane Mallat, this raises the question of whether the area is correctly measured by covering pixels. In fact, if a region is very ragged, the cardinality of covering pixels may be related to its perimeter as well, in which case the estimate of $g(\lambda, p)$ is more reliable.

We here point out that wavelet coefficients (see Meyer [1993] and Mallat [1997] for an introduction to wavelet decompositions) also give a way to decide whether or not an image belongs to the space BV. Let $(c_k)$ be the wavelet coefficients of the image $I$, ordered in a nonincreasing sequence. Let us suppose that the wavelets have compact supports. We say that the $c_k$'s are in $\ell^1$ if $\sum_k |c_k| < +\infty$ and that they are in weak-$\ell^1$ if there exists a constant $C$ such that $|c_k| \le C/k$. Obviously $\ell^1$ is included in weak-$\ell^1$. It is quite easy to prove that if the $c_k$ are in $\ell^1$, then $I$ is in BV. In the other direction, Cohen et al. [1998] recently proved that if $I$ is in BV, then the $c_k$'s are in weak-$\ell^1$. Thus it is possible to decide whether or not an image belongs to BV by looking at the decay of its wavelet coefficients, except if they decrease as $C/k$, which often happens to be the case [Mallat, personal communication]. Moreover, it is worth noticing that the wavelet coefficients produced by the characteristic function of a simple shape already decrease as $1/k$. We do not present here a precise comparison between the two criteria. Let us just mention that in the case of the baboon image (Figure 3), both methods agree: this image is not in BV. For the well-known image of Lena, our approach gives an $\alpha$ of 1.9 (for $k = 16$), which suggests that Lena is not in BV, whereas from the wavelet approach, the image is in BV.
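The two sequence-space quantities involved in the wavelet criterion are easy to compute on a finite list of coefficients. The Python sketch below is only illustrative: it assumes the coefficients are already available as a flat list of numbers (no wavelet transform is performed), and the names `l1_norm` and `weak_l1_constant` are hypothetical.

```python
def l1_norm(coeffs):
    """Sum of the absolute values of the coefficients."""
    return sum(abs(c) for c in coeffs)


def weak_l1_constant(coeffs):
    """Smallest C such that the k-th largest magnitude is at most C / k."""
    mags = sorted((abs(c) for c in coeffs), reverse=True)
    return max((k * m for k, m in enumerate(mags, start=1)), default=0.0)
```

For coefficients decaying exactly as $1/k$, the weak-$\ell^1$ constant stays equal to 1 while the $\ell^1$ norm keeps growing with the number of coefficients, which is the undecidable borderline case mentioned above.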
V. THE DEAD LEAVES MODEL

A tridimensional scene may be modeled as a group of solid objects occluding each other partially or fully, according to their respective positions and their position with respect to the observer. How, then, can the observer make inferences on the size of these partially hidden objects? And which ratio can we assume between the size of visible parts of objects and the size of the objects themselves? This question can be addressed in a statistical framework: given a size distribution for the objects before occlusion, how can we compute the size distribution of visible parts? In order to do so, we shall use a random shape model developed by Matheron [1967, 1968], in the framework of his theory, mathematical morphology. He called this model the dead leaves model. Its formalization requires some explanations, definitions, and mathematical results, which are given in Section V.A.

This study has two motivations. The first one comes from phenomenology, which states that the occlusion phenomenon, so essential in human or animal life, does not seem to affect too much our perception of objects. In particular, Kanizsa [1979] studied several perceptual strategies to detect occlusions but also to get rid of them by mental completion. This leads to the notion that with only the visible parts of objects, a layered view can be mentally reconstructed simply by interpolating their hidden parts. Even though gestaltists do not seem to have addressed the question, it is plain that such an interpolation (which Kanizsa called amodal completion) can be performed only if a significant part of each object is under sight, large and significant enough to make extrapolation possible. Amodal completion might be in many cases related to our prior knowledge of objects. However, Kanizsa and his school proved that the completion takes place fairly independently from this prior knowledge, always giving a preference to constraints of a geometric nature, whether or not these lead to a completion that contradicts common sense.

The second motivation is the experimental program detailed in Section III. Even though the link between our definition of objects and the real visible parts of objects is not obvious, we can, at least in one case, derive a power law with the help of the model. The dead leaves model permits study of the influence of occlusion on the size distribution of a set of objects, with some restrictions:

• We shall address only stationary models, which do not take into account perspective for instance. It is possible to include perspective in the framework of the dead leaves model, but this will not be presented here.
• Objects will be assumed to spread out independently, which is of course an undue restriction for a real scene, where objects interact physically.
• Most of the theoretical results will not deal directly with the area of visible objects but rather with the size distribution of their intersections with a straight line.
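A. Random Closed and Compact Sets of $\mathbb{R}^n$

This section contains the prerequisites for the understanding of the dead leaves model. The interested reader should consult Matheron [1975]. For an introduction to the methods and applications of stochastic geometry, we refer to Stoyan, Kendall, and Mecke [1995]. We denote by $\mathcal{F}$, $\mathcal{G}$, $\mathcal{K}$, respectively, the sets of all closed, open, and compact sets of $\mathbb{R}^n$. In the following, we define a topology on $\mathcal{F}$ and $\mathcal{K}$ and its associated $\sigma$-algebra of Borel sets. The choice for the topology is motivated by the following considerations:

• The set spaces thus obtained are separable, Hausdorff, and compact.
• The basic sets (open sets of sets, etc.) for this topology have an easy interpretation.
• The topology induced on $\mathcal{K} \setminus \{\emptyset\}$ is metrizable with the Hausdorff metric.

Let us set for any subset $A$ of $\mathbb{R}^n$,

$$\mathcal{F}^A = \{F \in \mathcal{F} : F \cap A = \emptyset\}, \qquad \mathcal{F}_A = \{F \in \mathcal{F} : F \cap A \ne \emptyset\}.$$

In the same way, define subsets of $\mathcal{K}$, $\mathcal{K}^A$ and $\mathcal{K}_A$. We define a topology $\mathcal{T}$ on $\mathcal{F}$, generated by the basis of open sets

$$\mathcal{F}^K,\ K \in \mathcal{K}; \qquad \mathcal{F}_G,\ G \in \mathcal{G}.$$

It can be shown that $\mathcal{F}$, endowed with the topology $\mathcal{T}$, is separable, Hausdorff, and compact. We can also endow $\mathcal{K}$ with a topology $\mathcal{T}_\mathcal{K}$, a basis of which is $\mathcal{K}^F$, $F \in \mathcal{F}$; $\mathcal{K}_G$, $G \in \mathcal{G}$. This topology is strictly finer than the topology induced on $\mathcal{K}$ by $\mathcal{T}$. Define on $\mathcal{K}_0 = \mathcal{K} \setminus \{\emptyset\}$ the Hausdorff distance $d_H$ by

$$d_H(A, B) = \inf\{\delta > 0 : A \subset B_\delta \text{ and } B \subset A_\delta\}, \quad \text{where} \quad A_\delta = \{x \in \mathbb{R}^n : d(x, A) \le \delta\}.$$

It can be shown that the topology $\mathcal{T}_\mathcal{K}$ on $\mathcal{K}_0$ and the topology induced by $d_H$ on $\mathcal{K}_0$ are equal ("point" $\emptyset$ is isolated in $\mathcal{K}$). We denote by $\mathcal{B}_\mathcal{F}$ and $\mathcal{B}_\mathcal{K}$ the Borel $\sigma$-algebras generated by $\mathcal{T}$ and $\mathcal{T}_\mathcal{K}$, respectively.

Definition 1 A random closed set (respectively a random compact set) of $\mathbb{R}^n$ is a measurable function from a probability space $(\Omega, \mathcal{T}, P)$ into $(\mathcal{F}, \mathcal{B}_\mathcal{F})$ (respectively into $(\mathcal{K}, \mathcal{B}_\mathcal{K})$).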
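For finite point sets, the infimum in the definition of the Hausdorff distance reduces to the larger of the two directed distances, which gives a direct way to compute it. The following Python sketch assumes the compact sets are given as non-empty finite lists of points (tuples of coordinates); the function name `hausdorff_distance` is illustrative.

```python
from math import dist


def hausdorff_distance(A, B):
    """Hausdorff distance between two non-empty finite point sets."""
    # Smallest delta such that A is inside the delta-dilation of B ...
    d_ab = max(min(dist(a, b) for b in B) for a in A)
    # ... and such that B is inside the delta-dilation of A.
    d_ba = max(min(dist(a, b) for a in A) for b in B)
    return max(d_ab, d_ba)
```

For example, with $A = \{(0,0), (1,0)\}$ and $B = \{(0,0), (4,0)\}$, the point $(4,0)$ of $B$ is at distance 3 from $A$, so the two sets are at Hausdorff distance 3.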
We associate with a random closed set $F$ its range measure $\mu_F$, defined on Borel sets $B \in \mathcal{B}_\mathcal{F}$ by $\mu_F(B) = P(F^{-1}(B))$. We write $\Pr(F \cap A = \emptyset) = \mu_F(\mathcal{F}^A)$, $\Pr(F \cap A \ne \emptyset) = \mu_F(\mathcal{F}_A)$, and so on. A first basic result for the study of random sets is the following.

Theorem 4 Any random closed set $F$ is characterized by its capacity function defined for every compact set $K$ by $G_F(K) = \Pr(F \cap K = \emptyset)$.

Proof We have to show that if $F_1$ and $F_2$ are two closed random sets with the same capacity functions $G_{F_1}$ and $G_{F_2}$, then they are equal in law. It is enough to prove that $F_1$ and $F_2$ satisfy $\mu_{F_1}(B) = \mu_{F_2}(B)$ for every Borel set $B$ in $\mathcal{B}_\mathcal{F}$. From the definition of the topology on $\mathcal{F}$, it is enough to have $\mu_{F_1}(\mathcal{F}^K) = \mu_{F_2}(\mathcal{F}^K)$ and $\mu_{F_1}(\mathcal{F}_G) = \mu_{F_2}(\mathcal{F}_G)$ for every $K \in \mathcal{K}$ and every $G \in \mathcal{G}$. The first condition boils down to the equality of the capacity functions. In addition, let $G \in \mathcal{G}$. Then there exists a sequence of compact sets $K_n$ such that $K_n \uparrow G$, which entails $\mathcal{F}_{K_n} \uparrow \mathcal{F}_G$. By continuity of the measure with respect to monotone sequences, we then have $\mu_{F_i}(\mathcal{F}_{K_n}) \uparrow \mu_{F_i}(\mathcal{F}_G)$ for $i = 1, 2$. Now, $\mathcal{F}_{K_n} = (\mathcal{F}^{K_n})^c$ and therefore $\mu_{F_1}(\mathcal{F}_{K_n}) = 1 - \mu_{F_1}(\mathcal{F}^{K_n}) = 1 - \mu_{F_2}(\mathcal{F}^{K_n}) = \mu_{F_2}(\mathcal{F}_{K_n})$, which ends the proof.

Definition 2 We shall say that a random closed set with capacity $G$ is stationary if for every compact set $K$ and every point $x$ of $\mathbb{R}^n$, one has $G(K) = G(K + x)$. We say that $F$ is isotropic if it is stationary and for every rotation $r$, $G(K) = G(rK)$.

We end this section with some notation that will be of use in the following. For any sets $A$ and $B$, we set

$$\check{A} = \{-x,\ x \in A\}, \qquad A \ominus B = \{x \in \mathbb{R}^2 : x + \check{B} \subset A\}, \qquad A \oplus B = \{x + y,\ x \in A,\ y \in B\}.$$

$A \ominus B$ is called erosion of $A$ by $B$, and $A \oplus B$ is called dilation of $A$ by $B$.

B. The Dead Leaves Model

The model we present here is a slight variation of a model introduced by Matheron [1968]. The original model was defined on all of $\mathbb{R}^2$, whereas we define it only on a window, but the result is locally identical. In Section VIII, we present the dead leaves model as introduced originally by Matheron. We consider a closed bounded domain $\Omega$ in $\mathbb{R}^2$ (a rectangle, for instance).
Let $X_i$, $i \in \mathbb{N}^-$ ($\mathbb{N}^-$ is the set of nonpositive integers), be a family of independent, identically distributed random compact sets. We assume that $X_0$ is almost surely uniformly bounded, i.e., there exists $R > 0$ such that $X_0 \subset B(0, R)$ almost surely. We also assume, for reasons that will become clear soon, that $E\nu(X_0 \ominus D) > 0$ for some disk $D$, $\nu$ denoting the Lebesgue measure. We then consider a family $x_i$, $i \in \mathbb{N}^-$, of random variables that are independently uniformly distributed in $\Omega_R = \Omega + B(0, R)$. We define a sequence of dead leaves in the following way. For each $i \in \mathbb{N}^-$ we call dead leaf at time $i$ the random set $x_i + X_i$. We then call visible part of the dead leaf $i$ the set

$$(x_i + X_i) \setminus \bigcup_{i < j \le 0} \big(x_j + \mathrm{int}(X_j)\big)$$

where $\mathrm{int}(X)$ is the interior of $X$. We then define the dead leaves model $M$ to be the collection of all the visible parts of the dead leaves $i$ for $i \in \mathbb{N}^-$. We also define a cell of $M$ to be any connected component of a visible part. In the following, $X_0$ will often be referred to as the grain of the model. In Figures 10, 11, and 12, we present some experimental realizations of dead leaves models for different choices of grain $X_0$. To simulate the model, we associate with each $x_i + X_i$ a different gray level uniformly distributed between 0 and 255. We already see from those examples that the behavior of the dead leaves model is strongly dependent on the shape of the leaves as well as on their size distribution. We also notice that the 3-D perception is much stronger in the case of elongated rectangles than in the case of disks.
FIGURE 10. Simulation of a dead leaves model, where the grain X0 is a disk with constant radius.
FIGURE 11. Simulation of a dead leaves model, where the grain X0 is a disk with an exponentially distributed radius.
FIGURE 12. Simulation of a dead leaves model, where the grain $X_0$ is a rectangle with a rotation uniformly distributed in $[0, 2\pi]$, and with width and length following a uniform law between extremal values.
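A realization such as the one of Figure 10 can be sketched on a pixel grid. The Python code below is a minimal illustration, not the procedure used to produce the figures: it assumes a disk grain of constant radius, draws the leaves from the most recent one backward in time, and lets each new (older) leaf fill only the pixels that are still uncovered, stopping once the window is covered. The function name `dead_leaves` and its parameters are hypothetical.

```python
import random


def dead_leaves(rows, cols, radius, seed=0):
    """Simulate a dead leaves image with disks of constant radius (gray levels 0..255)."""
    rng = random.Random(seed)
    image = [[None] * cols for _ in range(rows)]
    uncovered = rows * cols
    while uncovered:
        # Center drawn uniformly in the window enlarged by the radius.
        cy = rng.uniform(-radius, rows - 1 + radius)
        cx = rng.uniform(-radius, cols - 1 + radius)
        gray = rng.randrange(256)
        for r in range(max(0, int(cy - radius)), min(rows, int(cy + radius) + 2)):
            for c in range(max(0, int(cx - radius)), min(cols, int(cx + radius) + 2)):
                # Older leaves are hidden wherever a more recent leaf was drawn.
                if image[r][c] is None and (r - cy) ** 2 + (c - cx) ** 2 <= radius ** 2:
                    image[r][c] = gray
                    uncovered -= 1
    return image
```

Calling `dead_leaves(20, 30, 4.0, seed=1)` returns a 20 x 30 list of gray levels. Replacing the constant radius by a random one, drawn inside the loop, would give the analog of Figure 11.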
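Lemma 2 Let $K$ be a compact set of $\Omega$. Then the probability that $K$ is contained in a given dead leaf $x_i + X_i$ is

$$\frac{E\nu(\check{X}_0 \ominus K)}{\nu(\Omega_R)}$$

where $\nu$ is the Lebesgue measure.

Proof $K$ is in the dead leaf $x_i + X_i$ if $K \subset x_i + X_i$, which is equivalent to $x_i + \check{K} \subset \check{X}_i$, i.e., $x_i \in \check{X}_i \ominus K$. It is easily checked that $\check{X}_i \ominus K \subset \Omega_R$. Since $x_i$ is uniformly distributed in $\Omega_R$, we deduce that

$$\begin{aligned}
\Pr(K \subset x_i + X_i) &= \Pr(x_i \in \check{X}_i \ominus K) \\
&= \frac{1}{\nu(\Omega_R)} \int_{\Omega_R} \Pr(x \in \check{X}_i \ominus K)\, dx \\
&= \frac{1}{\nu(\Omega_R)} \int_{\Omega_R} E\, \mathbb{1}_{\check{X}_i \ominus K}(x)\, dx \\
&= \frac{1}{\nu(\Omega_R)}\, E \int_{\Omega_R} \mathbb{1}_{\check{X}_i \ominus K}(x)\, dx \\
&= \frac{E\nu(\check{X}_i \ominus K)}{\nu(\Omega_R)} = \frac{E\nu(\check{X}_0 \ominus K)}{\nu(\Omega_R)}.
\end{aligned} \tag{10}$$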
Corollary 2 If $K \subset \Omega$, $K$ measurable, is such that $E\nu(\check{X}_0 \ominus K) > 0$, then $K$ is covered by some dead leaf with probability 1. Since for some disk $D$, we have $E\nu(X_0 \ominus D) > 0$, then $\Omega$ is almost surely covered by a finite number of dead leaves.

Proof Let $a = E\nu(\check{X}_0 \ominus K)/\nu(\Omega_R) > 0$. Then $K$ is not completely covered by a leaf $x_i + X_i$ with probability $1 - a < 1$. Thus, since leaves are independent, $K$ is not covered by any leaf with probability 0. Moreover, it is possible to cover $\Omega$ with a finite number of the sets $Y_i = (y_i + D) \cap \Omega$, with $y_i$ points of $\Omega$. Then each $Y_i$ is covered by some dead leaf at time $t_i$ with probability one, and thus at $T = \max\{t_i\}$, $\Omega$ is covered by a finite number of leaves.
We are interested in the relation between the size of the grain $X_0$ and the size of the visible parts of the dead leaves model, or the size of the cells. Next, we are going to prove a fundamental result which yields the probability for a compact set $K \subset \Omega$ to be included in a visible part of the dead leaves model.

Theorem 5 Let $M$ be the dead leaves model associated with $X_0$, as before. Then for any compact set $K \subset \Omega$, the probability that $K$ is included in a visible part of $M$ is given by

$$Q(K) = \frac{E\nu(\check{X}_0 \ominus K)}{E\nu(\check{X}_0 \oplus K)}. \tag{11}$$

Clearly, if $K$ is connected, $Q(K)$ also denotes the probability that $K$ is included in a cell of the model.

Proof First, we note that the probability $Q(K)$ corresponds to the probability that for some $i$, $K$ is included in the dead leaf $x_i + X_i$ and for $i < j \le 0$, $K$ does not intercept any of the dead leaves $x_j + X_j$. As the proof of Lemma 2 shows, in order to compute $\Pr(K \cap (x_j + X_j) = \emptyset)$, we can work with $X_j = X$ deterministic, $x_j$ uniformly distributed, and then take the expectation of the resulting formulas. The probability that the compact set $K$ does not intercept the dead leaf $x_j + X$ is equal to the probability of $x_j \notin \check{X} \oplus K$. We remark that since $X \subset B(0, R)$ and $K \subset \Omega$, $\check{X} \oplus K \subset \Omega_R$. We deduce that the probability in $x_j$ (for fixed $X$) that $K$ does not meet $x_j + X$ is equal to

$$\Pr(K \cap (x_j + X) = \emptyset) = 1 - \Pr(x_j \in \check{X} \oplus K) = 1 - \frac{\nu(\check{X} \oplus K)}{\nu(\Omega_R)}.$$

Now, if $X_j$ is a random compact set distributed as $X_0$, taking the expectation yields

$$\Pr(K \cap (x_j + X_j) = \emptyset) = 1 - \frac{E\nu(\check{X}_0 \oplus K)}{\nu(\Omega_R)}. \tag{12}$$

We now consider the following disjoint events $E_i$, for $i \in \mathbb{N}^-$: $K$ is included in the dead leaf $x_i + X_i$, and for $i < j \le 0$ the compact set $K$ does not intercept the dead leaves $x_j + X_j$. Since the leaves are independent, we have, using Lemma 2,

$$\Pr(E_i) = \frac{E\nu(\check{X}_0 \ominus K)}{\nu(\Omega_R)} \left(1 - \frac{E\nu(\check{X}_0 \oplus K)}{\nu(\Omega_R)}\right)^{-i}. \tag{13}$$

Making the sum over $i \in \mathbb{N}^-$ in (13) we obtain that the probability that $K$ is contained in a visible part of the dead leaves covering is

$$Q(K) = \sum_{i \in \mathbb{N}^-} \frac{E\nu(\check{X}_0 \ominus K)}{\nu(\Omega_R)} \left(1 - \frac{E\nu(\check{X}_0 \oplus K)}{\nu(\Omega_R)}\right)^{-i}$$

which yields the announced result by summing the geometric series,

$$Q(K) = \frac{E\nu(\check{X}_0 \ominus K)}{E\nu(\check{X}_0 \oplus K)}. \tag{14}$$
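Formula (11) can be checked on a simple deterministic example. The Python sketch below assumes, purely for illustration, that the grain is an axis-aligned square of side $a$ and that $K$ is a horizontal segment of length $l$: the eroded set then has area $a(a - l)$ and the dilated set has area $a(a + l)$. The second function sums the geometric series (13) numerically for a given area of the enlarged window; all names are hypothetical.

```python
def q_square(a, l):
    """Formula (11) for a square grain of side a and a horizontal segment of length l."""
    return max(a - l, 0.0) / (a + l)


def q_by_summation(a, l, window_area, terms=5000):
    """Sum the series (13): K inside leaf i and missed by all the later leaves."""
    p_inside = a * max(a - l, 0.0) / window_area   # Lemma 2
    p_meet = a * (a + l) / window_area             # complement of (12)
    return sum(p_inside * (1.0 - p_meet) ** n for n in range(terms))
```

Both computations agree up to the truncation of the series, and the result does not depend on the window area, as expected from (14).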
We end the section by remarking that if each leaf is assigned a value (color) in $[0, C]$, $C$ real, then a realization of the model can be seen as a function from $\Omega$ to $[0, C]$. We may then wonder which functional space such a function belongs to. The characteristic function of a set with finite boundary length in $\Omega$ is in the space of functions with bounded variation, $BV(\Omega)$, and this will also be the case for almost every realization of the dead leaves model. Indeed, we saw previously that for almost every realization there exists an $i_0 < 0$ such that the union of the leaves with indices $i > i_0$ covers $\Omega$. This is also the case for the corresponding cells. Moreover, the sum of the lengths of the boundaries of those cells is obviously smaller than the sum of the lengths of the boundaries of their parent leaves. Now, if the parent leaves have boundaries whose length is bounded by a constant value $B$, the BV norm is finite and bounded by $|i_0| \cdot B$ for almost all realizations. In particular, this is the case if the leaves are convex, since they are included in $B(0, R)$. We have thus proved the following:
Proposition 1 If $X_0$ has uniformly bounded boundary length, then the dead leaves model $M$ is in $BV(\Omega)$ almost surely.

The meaning of the result is strong and general. Even though we can have arbitrarily small realizations of $X_0$, these small leaves do not succeed in creating boundaries with infinite length, the reason being that the small leaves are steadily covered by larger ones. The next section introduces a tool in the study of compact sets that will be of use in the evaluation of the sizes of the cells of the dead leaves model.

C. The Geometric Covariogram

The geometric covariogram of a convex and compact set $C$ of $\mathbb{R}^2$ is the function $\gamma_C$ from $\mathbb{R}^2$ to $\mathbb{R}$ defined by

$$\gamma_C(l, \theta) = \nu(C \cap C_{l,\theta})$$

where $C_{l,\theta}$ is the set $C + x$, $x$ being the point with polar coordinates $(l, \theta)$, $l > 0$, $\theta \in [0, \pi)$. The $\gamma_C$ function has compact support and is equal to zero if $l \ge \nu_1(p_{\theta^\perp}(C))$, where $p_\theta$ is the orthogonal projection on a line perpendicular to the direction $\theta$, $\theta^\perp = \theta + \pi/2$, and $\nu_1$ is the Lebesgue measure in $\mathbb{R}$. In the following we assume that $\nu_1(p_\theta(C)) > 0$ for all $\theta$ (this will be the case with the leaves of $M$ that have nonempty interiors). We are interested in the size distribution of the intersection of $C$ with all lines with direction $\theta$ meeting $C$. We assume that the intersection of these random lines with a line perpendicular to $\theta$ is uniformly distributed in $p_\theta(C)$. We associate with each $s \in p_\theta(C)$ a line $L_s$ of direction $\theta$ and meeting $C$. We denote by $F_C(l, \theta)$ the distribution function of the size of the segments, defined by

$$F_C(l, \theta) = \frac{\nu_1\{s,\ \nu_1(L_s \cap C) \ge l\}}{\nu_1(p_\theta(C))}.$$

Lemma 3

$$F_C(l, \theta) = \frac{\nu_1(p_\theta(C \cap C_{l,\theta}))}{\nu_1(p_\theta(C))}.$$

Proof It is easily seen that $L_s \cap C \cap C_{l,\theta} \ne \emptyset$ iff $\nu_1(L_s \cap C) \ge l$, which yields the result.

The function $F_C$ is related to the function $\gamma_C$ by the next theorem.
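Theorem 6 For any compact and convex set $C$ in $\mathbb{R}^2$, $\theta \in [0, \pi)$, $\gamma_C(\cdot, \theta)$ is differentiable on $[0, \nu_1(p_{\theta^\perp}(C)))$, and

$$-\frac{\partial \gamma_C}{\partial l}(l, \theta) = \nu_1(p_\theta(C \cap C_{l,\theta})) = \nu_1(p_\theta(C))\, F_C(l, \theta). \tag{15}$$

Proof [Matheron, 1975] For any fixed $\theta$,

$$\varepsilon\, \nu_1(p_\theta(C \cap C_{l,\theta})) \ge \nu\big((C \cap C_{l,\theta}) \setminus (C \cap C_{l+\varepsilon,\theta})\big) = \gamma_C(l, \theta) - \gamma_C(l + \varepsilon, \theta) \ge \varepsilon\, \nu_1(p_\theta(C \cap C_{l+\varepsilon,\theta})) \tag{16}$$

and

$$\varepsilon\, \nu_1(p_\theta(C \cap C_{l-\varepsilon,\theta})) \ge \nu\big((C \cap C_{l-\varepsilon,\theta}) \setminus (C \cap C_{l,\theta})\big) = \gamma_C(l - \varepsilon, \theta) - \gamma_C(l, \theta) \ge \varepsilon\, \nu_1(p_\theta(C \cap C_{l,\theta})). \tag{17}$$

In order to establish the theorem, it is enough to prove the continuity of the function $f(l) = \nu_1(p_\theta(C \cap C_{l,\theta}))$. For any $l > 0$, $f$ is continuous from the left. Indeed, $l_n \uparrow l \Rightarrow C \cap C_{l_n,\theta} \downarrow C \cap C_{l,\theta}$ (because the intersection is an upper semicontinuous function) so that $l_n \uparrow l \Rightarrow p_\theta(C \cap C_{l_n,\theta}) \downarrow p_\theta(C \cap C_{l,\theta})$ (because $p_\theta$ is continuous from $\mathcal{K}_0$ to $\mathcal{K}_0$ [Matheron, 1975, p. 69]). Therefore, we obtain $\nu_1(p_\theta(C \cap C_{l_n,\theta})) \downarrow \nu_1(p_\theta(C \cap C_{l,\theta}))$ following the monotone continuity of $\nu_1$, so the left-continuity is proved. It is easily seen that $f(l)$ is continuous from the right for any $l$ such that $\gamma_C(l, \theta) > 0$, which ends the proof.

We note that, in particular, the theorem states that $\gamma_C$ is a convex function. In what follows, we denote by $\gamma'_C$ the derivative of $\gamma_C$ with respect to $l$. We can generalize the definition of the $\gamma_C$ function to a random compact set, using the expectation $E\gamma_C(l, \theta)$ with respect to the probability law of $C$, when such a definition makes sense. In particular, we can define such an expectation when $E\nu(C) < +\infty$, which is the case for the grain of the dead leaves model. For instance, we can consider a family of random compact sets $\{C_r\}_{r \in I}$, depending, among other parameters, on $r > 0$ in such a way that $C_r$ is a rescaling of $C$, $C_r = rC$, with a probability distribution $f(r)$. We then define the covariogram as

$$\gamma(l, \theta) = E \int_0^{+\infty} \gamma_{C_r}(l, \theta)\, f(r)\, dr.$$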
We are interested in the lengths of the intercepts of $C$, i.e., its intersections with lines of direction $\theta$. We define

$$F_C(l, \theta) = \frac{E\gamma'_C(l, \theta)}{E\gamma'_C(0, \theta)}$$

when this makes sense, i.e., when $E\gamma'_C(0, \theta)$ is finite and nonzero. As an easy consequence of Theorem 6, we have the following.

Corollary 3 Assume that $0 < -E\gamma'_C(0, \theta) < +\infty$. Then for any $l \in [0, \nu_1(p_{\theta^\perp}(C)))$, $(E\gamma_C(l, \theta))' = E\gamma'_C(l, \theta)$ so that

$$F_C(l) = \frac{(E\gamma_C(l, \theta))'}{(E\gamma_C(0, \theta))'}. \tag{18}$$

Proof We take the expectation with respect to the probability law of $C$ of all members of relations (16) and (17). We notice that we can divide by $\varepsilon$ and pass to the limit as $\varepsilon \to 0$. Indeed, we just have to prove that $E\nu_1(p_\theta(C \cap C_l))$ is continuous with respect to $l$. Now, we have seen in the proof of Theorem 6 that $\nu_1(p_\theta(C \cap C_l))$ is continuous with respect to $l$. In addition, we obviously have $\nu_1(p_\theta(C \cap C_l)) \le \nu_1(p_\theta(C))$, which has a finite expectation by assumption. This proves the required continuity by Lebesgue's dominated convergence theorem. So, by dividing by $\varepsilon$ and letting $\varepsilon \to 0$, we get the analog of (15), i.e.,

$$-\frac{\partial}{\partial l} E\gamma_C(l, \theta) = E\nu_1(p_\theta(C \cap C_{l,\theta})) = E\big[\nu_1(p_\theta(C))\, F_C(l, \theta)\big]. \tag{19}$$
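Relation (15) can be verified numerically on a case where everything is explicit. The Python sketch below assumes a disk of radius $R$ (an illustrative choice): its covariogram is the area of the lens formed by two disks whose centers are at distance $l$, and the intercept distribution follows from the chord lengths of the disk. The function names are hypothetical.

```python
from math import acos, sqrt


def covariogram_disk(R, l):
    """Area of the intersection of a disk of radius R with its translate by l."""
    if l >= 2 * R:
        return 0.0
    return 2 * R * R * acos(l / (2 * R)) - 0.5 * l * sqrt(4 * R * R - l * l)


def intercept_distribution_disk(R, l):
    """F_C(l): proportion of the parallel chords of the disk with length at least l."""
    return sqrt(max(4 * R * R - l * l, 0.0)) / (2 * R)


def minus_derivative(R, l, eps=1e-6):
    """Central finite-difference estimate of the left-hand side of (15)."""
    return -(covariogram_disk(R, l + eps) - covariogram_disk(R, l - eps)) / (2 * eps)
```

For the disk, the projection of $C$ has length $2R$, so (15) predicts that `minus_derivative(R, l)` is close to `2 * R * intercept_distribution_disk(R, l)` for $0 < l < 2R$.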
Thus in the following, where we will consider only covariograms for random convex sets, we will write $\gamma_C$ for $E\gamma_C$ and $\gamma'_C$ for its derivative with respect to $l$.

D. The Distribution of the Intercepts in the Dead Leaves Model

We are interested in the relation between the size of the convex set $X_0$, the grain in the dead leaves model, and the size of the connected components of the partition of the plane provided by the dead leaves model $M$, the cells of $M$. In the following, as a notion of size of a set, we choose the set of sizes of its intersections with lines of direction $\theta$, the so-called intercepts. We fix $\theta$, and we will sometimes omit it. Let $x_0$ be the center of $\Omega$, the window in which the model has been defined. Let $D$ be the straight line of direction $\theta$ passing through $x_0$. Let $F^{x_0}_{tess}(l)$ be the repartition function of the length of the component of $M \cap D$ containing $x_0$. We define

$$F_{tess}(l) = -C \int_l^{+\infty} \frac{dF^{x_0}_{tess}(u)}{u}$$

(the integral has actually finite bounds, because the model is bounded), according to the usual length-biased sampling, $C$ being a normalizing constant. The preceding integral is a slight abuse of notation. In fact, $-dF^{x_0}_{tess}/u = \mu$ is a locally bounded nonnegative Radon measure on $(0, +\infty]$, since $F^{x_0}_{tess}(l)$ is nonincreasing, and the integral means $F_{tess}(l) = C\mu([l, +\infty])$. Roughly, this definition means that we are interested in the length repartition of all the components of the intersection of the model with a straight line, whereas it is more likely that $x_0$ be in a large component than in a small one. Besides, in order to avoid edge effects, we assume that $x_0 + X_0 \subset \Omega$ almost surely (we consider a large enough window). As we are going to see, $F_{tess}(l, \theta)$ is related to $Q(K, \theta)$, the probability that a segment $K \subset \Omega$ on a line with direction $\theta$ and length $l$ is included in a connected component of $M$ (in this case $K$ is connected, so that the probability that it is included in a visible part or a cell are the same). We write $Q(l, \theta) = Q(K, \theta)$. We sometimes write $Q(l)$, $F_{tess}(l)$ when there is no ambiguity about the value of $\theta$.

Theorem 7 Let $M$ be a dead leaves model associated with a convex $X_0$ such that there exists a point $x_0$ such that $x_0 + X_0 \subset \Omega$ almost surely. Then for any $\theta \in [0, \pi)$,

$$Q(l, \theta) = -\int_l^{+\infty} \frac{y - l}{y}\, dF^{x_0}_{tess}(y), \tag{20}$$

so that

$$F_{tess}(l, \theta) = \frac{Q'(l, \theta)}{Q'(0, \theta)}. \tag{21}$$
Proof Consider the following event $E(y, dy)$: a segment contained in $M \cap D$ with length in $[y, y + dy]$ contains $x_0$. The probability of $E(y, dy)$ is $-dF^{x_0}_{tess}(y)$. On the other hand, since $M$ is stationary¹, the conditional probability that the segment of $M \cap D$ containing $x_0$ also contains the whole segment $[x_0 - l/2, x_0 + l/2]$, knowing its length is $y$, is equal to the probability that a segment with length $l$ and uniformly distributed center $x \in [-y/2, y/2]$ be wholly contained in $[-y/2, y/2]$. Obviously, this last probability is $(y - l)/y$. Since the events $E(y, dy)$ are disjoint, we conclude that

$$Q(l) = -\int_l^{+\infty} \frac{y - l}{y}\, dF^{x_0}_{tess}(y).$$

This implies that $F^{x_0}_{tess}(l) = Q(l) - lQ'(l)$ and, therefore, by an easy calculation using (20), that $F_{tess}(l) = -CQ'(l)$. But $F_{tess}(0) = 1$, and the theorem is proved.

¹ $M$ is not, strictly speaking, stationary, but we are considering only what happens inside the window $\Omega$.
Next, we are going to formulate the relation between $Q(l)$, the probability for a segment of size $l$ to be included in $M$, and the covariogram $\gamma(l)$ associated with the grain $X_0$. We will write $[0, l]$ for the segment of length $l$ and direction $\theta$, when $\theta$ is fixed.

Theorem 8 Consider a dead leaves model whose grain $X_0$ is convex. Then

$$Q(l, \theta) = \frac{\gamma(l, \theta)}{\gamma(0, \theta) - l\,\gamma'(0, \theta)}. \tag{22}$$

Proof According to Formula (11), we have

$$Q(l) = \frac{E\nu(X_0 \ominus [0, l])}{E\nu(X_0 \oplus [0, l])}$$

and

$$E\nu(X_0 \ominus [0, l]) = E\nu(X_0 \cap (X_0)_l) = \gamma(l)$$

because $X_0$ is convex, and, again by convexity,

$$E\nu(X_0 \oplus [0, l]) = E\nu(X_0) + l\, E\nu_1(p_\theta(X_0))$$

which is a directional version of the Steiner formula. By Formula (18), we obtain

$$E\nu(X_0 \oplus [0, l]) = \gamma(0) - l\,\gamma'(0) \tag{23}$$
and the theorem follows.

Theorem 9 Consider a dead leaves model whose grain $X_0$ is convex. Then the size repartition $F_{tess}(l) = F_{tess}(l, \theta)$ of the connected visible intervals of the dead leaves model in direction $\theta$ is related to the size repartition $F_C(l) = F_C(l, \theta)$ of the leaves in direction $\theta$ by

$$\int_h^{+\infty} F_{tess}(l)\, dl = \frac{1}{2}\, \frac{1}{1 + ah} \int_h^{+\infty} F_C(l)\, dl \tag{24}$$

where

$$a = -\frac{\gamma'(0, \theta)}{\gamma(0, \theta)}.$$

Proof This is an easy consequence of Formulas (18), (21), and (22).
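Relation (24) can be checked numerically when the grain is a deterministic disk. The Python sketch below is an illustration under that assumption: it uses the closed-form covariogram of a disk of radius `R`, builds $Q$ from (22), $F_{tess}$ from (21), $F_C$ from (18), and compares both sides of (24) with a midpoint quadrature. The value of `R`, the helper names, and the quadrature are choices made here for the example.

```python
from math import acos, sqrt

R = 1.0  # radius of the disk grain (illustrative value)


def gamma(l):
    """Covariogram of the disk: area of the lens at center distance l."""
    if l >= 2 * R:
        return 0.0
    return 2 * R * R * acos(l / (2 * R)) - 0.5 * l * sqrt(4 * R * R - l * l)


def dgamma(l):
    """Derivative of the covariogram with respect to l."""
    return -sqrt(max(4 * R * R - l * l, 0.0))


def F_C(l):
    return dgamma(l) / dgamma(0.0)                 # formula (18)


def F_tess(l):
    denom = gamma(0.0) - l * dgamma(0.0)           # denominator of (22)
    dQ = dgamma(l) / denom + gamma(l) * dgamma(0.0) / denom ** 2
    dQ0 = 2 * dgamma(0.0) / gamma(0.0)
    return dQ / dQ0                                # formula (21)


def integral(f, lo, hi, n=20000):
    """Midpoint rule on [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(f(lo + (k + 0.5) * step) for k in range(n))


def both_sides(h):
    """Left- and right-hand sides of (24); both integrands vanish beyond 2R."""
    a = -dgamma(0.0) / gamma(0.0)
    lhs = integral(F_tess, h, 2 * R)
    rhs = 0.5 / (1 + a * h) * integral(F_C, h, 2 * R)
    return lhs, rhs
```

For $h = 0$ both sides are close to $\pi R/4$, that is, half the mean chord length $\pi R/2$ of the disk.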
Corollary 4 The average size of the intercepts for the visible parts of leaves is half that for the leaves themselves, i.e.,

$$-\int_0^{+\infty} l\, dF_{tess}(l, \theta) = -\frac{1}{2} \int_0^{+\infty} l\, dF_C(l).$$

Proof By Theorem 9,

$$\int_0^{+\infty} F_{tess}(l)\, dl = \frac{1}{2} \int_0^{+\infty} F_C(l)\, dl \tag{25}$$

and the result follows by integration by parts.

If, in addition, we can make observations on the tessellation, then we can estimate $F_{tess}$ and therefore deduce the value of $a$ because, by (21) and (22),

$$\int_0^{+\infty} F_{tess}(l)\, dl = -\frac{1}{Q'(0)} = -\frac{\gamma(0)^2}{2\gamma(0)\gamma'(0)} = \frac{1}{2a}.$$

Then Formula (24) permits us to deduce $F_C$. This relation also implies that there is no invariant distribution function, i.e., no distribution of the grains such that $F_{tess} = F_C$. We also get

$$\int_0^{+\infty} F_C(l)\, dl = \int_0^{+\infty} \frac{\gamma'(l)}{\gamma'(0)}\, dl = -\frac{\gamma(0)}{\gamma'(0)} = \frac{E\nu(X_0)}{E\nu_1(p_\theta(X_0))},$$

where the denominator is related to the mean perimeter $EU(X_0)$ of the grain because, by Cauchy's formula,

$$E \int_0^{\pi} \nu_1(p_\theta(X_0))\, d\theta = EU(X_0).$$

It seems nonetheless difficult to recover other characteristics of the dead leaves. Conversely, if we know the leaves model, then we can compute $a$, and, therefore, $F_{tess}$. Notice, however, that the cells of the model are not necessarily convex. Thus the connected components of the intersection of a given straight line with the tessellation are not independent: two intersections may belong to the same original leaf or even to the same cell.
E. The Stability of the Power Law

In this section, we prove that two important quantities do follow a power law if the size distribution of objects does. The first one is the probability that two points at distance $x$ are in the same visible part of an object, and the second one is the length distribution of intercepts, introduced in the previous section. The first of these results was proved, with a different formalism, by Ruderman [1997], who also experimentally observed the power law for the probability of two points belonging to the same object in natural images. The objects he considered were obtained by visual segmentation. We consider that an interesting quantity to study in images is the distribution of connected components of bilevels. We experimentally observed that the length distribution of intersections of those components with a line of fixed direction follows a power law (see Section III).

The grain $X_0$ is assumed to be a convex set. We assume that realizations of $X_0$ are homothetics $X_r = rX_1$ of a random compact $X_1$, according to the density function $f(r)$, and we fix $\theta$. Moreover, we assume $f(r) = 0$ for $r < R_{\min}$ or $r > R_{\max}$. Without loss of generality we assume $R_{\min} = 1$. Then we set
\[ \gamma(l, r, \theta) = \gamma_{X_r}(l, \theta) = E\,\nu(X_r \cap (X_r + l\theta)), \]
where the expectation is taken at fixed $r$. First of all, we remark that
\[ \gamma(l, r, \theta) = E\,\nu(X_r \cap (X_r + l\theta)) = r^2\,E\,\nu\!\left(X_1 \cap \left(X_1 + \tfrac{l}{r}\theta\right)\right) = r^2\,\gamma\!\left(\tfrac{l}{r}, 1, \theta\right), \]
so that for any nonzero $l$,
\[ \gamma_{X_0}(l, \theta) = \int_1^{R_{\max}} \gamma(l, r, \theta)\,f(r)\,dr = l^3 \int_{1/l}^{R_{\max}/l} r^2\,\gamma\!\left(\tfrac{1}{r}, 1, \theta\right) f(lr)\,dr. \]
If we now assume that $f(r) = Cr^{-\delta}$, i.e., that the size of objects follows a power law, we get $\gamma_{X_0}(l, \theta) = l^{3-\delta} A(l, \theta)$, with
\[ A(l, \theta) = \int_{1/l}^{R_{\max}/l} r^2\,\gamma\!\left(\tfrac{1}{r}, 1, \theta\right) f(r)\,dr. \]
Now, for $l \ge 1$, the previous integral is from 1 to $R_{\max}/l$, because $\gamma(1/r, 1, \theta) = 0$ for $r < 1$. Finally, if we suppose $R_{\max}/l$ large enough and $\delta > 3$, we get
\[ A(l, \theta) \approx A = \int_1^{+\infty} r^2\,\gamma\!\left(\tfrac{1}{r}, 1, \theta\right) f(r)\,dr \]
so that
\[ \gamma_{X_0}(l, \theta) \approx A\,l^{3-\delta} \tag{26} \]
with $A$ independent of $l$. We also notice that if we allow objects not to be bounded, for instance, using Matheron's original model (see Section VIII), then the previous approximation becomes an equality for $l \ge 1$.

The probability $V(l, \theta)$ that two points at distance $l$ and in direction $\theta$ are included in the same visible part of the model is, according to Formula (11),
\[ V(l, \theta) = \frac{E\,\nu(X_0 \ominus \{x, x + l\theta\})}{E\,\nu(X_0 \oplus \{x, x + l\theta\})} = \frac{\gamma_{X_0}(l, \theta)}{2\,\gamma_{X_0}(0) - \gamma_{X_0}(l, \theta)}. \]
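The power-law behavior of the covariogram and of $V$ can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' computation: the grain is a disk of diameter $r$ (so that $X_1$ has diameter 1), the density is $f(r) = Cr^{-\delta}$ on $[1, R_{\max}]$ with $\delta = 4$ and $R_{\max} = 10^4$, and the integral over $r$ is evaluated with a trapezoidal rule in $\log r$. All function names are hypothetical. The fitted log-log slopes are expected to be close to $3 - \delta$ for $1 \ll l \ll R_{\max}$.

```python
import math

def unit_covariogram(t):
    """Covariogram of a disk of diameter 1 at distance t (zero for t >= 1)."""
    if t >= 1.0:
        return 0.0
    return 0.5 * (math.acos(t) - t * math.sqrt(1.0 - t * t))

def mixture_covariogram(l, delta, r_max, n=4000):
    """Integral of r^2 gamma_1(l / r) f(r) dr with f(r) = C r^(-delta) on [1, r_max]."""
    c = (delta - 1.0) / (1.0 - r_max ** (1.0 - delta))   # normalizes f on [1, r_max]
    lo = max(l, 1.0)                                     # gamma_1(l / r) vanishes for r < l
    step = math.log(r_max / lo) / n
    total = 0.0
    for k in range(n + 1):                               # trapezoidal rule in u = log r
        r = lo * math.exp(k * step)
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * c * r ** (3.0 - delta) * unit_covariogram(l / r)
    return total * step

def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    return num / sum((u - mx) ** 2 for u in lx)

delta, r_max = 4.0, 1.0e4                                # illustrative parameters
lengths = [10.0 * 10.0 ** (k / 8.0) for k in range(9)]   # l between 10 and 100
gamma = [mixture_covariogram(l, delta, r_max) for l in lengths]
gamma_0 = mixture_covariogram(0.0, delta, r_max)         # expected area of the grain
v = [g / (2.0 * gamma_0 - g) for g in gamma]             # same-visible-part probability

slope_gamma = loglog_slope(lengths, gamma)
slope_v = loglog_slope(lengths, v)
print(slope_gamma, slope_v, 3.0 - delta)
```

A larger $\delta$ makes the tail beyond $R_{\max}$ negligible sooner, which is why $\delta = 4$ was chosen here; values of $\delta$ closer to 3 need a much larger $R_{\max}/l$ before the slope settles.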
Using Formula (26) we get, for large values of $l$ (small enough compared to $R_{\max}$), and $\delta > 3$,
\[ V(l) \approx C\,l^{3-\delta}. \tag{27} \]
If we average $V$ over all orientations, we still have $V(l) \approx C\,l^{3-\delta}$. It is then easy to show that, if each leaf is given a different color, the covariances of the model also follow a power law. This result was already obtained by Ruderman [1997]. He also experimentally observed this fact on images, using an object segmentation "by hand."

We now come to the study of $F_{\text{tess}}(l)$, the repartition function of intercepts that was defined in Section V.D. This function has been observed to follow a power law in natural images (actually the density function follows a power law); see Section III. We recall that, following Formulas (21) and (22), $F_{\text{tess}}(l, \theta) =$
X0 0 l, . X0 0 l, lX0 0 0,
Now, following the same reasoning as before, it is easy to see that if $f(r)$ follows a power law $Cr^{-\delta}$, and if $\delta > 3$, then $\gamma'_{X_0}(l) \approx c\,l^{2-\delta}$ (as before, either for fixed $\theta$ or for an average over all directions). Therefore, for values of $l$ much larger than 1 and much smaller than $R_{\max}$ (a hypothesis that can be removed if objects are unbounded) we get
\[ F_{\text{tess}}(l) \approx C\,l^{2-\delta} \tag{28} \]
which is indeed what we observe in Section III, a clue for a power law distribution of object sizes in the surrounding world. Moreover, our experiments on natural images gave values of $\delta$ slightly above 3 (the density functions have an exponent slightly above 2, see Section III.E). This emphasizes once more the fact that natural images are nearly scale invariant. Indeed, the limiting case $\delta = 3$ corresponds to a log behavior of the function $V$ (still assuming $R_{\max}/l$ is large), which, as we saw in Section II.C.1, corresponds to the scale invariant case.

F. The Size Distribution of the Relief Grains

In this section we are interested in the size distribution of the grains that are completely visible after the occlusion process. More formally, those grains are the $x_i + X_i$ such that
\[ (x_i + X_i) \cap (x_j + X_j) = \emptyset, \qquad i < j \le 0. \]
Following G. Matheron, those grains will be called relief grains. As before, we consider a dead leaves model on a window $\Omega$, with convex grains $X_0$. We further assume that $X_0 \subset D(0, R)$ with probability 1, and to avoid edge effects we will suppose that the centers are uniformly picked in $\Omega_{2R} = \Omega \oplus D(0, 2R)$. Thus the result of Lemma 2 is valid for any compact $K$ such that $K \ominus D(0, R) = \emptyset$ and $K \cap \Omega \neq \emptyset$, which in particular is the case for any leaf that touches $\Omega$. Moreover, we suppose that $X_0$ has area $A$ and perimeter $U$ following the density $f(A, U)$. Finally, we assume that $X_0$ is isotropic, which will enable us to apply the Steiner formula.

Theorem 10 For a dead leaves model satisfying the preceding assumptions, thus with grains $X_0$ having area $A$ and perimeter $U$ with density $f(A, U)$, the relief grains touching $\Omega$ follow the density
\[ f_r(A, U) = C\,\frac{f(A, U)}{E\,\nu(X_0) + A + \dfrac{E\,U(X_0)\,U}{2\pi}} \]
where $C$ is a normalizing constant.

Proof The probability that $X_i$ is centered in $\Omega$, has area between $A$ and $A + dA$, has perimeter between $U$ and $U + dU$, and is not touched by any of the following leaves is
\[ \frac{\nu(\Omega)}{\nu(\Omega_{2R})}\,f(A, U)\,dA\,dU \left(1 - \frac{E\big(\nu(X_i \oplus X_0) \mid X_i \text{ has area } A \text{ and perimeter } U\big)}{\nu(\Omega_{2R})}\right)^{-i} \]
which by the Steiner formula is
\[ \frac{\nu(\Omega)}{\nu(\Omega_{2R})}\,f(A, U)\,dA\,dU \left(1 - \frac{E\,\nu(X_0) + A + \dfrac{U\,E\,U(X_0)}{2\pi}}{\nu(\Omega_{2R})}\right)^{-i}. \]
We sum this relation over all $i$'s and get the density function of the relief grains up to a normalizing constant, which proves the theorem.

The results about relief grains are computable because we know how to get the probability for a compact to be avoided by a leaf. On the other hand, we don't know how to get the probability for a compact to be completely overlapped by the subsequent leaves, because this may be achieved in several stages, and thus we don't know how to compute the proportion of leaves that disappear completely.

G. The Average Area After Occlusion: Numerical Experiments

As we saw on the three examples of Figures 10–12, both the shape of the grain and its size distribution have a strong influence on the visual flavour of the dead leaves model. In particular, the isoperimetric ratio of the grain seems to be of great importance. Contrary to the average intercept, which is halved by occlusion whatever the parameters of the grain are, the average area after occlusion seems to follow more complicated rules. This quantity appeared to us as essential, since it gives a partial answer to the question: "how much do we see of objects?" In this section, we present some experimental results concerning the ratio between the average area of the grain $X_0$ and the same quantity for the cells of the dead leaves model. We also checked, in a sense, the influence of discretization in our simulations by checking that the average intercept length for the cells was half the one for the grain $X_0$. This experimental quantity is also of interest when the shape is not convex, in which case we do not have a theoretical result. We denote
\[ R_S = \frac{\text{average area of the cells of } M}{\text{average area of } X_0} \]
and
\[ R_L = \frac{\text{average length of intercepts for the cells of } M}{\text{average length of intercepts for } X_0}. \]
The simulations of the model consist in sequentially superposing discrete shapes whose size follows a certain probability law and whose centers are uniformly drawn in the image, numerous enough to cover all of it. We worked on 900 × 900 images. The averages for the grain $X_0$ were computed in the
discrete case. In order to reduce boundary influence in the computations of the ratios, we take into account only cells falling in a small enough subimage. We studied only isotropic models (except for the horizontal squares) and computed segments only in the horizontal direction. Moreover, each experiment consists of 50 such images, over which the quantity under study is averaged. We present the results in the following cases:
• Disks with radius of 20 pixels (diameter of 41 pixels)
• Disks with radius uniformly distributed between 15 and 40 pixels
• Disks with radius exponentially distributed between 5 and 50 pixels
• Squares with sides of 41 pixels parallel to the border of the image
• Squares with sides of 41 pixels, uniformly rotated
• Elongated rectangles of size 108 × 12, uniformly rotated
• Arrows of fixed size, uniformly rotated
• T's, uniformly rotated
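A much reduced version of such an experiment can be sketched as follows. This is a simplified illustration and not the authors' protocol: it uses a single 256 × 256 image, disks of fixed radius 12, and a front-to-back simulation in which each new leaf only fills pixels that are still uncovered; the function names, the image size, the radius and the seed are all assumptions. The ratio computed at the end plays the role of $R_L$ and is expected to be near 0.5.

```python
import random

def disk_half_widths(radius):
    """Half-widths of the rows of a discrete disk, for dy = -radius, ..., radius."""
    return [int((radius * radius - dy * dy) ** 0.5) for dy in range(-radius, radius + 1)]

def dead_leaves(size, radius, rng):
    """Label image of a dead leaves model with disks of fixed radius.

    Leaves are drawn from the topmost one downwards; each new leaf only fills
    pixels that are still uncovered, until the image is entirely covered.
    """
    labels = [[0] * size for _ in range(size)]
    half_widths = disk_half_widths(radius)
    uncovered = size * size
    leaf = 0
    while uncovered > 0:
        leaf += 1
        # Centers are drawn in an enlarged window so that leaves can overlap the border.
        cx = rng.randint(-radius, size - 1 + radius)
        cy = rng.randint(-radius, size - 1 + radius)
        for dy, half in zip(range(-radius, radius + 1), half_widths):
            y = cy + dy
            if 0 <= y < size:
                row = labels[y]
                for x in range(max(cx - half, 0), min(cx + half, size - 1) + 1):
                    if row[x] == 0:
                        row[x] = leaf
                        uncovered -= 1
    return labels

def mean_intercept(labels):
    """Mean horizontal intercept of the cells: adjacent pixel pairs per label change."""
    pairs = changes = 0
    for row in labels:
        for left, right in zip(row, row[1:]):
            pairs += 1
            changes += left != right
    return pairs / changes

radius = 12                                   # illustrative value
rng = random.Random(0)
labels = dead_leaves(256, radius, rng)
half_widths = disk_half_widths(radius)
grain_intercept = sum(2 * h + 1 for h in half_widths) / len(half_widths)
ratio_rl = mean_intercept(labels) / grain_intercept
print(round(ratio_rl, 2))
```

Estimating the mean intercept from the density of label changes, rather than from the list of runs, avoids having to treat runs truncated by the image border separately.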
In Table 16 we present $R_L$ and $R_S$ for those experiments. We notice that these experimental results confirm the expected value of the ratio $R_L = 0.5$ for the average length of segments and, therefore, the validity of our protocol. Moreover, it seems that the result remains true for nonconvex shapes. Concerning the average area ratio $R_S$, and contrary to the constant value of $R_L$, the values are strongly dependent on two parameters: the size probability law and the shape of the grain. We basically observe two phenomena. The first one is that at constant shape for the grain, the more numerous the small objects, the bigger $R_S$. This is illustrated in the case of disks with three different size laws and has been confirmed by other experiments. Secondly, the value of the isoperimetric constant for the grains seems to have a strong influence on $R_S$. It seems that elongated or nonconvex shapes lead to a small value of $R_S$. We claim that there must be a link between the decay of the average area by occlusion, and the decay of the average perimeter. This interaction between area and perimeter has already been observed in the case of the relief grains studied in the preceding section.

TABLE 16
$R_S$ AND $R_L$ IN VARIOUS CASES. $R_S$ IS THE RATIO OF THE AVERAGE AREA AFTER AND BEFORE OCCLUSION. $R_L$ IS THE RATIO OF THE AVERAGE INTERCEPT LENGTH AFTER AND BEFORE OCCLUSION (PREDICTED BY THE THEORY TO BE 0.5).

Experiment                   R_S      R_L
Disks, same radius           0.257    0.51
Disks, uniform radius        0.283    0.51
Disks, exponential radius    0.339    0.52
Horizontal squares           0.275    0.51
Rotated squares              0.23     0.51
Elongated rectangles         0.08     0.53
Arrows                       0.10     0.52
T's                          0.061    0.54
VI. A SHORT REVIEW OF TEXTURE SYNTHESIS BY MATHEMATICAL METHODS

Research in texture analysis and synthesis has several aims. One of them, of wide commercial relevance, is the ability to reproduce any visual aspect of the surrounding world by fast computational methods. A second aim is to understand how we perceive textures and discriminate them. This is the viewpoint of psychophysics [Julesz, 1981, 1986; Beck, 1983]. A third aim is the creation of mathematical models general enough to be able to synthesize both natural and new, artificial textures. All three aspects are closely related. The first one, however, seems to have of late adopted the "lazy solution," which consists in storing in large databases all textures of interest for animation. The main problem of texture synthesis then becomes a mapping problem: how to map a given plane texture on the 3D surface of an object. We shall not develop this aspect here and refer to Heckbert [1986]. From the animation viewpoint, it must be emphasized that the superposition of two very simple synthesis methods is widely used: the normal perturbation method (explained in Section VI.A) followed by a rendering algorithm using an illumination model (see Kajiya [1986]). However, all textures obtained this way end up looking very much alike. Among the mathematical models that are candidates for being general texture synthesis models, some are, in fact, too specific in structure to permit a synthesis of all or even many textures, but they may give good results in some particular instances.
This is the case with the models we shall first briefly review: fractional Brownian motion and the perturbation of normal vectors permit creation of rough surfaces; iterated function systems create ragged, textured self-similar shapes; solid texture methods create marblelike or woodlike textures; reaction-diffusion equations create regular patterns, appearing for instance on the skin of some animals and shells; spot noise methods create clouds or tissues, etc. These methods are generic and can also create textures that do not refer to any existing one. This is not a drawback at all! There are also much more specific methods simulating specified surfaces, such as particle systems for fires and plants [Reeves, 1983], physically based models for water surface [Fournier and Reeves, 1986], and tissues [Weil, 1986], etc.
We then present the only two really general models: Markov random fields and multiscale models. The first one models any texture by pointwise interactions (between pixels). This is a strong structural drawback: a privilege is given to the pixel scale. Long-distance interactions are not well modeled and subpixel interactions are not allowed. Thus Markov random fields, as general a formalism as they are, make it difficult to analyze and synthesize textures that have several significant scales. This probably explains the computational cost in identifying the Gibbs potentials of a given texture. Notice that in discrete grid models, even a good rotation invariance requires many nonlocal interactions between pixels. In fact, a recent paper by Heeger and Bergen [1995] seems to be a major breakthrough in texture analysis/synthesis. This progress is in agreement with, and inspired by, psychophysical experiments showing the relevance, for texture discrimination, of the responses of textures to filter banks. See Bergen and Adelson [1988], Chubb and Landy [1991], and Malik and Perona [1990]. The Heeger and Bergen method and its generalizations [Simoncelli and Portilla, 1998; De Bonet and Viola, 1997] seem able to reproduce fairly well any given texture at a reasonable computational cost. At the end of the section we also present the Zhu and Mumford model, which merges the filter bank response method with a Gibbs field formalism.
A. The Normal Vectors Perturbation Method

This is one of the simplest and most used methods for texturing a surface. It was introduced in Blinn [1978]. The idea is to start from a surface and then slightly modify the normal vectors to this surface before computing the lighting. To do so, we consider a small perturbation of the surface in the direction of the normal and then compute the effect on the normal vectors. We then consider the same initial surface, but with modified normals for the rendering. In the case where the surface is known through a parametric equation $\vec{M}(u, v)$, with normal vectors $\vec{n}(u, v)$, and the deformed surface is $\vec{M}(u, v) + F(u, v)\,\vec{n}(u, v)$, then up to the first order in the deformation the new normals are
\[ \vec{n} + \frac{dF}{du}\,\vec{n} \wedge \frac{d\vec{M}}{dv} + \frac{d\vec{M}}{du} \wedge \vec{n}\,\frac{dF}{dv}. \]
Most of the time, $F$ consists of a table of interpolated and filtered values ("bump map"). An easy way is to give random values to the elements of the table to be interpolated. This method is probably the most used in image synthesis, in conjunction with texture mapping. It is fast and avoids the storage of complicated surfaces, thus saving memory. The fact that only the normals and not the surface itself are modified is also one of the drawbacks of the
method, because it creates artifacts at the junction of surfaces with different textures. We give a very simple example of the method in Figure 13, where we use a white noise to perturb the normals.

FIGURE 13. Normals are perturbed by the interpolation of a white noise.

The main idea of this texture model is to transform a smooth surface into a more irregular one that looks more realistic. This is one of the leitmotivs of computer graphics, because all objects whose physical boundaries are modeled by primitives having a simple mathematical expression, if appropriate for man-made objects, look far too regular to mimic the appearance of natural objects. We here refer to constructive geometry (objects are modeled by a small number of geometric primitives linked together by logical operations), interpolation methods (starting from some points of an object, it is reconstructed by means of splines, Bezier curves, etc.), and elastically deformable models (objects are modeled by deformable surfaces subject to external constraints; see Terzopoulos et al. [1987]). Therefore, Fournier, Fussel, and Carpenter [1982] introduced a variation on constructive geometry to model such irregular objects as mountains, and Szeliski and Terzopoulos [1989] introduced a variation on deformable surfaces (thus also on interpolation methods) that permits the modeling of irregular objects by adding perturbations to the energies constraining the surface. It is impossible here to give a detailed account of all those methods, but the perturbation idea will often be encountered in the following.
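The normal perturbation idea can be sketched on the simplest possible surface. The example below is an illustration only, with hypothetical function names: it assumes the plane $\vec{M}(u, v) = (u, v, 0)$ with normal $\vec{n} = (0, 0, 1)$, for which the first-order perturbed normal is proportional to $(-dF/du, -dF/dv, 1)$, a bump map obtained by bilinear interpolation of white noise, and a plain Lambertian illumination model.

```python
import math
import random

def bump_table(n, cells, rng):
    """Bump map F on an n x n grid: bilinear interpolation of white noise on a coarse grid."""
    coarse = [[rng.uniform(-1.0, 1.0) for _ in range(cells + 1)] for _ in range(cells + 1)]
    table = []
    for i in range(n):
        y = i * cells / n
        y0 = int(y)
        ty = y - y0
        row = []
        for j in range(n):
            x = j * cells / n
            x0 = int(x)
            tx = x - x0
            top = (1 - tx) * coarse[y0][x0] + tx * coarse[y0][x0 + 1]
            bottom = (1 - tx) * coarse[y0 + 1][x0] + tx * coarse[y0 + 1][x0 + 1]
            row.append((1 - ty) * top + ty * bottom)
        table.append(row)
    return table

def shade_plane(bump, amplitude, light):
    """Lambertian shading of the plane M(u, v) = (u, v, 0) with perturbed normals."""
    n = len(bump)
    norm = math.sqrt(sum(c * c for c in light))
    lx, ly, lz = (c / norm for c in light)
    image = []
    for i in range(n):
        i0, i1 = max(i - 1, 0), min(i + 1, n - 1)
        row = []
        for j in range(n):
            j0, j1 = max(j - 1, 0), min(j + 1, n - 1)
            # Finite-difference estimates of dF/du and dF/dv.
            fu = amplitude * (bump[i][j1] - bump[i][j0]) / (j1 - j0)
            fv = amplitude * (bump[i1][j] - bump[i0][j]) / (i1 - i0)
            # Unnormalized perturbed normal (-dF/du, -dF/dv, 1), then its dot product
            # with the unit light direction, clamped at zero.
            length = math.sqrt(fu * fu + fv * fv + 1.0)
            row.append(max(0.0, (-fu * lx - fv * ly + lz) / length))
        image.append(row)
    return image

rng = random.Random(1)
image = shade_plane(bump_table(64, 8, rng), amplitude=4.0, light=(1.0, 1.0, 2.0))
```

Only the shading changes: the geometry stays a flat plane, which is exactly the property (and the limitation) discussed in the text.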
B. Solid Textures

The basic idea of solid texturing [Perlin, 1985; Peachey, 1985] is the following: each point of the three-dimensional Euclidean space is given a numerical value (or an array of values) and then a surface is textured by looking at the values corresponding to its boundary in this space. Those values can be used
to determine the gray level of a surface point, its color or its normal, that can then be used for rendering. The method is free of the distortion problems encountered when mapping textures to three-dimensional objects. The values at each point are determined by procedural methods, mainly variations around noises having good invariance properties, by translation, rotation, and possibly scaling (see Perlin [1985] and Peachey [1985]).

Let us define the Noise function introduced in Perlin [1985]. At each point $x$ of $a\mathbb{Z}^2$, a rough grid where $a$ is an integer larger than 1, Noise($x$) is computed as a random value, and the values elsewhere, in the fine grid $\mathbb{Z}^2$, are obtained by regular interpolation (e.g., by splines) of the values in the rough grid. This function can then be used for a variety of texture syntheses. We can plainly use the value Noise($x$) to determine the gray levels of the surface of the object. We can also use the values of the noise function to determine intensity or direction of vectors that will enable us to perturb the normals as in the Blinn model (Section VI.A). There are many more elaborate ways to use the 3-D information, and we mention just one of them. If we let each point $\mathbf{x}$ of $\mathbb{R}^3$ have the value $\sin(x)$, where $x$ is the first coordinate of $\mathbf{x}$, and use an appropriate look-up table (a table in which real values are related to colors), we can synthesize a texture looking like marble, but far too regular to be realistic. The idea is then to define a new function Turbulence from $\mathbb{R}^3$ to $\mathbb{R}$, defined by
\[ \mathrm{Turbulence}(\mathbf{x}) = \sum_{k \ge 0} \frac{\mathrm{Noise}(2^k \mathbf{x})}{2^k} \]
(noises with higher and higher frequencies and smaller and smaller amplitudes are superposed, as in Perlin [1985]). The marble texture is then obtained by applying a look-up table to the values $\sin(x + \mathrm{Turbulence}(\mathbf{x}))$ instead of $\sin(x)$. Using similar methods, it is also possible to construct textures with approximately $1/f$ power spectrum. The results are impressive, and their generation is not too memory demanding, because they are synthesized by procedural methods. We show an example of such a marble solid texture in Figure 14.

A variation on these methods, introduced in Perlin [1989], is the so-called hypertexture model. It is a way to represent objects for which the texture is the result of an intricate 3-D shape, such as hair or turbulent fluids. An object is represented by a function $D$ from $\mathbb{R}^3$ into $[0, 1]$, taking values 1 where the object is hard and values in $(0, 1)$ in a region where the object is deformable. The soft part is deformed using various functions to give a textured aspect to the object. Since the first articles about solid textures, they have been widely used together with many variations. For instance, a recent article introduces new basis functions based on Voronoi cells [Worley, 1996].
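The marble construction can be sketched in a few lines. The code below is a hedged illustration rather than Perlin's implementation: it works on a two-dimensional slice, uses random values on a periodic unit lattice with a smoothstep interpolation (a simple stand-in for the spline interpolation mentioned in the text), reads the Turbulence sum as $\sum_k \mathrm{Noise}(2^k \mathbf{x})/2^k$ truncated after a few terms, and replaces the look-up table by a direct mapping to $[0, 1]$. The class and function names are hypothetical.

```python
import math
import random

class ValueNoise:
    """Random values on a coarse periodic lattice, smoothly interpolated in between."""

    def __init__(self, period=64, seed=0):
        rng = random.Random(seed)
        self.period = period
        self.values = [[rng.uniform(-1.0, 1.0) for _ in range(period)] for _ in range(period)]

    def __call__(self, x, y):
        x0, y0 = math.floor(x), math.floor(y)
        tx, ty = x - x0, y - y0
        # Smoothstep weights give a continuously differentiable interpolation.
        sx, sy = tx * tx * (3 - 2 * tx), ty * ty * (3 - 2 * ty)
        p = self.period
        v00 = self.values[y0 % p][x0 % p]
        v01 = self.values[y0 % p][(x0 + 1) % p]
        v10 = self.values[(y0 + 1) % p][x0 % p]
        v11 = self.values[(y0 + 1) % p][(x0 + 1) % p]
        top = v00 + sx * (v01 - v00)
        bottom = v10 + sx * (v11 - v10)
        return top + sy * (bottom - top)

def turbulence(noise, x, y, octaves=6):
    """Sum over k of Noise(2^k x) / 2^k, truncated after `octaves` terms."""
    return sum(noise(x * 2 ** k, y * 2 ** k) / 2 ** k for k in range(octaves))

def marble(noise, x, y):
    """Gray level in [0, 1] derived from sin(x + Turbulence(x))."""
    return 0.5 * (1.0 + math.sin(x + turbulence(noise, x, y)))

noise = ValueNoise(period=16, seed=3)
# One lattice cell spans 8 pixels in this 64 x 64 sample.
image = [[marble(noise, j / 8.0, i / 8.0) for j in range(64)] for i in range(64)]
```

Because the texture is a function of position, the same `marble` function could be evaluated at the points of any surface, which is what makes the approach free of mapping distortions.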
FIGURE 14. Solid texture model: marble.
C. Reaction-diffusion Textures

A texture is modeled as the result of local and nonlinear interactions, representing the diffusion of chemicals (morphogens) in a medium, and the reactions of creation and vanishing of those chemicals. This type of model was initially proposed by A. Turing for the purpose of understanding the patterns on the skins of animals. The interactions lead to differential equations. If $C(X, t)$ stands for the concentration of morphogens at $(X, t)$ (to which a texture control function will be associated), then the simplest model is:
\[ \frac{\partial C}{\partial t} = a^2 \Delta C - bC + R. \]
Here $a^2 \Delta C$ is the diffusion term, $-bC$ is the term for the disappearance of the morphogens in time, and $R$ is an arbitrary function representing the creation of morphogens. In the case of anisotropic patterns (as with zebra's stripes, for instance), the term $a^2 \Delta C$ is replaced by an anisotropic diffusion operator
\[ a_{11} \frac{\partial^2 C}{\partial x^2} + 2a_{12} \frac{\partial^2 C}{\partial x\,\partial y} + a_{22} \frac{\partial^2 C}{\partial y^2}. \]
The coefficients may also depend on the location. For the purpose of texture synthesis, the function $C$ is associated with a color map, or two such functions may be used simultaneously with a threshold in order to produce bicolor textures. We present a simple example with isotropic diffusion in Figure 15. Such textures are quite realistic, but they are limited to a restricted class. Two advantages of the method are the possibility of pasting texture samples by applying boundary conditions and the possibility of directly synthesizing
texture on surfaces that are not planar by modifying the equation, as explained in Turk [1991].

FIGURE 15. Isotropic reaction-diffusion texture.

D. Spot Noise Textures

This section presents a linear model for textures, based on the addition of translations of characteristic functions of sets. The method was introduced by Van Wijk [1991] for the purpose of scientific visualization. We define a spot noise by
\[ f(x) = \sum_i a_i\,h(x - x_i), \]
where $a_i$ and $x_i$ are random variables, $h$ is the characteristic function of a simple shape (disk, ellipse, . . .), $x$ is a point, and $f$ is the gray value at this point. We may have to rescale $f$ to stay between minimum and maximum values. If the $a_i$'s are centered and if, on average, there are $\nu$ occurrences of $x_i$'s per unit of surface, then one gets
\[ P_f(\omega) = \nu\,\langle a_i^2 \rangle\,|H(\omega)|^2 \]
where $P_f$ is the power spectrum of $f$, and $H$ the Fourier transform of $h$. This is of use in view of the analysis/synthesis of textures, but unfortunately it is not that easy to choose $h$. More precisely, if we know the power spectrum of a texture on a disk centered on the origin, it is not guaranteed that the inverse Fourier transform of an approximation of that power spectrum will be positive, and it is even more difficult to find a simple shape whose transform will match the spectrum. The synthesis is achieved in the frequency domain. For instance, if we suppose that the $a_i$'s are constant and the $x_i$'s are uniformly distributed, we compute the discrete Fourier transform of the shape $h$ and then uniformly randomize the phase at each point. We then come back to the image domain. This (up to the discretization) amounts to superposing copies of $h$ all over
the image and then normalizing the gray levels. We give some examples of synthesis in Figures 16 and 17. The control of the texture by the shape characteristics is far from being obvious, but we have the following heuristics: a symmetry of the shape will appear in the texture, and the larger the shape, the larger the correlations in the image. This method agrees with the hypothesis that a microtexture is characterized by its power spectrum only and that the phases of the Fourier transform are unimportant, a hypothesis that is easily checked by randomizing the phases of a microtexture.

FIGURE 16. Spot noise: on the left the generating shape, on the right, the texture obtained by randomizing the phases.

E. Fractal Models

1. Fractional Brownian Motion

A one-dimensional fractional Brownian motion (fBm) is a stochastic process $V_H$ from $\mathbb{R}$ to $\mathbb{R}$, with $0 < H < 1$, such that its increments $V_H(t_1) - V_H(t_2)$ have a Gaussian distribution, centered and with variance proportional to $|t_2 - t_1|^{2H}$. The case $H = \frac{1}{2}$ is the case of the usual Brownian motion, which is the integral of a Gaussian white noise. If $H > \frac{1}{2}$, the increments are positively correlated, whereas they are negatively correlated if $H < \frac{1}{2}$, and this has a strong repercussion on the aspect of the graph of $V_H$. An important property of
$V_H$ is the statistical self-affinity of its increments. Indeed $V_H(t_0) - V_H(t_0 + t)$ and $r^{-H}\,\big(V_H(t_0) - V_H(t_0 + rt)\big)$ have the same distribution. The realizations of the graphs of such processes are fractal in the sense that the Hausdorff dimension of the graph of $V_H$ (or its box-counting dimension, which in this case is the same) is $2 - H$ with probability 1. We here stress the difference with a usual Brownian path for which the dimension is 2.

FIGURE 17. Spot noise: generating shape on the left, generated texture on the right.

In dimensions larger than 1 it is possible to define processes $V_{H,n}$ from $\mathbb{R}^n$ to $\mathbb{R}$, with increments from $x_1$ to $x_2$ whose variance is proportional to $\|x_1 - x_2\|^{2H}$, $\|\cdot\|$ being the Euclidean norm. The dimension of such a process is $n + 1 - H$. Other sets of interest are the zero-sets of fBm. The zero-set of $V_{H,n}$ is the subset of $\mathbb{R}^n$, $\{x \,/\, V_{H,n}(x) = 0\}$. This set has dimension $n - H$ and, contrary to fBm, is statistically self-similar; i.e., $V_H(x_0) - V_H(x_0 + x)$ and $r^{-1}\,\big(V_H(x_0) - V_H(x_0 + rx)\big)$ have the same distribution. Those processes do exhibit the kind of irregularity and relationships between scales one would expect from many natural objects, together with randomness. For instance, to represent a coast, the zero-set of a two-dimensional fBm will prove useful, because it provides a closed curve, which is self-similar in agreement with the shape of a real coast line, with a dimension chosen through $H$. On the other hand, if one wants to represent a shape for which the closer we look the more regular it seems (as for some mountains), then a two-dimensional fBm with $H < \frac{1}{2}$ can be convenient. There is also another way to
use those processes, not taking them as a geometrical model, but as a model for the color levels of a texture. In Jardine [1993] a texture is synthesized by approximating a two-dimensional fBm (by the midpoints displacement method presented in the next section) and then associating with each value of the process a color value using a principal component analysis. More generally, many processes similar to fBm are widely used in computer graphics to perturb the normals or the color maps. The following two sections present some of the algorithms for the generation of approximations of fBm. A good reference concerning those algorithms is Saupe [1988].

2. Midpoints Displacement Method

The midpoints displacement method enables us to construct a pseudo fBm starting from a polygonal representation of a shape. This explains its success, considering the importance of geometric modeling in computer graphics. It was introduced for the purpose of computer graphics by Fournier, Fussel, and Carpenter [1982] and also gained some success being used as a way to produce mountains in the movie Star Trek II. In dimension 1, the idea is to start from a segment, then to divide it in two and to displace its center by a random vector, then to cut the two resulting segments in two and to displace their centers by a smaller quantity, etc. The amplitudes of the displacements at each iteration follow the proportionality law between $\mathrm{Var}(x(t_1) - x(t_2))$ and $|t_1 - t_2|^{2H}$. The algorithm is the following:
• In $\mathbb{R}^2$ equipped with a Euclidean basis $(x, t)$, we start from a line segment $S_0$ whose endpoints have coordinates $(0, T)$ and $(1, T')$, respectively, $T$ and $T'$ being two random numbers following a centered normalized Gaussian law.
• The $S_i$ are sets of segments $s_k^i$ recursively defined in the following way: for $i = 0$ to $N$, for each $s_k^i$, we displace its center vertically by a random variable following a centered Gaussian law of variance $2^{-2(i+1)H}$. The set $S_{i+1}$ then consists of the line segments thus obtained.
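The one-dimensional algorithm above can be sketched as follows. This is a minimal illustration with hypothetical names, not a reference implementation: the segment is taken over $[0, 1]$, and the displacement at level $i$ is assumed to have standard deviation $2^{-(i+1)H}$, a scaling chosen to agree with the proportionality law between the variance of the increments and $|t_1 - t_2|^{2H}$ (the midpoints of level $i$ lie at distance $2^{-(i+1)}$ from their neighbors).

```python
import random

def midpoint_displacement(levels, hurst, seed=None):
    """Approximate a one-dimensional fBm on [0, 1] at 2**levels + 1 equally spaced points."""
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]   # endpoint values T and T'
    for i in range(levels):
        sigma = 2.0 ** (-(i + 1) * hurst)                 # assumed std of the displacement
        refined = []
        for left, right in zip(values, values[1:]):
            middle = 0.5 * (left + right) + rng.gauss(0.0, sigma)
            refined += [left, middle]
        refined.append(values[-1])
        values = refined
    return values

path = midpoint_displacement(levels=8, hurst=0.7, seed=42)
print(len(path))
```

Points created at one level are never moved again at later levels, which is the source of the non-stationarity discussed for this method.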
The justification of this algorithm comes from the following formula, where $B_H$ is an fBm and $u \in [0, 1]$:
\[ E\big(B_H(u) \mid B_H(0) = 0,\ B_H(1) = 1\big) = \tfrac{1}{2}\left(u^{2H} + 1 - |u - 1|^{2H}\right) \]
so that it is equal to $\frac{1}{2}$ if $u = \frac{1}{2}$. This method can then be applied to a triangle, a widely used primitive in geometrical modeling. We displace the middle of the sides as before and then
link them to the center of mass of the triangle, therefore obtaining three new triangles that can be modified, and so on. It is also possible to use squares on a grid, to use fictitious centers of those squares, and to keep only points produced every two iterations. A serious problem arises with such a simulation, as stressed in Mandelbrot [1988]: we lose the stationarity of the fBm. Intuitively, this is the consequence of the immobility of the vertices of the triangles, which leads to creases. A solution consists in adding at each step a random perturbation not only to the centers of the line segments, but also to their endpoints. At each step, a random quantity is added to centers and endpoints, following the same law as the one used to displace centers in the previous algorithm.

3. The Spectral Synthesis Method

The spectral synthesis method is based on the fact that an fBm with parameter $H$ has a power spectrum (the square of the modulus of its Fourier transform) given by $S(f) = 1/\|f\|^{2H+2}$ (we assume here that the process is defined on a two-dimensional space). We therefore can build a Fourier decomposition whose coefficients are random variables with
\[ E\,|a_{i,j}|^2 = \frac{1}{(i^2 + j^2)^{\alpha}} \]
with $\alpha = H + 1$. Care must be taken with the symmetry conditions of the Fourier transform. The major drawback of the method is that it is not possible to adjust the data to some boundary conditions. We mention that if we choose the $a_{i,j}$ to be normally distributed with mean 0 and variance given by the preceding formula with $\alpha = 1$, we simulate the unique scale invariant Gaussian process [Mumford, Zhu, and Gidas, 1997]. Indeed fBms get closer to scale invariance as $H$ tends to 0.

4. Iterated Function Systems

This model represents an image as the attractor of a set of contracting affine functions and permits us to define a quite complicated object with few parameters. Therefore, the main application of this model, which is not presented here, is image compression. Some references about the use of iterated function systems (IFS) in image synthesis are Demko, Hodges, and Naylor [1985], Barnsley et al. [1988], Barnsley [1988], and Barnsley, Elton, and Hardin [1989]. An IFS is a set $(A_i, p_i)$, where the $A_i$ are affine functions from $\mathbb{R}^2$ to itself and the $p_i$ are strictly positive numbers such that $\sum_i p_i = 1$. Moreover, it is assumed that the set of transformations is contracting, i.e., each $A_i$ has a
Lipschitz constant $s_i$ such that
\[ \prod_i s_i^{p_i} < 1. \tag{29} \]
It is enough but not necessary that each $A_i$ be contracting. Under this condition, we can associate with the IFS a unique compact set $A$ and a unique measure $P$ (supported by $A$). In the case where there is only one function $A_i$, $A$ is the fixed point of the map $A_i$, which is contracting from the set of compact sets of $\mathbb{R}^2$ into itself for the Hausdorff distance $d$. The case where each $A_i$ is contracting is the same. In fact, the fixed-point theorem holds for the function $\bigcup_i A_i$, which is also contracting on compact sets for $d$. The measure $P$ has the following interpretation: if we apply the functions $A_i$ to a point $x$ with probability $p_i$, then do the same with the point $A_i x$, and so on, the sequence of points thus obtained will visit all of $A$, visiting a region $R$ with a frequency proportional to $P(R)$. The set is often, but not necessarily, a fractal set. For instance if we consider (in $\mathbb{R}$ instead of $\mathbb{R}^2$ but with the same definitions) $A_1 x = x/3$, $A_2 x = x/3 + 2/3$, $p_1 = p_2 = 1/2$, then the set $A$ is the triadic Cantor set, and the measure $P$ is uniformly supported by $A$. To synthesize a digital image from $(A, P)$, the set $A$ is pixelized, and for each pixel $R$ a gray level (or color array) proportional to $P(R)$ is given. But of course we do not know $(A, P)$, but only the $(A_i, p_i)$. Thus:
• We consider a pixel grid.
• We start from a point $x_0$ of $A$ (at least one of the $A_i$ is contracting; otherwise Condition (29) is not true, and we can show that its fixed point belongs to $A$).
• We pick an $A_i$ with probability $p_i$ and increment by 1 the gray level value of the pixel containing $A_i x_0$.
• We repeat the preceding operation (starting from the previous point) many times (many compared to the size of the grid).
• We rescale the obtained gray level values between 0 and 255.
An ergodicity theorem ensures the validity of this method. Of course, we don't know so far how to choose the $A_i$'s in order to design an a priori fixed shape. For this task, the following result, the so-called collage theorem, is essential.
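The rendering steps listed above can be sketched on the one-dimensional Cantor example of the text. The code is an illustration under stated assumptions, with hypothetical names: each affine map is stored as a pair (scale, shift), the "pixel grid" is a row of 81 bins over $[0, 1]$, and the starting point 0 is the fixed point of $A_1$.

```python
import random

def chaos_game(maps, probabilities, start, steps, bins, seed=0):
    """Gray levels (0 to 255) of the orbit of `start` under randomly chosen affine maps."""
    rng = random.Random(seed)
    histogram = [0] * bins
    x = start
    for _ in range(steps):
        scale, shift = rng.choices(maps, weights=probabilities)[0]
        x = scale * x + shift
        # Increment the gray level of the pixel containing the new point.
        histogram[min(int(x * bins), bins - 1)] += 1
    peak = max(histogram)
    return [round(255 * count / peak) for count in histogram]

cantor_maps = [(1.0 / 3.0, 0.0), (1.0 / 3.0, 2.0 / 3.0)]   # A1 x = x/3, A2 x = x/3 + 2/3
gray = chaos_game(cantor_maps, [0.5, 0.5], start=0.0, steps=100000, bins=81)
print(sum(1 for g in gray if g > 0))   # number of pixels visited by the orbit
```

In two dimensions the same loop applies with each map stored as a 2 × 2 matrix and a translation vector, and with a two-dimensional histogram in place of the row of bins.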
We consider a compact set $T$. If $d(T, \bigcup_i A_i T) < \varepsilon$, then
$$d(T, A) < \frac{\varepsilon}{1 - s}.$$
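The rendering steps listed before the collage theorem can be sketched in a few lines. The following Python fragment is a minimal illustration, not the authors' implementation: it uses the one-dimensional Cantor example given earlier ($A_1 x = x/3$, $A_2 x = x/3 + 2/3$, $p_1 = p_2 = 1/2$), and the function name `render_ifs`, the grid size, the number of iterations, and the random seed are all assumptions chosen for the example.

```python
import random

def render_ifs(maps, probs, x0, n_pixels=81, n_iter=100_000, seed=0):
    """Chaos-game rendering of a 1-D IFS on [0, 1] (illustrative sketch).

    maps  : list of functions A_i
    probs : list of probabilities p_i
    x0    : starting point, assumed to belong to the attractor A
    """
    rng = random.Random(seed)
    counts = [0] * n_pixels
    x = x0
    for _ in range(n_iter):
        # Pick A_i with probability p_i and move to A_i(x).
        (a,) = rng.choices(maps, weights=probs, k=1)
        x = a(x)
        # Increment the gray level of the pixel containing the new point.
        pixel = min(int(x * n_pixels), n_pixels - 1)
        counts[pixel] += 1
    # Rescale the accumulated values to the range 0..255.
    top = max(counts)
    return [round(255 * c / top) for c in counts]

cantor_maps = [lambda x: x / 3, lambda x: x / 3 + 2 / 3]
# x0 = 0 is the fixed point of A_1, hence a point of the attractor.
gray = render_ifs(cantor_maps, [0.5, 0.5], x0=0.0)
```

With this choice of maps, the pixels covering the open middle third of $[0, 1]$ stay at gray level 0, while the occupied pixels on both sides receive comparable values, as expected for a measure spread evenly over the Cantor set.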
This theorem is a consequence of an intermediate result in the proof of the fixed-point theorem for the set map $\bigcup_i A_i$ which, as previously mentioned, is contracting on compact sets. It enables us to construct a set of functions $(A_i)$ that approximates a given shape $T$. We superpose some $A_i T$ on $T$, where the $A_i$'s are affine transforms, so that $T$ is covered as much as possible and the complement of $T$ is covered as little as possible. The theorem then tells us that $A$ will be close to $T$, and $T$ is represented by the (few) parameters of the $A_i$'s. In Figure 18, we present an example of such a synthesis for hay, using three transformations $A_i$. We can choose the $p_i$'s to be all equal, thus leading to a shape of uniform color, or vary them in order to have nonuniform gray levels on $T$. However, once the geometry is fixed, the $p_i$'s can control the gray level values only according to the $A_i$'s. A way to attenuate this dependency between color and geometry (which can, however, produce interesting effects) is to add redundant functions $A_i$ to the ones that are necessary for the geometry.

FIGURE 18. Iterated function system, with three affine functions, simulating hay.

F. Random Fields Models

In those models a texture is assumed to be the realization of a stationary random field on $L$, a finite subset of $\mathbb{Z}^2$ (the pixels), with integer values from 0 to $N$ (the gray levels). In the following, if $X$ is such a field and $s$ a point of $L$, we will write $X_s$ for the random variable corresponding to the value of the field at $s$, $\Omega = \prod_{s \in L} \{0, \dots, N\}$ for the Cartesian product of sets, and $P(E)$ for the probability of an event $E \subset \Omega$. $\mathcal{P}(L)$ stands for the set of subsets of $L$. If $x \in \Omega$, we call it a realization of $X$ and denote by $x_s$ its $s$th coordinate. We present the Gibbs formalism because of its generality, but indeed many other models are closely related [Chellappa and Kashyap, 1985; McCormik and Jayaramamurthy, 1974].

1. Markov Models

A Markov field is a field $X$ such that for any $s \in L$ there exists a finite $B \in \mathcal{P}(L \setminus \{s\})$ such that
$$P(X_s = x_s \mid X_t, t \neq s) = P(X_s = x_s \mid X_t, t \in B).$$
In the case of interest to us, a finite $L$, all random fields are Markov fields. This definition becomes interesting only once the set $B$ is specified. We call neighborhood system a set $\partial = \{\partial s;\ s \in L\} \subset \mathcal{P}(L)$ such that $s \notin \partial s$ and ($s \in \partial t$ iff $t \in \partial s$). When $s \in \partial t$, we say that $s$ and $t$ are neighbors. We call clique of $\partial$ a pair of elements of $L$ that are neighbors. We shall say that a field is a Markov field with respect to a given neighborhood system $\partial$ if
$$P(X_s = x_s \mid X_t, t \neq s) = P(X_s = x_s \mid X_t, t \in \partial s).$$
An important theoretical result about Markov fields, which has a straightforward application in synthesis, is their characterization by Gibbs potentials. We call potential a function $U$ from $\mathcal{P}(L) \times \Omega$ to $\mathbb{R}$ such that $U(\emptyset, x) = 0$ for all $x$ and $U(A, x) = U(A, y)$ for all $x, y \in \Omega$ such that $x_s = y_s$ for $s \in A \subset L$. We call energy of $U$ the function from $\Omega$ to $\mathbb{R}$ given by $E_U = \sum_{A \subset L} U(A, \cdot)$. We call Gibbs field associated with the potential $U$ the field defined by
$$P(x) = \frac{1}{Z} \exp(-E_U(x)),$$
where $Z$ is a normalizing constant. If, for a neighborhood system $\partial$, $U$ is zero outside the cliques of this system, then the Gibbs field corresponding to $U$ is a Markov field with respect to this neighborhood system. We thus define a Markov field without checking difficult compatibility relations between local characteristics (i.e., probabilities of the type $P(X_s = x_s, s \in A \mid X_s, s \in L \setminus A)$). In fact, it is even true that every Markov field is such a Gibbs field (Hammersley-Clifford theorem), so that the preceding method for defining fields is not restrictive.
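As a sanity check on the statement that a potential vanishing outside the cliques yields a Markov field, one can enumerate a tiny Gibbs field by brute force. The sketch below is only an illustration under assumptions that are not in the text: a chain of four binary sites whose neighbors are the adjacent sites, an arbitrary pair potential, and the sign convention $P(x) \propto \exp(-E_U(x))$. All names (`energy`, `conditional`, `THETA`) are hypothetical.

```python
import itertools
import math

N_SITES = 4          # a 1-D chain: site s has neighbors s-1 and s+1
THETA = 0.8          # arbitrary coupling strength (assumption)

def energy(x):
    """Energy of a potential that is nonzero only on neighbor pairs."""
    return -THETA * sum(x[s] * x[s + 1] for s in range(N_SITES - 1))

configs = list(itertools.product([0, 1], repeat=N_SITES))
Z = sum(math.exp(-energy(x)) for x in configs)
prob = {x: math.exp(-energy(x)) / Z for x in configs}

def conditional(s, value, rest):
    """P(X_s = value | X_t = rest[t], t != s); rest[s] itself is ignored."""
    num = prob[rest[:s] + (value,) + rest[s + 1:]]
    den = sum(prob[rest[:s] + (v,) + rest[s + 1:]] for v in (0, 1))
    return num / den

# For site 0 the only neighbor is site 1, so the conditional law of X_0
# should not change when the non-neighbors (sites 2 and 3) are varied.
spread = max(
    abs(conditional(0, 1, (0, x1, a, b)) - conditional(0, 1, (0, x1, 0, 0)))
    for x1 in (0, 1) for a in (0, 1) for b in (0, 1)
)
```

Here `spread` measures how much the local characteristic at site 0 depends on the non-neighboring sites; it vanishes up to rounding error, in agreement with the Markov property.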
Those results permit an easy definition of a new field in view of the synthesis, but then half the problem remains: how to synthesize a field knowing its potential. One possibility for achieving a fast synthesis is the so-called Gibbs sampler:
• We start from a random image ($Y_s = y_s$).
• A site $s$ is randomly picked in $L$.
• The gray level value of $s$ is updated according to $P(X_s \mid X_u, u \neq s)$.
• The operation is repeated $k \cdot \mathrm{card}(L)$ times, where $k$ is of the order of a small integer.
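The four steps above translate almost directly into code. The following Python sketch is an illustration rather than a reference implementation: the binary field, the four-nearest-neighbor system with periodic boundary, and the logistic local law with parameters `theta0` and `theta` are assumptions made so that the example is self-contained. Any other local characteristic $P(X_s \mid X_u, u \neq s)$ could be substituted for `local_prob_one`.

```python
import math
import random

def local_prob_one(img, i, j, theta0, theta):
    """Assumed local law: P(X_s = 1 | neighbors) = a / (1 + a)."""
    h, w = len(img), len(img[0])
    # Four nearest neighbors with periodic (toroidal) boundary.
    nb = (img[(i - 1) % h][j] + img[(i + 1) % h][j]
          + img[i][(j - 1) % w] + img[i][(j + 1) % w])
    a = math.exp(theta0 + theta * nb)
    return a / (1.0 + a)

def gibbs_sampler(h, w, theta0, theta, k=20, seed=0):
    rng = random.Random(seed)
    # Step 1: start from a random image.
    img = [[rng.randint(0, 1) for _ in range(w)] for _ in range(h)]
    # Step 4: repeat k * card(L) times.
    for _ in range(k * h * w):
        # Step 2: pick a site at random.
        i, j = rng.randrange(h), rng.randrange(w)
        # Step 3: resample its value from the local conditional law.
        p1 = local_prob_one(img, i, j, theta0, theta)
        img[i][j] = 1 if rng.random() < p1 else 0
    return img

# Attractive interaction: neighboring pixels tend to take the same value.
sample = gibbs_sampler(32, 32, theta0=-3.0, theta=1.5)
agree = sum(sample[i][j] == sample[i][(j + 1) % 32]
            for i in range(32) for j in range(32)) / (32 * 32)
```

The quantity `agree` is the fraction of horizontally adjacent pixel pairs with equal values; with an attractive interaction it ends up clearly above the value 1/2 expected for independent pixels.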
Concerning those results and their application to image processing, we refer to the introductory article of Geman and Geman [1984] and the recent book by Winckler [1995].

2. An Example: The Autobinomial Model

In the autobinomial model, introduced by Cross and Jain [1983], the image is associated with a Gibbs random field in order to synthesize natural textures. We keep the notations of the previous section and write $C_n^p$ for the binomial coefficients. We define, for each site $s$, its neighborhood in a uniform way by $\{s + \tau_i,\ i \in \{1, 2, \dots, A\}\}$. This means that all cliques are of size at most 2, a so-called automodel. The $\tau_i$'s are fixed vectors in $\mathbb{Z}^2$. They are equal to $(1,0)$, $(-1,0)$, $(1,1)$, $(-1,-1)$, $(2,0)$, etc. The energy function is then directly defined by
$$E_U(y) = -\sum_{i=1}^{A} \theta_i \sum_s y_s\, y_{s+\tau_i} - \theta_0 \sum_s y_s - \sum_s \ln C_N^{y_s}.$$
If we define
$$a_s = \exp\Big(\theta_0 + \sum_{i=1}^{A} \theta_i \sum_{\langle s,t \rangle_i} y_t\Big),$$
then the local characteristic in $s$ is
$$P(Y_s = y_s \mid Y_t = y_t, t \neq s) = C_N^{y_s} \Big(\frac{a_s}{1+a_s}\Big)^{y_s} \Big(1 - \frac{a_s}{1+a_s}\Big)^{N-y_s}.$$
Thus the name autobinomial model. The meaning of the formula is that at each pixel, the probability law of the gray level corresponds to the number of heads obtained when tossing an unfair coin $N$ times, for which the probability of getting a head is $a_s/(1 + a_s)$. Of course, this probability depends on the gray level values of neighboring pixels. In the original article by Cross and Jain, the length of the $\tau_i$'s is smaller than 4. Moreover, $N = 2$ (autologistic model). For
given types of neighborhoods, the coefficients are determined by maximizing separately
$$S_j = \sum_i \log p(x_{i,j} \mid \text{neighbors}),$$
where the $x_{i,j}$'s are required not to be neighbors at fixed $j$ and are such that $\bigcup_{i,j} x_{i,j} = L$. The parameters for different $j$ are then averaged. There is no procedure to infer the form of the neighborhoods. Several microtextures may be synthesized this way, but failures occur in the case of macrotextures or textures with different scales. These models have been more successful in image segmentation or denoising than in texture synthesis itself. The main problem for the synthesis by a Markov random field, which cannot be overcome without a high computational cost and heavy formalism, is the modeling of long-distance interactions and of the intricate interactions between scales. Let us also mention that rotation invariance cannot be achieved without many parameters. Thus only a few textures can be realistically synthesized by Markov fields at reasonable cost, the main difficulty remaining the determination of texture parameters and neighborhoods. One possible way to overcome this difficulty is to determine simultaneously those parameters and the form of the neighborhood, including long-range ones, as in the work of Zhu, Wu, and Mumford [1998], which we introduce in Section VI.H.

3. The Model of Gagalowicz and Ma

The model is introduced in Gagalowicz and Ma [1985]. Once more, the texture is related to a random field. The idea is to characterize a texture by some of its statistical moments. Again, we consider a net $L$, $N$ is the number of gray levels, and $t$ is a point of $L$. A microscopic texture seems to be well represented by its histogram $H$ and by its estimated moments
$$S_{p,q}(t) = \frac{1}{\mathrm{card}(L)} \sum_{s \in L} \frac{(X_s - \mu)^p (X_{s+t} - \mu)^q}{\sigma^{p+q}}$$
($\mu$ and $\sigma^2$, respectively, are the estimated mean and variance of $X_s$). In order for the model to be tractable, only values of $t$ smaller than about 10 or 12 can be considered in practice, for obvious computational reasons. This is also motivated by perception experiments: an observer looking at an image from a distance where pixels are just discernible is receptive only to statistics over a distance of 8 to 12 pixels (a vision cone of about 9° of solid angle).
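To make the definition of $S_{p,q}(t)$ concrete, here is a small Python sketch that estimates these moments on an image stored as a list of rows. It is a minimal illustration, not the authors' code: the text does not say how $s + t$ is handled at the image border, so the periodic wrap-around used below is an assumption, as is the function name `moment`, and the normalizing quantity is taken to be the estimated standard deviation.

```python
import math

def moment(img, p, q, t):
    """Estimate S_{p,q}(t) for a gray-level image given as a list of rows.

    t is a displacement (dy, dx); indices wrap around periodically
    (an assumption made for this sketch).
    """
    h, w = len(img), len(img[0])
    n = h * w
    mu = sum(sum(row) for row in img) / n
    var = sum((v - mu) ** 2 for row in img for v in row) / n
    sigma = math.sqrt(var)
    dy, dx = t
    total = 0.0
    for i in range(h):
        for j in range(w):
            a = img[i][j] - mu
            b = img[(i + dy) % h][(j + dx) % w] - mu
            total += a ** p * b ** q
    return total / (n * sigma ** (p + q))

# Vertical stripes of period 2: pixels one column apart are anti-correlated.
stripes = [[0, 4, 0, 4] for _ in range(4)]
s11_same = moment(stripes, 1, 1, (0, 0))    # normalized variance
s11_shift = moment(stripes, 1, 1, (0, 1))   # normalized autocorrelation at lag 1
```

On this striped test image the zero-lag moment is the normalized variance and the lag-one moment has the opposite sign, which is the kind of second-order information the histogram alone cannot capture.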
Moreover, it is assumed that $p + q < 5$. To represent other types of textures (macroscopic textures) it seems to be necessary to consider moments of third order. This model is richer than those considering only autocorrelation, and indeed one can distinguish between textures having the same autocorrelation [Pratt, Faugeras, and Gagalowicz, 1978]. The synthesis of a texture knowing its moments is achieved in a very straightforward way. To synthesize a texture with histogram vector $H_0$ and moments vector $M_0$, we start from an initial image of histogram $H_0$ and moments $M$ and then sequentially minimize the quantity $\alpha \|H - H_0\|^2 + \|M - M_0\|^2$, pixel after pixel. The minimization is achieved by trying each gray level value for the pixel currently visited. The order in which the pixels are visited can be deterministic or random. Of course, the major drawback of the method is that it requires a long computing time and a large number of parameters (up to about 20,000, which is the order of a small image size). On the other hand, the resulting images are very realistic and visually match quite well natural textures with the same moments.

G. Multiscale Synthesis Model

Heeger and Bergen [1995] recently proposed a novel method for reproducing natural textures. The basic idea is to recursively fit the pyramidal decomposition of a white noise image to that of a sample of the texture to synthesize. The method enables one to produce samples of any size as well as solid textures. A pyramidal decomposition of an image $I$ is a linear decomposition of $I$ into $I_{i,j}$, a set of images of various sizes, each capturing information at a certain scale $i$ and orientation $j$, from which the image $I$ can be reconstructed. Heeger and Bergen synthesize texture using the steerable pyramid decomposition [Simoncelli et al., 1992]. In general $j$ varies from 1 to 4, and “steerable” means that those four filters allow recovery of any orientation as a linear combination of them. The image can be reconstructed linearly from its decomposition.

The synthesis algorithm is the following. Let $I$ be a sample of the texture to synthesize and $I_{i,j}$ be its steerable pyramid decomposition. Let $N$ be a white noise image. First transform $N$ into $\tilde{N}$ by fitting its histogram to the one of $I$ (by histogram we mean histogram of gray levels). Then compute the steerable pyramid $\tilde{N}_{i,j}$ of the modified noise image. Then for each $i, j$ fit the histogram of the channel $\tilde{N}_{i,j}$ to the one of $I_{i,j}$, and use the resulting channels $\tilde{\tilde{N}}_{i,j}$ to reconstruct a new image $I_1$. The previous operation is then repeated a few times (the histogram of $I_1$ is no longer fitted to the one of $I$), leading to an image $I_k$. There is no theoretical result about the convergence of $I_k$. In any case, the problem of measuring the distance between two texture images is far from
being simple. After a small number of iterations (around 5), $I_k$ and $I$ are often visually very similar (however, the iteration should not be repeated too often). The method works in agreement with psychophysical experiments, which show that two textures having the same responses in some banks of orientation- and scale-selective linear filters are perceptually indiscriminable [Bergen and Adelson, 1988; Malik and Perona, 1990; Chubb and Landy, 1991]. Moreover, as stressed by the authors, it is important to realize that, even though it is based on histogram matching, the algorithm modifies more than the first-order statistics of the initial noise, because the responses of linear filters capture higher-order statistics. This algorithm has been given for gray level images, but it is straightforward to extend it to color images by working not on the RGB components, but in a color space where the three color components are decorrelated. Moreover, it is possible to synthesize solid textures (see Section VI.B) starting from a 3-D noise. The experiments presented by Heeger and Bergen are very convincing (better than any previous method) and come at a very low computational cost. See Figure 19, where three examples from the original article are displayed. The synthesized textures are required to be homogeneous, and, as pointed out by the authors, failures occur in the case of quasi-periodic textures. It is possible to generalize the method, enabling the synthesis of quasi-periodic textures, or textures having extended structural elements [Simoncelli and Portilla, 1998]. The idea is still to work with a steerable pyramid decomposition, but to fit higher-order characteristics of the subband images and not only the histogram. It is remarkable that by doing so the authors manage to synthesize images that, even if abstract, look quite similar to a predetermined natural scene having a more complicated structure than a texture.

FIGURE 19. Multiscale synthesis. Left: original, right: synthesis [Heeger and Bergen, 1995].

H. The FRAME Model

The FRAME (filters, random field and maximum entropy) model of Zhu, Wu, and Mumford [1998] unifies the two points of view of filter bank responses and Markov fields. The idea, in order to synthesize a texture perceptually close to a sample image $A$, is the following. Let $S = \{F_1, \dots, F_n\}$ be a set of filters. Let $U$ be the set of all probability distributions on images. We denote by $\Omega_S \subset U$ the set of distributions that have the same histogram responses as $A$ to all filters in $S$. Then we look for the element of $\Omega_S$ having the maximum entropy, i.e.,
$$p_S = \arg\max \Big\{ -\int p(I) \log p(I)\, dI \Big\},$$
where the maximum is taken over all $p$ in $\Omega_S$. The form of $p_S$, through Lagrange parameters, is a Markov field that can be synthesized by the Gibbs sampler method (see Section VI.F.1), giving an image $I_S$. Actually, the situation is slightly more complicated. Without going into details, the derivation of the parameters of the field and its synthesis are achieved jointly, leading to successive approximations of $p_S$. Moreover, the authors describe how to iteratively choose the filters of the set $S$ from a general filter bank $\mathcal{S}$, in order to consider only those filters that are needed to synthesize $A$. First, a filter $F_1$ is chosen from $\mathcal{S}$ so that the responses of a white noise and $A$ to $F_1$ are as distant as possible for the $L^1$-norm. Assume that $S_n = \{F_1, \dots, F_n\}$ has been chosen. Synthesize $I_{S_n}$ by the previously described method, and define $F_{n+1}$ to be the filter such that the responses of $I_{S_n}$ and $A$ to $F_{n+1}$ are as distant as possible for the $L^1$-norm. Then $S_{n+1} = S_n \cup \{F_{n+1}\}$. It should be emphasized that the set $\mathcal{S}$ may consist of linear and nonlinear filters, and that any filter can be added to a preexisting bank. The filters used by the authors are mainly Laplacians of Gaussian (with 8 scales), Gabor filters (with 6 scales and 6 orientations), and the square of the modulus of the Gabor filters. The method gives very good results, both on microtextures and on textures with extended elements. We present two examples from the original article in Figure 20. The computational cost, though, is quite high. The advantage over other Markov field methods is that there is an explicit way to choose the form of neighborhoods through the filter approach. The computation of long-distance interactions is made feasible.

FIGURE 20. Frame synthesis. Left: original, right: synthesis [Zhu, Wu, and Mumford, 1998].
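The greedy filter selection step of the FRAME model can be sketched as follows. This Python fragment is only a schematic illustration under simplifying assumptions: images are small lists of rows, the two toy filters (`dx`, `dy`) stand in for the Laplacian-of-Gaussian and Gabor banks used by the authors, the histogram binning is arbitrary, and the synthesized image is supplied by hand rather than produced by a Gibbs sampler. The names `histogram`, `l1_distance`, and `select_filter` are hypothetical.

```python
def dx(img):
    """Toy filter: horizontal finite difference (periodic boundary)."""
    w = len(img[0])
    return [row[(j + 1) % w] - row[j] for row in img for j in range(w)]

def dy(img):
    """Toy filter: vertical finite difference (periodic boundary)."""
    h, w = len(img), len(img[0])
    return [img[(i + 1) % h][j] - img[i][j] for i in range(h) for j in range(w)]

def histogram(values, lo=-4.0, hi=4.0, bins=8):
    """Normalized histogram of filter responses, clipped to the bin range."""
    counts = [0] * bins
    for v in values:
        k = int((v - lo) / (hi - lo) * bins)
        counts[min(max(k, 0), bins - 1)] += 1
    return [c / len(values) for c in counts]

def l1_distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))

def select_filter(bank, observed, synthesized, chosen=()):
    """Return the name of the not-yet-chosen filter whose response histograms
    on the observed and synthesized images differ most in L1 norm."""
    candidates = {name: f for name, f in bank.items() if name not in chosen}
    return max(
        candidates,
        key=lambda name: l1_distance(histogram(candidates[name](observed)),
                                     histogram(candidates[name](synthesized))),
    )

bank = {"dx": dx, "dy": dy}
observed = [[0, 3, 0, 3] for _ in range(4)]      # vertical stripes
synthesized = [[1, 1, 1, 1] for _ in range(4)]   # flat image
first = select_filter(bank, observed, synthesized)
second = select_filter(bank, observed, synthesized, chosen=(first,))
```

On the striped sample the horizontal difference filter is selected first, because it is the one whose response histogram separates the sample from the flat image. In the actual method the synthesized image would be regenerated after each selection.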
VII. SOME PRINCIPLES FOR THE SYNTHESIS OF ABSTRACT NATURAL IMAGES

In this section, we propose a new methodology to perform image synthesis based on object interaction and perspective. We do not intend to create real images. We derive several visual rules from simple physical properties of objects, and we want to study the consequences of such rules on the image configuration.
A. Structural Laws

We emphasize some simple physical laws that strongly restrict the structure of images. First of all, solids are made of homogeneous materials that do not mix, a law of physical exclusion. This results in a visual exclusion law (which, of course, is violated in the case of liquids mixing together or diffusing in a solid). The physical law of exclusion also results, for opaque objects, in a visual occlusion law: an object that is closer to the observer may partially or completely hide one that is farther away. This law is violated by transparent objects (glass, some liquids, but also shadows and spots). Moreover, the physical inclusion of a subobject in an object results in a visual law of occlusion at the border. Part of an object may disappear because it lies on a nonvisible part of the object it is included in (self-occlusion). Notice that a change in the point of view of the observer (or camera) creates occlusion, but also deformations of the visible parts due to the 3-D shape of the object.
B. Scale and Perspective Laws

Scale and perspective laws affect the visual sizes of objects. The key point here is to notice that the simple fact of imposing an exclusion principle yields a size law. Let us illustrate this by two examples. If we juxtapose shapes in the plane whose sizes are random and if we want those shapes to be disjoint, it is clear that as the process evolves, small sizes are more and more likely to appear. In the same way, if we suppose that objects are organized in an object-subobject hierarchy, then the size decreases rapidly as we progress in the tree of subobjects. Indeed, it is clear that if an object has several parts, at least one of those will have an area smaller than half the area of the original object. At this point let us mention that a very simple model for the preceding phenomenon yields a power law for the size histogram of objects, with an exponent between 0 and $-1$. We start from a square of side $a$, containing $N$ disjoint subsquares of side $Ka$. Those subsquares themselves contain $N$ subsquares of side $K^2 a$, etc. We assume that $K < 1$ and $NK^2 < 1$, so that there is enough room to fit the subsquares. We then have objects with areas $A_n = (K^n a)^2 - N (K^{n+1} a)^2$ for $n$ varying from 0 to $+\infty$. The number of objects with area $A_n$ is $N_n = N^n$. Thus
$$N_n = C A_n^{-\frac{1}{2} |\log N / \log K|}$$
with $C$ a constant. We thus have an area histogram following a power law $\mathrm{area}^{-\alpha}$ with $0 < \alpha < 1$, because $NK^2 < 1$. The size decay from one scale to the next may vary from a few units in the case of subobjects to the order of several thousand or more in the case of a texture. In an object-subobject configuration, we should also consider the phenomenon of occlusion at the border, but more importantly the fact that the 3-D structure of an object may affect the size distribution of subobjects. This is of use in the shape from texture problem or, conversely, is a thorny problem in image synthesis. There may or may not be exclusion between subobjects. Occlusion at this stage may be encountered in the case of textures. In some cases, subobjects of distinct objects may interfere, as in the case of textures such as fur or in the case of a scene viewed through a dirty window, for instance.
The size distribution of visual objects depends on the size distribution of physical objects and also on perspective laws. First, there is a decay of visual size due to the distance to the observer. At this point, let us mention a very simple model for the joint effects of perspective and physical exclusion on size distribution, which has been suggested to us by S. Geman. We suppose that an observer is looking at objects lying on the floor that are all of the same width. We neglect the occlusion phenomenon. We also suppose that the vision field is conical, so that the number of objects the observer sees at a distance between $d$ and $d + \Delta d$ is $n(d) = C_1 d\, \Delta d$. Those objects have a visual area of $C_2 / d^2$. In these conditions the number of objects with a visual area between $a$ and $a + \Delta a$ is $C a^{-2} \Delta a$, once more a power law. It is worth mentioning that this is incompatible with a finite area for the image, maybe suggesting that the occlusion phenomenon has to be taken into account, at least for objects far away.
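The power law of the hierarchical square model can be checked numerically. The short Python sketch below tabulates the areas $A_n$ and the counts $N_n$ for arbitrary illustrative values of $N$, $K$, and $a$ (assumptions, chosen only to satisfy $K < 1$ and $NK^2 < 1$) and compares the slope of $\log N_n$ against $\log A_n$ with $-\frac{1}{2}|\log N / \log K|$. The variable names are hypothetical.

```python
import math

N, K, a = 4, 0.3, 1.0           # illustrative values with K < 1 and N*K**2 < 1
assert K < 1 and N * K ** 2 < 1

levels = range(8)
# Area of an object at depth n: its square minus its N subsquares.
areas = [(K ** n * a) ** 2 - N * (K ** (n + 1) * a) ** 2 for n in levels]
# Number of objects at depth n.
counts = [N ** n for n in levels]

# Slope of log(count) versus log(area) between the first and last levels.
slope = (math.log(counts[-1]) - math.log(counts[0])) / (
    math.log(areas[-1]) - math.log(areas[0]))
alpha = 0.5 * abs(math.log(N) / math.log(K))
```

Since $A_n$ is proportional to $K^{2n}$ and $N_n = N^n$, the points $(\log A_n, \log N_n)$ are exactly aligned, and `slope` coincides with `-alpha` up to rounding.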
But, as Section V shows, the precise form of the effect of occlusion on size distribution is not easily determined.

C. Synthesis Rules and “Synthetic Worlds”

The main goal of this section is to explore how, with a small number of rules concerning the interaction between objects, we can create various universes of synthetic images. Moreover, we think that some of these images might have an aesthetic interest in themselves. We define the following synthesis procedure. An image is organized as a tree, each node representing an object-subobject relation. The initial node is the entire support of the image. At each node, we choose parameters (deterministic or random) and we construct subobjects $O_i$ in a “father” object $O$. Each parameter may depend on the characteristics of $O$. Those parameters are
• Number of objects $O_i$'s,
• Shape law of the objects $O_i$'s,
• Size law of the objects $O_i$'s,
• Occlusion, exclusion, or transparency between the $O_i$'s, noting that in the case of occlusion the respective locations of objects may or may not be related to their sizes,
• Occlusion of the $O_i$'s at the border of $O$ or not,
• 3-D deformation or not (including the perspective law at node 0).
An alternative is to construct subobjects forming a partition of $O$, i.e., to create a patchwork. The objects we used in the following examples are of two kinds. The first are simple geometric shapes, such as rectangles, polygons, and disks. The second are connected components of bilevel sets of photographs (see Section III), i.e., connected homogeneous parts of photographs. This appeared to be a simple and fruitful method to generate new shapes. The previously mentioned principles make it possible to create, among many others, the following types of images:
• Flat world: Subobjects are strictly included in father objects and exclude each other. There is no perspective law.
• Bicolor world: Black and white alternate as we proceed into the tree, creating a foreground-background confusion.
• Windowed world: There is exclusion at node 0 and occlusion (at the border and between objects) at all other nodes.
We display several examples of such synthesis in Figures 21–31.
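As an illustration of the procedure, the Python sketch below draws axis-aligned squares at several scales with occlusion, in the spirit of the caption of Figure 21 (at each scale the number of objects is multiplied by 4 and their area divided by 4, larger scales being painted first). It is not the program used for the figures: the square shapes, the uniform random positions and gray levels, the image size, and the names `synthesize` and `paint_square` are assumptions made for the example.

```python
import random

def paint_square(img, cx, cy, side, gray):
    """Paint a square roughly centered at (cx, cy), clipped to the image.

    Pixels are overwritten, so later objects occlude earlier ones.
    """
    size = len(img)
    half = side / 2
    for i in range(max(0, int(cy - half)), min(size, int(cy + half) + 1)):
        for j in range(max(0, int(cx - half)), min(size, int(cx + half) + 1)):
            img[i][j] = gray

def synthesize(size=128, n_scales=5, n_first=4, side_first=48.0, seed=0):
    rng = random.Random(seed)
    img = [[0] * size for _ in range(size)]
    n_objects, side, drawn = n_first, side_first, 0
    for _ in range(n_scales):
        for _ in range(n_objects):
            paint_square(img, rng.uniform(0, size), rng.uniform(0, size),
                         side, rng.randint(1, 255))
            drawn += 1
        n_objects *= 4      # four times as many objects at the next scale...
        side /= 2           # ...each with a quarter of the area
    return img, drawn

image, n_drawn = synthesize()
```

Changing the rule applied inside the loop (for instance rejecting a square that overlaps an existing one, or restricting subobjects to the support of a parent) would give the exclusion and inclusion variants described above.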
FIGURE 21. Polygons at several scales. At each scale, the number of objects is multiplied by 4 and their area is divided by 4. Larger scales are painted first, and objects may occlude each other at each scale. There are five scales.
FIGURE 22. Texture generated by a dead leaves model with time-dependent sizes; the principles are the same as in Figure 21, but there are seven scales.
FIGURE 23. Bicolor image, made of polygons that exclude each other at every scale (strict inclusion object-subobject, strict exclusion between subobjects.)
FIGURE 24. The first scale is a patchwork, and then objects are of random types with exclusion (strict inclusion, and strict exclusion, the first scale being excepted.)
FIGURE 25. Objects are elongated rectangles, their sizes follow a uniform law, and they exclude each other.
FIGURE 26. Texture generated by a dead leaves model; disks of smaller and smaller sizes are superposed.
FIGURE 27. Occlusion everywhere: between objects, between subobjects, and between objects and their subobjects.
FIGURE 28. Exclusion at the first scale and occlusion everywhere else.
FIGURE 29. The shape is a section from Figure 1. There are 10 different scales drawn sequentially, with occlusion.
FIGURE 30. The objects are sections of a fly eye on which shears were performed. There are three scales sequentially drawn with occlusion.
FIGURE 31. Figure 30 is repeated without shear and with six scales.
VIII. APPENDIX: MATHERON’S DEAD LEAVES MODEL

In this appendix we present the dead leaves model as it was originally developed by G. Matheron [1968] on the whole of $\mathbb{R}^2$. The main difference with our simplified model is that it is not possible to draw a single point uniformly at random in $\mathbb{R}^2$, so that we need to draw infinitely many points at each stage. On the other hand, if we draw several leaves at a time, we need them to be disjoint to define the occlusion process. This may be achieved using an infinitesimal point process.

A. Poisson Point Processes

Define $\mathcal{P}$ as the set of all locally finite sequences of points $x = (x_i)_{i \in \mathbb{N}}$ of $\mathbb{R}^n$. With each $x \in \mathcal{P}$ we associate the map $B \mapsto \mathrm{card}(x \cap B)$, defined from the set of Borel sets of $\mathbb{R}^n$ into $\mathbb{N}$. We then define $\mathcal{N}$, the smallest $\sigma$-algebra such that $x \mapsto \mathrm{card}(x \cap B)$ is measurable for each Borel set $B$. We define a point
process as any measurable map into $(\mathcal{P}, \mathcal{N})$. Elements of $\mathcal{P}$ are closed sets of $\mathbb{R}^n$, because they are discrete sets. It is easily shown that point processes are, in fact, random closed sets. Indeed, for every compact set $K$,
$$x^{-1}(\mathcal{F}^K) = \{\omega \in \Omega : \mathrm{card}(x(\omega) \cap K) = 0\},$$
which proves that $x^{-1}(\mathcal{F}^K)$ is measurable and $x$ is a random closed set. Therefore, such a process is characterized by its capacity function. The interested reader may consult Cox and Isham [1980] and Daley and Vere-Jones [1988].

Definition 3 Let $\mu$ be a measure on $\mathbb{R}^2$. We call Poisson point process (p.p.p.) associated with $\mu$ any point process $x$ satisfying the following:
1. If $A_1, \dots, A_n$ are $\mu$-measurable disjoint sets, then $\mathrm{card}(x \cap A_1), \dots, \mathrm{card}(x \cap A_n)$ are independent random variables.
2. For every bounded Borel set $B$, $\mathrm{card}(x \cap B)$ obeys a Poisson law with parameter $\mu(B)$, i.e., for any $k$ in $\mathbb{N}$,
$$\Pr(\mathrm{card}(x \cap B) = k) = \frac{\mu(B)^k}{k!} \exp(-\mu(B)). \qquad (30)$$
Theorem 11 Let $\mu$ be a nonatomic measure (i.e., $\mu(\{x\}) = 0$ for $x$ in $\mathbb{R}^2$), finite on all bounded sets. Then there exists a Poisson point process associated with $\mu$.

We omit the proof of this result, and refer to Kingman [1993]. We now list some properties of Poisson processes.
• If a p.p.p. $x$ is stationary, then $x$ is associated with the measure $\lambda\nu$, where $\lambda$ is a positive real number and $\nu$ denotes the Lebesgue measure. $x$ is called a Poisson point process with parameter $\lambda$. In that case, Property 2 of Definition 3 is entailed by Property 1.
• Let $x_1$ and $x_2$ be two independent p.p.p. associated, respectively, with $\mu_1$ and $\mu_2$. Then $x_1 \cup x_2$ is a p.p.p. associated with $\mu_1 + \mu_2$. This statement follows from the next result, whose proof will be omitted because it is too technical: for every compact set $K$, $x_1 \cap x_2 \cap K$ is empty with probability 1. It is plain that $x_1 \cup x_2$ satisfies Properties 1 and 2 of Definition 3, since the sum of two independent Poisson variables with parameters $\alpha$ and $\beta$ follows a Poisson law with parameter $\alpha + \beta$.
• Let $x$ be a p.p.p. associated with $\mu$ and $p$ be a $\mu$-measurable real function defined on $\mathbb{R}^2$ such that $0 \le p(x) \le 1$ for every $x$. Then we define a new point process, $x'$, as the set of points of $x$ surviving with probability $p(x)$, independently of each other. Then $x'$ is a p.p.p. associated with the new measure
$$\mu'(B) = \int_B p(x)\, \mu(dx).$$
We omit the proof of this result, and refer to Kingman [1993].
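The defining properties and the thinning property lend themselves to a quick simulation. The following Python sketch is illustrative only: it samples a stationary Poisson point process on a rectangle by first drawing the number of points from a Poisson law and then placing them uniformly, and thins it with a constant survival probability. The rectangle, the intensity, the simple Poisson sampler (Knuth's multiplication method, adequate for small means), and the function names are assumptions made for the example.

```python
import math
import random

def poisson(mean, rng):
    """Draw a Poisson variate by Knuth's multiplication method (small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_ppp(lam, width, height, rng):
    """Stationary p.p.p. with parameter lam on [0, width] x [0, height]."""
    n = poisson(lam * width * height, rng)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def thin(points, p, rng):
    """Keep each point independently with probability p(point)."""
    return [pt for pt in points if rng.random() < p(pt)]

rng = random.Random(0)
lam, trials = 5.0, 2000
full = [sample_ppp(lam, 2.0, 2.0, rng) for _ in range(trials)]
kept = [thin(pts, lambda pt: 0.5, rng) for pts in full]
mean_full = sum(len(pts) for pts in full) / trials   # close to lam * 4 = 20
mean_kept = sum(len(pts) for pts in kept) / trials   # close to 0.5 * 20 = 10
```

The empirical mean number of points matches the measure of the window, and after thinning it matches the integral of the survival probability against that measure.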
B. The Boolean Model

We consider a particular instance of random closed set of $\mathbb{R}^n$. Let $x = (x_i)_{i \in \mathbb{N}}$ be a stationary p.p.p. with parameter $\lambda$ and $(X_i)_{i \in \mathbb{N}}$ a sequence of independent identically distributed closed random sets that are independent of the $x_i$'s. In addition, we assume
$$E\,\nu(X_0 \oplus K) < +\infty \quad \text{for every } K \in \mathcal{K}, \qquad (31)$$
where $\nu$ is the Lebesgue measure in $\mathbb{R}^2$ and $E$ is the expectation with respect to the image measure by $X_0$. This implies, among other properties, that $X_0$ is almost surely bounded.

Definition 4 We define a Boolean model with intensity $\lambda$ and grain $X_0$, and denote by $B(\lambda, X_0)$, the closed random set $\bigcup_i (x_i + X_i)$.

The closedness of $B(\lambda, X_0)$ is a consequence of Condition (31), which ensures that each $x_i + X_i$ meets only a finite number of other $x_j + X_j$'s, with probability 1. The capacity function associated with $B$ is remarkably simple.

Theorem 12 Let $X = B(\lambda, X_0)$ be a Boolean model and $G$ its capacity function; then for any compact set $K$,
$$G(K) = \Pr(X \cap K = \emptyset) = \exp(-\lambda\, E\,\nu(X_0 \oplus \check{K})). \qquad (32)$$
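Proof Let $K \in \mathcal{K}$ and let $x'$ be defined by $x' = \{x_i \in x : (x_i + X_i) \cap K \neq \emptyset\}$. Then $x'$ is also a Poisson process, associated with $\mu'(B) = \lambda \int_B \Pr((x + X_0) \cap K \neq \emptyset)\, dx$, by Formula (8.1). Then
$$\begin{aligned}
\Pr(X \cap K = \emptyset) &= \Pr(x' = \emptyset) \\
&= \exp\Big(-\lambda \int_{\mathbb{R}^2} \Pr((x + X_0) \cap K \neq \emptyset)\, dx\Big) \\
&= \exp\Big(-\lambda \int_{\mathbb{R}^2} \Pr(x \in \check{X}_0 \oplus K)\, dx\Big) \\
&= \exp\Big(-\lambda \int_{\mathbb{R}^2} E\,\mathbb{1}_{\check{X}_0 \oplus K}(x)\, dx\Big) \\
&= \exp\Big(-\lambda\, E \int_{\mathbb{R}^2} \mathbb{1}_{\check{X}_0 \oplus K}(x)\, dx\Big) \\
&= \exp(-\lambda\, E\,\nu(\check{X}_0 \oplus K)),
\end{aligned}$$
where $\mathbb{1}_C(x)$ is 1 if $x \in C$ and 0 otherwise. Since $\nu(\check{X}_0 \oplus K) = \nu(X_0 \oplus \check{K})$, we obtain the announced result.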
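Formula (32) can be checked by simulation in a simple case. The Python sketch below is an illustration under simplifying assumptions that are not in the text: the grains are disks of fixed radius `r`, the compact set $K$ is a disk of radius `rho` centered at the origin, and only germs falling in a square window large enough to contain every grain that can reach $K$ are simulated. Writing $\nu$ for the Lebesgue measure, the dilated set then has area $\pi (r + \rho)^2$. The function names are hypothetical.

```python
import math
import random

def poisson(mean, rng):
    """Poisson variate by Knuth's multiplication method (small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def k_is_missed(lam, r, rho, rng):
    """Return True if no grain of one simulated Boolean model hits K."""
    # A disk grain of radius r centered at c hits K (disk of radius rho at
    # the origin) iff |c| <= r + rho, so germs outside the square
    # [-reach, reach]^2 can be ignored.
    reach = r + rho
    n = poisson(lam * (2 * reach) ** 2, rng)
    for _ in range(n):
        cx = rng.uniform(-reach, reach)
        cy = rng.uniform(-reach, reach)
        if math.hypot(cx, cy) <= reach:
            return False
    return True

rng = random.Random(1)
lam, r, rho, trials = 1.0, 0.4, 0.1, 4000
estimate = sum(k_is_missed(lam, r, rho, rng) for _ in range(trials)) / trials
theory = math.exp(-lam * math.pi * (r + rho) ** 2)   # right-hand side of (32)
```

The Monte Carlo frequency `estimate` agrees with `theory` up to sampling error, which decreases like the inverse square root of the number of trials.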
Remark The quantity $\Pr(K \subset X)$ is not known in general. It is even difficult to obtain approximations.

C. The Dead Leaves Model in $\mathbb{R}^2$

Let us consider $(X_t)_{t \in (-\infty, 0]}$, a family of independent random compact sets satisfying Condition (31), and a continuous function $\lambda(t)$. For $-\infty < t \le 0$, we consider the infinitesimal Boolean model $B(\lambda(t)\,dt, X_t)$. We superpose the Boolean models $B(\lambda(t)\,dt, X_t)$ from $-\infty$ to 0 and define the dead leaves model $F(\lambda(t), X_t)$ as the boundaries of the sets obtained by such a superposition procedure. That is, we keep the parts of the boundaries of the grains of $B(\lambda(t)\,dt, X_t)$ that do not intersect any of the grains of $B(\lambda(t')\,dt', X_{t'})$ for $t' \in (t, 0]$. For the sake of simplicity, we assume that $\lambda(t) = \lambda$ and $X_t = X$. We remark that, in view of the modeling of three-dimensional scenes, the parameter $t$ makes it possible to represent perspective effects. In addition we assume that $X$ has almost surely nonempty interior. It can be proved that $F(\lambda, X)$ is a closed random set and that the superposition of the Boolean models from $-\infty$ to 0 covers the whole space $\mathbb{R}^2$. Indeed, any small enough compact set $K$ is included with probability 1 in the superposition of the Boolean models: if $K$ is a compact set such that $E\,\nu(X \ominus \check{K})$ is not zero (which will be the case for sufficiently small $K$ if $X$ has nonempty interior), then the probability that $K$ is included in some $x_i + X_i$ at some time $t'$ in $[t, t + dt]$ is $\lambda\, dt\, E\,\nu(X \ominus \check{K})$ (as is proved next). Therefore, because this probability is nonzero, $K$ is included in a grain of the model for some time $t$ with probability 1. Thus, the model is a closed set because it is a locally finite union of closed sets. As in the bounded case, we have the following.

Theorem 13 Let $\lambda \in \mathbb{R}$ and $X$ be a random compact set satisfying Condition (31). Let us define $Q(K)$ as the probability that the compact set $K$ is included in a connected component of the partition of $\mathbb{R}^2$ provided by the dead leaves model. Then we have
$$Q(K) = \frac{E\,\nu(X \ominus \check{K})}{E\,\nu(X \oplus \check{K})}. \qquad (33)$$

Sketch of proof $K$ is included in a connected component if, for some time $t \in (-\infty, 0]$, $K$ is included in a grain of $B(\lambda\,dt, X)$ and $K$ does not intersect any of the grains of the models $B(\lambda\,dt', X)$ from $t$ to 0. If the Boolean model at time $t$ is given by $\bigcup_i (x_i + X_i)$, then $\Pr((x_i + X_i) \cap (x_j + X_j) \neq \emptyset,\ i \neq j) = o(dt)$ and (32) provides
$$\Pr(K \subset B(\lambda\,dt, X)) = \Pr(K \subset x_i + X_i \text{ for some } i) + o(dt) = \lambda\, dt\, E\,\nu(X \ominus \check{K}) + o(dt).$$
Moreover, the probability that $K$ does not intersect the Boolean model from $t$ to 0 is $\exp(t \lambda\, E\,\nu(X \oplus \check{K}))$. Indeed, the union of several Boolean models is a Boolean model whose parameter is the sum of the parameters of the initial Boolean models. Therefore, because the events just described are independent, we obtain
$$Q(K) = \int_{-\infty}^{0} \lambda\, E\,\nu(X \ominus \check{K}) \exp(t \lambda\, E\,\nu(X \oplus \check{K}))\, dt,$$
and therefore
$$Q(K) = \frac{E\,\nu(X \ominus \check{K})}{E\,\nu(X \oplus \check{K})}. \qquad (34)$$
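As a worked instance of the final formula for $Q(K)$, consider the case, chosen here purely for illustration, where the grain $X$ is a deterministic disk of radius $R$ and $K$ is a disk of radius $\rho < R$. Writing $\nu$ for the Lebesgue measure and $\ominus$, $\oplus$ for erosion and dilation, erosion and dilation by a disk simply shrink or enlarge the radius, so that

```latex
\nu(X \ominus \check{K}) = \pi (R - \rho)^2, \qquad
\nu(X \oplus \check{K}) = \pi (R + \rho)^2, \qquad
Q(K) = \frac{E\,\nu(X \ominus \check{K})}{E\,\nu(X \oplus \check{K})}
     = \left( \frac{R - \rho}{R + \rho} \right)^{2}.
```

For example, $\rho = R/3$ gives $Q(K) = 1/4$, and $Q(K)$ tends to 1 as $\rho$ tends to 0, in agreement with the fact that a very small set almost surely lies inside a single visible leaf.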
REFERENCES Baddeley, R. J. (1997). The correlational structure of natural images and the calibration of spatial representations, Cognitive Science 21, 351 – 372. Barlow, H. B. (1961). The coding of sensory messages, Current Problems in Animal Behavior, (W. H. Thorpe and O. L. Zangwill, Eds.) Cambridge University Press, pp. 331 – 360. Barnsley, M. F. (1988). Fractals Everywhere. San Diego: Academic Press. Barnsley, M. F., Elton, V., and Hardin, J. H. (1989). Recurrent iterated function systems. Constructive Approximations 5, 3 – 31. Barnsley, M. F., Jacquin, A., Malassenet, F. Reuter, L., and Sloan, A. D. (1988). Harnessing chaos for image synthesis, Computer Graphics 22(4), 131 – 140. Beck, J. (1983). Textural segmentation, second order statistics and textural elements, Bio. Cyber. 48, 125 – 130. Bell, A. J., and Sejnowski, T. J. (1997). The “independent components” of natural scenes are edge filters, Vision Research 37, 3327 – 3338. Bergen, J. R. and Adelson, E. H. (1988). Early vision and texture perception, Nature 333, 363 – 367. Blinn, J. F. (1978). Simulation of wrinkled surfaces, Computer Graphics 12(3), 286 – 292. De Bonet, J. S., and Viola, P. (1997). A non-parametric multiscale statistical model for natural images, Advances in Neural Information Processing 10. Burton, G. J., and Moorhead, I. R. (1987). Color and spatial structure in natural scenes, Applied Optics 26, 157 – 170. Caselles, V., and Morel, J.-M. (1998). The connected components of sets of finite perimeters in the plane, Preprint. Chellappa, R., and Kashyap, R. (1985). Texture synthesis using 2d non-causal autoregressive models, IEEE Trans. Acoust. Speech Signal Process. 33, 194 – 203. Chubb, C., and Landy, M. S. (1991). Orthogonal distribution analysis: a new approach to the study of texture perception, in Computational Models of Visual Processing (M. S. Landy, and J. A. Movshon, Eds.) MIT Press, pp. 291 – 301. Cohen, A., DeVore, R., Petrushev, P., and Xu, H. (1998). 
Nonlinear approximation and the space $BV(\mathbb{R}^2)$, preprint, submitted to Amer. J. Math. Cox, D. R., and Isham, V. (1980). Point Processes. New York: Chapman and Hall. Cross, A., and Jain, A. K. (1983). Markov random field texture models, IEEE Trans. on Pattern Analysis and Machine Intelligence 5(1), 25 – 39.
LUIS ALVAREZ, YANN GOUSSEAU, AND JEAN-MICHEL MOREL
Daley, D. J., and Vere-Jones, D. (1988). An Introduction to the Theory of Point Processes. New York: Springer-Verlag. Demko, S., Hodges, C., and Naylor, B. (1985). Construction of fractal objects with iterated function systems, Computer Graphics 19(3), 271 – 278. Deriugin, N. G. (1956). The power spectrum and the correlation function of the television signal, Telecommunications 1, 1 – 12. Evans, L. C. and Gariepy, R. F. (1992). Measure Theory and Fine Properties of Functions, Studies in Advanced Math. CRC Press. Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells, Jour. Opt. Soc. Amer. A4, 2379 – 2394. Fournier, A., Fussel, D., and Carpenter, L. (1982). Computer rendering of stochastic models, Communication of the ACM 25(6), 371 – 384. Fournier, A., and Reeves, W. (1986). A simple model of ocean waves. Computer Graphics, 75 – 84. Gagalowicz, A., and Ma, S. D. (1985). Sequential synthesis of natural textures. Computer Vision, Graphics and Image Processing 30, 289 – 315. Geman, S., and Geman, D. (1984). Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images, IEEE Trans. on Pattern Analysis and Machine Intelligence, 721 – 741. De Giorgi, E. (1954). Su una teoria generale della misura (r − 1)-dimensionale in uno spazio ad r dimensioni. Ann. Mat. Pura ed Appl., IV. Ser. 36, 191 – 213. Heckbert, P. S. (1986). Survey of texture mapping. IEEE Computer Graphics and Applications 6, 56 – 67. Heeger, D. J., and Bergen, J. R. (1995). Pyramid based texture analysis/synthesis. Computer Graphics Proceedings, pp. 229 – 238. Huang, J., and Mumford, D. (1999). Statistics of natural images and models. To appear in CVPR. Jardine, L. F. (1993). Fractal based analysis and synthesis of multispectral visual texture for camouflage, in Applications of Fractals and Chaos, (A. J. Crilly, R. A. Earnshaw, and H. Jones, Eds.) New York: Springer-Verlag, pp. 101 – 116. Jeulin, D. (1996).
Dead leaves models, from space tesselation to random functions, in Advances in Theory and Applications of Random Sets (D. Jeulin, Ed.). Julesz, B. (1981). Textons, the elements of texture perception and their interactions, Nature 290, 91 – 97. Julesz, B. (1986). Texton gradients, the texton theory revisited, Biol. Cybern. 54, 245 – 251. Kajiya, J. T. (1986). The rendering equation. Computer Graphics 20(4), 143 – 150. Kanizsa, G. (1979). Organization in Vision: Essays on Gestalt Perception. Praeger. Kingman, J. F. C. (1993). Poisson Processes, Oxford Studies in Probability. Clarendon Press. Kretzmer, E. R. (1952). Statistics of television signals, Bell Syst. Tech. Journal 31, 751 – 763. Malik, J., and Perona, P. (1990). Preattentive texture discrimination with early vision mechanisms, Jour. Opt. Soc. Amer. A7, 923 – 931. Mallat, S. Personal communication. Mallat, S. (1997). A Wavelet Tour of Signal Processing. New York: Academic Press. Mandelbrot, B. (1988). Fractal landscapes without creases and with rivers, in The Science of Fractal Images (H.-O. Peitgen and D. Saupe, Eds.). New York: Springer-Verlag, pp. 243 – 260. Matheron, G. (1967). Éléments pour une Théorie des Milieux Poreux. Paris: Masson. Matheron, G. (1968). Modèle séquentiel de partition aléatoire. Technical report, CMM. Matheron, G. (1975). Random Sets and Integral Geometry. New York: John Wiley. McCormick, B. H., and Jayaramamurthy, S. N. (1974). Times series models for texture synthesis, Int. J. Comput. Inform. Sci. 3, 329 – 343. Meyer, Y. (1983). Wavelets: Algorithms and Applications. SIAM. Mumford, D., Zhu, S. C., and Gidas, B. (1997, working draft). Stochastic models for generic images.
THE SIZE OF OBJECTS IN NATURAL AND ARTIFICIAL IMAGES
Olshausen, B. A., and Field, D. J. (1996). Natural images statistics and efficient image coding, Network: Computation in Neural Systems 7, 333 – 339. Peachey, D. R. (1985). Solid texturing of complex surfaces, Computer Graphics 19(3), 279 – 286. Perlin, K. (1989). An image synthesizer, Computer Graphics 19(3), 287 – 296. Perlin, K. (1989). Hypertexture, Computer Graphics 23(3), 253 – 262. Pratt, W. K., Faugeras, O. D., and Gagalowicz, A. (1978). Visual discrimination of stochastic texture fields, IEEE Trans. Systems, Man, and Cybernetics 8, 796 – 804. Reeves, W. T. (1983). Particle systems — a technique for modeling a class of fuzzy objects, Computer Graphics 17(3), 359 – 376. Ruderman, D. L. (1994). The statistics of natural images, Network: Computation in Neural Systems 5, 517 – 548. Ruderman, D. L. (1997). Origins of scaling in natural images, Vision Research. Vol 37, No. 23, pp. 3385 – 3395. Ruderman, D. L., and Bialek, W. (1994). Statistics of natural images: scaling in the woods, Physical Review Letters, pp. 814 – 817. Rudin, L. (1987). Images, Numerical Analysis of Singularities and Shock Filters. Ph.D. diss., California Institute of Technology. Rudin, L., Osher, S., and Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms, Physica D 60, 259 – 268. Saupe, D. (1988). Algorithms for random fractals, in The Science of Fractal Images. New York: Springer-Verlag. Serra, J. (1982). Image Analysis and Mathematical Morphology. London: Academic Press. Simoncelli, E. P., Freeman, W. T., Adelson, E. H., and Heeger, D. J. (1992). Shiftable multiscale transforms, IEEE Transactions on Information Theory, Special Issue on Wavelets, pp. 587 – 607. Simoncelli, E. P., and Portilla, J. (1998). Texture characterisation via joint statistics of wavelet coefficient magnitudes, in 5th International Conference on Image Processing. Stoyan, D., Kendall, W. S., and Mecke, J. (1995). Stochastic Geometry and its Applications, 2d. 
ed., Wiley Series in Probability and Statistics. New York: Wiley. Szeliski, R., and Terzopoulos, D. (1989). From splines to fractals. Computer Graphics 23(3), 51 – 60. Terzopoulos, D., Platt, J., Barr, A., and Fleischer, K. (1987). Elastically deformable models, Computer Graphics 21(4), 205 – 214. Tolhurst, D., Tadmor, Y., and Chao, T. (1992). Amplitude spectra of natural images, Ophthal. Physiol. Opt. 12, 229 – 232. Turk, G. (1991). Generating textures on arbitrary surfaces using reaction-diffusion, Computer Graphics 25(4), 289 – 297. Voss, R. F., and Clarke, J. (1975). 1/f noise in music and speech, Nature 258, 317 – 318. Voss, R. F., and Wyatt, J. C. Y. (1993). Multifractals and the local connected fractal dimension: classification of early Chinese landscape painting, in Applications of Fractals and Chaos (H. Jones, A. J. Crilly, R. A. Earnshaw, Eds.). New York: Springer-Verlag, pp. 171 – 184. Weil, J. (1986). The synthesis of cloth objects, Computer Graphics 20, 49 – 54. Van Wijk, J. J. (1991). Spot noise: texture synthesis for data visualization, Computer Graphics 25(4), 309 – 317. Winckler, G. (1995). Image Analysis, Random Field and Dynamic Monte Carlo Methods. New York: Springer-Verlag. Witkin, A., and Kass, M. (1991). Reaction-diffusion textures, Computer Graphics 25(4), 299 – 307. Worley, S. (1996). A cellular texture basis function, Computer Graphics Annual Conferences Series, pp. 291 – 293.
Zhu, S. W., Wu, Y., and Mumford, D. (1998). Filters, random fields and maximum entropy (FRAME), Int’l Journal of Computer Vision 27, 1 – 20. Ziemer, W. P. (1989). Weakly Differentiable Functions. New York: Springer-Verlag.
ACKNOWLEDGEMENTS
We would like to thank Vicent Caselles, Stuart Geman, Stéphane Mallat, and David Mumford for several helpful comments and discussions. We also thank Les Treilles foundation, which made some of these discussions possible.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 111
Reconstruction of Nuclear Magnetic Resonance Imaging Data from Non-Cartesian Grids
GORDON E. SARTY
University of Saskatchewan, Department of Medical Imaging, Royal University Hospital, 103 Hospital Drive, Saskatoon, Saskatchewan, Canada S7N 0W8
I. Introduction
   A. Generation of an MRI Signal
   B. k-Space
   C. Gradient Limitations
II. Gridding Reconstruction and Direct Reconstruction
   A. Gridding Reconstruction
   B. Direct Reconstruction
   C. Using Gridding to Provide an Approximation of Direct Reconstruction
III. Natural k-Plane Coordinate Systems
   A. Normal-Segment Coordinates
   B. Direct Reconstruction via Natural k-Plane Coordinates
   C. Examples of Natural k-Plane Coordinates
   D. Reconstruction from a Single Acquisition of Data
IV. Integrating Curve Band-pass Operators
   A. Fourier Transforms Restricted to Curves in $\mathbb{R}^2$
   B. The Curve Band-pass Operator on Paley-Wiener Spaces
   C. Band-pass Operators as Integrals of Curve Band-pass Operators
   D. A Resolution of the Identity for Paley-Wiener Spaces
   E. Further Properties of Curve Band-pass Operators
V. Curve Band-pass Operator Point-Spread Functions
   A. The PSF as an Inverse Fourier Transform of a Finite Nonnegative Lebesgue Measure
   B. PSFs Associated with Curved k-Plane Trajectories Are Peaked
   C. The PSFs Associated with Lines, Squares and Circles in k-Space
   D. Integration of Curve Band-pass Point-Spread Functions
   E. Operators Associated with Reconstruction from Discrete Data Samples
VI. Direct Reconstruction Using Natural k-Plane Coordinates
   A. Sample Band-pass Operators and Their Point-Spread Functions
   B. Reconstructions of Undersampled Simulated MRI Data
VII. Conclusion
References
Volume 111 ISBN 0-12-014753-X
ADVANCES IN IMAGING AND ELECTRON PHYSICS Copyright © 1999 by Academic Press All rights of reproduction in any form reserved. ISSN 1076-5670/99 $30.00
I. INTRODUCTION Data for magnetic resonance imaging (MRI) are collected in two-dimensional frequency space, the k-space, and so must be Fourier-transformed to yield an image for the physician. MRI data have traditionally been acquired on a Cartesian grid in k-space, which makes reconstruction via the fast Fourier transform (FFT) a straightforward exercise. The properties and limitations of acquiring and reconstructing MRI data on Cartesian grids are well understood via the Nyquist theorems. Recently, in an effort to accelerate data acquisition, MRI data have been gathered along non-Cartesian grids such as those generated by spirals and rosettes. The standard way of reconstructing data gathered on non-Cartesian grids is to interpolate back to a Cartesian grid by a process known as gridding so that the FFT may then be applied. However, it is not necessary from a mathematical point of view to interpolate these data to a Cartesian grid; they may be reconstructed into an image directly. A rigorous analysis of the direct reconstruction of MRI data sampled on non-Cartesian grids in k-space is the subject of this article. The first part of the article gives a quick background on the nature of the MRI signal, and the signal model chosen for analysis is fixed. In Section II the process of gridding is reviewed, and it is shown how gridding is related to direct reconstruction. Non-Cartesian grids may be viewed as grids that arise from a related coordinate system in k-space; the nature of these natural coordinate systems and their relation to direct reconstruction is the subject of Section III. The article takes a turn for the abstract in Section IV, where ideal direct reconstruction operators on Paley-Wiener and Sobolev spaces are defined and characterized. These operators turn out to have very interesting properties and can be integrated in an operator sense to produce bandpass operators. 
The material of Section IV may not be of much concern to one who is interested only in reconstruction algorithms, but the careful reader will see that the subject of Fourier transforms is not a trivial one. For example, the ideal reconstruction operators show that one-dimensional curves in k-space contain enough information to produce a two-dimensional image. In Section V it is seen that the zero-dimensional set of sample points contains two-dimensional image information, a point that can be missed if data are treated exclusively with the discrete Fourier transform and the FFT. Examples of reconstructions of simulated MRI data are given in Section VI, where the importance of the point-spread function is emphasized and an approach to the formulation of sampling theorems similar to the Nyquist theorems for Cartesian data is given. What follows is an analysis of two-dimensional MRI reconstruction; direct three-dimensional MRI is also possible, and the results given here are easily applicable to three-dimensional MRI. The phrases k-space and k-plane
have been used interchangeably to remind the reader that the two-dimensional theory given is also applicable to three-dimensional (and higher) situations. Finally, it is noted that long baseline interferometric radio telescope data are also acquired in frequency space, so the image-reconstruction techniques given here are also applicable to radio astronomy.
A. Generation of an MRI Signal
A schematic of the hardware important for the operation of a magnetic resonance imager is shown in Figure 1. The essential components are the main magnetic field coils, the X, Y, and Z gradient magnetic field coils, and the radio frequency (RF) transmit and receive coils.
FIGURE 1. The arrangement of magnets in a nuclear magnetic resonance imager. (Labeled components: X gradient field coils, with two more in front; Y gradient field coils, 4 total; Z gradient coils; main field coils, 4 total; receiver coil, with the RF transmitter in a similar location. Axes: X, Y, Z.)
The main magnetic field is required to generate differing quantum spin energy levels in the atomic nuclei of the object to be imaged so that the phenomenon of nuclear magnetic resonance (NMR) may be induced. A typical clinical MRI of 1999 has a main magnetic field strength of 1.5 T (tesla), although both higher and lower field magnets are being used in specialized circumstances. The main field causes atomic nuclei with noninteger quantum spin numbers to assume differing spin energies [Slichter, 1963]. The most abundant atomic nuclei in the human body having a noninteger quantum spin number are hydrogen nuclei, which are found primarily in water and lipid (fat) molecules. So, although it is possible to image other nuclear species, clinical imagers currently image hydrogen nuclei. The hydrogen nucleus, with spin $\tfrac{1}{2}$, assumes two different energy configurations, with slightly more than half of the spins occupying the lower energy level. On a macroscopic level, an ensemble of nuclear spins may be considered as occupying the lower energy level and can be visualized as a magnetic moment vector aligned with the direction of the main magnetic field. The macroscopic magnetic moment vector is termed magnetization; if it is perturbed from alignment with the main magnetic field, it will precess with a frequency (called the Larmor frequency) of $\omega = \gamma B$, where $B$ is the strength of the magnetic field and $\gamma$ is the gyromagnetic ratio. For hydrogen nuclei in the absence of chemical shift, $\gamma/2\pi = 42.6$ MHz/T. The phenomenon of chemical shift is not an issue in the following discussion of image reconstruction from MRI signal data, but it is important enough to mention briefly. In the human body, the hydrogen atom is always chemically bound to a molecule, principally a water or lipid molecule. The electronic shell configuration of the molecule slightly alters the magnetic field seen by the hydrogen nucleus, and this in turn slightly alters the Larmor frequency of the nucleus.
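As a small worked illustration of the Larmor relation, the following sketch evaluates the precession frequency for the 1.5 T field quoted above, using the value 42.6 MHz/T given in the text. The function names are hypothetical and chosen only for this example.

```python
import math

# gamma / (2 pi) for hydrogen nuclei, in MHz per tesla, as quoted in the text.
GAMMA_BAR_MHZ_PER_T = 42.6


def larmor_frequency_mhz(b_tesla: float) -> float:
    """Precession frequency f = (gamma / 2 pi) * B, in MHz."""
    return GAMMA_BAR_MHZ_PER_T * b_tesla


def larmor_angular_frequency(b_tesla: float) -> float:
    """Angular frequency omega = gamma * B, in rad/s."""
    return 2.0 * math.pi * larmor_frequency_mhz(b_tesla) * 1.0e6


# For a 1.5 T clinical imager the hydrogen signal lies near 63.9 MHz.
print(f"f     = {larmor_frequency_mhz(1.5):.1f} MHz")
print(f"omega = {larmor_angular_frequency(1.5):.3e} rad/s")
```

The receive coil and RF pulses of such an imager would therefore be tuned to roughly 64 MHz; the frequency scales linearly with the main field strength.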
When imaging, it is desirable to work with a signal generated by nuclei precessing at a single Larmor frequency in the main magnetic field. Care is therefore taken in practice to suppress the signal from the unwanted component in a process called magnetization saturation, by exciting the spins with a narrow frequency RF pulse, or by acquiring data so that the difference in Larmor frequencies of the two main components is very small or very large relative to the frequency shifts caused by the subsequent application of the imaging magnetic field gradients. A small difference will make the two frequencies essentially the same; a large difference will allow unwanted frequencies to be filtered out. The RF coils transmit and receive radio energy from the hydrogen nuclei at the resonant Larmor frequency. By decomposing the magnetic field of a transmitted RF pulse into two circularly polarized fields, it can be seen that one of the components will precess at the same frequency as the magnetization in the X-Y plane [Hinshaw and Lent, 1983]. The presence of this transverse,
resonant, rotating magnetic field from the RF pulse causes the magnetization vector to tip from alignment with the main field into the transverse X-Y plane. It is necessary to move the magnetization vector into the transverse plane because only there is it possible for the magnetization to generate a small voltage in the receive RF coil via Faraday’s generator law. By varying the duration and strength of the RF transmit pulse, the tip angle of the magnetization vector from the main magnetic field can be controlled, with a tip angle of 90° producing the maximum observable voltage or signal in the receive coil. Once excited, the generated signal will decay via two mechanisms known as spin-lattice ($T_1$) and spin-spin ($T_2$) relaxation. The relaxation of the macroscopic magnetization vector, $\vec{M}$, can be phenomenologically described by the Bloch equation [Bloch, 1946; Hinshaw and Lent, 1983]:

$$\frac{d\vec{M}}{dt} = \gamma\,\vec{M} \times \vec{B} - \frac{1}{T_2}\left(M_x \vec{I} + M_y \vec{J}\right) - \frac{1}{T_1}\left(M_z - M_0\right)\vec{K} \tag{1}$$

where $\vec{I}$, $\vec{J}$, and $\vec{K}$ are the unit vectors in the X, Y, and Z directions, respectively, $M_x$, $M_y$, and $M_z$ are the components of the magnetization vector in those directions, $M_0$ is the equilibrium length of the magnetization vector in the Z-direction, and $\vec{B}$ is the applied magnetic field, which includes the main field plus any applied RF magnetic field plus any applied gradient field. Both the $T_1$- and $T_2$-terms lead to the exponential decay of the signal-producing X-Y component of the magnetization in the absence of applied RF energy. For the purposes of this article, it may be assumed that signal acquisition is very short relative to the relaxation times. However, in practice the signal-acquisition time can be a significant fraction of either the $T_1$ or $T_2$ relaxation time, especially for the fast-scan signal-acquisition sequences. So, although acquisition that is slow relative to $T_1$ or $T_2$ introduces error, it can frequently be ignored and an average value of the signal can be considered as being responsible for the final reconstructed image. The secret to NMR imaging lies in using a single species of nucleus, at a single Larmor frequency, to generate the signal and using applied gradient magnetic fields to modify that Larmor frequency into a function of spatial position. The application of gradient fields during signal acquisition leads to the concept of acquiring the signal in the spatial frequency domain or k-space, as discussed in detail next. The preceding is an extremely brief overview of the hardware required and its function in making an NMR image. Since it is not essential to the main theme of this article, many details about how a signal is generated using a sequence of RF and gradient pulses have been omitted. The timing of the pulses is very important and the variation of certain times, known
as repeat ($T_R$) and echo ($T_E$) times, leads to different contrasts between the tissues being imaged. Details on pulse sequences and their effect on image contrast may be found elsewhere [Vlaardingerbroek and den Boer, 1996]. Two types of magnetization preparation through the application of the appropriate pulse sequences have been traditionally popular. These sequences are the spin-echo and gradient-echo sequences. The spin-echo sequence produces a signal that is resistant to decay due to main magnetic field inhomogeneity, whereas the gradient-echo sequence is capable of producing a signal sooner after the initial excitation. Other magnetization-preparation sequences that use steady-state effects are also possible, but the main point is that the acquisition of imaging data can be functionally separated from the preparation of the magnetization that precedes it [Stehling, Turner, and Mansfield, 1991]. Figure 1 shows the Z axis along the main magnetic field and the X and Y directions in a transverse plane. In what follows, x and y are used to denote directions on the image slice. Although it is possible to arrange for the image slice to be in the transverse plane, the MRI is a much more flexible tool. By using appropriate combinations of RF and gradient-field application, nuclei along any thin slice through the body’s volume may be excited and reconstructed into an image. Therefore, x and y are used to denote image directions, and it should be remembered that the image slice is not always in the transverse plane of the main field.

B. k-Space
The application of a gradient field causes the Larmor frequency of the nuclei of interest to become a linear function of distance from the plane of unmodified magnetic field. The gradient fields are used to encode spatial information as frequency information in the MRI signal. Application of gradients in the three spatial directions is required to complete the frequency encoding.
To simplify the presentation, it will be assumed that a slice, rather than a volume, of magnetization has been excited. However, the imaging principles described here in two dimensions may be extended without significant complication to three dimensions. To be more specific, MRI data are acquired in the presence of a gradient field after magnetization preparation via the application of RF pulses. This causes spatial frequency to be a function of time. The spatial frequency domain is known as k-space, and the MRI signal may be modeled as [Ljunggren, 1983; Twieg, 1983]:

$$S(t) = \iint_A \rho(\vec{p})\, e^{-2\pi i\, \vec{p} \cdot \vec{K}(t)}\, d\vec{p} \tag{2}$$
where $\rho$ is the transverse magnetization density (the image), $\vec{p} = (x, y)$ represents spatial position, $A$ represents a set containing the support¹ of $\rho$, and $\vec{K}(t)$ represents the k-space coordinate at time $t$. The k-space position is determined by the gradient field configuration, as given explicitly by

$$\vec{K}(t) = \vec{K}_0 + \frac{\gamma}{2\pi} \int_{t_I}^{t} \vec{G}(\tau)\, d\tau \tag{3}$$
The time $t_I$ represents the beginning of signal acquisition, and the time course of the k-space evolution of the signal is determined by the configuration of the gradient fields $\vec{G} = (G_x, G_y)$ during the acquisition.² The starting point, $\vec{K}_0$, is determined by the application of gradient fields before signal acquisition. The first NMR image was produced by acquiring data radially in k-space [Lauterbur, 1973]. In that application $\vec{K}_0 = (0, 0)$ and various combinations of constant (non-time-varying) gradients $G_x$ and $G_y$ were applied to yield k-space trajectories for each signal acquisition, as shown in Figure 2. The radial method of data acquisition proved to be sensitive to distortion by magnetic field inhomogeneities, and the parallel line, or spin warp, method illustrated in Figure 3 soon replaced the radial method.³ Spin warp acquisition is accomplished with one gradient (the read gradient) turned on ($G_x$ for Figure 3) and the other turned off during signal acquisition. As with the radial acquisition, the gradients applied in spin warp imaging are constant (i.e., switched on to a constant level just before signal acquisition and switched off after signal acquisition). The $k_y$ steps between the acquisitions illustrated in Figure 3 are accomplished through the application of the $G_y$ gradient (the phase gradient) before signal acquisition. Applications of the $G_x$ gradient before each acquisition are also required to achieve the required nonzero $\vec{K}_0$.

FIGURE 2. Radially oriented k-space acquisition lines. (Axes: $k_x$ and $k_y$.)

¹ The support of a function is the set of values where the function is not equal to zero.
² Note that in a three-dimensional imaging situation, $\vec{G} = (G_x, G_y, G_z)$.
³ There now exist many excellent algorithms for correcting magnetic field inhomogeneities [Noll et al., 1991, 1992], so radial data acquisition is once more a viable alternative. Many of the time-varying gradient-acquisition methods are also sensitive to magnetic field inhomogeneity and require the same correction algorithms.

FIGURE 3. The k-plane lines associated with spin warp imaging. (Axes: $k_x$ and $k_y$.)

Because enough magnetization must be reestablished along the Z direction through $T_1$ relaxation before another signal-generating RF pulse can act on it, a wait of approximately 0.5 to 3 s is required between signal acquisitions for the radial or spin warp methods. This waiting time, or $T_R$, makes these two methods relatively slow. One way to speed up acquisition is to allow the gradient fields to vary during signal acquisition, thereby allowing more of the k-space to be covered during the acquisition. The first such fast-scan method to be implemented is called echo planar imaging, or EPI [Mansfield, 1977], and involves the rapid switching of the gradient fields to cover k-space in a single or a few acquisitions. The echo planar method is illustrated in Figure 4.

FIGURE 4. The gradient fields and the k-plane trajectory associated with EPI. (Panels: $G_x$ and $G_y$ versus $t$; the trajectory in the $k_x$-$k_y$ plane, with the segments of no signal collection marked.)

Other time-varying gradient methods followed EPI, beginning with spiral imaging [Meyer et al., 1992; Noll, 1995] and, more recently, rosette imaging [Noll, 1997; Noll, Peltier, and Boada, 1998]. In (Archimedean) spiral imaging, the applied gradient fields required are

$$G_x(t) = \frac{2\pi}{\gamma}\left[\frac{A}{T}\cos\!\left(\frac{2\pi\omega t}{T}\right) - \frac{2\pi\omega A t}{T^2}\sin\!\left(\frac{2\pi\omega t}{T}\right)\right] \tag{4}$$

and

$$G_y(t) = \frac{2\pi}{\gamma}\left[\frac{A}{T}\sin\!\left(\frac{2\pi\omega t}{T}\right) + \frac{2\pi\omega A t}{T^2}\cos\!\left(\frac{2\pi\omega t}{T}\right)\right] \tag{5}$$

where $t \in [0, T]$. The corresponding k-space trajectory is given by

$$\left(K_x(t), K_y(t)\right) = \frac{A t}{T}\left(\cos\!\left[\frac{2\pi\omega t}{T}\right], \sin\!\left[\frac{2\pi\omega t}{T}\right]\right) \tag{6}$$
an example of which is shown in Figure 5. For rosette imaging, the applied gradient fields are

$$G_x(t) = -\frac{2\pi^2 A}{\gamma T}\left[(\omega + \omega_1)\sin\!\left(\frac{(\omega + \omega_1)2\pi t}{T}\right) + (\omega - \omega_1)\sin\!\left(\frac{(\omega - \omega_1)2\pi t}{T}\right)\right] \tag{7}$$

$$G_y(t) = \frac{2\pi^2 A}{\gamma T}\left[(\omega + \omega_1)\cos\!\left(\frac{(\omega + \omega_1)2\pi t}{T}\right) - (\omega - \omega_1)\cos\!\left(\frac{(\omega - \omega_1)2\pi t}{T}\right)\right] \tag{8}$$

resulting in the k-space trajectories

$$\left(K_x(t), K_y(t)\right) = A\cos\!\left(\frac{2\pi\omega t}{T}\right)\left(\cos\!\left(\frac{2\pi\omega_1 t}{T}\right), \sin\!\left(\frac{2\pi\omega_1 t}{T}\right)\right) \tag{9}$$

an example of which is shown in Figure 6.

FIGURE 5. The k-plane curve associated with a spiral scan. Here $\omega = 10$ and $A = 20$. (Axes: $k_x$ and $k_y$.)

FIGURE 6. An example of the k-space trajectory followed during a single rosette acquisition.

Many other k-space trajectory variations are also possible. For example, by using the gradient fields

$$G_x(t) = -\frac{2\pi}{\gamma}\,\omega_x A_x \sin[\omega_x t - p_x] \tag{10}$$

and

$$G_y(t) = -\frac{2\pi}{\gamma}\,\omega_y A_y \sin[\omega_y t - p_y] \tag{11}$$

it is possible to generate Lissajous trajectories in the k-plane. Lissajous trajectories are given explicitly by

$$\left(K_x(t), K_y(t)\right) = \left(A_x\cos[\omega_x t - p_x],\; A_y\cos[\omega_y t - p_y]\right) \tag{12}$$

where $\omega_x/\omega_y$ must be a rational number. The period, $\tau$, of the Lissajous figure will be

$$\tau = \frac{2\pi N}{|\omega_x|} = \frac{2\pi M}{|\omega_y|} \tag{13}$$

where $N, M \in \mathbb{N}$ are relatively prime and $|\omega_x|/|\omega_y| = N/M$. An example is plotted in Figure 7.

FIGURE 7. A Lissajous k-plane trajectory. Here $A_x = A_y = 20$, $\omega_x = 15$, $\omega_y = 16$, $p_x = \pi/2$, $p_y = 0$, $t \in [t_I, t_F]$, where $t_I = -\pi/2$ and $t_F = \pi/2$. (Axes: $k_x$ and $k_y$.)
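The link between gradient waveforms and k-space trajectories in Equation (3) can be checked numerically. The sketch below integrates the spiral, rosette, and Lissajous gradient waveforms with a trapezoid rule and compares the result with the closed-form trajectories of Equations (6), (9), and (12). It assumes the convention $\vec{K}(t) = \vec{K}_0 + (\gamma/2\pi)\int \vec{G}\,d\tau$; the function names, the acquisition time `T`, and the rosette parameters are hypothetical choices for illustration, while the spiral and Lissajous parameters are those of Figures 5 and 7.

```python
import numpy as np

GAMMA = 2.0 * np.pi * 42.6e6  # rad/(s T) for hydrogen; it cancels out of K(t)


def k_from_gradient(g, t, k0):
    """Eq. (3) by a cumulative trapezoid rule: K(t) = K0 + (gamma/2 pi) int G."""
    steps = 0.5 * (g[1:] + g[:-1]) * np.diff(t)
    return k0 + GAMMA / (2.0 * np.pi) * np.concatenate(([0.0], np.cumsum(steps)))


def spiral(t, A, omega, T):
    """Gradients of Eqs. (4)-(5) and trajectory of Eq. (6)."""
    phi = 2.0 * np.pi * omega * t / T
    c = 2.0 * np.pi / GAMMA
    ramp = 2.0 * np.pi * omega * A * t / T**2
    gx = c * (A / T * np.cos(phi) - ramp * np.sin(phi))
    gy = c * (A / T * np.sin(phi) + ramp * np.cos(phi))
    return (gx, gy), (A * t / T * np.cos(phi), A * t / T * np.sin(phi))


def rosette(t, A, omega, omega1, T):
    """Gradients of Eqs. (7)-(8) and trajectory of Eq. (9)."""
    p, m = omega + omega1, omega - omega1
    s = 2.0 * np.pi * t / T
    c = 2.0 * np.pi**2 * A / (GAMMA * T)
    gx = -c * (p * np.sin(p * s) + m * np.sin(m * s))
    gy = c * (p * np.cos(p * s) - m * np.cos(m * s))
    kx = A * np.cos(omega * s) * np.cos(omega1 * s)
    ky = A * np.cos(omega * s) * np.sin(omega1 * s)
    return (gx, gy), (kx, ky)


def lissajous(t, Ax, Ay, wx, wy, px, py):
    """Gradients of Eqs. (10)-(11) and trajectory of Eq. (12)."""
    c = 2.0 * np.pi / GAMMA
    gx = -c * wx * Ax * np.sin(wx * t - px)
    gy = -c * wy * Ay * np.sin(wy * t - py)
    return (gx, gy), (Ax * np.cos(wx * t - px), Ay * np.cos(wy * t - py))


def max_error(waveforms, t):
    """Largest gap between the integrated gradients and the closed form."""
    grads, ks = waveforms
    return max(
        np.max(np.abs(k_from_gradient(g, t, k[0]) - k)) for g, k in zip(grads, ks)
    )


T = 0.02  # hypothetical acquisition time in seconds
t = np.linspace(0.0, T, 20001)
t_liss = np.linspace(-np.pi / 2, np.pi / 2, 20001)

errors = {
    "spiral": max_error(spiral(t, A=20.0, omega=10.0, T=T), t),
    "rosette": max_error(rosette(t, A=20.0, omega=10.0, omega1=3.0, T=T), t),
    "lissajous": max_error(
        lissajous(t_liss, 20.0, 20.0, 15.0, 16.0, np.pi / 2, 0.0), t_liss
    ),
}
for name, err in errors.items():
    print(f"{name}: max |K_integrated - K_closed_form| = {err:.2e}")
```

With the trajectory amplitudes set to 20, the reported discrepancies should be small compared with that amplitude and should shrink further as the time grid is refined, which is the behavior expected of a trapezoid-rule integration.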
C. Gradient Limitations
Although prescribing particular gradient fields leads to particular k-plane trajectories via Equation (3), the set of physically accessible gradient functions $G$ is restricted by the gradient coil design. Two opposing factors must be balanced in the design of the gradient coils: the size of the region of linearity where $G$ is linearly proportional to the applied current ($G = CI$, where $I$ is the applied current) and the coil self-inductance $L$ [Vlaardingerbroek and den Boer, 1996]. Ideally, the individual coil inductance should be as small as possible, because the applied voltage $V$ is governed by

$$V = L\frac{dI}{dt} + IR \tag{14}$$
where R is the relatively small resistance of the gradient coil. Once the coil design and L are fixed, the limitations are set by the electrical power that the current amplifiers driving the coils are able to generate. Manufacturers specify the maximum gradient amplitude, G, and slew rate, dG/dt, that their amplifier/coil system can attain. These two specifications limit the set of physically attainable gradient functions and limit the speed at which NMR imaging data may be acquired.
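The maximum gradient strength required for an Archimedean spiral scan occurs at the end of the scan, at $t = T$, and is given by

$$|G|_{\max} = \frac{4\pi^2 A \omega}{\gamma T} \tag{15}$$

The corresponding maximum slew rate is given by

$$\left|\frac{dG}{dt}\right|_{\max} = \frac{8\pi^3 \omega^2 A}{\gamma T^2} \tag{16}$$

The maximum gradient strength required for a rosette scan is given by

$$|G|_{\max} = \frac{4\pi^2 A \omega}{\gamma T} \tag{17}$$

The corresponding maximum slew rate is given by

$$\left|\frac{dG}{dt}\right|_{\max} = \frac{8\pi^3 \left(\omega^2 + \omega_1^2\right) A}{\gamma T^2} \tag{18}$$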
Note that continuous EPI data acquisition is impossible because the ideal square-wave gradient waveforms would require an infinite slew rate. EPI works by either excluding data acquired while the gradients are between steps or by sampling at a rate and with timing that avoids sampling while the gradient currents are changing.
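The amplitude and slew-rate expressions in Equations (15) through (18) can be compared against sampled gradient waveforms, which is a convenient way to check a proposed trajectory against a manufacturer's limits. The following is a minimal sketch under stated assumptions: the k-space radius `A`, duration `T`, and winding frequencies are hypothetical values, the rosette is assumed to have $\omega \ge \omega_1 \ge 0$, and the slew rate is estimated by finite differences. For the spiral, the closed forms agree with the sampled maxima only to leading order in $\omega$, so a small relative difference is expected.

```python
import numpy as np

GAMMA = 2.0 * np.pi * 42.6e6  # rad/(s T) for hydrogen


def spiral_gradient(t, A, omega, T):
    """Spiral gradient waveforms, Eqs. (4)-(5)."""
    phi = 2.0 * np.pi * omega * t / T
    c = 2.0 * np.pi / GAMMA
    ramp = 2.0 * np.pi * omega * A * t / T**2
    gx = c * (A / T * np.cos(phi) - ramp * np.sin(phi))
    gy = c * (A / T * np.sin(phi) + ramp * np.cos(phi))
    return gx, gy


def rosette_gradient(t, A, omega, omega1, T):
    """Rosette gradient waveforms, Eqs. (7)-(8)."""
    p, m = omega + omega1, omega - omega1
    s = 2.0 * np.pi * t / T
    c = 2.0 * np.pi**2 * A / (GAMMA * T)
    gx = -c * (p * np.sin(p * s) + m * np.sin(m * s))
    gy = c * (p * np.cos(p * s) - m * np.cos(m * s))
    return gx, gy


def sampled_limits(gx, gy, dt):
    """Largest gradient amplitude and slew rate found on the sampled waveform."""
    slew_x = np.gradient(gx, dt, edge_order=2)
    slew_y = np.gradient(gy, dt, edge_order=2)
    return np.hypot(gx, gy).max(), np.hypot(slew_x, slew_y).max()


def spiral_limits(A, omega, T):
    """Closed forms of Eqs. (15)-(16)."""
    g_max = 4.0 * np.pi**2 * A * omega / (GAMMA * T)
    slew_max = 8.0 * np.pi**3 * omega**2 * A / (GAMMA * T**2)
    return g_max, slew_max


def rosette_limits(A, omega, omega1, T):
    """Closed forms of Eqs. (17)-(18)."""
    g_max = 4.0 * np.pi**2 * A * omega / (GAMMA * T)
    slew_max = 8.0 * np.pi**3 * (omega**2 + omega1**2) * A / (GAMMA * T**2)
    return g_max, slew_max


A, T = 500.0, 0.02         # hypothetical: k-space radius (cycles/m), duration (s)
omega, omega1 = 10.0, 3.0  # hypothetical winding frequencies
t = np.linspace(0.0, T, 20001)
dt = t[1] - t[0]

spiral_num = sampled_limits(*spiral_gradient(t, A, omega, T), dt)
rosette_num = sampled_limits(*rosette_gradient(t, A, omega, omega1, T), dt)

for name, num, ref in [
    ("spiral", spiral_num, spiral_limits(A, omega, T)),
    ("rosette", rosette_num, rosette_limits(A, omega, omega1, T)),
]:
    print(f"{name}: |G|max sampled {num[0]:.4e} T/m, closed form {ref[0]:.4e} T/m")
    print(f"{name}: slew  sampled {num[1]:.4e} T/m/s, closed form {ref[1]:.4e} T/m/s")
```

In practice the two closed-form numbers would be compared with the amplifier and coil specifications; if either exceeds its limit, the scan must be lengthened (larger `T`) or the k-space coverage reduced.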
II. GRIDDING RECONSTRUCTION AND DIRECT RECONSTRUCTION
The idea behind the collection of NMR imaging data in k-space is to collect enough samples near the origin in a region $\Omega$ so that a numerical or discrete Fourier transformation may be used to reconstruct the image. An exact reconstruction would require knowledge of the complete integral Fourier transform of the desired image $\rho$. That is, we would need to know $\hat{\rho}(k_x, k_y)$ for every point $\vec{k} = (k_x, k_y)$, where the integral Fourier transform is defined by

$$\hat{\rho}(\vec{k}) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \rho(\vec{p})\, e^{-2\pi i\, \vec{k} \cdot \vec{p}}\, d\vec{p} \tag{19}$$

If $\hat{\rho}$ were completely known, then $\rho$ could be recovered through the inverse integral Fourier transform:

$$\rho(\vec{p}) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \hat{\rho}(\vec{k})\, e^{2\pi i\, \vec{k} \cdot \vec{p}}\, d\vec{k} \tag{20}$$

Two practical limitations prevent the acquisition of all the values of $\hat{\rho}(\vec{k})$: (1) Only a finite radius in k-space may be surveyed, and (2) with modern digital technology, k-space is sampled, i.e., only a finite number of values of $\hat{\rho}(\vec{k})$ can be measured. The second limitation could theoretically be avoided through the use of analog technology, but noise issues currently appear to give digital technology an edge. Sampling issues are discussed in this section and then set aside for a couple of sections while the mathematical consequences of the first limitation are explored. Perhaps a future analog technology will give the curve band-pass operators (introduced in Section IV) a direct application, but for the present those operators may be used to shed light on the mathematical issues behind the reconstruction of arbitrarily sampled k-space data. For conventional spin warp and EPI acquisition, the region $\Omega$ is a square, and samples of $\hat{\rho}(\vec{k})$ are acquired on a Cartesian grid within $\Omega$. If all the values of $\hat{\rho}(\vec{k})$ for $\vec{k} \in \Omega$ were known, then a band-limited version of $\rho$, $\rho_B$, could be reconstructed:

$$\rho_B(\vec{p}) = \iint_{\Omega} \hat{\rho}(\vec{k})\, e^{2\pi i\, \vec{k} \cdot \vec{p}}\, d\vec{k} \tag{21}$$
The samples may be used to compute a Riemann sum approximation, P, to Equation (21): E
O kEpq e2iEpÐkpq 1kx 1ky
Pp E D p
22
q
which is a discrete Fourier transform of the given data. The properties of this approximation are well understood through the Nyquist sampling theorem [Higgins, 1996]. The important conclusion from the Nyquist theorem relevant to MRI reconstruction is that the sample spacing, $\Delta k$, must satisfy the Nyquist criterion

$$\Delta k \le \frac{1}{2A} \qquad (23)$$

where $A$ is the smallest radius of a circle that encloses the support of $\rho$. Another advantage of Equation (22) is that, for a specific grid of points $\{\vec p_{rs}\}$, the sum may be computed with the computationally efficient fast Fourier transform (FFT) [Brigham, 1974]. The grid of points $\{\vec p_{rs}\}$ defines the centers
GORDON E. SARTY
of the pixels⁴ of the reconstructed image. For a typical 256 × 256 MRI image, the reconstruction time may, depending on the computer used, be reduced from many minutes or even hours to less than a second by using an FFT over the direct computation of the sums and products in Equation (22).

In fact, the FFT is so efficient and intuitive to many that when data are collected on non-Cartesian grids, such as those that arise from spiral or rosette acquisition, the first step taken is an interpolation of the data from the acquisition grid to a Cartesian grid. Once the data have been interpolated to the Cartesian grid, an FFT is applied to effect the reconstruction. But the interpolation of data using standard interpolation approaches, such as polynomial interpolation, leads either to excessive computational time for the interpolation or to a poor reconstruction. Therefore a more sophisticated interpolation approach, known as gridding, has been developed [Jackson et al., 1991; O’Sullivan, 1985; Schomberg and Timmer, 1995] for fast and accurate interpolation in k-space.
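The equivalence between the Riemann sum of Equation (22) and an FFT on a Cartesian grid can be checked numerically. The following is a minimal sketch, assuming an even grid size N, k-space samples at integer multiples of a spacing `dk`, and pixel centres at integer multiples of 1/(N dk); the function names and the random test data are illustrative only.

```python
import numpy as np

def direct_sum_2d(khat, dk):
    """Evaluate the double sum of Equation (22) term by term."""
    n = khat.shape[0]
    idx = np.arange(-n // 2, n // 2)               # centred integer indices (n even)
    k = idx * dk                                   # k-space sample positions
    x = idx / (n * dk)                             # pixel centres
    phase = np.exp(2j * np.pi * np.outer(x, k))    # phase[r, a] = exp(2 pi i x_r k_a)
    # image[r, s] = sum_{a, b} phase[r, a] * khat[a, b] * phase[s, b] * dk^2
    return phase @ khat @ phase.T * dk * dk

def fft_sum_2d(khat, dk):
    """Same sum computed with an inverse FFT and the matching index shifts."""
    n = khat.shape[0]
    img = np.fft.ifft2(np.fft.ifftshift(khat)) * n * n   # undo NumPy's 1/N^2 factor
    return np.fft.fftshift(img) * dk * dk

rng = np.random.default_rng(0)
khat = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
slow = direct_sum_2d(khat, dk=0.5)
fast = fft_sum_2d(khat, dk=0.5)
```

The two results agree to rounding error; only the operation count differs, which is what makes the FFT route attractive for 256 × 256 images.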
A. Gridding Reconstruction

Gridding begins with the convolution of the data with a function $\psi$ of small support. Evaluating the result of the convolution on a Cartesian grid gives data that can be fast Fourier transformed. After the data have been discrete Fourier transformed, division by the inverse integral Fourier transform of $\psi$, $\check\psi$, gives the reconstructed image. More precisely, to effect a gridding reconstruction, begin with the generalized function associated with the data samples:

$$S(k_x, k_y) = \hat\rho(k_x, k_y) \sum_p \sum_q \delta\big(k_x - K_x(p,q),\, k_y - K_y(p,q)\big) \qquad (24)$$

where $k_x$ and $k_y$ represent any point in k-space and $K_x(p,q)$ and $K_y(p,q)$ represent the sample point k-space coordinates. The index $q$ corresponds to data points from a single acquisition, whereas $p$ will index an “interleaf” parameter. Note that $S(K_x(p,q), K_y(p,q)) \equiv S(p,q) = \hat\rho(K_x(p,q), K_y(p,q))$. Convolve the generalized function with $\psi$,

$$S_c(k_x, k_y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} S(\xi, \eta)\, \psi(k_x - \xi,\, k_y - \eta)\, d\xi\, d\eta \qquad (25)$$

and evaluate on a Cartesian grid:

$$S_c(n\Delta k_x, m\Delta k_y) = \sum_p \sum_q S(p,q)\, \psi\big(n\Delta k_x - K_x(p,q),\, m\Delta k_y - K_y(p,q)\big) \qquad (26)$$

⁴ Pixels are “picture elements,” the tiny squares that make up a computer image.
The small support of $\psi$ ensures that the time required to compute the sums and products in Equation (26) will not be excessive. The selection of $\psi$ needs due consideration; the reader is referred to other literature [Jackson et al., 1991; O’Sullivan, 1985; Schomberg and Timmer, 1995] for a discussion of the relevant issues. As will be seen in later sections, there is a calculable point-spread function (PSF) associated with gridding reconstruction, and it was realized in radio astronomy applications that weighting the data could “taper” the PSF [Bracewell and Thompson, 1973; Brouw, 1975]. The taper function $f$ added to Equation (26) gives

$$S_{cw}(n\Delta k_x, m\Delta k_y) = \sum_p \sum_q S(p,q)\, f(p,q)\, \psi\big(n\Delta k_x - K_x(p,q),\, m\Delta k_y - K_y(p,q)\big) \qquad (27)$$

The taper function is usually taken to be the inverse of the k-space sampling density; in the next section a more natural way of specifying the taper function is given. The reconstructed image is finally computed via

$$R_S(x, y) = \mathrm{IDFT}(S_{cw})(x, y)\, /\, \check\psi(x, y) \qquad (28)$$
where the inverse discrete Fourier transform (IDFT) is computed via the FFT.

B. Direct Reconstruction

A direct, or weighted correlation [Maeda, Sano, and Yokoyama, 1988], reconstruction of the original data, weighted by the taper function $f$, can be accomplished with

$$R_S(x, y) = \sum_p \sum_q S(p,q)\, e^{2\pi i\,(x K_x(p,q) + y K_y(p,q))}\, f(p,q) \qquad (29)$$
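Equation (29) can be evaluated literally as a sum over samples. The sketch below is an illustration under stated assumptions, not the author's implementation: the spiral trajectory, the point-source signal, the taper proportional to acquisition time, and all variable names are hypothetical choices made so that the result is easy to check.

```python
import numpy as np

def direct_reconstruction(samples, kx, ky, taper, x, y):
    """R_S(x, y) = sum_j S_j f_j exp(2 pi i (x Kx_j + y Ky_j)), as in Equation (29)."""
    X, Y = np.meshgrid(x, y, indexing="ij")
    image = np.zeros(X.shape, dtype=complex)
    for s, a, b, f in zip(samples, kx, ky, taper):
        image += s * f * np.exp(2j * np.pi * (X * a + Y * b))
    return image

# Hypothetical single Archimedean spiral reaching radius A after time T.
A, T, omega = 8.0, 1.0, 6.0
t = np.linspace(0.0, T, 400)[1:]                 # drop the singular origin
kx = A * t / T * np.cos(2 * np.pi * omega * t / T)
ky = A * t / T * np.sin(2 * np.pi * omega * t / T)
taper = t / t.sum()                              # proportional to t, normalised to sum to 1

# Ideal signal of a unit point source at (x0, y0): its Fourier transform on the curve.
x0, y0 = 0.1, -0.2
samples = np.exp(-2j * np.pi * (kx * x0 + ky * y0))

x = np.linspace(-0.5, 0.5, 11)
y = np.linspace(-0.5, 0.5, 11)
image = direct_reconstruction(samples, kx, ky, taper, x, y)
```

At the source position all phases cancel, so the magnitude there equals the sum of the taper weights (1 here), and nowhere else can the magnitude exceed it. The cost is one complex exponential per sample per pixel, which is the drawback discussed next.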
The drawback to Equation (29) is that there has been no fast algorithm, such as the FFT, immediately available for computing the large number of sums and products. However, methods based on fast Chebyshev polynomial interpolation [Dutt, Gu, and Rokhlin, 1993] do exist for computing nonuniform discrete Fourier transforms [Dutt and Rokhlin, 1993, 1995] and have been applied to the problem of reconstructing spiral MRI data [Tong and Cox, 1998].

C. Using Gridding to Provide an Approximation of Direct Reconstruction

The direct reconstruction of Equation (29) involves summing data along a non-Cartesian trajectory in k-space. The gridding reconstruction approach involves convolving non-Cartesian samples with a function $\psi$, sampling the result on a Cartesian grid, and summing with an IDFT. Even if the data were weighted by the same taper function, why should the two approaches be approximately equivalent? The answer is simple. Gridding can alternatively be viewed as convolving a Cartesian-sampled version of the exponential function with a function $\psi$, sampling the result on the data-sampling grid, and summing along the given trajectory. This assertion is now shown in detail. Writing out $\mathrm{IDFT}(S_{cw})(x, y)$ from Equation (28) explicitly and changing the order of summation results in

$$\mathrm{IDFT}(S_{cw})(x, y) = \sum_p \sum_q S(p,q)\, f(p,q) \left[ \sum_n \sum_m \psi\big(n\Delta k_x - K_x(p,q),\, m\Delta k_y - K_y(p,q)\big)\, e^{2\pi i\,(x n\Delta k_x + y m\Delta k_y)} \right] \qquad (30)$$
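The term within the square brackets of Equation (30) is the convolution of the exponential function sampled on a Cartesian grid with $\psi$, evaluated at the data sample grid points $t_q$ and $\beta_p$. Denoting that term by $E(p,q)$, it explicitly is

$$E(p,q) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[ \sum_n \sum_m \delta(k_x - n\Delta k_x,\, k_y - m\Delta k_y)\, e^{2\pi i\,(x k_x + y k_y)} \right] \psi\big(k_x - K_x(p,q),\, k_y - K_y(p,q)\big)\, dk_x\, dk_y \qquad (31)$$

and it is claimed that

$$E(p,q)\, /\, \check\psi(x, y) \cong e^{2\pi i\,(x K_x(p,q) + y K_y(p,q))} \qquad (32)$$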
Substituting the right-hand side of Equation (32) into Equation (28) gives Equation (29). To see that Equation (32) is true, recall that the discrete Fourier transform is used as an approximation for the integral Fourier transformation. Therefore,

$$E(p,q) \cong \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \psi\big(k_x - K_x(p,q),\, k_y - K_y(p,q)\big)\, e^{2\pi i\,(x k_x + y k_y)}\, dk_x\, dk_y = \check\psi(x, y)\, e^{2\pi i\,(x K_x(p,q) + y K_y(p,q))} \qquad (33)$$
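The approximate equivalence of Equations (28) and (29) can be illustrated numerically. The following one-dimensional sketch rests on several assumptions that are not in the text: a Gaussian convolution function, a grid oversampled by a factor of two relative to the retained field of view, randomly placed samples, and a taper equal to the local sample spacing. The factor `dk` makes explicit the grid spacing that turns the sum over grid points into an approximation of the integral in Equation (33).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical nonuniform sample locations and the signal of a unit point source at x0.
kmax, x0 = 8.0, 0.1
K = np.sort(rng.uniform(-kmax, kmax, 200))
S = np.exp(-2j * np.pi * K * x0)
f = np.gradient(K)                      # taper: local spacing, i.e. inverse sample density

# Cartesian grid (spacing dk) and an assumed Gaussian convolution function.
dk, n, sigma = 0.5, 64, 0.5
idx = np.arange(-n // 2, n // 2)
k_grid = idx * dk

def psi(k):
    return np.exp(-k**2 / (2 * sigma**2))

def psi_check(x):
    """Inverse integral Fourier transform of the Gaussian psi."""
    return sigma * np.sqrt(2 * np.pi) * np.exp(-2 * np.pi**2 * sigma**2 * x**2)

# Equation (27): convolve the tapered samples onto the Cartesian grid.
S_cw = (psi(k_grid[:, None] - K[None, :]) * (S * f)[None, :]).sum(axis=1)

# Equation (28): IDFT via the FFT, keep the central field of view, divide by psi_check.
x = idx / (n * dk)
idft = np.fft.fftshift(np.fft.ifft(np.fft.ifftshift(S_cw))) * n
keep = np.abs(x) <= 0.5
gridded = dk * idft[keep] / psi_check(x[keep])

# Equation (29): direct summation at the same positions.
direct = (np.exp(2j * np.pi * np.outer(x[keep], K)) * (S * f)[None, :]).sum(axis=1)

rel_err = np.max(np.abs(gridded - direct)) / np.max(np.abs(direct))
print(f"maximum relative difference: {rel_err:.2e}")
```

The residual difference comes from aliasing of the transformed convolution function, so it shrinks as the grid is oversampled further or the kernel is widened; this is the trade-off referred to in the gridding literature cited above.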
The selection of an optimal taper function is an interesting problem. As mentioned before, f is often set as the inverse of the k-space sample density [Maeda, Sano, and Yokoyama, 1988]. For non-Archimedean spirals, the Jacobian determinant of a transformation from spiral to Cartesian coordinates has been used as the taper function [Hoge, Kwan, and Pike, 1997]. As shown shortly, natural local coordinate systems may be associated with many k-space trajectories, and the corresponding Jacobian determinant presents itself as a natural taper function. It is noted that other reconstruction methods, in particular a singular-value decomposition method [Rosenfeld, 1998], lead to taper, or density, functions that are quite complicated and that can take negative values. The study of such reconstruction algorithms is beyond the scope of this article.
III. NATURAL k-PLANE COORDINATE SYSTEMS

Equation (21) can be written in a nice way for MRI data gathered on arbitrary k-space trajectories. Associated with a given trajectory or curve on the k-plane is a local coordinate system about the curve, provided the curve doesn’t do funny things such as cross itself. Even then, a curve with a finite number of crossing points has local coordinate systems on pieces of the curve. Because a given k-plane curve is parameterized naturally by the acquisition time, $t$, it may be used as a first coordinate. A second coordinate, $\beta$, can be had in an arbitrary manner transverse to the given curve, but there are two natural choices for the definition of that coordinate. If the k-plane curve admits an interleaf parameter, then it can be used as the second coordinate. Spirals, for example, may be rotated about the origin of the k-plane to cover a disk in k-space, so polar angle presents itself as a natural second coordinate. Dilation is another possible way to interleave data acquisitions, with circles in k-space, for example. The second natural choice for a second coordinate is to specify it to be in a direction orthogonal to the given curve at a rate equal to the underlying Cartesian $k_x$ and $k_y$ coordinate rates.
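A. Normal-Segment Coordinates

The second type of natural $\beta$ coordinate will lead to a coordinate system called the normal-segment coordinate system, because the $\beta$ coordinates consist of short line segments at right angles to the given k-plane curve. Normal-segment coordinates are constructed explicitly as follows. Let $\vec\varphi$ denote the required coordinate system. Let

$$\vec A(t) = \left.\left(\frac{\partial\varphi_x}{\partial t}, \frac{\partial\varphi_y}{\partial t}\right)\right|_{\beta=0} = \left(\frac{dK_x}{dt}, \frac{dK_y}{dt}\right)$$

be the tangent vector to the k-plane curve $\vec K = (K_x, K_y)$ and

$$\vec B(t) = \left.\left(\frac{\partial\varphi_x}{\partial\beta}, \frac{\partial\varphi_y}{\partial\beta}\right)\right|_{\beta=0}$$

It is required to construct $\vec\varphi$ such that $\vec A\cdot\vec B = 0$. Also, $\beta \in [-W, W]$ should define a strip of uniform width $2W$ (relative to the $k_x$ and $k_y$ units) around the k-plane curve. The easiest way to do this is to consider the line perpendicular to $(d\vec K/dt)(t)$ at the point $\vec K(t)$ given by

$$\beta \mapsto \frac{\beta}{v(t)}\left(-\frac{dK_y}{dt}(t),\ \frac{dK_x}{dt}(t)\right) + \big(K_x(t), K_y(t)\big)$$

where $v(t)$ is as given by

$$v(t) = \sqrt{\left(\frac{dK_x}{dt}(t)\right)^2 + \left(\frac{dK_y}{dt}(t)\right)^2} \qquad (34)$$

the speed of parameterization in $t$. That is, choose

$$\varphi_x(t, \beta) = K_x(t) - \frac{\beta}{v(t)}\frac{dK_y}{dt}(t) \qquad (35)$$

$$\varphi_y(t, \beta) = K_y(t) + \frac{\beta}{v(t)}\frac{dK_x}{dt}(t) \qquad (36)$$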
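Equations (34) to (36) translate directly into a few lines of code. In this minimal sketch the trajectory is passed as two callables; the sinusoidal curve $(t, \sin t)$ and the half-width `W` are hypothetical test inputs, not values taken from the text.

```python
import numpy as np

def normal_segment_coords(K, dK, t, beta):
    """Return (phi_x, phi_y) of Equations (35)-(36).

    K(t)  -> (K_x, K_y), the k-plane curve
    dK(t) -> (dK_x/dt, dK_y/dt), its tangent vector
    """
    kx, ky = K(t)
    dkx, dky = dK(t)
    v = np.hypot(dkx, dky)                 # speed of parameterization, Equation (34)
    return kx - beta * dky / v, ky + beta * dkx / v

# Hypothetical sinusoidal trajectory, used only to exercise the function.
def K(t):
    return t, np.sin(t)

def dK(t):
    return np.ones_like(t), np.cos(t)

t = np.linspace(0.0, 2 * np.pi, 50)
W = 0.2
phi_x, phi_y = normal_segment_coords(K, dK, t, W)

# Offset from the curve: it should be orthogonal to the tangent and of length W.
off_x, off_y = phi_x - t, phi_y - np.sin(t)
tangent_dot = off_x * 1.0 + off_y * np.cos(t)
offset_len = np.hypot(off_x, off_y)
```

Dividing by `v` is what keeps the strip width uniform: without it the offset would scale with the local speed of the trajectory.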
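B. Direct Reconstruction via Natural k-Plane Coordinates

To apply the natural k-plane coordinates to the reconstruction problem, write Equation (21) explicitly using Cartesian k-plane coordinates:

$$B_\Omega\rho(x, y) = \int_{k_{y1}}^{k_{y2}}\int_{k_{x1}}^{k_{x2}} \hat\rho(k_x, k_y)\, e^{2\pi i\,(x k_x + y k_y)}\, dk_x\, dk_y \qquad (37)$$

Rewriting this integral with respect to the natural k-plane coordinates $\vec\varphi : (t, \beta) \to (k_x, k_y)$ gives

$$B_\Omega\rho(x, y) = \int_{W_1}^{W_2}\int_{t_I}^{t_F} \hat\rho(t, \beta)\, e^{2\pi i\,(x\varphi_x(t,\beta) + y\varphi_y(t,\beta))}\, |J(t, \beta)|\, dt\, d\beta \qquad (38)$$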
where $|J|$ is the determinant of the Jacobian of the transformation $\vec\varphi$. A Riemann sum approximation of Equation (38) will give a direct reconstruction formula similar to Equation (22). Explicitly, let $q$ index the samples obtained along a given k-plane trajectory and let $p$ index the interleaves so that⁵ $\vec k_{pq} = \vec\varphi(t_q, \beta_p)$ and the samples are given by $S(p, q) = \hat\rho(t_q, \beta_p)$. Then the direct reconstruction is given by

$$P\rho(\vec p) = \sum_p \sum_q \hat\rho(\vec k_{pq})\, e^{2\pi i\,\vec p\cdot\vec k_{pq}}\, |J(t_q, \beta_p)|\, \Delta t\, \Delta\beta \qquad (39)$$
Comparing Equation (39) to Equation (29) shows that natural k-plane coordinates suggest that $f(p, q) = |J(t_q, \beta_p)|\,\Delta t\,\Delta\beta$ be used as a taper function in direct or gridding reconstruction. The fact that the Riemann sum approximation to the band-limiting integral, Equation (21), is numerically very similar to the result of gridding reconstruction makes the study of band-limiting operators directly relevant to understanding the properties of gridding reconstruction. Before looking at the band-limiting operators, some explicit examples of natural k-plane coordinates are presented.

C. Examples of Natural k-Plane Coordinates

Coordinates based on interleaved Archimedean spirals on the k-plane may be written as

$$\varphi_x(t, \beta) = A(t/T)\cos[2\pi\omega t/T + \beta] \qquad (40)$$

$$\varphi_y(t, \beta) = A(t/T)\sin[2\pi\omega t/T + \beta] \qquad (41)$$

where $t \in (0, T]$ and $\beta \in [-\pi, \pi)$. The origin would be a singular point of the coordinate system and so must be excluded. After a time $T$ the spirals will be sampling at a radius $A$ on the k-plane and the region $\Omega$ of the corresponding band-limiting operator will be a disk. The Jacobian determinant of the Archimedean spiral coordinate system is given by $|J(t, \beta)| = A^2 t/T^2$, which, because of symmetry, is independent of $\beta$. For the case of a more general non-Archimedean spiral $\vec K(t)$, whose radial component $|\vec K(t)|$ increases monotonically, it can be shown that the Jacobian determinant associated with its natural coordinates is given by $|J(t, \beta)| = \vec K(t)\cdot\vec K'(t)$ [Hoge, Kwan, and Pike, 1997]. Note that the derivative $\vec K'$ is proportional to $\vec G$ by Equation (3).

⁵ The acquisition trajectories are related explicitly to the natural k-plane coordinates by $\vec K(t, \beta) = \vec\varphi(t, \beta)$. Note that when there is no interleaf parameter, such as for sinusoidal trajectories, the given curve corresponds to $\beta = 0$.
Rosette scans may be interleaved by rotation as well. With rotation as the second natural k-plane coordinate, many redundant coordinate systems valid on a disk about the origin of k-space (without the origin and the outer edge of the disk) are generated. The domains for the $t$ coordinates in each coordinate system are obtained by restricting $t$ so that its image is one outward-going or one inward-going side of a leaf. That way, the $t$-coordinate represents a “curved radius” on the k-plane. On valid domains for $t$, the rosette coordinates are given by

$$\varphi_x(t, \beta) = A\cos(2\pi\omega t/T)\cos(\beta + 2\pi\omega_1 t/T) \qquad (42)$$

$$\varphi_y(t, \beta) = A\cos(2\pi\omega t/T)\sin(\beta + 2\pi\omega_1 t/T) \qquad (43)$$
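The associated Jacobian determinant when $T = 2\pi$ is given by $|J(t, \beta)| = |A^2\omega\cos(\omega t)\sin(\omega t)|$. It is independent of both $\beta$ and $\omega_1$ due to symmetries.

The Jacobian determinant for the normal-segment coordinate system is given by

$$|J_{\vec\varphi}(t, \beta)| = \left|\frac{\partial\varphi_x}{\partial t}\frac{\partial\varphi_y}{\partial\beta} - \frac{\partial\varphi_x}{\partial\beta}\frac{\partial\varphi_y}{\partial t}\right| = \left|\frac{1}{v(t)}\left[\left(\frac{dK_x}{dt}(t)\right)^2 + \left(\frac{dK_y}{dt}(t)\right)^2\right] - \frac{\beta}{v^2(t)}\left[\frac{dK_x}{dt}(t)\frac{d^2K_y}{dt^2}(t) - \frac{dK_y}{dt}(t)\frac{d^2K_x}{dt^2}(t)\right]\right| \qquad (44)$$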
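The closed-form Jacobian determinants quoted for the spiral and rosette coordinates can be cross-checked with finite differences. This is a sketch under assumed parameter values (`A`, `T`, `omega`, `omega1` are arbitrary). For the spiral of Equations (40) and (41), differentiating in polar form gives $A^2 t/T^2$, which reduces to $A^2 t$ in units where $T = 1$; the rosette check uses $T = 2\pi$ as in the text.

```python
import numpy as np

def jacobian_det(phi, t, beta, h=1e-6):
    """Central-difference estimate of |d(phi_x, phi_y)/d(t, beta)|."""
    xt1, yt1 = phi(t + h, beta)
    xt0, yt0 = phi(t - h, beta)
    xb1, yb1 = phi(t, beta + h)
    xb0, yb0 = phi(t, beta - h)
    dxdt, dydt = (xt1 - xt0) / (2 * h), (yt1 - yt0) / (2 * h)
    dxdb, dydb = (xb1 - xb0) / (2 * h), (yb1 - yb0) / (2 * h)
    return np.abs(dxdt * dydb - dxdb * dydt)

A, T, omega, omega1 = 3.0, 2.0, 5.0, 2.0       # arbitrary illustrative values

def spiral(t, beta):                            # Equations (40)-(41)
    r = A * t / T
    ang = 2 * np.pi * omega * t / T + beta
    return r * np.cos(ang), r * np.sin(ang)

def rosette(t, beta):                           # Equations (42)-(43) with T = 2*pi
    r = A * np.cos(omega * t)
    ang = beta + omega1 * t
    return r * np.cos(ang), r * np.sin(ang)

t = np.linspace(0.1, 1.9, 25)
beta = 0.3
spiral_num = jacobian_det(spiral, t, beta)
spiral_exact = A**2 * t / T**2
rosette_num = jacobian_det(rosette, t, beta)
rosette_exact = np.abs(A**2 * omega * np.cos(omega * t) * np.sin(omega * t))
```

Both determinants depend only on the radial part of the map, which is why neither `beta` nor `omega1` appears in the closed forms.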
When $\beta$ doesn’t correspond with an interleaf parameter for normal-segment coordinates, as in the case of a sinusoidal trajectory, then only the value of the Jacobian determinant on the given trajectory (at $\beta = 0$) is required and $|J_{\vec\varphi}(t, 0)| = v(t)$, where $v(t)$ is the speed of parameterization of the k-plane trajectory, as given by Equation (34). The speed of parameterization gives the inverse of the sampling density along the given trajectory, which will be different, in general, from the two-dimensional density of samples in $\mathbb{R}^2$.

D. Reconstruction from a Single Acquisition of Data

The possibility of having data along only a single k-plane trajectory leads to the consideration of Equation (38) without the integration with respect to $\beta$ and the operator $C_K$ defined by

$$C_K\rho(\vec p) = \int_{t_I}^{t_F} \hat\rho(t, 0)\, e^{2\pi i\,\vec p\cdot\vec k(t)}\, |J(t, 0)|\, dt \qquad (45)$$
A Riemann sum approximation of Equation (45) gives a formula to apply to the reconstruction of MRI data sampled along a single k-plane curve [Sarty, 1995]

$$P_K\rho(\vec p) = \sum_q \hat\rho(\vec k_q)\, e^{2\pi i\,\vec p\cdot\vec k_q}\, |J(t_q, 0)|\, \Delta t \qquad (46)$$

where $\vec k_q$ are the Cartesian k-space coordinates of the sample points. The application of Equation (46) in cases where the given trajectory sufficiently covers a region in k-space leads to very good reconstructions. An example of such a reconstruction for sinusoidal data is given later in the article. The operator $C_K$ turns out to be an interesting object from a mathematical point of view. Some of its mathematical properties are reviewed in the next section. The less mathematically inclined may safely skip the next section and pick up the story afterward without too much loss in understanding the general underlying principles.
IV. INTEGRATING CURVE BAND-PASS OPERATORS

The reconstruction of MRI data is all about Fourier transforms. Most will be familiar with the integral Fourier transform as defined in Equation (19) and the discrete Fourier transform as defined in Equation (22). Here it is shown how to rigorously define a Fourier transform from functions on the domain defined by the trace $K$ of a given k-plane trajectory to functions on $\mathbb{R}^2$. The relevant function spaces are the space of square summable functions on the plane, $L^2(\mathbb{R}^2)$, and Fourier transforms of the Paley-Wiener spaces, $L^2(A)$, where $A$ is the support of the function to be reconstructed, $\rho$. The subsets of differentiable (in a generalized sense) functions in $L^2(\mathbb{R}^2)$ are also important. These subsets are the Sobolev spaces.

Definition 1 Let $W_m(\mathbb{R}^2)$ be the Sobolev space consisting of equivalence classes of functions $f$ that are associated with tempered distributions $T_f$ such that $f \in W_m(\mathbb{R}^2)$ if and only if

$$\|f\|^2_{W_m(\mathbb{R}^2)} = \int_{\mathbb{R}^2} |\hat f(\vec k)|^2 \big(1 + \|\vec k\|^2\big)^m\, d\vec k < \infty$$

where $\hat f$ denotes the Fourier transform of $f$ defined through the distribution $T_f$ via $\mathcal{F}T_f = T_{\hat f}$. The space $W_m(\mathbb{R}^2)$ is a Hilbert space with the following inner product:

$$\langle f, g\rangle_{W_m(\mathbb{R}^2)} = \int_{\mathbb{R}^2} \hat f(\vec k)\, \overline{\hat g(\vec k)}\, \big(1 + \|\vec k\|^2\big)^m\, d\vec k$$

where $f$ and $g \in W_m(\mathbb{R}^2)$ and the overbar denotes complex conjugation.

Definition 2 Let $L^2_m(\mathbb{R}^2)$ be the Sobolev space consisting of equivalence classes of functions $f$, where $f \in L^2_m(\mathbb{R}^2)$ if and only if

$$\|f\|^2_{L^2_m(\mathbb{R}^2)} = \int_{\mathbb{R}^2} |f(\vec p)|^2 \big(1 + \|\vec p\|^2\big)^m\, d\vec p < \infty$$

The space $L^2_m(\mathbb{R}^2)$ is a Hilbert space with the following inner product:

$$\langle f, g\rangle_{L^2_m(\mathbb{R}^2)} = \int_{\mathbb{R}^2} f(\vec p)\, \overline{g(\vec p)}\, \big(1 + \|\vec p\|^2\big)^m\, d\vec p$$

where $f$ and $g \in L^2_m(\mathbb{R}^2)$. Note that the (properly extended) Fourier transform $\mathcal{F} : L^2_m(\mathbb{R}^2) \to W_m(\mathbb{R}^2)$ is a Hilbert space isomorphism.
A. Fourier Transforms Restricted to Curves in $\mathbb{R}^2$

The first theorem is a modification of Theorem IX.39 in Reed and Simon’s functional analysis book [1972], where most of the mathematical definitions not given here may be found. The term $S(t)$ denotes a value of the unsampled signal. In the proof of Theorem 2 the following lemma is required.

Lemma 1 Let $f \in L^\infty[t_I, t_F]$ be nonnegative, and define a measure $\mu$ on the curve trace $K = \vec K([t_I, t_F])$ via $d\mu(t) = f(t)\,dt$, where $\vec K : [t_I, t_F] \to \mathbb{R}^2$ is a closed, bounded, measurable one-to-one function. For each $S \in L^1(K, \mu)$, let the function $\tilde S$ be defined by

$$\tilde S(\vec p) = \int_{t_I}^{t_F} S(t)\, e^{2\pi i\,\vec K(t)\cdot\vec p}\, f(t)\, dt$$

Then $\tilde S \in C^\omega(\mathbb{R}^2)$, the space of analytic functions on $\mathbb{R}^2$.
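Proof Define the tempered distribution $T_{\tilde S} \in \mathcal{S}'(\mathbb{R}^2)$ in the usual way by its action on Schwartz functions $h \in \mathcal{S}(\mathbb{R}^2)$:

$$T_{\tilde S}(h) = \int_{\mathbb{R}^2} \tilde S(\vec p)\, h(\vec p)\, d\vec p$$

Now compute the distributional Fourier transform of $T_{\tilde S}$; let $g \in \mathcal{S}(\mathbb{R}^2)$:

$$\mathcal{F}T_{\tilde S}(g) = T_{\tilde S}(\hat g) = \int_{\mathbb{R}^2} \tilde S(\vec p)\, \hat g(\vec p)\, d\vec p$$

$$= \int_{\mathbb{R}^2} \left[\int_{t_I}^{t_F} S(t)\, e^{2\pi i\,\vec K(t)\cdot\vec p}\, f(t)\, dt\right] \hat g(\vec p)\, d\vec p$$

$$= \int_{t_I}^{t_F} S(t)\, f(t) \left[\int_{\mathbb{R}^2} \hat g(\vec p)\, e^{2\pi i\,\vec K(t)\cdot\vec p}\, d\vec p\right] dt$$

$$= \int_{t_I}^{t_F} S(t)\, f(t)\, g(\vec K(t))\, dt$$

Thus the support of $\mathcal{F}T_{\tilde S}$, $K$, is compact (by the Heine-Borel theorem). So $\mathcal{F}T_{\tilde S}$ is a distribution with compact support and, by the Paley-Wiener theorem for distributions (see, for example, Reed and Simon [1972] or Yosida [1980]), $\tilde S \in C^\omega(\mathbb{R}^2)$.

Theorem 2 Let $0 < c_1 < c_2 < \infty$ and let $\vec K : [t_I, t_F] \to \mathbb{R}^2$ be an injective $C^1$ map that satisfies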
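$$c_1^2 < \left(\frac{dK_x}{dt}(t)\right)^2 + \left(\frac{dK_y}{dt}(t)\right)^2 < c_2^2 \qquad (47)$$

for all $t \in [t_I, t_F]$. Let $f \in L^\infty[t_I, t_F]$ be such that $f(t) \ge 0$ for all $t \in [t_I, t_F]$ and let $\mu$ be the measure on $K = \vec K([t_I, t_F])$ given by $d\mu(t) = f(t)\,dt$. And let $m' > \tfrac12$, $m < -\tfrac12$.

(a) Let $g \in \mathcal{S}(\mathbb{R}^2)$, the set of Schwartz functions on $\mathbb{R}^2$. If $T_K g$ is the restriction of $g$ to $K$ given by $T_K g(t) = g(\vec K(t))$, then $T_K$ extends uniquely to a bounded map of the Sobolev space $W_{m'}(\mathbb{R}^2)$ into $L^2(K, \mu)$.

(b) Let $S \in L^2(K, \mu)$ and define

$$\mathcal{F}^{-1}_{K,\mu}S(x, y) = \int_{t_I}^{t_F} e^{2\pi i\,(x K_x(t) + y K_y(t))}\, S(t)\, f(t)\, dt \qquad (48)$$

as the inverse two-dimensional Fourier transform of $S$ on $(K, \mu)$. Then $\mathcal{F}^{-1}_{K,\mu} : L^2(K, \mu) \to L^2_m(\mathbb{R}^2)$ is continuous. Furthermore, $\mathcal{F}^{-1}_{K,\mu}S \in C^\omega(\mathbb{R}^2)$.

(c) The map $C_{K,\mu} = \mathcal{F}^{-1}_{K,\mu}\, T_K\, \mathcal{F} : L^2_{m'}(\mathbb{R}^2) \to L^2_m(\mathbb{R}^2)$ is continuous, where $\mathcal{F} : L^2_{m'}(\mathbb{R}^2) \to W_{m'}(\mathbb{R}^2)$ is the Fourier transform and $T_K : W_{m'}(\mathbb{R}^2) \to L^2(K, \mu)$, $\mathcal{F}^{-1}_{K,\mu} : L^2(K, \mu) \to L^2_m(\mathbb{R}^2)$ are as defined in parts (a) and (b).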
Proof Note that part (c) follows from parts (a) and (b). Part (b) will be proved first; then part (a) will follow from a duality argument. Let

$$A^+_{a,b} = \left\{t \in [t_I, t_F] \;\middle|\; a < \frac{dK_x}{dt}(t) < b\right\}, \qquad A^-_{a,b} = \left\{t \in [t_I, t_F] \;\middle|\; -b < \frac{dK_x}{dt}(t) < -a\right\}$$

$$B^+_{a,b} = \left\{t \in [t_I, t_F] \;\middle|\; a < \frac{dK_y}{dt}(t) < b\right\} \quad\text{and}\quad B^-_{a,b} = \left\{t \in [t_I, t_F] \;\middle|\; -b < \frac{dK_y}{dt}(t) < -a\right\}$$

Then

$$\left\{A^+_{a,b},\, A^-_{a,b},\, B^+_{a,b},\, B^-_{a,b} \;\middle|\; \frac{c_1}{\sqrt2} \le a < c_2 \text{ and } \frac{c_1}{\sqrt2} < b \le c_2\right\}$$

is an open cover for the compact set $[t_I, t_F]$. It therefore contains a finite subcover of open sets $\{D_i\}$. Each set $D_i$ in turn is the countable union of intervals $I_{i,j}$, and again a finite subcover $\{I_{i,j}\}$ of $[t_I, t_F]$ can be taken. By picking a point in the intersection of two adjacent intervals, a finite decomposition of $[t_I, t_F]$ into $N$ subsets $T_j = [t_{j-1}, t_j]$, $j = 1, \ldots, N$, where $t_I = t_0 < t_1 < \cdots < t_{N-1} < t_N = t_F$, can be found such that either

$$\frac{c_1}{\sqrt2} < \left|\frac{dK_x}{dt}(t)\right| < c_2 \text{ for } t \in T_j \qquad \text{(Case 1)}$$

or

$$\frac{c_1}{\sqrt2} < \left|\frac{dK_y}{dt}(t)\right| < c_2 \text{ for } t \in T_j \qquad \text{(Case 2)}$$

By the implicit function theorem a map can be found in Case 1, $K_x^{-1}$, such that $t = K_x^{-1}(k_x)$ for $k_x \in K_x(T_j) = V_j$ and in Case 2, $K_y^{-1}$, such that $t = K_y^{-1}(k_y)$ for $k_y \in K_y(T_j) = V_j$.
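Therefore, $K$ can be decomposed into $N$ disjoint measurable sets $S_j$, where $S_j = \{(k_x, k_y) \mid k_y = h_j(k_x),\ k_x \in V_j\}$ where $h_j = K_y \circ K_x^{-1}$ when Case 1 holds or $S_j = \{(k_x, k_y) \mid k_x = h_j(k_y),\ k_y \in V_j\}$ where $h_j = K_x \circ K_y^{-1}$ when Case 2 holds. The argument for when Case 1 holds is next presented; the argument for when Case 2 holds is similar. The function $\chi_{V_j}$ is the characteristic function of the set $V_j$, equal to 1 when $k_x \in V_j$ and 0 otherwise.⁶ Define

$$G_j(x, y) = \int_{V_j} S(K_x^{-1}(k_x))\, e^{2\pi i\,(x k_x + y h_j(k_x))}\, f(K_x^{-1}(k_x)) \left|\frac{dK_x}{dt}(K_x^{-1}(k_x))\right|^{-1} dk_x = \int_{T_j} S(t)\, e^{2\pi i\,(x K_x(t) + y K_y(t))}\, f(t)\, dt$$

Let

$$Z_{j,y}(k_x) = \chi_{V_j}(k_x)\, S(K_x^{-1}(k_x))\, e^{2\pi i\,y h_j(k_x)}\, f(K_x^{-1}(k_x)) \left|\frac{dK_x}{dt}(K_x^{-1}(k_x))\right|^{-1}$$

so that

$$G_j(x, y) = \int_{-\infty}^{\infty} Z_{j,y}(k_x)\, e^{2\pi i\,x k_x}\, dk_x$$

Using the fact that $f \ge 0$, $f \in L^\infty[t_I, t_F]$ and $S \in L^2(K, \mu)$ it can be shown that $Z_{j,y} \in L^2(\mathbb{R})$ for all $y \in \mathbb{R}$:

$$\|Z_{j,y}\|^2_{L^2(\mathbb{R})} = \int_{-\infty}^{\infty} |Z_{j,y}(k_x)|^2\, dk_x = \int_{V_j} \left| S(K_x^{-1}(k_x))\, e^{2\pi i\,y h_j(k_x)}\, f(K_x^{-1}(k_x)) \left|\frac{dK_x}{dt}(K_x^{-1}(k_x))\right|^{-1} \right|^2 dk_x$$

$$\le \frac{2}{c_1^2} \int_{V_j} \big|S(K_x^{-1}(k_x))\, f(K_x^{-1}(k_x))\big|^2\, dk_x = \frac{2}{c_1^2} \int_{t_{j-1}}^{t_j} |S(t) f(t)|^2 \left|\frac{dK_x}{dt}(t)\right| dt$$

$$\le \frac{2 c_2}{c_1^2} \int_{t_{j-1}}^{t_j} |S(t)|^2 f^2(t)\, dt \le \frac{2 c_2}{c_1^2}\, \|f\|_{L^\infty[t_I, t_F]} \int_{t_{j-1}}^{t_j} |S(t)|^2 f(t)\, dt \le \frac{2 c_2}{c_1^2}\, \|f\|_{L^\infty[t_I, t_F]}\, \|S\|^2_{L^2(K, \mu)}$$

⁶ The characteristic function $\chi_S$ of a set $S$ is the function whose value is 1 on the set and 0 elsewhere.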
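Let $m < -\tfrac12$; then, using the Plancherel theorem to replace the $L^2$ norm of the inverse Fourier transform $\check Z_{j,y}$ by that of $Z_{j,y}$:

$$\|G_j\|^2_{L^2_m(\mathbb{R}^2)} = \int_{\mathbb{R}^2} \big(1 + |x|^2 + |y|^2\big)^m |G_j(x, y)|^2\, dx\, dy \le \int_{\mathbb{R}^2} \big(1 + |y|^2\big)^m |G_j(x, y)|^2\, dx\, dy$$

$$= \int_{\mathbb{R}^2} \big(1 + |y|^2\big)^m \left|\int_{-\infty}^{\infty} Z_{j,y}(k_x)\, e^{2\pi i\,x k_x}\, dk_x\right|^2 dx\, dy = \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} |\check Z_{j,y}(x)|^2\, dx\right] \big(1 + |y|^2\big)^m\, dy$$

$$= \int_{-\infty}^{\infty} \|Z_{j,y}\|^2_{L^2(\mathbb{R})} \big(1 + |y|^2\big)^m\, dy \le \left[\int_{-\infty}^{\infty} \big(1 + |y|^2\big)^m\, dy\right] \frac{2 c_2}{c_1^2}\, \|f\|_{L^\infty[t_I, t_F]}\, \|S\|^2_{L^2(K, \mu)}$$

$$= I_m\, \frac{2 c_2}{c_1^2}\, \|f\|_{L^\infty[t_I, t_F]}\, \|S\|^2_{L^2(K, \mu)}$$

where

$$I_m = \sqrt{\pi}\; \frac{\Gamma(-m - \tfrac12)}{\Gamma(-m)}$$

for $m < -\tfrac12$. (According to the integral tables in the CRC Handbook [Beger, 1979], $\int_{-\infty}^{\infty} (1 + |y|^2)^m\, dy$ diverges when $m \ge -\tfrac12$.) Therefore,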
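$$\|\mathcal{F}^{-1}_{K,\mu}S\|^2_{L^2_m(\mathbb{R}^2)} = \left\|\sum_{j=1}^{N} G_j\right\|^2_{L^2_m(\mathbb{R}^2)} \le \left(\sum_{j=1}^{N} \|G_j\|_{L^2_m(\mathbb{R}^2)}\right)^2 \le \left(N \sqrt{I_m\, \frac{2 c_2}{c_1^2}\, \|f\|_{L^\infty[t_I, t_F]}}\; \|S\|_{L^2(K, \mu)}\right)^2 = N^2 I_m\, \frac{2 c_2}{c_1^2}\, \|f\|_{L^\infty[t_I, t_F]}\, \|S\|^2_{L^2(K, \mu)}$$

or, if

$$c = N \sqrt{2 I_m\, \frac{c_2}{c_1^2}\, \|f\|_{L^\infty[t_I, t_F]}}$$

then

$$\|\mathcal{F}^{-1}_{K,\mu}S\|_{L^2_m(\mathbb{R}^2)} \le c\, \|S\|_{L^2(K, \mu)} \qquad (49)$$

showing that $\mathcal{F}^{-1}_{K,\mu} : L^2(K, \mu) \to L^2_m(\mathbb{R}^2)$ is continuous. Finally, note that $\mathcal{F}^{-1}_{K,\mu}S \in C^\omega(\mathbb{R}^2)$ by putting $\mathcal{F}^{-1}_{K,\mu}S = \tilde S$ in Lemma 1. This establishes part (b).

To prove part (a), note that if $m' > \tfrac12$, then part (b) says that $\mathcal{F}^{-1}_{K,\mu} : L^2(K, \mu) \to L^2_{-m'}(\mathbb{R}^2)$ is bounded. The function space $L^2_{m'}(\mathbb{R}^2)$ can be identified with the dual of $L^2_{-m'}(\mathbb{R}^2)$ by associating to each $h \in L^2_{m'}(\mathbb{R}^2)$ the functional $\langle h, \cdot\rangle$ defined by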
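$$\langle h, g\rangle = \int_{\mathbb{R}^2} h(\vec p)\, \overline{g(\vec p)}\, d\vec p$$

where $g \in L^2_{-m'}(\mathbb{R}^2)$. So the adjoint map, $(\mathcal{F}^{-1}_{K,\mu})^* : L^2_{m'}(\mathbb{R}^2) \to L^2(K, \mu)$, is bounded; Equation (49) implies that $\|(\mathcal{F}^{-1}_{K,\mu})^* h\|_{L^2(K, \mu)} \le c\, \|h\|_{L^2_{m'}(\mathbb{R}^2)}$. If one defines $T_K : W_{m'}(\mathbb{R}^2) \to L^2(K, \mu)$ by $T_K = (\mathcal{F}^{-1}_{K,\mu})^*\, \mathcal{F}^{-1}$, then

$$\|T_K \hat\rho\|_{L^2(K, \mu)} \le c\, \|\mathcal{F}^{-1}\hat\rho\|_{L^2_{m'}(\mathbb{R}^2)} = c\, \|\hat\rho\|_{W_{m'}(\mathbb{R}^2)}$$

and for $h \in \mathcal{S}(\mathbb{R}^2) \subset W_{m'}(\mathbb{R}^2)$, $g \in L^2(K, \mu)$,

$$\int_K h(\vec K(t))\, \overline{g(t)}\, f(t)\, dt = \int_K \left[\int_{\mathbb{R}^2} \mathcal{F}^{-1}h(\vec p)\, e^{-2\pi i\,\vec K(t)\cdot\vec p}\, d\vec p\right] \overline{g(t)}\, f(t)\, dt$$

$$= \int_{\mathbb{R}^2} \mathcal{F}^{-1}h(\vec p)\; \overline{\left[\int_K g(t)\, e^{2\pi i\,\vec K(t)\cdot\vec p}\, f(t)\, dt\right]}\, d\vec p$$

$$= \langle \mathcal{F}^{-1}h,\ \mathcal{F}^{-1}_{K,\mu}\, g\rangle = \langle (\mathcal{F}^{-1}_{K,\mu})^*\, \mathcal{F}^{-1}h,\ g\rangle = \langle T_K h, g\rangle$$

So, for $h \in \mathcal{S}(\mathbb{R}^2)$, $T_K h$ is just the restriction of $h$ to $K$ in the sense that $T_K h(t) = h(\vec K(t))$. This establishes part (a).

B. The Curve Band-Pass Operator on Paley-Wiener Spaces

In Theorem 2, $C_{K,\mu}$ and $\mathcal{F}^{-1}_{K,\mu}$ have been written to reflect the fact that the measure $\mu$ on the given k-space curve is important. To control the complexity of the subscripts, $C_K$ and $\mathcal{F}^{-1}_K$ will now be written when the measure $\mu$, given by $d\mu(t) = f(t)\,dt$ for $K = \vec K([t_I, t_F])$, is understood. Note for $\vec K : [t_I, t_F] \to \mathbb{R}^2$ and $\rho \in \mathcal{S}(\mathbb{R}^2) \cap L^2_{m'}(\mathbb{R}^2)$ that the map $C_K : L^2_{m'}(\mathbb{R}^2) \to L^2_m(\mathbb{R}^2)$ of Theorem 2 is given explicitly by

$$C_K\rho(\vec p) = \int_{t_I}^{t_F} \hat\rho(\vec K(t))\, e^{2\pi i\,\vec K(t)\cdot\vec p}\, f(t)\, dt \qquad (50)$$

where $\hat\rho$ is the integral Fourier transform of $\rho$. Also, if

$$\Psi_K(\vec p) = \int_{t_I}^{t_F} e^{2\pi i\,\vec K(t)\cdot\vec p}\, f(t)\, dt \qquad (51)$$

then $\Psi_K \in C^\omega(\mathbb{R}^2) \cap L^2_m(\mathbb{R}^2)$ for $m < -\tfrac12$ by Theorem 2, because $\Psi_K = \mathcal{F}^{-1}_K 1_K$, where $1_K$ is the function identically equal to 1 on $K$.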
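Equation (51) is straightforward to evaluate numerically for a given curve. The sketch below approximates the integral with a midpoint Riemann sum for a single Archimedean spiral; the choice of the spiral, the weight $f(t) = A^2 t/T^2$ (the spiral Jacobian determinant, assumed here as the measure density), and all names are illustrative assumptions.

```python
import numpy as np

def psf_on_curve(Kx, Ky, f, dt, x, y):
    """Midpoint Riemann sum for int exp(2 pi i (x Kx(t) + y Ky(t))) f(t) dt."""
    phase = np.exp(2j * np.pi * (np.multiply.outer(x, Kx) + np.multiply.outer(y, Ky)))
    return (phase * f).sum(axis=-1) * dt

# Hypothetical spiral reaching radius A at time T, sampled at interval midpoints.
A, T, omega, n = 4.0, 1.0, 8.0, 2000
dt = T / n
t = (np.arange(n) + 0.5) * dt
Kx = A * t / T * np.cos(2 * np.pi * omega * t / T)
Ky = A * t / T * np.sin(2 * np.pi * omega * t / T)
f = A**2 * t / T**2                      # assumed density of the measure on the curve

grid = np.linspace(-1.0, 1.0, 21)        # symmetric about the origin
X, Y = np.meshgrid(grid, grid, indexing="ij")
psf = psf_on_curve(Kx, Ky, f, dt, X, Y)
```

Because the weight is real and nonnegative, the computed function takes conjugate values at $\vec p$ and $-\vec p$ and attains its largest magnitude at the origin, where it equals the total mass of the measure ($A^2/2$ for this weight).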
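Proposition 3 Let $C_K = \mathcal{F}^{-1}_K\, T_K\, \mathcal{F} : L^2_{m'}(\mathbb{R}^2) \to L^2_m(\mathbb{R}^2)$, where $m' > \tfrac12$ and $m < -\tfrac12$, be as defined in Theorem 2 and let

$$\Psi_K(\vec p) = \int_{t_I}^{t_F} e^{2\pi i\,\vec K(t)\cdot\vec p}\, f(t)\, dt$$

Then we have as an equivalent definition of $C_K$ the following: $C_K\rho = \rho * \Psi_K$.

Before proving the proposition, note that this is just the expression that $\Psi_K$ is the point spread function associated with $C_K$ when it is used to reconstruct an ideal NMR image. It is not immediately obvious that the rigorous definition of $C_K$ given in Theorem 2 is equivalent to an unrestricted convolution so it needs to be explicitly proved.

Proof As noted above, for $\rho \in \mathcal{S}(\mathbb{R}^2) \cap L^2_{m'}(\mathbb{R}^2)$,

$$C_K\rho(\vec p) = \int_{t_I}^{t_F} \hat\rho(\vec K(t))\, e^{2\pi i\,\vec K(t)\cdot\vec p}\, f(t)\, dt$$

Because $\mathcal{S}(\mathbb{R}^2) \subset L^1(\mathbb{R}^2)$, we can use Fubini’s theorem to write, with $\vec p = (x, y)$,

$$C_K\rho(\vec p) = \int_{t_I}^{t_F} \left[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \rho(a, b)\, e^{-2\pi i\,(a K_x(t) + b K_y(t))}\, da\, db\right] e^{2\pi i\,(x K_x(t) + y K_y(t))}\, f(t)\, dt$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \rho(a, b) \left[\int_{t_I}^{t_F} e^{-2\pi i\,(K_x(t)(a - x) + K_y(t)(b - y))}\, f(t)\, dt\right] da\, db$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \rho(a, b) \left[\int_{t_I}^{t_F} e^{2\pi i\,(K_x(t)(x - a) + K_y(t)(y - b))}\, f(t)\, dt\right] da\, db$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \rho(a, b)\, \Psi_K(x - a, y - b)\, da\, db$$

Thus $C_K\rho = \rho * \Psi_K$ for $\rho \in \mathcal{S}(\mathbb{R}^2)$. Then, because $\mathcal{S}(\mathbb{R}^2) \cap L^2_{m'}(\mathbb{R}^2)$ is dense in $L^2_{m'}(\mathbb{R}^2)$ and convolution with $\Psi_K$ is bounded from $L^2_{m'}(\mathbb{R}^2)$ to $L^2_m(\mathbb{R}^2)$, the Hahn-Banach theorem may be used to conclude that $C_K\rho = \rho * \Psi_K$ for all $\rho \in L^2_{m'}(\mathbb{R}^2)$.

Proposition 4 Let $A$ be a bounded measurable subset of $\mathbb{R}^2$. Let $E_{m'} : L^2(A) \to L^2_{m'}(\mathbb{R}^2)$, $m' > 0$, be the extension mapping given by

$$E_{m'}f(\vec p) = \begin{cases} f(\vec p) & \text{if } \vec p \in A \\ 0 & \text{if } \vec p \notin A \end{cases}$$

Let $\mathcal{R}_m : L^2_m(\mathbb{R}^2) \to L^2(A)$, $m < 0$, be the restriction mapping given by the following:

$$\mathcal{R}_m f(\vec p) = f(\vec p) \quad \text{if } \vec p \in A$$

Then $E_{m'}$ and $\mathcal{R}_m$ are continuous, and, as well, we have the following continuous imbeddings: $L^2_{m'}(\mathbb{R}^2) \to L^2(\mathbb{R}^2) \to L^2_m(\mathbb{R}^2)$ when $m' > 0$ and $m < 0$.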
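Proof To see that $E_{m'} : L^2(A) \to L^2_{m'}(\mathbb{R}^2)$, $m' > 0$, is continuous, compute:

$$\|E_{m'}f\|^2_{L^2_{m'}(\mathbb{R}^2)} = \int_{\mathbb{R}^2} |E_{m'}f(\vec p)|^2 \big(1 + \|\vec p\|^2\big)^{m'} d\vec p = \int_A |f(\vec p)|^2 \big(1 + \|\vec p\|^2\big)^{m'} d\vec p$$

$$\le \sup_{\vec p \in A}\big(1 + \|\vec p\|^2\big)^{m'} \int_A |f(\vec p)|^2\, d\vec p = \sup_{\vec p \in A}\big(1 + \|\vec p\|^2\big)^{m'}\, \|f\|^2_{L^2(A)}$$

To see that $L^2_{m'}(\mathbb{R}^2)$, $m' > 0$, is continuously embedded in $L^2(\mathbb{R}^2)$, compute

$$\|f\|^2_{L^2(\mathbb{R}^2)} = \int_{\mathbb{R}^2} |f(\vec p)|^2\, d\vec p \le \int_{\mathbb{R}^2} |f(\vec p)|^2 \big(1 + \|\vec p\|^2\big)^{m'} d\vec p = \|f\|^2_{L^2_{m'}(\mathbb{R}^2)}$$

To see that $L^2(\mathbb{R}^2)$ is continuously embedded in $L^2_m(\mathbb{R}^2)$, $m < 0$, compute

$$\|f\|^2_{L^2_m(\mathbb{R}^2)} = \int_{\mathbb{R}^2} |f(\vec p)|^2 \big(1 + \|\vec p\|^2\big)^m\, d\vec p \le \int_{\mathbb{R}^2} |f(\vec p)|^2\, d\vec p = \|f\|^2_{L^2(\mathbb{R}^2)}$$

To see that $\mathcal{R}_m : L^2_m(\mathbb{R}^2) \to L^2(A)$, $m < 0$, is continuous, compute

$$\|\mathcal{R}_m f\|^2_{L^2(A)} = \int_A |\mathcal{R}_m f(\vec p)|^2\, d\vec p = \int_{\mathbb{R}^2} |\chi_A(\vec p) f(\vec p)|^2\, d\vec p$$

$$\le \frac{1}{\inf_{\vec q \in A}\big(1 + \|\vec q\|^2\big)^m} \int_{\mathbb{R}^2} |\chi_A(\vec p) f(\vec p)|^2 \big(1 + \|\vec p\|^2\big)^m\, d\vec p$$

$$\le \frac{1}{\inf_{\vec q \in A}\big(1 + \|\vec q\|^2\big)^m} \int_{\mathbb{R}^2} |f(\vec p)|^2 \big(1 + \|\vec p\|^2\big)^m\, d\vec p = \frac{1}{\inf_{\vec q \in A}\big(1 + \|\vec q\|^2\big)^m}\, \|f\|^2_{L^2_m(\mathbb{R}^2)}$$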
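Theorem 5 Let $A$ be a bounded measurable subset of $\mathbb{R}^2$ and let $\vec K : [t_I, t_F] \to \mathbb{R}^2$ be an injective $C^1$ map that satisfies

$$c_1^2 < \left(\frac{dK_x}{dt}(t)\right)^2 + \left(\frac{dK_y}{dt}(t)\right)^2 < c_2^2 \qquad (52)$$

for all $t \in [t_I, t_F]$. If $m < -\tfrac12$ and $m' > \tfrac12$, then

$$C_K = \mathcal{R}_m\, \mathcal{F}^{-1}_K\, T_K\, \mathcal{F}\, E_{m'} : L^2(A) \to L^2(A)$$

is continuous, where, as described in Proposition 4, $E_{m'} : L^2(A) \to L^2_{m'}(\mathbb{R}^2)$ is the extension operator, $\mathcal{R}_m : L^2_m(\mathbb{R}^2) \to L^2(A)$ is the restriction operator, and $\mathcal{F}^{-1}_K$, $T_K$, $\mathcal{F}$ are as defined in Theorem 2.

Proof This theorem follows directly from Theorem 2 and Proposition 4.

The map used to describe the ideal MRI signal, the MRI map, may therefore be defined as

$$M = T_K\, \mathcal{F}\, E_{m'} : L^2(A) \to L^2(K, \mu) \qquad (53)$$

and the curve reconstruction operator, as

$$R_K = \mathcal{R}_m\, \mathcal{F}^{-1}_K : L^2(K, \mu) \to L^2(A) \qquad (54)$$

with the assurance that both maps are continuous and linear.

C. Band-pass Operators as Integrals of Curve Band-pass Operators

Next we show how to integrate the operators $C_K$ (which amounts to continuously adding the results of a continuum of MRI measurements) to obtain a band-limiting operator $B_\Omega$. Then, if $K$ circles the origin, we show how integrating the associated operators $C_K$ can lead to a resolution of the identity on $L^2(A)$.

Theorem 6 Let $\vec\varphi : [t_I, t_F] \times [W_1, W_2] \to \mathbb{R}^2$ be a $C^1$ map such that the restriction of $\vec\varphi$ to $(t_I, t_F) \times [W_1, W_2]$ is a $C^1$ diffeomorphism onto $\Omega = \vec\varphi((t_I, t_F) \times [W_1, W_2])$. Denote the components of $\vec\varphi$ as follows: $\vec\varphi(t, \beta) = (k_x(t, \beta), k_y(t, \beta))$. Suppose also that $\vec\varphi$ satisfies:

$$c_1(\beta) < \left(\frac{\partial k_x}{\partial t}(t, \beta)\right)^2 + \left(\frac{\partial k_y}{\partial t}(t, \beta)\right)^2 < c_2(\beta)$$

for all $t \in [t_I, t_F]$ and $\beta \in [W_1, W_2]$ where $0 < c_1(\beta) < c_2(\beta) < \infty$. Let $\vec K_\beta : [t_I, t_F] \to \mathbb{R}^2$ be defined by $\vec K_\beta(t) = \vec\varphi(t, \beta)$ and let $K_\beta = \vec K_\beta([t_I, t_F])$. Let $A$ be a bounded measurable subset of $\mathbb{R}^2$. Let $m < -\tfrac12$ and $m' > \tfrac12$. Let $C_{K_\beta} : L^2(A) \to L^2(A)$ be given by

$$C_{K_\beta} = \mathcal{R}_m\, \mathcal{F}^{-1}_{K_\beta}\, T_{K_\beta}\, \mathcal{F}\, E_{m'}$$

where $\mathcal{F}^{-1}_{K_\beta} : L^2(K_\beta, \mu_\beta) \to L^2_m(\mathbb{R}^2)$, where $d\mu_\beta(t) = |J_{\vec\varphi}(t, \beta)|\, dt$, is given by

$$\mathcal{F}^{-1}_{K_\beta}S(\vec p) = \int_{t_I}^{t_F} S(t)\, e^{2\pi i\,\vec p\cdot\vec K_\beta(t)}\, |J_{\vec\varphi}(t, \beta)|\, dt$$

and $T_{K_\beta} : W_{m'}(\mathbb{R}^2) \to L^2(K_\beta, \mu_\beta)$ is as given in Theorem 2, $\mathcal{F} : L^2_{m'}(\mathbb{R}^2) \to W_{m'}(\mathbb{R}^2)$ is the (distributional) Fourier transform, and $\mathcal{R}_m : L^2_m(\mathbb{R}^2) \to L^2(A)$, $E_{m'} : L^2(A) \to L^2_{m'}(\mathbb{R}^2)$ are the restriction and extension operators described in Proposition 4. Let $B_\Omega : L^2(A) \to L^2(A)$ be given by

$$B_\Omega = \mathcal{R}_m\, \mathcal{F}^{-1}\, \chi_\Omega\, \mathcal{F}\, E_{m'}$$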
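where $\mathcal{F}^{-1} : W_{m'}(\mathbb{R}^2) \to L^2_m(\mathbb{R}^2)$ is the inverse Fourier transform. Then $B_\Omega = \int_{W_1}^{W_2} C_{K_\beta}\, d\beta$ strongly.

Proof Let $\rho \in L^2(A)$; then $\rho \in L^1(A)$, so that $E_{m'}\rho \in L^1(\mathbb{R}^2)$. Also, since $\vec\varphi$ is a $C^1$ diffeomorphism, the function $|\det(D\vec\varphi)| \equiv |J_{\vec\varphi}| \in L^1([t_I, t_F] \times [W_1, W_2])$ so that Fubini’s theorem may be used to justify the fourth equality in the following string of equalities:

$$\mathcal{F}^{-1}\chi_\Omega\mathcal{F}E_{m'}\rho(x, y) = \int_\Omega \mathcal{F}E_{m'}\rho(k_x, k_y)\, e^{2\pi i\,(x k_x + y k_y)}\, dk_x\, dk_y$$

$$= \int_{W_1}^{W_2}\int_{t_I}^{t_F} \mathcal{F}E_{m'}\rho\big(k_x(t, \beta), k_y(t, \beta)\big)\, e^{2\pi i\,(x k_x(t,\beta) + y k_y(t,\beta))}\, |J_{\vec\varphi}(t, \beta)|\, dt\, d\beta$$

$$= \int_{W_1}^{W_2}\int_{t_I}^{t_F} \left[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E_{m'}\rho(a, b)\, e^{-2\pi i\,(k_x(t,\beta) a + k_y(t,\beta) b)}\, da\, db\right] e^{2\pi i\,(x k_x(t,\beta) + y k_y(t,\beta))}\, |J_{\vec\varphi}(t, \beta)|\, dt\, d\beta$$

$$= \int_{W_1}^{W_2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E_{m'}\rho(a, b) \left[\int_{t_I}^{t_F} e^{-2\pi i\,(k_x(t,\beta)(a - x) + k_y(t,\beta)(b - y))}\, |J_{\vec\varphi}(t, \beta)|\, dt\right] da\, db\, d\beta$$

$$= \int_{W_1}^{W_2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E_{m'}\rho(a, b) \left[\int_{t_I}^{t_F} e^{-2\pi i\,(K_{\beta x}(t)(a - x) + K_{\beta y}(t)(b - y))}\, |J_{\vec\varphi}(t, \beta)|\, dt\right] da\, db\, d\beta$$

$$= \int_{W_1}^{W_2}\left[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E_{m'}\rho(a, b)\, \Psi_{K_\beta}(x - a, y - b)\, da\, db\right] d\beta = \int_{W_1}^{W_2}\left[\int_A E_{m'}\rho(a, b)\, \Psi_{K_\beta}(x - a, y - b)\, da\, db\right] d\beta$$

$$= \int_{W_1}^{W_2} \mathcal{F}^{-1}_{K_\beta}\, T_{K_\beta}\, \mathcal{F}E_{m'}\rho(x, y)\, d\beta$$

where

$$\Psi_{K_\beta}(\vec p) = \int_{t_I}^{t_F} e^{2\pi i\,\vec K_\beta(t)\cdot\vec p}\, |J_{\vec\varphi}(t, \beta)|\, dt$$

and the last equality (in the above string of equalities) is justified by Proposition 3. Thus it has been shown that $\mathcal{F}^{-1}\chi_\Omega\mathcal{F}E_{m'}\rho(\vec p) = \int_{W_1}^{W_2} \mathcal{F}^{-1}_{K_\beta} T_{K_\beta} \mathcal{F}E_{m'}\rho(\vec p)\, d\beta$, and so

$$\mathcal{R}_m\mathcal{F}^{-1}\chi_\Omega\mathcal{F}E_{m'}\rho(\vec p) = \int_{W_1}^{W_2} \mathcal{R}_m\mathcal{F}^{-1}_{K_\beta} T_{K_\beta} \mathcal{F}E_{m'}\rho(\vec p)\, d\beta$$

because

$$\mathcal{R}_m\int_{W_1}^{W_2} \mathcal{F}^{-1}_{K_\beta} T_{K_\beta} \mathcal{F}E_{m'}\rho(\vec p)\, d\beta = \int_{W_1}^{W_2} \mathcal{R}_m\mathcal{F}^{-1}_{K_\beta} T_{K_\beta} \mathcal{F}E_{m'}\rho(\vec p)\, d\beta$$

is clear upon evaluating $\mathcal{R}_m$ on each side.

To see that $\int_{W_1}^{W_2} C_{K_\beta}\, d\beta$ is a well-defined Riemann integral, it needs to be shown that $\beta \mapsto C_{K_\beta}$ is continuous with respect to the strong operator topology. This will be true if $\|C_{K_\beta}\rho - C_{K_{\beta_0}}\rho\|_{L^2(A)} \to 0$ for all $\rho \in L^2(A)$ whenever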
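$\beta \to \beta_0$ in $[W_1, W_2]$. But,

$$C_{K_\beta}\rho(\vec p) = \int_{t_I}^{t_F} \mathcal{F}(E_{m'}\rho)\big(\vec K_\beta(t)\big)\, e^{2\pi i\,\vec K_\beta(t)\cdot\vec p}\, |J_{\vec\varphi}(t, \beta)|\, dt = \int_{t_I}^{t_F} \mathcal{F}(E_{m'}\rho)\big(\vec\varphi(t, \beta)\big)\, e^{2\pi i\,\vec\varphi(t,\beta)\cdot\vec p}\, |J_{\vec\varphi}(t, \beta)|\, dt$$

and, because $E_{m'}\rho$ has compact support for all $\rho \in L^2(A)$, $\mathcal{F}E_{m'}\rho$ is a continuous function, as are $\vec\varphi$ and $|J_{\vec\varphi}|$, it is seen that $\beta \mapsto C_{K_\beta}\rho(\vec p)$ is continuous and so it is clear that $\|C_{K_\beta}\rho - C_{K_{\beta_0}}\rho\|_{L^2(A)} \to 0$ for all $\rho \in L^2(A)$ whenever $\beta \to \beta_0$ in $[W_1, W_2]$. From the definition of the Riemann integral as a generalized limit of sums and from the principle of uniform boundedness, it is true that

$$\left(\int_{W_1}^{W_2} C_{K_\beta}\, d\beta\right)\rho = \int_{W_1}^{W_2} C_{K_\beta}\rho\, d\beta$$

Because it has been shown that

$$\mathcal{R}_m\mathcal{F}^{-1}\chi_\Omega\mathcal{F}E_{m'}\rho(x, y) = \int_{W_1}^{W_2} C_{K_\beta}\rho(x, y)\, d\beta$$

for each $(x, y) \in A$, it can be concluded that $\mathcal{R}_m\mathcal{F}^{-1}\chi_\Omega\mathcal{F}E_{m'}\rho = \int_{W_1}^{W_2} C_{K_\beta}\rho\, d\beta$ in $L^2(A)$, where $\int_{W_1}^{W_2} C_{K_\beta}\, d\beta$ is as defined above. Finally, it can be concluded that $\mathcal{R}_m\mathcal{F}^{-1}\chi_\Omega\mathcal{F}E_{m'} = \int_{W_1}^{W_2} C_{K_\beta}\, d\beta$ as operators on $L^2(A)$.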
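D. A Resolution of the Identity for Paley-Wiener Spaces

Theorem 7 Let $R = \{k\vec u \in \mathbb{R}^2 \mid k \ge 0,\ \vec u \ne \vec 0 \text{ fixed}\}$ be a ray in $\mathbb{R}^2$ and let $0 < c_1 < c_2 < \infty$. Let $\vec K : [t_I, t_F] \to \mathbb{R}^2$ be a $C^1$ map such that $\vec\varphi : [t_I, t_F] \times (0, \infty) \to \mathbb{R}^2$ is given by $\vec\varphi(t, r) = r\vec K(t)$ where $\vec K$ satisfies:

$$c_1 < \left(\frac{dK_x}{dt}(t)\right)^2 + \left(\frac{dK_y}{dt}(t)\right)^2 < c_2$$

for all $t \in [t_I, t_F]$ and the restriction of $\vec\varphi$ to $(t_I, t_F) \times (0, \infty)$ is a $C^1$ diffeomorphism onto $\mathbb{R}^2 \setminus R$. Let $A$ be a bounded measurable subset of $\mathbb{R}^2$. Let $m < -\tfrac12$ and $m' > \tfrac12$. Define the set $rK = r\vec K([t_I, t_F])$. Let $C_{rK} : L^2(A) \to L^2(A)$ be given by

$$C_{rK} = \mathcal{R}_m\, \mathcal{F}^{-1}_{rK}\, T_{rK}\, \mathcal{F}\, E_{m'}$$

where $\mathcal{F}^{-1}_{rK}$ is given by

$$\mathcal{F}^{-1}_{rK}S(\vec p) = \int_{t_I}^{t_F} S(t)\, e^{2\pi i\,\vec p\cdot r\vec K(t)}\, |J_{\vec\varphi}(t, r)|\, dt$$

and $T_{rK} : W_{m'}(\mathbb{R}^2) \to L^2(rK, \mu_r)$, where $d\mu_r(t) = |J_{\vec\varphi}(t, r)|\, dt$, is as given in Theorem 2, $\mathcal{F} : L^2_{m'}(\mathbb{R}^2) \to W_{m'}(\mathbb{R}^2)$ is the Fourier transform and $\mathcal{R}_m : L^2_m(\mathbb{R}^2) \to L^2(A)$, $E_{m'} : L^2(A) \to L^2_{m'}(\mathbb{R}^2)$ are the restriction and extension operators described in Proposition 4. Let $B_\Omega : L^2(A) \to L^2(A)$ be given by

$$B_\Omega = \mathcal{R}_m\, \mathcal{F}^{-1}\, \chi_\Omega\, \mathcal{F}\, E_{m'}$$

where $\Omega = \vec\varphi([t_I, t_F] \times [R_1, R_2])$ for $0 < R_1 < R_2$ and where $\mathcal{F}^{-1} : W_{m'}(\mathbb{R}^2) \to L^2_m(\mathbb{R}^2)$ is the inverse Fourier transform. Then

(a) $B_\Omega = \int_{R_1}^{R_2} C_{rK}\, dr$ strongly.

(b) Also

$$\mathbb{1} = \int_0^\infty C_{rK}\, dr$$

strongly, where $\mathbb{1}$ is the identity on $L^2(A)$ and

$$\int_0^\infty C_{rK}\, dr \equiv \operatorname*{s-lim}_{R_2 \to \infty}\ \operatorname*{s-lim}_{R_1 \to 0} \int_{R_1}^{R_2} C_{rK}\, dr$$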
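Proof Part (a) follows directly from Theorem 6. To establish (b), note that each statement in the following string of equalities is true when evaluated at each $\rho \in L^2(A)$ and that the sequences implied by these statements all converge in $L^2(A)$. Thus the principle of uniform boundedness allows$^7$

$$
\begin{aligned}
I &= R_m\,\mathcal{F}^{-1}\mathcal{F}\,E_{m'} \\
  &= R_m\,\mathcal{F}^{-1}\chi_{\mathbb{R}^2}\,\mathcal{F}\,E_{m'} \\
  &= R_m\,\mathcal{F}^{-1}\Big(\operatorname*{s-lim}_{R_1\to0}\ \operatorname*{s-lim}_{R_2\to\infty}\chi_{\vec\varphi((t_I,t_F)\times[R_1,R_2])}\Big)\mathcal{F}\,E_{m'} \\
  &= \operatorname*{s-lim}_{R_1\to0}\ \operatorname*{s-lim}_{R_2\to\infty} R_m\,\mathcal{F}^{-1}\chi_{\vec\varphi((t_I,t_F)\times[R_1,R_2])}\,\mathcal{F}\,E_{m'} \\
  &= \operatorname*{s-lim}_{R_1\to0}\ \operatorname*{s-lim}_{R_2\to\infty} B_{\vec\varphi((t_I,t_F)\times[R_1,R_2])} \\
  &= \operatorname*{s-lim}_{R_1\to0}\ \operatorname*{s-lim}_{R_2\to\infty} \int_{R_1}^{R_2} C_{rK}\,dr \\
  &= \int_0^\infty C_{rK}\,dr
\end{aligned}
$$

$^7$ The notation s-lim refers to convergence in the strong operator topology [Yosida, 1980].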
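Theorem 7 uses the polar radial coordinate as the coordinate $\beta$ to complement the natural coordinate $t$ given through $\vec K$. An immediate generalization comes from any $C^1$ map $\vec\varphi$ of $[t_I, t_F] \times (0, \infty)$ to $\mathbb{R}^2$ that restricts to a diffeomorphism of $(t_I, t_F) \times (0, \infty)$ onto $\mathbb{R}^2 \setminus M$, where $M$ is a set of measure zero, and that satisfies

$$c_1(\beta) < \left(\frac{\partial k_x(t, \beta)}{\partial t}\right)^2 + \left(\frac{\partial k_y(t, \beta)}{\partial t}\right)^2 < c_2(\beta)$$

for all $t \in [t_I, t_F]$ and $\beta \in (0, \infty)$, where $0 < c_1(\beta) < c_2(\beta) < \infty$. In that case put $\vec K_\beta(t) = \vec\varphi(t, \beta)$, as in Theorem 6. Also define $C_{K_\beta}$ as in Theorem 6; then the following (strong) resolution of the identity on $L^2(A)$ will arise:

$$I = \int_0^\infty C_{K_\beta}\,d\beta \tag{55}$$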
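E. Further Properties of Curve Band-Pass Operators

What else can be said about $C_K$? We begin with the following theorem.

Theorem 8 $C_K : L^2(A) \to L^2(A)$ is self-adjoint.

Proof Use the characterization of $C_K$ given by Proposition 3 that $C_K\rho = E_{m'}\rho * \psi_K$ as functions, where

$$\psi_K(\vec p) = \int_{t_I}^{t_F} e^{2\pi i\vec K(t)\cdot\vec p} f(t)\,dt$$

Notice that $\psi_K(-\vec p) = \overline{\psi_K(\vec p)}$ (see also Equation (67)). Then, take $\rho, g \in L^2(A)$ and use the inner product on $L^2(A)$:

$$
\begin{aligned}
\langle C_K\rho, g\rangle &= \langle E_{m'}\rho * \psi_K, g\rangle \\
&= \int_A\left(\int_A \rho(\vec q)\,\psi_K(\vec p - \vec q)\,d\vec q\right)\overline{g(\vec p)}\,d\vec p \\
&= \int_A\int_A \rho(\vec q)\,\overline{g(\vec p)}\,\psi_K(\vec p - \vec q)\,d\vec p\,d\vec q \\
&= \int_A\int_A \rho(\vec q)\,\overline{g(\vec p)\,\psi_K(\vec q - \vec p)}\,d\vec p\,d\vec q \\
&= \int_A \rho(\vec q)\,\overline{\left(\int_A g(\vec p)\,\psi_K(\vec q - \vec p)\,d\vec p\right)}\,d\vec q \\
&= \langle\rho, E_{m'}g * \psi_K\rangle = \langle\rho, C_Kg\rangle
\end{aligned}
$$

Therefore, $C_K = C_K^*$.

Theorem 9 Let $A = [a, b] \times [c, d]$ be a rectangle in $\mathbb{R}^2$. Then $C_K : L^2(A) \to L^2(A)$ is positive.

Proof For each natural number $N$, let $P_N = \{A_i\}_{i=1}^N$ be a partition of $A$ into $N$ disjoint rectangles of equal positive measure whose union is $A$ and is a refinement of $P_{N-1}$. Then, if $\rho \in C(A)$, $\langle\rho, C_K\rho\rangle$ can be written as follows: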
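$$
\begin{aligned}
\langle\rho, C_K\rho\rangle &= \int_A \rho(\vec p)\,\overline{C_K\rho(\vec p)}\,d\vec p \\
&= \int_A \rho(\vec p)\,\overline{\left(\int_A \rho(\vec q)\,\psi_K(\vec p - \vec q)\,d\vec q\right)}\,d\vec p \\
&= \int_A\int_A \rho(\vec p)\,\overline{\psi_K(\vec p - \vec q)\,\rho(\vec q)}\,d\vec p\,d\vec q \\
&= \lim_{N\to\infty}\sum_{k=1}^N\sum_{\ell=1}^N \rho(\vec\xi_\ell)\,\overline{\psi_K(\vec\xi_\ell - \vec\xi_k)\,\rho(\vec\xi_k)}\,\frac{|A|^2}{N^2}
\end{aligned}
$$

where $\vec\xi_j \in A_j$. In matrix form, this is

$$
\langle\rho, C_K\rho\rangle = \lim_{N\to\infty}
\begin{pmatrix}\overline{\rho(\vec\xi_1)} & \cdots & \overline{\rho(\vec\xi_N)}\end{pmatrix}
\begin{pmatrix}
\psi_K(\vec\xi_1 - \vec\xi_1) & \cdots & \psi_K(\vec\xi_1 - \vec\xi_N) \\
\vdots & & \vdots \\
\psi_K(\vec\xi_N - \vec\xi_1) & \cdots & \psi_K(\vec\xi_N - \vec\xi_N)
\end{pmatrix}
\begin{pmatrix}\rho(\vec\xi_1) \\ \vdots \\ \rho(\vec\xi_N)\end{pmatrix}
\frac{|A|^2}{N^2}
$$

This defines a convergent sequence in $\mathbb{R}$ in which every member of the sequence is nonnegative, because $[\psi_K(\vec\xi_i - \vec\xi_j)]$ is a positive matrix ($\psi_K$ is a function of positive type; see Definition 3). Therefore, $\langle\rho, C_K\rho\rangle \ge 0$ for $\rho \in C(A)$.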
Because $C(A)$ is dense in $L^2(A)$ and both $\langle\cdot,\cdot\rangle$ and $C_K$ are continuous, it may be concluded that $\langle\rho, C_K\rho\rangle \ge 0$ for all $\rho \in L^2(A)$. Thus $C_K$ is positive.

Clearly, Theorem 9 will be true for $A$ more general than a rectangle, but that line of thought is not pursued here. If we compute $C_KC_K$ we will find that $C_K$ is not idempotent, so that $C_K$ is not a projection on $L^2(A)$. But $C_K$ is a self-adjoint positive operator on $L^2(A)$ when $A \subset \mathbb{R}^2$ is a rectangle. From this it can be immediately inferred that the spectrum of $C_K$ consists only of eigenvalues that are nonnegative real numbers. The eigenvalues of $C_K$ are those numbers $\lambda$ that allow

$$C_K\rho = \lambda\rho \tag{56}$$

to have nonzero solutions for $\rho$, the eigenfunctions. According to Proposition 3, Equation (56) can be written as

$$\int_A \rho(\vec q)\,\psi_K(\vec p - \vec q)\,d\vec q = \lambda\,\rho(\vec p) \tag{57}$$
which is a homogeneous Fredholm integral equation of the second kind. Fredholm's theorem (see, for example, Tricomi [1985]) gives explicit formulas for computing the eigenvalues and eigenfunctions of Equation (57). If the kernel of the integral equation is symmetric, i.e., if $K(\vec p, \vec q) = \overline{K(\vec q, \vec p)}$, then the Hilbert-Schmidt theorem (again, Tricomi [1985], for example) will give more information on the nature of the eigenvectors of the integral operator. Because the kernel of the integral operator $C_K$, $K(\vec p, \vec q) = \psi_K(\vec p - \vec q)$, is symmetric, as Equation (67) shows, the Hilbert-Schmidt theorem is directly relevant to $C_K$. Because $C_K$ is a positive bounded operator, its eigenvalues are contained in a bounded set of nonnegative real numbers.

A natural question to ask is whether or not $C_K$ is one-to-one; i.e., is zero an eigenvalue of $C_K$ or not? Intuitively it would seem that $C_K$ is not one-to-one and the kernel of $C_K$ would contain functions whose Fourier transforms are zero on the set $K$, provided such functions exist. If $A$ is a compact set, the Paley-Wiener theorem tells us that the Fourier transform of a function in $L^2(A)$ must be analytic. So, some hints on how to construct such a kernel function might be had from the theory of analytic functions of several complex variables.

A function in the kernel of $C_K$ can be constructed for a very specific kind of set $A$ and k-plane curve. Let $A = \{(x, y) \in \mathbb{R}^2 \mid -R \le x \le R,\ -R \le y \le R\}$. Then

$$\hat\chi_A(k_x, k_y) = 4R^2\,\frac{\sin[2\pi Rk_x]}{2\pi Rk_x}\,\frac{\sin[2\pi Rk_y]}{2\pi Rk_y}$$

and, as well, any function of the form

$$\hat\chi_{[-A,A]\times[-A,A]}(k_x, k_y) = 4A^2\,\frac{\sin[2\pi Ak_x]}{2\pi Ak_x}\,\frac{\sin[2\pi Ak_y]}{2\pi Ak_y} \tag{58}$$

with $A \le R$ will have as its inverse Fourier transform the characteristic function of a square, $[-A, A] \times [-A, A]$, that is contained in $A$. The function of Equation (58) is zero on the k-plane curve whose trace is $K_B = \text{boundary}[\{(k_x, k_y) \in \mathbb{R}^2 \mid -B \le k_x \le B,\ -B \le k_y \le B\}]$, where $B = 1/(2A)$, because $\sin[2\pi AB] = \sin\pi = 0$ on every side of that square. So if $K = K_B$ for $B = 1/(2A)$ and $A \le R$, the function $\chi_{[-A,A]\times[-A,A]} \in L^p(A)$ will be a nonzero function in the kernel of $C_K$.

V. CURVE BAND-PASS OPERATOR POINT-SPREAD FUNCTIONS

Here a closer look is taken at the nature of the PSF defined by

$$\psi(\vec p) = \int_{t_I}^{t_F} e^{2\pi i\vec K(t)\cdot\vec p} f(t)\,dt \tag{59}$$
where $\vec K : [t_I, t_F] \to \mathbb{R}^2$ is a closed, bounded function (so that $K = \vec K([t_I, t_F])$ is compact) that is one-to-one and $f$ is a nonnegative function in $L^1[t_I, t_F]$ (so that $d\mu(t) = f(t)\,dt$ gives a positive measure on $K$). It is already known that $\psi \in C^\omega(\mathbb{R}^2)$ from Lemma 1, applied with the choice of $S$ associated with $K$. It can also be concluded, from Lemma 1 and the Paley-Wiener theorem for distributions, that

$$|\psi(\vec p)| \le C(1 + \|\vec p\|)^N \tag{60}$$

for some nonnegative constants $C$ and $N$.

A. The PSF as an Inverse Fourier Transform of a Finite Nonnegative Lebesgue Measure

To inquire a little further, some facts about tempered distributions are needed. If $\mu$ is a finite, nonnegative measure on $\mathbb{R}^2$, then it can be used to define a tempered distribution $T_\mu$ in $\mathcal{S}'(\mathbb{R}^2)$, as follows. For a Schwartz function $g \in \mathcal{S}(\mathbb{R}^2)$, define

$$T_\mu(g) = \int_{\mathbb{R}^2} g(\vec k)\,d\mu(\vec k) \tag{61}$$
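Usually (in Reed and Simon [1972], for example) the Fourier transform of the finite nonnegative measure $\mu$ would now be defined. But to keep a closer connection to the function, $\psi$, of interest, the inverse Fourier transform of $\mu$ will be defined, the difference being in the sign of the exponent of $e$ in the definition. The interesting mathematical properties of the Fourier transform and the inverse Fourier transform are identical. So, without further ado, define the function $\check\mu = \mathcal{F}^{-1}\mu$ by

$$\check\mu(\vec p) = \int_{\mathbb{R}^2} e^{2\pi i\vec p\cdot\vec k}\,d\mu(\vec k) \tag{62}$$

With $\mu(\vec k)$ defined by $d\mu(t) = f(t)\,dt$ on $K$ and zero elsewhere on $\mathbb{R}^2$, we have $\psi = \check\mu$. It turns out that $T_{\check\mu} = \check T_\mu$, as one can see from the following calculation; let $g \in \mathcal{S}(\mathbb{R}^2)$:

$$
\begin{aligned}
T_{\check\mu}(g) &= \int_{\mathbb{R}^2} \check\mu(\vec p)\,g(\vec p)\,d\vec p \\
&= \int_{\mathbb{R}^2}\int_{\mathbb{R}^2} e^{2\pi i\vec p\cdot\vec k}\,d\mu(\vec k)\,g(\vec p)\,d\vec p \\
&= \int_{\mathbb{R}^2}\left(\int_{\mathbb{R}^2} e^{2\pi i\vec p\cdot\vec k}\,g(\vec p)\,d\vec p\right)d\mu(\vec k) \\
&= \int_{\mathbb{R}^2} \check g(\vec k)\,d\mu(\vec k) \\
&= \check T_\mu(g)
\end{aligned}
$$

so this definition of inverse Fourier transforms of nonnegative measures coincides with the restriction of the inverse Fourier transform on $\mathcal{S}'(\mathbb{R}^2)$ to the nonnegative measures (i.e., to distributions of the form $T_\mu$, where $\mu$ is a nonnegative measure).

Suppose $\vec\xi_1, \ldots, \vec\xi_N \in \mathbb{R}^2$ and $\vec\vartheta = (\vartheta_1, \ldots, \vartheta_N) \in \mathbb{C}^N$. Then

$$\sum_{\ell=1}^N\sum_{j=1}^N \check\mu(\vec\xi_\ell - \vec\xi_j)\,\overline{\vartheta_j}\,\vartheta_\ell = \int_{\mathbb{R}^2}\left|\sum_{\ell=1}^N \vartheta_\ell\,e^{2\pi i\vec\xi_\ell\cdot\vec k}\right|^2 d\mu(\vec k) \ge 0$$

This shows that the function $\check\mu$ has the property that for any $\vec\xi_1, \ldots, \vec\xi_N \in \mathbb{R}^2$, $[\check\mu(\vec\xi_i - \vec\xi_j)]$ is the matrix of a positive operator on $\mathbb{C}^N$. Furthermore, by the dominated convergence theorem, $\check\mu$ is continuous, and because

$$|\check\mu(\vec\xi)| \le \int_{\mathbb{R}^2} |e^{2\pi i\vec\xi\cdot\vec k}|\,d\mu(\vec k) = \mu(\mathbb{R}^2)$$

$\check\mu$ is also bounded.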
Definition 3 A complex-valued, bounded, continuous function $f$ on $\mathbb{R}^2$ that has the property that $[f(\vec\xi_i - \vec\xi_j)]$ is a positive matrix on $\mathbb{C}^N$ for each $N$ and all $\vec\xi_1, \ldots, \vec\xi_N \in \mathbb{R}^2$ is called a function of positive type.

There are three easy properties of functions of positive type which follow from the definition. Letting $N = 1$ and $\vec p \in \mathbb{R}^2$:

$$f(\vec 0) \ge 0 \tag{63}$$

because $f(\vec 0)$ is a positive operator on $\mathbb{C}^1$. Letting $N = 2$ and choosing $\vec\xi_1 = \vec p$, $\vec\xi_2 = \vec 0$, it is seen that the matrix

$$\begin{pmatrix} f(\vec 0) & f(\vec p) \\ f(-\vec p) & f(\vec 0) \end{pmatrix}$$

must be positive and therefore self-adjoint with nonnegative determinant. This implies that

$$f(-\vec p) = \overline{f(\vec p)} \tag{64}$$

and

$$|f(\vec p)| \le f(\vec 0) \tag{65}$$

Although it shall not be used here, it is interesting to note [Reed and Simon, 1972] the following.

Theorem 10 (Bochner's theorem) The set of inverse Fourier transforms of finite, nonnegative measures on $\mathbb{R}^2$ is exactly the cone of functions of positive type.

In other words, the functions of positive type are all potential point-spread functions!

B. PSFs Associated with Curved k-Plane Trajectories Are Peaked

It has been shown that $\psi$ is the inverse Fourier transform of a nonnegative measure on $\mathbb{R}^2$ and so is a function of positive type. The PSF therefore satisfies Equations (63), (64), and (65). Explicitly:

$$\psi(\vec 0) \ge 0 \tag{66}$$

$$\psi(-\vec p) = \overline{\psi(\vec p)} \tag{67}$$

and

$$|\psi(\vec p)| \le \psi(\vec 0) \tag{68}$$
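The positive-type identity and the properties (66) to (68) can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes a circular k-plane curve with $f = 1$, replaces the measure by an equally weighted set of sample points (the names `circle_measure`, `psf`, `quadratic_form`, and `integral_form` are hypothetical), and compares the double sum over $\check\mu(\vec\xi_\ell - \vec\xi_j)$ with the integral of the squared modulus.

```python
import cmath
import math
import random


def circle_measure(R=1.0, n=64):
    """Discretize d(mu) = f(t) dt on K(t) = R (cos t, sin t) with f = 1.

    Returns a list of (k-plane point, weight) pairs; this discretization is
    an assumption of the sketch, not something taken from the text.
    """
    w = 2.0 * math.pi / n
    return [((R * math.cos(2.0 * math.pi * m / n),
              R * math.sin(2.0 * math.pi * m / n)), w) for m in range(n)]


def psf(p, measure):
    """Inverse Fourier transform of the discrete measure, as in Equation (62)."""
    return sum(w * cmath.exp(2j * math.pi * (k[0] * p[0] + k[1] * p[1]))
               for k, w in measure)


def quadratic_form(points, coeffs, measure):
    """Double sum of psf(xi_l - xi_j) * conj(c_j) * c_l."""
    total = 0j
    for xl, cl in zip(points, coeffs):
        for xj, cj in zip(points, coeffs):
            d = (xl[0] - xj[0], xl[1] - xj[1])
            total += psf(d, measure) * cj.conjugate() * cl
    return total


def integral_form(points, coeffs, measure):
    """Integral of |sum_l c_l exp(2 pi i xi_l . k)|^2 against the measure."""
    return sum(
        w * abs(sum(c * cmath.exp(2j * math.pi * (x[0] * k[0] + x[1] * k[1]))
                    for x, c in zip(points, coeffs))) ** 2
        for k, w in measure)


rng = random.Random(0)
measure = circle_measure()
points = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(6)]
coeffs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(6)]
lhs = quadratic_form(points, coeffs, measure)
rhs = integral_form(points, coeffs, measure)
```

Both sides agree up to rounding, and the right-hand side is a sum of nonnegative terms, which is exactly why the matrix $[\check\mu(\vec\xi_i - \vec\xi_j)]$ is positive.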
Equation (68) says that $\psi$ is trying to be a function peaked at $\vec 0$. To get a better idea about the decay properties $\psi$ will have at infinity, methods from the theory of stationary phase will be useful, with the main result (Corollary 12) being that $\psi$ will decay as $\|\vec p\|^{-1/2}$ when the curve $\vec K$ has nonvanishing curvature. The following theorem is a rewording of Corollary 1.1.3 of Sogge [1993], who attributes the core of its proof to Van der Corput.

Theorem 11 (Van der Corput) Let $\Phi \in C^\infty(\mathbb{R}^2; \mathbb{R})$ be such that

$$\frac{\partial\Phi}{\partial y}(0, 0) = 0 \quad\text{but}\quad \frac{\partial^2\Phi}{\partial y^2}(0, 0) \ne 0$$

where $y$ refers to the second factor. Then, by the implicit function theorem, there is a $C^\infty$ function $h$ that is the unique solution to the equation

$$\frac{\partial\Phi}{\partial y}(x, h(x)) = 0$$

in some compact neighborhood, $N = [-c, c]$, $c < \infty$, of 0 that satisfies $h(0) = 0$. Assume that $\frac{\partial\Phi}{\partial y}(x, y) \ne 0$ for $(x, y) \in S = \{(x, y) \mid x \in [-c, c],\ |y - h(x)| \le c,\ y \ne h(x)\}$. Let $a : \mathbb{R} \times \mathbb{R}^2 \to \mathbb{C}$ be a $C^\infty$ function such that $a(\lambda, \cdot, \cdot)$ has support in $\bar S = \{(x, y) \mid x \in [-c, c],\ |y - h(x)| \le c\}$ and $|a(\lambda, x, y)| \le \alpha < \infty$ for all $\lambda$, $x$ and $y$. Let

$$I(x, \lambda) = \int_{-\infty}^{\infty} e^{2\pi i\lambda\Phi(x, y)}\,a(\lambda, x, y)\,dy$$

Then for $x \in N$, $\lambda \ge 0$,

$$|I(x, \lambda)| \le K(1 + \lambda)^{-1/2}$$

where $K$ is a constant.

Corollary 12 Let $\vec K : [t_I, t_F] \to \mathbb{R}^2$ be a $C^\infty$ curve such that $\vec K''(t) \ne \vec 0$ and $\vec K'(t) \ne \vec 0$ for all $t \in [t_I, t_F]$. Let

$$\psi(\vec p) = \int_{t_I}^{t_F} e^{2\pi i\vec p\cdot\vec K(t)} f(t)\,dt$$

where $f \in C^\infty[t_I, t_F]$. Then

$$|\psi(\vec p)| \le K(1 + \|\vec p\|)^{-1/2}$$

where $K$ is a constant.
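Proof Without loss of generality, because $\vec K'(t) \ne \vec 0$, it can be assumed that $t$ is arclength. This is because, in general, if the arclength is given by $s = g(t)$, then

$$\psi(\vec p) = \int_{g(t_I)}^{g(t_F)} e^{2\pi i\vec p\cdot\tilde{\vec K}(s)} F(s)\,ds$$

where

$$\tilde{\vec K}(s) = \vec K(g^{-1}(s)) \quad\text{and}\quad F(s) = f(g^{-1}(s))\,\frac{dg^{-1}}{ds}(s)$$

Define $\Phi : [-\pi, \pi] \times [t_I, t_F] \to \mathbb{R}$ by

$$\Phi(\theta, t) = \vec\nu(\theta)\cdot\vec K(t)$$

where $\vec\nu(\theta) = (\cos\theta, \sin\theta)$. Then

$$\psi(\vec p) = \int_{-\infty}^{\infty} e^{2\pi i\|\vec p\|(\vec p/\|\vec p\|)\cdot\vec K(t)}\,\chi_{[t_I,t_F]}(t)\,f(t)\,dt = I(\theta, \lambda) = \int_{-\infty}^{\infty} e^{2\pi i\lambda\Phi(\theta, t)}\,\chi_{[t_I,t_F]}(t)\,f(t)\,dt$$

where $\lambda = \|\vec p\|$ and $\vec\nu(\theta) = \vec p/\|\vec p\|$ when $\vec p \ne \vec 0$. (Note that the boundedness of $f$ implies the boundedness of $\psi(\vec 0)$.) For each $t_0 \in [t_I, t_F]$, there exists $\theta_{t_0} \in [-\pi, \pi]$ such that

$$\vec\nu(\theta_{t_0}) = \frac{\vec K''(t_0)}{\|\vec K''(t_0)\|}$$

Define

$$\Phi_0(\theta, t) = \Phi(\theta + \theta_{t_0}, t + t_0)$$

Then

$$\frac{\partial\Phi_0}{\partial t}(0, 0) = \frac{\partial\Phi}{\partial t}(\theta_{t_0}, t_0) = \vec\nu(\theta_{t_0})\cdot\vec K'(t_0) = \frac{\vec K''(t_0)}{\|\vec K''(t_0)\|}\cdot\vec K'(t_0) = 0$$

(the last equality holds because $t$ is arclength, so that $\vec K'$ and $\vec K''$ are orthogonal) and

$$\frac{\partial^2\Phi_0}{\partial t^2}(0, 0) = \frac{\partial^2\Phi}{\partial t^2}(\theta_{t_0}, t_0) = \vec\nu(\theta_{t_0})\cdot\vec K''(t_0) = \frac{\vec K''(t_0)}{\|\vec K''(t_0)\|}\cdot\vec K''(t_0) = \|\vec K''(t_0)\| \ne 0$$

Then, by the implicit function theorem, there is a $C^\infty$ function $h_0$ that is the unique solution to the equation

$$\frac{\partial\Phi_0}{\partial t}(\theta, h_0(\theta)) = 0$$

in some compact neighborhood, $N_{t_0} = [-c_0, c_0]$, of 0 that satisfies $h_0(0) = 0$. Also, $\frac{\partial\Phi_0}{\partial t}(\theta, t) \ne 0$ on $S = \{(\theta, t) \mid \theta \in [-c_0, c_0],\ |t - h_0(\theta)| \le c_0,\ t \ne h_0(\theta)\}$ by choosing $c_0$ small enough. Cover $[t_I, t_F]$ with all such sets $N_{t_0} + t_0$. Then, because $[t_I, t_F]$ is compact, there is a finite set of points $\{t_i\}_{i=1}^N \subset [t_I, t_F]$ such that the $N_{t_i} + t_i$ form an open cover of $[t_I, t_F]$. Let $\{a_i\}_{i=1}^N$ be a $C^\infty$ partition of unity on this cover and let

$$I_i(\theta, \lambda) = \int_{-\infty}^{\infty} e^{2\pi i\lambda\Phi(\theta, t)}\,a_i(t)\,f(t)\,dt$$

Then Theorem 11 implies that

$$|I_i(\theta, \lambda)| \le K_i(1 + \lambda)^{-1/2}$$

for some constant $K_i$. Because $I = \sum_{i=1}^N I_i$, it is true that

$$|I(\theta, \lambda)| \le K(1 + \lambda)^{-1/2}$$

where $K = \sum_{i=1}^N K_i$, which is equivalent to

$$|\psi(\vec p)| \le K(1 + \|\vec p\|)^{-1/2}$$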
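The decay rate of Corollary 12 can be observed numerically for a curve of nonvanishing curvature. The following is a minimal sketch under stated assumptions: the curve is taken to be the unit circle $\vec K(t) = (\cos t, \sin t)$ on $[-\pi, \pi]$ with $f = 1$, the PSF is evaluated at $\vec p = (r, 0)$ by the trapezoid rule, and the names `circle_psf`, `radii`, `scaled`, and `K_est` are hypothetical. It only estimates a constant $K$ on a finite grid and proves nothing.

```python
import math


def circle_psf(r, R=1.0, n=2048):
    """psi(p) at p = (r, 0) for K(t) = R (cos t, sin t), t in [-pi, pi], f = 1.

    The imaginary part cancels by symmetry, so only the cosine is summed.
    The trapezoid rule over one full period is an assumption of this sketch.
    """
    h = 2.0 * math.pi / n
    return h * sum(math.cos(2.0 * math.pi * r * R * math.cos(-math.pi + m * h))
                   for m in range(n))


# Sample ||p|| from 0 to 40 and scale by (1 + ||p||)^(1/2).
radii = [0.1 * j for j in range(401)]
scaled = [abs(circle_psf(r)) * math.sqrt(1.0 + r) for r in radii]

# An empirical stand-in for the constant K of Corollary 12 on this grid.
K_est = max(scaled)
```

On this grid the scaled values stay bounded, which is consistent with $|\psi(\vec p)| \le K(1 + \|\vec p\|)^{-1/2}$; a straight segment, whose curvature vanishes, would not show this behavior in the direction perpendicular to the segment.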
C. The PSFs Associated with Lines, Squares and Circles in k-Space

Next, we take a more computational look at what kind of function $\psi$ is. Because the unknown spin-density function, $\rho$, is real, only k-plane curves that give rise to real $\psi$ are of interest. The next proposition gives a sufficient condition for this to be so and is a reflection of the Hermitian conjugation property of the Fourier transform of a real function.

Proposition 13 Let $\vec K : [-T, T] \to \mathbb{R}^2$ satisfy

$$\vec K(-t) = -\vec K(t) \tag{69}$$

and suppose $f \in L^1[-T, T]$ is real and satisfies $f(-t) = f(t)$. Then

$$\psi(\vec p) = \int_{-T}^{T} e^{2\pi i\vec K(t)\cdot\vec p} f(t)\,dt$$

is real.

Proof Compute:

$$
\begin{aligned}
\psi(\vec p) &= \int_{-T}^{0} e^{2\pi i\vec K(t)\cdot\vec p} f(t)\,dt + \int_0^T e^{2\pi i\vec K(t)\cdot\vec p} f(t)\,dt \\
&= -\int_T^0 e^{2\pi i\vec K(-t)\cdot\vec p} f(-t)\,dt + \int_0^T e^{2\pi i\vec K(t)\cdot\vec p} f(t)\,dt \\
&= \int_0^T e^{-2\pi i\vec K(t)\cdot\vec p} f(t)\,dt + \int_0^T e^{2\pi i\vec K(t)\cdot\vec p} f(t)\,dt \\
&= \int_0^T \left(e^{2\pi i\vec K(t)\cdot\vec p} + e^{-2\pi i\vec K(t)\cdot\vec p}\right) f(t)\,dt \\
&= 2\int_0^T \cos[2\pi\vec K(t)\cdot\vec p]\,f(t)\,dt
\end{aligned}
\tag{70}
$$

which is real.

When $\rho$ is real, $\hat\rho(-\vec p) = \overline{\hat\rho(\vec p)}$. So when a curve $\vec K_1 : [0, T] \to \mathbb{R}^2$ is used to generate a signal, $S$, on $[0, T]$, the signal can be extended to $[-T, 0]$ by setting $S(-t) = \overline{S(t)}$. Extending $S$ in this way is equivalent to collecting a signal generated via the curve $\vec K : [-T, T] \to \mathbb{R}^2$ where $\vec K(t) = \vec K_1(t)$ for $t \ge 0$ and $\vec K(t) = -\vec K_1(-t)$ for $t < 0$. Since it is easy to arrange for $t_I = 0$ by a simple time translation, it will be assumed here that the k-plane curve is symmetrical about the origin.

Because many curves can be approximated by polygonal curves, it is natural to compute the PSF associated with a straight-line segment. Specifically,
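compute the PSF associated with $f(t) = 1$ and the k-plane curve given by

$$
\vec K(t) = \begin{cases}
-\vec k_1 + \dfrac{t}{T}\vec v & \text{for } -T \le t < 0 \\[2ex]
\vec k_1 + \dfrac{t}{T}\vec v & \text{for } 0 \le t \le T
\end{cases}
\tag{71}
$$

where $\vec v = \vec k_2 - \vec k_1$ and $\vec k_1$ and $\vec k_2$ are fixed vectors. This curve is shown in Figure 8. Then, using Equation (70):

$$
\begin{aligned}
\psi(\vec p) &= 2\int_0^T \cos\left[2\pi\left(\vec k_1 + \frac{t}{T}\vec v\right)\cdot\vec p\right]dt \\
&= 2\int_0^T \cos\left[2\pi\vec k_1\cdot\vec p + 2\pi\frac{t}{T}\vec v\cdot\vec p\right]dt \\
&= 2\int_0^T \cos[2\pi\vec k_1\cdot\vec p]\cos\left[2\pi\frac{t}{T}\vec v\cdot\vec p\right]dt - 2\int_0^T \sin[2\pi\vec k_1\cdot\vec p]\sin\left[2\pi\frac{t}{T}\vec v\cdot\vec p\right]dt \\
&= 2T\cos[2\pi\vec k_1\cdot\vec p]\,\frac{\sin[2\pi\vec v\cdot\vec p]}{2\pi\vec v\cdot\vec p} + 2T\sin[2\pi\vec k_1\cdot\vec p]\,\frac{\cos[2\pi\vec v\cdot\vec p] - 1}{2\pi\vec v\cdot\vec p}
\end{aligned}
$$

FIGURE 8. Nomenclature used for line-segment k-plane curve.

Using the relationships

$$\sin A\cos B + \cos A\sin B = \sin(A + B)$$

and

$$\sin\alpha - \sin\beta = 2\cos\left(\frac{\alpha + \beta}{2}\right)\sin\left(\frac{\alpha - \beta}{2}\right)$$

obtain:

$$
\begin{aligned}
\psi(\vec p) &= \frac{2T\sin[2\pi(\vec k_1 + \vec v)\cdot\vec p] - 2T\sin[2\pi\vec k_1\cdot\vec p]}{2\pi\vec v\cdot\vec p} \\
&= \frac{2T\left(\sin[2\pi\vec k_2\cdot\vec p] - \sin[2\pi\vec k_1\cdot\vec p]\right)}{2\pi\vec v\cdot\vec p} \\
&= 4T\,\frac{\cos\left[2\pi\left(\dfrac{\vec k_2 + \vec k_1}{2}\right)\cdot\vec p\right]\sin\left[2\pi\left(\dfrac{\vec k_2 - \vec k_1}{2}\right)\cdot\vec p\right]}{2\pi(\vec k_2 - \vec k_1)\cdot\vec p}
\end{aligned}
$$

so that

$$\psi(\vec p) = 2T\cos[2\pi\vec u\cdot\vec p]\,\frac{\sin[2\pi\vec w\cdot\vec p]}{[2\pi\vec w\cdot\vec p]} \tag{72}$$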
where $\vec u = (\vec k_2 + \vec k_1)/2$ and $\vec w = (\vec k_2 - \vec k_1)/2 = \vec v/2$. The function of Equation (72) clearly does not decay perpendicular to the $\vec v$ direction. This is consistent with Corollary 12, because the curvature of the curve of Equation (71) is zero everywhere.

Equation (72) can immediately be used to give a formula for a general polygonal curve when $f(t) = 1$. Referring to Figure 9, describe the general polygonal curve $\vec K : [-T, T] \to \mathbb{R}^2$, where $[0, T] = \bigcup_{i=1}^n [t_{i-1}, t_i]$, $0 = t_0 < \cdots < t_{i-1} < t_i < \cdots < t_n$, by

$$
\vec K(t) = \begin{cases}
-\vec k^+_{i-1} + \dfrac{t + t_{i-1}}{T_i}\,\vec v_i & \text{for } -t_i < t < -t_{i-1} \\[2ex]
\vec k^+_{i-1} + \dfrac{t - t_{i-1}}{T_i}\,\vec v_i & \text{for } t_{i-1} < t < t_i
\end{cases}
\tag{73}
$$

where $\vec v_i = \vec k_i - \vec k^+_{i-1}$ and $T_i = t_i - t_{i-1}$ (so $\sum_{i=1}^n T_i = T$). Note that it is possible for the curve to be discontinuous at the endpoints of the line segments if $\vec k_i \ne \vec k^+_i$. Such a situation could be used to describe the complete spin warp k-plane trajectory. If $\vec k_i = \vec k^+_i$, the $\vec k^+_i$ notation will be dropped. Applying Equation (72), the following expression for the PSF associated with the polygonal curve is obtained:

$$\psi(\vec p) = \sum_{i=1}^n 2T_i\cos[2\pi\vec u_i\cdot\vec p]\,\frac{\sin[2\pi\vec w_i\cdot\vec p]}{[2\pi\vec w_i\cdot\vec p]} \tag{74}$$

where

$$\vec u_i = \frac{\vec k_i + \vec k^+_{i-1}}{2} \quad\text{and}\quad \vec w_i = \frac{\vec k_i - \vec k^+_{i-1}}{2}$$

FIGURE 9. Nomenclature used for a continuous polygonal k-plane curve.
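Equation (74) translates directly into a few lines of code. The following is a minimal sketch, assuming a continuous polygonal curve (so $\vec k^+_i = \vec k_i$) with $f(t) = 1$; the function names `sinc`, `polygonal_psf`, and `polygonal_psf_quadrature` are hypothetical. The second function evaluates Equation (70) by the midpoint rule so that the closed form can be compared against direct integration.

```python
import math


def sinc(x):
    """sin(x)/x with the removable singularity at x = 0 filled in."""
    return 1.0 if abs(x) < 1e-12 else math.sin(x) / x


def polygonal_psf(p, vertices, durations):
    """Equation (74) for a continuous polygonal curve with f(t) = 1.

    vertices  : [k_0, k_1, ..., k_n], the t >= 0 half of the curve
    durations : [T_1, ..., T_n], the time spent on each segment
    """
    total = 0.0
    for ka, kb, Ti in zip(vertices[:-1], vertices[1:], durations):
        u = ((kb[0] + ka[0]) / 2.0, (kb[1] + ka[1]) / 2.0)
        w = ((kb[0] - ka[0]) / 2.0, (kb[1] - ka[1]) / 2.0)
        total += (2.0 * Ti
                  * math.cos(2.0 * math.pi * (u[0] * p[0] + u[1] * p[1]))
                  * sinc(2.0 * math.pi * (w[0] * p[0] + w[1] * p[1])))
    return total


def polygonal_psf_quadrature(p, vertices, durations, n=20000):
    """Equation (70), 2 * int_0^T cos[2 pi K(t).p] dt, by the midpoint rule."""
    total = 0.0
    for ka, kb, Ti in zip(vertices[:-1], vertices[1:], durations):
        h = Ti / n
        for m in range(n):
            s = (m + 0.5) / n  # fraction of the way along the segment
            kx = ka[0] + s * (kb[0] - ka[0])
            ky = ka[1] + s * (kb[1] - ka[1])
            total += 2.0 * h * math.cos(2.0 * math.pi * (kx * p[0] + ky * p[1]))
    return total


# Two sides of a square with vertices (R, -R), (R, R), (-R, R).
R, T = 0.2, 2.0
square_vertices = [(R, -R), (R, R), (-R, R)]
square_durations = [T / 2.0, T / 2.0]
value_at_origin = polygonal_psf((0.0, 0.0), square_vertices, square_durations)
```

At the origin every cosine and sinc factor equals 1, so the value there is $\sum_i 2T_i = 2T$, the total length of the parameter interval $[-T, T]$.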
Equation (74) will now be applied to the case when the k-plane curve is a square. In order to see more clearly what is going on, the expression for the PSF associated with the square will be derived in two steps. First, compute the PSF associated with Equation (73) when $n = 1$, $\vec k_0 = (R, -R)$, and $\vec k_1 = (R, R)$, which gives the k-plane curve shown in Figure 10(a). In this case $\vec w_1 = (0, R)$ and $\vec u_1 = (R, 0)$, so

$$\psi(x, y) = 2T\cos[2\pi Rx]\,\frac{\sin[2\pi Ry]}{[2\pi Ry]} \tag{75}$$

The PSF of Equation (75) is shown in Figure 10(b). The PSF associated with a pair of horizontal lines is obtained from Equation (75) by interchanging $x$ and $y$. Then the PSF associated with the square shown in Figure 11(a) is had by adding the contributions from the vertical pair and the horizontal pair, each segment now taking time $T/2$. So the PSF associated with the square with vertices $(\pm R, \pm R)$, with $f(t) = 1$, is given by

$$\psi(x, y) = T\cos[2\pi Rx]\,\frac{\sin[2\pi Ry]}{[2\pi Ry]} + T\cos[2\pi Ry]\,\frac{\sin[2\pi Rx]}{[2\pi Rx]} \tag{76}$$

This PSF is plotted in Figure 11(b).

FIGURE 10. (a) Vertical straight segment k-plane curve. (b) The PSF associated with the k-plane curve with $T = 1$, $R = \frac15$.

FIGURE 11. (a) Square k-plane curve. (b) The PSF associated with the k-plane curve with $T = 2$, $R = \frac15$.
In computing Equation (76), the k-plane curve started at $(R, -R)$ at $t = -T$, moved to $(-R, -R)$ at $t = -T/2$ and then to $(-R, R)$ at $t = 0^-$. The curve value then jumped to $(R, -R)$ at $t = 0^+$, moved to $(R, R)$ at $t = T/2$, and moved to $(-R, R)$ at $t = T$. What is the PSF that is associated with a continuous k-plane curve whose trace is a square? The following proposition shows that the PSF is the same.

Proposition 14 Let $\vec K$ and $\vec L : [0, T] \to \mathbb{R}^2$ be two k-plane curves that satisfy $\vec L(t) = \vec K(T - t)$ and let $f, g \in L^1[0, T]$ be two functions that satisfy $g(t) = f(T - t)$. If

$$\psi(\vec p) = \int_0^T e^{2\pi i\vec K(t)\cdot\vec p} f(t)\,dt$$

and

$$\tilde\psi(\vec p) = \int_0^T e^{2\pi i\vec L(t)\cdot\vec p} g(t)\,dt$$

then $\psi = \tilde\psi$.

Proof

$$
\begin{aligned}
\psi(\vec p) &= \int_0^T e^{2\pi i\vec K(t)\cdot\vec p} f(t)\,dt \\
&= -\int_T^0 e^{2\pi i\vec K(T - \tau)\cdot\vec p} f(T - \tau)\,d\tau \\
&= \int_0^T e^{2\pi i\vec L(t)\cdot\vec p} g(t)\,dt \\
&= \tilde\psi(\vec p)
\end{aligned}
$$

So let $\vec a : [-T, 0) \to \mathbb{R}^2$ be the k-plane curve that begins at $(R, -R)$ at $t = -T$, moves to $(-R, -R)$ at $t = -T/2$ and then moves to $(-R, R)$ at $t = 0^-$. Let $\vec b : [-T, 0) \to \mathbb{R}^2$ be the k-plane curve that begins at $(-R, R)$ at $t = -T$, moves to $(-R, -R)$ at $t = -T/2$, and then moves to $(R, -R)$ at $t = 0^-$. And let $\vec c : [0, T] \to \mathbb{R}^2$ be the k-plane curve that begins at $(R, -R)$ at $t = 0$, moves to $(R, R)$ at $t = T/2$, and then moves to $(-R, R)$ at $t = T$. The objective is to argue that the PSF associated with $\vec a + \vec c : [-T, T] \to \mathbb{R}^2$, which is given by Equation (76), is the same as the PSF associated with $\vec b + \vec c : [-T, T] \to \mathbb{R}^2$, where k-plane curve addition is defined in the obvious tail-to-head sense. Let $\alpha_R + i\alpha_I$ be the PSF associated with $\vec a$, let $\beta_R + i\beta_I$ be the PSF associated with $\vec b$, and let $\gamma_R + i\gamma_I$ be the PSF associated with $\vec c$, where the subscripts $R$ and $I$ refer to the real and imaginary parts of the functions. Then the PSF associated with $\vec a + \vec c$ is $(\alpha_R + \gamma_R) + i(\alpha_I + \gamma_I) = \psi$, where $\psi$ is the real function of Equation (76). The PSF associated with $\vec b + \vec c$ is $(\beta_R + \gamma_R) + i(\beta_I + \gamma_I)$. But, according to Proposition 14, $\alpha_R + i\alpha_I = \beta_R + i\beta_I$, so the PSF associated with $\vec b + \vec c$ is $(\alpha_R + \gamma_R) + i(\alpha_I + \gamma_I) = \psi$ of Equation (76), which is what was to be shown.

Note that, with $f(t) = 1$, the strict geometrical relationship between the k-plane curve and the PSF suggested by the results of Section IV and using $f(t) = |J(t, 0)|$ is lost and the relationship depends on $T$ (as well as the linear dependence on $t$ that has been used here).
Next consider the case of a single line through the origin, the k-plane curve associated with one scan in a radial data acquisition measurement. Again use $f(t) = 1$ so that Equation (74) can be used. In an actual radial measurement, $f(t)$ proportional to $t$ would be used to reflect the Jacobian determinant associated with the polar-to-Cartesian transformation; however, the qualitative behavior of the PSFs is the same as with $f(t) = 1$. So, consider a horizontal line segment through the origin, extending from $x = -R$ to $x = R$. Then, to apply Equation (74) take $n = 1$, $\vec k_0 = (0, 0)$ and $\vec k_1 = (R, 0)$, so the PSF for this k-plane curve is

$$\psi(x, y) = 2T\cos[\pi Rx]\,\frac{\sin[\pi Rx]}{[\pi Rx]} = 2T\,\frac{\sin[2\pi Rx]}{[2\pi Rx]} \tag{77}$$

a plot of which is shown in Figure 12.

Using several lines through the origin leads to the situation depicted in Figure 13. In this case the k-plane curve is discontinuous at a finite number of points. In Figure 13 there are four line segments through the origin. To describe this curve via the scheme of Equation (73), set $T_i = T/4$, $\vec k^+_0 = (0, 0)$, $\vec k_1 = (0, -R)$, $\vec k^+_1 = (0, 0)$, $\vec k_2 = (R/\sqrt2, -R/\sqrt2)$, $\vec k^+_2 = (0, 0)$, $\vec k_3 = (R, 0)$, $\vec k^+_3 = (0, 0)$, and $\vec k_4 = (R/\sqrt2, R/\sqrt2)$. These four rays, when parameterized in this way, lead to the PSF:

$$
\begin{aligned}
\psi(x, y) = \frac{T}{2}\Bigg\{ &\cos[\pi R(x + y)/\sqrt2]\,\frac{\sin[\pi R(x + y)/\sqrt2]}{[\pi R(x + y)/\sqrt2]} + \cos[\pi R(x - y)/\sqrt2]\,\frac{\sin[\pi R(x - y)/\sqrt2]}{[\pi R(x - y)/\sqrt2]} \\
&+ \cos[\pi Rx]\,\frac{\sin[\pi Rx]}{[\pi Rx]} + \cos[\pi Ry]\,\frac{\sin[\pi Ry]}{[\pi Ry]} \Bigg\}
\end{aligned}
\tag{78}
$$

a plot of which is given in Figure 13. Although the natural polar-coordinate system was not introduced, it can be seen how adding the PSFs from line segments at different orientations leads to a PSF that is peaked at the origin. In other words, it appears that if the PSFs associated with all lines through the origin were added together, a Dirac delta function would be obtained. In fact, an implication of an obvious variation of Theorem 7 is that if $f(t)$ is proportional to $t$ and $T = \infty$, then this will be true.

FIGURE 12. The PSF associated with a horizontal ray through the origin with $T = 1$, $R = \frac15$, $\vec k_0 = (0, 0)$ and $\vec k_1 = (R, 0)$.

FIGURE 13. (a) A k-plane curve consisting of four line segments through the origin. (b) The PSF associated with the k-plane curve with $T = 1$, $R = \frac15$.

Now consider the case when the k-plane curve is a circle:

$$k_x(t) = R\cos\left(\frac{\pi t}{T}\right), \qquad k_y(t) = R\sin\left(\frac{\pi t}{T}\right) \tag{79}$$

for $0 \le t \le T$ and $\vec K(t) = -\vec K(-t)$ for $-T \le t < 0$. (Note that, by an argument similar to that following Proposition 14, it can equivalently be imagined that $\vec K$ can be given by Equation (79) for all $-T \le t \le T$.) To evaluate the corresponding PSF, it is best to express the point $\vec p$ in the p-plane (the image plane) in polar coordinates,

$$\vec p = (x, y) = (r\cos\theta, r\sin\theta) \tag{80}$$
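Equations (79) and (80) lead to

$$\vec K(t)\cdot\vec p = rR\left[\cos\theta\cos\left(\frac{\pi t}{T}\right) + \sin\theta\sin\left(\frac{\pi t}{T}\right)\right] = rR\cos\left(\frac{\pi t}{T} - \theta\right) \tag{81}$$

Substituting Equation (81) into Equation (70) with $f(t) = 1$ gives

$$\psi(r, \theta) = 2\int_0^T \cos\left[2\pi rR\cos\left(\frac{\pi t}{T} - \theta\right)\right]dt \tag{82}$$

or, upon a simple change of variables,

$$\psi(r, \theta) = \frac{2T}{\pi}\int_0^\pi \cos[2\pi rR\cos(\tau - \theta)]\,d\tau \tag{83}$$

Note also that

$$\psi(r, \theta) = \frac{2T}{\pi}\int_\pi^{2\pi} \cos[2\pi rR\cos(\tau - \theta)]\,d\tau \tag{84}$$

so that

$$\psi(r, \theta) = \frac{T}{\pi}\int_0^{2\pi} \cos[2\pi rR\cos(\tau - \theta)]\,d\tau \tag{85}$$

Because the (inner) cosine in Equation (85) is $2\pi$ periodic in $\tau$, $\psi$ is independent of $\theta$. So using Equation (83) with $\theta = \pi/2$ gives

$$\psi(r) = \frac{2T}{\pi}\int_0^\pi \cos[2\pi rR\sin\tau]\,d\tau \tag{86}$$

Now the $n$th order Bessel function can be expressed as

$$J_n(\alpha) = \frac{1}{\pi}\int_0^\pi \cos[n\tau - \alpha\sin\tau]\,d\tau \tag{87}$$

for $n \in \{0, 1, 2, \ldots\}$ and, in particular,

$$J_0(\alpha) = \frac{1}{\pi}\int_0^\pi \cos[\alpha\sin\tau]\,d\tau \tag{88}$$

So comparing Equation (88) to Equation (86) gives

$$\psi(r) = 2TJ_0(2\pi rR) \tag{89}$$

A plot of the PSF given in Equation (89) is given in Figure 14.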
FIGURE 14. (a) A k-plane curve consisting of a circle of radius $R$. (b) The PSF associated with the k-plane curve with $T = \frac12$, $R = \frac15$.
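The step from Equation (82) to Equation (89) can be confirmed numerically. The following is a minimal sketch, not taken from the original: `bessel_j0` uses the standard power series $J_0(x) = \sum_k (-1)^k (x/2)^{2k}/(k!)^2$ (adequate for moderate arguments only), and `circle_psf_quadrature` evaluates Equation (82) by the midpoint rule. Both function names are hypothetical.

```python
import math


def bessel_j0(x, terms=60):
    """J_0(x) from its power series; adequate for moderate |x| only."""
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= -((x / 2.0) ** 2) / (k * k)
        total += term
    return total


def circle_psf_quadrature(r, theta, R, T, n=4000):
    """Equation (82): 2 * int_0^T cos[2 pi r R cos(pi t / T - theta)] dt."""
    h = T / n
    return 2.0 * h * sum(
        math.cos(2.0 * math.pi * r * R
                 * math.cos(math.pi * (m + 0.5) * h / T - theta))
        for m in range(n))


# Parameters of Figure 14; the evaluation point (r, theta) is arbitrary.
T, R = 0.5, 0.2
numeric = circle_psf_quadrature(1.5, 0.7, R, T)
closed_form = 2.0 * T * bessel_j0(2.0 * math.pi * 1.5 * R)
```

Changing `theta` leaves `numeric` unchanged up to rounding, which is the independence of $\theta$ noted after Equation (85).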
An interesting variation arises with the PSF associated with a k-plane curve that is a circle whose center is not the origin. The following proposition is useful for computing such a PSF.

Proposition 15 Let

$$
\vec K(t) = \begin{cases}
\vec K'(t) - \vec C & \text{for } -T \le t < 0 \\
\vec K'(t) + \vec C & \text{for } 0 < t \le T
\end{cases}
$$

where $\vec K'(-t) = -\vec K'(t)$ for all $-T \le t \le T$, let $f(-t) = f(t)$ for all $-T \le t \le T$, and let

$$\eta(\vec p) = 2\int_0^T \sin[2\pi\vec K'(t)\cdot\vec p]\,f(t)\,dt$$

If $\psi'$ is the PSF associated with $\vec K'$ and $\psi$ is the PSF associated with $\vec K$, then

$$\psi(\vec p) = \cos[2\pi\vec C\cdot\vec p]\,\psi'(\vec p) - \sin[2\pi\vec C\cdot\vec p]\,\eta(\vec p)$$

Proof First note that $\vec K(-t) = -\vec K(t)$ for $-T \le t \le T$ from the conditions on $\vec K'$. Using Equation (70) compute

$$
\begin{aligned}
\psi(\vec p) &= 2\int_0^T \cos[2\pi\vec K(t)\cdot\vec p]\,f(t)\,dt \\
&= 2\int_0^T \cos[2\pi(\vec K'(t) + \vec C)\cdot\vec p]\,f(t)\,dt \\
&= 2\int_0^T \cos[2\pi\vec K'(t)\cdot\vec p]\cos[2\pi\vec C\cdot\vec p]\,f(t)\,dt - 2\int_0^T \sin[2\pi\vec K'(t)\cdot\vec p]\sin[2\pi\vec C\cdot\vec p]\,f(t)\,dt \\
&= \cos[2\pi\vec C\cdot\vec p]\,\psi'(\vec p) - \sin[2\pi\vec C\cdot\vec p]\,\eta(\vec p)
\end{aligned}
$$

Begin with a circle around the origin. That is, let $\vec K'$ be given by

$$k'_x(t) = R\cos(\pi t/T), \qquad k'_y(t) = R\sin(\pi t/T) \tag{90}$$

for $0 < t \le T$, $\vec K'(t) = -\vec K'(-t)$ for $-T \le t < 0$. It will be useful to translate the parameterization of the k-plane curves in order to simplify the calculations. This is no problem when $f = 1$, but otherwise we need to define a new function $f'$ for the reparameterization. The way this is done when the parameterization of $\vec K$ is shifted by a constant is as follows.

In the original parameterization,

$$\psi(\vec p) = \int_{t_I}^{t_F} e^{2\pi i\vec K(t)\cdot\vec p} f(t)\,dt$$

where $\vec K : [t_I, t_F] \to \mathbb{R}^2$. In the new parameterization, $s = t + c$ where $c$ is a constant, $\vec K' : [s_I, s_F] \to \mathbb{R}^2$, where $\vec K'(s) = \vec K(s - c)$, and $s_I = t_I + c$, $s_F = t_F + c$. And put $f'(s) = f(s - c)$. So, in the new parameterization:

$$
\begin{aligned}
\psi'(\vec p) &= \int_{s_I}^{s_F} e^{2\pi i\vec K'(s)\cdot\vec p} f'(s)\,ds \\
&= \int_{s_I - c}^{s_F - c} e^{2\pi i\vec K'(t + c)\cdot\vec p} f'(t + c)\,dt \\
&= \int_{t_I}^{t_F} e^{2\pi i\vec K(t)\cdot\vec p} f(t)\,dt \\
&= \psi(\vec p)
\end{aligned}
$$

Now it is desired to compute the PSF (with $f = 1$) associated with the curve

$$
\vec K(t) = \begin{cases}
\vec K'(t + T) - \vec C & \text{for } -2T \le t < 0 \\
\vec K'(t - T) + \vec C & \text{for } 0 < t \le 2T
\end{cases}
\tag{91}
$$

where $\vec K'$ is given by Equation (90). Use Proposition 15 to simplify the calculations and split the curve $\vec K$ into four pieces; reparameterize each piece by a constant shift to obtain the following two k-plane curves:

$$
\vec K_1(t) = \begin{cases}
\vec K'(t) - \vec C & \text{for } -T \le t < 0 \\
\vec K'(t) + \vec C & \text{for } 0 < t \le T
\end{cases}
\tag{92}
$$

$$
\vec K_2(t) = \begin{cases}
\vec K'(t) + \vec C & \text{for } -T \le t < 0 \\
\vec K'(t) - \vec C & \text{for } 0 < t \le T
\end{cases}
\tag{93}
$$

From Proposition 15 it is known that the PSFs associated with the k-plane curves of Equations (92) and (93) are given by

$$\psi_1(\vec p) = \cos[2\pi\vec C\cdot\vec p]\,\psi'(\vec p) - \sin[2\pi\vec C\cdot\vec p]\,\eta(\vec p) \tag{94}$$

$$\psi_2(\vec p) = \cos[2\pi\vec C\cdot\vec p]\,\psi'(\vec p) + \sin[2\pi\vec C\cdot\vec p]\,\eta(\vec p) \tag{95}$$

where, from Equation (89), it is known that

$$\psi'(r, \theta) = 2TJ_0(2\pi rR)$$

Shifting the parameterizations of $\vec K_1$ and $\vec K_2$, it is deduced that the PSF associated with the k-plane curve of Equation (91) is found by adding Equations (94) and (95):

$$\psi(r, \theta) = 4T\cos[2\pi\vec C\cdot\vec p]\,J_0(2\pi rR) \tag{96}$$

Note that $\vec K$ is parameterized on $[-2T, 2T]$ in this case. Reparameterizing $\vec K$ to the interval $[-T, T]$ gives

$$\psi(r, \theta) = 2T\cos[2\pi\vec C\cdot\vec p]\,J_0(2\pi rR) \tag{97}$$

Compare this to Equation (89).

As an example of an offset-circle k-plane curve, let $\vec C = (R, 0)$. Then Equation (96) becomes

$$\psi(x, y) = 4T\cos(2\pi Rx)\,J_0\left(2\pi R\sqrt{x^2 + y^2}\right)$$

a plot of which is shown in Figure 15.

D. The Integration of Curve Band-Pass Point-Spread Functions

Here two examples are given to illustrate the application of Theorem 7. They involve the k-plane curves associated with a circle (see Figure 14) and a square (see Figure 11).
FIGURE 15. (a) A k-plane curve consisting of a circle of radius $R$ offset in the x-direction by $R$ units. (b) The PSF associated with the k-plane curve with $T = \frac12$, $R = \frac15$.
Let $T = \pi$ so that $[t_I, t_F] = [-\pi, \pi]$ and rewrite Equation (79), for a k-plane circle of radius $R$, as

$$K_x(t) = R\cos t, \qquad K_y(t) = R\sin t \tag{98}$$

The natural auxiliary coordinate to use is the radial one, $R$, so the relevant coordinate transformation is, therefore,

$$\vec\varphi(t, R) = (R\cos t, R\sin t) \tag{99}$$

which has as its Jacobian determinant

$$|J_{\vec\varphi}(t, R)| = R \tag{100}$$

With these coordinates, define the operator $C_R$ by

$$C_R\rho(x, y) = \int_{-\pi}^{\pi} \hat\rho(R\cos t, R\sin t)\,e^{2\pi i(xR\cos t + yR\sin t)}\,R\,dt \tag{101}$$

Because $|J_{\vec\varphi}|$ is independent of $t$, the result of Equation (89) may be used to conclude that, in the radial p-plane coordinates $(r, \alpha)$, the PSF associated with $C_R$ is

$$\psi_R(r) = 2\pi RJ_0(2\pi rR) \tag{102}$$

A plot of the PSF of Equation (102) has the same shape as the function plotted in Figure 14.
Theorem 7 states that

$$\int_0^\infty C_R\,dR = I \tag{103}$$

strongly on $L^2(A)$, where $A$ is a bounded measurable subset of $\mathbb{R}^2$. Also

$$\int_a^b C_R\,dR = B_\Omega \tag{104}$$

strongly on $L^p(A)$, where $\Omega$ is the annulus of inner radius $a$ and outer radius $b$ in the k-plane and $B_\Omega$ is the associated band-pass operator. Let $R_p : L^2(\mathbb{R}^2) \to L^2(A)$ and $E_p : L^2(A) \to L^2(\mathbb{R}^2)$ be restriction and extension operators;$^8$ then

$$B_\Omega\rho = R_p(E_p\rho * \psi_{ab}) \tag{105}$$

where

$$
\begin{aligned}
\psi_{ab}(r) &= \int_a^b \psi_R(r)\,dR \\
&= 2\pi\int_a^b RJ_0(2\pi rR)\,dR \qquad (106) \\
&= \frac{1}{r}\left[bJ_1(2\pi br) - aJ_1(2\pi ar)\right] \qquad (107)
\end{aligned}
$$

where $J_1$ is the first-order Bessel function, defined by Equation (87). As an interesting connection with a multiresolution analysis, the function $\psi_{ab}$ of Equation (107) qualifies as a 2-D wavelet in the Meyer-Mallat sense (as defined, for example, in Daubechies's book [1992]). The integral of Equation (106) was evaluated with the symbolic computer package Mathematica [Wolfram, 1991] to give Equation (107). A plot of $\psi_{ab}$ is shown in Figure 16. This plot was made by evaluating Equation (107) numerically with Mathematica. Also,

$$I\rho = \operatorname*{s-lim}_{a\to0,\,b\to\infty} R_p(E_p\rho * \psi_{ab}) \tag{108}$$

so one could formally write

$$\delta(r) = \lim_{a\to0,\,b\to\infty}\int_a^b \psi_R(r)\,dR \tag{109}$$

$^8$ $E_p\rho(\vec p)$ equals $\rho$ itself inside $A$ and equals 0 outside $A$; the result of $R_p$ is to discard all function values outside $A$.
FIGURE 16. (a) An annulus in the k-plane. (b) The wavelet $\psi_{ab}$ associated with the annulus with $a = \frac{1}{10}$, $b = \frac15$.

FIGURE 17. (a) A disk in the k-plane. (b) The Meyer-Mallat scaling function $\psi_{0b}$ associated with the disk when the radius is $\frac15$.
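The closed form of Equation (107) can be compared with a direct numerical evaluation of the integral in Equation (106). The following is a minimal sketch under stated assumptions: `bessel_j` evaluates the integral representation of Equation (87) by the midpoint rule, `psi_ab_quadrature` integrates over $R$ by the midpoint rule as well, and all three function names are hypothetical. The parameter values $a = 1/10$, $b = 1/5$ are those of Figure 16.

```python
import math


def bessel_j(order, x, m=200):
    """J_n(x) from Equation (87), (1/pi) int_0^pi cos[n tau - x sin tau] d tau."""
    h = math.pi / m
    return sum(math.cos(order * (q + 0.5) * h - x * math.sin((q + 0.5) * h))
               for q in range(m)) / m


def psi_ab_closed(r, a, b):
    """Equation (107): (1/r) [b J_1(2 pi b r) - a J_1(2 pi a r)]."""
    return (b * bessel_j(1, 2.0 * math.pi * b * r)
            - a * bessel_j(1, 2.0 * math.pi * a * r)) / r


def psi_ab_quadrature(r, a, b, n=1000):
    """Equation (106): 2 pi int_a^b R J_0(2 pi r R) dR by the midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for j in range(n):
        Rj = a + (j + 0.5) * h
        total += Rj * bessel_j(0, 2.0 * math.pi * r * Rj)
    return 2.0 * math.pi * h * total


a, b = 0.1, 0.2
closed = psi_ab_closed(3.0, a, b)
numeric = psi_ab_quadrature(3.0, a, b)
```

As $r \to 0$ the closed form tends to $\pi(b^2 - a^2)$, the area of the annulus, which is a quick sanity check on the normalization.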
When $a = 0$, $B_\Omega$ is a low-pass operator and $\psi_{0b}$ is no longer a wavelet, but it is a Meyer-Mallat scaling function (the square of which, in this case, is an Airy function). This Meyer-Mallat scaling function is shown in Figure 17.

Using a Riemann sum, Equation (104) can be approximated by

$$B_\Omega \cong \sum_{j=1}^m C_{R_j}\,\Delta R \tag{110}$$

where $R_j = a + (b - a)j/m$ and $\Delta R = (b - a)/m$. The right-hand side of Equation (110) can be used to define another operator on $L^p(A)$, which, as is shown in Theorem 20, strongly converges to $B_\Omega$ in $L^p(A)$ as $m \to \infty$. If, also, $C_{R_j}$ is approximated with a Riemann sum, a nice numerical approximation $P_{nm,\Omega}$ for $B_\Omega$ results, where

$$P_{nm,\Omega}\rho(x, y) = \sum_{q=-n}^{n-1}\sum_{j=1}^m \hat\rho(R_j\cos t_q, R_j\sin t_q)\,e^{2\pi i(xR_j\cos t_q + yR_j\sin t_q)}\,R_j\,\Delta t\,\Delta R \tag{111}$$

where $t_q = \pi q/n$ and $\Delta t = \pi/n$. It is also true that, for $\rho \in L^p(A)$,

$$P_{nm,\Omega}\rho = R_p(E_p\rho * \psi_{nm,\Omega}) \tag{112}$$

where

$$\psi_{nm,\Omega}(x, y) = \sum_{q=-n}^{n-1}\sum_{j=1}^m e^{2\pi i(xR_j\cos t_q + yR_j\sin t_q)}\,R_j\,\Delta t\,\Delta R \tag{113}$$

gives an approximation of the Meyer-Mallat scaling function $\psi_{0b}$ when $a = 0$ or of the wavelet $\psi_{ab}$ for $a \ne 0$. Note that these approximations for $\psi_{ab}$ are not in $L^2(\mathbb{R}^2)$, because they are trigonometric polynomials. The setup for radial data acquisition will not lead to a resolution of the identity, but it does lead to an approximation of the same low-pass operator $B_\Omega$ given here.

A square k-plane curve (see Figure 11) leads, of course, to similar results. If we take $[t_I, t_F] = [-2, 2]$, a square k-plane curve with sides of length $2R$ is given by

$$
k_x(t; R) = \begin{cases}
R(2t + 3) & -2 \le t \le -1 \\
R & -1 \le t \le 0 \\
-R(2t - 1) & 0 \le t \le 1 \\
-R & 1 \le t \le 2
\end{cases}
\qquad
k_y(t; R) = \begin{cases}
R & -2 \le t \le -1 \\
-R(2t + 1) & -1 \le t \le 0 \\
-R & 0 \le t \le 1 \\
R(2t - 3) & 1 \le t \le 2
\end{cases}
$$

If $R$ is used as the auxiliary coordinate, the absolute value of the Jacobian of the associated coordinate transformation is

$$|J_{\vec\varphi}(t, R)| = \left|\frac{\partial k_x}{\partial t}(t, R)\,\frac{\partial k_y}{\partial R}(t, R) - \frac{\partial k_y}{\partial t}(t, R)\,\frac{\partial k_x}{\partial R}(t, R)\right|$$
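And in this case

$$
\frac{\partial k_x}{\partial t}(t, R) = \begin{cases}
2R & -2 \le t \le -1 \\
0 & -1 \le t \le 0 \\
-2R & 0 \le t \le 1 \\
0 & 1 \le t \le 2
\end{cases}
\qquad
\frac{\partial k_y}{\partial t}(t, R) = \begin{cases}
0 & -2 \le t \le -1 \\
-2R & -1 \le t \le 0 \\
0 & 0 \le t \le 1 \\
2R & 1 \le t \le 2
\end{cases}
$$

Also,

$$
\frac{\partial k_x}{\partial R}(t, R) = \begin{cases}
2t + 3 & -2 \le t \le -1 \\
1 & -1 \le t \le 0 \\
-(2t - 1) & 0 \le t \le 1 \\
-1 & 1 \le t \le 2
\end{cases}
\qquad
\frac{\partial k_y}{\partial R}(t, R) = \begin{cases}
1 & -2 \le t \le -1 \\
-(2t + 1) & -1 \le t \le 0 \\
-1 & 0 \le t \le 1 \\
2t - 3 & 1 \le t \le 2
\end{cases}
$$

So,

$$|J_{\vec\varphi}(t, R)| = 2R \tag{114}$$

which, as in the circle case, is independent of $t$. So Equation (76), with $T = 2$ and the weight $2R$, can be used to conclude that the PSF associated with this square k-plane curve is

$$\psi_R(x, y) = 4R\cos[2\pi Rx]\,\frac{\sin[2\pi Ry]}{[2\pi Ry]} + 4R\cos[2\pi Ry]\,\frac{\sin[2\pi Rx]}{[2\pi Rx]} \tag{115}$$

The curve band-pass operator, $C_R$, is given by

$$
\begin{aligned}
C_R\rho(x, y) = {} & \int_{-2}^{-1} \hat\rho(R(2t + 3), R)\,e^{2\pi i(xR(2t + 3) + yR)}\,2R\,dt \\
& + \int_{-1}^{0} \hat\rho(R, -R(2t + 1))\,e^{2\pi i(xR - yR(2t + 1))}\,2R\,dt \\
& + \int_{0}^{1} \hat\rho(-R(2t - 1), -R)\,e^{2\pi i(-xR(2t - 1) - yR)}\,2R\,dt \\
& + \int_{1}^{2} \hat\rho(-R, R(2t - 3))\,e^{2\pi i(-xR + yR(2t - 3))}\,2R\,dt
\end{aligned}
\tag{116}
$$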
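As before, the operator of Equation (116) may be integrated to give

$$\int_0^\infty C_R\,dR = I$$

or

$$\int_a^b C_R\,dR = B_\Omega$$

where $\Omega$ is the square annulus bounded by a square with sides of length $2a$ on the inside and of length $2b$ on the outside; see Figure 18(a). Again a wavelet (if $a > 0$) or a Meyer-Mallat scaling function (if $a = 0$), $\psi_{ab}$, is obtained, which is given by

$$
\begin{aligned}
\psi_{ab}(x, y) &= \int_a^b \psi_R(x, y)\,dR \\
&= \int_{-b}^{b}\int_{-b}^{b} e^{2\pi i(xk_x + yk_y)}\,dk_x\,dk_y - \int_{-a}^{a}\int_{-a}^{a} e^{2\pi i(xk_x + yk_y)}\,dk_x\,dk_y \\
&= 4b^2\,\frac{\sin[2\pi bx]}{[2\pi bx]}\,\frac{\sin[2\pi by]}{[2\pi by]} - 4a^2\,\frac{\sin[2\pi ax]}{[2\pi ax]}\,\frac{\sin[2\pi ay]}{[2\pi ay]}
\end{aligned}
\tag{117}
$$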
Plots of $\psi_{ab}$, $a > 0$, and $\psi_{0b}$ are given in Figures 18 and 19. As in the case for the circle, one may write a numerical (Riemann sum) approximation for the band- or low-pass operators, $B_\Omega$, from the square-curve band-pass operators of Equation (116).

FIGURE 18. (a) A square annulus in the k-plane. (b) The wavelet $\psi_{ab}$ associated with the annulus with $a = 1/10$, $b = 1/5$.

FIGURE 19. (a) A square tile in the k-plane. (b) The Meyer-Mallet scaling function $\psi_{0b}$ associated with the square when $b = 1/5$.

Also note that the method behind the
spin warp measurement provides a numerical approximation for the low-pass operator, $B_{[-b,b] \times [-b,b]}$.

The regions given by $\Omega$ in Theorem 7 should properly be called apertures, because they limit the resolution possible in an MRI measurement in the same way that physical apertures do in optics (at least for the signal model being used here). Figures 17 and 19 show the maximum apertures that are accessible by radial and spin warp measurements, respectively. The maximum apertures accessible by spiral and rosette measurements are disks, and the maximum apertures accessible by the Lissajous and EPI measurements are squares.

The lesson of this section can be summarized in the following two corollaries. They are corollaries to Theorem 6 and emphasize the fact that band-pass operators provide a reconstruction at a certain scale, that is, at a fixed resolution.

Corollary 16 Let $\vec{\varphi} : [t_I, t_F] \times [W_1, W_2] \to \mathbb{R}^2$ be a $C^1$ map such that the restriction of $\vec{\varphi}$ to $(t_I, t_F) \times [W_1, W_2]$ is a $C^1$ diffeomorphism onto $\Omega = \vec{\varphi}((t_I, t_F) \times [W_1, W_2])$, where $\vec{0} \in \mathrm{interior}(\Omega)$. Let $\vec{K}_\beta : [t_I, t_F] \to \mathbb{R}^2$ be defined by $\vec{K}_\beta(t) = \vec{\varphi}(t, \beta)$. Let $1 \le p \le \infty$. Let $C_{\vec{K}_\beta} : L^p(A) \to L^p(A)$ be given by

$$(C_{\vec{K}_\beta} \rho)(\vec{p}) = \int_{t_I}^{t_F} (\mathcal{F} E_1 I_p \rho)(\vec{K}_\beta(t))\, e^{2\pi i \vec{p} \cdot \vec{K}_\beta(t)}\, |J_{\vec{\varphi}}(t, \beta)|\, dt$$

Let $R : C_0(\mathbb{R}^2) \to L^p(A)$ be the restriction of a function defined on all of $\mathbb{R}^2$ to $A$ and let $B_\Omega : L^p(A) \to L^p(A)$ be given by

$$B_\Omega = R\, \mathcal{F}^{-1} \chi_\Omega\, \mathcal{F} E_1 I_p$$

Let

$$\psi_{\vec{K}_\beta}(\vec{p}) = \int_{t_I}^{t_F} e^{2\pi i \vec{K}_\beta(t) \cdot \vec{p}}\, |J_{\vec{\varphi}}(t, \beta)|\, dt$$

and

$$\psi_\Omega(\vec{p}) = \int_{W_1}^{W_2} \psi_{\vec{K}_\beta}(\vec{p})\, d\beta$$

Then

$$B_\Omega \rho(\vec{p}) = R\,(\rho * \psi_\Omega)(\vec{p}) \tag{118}$$

and $\psi_\Omega$ is a Meyer-Mallet scaling function.
Corollary 17 is an immediate consequence of Corollary 16:

Corollary 17 Let $\mathcal{R} = \{k\vec{u} \in \mathbb{R}^2 \mid k \ge 0,\ \vec{u} \ne \vec{0} \text{ fixed}\}$ be a ray in $\mathbb{R}^2$. Let $\vec{K} : [t_I, t_F] \to \mathbb{R}^2$ be a $C^1$ map such that for $\vec{\varphi} : [t_I, t_F] \times (0, \infty) \to \mathbb{R}^2$ given by $\vec{\varphi}(t, r) = r\vec{K}(t)$, the restriction of $\vec{\varphi}$ to $(t_I, t_F) \times (0, \infty)$ is a $C^1$ diffeomorphism onto $\mathbb{R}^2 \setminus \mathcal{R}$. Let $1 \le p \le \infty$. Let $C_{r\vec{K}} : L^p(A) \to L^p(A)$ be given by

$$(C_{r\vec{K}} \rho)(\vec{p}) = \int_{t_I}^{t_F} (\mathcal{F} E_1 I_p \rho)(r\vec{K}(t))\, e^{2\pi i r \vec{p} \cdot \vec{K}(t)}\, |J_{\vec{\varphi}}(t, r)|\, dt$$

Let $R : C_0(\mathbb{R}^2) \to L^p(A)$ be the restriction of a function defined on all of $\mathbb{R}^2$ to $A$ and let $B_\Omega : L^p(A) \to L^p(A)$ be given by

$$B_\Omega = R\, \mathcal{F}^{-1} \chi_\Omega\, \mathcal{F} E_1 I_p$$

where $\Omega = \vec{\varphi}([t_I, t_F] \times (R_1, R_2])$ for $0 \le R_1 < R_2$. Then

$$B_\Omega \rho(\vec{p}) = R\,(\rho * \psi_\Omega)(\vec{p}) \tag{119}$$

and $\psi_\Omega$ is a Meyer-Mallet scaling function when $R_1 = 0$ and is a 2-D wavelet (in the sense of [Daubechies, 1992]) when $R_1 \ne 0$.

E. Operators Associated with Reconstruction from Discrete Data Samples

The Riemann sum approximations of the curve band-pass operators, $C_K$, and curve reconstruction operators, $R_K$ (see Equation (54)), define operators on spaces of square summable (and other) functions in their own right, and it can be shown in what sense these discrete operators converge to the integral operators.
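Theorem 18 Let $f \in C[t_I, t_F]$ and $\vec{K} \in C([t_I, t_F]; \mathbb{R}^2)$. Let $A$ be a bounded measurable subset of $\mathbb{R}^2$. If $\rho \in L^p(A)$, let $P_{N,K}\rho$ be defined by

$$P_{N,K}\rho(\vec{p}) = \sum_{j=1}^{N} \hat{\rho}(\vec{K}(t_j))\, e^{2\pi i \vec{K}(t_j) \cdot \vec{p}}\, f(t_j)\, \Delta t \tag{120}$$

where $t_j$ and $\Delta t$ have the obvious meanings and

$$\psi_{N,K}(\vec{p}) = \sum_{j=1}^{N} e^{2\pi i \vec{K}(t_j) \cdot \vec{p}}\, f(t_j)\, \Delta t$$

Then (a)

$$P_{N,K}\rho(\vec{p}) = \int_A \psi_{N,K}(\vec{p} - \vec{q})\, \rho(\vec{q})\, d\vec{q} \tag{121}$$

and (b)

$$P_{N,K} : L^p(A) \to L^p(A) \tag{122}$$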
is continuous for $1 \le p \le \infty$.

Proof Part (a) for $\rho \in L^p(A)$ follows from interchanging the order of summation and integration. For part (b), first prove the boundedness of $P_{N,K}$ on $L^1(A)$ and $L^\infty(A)$. To see that $P_{N,K}$ is bounded for $\rho$ in $L^1(A)$, compute

$$\|P_{N,K}\rho\|_{L^1(A)} = \int_A \left| \int_A \psi_{N,K}(\vec{p} - \vec{q})\, \rho(\vec{q})\, d\vec{q} \right| d\vec{p} \le \sup_{\vec{s} \in A'} |\psi_{N,K}(\vec{s})|\; |A|\; \|\rho\|_{L^1(A)}$$

where

$$A' = \{\vec{p} - \vec{q} \in \mathbb{R}^2 \mid \vec{p}, \vec{q} \in A\}$$

To see that $P_{N,K}$ is bounded for $\rho$ in $L^\infty(A)$, compute

$$\|P_{N,K}\rho\|_{L^\infty(A)} = \operatorname*{ess\,sup}_{\vec{p} \in A} \left| \int_A \psi_{N,K}(\vec{p} - \vec{q})\, \rho(\vec{q})\, d\vec{q} \right| \le \sup_{\vec{s} \in A'} |\psi_{N,K}(\vec{s})|\; |A|\; \|\rho\|_{L^\infty(A)} \tag{123}$$

The continuity of $P_{N,K}$ on $L^p(A)$ then follows from the Riesz-Thorin theorem [Reed and Simon, 1972], as does part (a) for $\rho \in L^p(A)$.

Theorem 19 Let $A$ be a bounded measurable subset of $\mathbb{R}^2$. Let $f \in C[t_I, t_F]$, $f \ge 0$, and let $\vec{K} \in C([t_I, t_F]; \mathbb{R}^2)$. For $1 \le p \le \infty$, define $R_N : C[t_I, t_F] \subset L^\infty[t_I, t_F] \to L^p(A)$ by

$$R_N S(\vec{p}) = \sum_{j=1}^{N} S(t_j)\, e^{2\pi i \vec{K}(t_j) \cdot \vec{p}}\, f(t_j)\, \Delta t$$

where $t_j$ and $\Delta t$ have the obvious meanings. Then $R_N$ has a unique extension to $L^\infty[t_I, t_F]$ such that $R_N : L^\infty[t_I, t_F] \to L^p(A)$ is continuous for $1 \le p \le \infty$.

Proof It will be shown that $R_N : C[t_I, t_F] \subset L^\infty[t_I, t_F] \to L^1(A)$ and $R_N : C[t_I, t_F] \subset L^\infty[t_I, t_F] \to L^\infty(A)$ are both bounded. It then follows from the Hahn-Banach theorem that $R_N$ can be uniquely extended to a map with the same bound on each of $L^1(A)$ and $L^\infty(A)$. An application of the Riesz-Thorin theorem will then establish the theorem. So to see that $R_N : C[t_I, t_F] \subset L^\infty[t_I, t_F] \to L^1(A)$ is bounded, compute
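$$\begin{aligned} \|R_N S\|_{L^1(A)} &= \int_A \left| \sum_{j=1}^{N} S(t_j)\, e^{2\pi i \vec{K}(t_j) \cdot \vec{p}}\, f(t_j)\, \Delta t \right| d\vec{p} \\ &\le |A| \sum_{j=1}^{N} |S(t_j)|\, |f(t_j)|\, \Delta t \\ &\le |A|\, N \Delta t\, \|f\|_{L^\infty[t_I, t_F]}\, \|S\|_{L^\infty[t_I, t_F]} = |A|\, (t_F - t_I)\, \|f\|_{L^\infty[t_I, t_F]}\, \|S\|_{L^\infty[t_I, t_F]} \end{aligned}$$

To see that $R_N : C[t_I, t_F] \subset L^\infty[t_I, t_F] \to L^\infty(A)$ is bounded, compute

$$\begin{aligned} \|R_N S\|_{L^\infty(A)} &= \operatorname*{ess\,sup}_{\vec{p} \in A} \left| \sum_{j=1}^{N} S(t_j)\, e^{2\pi i \vec{K}(t_j) \cdot \vec{p}}\, f(t_j)\, \Delta t \right| \\ &\le \operatorname*{ess\,sup}_{\vec{p} \in A} \sum_{j=1}^{N} |S(t_j)|\, |f(t_j)|\, \Delta t \\ &\le N \Delta t\, \|f\|_{L^\infty[t_I, t_F]}\, \|S\|_{L^\infty[t_I, t_F]} = (t_F - t_I)\, \|f\|_{L^\infty[t_I, t_F]}\, \|S\|_{L^\infty[t_I, t_F]} \end{aligned}$$

Note that $R_N : C[t_I, t_F] \subset L^1[t_I, t_F] \to L^1(A)$ (when $f \ge 0$) is not bounded because if it were one would need to prove that

$$\|R_N S\|_{L^1(A)} = \int_A \left| \sum_{j=1}^{N} S(t_j)\, e^{2\pi i \vec{K}(t_j) \cdot \vec{p}}\, f(t_j)\, \Delta t \right| d\vec{p} \le C\, \|S\|_{L^1[t_I, t_F]}$$

for some constant $C$; but by choosing the right $S$, $\|S\|_{L^1[t_I, t_F]}$ can be made arbitrarily small while the sum in the first line remains the same (think of the case when $N = 1$ and $S$ is a Gaussian with an arbitrarily small variance).

The Riemann integration theorem says that, if $\vec{K}$ is continuous,

$$\psi_{N,K}(\vec{p}) \to \psi_K(\vec{p}) \tag{124}$$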
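when $N \to \infty$, where

$$\psi_K(\vec{p}) = \int_{t_I}^{t_F} e^{2\pi i \vec{K}(t) \cdot \vec{p}}\, f(t)\, dt$$

Also, if $S \in C[t_I, t_F]$,

$$R_N S(\vec{p}) \to R_K S(\vec{p}) \tag{125}$$

as $N \to \infty$ and, if $\rho \in C(A)$,

$$P_{N,K}\rho(\vec{p}) \to C_K \rho(\vec{p}) \tag{126}$$

where $R_K$ and $C_K$ are the continuous curve reconstruction (see Equation (54)) and curve band-pass operators. Equation (126) implies that

$$\|P_{N,K}\rho - C_K\rho\|_{L^p(A)} \to 0 \tag{127}$$

as $N \to \infty$ for $\rho \in C(A)$, where $A$ is a bounded measurable subset of $\mathbb{R}^2$. Further, for every $\rho \in L^p(A)$, there exists a sequence $\{\rho_m\} \subset C(A)$ such that $\rho_m \to \rho$ in $L^p(A)$; therefore, because $P_{N,K}$ is continuous on $L^p(A)$,

$$P_{N,K}\rho_m \to P_{N,K}\rho \tag{128}$$

in $L^p(A)$ as $m \to \infty$. And, since $C_K$ is continuous on $L^p(A)$,

$$C_K\rho_m \to C_K\rho \tag{129}$$

in $L^p(A)$ as $m \to \infty$. In addition it is true that $\sup_{\vec{s} \in A'} |\psi_{N,K}(\vec{s})| \le (t_F - t_I)\, \|f\|_{L^\infty[t_I, t_F]}$, so, from the estimates given in the proof of Theorem 18, it can be concluded that

$$\|P_{N,K}\rho\|_{L^p(A)} \le (t_F - t_I)\, \|f\|_{L^\infty[t_I, t_F]}\, |A|\, \|\rho\|_{L^p(A)}$$

independently of $N$. So for any fixed $N$, there exists $M_\varepsilon$ such that if $m > M_\varepsilon$,

$$\|P_{N,K}\rho_m - P_{N,K}\rho\|_{L^p(A)} \le \varepsilon/3 \quad \text{and} \quad \|C_K\rho_m - C_K\rho\|_{L^p(A)} \le \varepsilon/3$$

from Equations (128) and (129). Furthermore, because of the uniform boundedness of $P_{N,K}$, there exists $N_\varepsilon$ such that if $N > N_\varepsilon$,

$$\|P_{N,K}\rho_m - C_K\rho_m\|_{L^p(A)} \le \varepsilon/3$$

from Equation (127), while the preceding estimates remain valid for $m > M_\varepsilon$. Therefore,

$$\begin{aligned} \|P_{N,K}\rho - C_K\rho\|_{L^p(A)} &= \|P_{N,K}\rho - P_{N,K}\rho_m + P_{N,K}\rho_m - C_K\rho_m + C_K\rho_m - C_K\rho\|_{L^p(A)} \\ &\le \|P_{N,K}\rho - P_{N,K}\rho_m\|_{L^p(A)} + \|P_{N,K}\rho_m - C_K\rho_m\|_{L^p(A)} + \|C_K\rho_m - C_K\rho\|_{L^p(A)} \le \varepsilon \end{aligned}$$

This shows that

$$C_K = \operatorname*{s\text{-}lim}_{N \to \infty} P_{N,K} \tag{130}$$

on $L^p(A)$. In a similar manner it can be shown that as a mapping from $L^\infty[t_I, t_F]$ to $L^p(A)$,

$$R_K = \operatorname*{s\text{-}lim}_{N \to \infty} R_N \tag{131}$$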
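The convergence in Equation (124) can be observed directly for a simple curve. The sketch below is an illustration under stated assumptions rather than anything taken from the source: it uses a circular trajectory $\vec{K}(t) = R(\cos 2\pi t, \sin 2\pi t)$ on $[0, 1]$ with the weight $f \equiv 1$, for which the limit is $\psi_K(\vec{p}) = J_0(2\pi R |\vec{p}|)$. The midpoint sampling of $t_j$, the power-series evaluation of $J_0$, and the function names are all choices made here for the example.

```python
import cmath
import math


def bessel_j0(z, terms=60):
    """Power series for J_0(z); adequate for moderate |z| (assumed helper)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(z / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def discrete_psf(px, py, radius, n):
    """Riemann sum psi_{N,K}(p) for a circle of the given radius, with f = 1."""
    dt = 1.0 / n
    total = 0j
    for j in range(n):
        t = (j + 0.5) * dt                      # midpoint sample times (assumption)
        kx = radius * math.cos(2 * math.pi * t)
        ky = radius * math.sin(2 * math.pi * t)
        total += cmath.exp(2j * math.pi * (kx * px + ky * py)) * dt
    return total


def limit_psf(px, py, radius):
    """Integral counterpart psi_K(p) = J_0(2*pi*radius*|p|) for the circle."""
    return bessel_j0(2 * math.pi * radius * math.hypot(px, py))


if __name__ == "__main__":
    for n in (4, 8, 16, 64):
        err = abs(discrete_psf(0.5, 0.3, 1.0, n) - limit_psf(0.5, 0.3, 1.0))
        print(n, err)
```

For this smooth periodic integrand the error falls off very quickly with $N$; the theorem itself only guarantees convergence, not a rate.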
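Next, we present a theorem about the Riemann sum approximation of the curve band-pass operator integration of Theorem 6.

Theorem 20 Let $\vec{\varphi} : [t_I, t_F] \times [W_1, W_2] \to \mathbb{R}^2$ be a $C^1$ map such that the restriction of $\vec{\varphi}$ to $(t_I, t_F) \times [W_1, W_2]$ is a diffeomorphism onto $\Omega = \vec{\varphi}((t_I, t_F) \times [W_1, W_2])$. Let $\vec{K}_\beta : [t_I, t_F] \to \mathbb{R}^2$ be defined by $\vec{K}_\beta(t) = \vec{\varphi}(t, \beta)$. Let $A$ be a bounded measurable subset of $\mathbb{R}^2$ and let $B_M \rho$ be defined by $B_M \rho(\vec{p}) = \sum_{m=1}^{M} C_m \rho(\vec{p})\, \Delta\beta$, where $\Delta\beta = (W_2 - W_1)/M$ and

$$C_m \rho(\vec{p}) = \int_{t_I}^{t_F} \hat{\rho}(\vec{K}_{\beta_m}(t))\, e^{2\pi i \vec{p} \cdot \vec{K}_{\beta_m}(t)}\, |J_{\vec{\varphi}}(t, \beta_m)|\, dt$$

where $\beta_m \in [W_1 + (m - 1)\Delta\beta,\ W_1 + m\Delta\beta]$. Let $B_\Omega : L^2(A) \to L^2(A)$ be given by $B_\Omega = R\, \mathcal{F}^{-1} \chi_\Omega\, \mathcal{F} E_1 I_2$, where $\mathcal{F}$ is the Fourier transform on $L^1(\mathbb{R}^2)$ and $R$, $E_1$, and $I_2$ are restriction (from $L^\infty(\mathbb{R}^2)$ to $L^2(A)$), extension (from $L^1(A)$ to $L^1(\mathbb{R}^2)$), and inclusion (from $L^2(A)$ to $L^1(A)$) operators, respectively. Then $B_\Omega \rho(\vec{p}) = \lim_{M \to \infty} B_M \rho(\vec{p})$.

Proof It follows from Theorem 6 that $\beta \mapsto P_\beta \rho(\vec{p})$ is continuous, where

$$P_\beta \rho(\vec{p}) = \int_{t_I}^{t_F} \hat{\rho}(\vec{K}_\beta(t))\, e^{2\pi i \vec{p} \cdot \vec{K}_\beta(t)}\, |J_{\vec{\varphi}}(t, \beta)|\, dt$$

so the result follows from the Riemann integration theorem.

The next theorem shows how properly set-up, sampled, multiple-trajectory MRI data acquisitions may be combined to yield a nice approximation for the associated band-pass operator.

Theorem 21 Let $\vec{\varphi} : [t_I, t_F] \times [W_1, W_2] \to \mathbb{R}^2$ be a $C^1$ map such that the restriction of $\vec{\varphi}$ to $(t_I, t_F) \times [W_1, W_2]$ is a $C^1$ diffeomorphism onto $\Omega = \vec{\varphi}((t_I, t_F) \times [W_1, W_2])$. Let $\vec{K}_\beta : [t_I, t_F] \to \mathbb{R}^2$ be defined by $\vec{K}_\beta(t) = \vec{\varphi}(t, \beta)$. Let $A$ be a bounded measurable subset of $\mathbb{R}^2$ and let $P_{NM}\rho$ be defined by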
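$$P_{NM}\rho(\vec{p}) = \sum_{m=1}^{M} P_{mN}\rho(\vec{p})\, \Delta\beta \tag{132}$$

where $\Delta\beta = (W_2 - W_1)/M$ and

$$P_{mN}\rho(\vec{p}) = \sum_{n=1}^{N} \hat{\rho}(\vec{K}_{\beta_m}(t_n))\, e^{2\pi i \vec{p} \cdot \vec{K}_{\beta_m}(t_n)}\, |J_{\vec{\varphi}}(t_n, \beta_m)|\, \Delta t$$

where $\beta_m \in [W_1 + (m - 1)\Delta\beta,\ W_1 + m\Delta\beta]$, $\Delta t = (t_F - t_I)/N$, and $t_n \in [t_I + (n - 1)\Delta t,\ t_I + n\Delta t]$. Let $B_\Omega : L^2(A) \to L^2(A)$ be given by

$$B_\Omega = R\, \mathcal{F}^{-1} \chi_\Omega\, \mathcal{F} E_1 I_2$$

where $\mathcal{F}$ is the Fourier transform on $L^1(\mathbb{R}^2)$ and $R$, $E_1$, and $I_2$ are restriction, extension, and inclusion operators. Then, for $\rho \in L^2(A)$,

$$B_\Omega \rho(\vec{p}) = \lim_{N \to \infty} \lim_{M \to \infty} P_{NM}\rho(\vec{p}) \tag{133}$$

Define

$$\psi_{NM}(\vec{p}) = \sum_{m=1}^{M} \sum_{n=1}^{N} e^{2\pi i \vec{p} \cdot \vec{K}_{\beta_m}(t_n)}\, |J_{\vec{\varphi}}(t_n, \beta_m)|\, \Delta t\, \Delta\beta$$

Then $P_{NM}\rho$, as defined by Equation (132), can be written as

$$P_{NM}\rho(\vec{p}) = \int_A \psi_{NM}(\vec{p} - \vec{q})\, \rho(\vec{q})\, d\vec{q} \tag{134}$$

Proof The statement of Equation (133) follows from the Riemann integration theorem. The equivalence of Equations (132) and (134) follows from the linearity of the integral. Also note that by mimicking the proof of Theorem 18, it can be shown that $P_{NM} : L^2(A) \to L^2(A)$ is continuous.

The operators $P_{N,K}$ and $P_{NM}$ associated with one or more sampled k-plane curves, respectively, may be described as sample band-pass operators. They filter the Paley-Wiener function⁹ $\rho$ through a zero-dimensional set in the k-plane to produce the functions $P_{N,K}\rho$ and $P_{NM}\rho$, which are still functions in $L^2(A)$.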
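The double Riemann sum of Theorem 21 can be illustrated with concentric circles, a hypothetical acquisition chosen here only because its limit is known in closed form. Assume $\vec{\varphi}(t, \beta) = \beta(\cos 2\pi t, \sin 2\pi t)$ with $t \in [0, 1]$ and $\beta \in [0, b]$, so that $|J_{\vec{\varphi}}(t, \beta)| = 2\pi\beta$ and $\Omega$ is the disk of radius $b$; the integral PSF is then $b J_1(2\pi b|\vec{p}|)/|\vec{p}|$. The sketch below, with assumed function names and midpoint sampling in both coordinates, compares the sampled PSF $\psi_{NM}$ with that limit.

```python
import cmath
import math


def bessel_j1(z, terms=60):
    """Power series for J_1(z); adequate for moderate |z| (assumed helper)."""
    total, term = 0.0, z / 2.0
    for k in range(terms):
        total += term
        term *= -(z / 2.0) ** 2 / ((k + 1) * (k + 2))
    return total


def sampled_disk_psf(px, py, b, n_t, n_beta):
    """psi_NM(p): double sum over concentric circles with Jacobian weight 2*pi*beta."""
    dt = 1.0 / n_t
    dbeta = b / n_beta
    total = 0j
    for m in range(n_beta):
        beta = (m + 0.5) * dbeta                # midpoint radius in each radial cell
        for n in range(n_t):
            ang = 2 * math.pi * (n + 0.5) * dt  # midpoint angle in each time cell
            phase = 2 * math.pi * beta * (px * math.cos(ang) + py * math.sin(ang))
            total += cmath.exp(1j * phase) * 2 * math.pi * beta * dt * dbeta
    return total


def disk_psf(px, py, b):
    """Integral PSF of the disk aperture of radius b."""
    r = math.hypot(px, py)
    if r == 0.0:
        return math.pi * b * b
    return b * bessel_j1(2 * math.pi * b * r) / r


if __name__ == "__main__":
    ref = disk_psf(0.3, 0.4, 1.0)
    for n_t, n_beta in ((4, 4), (16, 16), (64, 200)):
        print(n_t, n_beta, abs(sampled_disk_psf(0.3, 0.4, 1.0, n_t, n_beta) - ref))
```

Coarse sampling in either coordinate leaves a visible discrepancy, while refining both brings the trigonometric sum close to the disk PSF, in line with Equation (133).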
⁹ Technically, $\rho$ is the Fourier transform of a Paley-Wiener function, but $L^2(A)$ is isomorphic to the classical Paley-Wiener function space via the Fourier transformation.

VI. DIRECT RECONSTRUCTION USING NATURAL k-PLANE COORDINATES

The rigorous analysis of the operator $C_K$ and its integrability properties should convince the reader that the Jacobian determinant of the transformation from natural k-plane coordinates to Cartesian coordinates provides an excellent choice for the taper function. The next step is to examine the discrete approximation of the operators $C_K$ and $B_\Omega = \int_\beta C_K\, d\beta$ for some specific cases. To recap the conclusions of the last section, reconstruction from a single-trajectory data acquisition can be obtained from the Riemann sum approximation to $C_K$, via the single-shot NKPCRM¹⁰ formula [Sarty, 1995] as

$$RS(x_n, y_m) = C \sum_{q=1}^{Q} S(t_q)\, e^{2\pi i (K_x(t_q) x_n + K_y(t_q) y_m)}\, |J(t_q, 0)| \tag{135}$$

where $C$ is a scaling constant that may be used to adjust the final greyscale range, or window, and $x_n$ and $y_m$ are the coordinates of the centers of the pixels of interest. Similarly, the multiple-shot NKPCRM formula [Sarty, 1997] is given by

$$RS(x_n, y_m) = C \sum_{p=1}^{P} \sum_{q=1}^{Q} S(t_q, \beta_p)\, e^{2\pi i (K_x(t_q, \beta_p) x_n + K_y(t_q, \beta_p) y_m)}\, |J(t_q, \beta_p)| \tag{136}$$
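A minimal sketch of the single-shot sum in Equation (135) follows. It is illustrative only: the Archimedean spiral parametrization $\vec{K}(t) = k_{\max} t (\cos 2\pi n t, \sin 2\pi n t)$, the weight $|J| = k_{\max}^2 t$ (the Jacobian that results if the interleaf rotation angle is taken as the second coordinate of this particular parametrization), the point-source test signal, and all function names are assumptions made for the example, not the trajectories or code used in the article.

```python
import cmath
import math


def spiral_samples(n_samples, k_max, turns):
    """Hypothetical spiral samples as (K_x, K_y, |J|) triples, t in (0, 1]."""
    pts = []
    for q in range(1, n_samples + 1):
        t = q / n_samples
        ang = 2 * math.pi * turns * t
        # |J| = k_max^2 * t for phi(t, beta) = k_max*t*(cos(ang + beta), sin(ang + beta))
        pts.append((k_max * t * math.cos(ang), k_max * t * math.sin(ang), k_max * k_max * t))
    return pts


def point_source_signal(pts, x0, y0):
    """Signal of a unit point object at (x0, y0): S = exp(-2*pi*i*(K . r0))."""
    return [cmath.exp(-2j * math.pi * (kx * x0 + ky * y0)) for kx, ky, _ in pts]


def reconstruct(signal, pts, xs, ys, c=1.0):
    """Direct summation over samples at each pixel centre, as in Equation (135)."""
    image = []
    for x in xs:
        row = []
        for y in ys:
            acc = 0j
            for s, (kx, ky, jac) in zip(signal, pts):
                acc += s * cmath.exp(2j * math.pi * (kx * x + ky * y)) * jac
            row.append(c * acc)
        image.append(row)
    return image


if __name__ == "__main__":
    pts = spiral_samples(1024, 20.0, 8)
    grid = [-0.1, -0.05, 0.0, 0.05, 0.1]
    img = reconstruct(point_source_signal(pts, 0.05, -0.05), pts, grid, grid)
    for row in img:
        print(" ".join(f"{abs(v):10.1f}" for v in row))
```

At the pixel that coincides with the point source every term of the sum is in phase, so the value there equals $C \sum_q |J(t_q, 0)|$; elsewhere the terms interfere, which is the PSF behaviour examined in the next subsection. The cost is one complex exponential per sample per pixel, which is the computational inefficiency of direct reconstruction noted in the conclusion.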
Note that the constants $C$ in Equations (135) and (136) absorb the terms $\Delta t$ and $\Delta t\, \Delta\beta$ in Equations (46) and (39), respectively.

The issue of a generalized sampling theorem now arises. For Cartesian coordinates, the aliasing behavior of undersampled reconstructions is well understood as a folding-in of the object from outside the field of view into the reconstruction. A good place to start understanding the aliasing properties of a reconstruction is to understand the associated PSF.

A. Sample Band-pass Operators and Their Point-Spread Functions

By substituting

$$S(t_q) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \rho(\vec{p})\, e^{-2\pi i \vec{p} \cdot \vec{K}(t_q)}\, d\vec{p}$$

and

$$S(t_q, \beta_p) = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \rho(\vec{p})\, e^{-2\pi i \vec{p} \cdot \vec{K}(t_q, \beta_p)}\, d\vec{p}$$

into Equations (135) and (136), respectively, and interchanging the order of integration and summation, the associated PSFs may be seen as

$$\psi(x, y) = C \sum_{q=1}^{Q} e^{2\pi i (K_x(t_q) x + K_y(t_q) y)}\, |J(t_q, 0)| \tag{137}$$

for the single-shot NKPCRM and as

$$\psi(x, y) = C \sum_{p=1}^{P} \sum_{q=1}^{Q} e^{2\pi i (K_x(t_q, \beta_p) x + K_y(t_q, \beta_p) y)}\, |J(t_q, \beta_p)| \tag{138}$$

for the multiple-shot NKPCRM.

¹⁰ Natural K-plane coordinate reconstruction method.

It is not yet clear what form a sampling theorem for arbitrarily sampled k-space data should take, but the relevant issues may be discerned by a detailed look at the PSFs. In the case of Cartesian coordinates, the familiar folding phenomenon that results from undersampling (see Figure 20) can be seen to follow from the periodicity of the PSF. In other words, the trigonometric sum

$$\psi(x, y) = \sum_{p=1}^{P} \sum_{q=1}^{Q} e^{2\pi i (K_x(p, q) x + K_y(p, q) y)} \tag{139}$$
is periodic when $\{K_x(p, q), K_y(p, q)\}$ is a finite grid uniform in each of the $k_x$ and $k_y$ directions, with a wavelength in each direction of $\lambda = 1/k_s$, where $k_s$ is the smallest nonzero frequency¹¹ sampled.

¹¹ Note that the unit of frequency in the k-plane is cycles/meter if $x$ and $y$ are in meters.

FIGURE 20. Undersampling spin warp data leads to the folding of image data from outside the field of view $V = 1/\Delta k$, where $\Delta k$ is the sample spacing. (a) A reconstruction of spin warp MRI data of a pumpkin with adequately spaced k-space samples. (b) A reconstruction of spin warp data spaced at $\Delta k > 1/V$, where $V$ is the diameter of the pumpkin.

For example, spin warp data are frequently sampled on a uniform square grid with a data sample spacing of $\Delta k$ in each direction. The grid is usually centered in k-space so that the origin is
not sampled. The lowest frequency sampled is then $\Delta k/2$, and the wavelength of the PSF is $\lambda = 2/\Delta k$. The real part of the function of Equation (139) also has negative peaks at $\lambda/2$ with the standard spin warp k-space sampling scheme (see Figure 21). The negative peaks limit the field of view to $V = \lambda/2$ or $V = 1/\Delta k$, a result known to every NMR imager but usually understood in terms of a Nyquist theorem and not in terms of the PSF.

The resolution of the reconstructed image is determined by the area under the central peak. That is, the sharper and more concentrated the peak, the finer the resolution. The resolution is determined by the maximum radius in k-space that is sampled. In other words, the larger the region sampled, the finer the resolution. As $P$ and $Q$ in Equation (139) approach infinity, while keeping the region $\Omega = [-A, A] \times [-A, A]$ a constant square, the PSF will approach the function

$$\psi(x, y) = 4A^2\, \frac{\sin[2\pi A x]}{[2\pi A x]}\, \frac{\sin[2\pi A y]}{[2\pi A y]} \tag{140}$$

in $L^2(A)$ as a consequence of Theorem 21 (see also Equation (117) with $b = A$ and $a = 0$). Note that $\psi$ is no longer periodic on $\mathbb{R}^2$ and that $\psi \in L^2(\mathbb{R}^2)$. One can let $A \to \mathbb{R}^2$ as $P, Q \to \infty$ and show that $\psi \in L^2(A) \to \psi \in L^2(\mathbb{R}^2)$. As $P, Q \to \infty$, the peaks of the periodic PSF separate more and more widely, and the function approaches the sinc tensor product of Equation (140).

FIGURE 21. The PSF associated with the reconstruction of data sampled on a Cartesian grid in k-space (spin warp data) is periodic. If the object being imaged has a support larger than the distance between the peaks, then more than one peak will contribute to image reconstruction at a point and information from more than one small neighborhood of the object will be folded or added together. The wavelength of the PSF in the x- and y-directions is the distance between the positive peaks, $\lambda = 2/\Delta k$. However, the negative peaks limit the field of view to $1/\Delta k$.

Regardless of the field of view imaged, the resolution of the reconstructed image can be taken as proportional to $1/A$, one over half the length of the sides of the square covered by the k-space samples. This resolution approximation follows from the limit PSF and represents the region under its central peak. The first zeros of $\psi$ occur at $x = \pm 1/(2A)$ and $y = \pm 1/(2A)$ along a square centered at the origin and with sides of length $1/A$. The number $1/A$ itself is too conservative an estimate of the resolution, which is closer to $1/(2A)$ when the Rayleigh criterion is used. The Rayleigh criterion may be interpreted as the minimum offset that produces a ratio of the value of the peaks of two superimposed, offset PSFs to the value of the function between the two peaks of greater than 0.81 [Callaghan, 1991]. Obtaining a crisp image is then a matter of ensuring that the pixel size, $V/Q$ (or $V/P$), is slightly larger than $1/(2A)$.

For the PSFs associated with discrete grids other than the Cartesian, it is still true that as $P, Q \to \infty$, the trigonometric functions of Equations (137) and (138) approach the PSFs associated with the curve band-pass and ordinary band-pass operators, respectively. The convergence is valid with respect to the metric of a fixed function space $L^2(A)$, and the limit function belongs to $L^2(\mathbb{R}^2)$. For adequately sampled trajectories, the limit function may be used to estimate the resolution of the reconstructed image. However, the aliasing properties of undersampled reconstructions are more complicated than the folding that occurs with the Cartesian data. Computer simulations best illustrate what happens when non-Cartesian data are undersampled.

B. Reconstructions of Undersampled Simulated MRI Data

The mathematical phantom shown in Figure 22 provides a convenient image for the generation of a simulated MRI signal, as given by Equation (2). The phantom is composed of multiples of characteristic functions of ellipses, so a simulated MRI signal may be computed analytically as a superposition of the Fourier transforms of translated and rotated characteristic functions, $\chi_E$, of an ellipse. Specifically, let

$$\chi_E(x, y) = \begin{cases} 1 & \text{if } \dfrac{x^2}{a^2} + \dfrac{y^2}{b^2} \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{141}$$
FIGURE 22. The mathematical phantom used to generate the simulated MRI signals. It is a superposition of characteristic functions of circles. The x-axis is vertical, whereas the y-axis is horizontal, an important point to remember when looking at later figures. The scale of the image box shown is 0.2 m × 0.2 m, and it is that scale that defines the field of view required to produce an alias-free image. In subsequent plots of the PSFs in later figures, the distances $x$ and $y$ are in meters.

Then

$$\hat{\chi}_E(k_x, k_y) = \frac{ab\, J_1\!\left(2\pi \sqrt{a^2 k_x^2 + b^2 k_y^2}\right)}{\sqrt{a^2 k_x^2 + b^2 k_y^2}} \tag{142}$$
where $J_1$ is the first-order Bessel function. If $\chi_T(x, y) = \chi_E(x - T_x, y - T_y)$, then $\hat{\chi}_T(k_x, k_y) = e^{-2\pi i (T_x k_x + T_y k_y)}\, \hat{\chi}_E(k_x, k_y)$, and if $\chi_R(x, y) = \chi_E(x\cos\theta + y\sin\theta,\ -x\sin\theta + y\cos\theta)$, then $\hat{\chi}_R(k_x, k_y) = \hat{\chi}_E(k_x\cos\theta + k_y\sin\theta,\ -k_x\sin\theta + k_y\cos\theta)$.

In what follows, all images were directly reconstructed on a 64 × 64 pixel grid and the taper functions used were those given by the Jacobian determinants associated with natural k-plane coordinates. The k-space was sampled to a radius that gave a PSF resolution comparable to the pixel's spatial size. Note that it is possible to reconstruct the same data on a finer pixel grid by computing more values from Equations (135) or (136), but the PSF resolution does not change. The equivalent trick with the FFT and Cartesian data is to zero-fill the k-space data by padding the data set at frequencies outside the sampled region with zeros.

Figures 23 and 24 show the reconstructions of undersampled spiral data and the corresponding PSFs for 2, 4, 8, and 16 times undersampling, along with an adequately sampled reconstruction. Four interleaved spiral trajectories, each with $\omega = 32$, were sampled and a total of $2^{14}$ samples ($2^{12}$ per trajectory) were required to obtain an unaliased reconstruction. It should be noted that by using $\omega = 16$, it is possible to obtain unaliased reconstructions from four interleaved spirals with a total of $2^{13}$ samples; remarkably, the reconstructions and PSFs for 2, 4, 8, and 16 times undersampling are very similar to the results shown in Figures 23 and 24.

FIGURE 23. From top to bottom, the images in part (a) show reconstructions of simulated k-space data from four interleaved Archimedean spirals at an adequately sampled rate of $2^{14}$ total samples and with $2^{13}$ and $2^{12}$ total samples, which give undersampling by factors of 2 and 4, respectively. In (b) are the corresponding PSFs. Note how the PSF flat spot, which defines the field of view, shrinks and disappears with continued undersampling.

FIGURE 24. From top to bottom, the images in part (a) show reconstructions of simulated k-space data from four interleaved Archimedean spirals undersampled with $2^{11}$ and $2^{10}$ total samples, which give undersampling by factors of 8 and 16, respectively. The corresponding PSFs (part (b)) now show radial spikes along the radii defined by the location of the k-space samples.

FIGURE 25. The ROSE k-plane trajectory follows a largely radial direction.

By restricting $\omega_1$ to be equal to 1 in Equation (9), the petals of the rosette trajectory can be made to follow a largely radial direction, as shown in Figure 25. Such radially oriented excursions in k-space (ROSE) scans may have the advantage of being motion resistant in the sense that the artifact resulting from object motion during data collection is less offensive than the artifact produced from the reconstruction of data acquired in other ways, such as Cartesian EPI. Figures 26 and 27 show the reconstructions of undersampled ROSE data and the corresponding PSFs for 2, 4, 8, and 16 times undersampling, along with an adequately sampled reconstruction. Four interleaved ROSE trajectories with $\omega = 16$ were used and a total of $2^{14}$ samples were required to avoid aliasing.
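The analytic phantom signal described by Equations (141) and (142), together with the translation and rotation rules, can be sketched as follows. This is an illustrative implementation under stated assumptions: the function names, the tuple layout of an ellipse, the convention that an ellipse is rotated first and then translated, and the power-series evaluation of $J_1$ (adequate only for moderate arguments, roughly below 20) are choices made here and are not taken from the article.

```python
import cmath
import math


def bessel_j1(z, terms=80):
    """Power series for J_1(z); loses accuracy for large |z| (assumed helper)."""
    total, term = 0.0, z / 2.0
    for k in range(terms):
        total += term
        term *= -(z / 2.0) ** 2 / ((k + 1) * (k + 2))
    return total


def ellipse_ft(kx, ky, a, b):
    """Equation (142): transform of the indicator of x^2/a^2 + y^2/b^2 <= 1."""
    rho = math.sqrt(a * a * kx * kx + b * b * ky * ky)
    if rho == 0.0:
        return math.pi * a * b          # limiting value: the area of the ellipse
    return a * b * bessel_j1(2 * math.pi * rho) / rho


def placed_ellipse_ft(kx, ky, a, b, tx=0.0, ty=0.0, theta=0.0):
    """Transform of an ellipse rotated by theta, then translated by (tx, ty)."""
    kr_x = kx * math.cos(theta) + ky * math.sin(theta)
    kr_y = -kx * math.sin(theta) + ky * math.cos(theta)
    shift = cmath.exp(-2j * math.pi * (tx * kx + ty * ky))
    return shift * ellipse_ft(kr_x, kr_y, a, b)


def phantom_signal(kx, ky, ellipses):
    """Superposition over (weight, a, b, tx, ty, theta) tuples (hypothetical layout)."""
    return sum(w * placed_ellipse_ft(kx, ky, a, b, tx, ty, th)
               for w, a, b, tx, ty, th in ellipses)


if __name__ == "__main__":
    demo = [(1.0, 0.08, 0.06, 0.0, 0.0, 0.0), (-0.5, 0.02, 0.03, 0.03, -0.01, 0.4)]
    print(phantom_signal(5.0, -3.0, demo))
```

Evaluating `phantom_signal` at the sample locations of a trajectory gives the simulated samples $S(t_q)$ that a direct reconstruction would then sum.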
FIGURE 26. From top to bottom, the images in part (a) show reconstructions of simulated k-space data from four interleaved ROSE trajectories at an adequately sampled rate of $2^{14}$ total samples and undersampled with $2^{13}$ and $2^{12}$ total samples, which give undersampling by factors of 2 and 4, respectively. The corresponding trigonometric PSFs (part (b)) are shown on the right. Note how the flat spot that defines the field of view shrinks and finally inverts the central peak as the number of samples decreases.

FIGURE 27. From top to bottom, the images in part (a) show reconstructions of simulated k-space data from four interleaved ROSE trajectories undersampled at rates of $2^{11}$ and $2^{10}$ total samples, which give undersampling by factors of 8 and 16, respectively. The corresponding trigonometric PSFs are shown in part (b). Ring lobes have formed, and at $2^{10}$ samples the central peak has inverted again.
Figures 28 and 29 show the reconstructions of undersampled sinusoidal data and the corresponding PSFs for 2, 4, 8, and 16 times undersampling, along with an adequately sampled reconstruction. A single sinusoid with $\omega = 64$ was sampled, and the Jacobian determinant used for the taper function was the k-space velocity associated with the normal-segment coordinates. As with the ROSE reconstructions, $2^{14}$ samples were required for an unaliased reconstruction. Note that the aliasing is similar to the folding found in Cartesian sampling. Also note that the folding occurs only in the amplitude direction of the sinusoid; the sampling $\Delta k$ in the other direction is always less than the Nyquist value.

FIGURE 28. From top to bottom, the images in part (a) show reconstructions of simulated sinusoidal k-space data at an adequately sampled rate of $2^{14}$ total samples and with $2^{13}$ and $2^{12}$ total samples, which give undersampling by factors of 2 and 4, respectively. One sinusoidal trajectory was used, with the samples taken at equal $\Delta t$ along the trajectory. The corresponding trigonometric PSFs are shown in part (b). As the number of samples decreases, the PSF degenerates along the y-direction.

FIGURE 29. From top to bottom, the images in part (a) show reconstructions of simulated sinusoidal k-space data undersampled with $2^{11}$ and $2^{10}$ total samples, which give undersampling by factors of 8 and 16, respectively. The PSFs shown in part (b) continue to degenerate in the y-direction as the number of samples along the k-space trajectory is reduced.

The PSFs associated with the spiral and ROSE reconstructions show a clear flat spot that shrinks as the number of samples is decreased. The flat spot represents a zone of nearly complete destructive interference among the sinusoids that comprise the discrete PSF, and the diameter of the flat spot determines the field of view. The PSF associated with the sinusoidal acquisition has peaks all along the y-axis and is flat away from that axis. The grid used to compute the sine PSF for the figures does not adequately show the structure of the function along the y-axis, which roughly consists of alternating positive and negative peaks. So while a numerical analysis of the PSFs associated with non-Cartesian sampled MRI data is a good place to start in formulating a general sampling theorem, it appears that a complete analytical understanding of the discrete PSF is required before it can be claimed that the reconstruction of such data is completely understood.
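As a baseline for this kind of numerical PSF analysis, the Cartesian case of Equation (139) can be checked in a few lines. The sketch below assumes an $n \times n$ grid with spacing $\Delta k$, centered so that the origin is not sampled (sample frequencies $(p + \tfrac{1}{2})\Delta k$ with $p = -n/2, \ldots, n/2 - 1$); the function name and the values $n = 16$ and $\Delta k = 5$ cycles/meter are illustrative assumptions. It exhibits the period $2/\Delta k$ and the negative peak at $1/\Delta k$ described in Section VI.A.

```python
import cmath
import math


def cartesian_psf(x, y, n, dk):
    """Trigonometric sum of Equation (139) on a centred n-by-n grid (n even)."""
    def factor(u):
        # One-dimensional sum over frequencies (p + 1/2)*dk, so k = 0 is not sampled.
        return sum(cmath.exp(2j * math.pi * (p + 0.5) * dk * u)
                   for p in range(-n // 2, n // 2))
    return factor(x) * factor(y)          # the uniform grid makes the sum separable


if __name__ == "__main__":
    n, dk = 16, 5.0                        # field of view 1/dk = 0.2 m
    for x in (0.0, 1.0 / dk, 2.0 / dk):
        print(x, cartesian_psf(x, 0.0, n, dk).real)
```

The central peak has height $n^2$, the value at $x = 1/\Delta k$ is its negative, and the pattern repeats at $x = 2/\Delta k$. No comparably simple statement is available for the spiral, ROSE, or sinusoidal sums, which is why their PSFs were examined numerically above.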
VII. CONCLUSION

The reconstruction of MRI data sampled on non-Cartesian grids in k-space offers challenges not encountered with the reconstruction of data sampled on Cartesian grids. The FFT allows the efficient reconstruction of Cartesian sampled data, and its efficiency has been the motivating force behind the development of the gridding approach for the reconstruction of non-Cartesian sampled data, especially for spiral MRI.
The direct reconstruction determined by summing data along k-space trajectories has been shown in this article to be intimately related to integral transforms that depend on the existence of local coordinates on the k-space that are naturally associated with the given k-space data-acquisition trajectories. The local coordinates are natural in the sense that acquisition time provides an obvious coordinate. The interleaf parameter from acquisition schemes that can be interleaved can provide the second natural k-space coordinate.

The relationship between the integral transforms and their approximations required for practical image reconstruction has been explored from a function space point of view. That point of view has allowed definite statements to be made about the convergence properties of discrete operators and their corresponding discrete point-spread functions to their integral counterparts as the number of samples is increased. Thus, statements about the resolution of a discrete reconstruction scheme can be made through the analysis of the point-spread function associated with an integral reconstruction scheme.

Gridding reconstruction has been shown to be an approximation of direct reconstruction, so that all the results obtained from a study of direct reconstruction, including those related to sampling issues, may be directly applied to gridding reconstruction. The computational inefficiency of direct reconstruction does not, therefore, detract from the value of the theorems about integral and discrete operators on Paley-Wiener spaces to practical situations. The mathematics underlying gridding reconstruction is approximately the same as that underlying direct reconstruction.

Finally, the largest open issue in reconstructing non-Cartesian sampled MRI data is the formulation of a sampling theorem. In other words, how many samples are required for a generic sampling scheme to result in an unaliased reconstruction? The results of computer simulations show that a careful study of the trigonometric point-spread functions associated with the reconstructions may shed some light in the direction of a generalized sampling theorem.

Non-Cartesian grids in k-space originally arose from an effort to increase the speed of MRI data acquisition while limiting the demands placed on the gradient field power systems. It turns out that differing k-space acquisition patterns provide advantages not seen in the signal model used in this article. For example, rosette and Lissajous patterns provide some spectral selectivity among chemically shifted nuclear spins or among slices in an object. The ROSE pattern may, like the older radial pattern, provide some resistance to motion artifact. Some patterns may even be conducive to combined spatial/temporal data acquisition for the production of real-time MRI movies. Although more sophisticated signal models are required for those situations, they will all be based on the signal model used here, and the basic theory of reconstruction will be as given in this article.
REFERENCES

Bloch, F. (1946). Nuclear induction, Physical Rev. 70, 460-474.
Bracewell, R. N., and Thompson, A. R. (1973). The main beam and ringlobes of an east-west rotation-synthesis array, Astrophysical J. 182, 74-94.
Brigham, O. E. (1974). The Fast Fourier Transform, Upper Saddle River, N.J.: Prentice Hall.
Brouw, W. N. (1975). Aperture Synthesis, in Methods in Computational Physics, vol. 14: Radio Astronomy. New York: Academic Press, pp. 131-175.
Callaghan, P. T. (1991). Principles of Nuclear Magnetic Resonance Microscopy. Oxford: Clarendon Press.
Daubechies, I. (1992). Ten Lectures on Wavelets. Philadelphia: SIAM.
Dutt, A., Gu, M., and Rokhlin, V. (1993). Fast Algorithms for Polynomial Interpolation, Integration and Differentiation, Tech. report YALEU/DCS/RR-977, Yale University, Department of Computer Science.
Dutt, A., and Rokhlin, V. (1993). Fast Fourier transforms for nonequispaced data, SIAM J. of Sci. Comp. 14, 1368-1393.
Dutt, A., and Rokhlin, V. (1995). Fast Fourier transforms for nonequispaced data II, Appl. and Comp. Harmonic Anal. 2, 85-100.
Beger, W. H., ed. (1997). CRC Standard Mathematical Tables. Boca Raton: CRC Press.
Higgins, J. R. (1996). Sampling Theory in Fourier and Signal Analysis. New York: Oxford University Press.
Hinshaw, W. A., and Lent, A. H. (1983). An introduction to NMR imaging: From the Bloch equation to the imaging equation, Proc. of the IEEE 71, 338-350.
Hoge, R. D., Kwan, R. K. S., and Pike, G. B. (1997). Density compensation functions for spiral MRI, Mag. Resonance in Med. 38, 117-128.
Jackson, J. I., Meyer, C. H., Nishimura, D. G., and Macovski, A. (1991). Selection of a convolution function for Fourier inversion using gridding, IEEE Trans. on Med. Imaging 10, 473-478.
Lauterbur, P. C. (1973). Image formation by induced local interactions: Examples employing nuclear magnetic resonance, Nature 242, 190.
Ljunggren, S. (1983). A simple graphical representation of Fourier-based imaging methods, J. of Mag. Resonance 54, 338-343.
Maeda, A., Sano, K., and Yokoyama, T. (1988). Reconstruction by weighted correlation for MRI with time-varying gradients, IEEE Trans. on Med. Imaging 7, 26-31.
Mansfield, P. (1977). Multi-planar image formation using NMR spin echos, J. of Physics C 10, L55.
Meyer, C. H., Hu, B. S., Nishimura, D. G., and Macovski, A. (1992). Fast spiral coronary imaging, Mag. Resonance in Medicine 28, 202-213.
Noll, D. C. (1995). Methodologic considerations for spiral k-space functional MRI, Int. J. of Imaging Syst. and Tech. 6, 175-183.
Noll, D. C. (1997). Multishot rosette trajectories for spectrally selective MR imaging, IEEE Trans. on Medical Imaging 16, 372-377.
Noll, D. C., Meyer, C. H., Pauly, J. M., Nishimura, D. G., and Macovski, A. (1991). A homogeneity correction method of magnetic resonance imaging with time-varying gradients, IEEE Trans. on Med. Imaging 10, 629-637.
Noll, D. C., Pauly, J. M., Meyer, C. H., Nishimura, D. G., and Macovski, A. (1992). Deblurring for non-2D Fourier transform magnetic resonance imaging, Mag. Resonance in Med. 25, 319-333.
Noll, D. C., Peltier, S. J., and Boada, F. E. (1998). Simultaneous multislice acquisition using rosette trajectories (SMART): A new imaging method for functional MRI, Mag. Resonance in Med. 39, 709-716.
O'Sullivan, J. (1985). A fast sinc function gridding algorithm for Fourier inversion in computer tomography, IEEE Trans. on Med. Imaging 4, 200-207.
Reed, M., and Simon, B. (1972). Methods of Modern Mathematical Physics, Vol. II. New York: Academic Press.
Rosenfeld, D. (1998). An optimal and efficient new gridding algorithm using singular value decomposition, Mag. Resonance in Med. 40, 14-23.
Sarty, G. E. (1995). Method for reconstruction for MRI signals, Canadian patent application 2,147,556 (April 21).
Sarty, G. E. (1997). The natural k-plane coordinate reconstruction method for magnetic resonance imaging: Mathematical foundations, Int. J. of Imaging Syst. Tech. 8, 519-528.
Schomberg, H., and Timmer, J. (1995). The gridding method for image reconstruction by Fourier transformation, IEEE Trans. on Med. Imaging 14, 596-607.
Slichter, C. P. (1963). Principles of Magnetic Resonance. New York: Harper and Row.
Sogge, C. D. (1993). Fourier Integrals in Classical Analysis. Cambridge, Mass.: Cambridge University Press.
Stehling, M. K., Turner, R., and Mansfield, P. (1991). Echo-planar imaging: Magnetic resonance imaging in a fraction of a second, Science 251, 43-50.
Tong, R., and Cox, R. W. (1998). Reconstruct NMR images from arbitrary scanning trajectory using nonuniform fast Fourier transform, Proc. of ISMRM, 6th Scientific, Sydney, p. 2065.
Tricomi, F. G. (1985). Integral Equations. New York: Dover Publications, Inc.
Twieg, D. B. (1983). The k-trajectory formulation of the NMR imaging process with applications in analysis and synthesis of imaging methods, Med. Physics 10, 610-621.
Vlaardingerbroek, M. T., and den Boer, J. A. (1996). Magnetic Resonance Imaging. Berlin: Springer-Verlag.
Wolfram, S. (1991). Mathematica. Reading, Mass.: Addison-Wesley.
Yosida, K. (1980). Functional Analysis. Berlin: Springer-Verlag.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 111
An Integrated Approach to Computational Vision
The Edge Strength Function and the Nested Symmetries

SIBEL TARI
Department of Engineering Sciences, Middle East Technical University, Ankara, Turkey, 06531
I. Introduction
II. Review of the Edge-Strength Function
III. Geometry of the Edge-Strength Function
IV. Extraction and Representation of Perceptual Information
    A. Local Symmetry Set
    B. Partial Symmetries and the Nested Loci
    C. Illustrative Examples and Discussion
V. Shape Skeleton and Shape Segmentation (2-D Case)
VI. Coloring Nested Symmetries in Higher Dimensions
    A. 3-D Case
        1. Singular Locus
Abstract
References
Volume 111, ISBN 0-12-014753-X, ISSN 1076-5670/99 $30.00. Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved.

I. INTRODUCTION

Visual perception has been a source of curiosity in many disciplines. Psychologists, psychophysicists, and neurophysiologists have tried to find out how the visual stimuli (the light absorbed by the receptors in the retina) are transformed into perception of shapes. With the advance of digital computers, a new approach to perception called computational vision has been initiated. Computational vision is characterized by an interest in formally defining the perceptual process necessary to represent the world [Matlin and Foley, 1992; Marr, 1982]. Ultimately, a computational vision theorist wants to develop programs that can take in the sort of data sensed by the photoreceptors in the retina (a digital image) and process these data to extract useful information that yields a representation of the sensed world. The goal is twofold: to build intelligent machines with the ability to see and to better understand human vision.

An important feature of the computational approach is that it attempts to solve problems using general physical knowledge rather than specific knowledge about the objects being seen at the time; in this respect, the computational
approach differs from the information-processing approach to vision that is popular among computer people. Scientists working on vision, however, agree that recognition of an object such as a cat requires that we know what a cat is. Therefore, the work in computational vision has focused on the early stages of visual processing.

It has always been the biggest challenge of computational vision to express intuitive shape properties in quantitative terms using a representation that places similar shapes in a neighborhood without fixing a metric or knowing what is meant by a similar shape, because such knowledge depends on the context (of course, the representation should be computable from images). Representations based on algebraic invariants and techniques derived from signal processing fail to capture intuitive properties of the flexible shapes surrounding us. All or part of the problem has been addressed by many leading researchers of computational vision [Blum, 1973; Marr, 1982; Latto, Mumford, and Shah, 1984; Mumford, 1987]; the need for an object-defined reference system is emphasized.

In the late 1960s, motivated by the preceding problem, H. Blum [1967, 1973] searched for an alternative geometry that would be devised to work for flexible natural objects and would replace the conventional geometry developed in collaboration with mechanics and out of concern for properties such as congruence, similarity, and equivalence under planar projection. Inspired by the influential Gestalt school of psychology, he presented a pioneering work on shape representation. He called his transform the medial axis transform and the representation a medial axis, or shape skeleton. The medial axis transform is defined by an axis that is the locus of the centers of the inscribed maximal circles of the shape, together with the radii of those circles. The key point is that the shape is locally symmetric with respect to this axis.
Following the work of Blum, a variety of shape representation methods have been proposed, each of which directly or indirectly makes use of an axis about which the shape is locally symmetric. These methods include, among others, generalized cylinders [Binford, 1971; Marr and Nishihara, 1977]; generalized ribbons [Brooks, 1981]; smoothed local symmetries [Brady and Asada, 1984]; cores [Burbeck and Pizer, 1994]; and shock grammar [Siddiqi and Kimia, 1996]. The use of local symmetry was established as the result of a collaborative effort of mathematicians, engineers, and psychologists, in particular the Gestalt psychologists. Since their introduction by Wertheimer [1923], Koffka [1935], and Köhler [1947], Gestalt laws have had significant impact on many areas of psychology, mostly on perception of form; they have been a major source of inspiration for the shape theories in computational vision. The law of Prägnanz [Koffka, 1935] of the Gestalt school refers to the tendency to perceive figures as good, simple, and stable. According to this law, some figures are better than others because they are simpler; a circle is a perfect figure. Gestalt theorists, however, failed
to provide quantitative measures. Later, H. Resnikoff [1989] showed that in a closed contour, the increase in information obtained by going from point A to point B is given by $\log|\kappa_B/\kappa_A|$, where $\kappa_A$ and $\kappa_B$ are the respective curvatures at points A and B. Thus, information in a closed contour is concentrated at the points of extreme curvature (note that this is consistent with the Attneave and Arnoult [1956] experiments on human subjects). Therefore, the circle (maximum symmetry and uniform curvature) is the most economical shape for encoding in the memory.

Leyton's PISA [1988] (Process Inferring Symmetric Axis) is another symmetry-based representation in which a shape is interpreted as a circle deformed with indentations and protrusions. Leyton [1987] first argued the duality of local symmetry and curvature and used this duality to infer PISA. Thus, PISA measures how a shape deviates from a circle. The basic idea of interpreting a shape as a collection of pieces that are themselves deformed versions of a perfect shape through indentations and protrusions (curvature extrema) is particularly explicit in superquadrics [Bajcsy and Solina, 1987] (parametrized forms that may be best described as lumps of clay deformed and glued) and in the entropy scale space [Kimia, Tannenbaum, and Zucker, 1995]. The underlying principle of the entropy scale space is the representation of a shape in a two-dimensional scale space by letting the shape evolve toward a more circular shape under the influence of a nonlinear wave-diffusion equation. The diffusion tends to smooth shapes and the reaction tends to form singularities. The singularities that arise during the evolution of the shape provide a multiscale shape skeleton.

It appears that there is a close connection among local symmetry, curvature, and wave propagation. In fact, wave propagation is the source of inspiration for Blum's [1967] method. He gave several constructions for shape skeletons.
One of these constructions is known as a grassfire, in which the interior of the shape is filled with dry grass and a fire is set simultaneously at all points along the shape boundary. The fire front propagates with constant inward normal velocity. Then, at any time, its points are equidistant from the shape boundary. The axis of local symmetry or the shape skeleton is defined as the locus of quench points (singularities) of the advancing front. By keeping track of the time at which the front arrives at the skeleton points, it is possible to recover the original shape. Thus, no information is lost. A consequence of this fact is that if the shape boundary is noisy, the noise is also preserved in the skeleton and thus the representation is not robust with respect to noise (Figure 1). Many complicated strategies have been devised to prune noisy skeletons in order to arrive at the essential skeleton of shape [Ogniewicz, 1994]. An alternative systematic approach is to introduce smoothing or regularization in the grass fire itself, thus combining skeletonization and regularization in a single formulation.
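The grass fire construction can be sketched numerically for the rectangle of Figure 1(a). The following is a minimal illustration, not the author's implementation: it assumes unit front speed, so that the arrival time equals the distance to the nearest side, and it marks as quench points the grid points that the front reaches from two or more sides at once. The function name `grassfire_rectangle`, the grid spacing, and the tie tolerance are hypothetical choices made for this example.

```python
import numpy as np

def grassfire_rectangle(width, height, step=1.0):
    """Arrival time t(x, y) of a unit-speed grass fire inside a rectangle.

    For an axis-aligned rectangle the distance to the boundary is the
    distance to the nearest of the four sides.
    """
    xs = np.arange(0.0, width + step / 2, step)
    ys = np.arange(0.0, height + step / 2, step)
    x, y = np.meshgrid(xs, ys)                  # arrays indexed [row = y, col = x]
    # Distance of every grid point to the left, right, bottom, and top side.
    side_dist = np.stack([x, width - x, y, height - y])
    t = side_dist.min(axis=0)
    # Quench points: the front arrives from two or more sides simultaneously.
    ties = np.isclose(side_dist, t, rtol=0.0, atol=1e-9).sum(axis=0)
    skeleton = (ties >= 2) & (t > 0)
    return t, skeleton

t, skeleton = grassfire_rectangle(width=8, height=4)
```

On this 8 by 4 rectangle the marked points form the central horizontal segment together with the four diagonal branches that run into the corners. Because the arrival time is stored with the skeleton, the boundary can be recovered as the envelope of circles of radius `t` centered on the skeleton points, which is the sense in which no information is lost.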
FIGURE 1. Blum’s shape skeletons for (a) a rectangle and (b) a rectangle with a small bump at the boundary.
A straightforward approach to regularization is to let the front move with a curvature-dependent inward normal velocity consisting of two components: a constant component (Blum's grass fire [1967]) and a smoothing component proportional to the curvature. Then, the propagating fronts may be interpreted as the steps in the evolution of a shape boundary evolving toward a smoother form. The resulting Hamilton-Jacobi equation is difficult to solve, requiring computationally intensive and complicated techniques [Tari, Shah, and Pien, 1996; Osher and Sethian, 1988; Sethian, 1982]. In order to implement such a nonlinear evolution, a linear diffusion-based approach was recently formulated by Tari, Shah, and Pien [1996, 1997].

To illustrate the idea, first consider Blum's grass fire analogy. Think of time $t$ as a function over the plane by setting $t(x, y)$ as the time when the advancing fire front passes through the point $(x, y)$. The value of the function at point $(x, y)$ is the Euclidean distance of the point to the shape boundary. Its level curves describe the propagation of the shape boundary, and the projection of its singularities constitutes the shape skeleton (Figure 2). In place of the surface $t(x, y)$, they proposed the function $v(x, y)$, which is called the edge-strength function. Function $v$ is smooth, equals 1 along the shape boundary, and decays rapidly away from the boundary. Thus $1 - v$ may be thought of as a monotonic function of $t$. The surface $1 - v$ looks like a smoother (monotonically scaled) version of the surface $t$; the level curves of $1 - v$, and thus the level curves of $v$, are smoothed analogs of the propagating fire fronts (Figure 3). (Notice that time is increasing in the direction of decreasing $v$.) The formulation involves a parameter, $\rho$, which controls the degree of smoothing. The edge-strength function is inspired by an approximation [Ambrosio and Tortorelli, 1992] of the famous Mumford-Shah segmentation functional [Morel and Solimini, 1996; Mumford and Shah, 1985, 1989].
Indeed, Tari, Shah, and Pien used Euler-Lagrange equations of the approximate functional (a set of coupled linear PDEs) to compute the edge-strength function from gray scale images.
FIGURE 2. Surface $t(x, y)$ defined by equidistant contour lines.
FIGURE 3. Surface $1 - v$ and the level curves of $v$.
What they demonstrated, however, is that the edge-strength function is much more than a technical device for applying gradient descent to the Mumford-Shah functional. They showed that the level curves of the edge-strength function describe the evolution of the shape boundary by letting each point move with a curvature-dependent inward normal velocity, and that the local geometry of the edge-strength function provides information regarding the geometry of the shape. Later, Tari and Shah went further and presented a very efficient and easy method for extraction of this information from images of arbitrary dimension in the form of nested local symmetries [Tari and Shah, 1998].

The most important property of the method is that it extends naturally to real images (i.e., gray scale and color images). We have assumed to this point that the image segmentation problem is solved; the shape outline is already extracted and available. But this is not an easy task, especially if the image is noisy or has varying contrast. The crucial point is that the edge-strength function corresponding to the segmentation locus of a gray scale or color image can be computed without determining the segmentation locus itself. Thus, the method permits the extraction of local symmetries and the shape skeleton directly from raw (unprocessed) images, combining shape extraction and shape representation (two major challenges of computational vision) into a single integrated framework.

The purpose of this chapter is to discuss this development in local symmetry extraction and its connections to the edge-strength function as determined by inhomogeneous diffusion and to the fronts moving with curvature-dependent speed. The chapter is organized as follows. In Section II, the edge-strength function is reviewed. In Section III, the geometry of the edge-strength function is examined and the behavior of its level curves is related to a front moving with a curvature-dependent velocity. Extraction of perceptual information in the form of nested local symmetry loci is discussed in Section IV; the idea is developed for the 2-D case first and then generalized to an arbitrary dimension. Section V deals with the coloring or the classification of local symmetry points for the 2-D case; the shape skeleton and the shape decomposition are computed, and recovery of the object boundaries from the symmetry branches is discussed. In Section VI the constructions of Section V are extended to higher dimensions.
II. REVIEW OF THE EDGE-STRENGTH FUNCTION

Mumford and Shah [1985] formulated the image-segmentation problem as a functional minimization via which a piecewise smooth approximation of the image and a set of discontinuity loci corresponding to object boundaries are to be recovered from a given noisy image. The functional is as follows:

$$E_{MS} = \alpha \iint_{R \setminus \Gamma} \|\nabla u\|^2 \, dx\, dy + \beta \iint_{R} (u - g)^2 \, dx\, dy + \mathrm{length}(\Gamma) \tag{1}$$

where $R \subset \mathbb{R}^2$ is a connected, bounded, open subset representing the image domain, $g(x, y)$ is the feature intensity, $\Gamma$ is a curve segmenting $R$, $u(x, y)$ is the smoothed image on $R \setminus \Gamma$, and $\alpha$ and $\beta$ are the weights. Let $\sigma = \sqrt{\alpha/\beta}$. Then $\sigma$ may be interpreted as the smoothing radius in $R \setminus \Gamma$. With $\sigma$ fixed, the higher the value of $\alpha$, the lower the penalty for $\mathrm{length}(\Gamma)$ and, hence, the more detailed is the segmentation. Recently, Morel and Solimini [1996] argued that the Mumford-Shah model is the general model of image segmentation and the existing segmentation algorithms and heuristics are its variants. Unfortunately, efficient algorithms that minimize the functional do not exist. Even finding a segmentation that only approximately minimizes this functional is an extremely difficult task. The source of difficulty is the discrete characteristic function of $\Gamma$. As a solution,
Ambrosio and Tortorelli [1992] replace the discrete characteristic function by a continuous function defined over the entire image domain. The new function varies between 0 and 1 and decays exponentially away from $\Gamma$. Specifically, it is the unique minimizer of the following functional:

$$\Lambda_\rho(v) = \frac{1}{2} \iint_R \left\{ \rho \|\nabla v\|^2 + \frac{v^2}{\rho} \right\} dx\, dy \tag{2}$$

subject to the boundary condition $v = 1$ along $\Gamma$. That is, $v$ is the solution of the following boundary value problem:

$$\nabla^2 v - \frac{v}{\rho^2} = 0; \qquad v|_{\Gamma} = 1 \tag{3}$$
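The exponential decay can be checked in one dimension. With the boundary reduced to the single point $x = 0$, the boundary value problem (3) reads $v'' = v/\rho^2$, $v(0) = 1$, whose decaying solution is $v = e^{-x/\rho}$, so that $-\rho \ln v$ recovers the distance $x$ to the boundary. The sketch below solves the discretized problem by central finite differences and compares $-\rho \ln v$ with $x$. The function name `edge_strength_1d`, the domain length, the grid size, and the artificial far boundary condition $v = 0$ are assumptions made for the illustration only.

```python
import numpy as np

def edge_strength_1d(rho, length=40.0, n=801):
    """Solve v'' - v / rho**2 = 0 on [0, length] with v(0) = 1, v(length) = 0."""
    x = np.linspace(0.0, length, n)
    h = x[1] - x[0]
    m = n - 2                                   # number of interior nodes
    # Tridiagonal system: (v[i-1] - 2 v[i] + v[i+1]) / h^2 - v[i] / rho^2 = 0
    A = np.zeros((m, m))
    idx = np.arange(m)
    A[idx, idx] = -2.0 / h**2 - 1.0 / rho**2
    A[idx[:-1], idx[:-1] + 1] = 1.0 / h**2
    A[idx[1:], idx[1:] - 1] = 1.0 / h**2
    b = np.zeros(m)
    b[0] = -1.0 / h**2                          # contribution of v(0) = 1
    v = np.empty(n)
    v[0], v[-1] = 1.0, 0.0
    v[1:-1] = np.linalg.solve(A, b)
    return x, v

rho = 4.0
x, v = edge_strength_1d(rho)
near = x <= 20.0                     # stay away from the artificial far boundary
t = -rho * np.log(v[near])           # "arrival time" recovered from v
max_error = np.max(np.abs(t - x[near]))
```

In this flat, curvature-free setting $-\rho \ln v$ reproduces the distance function up to discretization error, which illustrates why $1 - v$ behaves as a monotonically rescaled arrival time.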
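As $\rho \to 0$, $\Lambda_\rho(v) \to \mathrm{length}(\Gamma)$ and $v \to 0$ everywhere except along $\Gamma$. Thus, $v$ may be thought of as a blurred version of the characteristic function of $\Gamma$ with a blurring radius $\rho$, or it may be interpreted as the probability for the presence of an edge at every image point; thus it is appropriate to call it the edge-strength function. In the Mumford-Shah functional, replacing $\mathrm{length}(\Gamma)$ by $\Lambda_\rho(v)$ and

$$\iint_{R \setminus \Gamma} \|\nabla u\|^2 \, dx\, dy \quad \text{by} \quad \iint_{R} (1 - v)^2 \|\nabla u\|^2 \, dx\, dy$$

the following elliptic approximation is obtained:

$$E_{AT} = \iint_R \left\{ \alpha (1 - v)^2 \|\nabla u\|^2 + \beta (u - g)^2 + \frac{\rho}{2} \|\nabla v\|^2 + \frac{v^2}{2\rho} \right\} dx\, dy$$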
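It is now possible to apply gradient descent, and the corresponding Euler-Lagrange equations are

$$\begin{aligned}
\frac{\partial u}{\partial t} &= \nabla \cdot (1 - v)^2 \nabla u - \frac{\beta}{\alpha}(u - g) \\
\frac{\partial v}{\partial t} &= \nabla^2 v - \frac{v}{\rho^2} + \frac{2\alpha}{\rho}(1 - v)\|\nabla u\|^2 \\
\left.\frac{\partial u}{\partial n}\right|_{\partial R} &= 0 \quad \text{and} \quad \left.\frac{\partial v}{\partial n}\right|_{\partial R} = 0
\end{aligned} \tag{4}$$

where $\partial R$ denotes the boundary of $R$ and $n$ denotes the direction normal to $\partial R$.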
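The coupled system of Equation (4) can be sketched with an explicit finite-difference scheme. The code below is one possible discretization and not the scheme used by the authors: it assumes a unit pixel grid, gray values scaled to the interval [0, 1], forward Euler time stepping, central differences for $\|\nabla u\|^2$, and border replication to impose the zero normal derivative on $\partial R$. The names `neighbors` and `edge_strength`, the time step, and the iteration count are hypothetical; the time step is chosen small enough to keep the explicit update stable for the parameter values used in the example.

```python
import numpy as np

def neighbors(a):
    """Up, down, left, and right neighbours of every pixel, replicating the
    border (a discrete zero normal derivative on the boundary of R)."""
    p = np.pad(a, 1, mode="edge")
    return p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]

def edge_strength(g, alpha, beta, rho, dt=0.15, steps=2000):
    """Forward Euler iteration of the coupled diffusion equations for u and v."""
    g = np.asarray(g, dtype=float)
    u = g.copy()
    v = np.zeros_like(g)
    for _ in range(steps):
        u_nb = neighbors(u)
        up, down, left, right = u_nb
        # Squared gradient magnitude of u by central differences.
        grad2 = ((right - left) / 2.0) ** 2 + ((down - up) / 2.0) ** 2
        # div((1 - v)^2 grad u), with the weight averaged on each pixel face.
        w = (1.0 - v) ** 2
        div = sum(0.5 * (w + w_n) * (u_n - u)
                  for w_n, u_n in zip(neighbors(w), u_nb))
        lap_v = sum(neighbors(v)) - 4.0 * v
        u_next = u + dt * (div - (beta / alpha) * (u - g))
        v = v + dt * (lap_v - v / rho**2
                      + (2.0 * alpha / rho) * (1.0 - v) * grad2)
        u = u_next
    return u, v

# Synthetic test image: a vertical step edge between columns 31 and 32.
g = np.zeros((32, 64))
g[:, 32:] = 1.0
u, v = edge_strength(g, alpha=4.0, beta=1.0, rho=4.0)
```

For this step edge, $v$ settles into a ridge centered on the edge that decays away from it on a length scale of about $\rho$, while $u$ remains a smoothed copy of $g$. Larger values of $\alpha$ or gray-value ranges wider than [0, 1] would require a smaller time step or an implicit scheme.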
Notice that the equation for each variable is a diffusion equation that minimizes a convex quadratic functional in which the other variable is kept fixed. Keeping $v$ fixed, the first equation minimizes the integral

$$\iint_R \left\{ \alpha (1 - v)^2 \|\nabla u\|^2 + \beta (u - g)^2 \right\} dx\, dy$$

Keeping $u$ fixed, the second equation minimizes the integral

$$\iint_R \left\{ \rho \|\nabla v\|^2 + \frac{1 + 2\alpha\rho\|\nabla u\|^2}{\rho} \left( v - \frac{2\alpha\rho\|\nabla u\|^2}{1 + 2\alpha\rho\|\nabla u\|^2} \right)^2 \right\} dx\, dy$$
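The form of the second integral can be verified by completing the square in $v$. The following worked step is added for illustration; it takes the $v$-dependent terms of the elliptic approximation, multiplied by 2 (a constant factor that does not change the minimizer), and leaves the term $\rho\|\nabla v\|^2$ aside:

```latex
\begin{aligned}
2\alpha(1-v)^2\|\nabla u\|^2 + \frac{v^2}{\rho}
 &= \frac{1+2\alpha\rho\|\nabla u\|^2}{\rho}\,v^2
    - 4\alpha\|\nabla u\|^2\,v + 2\alpha\|\nabla u\|^2 \\
 &= \frac{1+2\alpha\rho\|\nabla u\|^2}{\rho}
    \left(v - \frac{2\alpha\rho\|\nabla u\|^2}{1+2\alpha\rho\|\nabla u\|^2}\right)^{2}
    + \frac{2\alpha\|\nabla u\|^2}{1+2\alpha\rho\|\nabla u\|^2}
\end{aligned}
```

The last term does not depend on $v$ and can be dropped, so with $u$ fixed the functional pulls $v$ toward the target value $2\alpha\rho\|\nabla u\|^2 / (1 + 2\alpha\rho\|\nabla u\|^2)$ while penalizing its gradient.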
Thus, the edge-strength function $v$ is nothing but a nonlinear smoothing of

$$\frac{2\alpha\rho\|\nabla u\|^2}{1 + 2\alpha\rho\|\nabla u\|^2}$$

with $u$ being a simultaneous nonlinear smoothing of $g$. When the Ambrosio and Tortorelli approximation is used, determination of actual discontinuities is an open problem. (Note that recovery of $\Gamma$ from $v$ is not an easy task [Shah, 1991, 1992].) However, in the remainder of the chapter, we demonstrate that the edge-strength function carries relevant perceptual information. Thus the segmentation loci of a gray scale image can be represented and described without the exact knowledge of the segmentation loci themselves.

Before that, it should be pointed out that the computation of the edge-strength function extends naturally to vector cases. Assume now that the image $g$ is a set of $k$ functions: $g(x, y) = \{g_1(x, y), g_2(x, y), \ldots, g_k(x, y)\}$ (e.g., $k$ channels of a registered set of MR data). If the components are unrelated functions, the simplest process for linking them is through a common edge-strength function by considering the minimizer of the following functional:

$$\iint_R (1 - v)^2 \sum_i \alpha_i \|\nabla u_i\|^2 \, dx\, dy + \sum_i \beta_i \iint_R (u_i - g_i)^2 \, dx\, dy + \iint_R \left\{ \frac{\rho}{2}\|\nabla v\|^2 + \frac{v^2}{2\rho} \right\} dx\, dy$$

where $i = 1, 2, \ldots, k$. The corresponding Euler-Lagrange equations are:

$$\begin{aligned}
\frac{\partial u_i}{\partial t} &= \nabla \cdot (1 - v)^2 \nabla u_i - \frac{\beta_i}{\alpha_i}(u_i - g_i), \qquad i = 1, 2, \ldots, k \\
\frac{\partial v}{\partial t} &= \nabla^2 v - \frac{v}{\rho^2} + \frac{1 - v}{\rho} \sum_{i=1}^{k} 2\alpha_i \|\nabla u_i\|^2
\end{aligned} \tag{5}$$
Intuitively, each image is nonlinearly smoothed away from the boundaries and approximated by a piecewise smooth image that best represents the input image. At the same time the value of the edge-strength function is increased proportionally to the image gradient. In [Shah, Pien, and Gauch, 1996] a single common edge-strength function is computed from range and intensity data. Shah [1996] treated color images as a vector function after transforming the RGB (red, green, and blue) components into the CIE 1976 L\*a\*b\* space, which behaves more like a vector space.

The edge-strength-function computation is also valid in higher dimensions. The key point is that as $\rho \to 0$, $\min \Lambda_\rho(v)$ approaches the "volume" of $\Gamma$.

III. GEOMETRY OF THE EDGE-STRENGTH FUNCTION

In order to understand the geometry of the edge-strength function, let $\Gamma$ be a simple closed curve and $v$ be the solution of the boundary value problem given in Equation (3). Based on a derivation in [Mumford and Shah, 1989], inside $\Gamma$,

$$v(x, y) = -\rho \left( 1 + \frac{\rho\,\kappa(x, y)}{2} \right) \frac{\partial v}{\partial n}(x, y) + O(\rho^3) \tag{6}$$

where $\kappa(x, y)$ is the curvature of the level curve passing through the point $(x, y)$ and $n$ is the direction of the inward normal. Therefore, if one imagines moving from level curve to level curve along the normals, then for small values of $\rho$, a change of $\Delta v$ in level requires the movement

$$\Delta r \approx -\frac{\rho}{v} \left( 1 + \frac{\rho\,\kappa(x, y)}{2} \right) \Delta v$$

where $r$ denotes the arc length along the gradient lines of $v$, the positive direction being the direction of inward normals. Let $\Delta t = -\rho\,\Delta v / v$. Notice that $v$ is a decreasing function in the direction of the inward normal and hence $\Delta v$ is negative. Functions $v$ and $t$ have the same set of level curves because
$t$ is a monotonic function of $v$. Passing to the infinitesimals, one obtains the velocity

$$\frac{dr}{dt} \approx 1 + \frac{\rho\kappa}{2}$$

A similar argument shows that for moving outward from $\Gamma$, the velocity is given by the formula

$$\frac{dr}{dt} \approx -1 + \frac{\rho\kappa}{2}$$

Thus, for small values of $\rho$, successive level curves of $v$ may be interpreted as a shape boundary "evolving" with an inward normal velocity $1 + \rho\kappa/2$, consisting of two components corresponding to morphology (Blum's grass fire [1967]) and curvature-dependent selective smoothing, respectively.

Figure 4 displays selected level curves of $v$ in a sequence of decreasing $v$. $\Gamma$ is the boundary of a cat shape given on a 256 × 256 lattice. Level curves outside the shape are not displayed. The value of $\rho$ is 4. As time progresses (in other words, as $v$ decreases) protrusions such as the ears of the cat disappear. The shape splits into its constituent components, and the parts shrink and evolve toward more circular shapes and eventually disappear. The smaller parts disappear before the larger parts. The shape splits into parts along necks, as seen in the cat example, where the tail separates from the body.
FIGURE 4. The level curves of $v$ for $\rho = 4$; $\Gamma$ is the boundary of a cat shape given on a 256 × 256 lattice; the numbers show the values of $v$, and only the inner level curves are depicted. Notice that protrusions disappear; parts break off and evolve toward a more circular form as one moves towards the inner level curves.
FIGURE 5. The level curves of $v$ inside a doll shape given on a 128 × 128 lattice; $\rho = 4$.
The head, which is connected to the body by a much thicker neck, disconnects much later than the tail. Figures 5 and 6 display the same information for two other cases. Notice that the method can handle triple points (pliers example of Figure 6) without trouble.

Even though the preceding construction is motivated by the example of simple closed curves, it works equally well for more general curves, as already demonstrated by the pliers example. In Figure 7, another case that is not a simple closed curve is depicted. It is the case of an incomplete shape obtained by removing a 56-pixel-long piece from the middle of each of the two long sides of a complete 175 × 140 rectangle. As the evolution progresses, the gaps between the pieces of the boundary are filled in.

For gray scale images, $v$ is computed by means of the inhomogeneous diffusion system given in Equation (4). Level curves of $v$ computed for a synthetic gray scale image that contains two squares and three triangles are depicted in Figure 8; selected level curves are superimposed on the original gray image. The values of $\rho$ and $\sigma$ are 16 and 4, respectively. The value of $\alpha$ is chosen in order to make the maximum value of $v$ close to 1. Finally, Figure 9 depicts the level curves of $v$ computed for an MR image of the human brain. The original image is given in Figure 10. The level curves of $v$ are superimposed on the original image; only the central area containing the ventricles (the butterfly-shaped dark feature) is depicted. The values of both $\rho$ and $\sigma$ are 8, and the value of $\alpha$ is chosen to make the maximum value of $v$ close to 1.

FIGURE 6. The level curves of $v$ for $\rho = 4$. $\Gamma$ is a line drawing of a pair of pliers given on a 256 × 256 lattice; the method can handle triple points.
FIGURE 7. Level curves of v for an incomplete rectangle.
FIGURE 8. Level curves of $v$ computed by means of a coupled diffusion system from a gray scale image; $\rho = 16$ and $\sigma = 4$.
FIGURE 9. Level curves of $v$ in the butterfly-shaped ventricle area; $\rho = \sigma = 8$, $\alpha$ is chosen to make the maximum value of $v$ reach 1.
FIGURE 10. MR image of a human brain.

The only parameter of the formulation for the binary case (when $\Gamma$ is known), $\rho$, controls the amount of diffusion. Thus $\rho$ is a scale parameter (Figure 11). A shape tends to break up into its components more easily when $\rho$ is smaller (Figure 12). The effect of $\rho$ is the same for the gray scale images; in Figure 13, the level curves of $v$ computed for the synthetic gray image remain sharper when $\rho$ is reduced from 16 to 4 while the other parameters are kept unchanged.

FIGURE 11. Level curves of $v$ at (a) $\rho = 8$ and (b) $\rho = 16$; curvature-dependent selective smoothing increases with increasing $\rho$.
FIGURE 12. Level curves of $v$ computed for two different values of $\rho$. (a) During the course of evolution, the shape breaks up into four parts: the leaves and the center. (b) When the amount of diffusion is increased by a factor of 4, the same shape evolves as a single piece.
FIGURE 13. Level curves of $v$ computed by means of a coupled diffusion system from the gray scale image; $\rho = 4$ and $\sigma = 4$.

In the preceding analysis, the successive level curves of $v$ were interpreted as fire fronts propagating with a curvature-dependent speed. This interpretation holds, however, only at the points where $\rho$ is small compared to the nearby radii of curvature of the level curve through the point and also small compared to the radius of the largest circle passing through the point and contained entirely inside or entirely outside the level curve. The differences between a shape boundary evolving with a curvature-dependent velocity $1 + \rho\kappa/2$ and the true behavior of the level curves of $v$ emerge as $\rho$ becomes large. The main difference is that, in the former case, the velocity of the curve at a point depends only on the curvature at that point, whereas the velocity of the level curves of $v$ also depends on the interactions among nearby points.

As a simple example, consider a circle containing a smaller eccentric circle inside. Thus, the shape is an annulus of varying width. The eccentricity is so large that the minimum width of the annulus is very small compared to its maximum thickness. Under curvature-dependent evolution, circles remain circles. If $\rho$ is sufficiently large, both circles shrink toward their centers and eventually disappear without intersecting. Therefore, assume that $\rho$ is small enough so that, eventually, the two circles touch. Note that, at the point where they touch, the circles are tangent to each other. Thereafter, the annulus breaks up at the singular point and continues to evolve as a simple closed curve until it shrinks to a point and disappears.

FIGURE 14. Level curves of an annulus. Notice that at the break, the level curve forms a cross.

Consider now the level curves of $v$. Assume that $\rho$ is small compared to the minimum width of the annulus and also compared to the radius of the smaller circle. Then the level curves near the boundary of the annulus closely approximate the initial set of evolved shapes. However, for the level curves farther inside the annulus, as the width of the annulus enclosed by a level curve becomes comparable to $\rho$, the interaction between the opposite boundaries of the annulus becomes significant and the value of $v$ becomes larger than what it would be without the interaction. As a result, the gradient is reduced. In other words, the speed of the evolving curve is increased. Therefore, in the thinnest part of the annulus, the inner circle begins to develop a bulge and the outer circle begins to develop a dent, leading to the formation of a neck and an eventual break. At the point of break, the outer and the inner boundaries of the annulus form a cross instead of remaining tangent to each other, as in the case of curvature-dependent evolution (Figure 14). This is definitely a computational advantage, as argued in Section V.
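The curvature-dependent velocity of Section III can be checked on a case with a closed-form solution. Inside a disk of radius $R$, the function $v(r) = I_0(r/\rho)/I_0(R/\rho)$, with $I_0$ the modified Bessel function of order zero, satisfies $\nabla^2 v = v/\rho^2$ with $v = 1$ on the boundary. Taking $t = -\rho \ln v$ as the time variable, the inward speed of the level curve at radius $r$ is $v/(\rho\, dv/dr)$, which does not depend on $R$. The sketch below compares this speed with $1 + \rho\kappa/2$ for $\kappa = 1/r$. The function name, the chosen radii, and the finite-difference step are assumptions made for the illustration; the derivative of $I_0$ is approximated numerically so that only `numpy.i0` is needed.

```python
import numpy as np

def level_curve_speed_in_disk(r, rho, dr=1e-3):
    """Inward speed of the level curve of v at radius r inside a disk,
    measured against the time variable t = -rho * ln v."""
    r = np.asarray(r, dtype=float)
    i0 = np.i0(r / rho)
    # Central difference for d/dr of I0(r / rho).
    di0_dr = (np.i0((r + dr) / rho) - np.i0((r - dr) / rho)) / (2.0 * dr)
    return i0 / (rho * di0_dr)

rho = 4.0
radii = np.array([20.0, 40.0, 80.0])
exact = level_curve_speed_in_disk(radii, rho)
approx = 1.0 + rho / (2.0 * radii)     # 1 + rho * kappa / 2 with kappa = 1 / r
error = np.abs(exact - approx)
```

The discrepancy shrinks roughly like $(\rho/r)^2$ as the radius of curvature grows relative to $\rho$, in line with the requirement that $\rho$ be small compared to the nearby radii of curvature.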
IV. EXTRACTION AND REPRESENTATION OF PERCEPTUAL INFORMATION

As demonstrated in Section III, $v$ contains information regarding parts, protrusions, and necks. Now the question is how this information is represented and how this representation is computed from $v$. A hint is suggested by the following observation. In Blum's grass fire [1967, 1973], singularities develop as corners and self-intersections form. The locus of these singularities forms the medial axis or shape skeleton. When smoothing is introduced, self-intersections may still develop due to thinning of narrow necks, but corners are rounded out. Therefore, when smoothing is present, points of maximum curvature serve as substitutes for corners. Now, note from Equation (6) that

$$\frac{\partial v}{\partial n} \approx -\frac{v}{\rho \left( 1 + \dfrac{\rho\kappa}{2} \right)}$$
i.e., along a level curve, the points of maximum curvature correspond approximately to the points where $\|\nabla v\|$ is minimum. The computation of curvature involves second-order derivatives of $v$ and hence is more sensitive to noise than $\|\nabla v\|$. This observation suggests an alternative way of defining the shape skeleton by defining points where $\|\nabla v\|$ is minimum along the level curves of $v$. Points where $\|\nabla v\|$ attains its minimum along the level curves of $v$ are also of interest because they may indicate an indentation due to the existence of a neck, which joins two parts.
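A. Local Symmetry Set

Now, consider the locus of points where the magnitude of the gradient, $\|\nabla v\|$, is stationary along the level curves. These points are given by the zero crossings of $v_{\eta\xi}$, where $\eta$ is in the direction of the gradient vector, $\nabla v$, and $\xi$ is in the direction tangent to the level curve. In terms of global coordinates $x$ and $y$,

$$v_{\eta\xi} = \begin{bmatrix} \dfrac{v_x}{\|\nabla v\|} & \dfrac{v_y}{\|\nabla v\|} \end{bmatrix} \begin{bmatrix} v_{xx} & v_{xy} \\ v_{xy} & v_{yy} \end{bmatrix} \begin{bmatrix} \dfrac{v_y}{\|\nabla v\|} \\[2mm] -\dfrac{v_x}{\|\nabla v\|} \end{bmatrix} = \frac{(v_y^2 - v_x^2)\, v_{xy} - v_x v_y (v_{yy} - v_{xx})}{\|\nabla v\|^2} = 0 \tag{7}$$

Now, notice that the symmetry of the level curve at a point $P$, where $v_{\eta\xi} = 0$, is indicated by the missing $\eta\xi$-term in the Taylor series of $v$ at $P$:

$$v = a_{00} + a_{10}\,\eta + a_{01}\,\xi + a_{20}\,\eta^2 + a_{02}\,\xi^2 + \cdots$$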
Thus, locally at P, the level curve v D a00 is approximately a conic section whose one of the principal axes coincides with the gradient vector. An equivalent description of the symmetry at P is that the Hessian of v at P is diagonalized when expressed in local coordinates and . This means that the gradient vector rv is an eigenvector of the Hessian at P, the other eigenvector being tangent to the level curve at P. In order to extend this analysis to an arbitrary dimension consider the points where jjrvjj is stationary along the level hypersurfaces of v. Proposition 1 jjrvjj is stationary along a level hypersurface at a point P if and only if rv is an eigenvector of the Hessian H of v. Proof jjrvjj is stationary along a level hypersurface at a point P if and only if the derivative of jjrvjj in any direction tangent to the level hypersurface at P vanishes. That means that at P, rjjrvjj cannot have a component tangent to the level hypersurface at P. That is, the directions of rjjrvjj and rv must
SIBEL TARI
coincide at $P$. In other words, $\nabla\|\nabla v\|$ must be a multiple of $\nabla v$. But

$$
\nabla\|\nabla v\| = \frac{H\,\nabla v}{\|\nabla v\|}
$$

It follows that the necessary and sufficient condition for $P$ to be a stationary point is that

$$
H\,\nabla v = c\,\nabla v
$$

for some constant $c$.
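As a rough numerical illustration of the 2-D case, the numerator of Equation (7) can be evaluated on a sampled $v$ with finite differences; its zero-crossings approximate the locus where $\nabla v$ is an eigenvector of $H$. The Python sketch below is not the author's implementation: the function name `symmetry_indicator`, the use of `numpy.gradient`, the axis convention, and the quadratic test function are all assumptions made for illustration.

```python
import numpy as np

def symmetry_indicator(v, h=1.0):
    """Numerator of Equation (7): (v_y^2 - v_x^2) v_xy - v_x v_y (v_yy - v_xx).

    v is a 2-D array sampled on a uniform grid with spacing h;
    axis 0 is taken as y and axis 1 as x (an assumed convention).
    """
    vy, vx = np.gradient(v, h)      # first derivatives along rows (y) and columns (x)
    vyy, _ = np.gradient(vy, h)     # second derivative in y
    vxy, vxx = np.gradient(vx, h)   # mixed derivative and second derivative in x
    return (vy**2 - vx**2) * vxy - vx * vy * (vyy - vxx)

# Hypothetical test function v = x^2 + 3 y^2, whose level curves are ellipses.
x = np.linspace(-1.0, 1.0, 21)
X, Y = np.meshgrid(x, x)
f = symmetry_indicator(X**2 + 3.0 * Y**2, h=x[1] - x[0])
```

For this test function the indicator equals $-48xy$ away from the grid border, so it changes sign across the two coordinate axes, which are the symmetry axes of the ellipses.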
If $\nabla v$ is the vector $\{v_1, v_2, \ldots, v_n\}$ and $H\nabla v$ is the vector $\{w_1, w_2, \ldots, w_n\}$, then we have $n - 1$ equations

$$
\frac{w_1}{v_1} = \frac{w_2}{v_2} = \cdots = \frac{w_n}{v_n}
\tag{8}
$$

to determine the one-dimensional locus of the extremal points. In the case of 3-D shapes, these equations take the form

$$
\frac{v_x v_{xx} + v_y v_{xy} + v_z v_{xz}}{v_x}
= \frac{v_x v_{xy} + v_y v_{yy} + v_z v_{yz}}{v_y}
= \frac{v_x v_{xz} + v_y v_{yz} + v_z v_{zz}}{v_z}
\tag{9}
$$

which may also be written as

$$
\begin{aligned}
v_x v_y\,(v_{xx} - v_{yy}) + v_{xy}\,(v_y^2 - v_x^2) + v_z\,(v_y v_{xz} - v_x v_{yz}) &= 0 \\
v_y v_z\,(v_{yy} - v_{zz}) + v_{yz}\,(v_z^2 - v_y^2) + v_x\,(v_z v_{yx} - v_y v_{zx}) &= 0 \\
v_z v_x\,(v_{zz} - v_{xx}) + v_{zx}\,(v_x^2 - v_z^2) + v_y\,(v_x v_{zy} - v_z v_{xy}) &= 0
\end{aligned}
\tag{10}
$$

Note that the Hessian at an extremal point is diagonalized if the gradient vector is chosen as one of the local coordinates and the other coordinates are chosen appropriately in the hyperplane tangent to the level hypersurface. Thus, the convexity of $v$ is symmetric with respect to the direction of $\nabla v$.
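The three left-hand sides of Equation (10) are, up to ordering, the components of the cross product $(H\nabla v) \times \nabla v$, so in 3-D the local symmetry condition can be checked by testing whether that cross product vanishes. A minimal Python sketch follows; the function names are hypothetical, and a randomly generated symmetric matrix stands in for the Hessian.

```python
import numpy as np

def equation_10(g, H):
    """Left-hand sides of Equation (10) for gradient g = (vx, vy, vz) and Hessian H."""
    vx, vy, vz = g
    vxx, vyy, vzz = H[0, 0], H[1, 1], H[2, 2]
    vxy, vxz, vyz = H[0, 1], H[0, 2], H[1, 2]
    return np.array([
        vx * vy * (vxx - vyy) + vxy * (vy**2 - vx**2) + vz * (vy * vxz - vx * vyz),
        vy * vz * (vyy - vzz) + vyz * (vz**2 - vy**2) + vx * (vz * vxy - vy * vxz),
        vz * vx * (vzz - vxx) + vxz * (vx**2 - vz**2) + vy * (vx * vyz - vz * vxy),
    ])

def cross_form(g, H):
    """The same three quantities written as components of (H g) x g."""
    c = np.cross(H @ g, g)
    return np.array([c[2], c[0], c[1]])

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
H = A + A.T                      # arbitrary symmetric stand-in for a Hessian
g = rng.standard_normal(3)       # arbitrary stand-in for a gradient
```

Both forms vanish exactly when $g$ is an eigenvector of $H$; the cross-product form is shorter, while the expanded form mirrors the printed equations term by term.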
B. Partial Symmetries and the Nested Loci

At a local symmetry point, the convexity of the level hypersurface of $v$ is symmetric about the gradient vector. Therefore, in order to obtain more information about the shape, let us consider the locus at which the level hypersurface is symmetric only about some linear space containing $\nabla v$, by looking for the smallest linear subspace that is invariant under $H$ and in which the gradient lies. To be more specific, consider the matrix $\Sigma$ whose columns are

$$
\nabla v,\; H\nabla v,\; H^2\nabla v,\; \ldots,\; H^{n-1}\nabla v
$$
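The matrix just defined is a Krylov-type matrix generated by $H$ and $\nabla v$, and its rank can be estimated numerically. The Python sketch below is only an illustration under stated assumptions: the function names, the tolerance `tol`, and the diagonal test Hessian are hypothetical, and the rank is estimated with `numpy.linalg.matrix_rank` rather than with the minors used in the text.

```python
import numpy as np

def krylov_matrix(g, H):
    """Matrix whose columns are g, H g, H^2 g, ..., H^(n-1) g."""
    cols = [np.asarray(g, dtype=float)]
    for _ in range(len(cols[0]) - 1):
        cols.append(H @ cols[-1])
    return np.column_stack(cols)

def symmetry_rank(g, H, tol=1e-9):
    """Numerical rank of the matrix above; tol is an assumed threshold."""
    return int(np.linalg.matrix_rank(krylov_matrix(g, H), tol=tol))

H = np.diag([1.0, 2.0, 3.0])   # hypothetical Hessian with distinct eigenvalues
ranks = [symmetry_rank(g, H) for g in ([0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1])]
```

In this toy case the rank equals the number of eigenvectors needed to span the gradient, from 0 for a vanishing gradient up to 3 for a generic one.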
AN INTEGRATED APPROACH TO COMPUTATIONAL VISION
We observe that if $\nabla v$ is an eigenvector of $H$, the rank of $\Sigma$ is 1. If $\nabla v$ is in the subspace spanned by $k$ eigenvectors, then $\nabla v, H\nabla v, H^2\nabla v, \ldots, H^k\nabla v$ will form a linearly dependent set, making the rank of $\Sigma$ less than or equal to $k$. Therefore, $S_{k,n}$, the $k$-dimensional partial symmetry locus of an $n$-dimensional shape, may be defined as the locus of points where $\Sigma$ has rank at most $k$. Now, let $\Omega$, a connected, bounded, open subset of $\mathbb{R}^n$, denote the shape. Then

$$
\Omega = S_{n,n} \supset S_{n-1,n} \supset \cdots \supset S_{1,n} \supset S_{0,n}
$$

To determine $S_{k,n}$, it is sufficient to test for linear dependence among the first $k+1$ columns of $\Sigma$ by setting the $(k+1) \times (k+1)$ minors of the first $k+1$ columns of $\Sigma$ equal to zero. Then $S_{n-1,n}$ is given simply by the zero-crossings of the determinant of $\Sigma$. The condition for $S_{1,2}$ reduces to finding the zero-crossings of the determinant of $[\nabla v \mid H\nabla v]$; namely, $S_{1,2}$ is the locus of points where $(v_y^2 - v_x^2)\,v_{xy} - v_x v_y\,(v_{yy} - v_{xx}) = 0$, or, equivalently, the locus of points at which $\|\nabla v\|$ is extremum along a level curve (see Equation (7)). The condition for $S_{1,3}$ reduces to Equation (10), and $S_{0,n}$ is given by $v_{x_1} = v_{x_2} = \cdots = v_{x_n} = 0$.

C. Illustrative Examples and Discussion

Local symmetry axes $S_{1,2}$ of various 2-D shapes are shown in Figure 15. Notice that the symmetry axes are not connected. A simple closed curve may be thought of as a deformation of a circle by means of protrusions and indentations; as it evolves toward a more circular form, the symmetry axes track the evolution of its protrusions and indentations (Figure 16). The direction of evolution at each point of $S_{1,2}$ is the direction of decreasing $v$. During the course of evolution, a protrusion might merge with an indentation, joining the two branches of $S_{1,2}$ that are tracking, respectively, the evolution of a protrusion and the evolution of an indentation. More complicated merges among the branches are also possible, and in such a case, a new branch might start from the junction. If a branch is not terminated at a junction, it will terminate at a point in $S_{0,2}$. If the point is a minimum point of $v$, then the evolution comes to rest at that point, and it is appropriate to call such a point the center of a part. If a point in $S_{0,2}$ is not a center, it may signify a change in the topology of the evolving curve, i.e., a breakup of the shape due to a thinning neck. If the point signifies a change in the topology of the shape, it is called a neck point.
Figure 17 depicts $S_{0,2} \subset S_{1,2}$ detected as the simultaneous zero-crossings of $v_x$ and $v_y$. For the duck shape, $S_{0,2}$ consists of three points; one of them is located in the middle
FIGURE 15. Symmetry axis $S_{1,2}$ for various shapes.
FIGURE 16. Symmetry axes track the evolution of the protrusions and the indentations of the level curves of $v$ as they evolve toward a more circular form.
FIGURE 17. $S_{0,2}$: simultaneous zero-crossings of $v_x$ and $v_y$. For both shapes, $S_{0,2}$ consists of three points: two centers and one neck.
of the duck’s neck and the other two are located, respectively, at the centers of the head and the body. In the case of two overlapping squares, again, two of the three points in $S_{0,2}$ are located, respectively, in the centers of the two squares and the third point is located in the middle of the neck that connects the two squares. In the case of the rectangle, the only point in $S_{0,2}$ is located at the center of the shape. In differential geometric terms, a point in $S_{0,2}$ is either elliptic, hyperbolic, or parabolic, depending on whether the product of the eigenvalues (or the determinant) of the Hessian of $v$ is positive, negative, or zero, respectively. An elliptic point is a minimum, and hence a center, if both eigenvalues are positive. A hyperbolic point is a neck point. Figure 18 depicts $v$ in the vicinities of the duck’s neck and the center of the duck’s head.
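The classification of a point of $S_{0,2}$ by the sign of the Hessian determinant can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the returned labels, and the tolerance `tol` used to decide when the determinant counts as zero are assumptions, not part of the original method.

```python
import numpy as np

def classify_critical_point(H, tol=1e-8):
    """Label a point of S_{0,2} from the 2x2 Hessian H of v at that point."""
    lam = np.linalg.eigvalsh(np.asarray(H, dtype=float))  # eigenvalues, ascending
    det = lam[0] * lam[1]                                 # product of eigenvalues
    if abs(det) <= tol:
        return "parabolic"
    if det < 0:
        return "hyperbolic (neck)"
    # det > 0: elliptic, so both eigenvalues share a sign
    return "elliptic minimum (center)" if lam[0] > 0 else "elliptic maximum"

labels = [classify_critical_point(np.diag(d))
          for d in ([1.0, 2.0], [-1.0, 2.0], [0.0, 1.0], [-1.0, -2.0])]
```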
FIGURE 18. $v$ near (a) a hyperbolic point and (b) an elliptic point.
Along all the branches of $S_{1,2}$ meeting at a center, $v$ increases monotonically as one moves away from the center; in other words, all the branches of $S_{1,2}$ “flow into” the center. When moving away from a neck point, there are at least two branches of $S_{1,2}$ along which the edge-strength function decreases monotonically; these branches, which “flow out” of the neck point, belong to the shape skeleton; the branches that flow into the neck point decompose the shape. The parabolic points indicate global shape symmetry (i.e., parallel boundaries) if one of the eigenvalues is positive. The points satisfying this condition occur rarely in binary images due to the effect of diffusion. However, in real images it is possible to detect points that satisfy this condition; they are somewhat troublesome to classify because of their global nature. Branches of $S_{1,2}$ meeting at such a point (or the connected component of $S_{0,2}$ containing the point) need to be analyzed. If there are at least two branches of $S_{1,2}$ flowing out of the connected component containing the point, then it is a neck. If all the branches of $S_{1,2}$ meeting at the connected component containing the point flow into it, it is a center. Note, however, that it is impossible to test numerically whether the gradient of $v$ and the determinant of its Hessian vanish simultaneously anywhere, because this requires infinite precision; thus a numerical criterion may be adopted, such as the following: if the gradient of $v$ and the determinant of its Hessian both vanish within a pixel (possibly at different points within the pixel), the pixel is marked as parabolic. The parabolic points are classified according to the sign of the eigenvalues of the Hessian. Because one of the eigenvalues will be nearly zero, we look at the eigenvalue that is largest in magnitude. If the sign of this eigenvalue is positive, the parabolic point is classified as belonging to the shape skeleton. If it is negative, it is classified as belonging to an object boundary.
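The pixel-level criterion and the sign test described above can be sketched as follows. This is a hedged illustration, not the author's code: detecting a zero inside a pixel by a sign change among its four corner samples is an assumption (it treats $v_x$, $v_y$, and the Hessian determinant independently), and all function names are hypothetical.

```python
import numpy as np

def changes_sign(corner_values):
    """True if the sampled quantity is zero or changes sign among the pixel corners."""
    c = np.asarray(corner_values, dtype=float)
    return bool(c.min() <= 0.0 <= c.max())

def is_parabolic_pixel(vx, vy, det_h):
    """Assumed criterion: v_x, v_y and det(H) each vanish somewhere within the pixel."""
    return changes_sign(vx) and changes_sign(vy) and changes_sign(det_h)

def classify_parabolic(H):
    """Use the sign of the Hessian eigenvalue that is largest in magnitude."""
    lam = np.linalg.eigvalsh(np.asarray(H, dtype=float))
    dominant = lam[np.argmax(np.abs(lam))]
    return "skeleton" if dominant > 0 else "boundary"

# Hypothetical corner samples of v_x, v_y and det(H) for one pixel.
marked = is_parabolic_pixel([-1.0, 1.0, 1.0, 1.0], [0.5, -0.5, 1.0, 1.0], [1e-3, -1e-3, 0.0, 0.0])
```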
Ideally, along the object boundary $v$ should be constant and equal to the maximum value of $v$. However, this happens rarely if at all, due to the existence of noise and varying contrast and the interactions among nearby points. The parabolic points of the shape boundary are replaced by a sequence of maximum and saddle points, and the parabolic points of the shape skeleton are replaced by a sequence of minimum and saddle points. Indeed, the preceding definition of a parabolic skeleton point includes minimum points and the saddle points that are minima in the direction of maximal change. Similarly, the definition of a parabolic boundary point includes maximum points and the saddle points that are maxima in the direction of maximal change; those points shall be called maximum points of $v$, and the locus of such points shall be denoted by $S'_{0,2}$.

Nested loci $S_{2,3} \supset S_{1,3} \supset S_{0,3}$ for a square prism are shown in Figure 19. $S_{2,3}$, which is a two-dimensional locus in three-dimensional space, is displayed in black. A thick dark gray marker is used for $S_{1,3}$ and a thick light gray marker is used for $S_{0,3}$. The loci are displayed in a sequence of slices perpendicular
FIGURE 19. Nested symmetry loci $S_2$, $S_1$, $S_0$ for a square prism. $S_1$ is dark gray; $S_0$ is light gray. $S_2$ consists of three orthogonal planes parallel to the prism faces and passing through its center and 12 essentially planar surfaces emanating diagonally from the 12 edges of the prism. $S_0$ consists of a single point, the centroid. Notice that $S_0 \subset S_1 \subset S_2$. Part of $S_2$ will form the shape ‘skeleton’.
to one of the short axes of the prism. Starting from the first slice, every other slice is displayed. Slice 27 is through the center of the prism. After Slice 27, only two more slices are displayed, because the shape is symmetric. It is not difficult to visualize the symmetry loci. $S_{0,3}$ consists of a single point, the centroid. The locus $S_{2,3}$ consists of 3 orthogonal planes parallel to the prism faces and passing through its center and 12 essentially planar surfaces emanating diagonally from the 12 edges of the prism. $S_{1,3}$ consists of the three principal axes of the prism, 12 lines starting from the midpoints of the 12 edges, and 8 lines starting from the 8 corners. In Figure 20, $S_{1,3}$ is again depicted; it is a one-dimensional locus in three-dimensional space. A case of a noisy shape is examined in Figure 21 through a wiggly prism. Notice that the locus is quite similar to the locus for a perfect prism. Very small branches start at the shape boundary and terminate very quickly. Perturbations of the shape do not change the dominant symmetry locus. In fact, as $\rho$ is increased, noise branches get even shorter.
FIGURE 20. Symmetry axis $S_{1,3}$ for a square prism: 1-D locus in 3-D space.
FIGURE 21. $S_2$ for a wiggly rectangular prism. (a) A cross section of the wiggly prism. (b)–(g) $S_2$ in selected slices. Notice that the locus is roughly the same as the locus for a perfect prism. There are small branches near the boundary, which shorten as $\rho$ increases.
The effect of $\rho$ on symmetry axes is illustrated in Figure 22; as $\rho$ is increased from 4 to 32, the symmetry branches due to discretization artifacts and minor boundary details either disappear or shorten. To demonstrate locality, let us examine the symmetry loci of a compound shape (Figure 23), which consists of an ellipsoid rotated 45° about each axis, together with a rectangular prism and a triangular pyramid that intersect the ellipsoid. Some slices from $S_{2,3}$ are displayed. Observe that, away from the ellipsoid, the loci of
FIGURE 22. Regularization effect of $\rho$ on symmetry axes. As $\rho$ is increased from (a) 4 to (b) 16 and then to (c) 32, branches tracking less important details and discretization artifacts either disappear or shorten.
the polyhedra are exactly what one would get if they were isolated. In the neighborhood of the intersection of the solids, there is a small distortion of these loci. Nested symmetry loci of the synthetic gray scale image, computed from $v$, which was shown in Figure 8, are depicted in Figure 24. Finally, Figure 25 depicts the level curves of $v$ computed for a color image; $v$ is computed by means of a coupled diffusion system presented in Shah [1996].
V. SHAPE SKELETON AND SHAPE SEGMENTATION (2-D CASE)

Symmetry axes track the evolution of the curvature extrema of the shape boundary. The shape skeleton and the shape segmentation are related, respectively, to the positive curvature maxima (protrusions) and negative curvature
FIGURE 23. $S_2$ for a complex shape. (a) Cross section taken through the middle of the shape. It consists of an ellipsoid rotated 45° about each axis, a prism, and a pyramid. (b)–(g) $S_2$ at selected slices. The loci for the ellipsoid and the two polyhedra deform only in the neighborhood of their intersection.
FIGURE 24. Nested symmetry set for the synthetic gray scale image; $S_{0,2}$ consists of five centers shown in light gray.
FIGURE 25. Color image; RGB components of the color image are shown in (a)–(c); level curves of $v$ and $S_{1,2}$ in the selected area are shown in (d) and (e). Note that a single edge-strength function is computed directly from an unprocessed color image.
minima (indentations). Thus, the points where $\|\nabla v\|$ attains, respectively, its local minimum and its local maximum along the level curves of $v$ should be considered. The second derivative of $\|\nabla v\|$ along the level curves is given by

$$
\frac{d^2\|\nabla v\|}{ds^2} = v_{\eta\xi\xi} + \frac{v_{\xi\xi}\,(v_{\xi\xi} - v_{\eta\eta})}{\|\nabla v\|}
\tag{11}
$$

where $s$ is the arc length along the level curves and

$$
\begin{aligned}
v_{\xi\xi} &= \frac{v_y^2 v_{xx} - 2 v_x v_y v_{xy} + v_x^2 v_{yy}}{\|\nabla v\|^2} \\
v_{\eta\eta} &= \frac{v_x^2 v_{xx} + 2 v_x v_y v_{xy} + v_y^2 v_{yy}}{\|\nabla v\|^2} \\
v_{\eta\xi\xi} &= \frac{1}{\|\nabla v\|^3}\left\{ v_x v_y^2\, v_{xxx} + v_y (v_y^2 - 2 v_x^2)\, v_{xxy} + v_x (v_x^2 - 2 v_y^2)\, v_{xyy} + v_x^2 v_y\, v_{yyy} \right\}
\end{aligned}
$$

Let $S^+_{1,2}$ ($S^-_{1,2}$) denote the locus of points in $S_{1,2}$ where $d^2\|\nabla v\|/ds^2 > 0$ ($d^2\|\nabla v\|/ds^2 < 0$). That is, at a point in $S^+_{1,2}$ ($S^-_{1,2}$), $\|\nabla v\|$ is minimum (maximum) along the level curve; thus the curvature of the level curve passing through the point is maximum (minimum).

Figures 26 and 27 depict $S^+_{1,2}$ and $S^-_{1,2}$ for the duck and the overlapping squares, respectively. Each protrusion of the shape boundary gives rise to a branch of $S^+_{1,2}$, and the evolution of the protrusions is tracked by $S^+_{1,2}$. Each indentation of the shape boundary gives rise to a branch of $S^-_{1,2}$. As two indentations on the opposite sides of a neck begin to evolve toward each other, initially they are tracked by the branches of $S^-_{1,2}$. However, as the indentations approach a break point, they interact and slow down the rate of decay of $v$. Each branch of $S^-_{1,2}$ splits into three branches. The middle branch belongs to $S^+_{1,2}$, and it continues toward the break point, whereas the other two belong to $S^-_{1,2}$ (Figure 27). Now, the shape skeleton and the shape segmentation can be defined.
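A pointwise evaluation of the second derivative of the gradient magnitude along a level curve can be sketched in Python as follows, using the decomposition into a third-derivative term plus a second-derivative correction. The function name and argument layout are hypothetical, the code variables `v_tt` and `v_nn` stand for the second derivatives in the tangent and gradient directions, and the example uses the assumed test function $v = x^2 + 3y^2$, for which the result can be checked by hand.

```python
import numpy as np

def second_derivative_along_level_curve(g, H, T):
    """Second derivative of |grad v| with respect to arc length at one point.

    g = (vx, vy); H = [[vxx, vxy], [vxy, vyy]];
    T = (vxxx, vxxy, vxyy, vyyy) are the third derivatives.
    """
    vx, vy = g
    (vxx, vxy), (_, vyy) = H
    vxxx, vxxy, vxyy, vyyy = T
    u = np.hypot(vx, vy)                                            # gradient magnitude
    v_tt = (vy**2 * vxx - 2 * vx * vy * vxy + vx**2 * vyy) / u**2   # tangent direction
    v_nn = (vx**2 * vxx + 2 * vx * vy * vxy + vy**2 * vyy) / u**2   # gradient direction
    v_ntt = (vx * vy**2 * vxxx + vy * (vy**2 - 2 * vx**2) * vxxy
             + vx * (vx**2 - 2 * vy**2) * vxyy + vx**2 * vy * vyyy) / u**3
    return v_ntt + v_tt * (v_tt - v_nn) / u

# v = x^2 + 3 y^2 at (2, 0), an end of the major axis of the level ellipse.
d2_major = second_derivative_along_level_curve((4.0, 0.0), [[2.0, 0.0], [0.0, 6.0]], (0, 0, 0, 0))
# The same function at (0, 1), an end of the minor axis.
d2_minor = second_derivative_along_level_curve((0.0, 6.0), [[2.0, 0.0], [0.0, 6.0]], (0, 0, 0, 0))
```

The value is positive at the end of the major axis, where the gradient magnitude is minimum and the curvature of the ellipse is maximum, and negative at the end of the minor axis.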
FIGURE 26. $S^+_{1,2}$ and $S^-_{1,2}$ superimposed on the level curves of $v$.
FIGURE 27. $S^+_{1,2}$ (black) and $S^-_{1,2}$ (gray) superimposed on the level curves of $v$.

Definition 1 The shape skeleton is the subset of $S^+_{1,2} \cup S_{0,2} \setminus S'_{0,2}$ that excludes those branches of $S^+_{1,2} \setminus S_{0,2}$ that flow into a connected component of $S_{0,2}$ containing a neck point.

The definition is designed to exclude the branches of $S^+_{1,2}$ along which necks of the shape evolve toward a break. The definition also excludes those points of $S_{0,2}$ that belong to the object boundary (pages 348–349).

Definition 2 Shape decomposition, or shape segmentation, is the union of branches of $S_{1,2}$ that flow into a neck point.

Note that the definition of the decomposition of shape corresponds to its breakup into parts during evolution due to the presence of narrow necks. It does not deal with protrusions. Significant protrusions, such as fingers of a hand, have to be recovered as parts of a shape from the branches of the skeleton. The strict definitions, however, are not very attractive in terms of implementation. They are hard to calculate because one has to trace the branches of $S_1$. Fortunately, calculations may be simplified. At a break point, due to the interaction of opposite boundaries on the two sides of the neck, the boundaries tend to form a cross rather than being tangent to each other (Section III, annulus example). In other words, at break points, $v$ tends to have hyperbolic points rather than parabolic points. The advantage of having a hyperbolic point is that the curvature of the level curve along the branch of $S^+_{1,2}$ flowing into a neck point is always negative, whereas it is positive along the branch flowing out. The simplified definitions of the shape skeleton and shape decomposition are as follows:

Definition 3 The skeleton of a shape is the union of $S_{0,2}$ and those branches of $S^+_{1,2}$ along which the curvature of the level curves is positive.
Definition 4 The segmentation of a shape is the union of those branches of $S^+_{1,2}$ along which the curvature of the level curves is negative.

Note that maximum points of $v$ ($S'_{0,2}$) are automatically excluded from the skeleton because along the branches of $S_{1,2}$ emanating from the maxima of $v$, the level curves of $v$ have negative curvature. However, they do become part of the segmentation. Notice that we now have the segmentation of the raw image given by $\Gamma$ and the segmentation of the shape. When the simplified definition is used, the branches of segmentation will not extend to the boundary of the shape, because they may start out as branches of $S^-_{1,2}$ (e.g., overlapping squares) or the level curves of $v$ along the branch may initially have positive curvature (e.g., annulus). The length of the segmentation branch depends on $\rho$. The larger the value of $\rho$, the longer the branch. The strict definition must be used if the segmentation must be extended to the boundary.

The skeleton and shape segmentation depend on the choice of $\rho$ and should be computed for many values of $\rho$ to provide a scale-space representation such as the Laplacian pyramid. One can compare the robustness of the skeleton branches with respect to increasing values of $\rho$. With increasing $\rho$, less significant branches become shorter, whereas the most prominent branches remain essentially unchanged (Figure 22). There are two other ways in which one may assign a level of significance to a point on the symmetry axes or the shape skeleton. Branches tracking less important details terminate earlier than the branches tracking globally more prominent details. Therefore, the value of $v$ at the termination point is a measure of significance of a branch. Because $1 - v$ may be interpreted as corresponding to time in the evolution of the boundary, our first measure is the time of extinction of a branch. Finally, another measure of significance is the curvature of a level curve at points on the skeleton.
Thus, along a level curve, a skeleton point with a large curvature indicates a more significant protrusion than one with a smaller curvature. Skeletons of various shapes are depicted in Figure 28; the shape skeletons in the top row are computed using the strict definition and the ones in the bottom row are computed using the simplified definition. The shape segmentations of a duck and overlapping squares using the simplified definition are shown in Figure 29; the segmentation of the overlapping squares using the strict definition is also shown in the same figure. Figure 30 illustrates the pruning of a shape skeleton based on the curvature of the level curves. In Figure 31, the shape skeleton and the shape segmentation of the incomplete rectangle are shown. The skeleton consists of the skeleton of the complete rectangle and two new branches, which may be interpreted as the skeleton of gaps. The larger the value of $\rho$, the shorter these additional branches. The gaps
FIGURE 28. Shape skeletons for various shapes.
FIGURE 29. Shape decomposition for various shapes. (a) and (b) Simplified definition. (c) Strict definition.
FIGURE 30. Pruning the shape skeleton; points with large level-curve curvature indicate a more significant protrusion.
FIGURE 31. Incomplete rectangle. Shape skeleton acquires two extra branches, which may be interpreted as the gap skeleton. The segmentation fills in the missing portions of the boundary. Simplified definitions are used.
FIGURE 32. Vase or face?
themselves are filled in by the decomposition lines. Note that this is analogous to a Gestalt phenomenon, figure-ground reversal (Figure 32). The point is that $v$ should be computed over the whole plane, and the skeleton and the segmentation really describe the complement of $\Gamma$. Whether the complement of $\Gamma$ is connected or not is irrelevant to the computation of the skeleton and the decomposition. This property is especially useful in real images: Ideally, the edge-strength function $v$ computed from a raw image by the coupled diffusion system given in Equation (4) should be constant along the object boundaries so that its level curves correspond to the ideal cases discussed up to here. However, this almost never happens due to noise, varying levels of contrast along the object boundaries, and interactions between nearby edges. Therefore, the object boundaries are no longer defined by the level curves of $v$. Typically, one should expect a level curve corresponding to a value of $v$ near its maximum to consist of several connected components, each surrounding a high-contrast portion of $\Gamma$. The situation is analogous to the case of an incomplete rectangle, where the boundary consists of several disconnected pieces. Note that even in the case of the original Mumford-Shah functional given in Equation (1), if $\alpha$ is not high enough, $\Gamma$ may not include the portions of the object boundary where the contrast is too low. The important point is that it is not essential to have the complete shape boundary to compute its local symmetry. As the evolution progresses, the gaps between the pieces of the boundary are filled in. Thus, an essentially correct skeleton can still be recovered. As pointed out in Section 2, as the value of $\alpha$ increases, the penalty for segmentation decreases; thus $\Gamma$ becomes more and more detailed, adding to the skeletal details.
This effect on the skeleton is distinct from the effect of varying $\rho$, where the details of the initial shape boundary are fixed and the skeleton is smoothed as $\rho$ is increased. Three different edge-strength functions, $v_{\hat\alpha}$, $v_{2\hat\alpha}$, and $v_{4\hat\alpha}$, of the ventricle area of the MR image, which are computed by the coupled diffusion system
FIGURE 33. Ventricle area ($\rho = \sigma = 8$). (a) Edge-strength function $v_{\hat\alpha}$. (b) Level curves superimposed on the original.
FIGURE 34. Ventricle area ($\rho = \sigma = 8$). (a) Edge-strength function $v_{2\hat\alpha}$. (b) Level curves superimposed on the original.
given in Equation (4), setting $\rho = \sigma = 8$ and picking three different values of $\alpha$, are shown in Figures 33–35. The value of $\hat\alpha$ is sufficiently low so that only the corner portions of the ventricle are prominent in the image of $v_{\hat\alpha}$. The inner details, such as the inner boundaries of the four disjoint lobes, are smoothed over to a large degree. The value of $\alpha$ is doubled to obtain $v_{2\hat\alpha}$ and doubled again to obtain $v_{4\hat\alpha}$. At each stage, the edge-strength function becomes more detailed. The shape skeleton and the shape segmentation computed from
FIGURE 35. Ventricle area ($\rho = \sigma = 8$). (a) Edge-strength function $v_{4\hat\alpha}$. (b) Level curves superimposed on the original.
FIGURE 36. Ventricle area: (a) Shape skeleton and (b) shape segmentation computed from $v_{\hat\alpha}$.
$v_{\hat\alpha}$ are depicted in Figure 36. Notice that the shape skeleton for $v_{\hat\alpha}$ is similar to the shape skeleton of the earlier example of an incomplete rectangle. The shape segmentation loci, however, fill in the missing portions of the boundary, and a crude segmentation, which captures the ventricles in the form of a dog-bone-like shape, is obtained, providing a simultaneous segmentation of images and shapes [Tari and Shah, 1997]. As $\alpha$ is increased, the shape skeleton becomes more detailed, depicts the axes of the four lobes more accurately, and finds a center for each lobe ($S_{0,2}$ has two sets of parabolic points shown in gray and two elliptic points shown in black); the segmentation in this case clearly captures all four lobes (Figure 37).
FIGURE 37. Ventricle area: (a) Shape skeleton. (b) $S_{0,2}$. (c) Shape segmentation computed from $v_{\hat\alpha}$. Shape skeleton captures four distinct lobes, the singular locus contains the four centers, and $S'_{0,2}\,\cup$ shape segmentation provides an anatomically correct segmentation.
VI. COLORING NESTED SYMMETRIES IN HIGHER DIMENSIONS

For the 2-D case, $\|\nabla v\|$ is (approximately) inversely proportional to the level-curve curvature, and the shape skeleton is related to the positive curvature maxima. Thus, to extract the shape skeleton, it is checked whether $\|\nabla v\|$ is a local minimum along the level curve. It is, however, difficult to extend this strict definition beyond three dimensions. An alternative definition of the skeleton of a 2-D shape is the union of $S_{0,2}$ and those points of $S_{1,2} \setminus S_{0,2}$ where the level curve has a positive curvature. With this method, the relevant points are those where the second derivative of $v$ in the direction orthogonal to $\nabla v$ is positive. This measure may be generalized to higher dimensions by considering a measure that depends only on the second derivatives of $v$ along the directions orthogonal to $L$, the linear space spanned by the columns of $\Sigma$. In the case
of $S_{1,n}$, such a measure is already available and is provided by the mean curvature, $\kappa$, given by the following formula:

$$
\kappa = \nabla \cdot \left( \frac{\nabla v}{\|\nabla v\|} \right)
= \frac{1}{\|\nabla v\|} \left[ \nabla^2 v - \frac{\nabla v\, H\, \nabla v^T}{\|\nabla v\|^2} \right]
\tag{12}
$$

To extend this construction to every $S_{k,n}$, note that the expression in brackets is just the sum of the eigenvalues corresponding to the eigenvectors of $H$ orthogonal to $\nabla v$. (The first term, the Laplacian of $v$, is the trace of $H$.) Now, consider the linear space $L$ spanned by the columns of $\Sigma$ and the linear space $T$ orthogonal to $L$, and choose a basis for $\mathbb{R}^n$ by choosing an orthonormal basis for $L$ and an orthonormal basis for $T$. Then, at points in $S_{k,n} \setminus S_{k-1,n}$, $H$ is block-diagonal with respect to this basis, consisting of a $k \times k$ block, $\Lambda_k$, and an $(n-k) \times (n-k)$ block, $\Theta_{n-k}$. At points of $S_{k,n} \setminus S_{k-1,n}$, define

$$
\kappa_{k,n} = \frac{\operatorname{trace}\,\Theta_{n-k}}{\|\nabla v\|}
\tag{13}
$$

The numerator in the expression is simply the sum of the second derivatives of $v$ along the basis vectors of $T$. For the case of $S_{2,3}$, the linear space $T$ is one dimensional and is spanned by the unit vector $t$ given by

$$
t = \frac{\nabla v \times H\nabla v}{\|\nabla v \times H\nabla v\|}
\tag{14}
$$

Then

$$
\kappa_{2,3} = \frac{t\,H\,t^T}{\|\nabla v\|}
$$
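The two curvature-like measures above can be evaluated pointwise from a gradient and a Hessian. The following Python sketch is illustrative only: the function names are hypothetical, the gradient is normalized to a unit vector before the quadratic form in the mean-curvature expression is taken, and the two test cases (a sphere-like $v = x^2 + y^2 + z^2$ and a diagonal Hessian) are assumptions chosen so that the results can be verified by hand.

```python
import numpy as np

def mean_curvature(g, H):
    """Equation (12): (trace(H) - n H n^T) / |g|, with n = g / |g|."""
    g = np.asarray(g, dtype=float)
    norm = np.linalg.norm(g)
    n = g / norm
    return (np.trace(H) - n @ H @ n) / norm

def kappa_2_3(g, H):
    """Second derivative of v along t = g x Hg (normalized), divided by |g|."""
    g = np.asarray(g, dtype=float)
    t = np.cross(g, H @ g)
    t /= np.linalg.norm(t)       # undefined where g is an eigenvector of H
    return (t @ H @ t) / np.linalg.norm(g)

# v = x^2 + y^2 + z^2 at (1, 0, 0): gradient (2, 0, 0), Hessian 2 I.
k_sphere = mean_curvature([2.0, 0.0, 0.0], 2.0 * np.eye(3))
# Diagonal Hessian with a gradient lying in the plane of the first two eigenvectors.
k_23 = kappa_2_3([1.0, 1.0, 0.0], np.diag([1.0, 2.0, 3.0]))
```

For the unit sphere the first measure equals 2, the sum of the two principal curvatures; in the second case $t$ is the third coordinate axis, so the measure is $3/\sqrt{2}$.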
Notice that the preceding construction produces the correct shape skeleton for a dog-bone shape; but for a rectangle, all the symmetry points will be classified as belonging to the shape skeleton. Fortunately, the strict construction can be extended to the 3-D case by taking planar sections of the surface in many directions orthogonal to the linear space $L$ spanned by the columns of $\Sigma$ and checking that in each plane, $\|\nabla v\|$ is an extremum.

A. 3-D Case

First consider the case of $S_{1,3}$: $\nabla v$ is an eigenvector of $H$ and the linear space $L$ is spanned by $\nabla v$. Let $n$ be the unit normal vector $\nabla v/\|\nabla v\|$, and let $t$ be
a vector in the plane orthogonal to $\nabla v$. Then, the second derivative of $\|\nabla v\|$ along the level curve of $v$ in the plane spanned by $n$ and $t$ is given by $u_{tt} - v_{tt}\,v_{nn}/u$, where $u = \|\nabla v\|$. A systematic way to choose the directions of $t$ is as follows.

1. Compute the three eigenvectors of $H$. Let $e_1$ be the one that satisfies $e_1 \cdot \nabla v = \max_i\, e_i \cdot \nabla v$.
2. Choose $m$ tangent directions, $t_i$, $i = 1, \ldots, m$, as follows: for $i = 1 : m$,

$$
\text{angle} = \frac{(i-1)\,\pi}{m}, \qquad t_i = e_2 \cos(\text{angle}) + e_3 \sin(\text{angle})
$$
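The two steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, the eigenvectors come from `numpy.linalg.eigh`, and the eigenvector aligned with the gradient is selected by the largest absolute dot product, because the sign of a computed eigenvector is arbitrary.

```python
import numpy as np

def tangent_directions(g, H, m):
    """Return m unit vectors t_i in the plane orthogonal to the gradient g.

    Assumes g is (numerically) an eigenvector of the symmetric 3x3 matrix H.
    """
    g = np.asarray(g, dtype=float)
    _, vecs = np.linalg.eigh(H)                  # columns are orthonormal eigenvectors
    k = int(np.argmax(np.abs(vecs.T @ g)))       # e_1: best aligned with g (|.| is an assumption)
    e2, e3 = (vecs[:, j] for j in range(3) if j != k)
    angles = np.arange(m) * np.pi / m            # angle = (i - 1) * pi / m for i = 1..m
    return np.array([e2 * np.cos(a) + e3 * np.sin(a) for a in angles])

# Hypothetical example: diagonal Hessian with the gradient along the third axis.
T = tangent_directions([0.0, 0.0, 1.0], np.diag([1.0, 2.0, 3.0]), m=4)
```

The angles cover only a half turn because $t$ and $-t$ define the same planar section.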
In the case of $S_{2,3}$, the linear space $L$ is spanned by two vectors: the gradient $\nabla v$ and $H\nabla v$. Thus the only direction (which is orthogonal to $L$) to check is given by $t$ (Equation (14)). As in the case of two dimensions, it is possible to assign a level of significance to the skeleton points based on the robustness with respect to increasing $\rho$ and the time of extinction. Also, those points of $S_{k,n}$ with a large $\kappa_{k,n}$ indicate more significant protrusions than the ones with a smaller $\kappa_{k,n}$.

1. Singular Locus

As in the case of $S_{0,2}$, $S_{0,3}$ carries information regarding centers, necks, global symmetries, and object boundaries, and the points in $S_{0,3}$ can also be classified based on the eigenvalues of the Hessian of $v$. Let $\lambda_1$, $\lambda_2$, and $\lambda_3$ be the three eigenvalues of the Hessian of $v$. Consider the following cases.

$$
\begin{array}{lll}
\lambda_1 > 0, & \lambda_2 > 0, & \lambda_3 > 0 \\
\lambda_1 = \lambda_2 = 0, & & \lambda_3 > 0 \\
\lambda_1 = 0, & \lambda_2 > 0, & \lambda_3 > 0 \\
\lambda_1 \lambda_2 < 0, & & \lambda_3 > 0 \\
\lambda_1 = 0, & \lambda_2 \lambda_3 < 0 &
\end{array}
$$
The first case, a local minimum, corresponds to a shape center, such as the centroid in the prism example. The second and third cases correspond to
global shape symmetries, e.g., parallel boundaries. Neglecting the effect of diffusion, the second case occurs at the center, along the long axis of a square prism. The fourth and fifth cases are the saddle points, and they correspond to neck points where the shape breaks up. The former case is attained for a dumbbell-like shape. At the break point two parts touch at a single point. The latter case arises for a cylinder with a peanut-shaped cross-sectional area. Two separated parts, each of which is a cylinder, touch along a line perpendicular to the base of the cylinder. For the case of raw images, $S_{0,3}$ also contains maximum points of $v$, namely, the points at which the eigenvalue with the largest magnitude is negative.
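The five eigenvalue cases can be turned into a small classifier. The Python sketch below is an assumption-laden illustration rather than the author's procedure: the function name, the tolerance `tol` that decides when an eigenvalue counts as zero, the returned labels, and the `raw_image` flag (which enables the maximum-point test only when requested) are all hypothetical.

```python
import numpy as np

def classify_singular_point(H, tol=1e-8, raw_image=False):
    """Label a point of S_{0,3} from the eigenvalues of the 3x3 Hessian H of v."""
    lam = np.linalg.eigvalsh(np.asarray(H, dtype=float))
    zero = int(np.sum(np.abs(lam) <= tol))
    pos = int(np.sum(lam > tol))
    neg = int(np.sum(lam < -tol))
    if raw_image and lam[np.argmax(np.abs(lam))] < -tol:
        return "maximum"            # dominant eigenvalue is negative
    if neg == 0 and zero == 0:
        return "center"             # case 1: local minimum
    if neg == 0 and pos >= 1:
        return "global symmetry"    # cases 2 and 3: one or two zero eigenvalues
    if neg >= 1 and pos >= 1:
        return "neck"               # cases 4 and 5: saddle points
    return "undetermined"

labels = [classify_singular_point(np.diag(d)) for d in
          ([1.0, 2.0, 3.0], [0.0, 0.0, 1.0], [0.0, 1.0, 2.0], [-1.0, 2.0, 3.0], [0.0, -1.0, 2.0])]
```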
ABSTRACT

A new development in local symmetry extraction and its connections to segmentation functionals and the fronts propagating with curvature-dependent speed are examined. The basic tool is a new distance function that attains its maximum value at the shape boundary and decays rapidly away from there. It is shown that the Hessian of the distance function captures perceptual information that can be extracted easily, efficiently, and robustly in the form of nested local shape symmetries at multiple scales. The most important property of the distance function is that it can be computed from unprocessed images; thus the segmentation loci of an image can be represented without the knowledge of the segmentation loci themselves. The formulation is also valid for shapes in any dimension.

REFERENCES

Ambrosio, L., and Tortorelli, V. M. (1992). On the approximation of functionals depending on jumps by quadratic, elliptic functionals, Bol. Un. Mat. Ital.
Attneave, F., and Arnoult, M. (1956). The quantitative study of shape and pattern perception, Psychology Bull. 53, 452–471.
Bajcsy, R., and Solina, F. (1987). Three dimensional object representation revisited, IEEE Proc. of the Int. Conf. on Computer Vision.
Binford, T. O. (1971). Visual perception by computer, IEEE Proc. Conference on Syst. Contr.
Blum, H. (1967). A Transformation for Extracting New Descriptors of Shape, in Models for the Perception of Speech and Visual Form (W. Wathen-Dunn, ed.). Cambridge, Mass.: MIT Press.
Blum, H. (1973). Biological shape and visual science, J. of Theoretical Biology 38, 205–287.
Brady, M., and Asada, H. (1984). Smoothed local symmetries and their implementation, Int. J. of Robotics Research 3, 36–61.
Brooks, R. A. (1981). Symbolic reasoning among 3-D models and 2-D images, AI 17, 285–348.
Burbeck, C., and Pizer, S. (1994). Object representation by cores: Identifying and representing primitive spatial regions, UNC Tech Report, TR94-160.
Kimia, B. B., Tannenbaum, A. R., and Zucker, S. W. (1995). Shapes, shocks, and deformations I: The components of shape and the reaction-diffusion space, Int. J. of Computer Vision.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt Brace.
Köhler, W. (1947). Gestalt Psychology: An Introduction to New Concepts in Modern Psychology. Liveright.
Latto, A., Mumford, D., and Shah, J. (1984). The representation of shape, Proc. of the Workshop on Computer Vision Representation and Control.
Leyton, M. (1987). Symmetry-curvature duality, CVGIP 38, 327–341.
Leyton, M. (1988). A process grammar for shape, AI 34, 213–247.
Marr, D. (1982). Vision. New York: Freeman.
Marr, D., and Nishihara, K. (1977). Representation and recognition of the spatial organization of three dimensional shapes, Proc. of the Royal Soc. of London B200, 269–294.
Matlin, M. W., and Foley, H. J. (1992). Sensation and Perception. Reading, Mass.: Allyn and Bacon.
Morel, J. M., and Solimini, S. (1996). Variational Methods in Image Segmentation, Birkhäuser, Progress in Nonlinear Differential Equations and Their Applications.
Mumford, D. (1987). The problem of robust shape descriptors, IEEE Proc. of the Int. Conf. on Computer Vision.
Mumford, D., and Shah, J. (1985). Boundary detection by minimizing functionals, IEEE Proc. Conf. on Computer Vision Pattern Recognition.
Mumford, D., and Shah, J. (1989). Optimal approximations by piecewise smooth functions and the associated variational problems, Comm. on Pure and Applied Math. XLII (5), 577–684.
Ogniewicz, R. L. (1994). Skeleton-space: A multiscale shape description combining region and boundary information, IEEE Proc. of Conf. on Computer Vision Pattern Recognition.
Osher, S., and Sethian, J. (1988). Fronts propagating with curvature dependent speed: Algorithms based on the Hamilton-Jacobi formulations, J. Comp. Physics, 79.
Resnikoff, H. (1989). The Illusion of Reality. New York: Springer Verlag.
Sethian, J. (1982). An analysis of flame propagation, Ph.D. diss., University of California Berkeley, CPAM Report 79.
Shah, J. (1991). Segmentation by nonlinear diffusion, IEEE Proc. Conf. on Computer Vision Pattern Recognition.
Shah, J. (1992). Segmentation by nonlinear diffusion, II, IEEE Proc. Conf. on Computer Vision Pattern Recognition.
Shah, J. (1996). Curve evolution and segmentation functionals: Application to color images, IEEE Proc. Int. Conf. on Image Processing.
Shah, J., Pien, H., and Gauch, J. (1996). Recovery of surfaces with discontinuities by fusing shading and range data within a variational framework, IEEE Trans. in Image Proc. 5, 1243–1251.
Siddiqi, K., and Kimia, B. (1996). Toward a shock grammar for recognition, IEEE Proc. of Conf. on Computer Vision Pattern Recognition.
Tari, S., and Shah, J. (1997). Simultaneous segmentation of images and shapes, Vision Geom.
Tari, S., and Shah, J. (1998). Local symmetries of shapes in arbitrary dimension, IEEE Proc. Int. Conf. on Computer Vision.
Tari, S., Shah, J., and Pien, H. (1996). A computationally efficient shape analysis via level sets, IEEE Proc. Workshop in Math. Methods in Med. Image Anal.
Tari, S., Shah, J., and Pien, H. (1997). Extraction of shape skeletons from grayscale images, Computer Vision Image Understanding 66, 133–146.
Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt, II (translated as Laws of Organization in Perceptual Forms), in A Source Book of Gestalt Psychology (W. D. Ellis, ed.). London: Routledge & Kegan Paul, pp. 71–88.
INDEX
Apertures, 305
Artificial images. See Natural and artificial images
Auger electron spectroscopy (AES), 120
Autobinomial model, 223–4
Bayes’ rule, 170
Beam current, 129
Bloch equation, 247
Bochner’s theorem, 283
Boersch effect for electron emitters:
  Jansen’s theory, 116–19
  Knauer’s model, 119
Boersch effect for Schottky emitters, 143–6
β() function, 54–5, 126–7, 139
Brightness, 114
  indirect measurement of, 126–7
  measurement of, 124–6
  of Schottky electron emitters, 147–51
Cauchy formula, 204
Chemical shift, 246
Complex convolution (CC), calculating, 8
Computational vision:
  defined, 327
  edge-strength function, development of, 330–2
  edge-strength function, geometry of, 335–42
  edge-strength function, review of, 332–5
  examples and discussion, 345–51
  Gestalt laws and, 328–9
  grass fire, 329–30
  local symmetry set, 343–4
  nested symmetries in higher dimensions, 362–5
  partial symmetries and nested loci, 344–5
  Process Inferring Symmetric Axis (PISA), 329
  shape representation methods, 328
  shape skeleton and segmentation, 351–62
  wave propagation, 329
Coulomb interactions, 116
Current density, 109–11
Curve band-pass operator point-spread functions:
  curved k-plane trajectories and, 283–6
  defined, 281
  integration of, 298–306
  as an inverse Fourier transform of a finite nonnegative Lebesgue measure, 281–3
  lines, squares, and circles in k-space and, 286–98
  reconstruction from discrete data samples and, 306–12
Curve band-pass operators, 263
  band-pass operators as integrals of, 273–6
  Fourier transforms restricted to curves in R², 264–70
  on Paley-Wiener spaces, 270–3
  properties of, 278–81
  resolution of the identity for Paley-Wiener spaces, 276–8
Curve reconstruction operator, 273
Cyclic convolution property (CCP), 4–5
  calculation of two-dimensional, 67
  for 2-D FNTs, 10–11
Data errors, in 2-D FNT, 44
Dead leaves model:
  average area after occlusion, 208–10
  Boolean model, 236–7
  description of, 194–9, 231–9
  distribution of intercepts in, 201–4
  geometric covariogram, 199–201
  grain of, 195
  Lebesgue measure, 196
  Poisson point processes, 231–6
  random closed and compact sets of Rⁿ, 193
  in R³, 238–9
  restrictions of, 192
  size distribution of grains, 207–8
  stability of power law, 205–7
Direct/weighted reconstruction, 244, 257–8
  via natural k-plane coordinates, 260–1
Discrete Fourier transform (DFT), 4
Distribution functions, calculation of, 98–100
Echo planar imaging (EPI), 250, 305
Edge-strength function
  development of, 330–2
  geometry of, 335–42
  review of, 332–5
Eklundh’s fast algorithm, 20
Electron emission theory:
  brightness, 114
  calculation of distribution functions, 98–100
  current density, 109–11
  description of emission process, 96–8
  emission at high temperature and low field, 100–5
  emission at low temperature and high field, 105–7
  extended-Schottky emission, 97, 103–5
  parameter space determination, 108–9
  Schottky emission, 97, 102–3
  thermal-field emission, 98, 105–7
  thermionic emission, 97, 100–1
  total energy distribution, 111–13
Electron emitters:
  See also Schottky electron emitters
  stability, measurement of, 127–8
Electron emitters, Boersch effect for:
  Jansen’s theory, 116–19
  Knauer’s model, 119
Electron-energy analyzer, 120–2, 129
Energy distribution, total, 111–13
  indirect measurement of, 126–7
  measurement of, 120–3
  measurement of, in heated tungsten cathode, 141–2
Euler-Lagrange equation, 333–5
Extended-Schottky emission, 97, 103–5
Extractor, 97
Fast algorithms for Mersenne number transforms, 56–8
Fast Fourier transform (FFT), 4, 244, 254–6
Fast Walsh transform, 53
Fermat number transforms (FNTs), 6–7
Fermat number transforms, one-dimensional (1-D FNTs), 50
Fermat number transforms, two-dimensional (2-D FNTs):
  analysis of periodic structures, 38–46
  applications, 46–9
  arithmetic complexity, 10
  combined with 2-D NMNT, 68–72
  convolution property, 10–11
  defined, 9
  effect of data errors in, 44
  examples, 11
  generalization for, 41–4
  inspection method, implementation of, 49–53
  nonzero pixels in, 44
  relationship between transform size and modulus, 9–10
Fermi-Dirac function, 97, 104
Field emission, 98
Forward transform, 54, 60–2
  multidimensional, 64
Fourier transform, 54
Fractal models:
  fractional Brownian motion, 216–18
  iterated function systems, 219–21
  midpoints displacement method, 218–19
  spectral synthesis method, 219
Fractional Brownian motion, 216–18
FRAME model, 226–8
Fubini-Tonelli theorem, 189
Full width at half maximum (FWHM), 113, 117–19, 125, 135–6, 139, 146
Gamow exponent, 103, 141
Geometric covariogram, 199–201
Gestalt laws, 328–9
Gibbs field, 222
Gibbs sampler, 223, 226–7
Gradient echo, 248
Grass fire, 329–30
Gridding reconstruction, 244, 256–7
  used to provide direct reconstruction, 258–9
Gyromagnetic ratio, 246
Hahn-Banach theorem, 308
Hammersley-Clifford theorem, 222
Hartley transform, 54
Heine-Borel theorem, 265
Hermitian conjugation property, 286–7
Hilbert-Schmidt theorem, 280
Hilbert space, 263–4
Holtsmark regime, 118–19, 144
Hypertexture model, 213
Inspection method, implementation of, 49–53
Intercepts in dead leaves model:
  distribution of, 201–4
Inverse discrete Fourier transform (IDFT), 257–8
  point-spread functions and, 281–3
Inverse transform, 54, 61, 63
  multidimensional, 64–5
Isoperimetric inequality, 187, 189
Iterated function systems, 219–21
Jansen’s theory, 116–19
Knauer’s model, 119
k-plane coordinate systems:
  direct reconstruction via, 260–1, 312–23
  examples of, 261–2
  multiple-shot, 313–14
  normal segment, 259–60
  reconstruction from single acquisition of data, 262–3
  single-shot, 313–14
k-plane trajectories, point-spread functions and, 283–6
k-space, 247–53
  point-spread functions and, 286–98
Larmor frequency, 246
Lebesgue measure, 196, 281–3
Linear decomposition, 174
Lissajous trajectories, 252–3, 305
Lorentz regime, 118, 144
Magnetic resonance imaging (MRI):
  generation of, 245–8
  gradient limitations, 253–4
  k-space, 247–53
  reconstructions of undersampled simulated, 316–23
Magnetic resonance imaging, curve band-pass operator point-spread functions and:
  curved k-plane trajectories and, 283–6
  defined, 281
  integration of, 298–306
  as an inverse Fourier transform of a finite nonnegative Lebesgue measure, 281–3
  lines, squares, and circles in k-space and, 286–98
  reconstruction from discrete data samples and, 306–12
Magnetic resonance imaging, curve band-pass operators and, 263
  band-pass operators as integrals of, 273–6
  Fourier transforms restricted to curves in R², 264–70
  on Paley-Wiener spaces, 270–3
  properties of, 278–81
  resolution of the identity for Paley-Wiener spaces, 276–8
Magnetic resonance imaging, reconstruction of nuclear:
  on Cartesian grids, 244
  conclusions, 323–4
  direct/weighted, 244, 257–8
  fast Fourier transform (FFT), 244, 254–6
  gridding, 244, 256–7
  gridding used to provide direct reconstruction, 258–9
  natural k-plane coordinate systems, 259–63, 312–23
Magnetization, 246
Markov field, 222–3
Mersenne number transforms (MNTs), 6
  calculation of 1-D convolutions, 55–6
  fast algorithms for, 56–8
  properties of the β() function, 54–5
  transform defined, 53–4
Mersenne number transforms, separable two-dimensional new (2-D NMNTs):
  calculation of 2-D convolutions, 63–4
  defined, 62–3
  forward and inverse transforms, 62–3
  forward and inverse transforms, multidimensional, 64–5
Mersenne number transforms, two-dimensional new (2-D NMNTs):
  calculation of convolutions, 59–60
  calculation of multidimensional convolutions, 61
  combined with 2-D FNT, 68–72
  defined, 58
  multidimensional forward and inverse transforms, 60–1
  relationship between transform sizes and modulus, 59
Meyer-Mallet scaling function, 301–2, 304, 306
Midpoints displacement method, 218–19
Monte Carlo techniques, 116
Multiscale synthesis model, 225–6
Mumford-Shah model, 330–3
Natural and artificial images:
  See also Dead leaves model; Textures
  size power law, 169, 205–7
  statistical studies, 168–9
Natural images, principles for synthesis of
  abstract, 228
  scale and perspective, 229–30
  structural laws, 229
  synthesis rules and synthetic worlds, 230–1
Natural images, sizes of sections in, 175
  digital photographs, 177–9
  distribution of areas, 176–7
  distribution of boundary lengths, 184–5
  distribution of intercept lengths, 185–6
  lower bound for the BV norm and applications, 186–91
  other types of images, 179–84
Natural images, statistics of
  covariance and power spectrum, 172–3
  first-order, 171–2
  linear decomposition, 174
  motivations, 170–1
  scale invariance, 175
  second-order, 172–4
Natural k-plane coordinate systems. See k-plane coordinate systems
Noise function, 213
Normal energy, 98
Normal-segment coordinates, 259–60
Normal vectors perturbation method, 211–12
Nuclear magnetic resonance. See Magnetic resonance imaging, reconstruction of nuclear
Number domain analysis and two-dimensional number theoretic transforms (2-D NTTs):
  applications, 32, 34, 36–7
  mathematical development, 24
  rectangular images, 30, 32
  simple images, 24
  square images, 25–30
  summary, 32, 37–8
  zero pattern, 34, 36
Number theoretic transforms (NTTs):
  See also under type of
  advantages over FFT, 5
  complex convolution, calculating, 8
  conclusions, 80–5
  constraints for selecting, 5–6
  defined, 4–5
  Fermat, 6–7
  hardware implementations, 73–80
  length considerations, 7–8
  Mersenne, 6
  pipelined designs, 74
  TTL-based designs, 74
  vector radix algorithm, 74–80
Number theoretic transforms, two-dimensional (2-D NTTs):
  computation of, 19–20
  convolution without matrix transpose and overlap, 11–16
  example of autoconvolution from simulations, 20–3
  twiddle factor, 16–23
Number theoretic transforms (2-D NTTs), number domain analysis and two-dimensional:
  applications, 32, 34, 36–7
  mathematical development, 24
  rectangular images, 30, 32
  simple images, 24
  square images, 25–30
  summary, 32, 37–8
  zero pattern, 34, 36
Numerical model, 111
Nyquist theorems, 244, 255
Occlusion phenomenon, 192
One-dimensional Fermat number transforms (1-D FNTs), 50
One-dimensional number theoretic Walsh transform, 50
Overlap-save technique, 12
Paley-Wiener spaces, 270–3, 276–8
Parabolic-barrier approximation, 111, 113, 129, 137–9
Parameter space determination, 108–9
Periodic structures, analysis of 2-D FNT, 38–46
Phase gradient, 250
Pipelined designs, 74
Plancherel theorem, 268
Point-spread function (PSF), curve band-pass operator:
  curved k-plane trajectories and, 283–6
  defined, 257, 281
  flat spot, 321–2
  integration of, 298–306
  as an inverse Fourier transform of a finite nonnegative Lebesgue measure, 281–3
  lines, squares, and circles in k-space and, 286–98
  reconstruction from discrete data samples and, 306–12
Point-spread functions, sample band-pass operators and, 313–16
Prägnanz, law of, 328
Process Inferring Symmetric Axis (PISA), 329
Radix-2 algorithm, 56–7
Random fields models, 221
  autobinomial model, 223–4
  Gagalowicz and Ma model, 224–5
  Markov models, 222–3
Rayleigh criterion, 316
Reaction-diffusion textures, 214
Read gradient, 249
Rectangular images, 30, 32
Richardson-Dushman plot, 101
Riemann integration theorem, 309, 310–11
Riemann sum approximation, 261, 263, 302, 306, 312
Riesz-Thorin theorem, 308
Ring collapse, 152, 156
Root-power-sum method, 125, 149
ROSE (radially oriented excursions in k-space), 318–21
Rosette imaging, 251
Scale invariance, 175
Schottky effect, 94, 97, 102
Schottky electron emitters:
  basic description, 93–5
  Boersch effect for, 143–6
  brightness of, 147–51
  effects of changing ZrO coating to HfO, 158–62
  energy spread of, 128–46
  energy spread of, with small radius of curvature, 157–8
  measurement of total energy distribution in heated tungsten cathode, 141–2
  stability of emission, 151–6
Schottky emission, 97, 102–3
  extended, 97, 103–5
Schrödinger equation, 100, 111
Schwartz functions, 265, 281
Shannon source coding theorem, 171
Sobolev space, 263–4
Solid textures, 212–13
Spatial frequency domain. See k-space
Spectral synthesis method, 219
Spin echo, 248
Spin-lattice relaxation, 247
Spin-spin relaxation, 247
Spin warp, 249–50
Spiraling imaging, 250–1
Spot noise textures, 215–16
Square images, 25–30
Steiner formula, 203, 207–8
Tangential energy, 98
Textures, 210
  fractal models, 216–21
  FRAME model, 226–8
  multiscale synthesis model, 225–6
  normal vectors perturbation method, 211–12
  random fields models, 221–5
  reaction-diffusion, 214
  solid, 212–13
  spot noise, 215–16
Thermal-field emission, 98, 105–7
Thermionic emission, 97, 100–1
Tip angle, 247
Transmission coefficient, 98
TTL-based designs, 73–4
Turbulence, 213
Twiddle factor, 16–23
Two-dimensional composite transform (2-D CNTT), 66
Two-dimensional Fermat number transforms (2-D FNTs):
  analysis of periodic structures, 38–46
  applications, 46–9
  arithmetic complexity, 10
  combined with 2-D NMNT, 68–72
  convolution property, 10–11
  defined, 9
  effect of data errors in, 44
  examples, 11
  generalization for, 41–4
  inspection method, implementation of, 49–53
  nonzero pixels in, 44
  relationship between transform size and modulus, 9–10
Two-dimensional mixed radix conversion (2-D MRC), 67–8
Two-dimensional new Mersenne number transforms (2-D NMNTs):
  calculation of convolutions, 59–60
  calculation of multidimensional convolutions, 61
  combined with 2-D FNT, 68–72
  defined, 58
  multidimensional forward and inverse transforms, 60–1
  relationship between transform sizes and modulus, 59
Two-dimensional new Mersenne number transforms, separable (2-D NMNTs):
  calculation of 2-D convolutions, 63–4
  defined, 62–3
  forward and inverse transforms, 62–3
  forward and inverse transforms, multidimensional, 64–5
Two-dimensional number theoretic transforms (2-D NTTs):
  computation of, 19–20
  convolution without matrix transpose and overlap, 11–16
  example of autoconvolution from simulations, 20–3
  twiddle factor, 16–23
Two-dimensional number theoretic transforms (2-D NTTs), number domain analysis and:
  applications, 32, 34, 36–7
  mathematical development, 24
  rectangular images, 30, 32
  simple images, 24
  square images, 25–30
  summary, 32, 37–8
  zero pattern, 34, 36
Two-dimensional number theoretic Walsh transform, 51–3
Van der Corput theorem, 284
Vector radix algorithm, 74–80
Walsh function, 51
Wentzel-Kramers-Brillouin (WKB) approximation, 103
Wiener-Khintchin formula, 172–3
Work function, 96–7
X-ray photo-electron spectroscopy (XPS), 120
Zero pattern, 34, 36