ADVANCES IN IMAGING AND ELECTRON PHYSICS
VOLUME 106
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES/Laboratoire d’Optique Electronique du Centre National de la Recherche Scientifique Toulouse, France
ASSOCIATE EDITORS
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
TOM MULVEY Department of Electronic Engineering and Applied Physics Aston University Birmingham, United Kingdom
ACADEMIC PRESS San Diego London Boston New York Sydney Tokyo Toronto
This book is printed on acid-free paper. Copyright © 1999 by ACADEMIC PRESS. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the Publisher.
The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher’s consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per copy fee through the Copyright Clearance Center, Inc. (222 Rosewood Drive, Danvers, Massachusetts 01923), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-1999 chapters are as shown on the title pages. If no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/99 $30.00
Academic Press, a division of Harcourt Brace & Company, 525 B Street, Suite 1900, San Diego, California 92101-4495, USA http://www.apnet.com
United Kingdom Edition published by Academic Press, 24-28 Oval Road, London NW1 7DX, UK http://www.hbuk.co.uk/ap/
International Standard Book Number: 0-12-014748-3 PRINTED IN THE UNITED STATES OF AMERICA 98 99 00 01 02 03 QW 9 8 7 6 5 4 3 2 1
CONTENTS

CONTRIBUTORS
PREFACE

Effects of Radiation Damage on Scientific Charge Coupled Devices
T. D. HARDY, M. J. DEEN, AND R. MUROWINSKI
I. Introduction
II. Device Structure and Operation
III. Radiation Damage
IV. Dark Current
V. Charge Transfer Efficiency
VI. Read Noise
VII. Conclusions
References

CAD Using Green’s Functions and Finite Elements and Comparison to Experimental Structures for Inhomogeneous Microstrip Circulators
CLIFFORD M. KROWNE
I. Introduction to CAD for Microstrip Circulators
II. Ferrite Physical and Chemical Attributes Relevant to Microstrip Circulator Material Selection
III. Processing of Ferrite Materials for Microstrip Circulator Structures
IV. Microstrip Circulator Considerations for Modeling
V. Setup Formulas for Numerical Evaluation of Microstrip Circulators
VI. Numerical Results and Comparison to Experiment for Microstrip Circulators
VII. Conclusions
References

Discrete Geometry to Image Processing
STEPHANE MARCHAND-MAILLET
I. Introduction
II. Binary Digital Images
III. Digital Topology
IV. Discrete Geometry
V. Extensions in the 16-Neighborhood Space
VI. Application to Vectorization
VII. Conclusion
References

Introduction to the Fractional Fourier Transform and Its Applications
HALDUN M. OZAKTAS, M. ALPER KUTAY, AND DAVID MENDLOVIC
I. Introduction
II. Notation and Definitions
III. Fundamental Properties
IV. Common Transform Pairs
V. Eigenvalues and Eigenfunctions
VI. Operational Properties
VII. Relation to the Wigner Distribution
VIII. Fractional Fourier Domains
IX. Differential Equations
X. Hyperdifferential Form
XI. Digital Simulation of the Transform
XII. Applications to Wave and Beam Propagation
XIII. Applications to Signal and Image Processing
Acknowledgments
References

Confocal Microscopy
ERNST HANS KARL STELZER AND FRANK MARTIN HAAR
I. Resolution in Light Microscopy
II. Calculating Optical Properties
III. Principles of Confocal Microscopy
IV. Improving the Axial Resolution
V. Nonlinear Imaging
VI. Aperture Filters
VII. Axial Tomography
VIII. Spectral Precision Distance Microscopy
IX. Computational Methods
X. Spinning Disks
XI. Perspectives of Confocal Fluorescence Microscopy
References

INDEX
CONTRIBUTORS

Numbers in parentheses indicate the pages on which the author’s contribution begins.

M. J. DEEN (1), School of Engineering, Simon Fraser University, Vancouver, British Columbia, Canada V5A 1S6
FRANK MARTIN HAAR (292), Light Microscopy Group, Cell Biology and Biophysics Programme, European Molecular Biology Laboratory, Meyerhofstrasse 1, Postfach 10.2209, D-69117 Heidelberg, Germany
T. D. HARDY (1), School of Engineering, Simon Fraser University, Vancouver, British Columbia, Canada V5A 1S6
CLIFFORD M. KROWNE (96), Microwave Technology Branch, Electronics Sciences Technology Division, Naval Research Laboratory, Washington, D.C. 20375
M. ALPER KUTAY (238), Department of Electrical Engineering, Bilkent University, T-06533 Bilkent, Ankara, Turkey
STEPHANE MARCHAND-MAILLET (185), Department of Multimedia Communications, EURECOM Institute, BP 193, 06560 Sophia Antipolis, France
DAVID MENDLOVIC (238), Faculty of Engineering, Tel-Aviv University, 69978 Tel-Aviv, Israel
R. MUROWINSKI (1), National Research Council, Herzberg Institute of Astrophysics, 5071 West Saanich Road, Victoria, British Columbia, Canada V8X 4M6
HALDUN M. OZAKTAS (238), Department of Electrical Engineering, Bilkent University, T-06533 Bilkent, Ankara, Turkey
ERNST HANS KARL STELZER (292), Light Microscopy Group, Cell Biology and Biophysics Programme, European Molecular Biology Laboratory, Meyerhofstrasse 1, Postfach 10.2209, D-69117 Heidelberg, Germany
PREFACE

The five contributions to this volume cover CCDs, microstrip circulators, the need for discrete geometry in image processing, the fractional Fourier transform and confocal microscopy, all familiar themes in the series.

The opening chapter by T. D. Hardy, M. J. Deen and R. Murowinski explains in depth the effects that radiation damage can have on charge-coupled devices. The long introductory section enables the reader to understand why these devices have become so important for imaging and the nature of the difficulties that remain to be overcome. The authors then recapitulate the structure and modes of operation of CCDs, after which radiation damage is investigated very fully. The closing sections are devoted to the dark current, charge-transfer efficiency and read noise.

C. M. Krowne has already made several contributions to these Advances on the computer-aided design of microwave components. A further chapter appears here on the use of Green’s functions and finite elements to study inhomogeneous microstrip circulators. The author goes well beyond the computational details, however, and includes discussion of the physical and chemical aspects of the various materials employed, the processing of ferrites and ways of establishing the parameters that are needed if the computer modeling is to be successful and accurate. It is not until the fifth section of this chapter that we meet the formulas required for numerical evaluation of these devices. The closing section presents results and comparisons with measurements.

The fact that semi-continuous “real” images must be replaced by discrete structures for image processing creates problems when we come to consider the geometrical properties. What happens to straightness and connectedness, and what is a digital arc? In the third contribution, S. Marchand-Maillet sets out the problem with great clarity.
The various features of discrete topology and geometry that are relevant to image processing are presented: the notions of neighborhood, digital arc and closed curve, discrete distance, convexity and straightness in particular. This leads us to an important section in which a 16-neighborhood space is introduced and two new definitions of distance are proposed. This is not the first contribution on this topic; readers may recall a chapter by V. A. Kovalevsky in volume 84. It is reassuring to see that this troublesome feature of the transition from continuous to discrete is gradually being understood.

The importance of the Wigner distribution in areas far from its origins is now well known but the relation between it and the fractional Fourier transform is less familiar. This transform, which reduces to the everyday Fourier transform for integral values of the parameter a, is defined by

$$ f_a(u) = \int (1 - i\cot\phi)^{1/2} \exp\{ i\pi ( u^2 \cot\phi - 2uu'\,\mathrm{cosec}\,\phi + u'^2 \cot\phi ) \}\, f(u')\, du' $$

and has numerous applications in the theory of propagation and image processing. H. M. Ozaktas, M. A. Kutay, and D. Mendlovic have written a very clear, full account of the properties of the fractional transform and its applications.

The final chapter is a reminder that these Advances amalgamated with Advances in Optical and Electron Microscopy a few years ago. The confocal microscope has now established its importance but it is far from having reached its ultimate performance and new developments regularly appear in the microscopy journals. In the final contribution, E. H. K. Stelzer and F. M. Haar first explain how the confocal microscope works and then turn to methods of improving the axial resolution, nonlinear imaging, aperture filters, axial tomography, spectral precision distance microscopy, computational methods, spinning disks and the outlook for confocal fluorescence microscopy. This is an authoritative appraisal of the present state of confocal microscopy, which will, I hope, be widely appreciated.

My thanks as always to the authors, in particular for their efforts to ensure that the complex material that they are presenting can be grasped by readers who are newcomers to the topic. This is a very important aspect of texts for a review series such as AIEP and I am sure that readers will be grateful for all the care that has been taken to ensure readability. I conclude with a list of articles that are promised for future volumes.
FORTHCOMING CONTRIBUTIONS

Mathematical models for natural images: L. Alvarez Leon and J.-M. Morel
Soft morphology: I. Andreadis
Use of the hypermatrix: D. Antzoulatos
Interference scanning optical probe microscopy: W. Bacsa
Modern map methods for particle optics: M. Berz and colleagues (vol. 109)
Magneto-transport as a probe of electron dynamics in semiconductor quantum dots: J. Bird (vol. 107)
Second generation image coding: N. D. Black, R. Millar, M. Kunt, F. Ziliani and M. Reid
Artificial intelligence and pattern recognition in microscope image processing: N. Bonnet
Distance transforms: G. Borgefors
Resolution: A. van den Bos and A. Dekker
High-speed electron microscopy: O. Bostanjoglo
Number-theoretic transforms and image processing: S. Boussakta and A. G. J. Holt
Microwave tubes in space: J. A. Dayton
Fuzzy morphology: E. R. Dougherty and D. Sinha
Gabor filters and texture analysis: J. M. H. Du Buf
Liquid metal ion sources: R. G. Forbes
X-ray optics: E. Forster and F. N. Chukhovsky
The critical-voltage effect: A. Fox
Stack filtering: M. Gabbouj
The Aharonov-Bohm effect: W. C. Henneberger
The development of electron microscopy in Spain: M. I. Herrera and L. Bru
Contrast transfer and crystal images: K. Ishizuka
Conservation laws in electromagnetics: C. Jeffries
Logarithmic image processing: M. Jourlin and J.-C. Pinoli (vol. 110)
External optical feedback effects in semiconductor lasers: M. A. Karim and M. F. Alam (vol. 107)
Numerical methods in particle optics: E. Kasper
Scanning electron microscope design: A. Khursheed
Positron microscopy: G. Kogel
Spin-polarized SEM: K. Koike
Development and applications of a new deep-level transient spectroscopy method and new averaging techniques: P. V. Kolev and M. Jamal Deen (vol. 108)
Sideband imaging: W. Krakow
Memoir of J. B. Le Poole: A. van de Laak-Tijssen, E. Coets, and T. Mulvey
Well-composed sets: L. J. Latecki
Vector transformation: W. Li
Complex wavelets: J.-M. Lina, B. Goulard and P. Turcotte (vol. 108)
The finite volume, finite element and finite difference methods: C. Mattiussi
Plasma displays: S. Mikoshiba and F. L. Curzon
Electronic tools in parapsychology: R. L. Morris
Restoration of images with space-variant blur: J. G. Nagy
Z-contrast in the STEM and its applications: P. D. Nellist and S. J. Pennycook
Electron image simulation: M. A. O’Keefe
Phase-space treatment of photon beams: G. Nemes
Representation of image operators: B. Olstad
Aharonov-Bohm scattering: M. Omote and S. Sakoda
Geometric methods of treating energy transport phenomena: C. Passow
HDTV: E. Petajan
Nitride semiconductors for high-brightness blue and green light emission: F. A. Ponce
Scattering and recoil imaging and spectrometry: J. W. Rabalais
The wave-particle dualism: H. Rauch
Digital analysis of lattice images (DALI): A. Rosenauer (vol. 107)
Electron holography: D. Saldin
Reconstruction from non-Cartesian grids: G. E. Sarty
X-ray microscopy: G. Schmahl
Accelerator mass spectroscopy: J. P. F. Sellschop
Vector coding and wavelets: M. Shnaider and A. P. Paplinski (vol. 110)
Focus-deflection systems and their applications: T. Soma
Hexagonal sampling in image processing: R. Staunton (vol. 107)
Study of complex fluids by transmission electron microscopy: I. Talmon
Shape skeletons and greyscale images: S. Tari
New developments in ferroelectrics: J. Toulouse
Organic electroluminescence, materials and devices: T. Tsutsui and Z. Dechun
Electron gun optics: Y. Uchikawa
Very high resolution electron microscopy: D. van Dyck
Mathematical morphology and scanned probe microscopy: J. S. Villarrubia
Morphology on graphs: L. Vincent
Generalized ranked-order filters: J. B. Wilburn
Representation theory and invariant neural networks: J. Wood (vol. 107)
Magnetic force microscopy: C. D. Wright and E. W. Hill
Fuzzy cellular neural networks: T. Yang (vol. 108)
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 106

Effects of Radiation Damage on Scientific Charge Coupled Devices

T. D. HARDY¹,², M. J. DEEN¹,*, and R. MUROWINSKI²

¹School of Engineering Science, Simon Fraser University, Vancouver, British Columbia, Canada V5A 1S6
²National Research Council, Herzberg Institute of Astrophysics, 5071 West Saanich Road, Victoria, British Columbia, Canada V8X 4M6
I. Introduction
   A. Background
   B. CCD Development and Current Status
   C. Motivation
II. Device Structure and Operation
   A. Charge Generation
   B. Charge Collection
   C. Charge Transfer
   D. Charge Detection
III. Radiation Damage
   A. Ionization Damage
   B. Displacement Damage
   C. Bulk Trap Levels
   D. DLTS Measurements
   E. Annealing
   F. FUSE Radiation Environment
IV. Dark Current
   A. Theory
   B. Experimental Results
V. Charge Transfer Efficiency
   A. Simple Physical Model
   B. Measurement Techniques
   C. Experimental Results
VI. Read Noise
   A. Noise Sources
   B. Correlated Double Sampling
   C. Experimental Results
VII. Conclusions
References
List of Abbreviations and Symbols
Acknowledgments
*Corresponding Author. Phone: (604) 291-3248; Fax: (604) 291-4951; E-mail: jamal@cs.sfu.ca.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, Volume 106. ISBN 0-12-014748-3. Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved. ISSN 1076-5670/99 $30.00
I. INTRODUCTION

Charge coupled devices (CCDs) were first introduced by Boyle and Smith [1970]. Several different applications have been explored for the devices, including digital memories and analog signal processing, but CCDs are best known today as optical detectors. The phenomenal success of these devices in electronic imaging has spawned a great deal of research, and many advances have been made over the past three decades. Due to their sensitivity and precision, CCDs have made a particularly large impact in the field of scientific imaging, from optical astronomy to medical research. These applications continue to push the limits of CCD performance.
A. Background

CCDs operate in the charge domain. The electrical signals which propagate through the device are small bundles of charge carriers, either electrons or holes. These “charge packets” are created, stored, and moved around inside the device to perform the operations required. The fundamental unit of a CCD is a metal-insulator-semiconductor (MIS) capacitor on which the charge packets are stored. If two of these capacitors are placed close enough together, the charge can be transferred from one to the other by manipulating the voltages on their gates. Charge transfer between capacitors is the key operation performed by a CCD and is the origin of the term charge coupled device.

If we make a whole string of these closely spaced capacitors in a row, we can form a serial shift register. Charge can be injected into the capacitor at one end of the register through an adjacent diode, transferred down the line of capacitors, and read out at the other end with a charge detection amplifier. The first CCD made by Boyle and Smith was composed of 24 capacitors in a similar configuration, and was first used as an 8-bit serial shift register. A binary “1” was represented by the presence of a charge packet, and “0” by the absence of a packet.

Originally, the CCD was envisioned as a memory device, and there was much activity in this area in the first few years after its invention. The first generation of commercial CCD memory devices appeared in 1975 in the form of 8-kilobit and 16-kilobit memories [Kosonocky and Zaininger, 1979]. These were serial-access memory devices implemented as long circular shift registers. Eventually, however, CCDs were superseded by other technologies with faster access times or larger capacities.

Another application area explored in the early years of CCDs was analog signal processing. A CCD is inherently an analog device, since the charge packets can be of arbitrary size, and it lends itself naturally to time-sampled signal processing.
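The shift-register behavior described above can be illustrated with a minimal Python sketch. It is not a model of any real device: the function names, the packet size of 1000 electrons, and the detection threshold are hypothetical, each bit is given a single storage cell, and every transfer is assumed to be lossless.

```python
def shift_out(register):
    """Clock a CCD-style serial shift register until it is empty.

    `register` is a list of charge packets (electron counts). Index 0 is
    assumed to be the capacitor next to the charge detection amplifier.
    Transfers are idealized as lossless.
    """
    wells = list(register)
    samples = []
    for _ in range(len(wells)):
        samples.append(wells[0])   # packet arriving at the output amplifier
        wells = wells[1:] + [0]    # remaining packets move one capacitor along
    return samples


def encode_bits(bits, packet_size=1000):
    """Represent a binary 1 by a charge packet and a 0 by an empty well."""
    return [packet_size if b else 0 for b in bits]


def decode_bits(samples, threshold=500):
    """Recover bits by thresholding the detected packet sizes."""
    return [1 if s > threshold else 0 for s in samples]


if __name__ == "__main__":
    word = [1, 0, 1, 1, 0, 0, 1, 0]   # an 8-bit word
    print(decode_bits(shift_out(encode_bits(word))))
```

Because the packets are stored as plain numbers rather than as binary values, the same `shift_out` function also serves as an analog delay line: packets of arbitrary size leave the register in the order in which they were stored.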
In these applications, each charge packet represents one sample of the analog signal. A delay line is the most obvious possibility, since it is basically an analog shift register, and CCD delay lines were successfully used to perform such tasks as resynchronizing video signals [Buss, Tasch, and Barton, 1979]. Analog transversal filters were also built, but when numeric processors became fast enough, it was more convenient to perform most signal processing tasks in the digital domain. CCDs are still used in some cases, for instance, to capture a transient signal which is too fast for an analog-to-digital converter (ADC) to sample adequately. The CCD can then be read out at a slower rate for the ADC to digitize. This approach is used in some digitizing oscilloscopes.

It was in imaging, however, that the CCD found an enduring niche. Over the past three decades, its impact on the field of electronic image capture has been nothing short of revolutionary. The shift from memory and signal processing into the optical detector field was simple for CCDs because the silicon from which they are made is naturally light-sensitive. Light incident on a volume of silicon will generate charge (electrons and holes) through the photoelectric effect. So in an imaging CCD, instead of charge being injected electrically at one end of the register, it is created by the incident light along the whole length, and the size of the charge packets detected at the output (barring complications to be discussed later) will be in direct proportion to the light intensity at each point. This property enabled Tompsett, Amelio, and Smith [1970] to use the first CCD also as a simple line imager. CCD line imagers have been used with success in such devices as spectrographs and facsimile machines.

The extension from a line imager to an area imager is straightforward: a series of CCD shift registers is placed side by side to form a two-dimensional array.
If an image is focused onto this array, the photo-generated charges in the capacitors will form an electrical analog of the image, each charge packet corresponding to a single picture element (pixel). The charge packets can then be transferred along the shift registers to a detection circuit, and the image can be recorded or displayed.

Figure 1 shows a simple analogy. In this figure, the buckets represent the CCD capacitors, the raindrops represent the photons of light, and the water collected in the buckets represents the generated charge packets. The conveyor belts represent the CCD shift registers, and the measuring station represents the charge-sensitive output amplifier. After collecting rainwater (charge) for a certain period of time (the exposure or integration time), the side-by-side conveyor belts that form the array (the parallel registers of a CCD) shift one unit, and load the transverse conveyor (the serial register), which then conveys each of its buckets, one at a time, to the measuring station. When the row has been completely transferred, the parallel registers shift again and load a new row into the serial register. This continues until the entire array has been read out and the distribution of rainfall over the array of buckets can be reconstructed from the data. In the same way, the distribution of light intensity (the image) incident on a CCD can be reconstructed from the measurements of the charge collected in its capacitors.

FIGURE 1. Simple CCD analogy: incident light is represented by rainfall, and the CCD registers are represented by conveyor belts. The “image” (rainfall distribution) is acquired by transferring the rainwater in each bucket, one at a time, to the measuring station. (Figure labels: buckets = capacitors; conveyors = shift registers; measuring station = output amplifier.)
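The readout sequence of the bucket analogy can be sketched in a few lines of Python. This is only an idealized illustration: the function names are hypothetical, row 0 is assumed to lie next to the serial register, column 0 is assumed to lie next to the output amplifier, and no charge is lost or added during transfer.

```python
def read_out(image):
    """Read a 2-D array of charge packets out through one output amplifier.

    `image[r][c]` is the charge collected in row r, column c. Each pass of
    the outer loop is one parallel shift, which loads the row nearest the
    serial register into it. The inner loop is the serial shift, which
    delivers that row's packets to the amplifier one at a time.
    """
    parallel = [list(row) for row in image]   # work on a copy of the array
    stream = []
    for _ in range(len(parallel)):
        serial = parallel.pop(0)              # parallel shift loads the serial register
        while serial:
            stream.append(serial.pop(0))      # serial shift: one packet measured
    return stream


def reconstruct(stream, n_cols):
    """Rebuild the image from the measured stream, given the row length."""
    return [stream[i:i + n_cols] for i in range(0, len(stream), n_cols)]


if __name__ == "__main__":
    image = [[10, 20, 30],
             [40, 50, 60]]
    stream = read_out(image)
    print(stream)
    print(reconstruct(stream, n_cols=3))
```

The point of the sketch is that the whole array is measured through a single output, so the image is recovered purely from the order in which the packets arrive.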
B. CCD Development and Current Status

The initial research efforts into CCD imaging arrays were aimed at producing devices for the large markets of broadcast television, home video, surveillance, and closed circuit television (CCTV) systems, which at the time were largely dominated by vidicon tubes. The idea was to make replacements for the vidicon tube that incorporated the CCD’s advantages in size, weight, reliability, and low power requirements. It turned out to be quite difficult, however, to produce arrays of any appreciable size, and it was many years before researchers were able to produce CCDs of sufficient array sizes (around 500 × 500 pixels for standard television) that could also match the vidicon in frame rate and cosmetic quality. The first commercially available CCD camera was only a small 100 × 100 array unveiled by Fairchild Semiconductor in 1973 [Solomon, 1974]. Nevertheless, driven by the mass market possibilities and aided by advances in integrated circuit manufacturing technology, several manufacturers were able to produce cameras that were fully television-compatible by the mid 1980s, and CCDs soon completely replaced vidicons in most applications. Today, CCD-based hand-held home video cameras about the size of a paperback novel are widely available.

CCD imagers also generated considerable interest in the scientific community because of their low noise, high linearity, large dynamic range, good geometric accuracy, and broad spectral response. Astronomers, who are always interested in detecting fainter and more distant objects, were particularly impressed with the sensitivity of CCDs, which is approximately 100 times greater than that of photographic film. Frame rate was not a major issue, since astronomical exposures typically last from a few minutes to several hours. Once arrays of reasonable size were available, CCDs rapidly became the detector of choice at all major astronomical observatories. NASA also commissioned CCDs for several space missions, and in 1980 Texas Instruments, Inc. managed to fabricate 800 × 800 pixel imagers, of which four were used in the first Wide Field and Planetary Camera (WF/PC) of the Hubble Space Telescope (HST) [Blouke et al., 1981a]. Another 800 × 800 pixel device made by Texas Instruments, with a different architecture designed to reduce susceptibility to radiation damage and enhance spectral responsivity, was sent on the space probe Galileo to take pictures of Jupiter and its moons [Janesick, Hynecek, and Blouke, 1981].
Despite technical problems with other parts of both spacecraft (the flawed main mirror of HST and the failed high-gain antenna of Galileo), the CCDs produced stunning results.

1. Array Size

As the capabilities of CCDs as scientific instruments became known, scientists pushed manufacturers for devices with larger and larger array sizes, broader spectral response, and lower noise. In 1983 Texas Instruments bettered their 800 × 800 array with a 1024 × 1024 device [McGrath, Freeman, and Keenan, 1983], and a year later Tektronix, Inc. had produced a 2048 × 2048 array. In 1989 Ford Aerospace Corp. managed a 4096 × 4096 array [Janesick et al., 1989], and this remains the largest commonly available format. Larger arrays have, however, been built for special applications. A U.S. Navy project is currently underway to build a reconnaissance instrument for which Loral Aerospace is building CCDs of 9216 × 9216 pixels, each pixel being 8.75 µm square. This device, the largest CCD array ever built in terms of pixel count, is 80.6 mm on a side, and each one takes up an entire 5-inch silicon wafer. At an astronomical CCD conference in October 1996 [Bredthauer, 1997] it was reported that the effort had produced three or four working arrays, with more being fabricated. These first devices, however, had numerous defects, and the images they produced were not cosmetically good enough for scientific work. In another effort, researchers at the Steward Observatory in Arizona are currently evaluating the scientific suitability of large CCD arrays manufactured by Philips Imaging Technology, Inc. These devices are built on 6-inch wafers and consist of a 7168 × 9216 array of 12-µm pixels, making them the largest integrated circuits ever built [Theuwissen, 1997].

There are two difficulties with building these extremely large arrays: the first is reducing losses in the large number of transfers each charge packet has to undergo to reach the output, and the second is avoiding defects in circuits of such enormous physical size and density. Charge transfer efficiency (CTE) is the fraction of charge that is successfully transferred from one pixel to the next during readout of the array. The CTE must be very high in order to get reasonable output from a large array. For example, an average packet in a 4096 × 4096 array undergoes around 4000 transfers to reach the output, and even with a CTE of 0.9999 it would arrive with only two-thirds of its charge. The original CCDs of Boyle and Smith had CTEs of about 0.98, so it was fortunate the charge packets had only eight transfers to undergo!

An important cause of poor CTE is trapping by midgap energy states. The first CCDs experienced a lot of charge loss because the charge was stored and transferred at the interface between the silicon and the insulating layer, where there are a large number of these trapping states.
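The charge-loss figures quoted above for a given CTE can be checked with a short calculation. The sketch below assumes the simplest possible model, in which every transfer retains the same fraction of the packet and lost charge is never recovered, so the surviving fraction after N transfers is CTE raised to the power N. The function name is hypothetical.

```python
def fraction_remaining(cte, n_transfers):
    """Fraction of a charge packet left after n_transfers transfers.

    Assumes an identical CTE at every transfer and no recovery of the
    charge left behind, so the result is simply cte ** n_transfers.
    """
    return cte ** n_transfers


if __name__ == "__main__":
    cases = [
        ("CTE 0.9999, 4000 transfers", 0.9999, 4000),
        ("CTE 0.98, 8 transfers", 0.98, 8),
        ("CTE 0.9999998, 4000 transfers", 0.9999998, 4000),
    ]
    for label, cte, n in cases:
        print(f"{label}: {fraction_remaining(cte, n):.4f}")
```

Under this model, a CTE of 0.9999 over 4000 transfers leaves about 0.67 of the packet, consistent with the two-thirds figure in the text, while a CTE of 0.98 over eight transfers leaves about 0.85. The same 4000 transfers at a CTE of 0.9999998 would lose less than 0.1% of the charge.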
Early on, researchers experimented with adding an implanted layer just below the surface to create “buried channel” devices in which the charge was stored and transferred away from the surface states [Walden et al., 19721. This made a dramatic improvement in CTE over “surface channel” operation and is the standard device structure used today. Buried channel devices still encounter trapping states due to impurities in the silicon, but silicon manufacturing has improved so much in the last 30 years that impurity levels are now very low and CTEs of up to 0.9999998 have been achieved [Murowinski, Deen, and Hardy, 19951. The great advances in integrated circuit fabrication that have accompanied the improvements in silicon purity and crystal quality have also reduced the problem of circuit defects when making very large arrays, but it is still expensive to fabricate devices in which a single defect, such as a short between two clock phases, can ruin an entire wafer. To circumvent this
problem, many manufacturers and research groups have opted for a less demanding, hence less costly, solution to creating large area detectors: making several smaller arrays and tiling them together on a single focal plane. With the most common geometries, four devices can be fabricated on one wafer, so a single point defect will only affect one quarter of the devices. This approach has, however, required putting considerable effort into producing CCDs in an appropriate format and devising means of butting the arrays together in such a manner as to minimize the dead space between them while maintaining stringent optical flatness across the entire plane. At least three manufacturers are now producing 2048 x 4096 pixel arrays for this purpose, which can be butted on three sides to produce conglomerate arrays 8192 pixels wide and any number of pixels long. Another advantage of these CCD “mosaics” is that a much larger focal plane can be covered because it is no longer limited to the size of a single silicon wafer. In two current MEGACAM camera projects, one being planned for the Multiple Mirror Telescope (MMT) in Arizona and the other for the Canada France Hawaii Telescope (CFHT) in Hawaii, thirty-two 2048 x 4096 CCDs are used to cover an approximately 240 x 240 mm focal plane [Boulade, 1987]. Despite the impressive advances in array sizes and densities, CCDs are only beginning to match the resolving power of photographic film, which, combined with film’s low cost, has largely prevented CCDs from making inroads into the huge 35 mm still camera market. At least two companies (Canon and Nikon) have high-resolution CCD-based digital still cameras available, both based on 4096 x 4096 CCDs made by Eastman Kodak, but at roughly $20,000 each, they are aimed mainly at professional news photographers.
There are numerous digital cameras being marketed to the consumer, but they are based on CCDs of small pixel counts (typically 640 x 480, a standard resolution for computer displays) and produce images of relatively poor quality. Better products are continually appearing, however, and there can be little doubt that CCDs will soon be competing successfully with film. Of course, with film, one can always change the optics and use a larger piece of film if greater resolution is required; it will be a long time before a CCD can produce images of the same quality as large-format film cameras can. One of the major technological hurdles of building cameras with very large pixel counts (apart from producing large, cosmetically perfect CCDs) is developing the supporting equipment to handle the vast amounts of data produced. For example, a 4096 x 4096 array generates 32 MB of data for each image (with 16 bits per pixel digitization), and an 8192 x 8192 mosaic produces over 128 MB per image. Efficiently dealing with this volume of data, especially in a portable device like a still camera, is an area of ongoing development effort.
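The data volumes quoted above follow directly from the pixel count and the digitization depth. The sketch below assumes uncompressed storage with no header overhead and reports sizes in binary megabytes (1 MiB = 1,048,576 bytes); the helper name is illustrative.

```python
MIB = 1024 * 1024  # bytes in one binary megabyte

def image_bytes(width: int, height: int, bits_per_pixel: int = 16) -> int:
    """Raw size in bytes of one uncompressed frame."""
    return width * height * bits_per_pixel // 8

single_array = image_bytes(4096, 4096)
mosaic = image_bytes(8192, 8192)

print(single_array / MIB)   # size of one 4096 x 4096 frame in MiB
print(mosaic / MIB)         # size of one 8192 x 8192 mosaic frame in MiB
print(mosaic / 1e6)         # the same mosaic frame in decimal megabytes
```

In decimal megabytes the mosaic frame is about 134 MB, which is consistent with the "over 128" figure in the text.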
2. Noise
The sensitivity of a CCD is determined in large part by the noise, which imposes a fundamental limit on the minimum detectable signal. There are many sources of noise, including photon noise, thermal noise, and electrical noise in the readout circuit. Photon noise is a result of the fact that photoelectric charge generation is a random process, governed by Poisson statistics. Therefore, due to the very nature of the detection mechanism in a CCD, there is noise present with a root mean square (rms) value equal to the square root of the signal level in electrons. However, this source of noise is not a serious detriment to the sensitivity of a CCD because it scales with the signal level (as its square root) and is therefore small for faint signals. The dark current is the amount of charge generated by thermal energy in the device, which produces charge in the pixels even when the CCD is not exposed to light (hence the term dark current). Thermal generation is also a random process, thus dark current also adds noise. In this case, the noise depends on the amount of dark current and not on the signal level, so it can reduce the ability to resolve faint objects and is particularly harmful in the extremely long exposures typical of astronomical imaging. Dark current is highest in regions where there are a large number of midgap levels, so the largest contribution to the dark current comes from the surface, with its high density of midgap states. However, CCD researchers have devised a couple of techniques which very effectively reduce or eliminate the surface dark current (see Section IV). The remaining contribution to the dark current comes from midgap states in the bulk silicon away from the surface, and this has been significantly reduced by the same improvements in silicon purity that have increased the CTE. Typical values in modern CCDs are about 20 pA/cm² at room temperature. These low levels have reduced the need for an elaborate cooling apparatus to lower the dark current.
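To give a feel for these numbers, the sketch below converts a dark current density into electrons per pixel per second and evaluates the Poisson (shot) noise on the accumulated dark charge. The 15 µm pixel size and the 100 s exposure are assumptions chosen for illustration; only the 20 pA/cm² figure comes from the text.

```python
import math

ELECTRON_CHARGE = 1.602e-19  # coulombs

def shot_noise(electrons: float) -> float:
    """rms noise, in electrons, of a Poisson-distributed charge."""
    return math.sqrt(electrons)

def dark_rate_per_pixel(j_dark_pa_per_cm2: float, pixel_um: float) -> float:
    """Dark charge generation rate for one square pixel, in electrons/s."""
    area_cm2 = (pixel_um * 1e-4) ** 2               # pixel area in cm^2
    current_a = j_dark_pa_per_cm2 * 1e-12 * area_cm2  # dark current in amperes
    return current_a / ELECTRON_CHARGE

rate = dark_rate_per_pixel(20.0, 15.0)   # assumed 15 um pixel
dark_charge = rate * 100.0               # assumed 100 s exposure
print(rate, shot_noise(dark_charge))

# Photon noise grows only as the square root of the signal
print(shot_noise(100.0), shot_noise(10000.0))
```

Under these assumptions a room-temperature pixel generates a few hundred electrons per second, which is why long astronomical exposures are normally made with the detector cooled.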
Read noise is the amount of noise introduced by the charge detection circuit at the output of the CCD. This has been the ultimately limiting source of noise for most of the CCD’s history. The first CCDs exhibited input-referred noise levels of around 30 e- rms. Much effort has been expended to lower the read noise by signal processing techniques and by optimizing the geometry of the transistors used in the output circuit [Kim, Blouke, and Heidtmann, 1990]. This has resulted in read noise levels in the best devices of around 2 e- rms at slow (50 kHz) readout rates. Another possibility which a few researchers have pursued is devising output circuits that can sample the charge packet nondestructively. It is then possible to average multiple samples of the same charge packet, which reduces the noise by a factor equal to the square root of the number of samples, though at the cost of a slower readout rate. Such a device has produced output with
a noise of less than 1 e- rms by averaging 64 samples per pixel [Janesick et al., 1989].
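The benefit of nondestructive multiple sampling can be sketched as follows. The sketch assumes that the samples are statistically independent and, as a standard further assumption not stated in the text, that independent noise sources add in quadrature. The 8 e- single-sample noise is a hypothetical value, not a figure from the cited device.

```python
import math

def averaged_read_noise(single_sample_noise: float, samples: int) -> float:
    """Read noise after averaging `samples` independent reads of one packet."""
    return single_sample_noise / math.sqrt(samples)

def total_noise(signal_e: float, dark_e: float, read_noise_e: float) -> float:
    """Quadrature sum of photon noise, dark-current noise and read noise,
    all expressed in electrons rms."""
    return math.sqrt(signal_e + dark_e + read_noise_e ** 2)

# Hypothetical amplifier with 8 e- rms per sample, averaged over 64 samples
print(averaged_read_noise(8.0, 64))

# Faint 25-electron signal with no dark charge, before and after averaging
print(total_noise(25.0, 0.0, 8.0), total_noise(25.0, 0.0, 1.0))
```

For faint signals the read noise dominates the total, so the factor-of-eight reduction from 64 samples translates almost directly into sensitivity, at the price of a readout that takes 64 times as many samples per pixel.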
3. Quantum Efficiency
Scientists have also been pushing the limits of CCDs in the area of quantum efficiency (QE). QE is a measure of how accurately the charge generated in a pixel represents the actual intensity of light incident on the CCD at various wavelengths. Typically, there are losses due to reflections from and absorptions in nonactive layers of the device, and certain wavelengths may pass through the device undetected. A great deal of loss at short wavelengths occurs in the insulating and gate layers of a CCD, and three methods of overcoming this have been pursued. The first is to use special implants to eliminate one or more of the gate phases so that a portion of each pixel has only a thin oxide layer over it to interfere with incident light. This type of device was built by Texas Instruments in 1981 and used in the Galileo spacecraft [Janesick, Hynecek, and Blouke, 1981]. The second approach is to flip the device over, etch away the substrate, and illuminate it from the backside. This clears the entire surface of interfering structure. The thinning process, however, is difficult and therefore costly. It took ten years of work by Texas Instruments to perfect the thinning process used in the production of the CCDs for WF/PC [Blouke et al., 1981b]. Reticon, Inc. introduced a commercial thinned, backside-illuminated CCD in 1987, and many manufacturers today offer thinned versions of their CCDs. The third and much simpler approach to enhancing responsivity at short wavelengths is to use a phosphor coating (such as lumigen, the material used in fluorescent yellow highlighting pens), which converts short wavelength photons to longer wavelengths that pass more easily through the frontside surface layers. This was the approach used for the CCDs of the camera upgrade to the HST (WF/PC2) after an unexpected problem with the thinned WF/PC CCDs emerged shortly before launch, resulting in a $5 million emergency fix.
To further improve QE performance, an antireflection (AR) coating is often applied to the devices after manufacture. AR coatings reduce the losses at certain wavelengths due to reflection from the surface of the CCD and can make a substantial improvement to the QE. Current CCDs can have a QE that peaks at over 90% and, depending on the type of AR coating, reasonable (> 30%) performance can extend down to wavelengths of 200 nm (ultraviolet) or up to 1000 nm (near infrared). A phosphor coating can extend the useful QE range down to 50 nm (far ultraviolet). Figure 2 shows a set of typical QE curves for three different devices. These curves were measured for CCDs used at the National Research Council’s (NRC) Dominion Astrophysical Observatory (DAO).
FIGURE 2. Typical quantum efficiency (QE) curves plotted against wavelength (nm). The diamonds are for backside-illuminated, AR-coated CCDs; the triangles are for frontside-illuminated CCDs; and the plus signs are for frontside-illuminated, lumigen-coated CCDs.
C. Motivation
As the limits of CCD performance are pushed further, the devices become more and more sensitive to small amounts of damage. One cause of damage encountered by CCDs in certain applications is nuclear radiation. Radiation can cause charge buildup in the insulating layers or even damage to the atomic crystal structure of the silicon and can result in serious degradation of the sensor's performance. The aim of this article, a continuation of our earlier research [Murowinski, Deen, and Hardy, 1995; Murowinski, Linzhuang, and Deen, 1993a; Hardy, Murowinski, and Deen, 1997; Murowinski and Deen, 1994; Murowinski, Linzhuang, and Deen, 1993d; Murowinski and Deen, 1993; Murowinski, Linzhuang, and Deen, 1993c; Hardy, Murowinski, and Deen, 1998], is to examine the effects of radiation damage on the operation of CCDs and to discover, if possible, means by which the effects can be minimized. The motivation behind this investigation was a satellite astronomy project called the Far Ultraviolet Spectrographic Explorer (FUSE). One of the instruments aboard the satellite is the Fine Error Sensor (FES), which is used to keep the satellite pointed in the desired direction. The FES
uses a CCD to track guide stars and provide feedback to the attitude control system of the satellite. To achieve the necessary pointing accuracy for the telescope, the specification for the FES states that it must be able to determine the centroid of a guide star image to within 0.08 of a pixel, which corresponds to an angular deviation of 2 arcseconds. In the original FUSE mission design, the satellite was to travel in a highly elliptical orbit (HEO), and twice each orbit would have traversed a belt of protons trapped by the earth’s magnetic field. In this orbit, the FES would have received a very large dose of energetic protons, thus there was considerable concern over the effects of this type of radiation on the instrument. After initial studies, the mission was revised with a lower orbit to reduce the amount of radiation encountered; in addition, the lifetime of the mission was reduced. However, radiation damage is still a concern. Our investigations focused on three characteristics of CCDs which are susceptible to radiation damage: dark current, charge transfer efficiency (CTE), and noise in the output circuit. We determined the rate of degradation in these three areas under the level of radiation expected for the FUSE mission and made recommendations for various means of reducing the effects. The next section describes the basic structure of CCDs and the theory of their operation. Section III gives a brief outline of the physics of radiation damage in semiconductor materials. After this we describe the experiments we performed to investigate radiation effects on the above three performance characteristics of CCDs. Section IV deals with dark current, Section V with charge transfer efficiency, and Section VI with read noise. In Section VII we summarize our conclusions.
II. DEVICE STRUCTURE AND OPERATION
The operation of a CCD can be broken down into four steps: charge generation, charge collection, charge transfer, and charge detection. Charge generation is how the external signal we desire to detect (light intensity) is converted into an internal electrical signal in the form of electronic charge. Charge collection is the next step, in which the generated charges are gathered into discrete packets. It has a twofold purpose: to allow integration of the signal over a long period of time and to spatially localize the signal to get a two-dimensional signal distribution. Charge transfer is the process whereby the integrated and spatially localized charge signals (charge packets) are moved to a single detector. The detector performs the last step, which is to convert the charge signal into a more convenient electrical signal, namely voltage, for further processing.
A. Charge Generation
Charge generation occurs through the photoelectric effect, in which a photon of light interacts with an electron in the valence band of a semiconductor and imparts enough energy for the electron to jump to the conduction band, creating an electron-hole pair (Figure 3). The energy required is equal to the bandgap of the semiconductor (the energy difference between the conduction and valence bands). Silicon has a bandgap of 1.12 eV and therefore any photon with energy greater than 1.12 eV is capable of boosting electrons into the conduction band in silicon. Photons with greater energy may cause more than one electron to jump to the conduction band. It has been determined empirically that photons with energy greater than about 5 eV will generate 1 electron-hole pair for every 3.65 eV of energy they possess [Janesick, 1991]. The energy of a photon is related to its frequency by

$$E = h\nu = \frac{hc}{\lambda}$$

where $h$ is Planck's constant, $\nu$ is the frequency, $c$ is the speed of light, and $\lambda$ is the wavelength. From this equation we can see that the upper end of the useful spectral range of silicon as a detector of electromagnetic radiation is about 1100 nm, which is in the near infrared. Above this wavelength the photons do not have enough energy to excite electrons into the conduction band. Other semiconductors with smaller bandgaps can be used if the detection of longer wavelengths is desired. Germanium, for example, has a bandgap of 0.66 eV, and germanium CCDs have been built with good spectral response to 1600 nm [Janesick, 1991]. The responsivity of silicon also tapers off at very short wavelengths because there is a reduced
FIGURE 3. Photoelectric effect. An incident photon of energy hν excites an electron into the conduction band, creating an electron-hole pair.
probability of interaction (due to the lower absorption distance). The practical lower limit is about 0.1 nm, or a photon energy of 10 keV, which is in the x-ray region of the spectrum. So, ideally, the useful range of a silicon detector like a CCD extends over the near infrared, visible, ultraviolet, extreme ultraviolet and soft x-ray portions of the electromagnetic spectrum. However, other effects such as reflection from the surface or absorption in nonactive regions of the device can significantly reduce the sensitivity of a CCD in certain spectral ranges.
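The spectral limits discussed above follow from the relation between photon energy and wavelength. The sketch below uses the rounded value hc ≈ 1239.84 eV·nm and the bandgaps and 3.65 eV pair-creation energy quoted in the text. Treating every photon between the bandgap and 5 eV as producing exactly one pair is a simplifying assumption, and the function names are illustrative.

```python
H_C_EV_NM = 1239.84       # Planck's constant times the speed of light, eV*nm
SI_BANDGAP_EV = 1.12      # silicon bandgap from the text
PAIR_ENERGY_EV = 3.65     # empirical energy per electron-hole pair above ~5 eV
MULTI_PAIR_THRESHOLD_EV = 5.0

def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy E = hc / wavelength."""
    return H_C_EV_NM / wavelength_nm

def cutoff_wavelength_nm(bandgap_ev: float) -> float:
    """Longest wavelength whose photons can bridge the bandgap."""
    return H_C_EV_NM / bandgap_ev

def pairs_per_photon(energy_ev: float) -> float:
    """Average electron-hole pairs from one absorbed photon in silicon
    (simplified: zero below the gap, one up to 5 eV, E/3.65 above)."""
    if energy_ev < SI_BANDGAP_EV:
        return 0.0
    if energy_ev <= MULTI_PAIR_THRESHOLD_EV:
        return 1.0
    return energy_ev / PAIR_ENERGY_EV

print(cutoff_wavelength_nm(SI_BANDGAP_EV))   # silicon cutoff, about 1107 nm
print(cutoff_wavelength_nm(0.66))            # germanium cutoff
print(photon_energy_ev(0.1))                 # energy of a 0.1 nm x-ray photon, eV
print(pairs_per_photon(photon_energy_ev(0.1)))
```

The silicon cutoff lands close to the 1100 nm limit given in the text, and a single soft x-ray photon produces a packet of a few thousand electrons, which is why CCDs can measure individual x-ray photon energies.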
B. Charge Collection
Once the charge is generated, it must be collected and stored. As stated in the Introduction, the basic element of a CCD is a MIS capacitor, upon which the photo-generated charge can be stored. The device is built on a p+ type substrate on which is grown an epitaxial p-type layer of about 10-20 µm in thickness. The resistivity of the epitaxial layer can be varied depending on the application, but a typical value is around 30-50 ohm-cm. On top of the epitaxial layer is an insulator layer, which can be a simple oxide or a double layer of oxide and nitride. The nitride layer assists in ensuring a uniform insulator thickness throughout the repeated oxidations and oxide etchings involved in creating the multilayer gate structure of the CCD. The insulating layer is usually about 1000 angstroms thick in total. On top of the insulator the gate material is deposited to form the metal plate of the MIS capacitor. Figure 4 shows a cross section of the capacitor. If a positive voltage is applied to the gate of the capacitor, a depletion region is formed in the silicon below the gate as the majority carriers (holes in p-type silicon) are pushed away (Figure 4a). The charge on the gate is
FIGURE 4. Cross section of an MIS capacitor. (a) Depletion. (b) Inversion.
balanced by the space charge in the depletion region and the resulting potential profile will create a potential well at the surface (Figure 5). The steady-state minority carrier (electron) concentration is given by

$$n = n_i \exp\left[\frac{q(\psi_i - \phi_F)}{kT}\right]$$

where $\psi_i$ is the intrinsic Fermi potential and $\phi_F$ is the Fermi potential. If the applied gate voltage is high enough, the surface potential will exceed the Fermi potential by a sufficient amount to allow a significant minority carrier buildup in the potential well at the surface. The additional charge on the gate then begins to be balanced by collected minority carriers instead of the fixed charge of a widened depletion region (Figure 4b). The surface is said to be “inverted,” and the layer of collected minority carriers is called the “inversion layer.” If a voltage greater than the threshold voltage $V_T$ is applied to the gate, the minority carrier concentration at this point will exceed the steady-state majority carrier concentration, and the surface is said to be in “strong inversion.” What we have not addressed in the above description is where the minority carriers in the inversion layer come from. The capacitor is isolated, so the minority carriers can only come from carrier generation processes in the depletion region, which takes time. These processes, including thermal generation and the photoelectric effect, are described in Section IV. CCDs are operated in a transient state called “deep depletion.” In a deep depletion state, the gate voltage is sufficient to invert the surface, but enough minority carriers have not yet been collected to do so, thus the gate charge must be balanced by a wider depletion region. The depletion region and the resulting potential well in this state extend deep into the silicon substrate. If the other
FIGURE 5. Potential well at the surface of an MIS capacitor. Free electrons will gather just below the oxide at the potential maximum.
carrier generation mechanisms are slow enough, the majority of the carriers collected on the capacitor will be those generated by the photoelectric effect; therefore, the charge stored on the silicon plate of the capacitor will be proportional to the incident light intensity.
1. Buried-Channel Operation
As mentioned in the Introduction, the surface is a poor place to store and transfer charge because of the large number of trapping states there, so most modern CCDs are buried channel devices. In buried channel devices, a shallow layer at the surface is implanted with n-type impurity atoms, similar to a depletion-mode metal-oxide-semiconductor field-effect transistor (MOSFET), and the n-layer is biased positively with respect to the p-type substrate. This arrangement alters the potential distribution so that the charge on the capacitor is collected below the surface in the bulk silicon. To understand the potential distribution in a buried channel CCD we first consider a reverse-biased p-n junction as shown in Figure 6. For simplicity,
FIGURE 6. (a) P-N junction under reverse bias. (b) Charge concentration. (c) Electric field magnitude. (d) Potential distribution.
we assume a step junction, with a space charge distribution in the depletion region as shown. This fixed charge gives rise to an electric field $\mathcal{E}$, which is calculated by integrating the charge density. Integrating the electric field gives the potential distribution, which has a maximum at the far edge of the n-type region. Free electrons will be swept by the electric field along the potential gradient toward this maximum and out through the metal contact. We can now alter the picture to resemble a buried channel CCD by adding an oxide layer and a gate (Figure 7). If a potential $V_G$ lower than the reference potential $V_{ref}$ is applied to the gate, the potential at the surface will be pulled down, forcing the maximum of the potential distribution deeper into the silicon. As $V_G$ is lowered further, the maximum moves deeper until the surface potential is lower than the substrate potential, as in the curve for $V_{G2}$ in Figure 7. At this point, holes are attracted from the surrounding p-type material and invert the surface, “pinning” the surface potential to just below that of the substrate. If $V_G$ is lowered still further (e.g., to $V_{G3}$), the extra gate potential will be balanced by an increase in the inversion charge, so that the potential within the silicon remains fixed. The potential well still exists in inversion, and although its height can no longer be adjusted it is capable of collecting charge as before, a feature which is exploited in certain devices to reduce the dark current (see Section IV).
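The two integrations described for the reverse-biased step junction of Figure 6 can be sketched numerically. This is a minimal one-dimensional depletion-approximation model of the plain junction only, without the oxide and gate of the buried channel structure. The doping densities and the 10 V total potential drop are assumed values for illustration, and the function name is hypothetical.

```python
import math

Q = 1.602e-19                # electron charge, C
EPS_SI = 11.7 * 8.854e-14    # permittivity of silicon, F/cm

def step_junction_profile(n_d, n_a, v_total, cells=500):
    """Integrate the fixed space charge of a depleted step junction.

    n_d, n_a : donor and acceptor densities (cm^-3)
    v_total  : built-in potential plus reverse bias (V)
    Returns positions (cm, measured from the n-side depletion edge),
    field magnitude (V/cm) and potential (V, zero at the p-side edge).
    """
    # Depletion widths from charge neutrality and the total potential drop
    w = math.sqrt(2.0 * EPS_SI * v_total * (n_a + n_d) / (Q * n_a * n_d))
    x_n = w * n_a / (n_a + n_d)
    x_p = w * n_d / (n_a + n_d)
    steps = [(x_n / cells, Q * n_d)] * cells + [(x_p / cells, -Q * n_a)] * cells

    x, field, drop = 0.0, 0.0, 0.0
    xs, fields, drops = [x], [field], [drop]
    for dx, rho in steps:
        new_field = field + rho * dx / EPS_SI    # first integration: charge -> field
        drop += 0.5 * (field + new_field) * dx   # second integration: field -> potential
        field = new_field
        x += dx
        xs.append(x)
        fields.append(field)
        drops.append(drop)
    potentials = [drops[-1] - d for d in drops]
    return xs, fields, potentials

xs, fields, potentials = step_junction_profile(1e16, 1e15, 10.0)
print(xs[-1] * 1e4)       # total depletion width in micrometres
print(max(fields))        # peak field at the metallurgical junction, V/cm
print(potentials[0])      # potential at the far edge of the n region, V
```

The potential is highest at the far edge of the n-type region, as stated in the text, and the integrated drop across the depletion region reproduces the applied total, which serves as a consistency check on the two integrations.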
FIGURE 7. Buried channel CCD cross section and corresponding potential distributions for several applied gate voltages.
FIGURE 8. CCD channel cross sections and potential distributions. (a) Across the channel. (b) Along the channel.
The preceding describes the potential variation along the vertical (gate to substrate) axis. To spatially localize the charge signal into an array of pixels, the charge must be confined along both of the remaining axes as well. Figure 8 shows two cross sections through a CCD channel. Figure 8(a) is the cross section across the channel, perpendicular to the direction of charge transfer. The charge in this direction is confined by the channel stops, which are electrically connected to the substrate and are at the substrate potential. They are thus biased negatively with respect to the channel and create a potential gradient as shown in the figure. Along the channel, parallel to the direction of transfer, the charge is confined by the potentials applied to adjacent gates. In Figure 8(b), the collecting gate is set to $V_{G1}$ from Figure 7, and the adjacent gates are set to $V_{G2}$. This creates a potential profile as shown.
2. Charge Spreading
The collection of charge into discrete packets in a CCD is remarkably efficient. One of the features of a CCD which makes it an attractive image sensor is that it has a 100% fill factor. This means that there is no insensitive dead space between pixels. If a photon is absorbed within the channel stop region or in the volume beneath a noncollecting gate, the generated electron will still be collected by the potential well. If, however, the photon is absorbed deep in the device beyond the depletion region, the generated electron will diffuse randomly until it recombines or until it enters the depletion region and is swept by the electric field into the potential well. It is possible that the random motion of the photoelectron will take it away from the pixel in which it originated before it is swept into a potential well, thus the spatial resolution of the device is compromised. This spreading effect is particularly problematic for long-wavelength photons, which tend to penetrate deeper into the device and are more likely to be absorbed in the field-free region. The spatial resolution can be improved by reducing the number of electrons which are collected from the field-free region [Blouke and Robinson, 1981]; however, this improvement comes at the expense of reduced quantum efficiency.
3. Backside Illumination
In addition to the problem of photons which penetrate too far, there is a problem with photons which do not penetrate far enough. Clearly, a CCD cannot detect photons which are absorbed in the gate and oxide layers of the device and never reach the depletion region, which occurs most frequently for short-wavelength photons. Backside illumination is a technique that was developed to improve the quantum efficiency of CCDs at shorter wavelengths. In a difficult, low-yield process, the CCDs are flipped over and the substrate is etched away right up to the epitaxial layer. The device is then illuminated from the substrate side and the photons enter the active region directly, without having to pass through the gate and oxide layers. The spectral dependence of the spatial resolution is reversed for backside-illuminated CCDs, because in these devices the shorter wavelengths are the ones absorbed far from the frontside depletion regions. An unanticipated problem with backside illumination was revealed when the first devices were tested [Janesick, 1991]. A thin native oxide layer grows on the backside surface when it is exposed to air; it turns out that holes can be trapped at the interface, creating a layer of positive charge. This charge layer deforms the potential distribution so that a second maximum is created at the backside surface (Figure 9). Therefore, photoelectrons generated by short-wavelength photons absorbed near the surface are swept back by the electric field and
FIGURE 9. CCD cross section and potential distribution showing backside charging effects.
become stuck at the backside until they recombine. There are several methods which have been developed to create the necessary electric fields to negate the effect of the trapped holes and drive the photoelectrons toward the potential wells at the frontside [Janesick et al., 1985], but we will not discuss them here. With these techniques, it is possible to eliminate the field-free region and achieve a 100% internal QE; in other words, all electrons generated in the device are collected and none are lost to recombination, so that every photon which is not reflected from the surface or allowed to pass right through the device is detected.
4. Full Well
As charge is collected in the potential well, the shape of the potential distribution is altered. The peak is reduced, and the well becomes flatter and broader (Figure 10) until eventually it is no longer capable of containing additional charge. The maximum amount of charge which can be held in the potential well is called the “full well” charge. This level can be defined in several ways. The two most common definitions in buried channel CCDs are referred to as surface full well and bloomed full well. These levels are indicated in Figure 10. Surface full well occurs when the collected charge begins to interact with the surface, and is manifested by a significant increase in trapping phenomena due to the surface states. Bloomed full well is the level at which the potential equals that under the noncollecting gates and the charge is no longer confined (the potential profile of Figure 8b is flat).
FIGURE 10. Potential well alteration by collected charge.
The charge will spread up and down the channel, an effect known as “blooming.” The optimum full well is achieved when the noncollecting gates, the “barrier phases,” are in inversion ($V_{G2}$) and the collecting gate potential ($V_{G1}$) is set so that the surface and bloomed full well levels coincide. If $V_{G1}$ is too low, blooming will occur first; if it is too high, the collected charge will reach the surface first.
C. Charge Transfer
Once the charge has been collected into the pixels, the packets must be transferred to the output. The charge contained in the potential well beneath a CCD gate is moved to the next and following gates by what is usually described as a simple process of phased, or peristaltic, clocking. The most common scheme has three phases, such as the one in Figure 11, in which three adjacent gates form one pixel. The charge is collected under one phase, for example φ1, which is held at a positive voltage, while the other two phases (φ2 and φ3) are held at negative voltages (Figure 11(a)). The adjacent phase in the desired direction of motion, for example φ2, is then also made positive, causing the charge packet to become distributed under φ1 and φ2 (Figure 11(b)). A short time later, φ1 is set to the negative voltage level, forcing the entire charge packet to collect under φ2 (Figure 11(c)). The next transfer begins when φ3 is set high (Figure 11(d)) and ends with the packet
FIGURE 11. Three-phase charge transfer sequence showing the charge packets and potential wells under the CCD gates.
under φ3. Repeating a similar sequence with φ3 and φ1 will move the charge packet under the next φ1 gate, completing a one-pixel transfer for the three-phase CCD. The actual movement of the charge from one gate to the next occurs by three basic mechanisms [Banghart et al., 1991]: thermal diffusion, drift due to the fringing field between gates, and self-induced drift due to the mutual electrostatic repulsion between charges. Thermal diffusion is simply the random thermal motion of the electrons, which tends to move them from regions of high concentration to regions of low concentration. Drift is the motion caused by an electric field. Because of the coupling between the gates and the resulting potential gradients, a “fringing field” exists which sweeps electrons into the shifting potential wells. An electric field is also created by the electrons themselves, which causes them to repel each other. This self-induced drift is most effective for large packets at the beginning of charge transfer when the concentration of electrons in the starting well is high. Thermal diffusion and fringe field drift are not dependent on packet size, hence are important near the end of the transfer and at the small signal limit. All three transfer mechanisms are sensitive to temperature through the
thermal velocity and electron mobility. The transfer proceeds quite rapidly, with a time constant on the order of a few nanoseconds, so it will cause noticeable delay only in very high-speed devices. To achieve efficient transfer, the gates of a CCD must be placed close together, close enough, in fact, that their depletion regions are coupled together. The first devices to be fabricated used a single layer of aluminum for the gates, which was etched to form the individual phases. In order to get good coupling between gates, they had to have gaps of less than 3 µm between them. Achieving this without any shorts between gates proved a considerable challenge, and yields of properly functioning devices were low. Later, researchers discovered that the yield could be improved by using multiple gate layers and overlapping them [Bertram et al., 1974]. This arrangement created good coupling between gates, while reducing the chance of a short because of the insulating layer between them. Although several variations of the gate structure are in use, the most common is a three-layer configuration. Each layer is etched from a doped deposition of polysilicon. After etching, the wafer is oxidized before the next layer of polysilicon is deposited in order to insulate the layers from each other. The process for a typical CCD shift register is shown in Figure 12.
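The three-phase clocking sequence of Figure 11 can be illustrated with a toy model of a shift register. The sketch is idealized: transfer is assumed lossless, a packet held under two adjacent high gates is assumed to split equally between them, and gates are indexed so that gate i is driven by phase i mod 3. All names are illustrative.

```python
PHASES = 3
# Phases held high at each step of a one-pixel transfer
# (0 = phase 1, 1 = phase 2, 2 = phase 3).
SEQUENCE = [{0}, {0, 1}, {1}, {1, 2}, {2}, {2, 0}, {0}]

def clock_step(charge, high):
    """Redistribute charge for one clock state.

    charge[i] is the charge under gate i; `high` is the set of phases
    currently at the positive level. Returns the new distribution and
    any charge pushed off the end of the register.
    """
    result = [0.0] * len(charge)
    output = 0.0
    for i, q in enumerate(charge):
        here_high = (i % PHASES) in high
        next_high = ((i + 1) % PHASES) in high
        if here_high and next_high:
            stay, move = q / 2, q / 2     # packet spreads under both gates
        elif here_high:
            stay, move = q, 0.0           # packet held under a single gate
        else:
            stay, move = 0.0, q           # collapsing well pushes charge on
        result[i] += stay
        if i + 1 < len(charge):
            result[i + 1] += move
        else:
            output += move
    return result, output

def transfer_one_pixel(charge):
    """Run the full clock sequence, moving every packet by three gates."""
    total_output = 0.0
    for high in SEQUENCE:
        charge, out = clock_step(charge, high)
        total_output += out
    return charge, total_output

register = [100.0, 0, 0, 40.0, 0, 0, 0, 0, 0]   # two packets under phase-1 gates
shifted, out = transfer_one_pixel(register)
print(shifted, out)
```

After one full sequence each packet sits under the next phase-1 gate, three gates further along, and the two packets never mix because the barrier phases keep at least one low gate between them at every step.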
FIGURE 12. Fabrication sequence of a three-layer polysilicon process for CCD gate structures. [Panels: oxide and implanted n layer on p substrate; first polysilicon deposition; polysilicon etch (phase one gates); oxidation; second polysilicon layer (phase two gates); third polysilicon layer (phase three gates) and passivation oxide.]
RADIATION DAMAGE ON SCIENTIFIC CHARGE COUPLED DEVICES
1. Simulations

As part of our investigations, we used simulation software to model a segment of a CCD shift register. First, we used a two-dimensional process simulation package, TSUPREM4 [TMA, 1994b], to calculate the doping concentrations and create the simulation mesh for six gates of a three-phase, buried channel CCD with 15-µm pixels using the three-layer polysilicon process just described. This mesh was then fed into a two-dimensional device simulation package, MEDICI [TMA, 1994a], which iteratively solves Poisson’s equation and the charge-continuity equation to calculate the potentials and charge concentrations in the device. The results are shown in Figures 13 and 14; Figure 13 shows a set of potential contours, while Figure 14 shows the corresponding charge concentration contours.
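The idea of solving Poisson's equation iteratively can be illustrated with a much simpler one-dimensional sketch. This is not the MEDICI algorithm or its interface; it is a hypothetical Gauss-Seidel relaxation for a uniformly doped, fully depleted n layer with both ends held at zero potential, and the doping, layer width, grid size, and function names are all assumptions chosen for illustration.

```python
# Hypothetical 1-D illustration only; it does not reproduce TSUPREM4 or MEDICI.
Q = 1.602e-19                 # electronic charge (C)
EPS_SI = 11.7 * 8.854e-14     # permittivity of silicon (F/cm)


def solve_poisson_1d(n_donor_cm3, width_cm, n_points=21, sweeps=2000):
    """Gauss-Seidel solution of d2(psi)/dx2 = -q*N_d/eps, psi = 0 at both ends."""
    h = width_cm / (n_points - 1)
    source = Q * n_donor_cm3 / EPS_SI * h * h
    psi = [0.0] * n_points
    for _ in range(sweeps):
        for i in range(1, n_points - 1):
            # Finite-difference form of Poisson's equation at grid point i.
            psi[i] = 0.5 * (psi[i - 1] + psi[i + 1] + source)
    return psi


def analytic_potential(n_donor_cm3, width_cm, x_cm):
    """Closed-form parabolic potential for the same boundary conditions."""
    return Q * n_donor_cm3 / (2.0 * EPS_SI) * x_cm * (width_cm - x_cm)


WIDTH_CM = 1.0e-4      # assumed 1-micron depleted layer
DOPING_CM3 = 1.0e16    # assumed donor concentration
psi = solve_poisson_1d(DOPING_CM3, WIDTH_CM)
mid = len(psi) // 2
psi_mid_exact = analytic_potential(DOPING_CM3, WIDTH_CM, WIDTH_CM / 2.0)
print(psi[mid], psi_mid_exact)
```

A real device simulator couples this equation to the carrier continuity equations on a two-dimensional mesh; the sketch only shows the relaxation step on a fixed charge density.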
D. Charge Detection

Charge detection is the last stage of CCD operation, in which the charge packets collected in the device and transferred to the output are converted one at a time into voltage signals which can be processed by external
FIGURE 13. Potential contours for a 15-µm pixel, three-phase CCD. [Horizontal axis: distance (microns).]
FIGURE 14. Charge concentration contours for a 15-µm pixel, three-phase CCD. [Horizontal axis: distance (microns), 0 to 40.]
electronics. The usual method is to use a floating gate amplifier, shown schematically in Figure 15. The floating gate amplifier consists of two transistors: the output transistor and the reset transistor. The output transistor is connected in a source follower configuration with a load resistor R_L from source to ground. The gate of the output transistor is connected through the reset transistor to the reset drain voltage, but when the reset transistor is off, the output gate/reset source node is “floating.” The output sequence begins with the reset transistor turning on and resetting the output gate voltage to a fixed value (the reset drain voltage) in the linear region of the output transistor. The reset transistor is then turned off and the gate node is allowed to float. Then the next charge packet is transferred to the gate node through the last gate (LG) of the CCD serial register. The last gate is set to a fixed value between the high and low levels of the serial register gates so that it forms a half-height potential barrier between φ3 and the floating node. It is not clocked, in order to avoid spurious signals caused by capacitive coupling between the last gate and the floating node. Some CCDs have a special, separately clocked gate in the place of φ3 called a “summing well.” The summing well (SW) allows one to combine, or “sum,” several pixels in a row
FIGURE 15. Diagram of a CCD output circuit showing a cross section of the last few gates and a schematic representation of the output source follower amplifier.
before transferring the combined packet to the output. The summing well gate is usually larger than the regular gates in order to increase its full well capacity. When φ3 (or SW) goes low, the charge packet is spilled over the barrier onto the gate node and a voltage is induced there which is proportional to the charge and inversely proportional to the capacitance of the node. The voltage at the output source is the gate node voltage multiplied by the gain of the source follower, which is close to unity. From the output source, the signal is applied to an external preamplifier and then to the rest of the signal processing circuitry. The timing of the output sequence is shown in Figure 16. The ΔV indicated on the output waveform (OS) in the figure is the difference between the output after the reset pulse (the reset level) and the output after the charge packet has been dumped to the floating node (the signal level). The difference between these two levels is the value for that pixel. Note that the OS waveform shows the feedthrough of the reset pulse, which is a result of the parasitic gate-source capacitances of the reset and output transistors (shown in Figure 15). The capacitance of the floating node is an important parameter because it determines the sensitivity of the output circuit. The smaller the capacitance, the greater the voltage induced by a given amount of charge. The main component of the capacitance is that between the gate and channel of the output transistor. The gate-channel capacitance is proportional to the area of the gate, so it is desirable to make the output transistor as small as possible. There are also parasitic capacitances such as the gate-source capacitances mentioned above. All other parasitics can be represented by a lumped capacitor to ground. The gate-source and gate-drain capacitances can be reduced by employing a lightly doped drain (LDD) structure, which reduces the overlap between the gate and the source and drain implants [Kim, Blouke, and Heidtmann, 1990]. The sensitivity of these LDD-type output transistors is around 1 µV/e⁻. Although the single-stage floating gate amplifier is the most common, other forms of the output circuit exist, such as the nondestructive charge sensing scheme mentioned in the Introduction. Recent devices manufactured by English Electric Valve (EEV), Inc. [EEV, 1995] have two-stage output amplifiers. In this configuration, the second-stage transistor is large to provide a high level of drive capability while causing minimal loading to the first stage, which enables the first transistor to be very small, increasing the sensitivity. These amplifiers exhibit an overall output sensitivity of 4 µV/e⁻.

FIGURE 16. CCD output sequence showing the waveforms for the phase three gate (φ3), the reset gate (RG), and the output source node (OS). [Annotations on the OS waveform: reset feedthrough, reset level, signal level, and ΔV = q/C.]
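The relation ΔV = q/C and the reset-level/signal-level subtraction described above can be sketched in a few lines of Python. The function names and the example voltages are hypothetical, and the source follower gain is taken as exactly unity, which the text only states approximately.

```python
# Illustrative sketch; function names and example voltages are assumptions.
Q_E = 1.602e-19  # electronic charge (C)


def node_capacitance_farads(sensitivity_uv_per_e, follower_gain=1.0):
    """Floating-node capacitance implied by an output sensitivity in uV/e-.

    Output sensitivity = gain * q / C, so C = gain * q / sensitivity.
    """
    return Q_E * follower_gain / (sensitivity_uv_per_e * 1.0e-6)


def pixel_signal_electrons(reset_level_v, signal_level_v, sensitivity_uv_per_e):
    """Pixel value in electrons from the reset level minus the signal level."""
    delta_v = reset_level_v - signal_level_v
    return delta_v / (sensitivity_uv_per_e * 1.0e-6)


c_single_stage = node_capacitance_farads(1.0)   # about 0.16 pF at 1 uV/e-
c_two_stage = node_capacitance_farads(4.0)      # about 0.04 pF at 4 uV/e-
electrons = pixel_signal_electrons(5.000, 4.996, 4.0)
print(c_single_stage, c_two_stage, electrons)
```

Under these assumptions, the fourfold gain in sensitivity of the two-stage amplifier corresponds to a fourfold reduction in effective node capacitance.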
III. RADIATION DAMAGE
In many scientific imaging applications it is necessary to subject the detector to harmful radiative environments. Such applications include almost any space mission, x-ray crystallography, certain forms of medical imaging, and energetic particle detection. Radiation deposits its energy in silicon in various ways, some of which can result in permanent damage. At its most
benign, the radiation energy may simply be transferred to mechanical vibration of the silicon atoms and be manifested as heat. Two of the more harmful effects are of most concern in electronic devices: the first is ionization and the second is atomic displacement.

A. Ionization Damage
Charged particles, such as electrons or protons, lose most of their energy in Coulombic scattering, interacting with the silicon atoms through the electrostatic force. Because this is a long-range interaction, the dominant effect is small energy transfers to the atomic electrons [Van Lint, 1980a]. If enough energy is imparted to the electrons, they will be ejected from the host atoms, creating free electrons and positively charged ions (ionization). Photon radiation, such as x-rays or gamma rays, can cause ionization in a similar way through Compton scattering. The ionization process is very similar to the photoelectric effect discussed in Section II, and in the active silicon region it merely results in electron-hole pairs which will then diffuse and drift through the device if they do not immediately recombine. High-energy photons (> 1 MeV) may also produce electron-positron pairs, although the probability of this type of event is extremely low. Like photo-generated carriers, the holes will migrate toward the substrate or channel stops and the electrons toward the potential wells, where they will be collected as part of the signal. In nuclear particle detectors or x-ray imaging, this is precisely the desired effect used to detect the passage of high-energy particles or x-rays. In other applications the signal is spurious. A well-known phenomenon of this sort, which occurs even in ground-based CCD astronomy, is the appearance of spots or streaks in an image resulting from the passage of cosmic rays (see Part F for a description of cosmic rays). However, these spurious signals can often be removed by image processing, for example, by taking two images of the same scene and eliminating any artifacts not present in both. In any case, in the active silicon region the effect is not permanent and is of little concern. If the incident radiation causes ionization in the oxide or other insulating material, however, the effect can be permanent.
The insulating layer has a much wider bandgap than the semiconductor, thus it takes a larger amount of energy to excite electrons to the conduction band (about 18 eV per electron in SiO₂ [Van Lint, 1980c]), and the midgap trapping states are correspondingly deeper. The existence of large numbers of deep trapping centers in the oxide means that the electron-hole pairs created in the oxide layer which escape recombination can be trapped for long periods of time, essentially permanently. In MOS devices under
positive gate bias, the electrons are usually swept out of the oxide very quickly by the high electric field, but the holes tend to be trapped near the semiconductor-oxide interface [Van Lint, 1987]. The positive charge buildup due to the trapped holes alters the electric field in the device and results in a shift in the flat-band voltage. These changes to the flat-band voltage can be compensated for by simply adjusting the operating voltages. Ionizing radiation also creates trapping states at the semiconductor-oxide interface. These interface states can have several effects. If they are deep trapping states, holes or electrons can be held semipermanently at the interface, resulting in charge buildup as just described. Interestingly, the negative charge of trapped electrons can compensate for trapped holes and actually reverse the damage. Shallower interface trapping states can severely degrade the charge transfer efficiency in a surface channel CCD, so in radiative environments buried channel devices are invariably used. Ionization damage can also provide midgap levels for carriers to thermally “hop” between the valence and conduction bands, which means an increase in the dark current, but because this occurs at the surface it can be significantly alleviated by one of the techniques described in Section IV. Finally, the interface traps due to ionization damage can affect the output transistor on the CCD, manifesting themselves as increased read noise due to trapping, though once again buried channel devices are used to reduce the effects.
B. Displacement Damage

The remaining, nonionizing fraction of the energy deposited by the radiation goes into displacements. Displacement damage occurs when the incident radiation interacts directly with the atomic nucleus with enough energy to displace the atoms from their positions in the crystal lattice (Figure 17). The recoil atom from the initial collision may travel some distance through the silicon and undergo further collisions or ionization interactions, producing more recoil atoms and leaving a trail of displaced or ionized atoms in its
FIGURE 17. Displacement damage in silicon.
wake. The displaced atoms end up in interstitial positions, leaving vacancies in the lattice, and the combination is called a Frenkel pair. Displacement damage is primarily caused by heavy particles such as protons or neutrons, although electrons above a certain threshold (~180 keV) and even photons deposit a small fraction of their energy in displacements [Van Lint, 1980b]. Very high-energy particles, especially neutrons because they are not subject to Coulombic forces, may interact directly with the nucleus of the silicon atoms and create a cascade of secondary particles. The ejected secondary particles can then also cause further displacements or ionization. After the initial damage, a rearrangement of the atoms occurs through thermal motion. Most of the interstitial-vacancy pairs created by particle radiation recombine and have no permanent effect. Typically 2% of the initially generated pairs remain [Van Lint, 1987]. The vacancies which do not recombine are unstable and will migrate to more favorable positions in the lattice, often combining with other vacancies or becoming trapped near impurities because of the stress these atoms cause to the lattice. These vacancy-vacancy and impurity-vacancy complexes introduce new midgap energy levels, which have the same effects as interface states, except that they occur in the bulk silicon. They produce an increase in the bulk dark current due to thermal hopping, and they produce increases in the CTI and read noise due to charge trapping. Large localized increases in dark current, or “hot” pixels, are frequently observed, which may be due to clusters of defects. Because the permanent effects of displacement damage are not confined to the surface as in ionization damage, they are seen even in buried channel devices, and the techniques for reducing the dark current at the surface are inadequate. Therefore, in modern scientific CCDs, the displacement damage is more important than the ionization damage.

C. Bulk Trap Levels

The most important of the radiation-induced defects in the CCDs we are studying is one which introduces a bulk trapping state with an activation energy of about 0.4 eV below the conduction band. Most CCD researchers [Saks, 1977; Janesick et al., 1991; Robbins, Roy, and Watts, 1992; Holland, 1993; Hopkins, Hopkinson, and Johlander, 1994; Gendreau et al., 1995; Meidinger and Struder, 1995] have identified this trap as being due to the phosphorus-vacancy (P-V) complex (or E center) because of the high concentration of phosphorus impurities used to create the n-type buried channel of the CCD. However, other researchers attribute this energy level to a singly charged vacancy-vacancy (V-V⁻) complex (or divacancy) [Coffa et al., 1997] or a combination of the two defects [Svensson, Jagadish, and
Williams, 1993]. Benton and Kimerling [1982] give a comprehensive listing of silicon defects and energy levels in which the E center and the singly charged divacancy are listed with distinct levels of 0.44 and 0.41 eV, respectively. Table 1 summarizes the various trap levels reported. It should be noted that the energy level and cross section are very difficult to resolve separately because their effects are closely coupled. This helps explain the divergence in the values shown. Two other trap levels are commonly reported in radiation-damaged devices. These are the oxygen-vacancy (O-V) complex (A center), which has an activation energy of 0.18 eV, and the doubly charged divacancy (V-V=), which has an activation energy of 0.23 eV [Svensson, Jagadish, and Williams, 1993; Benton and Kimerling, 1982].

TABLE 1
RADIATION-INDUCED TRAP LEVELS.

Trap level        Trapping cross        Identification      Researchers
E_c − E_t (eV)    section σ_n (cm²)

0.14                                    O-V                 N. S. Saks (1977)
0.23                                    V-V=
0.41                                    V-V⁻ + unknown

0.4                                     P-V                 J. Janesick et al. (1991)

0.47              (3 ± 1) × 10⁻¹⁵       P-V                 M. S. Robbins, T. Roy, and S. J. Watts (1992)

0.12              1 × 10⁻¹⁴                                 A. Holland (1993)
0.30              1 × 10⁻¹⁴
0.42              6 × 10⁻…              P-V

0.416 ± 0.029                           P-V                 I. H. Hopkins, G. R. Hopkinson, and B. Johlander (1994)

0.36 ± 0.06                             P-V                 K. C. Gendreau et al. (1995)

0.18              1 × 10⁻¹⁴             O-V                 N. Meidinger and L. Struder (1995)

0.18              1 × 10⁻¹⁴             O-V                 J. L. Benton and L. C. Kimerling (1982)
0.23              2 × 10⁻¹⁶             V-V=
0.41              4 × 10⁻¹⁵             V-V⁻
0.44              > 1 × …               P-V

0.18                                    O-V                 B. G. Svensson, C. Jagadish, and J. S. Williams (1993)
0.23                                    V-V=
0.43                                    V-V⁻ + P-V

0.16                                    O-V                 S. Coffa et al. (1997)
0.23                                    V-V=
0.41                                    V-V⁻

D. DLTS Measurements

We performed deep level transient spectroscopy (DLTS) measurements [Kolev et al., 1997; Kolev et al., 1998] on a number of buried channel MOS transistors which had been irradiated as part of an earlier investigation into noise effects [Murowinski, Linzhuang, and Deen, 1993a]. DLTS is a method of investigating trapping states by measuring the exponential decay of the trapped charge. From the variation in the exponential time constant over a range of temperatures the trap parameters can be determined. We used a constant resistance DLTS (CR-DLTS) method in which the charge state of trapping levels is monitored through the change in the threshold voltage of a FET. For a full description of the experimental method and setup and its applications, see [Kolev et al., 1997; Kolev et al., 1998; Kolev and Deen, 1998; Kolev and Deen, 1997; Kolev, Deen, and Alberding, 1998].

The test devices were lightly doped drain (LDD) depletion-mode n-type buried channel MOSFETs that were fabricated by Tektronix, Inc. as part of their CCD development program [Kim, Blouke, and Heidtmann, 1990]. Three dies, each consisting of 15 independent transistors, were placed in 24-pin ceramic packages. Each package contained transistors with width (W) to length (L) ratios from 60 µm/10 µm to 27 µm/15 µm and LDD lengths varying from 1 to 4 µm. Two of the packages were placed in the beam of the University of Western Ontario tandem accelerator and subjected to radiation by 1 MeV protons. The irradiations were performed at room temperature and all pins were grounded.
One set of devices received 5.0 × 10⁸ protons/cm² and the other received 2.7 × 10⁹ protons/cm², as determined by a previously calibrated event-counting detector. The third set of devices was not damaged. Figure 18 shows the DLTS temperature spectra measured for three different transistors of varying proton dose. The ordinate is the change in threshold voltage measured over a fixed interval during the exponential transient, in this case 7.44 ms. Five peaks are distinguishable in the spectrum of the most damaged transistor, each corresponding to a different trapping level. An Arrhenius plot (see Section VI for an explanation) of the positions
FIGURE 18. CR-DLTS spectra of radiation-damaged buried channel MOSFETs compared with the spectrum of an undamaged device. [Horizontal axis: temperature (K), 50 to 250; peak E5 is labeled.]
of these peaks for several different intervals is shown in Figure 19. A linear least-squares fit to the data reveals the trap parameters, which are summarized in Table 2. Again we see dominant peaks at around 0.43 eV and 0.23 eV, suggesting the divacancy and the phosphorus-vacancy complexes, and at 0.17 eV, which is very likely the oxygen-vacancy complex.
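The least-squares extraction of trap parameters from an Arrhenius plot can be sketched as follows. This is not the analysis code used for the measurements; it assumes the common DLTS form e_n = σ_n γ T² exp(−E/kT), uses a placeholder value for the material constant γ, and fits synthetic data generated from the E5 parameters of Table 2 rather than measured peak positions.

```python
import math

# Sketch under stated assumptions; GAMMA is a placeholder material constant.
K_B = 8.617e-5      # Boltzmann constant (eV/K)
GAMMA = 3.25e21     # assumed prefactor (cm^-2 s^-1 K^-2)


def emission_rate(temp_k, energy_ev, sigma_cm2):
    """Assumed emission rate e_n = sigma * GAMMA * T^2 * exp(-E / kT)."""
    return sigma_cm2 * GAMMA * temp_k ** 2 * math.exp(-energy_ev / (K_B * temp_k))


def arrhenius_fit(temps_k, rates):
    """Linear least-squares fit of ln(e_n / T^2) against 1/kT.

    Returns (activation energy in eV, capture cross section in cm^2).
    """
    xs = [1.0 / (K_B * t) for t in temps_k]
    ys = [math.log(r / (t * t)) for t, r in zip(temps_k, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept) / GAMMA


# Synthetic, noise-free data generated from the E5 values listed in Table 2.
temps = [200.0, 210.0, 220.0, 230.0, 240.0]
rates = [emission_rate(t, 0.425, 1.2e-14) for t in temps]
fit_energy, fit_sigma = arrhenius_fit(temps, rates)
print(fit_energy, fit_sigma)
```

Because the cross section comes from the exponential of the intercept, small errors in the slope translate into large errors in σ_n, which illustrates why the two parameters are difficult to resolve separately.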
E. Annealing

After irradiation, the defects caused in a device can be repaired through thermal motion of the atoms in the lattice. This process is called annealing and is highly temperature-dependent. In fact, the temperature dependence of the annealing process is often used to identify the defects introduced by radiation because it is different for different defects. The divacancy levels show little annealing below 300°C, and the oxygen-vacancy complex is stable to 350°C, whereas the phosphorus-vacancy has a characteristic anneal temperature of 150°C [Benton and Kimerling, 1982]. Holland et al. [1990] have performed tests of the effect of annealing on the proton damage in CCDs and found that 85% of the detectable damage could be removed by annealing at 160°C for 16 hours. This suggests that the damage was due to the E center. Robbins, Roy, and Watts [1992] report an almost complete elimination of the trapping effect from the radiation-induced level at ~0.4 eV after annealing at 150°C, again suggesting that the
FIGURE 19. Arrhenius plot using the CR-DLTS data of the device which received 2.7 × 10⁹ protons/cm². [Horizontal axis: 1/kT (eV⁻¹), 0 to 200.]
P-V center is responsible. Robbins, Roy, and Watts [1992] also observed an increase in the trapping at a shallower level after the anneal, which was attributed to an increase in the density of O-V centers. We did not perform any experiments to investigate the effect of high-temperature annealing, but these results indicate that it could be a successful means of alleviating
TABLE 2
DLTS MEASUREMENTS OF IRRADIATED BURIED CHANNEL MOS TRANSISTORS.

Trap label    Trap level E_c − E_t (eV)    Trapping cross section σ_n (cm²)

E1            0.107                        1.7 × 10⁻¹⁵
E2            0.166                        3.5 × 10⁻¹⁴
E3            0.225                        4.8 × 10⁻¹⁵
E4            0.293                        4.2 × 10⁻¹⁵
E5            0.425                        1.2 × 10⁻¹⁴
radiation damage. It may not be practical for a spacecraft-mounted device, however. Provision must be made for a high-power onboard heater or periodic reorientation of the spacecraft to make use of solar heating, and the CCD package must be able to survive the elevated temperatures required. The FUSE FES design does include heaters to maintain the target operating temperature, but they are insufficient to raise the temperature of the CCD above about 30°C. Solar heating would be a possibility, but the maximum temperature stated for the CCD package used in the design is about 60°C. Therefore, annealing is not possible without significant design changes.

F. FUSE Radiation Environment
Spacecraft such as the FUSE satellite which operate in low earth orbits (LEOs) are subject to four major sources of radiation [Barth and Stassinopoulos, awaiting publ.]: heavy ions trapped in the magnetosphere, protons and electrons trapped in the Van Allen belts, cosmic ray protons and heavy ions, and protons and heavy ions from solar flares. Cosmic ray particles originate outside the solar system and include ions of all elements from atomic number 1 to 92 with energies from around 10 MeV to hundreds of GeV, which makes them difficult to shield against. Unlike most radiation originating in space, they are able to penetrate the earth’s magnetic fields and affect devices on the ground. The heavy, highly energetic particles produce intense ionization as they pass through matter; however, the flux level of these particles is low even for LEO, so although they are a concern in terms of the spurious signals they generate, we will not consider them in estimating the permanent damage. The heavy ions trapped in the magnetosphere are largely of such low energy that they are not able to penetrate a spacecraft to affect the electronics, and the trapped electrons cause only small amounts of damage, so neither is of much concern. The trapped protons, however, along with the solar flares, are very difficult to shield against and can be a significant source of damage. The protons in the Van Allen belts vary in energy from keV to hundreds of MeV and in intensity from 1 to 1 × 10⁵ protons/cm²/s. The actual populations depend on the altitude and inclination of the orbit, the cyclic activity of the sun, geomagnetic storm perturbations, and the gradual change in the earth’s magnetic field. The solar flare activity is random and cannot be predicted with certainty, although several probabilistic models exist.
We have made estimates of the expected damage to the FES CCD due to energetic protons encountered by the device in orbit. The radiation environment was taken from the calculations by Stassinopoulos and Barth [1991] for a 700 km, circular, 28-degree inclination orbit and scaled for the baseline FUSE orbit (800 km, circular, 25 degrees). Figure 20 shows the expected total proton flux for the three-year mission as a function of proton energy. The flux is shown for three different shield cases: 5-mm aluminum, 0.4-mm aluminum, and no shield. Also shown is the spectrum of unattenuated solar flare protons for four anomalously large solar flares, the number recommended in Stassinopoulos and Barth [1991]. It should be noted, however, that at the inclination of the FUSE orbit (<45 degrees), all of these solar flare protons will be stopped by the geomagnetic field, thus are of no concern [Barth and Stassinopoulos, awaiting publ.]. Once the total proton flux is known, it is necessary to take into account the varying penetration depth of protons of different energies in order to predict the effective damage to a device. The most harmful protons are those absorbed in the active region of the silicon, which is usually a layer about 2 microns thick. We took the spectra from Figure 20 and calculated the number of silicon lattice displacements expected in the active region of the
FIGURE 20. Total proton flux over a three-year mission. The lowest curve is for 5 mm Al shield. [Curves: 5 mm Al, 0.4 mm Al, no shield, flares.]
FIGURE 21. Total displacements over a three-year mission. The lowest curve is for 5 mm Al shield. [Curves: 5 mm Al, 0.4 mm Al, no shield, flares. Horizontal axis: energy (MeV), logarithmic.]
CCD. This was based on data generated by Janesick et al. [1991] using TRIM [Ziegler, Biersack, and Littmark, 1985] software, who assumed proton incidence on the frontside with an oxide layer of 2 µm and a polysilicon (gate) layer of 0.2 µm above the active silicon. The number of displacements as a function of energy is shown in Figure 21. From this figure, it is clear that the most damaging protons are those at around 250 keV. Those with lower energies cause no significant damage because they do not penetrate to the active region of the device, and those with much higher energies pass right through. The displacement spectra were then integrated to find the total number of active region displacements expected over the FUSE mission. Since the solar flare protons will be completely attenuated as mentioned above, they were not included in the estimates. The results are given in Table 3. These results indicate that shielding with aluminum has a significant effect and should be employed to minimize the damage caused by radiation.

IV. DARK CURRENT

A. Theory
The dark current in a CCD is the charge generated in the imaging area which is not caused by the photoelectric effect. It will accumulate in the
TABLE 3
INTEGRATED DISPLACEMENTS FOR A THREE-YEAR MISSION.

Shielding      Displacements/cm²

5 mm Al        1.32 × 10⁸
0.4 mm Al      5.54 × 10⁸
no shield      3.59 × 10⁹
potential wells under the pixels even when no light is incident on the device, hence the name. The dark current is a problem in scientific imaging for two reasons. The first is that it introduces an unavoidable background signal which builds up in the potential wells during an exposure until the device reaches full well and saturates the image. This limits the exposure length which can be used, a factor particularly important to astronomical applications where exposures can last up to several hours in order to detect very faint objects. The second problem with dark current is that because it is a random process, it introduces noise. There are three sources of dark current:
1. thermal generation in the depletion region due to bulk traps,
2. thermal generation at the silicon-silicon dioxide interface due to interface states, and
3. diffusion current at the edge of the depletion region.

1. Bulk Generation
Thermal generation of carriers in semiconductor materials is the result of electrons possessing enough thermal energy to be excited from the valence band into the conduction band. According to Fermi-Dirac statistics, the probability of an electron having enough energy to do this at thermal equilibrium is [Kim, 1979a]

$$ f(E_c) = \frac{1}{1 + e^{(E_c - E_F)/kT}} \qquad (3) $$

where $E_c$ is the energy of electrons in the conduction band and $E_F$ is the Fermi energy. At normal temperatures, very few electrons will have the required energy to make this transition directly. However, the process is assisted by the presence of energy states in the bandgap (or traps). When these states are present, electrons can be excited to the conduction band in
two steps, first to the midgap state and then to the conduction band. Each of these transitions requires less energy than the direct one, thus is more likely to occur. The four transitions to be considered for a trap with level $E_t$ are shown in Figure 22: (a) an electron jumps from the conduction band to the trap level (electron capture), (b) an electron jumps from the trap level to the conduction band (electron emission), (c) an electron jumps from the trap level to the valence band (hole capture), and (d) an electron jumps from the valence band to the trap level (hole emission). The rate at which process (a) occurs depends on the density of electrons in the conduction band $n$ and the number of empty traps. It is given by

$$ r_a = \sigma_n v_{th}\, n\, N_t (1 - f) \qquad (4) $$

where $\sigma_n$ is the electron capture cross section, $v_{th}$ is the thermal velocity, $n$ is the free electron concentration, $N_t$ is the density of traps, and $f$ is the probability that a trap is occupied. The rate of process (b) is proportional to the number of filled traps:

$$ r_b = e_n N_t f \qquad (5) $$

Here the proportionality constant is the electron emission probability $e_n$:

$$ e_n = \sigma_n v_{th}\, n_i\, e^{(E_t - E_i)/kT} \qquad (6) $$

where $n_i$ is the intrinsic carrier concentration and $E_i$ is the intrinsic Fermi energy. Similar expressions can be written for the hole processes (c) and (d). The four processes will drive the fraction of filled traps $f$ to a steady-state point where the number of electrons entering the trapping states is equal to the number of electrons leaving, i.e.,

$$ r_a + r_d - r_b - r_c = 0 \qquad (7) $$

Using Equation (7) we can combine the above expressions to determine $f$ at this point, hence determine the net rate of carrier generation $G$:

$$ f = \frac{\sigma_n n + \sigma_p n_i e^{(E_i - E_t)/kT}}{\sigma_n \left[ n + n_i e^{(E_t - E_i)/kT} \right] + \sigma_p \left[ p + n_i e^{(E_i - E_t)/kT} \right]} \qquad (8) $$

$$ G = \frac{\sigma_n \sigma_p v_{th} N_t \left( n_i^2 - np \right)}{\sigma_n \left[ n + n_i e^{(E_t - E_i)/kT} \right] + \sigma_p \left[ p + n_i e^{(E_i - E_t)/kT} \right]} \qquad (9) $$
FIGURE 22. Generation and recombination of carriers through trap levels. (a) Electron capture. (b) Electron emission. (c) Hole capture. (d) Hole emission.
where $p$ is the density of free holes and $\sigma_p$ is the hole capture cross section. When a semiconductor is in thermal equilibrium, $n_i^2 = np$ and the net generation rate is zero. For a device in deep depletion, such as a CCD pixel during exposure, $n$ and $p$ are much less than $n_i$ in the depletion region and the net generation rate becomes

$$ G = \frac{n_i}{2\tau_0} \qquad (10) $$

where $\tau_0$ is called the “effective lifetime within a depletion region” and is given by

$$ \tau_0 = \frac{\sigma_n e^{(E_t - E_i)/kT} + \sigma_p e^{(E_i - E_t)/kT}}{2 \sigma_n \sigma_p v_{th} N_t} \qquad (11) $$
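The two limiting cases described above can be checked numerically with a short sketch of the standard Shockley-Read-Hall generation rate. The trap density, cross sections, thermal velocity, and carrier concentrations below are assumed round numbers for illustration, not values from the devices studied here.

```python
import math

# Sketch of the standard Shockley-Read-Hall rate; all numbers are assumptions.
K_T = 0.02585  # thermal energy kT at about 300 K (eV)


def srh_generation(n, p, n_i, sigma_n, sigma_p, v_th, n_t, et_minus_ei):
    """Net generation rate G (cm^-3 s^-1) through a single trap level."""
    a = math.exp(et_minus_ei / K_T)
    denom = sigma_n * (n + n_i * a) + sigma_p * (p + n_i / a)
    return sigma_n * sigma_p * v_th * n_t * (n_i ** 2 - n * p) / denom


def effective_lifetime(sigma_n, sigma_p, v_th, n_t, et_minus_ei):
    """Effective lifetime tau_0 (s) within a depletion region."""
    a = math.exp(et_minus_ei / K_T)
    return (sigma_n * a + sigma_p / a) / (2.0 * sigma_n * sigma_p * v_th * n_t)


N_I = 1.0e10                               # assumed intrinsic concentration (cm^-3)
TRAP = (1.0e-15, 1.0e-15, 1.0e7, 1.0e10)   # sigma_n, sigma_p, v_th, N_t (assumed)

# Deep depletion: n and p negligible, so G should reduce to n_i / (2 * tau_0).
g_depleted = srh_generation(0.0, 0.0, N_I, *TRAP, 0.0)
tau_0 = effective_lifetime(*TRAP, 0.0)

# Thermal equilibrium: n * p = n_i^2, so the net generation should vanish.
n_eq = 1.0e15
g_equilibrium = srh_generation(n_eq, N_I ** 2 / n_eq, N_I, *TRAP, 0.0)
print(g_depleted, N_I / (2.0 * tau_0), g_equilibrium)
```

Moving the trap level away from midgap in either direction (a nonzero last argument) lengthens the effective lifetime, which is why midgap states dominate the generation.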
The bulk dark current density (in nA/cm²) can now be written

$$ J_b = \frac{q\, n_i\, x_{dep}}{2\tau_0} \qquad (12) $$

where $x_{dep}$ is the width of the depletion region. The dark current in electrons per pixel can be found by multiplying Equation (12) by the pixel area $A_p$ and dividing by the electronic charge $q$:

$$ J_b = \frac{A_p\, n_i\, x_{dep}}{2\tau_0} \qquad (13) $$

For states near the midgap, which dominate the bulk dark current generation because of the exponential dependence of the generation rate on the trap energy, $\tau_0$ is nearly constant, and the primary temperature dependence of the dark current comes from the intrinsic carrier concentration $n_i$, which is given by

$$ n_i = \sqrt{N_c N_v}\; e^{-E_g/2kT} \qquad (14) $$

where $N_c$ and $N_v$ are the effective densities of states in the conduction and valence bands and $E_g$ is the bandgap energy.
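The strong temperature dependence that the intrinsic carrier concentration imposes on the bulk dark current can be illustrated with a short sketch. The silicon constants (densities of states at 300 K scaled as T^1.5 and a fixed 1.12 eV bandgap) are textbook approximations, and the lifetime, depletion width, and pixel size are assumed values, so the numbers are indicative only.

```python
import math

# Sketch with approximate textbook silicon constants; all inputs are assumptions.
K_B = 8.617e-5  # Boltzmann constant (eV/K)


def intrinsic_concentration(temp_k, e_gap_ev=1.12, n_c300=2.8e19, n_v300=1.04e19):
    """n_i = sqrt(Nc * Nv) * exp(-Eg / 2kT), with Nc and Nv scaled as T^1.5."""
    scale = (temp_k / 300.0) ** 1.5
    n_c = n_c300 * scale
    n_v = n_v300 * scale
    return math.sqrt(n_c * n_v) * math.exp(-e_gap_ev / (2.0 * K_B * temp_k))


def bulk_dark_electrons_per_s(temp_k, tau0_s, x_dep_cm, pixel_area_cm2):
    """Bulk dark signal per pixel, A_p * n_i * x_dep / (2 * tau_0), in e-/s."""
    return pixel_area_cm2 * intrinsic_concentration(temp_k) * x_dep_cm / (2.0 * tau0_s)


PIXEL_AREA = (15.0e-4) ** 2   # 15-micron square pixel (cm^2)
X_DEP = 2.0e-4                # assumed depletion width (cm)
TAU_0 = 1.0e-2                # assumed effective lifetime (s)

n_i_300 = intrinsic_concentration(300.0)
warm = bulk_dark_electrons_per_s(300.0, TAU_0, X_DEP, PIXEL_AREA)
cold = bulk_dark_electrons_per_s(200.0, TAU_0, X_DEP, PIXEL_AREA)
print(n_i_300, warm, cold, warm / cold)
```

With the lifetime held constant, cooling from 300 K to 200 K reduces the bulk dark signal by several orders of magnitude in this sketch, which is the behavior the exponential term predicts.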
2. Surface Generation

At the interface between the silicon and the silicon dioxide, the periodic lattice structure of the crystalline silicon is disrupted, which results in a high density of midgap states. These interface, or surface, states cause a high rate of thermal generation at the surface [Deen and Quon, 1991]. The process is
identical to bulk generation except that instead of isolated trapping states, there is a continuum of energy levels distributed across the bandgap. Instead of a single trap concentration we now have an energy-dependent trapping state density D ( E J and must integrate the generation rate over energy to get the net generation rate G:
Again, because the generation rate is exponentially dependent on the activation energy $E_t$, the dominant generation centers are those within a few $kT$ of the midgap energy $E_i$. The distribution of states $D(E_t)$ has been found to be fairly uniform across the bandgap, except for significant peaks near the conduction and valence band edges where the contribution to thermal generation is extremely low. Therefore, for calculation purposes, it is reasonable to use a uniform distribution, $D(E_t) = D_t$. If we make the further simplification of replacing $\sigma_p$ and $\sigma_n$ with their geometric mean $\sigma = \sqrt{\sigma_n \sigma_p}$ [Burke and Gajar, 1991], the integral in Equation (15) can be evaluated analytically to give
$$G = \frac{\pi}{2} \sigma v_{th} D_t kT n_i \qquad (16)$$
and we can write the surface dark current $J_s$, in nA/cm², as

$$J_s = q n_i s_0 \qquad (17)$$

where $s_0$ is the “surface generation velocity” and is given by

$$s_0 = \frac{\pi}{2} \sigma v_{th} D_t kT \qquad (18)$$
The dark current in electrons per pixel can be found by multiplying Equation (17) by the pixel area $A_p$ and dividing by the electronic charge $q$:

$$\frac{J_s A_p}{q} = A_p n_i s_0 \qquad (19)$$
As with the bulk dark current, which is dominated by midgap states, the main temperature dependence of the surface dark current comes from the exponential term in $n_i$.
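The factor $\pi/2$ in the surface generation rate can be checked numerically. With a uniform state density and equal cross sections, the energy integral reduces to $\int dE / (e^{x} + e^{-x})$ with $x = (E - E_i)/kT$, which should approach $(\pi/2)kT$ when the band edges lie many $kT$ from midgap. The sketch below assumes a half-gap of 0.56 eV and uses a simple midpoint rule; the function name and step count are illustrative choices.

```python
import math

K_B = 8.617333e-5  # Boltzmann constant, eV/K

def surface_integral(temp_k, half_gap_ev=0.56, steps=20000):
    """Midpoint-rule value of the integral of dE / (exp(x) + exp(-x)),
    x = (E - E_i)/kT, taken across the bandgap. Result is in eV."""
    kt = K_B * temp_k
    de = 2.0 * half_gap_ev / steps
    total = 0.0
    for i in range(steps):
        x = (-half_gap_ev + (i + 0.5) * de) / kt
        total += de / (math.exp(x) + math.exp(-x))
    return total

for temp in (180.0, 240.0, 300.0):
    numeric = surface_integral(temp)
    closed = 0.5 * math.pi * K_B * temp
    print(f"T = {temp:.0f} K  numeric = {numeric:.6e} eV  (pi/2)kT = {closed:.6e} eV")
```

The agreement also shows why the peaks in the state density near the band edges can be ignored: the integrand is negligible more than a few $kT$ from midgap.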
3. Surface Dark Current Suppression

Because of the high density of states at the silicon-oxide interface, the surface generation forms the largest component of the dark current (up to 98%) in most CCDs. Fortunately, in buried channel devices, it is possible to significantly reduce or eliminate the surface dark current by running the
clocks in an inverted mode. If the gates of the device are biased sufficiently negative, an inversion layer of holes will be created at the surface. These free holes will fill the interface states and prevent them from capturing electrons from the valence band, thus halting the carrier generation process. Therefore, while the surface is inverted, the surface component of the dark current is essentially zero. To take advantage of this phenomenon, it is necessary only to set the low level of the transfer clocks to a sufficiently negative potential. However, for a three-phase device, one of the phases must remain high during an exposure in order to create a potential well for collecting the signal charge; the surface under this phase will still produce dark current. There are two methods for dealing with this difficulty: placing an extra implant under one of the phases so that all the phases can be inverted at the same time, or using a dithered clocking scheme.

The first method involves performing an extra implantation step during the CCD fabrication process. The usual method for an n-type buried channel device is to add a light boron implant under one of the three phases. The negatively charged impurity atoms lower the potential under that phase, even in inversion, and allow all three phases to be inverted while maintaining a potential barrier between pixels so that the charge packets are still confined (Figure 23). Devices with this implant are referred to as multiple pinned phase (MPP) devices, since all the phases can be inverted, or “pinned,” at the same time. Because of the dramatic difference that surface dark current suppression can make, many CCD manufacturers offer MPP versions of their devices.

For devices without an MPP implant, it is still possible to suppress the surface dark current under all the phases of a device using a dithered clocking technique. It has been shown [Burke and Gajar, 1991] that when
FIGURE 23. Charge confinement in an MPP device.
the surface is switched from inversion to depletion, the surface generation rate remains low for a characteristic period of time before recovering to its steady-state value. Thus if a phase is only allowed to remain in depletion for an amount of time shorter than the recovery time before it is inverted again, the surface dark current will be almost completely suppressed. To do this during long exposures, it is necessary to rapidly switch the charge collecting potential well back and forth between two of the phases, with each of the two phases being inverted while it is not collecting charge. If the switching period is kept short enough, this will effectively suppress the dark current through the entire period.
4. Diffusion Current

The third source of dark current, the diffusion current [Jaggi and Deen, 1988], is caused by the minority carrier gradient at the edge of the depletion region. At the depletion edge, the concentration of minority carriers is zero and increases exponentially into the substrate. For a p-type substrate the minority carriers are electrons, so there is an electron diffusion current flowing into the depletion region from the substrate. This current is proportional to $n_i^2$ (as compared to simply $n_i$ for thermal generation), and at the usual CCD operating temperatures (
FIGURE 24. Radiation damage of the CCDs used in our experiments. [Diagram: the imaging area is divided into sections A (undamaged), B, and C, with proton fluences of $1.5 \times 10^9$ and $6.0 \times 10^9$ protons/cm² marked; the serial register and output amplifier are also shown.]
to those described in Section III for the FUSE radiation environment, we estimated that this would result in the two damaged sections (C and B) containing about $9.0 \times 10^8$ and $2.3 \times 10^8$ displacements/cm², respectively. The undamaged section A was used as a control region. In all of the experiments reported here, which were conducted roughly two years after the irradiations were performed, the CCDs were always read out through the undamaged output amplifier in section A.

The experimental setup is shown in Figure 25. The CCD is mounted in a liquid nitrogen dewar with a quartz window. A low-noise amplifier is located a short distance from the dewar to provide some gain (5 to 10) and to buffer the CCD output. An electronic chassis is located within one meter of the dewar, containing the clock drivers and bias supplies for the CCD. It also houses the analog signal processing chain, which performs a correlated double sample with the dual slope method [Murowinski, Linzhuang, and Deen, 1993a] prior to passing the signal to the 16-bit analog-to-digital converter. The electronic chassis is under the command of, and passes its data to, the data acquisition processor. Raw data are saved as disk files in the FITS (Flexible Image Transport System) standard [Wells, Greisen, and Harten, 1981] and are subsequently analyzed.

The temperature during the experiments is regulated with the help of a calibrated silicon diode mounted in the cold block contacting the CCD. The diode voltage drop at a constant current is used as input to a linear temperature regulator capable of depositing up to about 10 watts of heat at
FIGURE 25. Experimental setup for CCD measurements. [Block diagram: preamplifier, electronic chassis (clock drivers, bias supplies), temperature controller, and host PC.]
the CCD mount. The thermal connection between the CCD mount and the 77 K station in the dewar contains a variable resistance link to allow us to further extend our experimental temperature range. A platinum resistance temperature detector (RTD) is clipped directly to the CCD package and serves to indicate the temperature of the device.

The TK512 has an implant allowing it to be run in an MPP mode, so we measured the dark current with the device running in this mode as well as in a normal inverted mode (two phases inverted). Figure 26 shows the results over the range 180-280 K. Above around 270 K, the dark current was so high that the device would saturate during the readout period. Below 180 K, the dark current was too small to be measured without extremely long exposures. The figure shows six sets of data. The upper three sets are the data obtained while running the device in non-MPP mode, with one set for each of the three sections of the CCD: unirradiated, low radiation, and high radiation. The lower three data sets were obtained with the device in MPP mode. The data clearly show an increase in the dark current with increased radiation.

We used the models given in the preceding sections to fit theoretical curves to the data. The dashed lines represent the surface generation model from Equation (19), which was fit to the non-MPP data because these data should be dominated by the surface dark current. We found that our initial data did not match the expected temperature dependence. Since the theoretical dependence has been so well established by numerous CCD researchers, we assumed that our temperature readings must be wrong. We postulated that there was a temperature gradient between the RTD and the CCD due to an imperfect thermal connection such that the RTD reading deviated from the actual device temperature by an increasing amount as the CCD was cooled. We were able to correct for this by applying a linear transformation to our temperature data. A good fit to the theoretical dependence was found by adjusting the RTD readings by 7% of the difference from room temperature. The data shown in Figure 26 have been adjusted by this factor.

For the unirradiated section of the device, the plotted model in Figure 26 uses $D_t = 1.6 \times 10^8$ cm$^{-2}$ eV$^{-1}$ and $\sigma = 5 \times 10^{-16}$ cm². In the high-radiation case we evaluated the model with $D_t = 4 \times 10^9$ cm$^{-2}$ eV$^{-1}$ and $\sigma = 5 \times 10^{-16}$ cm², and for the low-radiation case we used exactly one-quarter of this $D_t$ value. Since we have no way of independently determining the $\sigma$ for the surface states, there exists an ambiguity in the $D_t$ and $\sigma$ values. For our results we simply chose $\sigma$ to be representative of typical values and found the corresponding $D_t$ which provided the best fit to the data. The dark current for a CCD is usually quoted as the room temperature rate in
FIGURE 26. Dark current as a function of temperature. The symbols represent measurements in the three sections of the device. The solid lines represent the bulk generation model and match the MPP mode data. The dashed lines represent the surface generation model and match the non-MPP mode data. [Plot: dark current versus temperature (K).]
nA/cm². Our measurements here correspond to room-temperature currents of 0.15, 0.9, and 3.6 nA/cm² for the unirradiated, low-radiation, and high-radiation sections of the device.

The solid lines represent the bulk generation model of Equation (13). We fit this model to the MPP mode data because these data should have no surface component due to the total inversion of the surface in MPP mode operation. The displayed curves were calculated using a depletion depth $x_d$ of 4.3 μm (based on device simulation) and assuming a trap at the midgap with $E_t = 0.55$ eV below the conduction band and $\sigma_n = \sigma_p = 5 \times 10^{-16}$ cm². For the unirradiated case we used a trap concentration $N_t = 4 \times 10^9$ cm$^{-3}$, for the low-radiation case $N_t = 1 \times 10^{10}$ cm$^{-3}$, and for the high-radiation case $N_t = 4 \times 10^{10}$ cm$^{-3}$. However, as in the surface current case above, there is some ambiguity here between the trap concentration and the trapping cross section, and the $\sigma$ values were simply chosen as typical. These results correspond to room-temperature rates of 0.031, 0.076, and 0.31 nA/cm² for the unirradiated, low-radiation, and high-radiation sections of the device.

We see from these results that the damage is directly proportional to proton fluence. The density of radiation-induced traps in the high-radiation section, for both surface states and bulk states, is exactly four times greater than in the low-radiation section. Interestingly, the bulk dark current and surface dark current show an identical dependence on temperature because they are both dominated by trapping levels near the midgap. We originally expected the bulk dark current to show an activation energy of around 0.4 eV because this level (identified as the E center) was the dominant level in both our CTI measurements (see Section V) and our DLTS measurements (Section III), as well as being the most widely reported trapping level in radiation-damaged CCDs.
To our knowledge, a bulk trap level of 0.55 eV has not been reported before by CCD researchers. However, the result is not surprising. If a 0.55 eV level exists, because it is so close to the midgap it would dominate the dark current generation due to the exponential dependence of thermal generation on the activation energy of the trap. A 0.55 eV trap will produce roughly 160 times as much dark current as a 0.4 eV trap for the same trap concentration, so it would be impossible to see the effects of the 0.4 eV trap on the dark current. The 0.55 eV level would not have shown up in our CTI or DLTS measurements because it is too deep. Also, as Saks [1977] points out, the traps responsible for the dark current generation may be located in the depleted portion of the p-type substrate, whereas to contribute to the CTI, they must be in the n-type buried channel. The substrate does not contain the high concentration of phosphorus impurities which go into creating the E center, thus entirely different trapping states may dominate.
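The factor of roughly 160 can be reproduced with a short calculation. The sketch below assumes equal electron and hole cross sections, so that the generation rate of a single trap level scales as $1/\cosh[(E_t - E_i)/kT]$. It also assumes, as the text does, that 0.55 eV below the conduction band is the midgap, and it takes room temperature to be 300 K. The function name is hypothetical.

```python
import math

K_B = 8.617333e-5  # Boltzmann constant, eV/K

def relative_generation(depth_ev, temp_k, midgap_ev=0.55):
    """Generation rate of a trap `depth_ev` below the conduction band,
    relative to a trap exactly at midgap, for equal capture cross
    sections: 1 / cosh((E_t - E_i) / kT)."""
    x = (midgap_ev - depth_ev) / (K_B * temp_k)
    return 1.0 / math.cosh(x)

ratio = relative_generation(0.55, 300.0) / relative_generation(0.40, 300.0)
print(f"midgap / 0.4 eV generation ratio at 300 K: {ratio:.0f}")
```

The ratio grows rapidly on cooling, so at CCD operating temperatures a shallower level contributes even less to the dark current than this room-temperature figure suggests.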
1. Noise
Thermal generation is governed by Poisson statistics in which the variance is equal to the mean, so the inherent noise in the process (the standard deviation) is proportional to the square root of the mean level. However, in CCD images there is additional noise because the mean generation rate varies from pixel to pixel, and the total noise shows a linear dependence on the global mean level. This additional variation seems to be relatively stable over time, and thus can be measured and systematically removed.

Figure 27 shows measurements of the dark current noise in the high-radiation section of the TK512. The data indicated by crosses were obtained with the device operated in MPP mode; those data are dominated by bulk dark current. The data indicated by diamonds were measured in non-MPP mode; those data are dominated by surface dark current. In Figure 27, the upper two sets of data represent the raw dark current noise, which follows a linear dependence on the dark current level. The solid lines represent linear least-squares fits to the data. The upper (MPP) line has a slope of 0.32 and the lower (non-MPP) line has a slope of 0.048. The noise deviates from linearity at the high end, which is probably due to the saturation of the signal. The full well for this device is around 200,000 e⁻, and as the mean
FIGURE 27. Dark current noise, plotted against the mean dark charge (e⁻). The upper two sets include spatial noise; the lower two sets have the spatial noise subtracted out. The lines are least-squares fitted curves.
approaches this level the noise (standard deviation) will be lower than it should be because the upper tail of the distribution will be cut off.

The lower two sets of data represent the noise with the spatial noise component subtracted out. To obtain these data we took two images at each dark current level and subtracted one from the other. The fixed pattern noise was thus removed. The residual noise, adjusted for the noise increase from the subtraction, displays the expected square root dependence in the non-MPP case, as indicated by the lower dotted line. In the MPP case the residual noise still follows a linear curve with a slope of 0.008. It is interesting that the MPP mode (bulk) noise is greater than the non-MPP mode (surface) noise for a given dark current level, both before and after bias subtraction. From the bias-subtracted data, it appears that the noise associated with bulk generation exceeds Poisson statistics. However, for a given exposure length the mean MPP dark current level is so much lower that the resulting dark current noise is still less than for the non-MPP dark current.

We conclude that an effective means of reducing the noise in an image due to the radiation-induced dark current (as well as eliminating the bias introduced by this dark current) is to subtract a dark frame. A “dark frame” is an exposure of equal length at the same temperature with no light incident on the device. Subtracting the dark frame will eliminate the spatially fixed noise caused by the dark current, leaving a much lower residual noise. In the surface dark current case, the residual noise is at the fundamental limit imposed by the Poisson statistics which govern the generation process.

We tried to improve the noise reduction of the dark frame subtraction by creating a “super dark frame,” an average of several long dark exposures. This super dark frame was then scaled to the length of each exposure and subtracted.
We found that this did not improve the noise performance; the best noise removal was obtained when the dark frame had the same length of exposure as the image and was taken at about the same time. This may be due to small fluctuations in the temperature of the device over time as a result of imperfect temperature regulation.
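The effect of dark frame subtraction on the two noise components can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of the measured device: each pixel is given its own fixed mean rate (a 5% pixel-to-pixel spread is an arbitrary illustrative value), shot noise is approximated by a Gaussian with variance equal to the mean, and all names are hypothetical. It shows the fixed pattern cancelling in the difference of two frames, leaving a residual close to the square root of the mean once the factor of $\sqrt{2}$ from the subtraction is removed.

```python
import random
import statistics

def simulate_frame(pixel_means, rng):
    """One dark exposure: shot noise on top of each pixel's own mean.
    A Gaussian with variance equal to the mean stands in for Poisson
    statistics, which is a good approximation at these signal levels."""
    return [rng.gauss(m, m ** 0.5) for m in pixel_means]

rng = random.Random(1)
mean_level = 10000.0     # global mean dark charge, e- (illustrative)
fixed_pattern = 0.05     # fractional pixel-to-pixel rate variation (illustrative)
n_pixels = 20000

# Per-pixel mean dark charge; this pattern is the same in every frame.
pixel_means = [mean_level * (1.0 + fixed_pattern * rng.gauss(0.0, 1.0))
               for _ in range(n_pixels)]

image = simulate_frame(pixel_means, rng)
dark = simulate_frame(pixel_means, rng)

raw_noise = statistics.pstdev(image)
difference = [a - b for a, b in zip(image, dark)]
# Subtracting two frames doubles the shot-noise variance, so divide by sqrt(2).
residual_noise = statistics.pstdev(difference) / 2 ** 0.5

print(f"raw noise      : {raw_noise:.1f} e-")
print(f"residual noise : {residual_noise:.1f} e-")
print(f"sqrt(mean)     : {mean_level ** 0.5:.1f} e-")
```

Because the fixed pattern term scales with the mean while the shot noise scales with its square root, the raw noise in this sketch grows linearly with the dark level at high signal, which is the behavior described for the upper data sets in Figure 27.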
V. CHARGE TRANSFER EFFICIENCY
The charge transfer efficiency (CTE) of a CCD is the fraction of the charge in a given packet which is successfully clocked past the series of gates from one pixel to the next. This factor is usually represented as a single fraction, typically 0.999990 for a modern CCD. Because the CTE is so high for most CCDs, it is more conveniently expressed in terms of the charge transfer inefficiency (CTI = 1 - CTE = $1 \times 10^{-5}$). The use of such a number implies
that the total signal loss is simply proportional to the amount of charge in a packet and the clocked distance. However, these and other experiments [Murowinski, Deen, and Hardy, 1995; Hopkins, Hopkinson, and Johlander, 1994; Murowinski, Linzhuang, and Deen, 1993b; Hardy, Murowinski, and Deen, 1997; Mohsen and Tompsett, 1974; Zayer et al., 1993] show that the CTI is dependent on a number of other factors, including temperature, clocking speed, and even the history of charge packets clocked through a given pixel.

The charge transfer efficiency is an important parameter in scientific CCDs, especially for the demanding photometric applications of optical astronomy. Astronomers are frequently interested in the precise brightness of a given object, and if an indeterminate amount of signal is lost, that brightness cannot be estimated accurately from the CCD image. If the CTE is poor, some faint objects may even be completely obliterated. Low CTE also has the effect of smearing the image, which will affect the CCD’s usefulness in assessing object morphology (shape) or position. The latter is of importance to our work for the FUSE mission, since the CCD is to be used to track the position of stars in order to maintain the pointing of the main telescope. If the centroid of a star image deviates in unpredictable ways by even very small amounts (less than one-tenth of a pixel), the performance of the Fine Error Sensor will be compromised.

Figure 28 shows an image taken by the damaged TK512 CCD we used in our experiments (see Section IV) in which the smearing effect can be clearly seen. The point images in the most damaged section (section C) have been smeared into streaks, especially those in the upper right corner, which have undergone the most transfers (the readout amplifier is at the lower left corner).
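To give a sense of scale for the single-number picture of CTI discussed at the start of this section, the following sketch compounds a constant per-transfer loss over many pixel transfers. It assumes the loss is a fixed fraction of the packet at every transfer, which is exactly the simplification that the experiments cited above call into question. The CTI value is the one implied by a CTE of 0.999990, and the function name is hypothetical.

```python
def fraction_remaining(cti, n_transfers):
    """Fraction of a packet left after n pixel transfers, assuming a
    constant per-transfer loss (the simple picture that a single CTI
    number implies)."""
    return (1.0 - cti) ** n_transfers

cti = 1.0e-5
for n in (1, 512, 1024):
    kept = fraction_remaining(cti, n)
    print(f"{n:5d} transfers: {kept:.6f} kept, {1.0 - kept:.6f} lost")
```

Even at this CTI, a packet crossing a 512-row device loses about half a percent of its charge, which is already significant for precise photometry.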
A. Simple Physical Model

All three of the charge transfer mechanisms mentioned in Section II, and thus the CTI, are sensitive to temperature. Thermal velocity decreases with temperature, slowing charge transfer due to thermal diffusion. On the other hand, the carrier mobility will increase as the temperature drops, resulting in improved effectiveness of fringe-field and self-induced field drift. In addition, the efficiency of transfer due to the self-induced field is dependent on the size of the charge packet. However, the time constants of all these processes are on the order of a few nanoseconds, and at the clock rates used for astronomical CCDs (~1 μs per transfer) the effects on the CTI are negligible. In such slow-scan applications, two mechanisms which increase CTI through charge deferral are more important: potential pockets and bulk states or traps in the silicon. Potential pockets are irregularities in the
FIGURE 28. Effect of poor charge transfer efficiency. This CCD image shows severe CTE problems in the radiation-damaged section of the CCD on the right side.
potential well shape in which signal charge can be caught during transfer. In well-designed buried channel CCDs these pockets are minimized or eliminated, and trapping by bulk states is the more important charge-deferral mechanism. Trapping is certainly the most important effect for any CCD which has suffered bulk damage from energetic particles.

A trapping state captures charges from a packet and emits them at a later time, which may be after the charge packet has been transferred to the next pixel. The charge which is “deferred” in this way is lost from the original packet and is thus a source of CTI. The following analysis, after Kim [1979] and Deen et al. [1995], can serve as the basis of a charge deferral model resulting from bulk states. A similar analysis is given in [Dale et al., 1993].
The capture and emission time constants $\tau_c$ and $\tau_e$ for the bulk states are

$$\tau_c = \frac{1}{\sigma_n v_{th} n} \qquad (20)$$

$$\tau_e = \frac{e^{E_t/kT}}{\sigma_n v_{th} N_c} \qquad (21)$$
where $E_t$ is the trapping state energy level below the conduction band, $\sigma_n$ is the trapping cross section, $v_{th}$ is the thermal velocity of carriers, $n$ is the density of electrons in the conduction band, and $N_c$ is the effective density of states in the conduction band. From Sze [1981],

$$v_{th} N_c = \frac{A^* T^2}{q} \qquad (22)$$

where $A^*$ is the effective Richardson constant (252 A/cm²/K² for n-type (100) Si), $T$ is the absolute temperature, and $q$ is the electronic charge. Therefore, we can express the emission time constant as a function of temperature:

$$\tau_e = \frac{e^{E_t/kT}}{\sigma_n \frac{A^*}{q} T^2} \qquad (23)$$
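The strong temperature dependence of the emission time constant is easiest to appreciate with numbers. The sketch below evaluates $\tau_e = e^{E_t/kT} / (\sigma_n (A^*/q) T^2)$ for a 0.4 eV level, the energy associated with the E center in this chapter. The cross section of $5 \times 10^{-16}$ cm² is borrowed from the "typical" value used in the dark current fits and is an assumption here, not a measured property of the E center. The function name is hypothetical.

```python
import math

K_B = 8.617333e-5        # Boltzmann constant, eV/K
Q = 1.602177e-19         # electronic charge, C
A_STAR = 252.0           # effective Richardson constant, A/cm^2/K^2

def emission_time(temp_k, e_t_ev, sigma_n_cm2):
    """tau_e = exp(E_t / kT) / (sigma_n * (A*/q) * T^2), in seconds."""
    return math.exp(e_t_ev / (K_B * temp_k)) / (
        sigma_n_cm2 * (A_STAR / Q) * temp_k ** 2)

for temp in (160.0, 180.0, 200.0, 220.0, 240.0):
    tau = emission_time(temp, e_t_ev=0.4, sigma_n_cm2=5e-16)
    print(f"T = {temp:.0f} K  tau_e = {tau:.3e} s")
```

Under these assumptions the emission time sweeps through several orders of magnitude across the usual operating range, so whether trapped charge is released within the clocking period depends sensitively on the device temperature.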
When a charge packet is present, the number of charges $n_t$ held in bulk states within the packet volume $V_s$ containing $N_t$ traps per unit volume is (assuming all traps are full)

$$n_t = N_t V_s \qquad (24)$$

When the charge packet has been removed, $n_t$ will decrease exponentially with emission time constant $\tau_e$ so that

$$n_t(t) = N_t V_s e^{-t/\tau_e} \qquad (25)$$
Consider one gate of the typical clocking scheme of a three-phase CCD such as p1 of Figure 29. We assume that the charge packet has substantially arrived under this gate at time T,. Traps in the volume occupied by the charge packet will be filled with time constant $\tau_c$, and because this time constant is relatively short we will assume the states are filled at T,. At time the charge packet becomes shared with p2 (as in Figure 11(b)), the volume of the charge packet under p1 diminishes, and the traps under p1
FIGURE 29. Three-phase clock timing with transition points marked for determining the trap emission times.
begin to emit. By time q, the charge packet has moved on completely to p2. The traps under p1 will continue to emit until q, or, in the case of a sparsely illuminated CCD, until the next nonempty charge packet arrives. At that time, the states which have emitted their charges during the period since T, will be filled from the new packet, resulting in a charge loss from the new packet of

$$N_t V_s \left(1 - e^{-T_{emit}/\tau_e}\right) \qquad (26)$$
where $T_{emit}$ is the total emission time from T, to the arrival of the new charge packet. A portion of this charge, however, is regained. Any charges emitted between T, and will rejoin their parent packet due to the fringe field (see Figures 11(b) and (c)). In fact, some of the charges emitted between and TI5 (Figure 11(d)) will also join the parent packet, but because of uncertainty of the partition function, we will assume that the time period during which the charges can join the parent packet is $T_{join}$ = - T,. The amount of charge that is reintroduced to the packet by these emissions is

$$N_t V_s \left(1 - e^{-T_{join}/\tau_e}\right) \qquad (27)$$
and therefore the net loss from a charge packet for one transfer through a p1 gate is

$$N_t V_s \left(e^{-T_{join}/\tau_e} - e^{-T_{emit}/\tau_e}\right) \qquad (28)$$

where $T_{emit}$ is the emission time from the previous packet. A similar
evaluation using appropriate values for $T_{emit}$ and $T_{join}$ can be performed for the other two phases. If more than one trap level exists, additional terms of the form of the right side of Equation (28) can be added, using the corresponding values of $\tau_e$ and $N_t$.

A significant complication to the above model is the spatial distribution of the traps and the charge packets within the bulk silicon. The volume of the charge packet will vary with the number of electrons contained in it, thus the packet will interact with a varying fraction of the traps within the pixel. The equivalent two-dimensional effect in surface channel CCDs is known as the “edge effect” [Tompsett, 1973]. This three-dimensional “shell effect” means that each shell of traps within the pixel will have a different emission period $T_{emit}$, which will depend on the frequency of charge packets large enough to interact with it. The traps in the outer shells, once filled by the arrival of a large charge packet, will continue to emit until a charge packet of equivalent size arrives to refill them.

A further complication is that the packet volume does not have well-defined boundaries; rather, the charge concentration falls off gradually from the center of the packet. Our earlier assumption that the capture time constant $\tau_c$ was short is not valid for small charge concentrations because $\tau_c$ is inversely proportional to the charge concentration. If the capture time constant is not short compared to the time the charge packet spends in contact with the traps, we cannot assume that all the traps are filled, and Equation (24) is no longer valid. In order to calculate the number of filled traps in the less dense shoulder regions of a charge packet, it is necessary to take this time dependence into account. Trapping theory [Hopkins, Hopkinson, and Johlander, 1994] gives the number of filled traps as
$$n_t = n_{ss}\left(1 - e^{-t/\tau_c} e^{-t/\tau_e}\right) \qquad (29)$$
where $t$ is time and $n_{ss}$ is the number of filled traps at steady state, given by
and $\tau_c$ will vary across the charge packet according to Equation (20). Therefore, the number of filled traps will depend on both the charge distribution in the packet and the amount of time $t$ it spends in contact with the traps (the “dwell time” underneath a single gate [Hopkins, Hopkinson, and Johlander, 1993]). To include this dependence in CTI calculations, the $N_t V_s$ term in Equation (28) must be replaced by a volume integral of Equation (29), which gives
B. Measurement Techniques

1. Pulse Train Technique
There are various methods which can be used to measure the CTI of a CCD. The pulse train technique for measuring the CTI was the first one to be used, and was naturally suited to the analog shift registers that were the first devices to utilize the charge coupling concept. The technique involves electrically injecting a stream of identical charge packets into one end of a serial register and then transferring them to the other end, where they are detected by an output circuit. If there is a significant CTI problem, the output will typically look something like Figure 30. The first, or leading, pulse experiences charge loss due to the CTI. The remaining pulses do not, if charges are not lost during transfer, but are simply deferred by the trapping mechanism described earlier. Since each charge packet is roughly the same size, the packets will interact with the same traps within the cells of the serial register, and the number of charges trapped will be the same. During each transfer, the charges emitted by the traps which do not rejoin the original packet will join the following packet and be used to refill the traps so that there is no net loss from the following packet. Thus the nonleading packets can be used to estimate the original size of the leading packet, enabling a calculation of the charge loss and the CTI. The charges captured from the last packet in the train will join empty packets, and will not be able to interact with the traps available to the larger packets from the train. As a result, the traps in the outer “shells” will not be refilled but will continue to emit their charges into subsequent packets. These trailing pulses give a time profile of the decay of the trapped charge. The pulse train technique is not suitable for most imaging CCDs because they are not equipped with an input structure for electrically injecting charge into the registers.
FIGURE 30. Pulse train CTI measurement. [Sketch: output charge versus time, showing the charge lost from the leading pulse and the deferred charge in the trailing pulses.]
2. X-ray Illumination

One of the more precise techniques developed for measuring the CTI in CCD imaging arrays involves sparsely illuminating the CCD with monoenergetic x-rays. X-ray photons from nuclear decay have a predictable amount of energy and because this energy is much greater than the bandgap, each photon when absorbed in the CCD produces a small cloud of free electrons (see Section II). The cloud contains a statistically consistent amount of charge with a mean value equal to the photon energy divided by 3.65 eV and a variance equal to the mean value multiplied by the “Fano factor,” which is typically about 0.1 [Janesick et al., 1988]. Because this cloud is smaller than the size of a pixel (< 1 μm), all the charge is collected into a single packet, and the result is a well-defined charge packet size. For example, the decay of ⁵⁵Fe to Mn produces photons of 5.9 keV which create charge packets with a mean size of 1620 e⁻ and a standard deviation of 13 e⁻.

By illuminating the CCD with such a radioactive source and examining the resulting image, we can measure how the charge packet changes as a function of clocking distance through the device and thereby measure its CTI. This is commonly accomplished by taking a series of images and plotting on the same graph the pixel values against the row number (for parallel CTI) for all the images. Each row should have a cluster of pixel values at around 1620 e⁻, corresponding to x-ray events, but the mean value will decrease for rows farther from the output due to the CTI. The slope of the line fit to these points is equal to the CTI. Figure 31 shows such a plot for an undamaged CCD. As one can see from this plot, the slope becomes difficult to accurately determine for CCDs with very low CTI due to the noise in the signal.
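The packet statistics and the slope fit described above can be sketched in a few lines. The first function applies the quoted rule (mean equal to the photon energy divided by 3.65 eV, variance equal to the Fano factor times the mean). The second part generates synthetic events and fits a line to pixel value against row number. The synthetic CTI of $10^{-4}$, the event count, and all names are illustrative assumptions. Here the fitted slope is divided by the fitted zero-row signal so that the estimate is expressed per transfer as a fraction of the packet.

```python
import random

PAIR_ENERGY_EV = 3.65   # energy per electron-hole pair quoted in the text
FANO = 0.1              # typical Fano factor quoted in the text

def xray_packet(photon_ev):
    """Mean and standard deviation (in e-) of the charge cloud."""
    mean = photon_ev / PAIR_ENERGY_EV
    return mean, (FANO * mean) ** 0.5

def fit_slope(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

mean_e, sigma_e = xray_packet(5900.0)
print(f"packet: {mean_e:.0f} e- +/- {sigma_e:.1f} e-")

# Synthetic events: each loses a fixed fraction per row transfer.
rng = random.Random(0)
true_cti = 1.0e-4        # illustrative value, not a measurement
rows, values = [], []
for _ in range(5000):
    row = rng.randrange(512)
    rows.append(row)
    values.append(rng.gauss(mean_e, sigma_e) * (1.0 - true_cti) ** row)

slope, intercept = fit_slope(rows, values)
print(f"estimated CTI: {-slope / intercept:.2e}")
```

With a straight-line fit the estimate comes out slightly below the value used to generate the data, because the true decay is exponential in the row number; the scatter of the fitted slope also shows directly how the shot noise of the events limits the smallest CTI that can be resolved.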
The minimum detectable CTI depends on the number of transfers; in this case (512 row transfers) the limit is about 5 × […]. It is possible to improve the resolution by shuffling a portion of the image back and forth numerous times, thus increasing the number of transfers [Murowinski, Deen, and Hardy, 1995].

Figure 32 shows an x-ray event plot for a radiation-damaged CCD. In this case, the CTI is too high to be accurately determined with this method. Our investigations involved measuring CTIs of this magnitude and higher, so we were not able to use the x-ray technique. However, we did make use of another very practical feature of x-ray photons: the uniform packets provide a simple and precise measurement of the system gain.

3. Fine Spot Illumination
A particularly versatile measurement technique is possible using a setup similar to that of [Hopkins, Hopkinson, and Johlander, 1994], in which a very
FIGURE 31. X-ray event plot for an undamaged CCD. [Plot: pixel value versus row number.]

FIGURE 32. X-ray event plot for a radiation-damaged CCD. [Plot: pixel value versus row number.]
small pinhole is imaged onto the CCD in such a way that all the light is absorbed in a single pixel. This method requires a rather elaborate equipment setup because of the difficulty in creating a stable pinhole image smaller than a single pixel, which is usually around 20 µm across. The result is similar to the x-ray technique, except that the size of the charge packet is adjustable by changing the intensity of the pinhole image. However, the size is not known a priori and must be estimated from acquired images. Also, the charge is only generated in one pixel per image, so a much larger number of images is required to obtain good statistics.
4. EPER
Another method often used to measure the CTI is the Extended Pixel Edge Response (EPER) technique [Janesick et al., 1988]. In this technique, the CCD is illuminated uniformly and read out in such a way as to obtain an image which extends over one or both of the edges opposite the output amplifier. It is quite similar to the pulse train technique as applied to all the parallel registers at once, except that the registers are loaded in parallel by illumination instead of serially by electrical injection. Because of this, the “leading pulse” only undergoes one transfer, so it cannot provide a useful measurement of the charge loss, but we can use the “trailing pulses,” the charge packets measured in the extended regions. The size of these “extended pixels” indicates the number of charges which have been captured from the image pixels and then reemitted during a complete set of transfers through the parallel registers. The charges emitted in the wake of a pulse indicate the number of empty traps which would be available to capture charge from a subsequent signal packet; thus this number can be used to estimate the CTI. The first trailing pixel indicates the charge which would be captured from packets that were two pixels apart, the second the loss from packets three pixels apart, and so forth. EPER is perhaps more properly considered a technique for measuring charge deferral, since it only gives an indirect measure of the CTI. Because it does not measure the actual diminished magnitude of a transferred charge packet, it cannot detect losses due to charge which is completely absorbed or deferred for times much longer than the clocking periods. However, if the CTI is dominated by trapping at a single energy level, which it appears to be in radiation-damaged CCDs, EPER is a very useful tool. The EPER technique is particularly useful in probing how CTI changes with charge packet size since, like the fine spot technique, the illumination level can be easily varied.
However, EPER requires no special equipment beyond a diffuse light source for creating the flat field image.
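As a minimal sketch of how an EPER frame might be reduced to a single CTI figure, the following Python assumes that the deferred charge in the extended pixels of each column is normalized by the last illuminated pixel of that column and by the number of single-pixel transfers, and that the result is averaged over columns. The function name, argument layout, and example values are hypothetical, and the pixel values are assumed to be bias-subtracted and expressed in electrons.

```python
def eper_cti(last_pixels, extended_pixels, n_transfers):
    """Estimate CTI from an EPER frame.

    last_pixels[c]     -- signal (e-) in the last illuminated pixel of column c
    extended_pixels[c] -- list of extended (overscan) pixel values (e-) for column c
    n_transfers        -- single-pixel transfers undergone during readout
    """
    per_column = []
    for last, trailing in zip(last_pixels, extended_pixels):
        deferred = sum(trailing)                  # charge that fell behind the image
        per_column.append(deferred / (last * n_transfers))
    return sum(per_column) / len(per_column)      # average over columns

# Hypothetical two-column example: 50 e- deferred out of 10,000 e- over 512 transfers
cti = eper_cti([10000.0, 10000.0], [[40.0, 10.0], [30.0, 20.0]], 512)
```

Because the flat-field level sets `last_pixels`, repeating the calculation at several illumination levels gives the CTI as a function of packet size.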
We used the EPER technique because of its simplicity, its ability to adjust the charge packet size, and its ability to measure CTI over a very wide range. We calculated the CTI by dividing the total amount of charge in the extended pixels (the charge loss) by the amount of charge in the last pixels of each column (the original packet size) and then dividing by the total number of single-pixel transfers involved in the readout. This gives a "worst case" CTI figure equivalent to that experienced by an isolated feature in a completely dark field, averaged over the number of columns.
C. Experimental Results
1. CTI as a Function of Temperature
For our CTI experiments we used the same setup as in Section IV and tested the same device. Figures 33 and 34 show the parallel EPER CTI of the radiation-damaged TK512 device as a function of temperature. The two figures represent two different clock-timing schemes. The three sets of data plotted in each figure indicate measurements of the three damage regions of the CCD. We note not only a general increase in CTI with increasing
FIGURE 33. Parallel CTI as a function of temperature (fast clocking). The symbols are experimental data from the three sections of the device ($6.0 \times 10^{9}$ protons/cm$^2$, $1.5 \times 10^{9}$ protons/cm$^2$, and undamaged). The lines indicate model results evaluated for a combination of the three traps in Table 4. (Horizontal axis: temperature in K.)
FIGURE 34. Parallel CTI as a function of temperature (slow clocking). The symbols are experimental data from the three sections of the device. The lines indicate model results evaluated for a combination of the three traps in Table 4. (Horizontal axis: temperature in K.)
amount of damage but also that the CTI is more strongly dependent on temperature in the damaged sections. Unfortunately, we could not obtain data at higher temperatures because the dark current was so high, especially in the high-radiation zone, that it distorted the results. This distortion can be seen in the upper few temperature points of the high-radiation section in Figure 33. Since it could be assumed that the additional CTI experienced by the TK512 from radiation damage is the result of bulk trapping, we tried to fit the trapping model described by Equation (28) to the data. First, the CTI in the undamaged part was modeled. In Figure 33 it was sufficient to simply model it as a constant (1.21 x [...]), which is shown as the lowest line. Then the model was fit to the excess CTI in the high-radiation data. The uppermost line drawn on the graph indicates the model evaluated for a combination of three traps with the parameters given in Table 4, making use of clock timings measured from our CCD electronics. In these experiments we used the three-phase non-MPP clocking scheme shown in Figure 29 for the parallel clocks with a clock pulse width (T3 - T,) of 96 µs. For Figure 33, the period of the clocking cycle (T, - T,) was 200 µs for the first 500 cycles, during which the serial register was simply flushed,
TABLE 4
RADIATION-INDUCED TRAPS.
Trap    $E_t$ (eV)    $\sigma_n$ (cm$^2$)       $N_t V$
A       0.21          $5 \times 10^{-16}$       4.0
B       0.41          $5 \times 10^{-16}$       4.0
C       0.42          6 x [...]                 26
and 22.2 ms for the final 13 cycles, when the serial register was being read out between parallel transfers. The lower-temperature CTI maximum in Figure 33 is caused by Trap A, and the higher maximum is caused by the combination of Traps B and C. In Figure 34, we modeled the undamaged CTI as a constant (2.88 x [...]) plus a trapping level with $E_t$ = 0.48 eV, $\sigma_n = 5 \times 10^{-16}$ cm$^2$, and $N_t V$ = 1.5. The radiation-induced trap parameters remained the same, but the clock periods used for this figure were 25.3 ms for 490 cycles during which the output was digitized, and 400 ms for 23 cycles when the data were being written to disk. In Figure 34, the maxima have shifted down in temperature because of the slower clock rate, and only the feature due to Traps B and C is visible. Finally, assuming the density of trapping states is proportional to the proton dose, the model was evaluated for one quarter of the $N_t$ values used in the high-radiation cases, and these results were plotted as the middle lines in both figures for comparison with the low-radiation data. The excellent fit indicates that the CTI increase is, in fact, proportional to the proton fluence. When matching our model to CTI versus temperature data, it is difficult to decouple the effects of the trap energy and the trapping cross section, so we could not unambiguously determine these two parameters. We chose values which resulted in a good fit and were in line with previously published figures. The activation energy of the lower peak seems too high to be attributable to the oxygen-vacancy complex, which is reported to have $E_t$ = 0.18 eV [Benton and Kimerling, 1982]. Therefore, we decided to try values corresponding to the divacancy, which introduces two levels that always appear in equal concentrations [Saks, 1977]. The parameters for Trap A and Trap B correspond to these two levels.
The Trap C parameters match the published figures for the phosphorus-vacancy complex, which is usually given as the dominant trap level in irradiated CCDs. Energy levels for this complex have been reported in the range of 0.4-0.44 eV [Saks, 1977; Janesick et al., 1991; Robbins, Roy, and Watts, 1992; Holland, 1993;
Hopkins, Hopkinson, and Johlander, 1994; Gendreau et al., 1995; Meidinger and Struder, 1995; Coffa et al., 1997; Svenson, Jagadish, and Williams, 1993; Benton and Kimerling, 1982]. Traps B and C are not resolved in our data, and the upper maximum could be due to a single trap at around 0.4 eV, but we introduced Trap B to see if the data were consistent with a divacancy interpretation, which it appears they are. It is clear from comparing Figure 33 with Figure 34 that both the temperature and the clocking speed can have a significant effect on the CTI. The model we have described can be used to choose combinations of temperature and clocking speed to minimize the CTI for a radiation-damaged CCD. As an example, we used the model to calculate the CTI for various parallel clock periods at the designed operating temperature of the FUSE FES (223 K). The results are shown in Figure 35. For this plot, we used a clock pulse width equal to one-half the period, which corresponds to a clocking pattern with no break between cycles (i.e., T6 immediately follows Ts). This type of pattern is used when unwanted portions of the CCD are being flushed out prior to reading out the region of interest, because in that case there is no need to wait for the readout of the serial register between parallel clock cycles. The FES only reads out small sections of the CCD to
FIGURE 35. Parallel CTI model as a function of clock period at 223 K. The lines indicate model results evaluated for a combination of the three traps in Table 4. The uppermost line corresponds to the model evaluated with the high radiation damage ($6 \times 10^{9}$ protons/cm$^2$); the middle line to the low radiation damage ($1.5 \times 10^{9}$ protons/cm$^2$); and the lowermost line (almost coinciding with the x-axis) is the undamaged case. (Horizontal axis: parallel clock period in seconds.)
minimize the processing time, so most of the transfers a charge packet undergoes in the FES CCD will be with this clock pattern. We find that the minimum CTI is achieved at a clock period of about 8 µs.
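The interplay between temperature and clock timing discussed above stems from the strong temperature dependence of the trap emission time. The sketch below is not the chapter's Equation (28); it assumes the textbook Shockley-Read-Hall expression for the emission time constant, with assumed round-number silicon values for the electron thermal velocity and the conduction-band density of states at 300 K. Only the trap energies and the cross section are taken from Table 4, and the function name is hypothetical.

```python
import math

K_B_EV = 8.617333e-5  # Boltzmann constant in eV/K

def emission_time(e_t_ev, sigma_cm2, temp_k, v_th_300=1.0e7, n_c_300=2.8e19):
    """Assumed SRH emission time constant (s) of an electron trap in silicon.

    v_th_300 (cm/s) and n_c_300 (cm^-3) are assumed 300 K values, scaled
    as T^(1/2) and T^(3/2) respectively.
    """
    v_th = v_th_300 * math.sqrt(temp_k / 300.0)
    n_c = n_c_300 * (temp_k / 300.0) ** 1.5
    return math.exp(e_t_ev / (K_B_EV * temp_k)) / (sigma_cm2 * v_th * n_c)

for temp in (140, 155, 180, 223):
    tau_a = emission_time(0.21, 5e-16, temp)   # Trap A parameters from Table 4
    tau_b = emission_time(0.41, 5e-16, temp)   # Trap B parameters from Table 4
    print(f"{temp} K   Trap A: {tau_a:.2e} s   Trap B: {tau_b:.2e} s")
```

Under these assumptions the emission time changes by orders of magnitude across the measured temperature range, so a trap's emission time becomes comparable to a given clock interval only within a narrow band of temperatures, which is consistent with the CTI maxima shifting when the clock rate is changed.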
2. CTI as a Function of Signal
Simulations. In order to quantify the shell effect mentioned earlier, we examined the charge packet distribution within the bulk silicon using the same simulation software that produced Figures 13 and 14. We used TSUPREM-4 [TMA, 1994b] and MEDICI [TMA, 1994a] to simulate one 27-µm pixel of a three-phase buried channel CCD with an MPP implant under one of the phases (like the TK512 we measured). Using MEDICI's photo-generation feature, we were able to create a range of simulated charge packets with increasing numbers of electrons and determine the corresponding charge concentrations in the potential wells. By performing simulations of two two-dimensional cross sections at right angles through the device, we constructed a three-dimensional model of the charge concentrations and potentials. We were then able to perform the integration in Equation (31) numerically and see the relationship between signal level and CTI due to bulk trapping. Figure 36 shows the simulated charge distribution for 11
FIGURE 36. Simulated charge concentration profiles under one gate of a 27-µm pixel device. (a) Along channel. (b) Across channel. (c) Vertically from gate to substrate. (Horizontal axes: distance in µm.)
different packet sizes ranging from around 100 to 270,000 electrons, with each subsequent packet being two to three times the size of the previous one. Each of the three plots shows the charge concentration along one axis through the center of the packet: (a) along the channel, (b) across the channel, and (c) vertically from gate to substrate. The simulated charge distributions were then processed to determine the percentage of filled traps throughout each packet. Using Equations (29) and (30) and a dwell time of 0.1 seconds, we calculated the percentage of filled traps for Trap A at a temperature of 155 K. The results are shown in the three plots of Figure 37. A dwell time of 0.1 second was used because, even though in a three-phase device the charge packet is present under a given phase for only about one-third of a one-pixel transfer period, in an EPER measurement there are packets in every pixel, and the time between charge packets is not long enough for the traps to empty again. Therefore, we have assumed that the effective dwell time is roughly the entire readout period. As it turns out, this dwell time is so long compared with the capture and emission times that the traps always reach steady state. In addition, the emission time is so long that the steady-state fraction $n_{ss}$ is always approximately 1 for capture time constants corresponding to electron concentrations at the limit of the
FIGURE 37. Simulated filled trap profiles under one gate of a 27-µm pixel device at 155 K for a trap with $E_t$ = 0.21 eV and $\sigma_n = 5 \times 10^{-16}$ cm$^2$. (a) Along channel. (b) Across channel. (c) Vertically from gate to substrate. (Horizontal axes: distance in µm.)
accuracy of our simulation model (~10^[...] cm$^{-3}$). Therefore, we are not able to verify Equations (29) and (30) with our data. We then multiplied the trap occupation data by the trap concentration and performed a numerical integration over the volume of the packet to find the total number of filled traps in each packet. Finally, we inserted the number of filled traps into Equation (31) to find the deferred charge for comparison with experimental data.
Measurements. We again used the EPER technique and the same experimental setup. Figure 38 shows a plot of deferred charge versus the packet size in electrons at a temperature of 155 K, with a logarithmic x-axis to improve the legibility. The crosses represent the results of EPER measurements of deferred charge in the high-radiation section of the TK512, and the diamonds represent deferred charge in the low-radiation section. The lines represent the deferred charges calculated from the simulated charge packets. The trap concentrations used for the calculations were $5.2 \times 10^{10}$ cm$^{-3}$ for the high-radiation section and $1.3 \times 10^{10}$ cm$^{-3}$ for the low-radiation section, so again we see that the trap concentrations vary
FIGURE 38. Deferred charge vs. signal packet size at 155 K. Symbols indicate experimental data from the damaged sections of the device ($6.0 \times 10^{9}$ and $1.5 \times 10^{9}$ protons/cm$^2$). The lines indicate the simulated results with $E_t$ = 0.21 eV, $\sigma_n = 5 \times 10^{-16}$ cm$^2$, and $N_t = 5.2 \times 10^{10}$ cm$^{-3}$ (high damage) and $1.3 \times 10^{10}$ cm$^{-3}$ (low damage). (Horizontal axis: signal packet size in e-, logarithmic.)
linearly with proton fluence. Figure 39 shows the corresponding estimate of CTI. The two lowest signal points display a poor fit due to the difficulty in accurately determining the deferred charge at such a low level. The error is magnified in Figure 39 because of dividing by the low signal level to get the CTI. With the stated trap parameters, the simulated results match the experimental data very well, which indicates that the nonlinear variation in CTI with signal level can indeed be explained by the variation in the packet volume and density. These results agree qualitatively with previously published results [Mohsen and Tompsett, 1974; Hopkins, Hopkinson, and Johlander, 1994] at low signal levels, although both of these papers show a linear dependence at high signal levels. The specifics of device fabrication, such as pixel size and channel doping, will have a large effect on the shape of the curve, so a direct comparison with these previous results is not practical. It is well known that a CCD exhibits larger CTI for small signals, and we see here that this effect increases markedly with radiation damage. The increased CTI at small signal levels can be explained by the fact that the volume over which a charge packet is distributed is not proportional to the
FIGURE 39. CTI vs. signal packet size at 155 K. Symbols indicate experimental data from the damaged sections of the device ($6.0 \times 10^{9}$ and $1.5 \times 10^{9}$ protons/cm$^2$). The lines indicate the simulated results with $E_t$ = 0.21 eV, $\sigma_n = 5 \times 10^{-16}$ cm$^2$, and $N_t = 5.2 \times 10^{10}$ cm$^{-3}$ (high damage) and $1.3 \times 10^{10}$ cm$^{-3}$ (low damage). (Horizontal axis: signal packet size in e-, logarithmic.)
amount of charge contained in it. Charge packets with small amounts of charge have a lower charge density. The bulk traps, however, have a fixed uniform density; thus the smaller packets interact with more traps per electron of signal. The effect in the trapping model in Equation (31) is to create disproportionately large CTI for smaller packets. The importance of charge packet distribution to bulk trapping, especially for small packets, has been demonstrated clearly by the success of several researchers in using a narrow supplementary buried channel, or “notch,” to reduce the radiation-induced CTI [Janesick et al., 1991; Holland et al., 1991]. The notch is a small central section of the buried channel which has been doped higher than the rest, creating a narrow trough in the middle of the potential well. Small charge packets are constricted inside this trough, which reduces both their volume and the number of traps they encounter. A corresponding reduction in the radiation-induced CTI is observed.
VI. READ NOISE
The noise in the output signal of a CCD dictates the minimum detectable signal and, therefore, represents the ultimate limiting factor in the CCD’s sensitivity. There are many sources of noise in CCDs, but for most of the CCD’s history the dominant source of noise has been the noise introduced by the output amplifier (the read noise). Radiation damage has been found to cause an increase in the read noise [Murowinski, Linzhuang, and Deen, 1993a; Murowinski, Linzhuang, and Deen, 1993b], and we investigated this phenomenon further.
A. Noise Sources
There are three main sources of noise affecting the CCD output: the thermal, or Johnson, noise associated with the resistances in the output circuit; the reset noise associated with the charging of the gate capacitance of the output transistor; and the flicker, or 1/f, noise of the output transistor. In addition, the output transistor may exhibit excess generation-recombination noise from isolated midgap trapping levels, especially if it has been subjected to radiation.
1. Thermal Noise
Thermal noise is a result of the random thermal motion of the electrons in the device. It is independent of frequency, a form of “white” noise. Thermal noise voltage power is directly proportional to the temperature and the resistance. The noise current spectral density (in A$^2$/Hz) is given by
$$S_i = \frac{4kT}{R} \tag{32}$$
where $k$ is Boltzmann's constant, $T$ is the absolute temperature, and $R$ is the resistance. Because there is a resistance associated with the channel of a MOSFET, it will exhibit thermal noise. Modeled as a current source in the channel of the transistor [Chen and Deen, 1998; Chen and Deen, in press; Chen et al., in press], the current noise spectral density can be written [van der Ziel, 1986]
$$S_i = 4kT\gamma g_{d0} \tag{33}$$
where $\gamma$ is a constant of proportionality which accounts for the variation of the conductance along the channel and $g_{d0}$ is the drain conductance of the transistor at zero drain bias. For a transistor in saturation, $\gamma = \gamma_{sat} = 2/3$ and $g_{d0} \approx g_m$, the transconductance. This gives
$$S_i = 4kT\left(\tfrac{2}{3}\,g_m\right) \tag{34}$$
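As a quick numerical illustration of Equations (32) and (34), the following sketch evaluates the current noise spectral density of a resistor and of a saturated MOSFET channel. The function names and the example component values are hypothetical, and the long-channel value $\gamma = 2/3$ is assumed for the transistor.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K

def resistor_current_noise(resistance_ohm, temp_k):
    """Thermal current noise spectral density of a resistor, A^2/Hz (Equation 32)."""
    return 4.0 * K_B * temp_k / resistance_ohm

def mosfet_channel_noise(gm_siemens, temp_k, gamma=2.0 / 3.0):
    """Channel thermal noise of a saturated MOSFET, A^2/Hz (Equation 34).

    gamma = 2/3 is the assumed long-channel saturation value.
    """
    return 4.0 * K_B * temp_k * gamma * gm_siemens

s_resistor = resistor_current_noise(1.0e3, 300.0)   # hypothetical 1 kOhm at 300 K
s_channel = mosfet_channel_noise(1.0e-3, 300.0)     # hypothetical gm = 1 mS at 300 K
```

Both quantities scale linearly with temperature, so cooling a device from 300 K to 150 K halves the thermal noise power.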
In a CCD, the main source of thermal noise is the output source follower transistor. We do not expect the thermal noise to be affected by radiation.
2. Reset Noise
Reset noise is the noise resulting from the uncertainty in the charging of a capacitor due to noise in the charging circuit. In a CCD output circuit, reset noise is seen when the voltage at the gate of the output transistor (the floating node) is fixed by the reset transistor. The reset noise is essentially a sampling of the noise in the reset transistor. The total noise (in V$^2$) can be found by integrating the noise spectrum with the low-pass filter characteristic of the capacitor circuit. For the simple case when the noise is white thermal noise, this is
$$\overline{v_n^2} = \int_0^\infty \frac{4kTR}{1 + (2\pi f R C)^2}\,df = \frac{kT}{C} \tag{35}$$
where $C$ is the capacitance of the gate node and $R$ is the resistance in the reset circuit. Note that the noise does not depend on the resistance. The reset noise in CCDs is usually eliminated by signal processing, as we shall see later.
3. Generation-Recombination Noise
The interaction of carriers with trapping states causes a random fluctuation in the current due to the appearance and disappearance of carriers. This
random fluctuation adds another noise component. The spectrum $S_N$ resulting from a fluctuating number of carriers $N$ has a Lorentzian shape and is given by [van der Ziel, 1986b; Deen, Ilowski, and Yang, 1995]
$$S_N(f) = 4\,\overline{\Delta N^2}\,\frac{\tau}{1 + (2\pi f)^2\tau^2} \tag{36}$$
where $\overline{\Delta N^2}$ is the variance of $N$, $f$ is the frequency, and $\tau$ is the lifetime of the carriers. The lifetime $\tau$ is determined from the capture and emission time constants of the trap, $\tau_c$ and $\tau_e$ (see Section V), by
$$\frac{1}{\tau} = \frac{1}{\tau_c} + \frac{1}{\tau_e} \tag{37}$$
If there are $M$ separate trapping levels, each with its own time constant $\tau_i$, their contributions to the noise spectrum can be combined as
$$S_N(f) = \sum_{i=1}^{M} \frac{4A_i\tau_i}{1 + (2\pi f)^2\tau_i^2} \tag{38}$$
where $A_i$ is the variance of the noise associated with the $i$th trap. We expect that generation-recombination noise will be present in the output MOSFET of an irradiated CCD due to the bulk trapping states introduced by the radiation damage.
4. Flicker Noise
Flicker noise is often referred to as 1/f noise because its spectral density has a $1/f^{\alpha}$ frequency dependence with $\alpha$ close to unity. The origins of flicker noise are varied and not well understood. It is generally accepted that in MOS devices the dominant source is the interaction of the channel carriers with near-interfacial oxide traps in SiO$_2$ [Fleetwood et al., 1993 and references therein]. Due to their varying distances from the interface, these traps have a continuous distribution of trapping time constants. The combined effect of these traps is to produce a noise power spectral density $S_i$ [Zhu, Deen, and Kleinpenning, 1992; Deen and Zhu, 1993] of the form
$$S_i = \frac{K_f I_d^{A_f}}{f^{\alpha}} \tag{39}$$
where $K_f$ is the flicker noise coefficient, $I_d$ is the drain current, $A_f$ is a constant of about 2, and $\alpha$ is a constant which varies between 0.8 and 1.2. $K_f$ is a device-dependent parameter and varies widely. In buried channel CCDs, the MOS transistors used at the output are usually also buried channel devices. Because the signal-carrying channel is
located below the surface in such devices, the carriers are kept away from the interface and the near-interfacial oxide traps. This has been found to reduce the flicker noise significantly [Kandiah and Whiting, 1991]. Therefore, we expect that the flicker noise in buried channel devices will not be greatly affected by radiation damage, even though radiation causes an increase in the number of surface traps and would seriously affect a surface channel device.
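The noise components described in this section can be combined into a simple spectral model. The sketch below adds a white (thermal) floor, a sum of Lorentzian generation-recombination terms of the form in Equation (36), and a flicker term of the form in Equation (39). It is an illustration under stated assumptions, not the fitting model used for the measurements: the function names and all parameter values are hypothetical.

```python
import math

def lorentzian(f, variance, tau):
    """Generation-recombination spectrum of one trap level (form of Equation 36)."""
    return 4.0 * variance * tau / (1.0 + (2.0 * math.pi * f) ** 2 * tau ** 2)

def flicker(f, k_f, i_d, a_f=2.0, alpha=1.0):
    """Flicker noise spectrum (form of Equation 39); valid for f > 0."""
    return k_f * i_d ** a_f / f ** alpha

def total_noise(f, white, traps, k_f, i_d, a_f=2.0, alpha=1.0):
    """White floor + sum of Lorentzians + flicker term.

    traps is a list of (variance, tau) pairs, one per trapping level.
    """
    gr = sum(lorentzian(f, variance, tau) for variance, tau in traps)
    return white + gr + flicker(f, k_f, i_d, a_f, alpha)

# Hypothetical parameters: one trap with a 1 ms time constant
traps = [(1.0e-20, 1.0e-3)]
spectrum = [(f, total_noise(f, 1.0e-24, traps, 1.0e-10, 1.0e-4))
            for f in (1.0, 10.0, 100.0, 1.0e3, 1.0e4)]
```

Each Lorentzian is flat below its corner frequency $1/(2\pi\tau)$ and falls as $1/f^2$ above it, so a trap level shows up as a shoulder superimposed on the $1/f^{\alpha}$ slope.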
B. Correlated Double Sampling
In order to reduce the amount of noise at the output, especially the reset noise, researchers devised a signal processing technique known as correlated double sampling (CDS) [Murowinski, Linzhuang, and Deen, 1993b; Hardy, Murowinski, and Deen, 1997; White et al., 1974]. The main idea behind the technique is to sample the output twice, once just after the reset pulse and once after the charge packet has been dumped to the gate of the output transistor, and subtract the two samples. The subtraction effectively eliminates the reset noise because this noise is constant after the reset transistor turns off. The same reset noise voltage will thus be present in both samples and will be subtracted out. CDS also reduces any other low-frequency noise which is common to both samples, including, significantly, the 1/f noise.
1. Clamp and Sample
One simple method of performing CDS is to use a clamp-and-sample circuit (CS-CDS). Figure 40 shows a schematic of a clamp-and-sample processor. Figure 41 illustrates the typical output waveform from a CCD showing the switching points for CS-CDS. Both switches are normally closed. At the clamp point the clamp switch is opened, fixing the bias across the clamp capacitor to the reset level. The voltage at the output then becomes the preamp output minus the reset level. A short time later, during the signal period of the output signal, the sample switch is opened, and the output is held by the sampling capacitor at a voltage equal to the signal level minus the reset level. The CS-CDS processor produces an output noise signal $N_{cs}(t)$ equal to
$$N_{cs}(t) = N(t) - N(t - T_s) \tag{40}$$
where $N(t)$ is the input noise signal and $T_s$ is the time between samples. Note that because of the sampling operations in the implementation of CS-CDS, the actual output does not continuously vary with the input as suggested by Equation (40). However, this simplification does not affect the accuracy of the following analysis, since the equation does hold at the sample point,
FIGURE 40. Clamp-and-sample (CS) CDS processor schematic.
FIGURE 41. CCD output waveform and CS-CDS sample timing.
which is all we are interested in. The Fourier transform of $N_{cs}(t)$ is
$$N_{cs}(f) = N(f)\left(1 - e^{-j2\pi f T_s}\right) \tag{41}$$
where $N(f)$ is the Fourier transform of the input noise. The magnitude squared of this quantity is
$$|N_{cs}(f)|^2 = |N(f)|^2\,4\sin^2(\pi f T_s) \tag{42}$$
from which we can see that the transfer function of the CS-CDS processor is zero at DC and has maxima at odd multiples of half the sampling frequency. Usually a low-pass filter is inserted before the CS-CDS processor, which cuts out the higher maxima, and the overall transfer function becomes essentially a bandpass filter centered around the first peak at $f = 1/(2T_s)$. If we use a second-order prefilter with cutoff frequency $f_c$, the overall transfer function of the CS-CDS processor is
A plot of the transfer function versus frequency, normalized to $1/T_s$, is shown in Figure 42. The cutoff frequency of the second-order prefilter is $1/T_s$, which we found to be the optimum value when the input contains flicker noise.
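The shape of the CS-CDS response can be explored numerically. The sketch below multiplies the $4\sin^2(\pi f T_s)$ factor of Equation (42) by the response of a second-order low-pass prefilter. The exact prefilter form is an assumption here: two identical cascaded first-order sections, giving $1/[1 + (f/f_c)^2]^2$, which may differ from the form used in the chapter. The function name and the 10 µs sample spacing are hypothetical.

```python
import math

def cs_cds_gain_sq(f, t_s, f_c=None):
    """Magnitude-squared response of a clamp-and-sample CDS processor.

    t_s -- time between clamp and sample (s)
    f_c -- prefilter cutoff (Hz); defaults to 1/t_s as in the text.
    Assumption: prefilter = two identical cascaded first-order sections.
    """
    if f_c is None:
        f_c = 1.0 / t_s
    cds = 4.0 * math.sin(math.pi * f * t_s) ** 2        # factor from Equation (42)
    prefilter = 1.0 / (1.0 + (f / f_c) ** 2) ** 2       # assumed second-order low-pass
    return cds * prefilter

t_s = 1.0e-5  # hypothetical 10 us between samples
for f_norm in (0.001, 0.01, 0.1, 0.5, 1.0, 1.5, 10.0):
    print(f"f*Ts = {f_norm:6.3f}   |H|^2 = {cs_cds_gain_sq(f_norm / t_s, t_s):.3e}")
```

The response vanishes at DC and at multiples of $1/T_s$, and the prefilter suppresses the peaks at $3/(2T_s)$ and above relative to the first peak near $1/(2T_s)$.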
FIGURE 42. Transfer function of a CS-CDS processor. Frequency is normalized to the sampling frequency $1/T_s$. The cutoff frequency of the second-order prefilter is $1/T_s$.
The total noise at the output is found by multiplying the input noise spectrum $S_n(f)$ by the transfer function and integrating over frequency. For the clamp-and-sample processor the output noise voltage $V_n$ in volts rms is
$$V_n = \left[\int_0^\infty S_n(f)\,|H_{cs}(f)|^2\,df\right]^{1/2} \tag{44}$$
where $|H_{cs}(f)|^2$ denotes the magnitude squared of the CS-CDS transfer function. As in most measurements, what is really important is not the absolute magnitude of the noise but the signal-to-noise ratio (SNR). The prefilter will affect the signal as well as the noise, so the output signal voltage will be less than the input. We can calculate the output magnitude $V_s$ from
$$V_s = V_{in}\left(1 - e^{-2\pi f_c T_s}\right)^2 \tag{45}$$
which is the response of a second-order filter to a step input $V_{in}$ like the one shown in Figure 41. The SNR is the ratio $V_s/V_n$.
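The noise integral and the SNR can be evaluated numerically. The sketch below integrates a noise spectrum through an assumed CS-CDS response (the $4\sin^2(\pi f T_s)$ factor times a prefilter modeled as two identical first-order sections, which is an assumption and may differ from the chapter's Equation (43)) and uses Equation (45) for the signal. Function names, the midpoint-rule settings, and the example noise level are hypothetical.

```python
import math

def cs_cds_gain_sq(f, t_s, f_c):
    """Assumed CS-CDS response: 4 sin^2(pi f Ts) / [1 + (f/fc)^2]^2."""
    return 4.0 * math.sin(math.pi * f * t_s) ** 2 / (1.0 + (f / f_c) ** 2) ** 2

def output_noise_rms(noise_psd, t_s, f_c, f_max_factor=200.0, steps=200_000):
    """RMS output noise (V) by midpoint-rule integration up to f_max_factor * f_c."""
    df = f_max_factor * f_c / steps
    total = 0.0
    for i in range(steps):
        f = (i + 0.5) * df
        total += noise_psd(f) * cs_cds_gain_sq(f, t_s, f_c)
    return math.sqrt(total * df)

def output_signal(v_in, t_s, f_c):
    """Output step amplitude from Equation (45)."""
    return v_in * (1.0 - math.exp(-2.0 * math.pi * f_c * t_s)) ** 2

def white(f):
    """Hypothetical white input noise of 1e-16 V^2/Hz."""
    return 1.0e-16

t_s = 1.0e-5            # hypothetical sample spacing, s
f_c = 1.0 / t_s         # prefilter cutoff chosen as in the text
v_n = output_noise_rms(white, t_s, f_c)
v_s = output_signal(1.0e-3, t_s, f_c)
snr = v_s / v_n
```

Replacing `white` with a function that includes a $1/f$ term allows the same routine to be used to compare prefilter cutoffs for flicker-dominated inputs.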
2. Dual Slope Integration
An elegant method for performing CDS processing that eliminates the need for the low-pass prefilter is dual slope integration (DSI). Figure 43 shows a schematic of a DSI circuit. After the reset pulse, the CCD output is applied to the input of an integrator where it charges a capacitor. When the signal charge is dumped to the output transistor, the output signal from the CCD is inverted and applied to the same integrator where it discharges the capacitor for the same amount of time. After the second integration, the output of the integrator will be proportional to the difference between the two signals. When the output has been sampled, the integrator is reset by closing a switch across the capacitor. Figure 44 shows the output of a DSI processor in response to a typical CCD output waveform. The Fourier transform pair for the integration function $I(t)$ is
$$I(t) = \frac{1}{\tau_{int}}\int_{-\infty}^{t} f(t')\,dt' \;\Longleftrightarrow\; \frac{F(f)}{j2\pi f\tau_{int}} \tag{46}$$
where $F(f)$ is the Fourier transform of the integrand and $\tau_{int}$ is the time constant of the integrator. This transform is valid only for integration from minus infinity. Since the integrator is reset each cycle, it is effectively only integrating from zero to $T_{int}$. To account for this we must subtract the integration from minus infinity to zero:
$$\frac{1}{\tau_{int}}\int_{0}^{T_{int}} f(t')\,dt' = \frac{1}{\tau_{int}}\int_{-\infty}^{T_{int}} f(t')\,dt' - \frac{1}{\tau_{int}}\int_{-\infty}^{0} f(t')\,dt' = I(T_{int}) - I(0) \tag{47}$$
FIGURE 43. Schematic of a dual slope integration (DSI) CDS processor.
The transform of this subtraction introduces a product term of the kind in Equation (41) to the overall transfer function. The resulting transfer function is
$$H(f) = \frac{1 - e^{-j2\pi f T_{int}}}{j2\pi f\tau_{int}} \tag{48}$$
To get the overall transfer function of the DSI processor, we must add another of these product terms for the subtraction of the reset integration from the signal integration. The overall transfer function $H_{dsi}$ is then
$$H_{dsi}(f) = \frac{\left(1 - e^{-j2\pi f T_s}\right)\left(1 - e^{-j2\pi f T_{int}}\right)}{j2\pi f\tau_{int}} \tag{49}$$
Normally, there is no significant pause between the two integrations, and $T_s$ is approximately equal to $T_{int}$. The magnitude squared of the DSI processor transfer function will then be
$$|H_{dsi}(f)|^2 = \frac{16\sin^4(\pi f T_{int})}{(2\pi f\tau_{int})^2} \tag{50}$$
This transfer function is shown in Figure 45 versus normalized frequency, using $\tau_{int} = T_s/2$. The total noise at the output of the DSI processor (in volts rms) is again obtained by integrating the input noise spectral density $S_n(f)$ with the
FIGURE 45. Transfer function of a DSI processor vs. normalized frequency ($\tau_{int} = T_s/2$). (Horizontal axis: $f T_s$, logarithmic.)
TABLE 5
CHARACTERISTICS OF TEST DEVICES.
Transistor   Width (µm)   Length (µm)   LDD length (µm)   Proton fluence (protons/cm$^2$)
D4QX         50           11.5          2                 undamaged
D6Q4         60           13            3                 $5.0 \times 10^{8}$
D9Q5         60           10            1                 $2.1 \times 10^{9}$
transfer function $H_{dsi}$:
$$V_n = \left[\int_0^\infty S_n(f)\,|H_{dsi}(f)|^2\,df\right]^{1/2} \tag{51}$$
And again we must consider the effect of the processor on the input signal. The signal $V_s$ at the output of the integrator in response to a step input $V_{in}$ is
$$V_s = V_{in}\,\frac{T_{int}}{\tau_{int}} \tag{52}$$
from which we can determine the SNR. In Equations (51) and (52) we note that both $V_n$ and $V_s$ are inversely proportional to $\tau_{int}$; thus the SNR is independent of this value.
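The statement that the SNR does not depend on the integrator time constant can be checked numerically. The sketch below evaluates the magnitude squared of Equation (49), integrates a white input spectrum through it, and assumes that a step input integrated for $T_{int}$ produces an output of $V_{in} T_{int}/\tau_{int}$. The function names, the midpoint-rule settings, the truncation of the integral at 1000/$T_{int}$, and the example values are all hypothetical choices.

```python
import math

def dsi_gain_sq(f, t_int, tau_int, t_s=None):
    """|H_dsi(f)|^2 from Equation (49); t_s defaults to t_int (no pause)."""
    if t_s is None:
        t_s = t_int
    if f == 0.0:
        return 0.0                                   # response vanishes at DC
    num = 16.0 * math.sin(math.pi * f * t_s) ** 2 * math.sin(math.pi * f * t_int) ** 2
    return num / (2.0 * math.pi * f * tau_int) ** 2

def dsi_noise_rms(noise_psd, t_int, tau_int, periods=1000, steps_per_period=200):
    """RMS output noise by midpoint-rule integration up to periods / t_int."""
    df = 1.0 / (t_int * steps_per_period)
    total = 0.0
    for i in range(periods * steps_per_period):
        f = (i + 0.5) * df
        total += noise_psd(f) * dsi_gain_sq(f, t_int, tau_int)
    return math.sqrt(total * df)

def dsi_signal(v_in, t_int, tau_int):
    """Assumed integrator output for a step input held for t_int."""
    return v_in * t_int / tau_int

def white(f):
    """Hypothetical white input noise of 1e-16 V^2/Hz."""
    return 1.0e-16

t_int = 1.0e-5                                       # hypothetical integration time, s
snr_a = dsi_signal(1.0e-3, t_int, 5.0e-6) / dsi_noise_rms(white, t_int, 5.0e-6)
snr_b = dsi_signal(1.0e-3, t_int, 2.0e-5) / dsi_noise_rms(white, t_int, 2.0e-5)
```

Under these assumptions the two SNR values agree to within rounding, and for white noise the result approaches $V_{in}\sqrt{T_{int}/S_n}$, so a longer integration time rather than a different time constant is what improves the SNR.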
C. Experimental Results
We measured the noise spectra of several transistors, some of which had been damaged by proton radiation. These were the same devices we used for the DLTS experiments in Section III. The results presented here are based on the measurements of three representative transistors, one at each radiation dose. Unfortunately, many of the transistors were damaged in handling and no longer usable, so we chose from the available working devices three whose dimensions were similar to each other and typical of the transistors used in actual CCD arrays. These test devices are summarized in Table 5. The three devices had different LDD lengths, but according to the results in [Kim, Blouke, and Heidtmann, 1990] the variation in the characteristics is minimal (for LDD lengths greater than zero).
1. Current-Voltage Measurements
The current-voltage (I-V) characteristics of the transistors were measured with a curve tracer and found to match typical FET I-V curves. Because
T. D. HARDY, M. J. DEEN, AND R. MUROWINSKI
these are depletion-mode buried channel devices, the threshold voltage is a large negative value. With the substrate-source voltage $V_{BS}$ set to -12 V, the threshold voltages $V_T$ were around -11 V. We used $V_{BS}$ = -12 V because this is the typical substrate bias used in CCD arrays, which ensures that the conducting channel is well below the surface. We note that in previous measurements [Murowinski, Linzhuang, and Deen, 1993a; Kim, Blouke, and Heidtmann, 1990; Kim and Heidtmann, 1989] the substrate was shorted to the source, which may have caused an increase in surface-related noise due to the shallow channel. We note also that if $V_{BS}$ is zero, these transistors cannot be shut off because the surface inverts before the threshold is reached, pinning the surface potential and preventing any further reduction in channel width. The "threshold voltage" reported in [Murowinski, Linzhuang, and Deen, 1993a] for these devices is actually the inversion voltage. We expect the threshold voltage to be affected by radiation due to charge buildup in the oxide. We measured the threshold voltages for all the working devices on each die, but there was a large variance in our results, and no clear dependence on radiation damage could be discerned.
2. Noise Measurements
The experimental setup used to measure the noise spectra is shown in Figure 46. We used the same dewar and temperature controller as in our CCD measurements (see Section IV), but with the quartz window replaced by a
FIGURE 46. Experimental setup for measuring the low-frequency noise of the transistors. [Diagram labels: metal enclosure, shielded cables, liquid nitrogen dewar.]
solid aluminum disk for shielding from electromagnetic interference (EMI). For ease of analysis, the transistor was connected in a common-source configuration and biased in the linear, or triode, region. To minimize 60 Hz powerline interference, the transistor biases were supplied by batteries which were shielded and connected to the dewar with coaxial cables. All resistors were wire-wound to reduce excess flicker noise. The voltage signal at the drain node was boosted by a Stanford Research Systems SR560 low-noise amplifier (also powered by batteries) before being applied to the input of a Wavetek 5820A autocorrelating spectrum analyzer. The amplifier was set to AC coupling, using a second-order bandpass filter with cutoffs at 1 Hz and 1 MHz, and the gain was set to 100. The gain and frequency response of the amplifier were characterized by using a test signal from the spectrum analyzer.
In order to cover a large frequency range with reasonable accuracy, we measured the noise spectra in three frequency ranges: 0-50 Hz (spectrum analyzer channel width B = 0.125 Hz), 50-2,050 Hz (B = 5 Hz), and 0-50 kHz (B = 125 Hz). A Hanning weighting function was applied to all measurements. The data from the spectrum analyzer were recorded by a PC using an IEEE 488 interface bus. The resulting spectra for each of the transistors operated in the linear region at room temperature are shown in Figure 47 along with the system noise. We measured the system noise by replacing the transistor with a reference resistor between the source and drain nodes and subtracting the thermal noise of the resistor calculated from Equation (32). The smooth lines in Figure 47 are noise models (described later). The noise is represented in this figure as the magnitude-squared current, in A²/Hz, from a noise current generator between the source and drain.
The data in the figure were arrived at by collecting the raw spectrum analyzer data, compensating for the gain and frequency response of the amplifier, and then calculating the equivalent noise current. The noise current is equal to the drain voltage $v_d$ divided by the parallel combination of the output resistance of the transistor $R_o$ and the load resistor $R_L$:
$$ i_n = \frac{v_d}{R_o \parallel R_L} = v_d \, \frac{R_o + R_L}{R_o R_L} \qquad (53) $$
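As a rough illustration of this conversion, the following sketch turns a measured voltage spectral density into an equivalent drain noise current density. It assumes a flat amplifier gain over the band of interest (the frequency-response compensation described above is omitted), and the function names and numerical values are hypothetical.

```python
def parallel(r_a, r_b):
    """Parallel combination of two resistances, in ohms."""
    return r_a * r_b / (r_a + r_b)

def drain_current_psd(s_v_measured, amp_gain, r_out, r_load):
    """Convert a voltage PSD measured after the amplifier (V^2/Hz)
    into the equivalent drain noise current PSD (A^2/Hz)."""
    s_v_drain = s_v_measured / amp_gain ** 2            # refer back to the drain node
    return s_v_drain / parallel(r_out, r_load) ** 2     # divide by (R_o || R_L)^2

# Hypothetical example: 1e-12 V^2/Hz at the analyzer, gain of 100,
# output resistance and load resistor both 20 kOhm.
s_i = drain_current_psd(1e-12, 100.0, 20e3, 20e3)
```

Because spectral densities are squared quantities, both the gain and the parallel resistance enter squared.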
Since the output resistance of the transistor varies with the operating point, we measured $R_o$ with a curve tracer for each operating point and used this value to calculate the noise current.

FIGURE 47. Noise spectra at room temperature for three buried channel MOSFETs with varying amounts of radiation damage. [Log-log plot; horizontal axis: frequency (Hz), 10⁰ to 10⁵; vertical axis: noise (A²/Hz), 10⁻²⁶ to 10⁻¹⁸.]

Figure 47 shows a marked increase in the low-frequency noise with radiation damage. We expect from the preceding discussion of noise sources that the excess noise will be dominated by generation-recombination noise from bulk traps. We fit a noise model to the measured data, including thermal noise, flicker noise, and generation-recombination (g-r) noise from one or two isolated trapping levels. The model is simply a sum of Equations (33), (39), and (38):
The thermal noise arises from the resistance of the channel, and in the linear region, where the transistor acts as a voltage-controlled resistor, the resistance of the channel is simply equal to the output resistance $R_o$. In the measured spectra, we find that indeed the thermal noise component is accurately modeled by Equation (32):
$$ S_i = \frac{4kT}{R_o} \qquad (55) $$
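For a sense of scale, Equation (55) can be evaluated directly. The resistance values below are hypothetical and were chosen only to show that a channel resistance of order 10 kΩ gives a white noise floor of order 10⁻²⁴ A²/Hz at room temperature.

```python
K_BOLTZMANN = 1.380649e-23  # J/K

def thermal_current_psd(r_out, temperature):
    """Thermal noise current PSD 4kT/R of a resistance r_out (ohms) at temperature (K), in A^2/Hz."""
    return 4.0 * K_BOLTZMANN * temperature / r_out

floor_10k = thermal_current_psd(10e3, 300.0)   # hypothetical 10 kOhm channel
floor_20k = thermal_current_psd(20e3, 300.0)   # doubling R_o halves the floor
```

The inverse dependence on $R_o$ is what lets device-to-device differences in output resistance shift the white noise level without any change in the underlying mechanism.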
The thermal noise shows no dependence on radiation dose, as expected. The variation in the white high-frequency noise seen in the figure is due to the variation in the output resistance of the three transistors. At room temperature we found that the low-frequency noise is totally dominated by the g-r noise, even for the undamaged device, and the dramatic increase in the damaged devices can be explained by the introduction of new bulk trapping states. The flicker noise is insignificant, which is what we expect for buried channel devices. The noise models plotted in Figure 47 are based on Equation (54) and use the g-r parameters given in Table 6. The parameters were determined by multiplying the spectra by frequency to accentuate the g-r peak and then fitting the model to the data as suggested in [Jones, 1994; Deen, 1993; Deen, 1993; Zhao, Deen, and Tarof, 1996]. An example of this for the high-radiation spectrum is shown in Figure 48.

TABLE 6
GENERATION-RECOMBINATION NOISE PARAMETERS AT ROOM TEMPERATURE.¹

Transistor                         A_i              τ_i (s)
D4Q8 (undamaged)                   1.24 × …         0.0352
D6Q4 (5.0 × 10⁸ protons/cm²)       1 × 10⁻¹⁷        0.002
                                   2.6 × …          0.013
D9Q5 (2.7 × 10⁹ protons/cm²)       5 × 10⁻¹⁷        0.002
                                   6 × …            0.009

¹ Note that $1/\tau_i = 2\pi f_i$.
FIGURE 48. Noise × frequency for the high-radiation device (2.7 × 10⁹ protons/cm²), showing clear evidence of g-r noise peaks. $\tau_1$ and $\tau_2$ denote the centers of the two overlapping peaks. [Horizontal axis: frequency (Hz).]
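The reason for multiplying the spectrum by frequency can be seen with a single Lorentzian. The sketch below assumes the generic form $S(f) = A\tau/(1 + (2\pi f\tau)^2)$, which may differ by a constant factor from the chapter's g-r expression; the amplitude is a placeholder. For this form, $f \cdot S(f)$ peaks at $f = 1/(2\pi\tau)$ with height $A/(4\pi)$, so the peak position gives the time constant directly.

```python
import math

def gr_psd(f, amplitude, tau):
    """Generic Lorentzian g-r spectrum (assumed form): A*tau / (1 + (2*pi*f*tau)^2)."""
    return amplitude * tau / (1.0 + (2.0 * math.pi * f * tau) ** 2)

def peak_of_f_times_s(amplitude, tau, f_lo=1.0, f_hi=1e5, points=5001):
    """Scan a logarithmic frequency grid and return (f, f*S(f)) at the maximum."""
    best_f, best_val = f_lo, 0.0
    for k in range(points):
        f = f_lo * (f_hi / f_lo) ** (k / (points - 1))
        val = f * gr_psd(f, amplitude, tau)
        if val > best_val:
            best_f, best_val = f, val
    return best_f, best_val

tau = 0.002                                   # s, of the order of the values in Table 6
f_peak, peak_val = peak_of_f_times_s(5e-17, tau)
tau_est = 1.0 / (2.0 * math.pi * f_peak)      # time constant recovered from the peak
```

When two traps have similar time constants, as for $\tau_1$ and $\tau_2$ in Figure 48, the two peaks merge and the individual maxima can no longer be read off this simply, which is why a model fit is needed.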
The data in Table 6 indicate that the proton damage introduces two trapping levels with characteristic time constants of about 0.002 second and 0.01 second at room temperature. The level of trapping noise seems to be proportional to the proton fluence, at least for the 0.002-second level. It is very difficult to get accurate values for $A$ or $\tau$ for these traps because the peaks overlap. The g-r noise is dependent on temperature because the trapping mechanism is thermally activated; we can use this dependence to characterize the traps. The peak noise power occurs when $\tau_c = \tau_e$. We can evaluate $\tau$ at this point from Equation (23):
which can be rearranged to give
(57)
Therefore, if we plot $\ln(\tau T^2)$ versus $1/kT$ (an Arrhenius plot as in Figure 19 in Section III for the DLTS data), we should get a straight line with a slope equal to the activation energy of the trap $E_a$. We measured the spectra of the transistors over a range of temperatures from 200 K to 300 K and estimated the time constants for the g-r peaks in each one. Figure 49 shows the noise × frequency plots for the highly damaged device over this range. The corresponding Arrhenius plot, including data from the low-damage device, is shown in Figure 50. Due to the inaccuracies involved in estimating the time constants of the overlapping peaks, we cannot draw any definite conclusions about the trap energy levels from these measurements alone. The plotted lines are models based on the trap energies found in Section III and represent traps with activation energies of 0.43, 0.30, and 0.20 eV. The lines show that these results are consistent with the results from the DLTS study (Section III) and the CTI measurements (Section V). We conclude that the g-r noise seen in the damaged devices is probably due to the same bulk traps.
Scientific CCDs are usually operated at low temperatures to reduce the dark current to minimal levels, so it is important to examine the noise at these temperatures. The CCD for the FUSE FES will be operated between 213 and 223 K. Figure 51 shows the noise spectra of the three transistors at 208, 210, and 213 K. From the figure we can see that in this temperature range (208 to 213 K) the flicker noise dominates. The g-r noise is insignificant except for a small peak in the highly damaged device, which appears to
FIGURE 49. Noise × frequency for the high-radiation device (2.7 × 10⁹ protons/cm²) over a range of temperatures, showing the variations in the g-r noise peak. [Horizontal axis: frequency (Hz), 10⁰ to 10⁵.]
[FIGURE 50. Arrhenius plot of the g-r time constants; legend entry: 2.7 × 10⁹ protons/cm².]
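The Arrhenius analysis behind Figure 50 can be sketched as a straight-line fit. The example assumes only the relation stated in the text, namely that $\ln(\tau T^2)$ is linear in $1/kT$ with slope $E_a$. The data are synthetic, generated from a hypothetical prefactor and a 0.43 eV trap, and are not the measured time constants.

```python
import math

K_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_fit(temps_k, taus_s):
    """Least-squares line through ln(tau*T^2) versus 1/(kT).
    Returns (slope, intercept); the slope is the activation energy in eV."""
    xs = [1.0 / (K_EV * t) for t in temps_k]
    ys = [math.log(tau * t * t) for t, tau in zip(temps_k, taus_s)]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def synthetic_tau(temp_k, e_a=0.43, prefactor=1e-6):
    """Hypothetical time constant obeying tau*T^2 = prefactor * exp(E_a/kT)."""
    return prefactor * math.exp(e_a / (K_EV * temp_k)) / temp_k ** 2

temps = [200.0 + 10.0 * i for i in range(11)]      # 200 K to 300 K
taus = [synthetic_tau(t) for t in temps]
e_a_fit, intercept = arrhenius_fit(temps, taus)
```

With noise-free synthetic input the fit returns the activation energy that was put in. With real data, errors in the time constants of overlapping peaks propagate directly into the slope, which is the limitation noted in the text.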
FIGURE 51. Noise spectra for the three transistors in the range 208-213 K. Solid lines correspond to the noise model. [Log-log plot; horizontal axis: frequency (Hz), 10⁰ to 10⁴; curve label: 2.7 × 10⁹ protons/cm² (213 K).]
be due to the trap at 0.30 eV. The flicker noise does not show a strong dependence on the radiation damage, and since the white noise variations can once again be attributed to variations in the output resistance, it appears that the noise at these low temperatures is largely unaffected by radiation. This explains why other groups have not seen an increase in the read noise of CCDs after radiation [Holland et al., 1991; Murata-Seawalt et al., 1990]. We do not know the origin of the observed flicker noise, but Kandiah and Whiting [1991], who observed similar spectra on undamaged CCD output transistors, propose that it is due to holes which have been trapped at the surface under the gate by a poor connection between this region and the source and drain diffusions. These holes could interact with the surface states and cause flicker noise. If the flicker noise is due to surface states, we would expect it to increase with radiation damage because radiation causes an increase in the density of the surface states. We do observe an increased flicker noise in the damaged devices, but it is a very minor effect.
3. Effect of CDS Signal Processing
In order to more completely determine the effect of noise increases on a CCD system, we calculated the effect of correlated double sampling on the noise signal and arrived at a total equivalent input noise.
The output transistor of a CCD is operated in a source follower configuration with a load resistance $R_L$, which is typically 15-30 kΩ. We multiplied our measured drain current noise spectra by a load resistance of 20 kΩ to get output noise voltage spectra and extended the frequency range by assuming that the noise was constant (white) above 50 kHz. These extended voltage spectra were then multiplied by the transfer function of a dual slope integrator (DSI), integrated over frequency (Equation (44)), divided by the gain of the DSI processor (Equation (45)), and finally divided by the output sensitivity of the transistor. The output sensitivity $S_o$ is the voltage at the output per electron of charge deposited on the floating node and is equal to the gain of the source follower $G$ divided by the capacitance $C_n$ of the floating node:
$$ S_o = \frac{qG}{C_n} \qquad (58) $$
where $q$ is the charge on an electron. Kim et al. [42] measured the sensitivity of CCDs fabricated in the same lot as the devices we tested and using the same type of transistor for the output transistors. They found $S_o$ to be about 1.3 μV/e⁻, and we used this value in our calculations. Figure 52 shows the results of these calculations for the noise spectra of the most damaged transistor over a range of temperatures and DSI
FIGURE 52. Total input-referred noise vs. sampling rate for the highly damaged transistor (2.7 × 10⁹ protons/cm²) over a range of operating temperatures. [Horizontal axis: sampling frequency (Hz).]
sampling periods. The total input-referred noise (in electrons) is plotted as a function of the sampling frequency for each temperature. The sampling frequency $f_s$ is calculated from the sampling period $T_s$ (see Figure 44) and is given by $1/(2T_s)$, since two samples are required for each output value. We see from Figure 52 that for high sampling frequencies the total noise drops as the sampling rate is reduced. The noise at high sampling frequencies is due mainly to the high-frequency white noise; the reduction occurs because the bandwidth of the DSI processor is decreasing. (The noise is higher for the 201 K and 214 K curves in the high-frequency range because these spectra had higher thermal noise due to a lower output resistance at the operating point used.) The bandwidth varies linearly with sampling frequency, and we see the total noise varying linearly as well. We can use the linear bandwidth variation to make a simple prediction of the total noise variation by multiplying the noise spectrum by frequency. Note the similarity between Figures 49 and 52. The fact that the noise drops at lower sampling frequencies has often been used by CCD operators to improve noise performance by slowing down the readout rate, but the improvement is limited at lower sampling frequencies. At low sampling frequencies, in the absence of g-r noise (i.e., at 214 K), the noise is dominated by the flicker noise and the total noise flattens out because the input noise rises as $1/f$, so the noise-bandwidth product is constant. For transistors with significant g-r noise (i.e., all the curves above 253 K), the total noise increases at lower frequencies as the sampling frequency approaches the upper side of the g-r peak. The g-r noise has a Lorentzian spectrum and varies as $1/f^2$ above the g-r peak, so the noise-bandwidth product is inversely proportional to sampling frequency in this region.
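The bookkeeping that turns an output noise voltage into electrons, as described in this subsection, can be sketched as follows. The source follower gain, node capacitance, output noise voltage, and DSI gain used here are hypothetical values chosen for illustration; only the 1.3 μV/e⁻ sensitivity and the relation $f_s = 1/(2T_s)$ come from the text.

```python
ELECTRON_CHARGE = 1.602176634e-19  # C

def output_sensitivity(follower_gain, node_capacitance):
    """Output sensitivity q*G/C in volts per electron."""
    return ELECTRON_CHARGE * follower_gain / node_capacitance

def sampling_frequency(t_s):
    """Two samples of length t_s are needed per output value."""
    return 1.0 / (2.0 * t_s)

def input_referred_electrons(v_out_rms, dsi_gain, sensitivity):
    """Refer the rms noise at the DSI output back to electrons on the floating node."""
    return v_out_rms / dsi_gain / sensitivity

# Hypothetical follower: gain 0.8 and 0.1 pF node capacitance, close to 1.3 uV/e-.
s_o_example = output_sensitivity(0.8, 0.1e-12)

f_s = sampling_frequency(10e-6)                              # 10 us sampling period
noise_e = input_referred_electrons(7.8e-6, 2.0, 1.3e-6)      # hypothetical 7.8 uV rms, gain 2
```

Dividing by the DSI gain before dividing by the sensitivity keeps the result independent of the integrator time constant, consistent with the earlier observation about the SNR.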
Therefore, for g-r noise dominated transistors, the speed/noise tradeoff reverses in the frequency region just above the g-r peak. In this region, the noise performance actually degrades with slower sampling rates. At sampling frequencies below the g-r peak, the noise will begin to improve again because the Lorentzian noise spectrum flattens out. This effect can be seen in the curve at 201 K, which shows a small bump at about 500 Hz due to a small g-r peak in the spectrum caused by the 0.22 eV trap.
Finally, noting that the total noise is a strong function of the thermal noise level at high sampling frequencies, we performed a calculation to find the total noise for a transistor operated in the forward active region. In this operating region the thermal noise current is considerably lower due to the higher effective channel resistance; thus we expect the total integrated noise to be lower. Because the output transistor of a CCD is usually operated in the forward active region, this would give a better idea of the actual noise in a typical CCD system. For this purpose we created simulated noise
spectra based on the models plotted in Figure 51 but with the white noise reduced to a level calculated from Equation (34). The transconductance $g_m$ was estimated at 70 μS from room temperature measurements. Figure 53 shows the results. In this figure the noise is plotted as a function of the pixel rate $R_p$ instead of the sampling frequency. The pixel rate is generally lower than the sampling frequency and can be calculated from
$$ R_p = \frac{1}{2T_s + T_o} \qquad (59) $$
where $T_s$ is the sampling time and $T_o$ is a fixed overhead time. The overhead time includes allowances for things like capacitor settling times and the digitization time, which are necessary for proper operation of the CDS processor. In our calculations we assumed an overhead of 1 μs. Including this time gives a more realistic picture of the achievable noise performance. Adding the overhead means that the pixel rate asymptotically approaches $1/T_o$ at very short sampling periods; thus the noise climbs faster near this value. We conclude from Figure 53, in agreement with previous findings [Holland et al., 1991; Murata-Seawalt et al., 1990], that only a very minor increase in read noise can be expected from radiation damage at the normal operating temperature of most CCDs, including the FUSE FES; therefore, it is of little concern.

FIGURE 53. Total equivalent input noise at 210 K for damaged and undamaged devices from simulated spectra for a transistor in the forward active region. [Horizontal axis: pixel rate (Hz); curve label: 2.7 × 10⁹ protons/cm².]
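The pixel-rate relation of Equation (59) is simple enough to tabulate directly. In this sketch the function name and the example sampling periods are hypothetical; the 1 μs overhead is the value assumed in the text.

```python
def pixel_rate(t_s, t_overhead=1e-6):
    """Pixel rate 1 / (2*T_s + T_o) in Hz, for sampling time t_s and overhead t_overhead (s)."""
    return 1.0 / (2.0 * t_s + t_overhead)

rate_10us = pixel_rate(10e-6)     # 10 us sampling time
rate_limit = pixel_rate(1e-9)     # very short sampling time, approaching 1/T_o = 1 MHz
```

For a 10 μs sampling time the pixel rate comes out slightly below the 50 kHz sampling frequency, and no choice of sampling time can push it past $1/T_o$.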
VII. CONCLUSIONS
We have examined the effects of high-energy proton radiation on scientific charge coupled devices and evaluated the severity of the damage at various levels of proton fluence. For buried channel devices which operate in the region below the semiconductor-oxide interface, we found that the dominant factor in the damage effects was an increase in the number of bulk trapping states. We have seen evidence for traps at several energy levels, summarized in Table 7. Because of the exponential temperature dependence of the time constants of the trapping mechanism, the damage effects show a very significant dependence on temperature.
1. Dark Current
Our measurements of the dark current indicate that the increase in thermal generation in radiation-damaged devices is directly proportional to the proton fluence. The increases exhibited in the CCD we measured were equivalent to a room temperature dark current of 5.8 × 10^(-…) nA/cm² per unit fluence (3 MeV proton/cm²) for the surface component and 4.7 × 10^(-…) nA/cm² per unit fluence for the bulk component (MPP mode). The variation of the dark current with temperature indicated that both the surface and bulk components were due to trapping states very close to the midgap. The dark current showed a significant variation from pixel to pixel, but most of the variation was of a spatially fixed nature. We found that the spatial noise introduced by this variation could be largely eliminated by subtracting a dark frame of equal exposure length. With the fixed-pattern noise subtracted out, the residual noise of the surface current was equal to the Poisson noise inherent in the thermal generation process. In the bulk dark current case, the residual noise was greater than Poisson statistics, showing instead a linear dependence on the dark current level, but the total noise had still been reduced by 96%.

TABLE 7
BULK TRAPPING LEVELS OBSERVED IN PROTON-DAMAGED CCDs.

Energy (E_c - E_t) (eV)   Cross section σ_n (cm²)   Introduction rate^a (cm⁻¹)
0.11                      10⁻¹⁵                     7
0.17                      10⁻¹⁴                     84
0.22                      10⁻¹⁵                     8.7
0.30                      10⁻¹⁵                     28
0.43                      10⁻¹⁴                     56
0.55                      10⁻¹⁵                     6.7

Observed trap effects: increased low-temperature CTI; increased high-temperature CTI; increased high-temperature read noise; increased bulk dark current.

^a The introduction rate is given as the trap concentration (traps/cm³) per unit fluence (protons/cm²) of 3 MeV protons.
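The effect of dark subtraction on spatially fixed noise can be illustrated with a small simulation. This is a toy model under stated assumptions, not a reproduction of the measurements: the pixel-to-pixel pattern is uniform within ±30% of a hypothetical 100-electron mean, shot noise is approximated as Gaussian with variance equal to the mean, and the dark frame is taken to be noise-free (as if many darks of equal exposure length had been averaged).

```python
import math
import random
import statistics

random.seed(1)
N_PIXELS = 20000
MEAN_DARK = 100.0   # hypothetical mean dark signal per pixel, in electrons

# Fixed-pattern component: each pixel has its own repeatable mean dark signal.
pixel_means = [MEAN_DARK * (1.0 + 0.3 * random.uniform(-1.0, 1.0)) for _ in range(N_PIXELS)]

def expose(means):
    """One exposure: Gaussian approximation to Poisson shot noise on each pixel."""
    return [random.gauss(m, math.sqrt(m)) for m in means]

image = expose(pixel_means)
master_dark = pixel_means   # idealized noise-free dark frame of equal exposure length

residual = [i - d for i, d in zip(image, master_dark)]
raw_noise = statistics.pstdev(image)          # fixed pattern plus shot noise
residual_noise = statistics.pstdev(residual)  # shot noise only
poisson_noise = math.sqrt(MEAN_DARK)
```

In this model the residual noise after subtraction is close to the Poisson level, corresponding to the behavior reported for the surface component. A single noisy dark frame would add its own shot noise to the result, which this idealization ignores.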
2. Charge Transfer Efficiency
The charge transfer efficiency also displayed a linear dependence on the proton fluence, and we found that the CTE became more dramatically dependent on the combination of temperature and clocking speed due to its origin in trapping phenomena with temperature-dependent time constants. Our model of CTE based on trapping theory matched our experimental data very well and seems to be a useful tool for predicting the temperature/clock rate variation. Thus it can be used to optimize these operating conditions if the parameters of the dominant traps are known. According to our model, if the clock rate is optimized for the baseline FUSE FES operating temperature of -50°C, the increase in the CTI is equivalent to 5.1 × 10^(-…) per unit fluence.
3. Read Noise
Once again in the noise case, the damage factor was proportional to the proton dose. We observed an increase in the low-frequency noise in the damaged devices, which we attributed to generation-recombination noise caused by the bulk traps. The effect was quite sensitive to temperature, and we found that at -50°C the total noise increase was negligible.
4. CCDs in Satellite Missions
Based on our research, we made the following recommendations to the FUSE FES designers, which are applicable to any satellite mission employing CCDs.
The CCD should be shielded with the equivalent of 5 mm of aluminum. This will reduce the amount of proton radiation experienced by the device by a factor of about 30.
An MPP mode device or dithered clocking is highly desirable. This will dramatically reduce the amount of dark current and dark current noise.
Use dark subtraction with dark frames of the same exposure length taken as close as possible to the image frame in order to get the best reduction of the spatially fixed dark current noise.
Make operating temperature and clock rate commandable in flight so that the CTI can be optimized as damage progresses.
A notch-type device is highly desirable to reduce the low-signal CTI.
Make operating voltages commandable in flight. This will enable compensation for the flat-band voltage shift caused by trapped charge in the oxide.
Use a correlated double sampling (CDS) processing scheme to reduce the low-frequency read noise.
Table 8 shows the expected performance of the FUSE FES CCD at different stages of the mission if these recommendations are adopted. The results are based on an operating temperature of -50°C, 24-μm pixels, exposure lengths of 1 second, a parallel transfer rate of 12.5 kHz, and a pixel readout rate of 50 kpixels/s.

TABLE 8
FUSE FES CCD PERFORMANCE.

Characteristic                        Launch      1/2 mission life (1.5 years)   End of mission (3 years)
Total damage (displacements/cm²)      0           6.6 × 10^(…)                   1.3 × 10^(…)
Dark current (e⁻/pixel/s)             0.7         1.2                            1.7
Dark current noise (e⁻)               0.008       0.014                          0.019
Parallel charge transfer inefficiency 2.9 × …     5.1 × …                        7.4 × 10^(-…)
Read noise (e⁻)                       3           3                              3
REFERENCES
Banghart, E. K., Levine, J., Trabka, E., Nelson, E., and Burkey, B. (1991). A model for charge transfer in buried channel charge-coupled devices at low temperature, IEEE Trans. Electron Devices 38, 1162-1174.
Barth, J. and Stassinopoulos, E. G. (awaiting publication). Ionizing radiation environment concerns, in Single Event Criticality Analysis, NASA Goddard Space Flight Center report.
Benton, J. L. and Kimerling, L. C. (1982). Capacitance transient spectroscopy of trace contamination in silicon, J. Electrochemical Soc. 129, 2098-2102.
Bertram, W. J., Mohsen, A. M., Seater, D. A., Sequin, C. H., and Tompsett, M. F. (1974). A three level metallisation three-phase CCD, IEEE Trans. Electron Devices 21, 758-767.
Blouke, M., Janesick, J., Hall, J., and Cowens, M. (1981b). Texas Instruments (TI) 800x800 charge-coupled device (CCD) image sensor, in Solid State Imagers for Astronomy (J. Geary and D. Latham, Eds.), Proc. SPIE 290, pp. 6-15.
Blouke, M., Janesick, J., Hall, J., and Cowens, M. (1981b). Texas Instruments (TI) 800x800 charge-coupled device (CCD) image sensor, Solid State Imagers for Astronomy (J. Geary and D. Latham, Eds.), Proc. SPIE 290, pp. 165-173.
Blouke, M. M. and Robinson, D. A. (1981). A method for improving the spatial resolution of frontside-illuminated CCDs, IEEE Trans. Electron Devices 28, 251-256.
Boulade, O. (1997). The MEGACAM project: a wide field imaging camera at CFHT, Proceedings of the ESO Workshop "Optical Detectors for Astronomy" (J. W. Beletic and P. Amico, Eds.), Kluwer ASSL Series, Dordrecht, The Netherlands, pp. 203-208.
Boyle, W. S. and Smith, G. E. (1970). Charge coupled semiconductor devices, Bell Systems Tech. J. 49, 587-593.
Bredthauer, R. (1998). The state of CCD technology at Lockheed Martin Fairchild Systems, Proceedings of the ESO Workshop "Optical Detectors for Astronomy" (J. W. Beletic and P. Amico, Eds.), Kluwer ASSL Series, Dordrecht, The Netherlands, pp. 13-18.
Burke, B. E. and Gajar, S. A. (1991). Dynamic suppression of interface-state dark current in buried-channel CCDs, IEEE Trans. Electron Devices 38(2), 285-290.
Buss, D. D., Tasch, A. F. Jr., and Barton, J. B. (1979). In Charge-coupled Devices and Systems (M. J. Howes and D. V. Morgan, Eds.), New York: Wiley-Interscience, p. 121.
Chen, C. H. and Deen, M. J. (1998). Direct Calculation of the MOSFET High Frequency Noise Parameters, J. Vacuum Sci. Tech. A (Special Issue), A16(2), 855-859.
Chen, C. H. and Deen, M. J. (in press). High Frequency of MOSFET-I Modeling, Solid-State Electronics.
Chen, C. H., Deen, M. J., Yan, Z. X., Schroter, M., and Enz, C. (in press). High Frequency of MOSFET-II Experiments, Solid-State Electronics.
Coffa, S., Privitera, V., Priolo, F., Libertino, S., and Mannino, G. (1997). Depth profiles of vacancy- and interstitial-type defects in MeV implanted Si, J. Appl. Physics 81(4), 1639-1644.
Dale, C., Marshall, P., Cummings, B., Shamey, L., and Holland, A. (1993). Displacement damage effects in mixed particle environments for shielded spacecraft CCDs, IEEE Trans. Nuclear Sci. 40(6), 1628-1637.
Deen, M. J. (1993). Low frequency noise as a characterization tool for InP- and GaAs-based double-barrier resonant tunnelling diodes, Materials Sci. Engineering B, B20, 207-213.
Deen, M. J. (1993). Low Frequency Noise and Excess Currents Due to Trap-Assisted Tunneling in Double Barrier Resonant Tunneling Diodes, 23rd European Solid-State Device Research Conference (ESSDERC '93), Grenoble, France, pp. 355-358.
Deen, M. J., Ilowski, J. I., and Yang, P. (1995). Low Frequency Noise in Polysilicon-Emitter Bipolar Junction Transistors, J. Appl. Physics 77(12), 6278-6288.
Deen, M. J. and Quon, C. (1991). Characterization of Hot-Carrier Effects in Short Channel NMOS Devices Using Low Frequency Noise Measurements, 7th Biennial European Conference - Insulating Films on Semiconductors (INFOS '91) (Liverpool, United Kingdom, 2-5 April, 1991) (W. Eccleston and M. Uren, Eds.), U.K.: IOP Publishing Ltd., pp. 295-298.
Deen, M. J. and Zhu, Y. (1993). 1/f Noise in n-Channel MOSFETs at High Temperatures, AIP Conference Proceedings 285 - Quantum 1/f Noise & Other Low Frequency Fluctuations in Electronic Devices (P. H. Handel and A. L. Chang, Eds.), New York: AIP Press, pp. 165-188.
EEV CCD47 Series (1995). Data sheet from English Electric Valve (EEV) Inc., Chelmsford, United Kingdom.
Fleetwood, D. M., Winokur, P. S., Reber, R. A., Meisenheimer, T. L., Schwank, M. R., Shaneyfelt, M. R., and Riewe, L. C. (1993). Effects of oxide traps, interface traps, and "border traps" on metal-oxide-semiconductor devices, J. Appl. Physics 73(10), 5058-5074.
Gendreau, K. C., Prigozhin, G., Huang, R., and Bautz, M. (1995). A technique to measure trap characteristics in CCDs using x-rays, IEEE Trans. Electron Devices 42(11), 1912-1917.
Hardy, T., Deen, M. J., and Murowinski, R. (1997). Charge transfer efficiency in proton damaged CCDs, Proceedings of the ESO Workshop "Optical Detectors for Astronomy" (J. W. Beletic and P. Amico, Eds.), Kluwer ASSL Series, Dordrecht, The Netherlands, pp. 223-230.
Hardy, T., Murowinski, R., and Deen, M. J. (1997). The Effect of Proton Radiation on the Charge Transfer Efficiency in CCDs, in Optical Detectors for Astronomy (J. W. Beletic and P. Amico, Eds.), Netherlands: Kluwer Academic Press, pp. 223-230.
Hardy, T., Murowinski, R., and Deen, M. J. (1998). Charge Transfer Efficiency in Proton Damaged CCDs, IEEE Trans. Nuclear Sci. 45(2), 154-163.
Holland, A. D. (1993). The effect of bulk traps in proton irradiated EEV CCDs, Nuclear Instr. Meth. Physics Res. A326, 335-343.
Holland, A., Abbey, A., Lumb, D., and McCarthy, K. (1990). Proton damage effects in EEV charge coupled devices, in Photonics for Space Environments, Proc. SPIE 1344, pp. 378-394.
Holland, A., Holmes-Siedle, A., Johlander, B., and Adams, L. (1991). Techniques for minimizing space proton damage in scientific charge-coupled devices, IEEE Trans. Nuclear Sci. 38(6), 1663-1670.
Hopkins, I. H., Hopkinson, G. R., and Johlander, B. (1994). Proton-induced charge transfer degradations in CCDs for near-room temperature applications, IEEE Trans. Nuclear Science 41(6), 1984-1991.
Jaggi, B. and Deen, M. J. (1988). Low Temperature Operations of Silicon Charge Coupled Devices for Imaging Applications, Proceedings of the Symposium on Low Temperature Electronics and High Temperature Superconductors, Proc. Vol. 88-9 (S. I. Raider, R. Kirschman, H. Hayakawa, and H. Ohta, Eds.), New Jersey: Electrochemical Society Press, pp. 579-589.
Janesick, J. (1991). History and advancements of large area array scientific CCD imagers, in the notes from the short course An Introduction to Scientific Charge Coupled Devices, San Diego.
Janesick, J., Elliott, T., Bredthauer, R., Chandler, C., and Burke, B. (1988). Fano-noise limited CCDs, X-Ray Instrumentation in Astronomy II (L. Golub, Ed.), Proc. SPIE 982, pp. 70-95.
Janesick, J., Elliott, T., Daud, T., and McCarthy, J. (1985). Backside charging of the CCD, Solid State Imaging Arrays (K. Prettyjohns and E. Dereniak, Eds.), Proc. SPIE 570, pp. 46-79.
Janesick, J., Elliott, T., Dingizian, A., Gunn, J., Bredthauer, R., Chandler, C., and Westphal, J. (1989). New advancements in charge-coupled device technology: sub-electron noise and 4096 x 4096 pixel CCDs, in CCDs in Astronomy (G. H. Jacoby, Ed.), Astronomical Society of the Pacific Conference Series 8, Utah: Brigham Young University Press, pp. 18-39.
Janesick, J., Hynecek, J., and Blouke, M. (1981). Virtual phase imager for Galileo, Solid State Imagers for Astronomy (J. Geary and D. Latham, Eds.), Proc. SPIE 290, pp. 165-173.
Janesick, J., Soli, G., Elliott, T., and Collins, S. (1991). The effects of proton damage on charge-coupled devices, Charge Coupled Devices and Solid State Optical Sensors II, Proc. SPIE 1447, pp. 87-108.
Jones, B. K. (1994). Low-frequency noise spectroscopy, IEEE Trans. Electron Devices 41(11), 2188-2197.
Kandiah, K. and Whiting, F. B. (1991). Nonideal behavior of buried channel CCDs caused by oxide and bulk silicon traps, Nuclear Instr. Meth. Physics Res. A305, 600-607.
Kim, Ch.-K. (1979a). In Charge-coupled Devices and Systems (M. J. Howes and D. V. Morgan, Eds.), New York: Wiley-Interscience, pp. 10-14.
Kim, Ch.-K. (1979b). In Charge-coupled Devices and Systems (M. J. Howes and D. V. Morgan, Eds.), New York: Wiley-Interscience, p. 17.
Kim, Ch.-K. (1979c). In Charge-coupled Devices and Systems (M. J. Howes and D. V. Morgan, Eds.), New York: Wiley-Interscience, pp. 57-58.
Kim, H. E., Blouke, M., and Heidtmann, D. (1990). Effects of transistor geometry on CCD output sensitivity, Charge-Coupled Devices and Solid State Optical Sensors (M. Blouke, Ed.), Proc. SPIE 1242, pp. 195-204.
Kim, H., and Heidtmann, D. L. (1989). Characteristics of 1/f noise of the buried-channel charge-coupled device (CCD), Optical Sensors and Electronic Photography, Proc. SPIE 1071, pp. 66-74.
Kolev, P., and Deen, M. J. (1997). Constant Resistance DLTS in Submicron MOSFETs, Proceedings of the Fourth Symposium on Low Temperature Electronics and High Temperature Superconductivity, Proc. 97-2 (C. Claeys, S. I. Raider, M. J. Deen, W. D. Brown, and R. K. Kirschman, Eds.), New Jersey: The Electrochemical Society Press, pp. 147-158.
Kolev, P. V., and Deen, M. J. (1998). Constant-Resistance Deep Level Transient Spectroscopy in Sub-Micron Metal-Oxide-Semiconductor Field-Effect Transistors, J. Appl. Physics 83(2), 820-825.
Kolev, P., Deen, M. J., and Alberding, N. (1997). Digital Averaging and Recording of DLTS Signals, Proceedings of the Symposium on Diagnostic Techniques for Semiconductor Materials and Devices, Proc. 97-12 (P. Rai-Choudhury, J. Benton, and D. Schroder, Eds.), New Jersey: The Electrochemical Society Press, pp. 377-388.
Kolev, P., Deen, M. J., and Alberding, N. (1998). Averaging and Recording of Digital Deep-Level Transient Spectroscopy Transient Signals, Rev. Sci. Instr. 69(9), 2464-2474.
Kolev, P., Deen, M. J., Hardy, T., and Murowinski, R. (1998). A New Voltage Transient Technique for Deep Level Studies in Depletion-Mode Field-Effect Transistors, J. Electrochemical Soc. 145, 3258-3264.
Kolev, P., Hardy, T., Deen, M. J., and Murowinski, R. (1997). The use of constant-resistance DLTS to study proton radiation damage in CCD output MOSFETs, Proceedings of the Symposium on Diagnostic Techniques for Semiconductor Materials and Devices, Proc. 97-12 (P. Rai-Choudhury, J. Benton, and D. Schroder, Eds.), New Jersey: The Electrochemical Society Press, pp. 388-399.
Kosonocky, W. F., and Zaininger, K. H. (1979). In Charge-coupled Devices and Systems (M. J. Howes and D. V. Morgan, Eds.), New York: Wiley-Interscience, p. 218.
McGrath, R. D., Freeman, J. W., and Keenan, W. F. (1983). A 1024 x 1024 virtual phase CCD imager, IEDM Technical Digest, 749.
Meidinger, N., and Strüder, L. (1995). Radiation hardness of pn-CCDs for x-ray astronomy, IEEE Trans. Nuclear Science 42(6), 2066-2073.
Mohsen, A., and Tompsett, M. (1974). The effects of bulk traps on the performance of bulk channel charge-coupled devices, IEEE Trans. Electron Devices 21(11), 701-712.
Murata-Seawalt, D., Orbock, J. D., Delamere, W. A., Fowler, W., and Blouke, M. M. (1990). Proton radiation effects on multiple-pinned phase CCDs, Charge-Coupled Devices and Solid State Optical Sensors, Proc. SPIE 1242, pp. 84-93.
Murowinski, R., and Deen, M. J. (1993). Low Temperature Characteristics of Large Array Charge Coupled Devices, in Proceedings of the Symposium on Low Temperature Electronics and High Temperature Superconductivity, Proc. 93-22 (S. Raider, C. Claeys, D. Foty, and T. Kawai, Eds.), New Jersey: Electrochemical Society Press, pp. 209-220.
Murowinski, R. and Deen, M. J. (1995). CCDs for the Lyman FUSE Mission, Applications of Photonics Technology (G. A. Lampropoulos, J. Chrostowski, R. M. Measures, Eds.), New York: Plenum Press, pp. 181-196.
Murowinski, R., Deen, M. J., and Hardy, T. (1995). Charge transfer efficiency in low temperature charge coupled devices, Proceedings of the Symposium on Low Temperature Electronics and High Temperature Superconductivity, Proc. 95-9 (C. Claeys, S. Raider, R. Kirschman, and W. Brown, Eds.), New Jersey: The Electrochemical Society Press, pp. 299-314.
Murowinski, R., Linzhuang, G., and Deen, M. J. (1993a). Effects of space radiation damage and temperature on the noise in CCDs and LDD MOS transistors, IEEE Trans. Nuclear Sci. 40(3), 288-294.
Murowinski, R., Linzhuang, G., and Deen, M. J. (1993b). Effects of space radiation damage and temperature on CCD noise for the Lyman FUSE mission, Photonics for Space Environments (E. Taylor, Ed.), Proc. SPIE 1953, pp. 71-81.
Murowinski, R., Linzhuang, G., and Deen, M. J. (1993c). Effects of Radiation Damage and Temperature on CCD Noise for the Lyman FUSE Mission, Proc. SPIE Photonics for Space Environments - OE/Aerospace Science and Sensing 1953, Orlando, Florida, pp. 71-81.
Murowinski, R., Linzhuang, G., and Deen, M. J. (1993d). Temperature Dependence of Radiation-Induced Trapping Phenomena in CCDs, IEEE Workshop on Charge Coupled Devices, Waterloo, Ontario, p. 5.
Robbins, M. S., Roy, T., and Watts, S. J. (1992). Degradation of charge transfer efficiency of a buried channel charge coupled device due to radiation damage by a beta source, RADECS 91, IEEE Proc. 15, pp. 327-332.
Saks, N. S. (1977). Investigations of bulk electron traps created by fast neutron irradiation in a buried channel CCD, IEEE Trans. Nuclear Science 24, pp. 2153-2157.
Solomon, A. L. (1974). Parallel-transfer-register charge-coupled imaging devices, 1974 IEEE Intercon Technical Papers, Session 2, New York.
Stassinopoulos, E. G., and Barth, J. (1991a). FUSE elliptical orbit radiation environment, NASA Goddard Space Flight Center publication X-900-91-09, Greenbelt, Maryland.
Svensson, B. G., Jagadish, C., and Williams, J. S.
(1993). Generation of point defects in crystalline silicon by MeV heavy ions: dose rate and temperature dependence” Physicul Rei: Lett. 71(2), 1860-1863. Sze, S. M. (1981). Physics q’Semiconductor Devices, New York: Wiley-Interscience, p. 261. Theuwissen, A. (1997). Modular CCD concept for large area CCD imagers, Proceedings of the ESO Workshop “Optical Defectors for Asrronomy” (J. W. Beletic and P. Amico, Eds.), Kluwer ASSL Series, Dordrecht, The Netherlands, pp. 37-44. TMA MEDIC1 (1994a). Two-Dimensional Device Simulation Program, Version 2.0, Technology Modeling Associates, Inc., Palo Alto, California. TMA TSUPREM-4 (1994b). Two-Dimensional Process Sirnulation Program, Version 6.1, Technology Modeling Associates, Inc., Palo Alto, California. Tompsett, M. F. (1973). The quantitative effects of interface states on the performance of charge-coupled devices, IEEE Trans. Electron Devices 20( l), 45-55. Tompsett, M. F., Amelio, G. F., and Smith, G. E. (1970). Bell Sysrems Tech. J . 49, 593. van der Ziel, A. (1980a). Noise in Solid State Deuices and Circuits, New York: WileyInterscience, p. 76. van der Ziel, A. (1986b). Noise in Solid Sture Decices and Circuits, New York: WileyInterscience, p. 121. Van Lint, V. A. J. (1980a). Mechanisms of Radiation Efects in Electronic Materials, New York: Wiley-Interscience, p. 14. Van Lint, V. A. J. (1980b). Mechanisms ofRadiation Effects in Elecrronic Materials, New York: Wiley-Interscience, p. 85. Van Lint, V. A. J. (1980~).Mechanisms of Radiation Efects in Electronic Materials, New York,
RADIATION DAMAGE ON SCIENTIFIC CHARGE COUPLED DEVICES
93
Wiley-lnterscience, p. 222. Van Lint, V. A. J. (1987). The physics of radiation damage in particle detectors” Nuclear Instr. Meth. Physics Res. A253,453-459. Walden, R. H., Krambek, R. H., Strain, R. S., McKenna, J., Schryer, N. L., and Smith, G . E. (1972). A buried channel charge coupled device, Bell Systems Tech. 51, 1635-1640. Wells, D. C., Greisen, E. W., and Harten, R. H. (1981). FITS: A Flexible Image Transport System, Astronomy and Astrophysics Suppl. Ser. 44, 363-370. White, M. H., Lampe, D. R., Blaha, F. C., and Mack, I. A. (1974). Characterization of surface channel CCD image arrays at low light levels. IEEE .I Solid . Stute Circuits 9(2), 1-14. Zayer, I., Chapman, I., Duncan, D., Kelly, G., and Mitchell, K. (1993). Results from proton damage tests on the Michelson Doppler Imager CCD for SOHO, Charge Coupled Devices and Solid Slate Optical Sensors 111 (M. M. Blouke, Ed.), Proc. SPIE 1900, pp. 97-107. Zhao. X., Deen, M. J., and Tarof, L. (1996). Low Frequency Noise in Separate Absorption, Grading, Charge and Multiplication (SAGCM) Avalanche Photodiodes, Electronics Lett. 32(3), 250-252. Ziegler, J . , Biersack, J., and Littmark, U. (1985). The Stopping und Range oflons in Solids, New York: Pergamon Press. Zhu, Y., Deen, M. J., and Kleinpenning, T. M. (1992). A New l/f Noise Model for Metal-Oxide-Semiconductor Field-Effect Transistors in Saturation and Deep Saturation, J . Appl. Physics 72( 12), pp. 5990-5998.
LIST OF ABBREVIATIONS AND SYMBOLS

ADC        analog-to-digital converter
AR         antireflection
CCD        charge coupled device
CDS        correlated double sampling
CFHT       Canada France Hawaii Telescope
CR-DLTS    constant resistance deep level transient spectroscopy
CS-CDS     clamp-and-sample correlated double sampling
CTE        charge transfer efficiency
CTI        charge transfer inefficiency
DLTS       deep level transient spectroscopy
DSI        dual slope integration
EPER       extended pixel edge response
FUSE       Far Ultraviolet Spectrographic Explorer
FES        Fine Error Sensor
FET        field effect transistor
g-r        generation-recombination
HEO        highly elliptical orbit
HST        Hubble Space Telescope
LDD        lightly doped drain
LEO        low earth orbit
LG         last gate
MIS        metal-insulator-semiconductor
MMT        Multiple Mirror Telescope
MOS        metal-oxide-semiconductor
MOSFET     metal-oxide-semiconductor field effect transistor
MPP        multiple pinned phase
OD         output drain
OS         output source
O-V        oxygen-vacancy
P-V        phosphorus-vacancy
QE         quantum efficiency
RD         reset drain
RG         reset gate
SW         summing well
V-V        vacancy-vacancy (divacancy)
WF/PC      Wide Field and Planetary Camera

A*         effective Richardson constant (252 A/cm²/K² for n-type (100) Si)
           pixel area (cm²)
           speed of light in vacuum (2.998 × 10¹⁰ cm/s)
           capacitance (F)
           trapping state density (cm⁻²)
           electron
           electron emission probability
           energy (eV)
           conduction band energy (eV)
           Fermi energy (eV)
           intrinsic Fermi energy (eV)
           trap energy (eV)
           electric field (V/cm)
f          frequency (Hz)
           flicker noise corner frequency
g_d        drain conductance (S)
g_m        transconductance (A/V)
G          carrier generation rate (s⁻¹)
h          Planck's constant (4.136 × 10⁻¹⁵ eV-s)
J_b        bulk dark current density (A/cm²)
J_s        surface dark current density (A/cm²)
k          Boltzmann's constant (8.617 × 10⁻⁵ eV/K)
n          free electron concentration (cm⁻³)
n_i        intrinsic free electron concentration (cm⁻³)
           number of occupied traps at steady state
           number of occupied traps
           acceptor concentration (cm⁻³)
           effective density of states in the conduction band (cm⁻³)
           donor concentration (cm⁻³)
           trap concentration (cm⁻³)
           effective density of states in the valence band (cm⁻³)
           free hole concentration (cm⁻³)
           electronic charge (1.602 × 10⁻¹⁹ C)
           load resistance (Ω)
           surface generation velocity (cm/s)
           noise spectral density (A²/Hz)
           time (s)
           temperature (K)
           thermal velocity (cm/s)
           applied voltage (V)
           charge packet volume (cm³)
           threshold voltage (V)
           depletion depth (cm)
           wavelength (cm)
           frequency (Hz)
           electron trapping cross section (cm²)
           hole trapping cross section (cm²)
           carrier lifetime (s)
           effective lifetime in a depletion region (s)
           electron capture time constant (s)
           electron emission time constant (s)
           integrator time constant (s)
           intrinsic Fermi potential (V)
           surface potential (V)
           Fermi potential (V)
ACKNOWLEDGMENTS We are grateful to the Canadian Space Agency and Dr. John Hutchings, Dominion Astrophysical Observatory (DAO), for their support of this work, and to the National Research Council, the Natural Sciences and Engineering Research Council (NSERC) and Simon Fraser University for partial financial support. We would also like to thank the members of the Integrated Devices and Circuits Laboratory at Simon Fraser University for their comments and suggestions during our research meetings and for their
support of this work. In particular, we are grateful to Dr. Plamen Kolev for assistance with the transistor measurements and for the results of his Deep Level Transient Spectroscopy (DLTS) experiments and to Dr. Sergei Rumyantsev for a very helpful discussion on measuring the low-frequency noise in transistors.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 106
CAD Using Green's Functions and Finite Elements and Comparison to Experimental Structures for Inhomogeneous Microstrip Circulators

CLIFFORD M. KROWNE
Microwave Technology Branch, Electronics Science and Technology Division, Naval Research Laboratory, Washington, DC 20375
I. Introduction to CAD for Microstrip Circulators
II. Ferrite Physical and Chemical Attributes Relevant to Microstrip Circulator Material Selection
III. Processing of Ferrite Materials for Microstrip Circulator Structures
   A. Hybrid Circuit Compatible Techniques
   B. Magnetless Compatible Techniques
   C. Monolithic Circuit Compatible Techniques
IV. Microstrip Circulator Considerations for Modeling
   A. Ferrite Material Parameters
   B. Matching Sections
   C. First-Order Layer Effect Estimation
   D. First-Order Loss Effect Estimation
V. Setup Formulas for Numerical Evaluation of Microstrip Circulators
   A. Formulas for Static Internal Magnetic Field
   B. RF Formulas from Green's Function Approach
   C. RF Formulas from Finite Element Approach
VI. Numerical Results and Comparison to Experiment for Microstrip Circulators
   A. Static Internal Magnetic Field Results
   B. RF Field and s-Parameters Using Green's Functions
   C. RF Field and s-Parameters Using Finite Elements
VII. Conclusions
References
Volume 106, ISBN O-I?-014748-7. Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved. ISSN 1076-5670.

I. INTRODUCTION TO CAD FOR MICROSTRIP CIRCULATORS

Computer aided design, or CAD, has become indispensable in the microwave electronics industry, an industry devoted to producing reliable, lightweight, compact, and efficient circuits for commercial and military applications. Use of CAD to solve physics and engineering design problems for components used in electromagnetic communication equipment is now the norm for such components as filters, resonators, isolators, limiters, oscillators, transmission lines, couplers, attenuators, and amplifiers, as well as circulators. The microstrip circulator is a basic building-block device in many microwave and millimeter wave circuits because it provides nonreciprocal port control in the planar format, presently the technology of choice for all advanced hybrid and monolithic integrated circuits. What has changed today is an ever-growing need to avoid the expensive trial-and-error design methods previously used to fabricate microstrip circulators (and their cousins, stripline circulators), and to ensure high reliability under stringent performance demands. This is especially important when hundreds, thousands, or even hundreds of thousands of devices are required for insertion or placement into circuits. Design tools for circulators in today's potential market of missiles, planes, ships, satellites, trains, and cars must finally catch up with demand and move beyond the antiquated approaches of thirty years ago. That is the interval separating the recent developments for nonuniform circulators presented in this chapter from the original introduction of theoretical Green's function methods for uniform devices. Our focus here is on CAD techniques which lead to realistic engineering design, which implies that all the modeling tools used must allow rapid redesign trials of device structures. The Green's function approaches meet this criterion since they allow analytical integration at several stages of derivation and lead to expressions which may be evaluated to determine electromagnetic fields and s-parameters. Supporting use of finite element solvers, either to determine the static magnetic bias field or to serve as ancillary and corroborating simulators that calculate fields for irregularly shaped and circularly shaped circulators, respectively, is also part of a complete CAD effort. The fastest run times may be expected from two-dimensional theoretical models put into computer code form.
A dyadic Green's function approach for two dimensions has been discussed in Volume 98 of this series [Krowne, 1996a]. Where some theoretical assessment of the effect of finite substrate thickness on the electromagnetic field structure is desired, or a guide to developing a three-dimensional computer code [Krowne, 1996b], the second part of the contribution in Volume 98 is recommended. Both the two- and three-dimensional models assume hard magnetic walls between the internal ferrite circulator puck and the exterior substrate material, which could be a dielectric, nonconductive semiconductor, unbiased ferrite, or even a partially biased ferrite in an actual device structure. The hard magnetic wall ensures that the internal circulator field does not interact with the environment beyond the puck perimeter. This approach simplifies modeling and is found to approximate reality very closely. It also has the great advantage of directing attention to the entry and exit of electromagnetic propagating waves through the device ports. However, if it is desired to find the (hopefully small) effect of the surrounding medium on the internal field and device performance, the magnetic wall must be made somewhat leaky or penetrable. This has been treated in Volume 103 of this series [Krowne, 1998a].

Often in the discussion of CAD, scant attention is paid to those factors which allow CAD to be applied, namely the special properties of the materials and structures which make CAD feasible and facilitate its application. Those factors will be included as well. Section II will cover the molecular physical and chemical properties of ferrite materials so that compositions may be selected which are acceptable for planar microstrip circulators. Section III will cover fabrication methods for making planar ferrite layers, including hybrid, magnetless, and monolithic circuit compatible techniques. Essential considerations for modeling, such as ferrite material parameter requirements, matching sections needed to take the circulator port impedance to the external circuit impedance, multilayer first-order effects, and loss effects, are all treated in Section IV. The loss effect derivations provide the first simple, detailed, and complete electromagnetic presentation in the literature as directly applied to the planar microstrip circulator. Equations needed to interface the Green's function or finite element computation engines with other routines under the control of the user, so that the static internal (bias) magnetic field, rf field, or s-parameters can be found, are given in Section V. Finally, Section VI gives the simulation results obtained from numerical modeling and the experimental results found from measurements, and compares the two.
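As a rough illustration of what a matching section does, the sketch below uses a single quarter-wave line. This is only one textbook option and is not claimed to be the matching network designed in Section IV. The 20 Ω port impedance, the 50 Ω external impedance, and all function names are assumptions chosen for illustration.

```python
import math

def quarter_wave_impedance(z_port, z_ext):
    """Characteristic impedance (ohms) of a single quarter-wave matching line."""
    return math.sqrt(z_port * z_ext)

def input_impedance(z_line, z_load, electrical_length_rad):
    """Input impedance of a lossless line of impedance z_line terminated in z_load."""
    t = math.tan(electrical_length_rad)
    return z_line * (z_load + 1j * z_line * t) / (z_line + 1j * z_load * t)

def reflection_magnitude(z_in, z_ref):
    """Magnitude of the reflection coefficient seen from a z_ref system."""
    return abs((z_in - z_ref) / (z_in + z_ref))

# Hypothetical values: a 20-ohm circulator port matched to a 50-ohm circuit.
Z_PORT, Z_EXT = 20.0, 50.0
z_t = quarter_wave_impedance(Z_PORT, Z_EXT)

# At the design frequency the line is 90 degrees long; 20% below it, 72 degrees.
gamma_center = reflection_magnitude(input_impedance(z_t, Z_PORT, math.pi / 2), Z_EXT)
gamma_offset = reflection_magnitude(input_impedance(z_t, Z_PORT, 0.8 * math.pi / 2), Z_EXT)

print(f"transformer impedance: {z_t:.2f} ohm")
print(f"|reflection| at center: {gamma_center:.2e}, 20% below center: {gamma_offset:.3f}")
```

The off-center value shows why a single section gives a narrowband match; wider bandwidths generally call for more elaborate matching structures.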
II. FERRITE PHYSICAL AND CHEMICAL ATTRIBUTES RELEVANT TO MICROSTRIP CIRCULATOR MATERIAL SELECTION

Ferrimagnetic oxides, or ferrite oxides, are well suited as the nonreciprocal material since they are insulators, which allows them to act as part of the substrate for the circulator structure. Three crystalline types have been found particularly suitable for bulklike, thick film, or thin film processing: the garnet, spinel, and hexagonal crystalline structures. There are a few garnet systems, quite a number of spinel systems, and several hexagonal systems, so selected examples are discussed in the following text to familiarize the reader with the various crystalline structures. Table 1 gives the name, composition, structure, saturation magnetization 4πM_s, relative dielectric constant ε', loss tangent tan δ, ferrimagnetic resonant linewidth ΔH contributions, Curie temperature T_c, magnetic anisotropy
TABLE 1
PROPERTIES OF BULK FERRITE MATERIALS.
Entries are listed column by column in row order. A slash separates the three row blocks of the printed table: YIG through the first Li-Ti entry; the second Li-Ti entry through MgMn; Y-Gd and YCaVIG.

Code: YIG; NiFerrite; Ni-Al; Li-Ti / Li-Ti; Ni-Zn; BaM; SrM; SrM(0.04Al); Ni2W; Li-Ti; MgMn / Y-Gd; YCaVIG
Composition: Y3Fe5O12; NiFe2O4; Ni(Al,Fe)2O4; Li0.555Mn0.1Ti0.11Fe2.235O4 / Li0.755Mn0.1Ti0.51Fe1.635O4; (Ni,Zn)Fe2O4; BaFe12O19; SrFe12O19; Sr(Al,Fe)12O19; BaNi2Fe16O27; Li0.615Mn0.1Ti0.28Cu0.05Fe1.955O4; 56.48% MgO, 6.26% MnO, 37.26% Fe2O3 / Y1.7Gd1.3Fe5O12; Y-Ca-V iron garnet (Fe4.2 per formula unit)
Structure: Garnet; Spinel; Spinel; Spinel / Spinel; Spinel; Hexagonal; Hexagonal; Hexagonal; Hexagonal; Spinel; Spinel / Garnet; Garnet
4πM_s, 25°C (Gauss): 1780; 3000; 2100; 2750 / 1300; 5000; 4000; 4000; 2500; 3600; 2250; 2300 / 1000; 1400
ε' (X-band): 15; 12.8; 12.6 / 12.5; 22.7; 23; 16; 17.4; 15.8; 12.7 / 15.4; 15
tan δ (X-band): 0.0002; 0.0005; 0.001 / 0.0005; 0.0008; 0.001; 0.001; 0.0003; 0.0005; 0.0002 / 0.0002; 0.0005
ΔH, 25°C, X-band (Oe): 40; 350; 460 / 160; 400; 500; 450; 320 / 100
ΔH_k, 25°C, X-band (Oe): 1.4; 12.4; 6.1 / 6.0; 2.44; 2.44; 1.5; 2.5 / 6.0; 0.8
Curie temperature (°C): 285; 585; 540 / 375; 470; 480; 430; 510; 520; 300 / 285; 120
H_anis = 2K_1/M_s (Oe): 80; 425; 600 / 300; 17,000; 19,000; 23,500; 12,700; 800; 350 / 200; Low
Circulator application: X-band (6-12 GHz & 35 GHz); Ku-band (12-20 GHz); X-band (7-11 GHz); Broadband (6-18 GHz, co-fired); mmW (30-94 GHz); mmW (30-94 GHz); mmW (30-90 GHz); 35 GHz & 77 GHz; Ka-band; X-band (Latching); 6-15 GHz; C-band; X-band (LEC)
field H_anis, and frequency value or range where the circulator is operated using the ferrite. The table lists three garnet systems, seven spinel systems, and four hexagonal systems.

A magnetic garnet system (Y3Fe5O12) of ferrimagnetic oxides is the most stable of the three types (see Fig. 1 for the basic crystal structure [Standley, 1972]). Dielectric losses are lower than those of the spinels, partly because the cations are in the 3+ state, which makes it less likely for Fe2+ to form. Magnetocrystalline anisotropy and magnetostriction are also manageable quantities. Additionally, a great number of substitutions are available for all three sites and may be used to adjust magnetic, microwave, and optical properties. The problem with the garnets is their near-refractory thermal properties. Garnet systems with lower sintering temperatures would be the YCaV and CaVBi iron garnet families, if reduced magnetizations are acceptable and, for the latter family, the high volatility of Bi is tolerated.
FIGURE 1. Garnet ferrite crystal structure. Coordination of positive ions about oxygen ions in YIG. (Standley, 1972.)
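The saturation magnetizations in Table 1 are often easier to compare with an operating band once they are expressed as a frequency. A minimal sketch, assuming the free-electron gyromagnetic ratio of about 2.8 MHz/Oe (g ≈ 2; real ferrites deviate slightly), converts 4πM_s to the magnetization frequency f_m = γ·4πM_s, and an anisotropy field to the zero-field resonance estimate γ·H_anis with demagnetizing fields neglected. The function names are hypothetical, and the 17,000 Oe value is used only as a representative hexagonal-ferrite anisotropy field of the size listed in Table 1.

```python
# Assumed gyromagnetic ratio (gamma / 2*pi) for g = 2, in MHz per oersted.
GAMMA_MHZ_PER_OE = 2.8

def magnetization_frequency_ghz(four_pi_ms_gauss):
    """f_m = gamma * 4*pi*M_s, returned in GHz."""
    return GAMMA_MHZ_PER_OE * four_pi_ms_gauss / 1000.0

def anisotropy_frequency_ghz(h_anis_oe):
    """Zero-field resonance estimate gamma * H_anis in GHz (demagnetization ignored)."""
    return GAMMA_MHZ_PER_OE * h_anis_oe / 1000.0

# 4*pi*M_s values (gauss) taken from Table 1 for three of the listed materials.
four_pi_ms = {"YIG": 1780, "NiFerrite": 3000, "Ni-Zn": 5000}

for code, value in four_pi_ms.items():
    print(f"{code}: f_m = {magnetization_frequency_ghz(value):.2f} GHz")

# Representative hexagonal-ferrite anisotropy field (assumed value, see lead-in).
print(f"gamma * H_anis = {anisotropy_frequency_ghz(17000):.1f} GHz")
```

The last figure, in the tens of gigahertz, is the reason large anisotropy fields are attractive for millimeter wave operation.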
A nickel ferrite system (NiFe2O4) is usually a stable spinel [Dionne, 1987; Dionne et al., 1995], with nickel staying in the 2+ state and mostly occupying octahedral sites (see Fig. 2 for the basic crystal structure [Pollert, 1984]). The occurrence of Fe2+ is not much of a concern because Ni3+ appears only sporadically, in special cases like Li+ substituted NiO [Dionne, 1990]. Dielectric losses are still low for the spinels. The main drawback to the nickel ferrite system is the large magnetostriction constant λ, which makes the anisotropy and hysteresis loop behavior of the substance very stress-sensitive. Magnetostriction compensation using manganese in the 3+ state (Mn3+) is helpful, and should be used once suitable deposition parameters have been determined [Dionne and West, 1987]. The overall magnetization 4πM can be reduced by diluting the magnetization of the octahedral sublattice using Al3+. Magnetization can be increased, on the other hand, by diluting the tetrahedral sublattice, substituting Zn2+ for Ni2+, with a concomitant rise in Fe3+ at the octahedral sites that further raises 4πM. As a consequence of Zn2+ introduction, the Curie temperature T_c is reduced (Fig. 3), increasing temperature sensitivity. Zinc addition does have an extra benefit, the reduction of the sintering temperature. The resulting magnetization rise due to Zn substitutions is particularly welcome at millimeter wave frequencies.

FIGURE 2. Spinel ferrite crystal structure. Formation of the spinel structure by the occupation of tetrahedral and octahedral holes by cations: big open circles = oxygen anions; solid small circles = cations in tetrahedral positions (A); small open circles = cations in octahedral positions (B). (Pollert, 1984.)
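The sublattice dilution argument can be made concrete with the collinear two-sublattice (Néel) picture, in which the net moment per formula unit is the octahedral sum minus the tetrahedral sum. The sketch below is an idealization under stated assumptions: spin-only moments (Fe3+ = 5, Ni2+ = 2, Zn2+ = Al3+ = 0 Bohr magnetons), a fully inverse NiFe2O4, all Zn2+ on tetrahedral sites, all Al3+ on octahedral sites, and no spin canting, so it is only indicative for small substitution levels. The function names are hypothetical.

```python
# Assumed spin-only ionic moments in Bohr magnetons.
MOMENT = {"Fe3+": 5.0, "Ni2+": 2.0, "Zn2+": 0.0, "Al3+": 0.0}

def net_moment(tetra, octa):
    """Collinear Neel model: octahedral sublattice sum minus tetrahedral sum."""
    m_tetra = sum(MOMENT[ion] * amount for ion, amount in tetra.items())
    m_octa = sum(MOMENT[ion] * amount for ion, amount in octa.items())
    return m_octa - m_tetra

def ni_zn_ferrite(x):
    """(Zn_x Fe_1-x)[Ni_1-x Fe_1+x]O4: Zn2+ dilutes the tetrahedral sublattice."""
    return net_moment({"Zn2+": x, "Fe3+": 1.0 - x},
                      {"Ni2+": 1.0 - x, "Fe3+": 1.0 + x})

def ni_al_ferrite(y):
    """(Fe)[Ni Fe_1-y Al_y]O4: Al3+ dilutes the octahedral sublattice."""
    return net_moment({"Fe3+": 1.0},
                      {"Ni2+": 1.0, "Fe3+": 1.0 - y, "Al3+": y})

print(f"NiFe2O4:           {ni_zn_ferrite(0.0):.1f} Bohr magnetons")
print(f"Zn substitution 0.3: {ni_zn_ferrite(0.3):.1f} Bohr magnetons")
print(f"Al substitution 0.2: {ni_al_ferrite(0.2):.1f} Bohr magnetons")
```

Within this idealization the moment follows 2 + 8x for Zn and 2 - 5y for Al, which reproduces the direction of both trends described in the text; it says nothing about the accompanying drop in Curie temperature.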
FIGURE 3. Curie temperature versus 4πM_s for several microwave spinel ferrites.
Manganese (MnFe2O4) and manganese/magnesium ((MgMn)Fe2O4) systems are related to the Ni ferrite spinels, but with octahedral-tetrahedral exchange coupling much reduced. This is because Mn2+ has a bigger ionic radius (about 0.80 Å) than Ni2+, making the exchange coupling of octahedral site Mn2+ to tetrahedral-site and other Mn2+ ions much weaker. This results in a Curie temperature below that of all the other spinels, which may be unacceptably low where moderate circulator power dissipation is expected. Zn addition will further reduce T_c. For the MgMn system, there is still a temperature sensitivity limitation. Past theoretical study has indicated that obtaining both high 4πM and high T_c is unlikely [Dionne, 1988]. Dielectric losses may be very high in these compounds because of the charge transfer process Fe2+ + Mn3+ → Fe3+ + Mn2+ that stabilizes Fe3+ in octahedral sites. Also, the polaronic conduction mechanisms Fe2+ → Fe3+ + e− and Mn3+ ↔ Mn2+ − e− may occur. Such conductivity processes, although less apparent, are also found in MgMn ferrites, where Mg2+ ions prefer the octahedral sublattice sites. Uncertain ionic site locations make the final ferrite properties subject to firing schedule choices. Mn2+ in small amounts can suppress Fe2+ content, a very good feature, and Mn3+ substitutions can reduce magnetostriction, also a good result. However, large amounts of Mn cause problems. Two of the desirable properties of the Mn and MgMn family are the insensitivity to stress and the square hysteresis loop shape. It is thought at this time that the Mn ferrite family is not promising for use in microwave and millimeter wave circulators.

A lithium (Li0.5Fe2.5O4) spinel system displays high Curie temperatures owing to the larger amount of Fe3+ in both sublattices, which causes more antiferromagnetic exchange interactions. It is undesirable, though, in that Li+ is rather volatile. Above 1000°C, Li2O comes out of this spinel compound and it becomes iron-rich, creating Fe2+ and the phase α-Fe2O3. Dense, microwave-usable-quality Li ferrite spinel material can be encouraged using Bi2O3 as a flux to reduce sintering temperatures. However, Bi tends to segregate along grain boundaries, damaging mechanical strength.

Copper (CuFe2O4 and Cu0.5Fe2.5O4) spinel systems in bulk ceramic form have a chemistry that is difficult to control and so have rarely been used for microwave applications [Schieber, 1967]. Such problems are expected to persist when considering use in planar circulator structures. Like Li, Cu has a tendency to be volatile at high temperatures. Sintering temperatures are lower than for most spinels, and proper stoichiometry is difficult to attain. The Cu2+ ion also has significant Jahn-Teller behavior that pushes the lattice toward tetragonal symmetry and will produce large positive magnetostrictive effects when added to cubic lattices. Another negative characteristic of Cu spinel ferrites is the existence of polaronic conduction reactions similar to those found in Mn spinel ferrites, namely Cu2+ → Cu3+ + e− and Cu2+ ↔ Cu1+ − e− [Dionne, 1996].
Barium or strontium (BaFe12O19 and SrFe12O19, or the BaO·6Fe2O3 and SrO·6Fe2O3 group) hexagonal systems produce huge internal anisotropy bias fields, which makes them of tremendous interest for self-biasing structures where the use of external magnets can be forgone (see Fig. 4 for the basic crystal structure [Standley, 1972]). Magnetization is lowered by Al3+ substitutions. Diamagnetic In, Sc, or other ion replacements for Fe also lower 4πM. This reduced magnetization for uniaxial hexagonal ferrites is associated with an increase in the anisotropy field. H_anis reduction occurs when Co2+ ions, in combination with Ti4+ ions for electrical charge balance, are substituted. These hexagonal ferrites are referred to as BaM or SrM type materials. Sintering aids such as CaCO3 and Bi2O3 enable almost full densification at temperatures below 950°C. Al2O3 can be added as a grain growth inhibitor, yielding very high coercive fields H_c of up to 6000 Oe. Figure 5 shows a typical hysteresis curve for a hexagonal ferrite. For additional information and diagrams on the crystal structure of magnetic oxides, the reader is referred to Landolt-Börnstein [1970] and von Aulock [1965].

III. PROCESSING OF FERRITE MATERIALS FOR MICROSTRIP CIRCULATOR STRUCTURES

Quite a few methods have been examined to prepare bulk, thick film, and thin film ferrites compatible with integrated circuit planar technology [Webb, 1995]. Methods include tape casting (aqueous, organic, magnetic), roll compaction, pulsed laser deposition, spin spray and solution plating, liquid phase epitaxy, jet vapor deposition, co-firing, screen printing, hot isostatic pressing, direct chemical deposition, plasma vapor deposition, and cyclic dip and fire. The following text will discuss the primary classes of processing techniques: hybrid circuit compatible, magnetless circuit compatible, and monolithic circuit compatible. A number of particular processing methods are applicable to each class, so they have been grouped according to the circuit application for which they are best suited. The subsections to follow will discuss the motivation for each class and mention a few successful processing techniques.

A. Hybrid Circuit Compatible Techniques
Processing is done in a hybrid sense to produce ferrite in a planar configuration which will become part of a standalone circulator device,
106
CLIFFORD M. KROWNE
iron in fivefold position
0
iron in octahedral position
0
iron in tetrahedral position
0
@ ___-__ X
oxygen barium mirror plane c-axis, threefold symmetry center of symmetry
FIGURE 4. Hexagonal ferrite crystal structure. The unit cell of BaFe,,O,,. All atoms lie in mirror planes containing the c-axis, and atoms in one such plane are shown, together with the blocks S, S*, R, and R*. (Standley, 1972.)
CAD USING GREEN’S FUNCTIONS AND FINITE ELEMENTS
107
FIGURE5 . Hysteresis curve for a BaM hexagonal ferrite. (Provided by Gordon R. Harrison of EMS Technologies, Inc.)
which will eventully have to be externally bonded to other parts of the circuit. The least reliable method is of course wire bonding, and it is labor-intensive, although it can be mechanized. Tape casting is used to produce thick sheets of ceramic material, and pulsed laser deposition or jet vapor deposition is used to produce thin films. Tape casting allows sheets with thicknesses from 300 pm to 600 pm as required for microstrip circulators in the microwave frequency regime. It has proven to be an excellent high-volume, low-cost method for producing dielectric substrates, but little has been done in adapting the technique to make ferrite materials. The method entails first forming a slurry of the material with an organic binder and organic solvent. Because magnetic materials tend to flocculate, a dispersant must be used to make uniform sheets. A doctor blade is applied to spread the material in uniform thickness green sheets. The sheets are then heated to evaporate the organic materials and sintered at temperatures between 1200°C to 1500°C. Tape cast yttriumiron-garnet (YIG) has been produced this way, and is noteworthy since YIG is one of the most useful ferrite materials for mid-microwave frequency (C, X,and Ku bands) applications. Tapes are fully dense and exhibit magnetic and electric properties comparable to conventional bulk YIG. An alterna-
108
CLIFFORD M. KROWNE
tive technique of tape casting is to proceed with an aqueous approach, avoiding environmentally unfriendly organic materials and solvents. Problems with foaming and low viscosity are common for the aqueous approach, but they have been overcome. YIG material produced using the aqueous method of tape casting has excellent characteristics: fired density of 98% T.D., surface finish of 20 microinches, coercive field H, = 7 Oe, 4nM, = 1730G, and tan8 = 0.0002. Using post-tape cast rolling (calendaring), the fired density can be increased to 99.7% with excellent microwave properties. It is possible to take the tape cast sheets and roll them up and store them for four months without evidence of cracking, sticking, or loss of strength. The tapes of a single-component ferrite material are suitable for low-cost fabrication for some circulator configurations; others require multiple ferrite compositions. A two-material composition device could employ a high saturation magnetization M , inner circular plug (really a cylindrically shaped volume) embedded in a low M, host material for very broadband applications. Only a minimal gap between the two saturation magnetization material types can be tolerated so that subsequent metallization steps can be reliably carried out. It has been shown that this geometry can be produced by co-firing two different members of the lithium manganesetitanium ferrite family with 4nMS1= 2750G and 4nM,, = 1300G (Fig. 6). However, tape casting of the geometry would be a less expensive process. The tape cast approach casts thick 75-mil tapes of the constituent ferrites and reduces their thickness by calendaring to avoid firing shrinkage anisotropy. The inner and outer portions of the substrate are next stamped out
FIGURE 6. Circular boundary between two co-fired LiMnTi ferrites. (Provided by Ernst F. Schloemann of Raytheon Co.)
CAD USING GREENS FUNCTIONS AND FINITE ELEMENTS
of the green tape, pressed together, and fired to the desired density. Different amounts of bismuth oxide are added to the two compositions to control shrinkage and generate a zero gap between the two materials after firing. Pulsed-laser deposition of spinel NiZn ferrite film is a way to overcome the relatively low saturation magnetization of YIG. NiZn films 30 µm thick on alumina substrates have been produced with favorable properties, namely, loss tangent tan δ < 0.001, $4\pi M_s > 4000$ G, and a perpendicular ferromagnetic resonance (FMR) linewidth $\Delta H = 100$ Oe. Deposition occurs at 550°C at rates exceeding 15 µm/hour with no subsequent anneal required [Dorsey, 1996]. Another technique is the jet vapor deposition method, a production-oriented procedure for depositing thin and thick films. It uses sonic jets of inert carrier gas in low vacuum to convect vaporized film constituents to a substrate at high speed [Halpern and Schmitt, 1994]. The process gives high deposition rates over large areas and is very versatile, capable of making films of metals, alloys, multilayers, and multicomponents such as oxides and nitrides. Nickel ferrites and other ferrites have been produced using the jet vapor deposition process employing an electron source, with deposition rates between 50 and 300 µm/hour. The jet also provides an intense plasma for rf ion bombardment control of film crystallinity. The jet vapor deposition process has produced Ni ferrites with thicknesses ranging from 25 µm to 125 µm, and with $4\pi M > 2500$ G. Excellent results have been achieved at deposition temperatures less than 550°C followed by a rapid thermal anneal (RTA) at 850°C for 20 seconds. Aluminum and copper have also been added to adjust the magnetization and improve microwave qualities. Work has focused on balancing composition, deposition rate, temperature, and rf ion bombardment to optimize magnetic properties and obtain required surface morphology.
Jet vapor deposition appears to generate high-quality magnetic materials while providing a high-throughput manufacturing method for commercial production of circulator devices. All film deposition processes investigated indicate that thermomechanical and morphology problems place a practical upper limit of about 100 µm on thickness. This means that the primary use of such films will be for the millimeter wave frequency regime. Since tape casting can produce much thicker layers, it is best suited for the microwave frequency regime. Thus the two approaches, tape casting and jet vapor deposition, are complementary. Figure 7 shows several Ka-band circulators fabricated on NiZn ferrite substrates. Each substrate is only NiZn ferrite, having a saturation magnetization $4\pi M_s = 5100$ G. The metallization pattern defines the circulator main cylindrical ferrite region by the circular shape, and the microstrip transmission lines exit and enter the central region. Circulator structures,
FIGURE 7. Several Ka-band circulators fabricated on NiZn ferrite substrates by EMS Technologies, Inc. (Newman, Webb, and Krowne, 1996.)
including matching sections to the outside circuitry, will be covered in more detail in Section IV.
B. Magnetless Compatible Techniques

It is well known that an external dc magnetic bias field must be applied to generate the spin precession characteristic of a nonreciprocal ferrite medium. Designers have traditionally obtained this field from suitable external magnets, so the technology of fixed bias magnets and controllable dc bias magnets is very mature. Although this is acceptable for larger circuits, it is not appropriate for smaller hybrid circuits or monolithic integrated circuits, where external magnets must be avoided if competitive integrated circuits are ever to be made. Thus a tremendous desire arose to find reasonable materials which might maintain a permanent magnetization perpendicular to the surface. Highly anisotropic aligned uniaxial hexagonal ferrites, similar to those fabricated for permanent ceramic magnets, can provide the desired permanent magnetization. Since they are insulators, they can be incorporated into the microwave circuit substrate to realize compact, internally biased configurations. The material must exhibit not only good permanent magnetic behavior but also low microwave loss. Hexagonal ferrite material should have a large coercive field in order to maintain a large remanent magnetization. The anisotropy field should range from 10,000 Oe to 30,000 Oe for millimeter wave devices. For example, $H_{anis} = 17.3$ kOe for BaM, which causes a natural gyromagnetic resonant frequency near 50 GHz. Thus these
TABLE 2
PROPERTIES OF UNIAXIAL HEXAGONAL FERRITES.

Property                                  SrM                   BaM
Chemical formula                          SrO·6Fe2O3            BaO·6Fe2O3
Anisotropy field $H_{anis}$               19,500 G              17,300 G
Saturation magnetization $4\pi M_s$       4750 G                4775 G
Density                                   5.14 g/cm³            5.29 g/cm³
Curie temperature $T_c$                   480°C                 450°C
Lattice constant, c-axis                  23.031 Å              23.154 Å
Lattice constant, a-axis                  5.864 Å               5.893 Å
Anisotropy constant                       3.7 × 10^6 erg/cm³    3.3 × 10^6 erg/cm³
materials are ideal for millimeter wave devices, since operation in the microwave frequency range causes the off-diagonal permeability tensor element $\kappa$ to be too small for effective circulation behavior based upon electromagnetic nonreciprocity. More on this subject will be covered in Section IV. Hexagonal ferrites have hexagonal platelet grains with a large anisotropy field along the c-axis normal to the platelet. Grain growth occurs primarily in the a-axis direction, resulting in larger area platelets. The coercive magnetic field is promoted by small particle size while the remanent flux relies upon a high degree of grain alignment. A thermomechanical process, press forging, can be used in the sintering of these hexagonal compounds to achieve grain-oriented layers. The result is a high-density material with very small grain size. For example, BaM material is obtained with a density of 4.9 g/cm³, a remanent magnetization of 3400 G, a coercive magnetic field of 3950 Oe, a relative dielectric constant of 21.2 (at 10 GHz), a loss tangent of 0.001 (at 10 GHz), and an anisotropy magnetic field of 17,000 Oe. Table 2 lists a number of properties of the SrM and BaM hexagonal ferrites.
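As a rough numerical illustration of why these materials suit millimeter wave work, the sketch below converts the anisotropy fields of Table 2 into a zero-bias resonance frequency using the 2.8 MHz/Oe gyromagnetic factor quoted in Section IV. It assumes that the internal field equals the anisotropy field (demagnetization neglected) and that the tabulated values can be read as oersteds; the names `natural_resonance_ghz` and `ANISOTROPY_FIELD_OE` are illustrative only.

```python
# |gamma| / (2*pi) as quoted in the text, in MHz/Oe.
GAMMA_MHZ_PER_OE = 2.8

# Anisotropy fields from Table 2, read here in oersteds (assumption: the
# tabulated gauss values are numerically equal to oersteds in cgs units).
ANISOTROPY_FIELD_OE = {"SrM": 19_500.0, "BaM": 17_300.0}


def natural_resonance_ghz(h_anis_oe: float) -> float:
    """Zero-applied-field resonance f0 = (|gamma| / 2*pi) * H_anis, in GHz.

    Assumes H_i = H_anis, i.e. the demagnetizing field is neglected.
    """
    return GAMMA_MHZ_PER_OE * h_anis_oe / 1000.0


if __name__ == "__main__":
    for name, h_anis in ANISOTROPY_FIELD_OE.items():
        f0 = natural_resonance_ghz(h_anis)
        print(f"{name}: H_anis = {h_anis:.0f} Oe -> f0 = {f0:.1f} GHz")
```

Both results land near 50 GHz, which is why the off-diagonal element is small when such a material is operated far below that frequency.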
C. Monolithic Circuit Compatible Techniques

Advantages in size reduction, lowered cost, increased manufacturability, and improved reliability make a film technology allowing monolithic integration of planar circulators attractive. In the millimeter wave frequency regime, where circuit size becomes small due to severely decreased wavelengths, it is clear why the use of monolithic compatible processing methods would be very desirable. Producing ferrite films which are useful for
fabricating circulators on semiconductor substrates represents a major technical challenge. Not only must they have good physical, magnetic, and electrical properties, but they must also be manufacturable. Goals and acceptable film characteristics for a circulator operating at lower millimeter wave frequencies are summarized in Table 3. The thickness should approach 100 µm to realize low loss. Studies have revealed that a surface finish rougher than 5 microinches causes increased insertion loss at frequencies of 30 GHz and higher. Crack-free, rugged films are essential for good electromagnetic circulator performance, and are needed for high yield in metallization and high reliability. High deposition rates over large areas are required for low cost and high throughput. Gallium arsenide (GaAs), the most widely used millimeter wave semiconductor substrate material, degrades at temperatures exceeding 300°C if unprotected. With the use of proper protective layers it can withstand temperatures approaching 600°C. This can still be a major problem since ferrites commonly need high-temperature formation and anneal to attain good characteristics. Low-temperature ferrite deposition techniques (temperatures roughly equal to 100°C) have been reported [Abe et al., 1987], but scaling to the thick films necessary for circulators without encountering large dielectric loss due to excess Fe²⁺ has proven difficult. Small linewidth and low loss tangent values are needed for reduced insertion loss and hence good performance. Saturation magnetizations of around 2000 G are obtained and are found to be satisfactory for narrowband millimeter wave applications, but greater values are required for wider bandwidths.
TABLE 3
DESIRED FILM CHARACTERISTICS FOR MILLIMETER WAVE CIRCULATORS.

Property                                Goal                  Acceptable
Thickness                               100 µm                50 µm
Area                                    50 cm²                5 cm²
Surface finish                          5 µin.                20 µin.
Surface adhesion (tape test)            passes                passes
Crack-free growth                       100% no cracks        50%, device yield 20%
Deposition rate                         2500 µm cm² hr⁻¹      100 µm cm² hr⁻¹
Dep. & extended anneal temperature      300°C                 600°C, < 850°C
Saturation magnetization                2500-5000 G           1500-5000 G
FMR linewidth                           200 Oe                500 Oe
Loss tangent                            0.001                 0.01
Pulsed laser deposition (PLD) has been the most successful approach for making ferrite films for planar circulators [Williams et al., 1994]. It is a flash evaporation technique which transfers the stoichiometry of the target material to the film. Metal oxide films including high-temperature superconductors have been grown using the PLD method, as have ferroelectrics and dielectrics. Growth rates are typically in the 1 µm to 30 µm per hour range depending upon substrate size and growth factors. PLD started out as a laboratory tool allowing researchers to deposit films of complex materials that are difficult to produce by other physical deposition processes. However, it has been scaled up to 150 mm diameter wafers for a variety of metal oxides [Greer and Tabat, 1994]. Pulsed laser deposition of YIG has given the greatest success for producing ferrite films on semiconductors for circulators. YIG films exhibit mechanical stability and low dielectric and magnetic loss. Films of thicknesses up to 100 µm have been made on both Si and GaAs semi-insulating semiconductor substrates, and they typically have $4\pi M_s = 1780$ G, a magnetic linewidth $\Delta H = 50$ Oe, and tan δ < 0.001 [Buhay et al., 1995]. Temperatures of nearly 600°C are used for film growth, followed by a rapid thermal anneal. High-temperature processing and thermal mismatch between film and substrate had presented numerous thermomechanical issues including wafer breakage, ferrite film cracking and delamination, metal peeling, and arsenic contamination. All problems were overcome by carefully selecting processing temperatures and using an oxy-nitride layer to prevent GaAs degradation and arsenic contamination. Once the oxy-nitride layer is in place, a high-temperature Ti/Pt/Au ground plane is defined using the liftoff technique. The gold alloy ground plane helps relieve stress in the wafer and reduces mechanical problems.
An 80-µm to 100-µm thick YIG film is then deposited through a silicon shadow mask by PLD [Adam et al., 1995] at 550°C and annealed with an 850°C, 20-second RTA. The metal circulator shield and input/output circuitry, including matching sections, are finally applied.

IV. MICROSTRIP CIRCULATOR CONSIDERATIONS FOR MODELING

A. Ferrite Material Parameters
Nonreciprocity is generated in the circulator device by applying a dc magnetic bias field perpendicular to the planar surface of the structure, be it hybrid or monolithic. The bias field will create off-diagonal tensor elements in the permeability, and these new elements are anti-symmetrically disposed with respect to the diagonal. The size of the off-diagonal element $\kappa$ relative to the diagonal element $\mu$ will determine the extent of nonreciprocal action
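possible in a circulator device. The permeability tensor is

$$\bar{\mu} = \mu_0 \begin{bmatrix} \mu & -j\kappa & 0 \\ j\kappa & \mu & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{1}$$

where $\mu_0$ is the free space value and $\mu$ and $\kappa$ (relative values) are given by real and imaginary parts when the system is lossy, as is any real ferrite material. Thus, in phasor form (assuming an $e^{j\omega t}$ time dependence) [Soohoo, 1960],

$$\mu = \mu' - j\mu'' \tag{2}$$

$$\kappa = \kappa' - j\kappa'' \tag{3}$$

$$\mu' = 1 + \frac{\omega_0\omega_m\left[\omega_0^2 - \omega^2(1 - \alpha_m^2)\right]}{\left[\omega_0^2 - \omega^2(1 + \alpha_m^2)\right]^2 + 4\omega^2\omega_0^2\alpha_m^2} \tag{4}$$

$$\kappa' = -\frac{\omega_m\omega\left[\omega_0^2 - \omega^2(1 + \alpha_m^2)\right]}{\left[\omega_0^2 - \omega^2(1 + \alpha_m^2)\right]^2 + 4\omega^2\omega_0^2\alpha_m^2} \tag{5}$$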
Controlling variables in these equations are the magnetization radian frequency $\omega_m$, ferromagnetic resonance radian frequency $\omega_0$, phenomenological damping term $\alpha_m$, and operating radian frequency $\omega$. The first three quantities can be found by

$$\omega_m = -\gamma M \tag{6}$$

$$\omega_0 = -\gamma H_i \tag{7}$$

$$\alpha_m = -\frac{\gamma\,\Delta H}{2\omega} \tag{8}$$

Here $\gamma$ is the gyromagnetic ratio, whose value in rationalized MKS units is $-2.21265 \times 10^5$ (rad/s)/(ampere-turns/m) $= -2\pi \times 2.8$ MHz/Oe. $M$ is the magnetization, which may approach a saturated value $M_s$, $\Delta H$ is the ferromagnetic linewidth, and $H_i$ is the internal magnetic field, which may be expressed as a superposition of the net internal field due to the externally applied field $H_{app}$ and the anisotropy field $H_{anis}$.
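The following sketch shows one way these quantities might be evaluated numerically. It is not the author's code: it assumes Gaussian practical units (so that $\omega_m$ is obtained from $4\pi M$ in gauss with the 2.8 MHz/Oe factor) and introduces loss through the substitution $\omega_0 \to \omega_0 + j\alpha_m\omega$ in the lossless expressions, which reproduces the real parts in (4) and (5). The function names and the 200 Oe internal field in the example are hypothetical.

```python
import math

GAMMA_HZ_PER_OE = 2.8e6  # |gamma| / (2*pi) from the text, in Hz/Oe


def radian_frequencies(f_hz, four_pi_m_gauss, h_internal_oe, linewidth_oe):
    """Return (omega, omega_m, omega_0, alpha_m) following Eqs. (6)-(8).

    Assumption: Gaussian practical units, so omega_m = |gamma| * 4*pi*M.
    """
    gamma = 2.0 * math.pi * GAMMA_HZ_PER_OE
    omega = 2.0 * math.pi * f_hz
    omega_m = gamma * four_pi_m_gauss
    omega_0 = gamma * h_internal_oe
    alpha_m = gamma * linewidth_oe / (2.0 * omega)
    return omega, omega_m, omega_0, alpha_m


def polder_complex(omega, omega_m, omega_0, alpha_m):
    """Complex relative mu and kappa (mu = mu' - j*mu'', kappa likewise).

    Loss enters via omega_0 -> omega_0 + j*alpha_m*omega (assumed model).
    """
    omega_0c = complex(omega_0, alpha_m * omega)
    denom = omega_0c ** 2 - omega ** 2
    mu = 1.0 + omega_m * omega_0c / denom
    kappa = -omega * omega_m / denom
    return mu, kappa


def real_parts_closed_form(omega, omega_m, omega_0, alpha_m):
    """mu' and kappa' written out explicitly as in Eqs. (4) and (5)."""
    a2 = alpha_m ** 2
    d = (omega_0**2 - omega**2 * (1 + a2)) ** 2 + 4 * omega**2 * omega_0**2 * a2
    mu_r = 1 + omega_0 * omega_m * (omega_0**2 - omega**2 * (1 - a2)) / d
    kappa_r = -omega * omega_m * (omega_0**2 - omega**2 * (1 + a2)) / d
    return mu_r, kappa_r


if __name__ == "__main__":
    # YIG film values quoted in the text (4*pi*Ms = 1780 G, linewidth 50 Oe);
    # the 200 Oe internal field and 10 GHz operating point are illustrative.
    w, w_m, w_0, a_m = radian_frequencies(10e9, 1780.0, 200.0, 50.0)
    mu, kappa = polder_complex(w, w_m, w_0, a_m)
    print("mu    =", mu)
    print("kappa =", kappa)
    print("closed-form real parts:", real_parts_closed_form(w, w_m, w_0, a_m))
```

Comparing the real parts of the complex result with the closed forms is a quick consistency check on any transcription of (4) and (5).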
$$H_i = H_{i(app)} + H_{anis} \tag{9}$$

Internal magnetic bias field $H_{i(app)}$ has previously been approximated using a demagnetization factor $N_{zz}$ (for preferred direction $z$):

$$H_{i(app)} = H_{app} - 4\pi N_{zz} M \tag{10}$$
$M$ is assumed to be in the $z$ direction also. When saturation has been attained, $M$ is replaced by $M_s$. Of course, the resultant field is not necessarily in only the $z$ direction even if the applied field is. Equation (10) is therefore an approximation in more respects than its scalar form alone. In vector form we would have

$$\mathbf{H}_{i(app)} = \mathbf{H}_{app} - 4\pi \bar{N} \cdot \mathbf{M} \tag{11}$$

Demagnetization factor $N_{zz}$, which is a function of radial location $r$ within the circulator puck of radius $R$ ($r < R$), has the property

$$N_{zz} = N_{zz}(r, z) < 1 \tag{12}$$
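A minimal sketch of the scalar bookkeeping in (9) and (10) is given below, assuming Gaussian units (fields in Oe, $4\pi M$ in gauss) and a single uniform value of $N_{zz}$ rather than the position-dependent factor of (12). The function name and the example values of $N_{zz}$ are illustrative.

```python
def internal_field_oe(h_app_oe, four_pi_m_gauss, n_zz=1.0, h_anis_oe=0.0):
    """Scalar internal field from Eqs. (9) and (10), Gaussian units.

    H_i(app) = H_app - N_zz * 4*pi*M   (Eq. 10)
    H_i      = H_i(app) + H_anis       (Eq. 9)
    A uniform n_zz is assumed; a real puck has N_zz = N_zz(r, z).
    """
    if not 0.0 <= n_zz <= 1.0:
        raise ValueError("demagnetization factor must lie between 0 and 1")
    h_i_app = h_app_oe - n_zz * four_pi_m_gauss
    return h_i_app + h_anis_oe


if __name__ == "__main__":
    # Thin YIG disk (N_zz taken as 1) biased so the applied field just
    # cancels the demagnetizing field: the internal field is zero.
    print(internal_field_oe(1780.0, 1780.0, n_zz=1.0))
    # Same bias with a smaller, purely illustrative demagnetization factor.
    print(internal_field_oe(1780.0, 1780.0, n_zz=0.8))
```

The first call illustrates the cancellation between applied and demagnetizing fields; the second shows how a demagnetization factor below unity leaves a residual internal field.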
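The last term in (10) and (11) is referred to as the demagnetization field, allowing us to rewrite these two equations as

$$H_{i(app)} = H_{app} - H_{demag} \tag{13a}$$

$$H_{demag} = 4\pi N_{zz} M \tag{13b}$$

$$\mathbf{H}_{i(app)} = \mathbf{H}_{app} - \mathbf{H}_{demag} \tag{14a}$$

$$\mathbf{H}_{demag} = 4\pi \bar{N} \cdot \mathbf{M} \tag{14b}$$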
If it is desired to avoid the approximation implied by the designations (13b) and (14b), then the problem must be solved in a fully numerical sense, self-consistently obtaining the static field solution inside and outside of the ferrite puck, whatever its geometric shape. This holds true whether the shape is a thin cylindrical volume or a more irregular shape such as a hexagonal cross section. For a finite thickness puck, $N_{zz}$ will be nonuniform, and this will make $\mu$ and $\kappa$ through formulas (4) and (5) also nonuniform. Therefore we see that the circulator problem must by necessity become an inhomogeneous boundary value and forcing function problem. In those situations where a large enough bias field is applied to create saturation, $H_{app} \approx H_{demag}$, and their cancellation in (10) leads to $H_{i(app)} \approx 0$. By (9), the net internal magnetic field $H_i$ will be either $H_i \approx 0$ (ordinary ferrite material) or $H_i \approx H_{anis}$ (hexagonal ferrite material). In fact, in a hexagonal ferrite with $H_{app} = 0$ and remanent magnetization $M = 0$, $H_{i(app)} = 0$ and $H_i = H_{anis}$ holds exactly. Real hexagonal ferrites will have nonzero $M$, making $H_i = H_{anis} - H_{demag} \approx H_{anis}$. For hexagonal ferrites where no applied magnetic field is necessary, typical anisotropy values of 17,000 Oe through 19,500 Oe lead to a ferromagnetic resonance frequency $f_0$ ($f_0 = \omega_0/2\pi$) ranging from 47.6 GHz through 54.6 GHz. The anisotropy field is derived from the anisotropy energy of the ferrite. It produces a torque on the magnetization in the same manner as an externally applied field, and lies in the same direction as the remanent magnetization. One of the issues which must be understood is why these devices are not operated near or at the ferromagnetic resonance frequency. Other than the intuitive idea that near resonance operation may accentuate electromagnetic field loss due to absorption mechanisms, a simple rigorous way to see that this is indeed the case is to examine expressions (4) and (5) in the limit of low but finite $\alpha_m$. Four reduced relationships are found for $\mu'$, $\mu''$, $\kappa'$, and $\kappa''$:

$$\mu' \approx 1 + \frac{\omega_0\omega_m}{\omega_0^2 - \omega^2} \tag{15a}$$

$$\mu'' \approx \frac{\omega\omega_m\alpha_m(\omega_0^2 + \omega^2)}{(\omega_0^2 - \omega^2)^2} \tag{15b}$$

$$\kappa' \approx -\frac{\omega\omega_m}{\omega_0^2 - \omega^2} \tag{16a}$$

$$\kappa'' \approx -\frac{2\omega^2\omega_0\omega_m\alpha_m}{(\omega_0^2 - \omega^2)^2} \tag{16b}$$
These expressions are accurate to the first order in $\alpha_m$. As $\omega \to \omega_0$, the denominators in the loss component formulas $\mu''$ and $\kappa''$ approach zero and the loss terms grow without bound, because a second-order singularity is reached, namely

$$(\omega_0^2 - \omega^2)^2 \approx 4\omega_0^2(\omega - \omega_0)^2 \to 0 \tag{17}$$

This is precisely what makes it unacceptable to operate near the ferromagnetic resonance frequency. Strength of the singularity is first order in $\alpha_m$. Equations (15) and (16) may be used to assess the degree of circulation possible, which depends upon the amount of nonreciprocal anisotropy present in the ferrite material used in the device. Nonreciprocal anisotropy is measured by the ratio of the real part of the off-diagonal permeability to the real part of the diagonal permeability, $\kappa'/\mu'$. Invoking (15a) and (16a),

$$\frac{\kappa'}{\mu'} = -\frac{\omega\omega_m}{\omega_0^2 - \omega^2 + \omega_0\omega_m} \tag{18}$$
$|\kappa'/\mu'|$ is plotted for YIG and a hexagonal ferrite in Fig. 8. For $\omega_0 \approx 0$, the formula limits to

$$\lim_{\omega_0 \to 0} \frac{\kappa'}{\mu'} = \frac{\omega_m}{\omega} \tag{19}$$
FIGURE 8. $|\kappa'/\mu'|$ versus frequency (GHz) for a YIG ferrite ($\omega_0 \approx 0$) and a hexagonal ferrite
This formula implies that it is necessary to have a significantly sized $\omega_m$ in order to obtain sizable circulation behavior, which means a large magnetization value. In addition, the bandwidth of a well-designed circulator can be shown to be roughly equal to $\omega_m$, so there is a second reason for wanting large values of magnetization. If $\omega \ll \omega_0$,

$$\frac{\kappa'}{\mu'} \approx -\frac{\omega\omega_m}{\omega_0(\omega_0 + \omega_m)} \tag{20}$$

which further reduces if $\omega_m \ll \omega_0$, to

$$\frac{\kappa'}{\mu'} \approx -\frac{\omega\omega_m}{\omega_0^2} \tag{21}$$
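In the limits of extremely high operating frequency ($\omega \to \infty$) or extremely high ferromagnetic resonance frequency ($\omega_0 \to \infty$), we use, respectively, (19) and (21) to find

$$\lim_{\omega \to \infty;\ \omega_0 \ll \omega} \frac{\kappa'}{\mu'} = \lim_{\omega \to \infty} \frac{\omega_m}{\omega} = 0 \tag{22}$$

$$\lim_{\omega_0 \to \infty;\ \omega_0 \gg \omega} \frac{\kappa'}{\mu'} = \lim_{\omega_0 \to \infty} \left(-\frac{\omega\omega_m}{\omega_0^2}\right) = 0 \tag{23}$$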
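To make these limits concrete, the sketch below evaluates the lossless ratio $\kappa'/\mu'$ in the form implied by (15a) and (16a), $-\omega\omega_m/(\omega_0^2 - \omega^2 + \omega_0\omega_m)$, and checks it against the limiting forms (19) and (21). Because only frequency ratios appear, ordinary frequencies in GHz are used in place of radian frequencies. The material numbers in the example come from values quoted elsewhere in the text; the function name and the chosen operating points are illustrative.

```python
GAMMA_GHZ_PER_OE = 2.8e-3  # |gamma| / (2*pi) from the text, in GHz/Oe


def kappa_over_mu(f_ghz, f_m_ghz, f_0_ghz):
    """Lossless kappa'/mu' built from Eqs. (15a) and (16a).

    Only ratios of frequencies enter, so GHz can replace rad/s throughout.
    The expression diverges where f^2 = f_0 * (f_0 + f_m).
    """
    return -f_ghz * f_m_ghz / (f_0_ghz**2 - f_ghz**2 + f_0_ghz * f_m_ghz)


if __name__ == "__main__":
    # YIG: 4*pi*Ms = 1780 G, internal field taken as zero (saturated disk).
    f_m_yig = GAMMA_GHZ_PER_OE * 1780.0
    for f in (10.0, 35.0, 94.0):
        print(f"YIG  f = {f:5.1f} GHz  kappa'/mu' = {kappa_over_mu(f, f_m_yig, 0.0):+.4f}")

    # BaM: 4*pi*Ms = 4775 G, H_i taken equal to H_anis = 17,300 Oe.
    f_m_bam = GAMMA_GHZ_PER_OE * 4775.0
    f_0_bam = GAMMA_GHZ_PER_OE * 17300.0
    for f in (10.0, 35.0, 94.0):
        print(f"BaM  f = {f:5.1f} GHz  kappa'/mu' = {kappa_over_mu(f, f_m_bam, f_0_bam):+.4f}")
```

For the YIG case the ratio falls off as $\omega_m/\omega$, so its magnitude shrinks steadily with frequency, while the hexagonal ferrite keeps a usable ratio at frequencies not too far from its resonance.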
The extreme limiting cases $\omega \to \infty$ and $\omega_0 \to \infty$ can be found directly from (18). They tell us something about the actual cases of ferrite material made out of YIG or hexagonal ferrite like BaM or SrM. For YIG, where (22) applies, it is noticed that no circulation behavior can be utilized at very high frequencies. Therefore, for YIG material, operation above the ferromagnetic resonance is required (but not too high above, which would cause nonreciprocity to be lost). Since the diameter of a circulator is constrained to be approximately half a wavelength $\lambda_f$ in the ferrite medium, because the puck acts (excluding the effects of the ports) as a distributed resonator, the operating resonance or electrical resonance must be above, but not too far above, the ferromagnetic resonance. Equation (23) applies to hexagonal material if the device is being operated far below the ferromagnetic resonance, and says that no circulation behavior will be found. Therefore, for BaM or SrM material, operation below the ferromagnetic resonance is acceptable (but not too far below, which would cause nonreciprocity to be lost). Because $\omega_0$ is finite (on the order of 50 GHz), operation above the ferromagnetic resonance is possible as well, and here the applicable limiting formula will be (22). Again, operation too far above this resonant point is not recommended, as useful nonreciprocity will quickly be lost. The wavelength $\lambda_f$ in the ferrite medium is calculated using the effective two-dimensional permeability $\mu_{eff}$

$$\mu_{eff} = \frac{\mu^2 - \kappa^2}{\mu} = \mu\left[1 - \left(\frac{\kappa}{\mu}\right)^2\right] \tag{24}$$
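where the $\kappa/\mu$ ratio is evident. From (24), the ferrite wavelength is calculated as

$$\lambda_f = \frac{c}{f\sqrt{\varepsilon_r\,\mu_{eff}}}, \qquad c = \frac{1}{\sqrt{\mu_0\varepsilon_0}} \tag{25}$$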
where relative values are used in the first formula and the second defines the free-space velocity of light. An X-band circulator is typically 4-5 mm in diameter, and millimeter wave devices are typically 1 mm or less, depending upon the dielectric constant of the ferrite and the operating frequency. Figure 9 shows a plot of the calculated required diameter for a 0.1 mm thick yttrium iron garnet (YIG) puck as a function of frequency [Adam et al., 1996]. Also plotted in that figure is the calculated real part of the device input impedance. Examination of the impedance curve shows that whereas microwave circulators will have values below 20 Ω, millimeter wave devices can be designed to have values close to 50 Ω. Another constraint on design is the substrate thickness, which should be thin enough to suppress radiation or launching of surface wave modes, which are unrelated to desired circulating modes in the device. Figure 10 graphs circulator thickness against frequency for the lowest-order surface wave mode, and for a 50 Ω input port impedance condition [Adam et al., 1996]. It is seen that a 50 Ω impedance can be maintained at and beyond K-band (around 17 GHz and higher).
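The half-wavelength sizing rule can be tried out with a few lines of code. The sketch below is only an estimate under stated assumptions: the effective permeability is taken in the common form $\mu_{eff} = (\mu^2 - \kappa^2)/\mu$, the ferrite is a saturated YIG disk with zero internal field (so the lossless values are $\mu = 1$ and $\kappa = \omega_m/\omega$), and the relative permittivity of 15 is an assumed typical value for YIG, not a number from the text. Port loading and fringing fields are ignored, and all names are illustrative.

```python
import math

C_MM_GHZ = 299.792458      # free-space velocity of light in mm*GHz
GAMMA_GHZ_PER_OE = 2.8e-3  # |gamma| / (2*pi) from the text, in GHz/Oe


def mu_effective(mu, kappa):
    """Effective two-dimensional permeability (mu^2 - kappa^2) / mu."""
    return (mu**2 - kappa**2) / mu


def ferrite_wavelength_mm(f_ghz, eps_r, mu_eff):
    """Wavelength in the ferrite, c / (f * sqrt(eps_r * mu_eff)), in mm."""
    if eps_r * mu_eff <= 0.0:
        raise ValueError("eps_r * mu_eff must be positive for a propagating wave")
    return C_MM_GHZ / (f_ghz * math.sqrt(eps_r * mu_eff))


def half_wave_diameter_mm(f_ghz, four_pi_ms_gauss, eps_r=15.0):
    """Half-wavelength diameter estimate for a saturated disk with H_i = 0.

    eps_r = 15 is an assumed value for YIG, not taken from the text.
    """
    kappa = GAMMA_GHZ_PER_OE * four_pi_ms_gauss / f_ghz  # omega_m / omega
    mu_eff = mu_effective(1.0, kappa)
    return 0.5 * ferrite_wavelength_mm(f_ghz, eps_r, mu_eff)


if __name__ == "__main__":
    for f in (10.0, 20.0, 35.0):
        d = half_wave_diameter_mm(f, 1780.0)
        print(f"f = {f:4.1f} GHz -> diameter of roughly {d:.2f} mm")
```

With these assumptions the 10 GHz estimate falls within the 4-5 mm range quoted above for X-band devices.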
FIGURE 9. Calculated circulator diameter and impedance (real part) for a 0.1 mm thick yttrium iron garnet (YIG) puck as a function of frequency. (Adam et al., 1996.)
FIGURE 10. Circulator thickness versus frequency for (1) the lowest-order surface wave mode, and (2) a 50 Ω input port impedance condition. (Adam et al., 1996.)
B. Matching Sections

The self-consistent electromagnetic solvers work on the premise that specific driving conditions exist at each circulator port, thereby exciting the internal fields of the device, sometimes referred to as the intrinsic device. This is true whether the solver is a Green's function method relying upon Dirac delta function sources at the port locations or a finite element method relying upon imposed fields at the port locations. In either case, the tangential E and H fields at the port locations on the circulator device perimeter $R$ must obey continuity with the tangential E and H fields in the exiting port transmission lines. These microstrip transmission lines have impedance $Z_k$ for the $k$th port location, and their impedances are found by using the width determined from each particular port extent, and the common substrate thickness. For ports of identical angular extent, all $Z_k$ will be the same (making $Z_k$ identical for $k = 1, 2, \ldots, N$). The s-parameters of the intrinsic circulator are referenced to these impedances, or to a single impedance if they are all the same. To facilitate the incorporation of the circulator device into a CAD program generating s-parameters, the N-port device should be re-referenced to the system impedances $Z'_k$ in use. $Z'_k$, $k = 1, 2, \ldots, N$, would usually be 50 Ω for microstrip circulators. Figure 11 shows a sketch of a symmetrically
FIGURE 11. Intrinsic circulator and its matching circuit network.
disposed 3-port device, where at progressively increasing radial distance from the central circulator puck, the s-parameter matrix goes from its intrinsic $N \times N$ sized $S_{intr} = S_1$ to its re-referenced $N \times N$ sized value $S_2$ to its final $N \times N$ sized value $S_{matched} = S$ after encountering a matching section with a $2 \times 2$ sized transfer matrix $T$. This $T$ matrix will generally be referenced to the system impedance $Z'_k$. The relationship between $S_1$ and $S_2$ is [Kurokawa, 1965]

$$S_2 = A^{-1}(S_1 - \Gamma^{\dagger})(I - \Gamma S_1)^{-1} A^{\dagger} \tag{26}$$

where $A$ and $\Gamma$ are $N \times N$ sized diagonal matrices whose $k$th element is

$$A_k = \frac{(1 - \Gamma_k^*)\sqrt{|1 - \Gamma_k\Gamma_k^*|}}{|1 - \Gamma_k|} \tag{27}$$

$$\Gamma_k = \frac{Z'_k - Z_k}{Z'_k + Z_k^*} \tag{28}$$

In these three formulas, the dagger represents a conjugate transpose, the asterisk * a conjugate, and $I$ the unit matrix. Matching circuits for microstrip circulators are most commonly cascaded sections of transformers, usually quarter-wavelength sections, placed in the same configuration at each port. We have developed a computer code which incorporates such cascaded transformers directly into the s-parameter calculation. Each microstrip transformer section is permitted to have a user-chosen length and width. Dissipation losses are included in each microstrip transformer section, so that the final s-parameters calculated include all intrinsic as well as extrinsic device losses. This is important for wide-bandwidth circulators having many quarter-wavelength matching sections, because matching section losses frequently are greater than the intrinsic device losses. The final matched s-parameters for the circulator structure are found by mapping or transforming $S_2$ through $T$, yielding $S$ [Gupta, Garg, and Chadha, 1981]. The relationship giving this transformation uses the four elements of $T$ [Parisot and Soares, 1988]:
$$S = (T_{11} S_2 + T_{12} I)(T_{21} S_2 + T_{22} I)^{-1} \tag{29}$$
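The two mapping steps, re-referencing and then passing through the matching section, might be prototyped as below. This is a sketch, not the computer code mentioned in the text: the re-referencing routine uses the standard power-wave form $S_2 = A^{-1}(S_1 - \Gamma^{\dagger})(I - \Gamma S_1)^{-1}A^{\dagger}$ with $\Gamma_k = (Z'_k - Z_k)/(Z'_k + Z_k^*)$, which is assumed here, and the matching step applies (29) with the same scalar elements $T_{11}, \ldots, T_{22}$ at every port. Pure-Python helpers are used so that no external library is needed; all function names are illustrative.

```python
import math


def mat_mul(a, b):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_inv(m):
    """Gauss-Jordan inverse with partial pivoting for a small complex matrix."""
    n = len(m)
    aug = [[complex(x) for x in row] + [complex(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def diag(values):
    """Diagonal matrix from a list of (complex) values."""
    n = len(values)
    return [[complex(values[i]) if i == j else 0j for j in range(n)]
            for i in range(n)]


def rereference(s1, z_old, z_new):
    """Re-reference S1 from port impedances z_old to z_new (assumed form)."""
    n = len(s1)
    gam = [(complex(zn) - complex(zo)) / (complex(zn) + complex(zo).conjugate())
           for zo, zn in zip(z_old, z_new)]
    a = [(1 - g.conjugate()) * math.sqrt(abs(1 - g * g.conjugate())) / abs(1 - g)
         for g in gam]
    eye = diag([1.0] * n)
    gamma_s = mat_mul(diag(gam), s1)
    left = [[s1[i][j] - (gam[i].conjugate() if i == j else 0j)
             for j in range(n)] for i in range(n)]
    right = mat_inv([[eye[i][j] - gamma_s[i][j] for j in range(n)]
                     for i in range(n)])
    a_inv = diag([1 / x for x in a])
    a_dag = diag([x.conjugate() for x in a])
    return mat_mul(mat_mul(mat_mul(a_inv, left), right), a_dag)


def apply_matching(s2, t11, t12, t21, t22):
    """Eq. (29) with identical scalar transfer elements at every port."""
    n = len(s2)
    eye = diag([1.0] * n)
    num = [[t11 * s2[i][j] + t12 * eye[i][j] for j in range(n)] for i in range(n)]
    den = [[t21 * s2[i][j] + t22 * eye[i][j] for j in range(n)] for i in range(n)]
    return mat_mul(num, mat_inv(den))


if __name__ == "__main__":
    # Ideal 3-port circulator referenced to 20-ohm ports, moved to 50 ohms.
    ideal = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
    s2 = rereference(ideal, [20.0] * 3, [50.0] * 3)
    for row in s2:
        print([f"{abs(x):.3f}" for x in row])
```

Running the example shows how an ideally matched intrinsic device appears mismatched once viewed from 50 Ω ports, which is the situation the transformer sections are meant to correct.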
To obtain a rough idea of what a circulator circuit looks like, Fig. 12 shows an exploded view of the circuit metal on YIG, which is then placed on top of another dielectric substrate. Above the microstrip metal is a spacer metal, which separates the circuit layer from the permanent magnet placed directly over the spacer. There are many circulator structure possibilities, and this figure shows one of them in the class of externally biased devices. The microstrip metal matching transformer in the figure has only one section, just like that seen previously in Fig. 7 for a circuit over NiZn ferrite.
FIGURE 12. Example of a circulator device, with exploded view to show basic component structures. (Provided by Gordon R. Harrison of EMS Technologies, Inc.)
Externally biased devices can be made to operate anywhere from the C-band to bands in the millimeter frequency regime, depending upon the magnetization of the ferrite material. However, if it is desired to construct a microwave circulator which is self-biased [Weiss, Watson, and Dionne, 1989] using a hexagonal ferrite as the puck material, trouble ensues because
FIGURE 13. Self-biasing circulator structure in cross section, using an internal ordinary ferrite puck and an external hexagonal ferrite. (Webb, 1995.)
the ferromagnetic resonance frequency $\omega_0$ is simply too high. Raytheon Company (1995) found a way around this problem by only using the hexagonal ferrite as part of the biasing circuitry, as seen in Figs. 13 and 14, showing the structure in cross section and perspective views. The configuration consists of a low-coercivity ferrite plug within a high-coercivity BaM substrate. A low-coercivity ferrite insert establishes the
FIGURE 14. Perspective view of a self-biasing circulator, employing external hexagonal ferrite. (Provided by Ernst F. Schloemann of Raytheon Co.)
active circulator junction, while the BaM ferrite provides the bias field. High-permeability Permalloy sheets are used to confine the magnetic flux. Excellent performance was demonstrated over a 35% bandwidth centered at 9 GHz with less than 0.4 dB loss and more than 20 dB isolation. Total thickness was less than 0.9 mm, more than five times thinner than designs which use an external magnet. Of course, if one is operating in the millimeter wave frequency regime, the magnetization of the hexagonal ferrite material can be used directly while employing the anisotropy field to establish the ferromagnetic resonance in the millimeter wave frequency regime. Thus in Fig. 15, the YIG or spinel core would be replaced with BaM and the outer region or substrate replaced with either a dielectric or low magnetization ferrite. The purpose of retaining a low outer magnetization ferrite would be to compensate the demagnetization effects [Blight and Schloemann, 1992], which produce a relatively uniform internal magnetic bias field throughout the puck diameter, except near the perimeter where the internal field rapidly rises, causing potential degradation of circulator performance. Low outer magnetization ferrite material makes the magnetic bias field slowly varying as the puck edge is reached, as seen in Fig. 16.
FIGURE 15. Simple perspective and cross-sectional views of self-biasing circulator with internal hexagonal ferrite and external ordinary ferrite. Drawing labels: shield dia. 0.037 in.; SrM-20 puck; DA-9; alumina. (Provided by Gordon R. Harrison of EMS Technologies, Inc.)
FIGURE 16. Demonstration sketch showing low outer magnetization ferrite material making the magnetic bias field more uniform as the puck edge is reached. (Provided by Ernst F. Schloemann of Raytheon Co.)
The sketch of the field distribution in that figure is obtained by using ordinary ferrites (and taking $H_{anis} = 0$), a bias field such that $H_{app} = 4\pi M_1$, making the applied field inside the first ferrite material exactly compensate the magnetization with $M = M_1$, and using an averaging approximation at the interface between the puck and the external medium to obtain the result at a medium 1-medium 2 boundary and at a medium 1-air boundary. For the medium 1-medium 2 boundary, use (9) and (10), making $H_i = H_{i(app)} + H_{anis}$ and $H_{i(app)} = H_{app} - 4\pi N_{zz} M$. Having dropped the anisotropy field already, $H_i = H_{i(app)}$, making the second formula $H_i = H_{app} - 4\pi N_{zz} M$. Assuming for simplicity that the demagnetization factor is unity, i.e., $N_{zz} = 1$, $H_i = H_{app} - 4\pi M$. Now, for the case of a medium 1-medium 2 boundary, estimate $M$ as $M = (M_1 + M_2)/2$, and using the fact that $H_{app} = 4\pi M_1$, determine that $H_i = 2\pi(M_1 - M_2)$. Next, for the case of a medium 1-air boundary, estimate $M$ as $M = (M_1 + 0)/2$ and, again using $H_{app} = 4\pi M_1$, determine that $H_i = 2\pi M_1$. A similar argument to that just presented for an ordinary ferrite material applies for a hexagonal ferrite material, since a large anisotropy field is equivalent by (9) and (10) to an externally applied bias field. It is also possible to make the ferrite puck out of several concentric rings, the innermost being a disk with the largest area and the highest magnetization, the next having a lower magnetization, and so on. In fact, the simplest realization of such a puck would be a two-material-region ferrite. As we have said, there are many different ferrite microstrip circulator topologies which have been found to be useful, each having somewhat varying material
and device fabrication implications. The simplest topology is shown in Fig. 17, employing a metal shield on a ferrite substrate with a metal ground plane on the opposite side. A permanent magnet (not shown) provides the magnetic field bias of sufficient magnitude to just bring the entire junction area into saturation. The entire circulator structure is mounted in a test fixture for measurement. We also notice that the circuit has a one-section matching transformer beyond the puck. Next we look at complete circulator circuit structures, showing their experimental measurements, with numerical results sometimes accompanying them. These numerically generated results were determined using a Green's function simulator operating with only the uniform capability feature. Figure 18 shows a Westinghouse X-band circulator fabricated with 80 µm YIG film on a Si substrate [Adam et al., 1995]. Single-phase polycrystalline YIG ferrite material was made by using pulsed-laser deposition through a shadow mask followed by a rapid thermal anneal at 850°C for 20 seconds. Silicon was selected because of its mechanical strength and
FIGURE 17. Simple circulator topology employing a metal shield on a ferrite substrate with a metal ground plane on the opposite side. Here opposite side metallization on the ferrite is contiguous with the measurement test jig, which allows for the microstrip lines out of the circulator ports to transition finally to SMA coaxial connectors. (Device produced by EMS Technologies, Inc. Webb, 1995.)
FIGURE 18. Photograph of Westinghouse X-band circulator with a three-stage matching transformer fabricated with 80 µm YIG film on a Si substrate. (Adam et al., 1995.)
processing tolerance. A three-stage transformer section takes the intrinsic circulator impedance out to the external environment. Minimum insertion loss $s_{21}$ was 3 dB, with insertion loss less than 5 dB over the 6.2 GHz to 11.5 GHz range (Fig. 19). Isolation $s_{12}$ was greater than 11 dB over this frequency range. The measured losses are several times higher than the calculated result because the design assumed a thicker substrate (100 µm) than was realized, and because of unaccounted-for surface roughness on the YIG film and film cracking of the YIG material. Similar reasons would explain the degraded isolation. A nominal 20 GHz circulator with a one-stage matching transformer section is seen in Fig. 20, constructed by Westinghouse using pulsed-laser deposition for depositing YIG on a GaAs wafer [Webb, 1995]. Diameter of the metal shield is nominally 3 mm. Minimum insertion loss is 2 dB and isolation exceeds 18 dB over a 1-GHz bandwidth (Fig. 21). Higher than modeled midband loss and rapid decline in loss and isolation at the high end of the band are postulated to be caused by high ground plane resistance and an excessively thin YIG film. However, we see that the isolation measured and modeled agree reasonably well, especially at and below the puck diameter resonance peak (located at 20.5 GHz). Three similarly
FIGURE 19. Measured insertion loss $s_{21}$ and isolation $s_{12}$ versus frequency (GHz) for the circulator device seen in Fig. 18. (Adam et al., 1995.)
FIGURE 20. Photograph of a nominal 20 GHz circulator with a one-stage matching transformer section using YIG on a GaAs wafer. (Produced by Westinghouse. Adam et al., 1995.)
FIGURE 21. Measured insertion loss $s_{21}$ and isolation $s_{12}$ versus frequency (GHz) for the YIG circulator device seen in Fig. 20. (Produced by Westinghouse. Adam et al., 1995.)
prepared circulator circuits by EMS Technologies, Inc. over NiZn ferrite substrates, each with a single-stage quarter-wave matching transformer section, are presented in Fig. 22 [Newman, Webb, and Krowne, 1996]. Three circulator shield diameters are shown: 0.056 in. (1.42 mm), 0.060 in. (1.52 mm), and 0.064 in. (1.62 mm). Figure 23 shows the insertion loss and
FIGURE 22. Photographs of several Ka-band circulators (nominally about 32 GHz center frequency), each with single quarter-wave matching transformer sections, fabricated on NiZn ferrite substrates by EMS Technologies, Inc. (Newman, Webb, and Krowne, 1996.)
FIGURE 23. Measured insertion loss $s_{21}$ and isolation $s_{12}$ versus frequency (GHz) for the Ka-band NiZn circulator type of device seen in Fig. 22. (Newman, Webb, and Krowne, 1996.)
isolation of this nominal 32 GHz circulator device (center frequency appears to be 32.4 GHz). Fixture loss of approximately 1.75 dB is included in the measured insertion loss. Minimum insertion loss is about 2 dB, with isolation exceeding 20 dB over a 1.8 GHz bandwidth (from 31.7 GHz to 33.5 GHz) and exceeding 15 dB over a 3.3 GHz bandwidth (from 31 GHz to 34.3 GHz). A magnetless circulator circuit mask for use with SrM hexagonal ferrite as developed by EMS Technologies, Inc. is pictured in Fig. 24 [Newman, Webb, and Krowne, 1996]. Substrate SrM thickness is 10 mils (0.010 in.) for Ka-band operation, with the shield (diameter equal to 0.034 in.) setting the resonance frequency to be 32.5 GHz. One transformer stage occurs in the matching section to bring the intrinsic impedance to 50 Ω at the edge of the
FIGURE 24. Magnetless circulator circuit mask for use with SrM hexagonal ferrite operating at 32.5 GHz as developed by EMS Technologies, Inc. (Newman, Webb, and Krowne, 1996.)
FIGURE 25. Measured insertion loss $s_{21}$ and isolation $s_{12}$ versus frequency (GHz) for the SrM circulator device seen in Fig. 24. (Newman, Webb, and Krowne, 1996.)
pattern. Minimum insertion loss is about 3 dB, with isolation greater than 15 dB over a 0.6 GHz bandwidth (from 32.2 GHz to 32.8 GHz) (Fig. 25). A nominal 34 GHz YIG circulator circuit is shown in Fig. 26, with a disk diameter of about 1.3 mm and substrate thickness equal to 0.1 mm [Newman, Webb, and Krowne, 1996]. Microstrip lines exit directly from the shield region in 50 Ω impedances, in keeping with the ease at millimeter wavelengths of obtaining the system impedance value. This Westinghouse (Northrop-Grumman) device was made by growing YIG thin film on GaAs employing PLD technology. Insertion loss and isolation are plotted in Fig. 27, giving rough agreement with theory calculated using a uniform Green's function model. Insertion loss at the center frequency, 34 GHz, is 6.5 dB, a poor showing compared to the calculated value of 0.5 dB. Furthermore, the minimum insertion loss values occur at two humps on either side of the center frequency. Isolation exceeds 10 dB from 30.5 GHz to 36 GHz, a 5.5 GHz bandwidth.
C. First-Order Layer Effect Estimation
Circulator structures realized in the laboratory and in industrial applications are made with either a single substrate or with several layers. The single-substrate case obviously occurs when the substrate and the ferrite material are one and the same. But due to complications of mechanical and
FIGURE 26. Photograph of a nominal 34 GHz circulator with YIG on GaAs by Westinghouse (Northrop-Grumman). (Newman, Webb, and Krowne, 1996.)
FIGURE 27. Measured and modeled insertion loss $s_{21}$ and isolation $s_{12}$ versus frequency (GHz) for the YIG circulator device seen in Fig. 26. Model calculations were done using uniform Green's function theory. (Newman, Webb, and Krowne, 1996.)
material processing steps, several layers may exist. The designer can expect layers which do not exhibit nonreciprocal action to reduce the effectiveness of the finished circulator device when this layered effect shows up in the region under the shield. Structures which use drop-in pucks into an existing substrate, whether of dielectric or other ferrite material, will not suffer degradation in the main spatial operating region of the device. Film processes, which rely on deposition on top of an existing semiconductor semi-insulating substrate, or on top of an insulator, will produce the layered effect under the shield. It is possible to avoid this effect if the metal ground plane for the device can be deposited on top of the substrate, thereby bringing up the ground plane from below and then depositing the ferrite film. Assuming that we have a layered situation and would like to amend the 2D Green's function solver to model the device, we recommend a first-order method to estimate the new tensor permeability and permittivity [Neidert, unpubl., 1995]. For a two-layered medium consisting of the ferrite and a dielectric, demonstrative of the thinking, an equivalent medium is found which is uniform, and this then is consistent with the 2D nature of the problem, which is collapsed in the third dimension. Equivalent distributed capacitance or inductance of a mixed insulator transmission line [Ramo, Whinnery, and Van Duzer, 1967] and equivalent series inductance of a transmission line with lossy magnetic conductors [Jackson, 1975] are employed. Electric equivalence is established by using an equivalent permittivity that makes the shunt capacitance of the equivalent material equal to that of the real layered medium. Magnetic equivalence is established by using an equivalent permeability that makes the series inductance of the equivalent material equal to that of the real layered medium.
Applying Gauss' and Ampere's laws, the derived relationships for the equivalent rf permittivity $\varepsilon_{eq}$ and equivalent rf permeability tensor $\overleftrightarrow{\mu}_{eq}$ are given in (30) and (31), respectively. In (30), $\varepsilon_d$ and $\varepsilon_f$ are, respectively, the dielectric and ferrite permittivities. The total substrate thickness $d$ is merely
$$d = d_d + d_f \tag{32}$$
Equation (31) arises from (1), using the Polder tensor expressions for complex $\mu$ and $\kappa$, and adding the subscript f to the tensor permeability to be absolutely unambiguous. The dielectric permeability, a scalar (and assumed for simplicity to be unity times the free-space value), is upgraded to an identity tensor,
$$\overleftrightarrow{\mu}_d = \mu_0 \overleftrightarrow{I} \tag{33}$$
Inserting (1) and (33) into (31) gives the new equivalent single-medium permeability tensor:
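$$\overleftrightarrow{\mu}_{eq} = \mu_0 \begin{bmatrix} \cdots & \cdots & 0 \\ \cdots & \cdots & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{34}$$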
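As a rough numerical illustration of the equivalence argument above, the following Python sketch assumes the standard thin-layer forms that the text's reasoning implies: layers stacked between the plates act as capacitors in series for the permittivity, and as inductances in series (a thickness-weighted average) for the permeability. These closed forms, the function names, the sign convention chosen for the Polder tensor, and the material values are all illustrative assumptions and are not taken from the chapter.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # free-space permeability, H/m


def equivalent_permittivity(eps_d, eps_f, d_d, d_f):
    """Series-capacitor average (assumed form): d/eps_eq = d_d/eps_d + d_f/eps_f."""
    d = d_d + d_f
    return d / (d_d / eps_d + d_f / eps_f)


def polder_tensor(mu, kappa):
    """Ferrite tensor from relative elements mu, kappa for a z-directed bias.

    The sign placed on the off-diagonal terms is an assumed convention.
    """
    return MU0 * np.array([[mu, -1j * kappa, 0.0],
                           [1j * kappa, mu, 0.0],
                           [0.0, 0.0, 1.0]], dtype=complex)


def equivalent_permeability(mu_f, d_d, d_f):
    """Series-inductance average (assumed form) of the ferrite tensor and mu0 * I."""
    d = d_d + d_f
    return (d_f * np.asarray(mu_f, dtype=complex) + d_d * MU0 * np.eye(3)) / d


# Hypothetical example: 80 um ferrite film on a 250 um dielectric substrate.
D_D, D_F = 250e-6, 80e-6
eps_eq = equivalent_permittivity(eps_d=11.7, eps_f=15.0, d_d=D_D, d_f=D_F)
mu_eq = equivalent_permeability(polder_tensor(mu=0.9, kappa=0.5), D_D, D_F)
print("relative eps_eq:", eps_eq)
print("mu_eq / mu0:\n", mu_eq / MU0)
```

Under these assumptions the off-diagonal (nonreciprocal) elements are diluted by the factor $d_f/d$, which is one way to see why a nonmagnetic layer under the shield reduces the effectiveness of the circulator.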
D. First-Order Loss Effect Estimation
Metallic, dielectric, and magnetic losses are the basic loss mechanisms to be dealt with in circulator devices. When ferrite materials are utilized with low intrinsic losses, metallic losses of the conductors constituting the circulator structure will dominate, this being especially true for thin substrates. Care must be exercised in assessing and interpreting where loss actually comes from. There is a decided difference between bulk and thin film properties, and one would expect finite film thickness effects, surface roughness, cracks, and domain size to be a few of the important contributors to the material loss effect. Figure 28 shows the dissipation losses versus substrate thickness for a C-band (6 GHz) YIG circulator calculated using a uniform Green's function solver [Neidert and Philips, 1993]. The Green's function code was modified by an ad hoc, heuristic approach to account for losses, and to its credit it can be said to predict circulator losses reasonably well, having been verified by numerous experimental s-parameter measurements. Green's function code for inhomogeneous circulators [Krowne and Neidert, 1995] has also been modified along the same lines by Neidert [unpubl., 1995], keeping intact the basic inhomogeneous recursive Green's function construction developed by Krowne. The figure shows that if height exceeds about 10 mils (0.254 mm), the effect of metallic ohmic losses drops drastically and the overall loss is only slightly height-dependent, determined mostly by the magnetic and dielectric losses. Magnetic and dielectric losses together are height-independent over the entire thickness range. Insertion loss versus substrate thickness for a higher-frequency-band (X-band) device is calculated [Adam et al., 1995] using the uniform Green's function method [Neidert and Philips, 1993] in Fig. 29 at 10 GHz.
FIGURE 28. Dissipation losses versus substrate thickness for a C-band (6 GHz) YIG circulator calculated using a uniform Green's function solver. (Plot title: Microstrip circulator dissipation losses versus height at 6 GHz; horizontal axis: height, mils.)
FIGURE 29. Insertion loss versus substrate thickness for an X-band (10 GHz) circulator calculated using the uniform Green's function solver. (Horizontal axis: YIG thickness, microns.) (Adam et al., 1995.)
The circulator is constructed with a YIG film on silicon substrate, using gold metallization. Dielectric, magnetic, and conductor losses are broken out separately as plotted curves, and it is seen that the conductor loss dominates over the entire range from 25 μm to 100 μm, decreasing in effect as thickness increases. Dielectric loss is about twice that of magnetic loss, decreasing slightly and then leveling off to be constant over two-thirds of the graphed range. Even the conductor-loss slope, which is relatively steep below 50 μm, softens beyond that thickness. Unlike the 6 GHz device in Fig. 28, there is no crossover between conductor and material losses. In comparing the C-band results to the X-band results, we should note that the C-band curves were generated with many more thickness points, accounting for the smooth curves, providing greater accuracy, and giving us loss results near zero thickness where it is known that conductor loss effects will explode. The X-band plots omitted the region below 25 μm, where the steep rise for conductor loss would occur. Electromagnetic Sciences Company has conducted a circulator performance versus thickness study for a variety of circulator materials: garnet, YIG, Ni ferrite, and BaM [Webb, 1995]. For each plotted curve (Fig. 30), at the given frequency, the shield diameter and coupling aperture were optimized to obtain the lowest insertion loss. Data points are provided at 25 μm,
FIGURE 30. Measured insertion loss versus substrate thickness (μm) for garnet, YIG, Ni, and BaM ferrite circulators produced by EMS Technologies, Inc. Curves: C-band garnet; X-band YIG; 17 GHz Ni ferrite; 30 GHz BaM; 77 GHz BaM. (Webb, 1995.)
63 μm, 126 μm, 253 μm, and 508 μm thicknesses. The frequencies studied were C-band, X-band, 17 GHz, 30 GHz, and 77 GHz. Substrate thickness choice depends on how much loss is acceptable in the circuit of which the circulator device is to become a part. For example, if we want 0.3 dB of loss or less, this will be attained at about 85 μm for the 30 GHz BaM device, and at about 175 μm for the X-band YIG device. First-order construction done before [Neidert and Philips, 1993] is ad hoc and known to be not entirely rigorous, despite its reasonable predictive value. One of its main defects is that the vertical electric field under lossless conditions in the puck region (in its vertical extent) was replaced, when loss was imposed, by a new vertical electric field which was composed of the horizontal field due to surface loss and the original vertical field. That some type of addition will occur is not questioned, but the fact that vector effects may have been ignored is of concern. Interest in a rigorously derived first-order method is thus warranted. Clearly there are several ways to do this, and here we will only take one approach. The thin but finite thickness substrate (compared to a wavelength) will be studied as to the electromagnetic fields in the ferrite region under the circulator shield. It will be assumed that the primary electric field component will be $E_x$, perpendicular to the shield, consistent with a 2D model and our desire to fold the results of this study into such a model. Furthermore, since small regions under the shield will appear to have the individual waves traveling in straight lines (a small segment of the circulating curved azimuthal wave), we will obtain the solution to guided waves in a structure uniform in the xy cross section perpendicular to the propagating z-direction (see Fig. 31).
Because in the ideal lossless case the fields are bounded by perfect conductors from above (the microstrip shield) and below (the ground plane), what will be examined is in fact a parallel plate waveguiding situation. This is the first time in the literature that a simple but detailed and complete derivation as applied to the planar microstrip circulator has been available. A Helmholtz equation for the primary field in the propagation direction $E_z$ can be written down using the separation constant in the ferrite medium $k_f$:
$$\nabla^2 E_z + k_f^2 E_z = 0 \tag{35}$$
Writing this out explicitly in rectangular coordinates,
$$\frac{\partial^2 E_z}{\partial x^2} + \frac{\partial^2 E_z}{\partial y^2} + \frac{\partial^2 E_z}{\partial z^2} + k_f^2 E_z = 0 \tag{36}$$
Uniformity in the y-direction requires partial derivatives $\partial^n/\partial y^n$ of all orders to be zero. Taking also the propagation constant dependence in the z-direction
FIGURE 31. Cross-sectional diagram for analyzing a small parallel plate waveguiding slice of the circulator structure. Shown are the microstrip metal (thickness $t_m$), ground plane metal (thickness $t_g$), and ferrite material (thickness $d$). First skin depth $\delta$ penetration of the electromagnetic fields into the metal regions is indicated. Propagation is into the paper in the z-direction.
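to be
$$E_z \propto e^{-jk_p z} \tag{37}$$
and inserting these results into (36) yields
$$\frac{\partial^2 E_z}{\partial x^2} + (k_f^2 - k_p^2) E_z = 0 \tag{38}$$
Allowing plane waves also in the direction perpendicular to the shield, the field is upgraded from (37) to
$$E_z \propto e^{jk_x x} e^{-jk_p z} \tag{39}$$
making (38) reduce to the separation equation
$$-k_x^2 - k_p^2 + k_f^2 = 0 \tag{40}$$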
Satisfaction of electric wall conditions at $x = 0$ and $x = d$ (we have temporarily shifted the x-coordinate by adding $d/2$ to simplify the immediate discussion below) requires a superposition of (39) waves to be employed:
$$E_z = \sum_{m} A_m e^{jk_{xm}x} = \sum_{m \ge 0} \left[ A_m(+) e^{jk_{xm}x} + A_m(-) e^{-jk_{xm}x} \right] \tag{41}$$
Here the z-propagation dependence was dropped for economy of notation. Boundary conditions are
$$E_z(0) = 0; \quad E_z(d) = 0 \tag{42a, b}$$
Equations (42a) and (42b), respectively, result in, at $x = 0$ and $x = d$, the electric field constraints
$$\sum_{m \ge 0} \left[ A_m(+) + A_m(-) \right] = 0 \tag{43}$$
$$\sum_{m \ge 0} \left[ A_m(+) e^{jk_{xm}d} + A_m(-) e^{-jk_{xm}d} \right] = \sum_{m \ge 0} A_m(+) \left[ e^{jk_{xm}d} - e^{-jk_{xm}d} \right] = 2j \sum_{m \ge 0} A_m(+) \sin(k_{xm}d) = 0 \tag{44}$$
The second right-hand-side expression in (44) results from the constraint for all m modes (positive or zero) obtained by satisfying (43), namely,
$$A_m(+) + A_m(-) = 0 \tag{45}$$
and the third equality in (44) requires
$$\sin(k_{xm}d) = 0 \tag{46}$$
to hold also on an individual mode basis. The solution to (46) is
$$k_{xm} = \frac{m\pi}{d}, \quad m = 0, 1, 2, \ldots \tag{47}$$
Therefore, from the separation equation (40), the propagation constant in
the structure is
$$k_p = \sqrt{k_f^2 - k_{xm}^2} = \sqrt{k_f^2 - \left(\frac{m\pi}{d}\right)^2} \tag{48}$$
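In order to understand which m modes to worry about, we must find $k_f$. It is given by
$$k_f = \omega\sqrt{\mu_f \varepsilon_f} = \omega\sqrt{\mu_0 \varepsilon_0}\sqrt{\mu_{fr}\varepsilon_{fr}} = k_{air}\sqrt{\mu_{fr}\varepsilon_{fr}} \tag{49}$$
The parameters in (49) are selected or evaluated as follows, prior to finding $k_f$.
$$f = 1\ \text{GHz}; \quad \varepsilon_{fr} = 13.3; \quad \mu_{fr} = \mu_{2D}\big|_{\kappa = 0} = \mu \approx 1 \tag{50a}$$
$$k_{air} = \frac{2\pi f}{c} = \frac{20\pi}{3} \approx 20.94\ \text{m}^{-1} \tag{50b}$$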
$\mu_{fr}$ is found in (50a) by using (24) to equate it to the effective two-dimensional permeability in a circulator, but evaluating for simplicity at zero applied magnetic bias field, making the off-diagonal contribution $\kappa = 0$. Then $k_f^2$ is needed in the (48) evaluation, and it is found with the help of (50) from (49):
$$k_f^2 = 5.83 \times 10^{3}\ \text{m}^{-2} \tag{51}$$
The second contribution under the radical sign in (48) is $k_{xm}^2$, which for the substrate thickness $d = 100\ \mu\text{m}$ is evaluated to be
$$k_{xm}^2 = \left(\frac{m\pi}{d}\right)^2 = 9.87 \times 10^{8}\, m^2\ \ \text{m}^{-2} \tag{52}$$
Inserting values for $k_f^2$ and $k_{xm}^2$, obtained respectively in (51) and (52), into (48) produces a negative argument under the radical sign, making the $k_p$ propagation constant always imaginary, indicative of decaying or nonpropagating waves for all m modes, except for the $m = 0$ case. Thus,
$$k_p = k_f \tag{53}$$
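The mode-selection argument leading to (53) can be checked numerically. The sketch below uses the parameter values quoted in the text (1 GHz, $\varepsilon_{fr} = 13.3$, $\mu_{fr} = 1$, $d = 100\ \mu$m); the function names are hypothetical, and the speed of light is rounded to $3 \times 10^8$ m/s because that rounding reproduces the quoted 20.94 m$^{-1}$.

```python
import math

C0 = 3.0e8  # speed of light in m/s, rounded (assumption) to match the quoted k_air


def k_air(f_hz):
    """Free-space wavenumber in 1/m."""
    return 2.0 * math.pi * f_hz / C0


def kf_squared(f_hz, eps_fr, mu_fr):
    """Square of the unbounded ferrite wavenumber, as in (49)."""
    return k_air(f_hz) ** 2 * eps_fr * mu_fr


def kxm_squared(m, d_m):
    """Square of the transverse wavenumber m*pi/d between the plates."""
    return (m * math.pi / d_m) ** 2


def kp_squared(m, f_hz, eps_fr, mu_fr, d_m):
    """Argument under the radical in (48); negative means an evanescent mode."""
    return kf_squared(f_hz, eps_fr, mu_fr) - kxm_squared(m, d_m)


def first_mode_cutoff_hz(eps_fr, mu_fr, d_m):
    """Frequency at which k_f = pi/d, where the m = 1 mode would start to propagate."""
    return C0 / (2.0 * d_m * math.sqrt(eps_fr * mu_fr))


F_HZ, EPS_FR, MU_FR, D_M = 1.0e9, 13.3, 1.0, 100e-6

for m in range(4):
    kp2 = kp_squared(m, F_HZ, EPS_FR, MU_FR, D_M)
    state = "propagating" if kp2 > 0 else "evanescent"
    print(f"m = {m}: k_p^2 = {kp2:.3e} 1/m^2 ({state})")

print(f"m = 1 cutoff: {first_mode_cutoff_hz(EPS_FR, MU_FR, D_M) / 1e9:.0f} GHz")
```

With these numbers the $m = 1$ cutoff lies in the hundreds of gigahertz, so for a substrate this thin only the $m = 0$ wave needs to be retained anywhere in the microwave and millimeter-wave range.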
Information on the electromagnetic fields can be taken from Ramo, Whinnery, and Van Duzer [1967]:
$$H_z = 0; \quad E_y = 0; \quad H_x = 0 \quad (\partial/\partial y = 0) \tag{54d, e, f}$$
These TM$_m$ waves, as we have seen from the steps leading up to (53), can only be of the $m = 0$ type, or TM$_0$. Placing $m = 0$ into (54) produces a null $E_z$ component, showing us that the lowest-order TM$_0$ wave is a TEM wave. The fact that the field expressions for nonzero components blow up merely means that a renormalization, (55), is appropriate, which transforms (54) to
$$E_z = -\frac{k_{xm}}{jk_p} E_{0x} \sin\left(\frac{m\pi x}{d}\right); \quad E_x = E_{0x} \cos\left(\frac{m\pi x}{d}\right); \quad H_y = \frac{\omega\varepsilon_f}{k_p} E_{0x} \cos\left(\frac{m\pi x}{d}\right) \tag{56a, b, c}$$
Equation (56) allows us to kill all modes except $m = 0$. $E_z$ may be safely dropped and the simple field formulas
$$E_z = 0; \quad E_x = E_{0x}; \quad H_y = \frac{\omega\varepsilon_f}{k_p} E_{0x} \tag{57a, b, c}$$
remain. From (57c) identify the right-hand side as $H_y = H_{0y}$, analogous to (57b), and note that wave impedance in the ferrite is defined as
$$Z_f = \frac{E_{0x}}{H_{0y}} = \frac{k_p}{\omega\varepsilon_f} = \sqrt{\frac{\mu_f}{\varepsilon_f}} \tag{58}$$
where the last equality results from (49). Impedance specification (58) will be employed later, but we will move on to another line of analysis. Referring back to Fig. 31, the original problem is considered to have perfect electric walls, making the field solution exist only in the ferrite material region. The field solution is denoted by capital letters E and H and given by
$$\mathbf{E} = E_{0x} e^{-jk_f z}\,\hat{x}; \quad \mathbf{H} = H_{0y} e^{-jk_f z}\,\hat{y} \tag{59}$$
having utilized (57) and having upgraded to include the z-dependence. Allowing the metal at the bottom and top to become imperfect will perturb the original solution so that (1) there will now be a finite but small field in the metal, and (2) the large-field solution in the ferrite region will be modified by an additional small-field correction. The small-field solution or correction is denoted by lowercase letters e and h. Maxwell's equations and the constitutive relationships in the ferrite are
$$\nabla \times \mathbf{E} = -j\omega\mathbf{B}; \quad \nabla \times \mathbf{H} = \mathbf{J} + j\omega\mathbf{D} \tag{60a, b}$$
$$\mathbf{B} = \mu_f \mathbf{H}; \quad \mathbf{D} = \varepsilon_f \mathbf{E}; \quad \mathbf{J} = \sigma_f \mathbf{E} = 0 \tag{61a, b, c}$$
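The loss behavior (dielectric) in the ferrite will have only a secondary effect on the field solution in the metal, and it is that solution which will generate the correction field to the primary ferrite field, justifying setting $\sigma_f = 0$. The small field in the metal will obey
$$\nabla \times \mathbf{e}_m = -j\omega\mathbf{b}_m; \quad \nabla \times \mathbf{h}_m = \mathbf{i}_m + j\omega\mathbf{d}_m \tag{62a, b}$$
$$\mathbf{b}_m = \mu_m \mathbf{h}_m; \quad \mathbf{d}_m = \varepsilon_m \mathbf{e}_m; \quad \mathbf{i}_m = \sigma_m \mathbf{e}_m \tag{63a, b, c}$$
and the correction field will obey
$$\nabla \times \mathbf{e}_f = -j\omega\mathbf{b}_f; \quad \nabla \times \mathbf{h}_f = \mathbf{i}_f + j\omega\mathbf{d}_f \tag{64a, b}$$
$$\mathbf{b}_f = \mu_f \mathbf{h}_f; \quad \mathbf{d}_f = \varepsilon_f \mathbf{e}_f; \quad \mathbf{i}_f = \sigma_f \mathbf{e}_f = 0 \tag{65a, b, c}$$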
Surface current $\mathbf{J}_s$, which is a primary current, can be related to the primary H field [Harrington, 1961] by
$$\hat{n} \times \left[ \mathbf{H}^{(1)} - \mathbf{H}^{(2)} \right] = \mathbf{J}_s \tag{66}$$
Here the normal vector $\hat{n}$ points into region 1 at the surface, and $\mathbf{J}_s$ is along the surface forming an interface between regions 1 and 2. From Fig. 32 identify $\mathbf{H}^{(1)} = 0$ as the primary field (by problem construction it has been set to zero already) in the top microstrip circulator metal at the interface and $\mathbf{H}^{(2)} = \mathbf{H}$ as the field in the ferrite side. Because $\hat{n} = \hat{x}$, we find that
$$\mathbf{J}_s^T = -\hat{x} \times \hat{y} H = -\hat{z} H_{0y} e^{-jk_f z} \tag{67}$$
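The sign of the surface current follows directly from the cross product in (66), and the same relation gives the opposite sign when the metal and ferrite exchange roles as regions 1 and 2, as happens at the ground plane. A minimal check, with a hypothetical helper name and a unit field amplitude chosen only for illustration:

```python
import numpy as np

X_HAT = np.array([1.0, 0.0, 0.0])
Y_HAT = np.array([0.0, 1.0, 0.0])


def surface_current(n_hat, h_region1, h_region2):
    """J_s = n x (H1 - H2), with n pointing into region 1, as in (66)."""
    return np.cross(n_hat, h_region1 - h_region2)


H0Y = 1.0                  # illustrative amplitude of the primary ferrite field
H_FERRITE = H0Y * Y_HAT    # primary field is y-directed
H_METAL = np.zeros(3)      # primary field is zero inside the metal

# Top interface: region 1 is the shield metal, region 2 is the ferrite.
j_top = surface_current(X_HAT, H_METAL, H_FERRITE)
# Bottom interface: region 1 is the ferrite, region 2 is the ground plane metal.
j_bottom = surface_current(X_HAT, H_FERRITE, H_METAL)

print("top    J_s:", j_top)
print("bottom J_s:", j_bottom)
```

Both currents are z-directed and equal in magnitude, with opposite signs, which is the go-and-return current pair expected of a parallel plate line.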
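This leads us to choose the small current $\mathbf{i}_m$ in the metal to be in the z-direction also, namely, $\mathbf{i}_m = i_{zm}\hat{z}$. Let us derive the Helmholtz equation for this current. Insert (63b) and (63c) into (62b):
$$\nabla \times \mathbf{h}_m = (\sigma_m + j\omega\varepsilon_m)\mathbf{e}_m \tag{68}$$
Now take the curl of (62a), using (63a) to obtain
$$\nabla \times \nabla \times \mathbf{e}_m = -j\omega\mu_m \nabla \times \mathbf{h}_m \tag{69}$$
then employ the identity $\nabla(\nabla \cdot \mathbf{e}_m) - \nabla^2 \mathbf{e}_m = \nabla \times \nabla \times \mathbf{e}_m$ to replace the left-hand side in (69), and (68) to do the same for the right-hand side.
$$\nabla(\nabla \cdot \mathbf{e}_m) - \nabla^2 \mathbf{e}_m = -j\omega\mu_m(\sigma_m + j\omega\varepsilon_m)\mathbf{e}_m \tag{70}$$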
FIGURE 32. Pictorial representation of the discontinuity in H field across a general interface formed from two dissimilar regions as related to surface current $\mathbf{J}_s$ at the interface, with eventual application to metal-ferrite interfaces in the circulator.
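Taking the divergence of (68) specifies $\nabla \cdot \mathbf{e}_m$ as
$$\nabla \cdot \mathbf{e}_m = 0 \tag{71}$$
Therefore, (71) reduces (70) to the wave equation
$$\nabla^2 \mathbf{e}_m = j\omega\mu_m(\sigma_m + j\omega\varepsilon_m)\mathbf{e}_m \tag{72}$$
Invoking (63c), the wave equation for $i_{zm}$ from (72) can be easily put down.
$$\nabla^2 i_{zm} = j\omega\mu_m(\sigma_m + j\omega\varepsilon_m) i_{zm} \tag{73}$$
Uniformity in the y-direction, upon explicitly expressing $\nabla^2$, leads to (74). Realizing that $i_{zm} \propto e^{jk_{xm}x} e^{-jk_f z}$ and assigning $p_{xm} = jk_{xm}$ generates a separation equation solution for $p_{xm}$:
$$p_{xm}^2 = j\omega\mu_m(\sigma_m + j\omega\varepsilon_m) - k_f^2 \tag{75}$$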
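For copper, $\sigma_{Cu} = 5.8 \times 10^{7}\ \Omega^{-1}/\text{m}$; using $\varepsilon_0 = 8.854 \times 10^{-12}$ F/m and $\mu_0 = 4\pi \times 10^{-7}$ H/m,
$$\sigma_{Cu} = 5.8 \times 10^{7}\ \Omega^{-1}/\text{m} \gg \omega\varepsilon_m \approx \omega\varepsilon_0 = 5.56 \times 10^{-2} f(\text{GHz})\ \Omega^{-1}/\text{m} \tag{76}$$
making (75)
$$p_{xm}^2 = j\omega\mu_m\sigma_m - k_f^2 \tag{77}$$
Doing a comparison again,
$$\omega\mu_m\sigma_{Cu} \approx \omega\mu_0\sigma_{Cu} = 4.58 \times 10^{11} f(\text{GHz})\ \text{m}^{-2} \gg k_f^2 = 5.83 \times 10^{3}\ \text{m}^{-2} \tag{78}$$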
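making (77)
$$p_{xm}^2 = j\omega\mu_m\sigma_m$$
or
$$p_{xm} = \sqrt{j\omega\mu_m\sigma_m} = \frac{1 + j}{\delta}, \quad \delta = \sqrt{\frac{2}{\omega\mu_m\sigma_m}} \tag{79}$$
where the positive $p_{xm}$ branch has been selected. Metal current $i_{zm}^T$ can be expressed as
$$i_{zm}^T(x) = c_1^T e^{p_{xm}x} + c_2^T e^{-p_{xm}x} = i_{0z}^T e^{-p_{xm}x} = i_{0z}^T e^{-x/\delta} e^{-jx/\delta} \tag{80}$$
to maintain $i_{zm}^T(x) \to 0$ as $x \to \infty$, noting $x = 0$ is referenced to the metal-ferrite interface. Surface current in the top metal must be the semi-infinite integral
$$J_{sz}^T = \int_0^{\infty} i_{zm}^T(x)\,dx = \frac{i_{0z}^T}{p_{xm}} \tag{81}$$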
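The closed form of the semi-infinite integral can be confirmed numerically. The sketch below assumes the good-conductor limit $p_{xm} = (1 + j)/\delta$ for copper at 10 GHz, truncates the integral at 40 skin depths, and uses a hand-written trapezoid rule; the function names and the unit current amplitude are illustrative choices.

```python
import numpy as np

MU0 = 4e-7 * np.pi   # H/m
SIGMA_CU = 5.8e7     # copper conductivity in S/m, as quoted in the text


def skin_depth(f_hz, sigma=SIGMA_CU, mu_m=MU0):
    """Skin depth sqrt(2 / (omega * mu_m * sigma)) in meters."""
    return np.sqrt(2.0 / (2.0 * np.pi * f_hz * mu_m * sigma))


def integrated_current(i0, p, x_max, n=200_001):
    """Trapezoid-rule integral of i0 * exp(-p x) from 0 to x_max."""
    x = np.linspace(0.0, x_max, n)
    y = i0 * np.exp(-p * x)
    dx = x[1] - x[0]
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))


delta = skin_depth(10e9)
p_xm = (1.0 + 1.0j) / delta            # good-conductor limit (assumed)
j_numeric = integrated_current(1.0, p_xm, 40.0 * delta)
j_closed = 1.0 / p_xm                  # closed form i0 / p_xm with i0 = 1

print(f"skin depth at 10 GHz: {delta * 1e6:.3f} um")
print("numeric  J_sz / i0:", j_numeric)
print("closed   J_sz / i0:", j_closed)
```

The result is complex, equal to $\delta(1 - j)/2$ per unit $i_{0z}^T$: the current deeper in the metal lags the current at the interface, so the integrated surface current is not in phase with the interface value.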
Now return to (62b) and, dropping the displacement current in keeping with the level of approximation done here,
$$\nabla \times \mathbf{h}_m \approx \mathbf{i}_m \tag{82}$$
Expanding out the $\nabla \times$ operator in (82) gives (83), which, when applying uniformity in the y-direction and $\mathbf{h}_m \propto e^{-p_{xm}x} e^{-jk_f z}$, gives
$$-jk_f(-h_{ym}\hat{x} + h_{xm}\hat{y}) + p_{xm}(h_{zm}\hat{y} - h_{ym}\hat{z}) = i_{zm}\hat{z} \tag{84}$$
Remembering the earlier argument showing that $p_{xm} \gg k_f$, the reduction to
$$p_{xm}(h_{zm}\hat{y} - h_{ym}\hat{z}) \approx i_{zm}\hat{z} \tag{85}$$
follows. To have consistency on both sides of (85), the small metallic magnetic field $h_{zm}$ must be dropped.
$$h_{zm} = 0 \tag{86}$$
Therefore, the y-component of the magnetic field is
$$h_{ym}^T(x) = -\frac{1}{p_{xm}} i_{zm}^T(x) = -\frac{i_{0z}^T}{p_{xm}} e^{-p_{xm}x} = -J_{sz}^T e^{-p_{xm}x} = H_{0y} e^{-p_{xm}x} \tag{87}$$
where the third equality arises from (81) and the fourth from (67). Retrieving the other Maxwell equation (62a), letting the metallic magnetic field consist only of the y-component ($h_{xm} = 0$ as well), and expanding the $\nabla \times$ operator, we find that
$$\nabla \times \mathbf{e}_m = -j\omega\mu_m h_{ym}\hat{y} \tag{88}$$
becomes (89), which, when applying uniformity in the y-direction and $\mathbf{e}_m \propto e^{-p_{xm}x} e^{-jk_f z}$, gives
$$-jk_f(-e_{ym}\hat{x} + e_{xm}\hat{y}) + p_{xm}(e_{zm}\hat{y} - e_{ym}\hat{z}) = -j\omega\mu_m h_{ym}\hat{y} \tag{90}$$
Once again, recalling that $p_{xm} \gg k_f$, the reduction to
$$p_{xm}(e_{zm}\hat{y} - e_{ym}\hat{z}) = -j\omega\mu_m h_{ym}\hat{y} \tag{91}$$
follows. To have consistency on both sides of (91), the small metallic electric field $e_{ym}$ must be dropped.
$$e_{ym} = 0 \tag{92}$$
Therefore, the z-component of the electric field is
$$e_{zm}^T(x) = -\frac{1}{p_{xm}} j\omega\mu_m h_{ym}^T(x) = -\frac{1}{p_{xm}} j\omega\mu_m H_{0y} e^{-p_{xm}x} = -\frac{j\omega\mu_m\delta}{1 + j} H_{0y} e^{-p_{xm}x} \tag{93}$$
where the second equality comes from (87) and the last from (79). For the bottom metal ground plane of the circulator device, follow a similar procedure as done for the top metal microstrip shield. Refer to Fig. 32 for the interfacial geometry. Now the surface current is $\mathbf{J}_s = \mathbf{J}_s^B$. From Fig. 32 identify $\mathbf{H}^{(2)} = 0$ as the primary field (by problem construction it has been set to zero already) in the bottom ground plane circulator metal at the interface and $\mathbf{H}^{(1)} = \mathbf{H}$ as the field in the ferrite side. Because $\hat{n} = \hat{x}$ still, we find from (66) that
$$\mathbf{J}_s^B = \hat{x} \times \hat{y} H = \hat{z} H_{0y} e^{-jk_f z} \tag{94}$$
Bottom metal current $i_{zm}^B$, using (80), can be expressed as
$$i_{zm}^B(x) = c_1^B e^{p_{xm}x} + c_2^B e^{-p_{xm}x} = i_{0z}^B e^{p_{xm}x} = i_{0z}^B e^{x/\delta} e^{jx/\delta} \tag{95}$$
to maintain $i_{zm}^B(x) \to 0$ as $x \to -\infty$, noting $x = 0$ is referenced to the ferrite-metal interface. Surface current in the bottom metal must be given by the semi-infinite integral
$$J_{sz}^B = \int_{-\infty}^{0} i_{zm}^B(x)\,dx = \frac{i_{0z}^B}{p_{xm}} \tag{96}$$
Invoking (62b) again and dropping the displacement current,
$$\nabla \times \mathbf{h}_m^B \approx \mathbf{i}_m^B \tag{97}$$
Expanding the $\nabla \times$ operator gives (98), which, when applying uniformity in the y-direction and $\mathbf{h}_m^B \propto e^{p_{xm}x} e^{-jk_f z}$, gives
$$-jk_f(-h_{ym}^B\hat{x} + h_{xm}^B\hat{y}) - p_{xm}(h_{zm}^B\hat{y} - h_{ym}^B\hat{z}) = i_{zm}^B\hat{z} \tag{99}$$
Since $p_{xm} \gg k_f$, (99) reduces to
$$-p_{xm}(h_{zm}^B\hat{y} - h_{ym}^B\hat{z}) \approx i_{zm}^B\hat{z} \tag{100}$$
Again, to have consistency on both sides of (100), the small metallic magnetic field $h_{zm}^B$ must be dropped.
$$h_{zm}^B = 0 \tag{101}$$
Therefore, the y-component of the magnetic field is
$$h_{ym}^B(x) = \frac{1}{p_{xm}} i_{zm}^B(x) = \frac{i_{0z}^B}{p_{xm}} e^{p_{xm}x} = J_{sz}^B e^{p_{xm}x} = H_{0y} e^{p_{xm}x} \tag{102}$$
where the third equality arises from (96) and the fourth from (94). Retrieving the other Maxwell equation (62a), and expanding the $\nabla \times$ operator, we find that
$$\nabla \times \mathbf{e}_m^B = -j\omega\mu_m h_{ym}^B\hat{y} \tag{103}$$
becomes (104), which, when applying uniformity in the y-direction and $\mathbf{e}_m^B \propto e^{p_{xm}x} e^{-jk_f z}$, gives
$$-jk_f(-e_{ym}^B\hat{x} + e_{xm}^B\hat{y}) - p_{xm}(e_{zm}^B\hat{y} - e_{ym}^B\hat{z}) = -j\omega\mu_m h_{ym}^B\hat{y} \tag{105}$$
This reduces to
$$-p_{xm}(e_{zm}^B\hat{y} - e_{ym}^B\hat{z}) = -j\omega\mu_m h_{ym}^B\hat{y} \tag{106}$$
because $p_{xm} \gg k_f$. The small metallic electric field $e_{ym}^B$ must be dropped for consistency on both sides of (106):
$$e_{ym}^B = 0 \tag{107}$$
Therefore, the z-component of the bottom electric field is
$$e_{zm}^B(x) = \frac{1}{p_{xm}} j\omega\mu_m h_{ym}^B(x) = \frac{1}{p_{xm}} j\omega\mu_m H_{0y} e^{p_{xm}x} = \frac{j\omega\mu_m\delta}{1 + j} H_{0y} e^{p_{xm}x} \tag{108}$$
where the second equality comes from (102) and the last from (79). By (38) for the primary field in the ferrite, using (53) equating the guide propagation constant to the unbounded propagation constant in the ferrite,
$$\frac{\partial^2 E_z}{\partial x^2} = 0 \tag{109}$$
Because this was generated from $\partial_x^2 E_z = 0$, the same relation ought to hold for the correction field in the ferrite $\mathbf{e}_f$, namely $\partial_x^2 e_{zf} = 0$, which leads to an analogous relation to (109):
$$\frac{\partial^2 e_{zf}}{\partial x^2} = 0 \tag{110}$$
whose solution, (111), is linear in $x$. The boundary conditions to be applied are
$$e_{zf}(x = d/2) = e_{zm}^T; \quad e_{zf}(x = -d/2) = e_{zm}^B \tag{112a, b}$$
Taking the sum and difference of the two formulas in (112) yields (113). Inserting (93) and (108) into (113) shows that the constant term in (111) vanishes, and gives the final ferrite correction field (114), including z-dependence.
The $\mathbf{h}_f$ correction field in the ferrite region can be found by combining (64a) and (65a), and then using (114) for $e_{zf}$:
$$\mathbf{h}_f(x, z) = -\frac{1}{j\omega\mu_f} \nabla \times \mathbf{e}_f \tag{115}$$
The first approximation made in evaluating (115) comes from the uniformity condition in the y-direction and dropping field components $e_{xf}$ and $e_{yf}$. The total magnetic field in the ferrite region is, by superposition of (59) and (115), given by (116). Wave impedance, extending (58), is given by (117). Effective permeability within the ferrite region, after inspecting the last
(117) expression, is now given by
$$\mu_{eq} = \mu_f \left[ 1 - (1 - j)\frac{\delta}{d}\frac{\mu_m}{\mu_f} \right]^{-2} \approx \mu_f \left[ 1 + 2(1 - j)\frac{\delta}{d}\frac{\mu_m}{\mu_f} \right] \tag{118}$$
Equation (118) provides an equivalent permeability for the ferrite region which accounts for the whole guiding structure, imperfect metal regions, and the main ferrite puck region. The second approximation in the equation is true for small corrections, implicit in the whole derivation. Extending the validity range to cases when the metallic loss is oppressive, besides being unwise from a pragmatic circulator construction point of view, would be highly suspect since many field components were dropped along the way to the final result. If we had assumed an $e^{-j\omega t}$ time dependence, changes along the way would occur in the expressions found, easy enough for the reader to determine by following through the algebra, and the final result in (118) would change to
$$\mu_{eq} = \mu_f \left[ 1 - (1 + j)\frac{\delta}{d}\frac{\mu_m}{\mu_f} \right]^{-2} \approx \mu_f \left[ 1 + 2(1 + j)\frac{\delta}{d}\frac{\mu_m}{\mu_f} \right] \tag{119}$$
What is placed in formulas (118) or (119) for $\mu_f$ is the two-dimensional effective permeability given by (24), $\mu_f = \mu_{2D} = \mu_0\mu_{fr}$. Using the earlier results in (78) to calculate skin depth (the formula there is in MKS; conversion to CGS requires $\sigma_{mks}/4\pi\varepsilon_0 \to \sigma_{cgs}$),
$$\delta = \sqrt{\frac{2}{\omega\mu_m\sigma_m}} = \sqrt{\frac{2}{4.58 \times 10^{11} f(\text{GHz})/\text{m}^2}} = \frac{2.09}{\sqrt{f(\text{GHz})}}\ \mu\text{m} \tag{120}$$
which gives 2.1 μm, 0.66 μm, and 0.21 μm for, respectively, f = 1, 10, 100 GHz. For a 100 μm thick substrate of ferrite, the ratio $\delta/d \ll 1$, and assuming the permeability ratio $\mu_m/\mu_f$ doesn't misbehave (i.e., it is on the order of unity), the corrections in (118) and (119) are indeed small as expected. (The CGS formula for $\delta$ is $\delta = c/\sqrt{2\pi\omega\mu_{mr}\sigma_{cgs}}$, where $\mu_{mr}$ is the relative permeability in the metal.) Equation (120) was done for the case of copper, but it is easy to modify the formula for another metal. A few other issues must be addressed if we are to be able to use the equivalent permeability in the 2D Green's function code. Some issues involve what to do with the particular tensor elements $\mu$ and $\kappa$ seen in (1),
which may be needed individually, although we happen to know that $H_r$ and $H_\phi$ depend upon the ratio $\kappa/\mu$, $\mu_{eff}$ (the 2D value), and the $E_z$ field through partial derivative operators. And $E_z$ depends upon an EH Green's function which again only depends upon the ratio $\kappa/\mu$. The most reasonable thing to do is simply to assume the approach really transformed the diagonal element $\mu$, and that $\kappa$ must follow suit since the analysis is only isotropic and the material is anisotropic. This process of logic will immediately remove any problems since $\mu_{eff}$ depends upon $\mu$ and $\kappa/\mu$, and similar dependences in $\mu$ and $\kappa$ will cancel out in the ratio. The last remaining issue is how to obtain a revised propagation constant in the ferrite medium.
$$k_{eq,p} = k_{eq,f} = \omega\sqrt{\mu_{eq}\varepsilon_f} = \omega\sqrt{\mu_f\varepsilon_f}\,\frac{1}{1 - (1 - j)\dfrac{\delta}{d}\dfrac{\mu_m}{\mu_f}} = k_f\,\frac{1}{1 - (1 - j)\dfrac{\delta}{d}\dfrac{\mu_m}{\mu_f}} \tag{121}$$
If we had assumed an $e^{-j\omega t}$ time dependence, the equivalent propagation constant would be
$$k_{eq,p} = k_f\,\frac{1}{1 - (1 + j)\dfrac{\delta}{d}\dfrac{\mu_m}{\mu_f}} \tag{122}$$
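To give a feel for the size of the metallic-loss correction, the following sketch evaluates the copper skin depth of (120) and the bracketed correction factor appearing in (118) and (121) for a 100 μm substrate. It reads the exponent in (118) as $-2$, which is what its first-order expansion with the factor 2 implies, and it takes $\mu_m/\mu_f = 1$ as a stand-in value; the function names are hypothetical.

```python
import math

MU0 = 4e-7 * math.pi   # H/m
SIGMA_CU = 5.8e7       # copper conductivity in S/m, as quoted in the text


def skin_depth(f_hz, sigma=SIGMA_CU, mu_m=MU0):
    """Skin depth sqrt(2 / (omega * mu_m * sigma)) in meters."""
    return math.sqrt(2.0 / (2.0 * math.pi * f_hz * mu_m * sigma))


def correction(f_hz, d_m, mu_ratio=1.0):
    """Bracketed factor 1 - (1 - j)(delta/d)(mu_m/mu_f), e^{+j w t} convention."""
    return 1.0 - (1.0 - 1.0j) * skin_depth(f_hz) / d_m * mu_ratio


def mu_eq_ratio(f_hz, d_m, mu_ratio=1.0):
    """mu_eq / mu_f with the exponent read as -2 (assumption)."""
    return correction(f_hz, d_m, mu_ratio) ** -2


def mu_eq_ratio_first_order(f_hz, d_m, mu_ratio=1.0):
    """Small-correction expansion 1 + 2(1 - j)(delta/d)(mu_m/mu_f)."""
    return 1.0 + 2.0 * (1.0 - 1.0j) * skin_depth(f_hz) / d_m * mu_ratio


def k_eq_ratio(f_hz, d_m, mu_ratio=1.0):
    """k_eq / k_f, the reciprocal of the correction factor."""
    return 1.0 / correction(f_hz, d_m, mu_ratio)


D_M = 100e-6  # substrate thickness in meters
for f_ghz in (1.0, 10.0, 100.0):
    f_hz = f_ghz * 1e9
    print(f"{f_ghz:6.1f} GHz: delta = {skin_depth(f_hz) * 1e6:.2f} um, "
          f"mu_eq/mu_f = {mu_eq_ratio(f_hz, D_M):.4f}, "
          f"k_eq/k_f = {k_eq_ratio(f_hz, D_M):.4f}")
```

The negative imaginary part of $\mu_{eq}/\mu_f$ is what carries the conductor loss into the 2D model under the $e^{+j\omega t}$ convention; the real part rises slightly as well, reflecting field penetration into the metal.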
V. SETUP FORMULAS FOR NUMERICAL EVALUATION OF MICROSTRIP CIRCULATORS
A. Formulas for Static Internal Magnetic Field
In developing electromagnetic rf field solutions for circulators, the simplest assumption for the dc bias magnetic field is that of a nonvarying or constant spatial field. This may be satisfactory in many cases where the bias field circuit has been engineered to meet this requirement. And this may be more the case in permanent magnet construction for industrial applications which are in a mature state of development, especially in low-frequency regime uses. But this probably is not the case in hybrid or monolithic circuit applications for the microwave and millimeter wave frequencies. Certainly, where laboratory measurements are conducted using large electromagnet pole pieces, the attainment of a nearly constant magnetic field inside the pole pieces is assured. Such an arrangement is not possible for a packaged miniaturized circulator with a nonideal geometric configuration of the ferrite puck, permanent magnet, and flux return path. Even if the applied
magnetic field is maintained to be uniform, the internal magnetic field $H_i$, which determines the values of the elements in the permeability tensor, will still not be uniform (except in the extreme limit of infinitesimal substrate thickness). Instead, the circulator's aspect ratio of radius to thickness controls the degree to which the inhomogeneous demagnetization field opposes the applied magnetic field. One consequence of $H_i$ variation is that the ferromagnetic resonance frequency, where the magnetic dissipation losses are a maximum, is spread out over a distribution of frequencies. For ordinary circulators designed with the ferromagnetic resonance frequency well below the geometric shield rf resonant frequency (approximating the center frequency of the circulator operating band), cancellation of the applied bias field by a saturated magnetization leads to nearly zero $H_i$. Inhomogeneous demagnetization, however, can create a range of ferromagnetic resonance frequencies which are not zero and which may become quite large close to the circulator perimeter, where $H_i$ can rise dramatically. Thus near the circulator perimeter, the ferromagnetic resonance may encroach on the low-frequency end of the operating bandwidth, bringing many problems, including high losses to the outermost annular region. $H_i$ can be found by a direct solution of Maxwell's magnetostatic (time-independent) equations. The relevant equation is the curl H relation, Ampere's law governing the magnetic field H in a current-free, time-independent environment.
$$\nabla \times \mathbf{H} = 0 \tag{123}$$
Equation (123) is solved specifying H as a gradient of a magnetostatic potential $\Psi$,
$$\mathbf{H} = -\nabla\Psi \tag{124}$$
For nonlinear ferrite material, the constitutive relation is (125a) in MKS units, where the second equality comes about from the near equality of $\mu_s$ and $\mu_0$. In CGS units this relationship would look like (125b). In either case, M is the magnetization inside the ferrite material caused by the applied field H, where we have dropped the earlier subscript ($H_{app}$) for brevity. Note that the terms within brackets are the resulting internal field (see (10) and (11)). For ordinary ferrites, the B-H relation can be assumed to be single-valued because the hysteresis effect is small, and where it is most noticeable, near zero applied field, it is ignored. Such an assumption is not permissible for hexagonal ferrites, which have a large anisotropy field
(ranging between 10,000 Oe and 30,000 Oe), which must be taken into account by (9) and have very hysteretic B-H curves. In any event, working with ordinary ferrite materials where the single-valued nature of the B-H relationship is accepted, the magnetic flux density B is a nonlinear, monotonically increasing function of H and in the same direction as H. Magnetization M is incorporated into a nonlinear permeability factor $\mu(H)$, where H is the field magnitude, yielding
Since the divergence of B is zero (an immediate consequence of taking $\nabla\cdot$ of (60a)), we insert into
the previous B expression, finding the nonlinear Poisson equation for magnetostatic potential [Newman and Krowne, 1998]
Expanding this formula gives
or
Any of the forms (128)-(130) may be solved for $\Psi$. Newman and Krowne (1998) have employed a software package called PDE2D, which is a two-dimensional finite element solver, to analyze the circulator problem by solving (128) in an appropriate form for the package [Sewell, 1993]. PDE2D solves equations of the form
where A, B, F may be linear or nonlinear vector functions of the spatial coordinates x and y, of the time t, and of the unknown vector U. The region in which the equations are solved may have curved or straight boundaries, and the boundary functions $G_1$ and $G_2$ must be specified in either of two forms:

$$A n_x + B n_y = G_1(x, y, t, U) \tag{132a}$$

$$U = G_2(x, y, t) \tag{132b}$$
Equation (132a) represents free boundary conditions; (132b) represents fixed
boundary conditions. $n_x$ and $n_y$ are the x and y components of the outward unit normal vector on the boundary of the solution region. If the problem is time-dependent or nonlinear, an initial condition or starting estimate $U = U_0(x, y, t)$ must be specified. PDE2D has been applied in the past to a wide variety of problems, and recently to the problem of waveguide propagation [Sewell and Cvetkovic, 1989; Cvetkovic, Zhao, and Punjani, 1994]. Magnetostatic potential (and as a result, the demagnetizing field) is solved using PDE2D in the cylindrical coordinate system, consistent with a circular ferrite puck cross section. Nonlinear Poisson equation (128) becomes
Because of azimuthal ($\phi$) symmetry, the three-dimensional puck region problem reduces to a two-dimensional problem in coordinates r, z:
Equation (134) has a singularity at the origin, which is removable by multiplying through by r, giving the well-posed problem equation
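In order to use the PDE2D package, set the vector U = U (scalar) = $\Psi$, identify $x \to r$, $y \to z$, and solve for U and its spatial derivatives $U_r = \partial U/\partial r$ and $U_z = \partial U/\partial z$. The magnitude of the gradient of scalar magnetostatic potential $\Psi$ is

$$|\nabla\Psi| = \left[\left(\frac{\partial\Psi}{\partial r}\right)^2 + \left(\frac{\partial\Psi}{\partial z}\right)^2\right]^{1/2} = \sqrt{U_r^2 + U_z^2} \tag{136}$$

Therefore, the equation to solve using PDE2D is

$$\frac{\partial}{\partial r}\left[r\,\mu\!\left(\sqrt{U_r^2 + U_z^2}\right)U_r\right] + \frac{\partial}{\partial z}\left[r\,\mu\!\left(\sqrt{U_r^2 + U_z^2}\right)U_z\right] = 0 \tag{137}$$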
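Several of the intermediate display equations are hard to read in this part of the text, so the following worked chain is offered only as a hedged recap of how (137) follows from (124). It assumes $\mathbf{B} = \mu(H)\mathbf{H}$ with $H = |\mathbf{H}|$ and uses the standard divergence operator in cylindrical coordinates; the intermediate lines are a reconstruction under those assumptions, not a quotation of the numbered equations.

```latex
\begin{aligned}
\nabla\cdot\mathbf{B} = 0,\quad \mathbf{B} = \mu(H)\,\mathbf{H},\quad \mathbf{H} = -\nabla\Psi
  &\;\Longrightarrow\; \nabla\cdot\bigl[\mu(|\nabla\Psi|)\,\nabla\Psi\bigr] = 0 \\[4pt]
\text{azimuthal symmetry } (\partial/\partial\phi = 0)
  &\;\Longrightarrow\; \frac{1}{r}\frac{\partial}{\partial r}\!\left(r\,\mu\,\frac{\partial\Psi}{\partial r}\right)
     + \frac{\partial}{\partial z}\!\left(\mu\,\frac{\partial\Psi}{\partial z}\right) = 0 \\[4pt]
\text{multiply through by } r,\ \Psi = U
  &\;\Longrightarrow\; \frac{\partial}{\partial r}\bigl(r\,\mu\,U_r\bigr)
     + \frac{\partial}{\partial z}\bigl(r\,\mu\,U_z\bigr) = 0
\end{aligned}
```

The last step uses the fact that r does not depend on z, so the factor r may be moved inside the z-derivative. In the PDE2D form this identifies the flux terms $A = r\mu U_r$ and $B = r\mu U_z$ with $F = 0$.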
Recalling (124), the z-directed component of the dc magnetic field used to provide bias for the ac radio frequency (rf) problem is obtained from the above solution as
Figure 33 shows the domain region selected in two-dimensional space from the rz cut plane along $\phi$ = constant through the origin. The puck volume is contained within the region $0 \le r \le a$, $0 \le |z| \le h$, and $0 \le \phi \le 2\pi$. The cut plane reduces the three-dimensional problem to one of two dimensions, and symmetry further simplifies it to the rectangular domain $0 \le r \le L_r$ and $0 \le z \le L_z$. In the final reduced domain, the ferrite material occupies the region $0 \le r \le a$ and $0 \le z \le h$. The problem is solved for the case when an external magnetic field $\mathbf{H}_{app} = H_{app}\hat{z}$ is applied; that is, when the puck is removed, only a uniform field exists in the z-direction. Therefore, we require far from the puck, say at $z = L_z$, that the gradient of the magnetostatic potential ($\Psi = U$) normal to the boundary only be in the z-direction and have a value
In the center of the puck due to symmetry (midline $r = 0$, $0 \le z \le L_z$), and on the outer cylindrical boundary far from the puck ($r = L_r$, $0 \le z \le L_z$), the gradient of the magnetostatic potential normal to the boundary has zero value (zero r-directed field), so that

$$\frac{\partial U}{\partial r} = 0; \quad r = 0 \ \text{and} \ 0 \le z \le L_z \tag{140}$$

$$\frac{\partial U}{\partial r} = 0; \quad r = L_r \ \text{and} \ 0 \le z \le L_z \tag{141}$$

FIGURE 33. Domain region selected for determining the magnetostatic potential $\Psi$ in two-dimensional space from the rz cut plane along $\phi$ = constant through the origin. The puck volume is contained within the region $0 < r < a$, $0 < |z| < h$, and $0 < \phi < 2\pi$. (Newman and Krowne, 1998.)
Finally, at the midplane ($r\phi$ plane) of the puck the gradient of the magnetostatic potential ($\Psi = U$) parallel to the boundary ($z = 0$, $0 \le r \le L_r$) must be zero due to symmetry (zero r-directed field):

$$\frac{\partial U}{\partial r} = 0; \quad z = 0,\ 0 \le r \le L_r \tag{142}$$

Integration of (142) yields

$$U = \text{constant}; \quad z = 0,\ 0 \le r \le L_r \tag{143}$$

and because of superposition and the gradient nature of the magnetic field construction, this constant is arbitrary and so may be set to

$$U = 0; \quad z = 0,\ 0 \le r \le L_r \tag{144}$$
The permeability function $\mu(H)$ is constructed as follows. Outside the puck, $\mu(H) = \mu_0$. Inside the puck, it is required that the permeability function be single-valued, necessitating neglect of hysteresis. A reasonable and convenient analytical approximation developed for $\mu(H)$ is [Newman and Krowne, 1998]
$M_s$ is the saturation magnetization and $H_1$ is the corner magnetic field at which the magnetization reaches 0.707 times its saturation value. To see that (145) is indeed reasonable, asymptotically examine the low and high field limits. For very small fields,

The B-H relation will then be

$$B = \mu(H)H = \mu_0\left(1 + \frac{M_s}{H_1}\right)H \tag{147}$$

making the magnetization contribution to B proportional to H. In fact, this part is $M_sH/H_1$. For very large fields,

The B-H relation will then be

$$B = \mu(H)H = \mu_0(H + M_s) \tag{149}$$
making the magnetization contribution to B independent of H, the saturated limit. Examining B(H) curves for various ferrites [Trans-Tech, Inc., 1989], we see that the corner field $H_1$ is often on the order of 1 Oe and at that field, the flux density is on the order of but still much less than the saturation magnetization, which is often on the order of thousands of gauss. The advantage of our (145) ferrite model for use in PDE2D, over, for example, a piecewise linear model for $\mu(H)$ given by
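is that whereas (150) is discontinuous at $H = H_1$, (145) is both continuous in H and produces continuous Jacobian matrix elements which can be calculated explicitly. The Jacobian matrix for the partial differential equation under consideration is

$$J = \begin{bmatrix} \dfrac{\partial F}{\partial U} & \dfrac{\partial F}{\partial U_r} & \dfrac{\partial F}{\partial U_z} \\[6pt] \dfrac{\partial A}{\partial U} & \dfrac{\partial A}{\partial U_r} & \dfrac{\partial A}{\partial U_z} \\[6pt] \dfrac{\partial B}{\partial U} & \dfrac{\partial B}{\partial U_r} & \dfrac{\partial B}{\partial U_z} \end{bmatrix} = -\begin{bmatrix} 0 & 0 & 0 \\[6pt] 0 & \dfrac{\partial A}{\partial H_r} & \dfrac{\partial A}{\partial H_z} \\[6pt] 0 & \dfrac{\partial B}{\partial H_r} & \dfrac{\partial B}{\partial H_z} \end{bmatrix} \tag{151}$$

The four partial derivatives in (151) are determined, for the ferrite model in (145), explicitly to be

$$-\frac{\partial A}{\partial H_r} = r\left[1 + \rho - \rho\,\frac{H_r^2}{H_1^2 + H^2}\right] \tag{152a}$$

$$-\frac{\partial A}{\partial H_z} = -r\rho\,\frac{H_rH_z}{H_1^2 + H^2} \tag{152b}$$

$$-\frac{\partial B}{\partial H_r} = -r\rho\,\frac{H_rH_z}{H_1^2 + H^2} \tag{152c}$$

$$-\frac{\partial B}{\partial H_z} = r\left[1 + \rho - \rho\,\frac{H_z^2}{H_1^2 + H^2}\right] \tag{152d}$$

where

$$\rho = \frac{4\pi M_s}{\sqrt{H_1^2 + H^2}} \tag{153}$$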
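The analytic Jacobian entries can be checked against finite differences. The sketch below is illustrative only: it assumes the CGS relative permeability $\mu = 1 + \rho$ with $\rho = 4\pi M_s/\sqrt{H_1^2 + H^2}$, a form inferred from the stated limits and the 0.707 corner-field property rather than copied from (145), and flux terms $A = r\mu U_r$, $B = r\mu U_z$ with $U_r = -H_r$, $U_z = -H_z$. All function names are hypothetical.

```python
import math


def rho(h_r, h_z, four_pi_ms, h_1):
    """Magnetization term rho = 4*pi*M_s / sqrt(H_1^2 + H^2) (assumed CGS form)."""
    return four_pi_ms / math.sqrt(h_1 ** 2 + h_r ** 2 + h_z ** 2)


def mu_rel(h_r, h_z, four_pi_ms, h_1):
    """Assumed relative permeability mu = 1 + rho."""
    return 1.0 + rho(h_r, h_z, four_pi_ms, h_1)


def flux_terms(r, h_r, h_z, four_pi_ms, h_1):
    """Flux terms A = r*mu*U_r and B = r*mu*U_z, using U_r = -H_r and U_z = -H_z."""
    mu = mu_rel(h_r, h_z, four_pi_ms, h_1)
    return -r * mu * h_r, -r * mu * h_z


def jacobian_entries(r, h_r, h_z, four_pi_ms, h_1):
    """Analytic dA/dH_r, dA/dH_z, dB/dH_r, dB/dH_z for the assumed model."""
    p = rho(h_r, h_z, four_pi_ms, h_1)
    d = h_1 ** 2 + h_r ** 2 + h_z ** 2
    da_dhr = -r * (1.0 + p - p * h_r ** 2 / d)
    cross = r * p * h_r * h_z / d          # dA/dH_z equals dB/dH_r (symmetry)
    db_dhz = -r * (1.0 + p - p * h_z ** 2 / d)
    return da_dhr, cross, cross, db_dhz


if __name__ == "__main__":
    # Hypothetical sample point: r = 0.5, H_r = 3 Oe, H_z = 4 Oe.
    print(jacobian_entries(0.5, 3.0, 4.0, 2000.0, 1.0))
```

The equality of the two cross derivatives is the symmetry that the solver exploits to halve storage and execution time.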
Because of the symmetry of the Jacobian matrix, the execution time and the memory storage requirements of the solver are cut in half. The solution is started by choosing an initial guess for the unknown U, which is the magnetostatic potential, labeled as $U = U_0(r, z)$. The choice made here is to select the empty space solution as the initial guess, thereby making

$$U_0(r, z) = -H_{app}z \tag{154}$$

This really comes from (124), which is in effect (138) but now applied to the entire solution space, not just at $z = L_z$. That is, by (124)

$$\mathbf{H} = H_{app}\hat{z} = -\nabla\Psi \tag{155}$$

or
where the second equality comes about from azimuthal symmetry (really uniformity), a property of the circular symmetry built into the problem. The third equality arises from the lack of any boundary condition asymmetries imposed externally on the empty space problem in the radial direction. Clearly, the final form
is like (139), but valid now over $0 \le r \le L_r$ and $0 \le z \le L_z$. Integrating (157) over the interval $0 \le z' \le z$ produces

$$\Psi(z) - \Psi(0) = -\int_0^z H_{app}\,dz' \tag{158}$$

But by (144), we have already set $\Psi(0) = 0$ ($U = \Psi$). Thus the trivial integral in (158) gives

$$U_0(r, z) = \Psi(r, z) = -H_{app}z \tag{159}$$

which is precisely (154).
B. RF Formulas from Green’s Function Approach

Green’s function techniques have been extended in the last few years to handle radial variation for the round circulators most commonly found in military, commercial, and industrial use. The new method involves the use of a recursive Green’s function [Krowne and Neidert, 1996] as opposed to a uniform region Green’s function [Neidert and Philips, 1993]. The theory behind the recursive Green’s function allows many of the economies of the uniform Green’s function theory to be applied to the evaluation of the recursive Green’s function. This leads to extremely short calculation times compared to those found for numerically intensive simulators like finite difference and finite element codes. The essence of the recursive Green’s function approach is to break up the circulator puck into a single internal disk containing the origin and a set of annuli, or rings. Each of these circulator zones may be characterized by different values of applied magnetic bias field, saturation magnetization (magnetization if not fully saturated), and a demagnetization factor. In this way the circulator, with natural as well as intentionally imposed inhomogeneities, can be properly modeled. Recursive Green’s function theory applied to two-dimensional circulators has been thoroughly presented [Krowne, 1996a; Krowne, 1997; Krowne, 1998b]. Capability to account for radial variation can provide new insight into circulator operation and assist design efforts. First consider the case (natural inhomogeneities) when the ferrite material is uniform and the rings are used simply to provide a way of accommodating internal field radial variation (see Fig. 34). This case corresponds to the situation when the demagnetizing factor is the only radially varying parameter.
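The zone decomposition described above can be pictured with a small data structure. The sketch below is a hypothetical illustration, not the recursive Green’s function code itself: the class name `Zone`, the helper `make_zones`, the equal-width ring spacing, and the sample demagnetization profile are all assumptions, and the internal field uses the common estimate $H_i = H_{app} - N_{zz}\,4\pi M_s$ (CGS), which is assumed here rather than taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """One circulator zone: the central disk or a concentric ring."""
    r_outer: float      # outer radius of the zone
    four_pi_ms: float   # 4*pi*M_s in gauss (0 models a dielectric ring)
    h_app: float        # applied bias field in Oe
    n_zz: float         # demagnetization factor for this zone

    def internal_field(self):
        # Assumed estimate: H_i = H_app - N_zz * 4*pi*M_s
        return self.h_app - self.n_zz * self.four_pi_ms


def make_zones(radius, n_zones, four_pi_ms, h_app, n_zz_of_r):
    """Split a puck into a central disk plus (n_zones - 1) equal-width rings.

    n_zz_of_r is a caller-supplied function giving the demagnetization
    factor at radius r; it is sampled at the mid-radius of each zone.
    """
    width = radius / n_zones
    zones = []
    for k in range(n_zones):
        r_mid = (k + 0.5) * width
        zones.append(Zone((k + 1) * width, four_pi_ms, h_app, n_zz_of_r(r_mid)))
    return zones


if __name__ == "__main__":
    # Hypothetical profile: N_zz falls off toward the perimeter.
    for zone in make_zones(1.0, 5, 2000.0, 2000.0, lambda r: 1.0 - 0.3 * r ** 2):
        print(round(zone.r_outer, 2), round(zone.internal_field(), 1))
```

With a demagnetization factor that decreases toward the edge, the estimated internal field rises with radius, matching the qualitative behavior described for the perimeter region.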
The demagnetization factor can be found self-consistently by using the formulation and finite-element engine seen in the preceding subsection (A), or approximately by using Joseph and Schloemann [1965]. Once the self-consistent static field solution is obtained at each radial location, the demagnetizing factor can be found as well and inserted as a parameter into the ac Green’s function code, which finds the electromagnetic fields and s-parameters for the circulator device. Just as demagnetizing factor $N_{zz}(r)$ may have radial variation, so too may the other parameters which feed into the ac electromagnetic Green’s function solver. These parameters include the saturation magnetization $M_s(r)$ and the externally applied field $H_{app}(r)$, to name two others. The second case (intentional inhomogeneities) occurs when the circulator has multiple rings of different ferrite materials, or a combination of ferrite and dielectric rings. Modeling a dielectric ring can be performed by reducing the saturation magnetization to zero, taking away all magnetic behavior while leaving its dielectric properties intact. In the recursive Green’s function technique, an algorithm for calculating
FIGURE 34. Diagram of an inhomogeneous circulator, showing how the device is broken up into radially concentric rings. Each ring has constant saturation magnetization $M_s$, applied magnetic field $H_{app}$, and demagnetization factor $N_{zz}$. The diagram labels the ports, the microstrip conductor, and the ground plane. (Adapted from Krowne and Neidert, 1996.)
the dyadic Green’s functions for a ferrite disk is implemented. The Green’s functions are kernels which relate the electric field at a particular point (the response point) to the magnetic fields on the boundary (the source points). It is important to realize that the dyadic Green’s functions can be obtained without knowledge of the outside circuits or transmission lines. The technique only requires a specification of the source fields (the magnetic fields) at the boundary. Because the magnetic wall condition ($H_\phi = 0$ at $r = a$ if $\phi \ne \phi_{im}$, i = port, m = subintervals for ith port) is assumed to hold everywhere except at the ports, the only source points are at the ports. The electric field (z-component of the electric field $E_z$, abbreviated here by suppressing the index) at any point within the interior of the ferrite puck, using port index labeling i = 1, 2, 3 and abbreviated notation, is given by [Newman and Krowne, 1998]

$$E(r, \phi) = G_1(r, \phi)H_1 + G_2(r, \phi)H_2 + G_3(r, \phi)H_3 \tag{160}$$
When the response point is also located at a port, (160) can be evaluated accordingly and successively to yield the matrix relation

$G_{ij}$ gives the electric field at port i caused by a magnetic field at port j, and
the total field is found by superposition. All the $G_i$’s and $G_{ij}$’s in (160) and (161), respectively, are calculated using the recursive Green’s function procedure. Attached to each port is an impedance equal to the wave impedance $\eta_d$ of a microstrip line having a width and height equal to the port width and height, with a dielectric substrate whose permittivity value is selected by the designer. The electric and magnetic fields tangent to the interface between the circulator and the attached microstrip line are equal on either side of the interface, assuring continuity. Characteristic impedance of the transmission line is proportional to the wave impedance in the transmission line, for TEM and quasi-TEM lines. Therefore, at each port

$$E_{i,tan}^{inside} = E_{i,tan}^{outside} \tag{162a}$$

$$H_{i,tan}^{inside} = H_{i,tan}^{outside} \tag{162b}$$

$$Z_{i,in} = \text{constant} \times \frac{E_{i,tan}^{inside}}{H_{i,tan}^{inside}} \tag{162c}$$

$$Z_{i,0} = \text{constant} \times \eta_{i,d} \tag{162d}$$
where the explicit indexing with i allows for differing field and impedance values among the ports. (Note here that the $\phi$-component index of the magnetic field $H_\phi$ has been suppressed also.) Because the port aperture is the same whether you are looking inward to the circulator or outward to the attached microstrip line, the proportionality constant relating characteristic impedance to wave impedance is the same in both directions. Intrinsic circulator port boundary conditions are such that at ports 2 and 3 the incident wave amplitude is zero (see Fig. 35). Nulls as stipulated can be accomplished by terminating the attached transmission lines in infinitely absorbing loads, so that whatever power does get transmitted into the output lines will never be reflected back into the circulator. Such loading is not the same as an impedance match to the circulator, which would imply maximum power transfer into the transmission lines across the interface and could only be brought about with a frequency-dependent complex impedance whose value is the complex conjugate of $Z_{i,in}$. Since we want to find the important circulator performance characteristics, focus will be on obtaining the s-parameters for the device and the electric field distribution within the device. The first desired characteristics, the s-parameters, allow one to assess the circulator’s terminal behavior, regardless of what is occurring inside the device puck electromagnetically. This is why s-parameters are so critical for circuit designers: the ultimate goal is to insert the circulator inside a larger circuit. Usually the
FIGURE 35. Incident ($a_1$, $a_2$, $a_3$) and reflected ($b_1$, $b_2$, $b_3$) traveling waves from outside the intrinsic circulator at each port, providing the basic external field constraints determining the internal ac field distribution. The driving wave occurs at port 1. No reflected traveling waves occur at ports 2 and 3. (Newman and Krowne, 1998.)
s-parameters are determined as a function of frequency for insertion loss, return loss, and isolation. Even though s-parameters are extremely useful, obtaining field plots can enable the circuit designer to understand why the circulator is behaving the way it is and thereby optimize its performance, making it a better device when inserted into the larger circuit. Thus field plots complement s-parameter determination. First s-parameter determination is covered, then field plot construction. Based on the previous discussion, assume reflectionless load conditions at ports 2 and 3, only allowing an incoming signal at port 1. For traveling waves, $\eta_d$ is equal to the E/H ratio in the microstrip lines, and we have
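where the superscripts denote traveling waves: incoming (incident) or exiting (outgoing). (Observe that by dropping the port index on $\eta_d$, it is being taken for argument’s sake that all the microstrip lines are identical.) Next all the port fields can be expressed in terms of the s-parameters and the incident electric field at port 1:

$$E_1 = E_1^{inc} + E_1^{ref} = E_1^{inc} + s_{11}E_1^{inc} = (1 + s_{11})E_1^{inc} \tag{164a}$$

$$H_1 = H_1^{inc} + H_1^{ref} = (1 - s_{11})\frac{E_1^{inc}}{\eta_d} \tag{164b}$$

$$E_2 = E_2^{out} = s_{21}E_1^{inc} \tag{165a}$$

$$H_2 = H_2^{out} = -\frac{s_{21}E_1^{inc}}{\eta_d} \tag{165b}$$

$$E_3 = E_3^{out} = s_{31}E_1^{inc} \tag{166a}$$

$$H_3 = H_3^{out} = -\frac{s_{31}E_1^{inc}}{\eta_d} \tag{166b}$$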
Equations (164) through (166) are now solved for the total electric fields at each of the three ports, $E_1$, $E_2$, $E_3$, in terms of $H_1$, $H_2$, $H_3$, because it is these total fields which are “computed” by the internal electromagnetic behavior of the circulator. Therefore, we obtain

$$E_1 = 2E_1^{inc} - \eta_dH_1 \tag{167}$$

$$E_2 = -\eta_dH_2 \tag{168}$$

$$E_3 = -\eta_dH_3 \tag{169}$$
Inserting these total electric field results into the circulator equation (161) reduces the system from one of six unknown components in three equations, or equivalently two unknown vectors with one matrix equation, into one with three unknown components in three equations, or equivalently one unknown vector with one matrix equation. Thus the total magnetic fields can be solved uniquely. Placing the $H_1$ solution into (164b), while noting (164a), gives the input reflection coefficient, the $s_{11}$ parameter:

$$s_{11} = 1 - \eta_d\frac{H_1}{E_1^{inc}} \tag{170}$$

Placing the $H_2$ solution into (165b), while noting (165a), gives the transmission coefficient, the $s_{21}$ parameter, otherwise referred to as the insertion loss:

$$s_{21} = -\eta_d\frac{H_2}{E_1^{inc}} \tag{171}$$

Finally, placing the $H_3$ solution into (166b), while noting (166a), gives the other transmission coefficient, the $s_{31}$ parameter, otherwise referred to as the isolation:

$$s_{31} = -\eta_d\frac{H_3}{E_1^{inc}} \tag{172}$$
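The elimination just described can be sketched numerically. The example below assumes the port relation (161) has the form $E_i = \sum_j G_{ij}H_j$ and combines it with (167) through (169), so that $(G + \eta_d I)H = [2E_1^{inc}, 0, 0]^T$. The function names are hypothetical, and any matrix passed in is a placeholder rather than a computed Green’s function.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists (complex entries allowed)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(m)
    solution = []
    for col in range(3):
        replaced = [[rhs[i] if j == col else m[i][j] for j in range(3)]
                    for i in range(3)]
        solution.append(det3(replaced) / d)
    return solution


def intrinsic_s_parameters(g, eta_d, e_inc=1.0):
    """Return (s11, s21, s31) and the port magnetic fields H.

    g     : 3x3 port Green's function matrix, g[i][j] = G_ij (placeholder values)
    eta_d : wave impedance of the attached microstrip lines
    e_inc : incident electric field at port 1
    """
    # Combine E = G H with E1 = 2*E_inc - eta_d*H1, E2 = -eta_d*H2, E3 = -eta_d*H3.
    m = [[g[i][j] + (eta_d if i == j else 0.0) for j in range(3)] for i in range(3)]
    h = solve3(m, [2.0 * e_inc, 0.0, 0.0])
    s11 = 1.0 - eta_d * h[0] / e_inc
    s21 = -eta_d * h[1] / e_inc
    s31 = -eta_d * h[2] / e_inc
    return (s11, s21, s31), h
```

Cramer’s rule is used only to keep the sketch dependency-free; a production code would more likely call a linear algebra library.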
The solution found in (170) through (172) may be called the “intrinsic” solution for the circulator. It is neither perfectly matched nor matched to a 50-ohm system impedance- instead its s-parameters are referenced to the
characteristic impedance $Z_0$ of the “native geometry” microstrip transmission lines attached to its ports. This method of “hooking” the circulator up to the external world is essentially identical to that done before [Bosma, 1964]. Standard s-parameter matrix conversion formulas may be applied to renormalize the intrinsic circulator to a different impedance or to embed the circulator in a complex network (see Section IV.B). The above solution may be embedded into standard circuit simulators if the reference impedance is renormalized to 50 Ω (following [Kurakawa, 1965], for example), letting the circuit simulator supply the embedding network. One natural choice for an embedding network is to select quarter-wave microstrip transforming sections (see Section IV.B). The resulting frequency response of the circulator circuit is due to the cascade of the intrinsic circulator performance with that of the impedance-matching network. Now we move on to find the electric field (one component, $E_z$) within the circulator puck. The magnetic field H (two components) can also be found, but that will not be covered here. Suffice it to say that H can be recovered from $E_z$ once it has been determined. Further details on recovery of the magnetic field are available [Krowne, 1998b]. Once $s_{11}$, $s_{21}$, and $s_{31}$ of the intrinsic circulator are found, one can obtain the electric field at an interior point in terms of the incident electric field by substituting (164b), (165b), and (166b) into (160):
$$E(r, \phi)\big|_{inc,\ port\ 1} = L_1E_1^{inc} = L_1\sqrt{Z_0}\,a_1 \tag{173a}$$

$$L_1 = \frac{1}{Z_0}\left[G_1(1 - s_{11}) - G_2s_{21} - G_3s_{31}\right] \tag{173b}$$

Equation (173a) gives the field solution when an incident wave enters only port 1. The other two ports have null values in this regard (only waves are allowed to exit into ports 2 and 3). The last equality in (173a) makes use of the fact that $E_1^{inc}$ is a traveling electric field wave in a microstrip transmission line of characteristic impedance $Z_0$, so one can write the electric field in the interior of the circulator in terms of the normalized transmission line traveling wave component

$$a_1 = \frac{E_1^{inc}}{\sqrt{Z_0}} \tag{174}$$
But we can imagine that for our real circulator embedded in an actual circuit, not only will incident waves enter port 1, but reflected waves at ports 2 and 3, created by the electromagnetic waves exiting the circulator at the through and isolated ports, will return to enter the circulator, acting as unintentional incident waves serving to excite the puck.
Therefore, at the through port 2, reflected wave $E_2^{ref}$ back into the port produces an internal field independent of the incident field at port 1:
~ ( r4 ),1 r e j , p o r t 2 = L L,
1
~E= FL Z J Z , ~ ~
= - [G,(1 - sI1)-
G2s,, - G
zo
a2 =
, ~ , , ] e j ~ " ~ ~
(175a) (175b)
E*," ~
JZ,
Similarly to port 2, at the isolated port 3 another reflected wave $E_3^{ref}$ goes back into the port, producing an internal field independent of both the incident field at port 1 and reflected field at port 2.
Total electric field $E(r, \phi)$ at a location $(r, \phi)$ within the puck region due to incoming waves at all three ports can now be stated by invoking superposition as
using (173a), (175a), and (177a). Successive contributions in (179), roughly speaking, are associated with puck resonances excited by inputs at ports 1, 2, and 3, respectively. Equation (179) is made more manageable for calculation by restating the electric field solution as a function of only the port 1 incident wave $a_1$. To do this, waves $a_2$ and $a_3$ must be found as functions of $a_1$. Suppose all ports have identical matching networks attached to them followed by a termination in the system impedance $Z_{sys}$. Let each matching network have the same symmetrical scattering matrix
with port 1 of the matching network attached to the system impedance and port 2 attached to the circulator. The load reflection coefficient looking outward from each of the circulator ports into the system-terminated matching network is (181) with (182). Next incorporate the loading just discussed into the circulator circuit. Letting the abbreviated notation $\alpha = s_{11}$, $\beta = s_{21}$, $\gamma = s_{31}$ be used for the intrinsic circulator’s effects on incident and reflected waves at its ports, first write
The matrix in (183) describes the behavior of the intrinsic circulator device. Loading circuit effects are incorporated at this stage by noting the two relationships due to the load reflection coefficient $\Gamma_L$,

$$a_2 = \Gamma_Lb_2; \quad a_3 = \Gamma_Lb_3 \tag{184a, b}$$
Relationships (184) arise when looking outward from the circulator into the matching networks at ports 2 and 3. Substituting (184) into (183) gives
Rearranging (185) produces a single unknown vector b in one matrix equation,
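solvable in terms of $a_1$. Once the scattered waves $b_1$, $b_2$, and $b_3$ are determined, they are inserted into (184) to find the incident waves $a_2$ and $a_3$ in terms of $a_1$:

$$a_2 = K_2a_1; \quad K_2 = \frac{\Gamma_L\left[\beta(1 - \alpha\Gamma_L) + \gamma^2\Gamma_L\right]}{(1 - \alpha\Gamma_L)^2 - \beta\gamma\Gamma_L^2} \tag{187a, b}$$

$$a_3 = K_3a_1; \quad K_3 = \frac{\Gamma_L\left[\gamma(1 - \alpha\Gamma_L) + \beta^2\Gamma_L\right]}{(1 - \alpha\Gamma_L)^2 - \beta\gamma\Gamma_L^2} \tag{188a, b}$$

$$b_1 = (\alpha + \beta K_3 + \gamma K_2)a_1 \tag{189a}$$

$$\phantom{b_1} = \Gamma_1a_1 \tag{189b}$$

Equation (189a) was found using (183), with (187a) and (188a). The second equality in (189) serves to define $\Gamma_1$ as

$$\Gamma_1 = \alpha + \beta K_3 + \gamma K_2 \tag{190}$$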
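The loading relations can be verified with a short calculation. This sketch assumes the intrinsic circulator matrix in (183) is the rotationally symmetric (circulant) form with $s_{21} = s_{32} = s_{13} = \beta$ and $s_{31} = s_{12} = s_{23} = \gamma$, and obtains both coefficients by solving the 2 × 2 system that results from substituting (184) into it. The function name and the sample values used to exercise it are hypothetical.

```python
def loaded_port_waves(alpha, beta, gamma, gamma_l):
    """Return (K2, K3, input reflection coefficient) for a loaded circulator.

    alpha, beta, gamma : intrinsic s11, s21, s31
    gamma_l            : load reflection coefficient seen looking out of ports 2 and 3
    Assumes a circulant intrinsic scattering matrix (see lead-in).
    """
    det = (1 - alpha * gamma_l) ** 2 - beta * gamma * gamma_l ** 2
    k2 = gamma_l * (beta * (1 - alpha * gamma_l) + gamma ** 2 * gamma_l) / det
    k3 = gamma_l * (gamma * (1 - alpha * gamma_l) + beta ** 2 * gamma_l) / det
    gamma_in = alpha + beta * k3 + gamma * k2
    return k2, k3, gamma_in


if __name__ == "__main__":
    # Hypothetical near-band-center values: small alpha and gamma, |beta| near 1.
    k2, k3, gamma_in = loaded_port_waves(0.05, 0.98, 0.03, 0.1)
    print(k2, k3, gamma_in)
```

For small $\alpha$ and $\gamma$ the results reduce to one-trip and two-trip reflection terms, $K_2 \approx \beta\Gamma_L$ and $K_3 \approx \gamma\Gamma_L + (\beta\Gamma_L)^2$.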
$\Gamma_1$ is the input reflection coefficient looking into port 1 referenced to $Z_0$ when the circulator is terminated at ports 2 and 3 with matching networks, which are themselves terminated in the system impedance $Z_{sys}$. We are now in the position to finally obtain the desired electric field expression in terms of only $a_1$. Recall (179), and place into it (187a) and (188a), giving the result

$$E(r, \phi) = (L_1 + L_2K_2 + L_3K_3)\sqrt{Z_0}\,a_1 \tag{191}$$
If $|\alpha| \ll 1$, $|\gamma| \ll 1$, and $|\beta| \approx 1$, which is true for a well-designed circulator near the center of its band, then $K_2 \approx \beta\Gamma_L$ and $K_3 \approx \gamma\Gamma_L + (\beta\Gamma_L)^2$, so these coefficients approximately represent one trip and two trip reflections off of the matching structures at ports 2 and 3, respectively. Thus $K_2$ is small and $K_3$ is even smaller. This means that the electric field pattern in the circulator, when matched, will look a lot like the basic pattern due to only $L_1$, but with corrections given by the complex quantities $K_2$ and $K_3$. The field solution provided in (191) can be related to the incident power $P_{inc}$ by using the relationship between the incident wave at port 1, $a_1$, and the wave $a_s$ incident on the matching network attached to port 1. $a_s$ must pass through the matching network before it can become incident on the port-circulator interface:

Putting (192) into (191) yields $E(r, \phi)$ dependent upon $a_s$:

Realizing that the incident power $P_{inc}$ is related to the incident wave $a_s$ by

we find that

$$a_s = \sqrt{P_{inc}} \tag{195}$$

Finally insert (195) into (193) to obtain the desired form
m,4) = w, +
L2K2
+ L,K,)JZ, 1 - s y - ,
(196)
Equation (196) gives the electric field at every point within the circulator shield region as a function of the rf (or ac) power incident on the input to the matching structure.

C. RF Formulas from Finite Element Approach
We have already discussed a finite element approach for setting up the formulas for determining the static magnetic field. For the rf or ac solution using finite elements, this static field information must be inserted into the rf permeability tensor given in (1). Following the tensor expression in (1) were equations for the different tensor elements, which were broken up into real and imaginary parts. It is instructive here to look at the complex form of the elements simply because they are more compact and show the origin of the loss construction: (197a) (197b)
Apparent is the loss contribution by a shift in ferromagnetic resonance frequency $\omega_0$ by an amount $j\omega\alpha_m$. Constitutive relation between the rf magnetic flux density B and the rf magnetic field H is

$$\mathbf{B} = \hat{\mu}\,\mathbf{H} \tag{198}$$

Retrieving $\hat{\mu}$ from (1) and placing it into (198) produces the B vector needed in Maxwell’s equations for this anisotropic material. (199) Insertion of the magnetic constitutive relation (198) into Maxwell’s curl E expression (60a) gives

$$\nabla \times \mathbf{E} = -j\omega\hat{\mu}\,\mathbf{H} \tag{200}$$
CLIFFORD M. KROWNE
Constitutive relation between the rf electric flux density D and the rf electric field E is D Scalar permittivity tangent tan&
E
=
(201)
EE
accounts for dielectric losses through the use of loss E
= E ~ E , (1 - tan 6)
(202)
Insertion of the electric constitutive relation (201) into Maxwell's curlH expression (60b) gives V xH
=j
mE
(203)
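Maxwell’s equations (200) and (203) are required in rectangular coordinates to proceed with the setup for finite element PDE2D package use.

$$\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z} + j\omega\mu_0(\mu H_x - j\kappa H_y) = 0 \tag{204a}$$

$$\frac{\partial E_x}{\partial z} - \frac{\partial E_z}{\partial x} + j\omega\mu_0(j\kappa H_x + \mu H_y) = 0 \tag{204b}$$

$$\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y} + j\omega\mu_0H_z = 0 \tag{204c}$$

$$\frac{\partial H_z}{\partial y} - \frac{\partial H_y}{\partial z} - j\omega\varepsilon E_x = 0 \tag{205a}$$

$$\frac{\partial H_x}{\partial z} - \frac{\partial H_z}{\partial x} - j\omega\varepsilon E_y = 0 \tag{205b}$$

$$\frac{\partial H_y}{\partial x} - \frac{\partial H_x}{\partial y} - j\omega\varepsilon E_z = 0 \tag{205c}$$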
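Restricting these equations to the problem at hand, namely the 2D modeling of the circulator device, set $E_x = E_y = 0$, $H_z = 0$, and the variation with respect to the z-coordinate equal to zero ($\partial/\partial z = 0$) in (204) and (205):

$$\begin{aligned} \frac{\partial E_z}{\partial y} + j\omega\mu_0(\mu H_x - j\kappa H_y) &= 0 \\ -\frac{\partial E_z}{\partial x} + j\omega\mu_0(j\kappa H_x + \mu H_y) &= 0 \\ \frac{\partial H_y}{\partial x} - \frac{\partial H_x}{\partial y} - j\omega\varepsilon E_z &= 0 \end{aligned} \tag{206}$$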
Equation (206) represents three linear partial differential equations which can be solved using PDE2D. It is convenient to normalize these equations by defining

$$U_1 = \eta_{uf}H_x; \quad U_2 = \eta_{uf}H_y; \quad U_3 = E_z \tag{207a}$$

$$x_n = \frac{x}{a}; \quad y_n = \frac{y}{a} \tag{207b}$$

Here a is chosen to be the ferrite shield radius and $\eta_{uf}$ the wave impedance in an unmagnetized ferrite:

Also, defining a propagation constant $k_{uf}$ in an unmagnetized ferrite,

$$k_{uf} = \omega\sqrt{\mu_0\varepsilon_0\varepsilon_r} \tag{209}$$

we obtain the matrix equation for the complex phasor variables $U_1$, $U_2$, and $U_3$:
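In order to work with real quantities, (210) must be separated into real and imaginary parts. This is first accomplished by writing all matrix and vector entries as complex pieces, as follows:

(211)

where

$$U_1 = u_1 + jv_1; \quad u_1 = \mathrm{Re}[\eta_{uf}H_x]; \quad v_1 = \mathrm{Im}[\eta_{uf}H_x] \tag{212a}$$

$$U_2 = u_2 + jv_2; \quad u_2 = \mathrm{Re}[\eta_{uf}H_y]; \quad v_2 = \mathrm{Im}[\eta_{uf}H_y] \tag{212b}$$

$$U_3 = u_3 + jv_3; \quad u_3 = \mathrm{Re}[E_z]; \quad v_3 = \mathrm{Im}[E_z] \tag{212c}$$

$$p_1 = \mathrm{Re}[j\mu k_{uf}a]; \quad q_1 = \mathrm{Im}[j\mu k_{uf}a] \tag{213a}$$

$$p_2 = \mathrm{Re}[\kappa k_{uf}a]; \quad q_2 = \mathrm{Im}[\kappa k_{uf}a] \tag{213b}$$

$$p_3 = \mathrm{Re}[jk_{uf}a(1 - j\tan\delta)]; \quad q_3 = \mathrm{Im}[jk_{uf}a(1 - j\tan\delta)] \tag{213c}$$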
Matrix equation (211) is equivalent to six linear partial differential equations in six unknowns $u_1, u_2, u_3, v_1, v_2, v_3$ in the vector $U = [u_1\ u_2\ u_3\ v_1\ v_2\ v_3]^T$:

$$u_{3y} + p_1u_1 - q_1v_1 + p_2u_2 - q_2v_2 = F_1 = 0 \tag{214a}$$

$$v_{3y} + q_1u_1 + p_1v_1 + q_2u_2 + p_2v_2 = F_2 = 0 \tag{214b}$$

$$u_{3x} + p_2u_1 - q_2v_1 - p_1u_2 + q_1v_2 = F_3 = 0 \tag{214c}$$

$$v_{3x} + q_2u_1 + p_2v_1 - q_1u_2 - p_1v_2 = F_4 = 0 \tag{214d}$$

$$u_{1y} - u_{2x} + p_3u_3 - q_3v_3 = F_5 = 0 \tag{214e}$$

$$v_{1y} - v_{2x} + q_3u_3 + p_3v_3 = F_6 = 0 \tag{214f}$$
Subscripts x and y on unknowns in (214) indicate partial differentiation. This set of six equations fits into the required PDE2D formalism by setting A = 0, B = 0, and $F(x, y, t; U, U_x, U_y) = 0$. Boundary conditions at the ferrite circulator perimeter are either a magnetic wall at the ferrite-dielectric interface or a continuity condition at the puck-microstrip interface. The perfect magnetic wall condition is an acceptable approximation, and makes

$$H_{tan}(r, \phi) = H_\phi(r, \phi) = 0 \tag{215}$$

at the radius $r = a$. Because the solution has been sought in rectangular coordinates, $H_\phi$ must be constructed from the correct $H_x$ and $H_y$ projections, transforming (215) into

$$H_y\cos\phi - H_x\sin\phi = 0; \quad x^2 + y^2 = a^2,\ \phi \in \text{nonport} \tag{216}$$
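The projection in (216) is simple enough to test directly. The following sketch (hypothetical function names) forms the azimuthal field from its Cartesian components and returns the real and imaginary magnetic-wall residuals in the variables of (212); it assumes only the standard relation $H_\phi = H_y\cos\phi - H_x\sin\phi$.

```python
import math


def h_phi(h_x, h_y, phi):
    """Azimuthal (tangential) component from Cartesian components at angle phi."""
    return h_y * math.cos(phi) - h_x * math.sin(phi)


def magnetic_wall_residual(u1, v1, u2, v2, phi):
    """Real and imaginary parts of the normalized tangential field on the perimeter.

    Both entries must vanish on nonport sections of the boundary.
    """
    return (u2 * math.cos(phi) - u1 * math.sin(phi),
            v2 * math.cos(phi) - v1 * math.sin(phi))


if __name__ == "__main__":
    phi = math.radians(40.0)
    # A purely radial field has no tangential component on the circle.
    print(h_phi(math.cos(phi), math.sin(phi), phi))
```

A purely radial field satisfies the magnetic wall condition automatically, while a purely azimuthal field of unit magnitude gives a residual of one.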
The boundary function has been presented in (132a), $An_x + Bn_y = G_1$. Since both A and B are zero, the boundary function $G_1$ can be used to enforce the tangential condition (216) on the magnetic field when (207a) is invoked to convert into solution variables. Applying (212a, b) changes the problem over to real variables from complex variables:

$$G_1(x, y, t; U, U_x, U_y) = \begin{bmatrix} u_2\cos\phi - u_1\sin\phi \\ v_2\cos\phi - v_1\sin\phi \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}; \quad x^2 + y^2 = a^2,\ \phi \in \text{nonport} \tag{217}$$
At the ports, continuity of the fields is maintained. At port 1 where the field is incident, the only existing electric field component $E_z$ is chosen to have unit amplitude and zero phase angle, $\mathrm{Re}[E_z] = E_0 = 1$ and $\mathrm{Im}[E_z] = 0$. The tangential magnetic field for the circulator can be determined by the continuity across the circulator-microstrip port interface. In the microstrip it is found from (163a) to be $E_0/\eta_d$. Therefore, by the projection employed in (216),

$$H_y\cos\phi - H_x\sin\phi = \frac{E_0}{\eta_d}; \quad x^2 + y^2 = a^2,\ \phi \in \text{port 1} \tag{218}$$

By the same process of reasoning as leads to (217), (218) is converted to

$$G_1(x, y, t; U, U_x, U_y) = \begin{bmatrix} u_2\cos\phi - u_1\sin\phi - \dfrac{\eta_{uf}}{\eta_d}E_0 \\[6pt] v_2\cos\phi - v_1\sin\phi \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}; \quad x^2 + y^2 = a^2,\ \phi \in \text{port 1} \tag{219}$$
The second component entry symbolizes the fact that the magnetic field also must have zero imaginary part due to the electric field assignment. For ports 2 and 3, we realize it is total fields which are continuous across the circulator-microstrip port interfaces. If to first approximation the loads on these two ports do not produce reflected signal back into the device, then the E-H field ratio in the microstrip line represents a total field ratio, and its value $-\eta_d$ may be used to find the magnetic field in terms of the electric field, similar to that utilized in (163b,c):

$$H_y\cos\phi - H_x\sin\phi = -\frac{E_z}{\eta_d}; \quad x^2 + y^2 = a^2,\ \phi \in \text{port 2, 3} \tag{220}$$
Conversion to problem variables produces
System equation (214) plus the boundary conditions (217), (219), and (221) give the required information to solve the problem using PDE2D.
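The three perimeter conditions (216), (218), and (220) share the same projection of the rectangular field components onto the azimuthal direction. The following Python fragment is only a minimal sketch of that bookkeeping, written directly in terms of the field components rather than the PDE2D solution variables. The function names, the segment labels, and the numerical value of eta_d are illustrative assumptions and are not taken from PDE2D or from the setup described above.

```python
import math


def tangential_h(h_x, h_y, phi):
    """Azimuthal field H_phi built from rectangular components, as in (216)."""
    return h_y * math.cos(phi) - h_x * math.sin(phi)


def boundary_residual(segment, h_x, h_y, e_z, phi, eta_d, e_0=1.0):
    """Quantity that must vanish on the perimeter x^2 + y^2 = a^2.

    segment is a hypothetical label: "nonport", "port1", "port2" or "port3".
    """
    h_phi = tangential_h(h_x, h_y, phi)
    if segment == "nonport":
        # Magnetic wall, condition (216): H_phi = 0.
        return h_phi
    if segment == "port1":
        # Incident port, condition (218): H_phi = E_0 / eta_d.
        return h_phi - e_0 / eta_d
    if segment in ("port2", "port3"):
        # Output ports without reflection, condition (220): H_phi = -E_z / eta_d.
        return h_phi + e_z / eta_d
    raise ValueError(f"unknown boundary segment: {segment}")


# Illustrative impedance value only (assumption, not a value from the text).
eta_d = 120.0
print(boundary_residual("port1", 0.0, 1.0 / eta_d, 1.0, 0.0, eta_d))
```

Complex field values can be passed in unchanged, in which case the real and imaginary parts of the returned residual correspond to the two entries of a real-variable boundary vector.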
CLIFFORD M. KROWNE
VI. NUMERICAL RESULTS AND COMPARISON TO EXPERIMENT FOR MICROSTRIP CIRCULATORS
A. Static Internal Magnetic Field Results
Meshing for finite element processing (we used triangular elements) begins with an initial, minimum-extent (simplest) grid that uniquely delineates the regions of ferrite and air. Mesh density of the triangular elements is then increased in proportion to the gradient of the solution. Convergence of the solution, using typical ferrite puck parameters, occurs after about 10 iterations. Magnetostatic potential $\Psi$ versus coordinates z and r, using the formulas in Section V.A, is given in Fig. 36 for a puck diameter-to-height ratio D/h = 2a/h = 10, saturation magnetization $4\pi M_s$ of 2000 G, applied magnetic field of 2000 Oe, and corner magnetic field of 1 Oe (see (145)). The solution space shown corresponds to the quarter-plane region of the Fig. 33 cut plane. $\Psi$ indicates that close to the origin the z-component of the magnetic field (the negative gradient of $\Psi$) is relatively constant, and that it increases as the outer wall and top edge of the puck are approached. The z-directed magnetic field $H_z$ is calculated from the finite element solver by employing (138). Figure 37(a) gives the results for the midplane z = 0 of the puck. Permeability reduces from a tensor to a scalar value of unity on going from the interior to the exterior of the puck. Notice that there is a sizable transition region at the r = a puck-exterior interface.
FIGURE 36. Magnetostatic potential $\Psi$ versus coordinates z and r for a puck diameter-to-height ratio D/h = 2a/h = 10, $4\pi M_s$ = 2000 G, $H_{app}$ = 2000 Oe, and corner magnetic field of 1 Oe. (Newman and Krowne, 1998.)
FIGURE 37. (a) Static magnetic field $H_z$ versus r/a in a ferrite puck calculated by the 2D FE code at z = 0. (b) Demagnetization $N_{zz}(r, z = 0)$ versus r/a for both the 2D FE method and the Joseph-Schloemann approximation, at two aspect ratios. (Newman and Krowne, 1998.)
[Plot labels recovered: horizontal axis r/a; panel label 2a/h = 10; dashed-curve legend "J-S approx at z = h/2".]
In fact, only at a distance roughly equal to half the radius, a/2, removed from the interface does the field settle down to its deep interior or far exterior values. Thus we see that significant field variation within the circulator puck is of very real concern. An analytical approximation to the self-consistent finite element solution is available [Joseph and Schloemann, 1965]. The approximation calculates the value of the demagnetization factor $N_{zz}$ used in the demagnetization equation (10). For magnetization M equal to the saturation value $M_s$, and no anisotropy field,
$H_i = H_{app} - 4\pi N_{zz} M_s$ (222)
In the limiting case of infinitesimal film thickness, $h \to 0$, $N_{zz} = 1$ and $H_i$ can be forced to be zero by applying precisely the correct external magnetic field, $H_{app} = 4\pi M_s$. Finite thickness pucks, on the other hand, as Fig. 37 demonstrates, produce $N_{zz}$ varying with radial and vertical coordinates r and z. This variation of $N_{zz}(r, z = 0)$ at midplane is shown in Fig. 37(b) for both the finite element method and the approximation method, at two aspect ratios 2a/h = 1, 10. The finite element method was used to find $N_{zz}$ by reverse solution of (222), that is, using
$N_{zz} = \frac{H_{app} - H_i}{4\pi M_s}$ (223)
Clearly, since $H_{app}$ and $M_s$ are known, and $H_i$ is found by the finite element solver, $N_{zz}$ is acquired immediately upon PDE2D solution completion. Thin circular pucks are approximated for almost all r by the analytical expansion method, except at and beyond 0.97a. However, thicker pucks have a much larger range where the analytical and finite element methods deviate from each other, r < 0.45a.
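The reverse solution of the demagnetization relation is simple enough to sketch in a few lines of Python. The snippet below is an illustration only: the function names are hypothetical, and the sampled internal field values are invented placeholders rather than finite element results from this work.

```python
def internal_field(h_app, four_pi_ms, n_zz):
    """Internal field H_i = H_app - N_zz * 4*pi*M_s (fields in Oe, 4*pi*M_s in G)."""
    return h_app - n_zz * four_pi_ms


def demag_factor(h_app, four_pi_ms, h_i):
    """Reverse solution: N_zz = (H_app - H_i) / (4*pi*M_s)."""
    return (h_app - h_i) / four_pi_ms


H_APP = 2000.0       # applied field in Oe, as quoted for Fig. 36
FOUR_PI_MS = 2000.0  # saturation magnetization in G, as quoted for Fig. 36

# Hypothetical midplane samples of H_i (Oe) versus r/a; placeholders only.
sample_h_i = {0.0: 250.0, 0.5: 300.0, 0.9: 600.0}

for r_over_a, h_i in sample_h_i.items():
    n_zz = demag_factor(H_APP, FOUR_PI_MS, h_i)
    print(f"r/a = {r_over_a:.1f}: H_i = {h_i:.0f} Oe, N_zz = {n_zz:.3f}")
```

In the thin-film limit the same functions reproduce the statement in the text: with $N_{zz} = 1$ and $H_{app} = 4\pi M_s$ the internal field is zero.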
B. RF Field and s-Parameters Using Green’s Functions
We have earlier discussed the reasons why it is desirable to bias a circulator with $H_i$ close to 0 Oe for ordinary ferrite material, allowing the operating frequency to be far removed from the ferromagnetic resonance frequency $f_0$, thereby minimizing magnetic losses. The ferromagnetic resonance frequency is $f_0 = \omega_0/(2\pi) = \gamma H_i$, where $\gamma$ is the gyromagnetic ratio, equal to about 2.8 MHz/Oe. A constant null value for $\omega_0$ (or in fact any constant value) cannot be maintained throughout the puck cross-sectional region of the ferrite because of the intrinsically inhomogeneous nature of the circulator problem, which may have one or all of the variables $H_{app}$, $N_{zz}$, and $4\pi M_s$ varying with radial position. Only an extraordinary magnet configuration would be able to satisfy rf design requirements and have nonuniform variables which are specified to vary with radial position so as to produce $\omega_0$ = constant. In practice, the internal magnetic field in the ferrite puck will always vary radially to some extent. Since the internal magnetic field is a function of radial position, the permeability tensor elements will also be a function of radial position. Thus the real microstrip circulator problem is an intrinsically inhomogeneous problem.
Figures 38(a) and (b) show for a circulator, respectively, the measured insertion loss and return loss versus frequency, and the measured isolation versus frequency, compared to calculations performed using the recursive Green’s function code, implemented by following the equations presented in Section V.B. Magnetization $4\pi M_s$ = 2300 G, applied magnetic field $H_{app}$ = 2300 Oe, and $\Delta H$ = 320 Oe were assumed perfectly uniform for purposes of calculation. However, this does not mean that the internal magnetic field $H_i$ was uniform (it was not) because of demagnetization effects due to the finite thickness of the circulator puck (height = 0.635 mm). Puck radius is 2.7026 mm and port apertures are all 1.6561 mm.
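To make the radial dependence of the resonance frequency concrete, the short Python sketch below evaluates $f_0 = \gamma H_i$ with $H_i = H_{app} - N_{zz} \cdot 4\pi M_s$ for the uniform applied field and magnetization quoted above. It is only an illustration under stated assumptions: the function names are hypothetical, and the radial $N_{zz}$ values are invented placeholders, not output of the finite element or Green's function codes.

```python
GAMMA_MHZ_PER_OE = 2.8  # gyromagnetic ratio quoted in the text (approximate)


def resonance_frequency_ghz(h_i_oe):
    """Ferromagnetic resonance frequency f_0 = gamma * H_i, returned in GHz."""
    return GAMMA_MHZ_PER_OE * h_i_oe / 1000.0


def resonance_profile(h_app, four_pi_ms, n_zz_profile):
    """f_0 at each radial sample, using H_i = H_app - N_zz * 4*pi*M_s."""
    return [resonance_frequency_ghz(h_app - n_zz * four_pi_ms)
            for n_zz in n_zz_profile]


H_APP = 2300.0       # Oe, uniform applied field
FOUR_PI_MS = 2300.0  # G, uniform magnetization

# Hypothetical N_zz samples from the puck center toward the edge (placeholders).
n_zz_samples = [0.90, 0.88, 0.80, 0.60]

for n_zz, f_0 in zip(n_zz_samples, resonance_profile(H_APP, FOUR_PI_MS, n_zz_samples)):
    print(f"N_zz = {n_zz:.2f} -> f_0 = {f_0:.3f} GHz")
```

Even with perfectly uniform $H_{app}$ and $4\pi M_s$, any radial variation of $N_{zz}$ produces a radially varying $f_0$, which is the inhomogeneity the recursive code is designed to handle.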
The ferrite relative dielectric constant is $\epsilon_f$ = 13.3, with the surrounding nonmagnetic dielectric material having $\epsilon_d$ = 9.5 (needed to determine input and output microstrip line properties). Ferrite material loss tangent is 0.0003. Circulator design was of the planar embedded type, where a ferrite cylinder is dropped into a slot cut into a dielectric substrate. The circulator matching circuit contains several sections of quarter-wave transformers (center frequencies all set to
FIGURE 38. Measured and calculated (recursive Green’s function code): (a) insertion loss ($s_{21}$) and return loss ($s_{11}$) and (b) isolation ($s_{31}$) versus frequency. $4\pi M_s$ = 2300 G, $H_{app}$ = 2300 Oe, $\Delta H$ = 320 Oe, h = 0.635 mm, a = 2.7026 mm, w = 1.6561 mm, computational port width 1.4w, $\epsilon_f$ = 13.3, $\epsilon_d$ = 9.5, $\tan\delta$ = 0.0003, and $f_{center}$ = 9.5 GHz. (Device fabricated at EMS Technologies, Inc. by David Popelka and Gordon Harrison.) (Newman and Krowne, 1998.)
$f_{center}$ = 9.5 GHz) at each port. Comparison is made with experiment by using a computational port aperture width of 1.4w, where w is the physical width of the circulator port microstrip lines. Fringing field behavior of microstrip structures necessitates such a correction to the physical width because of the open and inhomogeneous nature of the lines [Newman, Webb, and Krowne, 1996; Vaughn, Popelka, and Williams, 1996]. Excellent in-band agreement (Fig. 38(a)) of the calculated and measured insertion loss occurs, with good agreement for the return loss as well, but with the calculated minimum shifted to a slightly higher frequency. This shift is not observed for the isolation. However, the measured isolation values (Fig. 38(b)) are considerably worse than calculated (by as much as 8.5 dB at the center frequency, reducing to small values, nearly 0 dB, at some places between 1.5 and 2.5 GHz off center), which suggests that actual device imperfections may be degrading circulator behavior.
Field contour patterns are extremely interesting and can be particularly informative in illuminating circulator device performance. The process of constructing the proper superposition of the actual excitation signals seen at all ports in a simulated device is shown in Fig. 39(a) through (d), where Fig. 39(e) shows the attached circuits for ports 2 and 3 in the final structure to be modeled. (The same material and geometric parameters are used as for Fig. 38.) Electric field patterns for the intrinsic circulator (without any matching structures) are calculated using an incident signal at (a) port 1, (b) port 2, and (c) port 3, with no incoming signal at the remaining two ports for each case. Results are obtained by running the
FIGURE 39. Electric field patterns for the intrinsic circulator (without any matching structures) are calculated by the recursive Green’s function code using an incident signal at (a) port 1, (b) port 2, and (c) port 3, with no incoming signal at the remaining two ports for each case. (d) Electric field pattern obtained by immersing the intrinsic device in a matching network, consisting of individual port loads shown in (e). (Newman and Krowne, 1998.)
recursive Green’s function code at a frequency corresponding to maximum isolation. As the excited port moves counterclockwise, the minimum which starts out near port 3 moves successively to ports 1 and 2. It is this minimum which indicates where a field null is to be expected, and delightfully it is near an actual port, allowing the identification of a real isolated port. Of course, the actual circulator has matching structures, and when these are added in by taking a weighted superposition of the results in Fig. 39(a)-(c), Fig. 39(d) is found. The field pattern seen in Fig. 39(d) was calculated assuming a unit incident signal at port 1, and excitation signals entering ports 2 and 3 due to finite reflections occurring off of the terminating structures (Fig. 39(e)) attached to these ports. Terminating structures attached to ports 2 and 3 are identical and consist of a quarter-wave matching transformer and the system impedance $Z_{sys}$. Intensity of the gray shading denotes electric field strength, with the darkest areas having the highest field and the lightest areas having the lowest field. When all three intrinsic circulator patterns are superimposed, properly weighted by the actual port excitation signals, the field pattern in Fig. 39(d) follows. Clearly appearing is the lowest field intensity region migrating (from its intrinsic location in Fig. 39(a)) from within the circulator puck interior to a zone attached to the perimeter engulfing port 3, now apparently the isolated port. Behavior of the field minimum is unmistakably demonstrated by these field
plots. Insight into circulator operation is deepened by examination of the calculated field patterns, which demonstrates the tremendous value in obtaining them. We also notice that the contours of highest intensity (symmetrically) connect ports 1 and 2, indicating that the input signal is transmitted optimally to the output port. A comment about the way the plots were constructed in Fig. 39 (and the following Fig. 40) is appropriate here. The interface between each uniformly shaded region represents a contour line of constant electric field magnitude. The first such line encircling the lightest region corresponds to 5% of the maximum value attained within the puck. We have referred to the lightest region as the “null” because it has field magnitudes below this tiny value, and inside its contour toward its center the values approach zero. Successive contours encircling ever-larger regions or moving further away from the null correspond to 15%, 25%, 35%, 45%, 55%, 65%, 75%, 85%, and 95% of the maximum electric field value. For example, the uniformly shaded region between the 35% and 45% contours must have $0.35|E_{z,max}| \le |E_z| \le 0.45|E_{z,max}|$. Circulator performance (of the device embedded in a matching network from Fig. 39(d)) over a frequency band can be understood in terms of its field pattern change as the frequency is swept (Fig. 40). Location of the ports
FIGURE 40. Electric field patterns for the intrinsic circulator embedded in a matching network are calculated by the recursive Green’s function code, and displayed at nine individual frequencies. (Newman and Krowne, 1998.)
is indicated in the middle diagram (labeled 9 GHz). The electric field null region begins inside the puck and left of the midline at 5 GHz, rotating clockwise with increasing frequency. At 6 GHz the null region is nearly at midline and closer to the circulator perimeter. At 7 GHz it has rotated to the right-hand side of midline and near to port 3. Coalescing into the perimeter at the port 3 location occurs at 8 GHz, with nearly symmetric disposition about port 3 happening at 9 GHz (commonly referred to as the circulation frequency). The spatial width of the null region and the rate at which it traverses the port aperture location with varying frequency are good determinants of the bandwidth of the circulator. Increasing frequency to 10 GHz starts to move the null region away from the perimeter, with full detachment completed at 11 GHz. At f = 12 GHz an increasingly complex field pattern begins to emerge, with f = 13 GHz bringing about a second detached null region and higher-order puck resonance behavior. Care must be exercised in trying to interpret the results as simply those of a simple resonator. Obviously, the circulator is not an isolated resonator because it has apertures in its walls which may be quite substantial in size, and through these apertures flow propagating waves. What we are witnessing is the combination of propagating waves from all three ports, some propagating behavior within the circulator puck, and standing-wave behavior within the puck. It is the complex interaction of all three physical processes which leads to the circulator field patterns we observe. For the particular device considered here, the minimum signal at port 3, and maximum isolation, occur near 9.5 GHz.
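The construction of the embedded-device field plots described in this subsection (a weighted superposition of the three intrinsic patterns, followed by shading between contours at 5%, 15%, ..., 95% of the maximum magnitude) can be sketched in Python as follows. This is a hedged illustration, not the plotting code used for Figs. 39 and 40: the function names are hypothetical, and the field samples and port weights are invented placeholder numbers.

```python
import bisect

# Contour levels as fractions of the maximum |E_z|: 5%, 15%, ..., 95%.
LEVELS = [(5 + 10 * k) / 100 for k in range(10)]


def superpose(patterns, weights):
    """Weighted sum of intrinsic field patterns (one list of complex E_z samples per excited port)."""
    n_points = len(patterns[0])
    return [sum(w * p[i] for w, p in zip(weights, patterns))
            for i in range(n_points)]


def contour_band(magnitude, maximum):
    """Index of the shaded band: 0 is the null (< 5%), 10 is above the 95% contour."""
    if maximum <= 0.0:
        raise ValueError("maximum field magnitude must be positive")
    return bisect.bisect_right(LEVELS, magnitude / maximum)


# Placeholder samples of E_z at three puck locations for excitation at ports 1, 2, 3.
intrinsic = [
    [1.0 + 0.0j, 0.6 + 0.2j, 0.05 + 0.0j],
    [0.1 + 0.0j, 0.9 + 0.0j, 0.70 + 0.1j],
    [0.8 + 0.1j, 0.1 + 0.0j, 0.95 + 0.0j],
]
# Placeholder port excitations: unit incident signal at port 1, small reflections at ports 2 and 3.
weights = [1.0 + 0.0j, 0.05 - 0.02j, -0.03 + 0.01j]

total = superpose(intrinsic, weights)
peak = max(abs(e) for e in total)
print([contour_band(abs(e), peak) for e in total])
```

With this indexing, band 4 is the region between the 35% and 45% contours used as the example in the text, and band 0 is the region referred to as the null.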
C. RF Field and s-Parameters Using Finite Elements
Numerical setup formulas presented in Section V.C were used to analyze an embedded puck circulator (i.e., a ferrite puck surrounded by a dielectric). The finite element code PDE2D, with all the pre- and post-processing software added by Newman and Krowne (1998), was used and the results compared to previously published Green’s function codes appropriate for uniform circulators [Neidert and Philips, 1993]. The ability to employ a finite element code (here a two-dimensional one) depends intimately on obtaining numerical data from a sufficiently finely meshed solution region, here the circulator puck environment. The elements, described by their mesh size, location, and density, determine the resulting data accuracy. Figure 41 shows the process of mesh generation done by both the 2D FE code engine and the preprocessing control we set up. Figure 41(a) is the initial mesh defined by us at the start of the simulation. Notice that triangular elements are used. Vertices (7), triangle elements (6), and boundary arcs (6) are labeled. Elements are generalized triangles, having
FIGURE 41. Mesh generation done by the 2D FE code engine and preprocessing control setup. (a) Initial mesh defined at the start of the simulation, with vertices (7), triangle elements (6), and boundary arcs (6) labeled. Elements are generalized triangles, having either curved or straight boundaries, enabling natural meshing in a circular solution domain. Microstrip transmission lines adjoin the circulator puck at arcs 2, 4, and 6. Magnetic wall boundary conditions are enforced at arcs 1, 3, and 5. (b) Final mesh created by the 2D FE code (number of elements set to 300). (c) Mesh after adaptive refinement (same number of elements as in (b)). (Newman and Krowne, 1998.)
either curved or straight boundaries, enabling natural meshing in a circular solution domain. Microstrip transmission lines, which adjoin the circulator puck at arcs 2, 4, and 6, are not shown. A magnetic wall boundary condition is enforced at arcs 1, 3, and 5. Next, Fig. 41(b) presents the final mesh created by the 2D FE code. The mesh generator in the PDE2D code attempts to produce equal-area triangles, and this is evident in the figure. In the case of this figure, mesh generation stopped after reaching 300 elements, a value set by the user. Last, Fig. 41(c) gives the finished mesh after adaptive refinement (element number still held to 300), which we added to allow densification of elements in regions where the solution had the highest gradient. Calculation time on a high-end workstation for sufficiently small gridding (as in Fig. 41), which yields smooth plotted results, is a couple of minutes per frequency point. The 2D FE code is most valuable in solving arbitrarily shaped noncircular 2D circulators. Here its slower processing time compared to the Green’s function code (which requires a few seconds per frequency point) is accepted because there is no alternative. Performance of the 2D FE code may be checked by running the Green’s function code. Figure 42 shows the s-parameter results, $s_{11}$, $s_{21}$, $s_{31}$, obtained by running the Green’s function and 2D FE codes for an intrinsic circulator (an incident signal at port 1, only outgoing signals at ports 2 and 3). Calculations were done for YIG ferrite material by setting magnetization $4\pi M_s$ = 1780 G, applied magnetic field $H_{app}$ = 1780 Oe, and $\Delta H$ = 45 Oe. Internal magnetic field $H_i$ was uniform because demagnetization effects (which are nonuniform over the
FIGURE 42. The s-parameter results, $s_{11}$, $s_{21}$, $s_{31}$, obtained by running the Green’s function and 2D FE codes for an intrinsic YIG circulator; $4\pi M_s$ = 1780 G, $H_{app}$ = 1780 Oe, and $\Delta H$ = 45 Oe. Internal magnetic field $H_i$ was uniform because demagnetization effects were neglected; a = 2.79 mm, w = 1.5 mm, h = 0.508 mm, $\epsilon_f$ = 15, $\epsilon_d$ = 15, and $\tan\delta$ = 0.0002. (Newman and Krowne, 1998.)
[Plot labels recovered: horizontal axis Frequency (GHz); legend entries S11, S21, S31 (analytical) and S11, S21, S31 (2DFE).]
puck radial dimension) due to the finite thickness of the circulator puck (height = 0.508 mm) were neglected. Puck radius is 2.79 mm and port apertures are all 1.5 mm. Ferrite relative dielectric constant is $\epsilon_f$ = 15, with the surrounding dielectric nonmagnetic material being set to the same value $\epsilon_d$ = 15 (needed to determine input and output microstrip line properties). Ferrite material loss tangent is 0.0002. It is clear that the ac s-parameter results plotted in Fig. 42 for the 2D FE code and the Green’s function code exhibit striking agreement. Contour plots of constant electric field magnitude for an intrinsically matched circulator are given in Figs. 43(a) and (b) for, respectively, a round and a distorted hexagonal device using the 2D FE code for the same material and geometric parameters (the case for the round device only) used for Fig. 38. Higher numbers which label the contours correspond to higher magnitudes of the vertical ($E_z$) electric field. “0” corresponds to 5% of the maximum magnitude of electric field within the puck, and every increase in integer number adds 10%. Thus the contour lines here relate exactly to those used previously for the Green’s function method plots in Figs. 39 and 40. Microstrip transmission lines are drawn outside the puck regions to indicate definitively circulator excitation. The input port is at the left, the isolated port at the upper right, and the through port at the lower right. The hexagonal circulator was a circularly symmetric device, with the same
FIGURE 43. Contours of constant electric field magnitude for an intrinsically matched circulator for (a) a round ferrite puck and (b) a distorted hexagonal puck, using the 2D FE code for the same material and geometric parameters as used for Fig. 38. (Newman and Krowne, 1998.)
thickness and port apertures as the round device. Its major diameter was chosen to be identical to the round device’s diameter. Sidewalls containing the ports were 1.7 mm in extent on the hexagonal device. What is interesting is the tremendous similarity of the contour plots for the round and hexagonal circulators. This result is not entirely surprising when it is realized that the excitations, symmetry locations of the ports, material parameters, magnetic field bias conditions, and magnetic wall constraints are all identical. The other reason why the field plots are so similar is that the device is based upon circulation behavior, a physical effect expected to be alike in both configurations. Also, the resonator aspects of both the round and hexagonal pucks should yield related standing-wave patterns. Thus we conclude that the Fig. 43 results are quite reasonable.
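As a closing illustration for this section, the cross-check between the two codes reported with Fig. 42 amounts to comparing s-parameter magnitudes in decibels at common frequency points. The Python sketch below shows one simple way such a comparison could be scripted. The function names are hypothetical and the s-parameter samples are invented placeholders; they are not values read from Fig. 42.

```python
import math


def magnitude_db(s):
    """Magnitude of a (possibly complex) s-parameter in dB: 20*log10|s|. Requires |s| > 0."""
    return 20.0 * math.log10(abs(s))


def max_deviation_db(reference, candidate):
    """Largest absolute dB difference between two s-parameter sweeps on the same frequency grid."""
    return max(abs(magnitude_db(a) - magnitude_db(b))
               for a, b in zip(reference, candidate))


# Placeholder s31 samples at three frequencies from two solvers (illustrative only).
s31_green = [0.30 + 0.05j, 0.05 - 0.01j, 0.25 + 0.10j]
s31_fe = [0.31 + 0.05j, 0.05 - 0.02j, 0.25 + 0.11j]

# Isolation is conventionally quoted as the positive quantity -20*log10|s31|.
isolation_db = [-magnitude_db(s) for s in s31_green]
print([round(x, 2) for x in isolation_db])
print(round(max_deviation_db(s31_green, s31_fe), 3))
```

A deviation measured in dB weights the deep isolation minimum heavily, so small absolute differences near a null can appear large; this is worth keeping in mind when judging agreement between solvers.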
VII. CONCLUSIONS
We have covered computer aided design of microstrip circulators in this contribution. A special effort has been made to provide a very general context in which such design is performed, namely, giving the background in the area of material attributes of the constituent ferrites all the way to
another area of study, that of the formulas needed to implement computational engines. Therefore, in Section II we treated the molecular physical and chemical properties of ferrite materials so that compositions could be selected which were acceptable for planar microstrip circulators. Section III went over the fabrication methods required for making planar ferrite layers, including hybrid, magnetless, and monolithic circuit compatible techniques. Essential considerations for modeling, such as ferrite material parameter requirements, matching sections needed to take the circulator port impedance to the external circuit impedance, multilayer first-order effects, and loss effects, were all treated in Section IV. Equations needed to interface the Green’s function or finite element computation engines with other routines under the control of the user, so that static internal (bias) magnetic field, rf field, or s-parameters could be found, were all given in Section V. Finally, the last main area was treated in Section VI, giving the simulation results obtained from numerical modeling and the experimental results found from measurements, and their comparison. A tremendous amount of recent research work has gone into bringing circulator fabrication, theory, and numerical modeling into the realm of state-of-the-art technology compatible with hybrid microwave circuits and monolithic microwave integrated circuits. These developments have appeared in the literature only over the last few years, and some of the material has never been put under one moniker until this chapter. The motivation to proceed at a rapid pace, leading to the new developments in the planar circulator field, has been due to a combined industrial, university, and government research laboratory effort. The results of this effort will surely continue to bear fruit in the coming years.
REFERENCES
Abe, M., Itoh, T., Tamaura, Y., Gotoh, Y., and Gomi, M. (1987). Ferrite plating on GaAs for microwave monolithic integrated circuit. IEEE Trans. Magnetics 23, 3736-3738, Sept.
Adam, J. D., Buhay, H., Daniel, M. R., Driver, M. R., Eldridge, G. W., Hanes, M. H., and Messham, R. L. (1995). Monolithic integration of an X-band circulator with GaAs MMICs. IEEE Microwave Theory Tech. Symposium Digest, 97-98, May.
Adam, J. D., Buhay, H., Daniel, M. R., Eldridge, G. W., Hanes, M. H., Messham, R. L., and Smith, T. J. (1996). K-band circulators on semiconductor wafers. IEEE Microwave Theory Tech. Symposium Digest, 113-115, June.
Blight, R. E. and Schloemann, E. (1992). A compact broadband circulator for phased array antenna modules. IEEE Microwave Theory Tech. Symposium Digest, 1389-1392, June.
Bosma, H. (1964). On Stripline Circulation at UHF. IEEE Trans. Microwave Theory & Tech. 12, 61-72, Jan.
Buhay, H., Adam, J. D., Daniel, M. R., Doyle, N. J., Driver, M. C., Eldridge, G. W., Hanes, M. H., Messham, R. L., and Sopira, M. M. (1995). Thick yttrium iron garnet films produced by pulsed laser deposition for integration applications. IEEE Trans. Magnetics 31, 3832-3834, Nov.
Cvetkovic, S. R., Zhao, A. P., and Punjani, M. (1994). An implementation of the finite element analysis of anisotropic waveguides through a general purpose PDE software. IEEE Trans. Microwave Theory & Tech. MTT 42, 1499-1505, Aug.
Dionne, G. F. (1987). High magnetization limits of spinel ferrite. J. Appl. Physics 61, 3865-3867, April.
Dionne, G. F. (1988). Molecular-field coefficients of MnFe$_2$O$_4$ and NiFe$_2$O$_4$ spinel ferrite systems. J. Appl. Physics 63, 3777-3779, Aug.
Dionne, G. F. (1990). Spin states and electronic conduction in Ni oxides. J. Appl. Physics 67, 4561-4563, May.
Dionne, G. F. (1996). Magnetic exchange and charge transfer in mixed-valence manganites and cuprates. J. Appl. Physics 79, 5172-5174, April.
Dionne, G. F., Cui, G.-J., McAvoy, D. T., Halpern, B. L., and Schmitt, J. J. (1995). Magnetic and stress characterization of nickel ferrite ceramic films grown by jet vapor deposition. IEEE Trans. Magnetics 31, 3853-3855, Nov.
Dionne, G. F. and West, R. G. (1987). Magnetic and dielectric properties of the spinel ferrite system Ni$_{0.65}$Zn$_{0.35}$Fe$_{2-x}$Mn$_x$O$_4$. J. Appl. Physics 61, 3868-3870, April.
Dorsey, P. (1996). Private communication, Naval Research Laboratory.
Greer, J. and Tahat, M. (1994). Large-area pulsed laser deposition; techniques and applications. Proc. Amer. Vacuum Soc. Meet., Denver, CO, Oct. Published in J. Vac. Soc. Tech. A 13, 1175-1181, May/June, 1995.
Gupta, K. C., Garg, R., and Chadha, R. (1981). Computer Aided Design of Microwave Circuits. Dedham, MA: Artech House.
Halpern, B. L. and Schmitt, J. J. (1994). “Deposition technologies for thin films and coatings,” in Jet Vapor Deposition (R. F. Bunshah, Ed.). Park Ridge, NJ: Noyes Publications, 2nd ed.
Harrington, R. F. (1961). Time-Harmonic Electromagnetic Fields. New York: McGraw-Hill.
Jackson, J. D. (1975). Classical Electrodynamics. New York: Wiley.
Joseph, R. I. and Schloemann, E. (1965). Demagnetizing field in non-ellipsoidal bodies. J. Applied Physics 36, 1579-1593, May.
Krowne, C. M. (1996a). Theory of the recursive dyadic Green’s function for inhomogeneous ferrite canonically shaped microstrip circulators, in Adv. Imaging Electron Physics 98 (P. W. Hawkes, Ed.). San Diego, CA: Academic Press, 77-321.
Krowne, C. M. (1996b). 3D dyadic Green’s function for radially inhomogeneous circular ferrite circulator. IEEE Microwave Theory Tech. Symposium Digest, 121-124, June.
Krowne, C. M. (1997). Symmetry Considerations Based Upon 2D EH Dyadic Green’s Functions for Inhomogeneous Microstrip Ferrite Circulators. Microwave Optical Technology Letts. 16, 176-186, Oct.
Krowne, C. M. (1998a). Dyadic Green’s function circulator theory for Inhomogeneous Ferrite with and without penetrable walls, in Adv. Imaging Electron Physics 103 (P. W. Hawkes, Ed.). San Diego, CA: Academic Press, 151-275.
Krowne, C. M. (1998b). 2D Cross-Coupling Recursive Dyadic Green’s Function, Circuit Parameters and Fields For Radially Inhomogeneous Microstrip Circular Ferrite Circulators. Microwave Optical Technology Letts. 17, 140-148, Feb.
Krowne, C. M. and Neidert, R. E. (1995). Inhomogeneous ferrite microstrip circulator: theory and numerical calculations using a recursive Green’s function. 25th European Microwave Conference Digest, pp. 414-420, Sept., Bologna, Italy.
Krowne, C. M. and Neidert, R. E. (1996). Theory and numerical calculations for radially inhomogeneous circular ferrite circulators. IEEE Trans. Microwave Theory Tech. 44, 419-431, March.
Kurakawa, K. (1965). Power waves and the scattering matrix. IEEE Trans. Microwave Theory Tech. 13, 194-202, March.
Landolt-Börnstein (1970). Group III: Crystal and Solid State Physics, Vol. 4. Magnetic and Other Properties of Oxides and Related Compounds, Parts a and b, in Numerical Data and Functional Relationships in Science and Technology (K.-H. Hellwege, Ed.). Berlin: Springer-Verlag.
Lax, B. and Button, K. J. (1962). Microwave Ferrites and Ferrimagnetics. New York: McGraw-Hill.
Morrish, A. H. (1965). The Physical Properties of Magnetism. New York: Wiley.
Neidert, R. E. (1995). From unpublished notes, Naval Research Laboratory.
Neidert, R. E. and Philips, P. M. (1993). Losses in Y-junction stripline and microstrip ferrite circulators. IEEE Trans. Microwave Theory Tech. 41, 1081-1086, June/July.
Newman, H. S. and Krowne, C. M. (1998). Analysis of Ferrite Circulators by 2-D Finite Element and Recursive Green’s Function Techniques. IEEE Trans. Microwave Theory Tech. 46, 167-177, Feb.
Newman, H. S., Webb, D. C., and Krowne, C. M. (1996). Design and realization of millimeter-wave microstrip circulators. Proc. Intern. Conf. Millimeter Submillimeter Waves Appl. III Digest, SPIE Proceedings, Denver, CO, 181-191, Aug.
Parisot, M. and Soares, R. (1988). GaAs MESFETs: S parameters, measurements and their use in circuit design, in GaAs MESFET Circuit Design (R. Soares, Ed.). Boston: Artech House.
Pollert, E. (1984). Crystal Chemistry of Magnetic Oxides Part 1: General Problems-Spinels, in Progress in Crystal Growth and Characterization, Vol. 9 (B. R. Pamplin, Ed.). Oxford: Pergamon Press.
Ramo, S., Whinnery, J. R., and Van Duzer, T. (1967). Fields and Waves in Communication Electronics, 2nd ed. New York: Wiley.
Raytheon Co. (1995). Ferrite Development Corporation 6th Quarterly Report.
Schieber, M. M. (1967). Experimental Magnetochemistry. New York: Wiley.
Sewell, G. (1993). PDE2D: Easy to use software for general two dimensional partial differential equations. Adv. Engin. Software 17, 105-112.
Sewell, G. and Cvetkovic, S. R. (1989). Waveguide-an interactive waveguide program. Adv. Engin. Software 11, 169-175, Oct.
Smit, J. and Wijn, H. P. J. (1959). Ferrites. New York: Wiley.
Soohoo, R. F. (1960). Theory and Application of Ferrites. Englewood Cliffs, New Jersey: Prentice-Hall.
Standley, K. J. (1972). Oxide Magnetic Materials. Oxford: Clarendon Press (Oxford Univ. Press).
Trans-Tech, Inc. (1989). Microwave Materials. A Technical Supplement. Publi. #500301 R 1. Adamstown, MD.
Vaughn, J. T., Popelka, D. J., and Williams, A. S. (1996). Applications of CAD tools for planar ferrite circulators. Workshop on Ferrite CAD and Applications. IEEE Microwave Theory Tech. Symposium, June.
von Aulock, W. H., ed. (1965). Handbook of Microwave Ferrite Materials. New York: Academic Press.
Webb, D. C. (1995). Design and fabrication of low-cost ferrite circulators. 25th European Microwave Conference Digest, 1191-1200, Sept., Bologna, Italy.
Weiss, J. A., Watson, N. G., and Dionne, G. F. (1989). New uniaxial-ferrite millimeter wave junction circulators. IEEE Microwave Theory Tech. Symposium Digest, 145-148, June.
Williams, C. M., Chrisey, D. B., Lubitz, P., Grabowski, K. S., and Cotell, C. M. (1994). The magnetic and structural properties of pulsed laser deposited epitaxial MnZn ferrite films. J. Appl. Physics 75, 1676-1680, Feb.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 106
Discrete Geometry for Image Processing
STEPHANE MARCHAND-MAILLET
Institut EURECOM, Department of Multimedia Communications, B.P. 193, Sophia-Antipolis, France 06904
I. Introduction
II. Binary Digital Images
III. Digital Topology
    A. Neighborhoods
    B. Digital Arcs and Closed Curves
    C. Image-to-Graph Mapping
IV. Discrete Geometry
    A. Discrete Distance and Shortest Paths
    B. Discrete Convexity
    C. Discrete Straightness
V. Extensions in the 16-Neighborhood Space
    A. Definitions
    B. Grid-Intersect Quantization GIQ,,
    C. Discrete Straightness in the 16-Neighborhood Space
VI. Application to Vectorization
    A. A Greedy Algorithm
    B. Checking Discrete Straightness Using the Duality Generated by 7;
VII. Conclusion
References
Image processing operations typically involve processing of data in discrete form. Information given by such data is mostly recovered via the study of interrelationships between discrete points (i.e., pixels). There is, therefore, a need for developing a context in which the concepts used are kept consistent with this kind of data. In this paper, we summarize and extend results known in discrete geometry, from the construction of a discrete topological concept to the characterization of geometrical properties of discrete sets of points. The context of binary image processing is taken as a support for illustrating this study. Emphasis is placed on characterizing straightness and convexity in discrete spaces. This is done via the definition of discrete distances, which are shown to be close to well-known concepts in graph theory. An extended neighborhood space is also constructed and shown to provide more flexibility and compactness than classically used neighborhood spaces while preserving the possibility of characterizing analytically the main geometrical properties of discrete points. The study developed in this paper can form the basis for different extensions, both regarding the richness of the neighborhood used and the quantity of information available at each pixel location.

Volume 106, ISBN 0-12-014748-3. Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved. ISSN 1076-5670/99 $30.00
I. INTRODUCTION

The use of computers for the development of new technologies has imposed processing of data in discrete form. Information is no longer continuous but rather given at some discrete locations in time or space. This is particularly true in the context of image processing, where pictures are digitized into pixels. The global information contained in the image is recovered via the study of interrelationships between pixels. For analysis of such data, therefore, it is crucial to obtain formal discrete characterizations similar to those known in continuous spaces. Discrete geometry is one such field, which aims at characterizing concepts such as straightness in a discrete space.

In this paper, we first present the construction of a formal context in which such characterizations will be further developed. Advantages of such an approach are illustrated using the context of binary image processing. Then, we recall major results in discrete geometry in relation to the well-studied 8-neighborhood discrete space. Discrete convexity and discrete straightness are mostly considered here. In a second part, we extend these properties to a newly constructed discrete space. While doing that, we derive some results as to the advantages of such a mapping.

More precisely, the paper is organized as follows. Section II presents the particular class of binary images which will be used as a support to our developments. By this means, this section briefly recalls the underlying structure of most discrete spaces encountered in image processing. The identification of image components relies on a connectivity relationship between pixels. This topological context, which introduces the concept of neighborhood between pixels, has been developed in the early stages of binary image analysis and is presented in Section III. In turn, digital topology allows for the definition of connected subsets of pixels such as arcs and curves. We consider connectivity in relation to square lattices. In other words, pixels in the image are arranged on the unit square grid. Such an underlying structure facilitates analytical developments and image storage and remains the most widely used framework.

The originality of this study lies in the fact that we will map the digital topology onto a combinatorial structure, the grid graph. This approach has been suggested in [Ronse, 1985b] and developed in [Sharaiha, 1991]. Image-to-graph mapping provides us with efficient procedures for solving discrete optimization problems [Gondran and Minoux, 1984]. Moreover, efficient data structures have been created to manage data in this context (e.g., see [Christofides, Badra, and Sharaiha, 1997]).
Pixels are now grouped in discrete objects (e.g., paths or connected components), and it is the properties of these subsets that are under study. Two aspects are generally considered for analysis. On the one hand, in order to perform shape characterizations, geometric notions such as straightness and convexity are to be defined in discrete spaces. On the other hand, measurements within the image are necessary. Both discrete geometry and shape measurements thus rely on the definition of a distance. For consistency with the context in which image analysis operations are studied, purely discrete distances have been proposed. Different approaches are generally taken for their definitions. However, a common framework defines discrete distances using known local distances within a neighborhood or within combinations of neighborhoods. Section IV summarizes these advances in relation to the 8-neighborhood space.

Building on this, Section V presents the construction of the 16-neighborhood space, within which equivalent characterizations will be derived. In particular, we introduce two new discrete distances in this space, which will form the basis for the development of discrete properties in this space. It is also shown that the approach taken allows for an easy mapping of most of the properties known in the 8-neighborhood space into the 16-neighborhood space. Finally, Section VI suggests a direct application of these results to the domain of binary image processing.

II. BINARY DIGITAL IMAGES
The acquisition of an image is generally done using a set of physical sensors. The acquisition process can thus be accurately modeled as a sampling of the continuous image using a discrete partitioning of the continuous plane. For the sake of simplicity, only partitions involving regular polygons are considered, that is, polygons with sides of constant length and a constant angle between them. It is easy to show that, for constructing a partition of the plane, only three regular polygon types can be used. The possible numbers of sides of the regular polygon used are three, four, and six, leading to triangular, square, and hexagonal partitioning schemes, respectively (see Figure 1).
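The restriction to three, four, and six sides can be checked numerically. The sketch below assumes an edge-to-edge partition in which k copies of a regular n-gon meet at each vertex, so that k interior angles of 180(n - 2)/n degrees must sum to 360 degrees, i.e., k = 2n/(n - 2) must be an integer. The function name and the search bound are illustrative choices, not part of the original text.

```python
from fractions import Fraction

def regular_partitions(max_sides=20):
    """Return (n, k) pairs such that k regular n-gons fit around a vertex.

    The interior angle of a regular n-gon is 180*(n-2)/n degrees, so k copies
    fill 360 degrees exactly when k = 2n/(n-2) is an integer.
    """
    result = []
    for n in range(3, max_sides + 1):
        k = Fraction(2 * n, n - 2)
        if k.denominator == 1:
            result.append((n, int(k)))
    return result

print(regular_partitions())
```

For n > 6 the ratio 2n/(n - 2) lies strictly between 2 and 3, so raising the search bound cannot produce further solutions.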
FIGURE 1. Different sampling schemes.
FIGURE 2. Pixels resulting from the sampling shown in Figure 1.
In the mathematical model of an image, the pixel area is identified with its center, leading to the representation of pixels as discrete points in the plane. As shown in Figure 2, a lattice can be built which connects all such pixel centers. The sampling partition is represented with dotted lines and the pixel centers as black dots (●) (see also Figure 3). The lattice represented with continuous lines is dual to the partition in the sense that two pixels are joined in the lattice if and only if the two partition polygons share a common edge. A triangular partition results in a hexagonal lattice. Conversely, a hexagonal partition results in a triangular arrangement of pixels, the triangular lattice. Finally, for a square sampling of the image, the pixels can be considered as integer points of a square lattice.

Physically, such polygons represent sensors sensitive to the intensity of light. Their output is a value on a scale. In a greyscale image, each pixel is thus associated with a single color value. As with the sampling of the spatial domain of the image, the color scale is sampled using a given number of discrete ranges. We consider greyscale images, where the color scale is one-dimensional. When using only two such ranges, representing white and black colors (0 and 1, respectively), we obtain binary images. As a result of the complete acquisition process, a two-dimensional binary image is given as a two-dimensional array of pixels in which each pixel is associated with a color value which can be either 0 (white pixel) or 1 (black pixel).

In order to define mathematical tools for picture processing such as connectivity and distance measurement, we need to set a theoretical basis on the discrete set of pixels thus obtained. Digital image processing relies heavily on the definition of a topology, which forms the context in which local processing operators will be defined. In this work, we will specialize in square lattices and partitions, since they represent the most suitable case for analytical study. Moreover, it will become apparent that a mapping can be defined which creates a relation with other types of regular partition (e.g., triangular partitions).

FIGURE 3. Equivalence between triangular and square lattices.
III. DIGITAL TOPOLOGY

It is commonly known that the discrete topology defined by pure mathematics cannot be used for digital image processing, since in its definition every discrete point (i.e., a pixel in the image processing context) is seen as an open set. Using this definition, a discrete operator would consider the image as a set of disjoint pixels only, whereas it is generally admitted that the information contained in the image is stored in the underlying pixel structure and the neighborhood relations between pixels. Alternative definitions have been proposed. In contrast with classic discrete topology, digital image processing is based on digital topology [Chassery and Chenin, 1980; Kong and Rosenfeld, 1989; Rosenfeld, 1979]. The definition of digital topology is based on a neighborhood for every point.

Neighborhoods in digital topology are typically defined by referring to the partition dual to the lattice considered. For a given point, defining its neighboring points is equivalent to defining a relationship between the corresponding pixel areas in the partition. The simplest instance is when the neighbors of a pixel are defined as the pixels whose areas share a common edge with the pixel area in question (direct neighbors). Extensions of this principle are also considered by defining indirect neighbors for a pixel.

Subsection III.A introduces neighborhoods defined on the square lattice. Because of the simplicity of their definitions, these neighborhoods are commonly used for the definition of digital image processing operators. Moreover, it can easily be shown that there exists a one-to-one mapping between the square and triangular lattices, as sketched in Figure 3. The hexagonal lattice is of limited practical use because of the coarseness of the pixel distribution it induces and the unrealistic aspect of the dual triangular partition.

Building on these definitions, Subsection III.B formally defines digital arcs and connected components, which will form the basis for further study. Finally, Subsection III.C sets the basis for the analogy between topological relationships and combinatorial structures.
A. Neighborhoods
Four main neighborhoods are generally defined on the square lattice. Firstly, the 4-neighborhood $N_4(p)$ includes the four direct neighbors of the point in question (see Figure 4(A)). By duality, they are the pixel areas which share a common edge with the center pixel area. This neighborhood is completed using pixel areas which share a common corner with the pixel area in question (indirect neighbors), leading to the 8-neighborhood of the point $p$, $N_8(p)$ (see Figure 4(B)). By analogy with a chess board, the 8-neighborhood corresponds to all possible moves of the king. Extending this analogy, the knight-neighborhood $N_{knight}(p)$, which corresponds to all possible moves of a knight on the chess board, can also be defined (see Figure 4(C)). Finally, the combination of the 8- and the knight-neighborhoods yields the 16-neighborhood of $p$, $N_{16}(p)$ (see Figure 4(D)). Figure 4 illustrates the construction of these neighborhoods. The lattice is shown as continuous lines, whereas dotted lines represent the dual partition. Using this notation, centers of pixels thus lie at the intersections between continuous (i.e., lattice) lines.
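The four neighborhoods can be written down as lists of integer offsets around the center point. The following is a minimal sketch; the constant and function names are illustrative and do not come from the original text.

```python
# Offsets (dx, dy) relative to the center point p; all names are illustrative.
N4 = [(1, 0), (0, 1), (-1, 0), (0, -1)]          # direct neighbors (shared edge)
DIAGONAL = [(1, 1), (-1, 1), (-1, -1), (1, -1)]  # indirect neighbors (shared corner)
N8 = N4 + DIAGONAL                               # moves of the king
KNIGHT = [(dx, dy)
          for dx in (-2, -1, 1, 2)
          for dy in (-2, -1, 1, 2)
          if abs(dx) != abs(dy)]                 # moves of the knight
N16 = N8 + KNIGHT                                # 8- and knight-neighborhoods combined

def neighbors(p, offsets):
    """Return the neighbors of the point p = (x, y) for the given offsets."""
    x, y = p
    return [(x + dx, y + dy) for dx, dy in offsets]

print(len(N4), len(N8), len(KNIGHT), len(N16))
```

Storing a neighborhood as an offset list keeps the later definitions independent of the particular neighborhood chosen.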
Remark III.1. It is important to note that the square lattice is simply a translated version of its dual partition. Moreover, the positions of the points on this lattice are well suited for matrix storage. For these reasons, neighborhoods on the square lattice are the best studied and the most commonly used. Rosenfeld [1979] defined digital topology on this lattice.

For later purposes, codes are generally associated with the moves in the neighborhood in question. Typically, starting from the positive move along the horizontal axis, numbered 0, moves are sequentially numbered in a counterclockwise fashion, as shown in Figure 5.
FIGURE 4. Neighborhoods on the square grid. (A) $N_4(p)$: 4-neighborhood. (B) $N_8(p)$: 8-neighborhood. (C) $N_{knight}(p)$: knight-neighborhood. (D) $N_{16}(p)$: 16-neighborhood.

FIGURE 5. Codes associated with moves on the square grid. (A) 4-neighborhood. (B) 8-neighborhood. (C) 16-neighborhood.
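The counterclockwise numbering of moves can be sketched by sorting the offsets of a neighborhood by polar angle, starting from the positive horizontal move. The sketch assumes the vertical axis points upward; the function name is illustrative. It is shown for the 4- and 8-neighborhoods only, since the exact numbering used in Figure 5(C) for the 16-neighborhood cannot be confirmed from the text.

```python
import math

N4 = [(1, 0), (0, 1), (-1, 0), (0, -1)]
N8 = N4 + [(1, 1), (-1, 1), (-1, -1), (1, -1)]

def move_codes(offsets):
    """Map code -> move, numbering moves counterclockwise from (1, 0).

    Assumption: the y axis points upward, so counterclockwise means
    increasing polar angle.
    """
    def angle(move):
        dx, dy = move
        return math.atan2(dy, dx) % (2 * math.pi)
    return dict(enumerate(sorted(offsets, key=angle)))

codes8 = move_codes(N8)
print(codes8)
```

A digital arc can then be stored compactly as a start point followed by the sequence of codes of its successive moves.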
B. Digital Arcs and Closed Curves

The concept of neighborhood allows for the definition of local connectivity between points. Digital arcs and curves are simply an extension of this property. In turn, they impose conditions on their underlying neighborhoods.
Definition III.2 Digital Arc. Given a set of discrete points with their neighborhood relationship, a digital arc $P_{pq}$ from the point $p$ to the point $q$ is defined as a set of points $P_{pq} = \{p_i;\ i = 0, \ldots, n\}$ such that:

(i) $p_0 = p$, $p_n = q$.
(ii) $\forall i = 1, \ldots, n-1$, $p_i$ has exactly two neighbors in the arc $P_{pq}$, the points $p_{i-1}$ and $p_{i+1}$.
(iii) $p_0$ (respectively, $p_n$) has exactly one neighbor in the arc $P_{pq}$, namely $p_1$ (respectively, $p_{n-1}$).

Definition III.3 Cardinality of a Digital Arc. The variable $n$ is called the cardinality of the digital arc $P_{pq}$ and is also denoted $|P_{pq}|$.
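Conditions (ii) and (iii) of Definition III.2 can be tested directly on an ordered list of points. The following is a minimal sketch in which the function name and the offset lists are illustrative; the two examples show that the same point sequence may be an arc for one neighborhood and not for another.

```python
N4 = [(1, 0), (0, 1), (-1, 0), (0, -1)]
N8 = N4 + [(1, 1), (-1, 1), (-1, -1), (1, -1)]

def is_digital_arc(points, offsets):
    """Check Definition III.2 for an ordered list of points p_0, ..., p_n."""
    if len(points) < 2 or len(set(points)) != len(points):
        return False
    members = set(points)
    last = len(points) - 1
    for i, (x, y) in enumerate(points):
        # Neighbors of p_i that belong to the candidate arc.
        in_arc = {(x + dx, y + dy) for dx, dy in offsets} & members
        if i == 0:
            expected = {points[1]}
        elif i == last:
            expected = {points[last - 1]}
        else:
            expected = {points[i - 1], points[i + 1]}
        if in_arc != expected:
            return False
    return True

staircase = [(0, 0), (1, 0), (1, 1), (2, 1)]
diagonal = [(0, 0), (1, 1), (2, 2)]
print(is_digital_arc(staircase, N4), is_digital_arc(staircase, N8))
print(is_digital_arc(diagonal, N4), is_digital_arc(diagonal, N8))
```

The staircase fails for the 8-neighborhood because its start point then has two neighbors in the set, which violates condition (iii).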
A set of points may satisfy the conditions to be a digital arc using a specific neighborhood but may not satisfy these conditions for a different neighborhood. Since most of the definitions and properties depend on the neighborhood used, we specify this dependence by adding the neighborhood prefixes (i.e., 4-, 8-, or 16-) to the names of the properties or digital objects cited. For instance, a digital arc in the 16-neighborhood will be referred to as a 16-arc. Equivalently, a 16-arc is a digital arc with respect to the 16-connectivity relationship. Using the definition of a digital arc, a connected component on the lattice is defined as follows.

Definition III.4 Connected Component. A connected component on the lattice is a set of points such that there exists an arc joining any pair of points in the set.
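Connected components in the sense of Definition III.4 can be extracted with a breadth-first search over neighboring points. The sketch below assumes that the existence of a chain of successive neighbors between two points is equivalent to the existence of a digital arc between them; the function name is illustrative.

```python
from collections import deque

N4 = [(1, 0), (0, 1), (-1, 0), (0, -1)]
N8 = N4 + [(1, 1), (-1, 1), (-1, -1), (1, -1)]

def connected_components(points, offsets):
    """Group points into connected components for the given neighborhood."""
    remaining = set(points)
    components = []
    while remaining:
        seed = remaining.pop()
        component, queue = {seed}, deque([seed])
        while queue:
            x, y = queue.popleft()
            for dx, dy in offsets:
                q = (x + dx, y + dy)
                if q in remaining:
                    remaining.remove(q)
                    component.add(q)
                    queue.append(q)
        components.append(component)
    return components

pts = [(0, 0), (1, 1), (3, 3)]
print(len(connected_components(pts, N8)), len(connected_components(pts, N4)))
```

The two diagonal points form a single 8-component but two separate 4-components, which illustrates the dependence on the neighborhood noted above.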
A further restriction on connectedness leads to simple connectivity.

Definition III.5 Bounded and Simple Connected Component. On the infinite lattice, a connected component that contains an infinite number of points is said to be unbounded. On the finite lattice, a connected component is unbounded if and only if it intersects the border of the lattice. Otherwise, it is said to be bounded. A simple connected component is a connected component whose complement does not contain any bounded connected component. By definition, a digital arc is a simple connected component.

An important notion in the continuous space is that of closed curves, which, in turn, define holes. In the continuous space, Jordan’s theorem characterizes a closed curve as a curve which partitions the space into two subparts, the interior and the exterior (see, e.g., [Voss, 1993]). The definition of a closed curve in the discrete space relies on that of a digital arc.
Definition III.6 Digital Closed Curve. A digital closed curve (or equivalently, a digital curve) on the lattice is a set of points such that the removal of one of its points transforms it into a digital arc.

A version of Jordan’s theorem in the digital space can then be formulated.

Theorem III.7 Discrete Jordan’s Theorem. A digital curve defines exactly two separate connected components on the lattice, the interior and the exterior. Therefore, there should be no arc joining these two subsets.

Remark III.8. Theorem III.7 emphasizes the fact that, by definition, a digital closed curve is not a simple connected component, since its interior is bounded (i.e., contains a finite number of points).

In general, a connectivity relationship cannot be used for both a set and its complement. A duality between possible (k- and k'-) connectivities and neighborhoods on the lattices is to be defined. We introduce this notion of duality via the following example.
Example III.9 Dual Neighborhoods on the Square Lattice. In Figure 6, the 8-curve C does not separate the digital plane into two 8-components. As a counterexample, there exists an 8-arc joining two potential interior and exterior points p and q, respectively. However, it is clear that an 8-curve will define two 4-connected components as its exterior and interior. Hence, the discrete Jordan’s theorem will be satisfied when using 8-connectivity (respectively, 4-connectivity) for the curve and 4-connectivity (respectively, 8-connectivity) for the interior and exterior on the square lattice.

FIGURE 6. An 8-digital closed curve.

Via this duality, the neighborhood relationships are extended to the connectivity relationships. Therefore, points can now be grouped in different subsets on which operations are to be performed.

Border of a Digital Set. An important subset of points in digital topology is the set of border points which separates a digital set from its complement.
Definition III.10 Border of a Digital Set. Given a k-connected set of points $P$, the complement of $P$, denoted $P'$, defines a dual connectivity relationship (denoted k'-connectivity). In our case, $k = 8$ and $k' = 4$ when using the duality between 8- and 4-connectivities. The border of $P$ is the set of points $\Gamma$, defined as the k-connected set of points in $P$ that have at least one k'-neighbor (i.e., a neighbor with respect to the k'-connectivity) in $P'$.

An example for this definition can be given when the set of points represents the pixels in a binary image. A binary image is represented by an array of discrete points labeled with a value (1 or 0) which indicates the black or white color of the corresponding pixels, respectively. By convention, two basic subsets can be identified.

Definition III.11 Foreground and Background in a Binary Digital Image.

(i) The foreground is the set of points $F$ which are labeled with a value equal to 1. By convention, the foreground corresponds to the set of black pixels in a binary image.
(ii) The background is the complement of the set $F$, denoted $F'$. It is the set of points associated with a zero value. By convention, the background corresponds to the set of all white pixels in the image.
(iii) The border points are the points which form the border of the set according to Definition III.10. The corresponding pixels in the image are called border pixels. A point (respectively, a pixel) which is not in the border set is referred to as an interior point (respectively, interior pixel).
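Definitions III.10 and III.11 translate into a direct test on each point. The sketch below is illustrative only: the function names are hypothetical, the foreground is given as a set of integer points, and every point outside that set is assumed to be background. The first function treats the foreground as containing its border; the second collects the background points adjacent to the foreground, which corresponds to attaching the border to the complement instead.

```python
N4 = [(1, 0), (0, 1), (-1, 0), (0, -1)]
N8 = N4 + [(1, 1), (-1, 1), (-1, -1), (1, -1)]

def border(foreground, dual_offsets):
    """Points of F with at least one k'-neighbor outside F (border inside F)."""
    F = set(foreground)
    return {(x, y) for (x, y) in F
            if any((x + dx, y + dy) not in F for dx, dy in dual_offsets)}

def background_border(foreground, offsets):
    """Background points with at least one neighbor in F (border outside F)."""
    F = set(foreground)
    return {(x + dx, y + dy) for (x, y) in F for dx, dy in offsets} - F

# A plus-shaped foreground centered on (1, 1).
F = {(1, 0), (0, 1), (1, 1), (2, 1), (1, 2)}
print(sorted(border(F, N4)))
print(len(background_border(F, N8)))
```

With the dual 4-neighborhood, the center of the plus shape is an interior point, since all of its 4-neighbors belong to the foreground; it would become a border point if the 8-neighborhood were used as the dual instead.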
Remark III.12. The foreground and the background may both contain more than one connected component.

Example III.13 Border of a Binary Digital Image. Consider the digital image shown in Figure 7(A). The black pixels (i.e., points of the foreground $F$) are symbolized as black circles (●) and the white pixels (i.e., the points of the background $F'$) as white circles (○). The 8-connectivity is considered in the foreground $F$, hence the 4-connectivity is considered in the background (i.e., $k = 8$ and $k' = 4$). Depending on whether the foreground or the background is considered as an open set, two different border sets are defined. In Figure 7(B), the foreground is considered as a closed set. Hence, it contains its border. By definition, the border of the foreground is the set of black pixels $\Gamma$ that have at least one white pixel among their 4-neighbors. The points in this set are surrounded by a square box in Figure 7(B). Conversely, in Figure 7(C), the foreground is considered as an open set. The border thus belongs to its complement, namely, the background. In this case, the border is the set of white pixels that have at least one black pixel among their 8-neighbors. The points in $\Gamma$ are surrounded by a square box in Figure 7(C). From this example, it is clear that the two borders arising from these cases are different.

Remark III.14. Although the set of border points of a connected component is a connected component with respect to the connectivity of the set it belongs to, it generally does not satisfy the conditions for being a digital closed curve. In the example shown in Figure 7(B), $\Gamma$, the border of the foreground, is 8-connected, but the point p in the rightmost bottom corner has three 8-neighbors in $\Gamma$. Similarly, in Figure 7(C), $\Gamma$ is a 4-connected component. However, the point q has three 4-neighbors in $\Gamma$. Therefore, in neither case is $\Gamma$ a closed curve as defined in Definition III.6.

FIGURE 7. Borders in a binary digital image. (A) The representation of the binary image on the square lattice. (B) The foreground is taken as a closed set. (C) The foreground is taken as an open set.

Discrete sets are now well characterized in digital topology. The next section introduces the mapping between the concepts previously presented and the terminology given by graph theory. This will create a favorable context in which a formal study of discrete geometry can be performed.

C. Image-to-Graph Mapping
Earlier work relates image processing and graph theory. Connectivity relationships are mapped onto the graph-theoretical concept of adjacency in the study presented in [Ronse, 1985b]. Based on this theoretical context, accurate topological thinning can be characterized [Ronse, 1985d]. Similar results are also developed in [Suzuki, Ueda, and Sklansky, 1993], where arcs in the connectivity graph are successively deleted to simulate an erosion process. The concept of discrete distance is formulated using graph theory in the early work presented in [Montanari, 1968]. Further developments on the study of discrete distances using graph mapping can be found in [Sharaiha and Christofides, 1994]. However, in most of these references (with the exception of the last one), the concept of a graph is only used to represent connectivity relationships between pixels. By contrast, we will make use of powerful properties of combinatorial structures and related algorithms for formalizing and extending results in discrete spaces.

A graph $G = (V, A)$ is based on the definition of a discrete data set (vertices in $V$) and their interrelationship (arcs in $A$). A digital image is a set of discrete points on which a digital topology can be defined. Moreover, digital topology introduces the concept of neighborhood for a pixel, which in turn defines digital arcs and curves. It is thus clear that a graph $G = (V, A)$ can be defined using the set of pixels $F$ in the image as the set of vertices $V$. Such a graph is referred to as the grid graph of the image.
Definition III.15 Grid Graph (see [Ronse, 1985b; Sharaiha, 1991]). Given a set of pixels $F$ in the image and a connectivity relationship on which a digital topology is based, the grid graph $G = (V, A)$ of the image is defined as follows:

(i) To every pixel $p$ in $F$ corresponds a vertex $u$ in $V$.
(ii) An arc $(u, v)$ exists in $A$ whenever the pixels $p$ and $q$ corresponding to vertices $u$ and $v$, respectively, are neighbors in the digital topology. The forward star of a given vertex $u$ is the set of vertices $v$ such that arcs $(u, v)$ exist in $A$. In this study, the forward star of a vertex $u$ corresponds to the set of pixels $q$ in the neighborhood of the pixel $p$ associated with the vertex $u$.
(iii) The length $l(u, v)$ associated with the arc $(u, v)$ is the length of the move made between the corresponding two pixels $p$ and $q$, respectively.
(iv) The abstract grid graph corresponding to the infinite lattice is called the complete grid graph.

Immediate properties of the grid graph are given in Proposition III.16.

Proposition III.16. By definition of digital topology:

(i) The grid graph $G = (V, A)$ of an image is sparse. The number of pixel neighbors to a given pixel is limited by the size of this neighborhood. Typically, $M \le kN$, where $M = |A|$, $N = |V|$, and $k = 4$ (4-neighborhood), 8 (8-neighborhood), or 16 (16-neighborhood).
(ii) The grid graph on a set of $N$ pixels can be constructed in linear time (i.e., in $O(N)$ operations).
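Definition III.15 and Proposition III.16 can be illustrated with a forward-star representation stored as a dictionary of dictionaries. This is a minimal sketch: the function names are hypothetical, unit move lengths are an assumption made for the example, and the linear-time behavior relies on hash-set lookups taking constant expected time.

```python
N8 = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def unit_length(dx, dy):
    """Assumed move length: every move has length 1."""
    return 1

def grid_graph(pixels, offsets, move_length=unit_length):
    """Return the forward stars: vertex -> {adjacent vertex: arc length}."""
    V = set(pixels)
    graph = {}
    for (x, y) in V:
        # Each pixel inspects at most k = len(offsets) candidate neighbors.
        graph[(x, y)] = {(x + dx, y + dy): move_length(dx, dy)
                         for dx, dy in offsets
                         if (x + dx, y + dy) in V}
    return graph

F = {(x, y) for x in range(3) for y in range(3)}   # 3 x 3 block of pixels
G = grid_graph(F, N8)
N = len(G)
M = sum(len(star) for star in G.values())          # number of directed arcs
print(N, M, M <= len(N8) * N)
```

Only the center pixel of the block has a full forward star of eight vertices; border pixels have fewer, so the bound on the number of arcs is not attained.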
Remark III.17. In the previous sections, pixels were identified with discrete points (pixel centers). From now on, a further analogy identifies pixels and vertices in the grid graph. Therefore, pixels will be equivalently referred to as discrete points (e.g., $p$, $q$) or vertices (e.g., $u$, $v$). Similarly, depending on the context, the set of pixels will be equivalently denoted $F$ or $V$ by analogy with the set of vertices in the grid graph. Finally, arcs in the grid graph will be equivalently referred to as moves on the underlying lattice.

Definition III.18 Path and Path Length. A path between two vertices $u$ and $v$ in the grid graph $G = (V, A)$ is a set of vertices $P_{uv} = \{u_0, u_1, \ldots, u_n\}$ such that $u_0 = u$, $u_n = v$, and arc $(u_i, u_{i+1}) \in A$ for any $i = 0, \ldots, n-1$. The expression $n = |P_{uv}|$ is the cardinality of the path $P_{uv}$, and $l(P_{uv}) = \sum_{i=0}^{n-1} l(u_i, u_{i+1})$ is the length of this path.

The notion of connectivity between pixels is mapped onto that of adjacency between vertices in the grid graph. Therefore, the definitions of connected components in digital topology and graph theory are clearly equivalent. Moreover, using this image-to-graph mapping, the concept of the neighborhood of a pixel is directly mapped onto that of the forward star of a vertex. In the case of a complete grid graph, the forward star of a vertex $u$ readily contains the neighborhood of the corresponding pixel $p$ (e.g., $N_8(p)$). In the case where the grid graph spans only vertices corresponding to a subset $F$ of pixels in the image (e.g., the foreground pixels in the image), the forward star of a vertex $u$ in such a grid graph will characterize the pixel neighbors to $u$ which are included in $F$ (e.g., $N_8(p) \cap F$).
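The cardinality and length of Definition III.18 can be computed by a single pass over consecutive vertices. In the sketch below, the graph is a small hand-built forward-star dictionary whose arc lengths are arbitrary illustrative values, and the function name is hypothetical.

```python
def path_length(graph, path):
    """Return (n, l(P)) for a path given as the vertex list [u_0, ..., u_n].

    graph maps each vertex to its forward star {adjacent vertex: arc length}.
    """
    n = len(path) - 1
    total = 0
    for u, v in zip(path, path[1:]):
        if v not in graph.get(u, {}):
            raise ValueError(f"({u}, {v}) is not an arc of the graph")
        total += graph[u][v]
    return n, total

# Three mutually adjacent vertices; the lengths are illustrative only.
graph = {
    (0, 0): {(1, 0): 1, (1, 1): 2},
    (1, 0): {(0, 0): 1, (1, 1): 1},
    (1, 1): {(0, 0): 2, (1, 0): 1},
}
print(path_length(graph, [(0, 0), (1, 0), (1, 1)]))
print(path_length(graph, [(0, 0), (1, 1)]))
```

Both paths join the same two vertices with the same length here, although their cardinalities differ.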
Example III.19 Grid Graph of the Foreground of a Binary Digital Image. Consider the binary digital image shown in Figure 8(A). The set $F$ of foreground pixels (i.e., black) is displayed as black circles (●). Empty circles (○) represent background pixels (i.e., white) in $F'$. Figure 8(B) shows the grid graph of the foreground $F$ when considering the 8-neighborhood relationship (i.e., the 8-grid graph). Clearly, this graph is sparse. In this example, $F$ is considered as a closed set and thus border pixels are foreground pixels (i.e., $\Gamma \subset F$). By definition of an interior pixel $p$ (i.e., $p \notin \Gamma$), all dual neighbors of $p$ are included in $F$ (i.e., $N_4(p) \subset F$ in this case; see Definition III.10). Therefore, such a pixel $p$ is characterized in the grid graph by a vertex whose forward star contains the $|N_4(p)| = 4$ vertices corresponding to its 4-neighboring pixels. For example, this is the case for vertices $u_1$ and $u_2$ in Figure 8(B). By contrast, $u_0$ is a border vertex. Figure 8(C) shows an example of a path $P_{u_0 u_2}$ between vertices $u_0$ and $u_2$ in the grid graph. The analogy between such a path and a digital arc is discussed next.

FIGURE 8. (A) Binary digital image. (B) Corresponding 8-grid graph. (C) A path in the grid graph.

The mapping between a digital arc and a path in the grid graph needs further precision. A digital arc was defined as a set of neighboring pixels such that each pixel in the digital arc has exactly two neighbors, except for the start and end vertices (see Definition III.2). Therefore, it is clear that a corresponding path in the grid graph is a simple path. However, an additional condition is required for a simple path in the grid graph to correspond to a digital arc in the image. This condition simply states that, for each vertex $u$ in such a path, exactly two vertices in the forward star of $u$ in the graph are included in the path, except for the start and end vertices, each of which has only one adjacent vertex in the path in question. This condition will always be satisfied on any shortest path in a grid graph, as discussed in the next section. For example, in Figure 8(C), $P_{u_0 u_2}$ does not correspond to an 8-digital arc, since the predecessor of vertex $u_1$ has three 8-neighbors on this path. However, it is easy to verify that each of the subpaths $P_{u_0 u_1}$ and $P_{u_1 u_2}$ defines an 8-digital arc.
IV. DISCRETE GEOMETRY

Discrete geometry aims at the characterization of geometrical properties of a set of discrete points. Geometrical properties of a set are understood to be global properties. Points are grouped, thus forming discrete objects, and it is the properties of these discrete objects that are under study. In contrast, digital topology, described in Section III, allows for the study of the local properties between discrete points within such an object. In short, topological properties such as connectivity and neighborhood are first used to define discrete objects, and discrete geometry then characterizes the properties of these discrete objects [Chassery and Montanvert, 1991; Rosenfeld and Melter, 1989; Voss, 1993].

A. Discrete Distance and Shortest Paths
In this section we first recall existing definitions and results in discrete geometry applied to the 8-neighborhood space. Based on the conclusions derived earlier, we aim to map such results in an extended neighborhood space, the 16-neighborhood space. Subsection IV.A.1 introduces the concept of discrete distances. This concept is detailed in relation to the analogy with graph theory, leading to the study of shortest paths in the grid graph in Subsection IV.A.2. Finally, Subsection IV.A.3 gives some insights as to the relation between discrete and continuous distances.

1. Definitions
By analogy with the continuous space, a discrete distance function should verify the classic metric conditions given by Definition IV.1.

Definition IV.1 Distance. Given a set of points $P$, a function $d : P \times P \to \mathbb{R}^+$ is said to be a distance on $P$ if and only if it satisfies the following conditions:
(i) $d(p, q)$ is defined and finite for all $p$ and $q$ in $P$ ($d$ is total on $P$).
(ii) $d(p, q) = 0$ if and only if $p = q$ ($d$ is positive definite).
(iii) $d(p, q) = d(q, p)$, $\forall (p, q) \in P \times P$ ($d$ is symmetric).
(iv) $d(p, q) + d(q, r) \geq d(p, r)$, $\forall (p, q, r) \in P \times P \times P$ ($d$ satisfies the triangular inequality).
In the digital topology, distance calculations are based on local distances within the neighborhood of a point. Their definitions are related to basic moves on the corresponding lattice as introduced by Definition IV.2.
Definition IV.2 Move and Move Length. A move on the lattice is the displacement from a point to one of its neighbors. A move length is the value given as local distance between a point and one of its neighbors.

The notion of length for a move can be readily extended to that of a digital arc.

Definition IV.3 Length of a Digital Arc. The length of a digital arc is the sum of the lengths of the moves that compose it.

The generic definition for a discrete distance is as follows.
Definition IV.4 Discrete Distance. Given the lengths for all possible moves in a neighborhood, the distance between two points $p$ and $q$ is the length of the shortest digital arc (i.e., the arc of minimal length) from $p$ to $q$.

Although the distance between two points is given as a unique value, the digital arc which realizes this distance is not necessarily unique. Whether such a distance satisfies the metric conditions depends on the definition of the move lengths. Originally, a unit value was attributed to every move length (e.g., see [Rosenfeld and Pfaltz, 1968]). In this case, the digital arc associated with the distance between $p$ and $q$ is the arc of minimal cardinality joining $p$ and $q$. Real or integer move lengths have been designed for a discrete distance related to a specific neighborhood to achieve a close approximation of the Euclidean distance in the plane [Borgefors, 1984; Montanari, 1968].

Common definitions of distances are presented here. For each distance, the corresponding discrete disc (Definition IV.5) is also presented. The geometrical properties of such discs constitute an important factor in characterizing how closely a discrete distance can approximate the Euclidean distance.
Definition IV.5 Discrete Disc. Given a discrete distance $d_D$, a discrete disc of radius $r \geq 0$ centered at point $p$ for this distance is the set of discrete points $\Delta_D(p, r) = \{q \text{ such that } d_D(p, q) \leq r\}$. When no reference to the center point is necessary, a discrete disc of radius $r$ for the distance $d_D$ will also be noted as $\Delta_D(r)$.

In the particular case of an infinite square lattice, a point $p$ on the lattice can be uniquely characterized by an integer pair $(x_p, y_p)$ (the coordinates of the point $p$ in the $\mathbb{Z}^2$ plane). Conversely, any integer pair $(x_p, y_p) \in \mathbb{Z}^2$ represents a point $p$ on the square lattice. Therefore, there exists a one-to-one mapping from points on the square lattice to $\mathbb{Z}^2$. This property eases the definition of analytical expressions for the discrete distances on the square lattice. Definition IV.6 recalls the analytical expression of the Euclidean distance $d_E$ that is used as reference in both continuous and discrete spaces.
Definition IV.6 Euclidean Distance. Given two points $p = (x_p, y_p)$ and $q = (x_q, y_q)$, the Euclidean distance value between $p$ and $q$ is given by

$d_E(p, q) = \sqrt{(x_q - x_p)^2 + (y_q - y_p)^2}$

It is easy to verify that $d_E$ satisfies the conditions to be a distance given in Definition IV.1. All move lengths are first set to unity, leading to the $d_4$ distance (Definition IV.7) and the $d_8$ distance (Definition IV.9).
Definition IV.7 City-Block Distance. The City-Block distance (or Manhattan distance) between $p$ and $q$ is the length of the shortest 4-arc joining $p$ and $q$ when the move lengths are all set to unity. The City-Block distance between $p$ and $q$ is noted as $d_4(p, q)$ and is also referred to as the $d_4$ distance.

The location of the points on the square lattice allows for an equivalent definition of the $d_4$ distance.
Proposition IV.8. Given two points $p = (x_p, y_p)$ and $q = (x_q, y_q)$ on the square lattice, the minimal cardinality of a 4-arc joining $p$ to $q$ is given by

$d_4(p, q) = |x_q - x_p| + |y_q - y_p|$

As a consequence of Proposition IV.8, the 4-neighborhood of the point $p$ can be characterized as follows:

$N_4(p) = \{q = (x_q, y_q) \in \mathbb{Z}^2 \text{ such that } |x_q - x_p| + |y_q - y_p| = 1\} \quad \forall p = (x_p, y_p) \in \mathbb{Z}^2$

More generally, a discrete 4-disc centered at $p$ and of radius $r$ (e.g., see Figure 9) is characterized by

$\Delta_4(p, r) = \{q = (x_q, y_q) \in \mathbb{Z}^2 \text{ such that } |x_q - x_p| + |y_q - y_p| \leq r\}$
FIGURE 9. 4-disc of radius 3: $\Delta_4(3)$.

A simple extension of the $d_4$ distance on the 8-neighborhood leads to the definition of the Chessboard distance.

Definition IV.9 Chessboard Distance. The Chessboard distance (or Diamond distance) between $p$ and $q$ is the length of the shortest 8-arc joining $p$ and $q$ when the move lengths are all set to unity. The Chessboard distance between $p$ and $q$ is noted as $d_8(p, q)$ and is also referred to as the $d_8$ distance.

Again, using the coordinates of integer points, $d_8$ can be given an analytical expression as follows.
Proposition IV.10. Given two points $p = (x_p, y_p)$ and $q = (x_q, y_q)$ on the square lattice, the minimal cardinality of an 8-arc joining $p$ to $q$ is given by

$d_8(p, q) = \max(|x_q - x_p|, |y_q - y_p|)$

From Proposition IV.10, the 8-neighborhood of the point $p$ can be characterized as follows:

$N_8(p) = \{q = (x_q, y_q) \in \mathbb{Z}^2 \text{ such that } \max(|x_q - x_p|, |y_q - y_p|) = 1\} \quad \forall p = (x_p, y_p) \in \mathbb{Z}^2$

Therefore, an 8-disc of radius $r$ centered at $p$ is also defined by

$\Delta_8(p, r) = \{q = (x_q, y_q) \in \mathbb{Z}^2 \text{ such that } \max(|x_q - x_p|, |y_q - y_p|) \leq r\}$

(See Figure 10 for an example.) Since the move lengths that define the $d_4$ and $d_8$ distances are all equal to 1, both these discrete distance functions satisfy the metric conditions given in Definition IV.1.

Remark IV.11. Note there exists a strong similarity between the norms $\|\vec{u}\|_1 = |x_u| + |y_u|$, $\|\vec{u}\|_2 = \sqrt{x_u^2 + y_u^2}$, and $\|\vec{u}\|_\infty = \sup(|x_u|, |y_u|)$ defined in the continuous space $\mathbb{R}^2$ and $d_4$, $d_E$, and $d_8$ on the digital space. Recalling that $\|\vec{u}\|_1 \geq \|\vec{u}\|_2 \geq \|\vec{u}\|_\infty$, $\forall \vec{u} \in \mathbb{R}^2$, this property is mapped in the digital space as $d_4(p, q) \geq d_E(p, q) \geq d_8(p, q)$, $\forall p, q \in \mathbb{Z}^2$.
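The analytical expressions of Propositions IV.8 and IV.10 and the ordering recalled in Remark IV.11 can be checked numerically. The following is a minimal sketch only; the function names (`d4`, `d8`, `dE`, `disc`) are chosen here for illustration and do not come from the text, and the disc uses the closed condition (distance at most r), assuming an integer radius.

```python
import math

def d4(p, q):
    """City-Block distance between lattice points p and q (Proposition IV.8)."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])

def d8(p, q):
    """Chessboard distance between lattice points p and q (Proposition IV.10)."""
    return max(abs(q[0] - p[0]), abs(q[1] - p[1]))

def dE(p, q):
    """Euclidean distance between p and q (Definition IV.6)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def disc(dist, p, r):
    """Discrete disc: lattice points q with dist(p, q) <= r (integer r assumed).

    Scanning the square of half-width r around p is enough for d4 and d8,
    since both discs are contained in that square.
    """
    x0, y0 = p
    return {(x, y)
            for x in range(x0 - r, x0 + r + 1)
            for y in range(y0 - r, y0 + r + 1)
            if dist(p, (x, y)) <= r}
```

For instance, `disc(d4, (0, 0), 3)` is the diamond-shaped 4-disc of radius 3 and `disc(d8, (0, 0), 3)` is the 7 x 7 square 8-disc of radius 3.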
FIGURE 10. 8-disc centered at $p$ and of radius 3: $\Delta_8(p, 3)$.

Combining the 8-neighborhood with the knight-neighborhood, thus forming the 16-neighborhood with unit move lengths, does not yield a distance (see Remark IV.16, later). These neighborhoods are thus not detailed further at this stage.

With the aim of improving simplicity and accuracy in the approximation of Euclidean distance on the square lattice, chamfer distances have been introduced as a generalization of the previous definitions [Borgefors, 1984; Borgefors, 1986]. In chamfer discrete distances, moves are given different lengths depending on some criteria. Chamfer distances have been intensively studied for developing image processing operators. The generic definition of a chamfer distance is given as follows.

Definition IV.12 Chamfer Distance. Given a neighborhood and associated move lengths, the chamfer distance between $p$ and $q$ relative to this neighborhood is the length of the shortest digital arc from $p$ to $q$.
A chamfer distance is relative to a neighborhood associated with move lengths. The cases of the neighborhoods presented in Subsection III.A are successively detailed. Starting with the 4-neighborhood, the length of a 4-move is noted $a$. In this respect, a 4-move is also called an a-move. Clearly, in the 4-neighborhood all moves are equivalent by symmetry or rotation. In this case, the only possible definition of a discrete distance that is geometrically consistent is that of the $d_4$ distance, where $a = 1$.

A simple extension of the 4-neighborhood leads to the 8-neighborhood. Diagonal moves are added to the horizontal and vertical moves. The length of such diagonal moves is noted $b$. In this respect, diagonal moves are called b-moves and the chamfer distance obtained in the 8-neighborhood is noted $d_{a,b}$. Given any positive value for $a$ (i.e., the length for all 4-moves), in order to preserve a geometrical consistency within the 8-neighborhood the diagonal moves should be associated with a length $b$ larger than $a$. In this context, the most natural value is $b = a\sqrt{2}$, since it allows for an exact value of the chamfer distance along the diagonal lines from a given point. However, for the sake of simplicity of computation and storage, it is also important to preserve integer arithmetic for distance calculations. In this respect, integer values for $a$ and $b$ have been derived (e.g., see [Borgefors, 1984; Borgefors, 1986; Hilditch and Rutovitz, 1969; Montanari, 1968]). The most commonly used set of such values is ($a = 3$, $b = 4$) [Borgefors, 1984; Borgefors, 1986].

Referring to Subsection III.A, a further extension defines the 16-neighborhood. The knight-move is introduced and its length is noted $c$ (thus defining a c-move). The chamfer distance obtained in the 16-neighborhood is noted $d_{a,b,c}$. Assuming that $a = 1$ and $b = \sqrt{2}$, the value $c = \sqrt{5}$ allows for an exact chamfer distance value along the lines that support the c-moves. For preserving integer calculations of chamfer distances, the lengths of the moves included in the 16-neighborhood $(1, \sqrt{2}, \sqrt{5})$ are commonly approximated by using the set of integer values ($a = 5$, $b = 7$, $c = 11$) [Borgefors, 1984; Borgefors, 1986].

Whether a chamfer distance satisfies the metric conditions given in Definition IV.1 depends on the values of the move lengths. Hence, restrictions on these values have been set for chamfer distances to satisfy the metric conditions.
Proposition IV.13. The conditions on $a$ and $b$ for $d_{a,b}$ to be a discrete distance are

$0 < a \leq b \leq 2a$

The typical values $a = 3$ and $b = 4$ satisfy these conditions and thus $d_{3,4}$ is a distance in the 8-neighborhood. In this case, the value of the diagonal move length $\sqrt{2}$ is approximated by $4/3$.
Remark IV.14. Note that the values $a = b = 1$ used for the definition of $d_8$ satisfy the conditions given in Proposition IV.13. Therefore, $d_4$ and $d_8$ can be seen as particular cases of chamfer distances in the 4- and 8-neighborhoods, respectively.

Similar conditions can be expressed in the 16-neighborhood for $d_{a,b,c}$ to be a discrete distance.
Proposition IV.15. The values of $a$, $b$, and $c$ should satisfy the following conditions for $d_{a,b,c}$ to be a distance on the 16-neighborhood:

$0 < a \leq b \leq 2a \leq c$ and $c \leq a + b$ and $3b \leq 2c$

Again, the typical values for the move lengths $a = 5$, $b = 7$, and $c = 11$ satisfy the above conditions. Therefore, $d_{5,7,11}$ is a distance. In this case, the diagonal move length $\sqrt{2}$ is approximated by $7/5 = 1.4$, and the knight-move length $\sqrt{5}$ is approximated by $11/5 = 2.2$.
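The conditions of Propositions IV.13 and IV.15 are simple inequalities and can be tested directly for candidate move lengths. The sketch below is illustrative only; the function names are hypothetical and not taken from the text.

```python
def is_chamfer_distance_8(a, b):
    """Conditions of Proposition IV.13 for d_{a,b} in the 8-neighborhood."""
    return 0 < a <= b <= 2 * a

def is_chamfer_distance_16(a, b, c):
    """Conditions of Proposition IV.15 for d_{a,b,c} in the 16-neighborhood."""
    return (0 < a <= b <= 2 * a <= c) and (c <= a + b) and (3 * b <= 2 * c)
```

With these helpers, `is_chamfer_distance_8(3, 4)` and `is_chamfer_distance_16(5, 7, 11)` both hold, whereas unit move lengths `(1, 1, 1)` fail the 16-neighborhood conditions because $2a \leq c$ is violated.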
Remark IV.16. The values $a = b = c = 1$ do not satisfy the conditions given in Proposition IV.15. Therefore, as mentioned earlier, an extension of $d_8$ in the 16-neighborhood by setting all move lengths to unity is not possible.

Chamfer discs are presented in Figure 11. Typically, the convex hull of a chamfer disc in the 8-neighborhood is an octagon which approximates the Euclidean circle, depending on the values of $a$ and $b$. More generally, a chamfer disc is a polygon with as many sides as there are different moves in the neighborhood on which the chamfer distance is defined.

FIGURE 11. Chamfer discs. (A) $\Delta_{3,4}(27)$. (B) $\Delta_{a,b}$. (C) $\Delta_{a,b,c}$.
The definition of chamfer distances readily suggests further extensions of the neighborhoods. This procedure makes use of Farey sequences to define extra basic moves (e.g., see [Montanari, 1968]). Conditions on the lengths of these moves can be developed analytically [Thiel and Montanvert, 1992].

By analogy with the definition of a discrete distance, it is clear that the discrete distance value between two pixels is the length of the shortest path between the two corresponding vertices in the grid graph [Harary, Melter, and Tomescu, 1984; Montanari, 1968; Montanari, 1970; Morris, de Jersey Lee, and Constantinides, 1986; Sharaiha and Christofides, 1994]. Moreover, the properties of the shortest path justify the fact that such a length defines a distance. The definition of the grid graph allows for the use of shortest path algorithms, since it is a sparse graph with typically small positive arc lengths (see Proposition III.16). Finally, using such an approach one can take advantage of byproducts arising from such algorithms.

Definition IV.17 Shortest Path Base Graph. Given the grid graph $G = (V, A)$ with arc lengths and two vertices $u \in V$ and $v \in V$, the shortest path base graph associated with the vertices $u$ and $v$ is the subgraph $SPBG(u, v)$ of $G$ formed by all possible shortest paths from $u$ to $v$.

The notation for the shortest path base graph will include the dependency on the neighborhood relationship considered with an index $k$ (i.e., $SPBG_k$) corresponding to that neighborhood space (e.g., $k = 4, 8, 16$). Typical properties of a shortest path base graph in the 8-neighborhood space are given in Example IV.18.

Example IV.18 Shortest Path Base Graph in the 8-Neighborhood Space ($SPBG_8$). Consider the complete 8-grid graph $G = (V, A)$ presented in Figure 12(A). Given the two vertices $u \in V$ and $v \in V$, the shortest path base graph $SPBG_8(u, v)$ is shown as bold lines in Figure 12(B).

FIGURE 12. (A) 8-neighborhood complete grid graph. (B) Shortest path base graph $SPBG_8(u, v)$.
Montanari [1968] has proved that there exists a shortest path in the complete 8-grid graph between any two vertices $u, v \in V$ that consists of only two straight segments, one horizontal (or vertical) and one diagonal. Thus it is clear that any shortest path in the complete 8-grid graph will be composed of, at most, two basic directions. Hence, the shortest path base graph $SPBG_8(u, v)$ is included in a parallelogram shape as shown in Figure 12(B).
Definition IV.19 Number of Moves in a Path. In the 8-grid graph, arcs correspond to either a- or b-moves. In this respect, given an 8-path $P_{uv}$ between two vertices $u$ and $v$, $k_a(u, v)$ (respectively, $k_b(u, v)$) denotes the number of arcs corresponding to a-moves (respectively, b-moves) in $P_{uv}$. The length of $P_{uv}$ is thus given by $l(P_{uv}) = a \cdot k_a(u, v) + b \cdot k_b(u, v)$. Similarly, in the 16-grid graph, the length of a 16-path $P_{uv}$ is given by $l(P_{uv}) = a \cdot k_a(u, v) + b \cdot k_b(u, v) + c \cdot k_c(u, v)$, where $k_c(u, v)$ is the number of arcs corresponding to c-moves in $P_{uv}$.
The 8-grid graph considered now is that shown previously in Figure 8 and described in Example III.19. In contrast with a complete grid graph, it is the grid graph of a bounded connected component. Figure 13(A) shows the shortest path spanning tree rooted at $u_0$ obtained with arc lengths $a = 3$ and $b = 4$. In such an 8-grid graph, the previous description of a shortest path is not always valid, since the shortest path between two vertices may be constrained by the border of the component. Figure 13(B) shows such a shortest path of length $l(P_{u_0 u_2}) = 31$ between $u_0$ and $u_2$. Clearly, this shortest path in the grid graph corresponds to an 8-digital arc. Moreover, it is the only possible shortest path between $u_0$ and $u_2$ in the grid graph.

FIGURE 13. (A) Shortest path spanning tree in the grid graph shown in Figure 8(A). (B) An example of shortest path.
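Shortest path lengths in the 8-grid graph of a bounded component can be sketched as follows. This is an illustration only: the text does not prescribe a particular shortest path algorithm, so Dijkstra's algorithm is used here as one possible choice, and the image encoding (a list of strings with '1' for foreground pixels) and the function name are assumptions made for this example. The predecessor map it returns describes a shortest path spanning tree rooted at the source.

```python
import heapq

def chamfer_distances(image, source, a=3, b=4):
    """Single-source shortest path lengths in the 8-grid graph of `image`.

    `image` is a list of equal-length strings; '1' marks a foreground pixel
    (a vertex). Arcs join 8-neighboring foreground pixels, with length `a`
    for horizontal/vertical moves and `b` for diagonal moves.
    Returns (dist, pred): path lengths and the shortest path tree.
    """
    rows, cols = len(image), len(image[0])
    moves = [(-1, 0, a), (1, 0, a), (0, -1, a), (0, 1, a),
             (-1, -1, b), (-1, 1, b), (1, -1, b), (1, 1, b)]
    dist = {source: 0}
    pred = {source: None}
    heap = [(0, source)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale heap entry
        for dr, dc, w in moves:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and image[nr][nc] == '1':
                nd = d + w
                if nd < dist.get((nr, nc), float('inf')):
                    dist[(nr, nc)] = nd
                    pred[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist, pred
```

On a hypothetical U-shaped component such as `["111", "101", "101"]`, the path between the two lower tips is forced around the border, so its length exceeds the chamfer distance that the same two pixels would have in the complete grid graph.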
We now take a closer look at shortest paths in the 16-neighborhood space. Our aim is to highlight the advantages of this neighborhood in some applications compared to the commonly used 8-neighborhood. Shortest paths are first characterized and their properties further detailed. Results concerning the comparison of shortest path lengths and cardinalities in the 8- and 16-neighborhood spaces are developed in Subsection IV.A.2 to emphasize the need for such a study.

Given two vertices $u$ and $v$, we define $SP_{16}(u, v)$ as the 16-shortest path between these two vertices. Let $(x_u, y_u)$ and $(x_v, y_v)$ be the coordinates of the vertices $u$ and $v$ in the real plane. The origin is arbitrary. The 16-shortest path $SP_{16}(u, v)$ can be defined by the respective number of moves taken in the three directions on the grid graph. Let $k_a(u, v)$, $k_b(u, v)$, and $k_c(u, v)$ be the numbers of arcs corresponding to a-moves, b-moves, and c-moves, respectively, on the 16-shortest path from $u$ to $v$.
Proposition IV.20. $SP_{16}(u, v)$ is composed of arcs corresponding to, at most, two types of moves.

Proof: Given two vertices $u$ and $v$, let $\sigma$ be the slope of $[u, v]$. Without loss of generality, we only consider the case of the first octant ($0 \leq \sigma \leq 1$) as shown in Figure 14.

• $\sigma = 0$: $k_a(u, v) = |x_v - x_u|$; $k_b(u, v) = 0$; $k_c(u, v) = 0$.
• $0 < \sigma < \frac{1}{2}$: $|x_v - x_u| > 2|y_v - y_u|$, then: $k_c(u, v) = |y_v - y_u|$; $k_a(u, v) = |x_v - x_u| - 2k_c(u, v)$; $k_b(u, v) = 0$.
• $\sigma = \frac{1}{2}$: $k_a(u, v) = 0$; $k_b(u, v) = 0$; $k_c(u, v) = |y_v - y_u|$.
• $\frac{1}{2} < \sigma < 1$: $2|y_v - y_u| > |x_v - x_u| > |y_v - y_u|$, then: $k_c(u, v) = |x_v - x_u| - |y_v - y_u|$; $k_b(u, v) = |y_v - y_u| - k_c(u, v)$; $k_a(u, v) = 0$.
• $\sigma = 1$: $k_a(u, v) = 0$; $k_b(u, v) = |x_v - x_u|$; $k_c(u, v) = 0$.

FIGURE 14. Illustration of the shortest paths in the first quadrant of $\mathbb{R}^2$. (Dashed lines: borders of the shortest path base graphs.)
The other cases are equivalent by symmetry.

Proposition IV.20 can be seen as an extension of Montanari's [1968] characterization of the (8-)shortest path. More specifically, the possible combinations of moves on $SP_{16}(u, v)$ are: a-, b-, or c-moves occurring singly, a- and c-moves, or b- and c-moves. The combination of a- and b-moves never occurs, since the condition $a + b \geq c$ has to be satisfied. The shortest path base graph $SPBG_{16}(u, v)$ is the subgraph formed by all possible 16-shortest paths from $u$ to $v$. An example is given in Figure 15, where the two moves are a and c.
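The case analysis in the proof of Proposition IV.20 translates into a closed-form computation of the move counts. The sketch below is illustrative; the function names are hypothetical, the default move lengths are the integer values (5, 7, 11) quoted earlier, and displacements outside the first octant are folded into it by the symmetry argument of the proof.

```python
def move_counts_16(u, v):
    """Return (k_a, k_b, k_c) on a 16-shortest path from u to v.

    Follows the cases of Proposition IV.20; other octants are reduced to
    the first octant by taking absolute values and swapping the axes.
    """
    dx, dy = abs(v[0] - u[0]), abs(v[1] - u[1])
    if dy > dx:
        dx, dy = dy, dx                  # symmetry about the diagonal
    if dx >= 2 * dy:                     # slope in [0, 1/2]: a- and c-moves
        return dx - 2 * dy, 0, dy
    return 0, 2 * dy - dx, dx - dy       # slope in (1/2, 1]: b- and c-moves

def chamfer_16(u, v, a=5, b=7, c=11):
    """Length of the 16-shortest path, a*k_a + b*k_b + c*k_c."""
    ka, kb, kc = move_counts_16(u, v)
    return a * ka + b * kb + c * kc
```

For example, the displacement (7, 2) uses three a-moves and two c-moves, while (3, 2) uses one b-move and one c-move; in neither case do a- and b-moves occur together.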
2. Shortest Path Cardinality
In this subsection we compare the cardinality of the 16-shortest path $SP_{16}$ using $d_{a,b,c}$ with that of the 8-shortest path $SP_8$ using $d_{a,b}$. We assume that the values of $a$ and $b$ are the same for the 8-neighborhood and the 16-neighborhood, and that $a$, $b$, and $c$ satisfy the conditions defined in the previous sections. Throughout, we take $V$ to be the set of all pixels on the unit square grid.

FIGURE 15. Shortest path base graph $SPBG_{16}(u, v)$.
Proposition IV.21. $|SP_{16}(u, v)| \leq |SP_8(u, v)|$, $\forall u, v \in V$.

Proof: Without loss of generality, we only consider the case of the first octant ($0 \leq \sigma \leq 1$).

• $\sigma = 0, 1$: $|SP_{16}(u, v)| = |SP_8(u, v)| = \max(|x_v - x_u|, |y_v - y_u|)$.
• $\sigma = \frac{1}{2}$: $|SP_{16}(u, v)| = \min(|x_v - x_u|, |y_v - y_u|)$. Now, $\min(|x_v - x_u|, |y_v - y_u|) < \max(|x_v - x_u|, |y_v - y_u|)$, and $\max(|x_v - x_u|, |y_v - y_u|) = |SP_8(u, v)|$.
• $0 < \sigma < \frac{1}{2}$: $|SP_{16}(u, v)| = k_a(u, v) + k_c(u, v) = |x_v - x_u| - |y_v - y_u|$. Now, $|x_v - x_u| - |y_v - y_u| < |x_v - x_u|$, and $|x_v - x_u| = |SP_8(u, v)|$.
• $\frac{1}{2} < \sigma < 1$: $|SP_{16}(u, v)| = k_b(u, v) + k_c(u, v) = |y_v - y_u|$. Now, $|y_v - y_u| < |x_v - x_u|$, and $|x_v - x_u| = |SP_8(u, v)|$.

Therefore, the storage of a 16-shortest path between two pixels $u$ and $v$ can be, at worst, equal to that of an 8-shortest path ($\sigma \in \{-1, 0, 1, \infty\}$) and, at best, half that of the 8-shortest path ($\sigma \in \{-\frac{1}{2}, \frac{1}{2}\}$). A similar proposition holds for the length of the shortest path.

Proposition IV.22. $l(SP_{16}(u, v)) \leq l(SP_8(u, v))$, $\forall u, v \in V$.
Proof. We only consider the case of the first octant ($0 \leq \sigma \leq 1$).

• $\sigma = 0, 1$: $l(SP_8(u, v)) = l(SP_{16}(u, v)) = a \cdot k_a(u, v) + b \cdot k_b(u, v)$.

• $\sigma = \frac{1}{2}$:
$l(SP_{16}(u, v)) = c \cdot |y_v - y_u|$
$l(SP_8(u, v)) = a \cdot (|x_v - x_u| - |y_v - y_u|) + b \cdot |y_v - y_u|$
Since $|x_v - x_u| = 2|y_v - y_u|$, then $l(SP_8(u, v)) = (a + b) \cdot |y_v - y_u|$. Also, since $a + b \geq c$, then $l(SP_8(u, v)) \geq c \cdot |y_v - y_u| = l(SP_{16}(u, v))$.

• $0 < \sigma < \frac{1}{2}$:
$l(SP_{16}(u, v)) = c \cdot |y_v - y_u| + a \cdot (|x_v - x_u| - 2|y_v - y_u|)$
$l(SP_8(u, v)) = a \cdot (|x_v - x_u| - |y_v - y_u|) + b \cdot |y_v - y_u|$
$l(SP_8(u, v)) - l(SP_{16}(u, v)) = (b + a - c) \cdot |y_v - y_u|$
Since $a + b \geq c$, then $l(SP_8(u, v)) - l(SP_{16}(u, v)) \geq 0 \Rightarrow l(SP_8(u, v)) \geq l(SP_{16}(u, v))$.

• $\frac{1}{2} < \sigma < 1$:
$l(SP_{16}(u, v)) = c \cdot (|x_v - x_u| - |y_v - y_u|) + b \cdot (2|y_v - y_u| - |x_v - x_u|)$
$l(SP_8(u, v)) = a \cdot (|x_v - x_u| - |y_v - y_u|) + b \cdot |y_v - y_u|$
$l(SP_8(u, v)) - l(SP_{16}(u, v)) = (b + a - c) \cdot (|x_v - x_u| - |y_v - y_u|)$
Since $a + b \geq c$ and $|x_v - x_u| > |y_v - y_u|$, then $l(SP_8(u, v)) - l(SP_{16}(u, v)) \geq 0 \Rightarrow l(SP_8(u, v)) \geq l(SP_{16}(u, v))$.

Remark IV.23. The need for the condition $a + b \geq c$ is emphasized by the above result.
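As a worked illustration of Propositions IV.21 and IV.22, consider a case chosen here purely as an example: $u = (0, 0)$, $v = (7, 2)$, with the integer move lengths $(a, b, c) = (5, 7, 11)$ quoted earlier, so that $0 < \sigma = 2/7 < \frac{1}{2}$.

```latex
\begin{align*}
SP_8(u,v):\quad & k_a = 7 - 2 = 5,\; k_b = 2,
  & |SP_8(u,v)| &= 7, & l(SP_8(u,v)) &= 5\cdot 5 + 7\cdot 2 = 39,\\
SP_{16}(u,v):\quad & k_a = 7 - 2\cdot 2 = 3,\; k_c = 2,
  & |SP_{16}(u,v)| &= 5, & l(SP_{16}(u,v)) &= 5\cdot 3 + 11\cdot 2 = 37,\\
& & & & l(SP_8) - l(SP_{16}) &= (a + b - c)\,|y_v - y_u| = 1\cdot 2 = 2 .
\end{align*}
```

Dividing the lengths by $a = 5$ gives 7.8 for the 8-neighborhood and 7.4 for the 16-neighborhood, to be compared with the Euclidean value $\sqrt{7^2 + 2^2} = \sqrt{53} \approx 7.28$.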
The preceding result highlights the performance of discrete distances based on the 16-neighborhood for the approximation of Euclidean distances. The addition of the knight-move allows for more flexibility and thus more precision when approximating real distances. Moreover, the relation between discrete distances in the extended neighborhood and Euclidean distances is given in the next subsection.
3. Relation with Euclidean Distance

Given $k_a(u, v)$, $k_b(u, v)$, and $k_c(u, v)$ on the 16-shortest path between $u$ and $v$, the Euclidean distance between $u$ and $v$ is given by the following proposition.

Proposition IV.24.

$d_E(u, v) = \sqrt{(k_a(u, v) + k_b(u, v) + 2k_c(u, v))^2 + (k_b(u, v) + k_c(u, v))^2} \quad \forall u, v \in V$

Proof: With the aid of Figure 14, we consider the following cases for $\sigma$:

• $\sigma = 0$: $d_E(u, v) = k_a(u, v)$, $k_b(u, v) = 0$, $k_c(u, v) = 0$.
• $0 < \sigma < \frac{1}{2}$: $d_E(u, v) = \sqrt{(k_a(u, v) + 2k_c(u, v))^2 + k_c(u, v)^2}$, $k_b(u, v) = 0$.
• $\sigma = \frac{1}{2}$: $d_E(u, v) = \sqrt{4k_c(u, v)^2 + k_c(u, v)^2}$, $k_a(u, v) = 0$, $k_b(u, v) = 0$.
• $\frac{1}{2} < \sigma < 1$: $d_E(u, v) = \sqrt{(k_b(u, v) + 2k_c(u, v))^2 + (k_b(u, v) + k_c(u, v))^2}$, $k_a(u, v) = 0$.

The other cases are equivalent by symmetry. Therefore, Proposition IV.24 holds.
Remark IV.25. A similar formula is readily obtained for the case of the 8-neighborhood space. In this case,

$d_E(u, v) = \sqrt{(k_a(u, v) + k_b(u, v))^2 + k_b(u, v)^2} \quad \forall u, v \in V$
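The formula of Proposition IV.24 (and, with $k_c = 0$, that of Remark IV.25) can be evaluated directly from the move counts, which gives a simple way of comparing a chamfer length with the Euclidean distance. In the sketch below, the function names are hypothetical, and dividing the chamfer length by $a$ (so that an a-move has unit length) is a normalization assumed for this illustration rather than one stated in the text.

```python
import math

def euclidean_from_moves(ka, kb, kc=0):
    """Euclidean distance recovered from move counts (Proposition IV.24).

    With kc = 0 this reduces to the 8-neighborhood formula of Remark IV.25.
    """
    return math.hypot(ka + kb + 2 * kc, kb + kc)

def chamfer_relative_error(ka, kb, kc, a=5, b=7, c=11):
    """Relative error of the chamfer length, scaled by 1/a, against d_E.

    Assumes at least one move (u != v), so that d_E is nonzero.
    """
    chamfer = (a * ka + b * kb + c * kc) / a
    euclid = euclidean_from_moves(ka, kb, kc)
    return (chamfer - euclid) / euclid
```

For the move counts $(k_a, k_b, k_c) = (3, 0, 2)$, which correspond to the displacement (7, 2), the scaled chamfer length 7.4 overestimates $\sqrt{53}$ by roughly 1.6 percent.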
Such a result allows for further developments in characterizing analytically the errors made when using discrete distances compared to continuous distances (see, e.g., [Marchand-Maillet and Sharaiha, 1997]). As a byproduct, it allows for the characterization of optimal values of move lengths (e.g., $a$, $b$, $c$) regarding this criterion.

B. Discrete Convexity

Results presented in this subsection summarize the development of definitions and characterizations of discrete convexity. Most of these results are presented in great detail in [Chassery, 1983; Kim, 1981; Kim, 1982; Kim and Rosenfeld, 1982; Kim and Sklansky, 1982; Ronse, 1985a; Ronse, 1986; Ronse, 1989]. We adopt the notation in [Ronse, 1985a] for introducing discrete convexity.
Definition IV.26 Notation for Discrete Convexity. Given a set of discrete points $P$, the cardinality of $P$ is the number of discrete points included in $P$ and is noted $|P|$. Given a set of discrete points $P = \{p_i\}$, $\langle P \rangle$ is the set of discrete points contained in $[P]$, the continuous convex hull of $P$.

If $P$ contains a finite number of discrete points (i.e., $n = |P|$ is finite), then $P$ can be written as $P = \{p_0, p_1, \ldots, p_n\}$. In this case, $\langle P \rangle$ can be equivalently written as $\langle p_0, p_1, \ldots, p_n \rangle$. A continuous set $S$ is convex if and only if $[S] = S$. A similar characterization of discrete convexity via the continuous convex hull of $P$ can be formulated as follows.

Proposition IV.27. A set of discrete points $P$ is discrete convex if and only if any discrete point contained in the convex hull of $P$ belongs to $P$. In short, $P$ is discrete convex if and only if $\langle P \rangle = P$. See [Chassery, 1983; Kim, 1981; Kim and Rosenfeld, 1982].
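The characterization $\langle P \rangle = P$ suggests a direct, if naive, test: build the continuous convex hull of $P$ and verify that every lattice point it contains is in $P$. The following sketch assumes integer coordinates and uses Andrew's monotone chain for the hull, which is a choice made here for illustration and not a method taken from the text; the function names are likewise hypothetical.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Hull vertices in counterclockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def is_discrete_convex(points):
    """True if every lattice point of the convex hull of P belongs to P."""
    P = set(points)
    if not P:
        return True
    hull = convex_hull(P)
    n = len(hull)
    xs = [x for x, _ in P]
    ys = [y for _, y in P]
    # Only lattice points of the bounding box can lie in the hull.
    for x in range(min(xs), max(xs) + 1):
        for y in range(min(ys), max(ys) + 1):
            inside = all(cross(hull[i], hull[(i + 1) % n], (x, y)) >= 0
                         for i in range(n))
            if inside and (x, y) not in P:
                return False
    return True
```

For instance, the three points (0, 0), (1, 0), (2, 1) form a discrete convex set, whereas an L-shaped set such as (0, 0), (1, 0), (2, 0), (0, 1), (0, 2) does not, since its hull contains the lattice point (1, 1).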
Figures 16(A) and 16(B) display the resulting characterization of discrete convexity. Moreover, the condition of Proposition IV.27 holds if and only if, for any discrete points $p$, $q$, and $r$ in $P$, $\langle p, q, r \rangle \subseteq P$; for a set of discrete points, such a property is called triangle-convexity (T-convexity) [Ronse, 1985a]. Several alternative characterizations of discrete convexity exist. Under certain conditions (mostly simple connectivity), they can be proved to be equivalent to one another. Figure 17 gives a summary of such developments. A geometrical characterization of discrete convexity similar to that of continuous convexity can now be formulated.

Proposition IV.28. A set of discrete points $P$ is discrete convex if and only if any point $p_i \in P$ is connected to any other point $p_j \in P$ by an 8-digital straight segment whose points belong to $P$. See [Kim and Rosenfeld, 1982].

Such a characterization gives a first insight as to the definition of discrete straightness. When applied to digital arcs, Proposition IV.28 reduces to the following.

Proposition IV.29. A digital arc is a digital straight segment if and only if it is discrete convex. See [Kim and Rosenfeld, 1982].

FIGURE 16. A characterization of discrete convexity.
FIGURE 17. Equivalence between characterizations of discrete convexity. (Diagram relating cellular convexity [Kim, 1981; Kim and Rosenfeld, 1982], L-convexity [Ronse, 1985a], T-convexity (Proposition IV.27 and [Ronse, 1985a]), 4-convexity [Ronse, 1985a], and 8-convexity [Ronse, 1985a], with links conditioned on simple 8-connectivity or 4-connectivity.)
Subsection C (following) will detail a formal characterization of discrete straightness. In Section V we aim to extend these results to the 16-neighborhood space.

C. Discrete Straightness
In the discrete space, straightness is referred to as discrete straightness. The following introductory definition for this concept is given.

Definition IV.30 Digital Straight Segment. A discrete set of points is a digital straight segment if it is the digitization of at least one continuous straight segment.

Definition IV.30 only takes its full meaning when the digitization scheme is defined. The most commonly used digitization scheme is the grid-intersect quantization [Freeman, 1970] and is presented here through the following example. Consider the continuous segment $[\alpha, \beta]$ and the square lattice shown in Figure 18. The intersection points between $[\alpha, \beta]$ and the lattice lines are mapped to their nearest integer points. In case of a tie, the discrete point which is locally at the right of $[\alpha, \beta]$ is selected ($[\alpha, \beta]$ is oriented from $\alpha$ to $\beta$).

FIGURE 18. Grid-intersect quantization.

The set of discrete points $\{p_i\}_{i=0,\ldots,n}$ resulting from the digitization of a continuous segment $[\alpha, \beta]$ is called the digitization set of $[\alpha, \beta]$. It can then be shown that the grid-intersect quantization of a continuous straight segment is an 8-digital arc [Rosenfeld, 1974]. Different approaches have been taken to characterize discrete straightness in the 8-neighborhood space. Clearly, the minimum requirement for a set of discrete points to form a digital straight segment is that this set forms a digital arc as defined by Definition III.2.
1. Freeman's Codes

The first characterization of a digital straight segment was given by Freeman [1970; 1974]. This characterization is descriptive and makes use of codes that are defined for all possible moves in the 8-neighborhood (see Subsection III.A). The particular structure of a sequence of such codes (i.e., the chain-code) is then used to characterize discrete straightness (Proposition IV.33).

Definition IV.31 Freeman's Codes and Chain-Code. All possible moves in the 8-neighborhood are numbered successively counterclockwise from 0 to 7, as shown in Figure 19. The encoding $\{c_i\}_{i=1,\ldots,n}$ ($c_i \in \{0, 1, \ldots, 7\}$) of a given sequence of 8-moves defined by the discrete points $\{p_i\}_{i=0,\ldots,n}$ is called the chain-code of this sequence.

FIGURE 19. Freeman's codes in the 8-neighborhood.

Example IV.32 Chain-Code. The chain-code of the 8-move sequence depicted in Figure 20 is (0, 0, 1, 3, 0, 0, 0, 6, 7, 0, 2, 2, 2, 4, 4, 4, 4, 4, 4, 4, 6, 6).
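The chain-code of a sequence of 8-moves can be computed as sketched below. Since Figure 19 is not reproduced here, the sketch assumes the usual convention in which code 0 is the move (+1, 0) and the y-axis points upward, with codes then increasing counterclockwise; the names `FREEMAN_MOVES`, `chain_code`, and `decode` are hypothetical.

```python
# Assumed convention: code 0 is the move (+1, 0), y-axis pointing upward,
# codes numbered counterclockwise from 0 to 7.
FREEMAN_MOVES = [(1, 0), (1, 1), (0, 1), (-1, 1),
                 (-1, 0), (-1, -1), (0, -1), (1, -1)]
CODE_OF_MOVE = {move: code for code, move in enumerate(FREEMAN_MOVES)}

def chain_code(points):
    """Chain-code of the 8-move sequence defined by consecutive points."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        move = (x1 - x0, y1 - y0)
        if move not in CODE_OF_MOVE:
            raise ValueError(f"{(x0, y0)} -> {(x1, y1)} is not an 8-move")
        codes.append(CODE_OF_MOVE[move])
    return codes

def decode(start, codes):
    """Rebuild the sequence of points from a start point and a chain-code."""
    points = [start]
    for code in codes:
        dx, dy = FREEMAN_MOVES[code]
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points
```

A sequence of $n + 1$ points thus yields $n$ codes, and `decode(points[0], chain_code(points))` returns the original sequence.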
The characterization of a digital straight segment using the chain-code is formulated as in Proposition IV.33. Note that since the grid-intersect quantization of a continuous straight segment is an 8-digital arc, Proposition IV.33 below assumes that the 8-move sequence considered forms an 8-digital arc.

Proposition IV.33. An 8-digital arc is a digital straight segment if and only if its chain-code satisfies the following conditions (see [Freeman, 1970]):
(i) At most two types of codes can be present, and these can differ only by unity, modulo eight.
(ii) One of the two code values always occurs singly.
(iii) Successive occurrences of the code occurring singly are as uniformly spaced as possible.
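Conditions (i) and (ii) of Proposition IV.33 are easy to test on a chain-code and already reject many non-straight arcs. The sketch below checks only these two conditions; it deliberately leaves out condition (iii), whose precise formulation is not given here, so a positive answer is necessary but not sufficient for straightness. The function name is hypothetical.

```python
def satisfies_conditions_i_and_ii(codes):
    """Check conditions (i) and (ii) of Proposition IV.33 on a chain-code.

    Condition (iii) (uniform spacing) is NOT checked by this sketch.
    """
    kinds = sorted(set(codes))
    if len(kinds) <= 1:
        return True                      # a single direction (or empty code)
    if len(kinds) > 2:
        return False                     # violates (i): more than two codes
    c0, c1 = kinds
    if (c1 - c0) % 8 not in (1, 7):
        return False                     # violates (i): codes not adjacent mod 8

    def occurs_singly(c):
        # c never appears twice in a row
        return all(not (x == c and y == c) for x, y in zip(codes, codes[1:]))

    return occurs_singly(c0) or occurs_singly(c1)   # condition (ii)
```

Applied to the chain-code of Example IV.32, the test fails at condition (i), since more than two code values are present.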
Algorithms that test for the straightness of a digital arc can be derived from adapted versions of this proposition. They are based on different rules derived from the conditions given in Proposition IV.33 (e.g., [Freeman, 1970; Hung, 1985; Rosenfeld, 1974]).

2. Chord Properties
This subsection introduces a different class of characterization for discrete straightness called chord properties. Originally proposed by Rosenfeld [1974], the chord property (Proposition IV.34) remains one of the major results in discrete geometry. Variations and generalizations of the original characterization have been proposed and are also detailed in this subsection. Proposition IV.34 first introduces the chord property as originally formulated by Rosenfeld [1974].

FIGURE 20. An example of the use of Freeman's code.

Proposition IV.34. An 8-digital arc $P_{pq} = \{p_i\}_{i=0,\ldots,n}$ satisfies the chord property if and only if, for any two discrete points $p_i$ and $p_j$ in $P_{pq}$ and for any real point $\alpha$ on the continuous segment $[p_i, p_j]$, there exists a discrete point $p_k \in P_{pq}$ such that $d_8(\alpha, p_k) < 1$.
Remark IV.35. In Proposition IV.34, the definition of the (Chessboard) d , distance is extended to real points via its analytical characterization given by Proposition IV.10 (i.e., d,(u, 8) = max((x, - xzI, ly, - y,l) for any u = ( x ~y,) , and B = (xs, ys) in 88’). Geometrically, the chord property and the resulting visibility polygon can be illustrated by Figure 21. Given the digital arc P,,, the shaded polygon in Figure 21 illustrates the set of points u E R2 such that there exists a discrete point pk E P,, such that d,(u, p p ) < 1 (i.e., the visibility polygon is the union of 8-discs of unit radii centered at every discrete point pk of P,,). From Proposition IV.34, P,, satisfies the chord property if and only if the continuous segment [pi,pj] is totally contained in this area for any i and j in {O,. . . , n } . The chord property can therefore be reformulated as follows: “An 8-digital arc P,, = { P ~ J ~ = ~ ,satisfies . . . , ~ the chord property if and only if any point pi is visible from any other point p j within the visibility polygon defined by { a € R 2such that d,@,,u) < 1 for all k = 0,. . . ,n).” Figure 22 illustrates an instance where the conditions for the chord property are not satisfied. In this example, it is clear that u E [ p , , p s ] is such that d,(pk,u) 2 1 for any k = O,.. . , n . In other words, a is outside the visibility polygon and p 1 is not visible from ps (and conversely) within the visibility polygon.
,
,
....,
,..
.
....
.
.
”
....
.
.. .
.
,
,.... :. , .
.
,
~ . . .
I ,
,
I
,
”
,
,
/
.:__ L. . ,
.
.
.,
,!
, .I_.
.
.
.,.
i.. , , .I
FIGURE 21. Example of the validity of the chord property.
STEPHANE MARCHAND-MAILLET
FIGURE 22. Example for the violation of the chord property.
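The chord property of Proposition IV.34 can be checked mechanically. The following is a minimal sketch, not Rosenfeld's procedure: it assumes integer pixel coordinates and, for each pair $(p_i, p_j)$, computes exactly (with rational arithmetic) the open parameter intervals on which a point of $[p_i, p_j]$ is at $d_8 < 1$ from some $p_k$, then verifies that these intervals cover the whole segment. All function names are chosen for this illustration.

```python
from fractions import Fraction

def _t_interval(a0, da, c):
    """Open interval of t where |a0 + t*da - c| < 1, or None if it is empty."""
    if da == 0:
        # The coordinate is constant along the segment: all t or no t qualify.
        return (Fraction(-1), Fraction(2)) if abs(a0 - c) < 1 else None
    lo, hi = sorted((Fraction(c - a0 - 1, da), Fraction(c - a0 + 1, da)))
    return lo, hi

def segment_is_covered(pi, pj, arc):
    """True if every real point of [pi, pj] is at d_8 < 1 from some arc point."""
    dx, dy = pj[0] - pi[0], pj[1] - pi[1]
    intervals = []
    for xk, yk in arc:
        ix = _t_interval(pi[0], dx, xk)
        iy = _t_interval(pi[1], dy, yk)
        if ix is None or iy is None:
            continue
        lo, hi = max(ix[0], iy[0]), min(ix[1], iy[1])
        if lo < hi:
            intervals.append((lo, hi))
    # Sweep t from 0 to 1; each point must lie strictly inside some interval.
    t = Fraction(0)
    while t <= 1:
        reach = [hi for lo, hi in intervals if lo < t < hi]
        if not reach:
            return False
        t = max(reach)
    return True

def has_chord_property(arc):
    """Test Proposition IV.34 on a list of integer (x, y) pixels."""
    n = len(arc)
    return all(segment_is_covered(arc[i], arc[j], arc)
               for i in range(n) for j in range(i + 1, n))
```

For instance, `has_chord_property([(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)])` holds, whereas the bent arc `[(0, 0), (1, 0), (2, 0), (3, 0), (4, 1), (5, 2), (6, 3)]` fails because the point $(2, 1)$ of the chord $[p_0, p_6]$ is at $d_8 \geq 1$ from every pixel of the arc.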
Remark IV.36. If $P_{pq}$ does not satisfy the chord property, then there exist two points $p_i$ and $p_j$ in $P_{pq}$ such that $[p_i, p_j]$ touches the boundary of the visibility polygon associated with $P_{pq}$ in a single point $r$ (i.e., there exists $p_k \in P_{pq}$ such that $d_8(p_k, r) = 1$). In this case, $r$ is an integer point. This property is illustrated in Figure 23, where $P_{pq}$ is the same digital arc as in Figure 22, $i = 1$, $j = 7$, and $k = 3$ or $k = 4$.

The chord property is an essential result since it gives an analytical formulation of discrete straightness via Theorem IV.37.
Theorem IV.37. In the 8-digital space:
(i) The digitization of a straight line is a digital arc and has the chord property.
(ii) If a digital arc has the chord property, it is the digitization of a straight-line segment.
See [Rosenfeld, 1974].
The original proof of Theorem IV.37 can also be found in [Rosenfeld, 1974]. A simpler proof based on Santaló's theorem is given in [Ronse, 1985]. This result enables one to test the discrete straightness of an 8-digital arc without reference to any related continuous segment whose digitization would yield the digital arc in question. Moreover, the concept of visibility is important since it readily suggests a simple greedy algorithm which would successively test for the visibility of a point from other points in the digital arc under study (see Section VI).

FIGURE 23. A special case for the violation of the chord property.

DISCRETE GEOMETRY FOR IMAGE PROCESSING

Areas of visibility polygons defined in Proposition IV.34 are clearly not minimal. By definition, this polygon should be convex. The compact chord property aims for the reduction of visibility polygons by using the $d_4$ distance.
Proposition IV.38. An 8-digital arc $P_{pq} = \{p_i\}_{i=0,\ldots,n}$ satisfies the compact chord property if and only if, for any two distinct discrete points $p_i$ and $p_j$ in $P_{pq}$ and for any real point $\alpha$ on the continuous segment $[p_i, p_j]$, there exists a real point $\beta \in \mathbb{R}^2$ in the broken line $\bigcup_i [p_i, p_{i+1}]$ such that $d_4(\alpha, \beta) < 1$. See [Sharaiha and Garat, 1993].

Remark IV.39. In Proposition IV.38, the definition of the $d_4$ distance is extended to real points via its analytical characterization given by Proposition IV.8 (i.e., $d_4(\alpha, \beta) = |x_\beta - x_\alpha| + |y_\beta - y_\alpha|$ for all $\alpha = (x_\alpha, y_\alpha)$ and $\beta = (x_\beta, y_\beta)$ in $\mathbb{R}^2$).

The visibility polygon defined in the compact chord property is the set $\{\alpha \in \mathbb{R}^2$ such that $d_4(\beta, \alpha) < 1\}$, where $\beta \in \mathbb{R}^2$ is on a continuous segment $[p_i, p_{i+1}]$ for some $i = 0, \ldots, n-1$. It thus corresponds to a unit 4-disc swept along the broken line $\bigcup_i [p_i, p_{i+1}]$. Figure 24 illustrates the difference between visibility polygons induced by the chord and compact chord properties, respectively. The shaded polygon is the visibility polygon defined by the compact chord property (Proposition IV.38), whereas the dashed bold polygon represents the contour of the visibility polygon defined by the chord property (Proposition IV.34). Since the former is always included in the latter, the term compact chord property was used. Using the same example as in Figure 22, Figure 25 shows that the digital arc also fails to satisfy the compact chord property. More generally, the exact equivalence between the chord and the compact chord properties is proved in [Sharaiha and Garat, 1993].

FIGURE 24. Example for the validity of the compact chord property.

FIGURE 25. Example for the violation of the compact chord property.

Our aim in the next section is to give an equivalent characterization of discrete straightness in the 16-neighborhood space.

V. EXTENSIONS IN THE 16-NEIGHBORHOOD SPACE
In order to facilitate the study of geometrical properties in the 16-neighborhood space, we first introduce the concept of distance in this extended neighborhood space. Two new discrete distance functions will be constructed in Subsection V.A which allow for the analytical characterization of the 16-neighborhood, as was the case for the City-Block and Chessboard distances in the 4- and 8-neighborhood spaces, respectively. Based on these results, a digitization scheme is then defined which maps real straight segments onto 16-digital arcs (Subsection V.B). This digitization scheme is typically equivalent to the grid-intersect quantization introduced previously in the 8-neighborhood space. Finally, Subsection V.C formally characterizes discrete straightness in the 16-neighborhood space.

A. Definitions
In this subsection we will establish an analogy between a 16-digital arc and an 8-digital arc. We introduce a transformation which uniquely maps an 8-digital arc to a 16-digital arc (and vice versa). This is motivated by the fact that the 16-digital arc does not necessarily visit the pixels on all the grid lines (vertical or horizontal) between the two end points (as deduced from Proposition IV.21). The objective is then to define a distance in the $N_{16}$ space and, finally, a characterization of a 16-digital straight segment.

We define the transformation $T_i$ on the chain-code of the digital arc in question, where the subscript $i$ indicates that the transform is chain-code (i.e., move) dependent. It will become apparent that this subscript indicates the octant in which the digital arc lies (e.g., see Figure 26). We recall that the 16-chain codes are given by $\{0, 1, \ldots, 15\}$ and, using the same numbering scheme, the 8-chain codes are given by the even codes $\{0, 2, 4, \ldots, 14\}$, as shown in Figure 5 (Subsection III.A).

FIGURE 26. The $T_i$ transformation.
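As an illustration of the numbering recalled above, the following sketch tabulates the 16-chain codes as displacement vectors. The counterclockwise layout starting from code 0 = (1, 0) is an assumption of this sketch (Figure 5 is not reproduced here); it is chosen so that the even codes are the 8-neighbor moves and the odd codes are the knight moves. The names `MOVES16`, `is_8_move`, `is_knight_move`, and `follow` are hypothetical.

```python
# Displacement (dx, dy) of each 16-chain code, listed counterclockwise.
# Assumption of this sketch: code 0 is the move (1, 0).
MOVES16 = [
    (1, 0), (2, 1), (1, 1), (1, 2),
    (0, 1), (-1, 2), (-1, 1), (-2, 1),
    (-1, 0), (-2, -1), (-1, -1), (-1, -2),
    (0, -1), (1, -2), (1, -1), (2, -1),
]

def is_8_move(code):
    """True if the code is one of the 8-neighbor moves (Chessboard norm 1)."""
    dx, dy = MOVES16[code]
    return max(abs(dx), abs(dy)) == 1

def is_knight_move(code):
    """True if the code is a knight move (one step of 2 and one step of 1)."""
    dx, dy = MOVES16[code]
    return sorted((abs(dx), abs(dy))) == [1, 2]

def follow(start, codes):
    """Return the pixels visited from `start` along a chain-code sequence."""
    path = [start]
    for c in codes:
        dx, dy = MOVES16[c]
        x, y = path[-1]
        path.append((x + dx, y + dy))
    return path
```

With this table, `follow((0, 0), [0, 1, 0, 0, 1, 1])` traces a 16-digital arc made only of horizontal moves (code 0) and knight moves (code 1).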
Example V.1 $T_2$ Transform. We first describe the case for $i = 2$ as an example. Let $P_{pq}$ be a 16-digital arc which is also a shortest path between $p$ and $q$ ($SP_{16}(p, q)$). Let us assume that the slope $\sigma$ of $[p, q]$ lies between 0 and $\frac{1}{2}$. This implies that $P_{pq}$ will be composed of $a$-moves and $c$-moves. The $T_2$ transformation applies for the first octant codes (i.e., $0 \leq \sigma \leq 1$). The analytical expression of $T_2$ is given by $T_2: (x, y) \mapsto (x - y, y)$. A chain-code $c$ represents a pair of displacements $(\delta_x, \delta_y)$ in the discrete space. For this reason, we use equivalently the notations $T_i(\delta_x, \delta_y)$ and $T_i(c)$. In particular, the chain-codes in $N_{16}$ in this octant, $\{0, 1, 2\}$, are mapped to the chain-codes in $N_8$ as follows: $T_2(0) = 0$, $T_2(1) = 2$ and $T_2(2) = 4$.

More formally, we define the $T_i$ transformation and its inverse $T_i^{-1}$ for $i = 0, \ldots, 3$, representing the four octants on the right-hand side of the grid shown in Figure 26 (i.e., $-\infty \leq \sigma \leq +\infty$), as follows:

$-\infty \leq \sigma \leq -1$: $T_0: (x, y) \mapsto (x, y + x)$ and $T_0^{-1}: (x, y) \mapsto (x, y - x)$
$T_0(12) = 12$, $T_0(13) = 14$, and $T_0(14) = 0$

$-1 < \sigma \leq 0$: $T_1: (x, y) \mapsto (x + y, y)$ and $T_1^{-1}: (x, y) \mapsto (x - y, y)$
$T_1(14) = 12$, $T_1(15) = 14$, and $T_1(0) = 0$

$0 \leq \sigma \leq 1$: $T_2: (x, y) \mapsto (x - y, y)$ and $T_2^{-1}: (x, y) \mapsto (x + y, y)$
$T_2(0) = 0$, $T_2(1) = 2$, and $T_2(2) = 4$

$1 \leq \sigma \leq +\infty$: $T_3: (x, y) \mapsto (x, y - x)$ and $T_3^{-1}: (x, y) \mapsto (x, y + x)$
$T_3(2) = 0$, $T_3(3) = 2$, and $T_3(4) = 4$

The four other cases on the left-hand side of the grid are the same by symmetry with the origin. Figure 26 depicts graphically how the 16-chain codes are transformed onto the 8-chain codes. Thus, a 16-digital arc (respectively, 8-digital arc) $P_{pq}$ can be mapped onto the 8-digital arc (respectively, 16-digital arc) $P_{p'q'}$ using the transformation $P_{p'q'} = T_i(P_{pq})$ (respectively, $P_{p'q'} = T_i^{-1}(P_{pq})$) where $p = p'$. In the case illustrated in Figure 27, $P_{pq}$ is defined by the chain-code sequence $\{0, 1, 0, 0, 1, 1\}$ and $P_{p'q'}$ is given by the chain-code sequence $\{0, 2, 0, 0, 2, 2\}$.

FIGURE 27. An example of the $T_i$-transform with $i = 2$.

The $d_4$ and $d_8$ distances have been established as distance functions. It was also shown in [Das and Chatterji, 1988; Das and Mukherjee, 1990] that $N_{knight}$ could also be characterized by a discrete distance called $d_{knight}$. By contrast, the 16-neighborhood $N_{16}$ defined by the union of $N_8$ and $N_{knight}$ cannot be simply characterized by a discrete metric (see Remark IV.16). We now introduce a new distance function $d_{pq}$ in the $N_{16}$ space which is slope-dependent and can be seen as an extension of the City-Block distance $d_4$.

Definition V.2 $d_{pq}$ Distance. Given two points $p = (x_p, y_p)$ and $q = (x_q, y_q)$, we note $\delta_x(p, q) = x_q - x_p$ and $\delta_y(p, q) = y_q - y_p$. We define $d_i(p, q)$, $i = 0, \ldots, 3$, as follows:

$d_0(p, q) = |\delta_x(p, q)| + |\delta_x(p, q) + \delta_y(p, q)|$
$d_1(p, q) = |\delta_y(p, q)| + |\delta_x(p, q) + \delta_y(p, q)|$
$d_2(p, q) = |\delta_y(p, q)| + |\delta_x(p, q) - \delta_y(p, q)|$
$d_3(p, q) = |\delta_x(p, q)| + |\delta_y(p, q) - \delta_x(p, q)|$

Given two points $r$ and $s$, we define $d_{pq}(r, s)$ as:

$d_{pq}(r, s) = d_{i^*}(r, s)$ with $i^*$ such that $d_{i^*}(p, q) = \min_{i=0,\ldots,3} d_i(p, q)$

Note that the value of $i^*$ corresponds to the octant pairs defining the $T_i$ transformation (see Figure 26).

Example V.3 $d_{pq}$ Distance. For example, consider the four points $p$, $q$, $r$, and $s$ in Figure 28. We have $\delta_x(p, q) = 10$, $\delta_y(p, q) = 3$, and hence, $d_0(p, q) = 23$, $d_1(p, q) = 16$, $d_2(p, q) = 10$, and $d_3(p, q) = 17$. Therefore, $i^* = 2$ and $d_{pq}(p, q) = d_2(p, q) = 10$.

The definition of $d_{pq}$ is based on the analogy with the 16-shortest path. More precisely, $d_{pq}(p, q) = l(SP_{16}(p, q))$ with $a = 1$, $b = 1$, and $c = 2$. The analogy also applies to the transformed path using $T_{i^*}$. Indeed, $d_{pq}(p, q)$ is the length of the 4-shortest path from $p' = T_2(p)$ to $q' = T_2(q)$ with $a = b = 1$. In other words, $d_{pq}(p, q) = d_4(T_{i^*}(p), T_{i^*}(q)) = d_4(p', q')$ (dashed line).

Now, given two points $r$ and $s$, for instance, we can compute $d_{pq}(r, s)$. Since $\delta_x(r, s) = 8$ and $\delta_y(r, s) = 2$, then $d_{pq}(r, s) = 8$. Similarly to $d_{pq}(p, q)$, this value is the length of the 4-shortest path from $r' = T_2(r)$ to $s' = T_2(s)$ with $a = b = 1$. In other words, $d_{pq}(r, s) = d_4(T_{i^*}(r), T_{i^*}(s)) = d_4(r', s') = 8$ (dashed line). In this example, we have chosen the slope of $[r, s]$ to be in the same octant as $[p, q]$ for simplicity. However, the distance function is general and can apply to instances where the two slopes do not belong to the same range defining $i^*$.
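The relation between Definition V.2 and the $T_i$ transformations can be sketched in a few lines. The code below is an illustration only: it evaluates $d_i$ as the City-Block distance between the transformed points, compares it with the explicit expressions, and reproduces the values of Example V.3. The coordinates given to `p`, `q`, `r`, and `s` are hypothetical; they were chosen solely to obtain the displacements $(10, 3)$ and $(8, 2)$ quoted in the example. Ties on $i^*$ are resolved toward the smallest index, which is an assumption of this sketch.

```python
# Shear transformations T_0..T_3 acting on a point or a displacement (x, y).
T = (
    lambda x, y: (x, y + x),   # T_0
    lambda x, y: (x + y, y),   # T_1
    lambda x, y: (x - y, y),   # T_2
    lambda x, y: (x, y - x),   # T_3
)

def d4(a, b):
    """City-Block distance."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

def d_i(i, r, s):
    """d_i(r, s), computed as d_4 between T_i(r) and T_i(s)."""
    return d4(T[i](*r), T[i](*s))

def d_i_explicit(i, r, s):
    """The same quantity written with the expressions of Definition V.2."""
    dx, dy = s[0] - r[0], s[1] - r[1]
    return (abs(dx) + abs(dx + dy), abs(dy) + abs(dx + dy),
            abs(dy) + abs(dx - dy), abs(dx) + abs(dy - dx))[i]

def i_star(p, q):
    """Index i minimising d_i(p, q); first minimiser on ties (assumption)."""
    return min(range(4), key=lambda i: d_i(i, p, q))

def d_pq(p, q, r, s):
    """Slope-dependent distance between r and s, for the reference pair (p, q)."""
    return d_i(i_star(p, q), r, s)

# Hypothetical coordinates reproducing the displacements of Example V.3.
p, q = (0, 0), (10, 3)
r, s = (1, 0), (9, 2)
values = [d_i(i, p, q) for i in range(4)]
```

Here `values` evaluates to `[23, 16, 10, 17]`, `i_star(p, q)` to 2, and `d_pq(p, q, r, s)` to 8, in agreement with the example.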
FIGURE 28. Distance calculations.
Similar to the $T_i$ transform, these values of $\sigma$ (resp. $i^*$) define four octant pairs as shown in Figure 26.

Proposition V.5. For a given $p$, $q$, $d_{pq}(\cdot, \cdot)$ is a distance function.

Proof.
(i) $d_{pq}(r, s) = 0 \Leftrightarrow r = s$ (trivial).
(ii) $d_{pq}(r, s) = d_{pq}(s, r)$ (trivial).
(iii) $d_{pq}(r, t) \leq d_{pq}(r, s) + d_{pq}(s, t)$: since $\delta_x(r, t) = \delta_x(r, s) + \delta_x(s, t)$ and $\delta_y(r, t) = \delta_y(r, s) + \delta_y(s, t)$, then

$|\delta_x(r, t) + \delta_y(r, t)| \leq |\delta_x(r, s) + \delta_y(r, s)| + |\delta_x(s, t) + \delta_y(s, t)|$

Likewise,

$|\delta_x(r, t) - \delta_y(r, t)| \leq |\delta_x(r, s) - \delta_y(r, s)| + |\delta_x(s, t) - \delta_y(s, t)|$

Hence, $d_{pq}(\cdot, \cdot)$ is a distance function.

Proposition V.6. $d_{pq}(p, q) = d_8(p, q)$. Alternatively, $d_{pq}(p, q) = 1 \Leftrightarrow p$ and $q$ are 8-neighbors.

Proof.

$d_{pq}(p, q) = \min(|\delta_x(p, q)|, |\delta_y(p, q)|) + \big| |\delta_x(p, q)| - |\delta_y(p, q)| \big| = \max(|\delta_x(p, q)|, |\delta_y(p, q)|) = d_8(p, q)$
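Propositions V.5 and V.6 can be spot-checked by brute force on a small window of the grid. The sketch below is a numerical sanity check under the expressions of Definition V.2, not a proof; the window sizes and the function names are arbitrary choices made for this illustration.

```python
from itertools import product

def d_all(r, s):
    """The four values d_0..d_3 of Definition V.2 for the pair (r, s)."""
    dx, dy = s[0] - r[0], s[1] - r[1]
    return (abs(dx) + abs(dx + dy), abs(dy) + abs(dx + dy),
            abs(dy) + abs(dx - dy), abs(dx) + abs(dy - dx))

def d8(r, s):
    """Chessboard distance."""
    return max(abs(s[0] - r[0]), abs(s[1] - r[1]))

def check_prop_v6(radius=6):
    """d_pq(p, q) = min_i d_i(p, q) should equal d_8(p, q)."""
    origin = (0, 0)
    window = product(range(-radius, radius + 1), repeat=2)
    return all(min(d_all(origin, q)) == d8(origin, q) for q in window)

def check_triangle_inequality(radius=2):
    """Each d_i should satisfy the triangle inequality, hence so does d_pq."""
    window = list(product(range(-radius, radius + 1), repeat=2))
    return all(d_all(r, t)[i] <= d_all(r, s)[i] + d_all(s, t)[i]
               for r, s, t in product(window, repeat=3)
               for i in range(4))
```

Both checks return `True` on the windows used here; since every $d_i$ is translation invariant, fixing $p$ at the origin in `check_prop_v6` loses no generality.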
We can also define a new distance function $D_{pq}$ in the $N_{16}$ space, which can be seen as an extension of the Chessboard distance $d_8$. This new distance function will prove fundamental in the characterization of the 16-neighborhood.
Definition V.7 $D_{pq}$ Distance. Given two points $p = (x_p, y_p)$ and $q = (x_q, y_q)$, we define $D_i(p, q)$, $i = 0, \ldots, 3$, as follows:

$D_0(p, q) = \max(|\delta_x(p, q)|, |\delta_x(p, q) + \delta_y(p, q)|)$
$D_1(p, q) = \max(|\delta_y(p, q)|, |\delta_x(p, q) + \delta_y(p, q)|)$
$D_2(p, q) = \max(|\delta_y(p, q)|, |\delta_x(p, q) - \delta_y(p, q)|)$
$D_3(p, q) = \max(|\delta_x(p, q)|, |\delta_y(p, q) - \delta_x(p, q)|)$

Given two points $r$ and $s$, we define $D_{pq}(r, s)$ as

$D_{pq}(r, s) = D_{i^*}(r, s)$ with $i^*$ such that $D_{i^*}(p, q) = \min_{i=0,\ldots,3} D_i(p, q)$
We first note that, for a given $p$ and $q$, the $i^*$ which minimizes the expression of $d_i$ in the definition of $d_{pq}$ is the same as the $i^*$ which minimizes the expression of $D_i$ in the definition of $D_{pq}$.
Example V.8 $D_{pq}$ Distance. Using the same example in Figure 28, $D_0(p, q) = 13$, $D_1(p, q) = 13$, $D_2(p, q) = 7$, and $D_3(p, q) = 10$. Hence, $i^* = 2$ and $D_{pq}(p, q) = D_2(p, q) = 7$. This value now corresponds to $d_8(p', q')$ where $p' = T_2(p)$ and $q' = T_2(q)$. In other words, $D_{pq}(p, q) = d_8(T_{i^*}(p), T_{i^*}(q)) = d_8(p', q')$ (dotted line), which establishes the analogy between $D_{pq}$ and $d_8$. Likewise, $D_{pq}(r, s) = d_8(T_{i^*}(r), T_{i^*}(s)) = d_8(r', s') = 6$ (dotted line).

Proposition V.9. $D_{pq}(p, q) = 1 \Leftrightarrow p$ and $q$ are 16-neighbors.
Proof. Immediate by the definition of $D_{pq}$.

We can also easily prove that $D_{pq}(\cdot, \cdot)$ is a distance function in essentially the same way as we did for $d_{pq}(\cdot, \cdot)$. Figure 29 shows the different discs of radius 1 for the two new distance functions, where $p$ is the origin and $q$ is any pixel in each quadrant of $\mathbb{Z}^2$. The disc of radius 1 is centered at pixel $p$ (shown highlighted) and the locus of points $\alpha \in \mathbb{R}^2$ is shown shaded as given in the caption of Figure 29. This figure clarifies the fact that the distance metrics are defined such that they are dependent on the slope of the segment $[p, q]$. The union of discs fully describes the 16-neighbors of a pixel. Thus $D_{pq}$ can be used to define the $N_{16}$ neighborhood of a pixel explicitly. It will become apparent in the next sections how the newly defined distance functions will assist in reaching the goal of characterizing 16-digital straight segments. We first use these distance functions $d_{pq}$ and $D_{pq}$ for defining a digitization scheme that applies in the 16-neighborhood space.

FIGURE 29. Balls $d_{pq}(p, \alpha) < 1$ (light-shaded region) and $D_{pq}(p, \alpha) < 1$ (dark- and light-shaded regions).
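Definition V.7, Example V.8, and Proposition V.9 lend themselves to a direct numerical check. The sketch below is illustrative: the point coordinates are hypothetical (only the displacements $(10, 3)$ and $(8, 2)$ come from the examples), the helper `is_16_neighbor` encodes $N_{16}$ as the union of the 8-neighbors and the knight moves, and ties on $i^*$ are resolved toward the smallest index.

```python
from itertools import product

def D_all(r, s):
    """The four values D_0..D_3 of Definition V.7 for the pair (r, s)."""
    dx, dy = s[0] - r[0], s[1] - r[1]
    return (max(abs(dx), abs(dx + dy)), max(abs(dy), abs(dx + dy)),
            max(abs(dy), abs(dx - dy)), max(abs(dx), abs(dy - dx)))

def D_pq(p, q, r, s):
    """D_{i*}(r, s), where i* minimises D_i(p, q)."""
    reference = D_all(p, q)
    i_star = reference.index(min(reference))
    return D_all(r, s)[i_star]

def is_16_neighbor(p, q):
    """True if q - p is an 8-neighbor move or a knight move."""
    dx, dy = abs(q[0] - p[0]), abs(q[1] - p[1])
    return (dx, dy) != (0, 0) and (max(dx, dy) == 1 or sorted((dx, dy)) == [1, 2])

def check_prop_v9(radius=5):
    """D_pq(p, q) = 1 should hold exactly for the 16-neighbors of p."""
    origin = (0, 0)
    window = product(range(-radius, radius + 1), repeat=2)
    return all((D_pq(origin, q, origin, q) == 1) == is_16_neighbor(origin, q)
               for q in window)
```

With the hypothetical points `p = (0, 0)`, `q = (10, 3)`, `r = (1, 0)`, `s = (9, 2)`, `D_all(p, q)` gives `(13, 13, 7, 10)` and `D_pq(p, q, r, s)` gives 6, as in Example V.8.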
B. Grid-Intersect Quantization $GIQ_{16}$

Freeman [1970] defines the grid-intersect quantization in the $N_8$ space ($GIQ_8$) using the distance $d_8$ as the set of pixels closest to a curve whenever it intersects a horizontal or a vertical grid line. The (8-)grid-intersect quantization of a real segment was then defined as an (8-)digital straight segment [Freeman, 1970]. However, in the $N_{16}$ space, because the $c$-move skips some grid lines, this definition is not satisfactory. We propose a new definition for the grid-intersect quantization of a real straight segment in the $N_{16}$ space ($GIQ_{16}$). Given a real segment $[\alpha, \beta]$, let $p$ (respectively, $q$) be the grid point $d_{\alpha\beta}$-closest to $\alpha$ (respectively, $\beta$). Then, similar to $GIQ_8$, we call $GIQ_{16}(\alpha, \beta)$ the 16-digital arc which realizes the minimum area of $S$ ($|S|$), where $S$ is the surface between the 16-digital arc and the real segment $[\alpha, \beta]$ (see Figure 30).
FIGURE 30. Surface between $[p, q]$ and the 16-digital arc. (A) Nonoptimal case. (B) Optimal case ($GIQ_{16}(p, q)$).
Proposition V.10. $GIQ_{16}(\alpha, \beta)$ is a 16-shortest path between $p$ and $q$ with $a$, $b$, and $c$ defined such that $d_{a,b,c}$ is a distance (see Subsection IV.A.1).
Proof. Let $\{p_0, p_1, \ldots, p_n\}$ be the set of grid points (pixels) in $GIQ_{16}(\alpha, \beta)$. Let $\phi_i$ be the angle between $[p, q]$ and $[p_i, p_{i+1}]$, $i = 0, \ldots, n-1$. Thus the minimum value of $|S|$ will be reached for the minimum value of $\sum_i |\phi_i|$. In turn, this also implies that each $|\phi_i|$ is also a minimum. Therefore, $GIQ_{16}(\alpha, \beta)$ will be composed of two types of moves, one with the minimum slope which is greater than the slope of $[p, q]$, and the other with the maximum slope which is smaller than the slope of $[p, q]$. In other words, the two moves are those whose slope is closest to that of $[p, q]$. Such a digital arc exists since these two moves compose $SPBG_{16}(p, q)$ as shown previously in Figure 14. Hence, $GIQ_{16}(\alpha, \beta)$ will be a path in $SPBG_{16}(p, q)$ and, therefore, it is a shortest path between $p$ and $q$.

As a byproduct of Proposition V.10, a simpler characterization of $GIQ_{16}(p, q)$ would be the following: $GIQ_{16}(\alpha, \beta)$ is the 16-shortest path between $p$ and $q$ which is closest to the real straight segment $[p, q]$. Moreover, Freeman's [1970] first criterion (Proposition IV.33(i)) can clearly be extended in the case of a 16-digital straight segment. We can deduce from Propositions IV.20 and V.10 that $GIQ_{16}(\alpha, \beta)$ is composed of, at most, two directions which differ by one modulo 16. We now describe a simple algorithm for computing $GIQ_{16}$.

1. Implementation
Given $\alpha$ and $\beta$, the following algorithm will give as output $GIQ_{16}(\alpha, \beta)$:

1. Compute $p$ and $q$, the $d_{\alpha\beta}$-closest pixels to $\alpha$ and $\beta$, respectively.
2. Initialize: $P_{pq} \leftarrow \{p\}$.
3. Build a path $P_{pq}$ by adding the next pixel $p_i$ in $SPBG_{16}(p, q)$ closest to $[p, q]$.
4. If $p_i = q$ stop, else go to step 3.
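The steps above can be sketched as follows. This is a simplified illustration under several assumptions, not the author's implementation: the end points `p` and `q` are taken to be grid points already (step 1 is skipped), the counterclockwise table `MOVES16` is an assumed chain-code layout, the two admissible moves are the consecutive codes whose directions bracket $q - p$, closeness to $[p, q]$ is measured by the absolute cross product with $q - p$ (proportional to the Euclidean distance to the supporting line), and ties are resolved toward the pixel with the lower $y$ and then the lower $x$ coordinate.

```python
# Assumed counterclockwise layout of the 16-chain codes, code 0 = (1, 0).
MOVES16 = [
    (1, 0), (2, 1), (1, 1), (1, 2),
    (0, 1), (-1, 2), (-1, 1), (-2, 1),
    (-1, 0), (-2, -1), (-1, -1), (-1, -2),
    (0, -1), (1, -2), (1, -1), (2, -1),
]

def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]

def giq16(p, q):
    """Greedy sketch of steps 2-4 for integer end points p and q."""
    d = (q[0] - p[0], q[1] - p[1])
    if d == (0, 0):
        return [p]
    # Find the two consecutive moves m1, m2 whose cone contains d.
    for k in range(16):
        m1, m2 = MOVES16[k], MOVES16[(k + 1) % 16]
        if cross(m1, d) >= 0 and cross(d, m2) > 0:
            break
    # Consecutive moves have cross product 1, so d = n1*m1 + n2*m2 exactly.
    moves = [m1, m2]
    counts = [cross(d, m2), cross(m1, d)]
    path = [p]
    while counts[0] + counts[1] > 0:
        x, y = path[-1]
        best = None
        for idx in (0, 1):
            if counts[idx] == 0:
                continue
            cand = (x + moves[idx][0], y + moves[idx][1])
            offset = (cand[0] - p[0], cand[1] - p[1])
            key = (abs(cross(d, offset)), cand[1], cand[0])
            if best is None or key < best[0]:
                best = (key, idx, cand)
        counts[best[1]] -= 1
        path.append(best[2])
    return path
```

For the displacement $(10, 3)$ used in the earlier examples, the sketch alternates horizontal and knight moves and ends at `q` after four $a$-moves and three $c$-moves.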
By extension of the special case of a tie in the $N_8$ space, we choose the lower coordinate pixel in the $N_{16}$ space (step 3).

Proposition V.11. This algorithm converges and gives as output $GIQ_{16}(\alpha, \beta)$.

Proof. The existence of $p$ and $q$ is trivial. We only need to show that: for any $i > 0$, there exist two discrete points $p_i$ and $p_{i-1}$ adjacent in $SPBG_{16}(p, q)$ and there exists a real point $\delta$ in $[p, q]$ such that $d_{pq}(p_i, \delta) \leq \frac{1}{2}$. By induction, $p_0 = p$. We define $l$ and $m$ such that $[p, q]$ crosses the line $y = x + m$ at $\gamma$ and $l - \frac{1}{2} < y_\gamma$ …
2. Discrete Convexity of $GIQ_{16}(p, q)$

A number of intimately linked definitions have been proposed for discrete convexity in the $N_8$ space on the unit square grid (see Section IV.B). We will follow the characterization of discrete convexity given in Proposition IV.27, where a set $P$ of grid points is said to be discrete convex if and only if $P = \langle P \rangle$. The aim of this section is essentially to prove the following result.

FIGURE 31. Construction of $GIQ_{16}(p, q)$. Legend: vertices of $GIQ_{16}(p, q)$; reachable vertices in $SPBG_{16}(p, q)$; other vertices in $SPBG_{16}(p, q)$; lines of the form $y = x + k$, where $k$ varies in $\mathbb{N}$.

FIGURE 32. A step in the construction of $GIQ_{16}(p, q)$ (the lines $y = x + m$ and $y = x + m + 1$ are marked).
Theorem V.12. $GIQ_{16}(p, q)$ is discrete convex.

Proof. The first remark is that $SPBG_{16}(p, q)$ contains all the points of the grid as pixels in the vicinity of $[p, q]$ (e.g., see Figure 15). Without loss of generality, we can assume that $[p, q]$ has a slope $\sigma$ such that $0 < \sigma < \frac{1}{2}$. Therefore, by definition, $d_{pq}(r, s) = |(x_r - x_s) - (y_r - y_s)| + |y_r - y_s|$ and $GIQ_{16}(p, q)$ is composed of $c$ (knight) and $a$ (horizontal) moves (codes 0 and 1). Let $\{p_0, p_1, \ldots, p_n\}$ be the set of discrete points in $GIQ_{16}(p, q)$. By construction of $GIQ_{16}(p, q)$, for any point $p_i$ in $GIQ_{16}(p, q)$ there exists a real point $\gamma$ in $[p, q]$ such that $d_{pq}(p_i, \gamma) \leq \frac{1}{2}$.

Let $\mathcal{D} = \{\gamma \in \mathbb{R}^2$ such that $\exists \delta \in [p, q]$ such that $d_{pq}(\gamma, \delta) < \frac{1}{2}\}$. Then, the real convex hull of $GIQ_{16}(p, q)$, $[GIQ_{16}(p, q)]$, is such that $[GIQ_{16}(p, q)] \subset \mathcal{D}$. Let us assume that a pixel $p^*$ of $SPBG_{16}(p, q)$ is such that $p^* \notin GIQ_{16}(p, q)$ and $p^* \in \mathring{\mathcal{D}}$, where $\mathring{\mathcal{D}}$ is the interior of the set $\mathcal{D}$ (i.e., there exists a real point $\gamma$ in $[p, q]$ such that $d_{pq}(p^*, \gamma) < \frac{1}{2}$). Since $GIQ_{16}$ is a 16-shortest path on the grid (see Proposition V.10), it is composed of, at most, two moves ($a$ and $c$ in the case of $0 < \sigma < \frac{1}{2}$). Hence, each move along $GIQ_{16}$ crosses a line of the form $y = x + k$, $k \in \mathbb{N}$, exactly once. More precisely, on each of such lines, only one pixel $p_i$ is such that there exists a real point $\delta \in [p, q]$ such that $d_{pq}(p_i, \delta) \leq \frac{1}{2}$ (see Figure 32). Therefore $\mathring{\mathcal{D}}$ does not contain any pixels other than those which are on $GIQ_{16}(p, q)$. Consequently, the interior of the real convex hull of $GIQ_{16}(p, q)$ satisfies the same property (see Figure 33). Hence, $GIQ_{16}(p, q)$ is discrete convex.
C. Discrete Straightness in the 16-Neighborhood Space
FIGURE 33. Verification of the discrete convexity for $GIQ_{16}(p, q)$.

In Subsection IV.A.2 we obtained results concerning the 16-neighborhood coding. These results motivate the study of discrete straightness in this extended neighborhood. In this subsection our aim is to arrive at a characterization of digital straight segments in the 16-neighborhood space (16-digital straight segments) similar to that given by the chord properties in the 8-neighborhood space. Based on the newly defined distances and digitization scheme, we follow the approach taken when introducing straightness in the 8-neighborhood space. The 16-digital arcs resulting from the 16-digitization of real straight segments are first proved to satisfy properties which belong to the class of chord properties in Subsection 1 (following). The major result of this study can then be given as an analytical characterization of 16-straightness. For the sake of completeness, Subsection 2 also presents the study of Upper and Lower bounds of a digital straight segment in the 16-neighborhood space.

1. 16-Digital Straight Segments

We introduce chord properties in the extended neighborhood space. These properties make use of the distance functions $d_{pq}$ and $D_{pq}$. By analyzing the construction of these distances, the new 16-chord properties will be proved to be analytical characterizations of 16-digital straight segments. Proposition V.13 is to be compared with Propositions IV.34 and IV.38.
Proposition V.13 Chord Properties in the 16-Neighborhood Space. A 16-digital arc $P_{pq} = \{p_i\}_{i=0,\ldots,n}$ satisfies the 16-chord property if and only if, for any two discrete points $p_i$ and $p_j$ in $P_{pq}$ and for any real point $\alpha$ on the continuous segment $[p_i, p_j]$, there exists a point $p_k \in P_{pq}$ such that $D_{pq}(\alpha, p_k) < 1$.

A 16-digital arc $P_{pq} = \{p_i\}_{i=0,\ldots,n}$ satisfies the 16-compact chord property if and only if, for any two discrete points $p_i$ and $p_j$ in $P_{pq}$ and for any real point $\alpha$ on the continuous segment $[p_i, p_j]$, there exists a real point $\beta$ in the broken line $\bigcup_i [p_i, p_{i+1}]$ such that $d_{pq}(\alpha, \beta) < 1$.
Figure 34(A) shows the geometric shapes for the 16-chord property and the 16-compact chord property. The dashed polygon ($\mathcal{O}$) contains the locus of points $\alpha \in \mathbb{R}^2$ such that $D_{pq}(\alpha, P_{pq}) < 1$ and the solid polygon ($\hat{\mathcal{O}}$) contains the locus of points $\alpha \in \mathbb{R}^2$ such that $d_{pq}(\alpha, \bigcup_i [p_i, p_{i+1}]) < 1$. Using the concept of visibility in computational geometry, the properties can be reformulated as follows. Given two points $r$, $s$ in a region $\mathcal{R}$, $s$ is said to be strictly visible from $r$ in $\mathcal{R}$ if the real segment $[r, s]$ is wholly contained in $\mathcal{R}$ (i.e., $[r, s]$ does not cross or touch the boundary of $\mathcal{R}$).

Proposition V.14. A 16-digital arc $P_{pq} = \{p_i\}_{i=0,\ldots,n}$ satisfies the 16-chord property (respectively, 16-compact chord property) if and only if, for any two discrete points $p_i$ and $p_j$ in $P_{pq}$, $p_j$ is strictly visible from $p_i$ in $\mathcal{O}$ (respectively, in $\hat{\mathcal{O}}$).

Proof. Immediate by the definition of $\mathcal{O}$ (respectively, $\hat{\mathcal{O}}$).

These chord properties can now be used for characterizing analytically 16-digital straight segments. We first formally define the concept of straight segment in the 16-neighborhood space using the digitization scheme defined earlier.
Definition V.15 16-Digital Straight Segment. A 16-digital arc $P_{pq}$ is a 16-digital straight segment if there exist two real points $\alpha$ and $\beta$ such that $GIQ_{16}(\alpha, \beta) = P_{pq}$.

The characterization of 16-digital straight segments will be carried out in two steps, formulated in Lemmas V.16 and V.17, respectively.
FIGURE 34. (A) A 16-digital arc which satisfies the 16-(compact) chord property. (B) A 16-digital arc violating the 16-(compact) chord property.
Lemma V.16. A 16-digital straight segment satisfies the 16-compact chord property.

Proof. Let $P_{pq} = \{p_i\}_{i=0,\ldots,n}$ be a 16-digital straight segment. By Definition V.15, there exist two real points $\alpha$ and $\beta$ such that $GIQ_{16}(\alpha, \beta) = P_{pq}$. It was shown in Subsection V.B.2 that $GIQ_{16}(p, q)$ is discrete convex. Now, if there exist two discrete points $p_i$ and $p_j$ in $GIQ_{16}(p, q)$ for which the 16-compact chord property is violated, then there exists a real point $\gamma$ in $[p_i, p_j]$ such that for any $\delta$ in $\bigcup_i [p_i, p_{i+1}]$, $d_{pq}(\gamma, \delta) \geq 1$. In other words, in this case the real convex hull of $GIQ_{16}(p, q)$, $[GIQ_{16}(p, q)]$, will contain a pixel of the $SPBG_{16}(p, q)$ which does not belong to $GIQ_{16}(p, q)$, contradicting the fact that $GIQ_{16}(p, q)$ is discrete convex. In Figure 34(B), the 16-compact chord property is violated. Moreover, $[GIQ_{16}(p, q)]$ contains a pixel which is not in $GIQ_{16}(p, q)$.
Lemma V.17. If a 16-digital arc $P_{pq}$ satisfies the 16-compact chord property, it is a 16-digital straight segment.

Proof. We assume without loss of generality that the slope of $[p, q]$ is between 0 and $\frac{1}{2}$. The proof for the other cases is similar. Let $\{p_0, p_1, \ldots, p_n\}$ be the set of discrete points in $P_{pq}$. We define the following lines (see Figure 35):

$\mathcal{D}_1$: $y = \sigma x + \mu_1$ such that $\forall p_i \in P_{pq}$, $p_i$ is below $\mathcal{D}_1$ ($p_i = (x_i, y_i) \Rightarrow y_i \leq \sigma x_i + \mu_1$).
$\mathcal{D}_2$: $y = \sigma x + \mu_2$ such that $\forall p_i \in P_{pq}$, $p_i$ is above $\mathcal{D}_2$ ($p_i = (x_i, y_i) \Rightarrow y_i \geq \sigma x_i + \mu_2$).

$\mathcal{D}_1$ and $\mathcal{D}_2$ are defined as two parallel lines such that any point $p_i$ in $P_{pq}$ lies between $\mathcal{D}_1$ and $\mathcal{D}_2$. Building on this, we define a width measure for $P_{pq}$:

$W(\sigma, \mu_1, \mu_2) = \min\{d_{pq}(\delta, \varepsilon) \mid \delta \in \mathcal{D}_1;\ \varepsilon \in \mathcal{D}_2\}$
$W^* = W(\sigma^*, \mu_1^*, \mu_2^*) = \min_{\sigma, \mu_1, \mu_2} W(\sigma, \mu_1, \mu_2)$

($W^*$ is the minimal $d_{pq}$-width of $P_{pq}$).
FIGURE 35. Characterization of a digital straight segment.
Now, to prove Lemma V.17, we only need to prove that there exists a real straight segment $[\alpha, \beta]$, where $d_{pq}(\alpha, p) < \frac{1}{2}$ and $d_{pq}(\beta, q) < \frac{1}{2}$, such that $GIQ_{16}(\alpha, \beta) = P_{pq}$. In this context, it is then sufficient to prove that, if $P_{pq}$ satisfies the 16-compact chord property, then $W^* < 1$.

By definition, one of the two lines $\mathcal{D}_1$ and $\mathcal{D}_2$ ($\mathcal{D}_1$, say) contains at least two points $p_i$ and $p_j$ from $P_{pq}$, and the other line ($\mathcal{D}_2$) contains at least one point $p_k$ from $P_{pq}$. Otherwise, $W^*$ would not be minimal. For obtaining a minimal width ($W^*$), one of $\mathcal{D}_1$ or $\mathcal{D}_2$ should be a part of the real convex hull of $P_{pq}$, as shown schematically in Figure 36(A). Now, let $\eta$ be a point on $\mathcal{D}_1$ such that $d_{pq}(p_k, \eta)$ is minimum (i.e., $d_{pq}(p_k, \eta) = W^*$). For the triangle $\Delta p_i p_j p_k$ to have a minimal width $W^*$, $\eta$ should be a point on $[p_i, p_j]$ (see Figure 36(B)). Hence, $p_k$ lies between $p_i$ and $p_j$ on the 16-digital arc $P_{pq}$ (i.e., $p_k \in P_{p_i p_j} \subset P_{pq}$).

Now, if $P_{pq}$ satisfies the 16-compact chord property, then clearly $P_{p_i p_j} \subset P_{pq}$ satisfies the 16-compact chord property. Moreover, since $p_k \in P_{p_i p_j}$, there exists $\gamma \in [p_i, p_j]$ (i.e., on $\mathcal{D}_1$) such that $d_{pq}(p_k, \gamma) < 1$. Hence, there exist $\gamma \in \mathcal{D}_1$ and $\delta \in \mathcal{D}_2$ such that $d_{pq}(\gamma, \delta) < 1$. Since $\mathcal{D}_1$ and $\mathcal{D}_2$ are two parallel lines, we obtain $W^* < 1$.

Let $\mathcal{D}_3$ be the line $y = \sigma x + \mu_3$ that is both parallel and $d_{pq}$-equidistant to $\mathcal{D}_1$ and $\mathcal{D}_2$. Hence, for any $p_i \in P_{pq}$, there exists $\delta \in \mathcal{D}_3$ such that $d_{pq}(\delta, p_i) < \frac{1}{2}$. Let $\alpha$ and $\beta$ be two real points on $\mathcal{D}_3$ such that $d_{pq}(p, \alpha) < \frac{1}{2}$ and $d_{pq}(q, \beta) < \frac{1}{2}$, respectively. Then $GIQ_{16}(\alpha, \beta) = P_{pq}$. Therefore, for each digital arc which satisfies the 16-compact chord property we can define a real segment (not unique) such that its grid-intersect quantization is the digital arc in question. In other words, a digital arc which satisfies the 16-compact chord property is a 16-digital straight segment.

Combining the above results, we conclude this study of 16-straightness with the following main result.

Theorem V.18. A 16-digital arc is a digital straight segment if and only if it satisfies the 16-compact chord property.

Proof. The necessary condition is given by Lemma V.16 and the sufficient condition is given by Lemma V.17.
FIGURE 36. Geometrical evidence.
2. Upper and Lower Bounds of $[p, q]$

Pham [1986] defined the (8-)Upper and Lower bound digital arcs for a given real segment and proved that they define two (8-)digital straight segments. Their construction is based on shifting chain-codes of the grid-intersect quantization of the real segment in question. These bounds allow for locating all possible digital straight segments joining two discrete points. We can easily extend these definitions to the 16-neighborhood. In the algorithm defined in Subsection V.B, at each step, at most two pixels can be reached in $SPBG_{16}(p, q)$ (see Figures 15 and 32). We will say that one of these pixels defines the 16-Upper bound if it is above $[p, q]$ while the other pixel defines the 16-Lower bound (see Figure 37).
Proposition V.19. The 16-Upper and 16-Lower bounds define two 16-digital straight segments.

Proof. The 16-Upper and 16-Lower bounds are two 16-shortest paths since they are defined in the $SPBG_{16}(p, q)$. Let $U_{pq} = \{u_0, \ldots, u_n\}$ and $L_{pq} = \{l_0, \ldots, l_n\}$ be the 16-Upper bound and 16-Lower bound, respectively. By definition, $\forall i \in \{0, \ldots, n\}$ there exists $r \in P_{pq}$ such that $d_{pq}(r, u_i) < 1$. We can thus define the same surface $\mathcal{D}$ as in Subsection V.B.2, where $[p, q]$ is included in the border of $\mathcal{D}$. We can then immediately see that $U_{pq}$ satisfies the 16-compact chord property. More precisely, if we define a real segment $[\alpha, \beta]$ as the medial line of $\mathcal{D}$, then clearly, $GIQ_{16}(\alpha, \beta) = U_{pq}$. Hence, $U_{pq}$ is a digital straight segment. The proof for the case of $L_{pq}$ is obtained by symmetry.

Further properties defined in the 8-neighborhood space find their equivalent in the 16-neighborhood space. The intimate link with combinatorial structures defined in this study and the duality created by $T_i$ transforms readily suggest these extensions.
FIGURE 37. Upper and Lower bounds.
In the next section, we present an example of such a mapping applied to the problem of vectorization of chain-code sequences.
VI. APPLICATION TO VECTORIZATION

In this section we suggest two algorithms for the polygonal decomposition of a 16-digital path. For illustrating the developments presented in this study, we propose two algorithms for checking for the straightness of digital arcs. Applications of such algorithms include vectorization of binary line images (e.g., engineering drawings). Two approaches are taken. Subsection VI.A first proposes a direct application of chord properties. This results in a greedy (i.e., suboptimal) decomposition of the 16-digital arc in question. Optimal polygonal approximation algorithms exist in the 8-neighborhood space. By the same token as in Subsection V.C.2, we propose in Subsection VI.B to exploit the duality created by $T_i$ transforms for enabling such procedures to operate equivalently on 16-digital arcs.

A. A Greedy Algorithm
This algorithm tests the straightness of $P_{pq}$ in a greedy fashion. Each time the straightness is violated, another 16-digital straight segment is stacked. The algorithm returns $A$, the list of break points for all the 16-digital straight segments in $P_{pq}$, and $N_s$, the number of 16-digital straight segments. In this case, $N_s$ is not necessarily minimum.

Given a 16-digital arc $P_{pq} = \{p_0, p_1, \ldots, p_n\}$:

1. $i, j, N_s \leftarrow 0$; $A \leftarrow \{p_0\}$
2. $j \leftarrow j + 1$
3. If $j > n$ then $A \leftarrow A \cup \{p_n\}$; $N_s \leftarrow N_s + 1$; Stop.
4. Compute the slope of $[p_i, p_j]$ to determine an expression for $d_{p_i p_j}(\cdot, \cdot)$.
5. Check the visibility of the pixel $p_j$ against the pixels $\{p_i, p_{i+1}, \ldots, p_{j-1}\}$ in the polygon $\{\alpha \in \mathbb{R}^2$ such that $\exists \beta \in \bigcup_i [p_i, p_{i+1}]$ such that $d_{p_i p_j}(\alpha, \beta) < 1\}$. If $p_j$ is visible from all these pixels then go to step 2, else:
   (a) $A \leftarrow A \cup \{p_{j-1}\}$
   (b) $N_s \leftarrow N_s + 1$
   (c) $i \leftarrow j$; go to step 2.
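The control flow of the greedy decomposition can be sketched independently of the particular straightness test. In the illustration below, the test is passed in as a parameter `is_straight`; the usage example plugs in exact collinearity as a toy stand-in, which is not the 16-compact chord property of step 5. One deliberate deviation is flagged in the code: after a break, this sketch restarts the current segment at the break point $p_{j-1}$, so that consecutive segments share their end points. All names are hypothetical.

```python
def greedy_breakpoints(points, is_straight):
    """Greedy polygonal decomposition of a digital arc (at least 2 points).

    Returns (breakpoints, n_segments). `is_straight` receives the sub-arc
    points[i..j] and decides whether it is still a straight segment.
    """
    breakpoints = [points[0]]
    n_segments = 0
    i = 0
    for j in range(1, len(points)):
        if not is_straight(points[i:j + 1]):
            # Straightness is violated at p_j: close the segment at p_{j-1}.
            breakpoints.append(points[j - 1])
            n_segments += 1
            # Choice of this sketch: restart at the break point p_{j-1}.
            i = j - 1
    breakpoints.append(points[-1])
    n_segments += 1
    return breakpoints, n_segments

def collinear(sub_arc):
    """Toy straightness test: all points lie on the line through the ends."""
    (x0, y0), (x1, y1) = sub_arc[0], sub_arc[-1]
    return all((x1 - x0) * (y - y0) == (y1 - y0) * (x - x0) for x, y in sub_arc)

arc = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 2)]
breaks, count = greedy_breakpoints(arc, collinear)
```

On this toy arc the decomposition yields the break points `(0, 0)`, `(2, 0)`, `(4, 2)` and two segments. As stated above, the greedy strategy does not guarantee a minimum number of segments.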
B. Checking Discrete Straightness Using the Duality Generated by $T_i$

The validity of the next proposition is based on the continuity and reversibility properties of transformations $T_i$.
Proposition VI.1. Given a 16-digital arc $P_{pq}$, let $\sigma$ be the slope of $[p, q]$. Using the suitable index $i^* \in \{0, 1, 2, 3\}$ (i.e., depending on $\sigma$, see the definition of $T_i$ in Section V.A), the following holds:

$P_{pq}$ satisfies the 16-chord property if and only if the 8-digital arc $T_{i^*}(P_{pq})$ satisfies the 8-chord property.

$P_{pq}$ satisfies the 16-compact chord property if and only if the 8-digital arc $T_{i^*}(P_{pq})$ satisfies the 8-compact chord property.
Example VI.2 Duality. Proposition VI.1 can be illustrated using the example shown in Figure 38 where $i^* = 2$.

From Proposition VI.1, we can readily define an algorithm which tests for 16-digital straightness by combining the use of transformations $T_i$ and existing procedures that check for discrete straightness in the $N_8$ space (e.g., [Dorst and Smeulders, 1984; Lindenbaum and Koplowitz, 1991; Sharaiha, 1991; Sharaiha and Christofides, 1993; Smeulders and Dorst, 1991]).
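A test along these lines can be sketched as follows. This is an illustration of the duality only and not one of the cited procedures: the 16-digital arc is mapped by $T_{i^*}$ and the transformed arc is checked against the 8-chord property of Proposition IV.34 with exact rational arithmetic. The index $i^*$ is selected from the end points by minimizing $d_4$ after transformation, with ties resolved toward the smallest index (an assumption). All names are hypothetical.

```python
from fractions import Fraction

T = (
    lambda x, y: (x, y + x),   # T_0
    lambda x, y: (x + y, y),   # T_1
    lambda x, y: (x - y, y),   # T_2
    lambda x, y: (x, y - x),   # T_3
)

def i_star(p, q):
    """Index i minimising d_4(T_i(p), T_i(q))."""
    def d4_after(i):
        (xp, yp), (xq, yq) = T[i](*p), T[i](*q)
        return abs(xq - xp) + abs(yq - yp)
    return min(range(4), key=d4_after)

def _t_interval(a0, da, c):
    """Open interval of t where |a0 + t*da - c| < 1, or None if empty."""
    if da == 0:
        return (Fraction(-1), Fraction(2)) if abs(a0 - c) < 1 else None
    lo, hi = sorted((Fraction(c - a0 - 1, da), Fraction(c - a0 + 1, da)))
    return lo, hi

def _covered(pi, pj, arc):
    """True if every point of [pi, pj] is at d_8 < 1 from some arc point."""
    dx, dy = pj[0] - pi[0], pj[1] - pi[1]
    spans = []
    for xk, yk in arc:
        ix, iy = _t_interval(pi[0], dx, xk), _t_interval(pi[1], dy, yk)
        if ix and iy and max(ix[0], iy[0]) < min(ix[1], iy[1]):
            spans.append((max(ix[0], iy[0]), min(ix[1], iy[1])))
    t = Fraction(0)
    while t <= 1:
        reach = [hi for lo, hi in spans if lo < t < hi]
        if not reach:
            return False
        t = max(reach)
    return True

def has_8_chord_property(arc8):
    n = len(arc8)
    return all(_covered(arc8[i], arc8[j], arc8)
               for i in range(n) for j in range(i + 1, n))

def satisfies_16_chord_property(arc16):
    """Apply T_{i*} and test the 8-chord property of the transformed arc."""
    t = T[i_star(arc16[0], arc16[-1])]
    return has_8_chord_property([t(x, y) for x, y in arc16])
```

For example, the arc `[(0, 0), (1, 0), (3, 1), (4, 1), (6, 2), (7, 2), (9, 3), (10, 3)]` passes the test, while `[(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (8, 2), (10, 3)]`, which groups all horizontal moves before the knight moves, does not.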
VII. CONCLUSION
In this paper we aimed to introduce techniques for analyzing data in discrete spaces. In particular, we concentrated on binary digital image processing, where the only information available is the (0-1) value of a pixel at a discrete location.
FIGURE 38. (A) The chord properties hold. (B) The chord properties are violated.
The mapping of continuous data into a discrete space and the construction of an underlying topological structure for the discrete space were detailed in the first sections. Then, based on connectivity relationships formally established between discrete points, we detailed the study of typical geometrical properties of connected sets. We then mapped these results into an extended discrete space. This was essentially based on the rigorous characterization of a mapping between 8- and 16-neighborhood spaces. This study resulted in a formal characterization of discrete straightness in the 16-neighborhood space. We also showed that such an approach allowed us to map and extend further results into the 16-neighborhood space. The study of digital data in discrete spaces allows for a better understanding of problems encountered when performing digitization. It is important to follow such a discrete approach from the basis of an analysis to be able to control and overcome approximations associated with discrete data processing. Although applied solely to binary images, this context can be extended to other types of images with essentially no fundamental modification. For example, geodesic distances, leading to DTOCS [Toivanen, 1996] in grayscale images, are an instance of an extension of discrete distances presented in this paper.
REFERENCES
Borgefors, G. (1984). Distance transformations in arbitrary dimensions. Computer Vision, Graphics and Image Processing 27, 321-345.
Borgefors, G. (1986). Distance transformations in digital images. Computer Vision, Graphics and Image Processing 34, 344-371.
Chassery, J.-M. (1983). Discrete convexity: definitions, parameterization and compatibility with continuous convexity. Computer Vision, Graphics and Image Processing 21, 326-344.
Chassery, J.-M. and Chenin, M. I. (1980). Topologies on discrete spaces, in Digital Image Processing (Simon and Haralick, Eds.), pp. 59-66. Reidel Publ.
Chassery, J.-M. and Montanvert, A. (1991). Géométrie Discrète en Analyse d'Images. Editions Hermès, Paris (in French).
Christofides, N., Badra, H. O., and Sharaiha, Y. M. (1997). Data structures for topological and geometric operations on networks. Annals of Operations Research: OR/IS Interface 71, 259-289.
Das, P. P. and Chatterji, B. N. (1988). Knight's distances in digital geometry. Pattern Recognition Letters 7, 215-226.
Das, P. P. and Mukherjee, J. (1990). Metricity of super knight's distance in digital geometry. Pattern Recognition Letters 11, 601-604.
Dorst, L. and Smeulders, A. W. M. (1984). Discrete representation of straight lines. IEEE Trans. on Pattern Analysis and Machine Intelligence PAMI-6(4), 450-463.
Freeman, H. (1970). Boundary encoding and processing, in Picture Processing and Psychopictorics (B. S. Lipkin and A. Rosenfeld, Eds.), pp. 241-266. New York: Academic Press.
Freeman, H. (1974). Computer processing of line-drawing images. Computing Surveys 6(1), 57-97.
Gondran, M. and Minoux, M. (1984). Graphs and Algorithms. Wiley-Interscience Series in Discrete Mathematics. New York: Wiley.
Harary, F., Melter, R. A., and Tomescu, I. (1984). Digital metrics: a graph-theoretical approach. Pattern Recognition Letters 2, 159-163.
Hilditch, C. J. and Rutovitz, D. (1969). Chromosome recognition. Annals of the New York Academy of Sciences 157, 339-364.
Hung, S. H. Y. (1985). On the straightness of digital arcs. IEEE Trans. on Pattern Analysis and Machine Intelligence PAMI-7(2), 203-215.
Kim, C. E. (1981). On the cellular convexity of complexes. IEEE Trans. on Pattern Analysis and Machine Intelligence PAMI-3, 617-625.
Kim, C. E. (1982). Digital convexity, straightness and convex polygons. IEEE Trans. on Pattern Analysis and Machine Intelligence PAMI-4, 618-626.
Kim, C. E. and Rosenfeld, A. (1982). Digital straight lines and convexity of digital regions. IEEE Trans. on Pattern Analysis and Machine Intelligence PAMI-4(2), 149-153.
Kim, C. E. and Sklansky, J. (1982). Digital and cellular convexity. Pattern Recognition 15, 359-367.
Kong, T. Y. and Rosenfeld, A. (1989). Digital topology: introduction and survey. Computer Vision, Graphics and Image Processing 48, 357-393.
Lindenbaum, M. and Koplowitz, J. (1991). A new parametrisation of digital straight lines. IEEE Trans. on Pattern Analysis and Machine Intelligence PAMI-13, 847-852.
Marchand-Maillet, S. and Sharaiha, Y. M. (1997). Euclidean ordering via chamfer distance calculations. Computer Vision and Image Understanding.
Montanari, U. (1968). A method for obtaining skeletons using a quasi-Euclidean distance. Journal of the ACM 15(4), 600-624.
Montanari, U. (1970). A note on minimal length polygonal approximation to a digitized contour. Communications of the ACM 13, 41-47.
Morris, O. J., de Jersey Lee, M., and Constantinides, A. G. (1986). Graph theory for image analysis: An approach based on the shortest spanning tree. IEEE Proceedings-F Communications Radar and Signal Processing 133(2), 146-152.
Pham, S. (1986). Digital straight segments. Computer Vision, Graphics and Image Processing 36, 10-30.
Ronse, C. (1985a). Definitions of convexity and convex hulls in digital images. Bull. Soc. Math. Belge Série B 37(2), 71-85.
Ronse, C. (1985b). An isomorphism for digital images. Journal of Combinatorial Theory, Series A 39, 132-159.
Ronse, C. (1985c). A simple proof of Rosenfeld's characterisation of digital straight segments. Pattern Recognition Letters 3(5), 323-326.
Ronse, C. (1985d). A topological characterization of thinning. Theoretical Computer Science 43, 31-41.
Ronse, C. (1986). A strong chord property for 4-connected convex digital sets. Computer Vision, Graphics and Image Processing 35, 259-269.
Ronse, C. (1989). A bibliography on digital and computational convexity (1961-1988). IEEE Trans. on Pattern Analysis and Machine Intelligence PAMI-11(2), 181-189.
Rosenfeld, A. (1974). Digital straight line segments. IEEE Trans. on Computers C-23(12), 1264-1269.
Rosenfeld, A. (1979). Digital topology. American Mathematical Monthly 86, 621-629.
Rosenfeld, A. and Melter, R. A. (1989). Digital geometry. The Mathematical Intelligencer 11(3), 69-72.
Rosenfeld, A. and Pfaltz, J. L. (1968). Distance functions on digital pictures. Pattern Recognition 1, 33-61.
Sharaiha, Y. M. (1991). A graph theoretic approach for the raster-to-vector problem in digital image processing. PhD thesis, Imperial College, London.
Sharaiha, Y. M. and Christofides, N. (1993). An optimal algorithm for straight segment approximation of digital arcs. CVGIP: Graphical Models and Image Processing 55(5), 397-407.
Sharaiha, Y. M. and Christofides, N. (1994). A graph theoretic approach to distance transformations. Pattern Recognition 15(10), 1035-1041.
Sharaiha, Y. M. and Garat, P. (1993). A compact chord property for digital arcs. Pattern Recognition 26(5), 799-803.
Smeulders, A. W. M. and Dorst, L. (1991). Decomposition of discrete curves into piecewise straight segments in linear time, in Vision Geometry, Contemporary Mathematics (A. Rosenfeld, R. A. Melter, and P. Battacharaya, Eds.), pp. 169-195. American Mathematical Society, Providence, RI.
Suzuki, S., Ueda, N., and Sklansky, J. (1993). Graph-based thinning for binary images. International Journal of Pattern Recognition and Artificial Intelligence 7(5), 1009-1030.
Thiel, E. and Montanvert, A. (1992). Chamfer masks: discrete distance functions, geometrical properties and optimizations, in Eleventh International Conference on Pattern Recognition, pp. 244-247. The Hague, The Netherlands, August 30-September 3.
Toivanen, P. J. (1996). New geodesics distance transform for grayscale images. Pattern Recognition 11, 431-450.
Voss, K. (1993). Discrete Images, Objects and Functions in $Z^n$. Algorithms and Combinatorics 11. Berlin: Springer Verlag.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 106
Introduction to the Fractional Fourier Transform and Its Applications

Haldun M. Ozaktas and M. Alper Kutay
Department of Electrical Engineering, Bilkent University, TR-06533 Bilkent, Ankara, Turkey

David Mendlovic
Faculty of Engineering, Tel-Aviv University, 69978 Tel-Aviv, Israel
I. Introduction
II. Notation and Definitions
III. Fundamental Properties
IV. Common Transform Pairs
V. Eigenvalues and Eigenfunctions
VI. Operational Properties
VII. Relation to the Wigner Distribution
VIII. Fractional Fourier Domains
IX. Differential Equations
X. Hyperdifferential Form
XI. Digital Simulation of the Transform
XII. Applications to Wave and Beam Propagation
   A. Introduction
   B. Quadratic-Phase Systems as Fractional Fourier Transforms
   C. Propagation in Quadratic Graded-Index Media
   D. Fresnel Diffraction
   E. Fourier Optical Systems
   F. Optical Implementation of the Fractional Fourier Transform
   G. Gaussian Beam Propagation
XIII. Applications to Signal and Image Processing
Acknowledgments
References
I. INTRODUCTION
The purpose of this chapter is to provide a self-contained introduction to the fractional Fourier transform for those who wish to obtain an understanding of the essentials without having to work through the hundreds of papers
which have appeared in the last few years. A general introduction will be followed by the definition of the transform and a discussion of its fundamental and operational properties. Of central importance is the relationship of the transform to the Wigner distribution and other phase-space distributions (also known as time-frequency or space-frequency representations). We will concentrate on two main application areas which have so far received the most attention: wave and beam propagation and signal processing. The fractional Fourier transform is a generalization of the ordinary Fourier transform with an order parameter a. Mathematically, the ath order fractional Fourier transform is the ath power of the Fourier transform operator. The a = 1st order fractional transform is the ordinary Fourier transform. With the development of the fractional Fourier transform and related concepts, we see that the ordinary frequency domain is merely a special case of a continuum of fractional Fourier domains, and we arrive at a richer and more general theory of alternate signal representations, all of which are elegantly related to phase-space distributions. Every property and application of the common Fourier transform becomes a special case of that of the fractional transform. In every area in which Fourier transforms and frequency domain concepts are used, there exists the potential for generalization and improvement by using the fractional transform. For instance, the well-known result stating that the far-field diffraction pattern of an aperture is in the form of the Fourier transform of the aperture can be generalized to state that at closer distances, one observes the fractional Fourier transform of the aperture. The theory of optimal Wiener filtering in the ordinary Fourier domains can be generalized to optimal filtering in fractional domains, resulting in smaller mean-square errors at practically no additional cost. 
In essence, the $a$th order fractional Fourier transform interpolates between a function $f(u)$ and its Fourier transform $F(\mu)$. The 0th order transform is simply the function itself, whereas the 1st order transform is its Fourier transform. The 0.5th transform is something in between, such that the same operation that takes us from the original function to its 0.5th transform will take us from its 0.5th transform to its ordinary Fourier transform. More generally, index additivity is satisfied: The $a_2$th transform of the $a_1$th transform is equal to the $(a_2 + a_1)$th transform. The $-1$th transform is the inverse Fourier transform, and the $-a$th transform is the inverse of the $a$th transform. Scattered early papers related to the fractional Fourier transform include Wiener [1929], Condon [1937], Bargmann [1961], and de Bruijn [1973]. Of importance are two separate streams of mathematical papers which appeared throughout the eighties [Namias, 1980; McBride and Kerr, 1987;
Mustard, 1987a,b, 1989, 1991, 1996]. However, the number of publications exploded only after the introduction of the transform to the optics and signal processing communities [Seger, 1993; Lohmann, 1993; Ozaktas and Mendlovic, 1993a,b; Mendlovic and Ozaktas, 1993; Ozaktas and others, 1994a; Alieva and others, 1994; Almeida, 1994]. Not all of these authors were aware of each other or building on the work of those preceding them, nor is the transform always immediately recognizable in some of these works. The fractional Fourier transform (or essentially equivalent transforms) appears in many contexts, although it has not always been recognized as being the fractional power of the Fourier transform and thus referred to as the fractional Fourier transform. For instance, the Green's function of the quantum-mechanical harmonic oscillator is the kernel of the fractional Fourier transform. Also, the fractional Fourier transform is a special case of the more general linear canonical transform (see Wolf [1979] for an introduction and references). This transform has been studied in many contexts, but again the particular special case which is the fractional Fourier transform has usually not been recognized as such. The preceding citations do not represent a complete list of known historical references. For a more complete list and also a more comprehensive treatment of the fractional Fourier transform and its relation to phase-space distributions, we refer the reader to a forthcoming book on the subject by the authors (Wiley, to be publ. 1999). We expect further scattered historical references not known to us to be revealed in time. Given the multitude of contexts in which essentially equivalent or closely related integral transforms appear, it is probably not possible to attribute its invention to a particular set of authors. The many contexts in which it was reinvented time after time in different guises are testimony to the elegance and ubiquity of the transform.
Given the widespread use of the ordinary Fourier transform in science and engineering, it is important to recognize this integral transform as the fractional power of the Fourier transform. Indeed, it has been this recognition which has inspired most of the many recent applications. Replacing the ordinary Fourier transform with the fractional Fourier transform (which is more general and includes the ordinary Fourier transform as a special case) adds an additional degree of freedom to the problem, represented by the order parameter $a$. This in turn may allow either a more general formulation of the problem (as in the diffraction from an aperture example) or improvements based on the possibility of optimizing over $a$ (as in the optimal Wiener filtering example). The fractional Fourier transform has been found to have several applications in the area known as analog optical information processing, or
Fourier optics. This transform allows a reformulation of this area in a way much more general than that found in standard texts on the subject. It has also led to generalizations of the notions of space (or time) and frequency domains, which are central concepts in signal processing, leading to many applications in this area. More generally, the transform may be expected to have an impact in the form of deeper understanding or new applications in every area in which the Fourier transform plays a significant role, and to take its place among the standard mathematical tools of physics and engineering. More specifically, some applications which have already been investigated or suggested include diffraction theory [Alieva and others, 1994; Gori, Santarsiero, and Bagini, 1994; Pellat-Finet, 1994; Pellat-Finet, 1995; Ozaktas and Mendlovic, 1995; Abe and Sheridan, 1995a; Alonso and Forbes, 1997; Ozaktas and Erden, 1997], optical beam propagation and spherical mirror resonators (lasers) [Ozaktas and Mendlovic, 1994; Erden and Ozaktas, 1997; Ozaktas and Erden, 1997], propagation in graded index media [Ozaktas and Mendlovic, 1993a,b; Mendlovic and Ozaktas, 1993; Mendlovic, Ozaktas, and Lohmann, 1994a; Alieva and Agullo-Lopez, 1995; Abe and Sheridan, 1995b; Gomez-Reino, Bao, and Perez, 1996], Fourier optics [Bernardo and Soares, 1994a,b; Pellat-Finet and Bonnet, 1994; Ozaktas and Mendlovic, 1995; Ozaktas and Mendlovic, 1996], statistical optics [Erden, Ozaktas, and Mendlovic, 1996a,b], optical systems design [Dorsch, 1995; Dorsch and Lohmann, 1995; Lohmann, 1995], quantum optics [Yurke and others, 1990; Aytur and Ozaktas, 1995], radar and phase retrieval [Raymer, Beck, and McAlister, 1994a,b; McAlister and others, 1995], tomography [Beck and others, 1993; Smithey and others, 1993; Lohmann and Soffer, 1994; Wood and Barry, 1994a,b], signal detection, correlation, and pattern recognition [Mendlovic, Ozaktas, and Lohmann, 1995d; Alieva and Agullo-Lopez, 1995; Garcia and others, 1996; Lohmann, Zalevsky, and Mendlovic, 1996b; Bitran and others, 1996; Mendlovic and others, 1995a; Mendlovic, Zalevsky, and Ozaktas, 1998], space- or time-variant filtering [Ozaktas and others, 1994a; Granieri, Trabocchi, and Sicre, 1995; Mendlovic and others, 1996b; Ozaktas, 1996; Zalevsky and Mendlovic, 1996; Kutay and others, 1997; Mustard, 1997], signal recovery, restoration, and enhancement [Lohmann and others, 1996a; Erden and others, 1997a,b; Ozaktas, Erden, and Kutay, 1997; Kutay and Ozaktas, 1998; Kutay and others, 1998a,b], multiplexing and data compression [Ozaktas and others, 1994a], study of space- or time-frequency distributions [Almeida, 1994; Fonollosa and Nikias, 1994; Lohmann and Soffer, 1994; Ozaktas and others, 1994a; Mendlovic and others, 1995c; Dragoman, 1996; Mendlovic and others, 1996a; Ozaktas, Erkaya, and Kutay, 1996a; Mihovilovic and Bracewell, 1991], and solution of
differential equations [Namias, 1980; McBride and Kerr, 1987]. We believe that these are only a fraction of the possible applications. We hope that this chapter will make possible the discovery of new applications by introducing the subject to new audiences.

II. NOTATION AND DEFINITIONS
The $a$th order fractional Fourier transform of the function $f(u)$ will most often be denoted by $f_a(u)$ or, equivalently, $\mathcal{F}^a f(u)$. When there is possibility of confusion, we may more explicitly write $\mathcal{F}^a[f(u)]$. The transform is defined as a linear integral transform with kernel $K_a(u, u')$:

$$ f_a(u) = \mathcal{F}^a[f(u)] = \int K_a(u, u')\, f(u')\, du'. \tag{1} $$
The kernel will be given explicitly in the following text. All integrals are from minus to plus infinity unless otherwise stated. We prefer to use the same dummy variable $u$ both for the original function in the space (or time) domain and its fractional Fourier transform. This is in contrast to the conventional practice associated with the ordinary Fourier transform, where a different symbol, say $\mu$, denotes the argument of the Fourier transform $F(\mu)$:

$$ F(\mu) = \int f(u)\, e^{-i2\pi\mu u}\, du, \tag{2} $$

$$ f(u) = \int F(\mu)\, e^{i2\pi\mu u}\, d\mu. \tag{3} $$

But these can be rewritten as

$$ F(u) = \int f(u')\, e^{-i2\pi uu'}\, du', \tag{4} $$

$$ f(u) = \int F(u')\, e^{i2\pi u'u}\, du'. \tag{5} $$
When it is desirable to distinguish the argument of the transformed function from that of the original function, we will let $u_a$ denote the argument of the $a$th order fractional Fourier transform: $f_a(u_a) = (\mathcal{F}^a[f(u)])(u_a)$. With this convention, $u_0$ corresponds to $u$, the space (or time) coordinate; $u_1$ corresponds to the spatial (or temporal) frequency coordinate $\mu$; and $u_2 = -u_0$, $u_3 = -u_1$. Finally, we will agree to always interpret $u$ as a dimensionless variable.
We will refer to $\mathcal{F}^a[\cdot]$, or simply $\mathcal{F}^a$, as the $a$th order fractional Fourier transform operator. This operator transforms a function $f(u)$ into its fractional Fourier transform $f_a(u)$. We will restrict ourselves to the case where the order parameter $a$ is a real number. The signal $f$ is a finite energy signal and $f(u)$ is a finite energy function, both of which are well behaved in the sense usually presumed in physical applications. In quantum mechanics $f$ is the abstract state vector $|f\rangle$ and $f(u) = \langle u|f\rangle$ is the $u$-representation of $f$. Likewise, $f_a(u) = \langle u_a|f\rangle$ is the $u_a$-representation, which we will also refer to as the representation of $f$ in the $a$th order fractional Fourier domain. In this context $|f(u)|^2$ is interpreted as a probability distribution so that the energy of the function $\mathrm{En}[f] = \int |f(u)|^2\, du = \langle f|f\rangle$ corresponds to its integrated probability and is thus equal to 1. In signal processing and optics, the energy can take on any finite value but is conserved if attenuation or amplification mechanisms do not exist. (We will also deal with sets of signals and functions whose energies are not finite (delta functions and harmonic functions); these will not correspond to physically realizable functions, but rather serve as intermediaries in our formulations.) We now define the $a$th order fractional Fourier transform $f_a(u)$ through the following linear integral transform:
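$$ f_a(u) = \int K_a(u, u')\, f(u')\, du', \qquad K_a(u, u') = A_\phi \exp\left[ i\pi \left( \cot\phi\; u^2 - 2\csc\phi\; uu' + \cot\phi\; u'^2 \right) \right], \tag{6} $$

where $\phi = a\pi/2$ and

$$ A_\phi = \sqrt{1 - i\cot\phi}. \tag{8} $$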
The square root is defined such that the argument of the result lies in the interval $(-\pi/2, \pi/2]$. The kernel is not strictly defined when $a$ is an even integer. However, it is possible to show that as $a$ approaches an even integer, the kernel behaves like a delta function under the integral sign. Thus, consistent with the limiting behavior of the above kernel for values of $a$ approaching even integers (further discussed later), we define $K_{4j}(u, u') = \delta(u - u')$ and $K_{4j\pm 2}(u, u') = \delta(u + u')$, where $j$ is an arbitrary integer. Generally speaking, the fractional Fourier transform of $f(u)$ exists under the same conditions under which its Fourier transform exists [McBride and Kerr, 1987; Almeida, 1994].
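The definition can be tried out numerically. The following is a minimal sketch, not the fast algorithm of Section XI: it evaluates the kernel of Equation 6 on a grid and approximates the integral by a plain Riemann sum. The function names, grid limits, and sample count are illustrative assumptions, and the approach only suits well-localized, smooth functions and orders that are not even integers.

```python
import numpy as np

def frft_kernel(a, u, up):
    """Kernel K_a(u, u') of Equation 6; valid when a is not an even integer."""
    phi = a * np.pi / 2.0
    cot, csc = 1.0 / np.tan(phi), 1.0 / np.sin(phi)
    # np.sqrt of a complex number returns the principal root, whose argument
    # lies in (-pi/2, pi/2], which is the branch specified for A_phi.
    amp = np.sqrt(1.0 - 1j * cot)
    return amp * np.exp(1j * np.pi * (cot * u**2 - 2.0 * csc * u * up + cot * up**2))

def frft_quadrature(f, a, u_out, half_width=8.0, n=4001):
    """Approximate f_a(u) = int K_a(u, u') f(u') du' by a Riemann sum.

    half_width and n are assumed values chosen for rapidly decaying f.
    """
    up = np.linspace(-half_width, half_width, n)
    du = up[1] - up[0]
    kernel = frft_kernel(a, u_out[:, None], up[None, :])
    return kernel @ f(up) * du

# The Gaussian exp(-pi u^2) should be reproduced unchanged for every order.
gaussian = lambda u: np.exp(-np.pi * u**2)
u = np.linspace(-2.0, 2.0, 9)
for a in (0.37, 1.0, -0.5):
    error = np.max(np.abs(frft_quadrature(gaussian, a, u) - gaussian(u)))
    print(f"a = {a:5.2f}   max deviation from the input Gaussian: {error:.2e}")
```

The Gaussian is used as the test input because it is an eigenfunction with unit eigenvalue, so any deviation directly measures the quadrature error.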
III. FUNDAMENTAL PROPERTIES
We first examine the case when $a$ is equal to an integer $j$. We note that by definition $\mathcal{F}^{4j}$ and $\mathcal{F}^{4j+2}$ correspond to the identity operator $\mathcal{I}$ and the parity operator $\mathcal{P}$, respectively (that is, $f_{4j}(u) = f(u)$ and $f_{4j+2}(u) = f(-u)$). For $a = 1$ we find $\phi = \pi/2$, $A_\phi = 1$, and

$$ f_1(u) = \int \exp(-i2\pi uu')\, f(u')\, du'. \tag{9} $$
We see that $f_1(u)$ is equal to the ordinary Fourier transform of $f(u)$, which was previously denoted by the conventional upper case $F(u)$. Likewise, it is possible to see that $f_{-1}(u)$ is the ordinary inverse Fourier transform of $f(u)$. Our definition of the fractional Fourier transform is consistent with defining integer powers of the Fourier transform through repeated application (that is, $\mathcal{F}^2 = \mathcal{F}\mathcal{F}$, $\mathcal{F}^3 = \mathcal{F}\mathcal{F}^2$, and so on). Since $\phi = a\pi/2$ appears in Equation 6 only in the argument of trigonometric functions, the definition is periodic in $a$ (or $\phi$) with period 4 (or $2\pi$). Thus it is sufficient to limit attention to the interval $a \in [-2, 2)$. These facts can be restated in operator notation:

$$ \mathcal{F}^0 = \mathcal{I}, \tag{10} $$

$$ \mathcal{F}^1 = \mathcal{F}, \tag{11} $$

$$ \mathcal{F}^2 = \mathcal{P}, \tag{12} $$

$$ \mathcal{F}^3 = \mathcal{F}\mathcal{P} = \mathcal{P}\mathcal{F}, \tag{13} $$

$$ \mathcal{F}^4 = \mathcal{F}^0 = \mathcal{I}, \tag{14} $$

$$ \mathcal{F}^{4j+a} = \mathcal{F}^{4j'+a}, \tag{15} $$
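where $j$, $j'$ are arbitrary integers. Let us now examine the behavior of the kernel for small $|a| > 0$:

$$ K_a(u, u') = \frac{e^{-i\pi\,\mathrm{sgn}(\phi)/4}}{\sqrt{|\phi|}} \exp\left[ i\pi (u - u')^2 / \phi \right]. \tag{16} $$

Now, using the well-known limit

$$ \delta(u) = \lim_{\epsilon \to 0} \frac{e^{-i\pi\,\mathrm{sgn}(\epsilon)/4}}{\sqrt{|\epsilon|}} \exp\left( i\pi u^2 / \epsilon \right), \tag{17} $$

the kernel is seen to approach $\delta(u - u')$ as $a$ approaches 0. Thus defining the kernel $K_a(u, u')$ to be precisely $\delta(u - u')$ at $a = 0$ maintains continuity of the transform with respect to $a$. A similar discussion is possible when $a$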
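approaches other integer multiples of 2. A more rigorous discussion of continuity with respect to $a$ may be found in McBride and Kerr [1987]. We now discuss the index additivity property:

$$ \mathcal{F}^{a_1}\mathcal{F}^{a_2} f(u) = \mathcal{F}^{a_1 + a_2} f(u) = \mathcal{F}^{a_2}\mathcal{F}^{a_1} f(u), \tag{18} $$

or in operator notation

$$ \mathcal{F}^{a_1}\mathcal{F}^{a_2} = \mathcal{F}^{a_1 + a_2} = \mathcal{F}^{a_2}\mathcal{F}^{a_1}. \tag{19} $$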
This can be proved by repeated application of Equation 6, and amounts to showing

$$ \int K_{a_2}(u, u'')\, K_{a_1}(u'', u')\, du'' = K_{a_1 + a_2}(u, u') \tag{20} $$
by direct integration, which can be accomplished by using standard Gaussian integrals. We do not present the details of this proof, since this property will follow much more simply from certain properties of the transform to be discussed. The index additivity property is of central importance. Indeed, without it, we could hardly think of $\mathcal{F}^a$ as being the $a$th power of $\mathcal{F}$ (more will be said on this later). For instance, the 0.2nd fractional Fourier transform of the 0.5th transform is the 0.7th transform. Repeated application leads to statements such as, for instance, the 1.3th transform of the 2.1st transform of the 1.4th transform is the 4.8th transform (which is the same as the 0.8th transform). Transforms of different orders commute with each other so that their order can be freely interchanged. From the index additivity property, we deduce that the inverse of the $a$th order fractional Fourier transform operator $(\mathcal{F}^a)^{-1}$ is simply equal to the operator $\mathcal{F}^{-a}$ (because $\mathcal{F}^{-a}\mathcal{F}^a = \mathcal{I}$). This can also be shown by directly demonstrating that
$$ \int K_a(u, u'')\, K_{-a}(u'', u')\, du'' = \delta(u - u'), \tag{21} $$
so that $K_a^{-1}(u, u') = K_{-a}(u, u')$. Thus we see that we can freely manipulate the order parameter $a$ as if it denoted a power of the Fourier transform operator $\mathcal{F}$. Fractional Fourier transforms constitute a one-parameter family of transforms. This family is a subfamily of the more general family of linear canonical transforms, which have three parameters [Wolf, 1979; Moshinsky and Quesne, 1971; Moshinsky, Seligman, and Wolf, 1972]. As all linear canonical transforms do, fractional Fourier transforms satisfy the associativity property and they are unitary, as we can directly see by examining the
kernel of the inverse transform obtained by replacing $a$ with $-a$:

$$ K_a^{-1}(u, u') = K_{-a}(u, u') = K_a^*(u, u') = K_a^*(u', u). \tag{22} $$
The kernel K,(u, u’) is symmetric and unitary, but not Hermitian. Unitarity implies that the fractional Fourier transform can be interpreted as a transformation from one representation to another, and that inner products and norms are not changed under the transformation.
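Index additivity, commutativity, and conservation of energy can all be checked numerically on a test function. The sketch below is illustrative only: it samples the kernel of Equation 6 on a uniform grid (the grid extent, the number of points, the orders 0.4 and 0.5, and the shifted Gaussian input are all assumptions made for this example) and applies the resulting matrices in sequence.

```python
import numpy as np

# Uniform grid used both for the input and the output variable (assumed values).
u = np.linspace(-6.0, 6.0, 801)
du = u[1] - u[0]

def frft_matrix(a):
    """Sampled kernel K_a(u_i, u_j) * du, for a not an even integer."""
    phi = a * np.pi / 2.0
    cot, csc = 1.0 / np.tan(phi), 1.0 / np.sin(phi)
    amp = np.sqrt(1.0 - 1j * cot)  # principal square root
    phase = (cot * u[:, None]**2
             - 2.0 * csc * u[:, None] * u[None, :]
             + cot * u[None, :]**2)
    return amp * np.exp(1j * np.pi * phase) * du

# A shifted Gaussian: smooth and well inside the grid.
f = np.exp(-np.pi * (u - 0.5)**2)

g_two_steps = frft_matrix(0.5) @ (frft_matrix(0.4) @ f)  # order 0.4, then 0.5
g_swapped = frft_matrix(0.4) @ (frft_matrix(0.5) @ f)    # order 0.5, then 0.4
g_one_step = frft_matrix(0.9) @ f                        # order 0.9 directly

additivity_error = np.max(np.abs(g_two_steps - g_one_step))
commutation_error = np.max(np.abs(g_two_steps - g_swapped))
energy_in = np.sum(np.abs(f)**2) * du
energy_out = np.sum(np.abs(g_one_step)**2) * du

print("additivity error :", additivity_error)
print("commutation error:", commutation_error)
print("energy in / out  :", energy_in, energy_out)
```

A dense matrix is used only for transparency; it costs memory proportional to the square of the grid size and is not how the transform would be computed in practice.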
IV. COMMON TRANSFORM PAIRS
Table 1 gives the fractional Fourier transforms of a number of functions for which the integral appearing in Equation 6 can be evaluated analytically (often using standard Gaussian integrals). More will be said on the fractional Fourier transforms of chirp functions $\exp[i\pi(\chi u^2 + 2\xi u)]$ after we discuss the Wigner rotation property of the transform. Greater insight can be obtained by considering some numerically obtained illustrations. Indeed, the fractional Fourier transforms of many common functions do not have simple closed-form expressions. These may be obtained numerically using the algorithm discussed in Section XI. We know that when $a = 0$ we have the original function, and when $a = 1$ we have its ordinary Fourier transform. As $a$ varies from 0 to 1, the transform evolves smoothly from the original function to the ordinary Fourier transform. Figures 1 and 2 show the evolution of the $\mathrm{rect}(u)$ function into the $\mathrm{sinc}(u) \equiv \sin(\pi u)/(\pi u)$ function. Figure 3 shows the real parts of the fractional Fourier transforms of the Dirac delta function $\delta(u - 1)$. We note that for orders close to zero, the transform of the delta function is highly oscillatory, and thus will approximately behave like the delta function under the integral sign, averaging out to zero whatever function it happens to multiply.

TABLE 1. The functions on the right are the fractional Fourier transforms of the functions on the left; $j$ is an arbitrary integer, and $\xi$ and $\chi$ are real constants. For certain isolated values of $a$, the expressions below should be interpreted in the limiting sense (Equation 17). In the last pair, $\chi > 0$ is required for convergence.

FIGURE 1. Magnitudes of the fractional Fourier transforms of the rectangle function, I (panels include $a = 0$ and $a = 1/3$).

FIGURE 2. Magnitudes of the fractional Fourier transforms of the rectangle function, II.

Finally, we give the fractional Fourier transform of the quadratic phase function $f(u) = \exp(i\pi u^2/r_c)$ with complex radius $r_c$:

$$ f_a(u) = \sqrt{\frac{1 + i\tan\phi}{1 + \tan\phi / r_c}}\; \exp\left[ i\pi u^2\, \frac{1 - r_c\tan\phi}{r_c + \tan\phi} \right], \tag{23} $$

provided $\Im(r_c) < 0$, which is also the condition for the original function $f(u)$ to have finite energy. From this result we conclude that the complex radius $r_c'$ of the transformed function is

$$ r_c' = \frac{r_c + \tan\phi}{1 - r_c\tan\phi}. \tag{24} $$

This result is useful in beam propagation problems since the original function $f(u)$ represents a Gaussian beam with complex radius $r_c$.
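The complex-radius rule lends itself to a quick numerical check. The sketch below assumes the rule in the form $r_c' = (r_c + \tan\phi)/(1 - r_c\tan\phi)$ with $\phi = a\pi/2$, which is how Equation 24 is read here; the helper names, the sample radius, and the quadrature settings are illustrative assumptions. It compares the shape of a numerically transformed Gaussian beam with the predicted quadratic phase, and checks that two successive transforms compose as expected.

```python
import numpy as np

def transformed_radius(r_c, a):
    """Complex radius after an a-th order transform (assumed form of Equation 24)."""
    t = np.tan(a * np.pi / 2.0)
    return (r_c + t) / (1.0 - r_c * t)

def frft_quadrature(f, a, u_out, half_width=8.0, n=4001):
    """Riemann-sum evaluation of Equation 6 (a not an even integer)."""
    phi = a * np.pi / 2.0
    cot, csc = 1.0 / np.tan(phi), 1.0 / np.sin(phi)
    up = np.linspace(-half_width, half_width, n)
    du = up[1] - up[0]
    phase = (cot * u_out[:, None]**2
             - 2.0 * csc * u_out[:, None] * up[None, :]
             + cot * up[None, :]**2)
    return np.sqrt(1.0 - 1j * cot) * (np.exp(1j * np.pi * phase) @ f(up)) * du

r_c, a = 0.5 - 1.0j, 0.6          # Im(r_c) < 0, so the beam has finite energy
beam = lambda u: np.exp(1j * np.pi * u**2 / r_c)

u = np.linspace(-1.0, 1.0, 11)
f_a = frft_quadrature(beam, a, u)
f_a0 = frft_quadrature(beam, a, np.array([0.0]))[0]

# Dividing by the on-axis value removes the constant amplitude factor,
# leaving only the quadratic phase that the radius rule predicts.
predicted_shape = np.exp(1j * np.pi * u**2 / transformed_radius(r_c, a))
shape_error = np.max(np.abs(f_a / f_a0 - predicted_shape))

# Two successive transforms should act like one transform of the summed order.
composition_error = abs(transformed_radius(transformed_radius(r_c, 0.3), 0.4)
                        - transformed_radius(r_c, 0.7))
print("shape error:", shape_error, " composition error:", composition_error)
```

Normalizing by the value at $u = 0$ sidesteps the branch choice of the square-root prefactor, so the check concerns only the radius itself.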
V. EIGENVALUES AND EIGENFUNCTIONS
The eigenvalues and eigenfunctions of the ordinary Fourier transform are well known (although seldom discussed in introductory texts). The eigenfunctions are the Hermite-Gaussian functions $\psi_n(u)$, commonly known as the eigensolutions of the harmonic oscillator in quantum mechanics, or the modes of propagation of quadratic graded-index media in optics. The eigenvalues may be expressed as $\exp(-in\pi/2)$ and are given by $1, -i, -1, i, 1, -i, \ldots$ for $n = 0, 1, 2, 3, 4, 5, \ldots$. Thus the eigenvalue equation for the ordinary Fourier transform may be written as

$$ \mathcal{F}\psi_n(u) = e^{-in\pi/2}\, \psi_n(u), \tag{25} $$
where the Hermite-Gaussian functions are more explicitly given by

$$ \psi_n(u) = A_n H_n(\sqrt{2\pi}\, u)\, e^{-\pi u^2}, \tag{26} $$
for $n = 0, 1, 2, 3, 4, 5, \ldots$. Here $H_n(u)$ are the Hermite polynomials. The particular scale factors which appear in this equation are a direct consequence of the way we have defined the Fourier transform with $2\pi$ in the exponent. The $a$th order fractional Fourier transform shares the same eigenfunctions as the Fourier transform, but its eigenvalues are the $a$th power of the eigenvalues of the ordinary Fourier transform:

$$ \mathcal{F}^a \psi_n(u) = e^{-ian\pi/2}\, \psi_n(u). \tag{28} $$
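The eigenvalue relation can be illustrated numerically. The sketch below assumes the normalization $A_n = 2^{1/4}/\sqrt{2^n n!}$, which gives each $\psi_n$ unit energy but is not spelled out in the text above, and it assumes $H_n$ denotes the physicists' Hermite polynomials (as evaluated by `numpy.polynomial.hermite.hermval`). The grid, the order $a = 0.5$, and the helper names are illustrative choices.

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite import hermval  # physicists' Hermite polynomials

def hermite_gaussian(n, u):
    """psi_n(u) = A_n H_n(sqrt(2 pi) u) exp(-pi u^2), with an assumed A_n."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                      # selects H_n in the Hermite series
    norm = 2.0**0.25 / np.sqrt(2.0**n * factorial(n))
    return norm * hermval(np.sqrt(2.0 * np.pi) * u, coeffs) * np.exp(-np.pi * u**2)

u = np.linspace(-6.0, 6.0, 801)
du = u[1] - u[0]
orders = np.arange(6)
psi = np.array([hermite_gaussian(n, u) for n in orders])   # shape (6, 801)

# Orthonormality: the Gram matrix should be close to the identity.
gram = psi @ psi.T * du
orthonormality_error = np.max(np.abs(gram - np.eye(len(orders))))

def frft_matrix(a):
    """Sampled kernel of Equation 6 times du (a not an even integer)."""
    phi = a * np.pi / 2.0
    cot, csc = 1.0 / np.tan(phi), 1.0 / np.sin(phi)
    phase = (cot * u[:, None]**2
             - 2.0 * csc * u[:, None] * u[None, :]
             + cot * u[None, :]**2)
    return np.sqrt(1.0 - 1j * cot) * np.exp(1j * np.pi * phase) * du

a = 0.5
transformed = frft_matrix(a) @ psi.T                        # column n: F^a psi_n
expected = psi.T * np.exp(-1j * a * orders * np.pi / 2.0)[None, :]
eigen_error = np.max(np.abs(transformed - expected))

print("orthonormality error:", orthonormality_error)
print("eigenvalue-equation error for a = 0.5:", eigen_error)
```

Only the first six functions are checked; higher orders spread further in $u$ and would need a wider grid.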
This result can be established directly from Equation 6 by induction. First, we can show that $\psi_0(u)$ and $\psi_1(u)$ are eigenfunctions with eigenvalues 1 and $\exp(-ia\pi/2)$ by evaluating the resulting standard complex exponential integrals. Then, by using standard recurrence relations for the Hermite-Gaussian functions, it is possible to assume that the result to be shown holds for $n - 1$ and $n$, and show that as a consequence it holds for $n + 1$. This completes the induction. The fact just outlined, that Hermite-Gaussian functions are eigenfunctions of the fractional Fourier transform as defined by Equation 6, reduces to the well-known fact that Hermite-Gaussian functions are eigenfunctions of the ordinary Fourier transform when $a = 1$, since Equation 6 reduces to the definition of the ordinary Fourier transform and since $e^{-ian\pi/2}$ reduces to $e^{-in\pi/2}$ when $a = 1$. Readers familiar with functions $\Upsilon(\mathcal{A})$ of an operator (or matrix) $\mathcal{A}$ with eigenvalues $\lambda_n$ will know that in general $\Upsilon(\mathcal{A})$ will have the same eigenfunctions as $\mathcal{A}$ and that its eigenvalues will be $\Upsilon(\lambda_n)$. The above eigenvalue equation is particularly satisfying in this light since $\mathcal{F}^a$, as we have defined it, is indeed seen to correspond to the $a$th power of the Fourier transform operator ($\Upsilon(\cdot) = (\cdot)^a$). However, it should be noted that the definition of the $a$th power function is ambiguous, and our definition of the fractional Fourier transform through Equation 6 is associated with a particular way of resolving the ambiguity associated with the $a$th power function (Equation 28). Other definitions of the transform also deserving to be called the fractional power of the Fourier transform are possible. The particular definition we are considering is the one that has been most studied and that has led to the greatest number of interesting applications. We are convinced it has a special place among other possible definitions.
Knowledge of the complete set of eigenvalues and eigenfunctions of a linear operator is sufficient to completely characterize the operator. In fact, in some works the fractional Fourier transform has been defined through its eigenvalue equation [Namias, 1980; Ozaktas and Mendlovic, 1993a,b; Mendlovic and Ozaktas, 1993]. To find the fractional transform of a given function f(u) from knowledge of the eigenfunctions and eigenvalues only, we first expand the function as a linear superposition of the eigenfunctions of the fractional Fourier transform (which are known to constitute a complete set):

$$f(u) = \sum_{n=0}^{\infty} C_n \psi_n(u), \tag{29}$$

$$C_n = \int \psi_n(u) f(u)\,du. \tag{30}$$

Applying $\mathcal{F}^a$ to both sides of Equation 29 and using Equation 28, one obtains

$$\mathcal{F}^a[f(u)] = \sum_{n=0}^{\infty} e^{-ian\pi/2}\, C_n \psi_n(u) = \int \left[\sum_{n=0}^{\infty} e^{-ian\pi/2}\, \psi_n(u)\psi_n(u')\right] f(u')\,du'. \tag{31}$$
Upon comparison with Equation 6, the kernel $K_a(u, u')$ is identified as

$$K_a(u, u') = \sum_{n=0}^{\infty} e^{-ian\pi/2}\, \psi_n(u)\psi_n(u'). \tag{32}$$
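This is the spectral decomposition of the kernel of the fractional Fourier transform. The kernel given in Equation 32 can be shown to be identical to that given in Equation 6 directly by using an identity known as Mehler's formula:

$$\sum_{n=0}^{\infty} e^{-in\phi}\, \psi_n(u)\psi_n(u') = \sqrt{1 - i\cot\phi}\; \exp[i\pi(\cot\phi\, u^2 - 2\csc\phi\, u u' + \cot\phi\, u'^2)]. \tag{33}$$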
Several properties of the fractional Fourier transform follow immediately from Equation 28. In particular, the special cases a = 0 and a = 1 and the index additivity property are deduced easily. (The latter can be shown by applying $\mathcal{F}^{a'}$ to both sides of Equation 28.)
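The spectral decomposition lends itself to a direct numerical check. The following is a minimal sketch in Python with NumPy, not the algorithm discussed later in this chapter. It assumes the unit-energy normalization $\psi_0(u) = 2^{1/4} e^{-\pi u^2}$, generates the $\psi_n$ by the standard three-term recurrence for Hermite functions, truncates the sum of Equation 32 at a finite number of modes, and replaces integrals by Riemann sums. The names `hermite_gaussians` and `frft_spectral` are illustrative only.

```python
import numpy as np

def hermite_gaussians(n_modes, u):
    """Rows are psi_0 .. psi_{n_modes-1} sampled on u (assumed normalization:
    psi_0(u) = 2**0.25 * exp(-pi u**2), unit energy)."""
    x = np.sqrt(2 * np.pi) * u
    psi = np.empty((n_modes, u.size))
    psi[0] = 2**0.25 * np.exp(-np.pi * u**2)
    if n_modes > 1:
        psi[1] = np.sqrt(2.0) * x * psi[0]
    for n in range(1, n_modes - 1):
        # Three-term recurrence for orthonormal Hermite functions.
        psi[n + 1] = (np.sqrt(2.0 / (n + 1)) * x * psi[n]
                      - np.sqrt(n / (n + 1)) * psi[n - 1])
    return psi

def frft_spectral(f, u, a, n_modes=64):
    """Order-a transform of samples f on a uniform grid u via the
    eigen-expansion: expand, multiply by exp(-i a n pi/2), resum."""
    du = u[1] - u[0]
    psi = hermite_gaussians(n_modes, u)
    coeffs = (psi @ f) * du                       # C_n as a Riemann sum
    eigenvalues = np.exp(-1j * a * np.pi / 2 * np.arange(n_modes))
    return (eigenvalues * coeffs) @ psi
```

Under these assumptions, transforming a shifted Gaussian with a = 1 should reproduce its ordinary Fourier transform, and successive transforms of orders 0.4 and 0.6 should agree with a single transform of order 1, up to truncation and quadrature error.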
VI. OPERATIONAL PROPERTIES

Various operational properties of the transform are listed in Table 2 [Namias, 1980; McBride and Kerr, 1987; Mendlovic and Ozaktas, 1993; Almeida, 1994]. Most of these are readily derived or verified by using Equation 6 or the symmetry properties of the kernel. Operations satisfying the first property are referred to as even operations, so that the fractional Fourier transform is an even operation. This property also implies
which in turn imply that the transform of an even function is always even and the transform of an odd function is always odd. Similar facts can be stated in operator form: All even operators, and in particular the fractional Fourier transform operator, commute with the parity operator $\mathcal{P}$ ($\mathcal{F}^a\mathcal{P} = \mathcal{P}\mathcal{F}^a$) and satisfy $\mathcal{F}^a = \mathcal{P}\mathcal{F}^a\mathcal{P}$. The eigenfunctions of even operations can always be chosen to be of definite (even or odd) parity (the Hermite-Gaussian functions satisfy this property).

TABLE 2. OPERATIONAL PROPERTIES OF THE FRACTIONAL FOURIER TRANSFORM. $\xi$ is an arbitrary real number, k is a real number ($k \neq 0, \infty$), and n is an integer; $\phi' = \arctan(k^2 \tan\phi)$, where $\phi'$ is taken to be in the same quadrant as $\phi$.

The second property is the generalization of the ordinary Fourier transform property stating that the Fourier transform of f(ku) is $|k|^{-1}F(\mu/k)$. Notice that the fractional Fourier transform of f(ku) cannot be expressed as a scaled version of $f_a(u)$ for the same order a. Rather, the fractional Fourier transform of f(ku) turns out to be a scaled and chirp-modulated version of $f_{a'}(u)$, where $a' \neq a$ is a different order.

Now we turn our attention to the fifth and sixth properties. The fractional Fourier transform of uf(u) is equal to a linear combination of $u f_a(u)$ and $df_a(u)/du$. The coefficients of this linear combination are $\cos\phi$ and $-\sin\phi$. When a = 1, this reduces to the corresponding ordinary Fourier transform property. Similar comments apply to the fractional Fourier transform of df(u)/du. The essence of these properties is most easily grasped if we express them in pure operator form. Let us define the coordinate multiplication operator $\mathcal{U}$ and differentiation operator $\mathcal{D}$ through their effects in the space domain:

$$[\mathcal{U}f](u) = u f(u), \tag{36}$$

$$[\mathcal{D}f](u) = \frac{1}{i2\pi}\frac{df(u)}{du}. \tag{37}$$
These are simply dimensionless versions of the position and momentum operators of quantum mechanics and might have been written as

$$\langle u|\mathcal{U}f\rangle = u\,\langle u|f\rangle, \tag{38}$$

$$\langle u|\mathcal{D}f\rangle = \frac{1}{i2\pi}\frac{d}{du}\langle u|f\rangle. \tag{39}$$
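We may define in the same spirit operators $\mathcal{U}_a$ and $\mathcal{D}_a$, which have the same effect on $f_a(u_a)$, the ath order fractional Fourier transform of f(u):

$$(\mathcal{U}_a f)_a(u_a) = u_a f_a(u_a), \tag{40}$$

$$(\mathcal{D}_a f)_a(u_a) = \frac{1}{i2\pi}\frac{df_a(u_a)}{du_a}, \tag{41}$$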
where we have explicitly written $u_a$ to avoid confusion. The effect of these operators is to coordinate multiply and differentiate the fractional Fourier transform of f(u), rather than f(u) itself. Now, with these definitions, the fifth and sixth properties of Table 2 can be written as

$$\mathcal{F}^a[\mathcal{U}f(u)] = \cos\phi\,(\mathcal{U}_a f)_a(u_a) - \sin\phi\,(\mathcal{D}_a f)_a(u_a), \tag{42}$$

$$\mathcal{F}^a[\mathcal{D}f(u)] = \sin\phi\,(\mathcal{U}_a f)_a(u_a) + \cos\phi\,(\mathcal{D}_a f)_a(u_a). \tag{43}$$
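$(\mathcal{U}_a f)_a(u_a)$ is simply the $u_a$ representation of $\mathcal{U}_a f$, which we also refer to as the representation of $\mathcal{U}_a f$ in the ath fractional Fourier domain. In the notation of quantum mechanics, $(\mathcal{U}_a f)_a(u_a)$ would have been written as $\langle u_a|\mathcal{U}_a f\rangle$. Similar comments apply to $(\mathcal{D}_a f)_a(u_a)$. The two preceding equations can be written in abstract operator form as

$$\begin{bmatrix} \mathcal{U} \\ \mathcal{D} \end{bmatrix} = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \begin{bmatrix} \mathcal{U}_a \\ \mathcal{D}_a \end{bmatrix}. \tag{44}$$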
We see that the coordinate multiplication and differentiation operators corresponding to order a are related to those in the ordinary space (or time) domain by a simple rotation matrix. The commutator $[\mathcal{U}, \mathcal{D}] \equiv \mathcal{U}\mathcal{D} - \mathcal{D}\mathcal{U}$ is well known to be equal to $i/2\pi$. By using Equation 44 we can easily derive the commutator [Aytur and Ozaktas, 1995; Ozaktas and Aytur, 1995]:
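As a sketch of how such a commutator follows, assume the rotation relation implied by Equations 42 and 43, inverted so that $\mathcal{U}_a = \cos\phi\,\mathcal{U} + \sin\phi\,\mathcal{D}$, and write $\phi' = a'\pi/2$ for a second order $a'$. Using only $[\mathcal{U}, \mathcal{D}] = i/2\pi$ and the fact that every operator commutes with itself:

```latex
\begin{aligned}
[\mathcal{U}_a, \mathcal{U}_{a'}]
 &= [\cos\phi\,\mathcal{U} + \sin\phi\,\mathcal{D},\; \cos\phi'\,\mathcal{U} + \sin\phi'\,\mathcal{D}] \\
 &= \cos\phi\,\sin\phi'\,[\mathcal{U},\mathcal{D}] + \sin\phi\,\cos\phi'\,[\mathcal{D},\mathcal{U}] \\
 &= \frac{i}{2\pi}\,(\sin\phi'\cos\phi - \cos\phi'\sin\phi)
  = \frac{i}{2\pi}\,\sin(\phi' - \phi).
\end{aligned}
```

For a = 0 and a' = 1 this reduces to $[\mathcal{U}, \mathcal{D}] = i/2\pi$, and for a' = a it vanishes, as it must.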
Knowing the commutator of two operators allows one to deduce an uncertainty relation between the two representations associated with those operators. In particular, the above commutation relation leads to [Aytur and Ozaktas, 1995; Ozaktas and Aytur, 1995]

$$\sigma_{u_a}\,\sigma_{u_{a'}} \geq \frac{|\sin(\phi' - \phi)|}{4\pi}. \tag{46}$$
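Here $\sigma_{u_a}$ is the standard deviation of $|f_a(u_a)|^2$ and $\sigma_{u_{a'}}$ is the standard deviation of $|f_{a'}(u_{a'})|^2$.

The translation and phase shift operators can also be expressed in operator notation. Let $\mathcal{SH}(\xi)$ denote the operator which takes f(u) to $f(u - \xi)$ and let $\mathcal{PH}(\xi)$ denote the operator which takes f(u) to $\exp(i2\pi\xi u) f(u)$, all in the ordinary space domain. We may also define $\mathcal{SH}_a(\xi)$ and $\mathcal{PH}_a(\xi)$ as the operators which have the same effect on the ath order fractional Fourier transforms: $\mathcal{SH}_a(\xi)$ takes $f_a(u_a)$ to $f_a(u_a - \xi)$ and $\mathcal{PH}_a(\xi)$ takes $f_a(u_a)$ to $\exp(i2\pi\xi u_a) f_a(u_a)$. Then the third and fourth properties of Table 2 can be expressed as [Aytur and Ozaktas, 1995; Ozaktas and Aytur, 1995]

$$\mathcal{F}^a[\mathcal{SH}(\xi) f(u)] = e^{i\pi\xi^2\sin\phi\cos\phi}\,[\mathcal{PH}_a(-\xi\sin\phi)\,\mathcal{SH}_a(\xi\cos\phi) f]_a(u_a) = e^{-i\pi\xi^2\sin\phi\cos\phi}\,[\mathcal{SH}_a(\xi\cos\phi)\,\mathcal{PH}_a(-\xi\sin\phi) f]_a(u_a), \tag{47}$$

$$\mathcal{F}^a[\mathcal{PH}(\xi) f(u)] = e^{-i\pi\xi^2\sin\phi\cos\phi}\,[\mathcal{PH}_a(\xi\cos\phi)\,\mathcal{SH}_a(\xi\sin\phi) f]_a(u_a) = e^{i\pi\xi^2\sin\phi\cos\phi}\,[\mathcal{SH}_a(\xi\sin\phi)\,\mathcal{PH}_a(\xi\cos\phi) f]_a(u_a). \tag{48}$$

Here again the notation $[\mathcal{A}f]_a(u_a)$, where $\mathcal{A}$ is some operator, denotes the $u_a$ representation of $\mathcal{A}f$, which would be written as $\langle u_a|\mathcal{A}f\rangle$ in quantum mechanics. In operator form

$$\mathcal{SH}(\xi) = e^{i\pi\xi^2\sin\phi\cos\phi}\,\mathcal{PH}_a(-\xi\sin\phi)\,\mathcal{SH}_a(\xi\cos\phi) = e^{-i\pi\xi^2\sin\phi\cos\phi}\,\mathcal{SH}_a(\xi\cos\phi)\,\mathcal{PH}_a(-\xi\sin\phi), \tag{50}$$

$$\mathcal{PH}(\xi) = e^{-i\pi\xi^2\sin\phi\cos\phi}\,\mathcal{PH}_a(\xi\cos\phi)\,\mathcal{SH}_a(\xi\sin\phi) = e^{i\pi\xi^2\sin\phi\cos\phi}\,\mathcal{SH}_a(\xi\sin\phi)\,\mathcal{PH}_a(\xi\cos\phi). \tag{51}$$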
We again see that the effect of translation is a combination of translation (by $\cos\phi$) and phase multiplication (by $\sin\phi$) of the fractional Fourier transform. A similar comment applies to the effect of phase multiplication. When a = 1, these results reduce to the corresponding well-known properties of the ordinary Fourier transform. The fractional Fourier transform does not have a convolution or multiplication property of comparable simplicity to that of the ordinary Fourier transform.
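In each of the translation and phase-shift relations above, the two alternative forms differ only in the order in which the shift and the phase multiplication are applied in the ath domain, and the change of sign in the leading phase factor compensates for the fact that these two operations do not commute. A short check, using shorthand introduced here for illustration ($S_s$ takes $g(u_a)$ to $g(u_a - s)$ and $P_p$ multiplies by $\exp(i2\pi p u_a)$):

```latex
\begin{aligned}
(P_p S_s\, g)(u_a) &= e^{i2\pi p u_a}\, g(u_a - s), \\
(S_s P_p\, g)(u_a) &= e^{i2\pi p (u_a - s)}\, g(u_a - s)
                    = e^{-i2\pi p s}\,(P_p S_s\, g)(u_a).
\end{aligned}
```

For the translation case, $s = \xi\cos\phi$ and $p = -\xi\sin\phi$, so the extra factor is $e^{i2\pi\xi^2\sin\phi\cos\phi}$, which is exactly the ratio between the prefactors $e^{i\pi\xi^2\sin\phi\cos\phi}$ and $e^{-i\pi\xi^2\sin\phi\cos\phi}$ of the two orderings.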
VII. RELATION TO THE WIGNER DISTRIBUTION

The direct and simple relationship of the fractional Fourier transform to the Wigner distribution as well as to certain other phase-space distributions is perhaps its most important and elegant property [Mustard, 1989, 1996; Lohmann, 1993; Almeida, 1994; Mendlovic, Ozaktas, and Lohmann, 1994a; Ozaktas and others, 1994a]. Here we will define and briefly discuss some of the most important properties of the Wigner distribution. The Wigner distribution $W_f(u, \mu)$ of a function f(u) is defined as
$$W_f(u, \mu) = \int f(u + u'/2)\, f^*(u - u'/2)\, e^{-i2\pi\mu u'}\,du'. \tag{52}$$
$W_f(u, \mu)$ can also be expressed in terms of $F(\mu)$, or indeed as a function of any fractional transform of f(u). Some of its most important properties are

$$\int W_f(u, \mu)\,d\mu = |f(u)|^2, \tag{53}$$

$$\int W_f(u, \mu)\,du = |F(\mu)|^2, \tag{54}$$

$$\iint W_f(u, \mu)\,du\,d\mu = \int |f(u)|^2\,du. \tag{55}$$

Roughly speaking, $W_f(u, \mu)$ can be interpreted as a function that indicates the distribution of the signal energy over space and frequency. The Wigner distribution of $F(u)$ (the Fourier transform of f(u)) is a ninety-degree rotated version of the Wigner distribution of f(u). More on the Wigner distribution and other such distributions and representations may be found in Claasen and Mecklenbrauker [1980a,b,c, 1993], Hlawatsch and Boudreaux-Bartels [1992], and Cohen [1989, 1995]. Now, if $W_f(u, \mu)$ denotes the Wigner distribution of f(u), then the Wigner distribution of the ath fractional Fourier transform of f(u), denoted by $W_{f_a}(u, \mu)$, is given by
$$W_{f_a}(u, \mu) = W_f(u\cos\phi - \mu\sin\phi,\; u\sin\phi + \mu\cos\phi), \tag{56}$$

so that the Wigner distribution $W_{f_a}(u, \mu)$ is obtained from $W_f(u, \mu)$ by rotating it clockwise by an angle $\phi$. Let us define $\mathcal{R}_\phi$ to be the operator which rotates a function of $(u, \mu)$ by angle $\phi$ in the conventional counterclockwise direction. Then we can write

$$W_{f_a}(u, \mu) = \mathcal{R}_{-\phi}\, W_f(u, \mu). \tag{57}$$
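The rotation property can be checked numerically in the one case where the transform is known in closed form without further machinery, namely a = 1 (the ordinary Fourier transform). The following Python sketch evaluates the Wigner integral by direct quadrature for a shifted Gaussian and for its analytically known Fourier transform, then compares the second distribution with the first rotated clockwise by ninety degrees. The function name `wigner`, the quadrature limits, and the test signal are choices made here for illustration.

```python
import numpy as np

def wigner(f, u, mu, half_width=8.0, n_quad=2001):
    """Wigner distribution W_f(u, mu) of a callable f, by direct
    Riemann-sum quadrature of the defining integral over u'."""
    up = np.linspace(-half_width, half_width, n_quad)     # integration variable u'
    dup = up[1] - up[0]
    U, UP = np.meshgrid(u, up, indexing="ij")
    corr = f(U + UP / 2) * np.conj(f(U - UP / 2))          # shape (len(u), n_quad)
    kernel = np.exp(-2j * np.pi * np.outer(up, mu))        # shape (n_quad, len(mu))
    return (corr @ kernel).real * dup                      # imaginary part is numerical noise

xi = 0.7
g = lambda u: 2**0.25 * np.exp(-np.pi * (u - xi)**2)                         # shifted Gaussian
G = lambda u: 2**0.25 * np.exp(-np.pi * u**2) * np.exp(-2j * np.pi * xi * u)  # its Fourier transform

u = np.linspace(-2, 2, 41)     # symmetric grids so that -mu is again a grid point
mu = np.linspace(-2, 2, 41)
W_g = wigner(g, u, mu)
W_G = wigner(G, u, mu)

# For phi = pi/2 the rotation property reads W_G(u, mu) = W_g(-mu, u).
rotated = W_g[::-1, :].T
print(np.max(np.abs(W_G - rotated)))
```

The same array also illustrates the projection property: summing `W_g` over the frequency axis and multiplying by the grid spacing should recover $|g(u)|^2$ to within quadrature error.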
This elegant and fundamental property underlies a significant number of the applications of the fractional Fourier transform. In fact, some authors have defined the transform as that operation which corresponds to rotation of the Wigner distribution of a function [Lohmann, 1993]. Equation 56 can be derived directly from Equation 6 and the definition of the Wigner distribution given by Equation 52 [Ozaktas and others, 1994a]. The derivation is somewhat lengthy but straightforward. A similar derivation is given by Mustard [1989, 1996] and by Almeida [1994]. Lohmann [1993] shows the reverse, starting from the rotation property and arriving at Equation 6.

An at least equally important form of this result follows easily [Mustard, 1989, 1996; Lohmann and Soffer, 1994; Ozaktas and others, 1994a]. Let us recall Equations 53 and 54, which state that the integral projection of $W_f(u, \mu)$ onto the u axis is the magnitude square of the u-domain representation of the signal and that the integral projection of $W_f(u, \mu)$ onto the $\mu$ axis is the magnitude square of the $\mu$-domain representation of the signal. Now, let us rewrite the first of these equations for $f_a(u)$, the ath order fractional Fourier transform of f(u):

$$\int W_{f_a}(u, \mu)\,d\mu = |f_a(u)|^2. \tag{58}$$
Since $W_{f_a}(u, \mu)$ is simply $W_f(u, \mu)$ clockwise rotated by angle $\phi$, the integral projection of $W_{f_a}(u, \mu)$ onto the u axis is identical to the integral projection of $W_f(u, \mu)$ onto an axis making angle $\phi$ with the u axis. This new axis making angle $\phi = a\pi/2$ with the u axis is referred to as the $u_a$ axis. Let $\mathcal{RDN}_\phi$ denote the Radon transform operator, which maps a two-dimensional function of $(u, \mu)$ to its integral projection onto an axis making angle $\phi$ with the u axis [Bracewell, 1995]. Thus the above can be written as

$$\mathcal{RDN}_\phi[W_f(u, \mu)] = |f_a(u_a)|^2. \tag{59}$$
In conclusion, the integral projection of the Wigner distribution of a function onto the $u_a$ axis is equal to the magnitude square of the ath order fractional Fourier transform of the function (Fig. 4). Equations 53 and 54 are special cases with a = 0 and a = 1. Wood and Barry discussed what they referred to as the "Radon-Wigner transform" without realizing its relation to the fractional Fourier transform [Wood and Barry, 1994a,b]. The above discussion demonstrates that the Radon-Wigner transform is simply the magnitude squared of the fractional Fourier transform.

FIGURE 4. Oblique integral projections of the Wigner distribution.

The results of this section, and in particular Equation 59, continue to hold when $W_f(u, \mu)$ is the Wigner distribution of a random process since the expectation value operation can move inside the Radon transform and rotation operators:

$$\overline{W_{f_a}(u, \mu)} = \mathcal{R}_{-\phi}\,\overline{W_f(u, \mu)}, \tag{60}$$

$$\mathcal{RDN}_\phi[\overline{W_f(u, \mu)}] = \overline{|f_a(u_a)|^2}. \tag{61}$$
Here the overbars denote ensemble averages. The Wigner distribution is not the only time-frequency representation satisfying the rotation property (Equation 57). The ambiguity function also satisfies this property because the ambiguity function is the two-dimensional Fourier transform of the Wigner distribution, and the two-dimensional Fourier transform of the rotated version of a function is the rotated version of the two-dimensional Fourier transform of the original function [Ozaktas and others, 1994a; Almeida, 1994]. Almeida [1994] showed that the rotation property also holds for the spectrogram. It has been further shown that the rotation property generalizes to certain other time-frequency distributions belonging to the so-called Cohen class, whose members can be obtained from the Wigner distribution by convolving it with a kernel characterizing that distribution. The distributions for which the rotation property holds are those which have a rotationally symmetric kernel [Ozaktas, Erkaya, and Kutay, 1996a]. Thus, fractional Fourier transformation corresponds to rotation of many phase-space representations. This not only confirms the important role this
transform plays in the study of such representations but also supports the notion of referring to the axis making angle $\phi = a\pi/2$ with the u axis as the ath fractional Fourier domain. Despite this generalization, the only distribution which satisfies a relation of the form of Equation 59 is the Wigner distribution [Mustard, 1989]. Cohen [1989] argues that there is nothing special about the Wigner distribution among other members of the Cohen class, since all members of this class, including the Wigner distribution, are derivable from each other through convolution relations. However, the fact that only the Wigner distribution satisfies Equation 59 has led Mustard to suggest that the Wigner distribution is a specially distinguished member of the Cohen class [Mustard, 1989, 1996].

As an instructive application of the Wigner rotation property, we discuss the fractional Fourier transforms of chirp functions. The transform of the chirp function $\exp[i\pi(\chi u^2 + 2\xi u)]$ was given in Table 1 as a rather complicated expression. Phase space offers a much more transparent picture. The Wigner distribution of the chirp function $\exp[i\pi(\chi u^2 + 2\xi u)]$ is $W_f(u, \mu) = \delta(\mu - \chi u - \xi)$, which is simply a line delta in phase space along the line $\mu = \chi u + \xi$ making angle $\arctan(\chi)$ with the u axis (Fig. 5). The Wigner distribution of $\exp(i2\pi\xi u)$ is $W_f(u, \mu) = \delta(\mu - \xi)$ and is seen to be a special case in which the line delta is horizontal. Similarly, the Wigner distribution of $\delta(u - \xi)$ is $W_f(u, \mu) = \delta(u - \xi)$ and is also seen to be a special case in which the line delta is vertical. (That the chirp indeed behaves like a delta function as $\chi, \xi \to \infty$ can be shown by using Equation 17.) Thus, since harmonic functions and delta functions can be considered degenerate or limiting cases of chirp functions, it is possible to make the general statement that the fractional Fourier transform of a chirp function is always another chirp function. In phase space, we observe this as the rotation of line deltas into other line deltas.

FIGURE 5. Wigner distribution of a chirp function.
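The line-delta picture can be made quantitative. As a sketch, assume the rotation property of Equation 56 and track only the support of the distribution (constant amplitude factors are ignored). Substituting the rotated arguments into $W_f(u, \mu) = \delta(\mu - \chi u - \xi)$ gives

```latex
\begin{aligned}
W_{f_a}(u,\mu)
 &= \delta\bigl(u\sin\phi + \mu\cos\phi - \chi\,(u\cos\phi - \mu\sin\phi) - \xi\bigr) \\
 &= \delta\bigl((\cos\phi + \chi\sin\phi)\,\mu - (\chi\cos\phi - \sin\phi)\,u - \xi\bigr),
\end{aligned}
\qquad
\text{supported on } \mu = \chi' u + \xi', \quad
\chi' = \frac{\chi - \tan\phi}{1 + \chi\tan\phi}, \quad
\xi' = \frac{\xi}{\cos\phi + \chi\sin\phi}.
```

Since $\chi' = \tan(\arctan\chi - \phi)$, the line has been rotated clockwise by $\phi$. When $\cos\phi + \chi\sin\phi = 0$ the line becomes vertical, which is the delta-function limit mentioned above.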
VIII. FRACTIONAL FOURIER DOMAINS

Equations 57 and 59 immediately lead to the interpretation of oblique axes in phase space as fractional Fourier domains. Just as the projection of the Wigner distribution onto the space domain gives the magnitude square of the space-domain representation of the signal, and the projection of the Wigner distribution onto the frequency domain gives the magnitude square of the frequency-domain representation of the signal (Equations 53 and 54), the projection onto the axis making angle $\phi = a\pi/2$ with the u axis gives the magnitude square of the ath fractional Fourier-domain representation of the signal (Equation 59). When we need to be explicit we will use the variable $u_a$ as the coordinate variable in the ath domain, so that the representation of the signal f in the ath order fractional Fourier domain will be written as $f_a(u_a)$. We immediately recognize that the 0th and 1st domains are the ordinary space and frequency domains and that the 2nd and 3rd domains correspond to the negated space and frequency domains ($u_0 = u$, $u_1 = \mu$, $u_2 = -u$, $u_3 = -\mu$). The representation of the signal in the a'th domain is related to its representation in the ath domain through an (a' - a)th order fractional Fourier transformation:

$$f_{a'}(u_{a'}) = \int K_{a'-a}(u_{a'}, u_a)\, f_a(u_a)\,du_a. \tag{62}$$
When (a' - a) is an integer, this corresponds to a forward or inverse Fourier integral. The notion of fractional Fourier domains as oblique axes in space-frequency is also confirmed by the operational formulas presented earlier. We had seen that a translation in the $u_0$ domain corresponded to $\cos\phi$ much of the shift and $\sin\phi$ much phase shift in the ath domain. Likewise, we had seen that multiplication by u corresponded to $\cos\phi$ much of the multiplication and $\sin\phi$ much differentiation.
IX. DIFFERENTIAL EQUATIONS

Here we will see that the fractional Fourier transform $f_a(u)$ of a function is the solution of a differential equation, where $f_0(u)$ may be interpreted as the initial condition of the equation. This is the quantum-mechanical harmonic oscillator differential equation and also the equation governing optical propagation in quadratic graded-index media (in the former case the order parameter a corresponds to time and in the latter case it corresponds to the coordinate along the direction of propagation) [Agarwal and Simon, 1994; Ozaktas and Mendlovic, 1995]. In fact, in some sources the solution is written in the form of an integral transform whose kernel is sometimes referred to as the harmonic oscillator Green's function, without the authors knowing that this is the fractional Fourier transform. (To be precise, we must note that the differential equations governing these physical phenomena differ slightly from the equation we discuss, but this small difference is inconsequential.) The differential equation is
$$\left[-\frac{1}{4\pi}\frac{\partial^2}{\partial u^2} + \pi u^2 - \frac{1}{2}\right] f_a(u) = i\,\frac{2}{\pi}\,\frac{\partial f_a(u)}{\partial a}, \tag{63}$$

with the initial condition $f_0(u) = f(u)$. The solution $f_a(u)$ of the equation is the ath order fractional Fourier transform of f(u), as can be shown by direct substitution of Equation 6. Alternatively, and more instructively, we may take an eigenvalue equation approach. Substituting the form $f_a(u) = \exp(-i\beta a) f_0(u)$ in Equation 63, we obtain

$$\frac{d^2 f_0(u)}{du^2} + 4\pi^2\left(\frac{(4\beta/\pi) + 1}{2\pi} - u^2\right) f_0(u) = 0. \tag{64}$$

Comparing this equation with the standard equation

$$\frac{d^2 f(u)}{du^2} + 4\pi^2\left(\frac{2n + 1}{2\pi} - u^2\right) f(u) = 0, \tag{65}$$

whose solutions are well known as the Hermite-Gaussian functions $\psi_n(u)$, we conclude that the nth Hermite-Gaussian function is a solution of Equation 64 when $\beta = \beta_n = n\pi/2$. It is now possible to write down arbitrary solutions of the equation as a linear superposition of these eigenfunctions (modes). If the initial condition $f_0(u)$ is $\psi_n(u)$, the solution $f_a(u)$ is $\exp(-ian\pi/2)\,\psi_n(u)$; this is what it means to be an eigenfunction. Given an arbitrary initial condition $f_0(u)$, we can expand it in terms of the Hermite-Gaussian functions $\psi_n(u)$ as

$$f_0(u) = \sum_{n=0}^{\infty} C_n \psi_n(u), \tag{66}$$

$$C_n = \int \psi_n(u) f_0(u)\,du. \tag{67}$$
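Since Equation 63 is linear, the solution corresponding to this initial condition is readily obtained as

$$f_a(u) = \sum_{n=0}^{\infty} C_n\, e^{-ian\pi/2}\, \psi_n(u), \tag{68}$$

from which one can obtain

$$f_a(u) = \int \left[\sum_{n=0}^{\infty} e^{-ian\pi/2}\, \psi_n(u)\psi_n(u')\right] f_0(u')\,du', \tag{69}$$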
exactly as in Equation 32. The differential equation in question can also be written in terms of the operators $\mathcal{U}$ and $\mathcal{D}$ as follows:

$$\left[\pi(\mathcal{U}^2 + \mathcal{D}^2) - \frac{1}{2}\right] f_a(u) = i\,\frac{2}{\pi}\,\frac{\partial f_a(u)}{\partial a}, \tag{70}$$

or

$$\mathcal{H} f_a(u) = i\,\frac{2}{\pi}\,\frac{\partial f_a(u)}{\partial a}. \tag{71}$$

Readers with a background in quantum mechanics will readily recognize the Hamiltonian $\mathcal{H} = \pi(\mathcal{U}^2 + \mathcal{D}^2) - 1/2$ which characterizes the harmonic oscillator. This Hamiltonian is domain-invariant, which means that $\pi(\mathcal{U}_a^2 + \mathcal{D}_a^2)$ is the same operator regardless of the value of a, as can be readily shown by using Equation 44. This rotational invariance of the Hamiltonian ties in with the Wigner rotation property. (The extra -1/2 represents the inconsequential discrepancy mentioned earlier.) More on the relationship of the fractional Fourier transform to differential equations and their solutions is found in Namias [1980] and McBride and Kerr [1987].
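The domain invariance of the Hamiltonian is a short computation. As a sketch, assume the rotation relations $\mathcal{U}_a = \cos\phi\,\mathcal{U} + \sin\phi\,\mathcal{D}$ and $\mathcal{D}_a = -\sin\phi\,\mathcal{U} + \cos\phi\,\mathcal{D}$ (the inverse of the relation obtained from Equations 42 and 43). Keeping the operator order intact, since $\mathcal{U}$ and $\mathcal{D}$ do not commute:

```latex
\begin{aligned}
\mathcal{U}_a^2 &= \cos^2\phi\,\mathcal{U}^2 + \sin^2\phi\,\mathcal{D}^2
                 + \sin\phi\cos\phi\,(\mathcal{U}\mathcal{D} + \mathcal{D}\mathcal{U}), \\
\mathcal{D}_a^2 &= \sin^2\phi\,\mathcal{U}^2 + \cos^2\phi\,\mathcal{D}^2
                 - \sin\phi\cos\phi\,(\mathcal{U}\mathcal{D} + \mathcal{D}\mathcal{U}), \\
\mathcal{U}_a^2 + \mathcal{D}_a^2 &= \mathcal{U}^2 + \mathcal{D}^2 .
\end{aligned}
```

The cross terms cancel for every $\phi$, so $\pi(\mathcal{U}_a^2 + \mathcal{D}_a^2)$ does not depend on a.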
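X. HYPERDIFFERENTIAL FORM

The hyperdifferential form of the fractional Fourier transform operator is given by [Namias, 1980; Mustard, 1987a]

$$\mathcal{F}^a = e^{-i(a\pi/2)\mathcal{H}}, \qquad \mathcal{H} = \pi(\mathcal{U}^2 + \mathcal{D}^2) - \frac{1}{2}. \tag{72}$$

Applied to a function f(u) we may write

$$f_a(u) = \exp\left[-i\,\frac{a\pi}{2}\left(-\frac{1}{4\pi}\frac{d^2}{du^2} + \pi u^2 - \frac{1}{2}\right)\right] f(u). \tag{73}$$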
This hyperdifferential representation may be considered the formal solution of the differential equation written in the form of Equation 71. The operator $\mathcal{H}$ given in Equation 72 generates $f_a(u)$ for all values of a from $f_0(u) = f(u)$. The index additivity property and the special case a = 0 immediately follow from the exponential form $\exp(-i\phi\mathcal{H})$.
XI. DIGITAL SIMULATION OF THE TRANSFORM

Here we briefly discuss how the fractional Fourier transform may be computed on a digital computer, referring the reader to Ozaktas and others [1996b] for further details. The defining equation (Equation 6) can be put in the form

$$f_a(u) = A_\phi\, e^{i\pi\cot\phi\, u^2} \int e^{-i2\pi\csc\phi\, u u'} \left[e^{i\pi\cot\phi\, u'^2} f(u')\right] du'. \tag{74}$$
We assume that the representations $f_a(u_a)$ of the signal f in all fractional Fourier domains are approximately confined to the interval $[-\Delta u/2, \Delta u/2]$ (that is, a sufficiently large percentage of the signal energy is confined to these intervals). This assumption is equivalent to assuming that the Wigner distribution of f(u) is approximately confined within a circle of diameter $\Delta u$ (by virtue of Equation 59). Again, this means that a sufficiently large percentage of the energy of the signal is contained in that circle. We can ensure that this assumption is valid for any signal by choosing $\Delta u$ sufficiently large. Under this assumption and initially limiting the order a to the interval $0.5 \leq |a| \leq 1.5$, the modulated function $e^{i\pi\cot\phi\, u'^2} f(u')$ may be assumed to be approximately band-limited to $\pm\Delta u$ in the frequency domain. Thus $e^{i\pi\cot\phi\, u'^2} f(u')$ can be represented by Shannon's interpolation formula

$$e^{i\pi\cot\phi\, u'^2} f(u') = \sum_{n=-N}^{N-1} e^{i\pi\cot\phi\,(n/2\Delta u)^2}\, f\!\left(\frac{n}{2\Delta u}\right) \mathrm{sinc}\!\left(2\Delta u\left(u' - \frac{n}{2\Delta u}\right)\right), \tag{75}$$
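where $N = (\Delta u)^2$. The summation goes from -N to N - 1 since f(u') is assumed to be zero outside $[-\Delta u/2, \Delta u/2]$. By using Equation 75 and Equation 74 and changing the order of integration and summation, we obtain

$$f_a(u) = A_\phi\, e^{i\pi\cot\phi\, u^2} \sum_{n=-N}^{N-1} e^{i\pi\cot\phi\,(n/2\Delta u)^2}\, f\!\left(\frac{n}{2\Delta u}\right) \int e^{-i2\pi\csc\phi\, u u'}\, \mathrm{sinc}\!\left(2\Delta u\left(u' - \frac{n}{2\Delta u}\right)\right) du'. \tag{76}$$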
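By recognizing the integral to be equal to $(1/2\Delta u)\, e^{-i2\pi\csc\phi\, u (n/2\Delta u)}\, \mathrm{rect}(\csc(\phi)\, u/2\Delta u)$, we can write

$$f_a(u) = \frac{A_\phi}{2\Delta u} \sum_{n=-N}^{N-1} e^{i\pi\cot\phi\, u^2}\, e^{-i2\pi\csc\phi\, u (n/2\Delta u)}\, e^{i\pi\cot\phi\,(n/2\Delta u)^2}\, f\!\left(\frac{n}{2\Delta u}\right), \tag{77}$$

since $\mathrm{rect}(\csc(\phi)\, u/2\Delta u) = 1$ in the interval $|u| \leq \Delta u/2$. Then, the samples of $f_a(u)$ are given by

$$f_a\!\left(\frac{m}{2\Delta u}\right) = \frac{A_\phi}{2\Delta u} \sum_{n=-N}^{N-1} \exp\left(i\pi\left[\cot\phi\left(\frac{m}{2\Delta u}\right)^2 - 2\csc\phi\,\frac{mn}{(2\Delta u)^2} + \cot\phi\left(\frac{n}{2\Delta u}\right)^2\right]\right) f\!\left(\frac{n}{2\Delta u}\right), \tag{78}$$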
which is a finite summation allowing us to obtain the samples of the fractional transform $f_a(u)$ in terms of the samples of the original function f(u). Direct computation of Equation 78 would require $O(N^2)$ operations. A fast ($O(N \log N)$) algorithm can be obtained by putting Equation 78 into the following form:

$$f_a\!\left(\frac{m}{2\Delta u}\right) = \frac{A_\phi}{2\Delta u}\, e^{i\pi(\cot\phi - \csc\phi)(m/2\Delta u)^2} \sum_{n=-N}^{N-1} e^{i\pi\csc\phi\,((m-n)/2\Delta u)^2}\, e^{i\pi(\cot\phi - \csc\phi)(n/2\Delta u)^2}\, f\!\left(\frac{n}{2\Delta u}\right). \tag{79}$$
We now recognize that the summation is the convolution of $e^{i\pi\csc\phi\,(n/2\Delta u)^2}$ and the chirp-modulated function $f(\cdot)$. The convolution can be computed in $O(N \log N)$ time by using the fast Fourier transform (FFT). The output samples are then obtained by a final chirp modulation. Hence the overall complexity is $O(N \log N)$.

We had limited ourselves to $0.5 \leq |a| \leq 1.5$ in deriving the above algorithm. Using the index additivity property of the fractional Fourier transform we can extend this range to all values of a easily. For instance, for the range 0 < a < 0.5, we can write

$$\mathcal{F}^a = \mathcal{F}^{a-1+1} = \mathcal{F}^{a-1}\mathcal{F}^1. \tag{80}$$

Since 0.5 < |a - 1| < 1, we can use the above algorithm in conjunction with the ordinary Fourier transform to compute $f_a(u)$. The overall complexity remains at $O(N \log N)$.
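The sampled sum and its chirp-convolution form can be sketched in a few lines of Python with NumPy. This is an illustrative sketch under stated assumptions rather than the full algorithm of Ozaktas and others [1996b]: it assumes the input is already sampled at spacing $1/(2\Delta u)$ on $[-\Delta u/2, \Delta u/2)$, takes $A_\phi = \sqrt{1 - i\cot\phi}$ (principal branch) as the amplitude factor of Equation 6, covers only $0.5 \leq |a| \leq 1.5$, and does not implement the range extension of Equation 80. The function names are hypothetical.

```python
import numpy as np

def _setup(f, a, delta_u):
    if not 0.5 <= abs(a) <= 1.5:
        raise ValueError("this sketch only covers 0.5 <= |a| <= 1.5")
    N = int(round(delta_u**2))
    f = np.asarray(f, dtype=complex)
    if f.size != 2 * N:
        raise ValueError("expected 2N samples with N = delta_u**2")
    phi = a * np.pi / 2
    cot, csc = np.cos(phi) / np.sin(phi), 1.0 / np.sin(phi)
    amp = np.sqrt(1 - 1j * cot)              # assumed form of A_phi
    x = np.arange(-N, N) / (2 * delta_u)     # sample points n / (2 delta_u)
    return N, f, cot, csc, amp, x

def frft_direct(f, a, delta_u):
    """O(N^2) evaluation of the sampled double-chirp sum."""
    N, f, cot, csc, amp, x = _setup(f, a, delta_u)
    phase = (cot * x[:, None]**2 - 2 * csc * x[:, None] * x[None, :]
             + cot * x[None, :]**2)
    return amp / (2 * delta_u) * (np.exp(1j * np.pi * phase) @ f)

def frft_chirp(f, a, delta_u):
    """O(N log N) evaluation: chirp multiply, chirp convolve by FFT, chirp multiply."""
    N, f, cot, csc, amp, x = _setup(f, a, delta_u)
    pre = np.exp(1j * np.pi * (cot - csc) * x**2)
    k = np.arange(-2 * N + 1, 2 * N)                       # every possible m - n
    h = np.exp(1j * np.pi * csc * (k / (2 * delta_u))**2)
    L = h.size + f.size - 1                                # full linear-convolution length
    full = np.fft.ifft(np.fft.fft(h, L) * np.fft.fft(pre * f, L))
    conv = full[2 * N - 1:4 * N - 1]                       # entries for m = -N .. N-1
    return amp / (2 * delta_u) * pre * conv
```

A convenient sanity check is the Gaussian $2^{1/4} e^{-\pi u^2}$, which is the zeroth Hermite-Gaussian function with eigenvalue 1 and should therefore be returned essentially unchanged for any admissible order. Zero-padding the FFTs to the full linear-convolution length avoids circular wrap-around, at the cost of transforms about three times longer than the input.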
XII. APPLICATIONS TO WAVE AND BEAM PROPAGATION
A considerable number of papers have been written on the application of the fractional Fourier transform to wave and beam propagation problems, mostly in an optical context. Our presentation will also be phrased in the notation and terminology of optics. Nevertheless, the reader should have no difficulty translating the results to other propagation, diffraction, and scattering phenomena which are mathematically equivalent or similar.

Whenever we can express the result of an optical problem (such as Fraunhofer diffraction) in terms of a Fourier transform, we tend to think of this as a simple and elegant result. This is justified by the fact that the Fourier transform has many simple and useful properties which make it attractive to work with. The Fourier transform and image occur at certain privileged planes in an optical system. Often all our intuition about what happens in between these planes is that the amplitude distribution is given by a complicated integral. We will see below that the distribution of light at intermediate planes can be expressed in terms of the fractional Fourier transform (which also has several useful properties and operational formulas). Thus the fractional Fourier transform completes in a very natural way the study of optical systems often called "Fourier optics."

Fourier optical systems can be analyzed using geometrical optics, Fresnel integrals (spherical wave expansions), plane wave expansions, Hermite-Gaussian beam expansions, and, as we will discuss, fractional Fourier transforms. The several approaches prove useful in different situations and provide different viewpoints which complement each other. The fractional Fourier transform approach is appealing in that it describes the continuous evolution of the wave as it propagates through the system.

A. Introduction
Optical systems involving an arbitrary sequence of thin lenses separated by arbitrary sections of free-space (under the Fresnel approximation) belong to the class of quadratic-phase systems. Mathematically, quadratic-phase systems are equivalent to linear canonical transforms [Wolf, 1979]. Systems containing arbitrary sections of quadratic graded-index media also belong to this class. The class of Fourier optical systems (or first order optical systems) consists of arbitrary thin filters sandwiched between arbitrary quadratic-phase systems. Members of the class of quadratic-phase systems are characterized by linear transformations of the form [Bastiaans, 1978, 1979a, 1979b, 1989, 1991; Nazarathy and Shamir, 1982; Ozaktas and Mendlovic, 1995]

$$f_{\mathrm{out}}(u) = \int h(u, u')\, f_{\mathrm{in}}(u')\,du', \qquad h(u, u') = K' \exp[i\pi(\alpha u^2 - 2\beta u u' + \gamma u'^2)], \tag{81}$$
where $K'$ is a complex constant and $\alpha$, $\beta$, and $\gamma$ are real constants. Comparing this equation with Equation 6, we see that fractional Fourier transforms are a special case of quadratic-phase systems. Until this point, all variables have been considered to be dimensionless and were denoted by u, $\mu$, etc., and all functions and kernels took dimensionless arguments and were denoted by f(u), g(u), h(u, u'), etc. In optical applications we will often employ variables with the dimensions of length or inverse length, which we will denote by x, p, etc. Functions and kernels taking such arguments will be distinguished as $\hat{f}(x)$, $\hat{g}(x)$, $\hat{h}(x, x')$, etc. With these conventions, Equation 81 can be rewritten as

$$\hat{f}_{\mathrm{out}}(x) = \int \hat{h}(x, x')\, \hat{f}_{\mathrm{in}}(x')\,dx', \qquad \hat{h}(x, x') = K \exp\left[\frac{i\pi}{s^2}(\alpha x^2 - 2\beta x x' + \gamma x'^2)\right], \tag{82}$$
where $K = K'/s$, s is a constant with the dimension of length, and $\hat{f}_{\mathrm{out}}(x) = f_{\mathrm{out}}(x/s)$, etc. The choice of s essentially corresponds to the choice of units. We will assume s is specified once and for all throughout our analysis. The kernels associated with a thin lens with focal length f and free-space propagation over a distance d are given respectively by [Saleh and Teich, 1991]
$\lambda$ is the wavelength of light in free space and $K_{\mathrm{lens}}$ and $K_{\mathrm{space}}$ are constants. We continue to work with one-dimensional notation for simplicity, although most optical systems are two-dimensional. These kernels are special cases of the kernel given in Equation 81. It is possible to prove that any arbitrary concatenation of kernels of this form will result in a kernel of the form given in Equation 82. Apart from the constant factor K, which has no effect on the resulting spatial distribution, a member of the class of quadratic-phase systems is completely specified by the three parameters $\alpha$, $\beta$, and $\gamma$ (Equation 81). Alternatively, such a system can also be completely specified by the transformation matrix [Bastiaans, 1989; Nazarathy and Shamir, 1982; Ozaktas and Mendlovic, 1995]

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} \gamma/\beta & s^2/\beta \\ (-\beta + \alpha\gamma/\beta)/s^2 & \alpha/\beta \end{bmatrix}, \tag{85}$$
with AD - BC = 1. Here again s is the scale factor relating our dimensional and dimensionless variables. If several systems, each characterized by such a matrix, are cascaded, the matrix characterizing the overall system can be found by multiplying the matrices of the several systems. The matrix defined above also corresponds to the well-known ray matrix employed in ray optical analysis. At a certain plane perpendicular to the optical axis, a ray can be characterized by its distance x from the optical axis and its paraxial angle of inclination $\theta$. We will define the ray vector as $[x\ \ p]^T$ where $p = \theta/\lambda$. Then, the ray vector at the output is related to the ray vector at the input by

$$\begin{bmatrix} x_{\mathrm{out}} \\ p_{\mathrm{out}} \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} x_{\mathrm{in}} \\ p_{\mathrm{in}} \end{bmatrix}. \tag{86}$$
The matrices corresponding to a thin lens and a section of free-space are given respectively by
and
Quadratic graded-index media exhibit a parabolic refractive index profile n(x) about the optical axis, characterized by the two parameters $n_0$ and $\chi$ as follows:
The matrix corresponding to quadratic graded-index media is given by
where d is the length of the medium and $d_0 = \pi\chi/2$. This matrix can be derived by a simple application of the ray equation [Saleh and Teich, 1991]. The space spanned by the coordinates x and p also constitutes a phase-space which directly corresponds to the space-frequency plane on which the Wigner distribution was defined. A particular ray characterized by its phase-space vector $[x\ \ p]^T$ will be mapped to another according to its ABCD matrix given above. If we consider a bundle of rays constituting a region in this phase-space, this region will likewise be transformed according to the same matrix. For instance, let us consider the rectangular bundle consisting of rays whose intercepts lie between $-x_0$ and $x_0$ and whose inclinations lie between $-p_0 = -\theta_0/\lambda$ and $p_0 = \theta_0/\lambda$ (Fig. 6a). If this bundle passes through a lens, it will be transformed according to Equation 87 into the bundle shown in Fig. 6b. If this bundle passes through a section of free-space, it will be transformed according to Equation 88 into the bundle shown in Fig. 6c. More generally, for arbitrary ABCD it will be transformed into a bundle of the general form shown in Fig. 6d. This mapping of phase-space regions can also be posed in terms of the Wigner distribution. It is known that if $\hat{f}_{\mathrm{out}}$ is the linear canonical transform of $\hat{f}_{\mathrm{in}}$ with parameters ABCD, then the Wigner distribution of $\hat{f}_{\mathrm{out}}$ is related to that of $\hat{f}_{\mathrm{in}}$ by the relation [Bastiaans, 1979b]
where $x_{\mathrm{out}}$, $p_{\mathrm{out}}$ and $x_{\mathrm{in}}$, $p_{\mathrm{in}}$ are related according to Equation 86. This is simply a generalization of the Wigner rotation property discussed in Section VII. In particular, since we know from this property that fractional Fourier transformation corresponds to rotation in phase-space, it follows that the ABCD matrix for fractional Fourier transforms should be the rotation matrix.
B. Quadratic-Phase Systems as Fractional Fourier Transforms

As is evident by comparing Equations 6 and 81, the one-parameter class of fractional Fourier transforms is a subclass of the class of three-parameter quadratic-phase systems. If we allow an additional magnification parameter M and a phase curvature parameter 1/R, the family of fractional Fourier transforms will now also have three parameters and can be put in one-to-one correspondence with the family of quadratic-phase systems. The kernel of this three-parameter transform may be written as

$$\hat{f}_{\mathrm{out}}(x) = \int \hat{h}(x, x')\, \hat{f}_{\mathrm{in}}(x')\,dx',$$

$$\hat{h}(x, x') = K_\phi \exp(i\pi x^2/\lambda R)\, \exp\left[\frac{i\pi}{s^2}\left(\frac{x^2}{M^2}\cot\phi - 2\,\frac{x x'}{M}\csc\phi + x'^2\cot\phi\right)\right], \tag{92}$$

FIGURE 6. Effect of quadratic-phase systems in phase-space. Panels (a) to (d).
which is in the form of a quadratic-phase system. (The pure mathematical form given by Equation 6 is recovered by setting $x/s = u$, $M = 1$, $R = \infty$.) This kernel maps a function $f(x/s)$ into $K' \exp(i\pi x^2/\lambda R)\, f_a(x/sM)$, where $f_a(u)$ is the $a$th order fractional Fourier transform of $f(u)$. Here $\phi = a\pi/2$ as before, $M > 0$ is referred to as the magnification associated with the transform, and $R$ is the radius of the spherical surface on which the perfect fractional Fourier transform is observed. When $R = \infty$, the quadratic-phase term disappears and the perfect fractional Fourier transform is observed on a planar surface. The above family of kernels is in one-to-one correspondence with the family of kernels given in Equation 81. The parameters $\alpha$, $\beta$, and $\gamma$ are related to the parameters $\phi$, $M$, and $R$ through the relations

$$\alpha = \frac{\cot\phi}{M^2} + \frac{s^2}{\lambda R}, \tag{93}$$
$$\beta = \frac{\csc\phi}{M}, \tag{94}$$
$$\gamma = \cot\phi. \tag{95}$$
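Alternatively, the ABCD parameters are related to $\phi$, $M$, and $R$ through the relations

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} M\cos\phi & s^2 M \sin\phi \\[4pt] -\dfrac{\sin\phi}{s^2 M} + \dfrac{M\cos\phi}{\lambda R} & \dfrac{\cos\phi}{M} + \dfrac{s^2 M \sin\phi}{\lambda R} \end{bmatrix}, \tag{96}$$

which can be inverted to yield

$$\tan\phi = \frac{1}{s^2}\,\frac{B}{A}, \tag{97}$$
$$M = \sqrt{A^2 + (B/s^2)^2}, \tag{98}$$
$$\frac{1}{\lambda R} = \frac{1}{s^4}\,\frac{B/A}{A^2 + (B/s^2)^2} + \frac{C}{A}. \tag{99}$$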
The above result essentially means that any quadratic-phase system can be interpreted as a magnified fractional Fourier transform, perhaps with a residual phase curvature. Since a relatively large class of optical systems can be modeled as quadratic-phase systems, these systems can also be interpreted as fractional Fourier transforms [Ozaktas and Mendlovic, 1996]. We will first consider two elementary examples, propagation in quadratic graded-index media and diffraction in free-space, and then treat the more general case of arbitrary composition of thin lenses and sections of free-space.

C. Propagation in Quadratic Graded-Index Media

Quadratic graded-index media have a natural and direct relationship with the fractional Fourier transform. Light is simply fractional Fourier transformed as it propagates through such media. The refractive index distribution for such media was already given in Equation 89. The ABCD matrix for graded-index media was given as Equation 90. Comparing this with Equation 96, we immediately conclude that propagation through a section of graded-index media results in a fractional Fourier transform of order $a = 2d/\pi\chi \equiv d/d_0$, provided the scale parameter $s$ is chosen such that $s^2 = \lambda\chi/n_0$. With this choice of $s$, there is no magnification ($M = 1$) and no residual phase curvature ($R = \infty$). Recalling the comment at the end of Subsection A, we conclude that as light propagates through quadratic graded-index media, its Wigner distribution rotates. Quadratic graded-index media realize fractional Fourier transforms in their purest and simplest form. If the distribution of light at the input plane is given by $f(x/s)$, then the distribution of light at the output plane is proportional to $f_a(x/s)$, where the transform order $a$ increases linearly with distance of propagation. The same result can be arrived at by starting from the Helmholtz equation for quadratic graded-index media, finding its modes (which are the Hermite-Gaussian functions), and constructing arbitrary solutions as linear superpositions of these modes. This approach may be found in Mendlovic and Ozaktas [1993] and Ozaktas and Mendlovic [1993a,b].
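The linear growth of the order with propagation distance can be checked with a few lines of code. This is a sketch under stated assumptions: the matrix used is Equation 96 specialized to $M = 1$, $R = \infty$ (a rotation scaled by $s^2$), the units are normalized so that $s = 1$ and $d_0 = 1$, and the function names are hypothetical. It confirms that two consecutive sections of lengths $d_1$ and $d_2$ act as a single transform of order $(d_1 + d_2)/d_0$.

```python
import math

def frt_matrix(order, s):
    """ABCD matrix of an unmagnified, curvature-free transform of the given order."""
    phi = order * math.pi / 2.0
    return ((math.cos(phi), s**2 * math.sin(phi)),
            (-math.sin(phi) / s**2, math.cos(phi)))

def matmul(second, first):
    """Matrix of 'first' followed by 'second' (2x2 product second * first)."""
    (a2, b2), (c2, d2) = second
    (a1, b1), (c1, d1) = first
    return ((a2 * a1 + b2 * c1, a2 * b1 + b2 * d1),
            (c2 * a1 + d2 * c1, c2 * b1 + d2 * d1))

def grin_order(length, d0):
    """Order a = d / d0 for a graded-index section of the given length."""
    return length / d0

# Normalized, hypothetical units: d0 = 1 and s = 1.
d0, s = 1.0, 1.0
a1 = grin_order(0.30, d0)
a2 = grin_order(0.45, d0)

cascaded = matmul(frt_matrix(a2, s), frt_matrix(a1, s))
direct = frt_matrix(a1 + a2, s)
```

The agreement of `cascaded` and `direct` is simply the angle-addition property of rotations, which is why the order is additive along the medium.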
D. Fresnel Diffraction

Although quadratic graded-index media are perfectly matched to the fractional Fourier transform, it is of interest to discuss the more basic problem of diffraction from a planar screen with complex amplitude transmittance $f_{\mathrm{in}}(x)$. The complex amplitude distribution $f_{\mathrm{out}}(x)$ of light in a diffraction plane at distance $d$ is given by the Fresnel integral (Equation 84):

$$f_{\mathrm{out}}(x) = \int h_{\mathrm{space}}(x, x')\, f_{\mathrm{in}}(x')\, dx', \qquad h_{\mathrm{space}}(x, x') = K_{\mathrm{space}} \exp\!\left[\frac{i\pi (x - x')^2}{\lambda d}\right], \tag{100}$$

assuming illumination of the screen by a uniform plane wave. Now, with $f(x/s) = f_{\mathrm{in}}(x)$, it is possible to cast this integral in the form of the integral of Equation 92 by identifying $\phi = a\pi/2 = \arctan(\lambda d/s^2)$, $M = \sqrt{1 + \lambda^2 d^2/s^4}$, and $\lambda R = (s^4 + \lambda^2 d^2)/\lambda d$. (The same results can be arrived at by comparing Equation 96 with Equation 88, or by specializing Equations 97, 98, and 99 to $A = D = 1$, $B = \lambda d$, $C = 0$.) At a distance $d$ from the diffracting object or aperture, we observe the $a$th order fractional Fourier transform of the object on a spherical reference surface with radius $R$. The transform is magnified by $M$. As $d$ is increased from 0 to $\infty$, the order $a$ of the fractional transform increases according to $a = (2/\pi)\arctan(\lambda d/s^2)$ from 0 to 1 (Fig. 7). Letting $d \to \infty$, we obtain $a = 1$, $M = \lambda d/s^2 \propto d$, and $R = d$, which we readily associate with the Fourier diffraction pattern, which is nothing but the Fourier transform of the diffracting screen. Note that in this limit, the magnification and radius of curvature are both proportional to the distance $d$.

Thus we see that the propagation of light along the $+z$ direction can be viewed as a process of continual fractional Fourier transformation. As light propagates, its distribution evolves through fractional transforms of increasing orders. The fact that the far-field diffraction pattern is the Fourier transform of the diffracting object is one of the central results of diffraction theory. We have shown that the field at closer distances is given by fractional Fourier transforms of the diffracting object.

More generally, there exists a fractional Fourier transform relation between the amplitude distribution of light on two spherical reference surfaces of given radii and separation [Ozaktas and Mendlovic, 1994, 1995]. It is possible to determine the order and scale parameters associated with this fractional transform given the radii and separation of the surfaces. Alternatively, given the desired order and scale parameters, it is possible to determine the necessary radii and separation.
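As a quick numerical illustration of the Fresnel case, the sketch below evaluates the order, magnification, and radius of curvature obtained by specializing Equations 97 to 99 to free space ($A = D = 1$, $B = \lambda d$, $C = 0$). The function name and the numerical values are hypothetical, and $d > 0$ is assumed.

```python
import math

def fresnel_frt_parameters(wavelength, s, d):
    """Order a, magnification M, and radius R observed at distance d > 0."""
    ratio = wavelength * d / s**2                # the dimensionless quantity lambda*d/s^2
    order = (2.0 / math.pi) * math.atan(ratio)   # a = (2/pi) arctan(lambda*d/s^2)
    magnification = math.sqrt(1.0 + ratio**2)    # M = sqrt(1 + (lambda*d/s^2)^2)
    # From lambda*R = (s^4 + lambda^2 d^2) / (lambda*d):
    radius = (s**4 + wavelength**2 * d**2) / (wavelength**2 * d)
    return order, magnification, radius

# Hypothetical values: 0.5 um wavelength and a 1 mm scale parameter.
wavelength, s = 0.5e-6, 1.0e-3

# At lambda*d/s^2 = 1 the order is one half, M = sqrt(2), and R = 2d.
d_half = s**2 / wavelength
half_order = fresnel_frt_parameters(wavelength, s, d_half)

# Far from the screen the order tends to 1, M to lambda*d/s^2, and R to d.
d_far = 1000.0 * d_half
far_field = fresnel_frt_parameters(wavelength, s, d_far)
```

The half-order plane sits at $d = s^2/\lambda$, so the choice of scale parameter $s$ fixes where along the axis each fractional order is observed.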
FIGURE 7. $a = (2/\pi)\arctan(\lambda d/s^2)$ as a function of $\lambda d/s^2$.
E. Fourier Optical Systems

We will now see that any quadratic-phase system can always be interpreted as a fractional Fourier transforming system. Thus the fractional Fourier transform can describe all systems composed of an arbitrary number of lenses separated by arbitrary distances, whereas imaging and Fourier transforming systems are only special cases. All of the essential ingredients have already been provided and it merely remains to state the result. Any quadratic-phase system (Equation 82) can be characterized by its ABCD parameters. As discussed earlier, the kernel in Equation 82 can be cast in the form of the kernel in Equation 92 by identifying $\phi = a\pi/2$, $M$, and $R$ according to Equations 97, 98, and 99. Since any quadratic-phase system can be interpreted as a fractional Fourier transform, and since any optical system consisting of an arbitrary concatenation of lenses and sections of free-space can be modeled as a quadratic-phase system, it follows that such an optical system can also be interpreted in terms of the fractional Fourier transform.

A concrete example will be useful. Figure 8a shows a system consisting of several lenses whose focal lengths have been indicated in meters. The input plane is taken as $z = 0$. The output plane is variable, ranging from $z = 0$ to $z = 2$ m. Two rays have been drawn through the system. Let $A(z)$, $B(z)$, $C(z)$, $D(z)$ denote the ABCD parameters of the section of the system occupying the interval $[0, z]$, which can be readily calculated using the matrices for lenses and sections of free-space and the concatenation property. Also let $[x(z)\ \ p(z)]^{\mathrm{T}}$ denote the ray vector at $z$. Then,

$$\begin{bmatrix} x(z) \\ p(z) \end{bmatrix} = \begin{bmatrix} A(z) & B(z) \\ C(z) & D(z) \end{bmatrix} \begin{bmatrix} x(0) \\ p(0) \end{bmatrix}.$$
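The concatenation property and the inversion formulas lend themselves to a short computation. The sketch below is illustrative only: the element matrices are assumed to be $[1, \lambda d; 0, 1]$ for free space and $[1, 0; -1/\lambda f, 1]$ for a thin lens, the wavelength is a hypothetical value, and the function names are not from the chapter. The curvature is evaluated as $1/\lambda R = (AC + BD/s^4)/M^2$, which follows algebraically from Equation 99 when $AD - BC = 1$ and avoids the division by $A$ at planes where $A = 0$. The order is returned as a principal value from `atan2`; tracking the accumulated order along $z$ would require unwrapping, which is not attempted here.

```python
import math

def matmul(second, first):
    """Matrix of 'first' followed by 'second' (2x2 product second * first)."""
    (a2, b2), (c2, d2) = second
    (a1, b1), (c1, d1) = first
    return ((a2 * a1 + b2 * c1, a2 * b1 + b2 * d1),
            (c2 * a1 + d2 * c1, c2 * b1 + d2 * d1))

def free_space(wavelength, distance):
    return ((1.0, wavelength * distance), (0.0, 1.0))

def thin_lens(wavelength, focal_length):
    return ((1.0, 0.0), (-1.0 / (wavelength * focal_length), 1.0))

def system_matrix(elements):
    """Concatenate elements listed in the order in which light meets them."""
    total = ((1.0, 0.0), (0.0, 1.0))
    for element in elements:
        total = matmul(element, total)
    return total

def frt_parameters(matrix, wavelength, s):
    """Principal-value order a, magnification M, and curvature 1/R."""
    (a, b), (c, d) = matrix
    phi = math.atan2(b / s**2, a)            # tan(phi) = B / (s^2 A), with M > 0
    magnification = math.hypot(a, b / s**2)  # Equation 98
    inv_lambda_r = (a * c + b * d / s**4) / magnification**2
    return 2.0 * phi / math.pi, magnification, wavelength * inv_lambda_r

# Hypothetical 2f system: f = 0.2 m, so the output plane is at z = 0.4 m.
wavelength, f = 0.5e-6, 0.2
s = math.sqrt(wavelength * f)
two_f = system_matrix([free_space(wavelength, f),
                       thin_lens(wavelength, f),
                       free_space(wavelength, f)])
order, magnification, curvature = frt_parameters(two_f, wavelength, s)
```

With $s^2 = \lambda f$ the 2f system yields order 1 with unit magnification and no residual curvature, that is, an ordinary Fourier transform.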
We further let $\phi(z) = a(z)\pi/2$, $M(z)$, and $R(z)$ represent the order, magnification, and phase curvature of the fractional Fourier transform observed at $z$. These can again be determined by using Equations 97, 98, and 99. The fractional transform order $a(z)$, the scale parameter $M(z)$, and the radius $R(z)$ of the spherical surface on which the perfect transform is observed are plotted as functions of $z$ in Fig. 8 [Ozaktas and Erden, 1997]. Letting $j$ denote an arbitrary integer, when $a = 4j$ we observe an erect image, when $a = 4j + 2$ we observe an inverted image, when $a = 4j + 1$ we observe the common Fourier transform, and when $a = 4j - 1$ we observe an inverted Fourier transform (which is the same as an inverse Fourier transform). The reader should study the behavior of the two rays in conjunction with the graphs in Fig. 8. At $z = 0.4$ we obtain a conventional Fourier transform ($a = 1$) as a result of the conventional 2f system occupying the interval $[0, 0.4]$. An inverted image ($a = 2$) is observed at $z \approx 0.65$. We see that $M < 1$ and $R > 0$, as confirmed by an examination of the rays. (The ray represented by the solid line crosses the $z = 0.65$ plane at a negative value (implying an inverted image) smaller than unity in magnitude (implying $M < 1$), with a slope indicating divergence (implying $R > 0$).) An inverted Fourier transform ($a = 3$) is observed at $z \approx 1.2$, almost coincident with the lens at that location. An erect image ($a = 4$) is observed at $z \approx 1.4$, immediately after the lens at that location. The field curvature $1/R$ of this image has a very small negative value and the magnification $M$ is slightly smaller than 1. The imaging systems discussed in Bernardo and Soares [1994b] provide additional useful examples which the reader may wish to study in a similar manner [Ozaktas and Erden, 1997].

FIGURE 8. Evolution of $a(z)$, $M(z)$, $1/R(z)$ as functions of $z$ [Reprinted from Optics Communications, 143, 75-83, 1997 with kind permission from Elsevier Science-NL, Sara Burgerhardtstraat 25, 1055 KV Amsterdam, The Netherlands.].

Fourier optical systems consist of an arbitrary number of thin filters sandwiched between arbitrary quadratic-phase systems. It readily follows that any Fourier optical system can be modeled as filters sandwiched between fractional Fourier transform stages, or as repeated filtering in consecutive fractional Fourier domains (see Section XIII) [Ozaktas and Mendlovic, 1996].

F. Optical Implementation of the Fractional Fourier Transform

Here we mention a number of systems which map $f_{\mathrm{in}}(x) = f(x/s)$ into $f_{\mathrm{out}}(x) \propto f_a(x/s)$. Conceptually simplest is to use a section of quadratic graded-index media of length $d = \chi(a\pi/2) = a d_0$ with $s^2 = \lambda\chi/n_0$ (Subsection C). In practice, systems consisting of bulk lenses may be preferred. Two such systems were first presented by Lohmann [1993]. We present these systems without derivation, referring the reader to Lohmann [1993] and Ozaktas and Mendlovic [1995] for details. The first system consists of a section of free-space of length $d$ followed by a lens of focal length $f$ followed by a second section of free-space of length $d$. To obtain an $a$th order fractional Fourier transform with scale parameter $s$, we must choose $d$ and $f$ according to
$$d = \frac{s^2}{\lambda}\tan(\phi/2), \qquad f = \frac{s^2}{\lambda}\,\frac{1}{\sin\phi}.$$

The second system consists of a lens of focal length $f$ followed by a section of free-space of length $d$ followed by a second lens of focal length $f$. This time $d$ and $f$ must be chosen according to

$$d = \frac{s^2}{\lambda}\sin\phi, \qquad f = \frac{s^2}{\lambda}\,\frac{1}{\tan(\phi/2)}.$$
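The two lens systems can be checked against the rotation matrix by direct multiplication. The sketch below assumes Lohmann's design relations in the form $d = (s^2/\lambda)\tan(\phi/2)$, $f = (s^2/\lambda)/\sin\phi$ for the first system and $d = (s^2/\lambda)\sin\phi$, $f = (s^2/\lambda)/\tan(\phi/2)$ for the second, together with the element matrices $[1, \lambda d; 0, 1]$ and $[1, 0; -1/\lambda f, 1]$. The function names and numerical values are hypothetical, and orders are restricted to $0 < a < 2$ so that $\sin\phi \neq 0$.

```python
import math

def matmul(second, first):
    """Matrix of 'first' followed by 'second' (2x2 product second * first)."""
    (a2, b2), (c2, d2) = second
    (a1, b1), (c1, d1) = first
    return ((a2 * a1 + b2 * c1, a2 * b1 + b2 * d1),
            (c2 * a1 + d2 * c1, c2 * b1 + d2 * d1))

def free_space(wavelength, distance):
    return ((1.0, wavelength * distance), (0.0, 1.0))

def thin_lens(wavelength, focal_length):
    return ((1.0, 0.0), (-1.0 / (wavelength * focal_length), 1.0))

def lohmann_type_one(order, wavelength, s):
    """(d, f) for free space, lens, free space."""
    phi = order * math.pi / 2.0
    scale = s**2 / wavelength
    return scale * math.tan(phi / 2.0), scale / math.sin(phi)

def lohmann_type_two(order, wavelength, s):
    """(d, f) for lens, free space, lens."""
    phi = order * math.pi / 2.0
    scale = s**2 / wavelength
    return scale * math.sin(phi), scale / math.tan(phi / 2.0)

def normalized(matrix, s):
    """Rescale B and C by s^2 so the result can be compared with a pure rotation."""
    (a, b), (c, d) = matrix
    return ((a, b / s**2), (c * s**2, d))

# Hypothetical design: order 0.5, 0.5 um wavelength, 1 mm scale parameter.
order, wavelength, s = 0.5, 0.5e-6, 1.0e-3
phi = order * math.pi / 2.0

d1, f1 = lohmann_type_one(order, wavelength, s)
type_one = matmul(free_space(wavelength, d1),
                  matmul(thin_lens(wavelength, f1), free_space(wavelength, d1)))

d2, f2 = lohmann_type_two(order, wavelength, s)
type_two = matmul(thin_lens(wavelength, f2),
                  matmul(free_space(wavelength, d2), thin_lens(wavelength, f2)))

rotation = ((math.cos(phi), math.sin(phi)), (-math.sin(phi), math.cos(phi)))
```

Both `normalized(type_one, s)` and `normalized(type_two, s)` should agree with `rotation`, corresponding to $M = 1$ and $R = \infty$ in Equation 96.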
More general systems can easily be obtained by using the general formulation presented in Ozaktas and Mendlovic [1995], Mendlovic and others [1995b,c], Jiang [1995], Liu and others [1995], Sahin, Ozaktas, and Mendlovic [1995], and Ozaktas and Erden [1997].

G. Gaussian Beam Propagation
We have already seen that the propagation of light can be viewed as a process of continuous fractional Fourier transformation. In this subsection we will discuss the same facts, but this time in terms of Hermite-Gaussian beam expansions rather than Fresnel integrals or plane wave expansions. We will further see that the order of the fractional Fourier transform is proportional to the Gouy phase shift accumulated by the beam as it propagates. Let $f(x, 0)$ denote the complex amplitude distribution at the plane $z = 0$. We can expand this function in terms of the Hermite-Gaussian functions:
We can interpret the function $s^{-1/2}\psi_n(x/s)$ as the amplitude distribution of a one-dimensional $n$th order Hermite-Gaussian beam at its waist. Then, it becomes an easy matter to write the amplitude distribution $f(x, z)$ at an arbitrary plane, since we know how each of the Hermite-Gaussian components propagates [Saleh and Teich, 1991]:
In this equation $m(z) = \sqrt{\pi}\,w(z)/s$, where $w(z) = w(0)[1 + (z/z_0)^2]^{1/2}$ is the beam radius. Thus $m(0) = \sqrt{\pi}\,w(0)/s$, where $w(0)$ is the waist radius. The Rayleigh range $z_0$ is related to $s$ by the relation $s^2 = \lambda z_0$. We also have $k = 2\pi/\lambda$, where $\lambda$ is the wavelength. Finally, $r(z) = z[1 + (z_0/z)^2]$ is the radius of curvature of the wavefronts, and $\zeta(z) = \arctan(z/z_0)$ is the Gouy phase shift [Saleh and Teich, 1991].
Equation 108 can be written in a considerably simpler manner in terms of the fractional Fourier transform. Let us define functions with normalized arguments such that $f(x, z) = f(x/s, z/s)$, etc. Then the amplitude distribution at any plane is given by
where

$$a(z) = \frac{2}{\pi}\,\zeta(z). \tag{110}$$
In Equation 109, the fractional Fourier transform is taken with respect to $u$, and $f(u, 0) = f(su, 0)$. Rewriting
we see that the "angular order" $\phi$ of the fractional Fourier transform in question is simply equal to the Gouy phase shift accumulated in propagating from $z = 0$ to $z$. As $z \to \infty$, we see that $\zeta(z) \to \pi/2$ and $a(z) \to 1$, corresponding to the ordinary Fourier transform. This is the same result discussed in Subsection D. This result can be generalized for propagation between two spherical reference surfaces with arbitrary radii [Ozaktas and Mendlovic, 1994]. Let the radius of the surface at $z = z_1$ be denoted by $R_1$ and that of the surface at $z = z_2$ be denoted by $R_2$. The radii are positive if the surface is convex to the right. Then, there exists a fractional Fourier transform between these two surfaces whose order is given by
It is well known that if a certain relation between $R_1$, $R_2$, and $z_2 - z_1$ holds, one obtains an ordinary Fourier transform relation between two spherical surfaces. What we have shown is that, for other values of the parameters, we obtain a fractional Fourier transform relation. Given any two spherical surfaces, what we need to do to find the order $a$ of the fractional Fourier transform relation existing between them is to find the Rayleigh range and waist location of a Gaussian beam that would "fit" into these surfaces, and then calculate $a$ from Equation 112. We may also think of a complex amplitude distribution "riding" on a Gaussian beam wavefront. The spatial dependence of the wavefront as the wave propagates is like a carrier defining spherical surfaces, on top of which the complex amplitude distribution rides, being fractional Fourier transformed in the process.

Since laser resonators commonly consist of two spherical mirrors, it becomes possible to characterize such resonators in terms of a fractional order parameter, again obtained from Equation 112. The well-known stability (or confinement) condition for spherical mirror resonators can be stated in a particularly simple form in terms of the parameter $a$: as long as $a$ is real, we have a stable resonator. (In our discussion we have implicitly assumed that $a$ and the Rayleigh range $z_0$ are real, which means that we have implicitly assumed stable resonators.) Unstable resonators are described by values of $a$ which are not real. Further details may be found in Ozaktas and Mendlovic [1994].

In addition to the relation between the fractional order parameter $\phi(z)$ and the Gouy phase shift $\zeta(z)$, the reader might also have noticed the similarity between the behavior of $M(z)$, $R(z)$, and the common parameters of Gaussian beams, namely the beam diameter $w(z)$ and the wavefront radius of curvature $r(z)$. Indeed, readers familiar with the propagation of Gaussian beams will have no difficulty interpreting the evolution of $R(z)$ and $M(z)$ in Fig. 8 as the wavefront radius and diameter of a Gaussian beam. In considering systems such as that in Fig. 8, we will use $\zeta(z)$ to denote the accumulated Gouy phase shift with respect to the input plane at $z = 0$, rather than the conventional Gouy phase shift with respect to the last waist of the beam [Erden and Ozaktas, 1997]. Essentially, the accumulated Gouy phase shift of a Gaussian beam passing through an optical system is defined as the phase accumulated by the beam in excess of the phase accumulated by a plane wave passing through the same system.
In Ozaktas and Erden [1997] we have determined how the expressions for $\zeta(z)$, $w(z)$, and $r(z)$ are related to the expressions for $\phi(z)$, $M(z)$, and $R(z)$ (given in Equations 97, 98, and 99). The main result can be stated as follows: "Let the output of an arbitrary system consisting of lenses and sections of free space be interpreted as a fractional Fourier transform of the input of order $\phi(z)$ with scale factor $M(z)$ observed on a spherical surface of radius $R(z)$. Let a Gaussian beam whose waist is located at $z = 0$ with waist diameter $w_0$ exhibit an accumulated Gouy phase shift $\zeta(z)$, beam diameter $w(z)$, and wavefront radius of curvature $r(z)$ at the output of the same system. If the unit $s$ appearing in Equations 92 and 96 is related to $w_0$ as $s = \sqrt{\pi}\,w_0$, then $\phi(z) = \zeta(z)$, $M(z) = w(z)/w_0$, and $R(z) = r(z)$." The reader is referred to Ozaktas and Erden [1997] for further details.
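For the special case of free-space propagation, the correspondence between the Gaussian beam parameters and the fractional Fourier transform parameters can be verified numerically. The sketch below is a check under stated assumptions rather than a general proof: it uses the standard Gaussian beam expressions with $w_0^2 = \lambda z_0/\pi$ (so that $s = \sqrt{\pi}\,w_0$ is equivalent to $s^2 = \lambda z_0$), takes the free-space values of $\phi$, $M$, and $R$ from Subsection D, and uses hypothetical numbers and function names.

```python
import math

def gaussian_beam(z, z0, w0):
    """Beam size w(z), wavefront radius r(z), and Gouy phase zeta(z) for z > 0."""
    w = w0 * math.sqrt(1.0 + (z / z0)**2)
    r = z * (1.0 + (z0 / z)**2)
    zeta = math.atan(z / z0)
    return w, r, zeta

def fresnel_frt(z, wavelength, s):
    """Angular order phi, magnification M, and radius R after free-space distance z > 0."""
    ratio = wavelength * z / s**2
    phi = math.atan(ratio)
    magnification = math.sqrt(1.0 + ratio**2)
    radius = (s**4 + wavelength**2 * z**2) / (wavelength**2 * z)
    return phi, magnification, radius

# Hypothetical beam: 0.5 um wavelength, 2 m Rayleigh range.
wavelength, z0 = 0.5e-6, 2.0
w0 = math.sqrt(wavelength * z0 / math.pi)   # assumed waist convention
s = math.sqrt(math.pi) * w0                 # scale parameter tied to the waist

comparisons = []
for z in (0.5, 2.0, 7.5):
    w, r, zeta = gaussian_beam(z, z0, w0)
    phi, magnification, radius = fresnel_frt(z, wavelength, s)
    comparisons.append((phi, zeta, magnification, w / w0, radius, r))
```

Each tuple in `comparisons` pairs a transform parameter with its Gaussian beam counterpart: $\phi$ with $\zeta$, $M$ with $w/w_0$, and $R$ with $r$.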
XIII. APPLICATIONS TO SIGNAL AND IMAGE PROCESSING

The fractional Fourier transform has found many applications in optical and digital signal and image processing, where the ordinary Fourier transform has traditionally played an important role. Here we content ourselves with considering a number of basic concepts and simple application examples. In many signal processing applications, signals which we wish to recover are degraded by a known distortion and/or by noise. Then the problem is to reduce or eliminate these degradations. Appropriate solutions to such problems depend on the observation model and the objectives as well as the prior knowledge available about the desired signal, degradation process, and noise. A commonly used observation model is

$$f_{\mathrm{obs}}(u) = \int h(u, u')\, f(u')\, du' + n(u),$$

where $h(u, u')$ is the kernel of the linear system that degrades the desired signal $f(u)$, and $n(u)$ is an additive noise term. The problem is to find an estimation operator represented by the kernel $g(u, u')$, such that the estimated signal

$$f_{\mathrm{est}}(u) = \int g(u, u')\, f_{\mathrm{obs}}(u')\, du'$$

minimizes the mean square error defined as

$$\overline{\int \left| f_{\mathrm{est}}(u) - f(u) \right|^2 du},$$

where the overline denotes an ensemble average. The classical Wiener filter provides a solution to the preceding problem when the degradation is time-invariant and the input and noise processes are stationary. The Wiener filter is time-invariant and can thus be expressed as a convolution and implemented effectively with a multiplicative filter in the conventional Fourier domain with the fast Fourier transform algorithm (Fig. 9a). For an arbitrary degradation model or nonstationary processes, the classical Wiener filter often cannot provide a satisfactory result. In this case the optimum recovery operator is in general time-varying and has no fast implementation.

The dual of filtering in the ordinary Fourier domain is filtering in the space- or time-domain (Fig. 9b). This operation simply corresponds to multiplying the original function with a mask function. Filtering in the ordinary space or Fourier domains can be generalized to filtering in the $a$th order fractional Fourier domain (Fig. 9c) [Mendlovic and others, 1996b; Ozaktas, 1996; Zalevsky and Mendlovic, 1996; Kutay and others, 1997]. For $a = 1$ this reduces to the ordinary multiplicative Fourier domain filter, and for $a = 0$ it reduces to space-domain multiplicative filtering.

To understand the basic motivation for filtering in fractional Fourier domains, consider Fig. 10, where the Wigner distributions of a desired signal and an undesired distortion are superimposed. We observe that they overlap in both the 0th and 1st domains, but they do not overlap in the 0.5th domain (consider the projections onto the $u_0 = u$, $u_1 = \mu$, and $u_{0.5}$ axes). Although we cannot eliminate the distortions in the space or frequency domains, we can eliminate them easily by using a simple amplitude mask in the 0.5th domain.

FIGURE 9. (a) Filtering in the Fourier domain. (b) Filtering in the space (or time) domain. (c) Filtering in the $a$th order fractional Fourier domain.
FIGURE 10. Filtering in fractional Fourier domains as observed in the space- (or time-) frequency plane.
We now discuss the optimal filtering problem mathematically. The estimated (filtered) signal $f_{\mathrm{est}}$ is expressed as (Fig. 9c)

$$f_{\mathrm{est}} = \mathcal{T}_{\mathrm{single}}\, f_{\mathrm{obs}} = \mathcal{F}^{-a} \Lambda_g \mathcal{F}^{a} f_{\mathrm{obs}}, \tag{117}$$

where $\mathcal{F}^{a}$ is the $a$th order fractional Fourier transform operator, $\Lambda_g$ denotes the operator corresponding to multiplication by the filter function $g(u)$, and $\mathcal{T}_{\mathrm{single}}$ is the operator representing the overall filtering configuration. According to Equation 117, we first take the $a$th order fractional Fourier transform of the observed signal $f_{\mathrm{obs}}(u)$, then multiply the transformed signal with the filter $g(u)$ and take the inverse $a$th order fractional Fourier transform of the resulting signal to obtain our estimate. Since the fractional Fourier transform has efficient digital and optical implementations, the cost of fractional Fourier domain filtering is approximately the same as the cost of ordinary Fourier domain filtering. With the above form of the estimation operator, the problem is to find the optimum multiplicative filter function $g_{\mathrm{opt}}(u)$ that minimizes the mean-square error defined in Equation 115. For a given transform order $a$, $g_{\mathrm{opt}}(u_a)$ can be found analytically using the orthogonality principle or the calculus of variations (Kutay and others, 1997):
where the stochastic auto- and cross-correlation functions $R_{f f_{\mathrm{obs}}}(u, u')$ and $R_{f_{\mathrm{obs}} f_{\mathrm{obs}}}(u, u')$ can be computed from the correlation functions $R_{ff}(u, u')$ and $R_{nn}(u, u')$ (which are assumed to be known).

Fractional Fourier domain filtering is particularly advantageous when the distortion or noise is of a chirped nature. Such situations are encountered in many real-life applications. For instance, a major problem in the reconstruction from holograms is the elimination of twin-image noise. Since this noise is essentially a modulated chirp signal, it can be dealt with by fractional Fourier domain filtering. Another example is the correction of the effects of point or line defects found on lenses or filters in optical systems, which appear at the output plane in the form of chirp artifacts. Another application arises in synthetic aperture radar which employs chirps as transmitted pulses, so that the measurements are related to the terrain reflectivity function through a chirp convolution. This process results in chirp-type disturbances caused by moving objects in the terrain, which should be removed if high-resolution imaging is to be achieved. Fractional Fourier domain filtering has also been applied to restoration of images blurred by camera motion or atmospheric turbulence [Kutay and Ozaktas, 1998].

Further generalizations of the concept of filtering in fractional Fourier domains have been referred to as multistage (repeated) and multichannel (parallel) filtering in fractional Fourier domains [Erden, 1997; Erden and others, 1997a,b; Ozaktas, Erden, and Kutay, 1997; Kutay and others, 1998a,b]. These systems consist of $M$ single-stage fractional Fourier domain filters in series or in parallel (Fig. 11a, b). $M = 1$ corresponds to single-stage filtering in both cases. In the multistage system shown in Fig. 11a, the input is first transformed into the $a_1$th domain, where it is multiplied by a filter $g_1(u)$. The result is then transformed back into the original domain and the same process is repeated $M$ times consecutively. (Note that this amounts to sequentially visiting the domains $a_1$, $a_2 - a_1$, $a_3 - a_2$, etc. and applying a filter in each.) On the other hand, the multichannel filter structure consists of $M$ single-stage blocks in parallel (Fig. 11b). For each channel $k$, the input is transformed to the $a_k$th domain, multiplied with a filter $g_k(u)$, and then transformed back. Let $\Lambda_{g_j}$ denote the operator corresponding to multiplication by the filter function $g_j(u)$. Then, the outputs $f_{\mathrm{est,ser}}(u)$ and $f_{\mathrm{est,par}}(u)$ of the serial and parallel configurations are related to the input $f_{\mathrm{obs}}(u)$ according to the relations

$$f_{\mathrm{est,ser}} = \mathcal{T}_{\mathrm{ser}}\, f_{\mathrm{obs}} = \left[\mathcal{F}^{-a_M} \Lambda_{g_M} \mathcal{F}^{a_M}\right] \cdots \left[\mathcal{F}^{-a_1} \Lambda_{g_1} \mathcal{F}^{a_1}\right] f_{\mathrm{obs}}, \tag{119}$$
$$f_{\mathrm{est,par}} = \mathcal{T}_{\mathrm{par}}\, f_{\mathrm{obs}} = \left[\sum_{k=1}^{M} \mathcal{F}^{-a_k} \Lambda_{g_k} \mathcal{F}^{a_k}\right] f_{\mathrm{obs}}, \tag{120}$$

where $\mathcal{F}^{a_j}$ represents the $a_j$th order fractional Fourier transform operator and $\mathcal{T}_{\mathrm{ser}}$, $\mathcal{T}_{\mathrm{par}}$ the operators representing the overall filtering configurations. As $M$ is increased, both the cost and flexibility of the systems increase.

FIGURE 11. (a) Multistage filtering in fractional Fourier domains. (b) Multichannel filtering in fractional Fourier domains.
The digital implementation of these systems takes $O(MN \log N)$ time and their optical implementation requires an $M$-stage or $M$-channel optical system, each of whose stages or channels should have space-bandwidth product $N$. The increase in flexibility as $M$ increases will often translate into a reduction of the estimation error. Thus we can trade off between cost and accuracy by choosing an appropriate number of stages or channels.

As a simple example, we consider restoration of images blurred by a nonconstant velocity (space-variant) moving camera. Figure 12a shows the original image, and Fig. 12b shows the blurred image. Figure 12c shows the restoration possible by using ordinary Fourier domain filtering, and Fig. 12d shows the restoration possible by single-stage filtering. In this case the optimal domain was $a = 0.7$, resulting in a mean-square error of 5%. Figure 12e and Fig. 12f show the restored images obtained by using multichannel and multistage filtering configurations with $M = 5$. We see that the two latter options offer the best performance.

FIGURE 12. Image restoration with the fractional Fourier transform.

A further extension of these concepts is to combine the serial and parallel filtering configurations in an arbitrary manner to obtain generalized filtering configurations or circuits (Fig. 13) [Kutay and others, 1998a,b].

FIGURE 13. Filter circuits. Each block corresponds to single-stage filtering.

In the preceding discussion we have posed the multistage and multichannel configurations as filter structures for optimal image estimation. They can also be used for cost-efficient synthesis of desired linear systems, transformations, or mappings, including geometric distortion compensators, and beam shapers and synthesizers as well as linear recovery operators. In this approach, given a general linear system $\mathcal{H}$ characterized by the kernel $h(u, u')$ which we wish to implement, we try to find the optimal orders $a_k$ and filter coefficients $g_k$ such that the overall linear operator $\mathcal{T}_{\mathrm{single}}$, $\mathcal{T}_{\mathrm{ser}}$, or $\mathcal{T}_{\mathrm{par}}$ (as given by Equation 117, Equation 119, or Equation 120) is as close as possible to $\mathcal{H}$, according to some specified criteria (such as minimum Frobenius norm of the difference of the kernels). The optical and digital implementations of general linear systems are costly. Using the abovementioned approach, it is possible to approximate the systems by multistage or multichannel filtering operations in fractional Fourier domains, which are much cheaper to implement. This would allow significant savings in cost with little or no decrease in performance. Further discussion of this approach in a signal processing context may be found in Erden [1997], Erden and Ozaktas [1998], Ozaktas, Erden, and Kutay [1997], and Kutay and others [1998a]. We believe that this approach will find further applications in many other contexts.

Finally, we note that optimal filtering and image restoration is only one of the many signal processing applications explored. Correlation and pattern recognition applications have also received a considerable amount of interest. We refer the reader to the chapter by Mendlovic, Zalevsky, and Ozaktas [1998] and also to the following papers: Mendlovic, Ozaktas, and Lohmann [1995d]; Alieva and Agullo-Lopez [1995]; Garcia and others [1996]; Lohmann, Zalevsky, and Mendlovic [1996b]; Bitran and others [1996]; and Mendlovic and others [1995a].
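The structure of the single-stage, multistage, and multichannel configurations described in this section can be sketched in code. The example below is deliberately limited and should not be read as the authors' algorithm: it restricts the transform to integer orders, for which the fractional Fourier transform reduces to the identity, the Fourier transform, the parity operation, and the inverse Fourier transform, and it uses a naive $O(N^2)$ unitary discrete Fourier transform as a stand-in for a fast fractional-order routine. The function names are hypothetical, and the summation of channel outputs in the parallel configuration is an assumption.

```python
import cmath
import math

def dft(signal, sign):
    """Unitary discrete Fourier transform (sign = -1) or its inverse (sign = +1)."""
    n = len(signal)
    return [sum(signal[k] * cmath.exp(sign * 2j * math.pi * j * k / n)
                for k in range(n)) / math.sqrt(n)
            for j in range(n)]

def integer_frt(signal, order):
    """Stand-in transform, valid for integer orders only (taken modulo 4)."""
    order %= 4
    n = len(signal)
    if order == 0:
        return list(signal)                              # identity
    if order == 1:
        return dft(signal, -1)                           # ordinary Fourier transform
    if order == 2:
        return [signal[(-k) % n] for k in range(n)]      # parity (inversion)
    return dft(signal, +1)                               # inverse Fourier transform

def single_stage(signal, order, mask):
    """Transform to the given domain, multiply by the mask, transform back."""
    transformed = integer_frt(signal, order)
    filtered = [g * value for g, value in zip(mask, transformed)]
    return integer_frt(filtered, -order)

def multistage(signal, stages):
    """Apply single-stage filters one after another; stages = [(order, mask), ...]."""
    for order, mask in stages:
        signal = single_stage(signal, order, mask)
    return signal

def multichannel(signal, channels):
    """Apply single-stage filters in parallel and add the channel outputs."""
    outputs = [single_stage(signal, order, mask) for order, mask in channels]
    return [sum(values) for values in zip(*outputs)]

# Toy signal with a low-pass mask applied in the order-1 (Fourier) domain.
signal = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, -1.0]
low_pass = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
smoothed = single_stage(signal, 1, low_pass)
```

Replacing `integer_frt` by a routine valid for arbitrary real orders would leave the three configuration functions unchanged, since they only rely on the transform being invertible by negating the order.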
ACKNOWLEDGMENTS

We acknowledge the contributions of M. Fatih Erden to various parts of this chapter. It is also a pleasure to acknowledge the benefit of interactions with Adolf W. Lohmann. This chapter previously appeared as Ozaktas, Kutay, and Mendlovic 1998, parts of which previously appeared in Ozaktas and Mendlovic 1994, 1995, and Ozaktas and Erden 1997.
REFERENCES

Abe, S. and Sheridan, J. T. (1995a). Almost-Fourier and almost-Fresnel transformations. Optics Communications 113 385-388.
Abe, S. and Sheridan, J. T. (1995b). Comment on 'The fractional Fourier transform in optical propagation problems.' J. Modern Optics 42 2373-2378.
Agarwal, G. S. and Simon, R. (1994). A simple realization of fractional Fourier transforms and relation to harmonic oscillator Green's function. Optics Communications 110 23-26.
Alieva, T., Lopez, V., Agullo-Lopez, F., and Almeida, L. B. (1994). The fractional Fourier transform in optical propagation problems. J. Modern Optics 41 1037-1044.
Alieva, T. and Agullo-Lopez, F. (1995). Reconstruction of the optical correlation function in a quadratic refractive index medium. Optics Communications 114 161-169. Erratum in 118 657.
Almeida, L. B. (1994). The fractional Fourier transform and time-frequency representations. IEEE Trans. Signal Process. 42 3084-3091.
Alonso, M. A. and Forbes, G. W. (1997). Uniform asymptotic expansions for wave propagators via fractional transformations. Submitted.
Aytür, O. and Ozaktas, H. M. (1995). Non-orthogonal domains in phase space of quantum optics and their relation to fractional Fourier transforms. Optics Communications 120 166-170.
Bargmann, V. (1961). On a Hilbert space of analytic functions and an associated integral transform. Part I. Comm. Pure and Applied Mathematics 11 187-214.
Bastiaans, M. J. (1978). The Wigner distribution applied to optical signals and systems. Optics Communications 25 26-30.
Bastiaans, M. J. (1979a). The Wigner distribution function and Hamilton's characteristics of a geometric-optical system. Optics Communications 30 321-326.
Bastiaans, M. J. (1979b). Wigner distribution function and its application to first-order optics. J. Optical Society of America A 69 1710-1716.
Bastiaans, M. J. (1989). Propagation laws for the second-order moments of the Wigner distribution function in first-order optical systems. Optik 82 173-181.
Bastiaans, M. J. (1991). Second-order moments of the Wigner distribution function in first-order optical systems. Optik 88 163-168.
Beck, M., Rayner, M. G., Walmsley, I. A., and Kong, V. (1993). Chronocyclic tomography for measuring the amplitude and phase structure of optical pulses. Optics Letters 18 2041-2043.
Bernardo, L. M. and Soares, O. D. D. (1994a). Fractional Fourier transforms and optical systems. Optics Communications 110 517-522.
Bernardo, L. M. and Soares, O. D. D. (1994b). Fractional Fourier transforms and imaging. J. Optical Society of America A 11 2622-2626.
Bitran, Y., Zalevsky, Z., Mendlovic, D., and Dorsch, R. G. (1996). Fractional correlation operation: Performance analysis. Applied Optics 35 297-303.
Bracewell, R. N. (1995). Two-Dimensional Imaging. Prentice-Hall, Englewood Cliffs, NJ.
Claasen, T. A. C. M. and Mecklenbrauker, W. F. G. (1980a). The Wigner distribution - a tool for time-frequency signal analysis. Part I: continuous-time signals. Philips J. Research 35 217-250.
Claasen, T. A. C. M. and Mecklenbrauker, W. F. G. (1980b). The Wigner distribution - a tool for time-frequency signal analysis. Part II: discrete-time signals. Philips J. Research 35 276-300.
Claasen, T. A. C. M. and Mecklenbrauker, W. F. G. (1980c). The Wigner distribution - a tool for time-frequency signal analysis. Part III: relations with other time-frequency signal transformations. Philips J. Research 35 372-389.
Cohen, L. (1989). Time-frequency distribution - a review. Proceedings of IEEE 77 941-981.
Cohen, L. (1995). Time-Frequency Analysis. Prentice-Hall, Englewood Cliffs, NJ.
Condon, E. U. (1937). Immersion of the Fourier transform in a continuous group of functional transformations. Proc. National Academy of Sciences 23 158-164.
de Bruijn, N. G. (1973). A theory of generalized functions, with applications to Wigner distribution and Weyl correspondence. Nieuw Archief voor Wiskunde 21 205-280.
Dorsch, R. G. (1995). Fractional Fourier transformer of variable order based on a modular lens system. Applied Optics 34 6016-6020.
Dorsch, R. G. and Lohmann, A. W. (1995). Fractional Fourier transform used for a lens design problem. Applied Optics 34 4111-4112.
Dragoman, D. (1996). Fractional Wigner distribution function. J. Optical Society of America A 13 474-478.
Erden, M. F. (1997). Repeated Filtering in Consecutive Fractional Fourier Domains. Ph.D. Thesis, Bilkent University, Ankara.
Erden, M. F., Kutay, M. A., and Ozaktas, H. M. (1997b). Repeated filtering in consecutive fractional Fourier domains and its application to signal restoration. Sub. to appear IEEE Trans. Signal Processing, 1990.
Erden, M. F. and Ozaktas, H. M. (1997). Accumulated Gouy phase shift in Gaussian beam propagation through first-order optical systems. J. Optical Society of America B 14 2190-2194.
Erden, M. F. and Ozaktas, H. M. (1998). Synthesis of general linear systems with repeated filtering in consecutive fractional Fourier domains. To appear in J. Optical Society of America A, 1998.
Erden, M. F., Ozaktas, H. M., and Mendlovic, D. (1996a). Propagation of mutual intensity expressed in terms of the fractional Fourier transform. J. Optical Society of America A 13 1068-1071.
Erden, M. F., Ozaktas, H. M., and Mendlovic, D. (1996b). Synthesis of mutual intensity distributions using the fractional Fourier transform. Optics Communications 125 288-301.
Erden, M. F., Ozaktas, H. M., Sahin, A., and Mendlovic, D. (1997a). Design of dynamically adjustable anamorphic fractional Fourier transformer. Optics Communications 136 52-60.
Fonollosa, J. R. and Nikias, C. L. (1994). A new positive time-frequency distribution. In Proc. 1994 Int. Conf. Acoustics, Speech, and Signal Processing. IEEE, NJ. IV 301-304.
Garcia, J., Mendlovic, D., Zalevsky, Z., and Lohmann, L. (1996). Space-variant simultaneous detection of several objects by the use of multiple anamorphic fractional-Fourier-transform filters. Applied Optics 35 3945-3952.
Gomez-Reino, C., Bao, C., and Pérez, M. V. (1996). GRIN optics, Fourier optics and optical connections. In 17th Congress of the International Commission for Optics: Optics for Science and New Technology, SPIE Proceedings 2778 128-131, SPIE, Bellingham, Washington, 1996.
Gori, F., Santarsiero, M., and Bagini, V. (1994). Fractional Fourier transform and Fresnel transform. Atti Fondaz. Giorgio Ronchi.
Granieri, S., Trabocchi, O., and Sicre, E. E. (1995). Fractional Fourier transform applied to spatial filtering in the Fresnel domain. Optics Communications 119 275-278.
Hlawatsch, F. and Boudreaux-Bartels, G. F. (1992). Linear and quadratic time-frequency signal representations. IEEE Signal Processing Magazine April 21-67.
Jiang, Z. (1995). Scaling laws and simultaneous optical implementation of various order fractional Fourier transforms. Optics Letters 20 2408-2410.
Kutay, M. A., Erden, M. F., Ozaktas, H. M., Arikan, O., Candan, C., and Güleryüz, Ö. (1998b). Cost-efficient approximation of linear systems with multi-channel fractional Fourier domain filtering. Submitted to IEEE Signal Process. Lett.
Kutay, M. A., Erden, M. F., Ozaktas, H. M., Arikan, O., Güleryüz, Ö., and Candan, C. (1998a). Space-bandwidth efficient realizations of linear systems. Optics Letters 23 1069-1071.
Kutay, M. A. and Ozaktas, H. M. (1998). Optimal image restoration with the fractional Fourier transform. J. Optical Society of America A 15 825-834.
Kutay, M. A., Ozaktas, H. M., Arikan, O., and Onural, L. (1997). Optimal filtering in fractional Fourier domains. IEEE Trans. Signal Process. 45 1129-1143.
Liu, S., Xu, J., Zhang, Y., Chen, L., and Li, C. (1995). General optical implementation of fractional Fourier transforms. Optics Letters 20 1053-1055.
Lohmann, A. W. (1993). Image rotation, Wigner rotation, and the fractional order Fourier transform. J. Optical Society of America A 10 2181-2186.
Lohmann, A. W. (1995). A fake zoom lens for fractional Fourier experiments. Optics Communications 115 437-443.
Lohmann, A. W., Mendlovic, D., Zalevsky, Z., and Dorsch, R. G. (1996a). Some important fractional transformations for signal processing. Optics Communications 125 18-20.
Lohmann, A. W. and Soffer, B. H. (1994). Relationships between the Radon-Wigner and fractional Fourier transforms. J. Optical Society of America A 11 1798-1801.
Lohmann, A. W., Zalevsky, Z., and Mendlovic, D. (1996b). Synthesis of pattern recognition filters for fractional Fourier processing. Optics Communications 128 199-204.
McAlister, D. F., Beck, M., Clarke, L., Meyer, A., and Raymer, M. G. (1995). Optical phase-retrieval by phase-space tomography and fractional-order Fourier transforms. Optics Letters 20 1181-1183.
McBride, A. C. and Kerr, F. H. (1987). On Namias's fractional Fourier transform. IMA J. Applied Mathematics 39 159-175.
Mecklenbräuker, W. F. G. (1993). The Wigner Distribution: Theory and Applications in Signal Processing. W. F. G. Mecklenbräuker, ed. Elsevier, Amsterdam.
Mendlovic, D., Bitran, Y., Dorsch, R. G., Ferreira, C., Garcia, J., and Ozaktas, H. M. (1995b). Anamorphic fractional Fourier transform: optical implementation and applications. Applied Optics 34 7451-7456.
Mendlovic, D., Bitran, Y., Dorsch, R. G., and Lohmann, A. W. (1995a). Optical fractional correlation: experimental results. Applied Optics 34 1665-1670.
Mendlovic, D., Dorsch, R. G., Lohmann, A. W., Zalevsky, Z., and Ferreira, C. (1996a). Optical illustration of a varied fractional Fourier-transform order and the Radon-Wigner display. Applied Optics 35 3925-3929.
Mendlovic, D. and Ozaktas, H. M. (1993). Fractional Fourier transformations and their optical implementation, I. J. Optical Society of America A 10 1875-1881.
Mendlovic, D., Ozaktas, H. M., and Lohmann, A. W. (1994a). Graded-index fibers, Wigner-distribution functions and the fractional Fourier transform. Applied Optics 33 6188-6193.
Mendlovic, D., Ozaktas, H. M., and Lohmann, A. W. (1995d). Fractional correlation. Applied Optics 34 303-309.
Mendlovic, D., Zalevsky, Z., Konforti, N., Dorsch, R. G., and Lohmann, A. W. (1995c). Incoherent fractional Fourier transform and its optical implementation. Applied Optics 34 7615-7620.
Mendlovic, D., Zalevsky, Z., Lohmann, A. W., and Dorsch, R. G. (1996b). Signal spatial-filtering using the localized fractional Fourier transform. Optics Communications 126 14-18.
Mendlovic, D., Zalevsky, Z., and Ozaktas, H. M. (1998). The applications of the fractional Fourier transform to optical pattern recognition. In Optical Pattern Recognition, Academic Press.
Mihovilovic, D. and Bracewell, R. N. (1991). Adaptive chirplet representation of signals on time-frequency plane. Electronics Letters 27 1159-1161.
Moshinsky, M. and Quesne, C. (1971). Linear canonical transformations and their unitary representations. J. Mathematical Physics 12 1772-1780.
Moshinsky, M., Seligman, T. H., and Wolf, K. B. (1972). Canonical transformations and the radial oscillator and Coulomb problem. J. Mathematical Physics 13 901-907.
Mustard, D. A. (1987a). Lie group imbeddings of the Fourier transform. School of Mathematics Preprint AM87/13. The University of New South Wales, Kensington, Australia.
Mustard, D. A. (1987b). The fractional Fourier transform and a new uncertainty principle. School of Mathematics Preprint AM87/14. The University of New South Wales, Kensington, Australia.
Mustard, D. A. (1989). The fractional Fourier transform and the Wigner distribution. School of Mathematics Preprint AM89/6. The University of New South Wales, Kensington, Australia.
Mustard, D. A. (1991). Uncertainty principles invariant under the fractional Fourier transform. J. Australian Mathematical Society B 33 180-191.
Mustard, D. A. (1996). The fractional Fourier transform and the Wigner distribution. J. Australian Mathematical Society B 38 209-219.
Mustard, D. A. (1997). Fractional convolution. To appear in J. Australian Mathematical Society B.
Namias, V. (1980). The fractional order Fourier transform and its application to quantum mechanics. J. Inst. Maths Applics 25 241-265.
Nazarathy, M. and Shamir, J. (1982). First-order optics - a canonical operator representation: lossless systems. J. Opt. Soc. Am. 72 356-364.
Ozaktas, H. M. (1996). Repeated fractional Fourier domain filtering is equivalent to repeated time and frequency domain filtering. Signal Processing 54 81-84.
Ozaktas, H. M., Arikan, O., Kutay, M. A., and Bozdagi, G. (1996b). Digital computation of the fractional Fourier transform. IEEE Trans. Signal Processing 44 2141-2150.
Ozaktas, H. M. and Aytür, O. (1995). Fractional Fourier domains. Signal Processing 46 119-124.
Ozaktas, H. M., Barshan, B., Mendlovic, D., and Onural, L. (1994a). Convolution, filtering, and multiplexing in fractional Fourier domains and their relation to chirp and wavelet transforms. J. Optical Society of America A 11 547-559.
Ozaktas, H. M. and Erden, M. F. (1997). Relationships among ray optical, Gaussian beam, and fractional Fourier transform descriptions of first-order optical systems. Optics Communications 143 75-86.
Ozaktas, H. M., Erden, M. F., and Kutay, M. A. (1997). Cost-efficient approximation of linear systems with repeated filtering. Submitted to IEEE Signal Processing Lett.
Ozaktas, H. M., Erkaya, N., and Kutay, M. A. (1996a). Effect of fractional Fourier transformation on time-frequency distributions belonging to the Cohen class. IEEE Signal Processing Lett. 3 10-11.
Ozaktas, H. M., Kutay, M. A., and Mendlovic, D. (1998). The fractional Fourier transform. Technical Report BU-CEIS Introduction to fractional Fourier transform 9802. Bilkent University, Department of Computer Engineering and Information Sciences, Bilkent, Ankara.
Ozaktas, H. M. and Mendlovic, D. (1993a). Fourier transforms of fractional order and their optical interpretation. Optics Communications 101 163-169.
Ozaktas, H. M. and Mendlovic, D. (1993b). Fractional Fourier transformations and their optical implementation, II. J. Optical Society of America A 10 2522-2531.
Ozaktas, H. M. and Mendlovic, D. (1994). Fractional Fourier transform as a tool for analyzing beam propagation and spherical mirror resonators. Optics Letters 19 1678-1680.
Ozaktas, H. M. and Mendlovic, D. (1995). Fractional Fourier optics. J. Optical Society of America A 12 743-751.
Ozaktas, H. M. and Mendlovic, D. (1996). Every Fourier optical system is equivalent to consecutive fractional-Fourier-domain filtering. Applied Optics 35 3167-3170.
Pellat-Finet, P. (1994). Fresnel diffraction and the fractional-order Fourier transform. Optics Letters 19 1388-1390.
Pellat-Finet, P. and Bonnet, G. (1994). Fractional order Fourier transform and Fourier optics. Optics Communications 111 141-154.
Pellat-Finet, P. (1995). Transfert du champ électromagnétique par diffraction et transformation de Fourier fractionnaire. C. R. Acad. Sci. Paris 320 91-97.
Raymer, M. G., Beck, M., and McAlister, D. F. (1994a). Complex wave-field reconstruction using phase-space tomography. Physical Review Letters 72 1137-1140.
Raymer, M. G., Beck, M., and McAlister, D. (1994b). Spatial and temporal optical field reconstruction using phase-space tomography. In Quantum Optics VI. Springer, Berlin.
Sahin, A., Ozaktas, H. M., and Mendlovic, D. (1995). Optical implementation of the two-dimensional fractional Fourier transform with different orders in the two dimensions. Optics Communications 120 134-138.
Saleh, B. E. A. and Teich, M. C. (1991). Fundamentals of Photonics. Wiley, New York.
Seger, O. (1993). Model Building and Restoration with Applications in Confocal Microscopy. Ph.D. thesis, Linköping University, Sweden.
Smithey, D. T., Beck, M., Raymer, M. G., and Faridani, A. (1993). Measurement of the Wigner distribution and the density matrix of a light mode using optical homodyne tomography: application to squeezed states and the vacuum. Physical Review Letters 70 1244-1247.
Wiener, N. (1929). Hermitian polynomials and Fourier analysis. Journal of Mathematics Physics MIT 18 70-73.
Wolf, K. B. (1979). Construction and properties of canonical transforms. In Integral Transforms in Science and Engineering. Plenum Press, New York.
Wood, J. C. and Barry, D. T. (1994a). Tomographic time-frequency analysis and its application toward time-varying filtering and adaptive kernel design for multicomponent linear-FM signals. IEEE Trans. Signal Processing 42 2094-2104.
Wood, J. C. and Barry, D. T. (1994b). Linear signal synthesis using the Radon-Wigner transform. IEEE Trans. Signal Processing 42 2105-2111.
Yurke, B., Schleich, W., and Walls, D. F. (1990). Quantum superpositions generated by quantum nondemolition measurements. Physical Rev. A 42 1703-1711.
Zalevsky, Z. and Mendlovic, D. (1996). Fractional Wiener filter. Applied Optics 35 3930-3936.
ACKNOWLEDGMENTS

We acknowledge the contributions of M. Fatih Erden to various parts of this chapter. It is also a pleasure to acknowledge the benefit of interactions with Adolf W. Lohmann. This chapter previously appeared as Ozaktas, Kutay, and Mendlovic 1998, parts of which previously appeared in Ozaktas and Mendlovic 1994, 1995, and Ozaktas and Erden 1997.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 106

Confocal Microscopy: Recent Developments

ERNST HANS KARL STELZER and FRANK-MARTIN HAAR

Light Microscopy Group, Cell Biology and Biophysics Programme, European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, Postfach 10.2209, D-69117 Heidelberg, Germany
I. Resolution in Light Microscopy
II. Calculating Optical Properties
    A. Point-Spread Functions
III. Principles of Confocal Microscopy
    A. Light Paths in a Confocal Microscope
    B. Technical Aspects of a Confocal Microscope
    C. Applications of Confocal Microscopy
    D. Alternatives to Confocal Microscopy
    E. Optimal Recording Conditions
    F. Index Mismatching Effects
IV. Improving the Axial Resolution
    A. Standing-Wave Fluorescence Microscopy
    B. 4Pi-Confocal Fluorescence Microscopy
    C. Confocal Theta Microscopy
V. Nonlinear Imaging
    A. Two-Photon Excitation
    B. Multiphoton Excitation
    C. Stimulated-Emission-Depletion Fluorescence Microscopy
    D. Ground-State-Depletion Fluorescence Microscopy
VI. Aperture Filters
VII. Axial Tomography
    A. Three-Dimensional Measurements and Qualitative Analysis
VIII. Spectral Precision Distance Microscopy
IX. Computational Methods
X. Spinning Disks
XI. Perspectives of Confocal Fluorescence Microscopy
References
I. RESOLUTION IN LIGHT MICROSCOPY
The modern light microscope is usually operated in a mode that is close to the diffraction limit [Abbe, 1873]. This means that the resolution is determined by the wavelengths of the incoming and outgoing light, the refractive index of the medium, the focal length of the lens, and the diameter of the aperture (Figs. 1 and 2). The optical systems, in particular the objective lenses, have their limits, of course. They will not transmit outside a certain spectral range, they have a finite working distance and a finite field of view, and they show slightly different lateral and axial magnifications depending on wavelength, position in the field, polarization, and temperature. Under the usual working conditions, such limits are not encountered. Consequently, it makes sense to calculate system functions (point-spread functions), to discuss their properties, and to describe the image formation process as an accumulation of point images.

FIGURE 1. Characteristic properties of a telecentric system. In a telecentric optical system all beams pass the aperture diaphragm as a planar wave. In the optical path the tilt angle $\beta$ determines the distance of the focus from the optical axis. The focal length $f$ of the objective lens and the diameter of the entrance aperture $2a$ determine the opening angle $\alpha$. The numerical aperture is the product of the angular aperture $\sin\alpha$ and the refractive index of the medium $n$. A planar wave tilted by an angle $\beta$ has a focus in the object plane at a distance $s = f \cdot \tan\beta$ from the optical axis.

Volume 106, ISBN 0-12-014748-3. Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved. ISSN 1076-5670/99 $30.00
FIGURE 2. Three-dimensional imaging in a telecentric system. A pair of angles always encodes the lateral positions (see Fig. 1). A divergence or convergence angle defines the position of the emitter along the optical axis. The lateral distance $M \cdot s$ in the image plane $i$ is independent of the position of the emitter along the optical axis. In every telecentric system the lateral magnification is the ratio of the focal lengths of the tube lens and the objective lens. The axial magnification is the square of the lateral magnification $M$. If the objects are located in different planes separated by a distance $z$, then the images are separated along the optical axis by $M^2 \cdot z$.
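
The geometric relations stated in the captions of Figs. 1 and 2 can be checked with a few lines of Python. This is only a minimal sketch: the function names and the numerical values (a 160 mm tube lens, a 4 mm objective, a 1° tilt) are hypothetical and chosen for illustration, not taken from the chapter.

```python
import math


def focus_offset(focal_length_mm, tilt_deg):
    """Lateral focus position s = f * tan(beta) of a plane wave tilted by beta."""
    return focal_length_mm * math.tan(math.radians(tilt_deg))


def magnifications(f_tube_mm, f_objective_mm):
    """Return (lateral, axial) magnification of a telecentric system.

    Lateral magnification M is the ratio of the focal lengths;
    the axial magnification is M squared.
    """
    lateral = f_tube_mm / f_objective_mm
    return lateral, lateral ** 2


# Hypothetical values, for illustration only.
M, M_axial = magnifications(f_tube_mm=160.0, f_objective_mm=4.0)
s = focus_offset(focal_length_mm=4.0, tilt_deg=1.0)
print(f"lateral magnification M   = {M:.0f}")
print(f"axial magnification M^2   = {M_axial:.0f}")
print(f"object-plane offset s     = {s * 1000:.1f} um")
print(f"image-plane offset M * s  = {M * s:.2f} mm")
```

Because the axial magnification grows with the square of $M$, a small axial separation in the object maps to a much larger separation in image space than the same lateral separation does.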
A thorough and readable description of how to calculate the intensity distribution in the focus of a lens is found in the famous book by Born and Wolf [1980, pp. 435-441]. The essence of their calculations and of those of many other authors is that the lateral resolution ($\Delta x$, $\Delta y$, i.e., the resolution in the focal plane) is proportional to the wavelength $\lambda$ and inversely proportional to the numerical aperture $\mathrm{N.A.} = n \cdot \sin\alpha$. The axial resolution ($\Delta z$, i.e., the resolution along the optical axis) is proportional to the wavelength and the refractive index $n$ and inversely proportional to the square of the numerical aperture. The effects of the light distribution in the aperture are only apparent in some factors ($k_i$, $m_i$, $q_i$):

$$\Delta x = k_1 \cdot k_2 \cdots k_i \, \frac{\lambda}{\mathrm{N.A.}} = k_1 \cdot k_2 \cdots k_i \, \frac{\lambda}{n} \, \frac{1}{\sin\alpha}$$

$$\Delta z = m_1 \cdot m_2 \cdots m_i \, \frac{n \, \lambda}{\mathrm{N.A.}^2} = m_1 \cdot m_2 \cdots m_i \, \frac{\lambda}{n} \, \frac{1}{\sin^2\alpha}$$
The minimal extent of a volume element $\Delta v$ is, therefore, proportional to the third power of the wavelength and inversely proportional to the fourth power of the angular aperture. The volume element [Lindek et al., 1994b] is still not a popular way of looking at resolution, but it is reasonable. Microscopy, and in particular confocal microscopy and two-photon excitation, provides a three-dimensional resolution; therefore, a three-dimensional resolution criterion is necessary, one that documents that a resolution improvement along one axis is not obtained at the expense of a decrease along another axis. Some techniques using, for example, annular apertures [Wilson and Hewlett, 1990] improve the lateral resolution but at the same time degrade the axial resolution. In this case the three-dimensional (volume) resolution will be worse.

Before going into the details one can consider how the lateral and/or the axial resolution can be improved by decreasing the illumination wavelength, increasing the refractive index, or increasing the numerical aperture [Stelzer, 1998]. Other means to modify the resolution are nonlinear effects, multilens arrangements, or computational efforts that take the actual illumination/detection process into account.

The lateral and the axial resolutions can be improved simultaneously by decreasing the wavelength. The wavelengths currently used span a range from 300 nm to about 1100 nm. Due to technical limits the wavelengths are probably closer to a range from 350 nm to 900 nm. Wavelengths in the UV may provide the optimal resolution, but biological objects tend to suffer and eventually die [Carlsson et al., 1992; Montag et al., 1991]. In fluorescence microscopy the excitation wavelength has to be adapted to the fluorophore. The choice of the dye thus determines the wavelengths of the light source [Tsien and Waggoner, 1995].
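
The scaling laws for the lateral resolution, the axial resolution, and the volume element can be explored numerically. The following is a minimal sketch under stated assumptions: the aperture-dependent factors are lumped into single parameters `k` and `m` that default to 1, the two lateral extents are taken as equal, and the function names and the example values are hypothetical.

```python
import math


def lateral_resolution(wavelength_nm, n, alpha_deg, k=1.0):
    """Delta x = k * lambda / (n * sin(alpha)); k lumps the factors k_1 ... k_i."""
    return k * wavelength_nm / (n * math.sin(math.radians(alpha_deg)))


def axial_resolution(wavelength_nm, n, alpha_deg, m=1.0):
    """Delta z = m * lambda / (n * sin(alpha)**2); m lumps the factors m_1 ... m_i."""
    return m * wavelength_nm / (n * math.sin(math.radians(alpha_deg)) ** 2)


def volume_element(wavelength_nm, n, alpha_deg, k=1.0, m=1.0):
    """Delta v = Delta x * Delta y * Delta z, assuming Delta y = Delta x.

    This scales with lambda**3 and with 1 / sin(alpha)**4.
    """
    dx = lateral_resolution(wavelength_nm, n, alpha_deg, k)
    dz = axial_resolution(wavelength_nm, n, alpha_deg, m)
    return dx * dx * dz


# Illustrative comparison: halving the wavelength at fixed n and alpha.
for wavelength in (500.0, 250.0):
    dx = lateral_resolution(wavelength, n=1.33, alpha_deg=60.0)
    dz = axial_resolution(wavelength, n=1.33, alpha_deg=60.0)
    dv = volume_element(wavelength, n=1.33, alpha_deg=60.0)
    print(f"lambda = {wavelength:5.0f} nm: dx = {dx:6.1f} nm, "
          f"dz = {dz:6.1f} nm, dv = {dv:.3e} nm^3")
```

With unit prefactors the numbers are not resolution values for a real instrument; the sketch only shows the relative scaling, for example that halving the wavelength shrinks the volume element by a factor of eight.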
However, longer wavelengths have their virtues [Fischer, Cremer, and Stelzer, 1995]. Short pulses of high intensity can be used to induce a two-photon absorption process [Denk, Strickler, and Webb, 1990]. Applying two-photon excitation in fluorescence microscopy has three effects. First, the wavelength is increased by a factor of two and the resolution is a factor of two worse along all three axes [Stelzer et al., 1994]. Second, the proportionality of the fluorescence emission to the square of the excitation intensity introduces the same factor $1/\sqrt{2}$ encountered in confocal fluorescence microscopy [Sheppard and Gu, 1990]. This partially compensates for the longer wavelength, so the resolution is only a factor of $\sqrt{2}$ worse, but the two-photon microscope has the same properties as a confocal fluorescence microscope. Third, almost all dyes can be excited with two photons [Fischer, Cremer, and Stelzer, 1995].

Increasing the refractive index will improve the resolution in the same manner as a decrease of the wavelength. Lenses for refractive indices up to 1.7 (using Xylol as immersion medium) have been available, but common oil-immersion media have a refractive index of 1.518 at 23°C and a wavelength of 546 nm. Since the observation of living samples is becoming more important in the life sciences, water immersion lenses corrected for the refractive index of 1.33 are replacing oil immersion lenses.

Increasing the angular aperture ($\sin\alpha$) will also increase the resolution. Angles $\alpha$ up to about 70° are technically feasible. The largest numerical aperture ($n \cdot \sin\alpha$) in air is 0.94, in water 1.23, and in oil 1.4. Another limit is the angle of total reflection, which is encountered when oil immersion lenses are used to observe samples mounted in an aqueous medium [Hell et al., 1993]. The numerical aperture is then at most 1.3. Another problem is the refractive index mismatch, which induces important spherical aberrations at large depths [Hell et al., 1993; Torok, Varga, and Booker, 1995].
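
The arithmetic behind the two-photon resolution factor and the aperture limits can be sketched as follows. The function names are hypothetical, and the aperture values are purely geometric bounds $n \cdot \sin\alpha$ for an assumed half-angle of exactly 70°; the values quoted in the text for water and oil are slightly below these bounds.

```python
import math


def two_photon_resolution_factor():
    """Net change of the resolution length relative to one-photon excitation.

    Doubling the wavelength doubles the resolution length; the quadratic
    dependence on the excitation intensity contributes a factor 1/sqrt(2).
    """
    return 2.0 * (1.0 / math.sqrt(2.0))


def numerical_aperture(n, alpha_deg):
    """Geometric numerical aperture n * sin(alpha)."""
    return n * math.sin(math.radians(alpha_deg))


def na_limit_total_reflection(n_immersion, n_sample):
    """Largest usable aperture when light passes from immersion into sample.

    Rays with n_immersion * sin(alpha) > n_sample are totally reflected,
    so the usable aperture cannot exceed n_sample.
    """
    critical_angle = math.asin(n_sample / n_immersion)
    return n_immersion * math.sin(critical_angle)


print(f"two-photon resolution factor: {two_photon_resolution_factor():.3f}")
for medium, n in (("air", 1.0), ("water", 1.33), ("oil", 1.518)):
    print(f"{medium:5s}: n * sin(70 deg) = {numerical_aperture(n, 70.0):.2f}")
limit = na_limit_total_reflection(n_immersion=1.518, n_sample=1.33)
print(f"oil lens, aqueous sample: usable aperture <= {limit:.2f}")
```

The last value illustrates why an oil immersion lens used on an aqueous sample cannot exploit its full nominal aperture: the sample's refractive index caps it.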
However, magnification, working distance, and numerical aperture of an objective are not independent. An oil immersion lens with a numerical aperture of 1.4 and a magnification of 63 (abbreviated 1.4/63x) will have a working distance around 0.250 mm, while a water immersion 0.9/63x lens will have a working distance around 1.5 mm. High numerical aperture lenses are, therefore, only suitable for thin objects.

A good method to reduce the stray light coming from thick samples is to reduce the field of view. The confocal microscope reduces the illumination field and the detection field to the physical limit determined by diffraction and hence discriminates against all light emitted outside the focal volume. Using a confocal microscope [Brakenhoff, 1979] improves the lateral resolution by a factor of $1/\sqrt{2}$. It affects only the factors $k_i$, $m_i$, $q_i$. However, as we will explain later, it has a depth discrimination capability like that of the two-photon excitation microscope.
Another important method to improve the resolution is to change the intensity distribution in the aperture plane. The best-known case is the annular aperture [Airy, 1841; Sheppard, 1977]. It has a low transmission in the center and a high transmission on the edges. The higher angles thus contribute more to the image formation process. This improves the lateral resolution up to 73% but causes extensive ringing and increases the depth of field to infinity. Special apertures can be designed that maintain the lateral resolution intact but improve the axial resolution [Martinez-Corral, Andres, and Zapata-Rodriguez, 1995b].

While all the methods previously mentioned leave the instrument basically intact, others require severe modifications to the microscope as it is traditionally known. Recently introduced methods that improve the numerical aperture use two or more lenses that illuminate a sample coherently or detect the emitted fluorescent or scattered light coherently. In a standing-wave fluorescence microscope [Lanni, 1986; Lanni, Waggoner, and Taylor, 1986] a whole field is illuminated coherently using planar waves in the focus of two opposing lenses, which produces a fringe pattern along the optical axis. Images are recorded as a function of the phase and the position of the object along the optical axis and later reconstructed using appropriate algorithms. In a 4Pi(A) confocal microscope [Hell and Stelzer, 1992a,b], two opposing lenses use spherical waves to illuminate a focal spot coherently and produce a standing wave that modulates the intensity along the optical axis. Two minima are slightly more than $\lambda/2n$ apart. Adjusting the phase and moving the object relative to the focal spot while recording the fluorescence intensity as a function of the position generates the images.
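
The axial fringe pattern of two counter-propagating waves can be illustrated with an idealized model. The sketch below assumes two plane waves of equal, unit amplitude, for which the fringe period is exactly $\lambda/2n$; the focused spherical waves of a 4Pi arrangement give the slightly larger spacing mentioned in the text and are not modeled here. The function names and the example values (488 nm, $n = 1.33$) are hypothetical.

```python
import cmath
import math


def fringe_spacing_nm(wavelength_nm, n):
    """Period lambda / (2 n) of the intensity pattern of two opposing plane waves."""
    return wavelength_nm / (2.0 * n)


def standing_wave_intensity(z_nm, wavelength_nm, n, phase=0.0):
    """Intensity |exp(i k z) + exp(-i k z + i phase)|**2 with k = 2 pi n / lambda."""
    k = 2.0 * math.pi * n / wavelength_nm
    field = cmath.exp(1j * k * z_nm) + cmath.exp(-1j * k * z_nm + 1j * phase)
    return abs(field) ** 2


period = fringe_spacing_nm(488.0, 1.33)
print(f"fringe period: {period:.1f} nm")
for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    z = fraction * period
    intensity = standing_wave_intensity(z, 488.0, 1.33)
    print(f"z = {z:6.1f} nm  intensity = {intensity:.3f}")
```

Changing `phase` shifts the fringe pattern along the axis without altering its period, which corresponds to the phase adjustment described above.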
Two lenses in an orthogonal arrangement provide another interesting method since in this case the lateral resolution dominates the extent of the point-spread function along all three directions [Stelzer and Lindek, 1994]. The resolution becomes isotropic, that is, the lateral and the axial resolution are almost identical. In the confocal theta microscope the axial resolution can be improved by a factor of three, and low-NA systems may have axial resolutions that are better than those achieved with confocal high-NA systems.

In tomographic methods [Bradl et al., 1994; Shaw et al., 1989; Skaer and Whytock, 1975], the sample is mounted on a rotating stage and observed from different angles. This can be accomplished using conventional or confocal microscopes. The data sets are then normalized and an attempt is made to construct an improved view of the object. Such a system can be applied to increase the axial resolution to almost the lateral resolution, and three-dimensional distances can be measured with an improved accuracy.

Another nonlinear method for fluorescence microscopy is to prevent the excitation of fluorophores by depleting their ground state [Hell and Kroug, 1995; Hell and Wichmann, 1994] or by stimulating the emission [Hell and Wichmann, 1994]. By generating appropriate patterns with one light source and observing the remaining fluorophores with another light source, a higher resolution can be achieved.

If the imaging process can be reconstructed using samples with a well-characterized spatial dye distribution, attempts to estimate the fluorophore concentration in the sample by computational means can be applied [Agard and Sedat, 1983; Carrington et al., 1995b; Shaw, 1995]. Such methods can be used with data recorded using wide-field methods as well as data recorded with confocal microscopes. The former actually competes with the confocal microscope. The latter, however, is usually regarded as a means to relax the confocal imaging conditions, for example, using larger pinholes or noisier images.

The first description of a confocal microscope is found in a patent by Marvin Minsky [1961; 1988]. It was probably Charles McCutchen [1967] who first appreciated and described the properties of a combined point illumination and detection device. Petran et al. [1968] built one of the earliest instruments, and the first experimental verification of the confocal principle is due to Godefridus Jacobus Brakenhoff, P. Blom, and C. Bakker [1978] and Brakenhoff, Blom, and P. Barends [1979]. The first investigators to realize the depth discrimination capability of the confocal fluorescence microscope were Ingemar Cox, C. J. R. Sheppard, and T. Wilson [1982]. Most of the literature on the theory of confocal microscopes has been written by Tony Wilson and Colin Sheppard [1981].
II. CALCULATING OPTICAL PROPERTIES

A. Point-Spread Functions
In a microscope the field of a pointlike light source in the image plane is equivalent to the system response. It is referred to as the amplitude point-spread function (PSF) and is used to describe the properties of the optical components for a given wavelength [Born and Wolf, 1980, pp. 435-449]. The most basic approach to calculating an amplitude PSF is to apply Huygens' principle, that is, to assume an aperture with an area A as the source of waves. A complete description of this process is the following equation
which can be solved numerically for each point p in the object volume [Born and Wolf, 1980, p. 436]. Appropriately phrased, it will take polarization effects into account [Hell et al., 1993]. However, for most purposes it is sufficient to restrict all calculations to a domain in which a linear approach holds. The amplitude PSF extends in all three dimensions. Due to the cylindrical symmetry of lenses the two lateral components can be regarded as equal. It is, therefore, in most cases sufficient to be able to describe an amplitude PSF in a plane containing the optical axis. Applying appropriate simplifications [Born and Wolf, 1980, p. 436ff], the amplitude PSF becomes a solution of
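$$h(u, v) = -i \, \frac{2\pi n A \sin^2\alpha}{\lambda} \, e^{i u / \sin^2\alpha} \int_0^1 J_0(v\rho) \, e^{-\frac{1}{2} i u \rho^2} \, \rho \, d\rho$$

$$v = \frac{2\pi n r \sin\alpha}{\lambda}, \qquad u = \frac{2\pi n z \sin^2\alpha}{\lambda}, \qquad r = \sqrt{x^2 + y^2},$$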
where $v$ and $u$ are normalized optical units perpendicular and parallel to the optical axis, respectively [Hopkins, 1943], while $r$ is the distance from the optical axis. Neither the spatial distribution of the amplitudes nor the variation as a function of time can be measured directly in the optical frequency range. However, the intensity PSF (i.e., an image) can be visualized by placing a piece of paper into the optical path or by recording it with a camera oriented normal to the optical axis. The intensity PSF $|h(u, v)|^2$ is calculated as the product of the amplitude PSF and its complex conjugate

$$|h(u, v)|^2 = h(u, v) \cdot h^*(u, v).$$

The intensity PSF also describes the spatial absorption pattern of a uniform fluorophore solution in the vicinity of the focus. The concept of transfer functions [Goodman, 1968, pp. 111-120; Frieden, 1967] has many advantages. However, in this paper we avoid them and use PSFs instead. Coherent transfer functions are the Fourier transform of the amplitude PSF. Optical transfer functions are the autocorrelation function of the coherent transfer function. Therefore the two descriptions are equivalent. There are three questions concerning the concept of PSFs or system functions:
1. Can they be measured? The concept of PSFs may seem straightforward, but under ordinary conditions a recording system based on an objective lens has to be able to cope with aberrations, noise, and a number of nonlinear effects. Spherical aberration, for example, causes a decrease of the energy under the main maximum and its shift into the higher-order terms [Hell et al., 19931. The maximal intensity is lower than in the unaberrated PSF. A number of conditions have to be met by an optical system: The optical system must be linear, that is, no light must be absorbed or scattered and no object must be in the shadow of another object.
The optical system must be invariant, that is, features in the image of an object must be independent of the position of the object in the field of view. Since these conditions are never perfectly fulfilled, the concept of PSFs is only approximately correct. Therefore, the existence of PSFs is not obvious.
2. How can one derive PSFs from real images? The shape of the analyzed object has to be taken into account. This is often forgotten when regular patterns are used to determine the resolution of an optical system. It is probably a common error to confuse the transfer function of line pairs that are rectangular (or square wave) functions with that of sine waves.
3. How are they calculated? As previously pointed out, two conditions must be met for PSFs to exist. There is a rich literature on how to include effects such as high numerical aperture, absorption, and refractive-index mismatch. These studies usually provide insight into how to calculate images of point sources, but these images are in general not identical to PSFs. It usually turns out that invariance is not maintained and that taking into account absorption effects requires the exact path of light through the object [White et al., 1996].
Although we are aware of the theory's deficiencies, we will follow it as Born and Wolf present it in our calculations of intensity distributions. We will also not take into account effects such as polarization, absorption, and scattering, unless they are mentioned explicitly.
In this paper we investigate images of point objects. These objects are so small that their features cannot be resolved. We define point objects as objects whose diameters are much smaller than the diameter of the intensity PSF. In fluorescence microscopy we look at objects in which, for example, the diameter of the area over which the fluorescent molecules are scattered, or the maximum distance between two fluorescent molecules, is very often only half the diameter of the Airy disk [Stelzer, 1998].
Each of the molecules creates an intensity PSF in the image, but they are so close to each other that their sum becomes smeared and indistinguishable from a single intensity PSF.
III. PRINCIPLES OF CONFOCAL MICROSCOPY
A. Light Paths in a Confocal Microscope
A confocal fluorescence microscope (CFM) is usually based on a conventional microscope. It contains an objective lens, a stage for the sample, a light source, and a detector. If we work with fluorescent light, two filters are
instrumental. The dichroic deflector separates the illumination light with which the fluorophore is excited from the fluorescent light that is emitted by the fluorophore, while the second filter separates the emitted fluorescence from scattered excitation light (Fig. 3). The light path is best understood if one first looks at the excitation light and then at the emission light. The light source is usually a laser. The laser light is focused into a pinhole, deflected by a dichroic mirror into the objective lens, and focused inside the specimen. Most of the light will pass the specimen, but a small fraction is absorbed by the fluorophore, which will emit fluorescent light. The fluorescent light is emitted in all directions with an almost equal probability.¹ The lens will, therefore, collect only a small fraction of the fluorescent light. This fraction passes the dichroic mirror and is focused into a pinhole in front of a detector. The detector will convert the flux of photons into a flux of electrons, which is converted to a number proportional to the intensity of the fluorescent light [Stelzer, 1995]. As pointed out, the excitation light passes the sample. It will excite not only the fluorophores that are in the plane of focus but also those that are either in front of or behind the plane of focus. However, their images are either in front of or behind the plane in which the point detector is located (Fig. 2). In the plane of the detector these images are expanded, hence only a small fraction of the light will pass the pinhole and enter the detector. The detector pinhole thus discriminates against the light that is not emitted in the plane of focus. The importance of the pinhole may become clearer if the detector pinhole is removed and a detector with a large sensitive area is used. The discrimination does not occur anymore. Instead, all the fluorescent light that is collected by the objective lens contributes to the signal.
Such an optical arrangement behaves essentially like any conventional fluorescence microscope. Another view is to regard the objective lens as the device that forms an image of the illumination pinhole and the detection pinhole in their common conjugate image plane, the object plane o. Only the fluorophores that are in the volume shared by the illumination and detection PSFs are excited and detected (Figs. 3 and 4). Therefore, in order to calculate the confocal PSF, one calculates the illumination intensity PSF and the detection intensity PSF and multiplies the two PSFs:

$$|h_{cf}(x, y, z)|^2 = |h_{ill}(x, y, z)|^2 \cdot |h_{det}(x, y, z)|^2.$$
FIGURE 3. Principal layout of a beam/object scanning confocal fluorescence microscope. A laser beam is focused into an illumination pinhole (in the plane i'), collimated, and deflected toward a microscope objective lens, which focuses the light inside the object. The emitted light is collected by the objective lens, passes the dichroic mirror, and is focused into the detection pinhole in the image plane i. This pinhole allows the light emitted in the plane of focus o to pass and discriminates against all out-of-focus light (also see Fig. 2).

¹ Each fluorophore behaves as a dipole, but in general its orientation is not fixed and many fluorophores are observed at the same time. On average the polarization and the orientation are lost. Fluorophores attached to bio-polymers behave differently (Marriott, G., Zechel, K., Jovin, T. M. (1988). Spectroscopic Biochem. 27(17) 6214-6220).

The PSFs can also be viewed as being proportional to probability density functions. An integral over a volume $v_i$ in the illumination intensity PSF describes the probability of illuminating the fluorophores in that volume. In order to operate confocally, both events, the illumination event and the detection event, have to occur, so the probabilities have to be multiplied. In many cases the illumination and detection intensity PSFs are quite similar, and a reasonable first approximation of the confocal intensity PSF is to assume it is the square of the illumination intensity PSF:

$$|h_{cf}(x, y, z)|^2 \approx |h_{ill}(x, y, z)|^2 \cdot |h_{ill}(x, y, z)|^2 = \left(|h_{ill}(x, y, z)|^2\right)^2.$$
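The effect of squaring the intensity PSF on the width of the main maximum can be checked with a short calculation. The sketch below is an illustration under stated assumptions, not a reproduction of the authors' calculation: it uses the axial profile $(\sin(u/4)/(u/4))^2$ of a single lens as the illumination PSF and its square as the confocal PSF, and it finds the half-maximum points by bisection. The function names are hypothetical.

```python
import math


def axial_intensity(u):
    """Axial intensity PSF of a single lens, (sin(u/4) / (u/4))^2, in optical units."""
    x = u / 4.0
    return 1.0 if x == 0.0 else (math.sin(x) / x) ** 2


def confocal_axial_intensity(u):
    """First approximation of the confocal PSF: the square of the illumination PSF."""
    return axial_intensity(u) ** 2


def fwhm(profile, upper):
    """FWHM of a profile that is symmetric about u = 0 and falls monotonically
    from 1 at u = 0 to below 0.5 at u = upper (found by bisection)."""
    lo, hi = 0.0, upper
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if profile(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 2.0 * lo


if __name__ == "__main__":
    first_zero = 4.0 * math.pi  # both profiles share the first zero at u = 4*pi
    wide = fwhm(axial_intensity, first_zero)
    narrow = fwhm(confocal_axial_intensity, first_zero)
    print(f"FWHM single lens: {wide:.3f}, confocal: {narrow:.3f}, ratio: {narrow / wide:.3f}")
```

For this profile the ratio of the two widths comes out at roughly 0.72, while the position of the first zero is unchanged by the squaring.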
FIGURE 5. Comparison of lateral (a) and axial (b) intensity point spread functions. (a) The components of the intensity PSFs in the focal plane ($z = 0$) for an illumination at 488 nm, a detection at 530 nm, a refractive index of 1.518, and a numerical aperture of 1.4. The illumination and detection curves describe the Airy disk. The confocal curve is the product of the illumination and detection curves. (b) The components of the intensity PSFs along the optical axis ($r = 0$) at the same conditions. The illumination and detection curves follow the behavior of the function $(\sin(u/4)/(u/4))^2$. The confocal curve results from the product of the illumination and detection curves.

Figure 4 shows three intensity PSFs. Figure 5 shows their components along a lateral direction ($z = 0$) and along the optical axis ($r = 0$). If one looks at the full width at half maximum (FWHM) value, the CFM has an improved lateral resolution and an improved axial resolution by about a factor of $1/\sqrt{2}$. The zero crossings (location of the first minimum) are of course identical in the PSFs. Using this definition, the CFM has no improved resolution. It should not be forgotten that a conventional fluorescence microscope has an axial resolution for pointlike objects, which is not much worse than that of the CFM. To fully appreciate the CFM one should look at the integrated intensities of the illumination and confocal intensity PSFs:
$$E_{ill,int}(z) = \int_{r=0}^{r=\infty} |h_{ill}(r, z)|^2\, 2\pi r\, dr.$$
This function is constant, which reflects the conservation of energy. The square of the integrand, however, is not conserved. Thus the integral
$$E_{cf,int}(z) = \int_{r=0}^{r=\infty} \left(|h_{ill}(r, z)|^2\right)^2 2\pi r\, dr$$
has a maximum in the focal plane (Fig. 6a). This is the explanation for the depth discrimination capability of a CFM [Cox, Sheppard, and Wilson, 1982; Wijnaendts-van-Resandt et al., 1985]. The best illustration for this effect is to record the intensity as one focuses through the cover slip into a thick layer of fluorophore dissolved in the immersion medium of the lens. This sea-response

$$E_{cf,sea}(z_0) = \int_{z=-\infty}^{z=z_0} \int_{r=0}^{r=\infty} \left(|h_{ill}(r, z)|^2\right)^2 2\pi r\, dr\, dz$$

is plotted in Fig. 6b. It shows the intensity recorded by the photodetector behind the detection pinhole in a CFM. The light distribution described by the confocal intensity PSF encounters no fluorophores outside the layer and finds fluorophores everywhere in its immediate environment once it is deep inside the thick layer. The slope and intensity variations in the shape of the sea-response can be used to characterize the resolution of many confocal microscopes. The sea-response is unique to the CFM. A conventional fluorescence microscope has no such property and, so long as no phase information is available, no computational methods are able to reconstruct the transition into the fluorophore layer from wide-field images.

It may also become clear that not all contrasts apart from fluorescence will show depth discrimination in a confocal arrangement. Transmission contrasts (implemented using two lenses) [Brakenhoff, Blom, and Barends, 1979; Marsman et al., 1983] usually depend on absorption and on scattering. Only those in which the signal is at least partially due to scattered light will have an improved lateral resolution (e.g., phase contrast and differential interference contrast). An axial resolution as defined through the sea-response is only available in fluorescence, reflection, and scattering light microscopy.

FIGURE 6. Integrated intensities and sea-response. The parameters for the calculations of the intensity point-spread functions are identical to those for the previous figures. (a) Due to the conservation of energy, the integrated intensity of the illumination PSF is constant. Since the square of the intensity is not conserved, the confocal PSF has a maximum in the geometrical focus. (b) The sea-response for a confocal microscope indicates the resolution of an axial edge.
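The behavior of the two integrated intensities and of the sea-response can be illustrated with a deliberately simple model. The sketch below replaces the diffraction PSF by a Gaussian beam, which is an assumption made here for tractability and is not the PSF used in this paper; the waist radius and Rayleigh range are arbitrary illustrative values, and all function names are hypothetical. The sea-response is normalized to the signal obtained over a finite axial window.

```python
import math

W0_UM = 0.2   # assumed waist radius at the focus, in micrometers (illustrative)
ZR_UM = 0.5   # assumed Rayleigh range, in micrometers (illustrative)


def beam_radius(z):
    """Gaussian-beam radius w(z) = w0 * sqrt(1 + (z / zR)^2)."""
    return W0_UM * math.sqrt(1.0 + (z / ZR_UM) ** 2)


def intensity(r, z):
    """Stand-in for the illumination intensity PSF, normalized to 1 at the focus."""
    w = beam_radius(z)
    return (W0_UM / w) ** 2 * math.exp(-2.0 * (r / w) ** 2)


def radial_integral(z, power, steps=200):
    """Midpoint estimate of int intensity(r, z)**power * 2*pi*r dr for r in [0, 4*w(z)].

    power = 1 gives the integrated illumination intensity,
    power = 2 the integrated confocal intensity.
    """
    h = 4.0 * beam_radius(z) / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * h
        total += intensity(r, z) ** power * 2.0 * math.pi * r
    return total * h


def integrate_z(z_from, z_to, steps_per_um=20):
    """Midpoint estimate of the axial integral of the integrated confocal intensity."""
    steps = max(1, round((z_to - z_from) * steps_per_um))
    h = (z_to - z_from) / steps
    return h * sum(radial_integral(z_from + (k + 0.5) * h, 2) for k in range(steps))


def sea_response(z0, z_limit=10.0):
    """Signal from a fluorophore layer filling z <= z0, relative to a layer
    that fills the whole window [-z_limit, z_limit]."""
    return integrate_z(-z_limit, z0) / integrate_z(-z_limit, z_limit)


if __name__ == "__main__":
    for z in (0.0, 0.5, 1.0):
        print(f"z = {z:3.1f}: E_ill = {radial_integral(z, 1):.5f}, "
              f"E_cf = {radial_integral(z, 2):.5f}")
    print(f"sea-response at z0 = 0: {sea_response(0.0):.3f}")
```

In this model the integrated illumination intensity is independent of $z$, the integrated squared intensity falls off on either side of the focus, and the sea-response rises monotonically through one half at the focal plane.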
B. Technical Aspects of a Confocal Microscope
The confocal microscopes described so far observe only a single point in an object. Such an instrument, therefore, does not record an image. To get an image one must either move the beam relative to the object [Wilke, 1983] or the object relative to the beam while recording the intensity as a function of their relative position [Stelzer, Marsman, and Wijnaendts-van-Resandt, 1986; Voort et al., 1985; Wijnaendts-van-Resandt et al., 1985]. In a practical instrument the beam is moved laterally in the focal plane of the instrument while the sample is moved along the optical axis. The lateral movement can be achieved by two mirrors, which control the direction of the beam in two orthogonal axes [Slomba et al., 1972; Wilke, 1985]. The mirrors are mounted on very accurate motors (galvanometers) that allow almost arbitrary changes of the angle as well as the speed at which an angle is reached. The optical system assigns a position in the object to every angle and allows the beam to address every point in the object. A large angle is equivalent to a large field. Thus, changing the angle opening controls the field size [Stelzer, 1994; Stelzer, 1995; Stelzer, 1997]. The axial movement is achieved by moving either the lens relative to a fixed stage (in most inverted microscopes) or the stage relative to a fixed optical system (in most upright microscopes). Since the axial displacement moves a larger mass, it is in general much slower than the lateral movement.
A serious alternative is the use of scanning disks [Nipkow, 1884] that are located in an image plane. These have a number of holes (usually well-spaced) that transmit the light of ordinary lamps [Petran et al., 1968; Petroll et al., 1992]. In its simplest form the same holes are used in the illumination and the detection process [Kino, 1995]. One rotation of the disk covers the whole field of view at least once; the field is either observed directly or recorded using a camera.
When a laser is used instead of the lamp, lens arrays can replace the holes in a disk and provide a very efficient and fast confocal microscope [Yin et al., 1995]. Apart from the fact that certain compromises are made to allow for an efficient observation and the background is not as well discriminated, the properties of such systems are described in the same way as explained previously [Sheppard and Wilson, 1981].
C. Applications of Confocal Microscopy
The main reason to use a CFM is to get rid of the background haze. The sample should be thick, that is, extend along the optical axis, and in the conventional fluorescence microscope the image should suffer from out-of-focus contributions. Using the CFM the images become crisper, and features
that were invisible become observable. The new information one gets by analyzing the images contributes significantly to what was known before. Some typical good applications are the observation of small but densely labeled structures such as chromosomes [Agard and Sedat, 1983; Merdes, Stelzer, and De Mey, 1991; Stelzer, Merdes, and De Mey, 1991] or the observation of large, thick objects such as mouse embryos [Palmieri et al., 1994].
The confocal microscope is of no use when the samples are flat. The slightly higher lateral resolution usually cannot be used because the signal-to-noise ratio is not sufficient [Stelzer, 1998]. A really bad application for CFM is to study flat, fluorescent in situ hybridized samples (for the technique, not the application, see Speicher, Ballard, and Ward [1996]). This also applies to samples in which fluorescent objects are sparse, well separated, or hardly overlapping, which is very often the case when less abundant proteins are observed. Problems also arise with very dense objects such as those encountered in many medical samples. The dye concentrations are too high, and absorption effects prevent a penetration beyond that available in conventional fluorescence microscopy.
Whether a new instrument ever becomes widely accepted depends on several factors. The new instrument must provide information that was not available until then or only available at outrageous costs or efforts. The instrument must be reliable. Most importantly, there have to be scientists who are willing to invest time in the sample preparation. The latter is usually underestimated. In confocal microscopy one of the main efforts is to make sure that the three-dimensional structure is preserved. This is a tricky task that is very often not accomplished and is the main reason why only living specimens were observed initially [Van Meer et al., 1987].
However, considerable progress has been made [Bacallao, Kiai, and Jesaitis, 1995; Reinsch, Eaton, and Stelzer, 1998], and fixed as well as living specimens are now relatively easily observable [Zink et al., 1998].
D. Alternatives to Confocal Microscopy
The simplest alternative to CFM is to work with small fields of view. This is particularly useful when thick specimens are observed, where the production of stray light is avoided by restricting the illumination to the actually observed field of view. In fact, the confocal fluorescence microscope can be regarded as the instrument that implements the lower limit for the field size by decreasing it to a single focal volume. The diameters of the illumination and detection pinholes are determined by the diffraction limit. Perhaps the most serious contender for CFM is conventional recording
and a subsequent deconvolution [Cox and Sheppard, 1993; Holmes et al., 1995; Krishnamurthi et al., 1995; Sandison et al., 1995a; Shaw, 1995], which has been described many times and is available through many software manufacturers.
E. Optimal Recording Conditions
In conventional microscopy the magnification of the lens determines the field size, and since ordinary film has the extremely high resolution of several thousand lines, a full field of view can be photographed at the resolution of the lens. The disadvantage of film is a low sensitivity. CCD-based cameras have a good sensitivity but a limited number of picture elements [Hiraoka, Sedat, and Agard, 1987]. Working with appropriate oversampling requires a reduction of the field of view. On the other hand, extensive oversampling reduces the number of photons per picture element, and the images tend to become noisy. Scanning microscopes suffer in principle from the same problems as microscopes that use cameras, but the amplitude of the scanner and, therefore, the field of view can be changed, the pixel-pixel distance can be very small, and the dwell time per pixel can be adapted. The main disadvantage of scanning microscopes is that they are sampling devices. They observe one picture element at a time, whereas cameras record the intensities of all picture elements in parallel. From the point of view of sampling, one requires between 8 and 16 picture elements per Airy disk diameter to record a fully resolved data set [Stelzer, 1998]. In an image with a size of 500 elements per line and 500 lines per image the field area will be reduced to about 5%. Obviously, the higher the resolution the smaller the field will be.
A serious problem in CFM is that the dyes are essentially consumed during the illumination process. Fluorophores can only be excited a certain number of times before they become nonfluorescent and in some cases even toxic. This limits the number of photons one can get from a sample and the resolution that can be achieved. But this of course works in both directions. Provided an image has been recorded, it should be possible to estimate the number of photons and the resolution actually achieved.
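The sampling rule quoted above translates directly into a pixel spacing and a field width. The following sketch assumes the common expression $1.22\,\lambda/\mathrm{NA}$ for the diameter of the Airy disk (first dark ring to first dark ring), which is an assumption introduced here, and uses the illumination wavelength and numerical aperture quoted for the figures; the function names are illustrative.

```python
def airy_disk_diameter_nm(wavelength_nm, numerical_aperture):
    """Diameter of the Airy disk, assumed here to be 1.22 * lambda / NA."""
    return 1.22 * wavelength_nm / numerical_aperture


def pixel_spacing_nm(wavelength_nm, numerical_aperture, pixels_per_airy_disk):
    """Pixel-to-pixel distance for a given number of picture elements per Airy disk diameter."""
    return airy_disk_diameter_nm(wavelength_nm, numerical_aperture) / pixels_per_airy_disk


def field_width_um(wavelength_nm, numerical_aperture, pixels_per_airy_disk, pixels_per_line):
    """Width of the scanned field in micrometers for a fixed number of pixels per line."""
    spacing = pixel_spacing_nm(wavelength_nm, numerical_aperture, pixels_per_airy_disk)
    return spacing * pixels_per_line / 1000.0


if __name__ == "__main__":
    for density in (8, 16):
        spacing = pixel_spacing_nm(488.0, 1.4, density)
        width = field_width_um(488.0, 1.4, density, 500)
        print(f"{density:2d} pixels per Airy disk: spacing {spacing:.1f} nm, "
              f"field width {width:.1f} um for 500 pixels per line")
```

With 500 picture elements per line, doubling the sampling density halves the field width and quarters the field area, which is the trade-off between resolution and field size described in the text.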
Good methods that estimate the perfect recording conditions and take all these aspects into account have not been implemented until now.
F. Index Mismatching Effects
A serious problem that cannot be neglected is the spherical aberration due to mismatching of the refractive indices. One problem is that high-resolution oil immersion objective lenses are used to observe specimens that are embedded in an aqueous medium (Fig. 7). Another problem is that the refractive index varies inside large specimens, and recording conditions that may be valid in one spot may not work in others. This problem is important for quantitative microscopy.

FIGURE 7. Calculation of point-spread functions in optically mismatched systems. There are at least four elements in the optical path of a microscope that can have different refractive indices: the objective lens ($n_1$), the immersion medium ($n_2$), the cover slip ($n_3$), and the sample ($n_4$). Ideally, all four are identical. In many cases in biology, the values for the refractive indices are $n_1 = n_2 = n_3 = 1.518$, and $n_4 = 1.33$. This mismatch will cause a change in the position of the focal point. The actual focal position (AFP) is closer to the cover slip than the nominal focal position (NFP).

The usual case (high NA oil immersion lens, aqueous embedding medium) causes a shift of the actual focal plane toward the lens, hence a decrease of the axial distances. A decrease of the maximal intensity as well as an increase of the axial FWHM as one moves the focal plane further away from the refractive index transition plane degrades the image quality. For example, ten microns below the transition plane the axial FWHM is twice as large as under perfect conditions [Hell et al., 1993]. The literature on this topic is quite rich [Gibson and Lanni, 1991; Torok, Hewlett, and Varga, 1997; Torok et al., 1996], and the effects are quite well understood. But, despite a number of efforts [Visser, Groen, and Brakenhoff, 1991; White et al., 1996], it is unlikely that such effects will ever be correctable. The only reasonable solution to this serious problem is to use water immersion lenses. This approach evades the problem rather than correcting it. The disadvantage is a lower resolution close to the cover slip (i.e., transition plane) but the advantage is of course a uniform, undistorted view of the complete specimen [Hell and Stelzer, 1995].
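The direction and rough size of the focal shift can be estimated with elementary ray optics. The sketch below is a geometrical illustration under stated assumptions and not the wave-optical treatment of the cited literature: it considers a single planar interface between a medium of index 1.518 and an aqueous sample of index 1.33, applies Snell's law to one ray at a time, and returns the ratio of the actual to the nominal focal depth below the interface. The function names are hypothetical.

```python
import math


def paraxial_focus_ratio(n_immersion, n_sample):
    """AFP / NFP for rays close to the optical axis: n_sample / n_immersion."""
    return n_sample / n_immersion


def ray_focus_ratio(n_immersion, n_sample, angle_deg):
    """AFP / NFP for a ray inclined by angle_deg to the axis in the immersion medium.

    A ray aimed at the nominal focus at depth d meets the interface at a
    lateral distance d * tan(theta1), is refracted to theta2, and crosses
    the axis at depth d * tan(theta1) / tan(theta2).
    Returns None if the ray is totally internally reflected.
    """
    if angle_deg == 0.0:
        return paraxial_focus_ratio(n_immersion, n_sample)
    theta1 = math.radians(angle_deg)
    s = n_immersion * math.sin(theta1) / n_sample  # Snell's law
    if s >= 1.0:
        return None
    theta2 = math.asin(s)
    return math.tan(theta1) / math.tan(theta2)


if __name__ == "__main__":
    print(f"paraxial: {paraxial_focus_ratio(1.518, 1.33):.3f}")
    for angle in (15.0, 30.0, 45.0, 60.0, 67.0):
        ratio = ray_focus_ratio(1.518, 1.33, angle)
        text = "totally reflected" if ratio is None else f"{ratio:.3f}"
        print(f"ray at {angle:4.1f} deg: {text}")
```

All ratios are below one, so the actual focus lies closer to the cover slip than the nominal focus, and steeper rays cross the axis at shallower depths than paraxial rays, which is the geometrical signature of spherical aberration. In this simple picture, rays beyond the critical angle do not enter the aqueous sample at all.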
IV. IMPROVING THE AXIAL RESOLUTION
The important role confocal fluorescence microscopy has in modern research is entirely due to its axial resolution, that is, its depth discrimination capability, which allows three-dimensional imaging. However, since a typical microscope objective covers only a small fraction of the full solid angle of $4\pi$ and thus focuses only a small segment of a spherical wave front, the axial resolution of a confocal microscope is always poorer than the lateral resolution. Hence the observation volume in any single-lens microscope is an ellipsoid elongated along the optical axis (Figs. 4 and 5). A large extent of the observation volume in that direction is equivalent to poor axial resolution. This elongation gives rise to certain artifacts [Stelzer et al., 1995], and any attempt to improve CFM should address the axial resolution and try to decrease its extent.
A. Standing-Wave Fluorescence Microscopy
In standing-wave fluorescence microscopy (SWFM) two coherent, counterpropagating planar waves cross each other in the specimen volume [Bailey et al., 1993; Lanni, 1986; Lanni, Waggoner, and Taylor, 1986]. The fluorophore in the specimen is excited by a series of axially spaced planar interference fringes, which are parallel to the focal plane of the microscope (Fig. 8). The fluorescence images are recorded as a function of the position of the object relative to the focal plane or relative to the phase of the two planar waves using sensitive cameras. SWFM is, therefore, not a confocal method. Although the SWFM can be regarded as an image-forming device since it produces an image, the full information must be reconstructed from a series of images. While the first instruments were based on total internal reflection (TIR) [Lanni, 1986] or a setup with a mirror [Bailey et al., 1993], a more powerful design uses two opposing microscope objectives [Bailey, Krishnamurthi, and Lanni, 1994; Lanni et al., 1993]. If the two fields are polarized normal to their common plane of incidence (s-polarization), are of equal amplitude, and cross at complementary angles ($\theta$, $\pi - \theta$) relative to the axis of the microscope, the resulting excitation intensity field varies sinusoidally along the microscope axis:

$$I_{exc}(z) = I_0\,[1 - \cos(2kz + \phi)].$$

Here $k = 2\pi n \cos\theta / \lambda$, where $\lambda$ is the wavelength, $n$ is the refractive index, and $\phi$ specifies the shift of the pattern relative to the specimen. The nodes and anti-nodes of this field, which are planes parallel to the focal plane, are

$$\Delta s = \frac{\lambda}{2 n \cos\theta}$$

apart. By controlling the angle $\theta$, the node spacing can be varied down to a minimum value of

$$\Delta s_{min} = \frac{\lambda}{2n}.$$

FIGURE 8. Setup of a standing-wave fluorescence microscope with two objective lenses. Two coherent counter-propagating planar waves overlap in the specimen on the optical axis of the microscope. They create an axial interference fringe field consisting of nodal and anti-nodal planes with a spacing $\Delta s$. Using complementary angles of incidence $\theta$ with respect to the optical axis, the nodal and anti-nodal planes are parallel to the focal plane. Fluorescence in the specimen is excited at anti-nodal planes. One of the objectives is used conventionally to form an image of the specimen in a camera.
The relative position of the nodes and anti-nodes within the specimen can be adjusted without changing the node spacing by shifting the relative phase of one of the beams. The PSF of a SWFM is calculated by multiplying the PSF of a conventional epifluorescence microscope with $I_{exc}(z)$:

$$|h_{SWFM}(x, y, z)|^2 = |h_{det}(x, y, z)|^2 \cdot [1 - \cos(2kz + \phi)].$$
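The standing-wave excitation field is simple enough to tabulate directly. The following sketch evaluates $I_{exc}(z) = I_0[1 - \cos(2kz + \phi)]$ with $k = 2\pi n\cos\theta/\lambda$ and checks that the pattern repeats with the node spacing $\lambda/(2n\cos\theta)$. The wavelength of 488 nm, the refractive index of water, and the angle $\theta = 0$ are illustrative choices made here, and the function names are hypothetical.

```python
import math


def wave_number(wavelength_nm, n, theta_deg):
    """k = 2*pi*n*cos(theta) / lambda, in radians per nanometer."""
    return 2.0 * math.pi * n * math.cos(math.radians(theta_deg)) / wavelength_nm


def excitation(z_nm, k, phi=0.0, i0=1.0):
    """Standing-wave excitation intensity I0 * (1 - cos(2*k*z + phi))."""
    return i0 * (1.0 - math.cos(2.0 * k * z_nm + phi))


def node_spacing_nm(wavelength_nm, n, theta_deg):
    """Distance between neighboring nodal planes, lambda / (2*n*cos(theta))."""
    return wavelength_nm / (2.0 * n * math.cos(math.radians(theta_deg)))


if __name__ == "__main__":
    k = wave_number(488.0, 1.33, 0.0)
    spacing = node_spacing_nm(488.0, 1.33, 0.0)
    print(f"node spacing: {spacing:.1f} nm")
    print(f"node at z = 0:            I = {excitation(0.0, k):.3f}")
    print(f"anti-node at z = ds/2:    I = {excitation(spacing / 2.0, k):.3f}")
    print(f"z = 0 after a pi shift:   I = {excitation(0.0, k, math.pi):.3f}")
```

A phase shift of $\pi$ exchanges nodes and anti-nodes without changing their spacing, which is the mechanism used to move the fringe pattern through the specimen.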
The lateral resolution of a SWFM is determined by the conventional lateral properties of the objective lens. The enhanced axial resolution in SWFM is due to the modulation of the excitation field and thus not directly limited by diffraction. It can be estimated by noting that two small objects will be differentially excited by a 180° shift in the field if their axial separation is half the node spacing: $\Delta s_{min}/2 = \lambda/4n$. Using ultraviolet light with a wavelength $\lambda = 365$ nm, and considering the refractive index of water ($n = 1.33$), the axial resolution limit is around 68 nm, independent of the numerical aperture of the lens. However, this is only correct in the case of a very thin specimen that falls entirely within the depth-of-field of a high-NA objective lens and has a thickness $t < \lambda/4n$, that is, a sample thickness of less than half the node spacing. A controlled movement of a single node or anti-node within the object alternately illuminates stratified structures (optical subsectioning). In this case the axial resolution can be better than $\lambda/8n$, or 40-50 nm, which is one order of magnitude better than a confocal microscope. SWFM is particularly useful when the specimen is so thin that only one or two nodal planes cover its entire depth. In thicker objects several planes are illuminated at the same time, and their separation becomes very complicated. The SWFM has an axial discrimination, which is determined by the sum of detection PSFs as in any image-forming device. All layers, which are axially separated by half a wavelength, are observed simultaneously and there is no obvious way to resolve this ambiguity in a general manner. No successful reconstruction of a thick specimen has been reported until now.
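The numbers quoted in this paragraph follow from two one-line formulas. The sketch below simply evaluates $\lambda/4n$ and $\lambda/8n$ and, as an additional illustration that is an assumption of this example, counts how many fringe periods fit into a specimen of a given thickness; the function names and the 1 µm specimen are hypothetical.

```python
def swfm_resolution_nm(wavelength_nm, n):
    """Half the minimum node spacing, lambda / (4*n)."""
    return wavelength_nm / (4.0 * n)


def subsectioning_limit_nm(wavelength_nm, n):
    """The lambda / (8*n) figure quoted for optical subsectioning of thin specimens."""
    return wavelength_nm / (8.0 * n)


def fringe_periods_in_specimen(thickness_nm, wavelength_nm, n):
    """Number of fringe periods (minimum node spacing lambda / (2*n)) across a specimen."""
    return thickness_nm / (wavelength_nm / (2.0 * n))


if __name__ == "__main__":
    print(f"lambda/4n at 365 nm in water: {swfm_resolution_nm(365.0, 1.33):.1f} nm")
    print(f"lambda/8n at 365 nm in water: {subsectioning_limit_nm(365.0, 1.33):.1f} nm")
    print(f"lambda/8n at 488 nm in water: {subsectioning_limit_nm(488.0, 1.33):.1f} nm")
    print(f"fringe periods in a 1 um specimen at 365 nm: "
          f"{fringe_periods_in_specimen(1000.0, 365.0, 1.33):.1f}")
```

A specimen that is only a micrometer thick already spans several fringe periods, which illustrates why the layers excited simultaneously in thicker objects are difficult to separate.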
A very interesting improvement is due to developments by Gustafson, Agard, and Sedat [1995; 1996], who use “white light.” This means the coherence length becomes extremely short and the axial interference pattern extends only over a few micrometers, which makes the reconstruction process much simpler. A serious problem in all high-resolution methods using interference is wave-front uniformity. Its effect is obvious in SWFM, and little can be done to account for the refractive-index heterogeneity inside a specimen and the resulting light scattering or wavefront distortion. Defocusing, aberrations, or irregularities on reflecting surfaces cause deformations in all planes. An optimization of the SWFM optics permits nodal plane flatness better than one-tenth of a wavelength peak-peak [Freimann, Pentz, and Horler, 1997] so long as the specimen does not change any phase relationships. Excitation field synthesis [Lanni et al., 1993] can be used to further improve the power of SWFM. Coherent light sets up an interference pattern that is most intense at the in-focus plane of the specimen and is sharply attenuated over sub-wavelength distances above and below this plane. As a result, the axial resolution can be improved to well below the wave-optical depth of field of the objective lens.
B. 4Pi-Confocal Fluorescence Microscopy
In a 4Pi-confocal fluorescence microscope (Fig. 9), a sample is illuminated and/or observed coherently through two coaxial objective lenses opposing each other but having a common focus [Hell and Stelzer, 1992b; Hell et al., 1994c; Lindek, Stelzer, and Hell, 1995]. This technique is, in effect, an increase of the angular aperture, hence an improvement of the axial resolution. It was given the name 4Pi-confocal microscopy since the technique tries to come close to a perfect spherical wave with a solid angle of $4\pi$. In contrast to SWFM, the lenses focus the light into the focal volume.
Lanni [1986], who introduced the SWFM, already noted that the convergent beam of an objective lens damps the axial lobes. As in SWFM, coherent illumination wave fronts can interfere in the focal volume, and the illumination intensity PSF is modulated along the optical axis. Depending on the phase difference $\varphi$, the interference in the geometrical focal plane is constructive ($\varphi = 0, 2\pi, \ldots$), destructive ($\varphi = \pi, 3\pi, \ldots$), or something intermediate (e.g., $\varphi = \pi/2$). The calculation of the 4Pi-illumination intensity PSF requires two counter-propagating amplitude PSFs, which are independently shifted in phase (Fig. 10). This is indicated by two functions whose difference is proportional to the phase: $\Phi_1(\alpha) - \Phi_2(\beta) \propto e^{i\varphi}$.
FIGURE 9. Schematic diagram of a 4Pi-confocal fluorescence microscope. Laser light is appropriately split into two coherent beams, which are deflected into two opposing objective lenses and thereafter interfere in the focal region. A phase-compensating device in the illumination path adjusts the relative phase of the beams to allow for constructive or destructive interference in the geometrical focus for 4Pi(A) contrasts. The same lenses collect the light, which now passes dichroic beamsplitters that separate the illumination from the detection path. A phase-compensating device in the detection path is needed to adjust the phase for 4Pi(B) and 4Pi(C) contrasts. The two beams are combined and interfere in the pinhole in front of the detector. For 4Pi(A) contrasts, either one lens is used to collect the detection light or both lenses collect the light incoherently.
In the case of constructive interference, the modulation of the PSF leads to a central main maximum with a full width at half maximum (FWHM) that is four to five times smaller than the FWHM of the envelope (Fig. 10b):

$$|h_{4Pi,ill}(x, y, z)|^2 = |\Phi_1(\alpha)\, h_{ill}(x, y, z) + \Phi_2(\beta)\, h_{ill}(x, y, -z)|^2.$$
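The interference of two counter-propagating focused beams can be sketched on the optical axis, where the paraxial amplitude of a single lens has a closed form. The example below is a simplified illustration and not the calculation behind Fig. 10: it assumes the on-axis amplitude $e^{ikz} e^{-iu/4} \sin(u/4)/(u/4)$ with $k = 2\pi n/\lambda$ and $u = kz\sin^2\alpha$, models the two phase functions as $e^{\pm i\varphi/2}$, and uses the parameters quoted for the earlier figures. Being paraxial, it should not be expected to reproduce the quoted narrowing factor quantitatively at this aperture. All names are hypothetical.

```python
import cmath
import math

N_REFR = 1.518          # refractive index quoted for the figures
NA = 1.4                # numerical aperture quoted for the figures
WAVELENGTH_NM = 488.0   # illumination wavelength quoted for the figures


def sinc(x):
    """sin(x) / x with the limit 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def axial_amplitude(z_nm):
    """Assumed paraxial on-axis amplitude of one lens: exp(i*(k*z - u/4)) * sinc(u/4)."""
    k = 2.0 * math.pi * N_REFR / WAVELENGTH_NM
    u = k * z_nm * (NA / N_REFR) ** 2
    return cmath.exp(1j * (k * z_nm - u / 4.0)) * sinc(u / 4.0)


def single_lens_intensity(z_nm):
    """On-axis intensity PSF of one lens; this is the envelope of the 4Pi pattern."""
    return abs(axial_amplitude(z_nm)) ** 2


def four_pi_illumination(z_nm, phase=0.0):
    """|Phi1 * h(z) + Phi2 * h(-z)|^2 with Phi1 = exp(i*phase/2), Phi2 = exp(-i*phase/2)."""
    a = cmath.exp(0.5j * phase) * axial_amplitude(z_nm)
    b = cmath.exp(-0.5j * phase) * axial_amplitude(-z_nm)
    return abs(a + b) ** 2


def half_width_nm(profile, step_nm=0.1):
    """Walk outward from z = 0 until the profile first drops to half its focal value."""
    peak = profile(0.0)
    z = 0.0
    while profile(z) > 0.5 * peak:
        z += step_nm
    return z


if __name__ == "__main__":
    envelope = 2.0 * half_width_nm(single_lens_intensity)
    main_max = 2.0 * half_width_nm(four_pi_illumination)
    print(f"FWHM of the envelope:     {envelope:.0f} nm")
    print(f"FWHM of the main maximum: {main_max:.0f} nm")
    print(f"focal intensity, constructive: {four_pi_illumination(0.0):.2f}")
    print(f"focal intensity, destructive:  {four_pi_illumination(0.0, math.pi):.2f}")
```

Constructive interference quadruples the focal intensity of a single beam and produces a main maximum that is several times narrower than the envelope, whereas a phase difference of $\pi$ places a minimum in the geometrical focus.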
Unfortunately, secondary maxima are also present along the optical axis, and their contribution or loss has to be taken into account when calculating the total illumination volume (Fig. 10). For reasons of symmetry, these considerations can also be extended to the detection of light. Provided the path length difference is smaller than the coherence length of the emitted light, the signal collected by the two objective lenses interferes in the detector, and the detection PSF presents an analogous interference pattern along the optical axis:

$$|h_{4Pi,det}(x, y, z)|^2 = |\Psi_1(t)\, h_{det}(x, y, z) + \Psi_2(t)\, h_{det}(x, y, -z)|^2.$$

FIGURE 10. Intensity PSFs of a 4Pi(A)-confocal fluorescence microscope. (a, b, c) The parameters for the calculations of the intensity point-spread functions are identical to those for the previous figures. (a) The two-dimensional intensity PSF for a 4Pi(A)-confocal fluorescence microscope. (b) The z-component of the illumination intensity PSF forms an envelope for the z-component of the 4Pi-illumination PSF. (c) The multiplication by the detection PSF damps the axial side lobes. The result is the z-component of the 4Pi(A)-confocal fluorescence microscope. (d) Using the lower numerical aperture of 0.9 but otherwise identical conditions produces considerably more fringes. Note that the z-axis scale has changed. (e) For the confocal 4Pi(A) microscope the multiplication by the detection PSF reduces the axial side lobes.
In any confocal arrangement, the PSF of the microscope is the product of the illumination and detection PSFs. Consequently, combining the 4Pi-method with the confocal principles leads to several microscopies [Hell and Stelzer, 1992b]. In the 4Pi(A)-confocal microscope, illumination occurs coherently through two lenses, while the fluorescently emitted light is gathered incoherently using probably only one of the two lenses [Hell and Stelzer, 1992b]:
$$|h_{4Pi(A)}(x, y, z)|^2 = |h_{4Pi,ill}(x, y, z)|^2 \cdot |h_{det}(x, y, z)|^2.$$
In a 4Pi(B)-confocal microscope [Hell et al., 1994a], the illumination occurs through one lens, while both lenses are used to collect the emitted light coherently and have it interfere in the pinhole in front of the detector:
$$|h_{4Pi(B)}(x, y, z)|^2 = |h_{ill}(x, y, z)|^2 \cdot |h_{4Pi,det}(x, y, z)|^2.$$
In a 4Pi(C)-confocal microscope [Hell et al., 1994c], illumination and detection are both coherent through two objective lenses:
$$|h_{4Pi(C)}(x, y, z)|^2 = |h_{4Pi,ill}(x, y, z)|^2 \cdot |h_{4Pi,det}(x, y, z)|^2.$$
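The three 4Pi products can be compared numerically along the optical axis. The following is only a minimal sketch under stated assumptions: the single-lens amplitude is replaced by a toy paraxial model (a sinc envelope multiplied by a plane-wave phase), both phase factors are set to 1 (constructive interference), and the wavelengths and aperture values are illustrative. The function names `axial_amplitude` and `central_fwhm` are hypothetical helpers, not part of any published code.

```python
import numpy as np

def axial_amplitude(z_um, wavelength_um, n=1.518, na=1.4):
    """Toy on-axis amplitude PSF of one lens: paraxial sinc envelope times a plane-wave phase."""
    alpha = np.arcsin(na / n)                      # half-aperture angle
    k = 2.0 * np.pi * n / wavelength_um            # wave number in the immersion medium
    u = 4.0 * k * z_um * np.sin(alpha / 2.0) ** 2  # axial optical coordinate
    envelope = np.sinc(u / (4.0 * np.pi))          # equals sin(u/4) / (u/4)
    return envelope * np.exp(1j * k * z_um)

def central_fwhm(z, intensity):
    """Width of the central lobe only: walk outward from z = 0 until the curve drops below half."""
    centre = int(np.argmin(np.abs(z)))
    half = intensity[centre] / 2.0
    j = centre
    while j < len(z) - 1 and intensity[j] >= half:
        j += 1
    return 2.0 * (z[j] - z[centre])

z = np.linspace(-1.5, 1.5, 6001)                   # axial position in micrometres
h_ill = axial_amplitude(z, 0.488)                  # assumed excitation wavelength
h_det = axial_amplitude(z, 0.530)                  # assumed emission wavelength

# Constructive interference: h(z) + h(-z). Reversing the array mirrors z
# because the grid is symmetric about z = 0.
ill_4pi = np.abs(h_ill + h_ill[::-1]) ** 2
det_4pi = np.abs(h_det + h_det[::-1]) ** 2
ill_1 = np.abs(h_ill) ** 2
det_1 = np.abs(h_det) ** 2

psfs = {
    "confocal": ill_1 * det_1,
    "4Pi(A)": ill_4pi * det_1,
    "4Pi(B)": ill_1 * det_4pi,
    "4Pi(C)": ill_4pi * det_4pi,
}
fwhm = {name: central_fwhm(z, psf) for name, psf in psfs.items()}
for name, width in fwhm.items():
    print(f"{name:9s} central-lobe FWHM = {width * 1000:.0f} nm")
```

The central-lobe width is measured deliberately, because the side lobes of the 4Pi PSFs would otherwise distort a naive half-maximum search.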
These effects have been extensively verified in a series of papers [Lindek, 1993]. An application that makes use of this technique has not been reported until now. An important improvement has been the use of two-photon excitation [Gu and Sheppard, 1995; Hanninen et al., 1995; Hell and Stelzer, 1992a; Hell et al., 1994b; Hell, Lindek, and Stelzer, 1994d; Lindek, Stelzer, and Hell, 1995]. The resolution of a two-photon 4Pi(A)-confocal microscope is of course worse than that of its single-photon counterpart, but the main effect is that the node spacing changes only in the illumination intensity PSF, so that the subsequent multiplication by the intensity detection PSF reduces the axial lobes much more efficiently. The second major improvement has been the use of deconvolution and other more sophisticated computational methods that take the imaging process into account and numerically remove the axial side lobes [Hell et al., 1996b; Hell, Schrader, and Van Der Voort, 1997].
C. Confocal Theta Microscopy
Confocal theta microscopy has been proposed [Stelzer and Lindek, 1994] to overcome the elongation of the observation volume and to achieve an
almost isotropic resolution. The idea is to change the spatial configuration of the illumination and detection volumes by using two different axes (Fig. 11). One objective lens is used to illuminate the sample. The other objective lens has its optical axis at an angle ϑ to the illumination axis and is used to collect the emitted light. The improvement in axial resolution stems from an arrangement in which the detection axis is nearly orthogonal to the illumination axis. Then, the good lateral resolution in the detection path compensates the poor axial resolution in the illumination path and vice versa. The lateral resolution dominates the overall resolution, and the resulting observation volume is nearly isotropic [Lindek and Stelzer, 1994; Stelzer and Lindek, 1994]. It can be shown that an azimuth angle of ϑ = 90° results in the smallest confocal volume [Lindek et al., 1994b]. Angles between 70° and 110° still result in acceptably small volumes [Lindek, Pick, and Stelzer, 1994a]. There are different possibilities to realize such an optical arrangement. The most apparent way is the use of two (or even more) microscope objectives, which are positioned in such a way that an angle ϑ of nearly 90°
FIGURE 11. Principles of theta microscopy. Two optical axes are used for illumination and detection. The two PSFs are centered and tilted by 90° relative to each other. Since the axial extents of the PSFs are larger than their lateral extents, they overlap in a volume whose size is dominated by the lateral extents. Fractions of the volumes encircled by the PSFs are illuminated but not detected and vice versa.
between illumination and detection axis can be achieved [Lindek, Pick, and Stelzer, 1994a; Stelzer et al., 1995]. Two more practical solutions that can be adapted to any standard confocal microscope are the theta double objective (TDO) [Stelzer and Lindek, 1996a] or a configuration where a single microscope objective lens can be used (single-lens theta microscope, SLTM) [Lindek, Stefany, and Stelzer, 1997; Stelzer and Lindek, 1996b]. In Fig. 12 one possible SLTM design is shown. The beam of the microscope objective is reflected by the surface of a horizontal mirror, and the fluorescence signal is detected by the same objective via a coated rectangular prism that is glued to the flat mirror (Figs. 12 and 13). This mirror unit is placed between the objective lens and its focal plane. It deflects illumination and detection light in such a way that their foci coincide and the detection axis is perpendicular to the illumination axis. Since the
FIGURE 12. Single-lens confocal theta microscope. The SLTM is based on a confocal fluorescence microscope. The light is focused into the sample that is above a horizontal mirror in the object plane O. A second mirror deflects the light that is emitted at an angle of 90°. While the illumination is on the optical axis, the detection is off-axis.
FIGURE 13. Mirror unit in a single-lens confocal theta microscope. The theta mirror unit consists of a horizontal mirror with an attached prism. The incident laser beam is deflected off the plane mirror and forms a focus above the front mirror surface. The fluorescence is emitted in all directions, but only a fraction can be detected due to the constraints of the optical system. As indicated by the dashed lines, it is deflected off the hypotenuse of the prism toward the microscope objective lens and is focused into an off-axis point in the image plane.
illumination light is reflected by the surface of the horizontal mirror, the resulting focus is above the mirror surface. The detection light is reflected by the hypotenuse face of the rectangular prism so that the detection axis is horizontal. Using further prisms on the flat mirror, different kinds of microscopies can be realized [Lindek, Stefany, and Stelzer, 1997]. For example, a 4Pi(A)-confocal fluorescence microscope can be built by arranging two prisms opposite to each other. The 4Pi-illumination is performed using the prisms, and the fluorescence light is detected using the flat mirror. The investigation of physical or biological specimens in a confocal theta microscope is performed by mounting them onto glass capillaries, which are scanned through the coincident foci of illumination and detection axes.
The extent and the overlap of the illumination PSF and the detection PSF determine the shape of the overall PSF. Therefore, the resulting observation volume can be considerably reduced by choosing an angle ϑ ≈ 90° between the axes used to illuminate the sample and to detect the emitted light. This means the good lateral resolution of the optical detection system compensates for the inferior axial resolution of the illumination system. Additionally, the good lateral resolution of the illumination system compensates for the inferior axial resolution of the detection system. The result is an almost spherical observation volume that results in an almost isotropic resolution (Fig. 14).
The confocal fluorescence-4Pi(A) theta intensity PSF is calculated by replacing the illumination PSF by the 4Pi(A) illumination PSF. The excellent lateral resolution of the detection intensity PSF damps the axial side lobes considerably (Fig. 14c, d). Since the lateral and axial resolution are proportional to the inverse of the numerical aperture and its square, respectively, theta microscopy is very well suited for optical systems that use low-NA, large-working-distance objective lenses [Stelzer et al., 1995]. There are also some technical constraints, since two lenses have to come quite close, and although a water immersion lens with a numerical aperture of 0.9 has a better axial resolution than an oil immersion lens with NA = 1.4 its volume resolution is still worse [Stelzer and Lindek, 1994; Sheppard, 1995]. More about theta microscopy is found in remarks by other authors [Gu, 1996; Hell, 1997; Shotton, 1995].
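The compensation argument for ϑ = 90° can be illustrated with a deliberately crude model. The sketch below assumes Gaussian intensity PSFs with hypothetical single-lens widths (`sigma_lat`, `sigma_ax`) and treats the detection PSF as the illumination PSF rotated by 90° about the y axis; it is an illustration of the geometry, not a diffraction calculation.

```python
import numpy as np

def fwhm_product(sigma_a, sigma_b):
    """FWHM along one axis of the product of two Gaussians with standard deviations sigma_a and sigma_b."""
    sigma = 1.0 / np.sqrt(1.0 / sigma_a ** 2 + 1.0 / sigma_b ** 2)
    return 2.0 * np.sqrt(2.0 * np.log(2.0)) * sigma

# Hypothetical single-lens widths in micrometres for a low-NA lens (assumed values).
sigma_lat, sigma_ax = 0.10, 0.45

# Confocal: illumination and detection share the optical (z) axis.
confocal = {
    "x": fwhm_product(sigma_lat, sigma_lat),
    "y": fwhm_product(sigma_lat, sigma_lat),
    "z": fwhm_product(sigma_ax, sigma_ax),
}

# Theta at 90 degrees: the detection PSF is rotated about y, so its lateral
# extent lies along z and its axial extent lies along x.
theta = {
    "x": fwhm_product(sigma_lat, sigma_ax),
    "y": fwhm_product(sigma_lat, sigma_lat),
    "z": fwhm_product(sigma_ax, sigma_lat),
}

volume_confocal = confocal["x"] * confocal["y"] * confocal["z"]
volume_theta = theta["x"] * theta["y"] * theta["z"]

print("confocal z/x anisotropy:", confocal["z"] / confocal["x"])
print("theta    z/x anisotropy:", theta["z"] / theta["x"])
print("volume ratio theta/confocal:", volume_theta / volume_confocal)
```

In this model the theta arrangement trades a slightly wider x extent for a much narrower z extent, which is the behaviour described in the text.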
V. NONLINEAR IMAGING
The methods described so far assume that the intensity of the fluorescence emission is linearly proportional to the intensity of the absorbed light. This assumption is usually correct so long as most of the fluorophore molecules are in the ground state. If the excitation intensity becomes too high, the linear response fails and less fluorescent light is emitted than expected. The other inherent nonlinear effect is that the fluorophores are consumed by bleaching. There is always a certain probability that fluorophores react with each other or fall victim to free oxygen radicals, that is, they are photobleached. Although such effects have been described and taken advantage of [Brakenhoff, Visscher, and Gijsbers, 1994; Sandison and Webb, 1994; Sandison et al., 1995b], they are not yet common.
FIGURE 14. Intensity point-spread functions of confocal theta microscopes. All PSFs are calculated for a numerical aperture of 0.9, a refractive index of 1.518, an excitation wavelength of 488 nm and an emission wavelength of 530 nm. (a) Two-dimensional confocal intensity PSF for a confocal theta fluorescence microscope. (b) Comparison of the z-component of the PSF in a confocal and a confocal theta fluorescence microscope. The axial resolution is improved by a factor of 3.5. (c) Two-dimensional confocal intensity PSF for a 4Pi-confocal theta fluorescence microscope. (d) Comparison of the z-component of the PSF in a 4Pi-confocal and a 4Pi-confocal theta fluorescence microscope. The higher side lobes are well suppressed.
The most popular nonlinear imaging methods take advantage of absorbing more than a single photon. Another method is to deplete the ground states in certain areas with one laser and thereby prevent their observation with another probing laser.
A. Two-Photon Excitation
During the research for her Ph.D. thesis, Maria Goppert-Mayer [1931] was the first to realize that the transition from the ground state into an excited
state can be accomplished by absorbing two photons, each having half the energy of the gap. She also realized that the probability for such a process is quite low and that high intensities are required to induce it. Thus it was not until 1961 that Kaiser and Garrett [1961] proved the existence of this effect in an experiment using lasers. The important aspect for microscopy is that the fluorescence emission $F_{2h\nu}$ after two-photon excitation (TPE) is proportional to the probability of absorbing two photons within a short period of time. This probability is proportional to the square of the excitation intensity, hence the fluorescence intensity is proportional to the square of the excitation intensity:
$$F_{1h\nu} \propto I_{exc}, \qquad F_{2h\nu} \propto I_{exc}^2.$$
The PSF of a microscope that is based on TPE is thus the square of the illumination intensity PSF [Sheppard and Gu, 1990]. The TPE-microscope has the same properties as a CFM but does not require a detection pinhole. In a CFM, point illumination and point detection together define the observation volume. In a TPE-microscope the volume is defined by the intensity-squared dependence of the fluorescence emission:
$$|h_{2h\nu}(x, y, z)|^2 = \left(|h_{ill}(x, y, z)|^2\right)^2.$$
By adding a point detector the resolution is further improved [Stelzer et al., 1994]:
$$|h_{2h\nu,cf}(x, y, z)|^2 = \left(|h_{ill}(x, y, z)|^2\right)^2 \cdot |h_{det}(x, y, z)|^2.$$
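The interplay between squaring the illumination PSF and doubling the excitation wavelength can be checked with a small numerical sketch. It assumes a Gaussian stand-in for the lateral intensity profile whose FWHM scales as 0.51 λ/NA; the wavelengths (488 nm for single-photon, 976 nm for two-photon excitation, 530 nm for detection) and the helper names are illustrative assumptions.

```python
import numpy as np

def gaussian_profile(x_um, wavelength_um, na=1.4):
    """Gaussian stand-in for a lateral intensity PSF; FWHM is assumed to be 0.51 * wavelength / NA."""
    fwhm_um = 0.51 * wavelength_um / na
    sigma = fwhm_um / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-x_um ** 2 / (2.0 * sigma ** 2))

def fwhm(x, y):
    """Full width at half maximum of a single-peaked curve sampled on x."""
    above = x[y >= y.max() / 2.0]
    return above.max() - above.min()

x = np.linspace(-1.0, 1.0, 40001)                        # lateral position in micrometres

single_photon = gaussian_profile(x, 0.488)               # |h_ill|^2 at 488 nm
two_photon = gaussian_profile(x, 0.976) ** 2             # (|h_ill|^2)^2 at twice the wavelength
two_photon_cf = two_photon * gaussian_profile(x, 0.530)  # with an additional point detector

ratio = fwhm(x, two_photon) / fwhm(x, single_photon)
print(f"single-photon FWHM      : {fwhm(x, single_photon) * 1000:.0f} nm")
print(f"two-photon FWHM         : {fwhm(x, two_photon) * 1000:.0f} nm")
print(f"two-photon confocal FWHM: {fwhm(x, two_photon_cf) * 1000:.0f} nm")
print(f"two-photon / single-photon width ratio: {ratio:.3f}")
```

Within this Gaussian model the doubled wavelength widens the profile by a factor of 2 and the squaring narrows it by √2, so the two-photon profile ends up about √2 wider than the single-photon one; the point detector recovers part of that loss.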
Denk, Strickler, and Webb [1990; 1991] were the first to describe a microscope based on TPE. They used a colliding pulse mode-locked (CPM) laser to demonstrate the effect in test samples and biological specimens. More recent microscopes usually use lasers with short pulses in the femtosecond range and peak pulse powers in the kW range. TPE has been reported with picosecond and cw lasers [Hanninen, Soini, and Hell, 1994; Hell et al., 1994b], but their advantage for microscopy is not clear at the moment. Basically, all important dyes can be excited with laser lines between 700 and 1100 nm [Fischer, Cremer, and Stelzer, 1995; Xu and Webb, 1996]. The wavelengths are thus about twice as long as those used for single-photon excitation. The longer wavelength is also the reason why the resolution of the TPE-microscope is worse than that of a CFM, which is only partially compensated by the 1/√2 term due to the squaring effect. A discussion of the possibilities to use two-photon emission has also been reported [Hell, Soukka, and Hanninen, 1995]. TPE-microscopy will also not evade the problem of the refractive index mismatch just discussed. This has been verified theoretically [Hell et al., 1993; Hell and Stelzer, 1995;
Jacobsen et al., 1994; Jacobsen and Hell, 1995] and experimentally [Hell et al., 1993; Jacobsen et al., 1994; Hell and Stelzer, 1995; Jacobsen and Hell, 1995].
(1) The main advantage of TPE-microscopy is that a well-defined illumination volume is created. This has been demonstrated in bleaching experiments using single- and two-photon excitation. Only in TPE is a hole (a volume in which the fluorophore has been bleached and is no longer excitable) found at the location of the geometric focus [Denk, Strickler, and Webb, 1990; Stelzer et al., 1994]. In single-photon excitation the fluorophore is bleached all along the optical axis, whereas in TPE-microscopy only those volumes that are observed (Δz = ±0.45 µm) are actually excited.
(2) Caged fluorescent dyes and compounds such as caged FITC [Mitchison, 1989] and caged ATP [Kubitscheck, 1995] can be activated in a volume well defined in three-dimensional space.
(3) The excitation light is well beyond the glass barrier of 380 nm.
(4) Fluorophores excited at longer wavelengths are usually quite efficient, and photomultipliers have a higher quantum efficiency in the blue region than in the red region.
(5) Biological objects are less sensitive to near-infrared than to blue and ultraviolet light.
(6) Since the excitation wavelength is longer, less light is scattered [Stelzer et al., 1994].
B. Multiphoton Excitation
A further extension of TPE-microscopy is the use of three or more photons
to bridge the gap from the ground state to an excited state in a fluorophore. Several papers report results in this direction [Davey et al., 1995; Gryczynski, Malak, and Lakowicz, 1996; He et al., 1995; Hell et al., 1996a; Nakamura, 1993; Sheppard, 1996]. However, an application has not been reported and an advantage is not obvious:
$$|h_{nh\nu}(x, y, z)|^2 = \left(|h_{ill}(x, y, z)|^2\right)^n,$$
$$|h_{nh\nu,cf}(x, y, z)|^2 = \left(|h_{ill}(x, y, z)|^2\right)^n \cdot |h_{det}(x, y, z)|^2.$$
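As a rough worked illustration of the n-th power, assume (purely for the sake of the estimate) that the illumination intensity PSF is Gaussian with width σ at a fixed excitation wavelength. Raising it to the n-th power then yields another Gaussian whose width is reduced by √n:

```latex
|h_{ill}(r)|^2 \approx \exp\!\left(-\frac{r^2}{2\sigma^2}\right)
\quad\Longrightarrow\quad
\left(|h_{ill}(r)|^2\right)^n
  = \exp\!\left(-\frac{n\,r^2}{2\sigma^2}\right)
  = \exp\!\left(-\frac{r^2}{2\,(\sigma/\sqrt{n})^2}\right).
```

The Gaussian shape and the fixed wavelength are assumptions of this estimate; since σ itself grows with the excitation wavelength, the net effect depends on which wavelength is actually needed to bridge the gap with n photons.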
The resolution will be further improved but the required intensities are much higher (the excitation efficiency decreases), and the likelihood of inducing artifacts, for example, to damage the samples or the fluorophores, is also much higher.
C. Stimulated-Emission-Depletion Fluorescence Microscopy
After the absorption of an excitation photon, the fluorescent molecules undergo a transition from a low vibronic level of the ground state $S_0$ to a
vibrationally excited level of a higher singlet state. Within picoseconds this level decays to a low vibronic level of the first singlet state $S_1$, which has a lifetime of 1-5 ns. This state is now susceptible to stimulated emission, provided that the wavelength of the light is in the emission spectrum of the dye. Hell and Wichmann first proposed stimulated-emission-depletion (STED) fluorescence microscopy [Hell and Wichmann, 1994]. In STED microscopy the diffraction resolution limit is overcome by employing the effect of stimulated emission to inhibit the fluorescence process in the outer regions of the illumination PSF. Therefore, the spatial extent of the PSF in the focal plane is reduced, and as a consequence the resolution is increased. The stimulated emission is induced by an additional beam of light (STED beam), which depletes the excited singlet state $S_1$ of the fluorophores before fluorescence can take place. For stimulated emission it is advantageous to use pulsed lasers with pulses significantly shorter than the average lifetime of the excited state, that is, in the picosecond range. STED can thus be realized by two subsequent pulses, one for excitation and one for stimulated emission. This results in a temporal separation of excitation and stimulated emission. A set-up of a STED microscope should be possible by using two STED beams symmetrically offset by $v = 1.22\pi$ with respect to the geometric focus. With this offset the first minimum of $h_{STED}(v)$ coincides with the maximum of $h_{ill}(v)$. The resulting effective excitation PSF of the STED microscope is
where $n_1(x, y, z)$ is the spatial distribution of the fluorophore molecules in the $S_1$ state. The lateral resolution of STED microscopy is about 3-5 times higher than that of confocal microscopes. On the other hand, the increase in lateral resolution is associated with a reduction of the detectable intensity.
D. Ground-State-Depletion Fluorescence Microscopy
Hell and Kroug [1995] introduced the ground-state-depletion (GSD) microscope. The idea is to deplete the ground states of the fluorophores in the outer regions of a PSF in such a way that no excitation-emission process is possible and that all emissions come from the innermost region of the PSF. In contrast to the STED microscope, the GSD microscope can be used with low-power, continuous-wave illumination. A lateral resolution in the range of 10-20 nm, which would be an improvement by an order of magnitude compared to confocal microscopy, has been predicted. The calculation of the intensity PSF relies on three or more PSFs (Fig. 15). The
central PSF is responsible for the actual detection process while the outer PSFs cause the depletion of the outer regions:
$$|h_{GSD}(x, y, z)|^2 = |h_{ill}(x, y, z)|^2 \cdot \left(1 - |h_{ill}(x + \alpha, y, z)|^2\right) \cdot \left(1 - |h_{ill}(x - \beta, y, z)|^2\right).$$
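The three-PSF product can be evaluated in one lateral dimension as a plausibility check. The sketch below makes several assumptions that are not taken from the text: the intensity PSF is the Airy pattern (2 J1(v)/v)² in the focal plane, both offsets are equal to 1.22π in optical units, and all three PSFs are normalised to a peak value of 1. The Bessel function is computed from Bessel's integral so that only NumPy is required; `bessel_j1`, `airy_intensity`, and `central_fwhm` are hypothetical helper names.

```python
import numpy as np

def bessel_j1(x, samples=400):
    """J1(x) from Bessel's integral (1/pi) * int_0^pi cos(tau - x sin tau) dtau, midpoint rule."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    tau = (np.arange(samples) + 0.5) * np.pi / samples
    return np.cos(tau[None, :] - x[:, None] * np.sin(tau)[None, :]).mean(axis=1)

def airy_intensity(v):
    """Normalised Airy intensity (2 J1(v) / v)^2 with the limit 1 at v = 0."""
    v = np.atleast_1d(np.asarray(v, dtype=float))
    small = np.abs(v) < 1e-9
    safe = np.where(small, 1.0, v)
    amplitude = np.where(small, 1.0, 2.0 * bessel_j1(safe) / safe)
    return amplitude ** 2

def central_fwhm(x, intensity):
    """Width of the central lobe: walk outward from x = 0 until the curve drops below half."""
    centre = int(np.argmin(np.abs(x)))
    half = intensity[centre] / 2.0
    j = centre
    while j < len(x) - 1 and intensity[j] >= half:
        j += 1
    return 2.0 * (x[j] - x[centre])

offset = 1.22 * np.pi                       # assumed symmetric offset of both depletion PSFs
v = np.linspace(-8.0, 8.0, 1601)            # lateral optical coordinate

i_ill = airy_intensity(v)
i_gsd = i_ill * (1.0 - airy_intensity(v + offset)) * (1.0 - airy_intensity(v - offset))

fwhm_ill = central_fwhm(v, i_ill)
fwhm_gsd = central_fwhm(v, i_gsd)
print(f"illumination FWHM: {fwhm_ill:.2f} optical units")
print(f"three-PSF product FWHM: {fwhm_gsd:.2f} optical units")
```

With unit-peak depletion PSFs the narrowing is modest; the much larger gains predicted for GSD rely on saturating the depletion, which this simple product does not model.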
If the intensity of the laser used for GSD is higher than 10 MW/cm², the first triplet state $T_1$ of the fluorescent molecules has to be taken into account. The molecules undergo a recycling process from the ground state $S_0$ to the first singlet state $S_1$ and back to $S_0$. During each loop, a fraction is caught via intersystem crossing from $S_1$ to $T_1$ in the long-lived triplet state. This leads to the depletion of the ground state. The ground state remains depleted so long as the excitation beams are switched on. Considering an arrangement of GSD and excitation laser beams as pointed out for STED microscopy, one can estimate the resolution of GSD microscopy. For an offset of $\Delta v_x = 1.22\pi$, the first minima of the GSD beams coincide at the geometrical focus,
$$h_{depl}(v_x) = h_1(v_x - \Delta v_x) + h_2(v_x + \Delta v_x),$$
whereas the main maximum of one beam partly overlaps with the first side maximum of the other. Their intensity can be given by
$$|h_{GSD}(x, y, z)|^2 = |h_{ill}(x, y, z)|^2 \cdot \left(1 - n_2(x, y, z)\right),$$
where $n_2(x, y, z)$ is the spatial distribution of the fluorophores in the triplet state. GSD fluorescence microscopy is limited by the relaxation of the dye from the triplet state, since this determines the maximum pixel scan rate. To record the neighboring point, one has to wait until all the molecules are back in the ground state again. Therefore, the maximum recording speed is about 200 kHz, which is of the same order as that of a confocal laser scan microscope.
VI. APERTURE FILTERS
A well-established field is the modification of the PSF by changing the light distribution in the illumination and/or detection aperture. In such arrangements, absorbing or phase-shifting plates are placed into the Fourier plane of the optical system. They will affect all beams, independent of the location of their focus in the object plane. Well known in conventional microscopy are phase contrast and differential interference contrast (DIC or Nomarski). Both can be used in laser scanning microscopy but are of no importance to CFM.
A main issue in any fluorescence microscopy is to have an efficient detection system. All fluorescence emitted by an excited molecule should be detected. Apertures in the illumination path can, however, have any desired transmittance so long as the loss can be compensated by simply increasing the illumination intensity. Apertures in the detection path must have a high transmission and are thus best avoided. The idea of using apertures was discussed extensively by Francia [1952], who realized that super-resolution can be pushed to an arbitrarily high level at the expense of signal. He also noted a conflict with some basic physical principles. Boivin [1952] offered a calculation more relevant to the theory of microscopy in which he determined the diffraction due to concentric arrays of rings. Finally, McCutchen [1964] discussed the three-dimensional intensity and phase distribution in the focus and presented a result for the smallest achievable diameter of a focal spot. Since 1982 a whole series of papers have been published that discuss the effects of the aperture modification on the lateral and axial resolution in CFM. In essence, it is probably fair to say that annular apertures will improve the lateral resolution but at the same time tend to decrease the axial resolution. An exception is the confocal theta microscope because its lateral and axial resolution are determined by the lateral extents of the illumination and the detection PSFs. Annular apertures will thus improve the lateral as well as the axial resolution in confocal theta fluorescence microscopy. More complicated apertures with a central and an annular opening have been proposed. A special class of these filters, in which the areas of the central opening and the outer annular opening are identical, has the interesting property of leaving the lateral resolution intact while improving the axial resolution by a limited factor, as shown by Martinez-Corral et al. [1995a].
An axial resolution gain results in loss of transmission. Other types of apertures with gradient intensity changes and super-resolving characteristics have been proposed, but their fabrication is somewhat difficult. To summarize, the resolution gains achievable with technically feasible pupil plane filters are at most a factor of two [Hegedus and Sarafis, 1986]. A breakthrough has been the proof that any rotationally symmetric pupil filter can be approximated by a binarized set of concentric rings. This has pushed the field somewhat and resulted in special apertures for improved resolution in confocal microscopy. By this method the implementation of gradient intensity changes has become possible, since the fabrication of binarized versions of these filters with similar properties is easier. There are various methods for achieving this binarization [Hegedus, 1985]. The calculation of PSFs is identical to the methods we have mentioned so far. A pupil function P(ρ) defines the contributing areas. Intensity PSFs are calculated for the detection and the illumination path and then
multiplied:
$$h(u, v) = -i\,\frac{2\pi n \sin^2\alpha}{\lambda} \int_0^1 P(\rho)\,J_0(v\rho)\,e^{-\frac{i}{2}u\rho^2}\,\rho\,d\rho, \qquad -1 \le P(\rho) \le 1.$$
In Fig. 16 we present an example described by Martinez-Corral et al. [1995a], which shows an improved axial resolution. Finally, it should be mentioned that these apertures are located in conjugate Fourier planes. So, even with a single lens for illumination and detection, the pupil functions can be different.
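The effect of a two-zone pupil on the axial response can be sketched by evaluating the pupil integral on the optical axis (v = 0, so the Bessel factor equals 1). The example assumes a binary filter whose central disc and outer ring each cover a quarter of the aperture area, so that half of the illumination aperture is obstructed; the detection pupil is left open and the Stokes shift is ignored. The prefactor and the sign of the exponent are irrelevant here because only normalised intensities are compared. All function names are hypothetical.

```python
import numpy as np

def axial_intensity(u, pupil, samples=1000):
    """Normalised |h(u, 0)|^2 from the pupil integral, evaluated with a midpoint rule over rho."""
    rho = (np.arange(samples) + 0.5) / samples
    weights = pupil(rho) * rho / samples                  # P(rho) * rho * d(rho)
    amplitude = np.exp(-0.5j * np.outer(u, rho ** 2)) @ weights
    return np.abs(amplitude) ** 2 / weights.sum() ** 2

def open_pupil(rho):
    """Fully transparent aperture."""
    return np.ones_like(rho)

def two_zone_pupil(rho):
    """Central disc and outer ring of equal area; the zone in between is blocked (assumed layout)."""
    t = rho ** 2                                          # fraction of the aperture area inside rho
    return ((t <= 0.25) | (t >= 0.75)).astype(float)

def central_fwhm(u, intensity):
    """Width of the central lobe: walk outward from u = 0 until the curve drops below half."""
    centre = int(np.argmin(np.abs(u)))
    half = intensity[centre] / 2.0
    j = centre
    while j < len(u) - 1 and intensity[j] >= half:
        j += 1
    return 2.0 * (u[j] - u[centre])

u = np.linspace(-20.0, 20.0, 1601)                        # axial optical coordinate

ill_open = axial_intensity(u, open_pupil)
ill_filter = axial_intensity(u, two_zone_pupil)
confocal_open = ill_open * ill_open                       # open detection pupil in both cases
confocal_filter = ill_filter * ill_open

gain_illumination = central_fwhm(u, ill_open) / central_fwhm(u, ill_filter)
gain_confocal = central_fwhm(u, confocal_open) / central_fwhm(u, confocal_filter)
print(f"axial gain, illumination only: {gain_illumination:.2f}")
print(f"axial gain, confocal product : {gain_confocal:.2f}")
```

The numbers produced by this sketch depend on the assumed zone layout and should not be read as the values reported in the cited work.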
VII. AXIAL TOMOGRAPHY
If the specimen is tilted in the focal plane it can be imaged from different directions, and this should result in an improved axial resolution. Skaer and Whytock [1975] first described the tilting of objects by a few degrees. Shaw [1990] and Shaw et al. [1989] presented a method in which the specimens were tilted by an angle of ±90°. This allowed an investigation of the object from different sides. But the complexity of internal movements of structures in biological specimens and internal rearrangements during the tilting process seem to have limited the resolution improvement. Axial tomography is a microscopic technique first presented by Bradl et al. [1992] that tilts objects by any angle in the range 0 to 2π. Two- and three-dimensional images become recordable from any desired perspective. A special tilting device is used in which the specimens are adapted to a rotatable mounted capillary or a glass fiber (Fig. 17). The rotation axis of the object is usually parallel to the capillary axis. The resolution along all three axes depends on the technique that is used to observe the object. Using a CFM will provide the highest resolution. The method can be used to generate views from different directions and to use computational methods to reconstruct an improved view of the object [Larkin et al., 1994; Satzler and Eils, 1997; Shaw, 1990]. However, a common problem in many fields in biology is the quantitative distance measurement of adjacent objects. In a conventional microscope the depth of field region of a 3D object is projected onto a two-dimensional image. Therefore, distances mostly appear to be shorter than they are. For the determination of the distance d of two points P and Q in the object space, one has to determine their coordinates:
$$d = \sqrt{(x_P - x_Q)^2 + (y_P - y_Q)^2 + (z_P - z_Q)^2}.$$
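How the coordinate precision enters the distance can be sketched with first-order error propagation. The example assumes independent coordinate errors with hypothetical standard deviations (20 nm laterally, 60 nm axially); the function name `distance_with_error` is illustrative.

```python
import numpy as np

def distance_with_error(p, q, sigma_p, sigma_q):
    """Euclidean distance of P and Q and its first-order error for independent coordinate errors."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    sigma_p, sigma_q = np.asarray(sigma_p, dtype=float), np.asarray(sigma_q, dtype=float)
    delta = p - q
    d = np.sqrt(np.sum(delta ** 2))
    # Each coordinate contributes with the squared direction cosine of the connecting line.
    variance = np.sum((delta / d) ** 2 * (sigma_p ** 2 + sigma_q ** 2))
    return d, np.sqrt(variance)

# Hypothetical localisation precisions (x, y, z) in micrometres: worse along the optical axis.
sigma = (0.02, 0.02, 0.06)

d_lateral, err_lateral = distance_with_error((0, 0, 0), (1, 0, 0), sigma, sigma)
d_axial, err_axial = distance_with_error((0, 0, 0), (0, 0, 1), sigma, sigma)

print(f"pair in the focal plane  : d = {d_lateral:.3f} um, error = {err_lateral * 1000:.1f} nm")
print(f"pair along the optic axis: d = {d_axial:.3f} um, error = {err_axial * 1000:.1f} nm")
```

Under these assumptions the same pair of points is measured about three times more precisely once it has been rotated into the focal plane.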
The error Δd of the distance measurement depends on the spatial location
FIGURE 16. Intensity point-spread function for a confocal fluorescence microscope with a multislit aperture. The PSFs are calculated for a numerical aperture of 1.4, a refractive index of 1.518, an excitation wavelength of 488 nm, and an emission wavelength of 530 nm. The central opening and the outer ring have the same area. Half of the illumination aperture is obstructed while the detection aperture is completely open. (a) Two-dimensional confocal intensity PSF. (b) Comparison of the r-component of the PSF with that of a confocal microscope. The lateral resolution has not changed significantly. (c) Comparison of the z-component of the PSF with that of a confocal microscope. The axial resolution has been improved.
FIGURE 17. Setup of a capillary-based tilting device used for axial tomography. A capillary attached to a mounting block is placed into the focal region of the microscope objective lens. The capillary axis is chosen perpendicular to the optical axis. The capillary is located between the mounting block and the cover glass and is embedded in a buffer medium. The freely rotating axis is pointing out of the image.
of the two points and is determined by the measurement with the lowest precision. This is always the measurement along the optical axis. With the help of a tilting device as in axial tomography, the object can be moved into the focal plane of the microscope, and the distance between the two points can be measured accurately (Fig. 18).
A. Three-Dimensional Measurements and Qualitative Analysis
Object volumes can be determined by optically sectioning the object from different views. In one 3D data set the resolution along the optical axis is inferior to the resolution in the focal plane. Additional information from data sets acquired at different angles can be
FIGURE 18. Distance measurement using axial tomography. (a) Two objects inside the capillary are in different planes but laterally indistinguishable. (b) By rotating the capillary around its axis and a careful translation both objects are moved into the focal plane. The distance d between the two objects is thereby maximized.
included in the analysis to achieve the best possible resolution (ideally the lateral resolution). After the segmentation of distinct domains from each data set, their volumes can be determined. Moreover, the tilting of the sample allows observation of only one part of interest in the object from different views by angular sectioning. As in computer tomography, only one image is recorded at each angle. This
results in a number of image data sets of the same object in cylindrical coordinates, which have to be transformed into a Cartesian coordinate system. But it should be considered that after each tilting step the focal plane has to be readjusted (Fig. 18), so that in general there is no common point of reference in the different images. Nevertheless a qualitative visualization of the data is indeed possible after the alignment of the image data in such a way that for all images the same coordinates are allocated to the center of mass (bary center) in the object. Animated sequences from these “corrected” images provide a first impression of the three-dimensional organization of the object.
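The alignment step, in which every image is shifted so that the center of mass of the object receives the same coordinates, can be sketched as follows. This is a minimal illustration on synthetic two-dimensional images; it uses whole-pixel circular shifts via `np.roll`, which is an assumption made for brevity (real data would more likely need subpixel interpolation and padding). The helper names are hypothetical.

```python
import numpy as np

def center_of_mass(image):
    """Intensity-weighted center of mass (bary center) in pixel coordinates."""
    total = image.sum()
    grids = np.indices(image.shape)
    return np.array([(grid * image).sum() / total for grid in grids])

def align_to_center(image):
    """Shift the image by whole pixels so that its center of mass lands on the central pixel."""
    target = (np.array(image.shape) - 1) / 2.0
    shift = np.rint(target - center_of_mass(image)).astype(int)
    return np.roll(image, tuple(int(s) for s in shift), axis=tuple(range(image.ndim)))

def blob(shape, center, sigma=3.0):
    """Synthetic test object: a Gaussian blob at the given pixel position."""
    yy, xx = np.indices(shape)
    return np.exp(-((yy - center[0]) ** 2 + (xx - center[1]) ** 2) / (2.0 * sigma ** 2))

# Two views of the same object whose position changed after refocusing (synthetic data).
image_a = blob((65, 65), (20, 40))
image_b = blob((65, 65), (35, 25))

aligned_a = align_to_center(image_a)
aligned_b = align_to_center(image_b)

print("before:", center_of_mass(image_a), center_of_mass(image_b))
print("after :", center_of_mass(aligned_a), center_of_mass(aligned_b))
```

After the shift both images share the same reference point, which is all that is needed for an animated sequence of the kind described above.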
VIII. SPECTRAL PRECISION DISTANCE MICROSCOPY
In this paper we have so far looked at resolution, that is, at two objects that emit light of the same wavelength. Therefore, an image can only show the sum of both objects. However, if two pointlike objects emit at distinct wavelengths, two independent images can be recorded that will each show a single object. The exact location of each object can be calculated using the intensity-weighted center of mass equivalents, and the distance of any two objects can be determined with a noise-limited precision [Burns et al., 1985]. The same idea also applies to single-particle tracking in video sequences [Saxton and Jacobson, 1997]. The issue is to determine distances from intensity-weighted center of mass equivalents in independently recorded images. Such distances can be below 20 nm. Another example is the photonic force microscope [Florin et al., 1997], which uses the position of a single bead to determine a three-dimensional structure. The position of the bead inside the focal volume can be determined with a precision that is most likely below 10 nm. The distance of topological structures in an object can thus be determined with a resolution around 15 nm. An important example is the determination of the surface topology of integrated circuits using, for example, confocal reflection microscopes [Wijnaendts-van-Resandt, 1987]. The height differences of planes that are sufficiently far apart can be determined with an unlimited precision. Here the surface roughness, the precision with which the position of the object can be measured along the optical axis, the reflectivity of the surface, and the coherence of the light source [Hell et al., 1991] limit the resolution. In the case of distance determination of small objects, the localization accuracy of these objects is given by the error of the coordinates for the intensity maximum. This intensity maximum corresponds to the intensity bary center.
Therefore, the standard deviation of the intensity bary center
coordinates of a series of measurements can be used to express the localization accuracy of an object. In a biological specimen it was found that it can be estimated to about a tenth of the corresponding PSF-FWHM. Thus the accuracy of distance determination for objects that are more than 1 FWHM apart and possess the same spectral signature is considerably better than the optical resolution (as low as ±20 nm) [Bradl et al., 1996a; Bradl et al., 1996b]. In order to measure distances of objects that are beyond 1 FWHM, "spectral precision distance microscopy" can be used [Bornfleth et al., 1998; Burns et al., 1985; Hausmann et al., 1998; Hausmann et al., 1997]. As a prerequisite, pointlike objects have to carry a different spectral signature (e.g., different emission spectra or different fluorescence lifetimes). Diffraction limited images can be recorded independently for each object, and their intensity bary centers are determined independently from each other with a localization accuracy valid for targets of the spectral signature. Applying digital image analysis, the Euclidean distances between the intensity bary centers can be calculated. The resolution equivalent, that is, the smallest distance between targets of different spectral signature, determines the precision with which these distances can be measured. It depends strongly on the localization error, which is influenced by the optical resolution, the signal-to-noise ratio, the detector sensitivity, and the digitization [Bornfleth et al., 1998; Manders et al., 1996; Manders, Verbeek, and Aten, 1993]. In particular, chromatic shifts between the objects of different spectral signatures have to be taken into account [Bornfleth et al., 1998; Manders, 1997]. By using a combination of spectral precision distance microscopy with one of the other high-resolution microscopy techniques, a further improvement of localization accuracy down to the nanometer range seems possible.
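The principle can be sketched on synthetic data: two pointlike objects closer together than one FWHM are imaged in separate spectral channels, each bary center is computed independently, and the Euclidean distance follows from the two results. The pixel size, spot width, photon numbers, and object positions below are hypothetical, chromatic shifts are ignored, and background is assumed to be zero.

```python
import numpy as np

def spot(shape, center, sigma):
    """Diffraction-limited spot approximated by a Gaussian (an assumption of this sketch)."""
    yy, xx = np.indices(shape)
    return np.exp(-((yy - center[0]) ** 2 + (xx - center[1]) ** 2) / (2.0 * sigma ** 2))

def barycenter(image):
    """Intensity-weighted center of mass (bary center) in pixel coordinates."""
    yy, xx = np.indices(image.shape)
    return np.array([(yy * image).sum(), (xx * image).sum()]) / image.sum()

pixel_nm = 50.0                          # hypothetical pixel size
sigma_px = 2.0                           # spot width; FWHM is about 235 nm
true_a = np.array([32.0, 30.3])          # object seen in the first spectral channel
true_b = np.array([32.0, 31.5])          # object seen in the second channel, 60 nm away

rng = np.random.default_rng(0)           # fixed seed so that the example is repeatable
channel_a = rng.poisson(1000.0 * spot((64, 64), true_a, sigma_px)).astype(float)
channel_b = rng.poisson(1000.0 * spot((64, 64), true_b, sigma_px)).astype(float)

# Each channel contains a single object, so each bary center can be found independently.
distance_nm = np.linalg.norm(barycenter(channel_a) - barycenter(channel_b)) * pixel_nm
true_distance_nm = np.linalg.norm(true_a - true_b) * pixel_nm

print(f"true distance     : {true_distance_nm:.1f} nm")
print(f"measured distance : {distance_nm:.1f} nm")
```

In a single-channel image the two spots would merge into one, whereas the two independent bary centers recover a distance far below the FWHM, limited only by photon noise in this idealised case.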
IX. COMPUTATIONAL METHODS
The idea behind deconvolution is best understood by looking at transfer functions. Higher spatial frequencies are transferred less efficiently than lower ones. Deconvolution essentially divides the Fourier transform of the image by the Fourier transform of the PSF and thereby amplifies the higher frequencies [Agard, 1983; Agard, 1984; Agard and Sedat, 1983]. Because of noise this procedure is not straightforward: the division also amplifies the noise, which therefore has to be estimated as a function of the frequency [Shaw and Rawlins, 1991]. In addition, the PSF must be known. It is either measured in a separate experiment [Carrington, 1994; Carrington et al., 1995a; Hiraoka, Sedat, and Agard, 1990] or estimated during the deconvolution process [Holmes, 1992]. A
perfect deconvolution would produce a transfer function in the shape of a rectangle. Its Fourier transform is a sinc function, which causes ringing that is visible at the edges in the reconstructed image. The solution is to employ additional filters that give the transfer function of the image a smoother shape. As mentioned earlier, conventional microscopy has a constant integrated intensity. It is, therefore, unable to resolve axial edges. Deconvolution of conventional images works well with pointlike objects such as spheres and collections of spheres. With integrating CCD cameras it can start from images with a high dynamic range, but since the result describes only a small volume, much of this dynamic range is given up during the computational process.
Computational methods that claim to reassign the photons to the location from which they were emitted have also been applied to images recorded with CFM [Van Der Voort and Strasters, 1995] and 4Pi-CFM [Hell, Schrader, and Van Der Voort, 1997; Schrader and Hell, 1996; Schrader, Hell, and Van Der Voort, 1996]. The resulting images should have an even higher resolution [Shaw, 1995].
A method that has been discussed several times but has never had a really strong impact is the use of more than one detector in the image plane [Bertero, Brianzi, and Pike, 1987; Reinholz et al., 1989]. By detecting not only the central spot that is usually observed in a confocal microscope but the complete pattern, using either a CCD camera or several point detectors, the damping of the transfer function toward higher frequencies can be compensated to a certain extent [Bertero et al., 1990; Reinholz and Wilson, 1994]. By using a square instead of a circular aperture, the signal can be directed to a small number of detectors [Barth and Stelzer, 1994], which greatly simplifies the data collection. The method is computationally intensive because several signals have to be combined to calculate an image. 
Since it reconstructs an almost square optical transfer function, extensive ringing occurs and has to be corrected.
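The division of Fourier transforms discussed in this section can be sketched in one dimension. The example below is a Wiener-type regularized inverse filter, a common textbook form and not necessarily the algorithm used in the cited papers. The constant `noise_to_signal` is a simplifying assumption (the text notes that the noise has to be estimated as a function of frequency), and the Gaussian PSF, the two-point object, and all function names are hypothetical.

```python
import cmath
import math


def dft(signal, inverse=False):
    """Naive discrete Fourier transform (adequate for short signals)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, value in enumerate(signal):
            acc += value * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def circular_convolve(obj, psf):
    """Blur an object with a PSF using periodic boundary conditions."""
    n = len(obj)
    return [sum(obj[j] * psf[(i - j) % n] for j in range(n)) for i in range(n)]


def periodic_gaussian(n, sigma):
    """Gaussian PSF centered on index 0 and normalized to unit sum."""
    values = [math.exp(-min(i, n - i) ** 2 / (2.0 * sigma ** 2)) for i in range(n)]
    total = sum(values)
    return [v / total for v in values]


def regularized_deconvolve(image, psf, noise_to_signal=1e-3):
    """Divide the image spectrum by the PSF spectrum with regularization.

    Plain deconvolution would compute g / h for every frequency. The added
    constant keeps the quotient bounded where the transfer function h is
    close to zero, so that noise is not amplified without limit.
    """
    image_ft = dft(image)
    psf_ft = dft(psf)
    restored_ft = [
        g * h.conjugate() / (abs(h) ** 2 + noise_to_signal)
        for g, h in zip(image_ft, psf_ft)
    ]
    return [value.real for value in dft(restored_ft, inverse=True)]


n = 64
obj = [0.0] * n
obj[29] = 1.0  # two point objects, 6 samples apart
obj[35] = 1.0
psf = periodic_gaussian(n, sigma=2.0)
image = circular_convolve(obj, psf)
restored = regularized_deconvolve(image, psf)

print("blurred   peak / midpoint:", round(image[29], 3), round(image[32], 3))
print("restored  peak / midpoint:", round(restored[29], 3), round(restored[32], 3))
```

The restored peaks are higher and the dip between them is deeper than in the blurred signal. Because the effective transfer function of this filter falls off steeply, the restored signal also oscillates around the peaks, which is the ringing described in the text; a smoother final filter would trade some resolution for less ringing.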
X. SPINNING DISKS
Nipkow disks [Nipkow, 1884] that rotate in either primary or conjugated image planes are an alternative to beam scanning devices [Kino, 1995; Petráň et al., 1968; Xiao and Kino, 1987]. Their optical properties are identical to those of any other CFM, with a few minor exceptions [Wilson and Sheppard, 1984, pp. 157-168; Wilson, 1990]. The advantage is that images can be observed directly through an eyepiece or integrated using CCD cameras. A very important development is to replace the disk by
arrays of microlenses and to use lasers instead of white light sources [Yin et al., 1995]. This approach also works with TPE [Bewersdorf et al., 1998]. The field has attracted renewed interest through the work of Juskaitis et al. [1996], who realized that by subtracting certain patterned images from a bright field image a confocal image can be found on top of a constant background signal.
XI. PERSPECTIVES OF CONFOCAL FLUORESCENCE MICROSCOPY
This review concentrates on the scientific and not the technical developments that push CFM. New lasers, improved detectors, better scanners, faster computers, and so forth will influence the application of these developments and may even make certain currently important developments obsolete. Improved computer interfaces that guide the user, perform many tasks automatically, and thus relieve the user of routine corrections will of course have a major impact. Such programs can suggest the optimal wavelength and objective lens, and they can provide hints concerning the optimal recording conditions and the effective resolution. Of course, this development is found in every scientific instrument.
The development of lasers will have a dramatic influence. Since single-photon excitation requires laser power only in the mW regime, small diode lasers that cover the range from blue to infrared will replace helium-neon and argon-ion lasers. Small, diode-pumped solid state lasers will provide enormously high powers and extremely short pulses in the 10-femtosecond range and thus make two- and three-photon excitation much more widely available. In a few years simple laser light sources will cover the whole range from UV to IR with three or maybe even only two lasing units. Filters and shutters will be replaced to some extent by pulsing and switching lasers. The least progress can be expected in the field of detectors. 
Quenched avalanche photodiodes will push the operating frequency of solid state detectors and may even surpass photomultipliers, but the outcome of this development is currently unclear. Two-photon excitation is one of the most fascinating developments of the past few years. Unfortunately, it has generated a lot of hype, which makes it somewhat difficult to estimate its actual impact on the application side. The situation is similar to the early days of CFM, when many applications made use of it but in unconvincing ways. It took a number of years to identify the really useful areas of application. TPE-microscopy is most likely useful for thick specimens (> 100 µm) that require a good resolution. It is of no advantage for the study of single cells. It is also the technique that can make best use of the many new lasers that already have
appeared and will continue to appear on the market. A further impact may result from the development of new dyes with an increased two-photon excitation cross section [Cheng et al., 1998]. An important development will be the application of computational methods. They will certainly supplement the purely technical attempts to improve the resolution. Again, it is difficult to assess the impact since a direct comparison with CFM and TPE-microscopy is very time-consuming. While some of the techniques mentioned at this point could disappear, the computational methods will not. They will either be used to improve the quality (whatever that may be) of conventional images or they will be applied to images recorded with confocal, 4Pi, and theta microscopes. The problem with all methods that claim to achieve a higher resolution is the signal-to-noise ratio (SNR). A higher resolution is always equivalent to a decreased observation volume, and (assuming the dye concentration remains the same) this means that fewer fluorophores are observed. Hence the signal decreases, and to maintain the SNR the observation time must be extended [Stelzer, 1998]. Another point is the significance of a higher resolution in biological specimens. Apart from green fluorescent protein (GFP) fusions [Cubitt et al., 1995], most labeling methods are indirect and require a homogeneous penetration of the sample to guarantee its complete labeling. This, and of course the fact that target and ligand have finite sizes, puts a lower limit on the actual distances that can be resolved. There is no doubt that SWFM, Theta-CFM, and 4Pi-CFM will be further developed and will take advantage of many technical achievements. Their impact in terms of applications is a different issue and depends on the acceptance by potential users and their direct advantage. This in turn depends on how those fields develop. 
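The signal-to-noise argument made above (a smaller observation volume contains fewer fluorophores, so the observation time has to grow) can be made concrete with a small Python sketch. It assumes that photon shot noise is the only noise source, so that the SNR equals the square root of the number of detected photons, and that the signal scales linearly with dye concentration, volume, and time. All numbers and function names are hypothetical.

```python
import math


def detected_photons(concentration_per_fl, volume_fl, photons_per_dye_per_s, dwell_s):
    """Expected photon count from a uniformly labeled observation volume."""
    return concentration_per_fl * volume_fl * photons_per_dye_per_s * dwell_s


def shot_noise_snr(photons):
    """SNR under the assumption of pure Poisson (shot) noise."""
    return math.sqrt(photons)


def dwell_for_equal_snr(reference_dwell_s, reference_volume_fl, new_volume_fl):
    """Observation time that keeps the photon count, and thus the SNR, constant."""
    return reference_dwell_s * reference_volume_fl / new_volume_fl


# Hypothetical numbers: halving the extent of the observation volume along
# all three axes reduces the volume eightfold.
concentration = 500.0   # fluorophores per femtoliter
rate = 1000.0           # detected photons per fluorophore per second
volume_before = 0.2     # femtoliters
volume_after = volume_before / 8.0
dwell_before = 1e-4     # seconds

photons_before = detected_photons(concentration, volume_before, rate, dwell_before)
longer_dwell_s = dwell_for_equal_snr(dwell_before, volume_before, volume_after)
photons_after = detected_photons(concentration, volume_after, rate, longer_dwell_s)

print("SNR before:", round(shot_noise_snr(photons_before), 2))
print("dwell time needed after volume reduction (s):", longer_dwell_s)
print("SNR after:", round(shot_noise_snr(photons_after), 2))
```

Under these assumptions an eightfold smaller volume needs an eightfold longer observation time for the same SNR. Photobleaching and saturation, which this sketch ignores, would make the trade-off less favorable in practice.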
One should not underestimate the power of conventional microscopy and the power that is provided by relatively simple techniques such as fluorescence resonance energy transfer [Bastiaens et al., 1996]. In toto, confocal fluorescence microscopy has seen a rapid development since 1979 [Brakenhoff, Blom, and Barends, 1979]. Instrumentation may have matured, but there is no reason to believe that the emergence of new ideas has stopped.
REFERENCES
Abbe, E. (1873). Beiträge zur Theorie des Mikroskops und der mikroskopischen Wahrnehmung. Arch. Mikroskopische Anatomie 9 411-468.
Agard, D. A. (1983). A least-squares method for determining structure factors in three-dimensional tilted-view reconstructions. J. Mol. Biol. 167(4) 849-52.
Agard, D. A. (1984). Optical sectioning microscopy: cellular architecture in three dimensions. Ann. Rev. Biophys. Bioeng. 13 191-219.
Agard, D. A. and Sedat, J. W. (1983). Three-dimensional architecture of a polytene nucleus. Nature 302 676-681.
Airy, G. B. (1841). On the diffraction of an annular aperture. London, Edinburgh, and Dublin Philosophical Magazine, Series 3 18 1-10, 132-133.
Bacallao, R., Kiai, K., and Jesaitis, L. (1995). Guiding principles of specimen preservation for confocal fluorescence microscopy. In Handbook of Biological Confocal Microscopy, J. B. Pawley, ed. Plenum Press, New York, pp. 311-325.
Bailey, B., Farkas, D. L., Taylor, D. L., and Lanni, F. (1993). Enhancement of axial resolution in fluorescence microscopy by standing-wave excitation. Nature 366 44-48.
Bailey, B., Krishnamurthi, V., and Lanni, F. (1994). Optical subsectioning and multiple focal-plane imaging in the standing-wave fluorescence microscope. In Proceedings of the 22nd Annual Meeting of the Microscopy Society of America, G. W. Bailey and A. J. Garratt-Reed, eds. San Francisco Press, Inc., New Orleans, Louisiana, pp. 158-159.
Barth, M. and Stelzer, E. H. K. (1994). Boosting the optical transfer function with a spatially resolving detector in a high numerical aperture confocal reflection microscope. Optik 96(2) 53-58.
Bastiaens, P. I. H., Majoul, I. V., Verveer, P. J., Söling, H.-D., and Jovin, T. M. (1996). Imaging the intracellular trafficking and state of the AB5 quaternary structure of cholera toxin. EMBO J. 15(16) 4246-4253.
Bertero, M., Boccacci, P., Brakenhoff, G. J., Malfanti, F., and Van Der Voort, H. T. M. (1990). Three-dimensional image restoration and super-resolution in fluorescence confocal microscopy (of biological objects). In 1st International Conference on Confocal Microscopy and 2nd International Conference on 3-D Image Processing in Microscopy (Netherlands Soc. Electron Microscopy; Royal Microscopical Soc.; Int. Soc. Stereology; Univ. Amsterdam, 15-17 March 1989), Amsterdam, Netherlands, pp. 3-20.
Bertero, M., Brianzi, P., and Pike, E. R. (1987). Super-resolution in confocal scanning microscopy. Inverse Probl. 3 195-212.
Bewersdorf, J., Pick, R., and Hell, S. W. (1998). Multifocal multiphoton microscopy. Opt. Lett. 23(9) 655-657.
Boivin, A. (1952). On the Theory of Diffraction by Concentric Arrays of Ring-Shaped Apertures. J. Opt. Soc. Am. 42(1) 60-64.
Born, M. and Wolf, E. (1980). Principles of Optics. Pergamon Press, Oxford.
Bornfleth, H., Sätzler, K., Eils, R., and Cremer, C. (1998). High-precision distance measurements and volume-conserving segmentation of objects near and below the resolution limit in three-dimensional confocal fluorescence microscopy. J. Microsc. 189(2) 118-136.
Bradl, J., Hausmann, M., Ehemann, V., Komitowski, D., and Cremer, C. (1992). A tilting device for three-dimensional microscopy: application to in situ imaging of interphase cell nuclei. J. Microsc. 168 47-57.
Bradl, J., Hausmann, M., Schneider, B., Rinke, B., and Cremer, C. (1994). A versatile 2π-tilting device for fluorescence microscopes. J. Microsc. 176(3) 211-221.
Bradl, J., Rinke, B., Krieger, H., Schneider, B., Haar, F.-M., Durm, M., Hausmann, M., and Cremer, C. (1996a). Comparative study of three-dimensional distance resolution in conventional, confocal-laser-scanning and axial tomographic fluorescence light microscopy. In BIOS Europe '96 Optoelectronic Research and Techniques Optical Microscopic Techniques, Wien.
Bradl, J., Rinke, B., Schneider, B., Edelmann, P., Krieger, H., Hausmann, M., and Cremer, C. (1996b). Resolution improvement in 3-D microscopy by object tilting. Europ. Microsc. Anal. 44 (November) 9-11.
Brakenhoff, G. J. (1979). Imaging modes in confocal scanning light microscopy (CLSM). J. Microsc. 117(2) 233-242.
Brakenhoff, G. J., Blom, P., and Bakker, C. (1978). Confocal scanning light microscopy with high aperture optics. In Optica Hoy y Mañana (Optics Present and Future) (Internat. Commission for Optics; Ministerio de la Defensa; et al., 10-17 Sept. 1978), J. Bescos, A. Hidalgo, L. Plaza, and J. Santamaria, eds. Instituto de Optica 'Daza de Valdes,' CSIC, Sociedad Espanola de Optica, Madrid, Spain, pp. 215-18.
Brakenhoff, G. J., Blom, P., and Barends, P. (1979). Confocal scanning light microscopy with high aperture immersion lenses. J. Microsc. 117(2) 219-232.
Brakenhoff, G. J., Visscher, K., and Gijsbers, E. J. (1994). Fluorescence bleach rate imaging. J. Microsc. 175(2) 154-161.
Burns, D. H., Callis, J. B., Christian, G. D., and Davidson, E. R. (1985). Strategies for attaining superresolution using spectroscopic data as constraints. Appl. Opt. 24(2) 154-161.
Carlsson, K., Mossberg, K., Helm, P. J., and Philip, H. (1992). Use of UV excitation in confocal laser scanning fluorescence microscopy. Micron and Microscopica Acta 23(4) 413-428.
Carrington, W. A. (1994). Advances in computational fluorescence microscopy. In Proceedings of the 52nd annual meeting of the Microscopy Society of America, G. W. Bailey and A. J. Garratt-Reed, eds. San Francisco Press, Inc., New Orleans, LA, pp. 926-927.
Carrington, W. A., Lynch, R. M., Moore, E. D. W., Isenberg, G., Fogarty, K. E., and Fay, F. S. (1995a). Superresolution three-dimensional images of fluorescence in cells with minimal light exposure. Science 268 1483-1487.
Carrington, W. A., Lynch, R. M., Moore, E. D. W., Isenberg, G., Fogarty, K. E., and Fredric, F. S. (1995b). Superresolution three-dimensional images of fluorescence in cells with minimal light exposure. Science 268(5216) 1483-1487.
Cheng, P. C., Pan, S. J., Shih, A., Kim, K.-S., Liou, W. S., and Park, M. S. (1998). Highly efficient upconverters for multiphoton fluorescence microscopy. J. Microsc. 189(3) 199-212.
Cox, G. and Sheppard, C. (1993). Effects of image deconvolution on optical sectioning in conventional and confocal microscopes. Bioim. 1 82-95.
Cox, I. J., Sheppard, C. J. R., and Wilson, T. (1982). Super-resolution by confocal fluorescent microscopy. Optik 60(4) 391-396.
Cubitt, A. B., Heim, R., Adams, S. R., Boyd, A. E., Gross, L. A., and Tsien, R. Y. (1995). Understanding, improving and using green fluorescent proteins. Trends Biochem. Sci. 20(11) 448-455.
Davey, A. P., Bourdin, E., Henari, F., and Blau, W. (1995). Three photon induced fluorescence from a conjugated organic polymer for infrared frequency upconversion. Applied Physics Letters 67(7) 884-885.
Denk, W., Strickler, J. H., and Webb, W. W. (1990). Two-photon laser scanning fluorescence microscopy. Science 248(4951) 73-6.
Denk, W., Strickler, J. P., and Webb, W. W. (1991). Two-photon laser microscopy. In United States Patent. Cornell Research Foundation, Inc., Switzerland, New York, p. 12.
Fischer, A., Cremer, C., and Stelzer, E. H. K. (1995). Fluorescence of coumarins and xanthenes after two-photon absorption with a pulsed titanium-sapphire laser. Appl. Opt. 34(12) 1989-2003.
Florin, E.-L., Pralle, A., Hörber, J. K. H., and Stelzer, E. H. K. (1997). Photonic force microscope based on optical tweezers and two-photon excitation for biological applications. Journal of Structural Biology 119 202-211.
Francia, G. T. D. (1952). Super-Gain Antennas and Optical Resolving Power. Supplemento del Nuovo Cimento 9(9) 426-437.
Freimann, R., Pentz, S., and Hörler, H. (1997). Development of a standing-wave fluorescence microscope with high nodal plane flatness. J. Microsc. 187(3) 193-200.
Frieden, B. R. (1967). Optical transfer of the three-dimensional object. J. Opt. Soc. Am. 57(1) 56-66.
Gibson, S. F. and Lanni, F. (1991). Experimental test of an analytical model of aberration in an oil-immersion objective lens used in three-dimensional light microscopy. J. Opt. Soc. Am. A 8(10) 1601-13.
Goodman, J. W. (1968). Introduction to Fourier Optics. McGraw-Hill, San Francisco.
Göppert-Mayer, M. (1931). Über Elementarakte mit zwei Quantensprüngen. Ann. Phys. 9 273-294.
Gryczynski, I., Malak, H., and Lakowicz, J. R. (1996). Multiphoton Excitation of the DNA stains DAPI and Hoechst. Bioim. 4 138-148.
Gu, M. (1996). Principles of three-dimensional imaging in confocal microscopes. World Scientific Publishing Co., Singapore, p. 337.
Gu, M. and Sheppard, C. J. R. (1995). Optical transfer function analysis for two-photon 4π confocal fluorescence microscopy. Opt. Commun. 114(1-2) 45-49.
Gustafsson, M. G. L., Agard, D. A., and Sedat, J. W. (1995). Sevenfold improvement of axial resolution in 3D widefield microscopy using two objective lenses. SPIE Proc. 2412 147.
Gustafsson, M. G. L., Agard, D. A., and Sedat, J. W. (1996). 3D widefield microscopy with two objective lenses: experimental verification of improved axial resolution. SPIE Proc. 2655 62.
Hänninen, P. E., Hell, S. W., Salo, J., Soini, E., and Cremer, C. (1995). Two-photon excitation 4π confocal microscope: enhanced axial resolution microscope for biological research. Applied Physics Letters 66(13) 1698-1700.
Hänninen, P. E., Soini, E., and Hell, S. W. (1994). Continuous wave excitation two-photon fluorescence microscopy. J. Microsc. 176(3) 222-225.
Hausmann, M., Esa, A., Edelmann, P., Trakhtenbrot, L., Bornfleth, H., Schneider, B., Bradl, J., Ben-Bassat, I., Rechavi, G., and Cremer, C. (1998). Advanced precision light microscopy for the analysis of 3D-nanostructures of chromatin breakpoint regions towards a structure-function relationship of the BCR-ABL region. In Fundamentals for the Assessment of Risks from Environmental Radiation, G. Horneck, ed. Plenum Press, NY.
Hausmann, M., Schneider, B., Bradl, J., and Cremer, C. (1997). High-precision distance microscopy of 3D-nanostructures by a spatially modulated excitation fluorescence microscope. SPIE Proc. 3197 217-222.
He, G. S., Bhawalkar, J. D., Prasad, P. N., and Reinhardt, B. A. (1995). Three photon absorption induced fluorescence and optical limiting effects in an organic compound. Opt. Lett. 20(14) 1524-1526.
Hegedus, Z. S. (1985). Annular pupil arrays - application to confocal scanning. Opt. Acta 32 815-826.
Hegedus, Z. S. and Sarafis, V. (1986). Superresolving filters in confocally scanned imaging systems. J. Opt. Soc. Am. A 3 1892-1896.
Hell, S. (1997). Increasing the resolution of far-field fluorescence light microscopy by point-spread-function engineering. In Topics in fluorescence spectroscopy: Nonlinear and two-photon-induced fluorescence, J. Lakowicz, ed. Plenum Press, NY, pp. 361-426.
Hell, S., Reiner, G., Cremer, C., and Stelzer, E. H. K. (1993). Aberrations in confocal fluorescence microscopy induced by mismatches in refractive index. J. Microsc. 169(3) 391-405.
Hell, S. and Stelzer, E. H. K. (1992a). Fundamental improvement of resolution with a 4π-confocal fluorescence microscope using two-photon excitation. Opt. Commun. 93 277-282.
Hell, S. and Stelzer, E. H. K. (1992b). Properties of a 4π-confocal fluorescence microscope. J. Opt. Soc. Am. A 9(12) 2159-2166.
Hell, S., Stelzer, E. H. K., Lindek, S., and Cremer, C. (1994a). Confocal microscopy with an increased detection aperture: type-B 4π-confocal microscopy. Opt. Lett. 19(3) 222-224.
Hell, S., Witting, S., Schickfus, M. V., Resandt, R. W. W. V., Hunklinger, S., Smolka, E., and Neiger, M. (1991). A confocal beam scanning white-light microscope. J. Microsc. 163(2) 179-187.
Hell, S. W., Bahlmann, K., Schrader, M., Soini, A., Malak, H., Gryczynski, I., and Lakowicz, J. R. (1996a). Three-photon excitation in fluorescence microscopy. Journal of Biomedical Optics 1(1) 71-74.
Hell, S. W., Hänninen, P. E., Salo, J., Kuusisto, A., Soini, E., Wilson, T., and Tan, J. B. (1994b). Pulsed and cw confocal microscopy: a comparison of resolution and contrast. Opt. Commun. 113 144-152.
Hell, S. W. and Kroug, M. (1995). Ground-state-depletion fluorescence microscopy: a concept for breaking the diffraction resolution limit. Appl. Phys. B 60 495-497.
Hell, S. W., Lindek, S., Cremer, C., and Stelzer, E. H. K. (1994c). Measurement of the 4π-confocal point spread function proves 75 nm axial resolution. Applied Physics Letters 64(11) 1335-1337.
Hell, S. W., Lindek, S., and Stelzer, E. H. K. (1994d). Enhancing the axial resolution in far-field light microscopy: two-photon 4π-confocal fluorescence microscopy. J. Mod. Opt. 41(4) 675-681.
Hell, S. W., Schrader, M., Hänninen, P. E., and Soini, E. (1996b). Resolving fluorescence beads at 100-200 nm axial distance with a two-photon 4π-microscope operating in the near infrared. Opt. Commun. 128(4-6) 394.
Hell, S. W., Schrader, M., and Van Der Voort, H. T. M. (1997). Far-field fluorescence microscopy with three-dimensional resolution in the 100-nm range. J. Microsc. 187(1) 1-7.
Hell, S. W., Soukka, J., and Hänninen, P. E. (1995). Two- and multiphoton detection as an imaging mode and means of increasing the resolution in far-field light microscopy: a study based on photon-optics. Bioim. 3(2) 64-69.
Hell, S. W. and Stelzer, E. H. K. (1995). Lens aberrations in confocal fluorescence microscopy. In Handbook of Biological Confocal Microscopy, J. B. Pawley, ed. Plenum Press, NY, pp. 347-354.
Hell, S. W. and Wichmann, J. (1994). Breaking the diffraction resolution limit by stimulated emission: stimulated-emission-depletion fluorescence microscopy. Opt. Lett. 19(11) 780-782.
Hiraoka, Y., Sedat, J. W., and Agard, D. A. (1987). The use of a charge-coupled device for quantitative optical microscopy of biological structures. Science 238 36-41.
Hiraoka, Y., Sedat, J. W., and Agard, D. A. (1990). Determination of three-dimensional imaging properties of a light microscope system. Partial confocal behavior in epifluorescence microscopy. Biophys. J. 57(2) 325-33.
Holmes, T. J. (1992). Blind deconvolution of quantum limited incoherent imagery: maximum-likelihood approach. J. Opt. Soc. Am. A 9(7) 1052-61.
Holmes, T. J., Bhattacharyya, S., Cooper, J. A., Hanzel, D., Krishnamurthi, V., Lin, W. C., Roysam, B., Szarowski, D. H., and Turner, J. N. (1995). Light-microscopic images reconstructed by maximum likelihood deconvolution. In Handbook of Biological Confocal Microscopy, J. B. Pawley, ed. Plenum Press, NY, pp. 389-402.
Hopkins, H. H. (1943). The Airy disk formula for systems of high relative aperture. Proceedings of the Physical Society 55 116-128.
Jacobsen, H., Hänninen, P. E., Soini, E., and Hell, S. W. (1994). Refractive-index-induced aberrations in two-photon confocal fluorescence microscopy. J. Microsc. 176(3) 226-230.
Jacobsen, H. and Hell, S. W. (1995). Effect of the specimen refractive index on the imaging of a confocal fluorescence microscope employing high aperture oil immersion lenses. Bioim. 3 39-47.
Juskaitis, R., Wilson, T., Neil, M. A. A., and Kozubek, M. (1996). Efficient real-time confocal microscopy with white light sources. Nature 383 804-806.
Kaiser, W. and Garrett, C. G. B. (1961). Two-photon excitation in CaF2:Eu2+. Phys. Rev. Lett. 7 229-231.
Kino, G. S. (1995). Intermediate optics in Nipkow disk microscopes. In Handbook of Biological Confocal Microscopy, J. B. Pawley, ed. Plenum Press, NY, pp. 155-165.
Krishnamurthi, V., Liu, Y. H., Bhattacharyya, S., Turner, J. N., and Holmes, T. J. (1995). Blind deconvolution of fluorescence micrographs by maximum likelihood estimation. Appl. Opt. 34(29) 6633-6647.
Kubitscheck, U., Pratsch, L., Passow, H., and Peters, R. (1995). Calcium pump kinetics determined in single erythrocyte ghosts by microphotolysis and confocal imaging. Biophys. J. 69(1) 30-41.
Lanni, F. (1986). Standing-wave fluorescence microscopy. In Applications of fluorescence in the biomedical sciences, D. L. Taylor, A. S. Waggoner, R. F. Murphy, F. Lanni, and R. R. Birge, eds. Alan R Liss, Inc., pp. 505-521.
Lanni, F., Bailey, B., Farkas, D. L., and Taylor, D. L. (1993). Excitation field synthesis as a means for obtaining enhanced axial resolution in fluorescence microscopes. Bioim. 1 187-196.
Lanni, F., Waggoner, A. S., and Taylor, D. L. (1986). Standing wave luminescence microscopy, USA.
Larkin, K., Cogswell, C. J., O’Byrne, J. W., and Arnison, M. R. (1994). High-resolution, multiple optical mode confocal microscope: II. Theoretical aspects of confocal transmission microscopy. In SPIE Proc., San Jose, CA, pp. 55-63.
Lindek, S. (1993). Auflösungsmessungen mit dem 4π-konfokalen Mikroskop. Universität Heidelberg, Institut für Angewandte Physik.
Lindek, S., Pick, R., and Stelzer, E. (1994a). Confocal theta microscope with three objective lenses. Rev. Sci. Instr. 65(11) 3367-3372.
Lindek, S., Salmon, N., Cremer, C., and Stelzer, E. H. K. (1994b). Theta microscopy allows phase regulation in 4π(A)-confocal two-photon fluorescence microscopy. Optik 98(1) 15-20.
Lindek, S., Stefany, T., and Stelzer, E. H. K. (1997). Single-lens theta microscopy - a new implementation of confocal theta microscopy. J. Microsc. 188(3) 280-284.
Lindek, S. and Stelzer, E. H. K. (1994). Confocal theta microscopy and 4π-confocal theta microscopy. In Three-dimensional microscopy image acquisition and processing. SPIE Proc., San Jose, CA, pp. 188-194.
Lindek, S., Stelzer, E. H. K., and Hell, S. (1995). Two new high-resolution confocal fluorescence microscopies (4Pi, Theta) with one- and two-photon excitation. In Handbook of Biological Confocal Microscopy, J. B. Pawley, ed. Plenum Press, NY, pp. 417-430.
Manders, E. M. M. (1997). Chromatic shift in multicolour confocal microscopy. J. Microsc. 185(3) 321-328.
Manders, E. M. M., Hoebe, R., Strackee, J., Vossepoel, A. M., and Aten, J. A. (1996). Largest contour segmentation: a tool for the localization of spots in confocal images. Cytometry 23 15-21.
Manders, E. M. M., Verbeek, F. J., and Aten, J. A. (1993). Measurement of co-localization of objects in dual-colour confocal images. J. Microsc. 169(3) 375-382.
Marsman, H. J. B., Wicker, R., Resandt, R. W., Brakenhoff, G. J., and Blom, P. (1983). Mechanical scan system for microscopic applications. Rev. Sci. Instr. 54(8) 1047-1052.
Martinez-Corral, M., Andres, P., Ojeda-Castañeda, J., and Saavedra, G. (1995a). Tunable axial superresolution by annular binary filters. Application to confocal microscopy. Opt. Commun. 119(5-6) 491-498.
Martinez-Corral, M., Andres, P., and Zapata-Rodriguez, C. J. (1995b). Improvement of three-dimensional resolution in confocal scanning microscopy by combination of two pupil-plane filters. Optik 101(3) 1-4.
McCutchen, C. W. (1964). Generalized Aperture and the Three-Dimensional Diffraction Image. J. Opt. Soc. Am. 54(2) 240-244.
McCutchen, C. W. (1967). Superresolution in microscopy and the Abbe resolution limit. J. Opt. Soc. Am. 57(10) 1190-1192.
Merdes, A., Stelzer, E. H. K., and De Mey, J. (1991). The three-dimensional architecture of the mitotic spindle, analyzed by confocal fluorescence and electron microscopy. J. Electron Microsc. Tech. 18(1) 61-73.
Minsky, M. (1961). Microscopy apparatus. In United States Patent Office, US, p. 9.
Minsky, M. (1988). Memoir on inventing the confocal scanning microscope. Scanning 10 128-138.
Mitchison, T. J. (1989). Journal Cell Biology 109: 637.
Montag, M., Kukulies, J., Jörgens, R., Gundlach, H., Trendelenburg, M. F., and Spring, H. (1991). Working with the confocal scanning UV-laser microscope: specific DNA localization at high sensitivity and multiple-parameter fluorescence. J. Microsc. 163(2) 201-210.
Nakamura, O. (1993). Three-dimensional imaging characteristics of laser scan fluorescence microscopy: two-photon excitation vs. single-photon excitation. Optik 93(1) 39-42.
Nipkow, P. (1884). Elektrisches Teleskop. In Kaiserliches Patentamt (Deutsches Patentamt), Germany.
Palmieri, S. L., Peter, W., Hess, H., and Schöler, H. R. (1994). Oct-4 transcription factor is differentially expressed in the mouse embryo during establishment of the first two extraembryonic cell lineages involved in implantation. Developmental Biology 166(1) 259-267.
Petráň, M., Hadravský, M., Egger, M. D., and Galambos, R. (1968). Tandem-scanning reflected-light microscope. J. Opt. Soc. Am. 58 661-664.
Petroll, W. M., Cavanagh, H. D., Lemp, M. A., Andrews, P. M., and Jester, J. V. (1992). Digital image acquisition in in vivo confocal microscopy. J. Microsc. 165 61-69.
Reinholz, F., Schütt, W., Grümmer, G., Kuhlmann, F., and Kraft, S.-K. (1989). A new powerful mode of laser scanning microscopy. Optik 82 165-168.
Reinholz, F. and Wilson, T. (1994). Image enhancement by tracking and sampling in the detection plane. Optik 96(2) 59-64.
Reinsch, S., Eaton, S., and Stelzer, E. (1998). Confocal microscopy of polarized epithelial cells. In Cell biology: a laboratory handbook, J. E. Celis, ed. Academic Press, pp. 170-178.
Sandison, D. R., Piston, D. W., Williams, R. M., and Webb, W. W. (1995a). Quantitative comparison of background rejection, signal-to-noise ratio, and resolution in confocal and full-field laser scanning microscopes. Appl. Opt. 34(19) 3576-3588.
Sandison, D. R. and Webb, W. W. (1994). Background rejection and signal-to-noise optimization in confocal and alternative fluorescence microscopes. Appl. Opt. 33(4) 603-615.
Sandison, D. R., Williams, R. M., Wells, K. S., Strickler, J., and Webb, W. W. (1995b). Quantitative fluorescence confocal laser scanning microscopy (CLSM). In Handbook of Biological Confocal Microscopy, J. B. Pawley, ed. Plenum Press, NY, pp. 39-53.
Sätzler, K. and Eils, R. (1997). Resolution improvement by 3-D reconstructions from tilted views in axial tomography and confocal theta microscopy. Bioim. 5 171-182.
Saxton, M. J. and Jacobson, K. (1997). Single-particle tracking: applications to membrane dynamics. Annu. Rev. Biophys. Biomol. Struct. 26 373-399.
Schrader, M. and Hell, S. W. (1996). 4π-confocal images with axial superresolution. J. Microsc. 183(2) 189-193.
Schrader, M., Hell, S. W., and Van Der Voort, H. T. M. (1996). Potential of confocal microscopes to resolve in the 50-100 nm range. Applied Physics Letters 69(24) 3644-3646.
Shaw, P. J. (1990). Three-dimensional optical microscopy using tilted views. J. Microsc. 158(2) 165-172.
Shaw, P. J. (1995). Comparison of wide-field/deconvolution and confocal microscopy for 3D imaging. In Handbook of Biological Confocal Microscopy, J. B. Pawley, ed. Plenum Press, NY, pp. 373-387.
Shaw, P. J., Agard, D. A., Hiraoka, Y., and Sedat, J. W. (1989). Tilted view reconstruction in optical microscopy. Three-dimensional reconstruction of Drosophila melanogaster embryo nuclei. Biophys. J. 55(1) 101-110.
Shaw, P. J. and Rawlins, D. J. (1991). The point-spread function of a confocal microscope: its measurement and use in deconvolution of 3-D data. J. Microsc. 163(2) 151-165.
Sheppard, C. J. R. (1977). The use of lenses with annular aperture in scanning optical microscopy. Optik 48(3) 329-34.
Sheppard, C. J. R. (1995). Fundamental reduction of the observation volume in far-field light microscopy by detection orthogonal to the illumination axis: confocal theta microscopy: Comment. Opt. Commun. 119 639-695.
Sheppard, C. J. R. (1996). Image formation in three-photon fluorescence microscopy. Bioim. 4 124-128.
Sheppard, C. J. R. and Gu, M. (1990). Image formation in two-photon fluorescence microscopy. Optik 86(3) 104-6.
Sheppard, C. J. R. and Wilson, T. (1981). The theory of the direct-view confocal microscope. J. Microsc. 124 107-17.
Shotton, D. M. (1995). Electronic light microscopy: present capabilities and future prospects. Histochem. Cell. Biol. 104 97-137.
Skaer, R. J. and Whytock, S. (1975). Interpretation of the three-dimensional structure of living nuclei by specimen tilt. J. Cell Sci. 19 1-10.
Slomba, A. F., Wasserman, D. E., Kaufman, G. I., and Nester, J. F. (1972). A laser flying spot scanner for use in automated fluorescence antibody instrumentation. Journal of the Assoc. for the Adv. of Med. Instrumentation 6(3) 230-234.
Speicher, M. R., Ballard, S. G., and Ward, D. C. (1996). Karyotyping human chromosomes by combinatorial multi-fluor FISH. Nature Genetics 12 368-375.
Stelzer, E. H. K. (1994). Designing a confocal fluorescence microscope. In Computer-assisted multidimensional microscopies, P. C. Cheng, T. H. Lin, W. L. Wu, and J. L. Wu, eds. Springer, NY, pp. 33-51.
Stelzer, E. H. K. (1995). The intermediate optical system of laser-scanning confocal microscopes. In Handbook of Biological Confocal Microscopy, J. B. Pawley, ed. Plenum Press, NY, pp. 139-154.
Stelzer, E. H. K. (1997). Three-dimensional light microscopy. In Handbook of Microscopy: Applications in materials science, solid-state physics and chemistry, S. Amelinckx, D. v. Dyck, J. v. Landuyt, and G. v. Tendeloo, eds. VCH-Verlag, Weinheim, NY, pp. 71-82.
Stelzer, E. H. K. (1998). Contrast, resolution, pixelation, dynamic range and signal-to-noise ratio: fundamental limits to resolution in fluorescence light microscopy. J. Microsc. 189(1) 15-24.
Stelzer, E. H. K., Hell, S., Lindek, S., Stricker, R., Pick, R., Storz, C., Ritter, G., and Salmon, N. (1994). Nonlinear absorption extends confocal fluorescence microscopy into the ultraviolet regime and confines the illumination volume. Opt. Commun. 104 223-228.
Stelzer, E. H. K. and Lindek, S. (1994). Fundamental reduction of the observation volume in far-field light microscopy by detection orthogonal to the illumination axis: confocal theta microscopy. Opt. Commun. 111 536-547.
Stelzer, E. H. K. and Lindek, S. (1996a). Konfokales Mikroskop mit einem Doppelobjektiv-System. In Deutsches Patentamt. EMBL, Germany.
Stelzer, E. H. K. and Lindek, S. (1996b). Strahlumlenkeinheit zur mehrachsigen Untersuchung in einem Mikroskop, insbesondere Rasterlichtmikroskop. In Deutsches Patentamt. EMBL, Germany.
Stelzer, E. H. K., Lindek, S., Albrecht, S., Pick, R., Ritter, G., Salmon, N. J., and Stricker, R. (1995). A new tool for the observation of embryos and other large specimens: confocal theta fluorescence microscopy. J. Microsc. 179(1) 1-10.
CONFOCAL MICROSCOPY: RECENT DEVELOPMENTS
Stelzer, E. H. K., Marsman, H. J. B., and Resandt, R. W. (1986). A setup for a confocal scanning laser interference microscope. Optik 73(1) 30-33.
Stelzer, E. H. K., Merdes, A., and De Mey, J. (1991). Konfokale Fluoreszenzmikroskopie in der Zellbiologie. Biol. in uns. Z. 1 19-25.
Török, P., Hewlett, S. J., and Varga, P. (1997). The role of specimen-induced spherical aberration in confocal microscopy. J. Microsc. 188(2) 158-172.
Török, P., Varga, P., and Booker, G. R. (1995). Electromagnetic diffraction of light focused through a planar interface between materials of mismatched refractive indices: structure of the electromagnetic field. I. J. Opt. Soc. Am. A 12(10) 2136-2144.
Török, P., Varga, P., Konkol, A., and Booker, G. R. (1996). Electromagnetic diffraction of light focused through a planar interface between materials of mismatched refractive indices: structure of the electromagnetic field. II. J. Opt. Soc. Am. A 13(11) 2232-2238.
Tsien, R. Y. and Waggoner, A. (1995). Fluorophores for confocal microscopy. In Handbook of Biological Confocal Microscopy, J. B. Pawley, ed. Plenum Press, NY, pp. 267-279.
Van Der Voort, H. T. M., Brakenhoff, G. J., Valkenburg, J. A. C., and Nanninga, N. (1985). Design and use of a computer controlled confocal microscope for biological applications. Scanning 7 66-78.
Van Der Voort, H. T. M., and Strasters, K. C. (1995). Restoration of confocal images for quantitative image analysis. J. Microsc. 178 165-181.
Van Meer, G., Stelzer, E. H. K., Wijnaendts-van-Resandt, R. W., and Simons, K. (1987). Sorting of sphingolipids in epithelial (Madin-Darby canine kidney) cells. J. Cell Biol. 105 1623-1635.
Visser, T. D., Groen, F. C. A., and Brakenhoff, G. J. (1991). Absorption and scattering correction in fluorescence confocal microscopy. J. Microsc. 163 189-200.
White, N. S., Errington, R. J., Fricker, M. D., and Wood, J. L. (1996). Multidimensional fluorescence microscopy: optical distortions in quantitative imaging of biological specimens. In Fluorescence Microscopy and Fluorescent Probes, J. Slavik, ed. Plenum Press, NY, pp. 47-56.
Wijnaendts-van-Resandt, R. W. (1987). Application of confocal beam scanning microscopy to the measurement of submicron structures. In SPIE Proceedings, pp. 101-106.
Wijnaendts-van-Resandt, R. W., Marsman, H. J. B., Kaplan, R., Davoust, J., Stelzer, E. H. K., and Stricker, R. (1985). Optical fluorescence microscopy in three dimensions: microtomoscopy. J. Microsc. 138(1) 29-34.
Wilke, V. (1983). Laser scanning in microscopy. SPIE Proc. 397 164-172.
Wilke, V. (1985). Optical scanning microscopy - the laser scan microscope. Scanning 7 88-96.
Wilson, T. (1990). Detector arrays in confocal scanning microscopes. Scanning Microscopy 4(1) 21-4.
Wilson, T. and Hewlett, S. J. (1990). The use of annular pupil plane filters to tune the imaging properties in confocal microscopy. J. Mod. Opt. 37(12) 2025-46.
Wilson, T. and Sheppard, C. J. R. (1984). Theory and practice of scanning optical microscopy. Academic Press, London.
Xiao, G. Q. and Kino, G. S. (1987). A real-time confocal scanning optical microscope. In Scanning Imaging Technology (SPIE; Comite Belge Opt.; IEEE; et al., 2-3 April 1987). The Hague, Netherlands, pp. 107-13.
Xu, C. and Webb, W. W. (1996). Measurement of two-photon excitation cross sections of molecular fluorophores with data from 690 to 1050 nm. J. Opt. Soc. Am. B 13(3) 481-491.
Yin, S., Lu, G., Zhang, J., Yu, F. T. S., and Mait, J. N. (1995). Kinoform-based Nipkow disk for a confocal microscope. Appl. Opt. 34(25) 5695-5698.
Zink, D., Cremer, T., Saffrich, R., Fischer, R., Trendelenburg, M. F., Ansorge, W., and Stelzer, E. H. K. (1998). Structure and dynamics of human interphase chromosome territories in vivo. Hum. Genet. 102 241-251.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 106
Index
A
Ampere’s laws, 133
Analog optical information processing, 241-42
Analog signal processing, CCDs and, 2-3
Annealing, 32-34
Antireflection (AR) coatings, 9
Aperture filters, 327-29
Array size, CCD, 5-7
Arrhenius plot, 31-32, 80
ath fractional Fourier domain, 259
ath order, 240, 243-44
Axial resolution, confocal microscopy:
  confocal theta microscopy, 317-21
  4Pi-, 314-17
  standing-wave fluorescence microscopy, 311-14
Axial tomography, 329-33

B
Background, binary digital image, 193-94
Backside illumination, 18-19
Barium hexagonal systems, 105
Binary digital images, 187-89
  foreground and background, 193-94
  grid graph of foreground of, 197-98
Bloomed full well charge, 19-20
Border:
  of a binary digital image, 194
  of a digital set, 193
Bounded connected component, 192
Bulk generation, 37-39
Bulk trap levels, 29-31
Buried channel operation, 15-17

C
CAD (computer aided design). See Microstrip circulators
Canada France Hawaii Telescope (CFHT), 7
Cardinality:
  of a digital arc, 191
  of shortest path, 207-9
Chain code, 213-14
Chamfer discs, 203
Chamfer distances, 202-4
Charge collection:
  backside illumination, 18-19
  buried channel operation, 15-17
  charge spreading, 18
  defined, 11
  full well, 19-20
  inversion layer, 14
  process, 13-20
  steady-state minority carrier concentration, 14
Charged coupled devices (CCDs):
  applications, 2-3
  array size, 5-7
  background, 2-4
  basic structure and operation of, 11-26
  development and current status, 4-10
  noise, 8-9
  origin of term, 2
  quantum efficiency, 9-10
  in satellite missions, 87-88
Charged coupled devices (CCDs), radiation damage and:
  annealing, 32-34
  bulk trap levels, 29-31
  charge transfer efficiency, 6, 48-66, 87
  dark current, 8, 36-48, 86-87
  deep level transient spectroscopy measurements, 31-32
  displacement damage, 28-29
  FUSE radiation environment, 34-36
  ionization damage, 27-28
  read noise, 8, 66-86, 87
  research background on, 10-11
Charge detection:
  defined, 11
  process, 23-26
Charge generation:
  defined, 11
  process, 12-13
Charge packets, 2
Charge transfer:
  defined, 11
  process, 20-23
  simulation software for, 23
Charge transfer efficiency (CTE), 6
  conclusions, 87
  defined, 48-49
  experimental results, 58-66
  extended pixel edge response technique, 57-58
  fine spot illumination, 55, 57
  measurement techniques, 54-58
  potential pockets and trapping, 49-53
  pulse train technique, 54
  simple physical model, 49-53
  x-ray illumination, 55
Charge transfer inefficiency (CTI), 48
  as a function of signal, 62-66
  as a function of temperature, 58-62
Chessboard distance, 200-1
Chirp functions, 259-60
Chord properties, 214-18
  16-neighborhood space, 228-31
City-Block distance, 200
Clamp and sample correlated double sampling (CS-CDS), 69-72
Co-firing, 105, 108
Cohen class, 258, 259
Compact chord property, 217-18
  16-neighborhood space, 230-31
Compton scattering, 27
Computer aided design (CAD). See Microstrip circulators
Confocal fluorescence microscopy (CFM):
  alternatives to, 308-9
  aperture filters, 327-29
  applications, 307-8
  axial tomography, 329-33
  computational methods, 334-35
  4Pi-, 314-17
  future developments, 336-37
  ground-state-depletion, 325-27
  improving axial resolution, 311-21
  index mismatching effects, 309-11
  light paths in, 301-6
  multiphoton excitation, 324
  nonlinear imaging, 321-27
  optical properties, calculating, 299-301
  optimal recording conditions, 309
  point-spread functions, 299-301
  resolution in light microscopy, 293-99
  sea-response, 306
  spectral precision distance microscopy, 333-34
  spinning disks, 335-36
  standing-wave, 311-14
  stimulated-emission-depletion, 324-25
  technical aspects, 307
  two-photon excitation, 322-24
Confocal theta microscopy, 317-21
Connected component, 191-92
Constant resistance deep level transient spectroscopy measurements (CR-DLTS), 31, 32
Convexity, discrete, 210-12
  grid-intersect quantization, 226-27
Copper spinel systems, 104
Correlated double sampling (CDS):
  clamp and sample, 69-72
  dual slope integration, 72-75
  effect of signal processing, 82-86
Cosmic rays, 27, 34
Coulombic scattering, 27
Cyclic dip and fire, 105

D
Dark current, 8
  bulk generation, 37-39
  conclusions, 86-87
  diffusion, 42
  experimental results, 42-48
  noise, 47-48
  sources of, 37
  surface generation, 39-40
  surface suppression, 40-42
  theory, 36-42
Deconvolution, 334-35
Deep depletion, 14, 39
Deep level transient spectroscopy measurements (DLTS), 31-32
Demagnetization factor, 115
Dielectric losses, 134-50
Differential equations, fractional Fourier transform and, 261-62
Diffusion dark current, 42
Digital arcs and closed curves, 191-95, 199
  upper and lower bounds, 232
Digital images:
  binary, 187-89
  topology, 189-98
Digital simulation of fractional Fourier transform, 263-65
Digital straight segment, 212
Dirac delta function, 248
Direct chemical deposition, 105
Discrete convexity, 210-12
  grid-intersect quantization, 226-27
Discrete disc, 199
Discrete distances and shortest paths, 198-210
Discrete geometry:
  application to vectorization, 233-34
  binary digital images, 187-89
  chord properties, 214-18
  convexity, 210-12
  digital arcs and closed curves, 191-95
  digital topology, 189-98
  distances and shortest paths, 198-210
  Freeman’s codes and chain code, 213-14
  image-to-graph mapping, 195-98
  neighborhoods, 189, 190-91
  neighborhood space (16), 218-33
  role of, 186
  straightness, 212-18, 227-32, 234
Discrete Jordan’s theorem, 192
Discrete straightness, 212-18
  checking, using duality of transformations, 234
  16-neighborhood space, 227-32
Displacement damage, 28-29
Distances, discrete, 198-210
Dithered clocking technique, 41-42
Doubly charged vacancy-vacancy (V-V) complex, 30, 31
Drift, fringe field and self-induced, 21
Dual slope integration (DSI), 72-75, 83-84
Dyadic Green’s function, 98, 159

E
Edge effect, 53
Eigenvalues and eigenfunctions, 249-52
EMS Technologies, Inc., 129, 130, 136
English Electric Valve (EEV), Inc., 26
Euclidean distance, 200, 209-10
Even operations, 252-53
Extended pixel edge response technique (EPER), 57-58

F
Fairchild Semiconductor, 5
Far Ultraviolet Spectrographic Explorer (FUSE), 10-11, 34-36, 49, 88
Fast Fourier transform (FFT), 264
Ferrite for microstrip circulators:
  hybrid circuit compatible techniques, 105-10
  magnetless compatible techniques, 110-11
  matching sections, 120-31
  material parameters, 113-19
  monolithic circuit compatible techniques, 111-13
  physical and chemical attributes of, 99-105
  processing of, 105-13
Fine Error Sensor (FES), 10-11, 49, 88
Finite element approach:
  RF field and s-parameters using, 178-81
  RF formulas from, 167-71
  static internal magnetic field, 150-57, 172-74
First-order layer effect estimation, 131-34
First-order loss effect estimation, 134-50
Flicker noise, 68-69
Ford Aerospace Corp., 5
Foreground, binary digital image, 193-94
  grid graph of, 197-98
Fourier optics, 242, 265, 273-76
4Pi-confocal fluorescence microscopy, 314-17
Fractional Fourier transform:
  applications, 241-43
  ath fractional Fourier domain, 259
  ath order, 240, 243-44
  chirp functions, 259-60
  common transform pairs, 247-49
  differential equations, 261-62
  digital simulation of, 263-65
  Dirac delta function, 248
  domains, 260
  eigenvalues and eigenfunctions, 249-52
  fundamental properties, 245-47
  Hermite-Gaussian functions, 249-51, 261-62, 276
  historical research, 240-41
  hyperdifferential form, 263
  identity and parity operators, 245
  index additivity property, 246, 252, 264-65
  notation and definitions, 243-44
  operational properties, 252-55
  quadratic phase function, 248
  Wigner distribution and, 256-60
Fractional Fourier transform, signal and image processing and, 279-86
  chirps, 282
  multistage and multichannel filtering, 282-86
Fractional Fourier transform, wave and beam propagation and:
  Fourier optics, 242, 265, 273-76
  Fresnel diffraction, 271-72
  Gaussian beam propagation, 276-78
  quadratic graded-index media, 270-71
  quadratic-phase systems, 265-70
Freeman’s codes, 213-14
Frenkel pair, 29
Fresnel diffraction, 271-72
Full well charge, 19-20

G
Garnet systems, 99-101
Gaussian beam propagation, 276-78
Gauss laws, 133
Generation-recombination noise, 67-68
Gouy phase shift, 276-78
Graph, shortest path base, 204-5, 207
Graph mapping, image-to-, 195-98
Greedy algorithm, 233
Green’s functions:
  dyadic, 98, 159
  first-order loss effect estimation, 134
  recursive, 158-67
  RF field and s-parameters using, 174-78
Grid graph:
  defined, 195-96
  of foreground of a binary digital image, 197-98
  properties, 196
Grid-intersect quantization, 224-27
Ground-state-depletion (GSD) fluorescence microscopy, 325-27

H
Helmholtz equation, 137, 142
Hermite-Gaussian functions, 249-51, 261-62, 276
Hexagonal system, 99, 102, 105, 110-11
Hot isostatic pressing, 105
Huygens’ principle, 299
Hybrid circuit compatible techniques, 105-10
Hyperdifferential form, fractional Fourier transform and, 263

I
Image-to-graph mapping, 195-98
Imaging, CCDs and line, 3-4
Index additivity property, 246, 252, 264-65
Insertion loss, 162
Ionization damage, 27-28
Isolation, 162

J
Jacobian matrix, 156-57
Jet vapor deposition, 105, 107, 109
Johnson noise, 66-67
Jordan’s theorem, discrete, 192

K
Knight move, 202, 209
Knight neighborhood, 190, 201-2

L
Liquid phase epitaxy, 105
Lithium spinel systems, 104
Loral Aerospace, 5-6

M
Magnetic garnet systems, 101
Magnetic losses, 134-50
Magnetization radian frequencies, 114
Magnetless compatible techniques, 110-11
Manganese and manganese/magnesium systems, 104
Maxwell’s equations, 142, 144-45, 146, 151, 167-68
MEGACAM camera projects, 7
Mehler’s formula, 252
Memory devices, CCDs as, 2
Metal-insulator-semiconductor (MIS) capacitor, 2, 13-14
Metallic losses, 134-50
Microstrip circulators:
  CAD for, 97-98
  EMS Technologies, 129, 130
  ferrite material parameters, 113-19
  first-order layer effect estimation, 131-34
  first-order loss effect estimation, 134-50
  hybrid circuit compatible techniques, 105-10
  Ka-band, 129-30
  magnetless compatible techniques, 110-11
  matching sections, 120-31
  monolithic circuit compatible techniques, 111-13
  physical and chemical attributes of ferrite for, 99-105
  processing of ferrite materials for, 105-13
  RF field and s-parameters using finite elements, 178-81
  RF field and s-parameters using Green’s functions, 174-78
  RF formulas from finite element approach, 167-71
  RF formulas from Green’s function theory, 158-67
  self-biasing, 123-24
  SrM, 130-31
  static internal magnetic field, formulas for, 150-57
  static internal magnetic field, results, 172-74
  Westinghouse, 126-27, 131
Monolithic circuit compatible techniques, 111-13
Move and move length, 199
Multiphoton excitation, 324
Multiple Mirror Telescope (MMT), 7
Multiple pinned phase (MPP) devices, 41

N
NASA, 5
National Research Council (NRC), Dominion Astrophysical Observatory (DAO) of, 9
Neighborhoods, 189, 190-91, 192-93
Neighborhood space, 16:
  definitions, 218-24
  discrete straightness, 227-32
  grid-intersect quantization, 224-27
  transform, 218-20
Nickel ferrite system, 102-3
Nipkow disks, 335-36
Noise, CCD, 8-9
  See also under type of
Nonlinear imaging, 321
  ground-state-depletion fluorescence microscopy, 325-27
  multiphoton excitation, 324
  stimulated-emission-depletion fluorescence microscopy, 324-25
  two-photon excitation, 322-24

O
Operational properties, fractional Fourier transform, 252-55
Optical properties, confocal microscopy and, 299-301
Oxygen-vacancy (O-V) complex, 30, 31

P
PDE2D software, 152-53, 156, 168-71, 178, 179
Permeability tensor, 114
Phillips Imaging Technology, Inc., 6
Phosphorus-vacancy (P-V) complex, 29, 30
Photon noise, 8
Plasma vapor deposition, 105
Point-spread function (PSF), amplitude, 299-301
Pulsed laser deposition (PLD), 105, 107, 109, 113
Pulse train technique, 54

Q
Quadratic graded-index media, 270-71
Quadratic phase function, 248
Quadratic-phase systems, 265-70
Quantum efficiency (QE), 9-10

R
Radiation damage. See Charged coupled devices (CCDs), radiation damage and
Radon transform operator, 257-58
Raytheon Co., 123
Read noise, 8
  conclusions, 87
  correlated double sampling, 69-75
  current-voltage measurements, 75-76
  effect of CDS signal processing, 82-86
  experimental results, 75-86
  flicker, 68-69
  generation-recombination, 67-68
  noise measurements, 76-82
  reset, 67
  sources of, 66-69
  thermal/Johnson, 66-67
Recursive Green's function, 158-67
Reset noise, 67
Resolution in light microscopy, 293-99
Reticon, Inc., 9
RF field and s-parameters:
  using finite elements, 178-81
  using Green’s functions, 174-78
RF formulas:
  from finite element approach, 167-71
  from Green’s function theory, 158-67
Roll compaction, 105

S
Scanning disks, 307
Screen printing, 105
Sea-response, 306
Shannon’s interpolation formula, 264
Shell effect, 53, 62
Shortest paths, discrete, 198-210
Signal and image processing. See Fractional Fourier transform, signal and image processing and
Simple connected component, 192
Single-lens confocal theta microscope (SLTM), 319-20
16-neighborhood space. See Neighborhood space, 16
Solution plating, 105
s-parameters:
  using finite elements, 178-81
  using Green’s functions, 174-78
Spectral precision distance microscopy, 333-34
Spinel systems, 99, 100, 102-4
Spinning disks, 335-36
Spin spray, 105
Standing-wave fluorescence microscopy (SWFM), 311-14
Static internal magnetic field:
  formulas for, 150-57
  results, 172-74
Steward Observatory, 6
Stimulated-emission-depletion (STED) fluorescence microscopy, 324-25
Straightness, discrete, 212-18
  checking, using duality of transformations, 234
  16-neighborhood space, 227-32
Strontium hexagonal systems, 105
Summing well, 24-25
Surface dark current suppression, 40-42
Surface full well charge, 19
Surface generation, 39-40

T
Tape casting, 105, 107-8, 109
Tektronix, Inc., 5, 31, 42
Texas Instruments, Inc., 5, 9
Thermal diffusion, 21
Thermal noise, 8, 66-67
Theta double objective (TDO), 319
Transfer functions, 300
Trapping by energy states, 6, 28
  bulk trap levels, 29-31
  charge transfer efficiency and, 49-53
Two-photon excitation (TPE), 322-24

U
Unbounded connected component, 192

V
Vacancy-vacancy (V-V) complex, 29-30
Vectorization, 233-34
Vertices, path and path length, 196
Vidicon tubes, 4-5

W
Wave and beam propagation. See Fractional Fourier transform, wave and beam propagation and
Westinghouse, 126-27, 131
Wide Field and Planetary Camera (WF/PC), 5, 9
Wiener filter, 279
Wigner distribution, fractional Fourier transform and, 256-60
Wire bonding, 107

X
X-ray illumination, 55

ISBN 0-12-014748-3