ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS
VOLUME 86
EDITOR-IN-CHIEF
PETER W. HAWKES, Centre National de la Recherche Scientifique, Toulouse, France
ASSOCIATE EDITOR
BENJAMIN KAZAN, Xerox Corporation, Palo Alto Research Center, Palo Alto, California
Advances in Electronics and Electron Physics
EDITED BY PETER W. HAWKES
CEMES/Laboratoire d'Optique Electronique du Centre National de la Recherche Scientifique, Toulouse, France
VOLUME 86
ACADEMIC PRESS, INC. Harcourt Brace Jovanovich, Publishers
Boston San Diego New York London Sydney Tokyo Toronto
This book is printed on acid-free paper.

COPYRIGHT © 1993 BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
ACADEMIC PRESS, INC.
1250 Sixth Avenue, San Diego, CA 92101-4311
United Kingdom Edition published by
ACADEMIC PRESS LIMITED
24-28 Oval Road, London NW1 7DX
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 49-7504
ISSN 0065-2539
ISBN 0-12-014728-9
PRINTED IN THE UNITED STATES OF AMERICA
93 94 95 96 97 BC 9 8 7 6 5 4 3 2 1
CONTENTS

CONTRIBUTORS
PREFACE

Recent Advances in GaAs Dynamic Memories
JAMES A. COOPER, JR.
  I. Introduction, Motivation, and Potential Applications
  II. pn-Junction Storage Capacitors
  III. JFET and MESFET DRAM Cells
  IV. Heterostructure DRAM Cells
  V. Bipolar DRAMs
  VI. Future Directions
  References

Expert Systems for Image Processing, Analysis, and Recognition
TAKASHI MATSUYAMA
  I. Introduction
  II. Expert Systems for Image Processing and Analysis (ESIPAs)
  III. Representing Knowledge about Image Analysis Strategies
  IV. Representing Spatial Relations and Spatial Reasoning for Image Understanding
  Concluding Remarks
  References

n-beam Dynamical Calculations
KAZUTO WATANABE
  I. Introduction
  II. n-beam Dynamical Calculation Methods
  III. Bethe Method
  IV. Multislice Method
  V. Coupled Differential Equations
  VI. Summary
  Acknowledgments
  Appendix A: Crystal Potential
  References

Methods for Calculation of Parasitic Aberrations and Machining Tolerances in Electron Optical Systems
M. I. YAVOR
  I. Introduction
  II. Parasitic Aberrations Caused by Electron Optical Element Misalignment
  III. Effects of Electromagnetic Field Disturbances on Charged Particle Trajectories
  IV. General Methods for Calculation of Electromagnetic Field Disturbances due to Electrode or Pole Face Distortions
  V. Field Disturbance in Electrostatic and Magnetic Sector Analyzers
  VI. Application of Approximate Conformal Mappings
  VII. Conclusion
  References

INDEX
CONTRIBUTORS

Numbers in parentheses indicate the pages on which the authors' contributions begin.

JAMES A. COOPER, JR. (1), School of Electrical Engineering, Purdue University, West Lafayette, IN 47907-1285
TAKASHI MATSUYAMA (81), Department of Information Technology, Faculty of Engineering, Okayama University, 3-1-1 Tsushima-Naka, Okayama, Okayama 700, Japan
KAZUTO WATANABE (173), Tokyo Metropolitan Technical College, 1-10-40 Higashiohi, Shinagawa-Ku, Tokyo 140, Japan
M. I. YAVOR (225), Institute of Analytical Instrumentation, Pr. Ogorodnikova 26, 198103 St. Petersburg, Russia
PREFACE
Computer memory, expert systems in image processing, and image simulation and tolerances are the themes of this volume. All of them are of considerable importance today.

We begin with an account by James A. Cooper on the use of gallium arsenide for dynamic RAM technology. Attractive though GaAs is, there are problems to be overcome and James Cooper examines these critically, notably the limitations to long-term charge storage on pn junctions. He then examines the various one-transistor dynamic RAM cells, notably junction field-effect transistor, metal-semiconductor FET, modulation-doped FET, heterojunction, quantum well, and bipolar cells. His chapter ends with a critical examination of future possibilities. The chapter has numerous illustrations to help make this material accessible to a wide readership.

The second chapter deals with expert systems for image processing, analysis and recognition. This is a rapidly developing field of great commercial as well as intellectual importance. Takashi Matsuyama subtitles his account "Declarative Knowledge Representation for Computer Vision," which reminds us that in these expert systems we have to distinguish between procedural and declarative knowledge. After an introductory analysis of the problem and a brief recapitulation of the associated vocabulary, the author explores in detail expert systems in the field of image processing and analysis. A section is then devoted to the representation of strategic knowledge and a further section deals with the important topic of spatial relations in image understanding. Here too, the many examples should help the reader through this difficult but important material.

The third chapter is concerned with the simulation of images by the n-beam method, which is a topic of major importance for the electron microscopist. Although this is a subject that is found in textbooks on high-resolution microscopy, new ideas and fresh developments continue to appear in the specialized journals. This survey by an expert, Kazuto Watanabe, is therefore very welcome. Simulation is an essential step in the interpretation of electron images when the detail is close to the limit of resolution of the instrument, since direct interpretation is impossible or at best risky.

The volume ends with a chapter by Michael Yavor on a remarkably unpleasant topic in particle optics, the calculation of parasitic aberrations. These aberrations arise as a result of imperfect machining of polepieces and electrodes, and are almost inevitable owing to the extremely high tolerances that have to be maintained not only in machining but also in alignment. They are not easy to calculate because they require either the application of a sophisticated perturbation calculation to a system of relatively high symmetry (typically two planes of symmetry or rotational symmetry) or the development of a fully three-dimensional approach. Although the theory of such aberrations has been known for several decades, the practical implications have not yet been fully explored. This is hence a valuable, and bold, addition to the subject.

In conclusion, I thank all the authors for taking so much time to work on their chapters, and I append a list of those anticipated in forthcoming volumes of the series.
FORTHCOMING ARTICLES

Neural networks and image processing (J. B. Abbiss and M. A. Fiddy)
Image processing with signal-dependent noise (H. H. Arsenault)
Parallel detection (P. E. Batson)
Microscopic imaging with mass-selected secondary ions (M. T. Bernius)
Magnetic reconnection (A. Bratenahl and P. J. Baum)
Sampling theory (J. L. Brown)
ODE methods (J. C. Butcher)
Interference effects in mesoscopic structures (M. Cahay)
Integer sinusoidal transforms (W. K. Cham)
The artificial visual system concept (J. M. Coggins)
Minimax algebra and its applications (R. A. Cuninghame-Green)
Corrected lenses for charged particles (R. L. Dalglish)
Data structures for image processing in C (M. R. Dobie and P. H. Lewis)
The development of electron microscopy in Italy (G. Donelli)
Electron crystallography of organic compounds (D. L. Dorset)
The study of dynamic phenomena in solids using field emission (M. Drechsler)
Gabor filters and texture analysis (J. M. H. Du Buf)
Amorphous semiconductors (W. Fuhs)
Median filters (N. C. Gallagher and E. Coyle)
Bayesian image analysis (S. and D. Geman)
Non-contact scanning force microscopy with applications to magnetic imaging (U. Hartmann)
Theory of morphological operators (H. J. A. M. Heijmans)
Noise as an indicator of reliability in electronic devices (B. K. Jones)
Applications of speech recognition technology (H. R. Kirby)
Spin-polarized SEM (K. Koike)
Fractal signal analysis using mathematical morphology (P. Maragos)
Electronic tools in parapsychology (R. L. Morris)
Image formation in STEM (C. Mory and C. Colliex)
Phase-space treatment of photon beams (G. Nemes)
Fuzzy tools for image analysis (S. K. Pal)
Z-contrast in materials science (S. J. Pennycook)
Electron scattering and nuclear structure (G. A. Peterson)
Edge detection (M. Petrou)
The wave-particle dualism (H. Rauch)
Electrostatic lenses (F. H. Read and I. W. Drummond)
Scientific work of Reinhold Rudenberg (H. G. Rudenberg)
Metaplectic methods and image processing (W. Schempp)
X-ray microscopy (G. Schmahl)
Accelerator mass spectroscopy (J. P. F. Sellschop)
Applications of mathematical morphology (J. Serra)
Focus-deflection systems and their applications (T. Soma)
The suprenum project (U. Trottenberg)
Knowledge-based vision (J. K. Tsotsos)
Electron gun optics (Y. Uchikawa)
Spin-polarized SEM (T. R. van Zandt and R. Browning)
Cathode-ray tube projection TV systems (L. Vriens, T. G. Spanjer and R. Raue)
Parallel imaging processing methodologies (S. Yalamanchili)
Signal description (A. Zayezdny and I. Druckmann)
The Aharonov-Casher effect (A. Zeilinger, E. Rasel and H. Weinfurter)
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 86
Recent Advances in GaAs Dynamic Memories

JAMES A. COOPER, JR.
School of Electrical Engineering, Purdue University, West Lafayette, IN
I. Introduction, Motivation, and Potential Applications
   A. Introduction and Background
   B. Motivations for the Development of DRAM Technology in GaAs
II. pn-Junction Storage Capacitors
   A. General Description
   B. Charge Storage on pn Junctions
   C. Theory of pnp Storage Capacitors
   D. Experimental Results
III. JFET and MESFET DRAM Cells
   A. Introduction
   B. Implementations of JFET and MESFET DRAM Cells
   C. Effect of Transistor Gate Leakage
   D. A 4-Bit JFET Dynamic Content-Addressable Memory
IV. Heterostructure DRAM Cells
   A. Introduction
   B. Undoped Heterostructure DRAMs
   C. Quantum-Well Floating-Gate DRAMs
   D. Modulation-Doped Heterostructure DRAMs
V. Bipolar DRAMs
   A. Introduction
   B. Concept of the Bipolar DRAM Cell
   C. Experimental Results
VI. Future Directions
   A. Introduction
   B. Trench Capacitors and Stacked Capacitors in GaAs
   C. Nondestructive Readout Cells
   D. Ultra-Long Storage Times: Quasi-Static and Nonvolatile DRAMs
   E. Nonconventional Applications
References
I. INTRODUCTION, MOTIVATION, AND POTENTIAL APPLICATIONS

A. Introduction and Background
The evolution of electronic systems inevitably creates demand for integrated circuits with higher integration densities, higher speeds, and lower power consumption. Often the performance of a system is limited by a relatively
small number of components in certain critical paths, so that improvements in the speed of these components reflect directly in enhanced performance of the system as a whole. The desire for high-speed integrated circuits for such applications has led to the investigation of GaAs as a host semiconductor for high-performance systems.

The trend toward programmable digital systems has intensified the need for large quantities of high-speed digital memory. To date, most of the memory development in GaAs has been based on simple adaptations of existing GaAs circuit technology in the form of six-transistor static storage cells (Fiedler, Chun, and Kang, 1988; Makino et al., 1988, 1990; Vogelsang et al., 1988; Terrell, Ho, and Hinds, 1988; Maysue et al., 1989; Nakano et al., 1990), as shown in Fig. 1(a). These cells are organized in a two-dimensional array to form a random-access memory (RAM). Circuits of up to 16 kilobit complexity have been demonstrated, with access times in the 3-7 nsec range. While straightforward to design and build, these static RAM (SRAM) cells are inefficient in two respects: they occupy considerable chip area, and they dissipate static power in the storage state.

In silicon integrated circuit technology, the majority of large-scale RAM circuits are one-transistor dynamic RAMs (DRAMs) of the type shown in Fig. 1(b). The one-transistor DRAM cell is much smaller than the six-transistor SRAM cell and dissipates negligible standby power, thus permitting large arrays to be incorporated on a single chip. The one-transistor cell consists of a storage capacitor and a single access transistor. In operation, the access transistor is turned on by the word line, electrically connecting the storage capacitor to the bit line. The storage capacitor is then charged to the potential of the bit line, representing either a logic 0 or a logic 1. When the access transistor is turned off by the word line, the storage capacitor is isolated from the bit line and remains charged to the former bit-line potential until leakage currents eventually destroy the stored information. This gradual degradation of the stored potential requires that the storage cell be periodically read and refreshed. The refreshing period must be substantially shorter than the storage time of the weakest cell in the array at the highest operating temperature. In silicon DRAM systems, refresh rates in the neighborhood of 1 kHz are commonly used, and therefore the weakest DRAM cell must have a storage time greater than about 20 msec at the highest temperature.

The storage capacitor in most silicon DRAMs is a metal-oxide-semiconductor (MOS) capacitor, made possible by the high-quality SiO2 insulator formed by thermal oxidation of silicon. One of the obstacles to the development of DRAM cells in GaAs has been the lack of a high-quality native oxide. However, it has been found that charge storage in GaAs can be successfully accomplished using the capacitance of properly designed reverse-biased pn junctions. As evidence of progress in this area, Fig. 2 shows the
FIGURE 1. Six-transistor static RAM cell (a), and one-transistor dynamic RAM cell (b). Static RAM cells in GaAs usually use MESFETs or JFETs.
evolution in room temperature storage time of GaAs homojunction capacitors over the past four years. Note that storage times have increased about one order-of-magnitude every two years and are now over 10 hours at room temperature. These results are far superior to those observed in silicon MOS capacitors. Several types of access transistors are possible in GaAs. These include metal-semiconductor field-effect transistors (MESFETs), junction field-effect transistors (JFETs), modulation-doped field-effect transistors (MODFETs), and heterojunction bipolar transistors (HBTs). All have been investigated
FIGURE 2. Room temperature storage time of GaAs pn homojunction storage capacitors as a function of year since development began in 1987. The trend line indicates one order-of-magnitude increase every two years.
and will be discussed and compared in Sections III-V. At this point, however, it is appropriate to consider possible applications for GaAs DRAMs.

B. Motivations for the Development of DRAM Technology in GaAs

As mentioned previously, digital systems in GaAs will require digital memory, and in order to preserve the speed advantage of GaAs systems, this memory must be fast. As a specific example, consider a GaAs microprocessor operating at a clock rate of 200 MHz (5 nsec cycle time). Such a processor would be similar to the 32-bit GaAs RISC microprocessor designed by Texas Instruments as part of the VHSIC program (Whitmire, Garcia, and Evans, 1988). A typical instruction requires several accesses to external memory. The simple instruction shown in Fig. 3(a) consists of four cycles: fetch instruction, decode instruction, fetch data, operate on data. Each cycle requires 5 nsec, for a total of 20 nsec for the entire instruction. However, if the program and data are stored off-chip in a silicon memory, operation is slowed considerably. For example, assume a 20 nsec silicon SRAM is used. Then each external memory access requires the insertion of three "wait states," bringing the total instruction time to 50 nsec, as shown in the bottom half of Fig. 3(a).

Now suppose instead we place a high-speed GaAs cache memory on the same chip as the microprocessor. Assuming a good ratio of cache "hits," the wait states can be eliminated and the instruction execution time decreases by a factor of 2.5. Figure 3(b) shows system throughput as a function of CPU cycle time for two cases: (i) when each memory access must be directed to a 20 ns off-chip RAM, and (ii) when each access can be handled by a 0.5 ns
FIGURE 3. Part (a) shows a basic four-cycle CPU instruction. Assuming a 5 nsec CPU cycle time, each access to a 20 nsec external memory would require the introduction of three wait states, increasing the execution time to 10 CPU cycles. In part (b) we illustrate the impact that a 0.5 nsec on-chip cache memory would have on system performance as a function of cycle time, assuming a unity cache hit ratio.
on-chip cache memory. It is apparent that the benefits of cache memory increase significantly as the CPU cycle time decreases. In order to be useful, a cache memory must be sufficiently large to ensure a good ratio of cache “hits.” A typical size would be 16 kilobytes (131,072
bits). If such a memory were constructed with six-transistor static RAM cells, it would probably be too large to be included on a GaAs microprocessor chip. However, such a cache memory could be implemented using one-transistor dynamic RAM cells without excessive area penalty.

There are other motivations for the development of dynamic memory technology in GaAs. A dynamic memory is inherently an analog device, and several specialized applications exist for analog storage. Examples include switched capacitor filters and electronic neural networks (see Section VI). In addition, the basic information obtained in the course of DRAM development is applicable to any device or circuit that requires low leakage, such as GaAs dynamic logic or low-dark-current imagers. As an example, GaAs DRAM research has led to a technique for reducing the subthreshold current in GaAs MESFETs by three orders of magnitude (see Section III).

In the sections that follow we will consider the factors limiting long-term charge storage on pn junctions in GaAs. We will then describe several types of one-transistor DRAM cells, including JFET, MESFET, MODFET, heterojunction, quantum well, and bipolar cells. We will conclude with a discussion of future directions, applications, and opportunities.

II. PN-JUNCTION STORAGE CAPACITORS
A. General Description
Dynamic charge storage in GaAs can be accomplished quite efficiently using the capacitance of a properly designed reverse-biased pn junction. In this section we will first discuss the operating principles and design considerations for pn junction storage capacitors. We will then present experimental data on the performance of GaAs pn junction capacitors as a function of doping, geometry, and temperature.

To begin, let us consider the pnp test structure shown in Fig. 4. Here the central n-layer is floating, and forms two back-to-back pn junctions with the surrounding p-type layers. The storage time of these back-to-back pn junctions is tested as follows: A bias VP of either polarity is applied across the structure. This forward biases one pn junction and reverse biases the other. In the process, electrons are removed from the floating n-region across the forward-biased junction, leaving the n-layer at a positive potential. In steady state, the potential of the n-layer is essentially equal to VP, since the steady-state current will be negligible unless the reverse-biased junction is driven into breakdown. When the applied bias is returned to 0, the charge on the n-region redistributes between the two junctions in such a way that both are reverse biased. The n-region is now floating at a positive potential
FIGURE 4. Schematic of a pnp charge storage capacitor, with junction voltages defined for further analysis. Note that the voltage polarities are chosen so that positive values indicate reverse bias.
somewhat less than VP, as determined by the capacitance divider formed by the two junctions.

The storage capacitor will gradually discharge as thermal generation supplies electrons to the positive n-region. This discharge can be monitored by observing the capacitance of the two back-to-back junctions, as illustrated in Fig. 5. Immediately following the bias pulse, this capacitance is small due to the large depletion regions of the reverse-biased junctions. As thermal generation returns the structure to equilibrium, the capacitance gradually rises to the value determined by the zero-bias depletion widths of the two junctions. For discussion purposes, we define the storage time as the time required for the capacitance to return to within 1/e (36.8%) of its equilibrium value.

In designing pn junction storage capacitors, two objectives must be kept in mind. First, we wish to maximize the charge storage density at a given voltage; i.e., the capacitance per unit area of the junction. Second, we wish to minimize the generation current that discharges the capacitor. Up to a certain point, both these objectives may be met by increasing the doping on both sides of the junction. There is, however, a limit beyond which increasing the doping will actually increase the generation current per unit area and reduce the storage time. These effects will be discussed later in this section. Before moving to a more detailed discussion of charge storage in the pn junction capacitor, however, we will anticipate the final results of this chapter by
FIGURE 5. Capacitance transient observed on a symmetrically-doped pnp storage capacitor. For t < 0 the bias is 0 and the structure is in equilibrium. A bias pulse VP is applied at t = 0 and removed at t = t0. During the bias pulse, one junction is forward biased while the other is reverse biased. Electrons are removed from the n-region by the forward-biased junction, leaving that region positively charged. The capacitance decreases due to the increased depletion width of the reverse-biased junction. At t = t0 the bias is returned to 0. The n-region remains positively charged and both junctions are now reverse biased. Thermal generation gradually returns the structure to equilibrium. The capacitance recovery is approximately exponential in time, with time constant τ_s.
revealing that, by proper design, it has proven possible to achieve both high charge storage densities (1.85 fC/µm² for a single junction or 3.7 fC/µm² for a pnp structure at 1 V reverse bias [Dungan, 1989; Dungan et al., 1990]) and long storage times (≥ 10 hours at room temperature [Stellwag, Melloch, and Cooper, 1991b]) using pn-junction storage capacitors.

B. Charge Storage on pn Junctions

1. Steady-State Relationships
In order to examine the most important features of charge storage on pn junctions, we will make a number of simplifying assumptions. We assume an abrupt (step) junction having uniform nondegenerate doping on each side, as shown in Fig. 6. We also assume that all dopant atoms are ionized at the temperatures of interest. Employing the depletion approximation, it is easy to show that the maximum electric field, which occurs at the metallurgical
FIGURE 6. Cross section of an abrupt pn junction. The voltage polarity is defined so that a positive value represents a reverse bias.
junction, is given by

E_max = q N_A x_p / ε_s = q N_D x_n / ε_s,    (1)

where ε_s is the semiconductor permittivity, NA and ND are the doping on the p and n sides, respectively, and xp and xn are the depletion widths on the p and n sides. These depletion widths can be expressed in terms of the total potential drop across the junction as

x_p = √[ 2 ε_s (V_bi + V_A) N_D / ( q N_A (N_A + N_D) ) ],    (2)

x_n = √[ 2 ε_s (V_bi + V_A) N_A / ( q N_D (N_A + N_D) ) ].    (3)

Here VA is the applied voltage, taken to be positive under reverse bias, and Vbi is the built-in potential of the junction, given by

V_bi = (kT/q) ln( N_A N_D / n_i² ),    (4)

where q is the electronic charge, k is Boltzmann's constant, T is absolute temperature, and ni is the intrinsic carrier concentration at temperature T. The capacitance of the junction serves as a useful indicator of the electrostatic state at any given time and is given by

C = ε_s A / ( x_p + x_n ),    (5)

where A is the junction area. Finally, we will calculate the charge storage density of a pn junction. The charge per unit area removed from the junction under reverse bias VA is given
by

Q′ = q N_A ( x_p − x_p0 ) = q N_D ( x_n − x_n0 ),    (6)

where x_p0 and x_n0 are the equilibrium values of xp and xn. Making use of either (2) or (3), the charge stored per unit area at reverse voltage VA is

Q′ = √[ 2 q ε_s N_A N_D / (N_A + N_D) ] [ √(V_bi + V_A) − √V_bi ].    (7)

Here we see that the charge storage density increases as the square root of doping and also as the square root of (V_bi + V_A). The maximum charge storage density is limited by junction breakdown. In GaAs, to achieve breakdown fields requires charge densities in excess of 5 fC/µm². However, a more reasonable value for maximum charge storage density is in the range of 1-2 fC/µm². This is comparable to the planar charge storage densities presently achieved in silicon dynamic RAMs.

2. Generation Mechanisms

In describing the operation of a pnp storage capacitor, we indicated that if the voltage is quickly swept back toward 0, the charge removed from the n-region will not change appreciably. The rate at which charge "leaks back" onto the n-region is a central issue in DRAM cell operation. Working toward an understanding of the charge recovery transient in a pnp structure, we first need to consider the generation mechanisms responsible for leakage in reverse-biased pn junctions.

Electron-hole pairs can be created by thermal generation, photogeneration, or impact ionization. Assuming the capacitor is kept in the dark and that internal fields are low enough that impact ionization can be neglected, the primary leakage mechanism is thermal generation (Dungan, 1989). Thermal generation can occur within the depletion region itself or in the neutral bulk within a diffusion length of the depletion region. We will show that generation in the neutral bulk can be neglected compared to generation in the depletion region. Finally, thermal generation may occur along the periphery of the device, where the pn junction is exposed by a mesa etch. This "edge" generation will prove to be important and must be included in calculation of the charge recovery transient.

Thermal generation may involve either direct band-to-band transitions or transitions through defect centers in the forbidden band-gap (Shockley-Read-Hall generation). Because of the greater energy required for each transition, band-to-band generation is much less likely than generation by means of defect centers (SRH generation). Therefore, we will neglect band-to-band generation in the discussions which follow.
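The steady-state relations (1)-(7) are easy to evaluate numerically. The short Python sketch below is illustrative only: the doping levels, bias, and GaAs material constants are assumed values, not numbers taken from this chapter. It computes the built-in potential, depletion widths, capacitance per unit area, and stored charge density for an abrupt GaAs junction; for doping near 10^18 cm^-3 it lands in the 1-2 fC/µm² range quoted above.

```python
import numpy as np

# Physical constants and assumed GaAs parameters (illustrative values)
q     = 1.602e-19          # C
kT    = 0.0259             # thermal energy at ~300 K, eV
eps_s = 13.1 * 8.854e-12   # GaAs permittivity, F/m
ni    = 2.1e6 * 1e6        # intrinsic concentration, m^-3 (~2.1e6 cm^-3)

def junction(NA_cm3, ND_cm3, VA):
    """Depletion-approximation electrostatics of an abrupt pn junction.
    VA > 0 means reverse bias, following the chapter's sign convention."""
    NA, ND = NA_cm3 * 1e6, ND_cm3 * 1e6                 # cm^-3 -> m^-3
    Vbi  = kT * np.log(NA * ND / ni**2)                 # eq. (4), volts
    Neff = NA * ND / (NA + ND)
    W    = np.sqrt(2 * eps_s * (Vbi + VA) / (q * Neff)) # total depletion width
    xp   = W * ND / (NA + ND)                           # eq. (2)
    xn   = W * NA / (NA + ND)                           # eq. (3)
    C_area = eps_s / W                                  # eq. (5) per unit area, F/m^2
    Q_area = np.sqrt(2 * q * eps_s * Neff) * (np.sqrt(Vbi + VA) - np.sqrt(Vbi))  # eq. (7)
    return Vbi, xp, xn, C_area, Q_area

Vbi, xp, xn, C_area, Q_area = junction(1e18, 1e18, VA=1.0)
print(f"Vbi     = {Vbi:.2f} V")
print(f"xp + xn = {(xp + xn) * 1e9:.1f} nm")
print(f"C/A     = {C_area * 1e3:.2f} fF/um^2")   # F/m^2 -> fF/um^2
print(f"Q/A     = {Q_area * 1e3:.2f} fC/um^2")   # C/m^2 -> fC/um^2
```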
a. Generation in the Depletion Region. The bulk generation rate per unit volume due to a single-level defect center at energy ETis given by (Shockley and Read, 1952)
G = ( n_i² − np ) / [ τ_p (n + n_1) + τ_n (p + p_1) ].    (8)

Here n and p are the electron and hole densities, τ_n and τ_p are electron and hole lifetimes, and n_1 and p_1 are constants that depend on the energy ET of the defect center. Since we have assumed nondegenerate material, the carrier densities can be written

n = n_i exp[ (F_n − E_i) / kT ],    (9)

p = n_i exp[ (E_i − F_p) / kT ],    (10)

where Fn and Fp are the quasi-Fermi levels for electrons and holes, and Ei is the intrinsic Fermi level, usually taken to lie approximately at midgap. The constants n_1 and p_1 are the carrier densities that would be present if the respective Fermi levels lay at the energy ET. Therefore, we can write

n_1 = n_i exp[ (E_T − E_i) / kT ],    (11)

p_1 = n_i exp[ (E_i − E_T) / kT ].    (12)

The np product in the numerator of (8) can be obtained by multiplying (9) and (10):

np = n_i² exp[ (F_n − F_p) / kT ].    (13)

It is commonly assumed that the quasi-Fermi levels can be extended across the depletion region of a pn junction with essentially zero slope. Under this assumption, the splitting of the quasi-Fermi levels within the depletion region is equal to the applied voltage, and (13) may be written

np = n_i² exp( −qV_A / kT ).    (14)

Combining (8)-(12) and (14), we can write

G = n_i [ 1 − exp(−qV_A/kT) ] / { τ_p [ exp((F_n − E_i)/kT) + exp((E_T − E_i)/kT) ] + τ_n [ exp((E_i − F_p)/kT) + exp((E_i − E_T)/kT) ] }.    (15)

Equation (15) gives the generation rate per unit volume at any point within
FIGURE 7. Generation rate per unit volume as a function of position in the depletion region, calculated using (15) for NA = ND = 1 × 10^17 cm^-3 and VA = 5 V. Here we have assumed a symmetrical junction with midgap generation centers.
the depletion region. To calculate the total leakage current per unit area arising from generation within the depletion region, we need to integrate (15) with respect to position from one edge of the depletion region to the other:

J_depl = q ∫ from −x_n to +x_p of G(x) dx.    (16)
The integration in (16) is complicated by the fact that the terms (F_n − E_T) and (E_T − F_p) in (15) are strong functions of position. The simplest way to evaluate the integral in (16) is to recognize that the integrand is very nearly a rectangular function of position. Figure 7 shows the generation rate given by (15) as a function of position for a typical pn junction. Note that G(x) goes exponentially to 0 wherever either (F_n − E_T) or (E_T − F_p) is positive. When both (F_n − E_T) and (E_T − F_p) are negative, the exponential terms may be neglected and (15) reduces to

G = ( n_i / τ_G ) [ 1 − exp(−qV_A/kT) ],    (17)

where τ_G is a generation lifetime given by

τ_G = τ_p exp[ (E_T − E_i)/kT ] + τ_n exp[ (E_i − E_T)/kT ].    (18)

In the typical case, V_A ≫ kT/q, so that (17) becomes simply G = n_i/τ_G. The leakage current can then be expressed as

J_depl = q n_i W_G / τ_G,    (19)

where W_G is the generation width, defined as the region within which (F_n − E_T) and (E_T − F_p) are negative; i.e., the region where F_n < E_T < F_p.
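A brief numerical illustration of (17)-(19): the generation lifetime, generation width, and bias below are assumed placeholder values, not parameters reported in the chapter. The sketch simply shows the order of magnitude of the depletion-region generation leakage for a small GaAs capacitor.

```python
import numpy as np

q, kT = 1.602e-19, 0.0259     # C, eV at ~300 K
ni    = 2.1e6                 # GaAs intrinsic concentration, cm^-3 (approximate)

# Assumed values for illustration only
tau_G = 1e-7                  # generation lifetime, s
W_G   = 1e-5                  # generation width, cm (0.1 um)
V_A   = 1.0                   # reverse bias, V

# Eq. (17): generation rate per unit volume; for V_A >> kT/q it saturates at ni/tau_G
G = (ni / tau_G) * (1.0 - np.exp(-V_A / kT))           # cm^-3 s^-1
# Eq. (19): leakage current density from depletion-region generation
J_depl = q * ni * W_G / tau_G                          # A/cm^2

area = (10e-4) ** 2                                    # 10 um x 10 um capacitor, cm^2
print(f"G      = {G:.3e} carriers/(cm^3 s)")
print(f"J_depl = {J_depl:.3e} A/cm^2")
print(f"carriers returned per second to a 10x10 um cell: {J_depl * area / q:.3e}")
```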
b. Generation in the Neutral Regions. Hole-electron pairs that are thermally generated within one diffusion length of the edge of the depletion region can also contribute a leakage current that will discharge the capacitor. This diffusion current is described by the Shockley diode equation:

J_diff = q [ (D_n/L_n) n_p0 + (D_p/L_p) p_n0 ] [ exp(−qV_A/kT) − 1 ],    (20)

where Dn and Dp are diffusion coefficients, Ln and Lp are diffusion lengths, and n_p0 and p_n0 are equilibrium minority carrier concentrations for electrons and holes, respectively. This diffusion current arises as a result of the concentration gradient of minority carriers in the neutral regions adjacent to the edge of the depletion region. Under sufficiently large reverse bias (V_A > 0), the minority carrier concentrations at the edge of the depletion regions are essentially 0, while deep within the neutral regions they are at their equilibrium values: n_p0 = n_i²/N_A and p_n0 = n_i²/N_D. This concentration gradient gives rise to a steady diffusion current in the reverse direction.

In GaAs pn junction storage capacitors, however, this diffusion current is negligible compared to generation within the depletion region (Dungan, 1989). This can be seen by the following calculation: At room temperature, n_i² in GaAs is approximately 4 × 10^12 cm^-6, while the doping densities N_A and N_D for typical storage capacitors are in the range 10^17-10^18 cm^-3. Thus the equilibrium minority concentrations n_p0 and p_n0 are typically on the order of 10^-5 cm^-3 at room temperature. Assuming reasonable values for diffusion coefficients and diffusion lengths, we estimate a total diffusion current of about 2000 carriers/(cm² s). If the capacitor has an area of 10 × 10 µm², the total diffusion current discharging the capacitor is only 2 × 10^-3 carriers per second, or one carrier returned to the storage capacitor every eight minutes. Since the
minimum charge stored on the capacitor is at least 10^6 carriers, this leakage rate is totally negligible.

c. Generation at the Junction Perimeter. Leakage current arising from generation at the perimeter of the storage capacitor must also be considered. Experimental measurements on real storage capacitors indicate that a substantial portion of the leakage current is due to such generation. At etched surfaces the termination of the crystal lattice gives rise to a continuum of generation-recombination centers distributed across the band-gap in energy. The total generation rate per unit area at the perimeter can be calculated by summing the contributions from generation centers at various energy levels across the band-gap. Thus, in analogy to (8), we may write (Schroder, 1987)

G_s = ∫ over the band-gap of { c_sn c_sp ( n_i² − n_s p_s ) D_sT(E) / [ c_sn (n_s + n_1s) + c_sp (p_s + p_1s) ] } dE,    (21)

where n_s and p_s are the carrier densities at the surface, c_sp and c_sn are capture coefficients of surface centers for holes and electrons, and D_sT(E) is the density of surface centers at energy E measured per unit area per eV. The terms n_1s and p_1s are functions of energy E and are analogous to the n_1 and p_1 terms defined for bulk centers in (11) and (12) earlier. As in the case of bulk generation, we regard (21) as representing the surface generation rate at one point within the depletion region and integrate (21) with respect to position within the depletion region. Thus, by analogy to (16), the surface generation contribution to current in the depletion region is given by
J_s,depl = q ∫ from −x_n to +x_p of G_s(x) dx.    (22)

In order to calculate this integral, we recognize that n_s and p_s are functions of x, and that the G_s in the integrand will be negligible except in the region W_Gs where both n_s and p_s are small compared to n_i. Since n_s and p_s are negligible within W_Gs, the integrand G_s is independent of position and can be taken outside the integral in (22). Thus, combining (21) and (22) using these considerations leads to

s_G = ∫ over the band-gap of { c_sn c_sp n_i D_sT(E) / [ c_sn n_1s + c_sp p_1s ] } dE,    (23)

J_s,depl = q n_i s_G W_Gs,    (24)

where s_G is identified as the surface generation velocity. Equation (24) for surface generation in the depletion region is the direct counterpart to (19) for bulk generation in the depletion region.
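To put the three leakage mechanisms side by side, the sketch below compares the depletion-region generation current (19), the perimeter generation current (24), and the neutral-region diffusion current (20) for a square mesa capacitor. All lifetimes, the surface generation velocity, and the diffusion parameters are assumed illustrative values rather than data from the chapter; the point is only the relative ordering, with diffusion far below the other two.

```python
import numpy as np

q, kT = 1.602e-19, 0.0259
ni    = 2.1e6                 # cm^-3, GaAs near room temperature (approximate)

# Assumed parameters (illustrative only)
tau_G = 1e-7                  # bulk generation lifetime, s
s_G   = 1e4                   # surface generation velocity, cm/s
W_G   = 1e-5                  # generation width, cm
N_A = N_D = 1e17              # cm^-3
D_n, L_n = 200.0, 2e-4        # cm^2/s, cm
D_p, L_p = 10.0,  1e-4        # cm^2/s, cm

side = 10e-4                  # 10 um square mesa
A, P = side**2, 4 * side      # area (cm^2) and perimeter (cm)

I_bulk = q * ni * (W_G / tau_G) * A          # eq. (19) times area
I_edge = q * ni * (W_G * s_G) * P            # eq. (24) times perimeter
n_p0, p_n0 = ni**2 / N_A, ni**2 / N_D
I_diff = q * (D_n * n_p0 / L_n + D_p * p_n0 / L_p) * A   # magnitude of eq. (20)

for name, I in [("depletion-region generation", I_bulk),
                ("perimeter generation", I_edge),
                ("neutral-region diffusion", I_diff)]:
    print(f"{name:30s}: {I:.2e} A  ({I / q:.2e} carriers/s)")
```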
3. Charge Recovery Transient
Combining (19) and (24) and taking into account the area and perimeter of the junction leads to an expression for the total generation current discharging the pn junction storage capacitor:

I = ( J_depl ) A + ( J_s,depl ) P = q n_i [ (W_G / τ_G) A + (W_Gs s_G) P ].    (25)
Using (6), the total charge stored on the capacitor can be written

Q = q N_A ( x_p − x_p0 ) A,    (26)

where x_p is the depletion width on the p side of the junction and x_p0 is the value of x_p in equilibrium. To maintain charge neutrality, we require that

N_A ( x_p − x_p0 ) = N_D ( x_n − x_n0 ),    (27)

where x_n and x_n0 are the actual and equilibrium values of depletion width on the n-side of the junction. Denoting (x_p + x_n) by W and (x_p0 + x_n0) by W_0, we can write

Q = q A [ N_A N_D / (N_A + N_D) ] ( W − W_0 ).    (28)

In order to write the differential equation governing charge decay in a pn junction charge storage capacitor, we must obtain expressions for the effective generation widths W_G and W_Gs in (25). The generation width W_G was introduced in (19) and defined as the width of the region where F_n < E_T < F_p. For the most general situation where N_A ≠ N_D and E_T ≠ E_i, the expressions for W_G as a function of bias are cumbersome (Dungan, 1989). However, if the generation centers lie very close to midgap (E_T ≈ E_i) and the junction is symmetrically doped (N_A = N_D = N_B), then the expression for the generation width W_G is simplified considerably. Under these assumptions, one can show that

W_G → ( W − W_0 )   if N_A = N_D and E_T = E_i,    (29)

where W is the depletion width at reverse bias and W_0 is the equilibrium depletion width (V_A = 0). If we further assume that the generation width at the perimeter is the same as the generation width in the bulk, we can write (Dungan, 1989) that

W_G = W_Gs = ( W − W_0 ) = 2Q / ( q A N_B ).    (30)

Inserting (30) into (25) yields

dQ/dt = −( 2 n_i / N_B ) [ 1/τ_G + s_G (P/A) ] Q.    (31)
Equation (31) has the solution

Q(t) = Q(0) exp( −t/τ_s ),    (32)

where τ_s is the storage time constant defined by

τ_s = N_B / { 2 n_i [ 1/τ_G + s_G (P/A) ] }.    (33)
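The dependence of (33) on the perimeter-to-area ratio is what makes the 1/τ_s versus P/A plot discussed below useful. The following sketch uses assumed values of N_B, τ_G, and s_G (not values from the text) to generate τ_s for several device sizes and then recovers s_G and τ_G from the slope and intercept of that plot.

```python
import numpy as np

q, ni = 1.602e-19, 2.1e6          # GaAs ni in cm^-3 (approximate, ~300 K)

# Assumed material/surface parameters (illustrative only)
N_B   = 1e17                      # doping, cm^-3
tau_G = 1e-7                      # generation lifetime, s
s_G   = 1e4                       # surface generation velocity, cm/s

def tau_s(P_over_A):
    """Storage time constant, eq. (33)."""
    return N_B / (2.0 * ni * (1.0 / tau_G + s_G * P_over_A))

# Square mesas of different side lengths -> different P/A ratios
sides_um = np.array([5.0, 10.0, 20.0, 50.0, 100.0])
P_over_A = 4.0 / (sides_um * 1e-4)              # P/A = 4/side, cm^-1
taus = tau_s(P_over_A)

# Linear fit of 1/tau_s versus P/A: slope = 2*ni*s_G/N_B, intercept = 2*ni/(N_B*tau_G)
slope, intercept = np.polyfit(P_over_A, 1.0 / taus, 1)
print("tau_s (s) for 5..100 um mesas:", np.round(taus, 1))
print(f"recovered s_G   = {slope * N_B / (2 * ni):.3e} cm/s   (assumed {s_G:.1e})")
print(f"recovered tau_G = {2 * ni / (N_B * intercept):.3e} s   (assumed {tau_G:.1e})")
```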
Equation (33) is extremely important to understanding the capabilities of pn junction storage capacitors in GaAs. As will be confirmed by experimental measurements in Section II.C, (33) tells us that the storage time should be a strong function of temperature, since the intrinsic carrier concentration n_i increases exponentially with temperature according to

n_i = √( N_c N_v ) exp[ −E_G / (2kT) ],    (34)

where N_c and N_v are the effective density of states in the conduction band and valence band, respectively, and E_G is the band-gap energy. Although N_c, N_v, and E_G are weak functions of temperature, the dominant temperature dependence arises because of the kT factor in the exponent. Thus, if (34) is inserted into (33), one expects that a plot of ln(τ_s) versus 1/T would be linear with a slope approximately equal to E_G/(2k). In other words, the storage time of a pn junction storage capacitor decreases exponentially with temperature.

A second point that is apparent from an examination of (33) is that a plot of 1/τ_s versus (P/A) should be linear, with slope 2 n_i s_G/N_B and intercept 2 n_i/(N_B τ_G). From such a plot, it should be possible to estimate both the surface generation velocity and the generation lifetime at a given temperature.

A final point should be made with regard to (30)-(33). In developing these equations we have assumed a symmetrically doped junction with generation centers at midgap. If either of these assumptions fails, our equations have to be modified. In the general case where N_D ≠ N_A or E_T ≠ E_i, W_G is larger than (W − W_0) and the ratio of W_G/(W − W_0) is bias dependent (Dungan, 1989). As a result, the charge recovery transient in (32) is no longer exactly exponential, and the storage time in (33) loses its special significance. A recovery time can still be defined, however, but it becomes bias dependent. We will touch on this matter again in Section II.C.2.

C. Theory of pnp Storage Capacitors

Having considered the generation mechanisms and charge recovery transient in pn junctions, we now return to the pnp storage capacitor structure shown
in Fig. 4. The behaviour of this structure was described qualitatively in Section II.A.

1. Steady-State Capacitance-Voltage Relationship

At this point we wish to calculate the capacitance-voltage relation of the pnp structure shown in Fig. 4. We assume that, in general, the two pn junctions are not doped symmetrically, and we designate the doping of the top layer N_A1 and the bottom layer N_A2. When a bias is applied across the pnp structure, negligible current will flow in steady state, since one of the pn junctions is reverse biased. Consequently, the voltage drop across the forward-biased junction is also negligible, and virtually all the applied voltage develops across the reverse-biased junction. Thus, the total capacitance of the structure is that of two capacitors in series, one at essentially zero bias and the other reverse biased by an amount V_A. Assume that V_A > 0, so that the top junction is forward biased and the bottom junction is reverse biased. Under these conditions (and assuming the dopings N_A1 and N_A2 are within about two orders of magnitude of each other) the capacitance of the top junction C_J1(0) will be larger than the capacitance of the bottom junction C_J2(V_A). Equation (5) can be used to calculate both these capacitances. The total capacitance of the pnp structure measured in steady state is then

C_total = C_J1(0) C_J2(V_A) / [ C_J1(0) + C_J2(V_A) ].    (35)

The total charge stored is given by (7) evaluated for the reverse-biased junction (junction 2):

Q_total = A √[ 2 q ε_s N_A2 N_D / (N_A2 + N_D) ] [ √(V_bi + V_A) − √V_bi ].    (36)
The steady-state capacitance-voltage relationship (V_A swept away from 0) is illustrated for two values of N_A2 in Fig. 8. Note that if the doping is not symmetrical (i.e., N_A2 ≠ N_A1), the capacitance-voltage relationship will not be symmetrical about V_A = 0. Figure 8 shows a decreasing capacitance when the voltage is swept away from 0 in either direction (assuming quasi-steady-state conditions), and a different, almost flat, capacitance if the voltage is swept rapidly back toward 0. The capacitance remains nearly constant when the voltage is swept toward 0 because the charge Q_total removed from the n-region during the outward sweep cannot be quickly restored when the voltage is reduced. Again, assume that V_A > 0, so that the top junction is forward biased and the bottom junction is reverse biased. When the applied bias is swept toward 0, the charge Q_total, which originally resided entirely on the bottom junction, must
FIGURE 8. Capacitance-voltage curves of a pnp storage capacitor for two different values of doping in the second p region.
now redistribute between the two junctions in such a manner that the total voltage drop across both junctions adds up to V_A. Thus, the voltages V_1 and V_2 must satisfy

Q_1(V_1) + Q_2(V_2) = Q_total,    (37)

where Q_1 and Q_2 are the junction charges given by (7) (multiplied by the area A), and Q_total is given by (36) when V_A reaches its maximum value on the outward sweep. At the same time, we can write

−V_1 + V_2 = V_A.    (38)

Substituting (38) into (37) yields

Q_1(V_2 − V_A) + Q_2(V_2) = Q_total.    (39)

Equation (39) can be solved for V_2 as a function of V_A. The capacitance of the pnp structure as V_A is swept back toward 0 is then given by

C = C_J1(V_1) C_J2(V_2) / [ C_J1(V_1) + C_J2(V_2) ],    (40)

where V_2 is obtained from the solution of (39). Depending on the relative dopings N_A1 and N_A2, the capacitance may increase or decrease as V_A is swept toward 0. An example of C-V relationships for both symmetrical and asymmetrical doping is shown in Fig. 8. The value of V_2 that satisfies (39) when V_A = 0 is of particular importance, since this is the voltage across each of the pn junctions at the end of the "write pulse." It is this voltage that determines the generation widths W_G and W_Gs at the start of the charge recovery transient.
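A numerical sketch of the steady-state relation (35) is straightforward. The code below (doping values are illustrative assumptions, not the ones used for Fig. 8) computes C_total versus V_A for a symmetric and an asymmetric pnp structure, reproducing the qualitative behaviour of Fig. 8: the series capacitance falls as the voltage is swept away from zero, and asymmetric doping makes the curve asymmetric about V_A = 0.

```python
import numpy as np

q, kT = 1.602e-19, 0.0259
eps_s = 13.1 * 8.854e-12          # F/m
ni    = 2.1e6 * 1e6               # m^-3

def cj_per_area(NA_cm3, ND_cm3, V_rev):
    """Junction capacitance per unit area from eqs. (2)-(5); V_rev >= 0 is reverse bias."""
    NA, ND = NA_cm3 * 1e6, ND_cm3 * 1e6
    Vbi = kT * np.log(NA * ND / ni**2)
    W = np.sqrt(2 * eps_s * (Vbi + V_rev) * (NA + ND) / (q * NA * ND))
    return eps_s / W              # F/m^2

def c_total(NA1, NA2, ND, VA):
    """Eq. (35): series combination; the reverse-biased junction takes all of |VA|."""
    if VA >= 0:                   # top junction near zero bias, bottom reverse biased
        c1, c2 = cj_per_area(NA1, ND, 0.0), cj_per_area(NA2, ND, VA)
    else:                         # polarity reversed
        c1, c2 = cj_per_area(NA1, ND, -VA), cj_per_area(NA2, ND, 0.0)
    return c1 * c2 / (c1 + c2)

ND = 1e17
for VA in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    sym  = c_total(1e18, 1e18, ND, VA)
    asym = c_total(1e18, 1e17, ND, VA)
    print(f"VA = {VA:+.0f} V   C_sym = {sym*1e3:.3f} fF/um^2   C_asym = {asym*1e3:.3f} fF/um^2")
```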
2. Capacitance Recovery Transient (Dungan, 1989)

Assume that the pnp structure has been pulsed to a specific voltage V_A = V_P, resulting in a stored charge Q_total, where Q_total is given by (36). If the structure is symmetrically doped (N_A1 = N_A2), the charge Q_total divides equally between junctions 1 and 2. If we further assume that each junction individually is symmetrically doped (N_A1 = N_A2 = N_D; call this value of doping N_B) and that the dominant generation centers lie at midgap (E_T = E_i), the charge recovery transient of each junction is given by (32), with Q(0) set equal to Q_total/2. The instantaneous charge on each junction given by (32) can be related to an instantaneous value of depletion width using (28). Thus, combining (32) and (28), we can write for each junction that

Q(t) = ( Q_total/2 ) exp( −t/τ_s ) = ( q A N_B / 2 ) [ W(t) − W_0 ].    (41)

Solving for W(t),

W(t) = W_0 + [ Q_total / (q A N_B) ] exp( −t/τ_s ).    (42)

Since the capacitance of the pnp structure is that of two junctions in series, we can write

C(t) = ε_s A / [ 2 W(t) ] = ε_s A / { 2 W_0 + [ 2 Q_total / (q A N_B) ] exp( −t/τ_s ) }.    (43)
As can be seen from (43), the capacitance transient that occurs due to generation within the pn junction is not exponential, but rather the reciprocal of a constant plus an exponential. Plots of (43) for N_B = 5 × 10^17 cm^-3 and pulse voltages of 1, 2, and 5 V are shown in Fig. 9 as symbols. One is tempted to view the curves in Fig. 9 as truly exponential; i.e., as represented by an equation of the form

C(t) = C_eq − [ C_eq − C(0) ] exp( −t/τ_c ).    (44)

In fact, the lines in the figure (barely visible beneath the symbols) are least-squares fits of (44) to the "data" represented by the symbols. The agreement would appear to be excellent, but the time constants τ_c obtained by the fit are approximately 9, 16, and 32% longer than τ_s for the 1, 2, and 5 V curves, respectively. Under the assumed conditions of symmetrical doping and midgap generation centers, we can derive an expression relating the measured capacitance recovery time constant τ_c to the actual charge recovery time constant τ_s by
FIGURE 9. Capacitance recovery transients (symbols) calculated from (43) for a symmetrical pnp storage capacitor with N_A = N_D = 5 × 10^17 cm^-3 and pulse voltages of 1, 2, and 5 V. The solid lines (almost obscured by the symbols) are least-squares fits to (44). Although the fits appear to be quite good, the time constants τ_c obtained by the fitting are longer than τ_s by 9, 16, and 32%, as shown on the figure.
equating (43) and (44). As shown by Dungan (1989), this ratio can be written

Here e is the base of natural logarithms (e = 2.7182). The time constant ratio in (45) is typically between 1 and 2. In particular, since Q_total increases as the square root of pulse voltage V_P through (36), the capacitance recovery times observed in experiments will be voltage dependent, increasing as pulse voltage is increased. This is consistent with the trends in Fig. 9. As stated, such variations are typically less than a factor of 2 over the normal range of pulse voltages used (1 V < |V_P| < 5 V). Note that the charge recovery time constant τ_s is not voltage dependent (see Eq. (33)); rather, it is our approximation of τ_s by τ_c that is voltage dependent.

The preceding discussion required the rather restrictive assumptions that both junctions are symmetrically doped (N_A1 = N_D = N_A2) and that the dominant generation centers lie at midgap (E_T = E_i). These restrictions may be removed through a more careful calculation of the generation volume, the region where F_n < E_T < F_p. The mathematics involved are straightforward but tedious (Dungan, 1989) and will not be reproduced here. However, an example of such a calculation is shown in Fig. 10. Here we plot the normalized capacitance recovery transient for two pulse voltages and three generation center positions. All six curves are calculated assuming that the effective generation current per unit volume in the depletion region, J_eff (A/cm³), is equal to unity. The "time" axis actually represents the time integral of the charge density generated per unit volume in the depletion region during
FIGURE 10. Normalized capacitance recovery transients for a nonsymmetrical pnp capacitor (N_A1 = N_A2 = 1 × 10^18 cm^-3, N_D = 1 × 10^17 cm^-3) for two pulse voltages and three generation center energies. The horizontal axis is "time" multiplied by the effective generation current per unit volume J_eff = q n_i (1/τ_G + s_G P/A).
the recovery. Actual times can be computed by dividing the normalized "time" axis by J_eff = q n_i (1/τ_G + s_G P/A).

One observes that the capacitance recovery time constants in Fig. 10 (the point where the normalized capacitance is 0.632) are dependent both on initial bias and on the energy of the generation center. Again, higher pulse voltages lead to longer capacitance recovery times. However, as stated earlier, the variations are typically less than a factor of 2. We shall see that these corrections are small compared to the range of storage times that arise due to variations in the design of the storage capacitors and the procedures used to fabricate them. In the sections that follow, we will make no particular distinction between the capacitance recovery time constant τ_c and the charge recovery time constant τ_s and will simply refer to the measured capacitance recovery time constant as the "storage time" τ_s.
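The distinction between τ_s and τ_c in (41)-(45) is easy to reproduce numerically. The sketch below is a minimal illustration: N_B and the pulse voltages follow the values used for Fig. 9, while the junction area and material constants are assumptions. It generates C(t) from (43) and fits the exponential form (44) to it; the fitted τ_c comes out longer than τ_s, and more so at higher pulse voltage, as described in the text.

```python
import numpy as np
from scipy.optimize import curve_fit

q, kT = 1.602e-19, 0.0259
eps_s = 13.1 * 8.854e-12               # F/m
ni    = 2.1e6 * 1e6                    # m^-3
N_B   = 5e17 * 1e6                     # m^-3, symmetric doping as in Fig. 9
A     = (10e-6) ** 2                   # assumed 10 um x 10 um junction area, m^2

Vbi = kT * np.log(N_B * N_B / ni**2)
W0  = 2.0 * np.sqrt(eps_s * Vbi / (q * N_B))    # equilibrium width of one symmetric junction

def C_of_t(t, Vp, tau_s=1.0):
    """Eq. (43): pnp capacitance recovery after a pulse of amplitude Vp (t in units of tau_s)."""
    Q_tot = A * np.sqrt(q * eps_s * N_B) * (np.sqrt(Vbi + Vp) - np.sqrt(Vbi))  # eq. (36), symmetric
    W_t   = W0 + (Q_tot / (q * A * N_B)) * np.exp(-t / tau_s)                  # eq. (42)
    return eps_s * A / (2.0 * W_t)                                             # eq. (43)

t = np.linspace(0.0, 5.0, 500)         # time in units of tau_s
for Vp in (1.0, 2.0, 5.0):
    C = C_of_t(t, Vp)
    C_eq, C0 = eps_s * A / (2 * W0), C[0]
    expo = lambda tt, tau_c: C_eq - (C_eq - C0) * np.exp(-tt / tau_c)          # eq. (44)
    (tau_c,), _ = curve_fit(expo, t, C, p0=[1.0])
    print(f"Vp = {Vp:.0f} V: fitted tau_c = {tau_c:.3f} tau_s")
```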
3. Charge Removal Transient
In the development in Section II.C.1, we made the tacit assumption that charge is removed instantaneously from the n-region whenever a bias V_A is applied across the pnp structure. This is not the case, and in fact the charge removal transient that occurs on the application of bias can be extremely slow in a higher band-gap semiconductor material such as AlGaAs or SiC. To obtain closed-form expressions for the charge removal transient, we will consider the simple circuit consisting of a forward-biased diode charging a capacitor, as shown in Fig. 11. For this development, we assume the capacitor has a fixed value independent of voltage. The capacitor is initially at zero voltage, and a bias V_A is applied across the diode-capacitor combination at t = 0. We wish to obtain an expression for the voltage V_C across the
FIGURE 11. Simple circuit model for visualizing the charge removal transient in a pnp capacitor structure.
capacitor as a function of time. The current flowing through the forward-biased diode may be written

I = I_0 [ exp( qV_D/kT ) − 1 ] = dQ_C/dt,    (46)

where Q_C is the charge on the capacitor and I_0 is the saturation current of the diode, given by

I_0 = q A n_i² [ D_n / (L_n N_A) + D_p / (L_p N_D) ].    (47)
(48)
and its derivative is
Combining (46) and (49) gives a differential equation that may be solved for Vo(t).After algebra, we find that
where
T,,
given by
RECENT ADVANCES IN GaAs DYNAMIC MEMORIES
10.1510-1310-11
l o - O 10-7 .,0-5
23
lo-3
t IT,
FIGURE 12. Diode voltage drop as a function of time for several values of applied voltage
6 during the charge removal transient. The write time can be made arbitrarily short if the write process is terminated before the diode voltage reaches 0. For example, if we are willing to provide a write voltage V, that is 0.25 V greater than the final value of capacitor voltage (allowing VD to be 0.25V % 10kT/q at the end of the write process), then the write time can be reduced to 10-~~,.
is the time for a steady current I, to charge the capacitor C to a potential kT/q. Equation (50) is plotted for several values of V, in Fig. 12. If V, b kT/q and t -g q,, (50) can be simplified to
This relationship explains the linear dependence of the V, = co curve in Fig. 12. It is perhaps more enlightening to plot the capacitor voltage as a function of time, as shown in Fig. 13. The point to note from both figures is that the capacitor charges to within VD(t)of the pulse voltage V, in a specific time t , regardless of the value of VA. Therefore, if we wish to charge the capacitor to a voltage V,, in time t , , this can be accomplished by applying a pulse , V D ( t lis ) obtained from (50) or voltage V, given by V, = V,, + V D ( t l )where Fig. 12. These arguments can be refined by including the specific voltage dependence of the capacitance using (5). This is left as an exercise for the reader.
D . Experimental Results 1. Eflect of Temperature
Consider a large-area storage capacitor in which bulk generation dominates
24
JAMES A. COOPER, JR.
lo-’
10.5
10’
t IT,
FIGURE13 Capacitor voltage as a function of time for several values of applied voltage VA the capacitor dunng the charge removal transient. Note that at a specific time, say t/ro = voltage falls short of the applied voltage VAby the same amount, regardless of the value of VA. This “shortfall” is, of course, the diode voltage drop.
(A/PS
sGtG) so
that the storage time constant given by (33) can be written
Inserting (18) and (34) into (53) and assuming that (i) the dominant generation center is above midgap (this assumption is arbitrary, and equally valid equations would result if the opposite assumption were made), and (ii) z, is of the same order of magnitude as r P ,leads to
where AET = IE, - E l / .Equation (54) suggests that the storage time will be thermally activated with an activation energy EAgreater than or equal to half the band-gap. (Note that we would have reached the same conclusion if we had assumed the generation center to be below midgap, except that t pwould be replaced by 7,). To further appreciate the temperature dependence of (54), note that N , and N y are proportional to T3I2and that T~ is inversely proportional to the thermal velocity vth, which increases as TIi2.Thus, the prefactor is proportional to (1 / T)’. This dependence, however, is overshadowed by the temperature dependence of the exponential factor. In the exponent, the band-gap energy EG is a weak function of temperature and can be approximated by a firstorder Taylor expansion as EG(T) % E G O
-
UT
(55)
RECENT ADVANCES IN GaAs DYNAMIC MEMORIES
25
Temperature ("C) 181
144
111
2.4
2.6
84
60
40
21
3.0
3.2
3.4
F8
d
I"
2.2
2.8
1000/T
(1IK)
FIGURE14. Measured capacitance recovery time constants as a function of temperature for pnp capacitors in GaAs and Al,Ga,_,As (x x 0.2). The recovery process is thermally activated, with activation energies very close to half the band-gap.
where E_G0 is the extrapolated zero-temperature band-gap and α is a parameter. For GaAs in the neighborhood of room temperature (Thurmond, 1975), E_G0 ≈ 1.56 eV and α = 4.17 × 10^-4 eV/K. Inserting (55) into (54) yields

τ_s ≈ [ N_B τ_p / ( 2 √(N_c N_v) ) ] exp[ −α/(2k) ] exp[ ( E_G0/2 + ΔE_T ) / kT ].    (56)
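A quick numerical check of the activation-energy argument in (53)-(56): the sketch below evaluates τ_s(T) from (53) with n_i(T) from (34), using assumed GaAs effective densities of states and an assumed generation lifetime (illustrative values only), and then extracts the Arrhenius slope, which comes out close to half the band-gap for a midgap center (ΔE_T = 0).

```python
import numpy as np

k = 8.617e-5                    # Boltzmann constant, eV/K

# Assumed GaAs parameters (illustrative values)
Nc300, Nv300 = 4.7e17, 7.0e18   # effective densities of states at 300 K, cm^-3
EG0, alpha   = 1.56, 4.17e-4    # eV, eV/K (linearized band-gap, eq. (55))
N_B   = 1e17                    # doping, cm^-3
tau_p = 1e-7                    # hole lifetime at 300 K, s (assumed)
dET   = 0.0                     # generation center at midgap

def tau_s(T):
    """Storage time for a bulk-generation-limited capacitor, eqs. (34), (53), (54)."""
    Nc = Nc300 * (T / 300.0) ** 1.5
    Nv = Nv300 * (T / 300.0) ** 1.5
    EG = EG0 - alpha * T                                    # eq. (55)
    ni = np.sqrt(Nc * Nv) * np.exp(-EG / (2 * k * T))       # eq. (34)
    tau_G = tau_p * (300.0 / T) ** 0.5 * np.exp(dET / (k * T))  # eq. (18), one-sided
    return N_B * tau_G / (2.0 * ni)                         # eq. (53)

T = np.linspace(294.0, 454.0, 9)                            # roughly 21 C to 181 C, as in Fig. 14
slope, _ = np.polyfit(1.0 / T, np.log(tau_s(T)), 1)
print(f"tau_s(300 K) = {tau_s(300.0):.2e} s")
print(f"Arrhenius activation energy = {slope * k:.2f} eV (compare EG0/2 = {EG0 / 2:.2f} eV)")
```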
The first exponential term is now explicitly temperature independent. From the second exponential term, we see that the activation energy should be greater than or equal to half the extrapolated zero-temperature band-gap, which for GaAs is about 0.78 eV.

Figure 14 shows measured storage times (Dungan, 1989) for pnp capacitors fabricated in GaAs and Al_xGa_(1-x)As (x ≈ 0.2). These capacitors were grown by molecular beam epitaxy (MBE) and were mesa isolated by chemical etching. The doping and thickness of the layers are the same. The storage times in Fig. 14 are indeed thermally activated, as required by (56), and the activation energies are consistent with generation sites at or very close to midgap (ΔE_T ≈ 0).

In considering the magnitudes of the storage times, we note that dynamic memories in silicon are typically refreshed at a 1 kHz rate, so that the storage times of individual cells must be longer than the refresh period of 1 msec. To take a conservative view, we shall require a storage time of at least 100 msec
FIGURE 15. Capacitance recovery time constants for three GaAs pnp capacitors having different doping concentrations. The doping on the lightly doped side is given in the legend. The longest storage times are obtained for dopings in the low 10¹⁷ cm⁻³ range. Note the reduction in activation energy for the most highly doped sample (© 1990 IEEE).
for GaAs-based memories. By this criterion, Fig. 14 indicates that both these (unoptimized) pnp storage capacitors would be capable of satisfactory operation to over 100°C.

2. Effect of Doping
Equation (33) indicates that the charge storage time constant τ_S is expected to scale directly with doping. This is because higher dopings result in smaller depletion regions, and generation within the depletion region determines the storage time. Also, (7) tells us that the charge storage density, i.e., the charge stored per unit area at a given voltage, increases as the square root of doping. Since long storage times and high charge storage densities are both desirable, it would appear that high doping levels on both sides of the junction are called for. Figure 15 shows measured time constants for three pnp storage capacitors grown by MBE and mesa isolated by chemical etching (Dungan, 1989; Dungan et al., 1990). All growth and processing conditions were the same except for the doping of the layers. In all three structures the n-layer is doped 1 × 10¹⁸ cm⁻³. The p-layer dopings are 7 × 10¹⁵ cm⁻³, 1 × 10¹⁷ cm⁻³, and 1 × 10¹⁹ cm⁻³. All three capacitors are effectively one-sided step junctions, with the generation occurring primarily on the lightly doped side. The doping of the lightly doped side is shown in the figure. (Note that for the sample with N_A = 10¹⁹ cm⁻³, the lightly doped side is the n-region, N_D = 10¹⁸ cm⁻³.) Equation (33) suggests that the charge recovery time constant should scale linearly with the doping of the lightly doped side. Figure 15 shows that
recovery time indeed increases with doping up to about 1 × 10¹⁷ cm⁻³. However, the sample with a doping of 1 × 10¹⁸ cm⁻³ exhibits a reduced time constant at all temperatures and a reduced activation energy. Thus it appears that there is a practical limit above which further increases in doping actually reduce the storage time. The reduced storage time of the most highly doped sample can be explained by the phenomenon of field-induced barrier lowering. This effect occurs when electric fields within the depletion region become high enough to modify the escape energy of carriers confined on a generation center. This can occur when the potential drop over a distance comparable to the radius of the electron orbit is significant compared to the ground-state energy of the center. Frenkel (1938) showed that the change in escape energy ΔE_A is given by
ΔE_A = 2q √(qF/4πε_s)     (57)
where F is the electric field in V/cm. Thus, ΔE_A calculated by (57) can be viewed as the reduction in activation energy resulting from field-induced barrier lowering. Applying this correction to the data of Fig. 15, we obtain ΔE_A values of 90, 180, and 330 meV, respectively, for the three curves of increasing doping density. These values correspond approximately to the differences between activation energies seen in the figure. It has been found experimentally (Dungan, 1989; Dungan et al., 1990) that higher doping levels can be used successfully if a small undoped (or lightly doped) i layer is inserted between the p- and n-type regions in each junction, resulting in a p-i-n-i-p structure such as shown in Fig. 16. The i layer has the effect of reducing the maximum electric field in the depletion region, thereby avoiding the barrier lowering phenomenon, while maintaining a very high charge storage density (high capacitance per unit area). The storage times of two p-i-n-i-p storage capacitors are compared to a pnp storage capacitor of similar doping in Fig. 17.

3. Effect of Device Area and Device Scaling

In order to construct a high-density memory array using pn junction storage capacitors, it is important that the size of the capacitor be reduced as much as possible. Two issues arise immediately: (i) charge storage capacity and (ii) storage time. Clearly, as the area of the capacitor is reduced, the amount of stored charge is reduced proportionally. What is perhaps not immediately apparent is that the storage time is also reduced. We shall consider both of these limitations in this section. First, we consider the charge storage capacity. To permit reliable detection
FIGURE 16. Storage capacitor having a p-i-n-i-p structure. The presence of the undoped i layers reduces the maximum electric field in the junction, minimizing field-induced barrier lowering, while still providing a high capacitance per unit area.
of the signal charge during readout and to minimize soft errors due to alpha particle strikes, a dynamic memory must store a certain minimum charge in each cell. In silicon dynamic memories, this minimum is typically on the order of 10⁶ electrons. To understand how this restriction affects the GaAs storage cell, let us consider the p-i-n-i-p capacitor of Fig. 16. For this capacitor, the
FIGURE 17. Capacitance recovery times for pnp and p-i-n-i-p storage capacitors. By introducing an i layer, we are able to avoid the reduction in room temperature storage time caused by field-enhanced generation.
FIGURE 18. Inverse storage time of square pnp capacitors as a function of P/A ratio at several temperatures. At each temperature, the generation lifetime τ_G is calculated from the intercept and the surface generation velocity s_G from the slope.
charge storage density of each junction is about 1.85 fC/µm² at a reverse voltage of 1 V. Accounting for both junctions, the area required to store 10⁶ electrons would be 43 µm², or 6.6 µm on a side. This area would be reduced if a higher signal voltage were used. A reasonable lower limit would appear to be around 25 µm², or 5 µm per side.¹ Next we consider the effect of scaling on storage time. One obvious geometrical consequence of reducing the capacitor area is the fact that the perimeter-to-area ratio P/A increases. Taking the inverse of (33),
1/τ_S = (2 n_i / N_B) (1/τ_G + s_G P/A)     (58)
This implies that a plot of inverse time constant versus P/A should produce a straight line whose slope is determined by the surface generation velocity and whose intercept is determined by the bulk generation lifetime. This is illustrated by the data (Sheppard, 1991) in Fig. 18 for four different temperatures. The inverse storage time increases linearly with P/A, in good agreement with (58). From the slopes and intercepts, we can calculate values of τ_G and s_G at each temperature using (58). These data are shown in Fig. 19. Here we see that both τ_G and s_G are relatively weak functions of temperature

¹ As an aside, we note that one wishes to retain 10⁶ electrons at the end of one refreshing period. We have defined storage time as the time for the electron density to be reduced to 1/e of its initial value. Thus, we should store 2.7 × 10⁶ charges at the beginning of the transient, requiring 2.7 times the quoted area. However, in practice one never allows the charge to decay to 1/e of its initial value before refreshing. In fact, if we require the storage time to be at least 100 msec, but refresh at a 1 kHz rate, the initial charge will only decay to 0.99 of its initial value at the end of one refreshing period.
FIGURE 19. Bulk lifetime τ_G and surface generation velocity s_G deduced from the plot of Fig. 18.
(at least compared to the temperature dependence of n_i). This slight temperature dependence indicates that the dominant generation centers are not precisely at midgap. The generation lifetime is in the vicinity of 200 ns, which is an excellent value for GaAs. We note that the precise values of τ_G are somewhat uncertain, since the intercepts in Fig. 18 are estimated by extrapolation and are sensitive to small errors in fitting the data. Of more practical interest is the time constant to be expected if the area of the storage capacitor is reduced below the 62 µm per side of the smallest capacitor shown. If we extrapolate the data in Fig. 18 to a side length of 5 µm, we find a storage time of 1.5 sec at 101°C. This is still comfortably above our criterion of 100 msec. Therefore it appears that adequate charge storage densities and acceptable storage times can be achieved with storage capacitors down to 5 µm on a side.
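As a minimal numerical sketch of this procedure (the inverse-time-constant values, n_i, and N_B below are illustrative placeholders rather than the measured data of Fig. 18), the slope and intercept of a linear fit of 1/τ_S versus P/A can be converted to s_G and τ_G using (58), and the fitted line can then be extrapolated to a smaller square capacitor:

import numpy as np

# Equation (58): 1/tau_S = (2*n_i/N_B) * (1/tau_G + s_G * P/A)
n_i = 7e8    # cm^-3, assumed intrinsic density of GaAs near 100 C
N_B = 1e17   # cm^-3, assumed doping of the lightly doped side

# P/A = 4/L for square capacitors of side L (here roughly 200 um down to 60 um)
P_over_A  = np.array([202.0, 408.0, 513.0, 645.0])    # 1/cm
inv_tau_S = np.array([0.084, 0.109, 0.122, 0.137])    # 1/s, hypothetical data

slope, intercept = np.polyfit(P_over_A, inv_tau_S, 1)
s_G   = slope * N_B / (2.0 * n_i)        # cm/s, surface generation velocity
tau_G = 2.0 * n_i / (N_B * intercept)    # s, bulk generation lifetime

# Extrapolate to a 5-um (5e-4 cm) square, i.e., P/A = 4/L = 8000 cm^-1
L = 5e-4
tau_S_small = 1.0 / ((2.0 * n_i / N_B) * (1.0 / tau_G + s_G * 4.0 / L))
print(f"s_G ~ {s_G:.3g} cm/s, tau_G ~ {tau_G:.3g} s, tau_S(5 um) ~ {tau_S_small:.2g} s")

With these placeholder numbers the fit returns a generation lifetime of a couple hundred nanoseconds and an extrapolated storage time near 1 s for a 5 µm square, the same order as the values quoted above.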
4. Metal Evaporation Technique

Since storage time is determined by extremely small generation currents, it is to be expected that it should be sensitive to certain details of the processing. One of the most important aspects of the processing in this regard is the method of metal deposition. In fact, the storage time of well-made, identically prepared samples is reduced by about three orders of magnitude if the ohmic metal is deposited in an electron-beam evaporator as compared to a thermally heated evaporator (Stellwag et al., 1992a). Figure 20 shows storage time versus temperature for four 100 × 100 µm² p-i-n-i-p storage capacitors (Stellwag et al., 1992a). These capacitors are taken from two wafers whose epitaxial layers were grown by MBE using As₂ and As₄ flux, respectively. Both p⁺ layers are doped 1 × 10¹⁹ cm⁻³, the n-layer is
FIGURE 20. Storage time versus temperature for the p-i-n-i-p capacitors; the curves are labeled "Thermally Evaporated," "E-Beam Evaporated," and "As₂."
The magnitude of this reduction depends on the flux conditions during MBE growth. This suggests that, rather than creating new centers, the electron-beam environment is simply activating preexisting centers present in the as-grown film, the density of which depends on the growth conditions.
III. JFET AND MESFET DRAM CELLS

A. Introduction
In Section II we considered the design and performance of pn junction storage capacitors in some detail. However, the storage capacitor is only half the story. In order to construct a complete DRAM cell, it is necessary to provide an access transistor that will connect the storage capacitor to the bit line for reading and writing and isolate the storage capacitor from the bit line during storage. In this section we will consider two means of implementing this access transistor in GaAs: (i) using junction field-effect transistors (JFETs), and (ii) using metal-semiconductor field-effect transistors (MESFETs). Figure 21 shows how one might implement a one-transistor DRAM cell utilizing a JFET access transistor and a MESFET access transistor. In the following sections we will present experimental results on early versions of both JFET and MESFET DRAM cells, with special attention to the effects of transistor leakage on storage time.
B. Implementations of JFET and MESFET DRAM Cells

The first demonstration of a complete JFET-accessed GaAs DRAM cell was reported by Neudeck et al. (1989). The structure, shown in Fig. 22, consists of two large-area p⁺n storage capacitors surrounded by a ring-gate JFET. Two storage capacitors are used in this implementation so that the charge can be monitored by observing the capacitance between the two p⁺ plates. This is necessary since the structure is fabricated on a 1 µm undoped buffer layer. The layers are grown by MBE, and the p⁺ gates are patterned by wet etch. The 50 nm p-type GaAs layer under the channel is included to improve subthreshold performance of the access transistor, and it is fully depleted during normal operation (Yamasaki, Kato, and Hirayama, 1985). Figure 23 shows waveforms obtained during electrical writing of the cell. The top waveform is the capacitance between the two p⁺ capacitor plates, the middle waveform is the bit line voltage applied to the source of the JFET, and the lower waveform is a series of 1 msec write pulses applied to the gate of the access transistor (the word line). The word line is held at a negative
FIGURE 21. Two examples of FET-accessed DRAM cells in GaAs. Each cell consists of a pn junction storage capacitor and an access transistor. The top drawing depicts a JFET DRAM cell, while the bottom drawing illustrates a MESFET DRAM cell.
potential to keep the JFET biased off. When the word line is pulsed to 0, the JFET is turned on and the potential of the bit line is connected to the storage capacitor. During the first pulse the bit line is positive, and electrons are removed from the n-side of the pn junction storage capacitors. This reverse biases the pn junctions, widening the depletion regions and reducing the capacitance signal. When the second word line pulse occurs, the bit line has returned to 0. This allows electrons from the bit line to flow to the storage capacitors, returning them to their equilibrium conditions. The capacitance is thus increased. One notes that the capacitance signal changes only when the word line is pulsed to 0. This proves that the storage capacitor is effectively isolated by the access transistor: changes in bit line potential have no effect on capacitance while the access transistor is off. The high-capacitance condition occurs when the storage capacitors are in equilibrium, so this state will not decay
FIGURE 22. A prototype JFET DRAM cell implemented in GaAs by Neudeck et al. The cell consists of two storage capacitors surrounded by a ring-gate JFET. Two storage capacitors are used to allow the charge recovery to be monitored by observing the capacitance between the storage gates. A second JFET is included for evaluation purposes (© 1990 IEEE).
with time. The low-capacitance state represents a reverse bias on the storage capacitor. This is a temporary condition, and the capacitance will gradually return to its equilibrium value as thermal generation discharges the storage capacitor. This can be seen in the waveforms of Fig. 23, where the storage
FIGURE 23. Writing waveforms for the JFET DRAM cell of Fig. 22. The upper trace is the cell capacitance, the middle trace is the bit line voltage, and the lower trace is the write pulse train applied to the word line. Note that the cell capacitance changes only at the moments when write pulses are applied to the word line. The stored 1 state gradually decays due to gate leakage in the JFET access transistor and generation in the substrate depletion region (© 1990 IEEE).
FIGURE 24. Reading waveforms for the JFET DRAM cell; the traces are labeled "Storage Node Write Pulse" and "Word Line Voltage."
time is estimated to be on the order of 1 s. This low storage time is due to two factors. First, the gate-to-drain leakage current in the access transistor flows directly to the storage capacitor. In this prototype geometry, the access transistor is a ring-gate configuration that actually maximizes the leakage current supplied to the storage capacitor. Second, the large undoped buffer layer below the storage capacitor provides a large generation volume for leakage currents discharging the storage capacitor. During normal operation of a GaAs DRAM cell, the cell capacitance would not be monitored. Instead, the charge state of the cell would be detected by a change in potential of the bit line when the access transistor is turned on. This mode of operation is demonstrated in Fig. 24. Here the bit line is connected directly to a high-impedance active oscilloscope probe. The top waveform in the figure is the potential of the probe, the middle waveform is a write pulse applied to the p⁺ terminal of the storage capacitor, and the bottom waveform is the pulse train applied to the word line. The write pulse (middle waveform) momentarily forward biases the pn junction storage capacitor, removing electrons from the n-region. During the first word line pulse, the JFET is turned on and electrons flow from the bit line to the storage capacitor to restore the missing electrons. This causes the bit line to jump to a positive potential. The rapid decay of the bit line signal is caused by the RC time constant of the active probe. By the end of the first word line pulse, the bit line potential has essentially returned to ground, and as a result the storage capacitor is also essentially at ground. The second word line pulse produces almost no response on the bit line, since the storage capacitor is
FIGURE 25. Top view and cross section of a MESFET DRAM cell implemented by Neudeck et al. This cell employs a linear gate rather than a ring gate, in order to minimize gate leakage current discharging the storage capacitor.
now at ground potential. Thus the charge state of the cell can be determined by detecting the bit line potential when the access transistor is turned on. Similar results have also been achieved using MESFET access transistors (Dungan et al., 1990). Figure 25 shows the top view and cross section of an experimental MESFET DRAM cell (Neudeck, 1991). Notice that in this structure the ring gate has been replaced by a linear gate. Writing waveforms for the MESFET cell are shown in Fig. 26. These waveforms are very similar to those of the JFET DRAM cell in Fig. 23. In order to eliminate the need for a negative supply voltage, the cell is operated with bit line potentials that vary from 1.3V for a logic 0 to 2.45 V for a logic 1. (A thorough discussion of operating voltage considerations for FET-accessed GaAs DRAM cells can be found in Dungan, 1989, and in Dungan et al., 1990). As expected, the cell capacitance changes only when the access transistor is turned on. In spite of the reduced gate width exposed to the storage capacitors, the storage time for this cell is only about 25 msec at room temperature. The drastic reduction in storage time is the result of increased gate leakage in the MESFET as compared to JFET. This effect will be discussed next.
FIGURE 26. Writing waveforms for the MESFET DRAM cell of Fig. 25. The top trace is the cell capacitance, the middle trace is the bit line voltage, and the lower trace is the write pulse train applied to the word line. The access transistor threshold voltage is +0.8 V, and positively shifted logic levels are employed (© 1990 IEEE).
C. Effect of Transistor Gate Leakage

When an access transistor is connected to the storage capacitor, additional leakage mechanisms are introduced that reduce the storage time of the cell, even when the access transistor is deliberately biased into the off state. These leakage mechanisms are illustrated in Fig. 27. The primary leakage mechanisms are subthreshold leakage of electrons from the source to the drain and leakage of electrons from the gate (or gate depletion region) to the drain. Since the drain is connected directly to the storage capacitor, these leakage paths tend to prematurely discharge the capacitor. Subthreshold leakage from the source can be described by (Conger,
FIGURE 27. Drain current components in a GaAs field-effect access transistor. The primary mechanisms affecting DRAM storage time are the drain-to-source current I_DS and the drain-to-gate current I_DG.
FIGURE 28. Drain current as a function of gate voltage in a GaAs field-effect transistor. I_DS is the drain-to-source current and I_DG is the drain-to-gate current. I_DS decreases exponentially with gate voltage below threshold, but eventually I_DG dominates. This gate current is not present in silicon MOS transistors, but it is the dominant leakage mechanism in most FET-accessed GaAs DRAM cells.
Peczalski, and Shur, 1988)
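A conventional subthreshold expression of the kind referred to here, written only as a sketch consistent with the variable definitions given below (with I_DS0 denoting the drain current measured at V_GS = V_T), is

\[
I_{DS} = I_{DS0}
\left[1 - \exp\!\left(-\frac{qV_{DS}}{kT}\right)\right]
\exp\!\left(\frac{q\,(V_{GS} - V_T)}{n\,kT}\right).
\]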
Here I_DS0 is the current measured at threshold (V_GS = V_T), V_GS is the gate-to-source voltage, V_T is the threshold voltage, V_DS is the drain-to-source voltage, and n is a subthreshold ideality factor. If V_DS is greater than a few kT/q, the term in square brackets approaches unity. In any event, the last factor shows that the current decreases exponentially as V_GS is reduced below threshold. Unfortunately, one encounters a fundamental difficulty in taking advantage of this exponential reduction in subthreshold current. This is because GaAs, unlike silicon, does not possess a true insulator that can be placed between the gate and channel. As a result, a significant gate-to-drain current can flow whenever the gate becomes too strongly reverse biased with respect to the channel. This is illustrated in Fig. 28, where we show the measured drain current for a GaAs MESFET as a function of gate voltage (Neudeck, 1991). As gate voltage is reduced below threshold, the drain current in Fig. 28 drops exponentially with gate voltage, in agreement with (59). However, a minimum is reached at a gate voltage of about -0.2 V; below -0.2 V the drain current increases. This increase is due to electrons flowing from gate to drain, and the I-V characteristic in this region is that of the reverse-biased gate-to-channel junction. If the transistor of Fig. 28 were used in a pn junction DRAM cell, the storage time would be reduced to a few hundred milliseconds at room temperature. Obviously, there is considerable motivation for minimizing this effect.
FIGURE 29. Drain current vs. gate voltage for a 5 × 350 µm ring-gate JFET.
For comparable devices, gate leakage is considerably lower in a JFET than in a MESFET (Dungan et al., 1990; Neudeck, 1991). Figure 29 shows drain current versus gate voltage for a 5 × 350 µm ring-gate JFET, while Fig. 30 shows drain current in a 10 × 350 µm ring-gate MESFET. At room temperature, the gate-to-drain leakage in the JFET is more than two orders of magnitude smaller than in the MESFET. Thus, one expects superior storage time performance from a JFET-based memory. It has been found that the gate-to-drain leakage of MESFETs can be reduced significantly by treating the GaAs surface with ammonium sulfide (NH₄)₂S just prior to thermal evaporation of the Schottky metal (Neudeck et al., 1991a). (It is important in this procedure that the metal not be deposited in an electron-beam evaporator.) Figure 31 shows drain and gate currents at room temperature for treated and untreated 10 × 350 µm ring-gate
FIGURE 30. Drain current vs. gate voltage for a 10 × 350 µm ring-gate MESFET. Note that at room temperature the gate-to-drain leakage in the MESFET is more than two orders of magnitude higher than in the JFET of Fig. 29.
FIGURE 31. Drain and gate currents for a 10 × 350 µm ring-gate MESFET with and without (NH₄)₂S surface treatment prior to gate metal deposition. In the treated device the gate current exceeds the drain current for V_G < -0.3 V because the gate depletion region is punching through to the substrate, causing substrate holes to flow to the gate. This phenomenon has no effect on DRAM storage (© 1991 IEEE).
MESFETs. The drain current in the treated device is reduced by about three orders of magnitude compared to the untreated device. (In the treated device, the gate current exceeds the drain current below about -0.3 V. This excess gate current is due to gate-to-substrate punchthrough, and has no effect on the storage time of the DRAM cell, since it does not flow to the storage capacitor.) The reduction in gate-to-drain current in the treated devices is attributed to an increase in Schottky barrier height that results from an unpinning of the Fermi level (Carpenter, Melloch, and Dungan, 1988). The (NH₄)₂S layer is thought to passivate most of the surface states that otherwise would exist at the metal-semiconductor interface. Although (NH₄)₂S-treated surfaces degrade within a few hours if exposed to air, the presence of the metal gate protects the (NH₄)₂S layer in these MESFETs. No changes in I-V characteristics could be detected after over a year of undesiccated storage at room temperature.
D. A 4-Bit JFET Dynamic Content-Addressable Memory

As a final example of demonstration circuits involving GaAs DRAM technology, we consider the four-bit dynamic content-addressable memory (DCAM) constructed by Neudeck and coworkers (Neudeck, 1991). In a content-addressable memory, each storage cell contains logic to compare the bit of information stored in the cell with a binary signal presented on the bit line. The organization of a typical content-addressable memory is depicted in Fig. 32. Each word in the memory array contains many bits of data stored
FIGURE 32. Schematic diagram of a content-addressable memory. Each memory cell contains logic elements that compare the contents of the cell to the information on the bit lines. If a bit fails to match, the cell pulls its match line to ground. Only if all cells in a word match the input pattern will the match line for that word remain high.
in individual cells connected to a common word line. In addition, the storage cells in each word of a content addressable memory are also connected to a second horizontal line called the match line. To select a stored word based on content, a binary pattern is impressed on the bit lines. Each cell in the memory compares its stored bit to the bit on its bit line. If any cell finds a mismatch, it pulls its associated match line to ground. Only if every bit in a word matches the input pattern will the associated match line remain high. Content addressable memories are useful in situations where large data bases are to be searched rapidly for specific patterns. The GaAs DCAM cell (Neudeck, 1991) is shown in Fig. 33. This cell consists of two JFET DRAM cells and four additional n-channel JFETs used to implement an exclusive-nor (XNOR) function. A unique feature of this design is the fact that an ohmic contact is established to the n-type region of the storage capacitor in each DRAM cell. This contact is connected to the gate of an n-channel JFET in the XNOR logic so that the contents of the cell can be compared with the new data on the bit lines. Altogether, six transistors and two storage capacitors are required per cell. The cell of Fig. 33 is implemented in a 2 x 2 DCAM array, complete with peripheral circuitry to load the memory with initial data, read the stored data using the bit lines, select the word lines, and read the match lines. In all, 40 JFETs and 8 storage capacitors are involved. Figure 34 shows operating waveforms of the 2 x 2 DCAM array. During the first six cycles a distinctive binary pattern (all 0s) is loaded into the
FIGURE 33. Schematic of one cell of the JFET DCAM (top) and layout of the cell (bottom). Each cell consists of two one-transistor DRAM cells and four n-channel JFET XNOR transistors. A two-level metal process is employed, with the top interconnect metal placed on polyimide for isolation.
FIGURE 34. Operating waveforms of the 2 × 2 JFET DCAM array. The two match line and four bit line waveforms are shown for six cycles. During each cycle, the memory is loaded with a four-bit data pattern and then presented with nine interrogation patterns. The boxes at the bottom indicate the data pattern written into each word of the memory. The boxes at the top indicate the nine interrogation patterns and the match line responses. A d represents a "don't care" condition, implemented by taking both bit and bit-bar lines low; d bits always match the stored bit.
memory. In the next nine cycles, all possible binary input patterns (including “don’t care” inputs) are presented and the match line outputs tabulated at the top of the figure. Beginning with the sixteenth cycle, a new data pattern (all 1s) is loaded into the memory and the interrogation pattern repeated. This cycle continues until all stored data patterns are verified with all possible input combinations. In all cases, the DCAM produced the correct outputs. The array was also tested as an eight-bit DRAM. In this mode, the match lines are ignored and the individual DRAM cells in each DCAM cell are written and read out using the bit lines. The array was fully functional in this mode as well.
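The match operation described above can be modeled compactly in software. The sketch below is purely illustrative (the function and data are hypothetical, not part of the reported hardware): each cell performs an XNOR of its stored bit with the key bit, a key bit of None stands for a "don't care" that always matches, and the match line behaves as the wired-AND of all cell comparisons in a word.

from typing import List, Optional

def word_matches(stored: List[int], key: List[Optional[int]]) -> bool:
    # Each cell compares its stored bit with the key bit (XNOR); a None key
    # bit is a "don't care" and always matches. The match line stays high
    # only if every cell in the word matches (wired-AND).
    return all(k is None or k == s for s, k in zip(stored, key))

# A 2 x 2 array storing two 2-bit words, as in the demonstration circuit
memory = [[0, 0], [1, 1]]

# Interrogation patterns: 0, 1, or None ("don't care")
for key in ([0, 0], [1, None], [None, None]):
    print(key, "->", [word_matches(word, key) for word in memory])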
This DCAM array is far too small to be useful, it does not contain all the necessary peripheral logic such as sense amplifiers and address decoders, and it has not been optimized for minimum cell size or maximum speed of operation. Nevertheless, it does demonstrate that GaAs JFET DRAM cells can be combined into a working memory array, that they can be read and written electrically without interference between adjacent cells or excessive cross talk, and that they can be combined with logic within each cell to operate as a content-addressable memory.
IV. HETEROSTRUCTURE DRAM CELLS

A. Introduction
Heterostructure DRAM cells have been investigated by several groups. These cells can be classified into two types: (i) generation-limited cells, similar to the pn junction JFET and MESFET DRAM cells already discussed, in which the nonequilibrium state is a deficit of carriers and the cell returns to equilibrium by thermal generation, and (ii) leakage-limited cells, in which the nonequilibrium state is an excess of carriers and the cell returns to equilibrium by leakage of excess carriers over a potential barrier. As we shall see, the generation-limited DRAM designs have been more successful, since generation is characterized by activation energies of at least half the semiconductor band-gap. To create a comparable leakage-limited memory would require a heterojunction band discontinuity of comparable magnitude. Such band discontinuities are difficult to achieve in III-V material systems. In the following sections we shall first review the results obtained with leakage-limited devices (undoped heterojunction cells and quantum-well floating-gate cells). We will conclude with a discussion of generation-limited devices (modulation-doped heterojunction cells).

B. Undoped Heterostructure DRAMs
A storage capacitor employing an undoped heterojunction (Cooper, Qian, and Melloch, 1986) is shown in Fig. 35(a). The barrier layer is undoped Al_xGa_(1-x)As (x = 0.38), and excess electrons are confined in a two-dimensional electron gas (2DEG) in the GaAs adjacent to the heterojunction, as shown in part (b). Figure 36 shows C-V and I-V characteristics of the structure at 77 K in the dark and under white light (Kleine et al., 1989b). In the dark no measurable current flows, and the capacitance is that of an MOS structure in deep
FIGURE 35. A leakage-limited heterojunction storage capacitor formed by undoped AlGaAs on GaAs. A cross section of the structure is shown in (a) and the band diagram for the interface region is shown in (b). The top Al_xGa_(1-x)As layer has a mole fraction x of 0.38, while the Al_xGa_(1-x)As layers in the five-period superlattice have a mole fraction of 0.30. The superlattice buffer layer is included to improve material quality of the subsequently grown layers. Electrons in the 2DEG tend to escape over the heterojunction to the GaAs gate. This leakage process limits the storage time of this capacitor.
FIGURE 36. Capacitance-voltage and current-voltage characteristics of the heterojunction storage capacitor of Fig. 35 at 77 K. The current and the upper capacitance curves were taken under white light illumination, and the lower capacitance curve was taken in the dark. The dark current was too small to be measured. The ledge in the light-on capacitance curve indicates that electrons are being stored in the 2DEG (© 1989 IEEE).
depletion - the capacitance decreases as voltage is swept positive due to the expanding depletion region in the p-type GaAs. Under illumination, photogeneration occurs in the depletion region. Holes are swept into the substrate, while electrons drift to the GaAs/AlGaAs interface, where they are trapped in the potential well. As voltage is swept positive under illumination, a capacitance ledge is observed. This ledge is caused by the screening effect of electrons in the 2DEG at the GaAs/AlGaAs interface. The leakage current also increases. This can be explained as follows: At any bias, a steady state exists between electrons supplied to the 2DEG by photogeneration and electrons escaping the 2DEG to the gate or to the substrate. In the ledge region, not all of the photogenerated electrons flow from the 2DEG to the gate. Some are injected to the substrate, where they recombine with holes, producing no current. As the gate voltage increases, field emission and thermionic field emission of electrons from the 2DEG to the gate increase, and an increasing fraction of photogenerated electrons flow to the gate. The capacitance remains almost constant for a range of gate voltages (the "ledge") because additional charges added to the gate by the increasing gate voltage are imaged by additional electrons in the 2DEG; the depletion region is shielded from the additional charges and does not expand. Eventually, the 2DEG contains the maximum electron density that can be contained behind the finite potential barrier (ΔE_C = 0.3 eV). Beyond this point no more electrons can be added to the 2DEG, and the depletion region is no longer shielded from increases in the gate charge. As a result, the capacitance again
FIGURE 37. Transient capacitance decay at several gate voltages at 55 K. When light is extinguished, electrons in the 2DEG leak over the heterojunction barrier to the gate, and the capacitance decreases to the dark value. The lack of hysteresis in the C-V curves in depletion (V_G < -0.4 V) proves that all the stored electrons have been expelled from the 2DEG, and none are held in traps (© 1989 IEEE).
decreases with gate voltage. The gate current saturates, since now essentially all of the photogenerated electrons are flowing from the 2DEG to the gate. The various processes by which electrons escape the 2DEG are illustrated in part (b) of Fig. 35. These include thermionic emission over the potential barrier to the gate, thermionic-field emission through the top of the barrier, tunneling from the quantized energy levels E_0 and E_1, diffusion into the substrate, and black-body photoemission over the barrier to the gate (Kleine et al., 1989b). The rate at which electrons escape determines the storage time. This escape rate can be determined from a capacitance-time (C-t) transient at any voltage in the capacitance ledge region. An example of a series of such C-t transients is shown in Fig. 37. By a careful analysis of the rate of change of capacitance during each transient, the leakage current can be determined as a function of electric field in the AlGaAs and the quasi-Fermi level splitting in the 2DEG. From these analyses it is possible to distinguish which of the several leakage mechanisms in Fig. 35(b) is dominant in each bias region and to determine relevant constants in the mathematical expressions for each leakage mechanism. We shall not go into such detail here; the interested reader is referred to the literature (Kleine et al., 1989b). The major conclusion to be drawn from the work on leakage-limited undoped heterostructure DRAM cells is that storage times are too short to be useful except at cryogenic temperatures. This is illustrated by Fig. 38, where electron density in the 2DEG is plotted as a function of time following extinction of the light (Kleine et al., 1989b). Symbols represent experimental data obtained from capacitance transients, and lines are theoretical predictions based on the known emission mechanisms. One observes that storage
FIGURE 38. Stored electron density as a function of time at 50 and 80 K. The points represent experimental data at several values of gate voltage, while the lines are predictions of the emission theory of Kleine et al. (1989b). Storage times are on the order of a few seconds at 80 K (© 1989 IEEE).
times of around 1000 seconds are obtained at 50 K. However, at 80 K the storage times have dropped to the order of a few seconds. Measurements at higher temperatures are not practical due to the short time constant of the C-t transient. Figure 39 shows the heterojunction barriers in the GaAs/AlGaAs system as a function of AlAs mole fraction (Batey and Wright, 1986). The largest conduction band discontinuity, about 0.35 eV, occurs around x = 0.4, where the AlGaAs band-gap becomes indirect. This is very close to the mole fraction of the samples presented in Figs. 35-38. However, the valence band discon-
FIGURE 39. Conduction and valence band discontinuities in the AlGaAs ternary system as a function of AlAs mole fraction. The largest discontinuity in the conduction band occurs around x = 0.4, where the AlGaAs becomes indirect. In the valence band, however, the discontinuity increases monotonically as x approaches unity.
FIGURE 40. A heterojunction capacitor for storing holes at the interface between undoped AlAs and n-type GaAs. The valence band discontinuity at this interface is about 0.55 eV.
tinuity increases monotonically, reaching a maximum value of about 0.55 eV at x = 1. Perhaps an advantage can be gained by storing holes in a two-dimensional gas at the GaAs/AlAs interface! Figure 40 shows a heterostructure storage capacitor formed by undoped AlAs and n-type GaAs. This capacitor was also studied by the method of C-t transients (Qian, Melloch, and Cooper, 1986, 1989), and the results for a gate voltage of -3 V are shown in Fig. 41. Here we see that storage times of tens of seconds are obtained at 100 K, and measurable charge storage persists to almost 200 K. Again, however, room temperature operation is not feasible. In addition to questions of storage time, we must also consider the charge storage density of heterojunction DRAM cells. The hole memory capacitor of Fig. 41 can retain about 8 × 10¹¹ holes per cm² at 150 K, while the electron capacitor of Fig. 38 retains only about 1 × 10¹¹ electrons per cm² at 80 K. In order to store 10⁶ carriers, the hole capacitor would need to be at least 11 × 11 µm², while the electron capacitor would need to be at least 32 × 32 µm². In Section II we found that pnp homojunction capacitors could be made as small as 5 × 5 µm².

C. Quantum-Well Floating-Gate DRAMs
Two groups have investigated charge storage in quantum wells formed by heterojunctions. The first such structure was reported by Capasso and co-workers (Capasso et al., 1988; Beltram et al., 1988), and is shown in Fig. 42. In this structure a GaAs quantum well, formed between two AlAs barriers, serves as a floating gate for an underlying field-effect transistor. Electrons are injected into the quantum well from the gate by a negative pulse. This injection is aided by grading the band-gap of the barrier layer between the gate and the quantum well. Once stored, the electrons cause a positive shift in the threshold voltage of the underlying transistor. This
FIGURE 41. Hole density in the capacitor of Fig. 40 as a function of time at several temperatures. Hole densities in the low 10¹¹ cm⁻² range can be retained for several seconds at temperatures up to 190 K, but room temperature operation is not feasible.
threshold shift can be detected by observing the current in the transistor, providing a form of nondestructive readout. In the GaAs/AlAs version, the room temperature storage time was about 2 sec, increasing to about 4 hr at 77 K. Figure 43 shows drain current transients at 140 K and at room temperature following filling pulses from the gate. From measurements of effective threshold voltage shift, a sheet charge density of about 10¹² cm⁻² was inferred for the quantum well. A similar structure was investigated by Lott, Klein, and Weaver (1989b). This structure differed from the structure of Fig. 42 in that a variable-period superlattice was used to produce the graded-band-gap effect in the injector region. Storage times of around 1 sec were reported at room temperature, with sheet charge densities above 10¹¹ cm⁻². The latter workers have also reported similar devices using InAs (wells)/AlAsSb (barriers) on InAs substrates (Lott et al., 1989a), and using InGaAs (wells)/InAlAs (barriers) on InP substrates (Lott et al., 1990). The InAs/AlAsSb structures exhibited a 1/e storage time of only 50 sec at 77 K. Storage time data was not reported at higher temperatures. The InGaAs/InAlAs devices had storage times of around 10 sec at 200 K, decreasing to less than 1 s at 250 K. In the latter devices, the dominant mechanism of charge loss was
FIGURE 42. A floating-gate DRAM cell in which electrons are stored in a GaAs quantum well between two AlAs barrier regions. Electrons are injected from the top gate over a compositionally graded AlGaAs barrier. The presence of electrons on the floating gate modulates the current in the GaAs channel below, providing nondestructive readout (© 1988 IEEE).
identified as horizontal leakage over a lateral potential barrier, rather than vertical emission over the heterojunction discontinuity. The quantum-well DRAM structures described previously are qualitatively similar to the single-heterojunction storage capacitors discussed in Section IV.B. In both cases electrons are retained behind the potential barrier created by the conduction band discontinuity at a heterojunction. The storage time is determined by the rate at which electrons are emitted over the heterojunction. In comparing the storage times reported for these two types of structures, a striking difference in storage time is apparent: The capacitors of Section IV.B, which were based on the heterojunction between GaAs and Al_xGa_(1-x)As
FIGURE 43. Channel current in the floating-gate DRAM of Fig. 42 as a function of time at (a) 140 K and (b) room temperature. The current transients have time constants of 4 hours at 140 K and 2 seconds at room temperature. Part (c) illustrates the write-erase operation at 140 K (erase is accomplished by shining light on the sample) (© 1988 IEEE).
(x = 0.38), had measurable storage times only below 100 K. In contrast, the quantum-well capacitors of Section C, based on the heterojunction between GaAs and AlAs, had storage times of 1-2 sec at room temperature and 4 hr at 140 K. What can account for such a dramatic difference? One possible explanation is that the GaAs/AlAs and InAs/AlAsSb systems are heterojunctions between direct band-gap and indirect band-gap semiconductors. Thus, electrons residing in the Γ valley in the quantum well must be emitted into the X valley in the barrier. The thermionic emission rate of
electrons over a potential barrier decreases exponentially with the barrier height. However, Solomon, Wright, and Lanza (1986) have argued that when the heterojunction is indirect, the prefactor before the exponential term is diminished, owing to reflections caused by wave function mismatch. A similar argument applies to thermionic-field emission (tunneling). Solomon et al. (1986) have calculated these prefactors from measurements of dc current through n⁺-i-n⁺ structures, where the i layers were undoped Al_xGa_(1-x)As (0.3 < x < 0.8) and the doped layers were GaAs. They conclude that the prefactor for thermionic emission decreases by about a factor of 200 at x = 0.8 as compared to the value when x < 0.4. For thermionic-field emission, the prefactor decreases by about a factor of 2500 at x = 0.8. However, Kleine, Melloch, and Cooper (1989a) have pointed out that these decreases are insufficient to explain the reported room temperature storage times of the quantum-well storage capacitors. Kleine et al. measured the transient decay of electron density at the heterojunction between GaAs and Al_xGa_(1-x)As at 55 K for mole fractions of 0.4, 0.6, 0.8, and 1. The optimum mole fraction for electron storage at this temperature was found to be between 0.4 and 0.6, and the storage time decreased markedly as the mole fraction approached unity (Kleine et al., 1989a). Moreover, they showed that the storage time of a GaAs/AlAs barrier at room temperature, based on the parameters of Solomon et al., would be less than 1 µsec. Another possible explanation for the long storage times reported for quantum-well capacitors is charge trapping (Kleine et al., 1989a). It is not possible to determine from capacitance or threshold voltage transients whether the stored charges are mobile or confined in deep traps in the material. Certainly, if charges were confined in traps, the storage times would be much greater than predicted by thermionic emission or thermionic-field emission theories. If charge trapping were taking place, the cell would also be highly resistant to electrical erasure, and could not function effectively as a random-access (read-write) memory. Indeed, the floating-gate cells of Capasso et al. (1988) and Beltram et al. (1988) could not be electrically erased, and instead were erased with light. Moreover, evidence of charge trapping in the AlAs was specifically mentioned by the groups investigating GaAs/AlAs quantum-well capacitors (Capasso et al., 1988; Beltram et al., 1988; Lott et al., 1989b). In the single-heterojunction results reported in Section B and in the emission study of Kleine et al. (1989a), charge trapping could be ruled out by the fact that no hysteresis was observed when C-V curves were swept rapidly in both directions (electrical removal of stored electrons). To summarize this rather confusing section, quantum-well storage capacitors have exhibited storage times up to 1-2 sec at room temperature, although some controversy exists regarding the explanation for the storage
FIGURE 44. Cross section and band diagram of a modulation-doped heterostructure storage capacitor. Because of the band bending in the AlGaAs layer, the Fermi level at the interface lies above the conduction band in the GaAs. As a result, a 2DEG will be present at the GaAs/AlGaAs interface in equilibrium.
phenomenon. The issue is made moot, to some extent, by the results obtained with simple pn junction storage capacitors in Section II and the results to be reported in the next section.

D. Modulation-Doped Heterostructure DRAMs
A heterostructure capacitor employing modulation doping is illustrated in Fig. 44. This type of structure has a distinct advantage over both the undoped heterostructure capacitors considered in Section B and the quantum-well capacitors discussed in the last section. This is because the modulation-doped structure of Fig. 44 is generation limited, in the sense that it retains a two-dimensional electron gas as the equilibrium state. The nonequilibrium
FIGURE 45. Cross section of a MODFET DRAM cell. This cell consists of a modulation-doped heterostructure capacitor and a MODFET access transistor.
state is achieved by removing electrons, and the structure returns to equilibrium by thermal generation, just as the pn homojunction capacitors of Section II. Thus one expects storage times in modulation-doped heterostructure capacitors to be comparable to those in pn homojunction capacitors. Figure 45 illustrates a complete one-transistor DRAM cell using a modulation-doped heterostructure (Kleine, Cooper, and Melloch, 1991). A p⁺ GaAs layer is used as the gate of the access transistor and the top electrode of the storage capacitor. This p⁺ layer has a higher barrier to the AlGaAs than a metallic gate, thus reducing gate leakage current in the access transistor (Priddy et al., 1987). The AlGaAs layers have an AlAs mole fraction x = 0.3. The storage capacitor is 187 × 190 µm², while the access transistor has a gate length of 5.5 µm and a width of 54 µm. Storage times of an isolated modulation-doped storage capacitor and a complete DRAM cell are shown as a function of temperature in Fig. 46. As was the case with JFET and MESFET DRAM cells, addition of the access transistor reduces storage times by several orders of magnitude. This is due to gate leakage current flowing when the transistor is biased off. At room
FIGURE 46. Storage times of an isolated modulation-doped storage capacitor and a complete MODFET DRAM cell as a function of temperature. The isolated capacitor has a storage time of 4.3 hr at room temperature. The storage time of the complete cell is reduced to about 2 min at room temperature because of leakage currents in the MODFET access transistor.
temperature, the storage time of an isolated modulation-doped storage capacitor is about 4.3 hr. In the complete DRAM cell, gate leakage in the access transistor reduces the storage time to about 2 min at room temperature. The storage capacitor exhibits two activation energies, indicating that two independent generation processes are occurring. The fact that both of these activation energies are greater than half the band-gap suggests generation through centers some distance from the middle of the band-gap. The results shown in Fig. 46 were obtained on cells that had thermally evaporated metal. The use of electron-beam metallization reduces the storage time of isolated capacitors to around 15 min at room temperature. This is consistent with the results reported for pn junction storage capacitors in Section II.D.4. Figure 47 illustrates electrical writing of the MODFET DRAM cell. The charge state of the storage capacitor is monitored by measuring the capacitance between the storage gate and the substrate. A gate bias of -2.9 V was applied to the access transistor during storage, and the gate was pulsed to 0 to write information into the cell. The arrows indicate the points where the MODFET access transistor was momentarily pulsed on. Write pulses as short as 20 nsec were effective in writing the cell. Shorter pulses were not investigated due to equipment limitations. It can be seen that for this particular device the room temperature 1/e capacitance recovery time is about 3 min. The bit line waveforms during reading are shown in Fig. 48. Here the potential of the bit line is monitored by a high-impedance FET probe. The 2DEG in the storage capacitor is first partially removed by pulsing the
FIGURE 47. Writing waveforms for the MODFET DRAM cell of Fig. 45. Note that the cell capacitance changes only at the instants when write pulses are applied to the word line (indicated by arrows). The storage time of this cell can be estimated from the capacitance transient to be about 3 min at room temperature.
storage gate to a negative bias for sufficient time for the electrons to recombine with substrate holes. The pulse voltage magnitude is indicated on the figure. Note that for the largest pulse voltage shown here (-1.5 V), the 2DEG is still not totally removed, since the threshold voltage for this device is -2.08 V. The access transistor is turned on at t = 0, which is 50 msec after the end of the emptying pulse. For t > 0, charge sharing occurs between the storage capacitor and the combined capacitance of the bit line and FET probe, resulting in a voltage output proportional to the charge required to refill the storage capacitor. The decay of the output signal for t > 0 is due primarily to the conductance of the FET probe.
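A minimal charge-sharing estimate illustrates the scaling of this readout signal; the capacitance values below are hypothetical, chosen only for illustration, and are not taken from the measured circuit.

# Bit-line voltage step when the access transistor turns on: the charge needed
# to refill the storage capacitor is supplied by the bit-line and probe
# capacitance, so their potential shifts by roughly delta_Q / (C_bl + C_probe).
q = 1.602e-19          # C
delta_Q = 1.0e6 * q    # C, refill charge for ~1e6 electrons (assumed)
C_bl = 0.3e-12         # F, bit-line capacitance (hypothetical)
C_probe = 1.0e-12      # F, active-probe input capacitance (hypothetical)

delta_V = delta_Q / (C_bl + C_probe)
print(f"bit-line step ~ {delta_V * 1e3:.0f} mV")   # about 120 mV for these values

Larger bit-line or probe capacitance dilutes the signal, which is why a high-impedance, low-capacitance probe (or an on-chip sense amplifier) is desirable.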
FIGURE 48. Bit line waveforms during reading of the MODFET DRAM of Fig. 45 as a function of the write pulse voltage used to empty the storage capacitor. The access transistor is turned on at time t = 0, which occurs 50 msec after a 1 is written to the cell. The storage capacitor contains a partial 2DEG even for the -1.5 V case, since the threshold voltage for total removal of the 2DEG is -2.08 V.
FIGURE 49. A modulation-doped heterostructure DRAM cell reported by Chen et al. Electrons are stored in the 2DEG under gate F2, while gates F1 and F3 are used for isolation. The transistor formed by gate F4 is used to buffer the output. During testing, gate F3 was left on permanently. In an actual DRAM, transistors F1 and F3 would be replaced by a single MODFET (reprinted with permission of Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, Massachusetts).
A similar modulation-doped heterostructure DRAM cell has been reported by Chen, Goodhue, and Mahoney (1991). This cell, shown in Fig. 49, utilizes a 20 nm undoped GaAs quantum-well channel grown on a four-period AlGaAs/GaAs superlattice buffer layer. An undoped AlGaAs layer and a p⁺ AlGaAs barrier are placed under the superlattice to prevent electron emission to the semi-insulating substrate, and a modulation-doped barrier is grown on top of the GaAs channel. All AlGaAs layers have mole fraction x = 0.23. The primary storage gate is labeled F2, and two separate transistors, labeled F1 and F3, are used for writing and reading the cell. An output MODFET, F4, is also included to amplify the signal resulting from charge transfer when F3 is turned on. In functionality demonstrations, however, Chen et al. kept F3 biased on to avoid capacitive coupling between the read pulse and the cell, and placed a 1 MΩ resistor in series with the gate of F2 to minimize coupling of the write pulse applied to F1. The circuit of Fig. 49 has a storage time of only 250 msec at room temperature, probably due to leakage through transistor F3, which is always left on. However, large-area isolated storage capacitors exhibit storage times as long as 5.4 sec at room temperature and up to 40 sec at 77 K. The charge storage density is 4.5 × 10¹¹ electrons per cm².
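A rough arithmetic check shows what such a sheet density implies for cell size, using the 10⁶-electron criterion adopted earlier in this chapter (the short calculation below is only an illustration of that criterion, not a reported design value):

# Minimum storage-gate area needed to hold 1e6 electrons at a given sheet density.
n_sheet = 4.5e11      # electrons per cm^2 (Chen et al. capacitor)
n_required = 1.0e6    # electrons per cell (criterion used earlier in the chapter)

area_um2 = (n_required / n_sheet) * 1e8   # 1 cm^2 = 1e8 um^2
side_um = area_um2 ** 0.5
print(f"area ~ {area_um2:.0f} um^2, square side ~ {side_um:.1f} um")
# -> roughly 220 um^2, i.e., about 15 um on a side

For comparison, the pnp homojunction capacitors of Section II reached adequate stored charge with only 5 µm on a side.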
V. BIPOLAR DRAMs

A. Introduction
In Section II we found that the storage time of pn junction storage capacitors is limited by thermal generation of hole-electron pairs. This process is slow in higher band-gap materials such as GaAs, and well-made storage capacitors can have storage times of several hours at room temperature. In Sections III and IV we learned that the addition of a field-effect access transistor (either JFET, MESFET, or MODFET) reduced the overall storage time of the complete cell to the order of 0.1-100 sec. This drastic reduction in storage time is due to unavoidable gate leakage when the access transistor is in the off state. We note that silicon DRAM cells avoid this problem because MOS transistors have essentially no gate leakage. The most straightforward way to eliminate gate leakage in the access transistor is to eliminate the gate! This can be done if, instead of a field-effect access transistor, we employ a bipolar access transistor. We should point out that bipolar DRAM cells are really not new, since DRAMs based on integrated injection logic were introduced in silicon in the early 1970s (Sander and Early, 1976; Quinn et al., 1978). These cells worked well, but they offered no particular operational advantage over MOS DRAM cells, since MOS cells in silicon are not limited by gate leakage. However, in GaAs the advantage over FET-accessed cells is enormous. In this section we will discuss the operation of a bipolar DRAM cell in GaAs.

B. Concept of the Bipolar DRAM Cell

A bipolar DRAM cell in GaAs (Stellwag, Cooper, and Melloch, 1991, 1992) is illustrated schematically in Fig. 50. This cell can be recognized as the basic pnp storage capacitor of Fig. 4 with an additional n-type layer on top. This n-type layer, together with the p- and n-type layers immediately beneath it, form an npn bipolar transistor. This bipolar transistor is merged with the pnp storage capacitor, as indicated in the figure. The base of the bipolar transistor is connected to the word line and the emitter to the bit line. The collector is floating and forms a pn junction with the substrate. The capacitance of this latter pn junction, together with the capacitance of the base-collector junction, stores the charge. Operation of the bipolar DRAM cell is illustrated in Fig. 51. In equilibrium, the n-type collector is at ground potential, corresponding to a stored logic 0 (part a). A logic 1 is written by removing electrons from the collector. This is done by taking both bit line and word line positive, thereby forward
FIGURE 50. Cross section of a bipolar DRAM cell in GaAs. An npn bipolar transistor is merged with a pnp storage capacitor to form the cell. Alternatively, the cell may be viewed as an npn bipolar transistor with a floating collector, with the collector specially constructed to have a large capacitance to ground.
biasing the collector-base junction (part b). Electrons will flow from the collector to the emitter until the collector-base voltage has been reduced to 0. The collector is now at the potential of the word line. If both bit line and word line are returned to ground, part (c), the collector remains positively charged due to the loss of electrons. The structure is now storing a logic 1. Note that all potentials applied to the cell are now at ground. As a result, the only currents that will discharge the collector are those due to generation in the depletion regions of the two reverse-biased junctions. This is the same situation as the pnp storage capacitors of Section II, where storage times of several hours were obtained at room temperature. Also, note that, since storage takes place with all terminals at ground potential, no power is dissipated in the storage state. Indeed, the memory can be considered nonvolatile, in the sense that all external power can be removed without loss of stored data, at least for times short compared to the normal storage time of the cell. The cell can be written to a logic 0 state by keeping the bit line at ground and taking the word line slightly positive, as shown in part (d). Electrons then flow from the emitter to the collector until the collector-to-emitter voltage is 0. In examining the sequence of band diagrams in Fig. 51, we note a conflict. To empty the cell, the word line and bit line are taken several volts positive. To fill the cell, however, the bit line is kept at ground and the word line is taken only slightly positive. Since we have no way of knowing whether the bit line will be positive or at ground during a particular writing operation, how can we know to what potential to take the word line? The answer is that the writing operation must be a two-step procedure: First all cells in a word
[Figure 51 band-diagram panels (labels): substrate, floating collector; (a) Store Zero (Equilibrium); (b) Read or Write One; (c) Store One.]
FIGURE 51. Band diagrams illustrating the operation of the bipolar DRAM of Fig. 50. Data consists of the presence or absence of electrons in the floating collector. Electrons are removed by taking both the word line and bit line positive, as shown in part (b). Data is stored when both the word line and bit line are at ground potential, as in parts (a) and (c). Electrons are inserted by taking the word line slightly positive while the bit line is held at ground, part (d).
must be emptied (word line and all bit lines taken several volts positive), then the cells must be selectively refilled, based on the bias state of their particular bit lines (word line taken only slightly positive). From the schematic drawing in Fig. 50, two other advantages of the bipolar cell can be seen. First, the cell conserves area, since the access transistor is vertically integrated on top of the storage capacitor. Second, the cell is inherently fast, since charge transport during reading and writing is vertical rather than horizontal. This latter point deserves further explanation. In FET-accessed cells such as shown in Fig. 21, charge transport occurs laterally along the FET channel. During reading and writing, the storage capacitor in these structures functions essentially as a long-channel FET, and the channel can pinch off, limiting current flow. This pinch-off effect is avoided in the bipolar structure.
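The two-step writing sequence can be summarized in control-flow form. The following C fragment is only an illustrative sketch of the sequencing described above; the driver functions, voltage levels, and word width are hypothetical placeholders and are not taken from the original text.

#include <stdio.h>
#include <stdbool.h>

#define V_HIGH 3.0   /* "several volts positive" (assumed value) */
#define V_LOW  0.7   /* "slightly positive" (assumed value)      */
#define V_GND  0.0

/* Stub drivers standing in for the actual row and column drive circuitry. */
static void drive_word_line(double v)         { printf("word line -> %.1f V\n", v); }
static void drive_bit_line(int col, double v) { printf("bit line %d -> %.1f V\n", col, v); }

/* Two-step write of one word: first empty every cell (logic 1), then
   selectively refill the cells whose bit lines are held at ground (logic 0). */
static void write_word(const bool *data, int ncols)
{
    int i;

    drive_word_line(V_HIGH);                  /* step 1: empty all cells in the word */
    for (i = 0; i < ncols; i++)
        drive_bit_line(i, V_HIGH);

    drive_word_line(V_LOW);                   /* step 2: selective refill */
    for (i = 0; i < ncols; i++)
        drive_bit_line(i, data[i] ? V_HIGH : V_GND);

    drive_word_line(V_GND);                   /* store state: all terminals at ground */
    for (i = 0; i < ncols; i++)
        drive_bit_line(i, V_GND);
}

int main(void)
{
    bool word[4] = { true, false, true, false };
    write_word(word, 4);
    return 0;
}

The essential point captured by the sketch is that the word line potential determines whether the operation empties or refills, while the bit lines select which cells in the word are refilled.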
The bipolar DRAM of Fig. 50 can also be realized as a heterojunction bipolar transistor, or HBT. In the HBT the emitter is formed from AlGaAs, and the base and collector from GaAs. HBTs are widely used in GaAs bipolar circuits because the valence band offset at the emitter-base junction drastically reduces hole injection from the base into the emitter, improving the emitter injection efficiency. This improvement is so great that it is no longer necessary to keep the base doping lower than the emitter doping. The increase in base doping has the added advantage of reducing the base spreading resistance. To apply these ideas to the bipolar DRAM cell, we must consider the requirements on the bipolar access transistor. As illustrated in Fig. 51, the operation of the bipolar DRAM dictates that the access transistor operate in both the forward and inverse modes. When electrons are removed from the floating collector (Fig. 51(b)), the transistor is in the inverse mode, with the collector functioning as an emitter and the emitter functioning as a collector. If we construct the cell using HBT technology, it will be desirable to fabricate both the collector and emitter in AlGaAs to enhance both forward and inverse operations. Therefore, we envision a double-heterojunction bipolar transistor in which the n-type emitter and n-type collector are both AlGaAs. The presence of a heterojunction in the reverse-biased collector-base junction would have minimal effect on forward operation. A further extension would be to make all layers (except the base) AlGaAs, since this would reduce generation in the depletion region of the collector-substrate junction.

C. Experimental Results

A prototype homojunction bipolar DRAM cell is shown in cross section in Fig. 52. This cell was grown by MBE, and metalized using a thermal evaporator (Stellwag et al., 1991a, 1992b). The storage time of the cell is measured by observing the transient recovery of the base capacitance following writing a logic 1. The capacitance transient of a 36 × 48 µm² cell at room temperature is shown in Fig. 53. The 1/e recovery time is 16,200 sec, or 4.5 hr. This storage time is of the same magnitude as storage times of isolated pnp capacitors reported in Section II, indicating that the presence of the bipolar access transistor has no negative effects on storage time. Figure 54 shows storage time as a function of temperature for several bipolar DRAM cells. An activation energy of 0.88 eV is common to all cells, and larger cells have slightly longer storage times, consistent with the scaling arguments presented in Section II.D.3. The cell is electrically written in Fig. 55. The 3 V, 1 msec word line pulses are not shown, but occur at the points indicated by arrows. Note that the capacitance increases when the bit line is at ground (store 0 state), since this
FIGURE 52. Cross section of the experimental device used to verify the operation of the bipolar DRAM (© 1992 IEEE).
returns the cell to equilibrium. When the bit line is positive (store 1 state), the capacitance decreases. This is because electrons have been removed from the floating collector, widening the depletion regions. The capacitance transient in the store 0 state is due to the turn-off time of the forward-biased emitter-base junction, as excess carriers are removed by recombination. This transient has no effect on operation of the memory. We should emphasize that in the high-capacitance store 0 state, the cell is actually in equilibrium,
FIGURE 53. Capacitance recovery transient measured at the base contact of the bipolar DRAM cell of Fig. 52 at room temperature. A recovery time constant of 4.5 hr is observed (© 1992 IEEE).
FIGURE 54. Storage time as a function of temperature for several bipolar DRAM cells. All cells have an activation energy of about 0.88 eV. Storage time decreases slightly as the cell is scaled to smaller dimensions, as expected from the scaling discussion in Section II (© 1992 IEEE).
and no further decay of capacitance will occur. The nonequilibrium condition actually corresponds to the low-capacitance store 1 state. Thus, the capacitance in the store 1 state will gradually rise (with a time constant of 4.5 hr at room temperature). No such rise is perceptible on the short time scale of this figure.
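A rough consistency check of the thermally activated behavior in Fig. 54 can be made by assuming a simple Arrhenius dependence, τ ∝ exp(E_a/kT); the calculation below is an illustrative estimate, not data read from the figure. Taking E_a = 0.88 eV and endpoints of 21°C (294 K) and 144°C (417 K), the ratio of storage times is exp[(E_a/k)(1/294 - 1/417)] ≈ exp(10.2) ≈ 3 × 10⁴, i.e., roughly four orders of magnitude, which would take a several-hour room-temperature storage time down to the order of a second at the high-temperature end.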
VI. FUTURE DIRECTIONS

A. Introduction
In the five previous sections we have described experimental work that has led to the present state of the art in GaAs one-transistor DRAM cells. Over the past years, storage times in GaAs capacitors have increased dramatically - from about 3 min at room temperature in 1987 to over 10 hr in 1991, and the trend seems likely to continue. Moreover, the use of bipolar access transistors, as described in Section V, makes it possible to construct complete DRAM cells having storage times on the same order of magnitude as the capacitors alone. The charge storage density of pn junction capacitors in GaAs is comparable to that achieved in planar storage capacitors in silicon. What next? What directions should GaAs DRAM development take from this point? What applications can be identified that would lead to commercialization of these devices? We will address some of these questions in this final section.
FIGURE 55. Electrical writing of the bipolar DRAM cell. For this test, the charge state of the cell is monitored by observing the base capacitance. To write the cell, 3 V, 1 msec write pulses are applied to the word line at the times indicated by arrows. The capacitance decay in the store 0 state is due to the turn-off transient of the forward-biased emitter-base junction, and it has no effect on operation. The store-0 state is in fact the equilibrium condition and will never decay. The low-capacitance store 1 state is the nonequilibrium condition. This capacitance will gradually increase with a time constant of 4.5 hr at room temperature. This increase is imperceptible on the time scale of this plot (© 1992 IEEE).
B. Trench Capacitors and Stacked Capacitors in GaAs

One of the dramatic developments in the evolution of silicon DRAMs was the introduction of vertical structures to increase the effective charge storage density (Sunami, 1985; Lu, 1989; Sunouchi et al., 1990). These vertical structures take the form of trench capacitors and stacked capacitors, as illustrated in Fig. 56. A large number of variations on these basic ideas have been proposed, but we will not attempt to review them here. The main point is that vertical structures have become absolutely necessary in silicon DRAMs at the integration levels prevalent today. At first glance it would appear that GaAs would be unable to emulate these vertical structures, and so would be totally unable to compete for high-density DRAM applications. However, that may not be the case at all.
FIGURE 56. Two types of vertical-geometry DRAM cells in silicon. The trench capacitor cell conserves area by placing the storage capacitor on the sidewalls of a vertical trench. The trench aspect ratio can be as high as 30:1. The stacked capacitor cell utilizes a multilayer sandwich of insulating and conducting layers formed on top of the semiconductor surface.
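To see why such vertical structures help, consider a hypothetical trench with a 1 µm × 1 µm opening etched to the 30:1 aspect ratio mentioned in the caption (the dimensions are chosen only for illustration): the sidewall area is perimeter × depth = 4 µm × 30 µm = 120 µm², more than a hundred times the 1 µm² of planar chip area the trench occupies.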
In this section, we will describe early efforts to develop a trench capacitor technology in GaAs. We will also speculate on the use of stacked capacitors in conjunction with GaAs DRAM cells. Trench capacitors and complete trench DRAM cells can be grown in GaAs using a technique known as atomic layer epitaxy (ALE) (Bedair et al., 1985). ALE is a gas-phase epitaxial growth technique similar to metal-organic chemical vapor deposition (MOCVD) except that special precautions are taken to expose the growth surface to only one species of adatom at a time. Epitaxy is carried out in a temperature range where growth is self-limiting, so that a single monolayer of either Ga or As is formed during one cycle. The growth surface is alternately exposed to ambients containing Ga and As source gases, resulting in layer-by-layer growth. This leads to unprecedented control of the growing film on an atomic level. One very useful property of this growth technique is the ability to grow high-quality layers conformally on vertical sidewalls of etched trenches. This property is exploited to build trench DRAMs in GaAs. Figure 57 is a scanning electron micrograph of four 100 × 100 µm² mesa-isolated GaAs pn diodes grown by ALE over a substrate in which several square 2 µm deep trenches were etched prior to growth (Neudeck et al., 1991b).
FIGURE 57. SEM photograph of four 100 × 100 µm² mesa-isolated pn diodes grown by ALE conformally over a substrate in which 2 µm deep trenches were etched prior to growth. Two of the mesas have a single 30 × 30 µm² trench each, one mesa has nine 10 × 10 µm² trenches, and one is planar (no trenches). The planar mesa is at the upper right. The light-colored square region near the center of this device is the metal ohmic contact. An SEM cross section of one of the trenches is shown at the bottom.
FIGURE 58. Current-voltage characteristics at 144°C for three ALE-grown pn diodes (symbols) with and without trenches. The nine-trench diode has 360 µm of trench sidewall, while the one-trench diode has 120 µm of trench sidewall. The line represents one of the best MBE-grown planar diodes.
Three types of diodes are shown: the diode on the upper right has no trenches (planar diode), the diodes on the upper left and the lower right each have one 30 × 30 µm² trench, and the diode on the lower left has nine 10 × 10 µm² trenches. The surface morphology is smooth, and conformal growth of epitaxial layers over the trench sidewalls is observed. Figure 58 shows the current-voltage characteristics of these three types of structures at 144°C. Also shown for comparison is one of the best planar diodes grown by MBE. Both forward and reverse currents are larger for the diodes containing trenches, but the reverse characteristics are not significantly degraded. Figure 59 shows the dependence of reverse current on trench perimeter for the three structures at a reverse voltage of 1 V. Note that the one-trench and the nine-trench diodes each have the same trench area and same total diode area,
FIGURE 59. Leakage current at 144°C and 1 V reverse bias for the three ALE diodes of Fig. 58. The reverse leakage increases linearly with trench perimeter.
but the trench perimeter differs by a factor of three. From this plot it is apparent that the reverse leakage scales with trench perimeter. It is not possible to determine from this measurement whether the leakage is due to the sidewalls themselves or to generation at the edges where the vertical sidewalls meet the horizontal top and bottom surfaces. This question can be resolved by a study of the dependence of leakage current on trench depth. Measurements of leakage current as a function of temperature indicate that the leakage is thermally activated. The planar diodes have an activation energy of 0.713 eV while both the trenched samples have an activation energy of 0.844 eV. These samples were metalized by electron-beam evaporation. Further work is needed to determine the dependence of leakage current on trench depth and sidewall orientation. In addition, both pnp storage capacitors and bipolar DRAM cells need to be built using this technique, and storage times need to be measured as a function of trench depth and orientation. Preliminary evidence based on the diode leakage measurements suggests that the degradation of storage time caused by the trenches would be within acceptable limits. The advantages of increased charge storage in the same amount of horizontal chip area would probably far outweigh the negative effects on storage time, particularly when storage times of planar devices are now measured in terms of hours at room temperature. Another approach to achieve higher charge storage densities with nonplanar capacitor structures is the use of stacked capacitors, similar to those illustrated in Fig. 56. In silicon DRAMs these stacked capacitors are formed by alternating layers of polysilicon conductors and SiO₂ insulators fabricated by a complex and ingenious sequence of deposition and selective etching. There is no fundamental reason why these steps cannot be performed on GaAs substrates. In effect, we are proposing that the storage capacitor need not be integral to the GaAs substrate, but could simply be deposited on the top surface after all the GaAs active devices are fabricated. This would open the possibility for a number of different structures, including metal-oxide-metal capacitors, stacked capacitors such as shown in Fig. 56, and capacitors having a ferroelectric material as dielectric.

C. Nondestructive Readout Cells
Another direction that shows great promise is GaAs DRAM cells that provide nondestructive readout with internal gain. Two types of structures are known to be under investigation (Cooper, 1989; Hetherington, Klem, and Weaver, 1991). One such structure, the bipolar/field-effect (BiFET) cell (Cooper, 1989), is shown in Fig. 60. Here the basic bipolar cell of Section V has been modified by the addition of a second base contact. This second contact allows the base layer to function as the channel of a lateral JFET,
FIGURE 60. A proposed bipolar-field-effect (BiFET) DRAM cell. The cell operates as a bipolar storage cell as described in Section V. Once a potential is stored on the floating collector, the depletion region of the collector-base junction modulates the lateral conductivity of the base layer. By establishing two ohmic contacts to the base, it is possible to detect the charge state of the floating collector by measuring the current between the two base contacts. In effect, the base forms a lateral JFET, gated from above by the emitter and from below by the floating collector.
gated from the top by the emitter and from the bottom by the floating collector. As a result, the current in the JFET is controlled by the potential of the floating collector, providing nondestructive readout. Because the cell now has gain, a robust signal can be placed on the bit lines during readout. Data is written to the cell in the conventional way using the bipolar access transistor (see Section V). The additional requirements on the base layer place new constraints on the design of the cell. In particular, the base layer thickness and doping must be adjusted to allow significant modulation by the floating collector. This must be accomplished without sacrificing the operation of the bipolar access transistor, both in the forward and inverse modes. Work is proceeding on these design issues, but no insurmountable problems are foreseen. The design problem is made more tractable by the several degrees of freedom inherent in the structure. Adjustable parameters include the base doping and thickness, the logic voltages on the floating collector, and the bias voltage on the emitter during readout. In addition, the use of heterojunctions at the emitter-base and collector-base junctions would improve bipolar operation, further relaxing the design constraints. In addition to providing nondestructive readout for digital (binary) memories, the BiFET cell can also be used as an analog memory. This mode of operation opens the possibility for several new applications, one of which is in the field of artificial neural networks. This will be discussed in Section E.
D. Ultra-Long Storage Times: Quasi-Static and Nonvolatile DRAMs

As pointed out in the introduction to this section, storage times in GaAs memories have increased dramatically in recent years. In simple GaAs bipolar DRAM cells, storage times in excess of 4 hr at room temperature are now routinely obtained. For many applications such as cache memory, in which paging occurs at predictable intervals, storage times of this magnitude allow the cell to be operated essentially as a static memory, without the need for refreshing of any kind. Moreover, the bipolar DRAM has the unique feature that storage is accomplished with no external bias applied to the cell. This is because the bipolar access transistor is a current-controlled device and is off when the base current is 0. Thus, bipolar DRAMs can be used in applications where data is to be retained during temporary power losses, provided such interruptions are short compared to the normal storage time of the memory. In this mode, the memory is considered "nonvolatile," at least for limited periods of time. What if the storage time could be extended to the order of days, months, or even years? In a moment, we will show experimental evidence that such storage times may indeed be possible. But first, we should point out the potential advantages that this type of nonvolatile memory would have in comparison to nonvolatile memories available today. In present-day nonvolatile memories in silicon, a charge is injected onto a floating gate by tunneling through a thin tunnel oxide or by avalanche injection over the potential barrier of a thicker oxide (Kahng and Sze, 1967; Frohman-Bentchkowsky, 1974; Nishi and Iizuka, 1981). Storage times are measured in years, but the writing operation is quite slow, typically on the order of 100 µsec per bit, and writing requires pulse voltages much larger than the normal 5 V supply voltage. Therefore, operation is effectively restricted to read-only memory (ROM) applications, and the devices are variously referred to as programmable ROMs (PROMs), electrically alterable ROMs (EA-ROMs or E-PROMs), or "flash" E-PROMs. The bipolar DRAM, on the other hand, achieves nonvolatile storage from its inherently long charge recovery time and does not depend on tunneling or avalanche injection across an oxide barrier. As a result, electrical writing is expected to be very fast, typically on the order of nanoseconds or less per bit, and operation as a high-speed read-write memory is not compromised. Silicon E-PROMs also suffer from a wearout mechanism that limits the number of times the memory can be reprogrammed to around 10⁶-10⁷ times. Although this is not a limitation when the memory is used as a ROM, it does preclude use of these devices in RAM applications, where 10⁶ writing operations can occur in a few seconds of real time. The bipolar DRAM suffers from no comparable wearout mechanism.
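The scale of the wearout limitation is easy to see with illustrative numbers (the cycle times assumed here are hypothetical): a RAM cell written continuously at even a modest 1 µsec per cycle accumulates 10⁶ write operations in about one second, so an endurance of 10⁶-10⁷ cycles would be exhausted almost immediately; by contrast, at 100 µsec per bit a conventional E-PROM needs on the order of 100 sec simply to program 10⁶ bits serially.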
As stated earlier, storage times on the order of months to years may be possible using the basic bipolar DRAM structure of Section V, provided a suitable semiconductor material is utilized. One particularly attractive material for such applications is silicon carbide (SiC) (Davis et al., 1988). This is because SiC has a band-gap almost three times as large as silicon and twice as large as GaAs. SiC occurs in both cubic and hexagonal forms. Cubic ("beta") SiC has a lattice constant of 4.359 Å and a band-gap of 2.3 eV. Hexagonal ("alpha") SiC occurs in a variety of polytypes, each having a hexagonal basal plane but with different stacking sequences in the vertical direction. The most useful polytype is "6H," with a band-gap of 2.9 eV. Both beta and 6H material have been used to fabricate electronic devices (Kong et al., 1987; Palmour et al., 1991), including MESFETs, JFETs, MOSFETs, and bipolar transistors. 6H-SiC wafers are available commercially from Cree Research, Inc., Durham, NC, and are now grown at a number of research laboratories in the United States, Europe, Japan, and the states constituting the former Soviet Union. SiC is physically robust, being one of the hardest and most chemically inert materials known to humankind. It is thermally stable, and electronic devices retain good electrical characteristics to very high temperatures. MOSFET operation has been demonstrated (Palmour, Kong, and Davis, 1987) to 650°C, and packaged devices operate reliably for extended periods at temperatures up to 350°C, limited by the thermal stability of the package. Therefore, a major application for SiC is high-temperature and high-power electronic devices. For further information on SiC crystal growth, fabrication techniques, and electrical properties, the reader is referred to the literature. In order to evaluate SiC for use in long-term dynamic memory applications, we have investigated the storage time of npn storage capacitors in 6H-SiC (Gardner et al., 1991). A cross section of the experimental device is shown in Fig. 61. All layers are doped in situ during epitaxy. Circular npn diodes are isolated by reactive ion etching and passivated with SiO₂ formed by wet thermal oxidation. Ohmic contacts are annealed Ni. Capacitance recovery times are measured on various devices using the techniques described in Section II. Referring to equations (33) and (34) of Section II, we are reminded that the recovery time of a pn junction storage capacitor is expected to increase exponentially with band-gap energy as exp(E_G/2kT), assuming that generation lifetimes in the materials are comparable. Since the band-gap energy of 6H-SiC is 1.48 eV greater than the band-gap of GaAs, one expects the storage time at room temperature to be larger by a factor of exp(28.6), or approximately 12 orders of magnitude. Alternatively, we expect to observe the same storage time as GaAs at an absolute temperature that is higher by the ratio of the band-gaps, or approximately a factor of 2.
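Numerically, with kT ≈ 0.0259 eV at 300 K, the quoted band-gap difference gives ΔE_G/2kT = 1.48/(2 × 0.0259) ≈ 28.6, and exp(28.6) ≈ 3 × 10¹², which is the roughly 12 orders of magnitude cited above. The factor-of-two temperature equivalence follows directly, since exp(E_G/2kT) is unchanged when E_G and T are scaled by the same factor.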
FIGURE 61. Prototype n-i-p-i-n storage capacitor fabricated in 6H-SiC for storage time measurements. The etched surfaces are passivated by thermal SiO₂ grown by wet oxidation.
This means that SiC devices should exhibit storage times at 600 K (300°C) that are comparable to those of GaAs at room temperature. Figure 62 shows storage time versus temperature for several SiC diodes (Cooper et al., 1991). As predicted, storage times on the order of 10 min are observed at 300°C. All diodes have an activation energy around 1.48 eV, close to half the band-gap. If we extrapolate these data to room temperature, we would predict a storage time of about 10¹³ sec, or about 300,000 years!
FIGURE 62. Recovery time versus temperature for storage capacitors fabricated in GaAs and in 6H-SiC. The activation energy of the SiC capacitors is about 1.48 eV, close to half the band-gap. Data on the SiC samples had to be taken above 300°C because at lower temperatures the storage times are too long to be conveniently measured.
Of course, such an extrapolation is not realistic - other mechanisms having lower activation energies (or not thermally activated at all) will become dominant before room temperature is reached. The point, however, is that thermal generation is not likely to limit storage times in these devices. How long will room temperature storage times be? That question is difficult to answer with any confidence, since it will require actual room temperature measurements over extended periods of time. Such measurements have not yet been attempted. To summarize, we have shown the possibility of one-transistor memories having exceedingly long storage times at room temperature. For practical purposes, such memories can be regarded as static. In addition, since the bipolar cell requires no external bias to retain data, these memories are also nonvolatile. Because they can be written in nanoseconds, these devices can be used in high-speed RAM applications.

E. Nonconventional Applications
To conclude this chapter, we wish to suggest some nonconventional applications for these compact storage devices. Dynamic memory is usually associated with digital electronics. However, the applications we will describe in this section use the fact that the dynamic memory is inherently an analog device. The basic elements of the dynamic memory cell, storage capacitors and access transistors, are utilized in a number of analog applications such as switched capacitor filters. In addition, the basic capacitor structure, which has been optimized for exceedingly low dark current, is ideal for use in low-light-level imagers. Perhaps the most novel application, one that has arisen only in the last few years, is in the emerging field of electronic neural networks. We will describe this particular application next. Artificial neural networks are electronic circuits that embody many of the features of biological neural systems (Hopfield, 1988; Lippmann, 1987). In particular, they are highly interconnected networks composed of a large number of very simple identical elements. The basic elements of neural networks are neurons and synapses. In electronic neural networks, neurons are simple amplifiers having saturating (or sigmoidlike) input-output characteristics. The input to each neuron is a summing node connected to a large number of surrounding neurons by resistive connections called synapses. An example of a simple electronic neural network is shown in Fig. 63. Here the network consists of a row of amplifiers (neurons) whose outputs are brought around to a matrix of resistive connections (synapses). The synapse matrix forms the interconnections between neurons. One simple way to visualize the operation of a neural network is to define an energy function for the network. This energy function depends on the output voltages of the neurons in the array. If there are n neurons, the energy function is defined on an n-dimensional space.
FIGURE 63. Illustration of a simple Hopfield neural network. The network consists of a row of amplifiers (neurons) whose outputs are brought around to a matrix of resistive connections (synapses). The synapse matrix forms the interconnections between neurons. Programming is accomplished by specifying the strengths of the resistive connections in the synapse matrix.
The exact dependence of the network energy on the individual neuron voltages is determined by the resistive weights of the synapse elements in the interconnection matrix. Thus, the information content of the network, in effect its programming, is contained in the resistive weights of the synapse connections. One use of such a neural network is in pattern recognition. Suppose that an input pattern (which could be an optical image, a fragment of speech, a radar signature, etc.) is to be classified based on a number of predetermined "example" patterns. To be specific, let us assume that the network is to identify a two-dimensional image as belonging to one of 26 possible classes: the letters of the alphabet. The weights in the synapse matrix of the neural network are selected so that the energy function of the network exhibits local minima. Each local minimum corresponds to one of the "example" patterns that are acceptable answers to the identification problem. The inputs to the neurons are then preset to potentials that correspond to the input image to be identified. This, in effect, places the network in an initial condition at some point in the n-dimensional parameter space corresponding to the input pattern. The network is then allowed to settle into the local energy minimum that is "nearest" the starting point. This minimum corresponds to one of the 26 possible "answers" - the 26 letters of the alphabet. In this way, the network performs a natural minimization function, determining which "example" pattern is the minimum distance from the input pattern.
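For reference, the energy function of a Hopfield network of this kind is conventionally written, in the high-gain limit, as E = -(1/2) Σ_i Σ_j T_ij V_i V_j - Σ_i I_i V_i, where V_i is the output voltage of neuron i, T_ij is the conductance (synapse weight) connecting neurons i and j, and I_i is an external input current. This standard form is quoted here from the neural-network literature (see, e.g., Hopfield, 1988) rather than from the chapter itself; with symmetric weights, the stable states of the network are local minima of E, which is exactly the picture used in the pattern-recognition example above.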
The question of how to specify the weights in the synapse matrix so that the local minima in the energy function are efficiently placed is a subject of current research. However, for our purposes it suffices to understand that the network is "programmed" by specifying these synapse weights. To perform a meaningful pattern recognition task, the electronic neural network must be of sufficient size. In particular, the number of local minima in the energy function grows approximately as the square root of the number of neurons. Unfortunately, the number of synapse connections grows as the square of the number of neurons. It is easy to see that simply identifying the letters of the alphabet would require perhaps a few thousand neurons and a few million synapses. How does all this relate to dynamic memories? In Section C, we described a combination bipolar-field-effect (BiFET) cell capable of nondestructive readout of stored information. We now suggest that this BiFET cell can be used to implement a very compact synapse connection for electronic neural networks. As discussed earlier, the potential on the floating collector in this device serves to back-gate the channel of the lateral JFET formed by the base layer. Thus, the conductance of this lateral JFET can be programmed to an analog value by storing the proper voltage on the floating collector. The JFET is then used to provide the desired resistive connection required of the synapse element. An illustration of the use of the BiFET cell in a synapse array is shown in Fig. 64. We assume that the weighting information is stored off-line in a conventional semiconductor memory. This weighting information is then loaded into the BiFET array using the data lines shown. Once initialized, the BiFET array is ready to provide weighted synapse connections for the neural network. Since it is important that the analog voltage in each cell does not change appreciably, the array will need to be refreshed in a time that is much shorter than the 1/e storage times previously quoted for our DRAM cells. However, considering the long room-temperature storage times of GaAs bipolar cells (not to mention the storage times of SiC cells), such periodic refreshing should not be a significant problem, particularly since the normal settling time of the neural network is on the order of a few microseconds. The use of dynamic storage for the synapse weights allows the network to be reprogrammed on-the-fly to solve different classification problems, in effect becoming a general-purpose neural computer. In conclusion, we have pointed out a number of nonconventional applications for dynamic storage devices. The ultra-long storage times now being realized make possible a number of novel applications, as diverse as nonvolatile digital memory and high-density neural networks. The evolution of nonsilicon dynamic memory devices has been rapid, and a number of exciting directions are opening for future development.
FIGURE 64. Realization of a Hopfield network using the BiFET storage cell of Fig. 60 as a programmable analog synapse element. Synapse weights are entered using the horizontal “data” lines.
REFERENCES

Auret, F. D., Myburg, G., Bredell, L. J., Barnard, W. O., and Kunert, H. W. (1991). 16th Int'l Conf. on Defects in Semiconductors, Bethlehem, PA.
Batey, J., and Wright, S. L. (1986). J. Appl. Phys. 59, 200.
Bedair, S. M., Tischler, M. A., Katsuyama, T., and El-Masry, N. A. (1985). Appl. Phys. Lett. 47, 51.
Beltram, F., Capasso, F., Walker, J. F., and Malik, R. J. (1988). Appl. Phys. Lett. 53, 376.
Capasso, F., Beltram, F., Malik, R. J., and Walker, J. F. (1988). IEEE Electron Device Lett. EDL-9, 377.
Carpenter, M. S., Melloch, M. R., and Dungan, T. E. (1988). Appl. Phys. Lett. 53, 66.
Chen, C. L., Goodhue, W. D., and Mahoney, L. J. (1991). Electronics Lett. 27, 1330.
Conger, J., Peczalski, A., and Shur, M. S. (1988). IEEE Electron Device Lett. EDL-9, 128.
Cooper, J. A., Jr. (1989). Unpublished.
Cooper, J. A., Jr., Qian, Q.-D., and Melloch, M. R. (1986). IEEE Electron Device Lett. EDL-7, 374.
Cooper, J. A., Jr., Palmour, J. W., Gardner, C. T., Melloch, M. R., and Carter, C. H., Jr. (1991). 1991 Int'l. Semiconductor Device Res. Symp., Charlottesville, VA.
Davis, R. F., Sitar, Z., Williams, B. E., Kong, H. S., Kim, H. J., Palmour, J. W., Edmond, J. A., Ryu, J., Glass, J. T., and Carter, C. H., Jr. (1988). Mat'l. Sci. and Engr. B1, 77.
Dungan, T. E. (1989). Ph.D. dissertation, Purdue University, West Lafayette, IN (available as Tech. Rept. TR-EE 89-47).
Dungan, T. E., Neudeck, P. G., Melloch, M. R., and Cooper, J. A., Jr. (1990). IEEE Trans. Electron Devices ED-37, 1599.
Fiedler, A., Chun, J., and Kang, D. (1988). IEEE GaAs IC Symposium Tech. Dig., 67.
Frenkel, J. (1938). Phys. Rev. 54, 647.
Frohman-Bentchkowsky, D. (1974). Solid-State Electron. 17, 517.
Gardner, C. T., Cooper, J. A., Jr., Melloch, M. R., Palmour, J. W., and Carter, C. H., Jr. (1991). 4th Int'l Conf. on Amorphous and Crystalline Silicon Carbide and Other IV-IV Materials, Santa Clara, CA.
Hetherington, D. L., Klem, J. F., and Weaver, H. T. (1991). IEEE Electron Device Lett. EDL-13, 146.
Hopfield, J. J. (1988). IEEE Circuits and Devices Magazine 4, 3.
Kahng, D., and Sze, S. M. (1967). Bell Syst. Tech. J. 46, 1283.
Kleine, J. S., Melloch, M. R., and Cooper, J. A., Jr. (1989a). Appl. Phys. Lett. 55, 1656.
Kleine, J. S., Qian, Q.-D., Cooper, J. A., Jr., and Melloch, M. R. (1989b). IEEE Trans. Electron Devices ED-36, 289.
Kleine, J. S., Cooper, J. A., Jr., and Melloch, M. R. (1991). Appl. Phys. Lett. 61, 834.
Kleinhenz, R., Mooney, P. M., Schneider, C. P., and Paz, O. (1984). In "13th Int'l. Conf. on Defects in Semiconductors" (L. C. Kimmerling and J. Parsey, eds.), p. 627, Coronado, CA.
Kong, H. S., Palmour, J. W., Glass, J. T., and Davis, R. F. (1987). Appl. Phys. Lett. 51, 442.
Lippmann, R. J. (1987). IEEE ASSP Magazine, 4.
Lott, J. A., Dawson, L. R., Weaver, H. T., Zipperian, T. E., and Caldwell, R. B. (1989a). Appl. Phys. Lett. 55, 1110.
Lott, J. A., Klem, J. F., and Weaver, H. T. (1989b). Appl. Phys. Lett. 55, 1226.
Lott, J. A., Klem, J. F., Weaver, H. T., Tigges, C. P., and Radoslovich-Cibicki, V. (1990). Electronics Lett. 26, 972.
Lu, N. C. C. (1989). IEEE Circuits and Devices Magazine 5, 21.
Makino, H., Matsue, S., Noda, M., Tanino, N., Takano, S., Nishitani, K., and Kayano, S. (1988). IEEE GaAs IC Symposium Tech. Digest, 71.
Makino, H., Matsue, S., Noda, M., Tanino, N., Takano, S., Nishitani, K., and Kayano, S. (1990). IEEE Journal of Solid-State Circuits 25, 1232.
Matsue, S., Makino, H., Noda, M., Tanino, N., Takano, S., Nishitani, K., and Kayano, S. (1989). IEEE GaAs IC Symp. Tech. Dig.
Nakano, H., Noda, M., Sakai, M., Matsue, S., Oku, T., Sumitani, K., Makino, H., Takano, H., and Nishitani, K. (1990). IEEE GaAs IC Symp. Tech. Dig.
Nel, M., and Auret, F. D. (1988). J. Appl. Phys. 64, 2422.
Neudeck, P. G. (1991). Ph.D. dissertation, Purdue University, West Lafayette, IN (available as Tech. Rept. TR-EE 91-21).
Neudeck, P. G., Dungan, T. E., Melloch, M. R., and Cooper, J. A., Jr. (1989). IEEE Electron Device Lett. EDL-10, 477.
Neudeck, P. G., Carpenter, M. S., Cooper, J. A., Jr., and Melloch, M. R. (1991a). IEEE Electron Device Lett. EDL-10, 553.
Neudeck, P. G., Kleine, J. S., Sheppard, S. T., McDermott, B. T., Bedair, S. M., Cooper, J. A., Jr., and Melloch, M. R. (1991b). Appl. Phys. Lett. 58, 83.
Nishi, Y., and Iizuka, H. (1981). In "Applied Solid State Science, Suppl. 2A" (D. Kahng, ed.), Academic Press, New York.
Palmour, J. W., Kong, H. S., and Davis, R. F. (1987). Appl. Phys. Lett. 51, 2028.
Palmour, J. W., Kong, H. S., Waltz, D. G., Edmond, J. A., and Carter, C. H., Jr. (1991). In "Trans. of First Int'l. High Temperature Electronics Conf." (D. B. King and F. V. Thome, eds.), U.S. Government Printing Office, Washington, DC.
Priddy, K. L., Kitchen, D. R., Grzyb, J. A., Litton, C. W., Henderson, T. S., Peng, C.-K., Kopp, W. F., and Morkoc, H. (1987). IEEE Trans. Electron Devices ED-34, 175.
Qian, Q.-D., Melloch, M. R., and Cooper, J. A., Jr. (1986). IEEE Electron Device Lett. EDL-7, 607.
Qian, Q.-D., Melloch, M. R., and Cooper, J. A., Jr. (1989). J. Appl. Phys. 65, 3118.
Quinn, P. M., Early, J. M., Sander, W. B., and Longo, T. A. (1978). IEEE Int'l. Solid-St. Circuits Conf.
Rooks, M. J., Eugster, C. C., del Alamo, J. A., Snider, G. L., and Hu, E. L. (1991). Int'l. Symp. on Electron, Ion, and Photon Beams, Seattle, WA.
Sander, W. B., and Early, J. M. (1976). IEEE Int'l. Solid-St. Circuits Conf.
Schroder, D. K. (1987). "Advanced MOS Devices." Addison-Wesley, Reading, MA.
Sheppard, S. T. (1991). MS dissertation, Purdue University, West Lafayette, IN.
Shockley, W., and Read, W. T., Jr. (1952). Phys. Rev. 87, 835.
Stellwag, T. B., Cooper, J. A., Jr., and Melloch, M. R. (1991a). IEEE Device Research Conf., Boulder, CO.
Stellwag, T. B., Melloch, M. R., and Cooper, J. A., Jr. (1991b). Unpublished.
Stellwag, T. B., Melloch, M. R., Cooper, J. A., Jr., Sheppard, S. T., and Nolte, D. D. (1992a). J. Appl. Phys. 71, 4509.
Stellwag, T. B., Cooper, J. A., Jr., and Melloch, M. R. (1992b). IEEE Electron Device Lett. EDL-13, 129.
Solomon, P. M., Wright, S. L., and Lanza, C. (1986). Superlattices and Microstructures 2, 521.
Sunami, H. (1985). IEEE Int'l. Electron Dev. Mtg. Tech. Dig., 694.
Sunouchi, K., Horiguchi, F., Nitayama, A., Hieda, K., Takato, H., Okabe, N., Yamada, T., Ozaki, T., Hashimoto, K., Takedai, S., Yagishita, A., Kumagae, A., Takahashi, Y., and Masuoka, F. (1990). IEEE Int'l. Electron Dev. Mtg. Tech. Dig., 647.
Terrell, W. C., Ho, C. L., and Hinds, R. (1988). IEEE GaAs IC Symp. Tech. Dig., 79.
Thurmond, C. D. (1975). J. Electrochem. Soc. 122, 1133.
Vogelsang, C. H., Castro, J. A., Notthoff, J. K., Troeger, G. L., Stephens, J. S., and Krein, R. B. (1988). IEEE GaAs IC Symp. Tech. Dig., 75.
Whitmire, D. A., Garcia, V., and Evans, S. (1988). IEEE Int'l. Solid-St. Circuits Conf.
Yamasaki, K., Kato, N., and Hirayama, M. (1985). IEEE Trans. Electron Devices ED-32, 2420.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 86
Expert Systems for Image Processing, Analysis, and Recognition: Declarative Knowledge Representation for Computer Vision

TAKASHI MATSUYAMA
Department of Information Technology, Faculty of Engineering, Okayama University, Okayama, Japan
I. Introduction
   A. Declarative Knowledge Representation
   B. Knowledge Representation in Computer Vision
   C. Organization of the Chapter
II. Expert Systems for Image Processing and Analysis (ESIPAs)
   A. Problems in Image Processing and Analysis
   B. Characteristics of ESIPAs
   C. General Architecture of ESIPAs
   D. Consultation System for Image Processing
   E. Knowledge-Based Program Composition System
   F. Rule-Based Design System for Bottom-up Image Segmentation Algorithms
   G. Goal-Directed Top-down Image Segmentation System
III. Representing Knowledge about Image Analysis Strategies
   A. Heterogeneous Combination of Image Processing Operators
   B. Cooperative Integration of Multiple Stereo Vision Algorithms
IV. Representing Spatial Relations and Spatial Reasoning for Image Understanding
   A. Knowledge Representation in Logic Based on Topological Relations
   B. Structural Representation of Geometric Relations
   C. Algebraic Representation of Geometric Information and Geometric Theorem Proving
   D. Reasoning Based on PART-OF Relations
V. Concluding Remarks
References
I. INTRODUCTION

A. Declarative Knowledge Representation
Expert systems are computer software systems capable of intelligent problem solving, which usually requires sophisticated knowledge of human experts. In the 1980s expert systems in various task domains were developed and their practical utilities have been widely recognized: medical diagnosis, chemical
substance analysis, geological investigation, computer system configuration, and so on. Generally speaking, any meaningful computer software or program can be considered as an embodiment of human knowledge; it is designed and implemented using knowledge in a specific task domain. In other words, without knowledge no useful software or program can be realized. For example, the following function in C programming language is a realization of the mathematical knowledge about absolute values of integer numbers:
absolute (n)
int n;
{
    if (n >= 0) return (n);
    else return (-n);
}
The most distinguishing characteristic of expert systems is the use of declarative knowledge. That is, knowledge in expert systems is explicitly represented as symbolic data, independent of the control of problem solving, while that in ordinary programs is implicitly encoded in and spread over various types of program statements. The knowledge represented by procedures like the preceding absolute function is called procedural knowledge. A FORTRAN or C program for the fast Fourier transform (FFT) is a typical example of procedural knowledge. Although the program embodies the knowledge about FFT (i.e., when executed, it transforms input data correctly based on the knowledge), the knowledge is implicitly encoded in the sequence of primitive operations like assignment and IF-THEN-ELSE control statements. Thus it is almost impossible to identify (and separate) the knowledge itself in (from) the program code. On the other hand, formulae in the first-order predicate calculus like

∀x[MAN(x) → MORTAL(x)]

[For all x, MAN(x) implies MORTAL(x). MAN and MORTAL are predicates and x a variable.] and so-called production rules like

IF (condition for activation) THEN (action)

(If the condition is satisfied, execute the action.) are typical examples of declarative knowledge, where knowledge is explicitly described in terms of symbols in a predefined vocabulary.
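For comparison, the knowledge embodied procedurally in the absolute function above could also be stated declaratively, for example as the first-order formulae (an illustration added here, not part of the original text)

∀n [n ≥ 0 → absolute(n) = n]  and  ∀n [n < 0 → absolute(n) = -n],

which say what the absolute value is without prescribing how, or in what order, it is to be computed.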
Note that declarative knowledge does not include any control information about how and in which order it is used. Thus we can freely design the control structure of reasoning and problem solving independent of the knowledge. Moreover, flexible modification and augmentation of knowledge can be easily realized by symbol manipulations, since the embodiment of knowledge is just a collection of well-formed symbolic data. The clear separation of knowledge from control leads to the natural decomposition of an expert system into knowledge base and reasoning engine.¹ The knowledge base is a collection of symbolic data representing knowledge and the reasoning engine is a control program that conducts reasoning based on the knowledge base. Merits and demerits of the declarative knowledge representation can be summarized as follows:

Merit 1: High Modularity and Easy Modification. The declarative knowledge representation enables highly modular description of knowledge; knowledge is represented by a set of mutually independent descriptions (e.g., rules and axioms). Thus knowledge can be added, deleted, and modified easily by editing related portions of symbolic data in the knowledge base, while even a small change in procedural knowledge can lead to extensive modification of an entire program.

Merit 2: High Transportability and Knowledge Sharing. Once knowledge in a specific task domain is described declaratively, it can be used in or transported to other expert systems even if it is not originally intended to be used in such systems. In other words, declarative knowledge can be shared by different expert systems.

Merit 3: High-Level Reasoning. New knowledge can be automatically generated by reasoning and learning mechanisms since the addition of knowledge can be easily realized by inserting new symbolic descriptions into the knowledge base. Moreover, meta knowledge about how to use knowledge in a task domain can also be described declaratively to realize flexible control of reasoning processes.

Merit 4: Introspective Explanation. Since the reasoning engine can memorize which knowledge is used in a specific reasoning process, it can present to a user an introspective explanation of the executed reasoning process.

Demerit 1: Limited Descriptive Vocabulary. Knowledge used in expert systems is often called shallow knowledge; declarative knowledge used in them is described in terms of naive qualitative vocabularies. Such naive descriptions limit the level of reasoning and analysis.

Demerit 2: Low Efficiency. Declarative knowledge should be interpreted to be
¹ Although the term inference engine is more popular than reasoning engine, we reserve inference for that in the formal mathematical logic and use reasoning in a general context.
used in a reasoning process, so that many complicated symbolic processing operations (e.g., symbolic pattern matching) are inevitable. This lowers execution speed considerably. In short, it is the essence of the declarative knowledge representation that knowledge itself is described as independent objective data to be processed. With the declarative knowledge representation, we can gain much flexibility and versatility despite low efficiency. The problem described in Demerit 1 is a crucial point in developing an expert system in a specific task domain. That is, how well we can symbolically describe the knowledge in that task domain determines the performance and utility of the developed expert system. On the other hand, Demerit 2 can be resolved by compiling the knowledge base. That is, although during the development we should use the declarative knowledge representation for its modifiability, we can encode the knowledge into procedures for efficiency once the entire set of knowledge required for problem solving is fixed.

B. Knowledge Representation in Computer Vision
In general, computer vision² refers to visual information processing by computer. It includes image processing, analysis, and recognition. Image processing denotes transformation of an input image to another. Filtering operations like smoothing and edge enhancement are its typical examples. On the other hand, image analysis refers to image segmentation to extract meaningful image features such as lines and regions.³ In image processing, analysis, and recognition, and in modern computer vision for three-dimensional object recognition, most of the knowledge has been described procedurally. Reasons for this are as follows:

1. Digital filtering is a fundamental scientific discipline of image processing, and photometry and geometry in the three-dimensional world provide major knowledge sources for three-dimensional computer vision. All these scientific disciplines have been established based on quantitative analysis of physical phenomena, where mathematical formulae such as differential equations are used as general representational schemes. Thus it is natural to use numerical computation procedures as the embodiment of such physical knowledge.
’
Computer vision sometimes refers specifically to three-dimensional information analysis and object recognition from two-dimensional images. When necessary, we will refer to it explicitly as three-dimensional computer vision or computer vision for three-dimensional object recognition. In this chapter we will sometimes use image processing in a general sense, including both filtering and feature extraction.
2. For human beings, the identification of various visual properties - i.e., brightness, color, size, and shape - is so natural and immediate that no conscious thought or logical reasoning is required for their characterization. Consequently, only limited descriptive vocabularies for such visual properties have been developed and few efforts for their declarative description have been made until quite recently.

3. In ordinary statistical pattern recognition, patterns and objects to be analyzed are represented by n-dimensional vectors. This also leads to the development of numerical computation procedures in realizing the knowledge about various recognition methods.

4. Since the size of image data is large and its processing should often be done quickly in practical applications, the speed of execution has been an important factor in designing computer systems for image processing, analysis, and recognition. Thus the efficiency of the procedural knowledge representation has been preferred to the flexibility of the declarative knowledge representation.

Several years ago, Rosenfeld (1986) proposed the term expert vision systems to denote computer vision systems using declarative knowledge and had open discussions with several computer vision researchers on possibilities and utilities of the declarative knowledge representation for computer vision. Unfortunately, the discussions were not developed extensively, nor did they produce any creative concrete ideas at that time. Since then, however, many research efforts have been made to prove the feasibilities of expert vision systems. Especially in Japan, we organized a special interest group on expert vision and have developed various types of expert systems for image processing and analysis. On the other hand, several formal approaches to the declarative knowledge representation for computer vision were proposed to clarify the knowledge and reasoning mechanism required for image recognition and understanding. Moreover, a major focus of recent multimedia communication and database research is on the integration of textual, verbal, and visual information; and many methods for declarative characterization of visual information have been proposed for multimedia information processing. From an engineering point of view, since the speed and memory space of computers increased dramatically in the past few years and this increase is expected to continue, system designers for practical applications gradually shifted their attention from efficiency to the flexibility and versatility of implemented systems. Therefore we believe that by exploring the declarative knowledge representation for computer vision new scientific findings and technological
successes will be obtained and many new-featured vision systems can be developed.
C. Organization of the Chapter

In this chapter we will describe various approaches to the declarative representation of visual information and knowledge for image processing, analysis, and recognition. We will not survey or discuss those for multimedia information processing. First, Section II gives an extensive survey of expert systems for image processing and analysis. They use the knowledge about image processing techniques to compose complex image analysis processes from primitive image processing operators. We classify the expert systems into the following four categories and discuss their objectives, knowledge representation, and reasoning methods: 1. Consultation system for image processing, 2. Knowledge-based program composition system, 3. Rule-based design system for bottom-up image segmentation algorithms, 4. Goal-directed top-down image segmentation system. In order to realize flexible and reliable image analysis, ordinary sequential reasoning mechanisms such as backward-forward chaining in expert systems and linear resolution in PROLOG are not enough. In Section III we discuss the following sophisticated strategies (i.e., control mechanisms) for image analysis with practical examples: 1. Heterogeneous combinations of image processing operators. 2. Cooperative integration of multiple stereo vision algorithms. The success of the declarative knowledge representation for computer vision rests wholly on how well we can symbolically describe various types of visual properties. In Section IV we first discuss the declarative representation of spatial relations for image recognition and understanding. During the discussion, we introduce two formal declarative knowledge representation methods for computer vision: 1. Knowledge representation in terms of the first-order predicate calculus, 2. Algebraic knowledge representation and geometric theorem proving.
Then we describe expert vision systems for recognizing complex objects with internal structures; that is, reasoning based on PART-OF relations.
In Section V, we conclude the chapter by discussing future problems of the declarative knowledge representation for computer vision.
II. EXPERT SYSTEMS FOR IMAGE PROCESSING AND ANALYSIS (ESIPAs)
A. Problems in Image Processing and Analysis
A variety of image processing algorithms have been devised in the history of digital image processing. Although they do not work perfectly for complex natural images, their utility has been proven in various application areas such as remote sensing, medical engineering, and office and factory automation. In order to facilitate a wider use of digital image processing techniques, various software packages have been developed: FORTRAN subroutine libraries (Tamura et al., 1983) and command libraries in image processing systems. As is well known, however, it is not so easy to make full use of such libraries; various forms of knowledge and know-how about image processing techniques are required to realize effective image analysis. In other words, the requirement for such knowledge limits the utility of the libraries and increases the cost of developing image processing systems for various applications. From a knowledge representation viewpoint, each subroutine or command in a library is a procedural embodiment of the knowledge about a primitive image processing operator, and the library is a simple collection of such procedural knowledge. Although the procedural knowledge representation for primitive image processing operators is natural and effective, no systematic description of syntactic and semantic relations among subroutines and commands is supported in ordinary libraries. This makes it difficult for their users, especially those with little knowledge of and experience in image processing, to compose effective image analysis processes by selecting and combining primitive operators in the libraries. (By an image analysis process we mean an executable program composed of various combinations of primitive operators; types of combinations will be discussed in detail in Section III. An earlier version of Sections II and III.A was published in Matsuyama (1989).) Note that even to read the manuals of the libraries requires knowledge about image processing, for example, the meanings of specialized terminology in image processing. Thus what we mean by knowledge and know-how about image processing techniques includes syntactic and semantic relations among subroutines and commands in image processing operator libraries. To investigate the knowledge about image processing techniques further, first we should examine popular problems encountered in designing image
analysis processes by using an image processing operator library. The following are typical problems in developing image analysis processes: 1. Assessment of Image Quality. To assess the quality of an image is the first step in image analysis. Although one should design an image analysis process based on the assessment, how to measure and describe the image quality is a difficult problem. In the analysis of complex natural images, moreover, we encounter a more difficult problem; since the image quality often changes depending on its location in an image, we need to analyze the image and determine its structure to make the image quality assessment. This leads to a chicken-and-egg problem between image quality assessment and image structure analysis. When physical models of objects and imaging devices are available, we can assess the image quality based on such models. 2. Selection of Appropriate Operators. There are many different operators (algorithms) for a specific image processing task. For edge detection, for example, several tens or more of operators have been developed. They are designed based on different image models and computation schemes, so that one has to select an appropriate operator considering the image quality, the purpose of image analysis, and characteristics of the operators. 3. Determination of Optimal Parameters. Many operators have adjustable parameters and their performance is heavily dependent on the values of the parameters (e.g., the threshold in binarization). How to determine optimal parameter values is another difficult problem. 4. Combination of Primitive Operators. It is often necessary to combine many primitive operators to perform a meaningful task. For example, a popular way of extracting regions from an image is to apply smoothing → edge detection → edge linking → closed boundary detection. To attain an effective combination of operators, the knowledge about syntactic and semantic relations between operators is required. 5. Trial-and-Error Experiments. Usually it is very hard to estimate a priori the performance of an operator for a given image, so that one has to repeat trial-and-error experiments by modifying parameters (and sometimes operators). To control such trial-and-error analysis, sophisticated knowledge about and rich experience in image processing are required. 6. Evaluation of Analysis Result. The process of evaluation is very important in realizing flexible image analysis. For example, the feedback analysis (Nagao, 1984) evaluates the difference between a processing result and the ideal output (i.e., the model of an object) and adjusts parameters for the analysis. How to evaluate the analysis result
and how to adjust parameters based on the evaluation is an important problem in designing image analysis processes with feedback loops. Vogt (1986) pointed out 10 major problems in developing image analysis programs, which are almost the same as the preceding, and analyzed their causes extensively. Among others he claimed that many intuitive, ad hoc, nonsystematic, and sometimes ambiguous factors are involved in the image analysis program development and that well-defined formal systems like mathematical morphology are required to facilitate the development. From the knowledge representation viewpoint discussed in Section I, his claim can be put as follows. When we develop a complex image analysis process, we usually write a main program that calls many primitive operators (subroutines) in a library. Such main program is really a procedural embodiment of the knowledge to solve the problems listed earlier. However, since this procedural knowledge is not systematically designed, no one, not even the programmer who wrote the main program, can understand clearly what knowledge is incorporated and how some portions of the program are related to others. Even worse, meaningless and incorrect knowledge can sneak into the program because there is no way of verifying the correctness of the procedural knowledge except by reading the program intensively. Due to these problems, it is very hard to improve an image analysis (main) program and modify it for other applications. This increases the time and cost of the image analysis program development. In other words, the procedural knowledge representation is efficient once correctly implemented, but its development and verification processes are far from efficient and usually require much time, cost, and moreover human expertise because no mechanical development and proof system is available. In short, while the procedural knowledge representation is useful for primitive image processing operators, its use in developing complex image analysis processes causes many problems. Therefore, we should study the declarative knowledge representation to examine its practical utilities in the development of complex image analysis processes.
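The flavor of such procedurally buried knowledge can be conveyed by a small sketch. The fragment below is purely illustrative (the operator names, parameter values, and the composition smoothing → edge detection → edge linking → closed boundary detection are hypothetical stand-ins, not code from any library discussed in this chapter); its point is that the choices it encodes are invisible to anyone who does not read the program itself.

    # A minimal illustrative sketch (hypothetical operators and parameters).
    # The choice of operators, their order, and the constants below embody the
    # programmer's expertise, but nothing in the program records why they were
    # chosen or when they should be changed.

    def smooth(image, window=3):
        # stand-in for a smoothing operator
        return image

    def detect_edges(image, threshold=40):
        # stand-in for an edge detector with a hard-coded sensitivity
        return image

    def link_edges(edges, max_gap=2):
        # stand-in for an edge-linking step
        return edges

    def close_boundaries(edges):
        # stand-in for closed-boundary detection
        return edges

    def extract_regions(image):
        smoothed = smooth(image, window=3)
        edges = detect_edges(smoothed, threshold=40)
        linked = link_edges(edges, max_gap=2)
        return close_boundaries(linked)

Verifying or modifying such a program requires reading it line by line, which is exactly the difficulty discussed above.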
B. Characteristics of ESIPAs

Recently, several expert systems for image processing and analysis (ESIPAs, for short) have been developed to facilitate the development of image analysis processes. They incorporate declarative knowledge representation and symbolic reasoning methods from artificial intelligence to solve the problems discussed in the previous section. The knowledge used by these systems is about how to effectively use and combine primitive image processing operators for image analysis. That is, the expertise stored in the systems is
what we, computer vision researchers, have acquired and accumulated through the development of image analysis techniques. Before proceeding to technical discussions of ESIPAs we will discuss general characteristics of ESIPAs by comparing them to ordinary expert systems and image understanding systems. Expert systems have been developed for various tasks: signal interpretation, medical diagnosis, circuit design and trouble shooting, plant control, and so on (Hayes-Roth, Waterman, and Lenat, 1983). In general, the task of ESIPAs is to compose effective image analysis processes based on primitive image processing operators. In this sense, ESIPAs can be considered as expert design systems. The most successful system of this type would be R1 /XCON (McDermott, 1980b), which configures computer systems suitable for customer’s requirements by combining available functional components. The critical differences between ESIPAs and such expert design systems are 1. Although we can symbolically describe characteristics and behaviors of electronic circuits and computer hardware in terms of logical and mathematical expressions, it is very hard to describe the visual information included in an image and specify characteristics of image processing operators. 2. Since ESIPAs are given the input information in the form of raw image data (signal data), they have to analyze it to extract meaningful information and verify the effectiveness of composed image analysis processes.
Difference 1 implies that a major objective of developing ESIPAs is to investigate methods of formulating and describing the visual information in a declarative way: what types of image features we can extract from images, what properties they have, and how they are related to each other. Difference 2 means that ESIPA is not only an expert design system for symbolic reasoning but also an image analysis system for signal processing. Therefore, ESIPAs should have capabilities of both qualitative symbolic reasoning and quantitative signal processing. The integration of both qualitative (symbolic) and quantitative (numeric) information processing is also an important problem in image understanding and multimedia information processing. A primary objective of image understanding systems (IUSs in short) is to construct the symbolic description of the scene depicted in an image, while image processing transforms an image to another and pattern recognition classifies and labels objects represented by feature vectors. IUSs analyze an image(s) to interpret the scene in terms of object models given to IUSs as the knowledge about the world. Here interpretation refers to the correspondence
(i.e., mapping) between the description of the scene and the structure of the image. It associates objects in the scene (e.g., houses, roads) with image features in the image (e.g., points, lines, regions). Once the description of the scene is constructed, computer systems can answer various queries about the scene (e.g., how many houses exist in the scene?), perform physical operations by controlling robot manipulators (e.g., pick up and move physical objects), and if needed, generate explanations of the scene in natural languages. It is in this sense that we can say IUSs understand the scene. IUSs require diverse sources of knowledge to interpret visual scenes. In general, the knowledge they use can be classified into the following three types: 1. Scene Domain Knowledge. This type of knowledge includes intrinsic properties of and mutual relations between objects in the world. It is described in terms of the terminology defined in the scene: names of scene objects and their constituent parts, geometric coordinate systems to specify locations and spatial relations, physical scale systems to measure various size properties (e.g., length in meters) and so on. 2. Image Domain Knowledge. This type of knowledge is used to extract image features from an image and to group them to construct the structural description of the image. It is described in terms of the terminology defined in the image domain. This terminology must not be confused with that for describing the scene. For example, a word adjacent in the scene domain knowledge must be clearly discriminated from that in the image domain knowledge; adjacent image features need not correspond to adjacent objects in the scene. 3. Knowledge About the Mapping Between the Scene and the Image. This type of knowledge is used to transform image features to scene features and vice versa. It defines translation rules between the two terminologies used to describe the scene and the image domain knowledge. The knowledge about photogeometry is a typical knowledge source for the translation rules: viewing angle and focal length of a camera, color spectral properties, resolution, and so on. Although IUSs often use the declarative knowledge representation as ESIPAs, there are several differences between these two systems: Objective: A major purpose of ESIPAs is to realize effective image analysis processes by combining primitive image processing operators in a program library, while that of IUSs is to interpret a scene. In other words, ESIPAs are developed to make full use of available image processing techniques, while IUSs to realize new versatile visual recognition capability. Knowledge Sources: The knowledge used by ESIPAs is about how to use image processing techniques as well as the image domain knowledge. That
is, no knowledge about the scene is used in ESIPAs, while IUSs require all three types of knowledge listed previously. Goal Specification: “Find roads” is a typical goal given to IUSs. Since goals for IUSs are described in terms of the scene domain terminology, IUSs require models of objects in the scene and the knowledge about the mapping in order to establish correspondence between the object models and image features extracted from the image. On the other hand, “find rectangles” is a typical goal given to ESIPAs. Since there are many possible methods to extract rectangles from an image, ESIPAs require the knowledge about primitive image processing operators in order to select promising ones, and know-how about image processing techniques so as to combine them effectively. As will be described in detail in the following subsections, we can develop ESIPAs for various different tasks. Among them, one view of ESIPAs is to consider them as image segmentation modules in IUSs. Most of the IUSs developed so far have emphasized the importance of the knowledge of types 1 (scene domain knowledge) and 3 (knowledge for mapping), and many reasoning and computational methods have been developed based on such knowledge (Ballard and Brown, 1982; Binford, 1982; Brooks, 1981). However, we also need a lot of knowledge to analyze image data. The knowledge of type 2 (i.e., image domain knowledge) has usually been encoded in programs, so that it is very hard to see what knowledge is used for the image analysis in IUSs. Moreover, fixed processes of image analysis (i.e., image analysis procedures) reduce the flexibility of the image segmentation capability of IUSs. For example, the poor capability of ribbon detection (i.e., image segmentation) in the ACRONYM image understanding system (Brooks, 1981) limits its overall performance in object recognition. Selfridge (1982) incorporated the knowledge about image processing techniques to realize adaptive operation and parameter selection in his aerial image understanding system. The appropriate image processing operator and its optimal parameters were automatically selected through several iterations of trial-and-error image segmentation. This allowed flexible image segmentation and increased the reliability. Although his idea is very similar to ESIPAs, the knowledge for the operator and parameter selection in his system was still represented procedurally, and consequently, its reasoning capability was limited. In ESIPAs we describe the image domain knowledge and the know-how about image processing techniques explicitly (i.e., declaratively) and make it clear what knowledge is important and how we can use it effectively. Replacing a group of image analysis programs in IUSs by ESIPAs, we can
[Figure 1 appeared here: the general architecture of an ESIPA, comprising a user interface (request/goal and image data in; analysis result and composed process out), a library of image processing operators, knowledge about image processing techniques (standard image analysis processes and individual operators), a database of characteristics of image data, an analysis executor, and a reasoning engine.]
increase the flexibility of image analysis in IUSs. Especially, the reliability of top-down image segmentation can be greatly improved by incorporating an ESIPA as an image analysis module; since the model of a target image feature to be extracted and its approximate location are given in top-down image segmentation, the ESIPA can accurately reason about the most reliable image segmentation technique (i.e., operators and parameters) using such information. (Later in this section we will discuss bottom-up and top-down image segmentation and describe ESIPAs for these two types of image segmentation.)

C. General Architecture of ESIPAs
Figure 1 illustrates the general architecture of ESIPAs. It consists of the following modules: User Interface: A user of ESIPA interacts with the system through this
module. Its basic functions are to get a user's request or analysis goal and the image data to be analyzed and to return the analysis or reasoning result. Besides these primary functions, it supports various man-machine interface facilities such as displaying intermediate analysis results and reasoning histories graphically and sometimes asking the user for the evaluation of analysis results. Library of Image Processing Operators: This library is a major knowledge source of ESIPAs and contains a collection of procedures for primitive image processing operators, which can be called by the analysis executor. Note that the knowledge in this library is represented procedurally. Knowledge About Image Processing Techniques: This module is another major
knowledge source of ESIPAs and includes various types of declarative knowledge about image processing: knowledge about standard image analysis processes, declarative characterization of individual image processing operators, rules for parameter selection, and so on. Database of Characteristics of Image Data: While the preceding two knowledge sources are usually built-in static information, this database stores all input and intermediate image data and their characteristics. They are dynamically created during the analysis and reasoning process. Analysis Executor: This module takes the full responsibility of executing image processing operators in the library and image analysis processes composed by the reasoning engine. All analysis results by this module are stored in the database for later use by the reasoning engine. Reasoning Engine: This is the central reasoning module in ESIPAs. It uses the knowledge about image processing techniques and characteristics of image data to reason about analysis plan generation, operator and parameter selection, and so on. Usually, reasoning in ESIPAs is done at two levels: 1. Analysis Plan Generation. First, ESIPAs reason about an appropriate
global plan to guide the analysis of a given image. The reasoning engine uses characteristics of the image and the knowledge about standard image analysis processes to generate the plan. The generated analysis plan can be considered as an abstract analysis process. 2. Operator Selection and Parameter Adjustment. The reasoning at this level instantiates the generated analysis plan into an executable image analysis process: specific operators are selected from the library and values for their parameters are determined. These selections are done through trial-and-error analysis of the input image: first ESIPAs perform image analysis by applying a promising operator and then evaluate the result to replace the operator or adjust parameter values (a schematic sketch of this two-level loop is given after the classification below). Here we classify ESIPAs into the following four categories: 1. Consultation system for image processing to improve the user interface of image processing systems, 2. Knowledge-based program composition system to automatically generate complex image analysis programs, 3. Rule-based design system for image segmentation algorithms to realize flexible bottom-up image segmentation, 4. Goal-directed top-down image segmentation system for detecting specified image features. In the following subsections, we present an overview of these four types of
ESIPAs and discuss their objectives, knowledge representation, and reasoning methods.
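Before turning to the four categories, the two reasoning levels can be made concrete with a minimal schematic sketch. The plan names, candidate operators, and parameter values below are hypothetical, and the executor and evaluator are passed in as opaque functions; real ESIPAs encode this knowledge as rules rather than as tables.

    STANDARD_PLANS = {
        # knowledge about standard image analysis processes (abstract plans)
        "region_extraction": ["noise_elimination", "binarization", "labeling"],
    }

    CANDIDATE_OPERATORS = {
        # declarative characterization of operators: which abstract step each
        # implements and which parameter values are worth trying
        "noise_elimination": [("median_filter", {"size": 3}),
                              ("median_filter", {"size": 5})],
        "binarization": [("single_threshold", {"t": 100}),
                         ("single_threshold", {"t": 128})],
        "labeling": [("connected_components", {})],
    }

    def generate_plan(goal):
        # level 1: choose an abstract plan for the goal (and, in a real system,
        # for the measured characteristics of the image)
        return STANDARD_PLANS[goal]

    def instantiate(plan, image, execute, evaluate):
        # level 2: trial-and-error selection of operators and parameters
        process = []
        for step in plan:
            best = max(CANDIDATE_OPERATORS[step],
                       key=lambda op: evaluate(execute(op, image)))
            image = execute(best, image)
            process.append(best)
        return process

A call such as instantiate(generate_plan("region_extraction"), image, execute, evaluate) would return the selected operator sequence together with its parameter values; the evaluation function plays the role of the result evaluation discussed in Section II.A.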
D. Consultation System for Image Processing
A user of an interactive image processing system is usually required to select a command from a command library and specify appropriate parameters for the command. Although several HELP facilities are available, one has to refer to a manual to see the detailed usage of commands and the meanings of their parameters. In other words, syntax and semantics of commands are described only verbally in the manual, which forces users to read and understand documents written in specialized terminologies in image processing. Moreover, the information about how to select an appropriate command from a group of commands with similar functions and how to combine different commands to realize a meaningful image analysis process is rarely described in the manual. These problems implicitly prevent inexperienced users from using the system and limit its utility. Expert consultation systems for image processing (Sueda and Hoshi, 1986; Toriu, Iwase, and Yoshida, 1987) use such manual information as their knowledge source and help a user to select an appropriate command and parameters and to combine primitive commands to realize meaningful image analysis processes. Since the command and parameter selection is done under the system guidance, the man-machine interface of the systems can be greatly improved. This facility is useful especially for those with little experience in image processing. In Sueda and Hoshi (1986) a prototype of such a consultation system, EXPLAIN, was proposed. Figure 2 shows the general flow of the consultation. First a user specifies the purpose of image processing in terms of predefined abstract functional specifications (e.g., image quality enhancement, image segmentation). Then he or she inputs an image to be processed and describes its rough characteristics (e.g., color or B&W, noise level, contrast) via a terminal. The system first reasons about a global processing plan based on both the given goal specification and the stored knowledge about standard image analysis processes. The plan is described as an ordered sequence of abstract image processing algorithms (functions) such as noise elimination, edge detection, thresholding, region segmentation, and so on. The knowledge about standard image analysis processes includes a set of such abstract plans and rules to select an appropriate plan satisfying the user’s request and suitable to the given image data. This knowledge is the embodiment of know-how about how to combine primitive image processing operators to
FIGURE 2. General flow of consultation (from Sueda and Hoshi, 1986).
realize meaningful image analysis processes. Such knowledge rarely has been written explicitly in ordinary manuals. After selecting an appropriate global processing plan, the system instantiates each abstract algorithm in the selected plan one by one from the beginning (Figure 3): a promising practical command and its appropriate parameters are determined for each abstract algorithm. The instantiation is done through conversations on the detailed user’s objective and image quality guided by the knowledge about commands. The knowledge in this system is described by a set of production rules, which control the search process to find an appropriate command sequence. In Figure 3, for example, the global plan consists of two abstract algorithms:
FIGURE 3. Search process for the instantiation of abstract algorithms (from Sueda and Hoshi, 1986).
BACKGROUND-ELIMINATION and NOISE-ELIMINATION. At the first stage, rule 20 for the instantiation of BACKGROUND-ELIMINATION was applied but failed; the user was not satisfied with the analysis result produced by the selected command. Then the system activates an alternative rule, rule 21, for the same abstract algorithm, BACKGROUND-ELIMINATION. The following is an illustrative example of a rule for instantiating BACKGROUND-ELIMINATION:
*rule 21*
FOR      BACKGROUND-ELIMINATION
INPUT    BINARY-IMAGE
OUTPUT   LABELED-IMAGE
EFFECT   the background region is eliminated
IF       the size of the background is larger than objects
THEN     execute LABELING and LARGE-REGION-ELIMINATION
When this rule is activated, BACKGROUND-ELIMINATION is instantiated into an abstract algorithm sequence, LABELING followed by LARGE-REGION-ELIMINATION. As the next step of reasoning, rule 81 for instantiating LABELING is activated but fails. And finally, rule 82 instantiates LABELING to generate an executable command. The selected command is applied to the input image, and the result is immediately displayed on the monitor screen. Then, the system asks the user for its evaluation. Depending on the user's evaluation, the system replaces the operator or modifies parameters by activating other rules and retries the analysis. If the selected command successfully analyzes the image, the system proceeds to the instantiation of the next abstract algorithm in the plan. Note that even if the intermediate analysis result by the selected command is satisfactory, the selection is cancelled and another rule for instantiation is activated if the user is not satisfied with the analysis result at some later stage. This backtracking capability realizes the trial-and-error analysis to select commands and adjust parameters. The system performs a large backtrack to try another global processing plan when no new command can be applied to satisfy the user. In addition to the knowledge about standard image analysis processes and commands, EXPLAIN contains rules describing information about the hardware and software architecture of the image processing system, such as the number of image memories and special registers. This information is useful to hide specific architectural features of the system from the user and enables him or her to think about image processing at the logical level.
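The rule-and-backtrack behavior just described can be summarized in a small sketch. The rule table and command names below are invented for illustration (they are not EXPLAIN's actual knowledge base), and the user's evaluation is reduced to a single callback for brevity.

    RULES = {
        # abstract algorithm -> alternative instantiation rules, each expanding
        # into a sequence of more concrete algorithms or executable commands
        "BACKGROUND-ELIMINATION": [["LABELING", "LARGE-REGION-ELIMINATION"]],
        "LABELING": [["labeling-command"]],
        "LARGE-REGION-ELIMINATION": [["large-region-elimination-command"]],
        "NOISE-ELIMINATION": [["median-filter-command"]],
    }

    def instantiate(plan, user_accepts, commands=None):
        """Depth-first expansion with backtracking: when the user rejects a
        result, an alternative rule is tried; None at the top level would
        trigger the larger backtrack to another global plan."""
        commands = [] if commands is None else commands
        if not plan:
            return commands if user_accepts(commands) else None
        head, rest = plan[0], plan[1:]
        if head not in RULES:                 # already an executable command
            return instantiate(rest, user_accepts, commands + [head])
        for expansion in RULES[head]:         # try alternative rules in turn
            result = instantiate(expansion + rest, user_accepts, commands)
            if result is not None:
                return result
        return None

For instance, instantiate(["BACKGROUND-ELIMINATION", "NOISE-ELIMINATION"], lambda cmds: True) expands the abstract plan depth first and returns the first command sequence the user accepts.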
E. Knowledge-Based Program Composition System

Currently, many software libraries for image processing are available. For example, SPIDER (Tamura et al., 1983) is a FORTRAN subroutine library for image processing containing over 300 subroutines, and many image processing systems are equipped with command libraries. Syntactic and semantic characteristics of the program modules (subroutines and commands) in a library, such as the data types of the arguments of a subroutine, are usually written in a manual. Using such software characteristics of program modules as the knowledge source, we can develop an automatic programming system, which composes complex programs by combining program modules in the library. A user of the system has only to write an abstract program specification without knowing the details of the program code. Although we usually have to improve the composed program in its analysis capability and efficiency, expert systems of this type are useful to quickly develop image analysis programs for various applications. Automatic programming has long been a dream in software engineering and artificial intelligence (Barr and Feigenbaum, 1982), but no general-purpose practical system has been developed. However, if we confine ourselves to a
specific application domain, there may be a good possibility to realize usable systems; we can employ much domain-specific knowledge and many heuristics. One of the most critical problems in automatic programming is how to describe the program specification. The ESIPAs proposed so far use the following specification methods: 1. Specification through conversation, 2. Specification by abstract command and language, 3. Specification by example. Although these specification methods themselves are not very new, there is a big difference between ordinary automatic programming systems and ESIPAs for program composition. That is, besides the symbolic reasoning engine, ESIPAs contain the analysis executor (see Figure 1), which executes partially composed programs so as to verify their utility during the program composition. This capability is necessary because (a) specifications given to ESIPAs are often informal and ambiguous, so that they have to repeat trial-and-error experiments; and (b) usually specifications given to ESIPAs describe only the image features to be extracted. Therefore, in order to verify whether composed programs are satisfactory, ESIPAs have to apply them to real images and to examine whether or not the extracted image features satisfy the specifications.

1. Program Specification Through Conversation

Those systems that obtain program specifications through conversation (Tamura et al., 1988; Sato, Kitamura, and Tamura, 1988; Bailey, 1988; Clement and Thonnat, 1989; Bunke and Grimm, 1990) are very similar to the consultation systems described in Section II.D. That is, the reasoning and conversation with a user are performed just in the same way as in the consultation systems; and when the user is satisfied with the final analysis result, the system composes a program based on the analysis history stored in the system. Figure 4 illustrates a search tree representing the reasoning history. Each node in this tree represents a practical operator (subroutine) selected by the system during the reasoning process, and arcs between nodes specify orders of execution. In the figure some sequences of operators could not generate a satisfactory result, and the sequence illustrated by bold lines denotes a successful image analysis process. The system generates an executable program corresponding to the successful path in the tree. For this program generation, the knowledge about practical programming languages is required in addition to that about image processing techniques: declaration of variables and their types, syntactic forms for procedure or subroutine calls and argument specification, and so on. Since the trial-and-error analysis is required to find appropriate operators
FIGURE 4. Search tree representing the reasoning history.
and parameters, the flexibility of a system is heavily dependent on the modification process of an instantiated plan. EXPLAIN, described earlier, uses an ordinary tree search algorithm with backtracking to find alternatives. It should be noted that the tree in Figure 3 illustrates a history of the search and that the system instantiates abstract algorithms one by one in a sequential fashion. Such a sequential search prevents the system from reasoning about image analysis processes from a global viewpoint. DIA-Expert (Tamura et al., 1988; Sato et al., 1988) uses the operation tree to explicitly describe an image analysis process at various levels of abstraction (Figure 5(a)). The level of the operation tree means the level of abstraction. That is, vertical arcs in the tree represent abstraction and instantiation relations between image analysis algorithms. At each level a sequence of image analysis algorithms is described. In the figure, such a sequence is shown as a group of rectangular nodes connected by dashed arrows. The sequence at the bottom of the tree represents the sequence of executable software modules in the program library, i.e., names of subroutines in SPIDER. In DIA-Expert all reasoning processes for the plan generation, instantiation, and modification are considered as symbolic manipulations of operation trees (Figure 5(b)). Besides rules for these reasoning processes, the system contains additional rules to avoid artifacts caused by image processing operators and to enhance the effects of the operators. When these rules are activated, auxiliary nodes representing various pre- and postprocessing operations are added to the operation tree (see the right side of Figure 5(b)). These rules are useful to improve the capability and robustness of composed image analysis programs.
FIGURE 5. Operation tree (from Tamura et al., 1988): (a) an operation tree for segmentation; (b) instantiation and augmentation of operation trees.
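The operation tree lends itself to a simple data structure. The sketch below is a rough illustration only; the class, its method names, and the example refinement loosely follow Figure 5 but are not DIA-Expert's actual representation.

    class OperationNode:
        """One node of an operation tree: an image analysis algorithm at some
        level of abstraction, refined into an ordered sequence of children."""

        def __init__(self, name):
            self.name = name
            self.children = []          # the sequence at the next lower level

        def instantiate(self, names):
            # refine an abstract algorithm into a sequence of more concrete ones
            self.children = [OperationNode(n) for n in names]
            return self.children

        def augment(self, index, name):
            # insert an auxiliary pre-/postprocessing node (e.g., hole filling)
            # to suppress artifacts of a neighbouring operator
            self.children.insert(index, OperationNode(name))

        def executable_sequence(self):
            # the leaves, read left to right, form the composed program
            if not self.children:
                return [self.name]
            sequence = []
            for child in self.children:
                sequence.extend(child.executable_sequence())
            return sequence

    root = OperationNode("segmentation")
    binarization, noise = root.instantiate(["binarization", "noise elimination"])
    binarization.instantiate(["single threshold"])
    noise.instantiate(["small region elimination"])
    noise.augment(0, "hole filling")
    # root.executable_sequence() now yields the bottom-level command sequence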
In short, the operation tree provides a uniform common data structure and enables the system to take a global view for program composition and modification. A similar tree representation of image analysis processes was used in Clement and Thonnat (1989).

2. Program Specification by Abstract Command and Language
When we write a program for image processing, we have to write many codes
besides those for the essential processing: declaration and initialization of image data arrays, allocation of working memories, and so on. Especially, when we develop a complex image analysis program using a program module library, we want to devote ourselves to the function of each module without caring about such programming details. Expert systems that generate executable programs from abstract commands facilitate the development of complex image analysis programs. With such systems, we only have to specify combinations of modules in the library without knowing about their detailed syntactic and semantic structures. Sakaue and Tamura (1985) proposed an automatic program generation system using SPIDER. The system generates a complete RATFOR (structured FORTRAN) main program from a given abstract command sequence. Figure 6(a) shows an input command sequence, whose meaning is as follows: 1. For an image in the standard format (SFD1), G, compute its histogram (HIST1), 2. Find a threshold value from the histogram (THDS2), 3. Apply binarization to G using the threshold (SLTH1), 4. Apply connected component labeling (CLAB), 5. Remove tiny regions (ERSR3), 6. Compute a compactness measure of each region (CRCL1).
Each command denotes the name of a subroutine in SPIDER. The system stores the syntactic and semantic information about each argument of every subroutine in SPIDER, such as input-output discrimination, data type, and semantic usage (e.g., image data, histogram, property table, etc.). For example, Figure 6(b) shows the syntactic and semantic constraints of HIST1. The second line reads "the first argument ($1) is an input argument and its data type is Gray Picture" and the third line "the second argument ($2) is an output argument and its data type is Histogram." The last three lines specify the semantic constraint on the first and second arguments: the third attribute of the first argument (i.e., $1-3, the number of gray levels of the input picture) must be equal to the first attribute of the second argument (i.e., $2-1, the size of the array for the output histogram). Based on this information, the system determines the real arguments for each subroutine, if necessary asks the user to specify missing parameters, and generates a complete main program consisting of a set of necessary data declarations and a sequence of subroutine calls (Figure 6(c)). By comparing the length of the input command sequence with that of the generated program, the effectiveness of the system can be clearly understood. Specifications used by this system describe abstract image analysis processes in terms of names of subroutines. In this sense, the level of abstraction of the program specification is not high; users have to know the specific names of subroutines in SPIDER.
FIGURE 6. Specification by abstract commands (from Sakaue and Tamura, © 1985 IEEE): (a) command sequence; (b) syntactic and semantic constraints; (c) generated complete main program.
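As a rough sketch of how such argument knowledge might be stored and checked, consider the following fragment. The table contents merely paraphrase the description of Figure 6(b) given above; the dictionary layout and the function are hypothetical and are not the actual SPIDER knowledge base.

    # Hypothetical encoding of the Figure 6(b) information.
    SUBROUTINE_SPECS = {
        "HIST1": {
            # (direction, data type) for each formal argument
            "args": [("in", "GRAY_PICTURE"), ("out", "HISTOGRAM")],
            # attribute 3 of $1 (number of gray levels) must equal
            # attribute 1 of $2 (size of the histogram array)
            "equal_attrs": [((1, 3), (2, 1))],
        },
    }

    def check_call(name, actual_args):
        """actual_args: one dict per argument, e.g.
        {"io": "in", "type": "GRAY_PICTURE", "attrs": {3: 256}}.
        Returns the list of violated constraints (empty if consistent)."""
        spec = SUBROUTINE_SPECS[name]
        violations = []
        for (io, dtype), arg in zip(spec["args"], actual_args):
            if arg["io"] != io or arg["type"] != dtype:
                violations.append(("argument mismatch", io, dtype, arg))
        for (a1, i1), (a2, i2) in spec["equal_attrs"]:
            v1 = actual_args[a1 - 1]["attrs"].get(i1)
            v2 = actual_args[a2 - 1]["attrs"].get(i2)
            if v1 != v2:
                violations.append(("attribute mismatch", (a1, i1, v1), (a2, i2, v2)))
        return violations

A program generator in this spirit would run such checks while binding actual variables to formal arguments, asking the user only for the parameters it cannot infer.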
We can consider it as a syntactic and semantic editor (i.e., a so-called structured editor; Donzeau-Gouge, 1984) for the library, by which program development is facilitated much more than with ordinary text editors. De Haas (1987) proposed an automatic programming system that generates object detection programs from specified object models. Object models to be detected are described in terms of rewriting rules of an attribute grammar. For example, the rules

d1(R, S), d2(R, S) = disk(R, S)
twodisks(R, D, S) → [d1(R, S), d2(R + D, S)]
describe that the nonterminal symbol twodisks consists of two terminal symbols d1 and d2, both of which are of type disk. A disk (i.e., d1 or d2) has two attributes, R and S. The former represents the location (a two-dimensional position vector) and the latter the size of a disk. The symbol twodisks has an additional attribute D, which represents the displacement vector between the two constituent disks. A user gives a goal like

with "twodisks": print(D, ±2);
which specifies "generate a program that detects an instance of twodisks and prints D with a precision of plus or minus 2." Using the predefined object models, the system transforms the goal into an executable program:

d1 = diskprogram1();
d2 = diskprogram1();
D = d2.R - d1.R;
print(D, ±2);

Here diskprogram1() denotes a primitive function in the program library. It analyzes an image to detect a disk and measure its properties. As is seen in this example, terminal symbols in the attribute grammar directly correspond to program modules that extract the corresponding types of image features, and attributes of terminal symbols describe properties of the extracted image features. In contrast to the previous ESIPA, the level of the specification in this system is too high; whereas a user can specify complex image features with internal structures by defining new nonterminal symbols, it is impossible to compose and modify the program modules for primitive image feature extraction. That is, combinations of fundamental image processing operators like
smoothing, edge detection, and thresholding are built in and fixed in the program modules. In other words, the knowledge for primitive image feature extraction is procedurally represented in the modules. A more flexible ESIPA for extracting complex image features will be described in Section IV.D.
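To illustrate the mechanism only (this is not De Haas's actual implementation), the following sketch shows how such a grammar rule could drive program composition; the grammar table and the detector are invented placeholders, with diskprogram1 reduced to a stub that returns fixed attributes.

    def diskprogram1():
        # placeholder: a real module would analyze the image and return the
        # location R (a 2-D vector) and size S of a detected disk
        return {"R": (0, 0), "S": 1}

    GRAMMAR = {
        # nonterminal -> (constituents, attribute computation)
        "twodisks": (["d1", "d2"],
                     lambda d1, d2: {"D": (d2["R"][0] - d1["R"][0],
                                           d2["R"][1] - d1["R"][1])}),
    }

    TERMINALS = {"d1": diskprogram1, "d2": diskprogram1}

    def detect(symbol):
        """Expand a goal symbol into detector calls and compute its attributes."""
        if symbol in TERMINALS:
            return TERMINALS[symbol]()
        constituents, combine = GRAMMAR[symbol]
        results = [detect(c) for c in constituents]
        return combine(*results)

    # goal:  with "twodisks": print(D, ±2);
    print(detect("twodisks")["D"])

The design choice to make terminal symbols correspond one-to-one with fixed detection modules is exactly what limits the flexibility noted in the text: only the grammar, not the modules themselves, can be composed or modified.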
3. Program Specification by Example

In most ESIPAs a user specifies the objective, the characteristics of an image, and evaluation results by using a predefined vocabulary (i.e., a set of symbolic predicates). As is well known, however, it is very hard to describe the expected analysis result and the image quality in such symbolic descriptions. (We will discuss this problem, specifically declarative representations of spatial relations and structures, later in Section IV.) One idea to avoid this problem is to use images and/or figures for the goal specification. In IMPRESS (Hasegawa, Kubota, and Toriwaki, 1986, 1987), a user specifies a goal of analysis by a sample figure. That is, a request given to the system is "compose a program by which the input image is transformed into the sample figure." Figure 7 illustrates a reasoning process in IMPRESS. The overall reasoning process in IMPRESS is similar to those of other ESIPAs: the global processing plan determination followed by the operator and parameter selection. First, the system determines a global processing plan based on the type of the sample figure: point, line, or region. A plan in IMPRESS is also described as a sequence of abstract algorithms. Then, the reversed version (i.e., inverse function) of each abstract algorithm in the plan is applied to the sample figure one by one from the end (see the left column in Figure 7). The images generated by these inverse operations are used as the references to evaluate the performance of, and select, practical operators and parameters for each abstract algorithm in the plan. In selecting an appropriate operator and parameters for (i.e., the instantiation of) each abstract algorithm, the system applies several promising operators with different parameter values in parallel. All the processing results are compared with the reference image that was generated from the sample figure. Then, the system selects the best operator with optimal parameters, that is, the one that generated the result most similar to the reference image. In Figure 7, for example, six different edge detection operators like SOBEL, PREWITT, and so on were applied to the input image. The numbers listed under the resultant images show goodness measures evaluated based on the reference image at the top of the left column of Figure 7. In this example, the first-order diagonal difference operator was selected and its resultant image was used as the input for the next binarization stage. After iterating a similar instantiation process for each subsequent abstract algorithm, the system generates a complete
FIGURE 7. Goal specification by a sample figure (from Hasegawa et al., 1987)
image analysis program by combining the selected practical operators with optimal parameter values. The goal specification by example is very natural and easy for any user. The use of sample figures is especially effective to determine optimal parameters, which usually requires trial-and-error analysis. In IMPRESS, however, all knowledge about inverse operators for generating reference images and evaluation methods for selecting practical operators and parameters is described procedurally in a group of specialized programs. Thus, many heuristics and ad hoc rules are implicitly incorporated. This makes it hard to improve and augment reasoning capabilities of the system. Another serious problem in IMPRESS is that it composes an image analysis process (program) based on a single pair of an image and a sample figure, so that selected operators and parameters depend heavily on that specific image data and sample figure. Most users, however, want to have a general image analysis process that works well for a class of images with similar properties. A preliminary discussion on this problem is given in Hasegawa et al. (1988). First several pairs of images taken from the same class and corresponding sample figures were given to IMPRESS to generate a set of (different) image analysis processes. They proposed a heuristic reasoning method of consolidating such image analysis processes into an integrated one, which is expected to work well for all member images of that class. In general, however, the consolidation requires sophisticated reasoning for generalization and abstraction of image analysis processes. Such highlevel reasoning is very difficult without well-defined theoretical foundations for describing syntax and semantics of image analysis operators and their combinations. In this sense, the method proposed in (Hasegawa et al., 1988) is far from complete, nor does it give a promising direction to be pursued for the generalization of image analysis processes. Vogt (1989) proposed an automatic programming system named REM, whose goal specification is also given by sample figures. He emphasized that to realize intelligent systems like automatic programming systems, a concrete mathematical theory is of the first importance. REM uses the mathematical morphology (Serra, 1982, 1986) as its theoretical basis. That is, image analysis processes composed by REM are described by combinations of various morphological operators for (mainly binary) two-dimensional images. In REM, the goal specification was augmented from that in IMPRESS as follows. The following three sets of images are given as a goal specification (Figure 8(a)): 1. XO = {XOi( i = 1
-
N)}: A set of base images. These images are samples taken from a class of images to be analyzed.
108
TAKASHI MATSUYAMA base image
reject mask
I
(a) Problem specification by a base image, accept and reject masks.
Sequential Decomposition
Split Decomposition
Transform
-
x1
Sequential Decomposition
Fill-in Transform
Transform
1
Fill-in Transform
( XO-
OP1
X4-
OP2
X5-
OP4
A0 I RO
)
(b) Example of the algorithm state tree (see text).
FIGURE8. Program composition using morphological operator: (a) problem specification by a base image, accept, and reject masks; (b) example of the algorithm state tree (see text).
IMAGE PROCESSING, ANALYSIS, A N D RECOGNITION
-
109
2. A0 = {AOi (i = 1 N ) } :A set of accept masks. Each mask image in this set specifies a group of pixels that must be extracted from its corresponding base image. 3. RO = {ROi (i = 1 N ) } : A set of reject masks. Each mask image in this set specifies a group of pixels that must not be extracted from its corresponding base image.
-
First, the problem of IMPRESS discussed earlier, i.e., a composed image analysis process in IMPRESS depends heavily on a pair of an input image and a sample figure, is solved to some extent by using sets of images as a problem specification. Second, a goal of image analysis is described more precisely by using both positive and negative examples; i.e., accept and reject masks. The reasoning for program composition in REM is performed by the stepwise refinement of abstract analysis processes. The reasoning engine uses a tree structure named Algorithm State Tree to maintain the current state and history of the reasoning process so far executed. Figure 8(b) illustrates an example of the algorithm state tree. Each nonterminal node (ellipse) in the tree represents an abstract image analysis process, which in turn is represented by a directed graph. In the graph, a node represents a base image set, a set of intermediate analysis results or a pair of accept and reject mask sets. An arc in the graph represents a practical morphological operator (OPS 1, OPS2, OPS3, OPS4, and UNION in Figure 8(b)) or a subproblem to be solved (i.e., an abstract image analysis process to be composed, PROBO, PROB1, PROB2, PROB3, and PROB4 in Figure 8(b)). For example, the root node of an algorithm state tree contains a directed graph with a single arc. This arc represents the original problem specified by a user, PROBO, and connects a set of base images and a pair of accept and reject mask sets given as a goal specification (see Figure 8(b)). Each subproblem is symbolically described in terms of three sets of images: input images to be analyzed and accept and reject masks for the input images. To solve a subproblem (i.e., to instantiate an abstract image analysis process), the reasoning engine first analyzes degrees of overlap among the three sets of images and masks. Then it instantiates the subproblem by selecting an appropriate morphological operator depending on characteristics of the overlaps. About 10 predicates are prepared to describe characteristics of overlaps. Predicate A-SUP (A-ZNF), for example, implies that each base image (i = 1 N ) is an upper (lower) bound on its corresponding accept mask A i . Predicates R-SUP and R-INF are defined similarly using base images and reject masks. The operator selection process is implemented by a decision tree, so that the flexibility of reasoning is limited. In most cases, however, the selected operator, when applied to base
x.
-
110
TAKASHI MATSUYAMA
images, does not produce perfect results that match the given accept and reject masks completely. Then the reasoning engine transforms (specializes) the subproblem into a combination of the selected operator and a new (sub-) subproblem. By this transformation, the graph including that subproblem (i.e., node in the algorithm state tree) is expanded to a new graph, which then is added to the algorithm state tree as a new child node of the original one (see Figure 8(b)). An arc in the algorithm state tree represents such transformation. The following three types of transformations were proposed in REM: 1. Fill-in Transform. This transform just replaces a subproblem with a selected morphological operator. The transform of this type is executed when the selected operator generates analysis results that completely match the accept and reject masks. 2. Sequential Decomposition Transform. This is the most commonly used transform. The original subproblem is decomposed into a sequence of the selected operator followed by a new subproblem. The set of base images for the new subproblem is generated by applying the selected operator to the set of original base images. On the other hand, complicated processings are required to produce the accept and reject masks for the new subproblem. This is because a major purpose of the new subproblem is to compensate the imperfection of the selected operator. Thus the accept and reject masks of the new subproblem are not the same as those of the original subproblem. Note that this point is not illustrated in Figure 8(b). 3. Split Decomposition Transform. As shown in Figure 8(b), this transform splits the original subproblem (arc in a graph) into multiple new subproblem (arcs), whose results are combined by UNION or INTERSECTION operators. This transform does not seem to be implemented in Vogt (1989) because it is very hard to generate accept and reject masks for :he new subproblems; degrees of overlap among images and masks alone do not include enough information to conduct reasoning about new subproblems in this transform.
The search process executed by the reasoning engine expands the algorithm state tree by applying these transforms until a graph with no subproblem (leaf node in the tree) is obtained. The implemented search algorithm was a simple breadth first search and some cost function was incorporated to guide the search. In short, although REM’s approach is interesting and promising, the image analysis processes that can be composed by it are limited. This is because 1. Although a simple thresholding operator and some mechanism to cope
IMAGE PROCESSING, ANALYSIS, AND RECOGNITION
111
with noise were incorporated, many filtering operators for gray images were not used. The implemented system is mainly for binary image processing. 2. Morphological operators were augmented with bandpass operators that include ranges of parameter values to control sizes of structural elements. But the ability to describe shapes, especially linelike figures, is limited. 3. Sophisticated knowledge about operator characteristics is required to realize more flexible operator combinations like the split decomposition transform; image analysis processes composable by the implemented system are confined to sequential ones. (We will discuss this problem later in Section 111.)
F. Rule-Based Design System f o r Bottom-up Image Segmentation Algorithms Image segmentation is a crucial analysis process in image recognition and understanding. It extracts image features such as lines, corners, and regions from an image, in terms of which the structure of the image is described symbolically. In this sense, the segmentation process can be considered as a bridge between iconic signal data and symbolic structural description. Hundreds of image segmentation algorithms have been developed (Ballard and Brown, 1982; Rosenfeld and Kak, 1982), but none of them works perfectly for complex natural images and yet another new segmentation algorithm is being proposed. In general, there are two types of image segmentation for image recognition and understanding: 1. Bottom-up Image Segmentation. Given an image(s), an ordinary image
analysis process to extract image features is image (image quality enhancement) * enhanced image * (feature extraction) image feature * (grouping) =s structural description. The segmentation of this type is called bottom-up image segmentation. Its major knowledge sources are characteristics of imaging devices (e.g., resolution, focal length, color spectral properties), image models (e.g., step and roof edge models, smoothness of gray-level variations in a region), and so-called gestalt laws for grouping (e.g., proximity, good continuation). Usually no knowledge about specific objects to be detected is incorporated in the bottom-up segmentation process. This means that the entire image is analyzed uniformly; no specification about interesting areas is given a priori. 2 . Top-down Image Segmentation. The segmentation of this type
112
TAKASHI MATSUYAMA
sometimes called goal-directed image feature extraction. As its name stands, top-down image segmentation is guided by a goal, a model of an image feature to be extracted from an image. A goal may be specified directly by a human user or generated by the reasoning engine in an image understanding system. In IUSs top-down image segmentation is used to compensate the imperfection of bottom-up image segmentation; since the latter uses very general knowledge for image segmentation and little is known about the structure of an image under analysis, it often fails to extract some meaningful image features. (This reminds us the chicken-and-egg problem discussed in Section 1I.A.) Then, top-down image segmentation is initiated to find such missing image features. At the initiation of top-down image segmentation, IUSs reason about types and properties of missing image features and their approximate locations. Then such information is given as a goal specification to the top-down segmentation process. Thus, in contrast to the uniform analysis over the entire image in bottom-up image segmentation, topdown image segmentation analyzes only focused local areas based on the specific knowledge given in the goal specification. This is the reason why top-down image segmentation can compensate the imperfection of bottom-up image segmentation. The combination of the uniform bottom-up analysis and the goal-directed top-down analysis is a powerful and practical way of solving the chicken-and-egg problem in image analysis and understanding. Several ESIPAs described so far can be considered as those for top-down image segmentation: Haar’s system, IMPRESS, and REM. In this and next subsections, we will describe ESIPAs for image segmentation: first in what follows, we describe an ESIPA for bottom-up image segmentation and then, in the next subsection, we will describe an ESIPA for automatic top-down image segmentation. Usually, algorithms for bottom-up image segmentation use many heuristics to split or merge regions and lines into meaningful ones. For example, “merge neighboring small regions with similar properties into one region” and “connect pieces of collinear line segments into one” are typical heuristics used in bottom-up image segmentation. While filtering operators for smoothing and edge detection can be designed based on well-defined mathematical theories, the incorporation of heuristics into segmentation algorithms is inevitable; we have no concrete formal theory for grouping operations except psychologically tested gestalt laws. Thus, to design an image segmentation algorithm with high performance, we have to repeat trial-and-error experiments to test the effectiveness of incorporated
RULE (1):
IF:   (1) The REGION SIZE is LOW AND
      (2) The REGION ASPECT RATIO is VERY HIGH AND
      (3) The DIFFERENCE in INTENSITY is NOT HIGH AND
      (4) The DIFFERENCE in HUE is NOT HIGH AND
      (5) The DIFFERENCE in SATURATION is NOT HIGH
THEN: (1) MERGE the two REGIONS.

METARULE (1):
IF:   (1) Previous PROCESS was FOCUS AND
      (2) Previous PROCESS was ACTIVE
THEN: (1) Match the REGION ANALYSIS rules.

METARULE (2):
IF:   (1) Previous PROCESS was REGION AND
      (2) Previous PROCESS was NOT ACTIVE
THEN: (1) Match the LINE ANALYSIS rules.
FIGURE 9. Production rules for bottom-up image segmentation.
heuristics. Moreover, different heuristics should be used depending on the image quality. Most image segmentation algorithms have been implemented as ordinary procedural programs, so that the heuristics they use are implicitly represented, and consequently, their flexibility and modifiability are vitiated. Nazif and Levine (1984) proposed a rule-based expert system for image segmentation to increase the flexibility and modifiability of bottom-up image segmentation. The objective of the system is to partition a given image into a set of mutually disjoint meaningful regions. In the system, various heuristics for image segmentation are represented by a set of production rules (Figure 9). As discussed in Section I, the explicit declarative representation of heuristics greatly facilitates their modification, augmentation, and the testing of their effectiveness. Such facilities are very valuable in designing image segmentation algorithms because we have to devise and test various different heuristics depending on properties of the images under analysis. As will be described later, moreover, flexible control mechanisms can be realized by rule-based systems. Roughly speaking, two types of rules were used in the system: one for heuristics for image segmentation and the other for controlling the analysis flow of the segmentation process.
1. Rules for Image Segmentation Heuristics. Rules of this type can also be
divided into two sets: one for region-based segmentation and the other for line-based segmentation. This is because, to realize the meaningful partitioning of a complex image, region-based segmentation alone is not enough and complementary line-based segmentation for detecting region boundaries is required. In both types of rules, the condition part of a rule describes constraints on attributes of regions and lines and their mutual spatial relations. The action part specifies a split and/or merge operation on the regions and lines specified in the condition part. Rule (1) in Figure 9 illustrates a production rule for region-based image segmentation. Several tens of primitive predicates and actions were prepared to describe heuristics in the form of production rules: in rule (1) in Figure 9, for example, REGION SIZE, LOW, ASPECT RATIO, VERY HIGH, MERGE, and so on.

2. Rules for Process Control. Besides rules for image segmentation heuristics, the system stores a set of meta rules to control the overall segmentation process (metarules (1) and (2) in Figure 9). First, the mode of the system is switched by a meta rule from region analysis to line analysis and vice versa. That is, whether region-based or line-based analysis is performed is dynamically determined by meta rules. The second role of meta rules is to select and focus on a local area to be analyzed; rules for image segmentation are then applied to those regions or lines in the focused area. Metarule (1) in Figure 9 represents the control knowledge that, when a focused local area is selected, region-based segmentation is performed first. Metarule (2) implies that line-based segmentation is performed when region-based segmentation has terminated. Using these metarules, flexible control structures can be realized, which is very difficult in ordinary procedural segmentation programs. The flexibility of the system allows the fast development of effective image segmentation algorithms, although the execution time is slow. It should be noted that this ESIPA is different from the other ESIPAs discussed in this chapter; it is an expert system to develop a new image segmentation algorithm, while the others compose image analysis processes based on existing image processing operators.

G. Goal-Directed Top-down Image Segmentation System
In image understanding of complex scenes, goal-directed top-down image segmentation is often required to correct errors incurred by initial bottom-up image segmentation as well as to verify the existence of a hypothesized object (Selfridge, 1982; Matsuyama and Hwang, 1990). For example, Figure 10 illustrates a snapshot of the spatial reasoning performed by our aerial image
FIGURE 10. Example of top-down segmentation: (a) hypothesis generation for a missing object; (b) target window for top-down segmentation; (c) detected elongated rectangle.
understanding system SIGMA (Matsuyama and Hwang, 1990). The scene under analysis is a suburban housing development, and the task of the system is to recognize houses, roads, and driveways and describe the structure of the scene in terms of recognized objects. The elongated and compact solid rectangles in Figure 10(a) show recognized road-piece and house instances that were detected by bottom-up image segmentation, respectively. Two overlapping dashed rectangles between the solid ones represent hypotheses for a driveway that were generated by these two recognized objects. As shown in the figure, since the contrast between the driveway and its surrounding area is very low, the elongated rectangle corresponding to the driveway could not be extracted by the bottom-up image segmentation. Then, the system initiates top-down image segmentation for extracting an elongated rectangle in the local window specified by the hypotheses (Figure 10(b)). Figure 10(c) illustrates the elongated rectangle detected by the low level vision expert (LLVE) for top-down image segmentation in SIGMA. In what follows, we will describe the knowledge representation and reasoning methods of LLVE. (A summary of the reasoning mechanism of SIGMA will be given in Section IV.D.) The task of LLVE is to automatically extract image features (e.g., lines and regions) that satisfy the constraints specified in a given goal. "Find a rectangle whose area size is between 100 and 200 pixels" is a typical example of the goal specification. LLVE uses two types of knowledge to conduct automatic image segmentation:
1. Knowledge about fundamental concepts in image segmentation: types of image features extractable from an image and types of image processing operators.
2. Know-how about image segmentation techniques: how to combine operators effectively and conduct trial-and-error analysis by selecting operators and adjusting their parameters.
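To make the form of such a goal concrete, the following is a minimal sketch, in Python, of how a goal specification of this kind might be encoded; the field names (target_feature, area, window, accuracy, allowable_cost) are illustrative inventions, not those of the actual LLVE implementation.

# Hypothetical encoding of an LLVE-style goal; all field names are invented
# for illustration. The accuracy and allowable_cost entries mirror the
# ACCURACY and ALLOWABLE-COST goal properties mentioned in the text.
goal = {
    "target_feature": "rectangle",
    "constraints": {
        "area": (100, 200),              # acceptable size range in pixels
        "window": ((0, 0), (511, 511)),  # upper-left and lower-right corners
    },
    "accuracy": "HIGH",
    "allowable_cost": 5.0,               # rough bound on computation time
}

def area_ok(feature_instance, goal):
    # Check one extracted feature instance against the area constraint.
    lo, hi = goal["constraints"]["area"]
    return lo <= feature_instance["area"] <= hi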
The knowledge of type 1 can be defined in a formal way, while that of type 2 involves many heuristics. In LLVE the former type of knowledge is represented by a network describing the type structure in the task domain of image segmentation and the latter by a set of production rules. Suppose we want to extract line segments from a gray picture. Gray picture ⇒ (edge detection) ⇒ edge picture ⇒ (thresholding) ⇒ edge point ⇒ (linking) ⇒ line segment would be a typical analysis process to satisfy the objective. We call gray picture, edge picture, edge point, and line segment in the preceding example image features, and edge detection, thresholding, and linking transfer processes. An image feature refers to a type of information extractable from raw picture data, and a transfer process an abstract
FIGURE 11. Image features, transfer processes, and process sequences.
algorithm¹ that analyzes its input image feature to generate its output image feature. Usually, to extract a specific image feature from a raw picture, we have to combine several different transfer processes as shown in the preceding example. We call such an ordered sequence of transfer processes a process sequence. Figure 11 illustrates the conceptual relations among image features, transfer processes, and process sequences. Note again that a transfer process denotes a class (type) of practical algorithms with similar functions. LLVE uses image features and transfer processes as fundamental descriptive terms to represent the knowledge about image segmentation. Figure 12 illustrates the network knowledge organization used by the system, where each ellipse denotes an image feature and a directed arc a transfer process. Several kinds of icons are used to represent the heads of arcs, which illustrate types of transfer processes. (Here we will not go into detailed discussions on the meanings of such types.) Each image feature is represented by a frame (Minsky, 1975). We call it the image feature frame. A frame is a data structure to represent structured knowledge about an object. It consists of a set of slots, where various types of information, such as attributes of the object, relations to other objects, and procedures to compute properties, are stored. Each image feature frame consists of a set of slots representing various attributes of the image feature. For example, the image feature frame for Line contains slots representing starting point, chain code, length, etc. (Figure 13(a)). In addition to slots for such attributes, every image feature frame
¹ From the viewpoint of software type theory (Harland, 1984), a transfer process can also be considered as a type: a functional type. That is, all functions that transform data of type A into type B are of the same functional type denoted by A → B.
FIGURE 12. Network knowledge organization.
includes two additional slots used by the system: MADE-BY slot to represent the transfer process that generated the image feature, and FROM slot to represent its source image feature from which that image feature was generated. The system uses these two slots to maintain the analysis history. An image feature frame defines a class of practical image features. Real data extracted from an image are represented by its instances. Figure 13(b) shows an instance of the image feature frame for Line. All of the slots are filled with values representing the attributes of the detected line, while the image feature frame defines the slot structure. In image segmentation, image feature frame instances are analyzed by transfer processes, and new instances are generated. We also use a frame to represent a transfer process. Figure 13(c) shows the structure of the frame to represent transfer processes. All transfer processes have the same slot structure, while those of image feature frames vary depending on their attributes. As in the case of image feature frames, every really executed analysis process is represented by an instance of such frame, where all slots are filled with values representing the name of an executed practical algorithm, parameter values, input and output image feature frame instances, and computation time. Note that the value in the MADE-BY slot in Figure 13(b), (LINKING 890), denotes the name of an instance of transfer process frame LINKING, which was applied to an instance of image feature
FIGURE 13. Symbolic representation of image features and transfer processes: (a) image feature frame for Line; (b) instance of the image feature frame for Line; (c) transfer process frame.
frame edge point, (Edge Point 567), to generate an instance of image feature frame line, (Line 123). In addition to the network describing the fundamental terminology for image segmentation, knowledge and know-how are required to realize effective image segmentation. The following list gives the tasks requiring knowledge.
1. Selection of an Appropriate Process Sequence (Effective Combination of Transfer Processes). Usually, there are many different ways of extracting a specific image feature. Such different analysis processes are
represented as multiple directed paths (i.e., sequences of arcs) in the network. In other words, various different process sequences are embedded in the network. For example, there are two major paths to extract a region from a gray picture: one via line and the other via homogeneous region. Knowledge-based reasoning is required to find an appropriate process sequence. 2. Selection of a Practical Algorithm and Its Optimal Parameters. As noted before, a group of executable image processing operators are associated with each transfer process (see Figure 11). For example, the transfer process (i.e., arc in Figure 12) from gray picture to edge picture represents edge detection, with which various practical edge detection operators like the Sobel and Laplacian are associated. The system has to reason about which practical operator is useful for each transfer process in the selected process sequence. Moreover, since operators have many adjustable parameters, the system has to determine optimal values for them. 3. Realization of Robust Image Segmentation. We need many heuristics to make image segmentation robust and adaptive for images of different qualities. For example, if an image is noisy, one should smooth it before segmentation, and it is preferable to apply nonmaximum suppression to an edge picture before edge tracking. Moreover, image segmentation involves many steps of trial-and-error analysis; it is usually hard to determine the best operator and optimal parameter values before analyzing a given image. Thus, in order to make image segmentation adaptive, we need know-how to conduct trial-and-error analysis: how to select an alternative operator and adjust parameter values based on preceding analysis results. In LLVE, these knowledge and know-how are represented by production rules. Table I summarizes seven types of production rules used by the system. Each rule has an AT part in addition to ordinary IF and THEN parts. Conceptually, the AT part specifies where in the network each rule is to be stored: which node or arc in the network. As will be described later, the reasoning engine searches the network for the most promising process sequence to extract the image feature specified in a given goal. When the search process visits a node or follows an arc, rules stored in it are activated to guide the subsequent search. In other words, image features and transfer processes define the structure to be searched and production rules control the search. Figure 14 shows an example of the reasoning and segmentation by LLVE. The goal specification given to the system is as follows (Figure 14(b)). Find a rectangle(s) in the image shown in Figure 14(a) whose area size is between 100 and 400 and that is located in the specified window (upper-left and
TABLE I
TYPES OF PRODUCTION RULES USED IN LLVE

Transfer Process Selection Rule
  Function: Select promising transfer processes during the search.
  Example:  AT Region, IF ISOLATION is NO, THEN select CLOSED-SEGMENT-TO-REGION.

Transfer Process Dependency Rule
  Function: Describe the dependency between transfer processes.
  Example:  AT HOMOGENEOUS-REGION-TO-REGION, IF NOISE-LEVEL is not LOW, THEN include BINARY-PICTURE-TO-BINARY-PICTURE.

Cost Computation Rule
  Function: Estimate the processing cost of transfer processes.
  Example:  AT EDGE-POINT-TO-LINE, IF algorithm equals linking and COMPLEXITY is HIGH, THEN (equation to compute the cost).

Constraint Transformation Rule
  Function: Transform constraints on the output image feature into those on the input image feature of the transfer process.
  Example:  AT REGION-TO-POLYGON FOR area, IF T, THEN set POLYGON.AREA to REGION.AREA.

Algorithm Selection Rule
  Function: Select a promising executable algorithm for the transfer process.
  Example:  AT EDGE-POINT-TO-LINEAR-SEGMENT, IF ACCURACY is HIGH, THEN select fine Hough transform.

Parameter Selection Rule
  Function: Select an appropriate parameter value for the algorithm.
  Example:  FOR thresholding, IF failed before, THEN increase the threshold by 20%.

Failure Rule
  Function: Specify alternatives to recover from failure.
  Example:  AT POLYGON-TO-RECTANGLE, IF T, THEN FIND-ANOTHER-PATH from Polygon.
FIGURE 14. Example of image segmentation by LLVE.
lower-right corners: (165, 39) and (204, 115)). COMPLEXITY in the goal specifies the complexity of the image data under analysis. To describe other image properties, predicates like ISOLATION, CONTRAST, NOISE-LEVEL, and TEXTURE are prepared. In addition to these specifications, we can describe ACCURACY and ALLOWABLE-COST in a goal. The former specifies how accurately the analysis should be done and the latter how much computation time can be spent. Given the goal, the system first reasons about which process sequence is best to extract the specified rectangle from the window. This reasoning is done by searching for the most promising path in the network connecting gray picture and rectangle. The search is initiated from the goal image feature, rectangle, and traces arcs in the network backward to the source image feature, gray picture. In principle, the search process tries to find the minimum cost path between the source and goal image features. Note that the cost of each transfer process is dynamically estimated by cost computation rules (Table I), because the computation cost varies depending on specified goals and properties of the image data under analysis. Transfer process selection rules and transfer process dependency rules (Table I) represent know-how about how to combine transfer processes and guide the search. During the search, they are activated to prune search paths and introduce auxiliary transfer processes to make a selected process sequence effective, respectively. The pruning and introduction of auxiliary transfer processes by such rules take precedence over the minimum cost search: an expensive path can be selected if its utility and effectiveness are acknowledged by rules. In the current example, the process sequence shown in Figure 14(c) was selected. The self-looping transfer process at binary picture means noise elimination. This auxiliary transfer process was introduced to realize effective analysis by a transfer process dependency rule. After the search, the system executes the transfer processes included in the selected path one by one from the beginning (Figure 14(d)). Algorithm and parameter selection rules (Table I) are activated to select an appropriate executable operator and its parameters for each transfer process. Constraint transformation rules (Table I) transform constraints on the goal image feature (i.e., rectangle) into those on intermediate image features (i.e., nodes) in the selected path. When a transfer process (i.e., selected operator) is executed, an instance(s) of its output image feature is generated, and its attributes are compared with the constraints transformed onto the output image feature. If there exists at least one instance that satisfies the constraints, the transfer process is regarded as successful, and the next transfer process is executed by using that instance as input data. In the example, however, the transfer process POLYGON-TO-RECTANGLE failed: no instance of rectangle satisfying the given constraints was generated.
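The execute-and-check cycle just described can be summarized by the following Python sketch; the names (run_process_sequence, the shape of sequence and constraints) are invented for illustration and do not reproduce the actual LLVE code.

def run_process_sequence(image, sequence, constraints, failure_rules):
    # `sequence` is a list of (transfer_process_name, operator, params) chosen
    # by the search; `constraints[name]` is a predicate on output instances.
    data = image
    for name, operator, params in sequence:
        instances = operator(data, **params)          # run the selected algorithm
        good = [inst for inst in instances if constraints[name](inst)]
        if not good:                                  # the transfer process failed
            recover = failure_rules.get(name)
            if recover is None:
                return None                           # no recovery is possible
            return recover(image, sequence, constraints)  # e.g., change a threshold and retry
        data = good[0]                                # feed the next transfer process
    return data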
In the case of such a failure, the system activates failure rules (Table I) associated with the failed transfer process. The rule associated with POLYGON-TO-RECTANGLE suggested "retry the same process sequence by changing a threshold value for binarization" (Figure 14(e)). This rule embodies the knowledge that the result of binarization is very sensitive to a threshold value, so that the modification of the threshold can improve the analysis result. By this modification, a rectangle satisfying the goal specification was successfully extracted (Figure 14(f)). In general, failure rules modify operators, parameters, and process sequences to recover from failure. The system returns failure when no possible modification by failure rules is left. As described, all reasoning processes are guided by production rules. Therefore, not all possible process sequences and operators are necessarily tried. In other words, the amount and quality of the knowledge represented by the rules determine the capability of the system. The current system contains about 400 rules, but they are far from sufficient. Moreover, as shown in Figure 12, the image features extractable by the system are confined to very primitive ones. LLVE cannot directly extract complex image features such as dark regions with narrow bottlenecks and bright holes. (We will describe an ESIPA for detecting complex image features with internal structures in Section IV.D.) In addition to these problems, the analysis processes executed by LLVE are confined to sequential combinations of transfer processes, which, as in the other ESIPAs, limits the capability of image analysis. We will discuss this point in the next section.
III. REPRESENTING KNOWLEDGE ABOUT IMAGE ANALYSIS STRATEGIES

While objectives, knowledge representation, and reasoning methods of the ESIPAs discussed in Section II are different, they share a common general characteristic; that is, they are sequential reasoning systems. Their sequentiality is twofold:
1. Object-Level Sequentiality. Objects here stand for the entities that expert
systems process and reason about. In the case of ESIPAs, objects are image analysis processes to be composed or executed. Thus object-level sequentiality implies that all image analysis processes composed and executed by the ESIPAs so far described are confined to sequential combinations of primitive operators. Note that in REM, although a nonsequential combination, the split decomposition transform, was proposed, no practical algorithm to realize it was described or implemented.
2. Reasoning-Level Sequentiality. The reasoning processes of the ESIPAs
are realized by sequential search methods to find a solution in a problem state space. Figures 3 and 8 illustrate typical examples of such sequential search processes. While backtracking and some meta (control) rules were incorporated to augment the flexibility of the search, no concurrent, parallel, or cooperative reasoning was used.
In some sense, these sequentialities are natural partly because many standard image analysis processes can be modeled by sequential combinations of primitive operators, and partly because reasoning engines of ESIPAs can be easily realized by ordinary sequential reasoning mechanisms such as backward-forward chaining in expert systems and linear resolution in PROLOG. Such sequentialities, however, put significant limitations on the systems’ capabilities, which cannot be overcome no matter how much knowledge is incorporated. In the history of image processing research, many sophisticated image analysis strategies such as coarse-to-fine image analysis and integration of edge-based and region-based analyses have been developed to improve the effectiveness and reliability of image analysis. However, image analysis processes based on such strategies cannot be composed as sequential combinations of primitive operators. At the reasoning level, on the other hand, concurrent and parallel reasoning methods are being studied intensively in artificial intelligence to improve flexibility, reliability, and robustness as well as efficiency of intelligent systems (Bond and Gasser, 1988). By introducing concurrent or parallel processing we can get much more than the speed-up of computation; the cooperation and integration among concurrent and parallel reasoning modules allow various substantial improvements of systems’ reasoning capabilities, which are very difficult to realize by sequential reasoning systems. In this section, we will describe the following two advanced ESIPAs to see how reliable and robust image analysis can be realized by introducing nonsequentialities at the object and reasoning levels respectively: 1. Heterogeneous combinations of image processing operators, 2. Cooperative integration of multiple stereo vision algorithms. A . Heterogeneous Combination of Image Processing Operators
In the history of digital image processing, many image analysis strategies have been proposed to increase the performance of image analysis: feedback (Nagao, 1984) and plan-guided (Kelly, 1971) analyses, coarse-to-fine image analysis based on multiresolution images, integration of edge-based and region-based analyses, various optimization methods, and so on.
Although these strategies are useful to improve the accuracy, reliability, and efficiency of the analysis, the ESIPAs described so far cannot utilize them at all. Therefore, how to incorporate the analysis strategies into ESIPAs is an important problem to improve their image analysis capabilities. In what follows, we discuss two schemes of representing image analysis strategies: one from a software engineering viewpoint and the other from a knowledge representation viewpoint.

1. Functional Programming Language for Heterogeneous Operator Combination

All ESIPAs except the one described in Section II.F compose image analysis processes by combining primitive operators stored in a library. However, the composition is confined to sequential combinations of the operators. That is, the output of an operator is successively passed to the next operator as its input data. Although many standard image analysis processes can be represented as sequential combinations of operators, their performance is limited. When we describe an image processing operator by a function, the result of applying operator O to image D is denoted by O(D). Thus the sequential combination of operators can be described as

O_n(O_{n-1}(... O_2(O_1(D)) ...)),   (1)

where D denotes an input image and O_1, O_2, ..., O_{n-1}, O_n are functions representing image processing operators. We omit the arguments of functions representing parameters for each operator. These functions are successively applied in this order: the innermost function is applied first to produce the data for the second innermost function, and so on. This evaluation order is called the applicative order in functional programming. We proposed the following three types of heterogeneous (i.e., nonsequential) combination functions to represent typical image analysis strategies (Matsuyama, Murayama, and Ito, 1988):
1. Combination of multiple analysis results,
2. Mask controlled operation,
3. Parameter optimization.
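Before turning to these three heterogeneous functions, note that the purely sequential combination of (1) is ordinary function composition in applicative order; a minimal Python sketch follows (the operator names in the usage comment are placeholders, not library functions):

from functools import reduce

def compose_sequence(operators):
    # Return a function that applies the operators in applicative order:
    # the first operator in the list is applied to the input image first.
    return lambda image: reduce(lambda data, op: op(data), operators, image)

# usage (placeholder operator names):
#   pipeline = compose_sequence([smooth, detect_edges, threshold, link_segments])
#   result = pipeline(input_image)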
While we implemented a simple functional programming language with these functions, we will only describe illustrative programs to see how complex image analysis processes can be described, without going into implementation details. The combination of type 1 above is described as

COMBINE(O_1(D_1), ..., O_n(D_n) BY C),   (2)
FIGURE 15. Heterogeneous combinations of operators: (a) combination of multiple analysis results; (b) mask controlled operation; (c) parameter optimization.
where D_1, ..., D_n are input images for operators O_1, ..., O_n, respectively (Figure 15(a)). C denotes a function that combines O_i(D_i) (i = 1, ..., n). Note that C is just a function that takes multiple images as its input data and combines them to produce one output image. Therefore we could describe function (2) as

C(O_1(D_1), ..., O_n(D_n)),   (3)

or more simply

C(D'_1, ..., D'_n),   (4)

where D'_i = O_i(D_i) (i = 1, ..., n). In other words, the function COMBINE is introduced just to explicitly describe the combination of multiple image data, and the keyword BY to specify the function name for the combination. These syntactic structures improve the readability of programs consisting of nested function calls. (See examples given later.)
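A minimal sketch of a COMBINE-style combinator in Python is given below; it assumes each operator and the combination function work on array-like images, and the usage comment is purely illustrative.

def combine(operator_image_pairs, by):
    # COMBINE(O1(D1), ..., On(Dn) BY C): apply each operator to its own input
    # image and merge the results with the combination function `by`.
    results = [op(img) for op, img in operator_image_pairs]
    return by(*results)

# e.g., a pixelwise logical AND of two binary analysis results (illustrative):
#   mask = combine([(edge, D), (region_boundary, D)], by=lambda a, b: a & b)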
Typical examples of the processing done by the combination function C are (a) pixelwise logical and arithmetic operations between images, (b) spatial combination (i.e., mosaicking) of local analysis results, and (c) consistency examination among multiple analysis results. Combination functions of type (a) are trivial. Practical examples of types (b) and (c) are given later.

The mask controlled operation is a popular method to analyze specified (focused) local areas in an image, by which the focus of attention (Nagao, 1984) in image analysis can be realized. That is, suppose an entire image of a cluttered scene is too complex to be segmented into meaningful image features; due to various types of objects, complex lighting conditions, mutual reflections, and shadows, the image quality changes so widely depending on locations in the image that no operator based on a single image model works correctly for the entire image. However, if we can confine the domain of analysis to a small local area in the image, even a simple-minded operator can work correctly to extract a meaningful image feature. This is because the image quality in such a small area is almost homogeneous and the image structure is simple. Of course, it is not easy to find a local area where a meaningful image feature is located. Sophisticated reasoning mechanisms for such focus of attention were described in Selfridge (1982) and Matsuyama and Hwang (1990). The mask controlled operation is described as

MASK(O(D) BY M).   (5)

M means a logical mask image to specify local areas in image D where operator O is to be applied (Figure 15(b)). Each pixel in the mask image has either TRUE or FALSE as its value. The output of function MASK is defined as

O(D(x, y))   if M(x, y) = TRUE,
undefined    if M(x, y) = FALSE,   (6)
where D(x, y) and M(x, y) denote pixels in D and M, respectively. Mask M itself can be an output of another operator, which extracts interesting regions. Note that we cannot write function (5) as

MASK(D' BY M),   (7)

where D' = O(D). This is because the function MASK really applies function O to D only in those areas specified by M. In this sense, function O itself is an argument of the function MASK. A function that takes another function as its argument is called a functional in computer science. To make it explicit that MASK is a functional, we may write function (5) as

MASK(O, D, M).   (8)
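One possible reading of MASK as a functional is sketched below in Python with NumPy; treating undefined pixels as NaN and restricting the operator to the bounding box of the mask are choices of this sketch, not part of the original definition.

import numpy as np

def mask(op, image, m):
    # MASK(O, D, M): apply operator `op` only inside the areas marked TRUE in
    # the logical mask `m`; pixels outside remain undefined (NaN here).
    out = np.full(image.shape, np.nan, dtype=float)
    rows, cols = np.nonzero(m)
    if rows.size == 0:
        return out
    r0, r1 = rows.min(), rows.max() + 1      # bounding box of the focused area
    c0, c1 = cols.min(), cols.max() + 1
    local = op(image[r0:r1, c0:c1])          # the operator runs only on the local window
    sel = m[r0:r1, c0:c1]
    out[r0:r1, c0:c1][sel] = local[sel]
    return out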
We use the keyword BY in (5) just for easy reading, as in the case of (2).

The parameter optimization is very useful to determine the optimal parameter value for an operator. It is described as

OPTIMIZE(O(D, ..., *, ...), n_1, n_2, n_3 BY D_r AT E),   (9)
where D denotes an input image to be processed, O an image processing operator, D_r the reference image, and E the evaluation function. The symbol * denotes the parameter for operator O to be optimized. The parameter is changed from n_1 to n_2 at the increment of n_3, and operator O with each different parameter is applied to D repeatedly. Then the processed results are evaluated by evaluation function E using D_r as the reference: the function E compares each processed result with D_r and measures their mutual consistency. The function OPTIMIZE selects and outputs the best result among them: it selects the one whose consistency measure is the highest (Figure 15(c)). Note that the function OPTIMIZE is also a functional and that the keywords BY and AT are used for easy reading.

Using these three types of combination functions, we can describe complex heterogeneous analysis processes in a compact way. In what follows, we will demonstrate the effectiveness of the heterogeneous combinations with illustrative examples.

The first example is an integration of edge-based and region-based analyses. As is well known, it is very hard for an ordinary region segmentation operator to perfectly partition a complex image into meaningful regions. While most segmented regions correspond to objects in the scene, some cover multiple objects and others only parts of objects. Figure 16(b) is the result of ordinary region segmentation of Figure 16(a), where a shadowed face of the cube and a shadowed area of the floor were merged into one region (the rotated L-shaped region in Figure 16(b)). An image analysis strategy to detect and recover from such segmentation errors is to integrate both region segmentation and edge detection. That is, since these analysis methods are complementary to each other, errors caused by one method can be corrected by the other. Moreover, we can evaluate the reliability of the analysis based on the degree to which the results of these complementary methods coincide with each other. The following function is an example of the integration between edge-based and region-based analyses.

MASK(region(D) BY binarym(COMBINE(region(D), edge(D) BY edge-count), threshold))   (10)
FIGURE 16. Combining edge-based and region-based analyses: (a) input image; (b) result of region segmentation; (c) detected edge points; (d) result of combining edge-based and region-based analyses.
Figure 16(d) shows the analysis result by this function. First a pair of the innermost functions region and edge are applied to D, an input image: input image D (Figure 16(a)) is segmented into regions by the function region (Figure 16(b)) and edges are extracted by the function edge (Figure 16(c)). The function edge means edge enhancement followed by binarization and produces a binary picture representing edge points. That is, the function edge itself stands for a sequential combination of two primitive image processing operators (functions) in the library: edge(D) = binary(sobel(D), threshold-value),
(11)
where sobel denotes a filtering operation to enhance edges and threshold-value may be given by the user or computed by other functions. Then, the results of edge-based and region-based segmentations are combined by the function edge-count. It enumerates the number of edge points in each segmented region and outputs the image where each pixel is given the number of edge points in the region to which it belongs. The function binarym performs binarization using the specified threshold (i.e., threshold in function (10)) and outputs a logical mask image. Finally, the
FIGURE 17. Optimization of threshold: (a) image of very low contrast; (b) result of ordinary binarization, the threshold value was determined by Ohtsu's (1979) method; (c) detected edge points; (d) result of the sophisticated binarization, the threshold value was determined by the parameter optimization operator.
areas specified by this mask, which denote regions including many edge points in their interiors, are analyzed by the function region again to segment such nonhomogeneous regions. In the example, the rotated L-shaped region in Figure 16(b) was regarded as nonhomogeneous and segmented into two small regions in Figure 16(d). As is seen in Figure 16(a), the contrast of the boundary between these two newly segmented regions is too low to be detected by the initial uniform region segmentation; i.e., by the first application of the function region to input image D. It should be noted that Figure 16(d) includes those homogeneous regions that were not processed by the last mask operation as well as the regions generated by function (10). Thus, to obtain Figure 16(d), we applied another function COMBINE that spatially combined (mosaicked) into one image both the newly segmented regions produced by function (10) and the homogeneous regions detected at the initial region segmentation. Figure 17 shows an example of the parameter optimization. It is usually difficult to determine the optimal threshold for binarization in an image of low contrast (Figure 17(a)). Figure 17(b) shows the result of binarization by
using the threshold value determined by Ohtsu's (1979) method, where most of the elliptic particles could not be detected correctly. An effective strategy to determine a better threshold value is to use spatial information; the threshold value in Ohtsu's method was determined based only on the gray-level histogram.

OPTIMIZE(binary(D, *), 40, 60, 2 BY edge(D) AT efunc1)   (12)
realizes the binarization strategy proposed by Milgram (1979). First, the function edge defined in (11) is applied to input image D to detect edge points (Figure 17(c)). The threshold-value in function (11) was determined by Ohtsu's method. Then the evaluation function efunc1 examines the consistency between the resulting edge image and the region boundaries obtained by the binarization operator, binary: how well the region boundaries coincide with the detected edge points. Function OPTIMIZE changes the threshold value from 40 to 60 by 2 and outputs the binary picture processed with the optimal threshold value determined by efunc1 (Figure 17(d)).

The third example is so-called coarse-to-fine image analysis. Using images of multiple, different resolutions is an effective image analysis strategy to realize the focus of attention mechanism (Rosenfeld, 1984). The analysis strategy based on multiresolution images is called coarse-to-fine analysis and has been used for various purposes: edge detection, motion analysis, stereo matching, and so on. Figure 18 illustrates an example of edge detection using multiresolution images. Figures 18(a) and (b) show an original noisy image and the edge points detected from it, respectively.

MASK(binary(sobel(D), threshold) BY enlarge(binarym(sobel(shrink(D)), threshold)))   (13)
realizes the plan-guided edge detection proposed by Kelly (1971). First the innermost function shrink is applied to input image D to reduce the size of the image, and edge points are extracted from the shrunken image by the function sobel followed by binarym. Note that the function sobel enhances edges and that the function binarym produces a logical mask to be used for the subsequent mask operation. Then the logical mask image representing the detected edge points is enlarged by the function enlarge. The enlarged image (Figure 18(c)) is used as the mask for detecting edge points in the original image. (The functions shrink and enlarge modify the size (resolution) of an image by 0.5 and 2.0, respectively.) In the last mask operation, the functions sobel and binary (i.e., binary(sobel(D), threshold)) are applied only for those pixels specified by the mask in Figure 18(c). The threshold in (13) refers to a given threshold value for the binarization. Figure 18(d) shows the output image by function (13). By recursively applying this plan-guided edge
FIGURE 18. Plan-guided edge detection: (a) noisy image; (b) result of ordinary edge detection; (c) mask image for detailed edge detection; (d) result of the plan-guided edge detection.
detection, the analysis based on the pyramid data structure (Rosenfeld, 1984) (hierarchical multiresolution images) can be realized. As demonstrated in these examples, we can describe various sophisticated image analysis strategies by using the proposed combination functions. Their incorporation into ordinary image processing software and ESIPAs will greatly improve the performance of image analysis.
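A minimal sketch of an OPTIMIZE-style combinator, in Python, corresponding to functions (9) and (12); the convention that a higher score means better consistency is an assumption of this sketch rather than part of the original language.

def optimize(op, image, n1, n2, n3, by, at):
    # OPTIMIZE(O(D, *), n1, n2, n3 BY Dr AT E): sweep the free parameter of
    # `op` from n1 to n2 in steps of n3, score each result against the
    # reference image `by` with the evaluation function `at`, and return
    # the best-scoring result.
    best_result, best_score = None, float("-inf")
    value = n1
    while value <= n2:
        candidate = op(image, value)
        score = at(candidate, by)   # e.g., agreement of boundaries with edge points
        if score > best_score:
            best_result, best_score = candidate, score
        value += n3
    return best_result

# e.g., the Milgram-style thresholding of function (12), with binary, edge,
# and efunc1 as in the text:
#   best = optimize(binary, D, 40, 60, 2, by=edge(D), at=efunc1)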
2. Representing Image Analysis Strategies in LLVE

LLVE, described in Section II.G, uses the network structure to represent the knowledge about image processing (Figure 12). However, the input of each transfer process (i.e., an arc in the network) is limited to a single image feature, and the process sequence executed is nothing but a sequential combination of transfer processes. In order to represent the heterogeneous combinations described previously, we have to incorporate transfer processes with multiple, different input image features. To represent such extended transfer processes, we introduced the knowledge representation using a hypergraph. A hypergraph is a generalized graph (network) consisting of a set of nodes and a set of hyperarcs. The generalization is in that, while an arc in a graph connects a pair of nodes, a hyperarc connects a node with a set of nodes.
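A possible encoding of such a hypergraph in Python is sketched below; the class and field names are illustrative inventions, not those of LLVE.

from dataclasses import dataclass, field

@dataclass
class HyperArc:
    name: str        # transfer process, e.g., "mask-operation" (illustrative)
    inputs: tuple    # names of the input image-feature nodes
    output: str      # name of the output image-feature node

@dataclass
class HyperGraph:
    nodes: set = field(default_factory=set)    # image feature types
    arcs: list = field(default_factory=list)   # transfer processes (hyperarcs)

    def incoming(self, node):
        # Hyperarcs producing `node`; these are followed backward in the search.
        return [a for a in self.arcs if a.output == node]

# e.g., the mask operation of function (10) as a hyperarc:
#   g = HyperGraph({"Gray-Picture", "Mask-Picture", "Homogeneous-Region"})
#   g.arcs.append(HyperArc("mask-operation",
#                          ("Gray-Picture", "Mask-Picture"), "Homogeneous-Region"))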
FIGURE 19. Knowledge representation by a hypergraph.
We can use a hyperarc to represent a transfer process with multiple input image features. That is, each function for the heterogeneous combinations, COMBINE, MASK, and OPTIMIZE, is represented by a hyperarc in the hypergraph. In Figure 19, a hyperarc is illustrated by a set of directed arcs grouped by a circular line. For example, the result combination in function (10) can be represented by the hyperarc connecting Edge-Point and Homogeneous-Region to Mask-Picture, the mask operation in function (10) by the hyperarc connecting Gray-Picture and Mask-Picture to Homogeneous-Region, and the binarization (optimization) process of function (12) by the hyperarc connecting Gray-Picture and Edge-Point to Binary-Picture in Figure 19. As discussed earlier, image analysis using multiple resolutions has proven a very effective image analysis strategy. However, all image features in LLVE are considered to be at a single resolution. So we need to further extend its knowledge representation so that image analysis based on multiple resolutions can be realized. One straightforward idea is to use a hierarchical hypergraph (Figure 20). Each level of the hierarchy stores a subhypergraph, representing the knowledge about image features and transfer processes at a certain resolution. Operations that change the resolution, such as shrinking or
FIGURE 20. Hierarchical hypergraph.
enlargement of an image, are represented by transfer processes across the levels. However, since most image processing operators are defined independently of spatial resolution, subhypergraphs at different levels take the same structure. In order to eliminate this redundancy, we introduced into each image feature a parameter representing its resolution. For example, Gray-Picture(s) denotes the image feature of Gray-Picture type at resolution s. Using such image features with the resolution parameter, we can compress the hierarchical hypergraph in Figure 20 into a single-layered parameterized hypergraph. Those transfer processes that change resolutions, i.e., those arcs across the levels in Figure 20, are represented by looping arcs in the parameterized hypergraph. Note that, while the (abstract) knowledge itself is represented by this single-layered parameterized hypergraph, it is instantiated and expanded into a multilayered (concrete) hypergraph by the search (reasoning) process. Figure 21(a) illustrates a parameterized hypergraph representing the analysis strategy for edge detection using multiple resolutions. Figure 21(b) shows an instantiated hypergraph with fixed resolution parameters, which was generated from the parameterized hypergraph in Figure 21(a). While Figure 21(a) represents the abstract knowledge, Figure 21(b) shows the heterogeneous analysis process to be executed. (The instantiation process is described later.) While the network knowledge representation in LLVE is augmented to the
FIGURE 21. Instantiation of a parameterized hypergraph: (a) parameterized hypergraph representing the image analysis strategy using multiple resolutions; (b) an instantiated hypergraph with fixed resolution parameters.
hypergraph with resolution parameters, the other knowledge represented by a set of production rules need not be changed. We can use the same search method to reason about the most effective image analysis process. Note, however, that, while the original reasoning process only searches a connected path (process sequence) in the network, the new reasoning process should generate an instantiated hypergraph as well as search the parameterized hypergraph. Let us explain the process of instantiating the parameterized hypergraph
in Figure 21(a), i.e., the reasoning process using the hypergraph knowledge representation. When we are given an input image, the Gray-Picture(s) node in Figure 21(a) is instantiated and its resolution parameter is set to 1; that is, a Gray-Picture(1) node is generated. Suppose the current goal is to find Edge-Point(1). Given such a goal, Edge-Point(s) in Figure 21(a) is also instantiated to generate an Edge-Point(1) node. At this initial stage, the instantiated hypergraph consists of two isolated nodes: Gray-Picture(1) and Edge-Point(1). LLVE searches the abstract hypergraph in Figure 21(a) to generate an instantiated hypergraph connecting these two nodes. The search and instantiation of Figure 21(a) are done as follows.

Step 1. As described previously, Gray-Picture(1) and Edge-Point(1) are generated first as seed nodes in the instantiated hypergraph.

Step 2. The search in the parameterized hypergraph is started at the goal image feature node, Edge-Point(s) in Figure 21(a), and follows hyperarcs backward toward Gray-Picture(s). (This backward search is the same as that used in the original LLVE.) In Figure 21(a), two incoming arcs (transfer processes) are attached to Edge-Point(s): enlarge and binarization.

Step 3. Suppose a production rule associated with Edge-Point(s) prohibits the search process from following the former arc (i.e., enlarge). The rationale behind this rule is that to apply transfer process enlarge, process A in Figure 21(a) for the mask operation must be included in the already traversed path. In other words, transfer process enlarge must be used in connection with the mask operation represented by process A to realize the plan-guided edge detection. This rule is realized by the transfer process selection rule listed in Table I.

Step 4. Then, the search process follows the binarization arc to Edge-Picture(s). At the same time, the search process instantiates Edge-Picture(s) to generate an Edge-Picture(1) node and connects it to Edge-Point(1) via the binarization arc in the instantiated hypergraph.

Step 5. There are two incoming arcs at Edge-Picture(s): a hyperarc representing process A and an ordinary arc representing edge detection (Figure 21(a)). Suppose we take the hyperarc representing process A. If we followed the other arc, the search would be terminated to generate the process sequence consisting of edge detection followed by binarization.

Step 6. Since the selected hyperarc leads to both Gray-Picture(s) and Mask-Picture(s), the search process has to follow these two paths (partial hyperarcs). First, the path leading to Gray-Picture(s) (denoted as P1 in Figure 21(a)) is terminated at this node, since the input data to be processed is stored at Gray-Picture(1). (Gray-Picture(1) has already been included in the instantiated hypergraph.) Then the Edge-Picture(1) node and the Gray-Picture(1) node are connected by a partial hyperarc denoted by P1
in the instantiated hypergraph. On the other hand, the search process traces the other path leading to Mask-Picture(s) (i.e., P2 in Figure 21(a)) and instantiates Mask-Picture(s) to generate a Mask-Picture(1) node, which then is added to the instantiated hypergraph and is connected with the Edge-Picture(1) node.

Step 7. Next, the search process traces the incoming arc at Mask-Picture(s) to visit again the Edge-Point(s) node. Since process A has been included in the already traversed path (i.e., the instantiated hypergraph so far constructed), the enlarge process is selected by the transfer process selection rule associated with Edge-Point(s) (see Step 3). When transfer process enlarge is selected, a production rule associated with it is activated. This rule represents the dependency between transfer processes (i.e., the transfer process dependency rule in Table I). The knowledge represented by this rule is that to apply transfer process enlarge, transfer process shrink must be applied first: both of these transfer processes must be included sequentially in the instantiated hypergraph. Based on this knowledge, Gray-Picture(1/2) and Edge-Point(1/2) nodes are generated and connected to Gray-Picture(1) and Edge-Point(1) by shrink and enlarge arcs in the instantiated hypergraph, respectively.

Step 8. After applying the preceding transfer process dependency rule, the goal of the search is reduced to finding a path connecting Gray-Picture(1/2) and Edge-Point(1/2). Since this goal is exactly the same as the original one except for the resolution, the same search process as above is conducted again in the parameterized hypergraph.

Step 9. This time, suppose we take edge detection rather than process A at Step 5. Then, the search path from Edge-Point(1/2) is terminated at Gray-Picture(1/2) via Edge-Picture(1/2), and the completely instantiated hypergraph in Figure 21(b) is constructed. If we took process A again, the resolution would be reduced to 1/4 and the same search process would be repeated once again. By iterating such search, we could generate the image analysis process (instantiated hypergraph) based on the pyramid data structure.
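The backward search and instantiation described in Steps 1-9 might be sketched as follows (Python, using the HyperGraph sketch given earlier); the admissibility test stands in for the transfer process selection and dependency rules, and the whole function is only an illustration of the AND-OR flavor of the search, not the actual algorithm.

def instantiate(graph, goal, source, res=1.0,
                admissible=lambda arc, path: True, path=()):
    # Backward, AND-OR style expansion of the parameterized hypergraph: pick an
    # admissible incoming hyperarc of the goal feature (OR choice), then expand
    # every one of its input features (AND branches), carrying the resolution
    # parameter along. `source` is the feature at which the input data is stored.
    node = (goal, res)
    if goal == source or len(path) > 12:        # input data reached, or depth guard
        return [node]
    for arc in graph.incoming(goal):
        if not admissible(arc, path):           # e.g., forbid enlarge without a mask path
            continue
        plan = [node, arc.name]
        for inp in arc.inputs:                  # AND: all inputs of the hyperarc are needed
            if arc.name == "shrink":
                r = 2.0 * res                   # shrink halves resolution going forward
            elif arc.name == "enlarge":
                r = 0.5 * res                   # enlarge doubles resolution going forward
            else:
                r = res
            plan += instantiate(graph, inp, source, r, admissible, path + (arc.name,))
        return plan                             # OR: the first admissible arc is taken
    return [node]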
Although the search process seems to be complicated, the search in hypergraphs is the same as that in AND-OR graphs (Nilsson, 1980). Many search algorithms have been developed for AND-OR graphs, so we can use them for this search process. In summary, with the augmentations just described (i.e., the hypergraph knowledge representation and the introduction of the resolution parameter), LLVE could perform various types of heterogeneous combinations of primitive operators, which would substantially improve its image analysis capability. (These augmentations have not been implemented in LLVE.)
B. Cooperative Integration of Multiple Stereo Vision Algorithms As noted at the beginning of this section, ordinary ESIPAs involve two types of sequentialities: one at the object level and the other at the reasoning level. The former has been augmented by the heterogeneous combination operators described in Section 1II.A. In this section, we will discuss the latter problem and introduce a cooperative reasoning scheme for integrating multiple different stereo matching algorithms. Generally speaking, there are two different objectives in parallel-concurrent processing. One is, of course, fast computation, and the rate of speed-up is of primary concern. The other is the so-called distributed problem solving to develop more flexible and robust intelligent systems (Bond and Gasser, 1988). The flexibility and robustness are attained by cooperative interactions among parallel-concurrent reasoning agents. Each agent is an independent reasoning module with its own specific analysis and reasoning capabilities and specialized knowledge. During problem solving, agents exchange partial analysis and reasoning results to each other by communication. While the knowledge and input information of each individual agent is limited, it can conduct analysis and reasoning based on the information communicated from other agents. Thus a major interest of distributed problem solving is not in speed-up but in the development of sophisticated communication protocols to realize flexible cooperations among agents; i.e., cooperative integration of multiple reasoning agents. No cooperative integration among image processing operators was realized in the ESIPAs so far described although various types of primitive operators are combined to form complex image analysis processes. Ordinary combinations of operators can be compared to playing with toy blocks. While we can build a variety of complex-shaped objects (i.e., analysis processes) by combining blocks (i.e., operators), each block itself stays as is; its properties such as shape and color are not changed at all by the combination. On the other hand, what we mean by the cooperative integration can be compared to genetic surgery to fuse different types of biological cells. After the fusion, a new cell (i.e., analysis process) is born whose properties are different from those of its constituent cells (i.e., operators). The properties of the constituent cells are mixed and inherited to the new cell. As is clear from these metaphors, the cooperative integration is quite different from the combination and enables us to realize new analysis and reasoning capabilities. In this section we describe a cooperative image analysis for stereo vision and later in Section 1V.D a cooperative spatial reasoning for image understanding. Watanabe and Ohta (1990) proposed a cooperative integration of multiple stereo vision algorithms. A major problem in stereo vision is how to make
TABLE II
CHARACTERISTICS OF THE THREE STEREO MATCHING ALGORITHMS
(Columns of the original table: image feature used, application range, accuracy of correspondence, and characteristics such as intra-scanline and inter-scanline consistency.)

  Point-based matching:    interesting points
  Interval-based matching: intervals on the scanline
  Segment-based matching:  connected edge elements

Source: Watanabe and Ohta, ©1990 IEEE.
correspondence between points in the left and right images. Once the correspondence is established, the three-dimensional depth (distance from the camera) of the corresponding point can be calculated easily. Although various matching algorithms for stereo vision have been developed, none of them works perfectly for complicated scenes. This is because the structure of one image can be very different from that of the other due to mutual and self-occlusions among three-dimensional objects. The idea in Watanabe and Ohta (1990) for realizing reliable stereo matching is to execute different stereo matching algorithms in parallel and integrate their results. The difference between this method and the COMBINE operator described in Section I1I.A. 1 is that in the former multiple algorithms make interactions during execution, while in the latter multiple operators are applied independently without any interaction. That is, the integration in the former is done among algorithms, while that in the latter among analysis results by independent algorithms. The following three stereo matching algorithms were used: Point-Based Matching. Using small windows centered at prominent edges in the left and right images as templates, apply template matching to find a pixel location with the highest correlation value. Interval-Based Matching. Using a dynamic programming technique, conduct matching between intervals delimited by edges on the corresponding left and right horizontal scan lines. Segment-Based Matching. Make correspondence between connected edge segments extracted from the left and right images using their geometric properties and gray-level contrast along the edges. Table I1 summarizes the characteristics of these three methods. It shows that they complement each other. Thus, integrating these complementary
FIGURE 22. Internal structure of cooperative reasoning modules (from Watanabe and Ohta, ©1990 IEEE).
algorithms, a more reliable matching process can be realized; the drawbacks of one algorithm can be compensated by the others. The integration among these algorithms is realized by the following cooperative interaction among reasoning modules. 1. For each algorithm, a reasoning module is implemented as a concurrent process. It analyzes given stereo images by applying its own algorithm, and the analysis result is stored in it as its private information. Thus, the same input stereo images are analyzed by a group of modules in parallel. Figure 22 illustrates the internal structure of cooperative reasoning modules. 2. When a module finds ambiguity in matching at a certain location, it asks the other modules for their analysis results at that location. Using the information given by the others, the module resolves the ambiguity. Since the protocol for such communication between the modules depends on applications, each module stores a set of production rules to represent the knowledge about the mutual communication (Figure 22). With such rules, various types of interactions among the modules can be realized: determination of the applicability of a specific algorithm, adjustment of parameters of an algorithm, complementation of imperfect analysis result, as well as the resolution of ambiguity described previously. In short, the declarative knowledge used in this ESIPA is about the mutual communication-interaction among image analysis processes. (Analysis algorithms are fixed and described procedurally .) 3. Each module must conduct the analysis to reply requests given from the others as well as the matching for its own purpose. Moreover, multiple requests may be issued simultaneously. Thus, a module itself consists of multiple concurrent subprocesses; when a request is given, the module generates a new subprocess to conduct analysis and reply to the request. 4. Besides three modules for stereo matching, the system includes the disparity map generation module (Figure 23). This module accumulates all partial results produced by the matching modules. When a new correspondence is established, each matching module informs it to the
FIGURE 23. Module organization in the cooperative integration of stereo vision algorithms (from Watanabe and Ohta, © 1990 IEEE).
disparity map generation module. Each correspondence is associated with a numerical value representing its reliability as well as the spatial location and the depth information. The disparity map generation module computes the overall reliability and the global disparity map based on the partial information given from the matching modules. Note that the processing done by this module is the same as the combination of multiple analysis results.

According to Watanabe and Ohta (1990), a considerable amount of improvement was attained by the cooperative integration. However, since only a few studies on distributed problem solving have been done in computer vision and image understanding, more intensive research should be done on the following problems:
More Sophisticated Scheme for Interaction. In the preceding system, the interaction among analysis modules is specified by production rules. Although such declarative specification is useful, the implemented interaction scheme is very primitive; only point-to-point communication (i.e., one module makes a request to a specific module) was realized. We should investigate more sophisticated interaction schemes such as broadcasting and multiparty interaction (Bond and Gasser, 1988) to develop distributed computer vision systems.

Well-Defined Reliability Computation Method. In integrating multiple partial analysis results, numerical values representing their reliabilities are often used. However, many heuristics and ad hoc rules are implicitly used for the evaluation and integration of reliability values. We should study well-defined mathematical foundations like Dempster-Shafer's evidence theory (Shafer, 1975) for the reliability computation and integration.
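To make the point-based matcher above concrete, the following minimal sketch (an illustration only, not the code of Watanabe and Ohta's system; it assumes rectified gray-level images stored as NumPy arrays and interior pixels) correlates a left-image window against candidate positions on the same scan line of the right image. A low best score is exactly the kind of matching ambiguity that would trigger a query to the other modules.

# A minimal sketch of correlation-based point matching along an epipolar
# (horizontal) scan line.  Window size and disparity range are illustrative.
import numpy as np

def point_based_match(left, right, row, col, half_win=5, max_disp=32):
    """Return (best_disparity, best_score) for the left-image pixel (row, col)."""
    template = left[row - half_win:row + half_win + 1,
                    col - half_win:col + half_win + 1].astype(float)
    t = template - template.mean()
    best_d, best_score = 0, -1.0
    for d in range(max_disp + 1):              # search along the same scan line
        c = col - d
        if c - half_win < 0:
            break
        window = right[row - half_win:row + half_win + 1,
                       c - half_win:c + half_win + 1].astype(float)
        w = window - window.mean()
        denom = np.sqrt((t * t).sum() * (w * w).sum())
        score = (t * w).sum() / denom if denom > 0 else 0.0   # normalized cross-correlation
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score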
IV. REPRESENTING SPATIAL RELATIONS AND SPATIAL REASONING FOR IMAGE UNDERSTANDING

In general the information used for visual recognition can be classified into the following two types:
Attributes: intrinsic properties such as brightness, color, texture, location, shape, and so on.

Relations: spatial relations, temporal relations, and semantic relations like causal and generalization-specialization relations.

Recognition based on objects' attributes has been studied extensively in statistical pattern recognition, in which the knowledge about objects is represented by probabilistic distribution functions. On the other hand, a major research goal of image understanding is the analysis and reasoning based on the relational information. With the introduction of relational information, limitations of statistical pattern recognition can be removed. For example, relaxation labeling and constraint filtering are useful relation-based analysis methods to resolve ambiguities and remove errors in the attribute-based recognition (Ballard and Brown, 1982; Binford, 1982; Brooks, 1981). Since geometry is an important axis defining the visual world, spatial reasoning, reasoning based on spatial relations, is a crucial function in image understanding. In this section, we first present an overview of various schemes of representing spatial relations and spatial reasoning methods for image understanding. In the latter half of the section, we discuss knowledge representation and reasoning based on PART-OF relations (spatial composition relations) and introduce ESIPAs for the recognition of complex objects with internal structures.
A. Knowledge Representation in Logic Based on Topological Relations

Topological relations, such as adjacency, intersection, and inclusion, have been often used in IUSs to represent spatial relations among image features and scene objects (Ohta, 1985; Rubin, 1980). They also play an important role in describing structures of complex-shaped objects (e.g., the winged edge model for three-dimensional objects; Ballard and Brown, 1982). Such popular utilization of topological relations can be ascribed to their clear semantics; even in digital images, we can rather easily define and analyze topological relations between image features. (We use spatial relations in a general sense and geometric relations to denote specific spatial relations involving metric information.)
With well-defined topological relations, we can represent the knowledge for image understanding in a formal way. The most formal declarative knowledge representation scheme would be mathematical logic. (See Genesereth and Nilsson, 1987, for utilizations of mathematical logic for artificial intelligence.) Reiter and Mackworth (1989) use first-order predicate calculus to represent the knowledge for image understanding. Specifically, the knowledge is represented by three types of logical axioms: scene axioms for the scene domain knowledge, image axioms for the image domain knowledge, and depiction axioms for the knowledge about the mapping between the scene and image. (As for the knowledge used in IUSs, see Section II.B.) The task of their system is to interpret a given line drawing as a map; that is, to recognize line segments and regions as rivers, roads, shores, land, and water areas. First, image axioms, the knowledge about semantic definitions of primitive image features, are described in terms of logical formulae as follows:

(∀x) image-feature(x) ≡ line-segment(x) ∨ region(x)    (14)

(∀x) ¬[line-segment(x) ∧ region(x)],    (15)

where ≡ and ¬ stand for logical equivalence and negation, respectively. These axioms imply that there are two types of image features, line-segment and region, and they are mutually exclusive. To describe the structure of a given line drawing, first, connected line segments and regions are extracted from the line drawing. Then their types and topological relations among them are described using the following logical formulae:

(∀x) line-segment(x) ≡ x = l1 ∨ x = l2 ∨ ... ∨ x = ln    (16)

(∀y) region(y) ≡ y = r1 ∨ y = r2 ∨ ... ∨ y = rm    (17)

(∀x, y) bound(x, y) ≡ (x = l1 ∧ y = r1) ∨ ... ∨ (x = ln ∧ y = rm)    (18)

(∀x, y) chi(x, y) ≡ (x = l1 ∧ y = l2) ∨ ... ∨ (x = li ∧ y = lj)    (19)
The first and second formulae imply that logical constants li (i = 1, ..., n) and rj (j = 1, ..., m) denote line segments and regions, respectively. The third and fourth formulae describe topological relations between line segments and regions; predicate bound(x, y) means that line segment x is on the boundary of region y, and chi(x, y) that two line segments x and y intersect each other. The knowledge about the scene, in this case the map, is described in the same way. First, categories of scene objects are defined as follows.
(∀X) SCENE-OBJECT(X) ≡ ROAD(X) ∨ RIVER(X) ∨ SHORE(X) ∨ LAND(X) ∨ WATER(X)    (20)

(∀X) ¬[ROAD(X) ∧ RIVER(X)]    (21)

(∀X) ¬[RIVER(X) ∧ SHORE(X)]    (22)

(∀X) ¬[LAND(X) ∧ WATER(X)]    (23)

Then, the knowledge about topological relations between scene objects is described using predicates like CROSS and LOOP:

(∀x, y) RIVER(x) ∧ RIVER(y) → ¬CROSS(x, y)    (24)

(∀X) SHORE(X) → LOOP(X)    (25)

(∀X) RIVER(X) → ¬LOOP(X)    (26)
Note that predicates representing image domain categories and relations are in lowercase letters while those representing the scene domain knowledge are in capital letters. As discussed in Section II.B, for correspondence between image features and scene objects, the knowledge about the mapping between the scene and image domains is required. Such knowledge can also be described by logical formulae as follows:

(∀x) image-feature(x) → SCENE-OBJECT(σ(x)) ∧ Δ(x, σ(x)) ∧ ((∀y) Δ(x, y) → y = σ(x))    (27)
Here, σ stands for the mapping function which transforms image domain entities into scene domain entities, and predicate Δ(x, y) implies that image feature x corresponds to scene object y. That is, interpretation, the correspondence between the image and the scene, is represented by this function and relation. It should be noted that the preceding formula just represents general knowledge about the mapping. We must prove for each logical constant representing an image feature (e.g., li) which of ROAD(σ(li)) ∧ Δ(li, σ(li)), RIVER(σ(li)) ∧ Δ(li, σ(li)), or SHORE(σ(li)) ∧ Δ(li, σ(li)) is the logical consequence of the set of given axioms. (A line segment is to be recognized as either a road, river, or shore.) Such a proof process is the recognition step in logical systems. Reiter and Mackworth proposed the following recognition process. Given a set of image features extracted from a line drawing, logical axioms describing their types and structures such as (16)-(19) are generated. The interpretation of the line drawing is defined as a logical model of all logical axioms about the image structure, the image domain knowledge, the scene domain knowledge, and the knowledge about the mapping. Note that the logical model here means one defined formally in first-order logic, while the interpretation implies no such formal one but the result of recognition. In Reiter
and Mackworth (1989), however, no practical computational algorithm to find the logical model was described, because in general it is impossible to compute logical models of a set of arbitrary formulae in first-order predicate calculus. While a serious computational problem is left unsolved, the proposed logical framework enlightened various essential problems and assumptions in image understanding that had been treated implicitly. In short, mathematical logic provides a very general and clear declarative knowledge representation scheme and enables us to investigate structures of problem domains in detail. From a computational viewpoint, however, its high generality prevents intelligent systems from conducting efficient reasoning. To attain efficient reasoning, we should incorporate domain-specific heuristics and procedural knowledge. From an image understanding point of view, ordinary deductive inference in mathematical logic alone is not sufficient to realize the versatile reasoning capabilities required in image understanding. As discussed in Sections II.F and II.G, we have to incorporate top-down image segmentation to recover from the imperfection of bottom-up image segmentation (see Figure 10). This is because, at the beginning of reasoning, the set of image features is incomplete and includes erroneous ones. In other words, the set of axioms describing image structures such as (16)-(19) is not complete: logical constants representing image features such as li (i = 1, ..., n) and rj (j = 1, ..., m) are insufficient and include erroneous (uninterpretable) ones. In general, however, the deductive inference does not create any new information (e.g., no logical constant is ever introduced during inference) but only proves that a given logical formula is a logical consequence of (is implied by) the axioms. That is, the deductive inference assumes that all necessary knowledge is given a priori. Hence the interpretation scheme proposed in Reiter and Mackworth (1989) is not satisfactory as a general scheme of image understanding, and we need more sophisticated reasoning capabilities that dynamically generate missing image features (new logical constants) as well as remove erroneous ones. In Matsuyama and Hwang (1990), we discussed a logical framework for image understanding based on so-called hypothesis-based reasoning (Poole, Aleliunas, and Goebel, 1987), by which new logical constants and new axioms are dynamically generated during the reasoning process to compensate for the imperfection of bottom-up image segmentation. (Since such higher-level logical reasoning is beyond the scope of this chapter, we do not discuss it any further. Interested readers should consult Matsuyama and Hwang, 1990; Genesereth and Nilsson, 1987; Poole et al., 1987.)
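To make the notion of "finding a logical model" concrete, the following toy sketch enumerates all labelings of a three-segment drawing and keeps those satisfying analogues of axioms (24)-(26). The segments and their topological facts are invented purely for illustration and are not from Reiter and Mackworth's system.

# A toy, brute-force sketch of interpretation as model finding: every
# assignment of scene labels to image line segments is tested against
# analogues of axioms (24)-(26).
from itertools import product

segments = ['l1', 'l2', 'l3']
crosses = {('l1', 'l2')}          # chi(l1, l2) holds in the (toy) drawing
is_loop = {'l3'}                  # l3 is a closed curve

LABELS = ['ROAD', 'RIVER', 'SHORE']

def satisfies(assignment):
    for (a, b) in crosses:        # axiom (24): two rivers never cross
        if assignment[a] == 'RIVER' and assignment[b] == 'RIVER':
            return False
    for s in segments:
        if assignment[s] == 'SHORE' and s not in is_loop:    # axiom (25)
            return False
        if assignment[s] == 'RIVER' and s in is_loop:        # axiom (26)
            return False
    return True

models = [dict(zip(segments, labels))
          for labels in product(LABELS, repeat=len(segments))
          if satisfies(dict(zip(segments, labels)))]
print(len(models), "admissible interpretations")

Even this toy version shows the limitation discussed above: the enumeration can only relabel the constants it is given; it can never introduce a missing segment or discard an erroneous one.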
B. Structural Representation of Geometric Relations

Although topological relations are useful, they are not enough to characterize
structures of and spatial relations between geometric objects. That is, while most topological relations are defined between mutually connected geometric objects, there are many geometric relations involving distance and directional information that are defined between spatially separated disjoint objects: left-right, above-below, proximate, parallel, collinear, coplanar, and so on. While topological relations are well defined and easy to verify, it is often difficult to define semantics of geometric relations. Here we survey various approaches to characterize and describe geometric relations for computer vision. First, we can use several spatial data structures to characterize geometric relations among disjoint image features: the minimum spanning tree (Zahn, 1974), the Voronoi diagram, the Delaunay triangulation (Shamos and Hoey, 1975), the quad tree (Rosenfeld, 1984), and the k-d tree (Bentley and Friedman, 1979). These data structures were originally devised to structurally describe spatial proximity among a set of points, and later generalized for lines and regions. Zahn (1974) represented the spatial proximity among data points by their minimum spanning tree and used its structural properties (e.g., angles between branches in the tree) for pattern matching. Note that in his method geometric relations themselves are used as matching keys rather than attributes of image features. Toriwaki et al. (1982) and Matsuyama and Phillips (1 984) developed algorithms of computing digital Voronoi diagrams for disjoint line segments and regions and defined several types of proximity relations among these image features. Proximity relations characterized by these spatial data structures complement ordinary topological relations. Lowe (1985) used collinearity, proximity, and parallelism to group line segments into meaningful structures. In early IUSs, geometric relations were often represented by defining symbolic predicates for them. For example, Winston (1975) defined predicate LEFT-OF representing a two-dimensional geometric relation between regions as follows: For geometric objects A and B, A is LEFT-OF B if the centroid of A is located at the left of that of B, and if the right most point in A is located left of that in B. Predicate left of in these conditions is defined simply by comparing x (horizontal) coordinate values. Geometric meanings of various predicates, such as ABOVE and BETWEEN, were investigated in Freeman (1975). However, it is difficult to define them consistently because there are many constraints to be satisfied among geometric relations. For example, LEFT-OF must be the reverse relation of RIGHT-OF (i.e., LEFT-OF (A, B) must be equivalent to RIGHT-OF (B,A ) ) ,and only
one of LEFT-OF, RIGHT-OF, ABOVE, and BELOW should be satisfied between a pair of geometric objects. Therefore, in order to define geometric relations, we must develop a comprehensive theory in which relations between geometric relations as well as individual relations between geometric objects can be defined consistently. The problem of defining such predicates involves a difficult universal problem of interfacing numerical (quantitative) computation with symbolic (qualitative) computation. We often encounter the same problem in developing rule-based expert systems. Since rules are to be described in terms of predefined symbolic predicates, what predicates we should prepare to characterize various quantitative features in the task domain and how to define their meanings are crucial problems in developing capable expert systems. How to describe and reason about quantitative characteristics of various physical systems (e.g., electric circuits) based on qualitative symbolic features is a primary research objective of qualitative reasoning in artificial intelligence (Special Issue on Qualitative Reasoning, 1984). One idea of integrating quantitative and qualitative information is to incorporate fuzzy descriptions (Zadeh, 1965). Haar (1982) used fuzzy predicates to represent the ambiguity involved in symbolic descriptions of geometric relations. A geometric relation is described by using two primitive fuzzy predicates, DISTANCE and BEARING:

{[DISTANCE A B (7 10)] 0.6}

{[BEARING A B (45 60)] 0.8}
Here A and B represent geometric objects. The former means that the distance between A and B is between 7 and 10, and the latter that the direction from A to B is between 45 and 60 degrees. Each proposition is associated with its reliability: 0.6 and 0.8 in these examples. Symbolic predicates like LEFT-OF for geometric relations are defined using these fuzzy descriptions. In Haar's system several computational rules were implemented to perform reasoning about object locations based on fuzzy symbolic descriptions. In spatial reasoning in the two-dimensional space, iconic descriptions (i.e., regions) have been often used to specify approximate object locations. That is, regions represent fuzzy object locations, and various set operations among them are used for spatial reasoning. In Haar's system, for example, upright rectangular regions were used to describe estimated (approximate) object locations. McDermott (1980a) used a similar method to reason about locations of geographic objects in maps. Russell (1979) used a constraint network to represent geometric relations among objects in aerial photographs. There are two types of nodes in the network: one represents object locations and the other geometric relations. Arcs imply which object nodes are arguments of which relation nodes. Object
locations are represented by two-dimensional regions, and set operations among them are defined at geometric relation nodes. Given location data (i.e., regions) for certain object nodes in the network, geometric relation nodes connected to them perform their corresponding set operations to produce new location data, which then are propagated to other geometric relation nodes. For example, when relation node REL(O1, O2, O3) is given location data for O1 and O3, it computes a region for O2 by applying predefined set operations to the given data. Then the computed region for O2 is given to another relation node as its input data. By iterating such propagation, object locations that satisfy the specified geometric relations are determined. Of course, if initially given object locations or specified geometric relations are not sufficient, some object locations cannot be computed. This knowledge representation is the same as constraint graphs, graph structures used in constraint programming (Leler, 1988). To represent (partial) knowledge as constraints and regard reasoning as constraint satisfaction is a major knowledge representation and reasoning scheme in modern artificial intelligence and logic programming (Van Hentenryck, 1989).
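A minimal sketch of this propagation step is given below. Object locations are axis-aligned rectangles, and the relation definitions (a LEFT-OF band of 50 pixels, an ABOVE band of 50 pixels) are invented for illustration rather than taken from Russell's system.

# A small sketch of location propagation in a constraint network.  Regions
# are axis-aligned rectangles (x0, y0, x1, y1); relation nodes intersect the
# regions implied by their known arguments to bound the unknown one.
def intersect(r1, r2):
    x0, y0 = max(r1[0], r2[0]), max(r1[1], r2[1])
    x1, y1 = min(r1[2], r2[2]), min(r1[3], r2[3])
    return (x0, y0, x1, y1) if x0 <= x1 and y0 <= y1 else None

def left_of(region, margin=50):
    """Region where an object LEFT-OF `region` may lie (illustrative definition)."""
    x0, y0, x1, y1 = region
    return (x0 - margin, y0, x0, y1)

def above(region, margin=50):
    x0, y0, x1, y1 = region
    return (x0, y0 - margin, x1, y0)

# Known locations of O1 and O3; O2 must be LEFT-OF O3 and ABOVE O1.
o1 = (100, 200, 180, 260)
o3 = (220, 120, 300, 300)
o2 = intersect(left_of(o3), above(o1))
print("estimated region for O2:", o2)   # propagated location constraint

If the intersection comes back empty (None), the given locations and relations are not mutually satisfiable, which is the constraint-network analogue of an uncomputable object location.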
C. Algebraic Representation of Geometric Information and Geometric Theorem Proving

The most general method of representing shapes of and spatial relations between geometric objects would be to use algebraic geometry. In algebraic geometry the geometric information is described by a set of equations and inequalities involving variables. Variables represent locations of points in a certain coordinate system, and equations and inequalities specify constraints on the point locations. For example, the location of a point in the two-dimensional Euclidean coordinate system can be represented by a pair of variables (x, y) and a straight line by an equation ax + by + c = 0, where a, b, and c are parameter variables specifying the location (shape) of the straight line. That is, straight line L(a, b, c) is defined as a set of points satisfying the constraint ax + by + c = 0: L(a, b, c) = {(x, y) | ax + by + c = 0}, where { } denotes a set. Using these algebraic descriptions, we can precisely represent shapes of and geometric relations between objects: planar and curved surfaces, polygons, geometric transformations between three-dimensional scenes and two-dimensional images, and so on. In computer vision for three-dimensional object recognition, utilities of the algebraic shape representation have been widely recognized, and many useful geometric models and recognition algorithms have been developed (Ballard and Brown, 1982). Pentland (1986) and Solina and Bajcsy (1990) used superquadrics to represent a class of three-dimensional objects with curved
surfaces.

FIGURE 24. Shapes of superquadrics.

Shapes of superquadrics are defined by the following equation:

[(x/a1)^(2/ε2) + (y/a2)^(2/ε2)]^(ε2/ε1) + (z/a3)^(2/ε1) = 1    (28)
By changing the shape parameters ε1 and ε2, a variety of three-dimensional objects can be represented (Figure 24). (The terms a1, a2, and a3 are scale parameters specifying the size of an object.) Moreover, by introducing various deformation functions such as tapering and bending, we can represent tapered objects and those with curved axes (Solina and Bajcsy, 1990). Although we can model a wide variety of geometric objects by algebraic shape representation, the recognition algorithm using such geometric models is rather simple and far from being called geometric reasoning. That is, the recognition of objects is usually realized as a parameter optimization process: find the optimal values of shape parameters that minimize the distance (i.e., approximation error) between a geometric model and observed data (Pentland, 1986; Solina and Bajcsy, 1990).
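Read as an inside-outside function, the superquadric equation (28) translates directly into code. The sketch below assumes the conventional form in which F < 1 holds inside the surface, F = 1 on it, and F > 1 outside; the absolute values handle negative coordinates, and the parameter values are arbitrary examples, not taken from the text.

# A direct transcription of Eq. (28) as an inside-outside test.
def superquadric_F(x, y, z, a1, a2, a3, e1, e2):
    xy = (abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)) ** (e2 / e1)
    return xy + abs(z / a3) ** (2.0 / e1)

# An ellipsoid-like shape (e1 = e2 = 1) with half-axes 2, 1, 1:
print(superquadric_F(0.0, 0.0, 0.0, 2, 1, 1, 1.0, 1.0))   # 0.0  -> inside
print(superquadric_F(2.0, 0.0, 0.0, 2, 1, 1, 1.0, 1.0))   # 1.0  -> on the surface
print(superquadric_F(3.0, 0.0, 0.0, 2, 1, 1, 1.0, 1.0))   # 2.25 -> outside

Fitting a superquadric to range data then amounts to minimizing an error built from F over the observed points with respect to a1, a2, a3, ε1, and ε2, which is the parameter optimization process mentioned above.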
The algebraic representation is also very useful to describe geometric relations between objects. Sugihara (1984) showed that algebraic representation is very effective to decide if a given two-dimensional line drawing can be a correct projection of a three-dimensional polyhedral scene (i.e., the classic computer vision problem of interpreting line drawings of the block world). In his formalism, various geometric relations between polygon vertices are derived by analyzing a line drawing: occlusion and convexity relations between polygon faces and coplanarity relations between polygon vertices. These geometric relations are described by a set of linear algebraic equations and inequalities involving variables representing three-dimensional vertex locations. If the set of these equalities and inequalities has common solutions (i.e., is consistent), then each solution provides a possible three-dimensional interpretation of a given line drawing. Thus, the interpretation of line drawings can be realized by a consistency examination algorithm among a set of algebraic equalities and inequalities. Recently, several geometric reasoning algorithms were proposed to prove algebraically described geometric problems (Kapur and Mundy, 1989; Kutzler, 1988):

1. Buchberger's Gröbner bases method,
2. Wu's pseudo-division method,
3. Collins's cylindrical algebraic decomposition method.

While in the algebraic shape representation the object recognition is realized by numerical computation procedures, these algorithms symbolically manipulate algebraic equations to prove geometric theorems. A typical example of geometric theorems to be proved is as follows.

Prove that, in any triangle, the three perpendicular lines from the vertices to their facing sides meet at one point.
This problem can be described algebraically as follows. First, let ABC be an arbitrary triangle and H be the intersection point of the two perpendicular lines from B and C. Then we should prove that line AH is perpendicular to BC. To describe these hypotheses and goal algebraically, we set the coordinate system as shown in Figure 25, with A = (0, 0), B = (x1, 0), C = (x2, y2), E = (x3, y3), F = (x4, y4), and H = (x5, y5), where E and F are the feet of the perpendiculars from B and C, respectively:

Hypotheses:
x2(x3 - x1) + y2y3 = 0    (BE is perpendicular to AC)
x2y3 - x3y2 = 0    (A, E, and C are collinear)
(x1 - x3)y5 + (x5 - x1)y3 = 0    (B, H, and E are collinear)
x2 = x4 = x5    (CF is perpendicular to AB, and C, H, and F are collinear)

Goal:
x5(x2 - x1) + y2y5 = 0    (AH is perpendicular to BC)
FIGURE 25. Coordinate system to describe the geometric problem.
To prove the goal by Buchberger's method, first we must add the following subsidiary condition to exclude degenerate triangles:

x1y2 ≠ 0.

This implies that B is not at the origin and C is not on the x axis. In general, the negation of an equation can be translated into an ordinary equation by introducing an auxiliary variable, so that the preceding subsidiary condition is translated into

z1x1y2 - 1 = 0,
where z1 is an auxiliary variable. The validity of this translation can be verified easily. Then, the proof of the goal is done by refutation: first take the negation of the goal and prove that the set of equations consisting of the hypotheses, the subsidiary condition, and the negated goal is inconsistent. In this example, the negated goal is described by

z2[x5(x2 - x1) + y2y5] - 1 = 0,

where z2 is another auxiliary variable. The inconsistency among a set of equations can be verified by the following process:

1. For each equation, move all variables and constants to the left side of the equation (i.e., the right side becomes 0). Then, let {P1, ..., Pn} be the set of polynomials in the left sides of all equations.
2. Compute a Gröbner basis of the ideal of {P1, ..., Pn} by Buchberger's algorithm.
3. If the Gröbner basis includes 1, the set of original equations is inconsistent and consequently the goal is proven.
4. Otherwise, the equation set is consistent and the goal may be false.
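This refutation can be reproduced mechanically with a computer algebra system. The following sketch (assuming SymPy is available) encodes the hypotheses, the subsidiary condition, and the negated goal in the coordinates introduced above; the hypothesis x2 = x4 = x5 is encoded by the single polynomial x2 - x5, since x4 occurs in no other equation.

# A sketch of the refutation with SymPy's Groebner basis routine: if the
# reduced basis of {hypotheses, subsidiary condition, negated goal} is [1],
# the system is inconsistent and the theorem is proved.
from sympy import symbols, groebner

x1, x2, x3, x5, y2, y3, y5, z1, z2 = symbols('x1 x2 x3 x5 y2 y3 y5 z1 z2')

polys = [
    x2*(x3 - x1) + y2*y3,                 # BE is perpendicular to AC
    x2*y3 - x3*y2,                        # A, E, and C are collinear
    (x1 - x3)*y5 + (x5 - x1)*y3,          # B, H, and E are collinear
    x2 - x5,                              # x2 = x4 = x5 (CF vertical, H and F on it)
    z1*x1*y2 - 1,                         # subsidiary condition: x1*y2 != 0
    z2*(x5*(x2 - x1) + y2*y5) - 1,        # negated goal: AH not perpendicular to BC
]

gb = groebner(polys, x1, x2, x3, x5, y2, y3, y5, z1, z2, order='lex')
print(gb.exprs)   # expected: [1], i.e., the equation set is inconsistent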
In this example, the goal can be successfully proven by this refutation procedure. Strictly speaking, however, this refutation procedure for geometric theorem proving is not complete. That is, even if the Gröbner basis does not include 1, the geometric property specified by a goal can be valid. A major reason for this incompleteness is that, while variables used to describe geometric problems are considered as ranging over real numbers, the Gröbner basis computation is defined for polynomials whose variables range over complex numbers. For example, we cannot prove x = 0 (and y = 0) from x² + y² = 0, since the Gröbner basis of the following set of polynomials does not include 1: {x² + y², zx - 1}. That is, the set of equations {x² + y² = 0, zx - 1 = 0} has a complex solution (i.e., is consistent), whereas it would be inconsistent if the variables ranged over real numbers. Wu's pseudo-division method is also incomplete in the same sense. Irrespective of such incompleteness, Kutzler (1988) showed that many geometric theorems can be proved by Buchberger's Gröbner bases method. He compared the performance in detail between Buchberger's and Wu's methods and concluded that Wu's method has another problem of incompleteness and proofs given by it should be regarded as near proofs. On the other hand, Collins's cylindrical algebraic decomposition method is complete (i.e., it can check the consistency over the real numbers), but its computational complexity is too high to be used for practical geometric theorem proving. In summary, although the algebraic representation is general and effective to symbolically describe geometric information (shapes and relations), currently we have no complete and efficient algorithm for geometric reasoning. From a viewpoint of computer vision and image understanding, moreover, the following points should be studied further:
1. First we have to augment algebraic representation and reasoning algorithms so as to cope with observation errors; strict equality does not hold or is meaningless in describing geometric relations between observed point locations.
2. As will be discussed in the next subsection, we need a hierarchical representation scheme for describing complex objects with internal structures. The algebraic representation in terms of point locations is
too flat and primitive to conduct reasoning about complex objects. Besides relational descriptions about point locations, we need higher-level descriptions using structured geometric objects such as triangles and planar surfaces as descriptive terms.
D. Reasoning Based on PART-OF Relations
PART-OF (PO in short) relations are commonly used to describe complex objects with internal structures: an object is composed of a group of part objects, each of which in turn is composed of a group of subpart objects, and so on. A PO relation represents a compositional relation between an object and its part, and a part object and its subpart. While many computer vision and image understanding systems used PO relations to describe object structures hierarchically, their semantics and usage in object recognition vary widely. ACRONYM (Brooks, 1981) is an image understanding system that recognizes complex three-dimensional objects like airplanes from a two-dimensional image. It uses a hierarchically organized symbolic (declarative) knowledge representation: models of objects to be recognized are symbolically described in terms of frames (Minsky, 1975), and subpart slots are used to represent PO relations, where names of part objects are specified (Figure 26(a)). The semantics of each PO relation is described by another frame. It represents a geometric transformation between two object centred coordinate systems defining shapes of an object and its part respectively. ACRONYM uses generalized cylinders to represent three dimensional shapes of primitive part objects. As shown in Figure 26(b), a generalized cylinder is described by
(a) spine (axis): a three-dimensional curve,
(b) cross section: a planar region,
(c) sweeping rule to transform the cross section as it is moved along the spine.
A generalized cylinder is defined by the three-dimensional subspace swept by the cross section when it is moved along the spine. During the sweep, the shape of the cross section is changed according to the sweeping rule (Figure 26(c)). Note that while the algebraic shape representation by Eq. (28) explicitly describes the surface of an object, that of a generalized cylinder is implicitly specified by the preceding three descriptive terms. The object recognition in ACRONYM is realized by the following algebraic geometric reasoning (Figure 27). The system first generates expected two-dimensional appearances of three-dimensional part objects (i.e., generalized cylinders) using the estimated camera model. When an
FIGURE 26. Symbolic model representation of three-dimensional objects: (a) structural model representation by PART-OF relations; (b) symbolic shape description of a generalized cylinder; (c) generalized cylinder.
image feature is matched with such an appearance model (i.e., a partial match is established), a set of algebraic equations and inequalities is generated that constrains the geometric transform between the object-centered coordinate system and the image coordinate system. Since a generalized cylinder
FIGURE 27. Geometric reasoning in ACRONYM.
representing a primitive object includes several shape parameters (e.g., FUSELAGE-LENGTH in Figure 26(b)), constraints on such shape parameters are also generated by a partial match. Many partial matches are generated at the first stage of interpretation. Some are correct but others are not, due to segmentation errors and intrinsic ambiguities in the object model; the shape of a primitive part object is so simple that its appearance model matches multiple different image features and vice versa (Figure 27). The global match, i.e., recognition of an entire complex object, is established by combining mutually consistent partial matches. To find a global match, the system examines the consistency among sets of algebraic equations and inequalities associated with individual partial matches, guided by PO relations specified in the object model. While algebraic constraints in Sugihara's line drawing interpretation system (Sugihara, 1984) were confined to linear algebraic equations and inequalities and those in geometric theorem provers (Kapur and Mundy, 1989; Kutzler, 1988) to polynomials, ACRONYM must examine the consistency among nonlinear constraints involving trigonometric functions. Moreover, since many types of imprecision such as observation noise and the limited accuracy of the camera model are to be taken into account in image understanding, most algebraic
FIGURE 28. Network knowledge representation using PART-OF relations (from Nakamura and Nagao, 1990).
constraints should be formulated as inequalities rather than equations. However, there is no complete algorithm to check the consistency among such complex constraints, so ACRONYM used a heuristic procedure. It should be noted that the consistency examination in ACRONYM is also executed symbolically, as in the geometric theorem provers described in Section IV.C. Nakamura and Nagao (1990) proposed an ESIPA that extracts complex-shaped image features automatically. The system uses a network knowledge representation (Figure 28) similar to that used in LLVE (Figure 12): nodes represent types of image features and arcs analysis processes. The critical difference between these two ESIPAs is that the former incorporates arcs representing PO relations between image features. In Figure 28, for example, rectangle is defined as a composition of either a pair of parallel lines or a group of four lines. Figure 29 shows the structural description of the model
FIGURE 29. Structural description of rectangle (from Nakamura and Nagao, 1990).
of rectangle, where its component image features and their mutual spatial relations are described symbolically (declaratively). Guided by these descriptions, the system groups instances of component image features (e.g., lines) to form an instance of a composite image feature (e.g., rectangle). With such declarative descriptions based on PO relations, the system can extract a wide variety of complex image features with internal structures. Note that in LLVE such grouping operations are realized by specialized (fixed) procedures. The image feature extraction is executed in the bottom-up fashion from the original image node to find all instances of higher-level composite image feature nodes. Instead of the sequential search process used in LLVE, the system conducts image analysis in parallel; the image plane is partitioned into a set of meshes, and the analysis of each mesh is performed in parallel. In
FIGURE 30. Extraction of complex image features (from Nakamura and Nagao, 1990): (a) original image; (b) extracted straight-line primitives; (c) number of extracted features; (d) extracted arrow; (e) feature network for extracted arrow.
addition, the network is also partitioned into a set of planes (Figure 28), and the analysis specified by each plane is performed in parallel. Since the analysis is executed in parallel, the same image feature can be detected by different analysis processes. To avoid multiple, duplicated instances of the same image feature, the system examines if a newly extracted image feature can be considered the same as the already extracted one. If so, no new instance of the image feature is created. Figure 30 illustrates an experimental result of image analysis. The system first extracted line segments as the most primitive images features. Then various grouping procedures for line segments, which are defined by PO relations in the network, were executed in parallel to extract corners, line sequences, parallel lines, rectangles, and arrows. Figure 30(c) shows the
FIGURE 31. Scheme for cooperative spatial reasoning.
numbers of extracted image features. Figures 30(d) and (e) illustrate an extracted arrow and its internal structure (i.e., composition hierarchy by PO relations), respectively. In the preceding two systems, PO relations are used only in the bottom-up fashion: first detect primitive part objects and then group them to form composite objects based on PO relations. In this bottom-up analysis, however, if some primitive part objects are not detected at the first stage of the analysis, it is impossible to recover from such errors at the later stage. Thus, to realize flexible object recognition, we should use PO relations in both bottom-up and top-down fashions; PO relations should also be used to guide the search to find missing part objects. In our SIGMA image understanding system (Matsuyama and Hwang, 1990), the bottom-up and top-down analyses based on PO relations are integrated into a unified reasoning process. The object recognition by SIGMA is realized by cooperative spatial reasoning by active reasoning agents. In SIGMA, recognized objects are regarded as active reasoning agents that perform spatial reasoning about their surrounding environments based on their own knowledge. Figure 31 shows the scheme for the cooperative spatial reasoning in SIGMA. When a new object is recognized, an instance of its corresponding object class is generated to represent properties of the recognized object. Then, the object instance starts reasoning as an active agent based on the knowledge stored in its corresponding object class. The knowledge for spatial reasoning is described by a set of rules. Each rule consists of the following three parts:

Condition: Conditions to apply a rule.
Hypothesis: A procedure to generate a hypothesis about a spatially related
object.
Action: A procedure to be executed when the answer to the generated hypothesis is returned.

An agent activates those rules whose condition parts are satisfied, by which hypotheses about its spatially related objects are generated. A hypothesis is described by

Locational constraint: A local area where a target object may be located.
Property constraint: A set of constraints that attributes of a target object must satisfy.

After generating a hypothesis, the activation of a rule is suspended. We call instances of and hypotheses about objects evidence. GRE in Figure 31 is the system controller that examines the consistency among pieces of evidence so far generated and returns the result to those agents that generated hypotheses. When an answer to a hypothesis is returned, the suspended rule is reactivated to execute its action part. Suppose the PO hierarchy illustrated in Figure 32(a) is given as the model of an object with internal structure: (part) object classes represented by nodes are linked by PO relations to form a tree structure. We call objects corresponding to leaf nodes in the tree primitive objects and the others composite objects. In general, primitive objects are recognized first and object instances representing them are generated, because their appearances are simple and correspond directly to image features. Let s denote an instance of primitive object O1 in Figure 32(a). It directly instantiates its parent object through the PO relation. That is, in the bottom-up use of a PO relation, an object instance rather than a hypothesis is generated (Figure 32(b)). This instantiation is realized by the following rule stored in object class O1:

Condition: TRUE
Hypothesis: NIL
Action: make-instance(O4)

Once an instance of composite object O4, p, is generated by s, it generates hypothesis f(p) for its missing part object O2 (Figure 32(c)). This top-down hypothesis generation is realized by a rule in object class O4. The recognition of composite objects (i.e., grouping of recognized part object instances) is done as follows. Suppose an instance of O2 corresponding to the missing part, say t, has already been detected. Then, since the two pieces of evidence f(p) and t are consistent (i.e., they denote the same entity), GRE unifies them and reports f(p) = t to p. Then, agent p evaluates the action part of the rule that generated the hypothesis f(p) to establish a PO relation between p and t.
FIGURE 32. Spatial reasoning based on PART-OF relations: (a) PART-OF hierarchy; (b) bottom-up instantiation; (c) top-down hypothesis generation; (d) partially instantiated PART-OF hierarchies; (e) object instance with multiple parents; (f) unified PART-OF hierarchy; (g) PART-OF hierarchy constructed by top-down analysis.
However, since t is an instance of O2, it also has instantiated its own parent object O4 before the unification. Let u denote such a parent instance: the pair of instances t and u has been connected by a PO relation (Figure 32(d)). Thus, as a result of unifying f(p) and t, instance t comes to have two parent
instances, p and u, at the same time (Figure 32(e)). This leads GRE to another unification. When instance t finds that it has two parent instances, it asks GRE to unify them. GRE examines the pair of parent instances p and u, and if they are consistent, it unifies them to generate a new instance of O4, say r, by which a merged instantiated PO hierarchy is constructed (Figure 32(f)). This is the reasoning process for grouping part object instances to construct an instance of a composite object. In the preceding example, we assumed that both an instance of part object O1, s, and an instance of part object O2, t, were successfully detected. As discussed previously, however, some of the part objects may be missed at the initial segmentation. Suppose an instance of part object O2, t, has not been detected. Then, the system conducts the following top-down object detection for a missing part object. Since an instance of part object O1, s, is detected, the reasoning illustrated in Figures 32(b) and (c) is conducted in the same way as before: a hypothesis for object class O2, f(p), is generated. Although GRE tries to find an object instance consistent with f(p), it fails. Then, it activates the top-down image analysis to extract an image feature that satisfies the constraints associated with f(p). Actually, Figure 10 illustrated the process of this top-down image analysis, and LLVE described in Section II.G is activated for detecting the missing image feature. (The reasoning and analysis processes by LLVE were shown in Figure 14.) If LLVE successfully extracts an image feature satisfying the constraints, GRE makes a new instance of object class O2, n, and reports f(p) = n to agent p. Then, the action part of the rule that generated hypothesis f(p) is activated to connect p and n by a PO relation (Figure 32(g)). Note that the unification illustrated in Figure 32(e) is not required in this case, since new agent n is connected to p before instantiating its parent object instance. In short, by using PO relations for both bottom-up instantiation of parent objects and top-down hypothesis generation for missing parts, we can realize flexible spatial reasoning that detects missing parts as well as groups primitive objects into a composite object.
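To summarize the mechanism, the following schematic sketch (with invented class and function names; the actual SIGMA rules and GRE are far richer) shows the two uses of a PO relation over the tiny hierarchy discussed above: a detected primitive instantiates its parent composite bottom-up, and the composite then posts a hypothesis for its still-missing part top-down.

# A schematic sketch (not SIGMA's actual code) of the two uses of a
# PART-OF relation: bottom-up instantiation of the parent class and
# top-down hypothesis generation for a missing part.
class Instance:
    def __init__(self, cls):
        self.cls, self.parts, self.parent = cls, {}, None

class Hypothesis:
    def __init__(self, source, target_cls):
        # a real hypothesis also carries locational and property constraints
        self.source, self.target_cls = source, target_cls

PART_OF = {'O1': 'O4', 'O2': 'O4'}     # tiny PO hierarchy: O4 is composed of O1 and O2
PARTS   = {'O4': ['O1', 'O2']}

def bottom_up(instance, database):
    """A detected part instantiates its parent composite object."""
    parent = Instance(PART_OF[instance.cls])
    parent.parts[instance.cls] = instance
    instance.parent = parent
    database.append(parent)
    return parent

def top_down(parent):
    """The composite posts hypotheses for its still-missing parts."""
    return [Hypothesis(parent, cls) for cls in PARTS[parent.cls]
            if cls not in parent.parts]

database = []
s = Instance('O1'); database.append(s)    # primitive O1 detected bottom-up
p = bottom_up(s, database)                # p: instance of O4 (Figure 32(b))
for h in top_down(p):                     # hypothesis f(p) for missing O2 (Figure 32(c))
    print('hypothesis for class', h.target_cls)

In the full system, GRE would then try to unify each hypothesis with an existing instance or trigger top-down image analysis, as described above.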
V. CONCLUDING REMARKS

In this chapter we discussed declarative knowledge representation for computer vision and surveyed four types of expert systems for image processing and analysis (ESIPAs): 1. Consultation system for image processing, 2. Knowledge-based program composition system, 3. Rule-based design system for bottom-up image segmentation
algorithms, 4. Goal-directed top-down image segmentation system. We also discussed several declarative knowledge representation schemes for spatial reasoning in image understanding systems: first-order predicate calculus for reasoning based on topological relations, spatial data structures to structurally describe geometric relations, algebraic shape representation and geometric theorem proving, and reasoning based on PART-OF relations. In addition, we introduced two cooperative reasoning schemes to realize sophisticated image analysis and understanding: one for integrating multiple stereo matching algorithms and the other for cooperative spatial reasoning in image understanding. As is seen from the references, many ESIPAs have been developed in Japan. We believe this can be ascribed to intensive discussions by members of the special interest group on expert vision. First, the group was organized informally as a voluntary group in 1984 and held eight workshops on knowledge-based computer vision over the next two years. Then, in 1986 a formal special interest group on expert vision was founded inside the Institute of Electronics, Information and Communication Engineers, one of major academic societies for information processing and computer science in Japan. Before the break-up in 1990, we had three workshops and conferences every year to promote the study of declarative knowledge representation for computer vision. Most Japanese ESIPAs were developed by members of the group and were presented and discussed at these meetings. Some people may doubt the concept of ESIPAs itself; since production systems and other knowledge representation schemes are nothing but new programming styles, we cannot produce anything new from them. However, nobody would doubt that many useful application programs can be implemented by using (combining) existing image processing operators. We believe that ESIPAs are useful software tools to facilitate the development of such application programs. With flexible software tools, application areas of computer vision can be expanded greatly. Especially, they are very helpful for those who are not speciaIists in computer vision. In this sense, we should regard ESIPAs so far developed as new flexible software environments for developing image analysis programs. While the system described in Toriu et al., (1987) is commercially available, the others were developed mainly to examine the feasibility of ESIPAs. So what we have shown is just a step toward knowledge-based composition of image analysis processes and programs. To make current ESIPAs real expert systems, we should study the following problems intensively. Symbolic Description of Pictorial Information and Knowledge. Most ESIPAs use production rules to represent heuristics and knowledge about image
processing techniques. However, vocabularies used to describe the rules are very limited, and the semantics of symbolic predicates is defined only informally (see Figure 9 and Table I, for example). To set a formal basis for automatic program generation, REM tried to characterize image processing operators based on mathematical morphology. However, the predicates to describe operator selection rules (e.g., R-INF and A-SUP) were too primitive to characterize properties of image data and effects of operators although their definitions were clear. Thus we should develop well-defined and capable schemes for describing image properties and operator characteristics. Especially, to realize sophisticated combinations of image processing operators as described in Section III.A, formal descriptive schemes are of the first importance. Generalization of Composed Image Analysis Processes. Most ESIPAs compose an image analysis process (program) based on a given sample image, so that selected operators and parameters are heavily dependent on that specific image data. In general, however, users of ESIPAs want to have a general image analysis process that works well for a class of image data with similar properties. So we have to generalize the composed process to make it effective for every member image in the class. As discussed earlier, however, such generalization requires sophisticated learning mechanisms as well as a rich characterization of the class of images. As for learning mechanisms, an inductive inference method proposed by Shapiro (1983), which constructs PROLOG programs from examples, would be very helpful. Since his method is based on a formal logical model inference, we can extend it safely by introducing heuristics to cope with intrinsic problems of computer vision such as characterization and symbolic description of image properties; without such a well-defined theoretical basis, the introduction of heuristics could lead a learning system to confusion. Evaluation of Analysis Results. As noted in Section II.A, the capability of evaluating analysis results is crucial in realizing effective image analysis processes. In the heterogeneous combinations described in Section III.A, for example, many evaluation functions were incorporated to implement image analysis strategies. Thus, evaluation methods themselves should be considered as the important knowledge for ESIPAs. We should develop new effective evaluation methods as well as new image processing operators. Recently, Kanatani (1990) proposed a general method of evaluating the coincidence between hypothesized and observed geometric entities. He used the projective geometry in the homogeneous coordinate system to measure collinearity of line segments, congruence between lines, and collinearity of intersecting points between lines. Using his method, all these
measures can be reduced to a single measure of edge-point displacement, which provides a universal reliability measure in geometric reasoning. Introduction of Imaging Model. All knowledge used by ESIPAs is purely domain independent; no knowledge about a specific task domain (i.e. scene) is used. In one sense this is a big advantage; they can be used to analyze images in any task domain. On the other hand, however, when we want to develop image analysis processes for a specific task like medical examination, we have to translate the terminology in the task domain into that in the image domain: boundaries of blood vessels are represented by gray-level edges in x-ray images. Usually this translation process requires the knowledge about imaging models (i.e., the knowledge about the mapping between the scene and image domains). Shape from X (X shading, texture, contour) in computer vision for three-dimensional object recognition (Ballard and Brown, 1982) can be considered an example of the (reverse) translation process based on imaging models. By introducing imaging models as new knowledge sources, capabilities of ESIPAs will be increased. Ikeuchi and Kanade (1988) emphasized the importance of sensor models in computer vision for three-dimensional object recognition. They characterized a sensor by feature detectability and reliability. The former specifies what image features can be detected from images taken by a specific sensor, and the latter the confidence of detected image featuers. Using such sensor models, they showed how two-dimensional appearances can be generated from geometric models of three-dimensional objects. It should be noted, however, that sensor and geometric object models alone are not enough to estimate object appearances; appearances change depending on object configurations (e.g., mutual reflection and occlusion), which cannot be determined before analyzing images. Thus we again come up with the chicken-and-egg problem discussed in Section 1I.A. How to describe and use knowledge about visual information is an essential problem in image understanding as well as in ESIPAs. In this chapter we emphasized declarative knowledge representation and discussed several declarative representation schemes of spatial information in Section IV. Our belief is that it would be very useful to make explicit what we regard as knowledge, for the explicit description of knowledge facilitates the investigation of knowledge itself. From this viewpoint, we should have listed and made a dictionary of all predicates (i.e., descriptive terms) and rules used in ESIPAs and IUSs. Since we have only little experience in declarative knowledge representation for computer vision, much more research efforts should be paid to this problem. Comparing algebraic and logical knowledge representations, the former
seems to be concrete and the latter abstract. Concreteness is amenable to low-level quantitative numerical analysis involved in visual sensing and signal processing, while abstraction is required to conduct high-level qualitative symbolic reasoning for object recognition and scene interpretation. Since ESIPAs and IUSs require both these capabilities, we should study an integrated scheme of algebraic and logical knowledge representations. Such integration is a major characteristic of constraint logic programming (Van Hentenryck, 1989), in which we can describe both logical predicates and algebraic expressions in a single rule. Therefore applications of constraint logic programming to image understanding would be a promising research topic to be studied. How to represent and process uncertainty is a major problem in computer vision and other research areas involving signal processing. In this article we briefly discussed fuzzy representation of geometric relations in Section 1V.B. To introduce uncertainty measures in symbolic representations is a practically useful method of realizing smooth interface between numeric and symbolic descriptions. At the introduction, however, we should define the semantics clearly, because uncertainty involves a wide spectrum of different ingredients: observation error, accuracy, fuzziness, ambiguity, reliability, probability, possibility, belief, and so on. Without care, these different meanings are mixed into a single numerical value by ad hoc computation procedures. Recently Provan (1990) gave a logical definition of DempsterShafer’s evidence theory (Shafer, 1975) and applied it to an image recognition system based on the logical formalism described in Section 1V.A. Since both object recognition and reliability computation are defined in terms of mathematical logic, symbolic reasoning and numeric computation are integrated consistently. On the other hand, we showed the utility of random closed sets (RACS) in representing uncertainty (Quinio and Matsuyama, 1991). First, using RACS we can represent both the imprecision involved in observation and reliability of analysis. Second, we showed that RACS can be considered as generalization of both Dempster-Shafer’s evidence theory and fuzzy set theory (Zadeh, 1965). As for reasoning mechanisms, we discussed two topics besides ordinary sequential deductive reasoning in this article. First, we introduced two cooperative reasoning schemes for computer vision and image understanding. The most distinguishing characteristics of these schemes are their robustness and adaptability. The former implies that the reliability of analysis is increased by accumulating partial evidence from different sources (i.e., agents), and the latter most suitable image analysis methods are dynamically selected and applied depending on local image properties of a focused area (i.e., environments). Since robustness and adaptability are crucial characteristics in developing practically useful visual recognition systems, we should
168
TAKASHI MATSUYAMA
investigate various new schemes of cooperative reasoning for computer vision: what we should regard as agents (e.g., stereo matching algorithms in Watanabe and Ohta, 1990, or recognized object instances in Matsuyama and Hwang, 1990) and what communication protocols can be used for the cooperation among agents (e.g., point-to-point message passing, broadcast, and so on). Second, we pointed out that when we design reasoning engines for IUSs, we should keep it in mind that ordinary deductive inference is not sufficient. This is because deductive inference methods usually assume that complete knowledge or information is given a priori, and no erroneous entity is included. As discussed earlier, however, these two assumptions do not hold in image understanding; it is inevitable that some meaningful image features are left unextracted and erroneous ones are included at the initial image analysis stage. Moreover, properties of image features necessarily involve imprecision due to limited accuracy of sensors and observation noise. Thus, standard algebraic consistency examination algorithms and deductive logical inference rules should be extended so as to cope with incompleteness, error, and imprecision of information. While how to conduct reasoning based on incomplete knowledge has been studied in artificial intelligence (Genesereth and Nilsson, 1987; Poole er al., 19871, little investigation is made on reasoning under both incomplete and erroneous information. We believe that computer vision researchers should attack this problem. Finally, we hope this chapter will help computer vision researchers to study declarative knowledge representation and sophisticated (nondeductive) reasoning schemes.
REFERENCES Bailey, D. G . (1988). “Research on Computer-Assisted Generation of Image Processing Algorithms”, in Proc. of IAPR Workshop on Computer Vision - Special Hardware and Industrial Applications,” 294-297. Ballard, D.H., and Brown, C. M. (1982). “Computer Vision,” Prentice-Hall, Englewood Cliffs, NJ. Barr, A., and Feigenbaum, E. A. (eds.) (1982). “The Handbook of Artificial Intelligence,” Chapter 10, William Kaufman, Reading, MA. Bentley, J. L., and Friedman, J. H. (1979). “Data Structures for Range Searching,” A C M Computing Surveys 11, No. 4, 397-409. Binford, T. 0. (1982). “Survey of Model-Based Image Analysis Systems,” Znt. J . Robotic Res. 1, NO. 1, 18-64. Bond. A. H., and Gasser, L. (eds.). (1988). “Readings in Distributed Artificial Intelligence,” Morgan Kaufmann, San Mateo, CA. Brooks, R. A. (1981). “Symbolic Reasoning About 3-D Models and 2-D Images,” Artjficial Intelligence 17, 285-348.
IMAGE PROCESSING, ANALYSIS, AND RECOGNITION
169
Bunke, H., and Grimm, F. (1990). “A Flexible Approach to the Construction of Expert Systems for the Application of Image Processing Software,” in “Proc. of IAPR Workshop on Machine Vision Applications,” 67-70. Clement, V., and Thonnat, M. (1989). “Handling Knowledge in Image Processing Libraries to Build Automatic Systems,” in Proc. of International Workshop on Industrial Applications of Machine Intelligence and Vision,” 187-192. de Haas, L. J. (1987). “Automatic Programming of Machine Vision Systems,” in “Proc. of 10th International Joint Conference on Artificial Intelligence,” 790-792. Donzeau-Gouge, V. G. (1984). “Programming Environments Based on Structured Editors: The MENTOR Experience,” in “Interactive Programming Environments” (D. R. Barstow, H. E. Shrobe, and E. Sandewell, eds.), McGraw-Hill, New York. Freeman, J. (1975). “The Modelling of Spatial Relations,” Computer Graphics and Image Processing 4, 156- 171. Genesereth, M. R., and Nilsson, N. J. (1987). “Logical Foundations of Artificial Intelligence,” Morgan Kaufmann, Los Altos, CA. Haar, R. L. (1982). “Sketching: Estimating Object Positions from Relational Descriptions,” Computer Graphics and Image Processing 19, 227-247. Harland, D. M. (1984). “Polymorphic Programming Languages,” Ellis Honvood Ltd., Chichester, England. Hasegawa, J., Kubota, H., and Toriwaki, J. (1986). “Automated Construction of Image Processing Procedures by Sample-Figure Presentation,” in “Proc. of 8th International Conference on Pattern Recognition,” 586-588. Hasegawa, J., Kubota, H., and Toriwaki, J. (1987). “IMPRESS: A System for Image Processing Procedure Construction Based on Sample-Figure Presentation,” Transactions of IEICE Japan J70-D, No. 11, 2147-2153 [in Japanese]. Hasegawa, J., Kubota, H., Takasu, A., and Toriwaki, J. (1988). “Consolidation of Image Processing Procedures in the Image Processing Expert System IMPRESS,” Transactions of Information Processing Society of Japan 29, No. 2, 126-133 [in Japanese]. Hayes-Roth, F., Waterman, D. A,, and Lenat, D. B. (eds.). (1983). “Building Expert Systems,” Addison-Wesley, Reading, MA. Ikeuchi, K., and Kanade, T. (1988). “Modeling Sensor and Applying Sensor Model to Automatic Generation of Object Recognition Program,” in “Proc. of Image Understanding Workshop,” 697-710. Kanatani, K. (1990). “Hypothesizing and Testing Geometric Attributes of Image Data,” in “Proc. of 3rd International Conference on Computer Vision,” 370-373. Kapur, D., and Mundy, J. L. (eds.). (1989). “Geometric Reasoning,” MIT Press, Cambridge, MA. Kelly, M. D. (1971). “Edge Detection in Pictures by Computer Using Planning,” in “Machine Intelligence” (B. Meltzer and D. Michie, eds.), 6, 377-396, Edinburgh University Press, Edinburgh. Kutzler, B. (1988). “Algebraic Approaches to Automated Geometry Theorem Proving,” Ph.D. thesis, Research Institute for Symbolic Computation, University of Linz. Leler, W. (1988). “Constraint Programming Languages,” Addison-Wesley, Reading, MA. Lowe, D. G. (1985). “Perceptual Organization and Visual Recognition,” Kluwer Academic, Norwell, MA. Matsuyama, T. (1989). “Expert Systems for Image Processing: Knowledge-Based Composition of Image Analysis Processes,” Computer Vision, Graphics, and Image Processing 48, 22-49. Matsuyama, T., and Hwang, V. (1990). “SIGMA: A Knowledge-Based Aerial Image Understanding System,” Plenum, New York. Matsuyama, T., and Ozaki, M. (1986). “LLVE: An Expert System for Top-Down Image
170
TAKASHI MATSUYAMA
Segmentation,” Transactions of Information Processing Society of Japan 27, 191-204 [in Japanese]. Matsuyama, T., and Phillips, T. Y. (1984). “Digital Realization of the Labeled Voronoi Diagram and Its Application to Closed Boundary Detection,” in “Proc. of 7th International Conference on Pattern Recognition,” 478480. Matsuyama, T., Murayama, N., and Ito, T. (1988). “On Representation of Image analysis Strategies,” Transactions of Information Processing Society of Japan 29, No. 2, 169-177 [in Japanese]. McDermott, D. (1980a). “A Theory of Metric Spatial Inference,’’ in Proc. of National Artificial Intelligence Conference,” 246-248. McDermott, J. (1980b). “Rl: An Expert in the Computer Systems Domain,” in Proc. of National Conference on Artificial Intelligence,” AAAI-80, 269-27 1. Milgram, D. L. (1979). “Region Extraction Using Convergent Evidence,” Computer Graphics and Image Processing 11, 1-12. Minsky, M. (1975). “A Framework for Representing Knowledge,” in “Psychology of Computer Vision” (P. H. Winston, ed.), McGraw-Hill, New York. Nadif, A. M., and Levine, M. D. (1984). “Low Level Image Segmentation: An Expert System,” IEEE Trans. PAMM, No. 5 , 555-577. Nagao, M. (1984). “Control Structures in Pattern Analysis,” Pattern Recognition 17, 45-56. Nakamura, Y., and Nagao, M. (1990). “A Blackboard System for Feature Extraction,” Journal of Japanese Society for Artificial Intelligence 5, No. 3, 354-366 [in Japanese]. Nilsson, N. J. (1980). “Principles of Artificial Intelligence,” Tioga Publishing Co., Palo Alto, CA. Ohta, Y. (1985). “Knowledge-Based Interpretation of Outdoor Natural Color Scenes,’’Pitman, London. Ohtsu, N. (1979). “A Threshold Selection Method from Gray-Level Histogram,” IEEE Trans. SMC-9, 62-66. Pentland, A. P. (1986). “Perceptual Organization and the Representation of Natural Form,” Artificial Intelligence 28, No. 3, 293-33 1. Poole, D., Aleliunas, R., and Goebel, G. (1987). “Theorist: A Logical Reasoning System for Defaults and Diagnosis,” in “The Knowledge Frontier: Essays in the Representation of Knowledge” (N. J. Cercone and G. McCalla, eds.), Springer-Verlag, New York. Provan, G. M. (1990). “The Application of Dempster-Shafer Theory to a Logic-Based Visual Recognition System,” in “Uncertainty in Artificial Intelligence, 5” (M. Henrion et al., eds.), 389405, North-Holland, Amsterdam. Quinio, P., and Matsuyama, T. (1991). “Random Closed Sets: A Unified Approach to the Representation of Imprecision and Uncertainty,” in “Lecture Notes in Computer Science 548,” 282-286, Springer-Verlag. Berlin. Reiter, R., and Mackworth, A. K. (1989). “A Logical Framework for Depiction and Image Interpretation,” ArtiJicial Intelligence 41, 125-155. Rosenfeld, A. (ed.). (1984). “Multiresolution Image Processing and Analysis,” SpringerVerlag, Berlin. Rosenfeld, A. (1986). “Expert’ Vision Systems: Some Issues,” Computer Vision, Graphics, and Image Processing 34, 99-1 17. Rosenfeld, A., and Kak, A. C. (1982). “Digital Picture Processing,” 2nd ed., Vols. 1 and 2, Academic Press, New York. Rubin, S. M. (1980). “Natural Scene Recognition Using Locus Search,” Computer Graphics and Image Processing 14, No. 4, 298-333. Russell, D. M. (1979). “Where Do I Look Now?” in “Proc. of IEEE Conference on Pattern Recognition and Image Processing,” 175-183. Sakaue, K., and Tamura, H. (1985). “Automatic Generation of Image Processing Programs by
IMAGE PROCESSING, ANALYSIS, AND RECOGNITION
171
Knowledge-Based Verification,” in “Proc. of IEEE Conference on Computer Vision and Pattern Recognition,” 189-192. Sato, H., Kitamura, Y . , and Tamura, H. (1988). “A Knowledge-Based Approach to Vision Algorithm Design for Industrial Parts Feeder,” in “Proc. of IAPR Workshop on Computer Vision - Special Hardware and Industrial Applications,” 41 3-41 6. Selfridge,P.G. (1982). “Reasoning About Success and Failure in Aerial Image Understanding,” Ph.D. Thesis, University of Rochester. Serra, J. (1982). “Image Analysis and Mathematical Morphology,” Academic Press, London. Serra, J. (1986). “Introduction to Mathematical Morphology,” Computer Vision, Graphics, and Image Processing 35, 283-305. Shafer, G . (1975). “A Mathematical Theory of Evidence,” Princeton University Press, Princeton, NJ. Shamos, M. I., and Hoey, D. (1975). “Closest Point Problems,” in Proc. 16th Annual Symp. Foundations of Compu. Sci.,” 131-162. Shapiro, E. Y. (1983). “Algorithmic Program Debugging,” MIT Press, Cambridge, MA. Solina, F., and Bajcsy, R. (1990). “Recovery of Parametric Models from Range images: The Case for Superquadrics with Global Deformations,” IEEE Trans. PAMI-12, No. 2, 131-147. Special Issue on Expert Systems for Image Processing. (1988). Transactions of Information Processing Society of Japan 29, No. 2 [in Japanese]. Special Issue on Qualitative Reasoning. (1984). Artificial Intelligence 24. Sueda, N., and Hoshi, H. (1986). “An Expert System for Designing Image Analysis Programs Based on Software Package, in Expert Systems - Theory and Application,” 135-154, Nikkei BP, Inc. (Nikkei McGraw-Hill, Inc.), [in Japanese]. Sugihara, K. (1984). “An Algebraic Approach to Shape-from-Image Problems,” Artificial Intelligence 23, 59-95. Tamura, H., Sakane, S., Tomita, F., Yokoya, N., Kaneko, M., and Sakaue, K. (1983). “Design and Implementation of SPIDER - A Transportable Image Processing Software Package,” Computer Vision, Graphics, and Image Processing 23, 273-294. Tamura, H., Sato, H., Sakaue, K., and Kubo, F. (1988). “DIA-Expert System and its Knowledge Representation Scheme,” Transactions of Information Processing Society of Japan 29, No. 2, 199-207 [in Japanese]. Toriu, T.. Iwase, H., and Yoshida, M. (1987). “An Expert System for Image Processing,” Fujitsu Sci. Tech. J . 23, No. 2, 111-118. Toriwaki, J., Mase, K., Yashima, Y., and Fukumura, T. (1982). “Modified Voronoi Diagrams and Relative Neighbors on a Digitial Picture and Their Applications to Tissue Image Analysis,” in “Proc. 1st Int. Symp. on Medicial Imaging and Image Interpretation,” 362-367. Van Hentenryck, P. (1989). “Constraint Satisfaction in Logic Programming,” MIT Press, Cambridge, MA. Vogt, R. C. (1986). “Formalized Approaches to Image Algorithm Development Using Mathematical Morphology,” in “Proc. of VISION ’86,” Detroit, 5-17-5-37. Vogt, R. C. (1989). “Automatic Generation of Morphological Set Recognition Algorithms,” Springer-Verlag, New York. Watanabe, M., and Ohta, Y. (1990). “Cooperative Integration of Multiple Stereo Algorithms,” in “Proc. of 3rd International Conference on Computer Vision,” 476-480. Winston, P. H. (1975). “Learning Structural Descriptions from Examples,” in “Psychology of Computer Vision” (P. H. Winston, ed.), McGraw-Hill, New York. Zadeh, L. A. (1965). “Fuzzy Sets,” Information and Control 8, 338-353. Zahn, C. T. (1974). “An Algorithm for Noisy Template Matching,” in “Proc. IFIP 74,” 727732.
This Page Intentionally Left Blank
ADVANCES IN ELECTRONICS A N D ELECTRON PHYSICS, VOL. 86
n-Beam Dynamical Calculations KAZUTO WATANABE Tokyo Metropolitan Technical College, Tokyo, Japan
I. Introduction . . . . . . . . . . . . . . 11. n-Beam Dynamical Calculation Methods . . 111. BetheMethod . . , , . . . . . . . . . A. Bethe Method Formulation . . . . . . B. Inelastic Scattering . . . . . . . . . . IV. Multislice Method. . . . . . . . . . . . A. Propagation in a Vacuum and a Material B. Defect Calculation . . . . . . . . . . C. Derivation of the Multislice Method from D. Extended Multislice Method. . . . . . E. Inclined Illumination . . . . . . . . . V. Coupled Differential Equations . . . . . . A. Real Space Method . . . . . . . . . B. Direct Integrated Method . . . . . . . C. Scattering Matrix Method. . . . . . . VI. S u m m a r y . . . . . . . . . . . . . . . Acknowledgments. . . . . . . . . . , . Appendix: Crystal Potential . . . . . . . References . . . . . . . . . . . . . , .
. . . . . . . . . . . . . . . . . . . . . . . .
.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
the Schrodinger Equation
. . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . .
.
. . . . . . . . .
. . . .
. . . . .
. . . . .
.
. . . . . . . . . . . . . . . . . . . . . . . . .
, . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . .
173 176 178 178 183 187 187 191 193 195 197 199 199 203 210 217 219 2 19 22 1
I. INTRODUCTION Recent developments in high-resolution transmission electron microscopy (HRTEM) have made it possible to directly observe lattice images by improving resolution limitations, with this technique being successfully used to determine the material structures of both large and small unit cells i(Allpress, Sanders and Wadeley, 1969; Uyeda et al., 1972; Izui et al., 1978). For semiconductor materials, perfect crystals as well as strained-layer sublattices and defects have had their structures determined (Khang, Ku, and 'Wu, 1986; Legoues, Copel, and Tromp, 1989; Muller et al., 1989). In addition, sublattices and compound semiconductor polarities have been identified using intensity differences from either elongated black and white spot contrasts that result from closely spaced pairs of atomic columns or from asymmetrical spot contrasts (Shiojiri et al., 1982; Yamashita et al., 1982; Ourmazd, Rentscher, and Taylor, 1986; Watanabe et al., 1987; 173
Copynght 0 1993 by Academic Press, Inc All nghts of reproduction in any form reserved ISBN 0-12-014728-9
174
KAZUTO WATANABE
1 s t Process: Mater i a I
Passing through the material
2nd Process:
I mag i ng
3 r d Process: Display
FIGURE1. Electron-optic system used for electron diffraction simulations.
Bourret, Rouviere, and Spendeler, 1988; Wright, Ng, and Williams, 1988; Smith, Glaisher, and Lu, 1989). Current HRTEM applications are solving a broad spectrum of complex material problems, being one of today’s most powerful techniques for examining atomic-scale crystal structures and acquiring chemical information. However, since the electron-diffraction process is irreversible, no unique structures have been derived except under special conditions, i.e., variations of the HRTEM contrast rapidly occurs in thickness and in instrument parameters such as defocus , energy spread, and illumination angle; thereby making it difficult to experimentally determine these values with sufficient accuracy to enable qualitative interpretations. As a result, extensive image fitting simulations in comparison with experimental results are acquired until the mismatch disappears. Therefore, image simulations are indispensable for HRTEM investigations. Image simulations are generally considered to be composed of three main processes as shown in Fig. 1 . The first process concerns how the incident electrons pass through the crystal and involves two problems. One involves calculation of the
n-BEAM DYNAMICAL CALCULATIONS
175
crystal potential for incident electrons. In conventional image simulation techniques, the fast electrons allow the crystal potential to be constructed by superposition of the neutral atoms. Then, if each atom’s position is known, the potential can be easily evaluated. Furthermore, this method can be applied to defects as well as complex structures. In fact, many good correlations exist between simulated and experimental results, although the screening of valence charge electrons due to crystallization has not yet been taken into account. This method’s limitation with respect to compound semiconductors was previously reported (Watanabe et al., 1987; 1991; Hashikawa et al., 1991; Hiratsuka, 1991), where it was found that the screening effect is required for accurate image simulation when the difference in the atomic number of compound semiconductors is small. The other problem involves the beam dynamical calculation. Applicable theoretical developments are at a stage that utilizes advances in HRTEM resolution to quantitatively reproduce detailed atomic structures and chemical characteristics of various materials. The second process concerns image generation created by electrons propagated from the crystal exit surface onto the image plane. Frank (1973) and Wade and Frank (1977) derived a contrast transfer function for partial coherence using first-order approximation in terms of the product of the unmodified transfer function and two modulation functions due to spatial and temporal partial coherence, respectively. On the other hand, O’Keefe (1979) and Ishizuka (1980) obtained a contrast transfer function through the transmission cross coefficient. The third process is image display, which allows simulation results to be compared with experimental images. To obtain suitable comparisons, a method to map calculated intensities to the gray-scale levels was proposed (Self and O’Keefe, 1988). Pattern recognition approaches have also been reported that achieve quantitative comparisons using model images (Ourmazd et al., 1989). The present chapter introduces fundamental concepts and basic principles related to both well-known and other new n-beam dynamical calculations. The advantages and disadvantages are also discussed. It should be noted that overrigorous mathematical arguments are avoided to increase understanding. Section I1 classifies n-beam dynamical calculations, whereas Sections I11 and IV, respectively, describe fundamental concepts and new ideas related to the two most commonly used methods; i.e., Bethe’s eigenvalue method (Bethe method) and multislice method. These methods are not covered in exhaustive detail, especially the applications to convergent-beam electron diffraction (CBED) and scanning transmission electron microscopy (STEM). Other new calculational methods are discussed in Section V, with Section VI providing a summary.
176
KAZUTO WATANABE
11. H-BEAMDYNAMICAL CALCULATION METHODS Since Bethe (1928) developed a dynamical theory of electron diffraction, the problem of dynamical scattering has been investigated by many researchers (Fujimoto, 1959; Cowely and Moodie, 1957; Hirsch et at., 1965; Van Dyck, 1980; Watanabe et al., 1988; 1990), with correlations between these approaches having been previously discussed in detail (Jap and Glasser, 1978; Gratias and Protier, 1983). Figure 2 shows various methods that have advanced the available supply of atomic and chemical information. The most common calculations utilize either the Bethe or multislice methods, where in the former (Fujimoto, 1959), the Schrodinger equation is solved under appropriate boundary conditions by assuming three-dimensional (3-D) periodicity. Once the Bloch waves have been determined, the specimen’s wave function is readily calculated for any crystal thickness. The resultant wave function is then evaluated using a few of the dominant Bloch waves. However, present computer technology does not enable n-beam dynamical calculations of complex systems to be easily solved. On the other hand, the multislice method was originated by Cowley and Moodie (1957) using physical optics, being followed by deviations involving quantum mechanical considerations (Ishizuka and Uyeda, 1977; Jap and Glasser, 1978; Gratias and Protier, 1983). Although this method is the most conventionally used theory, determination of the corresponding slice thickness for complex systems and their respective positions has not been completely clarified. Self et al. (1 983) provided a comparison of the existing methods and discussed the relative merits of each method. Several excellent articles on the Bethe and multislice methods have also been published (Hirsch et al., 1965; Cowley, 1981; 1988). Three types of n-beam dynamical calculations have evolved from the one particle Schrodinger equation, and it is felt that these methods contain enough capability and potential to meet today’s demand for HRTEM simulations. The real space method was derived by Van Dyck (1980) and has been extended. This method is different from other approaches because all calculations are performed in real space, thus making it suitable for defect calculations. Watanabe et al. (1988; 1990) introduced the direct integrated method under two-dimensional (2-D) periodicity. Initially, a coupled second-order differential equation was derived, followed by analytical integration. This equation was then modified assuming a modulated wave function, and a coupled first-order equation was subsequently developed. In addition, Nagano (1990) proposed a scattering matrix method in order to calculate reflection high-energy electron diffraction (RHEED). In this method, a coupled second-order differential equation, is solved,
v)
e x
L I -
s o c a
7
c
c al
0
r
w
0
I-d
o
c
L
ca-z
0)
I - c c
a a l c
.-
c
n-BEAM DYNAMICAL CALCULATIONS
.-
-.I-
m=5
I
I o a m a 0
.-
I--
1 - I - -
- 0 c c a 1 a 0 0 0 I-.U c a e o c m al 0.4.. a
c a.- w c m n w
177
I
I.: I L
r 1
a
c x
.-
1,
178
KAZUTO WATANABE
thereby enabling it to be easily applied to HRTEM n-beam dynamical calculations.
111. BETHEMETHOD The foundation of this method was originally presented by Bethe (1928) as an electron diffraction theory. Later, its application was developed by Fujimoto (1959), who solved the Schrodinger equation under appropriate boundary conditions by assuming 3-D periodicity. A . Bethe Method Formulation
The Schrodinger equation for electron energy ranges commonly employed in electron microscopy is
V]* = 0. The crystal potential has the same periodicity as the lattice and is assumed to be a simple function of position; i.e., V ( r ) can be expanded as a Fourier sum: - 2m/ti2V(r) =
1 vh - exp (ibh
r).
h
(2)
The summation extends over all the reciprocal lattice vectors bhof the crystal. The periodicity imposes the following Bloch conditions on the wave function:
-
$k(r) = exp(ik r)Uk(r).
The function Uk(r)must be unchanged during a translation through any lattice vectors, hence it can also be expanded in a Fourier sum as $k(T)
= eXp(ik 'I')x4heXp(ibh' r) h
where kh = k -I- bh. Function (3) is called the Bloch wave of wave vector k. Substituting (2) and (3) into Eq. (1) results in
(k2- ki)4h
+
xz)h-g4g
=
0,
(4)
g
+
where k = {m/h2(E v,,)}"~ is the magnitude of the wave vector inside a crystal. Introducing excitation error (h and anpassung value 5, the fOllOWing
n-BEAM DYNAMICAL CALCULATIONS
179
Crystal surface
0
r
b
FIGURE 3. The excitation error (,, and the anpassung value
relation is obtained (Fig. 3): k h = Ic
-
<
' COSe -
[h
(5)
where 8 is the angle from the surface normal. During the transmission of high-energy electrons through a thin crystal, no appreciable backscattering and diffraction occur within a small angle around the incident-beam direction. This allows t2and 5 * [h to be neglected with cos o h = 1 for forwardscattering and small-angle approximations. As a result, k2 - ki in Eq. (4)becomes
kz - ( K - 5 -case - ch)* = -x
+Ph,
(6)
where x = 2 ~ 5 and , Ph = 2 ~ [ , - [ i . Equation (4) is expressed in matrix representation as (7)
MdJ = X d J ,
where matrix M and vector PO
4 are written by ...
v_h
Ph
. . . v_, . . . . . . K-g . . .
6 . . . vg-h . . .
Pg
...
180
KAZUTO WATANABE
The diagonal elements of matrix M are evaluated from the excitation error, whereas the off-diagonal ones from the Fourier components of crystal potential. Matrix M is Hermitian in the absence of absorption and real for centrosymmetric conditions. Using the eigenvalues and vectors of Eq. (7), the solutions of the Schrodinger Eq. (l), i.e., the Bloch waves, are obtained by
b' ( k * r)
= h
& * exp [i(kj,- r)].
The wave function inside the crystal can therefore be written as
where a' is the coefficient determined using the boundary condition at the entrance surface. In the forwardscattering approximation, the continuity of wave function and its first derivative at the entrance surface ensures that the sum of all the beam amplitudes is 0 except in the incident beam direction, where
1.i = c a ' $ i = d h n . I
(10)
I
Multiplying both sides of (10) by
&* and summing over h gives
Provided the Bloch wave is orthogonalized and normalized, Eq. (1 1) becomes Hence, in a vacuum the wave function and intensity are, respectively,
n-BEAM DYNAMICAL CALCULATIONS
181
After the beam's condition is determined, the matrix elements can be obtained. Numerical evaluation of these amplitudes is then performed by solving Eq. (7). Although the computational time for perfect crystals is not a significant factor using currently available advanced computers, its reduction has been examined using crystal symmetry. Hirsch et al. (1965), Howie (1966), and Blume (1966) respectively solved reflection cases having two-, four-, and sixfold symmetries. Fukuhara (1966) additionally provided a number of examples in which exact solutions were calculated by quadratic equations. Many beam dynamical simulations of the symmetric Laue case have been developed by Kogiso and Takahasi (1977) using group theory. In order to effectively examine atomic structures with HRTEM, a matching between the experimental images and simulated images is required. Up to now, values of very thin thickness and defocus have unfortunately not yet been determined accurately. Experimental ambiguities are normally eliminated by fitting through-focus and through-thickness images with simulated images under many different thickness and defocus conditions. The Bethe method is the most suitable one for thickness determination because, once the secular equation is solved, the wave function for various thicknesses is readily calculated. This makes Bethe method convenient to use, although it is inadequate to evaluate lattice images of defects due to its 3-D periodicity. Figure 4 shows a simulated through-focus series of GaAs images in the (100) projection (Watanabe et al., 1991). All simulated images were obtained by the Bethe method. Simulations used thickness increments of 0.565 nm (2.825-16.95nm) at 300 kV and Cs/Cc values of 0.9/1.5mm. The defocus range was from -200 to 200nm with a 5nm interval. The lower insert simulated image was obtained using the conventional crystal potential superposing neutral atoms, whereas the upper simulated one with the screening crystal potential constructed from ion and Hartree potentials (see Appendix). White spots corresponding to constituent atoms are resolved, but the contrast between the two different atoms is the same. Resultant simulations using the Bethe method can reproduce contrast intensity changes with defocus whether or not the screening effect is accounted. As for the proceeding of Bethe method, the deviation of the scattering matrix form was given by Fujimoto (1959) and Sturkey (1962). For a particular eigenvalue, x i , matrix representation is
M @ = xi4;. (15) Since xi is a simple number, applying the matrix to the right-hand side of Eq. (15) gives Mnf#j = (X')n&.
(16)
FIGURE 4. A through-focus image series of (100) GaAs. Simulated images using screening and conventional potential are superimposed, respectively, on the upper and lower right of each micrograph. Estimated thickness are 14a and 15a (a = lattice constant) using conventional and screening potentials, respectively. Defocus values are (a) 20nm, (b) 25 nm, and (c) 30nm underdefocus. (Courtesy of Taylor and Francis Ltd.)
n-BEAM DYNAMICAL CALCULATIONS
Multiplying both sides of Eq. (16) by
&,*
183
and summing over i results in
C<x'>"+b* 4; = CC&* 4; - ( M " ) *
*
1
I
=
8
1(1bh* g
'
4;)
* (Mn)hg
= (Mn)hO.
(17)
l
Applying the boundary condition (10) gives
I& 4; = 6,. * *
I
Using Eq. (17), Eq. (13) can be rewritten as
where the scattering matrix is defined as S
= exp {(iz/2ic)M).
Applying Eq. (18) to a thin crystal of thickness z and assuming the incident wave amplitude, i.e., O,,, the amplitude q$, at the exit surface is D O
= e x p (2K kM)
0
(19) 0
Although this method has been known since 1959, it has received little use (Self et al., 1983). B. Inelastic Scattering
Incident electrons impinging on a crystal are elastically scattered by the crystal potential and inelastically scattered by the polarization field of the crystal structure. The polarization process is divided into two parts, a real process in which the definite energy is transferred and a virtual process that occurs before the energy transfer. The real process corresponds to the
184
KAZUTO WATANABE
complex potential, with this complex potential being called the optical potential in the field of atomic collisions and in nuclear physics for homogeneous scattering distributions. A theoretical justification for treating the crystal potential as complex was given by Yoshioka (1957), when inelastic scattering is small compared with the elastic scattering. The Schrodinger equation representing an incident electron’s interaction with a crystal is
where - h2/2mAis the electron’s kinetic energy, H, is crystal’s energy and H‘ is the interaction energy between the electron and crystal. Neglecting the exchange effect, the wave function of this system, Y, is expanded as Y{r, r l , . . . , T N ) =
c *,(r>
*
afl(r1>
. . . ,r N ) ,
(21)
n
where a, is the wave function representing the nth excited state of the crystal and satisfying H,a, = Enan. Function $o in (21) is the elastic scattering and $n is the inelastic scattering that causes excitation to the nth state. Substituting function (21) into Eq. (20),Yoshioka’s coupled equations are obtained as
where ki
= ( 2 m / h 2 ) ( E- E n )
s
Hni= a,*H’amdrl. . . dr,v.
The right-hand side of Eqs. (22) and (23) gives the effect of inelastic waves. Since is small compared with $o, and 12rnHLm/h’*I6 1, Eq. (23) reduces to
+,,
+
(A k f ) G = ( 2 m / h 2 ) H ~ o t j o , (n 2 1 ) Eq. (24) is easily solved as *n(r) = -
exp {ik,(r - r’l} Hio(r’)$o(r’) dr‘. )r - r‘l
(24)
(25)
185
n-BEAM DYNAMICAL CALCULATIONS
Substituting solution (25) into Eq. (22) enables the elastic scattering equation to be obtained as A(r, r’)+hodr‘ = 0, where A(r, r’) is written by A(r, r’) =
-
m exp {iknlr- r’l} H&,Hio 2nh2n + O Ir - r’l
1
-
The wave function of Eq. (26) and crystal potential HA can then be respectively expressed as $o(r> = 1
h
4 h ’ exp (ikh
HA = - 1 h
*
r),
(kh
=k
(27)
bh)
- exp (ib, - r).
(28)
-
By substituting (27) and (28) into Eq. (26),multiplying by exp (- ik, r), and integrating over the entire volume, a generalized fundamental equation of the dynamical theory that takes into account inelastic waves is obtained as
where
Using the identity relation of 1 1 - Px+id x --
+ inij(x)
enables qgto be rewritten as V,g=vg+i-Kg. Inelastic scattering effects can be accounted for by adding a complex potential to the Fourier coefficient of the crystal potential. The matrix M of (8a) is then expressed as
+ + ic V, + K + ir5: V, + v,‘+ i<
po
v-, + v:, + ivi, . . . v-, + v:, + ivi, . . . ... ph + + ic . . . V,-z + K-z+ i%-, ... . . . Vg-,, + T-, + iq-, . . . p,+v,+ic ... ...
1
(30)
186
KAZUTO WATANABE
The matrix M' of (30) is not Hermitian, and the eigenvectors are not real, hence the Bloch waves have wave vectors with an imaginary z component and are attenuated. The eigenvalues and eigenvectors can be easily calculated using a computer whether or not the matrix M' is Hermitian. On the other hand, perturbation theory offers a good approximation to determine the changes in k' that are required by the absorbing potential (Hirsch et al., 1965) because the ration V,,'/ 6 is very small. The first-order perturbation results in the wave vector becoming
k"
=
k'
+ iq',
(3 1)
where 4,is given by
The intensity of the waves leaving the exit surface of a crystal having thickness z is
Corrections accounting for single-electron inelastic scattering have been calculated using atomic wave functions (Whelan, 1965), although Humphreys and Hirsch (1968) also calculated this correction for different inelastic excitation processes. It has also been shown that the ratio of the Fourier coefficients of the absorption potential due to thermal diffuse scattering, expressed by an Einstein model and elastic scattering, %/ 6 ,are not constant with respect to reciprocal lattice vector g , but instead show complex behavior. Additionally, a contribution of plasmon scattering to the complex potential was investigated by Radi (1970). Recently, Allen and Rossouw (1990) proposed an expression for Fourier coefficients of the absorptive potential due to single-impact ionization in crystals, as well as for the coefficients for thermal diffuse scattering from the first principle. It should be noted that all these studies are conducted to examine the effects of inelastic scattering on the elastically scattered electrons. The first attempt to solve inelastic wave functions using Eqs. (22) and (23) was initiated by Howie (1963) based on the Bloch wave approach under a small angle approximation. It was subsequently shown that the matrix element Him(r)could be expressed as Hn',,,( r ) = exp ( - iq,,
- r ) c' H,"'"
*
exp (ig * r ) ,
(34)
4
where q,, is the wave vector of the crystal excitation created in the transition from m to n, and g are the reciprocal lattice vectors.
n-BEAM DYNAMICAL CALCULATIONS
187
Both the initial and the final states of the fast electron may be represented as a sum of Bloch waves: i.e., = 1.;(4
$k@)
*
b'(k, r)
1
=
1a;(z) - 1&,
*
-
exp (iki r).
h
I
(35)
At this point, the objective is to determine the depth dependence of the various Bloch wave-excitation amplitudes CI:(Z). By substituting (35) into Eqs. (22) and (23), neglecting the small term d2tLk/dz2,multiplying both sides by bJ*, and integrating over the x, y plane containing the reciprocal lattice vector, the following relation is obtained between the Bloch wave amplitudes:
where CiL
=
-
-
{im/h2(k,)Z} exp{i(k,
-
- k, - qL)z z ) ~ d $ * ( k , ) H ~ ' M ( k , ) . h,g
(37) Equations (36) have been applied to various inelastic processes, for example, Howie (1963) considered plasmon losses, whereas Cundy, Howie, and Valdre (1969) and Humphreys and Whelan (1 969) investigated valenceband and single-electron excitations. In addition, phonon excitation has been treated by neglecting the transition among excited states (Rez, Humphreys, and Whelan, 1977; Rez, 1983).
IV. MULTISLICE METHOD From an analog of the interaction of light with a transparent object, Cowley and Moodie (1957) derived multislice method for elastic scattering of fast electrons in a material. Thus, it is called physical optics.
A . Propagation in a Vacuum and a Material Following Huygens's principle, the wave function at S, in a vacuum is constructed by propagating from S, as shown in Fig. 5: $2W, v')
=
5
$1
(x, Y)P,(X'
-
x, y'
-
Y ) dx 4 ,
(38)
where z is the propagation direction and p z the propagation function.
188
KAZUTO WATANABE
FIGURE5. Coordinate system for the propagation of electrons
Generally, the propagation function is written in terms of spherical wave: 1 r
- exp (ikr).
Making use of the small angle approximation results in
1 r
1
- exp (ikr) = - exp (ikz)exp Z
Neglecting exp(ikz), which does not depend on x and y , the propagation function is given by
For the propagation of electrons in a material, the material is divided into a number of slices perpendicular to incident beam direction, and propagation in a vacuum and transmission in a potential field are treated separately. The propagation of electrons in the vacuum is already given by the propagation function (39). The transmission in the potential field gives rise to phase change associated with change in wave length, hence the wave length of electrons through the potential field V ( x , y , z ) is
x = h/[2m{w - V ( x , y , z))]”’. The phase difference with respect to the wave in the vacuum for slice
189
n-BEAM DYNAMICAL CALCULATIONS
thickness Az is defined as -~ V ( X y)Az/il , W.
Here, W is the accelerated voltage and the projected potential, V ( x , y ) , is obtained by V ( X ,y)Az =
J
Z=ZO+AZ
V ( x , Y , z)dz.
z=zg
For most crystals, the projected potential per unit length is integrated over the crystal-repeated distance parallel to the beam direction:
{
V ( x , Y ) = (I/ZO)
ZO
C v(h, k , /)exp {i(h - x + k - y + I - z ) ) dz
0 h,k,l
=
1v(h, k, 0) exp ( i ( h x + k - y ) } , *
h, k
where v(h, k, I) are the Fourier coefficients of the crystal potential. The Fourier coefficients and structure factors are respectively defined as v(h, k, Z )
= h2F(h, k, /)/271meS1.
F(h, k, I)
= i
f ; exp (ig * rj 1,
where the vector rj specifies the positions of each atom of the unit cell in crystal and f ; the electron scattering factor for j t h atom. Normally the electron scattering factor is obtained from either the scattering factor for x-rays in international tables for x-ray crystallography or tabulated values for neutral atoms (Doyle and Turner, 1968). Then, the projected potential of a perfect crystal is readily calculated, if the atom positions and their species in the unit cell are determined. The phase change of the electron produced by passage through the potential field in the region zo + zo Az is given by
+
d x , Y ) = exp { - i a V ( x , v)Az),
where a = 2711 WA,
Therefore, the real space wave function, $,, , is written in terms of the wave function $n-, through (n - 1)th slice as
s
$?Ax, Y ) = [$n-I(x’,Y’)Pn-I(X’, Y’)lqn-I(x - x’, Y - Y’)dXdY
-
*
(40) (b) Pn- I @)I 9 n - I (b)? where P ~ is - propagation ~ function for (n - 1)th slice (Fig. 6). The con=Wn-
I
190
KAZUTO WATANABE
A z
A z 9 1
9 1
A z 9
Az
A z 3
q
4
9
q
n-1
n
FIGURE6. Diagram illustrating the multislice method.
volution symbol,
*,
is defined by
f(x>
=
jf(X)n(x -
w dX.
The phase grating function qn-, in the crystal potential at (n - 1)th slice is expressed as qn-l(X,v) = exp{-iaK-,(x, y)Az}. (41) For initiation of the iteration of Eq. (40), the zeroth-slice wave function $o is set to the incident wave function. Although equal slice thickness is not required in the iteration, computational time and memory can be saved if the slice is defined such that qn and pn are unchanged from one slice to the next. The multislice Eq. (40) is expressed in the reciprocal-space representation as IcIn
(k k) = C 1[$n- i (A’, W”- (A’, k’>lQ,- i ( A - A’, k h k
- k’)
= [$,-i(k k)Pn-i(kk)I*Q+i(k k), (42) where h, k represent the reciprocal-space coordinates corresponding to x and y . Using equal slice thickness, Eq. (42) can be rewritten as
$n(h, k ) =
CCW
[$n-
i
(A’, k’)f‘(h’, k’)lQ(A - h’, k
-
k’)
h’
(43)
If there is a total of N beams included in the calculation, then the preceding sum requires 2 N 2 complex multiplications and N 2 complex additions. The computation time for this procedure can be prohibitive, considering that it must be repeated for every slice. An alternate method for evaluating the convolution is use of the convolution theorem for Fourier transforms (Ishizuka and Uyeda, 1977). With this theorem, the multislice iteration is k ) = q F - ” L ( k k)P(h, k)lF-“Q(h,
Wl},
(44)
n-BEAM DYNAMICAL CALCULATIONS
191
where F and F-' denote the forward and inverse Fourier transforms. In this calculation, four steps are needed at one convolution sum so that it seems to be roundabout. However, with an increasing beam number, the computation time in each sum is less than that in straightforward calculation because of the efficient of fast Fourier transform algorithm (Brigham, 1973). Furthermore, the method is extremely fast when it is programmed in conjunction with an array processor, which makes possible the simulation of defects. The practical procedure of this calculation is summarized by several authors (Ishizuka and Uyeda, 1977; Cowley, 1981). The two main reasons why the multislice method fails to find accurate values for diffracted amplitudes and phases result from the larger slice thickness and the use of an insufficient number of beams in the calculation. Therefore, it is necessary to determine the convergent conditions for beam number and slice thickness before fitting with experimental images. The convergent conditions depend on the atomic scattering factor, accelerating voltage, and orientation.
B. Defect Calculation The extension of the multislice method to the calculation of nonperiodic objects such as defects and amorphous scattering can be performed with the so-called periodic continuation approximation. It retains periodicity artificially, so that reciprocal space is sampled at positions for an extended unit cell like Bragg reflections (Fields and Cowley, 1978).Then, both Bragg reflections and diffuse scatterings are calculated with the convolution theorem for Fourier transforms as in a single unit cell. Up to now, there are some works for simulations of defects (Armigliato et af., 1985; Kikuchi et al., 1988). In order to perform the accurate simulations for a defect, the large extended unit cell is demanded. In practical calculation, however, the selection of extended unit cell size has not been completely clarified. The simulated images of a vacancy for each extended unit cell are shown in Fig. 7 under the following beam conditions: (1) all Bragg reflections and diffuse scatterings inside the objective aperture corresponding to the dark area in the diffraction pattern (Fig. 7(a)): (2) only Bragg reflections inside the objective aperture (Fig. 7(b)); and (3) the 000 and diffuse scatterings in dark area (Fig. 7(c)) (Kikuchi et af., 1988). By choosing optimum defocus, the identification of a vacancy can be carried out with the difference in spot contrast between a vacancy and surrounding atoms. However, the small extended cell size, i.e., 2a x 2a and 4a x 4a (a = lattice constant), leads to virtual bright spots between vacancies, although the contrast change in spot contrast due to a vacancy for 8a x 8a is isolated around the vacancy. The insufficient extended unit cell size gives rise to incorrect identification of a
FIGURE7. Simulated images of a vacancy in copper as a function of extended unit cell size at AF = 49 nm. Images were simulated under the following conditions: (a) Bragg reflections and diffuse scattering inside the objective aperture; (b) only Bragg reflections inside the objective aperture; (c) 000 and diffuse scattering of dark area. (Courtesy of Dr. K. Kikuchi and Phys. Sratus Sol.)
193
n-BEAM DYNAMICAL CALCULATIONS
defect. For the determination of the atomic structure of a defect, the extended cell size is one of the important factors as well as the selection of slice thickness and position. C . Derivation of the Multislice Method from the Schrodinger Equation
Besides the physical optics approach, the multislice method was derived from quantum mechanics by several authors (Ishizuka and Uyeda, 1977; Jap and Glasser, 1978; Gratias and Protier, 1983). In this subsection, the approach of Ishizuka and Uyeda (1977) is discussed, because it is applied to deriving the extended multislice theory and the inclined illumination one. The integral equation of the Schrodinger equation is defined as exp {iklr - r’l} $(r) = exp ( i k * r) - m/(27ch2) W’)$(r’) dr’, (45) Ir - r‘l
s
where k = lkl, r = ( x , y , z), and k is a wavevector of an incident electron. For high-energy electrons, it is useful to rewrite the wave function as modulated plane wave:
-
(46) $(r) = exp ( i k r)p(r). Substituting (46) into (49, the integral equation for 4 is exp {iklr - r’l - k (r - r’)} V(r’)4(r’) dr’. (47) 4(r) = 1 - m/(2nh2) Ir - r’l
-
Using a small angle approximation, the following relation is obtained: Ib - bI2 klr - r’l - k (r - r’)k * (r - r’) x k 2(z - z‘) Here vectors b(x, y ) and b(x’, y’) are perpendicular to the z-axis. Then Eq. (47) is approximated for the forwardscattering as
-
$(b, z ) = 1 - ia
ss
V(b’, z’)#(b’, z’)p(b - b’, z - z’)dz’db.
(48)
By using the following properties
r
p ( b - b’, z - z’)
=
J,
p(b - b , z - z”)p(b - b , Z”
-
z’)db, (49b)
where X is the plane positioned at z“ between z and z’, Eq. (48) is modified as 4(b, z> =
s
4@0
7
zo)p(b
-
bo z 2
-
zo) dbo
194
KAZUTO WATANABE
-4
lDll l+., -2
V(b’, z’)#(b’, z’)P(b - b’, z - z’) dz’db’, (50) in‘ where n’ = (l/hv) and v is velocity. This integral equation of Volterra’s type (Margenau and Murphy, 1943) can be solved by expanding 4(b, z) in an infinite series off,(b, z): -
Substituting the trial forms of (51) into (50), zero-order part and nth order one are obtained as
I
f,(b, z ) = #@o, zo)p(b - bo, z - zo)dbo,
(524
L(b, z ) = j ‘ = ‘ j V ( b ’ , z’)f,_,(b’, z’)p(b - b’, z - z’)dz’db’.
(52b)
I =zo
With the successive application of the stationary-phase approximation method, f, reduces to
f,(h z ) = ~ ( 1 / n ! ) { V(bo, ~ ~ ~ z’ldz’ ‘ -_
r
4(bo, zo)p(b - bo, z
-
zO)dbo.
(53) When relations of (52) and (53) are substituted into (51), the final relationship for the n-beam dynamical calculation before and after passing the slice is given by z =z
4(b, z)
= lexp{ - io’l
V(bo, z’)dz’
:=zo
1
4(bo, zo)p(b - bo, z - zo)dbo
(54) Equation (54) can also be transformed into the familiar convolution form:
4n+,(b) = [#n(b), qn(b)l*.p(b,
Zn+I - Z n h
(55)
where
4%(b) = #(b3
zn
)3
.-- ‘ “ +_I
{ jz ~
q n = exp - 10’
=:”
I
V(b, z’)dz’ .
The sequence of the wave propagator and the phase grating function in Eq. (55) is in the reverse order to that in Eq. (40). This difference is caused by the manner of the projection potential in a slice. The projection on front surface is carried out in Eq. ( 5 5 ) , while the back one is in Eq. (40).
195
n-BEAM DYNAMICAL CALCULATIONS
D . Extended Multislice Method An extended multislice theory succeeded in including multiple inelastic scattering as well as the multiple elastic one by solving Yoshioka's coupled Eqs. (22) and (23) (Wang, 1989; 1990). Equations (22) and (23) are expressed in a matrix form as
The following basic procedure is similar to that introduced by Ishizuka and Uyeda (1977) for finding multislice formula. The wave function for a highenergy electron may be represented by the modulated wave function: The integral equation for
4,
$, = e x p ( W 4 , can be written in the form (57):
F(r - r', ko)
0
0
...
0
F(r - r', k , ) . . .
0
0
... ... ... . . . F(r - r', k,)
0
where function I is
and the function F is defined as
-
F(r - r', k,) = exp[i{k,Ir - r'l - k, (r - r')}l/lr - r'l. Using a small angle approximation, the stationary-phase approximation, and relations of propagation functions (49a, b), the extended multislice equation for multiple elastic and multiple inelastic scattering before and after penetrat-
Po@
40
‘I
4m
= Jdbo
-
bo, Az)
0
0
pl(b - bo, Az)
0
0
...
0
... 0 .. .. .. ... . . . pm(b- bo, Az)
where the propagation function with energy E, is given by
P,(b, Az)
e x p (inb2/1,Az). i&Az
=L
Here, hAm = Hn;(b, z)Az, and a’ = (l/vh). The equation for first-order approximation of (58) is derived by assuming
(60) Equation (60) has clear physical meanings. For the elastic scattered wave, 40, the first term represents simple elastic scattering, which is the elastic penetration of the incident elastic wave. The second term shows the transitions from excited states to the ground state, which is characterized by a complex correction potential associated with the polarization process in a solid. For the excited state, 4,,, the first term represents the elastic scattering of an inelastic wave and the second one the transmission from other states to the nth state. This latter term is considered the generation of the inelastic wave when the electron passes through a crystal. The practical applications of this theory were performed for the calculations of energy-filtered diffraction patterns and images, the energy-filtered
197
n-BEAM DYNAMICAL CALCULATIONS
diffraction patterns from atomic inner-shell losses, and the contribution of thermal diffuse scattering to the high-angle annual-dark-field (ADF) scanning-transmission-electron-microscopy (STEM) lattice images (Wang, 1990). E. Inclined Illumination The previous discussion for the multislice method is based on the conditions that the crystal-zone axis is parallel to incident-beam direction and the surface is normal. When fitting simulated HRTEM images with experimental images, beam tilting is an important factor for HRTEM simulation as well as convergent-beam electron diffraction (CBED) experiment and composition analysis by the thickness-fringe (CAT) one. There are two ways to include the tilting effect in the multislice method. In the former, the effect is introduced only in the propagation function through the excitation error, and the projected potential need not be recalculated. The reciprocal-space form of the propagation function through Az is given by P(h, k ) = exp [ - 2xi[(h, k)Az], (61) where ( ( h , k ) is the excitation error (Allpress et al., 1972; Cowley, 1988). The excitation error is defined as negative when the reciprocal lattice point lies outside the Ewald sphere and is parallel to the surface normal. From the geometric arguments, the following relation is obtained:
l ( h , k) = [(l/i.)2 - ((h - h0)2a*2
+ (k - k0)2b**+ 2(h - h,)(k - k,)a*b* cos [(l/A)’ - {hia*2 + k;b*’ + 2h,koa*b*cos~*)]’~2, B*)]1’2
(62) where jl* is the angle between the reciprocal-lattice basis vectors and (h,, k,) the center of the Laue circle. Self and O’Keefe (1988) showed that when the beam is not aligned with crystal-zone axis, the error caused by this treatment is negligible in calculated diffraction-electron amplitudes and phases for the beam tilt of up to 10” away. As for the tilt of surface normal, the treatment is suitable to the inclined surface by up to 45” from the hk plane. With the exception of the definition (62), the paraboidal approximation to the Ewald sphere is utilized by many workers. A different approach for including the effect of crystal tilt in the multislice method has been reported (Lynch, 1971). The latter was proposed by Ishizuka (1982) based on the Schrodinger equation. The procedure for deriving the multislice equation of inclined illumination is almost the same as that in Section 1II.B. The multislice equation for inclined illumination for an orthogonal system in which the z -
198
KAZUTO WATANABE
FIGURE8. The relationship between the incident beam direction and the crystal coordinates. The shift of origin is denoted by b,.
axis is perpendicular to the crystal surface is found as where k
=
@ " + I (b) = [4fl(b)qfl(b)l* (kZ/k)Pfl(b), lkl, o = ( i / h ) ,k, = z component of k, and
(63)
jZZZn 2=zn+,
qn =ex+
i ( k /k . )c' ~
V(b,
Zk) dZk}.
When the c axis of the crystal coordinates is not perpendicular to the surface, i.e., a nonorthogonal system (Fig. S), the b coordinates of origins of phase grating function shift to a direction parallel to the surface. The origin of @ n + , must coincide with that of the next phase grating function qn+,during the next iteration. If this shift is denoted by b,, then Eq. (63) becomes - b,) = [4fl(b)qfl(b)l*(kz/k)Pfl(b). (64) This multislice method with inclined incidence was applied to throughthickness images of small MgO cubic crystals observed from the (1 10) direction with 200 kV and 400 kV high-resolution electron microscopes (Tanji, Masaoka, and Ito, 1989). Simulated through-thickness images were
n-BEAM DYNAMICAL CALCULATIONS
199
compared with actual microscope images in Fig. 9 for the case of 400 kV. Simulated images show good agreement with the experimental images.
V. COUPLED DIFFERENTIAL EQUATIONS
A . Heal Space Method Van Dyck (1980) derived a new method for n-beam dynamical calculation, in which the entire calculation is done in real space. Thus, it is called the real space method (RS method). The domain of the validity of the RS method has been discussed by several authors (Self, 1982; Kilaas and Gronsky, 1983; Van Dyck and Coene, 1984). In the same way as other n-beam dynamical calculations, the RS method starts with Schrodinger equation:
A$
+ k2$ + V$ = 0,
(65)
where V = - 2me/h2U with crystal potential U. For high-energy electrons, it is convenient to rewrite the wave function as modulated plane wave:
$ = exp(ik
- r)4.
Substituting (66) into Eq. (65) gives
A 4 + i k - V 4 + V 4 = 0. When an incident beam is nearly parallel to the surface normal, as the most experimental situations, the normal component k, of wavevector is larger than the parallel component k, (Fig. 10). It is suitable to separate Eq. (67) into
where V, and A, are the gradient and Laplacian operators in the coordinates x, y , respectively. By making the assumption that 4 is a slowly varying function with respect to z , the second-order derivative with respect to z in Eq. (68) is neglected:
This amounts to ignoring backscattering electrons and a slight change in the electron wave vector when the electron tranverses the potential.
0
0
c-4
20 1
n-BEAM DYNAMICAL CALCULATIONS k.
0
E
FIGURE10. Schematic representation of the scattering problem. The system is divided into a series having thickness E .
Equation (68) is transformed into a first-order differential equation in z :
Using the shorthand notation:
i -V ( x , y ,
2kz
2) =
IV.
Eq. (70) can be reduced to
If the specimen is divided into thin slices of equal thickness, equation for each slice can be given by
E,
the integral
and it can be expanded in power in 1:
The expression of a second-order in I becomes
+
q5& Y , 8 ) = [1 + I { A X y & V,(x, Y>>+ ~ 2 { ( A x y ~ + ) 2V / 2, ( x ,y)'/2
+ A{&,
- Z(x, Y ) > V , ( X , Y )
where the projected potential,
+
P(x9
Y ) V , ( X , Y)Axy1q5(X, Y , 01, (74)
6 ,is defined as in the previous section, and
202
KAZUTO WATANABE
the center of potential given by
I ' z V ( x , y , z>dz T(x, Y ) =
I
E
V ( x , Y , z ) dz
'
Since Axv and V are noncommuting, a rapid calculation can be obtained in the case where Eq. (74) is expanded in alternation of a minimal number of the functions of the form: There is an expression that expands Eq. (74) up to the second order in AxJ and V in one slice E . It takes the form
44%.Y> E )
= exP{l/2iVP(x>Y)(l
+
v)>
*
exP(4y&)
x exp{l/22T/p(x, Y ) ( l - w , .Y))4(x, .Y> 01,
(75)
where the potential eccentricity, 6, is written as
This value presents the relative deviation between geometrical center 4 2 and that of the potential. The basic difference between the RS and multislice methods is that the former is carried out entirely in real space and eliminates the need to use periodic continuation, while the latter approach is carried out either entirely in reciprocal space or swapping between reciprocal and real space when the F F T algorithm is used. The original proposal of the RS method promised to further reduce computation time so that it is directly proportional to number of beams, N , while the multislice method is proportional to Nlog N using the FFT algorithm (Van Dyck, 1980). The method offers a considerable reduction in computation time over the multislice method, when identical sampling conditions are employed. However, keeping the same accuracy, the RS method requires more sampling points and more computation time than the multislice method. The validity and the comparison between the RS method and multislice one were discussed by Kilaas and Gronsky (1983). Van Dyck and Coene (1984) and Coene and Van Dyck (1984a,b) proposed the S 2 error criterion in order to optimize the algorithm and to select the parameters. Well known, the multislice expression is a solution of (73) up to the first order in 1,:
-
d x , Y , &I = exp (jk&A} exp {AK>qYx,Y , 01,
(76a)
n-BEAM DYNAMICAL CALCULATIONS
203
or
4[x, y, E ] = exp (A&>
- exp {A&A}4tx9Y , 01.
(76b)
which is easily derived using expansion of exp { h A } and exp { A V , } in 1. The required wave function for a given thickness can be obtained accordingly by successive applications of (76a, b). Hence, the dynamical calculations in the slice theory can be considered as an alternation between phase grating operation of V, and electron propagation of 2-D Laplacian operator A,. Wang and Chen (1988) made clear the validity of this first-order RS method using a criterion that imposes a practical limitation in choosing the sampling interval and slice thickness. Furthermore, the superiority for the simulation of nonperiodic objects was discussed by Wang et al. (1990). Recently, using the usual approximation in the “elastic” RS method and applying a “single-elastic-into-inelastic” approach, an extended real space method for multiple elastic scattering and multiple inelastic scattering was set up based on the Yoshioka’s coupled equations (22, 23) (Coene and Van Dyck, 1990). B. Direct Integrated Method
Using 2-D periodicity for a thin crystal, Watanabe et al. (1988; 1990) derived two sets of coupled differential equations for n-beam dynamical calculation. 1. Analytical Integration Method
This method (Watanabe et a/., 1988) was derived from Schrodinger equation using 2-D periodicity. As a crystal potential is obtained by V,,,where g,/ is the projection of reciprocal lattice vector g on the surface plane, the wave function such a crystal can be expanded as
wx 2) = C4Mz)exp G O / / + Q) - x)/s1’2,
(77)
g//
where X = (x,y ) and S = the area of surface. Substituting (77) into the Schrodinger equation, the result is a set of coupled one-dimensional equations: [(d2/dz2)+ 2E - Ik,, + gill2 - 2KNg//(Z)= 2 C V,,//-g//+g,//(z), (78) g’//
where V,,, is the Fourier component of the crystal potential at z. Here, the left-hand side indicates the propagating part with oscillations and the right-hand side the scattering one. The preceding coupled equations were also derived on the basis of the same assumption by different workers
204
KAZUTO WATANABE
(Tounarie, 1962; Lynch and Moodie, 1972; Ichimiya, 1983). When the set of differential equations is solved with numerical integration, the integration step must be kept small enough to accommodate the most rapid oscillations. The oscillating term makes the calculation enormous and tedious. This problem, however, is bypassed by integrating the coupled second-order differential equations analytically (Payne et al., 1986). The analytic integration of these equations gives the coefficient $,,,(Az) at the next step as $,// (AZ) = 2 cos W Z ) $ , / / (0) - $g// ( - A 4
+ 4[1 - cos
c
vg‘//-g/,(o)$g’//(o)/K2,
(79)
g’//
K’ = 2E -
Ik,, + gill2 - 2V,,
(80)
where $,,/(O) and t+hgii( - Az) are the coefficients at the present and previous steps, respectively. In principle, it can describe the localized states along the incident beam direction, such as a disorder system and surface relaxation, and may involve the multiple elastic scattering and multiple inelastic scattering in the crystal. Furthermore, using an extended unit cell, simulations of defects are carried out, as in the multislice method. The n-beam dynamical calculations of aluminum, copper, and gold at 100 kV were carried out in a completely parallel manner by the analytical integration, multislice, and Bethe methods (Watanabe et aZ.,1988). The analytical integration method turned out to be more competitive with respect to accuracy than the latter two methods. However, the slice thickness is much thinner than that of multislice method, and it is hard to apply the method to simulations of defects at a high accelerating voltage. 2. Numerical Integration Method
The time-consuming requirement for the analytical integration method is mainly attributed to the first term of the solution (79). Watanabe et al. (1990) derived a new method with the use of modulated plane wave approximation for a fast electron like other n-beam dynamical calculations. The Gg,,( z ) in the wave function (77) is assumed to be represented by a modulated plane wave: $g//
(4 = &,,( z )exp (ikz).
(81)
Substituting (81) into Eq. (78), the coefficient $,/, satisfies, following a set of equations:
[(d’/dz’)
+ 2ik(d/dz) - (k2+ Ik,, + gi/12- 2E)I$,/,(Z)
= 2 c vg,/,-g,/(bgY(z). g’//
(82) 2 , result is a set of coupled firstProvided that kd$,,,/dz 9 d 2 $ g , , / d ~the
n-BEAM DYNAMICAL CALCULATIONS
205
These equations remove the oscillating part of the propagating of electrons through a crystal. The right-hand side in Eq. (83) is very small for a high accelerating voltage, so that the integration step can be chosen to be much larger and much computation time is reduced compared with (78). Since the main part of the integration is the convolution sum at right-hand side in Eq. (83), it is possible to reduce computation time with the fast Fourier transform algorithm (Brigham, 1973) because the computational efficiency increases with the beam number. Compared with the multislice method, the slice thickness is so small that the calculation has little effect on slice position, unlike the multislice method. Therefore, it can be expanded not only to complex systems but also to defects using an extended unit cell. Equation (83) is also similar to the standard scattering equations (Howie and Basinski, 1968), except for a crystal potential. In this treatment, a unit cell is divided into many slices, and the crystal potential is constructed at each slice. On the other hand, the crystal potential for standard scattering equations is built up from the structure factors of the unit cell. The viewpoint of this method is quite different from the standard one in spite of its similar appearance. For RHEED, similar equations were also derived by Maksym and Beeby (1981). The Fourier transform to real space may correspond to the RS method. In this treatment, 2-D periodicity is used in deriving the equations to avoid the troublesome 2-D Laplacian calculation, whereas the RS method extracts the problem using a three- or five-point Laplacian approximation. In order to perform n-beam dynamical calculations accurately, it is indispensable to estimate the upper limit of slice thickness and the lower limit of beam number for various materials; that is, convergent conditions. A rigorous way of finding explicit convergent conditions is to calculate the sum of deviation functions (Ishizuka and Uyeda, 1977; Coene and Van Dyck, 1984a,b). The convergent conditions of aluminum, copper, and gold for (100) and (1 10) n-beam dynamical calculations at 100 and 300 kV are summarized in Table I. From this the (100) n-beam dynamical calculation demands fewer beams than the (1 10) one; and the beam number is independent of accelerating voltage. The slice thickness decreases with increasing atomic scattering power but almost does not depend on accelerating voltage. Figures 11 and 12 show thickness series of 000 and 220 beam intensities for (100) n-beam dynamical calculations simulated by the numerical integration, multislice, and Bethe methods at 100 and 300 kV, respectively (Watanabe et al., 1990). For the simulation of the multislice method, slice thickness was
206
KAZUTO WATANABE TABLE I MAXIMUM SLICETHICKNESS AND MINIMUM BEAN NUMBER FOR CONVERGENCE. IS THE LATTICE CONSTANT.)
(“A”
(100) n-beam dynamical calculations
(1 10) n-beam dynamical calculations
A1 11 x 11
cu
Au
I 1 x 11
11 x I I
cu
lO0kV
a/64
a/256
a/5 12
1 1 x 11
11 x I I
11 x 1 1
13 x 13
13 x 13
13 x 13
300kV
a/64
a/256
a/5 12
(a/2’”)/64
(a/2’I2)/128
(a/2’ 2)/256
Al 13 x 13 (a/2’I2)/64
KV
Au
13 x 13
13 x 13
(a/2”*)/128
(a/2’‘)/256
-
(1001
Al
220
~
~~
0.37--
.
.. .
OO
Z/a
15
20
0.3~
Au
220
L OO
A 5
, L
L
10
Z/a
U
_j
15
20
....
oo-o-qd
5
10
Z/B
~.
15
FIGURE11. Beam intensities of the 000 and 220 reflections for (100) n-beam dynamical calculations at 100 kV: the multislice (circles), Bethe (solid line), and numerical integration methods (dashed line).
207
n-BEAM DYNAMICAL CALCULATIONS
L,
OO
I
5I
I
I
I
‘ 10 1
1
I
L
. . . 15
20
00
5
10
15
20
15
I
z/a
Z/a
L-,
OO
5
10
Z/a
Z/a
5
10
Z/a
15
20
OO
5
10
Z/a
FIGURE12. Beam intensities of the 000 and 220 reflections for (100) n-beam dynamical calculations at 300 kV: the multislice (circles), Bethe (solid line), and numerical integration methods (dashed line).
a (lattice constant) and beam number 32 x 32. The slice thickness is the common condition for aluminum, copper, and gold, and the beam number is large. An 11 x 1 1 beam number was adopted for the Bethe method. Results of the three methods are superimposed at low atomic scattering power independent of accelerating voltage. As atomic power increases, the numerical integration method at 100 kV shows little difference from the Bethe method and yields large disagreements with multislice method. The poor approximation of the multislice method has already been suggested by Lynch (1971). At 300 kV, the difference becomes appreciable. The discrepancy with the multislice method is qualitatively explained in terms of a sudden perturbed approximation (Gratias and Protier, 1983). According to this
208
KAZUTO WATANABE A :
100 KV
(110)
Al
111 ~
~~
0
AU
111 ~
7
FIGURE13. Beam intensities of the 000 and 11 1 reflections for (1 10) n-beam dynamical calculations at 100 kV: the multislice (circles), Bethe (solid line), and numerical integration methods (dashed line).
approximation, the slice thickness in the multislice method must be chosen thin enough to obey the following inequality: Az G 2 K / ( T I V J 2 ) ' * ,
where K = wave number. By selecting the slice thickness to be lattice constant and making the beam number large, the convergent results are obtained for gold at l00kV. However, this condition may be far from satisfactory for a sudden perturbed approximation. This poor approximation is dissolved with the thinner slice thickness (Self et al., 1983). In Figs. 13 and 14, the intensities of the 000 and 111 beams for (110)
209
n-BEAM DYNAMICAL CALCULATIONS Al
000 L..
-,
300KV
( 1 10)
Al
111
I
0.31
I
L
I
15
20
Au
> t0.2
t
I
FIGURE14. Beam intensities of the 000 and 1 1 1 reflections for (110) n-beam dynamical calculations at 300 kV: the multislice (circles), Bethe (solid line), and numerical integration methods (dashed line).
n-beam dynamical calculations simulated by the three methods are plotted against thickness (Watanabe et al., 1990). A 16 x 16 beam number was adopted for the Bethe method, and a 32 x 32 beam number and 42’’’ slice thickness were used for the multislice method. The numerical integration method is in good agreement with the Bethe method for all atomic scattering powers, whether accelerating voltage is 100 kV or 300 kV. While the multislice for aluminum is identical with other two, the deviation increases with atomic scattering power. Compared with the (100) n-beam dynamical calculations, the deviation is not so large and is not drastically diminished with accelerating voltage. This small deviation may be caused by a smaller slice thickness than in the (100) one. The effect of accelerating voltage,
210
KAZUTO WATANABE
however, cannot be interpreted simply by a sudden perturbed approximation, unlike in the (100). The numerical intergarion method is competitive with respect to accuracy as well as calculating speed. Therefore, using an extended unit cell and convolution sum calculation with FFT method, it also can be applied to dynamical calculations for complex systems and defects. In addition, an extended theory that involves multiple elastic scattering and multiple inelastic scattering is introduced with the same manner as the extended RS method (Coene and Van Dyck, 1990).
C. Scattering Matrix Method Zhao, Poon, and Tong (1988) and Nagano (1990) respectively proposed new approaches for RHEED calculation using the scattering matrix method. They derived scattering matrices for a slab using the technique for solving coupled second-order differential equations developed by Stechel, Walker, and Light (1978), Magnus (1954), and Light (1971). In this subsection, a new scattering matrix method for n-beam dynamical calculation is discussed on the basis of Nagano’s treatment, because it is flexible and efficient. Assuming 2-D periodicity, the crystal potential and wave function can be written by
where X = (x, y ) , gii = (gx,g,), k = (k,,, kz); and S = area of surface. Substituting Eq. (84) into the Schrodinger equation, t,hgii(z) obeys the 1-D second-order differential equations which are equivalent to (82):
where
with
Here, E is the incident electron energy.
n-BEAM DYNAMICAL CALCULATIONS
21 1
Introducing the column vector Y as
(89) Equation (86) can be rewritten in the following form: d2Y dz2
-- -
m,
w,/,,/
where ( F),//,,,/ = ' Defining a column vector 8 to combine Y and its derivative as
8 = [dY/dz
1.
Equation (90) can be simply rewritten as
-d8 = [ a dz
718, Fa
where r'is unit matrix and 0 is zero matrix. Over the short range, h, the solution (Magunus, 1954; Light, 1971) can be obtained by an exponential series:
9 ( i )= Mi8 (i - I), f i i = exp [h;
q.
(93) (94)
For simplicity, a slab is divided equally into (n - 1) slices as shown in Fig. 15. Using Eq. (93) repeatedly results in
q ( n ) = M&(n - 1) = MnMn where
811 812 B=[821
8221
-
1 . . . M 2 9 ( 1) = B q ( l),
(95)
212
KAZUTO WATANABE z
0
2 0
~
FIGURE15. Slicing a slab from 0 to zo into n
-
1.
with
The matrix W is diagonalized by the unitary matrix
0:
On- W o n = A: I . In order to apply the layer-doubling method that had been originally
213
n-BEAM DYNAMICAL CALCULATIONS
/I\
incident
transmitted waves
reflected waves c a s e
I
c a s e
I 1
FIGURE16. Schematic view of electron scattering processes for case I and case I1
developed for LEED, Nagano (1990) derived transmission and reflection coefficients for two scattering cases as shown in Fig. 16. The first one is where the incident electron comes from above the slab (case I) and the second is where it comes from below the slab (case 11). In the case I, the electron wave function is written as
Y;(r)
i
-
exp ( - ik,, X - ikglz )
c
&I//
=
+ 1rilg1,/exp {i(k// + gill - X + ikgz},
z
> zA
z
< zg,
g//
exp {i(k// + g//) * x
-
ikg4.
Pi/
Using the proper boundary conditions and calculating bjj,the scattering wave functions are uniquely determined, thereby enabling the transmission
214
KAZUTO WATANABE
coefficients (rgiig.//)and reflection ones
(rg;ig.ii) to be obtained rexP ( - ikglZA)
as
1
where
For case 11, where the incident electron comes from below the slab, the electron wave function is written as
As in case I, the columns for the transmission and reflection are expressed as
215
n-BEAM DYNAMICAL CALCULATIONS
These columns have the complete information of every scattering process in each case. Scattering matrix for the combined system of two slabs can be considered as seen in Fig. 17. According to the layer-doubling scheme, scattering matrices in the cases I and I1 are given, respectively, by RAB-
=RA-
FAB-
=
RAB+
-R
TAB+
=
+ F A i R B - ( f -
FB-(T-
AA+RB-)-I
FA-,
( 102)
RA+RB-)-IFA-,
(103)
and B i + FB-RA+(f-
RB-RA+)-I
FB+ 7
FA+
(T- R B - R A + ) - I
FB+
( 104)
(105) Equations (102)-( 105) have very clear physical meanings. The scattering process is interpreted from right to left in these equations. For example, (102) shows the following processes: the incident electron is simply reflected by slab A , or transmits slab A , and after repeated multiple scattering between slab A and slab B, which is reflected by slab B and transmits slab A . In general, since scattering matrices are calculated at the center of the slab, phase matching becomes sensitive. Equations (102)-( 105) take care of the phase advancement automatically, so that the layer-doubling method of LEED can be used without worrying about phase matching at all. Using the layer-doubling method, therefore, a calculation for a perfect crystal is reduced to a slab corresponding to a unit cell, because other scattering matrices are identical, due to the periodicity of the potential in the z direction.
216
KAZUTO WATANABE
T
FIGURE 17. Schematic view of the layer-doubling method.
The results of (100) n-beam dynamical calculation using the scattering matrix method were compared with the multislice and Bethe methods (Watanabe ef a[.). The parameters used for the multislice and Bethe methods are the same as those in Fig. 11. For the simulations of the scattering matrix method, the convergent beam number was 135 and the convergent slice thickness 4200 (a = lattice constant). Figure 18 shows the thickness series of 000 and 220 beam intensities for aluminum, copper, and gold calculated with three methods. The scattering matrix method is in good agreement with the Bethe method for all atomic scattering power and is also identical with the multislice method except for gold. While this scattering matrix method requires an enormous memory, it is competitive with respect to accuracy and can take into account the inclined beam effect spontaneously. Furthermore, it may involve multiple inelastic scattering.
217
n-BEAM DYNAMICAL CALCULATIONS
100 K V
A I
A1
t
-c n
- -0Mult1-9licc ~the
1
OScatterlng-Matrlx
t
50.5: P
f
~
t --&&the
5
10
5
15
10
Z/e
Z/a
000
220
oMul t 1-91 Ice O S c a t t s r i ng-Matr I x
fn
f.
20
15
cu
cu
I-
~
c
--&the OMul t I -91 I cc OScattcrIng-Matrix .
10
5
15
2 /e
20 - 0 0
5
Au
L &
Z10 /a
15
' ' ' 20
Au
2?!-
. . . 0.. 3 I
-Enthe
>
>
-
0
0
-
t. $0.56 k
f.
t0.2-
z w
L
L
1
I cc OScattcrlng-Matrlx
-
k
-
oMul t 1-91
-
20.1-
--&the
I
OMulti-slice OScattcring-Matrix
5
I0
Z/a
15
20
OO
5
10
15
Z/e
FIGURE18. Beam intensities of the 000 and 220 reflections for (100) n-beam dynamical calculations at 100kV.
VI. SUMMARY As early as 1928, Bethe developed the dynamical n-beam theory of electron diffraction, which is more complex than theories applied to two diffracted beams. Following this, the problems of n-beam dynamical scattering of fast electrons have been approached by a variety of methods. Initially, the most important factor when selecting a method for a given situation was the constraint placed on computing time and memory size, although currently most reported methods are evaluated on a real-time basis without having this constraint. In addition, a complete simulation system is now commercially available. Rapid progress in obtaining better HRTEM resolution requires
218
KAZUTO WATANABE
more precise simulations for n-beam dynamical calculations that account for various effects such as beam tilt and inelastic scattering (Bithell and Stobbs, 1989). Up to the present, the multislice method has been the most flexible and manageable technique to perform n-beam dynamical calculations of defects as well as perfect crystals, having been improved so as to treat the effects of bezm tilt and inelastic scattering. It has also applied to simulations performed using the advanced technology of electron microscope (ADFSTEM). However, the problems involved in selecting slice thickness using strong atomic power at low accelerating voltage and for complex material and determining optimal slice position have not been completely solved. Thus, the multislice method may not be the best method to use for solving various problems. The Bethe method, on the other hand, is the most precise and effective theory used for matching through-focus or throughthickness HRTEM images and those simulated, because the wave functions at each thickness can be easily calculated from the eigenvalues and eigenvectors. It should be noted that this method makes it difficult to evaluate n-beam dynamical calculations for complex systems and conceptually inadequate for defect evaluations. The real space method is a reliable method of simulating nonperiodic objects, although there is a little discrepancy in beam intensities for perfect crystals. The direct integrated method is competitive with respect to accuracy with Bethe and multislice methods, but the computation time and memory requirements are still greater than for other methods. From our elementary calculations, the scattering matrix method is shown to give accurate n-beam dynamical calculations for perfect crystals, although the simulation time and memory size are of the same order as the direct integrated method. The matrix scattering method has an advantage of being able to take into account the beam tilt effect automatically, and this may make it possible to incorporate the inelastic scattering effects in principle. Clearly, there are advantages and disadvantages when using each method. All methods should therefore be applied to various existing problems, making use of their particular advantages. For example, in the case of defect simulations, the thickness and instrumental parameters are determined using the Bethe method, with image simulations being subsequently performed using either the multislice or real space method. More advanced n-beam dynamical calculations will also be required in the future to provide higher resolutions for electron microscopes.
219
n-BEAM DYNAMICAL CALCULATIONS
ACKNOWLEDGMENTS Fruitful discussions with Drs. Y. Kikuchi, K. Hiratsuka, N. Hashikawa, and K. Mituishi on multislice, Bethe’s eigenvalue, direct integrated, and scattering matrix methods are acknowledged. Additional thanks are due to Professors H. Yamaguchi and I. Hashimoto for drawing my attention to HRTEM. I would also like to thank Dr. P. Hawkes for his editional support. APPENDIX: CRYSTAL POTENTIAL As mentioned in Section I, knowledge of the crystal potential is an essential factor for n-beam dynamical calculation. The steps leading to conventional calculation of the crystal potentials are schematically shown in Fig. 19(a). The potential is produced by atomic scattering factorsfor ionic onesf;,, . The Fourier coefficient, V ( g ) , of the potential is obtained as
where R = the volume of unit cell. The indexj indicates a sum over all atoms in the unit cell, with the fractional coordinate of thejth atom being rj and g being the reciprocal lattice vector. The atomic scattering factor, f; , can be obtained either from tabulated values (Smith and Burge, 1962) or from the Mott formula (Mott and Massey, 1965). For n-beam dynamical calculation, it is convenient to store scattering factors in the form of parameterized fits to the scattering curve. While the most commonly used scattering factors are calculated by Doyle and Turner (1968), there is lack of complete table of parameterized fits of scattering factors. Then, Gaussian fits of the international table for x-ray crystallography for x-ray scattering factors, f”,are available for converting to the electron values with the Mott formula: f ’ ( s ) = (me2/2h2)[Z-f”(s)]/s‘.
( 107)
where s = sin 012. However, the nine-parameter Gaussian fits for the x-ray scattering factors is applicable only to an angular range s = 0 to 20 nm- and attempts to extend it to a higher angle met with large errors. Since the diffracted beams out of values of h, k, 1, corresponding to reciprocal lattice vectors of 30nm-’ are required to maintain sufficient precision, Fox et al. (1989) presented more accurate fitting for a higher angle than 20 nm-’ . The modification of the small angle region was proposed by Peng and Cowley (1988). Unfortunately, this approach cannot involve the screening effect or bonding due to valence charge electrons. In order to exactly incorporate the
’
r; calculate
N
N 0
solve
HY = E Y
I
calculate
calculate p
V I . .
I
= Y 'Y
solve
calc u 1 ate
I
FIGURE19. Block diagram of computational steps in calculating crystal potentials: (a) conventional crystal potential evaluated from atomic scattering factors, and (b) VT is constructed from the ionic potential and the screcning one.
F,,,is
n-BEAM DYNAMICAL CALCULATIONS
22 1
screening potential due to valence charge electrons in the crystal potential, Watanabe et al. (1986) and Kikuchi (1988) proposed the calculation of the crystal potential that is divided into an ionic potential and a screening (Fig. 19(b)). The first step is started with solving the optimized Dirac-FockSlater equation (Lingren and Rosen, 1974) or the Hartree-Fock one, and the scattering factor of ions is calculated. Then, the ion potential is given by
The process of screening potential calculation is initiated using a band theory; i.e., the pseudopotential (Cohen and Chelikowsky, 1988), APW (Loucks, 1967), LMTO (Skriver, 1984), and so on. Using a suitable theory for a material, the electronic energy E,(k)and the wave function $nk can be determined. Hence, total valence charge density
is calculated. The charge density is calculated by the special point scheme of Chadi and Cohen (1973), which yields good agreement with a sum throughout the Brillouin zone. Once the valence charge density is known in terms of its Fourier transform, the Hartree-type screening potential can be evaluated easily. The Hartree screening potential is given by
The Hartree potential is cancelled exactly by the ionic potential at g = 0. As a simplification, the nonlocal Hartree-Fock exchange operator is often replaced by a local potential proportional P ' / ~ but , this approximation fails completely when applied to a high-energy one-particle state. Then, the suitable exchange potential for a fast electron is selected or the exchange potential is omitted because of small values compared with other potentials. The V, and V ,together form the electronic screening potential of the system. They are then added to an ionic potential. REFERENCES Allen, L. J., and Rossouw, C. J. (1990). Phys. Rev. B42, 1 1 6 4 4 1 1654. Allpress, J . G., Sanders, J. V., and Wadsley, A. D. (1969). Acta Cryst. 325, 1156-1164. Allpress, J. G., Hewat, E. A,, Moodie, A. F., and Sanders, J. V. (1972). Acta Cryst. A28, 528-536. Armigliato, A,, Parisine, A , , Hillerband, R., and Werner, P. (1985). Phys. Stat. Sol. ( a ) 90, 115-126.
222
KAZUTO WATANABE
Bethe, H. A. (1928). Ann. Phys. (Leipzig) 87, 55-127. Bithell, E. G., and Stobbs, W. H. (1989). Philos. Mag. A@, 39-62. Bourret, A., Rouviere, J. L., and Spendeler, J. (1988). Phys. Stat. Sol. ( a ) 107, 481-501. Blume, J. (1966). Z . Phys. 191, 248-272. Brigham, E. A. (1973). “The Fast Fourier Transform,” Prentice-Hall, Englewood Cliffs, NJ. Chadi, D. J., and Cohen, L. M. (1973). Phys. Rev. B8, 5747-5453. Coene, W., and Van Dyck, D. (1984a). Ultramicroscopy 15, 41-50. Coene, W., and Van Dyck, D. (1984b). Ultramicroscopy 15, 287-300. Coene, W., and Van Dyck, D. (1990). Ultramicroscopy 33, 261-267. Cohen, L. M., and Chelikowsky, J. R. (1988). “Electric Structure and Optical Properties of Semiconductors,” Springer-Verlag, Berlin and Heidelberg. Cowley, J. M. (198 1). “Diffraction Physics,” North-Holland, Amsterdam. Cowley, J. M. (1988). In “High-Resolution Transmission Electron Microscopy and Associated Techniques” (P. R. Buseck, J. M. Cowley, and L. Eyring, eds.), 58-124, Oxford University Press, Oxford. Cowley, J. M., and Moodie, A. F. (1957). Acta Cryst. 10, 609-619. Cundy, S. L., Howie, A., and Valdre, U. (1969). Phil. Mag. 20, 147-163. Doyle, P. A,, and Turner, P. S . (1968). Acfa Crysf. A24, 390-397. Fields, P. M., and Cowley, J. M. (1978). Acta Cryst. A34, 103-112. Fox, A. G., OKeefe, M. A,, and Tabbernor, M. A. (1989). Acta Cryst. A45, 786-793. Frank, J. (1973). Optik 38, 519-536. Fujimoto, F. (1959). J. Phys. Soc. Jpn. 14, 1558-1568. Fukuhara, A. (1966). J. Phys. SOC.Jpn. 21, 2645-2662. Gratias, D., and Protier, R. (1983). Acta Cryst. A39, 576-584. Hashikawa, N., Watanabe, K., Hiratsuka, K., Tsuruta, C., Hasimoto, I., and Yamaguchi, H. (1991). Inst. Phys. Conf. Ser. No. 117, 11-16. Hiratsuka, K. (1991). Philos. Mug. B63, 1087-1 100. Hirsch, P. B., Howie, A,, Nicholson, R. B., Pashley, D. W., and Whelan, M. J. (1965). “Electron Microscopy of Thin Crystals,” Butterworths, London. Howie, A. (1963). Proc. R . Soc. A271, 268-287. Howie, A. (1966). Phil. Mag. 14, 223-230. Howie, A., and Basinski, Z. A. (1968). Phil. Mag. 17, 1039-1063. Humphreys, C. J., and Hirsch, P. B. (1968). Phil. Mag. 18, 115-122. Humphreys, C. J., and Whelan, M. J. (1969). Phil. Mag. 20, 165-172. Ichimiya, A. (1983). Jpn. J. Appl. Phys. 22, 176-180. “International Tables for X-ray Crystallography,” Kynoch, Birmingham, England. Ishizuka, K. (1980). Ultramicroscopy 5, 55-65. Ishizuka, K. (1982). Acta Crysr. A38, 773-779. Ishizuka, K., and Uyeda, N. (1977). Acta Cryst. A33, 740-749. Izui, K., Furuno, S., Nishida, T., Otsu, H., and Kuwabara, S. (1978). J. Electron Microsc. 27, 171-179. Jap, B. K., and Glasser, R. M. (1978). Acra Cryst. A34, 94-102. Kikuchi, Y. (1988). Philos. Mag. B57, 547-556. Kikuchi, Y., Watanabe, K., Naitoh, S., Hiratsuka, K., and Yamaguchi, H. (1988). Phys. Stat. Sol. ( a ) 108, 509-517. Kilaas, R.. and Gronsky, R. (1983). Ultramicroscopy 11, 289-298. Kogiso, M., and Takahasi, H. (1977). J. Phys. Soc. Jpn. 42, 223-229. Legoues, F. K., Copel, M., and Tromp, R. (1989). Phys. Rev. Lert. 63, 1826-1829. Light, J. C. (1971). In “Methods of Computational Physics” (M. Routenberg, ed.), 10, 1 1 1-140, Academic Press. New York.
n-BEAM DYNAMICAL CALCULATIONS
223
Lindgren, L., and Rosen, A. (1974). Case Studies in Atomic Physics, 4, 93-112. Loucks, T. L. (1967). “Augmented Plane Wave Method,” Benjamin, New York. Lynch, D. F. (1971). Acta Crystal. A27, 399407. Lynch, D. F., and Moodie, A. F. (1972). Surf. Sci. 32,422438. Lagnus, W. (1954). Pure Appl. Math. 7,649-673. Maksym, P. A., and Beeby, J. L. (1981). Surf. Sci. 110,423436, Margenau, H., and Murphy, G. M. (1943). “The Mathematics of Physics and Chemistry,” Chapter 14, Van Nostrand Company, New York. Mott, N. F., and Massey, H. S. W. (1965). “The Theory of Atomic Collisions,” 3rd ed., Clarendon Press, Oxford. Muller, E., Nissen, H. U., Ospelt, M., and von Kanel, H. (1989). Phys. Rev. Lett. 63, 1819-1822. Nagano, S.(1990). Phys. Rev. B42, 7363-7369. OKeefe, M.A. (1979). “37th Ann. Proc. EMSA,” San Antonio, 556-558. Ourmazd, A,, Rentscher, J. R., and Taylor, D. W. (1986). Phys. Rev. Lett. 57, 3073-3076. Ourmazd, A.,Taylor, D. W., Cunningham, J., and Tu, C. W. (1989). Phys. Rev. Lett. 62, 933-936. Payne, M. C., Joannopoulos, J. D., Allen, D. C., Teter, M. P., and Vanderbilt, D. H. M. (1986). Phys. Rev. Lett. 56,26562656. Peng, L.-M., and Cowley, J. M. (1988). Acta Cryst. A44, 1-5. Radi, G . (1970). Acra Cryst. A26, 41-56. Rez, P. (1983). Acta Cryst. A39, 697-706. Rez, P., Humphreys, C. J., and Whelan, M. J. (1977). Phil. Mag. 35, 81-96. Self, P. G. (1982). J. Microscopy 127,293-299. Self, P. G., and OKeefe, M. A. (1988). In “High-Resolution Transmission Electron Microscopy and Associated Techniques” (P. R. Buseck, J. M. Cowley, and L. Eyring, eds.), 244307. Oxford University Press, Oxford. Self, P. G., O’Keefe, M. A., Buseck, P.R., and Sparogo, A. E. C. (1983). Ultramicroscopy 11, 35-52. Shiojiri, M., Kaito, C., Sekimoto, S., and Nakamura, N. (1982). Phil. Mag. A46, 495-505. Skriver, H. L. (1984). “The LMTO Method,” Springer-Verlag, Berlin and Heidelberg. Smith, D. J., Glaisher, R. W., and Lu, P. (1989). Phil. Mag. Lett. 59,69-75. Smith, G.H., and Burge, R. E. (1962). Acta Cryst. 15, 182-186. Stechel, E. B., Walker, R. B., and Light, L. C. (1978). J. Chem. Phys. 69,3518-3531. Sturkey, L. (1962). Proc. Phys. SOC.80, 321-354. Tanji, T., Masaoka, H., and Ito, J. (1989). J . Electron Micros. 38, 409414. Tounarie, M. (1962). J. Phys. SOC.Jpn. 17, Suppl. BII, 98-101. Uyeda, N., Kobayasi, T., Suito, E., Harada, Y., and Watanabe, E. (1972). J . Appl. Phys. 43, 5 181-5 188. Van Dyck, D. (1980). J . Microsc. 119, 141-152. Van Dyck, D., and Coene, W. (1984). Ultramicroscopy 15,29-40. Wade, R. H., and Frank, J. (1977). Optik 49, 81-92. Wang, Y., and Chen, J. (1988). Phil. Mag. A58, 817-824. Wang, Y., Hu, T., When, H., and Zeng, X. (1990). Phil. Mag. Lett. 61, 29-36. Wang, 2. L. (1989). Acta Cryst. A45, 636-644. Wang, Z.L. (1990). Phys. Rev. B41, 12818-12837. Watanabe, K.,Kikuchi, Y., and Yamaguchi, H. (1986). Phys. Stat. Sol. ( a ) 98,409416. Watanabe, K., Hiratsuka, K., Kikuchi, Y., and Yamaguchi, H. (1987). Phil. Mag. Lett. 56, 51-55.
Watanabe, K., Kikuchi, Y., Hiratsuka, K., and Yamaguchi, H. (1988). Phys. Stat. Sol. ( a ) 98, 409-416.
224
KAZUTO WATANABE
Watanabe, K., Kikuchi, Y., Hiratsuka, K., and Yamaguchi, H. (1990). Actu Cryst. AM, 94-98 Watanabe, K., Hiratsuka. K., and Yamaguchi, H. (1991). Phil. Mug. A64, 81-86. Watanabe, K., Mitsuishi, K., and Hashimoto, I. (to be published). Whelan, M. J. (1965). J . Appl. Phys. 36, 2103-2110. Wright, A. C., Ng, T. L., and Williams, J. 0. (1988). Phil. Mug. Lett. 57, 107-111. Yamashita, T., Ponce, F. A., Pirouz, P., and Sinclair, R. (1982). Phil. Mug. A45, 693-711. Yoshioka, H. (1957). J. Phys. SOC.Jpn. 12, 618-628. Zhang, J., Kuo, K. H., and Wu, Z. Q. (1986). Phil. Mug. A53, 677-685. Zhao, T. C., Poon, H. C., and Tong, S. Y. (1988). Phys. Rev. B38, 1172-1 182.
ADVANCES 1Y ELECTRONICS AND ELECTRON PHYSICS, VOL. 86
Methods for Calculation of Parasitic Aberrations and Machining Tolerances in Electron Optical Systems M. I. YAVOR Institute of Analytical Instrumentation. St. Petersburg. Russia
I. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11. Parasitic Aberrations Caused by Electron Optical Element Misalignment. . . 111. Effects of Electromagnetic Field Disturbances on Charged Particle Trajectories IV. General Methods for Calculation of Electromagnetic Field Disturbances due to Electrode or Pole Face Distortions . . . . . . . . . . . . . . . . . . . . A. Simple Example of the Exact Conformal Mapping . . . . . . . . . . . B. Bertein Perturbation Method . . . . . . . . . . . . . . . . . . . . . C. Coordinate Frame Variation Method. . . . . . . . . . . . . . . . . D. Method of Integral Equations “in Variations” . . . . . . . . . . . . . V. Field Disturbance in Electrostatic and Magnetic Sector Analyzers. . . . . . A. Electrostatic Toroidal Condenser. . . . . . . . . . . . . . . . . . . B. Inhomogeneous Sector Magnet . . . . . . . . . . . . . . . . . . . . VI. Application of Approximate Conformal Mappings . . . . . . . . . . . . VII. Conclusion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
.
.
225 221 235 245 246 248 253
. . 258 . 260 . 261 266
. 269 211 219
I. INTRODUCTION The calculation of parasitic aberrations and machining tolerances is a necessary stage of a design of any electron optical device. Since in practice manufacturing imperfections are inevitable, a designer needs to know their consequences. This knowledge can help provide for ways of the correction of parasitic aberrations. The correct estimation of tolerances allows a manufacturer to choose the manufacturing technology properly and also to optimize the adjustment of the device. It should be emphasized that the calculation of tolerances is important not only for the unique high precision electron optical systems but also for serially produced devices, because setting requirements too high for the manufacturing technology leads to the considerable rise in price of the production. In spite of the evident importance of the problem, there are practically no publications generalizing the experience of calculations of parasitic aberra225
English translation copyright 0 1993 by Academic Press, Inc. All rights of reproduction in any form reserved ISBN 0-12-014728-9
226
M. I . YAVOR
tions due to machining or assembling imperfections of electron optical systems. Only a number of papers exist where the effects of such imperfections on the properties of some types of devices are investigated. The attempt to review these papers was made recently by the author (Yavor, 1991b) but this review is too brief to penetrate deep enough into the problems involved. It is the goal of the present chapter to describe various methods of the parasitic aberration theory, to discuss their fields of applications as well as their merits and shortcomings, and to illustrate them with some examples. This review does not pretend to cover all achievements of the theory of tolerances but rather gives a general idea of its problems and methods. The review also does not contain the detailed investigation of parasitic aberrations and tolerances of any particular electron optical system. However, in order to make it useful not only for theorists who would like to learn how to calculate parasitic aberrations but also for practical engineers who are interested in the results and not in methods, the review is supplied with the exhaustive list of references. In addition to the notes made in the text, some brief comments to this list are found in the Conclusion. Generally speaking, the determination of tolerable limits for machining and assembling inaccuracies in electron optical devices does not reduce only to the calculation of parasitic aberrations due to these inaccuracies. The point is that most first order parasitic aberrations, such as, for example, variations of an image position or dispersion coefficients, can be more or less easily compensated for by means of some correcting elements. Therefore in reality tolerances depend on the possibility of such a compensation provided in the device under consideration. However, the ways for adjusting electron optical systems are the subject of a separate investigation, so in the present chapter we confine ourselves to discussion of the methods for calculation of parasitic aberrations. Section I1 of the chapter is devoted to the methods for calculation of parasitic aberrations due to misalignment of separate elements (lenses, deflectors, multipoles, etc.) of a complex multistage electron optical system. Since the electron optical properties of separate stages are assumed to be unchanged in this case, only geometrical considerations are needed here; and the technique involved is much simpler than that used when the elements themselves are manufactured inaccurately. The latter case is discussed in Section 111, where charged particle trajectories in a disturbed electromagnetic field are investigated. The particle trajectory equations discussed in Section I11 can be used for practical calculations only if the field disturbance inside an electron optical element due to its manufacturing imperfections is known. It is the problem of calculation of this disturbance that is most difficult to solve. Section IV of the review is devoted to the general methods for calculation of electromagnetic
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
227
field disturbances caused by distortions in shapes of electrodes or magnet pole faces. The special method that allows one to describe analytically field disturbances in sector electrostatic or magnet analyzers is described in Section V. Finally, in Section VI the application of the conformal mapping method to the field calculation for weakly distorted electron optical elements is discussed. Now we make some preliminary remarks. First of all, in the chapter we consider only current-free static fields. Furthermore, disturbances of fields and trajectories are calculated in the linear approximation with respect to geometrical parameters characterizing defects of the element manufacturing or assembling. This approach is generally used and justified, since as a rule manufacturing imperfections themselves cannot be described with a high accuracy. It should be noted that very often in the literature on the tolerance theory the linear approximation is used not only in the parameters mentioned previously but also in the geometrical parameters characterizing a charged particle beam. In other words, they consider the influence of imperfections only on the first order properties of a system. However, to our opinion that is not always sufficient. Many high-precision electron optical devices are designed so as to eliminate some higher order aberrations, and manufacturing imperfections break conditions of such an elimination. Moreover, higher order aberrations cannot be so easily adjusted as the first order properties. Thus in some cases it is useful to estimate both parasitic changes of the first order properties and, say, of the second order aberrations. For this reason we shall consider the second order properties when it is possible. BY ELECTRON OPTICALELEMENT 11. PARASITIC ABERRATIONS CAUSED MISALIGNMENT
In multistage electron optical systems parasitic aberrations may be attributed to two different types of defects. One is the manufacturing imperfections of electron optical elements (electrostatic or magnet lenses, sector fields, multipoles, etc.), leading to perturbations of their electromagnetic fields. These defects will be discussed in the following sections. However, even if all the elements are manufactured ideally, the system still possesses parasitic aberrations due to defects of the second type, those are displacements of elements relative to their nominal positions. It is the purpose of this section to describe the methods for calculation of the influence of such imperfections on properties of electron optical devices. The investigation of the aberrations in question is simple compared with that of the defects of the first type. The reason is that the misalignment of an
228
M. I. YAVOR
element does not change its influence on the charged particles and only makes it necessary to consider this influence in a displaced coordinate frame. Therefore the problem of the calculation of the element misalignment effects reduces to the derivation of relations connecting parameters of an arbitrary trajectory in the coordinate frames defined with respect to the ideal and misaligned element position. The first general investigation of the problem, known to the author, was undertaken in the paper by Bazhenova, Zinoviev, and Fjodorova (1967). The formulae were obtained in this paper that expressed the perturbance of an arbitrary paraxial trajectory going through an infinitely thin misaligned electron optical element (the words infinitely thin mean that the entrance and exit faces of an element are assumed to coincide). Later, the article by Matsuda, Matsuo, and Takahashi (1977) studied the influence of some particular types of a median plane misalignment of a two-stage mass spectrometer on its focusing properties. Changes in ion optical properties of a sector magnet analyzer due to its radial displacement was discussed in the paper by Malov and Trubatcheev (1978). The most complete theory of parasitic aberrations caused by misalignments was proposed in the article by Brown and Rothacker (1977). In this paper arbitrary misalignments of sector deflecting systems as well as thick lenses with straight axes were investigated. The corresponding algorithm was included in the computer program TRANSPORT (Brown et al., 1973). Similar techniques of the calculation were also used in some other computer codes; for example, COSY INFINITY (Berz, 1990). In all the publications just listed only the effects of misalignments on the first order properties of a charged particle beam were discussed; corrections affecting higher order aberrations were not taken into account. The calculations we present here are similar to those made by Bazhenova et al. (1967) and Brown and Rothacker (1977), but we retain the correction terms mentioned previously. Figure 1 shows the trajectory of a particle entering a misaligned element of an electron optical system from the outer field-free space. We bind the coordinate frame {x,y,z } with the ideal element position, the z axis pointing along its optical axis and the xy plane coinciding with the entrance face of the element. Suppose the element in question is to experience a small shift characterized by the vector 6r = (dx, 6y, 6z) and a small rotation described by , We bind the coordinate frame ( X , Y , Z } as the vector 6 6 = ( ~ 5 + ~6+?, shown in Fig. 1 with the misaligned position of the element. The X Y plane of the latter frame is the real entrance plane of the element. Y ,Z } In the linear approximation the relations between the {x, y , z } and {X,
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
229
FIGURE I . Shown are the coordinate frames bound with the ideal [x,y,z] and misaligned
[A', Y , Z ] positions of an electron optical element. The displacement of the element is defined by the shift 6r and rotation charged particle.
64 of its entrance profile plane xy.
T is a trajectory of an arbitrary
coordinates of an arbitrary point read
x = x + y s 4 , - zs4, - sx,
+ y + zs4, - sy, z= x&p,- ys4, + z - sz. y = -XS$,
In the field-free space the trajectory equations are
x(z)
= x,
Y ( Z > = Yo
+ x;z, +Y
k
with the vector (xo,xi,y o ,yh) = k, characterizing the coordinates and slopes of the trajectory at its intersection with the entrance face of the aligned element. Let the effect of the aligned element on a charged particle be described generally by the nonlinear operator
k, = M16, (2) where the vector k, = (,2c,,2L,ye,9:) contains the trajectory parameters at the exit face of the aligned element; the i axis of the (i,p, i} coordinate frame points along the optical axis of this element; the i jplane coincides with its exit face. Then the transformation of a trajectory by the misaligned element is described by the following relation:
K,
= MK,.
(3) Here K , = (&,Xi, Yo,Y;);K, = Xo = XJ,,,; X i = dX/dZI,,,; YO= Yll,o; Yi = dY/dZI,,,; the vector K, contains the analogous coordinates and slopes of the trajectory at the exit face 2 = 0 of the misaligned
(x,x,t,t);
230
M. I . YAVOR
element. Thus to obtain the relation between the vectors ko and k, determining (in the aligned coordinate frame) the trajectory transformation by a misaligned element, one should find the relation between the vectors ko and K, as well as between K, and k,. In the linear approximation with respect to the disturbance vectors 6r and S4 the point of the intersection of the trajectory with the X Y plane ( Z = 0 ) has the z-coordinate zo =
-
xos4,
+ y064, + 6z.
Thus
xo = X(Z0) + Y(Z0)64, - z o w , - 6 x = xo
yo =
-
+ y064, + x;sz - sx + x;yos~, - x;xos4,, x(zo)W, + Y ( Z J
= Yo - xow;
+Z O W
-
SY
+ Y 8 Z - 6Y - xoY;64, + Y;Y,64,,
(4)
The formulae obtained show that element shifts along the x and y axes lead to shifts of the corresponding coordinates only; a shift along the z axis is obviously equal to a change in the field-free interval length. A rotation about the z axis mixes x and y coordinates. Only rotations about the x and y axes give rise to terms of the second order with respect to the initial coordinates and slopes. If we neglect the second order terms in Eq. (4)then those equations can be rewritten in the matrix form
KO = Rko - D,
(5)
where
This result is identical to that obtained by Bazhenova er al. (1967) and by Brown and Rothacker (1977). Equation (5) can be applied if we are interested in only the first order properties of a system.
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
23 1
The derivation of relations between the components of the vectors K, and k, is straightforward. These relations read
x
a, = - f,S& 3, = + f,S& a: = 2;- Y;s&
*
+ 62 - fes&+ 2;&5&, - t / S i + S? + t/$sfjjv - t/ ts&, + s& + f;'sl$, - f;s&, - 2;si
*
+
(6)
+
2;s& - s& - f;2sfjjx 2;%s&. j: = Here the values (62,Sj,6.2) = Sf and (S$x, S$,) = S$ determine a displacement of the {T, p,21coordinate frame relative to the {a,9, i} frame. In the linear approximation Eq. (6) takes the form k, = R-IK,
+ D,
(7)
where
1
-6.2
-s&
O \
R-' = 0 -s& 0 1 I Combining Eqs. ( 2 ) and ( 3 ) with Eqs. ( 5 ) and (7) we come finally to the following result: k, = R-'M(R$ - D) + D,
(8) Equation (8) describes the paraxial trajectory transformation by the misaligned electron optical element. Note that one should be very careful when using Eq. (8), since sometimes the substitution into Eq. (8) of the operator M in the form of its linearized first order matrix representation can lead to the incorrect vector k, dependence on the parameters br and S4. Indeed, suppose that an electron optical element with two planes of symmetry possesses second order aberrations; then the trajectory coordinate 2eat the exit face of the element is
232
M. I. YAVOR
where (. . .} denotes terms of the second order with respect to the components of the vector k,. Thus we see that the nonlinear part of the operator M defined by the coefficients a,, , a,2, a,z, q4,and a- contributes to the ie coordinate in the linear approximation with respect to ax,6y, 64r, and 84,. This contribution is described by terms contained in square brackets in Eq. (10). To understand what parasitic aberrations can be caused by the displacement of the electron optical element, we now consider a system forming in a profile plane i = if a stigmatic image of an object situated in a profile plane z = z , . The relation between the trajectory position vectors k, = ( Y,,x,’ ,y , , yr’) in the object plane and k, = (if, ij,j,., 9;) in the image plane can be represented by means of the first order transfer matrix (see Wollnik, 1987):
k,=
[
\
0
0
/.,I
0
0
0
0
a33
0
a43
\
)
k,.
a441
where the coefficients aIkdepend on the system in question. Suppose that the object and image planes are located in the field-free space. It is easy to see from the preceding considerations that in the presence of a misaligned element in the system the relation between k, and k,. generally takes the following form \
where a, and t l I k (i, k = 1,2,3,4) are some small quantities depending on the misalignment. We see now that the plane 1 = 2, is no more the image plane, if a,2 # 0 and c(34 # 0. However, it is not difficult to find the new position 2, = 5,.+ A of the image plane. Since the transfer matrix MAof the field-free interval of a length A is /l
A
0
O\
\o
0
0
1/
233
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
= M,kf; and in the linear approximation with then k, = (2,,2,,$,,pA) Eqs. (12) and (13) we obtain
/a11
\
+ Ell +
4
1
A
a12
a4 1
+ a22A
a13
a42
a43
+ a43
au
a14
\
+
J
It is clear that the plane 2 = i, is the 2-image plane if A = - a 1 2 / a 2 2 . However, this image is slightly astigmatic, because the position of the 9-image plane is defined by the equation A = Furthermore, Eq. (14) shows that the image is shifted by a1in the 2-direction and by a3 in the ?-direction. It is evident that the magnification coefficients of the system are changed, too. But the most essential effect of the system element misalignments is that the 2, coordinate of the trajectory turns out to depend on both yo and y; (with a13 # 0 and aI4# 0),and vice versa, the 9, coordinate depends on both xo and x; (with a j l # 0 and a14 # 0). This means that the image experiences additional defocusing. The correction of such an image defocusing requires special efforts. To complete our considerations for general elements of an arbitrary length, we need to give relations between the vectors 6r, 66 (determining the displacement of the entrance face of an element) and hi, 6 6 (determining the displacement of its exit face). These relations were obtained by Brown and Rothacker (1977). Here we present only their results without details. For a sector electron optical element with the deflection angle $ and a circular optical axis of the radius p, the relations in question read
63 = O(6r + 64 x P),
sf$
=
064,
where 0 is the orthogonal matrix
/
cos$
0
sin$
\
234
M. I. YAVOR
and P is the vector
For an element with a straight axis (such as a quadrupole lens) of a length L we have p --* co, t j + 0, $p = L;then the matrix 0 reduces into the unity matrix 0 = I and the vector P takes the form P = (O,O,L). The method presented allows one to analyze the transformation not only of separate trajectories but also of the ensemble of charged particles represented as an area in the six-dimensional phase space. For the lack of space we do not discuss this subject here and refer an interested reader to the article by Brown and Rothacker (1977), where the beam envelope properties are studied. The technique just described is widely used not only for the calculations of possible distortions of a beam but also for a practical adjustment of multistage electron optical systems. This was illustrated by Matsuda et al. (1977). The authors faced some difficulties when adjusting a high precision double focusing mass spectrometer at Osaka University. Its real resolving power was less than 15,000, and significant deviations in the image size were observed, depending on the source slit height and the vertical aperture angle. The image positions for the point object located at the y-axis (xo = 0) were measured experimentally for various vertical object coordinates yo and trajectory slopes Po E y;l. The results, presented in Fig. 2, show that the image of the narrow vertical source slit turned out to be a parallelogram like area instead of a vertical line. This image distortion was attributed to the small mutual parallel shift, bending and rotation of the median planes of the analyzer electrostatic and magnetic stages. The results of measurements allowed the authors to apply the theory of parasitic aberrations for a numerical estimation of the median plane misalignment, and the final adjustment was made in accordance with the results of this estimation. The resolving power of 240,000 was achieved in this way. Besides the method considered previously, there is an alternative way of computing misalignment effects of electron optical units. According to the latter, instead of using the ideal transfer matrix in a misaligned coordinate frame, the perturbed transfer matrix in the ideal coordinate frame is calculated. This matrix is obtained by the direct integration of trajectory equations in the ideal coordinate frame, but in a changed field (trajectory equations of this type are discussed in Section 111). The electromagnetic field disturbance in the ideal coordinate frame is not difficult to determine if the ideal field as well as the connection between the ideal and misaligned coor-
235
METHODS FOR CALCULATION O F PARASITIC ABERRATIONS
____--
I r-
a
4
€ €
I
3
.4
Y
>
-
2 yo
1
=+o . o 02
0
-1 -2
' Ro=O
-3
' Ro=O.
-4
I
.
-40
-30
.
.
-20 -10
. 0
003
. 10
. 20
. 30
. 40
--_----10
0
10
FIGURE 2. Shown are the points of intersection with the image plane of particle trajectories outcoming from a narrow source slit of a double focusing mass spectrometer. The positions of the points depend on the initial vertical coordinate yo of the trajectory in the object plane and the initial trajectory vertical slope Po. (a) In the misaligned system such points fill the parallelogram like area, forming the defocused source slit image; (b) in the aligned system (after adjustment) all the points lie at the vertical line and form the perfect source slit image. After H. Matsuda et al. (1977), p. 234, Fig. 6.
dinate frames are known. Such approach was used by Kawakatsu, Vosburgh, and Siegal (1968) for the calculation of mechanical aberrations of magnetic quadrupole lenses and by Zhu and Liu (1987) and Liu and Zhu (1990) for magnetic and electrostatic lenses and deflectors. However, this method seems to be less general than the one considered previously, since it requires the specification of the form of the system electromagnetic field; furthermore, it is more complicated because recalculation of the transfer matrix is needed.
111. EFFECTS OF ELECTROMAGNETIC FIELDDISTURBANCES ON CHARGED PARTICLE TRAJECTORIES Imperfections in the machining and assembling of elements of electron optical systems inevitably change their electromagnetic fields. Thus the evaluation of parasitic aberrations due to these imperfections requires the calculation of charged particle trajectories in weakly disturbed fields. The derivation of the equations describing those trajectories usually does not
236
M. I . YAVOR
FIGURE3. The natural coordinate frame { e , , e , , e , ) is shown in the neighborhood of the reference particle trajectory S in an undisturbed field. T is an arbitrary particle trajectory in a disturbed field.
cause serious difficulties and can be made analogously to the derivation of equations for charged particle trajectories in an undisturbed field. However, one should always remember that perturbances generally break the symmetry conditions inherent in electromagnetic fields of most electron optical devices. For this reason, when investigating the trajectory distortions, one cannot directly use the equations valid for the perfectly made element, but has to base calculations on more general equations. In the present section we will discuss several useful equations for particle trajectories in weakly disturbed fields. As the first example we will obtain the linear equations for paraxial trajectories in an electron optical system with an arbitrary curvilinear axis. There exist various methods to derive such equations. One of them was proposed by Grinberg (1948), who obtained the equations for relativistic trajectories in a narrow beam with an arbitrary (but known) optical axis. We will base the following considerations on the Grinberg method; for reasons of simplicity we confine ourselves to a nonrelativistic case. However, since the calculations are rather cumbersome, we will represent them without detail. First of all we will obtain the equations for trajectories lying close by a fixed three-dimensional curve S. We assume this curve to be a real trajectory of a reference particle of mass m, and charge q in an undisturbed electromagnetic field. We will define the undisturbed field by the scalar electrostatic ii and magnetic B potentials. However, in a perturbed field determined by scalar potentials u = ii 6u and w = B 6w,the S curve does not necessarily coincide with any real trajectory. We will use the arc length s as the coordinate along the S curve. We introduce the “natural” coordinate frame in the neighborhood of an arbitrary point N lying on the S curve (Fig. 3). The origin of the coordinate frame coincides with the point N ; and unit vectors ex, e , , and e, are directed along the main normal, the binormal, and the tangent to the S curve. The vector ex is supposed to be directed toward the trajectory curvature center. We also assume that the curvature does not change its sign along the S curve.
+
+
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
237
Now the trajectory T of an arbitrary particle moving close by the S curve can be defined by the parameter s and the vector p(s) = x(s)e,(s) + y(s)e,(s). The equation of motion for this particle in a perturbed field reads d2
m - (R dt2
+ p) = q{E + v x B},
(15 )
where R is the position vector of the point N ; E = - grad u and B = - grad w are field strengths of the electrostatic and magnetic field; v is the velocity, m = m, 6m the mass, and q the charge of the particle. Suppose that the velocity of the reference particle moving along the S curve equals zero at the point where the electrostatic potential equals zero, too. Then the energy conservation law for an arbitrary particle has the following form:
+
mv2 -- -qu
+ 6,
2
where v = (v(,6 is a small energy deviation of the particle in question relative to the reference particle in some initial point where the potential disturbance 6u is assumed to equal zero. The derivative with respect to the time t in Eq. (1 5 ) can be replaced by the derivative with respect to the arc length s. Then, using Eq. (16) and the well-known formulae de,/ds = - e,/p e,/z, de,/ds = - e , / t , de,/ds = e,/p (where p and T are curvature and torsion radii for the S curve), we come after some cumbersome calculations to the following equations for x- and y-components of the trajectory:
+
2u
(
X"-
2y'
T
+Tyr' -2
">
+ P + -p ( U + xu, + yU,) -j
T
3 :; u,
+ U' (xf - -
2u
x
2x' (y''+----T
XT'
T2
- --
-
xu,,
') + U ' (y ' + -:>
T2
>
-
-
yu,,
u,-
xu,, - yu,,
(
=/Z[W 2u ' 2( u l2+ qu + x+ w , + yw, - -X I - yuy
:'>4
(17b) Here U = 0 6U and W = W 6 W (as well as their derivatives) denote the values of the potentials u and w (and their derivatives) at the S curve; prime
+
+
238
M. I. YAVOR
denotes the derivative with respect to the arc length s. Only the first order terms on x, y , and o are retained in Eqs. (17). Rewriting Eqs. (17) for the trajectory of the reference particle that moves in the undisturbed fields (characterized by the values 0 and of the potentials at the S curve), we come to the following conditions of coincidence of this trajectory with the S curve:
-4
+ sUj20)
With Eqs. (18) and the relation z ,,/=(l (1 - 6rn/2rn0),Eqs. (17) can be rewritten as follows:
2u
(
2y’
yt’ 20-9P
=
x
X ” - - + T - i + T 5 t T
P
- yu,,
- p g [ y ( % + ” -2u
2gu
(
U’ x’
P
xu,,
2s u +- su, P
“ >+ x w , , + y w , , -
+ ( ’>
2u y“ + - - - - 2x’ t xt’ T2 t2
(
” >+ - ( x U , + y U , ) +
U’
y‘
:>
+-
-
xu,,
-
(
y’
yu,,
-
su,
Generally these equations can be solved only numerically. Note that Eqs. (19) are inhomogeneous even if the mass and energy of the particle in question coincide with the mass and energy of the reference particle (drn = n = 0). This means that in the disturbed electromagnetic field the reference particle trajectory is displaced, and therefore the S curve no more can be a real trajectory for any particle. Equations (19) allow one to study variations of first order focusing and dispersing properties caused by field disturbances for various electron optical elements. As a simple example we consider the motion of achromatic
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
239
monomass particle beam (Sm = 0 = 0) in a round electrostatic lens with a straight axis ( p = z = 00). We assume the xz and y z planes to be the planes of symmetry for the perturbed potential (i.e., U, = U, = U, = 0). Such a disturbance can be caused, say, by a small ellipticity of one of the lens electrodes. Then. Eqs. (19) simplify to
y”
U’ - y - + y’ - 0. 2u 2u UYY
Since in an ideal round lens o;, = U,, = - 0 ”/2 , but the potential disturbance Su does not possess the axial symmetry (6Uxx# SUy,), the x- and y-focus positions of an imperfect lens do not coincide. This effect due to a small ellipticity of the lens electrodes is well-known and is called axial astigmatism. It was discussed in detail, for example, in the book by Strashkevitch (1959). Some interesting results on the axial astigmatism have been obtained by Janssen and Thiem (1988). If the ellipse axes in the lens in question do not coincide with x and y axes, then SU,, # 0. In that case Eqs. (19) take the following form:
U’ - y f’ + y’ -
2u
32- x u,= 0. 2u
2u
Now the x-focusing properties of the lens are different for trajectories with different y coordinates, and vice versa. The evident result of this defect is the rotation of the linear astigmatic x-image of a point object in the x-image plane. If the object has a finite dimension in the y-direction, then the x-image is defocused. The same effect is observed in the y-image plane. This effect is analogous to the image defocusing in a system containing a misaligned element, discussed in Section 11. Thus we see that the electromagnetic field perturbation in an electron optical system can lead to such parasitic aberrations as lateral and longitudinal displacement of an image, its rotation, astigmatism, and other kinds of defocusing. It is obvious from Eq. (19) that a perturbation also changes system dispersion properties. An interesting example of effects of manufacturing inaccuracies is the charged particle motion in crossed electrostatic and magnetic fields whose orthogonality is violated due to a small imperfection of a system. Changes in focusing and dispersing properties of such a system was studied by Kuzmin (1971).
240
M. I. YAVOR
Equations (19) are not valid for emission electron optical systems, such as cathode lens, where the charged particle energy near the cathode is small. The investigation of cathode lenses requires more sophisticated methods, though some simple defects of cathode lenses were studied as early as the 1950s. Vorobjev ( 1 959) investigated the axial astigmatism effect in round cathode lenses. Later Der-Shvartz and Kulikov (1 968) considered imperfections in electrostatic round cathode lenses with a plane cathode. In this paper the potential disturbance was represented by the Fourier series with respect to azimuthal angle. Der-Shvartz and Kulikov obtained the expressions for the third-order aberrations and estimated the effects of some special manufacturing defects. The most complete theory of parasitic aberrations in cathode lenses has been proposed by Kolesnikov and Monastyrsky (1988). This theory includes the evaluation of the third order space and time-of-flight aberrations caused by the violation of the axial symmetry. The paper by Kolesnikov and Monastyrsky contains both general relations and analysis of some specific types of field disturbances. Some preliminary theoretical results used in the paper were obtained in the early article by Monastyrsky (1978). For lack of space we shall not discuss the parasitic aberration theory for cathode lenses here. As was mentioned earlier, in some cases it is useful to investigate the field disturbance influence not only on the first order properties of a system, but also on the second order aberrations. As an example we will now obtain the equations for charged particle trajectories in a weakly disturbed inhomogeneous magnetic and electrostatic sector fields. We begin with a magnetic case. Suppose the charged particle beam axis in an undisturbed magnetic sector field with the symmetry plane z = 0 to be the circle arc of a radius r,, lying in the z = 0 plane. It is convenient to introduce the cylindrical coordinate frame ( r , z, $}, $ being the azimuthal angle, and the dimensionless coordinates r = ( r - ro)/ro,[ = z/roin the neighborhood of the optical axis r = y o . Any known method can be used to derive the equations for the 9- and (-components of charged particle trajectories in a magnetic field defined by its scalar potential w. These equations read 29'2 9'' - -
I+r
1 x ___ l+r
{
g
9 4 -f V '
pwl
+ (1 + 9)2] + (' aw 24
,
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
x
241
aw - 'I"' - - 'I' 1 { [ P+ (1 + 'I)']l+'I a'I ai 84 aw
awl .
(20b) Here mo and KO are the mass and energy of the reference particle moving along the optical axis in the undisturbed field, y = Sm/moand o = SK/Koare the relative deviations of the mass and energy of an arbitrary particle, prime denotes the derivative with respect to the angle 4. For the reference particle Eq. (20a) gives J w / q = 1 , if we represent the scalar potential I? of the undisturbed magnetic field by the expansion *('I,<) =C
+ a11'Ii+ 4a2,q25+ &ao3i3+ ia13'Ii3 + $z31'13{ + . . . , (21)
and its disturbance Sw by the expansion SW('I,C,
4 ) = @00(4)+ @ , 0 ( 4 ) ' I + @0,(4>i+ +a20(4)'12+ a11(4)'IC + +a02(4)C2 + &'30(4)'I 3 + ia21(4)'12i+ '2 a 12 (4)'IC' + '6 a 03 (4)" + .... (22)
The coefficients aik as well as the 'I and C coordinates of the trajectory are assumed to be small values of the first order. We represent the solutions ~ ( 4 and ) i(4) of Eqs. (20) as the series expansions
where q I and il are the terms of the first order describing the solutions of linearized trajectory equations; and are the second order aberrations, etc. Moreover, it is convenient to represent
)?, and
r, defining the trajectory in the undisturbed field with the potential
W; and fi,,, and [, the distortions of the trajectory due to the potential disturbance 6w. Then, substituting Eqs. (21)-(24) into Eqs. (20) and
+
+ + +
+
+
expanding the terms (1 'I)-' and ,,/['I/' (1 'I)' i"]/[(l o)(l y)] in the series in the powers of small values I], <, o,and 7 , after some cumbersome and [,,. calculations we come to the equations for the functions Vrn, Since the equations for the functions ij,,, and and their solutions, defining the charged particle trajectories in the undisturbed field, are well known (see, for example, Matsuo and Matsuda, 1971), we do not reproduce these
r,
rrn,ern,
242
M . I . YAVOR
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS b56
+
= -
dl = -(4aII d2 - v22 " I
=
CI
= (2all
c2 =
a21
A
d3
=
-a0312,
d5
= d6 = ( i a , ,
+ %)t,+ 2.10 +
[; - a&,
c3
c5 --c
c4=fi;,
- $@01,
))81
+ + 2142,
-ti,
d4
243
6
= Pall
+ 1)fi2,
.zo3
+ a2I)fil +
-
% I 9
- -12 a l l i l - 3.10.
r3.
For the lack of space we do not give the equation for the function Equations (25) and (26) define the deviation of the charged particle beam axis in the disturbed field; Eqs. (27) and (28) determine the changes in the first order properties of a sector magnetic analyzer (those are the focal length, the magnification, the mass and energy dispersions, and the defocusing due to the mixing of q- and i-coordinates). Equation (29) allows one to determine the second order aberration changes. Particle trajectories in an electrostatic toroidal condenser can be defined analogously. The undisturbed potential fi in such an analyzer can be represented as the expansion E ( v , i)= v
+ )h20v2+ 3hO2i2+ 3hl2vi2+ &oV3 + W,v4
+ $h2,v2i2+ hh,i4 + . . . ,
(30)
and its disturbance 6u in the form of the series expansion
+ XOl(4)i + 3 x 2 0 ( 4 ) v 2 + xll(4)vi + 3x02(4)iZ + & x 3 0 ( 4 ) v 3 + 3x21(4)v2i+ )x12(4)vi2 + +xO3(4)i3 + . . .. (31)
W L i , 4) = x00(4)+ xlo(4)v
We give only the final results here. Since the equations for the trajectory components ij, and %, in an ideal toroidal electrostatic field are well known (they are obtained and solved, for example, in the paper by Matsuo, Matsuda, and Wollnik, 1972), we do not reproduce them. The equations for functions fim and t, now have the same form as in Eqs. (25)-(29), and the coefficients in these equations are
A
=3
+ hz0,
B = hO2,
bo = -2xoo - XI02 b~ = -(10hzo b2 = 2fi; b4 = -
+
+ &, 9
h30
co =
-xol>
+ 18) - 2hzoXoo 12x00 - 6x10 b, = - W o , + h12)tI - 2x01 -
~ 2 0 ,
XI15
b5 = (hzo
+ 6)fii + 4x00 +
xi02
244
M. I. YAVOR =
C?
= 2[;,
-
+ h12)ci- 4x01
(4ho2
CI
- Xii,
ci = - (4hO2+ hld41- 2hO2xoo - xO2,
+ &, = hO2t+ I (3h:o + 45h20 + 7h30 + + 54)fii
c, = 24;
611 =
-
c5
x01
1
3h40
- 42x00
-
l6h2o~oo- h30xW
3h20~1o- 21xlo - 5xzO- 3x30,
-
(ko+ 4 ) t I - 2xoo- xlo, 633 = - (h20h02+ 6h02 + 3h12+ 3 h ) k - 4ho2~00- h12xW -_ =
-
- x02
- h02X10 - 3 x 1 2 ,
644
=
655 =
- (A20
-(ho
612 = -(2h,,
+ 21% - 2x00 - x i 0 3 + 816, - 6x00 - x i 0 2 + 8)4; + 2x&3+ x;o,
hi,
=
614
= - (2h20
+ 12ho2 + 6h12 + h22Ei - 2 h o ~ o i 12xoi - 6x11
-(2h,oho,
hi5 = -(2h,o
f 4)g;
3
+ 1614, + 4h2,xoo+ 32x00 + loxi0 + 120,
623 =
xi1
635 =
(4hO2+ h
3
- ~ 2 1 ,
625
= 24;
)tI
-
x;o,
+ 4x01 + xI1,
b4, = 2[;, di = -(10h20 d2
=
24;,
4 = -2[;,
+ h3o + 18)42, d3 = -(2& + hiz)t2, 4 = ( 4 0 + 6162,
66 = Cg = d6 = b16 = h2, = b,, = b4, = b,, = bg6= b,, = b3, = 0.
It should be emphasized that in spite of the apparent complexity of Eqs. (25)-(29) they can be easily solved, since the solutions ijm and [, describing the undisturbed trajectories are trigonometric functions. The solutions of Eqs. (25)-(29) are thus some integrals whose integrands contain trigonometric functions as well as functions x i k ( 4 ) (or xik(4)in the electrostatic case). For some important particular defects like shifts, tilts, ellipticity, and radius variations of electrodes or poles, these integrals can be calculated analytically. In the case of more complex form of pole or electrode surface distortions, the numerical calculation of the corresponding integrals is
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
245
needed. However that may be, the solutions describing charged particle trajectory disturbance is not difficult to implement in any computer program for the calculation of electron optical properties of imperfect electrostatic and magnetic sector analyzers. Parasitic aberrations of a cylindrical sector analyzer were discussed in the papers by Boerboom (1976) and Yavor (1989a). Parasitic aberrations for the toroidal condensers were investigated by Malov and Trubatcheev (1979) and Boerboom (1989) and Yavor (1989~;1990a,b). Numerical calculations of effects of imperfections of a homogeneous sector magnetic analyzers were made by Lilly, Weismann, and Lowitz (1963) and Cambey, Ormond, and Barber (1964). The detailed theoretical investigation of this problem has been undertaken recently by Boerboom (1992) and of the same problem for inhomogeneous magnets by Boerboom and Yavor (1992). The approximate method for calculation of third order parasitic aberrations due to nonuniform shift of geometrical axes of electrostatic lenses and deflection systems has been proposed recently by Kurihara (1990). Although the equations determining distorted trajectories were derived by Kurihara based on a rough approximation of the electrostatic field perturbance, his method seems to be useful for the estimation of aberration coefficient changes. Third order equations for trajectories in electron beam focusing and deflection systems in the case of the electrode ellipticity or tilts, as well as the corresponding third order parasitic geometrical aberrations, are obtained in the paper by Liu and Zhu (1990). FOR CALCULATION OF ELECTROMAGNETIC FIELD I v . GENERAL METHODS DISTURBANCES DUE TO ELECTRODE OR POLEFACE DISTORTIONS
To solve the trajectory equations discussed in the previous section and thus to determine parasitic aberrations one has to evaluate an electromagnetic field disturbance caused by an alteration of electrode or pole face shapes in an imperfectly made electron optical device. This is the most difficult problem of the tolerance theory. Several methods have been developed to solve it, and it is the purpose of the present and following sections to review and compare those methods. In some special cases it is possible to find exact analytical expressions for field variations. As a rule, one manages to do it if both the ideal field and its disturbance are two dimensional and the transformation of the ideal geometry of a system to the disturbed one can be described with the aid of a conformal mapping. For example, Zashkvara and Ilyin (1973) applied a conformal mapping to obtain an analytical expression for the electrostatic field potential in a cylindrical mirror analyzer where the axis of one of
246
M. 1. YAVOR
cylindrical electrodes is shifted slightly relative to the axis of another one. Boerboom (1987) also used the conformal mapping method to investigate the influence of misalignments of dipole magnet poles on the magnetic field and its ion optical properties. However the conformal mapping is not the only method that can help one to obtain an exact analytical solution of the problem in question. Sometimes changing of coordinate frame may also be of use. For example, Vlasov and Shakhmatova (1962) found the exact analytical expression for the three-dimensional field potential perturbation in a two-electrode cathode round lens due to its small ellipticity, having used the elliptic coordinate frame. Unfortunately, such situations, where exact analytical solutions of a problem under consideration exist, are very uncommon. On the other hand, it is inconvenient to apply common numerical methods for the calculation of field perturbations because of different scales of system sizes and its distortions. So, as a rule, special approximate methods are used for this purpose. There exist methods that allow one to obtain approximate solutions for some types of electron optical systems and defects in the analytical form. Those methods are very convenient because they simplify comparing effects of different defects on electron optical properties of a system and also save the time of computation. They will be considered in the following sections. However, for most electron optical devices only general perturbation methods are applicable, which require numerical calculations. These methods are discussed in the present section. They are the Bertein perturbation method, the method of variation of a coordinate frame, and the method of integral equations “in variations.” To make our considerations more definite we will confine ourselves in this section with purely electrostatic systems, although the same methods are also valid for magnetic fields. Before discussing approximate methods, we first consider a simple example, where an exact analytical solution for a field disturbance can be found. This example is the electrostatic field distribution in a cylindrical condenser with a misaligned electrode. The problem is evidently identical to that solved by Zashkvara and Ilyin (1973) for a cylindrical mirror analyzer. Later on we will use this example to illustrate ideas and characteristic features of approximate methods.
A . Simple Example of’ the Exact Conformal Mapping
Consider a cylindrical condenser consisting of two electrodes of radii r , and r2 (Fig. 4). Suppose that the outer electrode (the electrode axes are directed perpendicular to the plane of the figure) experienced a small shift p in the direction of the x axis relative to the axis of the inner electrode. We introduce
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
247
FIGURE4. Shown is the section by the x y plane through a cylindrical condenser with the shifted outer electrode. The cylindrical electrode axes are directed perpendicularly to the xy plane.
the complex coordinate z
=x
+ iy in the xy plane. The conformal mapping I-rl
with an arbitrary real parameter, a transforms all circumferences in the plane of the complex z coordinate into circumferences in the plane of the complex ( coordinate. Moreover, this transformation keeps unchanged the circle of a radius rI with the center at z = 0. It is not difficult to define a conformal mapping of a form of Eq. (32) that transforms the circumference of a radius r2 with the center positioned at z = p into the circumference of the same radius but with the center at z = 0. The corresponding value of a parameter a is -
a=
4 + r: + p2 + J(ri 2Prl
- r: - p2)’ - 4 p 2 4
(33)
A field potential in an ideal cylindrical condenser is well known and reads
V ,In r2 - V2In r , (34) V2 - V , where /zI = r, U = ( V, - Vl)/ln(r2/r,);V , and V , are electrode potentials (we neglect fringing field effects in our considerations). Then the field potential in the deformed condenser can be found as V ,In r2 - V2In rl (35) V2 - Vl Substituting Eqs. (32) and (33) into Eq. (35) we obtain the exact analytical expression for the field disturbance.
248
M. I . YAVOR
We do not give here this exact solution in its explicit form since it is rather cumbersome. This solution, however, can be considerably simplified. First of all, in a linear approximation with respect to the small distance p we can obtain from Eq. (33)
and consequently
The conformal mapping of Eq. (36) transforms the circumference of a radius r l into itself but gives slightly different radius and position for the image of the circumference of the radius r2 as compared with the transformation of Eq. (32). In a linear approximation we can obtain the further simplification of Eq. (36) ((z) "=
2
+ p- r:4-- z2r:'
(37)
Substituting Eq. (37) into Eq. (35) we get the following approximate expression for the disturbed field potential u = fi
+ 624,
where U
(:
6u = p - r:
4
- r)
cosb.
(38)
4 = arctan ( y / x ) is the polar angle. Here r = JzJ= ,/-; Note that an exact conformal mapping can be found not only for the defect just considered, but also for some other types of defects. However, larger resources have approximate conformal mappings obtained by so-called variational principles of the conformal mapping theory. They will be discussed in more detail in Section VI. B. Bertein Perturbation Method
Bertein was the first to apply the perturbation method to the investigation of parasitic aberrations in electron optical systems. The purpose of his works (Bertein, 1947a; 1947b; 1948) was to study aberrations caused by small
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
249
/ -
s
FIGURE 5. The boundary (solid line) of the perfect domain d and the boundary S (dashed line) of the distorted domain D .
deformations of round lens electrodes. Later Sturrock (1951) applied the same method to magnetic lenses. Here we will describe the perturbation method in its general form, applicable for systems of an arbitrary shape. For reasons of simplicity we start with the two-dimensional problem. We search for the electrostatic potential u distribution in a two-dimensional domain D whose boundary S is slighty distorted as compared with the boundary of a perfect undisturbed domain b (Fig. 5). We introduce the coordinate frame {s, n} in the neighborhood of the curve, s being the arc length along the curve and n the normal deviation from this curve (with an arbitrary orientation of a unit normal vector). Then the S curve can be described by the equation
s
s
s
n =f(s>
(39)
with If (s)l -g 1. We assume that the undisturbed potential ii inside the domain b is known and the distortion does not change the potential distribution on the boundary; i.e., u(s, n) = ii(s,0)
= g(s).
(40)
The potential u satisfies the Laplace equation Au = 0 inside the domain D and the boundary condition of Eq. (40). Suppose that the curves and S as well as the function g(s) are such that the potential u can be extended as the harmonic function to the whole domain b and its small neighborhood. Then u(s, n) = u(s, 0) + n[du(s,n)/dn],,, + . . . ; that is, u(s,O) r g(s) n[du(s,n)/8n],=,. Using Eqs. (39) and (40) together with the equation Aii = 0 and the approximate equality &/an aii/dn, we finally obtain the following approximate equations for the potential disturbance 6u = u - ii:
s
The same reasoning can be applied to the three-dimensional case, too. Equation (41) also holds here, but the parameter s should be considered now
250
M . I. YAVOR
as a vector defining an arbitrary point at the boundary of the undisturbed three-dimensional domain. Thus we see that the potential disturbance 6u can be found as the solution of the Laplace equation inside the same domain where the undisturbed field was determined. For this reason the technique in question can be easily embedded in any computer program for the numerical calculation of electron optical system fields. The simplicity of the Bertein method as well as the possibility to treat various defects of complex electron optical devices made this method very popular. Many computer programs for calculation of parasitic aberrations are based on this method. As early as in 1953 Archard developed such a program for the evaluation of the influence of mechanical defects on properties of magnetic electron lenses. Deviations from the axial symmetry in electrostatic electron lenses were studied with the aid of the perturbation theory by Janse (1971). He approximated the differential equation A(&) = 0 in a special way by difference equations. This allowed him to obtain the numerical solution with the aid of a successive overrelaxation method. Romaniv (1974) also treated electrostatic field of an arbitrary set of electrodes with a slightly violated axial symmetry by the same method, but to solve Eq. (41) he applied the boundary element technique. For a calculation, with the aid of the Bertein method, of effects of electrode ellipticity and tilts in lenses and deflectors, Liu and Zhu (1990) used the finite element method. New computer programs based on the Bertein method were also reported by Tsumagari ef al. (1986; 1987) and Munro (1988). Note that in some cases the Bertein method allows one to find approximate analytical expressions for a field disturbance. For example, as early as in 1953 Glaser and Schiske obtained the analytical solution of Eq. (41), describing the elecrostatic field disturbance of a two electrode round lens due to its small ellipticity. To illustrate the possibility of obtaining analytical solutions we will consider another example here; namely, we return to the electrostatic field of a cylindrical condenser with a misaligned outer electrode (see Section IV, A). We remind the reader that it is convenient to introduce the polar coordinate frame ( r , 4 } in the plane of Fig. 4, its origin 0 coinciding with the ideal position of electrode axes, and 4 being the polar angle. Then the field potential in a deformed condenser satisfies the equations M r , 4 ) = 0,
u(r,
9
4)= v,,
u(y2
+ P cos4,4) = V2,
(42)
where V , and V, are electrode potentials, p is a small shift of the outer electrode axis in the x-direction. The solution of Eq. (42) for the ideal cylindrical condenser (with 6(4) = 0) was given in Section IV, A by Eq. (34). We now apply the Bertein method to solve Eq. (42). According to Eq. (41), the potential disturbance 6u = u - ii
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
25 1
satisfies the following equations
The solution of Eq. (43) can be easily obtained by the separation of variables. It reads
and is identical to the approximate solution (Eq. (38)) given by the conformal mapping method. The expressions for potential disturbances can be found analogously in other cases where the perturbed field is two dimensional (see Yavor, 1989b), for example, for the small ellipticity of condenser electrodes. In spite of all its merits the method in question is not universal, and what is more, it has very serious shortcomings. The careful analysis of assumptions made when deducing the Bertein formulae shows that they are valid only if a distortion of the boundary S as well as the boundary S itself (to be precise, its distorted part) and the potential distribution at the boundary are smooth. Thus, this method cannot be applied to systems whose electrodes have sharp edges or with discontinuous potential distribution at the boundary. Attempts to apply the Bertein method to such systems lead to incorrect determination of field disturbances. This was shown by Vlasov and Shakhmatova (1962), who compared the analytical result of the exact calculation of the field disturbance due to a small ellipticity of a two electrode round cathode lens with the result obtain by the Bertein method and found out the considerable difference in the final expressions. The restrictions listed are, of course, very strong and limit substantially the class of systems that can be treated by the Bertein method. However, these restrictions are neglected in most calculations of perturbed fields. For example, the electrode configuration considered by Romaniv (1 974) included electrodes with sharp edges. Tsumagari et al. (1986) calculated the field disturbance in a deflector with a discontinuous potential distribution at the electrodes. The results of such calculations are very rough and may be completely wrong. However, the trustworthiness of results obtained with the aid of the Bertein method depends not only on the electrode configuration, but also on the type of a defect. To illustrate it, we now discuss the defects of the systems considered by Munro (1988). Two examples of his calculation results are shown in Fig. 6. The first of the systems is the bipotential round two electrode lens with a tilted left electrode. Fig. 6(a) shows the distribution of the lens field potential disturbance and below the distribution of the axial field disturbance. It is easily seen from Fig. 6(a) that the potential disturbance
252
M. I. YAVOR ILLUSTRATIVE BI POTENTIAL LENS TILT OF ELECTRODE 1
ILLUSTRATIVE ELECTROSTATIC EINZEL LENS MISALIGNMENT OF ELECTRODE 3
FIGURE6 . The field potential disturbances in round electrostatic lenses, calculated with the aid of the Bertein method. Two systems are shown: (a) a bipotential two electrode lens, and (b) a three electrode einzel lens. At the top of each picture the upper half of the axial section through the lens is shown together with the potential disturbance (6u) distribution. The diagrams below illustrate the distribution of the field disturbance along the optical axis. After E. Munro (1988), p. 946, Fig. 7.
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
253
at the positions of untilted electrodes is nonzero on the smooth part of the left electrode surface. Therefore there is good reason to apply the Bertein method here, and its results are most likely true. As the second example Munro evaluated the potential perturbation in an einzel electrostatic lens due to a misalignment of the right electrode (Fig. 6(b)). Here the potential disturbance at the positions of aligned electrodes is concentrated in the neighborhood of the edge of the right electrode, where the value of the normal derivative of the field potential as well as the direction of a vector normal to the surface change considerably from point to point. This makes the Bertein method inapplicable, and its results very inaccurate. In spite of its shortcomings, at present the Bertein method remains the one most widely used for calculation of tolerances for electron optical systems. The reason is that until recently there practically did not exist other methods for evaluation of electromagnetic field disturbances in general electron optical systems, except the considerably less powerful method of coordinate frame variation. We pass now to the description of the latter. C . Coordinate Frame Variation Method
Instead of determining a field potential disturbance at the ideal positions of electrode surfaces and then solving the Laplace equation, as is done in the Bertein method, one can find an appropriate coordinate frame (slightly different from the initial “perfect” one) so that distorted electrodes to take the perfect form in this coordinate frame. In other words, this means that if in a perfect coordinate frame {ql, q2,q3} boundary conditions are defined at the surface, which is determined by the equation F ( q , ,ij2, q3) = 0, then a new distorted coordinate frame {q,,q 2 ,q 3 } (where q i= qi &f;(ql,q2,&), i = 1,2,3, E < 1 is a small parameter,A are some functions) should be chosen so that the disturbed boundary is to be defined by the equation F(q,,q2,q3)= 0 with the same function F. Then in the new coordinate frame { q l , q2,q 3 } we need again to calculate the field in a system possessing the same perfect geometry as the initial one, but now the differential equation we need to solve in order to determine the system field is no longer the Laplace equation. It is obvious that this method is free of shortcomings inherent in the Bertein method, and it thus can be applied to systems with an arbitrary shape of electrodes. Moreover, both electrode distortions and a potential distribution on the electrodes need not be smooth. The comprehensive investigation of the coordinate frame variation method was reported by Saito (1 960). However, the description of this method in its general form requires too much space, and furthermore, it is rather complicated. So to make the idea of the method clear we confine ourselves to discussing only two examples. The first of them is the calculation of a
+
254
M . 1. YAVOR
FIGURE7. Two sections through a round immersion lens are shown. The solid line corresponds to the perfect lens and the dashed line to the lens with slightly elliptical electrodes.
potential perturbation in a round lens with weakly deformed electrodes. This problem was studied by Der-Shvartz and Kulikov (1962). We reproduce their reasonings now and consider the electrostatic field in a two electrode immersion round lens with slightly elliptic electrodes (Fig. 7). We assume the ideal lens aperture radii of both cylindrical electrodes to equal R. Thus in Cartesian coordinates {x,y , z } the field potential ii of the ideal lens is the solution of the Laplace equation a2ii d 2 i i -+-+-=O ax2 ay2
a2ii
(45)
az2
with boundary conditions
I
x2 + y 2 3 R2,
z = z,,
ii
=
V , for
x2 + y 2 = R2,
zI < z d z2,
z = z2,
x 2 + y 2 3 R2,
z = z3,
x2+ y 2 2 R2,
x2 + y 2 = R2, z
z3 d z
d
(46) z4,
x 2 + y 2 3 R2,
= z4,
where V , and V2 are electrode potentials. Suppose the inner surfaces of the distorted elliptic electrodes were defined by the equation
x2 -+-I+& E
y2
I-&
- R2,
(47)
being a small dimensionless parameter. If we pass to the new coordinates + E ) ’ ” , 9 = y/(l - E ) ~ ” , then Eq. (47) can be rewritten as
( = x/(l
c2 + q2 = R2,
(48)
and so the boundary conditions for the field potential u of the deformed lens
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
255
take the form
1 1
5’
z = zI,
u = V,
for
u = V2 for
t2+ v]’
+ q2 2 R2,
= R2,
z,
< z 2 z2,
z = z2,
t2+ q2 2 R2,
z
5’
= z3,
t2+ v]’
z
+ q2 2 R2,
= R2,
= z4,
zj
(49)
< z4,
t2+ q2 2 R2.
This coincides with Eq. (46) after changing in the latter x to 5 and y to v ] . In the new coordinate frame {t,v ] , z } the Laplace equation for the field potential u reads
Substituting into Eq. (50) the potential in the form of a series expansion u = uo E U , c2u2 . . ., we obtain the equations
+ +
+
a2uo a2uo azu0 at2 all az2
-+7+-=0
with the boundary conditions of Eq. (49), and
aZuo a2u, aV2 az2 at2 aV2
-+>+>=--a2ul a2u a2u
at2
(52)
with the homogeneous boundary condition uI = 0 at the deformed condenser surface, etc. To determine the solution in the linear approximation it is sufficient to calculate the functions uo and uI . The form of Eqs. (51) and (49), defining the function uo, coincides with that of Eqs. (45) and (46), defining the undisturbed potential ii. It is evident, however, that the function uo is not the undisturbed field potential, because it depends on different coordinates. Since the ideal lens is round, it is convenient to rewrite Eqs. in the “imperfect” cylindrical coordinates {A, 4, z } , where A 4 = arctan ( v ] / < ) . We also introduce the new function ti,, with u , = zi, cos24, and then finally obtain the following equations
a2u, 1 au, a2u, -+--+-=o aA2
A ~ A a$
(53)
256
M. I. YAVOR
These equations can be solved with the aid of any numerical method for solving partial differential equations. In the original paper by Der-Shvartz and Kulikov (1962) the finite difference method was used. For the second example we come back again to the calculation of the electrostatic field disturbance in a cylindrical condenser with a slightly shifted outer electrode (see Section IV, A). We recall that in the polar coordinate frame { r ,4} the electrode surfaces in the distorted condenser are defined by the equations r =rl,
r = r2
+ pcos4
(55) (see Fig. 4). In this case the coordinate frame variation leads to a more complicated equations, but they can be solved analytically. In this sense the present example is more obvious than the previous one. The field potential u in the distorted condenser satisfies the Laplace equation
with boundary conditions u = V , and u = V2 at the electrodes, whose surfaces are given by Eqs. (55). We introduce a new coordinate r - rl pcos4 -. r2 - rl Then in the linear approximation with respect to the small shift p the condenser electrode surfaces in the coordinate frame { R , 4} are determined by the equations R = r , and R = r 2 . The Laplace equation (Eq. (56)) in this coordinate frame takes the following form: R =r
aZu
-
aZu cos4 au -2cos4 2- -aR R aR
1 au 1 aZu +-+ 7 +r2-r, R a R R a+
-
aR2
-
--7j
1+
R - r , a2u R - r , a2u 2cos4 --+ 2 sin4 R2 aR&$ R' a42 ~
p".
. .] = 0.
(57)
We represent the field potential u as u = 240
+ pu, + p2u2 + . . . .
(58)
Substituting the expansion of Eq. ( 5 8 ) into Eq. (57) and collecting the terms with the same powers of p, we come to the equations for the functions uo and uI. The function uo satisfies the equation
A+--+--=() 1 auo 1 aZu, a2
aR2
R aR
R~ a42
with the boundary conditions uo(R = r i , 4)
=
q, i = 1,2. The corresponding
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
257
solution of this equation is V, lnr, - V,lnr,
vz - v,
(59)
where U = (Vz - V,)/ln(r,/r,). The form of Eq. (59) is identical to that of Eq. (34), but the former depends on R whereas the latter depends on r = IzI. Returning to the initial polar coordinates {r, 4} we obtain
ii being the ideal condenser field potential given by Eq. (34). We note again that the function uo and the undisturbed potential ~2do not coincide! We now go to the evaluation of the function u, . Taking into account that uo is independent of 4, we obtain the following equation for this function
aZu,
1 au R aR
1 RZ 8b2
-+--'+--A'=-
dR2
C O S ~
r2-r,
We represent u, in the form u, = fi, ( R )C Then we have
azu0 1 a u q 2-+--. aR2 R aR
O S and ~
(61)
substitute it into Eq. (61).
where prime denotes the derivative with respect to R , and the function satisfies the homogeneous boundary conditions f i , ( R= r,)
a,
= fi,(R = r,) = 0.
The corresponding solution of Eq. (62) reads u, = r2 -
rl
R
r2
+ rl
R r2
+ rl
Returning to the coordinates {r, b } and neglecting the terms proportional to p2, we finally come to the following formula for the potential disturbance 6u = uo pu, - ii:
+
which of course is identical to Eqs. (38) and (44) obtained by different methods. The two examples discussed allow us to determine the characteristic features of the coordinate frame variation method. First of all, we see that Eqs. (52) and (61), defining the functions u ,, are inhomogeneous. Moreover, 'comparing Eqs. (54) and (62), we can easily see that the forms of the
258
M. I . YAVOR
right-hand sides of these equations are different and thus depend on the defect type; that is, on the relations between initial and “imperfect” coordinates. To solve these equations we therefore cannot use the same numerical algorithm. This is the first general disadvantage of the method in question. The second and even more important one is that for an arbitrary defect it is not always possible to find the appropriate coordinate frame. These facts mean that the method of the coordinate frame variation can hardly be used as a part of a general algorithm for analysis of arbitrary tolerances. A considerably better alternative to the Bertein method seems to be the socalled method of integral equations “in variations,” which we describe now.
D. Method of Integral Equations “in Variations” This general method for the calculation of a field potential distribution in weakly deformed electron optical systems was proposed by Monastyrsky (1979). It seems to be very promising; so, although the review of the method of integral equations “in variations” was published recently in one of the previous volumes of Advances in Electronics and Electron Physics (Ilyin et al., Academic Press 1990), we find it useful to give its brief description here, in order that the reader can compare it with other methods of the tolerance theory. According to the method in question, the potential iiPof an undisturbed electrostatic field at an arbitrary point P inside an electron optical system is represented in the following form:
where Q is a point at the surface of the system electrodes; OQ is the surface and rpQ is the distance charge density; dS, is the elementary surface at between the points P and Q . It is well known that 5, satisfies the Fredholm integral equation of the first kind:
r;
Both points P and Q in Eq. (66) are assumed to lie at the surface f . We introduce a parametric representation P = F(5, yl) of the points of the i= surface. Suppose for simplicity that the coordinate frame { 5 , yl} is orthogonal on f ; that is T,P, = 0. Then dSQ = JQd( dyl, where JQ = IF,[ lP,,I. We designate
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
In the Lagrange coordinates
259
{t,q} Eq. (66) takes the following form:
S,
~ Q ~ p Q d t d=qu p ,
(68)
where D is the two-dimensional domain in the tq plane, whose mapping P(5, q) coincides with the surface. A distortion of the f surface can be represented as
r
r(t, q) =
f(t-9
9)
+ w5,v).
r
Then the surface and the deformed surface turns out to be parametrized in the same domain D . On the r surface a new surface charge density o((, q) = C(5, q) + So((, q) is realized. The variation of Eq. (68) leads to the following integral equation for So:
lD
S o Q G P Q & d= ~ sup-
s,
CQ6GPQd(dq,
(69)
SUP being the field potential variation on the surface. Since the positions of both points P and Q are changed when deforming the electrodes, the variation SGpQ can be represented as
+
(70) where 6, and 6, are variation operations with respect to the points P and Q , respectively. Simple geometrical considerations (see Ilyin et al., 1990) show that 6GpQ= 6pGpQ ~ Q G ~ Q ,
J sPGPQ
=
(6rPePQ), ‘PQ
where epQdenotes the unit vector r p Q / r p Q . Then the entire algorithm for calculation of the field potential disturbance is as follows: 1. Initial integral equation (Eq. (68)) is solved and the unperturbed charge density ii is determined. 2. Variation 6GpQ of the kernel of the integral equation (Eq. (69)) is calculated with the aid of Eqs. (70) and (71). 3. Assuming the potential variation SUP to be predetermined, Eq. (69) is solved and thus the charge density So, is defined. 4. Perturbation of the electrostatic potential Sup at an arbitrary point P inside the system is calculated by the formula (SO,
GpQ
+5
~ GpQ)dl 6 ~dv.
(72)
260
M . I . YAVOR
It is evident that the method presented can be applied to systems of an arbitrary geometry, with an arbitrary potential distribution on the electrode surfaces and also with arbitrary electrode distortions. The disadvantages of the method are its relative complexity (in particular, the need to parametrize the electrode surfaces by the Lagrange coordinates {t,q } ) , and the fact that the corresponding algorithm can be easily implemented only in the programs applying the boundary element method for the numerical field calculations (whereas the Bertein method is compatible with any numerical method for the field calculation). However, the use of the approach discussed together with the boundary element method is very convenient. Indeed, both integral equations (Eqs. (68) and (69)) are the Fredholm equations of the first kind with the same kernel G,, . Therefore they can be solved numerically with the aid of the same algorithm. Moreover, the matrices of the discrete analogues of these equations (i.e., linear equations derived as a result of some approximation) coincide. This allows one to reduce considerably the extent of calculations when solving the optimization problems for electron optical systems. The report by Monastyrsky (1979) contains not only the general ideas of the method but also formulae for the variation SG,, of the integral equation kernel in the case of round electron optical systems for most important types of electrode deformations maintaining the axial symmetry (such as the transfer of the electrode surface generating lines; their rotation, tension, or compression; and the variation of their curvature radii). Analogous studies for the defects violating axial symmetry are carried out in the paper by Monastyrsky and Kolesnikov (1983). The method of integral equations “in variations” was generalized for the Fredholm equations of the second kind by Freinkman (1983). At present the method in question is embedded in some computer programs for the calculation and optimization of electron optical systems. Such programs are described, for example, by Ivanov (1986) and Ilyin et al. (1990). An approach similar to the one described earlier was also used recently by Ximen and Li (1990) for the investigation of round electrostatic lens aberrations caused by the ellipticity, tilt, or shift of the lens electrodes. V. FIELD DISTURBANCE IN ELECTROSTATIC AND MAGNETIC SECTOR ANALYZERS
For the analysis of parasitic aberrations it is advantageous to have analytical expressions for electromagnetic field disturbances due to distortions of electrodes or pole faces. Such expressions can help one to pick out the types of electrode or pole face imperfections leading to most considerable parasitic
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
261
aberrations and thus to determine tolerances that need the most careful control. Electrostatic and magnetic sector analyzers are examples of the devices for which an approximate analytical method of the field disturbance calculation exists. This method is based on the asymptotic technique that was first proposed by Boerboom, Tasman, and Wachsmuth (1959) for the ideal sector magnet field calculation and by Boerboom (1960) for the ideal electrostatic toroidal condenser field calculation. Boerboom (1976) was also the first one to apply the method to the tolerance theory problems; namely, to the investigation of parasitic aberrations in an electrostatic cylindrical condenser. First approximate analytical formulae for the toroidal condenser field perturbation caused by its electrode shape variations were obtained by Trubatcheev (1977). The general asymptotic method for the field disturbance calculation in a toroidal condenser was developed by Boerboom (1989) and independently by Yavor (1989b), who obtained more accurate results (a comparison of results of the latter two articles is discussed by Yavor, 1991a). The field perturbation in a homogeneous sector magnet due to manufacturing inaccuracies was studied by Boerboom (1992); the similar investigation for an inhomogeneous sector magnet has been made by Boerboom and Yavor (1992). In principle the asymptotic method allows one to obtain analytical expressions for electrostatic or magnetic field concentrated inside a narrow gap between two arbitrarily curved surfaces. However, we will confine ourselves here with only the two special cases most important in electron optics; namely, the fields of distorted electrostatic toroidal condenser and sector inhomogeneous magnet.
A . Electrostatic Toroidal Condenser
A toroidal condenser consists of two electrodes, which are rotationally symmetrical with respect to the z axis of the cylindrical coordinate frame { r , +,z}. The generatrices of their surfaces are the circle arcs, whose curvature centers lie in the median plane z = 0 (Fig. 8) and do not necessarily coincide; and the radii R, and R2 are close by each other, i.e., IR, - R, I < min{R,, R2}. These arcs intersect with the median plane z = 0 at the points with the radial coordinates r , = r, - b and r2 = r,, + b, ro being the beam axis radius inside the condenser. The generatrices of an ideal toroidal condenser are described by the following equations: r - ro = G(z) + bgj(z), (73) 1 -4 where R,
= (R,
+R2)/2, the subscript j = 1 corresponds to the internal
262
M. I. YAVOR
1'
FIGURE8. Shown is the section by the plane 4 = constant through a toroidal condenser with perfect (solid lines) and imperfect (dashed lines) electrodes.
electrode a n d j = 2 to the external one. The form of the equations describing the functions gj(z) depends on the location of the generatrix curvature centers. For example, if the electrode surfaces have equal curvature radii R,= R, = &, then g,(z) = 1, g,(z) = 1; if the curvature centers coincide, i.e., R, = Ro - b, R, = Ro b, then
+
+ . . .]z2 + . . . . The weakly deformed electrode surfaces can be generally described by the following equation: r
-
ro = G ( z ) + bg,(z,
419
(74)
+
where g,(z, 4) = g,(z) 6g,(z, $), functions 6g, characterize small distortions of the electrode surfaces (ISg,1 4 maxlg, I). We introduce now the dimensionless coordinates 4 = (r - ro)/ro and i = z/ro. Then we can rewrite Eq. (74) as 4 = F(i)+ Ef;(i,
41,
(75)
where the dimensionless parameter E = b/r, is assumed to be small ( E 4 I), F ( 0 = G(roi)/ro, + Sf;(i> 4), J(0= t?,@0O9 S f ; G 4) = k , ( r o L 4). In the following we will consider the functions S f ; as smooth; i.e., we will suppose not only these functions themselves but also their derivatives to be small. Note that the derivatives of all the functions involved in Eq. (75) are not large, if the radius & is not much smaller than ro. This fact will allow us
f;cn =J
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
263
to obtain the formulae for the condenser field potential in the form of the asymptotic expansion by the powers of the small parameter E . The potential u of the deformed toroidal condenser electrostatic field satisfies the Laplace equation
L(u)
aZu a2u =-+-+-aV2 at2
1
1
au
a2u
I
+ v a~ +--(1 + v ) a~4 2 -
0
and the boundary conditions
mi)+ Ef;(i,
4), i,41 = y 9
j
=
132,
(77)
V , and V2being the electrode potentials. We represent the potential u in the form of the expansion
c" 1
4%L 4) = r = O 7. f f , ( L 4)v',
(78)
where the coefficients H,(i,4) are also the serii
Substituting Eqs. (78)-(80) into the Laplace equation (Eq. (76)), we come to the following set of equations for the coefficients HZ2.k
+ l)e(?I,k + i2H$)-k 2ie(!l,k+2 + i(i + H!,T = 0.
-k (2i
ff$)+2
1)H?2,k+2
(81)
Here prime denotes the derivative with respect to the angle 4. Equations (81) allow one to determine all the coefficients H$) with i 2 2, if the coefficients H$'l and HfYj are known. To define the latter coefficients we substitute Eqs. (78) and (79) into the boundary conditions of Eq. (77). With the relation [F(O + Ef;(i,
c' Z!(ii! Z)! F'-'(of;'(L 4)E'
4)If= I = o
-
we obtain
where vj = Now we represent the functions F"(i) andf;'(c,
4) involved in Eq. (82) as
264
M. I. YAVOR
the series expansions
$cr3 4) = c CJ)(4)iP, p=o
Substituting Eqs. (83) and (80) into Eq. (82) we come to the relation
We introduce the notations a = n + I , p = 2m + p + k . Now, equating to zero the sums of the terms with the same products [ B ~ a ,we finally come to the following set of equations for the coefficients H $ :
Here for the arbitrary integer numbers t and s, 6,, = 0 with t # s, a,, = 1, M ( a , 8) is the finite set of the values of the integers i, k, m, n, dependent on the integer nonnegative parameters a and B. This set is defined by the following inequalities: Odnda,
OdkdB,
Odm<-B-k 2 '
a -n 6
i< m
+a-
n.
(86)
Equations (8 1) and (85) allow one to determine all the coefficients HJ$). Since the procedure of determining of coefficients H,,, = Z?==,H$&"is rather cumbersome, here we will list only the final results. It is convenient to ( 4 ) ,the represent the coefficients H,,k as the sums f & k ( 4 )= hr,k+ ~ ~ , ~where constant terms hr,,are the field potential expansion coefficients for the undeformed toroidal condenser (see Eq. (30)), and the functions X J , k ( 4 ) are small additional terms characterizing the field perturbation due to electrode shape distortions (see Eq. (31)). Furthermore, we will suppose the electrode potentials to be normalized so the coefficients h,,,, and have the values of h,, = 0 and = 1, as is adopted in Eq. (30). Then the coefficients h , , (with i + k d 4) and x , , ~(with i k < 3 ) are as follows:
+
h0.I
=
ho.3 = hI.1 = 4
+ F$
-
3
=h
, l
= h3,, = 0,
F i - 34(] + e3{. . .},
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS h2,o
= - 1 - h0,2 3
h.2
=
-FidFii - 1) - tcJ2
h3.0
=
Fi- 2qi+ 2 +
ho,4 = h2.2
265
-A)[,+ 4.. .>, -A){(+ &{. . .>,
+ 34dL - f J n - &i + 4.. .I, - 3&)(8i - 1) + (t - 34"L -A)>,, + FiiK + 4.. .>,
3~i(FK- 1)
= I;ii(l
40= - 5 4 0 - 4h2,0- h , , - 4 h l , 2 - 2ho,z, &
xo,o(4) = - p h + K )+ 4.. .I, Xl,0(4)
=
-t(& - Sf,) &
XO,l(4) = - +%
+&h2,0(%
+ S f 2 ) + E2{. . .I,
+ @& + E2{. . .>,
-H& - 6h)i -
Xl,l(4)
=
x2,0(4)
= 3(1 - F,,)(&
+&h2,0(K
+ mi + E2{. . .>,
- ah)+ $.h
- (hI.2 + h*,O)(%
+ W2)M + (% + mii
+ Sf) - +(A +L)ii(v2- wI>l + E 2 { . . .>,
xo,2(4) = 3Fi,(K?- Sf,)+ ; t ~ - ~ 2 , o h o . 2
+ sf2 -Jl)iiI(6h + W 2 )
+ H A +j;)i,(v2 - d f l ) - (Sf1 + mii1+ E2{. . .I, xo,3(4) = #Fii(v2+ 4.. .>, x d 4 ) = - + M 1 - &)(K- Vl)+ a,,(%- S h ) -3 (& - &)>a + &{. . .}, (4) - x0,3(4) - d , 0 ( 4 ) , x 3 , 0 ( 4 ) = - 3 x 2 , 0 ( 4 ) - x ~ ( 4) x l , 2 ( 4 ) - 2xo,2(4) - xb(4). Here the subscripts i and 4 denote the derivatives with respect to the x2.1
(4) =
- 111.1
corresponding variables at i= 0. Using the results listed together with Eqs. (25)-(29), one can analyze the influence of various types of manufacturing defects in a toroidal condenser on its electron optical properties. For example, the displacement of the particle beam axis in the radial direction is defined by the magnitude of the coefficient 6, in Eq. (25), and hence, as is clear from the formulae following Eq. (31), by the values of the functions xo,oand This means that such a displacement occurs mainly due to the variation (SS, - &) of the inter-
266
M. I. YAVOR
electrode gap in the condenser median plane ( = 0. If the interelectrode gap is not changed Sf2 - Sf, = 0, then the beam axis displacement is defined by the function &(Sf, Sf2)and therefore becomes much smaller (for the same order of magnitude of the functions Sf, and Sf) due to the presence of the small parameter E . The beam axis displacement in the axial direction (determined by Eq. (26) and hence by the function x0,,)is proportional to the summary inclination (Sf, Cf2)i of the electrodes relative to the median plane and also is comparatively small due to the presence of the small parameter E . Variations of magnification and dispersion coefficients as well as of the condenser focal length are determined by the coefficients b, ,b,, and b, in Eq. (27); that is, by the functions x ~ ,x~, , ,~and , x ~ , Thus ~ . these variations also appear due to changes in the interelectrode gap and a smaller extent due to other factors. The most dangerous effect, that is, the image defocusing in the radial direction due to the mixing of q- and i-variables in Eq. (27), is proportional to the coefficients b, and 6, in Eq. (27) and hence is defined by the functions x0,,and x,,, . The results listed previously show that the defocusing in question appears mainly because of the mutual inclination (Sf2 - Sf,)( of the electrodes. The influence of the summary electrode inclination (Sf, + S?f2)i relative to the median plane on the defocusing of the image, formed by the condenser, is smaller due to the small factor E. Analogously one can investigate theoretically and calculate the influence of various toroidal condenser manufacturing defects on the other properties of the condenser, including aberrations.
+
+
B. Inhomogeneous Sector Magnet
Pole faces of an ideal inhomogeneous sector magnet possess rotational symmetry with respect to the z axis of the cylindrical coordinate frame { r , @,z},and the generatrices of their surfaces are symmetrical with respect to the magnet median plane z = 0 (Fig. 9). Generally the equations describing the weakly deformed pole faces in the dimensionless coordinates {q, 4, i) introduced in Section V.A can be represented in the following form: c=(k1)’Eq,(V54),
j = 1,2,
(87)
b/robeing the small dimensionless parameter; ro the beam axis radius; b the half air gap in the ideal magnet measured at r = r,; the subscript j = 1 corresponds to the lower pole, j = 2 to the upper one. The functions qJ can be expressed as qJ(q, 4) = 4(q) + 6qj(q,@), where the first term defines the pole faces of the ideal magnet and the second one the small additional terms characterizing the pole face distortions. Generally the function q(q) can be E =
METHODS FOR CALCULATION OF PARASITIC ABERRATIONS
”
267
C
FIGURE 9. Shown is the section by the plane 9 = constant through an inhomogeneous (toroidal) magnet with perfect (solid lines) and imperfect (dashed lines) pole faces.
represented as the series expansion
For example, in the case of homogeneous magnets pm = 0 for all m 3 1. For the conical magnet p m = 0 with m 3 2, p , = (tan a)/&,ct being the cone angle. etc, In the case of a toroidal magnet p2 = ro/(2R&),p 3 = 0, p4 = r;/(8R3&), where R is the radius of the pole face generating lines. We will suppose in the following that all the coefficients pm are not large (in particular it means that for the toroidal magnet the condition of R S ro should be fulfilled). We characterize the magnetic field by the scalar potential w . Analogous to what was done in Section V.A, we can represent the scalar potential in the form of the expansion
We characterize the magnetic field by the scalar potential w. Analogous to what was done in Section V.A, we can represent the scalar potential in the form of an expansion in powers of η and ζ, with coefficients A_{i,k}(φ) depending on the azimuthal coordinate. The scalar magnetic potential w satisfies the Laplace equation L(w) = 0, where the operator L is given by Eq. (76), and the boundary conditions
(k1)’&qj(V,41, $1
(f1)’ wo,
(91) (k1)’ Wobeing the scalar potentials at the lower and upper pole surfaces. The calculations similar to those given in Section V.A lead to the two sets of W[V,
=
relations between the coefficients A_{i,k}(φ). The first set is identical to Eq. (81); the second involves the value w₀ = W₀/ε and the coefficients Φ_l(φ) of the corresponding series expansion of the boundary data.
Analogous to the electrostatic case, it is convenient to represent the coefficients A_{i,k}(φ) = Σ_{n=0}^∞ A_{i,k}^{(n)}(φ) εⁿ as the sums A_{i,k}(φ) = a_{i,k} + α_{i,k}(φ), where the terms a_{i,k} are the constant coefficients of the expansion of the ideal magnet scalar potential (see Eq. (21)), and the functions α_{i,k}(φ) characterize the potential disturbance due to the pole face distortions (see Eq. (22)). Furthermore, we will assume the value W₀ to be chosen so as to make the leading coefficient equal to unity, as is adopted in Eq. (21). The results of the calculations show that the coefficients a_{i,k} (with i + k ≤ 4) and α_{i,k}(φ) (with i + k ≤ 3) have the following form:
$$a_{1,0} = a_{1,2} = a_{2,0} = a_{2,2} = a_{3,0} = a_{0,3} = a_{4,0} = 0,$$
$$a_{1,1} = -p_1 + \varepsilon^2\{\cdots\},\qquad a_{2,1} = 2(p_1^2 - p_2) + \varepsilon^2\{\cdots\},$$
$$a_{3,1} = -6(p_3 - 2p_1p_2 + p_1^3) + \varepsilon^2\{\cdots\},$$

with analogous expressions for the remaining coefficients a_{i,k}; the disturbance functions α_{i,k}(φ) are expressed through the pole face distortion terms δq_j(η, φ).
The results listed allow one to analyze the influence of various manufacturing defects of a sector magnet on its electron optical properties. Here we point out only one interesting fact. In Section V.A it was already noted that the defocusing of the image formed by a sector deflecting electron optical element is proportional to the coefficients b₃ and b₄ in Eq. (27). In the case of the magnetic sector field these coefficients are determined by the corresponding disturbance functions α_{i,k}(φ). Since all these functions are proportional to the small parameter ε, the manufacturing tolerances for magnetic sector analyzers with the same tolerable limits of the image defocusing can be much looser than those for the electrostatic toroidal condenser.
VI. APPLICATION OF APPROXIMATE CONFORMAL MAPPINGS

Conformal mappings are widely used for the calculation of two-dimensional electrostatic and magnetostatic fields in electron optical devices. It has already been shown in Section IV.A that they can also be applied to the evaluation of electromagnetic field disturbances in imperfectly made electron optical elements, provided such disturbances preserve the two-dimensional field distribution. However, only a limited number of conformal mappings characterizing manufacturing defects can be described by exact analytical formulae. Therefore it is of great interest to construct approximate conformal mappings that allow one to describe, in analytical form, arbitrary two-dimensional electromagnetic field disturbances caused by shape distortions of an element's electrodes or pole faces. The mathematical method (usually called variational) for constructing such conformal mappings was developed by M. A. Lavrentyev; its detailed description is contained, for example, in the textbook by Lavrentyev and Shabat (1965). It allows one to derive approximate analytical expressions for conformal mappings of domains whose shapes are close to the shape of some canonical domain (the canonical domain may be, for example, a circle, an upper half-plane, or a strip) onto this domain.
FIGURE 10. (a) Perfect (solid lines) and imperfect (dashed lines) poles of a magnetic quadrupole lens (the z-plane). (b) Conformal mapping of the z-plane onto the ζ-plane. (c) Conformal mapping of the domain bounded by the dashed lines in the ζ-plane onto a straight strip.
The first to apply the variational method to electron optical problems was Shukeylo (1959), who studied the field variation in a magnetic quadrupole lens due to the replacement of hyperbolic poles by cylindrical ones. Later Doynikov (1966) used the same idea for the calculation of the field disturbance in a magnetic quadrupole lens caused by its assembling inaccuracies. To illustrate the method in question we will discuss this problem in detail. We consider the magnetic quadrupole lens with the hyperbolic pole face generatrices defined by the following equation:

$$xy = \pm\tfrac{1}{2}x_1^2\qquad(93)$$

(see Fig. 10(a)). The values of the scalar magnetic potentials at the pole surfaces are assumed to be ±W₀. Suppose that as a result of machining or assembling inaccuracies the shapes of the pole face generatrices are not exactly hyperbolic, but remain identical in all sections by the xy planes (i.e., independent of the third Cartesian coordinate). Then the distorted pole surfaces can
be described by their normal deviations N(x, y) from the ideal pole surfaces, measured at all points P(x, y) of the ideal generatrices. We introduce the complex coordinate z = x + iy in the xy plane. The conformal mapping

$$\cosh\frac{\pi\zeta}{2} = e^{\pi z^2/(2x_1^2)}\qquad(94)$$

transforms the domain between the ideal lens poles into the strip −∞ < ξ < ∞, −1 < η < 1 in the plane of the complex variable ζ = ξ + iη (Fig. 10(b)). In the same plane the curves that result from mapping the distorted pole face generatrices by Eq. (94) are described by the functions η = 1 − F₁(ξ) and η = −1 + F₂(ξ),
where, in the linear approximation with respect to the small deviation N, the functions F_j(ξ), j = 1, 2, are connected with this deviation by the relations

$$F_j(\xi) = -\frac{2}{x_1^2}\,\tanh\frac{\pi\xi}{2}\;\sqrt{x^2\!\left[\xi,(-1)^{j-1}\right] + y^2\!\left[\xi,(-1)^{j-1}\right]}\;N\!\left\{x\!\left[\xi,(-1)^{j-1}\right],\,y\!\left[\xi,(-1)^{j-1}\right]\right\},$$

the functions x(ξ, η) and y(ξ, η) being determined by Eq. (94).
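The geometric content of this relation is that a small normal displacement N of a pole face appears in the ζ-plane as a boundary shift of magnitude |dζ/dz|·N. The following minimal sketch checks this numerically for Eqs. (93) and (94) in the form reconstructed above; the displacement N, the sampling range, and the setting x₁ = 1 are purely illustrative.

```python
import numpy as np

# Sanity check: a small normal displacement N of a pole face shows up in the
# zeta-plane as a boundary shift of magnitude |dzeta/dz| * N, where
# dzeta/dz = (2 z / x1**2) * coth(pi*zeta/2) for the mapping (94).
x1, N = 1.0, 1e-3                     # illustrative values
x = np.linspace(0.4, 2.5, 200)
y = x1**2 / (2.0 * x)                 # ideal hyperbolic generatrix, Eq. (93)
z_ideal = x + 1j * y

# displace each point by N along the outward normal of the curve x*y = x1**2/2
grad = y + 1j * x                     # gradient of f(x, y) = x*y as a complex number
z_dist = z_ideal + N * grad / np.abs(grad)

def zeta(z):
    """Mapping (94): cosh(pi*zeta/2) = exp(pi*z**2/(2*x1**2)), principal branch."""
    w = np.exp(np.pi * z**2 / (2.0 * x1**2))
    return (2.0 / np.pi) * np.log(w + np.sqrt(w * w - 1.0 + 0j))

shift = np.abs(zeta(z_dist).imag - 1.0)                     # measured boundary shift
deriv = np.abs(2.0 * z_ideal / x1**2 / np.tanh(np.pi * zeta(z_ideal) / 2.0))
print(np.max(np.abs(shift - N * deriv)))                    # small compared with N*deriv
```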
Now the problem is to construct the conformal mapping of the domain

$$-\infty < \xi < \infty,\qquad -1 + F_2(\xi) < \eta < 1 - F_1(\xi)\qquad(96)$$
onto the strip

$$-\infty < \rho < \infty,\qquad -1 < \nu < 1\qquad(97)$$

in the plane of the complex variable ρ + iν. In the linear approximation with respect to the boundary distortions this mapping is given by Eq. (98); its correction to the identity transformation consists of integrals of the functions F₊(t) and F₋(t) taken over the real axis with hyperbolic kernels of the type coth[π(·)/2],
where

$$F_+(\xi) = \tfrac{1}{2}\left[F_1(\xi) + F_2(\xi)\right],\qquad F_-(\xi) = \tfrac{1}{2}\left[F_1(\xi) - F_2(\xi)\right].$$

The mapping of Eq. (98) transforms the points ζ = ±i into the points ±(ρ₀ + i), where ρ₀ is given by Eq. (99) as a similar integral of the boundary distortion functions containing the kernel 1/sinh(πt/2).
Now it is not difficult to find the scalar magnetic potential distribution in a quadrupole lens with deformed pole surfaces. The solution of the Laplace equation in the domain of Eq. (97) with the boundary conditions

v(ρ, 1) = −W₀ for ρ < ρ₀,  v(ρ, 1) = W₀ for ρ > ρ₀,
v(ρ, −1) = −W₀ for ρ > −ρ₀,  v(ρ, −1) = W₀ for ρ < −ρ₀,

is an elementary harmonic function (Eq. (100)), expressed through arctangents of combinations of hyperbolic and trigonometric functions of the arguments π(ρ ∓ ρ₀)/2 and πν/2. In the linear approximation with respect to the small quantity ρ₀, Eq. (100) can be rewritten in the simpler form of Eq. (101).
The potential v(ρ, ν) given by Eq. (101) is the imaginary part of a complex potential (102). Substituting Eqs. (98), (99), and (94) into Eq. (102) and expanding the result in a power series in the z variable, we finally come to an expansion of the complex potential of the lens in powers of z (Eq. (103)), with coefficients expressed through constants Φ_k and P_k. The explicit formulae for these coefficients with k = 1, 2, 3, as well as some further ones, are listed in the paper by Doynikov (1966). The real scalar magnetic potential of the deformed quadrupole lens is the imaginary part of the complex potential given by Eq. (103).
FIGURE 11. Cylindrical condenser with a local defect (a scratch).
The method considered thus allows one to determine the field disturbance in a quadrupole lens due to an arbitrary small distortion of the pole face shapes that preserves the two-dimensional field distribution. The possibility of correcting such disturbances has been discussed by Doynikov and Samsonov (1989) and by Breese, Jamieson, and Cookson (1990). It should be noted that, as was shown by Lavrentyev and Shabat (1965), the approach just described can be applied only if the boundary distortions of the field domain are smooth (this means that not only the functions F_j(ξ) themselves, but also their derivatives, are small). However, in some cases approximate conformal mappings can help one to estimate the influence on the electromagnetic field of local manufacturing defects that are not represented by smooth functions. To illustrate this we consider the influence of a local manufacturing defect on the electrostatic field of a cylindrical condenser. Suppose this defect to be located at the inner electrode of the condenser and, in the polar coordinate frame {r, φ}, to be described by the equation

$$r(\phi) = r_1\left[1 + p(\phi)\right],$$

where max|p(φ)| < ε and p(φ) = 0 for |φ − φ₀| ≥ φ₁ < ε (here ε ≪ 1 is a small quantity). The derivative p′(φ) is not assumed to be small. The defect in question can describe, for example, a scratch (with p(φ) < 0; see Fig. 11). The outer electrode of radius r = r₂ is supposed to be perfect. Then the electrostatic field potential in the condenser is the solution of the following equations:

$$\Delta u(r,\phi) = 0,\qquad u\{r_1[1 + p(\phi)],\,\phi\} = V_1,\qquad u(r_2,\phi) = V_2,\qquad(104)$$

Δ being the Laplace operator, and V₁ and V₂ the electrode potentials. We will construct the approximate solution of Eqs. (104) with the aid of the conformal mapping method. First of all, the conformal mapping

$$z_1 = \frac{i}{\ln R}\,\ln\frac{z}{r_1 e^{i\phi_0}},\qquad R = \frac{r_2}{r_1},\qquad(105)$$

transforms the domain r₁[1 + p(φ)] < |z| = r < r₂, −∞ < arg z = φ < ∞ in the plane of the complex variable z = re^{iφ} (to be exact, on the so-called Riemann logarithmic surface) into the domain
Riemann logarithmic surface) into the domain - ~ < X < < ,
Y(x)
in the plane of the complex variable z I = x terms of the order of E'
Y(x)
( 106)
+ iy. Here accurate within the
1 In R
= -p(& - x In R).
Note that Y(x) = 0 for |x| ≥ x₁ = φ₁/ln R. We now construct the conformal mapping z₂ = w(z₁) transforming the domain given by Eq. (106) into the strip 0 < Im z₂ < 1 in the plane of the complex variable z₂. It is known (see Lavrentyev and Shabat, 1965) that the inverse mapping z₁ = g(z₂) = x(z₂) + iy(z₂) can be represented in the following form:

$$g(z_2) = z_2 - \frac{1}{2}\int_{-\infty}^{\infty} y(t)\,\coth\frac{\pi(z_2 - t)}{2}\,dt.\qquad(108)$$
Since for real t we have y(t) = Im g(t) = Y[x(t)], then, denoting ξ₁ = w(x₁), we can rewrite Eq. (108) as

$$g(z_2) = z_2 - \frac{1}{2}\int_{-\xi_1}^{\xi_1} y(t)\,\coth\frac{\pi(z_2 - t)}{2}\,dt.\qquad(109)$$
It is evident that the value of ξ₁ is small (in the notation adopted in the asymptotic theory, ξ₁ = O(ε), which means that ξ₁ is of the same order of magnitude as ε). In principle, if we are not interested in the detailed field distribution in the immediate neighborhood of the condenser electrodes, it is sufficient to determine the function w(z₁) for |z₁| ≫ x₁ or, in other notation, for |z₂| ≫ ξ₁. Expanding the function coth[π(z₂ − t)/2] in a power series and substituting the result into Eq. (109), we obtain

$$g(z_2) = z_2 - \frac{Q}{2}\,\coth\frac{\pi z_2}{2} + O(\varepsilon^3),\qquad(110)$$

where

$$Q = \int_{-\xi_1}^{\xi_1} y(t)\,dt = O(\varepsilon^2).\qquad(111)$$
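As a quick numerical illustration of Eqs. (109)–(111) in the form reconstructed above, the following sketch compares the quadrature evaluation of g(z₂) with its far-field approximation for a hypothetical smooth dent Y(x); the profile and all numbers are illustrative, and y(t) is replaced by Y(t), which is legitimate to leading order.

```python
import numpy as np

# Minimal numerical illustration of Eqs. (109)-(111): exact (quadrature)
# inverse mapping versus its far-field approximation for a shallow dent
# Y(x) = -d*(1 - (x/x1)**2) on |x| < x1 (illustrative profile and numbers).
d, x1 = 0.01, 0.05
t = np.linspace(-x1, x1, 4001)
dt = t[1] - t[0]
Y = -d * (1.0 - (t / x1) ** 2)

Q = np.sum(Y) * dt                        # Q = integral of y(t) dt, Eq. (111)

def g_exact(z2):
    """Eq. (109): g(z2) = z2 - (1/2) * integral y(t) coth[pi (z2 - t)/2] dt."""
    kern = 1.0 / np.tanh(np.pi * (z2 - t) / 2.0)
    return z2 - 0.5 * np.sum(Y * kern) * dt

def g_far(z2):
    """Eq. (110): far from the defect, g(z2) = z2 - (Q/2) coth(pi z2 / 2)."""
    return z2 - 0.5 * Q / np.tanh(np.pi * z2 / 2.0)

for z2 in (0.4 + 0.5j, 1.0 + 0.3j, -2.0 + 0.8j):
    print(z2, abs(g_exact(z2) - g_far(z2)))   # differences are small compared with |Q|
```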
Inverting Eq. (110), we obtain the required formula for the approximate (accurate within terms of the order of ε³) conformal mapping

$$w(z_1) = z_1 + \frac{Q}{2}\,\coth\frac{\pi z_1}{2}.\qquad(112)$$
The form of Eq. (112) is very simple, but the problem is to determine the value Q. To do this, we use the well-known formula

$$z\coth z = 1 + \sum_{n=1}^{\infty}(-1)^{n-1}\,\frac{2^{2n}B_n}{(2n)!}\,z^{2n},\qquad(113)$$
B_n being the Bernoulli numbers. Substituting Eq. (113) into Eq. (109) and assuming w = O(ε), we obtain an expression (Eq. (114)) for Q through the boundary values of the mapping. A well-known fact of the theory of complex variable functions is that for −ξ₁ < ξ = Re z₂ < ξ₁, η ≡ Im z₂ → +0, the boundary-value relation of Eq. (115) holds, in which a function v(ξ) defined by Eq. (116) enters. It means that x(ξ) = ξ + v(ξ)/π + O(ε³), and thus, accurate within terms of the order of ε³, Q is expressed as an integral of Y[ξ + v(ξ)/π]. Since the function v(ξ) depends on y(ξ), it is necessary to solve Eq. (116) numerically to determine the function y(ξ). This is very difficult to do, because Eq. (116) is too complicated. However, in some cases the quantity Q is easy to estimate. For example, if the function Y(x) is symmetrical [Y(−x) = Y(x)], negative [Y(x) ≤ 0], and has only one minimum, then using Eq. (115) it is not difficult to obtain the inequality |ξ + v(ξ)/π| ≥ |ξ|. In this case the simple estimation

$$|Q| \le \left|\int_{-x_1}^{x_1} Y(x)\,dx\right|$$

holds.
If the function Y(x) is nonnegative [Y(x) ≥ 0], then we have to find any simple function Y₀(x) ≥ Y(x) such that the corresponding mapping z₂ = w₀(z₁) of the domain Y₀(x) < Im z₁ < 1 onto the strip 0 < Im z₂ < 1 is known explicitly. Then Q ≤ Q₀, where Q₀ is the constant in the representation

$$w_0(z_1) \approx z_1 + \frac{Q_0}{2}\,\coth\frac{\pi z_1}{2}.$$

For example, if Y₀(x) = √(x₀² − x²) for |x| < x₀ and Y₀(x) = 0 for |x| > x₀, then w₀(z₁) = z₁ + x₀²/z₁ + O(x₀⁴); that is, Q₀ = πx₀². Other ways of choosing the function Y₀(x) also exist, of course. Now we come back to the evaluation of the field distribution in a deformed cylindrical condenser. The conformal mapping
$$\zeta = r_1\,\exp\{i(\phi_0 - z_2\ln R)\}\qquad(117)$$

transforms the strip 0 < Im z₂ < 1 into the domain r₁ < |ζ| < r₂ in the plane of the complex variable ζ. Then we have

$$\zeta = z\,\exp\left\{-\frac{iQ\ln R}{2}\,\coth\!\left[\frac{i\pi}{2\ln R}\,\ln\frac{z}{r_1 e^{i\phi_0}}\right]\right\}.\qquad(118)$$
Using Eq. (35), after some simple calculations we obtain the following approximate representation of the field potential disturbance δu:

$$\delta u(r,\phi) = -\frac{(V_2 - V_1)\,Q}{2}\;\frac{\sin\!\left[B\ln(r/r_1)\right]}{\cosh\!\left[B(\phi - \phi_0)\right] - \cos\!\left[B\ln(r/r_1)\right]},\qquad(119)$$

where B = π/(ln R).
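The following minimal sketch evaluates this potential disturbance numerically, assuming the reconstructed form of Eq. (119) above; the scratch geometry, the radii, and the potentials are hypothetical, and Q is estimated by the leading-order integral of the boundary deviation discussed above.

```python
import numpy as np

# Evaluation of the potential disturbance du(r, phi) of Eq. (119) for a
# hypothetical shallow parabolic scratch on the inner electrode.
r1, r2 = 0.100, 0.105           # electrode radii, gap << r1 (illustrative)
V1, V2 = 0.0, 500.0             # electrode potentials (illustrative)
phi0, phi1 = 0.0, 2e-3          # scratch position and angular half-width [rad]
depth = 2e-5                    # scratch depth, |p(phi0)| = depth/r1

R = r2 / r1
B = np.pi / np.log(R)           # B = pi / ln R, B >> 1 for a narrow gap

# Rough leading-order estimate of Q (cf. Eq. (111) and the estimates above):
# the integral of the boundary deviation Y(x) over the defect in strip coordinates.
x = np.linspace(-phi1, phi1, 2001) / np.log(R)
Y = -(depth / (r1 * np.log(R))) * (1.0 - (x * np.log(R) / phi1) ** 2)
Q = np.sum(Y) * (x[1] - x[0])

def du(r, phi):
    """Potential disturbance far from the defect, Eq. (119)."""
    s = B * np.log(r / r1)
    return -(V2 - V1) * Q / 2.0 * np.sin(s) / (np.cosh(B * (phi - phi0)) - np.cos(s))

r_mid = np.sqrt(r1 * r2)                      # mid-gap radius
for dphi in (0.02, 0.05, 0.1):                # azimuthal distances from the scratch
    print(dphi, du(r_mid, phi0 + dphi))
```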
Note that usually the interelectrode gap in a cylindrical condenser is small compared with the electrode radii, r₂ − r₁ ≪ r₁, which means that B ≫ 1. The latter relation also allows one to obtain, with the aid of Eq. (119), approximate analytical formulae for the distortions of charged particle trajectories in the condenser due to the local defect. Such formulae can be found in the paper by Yavor (1989d).

VII. CONCLUSION

In the present chapter we have tried to describe systematically the modern methods for the calculation of parasitic aberrations caused by manufacturing or assembling imperfections of electron optical systems. It should be noted, however, that in spite of the intensive development of the theory of parasitic
aberrations during the last decade, many important problems have not yet been solved. For example, what is the influence of electromagnetic field variations on the properties of devices with crossed fields and of mirror-type analyzers? What are the most important parasitic time-of-flight aberrations one can expect in mirror (reflectron-type) systems as well as in analyzers containing sector fields? At present we have no satisfactory answers to these and many other questions. A separate subject that was not discussed is the correction of parasitic aberrations. For example, one of the problems here is to find effective ways of correcting the image defocusing caused by the violation of the field symmetry with respect to the system median plane (see Sections II and III). It is also interesting to study the possibility of applying known multipole correctors to the elimination of parasitic aberrations. Thus the problems of calculation and correction of parasitic aberrations need further thorough investigation. The author hopes that this review will stimulate studies in the field of the tolerance theory for electron optical systems.

In conclusion, to make it easier for the reader to find in the literature the information he or she is interested in, we give some comments on the list of references. The general effects of the mutual displacement of electron optical units in multistage devices are investigated by Bazhenova et al. (1967), Matsuda et al. (1977), and Brown and Rothacker (1977). Some particular problems of this kind were also studied by Malov and Trubatcheev (1978) (radial displacement of a sector magnet), Kawakatsu et al. (1968) (magnetic quadrupole lenses), Zhu and Liu (1987) and Liu and Zhu (1990) (lenses and deflectors). Parasitic aberrations in round lenses were discussed by Strashkevitch (1959) (axial astigmatism), Bertein (1947a, 1947b, 1948), Sturrock (1951), Archard (1953), Der-Shvartz and Kulikov (1962), Janse (1971), Janssen and Thiem (1988), Munro (1988), Kurihara (1990), Ximen and Li (1990), and Edgcombe (1991). Parasitic effects in cathode lenses were calculated by Vorobjev (1959), Vlasov and Shakhmatova (1962), Der-Shvartz and Kulikov (1968), and Kolesnikov and Monastyrsky (1988). Tolerances for focusing and deflection systems were investigated by Tsumagari et al. (1986, 1987), Kurihara (1990), Liu and Zhu (1990), and Liu et al. (1990). Parasitic effects in quadrupole lenses were studied by Doynikov (1966), Kawakatsu et al. (1968), Doynikov and Samsonov (1989), and Breese et al. (1990, 1991). General multipoles were considered by Halbach (1969). The problems of tolerances for cylindrical electrostatic analyzers are discussed by Boerboom (1976) and Yavor (1989a, 1989d); for the toroidal condenser, by Ioanoviciu et al. (1974), Malov and Trubatcheev (1979), Boerboom (1989), and Yavor (1989c, 1990a, b).
Parasitic aberrations in homogeneous magnets were studied by Lilly et al. (1963), Cambey et al. (1964), and Boerboom (1987, 1992); in inhomogeneous magnets, by Ioanoviciu et al. (1974) and Boerboom and Yavor (1992). Information on parasitic effects in crossed electromagnetic fields can be found in the paper by Kuzmin (1971); Wien filters were considered by Ioanoviciu and Cuna (1978). Some particular defects of the cylindrical mirror analyzer were investigated by Zashkvara and Ilyin (1973).
REFERENCES

Archard, G. D. (1953). J. Sci. Instrum. 30, 352.
Bazhenova, I. M., Zinoviev, L. P., and Fjodorova, R. N. (1967). Preprint P9-3552, Joint Inst. Nucl. Researches, Dubna.
Bertein, F. (1947a). Ann. Radioel. 2, 379.
Bertein, F. (1947b). Comp. Ren. Acad. Sci. Paris 224, 106.
Bertein, F. (1948). Ann. Radioel. 3, 49.
Berz, M. (1990). "Computer Codes and the Linear Acc. Community." Los Alamos LA-11857-C: 137.
Boerboom, A. J. H. (1960). Z. Naturforsch. 15, 347.
Boerboom, A. J. H. (1976). Int. J. Mass Spectrom. Ion Phys. 22, 259.
Boerboom, A. J. H. (1987). Nucl. Instrum. Meth. A258, 412.
Boerboom, A. J. H. (1989). Int. J. Mass Spectrom. Ion Phys. 93, 267.
Boerboom, A. J. H. (1992). Int. J. Mass Spectrom. Ion Phys.
Boerboom, A. J. H., Tasman, H. A., and Wachsmuth, H. (1959). Z. Naturforsch. 14a, 816.
Boerboom, A. J. H., and Yavor, M. I. (1992). Int. J. Mass Spectrom. Ion Phys.
Breese, M. B. H., Jamieson, D. N., and Cookson, J. A. (1990). Nucl. Instrum. Meth. B47, 443.
Breese, M. B. H., Jamieson, D. N., and Cookson, J. A. (1991). Nucl. Instrum. Meth. B54, 28.
Brown, K. L., and Rothacker, F. (1977). Nucl. Instrum. Methods 141, 393.
Brown, K. L., Rothacker, F., Carey, D. C., and Iselin, C. (1973). SLAC Report No. 91, CERN 73-16.
Cambey, L. A., Ormrod, J. H., and Barber, R. C. (1964). Can. J. Phys. 42, 103.
Der-Shvartz, G. V., and Kulikov, J. V. (1962). Radiotekhnika i Elektronika 7, 2061; Radio Eng. Electron Phys. 7.
Der-Shvartz, G. V., and Kulikov, J. V. (1968). Radiotekhnika i Elektronika 13, 2223; Radio Eng. Electron Phys. 13.
Doynikov, N. I. (1966). In "Elektrofizitcheskaya apparatura" [Electrophysical Apparatus] 4, 84. Atomizdat, Moscow.
Doynikov, N. I., and Samsonov, G. N. (1989). Zhurnal Tekh. Fiz. 57, 7, 138; Sov. Phys. Tech. Phys. 34.
Edgcombe, C. J. (1991). Optik 86, 120.
Freinkman, B. G. (1983). In "Algoritmy i metody rastcheta elektronno-optitcheskich sistem" [Algorithms and Methods for Calculation of Electron Optical Systems], 66. VTs SO AN Publ., Novosibirsk.
Glaser, W., and Schiske, P. (1953). Z. angewandte Physik 5, 329.
Grinberg, G. A. (1948). "Izbrannyje zadatchy matematitcheskoy teorii elektritcheskikch i magnitnykch javleniy" [Selected Problems of the Mathematical Theory of Electric and Magnetic Phenomena]. USSR Acad. Sci. Publ., Moscow.
Halbach, K. (1969). Nucl. Instrum. Methods 74, 147.
Ilyin, V. P., Kateshov, V. A., Kulikov, Y. V., and Monastyrsky, M. A. (1990). In "Advances in Electronics and Electron Physics," 78, 155. Academic Press, Cambridge, Mass.
Ioanoviciu, D., and Cuna, C. (1978). Inst. Phys. Conf. Ser. 38, 258.
Ioanoviciu, D., Cuna, C., and Mikaila, A. (1974). Rev. Romaine Phys. 19, 963.
Ivanov, V. Y. (1986). "Metody avtomatitcheskogo projektirovanija elektronnykch ustrojstv" [Methods for Automatic Design of Electron Devices], 1, 2. IM SO AN, Novosibirsk.
Janse, J. (1971). Optik 33, 270.
Janssen, D., and Thiem, S. (1988). Optik 79, 154.
Kawakatsu, H., Vosburgh, K. G., and Siegel, B. M. (1968). J. Appl. Phys. 39, 255.
Kolesnikov, S. V., and Monastyrsky, M. A. (1988). Zhurnal Tekh. Fiz. 58, 3; Sov. Phys. Tech. Phys. 33.
Kurihara, K. (1990). J. Vac. Sci. Technol. B 8, 452.
Kuzmin, A. F. (1971). Zhurnal Tekh. Fiz. 41, 765; Sov. Phys. Tech. Phys. 16.
Lavrentyev, M. A., and Shabat, B. V. (1965). "Metody teorii funktcij kompleksnogo peremennogo" [Methods for the Theory of Complex Variable Functions]. Nauka, Moscow.
Lilly, A. C., Weismann, T. J., and Lowitz, D. A. (1963). J. Appl. Phys. 34, 631.
Liu, H., and Zhu, X. (1990). Optik 83, 123.
Liu, H., Zhu, X., and Munro, E. (1990). J. Vac. Sci. Technol. B8, 1676.
Malov, A. F., and Trubatcheev, G. M. (1978). Nautchnyje pribory SEV [Scientific Instruments SEV] 18, 34.
Malov, A. F., and Trubatcheev, G. M. (1979). Nautchnyje pribory SEV [Scientific Instruments SEV] 19, 57.
Matsuda, H., Matsuo, T., and Takahashi, N. (1977). Int. J. Mass Spectrom. Ion Phys. 25, 229.
Matsuo, T., and Matsuda, H. (1971). Int. J. Mass Spectrom. Ion Phys. 6, 361.
Matsuo, T., Matsuda, H., and Wollnik, H. (1972). Nucl. Instrum. Meth. 103, 515.
Monastyrsky, M. A. (1978). Zhurnal Tekh. Fiz. 48, 2228; Sov. Phys. Tech. Phys. 23.
Monastyrsky, M. A. (1979). In "Tchislennyje metody rastcheta elektronno-optitcheskikch zadatch" [Numerical Methods for Solving Electron Optical Problems], 108. VTs SO AN Publ., Novosibirsk.
Monastyrsky, M. A., and Kolesnikov, S. R. (1983). Zhurnal Tekh. Fiz. 53, 1668; Sov. Phys. Tech. Phys. 28.
Munro, E. (1988). J. Vac. Sci. Technol. B 6, 941.
Romaniv, L. E. (1974). In "Vytchislitelnaya i prikladnaya matematika" [Numerical and Applied Mathematics] 22, 94. Vyschaja Shkola, Kiev.
Saito, T. (1960). J. Phys. Soc. Japan 15, 2069.
Shukeylo, I. A. (1959). Zhurnal Tekh. Fiz. 29, 1225; Sov. Phys. Tech. Phys. 4.
Strashkevitch, A. M. (1959). "Elektronnaya optika elektrostatitcheskikch poley, ne obladajustchikch osevoy simmetriey" [Electron Optics of Electrostatic Fields Without Axial Symmetry]. State Math. Phys. Publ., Moscow.
Sturrock, P. A. (1951). Phil. Trans. Roy. Soc. London A243, 387.
Trubatcheev, G. M. (1977). In "Elektrofizitcheskaya apparatura" [Electrophysical Apparatus] 15, 155. Atomizdat, Moscow.
Tsumagari, T., Murakami, J., Ohiwa, H., and Noda, T. (1986). J. Vac. Sci. Technol. B 4, 140.
Tsumagari, T., Murakami, J., Ohiwa, H., and Noda, T. (1987). Jap. J. Appl. Phys. 26, 1772.
Vlasov, A. G., and Shakhmatova, I. P. (1962). Zhurnal Tekh. Fiz. 32, 695; Sov. Phys. Tech. Phys. 7.
Vorobjev, J. V. (1959). Zhurnal Tekh. Fiz. 29, 589; Sov. Phys. Tech. Phys. 4.
Wollnik, H. (1987). "Optics of Charged Particles." Academic Press, Orlando, Fla.
Ximen, J., and Li, D. (1990). J. Appl. Phys. 67, 1643.
Yavor, M. I. (1989a). In "Nautchnoje priborostrojenije. Elektronnaya i ionnaya optika" [Scientific Instrumentation, Electron and Ion Optics], 61. Nauka, Leningrad.
Yavor, M. I. (1989b). In "Nautchnoje priborostrojenije. Elektronnaya i ionnaya optika" [Scientific Instrumentation, Electron and Ion Optics], 66. Nauka, Leningrad.
Yavor, M. I. (1989c). Preprint No. 30, NTO AN SSSR, Leningrad.
Yavor, M. I. (1989d). Zhurnal Tekh. Fiz. 59, 4, 123; Sov. Phys. Tech. Phys. 34, 454.
Yavor, M. I. (1990a). Zhurnal Tekh. Fiz. 60, 4, 174; Sov. Phys. Tech. Phys. 35, 508.
Yavor, M. I. (1990b). Nucl. Instrum. Meth. A298, 223.
Yavor, M. I. (1991a). Int. J. Mass Spectrom. Ion Proc. 104, R11.
Yavor, M. I. (1991b). Nautchnoje priborostrojenije [Scientific Instrumentation] 1, 3, 9.
Zashkvara, V. V., and Ilyin, A. M. (1973). Zhurnal Tekh. Fiz. 43, 1843; Sov. Phys. Tech. Phys. 18.
Zhu, X., and Liu, H. (1989). "Proc. Int. Symp. Electron Optics," 309. Inst. Electronics, Beijing.