ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS
VOLUME 87
EDITOR-IN-CHIEF
PETER W. HAWKES
Centre National de la Recherche Scientifique, Toulouse, France
ASSOCIATE EDITOR
BENJAMIN KAZAN
Xerox Corporation, Palo Alto Research Center, Palo Alto, California
Advances in
Electronics and Electron Physics

EDITED BY

PETER W. HAWKES
CEMES/Laboratoire d'Optique Electronique du Centre National de la Recherche Scientifique, Toulouse, France
VOLUME 87
ACADEMIC PRESS, INC. Harcourt Brace & Company Boston San Diego New York London Sydney Tokyo Toronto
This book is printed on acid-free paper.

Copyright © 1994 by Academic Press, Inc.
All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
ACADEMIC PRESS, INC. 1250 Sixth Avenue, San Diego, CA 92101-4311
United Kingdom Edition published by
ACADEMIC PRESS LIMITED
24-28 Oval Road, London NW1 7DX
Library of Congress Catalog Card Number: 49-7504
ISSN 0065-2539
ISBN 0-12-014729-7

Printed in the United States of America
93 94 95 96 BC 9 8 7 6 5 4 3 2 1
CONTENTS

CONTRIBUTORS ..... vii
PREFACE ..... ix

Image Restoration on the Hopfield Neural Network
J. B. ABBISS, M. A. FIDDY, AND R. STERITI

I. Introduction ..... 1
II. Neural Networks ..... 7
III. Image Restoration ..... 11
IV. Image Reconstruction on a Neural Net ..... 15
V. Matrix Inversion ..... 22
VI. Examples of Image Restoration ..... 27
VII. New Restoration Approaches ..... 36
VIII. Hardware Implementations ..... 38
IX. Conclusions ..... 42
Acknowledgment ..... 44
References ..... 44

Fundamentals and Special Applications of Non-contact Scanning Force Microscopy
U. HARTMANN

I. Introduction ..... 49
II. Probe-Sample Interactions in Non-contact Scanning Force Microscopy ..... 51
III. Electric Force Microscopy Used as a Servo Technique ..... 129
IV. Theory of Magnetic Force Microscopy ..... 133
V. Aspects of Instrumentation ..... 191
VI. Conclusions ..... 195
Acknowledgments ..... 197
References ..... 197

Electrical Noise as a Measure of Quality and Reliability in Electronic Devices
B. K. JONES

I. Introduction ..... 201
II. Established Mechanisms of Excess Noise Involving Defects ..... 215
III. Quality and Reliability ..... 237
IV. Conclusions ..... 247
References ..... 247

Parallel Processing Methodologies for Image Processing and Computer Vision
S. YALAMANCHILI AND J. K. AGGARWAL

I. Introduction ..... 259
II. Matching Algorithms and Architectures ..... 261
III. Architecture-Driven Approaches ..... 273
IV. Application-Driven Approaches ..... 285
V. Emerging Research Areas ..... 296
References ..... 297

INDEX ..... 301
CONTRIBUTORS

Numbers in parentheses indicate the pages on which the authors' contributions begin.

J. B. ABBISS (1), Spectron Development Laboratories, Inc., 3535 Hyland Avenue, Tustin, California 92626

J. K. AGGARWAL (259), Department of Electrical and Computer Engineering, University of Texas, Austin, Texas 78712

M. A. FIDDY (1), Department of Electrical Engineering, University of Massachusetts Lowell, Lowell, Massachusetts 01854

U. HARTMANN (49), Institute of Thin Film and Ion Technology, KFA Jülich, P.O. Box 1913, D-5170 Jülich, Federal Republic of Germany

B. K. JONES (201), School of Physics and Materials, Lancaster University, Lancaster LA1 4YB, United Kingdom

R. STERITI (1), Department of Electrical Engineering, University of Massachusetts Lowell, Lowell, Massachusetts 01854

S. YALAMANCHILI (259), School of Electrical Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0250
PREFACE

Images, image formation and signal processing are the themes of this volume. The first chapter is concerned with a rapidly growing topic, neural networks for image processing. The authors, who have contributed extensively to the literature of image restoration, explain at length why neural networks are promising for restoration and how these new ideas can be implemented in practice.

A class of instruments for which some kind of image processing is indispensable consists of the near-field microscopes. In the second chapter, U. Hartmann explains the principles of scanning force microscopy and explores in detail the design problems of these instruments. This is followed by a discussion by B. K. Jones on a topic that is endemic to all electronics and electron physics: noise. Here the emphasis is on the exploitation of noise measurements to give information about quality and reliability in electronic devices.

The volume concludes with another topic that is of the highest interest today, namely, the best ways of using parallelism in computing for image processing and computer vision. This is a relatively new subject and it is obvious that new architectures will continue to require new approaches. This survey, in which the problems, possibilities and constraints are presented very clearly, will certainly be found helpful in confronting the newest developments.

It only remains for me to thank all the contributors and to list material promised for future volumes.

FORTHCOMING ARTICLES

Electron holography (G. Ade)
Image processing with signal-dependent noise (H. H. Arsenault)
Parallel detection (P. E. Batson)
Microscopic imaging with mass-selected secondary ions (M. T. Bernius)
Magnetic reconnection (A. Bratenahl and P. J. Baum)
Sampling theory (J. L. Brown)
ODE methods (J. C. Butcher)
Interference effects in mesoscopic structures (M. Cahay)
Integer sinusoidal transforms (W. K. Cham)
The artificial visual system concept (J. M. Coggins)
Projection methods for image processing (P. L. Combettes)
Minimax algebra and its applications (R. A. Cuninghame-Green)
Corrected lenses for charged particles (R. L. Dalglish)
Data structures for image processing in C (M. R. Dobie and P. H. Lewis)
The development of electron microscopy in Italy (G. Donelli)
Electron crystallography of organic compounds (D. L. Dorset)
The study of dynamic phenomena in solids using field emission (M. Drechsler)
Gabor filters and texture analysis (J. M. H. Du Buf)
Amorphous semiconductors (W. Fuhs)
Median filters (N. C. Gallagher and E. Coyle)
Bayesian image analysis (S. and D. Geman)
Theory of morphological operators (H. J. A. M. Heijmans)
Electrostatic energy analysers (S. P. Karetskaya, L. G. Glikman, L. G. Beizina and Y. V. Goloskokov)
Applications of speech recognition technology (H. R. Kirby)
Spin-polarized SEM (K. Koike)
High-definition television (M. Kunt)
Fractal signal analysis using mathematical morphology (P. Maragos)
Electronic tools in parapsychology (R. L. Morris)
Image formation in STEM (C. Mory and C. Colliex)
Phase-space treatment of photon beams (G. Nemes)
Fuzzy tools for image analysis (S. K. Pal)
Z-contrast in materials science (S. J. Pennycook)
Electron scattering and nuclear structure (G. A. Peterson)
Edge detection (M. Petrou)
The wave-particle dualism (H. Rauch)
Electrostatic lenses (F. H. Read and I. W. Drummond)
Scientific work of Reinhold Rudenberg (H. G. Rudenberg)
Electron holography (D. Saldin)
X-ray microscopy (G. Schmahl)
Accelerator mass spectroscopy (J. P. F. Sellschop)
Applications of mathematical morphology (J. Serra)
Texture analysis (H. C. Shen)
Focus-deflection systems and their applications (T. Soma)
The suprenum project (U. Trottenberg)
Knowledge-based vision (J. K. Tsotsos)
Electron gun optics (Y. Uchikawa)
Spin-polarised SEM (T. R. van Zandt and R. Browning)
Morphology on graphs (L. Vincent)
Cathode-ray tube projection TV systems (L. Vriens, T. G. Spanjer and R. Raue)
Thin-film cathodoluminescent phosphors (A. M. Wittenberg)
Diode-controlled liquid-crystal display panels (Z. Yaniv)
Signal description (A. Zayezdny and I. Druckmann)
The Aharonov-Casher effect (A. Zeilinger, E. Rasel and H. Weinfurter)
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 87
Image Restoration on the Hopfield Neural Network

J. B. ABBISS
Spectron Development Laboratories, Inc., Tustin, California

M. A. FIDDY and R. STERITI
Department of Electrical Engineering, University of Massachusetts Lowell, Lowell, Massachusetts
I. Introduction ..... 1
   A. Artificial Neural Processors ..... 4
   B. Image Deconvolution ..... 6
II. Neural Networks ..... 7
   A. Hopfield Networks ..... 8
III. Image Restoration ..... 11
   A. Mathematical Background ..... 12
   B. Prior-DFT Estimator ..... 13
IV. Image Reconstruction on a Neural Net ..... 15
   A. Background ..... 15
   B. Minimizing an Energy Function ..... 16
   C. Image Restoration on a Binary Network ..... 18
   D. Image Restoration on a Nonbinary Network ..... 19
   E. Computational Complexity ..... 21
V. Matrix Inversion ..... 22
   A. Neural Matrix Pseudo-Inverse ..... 23
   B. Numerical Considerations ..... 24
   C. Properties of the Neural Matrix Inverse ..... 26
VI. Examples of Image Restoration ..... 27
   A. Regularized Iterative Reconstructions ..... 27
   B. PDFT Reconstructions ..... 31
   C. Discussion ..... 34
VII. New Restoration Approaches ..... 36
VIII. Hardware Implementations ..... 38
   A. Electronic Hardware ..... 38
   B. Optical Hardware ..... 40
IX. Conclusions ..... 42
Acknowledgment ..... 44
References ..... 44
I. INTRODUCTION

Images generated by an optical or electro-optical system can be degraded for a number of reasons. An intrinsic limitation is that the finite extent of
the entrance pupil imposes a finite upper bound on the system's spatial frequency response. The image quality of most operational imaging systems will not, however, approach this theoretical limit very closely. It is possible that the design or construction will be flawed, as in the case of the Hubble telescope, through defective manufacture, assembly, or quality assurance procedures. The detector itself may impose limitations; for example, where a CCD array is used, information is lost in the interpixel areas and image energy is integrated over the active area of each pixel. Other degrading factors will include defective pixels and noise in the CCD array and electronic subsystems.

The image restoration algorithms considered in this chapter were originally aimed at achieving performance beyond the diffraction limit, but are in fact capable of compensating simultaneously or separately for aberrations induced by the optical components and for the limitations of the detector. They are inherently robust and possess valuable noise-suppressing properties. The mathematical foundation for the methods is the fact that the spatial frequency spectrum of an object of finite extent is bandlimited. The spectrum is the Fourier transform of the object, in the coherent case, or its intensity, in the incoherent case. Image quality is a function of the way in which these spectral components have been truncated or modified. Image restoration is concerned with techniques for correcting or extrapolating the spectrum, thus recovering a closer approximation to the original object.

We describe some image restoration or superresolution algorithms that can be implemented on an artificial neural network. The motivation for this is that the kind of restoration algorithms of interest can be formulated as optimization problems, which are well suited to solution in this way. Neural network solutions to problems offer a degree of redundancy or "fault tolerance" (Rumelhart and McClelland, 1986; Kosko, 1991). From a computer science point of view, one could argue that we are really talking about massively parallel interconnections between processing units, or connectionism, and that the term neural has unnecessary or unhelpful additional connotations. Nevertheless, since the term neural net is so pervasive these days, we shall also use this name. There is also the expectation that either electronic or optical hardware will become increasingly available which will permit these algorithms to be executed at high speed and in parallel. We consider both types of hardware in Section VIII. Indeed, one could interpret the content of this chapter as a discussion of the types of algorithms that could be successfully implemented on fully parallel neural hardware once it is available; i.e., we can stipulate how the network can be successfully "trained" to solve this kind of problem.

The subject of neural networks is very broad, and we confine ourselves here to one type of network known as a Hopfield network (Hopfield, 1982,
1984). It is a particular example of a fully connected processing architecture in the sense that each processing element is connected to every other element by a weighted link. We describe the basic properties of this network and show how it provides a framework for interpreting a variety of optimization procedures useful in image restoration, by relating the energy function for the Hopfield network to that of a specific optimization problem. In this way learning as such, e.g., by example or exposure to a training set, is not necessary. More complex neural networks can contain many layers of processing nodes, which are fully interconnected between layers. Training these networks via iterative learning algorithms can be time-consuming and may be only partially successful. For these networks, however, it is still possible to specify the interconnection weights between processing nodes (Poggio and Girosi, 1989).

The image restoration problem we consider is that of improving the resolution of a low-pass filtered image or retrieving a high-resolution image from limited noisy and/or distorted spectral data. In all inverse problems of this kind, the practical constraints of real data result in a fundamental lack of uniqueness. As a result, it is necessary to adopt some kind of appropriate model for the specific problem in hand. One must recognize its biases, if any, and then solve an optimization problem, in order to determine the best solution for the problem consistent with the imposed constraints. It is the energy function defined for this purpose that dictates the architecture of the neural network.

For optimization problems, the question of uniqueness and the identification of algorithms that can arrive at the desired solution in a stable and repeatable fashion, despite the presence of noise in the data samples, are significant concerns. Algorithms that can effectively reach a solution but not necessarily the solution are of limited practical use. Also, solutions that are optimal in some prescribed sense may not be the best reconstructions, in terms of the quality of that image or the fidelity of the features of interest. For the restoration problems described later, it can be shown that the energy functions used do possess a unique minimum. Consequently, any procedure that reduces this energy, as a function of the image parameters, should ultimately provide a unique estimate of the reconstructed image.

The earliest example of using a Hopfield network to obtain an approximate solution to an optimization problem, the traveling salesman problem, was given by Hopfield and Tank (1985). However, there are some important optimization problems for which there is no single minimum associated with the energy function used. Under these circumstances, the procedure may stagnate at some local minimum of the energy surface. An important problem of this type is the Fourier phase retrieval problem (Fienup, 1982; Fiddy, 1987). This arises in many applications, such as imaging through
turbulent or random media, intensity interferometry (for very high resolution imaging), and high-frequency scattering experiments such as x-ray diffraction studies of materials. There is, as yet, no satisfactory solution to this problem. Methods such as simulated annealing, one of the few methods proposed for locating global minima, are notoriously slow and difficult to accelerate; some kind of reliable algorithm must be found.

The Hopfield artificial neural network operates in an iterative fashion, and it can be shown that the network converges to a state with the lowest local energy. As will be seen later, training a Hopfield network can lead to an associated energy function for that network with many local minima, some entirely spurious if the number of memorized states is excessive or if the self-feedback to a processing element is nonzero. These difficulties are not an issue in the signal processing applications described here, because the network is designed to have only one energy minimum.

The parameters describing the image restoration problem define an energy function that can incorporate prior knowledge about the object. This energy function can be directly mapped onto the connection strengths of a Hopfield network. Thus, once the (Hopfield) hardware is realized (or simulated), the network architecture proceeds to update neural values until a stable state is reached. In our case, that also corresponds to the solution of the image restoration problem. In this way, new algorithms can be developed that allow image reconstruction to be carried out on fully parallel hardware. The hardware we consider here is a programmable Hopfield net, which can be updated synchronously (i.e., with simultaneous updating of all neural states) to provide a reconstructed image at high speed.

Because of the ill-posed nature of the problem, restoration methods always require some degree of prior knowledge about the image to be available. Examples of prior knowledge include low-resolution image features or edge locations; we therefore envisage this approach to be particularly suitable for remote sensing and monitoring applications, as in quality control. However, image restoration methods are well known to be ill-conditioned; hence the need to employ regularization techniques.
A. Artificial Neural Processors

There are several models of neural networks, each of which has a structure based loosely on biological nervous-system hardware (Rumelhart and McClelland, 1986). A neural network architecture consists of a very large number of simple processing elements densely interconnected by a set of weighted links. Each processing element updates its state by comparing the sum of its inputs with a prescribed threshold. The study of the properties of
neural networks is a subject still somewhat in its infancy (Zornetzer et al., 1991; Zurada, 1992). It is also difficult to present many concrete applications based on neural networks, since current hardware limitations reduce their practical impact. It has been suggested by Anderson and Rosenfeld (1987) that they may not become useful until cheap special-purpose parallel hardware is available. It is expected that they will prove useful in solving computationally intensive, difficult, or nonlinear problems such as those in robotic control, pattern recognition, modeling plant dynamics, etc. (Eckmiller and Malsburg, 1987; Pao, 1989). Should neural hardware become available, the question remains as to how one would make best use of a neural computer, i.e., how one should program or "train" it to perform the tasks required. The hope is that some problems for which it is difficult to find satisfactory algorithmic solutions might be amenable to solution on this kind of computing architecture, which can somehow organize itself and learn what it is expected to accomplish. In all cases, the behavior of an artificial neural network, after appropriate training, can be expressed in terms of the minimization of some appropriate energy or cost function.

For our purposes, one can describe the recovery or restoration of an image as a deconvolution exercise. It may be necessary to remove systematic degradations such as blurring or low-pass filtering effects, as well as noise. For many years, methods designed to achieve this deconvolution have been based primarily on inverse filtering, which requires high signal-to-noise ratio images (Andrews and Hunt, 1977). These methods can be computationally intensive, and techniques for speeding them up are necessary. An artificial neural network promises this possibility because of its programmable parallel-processing potential.

This is not to say that other parallel-processing architectures could not successfully compete with artificial neural networks. The differences between the two options lie in the way in which the solution is computed. Our task is to find a procedure that minimizes a well-defined energy function. A conventional parallel computer relies on the execution of a search algorithm to do this, and there might be several ways in which the processors could be organized in order to obtain the result; how to partition the processors to effectively compute the solution becomes an issue. However, for the case of hardware representing a fully connected network of processors, the connection weights are modified in order to execute the minimization. Such a network is synonymous with a Hopfield neural network. If the network dynamics permit synchronous updating of the network, then rapid computations are possible. Any deconvolution procedure that is based on a least squares approach can be formulated for high-speed processing on a fully connected computing architecture. In the following sections, we describe the mathematical basis of
these restoration schemes and suggest different methods of implementation and hardware.
B. Image Deconvolution

Deconvolution is a problem that arises in many areas of imaging as well as signal processing. It is a difficult problem to solve algorithmically because it is ill-posed and can be computationally intensive; by ill-posed we mean that a solution may not exist, or it may not be unique, or it may depend discontinuously on the data. Here we will confine our discussion to the study of two-dimensional image restoration. Typical constraints that might be available to assist with the restoration are, for example, prior knowledge that the image should be real, positive, and bounded by some support shape. The positivity constraint can be unsuitable in the case of low-pass filtered image data; if the spectral extrapolation does not extend to infinity, the restored image will still exhibit negative side-lobes.

Deconvolution, viewed as an optimization problem, can be solved in at least two distinct ways: either directly via a matrix inverse, or iteratively. The former leads to the need to implement an algorithm based, for example, on Gaussian elimination or singular value decomposition (SVD) in order to solve a system of equations. Numerical deconvolution is an ill-conditioned procedure; large changes in the solution can result from small changes in the input data. The ill-conditioning is a manifestation in the discrete numerical case of the ill-posed nature of the problem. Steps must be taken to stabilize or regularize the solution, i.e., to ensure existence, uniqueness, and continuous dependence of the solution on the data.

A different and equally robust approach to deconvolution is to solve the problem with a regularized iterative procedure, which can be shown to converge to the same solution; we cite, for example, regularized Gerchberg-Papoulis-type algorithms, which can be used for deconvolution and spectral extrapolation. The processing steps required for this are matrix operations involving the imposition of constraints between Fourier transformations.

Image restoration, by virtue of its generally multidimensional character, is inherently more suited to parallel processing architectures. Parallel processing on neural electronic hardware is in its infancy, and a few semiconductor devices have only recently become available; examples are briefly reviewed in Section VIII.A. Optical implementation offers a competing technology with potentially higher speeds and higher degrees of parallelism, because interconnections need not be physically hardwired. The concept of using optics for parallel processing is far from new, and much effort over the years has been invested in the development of such
systems; see Section VIII.B. However, several problems arise when using optical hardware, the most important being the transfer of the required information onto the optical carrier. The available spatial light modulators (SLMs) that could be used have traditionally suffered from limitations in terms of speed, dynamic range, resolution, or cost.

The use of neural networks in image (or signal) classification, recognition, or understanding is steadily increasing. These are applications that the human brain is particularly good at, while current algorithms implemented largely on serial machines still leave much to be desired.
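The ill-conditioning of numerical deconvolution is easy to exhibit. The following minimal sketch is an illustration added here, not part of the original text; the blur width, noise level, and regularization value are all arbitrary choices. It builds a discrete Gaussian blur matrix, inverts it naively, and compares the result with a regularized inverse of the kind developed in Section III:

```python
import numpy as np

np.random.seed(0)
N = 64
x = np.arange(N)
# Discrete Gaussian blur: a typical severely ill-conditioned system matrix.
A = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 2.0) ** 2)
A /= A.sum(axis=1, keepdims=True)

f_true = np.zeros(N)
f_true[20:28] = 1.0                           # a simple compact-support object
g = A @ f_true + 1e-3 * np.random.randn(N)    # blurred data with small noise

print("condition number of A:", np.linalg.cond(A))

f_naive = np.linalg.solve(A, g)               # naive inversion amplifies the noise
beta = 1e-3                                   # regularization parameter (arbitrary)
f_reg = np.linalg.solve(A.T @ A + beta * np.eye(N), A.T @ g)

print("max |naive estimate|:      ", np.abs(f_naive).max())
print("max |regularized estimate|:", np.abs(f_reg).max())
```

The naive estimate is dominated by amplified noise despite the tiny perturbation of the data, while the regularized estimate remains of the same order as the true object.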
II. NEURAL NETWORKS
As mentioned earlier, the Hopfield network is a fully connected network in the sense that any one of the processing elements is connected to every other one. This contrasts with layered networks, such as a multilayered perceptron, MLP (Kosko, 1991), in which processing elements are arranged in layers with connections only between neighboring layers. This difference in topology is accompanied by differences in the thresholding functions and in the procedures to find the connection strengths. The Hopfield network is implemented iteratively; the connection strengths are assigned and specify a cost function that the iterative procedure minimizes. The MLP is a one-pass network once the connection strengths have been "learned" by the minimization of an error function that quantifies the difference between current and desired output states. Such a multilayered network is more versatile in its performance than the Hopfield model. The price paid for this is that there is no rule that is both simple and reliable for "learning" the connection strengths, i.e., by calculation of the outer product that gives the connection strengths. Usually, to determine the connection strengths in the MLP case, an iterative error backpropagation scheme is required, whose convergence properties are uncertain, but which does generally perform, eventually, in a satisfactory manner.

Backpropagation (and other learning) algorithms use error signals based on a system's response to update an initially random set of connection strengths. A comparison between the actual network output for a given input and the training example is made, and the simple difference is used to modify internal connection strengths to reduce this error. Over all of the output neurons, the mean-squared error for the example is reduced iteratively in this way. The process is computationally intensive, and learning times can be very long, almost unacceptably long, for all but
the simplest problem. The representation of the information can affect convergence rates, and sometimes intuitively determined representations comprising a breakdown of the problem into key features can prove successful. In many ways, however, opting for a specific representation of the information to be processed will bias the outcome of the network, and it would be much cleaner and more satisfactory to be able to allow the network to organize the relationships required in terms of its own chosen representations. Such networks are referred to as self-organizing or unsupervised networks. This may not help to reduce the learning times needed, but does permit an attractive "hands-off" approach.
A. Hopfield Networks
The Hopfield network (Zornetzer et al., 1991; Zurada, 1992) allows one to specify a set of desired stable configurations for the states of the elements of the network. A cost function can then be defined and an iterative procedure specified that takes an arbitrary initial state of the elements to one of the stored states. This, at least, is what happens if the number of stored states, S, is significantly less than the number of elements, N. (Experimentally it has been found that S should be less than about 0.15N.)

The basic structure of the Hopfield model is the following. A connection matrix, or T-matrix, is specified by the sum of outer products formed from the desired stored state configurations. The stored states can be represented by N-element vectors for an N-element network. Given an initial starting state for the network, a thresholding rule is usually applied asynchronously to determine the new state of an element.

We are interested in applications for which a solution state evolves through the minimization of some specific cost or energy function using a Hopfield neural network. Once an energy function is defined, one can determine the appropriate connection strengths in order that the function associated with the network be the same as that of the problem under consideration. One could regard this as an example of unsupervised learning, in contrast with supervised learning, for which precise information about the network output is available and incorporated.

A key feature of a net of this kind is its construction from a set of simple processors, each of whose states is determined by a thresholding operation applied to a sum of weighted inputs from other processors or nodes. The properties of the network as a whole are determined by the thresholding function used, and by the patterns and strengths of the connections between the processing elements. The processing elements in the brain have soft or graded threshold responses with a sigmoid form; at the expense of increasing the number
of iterations before convergence, this appears to reduce convergence to spurious but stable network states. It is also important to recognize that such a network will have weak formal computing power; its limits, however, remain to be explored.

The network consists of N processing elements, each of which has two states and each of which has a thresholding operator that determines the state of the element from the total input to that element. As the network iterates, the energy is reduced until a local minimum is reached. Provided the number of stored memories, M, is sufficiently small, this minimum will correspond to the memory closest in Hamming distance to the state in which the network was started if the connection matrix, T, is formed according to

$$T_{ij} = \sum_{m=1}^{M} u_i^{(m)} u_j^{(m)} \qquad (i, j = 1, 2, \ldots, N), \tag{1}$$

where the $u_i^{(m)}$ are the N elements of the memory vectors to be stored, which can take the values $\pm 1$; in most discussions, it is assumed that the diagonal term, $T_{ii}$, is zero in the Hopfield model.

The two-state representation is too limited for an acceptable one-to-one mapping between elements and signal or image samples, in most cases. However, these simple elements can be taken in groups to represent grey levels through a variety of coding schemes (Abbiss et al., 1988; Zhou et al., 1988). Alternatively, either analog or more complex digital processing elements could be used to directly represent a grey level (Abbiss et al., 1991). Given an initial starting configuration or state of the network, each processor or "neuron" randomly updates its state $v_i$ according to a thresholding rule of the form
$$\text{if } \sum_{j=1}^{N} T_{ij} v_j > 0 \text{ then } v_i = 1, \text{ else } v_i = -1. \tag{2}$$
An energy function for this operation can be defined and can be shown to be always minimized. This has the form

$$E = -\tfrac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} T_{ij} v_i v_j. \tag{3}$$
This iterative scheme can be expressed more concisely and modified in the following manner. While retaining the constraint that the elements may only take two values, we let the state of the network in the nth iterative cycle be described by the N-element vector

$$v^{n+1} = U(u^n) = U(T v^n + b), \tag{4}$$
where $U(\cdot)$ is the threshold operation, T denotes the connection matrix, and b is a bias vector; this equation defines a vector $u^n$, each of whose elements is the sum of weighted inputs at a specific neuron. The bias vector incorporates boundary conditions such as image data; it effectively shifts the decision threshold for each element. The energy function minimized by the network is now of the form (Hopfield, 1984)

$$E = -\tfrac{1}{2} v^T T v - b^T v. \tag{5}$$
It is important to note that the minimization of E can only be assured for asynchronous updating of the network states. This does not mean that for synchronous updating the network will necessarily fail to converge, but that its behavior is not predictable and the energy at each iteration might increase. This can be seen by considering the change in energy for a change in the state of one or more neural elements. When one neuron changes state from $v_k$ to $v_k + \Delta_k$, we have

$$E + \Delta E_k = -\tfrac{1}{2} v^T T v - b^T v - \Delta_k \left( u_k + \tfrac{1}{2} T_{kk} \Delta_k \right). \tag{6}$$
Taking $T_{kk}$ to be zero ensures that the change in energy cannot be positive, since the term remaining in the parentheses is $u_k$, which always has the same sign as $\Delta_k$. If $T_{kk}$ is nonzero, the term in the parentheses will have the same sign as $\Delta_k$ provided $T_{kk}$ is positive, and then E is guaranteed not to increase. If two (or more) neurons change state simultaneously, the change in E contains terms involving products of the form $-T_{kl} \Delta_k \Delta_l$ (or these plus higher-order terms if more neurons change), the sign of which can no longer be predicted.

Rather than a binary representation for the states of the network elements, one can also adopt a continuous model. This is achieved by defining $z_k$ by
$$\frac{dz_k}{dt} = u_k, \tag{7}$$

where $v_k$ is now determined according to

$$v_k = \sigma(z_k), \tag{8}$$
and $\sigma$ is a continuous function of sigmoidal shape. One limiting case of a sigmoidal threshold function is a clipped linear function. The proof of the minimization of the same E for this network with any appropriately thresholded change of state was given by Hopfield (1982). The proof relies upon the fact that the thresholding function is a monotonically increasing function.
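Before turning to image restoration, the associative-memory behavior described by Eqs. (1)-(3) can be simulated compactly. The sketch below is an illustration added here, not the authors' code; the network size, number of stored patterns, corruption level, and random seed are arbitrary choices. It builds the outer-product connection matrix with zero diagonal, runs asynchronous threshold updates, and verifies that the energy of Eq. (3) never increases:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 100, 5                              # M well below the ~0.15 N capacity limit
U = rng.choice([-1, 1], size=(M, N))       # memory vectors u^(m)

T = U.T @ U                                # connection matrix, Eq. (1)
np.fill_diagonal(T, 0)                     # zero diagonal, as in the Hopfield model

def energy(v):
    return -0.5 * v @ T @ v                # Eq. (3)

v = U[0].copy()
v[:30] = rng.choice([-1, 1], size=30)      # corrupt the first stored memory

for _ in range(5):                         # asynchronous sweeps, Eq. (2)
    for i in rng.permutation(N):
        E_before = energy(v)
        v[i] = 1 if T[i] @ v > 0 else -1
        assert energy(v) <= E_before + 1e-9   # energy never increases
print("recalled first memory:", np.array_equal(v, U[0]))
```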
III. IMAGE RESTORATION

In this section we describe the mathematical model defining the image restoration problem. Having formulated the task in terms of an optimization problem, we relate it to the mathematics of a Hopfield neural network. Some iterative image restoration processes are mathematically very similar to auto-associative memory; indeed, if the input information is incomplete, it can be considered as a key pattern to an associative memory.

There are many applications that require the restoration of a signal or image from a limited discrete data set, be it data on the spectrum of a function or data on a low or bandpass image of that function. We use the terms image restoration and superresolution interchangeably henceforth; both refer to the recovery of information about an image by some form of spectral interpolation or extrapolation. An important a priori assumption for work in superresolution is the fact that most objects to be imaged are of finite, i.e., compact, support. This leads to the well-known result that their spectra are bandlimited functions. In principle, therefore, one might hope to extend limited spectral data by means of analytic continuation. This procedure is notoriously unstable in the presence of noise and does not provide a practical solution to the problem. One has infinite freedom in interpolating and extrapolating limited sampled data; hence, one is forced to approach superresolution from an optimization point of view (Darling et al., 1983; Byrne et al., 1983). The best that one can hope to achieve is the specification of a cost or energy function that provides a unique minimum. An energy function is designed to incorporate whatever constraints and a priori knowledge might be available to help limit the set of possible solutions to the problem, while retaining desirable and necessary solution characteristics. Examples of constraints include data consistency, support consistency and, perhaps, positivity. It is a matter of taste, to a large extent, how one designs a cost function in order to obtain a desirable solution to the problem, namely a superresolved signal or image with acceptable properties.

Here we discuss a method for mapping a specific deconvolution procedure onto a fully connected network of simple processors. The entire network is allowed to iterate until it reaches an energy minimum, the enhanced image being represented by the final processor states. For a fully connected architecture, the number of interconnections grows as the square of the number of image points, and for this reason, an optical processor is an appropriate form of implementation. The mapping of the algorithm is done in such a way that the network will always converge both for serial operation (where the individual processors are updated one at a time) and for parallel operation (where all the processors are updated simultaneously).
The image restoration problem is thus transformed into one of determining the (global) extremum of a cost function, on the assumption that this solution is optimal. The objective of the restoration process is to obtain a final image that has a higher spectral or spatial frequency content than the original data set, as a direct consequence of incorporating the prior knowledge available into the cost function. One can regard this application in a sense as establishing a content-addressable memory through an unsupervised learning approach; we require interpolation in the image domain to obtain reasonable estimates of the higher-resolution features of the input image, on the basis of a specified cost function.

It has been remarked that one of the early successes of a neural network was to find a good approximation to the traveling salesman problem (Eckmiller and Malsburg, 1987). This is a problem for which many suboptimal solutions can be found but the "global" optimum is sought. There are two distinct ways in which the image restoration problem could be mapped onto a neural network. One is to train the network using a data base of superresolved images (Farhat and Miyahara, 1986; Rastogi et al., 1987; Eichmann and Stojancic, 1987), and the other is to relate the energy function associated with a given network to the chosen restoration energy function. It is the latter that we adopt here.

A. Mathematical Background
Most signal or image recovery problems can be described by linear equations of the form

$$g(x) = \int A(x, y) f(y) \, dy, \tag{9}$$

where A is the system spread function or the Fourier transform kernel, for example. To obtain information about the object $f(y)$ from $g(x)$ requires the solution of a linear inverse problem. This is equivalent to finding the solution of a Fredholm integral equation of the first kind. It is well known that small fluctuations in the data, $g(x)$, can lead to very large fluctuations in the unknown function, $f(y)$. This is a manifestation of the ill-posed nature of the problem, and some degree of regularization is required in order to determine stable and meaningful solutions. In practice, an estimate of f is determined from a finite set of samples of $g(x)$, and the data vector g is expressed by

$$g = Af + n, \tag{10}$$

where A is the system operator, and n represents an additive noise component; A contains explicitly the support constraint on f, which is assumed
to be known or estimated a priori. These limited data can be regarded as noisy values of a finite set of values of a bounded linear functional on f. A data-consistent solution exists, however, which is a solution of minimum norm. This solution is the data-consistent one that minimizes $\|\psi\|^2$, where $\{\psi\}$ is the set of all possible solutions and $\|\cdot\|$ denotes the $L_2$ norm. The solution to this minimization problem can be written as

$$\hat{f} = \sum_{k=1}^{N} \alpha_k^{-1} \langle g, v_k \rangle u_k, \tag{11}$$
where the $u_k$ and the $v_k$ are the singular functions and singular vectors, respectively, pertaining to A. N is the number of image data points, and the $\alpha_k$ are the singular values; viz., $A u_k = \alpha_k v_k$ and $A^* v_k = \alpha_k u_k$, where $A^*$ is the operator adjoint to A. These singular values tend to zero as k increases, leading to the instability of the estimator $\hat{f}$. This solution is ill-conditioned. Stability in the preceding solution can be restored by relaxing data consistency; thus, we minimize the cost function (Abbiss et al., 1988):
$$E = \|A\psi - g\|^2 + \beta \|\psi\|^2. \tag{12}$$
As $\beta$ tends to zero, this solution becomes more data-consistent, as can be seen from the general solution:

$$\hat{f} = \sum_{k=1}^{N} \frac{\alpha_k}{\alpha_k^2 + \beta} \langle g, v_k \rangle u_k, \tag{13}$$
where the regularization parameter, $\beta$, is chosen to achieve a compromise between resolution and stability and usually requires some adjustment in order to establish its optimal value. The minimizer of this cost function can also be computed directly in matrix form, namely

$$\hat{f} = (A^T A + \beta I)^{-1} A^T g, \tag{14}$$
where $^T$ denotes transpose. One way of inverting the matrix in Eq. (14) would be to find the singular system associated with it. We note also that truncation of the series in Eq. (11) at some appropriate point, or equivalently the series in Eq. (13) with $\beta$ set to zero, is an alternative form of regularization (Hansen, 1987; Bertero et al., 1988).
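A compact numerical check of these relations follows. This sketch is added here for illustration and is not the authors' code; the synthetic operator, its singular-value decay, and the value of $\beta$ are arbitrary choices. It evaluates the regularized expansion of Eq. (13) through the SVD and the matrix form of Eq. (14), and verifies that the two coincide:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 40
# Synthetic operator with rapidly decaying singular values (ill-conditioned).
A = rng.standard_normal((N, N)) @ np.diag(0.9 ** np.arange(N))
f_true = rng.standard_normal(N)
g = A @ f_true + 1e-4 * rng.standard_normal(N)

# A = U diag(s) Vt; in the text's notation the u_k are the columns of Vt.T
# (object space) and the v_k the columns of U (data space).
U, s, Vt = np.linalg.svd(A)

beta = 1e-4
f_series = Vt.T @ ((s / (s**2 + beta)) * (U.T @ g))            # Eq. (13)
f_matrix = np.linalg.solve(A.T @ A + beta * np.eye(N), A.T @ g)  # Eq. (14)

print("series and matrix forms agree:", np.allclose(f_series, f_matrix))
```

Setting `beta = 0` in the series reproduces the unstable minimum-norm solution of Eq. (11), which is why some form of regularization, Tikhonov or truncation, is unavoidable in practice.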
An alternative approach to estimating the object,f, is to consider the minimization of the cost function 11f - I,!1I2 using a trigonometric polynomial of the
form (Darling et al., 1983; Byrne et al., 1983)

$$\psi(t) = \sum_{k=1}^{N} d_k e^{i x_k \cdot t}, \tag{15}$$
where the optimal $d_k$ satisfy

$$\sum_{m=1}^{N} d_m \langle e^{i x_m \cdot t}, e^{i x_k \cdot t} \rangle_F = G_k \qquad (k = 1, \ldots, N),$$
and the $G_k$ are data corresponding to the Fourier transform of the low-pass filtered image, g; $\langle \cdot, \cdot \rangle_F$ denotes the inner product in the space F of possible objects. It is worth pointing out that in the space F that incorporates the known support constraint for the function to be restored, the three solutions given by Eqs. (13), (14), and (15) are equivalent; expression (15) can be obtained from expression (14) (Darling et al., 1983). Each method for solution is more or less computationally the same in that each requires $O(N^3)$ multiplications; this was pointed out previously by Abbiss et al. (1991). We note that expression (13) requires on the order of $CN^3$ multiplications, where the overhead C is large by comparison with the other methods. However, a primary concern is the ease with which the regularization parameter $\beta$ can be varied; this can be done at the cost of $O(N^2)$ multiplications in each case.

When only support constraints are being imposed on the reconstruction, all of these methods are also generating the same solution as the iterative procedure known as the regularized Gerchberg-Papoulis algorithm (Papoulis, 1975; Gerchberg, 1974; Abbiss et al., 1983). This is frequently implemented by iteration between the image and the Fourier domains, but can be expressed in the image domain by
$$f^{k+1} = A^* g + [(1 - \beta) I - A^* A] f^k, \tag{16}$$

where the superscript k here denotes the kth iteration of the vector f. One specific example of this approach follows directly and simply from specifying a particular form for a trigonometric polynomial representation for the estimate $\hat{f}$, given by

$$\hat{f}(t) = p(t) \sum_{j=1}^{N} u_j e^{i x_j \cdot t}, \tag{17}$$
where $p(t)$ encodes some prior knowledge about the expected target or image shape and t can represent a one-, two-, or higher-dimensional spatial variable. This was first proposed by Byrne et al. (1983), and it has also proved useful in recovering an image from limited power spectral data. The optimal set of coefficients for the polynomial is determined by minimizing a cost function of the form $\|f - \psi\|_H^2$, where H denotes a
weighted $L_2$ space. The weighting function $p(t)$ typically encodes prior knowledge about the support of the original signal or image, but could also incorporate information on the internal structure of the object, if available. The resulting estimate $\hat{f}$ is data-consistent and is determined as a function of a continuous variable. In this sense it can be said to have infinite resolution, because it can be evaluated on an arbitrarily fine grid of points and its spectrum is infinitely extended. The estimator $\hat{f}$, in Eq. (17), is a solution of minimum norm in H, the weighting being by the inverse of $p(t)$. This procedure is implemented in a direct (i.e., closed-form) method, but can be shown to be equivalent to the iterative procedure of Gerchberg and Papoulis for the case when only support information is incorporated in $p(t)$ (Byrne et al., 1983).

It is interesting to note that the sensitivity of the estimate to the choice of $p(t)$ makes this procedure attractive for target recognition. The energy associated with the estimate $\hat{f}$ is a function of the choice of prior estimate $p(t)$. For this reason, the monitoring of the energy of $\hat{f}$ as a function of $p(t)$ provides a measure against which $p(t)$ can be systematically modified in order to converge to an optimal shape representation for $f(t)$ and thereby classify or recognize that object. Once the support of the prior estimate $p(t)$ is smaller than the true support, the out-of-band constraints of the data lead to dramatic increases in the measured energy of $\hat{f}$ (Darling, 1984).

The computationally intensive part of this method lies in the determination of the optimal set of coefficients, $u_j$, in Eq. (17). This requires a matrix inversion with regularization dependent upon the level of noise in the data, as well as the ill-conditioning due to truncation and round-off errors. In practice, considerable updating of the choice of the regularization parameter is required in order to ensure a stable inverse (i.e., an acceptable condition number), and a consequently non-ill-conditioned estimate $\hat{f}$. This is achieved at the cost of possibly calculating several matrix inverses. For these reasons, it is desirable to find a procedure that can provide a good estimate of a matrix inverse, with speed, reliability, and minimal reliance upon specific values of the regularization parameter (Steriti and Fiddy, 1993).
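A one-dimensional sketch of the estimator of Eq. (17) follows. It is our construction directly from the equations above, not the authors' code; the object, the support prior, the sampling frequencies, and the regularization level are all arbitrary choices. With an indicator-function prior on $[-a, a]$, the matrix entries are values of the transform $P(\omega)$ of $p(t)$:

```python
import numpy as np

# Object: two narrow bars inside the support [-1, 1].
t = np.linspace(-2, 2, 1001)
dt = t[1] - t[0]
f = ((np.abs(t + 0.4) < 0.1) | (np.abs(t - 0.3) < 0.1)).astype(float)

# Low-pass Fourier data G_k at a few frequencies x_k.
xk = np.arange(-10, 11) * np.pi / 2
G = (f[None, :] * np.exp(-1j * np.outer(xk, t))).sum(axis=1) * dt

a = 1.0                                   # support prior p(t): indicator of [-a, a]
def P(w):
    # Transform of the prior: integral of exp(-i w t) over [-a, a].
    return 2 * a * np.sinc(w * a / np.pi)

M = P(xk[:, None] - xk[None, :])          # matrix of inner products P(x_k - x_m)
beta = 1e-6                               # small regularization for stability
u = np.linalg.solve(M + beta * np.eye(len(xk)), G)

# Eq. (17): the estimate, with infinitely extended spectrum, on the fine grid.
p = (np.abs(t) <= a).astype(float)
f_hat = p * (u[None, :] * np.exp(1j * np.outer(t, xk))).sum(axis=1)

# Data consistency: (M + beta I) u = G, so the residual is of order beta |u|.
print("data residual:", np.abs(M @ u - G).max())
```

Replacing the indicator prior with one that also encodes internal structure changes only the function $P(\omega)$ used to build the matrix, which is precisely what makes the estimate sensitive to the choice of $p(t)$.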
IV. IMAGE RECONSTRUCTION ON A NEURAL NET

A. Background

There has been much work in the area of image restoration performed on an artificial neural network. Zhou and Chellappa (1991) used a network to restore grey-level images degraded by a known (5 × 5) shift-invariant blur function and noise; the image grey levels were represented by a simple sum of
binary-state elements. They noted that the energy function did not always decrease monotonically, a possibility we mentioned earlier. This resulted in an annealing rule being described in order to avoid local minima in E, with results that showed some improvement in restorations over alternative methods, for the case of a uniform blur and low noise level. Jang et al. (1988) used the optimization properties of a fully connected network in order to estimate a matrix inverse; this is important in several signal-processing tasks (see later). In their case, full grey-level representation of information was assumed, and it was observed that there was little dependence of the result on the form of the thresholding rule used.

Bai and Farhat (1988) went beyond Eq. (12) and incorporated an additional constraint on the norm of the derivative of the estimate. In order to encourage convergence, a gain factor or regularization-related adaptive threshold was introduced, multiplying the increment to be added to each element prior to thresholding. Their reconstructions, using a linear threshold, exhibited lower background noise levels than those obtained by alternative procedures. Winters adopted the approach described here, but without any explicit regularization term included (Winters, 1988). He uses a two-step penalty method that adds a large positive value to the energy function at each step if its minimization is not satisfied. His results could be implemented in microseconds on an analog electronic network, to be compared with several hours on a microcomputer. In all of these examples, this ill-posed reconstruction problem is being solved in the presence of noise and with the implicit incorporation of a support constraint imposed on the reconstructed function.

We have demonstrated that a regularized Gerchberg-Papoulis algorithm is a special case of a general approach to deconvolution, based on directly mapping the least squares cost function onto a fully connected (neural) network (Abbiss et al., 1989, 1990, 1991). Since the approach to image restoration presented here was first proposed (Abbiss et al., 1988), other related investigations have been made. Zhou et al. (1988; Zhou and Chellappa, 1991) considered an energy function identical to Eq. (12) in order to specify network interconnection strengths. Their application was the restoration of grey-level images degraded by a shift-invariant finite impulse response blur function and additive noise. Grey tone information was coded by a redundant (i.e., degenerate) simple sum of neuron state variables, and the network was asynchronously updated with a stochastic thresholding rule to keep it from being trapped in local minima of the energy function.

B. Minimizing an Energy Function

We now consider the implementation of the image restoration method described by Eq. (14) on a Hopfield network. Having defined the energy
function, one needs to construct a connection matrix. It is not obvious that this can be done for any given problem, or that it can necessarily be accomplished without performing more calculations than are required for a more conventional solution to the problem. The concept of computational load or computational complexity is relevant in deciding the merits of a neural network solution to a problem. If we consider the expression given earlier for E in Eq. (12), we had

$$E = \|Av - g\|^2 + \beta \|v\|^2, \tag{18}$$

where we introduce the vector v now to denote the state of the network, which we expect to represent f either directly or indirectly. This expression can be rewritten in the form
$$E = v^T A^T A v - 2 g^T A v + g^T g + \beta v^T v, \tag{19}$$

where the term $g^T g$ can be ignored since it represents a total offset for E. Comparing this expression for E with that of the Hopfield network gives

$$T = -2(A^T A + \beta I), \qquad b = 2 A^T g. \tag{20}$$

Thus, superresolution performed by this procedure can be mapped simply and directly onto a Hopfield network. The regularization parameter, $\beta$, which sets a bound on the norm of the final estimate $\hat{f}$, determines the T-matrix from elements of the A-matrix, which encodes information about the imaging system and prior information about the object. The available data, g, contribute only to the bias vector b. While general in form, specific expressions for two-dimensional image deblurring can be found in Zhou et al. (1988; Zhou and Chellappa, 1991), in which examples of deblurring by 5 × 5 uniform and Gaussian windows were considered.

The neural algorithm formulated earlier can be used to recover binary images if a hard threshold is used, i.e., one based upon Eq. (2). However, owing to the presence of the bias term in the expression for the T-matrix, it follows that the diagonal of this matrix is not zero, as is usually the case. That the diagonal is often forced to be zero is done to ensure an energy decrease of the cost function at each iteration. Reduction in the energy is a sufficient condition that ensures convergence to the (closest local) minimum of the energy function. In general, the expression for the change in energy due to a change $\Delta_k$ in the kth entry of v, $v_k$, is given in the binary case by (see Eq. (6))

$$\Delta E_k = -\Delta_k \left[ u_k + \tfrac{1}{2} T_{kk} \Delta_k \right] \qquad (k = 1, \ldots, N). \tag{21}$$

Convergence is only guaranteed when the expression in the square brackets has the same sign as $\Delta_k$. The differential form of the algorithm removes this
requirement. Thus, using a discrete form of Eq. (7), i.e.,

$$\Delta z_k = \Delta t \, u_k, \tag{22}$$

one can determine an update to the $z_k$, namely $\Delta z_k$, prior to thresholding, allowing recovery of continuously varying functions rather than binary functions. Since $v_k = \sigma(z_k)$, there is a relation between the gain of the thresholding function, $\sigma$, and the size of the time step, $\Delta t$. Reducing the gain or, equivalently, taking smaller time steps, lowers the rate of change in v and thus results in a slowing of the convergence of the network.
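The mapping of Eq. (20) and the differential update of Eq. (22) are straightforward to simulate. The following sketch is an illustration added here, not the authors' implementation; the blur operator, $\beta$, time step, and clipping limits are arbitrary choices, and the clipped-linear function stands in for the sigmoidal threshold. For a sufficiently small time step the energy of Eq. (18) decreases in practice:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50
x = np.arange(N)
A = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 1.5) ** 2)
A /= A.sum(axis=1, keepdims=True)          # row-normalized blur operator
f_true = np.clip(np.sin(2 * np.pi * x / N), 0, None)
g = A @ f_true + 1e-3 * rng.standard_normal(N)

beta = 1e-3
T = -2 * (A.T @ A + beta * np.eye(N))      # Eq. (20)
b = 2 * A.T @ g

def energy(v):
    return np.sum((A @ v - g) ** 2) + beta * np.sum(v ** 2)   # Eq. (18)

v = np.zeros(N)
z = np.zeros(N)
dt_step = 0.2                              # time step, arbitrary but stable here
for _ in range(500):
    u = T @ v + b                          # weighted input to each neuron
    z += dt_step * u                       # discrete differential form, Eq. (22)
    v = np.clip(z, 0.0, 1.5)               # clipped-linear limit of the sigmoid

print("final energy:", energy(v))
print("relative restoration error:",
      np.linalg.norm(v - f_true) / np.linalg.norm(f_true))
```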
C. Image Restoration on a Binary Network

In this section, we introduce a modification to the thresholding rule for the Hopfield network that makes it a more tractable approach to mapping image enhancement algorithms onto neural networks. In order to describe an arbitrary vector w, which has grey levels, in terms of a binary vector, we write $w = Sv$, where S is a mapping from an N-element vector with binary values to an L (≤ N) element vector having a wider range of values. For example, if S represents a base-two mapping, each element of v can represent a power of 2, giving a $2^N$ range of values for w. Several other coding schemes come to mind, such as clustering or bit-density codes, or adding a group of elements of v to generate an element of w.

From the expression in Eq. (12) for the energy function, we have

$$E = \|Aw - g\|^2 + \beta \|w\|^2 = w^T A^T A w - 2 w^T A^T g + g^T g + \beta w^T w,$$

or

$$E = w^T \left[ (A^T A + \beta I) w - 2 A^T g \right] + g^T g. \tag{23}$$
It follows from this that the difference in energy between two states, with a change from w to $w + \Delta w$, is given by

$$\Delta E = 2 \Delta w^T \left[ (A^T A + \beta I) w - A^T g + \tfrac{1}{2} (A^T A + \beta I) \Delta w \right], \tag{24}$$

or
or A E = 2AvT[ST(ATA + p I ) S v - S T A T g + i S T ( A T A+,BI)SAv],
(25)
in terms of the neural state vector, v. This change in energy contains no assumption about the range of values of w; the previous restriction to two values reflects the desire to use a large number of simple binary processing elements in neural architectures.
For asynchronous operation, we can ensure that the change in energy expressed by Eq. (25) does not increase. If we define

$$T = -2 S^T (A^T A + \beta I) S \quad \text{and} \quad b = 2 S^T A^T g,$$

Eq. (25) gives, for the energy change in the kth element,

$$\Delta E_k = -\Delta_k \left[ (Tv + b)_k + \tfrac{1}{2} T_{kk} \Delta_k \right]. \tag{26}$$
The grey-scale mapping we are considering associates a specific neuron with one and only one image pixel. Hence the columns of S each contain only one nonzero element, and it is found that the diagonal elements of T take the form

T_kk = −2s_jk²(A^T A + βI)_jj,   (27)

where s_jk is the nonzero element of the kth column of S. T_kk is always negative, since the diagonal elements of A^T A are positive and β is a positive quantity. Hence we can rewrite Eq. (26) in the form

ΔE_k = −Δ_k[(Tv + b)_k − ½|T_kk|Δ_k],   (28)

and thus ΔE_k will be negative provided

Δ_k(λ_k − Δ_k) > 0,   (29)

where

λ_k = 2(Tv + b)_k / |T_kk|.   (30)

This condition will be met if |Δv_k| < |λ_k| and sgn(Δv_k) = sgn(λ_k). For a binary network, where v_k ∈ {0, 1}, we obtain the following rule if the network energy is not to increase:

v_k^(n+1) = 1 for λ_k > 1,
v_k^(n+1) = v_k^(n) for |λ_k| ≤ 1,
v_k^(n+1) = 0 for λ_k < −1.   (31)
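One asynchronous sweep of this rule can be sketched as follows (illustrative Python; T and b are assumed to be precomputed as defined above):

    import numpy as np

    def binary_sweep(T, b, v):
        """One pass of the thresholding rule of Eq. (31) over all neurons."""
        for k in range(len(v)):
            lam = 2.0 * (T[k] @ v + b[k]) / abs(T[k, k])   # Eq. (30)
            if lam > 1.0:
                v[k] = 1.0        # flipping up strictly lowers the energy
            elif lam < -1.0:
                v[k] = 0.0        # flipping down strictly lowers the energy
            # otherwise v[k] is left unchanged
        return v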
We next consider certain properties of a network of graded responses.

D. Image Restoration on a Nonbinary Network

The restriction of the state vector, v, to binary values permitted the simplest possible processing elements to be used in the neural architecture. With more complex processors this simple representation is unnecessary
and inefficient. In addition, two-level elements have the disadvantage that they provide only a coarse quantization of the reconstruction space, which can lead to the creation of local minima. This problem is avoided if the network is composed of elements that can assume a continuum of values. Consider the operation of a nonbinary network operating in the asynchronous mode. Because |Δv_k| is no longer fixed, there is no need for a thresholding operation, and the value of Δv_k that yields the greatest decrease in energy can be used. It can be seen from Eq. (28) that this maximum decrease in energy occurs when

Δv_k = λ_k/2.   (32)
Thus, if we adopt graded neurons capable of taking any value between adequate limits, the energy can be reduced at the maximum rate for which convergence can be guaranteed. A serially thresholded network of this type will therefore reach the global minimum after the fewest possible iterations.

Synchronous operation of the network would make the most efficient use of the inherent parallelism of the system, and for continuously graded neurons we can identify at least one mode which is certainly convergent. If the kth neuron changes by Δv_k, we have

w_k^(n+1) = w_k^(n) + Δv_k.

This can be compared with a regularized form of the Gerchberg-Papoulis algorithm (Abbiss et al., 1983),

w^(n+1) = A^T g + [(1 − β)I − A^T A]w^(n),

or

w^(n+1) = w^(n) + ½(Tw^(n) + b).   (33)

Hence, if

Δv_k = ½(Tw^(n) + b)_k,   (34)

parallel operation of the network will result in a computation that is identical to the regularized Gerchberg-Papoulis algorithm. Since the latter always converges, this choice for the Δv_k will always cause the network to converge to the global minimum.

The convergence properties of the synchronous updating case remain unpredictable for the neural model in the general case, because the energy is no longer guaranteed to decrease. However, each iterative cycle is
TABLE I (from Abbiss et al., 1991, © 1991 IEEE).

Requirement         Operation   SVD                   Neural
Change β            Mults       N² + N                KN² + (K + 1)N
                    Adds        N²                    KN² + (K + 1)N
                    Divs        N                     N
Change image        Mults       2N² + N               (K + 1)N² + (K + 1)N
                    Adds        2N²                   (K + 1)N² + KN
                    Divs        N                     N
Total operations    Mults       18N³ + 3N² + 2N       N³ + (K + 1)N² + (K + 1)N
                    Adds        N²                    N³ + KN² + KN
                    Divs        N                     N
substantially shorter for the synchronous mode of operation, and numerical examples indicate slightly better reconstructions than with asynchronous updating. The significance of this result is not clear and will be the subject of further studies. We have found that placing limits on the absolute magnitude of Δv tends to prevent the algorithm from becoming trapped at local minima in the energy function. This approach is similar to Hopfield's differential approach for continuous-valued state vectors, with a threshold of the kind appearing in Eq. (31) playing the role of Hopfield's nonlinear sigmoidal threshold.
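The synchronous mode of Eq. (34), with an optional bound on |Δv| of the kind just described, reduces to the following sketch (illustrative Python; the function name and the clip value are hypothetical choices of ours):

    import numpy as np

    def synchronous_restore(A, g, beta, n_iter=50, clip=None):
        """Parallel update w <- w + 0.5*(Tw + b), equivalent to the
        regularized Gerchberg-Papoulis iteration of Eq. (33)."""
        N = A.shape[1]
        T = -2.0 * (A.T @ A + beta * np.eye(N))
        b = 2.0 * (A.T @ g)
        w = np.zeros(N)
        for _ in range(n_iter):
            dv = 0.5 * (T @ w + b)              # Eq. (34)
            if clip is not None:
                dv = np.clip(dv, -clip, clip)   # bound on |dv| against limit cycles
            w += dv
        return w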
E. Computational Complexity

The computational complexity associated with image reconstruction or superresolution using the singular value decomposition method of Eq. (13), and using the neural network approach, was examined for one-dimensional images. The numbers of additions, multiplications, and divisions for each technique are listed in Table I for three situations. The first set is the number of calculations necessary to update the regularization parameter β; the neural network has a disadvantage in this case because it must generally run for some K iterations. The second set considers the computational cost involved in updating the input image data vector, g. The neural network once again is at somewhat of a disadvantage. However, by examining the total number of operations from the beginning, one can see that the neural approach is substantially more efficient because it calculates a matrix product once, without the overhead associated with singular value decomposition; the latter was estimated to grow as 18N³ for an N-point image. This is clearly increasingly significant for larger images.
V. MATRIX INVERSION
As indicated in Section III.A, the minimizer of the cost function
E = / / A $ - gIl2 + Pll+l12 (35) can also be computed directly in matrix form (Abbiss et al., 1983), namely
f = [ATA+ pr]-'ATg,
(36) Image restoration can also be achieved by constructing a parameterized model for the solution, and minimizing I l f - G1I2 in a weighted Hilbert space. This method is referred to as the PDFT estimator (Darling et al., 1983; Byrne et al., 1983). Both of these methods have proved themselves to be effective at improving the resolution of limited noisy low-resolution images. However, in both cases, there is a need to invert a matrix in order to solve for the optimal solution. There are many methods for numerically inverting a matrix, including Gaussian elimination and singular value decomposition. The relative sizes of the singular values of the matrix to be inverted determine the stability of the inverse; a measure of the stability is given by the condition number of the matrix, one measure of which is the ratio of the largest singular value to the smallest. If some singular values are identically zero, or if the matrix is rectangular, one can make use of the Moore-Penrose generalized inverse (Albert, 1972), denoted A'. There is a unique A+ for any A , which can be obtained by singular value decomposition. A generalized inverse may still be highly ill-conditioned, however, because of the existence of nonzero but arbitrarily small singular values in the original matrix. The regularized matrix inverse used in the solution described in Eq. (14) is closely related to the Moore-Penrose generalized inverse. This relationship can be expressed as A+ = lim ( A ~ + API)-'A~. (37) 0-0
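A minimal numpy sketch of Eqs. (36) and (37) (function names are our own; this is an illustration, not the authors' implementation):

    import numpy as np

    def regularized_inverse(A, beta):
        """Regularized inverse (A^T A + beta*I)^(-1) A^T of Eqs. (36)-(37)."""
        N = A.shape[1]
        return np.linalg.solve(A.T @ A + beta * np.eye(N), A.T)

    A = np.random.randn(8, 8)
    for beta in (1e-1, 1e-3, 1e-6):
        # conditioning of the inverse improves as beta grows
        print(beta, np.linalg.cond(regularized_inverse(A, beta)))

    # As beta -> 0 the regularized inverse approaches the pseudo-inverse:
    print(np.abs(regularized_inverse(A, 1e-12) - np.linalg.pinv(A)).max())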
Increasing the regularization parameter relaxes the data consistency constraint, and also effectively increases the size of the smaller singular values of the inverse, so that their associated singular functions contribute to the solution in a controllable fashion. Use of a regularized generalized inverse allows one to recover information in the reconstructed image in a stable fashion. Thus, by reducing the condition number of the matrix to be inverted, a regularized inverse leads to a reduction in the resolution achieved in the final reconstructed image, but also to a reconstruction that is not susceptible to small changes in the input data. It is important to note that the Moore-Penrose inverse itself would not be useful for image restoration because the condition number of the matrix may
still be very large, and small fluctuations in g would be amplified. It is useful in the context of the pseudo-inverse content-addressable memory, however, since it can more precisely discriminate between similar images in recall.

An approach of greater generality to the image restoration problem allows one to incorporate more detailed information on the known or desired characteristics of the object or image; for example, to predispose the reconstruction to a particular shape, or to control its behavior in the neighborhood of prescribed boundaries. More specifically, this information can be introduced via weighting functions in the inner products associated with the Hilbert spaces of object and image. The regularized inverse of Eq. (14) is modified as a consequence, and the metric against which the closeness of the estimate to the true solution is measured is altered.

The specification of an appropriate value for the regularization parameter β is not straightforward, but various techniques for its estimation do exist. Under certain circumstances, β can be identified with the noise-to-signal ratio (ε/E)², where ε is the noise level on the signal and E is the signal strength. Cross-validation (Nashed and Wahba, 1974) can also be used to provide an estimate for this key parameter. Using regularized SVD to obtain an estimate of the inverse relies upon finding a "good" (i.e., optimal) β. An interesting point to note is that as β continues to increase, the matrix to be inverted becomes more and more similar to a unit (or identity) matrix. From Eq. (36) it can be seen that the reconstructed image will then be no better than the original image data g, but now truncated to the support in reconstruction space. If the regularization parameter is allowed to be too small, the ill-conditioning renders the estimate of the reconstructed image useless.

A. Neural Matrix Pseudo-Inverse
Finding a suitable regularized inverse can be posed as an optimization problem by specifying a cost function of the form ||AV − I||², where A is the matrix whose inverse V is to be calculated and I is the identity matrix. Following the approach of Jang et al. (1988), one can define an energy function of the form

E_k = ½ Σ_i [Σ_j A_ij V_jk − I_ik]²,   i, j = 1, ..., N,   (38)
where k = 1, 2, ..., N. The minimization of the sum of these energy functions will yield the matrix inverse. This energy function can be related to that for a Hopfield neural network, which has the energy function given in Eq. (5), namely:

E = −½ Σ_i Σ_j T_ij w_i w_j − Σ_i b_i w_i,   (39)

where, as before, T is the network connection matrix, w the state vector, and b a bias vector (which shifts the decision threshold for each network element). The representation required for matrix inversion takes the form

w_i = g{Σ_j T_ij w_j + b_i},   with   T_ij = −Σ_m A_mi A_mj,   (40)

where w denotes the network outputs and g{·} the output thresholding function. It is useful and important to note that in this formalism one column can be inverted at a time; thus, for the kth column of the inverse, the bias is the kth row of A and we can write:

w_i = g{Σ_j T_ij w_j + A_ki}.   (41)
From (41) it can be seen that, prior to thresholding, U has to be determined from the increment ΔU. This can be accomplished by an integration procedure, and in our numerical implementation a trapezoidal formula was used (Steriti and Fiddy, 1993). Thus, the evolution in time of the state of the network is given by
w_i(t + 1) = w_i(t) + λ[ΔU_i(t) − ΔU_i(t − 1)],

ΔU_i(t)/Δt = b_i + Σ_j T_ij w_j(t),   (42)

where t represents the discrete time step and λ is a relaxation parameter.
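A sketch of this column-by-column inversion follows (illustrative Python; it uses a simple relaxation step in place of the trapezoidal update of Eq. (42), a linear threshold g(x) = λx, and a percentage-change stopping test of the kind described in Section V.B below; all names and tolerances are hypothetical):

    import numpy as np

    def neural_column_inverse(A, k, lam=None, tol=0.1, max_iter=10000):
        """Iterate the network of Eqs. (40)-(41) toward the kth column of
        A^(-1), stopping when the maximum percentage change drops below tol."""
        N = A.shape[0]
        T = -(A.T @ A)                       # connection matrix, Eq. (40)
        b = A[k, :].astype(float)            # bias = kth row of A, Eq. (41)
        if lam is None:
            lam = 1.0 / abs(np.trace(T))     # keep lambda below 1/Tr[T]
        w = np.zeros(N)
        for _ in range(max_iter):
            w_new = w + lam * (T @ w + b)    # relaxation step
            denom = np.maximum(np.abs(w_new), 1e-12)   # guard division by zero
            change = 100.0 * np.max(np.abs(w_new - w) / denom)
            w = w_new
            if change < tol:                 # settling accuracy reached
                break
        return w

    A = np.random.randn(6, 6)
    V = np.column_stack([neural_column_inverse(A, k, tol=0.01) for k in range(6)])
    print(np.abs(V - np.linalg.inv(A)).max())   # small for tight tolerances

The fixed point of the iteration satisfies A^T Aw = A^T e_k, i.e., w is the kth column of the inverse; loosening tol halts the iteration earlier, which is the implicit regularization discussed in Section V.C.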
B. Numerical Considerations

Usually a nonlinear threshold such as a sigmoid or "s"-shaped function is used to limit the output between two values. Also, high gain is commonly used to simulate a decision-making system for which the output is hard-clipped to be "on" or "off." However, when a (neural) network is used for inverting matrices, there is no decision of this kind to be made. The choice of thresholding function for this application was g(x) = λx, with λ = 1. In
practice, when implementing in analog, digital, or optical hardware, one is limited by the largest value that can be represented on that machine.

This particular implementation of the inversion algorithm on a fully connected network leads to equations that are similar to those used widely in iterative relaxation procedures. The choice of the relaxation parameter for the incremental step, λ, can obviously greatly affect the rate of convergence. We studied the rate of convergence of iterative procedures in general, in order to identify any techniques that might improve the performance of the matrix inversion algorithm. All such techniques weight the next iterate by multiplication with a relaxation parameter.

For the case of solving optimization problems for which local minima in the energy function occur, a further refinement is necessary. The method of simulated annealing (Metropolis et al., 1953; Carnevali et al., 1985) tests an incremental change in an energy function against a parameterized probability of acceptance or rejection. The parameter is a fictitious temperature that, when zero, results in a simple direct threshold; for nonzero temperatures, an increase in energy might be accepted with some probability. The practical consequence of this is that local minima can be avoided. We shall demonstrate here that use of a suitable annealing schedule can accelerate convergence to the (global) minimum, even in the absence of local minima.

Boltzmann machines, or algorithms that implement simulated annealing, can be regarded as imposing a threshold whose slope depends upon the temperature parameter. We can compare and contrast this with the network described earlier, which varies its outputs by changing the slope λ. The consequences of modifying this parameter are similar in all cases, since changing Δt, λ, and the inverse of the temperature parameter have the same basic effect. Any analysis of the rate of convergence of one of these algorithms could be relevant to another. The paper by Geman and Geman (1984) suggests using a logarithmic formula to reduce the annealing temperature T as a function of iteration number n, namely

T(n) ≥ c/log(1 + n).   (43)

Here the temperature would start high and slowly drop. Widrow and Stearns (1985) state that, for convergence, the largest value of λ should be less than the reciprocal of the largest eigenvalue, or of the trace of the connection matrix T (assuming positive definiteness), since the trace is the sum of all the eigenvalues. The algorithm used in our simulations was:

if λ < 1/Tr[T], then λ = exp(t²/225) − 1.   (44)

This yields a function that is initially zero and rises exponentially to its maximum value. The constant (225) determines the rate at which λ increases; too large a value of λ at any iteration will cause the output to oscillate.
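A sketch of this schedule (illustrative; the cap at 1/Tr[T] reflects the Widrow-Stearns bound quoted above, and the helper name is our own):

    import numpy as np

    def lambda_schedule(t, T, c=225.0):
        """Relaxation parameter of Eq. (44): grows as exp(t**2/c) - 1,
        capped at the 1/Tr[T] convergence bound."""
        cap = 1.0 / abs(np.trace(T))
        lam = np.expm1(t**2 / c)      # exp(t^2/225) - 1, zero at t = 0
        return min(lam, cap)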
Having argued that, in principle, techniques used to set the value of Δt in numerical integration are equivalent to setting λ (for our choice of the output function), there are several differences to be noted. In typical integration routines one is interested in the output as a function of time. In a neural network, only the final state vector is of importance. A variable-step algorithm will reduce the incremental step size until the change between iterations is below a prescribed value. For a neural network, since the outputs are expected to settle to a stable configurational state, one wants the algorithm to reach this state in as small a number of iterations as possible. (The package DGEAR [or IVPAG] from the IMSL [International Mathematical & Statistical Library] has been used in neural network computations of a very similar nature to speed the convergence process.) In the computer simulation, the settling of the network was determined by calculating the maximum percentage change in the outputs between time steps. This difference was calculated as

δ = max_i |[w_i(t) − w_i(t − 1)]/w_i(t)| × 100%.   (45)
Tests are made to avoid a division by zero. The iterative process continued until this maximum percentage difference fell below a prescribed tolerance, which was the settling accuracy. An appropriate settling accuracy greatly reduces the processing time required. Because of the nature of this network inversion method, a suitable inverse can be obtained even without allowing the network to settle fully. This is due to the iterative minimization of the energy function defining the network. The solution path proceeds along an n-dimensional contour towards a global minimum. This tends to overcome round-off errors inherent in numerical methods for matrix inversion.

C. Properties of the Neural Matrix Inverse

An objective in the development of the neural matrix-inversion method was to obviate the need for an explicit regularization parameter. It is found that the matrix inverse obtained from a Hopfield-based implementation can indeed be regularized by truncating the number of network iterations. These iterations can be terminated when some prescribed settling accuracy, defined by Eq. (45), is achieved. As the settling accuracy increases, the neural inverse should, and indeed does, tend toward an unregularized inverse.
The lower the settling accuracy, the more rapidly the inverse is computed (Steriti et al., 1993; Steriti and Fiddy, 1993). The resulting inverse is well conditioned for a wide range of settling accuracies, and the iterative procedure requires no action on the part of the user except for the selection of the stopping point through this parameter. We find that the sets of inverses obtained by SVD for a range of values of the regularization parameter, and by the neural approach for a range of values of the settling accuracy, are quite different in their structure, their distribution of singular values, and their properties when applied to image restoration. The singular values of the regularized SVD matrix inverse alter systematically as the regularization parameter changes. For the neural inverse, there is only a small change in the singular values over a wide range of settling accuracies. A moderate and fixed settling accuracy of around 10% performs well in typical image restoration applications. One can estimate an effective regularization parameter associated with the neural inverse by comparing specific features in the restored images obtained by the network and by SVD.

A number of examples of low-pass filtered images and their associated restored images have been computed. It has been found that, even for low settling accuracies, matrix inverses are obtained that provide image restorations with significantly enhanced resolution. There may be some additional property associated with the neural inverse matrix that improves its performance in this context; this observation merits further investigation. It should also be noted that low settling accuracies greatly reduce the computational time required, which could be reduced still further by a fully parallel implementation in appropriate hardware.
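One way to carry out such a comparison is to inspect the singular-value spectra and condition numbers of the two kinds of inverse; a brief illustrative sketch (assuming an inverse from each method is available, e.g., from the regularized_inverse and neural_column_inverse sketches above):

    import numpy as np

    def condition_number(M):
        """Ratio of largest to smallest singular value."""
        s = np.linalg.svd(M, compute_uv=False)
        return s.max() / s.min()

    A = np.random.randn(8, 8)
    R = np.linalg.solve(A.T @ A + 1e-3 * np.eye(8), A.T)  # SVD-type inverse
    print(condition_number(R))
    # The same diagnostic applied to a network-derived inverse exposes
    # the differing spectra discussed in Section VI.C.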
VI. EXAMPLES OF IMAGE RESTORATION

A. Regularized Iterative Reconstructions
A series of numerical experiments has been performed in order to evaluate the inversion procedure for image restoration described in Section IV.D. We show some numerical results in the accompanying figures, with reconstructions obtained by serial network updating compared with those obtained using Eq. (13). Figure 1 shows an object (dashed) and the corresponding incoherent image (solid); the imaging point spread function is shown in Fig. 2. Results using Eq. (13) are shown by a dot-dashed curve. We have considered elements that can take both binary (i.e., two-level) and nonbinary values.
FIGURE 1. Original object (from Abbiss et al., 1991, © 1991 IEEE).
The former case has been extended to include a generalized grey-scale mapping that transforms a binary state vector into a vector having grey scales (Abbiss et al., 1991). For example, Fig. 3 shows a typical estimate of the object calculated by a binary network. The network contained 90 binary elements that were combined using a 4-bit coding scheme to represent the 15-point object. The network converged after only five cycles to the result shown in Fig. 3; the image has been erroneously
FIGURE 2. Imaging point spread function (from Abbiss et al., 1991, © 1991 IEEE).
FIGURE 3. Estimate using binary neural network (from Abbiss et al., 1991, © 1991 IEEE).
reconstructed into an object having four peaks. In this case the network arrived at a local energy minimum because of the crude quantization of the binary elements. However, the network will always approach a global energy minimum if the elements are allowed to take a range of values over some continuum such as 0 ≤ v_k ≤ 1. Introducing this modification and using the updating rule of Eq. (31) gives the improved result shown in Fig. 4. Note that five cycles of the network were also used in this case to provide a comparison with Fig. 3; the network has not yet converged, although it has resolved the three object features. We also demonstrate the noise stability of this approach using an image with 5% additive noise (Fig. 5). The result after 50 cycles is shown in Fig. 6. In this case the network reconstruction is virtually identical to that calculated from Eq. (13).

FIGURE 4. Estimate with modified updating rule (from Abbiss et al., 1991, © 1991 IEEE).

FIGURE 5. Estimate with 5% noise (from Abbiss et al., 1991, © 1991 IEEE).
FIGURE 6. Estimate after 50 cycles (from Abbiss et al., 1991, © 1991 IEEE).
FIGURE 7. (a) Original image. (b) DFT estimate.
B. PDFT Reconstructions

A series of numerical experiments has also been performed to evaluate the neural inversion procedure described in Section V.A, which is used to calculate a PDFT image as described in Section III.B. Of concern are the quality of the reconstructed image, the ease with which an acceptable matrix inverse is found using this approach (as compared to a more traditional one), and the computational complexity. A solution of the form of Eq. (17) requires the inversion of a matrix composed of elements taken from the Fourier transform of the prior estimate p(t). We consider here the use of only support constraints representing prior knowledge about the true image. This means that the matrix to be inverted contains elements of the function [sin(W_x x)/x][sin(W_y y)/y], where W_x and W_y denote the dimensions of the support information in the 2-D object domain. A matrix drawn from these elements typically has a high condition number and is thus ill-conditioned; introducing a small nonzero value on the diagonal of this matrix reduces the condition number. Given the trade-off that occurs between the stability of the reconstructed image and its resolution, one must determine an optimal value for the regularization parameter.
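For a one-dimensional illustration of such a matrix (the sampling grid and support half-width below are hypothetical; the diagonal loading plays the role of the regularization parameter):

    import numpy as np

    def pdft_support_matrix(freqs, W, beta=0.0):
        """Matrix of sinc values sin(W*(k_m - k_n))/(k_m - k_n) arising from
        a support prior of half-width W, with beta loading the diagonal."""
        K = np.subtract.outer(freqs, freqs)
        P = np.where(np.abs(K) < 1e-12, W,
                     np.sin(W * K) / np.where(K == 0, 1.0, K))
        return P + beta * np.eye(len(freqs))

    k = np.linspace(-4, 4, 17)        # measured spatial frequencies
    for beta in (0.0, 1e-4, 1e-2):
        print(beta, np.linalg.cond(pdft_support_matrix(k, W=1.0, beta=beta)))

Even a small beta reduces the condition number dramatically, at the cost of some resolution in the reconstruction, which is the trade-off described above.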
FIGURE 8. (a) PDFT reconstruction using SVD; β = 0.1. (b) PDFT reconstruction using SVD; β = 0.01. (c) PDFT reconstruction using SVD; β = 0.001. (d) PDFT reconstruction using SVD; β = 0.0001.
FIGURE 9. (a) PDFT reconstruction using neural net; S/A = 3%. (b) PDFT reconstruction using neural net; S/A = 1%. (c) PDFT reconstruction using neural net; S/A = 0.1%. (d) PDFT reconstruction using neural net; S/A = 0.01%.
In Figs. 7 to 9, the spectral estimation procedure is used to improve the resolution of a low-pass filtered image. Figure 7(a) shows the original image, and 7(b) the low-pass filtered estimate. Figure 8 shows the PDFT reconstructions obtained using SVD, with decreasing levels of regularization (β = 0.1, 0.01, 0.001, 0.0001). Here the image quality improves as the regularization parameter is reduced, with an "optimal" reconstruction obtained in Fig. 8(c) (β = 0.001). It is important to note that the reconstruction is poor at low values of regularization, i.e., when it is underregularized, and deteriorates when the regularization parameter is too large, reverting to the DFT estimate.
FIGURE 10. Processing time vs. matrix size (from Steriti et al., 1990, with permission from IOP Publishing). The filled squares denote the numerical method and the open squares the network, as a function of matrix size M × M.
Figure 9 displays PDFT reconstructions obtained using the neural network matrix inversion procedure for different values of reconstruction accuracy (3%, 1%, 0.1%, 0.01%). It has been found that there is still a good (i.e., recognizable and improved) reconstruction occurring at low accuracies (values over 20% have been successfully used).

Figures 10 and 11 are graphs demonstrating the processing times (on an Apollo DN 10000) needed for the matrix inversion. Figure 10 shows the increase in processing time with matrix size. A logarithmic curve fit was made for both the numerical and neural network cases, with the following results:

network inverse: y = (3.0534 × 10⁻³)x^…,
numerical inverse: y = (3.4564 × 10⁻⁶)x^2.7635,

with accuracies of 0.974. Although the neural inverse takes significantly longer to compute, it does yield a usable inverse without the need to optimize a regularization parameter. The second graph (Fig. 11) shows the processing times needed for differing values of settling accuracy. It is important to note that the algorithm tends to settle fairly quickly to an approximate solution; hence, one can reduce the amount of processing time and still obtain a good reconstruction. This is an attractive feature of the inverse derived by minimizing an energy function.
FIGURE 11. Processing time vs. settling accuracy (from Steriti et al., 1990, with permission from IOP Publishing).
FIGURE 12. γ vs. α for the regularized SVD inverses (β = 0, 0.01, 0.001, 0.0001, 0.00001).
C. Discussion
It has been shown that the neural network calculates a somewhat different matrix inverse than that calculated by regularized SVD. Given the relationship between the singular values α of the matrix and the singular values γ of its regularized inverse for regularized SVD, where

γ = α/(α² + β),

together with some evaluation criterion, one can deduce an effective regularization parameter β, suitably normalized, associated with a specific settling accuracy. This relationship defines a mapping of the singular values of the matrix into those of its regularized inverse, which can be seen in Fig. 12. Here, as the regularization parameter β is decreased, the mapping approaches the behavior of the 1/α function.

The differences between the regularized SVD and neural network matrix inverses can be examined from their singular value spectra. The singular values of a set of regularized SVD matrix inverses, for a representative example, can be seen in Fig. 13. Figure 14 shows the singular values for a set of neural network inverses for differing values of settling accuracy. Note that the neural network inverses generally have larger condition numbers than their regularized SVD counterparts, but a larger number of their singular values are much smaller. It is because of this apodization of the singular values that the neural matrix inverse is both useful and robust.
FIGURE 13. Singular value spectra of the SVD inverses (β = 0.1, 0.01, 0.001, 0.0001, 0).
The neural network inverses do not rely heavily upon a specific value of a regularization parameter or on the choice of settling accuracy used. Clearly, even a large settling accuracy using the neural inversion method provides a matrix inverse with suitable properties for this restoration problem. This has been our experience with several numerical examples, and it suggests that there is some additional property associated with the neural inverse matrix that improves its performance in this context. Indeed, the quality of the reconstructed images when large values are used for the settling accuracy, which greatly reduce the computational time required, is a feature that deserves further investigation. Using too small a settling accuracy for the image restoration problem, on the other hand, is counterproductive, because of both the decreasing regularization and the increased computational time.

FIGURE 14. Singular value spectra of the neural net inverses (settling accuracy = 10%, 1%, 0.1%, 0.01%).
VII. NEW RESTORATION APPROACHES

We have described how the Hopfield model for a neural network can be used for solving an optimization problem that arises in signal and image recovery, namely superresolution or image restoration. There is a practical advantage in implementing such a procedure in this way. There is an updating scheme that ensures convergence both for parallel (synchronous) and for serial (asynchronous) updating. Of particular interest is the parallel updating case, because convergence is faster; convergence can be demonstrated for a specific case, by comparison with the regularized Gerchberg-Papoulis algorithm.

This view of iterative superresolution methods through a neural network formalism leads to the development of improved algorithms based on a more versatile class of optimization criteria that generalizes distances of the form (Jones and Trutzer, 1989, 1990)

D(Q, P) = ∫ f(Q(x), P(x)) dx,   (46)
where one finds a Q(x) that is an estimate of the target or image feature, is consistent with the available data, and minimizes the distance to a prior estimate, P(x), for that target. For given choices of f(·,·), many distance measures already studied in the literature can be obtained, such as Burg entropy (Burg, 1967) and cross-entropy (Shore and Johnson, 1980). For a given set of prior estimates, minimizing a chosen distance measure with respect to each provides a mechanism for quantifying the similarity a given feature has with respect to each of these prior estimates. This was first suggested by Shore and Johnson (1980), who developed an approach based on cross-entropy minimization to classify an input vector of measurements with respect to a fixed set of characteristic feature vectors or "cluster centers." The prior estimates can be viewed as prototype estimates to be updated as the classification algorithm proceeds. The restoration step and the classification step both require an optimization algorithm to be performed, and it is this that we carry out on a neural network architecture.
A more general approach to image restoration, using a broader definition of distance measure, is described next. Examples of solutions using linear methods that incorporate finitely many constraints in the associated mean-square optimization criteria provide relatively poor resolution compared with nonlinear methods. Methods for image restoration invariably involve minimizing (or maximizing) some chosen criterion or energy function, while incorporating as much prior knowledge about the image to be recovered as possible. The solution one wants is the one with the most consistent and likely features. Many methods have been proposed that exploit a variety of criteria; popular criteria include weighted least squares (Bregman, 1967; Luenberger, 1969; Csiszar, 1989); minimum cross-entropy (Rao and Nayak, 1985; Jones, 1989; Jones and Byrne, 1990), which gives maximum Shannon entropy for a uniform prior; Burg entropy (Burg, 1967); and Itakura-Saito distortion measures (Itakura and Saito, 1968). Minimum cross-entropy methods have been justified using probabilistic principles (Friedman and Stuetzle, 1981; Huber, 1985; Donoho and Johnstone, 1989). Recently a geometric and approximation-theoretic justification of cross-entropy has been given, based on the fact that it satisfies a directed orthogonality condition (Jones and Trutzer, 1989) and a Pythagorean property (Jones and Trutzer, 1990). The latter can be expressed as follows:

D(R, P) = D(R, Q) + D(Q, P),   (47)
where D represents a distance measure, R the true image, P the (revised) prior estimate of the image, and Q the estimate for the solution for R. A necessary consequence of this property is that D(R, Q) ≤ D(R, P), which means that the estimate Q is an improvement over the prior estimate P. Also, if D(Q, P) is determined to be too large, one knows that the P used is poor and it can be rejected. Jones and Trutzer (1989, 1990) have developed a class of distances that, together with prior revision, have this orthogonality property, that lead to higher-resolution procedures than maximum entropy, and that require minimal computations.

Consider the reconstruction of a function R(x) given the data values

d_k = ∫ R(x)g_k(x) dx,   k = 1, ..., K,
where g_k(x) can represent the imaging point spread function or system transfer function, depending on the space in which the data are measured. An optimal and data-consistent estimate of R(x), Q(x), can be determined by finding a Q(x) that, with respect to the minimum distance criterion, is closest to a prior estimate of R(x), P(x); i.e., D(Q, P) is minimized. A
simple mean-square distance is given by

D(Q, P) = ∫ [Q(x) − P(x)]² dx,

and minimizing this subject to data consistency is equivalent to solving

Q = P + Σ_i t_i g_i,   i = 1, ..., K,

for the (Lagrange) constants t_i such that Q is data consistent. The precise definition of the distance measure used depends on the nature of the noise present. The proof of the conditions required of a distance measure in order that it satisfy the directed orthogonality condition can be found in Jones and Trutzer (1990). The consequence of this is that, using such distance measures, the minimizing solution is optimal in the sense of minimizing the distance between the true solution and the estimated solution, while simultaneously minimizing the distance between the estimated solution and the prior estimate. The possibility of implementing these techniques with neural networks remains to be explored. Once the connection matrix elements, or the connection strengths between network layers, are determined, their specific values reflect the information and processing capabilities of that network.
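As an illustration of the mean-square case (a discretized sketch with hypothetical sampled functions g_k; the Lagrange constants t_i follow from the small linear system imposed by data consistency):

    import numpy as np

    def min_distance_estimate(P, G, d):
        """Q = P + sum_i t_i g_i with t chosen so that G @ Q = d.
        P: prior sample vector; G: K x N matrix of sampled g_k; d: K data."""
        # Data consistency: G @ (P + G.T @ t) = d  =>  (G G^T) t = d - G @ P
        t = np.linalg.solve(G @ G.T, d - G @ P)
        return P + G.T @ t

    N, K = 64, 5
    x = np.linspace(0, 1, N)
    G = np.array([np.cos(2 * np.pi * k * x) for k in range(K)])  # sampled g_k
    R = np.exp(-((x - 0.5) ** 2) / 0.01)                         # "true" image
    d = G @ R                                                    # measured data
    Q = min_distance_estimate(P=np.ones(N), G=G, d=d)
    print(np.allclose(G @ Q, d))                                 # data consistent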
VIII. HARDWARE IMPLEMENTATIONS

A. Electronic Hardware
There are now many off-the-shelf and customized neural network systems in use. Most are digital electronic systems, since these offer great flexibility in terms of connection strength precision and software-driven interconnection capabilities. For example, IBM has developed a neural network image classifier that processes images up to 512 by 512 pixels in size using readily available programmable gate arrays; these were assembled onto processing boards with a full board capacity of 27.8 billion arithmetic operations per second (Studt, 1991). The U.S. Postal Service is currently funding AT&T Murray Hill to develop a neural network system that can read handwritten ZIP codes (Studt, 1991). The network first locates the ZIP code and then identifies the numbers; the network has 100,000 interconnections and 4,000 neurons. The bulk of the neural processing is carried out with software simulations in order to maintain flexibility while training. A number of PC-based neural software systems are commercially available and contain multiple architectures. Also available are plug-in accelerator boards for PCs and workstations that can exceed 25 million interconnections
per second. However, once a network has been trained or the connection strengths are known or have been specified, the network can be hard-wired. Implementing a neural network on a chip is difficult because, even with advanced VLSI technology, massive bus lines are needed between a large on-chip memory and many parallel processing circuits. A recent publication describes many neural network implementations based on the INTEL 80170NX chip (Intel, 1992). This analog neural chip was first fabricated in 1989 and with the prototyping board can support 1,024 neurons with 81,920 connections; it is also referred to as the INTEL ETANN, for Electrically Trainable Artificial Neural Network. Configured to incorporate 64 analog neurons and 10,240 analog interconnects, the network can compute the product of a 64-element input vector with a 64 by 64 connection matrix at a rate exceeding 2 billion interconnections per second. By interconnecting eight chips, systems can achieve more than 16 billion connections per second. All elements of the computation are analog and carried out in parallel. It represents an electrically reconfigurable network and has found applications in real-time image processing and high-energy particle tracking. A single-chip CMOS microprocessor is being developed, through NSF support, by ICSI and UC Berkeley. It is designed for high-speed special-purpose digital systems for artificial neural network calculations (Asanovic et al., 1992) and connectionist applications that do not require high-precision arithmetic. The first implementation will be in a 1.2 µm CMOS technology with a peak 50 MHz clock rate. SPERT (Synthetic PERceptron Testbed) will occupy a double SBus slot within a Sun Sparcstation and will compute more than 0.3 billion connections per second. Another single-chip implementation is that being developed by Hitachi (Watanabe et al., 1992). This is a 10⁶-neuron architecture running on a 1.5 V battery, permitting its use in portable equipment. An on-chip DRAM cell array stores 10⁶ 8-bit interconnection weights digitally; these are easily programmable and refreshable. The processing speed is in excess of 1 billion connections per second, and power dissipation has been reduced to 75 mW. With advanced 0.5 µm CMOS technology, such as TRW Inc. offers, 2.56 billion operations per second are projected. This is not intended to be a review of electronic hardware being developed for neural networks; rather, these examples give some indication of the current state of the art. There are many companies and universities involved in this area of research and development, including AT&T, Adaptive, Bellcore, HNC, Intel, Nestor, Motorola, Texas Instruments, SAIC, and Synaptics in the U.S.; CEN/LETI, Ecole Poly, DiMilano, Siemens, and Philips in Europe; and Hitachi, Fujitsu, Toshiba, and Ricoh in Japan.
B. Optical Hardware

Much has been written on the use of optical hardware for neural networks (Kyuma, 1991; Jenkins and Tanguay, 1991). One of the first implementations of a neural network was optical, using a Hopfield net as an associative memory. The connection between an associative memory and the recall capability of a (volume) hologram has not gone unnoticed (Hong and Psaltis, 1986; Owechko et al., 1987; White et al., 1988). Numerous schemes have been proposed to implement a Hopfield net optically, some of which were demonstrated several years ago (see, for example, the special issues of Applied Optics, 1 Dec. 1987 and 15 Jan. 1989). Formally, the device requirement for this is a fan-out and fan-in interconnect element that can take information from any one location or switch ("neuron") to all others in the network and/or a device that can store many patterns for optical recall. Holographic storage of information is the favored approach for both of these requirements because of the large potential memory capacity. Interconnections can be realized based on the Bragg condition for angularly selective diffraction from a volume grating; if the recording medium is reusable, it is possible to make a "trainable" interconnection network. One can also achieve a high degree of fan-in or fan-out with a simple convex lens. For the specific case of the Hopfield architecture as a content-addressable memory, a simple hardware realization is the following. One can store a set of desired patterns to be recalled in a two-dimensional holographic mask and SLM, or in a three-dimensional or volume holographic element. In the case of a two-dimensional mask, which would be of more limited capacity than its three-dimensional counterpart, one could form the mask by conventional optical means or one could compute the transmittance required. Using a coding scheme one could plot a "computer-generated optical element" or address an SLM directly. A photorefractive material is attractive for use as the volume storage element, and it is well known that it is not necessary to use a discrete set of angular plane waves as reference waves for holographic storage in this case. It has recently been demonstrated that a continuum of angularly and spatially distributed gratings can be induced as a result of the phenomenon of self-pumping in such materials (Soffer et al., 1986; Owechko, 1989a, 1989b). A dynamic storage or interconnect element is a key requirement for most optical computing architectures, and its further study will have wide-ranging implications for a variety of optical and hybrid neural processors; we and others are actively pursuing methods for the fixing and controlled erasure of these holographic interconnection patterns (Anderson and Lininger, 1987; Yeh et al., 1988).
For example, in a learning phase, images are stored in a spatially multiplexed fashion in the Fourier domain. In the act of recall, an input is correlated with the stored images, because its spectrum multiplies the information in this Fourier domain. Precision pinholes are sometimes used to isolate cross-correlation from autocorrelation peaks and thus improve discrimination or fidelity of the neural output. The system originally proposed by Psaltis’s group (Psaltis et al., 1988) is essentially a joint transform correlator, representing the input and training planes; the Fourier information is collected either electronically or optically in a volume hologram, and this is followed by a Fourier lens and output neuron plane in the back focal plane. Ideally such a configuration would operate entirely optically and perhaps iteratively if frequent learning was desirable. Such an optical system could have an optical gain step based on two-wave mixing and energy transfer in a photorefractive, resulting in a “ring-resonator” architecture (Cronin-Golomb and Brandle, 1989; Lininger et al., 1989; Fischer et al., 1989). While many publications discuss the optical implementation of a neural processor, very few truly neural architectures are being described, and many are hybrid systems. Some optical architectures are based on optical matrixvector multipliers, but optical CAM systems tend to rely on the read-out of a (volume) hologram using a partial reference wave for reconstruction. It may be possible to make use of the Fourier transforming properties of a lens to further speed up the overall processing time. Indeed, attempts to implement the Gerchberg-Papoulis algorithm optically were made many years ago (Marks, 1980; Marks and Smith, 1980; Marks, 1981). Since an iterative procedure is envisaged, a hybrid processor was thought to be necessary, because of the difficulty of sustaining a reasonable light level within the processor for more than a few iterations. If the processor is hybrid, the loss of accuracy and time associated with an analog step and A/D conversion proves counterproductive, and a totally binary representation throughout the system is more attractive. Lo and Indebetouw recently proposed an all optical solution to an iterative Gerchberg-Papoulis processor that does not suffer from this limitation, since it effectively exploits photorefractive media for gain purposes (Lo and Indebetouw, 1992). One method to provide gain was recently suggested by Yeh et al. (1989); it makes use of the energy transfer, with no phase cross-talk, that can be made to occur in two-beam coupling setups. An optical content-addressable memory processor requires high-volume storage capability; this may be fixed if the connection matrix is defined a priori, or it may need to be updated. If it needs to be updated, then one needs to consider dynamic storage mechanisms and refreshing schedules; this would permit the optical implementation of a learning procedure. The
study of dynamic high-density storage has application to the image restoration methods described earlier, both for parallel read/write capability in an optical processor and for encoding high-efficiency T-matrices. Photorefractives provide a good material for storage because of their volume holographic recording capability. In principle, a crystal can store V/(λ/n)³ bits, where V is the volume, λ the wavelength, and n the refractive index; this could represent a number as large as 10¹² for a volume of 1 cm³.
IX. CONCLUSIONS

We believe that there are practical advantages to considering the image restoration or superresolution problem in terms of a neural network formalism. Neural network solutions to image restoration problems are competitive with, but not necessarily better than, more traditional methods for solving the problem (see also Abbiss et al., 1991). An advantage that we have found is the improved performance with respect to ill-conditioning difficulties. It has been reported by others that a (Hopfield) net formalism overcomes stagnation difficulties encountered with techniques such as gradient descent, and convergence to the best solution does not deteriorate appreciably in the presence of noise (Kamgar-Parsi and Gualtieri, 1992). There is a large body of empirical evidence that the neural network approach enlarges the basins of attraction of the energy function minima, thus enhancing the chances of finding better solutions and making the final solution less dependent on the starting parameters (Kamgar-Parsi and Gualtieri, 1992). Also, convergence can be fast initially, providing a good initial sense of the content of the restored image.

We have shown that both binary (two-state) and nonbinary image reconstruction algorithms can be implemented on very similar (Hopfield) neural architectures, the only distinction being whether the nonbinary case is based upon two-state processing hardware or more complex analog devices. In any case, for this application the diagonal of the connection matrix is nonzero, and the crucial thresholding step must be modified in order to ensure that the energy of the network decreases at each step. Since we expect a unique minimum for the energy function, a guarantee of convergence results in a solution to the restoration problem. We have also shown that an updating scheme can be specified that ensures maximal convergence for serial (asynchronous) updating. Of particular significance is the computational gain in speed associated with parallel updating. It is important to point out that convergence is assured on this basis if the updating is asynchronous (which increases computation times), but
synchronous updating has proved successful provided the upper and lower bounds on state changes are not too large, so that limit cycles are excluded. Convergence for this case can be demonstrated for a specific class of updating procedures that are formally equivalent to the regularized Gerchberg-Papoulis superresolution algorithm, which is guaranteed to converge.

Parameterized methods for image restoration were described that required calculation of a set of coefficients by matrix inversion. Using the approach described here for estimating a matrix inverse, we have also shown that no specific regularization parameter appears necessary. The iterative algorithm is halted on the basis of satisfying a settling-accuracy parameter. We can show that for a large range of settling accuracies (e.g., 10 to 20%), the inverse matrices obtained by the network are similar in character and well-conditioned, as judged by their use in a superresolution procedure. This contrasts with the widely differing inverses that are typically found as the regularization parameter is varied with a regularized SVD inverse. As the settling accuracy is decreased, the inverse tends towards an unregularized pseudo-inverse. It was demonstrated that the network inverse provided a good reconstruction of the image without the need for any decision about the value of the regularization parameter. The processing time required for the neural inverse is, however, significantly longer and dependent upon the specified settling accuracy. It is important to note that even at low settling accuracies, the matrix inverse was still accurate enough to be used successfully in the image reconstruction (i.e., spectral estimation) algorithm. An effective regularization parameter is implicitly defined by the neural inversion scheme, once a settling accuracy has been specified. It also appears that, for a wide range of settling accuracies, a regularized pseudo-inverse has been found that generates satisfactory image reconstructions. Because of the complexity of this algorithm it seems likely that (in digital hardware) the complexity will not be less than O(N³). In practice, when implemented on (serial) digital machines, this algorithm is much slower than other inversion algorithms (such as Gauss-Jordan); however, a useful matrix inverse is calculated without the need for regularization. We therefore believe that with the appropriate hardware, this kind of fully connected architecture offers a significant advantage for computing matrix inverses.

In all of these cases, the methods proposed can be implemented in parallel, and thus provide a high-speed calculation of the restored image. This is a relative statement, but we assume that appropriate hardware will become available, corresponding to a programmable, fully connected, massively parallel processor; this would correspond to a Hopfield platform that could be directly programmed for image restoration, since we have identified the required T-matrices. Clearly, with the appropriate hardware very high
speeds are possible; with optical hardware and parallel memory addressing, memory bottlenecks to processing speed should also be avoided. VLSI technology with parallel interconnections between one-dimensional arrays of neurons is possible. However, optical or opto-electronic implementations permit two-dimensional systems to be realized; larger or more complex systems can also be implemented in this way. Optical implementation of a neural processor has been widely discussed in the literature, but few systems of any real use have yet emerged. Using a Fourier-based correlator architecture with inexpensive SLMs such as liquid crystal televisions offers an optical hardware solution. A simple mask or volume-diffraction element can encode the connection matrix required for image restoration or matrix inversion. Iteration can be achieved via hybrid methods or the use of optical elements such as photorefractives which inject gain into the optical processor. If the connection matrix is to be refined or redefined, an adaptive optical interconnection element is required. With a digital frame store or a dynamic volume holographic storage element, the joint transform correlator architecture could be used as a trainable neural network. The input and output planes would incorporate a number of neurons equal to the number of pixels in the SLMs used. Such an optical system could be used to study information representations and learning protocols. Only when highly parallel processing architectures are available will the full potential of neural net solutions to image restoration problems be realized; our expectation is that the hardware to accomplish this will necessarily rely heavily on optics because of the high density of interconnections that is required.
ACKNOWLEDGMENT

This work was in part supported by SDIO/IST and managed by ONR.
REFERENCES

Abbiss, J. B., DeMol, C., and Dhadwal, H. S. (1983). "Regularised iterative and non-iterative procedures for object restoration from experimental data," Opt. Acta 30, 107-124.
Abbiss, J. B., Bayley, J. S., Brames, B. J., and Fiddy, M. A. (1988). "Super resolution and neural computing," in SPIE Proc. Vol. 880, High Speed Computing (K. Bromley, ed.), pp. 100-106.
Abbiss, J. B., Fiddy, M. A., and Brames, B. J. (1989). "On the application of neural networks to the solution of image restoration problems," in SPIE Proc. Vol. 1058, High Speed Computing (K. Bromley, ed.), pp. 138-146.
Abbiss, J. B., Brames, B. J., Byrne, C. L., and Fiddy, M. A. (1990). "Image-restoration algorithms for a fully connected architecture," Optics Letters 15, 688-690.
Abbiss, J. B., Brames, B. J., and Fiddy, M. A. (1991). "Super-resolution algorithms for a modified Hopfield neural network," IEEE Trans. on Signal Processing 39, 1516-1523.
Albert, A. (1972). Regression and the Moore-Penrose Pseudoinverse, Academic Press, New York.
Anderson, D. Z., and Lininger, D. M. (1987). "Dynamic interconnects: Volume holograms as optical two-port operators," Appl. Opt. 26, 5031.
Anderson, J. A., and Rosenfeld, E. (eds.) (1987). Neural Computing: Foundations of Research, MIT Press, Cambridge, Massachusetts.
Andrews, H. C., and Hunt, B. R. (1977). Digital Image Restoration, Prentice-Hall, Englewood Cliffs, New Jersey.
Asanovic, K., Beck, J., Kingsbury, B. E. D., Kohn, P., Morgan, N., and Wawrzynek, J. (1992). "SPERT: A VLIW/SIMD neuro-microprocessor," Proc. IJCNN '92, Vol. II, p. 577.
Bai, B., and Farhat, N. H. (1988). "Radar image reconstruction based on neural net models," IEEE APS/URSI Meeting, Syracuse, pp. 774-777.
Bertero, M., DeMol, C., and Pike, E. R. (1988). "Linear inverse problems with discrete data: II. Stability and regularization," Inverse Problems 4, 573-594.
Bregman, L. M. (1967). "The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming," USSR Comp. Math. and Math. Phys. 7, 131-142.
Burg, J. P. (1967). "Maximum entropy spectral analysis," Proc. 37th Meeting Soc. of Exploration Geophysicists, Oklahoma City, p. 127.
Byrne, C. L., Fitzgerald, R. M., Fiddy, M. A., Darling, A. M., and Hall, T. J. (1983). "Image restoration and enhancement," J.O.S.A. 73, 1481-1487.
Carnevali, P., Coletti, L., and Patarnello, S. (1985). "Image processing by simulated annealing," IBM J. Res. Dev. 29, 569-579.
Cronin-Golomb, M., and Brandle, C. D. (1989). "Ring self-pumped phase conjugator using total internal reflection in photorefractive strontium barium niobate," Optics Lett. 14, 462-464.
Csiszar, I. (1989). "Why least squares and maximum entropy? An axiomatic approach to inverse problems," Math. Inst. Hungarian Acad. Sci., No. 19.
Darling, A. M. (1984). "Digital object reconstruction from limited data incorporating prior information," Ph.D. thesis, University of London.
Darling, A. M., Hall, T. J., and Fiddy, M. A. (1983). "Stable, noniterative, object reconstruction from incomplete data using a priori data," J.O.S.A. 73, 1466-1469.
Donoho, D. L., and Johnstone, I. M. (1989). "Projection-based approximation and a duality with kernel methods," Annals of Statistics 17(1), 58-106.
Eckmiller, R., and Malsburg, C. v. d., eds. (1987). Neural Computers, NATO ASI Series F, Vol. 41, Springer Verlag, Berlin.
Eichmann, G., and Stojancic, M. (1987). "Superresolving signal and image restoration using a linear associative memory," Appl. Opt. 26, 1911-1918.
Farhat, N. H., and Miyabara, S. (1986). "Super-resolution and signal recovery using models of neural networks," O.S.A. Topical Meeting on Signal Recovery & Synthesis II, pp. 120-123.
Fiddy, M. A. (1987). "The role of analyticity in image recovery," in Image Recovery: Theory and Application (H. Stark, ed.), pp. 499-529, Academic Press, Boca Raton, Florida.
Fienup, J. R. (1982). "Phase retrieval algorithms: A comparison," Appl. Opt. 21, 2758-2769.
Fischer, B., Sternklar, S., and Weiss, S. (1989). "Photorefractive oscillators," IEEE Trans. QE-25, 550-569.
Friedman, J. H., and Stuetzle, W. (1981). "Projection pursuit regression," J. Amer. Stat. Assoc. 76, 817-823.
Geman, S., and Geman, D. (1984). "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. PAMI-6, 721-741.
Gerchberg, R. W. (1974). "Super-resolution through error energy reduction," Opt. Acta 21, 709-720.
Hansen, P. C. (1987). "The truncated SVD as a method of regularization," BIT 27, 534-553.
Hong, J., and Psaltis, D. (1986). "Storage capacity of holographic associative memories," Optics Lett. 11, 812-814.
Hopfield, J. J. (1982). "Neural networks and physical systems with emergent collective computational abilities," Proc. Natl. Acad. Sci. USA 79, 2554-2558.
Hopfield, J. J. (1984). "Neurons with graded response have collective computational properties like those of two state neurons," Proc. Natl. Acad. Sci. USA 81, 3088-3092.
Hopfield, J. J., and Tank, D. W. (1985). "Neural computation of decisions in optimization problems," Biol. Cyber. 52, 141-152.
Huber, J. (1985). "Projection pursuit," Annals of Statistics 13(2), 435-475.
Intel (1992). Intel Publication #241359, "80170NX Neural Network Technology and Applications," Intel, Santa Clara, California.
Itakura, F., and Saito, S. (1968). "Analysis synthesis telephony based on the maximum likelihood method," Proc. 6th Int. Conf. Acoustics, Tokyo, C17-C20, p. 196.
Jang, J.-S., Lee, S.-Y., and Shin, S.-Y. (1988). "An optimization network for matrix inversion," in Neural Information Processing Systems (D. Z. Anderson, ed.), pp. 397-401, AIP Press, New York.
Jenkins, B. K., and Tanguay, A. R. (1991). "Photonic implementations of neural networks," Chapter 9 in Neural Networks for Signal Processing (B. Kosko, ed.), pp. 287-379, Prentice-Hall, Englewood Cliffs, New Jersey.
Jones, L. K. (1987). "On a conjecture of Huber concerning the convergence of projection pursuit regression," Annals of Statistics 15(2), 880-882.
Jones, L. K. (1989). "Approximation theoretic derivation of logarithmic entropy principles for inverse problems and unique extension of the maximum entropy method to incorporate prior knowledge," SIAM J. Appl. Math. 49, 650-661.
Jones, L. K., and Byrne, C. L. (1990). "General entropy criteria for inverse problems, with applications to data compression, pattern classification and cluster analysis," IEEE Trans. IT-36, 23-30.
Jones, L. K., and Trutzer, V. (1989). "Computationally feasible high-resolution minimum-distance procedures which extend the maximum-entropy method," Inverse Problems 5, 749-766.
Jones, L. K., and Trutzer, V. (1990). "On extending the orthogonality property of minimum norm solutions in Hilbert space to general methods for linear inverse problems," Inverse Problems 6, 379-388.
Kamgar-Parsi, B., and Gualtieri, J. A. (1992). "Solving inversion problems with neural networks," Proc. IJCNN '92, Baltimore, Vol. III, p. 955, IEEE Inc., New Jersey.
Kosko, B., ed. (1991). Neural Networks for Signal Processing, Prentice-Hall, Englewood Cliffs, New Jersey.
Kyuma, K. (1991). "Optical neural networks: a review," Nonlinear Optics 1, 39-49.
Lininger, D. M., Martin, P. J., and Anderson, D. Z. (1989). "Bistable ring resonator utilizing saturable photorefractive gain and loss," Optics Lett. 14, 697-699.
Lo, K. P., and Indebetouw, G. (1992). "Iterative image processing using a cavity with a phase conjugate mirror," Appl. Opt. 31, 1745-1753.
Luenberger, D. G. (1969). Optimization by Vector Space Methods, Wiley, New York.
Marks, R. J. (1980). "Coherent optical extrapolation of 2-D band-limited signals: Processor theory," Appl. Opt. 19, 1670-1672.
IMAGE RESTORATION O N T H E HOPFIELD NEURAL NETWORK
47
Marks, R. J. (1981). “Gerchberg’s extrapolation algorithm in two dimensions,” App. Opt. 20, 1815-1820. Marks, R . J., and Smith, D. K . (1980). “Iterative coherent processor for bandlimited signal extrapolation,” Proc. SPIE 231, 106- 1 1 1. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). “Equation of state calculations by fast computing machines,” J . Chem. Phys. 21, 1087-1092. Nashed, M. Z., and Wahba, G. (1974). “Generalized inverses in reproducing kernel spaces: An approach to regularization of linear operator equations,” SIAM J . Math. Anal. 5,974-987. Owechko, Y. (1989a). “Self pumped optical neural networks,” in Optical Computing, O.S.A. Technical Digest Series, Vol. 9, pp. 44-47. Owechko, Y. (1989b). “Nonlinear holographic associative memories,” IEEE Trans. OE-25, 619-634. Owechko, Y., Dunning, G . J., Marom, E., and Soffer, B. H. (1987). “Holographic associative memory with nonlinearities in the correlation domain,” Appl. Opt. 26, 1900- 1910. Pao, Y.-H. (1 989). Adaptive Patrern Recognition and Neural Networks, Addison-Wesley, Reading, Massachusetts. Papoulis, A. (1975). “A new algorithm in spectral analysis and bandlimited extrapolation,” IEEE Trans CAS-22, 735-742. Poggio, T., and Girosi, R. (1989). “A theory of networks for approximation and learning,” A.I. Memo 1140, MIT A1 Laboratory. Psaltis, D., Brady, D., and Wagner, K. (1988). “Adaptive optical networks using photorefractive materials,” Appl. Opt. 27, 1752- 1759. Rao, C. R., and Nayak, T. K. (1985). “Cross entropy, dissimilarity measures, and characterizations of quadratic entropy,” IEEE Trans. on Info. Th. IT-31, 5. Rastogi, R., Gupta, P. K., and Kumaresan, R. (1987). “Array signal processing with interconnected neuron-like elements,” Proc. ICASSP, paper 54.8.1, pp. 2328-233 1. Rumelhart, D. E., and McClelland, J. L. (1986). Parallel Distributed Processing, Val. I : Foundations, MIT Press. Shore, J., and Johnson, R. (1980). “Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy,’’ IEEE Trans. Info. The. 26, 26-37. Soffer, B. H., Dunning, G. J., Owechko, Y., and Marom, E. (1986). “Associative holographic memory with feedback using phase-conjugate mirrors,” Optics Letters 11, 118-120. Steriti, R., Coleman, J . , and Fiddy, M. A. (1990). “High resolution image reconstruction based on matrix inversion on a fully connected architecture,” Inverse Problems 6, 453-463. Steriti, R.J., and Fiddy, M.A. (1993). “Regularized image reconstruction using SVD and a neural network method for matrix inversion,” IEEE Trans. SP, to be published October 1993. Studt, T. (1991). “Neural networks: Computer toolbox for the ’ ~ O S , ” R&D Magazine, p. 36. Watanabe, T., Kimura, K., Aoki, M., Sakata, T., and Itoh, K. (1992). “A single 1.5V digital chip for a 106-synapse neural network,” Proc. fJCNN ’92, Vol. 11, p. 7, IEEE Inc., New Jersey. White, H. J., Aldridge, N . B., and Lindsay, I. (1988). “Digital and analogue holographic associative memories,” Opt. Eng. 27, 30. Widrow, B., and Steams, S . D. (1985). Adaptive Signal Processing, Prentice-Hall, Englewood Cliffs, New Jersey. Winters, J. H. (1988). “Super-resolution for ultrasonic imaging in air using neural networks,” Proc. IJCNN ‘88, p. 358. Yeh. P., Chiou. A. E. T., and Hong, J. (1988). “Optical interconnection using photorefractive dynamic holograms,” Appl. Opt. 27, 2093-2095. Yeh, P., Chiou, A. E., Hong, J., Beckwith, P., Chang, T., and Khoshnevisan, M. (1989). 
“Photorefractive nonlinear optics and optical computing,” Opt. Eng. 28, 328-343.
48
J. B. ABBISS er al.
Zhou, Y. T., and Chellappa, R. (1991). “Image restoration with neural networks,” in Neural Networks for Signal Processing (B. Kosko, ed.), p. 63, Prentice-Hall, Englewood Cliffs, New Jersey. Zhou, Y.-T., Chellappa, R., Vaid, A,, and Jenkins, B. K. (1988). “Image restoration using a neural network,” IEEE ASSP-36,I 141- 1 15 1. Zornetzer, S. F., Davis, J. L., and Lau, C., eds. (1991). An Introduction to Neuraland Electronic Networks, Academic Press, San Diego. Zurada, J. M. (1992). Artificial Neural Systems, West Publishing Co., St. Paul, Minnesota.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS. VOL. 81
Fundamentals and Special Applications of Non-contact Scanning Force Microscopy U . HARTMANN Institute of Thin Film and Ion Technology. KFA-Julich. Federal Republic of Germany
I . Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . I1. Probe-Sample Interactions in Non-contact Scanning Force Microscopy . . . . A . Methodical Outline. . . . . . . . . . . . . . . . . . . . . . . B. Van der Waals Forces . . . . . . . . . . . . . . . . . . . . . . C . Ionic Forces . . . . . . . . . . . . . . . . . . . . . . . . . D . Squeezing of Individual Molecules: Solvation Forces . . . . . . . . . . E . Capillary Forces . . . . . . . . . . . . . . . . . . . . . . . . F . Patch Charge Forces . . . . . . . . . . . . . . . . . . . . . . 111. Electric Force Microscopy Used as a Servo Technique . . . . . . . . . . . A . Fundamentals of Electrostatic Probe-Sample Interactions . . . . . . . . B. Operational Conditions . . . . . . . . . . . . . . . . . . . . . IV . Theory of Magnetic Force Microscopy . . . . . . . . . . . . . . . . . A . Basics of Contrast Formation . . . . . . . . . . . . . . . . . . . B . Properties of Ferromagnetic Microprobes . . . . . . . . . . . . . . C . Contrast Modeling . . . . . . . . . . . . . . . . . . . . . . . D . Sensitivity, Lateral Resolution, and Probe Optimization Concepts . . . . . E . Scanning Susceptibility Microscopy . . . . . . . . . . . . . . . . . F . Applications of Magnetic Force Microscopy . . . . . . . . . . . . . V . Aspects of Instrumentation . . . . . . . . . . . . . . . . . . . . . VI . Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . . . . . . .
49 51 51
53 102 112 119 127 129 129 131 133 133 138 157
182 183 189 191 195 197 197
I . INTRODUCTION In 1986 Gerd Binnig and Heinrich Rohrer shared the Nobel Prize in Physics for inventing the scanning tunneling microscope (STM) and discovering that it can image the surface of a conducting sample with unprecedented resolution (Binnig and Rohrer. 1982). The instrument utilizes an atomically sharp tip which is placed sufficiently close to the sample so that tunneling of electrons between the two is possible . The tunneling current as a function of position of the tip across the sample provides an image that reflects the local density of electronic states at the Fermi level of the uppermost atoms at the surface of the sample. On the other hand. the close proximity of probe and 49
Copyright 0 1994 by Academic Press. Inc. All rights of reproduction in any form reserved. ISBN 0-12-014729-7
50
U.HARTMANN
FIGURE1. Schematic of the solid-vacuum transition. N denotes the position of the outermost atomic nuclei serving as reference plane. I is the extent of the inner electrons, which is typically 10-30 picometers. The probability density of valence/conduction band electrons usually drops with a decay length V / L between 0.1 and 1 nm. The extent of electromagnetic surface modes, which are responsible for the van der Waals (VDW) interaction, is about 100 nm. Static fields resulting from electric and magnetic charge distributions within the solid may have various extents E / M ranging from a few nanometers up to a macroscopic dimension. [The illustration is based on a presentation previously given by Pohl (1991).]
sample results in a mutual force which is of the same order of magnitude as that of interatomic forces in a solid. This latter phenomenon gave rise to a novel development, the atomic force microscope (AFM), which was presented by Gerd Binnig, Calvin Quate, and Christoph Gerber in 1986. Here, the probing tip is part of a tiny cantilever beam. Probe-sample forces F are detected according to Hooke’s Law, F = -k.s, from microscopic deflections s of a cantilever with spring constant k . Unlike the tunneling microscope, the force microscope is by no means restricted to conducting probes and samples and it is not restricted to probe-sample separations in the angstrom regime. Thus, by modifying the working distance, probe-sample interactions of varying decay lengths become accessible, as shown in Fig. 1. Tip-sample interactions at atomically close separations predominantly result from the overlap of tip and sample electronic wavefunctions. Thus, the “contact” mode of operation of the force microscope is dominated by short-range interatomic forces. Conceptually, the contact mode of imaging is like using a stylus profilometer to measure the topography of surface atoms. The AFM achieves sub-nanometer to atomic resolution by using a very small loading force - typically to lo-” N - which makes the area of contact between tip and sample exceedingly small.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
51
With a sufficient increase in the probe-sample separation, long-range electromagnetic interactions dominate, as shown in Fig. I . In the “noncontact” mode of force microscope operation, both the force acting on the probing tip and the spatial resolution obtained upon detecting a certain type of interaction critically depend on the probe-sample separation and on the mesoscopic to macroscopic geometry of the microprobe’s apex region. The present work is devoted to a review of basic fundamentals and of some important applications of non-contact-mode force microscopy. In Section I1 a detailed discussion of the various surface forces that may occur between the probe and the sample of a force microscope is presented. Section 111 gives a brief introduction to electric force microscopy, which is realized by externally applying an electrostatic potential difference between probe and sample. Section IV is devoted to the basics of magnetic force microscopy, which currently appears to be the most important application of the force microscope in the non-contact mode of operation. Finally, some general principles of instrumentation are discussed in Section V. Concerning the terminology used throughout the present work, “scanning force microscopy” and “scanning force microscope” (both abbreviated by SFM) denote the technique and the instrument in the most general sense (contact or non-contact mode of operation), in contrast to “atomic force microscopy” and “atomic force microscope” (both abbreviated by AFM), which always refer to the contact mode of operation. Unfortunately, this terminology was not used in a consistent way throughout the earlier literature. Since the present work can only cover part of the many facets of the still rapidly growing field of non-contact SFM, the reader is referred to some previously presented excellent general introductions and reviews, among which are the recent articles by Wickramasinghe (1990) and by Rugar and Hansma (l990), as well as the book by Sarid (1991) and the book chapters by Meyer and Heinzelmann (1992), by Wickramasinghe (1992), and by Burnham and Colton (1992).
11. PROBE-SAMPLE INTERACTIONS I N NON-CONTACT SCANNING FORCE MICROSCOPY A . Methodical Outline A general theory concerning the long-range probe-sample interactions effective in non-contact scanning force microscopy (SFM), i.e., at probesample separations well beyond the regime of overlap of the electron wave
52
U. HARTMANN
equation nonequil. thermdyn
generalized Derjaguin approximation
forces
1
t
.
General theorv of noncontactiru? SFM
*
FIGURE 2. Schematic of the approach toward a general theory of non-contact scanning force microscopy.
functions, is a rather ambitious project. Even in the absence of externally applied electro- and magnetostatic interactions, the approach has to account for various intermolecular and surface forces which are, however, ultimately all of electromagnetic origin. Figure 2 gives a survey of the different components which generally contribute to the total probe-sample interaction. In the absence of any contamination on probe and sample surface, i.e., under clean UHV conditions, an ever-present long-range interaction is provided by van der Waals forces. In this area theory starts with some well-known results from quantum electrodynamics. In order to account for the typical geometry involved in an SFM, i.e., a sharp probe opposite to a flat or curved sample surface, the Derjaguin geometrical approximation is used, which essentially reduces the inherent many-body problem to a twobody approach. Under ambient conditions surface contaminants, e g , water films, are generally present on probe and sample. Liquid films on solids often give rise to a surface charge, and thus to an electrostatic interaction between probe and sample. The effect of these ionic forces is treated by classical Poisson-Boltzmann statistics, where the particular probe-sample geometry is again accounted for by employing the Derjaguin approximation. If the probe-sample separation is reduced to a few molecular diameters liquids can no longer be treated by a pure continuum approach. The discrete molecular structure gives rise to solvation forces which are due to the long-range ordering of liquid molecules in the gap between probe and sample. Finally, capillary condensation is a common phenomenon in SFM under ambient conditions. In this area the well-known Laplace equation provides an appropriate starting basis. Capillary action is then treated in terms of two extreme approaches: While the first is for liquid films strictly obeying a
FUNDAMENTALS O F NON-CONTACT FORCE MICROSCOPY
53
thermodynamic equilibrium behavior represented by the Kelvin equation, the second approach is for liquids which are actually not in thermodynamic equilibrium. I t must be emphasized that the general situation in non-contact SFM is governed by a complex interplay of all the aforementioned contributions. The situation is further complicated by the fact that not all of these contributions are simply additive. The following detailed discussion relies on a macroscopic point of view. All material properties involved are treated in terms of isotropic bulk considerations, and even properties attributed to individual molecules are consequently deduced from the overall macroscopic behavior of the solids or liquids composed by these molecules. The considerations concerning the presence of liquids in SFM are of course not restricted to “parasitic” effects due to contaminating films, but in particular also apply to the situation where the SFM is completely operated in a liquid immersion medium or where just the properties of a liquid film, e.g., of a polymeric layer on top of a substrate, are of interest. B. Van der W a d s Forces 1. Generul Description of the Phenomenon
Macroscopic van der Waals (VDW) forces arise from the interplay of electromagnetic field fluctuations with boundary conditions on ponderable bodies. These field fluctuations result from zero-point quantum vibrations as well as from thermal agitation of permanent electronic multipoles and extend well beyond the surface of any absorbing medium - partly as traveling waves, partly as exponentially damped “evanescent” waves. According to this particular picture Lifshitz calculated the mutual attraction of two semi-infinite dielectric slabs separated by an intervening vacuum gap (Lifshitz, 1955/56). Since the Lifshitz “random field approach” involves a solution of the full Maxwell equations rather than of the simpler Laplace tin
substrate
substrate
FIGURF 3. Distribution of virtual photons associated with probe and sample. At close proximity an exchange of virtual photons takes place, giving rise to VDW interactions.
54
U. HARTMANN
equation, retardation effects are accounted for in a natural way. The well-known fundamental results of the London (Eisenschitz and London, 1930) and Casimir (Casimir and Polder, 1948; Casimir, 1948) theories are obtained as specific cases of this general approach. Since the VDW interaction between any two bodies occurs through the fluctuating electromagnetic field, it stands to reason that the following alternative viewpoint could be developed: As schematically shown in Fig. 3 for the typical probe-sample arrangement involved in SFM, the fluctuating electromagnetic field can be considered in terms of a distribution of virtual photons associated with probe and sample. Now, if both come into close proximity, an exchange of these virtual photons occurs, giving rise to a macroscopic force between probe and sample. This alternative viewpoint is actually the basis for a treatment of the problem by methods of quantum field theory. Using the formidable apparatus of the Matsubara-Fradkin-Green function technique of quantum statistical mechanics, Dzyaloshinskii, Lifshitz, and Pitaevskii ( 1961) rederived the Lifshitz two-slab result and extended the approach to the presence of any intervening medium filling the gap between the dielectric slabs. Subsequently, several other approaches to the general problem of electromagnetic interaction between macroscopic bodies, all more or less equivalent, have been developed by various authors (see, for example, Mahanty and Ninham, 1976). In the present context the most important aspect common to all this work is the following: On a microscopic level, the origin of the dispersion forces between two molecules is linked to a process which can be described by the induction of polarization on one due to the instantaneous polarization field of the other. However, this process is seriously affected by a third molecule placed near the two. The macroscopic consequence is that VDW forces are in general highly nonadditive. For example, if two perfectly conducting bodies (a perfect conductor may be considered as the limit of a London superconductor, as the penetration depth approaches zero) mutually interact via VDW forces, only bounding surface layers will contribute to the interaction, while the interiors of the bodies are completely screened. Thus, the interaction can certainly not be characterized by straightforward pairwise summation of isotropic intermolecular contributions, at least not in this somewhat fictitious case. However, it is precisely the explicit assumption of the additivity of two-body intermolecular pair potentials which is the basis of the classical Hamaker approach (1937). Granted additivity, the interaction between any two macroscopic bodies which have well-defined geometric shapes and uniform molecular densities, can be calculated by a simple double-volume integration. In spite of its apparent limitations, the Hamaker approach not
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
55
only has the virtue of ease in comprehension, but works over a wider range than would at first be thought possible. The most conspicuous result is that the additivity approach yields the correct overall power law dependence of the VDW interaction between two arbitrarily shaped macroscopic bodies on the separation between them. Although not rigorously proved, this appears to hold for the London limit, i.e., for a completely nonretarded interaction, as well as for the Casimir limit, i.e., complete radiation field retardation (Hartmann, 1990a). From the field point of view, the geometrical boundary conditions associated with the SFM’s probe-sample arrangement lead to tremendous mathematical difficulties in a rigorous calculation of VDW interactions, especially if retardation is included. Actually, several rather involved mathematical detours by various authors have shown that the key problem of a precise calculation of the magnitude of VDW forces as a function of separation of interacting bodies which exhibit curved surfaces can be solved fairly unambiguously only in some elementary cases involving spherical configurations (see, for example, Mahanty and Ninham, 1976). Because of all the aforementioned difficulties, it appears quite clear why a rigorous treatment of VDW interactions in SFM has not yet been presented. On the one hand, field theories are extremely complicated and tend to obscure the physical processes giving rise to the probe-sample forces. On the other hand, although two-body forces generally provide the dominant contribution, the explicit assumption of pairwise molecular additivity of VDW interactions of the many-particle system simply does not hold. The corrections due to many-body effects are generally essential in order to estimate whether the VDW interaction of a given tip-sample arrangement is within or well beyond the experimentally accessible regime. In what follows, a treatment of VDW interactions in non-contact SFM is proposed, which is based on elements of both the quantum field DLP theory and the Hamaker additivity approach. While some basic results from field theory provide an appropriate starting point, a characterization of material dielectric contributions, and a final analysis of the limitations of the developed framework, the additivity approach allows to account in a practical way, in terms of reasonable approximations, for the particular geometrical boundary conditions involved. In this sense the resulting model can best be referred to as a “renormalized Hamaker approach.” 2. The Two-Slab Problem: Separation of Geometrical and Material Properties The DLP theory (Dzyaloshinskii et al., 1961) gives the exact result for the electromagnetic interaction of two dielectric slabs separated by a third
56
U. HARTMANN
dielectric material of arbitrary thickness:
with j = I 2, and 7j(%
iVm1
P) = 47%
&ZiJPZ/C.
(2e)
In this somewhat complex expression, f ( z ) is the “VDW pressure,” i.e., the force per unit surface area exerted on the two slabs as a function of their separation z . kT is the thermal agitation energy, c the speed of light, and h Planck’s constant. p is simply an integration constant, and a , P, y, and 7 j are functions of p and the characteristic frequencies vm.The three media involved are completely characterized by their dielectric permittivities E ~ with j = 1,2,3, where “3” corresponds to the intervening medium. The summation in Eq. ( 1 ) entails calculating the functions tJ at discrete imaginmeans that only the first term of the sum has ary frequencies iv,, where to be multiplied by The dielectric permittivities at imaginary frequency are related to the imaginary parts of the dielectric permittivities taken at real frequency by the well-known Kramers-Kronig relation,
i.
(i)o
The imaginary parts of the complex permittivities €,([) = t;(E) + i c y ( < ) entering Eq. ( 3 ) are always positive and determine the dissipation of energy as a function of field frequency. The values of E, at purely imaginary arguments which enter Eqs. (1) and (2) are thus real quantities which decrease monotonically from their electrostatic limits E , ~to 1 for vrn-+ 00. Separation
,
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
57
of the entropic and quantum mechanical contributions involved in Eq. (1) can now simply be performed by considering the zero-frequency (rn = 0) and the nonzero-frequency contributions separately. In order to ensure convergence of &heintegral, it is wise to follow the transformation procedure originally given by Lifshitz (1956). With y = mp, one obtains
where
For m = 0, one thus has a. = 0 and PO = A130A230, where the latter quantity is determined by the electrostatic limit of A,,(iv) =
t,
(iv)- c3 (iv) iv) €3 (iv)’
€/(
+
given for v = 0. Using the definite integral
one finally obtains
58
U. HARTMANN
where 3 “ Am A m 130 m 3 230 e- 4- k TmC =l
incorporates all material properties in terms of the three static dielectric constants cia. A(.) characterizes the purely entropic contribution to the total VDW pressure given by Eq. (1) and involves a simple inverse power law dependence on the separation z of the two slabs. The zero frequency force is due to the thermal agitation of permanent electric dipoles present in the three media and includes Debye and Keesom contributions. For reasons of consistency with the following treatment of the quantum mechanical dispersion contribution the material properties are all incorporated into the socalled “entropic Hamaker constant” given by Eq. (8b). It should be noted that the latter quantity cannot exceed a value of [3<\3)/4]kT (C denotes Riemann’s zeta function), which is about 3.6 x J or 22.5meV for T = 300K. The maximum value is obtained for c10,c20 + 00 and €30 = 1, i.e., for two perfect conductors interacting across vacuum. According to Eq. (8b), the entropic Hamaker constant becomes negative if the static dielectric constant of the intervening medium is just in between those of the two dielectric slabs. This then leads via Eq. (8a) to a repulsion of the slabs. To discuss the dispersion force contribution resulting from zero-point quantum fluctuations, the nonzero frequency terms (m> 0) in Eq. (1) have to be evaluated. According to Eq. (2a), the discrete frequencies are given by 4 . 3 ~x IOl3Hz at room temperature. Since this is clearly beyond typical rotational relaxation frequencies of a molecule, the effective dielectric contributions according to Eq. (3) are solely determined by electronic polarizabilities. Absorption frequencies related to the latter are usually located somewhere in the UV region. However, with respect to this regime the urns are very close together. Thus, since one has from Eq. (2a) dm = (h/27rkT)dv, one applies the transformation (9) m=l
to Eq. (1) and obtains
where a(iv,p), P ( i v , p ) , and ~ ( u , i v , pare ) given by Eqs. (2b-e) - now, however, for a continuous electromagnetic spectrum. Since v l according
59
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
to Eq. (2a) is much smaller than the prominent electronic absorption frequencies, the spectral integration in Eq. (10) can be performed from zero to infinity. Following the DLP approach (Dzyaloshinskii et al., 1961), the asymptotic VDW pressuref(z + 0) is given from Eq. (10) by
with
Using the identity given in Eq. (7), one can rewrite this as
In almost all cases of practical interest, where some experimental results have to be compared with theory, restriction to the first term of the preceding sum should be sufficient, where corrections due to higher-order terms are always less than 1 - 1/[(3) = 16.7% of the m = 1 term. Equation (1 1) characterizes the dispersion contribution to the total VDW pressure acting on the two slabs in the London limit, i.e., in the absence of radiation field retardation at small separation z . The inverse power law dependence is exactly the same as in Eq. (8a). From the Hamaker point of view this is not so surprising, since intermolecular Debye, Keesom, and London forces all exhibit the same dependence on the separation of two molecules, l / r 7 (see, for example, Israelachvili, 1985). However, contrarily to the entropic Hamaker constant given by Eq. (8b) the "nonretarded Hamaker constant" according to Eq. (12b) now involves the detailed dielectric behavior of the three media through the complete electromagnetic spectrum. Since Hn is thus related to dynamic electronic polarizabilities, while He is related to zero frequency orientational processes, there is generally no close relation between both quantities. In the opposite limit of large separation between the two dielectric slabls, the asymptotic VDW pressure f ( z 4 m) obtained from Eq. (10) is given according to the DLP result (Dzyaloshinskii et al., 1961) by
-
60
U . HARTMANN
cr(0,p) and p(0,p) are again given by Eqs. (2b-d), but now in the static limit of the eIectronic polarizability. Using Eq.(7) one can rewrite the preceding as
Equation (13) characterizes the VDW pressure due to zero-point quantum fluctuations in the Casimir limit, i.e., for total radiation field retardation. A glance at Eq. ( 1 1) shows that, as in the case of two interacting molecules (Casimir and Polder, 1948; Casimir, 1948), retardation leads to an increase of the power law index by unity. However, the material properties now enter through Eq. (l4b) in terms of dielectric permittivities E,(O), j = 1,2,3, depending on the electronic polarizabilities in the electrostatic limit. Thus, ~ ~ (must 0 ) not be confused with orientational contributions E , ~determining the entropic Hamaker constant in Eq. (8b). H,[cl (0),~ 2 ( 0 ) c3(0)] , is called the “retarded Hamaker constant.” In spite of having already performed a tour de force of rather lengthy calculations, one is still at a point where one only has the VDW pressure acting upon two semi-infinite dielectric slabs separated by a third dielectric medium of arbitrary thickness. However, this is actually still the only geometrical arrangement for which a rigorous solution of equations of the form of Eq. (1) has been presented, which is equally valid at all separations and for any material combination. Without fail this means that the adaption of the preceding results to the SFM configuration must involve several serious manipulations of the basic results obtained from field theory. A certain problem in handling the formulae results from the convolution of material and geometrical properties present in the integrand of the complete dispersion force solution in Eq. (10). A separation of both, as in the case of the entropic component given by Eq. (sa), is only obtained for the London and Casimir limits characterized by Eqs. (1 1) and ( I 3), respectively. However, a straightforward interpolation between both asymptotic regimes is given by Hn tanh (x132/z> f(2)=- 67r 23
3
where (16) is a characteristic wavelength which indicates the onset of retardation. X132is determined by the electronic contributions to the dielectric permittivities via the quotient of the nonretarded Hamaker constant, according to Eq. (12b), and the retarded constant, according to Eq. (14b). This approximation is based on the assumption that Hn and H, have the same sign. It turns out x132
= 6.1rHr/Hn
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
61
that this assumption does not hold for any material combination of the two slabs and the intervening medium (see concluding remarks in Section II.B.5). It is fairly obvious that Eqs. (15) and (16) combined immediately yield the London and Casimir limit. This simple analytical approximation of the complex exact result, Eq. (lo), provides an accuracy which is more than sufficient for SFM applications. If entropic contributions are included, the total VDW pressure then is given by He + H , tanh -
This latter result shows that, while retardation causes a transition from an initial 1 / z 3 to a 1/z4 distance dependence of the dispersion contribution, the interaction is dominated by entropic contributions at very large separations, giving again a l/z3 inverse power law (Hartmann, 1991a). However, as will be shown later, this phenomenon is well beyond the regime which is accessible to SFM. 3 . Transition to Renormalized Molecular Interactions
The macroscopic DLP theory (Dzyaloshinskii et al., 1961) can be used to derive the effective interaction of any two individual molecules within two dielectric slabs exhibiting a macroscopic VDW interaction. Accounting for an intervening dielectric medium of permittivity e3 (iv),the intermolecular force is given in the nonretarded limit by F,(z) = - A / z I,
(18a)
where z is the intermolecular distance and
a;(&)are the dynamic electronic “excess polarizabilities” of the two interacting molecules in the immersion medium. For c3 = 1, i.e., interaction in vacuum, the 0;(iv)’s become the ordinary polarizabilities ctj(iv) of isolated molecules, and Eqs. ( 1 8) are identical with the well-known London formula (Eisenschitz and London, 1930). On the other hand, the retarded limit gives (Dzyaloshinskii et al., 1961; Israelachvili, 1972a) FJZ) =
with
-s/z 8 .
(19a)
62
U . HARTMANN
where the electronic contributions now have to be considered in their electrostatic limits. For c3 = 1 and a; = a,(O), the preceding result coincides with the classical Casimir-Polder result (Casimir and Polder, 1948; Casimir, 1948). Since these results have been derived from the macroscopic DLP theory (Dzyaloshinskii et al., 1961), the excess electronic polarizabilities reflect molecular properties that are generally not directly related to the behavior of the isolated molecule, but rather to its behavior in an environment composed by all molecules of the macroscopic arrangement under consideration, e.g., of the two-slab arrangement. The molecular constants A and B thus involve an implicit renormalization with respect to the dielectric and geometrical properties of the complete macroscopic environment. This means in particular that a;(iv) is not solely determined by the overall dielectric permittivities of all three media involved, but varies if for a given material combination only the geometry of the system is modified. Consequently, if a;(iv)is considered in this way, it involves corrections for many-body effects. Using the intermolecular interactions given in Eqs. (18) and (19) within the Hamaker approach (Hamaker, 1937), which involves volume integration of these pairwise interactions to obtain the macroscopic VDW force, yields the correct result if A and B are renormalized in an appropriate way. If, for example, the excess dielectric polarizability a;(iv) of a sphere of radius R and permittivity c,(iv), a; (iu) = 47rc0c3( i ~ ) 2 ~ ,(iv) 3 R
(20a)
with
is introduced into Eqs. (18) and (19) for two spherical particles separated by a distance d, one obtains the accurate result for the macroscopic dispersion interaction of the particles in the London and Casimir limits, respectively: H , R:RI F”(d)= - - _ _ 67r d’ ’
where the nonretarded Hamaker constant is given by
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
63
and
with retarded Hamaker constant 161hc Z A I 3 (0)2A23(0)
For an arbitrary geometrical configuration consisting of two macroscopic bodies with volumes VI and V 2 , the Hamaker approach is given by the sixfold integral
where Fn,rdenotes the nonretarded or retarded macroscopic dispersion force and fn,, is the renormalized two-body intermolecular contribution according to Eqs. (18) and (19). Equation (23) applied to the two-slab arrangement yields
F,(d) = - TPl P2A 36d3 ~
and F,(d) = -
X P I P2B ~
70d4
’
where pI and p2 are the molecular densities. Comparison of Eqs. (24a) and ( 1 1 ) as well as of Eqs. (24b) and ( 1 3) yields the effective molecular constants A and B in terms of their “two-slab renormalization:”
and
Using Eqs. (12b) and (28b), one obtains from Eq. (25a) with reasonable accuracy pc$(iv) = 2 ~ ~ ~ ~ ( i v ) A ~ ~ ( i v )
(26)
for the effective excess dynamic polarizability of an individual molecule ‘7,’’ where A,,(iv) is defined in Eq. (6). Employing this result in a threefold Hamaker integration and using Eqs. (20), the nonretarded interaction between a small particle or a molecule “2” and a semi-infinite dielectric slab
64
U. HARTMANN
“1” is given by
~1
Hn -, F n ( d )= - 67r d 4
with
The corresponding result for the retarded interaction can easily be derived from the original DLP work (Dzyaloshinskii et al., 1961):
with
The result holds for arbitrary dielectric constants q ( 0 ) .If especially ~ ~ (is0 ) sufficiently small (I5), the preceding result simplifies to H,
=
23hc 40.1r2m
If one has a metallic half space, is simply given by
(0) + 00, the retarded Hamaker constant
3hc - 47r2JE30
H -
(o)2A23(o)
2A23
(O).
While Eqs. (25) are ultimately the basis for the renormalized Hamaker approach used in the following, Eqs. (27) and (28) play a role in modeling processes of molecular-scale surface manipulation involving physisorption of large nonpolar molecules (see Section II.B.9). Equations (22) and (28) are finally used to check the limits of the presented theory as provided by size effects (see Section II.B.8). In order to analyze the behavior of a large molecule near a substrate surface, it is convenient to extend the somewhat empirical interpolation
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
65
given by Eqs. ( 1 5) and (16) to the particleesubstrate dispersion interaction. Equations (27) and (28) can then be combined as Hn F(d) = - R23
tanh(X132/d)
67r
d4
1
with the retardation wavelength given by A132 =
Hr 6~ -. Hn
This approach is valid for d >> R. 4. The Efleect qf Probe Geometry
In order to model the probe-sample interaction in SFM, the general expression for the VDW pressure previously obtained now has to be adapted to the particular geometrical boundary conditions involved. Since the actual mesoscopic geometry of the employed sharp probes, i.e., the shape at nanometer scale near the apex region, is generally not known in detail, it is convenient to analyze the effect of probe geometry by considering some basic tip shapes exhibiting a cylindrical symmetry. Additionally accounting for a certain curvature of the sample surface, one obtains the geometrical arrangement shown in Fig. 4. The force between the two curved bodies can be obtained in a straightforward way by integrating the interaction between the circular regions of infinitesimal area 27rxdx on one surface and the opposite surface, which is assumed to be locally flat and a distance C = d + zI z2 away. The error involved in this approximation is thus due to the assumption of local flatness of one surface usually of the sample surface, since the probing tip should be much sharper. However, since the VDW interaction according to Eq. (17) exhibits an overall l / z 3 distance dependence at small separations, those contributions of the force field in Fig. 4 involving increasing distances to the probe’s volume element under consideration exhibit a rapid damping with respect to near-field contributions. This effect is further enhanced by
+
force field
sample FlCiuRE
4. Basic geometry in the Derjaguin approximation
66
U. HARTMANN
radiation field retardation gradually leading to a 1 / z 4 inverse power law for large distances as given by the z + 00 limit of Eq. (17). According to Fig. 4, the VDW force between probe and sample is given by
where f ( 5 ) is simply the previously obtained VDW pressure between two slabs separated by an arbitrary medium of local thickness <, with <(x) = z I(x) z 2 ( x )+ d, and d is the distance between the apices of probe and sample. The relation between the cross-sectional radius x and the vertical coordinate z is given by
+
x = z tan 4,
for a cone with half-cone angle
(31a)
4,
for a paraboloid with semiaxes R , and R,,
for an ellipsoid with the preceding semiaxes. To unify calculations, it is convenient to define an effective measure of curvature by R
=
{
tan4 R:/2R,
(cone) (paraboloid).
R:/R,
(ellipsoid)
Combining Eqs. (31) and (32), one immediately obtains xdx =
(cone) (paraboloid, ellipsoid)
(33a)
with
EX = R * R 2 / ( R 1 +R2),
(33b) where R I and R2 characterize the curvature of probe and sample, respectively. Inserting these substitutions for xdx into Eq. (30) yields
for two opposite conical surfaces and F(d) = 2xRu(d)
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
67
for two opposite paraboloidal or ellipsoidal surfaces, respectively.
44 =
1;
dCf (0
(35)
is the total VDW energy per unit area of two flat surfaces separated by the same distance d as the apices of probe and local sample protrusion (see Fig. 4). For the special case of two interacting spheres with radii R l , R2 >> d, the preceding treatment is known as the Derjaguin approximation (Derjaguin, 1934). It should be emphasized that it is not necessary to explicitly specify the type of interaction f ( C ) which enters Eq. (30). The Derjaguin formulae (34) are thus valid for any type of interaction law, whether attractive, repulsive, or oscillating. In order to check the effect of probe geometry in detail, the dispersion pressure given by Eqs. (15) and (16) is inserted into Eqs. (34) and (35). One thus obtains for the conical arrangement
dx In cosh x.
-
In the nonretarded and retarded limits, wheref(C) in Eqs. (34) and (35) follows a simple inverse power law I/
used in Eq. (30) yields the particularly simple results
and F,(d) = -7rH,R2/3d2 (38b) as nonretarded and retarded limits of Eq. (36). According to Eq. (34b), the dispersion force for two opposing paraboloidal or ellipsoidal surfaces is given by
F(d)= - -
-
~
'132
-1 1
x132
Xi321d 0
Expansion of the ln(cosh) terms for large and small arguments leads to the nonretarded and retarded limits given by F,(d)
=
-H,,R/6d2
(40a)
and
F,(d) = -27rH,R/3d3.
(40b)
68
U . HARTMANN
If the sample locally exhibits an atomically flat surface, Eq. (33b) simplifies to lim W = R , RZ+X
where, according to Eq. (32), R is the effective radius of apex curvature for a paraboloidal or ellipsoidal probe and R = tan 4 for a conical probe. Apart from describing the probe-sample dispersion interation, Eq. (39) also characterizes the adsorption of a large nonpolar molecule or small particle on an atomically flat substrate surface. Directly at the surface, one has R >> d, and the nonretarded dispersion interaction is given by Eq. (40a). On the other hand, if the particle is initially far away from the substrate, R << d, the interaction is given by Eqs. (29). This involves a nonretarded transition of the dispersion force from a l / d 2 dependence at small distances to l i d 4 dependence at large distances, and finally a transition to l i d 5 at very large distances, which is due to retardation. If the retardation wavelength A132 of the particleesubstrate arrangement is assumed to be independent of the particle's distance from the surface, i.e., if A132 is the same in eqs. ( I S ) and (29a), then the particle-substrate dispersion interaction is modeled by
(9) 2
F ( d ) = 27rRw(d) tanh
,
where R characterizes the dimension of the particle according to Eq. (32), and w ( d ) is the specific energy obtained for the two-slab system as given in Eq. (35). The nonretarded l / d 2 to l i d 4 transition is determined by the transition length 1 H , from Eq. (27b) H , from Eq. (12b)'
K
which is more or less close to R. It can easily be verified that Eqs. (42) satisfy the limiting results given in Eq. (40a) for d << R and in Eq. (29) for d >> R. Equations (42) allow the modeling of particle or molecule physisorption processes if the involved dispersion interactions are governed by bulk dielectric properties. Figure 5(a) shows the decrease of the dispersion force for increasing working distance for a conical probe according to Eq. (36) and for a paraboloidal or ellipsoidal probe according to Eq. (39). The curve for the interaction of two slabs is given by Eq. (15), and the physisorption curve for a small particle or large molecule with a flat surface is obtained from Eqs. (42), where xf32/x132 = 0.1 was used as a somewhat typical nonretarded transition length. For reference, the interaction between small particles with
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
69
F I G U R 5E. Dispersion interaction for some rundamental arrangements. [a) shows the normalized force as a function o f separation, where denotes the material-dependent retardation wavelength. (b) shows the corresponding differential power law index to be used in an inverse power law ansatz for the distance dependence of the dispersion force.
U.HARTMANN
70
radii R , and R2 - according to Eqs. (21a) and (22a) modeled by F ( d ) = -(Hn/6r)R:R2 tanh(XI3,/d)/d7, with Xi32 given by Eq. (16) - is also indicated. If one performs the transformation (Hn/6r)R:R: A and B / A , with the molecular constants A and B given by Eqs. (l8b) and ( 19b), the latter curve corresponds to the intermolecular dispersion force and includes retardation effects. Even if all these formulae for the dispersion force involve the same material-dependent retardation wavelength XI 32, the gradual onset of retardation effects is clearly determined by the mesoscopic geometry of the interacting bodies (Hartmann, 1991b). This phenomenon is clarified by considering the differential power law index, -+
-+
k(d) =
-
da dd
- InF(d),
(43)
which has to be applied if the VDW force F ( d ) is approximated for a given distance d by an inverse power law of type F ( d ) N 1/dk(&). Application of Eq. (43) yields for the two-slab arrangement the simple result
wheref(d) is the VDW pressure according to Eq. (15). For a paraboloidal or ellipsoidal SFM probe, one obtains from Eq. (34b) k ( 4 = df(d)/w(dh
(44b) wheref(d) and w ( d ) are the VDW pressure according to Eq. ( I S ) and the VDW energy per unit surface area according to Eq. (35) obtained from the two-slab arrangement. The result for the conical probe is obtained from Eq. (34a):
The preceding results describe in detail the geometry-dependent transition to retardation for an extremely blunt probe (cylindrical, i.e., two-slab arrangement), for a realistic probe type (paraboloidal or ellipsoidal), and for the limit of an atomically sharp probe (conical). The distance dependence of the differential power law index according to Eqs. (44) is shown in Fig. 5(b). Additionally, the physisorption behavior of a small particle or large molecule onto a flat surface, obtained by applying Eq. (43) to Eq. (42a), is indicated. In the present context, the most important result from Fig. 5 is that VDW forces drop with an l/d2 inverse power law for the most realistic probe geometries, i.e., for paraboloidal or ellipsoidal apices, in the nonretarded
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
71
limit. This clearly indicates a long-range probe-sample interaction in comparison with the exponential decrease of a tunneling current or the short-range manifestation of interatomic repulsive forces resulting from electronic orbital overlap. Consequently, the spatial resolution which can be obtained in VDW microscopy should be determined by the mesoscopic probe dimensions at nanometer scale near the probe's apex, and of course by the probe-sample separation (Hartmann, 1991b). An estimation of the lateral resolution is obtained by determining the probe diameter A ( d ) corresponding to the center of interaction. The latter is determined by the maximum contribution of the integrand in Eq. (30), i.e., by
Accounting for the l/Ck dependence off,((), this can be replaced by the more convenient form k
1--x--0. ax
<
According to Eq. (1 1) the nonretarded VDW pressure involves k = 3. Using the geometrical relations given in Fig. 4, one obtains the remarkably simple results
A , ( d ) = 24-d
(47a)
for the resolution of a paraboloidal or ellipsoidal probe, and
A , ( d ) = Rd
(47b)
for a conical probe. R is related by Eqs. (32) and (33b) to the effective dimensions of probe apex and local sample protrusion. For an atomically flat sample, R + R simply characterizes the sharpness of the probe. While the minimum resolvable lateral dimension for a conical probe is proportional to the working distance d, paraboloidal or ellipsoidal probes exhibit a square root dependence on the working distance. While Eqs. (47) give the lateral resolution in VDW microscopy in terms of simple analytical results, more elaborate solutions can only be obtained numerically (Moiseev et al., 1988). Apart from quantifying the microscope's resolution, Eqs. (47) additionally tell us that the resolution is independent of material properties in the considered nonretarded limit. This is, however, not valid if retardation becomes effective. In this case the power law index k in Eq. (46) becomes distance- and material-dependent. Finally, it should be emphasized that the solutions obtained largely analytically for forces, power law indices, and lateral resolutions by using the approximate result for the dispersion pressure in Eq. (15) can be
72
U. HARTMANN
numerically obtained in an exact way by using the DLP result from Eq. ( l o ) , whenever the dispersion pressure f ( d ) for the two-slab arrangement is needed. Consequently, the corresponding specific dispersion energy in Eq. (35) has to be calculated by directly integrating Eq. (10). The result is then
+ In [ I
-
P(iv,p)exp
(-du,iv,P))l),
where a, p, and q are defined as in Eq. (10) for z = d. The entropic component always additionally present is rigorously given by Eq. (8). Integration of the latter equation immediately yields the entropic VDW energy w e ( d ) per unit surface area. Equations (8), (lo), and (48) then provide the general framework for a rigorous numerical calculation of probe-sample forces via Eqs. (34), of power law indices via Eqs. (44), and of lateral resolutions at any working distance via Eq. (45). However, the major advantage of the analytical treatment just presented is that it emphasizes the physical processes giving rise to VDW interaction in SFM, while the rigorous numerical treatment ultimately based on Eq. ( 1 ) tends to obscure the basic physical aspects because of considerable mathematical complexities.
5. Dielectric Contributions: The Hamaker Constants Apart from the probe and sample geometry considered earlier, the magnitude of VDW forces in SFM is determined by the detailed dielectric properties of probe, sample, and an immersion medium which may be present in the intervening gap. The real dielectric permittivities taken at imaginary frequencies, cj(iu), enter the two-slab dispersion pressure in Eq. (15) via the nonretarded Hamaker constant H , and via the retardation wavelength XI3*. The latter quantity is determined according to Eq. (16) by the ratio of H , to the retarded Hamaker constant H,. For a given probe-sample geometry, H , and H , thus completely determine the magnitude of the resulting force as well as the onset of retardation effects. The following discussion is devoted to a calculation of the Hamaker constants in terms of only two characteristic material properties: the optical refractive index and the effective electronic absorption wavelength. The energy absorption spectrum of any medium for frequencies from zero through to the ultraviolet (UV) regime is characterized by jjm 7
(49)
73
F U N D A M E N T A L S OF NON-CONTACT F O R C E MICROSCOPY
where the first non-unitary term describes the effect of possible Debye rotational relaxation processes, and the second models absorption using a Lorentz harmonic oscillator model of the dielectric (see, for example, Mahanty and Ninham, 1976). The characteristic constants cjl, vjl, vim are given in pertinent tables of dielectric data. The damping coefficients rj, associated with the Lorentz oscillations are rather difficult to determine and are not known in most cases. However, since for dielectrics the widths of the absorption spectra are always small compared with the absorption frequencies, i.e., y,,,<< v,,, the term rmv/v$can be dropped to a satisfactory approximation in Eq. (49). If the static dielectric constant is denoted by f j 0 , and if one only has one prominent rotational absorption peak for v = vj,,ot and one prominent electronic absorption peak for v = vj,,, the preceding may be written as
,fi,,,,
where n is the optical refractive index. While vrot is typically given by microwave or lower frequencies, v, is located in the UV regime, and for most materials of practical interest one has v, FZ 3 x lOI5Hz. If there are m individual electronic absorption frequencies, Eq. (50) has to be replaced by
where n,,.] + , = 1 . In the far-UV and soft x-ray regime, all matter responds like a free electron gas (see, for example, Landau and Lifshitz, 1960), and the response function changes to €,(ZV) =
1
+
(52)
where v, is now the free electron gas plasma frequency. This latter expression also characterizes approximately the dielectric permittivity of a metal from zero through the visible to the soft x-ray regime. In the intermediate regime between far UV and soft x-ray, there is little knowledge of ~(iv). However, some reasonable interpolation schemes may be constructed (Mahanty and Ninham, 1976). According to the preceding result, matter may roughly be subdivided into three classes of dielectric behavior, as shown in Fig. 6. For water, the simplest Debye rotational relaxation and some closely spaced infrared (IR) bands lead to variations of ~ ( bbelow ) the UV regime. Thus, ~ ( b ) has to be evaluated according to Eq. (51), conveniently using effective
74
U. HARTMANN 1
log v/Hz FIGURE 6 . Dielectric permittivity on the imaginary frequency axis as a function of real frequency for water, typical hydrocarbons, and typical metals. v, = 3 x lo” Hz is taken as the prominent electronic absorption frequency. The visible regime is indicated for reference.
values for refractive indices and absorption frequencies in the IR and UV regime (see, for example, Mahanty and Ninham, 1976), respectively. On the other hand, typical hydrocarbons (liquid or crystallized) exhibit a constant E(1’v)from zero frequency through the optical regime. The complex absorption spectrum in the near-UV regime is conveniently summarized by taking mean values corresponding to the first ionization potential (Mahanty and Ninham, 1976). In this case, cj(iv) is simply approximated by Eq. (50), where only Lorentz harmonic contributions have to be considered. The third class of dielectric behavior belongs to metals. In this case cj(iv) is simply given by Eq. (52), where typical plasma frequencies are 3-5 x 1015Hz (see, for example, Israelachvili, 1985), and cjo 4 00. According to Eqs. (6) and (12b), the nonretarded Hamaker constant H , is determined by the dielectric response functions of probe, sample, and intervening medium given according to Eqs. (50)-(52). Figure 7 shows the spectral contributions to the VDW interaction for some material combinations of practical interest. The dispersion force in the nonretarded regime is directly proportional to the area under a curve. The Hamaker constant is obviously most sensitive to spectral features between about 1 and 10-20 eV. This involves, for example, the widths of typical band gaps in semiconductors. The maximum H,, is found for two typical metals interacting across vacuum. If one metal is replaced by mica, a representative dielectric, H , becomes
FUNDAMENTALS O F NON-CONTACT FORCE MICROSCOPY
3.0
I
*
-ll''oll
l'--rl'l
""""'
75
m "
~
across vocuurn across water
hv (eV) FIGURE 7. Spectral contributions to the nonretarded Hamaker constant H , for some material combinations of practical interest. H , is directly proportional to the area under a curve.
considerably smaller, and the maximum of the corresponding curve in Fig. 7 is slightly shifted to higher spectral energies. If both interacting materials are mica, H , further decreases, and the maximum is further shifted to the right. Two water films interacting across saturated air exhibit a still smaller H,, where the maximum is now slightly shifted to lower energies with respect to the mica/mica interaction. The long low-energy tail of the water/water curve is due to the IR spectral contributions, as discussed earlier. Filling the intervening gap between the two interacting media with water considerably reduces the H , values, respectively. This is not due to a Debye orientational process of the highly polar water molecules, but results from the dynamic electronic contribution e 3 ( i v ) > 1 to the response functions A13(iv)and A23(iv)defined in Eq. (6). The maxima of the curves in Fig. 7 are slightly shifted to lower energies with respect to the vacuum values. If each of the three media involved is characterized with respect to its dielectric permittivity by Eq. (50),and if all media exhibit approximately the same electronic absorption frequency v,, the nonretarded Hamaker constant H , according to Eq. (12b) can be evaluated analytically (Israelachvili, 198S), where sufficient accuracy is obtained if only the first term of the sum is considered. Since possible low-frequency rotational processes corresponding to the first term in Eq. (SO) are represented by the entropic constant He given according to Eq. (Sb), H , is simply given in terms of
U.HARTMANN
76 0.50.
1
I
I
0.40. 0.30
s" L
,"
\c I
0.20 0.10 ,
0.00 --0 1
0.25
I
1.oo
I
2.00
I
3.00
I0
FIGURE 8. Nonretarded Hamaker constant for dielectric systems as a function of optical refractive indices of probe, sample, and intervening medium ( n 3 ) . ve denotes the prominent electronic absorption frequency.
the three optical refractive indices rtj ( j = I , 2 , 3 ) for probe, sample, and immersion medium, and the absorption frequency v,:
The detailed behavior of H , as a function of the refractive indices is shown in Fig. 8. Let n2 be the smaller index with respect to the probe-sample ensemble. A reasonable range covering almost all dielectrics is given by 1/4 5 n 2 / n 3 5 4, where n 3 is the index of the intervening medium. If n 2 / n 3> 1, H , is always positive and is usually given by a point in between the curves for nl = n 2 and nl = 4. Most dielectric materials, however, lie in between the curves nl = n 2 5 2 and NI = 2. If the optical refractive index of the immersion medium matches that of either probe or sample, H , becomes zero and the nonretarded force vanishes. For n 2 / n 3 < 1, one has to distinguish between two regimes: If nl < n 3 , Hn is again positive and is located somewhere in between the curve n l = n 2 and the abscissa which corresponds to n 3 = n I . However, if nl > n 3 , H , becomes negative and, according to Eq. (1 I ) , the nonretarded dispersion pressure becomes repulsive. Almost all dielectrics in this regime lie in between the abscissa and the curve n1 = 4n2, while most of them are limited by the curve n, = 2n2.
FUNDAMENTALS O F NON-CONTACT FORCE MICROSCOPY
77
The main conclusions from the preceding analysis of H , are: (i) If probe and sample materials are interchanged, the nonretarded dispersion force remains the same. (ii) For interactions in vacuum or for identical probe and sample materials, the nonretarded force is always attractive. (iii) If probe and sample are made from dielectrics with different refractive indices n l and n 2 , and if there is an intervening medium with n3 > 1, the nonretarded force may be attractive (n3 < n l , n 2 or n 3 > n l , n 2 ) ,vanishing ( n 3 matches either nl or n 2 or both of them), or repulsive (nl < n 3 < n 2 or nl > n3 > n 2 ) . (iv) According to Fig. 8, most of the dielectric material combinations produce a Hamaker constant H , 5 hv,/lO. With a typical value of ve : 3x Hz, one has H , 5 2 x l o p i 9J (1.2eV). H , has to be added to the entropic Hamaker constant He defined in Eq. (8b) to obtain the total nonretarded VDW force given for the z << XI32 limit of Eq. (17). Depending on the static values E , ~( j = I , 2 , 3 ) , the entropic force component may also be attractive, vanishing, or repulsive, where signs and magnitudes of H , and He are generally not correlated. However, the maximum value found for He (see Section II.B.2) is only 1.5% of the limiting value given earlier for H,. This clearly implies that entropic VDW forces play only a minor role in SFM applications (Hartmann, 1991~). If either the probe or the sample is made from a typical metal, characterized by Eq. ( 5 2 ) , the nonretarded Hamaker constant is given by
where ve is the prominent absorption frequency of the system, n 2 the optical refractive index of the remaining dielectric, and n3 that of the intervening medium. Equation (54) permits an estimate of the maximum repulsive dispersion force that can be obtained: n 2 << n 3 yields H , = -(3/8fi)n;hve/(n,' n 3 ) , which gives for large 113 and ve = 3 x l o i 5Hz the upper limit lHnl 5 5 x J (3.1 eV). For two dielectrics with different absorption frequencies veI and ve2,interacting across vacuum, one obtains
+
which was already presented by Israelachvili (1985). If either probe or sample is a metal, one finds
H --h
"-8fi
3
78
U .HARTMANN
and, if both probe and sample are metallic, H - - t i 3- ,
" - 8Jz
veive2 vei ve2
(57)
+
which reduces to the particularly simple result H , = (3/16&)hv, (Israelachvili, 1985) if both metals have the same electronic absorption frequency. Assuming a free-electron gas plasma frequency of 5 x l O I 5 Hz, one obtains H , 5 9 x loi9J (5.4eV) as a realistic upper limit for metallic probeesample arrangements. The dependence of the nonretarded Hamaker constants on the electronic absorption frequencies of probe and sample is shown in Fig. 9 in detail. For given values of vel and ve2,the resulting H, is always highest if both probe and sample are metals. The metal/dielectric arrangement yields lower values depending on the optical refractive index n of the dielectric (either probe or sample). The lowest values of H , are obtained if probe and sample are dielectric, where the magnitude of the nonretarded dispersion force now depends on nl and n2. Anyway, an increase of the absorption frequencies veland ve2always leads to an increase of H,.
- dielectric/dielectric --
rnetal/rnetal or rnetal/dielectn'c
0.3
n 1 =4.0
u,/u, (metal/metal), (n+ ;
else
1)"*v2/v,
FIGURE 9. Nonretarded Hamaker constant H , as a function of the prominent absorption frequencies v, and v2 of the probe-sample combination. n, and n2 denote the optical refractive indices if dielectric materials are involved.
79
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
1.o
I
I
I
I
(a> 0.8-
-
0.6-
-
0.4-
-
0.2-
-
n
r3
c
0
'= . 6
Ng
5 I 0.0 0.0
I
-.6 0.0
I
I
0.2
0.4
I
0.4
0.2
I
I
I
0.6
0.8
1 .o
I
0.8
I
1 .o
0.6
"2ln3 FIGURE 10. Retarded Hamaker constant H , as a function of effective refractive indices (infrared and visible). (a) shows the positive values of H , , where both probe and sample have larger indices than the intervening medium (n3).(b) shows the situation if the indices of probe and sample are smaller than that of the immersion medium. The dotted lines indicate results from the low-permittivity analytical approximation. Both numerical and analytical results correspond to the first term of the infinite series involved.
80
U. HARTMANN
The preceding results obtained for H , are only part of the whole story. The total VDW pressure according to Eq. (1 7) is completely characterized if, apart from He and H , , the retarded Hamaker constant H , according to Eq. (14b) is also calculated. H , depends on the static electronic limits ~ ~ (of0 ) the dielectric response functions of probe, sample, and intervening medium. Since the relative magnitudes of ~ ~ ( (0j = ) 1’2’3) are in general related neither to the overall behavior of the functions t,(iv) [Eq. (12b) via (6)] over the complete electromagnetic spectrum, nor to the quasistatic orientational contributions ejo [Eq. (8b) via (6)], H , is apriori not closely related to H , and He with respect to sign and magnitude. Apart from the magnitude of the dispersion pressure in the retarded limit given by Eq. (13), H , determines together with H , via the retardation wavelength [Eq. (16)] the onset of retardation effects. The electrostatic limits of the electronic permittivity components are given from Eq. (51) by cj(0)total
2
2
- (cjo - nil) = njl
+
ej(0)electronic.
(58)
As for most hydrocarbons (see Fig. 6) the njI’soften equal the usual optical refractive indices nj. However, as in the case of water, which is of particular practical importance for many SFM experiments, nil is sometimes determined by lower-frequency (IR) absorption bands. However, introduction of generalized refractive indices ni ranging from unity to infinity in Eq. (l4b) permits a unified analysis of H , for all material combinations, i.e., metals and dielectrics. The resulting values of H,, as depending on the individual refractive indices nil = n,, are shown in Fig. 10. Let n 2 be the smaller index for the probe--sample system under consideration. n 3 is the index of the intervening immersion medium. If n 2 > n 3 (Fig. IOa), H , is always positive, and its magnitude is given by a point in between the curves for n l = n 2 and nl + 00. For n l , n 2+ 00 (two interacting metal slabs), one obtains from Eq. (14b) H
n hc - 480
n3’
(59)
which gives, according to Eq. (13), a retarded dispersion pressure which is completely independent of the nature of the employed metals - a property that does not hold for small distances, where the dispersion force according to Eqs. (11) and (12b) depends on higher-frequency contributions to the dielectric response functions which are generally different for different metals. For n 3 = 1, Eq. (59) coincides with the well-known Casimir result (Casimir, 1948). If only typical dielectric materials are involved, Eq. (14b) may be evaluated analytically (Israelachvili, 1972a) by expanding cr(0,p)
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
81
and P ( 0 , p ) for small ~,(o)/c~(O),j = I , 2. The approximate result is
with A,, according to Eq. (6). The validity of this approximation, depending on the magnitudes of n l / n 3 and n2/n3, can be obtained from Fig. 10. For two interacting metals, i.e., for A,,(O), A23(0) + 1, the low-permittivity approximation still predicts the correct order of magnitude for H I . More precisely, Eq. (60) yields 69C(4)/(27r4) M 38% of the correct value given by Eq. (59). If one has nJ/n3= 1 at least for one of the quotients ( j = I , 2), H , becomes zero and the retarded dispersion force vanishes. If n 2 < n 3 (Fig. lob), one has to distinguish between two regimes: If also nl < n3, HI is again positive and is located in between the abscissa, corresponding to n l = n 3 , and the curve nl = n 2 . The maximum value of this latter curve is obtained for n , /n3, n 2 / n 3 -.+ 0. The low-permittivity approximation, which underestimates the exact value, yields in this case again 38% of the value given by Eq. (59). On the other hand, if n l > n3, H , becomes negative and is given by some point in between the abscissa (nl = n 3 ) and the curve for nl + 00. The approximation for the minimum of this latter curve, obtained for n 2 / n 3 i 0, i.e., A13, A13 --t 1 in Eq. (60), gives a magnitude of 38% of the value in Eq. (59), which is, according to Fig. 10, an underestimate of the exact value. The maximum repulsive dispersion force that can be obtained for any material combination is obtained from the condition SH1/Sn3 = 0, where nl + CG is an obvious boundary condition to achieve high HI values. The use of Eq. (60) yields n 2 = nj(\/5 - 2)'12 and a maximum repulsive retarded dispersion force with a magnitude of about 22% of the value in Eq. (59). This is again slightly underestimated with respect to the exact value numerically obtained from Eq. (14b). It should be emphasized that the entropic Hamaker constant He scales with kT; the nonretarded constant with hv,; and the retarded constant with hc. The absolute maximum obtained for H , is for two metals interacting across vacuum and amounts according to Eq. (59) to H I = 1.2 x Jm (7.4eVnm). Comparison of Eqs. (60) and (53) confirms that the previous statements (i)--(iii) characterizing the behavior of the nonretarded force can be directly extended to the retarded force, however, where one now has to consider the low-frequency indices nil (Eq. (51)) instead of the ordinary optical indices n,. If there is no absorption in the IR regime, the situation is simple, and = n, as in the case of hydrocarbons. However, strong IR absorption, as in the case of water, considerably complicates the situation: The relative weight of different frequency regimes (IR, visible, and UV) becomes a sensitive function of separation between probe and sample. At
82
U. HARTMANN
small distances (nonretarded regime) the interaction is dominated by UV fluctuations. With increasing distance these contributions are progressively damped, leading to a dominance of visible and then IR contributions. For very large separations the interaction would finally be dominated by Debye rotational relaxation processes. This complicated behavior may in principle be characterized by treating the different spectral components according to Eq. (50) additively in terms of separate Hamaker constants and retardation wavelengths. In the present context, the major point is that, because of a missing correlation between the magnitudes of nj and nj, H , and H , may have differrent signs, i.e., the VDW force may be attractive at small probesample separation and exhibit a retardation-induced transition to repulsion at larger separations, or vice versa. In this case the simple analytical approximation of the DLP theory given in Eqs. (17) breaks down. However, even in this case it is possible to keep the concept of separating geometrical and dielectric contributions. The DLP result from Eq. (1) may now be modeled by
where the definitions of He, H,, H,, and X132 remain totally unchanged. This
--
repulsive
-
Y
0
-
-3-
II
-
-6-
0.1
0.5
1.0
5.0
10.0
z / b FIGURE 1 I. Dispersion pressure for the two-slab configuration as a function of separation. If the system exhibits strong infrared absorption, a retardation-induced transition from attraction to repulsion (or vice versa) may occur. An overall attractive (or repulsive) interaction occurs if nonretarded and retarded Hamaker constants have the same sign.
83
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
formally implies the occurrence of negative retardation wavelengths which are obtained according to Eq. (16) if H , and H , have different signs. Equation (61) exhibits the same behavior as Eq. (17) for the zs)(A132 limits, but additionally demonstrates a retardation-induced transition between the attractive and repulsive regimes at z = A132, as can be seen in Fig. 1 1 . A completely blunt (cylindrical) SFM probe would detect a force exactly corresponding to the curves obtained for the VDW pressure. However, according to Eq. (34b), more realistic probe models (paraboloidal, ellipsoidal) predict a measurement of forces being proportional to the specific VDW energy given via an integration of Eq. (61). This implies that the transition distance measured with a paraboloidal or ellipsoidal probe is somewhat smaller than that measured with a cylindrical probe (Al32). The smallest transition separation is, according to Eq. (34a), obtained for a conical probe. The intriguing conclusion is that for a probe-sample interaction which does not involve a monotonic distance dependence, the force measured at a given probe-sample separation may be attractive for one probe and repulsive for another with different apex geometry. defined in Eq. (16) depends on the The retardation wavelength dielectric response functions of probe, sample, and intervening medium. Retardation effects of the radiation field between probe and sample become noticeable if the probe-sample separation is comparable with AIj2. The retardation wavelength is thus closely related to the prominent absorption wavelength A, = c / u , of the material combination, which is usually about IOOnm, i.e., within the UV regime. The actual onset of retardation effects, manifest in a gradual increase of the differential power law index k according to Eq. (43), is then for a given material combination determined by the probe geometry (see Section II.B.4). In the following, some simple analytical results for X132 are presented which allow a straightforward verification of the relevance of retardation effects for most material combinations of practical importance to SFM. Combining Eqs. (53) and (60), one obtains the retardation wavelength for a solely dielectric material combination. First-order approximation yields A132
=2 23& 1
207T
%I
(A n + v) ~ 3 ’(~ ,/-+{z,,. n,2 rill +n,, n,?
I=,
1
-
3
ye
)1. (62)
If the system does not exhibit effective IR absorption, i.e., n,l = nj ( j = 1,2,3), the product in parentheses reduces to unity and A132 is solely determined by the ordinary optical refractive indices and the prominent electronic UV absorption frequency. If the probe or the sample is
84
U. HARTMANN
metallic, Eqs. (54) and (60) yield the approximate result
where the product in parentheses again becomes unity in the absence of IR absorption. If dielectric probe and sample have different absorption frequencies and if they interact across vacuum, Eqs. (55) and (60) approximately give
which again simplifies for rill = nl. If either the probe or the sample is metallic, one obtains from Eqs. (56) and (60)
1.00 1
I
I
I
_ -
0.80-
<
0.700.60-
N
2
0.50-
4 metal/dielectrk
-
nnl=1.5 l=1.0 n1=2.0 n1=4.0
_
dielectric/dielectriC
0.10-
0.00
----------__________
1 1
_
7
I
I
2
3
I
4
I
5
"2 FIGURE 12. Retardation wavelength X13z as a function of the optical refractive indices of probe and/or sample interacting across vacuum. u, is the prominent electronic absorption frequency of the system of which the absence of infrared absorption bands is assumed. The upper limit provided by the metal-metal arrangement is indicated for reference.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
85
with the aforementioned simplification for n Z 1= n2. If a purely metallic probe-sample system interacts across vacuum, Eqs. (57) and (59) yield the exact result
which only involves the free-electron gas plasma frequencies as characteristics of the metals. If one in particular has clue, = c/ueZ= A,, amounts to 93% of A,. A glance at Eqs. (62)-(65) shows that this can be considered as an upper limit for any material combination with n,, = n, ( j = 1,2,3), i.e., for arrangements where IR absorption only plays a minor role. On the other hand, large values of A132 are obtained according to Eq. (16) if H , is nearly vanishing and H , is determined by IR absorption. Figure 12 shows typical values of A132 obtained in an accurate way by numerically solving Eqs. (12b) and (14b). The maximum value for A132 in a solely dielectric probe-sample arrangement is about 31% of A,. For a metal/dielectric combination this value amounts to 37%. Both values are considerably lower than the aforementioned value, which may be obtained for a metal/metal combination of probe and sample. Typical values of A132 are 20-35% of A, if one does not have a purely metallic arrangement.
v e d v e1 FIGURE13. Retardation wavelength as a function of the prominent ultraviolet absorption frequencies vel and ve2 of probe and sample. n l and n2 denote the ordinary optical refractive indices if dielectric materials are involved. The curves are valid for systems without effective absorption bands in the infrared regime.
86
U. HARTMANN
The rigorous solution for as a function of the prominent UV absorption frequencies involved is shown in Fig. 13. The minimum value of X132 for a metal/metal arrangement is about 46% of A,, = c/vel if ve2 + vel. For a metal/dielectric or dielectric/dielectric combination, XIj2 can be much smaller depending on the optical refractive indices involved. -1
1
1
, ' ' I
-
I: metal-air
-2-
-3-
-
E
-4-
-
'=.
-5-
-
-6-
-
-7-
-
n
N
C
5I C
0
0 -
I -
-8-9
II 111 IV V
-
-10,
I
1
5
8
I
" ' I
10
50
'
'
8
1
100
z (nm>
FIGURE 14. (a) shows the two-slab VDW pressure as a function of separation for some representative material combinations. (b) shows the corresponding retardation-induced increase of the differential power law indices.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
87
Systems with extremely low retardation wavelength may be constructed according to Eq. (62). Suitable material combinations consist of a dielectric probe, sample, and immersion medium with an appropriate choice of refractive indices, minimising X132. Most effective would be a match of the IR indices, n31 of the immersion medium with n i l and/or n21 of the probe-sample combination. Unfortunately, reliable IR data are not available for most materials. For material combinations which do not exhibit pronounced 1R absorption, the ordinary optical indices nj ( j = 1,2,3) should all be as large as possible, where a highly refractive immersion medium ( n 3 ) is especially effective. In this way, retardation wavelengths smaller than 10 nm are generated, which opens the way for an experimental confirmation of radiation field retardation effects by SFM (see Section II.B.6).
6. On the Observability of van der Waals Forces The framework for calculating VDW forces for any material combination and any probe geometry as a function of probe-sample separation is now complete. The material properties of a certain system are characterized by the three Hamaker constants He,H , , and Hr according to Eqs. (8b), (12b), and (14b). This includes the determination of the retardation wavelength via Eq. (16).The total VDW pressure f ( z ) for the two-slab arrangement is then given by Eqs. (17) or (61) in terms of a reasonable approximation. For relevant probe geometries, the VDW interaction is characterized by Eq. (34b), which involves the probe’s effective radius of curvature. An estimate of the resulting lateral resolution is obtained from Eq. (47a). Figure 14 shows the typical order of magnitude of the two-slab VDW pressure as well as the material-dependent onset of retardation effects for some representative material combinations. The dielectric data used for these model calculations are given in Table I. In the regime from 1 to TABLE 1 DIELECTRIC DATAUSEDFOR THE CALCULATIONS’
Metdl/air/rnetal Micaiairirnica H20pdir/H20 Hydrocarbon/air/hydrocarbon Mica/HzO/mica
40 10
3.7 7.1 2.0
0.30 0.17 0.29 0.04 0.2 1
130 9.3 4.5 8.7 2.0
61 20 23 23 17
’ For reasons of comparison, the present data are deduced from the basic data given by lsraelachvili ( I972 b, 1985). For water, infrared absorption contributions have been neglected.
88
U . HARTMANN
1-
I
1
“ “ I
probe radius: 1bOnrn
U
-5-6-
(a>
-7
I
1
5
‘
~~~1
--__
I I
i-.
I I
I I
I
I
I
I
10
50
’
r
-
3
1 I0
d (nm>
A
n
E
FIGURE15. VDW interaction of a IOOnm metal probe with a metal and a mica substrate under clean vacuum conditions, respectively. The retardation wavelengths X for the metal/metal and metal/mica configurations are indicated. The cntropic limit determines the absolute roomtemperature maximum for thermally agitated interaction contributions. Deviations from a linear decrease of the curves with increasing probe-sample separation reflect the gradual onset of retardation effects. The indicated experimental limits are accessible by state-of-the-art instruments. (a) shows the forces measured upon static operation of the force microscope and (b) the vertical force derivative, detected in the dynamic mode.
FUNDAMENTALS O F NON-CONTACT FORCE MICROSCOPY
89
100 nm separation, the pressure drops by six to seven orders of magnitude in ambient air (or vacuum). As mentioned before, two typical metals yield the strongest possible interaction. Mica, representing a typical dielectric material, yields a pressure of about 25% of the metal value in the nonretarded regime and of about 7% in the retarded limit, respectively. Crystallized hydrocarbons and water are most frequently the sources of surface contaminations. These media only exhibit a small VDW pressure with respect to the metal limit. Consequently, if initially clean metal surfaces become contaminated by films of hydrocarbons or water, the VDW interaction may decrease by 80-90% or more for a given width of the intervening air gap. If the complete intervening gap between two mica surfaces is filled with water, the VDW pressure drops with respect to the air (or vacuum) value by about 80%. The onset of retardation effects also critically depends on the system composition. The metal system yields the highest The hydrocarbon and water values are about the same. The retardation wavelength for two mica slabs in air (vacuum) is reduced by about 17% if the intervening gap is filled with water. As an example of direct practical relevance, Fig. 15 shows the VDW interaction between a realistic metal probe (paraboloidal or ellipsoidal)
mica fused quartz
I I
I I
"'I"
z 5
.
0.05
I
->
v
polystyrene
o
hydrocarb.
attractive
LL
o.oot
probe-sample distance: 2nm
1.2
1.3
1.4
1.5
1.6
1.7
n FIGURE 16. V D W force between a metal probe operated in a benzene immersion and with various dielectric substrates at a fixed probeesample distance. n denotes the sample's ordinary optical refractive index. For purposes of comparison refractive indices and absorption indices have been choscn according to Isrdelachvih (1985).
90
U. HARTMANN
with a mesoscopic radius of apex curvature of l00nm and two different atomically flat substrates; a typical metal and mica, representing a typical dielectric. Assuming an experimental sensitivity of 10 pN, which is not unrealistic for present-day UHV-SFM systems, forces should be detectable up to about 20nm for the metal sample and up to about 10nm for mica. Radiation field retardation becomes effective just near these probesample separations. The entropic limit, according to Eq. (8a) with He = 3.6 x J, indicates that thermally agitated VDW forces could only be measured at working distances 5 1 nm. In the dynamic or “ac” mode of SFM, the vertical force derivative F ’ ( d ) = 6F/Sd is detected. An accessible experimental sensitivity may be given by 10 pN/m. This extends the measurable regime up to about 70nm for the metal sample and up to about 50 nm for mica. According to Fig. I3b, this clearly involves the onset of retardation effects. Performance of SFM in an immersion medium generally offers the possibility to choose material combinations yielding attractive, repulsive, or just vanishing VDW interactions between probe and substrate. Assuming a metal probe operated in a benzene immersion at a fixed probe-sample separation, Fig. 16 gives the resulting VDW forces for various dielectric substrates as a function of the ordinary optical refractive index n of the sample according to Eq. (53). While polytetrafluoroethylene (PTFE), CaF2, and fused quartz with n < 1.5 produce repulsive nonretarded VDW forces, polyvinylchloride (PVC), polystyrene, and mica with n > 1.5 yield attractive forces. Crystallized hydrocarbons just match the index of benzene, n = 1.5, and the VDW force reduces to the small entropic contribution. 7. The Effect of Adsorbed Surface Layers The analytical solutions for the VDW pressure of the two-slab configuration given in Eqs. (17) and (61) allow straightforward extension to multilayer configurations. Figure 17 shows the basic geometry for two slabs “1” and
FIGURE 17. Basic geometry of the four-slab arrangement used to analyze the interaction of two bulk media which have surfaces covered with adsorbed layers.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
91
“2,” both with an adsorbed surface film “4.” An arbitrary intervening substance is denoted by “3.” If we extend previous results found for the nonretarded interaction (Mahanty and Ninham, 1976; Israelachvili, 1972a) to an analysis for arbitrary separations d and film thicknesses t41 and t42, the VDW pressure is given by .fad(Z) =.h34(z) -h41(z
+ t41)
-f342(z+
t42) + f i 4 2 ( z +
t41 f t42).
(67)
The subscripts of type kIm denote the material combination which actually has to be considered to calculate an individual term of this expression. k and m denote the opposite slabs, respectively, and 1 the intervening medium. The solution of this four-slab problem is thus reduced to the calculation of four “partial” VDW pressures involving four sets of Hamaker constants. Since these partial pressures have different entropic, nonretarded, and retarded magnitudes and varying retardation wavelengths, the distance dependence of .fad is generally much more complex than that for the two-slab arrangement, in particular, if there are fk,,-terms showing a retardation-induced 5.0
-
I ‘ ~ ~ PTFE/adsorbate/immersion 1
4.0
‘
~ ’‘ ’l ~ “ ‘ ‘ ‘ 1 ‘ , ‘ medium/adsorbate/mica 1
. T
3,O hydrocarbon adsorbate, H20 immersion
LL
0
2.0
LL
hydrocarbon adsorbate, vacuum
5 0 u-
---__ --
1.o
.-
H20 adsorbate,.,.’ vacuum
---
I
1
-“-\ __
I
1.o
r
’
“““I
10.0
‘
‘
curved surf
- slabs
_ _ C C
‘
-
~
‘
~
100.0
~r I
‘
‘
I
.
1( 0.0
FIGURE18. Model calculation showing the effect of adsorbed hydrocarbon (liquid or crystallized) or water layers on the interaction between polytetrafluoroethylene (PTFE) and a mica surface. The quotients f,d/f and F a d / F denote the force ratios obtained for adsorbatecovered surfaces with respect to clean surfaces, for planar and paraboloidally or ellipsoidally curved surfaces, respectively. The adsorbate thickness t is assumed to be the same on PTFE and mica. d denotes the width of the intervening gap, either for vacuum or water immersion. The curved- and planar-surface curves for hydrocarbon adsorbate in vacuum cannot be distinguished within the accuracy of the plot. The dashed curves would be detected with a typical probe in dc-mode force microscopy, while the solid lines reflect ac data.
92
U . HARTMANN
changeover between attractive and repulsive regimes according to Eq. (61). However, it follows immediately from Eq. (67) that f a d ( z )+ f 4 3 4 ( z ) for t 4 , / z lt 4 2 / ~+ 00; for large thicknesses of the adsorbed surface layers the VDW pressure is solely determined by the interaction of the layers ~ 0, one across the intervening medium. On the other hand, if t 4 1 / ~t, 4 2 / + immediately finds,fad(z) + f i 3 * ( z ) = f ( z ) , which is simply the solution of the two-slab problem according to Eq. (17) or (61). In the latter case the interaction is dominated by the interaction of the two bulk media across the intervening medium. Figure 18 exemplarily shows the considerable differences of the VDW interactions which occur if initially clean polytetrafluoroethylene (PTFE) and mica surfaces adsorb typical hydrocarbons (liquid or crystallized) or water. The adsorption of hydrocarbons slightly increases the vacuum forces. However, if the intervening gap is filled with water, the magnitude of the VDW forces increases by about a factor of four with respect to the interaction of clean surfaces across water. Water adsorption in air considerably reduces the forces with respect to clean surfaces. In all cases involving adsorbed surface layers, the bulk interaction value is not approached before the intervening gap exceeds the layer thickness by two to three orders of magnitude. This clearly emphasizes the fact that VDW interactions are highly surface-sensitive: Even a monolayer adsorbed on a substrate considerably modifies the probe-sample interaction with respect to the clean substrate up to separations of several nanometers. The situation is additionally complicated by the fact that the difference in VDW force measured between clean and coated substrate surfaces also depends on the probe geometry (see Fig. 18). This intriguing phenomenon is due to the integral equations (34) determining the probe-sample force from the two-slab pressure.
8. Size, Shape, and Surface EfSects: Limitations of the Theory The rigorously macroscopic analysis of VDW interactions in SFM implicitly exhibits some apparent shortcomings which are ultimately due to the particular mesoscopic, i.e., nanometer-scale, physical properties of sharp probes and corrugated sample surfaces exhibiting deviations from ordinary bulk physics. To obtain an upper quantitative estimate for those errors resulting from size and shape effects, it is convenient to apply the present formalism to some particular worst-case configurations for which exact results from quantum field theory are available for comparison. Two such arrangements which have been subject to rigorous treatments are two interacting spheres and a sphere interacting with a semi-infinite slab. These configurations do reflect worst-case situations insofar as the sphere of finite
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
93
size emphasizes geometrical errors involved in the Derjaguin approximation as well as shape-induced deviations from a simple bulk dielectric behavior. Size and shape effects occur when the probe-sample separation becomes comparable with the effective mesoscopic probe radius as defined in Eqs. (32). Since realistic probe radii are generally of the order of the retardation wavelengths defined by Eq. (16), the following analysis is restricted to the retarded limit of probe-sample dispersion interaction to obtain an upper boundary for the involved errors. However, extension of the treatment to arbitrary probe-sample separations is straightforward. According to the basic Hamaker approach given in Eq. (23), the dispersion interaction between a sphere of radius R1 and a single molecule at a distance z from the sphere’s center is simply given by 2 2
r’
where p 1 is the molecular density within the sphere and B the molecular interaction constant given by Eq. (19b). The interaction between two spheres separated by a distance d is then given by
where R2 and p 2 are radius and molecular density of the second sphere and is taken from Eq. (68). The interaction between a sphere and a semiinfinite slab is obtained without problems by analytically evaluating the preceding integrals and letting one of the radii go to infinity. However, from reasons clarified later, the limiting behavior for d >> R is more interesting in the present context. For two identical spheres ( R 1= R2 = R, pI = p2 = p ) , one obtains
f,(z)
16 F ( d ) = - -T 9
R6 p B s , d
and for the sphere-slab configuration ( R I = R , R2 F(d)= -
8 105
-T
2 2
+
m, p1 = p2 = p ) ,
R’
p B7 d
Both results have already been derived in Eqs. (22a) and (28a). If one now assumes that the screening of the radiation field by the near-surface molecules is the same as for the two-slab configuration, the microscopic quantity p 2 B is related to the macroscopic Hamaker constant by Eq. (25b). Especially for ideal metals, which may be considered as the limit of a London superconductor, as the penetration depth approaches zero, one
94
U. HARTMANN
obtains via eq. (59) for an interaction in vacuum 7n2 R 6 F ( d ) = - -hc 7 27 d
(72)
from Eq. (70), and r2 R3 F ( d ) = - - hc 3 (73) 90 from Eq. (71). However, these results are not completely correct, since the surface screening of the radiation field is affected by the actual curvature of the interacting surfaces. The correct result for the two-sphere configuration is obtained by using the Hamaker constant given in Eq. (22b). For perfectly conducting spheres, as considered in the present case, one has, apart from the electric polarizability, to account for the magnetic polarizability, which provides an additional contribution of 50% of the electric component to the total polarizability (see, for example, Jackson, 1975). Appropriate combination of electric and magnetic dipole photon contributions yields 2A13(0)2A23(O) = A2(0) = Ak(0) + AL(0) + (14/23)AEM(0)(Fienberg and Sucher, 1970; Feinberg, 1974), where A,(O) = 1 and A,(O) = are the pure electric and magnetic contributions, respectively, and AEM(0) = is due to an interference of electric and magnetic dipole photons. Inserting A2(0) = 143/92 into Eq. (22b) then ultimately leads to
1
4
which has been previously derived by a more involved treatment (Feinberg and Sucher, 1970; Feinberg, 1974). Comparison with Eq. (72) shows that the two-slab renormalization underestimates the sphere-sphere VDW force by about 19%, which is due to the reduced screening of the curved surfaces. For the sphere-slab arrangement, the rigorous result is obtained by using the Hamaker constant given in Eq. (28d) for a perfectly conducting metal sphere. Using 2A23(0)= this yields
5,
9 R3 F ( d ) = - -hc -, (75) 87r2 d S which is in agreement with a previous result (Datta and Ford, 1981) obtained by different methods of theory. A comparison with Eq. (73) yields a slight underestimate of about 4% due to the two-slab renormalization. At very small separations, d << R, the Derjaguin approach according to Eq. (40b) yields the correct results 7r 2 R F ( d ) = - -hc 7 1440 d
95
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
0.01
1 .oo
0.10
10.00
100.00
d/R
asymptotic limit
-15
(b)
I
'
~
~
.
'
~
-
.
'~ - I~ ' ' ~ - I
-
~
'
~
~
~
T
-
r
FIGURE 19. Retarded vacuum dispersion force between two spheres (a) and between a sphere and a semi-infinite slab (b). R denotes the sphere's radius, F the magnitude of the attractive force, and d the surface-to-surface distance. The upper curves correspond to perfectly conducting constituents, while the lower ones characterize identical dielectrics. The dashed lines indicate results from short- and long-distance approximations.
96
U. HARTMANN
for the sphere-sphere interaction, and 7r 2 R F(d) = - - hc -5 (77) 720 d for the sphere-slab interaction, where in both cases the Hamaker constant according to Eq. (59) has to be used; radiation field screening is about the same for planar and very smoothly curved surfaces. Comparison of Eqs. (74) and (75) with (76) and (77) shows that the dispersion force changes from l i d 3 to a l / d 8 dependence for the two spheres, and from a l/d' to a I /d5 dependence for the sphere-slab arrangement. Both cases are correctly modeled by the Hamaker approach according to Eq. (69), as shown in Fig. 19. For d >, R/10, the Derjaguin approximations exhibit increasing deviations from the Hamaker curves. Deviations in radiation field screening with respect to the two-slab configuration gradually occur and reach the aforementioned asymptotic values when the Hamaker curves approach the asymptotic limit. Figure 19 additionally includes results of the preceding comparative study for interacting dielectrics. In this case, surface screening is much less pronounced, as for perfectly conducting bodies. Thus, the Hamaker approach with two-slab renormalization yields almost accurate results at any interaction distance and for arbitrarily curved surfaces. The major conclusion that can be drawn from this worst-case scenario is that the maximum error due to surface screening of a probe with unknown electric and magnetic form factors amounts, at large distances, to 10% for an arbitrarily corrugated sample surface and to 4% for an atomically flat substrate. At ordinary working distances, d << R , and for dielectrics, geometry-modified screening is completely negligible (see also Mostepanenko and Sokolov, 1988). Another shortcoming of the present theory is that it implicitly neglects multipole contributions beyond exchange of dipole photons. In general, for probe-sample separations greater than about one nanometer, the exchange of dipole photons generally overshadows that due to dipole-quadrupole and higher multipole exchange processes. However, for smaller separations, as present in contact-mode SFM, and for some particular material combinations also at larger separations, multipole interactions assume increasing importance. For the retarded interaction of perfectly conducting spheres, the total force including the interference between electric and magnetic quadrupole photons (Feinberg and Sucher, 1970; Feinberg, 1974) is shown in Fig. 19 in addition to the pure electric dipole contribution. However, in most cases these corrections are of little relevance in the present context. A much more serious obstacle for a rigorous characterization of VDW interactions in SFM results from the explicit assumption of isotropic
FUNDAMENTALS O F NON-CONTACT FORCE MICROSCOPY
97
bulk dielectric permittivities of probe, sample, and immersion medium. Especially if probe and sample are in close proximity, this assumption is subject to dispute, in view of the microstructure of solids (Zarenba and Kohn, 1976) and liquids (see Section 1I.D) and the existence of particular surface states. This may require specific correlations to account for the particular molecularscale surface dielectric permittivities. However, unfortunately little reliable information on specific surface dielectric properties has become available so far. Further progress is also needed in appropriately treating the behavior of real metal tips and substrates. Especially for sharp probes and substrate protrusions, the delocalized electrons, moving under the influence of the radiation field fluctuations, require a specific nonlocal microscopic treatment (Girad, 1991). 9. Applicution of van der Wuuls Forces f o r Molecular-Scale Analysis and Surface Munipulution Manipulation of substrate surfaces by using scanned probe devices assumes an increasing importance. In particular, the deposition of individual molecules or small particles provides an approach to study microscopic electronic or mechanic properties and to achieve positional control of interaction processes. First experimental results (Eigler and Schweizer, 1990) imply that VDW forces may play an important role within this field. Figure 20 shows some proposals for the analysis and manipulation of small particles or molecules by a systematic employment of VDW interactions. A small particle or molecule physisorbed on a flat substrate may be moved in close contact to the substrate by a “sliding process,” as shown in the upper left image of Fig. 20. The VDW bonds between particle and substrate and between particle and tip have to ensure on the one hand the fixing of the particle between tip and substrate during sliding, and on the other hand the anchoring to the substrate during final withdrawal of the tip. A liquid environment permits the variation of the nonretarded Hamaker constant for the tip-substrate interaction over a wide range, preferably according to Eqs. (53) or (54). PTFE may be considered as a promising universal substrate material, since its optical refractive index ( n = 1.359) is lower than that of several liquids yielding repulsive interactions with respect to most tip materials. Especially for water immersion, n = 1.333, the interaction between PTFE and any tip material should almost vanish. PTFE can be easily modified to render its surface hydrophilic. Another surface manipulation process (Fig. 20, upper right) involves the elevation of the particle if the tip-particle VDW bond is stronger than that between particle and substrate. The particle can thus be transported over larger distances and obstacles. Deposition is performed at a place where the
c
FIGURE 20. Manipulation (upper row) and analysis (lower row) of small particles and molecules by systematically employing VDW interactions.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
99
substrate-particle interaction is stronger than the tip-particle interaction. For this purpose the tip should be dielectric, and the place of position of the PTFE substrate should have a higher optical refractive index than the tip. As was shown in Section II.B.7, even a monolayer coverage on top of the PTFE substrate can considerably increase the particle-substrate Hamaker constant and can raise the interaction above that between tip and particle. A liquid immersion again allows control of the perturbative tip-substrate interaction. The lack of really reproducible, well-characterized, and mesoscopically (nanometer scale) sharp probes in standard SFM systems is the most apparent obstacle for high resolution VDW measurements. The “molecular tip array” (MTA) philosophy developed by Drexler (1991) could alleviate this problem. The proposed geometry is shown in the lower left of Fig. 20. The sample, shaped as a small bead, is attached to the microscope’s cantilever by adhesion. Small particles or macromolecules are adsorbed on the flat substrate surface. Since the bead’s radius R’ is assumed to be very large compared to the effective molecular radius, which is according to Eq. (32) given by R = 2R:/R, for an ellipsoidal molecule with semiaxes R , and R,, an individial molecule underneath the bead images the gently curved surface upon raster scanning the substrate with respect to the bead. The lateral resolution follows directly from Eq. (47a) and amounts to A,(d)
=2
R . d m .
While the interaction between bead and molecule at arbitrary separations may be obtained from Eqs. (42), the total nonretarded VDW force for separations being large compared with the effective molecular radius, i.e., d > R:/R,, is, according to Eqs. (27), H n ~ 6 , F,(d) = - 67r R:d4‘
(79)
For A << R : / R , Eq. (40a) yields F,(d)
1 H,
--
1
6
~
R: R,d2‘
The substantial bead radius raises the issue of unwanted surface forces. The bead-substrate interaction is, according to Eq. (40a), 1 R* Fi(d)= - - H i 6 (2R, + d ) ”
with a Hamaker constant preferably according to Eqs. (53) or (54). The
100
U. HARTMANN
ratio of this “parasitic” force to the imaging force is, for d >> R $ / R , ,
For a close bead-molecule separation, d << R $ / R , , one obtains
The preceding quotients may be considered as “noise-to-signal ratios” and should be much smaller than unity. The considerable potential of MTA imaging is emphasized if one somewhat quantifies the preceding design analysis. For simplicity, arbitrary spherical macromolecules with R , = R , = 1 nm are assumed. A moleculebead separation of d = 1 nm separates the VDW interaction from shortrange forces due to orbital overlap. Under these conditions, Eq. (78) yields a lateral resolution of A, = 1.3 nm for the VDW imaging of the bead’s surface. Using a somewhat typical Hamaker constant of 1.5 x J (see Section II.B.5), the force according to Eq. (80) amounts to F, = 25pN, which is within reach of present technology. Suppression of parasitic forces F,’ requires, according to Eq. (83), a Hamaker constant H,* which is less than 4% of H , . This may easily be achieved by using PTFE substrates in combination with an aqueous immersion. Potential tip structures may predominantly include single-chain proteins, proteins with bound partially exposed ligands, or nanometer-scale crystalline particles (Drexler, 199I). The considerable capabilities of modern organic synthesis and biotechnology offer broad freedom in molecular tip design. MTA technology would permit the quasi-simultaneous use of a broad varity of tips scattered across the substrate. This may include tips of different composition, electric charge, magnetization, and orientation. A tip density of more than 1,000/pm2 has been considered as reasonable (Drexler, 1991). First results in obtaining suitable metallic bead-cantilever systems have been reported by Lemke et al. (1990). Apart from VDW imaging of the surface of a spherical sample at ultrahigh spatial resolution, the MTA technology may be well suited to obtaining a deeper insight into molecular electronics and mechanics (Fig. 20, lower right). Using an SFM with a conducting tip-cantilever system, simultaneous tunneling and force measurements may be performed on a single molecule. This may help to clarify the process of tunneling through localized electronic states in organic molecules by detecting the tip-moleculesubstrate tunneling current Zas a function of the tunneling voltage V and the force exerted on the molecule.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
101
10. Some Concluding Remarks The present analysis emphasizes that VDW forces play an important role in SFM. At probe-sample separations less than a few nanometers, the force drops according to F,(d) = - ( H , / 6 ) R / d 2 , where R is the probe’s effective radius of curvature, H , the nonretarded Hamaker constant, and d the probe-sample separation. A somewhat representative value for the force at r l = 1 nm is IF,I/R = 10mN/m. While the interaction is always attractive in dry air or vacuum, it may be attractive, repulsive, or even vanishing if the gap between probe and sample is filled with a liquid medium. Thermally activated processes generally play a minor role, and it is in almost all cases sufficient to analyze the dispersion part of the forces. Non-contact VDW microscopy is capable of providing information on surface dielectric permittivities at sub- 100 nm resolution. The technique is sensitive even to monolayer coverages of a substrate. Important future fields of application are the investigation of liquid/air (vapour) interfaces (Mate et al., 1989) and the imaging of soft (biological) samples including individual macromolecules. As in contact-mode SFM ( i e , AFM), where VDW forces have a substantial influence on the net force balance, and thus on the probeesample contact radius, the long-range interactions may play a role in other noncontact modes of operation, i t . , in electric and magnetic force microscopy, if these are performed at low working distances ( z 1 nm). However, in this latter context VDW forces may be reduced in a welldefined way by covering the sample surface with a suitable dielectric and/ or using an adapted liquid immersion medium. Finally, some open questions with respect to the general subject of VDW interactions in SFM should be listed: (i) In what way may the effective dielectric permittivities deviate from the assumed anisotropic bulk dielectric properties, especially for sharp metal tips? (ii) Is the present rigorously macroscopic treatment satisfactory down to probeesample separations which involve electron-orbital overlap, or is a special nonlocal microscopic treatment needed? (iii) May VDW forces be externally stimulated in a measurable way by electromagnetic irradiation, preferably at wavelengths between IR and UV? Such an excitation, beyond zero-point fluctuations, would permit the performance of “scanning force spectroscopy” as a technique to sense the spectral variation of surface dielectric permit tivities. (iv) Do excited surface states, i.e., surface plasmons (see, for example, Rather, 1988, as well as several articles on plasmon observation by
102
U . HARTMANN
STM) have a measurable effect on the probe-sample VDW interaction? These questions are considered as some major future challenges for elaborate SFM experiments on the VDW forces. Additional questions are concerned with the delicate interplay of VDW forces with other interactions to be discussed in the following. C . Ionic Forces 1. Probe-Sample Charging in Ambient Liquids
Situations in which VDW forces solely determine the probe-sample interaction in SFM are in general restricted to an operation under clean vacuum conditions. Under ambient conditions, which are often present in SFM experiments, long-range electrostatic forces are frequently additionally involved, and the interplay of these latter and VDW forces has important consequences. If wetting films are present on probe and sample or if the intervening gap is filled with a liquid, surface charging may come about essentially in two ways (see, for example, Israelachvili, 1985); (i) by ionization or dissociation of ionizable surface groups, and (ii) by adsorption of ions onto initially uncharged surfaces. Whatever the actual mechanism, the equilibrium final surface charge is balanced by a diffuse atmosphere of counterions close to the surfaces, resulting in the so-called “double layer,” (see Fig. 21). The electrostatic interaction between probe and sample is closely related to the counterion concentration profile.
FIGURE 21. Diffuse counterion atmosphere near the surfaces of two slabs which exhibit a certain surface charge density u. The intervening gap of thickness d contains a solution with a static dielectric constant c. Vo and X denote voltage and separation between fictitious centric planes of the near-surface counterion profiles.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
103
Since the intervening homogeneously dielectric gap between two semiinfinite equally charged slabs is field-free, the counterions do not experience an attractive electrostatic force toward the surfaces. The ionic concentration profile is solely determined by interionic electrostatic repulsion and the entropy of mixing, while the amount of surface charge only controls the total number of counterions. For simplicity, an identical charge density 0 is assumed on the opposite surfaces, and one also assumes electroneutrality of the complete two-slab arrangement. The resulting nonlinear second-order Poisson-Boltzmann differential equation (see, for example, Israelachvili, 1985) leads to a general form of the so-called contact value theorem, The ionic excess osmotic pressure is, for a given thermal activation energy, simply proportional to the excess counterion density, present directly in front of the surfaces of the charged slabs. p ( m ) is the ionic surface density for an isolated charged surface. Since the intervening liquid does not contain a bulk electrolytic reservoir, one has
where t is the static dielectric constant of the immersion fluid. The two-slab counterion surface concentration amounts to
where n denotes the ionic valency. The characteristic length X depends, for a given valency and for given values of and E , on the separation d of the slabs and has to fulfill the following condition: 2kT neX
d 2X
-tan -
+
u
-=
tc0
0.
(87)
Insertion of Eqs. (85) and (86) into Eq. (84) shows t h a t f ( d ) in Eq. (84) may be associated with the pressure in the interspace of a simple parallelplate capacitor with C / s = cco/z. If one applies a certain voltage, Vo, this pressure amounts to f ( z ) = -€toVt/2z2.The analogy holds if the virtual separation of the capacitor plates is given by z = InlX(d), while one has to apply an imaginary voltage Vo = i2kT/e,
(88)
which results in a repulsive interaction between the plates. It should be noted that this elementary voltage is completely independent of the type of counterions. The magnitude amounts to 52.5 mV at room temperature.
104
U. HARTMANN
The two-slab ionic pressure is now simply given by
lnlX thus obviously represents the separation of the fictitious capacitor plates which may be associated with the maxima of the near-surface counterion concentration profiles (see Fig. 21). Figure 22 shows the dependence of X for monovalent ions in water on the separation d between two slabs for three different surface charge densities, computed according to Eq. (87). X decreases with decreasing d and increasing 0.For high surface charge densities and at large separations, X becomes proportional to d. In this limit the pressure becomes
which is known as the Langmuir relation. This relation may be used to calculate the equilibrium thickness and disjoining pressure of wetting films on probe and sample in SFM systems. In the oppposite limit d + 0, Eq. (87) yields X + d N ( q k T / n e o ) d , and the pressure according to Eq. (89) is
0.5 0.0
1
I
5
8
’
, ‘ I
10
I
50
I
,
‘
.
100
d (nm> FIGURE22. Characteristic separation length X for monovalent counterions in water as a function of separation of two surfaces exhibiting an equal charge density 0.The latter quantity is given in electrons per surface area, where I e-/0.8nm2 = 0.2C/m2 represents a typical value for a fully ionized surface.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
105
given by f ( d ) = iVoa/nd,
which describes a real and repulsive pressure because V , is imaginary and a and n have opposite signs. This simple equation is considered to be of particular importance for SFM experiments involving deionized immersion liquids and a moderate charging of probe and sample surfaces. However, the total interaction between probe and sample in a liquid environment must of course also include the V D W force. Unlike the ionic force, VDW interactions are largely insensitive to variations of the counterion concentration, while they are highly sensitive to those surface reactions ultimately leading to the ionic forces, i.e., dissociation or adsorption processes (see Section II.B.7). Thus, for any given probe-sample-immersion configuration, the total interaction is obtained by simple linear superposition of V D W and ionic contributions. The comparison of Eqs. (11) and (91) shows that the VDW force generally exceeds the ionic force at small separations of the interacting surfaces, while, according to Eq. (90), the ionic force is dominant at large separations. If the VDW force is attractive, this results in a transition from repulsive to attractive interactions if the -5
-12
I
I
1
1
1
l
l
I
1
I
l
-
l
1
1
repulsive
van der Waals‘., I
2
j
i 4 Q
I
I
l
l
7 8 9 1 0
FIGURF 23. Interplay of ionic and VDW pressure as a function of separation between two planar surfaces interacting in pure water. Surface charging is assumed t o result from a monovalent ionization process. The long-dashed lines correspond t o the pure repulsive ionic J yields an attractive VDW interaction force. A typical nonretarded Hamaker constant of following the short-dashed straight line. The resulting total pressure is given by the solid lines which show a zero-axis crossing for the two lower charge densities.
106
U. HARTMANN
probe approaches the sample, as shown in Fig. 23. Even for highly charged surfaces, the VDW force causes deviations from the simple ionic double layer behavior up to surface separations of more than a nanometer. For low surface charge densities, both contributions may interplay throughout the whole regime that is interesting for SFM experiments. If the VDW force is attractive, the total pressure generally changes from repulsion to attraction below 10 nm separation of the surfaces. If two slabs are finally forced into molecular contact, the pressure pushing the trapped counterions toward the surfaces dramatically increases according to Eq. (91). The high ionic pressure may initiate “charge regulation processes,” e.g., readsorption of counterions onto original surface sites. As a result the surface charge density exhibits a reduction with decreasing distance between the slabs. The ionic force thus falls below the value predicted by Eq. (91). However, charge regulation is expected to be of little importance in noncontact SFM, since probe-sample separations are generally well above the molecular diameter. Moreover, for a sharp tip close to a flat substrate, charge regulation would be restricted to the tip’s very apex, while the major part of the interaction comes about from longer-range contributions. Thus, Eq. (89) should be a good basis to calculate the actual ionic probe-sample interaction via the framework developed in Section II.B.4. 2. The Efect of an Electrolyte Solution
The treatment in Section II.C.1 was based on the assumption that the immersion medium is a pure liquid, i.e., that it only contains a certain counterion concentration just compensating the total surface charge of probe and sample. This assumption is generally not strictly valid for SFM systems involving wetting films on probe and sample or liquid immersion: Pure water at pH 7 contains 10⁻⁷ M (1 M = 1 mol/dm³ corresponds to a number density of 6 × 10²⁶/m³) of H3O⁺ and OH⁻ ions. Many biological samples exhibit ion concentrations of about 0.2 M resulting from dissociated inorganic salts. A bulk reservoir of electrolyte ions has a profound effect on the ionic probe-sample interaction. For an isolated surface, covered with a charge density σ and immersed in a monovalent electrolyte solution of bulk concentration ρb, the surface electrostatic potential is given by

ψ0(σ, ρb) = −iV0 arsinh[σ/√(8εε0kTρb)],   (92)
which is a convenient form of the Grahame relation (see, for example, Hiemenz, 1977). The imaginary potential difference V0 is defined in Eq. (88), and

λD(ρb) = [εε0kT/(2e²ρb)]^(1/2)   (93)
denotes the Debye length. The dependence of ψ0 and λD on the bulk electrolytic concentration is shown in Fig. 24.

FIGURE 24. Debye length and surface potential of an isolated charged surface as a function of bulk electrolytic concentration.

If λDσ/εε0 << −iV0/2, Eq. (92) yields

ψ0(σ, ρb) = σλD(ρb)/εε0,   (94)

which represents a parallel-plate capacitor, the plates of which exhibit a surface charge density σ and are separated by λD. A glance at Fig. 24 shows that this proportionality between surface potential and charge density holds up to surface potentials of about −iV0/2. The Debye length characterizes the separation of the effective centric plane of the counterion profile from the charged surface, as shown in Fig. 25.
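The scaling of Eqs. (92)-(94) is easily explored numerically. The short Python sketch below is an illustration added here, not part of the original analysis; all parameter values are assumptions chosen for demonstration. Writing −iV0 = 2kT/e makes all quantities real.

```python
import numpy as np

# Physical constants (SI units)
k_B  = 1.380649e-23       # Boltzmann constant, J/K
e    = 1.602176634e-19    # elementary charge, C
eps0 = 8.8541878128e-12   # vacuum permittivity, F/m

def debye_length(rho_b, eps_r=80.0, T=300.0):
    """Eq. (93): lambda_D = sqrt(eps*eps0*kT/(2 e^2 rho_b)) for a
    symmetric monovalent electrolyte; rho_b in ions per m^3."""
    return np.sqrt(eps_r * eps0 * k_B * T / (2.0 * e**2 * rho_b))

def surface_potential(sigma, rho_b, eps_r=80.0, T=300.0):
    """Eq. (92), Grahame relation, with -i*V0 = 2kT/e:
    psi_0 = (2kT/e) arsinh(sigma / sqrt(8 eps eps0 kT rho_b))."""
    return (2*k_B*T/e) * np.arcsinh(
        sigma / np.sqrt(8*eps_r*eps0*k_B*T*rho_b))

molar = 6.022e26   # number density (1/m^3) of a 1 mol/dm^3 solution
for c in (1e-7, 1e-3, 0.2):     # mol/dm^3; 1e-7 M corresponds to pure water
    rho = c * molar
    lam = debye_length(rho)
    psi = surface_potential(0.01, rho)   # sigma = 0.01 C/m^2, illustrative
    print(f"c = {c:7.1e} M: lambda_D = {lam*1e9:7.2f} nm, "
          f"psi_0 = {psi*1e3:6.1f} mV")
```

The printed Debye lengths (roughly 970 nm for pure water and 0.7 nm at 0.2 M) reproduce the values quoted later in this section.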
According to the Gouy-Chapman approach (see, for example, Hiemenz, 1977), the electrostatic potential at any separation z from the isolated surface is given by

ψ(z) = −i2V0 artanh[exp(−z/λD) tanh(iψ0/2V0)],   (95a)
which reduces to
ψ(z) = −i2V0 tanh(iψ0/2V0) exp(−z/λD)   (95b)
for z >> λD and/or ψ0 << −i2V0. Especially for low surface potentials, ψ0 << −iV0, this latter relation may be represented by the Debye-Hückel approximation

ψ(z) = ψ0 exp(−z/λD).   (95c)

FIGURE 25. Diffuse counterion atmosphere near probe and substrate, which both exhibit a surface charge density σ. The intervening gap of width d contains an electrolytic solution with a static dielectric constant ε. The Debye length λD characterizes the separation of the centric planes of the counterion clouds from the surfaces of probe and sample, respectively. ψ0 and ψm denote the surface and midplane potentials.

The ionic pressure between two equally charged surfaces may now be calculated according to Eq. (84). However, it is more convenient to use the contact value theorem in the alternative form (see, for example, Israelachvili, 1985)

f(z) = kT[ρm(z) − ρb],   (96)

which is the excess osmotic pressure of the ions in the midplane over the bulk pressure. Since the bulk ionic concentration ρb is known, the problem is reduced to the calculation of the midplane ionic concentration ρm, which is related to the midplane potential ψm(z) by the Boltzmann ansatz. This leads to

f(z) = 4kTρb sinh²[iψm(z)/V0].   (97a)

In the "weak overlap approximation," ψm(z) is found by a linear superposition of the potentials of the isolated surfaces produced at z/2, i.e.,

ψm(z) = 2ψ(z/2),   (97b)
where ψ(z/2) is given by Eq. (95). The resulting ionic pressure according to the full weak overlap approximation, i.e., Eqs. (97) combined with (95a), is shown in Fig. 26 for various surface charge densities and two electrolyte concentrations.

FIGURE 26. Repulsive ionic pressure between two slabs with equal surface charge density σ in an aqueous monovalent electrolyte of bulk concentration ρb, as a function of surface-to-surface separation.

At any surface charge density, more dilute electrolytes with long Debye screening lengths (see Eq. (93)) lead to a stronger repulsion
between the slabs than concentrated electrolytes. The difference in magnitude and decay of the electrolytic ionic pressure with respect to the pure-liquid results shown in Fig. 23 is striking. While dynamic-mode SFM essentially detects the pressure according to Eq. (97), the actual ionic force exerted on a probe of a given radius R, according to Eq. (32), has to be obtained via the Derjaguin formulae according to Eqs. (34) and (35). Calculations simplify considerably if a small midplane potential can be assumed, i.e., ψm << −iV0. This latter condition is satisfied if d > 2λD and/or ψ0 << −iV0, where d is the probe-sample separation and ψ0 the potential of an isolated surface, which is assumed to approximately represent the real surface potential (see Fig. 25). Expansion of the sinh term in Eq. (97a) and insertion of Eq. (95b) via (97b) yields the particularly simple result
F(d) = 128πR(2e²ρb²λD³/εε0) tanh²(iψ0/2V0) exp(−d/λD).   (98)
Especially if one has low surface potentials, ψ0 ≲ −iV0/2, application of Eq. (94) together with the Debye-Hückel approximation, Eq. (95c), yields via second-order expansion of Eq. (97a)

F(d) = (4πRλDσ²/εε0) exp(−d/λD).   (99)
These results show that the ionic double layer forces in an electrolytic environment drop exponentially with probe-sample separation. The decay length is given by the Debye screening length, which only depends on the bulk electrolytic concentration. The behavior is in strong contrast to ionic forces in non-electrolytic immersions, which exhibit a logarithmic to 1/d force law as discussed in the previous section. However, as for pure liquids, the total probe-sample force also has to include the VDW component. The interference of ionic and VDW forces is well known from the classical Derjaguin-Landau-Verwey-Overbeek (DLVO) theory of lyophobic colloid stability (see, for example, Hiemenz, 1977). Figure 27 shows representative curves of the total probe-sample interaction that may occur if SFM is performed under electrolytic immersion. The results, shown for two different surface charges and various bulk electrolytic concentrations, may be generalized as follows: For highly charged surfaces and dilute electrolytes, there is strong repulsion more or less throughout the whole regime relevant to non-contact SFM. For electrolytes of higher concentration, the total probe-sample interaction exhibits a minimum (attractive), preferably at probe-sample separations of a few nanometers. If surface charging is relatively low, the force exhibits a broad maximum (repulsive) some nanometers away from the substrate surface, and approaches the pure VDW curve for increasing electrolyte concentration.

FIGURE 27. Total probe-sample force as a function of working distance if the force microscope is operated under electrolytic immersion. The curves correspond to two different surface charge densities σ and to various bulk electrolytic concentrations ρb.
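A sketch of the DLVO-type superposition underlying Fig. 27 is given below: the double-layer force of Eq. (98), written in the equivalent form 128πRkTρbλDΓ²exp(−d/λD), plus a nonretarded VDW sphere-plane term −HnR/6d² consistent with Section II.B. The Hamaker constant and charge density are assumptions chosen only for illustration.

```python
import numpy as np

k_B, e, eps0 = 1.380649e-23, 1.602176634e-19, 8.8541878128e-12

def total_force(d, sigma, rho_b, R=100e-9, H_n=1e-19, eps_r=80.0, T=300.0):
    """Double-layer repulsion, Eq. (98), plus an attractive nonretarded
    VDW sphere-plane term -H_n*R/(6 d^2)."""
    lam = np.sqrt(eps_r*eps0*k_B*T / (2*e**2*rho_b))              # Eq. (93)
    psi0 = (2*k_B*T/e)*np.arcsinh(sigma/np.sqrt(8*eps_r*eps0*k_B*T*rho_b))
    Gamma = np.tanh(e*psi0/(4*k_B*T))   # tanh(i psi0/2V0) in real form
    return (128*np.pi*R*k_B*T*rho_b*lam*Gamma**2*np.exp(-d/lam)
            - H_n*R/(6*d**2))

d = np.linspace(0.5e-9, 30e-9, 600)
for c in (1e-3, 0.1):                   # bulk concentrations in mol/dm^3
    F = total_force(d, sigma=0.05, rho_b=c*6.022e26)
    i = np.where(np.diff(np.sign(F)))[0]
    print(f"{c} M: repulsion-to-attraction crossing"
          + (f" near d = {d[i[0]]*1e9:.1f} nm" if i.size else ": none"))
```

The crossover separation reported by the sketch shifts with concentration in the way described qualitatively above for Fig. 27.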
The preceding analysis has important bearings on VDW and magnetic force microscopy performed under liquid immersion. Unwanted ionic forces due to surface dissociation processes of probe and sample can largely be suppressed by adding an appropriate amount of inorganic salts to the immersion fluid. For magnetic measurements performed at ultralow working distances, the total nonmagnetic force may be reduced by approximately compensating repulsive ionic and attractive VDW forces for a given average probe-sample spacing. Equation (99) can immediately be extended to the situation of electrolytic immersions containing divalent ions or a mixture of ions of arbitrary valency nj and bulk particle densities ρbj. This only requires the use of a generalized Debye screening length (see, for example, Israelachvili, 1985), now given by

λD = [εε0kT/(e² Σj ρbj nj²)]^(1/2).   (100)
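Eq. (100) is straightforward to evaluate for an arbitrary ion mixture; the composition used below is an assumed example.

```python
import numpy as np

k_B, e, eps0 = 1.380649e-23, 1.602176634e-19, 8.8541878128e-12

def debye_length_mixture(concentrations_M, valencies, eps_r=80.0, T=300.0):
    """Eq. (100): lambda_D = [eps*eps0*kT / (e^2 sum_j rho_bj n_j^2)]^(1/2)."""
    rho = np.asarray(concentrations_M) * 6.022e26   # mol/dm^3 -> 1/m^3
    n = np.asarray(valencies)
    return np.sqrt(eps_r*eps0*k_B*T / (e**2 * np.sum(rho * n**2)))

# 0.15 M NaCl plus 2 mM CaCl2, an illustrative physiological-like mixture:
lam = debye_length_mixture([0.15, 0.15, 0.002, 0.004], [1, -1, 2, -1])
print(f"lambda_D = {lam*1e9:.2f} nm")
```

For a symmetric monovalent salt the sum reduces to 2ρb, and Eq. (100) reproduces Eq. (93).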
However, it must be emphasized that the weak overlap approximation used to derive the probe-sample ionic interaction is implicitly based on the assumption that the working distance is well beyond λD. For smaller separations one must resort to numerical solutions of the Poisson-Boltzmann equation (see, for example, Hiemenz, 1977). For example, pure water at pH 7 exhibits a room-temperature value of λD ≈ 950 nm, while λD ≈ 0.8 nm is found for ocean water, and λD ≈ 0.7 nm for many biological samples (see Israelachvili, 1985). However, for almost all SFM applications it is satisfactory to use the simple analytical results obtained in Section II.C.1 if λD ≳ 1 nm, while the results just given are preferred for smaller Debye lengths. Finally, it may be instructive to address at least one of the open problems in the field of collective VDW-ionic interaction in SFM. The preceding treatment was consistently based on the assumption that both contributions can be evaluated separately, where the total force exerted on the probe is then given by a linear superposition. However, fundamental statistical mechanics does not provide any firm basis for this treatment (Mahanty and Ninham, 1976). A rigorous ab initio ansatz would involve a Poisson-Boltzmann equation,

∇²(ψ + φ) = −(e/εε0) Σj nj ρbj exp[−nj e(ψ + φ)/kT],   (101a)

which contains the sum of the equilibrium ionic potential ψ and the fluctuating VDW potential φ. Linearization yields

∇²(ψ + φ) = (ψ + φ)/λD².   (101b)
For a sharp probe in close proximity to a substrate, ψ may be a complicated function of position and may show deviations from the simple behavior
assumed in the preceding treatment; Eq. (101b) is in general extremely difficult to solve in a self-consistent way. However, it confirms that, at least for high ionic mobility, VDW and ionic forces are not completely independent. Detailed SFM experiments on ionic forces may help in the future to further clarify this point. If the detailed nature of the interaction is understood, SFM of ionic forces would be particularly valuable for measuring surface charge densities at high spatial resolution. This may also include externally superimposed electrostatic potential differences between probe and sample.

D. Squeezing of Individual Molecules: Solvation Forces
The theories of VDW and ionic probe-sample interactions discussed so far are pure continuum theories, in which immersion liquids present in the intervening gap between probe and sample are treated solely in terms of bulk properties, such as dielectric permittivity and average ionic concentration. This treatment breaks down when the probe-sample separation is decreased to some molecular diameters. In this regime the discrete molecular nature of immersion media can no longer be ignored, since the effective intermolecular pair potentials in the liquids become a sensitive, anisotropic function of the distance between probe and sample. This phenomenon may cause quite long-range ordering effects of the liquid molecules (Nicholson and Parsonage, 1982; Rickayzen and Richmond, 1985), as shown in Fig. 28. Attractive interaction between the trapped molecules and the surfaces of probe and sample, together with the geometric constraining effect, gives rise to density oscillations which may extend over several molecular diameters. Forces related to these ordering phenomena are known as "solvation forces." An excess near-surface molecular density is, according to the contact value theorem, Eq. (84), related to a repulsive pressure between two slabs in close proximity. Modeling of probe-sample solvation forces thus consists
FIGURE 28. Squeezing of individual liquid molecules between probe apex and substrate leads to long-range ordering phenomena. The resulting molecular density oscillations exhibit a periodicity roughly equal to the molecular diameter δ.
in calculating the excess surface molecular density for the two slabs separated by a certain distance with respect to that of a free surface. Over the past years there has been much study of the liquid structure near constraining walls (Chui, 1991). Different theoretical approaches to the problem include linear theories, nonlinear density-functional theories, and Monte Carlo simulations. However, the somewhat controversial results indicate that the field is not yet fully explored. Thus, the present treatment is devoted to deriving a representative order of magnitude of the effects and to an analysis of the physics behind solvation interactions in SFM. As intuitively expected from the simple-minded model in Fig. 28, all theoretical work (Nicholson and Parsonage, 1982; Rickayzen and Richmond, 1985; Chui, 1991) as well as some experimental observations (Israelachvili, 1985) have invariably confirmed an oscillating near-surface excess molecular density, which may roughly be modeled by

ρ(z) − ρ(∞) = ρ0 cos(2πz/δ) exp(−z/δ),   (102)
where z denotes the separation of the two slabs and δ the effective molecular diameter of the intervening fluid. The empirical ansatz describes an exponentially damped oscillatory variation of the surface excess molecular density. ρ0 determines the excess density if the gap between the slabs just equals one molecular diameter. The solvation force acting on a typical SFM probe is then given, according to Eqs. (34b) and (84), by integrating Eq. (102):

F(d) = F(δ)[cos(2πd/δ) − 2π sin(2πd/δ)] exp(1 − d/δ),   (103a)
where one approximately has
F(δ) = kTρ0δR/(2π exp(1)),   (103b)
with an effective probe radius R according to Eqs. (32). The problem of estimating a somewhat realistic order of magnitude of the force is now reduced to an estimation of ρ0, i.e., of the molecular density if the probe-sample spacing is just one molecular diameter δ. This problem is of course hard to solve in general, since ρ0 is expected to be sensitive to the geometry of the opposing surfaces (Chui, 1991). However, a rough estimate may be obtained by considering an upper limit of the total order-disorder difference of an ideal hard-sphere liquid. In the total ordering limit, i.e., solidification of the hard-sphere molecules between probe and sample in a close-packed lattice, the maximum number density would be ρ(δ) = √2/δ³. If it is further assumed that the excess near-surface molecular density of a free surface is almost negligible with respect to this value, i.e., ρ(∞) << ρ(δ), Eq. (102)
yields ρ0 = √2 exp(1)/δ³. Thus, one obtains from Eq. (103b)

F(δ)/R = kT/(√2 πδ²).   (104)
This particularly simple relationship represents an upper limit of the solvation force per unit probe radius measured at a probe-sample separation of one molecular diameter for an ideal hard-sphere VDW immersion liquid. For δ = 1 nm, one obtains a value of about 1 mN/m. This is 10 times smaller than the typical VDW magnitude mentioned in Section II.B.10. The oscillating solvation force according to Eq. (103a) is shown in Fig. 29 in comparison with a small attractive VDW interaction. While the empirical two-slab pressure according to Eq. (102) exhibits a maximum when the gap width d corresponds to multiples of the molecular diameter δ, the force measured with a paraboloidal or ellipsoidal probe exhibits, according to Eq. (103a), a shift of the molecular peaks by about 25% of the molecular diameter toward lower gap values. The amplitude of the force oscillations increases with the square of the reciprocal molecular diameter, while the latter also determines the characteristic decay length with increasing probe-sample separation.

FIGURE 29. Oscillatory solvation force per unit probe radius as a function of probe-sample separation for two different molecular diameters. A weak attractive VDW force is shown for reference.
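The magnitude and phase behavior of Eqs. (103a) and (104) can be checked with a few lines of Python; the fragment below is an added illustration in which the molecular diameter is the only input.

```python
import numpy as np

k_B = 1.380649e-23

def solvation_force_per_radius(d, delta, T=300.0):
    """Eqs. (103a) and (104): oscillatory solvation force per unit probe
    radius for an ideal hard-sphere liquid; the amplitude at d = delta is
    F(delta)/R = kT/(sqrt(2)*pi*delta^2)."""
    F_delta = k_B * T / (np.sqrt(2.0) * np.pi * delta**2)   # Eq. (104)
    u = 2*np.pi*d/delta
    return F_delta * (np.cos(u) - 2*np.pi*np.sin(u)) * np.exp(1.0 - d/delta)

delta = 1e-9   # effective molecular diameter, 1 nm
print(f"F(delta)/R = {solvation_force_per_radius(delta, delta)*1e3:.2f} mN/m")
for d in np.arange(1.0, 3.1, 0.5) * delta:
    F = solvation_force_per_radius(d, delta)
    print(f"d = {d/delta:3.1f} delta: F/R = {F*1e3:+7.3f} mN/m")
```

For δ = 1 nm the sketch returns roughly 0.9 mN/m at d = δ, in agreement with the estimate above, and alternating signs at half-integer multiples of δ.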
The total probe-sample interaction at molecular probe-sample separations is of course composed of both solvation and VDW interactions. The solvation forces result from density fluctuations of the molecules trapped between probe and substrate, which are due to long-range molecular ordering processes. The VDW force between probe and sample is sensitive to the dielectric permittivity of the intervening gap, which in turn depends on the actual molecular density of the immersion fluid. Thus, it is clear that solvation and VDW forces cannot be treated separately to obtain the total interaction by linear superposition. Both components act collectively rather than simply additively. Since the VDW theory developed in Section II.B is a pure continuum theory, it is convenient to treat the molecular ordering processes in a quasi-macroscopic way. This can be done via the Clausius-Mossotti equation,

α(iν) = 3ε0[ε(iν) − 1]/{ρ[ε(iν) + 2]},   (105)

which relates the effective molecular polarizability of hard-sphere molecules in the gas phase to the dielectric permittivity and average molecular density which would be measured for a macroscopic ensemble of the molecules. Since the effective polarizability of non-interacting hard-sphere molecules is invariant to density fluctuations, a certain average molecular density ρ, deviating from the bulk liquid density ρb, transforms into a modification of the dielectric permittivity through

ε(ρ) = [1 + 2(ρ/ρb)(εb − 1)/(εb + 2)] / [1 − (ρ/ρb)(εb − 1)/(εb + 2)],   (106)
where εb = ε(ρb) corresponds to the permittivity of the bulk liquid. Under the assumed boundary conditions, Eq. (106) holds for any spectral contribution, i.e., for the static orientational as well as for the higher-frequency electronic permittivities. The consequence is that, according to Eqs. (8b) and (12b), the entropic and nonretarded Hamaker constants exhibit a density-induced modulation. If one assumes that the average molecular density ρ̄(d) within the volume between probe and sample is approximately equal to the surface molecular density ρ(d), Eq. (102) yields

ρ̄(d)/ρb = [ρ(∞)/ρb][1 + cos(2πd/δ) exp(η − d/δ)],   (107a)

with

η = 1 + ln[ρ(δ)/ρ(∞) − 1].   (107b)

At this point some heuristic assumptions have to be made concerning the ratio of the free-surface density ρ(∞) to the bulk density of the liquid ρb, as well as the ratio of the gap's excess surface density ρ(δ) to ρ(∞). A reasonable assumption is that the permittivity of the small gap between probe and sample approaches its vacuum value somewhere between d = 3δ/4 and d = δ/2, when there is no space left to trap any liquid molecules (see Fig. 28). According to Eq. (106), ε = 1 requires¹ ρ = 0, and
thus, according to Eq. (107a), η ≥ 1/2, which in turn yields for the gap-induced increase of molecular ordering 2.6 > ρ(δ)/ρ(∞) ≥ 1.6. In other words, the packing fraction of molecules on the probe and sample surfaces increases by a factor of 1.6 to 2.6 when the probe approaches the sample surface and finally reaches a separation corresponding to only one molecular diameter. The second free parameter left in Eq. (107a) is the effective excess molecular packing fraction ρ(∞)/ρb, which is simply not known for a system consisting of a sharp tip opposite an arbitrarily shaped sample surface. Information on this quantity can only be obtained by performing Monte Carlo simulations under realistic boundary conditions. However, first results obtained for the structure of hard spheres near flat or spherical walls (Chui, 1991) suggest that the packing fraction is a complicated function of molecular diameter and constraining-wall geometry and may by far exceed unity. Because of these uncertainties, it is convenient to choose a somewhat pragmatic way. Equation (107a) is strictly valid only for probe-sample separations of a few molecular diameters, since it was assumed that ρ̄(d) = ρ(d). However, to ensure bulk convergence for large probe-sample separations, one has to fulfill ρ̄(∞) = ρb, which in turn formally requires ρ(∞) = ρb. This pragmatic approach permits at least an order-of-magnitude estimate of oscillatory VDW forces without too many ambiguous parameters. A typical result for a metal-dielectric combination of probe and sample immersed in an ideal hard-sphere liquid with ρ(δ)/ρ(∞) = 2.1 (which can be considered a somewhat typical value according to the preceding analysis) is shown in Fig. 30. The oscillating Hamaker constant has been obtained according to Eq. (54) with n3(d) = √(ε3[ρ̄(d)]), and ε3[ρ̄(d)] according to Eqs. (106) and (107). The oscillating refractive index n3 of the immersion liquid transforms into a huge "overshoot" of the nonretarded Hamaker constant with respect to its bulk value. If the bulk index of the immersion fluid is close to that of the dielectric (sample), the originally purely attractive interaction may become repulsive for certain probe-sample separations, while it is solely attractive but oscillating if probe and sample are made from the same material. Refractive index and Hamaker constant both exhibit the exponential damping ultimately resulting from the decrease of the molecular excess osmotic pressure according to Eq. (102). They are completely out of phase, but both show the molecular periodicity.

¹ The upper limit is additionally constrained by the fact that ε(δ) must of course be finite. Convergence of Eq. (106) requires ρ(δ) < ρb(εb + 2)/(εb − 1). However, this criterion only becomes relevant if the excess surface density for the gap between probe and sample is almost the same as for the free surfaces, and if this free-surface molecular density is much higher than the bulk liquid density. For ρ(∞) = ρb, used in the following, ρ(δ)/ρ(∞) < 2.6 can be considered as the relevant criterion for all immersion liquids (with εb < 2.9).
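The density-to-permittivity mapping of Eqs. (106) and (107) may be sketched as follows; the OMCTS-like parameters (δ, εb) and the ordering ratio ρ(δ)/ρ(∞) = 2.1 are illustrative assumptions, not values from the text.

```python
import numpy as np

def eps_of_density(rho_rel, eps_b):
    """Eq. (106): permittivity of a hard-sphere liquid whose average
    density deviates from the bulk value; rho_rel = rho_bar/rho_b."""
    y = rho_rel * (eps_b - 1.0) / (eps_b + 2.0)
    return (1.0 + 2.0*y) / (1.0 - y)

def density_modulation(d, delta, ratio=2.1):
    """Eq. (107): rho_bar(d)/rho_b with rho(inf) = rho_b and
    eta = 1 + ln[ratio - 1], where ratio = rho(delta)/rho(inf)."""
    eta = 1.0 + np.log(ratio - 1.0)
    return 1.0 + np.cos(2*np.pi*d/delta) * np.exp(eta - d/delta)

delta, eps_b = 0.9e-9, 2.0        # OMCTS-like values, purely illustrative
for d in np.arange(0.5, 3.01, 0.5) * delta:
    # negative densities are unphysical and are clipped to zero here:
    r = max(density_modulation(d, delta), 0.0)
    print(f"d = {d/delta:3.1f} delta: rho/rho_b = {r:5.2f}, "
          f"eps3 = {eps_of_density(r, eps_b):5.2f}")
```

Note that ρ̄ = ρb reproduces εb and ρ̄ = 0 reproduces the vacuum value ε = 1, the two limits used in the argument above.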
FIGURE 30. Periodic molecular ordering of molecules trapped between probe and sample causes oscillations of the effective optical refractive index n3 of the immersion fluid, the bulk index of which is given by n3∞. These oscillations transform into a huge periodic variation of the nonretarded Hamaker constant Hn with respect to its bulk value Hn∞. δ denotes the effective molecular diameter of the immersion medium and n2 the refractive index of the dielectric sample. The probing tip is metallic.
The total VDW solvation force exerted on the probe is now obtained by a linear superposition of the osmotic contribution according to Eq. (103a) and the VDW contribution according to Eq. (40a), using a density-modulated Hamaker constant according to Eqs. (6), (12b), (106), and (107). A typical result, again for a metal-dielectric combination of probe and sample, is shown in Fig. 31. The total interaction still shows the molecular periodicity δ. However, since osmotic and VDW contributions are mutually phase-shifted in a complicated way, the oscillating curve generally does not peak when the probe-sample separation exactly equals a multiple of half the molecular diameter. The damping at small probe-sample distances is stronger than that of the excess osmotic pressure in Eq. (102) and approaches the latter a few molecular diameters away from the sample surface. At very small probe-sample separations, i.e., just before interatomic repulsion occurs, the total interaction approaches the VDW continuum expected for a vacuum interaction between probe and sample. However, if probe and substrate are separated by more than about one molecular diameter, the giant oscillations of the VDW solvation force exceed by far the continuum VDW forces.
FIGURE 31. Force per unit probe radius as a function of probe-sample separation d for a metal/dielectric (optical refractive index n2) configuration of probe and sample, immersed in a hard-sphere liquid with an effective molecular diameter δ and a bulk optical refractive index n3∞. Superposition of the oscillatory VDW and osmotic contributions yields the total force exerted on the probing tip. For reference, the VDW curve resulting from the pure continuum theory is also shown.
Finally, it should be emphasized that the field of solvation force phenomena in SFM is completely open and, to the author's knowledge, no detailed observation of an oscillating attractive/repulsive interaction at molecular working distances has been reported up to the present time. However, the present theoretical analysis confirms that, at least for some model configurations, oscillatory solvation forces should be detectable. Quite promising inert immersion liquids, which contain fairly rigid spherical or quasi-spherical molecules, are, for example, octamethylcyclotetrasiloxane (OMCTS, nonpolar, δ ≈ 0.9 nm), carbon tetrachloride (nonpolar, δ ≈ 0.28 nm), cyclohexane (nonpolar, δ ≈ 0.29 nm), and propylene carbonate (highly polar, hydrogen-bonding, δ ≈ 0.5 nm) (Israelachvili, 1985). SFM measurements on these and other immersion liquids could help provide a deeper insight into molecular ordering processes near surfaces and in small cavities. As already emphasized with respect to VDW and ionic interactions, solvation forces certainly have to be accounted for as unwanted contributions if electric or magnetic force microscopy is performed at ultralow working distances and under liquid immersion. In general, the situation is complicated by the fact that VDW, ionic, and solvation forces may contribute to the total probe-sample interaction in a non-additive way. Unfortunately, this is only part of
the whole story. If SFM experiments are performed under aqueous immersion, or if only trace amounts of water are present - and this is the case for almost all experiments under ambient conditions - hydrophilic and hydrophobic interactions must often additionally be taken into account (Israelachvili, 1985). The phenomena are mainly of entropic origin and result from the rearrangement of water molecules when probe and sample come into close contact. In this sense hydrophilic and hydrophobic forces clearly belong to the general field of solvation forces; however, macroscopic experiments (Israelachvili, 1985) confirm that they are generally not well characterized by the simple theory presented here. Hydration forces result whenever water molecules strongly bind to hydrophilic surface groups of probe and sample. A strong repulsion results, which exhibits an exponential decay over a few molecular diameters (Israelachvili, 1985). In the opposite situation, for a hydrophobic probe and sample, the rearrangement of water molecules in the overlapping solvation zones results in a strong attractive interaction. These phenomena once again show that water is one of the most complicated liquids that we know. Its importance in SFM experiments under ambient conditions can hardly be overemphasized, and more detailed information on its microscopic behavior is of great importance.
E. Capillary Forces

Under humid conditions, a liquid bridge between probe and sample can be formed in two different ways: by spontaneous capillary condensation of vapours, and by direct dipping of the tip into a wetting film which is present on top of the substrate surface. Capillary condensation is a first-order phase transition whereby the undersaturated vapour condenses in the small cavity between probe apex and sample surface. Because of surface tension, a liquid bridge between probe and sample results in a mutual attraction.

FIGURE 32. Capillary interaction between the probe and a substrate which has a surface covered with a liquid adsorbate. When the probe is dipped into the adsorbate, the liquid surface exhibits curvature near the probe's surface (left side). Withdrawal of the probe, or spontaneous capillary condensation before the probe contacts the liquid surface, results in an elongated liquid bridge (right side).

At thermodynamic equilibrium, the meniscus radii according to Fig. 32 are related to the relative vapour pressure p/ps by the well-known Kelvin equation (see, for example, Adamson, 1976),

rK = γM/[ρCT ln(p/ps)],   (108)

where C denotes the universal gas constant and ρ, M, γ are the mass density, the molar mass, and the specific surface free energy or surface tension of the liquid forming the capillary. Since p < ps, the Kelvin mean radius, rK = r1r2/(r1 + r2), for a concave meniscus as in Fig. 32 is negative. Figure 33 shows the equilibrium Kelvin radius for a water capillary between probe and sample as a function of the relative humidity of the experimental environment. For rK → −∞, i.e., for a relative humidity approaching 100%, the swelling capillary degenerates to a wetting film. In the opposite extreme, at a relative humidity of a few percent, no capillary is formed, or a preexisting capillary evaporates, since the Kelvin radius approaches molecular dimensions.
FIGURE 33. Equilibrium dimension of the Kelvin radius for a water capillary between probe and sample.
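Eq. (108) directly yields the curve of Fig. 33; the added fragment below evaluates it for water (γ = 73 mJ/m², M = 18 g/mol), with the humidity values chosen for illustration.

```python
import numpy as np

def kelvin_radius(rel_humidity, gamma=0.073, M=0.018, rho=1000.0, T=300.0):
    """Eq. (108): r_K = gamma*M/(rho*C*T*ln(p/p_s)) with C the universal
    gas constant; r_K < 0 for an undersaturated vapour (p < p_s)."""
    C = 8.314   # J/(mol K)
    return gamma * M / (rho * C * T * np.log(rel_humidity))

for h in (0.1, 0.5, 0.8, 0.99):
    print(f"p/p_s = {h:4.2f}: r_K = {kelvin_radius(h)*1e9:8.2f} nm")
```

The sketch reproduces the qualitative behavior of Fig. 33: sub-nanometer radii in a dry atmosphere and a divergence toward saturation.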
The mutual attraction of probe and sample results from the Laplace pressure,

pL = γ/rK,   (109a)

within the liquid bridge. The total capillary force exerted on the probe is thus given by

F(d) = πx²(d)γ/rK,   (109b)
where, as shown in Fig. 32, x is the radius of the area where the meniscus is in contact with the probe's surface. The problem is thus to determine this radius as a function of the probe-substrate separation, since the Kelvin radius is known at thermodynamic equilibrium from Eq. (108). One first considers the situation sketched in the left part of Fig. 32, i.e., the probe-substrate separation d is less than or equal to the adsorbate thickness t. For simplicity, an ideally wetting liquid with vanishing contact angle at the probe is considered. From Eqs. (31) one obtains the relation x² = 2Rz, where the effective probe radius R is determined by Eq. (32). From geometrical considerations one then immediately obtains
z ≈ t − d + r1[1 + R/(R + r1)],   (110)
which is valid for thin adsorbate films with t << R. Since r1 << r2, one has to a good approximation r1 ≈ −rK. The force according to Eq. (109b) is thus given by

F(d) = −2πRγ[1 + R/(R − rK) − (t − d)/rK].   (111)

Force-versus-distance curves according to this relation contain complete information about an adsorbate layer. At low partial vapour pressure, leading to −rK << R, the force measured for a virgin probe-adsorbate contact, F/R = −4πγ, permits a measurement of the adsorbate's surface tension. The Kelvin radius is directly obtained from the slope ∂F/∂d = −2πγR/rK. Finally, the adsorbate thickness may be obtained by a simple dipping experiment, whereby F(d) is detected for 0 ≤ d ≤ t. The maximum capillary force is obtained just before the tip touches the substrate: F(0) = −2πγ(2rK − t)R/rK for −rK << R, and F(0) = 2πγtR/rK for −rK << t. For a water film the specific surface free energy is 73 mJ/m² (see, for example, Israelachvili, 1985). The capillary force acting on a probe which dips into a water film on top of the sample is thus |F|/R > 0.9 N/m, which is about 90 times the typical VDW magnitude mentioned in Section II.B.10.
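The diagnostic use of Eq. (111) described above can be mimicked numerically: the added sketch below generates a synthetic dipping curve and recovers the Kelvin radius from its slope. Film thickness, probe radius, and rK are assumed values.

```python
import numpy as np

def dipping_force(d, t, r_K, R=100e-9, gamma=0.073):
    """Eq. (111): capillary force while the tip penetrates the adsorbate
    (0 <= d <= t), ideally wetting liquid, r_K < 0."""
    return -2*np.pi*R*gamma * (1.0 + R/(R - r_K) - (t - d)/r_K)

R, gamma, t, r_K = 100e-9, 0.073, 5e-9, -1.5e-9
d = np.linspace(0.0, t, 6)
F = dipping_force(d, t, r_K, R, gamma)
print(f"F(d=t)/R = {F[-1]/R:.3f} N/m  (about -4*pi*gamma for |r_K| << R)")

# The slope dF/dd = -2*pi*gamma*R/r_K returns r_K from "measured" data:
slope = (F[-1] - F[0]) / (d[-1] - d[0])
print(f"recovered r_K = {-2*np.pi*gamma*R/slope*1e9:.2f} nm")
```

For the values chosen, the first-contact force per unit radius is close to −4πγ ≈ −0.92 N/m, as stated above.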
When the probe is withdrawn after it has made contact with the liquid adsorbate, an elongated capillary is formed as shown in the right part of Fig. 32. Since for d ≥ t both meniscus radii r1 and r2 now vary over a considerable range, the calculation of the probe-sample capillary force is slightly more complicated than in the previous situation. Simple geometrical arguments lead to
x = R(r2 + r1)/(R + r1)   (112a)
and
x = R√(1 − [R + d − t − r1]²/[R + r1]²),   (112b)
where, at thermodynamic equilibrium, r1 and r2 are additionally related to each other by Eq. (108). After a little algebra, the radius of the probe-capillary contact area is determined by the solution of the following cubic equation:
x³ − (d − t)x² + 2R(2rK + d − t)x − R(d − t)² = 0,   (113)
which is valid for d − t << R. The result can of course be obtained analytically, but is then somewhat unwieldy. For d − t >> −rK, r1 ≈ (d − t)/2 and r2 ≈ −rK << R lead, according to Eqs. (112a) and (109b), to the asymptotic force

F(d) = πγR²(d − t)²/[rK(2R + d − t)²],   (114)
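For arbitrary separations in the elongated-bridge regime, Eq. (113) has to be solved for x. A numerical sketch is given below; the root selection and all parameter values are illustrative choices, not prescriptions from the text.

```python
import numpy as np

def capillary_force_elongated(d, t, r_K, R=100e-9, gamma=0.073):
    """Solves the cubic Eq. (113) for the probe-meniscus contact radius x
    and returns the force from Eq. (109b), F = pi*gamma*x^2/r_K (attractive
    for r_K < 0). Several real roots may coexist (multistability); the
    largest positive root, i.e. the branch continuously connected to the
    film contact, is selected here as an illustrative choice."""
    u = d - t
    coeffs = [1.0, -u, 2*R*(2*r_K + u), -R*u**2]
    roots = np.roots(coeffs)
    x = [r.real for r in roots if r.real > 0 and abs(r.imag) < 1e-6*r.real]
    if not x:
        return np.nan
    return np.pi * gamma * max(x)**2 / r_K

t, r_K = 5e-9, -1.5e-9
for d in (6e-9, 10e-9, 20e-9):
    F = capillary_force_elongated(d, t, r_K)
    print(f"d = {d*1e9:4.1f} nm: F = {F*1e9:8.2f} nN")
```

The decreasing attraction with increasing separation matches the hysteretic withdrawal branch of Fig. 34.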
from which an upper limit of F(∞) = πγR²/rK can be deduced for the capillary force. For example, this upper limit amounts, for a 100 nm probe interacting with the substrate via a water meniscus, to 1.5 μN at an ambient humidity of 50-60%. This order of magnitude shows that capillary forces in SFM are generally much stronger than any aforementioned interaction.

FIGURE 34. Capillary force as a function of probe-substrate separation. The substrate is covered with a 5 nm water film of fixed thickness, while the probe-sample interaction is shown for three different values of the ambient relative humidity.

Figure 34 shows some force-versus-distance curves obtained according to Eqs. (111), (109b), and (113). When the approaching probe first touches the water film of 5 nm thickness, it experiences a sudden attractive force of magnitude 4πγR, which is, for a 100 nm probe, about 90 nN. The probe then penetrates the adsorbate layer and exhibits a linear increase in attractive force according to Eq. (111). The maximum value achieved just before touching the substrate depends on the ambient humidity, where the highest values are obtained in a relatively dry atmosphere. If the probe is withdrawn before making contact with the substrate, the attractive force decreases reversibly until the adsorbate-air interface is reached. From then on an elongated capillary is formed, leading to a pronounced hysteresis effect. The curves are now described by Eqs. (109b) and (113). Upon further withdrawal of the probe, the force first exhibits a further decrease until some minimum value close to zero is reached. From this point it increases again, approaching the asymptotic behavior according to Eq. (114) (not shown). If thermodynamic equilibrium conditions were present throughout the complete measurement, the capillary between probe and sample would assume an arbitrary length, while the smaller meniscus radius r2 would become equal to the Kelvin radius −rK, i.e., the circumference of the meniscus would become stable. However, since the adsorbate film has a finite thickness, material transport into the growing capillary is disrupted at some time, leading to an irreversibility whereby the force suddenly vanishes. The same occurs if the capillary is destroyed by external perturbations (vibration, air currents). These results are only precise for nearly spherical probes and vanishing contact angle. However, the analysis can immediately be extended to paraboloidal or ellipsoidal probes, even of low aspect ratios (see Eq. (32)), and to arbitrary contact angles. The main results predicted by the preceding treatment remain unchanged. A particularly interesting feature is related to Eq. (113). A careful analytical examination shows that, for a certain regime of the probe-substrate separation, the cubic equation involves three real roots, all leading to stable solutions for the force according to Eq. (109b). This implies the possibility of discontinuous transitions between different force curves upon variation of the probe-sample separation. It should further be noted that the Kelvin equation (108) as well as the Laplace relation (109a) are strictly macroscopic equations, i.e., to ensure validity, the system has to be in thermodynamic equilibrium and the Kelvin radius must well exceed the molecular diameter. For very small Kelvin radii,
the liquid's surface tension is no longer a constant. While for simple Lennard-Jones liquids such as cyclohexane or benzene a macroscopic behavior is already manifest at molecular Kelvin radii, for water a value of about 5 nm is assumed (Israelachvili, 1985), which corresponds, according to Fig. 33, to a relative humidity of 80%. The other important question is whether an ideal thermodynamic equilibrium can be assumed for an arbitrary adsorbate. In the extreme situation, where an adsorbate exhibits a nearly vanishing evaporation rate since it forms a stable film on top of the substrate, the simple free-liquid equilibrium conditions considered earlier are no longer valid. In this case, the Kelvin equation (108), which controls the interplay of the meniscus radii r1 and r2, has to be replaced by a relation representing the condition of zero material transport into the capillary. According to Fig. 32, the meniscus volume for d ≥ t is given by

V = π ∫ x²(z) dz,   (115a)

with the integration taken between the limits

z1 = R + d − t − r1 − √(R² − x²)   and   z2 = r1.   (115b)
Zero material transport is then ensured by the constraint ∂V/∂d = 0. This latter condition then relates r2(d) to r1(d). The additional use of Eqs. (112) then leads to a first-order nonlinear differential equation for the meniscus-probe contact radius x. The numerical solution permits, together with Eq. (109b), a calculation of the capillary force for any probe-sample separation d. However, in the most interesting regime t ≤ d << R, the differential equation can be considerably simplified, which leads to an analytical solution for x. Inserting z1 ≈ z2 = r1 and r1 = (x² + 2R[d − t])/4R into Eq. (115a) yields the simple equilibrium condition
(2[d − t] + x²/R)(∂x/∂d) + x = 0,   (116a)

with the solution

x² = 4RrK²/(d − t − rK).   (116b)
The capillary force according to Eq. (109b) is thus
F(d) = 4πγRrK/(d − t − rK),   (117)
which matches the result of Eq. (111) for d = t. On the other hand, if d − t >> −rK, r1 ≈ (d − t)/2 − rK directly inserted into Eq. (112b) yields via Eq. (109b) the asymptotic behavior

F(d) = −8πγR²/(2R + d − t).   (118)
This latter relation clearly shows that the meniscus between probe and substrate may extend over probe-sample separations several times exceeding the adsorbate thickness, and even exceeding the probe radius. The capillary instability point can be estimated by considering the decrease of the meniscus radius r2 (see Fig. 32) with increasing probe-sample separation d. Using r1 = (d − t)/2, the combination of Eqs. (112) yields

r2 = √(2R² + R(d − t)) − (d − t)/2,   (119a)
where the capillary becomes unstable if r2 = −rK. This gives a critical probe-substrate separation of

d = 2(1 + √3)R + t ≈ t + 5.5R.   (119b)
Thus, an ideal, externally unperturbed capillary may extend over more than a hundred nanometers. The force obtained according to the constant-volume equilibrium via Eq. (117) exhibits a behavior completely different from that shown in Fig. 34. The result for a perfluoropolyether (PFPE) polymer liquid film adsorbed on a substrate is shown in Fig. 35. The surface tension and Kelvin radius values were taken from Mate et al. (1989). Upon approach to the sample, a sudden attractive force is exerted on the probe when it first touches the adsorbate film. The linear behavior upon dipping the probe into the adsorbate film is again described by Eq. (111). Upon withdrawal, again a considerable hysteresis occurs, since an elongated liquid bridge is now formed between probe and adsorbate surface. This leads to a monotonic decrease of the force with increasing probe-sample separation - initially according to Eq. (117), and then, in the asymptotic regime, according to Eq. (118). The theoretical result shown in Fig. 35 is in good quantitative agreement with experimental results on PFPE polymer liquid films presented by Mate et al. (1989). The existence of long-range capillary forces has been demonstrated by several experimental results. Detailed measurements for water were presented by Weisenhorn et al. (1989).
FIGURE 35. Capillary force as a function of probe-substrate separation. The adsorbate thickness is assumed to be 30 nm. The model calculation actually applies to an adsorbed perfluoropolyether polymer liquid film.
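Using Eqs. (117)-(119b), the constant-volume withdrawal behavior sketched in Fig. 35 can be reproduced qualitatively. The PFPE-like parameter values in the added fragment below are assumptions, not those of Mate et al. (1989).

```python
import numpy as np

def withdrawal_force(d, t, r_K, R=100e-9, gamma=0.025):
    """Constant-volume capillary bridge on a stable adsorbate film:
    Eq. (117) near the film, crossing over to Eq. (118) far from it."""
    F_near = 4*np.pi*gamma*R*r_K / (d - t - r_K)    # Eq. (117)
    F_far  = -8*np.pi*gamma*R**2 / (2*R + d - t)    # Eq. (118)
    return F_near, F_far

t, r_K, R = 30e-9, -2e-9, 100e-9     # PFPE-like film, illustrative values
d_crit = 2*(1 + np.sqrt(3))*R + t    # Eq. (119b): capillary instability
print(f"critical separation: d = {d_crit*1e9:.0f} nm (about t + 5.5 R)")
for d in (t, t + 10e-9, t + 100e-9):
    Fn, Ff = withdrawal_force(d, t, r_K, R)
    print(f"d - t = {(d-t)*1e9:5.1f} nm: "
          f"Eq.(117) {Fn*1e9:7.2f} nN, Eq.(118) {Ff*1e9:7.2f} nN")
```

At d = t both expressions reduce to −4πγR, and the critical separation of several hundred nanometers illustrates the long range of the constant-volume bridge.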
However, a detection of the pure capillary forces appears to be difficult in some cases, because the intermediate contact of the probe with the substrate yields additional adhesion forces which may considerably modify the curves shown in Figs. 34 and 35. As yet, not enough experimental data are available to decide strictly whether liquid adsorbates in general exhibit capillary forces according to Fig. 34 or according to Fig. 35, or whether they in general exhibit a more complex intermediate behavior. However, especially for very thin films showing clear capillary formation at highly undersaturated vapour pressure, it is likely that Eqs. (117) and (118) are valid. It is interesting that the force according to these relations does not involve the multistabilities which occur if the free-liquid thermodynamic equilibrium equation (113) is used to derive the probe-meniscus contact radius. In spite of these uncertainties, the preceding analysis has an important implication for force measurements in the presence of liquids: If the SFM probe is dipped into a liquid adsorbate of a few nanometers thickness, the force always exhibits the linear dependence on the probe-sample separation d given by Eq. (111). Since all forces dealt with before involve a nonlinear dependence on the probe-sample separation, they can be measured in the presence of capillary forces. A complete immersion of the whole SFM can thus be reduced to dipping only the tip into an immersion film of sufficient thickness, while the ultimate corrections of the measured force-versus-distance curves are simply linear. Additionally, capillary forces may be used to lock
in the non-contacting probe at a certain separation from the substrate surface, instead of using electrostatic servo forces. However, in general, much more experimental information on capillary phenomena is needed to further elucidate this complex but interesting area (see Evans et al., 1986). A more detailed examination should also include the effect of electrostatic fields on capillary equilibrium, since voltages between probe and substrate are used quite often in scanned probe microscopy.

F. Patch Charge Forces
While the present work was in preparation, an additional force not included in Fig. 2 was shown to be highly relevant for explaining some unusual effects in previous SFM data. Burnham et al. (1992) suggested that very long-ranged, mostly attractive but sometimes repulsive probe-sample forces may be attributed to work function anisotropies and their associated patch charges. The work function is very sensitive to perturbations at a material's surface. Even if the surface is ideally clean and free of defects, different crystallographic orientations are associated with differences in the work function. If one takes an electron out of one region of a material with work function Φ1 and puts it back into another region with work function Φ2 (Φ1 ≠ Φ2), the energy Φ1 − Φ2 is not conserved. Energy conservation requires that the two regions be at different electrostatic potentials V1 and V2 such that V2 − V1 = Φ1 − Φ2 (Ashcroft and Mermin, 1976). Thus, the surface charge density must vary across the sample. Burnham et al. (1992) were the first to discuss the interaction of the resulting patch charges between two distinct bodies. As an exemplary electrostatic interaction, patch charge forces scale in range with the characteristic sizes of the patches involved. Some concrete examples were already provided by Harper (1967). A rigorous calculation would necessarily include all of the patches on the tip, sample, and nearby instrumentation, since charge neutrality must be maintained: a hopelessly complicated task. Burnham et al. (1992) applied as a first ansatz a simple discrete-charge model using the method of images (see, for example, Jackson, 1975). They found the following important implications.
(i) The patch charge effect is not limited to conducting probes and samples. For insulators, however, the work function has to be adequately defined, since no electrons are present at the Fermi level (Harper, 1967).

(ii) Patch charge forces are generally much longer-ranged than VDW forces. The simple point-charge model yielded a linear dependence of the force on the probe-sample separation at separations that were
small compared with the probe's radius. A more rigorous treatment, however, will of course yield a certain dependence of the force-versus-distance curve on the probe geometry, as for the VDW interaction discussed in Section II.B.

(iii) If the two patch charges on probe and sample which provide the dominant contribution to the interaction have the same sign, the force can be repulsive at large probe-sample separations. It is attractive at sufficiently small separations, independently of being attractive or repulsive at large separations. A transition from repulsion to attraction has been experimentally observed and was shown to be in good agreement with the patch charge model (Burnham et al., 1992). Such a transition in the force-versus-distance curve under well-defined experimental conditions (clean surfaces, no intervening medium, no remanent discrete charges involved) can be considered as the most characteristic signature of patch charge forces, since the initial repulsion is hardly explainable by the exclusive occurrence of the previously discussed surface forces.

(iv) Since the electrostatic interaction is caused by local variations in the work function, patch charge forces should critically depend on the crystallographic perfection of the surface and on the amount of adsorbate covering. This is clearly a good point of application for experimental work, in which patch charge forces could be varied in situ by suitable surface treatments.

(v) The magnitude of the patch charge interaction is related to the dielectric constants in much the same way as the retarded Hamaker constants discussed in the context of VDW forces. However, because patch charge forces are determined by atomically thin surface layers, the dielectric constants involved are real surface quantities, while retarded Hamaker constants involve a finite electromagnetic penetration depth (see Section II.B).

As Burnham et al. (1992) have emphasized, recognizing the existence of patch charge forces has some important consequences for SFM. First of all, it strongly suggests that previous interpretations of force curves are incomplete. Patch charge forces are very likely to contribute significantly to the "jump-to-contact" phenomenon (Landman et al., 1990). Furthermore, local work function variations should be detectable by force microscopy and thus by a method that is completely independent of the well-established techniques including STM. A significant advantage of this is that work function measurements by SFM are not restricted to conductors. In order to further clarify the role of patch charge forces in SFM, experiments have to be performed on samples with well-known local variations in
work function. That should in particular yield more information on the typical range of patch charge interactions. A completely unknown aspect is the patch distribution on typical microtips (metallic or nonmetallic) as used for SFM. Improved equipotential calculations should yield information on the influence of probe geometry and of patch size and shape, as well as on fine structure details of force-versus-distance curves.
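A minimal numerical caricature of the discrete-charge picture, far cruder than the treatment of Burnham et al. (1992), is sketched below: one tip patch charge interacting with a laterally offset sample patch charge and with its own image in the conducting sample. All charge values and distances are invented for illustration, and images of the sample charge in the tip are neglected.

```python
import numpy as np

K = 8.9875517923e9   # 1/(4 pi eps0), N m^2 / C^2

def patch_force(d, q_tip=1e-18, q_sample=1e-18, s=20e-9):
    """Vertical force on a tip patch charge q_tip at height d above a
    grounded sample plane carrying a like-signed patch charge q_sample,
    offset laterally by s. Terms: direct Coulomb repulsion plus the
    attraction to the tip charge's own image (-q_tip at depth d)."""
    F_direct = K * q_tip * q_sample * d / (d**2 + s**2)**1.5
    F_image  = -K * q_tip**2 / (2*d)**2
    return F_direct + F_image

for d in np.array([2, 5, 10, 30, 100]) * 1e-9:
    print(f"d = {d*1e9:5.1f} nm: F = {patch_force(d)*1e12:+8.3f} pN")
```

Even this crude sketch reproduces the characteristic crossover noted in item (iii): repulsion between like-signed patches at large separations and image-dominated attraction at small ones.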
III. ELECTRIC FORCE MICROSCOPY USED AS A SERVO TECHNIQUE
A. Fundamentals of Electrostatic Probe-Sample Interactions
To ensure proper operation of the force microscope's feedback loop upon scanning in a mode of constant probe-sample interaction, the interaction must be a monotonically varying function of the probe-sample separation. In general, both surface forces and magnetostatic forces may locally be attractive, repulsive, or vanishing. Thus, an additional "servo force" is required to control the gap between probe and sample in a well-defined way. Scanning force microscopy (SFM) has proven capable of sensing Coulomb forces resulting from a charging of probe and sample at high spatial resolution (Stern et al., 1988). In principle, the sensitivity of SFM is high enough to detect single free electrons (Rugar and Hansma, 1990). It is thus convenient to use the electrostatic interaction between probe and sample for servo purposes. In order to model the electrostatic interaction between the SFM probe and a flat sample surface, the probe is often approximated by a sphere. However, this is of course a poor approximation for a sharp SFM probe. Force-versus-distance curves calculated according to this oversimplified model only permit an order-of-magnitude estimate of the electrostatic interaction. As already emphasized in the context of Section II, approximation of the probe by a paraboloid of rotational symmetry is a first step towards a better characterization of probe-sample interactions. The parabolic coordinates are given by (see, for example, Moon and Spencer, 1961)

x = μν cos θ,   (120a)
y = μν sin θ,   (120b)
z = (μ² − ν²)/2.   (120c)
The probe's surface is then determined by ν = ν0, 0 ≤ μ < ∞, and 0 ≤ θ ≤ 2π. The quantity ν0 (0 ≤ ν0 ≤ ∞) determines the sharpness of the
probe. Using this paraboloidal approximation, the problem of calculating the electrostatic interaction between probe and sample is reduced to a well-known boundary value problem of classical potential theory. The solution of Laplace's equation for the electrostatic potential may conveniently be expressed in terms of Bessel functions of order zero for μ and ν, combined with the usual trigonometric functions for θ (see, for example, Moon and Spencer, 1961). If the probe carries a homogeneous surface charge density, rather than exhibiting a constant electric surface potential with respect to the sample, the electrostatic potential is of the form

φ = α + β ln μ,   (121)
where α and β are determined by the sharpness of the probe, by its surface charge density, and by the dielectric constant of the intervening medium between probe and sample. Once the potential has been determined, the corresponding electrostatic force between probe and sample is obtained by the standard methods of potential theory (Morse and Feshbach, 1953). At small probe-sample separations, z << R, surface curvature may become substantial. In this limiting regime, the Derjaguin approximation (Derjaguin, 1934), which was already discussed in Section II.B.4, allows one to a certain degree to account for a smooth surface curvature. The approximation yields for the electrostatic force between probe and sample
F = −πεε0V²(RpRs/[Rp + Rs])/d,   (122)
where ε is the static dielectric constant of the intervening medium between probe and sample and V is the applied voltage. Rp and Rs are the effective radii of curvature of the probe and sample surfaces, respectively, and d is the probe-sample separation. If the sample is locally flat, the geometric factor in Eq. (122) reduces to the probe's radius of curvature Rp. If the probe consists of a polarizable material rather than of a conducting material, an electric field present between the electrodes of probe and sample causes the occurrence of electric charges on the probe's surface and in its interior. The resulting charge distribution may conveniently be modeled by approximating the probe's apex region by a homogeneously polarized prolate spheroid, as shown in Fig. 36. The prolate spheroidal coordinates are given by (see, for example, Moon and Spencer, 1961)

x = a sinh η sin ϑ cos ξ,   (123a)
y = a sinh η sin ϑ sin ξ,   (123b)
z = a cosh η cos ϑ,   (123c)
where a determines the focal position of the probe. The probe’s surface is
FIGURE 36. Model used for calculating the electrostatic interaction between a dielectric probe and a dielectric or metallic sample.
then given by η = η0 (1 ≤ η0 ≤ ∞), which determines the sharpness, and 0 ≤ ϑ ≤ π, 0 ≤ ξ ≤ 2π. The problem of calculating the electrostatic potential for the probe-sample arrangement is thus again reduced to a standard problem of potential theory. The elementary solutions of Laplace's equation are of the form (see Moon and Spencer, 1961)

φ = Pn(cosh η)Pn(cos ϑ), Qn(cosh η)Pn(cos ϑ), etc.,   (124)
where Pn and Qn are Legendre functions of the first and second kind of order n. The complete solution for the potential is then obtained by a linear combination of the elementary solutions. As pointed out in Section II, the use of dielectric probes is of particular importance with respect to the analysis of long-range surface forces. The Laplace solution for dielectric probes, Eq. (124), is directly transferable to the magnetic microsensors used in magnetic force microscopy.
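For numerical work with Eq. (124), the required Legendre functions are conveniently generated by recurrence. The helper below is an illustrative addition evaluating Pn and, for arguments cosh η > 1, Qn.

```python
import numpy as np

def legendre_P(n, x):
    """Legendre polynomial P_n(x) via the standard three-term recurrence."""
    p0, p1 = np.ones_like(np.asarray(x, dtype=float)), np.asarray(x, float)
    for k in range(1, n):
        p0, p1 = p1, ((2*k + 1)*x*p1 - k*p0)/(k + 1)
    return p1 if n > 0 else p0

def legendre_Q(n, x):
    """Legendre function of the second kind for x > 1 (i.e. cosh(eta) > 1),
    built from Q0 = 0.5 ln[(x+1)/(x-1)] and Q1 = x*Q0 - 1 with the same
    recurrence; these decay away from the probe and enter Eq. (124)."""
    q0 = 0.5*np.log((x + 1.0)/(x - 1.0))
    q1 = x*q0 - 1.0
    if n == 0:
        return q0
    for k in range(1, n):
        q0, q1 = q1, ((2*k + 1)*x*q1 - k*q0)/(k + 1)
    return q1

# one elementary solution of Laplace's equation, Eq. (124):
eta, theta = 1.2, np.pi/3
print(legendre_Q(1, np.cosh(eta)) * legendre_P(1, np.cos(theta)))
```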
B. Operational Conditions

In order to maintain a constant force F or compliance F′ between probe and sample, the microscope's servo increases the probe-sample spacing over regions of strong attractive interaction and reduces the spacing over regions of weak attraction. The resulting contours of constant F or F′, z = z(x, y), are determined by

Fs(′) + Fe(′) + Fm(′) = F0(′),   (125)

where Fs(′) is the surface force contribution, Fe(′) the electrostatic servo contribution, Fm(′) the magnetic contribution, and F0(′) the constant force or compliance maintained by the servo system. For purposes of contrast modeling, Eq. (125) is solved by standard iterative methods, where the results presented in Section II may be employed to account for Fs(′). Suitable results for Fm(′) are presented in Section IV. It has been demonstrated
that the electrostatic contribution Fe(′) in Eq. (125) may be used to separate topographic and magnetic information when an additional sinusoidal voltage between probe and sample is superimposed (Schönenberger and Alvarado, 1990a). If the working distance is sufficiently large, the surface force contribution Fs(′) in Eq. (125) may often be neglected. The magnetic response of the microscope may be linearized by choosing Fe(′) >> Fm(′). Thus, the local variation of the probe-sample spacing is given by

Δz(x, y) = Fm(′)(x, y)/(∂Fe(′)/∂z)|d,   (126)

where d = z(F0(′)) is the average spacing. If magnetic forces are absent, Eq. (125) provides an elegant method of obtaining information about the effective probe radius and about the surface forces. If the decay rate of the relevant surface force is given by a CRp/dⁿ inverse power law, and if the probe-sample separation is sufficiently small so that Eq. (122) can be applied for the electrostatic interaction, then Eq. (125) yields

πεε0(V + φc)²/d + C/dⁿ = F0/Rp,   (127)
where F0 is the maintained force and Rp the probe's effective radius, while the sample is considered to be perfectly flat. In addition to the applied voltage, the contact potential φc between probe and sample is included. If the relevant surface force is the nonretarded VDW force, then C = Hn/6 and n = 2, where Hn is the nonretarded Hamaker constant (see Section II.B.4). Equation (127) thus yields for a constant-compliance measurement

πεε0(V + φc)²/d² + Hn/3d³ = F0′/Rp.   (128)
If V is ramped from negative to positive values while maintaining F0′, the d(V) curves approach straight lines for voltages |V| ≳ 1 V, since the VDW interaction can be ignored in this regime. The slope is given by

∂d/∂V = (πεε0Rp/F0′)^(1/2)   (129)
and allows the determination of the effective probe radius Rp. The asymptotes of d(V) intersect at V = −φc, which allows a measurement of contact potentials. Finally, a determination of the minimum probe-sample separation, d0 = d(V = −φc), allows via

Hn = 3F0′d0³/Rp   (130)
a measurement of the nonretarded Hamaker constant. For the retarded
VDW interaction one obtains (see Section II.B.4)
Hr = F0′d0⁴/2πRp.   (131)
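The measurement scheme of Eqs. (128)-(130) can be simulated end to end: a synthetic d(V) curve is generated from Eq. (128), and probe radius, contact potential, and Hamaker constant are then recovered from its asymptotes. All "true" parameter values in the added sketch below are assumptions.

```python
import numpy as np

eps0 = 8.8541878128e-12
pe = np.pi * eps0        # pi*eps*eps0 with eps = 1 (vacuum/air gap assumed)

def spacing(V, phi_c, R_p, F_c, H_n):
    """Solve Eq. (128), pi*eps*eps0*(V+phi_c)^2/d^2 + H_n/(3 d^3) = F_c/R_p,
    for the spacing d maintained at constant compliance F_c."""
    d = []
    for v in np.atleast_1d(V):
        r = np.roots([F_c/R_p, 0.0, -pe*(v + phi_c)**2, -H_n/3.0])
        d.append(max(x.real for x in r
                     if x.real > 0 and abs(x.imag) < 1e-6*x.real))
    return np.array(d)

phi_c, R_p, F_c, H_n = -0.3, 50e-9, 1e-4, 1e-19   # assumed "true" values
V = np.linspace(-5.0, 5.0, 201)
d = spacing(V, phi_c, R_p, F_c, H_n)

s_p, b_p = np.polyfit(V[V > 2], d[V > 2], 1)      # positive-V asymptote
s_n, b_n = np.polyfit(V[V < -2], d[V < -2], 1)    # negative-V asymptote
R_fit = s_p**2 * F_c / pe                         # Eq. (129)
V_star = (b_n - b_p)/(s_p - s_n)                  # crossing at V = -phi_c
d0 = spacing(np.array([V_star]), phi_c, R_p, F_c, H_n)[0]
H_fit = 3*F_c*d0**3/R_fit                         # Eq. (130)
print(f"R_p = {R_fit*1e9:.1f} nm, phi_c = {-V_star:+.2f} V, "
      f"H_n = {H_fit:.2e} J")
```

The recovered values agree with the assumed inputs, illustrating why the d(V) method is attractive for in situ probe characterization.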
First direct measurements of Hamaker constants according to this method have recently been presented by den Boef (1991), where some reasonable values for the constants were obtained. Complete VDW images of a sample may be obtained by recording d(V) curves at every probe position. In this context it seems worthwhile to emphasize the relationship between lateral and vertical resolution inherent to all force microscopes (Hartmann et al., 1991):
F(′)(x + Δx, y + Δy) − F(′)(x, y) = min(F(′)).   (132)
Two points of the sample surface, the lateral positions of which are mutually shifted by Δx and Δy, are distinguishable if they produce a variation in force F or compliance F′ which is equal to the force sensitivity min(F) or compliance sensitivity min(F′). For optimized instruments, min(F(′)) is mainly limited by the thermal vibration of the cantilever (Sarid, 1991). The thermal vibration amplitude is determined by the equipartition theorem and is given by
c0⟨(Δd)²⟩/2 = kT/2,   (133)
where c0 is the spring constant of the free cantilever. For example, for a c0 = 1 N/m cantilever at room temperature, the rms thermal vibration amplitude amounts to 0.06 nm. Under typical operating conditions, thermal noise limits the detectable compliance min(F′).
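Eq. (133) is easily checked numerically; the following added two-liner reproduces the quoted example.

```python
import numpy as np

k_B = 1.380649e-23

def rms_thermal_amplitude(c0, T=300.0):
    """Eq. (133): c0 <(Δd)^2>/2 = kT/2, i.e. rms amplitude sqrt(kT/c0)."""
    return np.sqrt(k_B * T / c0)

print(f"{rms_thermal_amplitude(1.0)*1e9:.3f} nm")  # about 0.064 nm at 1 N/m
```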
IV. THEORY OF MAGNETIC FORCE MICROSCOPY

A. Basics of Contrast Formation
If the probe and sample of a scanning force microscope (SFM) exhibit a magnetostatic coupling, the magnetic force microscope (MFM) is realized. The manifestation of magnetostatic interactions is obvious if a sharp ferromagnetic tip is brought into close proximity to the surface of a ferromagnetic sample. Raster-scanning of the tip across the surface then allows the detection of spatial variations of the probe-sample magnetic interaction. However, contrary to electromagnetic surface forces and to externally applied electrostatic tip-sample interactions, the long-range magnetostatic coupling is not directly determined by the mesoscopic probe geometry, but rather by the internal magnetic structure of the ferromagnetic probe. As
134
U. HARTMANN
shown in the following, this complicates matters extremely and requires a detailed discussion of contrast formation. A sharp ferromagnetic needle exhibits in a natural way a considerable magnetic shape anisotropy which forces the magnetization vector field near the probe's apex to predominantly align with the axis of symmetry of the probe. On the other hand, sufficiently far away from the apex region, where the probe's cross-sectional area is almost constant, the more or less complex natural domain structure is established. This domain structure depends on the detailed material properties represented by the exchange, magnetocrystalline anisotropy, and magnetostriction energies. Lattice defects, stresses, and the surface topology exhibit an additional influence on the domain structure (see, for example, Chikazumi, 1964). Because of this complicated situation, it is necessary to develop reasonable magnetic models to describe the experimentally observed features of magnetostatic probe-sample interaction as accurately as possible. Since it is generally hopelessly complicated to derive the actual magnetization vector field of a probe from first principles (Brown, 1963), it is reasonable to apply the model shown in Fig. 37. The unknown magnetization vector field near the probe's apex, with all its surface and volume charges, is modeled by a homogeneously magnetized prolate spheroid of suitable dimension, while the magnetic response of the probe outside this fictitious domain is completely neglected. The second assumption is that the dimensions and the magnitude of the homogeneous magnetization of the detector domain are both completely rigid, i.e., independent of external stray fields produced by the sample (Hartmann, 1988). In this way the micromagnetic problem is simplified to a magnetostatic problem. It is shown in what follows that this model allows a simulation of almost all experimental results obtained so far. Moreover, the concept of assuming a single prolate spheroidal domain which is magnetically effective
FIGURE 37. Effective-domain model used for contrast analysis in magnetic force microscopy.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
135
approaches reality for bulk ferromagnetic probes surprisingly well, as shown in Fig. 38. Using this “effective-domain model,” the problem is now to determine the probe’s magnetic properties and the probe-sample magnetostatic interaction for a variety of experimentally relevant cases. The magnetostatic potential created by any ferromagnetic sample is given by
where Ms(r’) is the sample magnetization vector field and s’ an outward normal vector from the sample surface. The first two-dimensional integral covers all surface charges created by magnetization components perpendicular to the bounding surface, while the latter three-dimensional integral contains the volume magnetic charges resulting from interior divergences of the magnetization vector field. The stray field is then given by
FIGURE 38. Transmission electron microscope image of an electrolytically prepared nickel tip taken in the Foucault mode of Lorentz microscopy (see, for example, Reimer, 1984). Details of probe fabrication and characterization were given by Lemke et al. (1989) and McVitie and Hartmann (1991). The dark pattern outside the tip reflects the stray field component oriented along the probe’s axis and allows a determination of the relevant apex domain dimensions. Note that apart from the apex domain, no sources of flux escape are observable. The irregularities on the probe’s surface are nonmagnetic contaminations probably resulting from the fabrication process.
136
U. HARTMANN
H,(r) = -v$$(r). The magnetostatic free energy of a microprobe exposed to this stray field is
where $ , ( T I ) is given by Eq. (134) and Mp(r’) is the magnetization vector field of the probe. The resulting force is then given by F(r) = -v$(r). This ansatz is rigorously valid for any probe involving an arbitrary magnetization field Mp(r).The first integral, taken over the complete surface of the probe, covers the interaction of the stray field with free surface charges, while the latter volume integral involves the probe’s dipole moment as well as possible volume divergences. According to the effective-domain model (see Fig. 37), Mp(r) is divergence-free, and the latter integral in Eq. (135) reduces to the dipole response exhibited by the probe. In many cases of contrast interpretation, it turns out that even further simplification of the probe’s magnetic behavior yields satisfactory results (Hartmann, 1989a, 1990b). The effective monopole and dipole moments of the probe, resulting from a multipole expansion of Eq. (135), are projected into a fictitious probe of infinitesimal size which is located an appropriate distance away from the sample surface. The a priori unknown magnetic moments as well as the effective probe-sample separation are treated as free parameters to be fitted to the experimental data. This is known as the “point-probe approximation” (Hartmann, 1989a). The force acting on the probe which is immersed into the near-surface sample
FIGURE39. Four basic geometrical arrangements often met in magnetic force microscopy. Only the force component along the vector n is usually detected. In the most simple situation, the probe’s magnetic moment m and the average normal vector from the sample surface are on the axis (a). In (b) n and z are on the axis, while m is arbitrarily tilted. (c) involves a situation in which m is perpendicular to the sample surface, while an off-axis force component is detected. (d) reflects the general situation, where all involved vectors are mutually tilted.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
137
microfield is then given by F
= po(q
+ m.v)H,
which implicitly involves the condition v x H = 0. q and m are the probe’s effective monopole and dipole moments. However, as shown in Fig. 39, this force is generally not directly detected by MFM. Usually the instrument detects the vertical component of the cantilever deflection. (Some instruments, however, allow a simultaneous detection of lateral components.) The detected force component is thus rather given by Fd = n-F, where n is the outward unit normal from the cantilever surface. Well-defined different orientations of the probe with respect to the sample, as shown in Fig. 39, then allow the successive detection of lateral as well as vertical field components. Putting Eq. (136) into component form, one gets the more illustrative result
which is the basis of contrast modeling if the MFM is operated in the static mode. However, most instruments are operated in the dynamic mode, where the probe-sample separation is periodically modulated with an oscillation amplitude which is small compared with the average probe-sample distance. In this case the compliance component Fd(r) = (n.v)(n.F(r)), with F(r) according to Eq. (136), is detected. Contrast modeling is then based upon
which involves, apart from monopole moment and dipole components, “pseudo-potentials” $p = aq/dxi and “pseudo-charges’’ q p k i = dmk/axi.2 These pseudo-contributions result from the fact that the actual magnetic response of a real probe of finite size clearly depends on its position with respect to the sample surface. This aspect, which has been completely neglected in previous models, is further clarified in Section 1V.B. In the present context the most important consequence is that in ac mode v’q = I could of course also be associated with a “pseudo-current” and v.m = Vv.M with a “pseudo-divergence” of the probe magnetization within the volume V . However, in the context of Eq. (138), the component form is emphasized and the denotations “pseudo-potential” and “pseudo-charge” are thus preferred.
138
U. HARTMANN
MFM, it is not only the second derivatives of the field components that contribute to the ultimately observed contrast but, according to Eq. (138), also the first derivatives, as well as the field components themselves. The number of field derivatives entering Eqs. (137) and (138) is reduced by v x H(r) = 0, leading to
The most serious limitation of the point-probe approximation is of course that low-pass filtering of the sample’s stray field configuration due to the finite probe size is completely neglected (Schonenberger and Alvarado, 1990a). This latter effect can be accounted for by applying a low-pass filter of type
where r = ( p ,d ) determines that geometrical center of the probe which is at a height d above the sample surface. p‘ is a cross-sectional radial vector whose range is determined by a certain effective probe diameter A. B. Properties of Ferromagnetic Microprobes
1. Bulk Probes Most force sensors in MFM have been fabricated from fine electrochemically etched ferromagnetic wires (Lemke et al., 1990), predominantly made out of nickel and iron. For these soft magnetic materials the apex domain is consistently found to exhibit a major axis length (see Fig. 37), between a few hundred nanometers and about one micrometer (Schonenberger and Alvarado,l990a; Goddenhenrich et al., 1990a), while for some tips, lengths of more than 10 micrometers have been found (Rugar et al., 1990). The actual extent of the apex domain is closely related to the sharpness of the tip. According to the effective-domain model introduced in Section IV.A, the shape-anisotropy field is related to the on-axis demagnetization coefficient by (141a) H , = M ( I - 3N>)/2, where M denotes the probe’s magnetization. N> depends on the aspect ratio a (minor to major semiaxis) of the apex domain (Bozorth, 1951), which is determined by the near-apex geometry of the probe. Figure 40 shows H , as a function of a. For purposes of comparison, the magnetocrystalline
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
0.01 0.1
0.2
0.4
0.6
0.8
139
1.0
a FIGURE40. Shape-anisotropy field normalized to the tip magnetization as a function of aspect ratio. For purposes of comparison, the magnetocrystalline anisotropy fields for the most frequently used tip materials are indicated.
anisotropy fields HK
= 21Kl/pOM,
(141b)
with the room-temperature anisotropy constants IKyI = 4.1 x lo5 J/m3, 4.8 x 104J/m3, and 4.5 x lo3J/m3 for cobalt, iron, and nickel are indicated, respectively. The on-axis orientation of the near-apex domain becomes unstable if H s ( a ) 5 H K . Thus, for iron and nickel even relatively blunt probes still exhibit on-axis polarization, while for cobalt at least a sharpness of ( x 5 0.3 is required. Once the actual dimensions of the effective domain are determined, e.g., by direct observation as in Fig. 38, contrast modeling is based either directly on Eq. (135) or is performed in the point-probe approximation represented by Eqs. (137) and (138). Suitable algorithms for calculating the sample stray field are discussed in Section 1V.C. However, an essential question in the present context of probe characterization is that of stray fields, which are produced by the probes themselves, as can clearly be seen in Fig. 38. At sufficiently small probe-sample separations, this highly focused microfield may perturb the near-surface magnetization of soft magnetic samples (Hartmann, 1988; Goddenhenrich rt al., 1988; Mamin et al., 1989; Scheinfein et al., 1990). However, apart from destructiveness, new concepts for MFM may arise from the availability of highly focused magnetic stray field sources, as provided by sharp tips. This latter aspect is discussed in
140
U. HARTMANN
detail in Section 1V.E. In any case, the effective-domain model permits a fairly realistic analysis of the stray field configuration produced by typical MFM probes. It is convenient to start the analysis with a paramagnetic, prolate spheroidal particle exposed to an external, homogeneous magnetic field. The situation is thus in complete analogy to the case of a dielectric tip exposed to an electric field as discussed in Section 111. Boundary conditions for the present magnetic Dirichlet problem with axial symmetry are
where, as before, prolate spheroidal coordinates ( q , 8 ,E ) have been applied. Thus, a cos qo equals the major semi-axis R , of the spheroidal particle under consideration. The externally applied field Ho is parallel to R,, and p denotes the relative permeability of the particle. 4i and $e are the interior and exterior magnetostatic potentials, and 4o is a less important gauge constant. The particular solutions of Laplace's equation are
4e(v16 ) = do + Hoa(cosh 71 + [ ( P - 1 )/c,~J cash vosinh 7 o Q i (coshqo)) cos 6, ( 143a)
and 4 i ( q , S ) = 40+ H o ~ / c , , , ( Q ~ ( c o s h ~ o ) c o-sQ ~ ~i ( c o s h ~ ~ ) s i n ~ ~ ) (143b) cosh q cos 8, with c ~ , ,= , ~Q;(cosh 770) cosh 70- pQi(cosh 70)sinhqo.
( 143c)
Q, (x) is the Legendre function of second kind and order one, with Q ( ( x )= dQ/dx. The corresponding fields are then easily derived by applying
where u, and He(qi 0) =
Ug
are unit vectors. Thus, one obtains at any exterior point
Ho Jsin2 q sin2 e
+
{
-
[sinhq
+
-
cosh qo sinh qoQ,'(cosh q)
-
cosh qo sinh qOQi(cosh coo 'I
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
141
An experimentally important situation (see Section 1V.E) is given by replacing the paramagnetic probe by an ideal, soft ferromagnetic one of high permeability, >> 1. Then Eq. (145) becomes
which allows a calculation of the field for an MFM probe which is polarized by an exciter coil as used in an early experiment (Martin and Wickramasinghe, 1987). The maximum field is produced at the probe’s apex. 6 = 0 and 7 4 q0 yields
This equation quantifies the well-known result that a sufficiently sharp, soft magnetic tip produces an apex stray field which by far exceeds the driving field produced by the exciter coil. The lower field limit is of course given if the tip degenerates to a sphere, qo -+ 00, which yields He,min = 3Ho. Returning to the paramagnetic case, Eq. (143b) yields the interior field
HI
=
- ( ~ o H ~ / ~ , ~ ~ ) r Q ; ( c o s h ~ o) cQo is(hc ~o soh ~ o ) s i n h ~ o ] ~(148) z,
with z = acosh vcos 0. The homogeneous demagnetizing field in the interior of a prolate spheroidal particle is usually characterized in terms of the principal demagnetization coefficient N , (Bozorth, 1951):
H,
Ho - N>M,
( 149a)
where M is the induced magnetization. Considering the magnetic induction (1 49b) one immediately obtains (1 50a) and thus (1 50b)
142
U. HARTMANN
Comparison with Eq. (148) yields N>(VO)=
Qi(cosh710)
QI(cash 70)- Ql(cosh V O ) coth 710
(151)
for the relevant geometrical demagnetization coefficient. Now, it is straightforward to deal with a usual MFM probe exhibiting the spontaneous magnetization M . The interior magnetostatic potential corresponding to the demagnetizing field is then given by
d,(q,8)= MaN>(Vo) coshVcoshf?,
( 152a)
while at any exterior point
4 e (v,e ) = Ma[l - N > ( V O 11[cash v o / Q I (cash v o 1I Q 1 (cash 7) cos 6, ( 152b) which is related to the stray field. Figure 41 shows the modified equipotentials about a typical MFM probe for (interior) N , / ( 1 - N > ) 4 (exterior)’
(153)
The vertical stray field component,
x/a FIGURE 41. Equipotentials about a typical magnetic force probe, a denotes the focal distance to the center of the apex domain.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
143
is of particular importance. Via Eq. (144) one obtains H , ( q , 8 ) = - M[1 - N,q0)](~~~hq0/Q1(~~~h(r10)](sinh2 qQ~(coshq)cos20
+ cosh qQ,(cosh q ) sin28]/[sin2q + sin281.
(1 5 5 )
Along the probe’s axis of symmetry, 8 = 0, this reduces to Hz ( z ) = - W1 - N> (11011[cash QO / Q 1 (cash T O)I Q I’(cash 77).
(156)
Figure 42 shows the decay of the axial stray field component with increasing distance to the apex for various values of the aspect ratio Q = R,/R, inherent to the effective domain. Q determines the maximum stray field directly at the apex as well as the decay rate. A sphere, a = 1, which is used by many authors for contrast analysis in MFM, yields the minimum apex field strength H , = 2M/3 and a maximum decay length. A sphere is certainly a rather poor approximation for a sharp MFM probe with Q << 1 . With decreasing aspect ratio the apex field strength increases, while the characteristic range of the stray field decreases. Thus, if a minimum or maximum stray field is required for an experiment, the appropriate aspect ratio of the probe is determined by the desired probe-sample separation, e.g., tunneling distances or a working distance of more than 100nm.
i
0 . 9 b
0.34
0.001
\
0.01 0
0.l o o
1 .do0
FIGURE 42. Vertical stray field component produced by typical magnetic force probes as a function of separation from the probe’s apex for various values of the aspect ratio LY = R , / R , . R < ,,denote the minor and major semiaxes of the erective apex domain. M is the spontaneous magnetization of the probe.
144
U. HARTMANN
1
30
a FIGURE43. Vertical stray field component directly at the apex of magnetic force sensors as a function of aspect ratio. The approximation is obtained by first-order expansion of the accurate result.
For tunneling experiments involving ferromagnetic microprobes, the vertical field strength directly at the apex of the tip is of great importance. This quantity is obtained with 77 = qo in Eq. ( 1 56) and is shown in Fig. 43 as a function of the aspect ratio. Probes with (Y 5 0.03 exhibit a stray field almost equal to the spontaneous magnetization, while the minimum value of H , = 2M/3 is obtained when the tip degenerates to a sphere. This result shows that MFM tips can produce high magnetic fields in extremely small areas (10 nm scale) at the sample surface. In order to quickly estimate the actual field magnitude for a sharp tip, first-order expansion of Eq. (156) leads to the convenient form H,(o)
=
M[I
+ ( a 2 / 2In) ( a 2 / 2 ) ] .
( 1 57)
Results obtained by this approximation are also indicated in Fig. 43. This analysis shows that contrast formation in MFM is governed by a kind of uncertainty principle. Sharp probes which would yield a high resolution from the geometrical point of view are likely to perturb the magnetic object via their high stray fields, while dull probes produce less perturbing stray field but also exhibit a reduced lateral resolution. The radial stray field component produced by the microprobe plays a role, for example, if the probe is raster-scanned across a magnetic object which is sensitive to in-plane pinning forces. Such a probe-induced pinning has been observed for highly mobile interdomain boundaries in iron
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
145
0.30 0.25
0.20 0.15 0.10
I 0.05 0.00 I -.05 -.lo -.15
-.20 -.25
-.30
in combination with Eq. (144) yields H , ( 7 / ,19)=
M [1
-
N,(~o)][coshqO/Ql (cosh qo)][sinhvsin0cos8/ sin2 v
+ sin2 O]lQl(cosh v) - cosh ~)Q{(coshq)].
(159)
The behavior of the radial stray field component at the sample surface is shown in Fig. 44 for probe-sample contact. The maximum magnitude is almost independent of the aspect ratio (Y and is about 30% of the saturation magnetization. However, the peak's sharpness and its distance to the probe's axis of symmetry are sensitive functions of a. At large probesample separations (see Fig. 45), the initially pronounced field peaks of sharp probes are smeared out and gradually diminish, while a spherical probe still produces a more or less pronounced peak. The philosophy inherent to the effective-domain model is to approach the net magnetic response exhibited by an MFM probe which involves a more or less complex internal micromagnetic configuration by considering the purely magnetostatic behavior of an idealized apex domain (see Fig. 37). The total magnetic dipole moment of the probe according to this model is
146
U. HARTMANN
-.040
,
-10
. ,
-0
.
,
-6
14
12
0
2
4
6
0
10
dR< FIGURE 45. Same as in Fig. 44 (note the difference in scaling of the abscissa), but for a probe--sample separation that equals the minor semiaxis of the apex domain.
then simply given by m = VM, where V is the volume and M the spontaneous magnetization of the effective apex domain. However, according to Eq. (135), the magnetostatic free energy of the probe, which is exposed to the near-surface microfield of the sample, involves an integration of the spatially varying magnetostatic potential over the probe’s bounding surface and of its gradient, i.e., of the stray field, over the complete volume of the probe. This implies that for sample stray fields whose characteristic vertical decay lengths X are much smaller than the major axis 2 R , of the probe’s apex domain, only a small part of this domain is really relevant for contrast formation. The probe’s effective volume element obviously involves a “magnetic monopole moment,” since the net surface charge of the tip no longer vanishes. This latter aspect was completely neglected in almost all previous discussions of contrast formation. If only a small part of the probe’s apex domain contributes to contrast formation, the tiny magnetic object obviously images the dull probe. This latter phenomenon is of course the reason for the fairly high lateral resolutions which have been obtained in some experiments (Grutter et af., 1990b; Hobbs et al., 1989; Moreland and Rice, 1990), where magnetic objects much smaller than the actual probe diameter, 2 R , , have been imaged. To obtain a complete deconvolution of probe and sample properties, as desired in the point-probe approximation (see Section IV.A), effective magnetic monopole and dipole moments have to be attributed to the probe.
147
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
These have to be compatible with the average range of the sample microfield under investigation. If X is the characteristic vertical decay length, and if the probe-sample separation is given by d, then the actual range of interaction of the microfield with the probe’s apex domain is given by 6 = X - d. The net magnetic charge and dipole moment, carried by the probe, are obtained from simple geometrical arguments: ( 160a)
= - x ( ~ 6 ( 2 R ,- 5 ) M
and
m
2 .2
= T (Y
5 (R,
-
( 160b)
6/3)M,
where M = 3= JMlu,is assumed in accordance to the effective-domain model shown in Fig. 37. The apex domain is completely determined by the aspect ratio (Y = R J R , , the major semiaxis R,, and the spontaneous magnetization M , as before. The effective interaction range 6 in turn determines the relevant probe diameter
A = 2ad-6, (161) which is, according to Eq. (140), an important quantity to model size
-
-.2-
dp-..
-.4-
-
-.6-
-2.-1
.o
a = 0.1 I
I
I
I
I
I
1
I
FIGURE46. Constitutional parameters of the advanced point-probe approximation which allow a fairly realistic estimate of the effective monopole and dipole moments, q and m,as well as the effective probe diameter A. q%pand yp are the pseudo-potential and the pseudo-charge effective in the dynamic mode of operation. The total magnetic charge of the probe is denoted by Q.S is the effective interaction range which is normalized with respect to the probe’s major semiaxis. The assumed aspect ratio may be considered as typical.
148
U. HARTMANN
effects in the advanced point-probe approximation. The effective, moments implicitly depend on the working distance via S = X - d. Thus, if d is periodically modulated upon dynamic operation of the MFM, the magnetic nonlinearity of the sensor leads to the "pseudo-contributions" (162a) 4p = -27ra(R, - 6 ) M , which reflects a potential, and q p = w,
( 162b)
which gives an additional magnetic charge. These pseudo-contributions have already been taken into account in Eq. (138). Figure 46 shows the dependence of all quantities required to characterize an MFM probe in terms of the advanced point-probe model on the effective interaction range. Up to now it was implicitly assumed that the local microfield produced by the sample exhibits a characteristic decay length X in the vertical direction, while the lateral variation is long-range with respect to the effective probe diameter A. However, this situation is of course somewhat pathological. Generally, the radial range of interaction also involves a certain decay length p , In this case, it is convenient to define a modified vertical range of interaction by
6'
= R, -
4R;
- ( p / c ~ ) (with ~ p
5 aR,).
(163)
Straightforward geometrical arguments then lead, with respect to Eqs. (160)-(162), to the transformations
46)
+
( 164a)
(1(6*),
m ( s ) + 7rp2(6 - S' ) M + m ( S * ) ,
( 164b)
4pm
dJ,(S*)l
( 164c)
T P 2 M + qp@* ).
( 164d)
qp(S)
+
+
The nominal probe diameter is directly determined by the radial range of interaction:
A
-+
2p.
( 164e)
Equations (164) are the basis of contrast modeling for a bulk ferromagnetic probe interacting with a microfield which locally involves a vertical decay length X and a radial in-plane decay length p . 2. Thin-Film Probes While, up to the present time, most MFM experiments have been performed using bulk ferromagnetic tip-cantilever systems, some results have also been
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
149
obtained by coating nonmagnetic wire tips with ferromagnetic thin films (Rugar et ul., 1990; Mamin et al., 1990; den Boef, 1990; Sneoka et al., 1991). The coating was performed either by sputter or electrolytic deposition of the ferromagnetic material. However, today, the most advanced cantilevers are microfabricated from silicon, silicon oxide ( S O 2 , Si203), or silicon nitride (Si3N4) by photolithographic techniques (see, for example, Albrecht and Quate, 1988). The integrated extremely sharp tips (Wolter et al., 1991) with apex radii down to about 5nm can easily be coated with soft or hard magnetic materials (Griitter et al., 1990a, 1991). The obvious advantages of microfabricated MFM sensors are the excellent mechanical properties, the extremely sharp integrated tips, and of course the possibility of batch fabrication for a variety of magnetic coatings, which relieves the tedium of etching each individual sensor. However, from the magnetic point of view it is a priori not so clear whether there are considerable advantages over the conventional bulk probes, since no detailed analysis of the magnetic properties of thin-film tips has yet been presented. The effective-domain model presented in Section IV.B.1 for bulk ferromagnetic probes permits a straightforward extension to thin film probes. Instead of directly considering the isolated ferromagnetic thin film deposited on a nonmagnetic probe, it is more convenient to apply the model shown in Fig. 47. The thin layer is modeled with respect to its magnetic properties by considering two fictitious ellipsoidal bulk probes, the foci (or apices) of which are shifted with respect to each other by the film thickness, and the magnetization vectors of which are antiparallel. Both probes are assumed to exhibit the bulk magnetic properties discussed in Section 1V.B.I. This two-probe model implicitly assumes that the thin film has a thickness which is sufficient to ensure bulk ferromagnetic properties. Now the magnetic analysis is straightforward, and the complete framework developed for bulk probes can be used. The aforementioned geometrical boundary condition, which concerns the relative apex positions of the fictitious bulk probes, allows the modeling of
FIGURE 47. Two-probe model applied to the analysis of thin-film magnetic force probes.
150
U . HARTMANN
two different types of thin-film tips. For the type-I probe it is assumed that the semiaxes of the inner probe are given by
where R>,< are the corresponding axes of the outer probe and t is the film thickness. Transfer of these conditions into prolate spheroidal coordinates yields a* =
ad1
-
( 166a)
2t/a(cosh 770 - sinh ~0
for the focal position, and CoshV; = (a/a*)[coshVO- t / a ]
(1 66b)
for the sharpness of the inner probe. Additionally, one obtains cash v* = (u/u*)cash 77,
( 166c)
which determines, together with 8' = 0 and <* = <, the location of any exterior point with respect to the inner probe. For the type-I1 thin-film probe, the geometrical boundary conditions are chosen as
0.0
-.5
5 -1.0
-1.5
,
-I
_. Type I I
FIGURE48. The two basic types of thin-film magnetic sensors. f denotes the film thickness and n the focal length of the outer spheroid.
151
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
This yields a* = a( 1
(1 68a) for the focal position of the inner probe, which exhibits the same aspect ratio as the outer one, i.e., 77; = 770. (1 68b) -
t/R,)
Hence, as before, (1 68c)
cash Q* = (u /L z *cash ) 7,
as well as 0' = 0 and E' = [. The geometrical differences between the two probe types are clarified in Fig. 48. Both types are fairly realistic and could be fabricated by suitably adjusting the parameters for thin-film deposition. With respect to the differences in magnetic behavior, it is interesting to compute the magnetostatic potential according to Eq. (153). According to the two-probe model, the total potential of a thin-film probe is given by 4,(77,0> = 4(% 0 ) - 4*(77,0 ) )
(169) t ) = 4a.,T,o. (q*,0 ) . The potentials are calculated according to where 4*(~, Eqs. ( 1 52) for any interior (71 5 7 ; ) and exterior (77 2 v0) point. Within the thin film (77; 5 77 5 q o ) , one has to use the interior potential for 4 and the exterior one for #*. The result of such a calculation is shown in Fig. 49 for a type-I probe of exactly the same outer geometry as used in Fig. 41. The comparison confirms that the stray field of the thin-film probe exhibits a 0.0 -.5
-1
.o
0
> -1.5
-2.0 t/a
-2.5
5
= 0.1 1
-1.0
I
-.5
I
0.0
I
0.5
I
1 .o
5
x/a FIGUH 49~ ~Equipotentials for a thin-film magnetic force sensor of type 1.
152
U. HARTMANN
0.0
1
-.5
-1.0
R
u #
-1.5
= 0.005 aM
I I
+ 0.005
-2.0 t/a
-2.5
aM
= 0.1
-i.o
j
-15
I
I
0.5
0.0
I
1.o
5
x/a FIGURE 50. Same as in Fig. 49, but for a type-I1 force sensor.
t
0.001
Y-V.
0.01
8
I
I
0.010
0.100
1.ooo
d/R> FIGURE 51. Axial vertical stray field component produced by a thin-film probe with respect to that of a bulk probe of same outer geometry. ddenotes the distance to the probe's apex. R , is the major semiaxis, a the aspect ratio, and t the film thickness.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
153
somewhat smaller range than that of the bulk probe. Because of the geometrical boundary conditions, some of the equipotentials close through the interior of the probe, thus yielding a homogeneous “demagnetizing” field. On the other hand, near the apex the equipotentials exclusively close through the thin film (not shown for reasons of clarity). The corresponding result for a type-I1 probe is shown in Fig. 50. Because of only slight modifications of the geometrical boundary conditions, this probe type shows a substantial reduction in stray field range with respect to type I. The interior is now completely field-free and all equipotentials close through the magnetic layer. The preceding analysis of the two basic types of thin film sensors clearly shows that, if the major goal is a reduction of the stray field range, a thinfilm tip should be of type 11. However, the dispute concerning thin-film versus bulk probes is of course also related to the question of absolute field magnitude (Griitter et al., 1991). The two-probe model permits an easy calculation of the stray field produced by a thin film probe, by applying Eqs. ( 1 55)-( 159) to the two fictitious bulk probes and by deriving the net field according to Eq. (169) by a subtraction of the field produced by the inner probe from that produced by the outer probe. The result for the axial field component along the probe’s axis of symmetry is shown in Fig. 51 for different film thicknesses and aspect ratios of a type-I1 thin-film probe. With respect to a bulk probe which exactly fits the outer geometry of the thin-film
2-
0.40 _____________
0
N
I
.Y-
O
N
I
0.01
0.05
0.10
0.50
1
a F I C ~ J R52E Apex field strength of thin-film probes with respect to those of geometrically equivalent bulk probes as a function of aspect ratio for three different coating thicknesses.
154
U. HARTMANN
probe, the stray field of the latter shows a more rapid decrease with increasing distance from the apex. This behavior is most pronounced for sharp probes (small aspect ratios a ) and small film thicknesses. Dull probes show a stronger field reduction than sharp probes right at the apex, but a relatively small decay rate. If the experimental boundary conditions are given in terms of a certain desired probe-sample separation, Fig. 51 permits a determination of that specific film thickness, which yields the minimum perturbing stray field for a tip of given sharpness. For tunneling experiments involving ferromagnetic probes, the field strength directly at the apex of the probe is of predominant importance. Figure 52 shows the apex field strengths produced by type-I1 thn-film probes with respect to those exhibited by equivalent bulk probes. For very small aspect ratios, a 5 0.01, there is no reduction in field strength at all. However, with increasing a, the field reduction exhibited by the thin-film probes becomes more and more pronounced, where thinner films produce less stray field than thicker ones. As a concrete example, a type-I1 thin-film probe with a major semiaxis of 500 nm, a film thckness of 25 nm, and an aspect ratio of 0.5 produces only 40% of the stray field which would be produced by an equivalent bulk probe (same outer geometry, same ferromagnetic material). The radial stray field magnitude in the sample plane at apex-sample contact, d = 0, is shown in Fig. 53. Again the reduction in stray field 1
I
I
I
FIGURE 53. Radial stray field component produced by thin-film probes with respect to that produced by equivalent bulk probes for vanishing probe-sample separation as a function of the radial distance to the probe's apex. For a given aspect ratio a , the upper, middle, and lower curves correspond to thick, medium, and thin magnetic coatings.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
155
increases with increasing distance from the apex. As for the vertical field component, the effect is most pronounced for dull probes and small coating thicknesses. Note that, according to Fig. 44, the radial field components of both thin film and bulk probes vanish at the apex, r = 0. However, the limiting stray field ratio remains finite. With respect to the dispute concerning thin-film versus bulk probes, the following conclusion can be drawn. An advantage of thin-film probes, apart from their advanced mechanical properties, is the reduced stray field to which the probe is exposed under typical MFM operation conditions, i.e., a probe-sample separation of at least 10 nm. At smaller separations, thin-film probes exhibit almost the same magnitude of stray field as bulk probes. This latter aspect, however, may also be considered as an advantage for applications where just a highly focused stray field is desired, as discussed in Section 1V.E. The major advantage of thin-film probes is of course that they are in general mesoscopically much sharper than bulk probes. Up to now, thin-film probes have only been considered with respect to the stray field that they produce. Intuitively, it is obvious from the two-probe model shown in Fig. 47 that both monopole and dipole moments of thinfilm probes are greatly reduced with respect to a geometrically equivalent bulk probe. However, due to the close vicinity of opposite free surface magnetic charges, the monopole moment appears to be much more greatly reduced than the dipole moment. The consequence is thus that a thin-film probe produces a magnetic image of a given sample which is generally different from an image produced by a geometrically equivalent bulk probe. The differences with respect to the sensor behavior are manifested in modifications of the magnetic moments, which are given in Eqs. (160) and (162) for bulk probes. The two-probe model provides a simple transformation procedure solely based on geometrical arguments:
( 170a) for the monopole moment, m ( 6 )+ m ( 6 ) - ( 1
-
t/R,)3m(6
-
t)
(170b)
for the dipole moment,
( 170c) for the pseudo-potential, and
( 170d)
156
U. HARTMANN
for the pseudo-charge. 6 is the effective range of interaction, as before. The nominal probe diameter A(6) remains the same, since the outer geometry of the probe does not change. The preceding magnetic moments with respect to those obtained for bulk probes are shown in Fig. 54 as a function of interaction range. The obtained results clearly emphasize the fact that the reduction in stray field has to be paid for by a reduction in magnetic sensitivity. As a concrete example, one obtains from Fig. 51 for a thin-film probe with an aspect ratio a = 0.3, a major semiaxis R , = 800 nm and a coating thickness of t = 50 nm, a reduction of the axial vertical stray field component H , by about 40% with respect to a comparable bulk probe at a working distance d = 50nm. If the sample stray field exhibits a vertical decay length of X = d 6 = 250nm, Fig. 54 then shows that the corresponding reduction of the monopole moment is about 70% and that of the dipole moment about 50% with respect to the bulk probe. If the local radial stray field range is governed by a decay length p, a modified vertical interaction range 6* is defined, as in Eq. (163). If 6 is smaller than the film thickness t , the magnetic moments of the thin-film probe are the same as for the bulk probe, i.e., they are given by
+
1 .oo
0.80 0.60
0.10
0.08 0.06
0.02
0.01
0.0
'zero:
0.1
0.2 0.3
0.4 0.5 0.6 0.7
0.8 0.9
1.0
m, FIGURE 54. Magnetic moments of thin-film probes with respect to those of equivalent bulk probes as a function of characteristic interaction range. q denotes the monopole moment, m the dipole moment, dp the pseudo-potential, and qp the pseudo-charge. Q is the total magnetic charge of the probe.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
157
Eqs. (164). On the other hand, for 6*2 t the moments are obtained via Eqs. (170) by replacing 6 by 6*. C. Contrast Modeling 1. Treatment of’ Periodic Charge Distributions Once the magnetic properties of the microprobe are determined, contrast modeling requires an appropriate solution for the microfield profile which is produced by the magnetic sample under investigation. Thus, it is not so surprising that MFM has led to a renascence of the development of advanced algorithms for stray field calculation. Such calculations received much interest almost 30 years ago, during a period when contrast formation underlying the Bitter colloid technique was extensively investigated. The first attempts to detect near-surface microfield profiles by scanned solid-state probes, e.g., Hall and Permalloy induction probes, were even performed at that time (Carey and Isaac, 1966). However, today advanced observation techniques and theoretical methods provide a much deeper insight into the near-surface magnetization of ferromagnetic samples. Microfield calculation is then based, for a given configuration of the sample’s magnetization vector field, on the integral equation (134). Fortunately, the solution can be obtained for almost all samples which have been studied by MFM in an analytical way. The effective-domain model then offers two alternatives: either a combined surface/volume integration over the idealized ellipsoidal domain of the probe, or a direct employment of the stray field components and their spatial derivatives in terms of the advanced point-probe approximation represented by Eqs. (137)-( 140). While the first way always involves some computational effort, the point-probe ansatz can be performed in a straightforward analytical way. Especially if, in terms of Eq. (l40), the finite probe size is accounted for, the data obtained from the advanced point-probe approximation turn out to be consistent with almost all experimental data which have been presented so far (Hartmann et al., 1991). In the following, contrast modeling is performed for some cases of particular experimental relevance. It turns out that classical potential theory combined with the introduction of free magnetic charges, as already used in Section IV.B, is a convenient concept to understand the contrast produced by an MFM. If one has an arbitrary two-dimensional periodic magnetic charge distribution at the sample surface, Fourier expansion of the charge density is given by (171a)
158
U. HARTMANN
with umn
=
1 J’” 4 4n2 0
1;
dV(J, 7)exp (-i
[M+ w1)
1
(171b)
where [ = x/Lx, Q = y/L,,, and where L,, Ly define the unit cell of area 4n2L,Ly. The Laplace equation, 0’4 = 0, valid exterior to the sample, together with the condition of continuity of B = po(H M), yields via H = -V+ the stray field produced by the periodically charged sample. Directly at the surface, z = 0, one thus obtains for the vertical stray field component
+
Y , 0) = 4x7 Y , )/2.
(172)
The exterior solution for the Fourier coefficients of the magnetic potential are thus dmn = -(nmn/2vmn)
~ X (P- v m n z ) i
(173a)
with the “spatial frequencies” (1 73b)
The complete exterior Laplace solution is thus given by
An important area of application to MFM is the analysis of thin-film structures, e.g., of recording media. If the probe-sample separation becomes comparable with or even exceeds the film thickness, the stray fields of both the top and the bottom sample surface contribute to the contrast. Thus, if t is the film thickness,
4(r) = -
C 1
m=-mn=-m
( g m n / v m n ) sinh tvmnf/2>~
X ( P i + WI~
- ~mnz)t
(175) where z is the vertical distance measured from the center of the film. This kind of treatment of periodic magnetic charge distributions was originally used in some classical work devoted to an analysis of the magnetostatic stability exhibited by certain periodic domain arrangements (Kittel, 1956). The applicability to highly symmetric problems in MFM is of course fairly obvious (Mansuripur, 1989; Schonenberger and Alvarado, 1990a). The form of Eq. ( 175) is particularly suitable for numerical computation involving standard two-dimensional FFT algorithms.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
FIGURE
159
55. Schematic of a thin-film longitudinal recording medium.
2. Longitudinal Recording Media In longitudinal magnetic recording, the recording head is flown over the medium with a spacing of a few hundred nanometers or less. Upon writing, oppositely magnetized regions with head-to-head and tail-to-tail transitions are created, as shown in Fig. 55. Since the transitions of width S involve free magnetic charges, a stray field is generated which transmits the stored bit configuration to the recording head. MFM is thus a particularly useful method of analysis, since it detects the generated stray field profile, which is detected by the recording head upon reading operation. Since the stray field is produced right at the transitions between the antiparallel magnetic regions, the detailed internal structure of the transition regimes is of great importance. The latter is determined by demagnetizing effects in the medium. The line charge approximation used in earlier approaches to MFM contrast formation (Mamin et al., 1988; Hartmann, 1989b; Wadas et al., 1990) may thus be an inexpedient approximation. An approximation commonly used in recording physics is (Rugar et al., 1990) M,(x)
=
-(2M/7r) arctan (x/S),
(176)
where M , denotes the in-plane magnetization component near the transition which is centered at x = 0, as shown in Fig. 55. 6 denotes the characteristic transition width and M the spontaneous magnetization in the uniformly magnetized regions. An estimate of 6 may be obtained, for example, from the Williams-Comstock model (Williams and Comstock, 1972). Substitution of Eq. (176) into Eq. (134) yields the stray field for an isolated transition (Potter, 1970): x(t
+z)
-
7r
arctan
x2
+ x62z + 6z]
(177a)
for the in-plane component, and H,(x,z)
M xft
= -
2
x2
+s +z y
+ (z+
(1 77b)
160
U. HARTMANN
1
0.15
1
-
8
1
1
~
"
'
I
'
"
'
d/w=O.l, t)w=0.05
z/w=O.l
-
0.10-
-
-.lo-.15
(b) I
I
I
I
1
.
8
9
I
a
1
I
'
_I
Hx
H,
"
B
'
FIGURE 56. Contributions to the magnetic contrast produced by a longitudinal recording medium. The field components are considered with respect to the in-plane spontaneous magnetization and are plotted as a function of lateral position. w denotes the spacing between the individual transitions. 6 denotes the effective transition width for which a representative value has been chosen. (a) shows the contrast contributions directly at the surface of the medium, together with the magnetization divergence. In (b) the working distance has been increased and is now equal to one-tenth of the transition spacing.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
161
for the vertical component. The total field is obtained by a linear superposition procedure,
l)”H(xwhere w is the spacing between the transitions. The stray field components together with the quantity d M , / a x , derived from Eq. (176), are shown in Fig. 56. According to Eq. ( 1 37), these stray field components predominantly produce the MFM contrast, if the probe mainly exhibits a monopole moment. While in Fig. 56a the contributions to the M F M contrast at vanishing working distance are shown, Fig. 56b shows the effect of increasing probe-sample separation. Apart from a decrease in magnitude, fine details of the field get lost. It should be noted that the MFM tip is still considered as a point probe of infinite lateral resolution. According to Eq. (175) the loss in information is due to the predominant damping of higher Fourier components at increasing working distance. This behavior involves a certain similarity to the “point-spread phenomenon” dealt with in common optics. Apart from the two field components, their first derivatives with respect to x and z also contribute, according to Eq. ( 1 37), to the contrast if the M F M is operated in the static mode. To obtain a better understanding of the relationship between the various contributions, Fig. 57 shows these field quantities, to which the dipole moment of the probe is sensitive. Because of the constraint given in Eq. (139), only three out of four derivatives are required to model the MFM contrast. Again the loss in information with increasing working distance is fairly obvious. Finally, the second derivatives, which according to Eq. (137) are relevant in dynamic-mode MFM, are shown in Fig. 58. Apart from the constraint given in Eq. (139), the symmetry of the arrangement yields d2H,/az2 = -d2H2/dx2. Thus, one has to calculate three out of six possible second derivatives. In general, all components shown in Figs. 56 and 57 contribute to the MFM contrast in the dc mode of operation, while in the ac mode the components shown in Fig. 58 provide additional contributions. For an arbitrary probeesample arrangement, as schematically shown in Fig. 39d, the ultimate contrast is obtained by a linear combination of the various field quantities, where, according to Eqs. (137) and (138), these are weighted by the corresponding magnetic moments of the probe and by the actual orientation of the cantilever with respect to the sample surface. The finite probe size is additionally accounted for by low-pass filtering according to Eq. (140).The probe’s effective magnetic moments and its effective diameter may either be treated as free parameters fitted to the experimental data, or
U. HARTMANN
162
may be estimated from the characteristic lateral and vertical decay rates of the stray field components. However, it must be emphasized that these quantities, which characterize the probe's response, are strictly dependent on the microfield profile under investigation. Calibration of the probe thus always refers to the particular sample used for calibration, rather than to the
1.2-
I
1
I
I
a/w=o.1,
,
-
'
1
8
I
"
"
I
"
"
t/w=0.05
-
0.8- z/w=O.l
3
v
-.4-
-
-.8-
-1.2
--
(b) I
I
I
I
'
'
'
'
I
'
'
'
'
_I
-
8H2/8z
8HU/8z 8H,/Ou
'
'
'
'
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
163
given probe operated on any arbitrary magnetic sample. This latter fact has not been recognized in earlier experimental approaches to probe calibration (Goddenhenrich et af., 1990b). On the one hand, the interplay of five basic contrast contributions in staticmode MFM and eight occurring in the dynamic mode often makes the
b / w = O . l , t/w=0.05 15 Y X
2rc)
\ I cb
>3 U
-10 -15
FIGURE58. Same as in Fig. 56, but for the second derivatives of the stray field components. These quantities become relevant if the force microscope is operated in the dynamic mode.
164
U. HARTMANN
interpretation of experimental data from longitudinal recording media difficult. On the other hand, the well-defined relative orientations of probe and sample, as shown in Fig. 39, may greatly reduce the number of individual contrast contributions. The minimum number is two, e.g., H, and d H Z / d z , for dc microscopy, and three, e.g., H,, dH,/dz, and d2H,/dz2, for ac operation. By successively modifying the relative orientation of probe and sample in order to catch both the in-plane and the vertical field quantities (see Fig. 39), a complete characterization of the recording medium may be achieved (Schonenberger and Alvarado, 1990a; Rugar et al., 1990).
3. Vertical Recording Media The basic geometry underlying the two-dimensional problem is shown in Fig. 59. A uniaxial magnetic anisotropy forces the magnetization to assume an orientation perpendicular to the sample surface. Contrarily to the longitudinal media discussed in the previous section, the magnetic charge density is established along the magnetized regions. The detailed internal structure of the transition zones is thus less important in this case, and an abrupt transition approximation may be used for simplicity. It is convenient to employ a one-dimensional form of the Fourier ansatz given in Eq. (175):
Hence, the stray field components are
H Z ( x , z )=
-'4M"E2n + 1 7r
n=O
W
W
(1 80a)
FIGURE 59. Schematic of a thin-film vertical recording medium magnetized in a squarewave pattern.
165
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
and 2nf 1
W
W
( 180b) These contributions to the M F M contrast are shown in Fig. 60. Since the assumed dimensions for the relative film thickness and working distance, t / w and z / w , are the same as for the longitudinal medium discussed in the previous section, the results shown in Fig. 60 are directly comparable with those shown in Fig. 56b. The maxima of H , and the zero-axis crossings of H , are located for the vertical medium at lateral positions given by odd multiples of w / 2 . The in-plane component, H,, exhibits exactly the same oscillation amplitude as the vertical component H,. For the longitudinal medium (see Fig. 56b), the maxima of H , and the zero-axis crossings of H , are located at the magnetization transitions, i.e., they are phase-shifted by w/2 with respect to the vertical medium. The maximum magnitude of H , is only about 60% of that exhibited by H,. From the technological point of view it is important that the oscillation amplitudes of H , differ only by a factor of 1.9 for the two media, in spite of the fact that the magnetically
*
0.15
8
8
8
9
1
1
8
8
I
1
'
"
'
I
'
"
'
t/w=0.05
z/w=O.l
.I,
0.10-
I, !
1 I
0.052 \
I
'\,
0.00-
-
-.05'
I
-
-.lo-
-.15 -1
.o
I
1
I
,
I
-.5
,
~
'
1
0.0
"
"
"
"
'
0.5
H,
1.o
x/w FIGURE 60. Contributions to the magnetic contrast for a vertical recording medium which shows a square-wave magnetization pattern. As before, I denotes the film thickness, w the spacing between the transitions, z the working distance, and M the spontaneous magnetization.
166
U. HARTMANN
charged area corresponding to a single stored bit is so much more extended for the vertical medium. However, directly at the surface H , exhibits, according to Eq. (172), a value which equals half the spontaneous magnetization of the vertical medium, while the longitudinal medium reaches only 28% of the inherent magnetization.
25 20
0
FIGURE 61. Same as in Fig. 60, but for the relevant first (a) and second (b) field derivatives.
FUNDAMENTALS OF NON-CONTACT F O R C E MICROSCOPY
167
Apart from the constraining condition, Eq. (139), the following identities, which are directly obtained from Eqs. (180), immediately yield the remaining contrast contributions:
aH,
-
x
dz (i
+
1)
(181a)
and thus
(181b) The relevant field quantities shown in Fig. 61 directly correspond to those shown in Figs. 57b and 58b for the longitudinal recording medium. 4. Magneto-optic Recording Media A very promising alternative to longitudinal recording is magneto-optic recording. The concept has received much attention mainly due to the high areal storage density which could be achieved (Rugar et af., 1987). Magneto-optic recording materials exhibit a uniaxial magnetic anisotropy which forces the magnetization to an orientation perpendicular to the film plane. The complete recording process consists of a magneto-thermal writing process and a magneto-optic reading process. Marks are written by locally heating the medium with a focused laser beam above the Curie temperature while an external bias field is present, the orientation of which is antiparallel to the local magnetization vector. After cooling below the Curie temperature, a reverse magnetic domain is formed, which is schematically shown in Fig. 62. The information is read back via Faraday rotation of a polarized laser beam reflected off the written domains. Since the cylindrical domains may be written in arbitrary patterns, it is convenient to treat the problem of MFM contrast formation first for an isolated mark. The stray field produced by an ensemble of domains is then
FIGURE 62. Schematic of a circular domain written into a magneto-optic recording medium.
168
U. HARTMANN
obtained by a linear superposition of the individual domain contributions. Thus, a Fourier ansatz as in Eq. (175) is not convenient in this case. According to Fig. 62, the boundary value problem is three-dimensional, however, involving symmetry of rotation about the vertical axis. Upon evaluating the magnetic potential according to Eq. (134), the volume integral can be dropped because the magnetization is homogeneous throughout the film thickness. Insertion of the magnetic charge density profile for the top surface in Fig. 62 yields the potential for a medium of infinite thickness:
where polar coordinates (r', 0 ) are applied, and where r is the radial distance to the center of the domain. Expansion of the integrands into power series and use of the indentity
then yields
This form of the potential, which is an alternative form to the commonly used expansion in terms of zonal harmonics (Morse and Feshbach, 1953), is particularly suitable for a quick numerical evaluation of the M F M contrast contributions. The finite thickness t of the medium is accounted for by the transformation
d(r,2 )
+
d ( r ,z ) - d ( r ,z
+t).
(185)
The resulting field components are shown in Fig. 63. Two features are particularly important. (i) The maximum radial stray field component exceeds the maximum vertical component. (ii) Far away from the domain,
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY 0.15
L 0.1
z/w
0.10-
t/w
I
I
1
I
I
I
I1 I I
= 0.05
I I
-
1 1
0.055
\
I
-
--- - - - - _ _ _ - _
0.00-:------
-.05-
I
I
I
1
-
I 1 I I I 1 I I I I
-.lo-
-
_-
I 1
-.15
-2.0
169
-i.5
-i.o
I
-.5
I
0.0
I
0.5
Hr
- H, I I 1.0 1.5
2.0
the vertical field component is solely determined by the spontaneous magnetization and the film thickness. However, approaching the transition zone, H2 first shows a negative “overshoot,” then a positive one, and finally reaches a local minimum at the center of the domain. Thus, the presence of the domain locally raises the vertical stray field far above the magnitude obtained for the uniformly magnetized medium. The remaining contrast contributions, numerically evaluated according to Eq. (184), are shown in Fig. 64. It is interesting that all these contributions exhibit their peak values close to the magnetization transition zone. Apart from the constraint given in Eq. (139), the symmetry of the arrangement yields a 2 H z l a r 2= - a 2 H z / d z 2 .
5 . Type-II Supuconductors It is of current importance to estimate the stray field produced by a hexagonal vortex lattice manifest in a type-I1 superconductor which is exposed to an external magnetic field oriented perpendicular to the superconductor’s surface. Several groups are presently working on an experimental detection of the Abrikosov vortex lattice by means of MFM. A detailed discussion of contrast formation thus seems worthwhile. From the symmetry point of view, the magnetostatic boundary value problem exhibits a certain similarity to a hexagonal arrangement of
170
U. HARTMANN
uniformly magnetized cylindrical domains within a nonmagnetic environment. Because of the hexagonal symmetry of the lattice, it is convenient to slightly modify the Fourier ansatz given in Eq. (171): ( 186a)
Hz(r,z = 0) = x f i ( G ) e x p ( i G - r ) , c
FIGURE 64. Same as in Fig. 63, but for the relevant first (a) and second (b) field derivatives (∂H_r/∂r, ∂H_z/∂r, and ∂H_z/∂z, shown as functions of r/w for z/w = 0.1).
with the Fourier coefficients H(G) given by Eq. (186b). This Fourier ansatz refers to the vertical stray field component at the surface of the superconductor, and the expansion is performed with respect to the reciprocal lattice; G_hk denotes an arbitrary reciprocal lattice vector, where h and k are arbitrarily chosen natural numbers. a is the real-space lattice constant of the flux lattice, and f_uc denotes the area of the corresponding unit cell (see Fig. 76). M denotes a fictitious magnetization: the vortices have simply been modeled by cylindrical domains of a certain radius λ. Each of these domains is assumed to carry the homogeneous vertical "magnetization" M = 2Φ₀/(μ₀πλ²), where Φ₀ denotes the elementary flux quantum. One thus obtains, for example, for niobium diselenide (NbSe₂, λ = 69 nm) a value of μ₀M = 0.27 T. The material is of particular importance for the experimental investigations, since samples usually show a high degree of crystallographic perfection. According to Eq. (172), the fictitious charge density produced at the surface of the superconductor is just twice the magnetization M. The flux produced per unit cell, μ₀Mf_uc/2, then equals Φ₀. Using

∫₀^2π dθ exp(−iGr′ cos θ) = 2πJ₀(Gr′)   (188a)

and

∫₀^λ dr′ r′ J₀(Gr′) = (λ/G)J₁(Gλ),   (188b)

where J₀ and J₁ are Bessel functions of order zero and one, one obtains for the Fourier coefficients in Eq. (186)

H(G) = πMλJ₁(λG)/f_uc G.   (189)
Commonly, these Fourier coefficients are replaced by "form factors" F(G), as defined in Eq. (190) (see, for example, Hübener, 1979),
where h_z(r′) is the vertical stray field component which would be produced by an isolated vortex directly at the surface of the superconductor. According to the present approximation, one thus obtains

F(G) = 2J₁(λG)/λG.   (191)
Hence, the surface vertical field component produced by the complete vortex lattice is, according to Eq. (186a), given by

H_z(r, z = 0) = (Φ₀/μ₀f_uc) Σ_G F(G) exp(iG·r),   (192)

where H(G) = Φ₀F(G)/μ₀f_uc was used. Equation (192) allows, according to Eq. (173), the reconstruction of the complete exterior magnetic potential, Eq. (193).
The obvious shortcoming of the preceding approach is that the detailed interior magnetic structure of the vortices has been neglected (see, for example, Hübener, 1979). However, all information about the interior structure of a vortex is represented by the form factors F(G) weighting the individual Fourier components in Eq. (193). Thus, an advanced approach must be based on the employment of more realistic form factors. It has been shown that these quantities can be experimentally obtained from neutron diffraction measurements (Schelten et al., 1974). The alternative possibility is to apply a more appropriate model for the magnetic behavior in the vicinity of a vortex core. The approach presented by Clem (1975a, 1975b) appears to be particularly suitable for purposes of MFM contrast modeling. In treating the core of an isolated vortex, Clem assumes a normalized order parameter

ψ(r) = |ψ(r)| exp(−iη),   (194a)

where r denotes the radial coordinate, η the phase, and

|ψ(r)| = r/√(r² + ξ_v²).   (194b)

ξ_v is a variational core-radius parameter. Substitution into the second Ginzburg-Landau equation yields (Clem, 1975a, 1975b)

h_z(r) = (Φ₀/2πμ₀λ_Lξ_v) K₀(√(r² + ξ_v²)/λ_L)/K₁(ξ_v/λ_L)   (195)
for the surface stray field produced by an individual vortex. K₀ and K₁ denote the MacDonald functions of order zero and one (see, for example, Abramowitz and Stegun, 1964). Substitution into Eq. (190) yields for the
form factors the expression given in Eq. (196), where λ_L denotes the London penetration depth of the superconductor at a given temperature. The variational core-radius parameter ξ_v is determined by minimizing the energy per unit length of a vortex. This leads to the constraint (Clem, 1975a, 1975b)

κ = √2 [1 − K₀²(ξ_v/λ_L)/K₁²(ξ_v/λ_L)]^(1/2) λ_L/ξ_v,   (197)
where κ denotes the Ginzburg-Landau parameter of the superconductor. Using κ = 9, one obtains for niobium diselenide a normalized core radius of ξ_v/λ_L = 0.15 at 0 K. Niobium with κ = 1.4 yields ξ_v/λ_L = 0.84. Clem (1975a, 1975b) has further shown that linear superposition according to Eq. (193) can be used to obtain the stray field at arbitrary κ values and vortex spacings, provided that the correct spatially dependent magnitude of the order parameter, given in Eq. (194), is used. Overlapping of the vortices can be approximately accounted for by replacing λ_L by a field-dependent penetration depth

λ_eff = λ_L/⟨|ψ(H₀)|²⟩^(1/2),   (198)
where ⟨|ψ(H₀)|²⟩ is the spatial average of the order parameter in Eq. (194b), which now depends on the externally applied field H₀. Hence, for overlapping vortices the form factors in Eq. (196) exhibit a field dependence not only via the lattice constant a, but also via the modified penetration depth λ_L → λ_eff. The unit cell's area is related to the externally applied field by

H₀ = Φ₀/μ₀f_uc.   (199)
Thus, one obtains from Eq. (193)

H_z(r, z) = H₀ Σ_G F(G) exp(iG·r − Gz)   (200a)

for the vertical stray field component, and

H_r(r, z) = −iH₀ Σ_G F(G) exp(iG·r − Gz) u_G   (200b)
for the in-plane field component, where u_G = G/G is a unit vector. A closer investigation of these equations shows that higher Fourier components are rapidly damped, since F(G) is, according to Eq. (196), monotonically
FIGURE 65. Contours of constant field magnitudes for an Abrikosov vortex lattice with a lattice constant a which is twice the London penetration depth λ_L. On the left, the normalized vertical field oscillations H_z/(2F₁H₀) − 1 are shown. On the right, the corresponding in-plane field oscillations H_r/4F₁H₀ are shown. The maximum and minimum values obtained directly at the surface of the superconductor are indicated. Since no special assumptions on the interior vortex structure enter the calculations, the shown flux distribution applies to any type-II superconducting material.
decreasing with increasing G. This behavior is enhanced by the increasing exponential damping for an increasing distance z to the sample surface. The first Fourier component, obtained for G = 0 in Eq. (200a), yields the external field H₀, while this component vanishes for the in-plane stray field H_r. Since F(G) = F(|G|) only depends on the magnitude of the reciprocal lattice vector, Eq. (192) shows that all reciprocal lattice vectors with h² + hk + k² = n, where n is a natural number, yield the same form factor. Figure 65 shows stray field profiles according to Eqs. (200). For the vertical field component, only the oscillations superimposed on the external bias field have been considered. The calculations were performed in a first-order approximation involving only the six reciprocal lattice vectors with G_hk = 2πa/f_uc. The corresponding form factors are all equal and are denoted by F₁. The maximum vertical field component is then max(H_z) = (1 + 6F₁)H₀, while min(H_z) = (1 − 3F₁)H₀. For the in-plane component, one obtains max(H_r) = 4√3F₁H₀ and min(H_r) = 0. Locations of these peaks are marked in Fig. 65. Equations (200) are the basis for MFM contrast modeling. The various field derivatives which are required according to Eqs. (137) and (138) all show the same symmetry as either H_z or H_r. From the experimental point of view, the most important question concerns the maximum variation in force or compliance obtained upon raster-scanning a typical ferromagnetic probe across the vortex lattice. In order to deduce a representative value, a bulk iron probe with a saturation flux density μ₀M = 2.1 T, aspect ratio α = 0.5,
FIGURE 66. Possible deformation of the vortex lattice due to the highly focused microfield produced by a ferromagnetic probe. Within the indicated circular area underneath the probe, the lattice constant changes to a*. f_uc denotes the undistorted unit cell. In the lower parts of the images, cross-sections are shown, which are taken along the indicated line scans. (a) shows a situation in which the probe's stray field is parallel to the external bias field, which leads to a* < a. In (b) both fields are antiparallel, which results in a* > a.
and semiaxis domain length R_c = 500 nm is assumed (see Section IV.B.1 for a description of the parameters). The sample is niobium diselenide, where the material parameters ξ_v/λ_L = 0.15 and κ = 9 are used. The externally applied flux density is taken as μ₀H₀ = 120 mT in order to avoid strong vortex overlap. Since both lateral and vertical stray field components involve a characteristic decay range, effective lateral and vertical ranges of probe-sample interaction, ρ and δ′, have to be determined (see Section IV.B.1). For ρ the half-width at half-maximum of the stray field taken right above the center of a vortex seems reasonable. The modified vertical range δ′ is then determined by Eq. (163). For a working distance of 5 nm, the maximum force variation amounts to 319 pN, where the finite probe size has been accounted for in terms of Eq. (140). The corresponding maximum compliance detected in the dynamic mode of operation is 89 mN/m. While the first value may just be in reach of present technology, the second should be clearly detectable. However, because of an effective probe diameter A = 56 nm, the expected lateral resolution is rather poor. The lateral forces exerted on the vortex ensemble exhibit a maximum value of 330 pN. However, this is only part of the whole story. Up to now, the stray field produced by the ferromagnetic probe itself has been completely neglected. The highly focused microfield superimposed on the externally applied field, in principle, (i) may nucleate vortices underneath the probe; (ii) may lead to a strong repulsive force between probe and sample, which is due to the local flux expulsion; and (iii) may cause a deformation of the vortex lattice, as schematically shown in Fig. 66. Issues (i) and (ii) are discussed in more detail in Section IV.E.
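The first-order lattice sum described above can be reproduced in a few lines. The sketch below (Python with SciPy, as one possible choice) sums the external field and the six first-shell Fourier components of Eq. (200a), using the disk-model form factor of Eq. (191) and the lattice constant implied by Eq. (199); the orientation of the reciprocal lattice and the NbSe₂-like numbers are illustrative assumptions.

```python
import numpy as np
from scipy.special import j1

PHI0 = 2.07e-15            # elementary flux quantum [Wb]
MU0 = 4e-7 * np.pi

def first_shell(a):
    """Six shortest reciprocal lattice vectors of the hexagonal flux lattice,
    of magnitude G1 = 2*pi*a/f_uc with f_uc = sqrt(3)*a**2/2 (the absolute
    orientation is an arbitrary convention here)."""
    f_uc = np.sqrt(3.0) / 2.0 * a**2
    G1 = 2.0 * np.pi * a / f_uc
    ang = np.arange(6) * np.pi / 3.0
    return G1 * np.column_stack((np.cos(ang), np.sin(ang))), G1

def form_factor(G, lam):
    """Disk-vortex form factor, Eq. (191): F(G) = 2 J1(lam*G)/(lam*G)."""
    return 2.0 * j1(lam * G) / (lam * G)

def Hz(r, z, a, lam, H0):
    """First-order evaluation of Eq. (200a): external field H0 plus the
    six first-shell Fourier components, damped by exp(-G z)."""
    Gvecs, G1 = first_shell(a)
    osc = sum(form_factor(G1, lam) * np.cos(Gv @ r) * np.exp(-G1 * z)
              for Gv in Gvecs)
    return H0 * (1.0 + osc)

B0 = 0.12                                       # mu0*H0 = 120 mT, as above
a = np.sqrt(2.0 * PHI0 / (np.sqrt(3.0) * B0))   # lattice constant from Eq. (199)
print(Hz(np.zeros(2), 5e-9, a, 69e-9, B0 / MU0) * MU0)  # flux density over a vortex core [T]
```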
6. Interdomain Boundaries in Ferromagnets

Interdomain boundaries are the natural magnetization transition zones between adjacent domains of different magnetic polarization. Within these transition zones the magnetization vector rotates continuously, thus forming a domain wall of finite thickness. In general, the wall thickness is determined by the exchange, magnetocrystalline anisotropy, magnetostriction, and magnetostatic energies inherent to the ferromagnet (see, for example, Chikazumi, 1964). Typical transition widths range from the nanometer scale for hard magnetic materials up to more than a micron for very soft materials. At the intersection with the crystal surface, complex two- or three-dimensional flux closure configurations generally occur in soft magnetic materials (Hubert, 1975). This near-surface modification of the wall structure is due to a natural energy minimization behavior inherent to the wall: extended free surface magnetic charges are avoided by a suitable rotation of the wall's magnetization vector field.
FIGURE 67. Schematic of a 180° Bloch wall (upper image). The type-I approximation models the wall by a plane of infinitesimal thickness, which carries the specific dipole moment φ₀. The type-II approximation is based on a finite wall thickness δ and a homogeneous magnetization M; the near-surface profile of the wall is modeled by an ellipsoidal cylinder. The numerical approach accounts for the complex (asymmetric) near-surface profile of the wall's magnetization vector field M(r). The lower image shows a 180° Néel wall.
Nucleation, free motion, and annihilation of domain boundaries, as well as their interactions with the crystal lattice, determine the magnetization process of a ferromagnet, the technical relevance of which need not be emphasized. A study of interdomain boundaries in ferromagnets by means of MFM is thus of importance with respect to both basic and applied research. The upper part of Fig. 67 schematically shows a 180° Bloch wall as it occurs, e.g., in iron. Within the wall the magnetization vector rotates between the two antiparallel, adjacent domains and exhibits a component perpendicular to the sample surface. The stray field calculation can be based on three different models of varying complexity. In the type-I approximation, the wall is modeled by a plane of infinitesimal thickness which carries a homogeneous dipole moment φ₀ per unit area. Near-surface fine structures of the wall are neglected. The finite width of a symmetric wall can be accounted for in a first-order approximation by modeling the wall, close to the sample surface, by a cylinder with an ellipsoidal cross-section. This is denoted as the type-II approximation in Fig. 67. Finally, the accurate approach consists of a numerical calculation of the internal wall structure by means of energy minimization procedures (see Hubert, 1975). Advanced calculations have recently been performed for iron and Permalloy (Scheinfein et al., 1991; Aharoni and Jakubovics, 1991). Previous approaches to MFM contrast formation (Hartmann and Heiden, 1988; Hartmann, 1989c; Hartmann et al., 1991) have shown that
the experimental data obtained from domain walls can generally be modeled by use of the type-I approximation in Fig. 67. The solution of the two-dimensional problem (infinite extent of the wall along the y axis in Fig. 67) is obtained from Eq. (134):

H(r) = (φ₀/2π) r/r²,   (201)

where H and r are vectors within the x-z plane in Fig. 67. For a straight wall of constant thickness δ, the specific dipole moment is given by φ₀ = Mδ, where M is the wall's magnetization, which is assumed to be uniform and perpendicular to the sample surface. Assuming wall widths of δ = 10-100 nm, one obtains for iron (μ₀M = 2.1 T) specific dipole moments of μ₀φ₀ ≈ 10⁻⁸ Wb/m. Formally, φ₀ could also be associated with a magnetic potential or an electric current. In the latter case one would obtain values of I = 18-180 mA for iron. The basic contributions to the MFM contrast, calculated according to Eq. (201), are shown in Fig. 68 for a working distance equal to the wall width δ. Already at this distance the stray field profile is much wider than the wall. This result can again be attributed to the general phenomenon of loss in higher Fourier components, as discussed in Section IV.C.1. The fact that the stray field profiles calculated according to the different approaches shown in Fig. 67 are almost the same a few hundred nanometers above the sample surface is the reason why contrast modeling according to the simple type-I approximation yields a satisfactory agreement
FIGURE 68. Field components contributing to the MFM contrast of a 180° Bloch wall, shown as functions of x/δ. M is the spontaneous magnetization and δ the characteristic wall width. z denotes the working distance.
with experimental data (Hartmann and Heiden, 1988; Hartmann, 1989c; Hartmann et al., 1991). The additional field quantities required for MFM contrast modeling are shown in Fig. 69. A second basic wall type occurs in very thin ferromagnetic films. Within these Néel walls (see lower part of Fig. 67), the magnetization rotation is
FIGURE 69. Same as in Fig. 68, but referring to the first (a) and second (b) field derivatives of the Bloch wall (shown for z/δ = 1).
perpendicular to that of a Bloch wall (see Chikazumi, 1964; Hubert, 1975). This mode of rotation leads to a reduction in magnetostatic energy. The stray field calculation can be performed using the same basic approaches as for the Bloch wall. With respect to the type-I approximation for the Bloch wall, the dipole plane has to be rotated by 90° to obtain the corresponding approximation for the Néel wall. The stray field components are thus obtained by the transformation procedure given in Eq. (202).
The stray field of interdomain boundaries in materials of finite thickness t is then obtained by the transformation

H_{x,z}(x, z) → H_{x,z}(x, z) − H_{x,z}(x, z + t).   (203)
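As a sketch, the type-I wall field of Eq. (201) combined with the thickness transformation of Eq. (203) reads as follows; the iron value of μ₀φ₀ is the one quoted above, while the evaluation point and the sample thickness are illustrative assumptions.

```python
import numpy as np

def bloch_wall_field(x, z, phi0=1.0):
    """Type-I approximation, Eq. (201): the semi-infinite dipole plane acts
    like a line of magnetic charge, H = (phi0/2pi) r/r**2 in the x-z plane."""
    r2 = x * x + z * z
    c = phi0 / (2.0 * np.pi * r2)
    return c * x, c * z                  # (Hx, Hz)

def bloch_wall_field_finite(x, z, t, phi0=1.0):
    """Finite sample thickness t via Eq. (203):
    H(x, z) -> H(x, z) - H(x, z + t), applied to both components."""
    hx0, hz0 = bloch_wall_field(x, z, phi0)
    hx1, hz1 = bloch_wall_field(x, z + t, phi0)
    return hx0 - hx1, hz0 - hz1

mu0 = 4e-7 * np.pi
phi0 = 1e-8 / mu0                        # mu0*phi0 ~ 1e-8 Wb/m for iron
print(bloch_wall_field_finite(x=50e-9, z=50e-9, t=1e-6, phi0=phi0))
```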
7. The Detection of Electrical Currents

A filament which carries an electrical current I exhibits the magnetic field

H = (1/2π) I × r/r²,   (204)

where r is the radial vector with respect to the center of the filament. The field shows the same decay rates as derived for interdomain boundaries in the previous section. Equation (204) permits an estimate of the sensitivity of MFM with respect to the detection of electrical currents. For a bulk iron probe with a semiaxis domain length of R_c = 500 nm and an aspect ratio of α = 0.5 (see Section IV.B.1 for a description of the parameters) which is raster-scanned at a height of 10 nm across the filament, the minimum detectable current is 350 μA in the dynamic mode of operation if a sufficiently high compliance sensitivity is assumed. The effective probe diameter derived from the interaction decay ranges (see Section IV.B.1) amounts to A = 20 nm, which is a first-order approximation for the obtained lateral resolution. If instead of the filament a conductor of rectangular cross-section is considered, the vertical field component is given by
H_z = −(I/4πwt) { z₊ ln[(x₊² + z₊²)/(x₋² + z₊²)] − z₋ ln[(x₊² + z₋²)/(x₋² + z₋²)] + 2x₊[arctan(z₊/x₊) − arctan(z₋/x₊)] − 2x₋[arctan(z₊/x₋) − arctan(z₋/x₋)] },   (205)
which is a standard solution of Laplace's equation for the magnetic
FIGURE 70. Magnetic field components produced by a conductor of rectangular cross-section (thickness t, width w) and by a filament of infinitesimal cross-section, shown for z/w = 0.1 and t/w = 0.1. x and z denote the lateral and vertical coordinates, and I denotes the applied current.
vector potential (Morse and Feshbach, 1953). Using the abbreviations z₊,₋ = z ± t/2 and x₊,₋ = x ± w/2, the in-plane field component H_x is also obtained from Eq. (205), provided that z₊,₋ → x₊,₋ and x₊,₋ → z₊,₋. Both coordinates x and z are measured from the center of the conductor. The basic contributions to the MFM contrast for both the filament and the rectangular conductor are shown in Fig. 70. The smaller decay rate of the field produced by a conductor of finite size increases the effective interaction ranges experienced by the probe, and thus leads to an enhanced current sensitivity of the MFM at reduced lateral resolution with respect to the filament. A preliminary experiment (Göddenhenrich et al., 1990b) has demonstrated the potential of MFM to detect electrical currents in microfabricated planar devices. The local imaging of inhomogeneous current distributions in materials and lithographically prepared devices is considered an especially promising new field of application of MFM.
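The following sketch evaluates the fully written-out form of Eq. (205) given above and compares it with the filament limit of Eq. (204); the conductor dimensions and the current are illustrative assumptions. Note that the expression becomes singular on the conductor edges, so evaluation points with |x| = w/2 and |z| ≤ t/2 should be avoided.

```python
import numpy as np

def Hz_rect(x, z, I=1.0, w=1.0, t=0.1):
    """Vertical field of a rectangular conductor (width w, thickness t)
    carrying the current I along y, Eq. (205); x and z are measured from
    the center of the conductor."""
    xp, xm = x + w / 2.0, x - w / 2.0
    zp, zm = z + t / 2.0, z - t / 2.0
    bracket = (zp * np.log((xp**2 + zp**2) / (xm**2 + zp**2))
               - zm * np.log((xp**2 + zm**2) / (xm**2 + zm**2))
               + 2.0 * xp * (np.arctan(zp / xp) - np.arctan(zm / xp))
               - 2.0 * xm * (np.arctan(zp / xm) - np.arctan(zm / xm)))
    return -I * bracket / (4.0 * np.pi * w * t)

def Hz_filament(x, z, I=1.0):
    """Filament limit, Eq. (204): Hz = -(I/2pi) x/(x**2 + z**2)."""
    return -I * x / (2.0 * np.pi * (x**2 + z**2))

for x in (2.0, 5.0, 10.0):   # far from the conductor both results converge
    print(f"x = {x:4.1f}   rect: {Hz_rect(x, 0.5):+.6f}   filament: {Hz_filament(x, 0.5):+.6f}")
```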
D. Sensitivity, Lateral Resolution, and Probe Optimization Concepts

The main strength of MFM, compared to the various other magnetic imaging techniques, is its capability to achieve high spatial resolution (typically better than a hundred nanometers) on technically relevant samples with little or no
sample preparation. The ability to handle real-world samples, complete with overcoats and substrates, greatly simplifies the imaging process with respect to, for example, electron microscopic techniques. Major improvements concerning the lateral resolution mainly rely on new concepts of probe fabrication. If the dipole-dipole interaction between two ferromagnetic spheres at a center-to-center separation of 10 nm is considered, 4,300 Bohr magnetons per sphere yield a force which is just in reach of present technology for a microscope which is operated in the static mode. If iron spheres are considered, the corresponding radius of a sphere would be 3.8 nm. In the dynamic mode of operation, 2,150 Bohr magnetons would be detectable, corresponding to a radius of 2 nm for the iron spheres. If the monopole interaction between two magnetically charged disks is considered, the minimum radius would be 4.3 nm in the static mode and 1.1 nm in the dynamic mode. These considerations are of course somewhat simple-minded. However, the derived quantities may well be considered as some ultimate limits of MFM with respect to sensitivity and lateral resolution. Figure 71 illustrates the basic design concepts for optimized magnetic dipole and monopole probes. Another promising probe type is the superparamagnetic probe shown in Fig. 72. Because of weak or even missing shape and crystalline anisotropies, the magnetization within the probe's effective domain exhibits field-induced free Néel rotation. Using such a probe, the detected force component is F = μ₀VM n·∇H, and the detected compliance is F′ = μ₀VM (n·∇)(n·∇H), where V is the domain volume, M the spontaneous magnetization, H the stray field magnitude, and n the cantilever's unit normal vector. The main difference in contrast formation with respect to ferromagnetic probes is that the interaction is always attractive. A first step toward the fabrication of superparamagnetic probes has been presented by Lemke et al. (1990). While the aforementioned
FIGURE 71. Design for optimized ferromagnetic force sensors.
FIGURE 72. Schematic of a superparamagnetic probe.
optimization concepts are concerned with advanced probe geometries, it also seems promising to look for other probe materials. Antiferromagnetic, ferrimagnetic, and metamagnetic materials (see, for example, Chikazumi, 1964) may be promising alternatives if it is possible to restrict their net magnetization to the near-apex regime of the probe. However, little information is available concerning the size-affected magnetic behavior of these materials close to the apex of sharp tips.
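The detectability estimates given above are straightforward to reproduce. The sketch below evaluates the force between two coaxial point dipoles at the quoted 10 nm separation; the coaxial alignment and the interpretation of the result as a static-mode threshold are illustrative assumptions.

```python
import numpy as np

MU0 = 4e-7 * np.pi
MU_B = 9.274e-24  # Bohr magneton [A m^2]

def coaxial_dipole_force(m1, m2, d):
    """Attractive force between two coaxial point dipoles m1, m2 [A m^2]
    at center-to-center separation d [m]: F = 6 mu0 m1 m2 / (4 pi d**4)."""
    return 6.0 * MU0 * m1 * m2 / (4.0 * np.pi * d**4)

m = 4300 * MU_B                            # moment per sphere quoted in the text
print(coaxial_dipole_force(m, m, 10e-9))   # ~1e-13 N
```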
E. Scanning Susceptibility Microscopy

Scanning susceptibility microscopy (SSM) is proposed as a new technique which is closely related to MFM (Hartmann et al., 1991). The highly focused microfield of ferromagnetic probes is used to induce a magnetic response of the sample. If the sample is nonmagnetic but conducting, the probe, which is vibrating close to the sample surface, generates eddy currents in the near-surface regime of the sample. This leads to repulsive forces between probe and sample, which depend on the electric conductivity of the sample at a local scale. If the sample is a soft ferromagnet, SSM is capable of detecting the static and dynamic susceptibility of the sample perpendicular to its surface. In this case the attractive magnetostatic component interplays with the repulsive eddy current component. Since the magnetic susceptibility is a complicated function of field frequency and
FIGURE 73. Schematic of a sensor suitable for scanning susceptibility microscopy.
magnitude, it is desirable to equip the SSM with a soft magnetic tip which is polarized by an exciter coil, as shown in Fig. 73. An interesting application of SSM is the investigation of superconductors. A first step toward a calculation of the forces arising when a ferromagnetic microprobe is approached to a superconductor was recently presented by Hug et al. (1991). The probe was modeled by a magnetic point charge, and the sample was considered as an ideal London superconductor, where full account has been taken of the finite penetration depth λ_L. Certain limitations of the model result from the fact that the probe is assumed to be a magnetic monopole of fixed moment. The detailed analysis of the magnetic behavior of real MFM probes, presented in Section IV.B, however, has shown that the stray field does not simply exhibit a monopole character, but also contains considerable dipole components, especially when the probe-sample separation becomes comparable with the dimensions of the effective apex domain. In the following, a model is presented that accounts on the one hand for the finite probe size and on the other hand for the presence of vortices in the superconductor. With respect to the rigorous London model (Hug et al., 1991), the real situation is simplified by assuming complete flux penetration into the superconductor up to a depth equal to λ_L and complete flux expulsion beyond λ_L. The magnitude of the probe-sample interaction derived in the following is thus a lower limit of the accurate value and approaches the latter for increasing probe-sample separation (Hug et al., 1991). The boundary condition H_⊥(z = −λ_L) = 0, corresponding to a complete Meissner effect, is met in the usual way by considering an image probe identical to the real probe and equidistant below the plane z = −λ_L. According to the effective-domain model presented in Section IV.B, the microprobe is represented by its monopole moment q, its dipole moment m, and an effective probe diameter A. The total repulsive force between the probe and its magnetostatic image is thus composed of a monopole-monopole component

F_MM = (1/4πμ₀) q²/4(d + λ_L)²,   (206a)

a monopole-dipole component

F_MD = (1/4πμ₀) qm/4(d + λ_L)³,   (206b)

and a dipole-dipole component

F_DD = (1/4πμ₀) 3m²/8(d + λ_L)⁴,   (206c)
where d is the distance of the probe's apex to the sample surface. According to Eqs. (160), q and m are sensitive functions of the effective magnetostatic interaction range ξ. Since the real probe interacts with its image, the
interaction range is thus equal to the distance between the probe and its image, ξ = 2(d + λ_L). Substitution of Eqs. (160) into Eqs. (206) then yields the force between the superconductor, which is characterized by its London penetration depth λ_L, and the finite probe, which is characterized by its semiaxis domain length R_c, its aspect ratio α, and its saturation magnetization M. It is convenient to have some upper estimates for the forces at hand, which are obtained when the probing tip touches the surface of the superconductor. Assuming λ_L << R_c, one gets

F⁰_MM = πα²R_c²μ₀M²,   (207a)

F⁰_MD = αF⁰_MM,   (207b)

F⁰_DD = (3α²/2)F⁰_MM.   (207c)
These characteristic forces are completely independent of the superconducting material, as long as λ_L << R_c is provided. Corresponding compliance values are defined as F′⁰ = −F⁰/λ_L. Figure 74 shows force-versus-distance and compliance-versus-distance curves for the individual contributions according to Eqs. (206). These curves clearly show that it is not justifiable to neglect the monopole-dipole and dipole-dipole contributions to the total
FIGURE 74. Normalized force (F)- and compliance (F′)-versus-distance curves, plotted against d/λ_L, obtained for the interaction of a ferromagnetic probe of finite size with a superconductor from the flux expulsion model. λ_L is the London penetration depth. MM, MD, and DD denote the monopole-monopole, monopole-dipole, and dipole-dipole contributions. The normalization constants F⁰ and F′⁰ are solely determined by the probe's magnetic properties.
FIGURE 75. Force (F)- and compliance (F′)-versus-distance curves expected for niobium and niobium diselenide (NbSe₂) at T = 0 if the measurement is performed with a bulk iron probe as commonly used.
force. Both of these contributions exceed the monopole-monopole contribution for a probe of finite size. Force- and compliance-versus-distance curves in absolute units are shown in Fig. 75 for niobium and niobium diselenide at zero temperature. The curves refer to a realistic bulk iron probe with R_c = 500 nm, α = 0.5, and μ₀M = 2.1 T (see Section IV.B.1 for a description of these parameters). Both force and compliance exhibit magnitudes which should easily be detectable, at least from the point of view of instrumental sensitivity. The complete flux expulsion model a priori excludes the phenomenon of probe-induced vortex nucleation in a type-II superconductor. On the
FIGURE 76. Schematic of a probe-induced vortex nucleation process.
other hand, it has been shown in Section IV.B that sharp ferromagnetic probes may produce quite strong near-apex stray fields, which may by far exceed the lower critical field of several type-II superconductors. The result could be a partial flux expulsion combined with a partial flux penetration through vortices, as shown in Fig. 76. Some evidence for this behavior has been found in experiments concerned with the levitation of a macroscopic magnet over a type-II superconductor (Hellmann et al., 1988). Probe-induced vortex nucleation processes should be detectable in terms of sudden peaks in the force- and compliance-versus-distance curves shown in Fig. 75. Returning to the complete flux expulsion model, a preexisting vortex lattice should be detectable in terms of periodic force or compliance modulations obtained upon raster-scanning the probe across the superconductor. It is convenient to introduce a spatially dependent penetration depth

λ(r) = λ_L/|ψ(r)|,   (208)

where |ψ(r)| is the magnitude of the Clem order parameter introduced in Eq. (194b). Figure 77 shows the spatial dependence of λ for niobium and niobium diselenide, obtained for a center-to-center spacing of the vortices which equals twice the London penetration depth λ_L. While for
FIGURE 77. Variation of the spatially dependent penetration depth λ. λ_L denotes the London penetration depth and ξ_v the variational core-radius parameter. The solid curve refers to niobium diselenide and the dashed curve to niobium.
FIGURE 78. Relative force variations for niobium and niobium diselenide obtained upon scanning the probe across an isolated vortex. The broad profiles are produced by a conventional probe, as presently used, while the sharp profiles refer to the optimized probes shown in Fig. 71.
niobium diselenide no substantial overlap of the vortices is obtained, niobium already shows strong overlap. Now, if the probe is raster-scanned across the superconductor, the presence of the vortices affects the manifestation of a complete magnetic image of the probe within the superconductor. Within the present linear approximation, this effect can be accounted for by replacing λ_L in Eqs. (206) by λ(r) from Eq. (208). The resulting variation in force obtained upon raster-scanning the probe across an isolated vortex is shown in Fig. 78. Because of the finite probe size, which has been fully accounted for, the profile of the vortex is considerably broadened. However, the relative variation in force is about 10% for niobium diselenide and about 35% for niobium, considering a realistic bulk iron probe. These values would also be obtained if vortex nucleation takes place during recording of a force-versus-distance curve as shown in Fig. 75. Vortex nucleation should thus clearly be detectable. The sharp profiles additionally indicated in Fig. 78 model the contrast which would be obtained by using the optimized probes shown in Fig. 71.
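A sketch of this contrast model combines the total image force of Eqs. (206a)-(206c) with the locally modified penetration depth of Eq. (208), tracing the relative force variation across an isolated vortex (compare Fig. 78). The moments q and m are treated here as fixed input numbers, although Eqs. (160) make them functions of ξ = 2(d + λ_L), and the finite probe-size averaging of Eq. (140) is omitted, so the computed dip is sharper than the broad profiles of Fig. 78; all numerical values are illustrative assumptions.

```python
import numpy as np

MU0 = 4e-7 * np.pi

def image_force(d, lam, q, m):
    """Sum of Eqs. (206a)-(206c): repulsion between the probe (monopole
    moment q [Wb], dipole moment m [Wb m]) and its magnetostatic image
    below the plane z = -lam."""
    s = d + lam
    return (q * q / (4.0 * s**2) + q * m / (4.0 * s**3)
            + 3.0 * m * m / (8.0 * s**4)) / (4.0 * np.pi * MU0)

def lam_local(r, lamL, xiv):
    """Spatially dependent penetration depth, Eq. (208), with the Clem
    order parameter of Eq. (194b)."""
    psi = r / np.sqrt(r * r + xiv * xiv)
    return lamL / max(psi, 1e-9)

q, m, d = 1e-14, 1e-21, 5e-9          # illustrative probe moments and distance
lamL, xiv = 70e-9, 0.15 * 70e-9       # illustrative NbSe2-like parameters
F_far = image_force(d, lamL, q, m)
for r in np.linspace(0.0, 3.0, 7) * lamL:
    F = image_force(d, lam_local(r, lamL, xiv), q, m)
    print(f"r/lamL = {r / lamL:3.1f}   F/F_far = {F / F_far:.3f}")
```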
F. Applications of Magnetic Force Microscopy

The very first magnetic force microscopy (MFM) images were obtained by Martin and Wickramasinghe in 1987. In a pioneering experiment they
imaged the field profile of a thin film magnetic recording head which was driven by an oscillating head current. In this way they obtained images which were free from nonmagnetic artifacts. Using a slightly modified separation technique, Schönenberger et al. (1990) simultaneously measured the magnetic and topographic structure of a recording head. Another issue of particular importance in magnetic recording technology is the performance of the recording medium. Thus, it is not surprising that the high-resolution analysis of recording media has received considerable interest in the MFM literature (see Sarid, 1991). An excellent review of and extension to the field of longitudinal media has been given by Rugar et al. (1990). Imaging of stray fields produced by recording media is of great technical importance since contrast formation is closely related to the read-back operation of the recording head. The parasitic "media noise," the erasure, and the overwrite behavior have been studied in great detail for various media (Rugar et al., 1990; Sarid, 1991). Media exhibiting perpendicular anisotropy are of particular importance to magneto-optic recording. In fact, magneto-optic materials were among the first samples examined by MFM (Martin et al., 1988). One of the most recent investigations in this field is the MFM analysis of multilayer structures (den Boef, 1990), which are quite promising candidates for erasable magneto-optic storage schemes. Natural domain structures generally have a much more complex topology than artificially created magnetization patterns in storage media. Their analysis is important with respect to both basic research and technical application (Hartmann et al., 1991). A variety of materials has been investigated (see Sarid, 1991). Apart from the aspect of materials research, the study of naturally established magnetic fine structure provides information on the ultimate capabilities of MFM with respect to sensitivity and resolution. Grütter et al. (1990b) observed magnetic features on rapidly quenched FeNdB and claimed a lateral resolution of 10 nm. Hobbs et al. (1989) achieved about 25 nm resolution on a TbFeCo thin film. A particular challenge in the field of natural domain configurations is the imaging of interdomain boundaries. The very first clear image of an individual domain boundary was obtained by Göddenhenrich et al. (1988), who simultaneously applied MFM and Kerr microscopy to study 180° Bloch walls in iron whiskers. The major problems in imaging interdomain boundaries result on the one hand from the relatively small microfields (Hartmann and Heiden, 1988; Hartmann, 1989c, 1990b, 1990c) produced by the walls, and on the other hand from the perturbative influence of the probe (Göddenhenrich et al., 1988; Hartmann, 1988; Mamin et al., 1989; Grütter et al., 1990a). Nonetheless, even fine structures within domain boundaries have been resolved (Mamin et al., 1989; Göddenhenrich et al., 1990a).
As of this writing, the author counts at least 60 publications on MFM and more than 10 groups working in this field. Reviews on various aspects of MFM have been given by Martin et al. (1989), Hartmann et al. (1990), Hartmann (1990), Sarid (1991), Hartmann et al. (1991), and Grütter et al. (1992).
V. ASPECTS OF INSTRUMENTATION

Considering the beautiful high-resolution topographic SFM data, the ultrasensitive measurements of electromagnetic surface forces, the detection of individual magnetic interdomain boundaries, and the verification of the Coulomb field associated with the minute charge corresponding to only one electron, the most obvious question is: what does an instrument look like that is capable of providing us with such striking data?

The central part of any SFM is the microprobe, which interacts more or less locally with the surface of the sample. The microprobe is in most cases a
FIGURE 79. Cantilever deflection sensor schemes used in force microscopy. Basically, probe-sample interactions are converted into a physical quantity Q. With electron tunneling, Q corresponds to the current J between cantilever and tip counterelectrode, which are at angstrom separation. With optical interferometry, Q is determined by the intensity I of interfering light beams. The beam deflection technique relates Q to the position-dependent intensity I measured with a two-element photodetector for a light beam reflected off the top of the cantilever. The capacitance detector measures deflection-induced changes in capacitance between cantilever and reference electrode, yielding Q in terms of C. The bimorph piezosensor directly converts deflections into a voltage induced between its electrodes, thus relating Q to V.
sharp tip, while for some special applications smoothly curved probes have been used (Hartmann et al., 1992). The force-sensing probe must have, apart from its well-defined geometry, certain material properties, such as conductivity, dielectric permittivity, permanent magnetization, etc., which determine the type and strength of the interaction with the sample. The microprobe is attached to - or part of - a soft cantilever spring whose geometrical and material properties are chosen such that the respective interaction between probe and sample is detected via measuring the cantilever deflection. Although such a tip-cantilever ensemble is part of all force microscopes, the details of implementation vary. The original AFM (Binnig et al., 1986), for example, used a handmade cantilever spring formed from a piece of gold foil approximately 1 mm long. A small diamond stylus glued to the foil served as tip. Some of the best sensors for electric and magnetic force microscopy have been fabricated from fine, electrochemically etched wires. Today, the most advanced SFM cantilevers are microfabricated from silicon, silicon oxide (SiO₂ and Si₂O₃), or silicon nitride using lithography and etching techniques well established in micromechanics. Typical lateral cantilever dimensions are of the order of 100 μm, with thicknesses of the order of 1 μm. Typical spring constants are in the range of 0.1 to 1 N/m, and resonant frequencies are 10 to 100 kHz. The other critical component of the SFM is the sensor that detects the cantilever's deflection. Ideally, the sensor should have subangstrom sensitivity and should exert a negligible force on the cantilever. According to Fig. 79, the deflection schemes, which convert the cantilever's mechanical status into some nonmechanical quantity, are divided into two basic types: electronic and optical systems. Electron tunneling, which was the method originally applied by Binnig et al. (1986), has the virtue of being extremely sensitive: the tunneling current between two conducting surfaces changes exponentially with distance, typically by a factor of 10 per angstrom of displacement. The attainable resolution is correspondingly high. Although excellent results have been achieved by many groups, tunneling generally has the disadvantage that its performance can be degraded if the tunneling surfaces become contaminated. Another electrical method consists of the capacitance detection system. The basic philosophy is to monitor the cantilever deflections by measuring the varying capacitance between the free end of the lever and a fixed reference electrode. The virtue of this method is that it is relatively simple to implement even under UHV and low-temperature conditions. Göddenhenrich et al. (1990a) demonstrated an instrument optimized for magnetic imaging, while Neubauer et al. (1990) presented a dual capacitance sensing system for simultaneous measurements of vertical and lateral force
components. An advantage of the capacitance detection method is that it is far less sensitive than electron tunneling to local surface imperfections of cantilever and reference electrodes. However, the large effective electrode area also causes the main drawback of the capacitance detection system. The interelectrode spacing is limited by the "snapping distance," at which the derivative of electrostatic and surface forces is equal to the lever's spring constant. Decreasing the interelectrode spacing below this distance causes a jump to contact of the lever. The performance of the capacitance detection SFM is thus mainly limited by appreciable electrostatic and surface interelectrode forces. The third electrical cantilever-deflection sensor scheme is the bimorph piezosensor (Anders and Heiden, 1988), which directly converts, via the piezoelectric effect, deflections into a voltage between the electrodes of the piezoelement. This elegant method has suffered so far from the problem of tailoring bimorph piezoelements to cantilevers of suitable spring constant. The tendency, however, to microfabricate complete scanning probe microscopes on a chip (see, for example, Quate, 1990) will most likely increase the importance of bimorph piezocantilevers, which are well suited for integrative microfabrication techniques. Optical deflection-sensing schemes are subdivided into two basic classes (see Fig. 79): beam deflection and interferometry. Optical methods average over the rough surface of a cantilever and exert an almost negligible force, which generally makes them the better alternative with respect to the electrical detection schemes. In a beam deflection system (Meyer and Amer, 1988; Alexander et al., 1989), a collimated laser beam is focused on the lever and is reflected into a two-segment photodetector: the photocurrents of both elements of the detector are fed into a differential amplifier whose output signal is proportional to the cantilever's deflection. All optical elements are at large distances from the force-sensing lever. That offers advantages for some implementations (e.g., UHV) of the instrument, but also raises the sensitivity to externally caused directional fluctuations in the optical paths. The beam deflection scheme has been successfully used in many experiments and was employed in the first commercially available SFMs. Interferometer-based systems have taken various different forms. In a homodyne detection system, the flexible cantilever beam and a fixed optical flat form a Fabry-Perot-type interferometer (McClelland et al., 1987). The incident laser beam first passes through a beam splitter and is then incident on the Fabry-Perot. The beam reflected back from the latter is incident on the same beam splitter, by which part of it is deflected into a photodetector. In a differential homodyne system, a fraction of the laser power, serving as a reference beam, is diverted by a beam splitter to a first photodetector. The
light passing through the beam splitter, serving as a signal beam, is incident on the Fabry-Perot, reflected back, and deflected into a second photodiode. The differential signal of the two photodetectors is used to image the force acting on the probe. Stability problems due to the large physical path difference between the reference beam and the light reflected back from the cantilever have been surmounted by using a fiber-optic technique (Rugar, 1989) that places a reference reflector within microns of the cantilever, or they have been overcome by using a polarization interferometer of Nomarski type (Schönenberger and Alvarado, 1989), splitting the incident light beam into two orthogonally polarized fractions reflected off the free and supported ends of the cantilever. Heterodyne detection systems (Martin and Wickramasinghe, 1987) completely eliminate the effects of drift in the optical path length. This is achieved by introducing a frequency shift between reference and signal beams employing an acousto-optical modulator. On the one hand, such a system exhibits an unprecedented sensitivity and stability; on the other hand, it consists of a relatively complicated and expensive setup. A special kind of interferometric deflection-sensing scheme is the laser-diode feedback system (Sarid et al., 1988). Lasers are known to be extremely sensitive to optical feedback. Even a minute amount of light emitted by the laser that is fed back into its cavity can affect its operation drastically. In the laser-diode feedback system, the cantilever is positioned a few microns from the front facet of a laser diode. The lever and front facet act as a lossy Fabry-Perot, whose reflectivity determines the effective reflectivity of the front facet of the laser. A deflection-dependent laser power results, which is then measured with a photodetector which is integrated into the rear facet of the laser diode. The laser-diode feedback system has the virtue of being simple to assemble and align and involves only a few components. The theory of operation, however, is more complicated than for the other systems because of the highly nonlinear behavior of the laser. Today, most implementations of non-contact SFM are based on a dynamic mode of probe-sample interaction sensing rather than on a static mode as employed in the original AFM and most of today's contact-mode applications of SFM. In the dynamic mode the cantilever is driven, e.g., by a small piezoelectric element, to vibrate near its resonant frequency. The presence of a nonuniform force F acts to modify the cantilever's effective spring constant according to k = k₀ − ∂F/∂z, where k₀ is the spring constant of the isolated cantilever and ∂F/∂z is the force gradient's component along the cantilever's normal vector. If one exerts, for example, an attractive force on the cantilever, the spring will effectively soften. As a result the resonant frequency will decrease, and this decrease is detected by measuring the amplitude, phase, or frequency shift of the vibration.
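In the small-gradient limit this amounts to a frequency shift Δν ≈ −ν₀(∂F/∂z)/2k₀. A minimal sketch, assuming typical cantilever values from the text:

```python
import numpy as np

def shifted_frequency(nu0, k0, dFdz):
    """Resonant frequency of a cantilever in a force gradient:
    k = k0 - dF/dz, hence nu = nu0 * sqrt(1 - (dF/dz)/k0)."""
    return nu0 * np.sqrt(1.0 - dFdz / k0)

nu0, k0 = 50e3, 1.0                      # 50 kHz, 1 N/m
for dFdz in (1e-5, 1e-4, 1e-3):          # attractive force gradients [N/m]
    dnu = shifted_frequency(nu0, k0, dFdz) - nu0
    print(f"dF/dz = {dFdz:.0e} N/m  ->  shift = {dnu:+8.3f} Hz")
```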
There are mainly three advantages of the dynamic mode of SFM operation: (i) the operation is removed from the regime where 1/f noise has a significant contribution; (ii) the use of a phase-sensitive detection method increases the signal-to-noise ratio; and (iii) it is possible to use cantilever resonance enhancement to greatly increase the sensitivity. Martin et al. (1987) demonstrated an extremely high force sensitivity, where the forces were obtained by numerically integrating the force derivatives deduced from force-versus-distance curves. Such a sensitivity would hardly be obtainable by a static measurement. Albrecht et al. (1991) demonstrated a sensitivity to force derivatives very close to the ultimate thermal limit using a feedback-driven lever. Schönenberger et al. (1990) demonstrated the advantage of simultaneously having information about both the static and dynamic cantilever status. That allows one, for example, to completely separate electro- and magnetostatic components of the probe-sample interaction. Apart from the tip-cantilever ensemble and the deflection-sensing scheme, three additional components are required for an SFM: (i) a feedback system to monitor and control changes in the cantilever status and, hence, in the force or the force derivative; (ii) a mechanical scanning system - usually piezoelectric - to move the sample with respect to the tip vertically and in a lateral raster pattern; and (iii) a display system to convert the measured data into an image. The scanning, feedback, and display systems are essentially the same as for an STM. The schematic diagram in Fig. 80 shows the basic setup of an SFM as operated in the constant-interaction mode, which is preferred in most applications. With respect to the mechanical setup of an SFM system, care must be taken that its performance is not limited by external vibrations, such as
FIGURE 80. Typical setup of a scanning force microscope operated in the constant-force or constant-compliance mode.
from the laboratory building. The effect of external vibration is to cause unwanted motion of the tip with respect to the sample and deflection sensor. The ultimate immunity of the SFM to external vibration is determined by the ratio of the excitation frequency ν to the lowest resonant frequency ν₀ of the SFM. Hereby, the mechanical system consists of both the lever and the rest of the instrument. The amplitude of unwanted probe motion is attenuated by a factor of (ν/ν₀)² for ν << ν₀. Hence, for an external perturbation of frequency ν = 10 Hz and amplitude 1 μm - which may be considered as a typical building vibration - an SFM with lowest resonant frequency ν₀ = 10 kHz would exhibit a probe motion of 0.01 Å. This level is well below the thermal vibration rms amplitude of 0.6 Å obtained for a 1 N/m cantilever at room temperature. Because cantilevers can readily be made with high resonant frequencies (some tens of kilohertz), the art of building good SFMs consists of making the mechanical components rigid and compact, especially in the path from cantilever to sample. An extensive discussion of details of operation and noise sources is found in the book by Sarid (1991).
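Both numbers quoted in this paragraph follow from two one-line formulas, sketched below; room temperature is assumed for the equipartition estimate.

```python
import numpy as np

K_B, T = 1.381e-23, 295.0  # Boltzmann constant [J/K], room temperature [K]

def transmitted_amplitude(a_ext, nu, nu0):
    """External vibration of amplitude a_ext is attenuated by roughly
    (nu/nu0)**2 for nu << nu0."""
    return a_ext * (nu / nu0) ** 2

def thermal_rms(k):
    """Equipartition estimate of the cantilever's thermal rms amplitude:
    sqrt(kB*T/k) for spring constant k."""
    return np.sqrt(K_B * T / k)

print(transmitted_amplitude(1e-6, 10.0, 10e3))  # 1e-12 m = 0.01 angstrom
print(thermal_rms(1.0))                          # ~6e-11 m = 0.6 angstrom
```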
VI. CONCLUSIONS

Because of the rapid growth of the field of SFM in general, it does not appear to be very simple to predict the major future perspectives in this field, and in particular for non-contact SFM. On the other hand, reflecting on the past accomplishments and critically analyzing the underlying theories for the various SFM applications presented in this work allows one to draw some conclusions concerning dominant future applications in science and technology, further instrumental improvements, and ultimate capabilities. One of the definite scientific goals in SFM is to completely understand force-versus-distance curves. Starting at large probe-sample separation, i.e., in the pre-contact regime, then proceeding through the contact regime involving elastic and eventually plastic deformation of probe and sample, and subsequently going back to large probe-sample separation involves interactions caused by all of the electromagnetic contours of the surface of a solid indicated in Fig. 1. Even the pre-contact regime has not yet been completely understood, as the recent paper by Burnham et al. (1992) emphasizes. Major applications of non-contact SFM will clearly be provided by our efforts to completely understand the interaction between two solids at arbitrary separation and with an arbitrary intervening medium. Other more specific scientific applications of non-contact SFM will
predominantly include the investigation of electric and magnetic microfields resulting from highly localized charge and spin arrangements. Because the major strength of non-contact SFM is the detection of electric and magnetic microfields and their separation from - or relation to - topographic peculiarities, the major technological applications will consequently also be in that area. Engineers can make quantitative measurements on magnetic storage components and can obtain electric potential maps from integrated circuits. The trend is already toward developing instruments capable of handling complete magnetic disks or silicon wafers. The rate of instrumental improvement now, about seven years after the demonstration of the first force microscope (Binnig et al., 1986), exhibits a certain decrease. The first excellent commercial general-purpose instruments are available. These instruments are easy to use and hence open SFM to a broad community of potential users. Nevertheless, considerable technical challenges do still exist. Among these is certainly the implementation of SFM under UHV and/or low-temperature conditions (see, for example, Gießibl et al., 1991). In particular for non-contact SFM, the routine low-temperature implementation is of great importance because that would make the interesting electromagnetic properties of superconductors accessible to local investigations. A lot of engineering effort will also concentrate on fabricating tailor-made probes for certain applications. That will involve both the search for new probe materials and the improvement of microfabrication techniques. Ultimate capabilities of non-contact SFM follow directly from the presented theory, even if not all potential experimental capabilities have yet been achieved. These ultimate capabilities are all related to sensitivity and spatial resolution. The basic limit is provided by thermal noise. It is fairly obvious that for a perfectly designed instrument, thermal noise limits the sensitivity to forces or force gradients. Thermal noise, however, also ultimately limits the spatial resolution. For any given kind of interaction between probe and sample, two neighboring points of the sample surface are distinguishable only if they produce distinguishable signals. If future technologies allow SFM probes to be fabricated with arbitrary sharpness while keeping satisfactory mechanical properties, the required probe radius is solely determined by the requirement that the probe has to be large enough so that the interaction variation across the sample is well above the thermal noise limit. Sensitivity to forces or force gradients and spatial resolution are thus unequivocally related in non-contact SFM. From this universal relation it then follows that it would never be possible to image individual spins by magnetic force microscopy, but it is, for example, indeed possible to image the equivalent of only one electron charge smeared out over a certain area (Schönenberger and Alvarado, 1990b).
ACKNOWLEDGMENTS

Thanks are due to C. Heiden, KFA-Jülich/University of Gießen, for his continuous interest in the present work and many helpful discussions. Part of the theory involved was developed during a seven-month stay at the Institute of Physics of the University of Basel, Switzerland. The author would like to thank H.-J. Güntherodt and his group, especially R. Wiesendanger, for the inspiring atmosphere. The author is indebted to N. A. Burnham, KFA-Jülich, for her permanent scientific contribution and for the careful proofreading of the manuscript. Thanks are due to E. Brauweiler for her technical assistance in the preparation of the manuscript.
REFERENCES

Abramowitz, M., and Stegun, I. A., eds. (1964). Handbook of Mathematical Functions. National Bureau of Standards, Washington.
Adamson, A. W. (1976). Physical Chemistry of Surfaces. Wiley, New York.
Aharoni, A., and Jakubovics, J. P. (1991). Phys. Rev. B 43, 1290.
Albrecht, T. R., and Quate, C. F. (1988). J. Vac. Sci. Technol. A 6, 271.
Albrecht, T. R., Grütter, P., Horne, D., and Rugar, D. (1991). J. Appl. Phys. 69, 668.
Alexander, S., Hellemans, L., Marti, O., Schneir, J., Elings, V., Hansma, P. K., Longmire, M., and Gurley, J. (1989). J. Appl. Phys. 65, 164.
Anders, M., and Heiden, C. (1988). J. Microsc. 152, 643.
Ashcroft, N. W., and Mermin, N. D. (1976). Solid State Physics. Saunders College, Philadelphia.
Binnig, G., and Rohrer, H. (1982). Helv. Phys. Acta 55, 726.
Binnig, G., Quate, C. F., and Gerber, C. (1986). Phys. Rev. Lett. 56, 930.
Bozorth, R. M. (1951). Ferromagnetism. Van Nostrand, Princeton.
Brown, W. F., Jr. (1962). Magnetostatic Principles in Ferromagnetism. North-Holland, Amsterdam.
Brown, W. F., Jr. (1963). Micromagnetics. Wiley, New York.
Burnham, N. A., and Colton, R. J. (1992). "Force microscopy," in Scanning Tunneling Microscopy: Theory and Application (Bonnell, D., ed.). VCH Publishers, in press.
Burnham, N. A., Colton, R. J., and Pollock, H. M. (1992). Phys. Rev. Lett. 69, 144.
Carey, A., and Isaac, E. D. (1966). Magnetic Domains and Techniques for Their Observation. Academic Press, New York.
Casimir, H. B. G. (1948). Proc. Kon. Ned. Akad. Wetensch. 51, 793.
Casimir, H. B. G., and Polder, D. (1948). Phys. Rev. 73, 360.
Chikazumi, S. (1964). Physics of Magnetism. Wiley, New York.
Chui, S. T. (1991). Phys. Rev. B 43, 10654.
Clem, J. R. (1975a). J. Low Temp. Phys. 18, 427.
Clem, J. R. (1975b). In Proc. 14th Intern. Conf. Low Temp. Phys. (Krusius, M., and Vuorio, M., eds.). North-Holland, Amsterdam.
Datta, T., and Ford, L. H. (1981). Phys. Lett. A 83, 314.
den Boef, A. J. (1990). Appl. Phys. Lett. 56, 2045.
den Boef, A. J. (1991). "Scanning force microscopy using optical interferometry." Ph.D. Thesis, Twente University.
Derjaguin, B. V. (1943). Koll. Z. 69, 155.
Drexler, K. E. (1991). J. Vac. Sci. Technol. B 9, 1394.
Dzyaloshinskii, I. E., Lifshitz, E. M., and Pitaevskii, L. P. (1961). Adv. Phys. 10, 165.
Eigler, D. M., and Schweizer, E. K. (1990). Nature 344, 524.
Eisenschitz, R., and London, F. (1930). Z. Phys. 60, 491.
Evans, R., Marconi, U. M. B., and Tarazona, P. (1986). J. Chem. Phys. 84, 2376.
Feinberg, G. (1974). Phys. Rev. B 9, 2490.
Feinberg, G., and Sucher, S. (1970). Phys. Rev. A 2, 2395.
Gießibl, F. J., Gerber, C., and Binnig, G. (1991). J. Vac. Sci. Technol. B 9, 984.
Girard, C. (1991). Phys. Rev. B 43, 8822.
Göddenhenrich, T., Hartmann, U., Anders, M., and Heiden, C. (1988). J. Microsc. 152, 527.
Göddenhenrich, T., Lemke, H., Hartmann, U., and Heiden, C. (1990a). Appl. Phys. Lett. 56, 2578.
Göddenhenrich, T., Lemke, H., Mück, M., Hartmann, U., and Heiden, C. (1990b). Appl. Phys. Lett. 57, 2612.
Göddenhenrich, T., Lemke, H., Hartmann, U., and Heiden, C. (1990c). J. Vac. Sci. Technol. A 8, 383.
Goodenough, J. B. (1956). Phys. Rev. 102, 356.
Grütter, P., Rugar, D., Mamin, H. J., Castillo, G., Lambert, S. E., Lin, C. J., Valetta, R. M., Wolter, O., Bayer, T., and Greschner, J. (1990a). Appl. Phys. Lett. 57, 1820.
Grütter, P., Jung, T., Heinzelmann, H., Wadas, A., Meyer, E., Hidber, H.-R., and Güntherodt, H.-J. (1990b). J. Appl. Phys. 67, 1437.
Grütter, P., Rugar, D., Mamin, H. J., Castillo, G., Lin, C. J., McFadyen, I. R., Valetta, R. M., Wolter, O., Bayer, T., and Greschner, J. (1991). J. Appl. Phys. 69, 5883.
Grütter, P., Mamin, H. J., and Rugar, D. (1992). "Magnetic force microscopy (MFM)," in Scanning Tunneling Microscopy II (Wiesendanger, R., and Güntherodt, H.-J., eds.), Springer Series in Surface Sciences 28, 151. Springer, Berlin.
Hamaker, H. C. (1937). Physica 4, 1058.
Harper, W. R. (1967). Contact and Frictional Electrification. Clarendon, Oxford.
Hartmann, U. (1988). J. Appl. Phys. 64, 1561.
Hartmann, U. (1989a). Phys. Lett. A 137, 457.
Hartmann, U. (1989b). Phys. Stat. Sol. (a) 115, 285.
Hartmann, U. (1989c). Phys. Rev. B 40, 7421.
Hartmann, U. (1990a). Phys. Rev. B 42, 1541.
Hartmann, U. (1990b). J. Vac. Sci. Technol. A 8, 411.
Hartmann, U. (1990c). J. Magn. Magn. Mater. 83, 545.
Hartmann, U. (1990d). Adv. Mater. 2, 550.
Hartmann, U. (1991a). Phys. Rev. B 43, 2404.
Hartmann, U. (1991b). J. Vac. Sci. Technol. B 9, 465.
Hartmann, U. (1991c). Adv. Mater. 2, 594.
Hartmann, U., and Heiden, C. (1988). J. Microsc. 152, 281.
Hartmann, U., Göddenhenrich, T., Lemke, H., and Heiden, C. (1990). IEEE Trans. Magn. MAG-26, 1512.
Hartmann, U., Göddenhenrich, T., and Heiden, C. (1991). J. Magn. Magn. Mater. 101, 263.
Hartmann, U., Berthe, R., Göddenhenrich, T., Lemke, H., and Heiden, C. (1992). "Analysis of magnetic domains in ferromagnets and superconductors by force and tunneling microscopy," in Scanned Probe Microscopy (Wickramasinghe, H. K., ed.), AIP Conf. Proc. 241, 511. AIP, New York.
Hiemenz, P. C. (1977). Principles of Colloid and Surface Chemistry. Dekker, New York.
FUNDAMENTALS OF NON-CONTACT FORCE MICROSCOPY
199
Hellmann. F., Gyorgy, E. M., Johnson, D. W., Jr., O’Bryan, H. M., and Sherwood, R. C. (1988). J . Appl. Phys. 63, 447. Hobbs, P. C. D., Abraham, D. W., and Wickramasinghe, H. K. (1989). Appl. Phys. Lett. 55, 2357. Hiibener (1974) p. 162. Hiibener, R. P. ( 1979). Magnetic Flux Structures in Superconductors. Springer, Berlin. Hubert, A. (1974). Theorie der Domunenwunde in geordneten Medien. Springer, Berlin. Hubert (1975) p. 165, p. 166. Hug. H. J., Jung. T.. Giintherodt, H.-J.. and Thomas, H. (1991). Physica C 175, 357. Israelachvili, J. N . (1972a). Pro<,.R. SOC.Lond. A 331, 39. Israe~achvili,J N. (1972b). Prac. R. Soc. Lnnd. A 331, 19. Israelachvili, J . N. (1985). Intermnlecular and Surface Forces with Applications to Colloidal and Biological L‘$JstemJ.Academic Press, London. Jackson, J. D. (1975). Clu.wica1 Electrodynamics. Wiley, New York. Kittel, C. (1949). Revs. Modern Phys. 21, 541. Kittel (1956) p. 154. Landau, L. D., and Lifshitz, E. M. (1960). EIectrodynumics of Continuous Media. AddisonWesley, Reading. Massachusetts. Landman, U., Luedtke, W. D., Burnham, N. A., and Colton, R. J. (1990). Science 248, 454. Lemke et (11. (1989) p. 194. Lemke, H., G6ddenhenrich, T., Bochem, H. P., Hartmann, U., and Heiden, C. (1990). Rev. Sci. Instrum. 61, 2538. Lifshitz, E. M . (1955). J . Exper. Theorel. Phys. U S S R 29, 94. [(1956). Sov. Phys. JETP 2, 731. Litshitz (1956) p. 78. Mahanty, J.. and Ninham, B. W. (1976). Dispersion Forces. Academic Press, London. Mamin, H. J., Rugar, D., Stern. J. E., Terris, B. D., and Lambert, S. E. (1988). Appl. Phys. Lett. 53, 1563. Mamin, H. J., Rugar, D., Stern, J . E., Fontana, R. E., Jr., and Kasiraj, P. (1989). Appl. Phys. Let[. 55, 318. Mamin, H. J., Rugar, D., Griitter, P., Guethner, P., Lambert, S. E., Yogi, T., Wolter, O., and Greschner, J. (1990). BUN. Am. Phys. Soc. 35,420. Mansuripur, M. (1989). IEEE Trans. Magn. MAG-25, 3467. Martin, Y., and Wickramasinghe, H . K. (1987). Appl. Phys. Lett. 50, 1455. Martin, Y., Williams. C. C., and Wickramasinghe, H. K. (1987). J . Appl. Phys. 61, 4723. Martin, Y., Rugar. D., and Wickramasinghe, H. K. (1988). Appl. Phys. Lett. 52, 244. Martin. Y., Abraham, D. W.. Hobbs, P. C . D., and Wickramasinghe, H. K. (1989). Electrochem. Soc. Proc. Magn. Muter. Process. Dev. 90-8, 1 IS. Mate, C . M., Lorenz, M. R., and Novotny, V. J. (1989). J . Chem. Phys. 90, 7550. McClelland, G . M., Erlandsson, R., and Chiang, S. (1987). “Atomic force microscopy: General principles and a new implementation,” in Review of Progress in Quantitative Nondestructive EvaIuazinn (Thompson, D. O., and Chimenti, D. E., eds.) 68, 307. Plenum, New York. McVitie. S., and Hartmann, U. (1991). “A study of the magnetic structure of magnetic force microscope tips using transmission electron microscopy,” in Proc. EMSA Conference. San Francisco Press, San Francisco. Meyer. G., and Amer, N. M. (1988). Appl. Phys. Lett. 53, 1045. Meyer, E., and Heinzelmann, H. (1992). “Scanning force microscopy (SFM),” in Scanning Tunneling Microscopy I I (Wiesendanger, R., and Giintherodt, H.-J., eds.), Springer Series in Surface Sciences 28, 99. Springer, Berlin. Moiseev, Yu. N., Mostepanenko, V. N., Panov, V. I., and Sokolov, I . Yu. (1988). Phys. Lett. A 132, 354.
200
U. HARTMANN
Moon, P., and Spencer, D. E. (1961). Field Theory.for Engineers. Van Nostrand, Princeton. Moreland, J., and Rice, P. (1990). Appl. Phys. Lett. 57, 310. Morse, P. M., and Feshbdch, H. (1953). Methods of Theoretical Physics. McGraw-Hill, New York. Mostepanenko, V. M., and Sokolov, 1. Yu. (1988). Dokl. Akud. Nauk S S S R 298, 1380. [(1988). Sov. Phys. Dok. 33, 1401. Neubauer, G., Cohen, S. R., McClelland, G . M., Horn, D. E., and Mate, C. M . (1990). Rev. Sci. Instrum. 61, 2269. Nicholson, D., and Personage, N. D. (1982). Computer Simulations and the Statistical Mechunics of Adsorption. Academic Press, New York. Pohl, D. W. (1991). Phys. Bl. 47, 517. Potter, R. 1. (1970). J . Appl. Phys. 41, 1648. Quate, C. F. (1990). In Digest IEEE MicroElectroMechanical Systems, February, p. 188. Rather, H. (1988). Surface Plusmons on Smooth and Rough Surfuces and on Gratings. Springer, Berlin. Reimer, L. (1984). Transmission Elecfron Microscopy. Springer, New York. Rickayzen, G., and Richmond, P. (1985). In Thin Liquid Films (Ivanov, 1. B., ed.). Dekker, New York. Rugar (1989) p. 176. Rugar, D., and Hansma, P. K. (1990). Physics Today, October, p. 23. Rugar, D., Mamin, H. J., and Guethner, P. (1989). Appl. Phys. L e t f . 55, 2588. Rugar, D., Mamin, H. J., Giithner, P., Lambert, S. E., Stern, J. E., McFadyen, I . , and Yogi, T. (1990). J . Appl. Phys. 68, 1169. Rugar, R., Lin, C. J., and Geiss, R. (1987). IEEE Trans. Magn. MAG-23, 2263. Sarid, D. (1991). Scanning Force Microscopy with Application to Electric, Magnetic and Atomic Forces. Oxford University Press, New York. Sarid, D., Iams, D., Weissenberger, V., and Bell, L. S. (1988). O p f . L e f t . 13, 1057. Scheinfein, M. R., Unguris, J . , Pierce, D. T., and Celotta, R. J. (1990). J . Appl. Phys. 67, 5932. Scheinfein, M. R., Unguris, J., Blue, J. L., Coakley, K. J., Pierce, D. T., and Celotta, R. J. (1991). Phys. Rev. B 43, 3395. Schelten (1974) p. 162. Schelten, J., Lippmann, G., and Ullmaier, H. (1974). J . Low Temp. Phys. 14, 213. Schonenberger, C., and Alvarado, S. F. (1989). Rev. Sci. Instrum. 60, 3131. Schonenberger, C., and Alvarado, S. F. (1990a). Z . Phys. B 80, 373. Schonenberger, C., and Alvarado, S. F. (1990b). Phys. Rev. Letf. 65, 3162. Schonenberger, C., Alvarado, S. F., Lambert, S. E., and Sanders, I. L. (1990). J . Appl. Phys. 67, 7278. Sneoka, K., Okuda, K . , Matsubara, N., and Sai, F. (1991). J . Vac. Sci. Technol. B 9, 1313. Stern, J. E., Terris, B. D., Mamin, H. J., and Rugar, D. (1988). Appl. Phys. Letf. 53, 2717. Wadas, A., and Grutter, P. (1989). Phys. Rev. B 39, 12013. Wadas, A., Griitter, P., and Guntherodt, H.-J. (1990). J . Vac. Sci. Techno/. A 8, 416. Weisenhorn, A. L., Hansma, P. K., Albrecht, T. R., and Quate, C. F. (1989). Appl. Phys. Lett. 54, 265 1. Wickramasinghe, H. K. (1990). J . Vac. Sci. Technol. A 8, 363. Wickramasinghe, H. K. ( 1992). “Related scanning techniques,” in Scanning Tunneling Microscopy 11 (Wiesendanger, R., and Giintherodt, H.-J., eds.), Springer Series in Surface Sciences 28, 209. Springer, Berlin. Williams, M. L., and Comstock, R. L. (1972). AIP Conference Proc. 5, 738. Wolter, O., Boyer, T., and Greschner, J . (1991). J . Vac. Sci. Technol. B 9, 1353. Zaremba, E., and Kohn, W. (1976). Phys. Rev. B 13, 2270.
ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 87
Electrical Noise as a Measure of Quality and Reliability in Electronic Devices

B. K. JONES

School of Physics and Materials, Lancaster University, Lancaster, United Kingdom
I. Introduction
   A. Outline
   B. Noise as a Measure
   C. The Character of Noise
   D. 1/f Noise
II. Established Mechanisms of Excess Noise Involving Defects
   A. Introduction
   B. Noise Due to Structural Defects
   C. Excess Noise Due to Electronic States
   D. Electrical States Linked to Extended Defects
   E. Distributed Effects
III. Quality and Reliability
   A. Passive Components
   B. Interconnects and Electromigration
   C. Bipolar Devices
   D. Field-Effect Transistors
   E. Integrated Circuits
   F. Optoelectronic Sources
IV. Conclusions
References
I. INTRODUCTION

A. Outline

It is a common experience in electronics that there are good circuits and devices and bad ones, even within a group whose members are nominally identical because they have been made in the same way. The quality is often shown in a very sensitive way by noise introduced into the circuit above the normal level. The source of the noise is often obscure but is due in some way to a substandard component or a bad contact. In this way noise is seen to be a delicate and general indicator that something is not of acceptable quality. It is also normal practice to assume that a noisy device will be less reliable
and to reject it for any special application where long life is an advantage. The noise is an extra indicator in a system which may otherwise appear to work normally and within specification. The same experiences and arguments are also found in dealing with mechanical systems. In the main part of this review (Section II) we will attempt to describe what mechanisms have been well established as a source of noise related to the defective or nonideal nature of electronic devices. This relates mainly to the quality of the device and how the noise may be used as a measure of perfection, a diagnostic tool, and an indicator of ideality for use in production to improve quality and yield. The section has been divided into subsections depending on the classification of the type of defect which has been found to cause the noise. The later part (Section III) is devoted to an account of the success that has been obtained in using noise as a nondestructive indicator of reliability. These tests have been performed on devices which are acceptable since they pass all the design specifications, and the observed noise level is compared with the experimental lifetime. For practical reasons the degradation and measured lifetimes are usually accelerated in some way over the lifetime experienced by the device in normal use. The comparisons are therefore rather artificial but are the best that can reasonably be achieved, and these methods are accepted by the reliability community. The results suggest that noise is a good, nondestructive test of quality and reliability. This Introduction is mainly concerned with a review of the properties of noise and some of the general principles behind the specific mechanisms which will be described later for individual systems.
B. Noise as a Measure
Noise is a random fluctuation of a quantity. If a voltage or current is measured to a high resolution, it will be found to show variations about the mean value, and the variations are random in the sense that a future instantaneous value of the fluctuation cannot be predicted from present and past values. In practice, over some very short time the fluctuation will not have changed its value, and the maximum time-scale over which the variation can be predicted is a measure of the correlation time of the process or the reciprocal of the characteristic frequency of the spectrum of the noise. The noise spectrum is essentially limited to frequencies less than this characteristic frequency. The magnitude of the noise is a measure of the uncertainty in the system and increases if there are more individual sources of randomness. The
magnitude also increases if a given individual source changes its condition so that its effect on the system is less predictable. For example, an electron state will contribute most noise if it is at the Fermi energy, since then the probabilities of it being occupied or unoccupied are equal and hence its condition is most uncertain. The reason why noise is a sensitive measure of quality or reliability is that the time dependence of the noise sources reveals their presence by a continuously varying signal. A source, such as a trapped charge, may also have an effect on the dc properties, but this is less apparent since it just adds to the large dc effect. Although the noise caused by the device imperfections itself degrades the quality, other aspects of the performance are also likely to be degraded by the presence of these, or other, imperfections. It is possible that a general error in preparation, such as the use of impure chemicals, will introduce both sources of noise and also sources which will degrade the performance or lead to a fast deterioration of the structure or operation of the device and hence a reduced life. In this case it is not necessary that the noise impurity also act as the deteriorating impurity, but just that it indicates that something is not ideal. In many cases the noise is an indicator of lack of equilibrium in the system. The electrical fluctuation itself indicates a variation of the electrical properties, and this is usually an equilibrium fluctuation. However, any gradual change in the geometry, atomic arrangement, or impurity distribution due to a lack of thermodynamic equilibrium may also give rise to an extra change in the electrical properties. The noise level may decrease with time as the constituent atoms reach very stable equilibrium positions due to annealing. Alternatively, the noise level may increase if stress on the device produces changes in the geometry, as in electromigration, or extra electron states, as in hot electron damage in MOS devices. The value of noise as a technique for assessing the quality is that it is sensitive, it is a near-equilibrium measurement, and it is nondestructive. Its sensitivity is apparent in many investigations. In a batch of nominally identical samples with other nearly indistinguishable properties, the variation in the noise level can be very large, perhaps over an order of magnitude. Normally, as we shall see later, other properties can be found which correlate with the increased noise level, but these are not so obvious and constitute a small change in a current, or perhaps an anomalous section to an I-V characteristic or a large low-temperature leakage current. The excess noise increase is normally very obvious because in a good specimen the excess noise is either zero or very small, so that any increase is a large percentage change. In an I-V or C-V measurement, on the other hand, the change is superimposed on a large initial value so
that high-resolution measurements are needed. The excess noise contribution can be readily distinguished from the ideal noise, since it has a frequency-dependent spectrum while the ideal noise is white, or frequency-independent. Noise is a near-equilibrium measurement. The noise process itself is usually an equilibrium fluctuation, but since it is usually a resistance fluctuation, a current bias is needed on the specimen so that the noise is converted into a measurable voltage fluctuation. This bias is usually at, or below, the normal operational bias level of the device so that no extra stress is applied to it. This contrasts with many techniques where large stress is applied to reveal the limits to the satisfactory operation of the device. For example, the quality of a p-n junction can be revealed in some cases by studying the reverse voltage breakdown characteristics. Since the noise measurement is made under normal bias conditions, it is nondestructive. Although this is common for a quality assessment, it is a major advantage in a reliability assessment. The high reliability of electronic devices, with their low failure rates and long lifetimes under normal usage, means that the collection of reliability data from devices used under normal conditions is impractical, at least before the device becomes obsolescent. Hence it is normal practice to perform lifetime tests on relatively small numbers of samples by accelerating the failure processes by applying a high stress. Although the criterion used for the end of the life may not be complete failure, these methods are inherently destructive. Since small sample numbers are used, the results are statistical in nature with large uncertainties. Another major disadvantage is that, since the devices are subjected to a stress beyond that normally experienced in field operation, the failure mode experienced in the accelerated life test may not be the same as that experienced during normal operation, so that the test is rather unrealistic. The only consolation is that the predicted lifetimes for the normal operating condition derived from the data obtained by an accelerated test are most likely to be shorter than those found in normal operation, so that there is an inherent safety factor. The accelerating quantity may be voltage, but normally it is temperature, and the life test measurement is performed at several elevated temperatures. The results are then extrapolated down to the normal operating temperature using the Arrhenius equation. This method also assumes that the same degradation process is operating at both the test temperatures and the normal operating temperature. As we will see in the later sections, other near-equilibrium and nondestructive tests are possible and can often give indications similar to those of the noise measurements. However, these tests are usually specific to a particular system and need to be made with high resolution, since
only small changes are observed. The noise measurements usually show large percentage changes and are not specific, since almost any deficiency will cause some increased noise. By their nature, independent noise contributions can only add and increase the total effect. This lack of discrimination means that it may not be easy to distinguish the actual cause of the noise without a detailed study of the noise spectrum or the changes to the noise signal with bias or temperature. The noise analysis may be used for various purposes. Once a detailed study has been made so that the source of the dominant noise contribution is fully understood, then the noise can be used as a diagnostic test to discover the number and type of defect. This has been used with some success to determine the number density and type of trap in GaAs and the interface state density in the MOS silicon-silicon dioxide system. This allows informed improvements to be made to the manufacture of the basic materials and the processing of the devices. The interface-state density has been used to study the effect of hot electron damage in submicron MOSFETs. The general quality of devices can be determined from the noise level. This measurement is usually not specific and will give a general measure of a wide range of possible noise sources. However, it is a valuable, sensitive, single-parameter test and has implications for yield, rejection of low-quality devices, and general quality assurance. The use of noise as a specific test for reliability or a long lifetime is less direct. The assumption is that a noisy device will be of lower quality and have at least one type of manufacturing imperfection or nonideality. This imperfection may lead to a premature failure, and hence the device should be rejected in favour of a more perfect device. A further selection as a replacement device can therefore be made among those that have already passed a functional test to show that they are within specification. This correlation between noisy devices and those failing within a short time seems to be verified experimentally. Although this selection procedure may sound crude, it is in fact a sensitive, soundly based and simple extension to the normal functional tests of minimum junction breakdown voltages, maximum leakage currents, etc. Since it is nondestructive, nonstressing, and specific to an individual sample, it is much more valuable than the destructive sampling techniques carried out on batches using accelerated stress techniques. It should be mentioned that the general principle of measuring random, excess signals is not restricted to electronic components and is used to study the quality of many systems from ferromagnetic transformer cores to the degradation of the mechanical properties of composite materials and perhaps a second-hand car.
C. The Character of Noise
The noise of interest is the “excess noise,” which has also been given other names such as “current noise,” although that term is also used for other purposes. It is the electrical noise added to, or in excess of, the well-established intrinsic noise components of Johnson (or Nyquist) noise and shot noise. The former is given by the mean voltage intensity

$$\overline{v_n^2} = 4kTR\,\Delta f, \qquad (1)$$
or the spectral intensity

$$S_V(f) = 4kTR. \qquad (2)$$
Here v_n is the noise voltage fluctuation about the mean, k is the Boltzmann constant, T is the absolute temperature, R is the specimen resistance, and Δf is the bandwidth of the measurement. This is an inherent noise contribution due to thermodynamics and is present in all types of system where there is a dissipative element, in this case R. Nondissipative or reactive elements do not contribute to the noise but, in conjunction with the resistance, filter the noise to alter its spectrum. The spectrum is “white” since S_V is independent of frequency. This is the classical limit, and quantum effects cause a decrease in the intensity at frequencies f > kT/h, where h is the Planck constant. For an ohmic resistance the thermal noise can be expressed as a current source in parallel with the resistance:

$$S_I(f) = \frac{4kT}{R}. \qquad (3)$$
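As a worked illustration (the numbers are mine, not from the original text): a 1 kΩ resistor at T = 300 K has, from Eq. (2),

$$S_V = 4kTR = 4(1.38\times10^{-23}\ \mathrm{J/K})(300\ \mathrm{K})(10^{3}\ \Omega) \approx 1.7\times10^{-17}\ \mathrm{V^2/Hz},$$

that is, about 4 nV/√Hz, so a 10 kHz measurement bandwidth gives an rms thermal noise voltage of roughly 0.4 µV. Any excess noise has to be resolved against this intrinsic floor.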
The other intrinsic noise source, shot noise, is caused by the discreteness of the current flow produced when the individual electronic charges flow through a device independently, as they do when passing up the potential barrier in a p-n diode or when emitted from a photocathode. The current fluctuation intensity about the mean current, I, is then

$$\overline{i_n^2} = 2eI\,\Delta f, \qquad (4)$$
with a spectral intensity

$$S_I(f) = 2eI, \qquad (5)$$
where e is the electronic charge. Similar expressions hold for fluxes of other quantities which occur in discrete units. If the charges interact with each
other and their motions are correlated, then the shot noise will be reduced. The minimum transit time through the system, or time during which the pulses are independent, gives the coherence time and hence the reciprocal of the frequency above which the noise intensity decreases. The excess noise which concerns us here can be considered as a fluctuation in the ease of current flow and is usually a resistance or conductance fluctuation, but for some systems it is an emission fluctuation. It can be expressed as a spectral density, S_R(f), defined by

$$\overline{r_n^2} = \int_0^\infty S_R(f)\,df, \qquad (6)$$
and for an ohmic device the relative fluctuations of the quantities are equal, so that

$$\frac{S_R}{R^2} = \frac{S_G}{G^2} = \frac{S_V}{V^2} = \frac{S_I}{I^2}. \qquad (7)$$
Here r_n, g_n, v_n, and i_n are the fluctuations about the mean of the resistance, R, the conductance, G = 1/R, and the voltage or current under suitable bias. Fluctuating quantities are measured by their mean intensity, so that the contributions from independent sources add and the total always increases. The measurement of the noise signal is often a specialised procedure for any type of device, and the individual references cited in this review should be consulted for specific current practice. However, the basic technique for a simple device is to bias the sample at constant current or constant voltage and measure the resulting noise voltage or noise current using a very low-noise voltage preamplifier or low-noise current-to-voltage converter. The signal is then processed with an FFT spectrum analyser and the spectrum data fitted to an appropriate theoretical function. The excess part of the total noise can be extracted by subtracting the value of the white noise found at high frequencies, but often a better procedure is to subtract the noise measured with no bias current from the total noise. This technique also removes the noise of the measuring system. Usually acceptable data are obtained by rms averaging many repetitions of the spectrum and averaging the frequency data by the choice of the curve-fitting function. As we will see, the excess noise is normally measured in the low audio frequency range 1 Hz to 10 kHz, since the signal increases at low frequencies. Special techniques will be found in the references which may be used for devices where the current is very low (reverse bias diodes) or the signal must be measured under pulsed bias to reduce self-heating (high-power devices), or the resistance is very high (insulators) or very low (metal resistors), or the signal is very small and is less than or comparable with the thermal noise
(electromigration), or the signal is best measured at very low frequencies, about 0.01 Hz. Although the measurement technique is fairly specialised so that there is little suitable commercial equipment, there is no reason why such equipment should not be developed for commercial and industrial use, provided care is taken with the design of the equipment, especially the power supplies, and suitable screening to eliminate electrical interference. The main intrinsic reason why the noise technique may not be suitable for a production environment is that the noise is largest at low frequencies, and this means that the measurement is usually performed at 1-10 Hz, so that it will inevitably take several seconds. The excess noise spectrum normally falls into one of three basic forms, although there are many variants found in practice. The random telegraph signal (RTS) consists of a two-level process with transitions after random times t_1 in the lower level and t_2 in the upper level, Fig. 1. If the characteristic values of t_1 and t_2 are τ_1 and τ_2, then the characteristic time of the full process is τ_0, where

$$\frac{1}{\tau_0} = \frac{1}{\tau_1} + \frac{1}{\tau_2}, \qquad (8)$$
and hence τ_0 is closer to the shorter time. The spectrum of this signal is the same as that of the generation-recombination noise which we will consider next, but the amplitude distribution function is just concentrated at the two voltage values. It is therefore readily distinguished by observation of the signal in the time domain. As well as a random telegraph signal, this is also called burst noise or popcorn noise when observed at large amplitude in devices. It has a characteristic sound when converted into an audio signal. Unless it is specifically the subject of the investigation, devices showing this characteristic are usually removed from the batch of experimental samples. All the other characteristic noise signals have a Gaussian amplitude distribution function. It is often found that a real device shows a more complex RTS pattern with one or more two-level or multilevel processes either adding independently or interacting. The pattern occurs in conventional devices as a big signal involving large current pulses and will be discussed later in Section II as burst noise. It is also observed in submicron devices, where the amplitudes of the steps are small on an absolute scale and probably
FIGURE 1. The two-level random telegraph signal.
correspond to transitions of a single carrier. This will be discussed in Section II.C. The basic generation-recombination (g-r) noise mechanism is that of a two-level system, but the name is given generically to a signal with a Lorentzian spectrum

$$\frac{S_R(f)}{R^2} = \frac{S_N(f)}{N^2} = \frac{A\,\tau_0}{1 + (\omega\tau_0)^2}. \qquad (9)$$
Here, N is the number of free carriers in the sample, τ_0 = 1/ω_0 = 1/2πf_0 is the characteristic time corresponding to characteristic frequencies f_0 or ω_0, and ω = 2πf is the angular frequency of measurement. This has a Gaussian amplitude distribution function because it is usually made up from the superposition of a very large number of independent random telegraph signal processes with the same characteristic time. The amplitude, A, is a measure of the number of such individual processes. The spectrum is shown in Fig. 2 and is characterised by a white spectrum up to the characteristic frequency and then a fall as 1/f² at high frequencies. The low-frequency intensity is Aτ_0, and the integrated intensity is A/2. The basic characteristic of excess noise is that its intensity increases towards low frequencies. This is inevitable since any temporal or spatial coherence will remove high-frequency fluctuations. This gradual increase in the spectral intensity towards low frequencies causes considerable problems in interpretation. Unless a clear g-r spectrum can be identified, it is common to fit a power law variation S_R/R² ∝ 1/f^β, and since β is found to be near 1 (0.8 < β < 1.2 is usually readily acceptable), it is classed as 1/f noise. A brief discussion of the phenomenon of 1/f noise is given in the next section. There is no doubt that 1/f noise exists, but claims are often made of experimental observation of the various theoretical predictions of other power laws such as 1/f^{3/2} or 1/f². These observations must be treated with great care, since the simultaneous presence of 1/f and g-r noise can lead to
FIGURE 2. The spectrum of generation-recombination noise and also of the random telegraph signal on log-log scales.
observations of a power law with β between 0 and 2 unless a detailed analysis is made over many decades of frequency. It is in practice not easy to detect even a single g-r noise component, if it is present with a comparable 1/f component, by visual inspection of a log(S_R/R²)-log f plot. If it is assumed that only 1/f and g-r noise exist in a signal, then the most convenient method of analysis is to fit the data to the form

$$\frac{\omega S_R}{R^2} = \sum_{i=0}^{M} \frac{A_i\,\omega\tau_{0i}}{1 + (\omega\tau_{0i})^2} + 2\pi B, \qquad (10)$$
corresponding to the noise

$$\frac{S_R}{R^2} = \sum_{i=0}^{M} \frac{A_i\,\tau_{0i}}{1 + (\omega\tau_{0i})^2} + \frac{B}{f}, \qquad (11)$$
where A_i, τ_{0i} are the constants of the M g-r processes, and B is a measure of the 1/f noise intensity. On a linear-log f plot, as shown in Fig. 3 for a single g-r process (M = 1), this produces a symmetrical peak due to the g-r noise on top of a flat 1/f background. If the white noise has not been completely removed, a rise is observed at high frequencies. Thus, the basic and common noise components can be separated by observing the time-domain signal or the amplitude distribution function to detect burst noise and a fit to Eq. (10) to separate 1/f and g-r noise. It should be noted that if the sample is ohmic, then the voltage noise due to a resistance fluctuation varies as S_V ∝ I², and this is a good test that the system is well-behaved. Other powers of the current usually indicate macroscopic nonlinearity of the specimen, or perhaps some microscopic nonlinearity of the noise mechanism such as occurs in the non-ohmic contact resistance between grains of a granular material. Similarly, if a bias current at frequency ω_1 is applied, I = I_0 cos ω_1 t, then the resistance fluctuation appears in the voltage as sidebands about the carrier at ω_1, and this is called 1/Δf noise.
FIGURE 3. The graph of the measured quantity ωS_R/R² displayed to reveal a single g-r process together with a 1/f noise component.
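As an illustration of this fitting procedure, here is a minimal sketch (my own, not from the original text; it assumes NumPy and SciPy are available, and all parameter values are invented) that generates a synthetic spectrum and fits it to Eq. (11) with a single g-r component:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(f, A, tau0, B):
    """Eq. (11) with a single g-r process (M = 1) plus a 1/f term."""
    w = 2.0 * np.pi * f
    return A * tau0 / (1.0 + (w * tau0) ** 2) + B / f

# Synthetic "measured" spectrum, 1 Hz - 10 kHz, with the zero-bias
# background already subtracted; the parameter values are invented.
f = np.logspace(0, 4, 200)
rng = np.random.default_rng(0)
data = model(f, 1e-6, 1.6e-3, 2e-10) * rng.normal(1.0, 0.05, f.size)

# sigma=data weights the residuals by relative error, which stops the
# small high-frequency values from being swamped in the fit.
popt, _ = curve_fit(model, f, data, p0=[1e-6, 1e-3, 1e-10], sigma=data)
A_fit, tau0_fit, B_fit = popt
print(f"A = {A_fit:.2e}, tau0 = {tau0_fit:.2e} s, B = {B_fit:.2e}")
```

In practice the measured spectrum would replace the synthetic data, and plotting ωS_R/R² against log f, as in Fig. 3, shows the fitted g-r term as a symmetrical peak at ω = 1/τ_0 on the flat level 2πB left by the 1/f term.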
Except for the RTS or burst noise component, all the excess noise appears to have a Gaussian amplitude distribution function, and this is assumed to be due to the generation of the noise by a very large number of independent processes. If each process produces a spectrum and these are added as intensities, the relative spectral intensity varies as

$$\frac{S_R}{R^2} \propto \frac{1}{N_T}, \qquad (12)$$
where N_T is the number of individual fluctuation sources. If, for example, the fluctuating sources are the free carriers, then S_R/R² ∝ 1/N, where N is the number of free carriers in the sample. Similarly, if there are defects causing the fluctuations, and these are distributed throughout the volume, V, then S_R/R² ∝ 1/V, and if they are distributed about the surface area, A, then S_R/R² ∝ 1/A. By varying the specimen geometry, it should therefore be possible to identify at least the location of the noise source. However, in practice the analysis requires considerable care. For a uniform homogeneous sample, the only way to distinguish between the possible 1/N or 1/V dependencies is by the absolute magnitude of the effect, which depends on the theory involved. To distinguish between a 1/V and a 1/A dependence requires a wide range of specimens of widely differing geometries. These are rarely available, since all the samples must be small to generate sufficient noise, and they must be made from the same starting material, which usually has constraints in one or more dimensions. Even the best investigations of this type, which have used the movement of a depletion layer boundary to change the active dimension, have not been very definitive. The inverse dependence on the sample dimensions reveals a very significant feature of excess noise. It is large in small samples, and as the material quality and processing techniques have improved, the noise magnitude has reduced, so that for some high-quality systems micrometer dimensions are necessary for a significant signal. The small size means that the surface-to-volume ratio is large, and surface properties which are known to generate excess noise can become very important. This implies that the noise measurement technique is particularly sensitive to surface and interface defects. There are several types of noise which do not have such well-characterised properties as those described. In some devices, biased away from equilibrium, there can be spontaneous oscillations with a fairly narrow spectrum or with chaotic properties. These are usually easily distinguished by their spectrum and their very rapid bias dependence. There are also random but poorly characterised events which are best described as impulses or sparking. These occur in breakdown and are rather unpredictable.
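To make the geometric test above concrete (my example, not from the original text): for a film of area A and thickness t, the volume is V = At, so halving t at fixed A doubles S_R/R² if the sources are distributed through the volume (1/V) but leaves it unchanged if they lie on the surface (1/A). Only a series of samples spanning such geometric ratios can separate the two cases.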
D. 1/f Noise

The phenomenon of 1/f noise has been studied for many years, but no general consensus has been reached over its cause. It has received considerable interest, especially in the form of novel theories, because of its apparent unusual features and its apparent generality. There has been a general assumption that this form of fluctuation represents some very fundamental process which exists in all systems. We will not discuss this in detail, since we are concerned here only with noise mechanisms which are very well established. Briefly, the unusual features are that the integrated noise intensity diverges as the data is taken to lower frequencies, and the noise intensity spectra should show an even power in frequency. The divergence is of no practical consequence, since there is a real low-frequency limit given by the length of the experiment. Also, since the fluctuation is only of the resistance, there is no energy fluctuation to reach large values, and hence no practical problem. The odd power of frequency can easily be explained by less fundamental processes, such as the construction of the 1/f spectrum from a distribution of even power g-r spectra with suitable amplitudes and characteristic frequencies, as we will show later. This construction, by allowing slightly different distributions, also accounts for the variation of the frequency exponent β about 1. This latter variation causes problems in the theory of any universal mechanism, since the 1/f coefficient B in Eq. (11) would have to have a flexible and non-integral time dimension if β were not equal to unity. The apparent universality of the phenomenon is also claimed to extend to non-electronic systems. These results are entertaining but do not bear much investigation, since they usually only exist over about one decade of frequency and are usually unique results. It should be emphasised that many phenomena will have noise with the same basic features as electrical excess noise, since there is the general requirement that the intensity must increase as the frequency decreases, so that spectra with a near-1/f dependence are inevitable. These correspond to systems with a modest amount of coherence above the lowest frequencies. In electronic systems there has been a strong attempt to explain the observations by a universal mechanism, although usually one that is applicable only to electronic systems. The basis of these attempts has been the empirical Hooge equation,

$$\frac{S_R}{R^2} = \frac{\alpha_0}{N f}, \qquad (13)$$
which states that the fluctuation depends only on the total number, N, of free carriers in the sample. Here α_0 is a constant, approximately 2 × 10⁻³ for all systems, and is
independent of temperature and other physical changes. This was later modified to

$$\frac{S_R}{R^2} = \frac{\alpha_0}{N f}\left(\frac{\mu}{\mu_l}\right)^2, \qquad (14)$$
where μ is the carrier mobility and μ_l is the mobility if there were only lattice scattering. This is also only applicable for homogeneous specimens, and specific assumptions are introduced to calculate the noise in inhomogeneous specimens. It is also assumed that Matthiessen's Rule applies, so that

$$\frac{1}{\mu} = \sum_i \frac{1}{\mu_i} \qquad (15)$$
for independent scattering processes, μ_i, acting in series. Recent reviews by Hooge (1990a, 1990b) have shown that α_0 can vary by many orders of magnitude, and Eqs. (13) and (14) are not even approximate indicators of the size of the effects, as is often claimed. It should be noted that a good fit to the equations cannot be expected from published data, since few systems have been demonstrated to show a 1/N dependence in distinction to a 1/V or a 1/A dependence, and the conclusions from most measurements rely just on the calculated value of α_0 without verification of the form of the equation. Precise fits are not possible if β is not unity, since the calculated value of α_0 will depend on the frequency of the experimental data which are used for the study of the N, V, or A dependence. Also, few systems have been shown to obey the form of Eq. (14), and the values of α_0 reported are not consistent in their use of Eqs. (13) or (14) in the data analysis. Although the validity of these equations is in doubt, it is common practice to present the intensity of the 1/f noise in the form of a value of α_0 based on Eq. (13). Since these are more general interpretations, 1/f noise data should perhaps be better presented also, or alternatively, as the noise per unit volume or per unit area, so that the magnitude of the bulk or the surface location of noise sources could be compared. It should be noted that once the assumption is made that there is a universal 1/f noise mechanism for electronic systems, it is easy to extend the argument to further assumptions which are not justified if there is more than one process and different systems generate the spectrum in different ways. For example, because one system has been studied and shown to have a 1/f dependence down to extremely low frequencies, it does not follow that all systems will show a similar trend over many decades of frequency, and because the detailed amplitude distribution statistics of the noise and its stability with time have been studied in one system, it does not follow that all systems will behave in the same way.
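For orientation, a worked number (mine, not from the original text): a homogeneous sample with N = 10⁶ free carriers and α_0 = 2 × 10⁻³ would show, from Eq. (13), S_R/R² = α_0/(Nf) = 2 × 10⁻⁹/f Hz⁻¹; integrating this from 1 Hz to 10 kHz gives a relative rms resistance fluctuation of √(2 × 10⁻⁹ ln 10⁴) ≈ 1.4 × 10⁻⁴, which indicates the resolution a measurement system must reach.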
The apparent early success of Eqs. (13) and (14) led to extensions. Based on limited direct experiments, it was concluded that the noise resulted from mobility fluctuations rather than number fluctuations, where the latter corresponds to the number of free carriers changing by the temporary immobilisation of a carrier in a trap. No detailed theories have been successful, and it is not clear how the mobility, which is an ensemble average quantity, can distinguish between a fluctuation in the scattering rate and the reduction in the average conduction due to the immobilisation of some of the carriers. In many systems a number change must produce a related scattering and hence a mobility change. The detailed calculations and predictions which have been made have been based on a mobility fluctuation analysis and a value of α_0 ≈ 2 × 10⁻³. This approach has been extensively reviewed by Hooge et al. (1981) and Kleinpenning (1990). A more fundamental and universal theory based on Eqs. (13) and (14) has been proposed by Handel in the theory of quantum 1/f noise. This assumes that there is a bremsstrahlung energy loss by carriers undertaking collisions. There are specific predictions which usually give values of α_0 considerably less than 2 × 10⁻³, and these are presented as some form of fundamental limit to the noise level. This approach has been reviewed by van Vliet (1991). In the following sections we will not be concerned about these apparent universal or fundamental noise mechanisms, since they make specific and fixed predictions of the noise levels to be expected in a given system. We do not need to consider whether these particular noise mechanisms exist or are even true, since if less noise is experimentally observed, then the theories must be in error, and if more noise is experimentally observed, then another separate, additive, 1/f noise source is being studied. It is a common result in the systems discussed later that the noise magnitude increases with stress and damage and varies between nominally identical devices, and is hence not of fundamental origin. We will also use in the discussion only results that are well confirmed using several samples over a wide range of variables and where detailed secondary confirmation experiments have been performed. The basic model which we assume for the construction of the 1/f spectrum is one which builds up the resultant from a suitably weighted sum of generation-recombination processes occurring over a wide range of characteristic frequencies, or relaxation times. Instead of a specific model to produce the 1/f spectrum, a model is then needed for the particular weighting of characteristic frequencies. The mathematical construction is such that physical models to produce such a weighting are easy to justify in a wide variety of systems. To produce a 1/f spectrum, this distribution of relaxation times model requires that the individual g-r processes with characteristic time constant τ_0
are weighted by a function

$$g(\tau_0) \propto \frac{1}{\tau_0}, \qquad \tau_{01} \le \tau_0 \le \tau_{02}, \qquad (16)$$
so that

$$S(\omega) \propto \int_{\tau_{01}}^{\tau_{02}} g(\tau_0)\,\frac{\tau_0}{1 + (\omega\tau_0)^2}\,d\tau_0 \propto \frac{1}{\omega}, \qquad (17)$$
which varies approximately as 1/ω between the limits of frequency where ωτ_{01} ≪ 1 and ωτ_{02} ≫ 1. The weighting function, g(τ_0), can be derived from a uniform distribution, g(q), of a quantity q if
$$\tau_0 = \tau_{00}\exp(Dq), \qquad (18)$$
where τ_{00} and D are constants. Specific mechanisms have been proposed for various systems. For the MOS system, the McWhorter model is used, and Eq. (18) is represented by the time for quantum-mechanical tunnelling into oxide traps. For resistance fluctuations in metal films, the Dutta and Horn model is used, and Eq. (18) is represented by the time for thermally activated movement of scattering species such as vacancies between lattice sites. This basic construction for 1/f noise will be used later in the discussion. General reviews of the mechanisms of 1/f noise have been given by Dutta and Horn (1981), Kogan (1985), Weissman (1988), and Palenskis (1990). Aspects of noise theory, measurement, and models in specific devices can be found in the books by Ambrozy (1982), Buckingham (1983), and van der Ziel (1986).
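A minimal numerical check of this construction (again my own sketch, not from the original text, assuming NumPy is available) sums g-r Lorentzians weighted according to Eq. (16) and recovers a spectral exponent close to unity across the range 1/τ_{02} ≪ ω ≪ 1/τ_{01}:

```python
import numpy as np

# Relaxation times spanning eight decades, tau_01 ... tau_02, with the
# g(tau) ~ 1/tau weighting of Eq. (16).
tau = np.logspace(-6, 2, 2000)
g = 1.0 / tau
dtau = np.gradient(tau)

# Eq. (17): superpose the weighted g-r Lorentzians at each frequency.
f = np.logspace(-1, 4, 100)
S = np.array([np.sum(g * tau / (1.0 + (2.0 * np.pi * fk * tau) ** 2) * dtau)
              for fk in f])

# The log-log slope gives the spectral exponent beta in S ~ 1/f**beta.
beta = -np.polyfit(np.log10(f), np.log10(S), 1)[0]
print(f"beta = {beta:.2f}")   # close to 1.0, i.e., a 1/f spectrum
```

Replacing the 1/τ weighting by any other distribution that is roughly uniform in ln τ over many decades gives exponents β in the experimentally observed band around 1, which is the point of the Dutta and Horn and McWhorter pictures.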
II. ESTABLISHED MECHANISMS OF EXCESS NOISE INVOLVING DEFECTS
A. Introduction
In this main part of the review we will describe what is known experimentally about the cause of excess noise due to non-ideality in a sample, and the models which seem appropriate to each individual system and noise source. It is a common experience that there are “good” or “bad” specimens, in that the noise level can vary considerably between nominally identical devices made with the same techniques and even at the same time. This sensitivity of the noise magnitude to small differences is typical of the effect of impurities, defects, and damage. Since the effect is due to a variable which is not controlled, the identification of the exact source is not easy, and very
detailed, quantitative experiments are needed with many variables to give a positive identification. This labour is rarely undertaken. General investigation and conclusions are more common, with ample evidence that the noise depends on the chemical purity of the sample, its crystalline perfection, the number of grain boundaries, point defects, and linear defects, and the condition of its surface. In some cases the evidence is clear when changes of the variable are made on one sample, perhaps reversibly, and with an independent measure of the variable. To some extent the size of the noise introduced by defective parts of the specimen is hidden, since the manufacturer of the specimen will not supply really bad samples which are obviously cracked or damaged, with resistive or non-ohmic contacts and without suitable annealing, encapsulation, or surface passivation. The defects which produce noise usually also produce some other measurable change - for example, a change in the device resistance - but these effects usually produce much smaller proportional changes than that in the noise. It is usually necessary, therefore, to make a direct measure of the quantity causing the noise if definite identification of the mechanism by direct correlation is to be possible. Since the noise intensity is larger for small specimens (S_R/R² ∝ 1/V or 1/A), the noise sample is very small, but the amount of the variable which is suspected of causing the noise often has to be measured on a larger sample. Even if great care is taken, it is unlikely that these two specimens will behave in the same way, especially as the surface-to-volume ratio is very different. The difficulty in making reliable experiments is made worse by the occurrence of more than one noise source, since all noise contributions add. Occasionally it is possible to distinguish the separate noise sources from their different spectra, time-domain signals, or temperature dependence. Inhomogeneous specimens are a particular problem, with weighting of the noise towards sections where there is current-crowding or hot spots. Contacts are also a practical problem, and their noise quality should be checked, or four-terminal samples used. We will try to present here only evidence that is well established using a range of specimens and experimental variables and, if possible, an independent measure of the quantity which is suspected of causing the noise. One should be cautious about generalising any of the results to other materials, geometries, devices, or systems, since it appears that there are many ways of causing excess noise, and the dominant mechanism in each case is not obvious. In this part of the review we will look at the individual mechanisms and attempt to group the observed effects into sections relating to structural defects which probably contribute to the noise by atomic motion, to charge
fluctuations of electrical states either at points or in extended lattice defects, and to various breakdown and distributed effects.
B. Noise Due to Structural Defects

1. Chemical Changes to Free Surfaces
In the earliest measurements of the excess noise of semiconductors, it was observed that the chemical state of the surface could affect the magnitude of the noise very strongly. For these measurements, the specimens are normally of high resistivity so that the surface conductivity is easily measured, and clean, unpassivated surfaces are used. Germanium shows particularly large effects. The effects have been reviewed by Kogan (1985), and more recent work has been reported on several semiconductors (van der Ziel, 1986; Chovet and Rahal, 1979; Ambrozy et al., 1991). The effects on devices are typified by the influence of wet and dry ambients on both the reverse leakage current and the noise of a silicon p-n junction diode (Jantsch and Feigt, 1969). The 1/f noise and the dc current both increase in the wet ambient and show slow changes with time. These effects of chemisorption are very large and variable and illustrate the necessity of good surface protection and passivation for devices in which the surface conduction, scattering, trapping, or recombination is significant. However, the studies so far have not been sufficiently detailed to produce a good model which may be used for analysis. Because the noise changes are much larger than the current or resistance changes, and because the noise spectrum can give more information, it may be that there will be an increase in interest in the topic as the development of solid-state chemical and biological semiconductor surface sensors proceeds. Clearer experiments can be undertaken if the solid is a metal rather than a semiconductor and the surface is clean with well-characterised adsorbed species or has another solid coating. For example, an excess g-r noise has been observed due to reversible processes after an irreversible oxidation of an iron thin film prepared in ultra-high vacuum (Shanabarger et al., 1982). The study of solid-state reactions by means of noise as well as resistance changes could be very powerful. The formation of silicides by the reaction of very thin Co or Ni films on silicon has been investigated by Cho and Bene (1989). Both the noise intensity and spectral shape give sensitive indications of the formation of the compound and its crystalline state. We thus see that the contamination of the surface by impurities, electrolytes, or humidity is likely to be detected by noise. Also, the variation of the noise may be seen if
a chemical reaction takes place and the surface evolves into a state closer to equilibrium. The large noise observed in semiconductor-oxide systems, such as MOS transistors, due to the interface or surface states will be discussed later in Section II.C.3. This system has been thoroughly studied and is fairly well understood, but oxide and interface states probably also contribute to the noise in other systems. There has been a basic controversy over whether 1/f noise is a bulk or a surface effect. There has been an assumption that the phenomenon is general, and hence that there is only one possible outcome, either bulk or surface. However, the source of the noise can be different in different systems, and there is no general conclusion. This has been accepted for g-r noise. We will see later in Section II.B.3 that there are bulk sources of noise in metals, and one of the few very detailed studies has shown that the 1/f noise in Cr films is located in the bulk and not the surface (Zimmerman et al., 1986). This clear result is in contrast to a more complex conclusion for Al films which have a significant anodic oxide covering and perhaps impurities in the grain boundaries. In this system, the excess noise contribution was found to be small and located in the bulk for unannealed films, but large and located at grain boundaries or at the interface with the silicon substrate for annealed films (Bakshee et al., 1990).
Some of the earliest investigations of excess noise in electronic devices were those of "flicker" or I/f noise in thermionic devices due to the fluctuations in the emission of electrons from the thermionic cathode. The metal cathode covered with an oxide layer is a very complex system which needs further investigation. However, the incentive to study low work-function emitters has reduced with the decline in the use of thermionic devices in low-noise applications and the knowledge that the emission current fluctuations could be reduced by the presence of a space charge just outside the cathode. The mechanism is still assumed to be dynamic spatial and temporal fluctuations in the cathode work function by migration of low work-function atoms to the surface and along the surface. The simpler experimental system of the metal field emitter has enabled considerable progress to be made in the study of emission fluctuations, and the mechanisms are now quite well understood so that the noise may be used as an analytical tool. The system that is best understood is that of an alkali metal on a tungsten tip. The crystallographic direction of the tip can be altered, and by a study
ELECTRICAL NOISE AS A MEASURE OF QUALITY AND RELIABILITY
219
of the directional pattern of the emission, the spatial variation of the fluctuations on the tip may be measured. The basic mechanism is the surface diffusion of the adsorbate on the substrate, and this depends on the coverage, whether less than a single layer or in multilayers, and temperature. The results fit the basic diffusion model and are confirmed by measuring the correlation of the fluctuations between adjacent regions of the tip (Biernat and Kleint, 1990). Phase transitions in the adsorbate layer can also be observed (Beben, 1990). The spectra range from l / f through a diffusion spectrum, varying as l / f 3 / 2to g-r noise. On silicon emitters a burst noise has been seen with very large and nonstationary pulses of current. From the variation of the pattern with the ambient pressure, it has been suggested that the bursts are due to the appearance of patches of adsorbed gas atoms migrating across the surface (Bakhtizin and Gots, 1981). Although low-power thermionic devices have lost their commercial value, there is an increase in interest in field emitters for microelectronic vacuum devices. These microtips have the disadvantage that the migration of only a few atoms can cause a large change in emission, and large burst noise is observed (Brodie, 1989). Reduction of the fluctuation by space charge smoothing is not possible, but the use of an array of microtips allows a reduction by normal averaging processes.
3. Point Defects Considerable information exists that point defects can cause excess noise in conductors. The defects that have been identified are vacancies or interstitial defects which can move in the lattice, and it is the time dependence of their motion that produces the noise rather than fluctuations of their charge state. This contrasts with the study of the fluctuation of the charge state of point traps in semiconductors, which will be described in Sections II.C.2 and II.C.3, and the electrical effect of extended defects and grain boundaries in Sections 1I.D.I and II.D.2. The extensive experimental data relate to metals in which the magnitude of the noise is too large compared to the carrier density to be accounted for by trapping effects. Time-dependent scattering is the basic cause of the noise. It should be noted that even in metals there seem to be several possible noise mechanisms, so that care should be taken not to generalise a process to all systems if it has been established in one particular system. There are differences in the noise mechanisms between discontinuous films and continuous films and between single-crystal, polycrystal, and amorphous materials. In discontinuous films the resistance and noise are dominated by the conduction between the grains rather than the conduction in the bulk of each grain, and polycrystal materials also often have
220
B. K. JONES
significant contributions to the resistance and noise due to the grain boundaries. The balance between bulk and interface effects is complicated by the fact that most metal noise specimens need to have a very small volume and are hence thin conducting films on an insulating substrate. There is considerable evidence of substrate effects on the noise. The noise usually has a llfspectrum, but diffusion noise varying as l / f 3 ’ 2 or in other ways is sometimes seen. The complex behaviour of the excess noise in metals has been reviewed by Dutta and Horn (1981), Kogan (1989, Weissman (1988), and Palenskis (1990). In general some form of thermally activated behaviour is found, with the noise being located within the volume rather than in the surface (Zimmerman el al., 1986). This allows a grain boundary mechanism to be applicable as well as a true bulk mechanism, provided that the grains are small and uniformly distributed. Some systems give evidence of surface (Cho and Bene, 1989), interface (Rodbell et al., 1987), or substrate (Dutta and Horn, 1981; Fleetwood et al., 1987) influence on the noise. The thermal activation of at least one component of the noise allows the general use of the distribution of relaxation times model with a nearly uniform distribution in the activation energy to give the l/f spectrum (Dutta and Horn, 1981; Kogan, 1985; Weissman, 1988; Palenskis, 1990). The most general evidence that defects are responsible for noise is that the noise increases with strain and decreases after annealing (Fleetwood and Giordano, 1985). Since these are rather general processes and may involve extended defects and macroscopic effects, we will consider the effect of strain in a later section. The best-understood contributions of point defects to noise involve the motion of hydrogen in palladium and niobium. The hydrogen ions diffuse easily through the palladium and scatter the electrons. The noise observed is l/f at low temperatures and I / f 3’2 above 150 K, corresponding to onedimensional diffusion of the hydrogen from the ends of the sample. The noise is low if there is no hydrogen present (Zimmerman and Webb, 1988). In amorphous Pdgo Si20, the l/f noise process was studied and found to be due to thermally activated hopping between adjacent sites, with two separate processes distinguishable (Zimmerman and Webb, 1990). Similar measurements have been performed on the noise due to hydrogen in niobium (Scofield and Webb, 1985; Scofield et al., 1986). The hydrogen can be introduced from the ambient or by chemical etching or sputter etching. It can be removed by heating in a vacuum. It can also be removed by drifting the ions in an electric field applied along the sample. The noise observed is appropriate to one-dimensional diffusion, and the noise change with ion concentration is considerably greater than the associated resistance change. Hydrogen has also been found to have an influence on the l/f noise in polycrystalline gold films (Rodbell et al., 1987). In this system the hydrogen
segregates to the gold-substrate interface, where it reduces the noise and decreases the electromigration. In this case the noise is not due to the hydrogen motion; the hydrogen atoms block the movement of the gold atoms during their electromigration, and the noise decreases to the single-crystal value. The electromigration aspects of this study will be discussed later.

Another detailed study has been of the noise due to defects introduced into copper. The damage is complex and various single and multiple defects are created, but the relative roles of vacancy and interstitial motion can be resolved to some extent (Pelz et al., 1988). Because of the complex defect structure the results are not very clear, but annealing is seen, and most of the noise is due to mobile defects and is thermally activated. The conclusion is that the mobile species are vacancies and that the noise arises when they move close to extended defects. The motion of vacancies has attracted attention: g-r noise has been seen in aluminium at high temperatures and has been associated with the fluctuation in the equilibrium concentration of thermally generated vacancies (Celasco et al., 1976). Calculations have been made of the vacancy-generated noise for the equilibrium case (Stoll, 1983) and the radiation damage case (Seeger et al., 1987).

Electromigration is an important degradation and failure mechanism in the interconnects of integrated circuits. Open-circuit failure occurs because of the net divergence of the motion of the lattice atoms under the momentum transferred by the electrical current. A high current density and some structure to cause the net divergence are needed. The resistance shows changes before the actual failure, but there are much larger noise changes associated with the process. The problem will be discussed in more detail in Section III.B. There is good evidence that the process is often dominated by grain boundary atomic diffusion, and the noise spectrum and the temperature dependence of the noise intensity can give activation energies appropriate to this process (Koch et al., 1985; Scorzoni et al., 1991).

The actual microscopic mechanism which produces the fluctuation in the scattering from the atomic motion is probably complex, with different processes dominant in different systems. The reviews listed earlier and the more recent review by Giordano (1989) detail the position. The motion of a single defect will not, by itself, cause a resistance change, because of the translational symmetry of the lattice. However, a rotation of an asymmetrical defect or the motion of a defect to a different environment may have an effect, and the time dependence of the process can produce noise. In many systems there is good evidence that the resistance fluctuations are due to the quantum-mechanical interference of the electron waves scattered by two or more scattering centres. If one centre has a
fluctuation in its position or orientation, the net conduction may also change. If many scatterers are involved, the limit is the universal conductance fluctuation model, which predicts a maximum noise level; for only two scatterers, the limit is the local interference model. While the former is more usual at very low temperatures, the latter is more appropriate at higher temperatures such as room temperature. The frequency dependence of the noise spectra of these models still derives from a near-uniform energy distribution of thermally activated processes (Feng, 1991; Giordano, 1991).
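The distribution of relaxation times argument invoked here, and repeatedly below, can be made explicit. A minimal sketch, following the reviews cited above (the flat distribution of activation energies is the model's assumption): a thermally activated process has \tau(E) = \tau_0 \exp(E/kT) and contributes a Lorentzian, so superposing processes with a distribution D(E) of activation energies gives

    S(f) \propto \int D(E)\,\frac{\tau(E)}{1 + (2\pi f)^2 \tau(E)^2}\,dE .

Viewed as a function of E, the kernel is sharply peaked (width of order kT) about the energy \tilde{E} = kT\ln[1/(2\pi f\tau_0)] at which 2\pi f\tau = 1, so

    S(f) \propto \frac{kT}{f}\,D(\tilde{E}) .

A D(E) that is flat over a few kT therefore yields a 1/f spectrum, while slow variations in D(E) produce the small, correlated drifts of the magnitude and frequency exponent with temperature analysed by Dutta and Horn (1981).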
4. Strain Effects

The extremely large effect of strain on the excess noise of metals and semiconductors is so well established that one must accept that at least some noise sources are not universal and appropriate to pure and ideal samples. Similarly, when searching the literature for experimental data to compare with the theory of such universal mechanisms, care must be taken that the results given are really applicable to unstrained, well-annealed, equilibrium samples. This is especially true since excess noise specimens have to have a small volume and dimensions and are usually fabricated as a thin film on a substrate. The differential contraction between the layers, during any experiment involving temperature variation or in the cool-down after fabrication, can produce strain in the thinner and weaker layer, and this adds to the inherent strain formed within the deposited layer during manufacture. In many materials the noise increase is many orders of magnitude larger than the change in resistance due to strain or annealing (Palenskis, 1990). The noise can therefore be used as an indicator of the state of the specimen, and in particular of how close to equilibrium or stability the lattice has become (Jones and Mzunzu, 1989).

Strain creates lattice defects, including vacancies, dislocations, and extended defects leading to microscopic cracks. These can introduce extra carrier scattering and resistance, and in semiconductors there can be a contribution, which can extend over a large volume, due to carrier trapping and detrapping. This electrically active property of defects will be discussed in detail in Sections II.C and II.D. Here it is assumed that the scattering effect is dominant in metals, which have been studied in some detail.

Studies on thin films of Pt, Au, Ag, Pb, Sn, and others show that the noise is fairly constant during elastic strain, but increases by orders of magnitude with plastic strain, although the resistance increase may be only a few percent. The noise then relaxes back to a low value over a time of perhaps hours after the stress is reduced to zero or kept at a constant value. This
noise decrease is similar whether the strain is increasing or decreasing (Fleetwood and Giordano, 1982, 1983a, 1983b). This long-time annealing effect can result in specimens "improving" over a long period of time and emphasises that specimens for all studies should be well annealed (Fleetwood and Giordano, 1983b). Another common observation is that during the stress period, and for a short time afterwards, the noise has a burst-like appearance and a 1/f² spectrum, but settles down to a 1/f spectrum. Measurements on thin films suggest a strong dependence on the material of the substrate, and also that the adhesion is the controlling factor, but no separate measurements were made of the crystal size or inherent strain in the film under the different deposition conditions (Fleetwood and Giordano, 1982).

The most complete studies have been made on aluminium thin films. It has been shown that both dislocations and vacancies are involved. During the deformation, the noise is large and proportional to the rate of plastic strain, and is linked to the burst noise process (Bertotti et al., 1978, 1979; Bertotti and Fiorillo, 1983). The noise can be analysed using the dynamics of the creation and annihilation of dislocations. The lower-intensity noise, with a 1/f spectrum, dominates when dislocation movement has ceased or is small, and is determined by the diffusion of vacancies and their interaction with the other defects. The study of the noise after the stress has ceased has revealed information about the vacancy mechanism (Zhigalskii and Bakshi, 1980; Andrushko et al., 1981). The crystal size and inherent strain in the films have been measured, as well as the resistance and noise variations with applied strain. The 1/f noise changes are much larger than the resistance changes. The noise intensity is thermally activated, with an energy appropriate to vacancy diffusion. When the stress exceeds the tensile strength of the film, the noise drops as the macroscopic crack provides an enhanced diffusion route for the vacancies. The noise is proportional to the total stress, inherent plus applied, so that it can be reduced if a compressional stress is applied to counter the inherent tensile stress. A similar study on Cr (Zhigalskii et al., 1982) has shown similar effects, but also that there is a considerable effect on the noise properties if an impurity, Ar, is added during the deposition stage. The argon atoms fill any microscopic voids and hence reduce the inherent strain and the vacancy motion.

In semiconductors the effects of strain are just as significant, but they are more difficult to analyse since there are also trapping effects. The standard processing technique of adding a layer of Si₃N₄ to SiO₂ on silicon reduces the surface strain by balancing the strains produced by the two layers by the various differential contractions after fabrication. This results in various
enhanced performance characteristics for both bipolar and MOS devices, including the noise (Mikoshiba, 1976). Measurements of the effect of strain on the noise in silicon have been few, but there is a suggestion of little effect (Stroeken and Kleinpenning, 1976). However, large increases have been seen in GaAs (Palenskis, 1990; D'yakonova et al., 1991), both under uniaxial compression and after annealing.

5. Internal Friction

The excess noise appears in the resistance, the dissipative part of the system, so it is natural to investigate other dissipative mechanisms. A mechanical system experiences loss due to internal friction, which occurs because of translational or rotational motions of atoms or defects within the lattice. This link is also natural because of the suggestion in the modified Hooge formula (Eq. (14)) that phonons or mechanical vibrations are involved in the electrical 1/f noise. Although there is good evidence for such a connection, it has been demonstrated in very few systems and, as with all the discussion about 1/f noise, it is wise not to generalise too much and claim that all electrical excess noise in all systems is related to internal friction.

Reviews of the present theory have been given by Kogan (1985) and Weissman (1988). The 1/f spectrum arises from a distribution of relaxation times model, and a frequency-independent internal friction implies a 1/f spectrum of fluctuations in the mechanical strain. The strain is related to the resistance, so the 1/f resistance fluctuations are directly related to the mechanical quality factor Q by S_R/R² ~ 1/Q. The detailed theories (Kogan and Nagaev, 1982, 1984) involve hopping of defects between adjacent sites, or more general two-level systems. Whereas motion of a defect between one pair of sites may cause internal friction loss, there may be no increase in the electrical scattering if, for example, the two sites simply involve translational motion of the mobile defect. Thus there may not be a perfect correlation between the fluctuations in the losses in the mechanical and electrical systems.

The most direct experiment has been performed on the amorphous material Pd80Si20 with the introduction of hydrogen interstitial impurities (Zimmerman and Webb, 1990). Two low-temperature peaks are observed in the resistance noise, but only one is seen in the internal friction. However, the agreement between this peak and one of the resistance peaks, as a function of temperature and hydrogen concentration, is remarkable. All the peaks are thermally activated.

Another system which has been investigated quite intensively is the quartz crystal resonator. If the resonator is represented by its electrical equivalent circuit, with a 1/f fluctuation in the resistance that represents the loss, then
the frequency noise intensity varies as 1/Q⁴, where Q is the resonator quality factor. This is found by experiment (Planat and Gagnepain, 1986). At low temperatures, peaks are found in the mechanical loss (1/Q), and one, ascribed to interactions with the thermal acoustic phonons, is found to occur also in the frequency noise. However, another mechanical loss peak, ascribed to hydrogen ion motion, has no equivalent noise peak. These connections between mechanical loss and resistance fluctuations are likely to give good indications of the mechanisms of the noise. It is also possible that the link may be useful in suggesting quality tests for various practical devices.
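The step from a frequency-independent internal friction to a 1/f spectrum of strain fluctuations follows from the fluctuation-dissipation theorem. A minimal sketch for a spring-like mechanical element (the structural-damping form of the loss is an assumption of the model, not of the measurements): the theorem gives a displacement spectrum

    S_x(\omega) = \frac{4 k_B T}{\omega}\,\left|\mathrm{Im}\,\chi(\omega)\right|,

where \chi(\omega) is the mechanical susceptibility. For a stiffness \kappa with a frequency-independent loss angle \varphi \approx Q^{-1}, i.e. \kappa \to \kappa(1 + i\varphi), one has |\mathrm{Im}\,\chi| \approx \varphi/\kappa well below resonance, so

    S_x(\omega) \approx \frac{4 k_B T \varphi}{\kappa\,\omega} \propto \frac{1}{fQ},

which carries both the 1/f shape and the 1/Q scaling of S_R/R² quoted above over to the strain, and hence to the resistance.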
C. Excess Noise Due to Electronic States

In semiconductors and insulators there are distinct electronic states which are localised in space at impurities, at point defects, or at positions along extended line or planar defects and interfaces. The states of interest are localised, so that their carriers cannot conduct, and have an energy within the forbidden band gap. The charge occupancy of the trap can change, and this has the direct effect of removing a carrier from the conduction process and the less direct effect of altering the charge and field distribution if the charge of the localised state is at a different place from the free carrier. The noise is caused by the time dependence of the trapping process. It is directly a carrier number fluctuation, but if there are resulting changes of charge distribution, these may also produce scattering or mobility fluctuations. An ensemble of identical states will produce the spectrum associated with a single state, which is generation-recombination noise with a single time constant and a Lorentzian form. The basic properties of generation-recombination noise are given in the standard texts (Muller, 1978; Ambrozy, 1982; van der Ziel, 1986). Any spatial inhomogeneity caused by a doping density gradient, local strain, nearest-neighbour interaction, or variation of the Fermi energy relative to the band edges will result in a variation of the time constant and a broadening of the Lorentzian to give, in the limit, a 1/f spectrum based on the distribution of relaxation times model. Here we describe the noise sources of this type which have been well established.

1. Donors and Traps
The simplest system is a uniformly doped semiconductor resistor, easily realised as a JFET channel. As the temperature is reduced there is insufficient thermal energy to ionise the impurities, so the carriers due to the donors freeze out and the Fermi level rises past the donor level. The
noise is an indication of uncertainty and is a maximum when the Fermi energy coincides with the donor energy, whereas there is little noise when the donor levels are most likely to be either completely full or completely empty. The intensity and characteristic frequency of the noise are temperature-dependent, and changes can also be made by moving the Fermi level with an applied field. Carrier density fluctuations due to donor freeze-out have not been investigated in great detail, but the basic principles have been established in silicon JFETs to give the optimum temperature of operation for low noise (Churchill and Lauritzen, 1971; Haslett and Kendall, 1972; Haslett et al., 1975).

Individual traps have been studied extensively in many semiconductors. The temperature dependence of the low-frequency intensity of the generation-recombination noise, or of the characteristic frequency, gives the activation energy of the trap. Simple assumptions can be used to determine the cross-section, but a fairly full model is often needed to obtain a value for the concentration of the traps (Scholz and Roach, 1992). Noise has been used to investigate the approach to equilibrium of impurity energy levels introduced by ion implantation and annealing (Liou et al., 1991b).

The trap properties in Schottky diodes are normally measured by deep-level transient spectroscopy (DLTS), which uses the time dependence of the depletion capacitance as the traps empty to give the emission parameters. The trapping and detrapping in the depletion region produce changes in the effective barrier height and hence very large current fluctuations. This trapping mechanism has been verified for InP diodes, and close agreement has been found between the trap properties found by noise and by DLTS studies (Grant et al., 1978; White et al., 1978). In silicon JFETs the temperature dependence of the g-r noise has been used to identify trap levels (Copeland, 1971; Haslett and Kendall, 1972; Nashashibi et al., 1983; van Rheenan et al., 1987). Similar studies have been made in MOS channels, although these specimens showed a large 1/f noise which reduces the sensitivity (Murray et al., 1991). Only in the MOS devices has a successful comparison been made with DLTS measurements (Scholz et al., 1988). (In all these experiments the traps are assumed to be in the depletion region next to the channel, where the Fermi level bends to intersect the trap level. The processes involved have not been verified by measuring the trap parameters in other ways.)

The deep-level properties of compound semiconductors are of particular interest because they have a large effect on the dc and ac properties of devices and are caused by disorganisation of the lattice as well as by impurities. A plot of the noise intensity at a single frequency against temperature typically gives a series of peaks corresponding to the temperatures at which
the characteristic frequency of the contributions to the g-r noise equals the measurement frequency. Measurements at several frequencies enable activation energies to be found, and this has been valuable in the study of GaAs FET devices and materials (Sodini et al., 1976). More sophisticated analysis, involving the separation of the 1/f and various g-r components, has shown that the noise results agree with those from photoluminescence in Al_xGa_{1-x}As materials (Hofman et al., 1988) and heterostructures (Kirtley et al., 1988; Hofman et al., 1990). In the investigations just named, the spectra were very close to Lorentzian, with only very slight broadening. Measurements on GaAs FETs have shown considerable broadening, which has been attributed to the location of the traps in the Schottky gate depletion layer (Hallgren, 1990). Detailed studies on GaAs FETs, using several techniques for measuring the trap parameters and also methods for determining the location of the traps, have shown spectral shapes very close to Lorentzian. These studies also show that the time constant determined by noise methods seems to be consistently higher than that found by DLTS and other methods (White et al., 1978; Abdala and Jones, 1992).

There is good evidence, therefore, that noise can be used to study the properties of traps within semiconductors. It has the advantage that it is an equilibrium measurement, unlike DLTS, where the sample is subjected to large excitations. It cannot, however, distinguish directly between electron and hole traps, and it measures a different characteristic time if the trap emptying and filling rates are comparable. It also appears to have less sensitivity than DLTS, but can perhaps distinguish between neighbouring traps a little better. Whereas DLTS can investigate the properties of any trap which is filled or emptied during the excitation, noise only detects traps which are at, or near, the Fermi energy somewhere within the specimen.

2. Single Traps

Significant advances have been made in the understanding of the mechanisms of excess noise by the observation of the noise generated by individual defects located in submicrometre structures such as silicon JFETs, metal-insulator-metal (MIM) tunnel diodes, and MOS transistors. Because of the special sample conditions needed to observe this noise, and perhaps because of the great detail obtained, it is not expected that this type of measurement will be generally used for quality and reliability studies.

Silicon material is now very pure, so that in JFETs of small dimensions there are few traps. Defects which are active in producing noise are limited to those at the Fermi energy, and in practice that means that they have to be in the transition region where the bands bend between the neutral channel
and the depletion region. If the channel is pinched off, the traps near the constricted region have most influence on the channel conductance. In this way the active volume, and hence the number of effective traps, can be made very small; individual traps can be investigated, and also located if a separate substrate bias is available to move the pinched-off region about within the semiconductor volume. It has been shown that individual traps can be detected and that they show g-r noise (Kandiah and Whiting, 1978).

In very small cross-section MIM tunnel diodes, the barrier transmission has been found to fluctuate as a two-level random telegraph signal because of trapping within the junction (Rodgers et al., 1987). In extremely small MOS transistors, individual, thermally activated traps producing a two-level random telegraph signal have been studied (Kirton and Uren, 1989). The behaviour is very complex, but there is convincing evidence that each two-level signal corresponds to an individual trap, and if there are several traps then the spectrum rapidly approximates to a 1/f variation, giving a good indication of the basic origin of the 1/f noise in large-area devices. This is shown in Fig. 4 (Uren et al., 1985), and it has been modelled for different-sized devices (Ghibaudo et al., 1991); a numerical illustration is sketched below. The properties of the individual defects are found to be very complex (Cobden et al., 1990), but methods of translating the parameters of individual traps to the observed macroscopic device properties have been described (Kandiah et al., 1989; Kandiah, 1990; Kandiah and Whiting, 1991). A direct connection between the g-r components of the noise spectrum and steps in a DLTS transient has been observed (Karmann and Schulz, 1989). Initial attempts have been made to link the single-state noise directly with defects generated in the oxide during bias (Farmer and Buhrman, 1989; Fang et al., 1991). Detailed studies have also been made on metal nanobridges (Ralls and Buhrman, 1991) and AlGaAs tunnel barriers (Campbell et al., 1991).
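The way a few individual Lorentzians merge into a 1/f-like spectrum is easy to verify numerically. A minimal sketch in Python (the time constants, spaced uniformly in log tau, and the equal amplitudes are illustrative assumptions, not parameters from the cited studies):

    import numpy as np

    f = np.logspace(-1, 4, 400)       # frequency axis, 0.1 Hz to 10 kHz
    taus = np.logspace(-4, 1, 8)      # assumed trap time constants, log-spaced

    # Each trap contributes a Lorentzian S_i(f) = 4*tau / (1 + (2*pi*f*tau)^2);
    # equal amplitudes are assumed for simplicity.
    S = sum(4 * t / (1 + (2 * np.pi * f * t) ** 2) for t in taus)

    # Local slope d(log S)/d(log f) in the band covered by the time constants:
    slope = np.gradient(np.log(S), np.log(f))
    print(slope[100:300].mean())      # prints a value close to -1, i.e. ~1/f

With only two or three time constants the individual Lorentzian corners remain visible as ripple, as in the small devices of Fig. 4; as more time constants are added the average slope settles at -1.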
3. Surface and Interface States

There is no doubt that there is a large contribution to the excess noise in semiconductors from states at or near the surface of the semiconductor or its interface with an oxide or other insulating layer. This noise usually has a 1/f spectrum. It is seen most clearly in a surface resistor, formed by doping, which has depletion layers on all sides except the oxide-covered free surface. If a gate is applied above an oxide insulating layer, the surface can be inverted. The noise drops by several orders of magnitude as the resistive channel changes from the high-noise MOS-type device to the low-noise JFET-type device with depletion layers on all surfaces (Hayat and Jones, 1987). In the latter the surface is also depleted, and the electrons cannot interact with the surface or interface states.
FIGURE 4. The excess noise power spectral density of n-channel MOSFETs. In each case three devices are shown, and the sizes are (a) 20 μm × 20 μm, (b) 20 μm × 2 μm, (c) 2 μm × 2 μm. As the device area increases, the number of active noise centres increases, and the noise becomes more 1/f-like. From Uren et al. (1985), with permission.
The discussion becomes more complicated when attempts are made to determine the source of this noise (Hayat et al., 1988; Kleinpenning, 1990). The best evidence comes from the single-trap fluctuations described in the last section. This is a number fluctuation, although the change in the charge state may also change the scattering and recombination at the surface. The single-trap studies reveal such a wealth of information that it may be difficult to relate those findings to more conventional ensemble-average or macroscopic measurements. The noise has been clearly related to all the other methods of measuring the loss at the surface, such as C-V, G-V, and charge-pumping experiments. These are carried out on the gate-channel capacitor with the oxide, and its lossy states, as the dielectric (Maes et al., 1985). The noise also decreases with improved processing and increases with damage (Maes and Usmani, 1983).

The interface states are often described as fast and slow states, with the assumption that the fast states are at the interface while the slow states are within the oxide. The fast states may be at the interface itself or nearby in the semiconductor. The fluctuations in the oxide state properties are ascribed to sodium ion motion or to the charging and discharging of electron states via tunnelling from the semiconductor. Unfortunately, there is still no conclusive evidence for the exact nature of the states involved in the noise. There are also difficulties in interpretation because the different quantities are not measured on exactly the same devices, and assumptions are made, such as that the densities of fast and slow states are proportionally related. Using a buried-channel type of structure, it has been shown that some noise at the surface can be related directly to the surface state density (Jones and Taylor, 1992).

If high-energy electrons are extracted from an MOS transistor channel into the oxide insulator, energy levels can be created and charges left in the insulator. This hot-electron degradation of the transistor performance also produces a large 1/f noise (Maes and Usmani, 1983; Stegherr, 1984; Kandiah et al., 1989; Fang et al., 1991).

In bipolar devices the surface conditions can have a large effect on the device properties, since the p-n junctions and their related depletion regions must intersect the surface. This raises the possibility of changes in the conduction near the surface through changes in the geometry of the depleted and inverted regions, and also the possibility of changes in the surface recombination rate. These quantities may also show time-varying fluctuations and hence cause noise. The inversion of the surface to form conducting, MOS-type channels is a severe problem, and normally guard rings are formed to prevent their spread. They can greatly alter the static properties of the device, and since the MOS channels have large 1/f noise, the noise increases considerably (Stocker and Jones, 1985; Jones and Truscott, 1987).
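The tunnelling mechanism mentioned above for the slow oxide states gives a 1/f spectrum by the same superposition argument, with depth into the oxide taking the place of activation energy. A minimal sketch of this McWhorter-type picture (the numerical values of the attempt time and decay length are generic assumptions): a trap at depth z into the oxide exchanges charge with the channel at a rate set by

    \tau(z) = \tau_0\, e^{z/\lambda},

where \tau_0 is an attempt time and \lambda the tunnelling decay length of the carrier wavefunction. Traps distributed uniformly in z then give relaxation times distributed uniformly in \ln\tau, and hence S(f) \propto 1/f. With assumed values \tau_0 \sim 10^{-10} s and \lambda \sim 0.1 nm, depths of only 0 to 3 nm span relaxation times from 10^{-10} s to over 10^{3} s, covering the whole measurable frequency range.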
The recombination in the depletion region of a p-n junction where it intersects the surface can be very significant and can dominate the properties of the junction and the transistor. A fluctuating change in the surface area or recombination velocity of this region can contribute to the noise. The mechanism is probably the influence of potential fluctuations due to the change of the charge state of the surface traps. The dependence of the diode and transistor properties on the surface potential has been shown using a gate-controlled diode (Hsu, 1970). From detailed measurements of the bias and temperature dependence of the dc and noise parameters, the noise has been directly related to the generation (non-ideal) component of the diode or transistor base current, rather than to the diffusion component or even the total current (Green and Jones, 1985a, 1985b; Jones and Katai, 1987).

Ionising radiation of all types generates defects in semiconductors, and these generate excess noise and affect the device static performance. These effects can usually be partially annealed. The defects are mainly individual point defects or clusters. It has been found that the traps produced by γ irradiation in MOS devices are the same as those producing 1/f noise in irradiated devices (Fleetwood and Scofield, 1990). X-rays have been seen to produce an increase in 1/f noise in bipolar transistors through a mechanism associated with the surface recombination of the base-emitter depletion region (Blasquez and Roux-Nogatchewsky, 1980). Radiation in JFETs produces g-r noise because of defects induced in the bulk, but JFETs are less sensitive to damage than MOS and bipolar devices (Wang et al., 1975; Stephen, 1986).

Ion implantation is a standard technique for semiconductor device doping. It inevitably produces lattice damage, and normally care is taken to anneal this and to activate the impurities. The noise is found to decrease with annealing time at low annealing temperatures, but at higher temperatures g-r noise is increased, probably because of vacancy-interstitial diffusion (Vandamme, 1988). Because of these effects, care must be taken in the interpretation of noise measurements on implanted devices unless the effect of the annealing procedure is well understood. Incompletely annealed samples may show excess noise, and then a slow decrease as the lattice anneals during use.
D. Electrical States Linked to Extended Defects

In this section we continue the discussion of the noise phenomena which can be definitely related to particular processes, and outline the effects of extended defects: line dislocations, two-dimensional defects, and grain boundaries. These appear to produce a noise contribution due to the
change of the charge state of traps within their structure, and hence this is a continuation of the last section, although the systems are more complex than for point defects. These extended defects may also produce scattering fluctuations, as described in Section II.B, and participate in other processes such as electromigration.

1. Burst Noise

Burst noise, or popcorn noise, is a very large-amplitude random two-level telegraph signal, as shown in Fig. 1. It is also sometimes found as a more complex structure with several levels, but the principles can be discussed with the two-level example. Superimposed on the voltage levels are the usual thermal and 1/f noises. The process is therefore additive, although there is evidence that a specimen with burst noise also has high 1/f noise, which is the same in both burst noise states (Strasilla and Strutt, 1974). The time constants in the two states are thermally activated and the spectrum is Lorentzian, although the amplitude distribution function is not Gaussian. With such properties the amplitude decreases for high-frequency transitions, so it is likely that many specimens show multilevel and complex processes whose high-frequency components are not readily visible; a full temperature-dependent study will reveal all the transitions. In the study of 1/f and other noise phenomena, it is normal to test all specimens for burst noise and to use only the quiet ones. This procedure may perhaps introduce a bias into those measurements.

The statistics of burst noise have been derived assuming a Poisson distribution of events (Ambrozy, 1982). Thus, if t_1 and t_2 are the times spent in each state, with average values \tau_1 and \tau_2, the probability densities of the times in each state are

    P(t_1) = \frac{1}{\tau_1} \exp\!\left(-\frac{t_1}{\tau_1}\right) \quad \text{and} \quad P(t_2) = \frac{1}{\tau_2} \exp\!\left(-\frac{t_2}{\tau_2}\right), \qquad (19)

and if \tau_0, defined by 1/\tau_0 = 1/\tau_1 + 1/\tau_2, is the mean time, these combine to give a Lorentzian spectrum,

    S(f) = \frac{4 (\Delta I)^2 \tau_0^2}{(\tau_1 + \tau_2)\left[1 + (2\pi f \tau_0)^2\right]},

where \Delta I is the amplitude of the two-level signal.
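These statistics are simple to check numerically. A minimal sketch in Python (the dwell times, sampling rate, and record length are arbitrary assumed values):

    import numpy as np
    from scipy import signal

    rng = np.random.default_rng(0)
    tau1, tau2 = 1e-3, 3e-3   # assumed mean dwell times in the two states (s)
    fs, n = 1e5, 2**20        # sampling rate (Hz) and record length

    # Build the two-level signal from exponentially distributed dwell times, Eq. (19).
    x, state = [], 0
    while len(x) < n:
        dwell = rng.exponential(tau1 if state == 0 else tau2)
        x.extend([state] * max(1, round(dwell * fs)))
        state = 1 - state
    x = np.asarray(x[:n], dtype=float)

    f, Pxx = signal.welch(x, fs=fs, nperseg=2**14)

    # The Lorentzian for a two-level signal of unit amplitude,
    # with 1/tau0 = 1/tau1 + 1/tau2 as in the text.
    tau0 = 1.0 / (1.0 / tau1 + 1.0 / tau2)
    S = 4.0 * tau0**2 / ((tau1 + tau2) * (1.0 + (2 * np.pi * f * tau0) ** 2))
    # Pxx and S agree up to statistical scatter.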
Burst noise has been seen in many devices, including transistors, p-n junction diodes in forward and reverse bias, Schottky diodes, thick-film resistors, and granular semiconductors. The basic statistics of the trapping times and the
thermal activation of the process have been verified on the assumption that the occupancy of a trap controls the process (Hsu et al., 1970; Cook and Brodersen, 1971). This idea has been extended to give a better fit to experiment by assuming a three-state trap which produces one burst noise level in two of the states and the other level in the third state (Sikula et al., 1981; Sikulova et al., 1990).

The exact mechanism and location of the phenomenon are still uncertain. Whereas the statistics suggest a fairly simple trapping process, the effect produced is very large, greater than might be expected from the direct action of a single trap. What is needed is a very high-conductivity parallel channel whose conduction can be controlled by a single trap. The situation appears to be different in junctions and in thick-film resistors.

In thick-film resistors, the noise is found in high-resistance, small-volume samples, often with visible damage or after trimming. The structure and conduction in real resistors are complex, but the effect can be understood if, in the defective devices, the resistance is controlled by a few parallel channels and one of these can be turned on or off by the charge in a trap in that channel (Chen and Cottle, 1986). The conduction is often by tunnelling between adjacent conducting grains, and small changes in the potential at this point can cause a large change in the conduction. It is likely that one grain boundary may control the conduction in a single conducting channel. This has also been seen in percolation structures (Chen and Chou, 1985).

In a device with a potential step or barrier, the conduction depends exponentially on the barrier height, so large conductance changes are possible from a single fluctuating charge. To obtain an effect as large as that seen in burst noise, a large area needs to be affected, and normally a dislocation bundle or similar defect crossing the junction is assumed. Another large defect which has been considered is a metal precipitate, which can change its charge state and affect a large area of the junction perimeter. Stacking faults have been identified in integrated circuits (Conti and Corda, 1973). Emitter edge dislocations have also been shown to be related to the occurrence of burst and other low-frequency noise (Stojadinovic, 1979; Mihaila and Amberiadis, 1983; Mihaila et al., 1984). Direct information has been obtained by studying the noise and the induced current when an electron beam is applied to a sensitive region of a transistor (Knott, 1981). Detailed studies have been made of the occurrence of noise, the various observable defects, and their electrical characteristics produced by different processing parameters (Martin et al., 1972; Blasques, 1978). These all suggest that dislocations which are visible under standard etches are probably responsible for the burst noise.
2. Granular Materials

The classic sources of 1/f noise have been carbon composition resistors, "dry" soldered joints, and resistive contacts. These are all now easily avoided, but they are all examples of granular materials, in which the resistance to current flow is dominated by the junctions between particles of homogeneous material of low resistance and low excess noise. In general, the mechanisms which produce the large resistance between the grains are also those of the resistance fluctuations.

Detailed measurements of the properties of single grain boundaries in silicon (Madenach and Werner, 1988) have shown that they can be described by an interfacial layer of traps with a distribution in energy through the forbidden gap. Charge is exchanged between those traps located at the Fermi energy and the majority carriers. For a uniform energy distribution of these interface states, N_is(E), a noise spectrum is obtained which is slightly less sharp than the Lorentzian given by states located just at the Fermi energy.
In this model τ_e = (v_th σ n)⁻¹ is the effective time constant, Q_R is the depletion charge on the more depleted side of the junction, A is the junction area, v_th is the average carrier thermal velocity, n is the carrier density at the interface, and σ is the capture cross-section of the traps. This spectrum has been used successfully to measure interface-state densities and cross-sections which agree with independent measurements. The experimental spectra are slightly broader still, and this is ascribed, as in MOS capacitance and conductance measurements of surface states, to spatial variations of the trap densities.

This good agreement for a single interface allows confidence in extrapolating to polycrystalline materials. The immediate implication is that the noise will vary as 1/(volume) if the grain boundaries are numerous and uniformly dispersed. As the variation across the boundary area and between boundaries increases, the near-Lorentzian spectrum is smoothed further, merging into the distribution of relaxation times model and eventually a 1/f spectrum. For polycrystalline silicon doped with boron by ion implantation, the variation through the sample is small, and distinct energy levels have been seen (Madenach and Stoll, 1987).

Contacts between metals have been studied, but the results are not very conclusive, because there are surface impurity layers, and variations with pressure which alter the effective contact area and even the effective number
of contact points (Theunissen, 1953). There is evidence, however, for a temperature-fluctuation source of the noise, acting through the temperature coefficient of resistance of the small contact volume (Takagi et al., 1986).

Very thin metal films are discontinuous, with an island structure. The current flows through the interconnecting regions when the islands grow large enough to approach their neighbours. There is a potential barrier at this interface which provides a resistance to the current flow, and the barrier height is determined by the occupancy of interface states and hence by temperature (Celasco et al., 1978a, 1978b, 1980). The dominant process is quantum-mechanical tunnelling through the insulator separating the islands.

The general problem of an array of conducting elements separated by insulating regions involves percolation. As the proportion of conductor to insulator increases, there is a critical value at which the whole sample changes from insulator to conductor. This occurs when the first conducting path appears between the electrodes. In this state the resistance is very susceptible to fluctuations in the properties of this poorly defined system, and large noise is seen. Several systems have been studied, and the results have confirmed the basic mechanisms of the conduction, which is usually limited by temperature fluctuations at the contacts, and the second-order phase transition at the percolation threshold (Chen and Chou, 1985; Mantese and Webb, 1985; Koch et al., 1985a). The noise is usually 1/f, but the exact reason for this spectrum is unknown, although a distribution of relaxation times approach is usually assumed.

Less well-characterised granular materials such as carbon resistors and cermet thick-film resistors have also been studied and show the expected basic properties. Carbon resistors show 1/f noise one or two orders of magnitude larger than that of metal film resistors. The temperature dependence shows small, but linked, variations in both the magnitude and the frequency exponent (Fleetwood et al., 1984). This is expected if a thermally activated process with a nonuniform distribution of activation energies is assumed, as has been seen in metal films (Dutta and Horn, 1981), although the mechanism is not necessarily the same for the carbon and the metal. The nonuniformity of the distribution produces structure in the 1/f spectrum, and this structure moves in frequency as the temperature is varied.

The cermet thick-film resistors also show 1/f noise. Again, tunnelling through the insulating glass between the grains is thought to be the basic process. The noise is found to be consistent with these assumptions, but the specimens are not well enough controlled to give good results (Chen et al., 1982; Masoero et al., 1983; Kusy and Szpytma, 1986). Other poorly controlled materials such as ZnO varistors also show large 1/f noise with thermally activated processes (Prudenziati et al., 1985).
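Returning to the grain-boundary time constant τ_e = (v_th σ n)⁻¹ defined above: a rough order-of-magnitude check shows why the corner frequency of the near-Lorentzian spectrum falls in the measured low-frequency range. For illustrative assumed values (not taken from the cited work) of

    v_{th} \approx 10^{7}\ \mathrm{cm\,s^{-1}}, \qquad \sigma \approx 10^{-15}\ \mathrm{cm^{2}}, \qquad n \approx 10^{10}\ \mathrm{cm^{-3}}

(the low n corresponding to a strongly depleted interface), one obtains

    \tau_e = (v_{th}\,\sigma\,n)^{-1} = 10^{-2}\ \mathrm{s}, \qquad f_c = \frac{1}{2\pi\tau_e} \approx 16\ \mathrm{Hz}.

The strong dependence on the interface carrier density n would also explain why the corner frequency shifts rapidly with temperature and barrier height.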
E. Distributed Effects
In addition to the identifiable noise sources due to microscopic defects that have been discussed, there is noise generated by larger-scale processes within the sample. In practice these processes may be initiated or strongly affected by defects, but the mechanism is independent of the defect itself. Many of these effects are nonlinear and depend on the geometry of the specimen and its current flows.

1. Breakdown
There are many voltage-dependent effects in which there is a rapid increase in current or a threshold in the rise of noise. For example, there are various well-characterised oscillatory processes which usually have a high harmonic content. Insofar as these produce spurious signals they act as noise sources, but in general they do not fall into the category of purely random processes considered here. Similarly, deterministic chaotic systems do not concern us; these are usually easily identified, since the spectrum is very sensitive to the control parameter, such as the voltage, and there are also ranges of the control parameter in which the spectrum shows simple harmonic and subharmonic structure.

At high voltage, the current in a junction or an insulating layer rises rapidly in avalanche or perhaps Zener tunnelling breakdown. Considerable noise is generated by the randomness of the electron or hole multiplication process. These processes are well understood (Ambrozy, 1982; Buckingham, 1983). The detailed character of the breakdown can be influenced by defects in the breakdown region which perhaps decrease the local breakdown field or increase the ionisation probability. Microplasmas have been considered as a source of burst noise, but although the basic noise pattern that they produce has similar properties, these random local breakdown phenomena produce an effect which is larger and much more voltage-dependent than burst noise. Also, light is frequently observed to be emitted from microplasmas (Ambrozy, 1982; Buckingham, 1983). A detailed study has been made of the effects in avalanche reference diodes (Kendall and Tadros, 1975).

2. Local Inhomogeneity
If a homogeneous sample has a source of volume or surface excess noise, each sub-element acts as a noise source with a magnitude depending on the current density at that point. If there are regions of high current density, these increase the noise by an amount larger than the corresponding increase in resistance. Specimens with nonuniform current flow are therefore likely to show
extra noise. This current crowding effect has been demonstrated in thick-film resistors: the excess noise is observed to increase if the geometry deviates from a rectangle, and also if the edges are not sharply defined. The geometry used, and the cutting techniques employed in trimming the resistance to the desired value, are therefore important (Maloberti et al., 1975; Chen and Rhee, 1977). There is also a significant effect of contact noise at the junction between the resistor and the metal interconnect.

The detailed geometry of metal-metal or metal-semiconductor contacts can have a significant effect on the contact noise. If the contact can be considered as an array of many parallel point contacts, then the noise and the resistance both vary with the number and size of the individual contacts. Again, any current crowding increases the noise (Ortmans and Vandamme, 1976; Vandamme and Tijburg, 1976). In semiconductors, any local current crowding results in increased local power dissipation and the creation of hot spots, which may have a large effect because of the large temperature variation of semiconductor properties. This type of effect on the noise has been reported for large-area power bipolar transistors (Shacter et al., 1978).
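The assertion that current crowding raises the noise faster than the resistance can be made quantitative. A minimal sketch, assuming independent local 1/f sources of Hooge type (a standard weighting argument, not a result from the specific papers cited):

    \frac{S_R}{R^2} = \frac{\alpha}{f\,n}\,\frac{\int j^4\,dV}{\left(\int j^2\,dV\right)^2},

where α is the Hooge parameter and n the carrier density. For a uniform current density in a volume Ω this reduces to the familiar α/(f n Ω) = α/(fN); because the numerator weights j⁴ while the resistance weights only j², forcing the same current through a constriction raises the relative noise much faster than the resistance.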
III. QUALITY AND RELIABILITY
In the previous sections the evidence for the specific mechanisms which generate excess noise in particular systems has been given. This information is needed if the source of some observed excess noise is to be identified. It should be emphasised that some noise mechanisms are specific to particular systems, and other systems may be dominated by a very different noise mechanism.

It is apparent that many noise sources derive from defects and imperfections in the devices, and these are the same types of problem which cause a reduction in quality or reliability in a device. It is therefore possible that noise is caused by the same types of defect that also reduce reliability. Although there is less direct evidence, it seems likely that the known problems in semiconductor device fabrication which reduce reliability are also likely to produce excess noise. For example, inadequate annealing results in point defects and atomic motion; poor metal adhesion to the substrate, peeling, and cracking are all likely to produce major crystalline damage and breaks, with increased carrier scattering or local current crowding; poor registration also increases the local current density and hence the excess noise; and excess humidity may result in chemical reactions
at the surface, producing both changes in the specimen dimensions and fluctuations in the surface scattering and recombination parameters. In general a lack of reliability implies a change with time and a lack of equilibrium, both of which are general sources of excess noise.

With the general principle that noise is caused by defects, a noise test can be used as an extension of the basic functional or specification tests on devices. These tests ensure that all production devices satisfy the manufacturer's specifications. Such tests of leakage current, gain, or response time can also be used as statistical process control parameters, as measures of process quality and indicators of yield problems. Although it is very sensitive to quality problems, noise is not a very convenient test: it is inherently slow, because the noise is large at low frequencies, and hence a measurement at sub-audio frequencies is usually needed. For specific mechanisms it is often possible to find a quicker test of some other parameter, but this loses the convenience, generality, and power of noise tests, which show up many different problems. For high-reliability applications, noise is a very convenient nondestructive screening quantity, allowing the best devices from a batch to be selected. This does not ensure long life, but is probably a good principle.

The conventional reliability or life tests are destructive, are statistical, and are made by sampling each batch. They are therefore very slow and not very accurate. Normally samples of each batch are selected and subjected to various degrees of accelerated aging, usually an elevated-temperature test. A thermally activated degradation process is assumed, and the results of the life tests at different temperatures are extrapolated to the normal operating temperature to find an estimated life during normal operation; a worked example of this extrapolation is sketched below. This procedure assumes that the same degradation mechanism operates at the operating temperature as at the test temperature. In this section we discuss those experiments where the noise has been shown to correlate with the more conventional accelerated life tests, and also other evidence where the noise has been related to other tests of quality or reliability (Gupta, 1975).
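As a minimal numerical sketch of such an extrapolation (the activation energy and temperatures are illustrative assumptions, not values from the text), the Arrhenius acceleration factor between a stress temperature T_2 and an operating temperature T_1 is

    AF = \frac{\mathrm{MTTF}(T_1)}{\mathrm{MTTF}(T_2)} = \exp\!\left[\frac{E_A}{k}\left(\frac{1}{T_1} - \frac{1}{T_2}\right)\right].

With E_A = 0.7 eV, T_1 = 328 K (55 °C), and T_2 = 423 K (150 °C), E_A/k ≈ 8.1 × 10³ K and 1/T_1 − 1/T_2 ≈ 6.8 × 10⁻⁴ K⁻¹, so AF ≈ 260: a 1000-hour stress test then represents roughly 2.6 × 10⁵ hours, about 30 years, of normal operation, provided the same thermally activated mechanism operates at both temperatures.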
A. Passive Components
The quality of passive components is important because of the number of them which occur in circuits. Fortunately, there are many different materials and techniques for making these components, so that high-quality methods can normally be found, and noisy components are only encountered if the particular device is defective, or if there is some unusual requirement, such as operation under extreme conditions, so that the choice of
material and construction is limited. For example, very high-value resistors are often very noisy.

For resistors, one common method of assessing the quality of production is the measurement of the third harmonic distortion. If a sine-wave current is passed through the resistor, a voltage at the third harmonic is sometimes generated, even though the resistance is accurately ohmic for large signals. This is apparently due to the multitude of non-ohmic boundaries within the granular or polycrystalline material. There is good evidence that a large third harmonic distortion correlates with an accelerated failure rate (Kasukabe and Tanaka, 1981). It is likely that larger microscopic boundaries or macroscopic cracks are involved. A good correlation between the third harmonic index and the excess noise has also been found (Bristow et al., 1971; Jones and Xu, 1993), so the noise is probably also a good and sensitive detector of poor-quality devices. Since most practical resistors have only two terminals, the metal contact to the resistor material must be of high quality and must not generate noise; it is also a possible source of nonlinearity. In continuous metal films, the connection between the noise and the nonlinearity has been made through vacancies (Zhigalskii, 1991).

The stability of the value of the resistance is an important criterion of quality. A drift in value is an indication that the device is not in equilibrium, and a change towards equilibrium may be such as to cause degradation and failure. Changes under high current-density stress in thin metal films will be considered in the next section, on electromigration in integrated circuit interconnects. Measurements on polycrystalline silicon thin films (Jones and Mzunzu, 1989) have shown that the excess noise decreases with time, and the decrease in noise is a more sensitive indicator than the change in resistance. In such specimens differences were found after temperature cycling or annealing, and samples which were bonded to a substrate with a different expansion coefficient were found to suffer greater changes, so again the stress on grain boundaries and the formation or extension of cracks may be involved. A correlation between the resistance change during burn-in and the excess noise before the stress has been seen for nickel-chrome thin-film resistors (Pinet et al., 1987).

The adhesion of thin-film resistors was studied in detail during the period when it was thought that a significant source of excess noise was resistance fluctuations derived from temperature fluctuations. This is now considered to be a likely mechanism in only a few special cases. However, although the mechanism is not understood, the noise has been found to be higher in thin-film resistors if the adhesion to the substrate is poor (Bibeau et al., 1978; Fleetwood and Giordano, 1982), and in other devices if the contact to the surrounding thermal environment is poor (Jones, 1975).
The effect has already been mentioned of the geometry, and of the shape and quality of any cut made to trim the resistance value of thick-film resistors (Chen and Rhee, 1977; Maloberti et al., 1975). The greater sensitivity to quality of the noise changes, compared with the resistance changes, has also been demonstrated in thick-film resistors which have suffered damage to the individual intergrain boundaries because of high-voltage overstress (Stevens et al., 1976).

The poor electrical properties of mechanical contacts are a classic source of excess noise. This can be ascribed to current-crowding effects, if the contact is made only at a few spots (Ortmans and Vandamme, 1976; Vandamme and Tijburg, 1976), or to the resistance of any foreign material in the gap between the two good conductors, which can perhaps be considered like a grain boundary. Temporary contacts such as probes may give a large noise contribution (Yassine et al., 1991). Sliding contacts can generate significant noise, but the waveform is less well characterised than for other excess noise, although a power-law spectrum between 1/f and 1/f² has been observed, and the noise is found to increase as the velocity increases (Taniguchi et al., 1985). The noise of "dry joints", in which the solder has crystallised, is probably best understood in terms of the large noise found in granular and polycrystalline materials.

Noise measurements in capacitors have mainly been confined to the observation of the discharges occurring at the beginning of the breakdown of the dielectric. These pulses have been observed as rf noise, and the noise correlated well with the measured lifetime under accelerated stress (Misra et al., 1991). The impulses have also been studied as a function of stress voltage and found to increase rapidly at a voltage considerably below the breakdown voltage (Sikula et al., 1990a). The noise in this pre-breakdown region has been observed in some specimens to be more stationary, with a 1/f^{3/2} spectrum and a Gaussian amplitude distribution (Pender and Wintle, 1979). The 1/f noise in some capacitors has been made more measurable by stimulating the capacitors with a large ac voltage. The noise was then found to be a better measure of the lifetime and damage threshold than the measurement of the loss tangent, or than the measurement of the noise without the large stimulating voltage (Kapshin et al., 1984).

There has been little study of noise in inductors and transformers, but the motion of domain boundaries in ferromagnetic cores creates impulsive Barkhausen noise. This noise has been used as a measure of the physical damage of the core, so noise could probably be used as a measure of quality in such devices (Tiitto, 1989; Mykolaitis et al., 1991).
B. Interconnects and Electromigration

As the size of electronic devices becomes smaller and the current densities in the metal interconnects and the metal-semiconductor contacts become larger, the electromigration failure mechanism becomes more important. It is observed as an increase in the resistance and, finally, the creation of an open circuit. Related to this are short circuits between neighbouring tracks, as the material displaced from the track is relocated. The basic mechanism is the motion of atoms, vacancies, or voids under the influence of the momentum of the current flow at high current densities. To create a net change in the track configuration, there has to be a net divergence or convergence of the atom flux at some structural feature. The process produces rapid failure, since any resulting constriction produces a larger current density, power dissipation, and temperature rise, and hence enhanced electromigration in this thermally activated process. The most likely process is atomic diffusion along grain boundaries, but the role of surface diffusion and internal stress is not certain.

With changes in the metal film geometry occurring during the progress of the failure, resistance changes are observed, and these become larger towards total failure. It is clearly found that the resistance fluctuations associated with this resistance change provide a much more sensitive indicator that the process is taking place (Koch et al., 1985b; Vossen, 1973). Although the basic value of noise measurements has been demonstrated, the study of this, as well as of other aspects of electromigration, is made difficult by the large number of experimental variables and the sensitivity of the process to the exact specimen parameters. For example, the metal, alloy composition, degree of impurity segregation, deposition method and temperature, substrate material, annealing temperature and time, crystal size, and presence of a passivation overlayer all probably have an effect. It is therefore difficult to identify definite patterns. The noise experiments are not easy, since the noise in metals is small.

The basic observations are that excess noise is a good and sensitive indicator of the quality of metal interconnects. Noise is observed with both 1/f and 1/f² spectra, with the latter more common at high current density j, temperature T, and degree of damage (Cottle and Chen, 1987). It should be noted that in such experiments there is the possibility of instrumental and other effects, such as temperature instabilities, becoming significant.

The basic model for electromigration gives an atom flux density, J, produced by the electric current density, j, acting on the diffusion process:

    J = \frac{N \rho j\, e Z^{*} D_0}{kT} \exp(-E_A/kT), \qquad (21)
where N is the atom density, ρ is the resistivity, Z* is the effective atomic charge, and D₀ and E_A are the constants of the thermally activated diffusion. In practice, a phenomenological equation is used for the total effect of the electromigration, including the microscopic formation of constrictions with the resulting local failure. This is given by the mean time to failure:
    \mathrm{MTTF} = A\, j^{-n} \exp(E_A/kT), \qquad (22)
where A is a constant and E_A is the appropriate activation energy, which is not necessarily the same as that in the theoretical equation (21). Because the actual failure is catastrophic and occurs under extreme conditions in the constriction, n is found to be large and variable, between 2 and 8. By assuming that the excess noise intensity, S_R, with any frequency dependence, is proportional to J, we find from Eq. (21) that

$$S_R \propto \frac{j}{T} \exp\!\left(-\frac{E_A}{kT}\right), \qquad (23)$$
which provides an expression for determining the activation energy from the experimental data (Neri et al., 1987).

The 1/f² spectrum which is observed has been ascribed to a g-r process with a very long characteristic time caused by the transit time of a vacancy between two sites (Neri et al., 1987). This has been generalised to include a spread of transit times, and therefore a possibility for the frequency exponent to vary between 1 and 2 (Yang and Celik-Butler, 1991). This model predicts a current exponent n for the noise intensity near 3. Another model of noise due to vacancy motion ascribes the 1/f² noise to the thermal fluctuations of the vacancy flux (Chen et al., 1990) and also obtains a j³ current dependence. It is also pointed out that a linear variation in resistance with time produces a 1/f² spectrum, and if this drift is produced by a diffusion process, then the activation energy found from a plot of the noise intensity corresponds to twice the diffusion activation energy and gives a j⁴ dependence (Chen et al., 1990; Liou et al., 1990).

Activation energies have been determined from noise measurements using two techniques: from the variation of the excess noise intensity with temperature, using Eq. (23) (Neri et al., 1989; Diligenti et al., 1989; Celik-Butler et al., 1991) or sometimes more directly with S_R ∝ exp(−E_A/kT) (Cottle and Chen, 1987); or by using the method of Dutta and Horn (1981), in which the activation energy of any structure in the spectrum is measured (Koch et al., 1985b; Rodbell and Koch, 1991). In general, the values obtained are similar to those obtained from elevated-temperature life tests. The current density dependence of the 1/f² noise in Al shows a j³ dependence (Cottle and Chen, 1987), while the 1/f noise shows a distinct threshold at a current density of nearly 1 × 10⁶ A cm⁻², which is normally taken as the
electromigration threshold, and then a j⁴ variation (Celik-Butler et al., 1991).

The direct correlation between the noise intensity and the lifetime (MTTF) under normal accelerated stress has been investigated widely, and generally a strong correlation is observed. For Al-Si specimens, the 1/f noise at 2 × 10⁶ A cm⁻² at 150 °C correlated with a life test at 1 × 10⁶ A cm⁻² at the same temperature (Celik-Butler et al., 1991). For Al-Cu, the 1/f noise at 20 °C and 1 × 10⁶ A cm⁻² correlated with lifetimes measured at 2.5 × 10⁶ A cm⁻² at 257-300 °C (Schwarz et al., 1991). Correlations of 1/f² noise measured at high currents, 2 × 10⁶ A cm⁻², and temperatures, 232 °C, have been found with lifetimes measured at lower stress, 1.25 × 10⁶ A cm⁻² at 163 °C (Sun et al., 1990; Cottle and Klonaris, 1990). Since conventional life tests on interconnects take a long time, there is considerable interest in rapid tests which can be performed on a wafer. By passing a high current through a test specimen, the temperature is raised to enhance the electromigration, and the noise is also larger (Diligenti et al., 1989; Komori et al., 1991). As with all such tests, the specimen is probably driven into extreme conditions, but correlations are usually found.

Although aluminium and its alloys are of most commercial importance, other elements have been studied, including copper (Rodbell and Koch, 1991), indium (Neri et al., 1989), and gold (Rodbell et al., 1987; Neri et al., 1989; Verbruggen et al., 1987). The gold results emphasise directly the importance of diffusion along grain boundaries, which can be reduced by the addition of hydrogen (Rodbell et al., 1987). The study of a single grain boundary also demonstrates that noise and electromigration depend strongly on such defects (Verbruggen et al., 1987). Microstructure effects have also been studied in Al-based systems (Cottle et al., 1990; Schwarz et al., 1991). More complex metal layer structures are being developed for interconnects, and Al/TiW layers have also been shown to allow correlation between noise and reliability (Cottle et al., 1990).

Care should be taken in interpreting the noise data, since the noise is often not stationary, and changes can take place during the fairly long time taken for sensitive measurements. Pulse or step variations in the resistance can be included in the total noise spectrum (Diligenti et al., 1988). It has also been observed that the noise after a stress differs considerably according to whether the measurement current is passed in the same or the opposite direction as the stress current (Liou et al., 1991a). During the later stages of electromigration, the constrictions produce the increase in resistance, the areas of high current density, and hence the excess noise. A measurement of the increase in temperature at these constrictions, through a measurement of the increase in the white, thermal noise (Massiha et al., 1989), demonstrates the versatility of noise techniques.
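The Arrhenius analysis behind Eq. (23) is simple enough to automate. The sketch below is illustrative only: it is not from the original text, the measured values in it are invented, and it assumes noise intensities S_R recorded at several temperatures under a fixed current density. It fits ln(S_R T) against 1/T, whose slope is −E_A/k:

```python
# Hedged sketch: extracting an electromigration activation energy E_A from
# excess-noise intensities S_R(T) measured at constant current density,
# using the proportionality of Eq. (23): S_R ~ (j/T) exp(-E_A / kT).
# All data values below are invented for illustration only.
import numpy as np

k_B = 8.617e-5  # Boltzmann constant, eV/K

T = np.array([400.0, 425.0, 450.0, 475.0, 500.0])              # temperature, K
S_R = np.array([1.2e-16, 4.1e-16, 1.3e-15, 3.6e-15, 9.5e-15])  # noise intensity, arb. units

# At fixed j, Eq. (23) gives ln(S_R * T) = const - E_A / (k_B * T),
# so a straight-line fit of ln(S_R * T) versus 1/T has slope -E_A / k_B.
x = 1.0 / T
y = np.log(S_R * T)
slope, intercept = np.polyfit(x, y, 1)
E_A = -slope * k_B
print(f"Estimated activation energy: {E_A:.2f} eV")  # ~0.8 eV for this fake data
```

Fitting ln S_R instead of ln(S_R T) corresponds to the simpler direct form S_R ∝ exp(−E_A/kT) used by Cottle and Chen (1987).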
C. Bipolar Devices

Bipolar devices, transistors and diodes of various types, are very common, so methods of assuring high quality and reliability are essential. Noise investigations have been quite extensive. The investigations into light-emitting diodes and lasers will be considered later, in Section III.F.

The source of the excess noise in p-n diodes and junction transistors has been shown to be in the excess or non-ideal current in a diode or base-emitter junction of a transistor (Sikula et al., 1984; Green and Jones, 1985a; Knott, 1986). If the total current is

$$I_T = I'_N + I_I = I'_{N0} \exp\!\left(\frac{eV_{BE}}{mkT}\right) + I_{I0} \exp\!\left(\frac{eV_{BE}}{kT}\right),$$

then I_I is the ideal current and I'_N is the non-ideal component due to generation-recombination processes in the depleted junction region. I'_N0, I_I0, and m are constants for the diode. The non-ideality factor, m, takes various values up to about four but is only measurable if it is greater than unity. The standard model, with a uniform distribution of g-r centres in space and energy, predicts a value of two. The high values observed probably originate from surface recombination. The excess, 1/f, noise can be described by a strength K', where
A is the device area and y is near unity. In some devices the excess current process may generate g-r noise rather than 1/f noise, and in a p-i-n photodiode three excess current and excess noise components have been linked to processes occurring in different parts of the structure (Kozlowski and Jones, 1990). The location of the excess current and its noise has been investigated by considering the correlation of the noise, expressed by K', with various other ideal and non-ideal quantities of a transistor, such as gain and leakage currents, but little significant effect was found (Green and Jones, 1985b).

A specific and large source of noise occurs when the bipolar transistor develops a surface leakage path in parallel with a junction if the isolation or guarding is insufficient (Jones and Truscott, 1987). The leakage path acts as a noisy MOS device.

The use of excess noise as a criterion for selecting "good" and "bad" devices for high-reliability applications has been described for p-n diodes, GaAs FETs, Zener diodes (Savelli et al., 1984), and varactors (Karba and Ul'man, 1977). For bipolar transistors, more direct measurements have been made of the excess noise at the beginning of an accelerated life test and of the subsequent failure rate.
In general, good correlation has been found (van der Ziel and Tong, 1966; Hoffmann et al., 1976; Grzybowski et al., 1987; Konczakowska, 1987; Hasse et al., 1990; Konczakowska and Gladysz, 1990; Zhuang and Sun, 1990). In a more detailed analysis of the noise, it has been found that both the 1/f noise and any g-r noise component can be used as a reliability estimator, since they will probably identify different failure mechanisms (Dai, 1991).

Particular aspects of quality and reliability have been investigated for various devices. The noise signature of avalanche breakdown under reverse bias has been linked with degradation (Kim and Misra, 1969), and "hot spots" have been identified in solar cells (Vandamme et al., 1983) and power bipolar transistors (Shacter et al., 1978).
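To make the two-component current decomposition above concrete, here is a minimal fitting sketch. It is not taken from the chapter: the bias range, parameter values, and starting guesses are all invented for illustration, and a real extraction would need care with series resistance and temperature control. The fit separates the ideal and non-ideal exponentials and returns the non-ideality factor m:

```python
# Hedged sketch: separating the ideal and non-ideal diode current components,
#   I_T = I_N0' * exp(e V / m k T) + I_I0 * exp(e V / k T),
# and estimating the non-ideality factor m from I-V data.
# All numerical values are invented for illustration.
import numpy as np
from scipy.optimize import curve_fit

kT_over_e = 0.02585  # thermal voltage at 300 K, volts

def diode_current(V, I_N0, m, I_I0):
    """Two-exponential model: non-ideal (g-r) term plus ideal diffusion term."""
    return I_N0 * np.exp(V / (m * kT_over_e)) + I_I0 * np.exp(V / kT_over_e)

# Synthetic "measured" data generated from m = 2 with a little noise.
V = np.linspace(0.15, 0.55, 30)
rng = np.random.default_rng(0)
I_meas = diode_current(V, 1e-9, 2.0, 1e-13) * (1 + 0.02 * rng.standard_normal(V.size))

# Fit in log space so both current decades carry equal weight.
popt, _ = curve_fit(
    lambda V, lnIN0, m, lnII0: np.log(diode_current(V, np.exp(lnIN0), m, np.exp(lnII0))),
    V, np.log(I_meas), p0=(np.log(1e-10), 1.5, np.log(1e-14)),
)
print(f"Fitted non-ideality factor m = {popt[1]:.2f}")  # ~2.0 for this fake data
```

Since the non-ideal term dominates only at low bias, m is resolvable from such a fit only when it is appreciably greater than unity, in line with the remark above.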
D. Field-Effect Transistors
Surface leakage channels between the gate and source or drain can generate excess noise in JFETs (Stocker and Jones, 1985). Similar effects have been found in GaAs MESFETs (Peransin et al., 1989).

The excess noise in MOSFETs can be very high because of the trapping effects of states near the silicon-silicon oxide interface. The noise has been shown to be some measure of the trap density, but a detailed analysis, with strong evidence in favour of one model, is still lacking. However, the noise can be used to monitor the increase in the trap density due to damage by radiation or hot electrons. These measurements are generally successful. In a more detailed study it has been shown that the pre-irradiation noise correlates well with that part of the post-irradiation shift of the threshold voltage due to slow oxide charges, but not with that part of the shift due to fast interface traps (Scofield et al., 1989; Scofield and Fleetwood, 1991). This is unexpected, since the noise is normally taken to be proportional to the interface state density, even though models of the noise involve tunnelling into oxide states. The investigations into the noise of single traps have also revealed that the situation is more complicated than these simple models suggest. An interesting series of experiments on MOS capacitors with no applied voltage has shown a noise contribution due to oxide traps (Zhigal'skii and Fedorov, 1991).
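Although the chapter stresses that no single model is firmly established, a McWhorter-type (number-fluctuation) estimate is often used in the literature to convert a measured MOSFET 1/f noise level into an effective oxide-trap density. The sketch below is such a conventional estimate, not a result from this text, and every numerical value in it is an invented assumption:

```python
# Hedged sketch: a McWhorter-type (number-fluctuation) estimate of the
# effective oxide-trap density N_t from a measured gate-referred 1/f noise
# level S_Vg. This is one conventional model, not the chapter's; all device
# numbers below are invented for illustration.
q = 1.602e-19            # electron charge, C
kT_eV = 0.02585          # thermal energy at 300 K, eV
eps_ox = 3.45e-13        # SiO2 permittivity, F/cm
t_ox = 20e-7             # oxide thickness, cm (20 nm)
W, L = 10e-4, 2e-4       # channel width and length, cm (10 um x 2 um)
gamma = 1e8              # tunnelling attenuation parameter, 1/cm (typical value)

C_ox = eps_ox / t_ox     # oxide capacitance per unit area, F/cm^2

f = 10.0                 # measurement frequency, Hz
S_Vg = 1e-11             # measured input-referred noise, V^2/Hz (invented)

# McWhorter form: S_Vg(f) = q^2 kT N_t / (W L C_ox^2 gamma f),
# with N_t in cm^-3 eV^-1 and kT in eV.
N_t = S_Vg * W * L * C_ox**2 * gamma * f / (q**2 * kT_eV)
print(f"Effective oxide trap density: {N_t:.1e} cm^-3 eV^-1")  # ~9e16 here
```

A value near 10¹⁶-10¹⁸ cm⁻³ eV⁻¹ is typical of such estimates; the point made in the text stands, however, that the conversion is model-dependent.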
E. Integrated Circuits

Integrated circuits have considerable complexity, so that there are many possible locations and mechanisms for loss of quality and reliability. With
such a large circuit with few external connections, it is not easy to detect a weak element among many interconnected good components. However, several tests, including noise, have been used to assess the quality in order to allow selection for high-reliability applications. The phase noise, or jitter, between the input and output transitions of a CMOS digital IC has been shown to correlate with the damage done to a circuit by electrostatic discharge stress (Roder, 1979). In a very detailed investigation, a series of tests was performed on CMOS ICs at intervals during an elevated-temperature life test (Dorey et al., 1990). Several tests were found to be very sensitive and to correlate with the gradual deterioration in the performance. In particular, two noise tests found that the excess noise was a good measure of the deterioration. The noises measured were the noise in the leakage current between the power-supply terminals with no signal passing, and the dynamic noise, which is the noise in, or amplitude modulation of, the current pulses flowing through a CMOS circuit during a signal logic transition (Jones and Xu, 1991).
F. Optoelectronic Sources
The excess noise in GaP light-emitting diodes (LEDs) has been linked to temperature fluctuations within the junction (Lukyanchikova et al., 1972a). In GaAs LEDs, the same group found that the noise is a fluctuation in the excess or nonradiative current (Lukyanchikova et al., 1972b), which links strongly with the basic results for bipolar devices mentioned earlier. For GaAsP LEDs, it has been found that the excess electrical noise of the device increases with degradation. Both the forward- and the reverse-bias noise are large in poor-quality devices, and the noise before stress is a good measure of the device lifetime (Sikula et al., 1990b).

Diode lasers have been extensively studied. As with the LEDs, the bias dependence of the electrical and optical noises and their correlation have been found to be a valuable source of information. The noise below the lasing threshold has been found to increase with degradation during a life test and after electrostatic discharge stress. The noise has also been found to correlate with other quality selection criteria (Vandamme and van Ruyven, 1983). The model assumes that the excess noise is in the nonradiative component of the current. Similar results have been found by other investigations (Lukyanchikova et al., 1973; Gelikonov et al., 1988; Garmash et al., 1989).
IV. CONCLUSIONS
There is now convincing evidence that the excess noise in many, if not most, electronic devices is a sensitive measure of the quality of the device. This measure of quality can be linked to reliability, since premature failure is likely if the manufacture is not ideal, so that there is a defect, blemish, or impurity which will lead to enhanced degradation.

Although many noise sources can be identified with changes in specific dc or ac characterisation parameters, noise has the advantage that the change in the noise level with the introduction or enlargement of a defect is much greater than the change in the dc parameter. Also, noise arises from many possible defects, so that monitoring this single quantity could detect the presence of one or more of many different defects. Naturally, the general nature of the noise implies that a detailed investigation of the bias, temperature, or spectral dependence of the noise will be needed to discover what particular process is causing the noise. It is therefore necessary to be careful in interpreting noise results, to ensure that a correct mechanism is assigned to the noise, and even greater care is necessary in comparing noise results between different specimens, since the origins and characteristics of the noises may be completely different.

Noise has the inherent problem that it is usually a low-frequency phenomenon, which implies that the measurement will take a long time. Also, the measurement techniques are usually not simple. If a specific mechanism is being studied, it is often easier to measure some dc or ac parameter linked directly to the noise than to measure the noise itself. Now that many of the detailed mechanisms for the production of the excess noise are becoming understood, the use of noise as an analysis quantity will increase.
REFERENCES

Abdala, M. A., and Jones, B. K. (1992). "Correlation between trap characteristics by low-frequency noise, mutual conductance dispersion, oscillations and DLTS in GaAs MESFETs," Solid-State Electron. 35, 1715-1719.
Ambrozy, A. (1982). Electronic Noise. Akademiai Kiado, Budapest.
Ambrozy, A., Gottwald, P., and Szentpali, B. (1991). "Surface effects on the low frequency noise of thin GaAs layers," in Noise in Physical Systems and 1/f Fluctuations (T. Musha, S. Sato and M. Yamamoto, eds.), pp. 23-26. Ohmsha Limited, Tokyo.
Andrushko, A. F., Bakshi, I. S., and Zhigalskii, G. P. (1981). "Effects of structural factors on the 1/f noise of aluminium films," Radiophys. and Quantum Electron. 24, 343-346.
Bakhtizin, R. Z., and Gots, S. S. (1981). "Burst noise in the field emission devices," Radio Eng. and Electron. Phys. (USA) 26, 101-106.
Bakshee, I. S., Potemkin, V. V., Salkov, E. A., and Khizhnyak, B. I. (1990). "1/f noise in Al: The existence of surface noise," in Noise in Physical Systems (A. Ambrozy, ed.), pp. 389-392. Akademiai Kiado, Budapest.
Beben, J. (1990). "Time cross-correlation functions of density fluctuations for potassium adlayer on the (112) W plane," in Noise in Physical Systems (A. Ambrozy, ed.), pp. 57-60. Akademiai Kiado, Budapest.
Bertotti, G., and Fiorillo, F. (1983). "A current noise investigation on stress relaxation mechanisms in thin metal films," in Noise in Physical Systems and 1/f Noise (M. Savelli, G. Lecoy and J.-P. Nougier, eds.), pp. 339-342. Elsevier Science.
Bertotti, G., Celasco, M., Fiorillo, F., and Mazzetti, P. (1978). "Study of dislocation dynamics in metals through current noise measurements," Scripta Metall. 12, 943-948.
Bertotti, G., Celasco, M., Fiorillo, F., and Mazzetti, P. (1979). "Application of the current noise technique to the investigation of dislocations in metals during plastic deformation," J. Appl. Phys. 50, 6948-6955.
Bibeau, W. E., Porter, W. A., and Parker, D. L. (1978). "Correlation of thin film adhesion with current noise measurements of Ta2N-Cr-Au resistors on sapphire and alumina substrates," Proc. 28th Electr. Comp. Conf., pp. 427-432.
Biernat, T., and Kleint, Ch. (1990). "Coverage dependence of field emission flicker noise due to lithium adsorbed on the W(112) surface," Appl. Phys. A50, 95-100.
Blasquez, G. (1978). "Excess noise sources due to defects in forward biased junctions," Solid-State Electron. 21, 1425-1430.
Blasquez, G., and Roux-Nogatchewsky, M. (1980). "Effets d'un rayonnement ionisant sur les mecanismes de conduction et de bruit de fond basse frequence des transistors bipolaires," Revue Phys. Appl. 15, 1599-1605.
Bristow, C. W. H., Clough, W. L., and Kirby, P. L. (1971). "The current noise and non-linearity of thick film resistors," Proc. Int. Microelectron. Symp., pp. 7.6.1-13.
Brodie, I. (1989). "The significance of fluctuation phenomena in vacuum microelectronics," Tech. Digest Int. Electron Devices Meeting 1989, IEEE, CH 2637, 521-524.
Buckingham, M. J. (1983). Noise in Electronic Devices and Systems. Ellis Horwood, Chichester, UK.
Campbell, P. M., Snow, E. S., Moore, W. J., Glembocki, O. J., and Kirchoefer, S. W. (1991). "Light-activated telegraph noise in AlGaAs tunnel barriers, optical probing of a single defect," Phys. Rev. Lett. 67, 1130-1133.
Celasco, M., Fiorillo, F., and Mazzetti, P. (1976). "Thermal-equilibrium properties of vacancies in metals through current-noise measurements," Phys. Rev. Lett. 36, 38-41.
Celasco, M., Masoero, A., Mazzetti, P., and Stepanescu, A. (1978a). "Electrical conduction and current noise mechanism in discontinuous metal films," Phys. Rev. B17, 2553-2563.
Celasco, M., Masoero, A., Mazzetti, P., and Stepanescu, A. (1978b). "Electrical conduction and current noise mechanism in discontinuous metal films II: Experimental," Phys. Rev. B17, 2564-2574.
Celasco, M., Masoero, A., and Stepanescu, A. (1980). "High temperature behaviour of electrical noise in discontinuous metal films," Thin Solid Films 66, 111-118.
Celik-Butler, Z., Yang, W., Hoang, H. H., and Hunter, W. R. (1991). "Characterization of electromigration parameters in VLSI metallizations by 1/f noise measurements," Solid-State Electron. 34, 185-188.
Chen, C. C., and Chou, Y. C. (1985). "Electrical-conductivity fluctuations near the percolation threshold," Phys. Rev. Lett. 54, 2529-2532.
Chen, T. M., and Cottle, J. G. (1986). "Physical model of burst noise in thick-film resistors," Solid-State Electron. 29, 865-872.
Chen, T. M., and Rhee, J. G. (1977). "The effects of trimming on the current noise of thick film resistors," Solid-State Technol., June, 49-53.
Chen, T. M., Su, S. F., and Smith, D. (1982). "1/f noise in Ru-based thick-film resistors," Solid-State Electron. 25, 821-827.
Chen, T. M., Fang, P., and Cottle, J. G. (1990). "Electromigration and 1/f^n noise in Al-based thin films," in Noise in Physical Systems (A. Ambrozy, ed.), pp. 515-518. Akademiai Kiado, Budapest.
Cho, N.-I., and Bene, R. W. (1989). "Investigation of excess noise in ultra-thin metal films deposited onto single-crystal silicon," J. Appl. Phys. 66, 2037-2044.
Chovet, A., and Rahal, S. (1979). "Bruit de conduction par l'interface semiconducteur-electrolyte liquide," C.R. Acad. Sc. Paris 289, 65-68.
Churchill, M. J., and Lauritzen, P. O. (1971). "Carrier density fluctuation noise in silicon junction field-effect transistors at low temperatures," Solid-State Electron. 14, 485-493.
Cobden, D. H., Uren, M. J., and Kirton, M. J. (1990). "Entropy measurement on slow Si/SiO2 interface states," Appl. Phys. Lett. 56, 1245-1247.
Conti, M., and Corda, G. (1973). "Identification and characterisation of excess noise sources in ICs by correlation analysis," Tech. Digest Int. Electron Devices Meeting, pp. 248-250.
Cook, K. B., and Brodersen, A. J. (1971). "Physical origins of burst noise in transistors," Solid-State Electron. 14, 1237-1250.
Copeland, J. A. (1971). "Semiconductor impurity analysis from low-frequency noise analysis," IEEE Trans. on Electron Devices ED-18, 50-53.
Cottle, J. G., and Chen, T. M. (1987). "Excess noise and its relationship to the activation energies associated with electromigration in thin Al and Al-Si films," 4th VLSI Multilevel Interconnection Conf. 1987, EEA 67171, 449-455.
Cottle, J. G., and Klonaris, N. S. (1990). "Microstructural effects on the 1/f^n noise of thin aluminium based films," J. Electron. Mat. 19, 1201-1206.
Cottle, J. G., Klonaris, N. S., and Bordelon, M. (1990). "1/f^n noise and fabrication variations of TiW/Al VLSI interconnections," IEEE Electron. Device Lett. 11, 523-525.
Dai, Y. (1991). "Optimal low-frequency noise criteria used as a reliability test for BJTs and experimental results," Microelectron. Reliab. 31, 75-78.
Diligenti, A., Bagnoli, P. E., Neri, B., and Specchiulli, G. (1988). "Evaluation of electromigration activation energy by means of noise measurements and MTF tests," in Solid State Devices (G. Soncini and P. U. Calzolari, eds.), pp. 365-368. Elsevier Science.
Diligenti, A., Bagnoli, P. E., Neri, B., Beri, S., and Mantellassi, L. (1989). "A study of electromigration in aluminium and aluminium-silicon thin film resistors using noise techniques," Solid-State Electron. 32, 11-16.
Dorey, A. P., Jones, B. K., Richardson, A. M. D., and Xu, Y. Z. (1990). Rapid Reliability Assessment of VLSICs. Plenum Publishing Co., New York.
Dutta, P., and Horn, P. M. (1981). "Low frequency fluctuations in solids: 1/f noise," Rev. Mod. Phys. 53, 497-516.
D'yakonova, N. V., Levinshtein, M. E., and Rumyantsev, S. L. (1991). "Temperature dependence of the low frequency noise in structurally perfect GaAs and after destructive compression," Sov. Phys. Semicond. 25, 217-218.
Fang, P., Hung, K. K., Ko, P. K., and Hu, C. (1991). "Hot-electron-induced traps studied through the random telegraph noise," IEEE Electron. Dev. Lett. 12, 273-275.
Farmer, K. R., and Buhrman, R. A. (1989). "Defect dynamics and wear-out in thin silicon oxides," Semicond. Sci. Technol. 4, 1084-1105.
Feng, S. (1991). "Conductance fluctuations and 1/f noise magnitudes in small disordered structures: Theory," in Mesoscopic Phenomena in Solids (Altshuler, B. L., Lee, P. A., and Webb, R. A., eds.). Elsevier Science.
Fleetwood, D. M., and Giordano, N. (1982). "Experimental study of excess low-frequency noise in tin," Phys. Rev. B25, 1427-1430.
Fleetwood, D. M., and Giordano, N. (1983a). "1/f noise in metal films: Resistivity dependence and sample-to-sample variation," in Noise in Physical Systems (M. Savelli, G. Lecoy and J.-P. Nougier, eds.), pp. 201-204. Elsevier Science.
Fleetwood, D. M., and Giordano, N. (1983b). "Effect of strain on the 1/f noise of metal films," Phys. Rev. B28, 3625-3627.
Fleetwood, D. M., and Giordano, N. (1985). "Direct link between 1/f noise and defects in metal films," Phys. Rev. B31, 1157-1160.
Fleetwood, D. M., and Scofield, J. H. (1990). "Evidence that similar point defects cause 1/f noise and radiation-induced-hole trapping in MOS transistors," Phys. Rev. Lett. 64, 579-582.
Fleetwood, D. M., Postel, T., and Giordano, N. (1984). "Temperature dependence of the 1/f noise of carbon resistors," J. Appl. Phys. 56, 3256-3260.
Fleetwood, D. M., Beutler, D. E., Masden, J. T., and Giordano, N. (1987). "The role of temperature in sample-to-sample comparisons of the 1/f noise of metal films," J. Appl. Phys. 61, 5308-5313.
Garmash, I. A., Zverkov, M. V., Kornilova, N. B., Morozov, V. N., Nabiev, R. F., Semenov, A. T., Sumarokov, M. A., and Shidlovskii, V. R. (1989). "Analysis of low-frequency fluctuations of the radiation power of injection lasers," J. Sov. Laser Res. (USA) 10, 459-476.
Gelikonov, V. M., Mironov, Yu. M., and Khanin, Ya. I. (1988). "Relationship between low-frequency fluctuations of the intensity of radiation and fluctuations of the voltage in semiconductor lasers," Sov. J. Quantum Electron. 18, 1252-1258.
Ghibaudo, G., Roux, O., and Brini, J. (1991). "Impact of scaling down on low frequency noise in silicon MOS transistors," in Noise in Physical Systems (T. Musha, S. Sato and M. Yamamoto, eds.). Ohmsha, Tokyo.
Giordano, N. (1989). "Defect motion and low-frequency noise in disordered metals," Rev. Solid State Sci. 3, 27-69.
Giordano, N. (1991). "Conductance fluctuation and low-frequency noise in small disordered systems: Experiment," in Mesoscopic Phenomena in Solids (Altshuler, B. L., Lee, P. A., and Webb, R. A., eds.). Elsevier Science.
Grant, A. J., White, A. M., and Day, B. (1978). "Low frequency noise and deep traps in Schottky barrier diodes," in Noise in Physical Systems (D. Wolf, ed.), pp. 175-180. Springer-Verlag, Berlin.
Green, C. T., and Jones, B. K. (1985a). "1/f noise in bipolar transistors," J. Phys. D18, 77-91.
Green, C. T., and Jones, B. K. (1985b). "Correlations between 1/f noise and dc characteristics in bipolar transistors," J. Phys. D18, 2269-2275.
Grzybowski, B., Konczakowska, A., and Spiralski, L. (1987). "1/f noise as the indicator of manufacturing process quality of bipolar transistors," in Noise in Physical Systems (C. M. van Vliet, ed.), pp. 485-488. World Scientific, Singapore.
Gupta, M. S. (1975). "Applications of electrical noise," Proc. IEEE 63, 966-1010.
Hallgren, R. B. (1990). "Low-bias noise spectroscopy of field-effect transistor channels: Depletion-region trap models and spectra," Solid-State Electron. 33, 1071-1079.
Haslett, J. W., and Kendall, E. J. M. (1972). "Temperature dependence of low-frequency excess noise in junction gate FETs," IEEE Trans. ED-19, 943-950.
Haslett, J. W., Kendall, E. J. M., and Scholz, F. J. (1975). "Design considerations for improving low-temperature noise performance of silicon JFETs," Solid-State Electron. 18, 189-207.
Hasse, L., Konczakowska, A., and Spiralski, L. (1990). "Noise measurements of power semiconductor devices for reliability evaluation," in Noise in Physical Systems (A. Ambrozy, ed.), pp. 247-250. Akademiai Kiado, Budapest.
Hayat, S. A., and Jones, B. K. (1987). "The excess noise in buried-channel MOS transistors," Semicond. Sci. Technol. 2, 732-735.
Hayat, S. A., Jones, B. K., and Russell, P. C. (1988). "1/f noise in ohmic MOS inversion layers," Semicond. Sci. Technol. 3, 919-925.
Hoffmann, K., Erb, H.-J., and Roder, H. (1976). "Ein neues Verfahren der Zuverlassigkeitsanalyse fur Halbleiter-Bauteile," Frequenz 30, 19-22.
Hofman, F., Zijlstra, R. J. J., and Henning, J. C. M. (1988). "Current noise in n-type AlxGa1-xAs," Solid-State Electron. 31, 279-282.
Hofman, F., Zijlstra, R. J. J., and Bettencourt de Freitas, J. M. (1990). "Voltage noise in an AlxGa1-xAs-GaAs heterostructure," J. Appl. Phys. 67, 2482-2487.
Hooge, F. N. (1990a). "The relation between 1/f noise and number of electrons," Physica B162, 344-352.
Hooge, F. N. (1990b). "1/f noise in semiconductors," in Noise in Physical Systems and 1/f Fluctuations (T. Musha, S. Sato and M. Yamamoto, eds.), pp. 7-14. Ohmsha Limited, Tokyo.
Hooge, F. N., Kleinpenning, T. G. M., and Vandamme, L. K. J. (1981). "Experimental studies on 1/f noise," Rep. Prog. Phys. 44, 479-532.
Hsu, S. T. (1970). "Surface state related 1/f noise in p-n junctions," Solid-State Electron. 13, 843-855.
Hsu, S. T., Whittier, R. J., and Mead, C. A. (1970). "Physical model for burst noise in semiconductor devices," Solid-State Electron. 13, 1055-1071.
Jantsch, O., and Feigt, I. (1969). "1/f noise observed on semiconductor surfaces," Phys. Rev. Lett. 23, 912-913.
Jones, B. K. (1975). "1/f and 1/Δf noise produced by a radio-frequency current in a carbon resistor," Electron. Lett. 12, 110-111.
Jones, B. K., and Katai, V. O. (1987). "The non-ideal current in bipolar transistors," Solid-State Electron. 30, 987-989.
Jones, B. K., and Mzunzu, E. S. C. (1989). "The stability of polycrystalline thin film resistors measured using excess noise," Microelectron. Reliab. 29, 543-544.
Jones, B. K., and Taylor, G. P. (1992). "Spectroscopy of surface states using the excess noise in a buried-channel transistor," Solid-State Electronics 35, 1285-1289.
Jones, B. K., and Truscott, T. (1987). "The characteristics of emitter-collector surface leakage channels in bipolar transistors," Microelectron. Reliab. 27, 923-931.
Jones, B. K., and Xu, Y. Z. (1991). "Excess noise as an indicator of digital integrated circuit reliability," Microelectron. Reliab. 31, 351-361.
Jones, B. K., and Xu, Y. Z. (1993). "Characterization of electromigration damage by multiple electrical measurements," Microelectronics and Reliability, in press.
Kandiah, K. (1990). "Improvements in low-frequency noise of MOSFETs for front end amplifiers," Nuclear Instr. and Methods in Physics Res. A288, 150-156.
Kandiah, K., and Whiting, F. B. (1978). "Low frequency noise in junction field-effect transistors," Solid-State Electron. 21, 1079-1088.
Kandiah, K., and Whiting, F. B. (1991). "Non-ideal behaviour of buried channel CCDs caused by oxide and bulk silicon traps," Nucl. Instr. and Methods in Physics Res. A305, 600-607.
Kandiah, K., Deighton, M. O., and Whiting, F. B. (1989). "A physical model for random telegraph signal currents in semiconductor devices," J. Appl. Phys. 66, 937-948.
Kapshin, Y. S., Noskin, V. A., and Yakubovich, B. I. (1984). "Artificial stimulation of an excess low-frequency noise," Sov. Tech. Phys. Lett. 10, 446-447.
Karba, L. P., and Ul'man, N. N. (1977). "The possibility of predicting the reliability of varactors on the basis of their noise characteristics," Telecom. and Radio Eng. Part 2 (USA) 31, 122-124.
Karmann, A., and Schulz, M. (1989). "Characterization of individual defects in MOSFETs," Appl. Surf. Sci. 39, 500-507.
Kasukabe, S., and Tanaka, M. (1981). "Reliability evaluation of thick film resistors through measurements of third harmonic index," Electrocomp. Sci. and Technol. 8, 167-174.
Kendall, E. J. M., and Tadros, L. B. (1975). "The temperature dependence of low-frequency noise in avalanche reference diodes," Phys. Stat. Sol. (a) 27, 291-302.
Kim, Y. D., and Misra, R. P. (1969). "Noise spectral density as a diagnostic tool for reliability of p-n junctions," IEEE Trans. Rel. R-18, 197-200.
Kirtley, J. R., Theis, T. N., Mooney, P. M., and Wright, S. L. (1988). "Noise spectroscopy of deep level (DX) centres in GaAs-AlxGa1-xAs heterostructures," J. Appl. Phys. 63, 1541-1548.
Kirton, M. J., and Uren, M. J. (1989). "Noise in solid state microstructures: A new perspective on individual defects, interface states and low-frequency (1/f) noise," Adv. in Phys. 38, 367-468.
Kleinpenning, T. G. M. (1990). "1/f noise in electron devices," in Noise in Physical Systems 1989 (A. Ambrozy, ed.), pp. 443-454. Akademiai Kiado, Budapest.
Knott, K. F. (1981). "Low energy electron beam investigation of planar transistors using scanning electron microscopy with particular reference to burst noise," in Noise in Physical Systems (P. H. E. Meijer, R. D. Mountain and R. J. Soulen, eds.), pp. 97-99. N.B.S., Washington.
Knott, K. F. (1986). "Is 1/f noise fundamental to the generation-recombination process?" in Noise in Physical Systems (A. D'Amico and P. Mazzetti, eds.), pp. 353-355. Elsevier Science.
Koch, R. H., Laibowitz, R. B., Alessandrini, E. I., and Viggiano, J. M. (1985a). "Resistivity-noise measurements in thin gold films near the percolation threshold," Phys. Rev. B32, 6932-6935.
Koch, R. H., Lloyd, J. R., and Cronin, J. (1985b). "1/f noise and grain boundary diffusion in aluminium and aluminium alloys," Phys. Rev. Lett. 55, 2487-2490.
Kogan, Sh. M. (1985). "Low-frequency current noise with a 1/f spectrum in solids," Sov. Phys. Usp. 28, 170-195.
Kogan, S. M., and Nagaev, K. E. (1982). "Low-frequency current noise and internal friction in solids," Sov. Phys. Solid State 24, 1921-1925.
Kogan, S. M., and Nagaev, K. E. (1984). "On the low frequency current 1/f noise in metals," Solid State Comm. 49, 387-389.
Komori, J., Takata, Y., Mitsuhashi, J.-I., and Tsubouchi, N. (1991). "A fast testing of electromigration immunity using noise measurement technique," Proc. IEEE Int. Conf. on Microelectronic Test Structures 4, 257-261.
Konczakowska, A. (1987). "Lifetime dependence on 1/f noise of bipolar transistors," in Noise in Physical Systems (C. M. van Vliet, ed.), pp. 489-492. World Scientific, Singapore.
Konczakowska, A., and Gladysz, H. (1990). "Noise as a lifetime estimator for bipolar transistors," in Noise in Physical Systems (A. Ambrozy, ed.), pp. 241-245. Akademiai Kiado, Budapest.
Kozlowski, D. A., and Jones, B. K. (1990). "Excess noise and current as a predictor of reliability in pin photodiodes," IEE Colloquium on Optical Detectors, Digest No. 1990/014, 13/1-16.
Kusy, A., and Szpytma, A. (1986). "On 1/f noise in RuO2-based thick resistive films," Solid-State Electron. 29, 657-665.
Liou, D. M., Gong, J., and Chen, C. C. (1990). "The 1/f² noise spectrum derived from electromigration-induced resistance change," Jap. J. Appl. Phys. 29, 1283-1285.
Liou, D. M., Gong, J., and Chen, C. C. (1991a). "Electromigration effect on low frequency noise in Al thin films," Jap. J. Appl. Phys. 30, 708-710.
Liou, D. M., Gong, J., and Tsai, C. Y. H. (1991b). "Noise associated with reverse anneal phenomenon in high current arsenic-implanted silicon," J. Mat. Sci.: Materials in Electronics 2, 230-235.
Lukyanchikova, N. B., Garbar, N. P., Sheinkman, M. K., and Zargarjantz, M. N. (1972a). "Nature of current and forward bias electroluminescence excess noise in GaAs-diodes," Solid-State Electron. 15, 801-807.
Lukyanchikova, N. B., Sheinkman, M. K., Garbar, N. P., and Svechnikov, S. V. (1972b). "Excess current and luminescence noise of p-n junctions in GaP," Physica 58, 219-224.
Lukyanchikova, N. B., Garbar, N. P., and Zargaryants, M. N. (1973). "Light and voltage noise of lasers based on AlxGa1-xAs-GaAs heterojunctions," Phys. Stat. Sol. (a) 20, 637-645.
Madenach, A. J., and Stoll, H. (1987). "Noise spectroscopy of traps in ion-implanted polysilicon thin films," in Noise in Physical Systems (C. M. van Vliet, ed.), pp. 74-77. World Scientific, Singapore.
Madenach, A. J., and Werner, J. H. (1988). "Noise spectroscopy of silicon grain boundaries," Phys. Rev. B38, 13150-13162.
Maes, H. E., and Usmani, S. H. (1983). "1/f noise in thin oxide p-channel metal-nitride-oxide-silicon transistors," J. Appl. Phys. 54, 1937-1949.
Maes, H. E., Usmani, S. H., and Groeseneken, G. (1985). "Correlation between 1/f noise and interface state density at the Fermi level in field effect transistors," J. Appl. Phys. 57, 4811-4813.
Maloberti, F., Montecchi, F., and Svelto, V. (1975). "Flicker noise in thick film resistors," Alta Frequenza 44, 681-683.
Mantese, J. V., and Webb, W. W. (1985). "1/f noise of granular metal-insulator composites," Phys. Rev. Lett. 55, 2212-2215.
Martin, J. C., Blasquez, G., de Cacqueray, A., de Brebisson, M., and Schiller, C. (1972). "L'effet des dislocations cristallines sur le bruit en creneaux des transistors bipolaires au silicium," Solid-State Electron. 15, 739-744.
Masoero, A., Rietto, A. M., Morten, B., and Prudenziati, M. (1983). "Excess noise and its temperature dependence in thick-film (cermet) resistors," J. Phys. D16, 669-674.
Massiha, G. H., Chen, T. M., and Scott, G. J. (1989). "Detection of hot spots in thin metal films via thermal noise measurements," IEEE Electron Device Lett. 10, 58-60.
Mihaila, M., and Amberiadis, K. (1983). "Noise phenomena associated with dislocations in bipolar transistors," Solid-State Electron. 26, 109-113.
Mihaila, M., Amberiadis, K., and van der Ziel, A. (1984). "1/f, g-r and burst noise induced by emitter-edge dislocations in bipolar transistors," Solid-State Electron. 27, 675-676.
Mikoshiba, H. (1976). "Silicon nitride film thickness dependence on bipolar transistor characteristics," J. Electrochem. Soc. 123, 1539-1545.
Misra, R., Pandey, S., and Sundaresan, V. (1991). "Reliability prediction of solid dielectrics using electrical noise as a screening parameter," IEEE Trans. on Reliability R-40, 113-116.
Muller, R. (1978). "Generation-recombination noise," in Noise in Physical Systems (D. Wolf, ed.), pp. 13-25. Springer-Verlag, Berlin.
Murray, D. C., Evans, A. G. R., and Carter, J. C. (1991). "Shallow defects responsible for G.R. noise in MOSFETs," IEEE Trans. ED-38, 407-415.
Mykolaitis, H., Zurauskas, D., and Misiukonis, A. (1991). "Hysteresis loop fluctuations in amorphous alloy," in Noise in Physical Systems and 1/f Fluctuations (T. Musha, S. Sato and M. Yamamoto, eds.), pp. 69-71. Ohmsha Limited, Tokyo.
Nashashibi, T. S., Carter, M. A., and Taylor, S. (1983). "Generation-recombination noise in Si JFETs," in Noise in Physical Systems (M. Savelli, G. Lecoy and J.-P. Nougier, eds.), pp. 219-222. North-Holland, Amsterdam.
Neri, B., Diligenti, A., and Bagnoli, P. E. (1987). "Electromigration and low-frequency resistance fluctuations in aluminium thin-film interconnections," IEEE Trans. Electron Devices ED-34, 2317-2322.
Neri, B., Diligenti, A., Aloe, P., and Fine, V. A. (1989). "Electromigration in thin metal films, activation energy evaluation by means of noise technique," Vuoto Sci. Technol. 19, 219-222.
Ortmans, L. H. F., and Vandamme, L. K. J. (1976). "Characterization of impulse-fritting procedures of contacts by measuring 1/f noise," Appl. Phys. 9, 147-151.
Palenskis, V. (1990). "Flicker noise: Review," Lith. Phys. J. 30, 1-35.
Pelz, J., Clarke, J., and King, W. E. (1988). "Flicker (1/f) noise in copper films due to radiation-induced defects," Phys. Rev. B38, 10373-10386.
Pender, L. F., and Wintle, H. J. (1979). "Electrical 1/f noise in insulating polymers," J. Appl. Phys. 50, 361-368.
Peransin, J. M., Vignaud, P., Rigaud, D., and Dumas, J. M. (1989). "Low-frequency noise associated to the gate current of GaAs MESFETs and MODFETs," in Noise in Physical Systems (A. Ambrozy, ed.), pp. 281-284. Akademiai Kiado, Budapest.
Pinet, D., Comallonga, J., Savelli, M., and Flassayer, C. (1987). "1/f noise in thin film resistors," in Noise in Physical Systems (C. M. van Vliet, ed.), pp. 335-338. World Scientific, Singapore.
Planat, M., and Gagnepain, J. J. (1986). "Temperature dependence of 1/f noise in quartz resonators in relation with acoustic attenuation," in Noise in Physical Systems (A. D'Amico and P. Mazzetti, eds.), pp. 323-326. Elsevier Science.
Prudenziati, M., Masoero, A., and Rietto, A. M. (1985). "Conduction mechanism and flicker noise in ZnO varistors," J. Appl. Phys. 58, 345-350.
Ralls, K. S., and Buhrman, R. A. (1991). "Microscopic study of 1/f noise in metal nanobridges," Phys. Rev. B44, 5800-5817.
Rodbell, K. P., and Koch, R. H. (1991). "Use of a two-frequency a.c. bridge to measure 1/f noise in copper films," Phys. Rev. B44, 1767-1771.
Rodbell, K. P., Ficalora, P. J., and Koch, R. (1987). "Effect of hydrogen on electromigration and 1/f noise in gold films," Appl. Phys. Lett. 50, 1415-1416.
Roder, H. (1979). "The recognition of latent defects in electronic circuits by measuring phase noise," Frequenz 33, 101-105.
Rodgers, C. T., Buhrman, R. A., Gallagher, W. J., Raider, S. I., Kleinsasser, A. W., and Sandstrom, R. L. (1987). "Electron trap states and low-frequency noise in tunnel junctions," IEEE Trans. MAG-23, 1658-1661.
Savelli, M., Lecoy, G., Dinet, D., Renard, J., and Sauvage, D. (1984). "1/f noise as a quality criterion for electronic devices and its measurement in automatic testing," AET Conf., Session 4, pp. 1-27.
Scholz, F. J., and Roach, J. W. (1992). "Low-frequency noise as a tool for characterization of near-band impurities in silicon," Solid-State Electron. 35, 447-452.
Scholz, F., Hwang, J. M., and Schroder, D. K. (1988). "Low frequency noise and DLTS as semiconductor device characterisation tools," Solid-State Electron. 31, 205-217.
Schwarz, J. A., Patrinos, A. J., Bakshee, I. S., Salkov, E. A., and Khizhnyak, B. I. (1991). "Grain size dependence of 1/f noise in Al-Cu thin-film interconnections," J. Appl. Phys. 70, 1561-1564.
Scofield, J. H., and Fleetwood, D. M. (1991). "Physical basis for non-destructive tests of MOS radiation hardness," IEEE Trans. Nuclear Science NS-38, 1567-1571.
Scofield, J. H., and Webb, W. W. (1985). "Resistance fluctuations due to hydrogen diffusion in niobium thin films," Phys. Rev. Lett. 54, 353-356.
Scofield, J. H., Mantese, J. V., and Webb, W. W. (1986). "Temperature dependence of noise processes in metals," Phys. Rev. B34, 723-731.
Scofield, J. H., Doerr, T. P., and Fleetwood, D. M. (1989). "Correlation between pre-irradiation 1/f noise and post-irradiation oxide-trapped charge in MOS transistors," IEEE Trans. NS-36, 1946-1953.
Scorzoni, A., Neri, B., Caprile, C., and Fantini, F. (1991). "Electromigration in thin-film interconnection lines: Models, methods and results," Material Science Reports 7, 143-220.
Seeger, A., Stoll, H., and Frank, W. (1987). "Interpretation of the f^(-3/2) noise in irradiated metals in terms of one-dimensional migrating defects," Material Science Forum 15-18, 237-242.
Shacter, S. B., van der Ziel, A., Chenette, E. R., and Sutherland, A. D. (1978). "The effect of hot spots on the noise characteristic of large-area bipolar transistors," Solid-State Electron. 21, 599-602.
Shanabarger, M. R., Wilcox, J., and Nelson, H. G. (1982). "Observation of an excess current noise resulting from oxygen adsorption onto iron films," J. Vac. Sci. Technol. 20, 898-899.
Sikula, J., Sikulova, M., Vasina, P., and Koktavy, B. (1981). "Burst noise in diodes," in Noise in Physical Systems (P. H. E. Meijer, R. D. Mountain and R. J. Soulen, eds.), pp. 100-104. N.B.S., Washington.
Sikula, J., Vasina, P., Musilova, V., Chobola, Z., and Rothbauer, M. (1984). "1/f noise in GaAs Schottky diodes," Phys. Stat. Sol. (a) 84, 693-696.
Sikula, J., Koktavy, B., Vasina, P., Schauer, P., and Strasky, L. (1990a). "Impulse noise in capacitors," in Noise in Physical Systems (A. Ambrozy, ed.), pp. 297-300. Akademiai Kiado, Budapest.
Sikula, J., Koktavy, B., Vasina, P., and Cermakova, A. (1990b). "1/f noise, degradation and reliability of GaAsP LEDs," in Noise in Physical Systems (A. Ambrozy, ed.), pp. 479-482. Akademiai Kiado, Budapest.
Sikulova, M., Sikula, J., Vasina, P., and Nevecny, V. (1990). "Burst noise in p-n junction devices," in Noise in Physical Systems (A. Ambrozy, ed.), pp. 227-230. Akademiai Kiado, Budapest.
Sodini, D., Touboul, A., Lecoy, G., and Savelli, M. (1976). "Generation-recombination noise in the channel of GaAs Schottky-gate field-effect transistors," Electron. Lett. 12, 42-43.
Stegherr, M. (1984). "Flicker noise in hot electron degraded short channel MOSFETs," Solid-State Electron. 27, 1055-1056.
Stephen, J. H. (1986). "Low noise junction field effect transistors exposed to intense ionizing radiation," IEEE Trans. NS-33, 1465-1469.
Stevens, E. H., Gilbert, D. A., and Ringo, J. A. (1976). "High-voltage damage and low-frequency noise in thick-film resistors," IEEE Trans. PHP-12, 351-356.
Stocker, J. D., and Jones, B. K. (1985). "The gate current noise of junction field effect transistors," J. Phys. D18, 93-102.
Stojadinovic, N. D. (1979). "Effects of emitter edge dislocations on the low frequency noise of silicon planar n-p-n transistors," Electron. Lett. 15, 340-342.
Stoll, H. (1983). "Vacancy noise measurements in metals: Calculation of the power spectrum," Appl. Phys. A30, 117-121.
Strasilla, U. J., and Strutt, M. J. O. (1974). "Measurement of white and 1/f noise within burst noise," IEEE Proc. 62, 1711-1713.
Stroeken, J. T. M., and Kleinpenning, T. G. M. (1976). "1/f noise of deformed crystals," J. Appl. Phys. 47, 4691-4692.
Sun, M. I., Cottle, J. G., and Chen, T. M. (1990). "Determination of Al-based metal film lifetimes using excess noise measurements," in Noise in Physical Systems (A. Ambrozy, ed.), pp. 519-522. Akademiai Kiado, Budapest.
Takagi, K., Mizunami, T., Suzuki, J.-I., and Masuda, S. (1986). "1/f noise in metal contacts and granular resistors," IEEE Trans. CHMT-9, 141-144.
Taniguchi, M., Inoue, T., and Mano, K. (1985). "The frequency spectrum of electrical sliding contact noise and its waveform model," IEEE Trans. CHMT-8, 366-371.
Theunissen, F. A. P. M. (1953). "Noise of metal contacts," Appl. Sci. Res. B3, 201-208.
Tiitto, H. (1989). "Use of Barkhausen noise in fatigue," Nondestr. Test. Eval. 5, 27-37.
Uren, M. J., Day, D. J., and Kirton, M. J. (1985). "1/f noise and random telegraph noise in silicon metal-oxide-semiconductor field-effect transistors," Appl. Phys. Lett. 47, 1195-1197.
Vandamme, L. K. J. (1988). "Annealing of implants reduces lattice defects and 1/f noise," Solid State Phenomena 1 and 2, 153-158.
Vandamme, L. K. J., and Tijburg, R. P. (1976). "1/f noise measurements for characterizing multispot low-ohmic contacts," J. Appl. Phys. 47, 2056-2058.
Vandamme, L. K. J., and van Ruyven, L. J. (1983). "1/f noise as a reliability test for diode lasers," in Noise in Physical Systems (M. Savelli, G. Lecoy, and J.-P. Nougier, eds.), pp. 245-247. Elsevier Science.
Vandamme, L. K. J., Alabedra, R., and Zommiti, M. (1983). "1/f noise as a reliability estimation for solar cells," Solid-State Electron. 26, 671-674.
van der Ziel, A. (1986). Noise in Solid State Devices and Circuits. Wiley, New York.
van der Ziel, A., and Tong, H. (1966). "Low frequency noise predicts when a transistor will fail," Electronics 39(24), 95-97.
van Rheenan, A. D., Bosman, G., and Zijlstra, R. J. J. (1987). "Low frequency noise measurements as a tool to analyse deep level impurities in semiconductor devices," Solid-State Electron. 30, 259-265.
Van Vliet, C. M. (1991). "A survey of results and future prospects on quantum 1/f noise and 1/f noise in general," Solid-State Electron. 34, 1-21.
Verbruggen, A. H., Koch, R. H., and Umbach, C. P. (1987). "Correlation between 1/f noise and grain boundaries in thin gold films," Phys. Rev. B35, 5864-5867.
Vossen, J. L. (1973). "Screening of metal film defects by current noise measurements," Appl. Phys. Lett. 6, 287-289.
Wang, K. W., van der Ziel, A., and Chenette, E. R. (1975). "Neutron-induced noise in junction field-effect transistors," IEEE Trans. ED-22, 591-593.
Weissman, M. B. (1988). "1/f noise and other slow, non-exponential kinetics in condensed matter," Rev. Mod. Phys. 60, 537-571.
White, A. M., Grant, A. J., and Day, B. (1978). "Deep traps in ideal n-InP Schottky diodes," Electron. Lett. 14, 409-411.
Yang, W., and Celik-Butler, Z. (1991). "A model for electromigration and low-frequency noise in thin metal films," Solid-State Electron. 34, 911-916.
Yassine, A. M., Chen, T. M., and Beitman, B. A. (1991). "Characterization of probe contact noise for probes used in wafer-level testing," IEEE Electron. Device Lett. 12, 200-202.
Zhigalskii, G. P. (1991). "Relationship between 1/f noise and non-linearity effects in metal films," JETP Lett. 54, 513-516.
Zhigalskii, G. P., and Bakshi, I. S. (1980). "Excess noise in thin aluminium films," Radio Eng. and Electron. Phys. 25, 61-68.
Zhigalskii, G. P., and Fedorov, A. S. (1991). "A study of excess noise in MOS structures under quasi-equilibrium conditions," Radiophys. Quantum Electron. (USA) 34, 380-384.
Zhigalskii, G. P., Kurov, G. A., and Siranashvili, I. Sh. (1983). "Excess noise and mechanical stress in thin chromium films," Radiophys. and Quantum Electron. 26, 162-166.
Zhuang, Y., and Sun, Q. (1990). "hFE instability and 1/f noise in bipolar transistors," Proc. IEEE Int. Reliab. Phys. Symp. 90CH2787-0, pp. 290-291.
Zimmerman, N. M., and Webb, W. W. (1988). "Microscopic scatterer displacements generate the 1/f resistance noise of H in Pd," Phys. Rev. Lett. 61, 889-892.
Zimmerman, N. M., and Webb, W. W. (1990). "1/f resistance noise complements anelasticity measurements of hydrogen motion in amorphous Pd80Si20," Phys. Rev. Lett. 65, 1040-1043.
Zimmerman, N. M., Scofield, J. H., and Mantese, J. V. (1986). "Volume versus surface origin of 1/f noise in metals," Phys. Rev. B34, 773-777.
Parallel Processing Methodologies for Image Processing and Computer Vision

S. YALAMANCHILI*
School of Electrical Engineering, Georgia Institute of Technology, Atlanta, Georgia
J. K. AGGARWAL**
Department of Electrical and Computer Engineering, University of Texas, Austin, Texas

* This research was supported in part by a grant from Digital Equipment Corporation.
** This research was supported in part by a contract with IBM Corporation.
I. Introduction ............ 259
II. Matching Algorithms and Architectures ............ 261
   A. The Influence of Algorithmic Characteristics ............ 263
      1. Algorithmic Characteristics ............ 265
      2. Impact on Hardware Architectures ............ 267
   B. The Influence of Architectural Characteristics ............ 268
      1. Architectural Characteristics ............ 269
      2. Impact on Algorithm Design ............ 270
III. Architecture-Driven Approaches ............ 273
   A. Fine-Grained Parallel Architectures ............ 274
   B. Coarse-Grained Parallel Architectures ............ 279
   C. The Mapping Problem in Architecture-Driven Methodologies ............ 284
IV. Application-Driven Approaches ............ 285
   A. Hybrid Parallel Architectures ............ 285
   B. Model-Driven Parallel Architectures ............ 292
V. Emerging Research Areas ............ 296
References ............ 297
I. INTRODUCTION

The human visual system enables us to perceive a stable, coherent, three-dimensional environment. Providing machines with comparable visual capabilities is the goal of modern computer vision and image processing research. Considerable progress has been made in the development of computational models for analyzing and "understanding" digital images. However, a major obstacle in the development and widespread use of computer vision has been the enormous data throughput and processing
requirements. In many applications, these have far exceeded the capacities of existing architectural organizations. Moreover, it has become apparent that future applications possess requirements well beyond the limits of current and future uniprocessor architectures. Parallel processing has been perceived as the solution, and this perception has led to the conception, design, and subsequent analysis of a number of parallel architectures for image processing and computer vision (Yalamanchili and Aggarwal, 1984; Yalamanchili et al., 1985). These efforts have produced valuable insight into the nature of the problems encountered in designing and utilizing parallel architectures in general, and for computer vision in particular.

A study of methodologies for the development of parallel processing architectures for image processing and computer vision over the recent past reveals a dichotomy: architecture-driven and application-driven methodologies. Spurred primarily by the need to exploit large degrees of inherent parallelism, historically most approaches have been architecture-driven. The choice of architectural organization in most of the early research efforts was influenced by a few aspects of the design. For example, mesh-connected arrays were the most popular organization for many early (and current, but for differing reasons) image processing architectures. The choice was primarily motivated by the two-dimensional structure of image data and the local, repetitive nature of low-level operations. Technological constraints further encouraged the use of such array architectures. However, as processing progressed to encompass higher levels of data and algorithms, it became clear that this organization was very inefficient. Similar observations were made in several other architecture-driven approaches to computer vision. It became apparent that image processing and computer vision algorithms evolve through different levels of processing, and that the characteristics of algorithms at the various levels dictate the choice of alternative parallel architectures. This understanding subsequently led to complementary approaches to developing parallel architectures: application-driven approaches, where the computational characteristics of the algorithms dictate the organization and operation of parallel architectures. Parallel image processing has been dominated by architecture-driven approaches, while parallel computer vision has benefited from the application of both methodologies.

Section II discusses characteristics of image processing and computer vision algorithms and outlines the essential elements of both the architecture- and application-driven methodologies. We view image processing as a prelude to computer vision. Image processing algorithms operate on images, extracting and representing scene information. Higher-level algorithms operate largely on unstructured data for scene interpretation and understanding. Contemporary computer vision encompasses processing from sensing to interpretation. Therefore,
throughout this paper the term computer vision will implicitly be assumed to include the lower-level functions of image processing. Where the unique characteristics of image processing necessitate a distinction from computer vision, we will explicitly do so. As a result, this discussion may cast work described originally as image processing in the more general paradigm of computer vision, and will provide a more uniform basis for comparing architectures and processing approaches.

Sections III and IV present and discuss existing architecture- and application-driven approaches to parallel image processing and computer vision. The discussion is not meant to be exhaustive, but rather representative of existing and proposed techniques. In either approach, the goal is to match the requirements of the algorithms with the capabilities of the architecture to provide the best overall processing solution. In its most general form, this is the well-known mapping problem (Bokhari, 1981a). It is a problem that has received a considerable amount of attention in the context of specific applications, including computer vision architectures (Sadayappan and Ercal, 1987; Bokhari, 1981b; Lee and Aggarwal, 1987, 1990; Bokhari, 1988; Ranka and Sahni, 1988, 1990). Section III also discusses relevant issues related to mapping algorithms onto parallel architectures. Finally, we take a brief look at the future and examine emerging areas of research that will affect the development of parallel processing methodologies for computer vision. Some of these must be in place for many of the architecture-driven or application-driven approaches to be viable, while others promise to open potentially new sources of computational power. Examples of such research areas include automated parallelism extraction, symbolic processing, and neural networks.
II. MATCHING ALGORITHMS AND ARCHITECTURES
In general, one can establish broad relationships between architectural and algorithmic features. For example, data-parallel computations lend themselves to pipelined computation, and local iterative computations lend themselves to synchronous processor arrays. However, integrating these relationships into a cohesive, functional, and efficient implementation is still an elusive goal. How does one go about systematically achieving this goal? Are there general principles that can be exploited? We can identify two distinct system design philosophies that provide insight into the nature of the answers to these questions. Our discussion of these philosophies is structured around the high-level view of systems shown in Fig. 1.
FIGURE 1. Prototypical view of a parallel processing system: image processing and computer vision algorithms, expressed in programming languages, supported by execution mechanisms, and realized on hardware architectures.
Image processing and computer vision is a source of algorithms and data to perform specific tasks. The user designs an algorithm and translates it into a parallel program. The program is compiled into a form suitable for one of possibly several execution mechanisms. For example, a functional program may be implemented by one of several execution mechanisms: string reduction, graph reduction, or dataflow. It is unlikely that there can ever be a consensus as to which of the many available execution mechanisms is preferable. The preferred mechanism will more than likely be a function of the specific language and application domain. Finally, the hardware architecture is designed to support specific execution mechanisms. While the view shown in Fig. 1 is generic in nature, the discussion in this paper is distinguished by its application to image processing. With respect to Fig. 1, we can identify two distinct system design philosophies. The first is an architecture-driven or bottom-up approach. Motivated primarily by a need to exploit the large degrees of inherent parallelism, earlier approaches adopted highly parallel architectures, with the organization influenced by a few aspects of the design, e.g., the two-dimensional nature of the image data or the vector characteristics of specific low-level algorithms. Execution mechanisms were subsequently chosen to make the organization work. The earliest and most popular execution models were the Single Instruction Stream Multiple Data (SIMD) stream models and the Multiple Instruction Stream Multiple Data (MIMD) stream models. These have more recently been augmented with dataflow, reduction, object-oriented, production-system, and process models (Dennis, 1980; Arvind and Iannucci, 1983; Treleaven et al., 1982; Almasi and Gottlieb, 1989; Davis and King, 1975). Appropriate synchronization and communication primitives were then designed, dictated by how the hardware was organized, and perhaps made visible at the programming-language level. Algorithms now had to be designed to make efficient use of the parallel execution model and its hardware implementation. A good example of how difficult this can be is provided by early algorithms for boundary tracing and region
growing on mesh-connected computers. Communication does not necessarily occur in a regular, local pattern and is highly data-dependent, and it is not obvious how best to orchestrate it so as to minimize the interprocessor communication overhead. The complexity of the resulting algorithm is more a function of the choice of architecture organization and execution mechanism than of any lack of parallelism in boundary tracing. Other factors unrelated to image processing that influenced the design of architectures included the changing constraints introduced by the advent of VLSI. Short interconnects, few processor types, and localized communication were desirable attributes independent of the natural structure of image processing algorithms. Ideally, these factors should be complemented by algorithm requirements to produce an efficient system. In contrast, an application-driven approach initially entails a study of the requirements of the algorithms and the selection or design of programming languages to adequately express them. The execution mechanism and hardware implementation are organized to support the chosen algorithms, data structures, and programming languages. In this context, an application-driven approach starts at the highest level in making choices and proceeds to systematically determine compatible choices at lower levels. Architecture-driven methodologies start at the lowest level and proceed to influence the choices at the language and algorithm levels. The former is certainly the preferred approach for the design of certain classes of systems, such as real-time embedded systems, where application domain constraints have to be satisfied. This approach avoids the problem of “making the algorithm fit” that is often encountered in architecture-driven approaches. The following subsections discuss the essential characteristics of both methodologies with respect to image processing and computer vision. This includes a discussion of how algorithmic features affect the architecture and, in turn, how architectural features influence the structuring of the algorithms that execute on them. A truly effective methodology for designing parallel computer vision architectures must effectively reconcile the often conflicting demands of both approaches.
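The data dependence that makes such operations awkward on mesh arrays is easy to see in code. The following sketch (ours, not taken from any of the cited works) grows a 4-connected region from a seed pixel; on a mesh-connected machine, each neighbor access would be an inter-PE message, and the sequence of those accesses is dictated entirely by the shape of the region in the data:

from collections import deque

def grow_region(image, seed):
    """Grow a 4-connected region of like-valued pixels from a seed.

    The order in which pixels (and hence, on a mesh machine, PEs) are
    visited is data-dependent: it follows the shape of the region,
    not any fixed, local communication pattern.
    """
    rows, cols = len(image), len(image[0])
    target = image[seed[0]][seed[1]]
    region, frontier = {seed}, deque([seed])
    while frontier:                          # iteration count is data-dependent
        r, c = frontier.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc          # a neighbor access = an inter-PE message
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and image[nr][nc] == target):
                region.add((nr, nc))
                frontier.append((nr, nc))
    return region

img = [[0, 0, 1, 1],
       [0, 1, 1, 0],
       [0, 0, 1, 0],
       [1, 0, 0, 0]]
print(sorted(grow_region(img, (0, 2))))      # the 1-region containing (0, 2)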
A. The Influence of Algorithmic Characteristics
In investigating parallel processing approaches to computer vision, it is necessary to understand the nature of the computations involved. Processing evolves through several stages with a varying mix of symbolic and numeric processing (Yalamanchili et al., 1985), as illustrated in Fig. 2.

FIGURE 2. Levels of processing. (The original figure plots data rate, in bits/sec, against processing rate, spanning roughly 10^6 to 10^9 insts/sec, between image data and decisions.)

In the initial stages - the low level - the processing consists primarily of numeric operations on large numbers of pixels and is characterized by stringent data
throughput requirements. Extracted information may be entities denoted by symbols and possessing numeric values, such as gradients of edges, areas of regions, etc. Eventually, distinct entities are identified, and their organization and the relationships between them are encoded in structures such as semantic networks. Processing at this level - the high level - is primarily symbolic and is characterized by complex data structures for representing the observed environment and its constituent entities. These structures facilitate making inferences about the scene for applications such as object recognition from partial views, obstacle avoidance, and the tracking of objects in space. As processing proceeds from low levels to high levels, the volume of data analyzed is substantially reduced. However, the “information content” of the data is much higher. For example, whereas pixel values represent reflectance properties of points in the scene, values at higher levels may represent motion parameters of objects, measures of shape, etc. Consequently, computations involving these data structures are complex, e.g., tasks such as object recognition and automatic obstacle avoidance. For example, while position and volume information for objects in a scene may not require substantial amounts of storage, the obstacle avoidance and trajectory computations involved in autonomous vehicle navigation can be very complex. In contrast, many low-level operations on images are simple and repetitive. Thus, in progressing from low- to high-level processing, the source of the computational burden shifts from large volumes of data to complex numerical and inferencing operations. At the high level we have tasks such as inference, involving symbolic operations on the descriptions of scene entities. At the low level, we have predominantly arithmetic operations on large arrays of numbers. As a result of this systematic change in the nature
of the processing, one can identify distinct levels of processing, e.g., low, intermediate, high, object, pixel, etc.

1. Algorithmic Characteristics
Nudd (1980) originally distinguished between low-level and high-level processing based on requirements for accuracy, data throughput, and the probability of branching. The basis for this distinction can be broadened to include several other relevant features (contrasted in the sketch following this list):
• Data types
• Data resolution
• Data structures
• Data dependencies
• Instruction throughput
• Data volume
• Operation density
• Data access patterns
• Granularity of parallelism
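The shift these features describe can be sketched directly. In the snippet below (illustrative only; the Region type and its fields are our own invention, not taken from the chapter), the low level is a fixed numeric array, while the high level is a dynamically allocated, pointer-linked, user-defined structure:

from dataclasses import dataclass, field

# Low level: a statically sized multidimensional array of numeric values.
image = [[0] * 8 for _ in range(8)]            # an 8 x 8 grid of pixel intensities

# High level: dynamically allocated, pointer-linked symbolic structures.
@dataclass
class Region:                                  # a user-defined (tagged) type
    label: str
    area: int
    adjacent: list = field(default_factory=list)   # pointers to other Regions

sky = Region("sky", area=40)
roof = Region("roof", area=12)
sky.adjacent.append(roof)                      # relationships built at run time
roof.adjacent.append(sky)
print(roof.label, [r.label for r in roof.adjacent])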
These features exhibit different characteristics as computations progress from low to high levels. Data types shift from primarily numeric to primarily symbolic. Whereas 32-bit integer and single- or double-precision floating-point resolutions may suffice for most low-level and intermediate operations, the structures used to encode scene and object descriptions typically benefit from tagged data types and, as a result, from field addressing within words. Resolution requirements also typically decrease at higher levels of processing. Low-level data structures typically use multidimensional arrays of integer, real, boolean, etc., types. In contrast, high-level data structures require a certain degree of flexibility in how they are constructed, manipulated, and accessed. Data structures at this level benefit from user-defined types, have flags and status bits associated with them, and make extensive use of pointers. This necessitates efficient access to fields within structures and dynamic memory allocation. Memory requirements are not as predictable as they are at lower levels of processing, e.g., with image data. Figure 3 illustrates this progressive change in the nature of data structures with a few simple examples.

FIGURE 3. Changing data requirements: image data (pixel values), edges (gradient magnitude and direction), regions, and objects/scenes (structural/volumetric relationships).

With respect to control flow behavior, low-level algorithms tend to be highly structured, repetitive, and composed of fixed sets of operations with relatively few data-dependent branches. Thus, it is possible to obtain relatively accurate estimates of operation counts. In contrast, high-level algorithms are highly data-dependent, and the processing requirements
can vary widely. Establishing accurate instruction throughput requirements is, therefore, very difficult to do in general. Estimating the data throughput requirements in high-level processing is also difficult. Instruction and data throughput are highly dependent on the application domain. For example, while it is possible to estimate accurately the volume of sensor data to be processed at the low level, it is more difficult to estimate the number of targets to be processed, and even more difficult to accurately estimate the number of arithmetic/logical operations required to compute various object features. The operation density refers to the number of operations (arithmetic, logical, relational, etc.) required for the smallest element of interest. For example, at the lowest level, this is the pixel. At the highest level, it may be the faces of an object, corners, or other primitive features from which complete object descriptions are constructed. In view of the complementary relationship between complexity and data throughput, operation density is intended to capture the source of the computational burden and aid the development of parallel processing schemes, e.g., should the data be partitioned and processed in parallel, or should the algorithms be partitioned into segments that can execute in parallel? Regardless of the source of parallelism, data access patterns are an important component of performance. The data access patterns may be regular and predictable or, at the other extreme, random. Low-level algorithms exhibit the former behavior. For example, when the magnitude and direction of an edge in a local neighborhood is to be determined, the order in which pixels are accessed can be predetermined. It is possible to make
use of existing memory skewing and unscrambling schemes to ensure concurrent (and therefore fast) access to the required pixel data (Hwang and Briggs, 1984). By contrast, the organization of, and concurrent access to, high-level data structures is more challenging. The data structure access patterns tend to be less predictable. Furthermore, with data structures making use of dynamic memory allocation and being dependent on the content of the imaged scenes, it is extremely difficult to use statically structured memory mapping schemes. Concurrent data access is also tightly coupled to the granularity of the parallelism. Image processing algorithms generally exhibit “natural” granularities of parallelism. Low-level operations exhibit parallelism at the pixel neighborhood level, whereas high-level algorithms may find natural parallelism at larger levels of granularity, such as the level of objects, faces of objects, etc. Intermediate levels of processing exhibit parallelism in feature-level computations, e.g., lines, contours, etc. Thus, we see a progression from fine-grained parallelism at low levels to relatively coarse-grained parallelism at higher levels of processing.

2. Impact on Hardware Architectures

From the preceding discussion it is evident that the nature of the algorithmic characteristics changes as processing evolves from low to high levels. These characteristics, in turn, affect the design of the hardware architectures. The following discussion provides several examples. Data types affect architecture design in a straightforward manner. If the requirements are such that field addressability is desirable (e.g., as in high-level algorithms), a high-speed shift-mask-field extract operation is desirable. If such operations form a significant proportion of the total number of operations, as in many high-level algorithms operating on tagged data types and requiring run-time type checking, hardware support may be necessary to achieve the required processing throughput. The data dependencies in the algorithms significantly affect the design of an architecture by precluding (or at least complicating) the use of many common approaches to high-speed computation. For example, if the algorithms have a large number of data-dependent branches, the use of instruction prefetch units, pipelined computation, and dedicated address generation units becomes much more complex and may even become infeasible. On the other hand, if the algorithms exhibit very little or no data-dependent behavior, it is possible to exploit this predictable behavior in arriving at high-speed pipelined solutions. The following is one example of how this may be done. Because of the relatively long memory access time (as compared to processor cycle times), the processor-memory interface is a bottleneck. This problem is magnified in parallel architectures with large shared memories.
If the algorithms exhibit no, or very little, data-dependent behavior, it is possible to have the program counter located in the memory and to increment it at the end of every instruction cycle. This eliminates instruction address traffic across the processor-memory interface at the expense of possibly a few extra control signals across that interface and a higher penalty to recover from (infrequent) branches. Alternatively, in the absence of data-dependent branches, a heavily pipelined instruction pre-fetch and decode unit becomes viable. Such predictable behavior also enables the use of multiple memory banks to relieve the processor-memory bottleneck. Vector machines are predicated upon regular referencing patterns to arrays of data and are thus able to achieve impressive execution rates, on the order of hundreds of MFLOPS to several GFLOPS, on certain problems. The reference patterns of low-level algorithms exhibit this behavior. The behavior also encourages the use of caches. Since data access patterns are almost deterministic and characterized by a relatively small working set size (Hwang and Briggs, 1984) or footprint size (Thiebault and Stone, 1987), very fast caches with very high hit rates are possible. The execution speed can approach the maximum processor throughput rather than be limited by memory bandwidth. The cache coherence problem (Archibald and Baer, 1986) encountered in parallel architectures is not as severe because of the predictability of the memory reference patterns. The construction, manipulation, and access patterns of the data structures affect the design of the memory management unit. High-level data structures, in general, cannot be statically allocated. They tend to be dynamically generated and of an indeterminate size. Static allocation is extremely wasteful of memory (assuming that the maximum possible size is allocated). The use of linked structures in high-level algorithms raises two issues: (i) chasing down pointers and (ii) garbage collection. While the use of arrays requires a fair amount of indexing calculation, using linked lists requires indirect addressing and is therefore memory-intensive. Further, though high-level algorithms may typically have a data volume that is small compared to those encountered in low-level analysis, these volumes are themselves still substantial and exhibit less locality, and garbage collection becomes an important performance issue. While conventional garbage collection techniques are applicable, performance requirements may dictate hardware support.
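The locality contrast can be made tangible with a small experiment. The sketch below (ours; in interpreted Python the measured gap is far smaller than in a compiled language, where cache behavior dominates) sums the same values once by sequential array traversal and once by chasing a randomized chain of indices standing in for pointers:

import random
import time

N = 200_000
values = list(range(N))

# Predictable, regular access: the location of the next element is known
# in advance, which is what lets caches and prefetch units work well.
t0 = time.perf_counter()
total = sum(values)
t_array = time.perf_counter() - t0

# Pointer chasing: each access depends on the previous one, so the
# reference pattern is irregular and cannot be predicted statically.
order = list(range(N))
random.shuffle(order)
next_index = [0] * N
i = order[0]
for j in order[1:]:                    # build a randomized linked chain
    next_index[i] = j
    i = j
next_index[i] = -1

t0 = time.perf_counter()
total = 0
i = order[0]
while i != -1:                         # traverse by indirection only
    total += values[i]
    i = next_index[i]
t_chase = time.perf_counter() - t0

print(f"sequential: {t_array:.4f}s   pointer-chasing: {t_chase:.4f}s")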
B. The Infruence of’ Architectural Characteristics
It is also possible to identify a set of features to distinguish parallel architectures in much the same way we identified features to distinguish between
high- and low-level computer vision algorithms. These features are naturally independent of a specific application domain. However, specific tradeoffs are determined by the characteristics of the applications. We can think of the range of parallel architectures in several ways. Our presentation finds the notion of “granularity of parallelism” the most useful. We can distinguish between data parallelism and control parallelism. In the former, parallelism is achieved by concurrent operations over large data sets. In the latter, parallelism is available at one of several levels: intra-instruction, inter-instruction, and task level. Intra-instruction parallelism refers to parallelism within each instruction and occurs at the micro-architecture level. For example, Very Long Instruction Word (VLIW) architectures belong to this class (Colwell et al., 1988). Such machines use parallel processors driven by a single large (hundreds of bits wide) microcode word. The major advantage is the minimal impact on the programming-language level, making parallelism potentially accessible to the existing large base of software developed for uniprocessor machines. Inter-instruction-level parallelism refers to the simultaneous execution of multiple instructions constrained by data dependencies. Dataflow is a good example of an approach to exploiting parallelism at this level (Dennis, 1980; Arvind and Iannucci, 1983). Task-level parallelism refers to computations structured as tasks that periodically synchronize to exchange data and control information. The Multiple Instruction Stream Multiple Data (MIMD) stream model of computation is an example (Almasi and Gottlieb, 1989), and the Intel iPSC Hypercube is a good example of a machine in this category. Intra- and inter-instruction-level parallelism is categorized as fine-grained computation, while task-level parallelism is categorized as medium- to coarse-grained computation. Data parallelism can occur at various levels of granularity. From the point of view of matching algorithms and machines, low-level algorithms typically exhibit fine-grained (i.e., pixel-level) data parallelism. High-level algorithms largely exhibit coarse-grained (e.g., task-level) control parallelism (Yalamanchili et al., 1985).

1. Architectural Characteristics
In contrast to application-driven approaches, where the algorithmic characteristics outlined in the previous section affect the hardware design decisions, this section examines architecture-driven approaches, where the adoption of specific structures affects the design of algorithms. The design of the parallel architectures is the product of technological and manufacturing constraints. Algorithms are designed “after the fact.” The following outlines some of the dominant constraints that have dictated the
organization of parallel architectures:
• Increasing chip densities
• Limited number of pins on a chip
• Distributed vs. shared memory
• Synchronization mechanisms
• Input/output
These constraints have influenced the organization of architectures that span the spectrum from fine-grained to coarse-grained data and control parallelism. The following sub-section describes their impact on the design of algorithms in general and on computer vision in particular.

2. Impact on Algorithm Design

Increasing densities and the advent of single-chip processing arrays have had the largest impact on the design of parallel architectures. Though enabling higher chip densities, the technology also introduces constraints that exert considerable influence on the organization of parallel architectures when all of the components share the same wafer surface. These constraints arise from physical as well as manufacturing considerations (Foster and Kung, 1980; Kung, 1982). If we are constrained to two-dimensional VLSI, the architecture must be laid out on a plane and hence contain a minimum of crossovers of communication paths. Costs associated with the fabrication technology encourage a small number of steps, which in turn encourages a small number of processor types. Making copies of a circuit is inexpensive compared to the initial design and development costs of masks for fabrication. Long or irregular communication paths are undesirable because they can become the principal determinant of chip speeds. Thus, the technology favors the use of a number of identical processors or cells, interconnected in a regular fashion to yield short communication paths. Such structures necessitate the careful design of algorithms to produce local, regular communication patterns. This can only be achieved by structuring computations and assigning them to processors such that dependencies are restricted to local neighborhoods of cells, or such that communication can be accommodated by the propagation of information through iterative local operations. Coincidentally, this is well matched to the structure of many low-level image processing algorithms. However, this is not the case for many intermediate- or high-level image processing algorithms. Such architectures affect the design of programming languages and techniques by requiring suitable abstractions for specifying computations organized for such arrays. As might be expected, algorithms that possess an iterative regular structure
(e.g., many numerical algorithms and low-level image processing algorithms) find natural computational solutions on such arrays. Along with increasing densities has come the problem of the number of pins available on a chip. It has been empirically observed that chip densities increase with the volume of a sphere, whereas the available pins increase with the surface area of the sphere (Lint and Agerwala, 1981). Therefore, it becomes necessary to minimize the required bandwidth across the available pins or to decrease the requirements for pins as densities increase. The systolic model of computing was proposed in direct response to these technological constraints (Kung, 1982). Early examples showed how it is possible to match the bandwidth of the memory with the computing power, without increasing the pin requirements, by using a linear array of simple processing cells. Subsequently, systolic solutions and systolic arrays have grown considerably more complex, to include multidimensional arrays of processors supporting the synchronous transfer of multiple data streams through the cells. Use of these arrays for specific problems necessitates the formulation of a “systolic solution” - the specification of the data streams that flow through the arrays and the computations to be performed at each cell (Kung, 1987). Early solutions were developed by insight gained from previous solutions. Characterizations of the general properties that make algorithms amenable to systolic solutions are a relatively recent phenomenon and are beginning to find their way into compilers and language abstractions. We reiterate the necessity of designing algorithms to fit this model. While chip densities increased, manufacturers were beginning to interconnect boards of general-purpose processors to form multiprocessors. The processors typically accessed shared memory, or each processor was provided with its own local memory and processors communicated explicitly via messages. Language constructs were provided to send and receive messages or to access shared variables. Algorithms now had to be structured based on the number of processors available. Interactions between tasks executing on the processors had to be explicitly stated and specified by the applications programmer. In shared-memory architectures, computations were structured to minimize the conflicts encountered when distinct tasks access shared variables. In distributed-memory architectures, it is preferable for communicating tasks to be assigned to processors that can communicate directly, to minimize the overhead of store-and-forward interprocessor communication. Thus, the applications programmer is always cognizant of the structure of the underlying machine, and attempts to use that information to generate the most efficient solution, i.e., to maximize speedup. As in single-chip architectures, communication between processors is orchestrated such that it is (as far as possible) local.
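Returning to the systolic model above, its essence can be captured in a short simulation. The sketch below (ours, tied to no particular machine) models a linear systolic array computing a matrix-vector product: each cell holds one running result while the input vector streams through the array one cell per cycle, so a single memory port feeds n cells:

def systolic_matvec(A, x):
    """Simulate y = A x on a linear systolic array of n cells.

    Cell i holds the running sum y[i]; the elements of x stream through
    the array one cell per cycle, and cell i consumes A[i][j] just as
    x[j] passes by (in hardware the matrix entries would be fed to the
    cells from memory in step with the stream)."""
    n = len(A)
    y = [0] * n
    pipe = [None] * n                     # the x value currently held in each cell
    for t in range(2 * n - 1):            # n inputs plus n - 1 cycles to drain
        pipe = [x[t] if t < n else None] + pipe[:-1]   # one hop right per cycle
        for i, xv in enumerate(pipe):
            if xv is not None:
                j = t - i                 # x[j] reaches cell i on cycle t = i + j
                y[i] += A[i][j] * xv
    return y

A = [[1, 2], [3, 4]]
print(systolic_matvec(A, [5, 6]))         # [17, 39], i.e., A times (5, 6)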
The interaction between distinct tasks requires synchronization to enforce the ordering of events and to ensure correct operation; e.g., data cannot be read before it is produced. The synchronization mechanisms dictate the structure of, and therefore the applicability of, various algorithms. In tightly coupled parallel architectures, interactions between processors are very constrained. The SIMD mode of computation is an example of tightly coupled parallel computation. A central controller broadcasts a single instruction stream that is executed by a large number of processors, each operating on distinct data sets. Another example is a number of processors, each executing a distinct instruction stream (MIMD), interconnected to a shared memory over a bus. A processor can potentially interfere with some other processor on every access to memory. In SIMD machines, computations must be structured to avoid data dependencies. In the latter machine, they must be structured to minimize the number of accesses to memory and to minimize the interactions between processors. Along the same lines, vector machines require the code to be vectorizable in order to achieve any increase in throughput. The architecture is visible to the applications programmer, influences the structure of the algorithms, and must be exploited in order to achieve any substantial performance increase. The hardware support for the synchronization operations necessary to ensure the correct interaction between cooperating tasks (or instructions) can be provided at many levels. At the lowest level, this may take the form of atomic read-modify-write access to a memory location. This allows the atomic read-write to memory upon which test-and-set primitives are based. Alternatively, constructs can be provided at the programming-language level or operating-system level that are implemented as sequences of instructions that perhaps disable interrupts for the duration of their execution, or that are based on read-modify-write hardware implementations. When processors do not share memory, alternate mechanisms may be employed. Synchronization may be implicit, occurring on every clock cycle, as in synchronous systems, or may be explicit, based on send-acknowledge protocols. The INMOS Transputer-based parallel architecture supports a process model of parallel computation (Johnson and Durham, 1986). Constructs are provided to read and write channels between tasks. Send-acknowledge protocols are used, with processes busy-waiting on channel input. The type of synchronization protocols and the performance of their implementation directly affect the granularity of the computations the applications programmer creates. When synchronization of computations across processors is expensive, computations and data are structured to amortize the cost across larger data sets. For example, if synchronizing the exchange of image data across a channel takes on the order of milliseconds, computations are structured to exchange large blocks of pixels.
If the cost of synchronization is on the order of microseconds, data can be exchanged between processors in smaller sets of pixels, with a resultant need to restructure the computations. Techniques for efficiently inputting and outputting image data have probably received the least amount of attention by comparison. Their practical importance cannot be overlooked, since processing speed is meaningless if the architecture is I/O-bound. The CLIP4 parallel architecture (Preston et al., 1979; Duff, 1978) used a large shift register to acquire imagery from a camera and to shift it into the memories of the CLIP4 array. The Massively Parallel Processor (MPP) is a two-dimensional array processor that uses staging memories to interface to the processor array (Batcher, 1980). The effect on algorithm design is to encourage structuring computations so that they may be overlapped with input/output. Finally, architecture-driven approaches in general introduce a new problem - that of mapping computations onto the arrays. Since the organization of these architectures is influenced by constraints often unrelated to the computations being performed, there may be no natural fit between the structure of the computation and the organization of the architecture. The resulting need to “fit” the algorithms onto the machine is a very complex problem. By their very nature, most application-driven approaches do not have this problem.
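A back-of-the-envelope model makes the granularity point above concrete. In the sketch below (our own cost model; the latency and bandwidth figures are assumptions chosen for illustration, not measurements of any cited machine), exchanging a 512 x 512 byte image in 64-pixel blocks is crippling when each synchronization costs a millisecond but nearly free when it costs a microsecond:

def exchange_time(n_pixels, block, latency_s, bytes_per_s):
    """Crude model: per-message synchronization latency plus transfer
    time (one byte per pixel) for exchanging n_pixels in fixed blocks."""
    messages = -(-n_pixels // block)              # ceiling division
    return messages * latency_s + n_pixels / bytes_per_s

n = 512 * 512                                     # one 512 x 512 byte image
for latency in (1e-3, 1e-6):                      # ms-scale vs. us-scale sync
    for block in (64, 65536):
        t = exchange_time(n, block, latency, 10e6)
        print(f"latency {latency:.0e} s, block {block:>6} px: {t * 1e3:8.1f} ms")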
III. ARCHITECTURE-DRIVEN APPROACHES
An architecture-driven approach to parallel computer vision is essentially a migration of capabilities from the hardware architecture through the levels of system organization shown in Fig. 1. Execution mechanisms, programming-language features, and appropriate algorithms are successively determined by the design of the hardware/software architecture, which has already been fixed. The early evolution of image processing architectures followed this path. It should be pointed out that from the literature, it is not always exactly clear what the motivations are for a particular design. Our discussion makes assumptions and inferences in classifying an approach as architecture- or application-driven. At the very least, we believe the chosen examples do reflect the organizations that result from adopting an architecture-driven or application-driven philosophy. We found it informative to distinguish between architectures based on the granularity of the parallel computation they support. As a result, the following discussion refers to fine-grained parallel architectures and coarse-grained parallel architectures. Since a precise definition of granularity is elusive, the
latter is intended to include architectures subjectively classified as medium-grained. We have found that such a distinction groups architectures that have similar effects on the design of parallel computer vision algorithms.
A. Fine-Grained Parallel Architectures
One of the earliest proposals for parallel image processing architectures was that of a two-dimensional array of processing elements (PEs), with each processing element connected to four neighbors (Unger, 1958). The original motivation stemmed from the fact that this organization matched the structure of the image data and, therefore, local operations could be performed at high speeds. The repetitive nature of many of these operations, e.g., convolution, edge detection, etc., prompted the use of the SIMD mode of operation, in which all PEs executed the same operation in lockstep on local data. Such arrays can also be viewed as application-driven in their organization, although only for what we now know to be a relatively small segment of the range of computer vision algorithms. Subsequent advances in circuit technology made it feasible to implement such large arrays of PEs. Moreover, the property of a single type of general-purpose processor interconnected in a regular pattern was compatible with the constraints of VLSI technology. Many such SIMD processor arrays were proposed and fabricated, including the Stanford VLSI array (Lowry and Miller, 1981), the Massively Parallel Processor (Batcher, 1980), the Cellular Logic Image Processor (CLIP) series of machines at University College London (Preston et al., 1979), the Finite Element Machine (Jordan, 1978), the Distributed Array Processor (DAP) (Hunt and Reddaway, 1983), and the Binary Array Processor (Reeves, 1980). Two representative examples of the state of the art of such processor arrays are Thinking Machines' Connection Machine (Hillis, 1984) and the CLIP7 processor array (Fountain et al., 1988). The architecture of the Connection Machine was motivated by the desire to exploit the large degree of data parallelism in applications such as device-level simulation, image understanding, and artificial intelligence. Connection Machines can comprise from several thousand to several million simple bit-serial processing elements (PEs) interconnected in a regular two-dimensional array. Groups of processors (16 in the CM-1) are also interconnected in a multidimensional binary hypercube to offset the long communication delays through the mesh between some processing elements. Figure 4 shows a CM-1 PE.
FIGURE 4. Architecture of the Connection Machine: each processing element pairs 4K bits of memory with a bit-serial ALU.
The PE has 4K bits of storage and an assortment of status bits and flags; it can perform simple bit-serial boolean and arithmetic operations. In three cycles, a PE can read two bits and a flag, perform an operation, and write a single memory bit and a flag. The array as a whole can be viewed as a smart memory operating under the control of the host, as illustrated in Fig. 4. Parallelism is achieved by simultaneous operations on a large number of image pixels. Low-level operations are particularly efficient. Flexible abstractions are provided to perform global operations such as sorting, searching, classes of data reduction operations, etc. The massive parallelism provided by the current and future generations of Connection Machines makes them attractive in addressing the computational demands of data-intensive applications, such as low-level image processing.
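The lockstep, flag-masked style of execution can be mimicked in a few lines. The sketch below is a toy model in the spirit of the design just described; it is not Connection Machine code, and its operations merely stand in for the real bit-serial instruction set:

class SIMDArray:
    """Toy SIMD array: the host broadcasts one operation per step and
    every processing element applies it to its own local value in
    lockstep; a per-PE context flag (cf. the CM status bits and flags)
    masks participation."""

    def __init__(self, data):
        self.mem = list(data)                  # one local value per PE
        self.flag = [True] * len(data)         # one context flag per PE

    def where(self, predicate):                # set flags from local data
        self.flag = [predicate(v) for v in self.mem]

    def apply(self, op):                       # broadcast one instruction
        self.mem = [op(v) if f else v for v, f in zip(self.mem, self.flag)]

# Halve only the "pixels" brighter than 100, leaving the rest untouched.
pe = SIMDArray([30, 150, 90, 200])
pe.where(lambda v: v > 100)
pe.apply(lambda v: v // 2)
print(pe.mem)                                  # [30, 75, 90, 100]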
FIGURE 5. Architecture of the CLIP7 chip.
Another recent PE design is based on the CLIP7 chip (Fountain et al., 1988). The design of this chip is the result of a successful program at University College London in developing and experimenting with two-dimensional processor arrays for parallel image processing. The last such array was the 96 x 96 CLIP4 array (Duff, 1978). The CLIP7 chip was based on the lessons learned from that experience and from subsequent, more advanced designs. Figure 5 shows the organization of the CLIP7 chip. It is a 16-bit processor with an 8-bit external data path and is partitioned into data-processing and input/output sections. The chip also provides an interface to off-chip memory. While it is intended to be configured into SIMD arrays, autonomy in responding to local data-dependent conditions is provided via local status bits and some degree of local control. The design of the CLIP7A, a linear array of PEs configured with the CLIP7 chip, illustrates the degree of flexibility in configuring the chip into multidimensional arrays. Figure 6 shows a single cell.

FIGURE 6. Architecture of the CLIP7A processing element.

The motivation for initially constructing a linear array was to demonstrate the ease with which higher-dimensional arrays could be constructed or emulated. In a CLIP7A PE, one processor is used for data manipulation while the other is used for address generation. In conjunction with the capability for local control, this makes the PE very flexible and relatively powerful compared to the simpler bit-serial PEs used in many other arrays. This makes more general use of the PEs possible and likely. The CLIP7A PE and the Connection Machine PE represent opposite ends of the spectrum of design choices for PEs in state-of-the-art two-dimensional arrays. The principal criticism of the use of such arrays concerns their dependence on the structure of the input and output data. The power of these arrays is strongly dependent upon the relationship between the spatial distribution of the image data, the corresponding distribution of the output data, and the proximity between a PE and its input data (i.e., from other PEs). Once these relationships are altered, the effectiveness of the arrays degrades rapidly. Thus, a great deal of care is taken in the design of algorithms to attempt to preserve these relationships, or at least to minimize the effects when such arrays are applied at higher levels of processing (Wood, 1986). In general, unless inter-PE communication can be
orchestrated to occur within local neighborhoods, communication rapidly degrades the effectiveness of the array and can become the principal determinant of performance. The need to carefully design algorithms for such arrays is illustrated by a parallel algorithm for image normalization on a mesh-connected computer (Lee et al., 1987). Image normalization is the process of constructing invariant representations of objects. These descriptions can be used to detect objects in scenes where their relative positions are not known a priori. The normalization algorithm comprises three steps: translation, scaling, and rotation. The solution considers N x N binary images with one pixel per PE. Normalization reduces to a process of remapping the pixels to PEs. The remapping is implemented by communicating pixels to their new PEs. Inter-PE communication follows the store-and-forward communication model. Potentially, pixels may be remapped to PEs relatively far apart in the array. The solution provided by Lee et al. (1987) specifies a number of synchronous communication patterns. These patterns are iteratively applied in conjunction with a local routing algorithm at each PE. Buffering within and between PEs resolves routing conflicts over inter-PE links. A cycle is an iterative application of a sequence of synchronous data transfers. For an N x N image, translation requires at worst N/2 cycles, rotation 2N cycles, and scaling at worst O(N) cycles. The worst case results from potentially having to transfer data across the array at each step, and therefore the overall complexity is on the order of the width of the array. Note that this is for binary images with one bit per PE. Even if larger blocks of pixels per PE are used, the overall execution time is Ω(N). This illustrates the importance of keeping communication local and further underscores the difficulty of applying such arrays to higher levels of processing. Similar lower-bound results have been shown for other algorithms (Gentleman, 1978; Miller and Stout, 1985). More recently, improved algorithms for convolutions on mesh-connected multicomputers have been developed (Ranka and Sahni, 1990). Previous algorithms based on SIMD arrays required broadcast communication. Some of these models support constant-time broadcast communication, while others support logarithmic-time communication. The algorithms proposed by Ranka and Sahni (1990) require no broadcast. As a result, the algorithms are applicable to both SIMD and MIMD arrays. For an N x N image and an M x M template, previous algorithms required O(M²q) time, where q is the resolution of the computed values. The current algorithm requires O(M²r), where r is the maximum of the resolutions of the image and template values. Therefore, this algorithm is more efficient when the size of the image and template values is small compared to the size of the convolution values.
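The linear cost of the remapping discussed above is easy to reproduce in simulation. The sketch below (ours, and far simpler than the routing scheme of Lee et al.) translates a one-pixel-per-PE image using only nearest-neighbor transfers, one synchronous hop per cycle, so the cycle count grows with the shift distance rather than staying constant:

def mesh_translate(image, dr, dc):
    """Translate a one-pixel-per-PE image on an N x N mesh using only
    nearest-neighbor transfers; returns the result and the cycle count.
    Every PE forwards its pixel one hop per cycle, so the cost is the
    L1 shift distance, O(N) in the worst case, as noted in the text."""
    n = len(image)
    grid = [row[:] for row in image]
    cycles = 0
    for steps, (dy, dx) in ((abs(dr), (1 if dr > 0 else -1, 0)),
                            (abs(dc), (0, 1 if dc > 0 else -1))):
        for _ in range(steps):                 # one synchronous hop per cycle
            new = [[0] * n for _ in range(n)]
            for r in range(n):
                for c in range(n):
                    nr, nc = r + dy, c + dx
                    if 0 <= nr < n and 0 <= nc < n:
                        new[nr][nc] = grid[r][c]   # edge pixels fall off
            grid, cycles = new, cycles + 1
    return grid, cycles

img = [[1 if r == c else 0 for c in range(8)] for r in range(8)]
shifted, cycles = mesh_translate(img, 3, 2)
print(cycles)                                  # 5 cycles for a (3, 2) shift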
The alternative dominant model of fine-grained parallel image processing is the systolic model of computation. Several solutions exist for low-level image processing algorithms on multidimensional systolic arrays (Webb and Kanade, 1986; Kung, 1984). Different algorithms may be realized by organizing the flow of data streams through such arrays. Systolic solutions have been obtained for a variety of problems, such as edge detection, the FFT, connected component labeling, LU decomposition, matrix multiplication, etc. However, difficult problems arise in determining whether a systolic solution exists for a given problem and, if so, in deriving such solutions. The experience gained from the large body of work in the development of systolic computing is now making its way into language constructs, libraries, and compiler technology for modern systolic arrays. Representative of the state of the art is the CMU Warp project (Webb and Kanade, 1986). The CMU Warp is a linear systolic array of 10 cells developed for use in computer vision. It was designed to provide high-speed operation for a number of low-level image processing tasks, but its flexibility makes it possible to program a variety of tasks. Figure 7 shows the architecture of the Warp cell.

FIGURE 7. A Warp cell.

Each cell can provide up to 10 MFLOPS and consists of a floating-point adder and multiplier, as well as extensive support for local control. Also included in the cell are 4K of storage and a microstore. The Warp operates as an attached processor to a Sun host through a 40 Mbytes/s I/O interface. The cells are linked by three communication lines - two for data and one for addressing. The array can operate as a purely systolic array, making extensive use of existing systolic solutions for various problems, or it can operate as a set of processors on a bus in the SIMD or MIMD mode, within the limits of the addressing constraints (Webb and Kanade, 1986). Programming the Warp array is based on recognizing algorithms as being local, global, or partial. In local algorithms, each output depends on only a relatively small number of inputs. These are the easiest to program. In global algorithms, each output depends on all of the inputs. These can be the most difficult to program. However, if the total input data can fit inside a Warp cell, results can still be produced continuously, because partial results that form the outputs can be passed between cells. For example, this happens with the 1,024-point FFT. If the input data is too large to fit inside a cell but the output data can fit, partial results can be accumulated and read out at the end (e.g., histogramming). If both the input and output are too large to fit inside Warp, several passes are required. Partial algorithms require much more communication between the Warp array and the host.
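The histogramming case, in which the output fits in a cell even though the input does not, can be sketched as follows (our simulation of the idea, not actual Warp code; the round-robin distribution of pixels and the bin count are assumptions made for illustration):

import random

NCELLS, NBINS = 10, 8                          # a Warp-sized linear array

def array_histogram(pixels):
    """Each cell accumulates a partial histogram over its share of the
    input; at the end the partials are summed as they drain through the
    array, so only the small output ever leaves the cells."""
    partial = [[0] * NBINS for _ in range(NCELLS)]
    for i, p in enumerate(pixels):             # distribute the input round-robin
        partial[i % NCELLS][p * NBINS // 256] += 1
    result = [0] * NBINS                       # drain phase: sum cell by cell
    for cell in range(NCELLS):
        result = [a + b for a, b in zip(result, partial[cell])]
    return result

random.seed(0)
pixels = [random.randrange(256) for _ in range(1000)]
hist = array_histogram(pixels)
print(hist, sum(hist))                         # eight bins, totalling 1000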
Both of the dominant approaches - multidimensional SIMD and MIMD arrays, and systolic arrays - will continue to evolve and make use of future hardware and software developments. Optical computing is an emerging technology that is also beginning to attract interest for the development of fine-grained parallel image processing architectures. Many image processing algorithms on (N x N) mesh-based VLSI architectures take O(N) time (Miller and Stout, 1985). This can be reduced to O(log N) for a limited class of problems when using hierarchically structured architectures, such as pyramids or a mesh of trees (Miller and Stout, 1988; Kumar, 1986). It is an accepted fact that communication complexity is the dominant factor for these algorithms. This has resulted in a proposal for architectures using optical interconnects (Eshaghian and Kumar, 1989). The architecture is based on an optical model of computation that allows unit-delay interconnects and can efficiently simulate each step of an N-processor parallel random access machine (PRAM) in O(log N) time using O(N/log N) processors. This makes most existing algorithms derived for PRAMs available on the proposed architecture. Several potential implementation strategies for optical interconnects are briefly discussed: free-space interconnects, electro-optical arrays, and electro-optical crossbars. Examples of the complexity of image processing algorithms for labeling, for the computation of the convex hull, for template matching, and for iterative algorithms are discussed. The use of optical interconnects realizes a unit-time-delay network that simultaneously addresses the problem of communication complexity (by a factor of log N) and the physical barriers to realizing similar performance with electrical interconnects. While the technology makes the use of existing algorithms more efficient, by itself it does not address the difficulty of formulating parallel algorithms to match the structure of these arrays.
B. Coarse-Grained Parallel Architectures
As relatively powerful single-board computers became available, coarse-grained multiprocessor architectures followed suit. The early machines suffered from a lack of programming support. Moreover, they were built with off-the-shelf components and therefore were not very efficient, since most
existing components were designed for use with sequential processors. These lessons have led to second- and third-generation machines that are considerably more efficient and productive. The availability of such coarse-grained machines began to encourage experimentation with various image algorithms that intuitively appeared able to take advantage of that level of parallelism, usually in intermediate- and high-level computer vision. One example is the Butterfly multiprocessor, a Multiple Instruction Stream Multiple Data Stream (MIMD) architecture that exploits parallelism at a relatively coarse level of granularity. The Butterfly is a shared-memory parallel architecture in which processors are interconnected to memory through a multistage interconnection network. Figure 8 shows the high-level organization of the Butterfly. Physically, each memory module is associated with a processor, but it is accessible over the interconnection network by any other processor. The machine therefore realizes a nonuniform-access shared memory. The network is circuit-switched, and remote accesses take logarithmic time relative to local memory accesses. Thus, from a performance viewpoint it is useful to organize data structures and accesses such that remote accesses are minimized. The structure and organization suggest that a relatively large number of operations should be performed on a processor between accesses to shared data. This, in conjunction with the flexible control structure, motivated the use of the Butterfly in the recognition of 3-D objects from range images (Bhanu and Nuttal, 1989).

FIGURE 8. The organization of the Butterfly multiprocessor: processors connected to memory modules through the Butterfly interconnection network.
Object recognition is a computationally demanding task that can use contextual information, expert knowledge, rule-based control, etc. Implementation necessitates support for a flexible control structure and for interaction between tasks at the levels of segmentation, the computation of surface properties, object surface characterization, etc. The Butterfly provided a means for evaluating whether such coarse-grained machines are naturally suited to such tasks (Bhanu and Nuttal, 1989). This project implemented a goal-directed object recognition system to investigate the utility of nonuniform-access shared-memory machines. As in most rule-based systems, this system incorporates several sets of control and high-level processing rules to direct multiple processors in an optimized search for a goal object. Figure 9 shows the general processing steps performed on each object.

Begin
  Find objects in the image;
  Identify surface types;
  Segment surfaces;
  Label surfaces;
  Find surface adjacencies;
  Resolve occlusion;
End

FIGURE 9. Processing steps in parallel object recognition from range images.

Parallelism is achieved by performing these steps simultaneously for several objects, as well as by distributing some of the steps across several processors. The overall control strategy is as follows. Each processor is capable of performing any step (see Fig. 9) on any object. Each processor possesses an identical set of control rules used in the search for useful work. Work can be found in a set of either data queues or job queues. The former contain image objects awaiting the next stage of processing (Fig. 9), while the latter contain specific steps to be performed. Processing terminates on the occurrence of a match or on the completion of processing for all objects.
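The control strategy just described can be paraphrased as a work-queue program. In the sketch below (ours; the stage names follow Fig. 9, but the single shared queue is a simplification of the separate data and job queues described above), identical workers repeatedly claim any object at any stage until no work remains:

import queue
import threading

STAGES = ["find", "identify", "segment", "label", "adjacency", "occlusion"]

work = queue.Queue()
for obj in ("obj-A", "obj-B", "obj-C"):
    work.put((obj, 0))                         # every object starts at stage 0

done, lock = [], threading.Lock()

def processor():
    """Every processor runs the same control rules: take any waiting
    object, perform its next processing step, and requeue or retire it."""
    while True:
        try:
            obj, stage = work.get(timeout=0.1)
        except queue.Empty:
            return                             # no useful work left anywhere
        # ... STAGES[stage] would be performed on obj here ...
        if stage + 1 < len(STAGES):
            work.put((obj, stage + 1))         # object awaits its next stage
        else:
            with lock:
                done.append(obj)
        work.task_done()

threads = [threading.Thread(target=processor) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(done))                            # ['obj-A', 'obj-B', 'obj-C']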
The results were encouraging. Near-linear speedup was observed with up to 12 processors. This indicated that efforts to use parallelism at the object level (up to seven processors) and within objects (up to 12 processors) were successful. A goal-directed search, however, can reduce the achievable speedup, depending upon when the goal object is found. Although speedup was diminished in the goal-directed approach, overall speedups still ranged from 6.5 to 10.2 for the images examined. This indicates that sufficient compute-intensive processing was available to offset some of the effects of the goal-directed search. Processor utilization measurements showed a fairly even distribution of work among the processors and, as expected, utilization improved with the complexity of the images (i.e., the complexity and number of objects). Overall, it was possible to achieve a balanced workload with as many as 16 processors and as few as five objects. The results of this project are encouraging for the use of coarse-grained machines for high-level computer vision tasks. A communicating sequential process model of computation has also been favored for expressing and implementing high-level computer vision algorithms on largely unstructured data. WORMOS is a distributed software architecture for computer vision based on this view (Sahiner et al., 1988). This software architecture executes on a distributed hardware platform and supports distributed applications written as sequential processes that communicate via messages. The distributed applications themselves are referred to as worms, and the hardware architecture is referred to as the earth. Figure 10 illustrates the distributed hardware environment.

FIGURE 10. Distributed architecture for WORMOS.

The network interconnects a number of Motorola 68000 processors, each potentially expanded to include several additional processors. A worm consists of several programs called segments. Each segment can run on a single processor and is essentially
a single communicating process. Segments are a source of parallelism at the functional level. The primary advantages of coarse-grained computation and of this approach to structuring distributed applications are the full and balanced use of heterogeneous processing resources and the relative ease of adapting to changes in hardware resources (whether due to growth or to failures). Worms for several high-level vision algorithms have been demonstrated, including region labeling and object matching. A third example of a coarse-grained processing architecture is the Hierarchical Bus Architecture (HBA) developed by the Hughes Artificial Intelligence Center (Wallace and Howard, 1989). The goal of the HBA system was to reduce both the execution and programming time of image processing algorithms at a reasonable cost, i.e., a machine that was not too specialized, too costly, or I/O-bound. Figure 11 shows the organization of the HBA system.

FIGURE 11. Hierarchical bus architecture.

The current architecture consists of 24 MC68020 processors interconnected over a novel digital video bus. Video data is broadcast over the bus, and the programmable bus interfaces of the processors can capture any contiguous set of pixels. This enables very flexible image partitioning and distribution schemes. The Combus (Fig. 11) is a 16-bit bus with the host computer as the arbiter. The programming environment for the HBA system uses one program for the host and one for each HBA processor. The programs themselves may all be identical, but from the point of view of interprocessor synchronization, they operate in the MIMD mode. A number of benchmarks, including the DARPA benchmarks, have been implemented and evaluated on the HBA system. The limitations are the slow communication bus and the relatively slow processors. However, the architecture is typical of bus-oriented approaches to the design of parallel image processing architectures. The designers are currently looking at the feasibility of a next-generation HBA system using the optical crossbars being studied at Hughes Research Laboratories, and perhaps replacing the processors with much faster processors from the CMU Warp system. The HBA architecture is a good example of an architecture-driven coarse-grained machine. The design is relatively simple - processor boards communicating over a simple bus - and uses relatively well understood technology. The innovative aspects lie in the application of this machine to problems in computer vision. Several other organizations with PEs interconnected by a fixed set of data paths have also been proposed. Shift-register rings, buses, and star organizations are a few examples of interconnection networks proposed for this purpose (Yalamanchili et al., 1985). The performance of all these organizations depends heavily upon how well they can support the communication requirements of the computer vision algorithms. It became apparent that flexibility in computation and communication is important in order to
efficiently support a wide range of algorithms. As it became clear that the communication overhead in fixed-interconnection architectures could dominate performance, attention shifted to more flexible networks, leading to the advent of parallel architectures consisting of multiple programmable processors interconnected by reconfigurable interconnection networks. These networks provide for the hardware reconfiguration of paths between processors and memory or between processors. Crossbar networks can realize any interconnection pattern between PEs; however, their O(P²) cost for a system of P processing elements makes them prohibitively expensive, even for moderately sized systems. Networks made up of “stages” of smaller crossbar switches have been investigated. These networks possess O(P log_B P) cost functions when built from B x B crossbar switches. Examples of such systems applied to image processing include the PASM (Partitionable SIMD/MIMD) system (Siegel et al., 1981), the PUMPS architecture (Briggs et al., 1982), and the Texas Reconfigurable Array Computer (TRAC) (Premkumar et al., 1980). Descriptions of these architectures in the context of image processing can be found elsewhere (Yalamanchili et al., 1985; Yalamanchili and Aggarwal, 1985c).
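The cost comparison is simple arithmetic, sketched below (the formulas follow the O(P²) and O(P log_B P) expressions above; the numbers are idealized crosspoint counts, not costs of any real network):

def crossbar_cost(p):
    """Crosspoints in a full P x P crossbar: grows as P squared."""
    return p * p

def multistage_cost(p, b):
    """Crosspoints in a multistage network of B x B switches:
    log_B(P) stages, each containing P/B switches of B*B crosspoints."""
    stages, size = 0, 1
    while size < p:                            # count log_B(P) stages exactly
        size *= b
        stages += 1
    return stages * (p // b) * b * b

for p in (64, 1024, 4096):
    print(f"P = {p:5}: crossbar {crossbar_cost(p):>10,}   "
          f"multistage (B = 4) {multistage_cost(p, 4):>7,}")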
C. The Mapping Problem in Architecture-Driven Methodologies
An associated disadvantage of architecture-driven approaches is that it is often necessary to design algorithms customized to the architecture. Alternatively, if algorithms are independently designed and structured, one is faced with the problem of assigning the computations to processors and scheduling their execution in a manner that makes efficient use of the array. This is the well-known mapping problem (Bokhari, 1981a). In its most general form, the mapping problem is computationally equivalent to the graph isomorphism problem, one of the classical unsolved combinatorial problems. Finding an exact, efficient (i.e., polynomial-time) algorithm for solving this problem for arbitrary graphs appears unlikely. Therefore, the objective typically is to develop a set of heuristics that can be used to map a set of parallel tasks onto a parallel architecture in order to maximize system use and throughput. Many approaches to the mapping problem provide solutions for structured problems executed on structured multiprocessor architectures (Chan and Saad, 1986; Chan and Chan, 1988; Chan, 1988; Choudhary and Aggarwal, 1990). In the absence of optimal solutions, attempts are made to compute assignments with acceptable performance. Alternatively, heuristic algorithms exist that attempt to provide acceptable solutions for arbitrarily unstructured programs and networks of processors. Combinatorial search strategies
The volume of results and research efforts is too large to outline here. We only wish to emphasize the importance of solutions to the mapping problem in efficiently using most parallel architectures.
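To convey the flavor of such heuristics, the sketch below implements a generic greedy assignment of a weighted task graph onto processors (our own illustration, not any of the cited algorithms); it trades processor load against the communication cost of separating adjacent tasks:

```python
def greedy_map(tasks, edges, num_procs, comm_cost=1.0):
    """Assign each task to a processor, balancing load against the
    communication cost of separating adjacent tasks.

    tasks: {task: compute_weight}; edges: {(t1, t2): traffic}.
    Returns {task: processor}."""
    load = [0.0] * num_procs
    assign = {}
    # Place heavier tasks first (a common greedy ordering).
    for t in sorted(tasks, key=tasks.get, reverse=True):
        best_p, best_cost = None, None
        for p in range(num_procs):
            # Traffic incurred with already-placed neighbors on other processors.
            comm = sum(w for (a, b), w in edges.items()
                       if t in (a, b) and
                          assign.get(b if a == t else a, p) != p)
            cost = load[p] + tasks[t] + comm_cost * comm
            if best_cost is None or cost < best_cost:
                best_p, best_cost = p, cost
        assign[t] = best_p
        load[best_p] += tasks[t]
    return assign

# Example: a four-task vision pipeline mapped onto two processors.
tasks = {"smooth": 4.0, "edges": 3.0, "link": 2.0, "label": 2.0}
edges = {("smooth", "edges"): 5.0, ("edges", "link"): 1.0,
         ("link", "label"): 1.0}
print(greedy_map(tasks, edges, num_procs=2))
# {'smooth': 0, 'edges': 0, 'link': 1, 'label': 1}
```

The heavily communicating pair stays together while the lighter tasks migrate to the idle processor; production heuristics (e.g., simulated annealing in Bollinger and Midkiff, 1988) refine such initial placements iteratively.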
IV. APPLICATION-DRIVEN APPROACHES
In application-driven approaches, capabilities are "forced the other way", i.e., a top-down approach. Provided with a set of algorithms characterizing the application, the language features and execution mechanisms are chosen to reflect support for the algorithms. The hardware/software organization is designed around support for these execution mechanisms and languages and, consequently, around support for the algorithms. Application requirements force design choices, not architecture capabilities. The application-driven approach, in some sense, complements past experience in designing parallel architectures, since one cannot altogether ignore technological considerations. As with architecture-driven approaches, it is not always possible to completely determine that a specific design was influenced only (or even largely) by the characteristics of the application. At the very least, we believe the examples discussed here are representative of the outcomes of application-driven approaches.

We can distinguish between two underlying themes. Over the range of computer vision algorithms - from low to high levels - good matches between specific algorithms and architectures are apparent, e.g., two-dimensional arrays and neighborhood operations, pipelines and image-to-image transforms, etc. But architectures with a broader range of applications are not immediately apparent. One approach is to construct parallel architectures by combining several different organizations, e.g., a two-dimensional processor array interfaced to a neural network architecture. The goal is to use distinct organizations for hosting distinct classes of computations. We refer to such architectures as hybrid architectures. A second approach is closer to a purely application-driven approach. One or more models of high-level computer vision are formulated. Architectures are then designed to support these models. The architectures realized by such an approach are referred to as model-driven architectures.
A. Hybrid Parallel Architectures

A typical example of this class of parallel computer vision architectures is the Video Image Processor (VIP) (Yalamanchili et al., 1987).
This is a parallel architecture designed for space station applications. The architecture was derived from a study of the requirements of the algorithms necessary to realize the desired functionality of several space station applications such as satellite servicing, rendezvous and proximity operations, and inspection. Architectural solutions were selected based on a simulation of the execution of these algorithms on various architectural organizations and variants. A great deal of emphasis was placed on the ability of the system to evolve and grow to accommodate new applications as they were identified. The result was a hierarchical parallel architecture, illustrated in Fig. 12. The architecture is organized as three levels. The first level is responsible for all low-level operations. It is constructed as a series of custom array processors that communicate via shared memory. Each array processor is a linear array of up to 32 processors. The array can process an image partitioned across the individual processors. Each such array processor performs an image-to-image transform such as a filtering operation or image normalization. After all low-level image-to-image transforms are complete, the images are processed by the level 2 processors to produce high-level descriptions of the contents of the imaged scenes. Based on the communication requirements of algorithms at this level, a dual-bus architecture was selected as the interconnection network over the chordal ring and hypercube topologies. The level 2 and 3 architectures operate in the MIMD mode. A novel feature at this level is the distinction between memory modules for symbolic and numeric data structures. The motivation is to make use of address generation hardware to realize high-speed access to array structures that is not slowed down by the pointer-intensive dereferencing typically required for access to dynamic structures, such as semantic nets.
FIGURE 12. Organization of the Video Image Processor. (FP: floating-point processor; SP: symbolic processor; NM: numeric memory; SM: symbolic memory.)
Such considerations also result in an organization at level 3 where all inferencing operations are performed by special-purpose symbolic processors. Level 3 also comprises distinct high-speed floating-point array processors for numerically intensive computations such as motion estimation, 3-D position estimation, and trajectory and path planning. We thus see a mixture of special-purpose hardware at various levels in an attempt to efficiently satisfy the performance requirements of a range of algorithms.

Another example of a hybrid architecture is the Disputer - a parallel architecture developed for both image processing and graphics applications (Page, 1988). The Disputer comprises a 16 x 16 SIMD processor array for low-level image operations coupled to a 7 x 6 transputer array for intermediate and high-level computer vision algorithms. Figure 13 shows the organization of the Disputer. The SIMD PE is a bit-serial processor with bidirectional links to its four nearest neighbors. It has 256K bits of local memory, providing a total of 8 Mbytes for the whole array - sufficient for many multi-image operations. Row and column broadcast lines carry instructions from the central controller. The transputer array makes use of T414 32-bit transputers with 2 Kbytes of on-chip memory. The internal links in the transputer array are hardwired, while the external boundary links are programmable and can be structured to communicate with other transputers, the controller, and the development system. The Disputer is clearly based on the low-level/high-level view of computer vision algorithms: functions operating on images are performed in the SIMD array, while non-image structures are processed by the transputer array.

The WISARD system adopts a similar structure, but encourages much more interaction between the components typically responsible for low- and high-level processing (Aleksander, 1985). It is argued that a versatile image understanding system must entail the cooperative efforts of at least three components: a parallel image processor, a semiparallel pattern recognizer, and an intelligent knowledge-based structure.
FIGURE 13. Organization of the Disputer array: a 16 x 16 SIMD processor array coupled to a 7 x 6 transputer array.
FIGURE 14. Organization of the WISARD architecture: intelligent knowledge base, image processing, and pattern recognition components, with an interface to the environment.
Figure 14 shows the overall structure and its relationship with the external world. The WISARD system is based on the view that in conventional architectures, each component does not take sufficient advantage of the other two. A closer relationship between the image processing and pattern recognition components is implied. The function of these two components is to translate the sensory data into symbols and relationships between them. The intelligent knowledge-based structure is to draw inferences from these structures and to make assertions about the contents of the imaged scene. The unique feature of this system is the cooperative interaction among all of the components. For example, the knowledge-based structure may infer that additional information from the image is required before ambiguity can be resolved, and may even go so far as to suggest parametric changes to low-level algorithms or regions of the image that warrant further scrutiny. This cyclic process may continue, with the quality of the results steadily improving.
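The cyclic interaction can be pictured as a simple control loop. The following sketch is our illustrative rendering of that idea, not WISARD's actual control structure; the stage callables are hypothetical placeholders for the three components:

```python
def interpret_scene(image, params, stages, max_rounds=5):
    """Illustrative WISARD-style feedback loop (hypothetical API).

    stages supplies the cooperating components as callables:
      "low"       - parallel image processor
      "recognize" - semiparallel pattern recognizer
      "infer"     - intelligent knowledge-based structure
      "refine"    - the knowledge base's suggested parameter/region changes
    """
    region = None            # start by processing the whole image
    interpretation = None
    for _ in range(max_rounds):
        features = stages["low"](image, params, region)
        symbols = stages["recognize"](features)
        interpretation, ambiguity = stages["infer"](symbols)
        if not ambiguity:    # assertions about the scene are unambiguous
            break
        # Feedback: revise low-level parameters and/or focus on a region
        # of the image that warrants further scrutiny.
        params, region = stages["refine"](ambiguity, params)
    return interpretation
```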
FIGURE 15. The VisTA architecture: a pipeline of three stages, from input images through the sliding memory plane array processor to scene interpretation.
Another hybrid organization based on the low-intermediate-high-level view of computer vision is the Vision Tri-Architecture (VisTA) system. The system comprises a pipeline of three distinct architectures, each constructed to exploit the unique features of low, intermediate, and high-level computer vision, respectively, as illustrated in Fig. 15. The first stage comprises the sliding memory plane array processor (Sunwoo and Aggarwal, 1990a), which was designed specifically to address the problems of I/O and interprocessor communication in low-level operations. The organization of this array is illustrated in Fig. 16. The two-dimensional PE array is augmented with three more arrays - the D and D' arrays for input and output, and the S array for interprocessor communication. These arrays can synchronously shift data in the horizontal or vertical direction. The elements of the S array are directly accessible to the corresponding PEs and can be latched into a local register. The motivation for this organization is to overlap I/O and inter-PE communication as much as possible. The control unit is partitioned into two segments to control the sliding memory planes and the PEs, respectively. While normal instruction broadcast and execution is taking place, the D and D' planes can be shifting image data into and out of the machine, respectively. Inter-PE communication can also be overlapped in the following manner. All PEs load the corresponding elements of the S array with data, e.g., a pixel value. The data transfers take place in parallel between PEs, effectively realizing some permutation of the data in the S array. When all pieces of data have reached their destinations, they are latched into a local register, and the corresponding S array element can be loaded with a new value. While this new set of values is being permuted in the S array, the PEs can be executing; furthermore, I/O can also proceed concurrently. Such a flexible organization for two-dimensional arrays reduces the dependence of performance on local, tightly orchestrated communication, as long as the ratio of computation to communication can be maintained at a sufficiently high level.
FIGURE 16. The Sliding Memory Plane architecture (D, D', S, and PE planes).
The desirable magnitude of this ratio is proportional to the longest path between communicating PEs. This organization presents a challenge for a restructuring SIMD compiler, which must reorder machine operations so as to maximize the overlap of computation, communication, and I/O. Many of the considerations governing instruction scheduling in pipelined Reduced Instruction Set Computers (RISC) are directly applicable.

The architecture proposed as the second stage of the VisTA pipeline, i.e., for intermediate-level vision, is representative of a class of flexibly coupled multiprocessor architectures (FCM) (Sunwoo and Aggarwal, 1990b). The proposed design makes use of dynamically reconfigurable address and data buses to realize a flexible memory architecture. The overall organization is illustrated in Fig. 17. The rationale for this architecture is predicated upon a desire to combine the relatively well-understood technology of bus-based architectures, the programming advantages of shared address spaces, and the flexibility of fully asynchronous, loosely coupled operation. The system consists of N processing elements (PEs), each with local memory for program and intermediate results. The PEs can access N data memory modules over a partitionable bus. The set of switches is set by the control unit (CU). An I/O processor (IOP) and a communication bus for PE-CU communication complete the organization. The switches can be used to dynamically partition the bus into a number of distinct segments. All of the data memory modules that are connected on any segment form a single, contiguous physical address space, i.e., a partition consists of several processors and a shared address space, and thus operates as a tightly coupled machine. Distinct partitions interact in a loosely coupled manner. This memory organization is also referred to as a variable space memory organization. Inter-PE communication may now take place over the global communication bus or via shared memory modules that can be switched between partitions.
FIGURE 17. A flexibly coupled multiprocessor for intermediate-level computer vision.
It is possible to structure parallel computations that require propagation of information to other PEs by setting switches to selectively include memory modules in specific contiguous address spaces. For example, the data from memory module 0 can be made available to PE N - 1 by closing all of the switches. Parallelism is achieved by opening sets of switches (a toy model of this switch-based partitioning is sketched at the end of this subsection). At the system level, the architecture can be viewed as being dynamically reconfigured into shared memory partitions, with each partition being configured with a varying amount of memory and a variable number of processors. Algorithms for functions such as region labeling and median filtering have been implemented on a simulation of the FCM, and its performance compared to implementations on a conventional, loosely coupled, distributed memory machine. Speedups averaged up to about 6 for a 32-node system (Sunwoo and Aggarwal, 1990b).

An extension to the original single-bus FCM (Sunwoo and Aggarwal, 1990b) has also been proposed. The system shown in Fig. 17 is augmented with an additional partitionable address/data bus interconnecting the PEs to another set of memory modules. Each PE can now alternate between using memory modules on one bus and using those on the other as the source of data. Data I/O can now be double-buffered across the two memory modules associated with each PE, and potentially I/O and computation can be fully overlapped. Finally, the third stage (Sunwoo and Aggarwal, 1990c) of the VisTA architecture is based on a multiprocessor interconnected in a binary hypercube topology. This interconnection network is enhanced by selectively embedding switching capability in the hypercube links. In addition to the standard message passing supported by the hypercube, the switches can effectively map the physical address spaces of adjacent processors into a single contiguous address space in a manner similar to the scheme proposed for the FCM (the result is referred to as the FCHM). A similar approach of adding additional memory modules at each PE to provide for double-buffered (and therefore overlapped) I/O was also proposed. Perhaps the most unusual aspect of the VisTA system is its emphasis on I/O, while the most innovative is the notion of using dynamically restructurable physical address spaces to attempt to combine the advantages of shared memory, message passing, and concurrent I/O.

All of the architectures described in this section share the same philosophical approach - they use a combination of distinct organizations, each well matched to a subset of the algorithms of interest. A major obstacle to the widespread use of such architectures is the environment and support for programming such machines. The distinct computing paradigms currently appear difficult to accommodate simultaneously without placing a large burden on the applications programmer.
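Here is the promised toy model of the variable space memory idea (our own sketch, not the FCM implementation): closing a bus switch merges the memory modules on either side of it into one contiguous address space, so closing every switch makes module 0 visible to PE N - 1.

```python
def partitions(switches):
    """switches[i] connects bus segment i to segment i+1 (True = closed).
    Returns the groups of memory-module indices forming each contiguous
    shared address space."""
    groups, current = [], [0]
    for i, closed in enumerate(switches):
        if closed:
            current.append(i + 1)   # module i+1 joins the current partition
        else:
            groups.append(current)  # an open switch starts a new partition
            current = [i + 1]
    groups.append(current)
    return groups

N = 8
print(partitions([True] * (N - 1)))
# [[0, 1, 2, 3, 4, 5, 6, 7]] -- one tightly coupled partition; PE N-1 sees module 0
print(partitions([True, True, False, True, False, True, True]))
# [[0, 1, 2], [3, 4], [5, 6, 7]] -- three loosely coupled partitions
```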
FIGURE 18. Architecture of the Scan Line Array Processor.
B. Model-Driven Parallel Architectures

Model-driven architectures are not necessarily derived from a computational model. They can also be derived from a model of image input/output. Image input/output has been realized with large shift registers, staging memories, double-buffered memories, etc. One example of an architecture for intermediate-level vision predicated upon the scan line format for image acquisition is the Scan Line Array Processor (SLAP) (Fisher and Highnam, 1989). This is an architecture whose design is dictated by the structure and input/output format of the image data. As a result, we classify it as an application-driven architecture. Figure 18 shows the organization of the SLAP. The current version is designed with 512 processing elements (PEs). The PEs compute one word at a time, have a local register file, and operate at an instruction clock rate of 125 ns. The array is a synchronous SIMD array that operates on 512 x 512 images and is optimized to the raster format of the image data. The SLAP operates on each line of the image, as Fig. 18 illustrates. At a frame rate of 30 frames/s, the time available for a single scan line is about 58 µs - time for about 500 PE instructions. Many simple algorithms exist for computing image features in a single pass over the image, such as shape measures, run length coding, the construction of contours, etc. The organization is also used for medium-level processing tasks such as the computation of the Hough transform. Since the organization is well matched to the flow of image data, such computations can be carried out in real time.
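As an example of the single-pass style, the sketch below (ours; a serial rendering of the per-line work the SLAP's PEs would perform as the raster streams through) run-length encodes one scan line:

```python
def run_length_encode(line):
    """Encode one (non-empty) scan line as (value, length) runs in a
    single pass - the kind of per-line feature a scan-line array can
    produce in real time as the raster streams through."""
    runs = []
    value, length = line[0], 1
    for pixel in line[1:]:
        if pixel == value:
            length += 1
        else:
            runs.append((value, length))
            value, length = pixel, 1
    runs.append((value, length))
    return runs

print(run_length_encode([0, 0, 0, 1, 1, 0, 1, 1, 1, 1]))
# [(0, 3), (1, 2), (0, 1), (1, 4)]
```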
FIGURE 19. A model for parallel image processing: alternating computation and communication stages, with each communication stage composed of cycles separated by synchronization points.
Alternatively, one can define a computational model that encompasses many high-level functions. Architectures are then optimized to execute the operations that compose the model (Aggarwal and Yalamanchili, 1984). One of the first attempts towards this end is the definition of a synchronous model of parallel image processing (Yalamanchili and Aggarwal, 1985). Figure 19 shows graphically the processing structure defined by this model. Processing proceeds as an alternating sequence of computation stages and communication stages. A communication (computation) stage does not start until the preceding computation (communication) stage is completed. Each communication stage can be broken down into a sequence of cycles. In one cycle, processors transmit data and control information to each other. The source-destination pairs in each cycle can be represented as a permutation of the processor addresses. Thus, in a cycle each processor is potentially transmitting (receiving) information to (from) at most one other processor. The number of cycles in a communication stage is equal to the maximum number of processors to which any one processor must transmit information. Given a statically structured representation of the communication requirements in a stage, the iterative application of a maximal-matching algorithm for bipartite graphs can be used to derive a formal specification of the cycles that make up each communication stage. Each such permutation is realized in one or more passes through an interconnection network. Issues relevant to the synthesis of interconnection networks for algorithms formulated using such a model have also been explored. Based on this analysis, recirculating single-stage, generalized shuffle-exchange type networks were found to demonstrate very good cost-performance trade-offs for many intermediate and high-level processing algorithms (Yalamanchili and Aggarwal, 1985).
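A minimal sketch of this decomposition (ours, using a simple greedy pass rather than a true maximal-matching algorithm) splits the (source, destination) pairs of a stage into cycles, in each of which every processor sends and receives at most once:

```python
def decompose_into_cycles(pairs):
    """Split communication requests (src, dst) into cycles in which each
    processor sends to at most one destination and receives from at most
    one source.  Greedy; an exact maximal-matching algorithm would
    minimize the number of cycles."""
    remaining, cycles = list(pairs), []
    while remaining:
        senders, receivers, cycle, rest = set(), set(), [], []
        for s, d in remaining:
            if s not in senders and d not in receivers:
                cycle.append((s, d))
                senders.add(s)
                receivers.add(d)
            else:
                rest.append((s, d))
        cycles.append(cycle)
        remaining = rest
    return cycles

# Processor 0 must transmit to three others, so three cycles are needed.
print(decompose_into_cycles([(0, 1), (0, 2), (0, 3), (1, 0), (2, 3)]))
# [[(0, 1), (1, 0), (2, 3)], [(0, 2)], [(0, 3)]]
```

Note that the cycle count in the example equals the maximum fan-out of any single processor, matching the bound stated above.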
FIGURE 20. A parallel/pipelined model for image processing: pipelined parallelism across stages, with computations between synchronization points.
Subsequent work following the same philosophy proposed a model of parallel image processing more relevant to the lower levels of processing. This parallel/pipelined model exploits spatial image parallelism as well as temporal parallelism in the processing of image sequences (Lee and Aggarwal, 1990). Processing progresses as illustrated in Fig. 20. The model exploits pipelined functional parallelism across images and spatial parallelism within each stage. Techniques are provided for partitioning tasks to obtain a well-balanced, high-throughput system. Both statically structured tasks with well-defined computational requirements and functions with dynamically varying computational requirements are considered. The dynamic case is supported by simulation studies using queuing network models to support well-founded design decisions based on estimated task execution times. These decisions include relationships between the relative processing times of successive stages, buffering requirements between stages, and scheduling techniques using estimated processing times. Examples of analysis, partitioning, and scheduling using this computational model, with bus-based and hypercube architectures as targets, are discussed.

These previous models were primarily driven by parallelism in the structure of the algorithm control flow. More recently, models have been derived based on the structure and use of data (Hillis, 1984; Jesshope, 1988). The thesis underlying this approach is that the data abstraction of processing is more suitable for efficiently using large-scale parallelism than the process-based abstraction. This model of concurrency provides for the simultaneous operation across all elements of a data structure. It is represented in the form A ← e(A, B), where the expression is first evaluated over all elements of A and B before assignment is made back to the elements of A. This model of concurrency makes no assumption about the overall control strategy and, therefore, is applicable to arrays of processors operating as SIMD or MIMD machines. When such abstractions are used, a distinction is made between the data and its activation. The model uses the abstraction of a single virtual processor per data structure element, though only a few real processors may be available (this is akin to the use of virtual memory). In the active data model, processing typically proceeds as a sequence of two steps: identify all active data elements, and process all of them simultaneously. The first step is necessary because all elements of a data structure may not always be a part of a computational step, a fact that is accommodated in serial machines with conditional expressions and array indexing calculations. The second step is the source of the parallelism. This is graphically illustrated in Fig. 21, where the methods identify active data to be subsequently processed.
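A minimal sketch of this active-data style of expression, using NumPy array operations merely as a stand-in for the per-element virtual processors:

```python
import numpy as np

# A <- e(A, B): the expression is evaluated over all elements of A and B
# before any assignment is made back to A.
A = np.array([3, 1, 4, 1, 5, 9, 2, 6])
B = np.ones(8, dtype=int)

# Step 1: identify the active data elements (here, selection by a key test).
active = A > 2

# Step 2: process all active elements simultaneously; inactive elements
# pass through unchanged, as a conditional would arrange on a serial machine.
A = np.where(active, A + B, A)
print(A)   # [ 4  1  5  1  6 10  3  7]
```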
FIGURE 21. The virtual machine corresponding to the active data model of computation: communication processors and a network route activation to the data-processing elements.
For example, activation may be the selection of a row in a matrix, the copying of a structure, the identification of a set of elements by a key, etc. In the Connection Machine, for example, a common basic unit of operation is the set; thus, operations on sets, e.g., set intersection and subset tests, are made very efficient. In computer vision, the nature and semantics of the data structures change as processing evolves. It is more beneficial to represent the data by a structure that is more closely matched to the semantics of the relationships between elements. A virtual machine is proposed that reflects the implementation of the active data model of computation and the distinction between activation and methods. Figure 21 illustrates this machine model. The machine provides processors for the routing, distribution, and activation of data elements; another set of processors is provided to process the data. This virtual machine can be realized directly in hardware or, as proposed, emulated on an existing SIMD or MIMD array. Related current efforts include defining primitives for machines based on the active-data model of computation. The purpose is to allow a continued focus on the data structures and their semantics while hiding the details of a specific parallel implementation. In a sense, these efforts can be viewed as bottom-up and, therefore, complementary to those that work from virtual machines on down. The active-data model of computation reflects a fundamentally different view of image processing algorithms and can be expected to influence new architectures as more insight is gained from its application.

Pyramid architectures are another organization that has become increasingly popular and was motivated by the properties of image processing algorithms (Ahuja and Swamy, 1984; Grosky and Jain, 1986; Reeves, 1986; Stout, 1986; Tanimoto, 1983). Hierarchical structures occur naturally at many levels in image processing and computer vision. Pyramid architectures are an intuitive, natural extension of these data structures.
These architectures have also been the focus of low-level algorithms, as well as of higher-level functions such as image segmentation. A pyramid machine is a parallel architecture composed of stacked, successively smaller arrays of PEs such that each PE has access to the state information of its neighbors. A pyramid of L + 1 levels has a square base array of 2^L x 2^L PEs. The layer at level i has 2^i x 2^i PEs. The PE at level 0 is the apex. The PEs at each level are connected to PEs in a local neighborhood at that level, to four children directly below at the next level, and to a single parent at the next upper level. A PE can be specified by its coordinates (i, j, k); it is at level i at position (j, k) in the array. Related hierarchical organizations for intermediate and high-level vision problems, such as segmentation and scene labeling, include binary trees (Ibrahim, 1984), variable pyramid structures, and multilevel array structures.
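With this coordinate convention, the parent and child addresses follow from simple index arithmetic; a small sketch of ours:

```python
def parent(i, j, k):
    """Parent of PE (i, j, k): one level up, toward the apex."""
    if i == 0:
        return None          # the apex has no parent
    return (i - 1, j // 2, k // 2)

def children(i, j, k, L):
    """Four children of PE (i, j, k) at level i + 1 (none below the base)."""
    if i == L:
        return []
    return [(i + 1, 2 * j + dj, 2 * k + dk)
            for dj in (0, 1) for dk in (0, 1)]

# In a pyramid with a 2^L x 2^L base (L = 3 gives an 8 x 8 base):
print(parent(2, 3, 1))        # (1, 1, 0)
print(children(1, 1, 0, 3))   # [(2, 2, 0), (2, 2, 1), (2, 3, 0), (2, 3, 1)]
```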
V. EMERGING RESEARCH AREAS
Several areas of research will have a major impact on the future development of parallel processing methodologies for image processing and computer vision. Perhaps the most visible are the developments in application-driven architectures in the area of neural networks. Attempts to build computers modeled after our understanding of the functions of the human brain have accelerated rapidly in the recent past, aided by technological developments that make the testing and implementation of many ideas and concepts feasible. These networks serve to complement existing high-level processing based on symbols and structures, and potentially provide enormous advantages in object recognition, learning, and constrained optimization. The impact of neural networks will be felt throughout the range of computer vision algorithms. Architecture-driven approaches will be influenced by developments in 3-D VLSI, wafer-scale integration, molecular computing, the development of quantum-effect transistors and solid-state vacuum devices, and optical computing. These new technologies will alter trade-offs between cost and performance, which in turn will dictate the system-level organization of future parallel machines. The fastest electronic transistors take several picoseconds to switch between on and off states, while optical pulses with widths of 0.008 picosecond have been demonstrated. Furthermore, the speed of light makes signal propagation delays much shorter, and the use of free-space interconnection networks will alter the current structure of networks by altering the physical characteristics that determine efficient organizations.
Quantum-effect devices will bring computing down to the atomic level. While a basic on-off device at this level has been demonstrated, how can a large number of such devices be aggregated into a useful computing engine? How can information be entered into and retrieved efficiently from such molecular computers? Answers to these and many other questions will motivate the development of these new technologies. In the arena of software and algorithms, advances in automated parallelism extraction will make parallel architectures available to a broader range of applications. Silicon compilation promises to redefine the transition from problem specification to system design and implementation. Before massively parallel architectures can be routinely applied, we need continuing advances in resource management and operating systems. Collectively, all of these visible developments make the field of computer vision an exciting one.
REFERENCES

Aggarwal, J. K., and Yalamanchili, S. (1984). "Algorithm driven architectures for image processing," Proceedings of the Workshop on Algorithm-Guided Architectures for Automated Target Recognition, Leesburg, Virginia, July 1984.
Ahuja, N., and Swamy, S. (1984). "Multiprocessor pyramid architectures for bottom up image analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 463-475.
Aleksander, I. (1985). "WISARD: A component for image understanding architectures," in Image Processing System Architectures (J. Kittler and M. J. B. Duff, eds.), pp. 153-163. Research Studies Press Ltd.
Almasi, G., and Gottlieb, A. (1989). Highly Parallel Computing. Benjamin Cummings.
Archibald, J., and Baer, J.-L. (1986). "Cache coherence protocols: Evaluation using a multiprocessor simulation model," ACM Transactions on Computer Systems 4(4), 273-298.
Arvind and Iannucci, R. A. (1983). "A critique of multiprocessing von Neumann style," Proceedings of the 10th International Symposium on Computer Architecture, June 1983, 426-436.
Batcher, K. E. (1980). "Design of a massively parallel processor," IEEE Transactions on Computers C-29, 278.
Bhanu, B. H., and Nuttall, L. A. (1989). "Recognition of 3-D objects in range images using a butterfly multiprocessor," Pattern Recognition 22(1), 49-64.
Bokhari, S. H. (1981a). "On the mapping problem," IEEE Transactions on Computers C-30(3), 207-214.
Bokhari, S. H. (1981b). "A shortest tree algorithm for optimal assignments across space and time in a distributed processor system," IEEE Transactions on Software Engineering SE-7(11), 583-589.
Bokhari, S. H. (1988). "Partitioning problems in parallel, pipelined, and distributed computing," IEEE Transactions on Computers C-37(1), 48-57.
Bollinger, S. W., and Midkiff, S. F. (1988). "Processor and link assignment in multicomputers using simulated annealing," International Conference on Parallel Processing.
Briggs, F. A., Hwang, K., Fu, K. S., and Dubois, M. (1982). "PUMPS architecture for pattern analysis and image database management," IEEE Transactions on Computers C-31(10), 968-982.
Chan, M. Y. (1988). "Dilation-2 embeddings of grids into hypercubes," International Conference on Parallel Processing.
Chan, M. Y., and Chan, F. Y. L. (1988). "On embedding rectangular grids in hypercubes," IEEE Transactions on Computers, October 1988.
Chan, T. F., and Saad, Y. (1986). "Multigrid algorithms on the hypercube multiprocessor," IEEE Transactions on Computers, November 1986.
Chen, C.-C., and Tsai, W.-H. (1985). "A graph matching approach to optimal task assignment in distributed computing systems using a minimax criterion," IEEE Transactions on Computers, March 1985.
Choudhary, V., and Aggarwal, J. K. (1990). "Generalized mapping of parallel algorithms onto parallel architectures," Proceedings of the International Conference on Parallel Processing, August 1990, 137-140.
Colwell, R. P., Nix, R. P., O'Donnell, J. J., Papworth, D. P., and Rodman, P. K. (1988). "A VLIW architecture for a trace scheduling compiler," IEEE Transactions on Computers C-37(8), 967-979.
Davis, R., and King, K. (1975). "An overview of production systems," Stanford AI Laboratory Memo AIM-271 and Computer Science Department Report No. STAN-CS-75-524, October 1975.
Dennis, J. B. (1980). "Dataflow supercomputers," IEEE Computer 13(11), 48-56.
Duff, M. J. B. (1978). "Review of the CLIP image processing system," Proceedings of the National Computer Conference, 1055-1060.
Eshaghian, M. Mary, and Kumar, V. K. Prasanna (1989). "Fine grain image computations on electro-optical arrays," Proceedings of the AIAA, 666-671.
Fisher, A. L., and Highnam, P. T. (1989). "Computing the Hough transform on a scan line array processor," IEEE Transactions on Pattern Analysis and Machine Intelligence 11(3).
Foster, M. J., and Kung, H. T. (1980). "The design of special purpose VLSI chips," IEEE Computer, January 1980, 26-40.
Fountain, T. J., Mathews, K. N., and Duff, M. J. B. (1988). "The CLIP7A processor," IEEE Transactions on Pattern Analysis and Machine Intelligence 10(3), 310-319.
Gentleman, W. M. (1978). "Some complexity results for matrix computations on parallel processors," Journal of the ACM 25(1), 112-115.
Girkar, M., and Polychronopoulos, C. (1988). "Partitioning programs for parallel execution," International Conference on Supercomputing.
Grosky, W. I., and Jain, R. (1986). "A pyramid-based approach to segmentation applied to region matching," IEEE Transactions on Pattern Analysis and Machine Intelligence 8, 639-650.
Hillis, D. (1984). The Connection Machine. MIT Press, Cambridge, Massachusetts.
Hunt, D. G., and Reddaway, S. F. (1983). "Distributed processing power in memory," in The Fifth Generation Computer Project. Pergamon Infotech Ltd.
Hwang, K., and Briggs, F. (1984). Computer Architecture and Parallel Processing. McGraw-Hill, New York.
Ibrahim, H. A. H. (1984). "The connected component algorithm on the non-von supercomputer," Proceedings of the Workshop on Computer Vision: Representation and Control, May 1984, 37-45.
Jesshope, C. (1988). "A dynamic, load-balanced, active-data model of parallel processing for vision," in Parallel Architectures and Computer Vision (I. Page, ed.), 315-329. Oxford Science.
Johnson, T., and Durham, T. (1986). Parallel Processing: The Challenges of New Computer Architectures. Ovum Inc.
Jordan, H. (1978). "A special purpose architecture for finite element analysis," Proceedings of the International Conference on Parallel Processing, August 1978, 263-266.
Kumar, V. K. Prasanna (1986). "Parallel geometric algorithm for digitized pictures on a mesh of trees," Proceedings of the International Conference on Parallel Processing, August 1986.
Kung, H. T. (1982). "Why systolic architectures?" IEEE Computer 15(1), 37-46.
Kung, H. T. (1984). "Systolic algorithms for the CMU-Warp processor," Proceedings of the 7th International Conference on Pattern Recognition.
Kung, S. Y. (1987). VLSI Array Processors. Prentice-Hall.
Lee, S.-Y., and Aggarwal, J. K. (1987). "A mapping strategy for parallel processing," IEEE Transactions on Computers C-36(4), 433-441.
Lee, S.-Y., and Aggarwal, J. K. (1990). "A system design/scheduling strategy for parallel image processing," IEEE Transactions on Pattern Analysis and Machine Intelligence 12(2), 194-204.
Lee, S.-Y., Yalamanchili, S., and Aggarwal, J. K. (1987). "Parallel image normalization on a mesh connected array processor," Pattern Recognition 20(1), 115-124.
Lint, B., and Agerwala, T. K. (1981). "Communication issues in the design and analysis of parallel algorithms," IEEE Transactions on Software Engineering SE-7(3), 174-188.
Lowry, M. R., and Miller, A. (1981). "A general purpose VLSI chip for computer vision with fault tolerant hardware," Proceedings of the DARPA Image Understanding Workshop, April 1981, 184.
Miller, R., and Stout, Q. F. (1985). "Parallel geometric algorithms for digitized pictures on mesh connected computers," IEEE Transactions on Pattern Analysis and Machine Intelligence, March 1985.
Miller, R., and Stout, Q. F. (1988). "Efficient parallel convex hull algorithms," IEEE Transactions on Computers, December 1988.
Nudd, G. R. (1980). "Image understanding architectures," Proceedings of the National Computer Conference, 377-390.
Page, I. (1988). "The Disputer: A dual paradigm parallel processor for graphics and vision," in Parallel Computer Architectures and Computer Vision (I. Page, ed.), 201-216. Oxford Science.
Premkumar, U. V., Kapur, R., Malek, M., Lipovski, G. J., and Horne, P. (1980). "Design and implementation of the Banyan interconnection network in TRAC," Proceedings of the National Computer Conference, 643-653.
Preston, K., Duff, M. J. B., Levialdi, S., Norgen, P. E., and Toriwaki, J. (1979). "Basics of cellular logic with applications to medical image processing," Proceedings of the IEEE 67(5), 826-856.
Ranka, S., and Sahni, S. (1988). "Image template matching on MIMD hypercube multicomputers," Proceedings of the International Conference on Parallel Processing, August 1988, Vol. III: Algorithms and Applications, 92-99.
Ranka, S., and Sahni, S. (1990). "Convolution on mesh connected multicomputers," IEEE Transactions on Pattern Analysis and Machine Intelligence 12(3), 315-318.
Reeves, A. P. (1980). "A systematically designed binary array processor," IEEE Transactions on Computers C-29, 278.
Reeves, A. P. (1986). "Pyramid algorithms on processor arrays," in Pyramidal Systems for Computer Vision (V. Cantoni and S. Levialdi, eds.), 195-215. Springer-Verlag, Berlin.
Sadayappan, P., and Ercal, F. (1987). "Nearest neighbor mapping of finite element graphs onto meshes," IEEE Transactions on Computers C-36(12), 1408-1424.
Sahiner, A. V., Kindberg, T., and Parker, Y. (1988). "A distributed architecture for image processing," in Parallel Architectures and Computer Vision (I. Page, ed.), 187-199. Oxford Science.
Siegel, H. J., et al. (1981). "PASM: A partitionable SIMD/MIMD system for image processing and pattern recognition," IEEE Transactions on Computers C-30(12), 934-946.
Stout, Q. (1986). "Hypercubes and pyramids," in Pyramidal Systems for Computer Vision (V. Cantoni and S. Levialdi, eds.), 74-91. Springer-Verlag, Berlin.
Sunwoo, M. H., and Aggarwal, J. K. (1990a). "A sliding memory plane array processor for low level vision," Proceedings of the International Conference on Pattern Recognition, 312-317.
Sunwoo, M. H., and Aggarwal, J. K. (1990b). "Flexibly coupled multiprocessors for image processing," Journal of Parallel and Distributed Computing 10, 115-129.
Sunwoo, M. H., and Aggarwal, J. K. (1990c). "VisTA for a general purpose computer system," Proceedings of the International Conference on Pattern Recognition, 635-641.
Tanimoto, S. L. (1983). "A pyramidal approach to parallel processing," Proceedings of the 10th Symposium on Computer Architecture, 372-378.
Thiebaut, D., and Stone, H. S. (1987). "Footprints in the cache," ACM Transactions on Computer Systems 5(4), 305-329.
Treleaven, P., Brownbridge, D. R., and Hopkins, R. P. (1982). "Data-driven and demand-driven computer architecture," ACM Computing Surveys 14(1), 93-143.
Unger, S. H. (1958). "A computer oriented towards spatial problems," Proceedings of the IRE, October 1958, 1744.
Wallace, R. S., and Howard, M. (1989). "HBA vision architecture: Built and benchmarked," IEEE Transactions on Pattern Analysis and Machine Intelligence 11(3), 227-232.
Webb, J. A., and Kanade, T. (1986). "Vision on a systolic array machine," in Evaluation of Multicomputers for Image Processing (L. Uhr, K. Preston, S. Levialdi, and M. J. B. Duff, eds.), 181-201. Academic Press, New York.
Wood, A. (1986). "Higher level operations using processor arrays," in Evaluation of Multicomputers for Image Processing (L. Uhr, K. Preston, S. Levialdi, and M. J. B. Duff, eds.). Academic Press, New York.
Yalamanchili, S., and Aggarwal, J. K. (1984). "Image processing architectures: Past, present, and future," Proceedings of the International Conference on Computers, Systems & Signal Processing, December 1984.
Yalamanchili, S., and Aggarwal, J. K. (1985a). "Analysis of a model for parallel image processing," Pattern Recognition 18(1), 1-16.
Yalamanchili, S., and Aggarwal, J. K. (1985b). "A system organization for parallel image processing," Pattern Recognition 18(1), 17-29.
Yalamanchili, S., and Aggarwal, J. K. (1985c). "Reconfiguration strategies for parallel architectures," IEEE Computer 18(12), 44-61.
Yalamanchili, S., and Lee, D. T. (1988). "A mapping algorithm for multiprocessor architectures," 26th Allerton Conference on Computing, Communications, and Control.
Yalamanchili, S., Palem, K. V., Davis, L. S., Welch, A. J., and Aggarwal, J. K. (1985). "Image processing architectures: A taxonomy and survey," in Progress in Pattern Recognition, 1-37. North-Holland.
Yalamanchili, S., Lee, D., Fritze, K., Carpenter, T., Hoyme, K., and Murray, N. (1987). "The architecture of a video image processor for the space station," Proceedings of the NASA Workshop on Space Telerobotics, January 1987.