ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 121 Electron Microscopy and Holography
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES-CNRS Toulouse, France
ASSOCIATE EDITORS
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
TOM MULVEY Department of Electronic Engineering and Applied Physics Aston University Birmingham, United Kingdom
Advances in Imaging and Electron Physics
Electron Microscopy and Holography
EDITED BY
PETER W. HAWKES CEMES-CNRS Toulouse, France
VOLUME 121
San Diego  San Francisco  New York  Boston  London  Sydney  Tokyo
This book is printed on acid-free paper.
Copyright © 2002 by ACADEMIC PRESS
All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the Publisher. The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher’s consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per-copy fee through the Copyright Clearance Center, Inc. (222 Rosewood Drive, Danvers, Massachusetts 01923), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-2002 chapters are as shown on the title pages: If no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/02 $35.00 Explicit permission from Academic Press is not required to reproduce a maximum of two figures or tables from an Academic Press chapter in another scientific or research publication provided that the material has not been credited to another source and that full credit to the Academic Press chapter is given.
Academic Press An Elsevier Science Imprint 525 B Street, Suite 1900, San Diego, California 92101-4495, USA http://www.academicpress.com
Academic Press
Harcourt Place, 32 Jamestown Road, London NW1 7BY, UK
International Standard Serial Number: 1076-5670
International Standard Book Number: 0-12-014763-7
PRINTED IN THE UNITED STATES OF AMERICA
02 03 04 05 06 SB 9 8 7 6 5 4 3 2 1
CONTENTS
CONTRIBUTORS  vii
PREFACE  ix
FUTURE CONTRIBUTIONS  xi
High-Speed Electron Microscopy
O. BOSTANJOGLO
I. Introduction  1
II. High-Speed Techniques  2
III. Time-Resolving Microscopes  6
IV. Conclusions  45
References  46
Applications of Transmission Electron Microscopy in Mineralogy
P. E. CHAMPNESS
I. Introduction  53
II. Analytical Electron Microscopy of Minerals  55
III. Phase Separation (Exsolution)  59
IV. HRTEM and Defect Structures  81
V. Concluding Remark  87
References  87
Three-Dimensional Fabrication of Miniature Electron Optics
A. D. FEINERMAN AND D. A. CREWE
I. Introduction  91
II. Scaling Laws for Electrostatic Lenses  93
III. Fabrication of Miniature Electrostatic Lenses  94
IV. Fabrication of Miniature Magnetostatic Lenses  118
V. Electron Source  119
VI. Detector  124
VII. Electron-Optical Calculations  126
VIII. Performance of a Stacked Einzel Lens  132
IX. Summary and Future Prospects  140
References  141
A Reference Discretization Strategy for the Numerical Solution of Physical Field Problems
CLAUDIO MATTIUSSI
I. Introduction  144
II. Foundations  147
III. Representations  183
IV. Methods  222
V. Conclusions  273
VI. Coda  275
References  276
The Imaging Plate and Its Applications
NOBUFUMI MORI AND TETSUO OIKAWA
I. Introduction  281
II. Mechanism of Photostimulated Luminescence (PSL)  282
III. Imaging Plate (IP)  288
IV. Elements of the IP System  290
V. Characteristics of the IP System  294
VI. Practical Systems  302
VII. Applications of the IP  305
VIII. Conclusion  330
References  330
INDEX  333
CONTRIBUTORS
Numbers in parentheses indicate the pages on which the authors’ contributions begin.
O. BOSTANJOGLO (1), Optisches Institut, Technische Universität Berlin, D-10623 Berlin, Germany
P. E. CHAMPNESS (53), Department of Earth Sciences, University of Manchester, Manchester M13 9PL, United Kingdom
D. A. CREWE (91), Microfabrication Applications Laboratory, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, Illinois 60607
A. D. FEINERMAN (91), Microfabrication Applications Laboratory, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, Illinois 60607
CLAUDIO MATTIUSSI (143), Clampco Sistemi-NIRLAB, AREA Science Park, Padriciano 99, 34012 Trieste, Italy
NOBUFUMI MORI (281), Fuji Photo Film Co., Ltd., 798, Miyanodai, Kaisei, Ashigarakami, Kanagawa, 258-8538 Japan
TETSUO OIKAWA (281), JEOL Ltd., Shin-Suzuharu Bld. 3F, 2-8-3 Akebonocho, Tachikawa, Tokyo, 180-0012 Japan
PREFACE
The founding editor of these Advances was Ladislaus (Bill) Marton, one of the pioneers of electron microscopy, who built early microscopes in Brussels in the 1930s and obtained the first biological micrographs. He was later involved in the first efforts to construct a commercial model in the USA during the 1940s. Articles on electron and, more recently, other forms of microscopy have hence appeared regularly in the series. It is thus very natural that this volume, the second of the two thematic volumes announced in volume 119, should contain a collection of recent chapters in the broad area of electron microscopy and holography. In fact, the selection that I originally made proved to occupy too many pages for a single book and three further chapters (by K. Hiraga on quasicrystals, G. Matteucci, G. F. Missiroli and G. Pozzi on electron holography and E. Oho on digital processing of the scanning electron microscope image) will be included in volume 122, together with a regular contribution by A. Khursheed. No further thematic volumes are planned. The five chapters reprinted here cover the very specialized techniques of high-speed electron microscopy, the study of minerals by electron microscopy, miniature electron lenses and microscopes and the imaging plate, which is now usefully complementing the more traditional recording media. In addition, there is a contribution by C. Mattiussi on numerical methods for field calculation. These all seemed to me important enough to deserve republication in this form, though I have to admit that many other contributions had arguably just as strong a claim. I am most grateful to the contributors to this volume for consenting to reappear here and for the work of revision. Their chapters first appeared in vol. 110 (O. Bostanjoglo), vol. 101 (P. E. Champness), vol. 102 (A. D. Feinerman and D. A. Crewe), vol. 113 (C. Mattiussi) and vol. 99 (N. Mori and T. Oikawa). Peter Hawkes
FUTURE CONTRIBUTIONS
T. Aach
   Lapped transforms
G. Abbate
   New developments in liquid-crystal-based photonic devices
S. Ando
   Gradient operators and edge and corner detection
A. Arnéodo, N. Decoster, P. Kestener and S. Roux
   A wavelet-based method for multifractal image analysis
M. Barnabei and L. Montefusco
   Algebraic aspects of signal and image processing
C. Beeli
   Structure and microscopy of quasicrystals
I. Bloch
   Fuzzy distance measures in image processing
G. Borgefors
   Distance transforms
A. Carini, G. L. Sicuranza and E. Mumolo
   V-vector algebra and Volterra filters
Y. Cho
   Scanning nonlinear dielectric microscopy
E. R. Davies
   Mean, median and mode filters
H. Delingette
   Surface reconstruction based on simplex meshes
A. Diaspro
   Two-photon excitation in microscopy
D. van Dyck
   Very high resolution electron microscopy
R. G. Forbes
   Liquid metal ion sources
E. Förster and F. N. Chukhovsky
   X-ray optics
A. Fox
   The critical-voltage effect
L. Frank and I. Müllerová
   Scanning low-energy electron microscopy
M. Freeman and G. M. Steeves
   Ultrafast scanning tunneling microscopy
A. Garcia
   Sampling theory
L. Godo and V. Torra
   Aggregation operators
P. W. Hawkes
   Electron optics and electron microscopy: conference proceedings and abstracts as source material
M. I. Herrera
   The development of electron microscopy in Spain
J. S. Hesthaven
   Higher-order accuracy computational methods for time-domain electromagnetics
K. Ishizuka
   Contrast transfer and crystal images
I. P. Jones
   ALCHEMI
W. S. Kerwin and J. Prince
   The kriging update model
B. Kessler
   Orthogonal multiwavelets
A. Khursheed (vol. 122)
   Recent accessories for scanning electron microscopes
G. Kögel
   Positron microscopy
W. Krakow
   Sideband imaging
N. Krueger
   The application of statistical and deterministic regularities in biological and artificial vision systems
B. Lahme
   Karhunen–Loeve decomposition
B. Lencová
   Calculation of the properties of electromagnetic fields and electron lenses
C. L. Matson
   Back-propagation through turbid media
P. G. Merli, M. Vittori Antisari and G. Calestani, eds (vol. 123)
   Aspects of Electron Microscopy and Diffraction
S. Mikoshiba and F. L. Curzon
   Plasma displays
M. A. O’Keefe
   Electron image simulation
N. Papamarkos and A. Kesidis
   The inverse Hough transform
M. G. A. Paris and G. d’Ariano
   Quantum tomography
C. Passow
   Geometric methods of treating energy transport phenomena
E. Petajan
   HDTV
F. A. Ponce
   Nitride semiconductors for high-brightness blue and green light emission
T.-C. Poon
   Scanning optical holography
H. de Raedt, K. F. L. Michielsen and J. Th. M. Hosson
   Aspects of mathematical morphology
E. Rau
   Energy analysers for electron microscopes
H. Rauch
   The wave-particle dualism
R. de Ridder
   Neural networks in nonlinear image processing
D. Saad, R. Vicente and A. Kabashima
   Error-correcting codes
O. Scherzer
   Regularization techniques
G. Schmahl
   X-ray microscopy
S. Shirai
   CRT gun design methods
T. Soma
   Focus-deflection systems and their applications
I. Talmon
   Study of complex fluids by transmission electron microscopy
M. Tonouchi
   Terahertz radiation imaging
N. M. Towghi
   Ip norm optimal filters
Y. Uchikawa
   Electron gun optics
J. S. Walker
   Tree-adapted wavelet shrinkage
C. D. Wright and E. W. Hill
   Magnetic force microscopy
F. Yang and M. Paindavoine
   Pre-filtering for pattern recognition using wavelet transforms and neural networks
M. Yeadon
   Instrumentation for surface studies
S. Zaefferer
   Computer-aided crystallographic analysis in TEM
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 121
High-Speed Electron Microscopy
O. BOSTANJOGLO
Optisches Institut, Technische Universität Berlin, D-10623 Berlin, Germany
I. Introduction  1
II. High-Speed Techniques  2
   A. Short-Time Exposure Imaging  3
      1. Laser-Driven Thermionic Gun  3
      2. Laser-Driven Photoelectron Guns  4
   B. Streak Imaging  5
   C. Image Intensity Tracking  6
III. Time-Resolving Microscopes  6
   A. Time-Resolving Transmission Electron Microscopy  7
      1. Instrumentation  7
      2. Applications  12
      3. Space–Time Resolution  22
   B. Flash Photoelectron Microscopy  25
      1. Instrument for Short Exposure Imaging  27
      2. Applications  29
      3. Limits  34
   C. Pulsed High-Energy Reflection Electron Microscopy  36
   D. Pulsed Mirror Electron Microscopy  40
IV. Conclusions  45
References  46
I. Introduction
Electron microscopy is used to investigate a wide range of material properties with high spatial resolution. The most familiar applications are imaging of the atomic structure of solids, of crystal defects, of magnetic and electric fields in solids, and of the chemical composition of thin films and surfaces (e.g., Murr, 1991; Reimer, 1985, 1993). Conventionally, a stationary electron beam either illuminates the whole specimen in a single exposure or scans across it. In both cases the image shows the static distribution of a specific material property. If time-varying effects are to be captured, the microscope must be pulsed. Periodic variations of a material property are pinned down by pulsing the electron beam in synchrony with the period of the variation and summing the signals over a selected acquisition time to produce the image. Because the superimposed noise is statistical, this sampling procedure reduces it to a low level, and "images" with a joint submicrometer–picosecond resolution have been produced, for example, by Brunner et al. (1987). A fast nonrepetitive process is less easily uncovered, as all information about the transient state must be captured by a single short probing pulse. Nevertheless, such nonrepetitive processes have attracted considerable interest in fundamental and applied research in connection with material processing by laser pulses. Typical applications in which pulsed lasers are progressively replacing established tools are localized cutting, drilling, ablating, patterning, alloying, and joining of a wide variety of materials. The key condition for precise local treatment, that is, for minimal thermal and mechanical loading of the neighboring material, is that the required photon energy be deposited locally and in a short time. Thermal melting, melt flow, crystalline and noncrystalline solidification, and thermal evaporation are the main processes that determine the outcome of treating a material with laser pulses longer than about 10 ps. Commonly, the pump-probe technique, exploiting, for example, light-optical microscopy, is used as a diagnostic tool to track laser-induced effects. The light-optical methods are very fast and reach a time resolution of several femtoseconds (e.g., Schönlein et al., 1987). Their drawbacks are a limited spatial resolution (>1 μm) and the fact that they primarily sense the electronic system, so that properties related to the atomic structure must be deduced with a suitable model. Material structure is better approached by electron microscopy, some modes of which directly probe the atomic packing. Furthermore, effects that are not accompanied by a large change of the electronic states strongly interacting with visible light, such as phase transitions in metals, are not easily detected by light optics.
They appear, however, with good contrast when they are imaged by electron microscopy based on Coulomb scattering of the probing electrons at the atomic structure. This article describes the various time-resolving electron-optical techniques which were developed to study fast transient effects in freestanding films and on surfaces of bulk materials down to the nanosecond time scale. Hydrodynamic instabilities in confined laser pulse–produced melts and the solidification and evaporation of these melts were investigated, as they are of major concern to micromachining with laser pulses. The mechanisms uncovered by high-speed electron microscopy are presented.
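A toy simulation shows how periodic sampling reduces noise. The sketch below is not the author's acquisition scheme. It assumes a hypothetical periodic response `signal(phase)` that is probed at a fixed phase by N synchronized shots, and each shot carries independent Gaussian noise of standard deviation `sigma`. Averaging the shots leaves a residual noise close to sigma/√N. A single-shot record of a nonrepetitive process cannot benefit from this averaging.

```python
import math
import random
import statistics


def signal(phase: float) -> float:
    """Hypothetical periodic material response; one period corresponds to phase 0..1."""
    return 1.0 + 0.5 * math.sin(2.0 * math.pi * phase)


def sampled_estimate(phase: float, n_shots: int, sigma: float, rng: random.Random) -> float:
    """Average n_shots noisy probe pulses that are all synchronized to the same phase."""
    total = 0.0
    for _ in range(n_shots):
        total += signal(phase) + rng.gauss(0.0, sigma)
    return total / n_shots


def residual_noise(n_shots: int, sigma: float = 1.0, trials: int = 2000, seed: int = 1) -> float:
    """Empirical standard deviation of the averaged estimate over many repeated acquisitions."""
    rng = random.Random(seed)
    estimates = [sampled_estimate(0.25, n_shots, sigma, rng) for _ in range(trials)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    for n in (1, 16, 100):
        # measured residual noise next to the sigma / sqrt(N) expectation
        print(n, round(residual_noise(n), 3), round(1.0 / math.sqrt(n), 3))
```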
II. High-Speed Techniques
There are three time-resolving techniques, distinguished by the number of spatial coordinates in the image: short-time exposure imaging (two), streak imaging (one), and image intensity tracking (none).
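The toy sketch below makes the distinction concrete. It assumes a hypothetical intensity field I(x, y, t) sampled on an integer grid. The field `melt_front` and all function names are illustrative and do not come from the chapter. The sketch extracts what each technique records: a two-dimensional frame at one instant, a streak of one line versus time, and the intensity of one region versus time.

```python
from typing import Callable, List

Field = Callable[[int, int, int], float]  # hypothetical I(x, y, t) on integer grid indices


def short_exposure_frame(field: Field, t: int, nx: int, ny: int) -> List[List[float]]:
    """Two spatial coordinates at one instant: frame[y][x]."""
    return [[field(x, y, t) for x in range(nx)] for y in range(ny)]


def streak_image(field: Field, y_slit: int, nx: int, nt: int) -> List[List[float]]:
    """One spatial coordinate (along the slit) versus time: streak[t][x]."""
    return [[field(x, y_slit, t) for x in range(nx)] for t in range(nt)]


def intensity_trace(field: Field, x0: int, y0: int, nt: int) -> List[float]:
    """No spatial coordinate: intensity of one selected region versus time."""
    return [field(x0, y0, t) for t in range(nt)]


def melt_front(x: int, y: int, t: int) -> float:
    """Hypothetical field: a dark front that advances one pixel in x per time step."""
    return 0.0 if x < t else 1.0


frame = short_exposure_frame(melt_front, t=3, nx=6, ny=4)  # 4 rows x 6 columns
streak = streak_image(melt_front, y_slit=2, nx=6, nt=5)    # 5 time rows x 6 columns
trace = intensity_trace(melt_front, x0=2, y0=2, nt=5)      # [1.0, 1.0, 1.0, 0.0, 0.0]
```

In the streak, the front appears as a sloped edge whose slope is its velocity. The trace only shows when the front passes the selected point.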
A. Short-Time Exposure Imaging
The short-time exposure imaging technique pins down a transient stage of a fast process by producing a two-dimensional image of the specimen with a short exposure time. This may be realized either by illuminating the specimen with a stationary beam and enabling the image detector for only a short time (Bostanjoglo, Kornitzky, et al., 1989; Bostanjoglo, Tornow, et al., 1987a, 1987b), or by illuminating/exciting the specimen with a short electron/photon pulse and recording the electron image with a stationary detector. The first method requires sophisticated pulse electronics and shielding precautions. Preferably, the image detector is a charge-coupled-device (CCD) camera backed by an image intensifier. A sealed intensifier may be gated by pulsing the moderate voltage between the photocathode and the first gain stage, which is a microchannel plate (MCP). An open MCP intensifier is enabled by pulsing the voltage across the channel plate. Because this is the smallest of the intensifier voltages, switching it minimizes electromagnetic interference. In addition, the applied voltage may appreciably exceed the maximum safe dc voltage for a short period, giving a gain in the pulsed mode that surpasses the dc value by two orders of magnitude. The second technique is superior, as it can provide much brighter illumination than a stationary beam if the electrons are emitted by a pulsed source. Short electron pulses may be produced by fast deflection of a constant-current beam (Gesley, 1993), by pulsing the voltage of the Wehnelt electrode (Szentesi, 1972) or of a filter lens (Plies, 1982), or by exciting the electron emitter with a laser pulse. Only the last method yields the high current densities required for nonsampling short-time exposure imaging. The laser-driven gun used in the author’s group is distinguished by the fact that it can be operated both as a conventional dc thermal gun and as a high-current pulsed gun.
It is a three-electrode gun, consisting of a hairpin emitter, a Wehnelt electrode, and an anode that houses an aluminum mirror for directing the laser beam onto the tip of the hairpin. This gun may be pulsed in either the thermionic or the photoelectron emission mode. As high-current-density guns are the key component for short exposure imaging, they are considered in some detail next.
1. Laser-Driven Thermionic Gun
If the emitter is heated by a nanosecond (or shorter) laser pulse, it can attain a temperature well above the melting point without being destroyed, and thermal electron pulses with current densities exceeding those produced by dc heating are obtained (Bostanjoglo and Heinricht, 1987; Bostanjoglo, Heinricht, et al., 1990; Schäfer and Bostanjoglo, 1992). In addition, emitter atoms are evaporated. They are ionized by the accelerated thermal electrons and reduce the negative space charge of the electrons, so that current densities exceeding the Child (space-charge) limit for pure electron emission by one order of magnitude can be generated.
However, this gun has several serious drawbacks. First, as the surface is eroded by each laser pulse, its absorption coefficient, and therefore the deposited laser fluence, varies from pulse to pulse, which produces unpredictable electron pulse currents. In addition, the electron pulse may be more than twice as long as the laser pulse, as a result of delayed emission of captured electrons while the plasma is diluted by expanding into the vacuum. This poor pulse-to-pulse stability makes the laser-driven thermionic gun unsuitable for multiframe imaging. Last, this mode of operation is hazardous, since the gun is driven to the threshold of laser-induced electric breakdown. A small upward deviation of the deposited laser fluence triggers a high-voltage breakdown, which in turn launches a high-amplitude traveling wave that may destroy electronic circuits of the microscope or of the attached high-speed diagnostic devices.
2. Laser-Driven Photoelectron Guns
Photocathodes with work functions ranging from the lowest values of ≈2 eV up to 4 eV have been used in laser-excited guns. Data on a number of electron emitters are given, for example, by Anderson et al. (1992), Chevallay et al. (1994), and Travier (1994). Materials with low work functions (<3 eV) are alkali, alkaline earth, and rare earth metals, oxides of alkaline earth metals (Lablond and Rajaonera, 1994), rare earth hexaborides (May et al., 1990; Watari and Yada, 1986; Yada, 1986), borides and carbides of refractory metals (Yada, 1986), and semiconductors with negative electron affinity (Baum et al., 1995). Unfortunately, materials with work functions low enough to emit electrons by one-photon excitation with visible light are mechanically and thermally weak. In addition, they have a low damage threshold for ion bombardment and foreign gases, so they must be operated at pressures below 10⁻⁸ mbar.
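Whether a single photon can release an electron depends on how the photon energy E = hc/λ compares with the work function φ. The sketch below computes the minimum photon order n = ⌈φ/E⌉ for the 532 nm and 266 nm Nd:YAG harmonics, which appear later in this chapter. The work-function values are illustrative points from the ≈2–4 eV range quoted above and are not data for specific materials. Barrier lowering by the extraction field (Schottky effect) is ignored.

```python
import math

HC_EV_NM = 1239.84198  # Planck constant times speed of light, in eV*nm


def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy E = hc / lambda, in eV."""
    return HC_EV_NM / wavelength_nm


def photons_needed(work_function_ev: float, wavelength_nm: float) -> int:
    """Smallest photon number n with n * E >= work function (no field-induced barrier lowering)."""
    return math.ceil(work_function_ev / photon_energy_ev(wavelength_nm))


if __name__ == "__main__":
    for wavelength in (532.0, 266.0):
        e_ph = photon_energy_ev(wavelength)
        orders = {phi: photons_needed(phi, wavelength) for phi in (2.0, 3.0, 4.0)}
        print(f"{wavelength:.0f} nm: E = {e_ph:.2f} eV, photon order by work function: {orders}")
```

At 532 nm (≈2.33 eV), a single photon reaches only the lowest work functions. At 266 nm (≈4.66 eV), one photon exceeds every work function in the quoted range.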
Mechanical strength can be increased by implanting, for example, alkali atoms into a refractory metal (Girardeau-Montaut, Girardeau-Montaut, Afif, et al., 1995), but these cathodes are still sensitive to poisoning and require an ultrahigh vacuum. The electron yield of some noble metals is appreciably increased if they are deposited as granular films on an inert substrate (Sabary and Bergeret, 1994). Such thin-film cathodes have been successfully used in electron beam testing devices (Batinic et al., 1995). Materials that are mechanically and thermally stable and chemically inert enough to be operated in a high vacuum (<10⁻⁵ mbar) have work functions above 3 eV. Photoelectron emission can be excited from these materials either by one-photon absorption of ultraviolet radiation with a quantum energy larger than the work function, or by multiphoton absorption of visible light.
For multiphoton emission, the intensity of the light pulse must be very high (>10¹³ W/cm²) for multiphoton processes to occur with high probability. At the same time, the pulse must be shorter than the thermalization time of the lattice (1–10 ps), so that the photon-absorbing electrons are ejected before an appreciable amount of the deposited energy is transferred to the lattice (Fujimoto et al., 1984; Girardeau-Montaut, Girardeau-Montaut, and Monstaizis, 1994; Wang et al., 1994). Currently, such ultrashort laser pulses cannot be used in high-speed electron microscopy, as the released electron pulses do not contain enough electrons to produce images of normal specimens with submicron spatial resolution. Since these images usually require exposure times of a few nanoseconds, ultraviolet nanosecond laser pulses must be used to excite the photocathode. Convenient sources of this radiation are excimer lasers and Q-switched, frequency-multiplied solid-state lasers. If multiple pulses with a variable spacing in the nano- to microsecond range are to be produced, Q-switched lasers are well suited: successive pulses are conveniently generated by a stepwise decrease of the losses of the laser resonator (Koechner, 1996). As it is desirable to operate the same microscope both in the high-speed mode and in the conventional nonpulsed mode for routine investigations and adjustment, the photocathode should tolerate standard Joule heating for thermionic emission. For convenience, the cathode should also work properly in the high vacuum of an ordinary electron microscope (10⁻⁴ to 10⁻⁶ mbar). These specifications can be met by coating the hairpin of a standard thermal cathode with a suitable photoelectron emitter. CeB₆, LaB₆, ZrC, Ce, Tb, Ti, and Zr were tested as photoelectron emitters, in various combinations with Nb, Ta, W, Ir-W, and Re as refractory metals for the supporting hairpin (Nink, 1998).
The compounds, being powders, were deposited on the hairpin by cataphoresis and baked at ≈1200 °C and 10⁻⁵ mbar. The metal photocathodes were fabricated by suspending a tiny chip of the metal from the tip of the hairpin and melting it by Joule heating in high vacuum in the presence of a stabilizing electric field. The photocathodes that produced the highest current densities with the ultraviolet radiation used (266 nm) in the microscope vacuum (≈10⁻⁶ mbar), while also having the highest threshold for laser-induced flashover and withstanding conventional operation by dc heating, were Zr- and ZrC-coated Re. These cathodes produced a current density of ≈700 A/cm² with an axial brightness of ≈4 × 10⁶ A/(cm²·sr) at an acceleration voltage of 100 kV.
B. Streak Imaging
Streak imaging is accomplished by confining the visible part of the object with a slit aperture in the plane of an (enlarged) intermediate image, and sweeping the final one-dimensional image across the image detector in the direction perpendicular to the slit (Bostanjoglo and Kornitzky, 1990; Bostanjoglo and Nink, 1997). The sweep velocity should be constant to give a “homogeneous” exposure of the detector. Since the slit width cannot be zero, the image is blurred along the time axis. The optimum width is a compromise between shot noise in the image and time resolution. During the streak, the specimen is illuminated by an electron beam with the highest possible current density, in order to minimize shot noise. Since prolonged illumination would inevitably lead to radiation damage of the specimen, the electron beam is directed onto the specimen only for the short period of the streak. This technique continuously visualizes transitions in the specimen that proceed along the slit coordinate.
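The trade-off between slit width and time resolution can be expressed numerically. The sketch below uses a simplified model with hypothetical parameter values. A linear voltage ramp sweeps the slit image over a length L of the detector during the streak time T, so the sweep speed is v = L/T. A slit image of width w on the detector therefore blurs the time axis by Δt = w/v. Each detector pixel receives electrons only while the slit image passes over it, that is, for the dwell time Δt. The shot-noise-limited signal-to-noise ratio therefore grows only as √Δt.

```python
import math


def streak_time_resolution(slit_image_mm: float, sweep_length_mm: float, streak_duration_s: float) -> float:
    """Temporal blur = width of the slit image on the detector / sweep speed (linear ramp assumed)."""
    sweep_speed_mm_s = sweep_length_mm / streak_duration_s
    return slit_image_mm / sweep_speed_mm_s


def shot_noise_snr(flux_per_mm2_s: float, pixel_area_mm2: float, dwell_s: float) -> float:
    """Poisson-limited SNR = sqrt(N) for the electrons a pixel collects during its dwell time."""
    n_electrons = flux_per_mm2_s * pixel_area_mm2 * dwell_s
    return math.sqrt(n_electrons)


if __name__ == "__main__":
    # Hypothetical values: 20 mm sweep in 1 us, 1e12 electrons/(mm^2 s) at the detector, 0.1 mm pixels.
    for slit_image_mm in (0.4, 0.2, 0.1):
        dt = streak_time_resolution(slit_image_mm, 20.0, 1e-6)
        snr = shot_noise_snr(1e12, 0.01, dt)
        print(f"slit image {slit_image_mm} mm: time resolution {dt * 1e9:.0f} ns, SNR {snr:.1f}")
```

In this model, halving the slit width halves Δt but lowers the SNR by a factor of √2. This is the compromise described above.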
C. Image Intensity Tracking
With image intensity tracking, the bright-field image intensity of a selected region of the specimen is detected with a fast scintillator/photomultiplier and registered with a fast storage oscilloscope (Bostanjoglo and Liedtke, 1980; Bostanjoglo, Schlotzhauer, et al., 1982). The specimen is illuminated with an electron beam of maximum current density to achieve a high signal-to-noise ratio. To avoid radiation damage of the specimen, the beam is passed to the specimen by a blanking capacitor for a limited time of only a few microseconds. If the image intensity is to be tracked for a period exceeding this safe time, the electron beam is passed in a number of equidistant short pulses (Bostanjoglo and Thomsen-Schmidt, 1989). This time-resolving mode supplies a continuous record of the fast processes in the specimen that modify the image intensity.
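The pulsed variant can be planned by spreading a fixed dose budget over the record length. The sketch below is an illustration built on assumptions and does not reproduce the author's circuitry. It takes three inputs, all in integer nanoseconds: a hypothetical safe cumulative beam-on time (a few microseconds, per the text), a desired record length, and a chosen pulse width. It returns equidistant beam-on windows whose total length stays within the safe time.

```python
from typing import List, Tuple


def blanking_schedule(record_ns: int, safe_exposure_ns: int, pulse_ns: int) -> List[Tuple[int, int]]:
    """Equidistant beam-on windows (start, end) whose summed length does not exceed the safe exposure."""
    if not 0 < pulse_ns <= safe_exposure_ns:
        raise ValueError("pulse width must be positive and no longer than the safe exposure")
    if record_ns <= safe_exposure_ns:
        return [(0, record_ns)]  # a single continuous window is already within the budget
    n_pulses = safe_exposure_ns // pulse_ns  # how many pulses fit into the dose budget
    spacing = record_ns // n_pulses          # >= pulse_ns here, so windows never overlap
    return [(i * spacing, i * spacing + pulse_ns) for i in range(n_pulses)]


if __name__ == "__main__":
    # Hypothetical: 50 us record, 5 us safe beam-on time, 500 ns pulses.
    for start, end in blanking_schedule(50_000, 5_000, 500):
        print(start, end)
```

For a 50 μs record with a 5 μs safe time and 500 ns pulses, this yields ten windows spaced 5 μs apart. Between windows, the intensity is sampled rather than recorded continuously.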
III. Time-Resolving Microscopes
There are several types of electron microscopes, which probe different zones of the specimen. Transmission microscopes reveal volume processes in freestanding films, which often mimic bulk material. Properties of the top layers of a surface are successfully studied by the photoelectron microscope. Reflection electron microscopy gives access to the space above the surface of the specimen. The mirror electron microscope probes the surface and the space above it. These four types of electron microscopes were adapted to investigations of fast processes in their specific domains.
A. Time-Resolving Transmission Electron Microscopy
The transmission microscope probes films that are 1 nm to 1 μm thick. Usually, some of the electrons scattered by the atoms are intercepted by an aperture in the back focal plane of the objective lens, and a bright- or dark-field image is produced. The image intensity replicates, among other things, the lattice structure, orientation, defects, and grain boundaries in crystalline samples by Bragg scattering, and the distribution of thickness, atomic number, and density in amorphous films by atomic scattering (e.g., Reimer, 1993).
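For amorphous films, the scattering contrast described above is often approximated by a single-parameter exponential law, a common textbook simplification. The law is written in terms of the mass thickness x = ρt: T = exp(−x/x_k). Here T is the fraction of electrons passing the objective aperture, and x_k is a "contrast thickness" that depends on the aperture angle, the accelerating voltage, and the material. The sketch below evaluates this model. The x_k value and the film data are hypothetical placeholders, not measured values.

```python
import math


def mass_thickness_ug_cm2(density_g_cm3: float, thickness_nm: float) -> float:
    """x = rho * t; since 1 nm = 1e-7 cm and 1 g = 1e6 ug, x[ug/cm^2] = 0.1 * rho[g/cm^3] * t[nm]."""
    return 0.1 * density_g_cm3 * thickness_nm


def bright_field_transmission(density_g_cm3: float, thickness_nm: float, contrast_thickness_ug_cm2: float) -> float:
    """Fraction of electrons passing the objective aperture, T = exp(-x / x_k)."""
    return math.exp(-mass_thickness_ug_cm2(density_g_cm3, thickness_nm) / contrast_thickness_ug_cm2)


def contrast(t_background: float, t_feature: float) -> float:
    """Relative contrast of a feature against its background, C = 1 - T_feature / T_background."""
    return 1.0 - t_feature / t_background


if __name__ == "__main__":
    x_k = 40.0  # hypothetical contrast thickness in ug/cm^2
    film = bright_field_transmission(2.0, 20.0, x_k)     # hypothetical 20-nm film of density 2 g/cm^3
    thicker = bright_field_transmission(2.0, 30.0, x_k)  # the same film, locally 30 nm thick
    print(f"T(film) = {film:.3f}, T(thicker) = {thicker:.3f}, contrast = {contrast(film, thicker):.3f}")
```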
1. Instrumentation
a. Short-Time Exposure Imaging
Figure 1 shows a commercial transmission microscope modified for triple-frame short exposure imaging of laser-induced processes in freestanding films (Dömer and Bostanjoglo, 2001; Nink et al., 1998). The electron gun is of the standard three-electrode type but has a Zr- or ZrC-coated Re hairpin emitter, which may be operated in the conventional stationary Joule-heated thermal mode or as a laser-pulsed photocathode. Up to three successive high-current electron pulses with a width of 7–10 ns and a selectable spacing of 20 ns to 2 μs are delivered. The laser driving the photocathode is a Q-switched, twice frequency-doubled Nd:YAG laser (wavelength 266 nm). Up to three pulses are extracted by decreasing the Q-spoiling voltage at the Pockels cell in one, two, or three steps, with transition times of ≈2 ns and a variable spacing between the steps. The driving laser beam is directed onto the cathode tip by an aluminum mirror installed in the anode, and focused by an external lens to a spot with a 1/e² diameter of ≈20 μm. Photoelectron pulses with peak currents of several milliamperes into a half angle of 7 × 10⁻³ rad were produced. The microscope has an automatic Wehnelt bias, which, however, does not respond to nanosecond pulses of the beam current. The proper Wehnelt bias is therefore set, also in the laser-driven mode, with a dc electron beam current obtained by conventional Joule heating of the photocathode. This disturbing stationary beam is deflected by a voltage at a blanking capacitor, which is switched off only during the emission of the photocathode. A bright-field image of the specimen is produced on a transmission screen by an objective, an intermediate (not shown in Fig. 1), and a projective lens. This image is intensified with a fiber-coupled MCP intensifier, picked up by a fiber-coupled CCD camera, digitized with a frame grabber, displayed on a monitor, and stored in a computer.
Three successive frames are recorded by displacing the image on the detector with a frame-shifting capacitor between the three illuminating electron pulses.
O. BOSTANJOGLO
Figure 1. High-speed transmission electron microscope with integrated pulse laser for treatment of the specimen. (a) Setup for triple-frame imaging: 1, laser pulse–driven photoelectron gun; 2, beam blanker; 3, pulse laser for treatment of the specimen; 4, specimen; 5, objective lens with aperture; 6, field aperture; 7, frame shifter; 8, fiber plate transmission phosphor screen; 9, microchannel plate (MCP) image intensifier; 10, charge-coupled-device (CCD) sensor; PC, personal computer. (b) Disintegrating melt produced by an 8-ns laser pulse in a chromium film (Dömer and Bostanjoglo, 2001). The frames were taken at the indicated times, counted from the peak of the laser pulse. The liquid is shattered as a result of fast expansion/contraction of the surrounding crystals undergoing a solid-state transition during heat-up/cooldown. The frames at 350 and 430 ns are additionally shown with enhanced contrast to reveal drops produced from a liquid cylinder by the Rayleigh instability.
The frames are separated by confining the image with a rectangular aperture in the image plane of the intermediate lens. The fast processes to be investigated are induced in the specimen by a second Q-switched Nd:YAG laser, which is frequency doubled and emits 5- or 15-ns pulses (full width at half maximum, FWHM) at a wavelength of 532 nm. This radiation is directed onto the thin-film specimen by an adjustable dielectric mirror on a heavily doped, polished silicon substrate with a bore that passes the electron beam. The treating laser beam is Gaussian in space and time and can be focused to a spot with a 1/e² diameter of 12 μm on the specimen.
HIGH-SPEED ELECTRON MICROSCOPY
Figure 1. (Continued)
Two modes of operation are provided by home-built logic circuitry: periodic triggering of both lasers and the beam deflectors, for adjusting the timing of the electron and laser pulses, and single-shot operation, for grabbing a multiframe short exposure image of a single laser pulse treatment of the specimen.
b. Streak Imaging
Figure 2. Transmission electron microscope for streak imaging: 1, conventional electron gun; 2–5 and 8–10, as in Figure 1; 6, slit aperture; 7, linear image shifter.
The arrangement of the transmission electron microscope for streak imaging is shown in Figure 2 (Bostanjoglo and Nink, 1997). It coincides with that in Figure 1 except for two differences. First, the illuminating electron beam is produced by conventional thermionic operation of the electron gun and is directed onto the specimen only for the duration of the streak. Second, a streak image is produced by applying a voltage ramp to the frame-shifting capacitor. The length of the streak can be selected from a few nanoseconds to several microseconds. One-dimensional confinement of the images is achieved with an adjustable narrow slit aperture in the image plane of the intermediate lens. Streak
imaging is particularly appropriate for determining the velocity of fast-moving phase boundaries (solid/solid, solid/liquid, or liquid/vapor), which appear in the bright-field image because of changes in electron scattering or in film thickness.
c. Image Intensity Tracking
Figure 3. Transmission electron microscope for tracking laser pulse–induced transitions in thin films: 1–5, as in Figure 2; 6, circular field aperture; 7, plastic scintillator; 8, photomultiplier tube.
Figure 3 shows the microscope in the image intensity tracking mode of operation. The bright-field image intensity of a selected area of the specimen (diameter >100 nm) is converted into a voltage signal by a fast plastic scintillator
(Pilot U, 1.9-ns rise time) plus a photomultiplier (rise time, 2 ns). This signal is recorded with a storage oscilloscope (rise time, 0.35 ns). Adding the rise times in quadrature, the resulting time resolution of the recording unit is $(1.9^2 + 2^2 + 0.35^2)^{1/2} \approx 3$ ns. The illuminating electron pulse is generated as in the case of streak imaging. Intensity tracking continuously records changes of the electron scattering, which may be due to phase transitions or to removal/accumulation of
material from/at the probed region. This technique is therefore well suited to detecting transient states and measuring their lifetimes and the duration of phase transformations.
2. Applications
Two typical applications of time-resolved transmission microscopy are reported in this section: hydrodynamic instabilities of metal melts subjected to high lateral thermal gradients (≈10⁹ K/m), and ablation of metal films by laser pulses. These processes, and metals as the material, were selected because they bear on micromachining with laser pulses. A laser pulse, bell-shaped both in time (5–15 ns, FWHM) and in space (12 μm, 1/e² diameter), is applied to a freestanding metal film with a typical thickness of 100 nm. The film contains impurities due to a preceding exposure to air, which is typically the case in laser microprocessing. As the fluence of the treating laser pulse is increased, two regimes are encountered. In the lower fluence regime a local melt is produced, which solidifies again. In the upper regime parts of the treated region are ablated. The details of the observed behavior of the treated metal deviate considerably from what would naively be expected.
a. Thermal Gradient–Driven Instabilities of Metal Melts
The thickness D of the treated film must be smaller than the thermal diffusion length during the laser pulse (D < 200 nm for all metals and 10-ns pulses). In this case an in-plane bell-shaped distribution of the temperature T is produced in the film, depending only on the radial coordinate r. The fluence of the laser pulse must be high enough to melt the film within a certain radius but too low to heat the film appreciably above the melting temperature. Then radiation pressure, as well as evaporation of metal atoms and their recoil pressure, can be safely neglected for nanosecond pulses.
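The film-thickness condition can be checked with the common order-of-magnitude estimate of the thermal diffusion length, L ≈ (κτ)^{1/2}, with κ the thermal diffusivity and τ the pulse duration. The sketch below is an illustration under stated assumptions, not the author's calculation: the form of L (some texts include a factor of 2) and both diffusivities are assumptions, the lower one standing for a poorly conducting metal and the higher one an approximate room-temperature value for aluminum.

```python
import math


def thermal_diffusion_length(diffusivity_m2_s: float, pulse_s: float) -> float:
    """Order-of-magnitude diffusion length L = sqrt(kappa * tau), in metres.

    Assumption: no numerical prefactor (some texts use 2 * sqrt(kappa * tau)).
    """
    return math.sqrt(diffusivity_m2_s * pulse_s)


def is_thermally_thin(thickness_m: float, diffusivity_m2_s: float, pulse_s: float) -> bool:
    """True if the film is thinner than the diffusion length during the pulse."""
    return thickness_m < thermal_diffusion_length(diffusivity_m2_s, pulse_s)


PULSE_S = 10e-9  # 10-ns pulse, as in the text

# Assumed diffusivities in m^2/s: a poorly conducting metal and roughly aluminum.
KAPPA_LOW = 4e-6
KAPPA_AL = 9.7e-5

L_low = thermal_diffusion_length(KAPPA_LOW, PULSE_S)
L_al = thermal_diffusion_length(KAPPA_AL, PULSE_S)
print(f"L(low) = {L_low * 1e9:.0f} nm, L(Al) = {L_al * 1e9:.0f} nm")
print("100-nm film thermally thin:", is_thermally_thin(100e-9, KAPPA_LOW, PULSE_S))
```

With the lower diffusivity the estimate comes out at 200 nm, consistent with the bound quoted above; better conductors give a larger L, so the condition is least demanding for them.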
The only force to which the originally flat melt is subject after the laser pulse stems from a possible gradient dγ/dr of the surface tension γ, which is identical to a shear stress acting on both surfaces. There is then a negative thermal gradient ∂T/∂r < 0 in the melt. Since the surface tension depends on temperature, and tabulated thermal coefficients are negative (about −3 × 10⁻⁴ N/(m·K) for many metals; e.g., Iida and Guthrie, 1988), the melt is expected to experience a positive shear stress at both surfaces:
$$\frac{d\gamma}{dr} = \frac{\partial \gamma}{\partial T}\cdot\frac{\partial T}{\partial r} > 0 \tag{1}$$
This shear force should monotonically drag the liquid to the cooler solid periphery, piling it up there and finally opening a hole at the center of the melt. The actual flow, however, is quite different (Bostanjoglo and Nink, 1997; Bostanjoglo and Otte, 1993; Nink et al., 1999).
Figure 4. Short exposure images of flow in a laser pulse–produced melt in an amorphous $\mathrm{Ni_{0.8}P_{0.2}}$ film (60 nm). Exposure time was 10 ns. The moment of exposure was counted from the peak of the treating laser pulse (−∞, before the pulse; ∞, 10 s after the pulse) and is given at the upper right corners. The flow stopped about 1 μs after the laser pulse, whereas the melt crystallized within 4–10 μs after the pulse. (a) Centripetal flow after a low-energy laser pulse (1.2 μJ). There is no reversal of the flow direction. (b) Centripetal flow followed by centrifugal flow after a high-energy pulse (1.6 μJ). The flow direction is reversed ≈300 ns after the laser pulse.
Figures 4 through 6 show
the hydrodynamics of laser pulse–produced melts in different metal films, visualized by the three time-resolving techniques described in Section II. None of the liquids subjected to an in-plane thermal gradient was perforated, as would be expected for a flow driven by negative thermocapillarity (∂γ/∂T < 0). Instead, the flow depends markedly on the starting temperature. At lower temperatures the liquid simply contracts within 100 ns and solidifies with a bump at the center. At higher temperatures the flow starts with a fast contraction and continues with reversals of the flow direction. In addition to flow, crystallization of the melt of an “ordinary” metal with a high thermal diffusivity starts at its solid periphery and proceeds toward the center of the melt at an almost constant velocity of several meters per second (Fig. 7; Bostanjoglo and Nink, 1996). When the melt accumulates at the center, a solid film with concentric modulations of the thickness is produced (Fig. 8).
Figure 5. Nonmonotonic flow in a laser pulse–produced melt pool in a polycrystalline cobalt film (60 nm). (a) Streak image of the melt flow. The melting 5-ns laser pulse was applied at the top edge. The slit aperture (width, 1 μm) passed the central region of the melt (lower edge in (b)). (b) Texture after crystallization of the melt.
Melts produced with a pulse of higher fluence are subdivided by an emerging concentric ring-shaped trench (Fig. 9; Niedrig and Bostanjoglo,
1997). The inner zone contracts and finally separates, forming a free disk that continues to contract as a result of surface tension and eventually disappears. The observed complicated flow cannot be explained under any circumstances with tabulated material parameters and the assumed shear stress in Eq. (1). Rigorous numerical simulation based on the Navier–Stokes and heat equations, as well as simple physical arguments, inevitably lead to a monotonic perforation of the melt within 100 ns. Figure 10 gives a hint of the decisive mechanism behind the actual flow. A melt in a gold film contracts after the first laser pulse. If a second pulse of similar fluence is applied after solidification but before a monolayer of gas is adsorbed from the high vacuum of the microscope, the melt then flows to the periphery. This reversal does not occur if the treated area is allowed to adsorb about a monolayer of air molecules (Bostanjoglo and Nink, 1996). Obviously the flow of “real” liquid metals is determined by surface active impurity atoms.
Figure 6. Oscillating flow in a laser pulse–produced melt pool in a polycrystalline iron film (60 nm). (a) Texture after crystallization of the melt. (b, c) Oscilloscope traces showing the bright-field image intensity within the circle in (a) on two time scales after the melting laser pulse (arrow). m and cr denote melting and crystallization, respectively. The final level of the intensity in (c) remains constant.
These atoms accumulate at the surface by replacing metal atoms, which decreases the surface tension according to the Gibbs isotherm:
$$d\gamma = -kT\,\Gamma\, d(\ln X) \tag{2}$$
with k, Γ, and X the Boltzmann constant, the excess surface density of the surface active atoms adsorbed in the surface layer, and the atomic fraction of the surface active atoms in the bulk liquid, respectively. Thus, the surface tension decreases with increasing concentration of surface active impurities (∂γ/∂X < 0). The thermal coefficient ∂γ/∂T is also changed (Fig. 11; Ricci and Passerone, 1993; Vitol and Orlova, 1984). If the concentration of the impurities is high enough, ∂γ/∂T even becomes positive below some temperature $T_o$. Above $T_o$ the coefficient is again negative and approaches the value of the pure metal.
Figure 7. Typical crystallization at nearly constant velocity of a melt pool produced by a focused laser pulse in a crystalline metal film (aluminum, 60 nm). (a) Streak image. The melting 5-ns laser pulse was applied at the upper edge. The dark triangle is liquid metal; the vertical dark stripes within the bright area are Bragg-scattering crystals in the crystalline material. The propagation velocity of the crystal/liquid boundary is 5 m/s. (b) Texture after solidification of the melt. The rectangle marks the location of the streak aperture.
Figure 8. Concentric thickness modulations of a solidified laser pulse–produced melt in a gold film (90 nm), imaged by backscattered electrons in the scanning microscope.
Figure 9. Chemocapillary flow in a laser pulse–produced melt in an aluminum film (90 nm), imaged by short exposure transmission microscopy. Exposure time was 5 ns. The moment of exposure was counted from the peak of the laser pulse and is indicated at the upper left corners. The applied laser pulse (15 ns, 3.5 μJ) produced a hole.
Taking into account that the surface tension is a function of temperature and of the atomic fraction of surface active impurities, γ = γ(T, X), the shear
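stress driving the melt flow must then be
$$\frac{d\gamma}{dr} = \left(\frac{\partial \gamma}{\partial T}\right)_X \frac{\partial T}{\partial r} + \left(\frac{\partial \gamma}{\partial X}\right)_T \frac{\partial X}{\partial r} \tag{3}$$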
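The competition between the two terms of Eq. (3) can be made concrete with a few numbers. The sketch below evaluates $(\partial\gamma/\partial X)_T$ from the Gibbs isotherm, Eq. (2), assuming a constant surface excess Γ, so that $(\partial\gamma/\partial X)_T = -kT\Gamma/X$. The thermal coefficient −3 × 10⁻⁴ N/(m·K) and a gradient of magnitude 10⁹ K/m are taken from the text; the temperature, Γ, X, and the ∂X/∂r values are hypothetical, chosen only to show that a modest compositional gradient can cancel or reverse the thermocapillary stress. Following Eq. (1), a positive dγ/dr drags the surface liquid toward the periphery.

```python
K_B = 1.380649e-23  # Boltzmann constant, J/K


def chemocapillary_coefficient(temperature_K: float, surface_excess_m2: float, x_bulk: float) -> float:
    """(d gamma / d X)_T in N/m from Eq. (2), assuming a constant surface excess Gamma."""
    return -K_B * temperature_K * surface_excess_m2 / x_bulk


def total_shear_stress(dgamma_dT: float, dT_dr: float, dgamma_dX: float, dX_dr: float) -> float:
    """d gamma / dr of Eq. (3) in N/m^2: thermocapillary plus chemocapillary term."""
    thermo = dgamma_dT * dT_dr
    chemo = dgamma_dX * dX_dr
    return thermo + chemo


def flow_direction(stress: float) -> str:
    """Positive stress pulls surface liquid outward (centrifugal), negative inward."""
    if stress > 0:
        return "centrifugal"
    if stress < 0:
        return "centripetal"
    return "none"


DGAMMA_DT = -3e-4  # N/(m K), pure-metal value quoted in the text
DT_DR = -1e9       # K/m, negative radial gradient of order 10^9 K/m

# Hypothetical impurity parameters, for illustration only.
T = 2000.0           # K
GAMMA_EXCESS = 1e19  # adsorbed atoms per m^2
X = 0.01             # bulk atomic fraction
dgamma_dX = chemocapillary_coefficient(T, GAMMA_EXCESS, X)

for dX_dr in (0.0, 1e4, 2e4):  # compositional gradient in 1/m
    s = total_shear_stress(DGAMMA_DT, DT_DR, dgamma_dX, dX_dr)
    print(f"dX/dr = {dX_dr:8.0f} 1/m -> d(gamma)/dr = {s:10.3e} N/m^2 ({flow_direction(s)})")
```

With no compositional gradient the stress is outward, as in Eq. (1); a positive ∂X/∂r, of the sign the flow itself produces, first weakens and then reverses it.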
It is determined by the thermal and compositional gradients, which cause a thermocapillary and a chemocapillary flow, respectively.
Figure 10. Solidified melt pools in the same gold film (65 nm), showing that melt flow after one laser pulse is opposite to that after two successive pulses. (a) Transmission microscope image of the solidified melt after one laser pulse of 1.6 μJ. The melt piled up at its center. (b) Structure after two successive pulses that are 4 μs apart and have the same energy as in (a). The melt solidified after the first pulse and piled up at its periphery after the second melting pulse. The melts solidified about 1 μs after a laser pulse.
Figure 11. Typical dependence of the surface tension γ of a metal on temperature and on the atomic fraction X of surface active impurities in the bulk liquid. $T_m$ and $T_c$ are the melting and critical temperatures, respectively. With growing X a maximum of γ appears at $T_o$.
Oxygen atoms are known
to be surface active in various metals (Ricci and Passerone, 1993; Vitol and Orlova, 1984), and they were abundant in the investigated films, which had been exposed to the ambient atmosphere. With all this in mind, the following scenario is expected. The melt, originally having a homogeneous distribution of impurities along r (∂X/∂r = 0) but subjected to a thermal gradient (∂T/∂r < 0), starts to flow as a result of the thermocapillary shear stress:
$$\left(\frac{\partial \gamma}{\partial r}\right)_X = \left(\frac{\partial \gamma}{\partial T}\right)_X \frac{\partial T}{\partial r} \tag{4}$$
The bulk liquid lags behind the near-surface layers, since it is dragged by them by means of viscosity and since it is driven by the Laplace pressure, which appears only as the surface deforms. Accordingly, the surface active atoms are redistributed by the fast surface flow in such a way that their concentration is reduced in regions having a positive gradient of the surface velocity, and vice versa. A compositional gradient ∂X/∂r emerges, which has the same sign as the thermocapillary force and which produces a chemocapillary shear stress:
$$\left(\frac{\partial \gamma}{\partial r}\right)_T = \left(\frac{\partial \gamma}{\partial X}\right)_T \frac{\partial X}{\partial r} \tag{5}$$
Since $(\partial\gamma/\partial X)_T < 0$, the chemocapillary force produced by a thermocapillary-driven flow always opposes the latter. Therefore, the original flow is either stopped or even reversed, in agreement with the observed flow dynamics. The different directions of the early stages of flow (i.e., centripetal after a low-energy and centrifugal after a high-energy laser pulse) follow from the
convex shape of the γ–T curve at high concentrations of surface active impurities (Fig. 11). Since the compositional gradient ∂X/∂r is zero at the beginning, the direction of the early melt flow is determined by the sign of the thermal coefficient ∂γ/∂T alone. If the maximum temperature produced by the laser pulse is below $T_o$ in Figure 11, the thermocapillary coefficient ∂γ/∂T is positive everywhere, and the liquid contracts in the negative thermal gradient according to Eq. (4). If, however, the maximum temperature of the liquid (which is at the center of the melt pool) exceeds $T_o$, then ∂γ/∂T is negative up to some radius $r_o$, where the local temperature equals $T_o$, and positive beyond $r_o$ up to the solid rim. The liquid then experiences an outward thermocapillary drag at the center up to the radius $r_o$ and an inward shear stress beyond $r_o$. The melt starts to deplete at the center and at the periphery and produces a ring-shaped bulge somewhere in between (Fig. 6). The appearance of a circular trench in aluminum films at higher temperatures (Fig. 9) cannot be explained, as before, by a positive thermocapillary coefficient ∂γ/∂T > 0, because the temperature produced by the laser pulses used is too high (T > $T_o$ at the center). Instead, chemocapillarity is presumably operating. The surface active oxygen atoms, stemming from the disintegrated native oxide, are to a large extent evaporated from the center (which is hottest) during the laser pulse (see also Fig. 16). Thus, a positive gradient ∂X/∂r > 0 of the oxygen concentration is produced. Since the temperature has its maximum at the center of the melt, the thermal gradient is small near the center (∂T/∂r ≈ 0), so that the total shear stress in Eq. (3) may become negative there and force the central zone of the melt to contract. This physical picture has been substantiated by numerical simulations (Balandin, Otte, et al., 1995).
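The geometry of this argument can be sketched for a bell-shaped temperature profile. The code assumes a Gaussian profile T(r) = T_b + ΔT_p exp(−r²/a²) and only the qualitative sign rule of Fig. 11 (∂γ/∂T > 0 below $T_o$, < 0 above); all temperatures and the width a are hypothetical. It locates $r_o$ and the melt radius and reports the direction of the early thermocapillary drag of Eq. (4), with ∂X/∂r = 0, at a few radii.

```python
import math
from typing import Optional


def temperature(r: float, t_base: float, dt_peak: float, width: float) -> float:
    """Assumed Gaussian in-plane temperature profile T(r), in kelvin."""
    return t_base + dt_peak * math.exp(-((r / width) ** 2))


def radius_at(t_target: float, t_base: float, dt_peak: float, width: float) -> Optional[float]:
    """Radius where T(r) equals t_target, or None if the profile never reaches it."""
    if not (t_base < t_target < t_base + dt_peak):
        return None
    return width * math.sqrt(math.log(dt_peak / (t_target - t_base)))


def early_drag(r: float, t_base: float, dt_peak: float, width: float, t_o: float) -> str:
    """Direction of (d gamma/dr)_X = (d gamma/dT)_X * dT/dr with dX/dr = 0 (Eq. 4)."""
    if r <= 0:
        return "none"  # dT/dr vanishes at the centre of the profile
    t = temperature(r, t_base, dt_peak, width)
    sign_dgamma_dT = 1 if t < t_o else -1  # qualitative shape of Fig. 11
    sign_dT_dr = -1                        # the Gaussian falls off outward
    return "outward" if sign_dgamma_dT * sign_dT_dr > 0 else "inward"


# Hypothetical parameters for a hot melt whose centre exceeds T_o.
T_BASE, DT_PEAK, WIDTH = 300.0, 3700.0, 5e-6
T_O, T_M = 2500.0, 1800.0

r_o = radius_at(T_O, T_BASE, DT_PEAK, WIDTH)
r_melt = radius_at(T_M, T_BASE, DT_PEAK, WIDTH)
print(f"r_o = {r_o * 1e6:.2f} um, melt radius = {r_melt * 1e6:.2f} um")
for r in (1e-6, 3e-6, 4e-6, 4.5e-6):
    print(f"r = {r * 1e6:.1f} um: {early_drag(r, T_BASE, DT_PEAK, WIDTH, T_O)}")
```

With these numbers the drag is outward inside $r_o$ and inward between $r_o$ and the melt edge, which is the depletion at center and periphery described above. If the peak stays below $T_o$, `radius_at` returns `None` for $T_o$ and the drag is inward everywhere.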
The concentric ripples occurring in solidified melts produced by lower energy laser pulses (Fig. 8) cannot be explained by simple physical arguments. They are certainly not frozen capillary waves, as one might think at first: the large number of ripples would mean that they are due to a high-frequency mode, whose excitation, however, is very improbable. The formation of the observed ripples was reproduced by a numerical simulation that is based on the Navier–Stokes equation, includes thermo- and chemocapillary shear stresses, and assumes that the surface active impurity atoms segregate at the moving crystallization front and accumulate in the adjacent melt (Balandin, Gernert, et al., 1997; Balandin, Nink, et al., 1998). These simulations give the following physical picture of the solidification process in a metal melt with surface active impurities. As the crystallization velocity exceeds a threshold of about 6 m/s (in gold), a front wave with a width of about 1 μm is produced ahead of the moving phase boundary. It pulsates and periodically emits steps of the impurity concentration, which in turn cause steps of the surface tension and these, in turn, steps of the flow velocity. All these abrupt changes propagate into the melt. As the crystallization front sweeps across the agitated liquid, ripples of the observed period are in fact frozen in.
Figure 12. Short exposure transmission electron microscopy images showing the crystallization of a laser pulse–produced melt in a germanium film (50 nm). Exposure time was 40 ns. The time of exposure after the laser pulse is indicated at the top right corner (∞, 10 s after the melting 30-ns laser pulse). Note the pileup of liquid at the moving crystallization front.
A front wave moving
along with the phase boundary in a crystallizing germanium melt is shown in Figure 12 (Bostanjoglo, Marine, et al., 1992).
b. Ablation of Metal Films
If the deposited laser pulse energy exceeds the enthalpies of melting and evaporation, a certain amount of the hottest part of the melt will evaporate during the laser pulse. Figure 13 shows how evaporation and thermocapillarity compete in ablating an aluminum film after a pulse of medium fluence (Niedrig and Bostanjoglo, 1997). A circular trench emerges, as at lower fluences. In addition, the liquid is removed by evaporation at the center. A hovering liquid ring remains, which collapses as a result of the surface tension. Simultaneously, the hole is expanded by surface tension with a velocity v that can be estimated by equating the approximate change $d(2\pi r^2\gamma)$ of the surface energy and the change $d(2\pi r\,\pi R^2 \rho v^2/2)$ of the kinetic energy:
$$v \approx \left(\frac{4\gamma}{\rho D}\right)^{1/2} \tag{6}$$
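Eq. (6) can be evaluated directly. The sketch assumes approximate values for liquid aluminum near its melting point (γ ≈ 0.87 N/m, ρ ≈ 2.4 × 10³ kg/m³); these are assumptions, not values from the text. For D = 100 nm it gives about 120 m/s, and thinner films open faster, as $D^{-1/2}$.

```python
import math


def hole_expansion_velocity(surface_tension: float, density: float, thickness: float) -> float:
    """Eq. (6): v ~ sqrt(4 gamma / (rho D)), SI units, result in m/s."""
    return math.sqrt(4.0 * surface_tension / (density * thickness))


# Assumed approximate properties of liquid aluminum near its melting point.
GAMMA_AL = 0.87  # N/m
RHO_AL = 2.4e3   # kg/m^3

for d_nm in (50, 100, 200):
    v = hole_expansion_velocity(GAMMA_AL, RHO_AL, d_nm * 1e-9)
    print(f"D = {d_nm:3d} nm -> v ~ {v:5.0f} m/s")
```

Because γ falls with temperature, a hotter rim would open somewhat more slowly than this estimate suggests.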
Calculated and measured velocities are of the order of 100 m/s for films with a thickness of D ≈ 100 nm. As the fluence exceeds a threshold (e.g., ≈5 J/cm² for a 90-nm Al film), ablation of the aluminum film proceeds exclusively by evaporation (Fig. 14). Liquid flow is reduced to a short radial expansion of the hole, which curls up its rim and disrupts it into spheres by the Rayleigh instability.
Figure 13. Double-frame short exposure imaging of the ablation of an aluminum film (90 nm) by volume evaporation and thermocapillary flow, caused by a 15-ns laser pulse of 4 μJ. Exposure time was 5 ns. The moment of exposure was counted from the peak of the laser pulse and is given at the top left corners of the frames. The double-frame series a–c were produced at three different regions of the same film. The final state was always a hole, as in series a.
At first sight, the ablation processes in Figures 13 and 14 seem to be self-explanatory, but numerical simulations uncover some surprises (Balandin,
Niedrig, et al., 1995; Niedrig and Bostanjoglo, 1997). The observed time scales of ablation, by the combined action of thermocapillary flow and evaporation and by evaporation alone, require that the following two conditions hold:
1. The surface tension decreases with a constant slope of −3 × 10⁻⁴ N/(m·K) from the melting temperature up to ≈3000 K. This coefficient is equal to the tabulated value of pure aluminum near its melting point (933 K). Above ≈3000 K the surface tension heads toward zero at the critical temperature of ≈8500 K with a very small coefficient of −0.2 × 10⁻⁴ N/(m·K).
2. Surface evaporation is marginal when aluminum is heated by nanosecond laser pulses. Instead, evaporation proceeds by volume evaporation (i.e., boiling), which is calculated to set in at ≈6000 K, assuming that the nucleation of critical bubbles in the freestanding films is homogeneous.
Figure 14. Volume evaporation of an aluminum film (90 nm) by a 15-ns laser pulse of 6.5 μJ. Exposure time was 5 ns. The moment of exposure was counted from the peak of the laser pulse and is indicated at the top left corners of the frames. The four double frames a–d were produced at different regions of the same film. The final state was always a hole coinciding in size with that in frame “45 ns” of d.
Models based on equilibrium surface evaporation (with the evaporation rate and pressure given by the Hertz–Knudsen–Langmuir and Clausius–Clapeyron equations, respectively) have been advanced, for example, by Ho et al. (1995), Metev and Veiko (1998), and Pronko et al. (1995) to explain
ablation of metals by short laser pulses. Although these models reproduce the ablated volume surprisingly well (Preuss et al., 1995; Singh et al., 1990), according to the preceding findings they cannot describe the dynamics of evaporation of aluminum, and probably of other metals, by nanosecond laser pulses, and are therefore misleading.
3. Space–Time Resolution
a. Short-Time Exposure Bright-Field Imaging
Because each electron image point requires a minimum electron dose to be registered in a single exposure, space and time resolution are not independent. The joint resolution is limited by shot noise in the electron beam and by detector noise. Let a specimen area of diameter Δx be illuminated by n electrons during an exposure time Δt. A portion $n_i$ of the scattered electrons is passed by the objective lens aperture and produces the bright-field image. An image detector with gain G delivers $n_d = G n_i$ signal electrons. Two adjacent areas of
equal diameter Δx, which produce different numbers $n_{i1}$ and $n_{i2}$ of image electrons, are distinguished by the detector if the mean difference $|\bar G \bar n_{i1} - \bar G \bar n_{i2}|$ of the signal electrons exceeds the root-mean-square noise amplitude $[\overline{(\Delta n_{d1})^2} + \overline{(\Delta n_{d2})^2}]^{1/2}$ by a minimum signal-to-noise ratio of about 3; that is,
$$\frac{|\bar G \bar n_{i1} - \bar G \bar n_{i2}|}{\left[\overline{(\Delta n_{d1})^2} + \overline{(\Delta n_{d2})^2}\right]^{1/2}} > 3 \tag{7}$$
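The overbar denotes the average value. The fluctuations $\Delta n_d$ of the number of detected electrons comprise shot noise $\Delta n_i$ in the beam and detector noise expressed by $\Delta G$. The mean square of $\Delta n_d$ is then
$$\overline{(\Delta n_d)^2} = \overline{G^2}\,\overline{(\Delta n_i)^2} + \overline{n_i^2}\,\overline{(\Delta G)^2} \tag{8}$$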
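Since the shot noise obeys the Poisson distribution, one has $\overline{(\Delta n_i)^2} = \bar n_i$ and $\overline{n_i^2} = \bar n_i^2 + \bar n_i$, which gives, with $\overline{G^2} = \bar G^2 + \overline{(\Delta G)^2}$,
$$\overline{(\Delta n_d)^2} = \bar n_i \bar G^2\left[1 + (2 + \bar n_i)\,\overline{(\Delta G)^2}/\bar G^2\right] > \bar n_i \bar G^2 \tag{9}$$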
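The step from Eq. (8) to Eq. (9) is a direct substitution of the Poisson moments and of $\overline{G^2} = \bar G^2 + \overline{(\Delta G)^2}$; written out, it reads:

$$
\begin{aligned}
\overline{(\Delta n_d)^2}
&= \left[\bar G^2 + \overline{(\Delta G)^2}\right]\bar n_i + \left(\bar n_i^2 + \bar n_i\right)\overline{(\Delta G)^2} \\
&= \bar n_i \bar G^2 + \left(2\bar n_i + \bar n_i^2\right)\overline{(\Delta G)^2} \\
&= \bar n_i \bar G^2\left[1 + (2 + \bar n_i)\,\overline{(\Delta G)^2}/\bar G^2\right]
\end{aligned}
$$

Since the gain-fluctuation term is nonnegative, the lower bound $\bar n_i \bar G^2$ follows immediately.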
The term within the brackets is of order 1, as the detector gain G is usually very high and the number $n_i$ of electrons imaging the small area $\pi(\Delta x)^2/4$ within the short time Δt is small. If one combines Eqs. (7) and (9), using the image contrast $K = |\bar n_{i1} - \bar n_{i2}|/(\bar n_{i1} + \bar n_{i2})$ of the two adjacent regions and expressing the average number of image electrons by the current density j of the illuminating electrons, their charge e, and the average transmission factor ε of the objective lens aperture, that is, $(\bar n_{i1} + \bar n_{i2})/2 = \varepsilon\pi(\Delta x)^2 j\,\Delta t/4e$, the relation between spatial resolution Δx and exposure time Δt becomes
$$(\Delta x)^2\,\Delta t > \frac{18e}{\pi\varepsilon K^2 j} \tag{10}$$
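Eq. (10) follows from Eq. (7) by using the lower bound of Eq. (9) for each area: the ratio becomes $K(2\bar n_i)^{1/2}$ with $\bar n_i = (\bar n_{i1}+\bar n_{i2})/2$, so at least $9/(2K^2)$ image electrons must reach each resolved area. The sketch below evaluates this and the right-hand side of Eq. (10) for the gun parameters quoted in the next paragraph (2 mA, 30-μm-diameter object area, ε ≈ 0.1, K ≈ 1), assuming the current is spread uniformly over that area. It gives roughly 3 × 10³ nm²·ns and about 18 nm for a 10-ns exposure, the same order of magnitude as the figures quoted in the text.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C


def min_image_electrons(contrast: float, snr: float = 3.0) -> float:
    """Mean image electrons per area needed so that K * sqrt(2 n) exceeds snr."""
    return snr**2 / (2.0 * contrast**2)


def joint_limit(eps: float, contrast: float, j_A_per_m2: float) -> float:
    """Right-hand side of Eq. (10), 18 e / (pi eps K^2 j), in m^2 s."""
    return 18.0 * E_CHARGE / (math.pi * eps * contrast**2 * j_A_per_m2)


def best_resolution(exposure_s: float, eps: float, contrast: float, j_A_per_m2: float) -> float:
    """Smallest diameter dx (m) allowed by Eq. (10) for a given exposure time."""
    return math.sqrt(joint_limit(eps, contrast, j_A_per_m2) / exposure_s)


# 2-mA pulse assumed spread uniformly over an object area 30 um in diameter.
j = 2e-3 / (math.pi * (15e-6) ** 2)  # A/m^2

limit_nm2_ns = joint_limit(0.1, 1.0, j) * 1e18 * 1e9  # m^2 s -> nm^2 ns
dx_10ns = best_resolution(10e-9, 0.1, 1.0, j)
print(f"n_i >= {min_image_electrons(1.0):.1f} electrons")
print(f"(dx)^2 dt >= {limit_nm2_ns:.2e} nm^2 ns; dx(10 ns) >= {dx_10ns * 1e9:.0f} nm")
```

At full contrast only about five image electrons per resolved area are needed, which is why the limit is set mainly by the available current density.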
If one uses a laser-driven photoelectron gun that delivers 2-mA electron pulses into an object area about 30 μm in diameter and takes ε ≈ 0.1 and K ≈ 1, Eq. (10) gives an ultimate joint resolution of $(\Delta x)^2\Delta t \ge 5 \times 10^3$ nm²·ns. An image with an exposure time of, for example, 10 ns will thus have a spatial resolution of 20 nm at best. An additional limitation is imposed by electron beam heating of the specimen. On the one hand, the beam current density should be as high as possible to keep shot noise low. On the other hand, the probing electron beam should not induce any transitions in the specimen. Since heating by the illuminating electron pulse is adiabatic at such short exposure times, the total energy $\bar n\,\Delta E$ deposited by the $\bar n$ electrons of the pulse must obey
$$\bar n\,\Delta E \le \pi\left(\frac{\Delta x}{2}\right)^2 D\rho c\,\Delta T \tag{11}$$
where ΔE, ρ, c, D, and ΔT are the average energy loss of a beam electron, density,
specific heat, thickness, and maximum allowed electron-induced temperature rise of the film, respectively. If one inserts ΔE = AρD, with A ≈ 5 × 10⁻¹³ J·cm²/g according to the Bethe stopping power formula (e.g., Reimer, 1993), and uses Eq. (10), the resolution limit due to electron beam heating is given by
$$\Delta x > \left(\frac{18A}{\pi\varepsilon K^2 c\,\Delta T}\right)^{1/2} \tag{12}$$
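Eq. (12) can be evaluated for the iron example given in the next paragraph. The sketch uses the Dulong–Petit value c = 3k/m, as the text does, takes ΔT as the full melting temperature of iron (1811 K, an assumption about how "the melting temperature" is applied), and keeps A in the text's units of J·cm²/g so that the result comes out in centimeters. With ε ≈ 0.1 and K ≈ 1 it gives roughly 2 nm, of the same order as the ≈3 nm quoted; since Δx scales as $(c\,\Delta T)^{-1/2}$, the exact figure depends on the choices for c and ΔT.

```python
import math

K_B = 1.380649e-23           # Boltzmann constant, J/K
AMU_KG = 1.66053906660e-27   # atomic mass unit, kg
A_BETHE = 5e-13              # J cm^2/g, value quoted in the text


def dulong_petit_specific_heat(atomic_mass_u: float) -> float:
    """High-temperature specific heat c = 3k/m, returned in J/(g K)."""
    per_kg = 3.0 * K_B / (atomic_mass_u * AMU_KG)
    return per_kg / 1000.0


def heating_limit_cm(a_bethe: float, eps: float, contrast: float, c_J_gK: float, delta_t_K: float) -> float:
    """Eq. (12): dx > sqrt(18 A / (pi eps K^2 c dT)); result in cm."""
    return math.sqrt(18.0 * a_bethe / (math.pi * eps * contrast**2 * c_J_gK * delta_t_K))


c_fe = dulong_petit_specific_heat(55.845)  # iron
dx_cm = heating_limit_cm(A_BETHE, 0.1, 1.0, c_fe, 1811.0)
print(f"c(Fe) ~ {c_fe:.3f} J/(g K); dx > {dx_cm * 1e7:.1f} nm")
```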
Taking iron, for example, and replacing the actual specific heat by its high-temperature value c = 3k/m (k, Boltzmann constant; m, atomic mass) and ΔT by the melting temperature, one gets, with ε ≈ 0.1 and K ≈ 1, Δx ≈ 3 nm as the absolute spatial resolution in this case.
b. Streak Imaging
The time resolution Δt of a streak image may be defined as
$$\Delta t = t_s\,w/L \tag{13}$$
where $t_s$ is the streak period, w the width of the streak aperture, and L the streak distance, the last two measured in the object plane. The spatial resolution Δx along the streak aperture is determined as for short exposure imaging. Two adjacent rectangular areas of the specimen with width w and length Δx are distinguished by the detector within the time Δt if their signal-to-noise ratio exceeds a minimum value of about 3. Expressing the average numbers of image electrons $n_{i1}$ and $n_{i2}$ from the two areas again by the illuminating current density j, that is, $(\bar n_{i1} + \bar n_{i2})/2 = \varepsilon w\,\Delta x\, j\,\Delta t/e$, one gets an inequality similar to that in the preceding section:
$$\Delta x\,\Delta t > \frac{9e}{2\varepsilon K^2 w j} \tag{14}$$
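Eqs. (13) and (14) can be checked against the typical values given in the next paragraph ($t_s$ = 100 ns, w/L ≈ 0.1, ε ≈ 0.1, K ≈ 0.2, w ≈ 1 μm, j ≈ 3 A/cm² = 3 × 10⁴ A/m²). The sketch simply evaluates both expressions in SI units; the helper names are illustrative. It returns about 10 ns and about 0.6 μm, in line with the text.

```python
E_CHARGE = 1.602176634e-19  # elementary charge, C


def streak_time_resolution(streak_period_s: float, w_over_L: float) -> float:
    """Eq. (13): dt = t_s * w / L."""
    return streak_period_s * w_over_L


def streak_spatial_resolution(dt_s: float, eps: float, contrast: float,
                              slit_width_m: float, j_A_per_m2: float) -> float:
    """Eq. (14) solved for dx: dx > 9 e / (2 eps K^2 w j dt), in metres."""
    return 9.0 * E_CHARGE / (2.0 * eps * contrast**2 * slit_width_m * j_A_per_m2 * dt_s)


dt = streak_time_resolution(100e-9, 0.1)
dx = streak_spatial_resolution(dt, eps=0.1, contrast=0.2, slit_width_m=1e-6, j_A_per_m2=3e4)
print(f"dt = {dt * 1e9:.0f} ns, dx > {dx * 1e6:.2f} um")
```

Because Δx·Δt is fixed by Eq. (14), halving the slit-limited time resolution doubles the smallest resolvable length along the slit.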
Taking typical values $t_s$ = 100 ns, w/L ≈ 0.1, ε ≈ 0.1, K ≈ 0.2, w ≈ 1 μm, and j ≈ 3 A/cm², a one-dimensional spatial resolution of Δx ≈ 0.6 μm is calculated for a time resolution of Δt = 10 ns. This value approximately agrees with the actual resolution.
c. Image Intensity Tracking
The joint space–time resolution is derived in a similar way as before. The specimen is illuminated by an electron current of density j. A fraction ε of the scattered electrons passes the objective lens aperture and produces a bright-field image. If Δx is the diameter of the specimen area viewed by the scintillator/photomultiplier detector, the current picked up by the detector is $J = \varepsilon j\pi(\Delta x)^2/4$. The output signal current of the detector, which has a gain G (≫1), is $J_s = GJ$. This signal is superimposed on a noise current $J_n$ with an average amplitude $\langle J_n^2\rangle^{1/2}$. The noise is composed of fluctuations ΔG of the gain plus the amplified shot noise $(2eJ\Delta f)^{1/2}$ of the image current J:
$$\langle J_n^2\rangle^{1/2} = \left[J\,(\Delta G)^2 + 2eJ\Delta f\,G^2\right]^{1/2} \approx \left(2eJ\Delta f\,G^2\right)^{1/2} \tag{15}$$
where Δf is the bandwidth of the detector and the processing electronic circuits. Since the detector is based on multiplication processes with very high gain, one has (ΔG)² ≈ G ≫ 1, and since the image current J and its average shot noise amplitude are of the same magnitude near the resolution limit, Eq. (15) simplifies as indicated. A transition producing a change $\Delta J_s$ of the signal is resolved if it exceeds the noise amplitude $\langle J_n^2\rangle^{1/2}$ by a factor of at least 3:
$$\Delta J_s = G\,\Delta J \ge 3\langle J_n^2\rangle^{1/2} \approx 3G\left(2eJ\Delta f\right)^{1/2} \tag{16}$$
If one inserts J and replaces the bandwidth Δf by the minimum detectable rise/fall time Δt ≈ 0.35/Δf, the joint space–time resolution becomes
$$(\Delta x)^2\,\Delta t \ge \frac{25e}{\pi\varepsilon j\,(\Delta J/J)^2} \tag{17}$$
Assuming typical values j ≈ 10 A/cm² (from a conventional thermionic tungsten hairpin gun) and ε ≈ 0.1, Inequality (17) states that a phase transition of, for example, 3-ns duration that produces a change ΔJ/J ≈ 1 of the image current can be detected in specimen areas with diameters down to Δx ≈ 0.2 μm.
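The 3-ns figure used here matches the time resolution of the recording unit given earlier for intensity tracking, obtained by adding the rise times of scintillator, photomultiplier, and oscilloscope in quadrature. The sketch below reproduces that sum and then evaluates Eq. (17); the numerical factor 25 is the rounded value of 4 × 9 × 2 × 0.35 = 25.2 that results from Eq. (16) with Δf ≈ 0.35/Δt. Parameter values are those of the text; the helper names are illustrative.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C


def combined_rise_time(*rise_times_s: float) -> float:
    """Quadrature sum of the rise times of cascaded detector stages."""
    return math.sqrt(sum(t * t for t in rise_times_s))


def tracking_min_diameter(dt_s: float, eps: float, j_A_per_m2: float, rel_change: float) -> float:
    """Eq. (17) solved for dx: (dx)^2 dt >= 25 e / (pi eps j (dJ/J)^2), in metres."""
    return math.sqrt(25.0 * E_CHARGE / (math.pi * eps * j_A_per_m2 * rel_change**2 * dt_s))


# Scintillator 1.9 ns, photomultiplier 2 ns, oscilloscope 0.35 ns (from the text).
dt = combined_rise_time(1.9e-9, 2.0e-9, 0.35e-9)
# j = 10 A/cm^2 = 1e5 A/m^2, eps = 0.1, dJ/J = 1, 3-ns transition.
dx = tracking_min_diameter(3e-9, eps=0.1, j_A_per_m2=1e5, rel_change=1.0)
print(f"recording rise time ~ {dt * 1e9:.2f} ns; dx >= {dx * 1e6:.2f} um")
```

The quadrature sum comes out just under 3 ns, and Eq. (17) then gives about 0.2 μm, as stated above.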
B. Flash Photoelectron Microscopy
Any electrons released from a surface (e.g., by ion, electron, or photon bombardment, by heating, or by high electric fields) can be used to image the surface. Photoelectrons ejected by laser pulses are particularly suited for short exposure imaging because
• high electron current densities can be produced without damaging the specimen
• the moment of exposure can be freely chosen
For decades photoelectron microscopy has been used as a powerful surface-imaging technique. Very different material properties have been characterized:
• crystal texture and defects (Engel, 1966; Griffith and Rempfer, 1987; Griffith et al., 1991; Möllenstedt and Lenz, 1963)
• chemical reactions and pattern formation (Ehsasi et al., 1993; Engel et al., 1991; Rotermund et al., 1991)
• p–n junctions, metal leads, and surface states on semiconductor devices (Giesen et al., 1997; Ninomiya and Hasegawa, 1995)
• surface diffusion (von Oertzen et al., 1992)
• biological tissue (De Stasio et al., 1998; Griffith, 1986)
The spatial resolution of photoelectron microscopy and related techniques, such as low-energy and mirror electron microscopy, was discussed, for example, by Möllenstedt and Lenz (1963) and by Rempfer and Griffith (1992). Photoelectrons are emitted after single- or multiphoton absorption. The former requires that the photon energy hf (h, Planck constant; f, frequency of the light) exceed the binding energy of the electron; that is, hf > $W_A$ for a metal with a work function $W_A$. At nonzero temperatures, thermally excited electrons can be emitted by lower energy photons. Two-photon absorption, the simplest multiphoton process, produces a photoelectron by the simultaneous absorption of two photons. If they have equal frequencies, their quantum energy need only exceed $W_A/2$. However, the intensity of the light must be so high that, on average, two photons interact with an electron within a time $h/W_A$, according to the uncertainty relation. As the absorption cross section is about σ ≈ 10⁻¹⁶ cm², intensities of at least $W_A^2/(h\sigma) \approx 10^{13}$ W/cm² are required for metals with $W_A$ ≈ 4 eV. Such high light intensities can be produced by laser pulses, but they inevitably damage most metals unless femtosecond pulses are used. Unfortunately, these ultrashort pulses produce far too few electrons per pulse (at fluences below the damage threshold) for a short exposure image with an acceptable signal-to-noise ratio. Therefore, single-photon absorption has been exploited exclusively for photoelectron microscopy. The contrast is determined mainly by the local yield of photoelectrons. This yield depends on the true local work function, on the local thickness of any dielectric (oxide) coating, and on local variations of the electric field caused by surface geometry and by adsorbed molecules with high electric polarizability or with a permanent electric dipole.
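The numbers in the two-photon estimate can be reproduced directly. The sketch evaluates the interaction time $h/W_A$ and the threshold intensity $W_A^2/(h\sigma)$ for $W_A$ = 4 eV and σ = 10⁻¹⁶ cm², as quoted; it is a check of the arithmetic, not a model of two-photon photoemission. It gives about 1 fs and about 6 × 10¹² W/cm², that is, of order 10¹³ W/cm².

```python
H = 6.62607015e-34    # Planck constant, J s
EV = 1.602176634e-19  # J per eV


def interaction_time_s(work_function_eV: float) -> float:
    """Time window h / W_A from the uncertainty relation, in seconds."""
    return H / (work_function_eV * EV)


def two_photon_threshold_W_cm2(work_function_eV: float, sigma_cm2: float) -> float:
    """Threshold intensity W_A^2 / (h sigma), in W/cm^2 when sigma is in cm^2."""
    w = work_function_eV * EV
    return w * w / (H * sigma_cm2)


print(f"h/W_A = {interaction_time_s(4.0):.2e} s")
print(f"I_min ~ {two_photon_threshold_W_cm2(4.0, 1e-16):.1e} W/cm^2")
```

A window of about one femtosecond is why only femtosecond pulses can supply such intensities without damaging the metal.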
Such adsorbed molecules (e.g., water molecules) may enhance the photoelectron emission by more than one order of magnitude (Buzulutskov et al., 1997). All these effects merge to produce an effective work function $W_A$ with a local variation $\Delta W_A$. Since the density of the photoelectron current (induced by one-photon absorption) is

$$j = \text{const}\,(hf - W_A)^n \tag{18}$$

with $n$ a positive constant, the contrast becomes

$$\frac{\Delta j}{j} = -\frac{n\,\Delta W_A}{hf - W_A} \tag{19}$$
The contrast increases sharply as the photon energy approaches the work function. At the same time, however, the quantum efficiency drops to zero and the photoelectron image is buried in shot noise. For this reason the illuminating photons should have a large quantum energy. Most metals of technical interest have work functions around 4 eV, so photons with $hf \approx 5$ eV offer a good compromise between contrast and shot noise.
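The trade-off between Eqs. (18) and (19) can be made concrete with a short sketch. The text only states that n is positive. Here n = 2 is assumed, following Fowler's near-threshold law for metals, and the constant in Eq. (18) is set to 1. The 0.1 eV local variation is a hypothetical example value.

```python
def photoemission_contrast(photon_ev: float, work_function_ev: float,
                           delta_w_ev: float, n: float = 2.0) -> float:
    """Eq. (19): relative current change dj/j = -n * dW_A / (hf - W_A)."""
    excess = photon_ev - work_function_ev
    if excess <= 0:
        raise ValueError("single-photon emission requires hf > W_A")
    return -n * delta_w_ev / excess


def relative_yield(photon_ev: float, work_function_ev: float, n: float = 2.0) -> float:
    """Eq. (18) with const = 1: photocurrent density ~ (hf - W_A)**n."""
    return max(photon_ev - work_function_ev, 0.0) ** n


# W_A = 4.0 eV with a hypothetical local variation of 0.1 eV; scan the photon energy
for hf in (4.2, 4.5, 5.0, 6.0):
    c = photoemission_contrast(hf, 4.0, 0.1)
    y = relative_yield(hf, 4.0)
    print(f"hf = {hf:.1f} eV: contrast {c:+.2f}, relative yield {y:.2f}")
```

Moving the photon energy toward W_A raises the magnitude of the contrast but shrinks the yield. This is the shot-noise penalty that motivates choosing photons of about 5 eV.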
HIGH-SPEED ELECTRON MICROSCOPY
Short exposure photoelectron imaging is most easily realized by illuminating the specimen with an ultraviolet laser pulse. Suitable sources are frequency-multiplied solid-state lasers and excimer lasers. The latter are preferable because their shorter coherence length helps to avoid disturbing interference patterns in the image. A good choice is the KrF laser (wavelength, 248 nm; hf = 5.0 eV).
1. Instrument for Short Exposure Imaging
Figure 15. Flash photoelectron microscope with attached lasers for treating the specimen.
All previous photoelectron microscopes had a time resolution limited to several milliseconds. A time resolution of a few nanoseconds was achieved by two measures: releasing the photoelectrons with a pulse from an excimer laser, which has a short coherence length, and carefully avoiding parasitic reflections that cause interference patterns. Figure 15 schematically shows the assembled flash photoelectron microscope, which can image nonrepetitive changes of a surface on the nanosecond time scale (Bostanjoglo and Weingärtner, 1997; Weingärtner and Bostanjoglo, 1998). The specimen is at a high negative potential (−25 to −30 kV). Imaging photoelectrons are released by a 4-ns (FWHM) pulse from a KrF excimer laser. The fluence of the ultraviolet pulse is kept low enough that the surface is not damaged. The photoelectrons are accelerated with a field of 5–8 kV/mm toward a grounded stainless steel anode. An electrostatic einzel lens focuses them to an intermediate image, which a magnetic lens projects onto a fiber plate transmission screen. The converted electron image is picked up with a fiber-coupled MCP image intensifier plus a CCD camera, digitized by a frame grabber, and stored in computer memory. A home-built trigger circuit allows an “exposure” to be made at any time relative to the visible processing laser pulse (wavelength, 532 or 620 nm). The aperture in the back focal plane of the electrostatic lens reduces the angular and energy spread of the imaging electrons. It therefore makes geometric modulations of the surface visible and improves the spatial resolution (Boersch, 1943; Möllenstedt and Lenz, 1963). Two adjustable aluminum mirrors fixed at the anode direct the illuminating ultraviolet pulse and the visible processing laser pulse onto the specimen. A beam blanker passes electrons to the detector for only 5 ns, during the ultraviolet laser pulse. The beam blanker consists of a low-impedance parallel plate capacitor, switched by an avalanche transistor–based cable pulser. Normally the capacitor deflects the electrons beyond the intercepting aperture in the back focal plane of the electrostatic einzel lens. In this way, long-lasting thermal electrons and delayed electrons liberated by excited atoms and ions are kept away from the image.
Their contribution to the image during the acquisition time (i.e., the “exposure”) is negligible if the fluence of the processing laser pulse is not excessive. For cleaning, the specimen can be heated by electron bombardment from the back side. The fast processes investigated were launched in the specimen by a focused pulse from one of two lasers. One was a Q-switched frequency-doubled Nd:YAG laser (pulse width, 10 ns; wavelength, 532 nm); the other was a colliding pulse mode–locked dye laser (pulse width, 100 fs; wavelength, 620 nm). The laser beams were focused on the specimen to a spot with a 1/e² diameter of 15 and 50 μm for the nano- and femtosecond pulses, respectively. To position the processing laser beam, the specimen is illuminated with white light and imaged with reflected and scattered radiation. The accelerating voltage can be cut off and the specimen grounded within 20 ns with a fast switch consisting of cascaded transistors. In this way a laser-induced electric breakdown is avoided by interrupting the avalanche buildup. This technique succeeds only if the onset of breakdown is delayed by more than the fall time of the switch (20 ns) plus the acquisition time for the image (5 ns).
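The spot sizes above are given as 1/e² diameters, and a pulse energy can be converted into a peak fluence for such a spot. The sketch below assumes an ideal Gaussian (TEM00) profile, which the text does not state; real laser spots may differ. The 1-μJ pulse energy is a hypothetical example, not a value used in the experiments.

```python
import math


def gaussian_peak_fluence_j_cm2(pulse_energy_j: float, diameter_1e2_m: float) -> float:
    """Peak fluence 2E / (pi * w**2) of an ideal Gaussian spot.

    Assumption: TEM00 profile; w is the 1/e^2 radius (half the 1/e^2 diameter).
    """
    w_cm = 0.5 * diameter_1e2_m * 100.0   # metres -> centimetres
    return 2.0 * pulse_energy_j / (math.pi * w_cm**2)


# Hypothetical 1-uJ pulses focused to the two spot sizes quoted in the text
for d_um in (15.0, 50.0):
    f0 = gaussian_peak_fluence_j_cm2(1e-6, d_um * 1e-6)
    print(f"{d_um:.0f} um spot: peak fluence {f0:.2f} J/cm^2 per uJ")
```

For a Gaussian profile the peak fluence is twice the value E/(πw²) obtained by spreading the energy uniformly over the 1/e² disk.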
Figure 16. Photoelectron images of an aluminum film (100 nm) on (100) silicon, showing the removal of the native aluminum oxide covering layer by a laser pulse (10 ns, 6 μJ, 20 μm ø). Exposure time was 5 ns. The moment of exposure was counted from the peak of the laser pulse and is given at the top right corner of each image (∞ denotes 10 s after the pulse). The images were produced at previously untreated neighboring regions with identical laser pulses.
2. Applications
Because photoelectron emission reflects the bonding of surface electrons, pulsed photoelectron microscopy is an excellent method for imaging local chemical reactions. Figure 16 shows, as an example, the reaction induced by a nanosecond laser pulse in aluminum covered with its native oxide (thickness, D ≈ 3–4 nm). The fluence was high enough to melt the surface of the metal but too low for appreciable evaporation of metal atoms, as no flashover was initiated. The photoelectron emission first decreases during the first 5–10 ns after the laser pulse. It then increases considerably, saturating after some 10 ns. If the surface is exposed to air, the photoelectron emission returns to the low value of the untreated material. It is well known that liquid aluminum decomposes aluminum oxide, Al₂O₃, producing a volatile suboxide, Al₂O (Champion et al., 1969). The dielectric native oxide coating reduces the number of ejected photoelectrons because only a fraction exp(−D/L) of them is transmitted, the mean free path of the photoelectrons in the oxide being L ≈ 1–3 nm (Buzulutskov et al., 1997). As the oxide coating disintegrates after the laser pulse, the photoelectron yield increases. Its rise time reflects the time it takes to decompose the oxide and evaporate the products from the melt. The early decrease of the photoelectron emission remains puzzling. Such a decrease was observed with all metal surfaces that had not been cleaned by electron beam heating prior to the laser treatment. It is therefore probably due to the removal of adsorbed polar molecules (e.g., water molecules), which add their dipole field to the cathode field, decreasing the
work function by epn/ε₀ (p and n, dipole moment and surface density of the adsorbed molecules; ε₀, vacuum permittivity). A particular benefit of photoelectron microscopy is that only the topmost layers of a specimen are probed. It is therefore particularly suited to uncovering incubation effects and early stages of radiation-induced material modifications. As an example relevant to laser microprocessing, flash photoelectron microscopy has been applied to visualize effects produced by nano- and femtosecond laser pulses with fluences near the ablation threshold. These two pulse lengths are much longer and much shorter, respectively, than the electron–lattice relaxation time, which is a few picoseconds for typical metals (e.g., Elsayed-Ali et al., 1987). The laser pulse energy is primarily absorbed by the electrons. For a nanosecond pulse, the electrons are practically in equilibrium with the atomic lattice. The laser power is therefore fed directly to the lattice, which is gradually destabilized by ordinary heating. This is not so for a femtosecond pulse. Here the laser pulse energy is first absorbed almost entirely by the electrons, which are excited to high levels and destabilize the atomic lattice. A metal is destabilized by the high pressure of the hot conduction electron gas, whereas bonds in semiconductors are weakened as the valence electrons are excited into the conduction band (Stampfli and Bennemann, 1992). If the electron excitation is high enough, the lattice collapses. At lower fluences only a destabilized lattice is produced, which starts to take up the energy of the electrons either by mechanical work or by exchange of heat (Stampfli and Bennemann, 1992). The atomic lattice is in two very different states when it takes up the energy of a nanosecond and of a femtosecond laser pulse. Its response on the thermodynamic time scale (some picoseconds and longer) is therefore expected to be quite different in the two cases.
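Two surface effects invoked above can be estimated in a few lines: attenuation of photoelectrons by an oxide film, exp(−D/L), and the work-function decrease epn/ε₀ caused by adsorbed dipoles. The dipole function uses the simple Helmholtz estimate, which assumes all dipoles are aligned normal to the surface and ignores mutual depolarization. The function names are illustrative.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def oxide_transmission(thickness_nm: float, mean_free_path_nm: float) -> float:
    """Fraction exp(-D/L) of photoelectrons crossing an oxide film of thickness D."""
    return math.exp(-thickness_nm / mean_free_path_nm)


def dipole_work_function_shift_ev(dipole_c_m: float, density_per_m2: float) -> float:
    """Work-function decrease e*p*n/eps0 in eV (numerically p*n/eps0 in volts).

    Helmholtz estimate: dipoles assumed aligned normal to the surface,
    depolarization between neighbouring dipoles is ignored.
    """
    return dipole_c_m * density_per_m2 / EPS0


# Native Al oxide, D = 3-4 nm, with L = 1-3 nm (ranges from the text)
for d in (3.0, 4.0):
    for lam in (1.0, 3.0):
        t = oxide_transmission(d, lam)
        print(f"D = {d} nm, L = {lam} nm: transmitted fraction {t:.3f}")
```

Over the quoted ranges the transmitted fraction spans roughly 2% to 36%. This spread is consistent with the strong rise in yield once the oxide is decomposed.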
Both metals and semiconductors have been observed to respond in different ways to nano- and femtosecond pulses (Weingärtner et al., 1998). Figure 17 shows the completely different effects produced by a 10-ns and a 100-fs laser pulse on (100) silicon with a native oxide layer (thickness ≈ 3 nm). The nanosecond pulse melts the silicon surface, as is confirmed by the final smooth crater-like structure (Fig. 18). Photoelectron emission rises as the silicon surface melts and remains high until the melt solidifies 100–200 ns after the laser pulse. Freezing is accompanied by a slight decrease of photoemission. Exposure of the surface to air returns the photoelectron yield to the low value of untreated silicon. The oxide coating is decomposed as the laser pulse melts the silicon surface, and the photoelectrons can escape from the liquid without crossing a solid coating. As the liquid silicon solidifies, oxygen atoms dissolved in the melt segregate at the surface and a covering oxide layer grows again. However, this layer is thinner than the original one, as part of the oxygen atoms evaporated, and the
Figure 17. Photoelectron images of (100) silicon with a native oxide covering layer of 3 nm thickness, showing the completely different response to (a) a 10-ns and (b) a 100-fs laser pulse. Exposure time was 5 ns. The moment of exposure was counted from the peak of the laser pulse and is given at the top right corner of each image (∞ denotes 10 s after the pulse). The images were produced at previously untreated neighboring regions. The energy was ≈6 μJ for the nanosecond pulse and ≈0.9 μJ for the femtosecond pulse.
photoemission is higher after the laser pulse. Thus, a nanosecond laser pulse partially removes the oxide from silicon in three steps: decomposition, transient storage of some oxygen dissolved in the melt, and regrowth of a thinner coating within 200 ns. This partial cleaning by a melting nanosecond laser pulse, though not its time scale, was previously documented by Auger spectroscopy (Larciprete et al., 1996). The pileup of the melt freezing at the periphery after a nanosecond pulse (Fig. 18) is not caused by recoil pressure from evaporating atoms. Evaporation was marginal, as no flashover occurred.
Figure 18. Typical smooth flat crater produced by a 10-ns laser pulse (≈5 μJ) on (100) silicon with native oxide and imaged by scanning electron microscopy with secondary electrons.
Figure 19. Typical rough patch produced by a 100-fs laser pulse (≈0.9 μJ) on (100) silicon with native oxide and imaged by scanning electron microscopy with secondary electrons at grazing incidence (80° against the normal of the surface).
Very similar final structures were produced on (100) silicon without an oxide coating. The crater-like distribution of the melt is therefore not governed by chemocapillary forces and must be caused by thermocapillary forces. The 100-fs laser pulse has a very different effect on (100) silicon covered by a native oxide (Figs. 17b and 19). Within the laser spot, the photoelectron yield is heavily reduced for ≈100 ns after the laser pulse. A small irregular zone with increased photoelectron emission develops from the dark area. The final structure consists of a weakly corrugated surface, which is barely visible in the scanning electron microscope (Fig. 19). It is invisible to light-optical microscopy, even to such surface-sensitive techniques as dark-field and interference microscopy. The transient phase produced by a femtosecond laser pulse on an oxide-coated silicon surface has a very low photoelectron yield and effectively suppresses evaporation of silicon atoms. It is probably a foam consisting of oxygen from the disintegrated oxide mixed with liquid silicon. This foam settles to a blistered surface with partially removed oxide after ≈100 ns. If a femtosecond laser pulse of equal fluence is applied to a silicon surface without an oxide layer, heavy ablation occurs. This leads to an electrical breakdown if the high voltage is not switched off within 50 ns. The response of a metal covered by a transparent oxide to a nanosecond laser pulse depends on the thermal stability of the oxide. Either the oxide is thermally destroyed or decomposed by the liquid metal, or it is stable at the melting temperature of the metal, as in the case of cobalt oxide, CoO, on cobalt. In the latter case, a nanosecond laser pulse that melts the metal may thicken the coating oxide by gathering oxygen atoms originally dissolved in the crystal. These atoms are abundant in the liquid once the crystal has melted and segregate at the floating oxide as the melt freezes again. This scenario
Figure 20. Photoelectron images showing the completely different response of cobalt to (a) a 10-ns and (b) a 100-fs laser pulse (fluence, ≈1 J/cm²). Exposure time was 5 ns. The moment of exposure was counted from the peak of the laser pulse. The arrow in a2 shows the fast-shrinking zone with unimpeded photoelectron emission in the solidifying melt. The arrow in b3 shows the crystal defect produced by the 100-fs laser pulse (already visible in b2).
explains the decrease of photoelectron emission of nanosecond laser-treated cobalt during cooldown (Fig. 20a). The reduction of the photoelectron yield was not caused by desorption of adsorbed polar molecules (e.g., water), as adsorbed layers had been removed by electron beam heating. A femtosecond pulse of similar fluence to the chemically active nanosecond pulse typically produces dark lines within a crystal (Fig. 20b). These are probably slip lines, bundles of stacking faults, or grain boundaries. During the first 20 ns after the laser pulse, the electron emission increases transiently at the place where the linear crystal defect later appears. This emission also occurs without photostimulation. Melting does not occur within the laser spot, as the crystals remain visible. The actual temperature is therefore too low to explain the electron emission as thermal emission. A nanosecond laser pulse with a fluence high enough to melt the treated metal starts chemical reactions between the metal and a coating oxide. The same metal may instead be treated by a femtosecond pulse of equal fluence that is also below the ablation threshold. It then experiences plastic deformations, which
proceed on the nanosecond time scale and are accompanied by emission of exoelectrons.
3. Limits
Flash photoelectron microscopy is subject to the usual limitations of the resolution, which originate from lens aberrations and shot noise and are shared with other imaging techniques. However, there are additional constraints because the specimen is located in a high electric field.
a. Limits of the Resolution
The space–time resolution is restricted by three factors: the aberration of the uncorrected accelerating field at the specimen, the space charge of the imaging electrons, and their shot noise. Assume that all lenses except the cathode lens are ideal, which is a good approximation. The spatial resolution $\Delta x_L$ is then that of the two-electrode cathode lens used, which is given by Möllenstedt and Lenz (1963) as

$$\Delta x_L = \frac{1.2\,\Delta E}{eF} \tag{20}$$
where $\Delta E$ and $F$ are the energy spread of the photoelectrons and the electric field at the specimen, respectively. The space charge produced by the photoelectrons reduces the applied accelerating field and blurs the image. There is no simple relation between the resolution and the electron current density. The actual blurring is considerably larger than that predicted by model calculations (Massey, 1983; Massey et al., 1981). In any case, space charge effects can be neglected if the current density $j_p$ of the photoelectrons is lower than the space charge–limited Child current density $j_{Ch}$ by one order of magnitude:

$$j_p < \frac{j_{Ch}}{10} = \frac{C F^{3/2}}{10\,a^{1/2}} \tag{21}$$
with $C = 2.34 \times 10^{-6}$ A/V$^{3/2}$ and $a$ the spacing of the two accelerating electrodes of the cathode lens. The joint space–time resolution, limited by shot noise, is given by Eq. (10) with $j$ replaced by the current density $j_p$ of the emitted photoelectrons:

$$(\Delta x_N)^2\,\Delta t > \frac{18e}{\pi \varepsilon K^2 j_p} \tag{22}$$
If one combines Inequalities (21) and (22), the space–time resolution limited by the combined action of shot noise and space charge is found to obey

$$(\Delta x_{N,s})^2\,\Delta t > \frac{180\,e\,a^{1/2}}{\pi C \varepsilon K^2 F^{3/2}} \tag{23}$$
The spatial resolution is improved by reducing the distance a between the electrodes and by increasing the accelerating field F. The former is limited to
$a > 3$ mm to provide convenient access for the laser beams, whereas the electric field should not exceed a safe value of ≈10 kV/mm. Using these limits together with $\varepsilon \approx 0.1$, $K \approx 0.2$, $\Delta t$ = 5 ns, and $\Delta E \approx 1.5$ eV gives a spatial resolution of $\Delta x_L + \Delta x_{N,s} \approx 0.8$ μm.
b. Limitation of the Laser Treatment
In situ material processing by the laser is constrained by the requirement that thermal electron emission and evaporation must not interfere with photoelectron imaging. Heating by the treating laser pulse must be such that the current density $j_{th}$ of the thermal electrons stays below that of the photoelectrons, $j_p$; that is,

$$j_{th} = A T^2 \exp\!\left(-\frac{W_A}{kT}\right) < j_p < \frac{C F^{3/2}}{10\,a^{1/2}} \tag{24}$$
Here Inequality (21) is applied, and the Richardson–Dushman expression is used for the current density of the thermal electron emission. In it, $A < 120$ A/cm²K², $k$ is the Boltzmann constant, and $T$ is the absolute temperature. Inserting the values for the electrode spacing ($a$ = 5 mm) and the electric field ($F = 5 \times 10^6$ V/m) yields maximum allowed temperatures of 2400–2800 K for metals with work functions in the range 3.6–4.5 eV. Pulsed photoelectron microscopy can thus be applied up to and even above the melting temperature of most materials without interference from thermionic emission, as was actually observed. The treating laser pulse also causes ablation of the specimen, and its fluence must be kept low enough to avoid formation of a laser-induced plasma. However, even if the laser pulse produces only neutral atoms, these are ionized by the thermal and photoinduced electrons, which gain abundant energy in the accelerating field. These ions may release troublesome secondary electrons. Photoionization can be neglected. At least two photons of the quantum energies used must be absorbed to ionize free atoms, and two-photon processes are very improbable at the restricted fluences. In fact, photoionization was not observed. The number of ions $n_i$ produced by electron collisions during the imaging time $\Delta t$ is estimated to be

$$n_i = \frac{n\,\sigma_i\,(j_p + j_{th})\,\Delta t}{e} \tag{25}$$
with $n$ and $\sigma_i$ the number of evaporated atoms (during the imaging time) and the ionization cross section averaged over the electron energies, respectively. The positive ions are accelerated toward the specimen, which is at a negative potential, and release $\eta n_i$ electrons ($\eta$, secondary electron yield). The number of these secondary electrons must stay below the number of imaging photoelectrons. This requirement and Inequality (24), $j_{th} < j_p$, limit the allowed number $n_a$ of evaporated atomic layers (during the imaging time) according to

$$n_a < \frac{d^2}{2\sigma_i \eta} \tag{26}$$

where $d^2$ is the area per atom within the processed surface. Relevant values are $\sigma_i \approx 10^{-20}$ m², $d^2 \approx 6 \times 10^{-20}$ m² and $\eta \approx 10$. With these, Inequality (26) requires that less than one third of a monolayer be evaporated during imaging, so that ion-induced secondary electrons can be neglected. For most metals, up to several hundred kelvin above the melting temperature, the vapor pressure is so low that well under one atomic layer is evaporated during the imaging time of 5 ns. Accordingly, most metals can be pulse melted without disturbing photoelectron imaging. Adsorbates and oxides that decompose, however, can be a problem in short exposure imaging. The reaction products contain excited molecules and atoms, which may liberate electrons from the contacting metal by an Auger process.
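The two limits on laser treatment, Inequality (24) and Inequality (26), can be evaluated numerically. The sketch below finds the temperature at which the Richardson–Dushman current reaches the Child-law bound, using bisection; this is a swapped-in numerical method, not the author's procedure. It also evaluates Inequality (26). With a = 5 mm and F = 5 × 10⁶ V/m the bound is about 3.7 A/cm². With the upper-limit value A = 120 A/cm²K² the sketch gives roughly 2200 K and 2700 K for W_A = 3.6 and 4.5 eV, somewhat below the range quoted in the text. The text only bounds A from above, and a smaller A raises these limits. The sketch therefore illustrates the method rather than reproducing the quoted figures.

```python
import math

K_B_EV = 8.617333262e-5   # Boltzmann constant, eV/K
C_CHILD = 2.34e-6         # constant C of Eq. (21), A/V^(3/2)


def richardson_current(t_k: float, work_function_ev: float, a_rich: float = 120.0) -> float:
    """Richardson-Dushman j_th = A T^2 exp(-W_A / kT), in A/cm^2 (A in A/cm^2K^2)."""
    return a_rich * t_k**2 * math.exp(-work_function_ev / (K_B_EV * t_k))


def max_temperature(work_function_ev: float, j_limit: float, a_rich: float = 120.0,
                    lo: float = 300.0, hi: float = 6000.0) -> float:
    """Bisection for the T at which j_th reaches j_limit (j_th grows monotonically with T)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if richardson_current(mid, work_function_ev, a_rich) < j_limit:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def max_evaporated_layers(area_per_atom_m2: float, sigma_ion_m2: float,
                          sec_yield: float) -> float:
    """Eq. (26): n_a < d^2 / (2 sigma_i eta), in monolayers per exposure."""
    return area_per_atom_m2 / (2.0 * sigma_ion_m2 * sec_yield)


# Upper bound of Eq. (24) for a = 5 mm, F = 5e6 V/m, converted from A/m^2 to A/cm^2
j_limit = C_CHILD * (5e6) ** 1.5 / (10.0 * math.sqrt(5e-3)) / 1e4
for w in (3.6, 4.5):
    print(f"W_A = {w} eV: T_max = {max_temperature(w, j_limit):.0f} K")
print(f"n_a < {max_evaporated_layers(6e-20, 1e-20, 10.0):.2f} monolayer")
```

The last line reproduces the "less than one third of a monolayer" condition, n_a < 0.30, from the values σ_i ≈ 10⁻²⁰ m², d² ≈ 6 × 10⁻²⁰ m² and η ≈ 10.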
C. Pulsed High-Energy Reflection Electron Microscopy
In high-energy reflection electron microscopy, the surface of a bulk specimen is illuminated by a collimated electron beam at grazing incidence, and specularly scattered electrons are used to image the surface. Reflection electron microscopy was invented by Ruska (1933), who, however, used electrons scattered by 90°. von Borries (1940) introduced a decisively improved technique, with respect to chromatic aberration and image intensity. He used glancing incidence illumination and imaged the surface with electrons scattered into low angles. Reflection microscopy was abandoned with the advent of the scanning electron microscope but was revived in the early 1980s. Improved electron optics and on-axis dark-field imaging with Bragg-reflected “loss-less” electrons drove the resolution to the atomic scale. Prominent applications since then have been the imaging of reconstructing single crystal surfaces (Tanishiro et al., 1983), atomic steps (Cowley and Peng, 1985), structures of submonolayer deposits on silicon surfaces (Osakabe et al., 1980), and surface migration of atoms (Yamanoka and Yagi, 1989). A review of techniques and studies of surface structures and slow dynamic processes is given by Yagi (1993). Despite its enormous potential as a surface probe, reflection microscopy based on Bragg diffraction is not well suited to short exposure imaging of the surface. Usually only a small fraction of the electrons pass the objective lens aperture, and the image is buried in shot noise. Bright-field imaging with grazing incident and exit angles is more promising. A considerable disadvantage is that the image of the surface is almost one-dimensional. However, this technique is the only one that visualizes the space above the
Figure 21. Pulsed reflection electron microscope: 1, laser pulse–driven thermal electron gun; 2–5, as in Figure 1; 6, fiber plate transmission phosphor screen; 7, MCP image intensifier; 8, CCD sensor.
surface of a specimen that is at ground potential (in contrast to emission and mirror microscopes), so that massive evaporation and plasma formation are accessible to investigation. Figure 21 shows a reflection electron microscope for short exposure imaging of laser-induced processes (Bostanjoglo and Heinricht, 1990). The setup is
similar to that of the transmission microscope in Figure 1, apart from some minor deviations and the electron illumination system, which can be tilted relative to the specimen. For reasons of intensity, a laser-driven thermionic electron gun is used, which delivers only one, but intense, electron pulse. This allows a single short exposure image to be produced, with an exposure time of 20 ns. The bulk specimen can be rotated about an axis orthogonal to both the electron beam and the treating laser beam. Incident and exit angles of the electrons are about 5° with respect to the surface. Because of these grazing angles, the image of a geometrical structure on the surface is strongly foreshortened in the direction of the incident electrons. A laser-produced circular crater appears as a very slender ellipse. Any particle ejected from the laser-processed region has two images, which appear symmetrically about the slender image of the eroded crater (Fig. 22). The two images are due to the absorption of the incident electrons and of the electrons reflected at the surface, respectively. The reflection microscope was used to visualize the evaporation of semiconductors and the ablation of metal films on semiconductors (Bostanjoglo and Heinricht, 1990; Heinricht and Bostanjoglo, 1992). Figure 23 shows, for example, the detachment of a gold film from a silicon wafer by a low-energy laser pulse that melts only the metal. The film was produced by evaporation onto a silicon surface covered by native oxide and adsorbed molecules from the ambient atmosphere. As the gold film is melted by the laser pulse, the adsorbed layers evaporate and lift the liquid film within 340 ns after the laser
Figure 22. Generation of the double image of a shadow-casting particle above a plane specimen in the reflection electron microscope. e−, illuminating electron beam.
Figure 23. Short exposure reflection electron images showing the liftoff process of a laser-pulsed 100-nm gold film on a silicon wafer. Exposure time was 20 ns. The moment of exposure was counted from the peak of the laser pulse and is given below the images. These were produced at neighboring, previously untreated regions with identical laser pulses with an energy and a fluence of 1.3 μJ and 0.6 MW/cm², respectively.
pulse. About 300 ns later the liquid has separated from the wafer and contracted to a drop, which is driven back to the substrate by electrostatic forces (Fig. 24). Such processes occur whenever a light-absorbing coating produces a nonwetting liquid film on the substrate. The liquid may separate because of true nonwetting, or because of a separating gas layer formed by desorbed molecules or by volatile products from a disintegrated oxide. Laser-based cleaning and restoration methods rely on these and similar ablation effects. The joint spatial ($\Delta x$) and time ($\Delta t$) resolution of pulsed reflection microscopy is determined mainly by the shot noise of the imaging electrons and is derived as for transmission microscopy. It is given by a relation identical in form to Inequality (10):

$$(\Delta x)^2\,\Delta t > \frac{18e}{\pi \varepsilon K^2 j} \tag{27}$$
As before, $j$ is the current density of the electrons illuminating the specimen for a time $\Delta t$, and $K$ is the contrast between two adjacent areas of diameter $\Delta x$ that are to be distinguished. $\varepsilon$ is the fraction of electrons passing the aperture of the objective lens. The difference from the relation for bright-field transmission microscopy lies in the physical meaning of this passed fraction, which here consists entirely of scattered electrons. By contrast, for films that are not too thick, the fraction $\varepsilon$ in Inequality (10) contains mostly unscattered electrons and is therefore much larger.
Figure 24. Scanning electron image of the rest of the gold film after the liftoff process shown in Figure 23.
Assume parameter values typical of the assembled pulsed reflection microscope ($j \approx 80$ A/cm², $\varepsilon \approx 10^{-3}$, $\Delta t \approx 20$ ns). Choosing as specimen an opaque shadow-casting particle on an ideally flat surface (i.e., $K$ = 1) then yields a spatial resolution perpendicular to the electron beam of $\Delta x \approx 0.3$ μm. This is of the order of what was actually achieved. The resolution along the direction of the electron beam is $\Delta x/\sin\alpha$, with $\alpha \approx 5°$ the angle between the illuminating electron beam and the imaged surface.
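The estimate from Inequality (27) can be reproduced with the parameter values just quoted. The sketch gives about 0.24 μm across the beam, which the text rounds to ≈0.3 μm, and about 2.7 μm along it. The factor 1/sin 5° ≈ 11.5 expresses the foreshortening at grazing incidence. The function name is illustrative.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge e, C


def shot_noise_resolution(j_a_per_m2: float, eps: float, contrast: float, dt_s: float) -> float:
    """Smallest dx (m) satisfying Eq. (27): dx^2 * dt > 18 e / (pi eps K^2 j)."""
    return math.sqrt(18.0 * E_CHARGE / (math.pi * eps * contrast**2 * j_a_per_m2 * dt_s))


j = 80e4                    # 80 A/cm^2 expressed in A/m^2
dx = shot_noise_resolution(j, eps=1e-3, contrast=1.0, dt_s=20e-9)
alpha = math.radians(5.0)   # grazing angle between beam and surface
print(f"across beam: {dx * 1e6:.2f} um, along beam: {dx / math.sin(alpha) * 1e6:.1f} um")
```

Because Δx scales as (εK²jΔt)^(−1/2), the small passed fraction ε ≈ 10⁻³ of scattered electrons is what mainly limits the reflection microscope compared with bright-field transmission imaging.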
D. Pulsed Mirror Electron Microscopy
In the mirror microscope the specimen is biased slightly negative with respect to the electron gun. Accordingly, the incoming electrons are reflected by a near-surface equipotential plane. As this plane replicates the geometrical and electrical roughness of the specimen surface, the reflected electrons carry information on surface morphology and local electric fields. These fields may be due to contact potentials, spontaneous electric polarization, p–n junctions, or
nonuniform electric conductivity. Furthermore, magnetic stray fields, such as those from ferromagnetic domains, influence the trajectories of nonaxial electrons and can also be imaged. Two inconveniences are associated with the mirror microscope. First, the choice of specimen is restricted because the specimen is at high potential and therefore an integral part of the electron optics. Second, illumination and magnification cannot be chosen independently unless the relevant electron beams are separated. However, this type of microscope also has merits. The electrons travel slowly near their turning point. They are therefore very sensitive to lateral and axial electric fields, give a high depth resolution, and are effectively scattered by gases emitted from the surface. Finally, the electrons can probe the specimen without touching it, a property unique to this type of microscope. The theory of electron mirrors and associated devices is covered by Rempfer and Griffith (1992) and by Hawkes and Kasper (1996). Bethge and Heydenreich (1987) describe, with numerous references, the design of mirror microscopes and their applications to stationary specimens and to slowly varying and periodic processes. Previous mirror microscopes were not suited to studying fast nonrepetitive processes. Since the mirror lens requires illuminating beams with a small divergence, the electron gun must emit high-current pulses and have a high brightness. Figure 25 shows a mirror microscope that allows short exposure imaging of fast nonrepetitive processes on laser-treated surfaces (Kleinschmidt and Bostanjoglo, 2000). The setup has components in common with the transmission (Fig. 1) and photoelectron (Fig. 15) microscopes. The specimen stage of a transmission microscope was replaced by two elements. One is an electromagnetic prism, which bends the trajectories of the illuminating and reflected electrons by 90°; the other is an electron mirror.
The specimen is the decelerating electrode of the mirror and may be biased negatively with respect to the electron gun. As in the flash photoelectron microscope, the mirror is a two-electrode lens, which minimizes accumulation of space charge near the specimen when high-current electron pulses are used. A beam blanker passes electrons to the detector for only 8 ns during the illumination. This minimizes blurring of the image by thermal and Auger electrons emitted by the laser-treated material. A 90° prism was chosen to separate the electron trajectories because this design allows the specimen to be treated with an expanded laser beam, focused to a spot 20 μm in diameter. Very high thermal gradients of about 10⁸ K/m can be produced, subjecting the material to unusual chemical and mechanical processes. For cleaning, the specimen is heated by electron bombardment from the back side. Figure 26 demonstrates the sensitivity of the mirror microscope to space–time variations of contact potentials (i.e., work functions of the contacting
Figure 25. Pulsed mirror electron microscope with attached laser for treating the specimen: 1, laser pulse–driven photoelectron gun; 2, magnetic prism; 3, pulse laser for treating the specimen (25 ns, FWHM); 4, electron mirror; 5, beam blanker; 6, fiber plate transmission phosphor screen; 7, MCP image intensifier; 8, CCD sensor.
materials) and to gases evaporating from a heated surface. The series shows how a (100) silicon surface responds to a laser pulse. The surface was passivated by a monolayer of atomic hydrogen (Miyata et al., 1998; Yablonovitch et al., 1986). During and shortly after the laser pulse, the treated region is obscured by a cloud of evaporated hydrogen, which expands into the microscope vacuum with a velocity of about 1000 m/s. As the cloud clears, about 100 ns after the peak of the treating laser pulse, the cleaned region emerges as a dark patch with a bright rim. The contrast reverses a few seconds after the laser pulse, showing the treated area as a bright spot. It then gradually disappears during several hours’ exposure of the surface to the microscope vacuum. No geometrical modification of the treated surface could be detected by scanning electron or light interference microscopy. This variation of contrast can be explained as follows. The cleaned region has a lower work function than the passivated surface, as was directly observed in the flash photoelectron microscope. Consequently, the cleaned surface is more positive than the passivated periphery (Babout et al., 1977). The associated localized drop of the decelerating field immediately above the treated region acts as a convex microlens with a much smaller focal
HIGH-SPEED ELECTRON MICROSCOPY
Figure 26. Mirror electron images of a Si crystal passivated with hydrogen. The series shows the evaporation of the hydrogen monolayer (0–50 ns) after a heating laser pulse (25 ns, ≈6 μJ, 20 μm ø) and the adsorption of a monolayer of gas molecules from the microscope vacuum (5–18 s). The exposure time was 5 ns. The pictures were taken at the indicated times, counted from the peak of the treating laser pulse, each at a fresh neighboring region treated in the same way.
length than that of the macroscopic concave mirror lens (Orthuber, 1948). The microlens is therefore expected to reduce the intensity in the projected image. As gas molecules from the microscope vacuum are adsorbed, the work function of the treated area and the focal length of the associated microlens increase. When the focusing of the microlens compensates the defocusing of the mirror lens, the contrast disappears. This occurs a few seconds after the laser pulse, which is just the time it takes for the clean area to be covered by a monolayer of gas molecules adsorbed from the microscope vacuum ($5 \times 10^{-6}$ mbar). As adsorption continues, the work function, and with it the focal length of the microlens, grows further, until the reflected electrons are focused on the detector plane, giving the transient bright spot. Finally, the adsorption saturates and the work function approaches the value of the untreated periphery, so that the contrast disappears permanently.

As with the other time-resolving microscopes, the joint space–time resolution of the pulsed mirror microscope is determined by the shot noise of the imaging electrons. Because the two-electrode mirror forms a projection image, the divergence of the illuminating electron beam imposes a second restriction. The resolution is derived as follows. Consider two regions of diameter d on a homogeneous specimen, which is taken to be the reflecting plane. They are judged to be equally bright in successive exposures of duration $\Delta t$ if the mean number $\bar{n}$ of electrons, which
each region reflects during an exposure, exceeds the root-mean-square shot-noise amplitude $\sqrt{\bar{n}}$ by at least a minimum signal-to-noise ratio r:

$$\frac{\bar{n}}{\sqrt{\bar{n}}} > r \tag{28}$$
Since $\bar{n} = (\pi/4e)\, d^2 j\, \Delta t$, where e is the electron charge and j is the mean current density at the reflecting plane, and since j can be expressed through the half-angle α of the illuminating beam and the brightness R of the electron gun as $j = R\pi\alpha^2$, Eq. (28) gives

$$\frac{\pi^2}{4e}\, R\alpha^2\, \Delta t\, d^2 > r^2 \tag{29}$$
The two circular regions are clearly separated if the distance Δx between their centers, which denotes the spatial resolution, exceeds their radius d/2. Setting d = 2Δx in Eq. (29) gives as a preliminary result

$$\frac{\pi^2}{e}\, R\alpha^2\, \Delta t\, (\Delta x)^2 > r^2 \tag{30}$$
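As a numerical illustration of Eqs. (29) and (30), the Python sketch below computes the mean number of reflected electrons and the smallest Δx that satisfies Eq. (30) for a fixed half-angle α. It assumes SI units, converts the gun brightness quoted later in this section (R ≈ 7 × 10^6 A/(cm² sr)) to A/(m² sr), and uses α = 10^-3 rad, the typical value given for the chromatic estimate; pairing these values here, and the function names, are illustrative assumptions.

```python
import math

E_CHARGE = 1.602176634e-19  # electron charge e, in coulombs


def mean_electron_count(d_m, brightness, alpha_rad, dt_s):
    """n = (pi / 4e) d^2 j dt, with j = R * pi * alpha^2 (SI units)."""
    j = brightness * math.pi * alpha_rad ** 2  # current density at the reflecting plane, A/m^2
    return (math.pi / (4 * E_CHARGE)) * d_m ** 2 * j * dt_s


def shot_noise_limit(brightness, alpha_rad, dt_s, r):
    """Smallest dx satisfying Eq. (30): (pi^2 / e) R alpha^2 dt dx^2 > r^2."""
    return (r / math.pi) * math.sqrt(E_CHARGE / (brightness * alpha_rad ** 2 * dt_s))


R = 7e6 * 1e4  # 7e6 A/(cm^2 sr) expressed in A/(m^2 sr)
dx = shot_noise_limit(R, alpha_rad=1e-3, dt_s=5e-9, r=5)
n = mean_electron_count(2 * dx, R, alpha_rad=1e-3, dt_s=5e-9)
print(f"shot-noise limit: {dx * 1e9:.0f} nm, electrons per region: {n:.1f}")
```

At the region diameter d = 2Δx the mean count equals r² = 25, the threshold of Eq. (28). The shot-noise limit alone is only a few tens of nanometers at this α, whereas the beam divergence (fα = 16 μm for f ≈ 16 mm) dominates at such large angles, which is why α has to be chosen as a compromise.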
Now, a nonparallel electron beam with half-angle α entering the mirror produces a disk as the projected image of an object point. This aberration is easily estimated for electrons entering the mirror paraxially. The mirror may then be replaced by a thin concave lens, produced by the field step at the mirror anode, combined with a homogeneous electric field (e.g., Rempfer and Griffith, 1992), which decelerates the incoming electrons and accelerates the reflected ones. Two object points (i.e., two points on the reflecting plane) a distance Δx apart have projected images spaced by $(2L/f)\,\Delta x$, where f is the focal length of the concave lens and L is the distance of the image plane from the mirror anode (L being much larger than f and than the spacing of the two mirror electrodes). The image of a point is a disk of diameter $2L\alpha$. Accordingly, two object points can be distinguished only if $(2L/f)\,\Delta x > 2L\alpha$, that is, if their spacing obeys

$$\Delta x > f\alpha \tag{31}$$
Using the largest divergence allowed by this relation, α = Δx/f, to eliminate α from Eq. (30) gives the joint space–time resolution

$$(\Delta x)^4\, \Delta t > \frac{r^2 e f^2}{\pi^2 R} \tag{32}$$
With the relevant values f ≈ 16 mm and R ≈ $7 \times 10^{6}$ A/(cm$^2$ sr), and assuming a minimum signal-to-noise ratio r ≈ 5, Eq. (32) gives a spatial resolution of Δx ≈ 0.7 μm for the exposure time used, Δt = 5 ns, which is of the order of what was achieved.
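The Python sketch below evaluates Eq. (32) with the values just quoted and checks it against a brute-force scan over the half-angle α, taking at each α the worse of the shot-noise limit of Eq. (30) and the divergence limit of Eq. (31). SI units and the function names are assumptions made for illustration.

```python
import math

E_CHARGE = 1.602176634e-19  # electron charge e, in coulombs


def joint_resolution(r, f_m, brightness, dt_s):
    """Smallest dx allowed by Eq. (32): dx^4 dt > r^2 e f^2 / (pi^2 R)."""
    return (r ** 2 * E_CHARGE * f_m ** 2 / (math.pi ** 2 * brightness * dt_s)) ** 0.25


def resolution_at_alpha(alpha_rad, r, f_m, brightness, dt_s):
    """Worse of the shot-noise limit (Eq. 30) and the divergence limit (Eq. 31)."""
    shot = (r / math.pi) * math.sqrt(E_CHARGE / (brightness * alpha_rad ** 2 * dt_s))
    return max(shot, f_m * alpha_rad)


R = 7e6 * 1e4  # 7e6 A/(cm^2 sr) in A/(m^2 sr)
params = dict(r=5, f_m=16e-3, brightness=R, dt_s=5e-9)
dx = joint_resolution(**params)

# Scan alpha logarithmically from 1e-6 to 1e-2 rad; the best achievable
# resolution should approach the closed form of Eq. (32) from above.
alphas = [10 ** (-6 + 4 * i / 4000) for i in range(4001)]
best = min(resolution_at_alpha(a, **params) for a in alphas)
print(f"Eq. (32): {dx * 1e6:.2f} um, alpha scan: {best * 1e6:.2f} um")
```

Both routes give about 0.74 μm, consistent with the ≈0.7 μm quoted above. The optimum half-angle, where the two limits are equal, is α = Δx/f ≈ 5 × 10^-5 rad, which is exactly the substitution used to eliminate α.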
The diameter $\Delta x_C$ of the chromatic aberration disk is computed in a similar way to Relation (31) as

$$\Delta x_C \approx (x + 2a\alpha)\,\frac{\Delta E}{E} \tag{33}$$

where x, a, E, and ΔE are, respectively, the distance of the object point from the optical axis, the spacing of the mirror electrodes, the kinetic energy of the illuminating electrons, and the kinetic energy spread. For typical values x < 300 μm, a ≈ 4 mm, α ≈ $10^{-3}$ rad, E ≈ 20 keV, and ΔE < 1 eV, the chromatic aberration is $\Delta x_C \lesssim 15$ nm and is therefore negligible compared with the resolution limit due to shot noise in nanosecond-exposure images.
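A quick evaluation of Eq. (33) with the upper-bound values listed above, as a Python sketch (the function name is illustrative). The 0.7 μm reference is the shot-noise-limited resolution estimated earlier in this section.

```python
def chromatic_disk(x_m, a_m, alpha_rad, energy_ev, spread_ev):
    """Eq. (33): chromatic aberration disk diameter, (x + 2 a alpha) * dE / E."""
    return (x_m + 2 * a_m * alpha_rad) * spread_ev / energy_ev


dxc = chromatic_disk(x_m=300e-6, a_m=4e-3, alpha_rad=1e-3, energy_ev=20e3, spread_ev=1.0)
shot_noise_limit_m = 0.7e-6  # resolution from Eq. (32) quoted in the text
print(f"chromatic disk: {dxc * 1e9:.1f} nm "
      f"({100 * dxc / shot_noise_limit_m:.1f}% of the shot-noise limit)")
```

Because every input is an upper bound, the result (about 15 nm, roughly 2% of the shot-noise limit) is an upper estimate of the chromatic blur.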
IV. Conclusions

Electron microscopy is an indispensable method for the characterization and analysis of materials down to the atomic scale. A particularly useful application is in situ investigation, which allows the dynamics of miscellaneous processes to be imaged. The time resolution of four types of electron microscope was pushed down to a few nanoseconds for nonrepetitive processes by implementing a high-current laser pulse–driven thermal- and photoelectron gun, fast electron beam shifting, and electronic image registration. The extended microscopes were of the transmission type, the photoemission and reflection types, and the mirror type, which give access to the volume of the specimen, its surface, and the space above the surface, respectively. Three complementary high-speed techniques were realized: multiframe imaging, streak imaging, and image intensity tracking. The potential of the new time-resolving probes was demonstrated by tracing fast laser-triggered effects such as phase transitions, melt instabilities, chemical reactions, and mechanical deformations. Melt flow driven by large thermal and compositional gradients, evaporation of superheated liquid metals, and decomposition and precipitation of oxide surface layers were investigated. High-speed electron microscopy has uncovered effects to which conventional light-optical methods have no easy access. Femtosecond laser pulses deposit their energy in the electronic system, which then destabilizes the atomic lattice; they were found to produce extraordinary effects on a “thermodynamic” (nanosecond) time scale. These effects were completely different from those initiated on the same time scale by the exclusively “thermal” nanosecond laser pulses. Modeling the dynamics visualized by transmission microscopy with computer-based numerical simulations allows one to extract material parameters
relevant at temperatures up to the critical point, at thermal gradients up to several $10^{3}$ K/μm, and at stresses up to the theoretical yield point. Photoelectron and mirror microscopy were found to be well suited to uncovering, on the nanosecond time scale, early stages of material modification, such as the removal or addition of monolayers, where rival high-speed light-optical methods fail for lack of contrast. The resolution of the high-speed microscopes described here is currently limited by shot noise in the electron image to several hundred nanometers and a few nanoseconds for nonrepetitive processes. A higher space–time resolution of the photoelectron microscope can be reached only if the buildup of negative space charge at the electron emitters is reduced. Brighter electron guns would improve the resolution of transmission, reflection, and mirror microscopy. They might be realized by locally increasing the electric field with suitably corrugated emitters, or perhaps by exploiting the very high electric fields of ultrashort laser pulses in completely new designs. Adverse space-charge effects at the surface of specimens in the photoelectron microscope could be overcome by pulsing the accelerating voltage: voltage levels significantly exceeding the presently used safe dc value could be applied during the short imaging time without causing electrical breakdown. To avoid blurring due to the inevitable oscillations in the voltage pulse at the cathode, the emission microscope must be all-electrostatic, and the voltages of all lenses must be derived from the cathode voltage by fast capacitive/resistive dividers.
Acknowledgments

Sincere thanks are due to F. Rohn-Schwarz, H. Dömer, H. Kleinschmidt, T. Nink, and M. Weingärtner for helping to produce this article. The high-speed research was generously supported by the Deutsche Forschungsgemeinschaft and by the Alexander von Humboldt Stiftung.
References

Anderson, T., Tomov, N., and Rentzepis, P. M. (1992). Laser-driven metal photocathodes for picosecond electron and X-ray pulse generation. J. Appl. Phys. 71, 5161–5167.
Babout, M., Le Bosse, J. C., Lopez, J., Gauthier, R., and Guittard, C. G. (1977). Mirror electron microscopy applied to the determination of the total electron reflection coefficient at a metallic surface. J. Phys. D: Appl. Phys. 10, 2331–2341.
Balandin, V. Y., Otte, D., and Bostanjoglo, O. (1995). Thermocapillary flow excited by focused nanosecond laser pulses in contaminated thin liquid iron films. J. Appl. Phys. 78, 2037–2044.
Balandin, V. Y., Niedrig, R., and Bostanjoglo, O. (1995). Simulation of transformations in thin metal films heated by nanosecond laser pulses. J. Appl. Phys. 77, 135–142.
Balandin, V. Y., Gernert, U., Nink, T., and Bostanjoglo, O. (1997). Segregation and surface transport of impurities: New mechanism affecting the surface morphology of laser treated metals. J. Appl. Phys. 81, 2835–2838.
Balandin, V. Yu., Nink, T., and Bostanjoglo, O. (1998). Pulsation of a liquid excited by a fast-moving crystallization front with segregation of surface active impurities. J. Appl. Phys. 84, 6355–6358.
Batinic, M., Begert, D., and Kubalek, E. (1995). Pulsed electron beam generation via laser stimulation. Nucl. Instrum. Meth. Phys. Res. A 363, 43.
Baum, A. W., Spicer, W. E., Pease, R. F., Castello, K. A., and Aebi, V. W. (1995). Negative electron affinity photocathodes as high performance electron sources. SPIE 2522, 208–212.
Bethge, H., and Heydenreich, J. (1987). Electron Microscopy in Solid State Physics. Amsterdam: Elsevier, p. 229.
Boersch, H. (1943). Die Verbesserung des Auflösungsvermögens im Emissions-Elektronenmikroskop. Z. Tech. Phys. 23, 129–130.
von Borries, B. (1940). Sublichtmikroskopische Auflösung bei der Abbildung von Oberflächen im Übermikroskop. Z. Phys. 116, 370–378.
Bostanjoglo, O., and Heinricht, F. (1987). Producing high-current nanosecond electron pulses with a standard tungsten hairpin gun. J. Phys. E: Sci. Instrum. 20, 1491–1493.
Bostanjoglo, O., and Heinricht, F. (1990). A reflection electron microscope for imaging of fast phase transitions on surfaces. Rev. Sci. Instrum. 61, 1223–1229.
Bostanjoglo, O., Heinricht, F., and Wünsch, F. (1990). Operation of a high-brightness laser-pulsed thermal electron gun. Proceedings of The Twelfth International Congress on Electron Microscopy, Vol. 1, edited by L. D. Peachy and D. B. Williams. San Francisco: San Francisco Press, pp. 124–125.
Bostanjoglo, O., and Kornitzky, J. (1990). Nanosecond double-frame and streak transmission electron microscopy. Proceedings of The Twelfth International Congress on Electron Microscopy, Vol. 1, edited by L. D. Peachy and D. B. Williams. San Francisco: San Francisco Press, pp. 180–181.
Bostanjoglo, O., Kornitzky, J., and Tornow, R. P. (1989). Nanosecond double-frame electron microscopy of fast phase transitions. J. Phys. E: Sci. Instrum. 22, 1008–1011.
Bostanjoglo, O., and Liedtke, R. (1980). Tracing of fast phase transitions by electron microscopy. Phys. Stat. Sol. (a) 60, 451–455.
Bostanjoglo, O., Marine, W., and Thomsen-Schmidt, P. (1992). Laser-induced nucleation of crystals in amorphous Ge films. Appl. Surf. Sci. 54, 302–307.
Bostanjoglo, O., and Nink, T. (1996). Hydrodynamic instabilities in laser pulse-produced melts of metal films. J. Appl. Phys. 79, 8725–8729.
Bostanjoglo, O., and Nink, T. (1997). Liquid motion in laser pulsed Al, Co and Au films. Appl. Surf. Sci. 109/110, 101–105.
Bostanjoglo, O., and Otte, D. (1993). High-speed transmission electron microscopy of laser quenching. Mater. Sci. Eng. A 173, 407–411.
Bostanjoglo, O., Schlotzhauer, G., and Schade, S. (1982). Shaping trigger pulses from noisy signals and time-resolved TEM of fast phase transitions. Optik 61, 91–97.
Bostanjoglo, O., and Thomsen-Schmidt, P. (1989). Laser induced multiple phase transitions in Ge-Te films traced by time-resolved TEM. Appl. Surf. Sci. 43, 136–141.
Bostanjoglo, O., Tornow, R. P., and Tornow, W. (1987a). A pulsed image converter for nanosecond electron microscopy. Scanning Micros. Suppl. 1, 197–203.
Bostanjoglo, O., Tornow, R. P., and Tornow, W. (1987b). Nanosecond-exposure electron microscopy of laser-induced phase transformations. Ultramicroscopy 21, 367–372.
Bostanjoglo, O., and Weingärtner, M. (1997). Pulsed photoelectron microscope for imaging laser-induced nanosecond processes. Rev. Sci. Instrum. 68, 2456–2460.
Brunner, M., Winkler, D., Schmitt, R., and Lischke, B. (1987). Electron-beam test system for high-speed devices. Scanning 9, 201–204.
Buzulutskov, A., Breskin, A., and Chechik, R. (1997). Photoemission through thin dielectric films. J. Appl. Phys. 81, 466–479.
Champion, J. A., Keene, B. J., and Sillwood, J. M. (1969). Wetting of Al2O3 by molten Al and other metals. J. Mat. Sci. 4, 39–49.
Chevallay, E., Durand, J., Hutchins, C., Suberlucq, G., and Wurgel, M. (1994). Photocathodes tested in the dc gun of the CERN photoemission laboratory. Nucl. Instrum. Meth. Phys. Res. A 340, 146–156.
Cowley, J. M., and Peng, L. M. (1985). The image contrast of surface steps in reflection electron microscopy. Ultramicroscopy 16, 59–67.
De Stasio, G., Capozi, M., Lorusso, G. F., Baudat, P. A., Droubay, T. C., Perfetti, P., Margaritondo, G., and Tonner, B. P. (1998). Mephisto: Performance test of a novel synchrotron imaging photoelectron-spectromicroscope. Rev. Sci. Instrum. 69, 2062–2066.
Dömer, H., and Bostanjoglo, O. (2001). Nanosecond transmission electron microscopy of laser-pulsed chromium films. Verhandl. DPG, Vol. 36. 1, edited by V. Häselbarth. Weinheim, Germany: Physik Verlag, p. 313.
Ehsasi, M., Karpowicz, A., Berdau, M., Engel, W., Christmann, K., and Block, J. H. (1993). UV-photoemission electron microscopy investigation of pattern formation during oxidation of CO on a Pt (210) surface. Ultramicroscopy 49, 318–329.
Elsayed-Ali, M. E., Norris, T. B., Pessot, M. A., and Mourou, G. A. (1987). Time-resolved observation of electron-phonon relaxation in copper. Phys. Rev. Lett. 58, 1212–1215.
Engel, W. (1966). Emission microscopy with different kinds of electron emission. Proceedings of The Sixth International Congress on Electron Microscopy, Vol. 1, edited by R. Uyeda. Tokyo: Maruzen, pp. 217–218.
Engel, W., Kordesch, M. E., Rotermund, H. H., Kubala, S., and von Oertzen, A. (1991). A UHV-compatible photoelectron emission microscope for applications to surface science. Ultramicroscopy 36, 148–153.
Fujimoto, J. G., Liu, J. M., Ippen, E. P., and Bloembergen, N. (1984). Femtosecond laser interaction with metallic tungsten and nonequilibrium electrons and lattice temperatures. Phys. Rev. Lett. 53, 1837–1840.
Gesley, M. (1993). An electron optical theory of beam blanking. Rev. Sci. Instrum. 64, 3169–3190.
Giesen, M., Phaneuf, R. J., Williams, E. D., Einstein, T. L., and Ibach, H. (1997). Characterization of p-n junctions and surface-states on silicon devices by photoemission electron microscopy. Appl. Phys. A 64, 423–430.
Girardeau-Montaut, J. P., Girardeau-Montaut, C., Afif, M., Perez, A., and Monstaizis, S. D. (1995). Enhancement of photoelectric sensitivity by K+ ion implantation. Appl. Phys. Lett. 66, 1886–1888.
Girardeau-Montaut, J. P., Girardeau-Montaut, C., and Monstaizis, S. D. (1994). Femtosecond nonlinear single-photon photoelectron emission from tungsten at 248 nm. J. Phys. D: Appl. Phys. 27, 848–851.
Griffith, O. H. (1986). Photoelectron microscopy: Applications to biological surfaces. Appl. Surf. Sci. 26, 265–279.
Griffith, O. H., Habliston, P. A., and Birrell, G. B. (1991). Bibliography on emission microscopy, mirror electron microscopy, LEEM and related techniques: 1985–1991. Ultramicroscopy 36, 262–274.
Griffith, O. H., and Rempfer, G. F. (1987). Photoelectron imaging: Photoelectron microscopy and related techniques. Adv. Opt. Electron Microsc. 10, 269–337.
Hawkes, P. W., and Kasper, E. (1996). Principles of Electron Optics, Vol. 1. London: Academic Press, p. 261.
Heinricht, F., and Bostanjoglo, O. (1992). Laser ablation processes imaged by high-speed reflection electron microscopy. Appl. Surf. Sci. 54, 244–254.
Ho, J. P., Grigoropoulos, C. P., and Humphrey, J. A. (1995). Computational study of heat transfer and dynamics in the pulsed laser evaporation of metals. J. Appl. Phys. 78, 4696–4709.
Iida, T., and Guthrie, R. I. L. (1988). The Physical Properties of Liquid Metals. Oxford: Clarendon, p. 134.
Kleinschmidt, H., and Bostanjoglo, O. (2000). Nanosecond mirror electron microscope. Proceedings of The Twelfth European Congress on Electron Microscopy, Vol. 4, edited by L. Frank and F. Ciamper. Czechoslovak Society for Electron Microscopy, pp. S77–78.
Koechner, W. (1996). Solid-State Laser Engineering. Berlin: Springer-Verlag, p. 458.
Lablond, B., and Rajaonera, G. (1994). Photoemission in the picosecond regime from a coated trioxide cathode. Nucl. Instrum. Meth. Phys. Res. A 340, 195–198.
Larciprete, R., Borsella, E., and Cinti, P. (1996). KrF-excimer-laser-induced native oxide removal from Si (100) surfaces studied by Auger electron spectroscopy. Appl. Phys. A 62, 103–114.
Massey, G. A. (1983). Measurement of laser photoelectron image degradation at high current densities. IEEE J. Quantum Electron. QE-19, 873–877.
Massey, G. A., Jones, M. D., and Plummer, B. P. (1981). Space-charge aberrations in the photoelectron microscope. J. Appl. Phys. 52, 3780–3790.
May, P. G., Petkie, R. R., Hasper, J. M. E., and Yee, D. S. (1990). Photoemission from thin-film lanthanum hexaboride. Appl. Phys. Lett. 57, 1584–1585.
Metev, S. M., and Veiko, V. P. (1998). Laser-Assisted Microtechnology. Berlin: Springer-Verlag, pp. 46–52.
Miyata, N., Watanabe, H., and Ichikawa, M. (1998). HF-chemical etching of the oxide layer near a SiO2/Si (111) interface. Appl. Phys. Lett. 73, 3923–3925.
Möllenstedt, G., and Lenz, F. (1963). Electron emission microscopy. Adv. Electron. and Electron Phys. 18, 251–329.
Murr, L. E. (1991). Electron and Ion Microscopy and Microanalysis: Principles and Applications. New York: Dekker.
Niedrig, R., and Bostanjoglo, O. (1997). Imaging and modeling of pulse laser induced evaporation of metal films. J. Appl. Phys. 81, 480–485.
Nink, T. (1998). High-speed transmission electron microscopy of instabilities in laser pulse-produced melts in metal films. Doctoral thesis, Technische Universität Berlin.
Nink, T., Galbert, F., Mao, Z., and Bostanjoglo, O. (1999). Dynamics of laser pulse-induced melts in Ni-P visualized by high-speed transmission electron microscopy. Appl. Surf. Sci. 138–139, 439–443.
Ninomiya, K., and Hasegawa, M. (1995). Scanning photoelectron microscope with sub-μm lateral resolution using a Wolter-type X-ray focusing mirror. J. Vac. Sci. Technol. A 13, 1224–1228.
von Oertzen, A., Rotermund, H. H., and Nettesheim, S. (1992). Investigation of diffusion of CO adsorbed on Pd (111) by a combined PEEM/LITD technique. Chem. Phys. Lett. 199, 131–137.
Orthuber, R. (1948). Über die Anwendung des Elektronenspiegels zum Abbilden der Potentialverteilung auf metallischen und Halbleiter-Oberflächen. Z. Angew. Phys. 1, 79–89.
Osakabe, N., Tanishiro, Y., Yagi, K., and Honjo, G. (1980). Reflection electron microscopy of clean and gold deposited (111) silicon surfaces. Surf. Sci. 97, 393–408.
Plies, E. (1982). Proposal for an electron beam blanking system with monochromator effect. Proceedings of The Tenth International Congress on Electron Microscopy, Vol. 1, edited by J. B. Le Poole, E. Zeitler, G. Thomas, G. Schimmel, C. Weichan, and Y. V Bassewitz. Frankfurt/Main: Deutsche Gesellschaft Elektronenmikroskopie, pp. 319–320.
Preuss, S., Demchuk, A., and Stuke, M. (1995). Sub-picosecond UV laser ablation of metals. Appl. Phys. A 61, 33–37.
Pronko, P. P., Dutta, S. K., Du, D., and Singh, R. K. (1995). Thermophysical effects in laser processing of materials with picosecond and femtosecond laser pulses. J. Appl. Phys. 78, 6233–6240.
Reimer, L. (1985). Scanning Electron Microscopy. Berlin: Springer-Verlag.
Reimer, L. (1993). Transmission Electron Microscopy. Berlin: Springer-Verlag.
Rempfer, G. F., and Griffith, O. H. (1992). Emission microscopy and related techniques: Resolution in photoemission microscopy, low energy electron microscopy and mirror microscopy. Ultramicroscopy 47, 35–54.
Ricci, E., and Passerone, A. (1993). Review: Surface tension and its relations with adsorption, vaporization and surface reactivity of liquid metals. Mater. Sci. Eng. A 161, 31–40.
Rotermund, H. H., Engel, W., Jackubith, S., von Oertzen, A., and Ertl, G. (1991). Methods and application of UV photoelectron microscopy in heterogeneous catalysis. Ultramicroscopy 36, 164–172.
Ruska, E. (1933). Die elektronenmikroskopische Abbildung elektronenbestrahlter Oberflächen. Z. Phys. 83, 492–497.
Sabary, F., and Bergeret, H. (1994). Laser-induced electron emission from granular Au films. Nucl. Instrum. Meth. Phys. Res. A 340, 199–203.
Schäfer, B., and Bostanjoglo, O. (1992). Laser driven thermionic electron gun. Optik 92, 9–13.
Schönlein, R. W., Lin, W. Z., Fujimoto, J. G., and Eesley, G. L. (1987). Femtosecond studies of nonequilibrium electronic processes in metals. Phys. Rev. Lett. 58, 1680–1683.
Singh, R. K., Holland, O. W., and Narayan, J. (1990). Theoretical model for deposition of superconducting thin films using pulsed laser evaporation technique. J. Appl. Phys. 68, 233–247.
Stampfli, P., and Bennemann, K. H. (1992). Dynamical theory of the laser-induced lattice instability of Si. Phys. Rev. B 46, 10686–10692.
Szentesi, O. I. (1972). Stroboscopic electron mirror microscopy at frequencies up to 100 MHz. J. Phys. E: Sci. Instrum. 5, 563–567.
Tanishiro, Y., Takayanagi, K., and Yagi, K. (1983). On the phase transition between the 7 × 7 and 1 × 1 structures of silicon (111) surface studied by reflection electron microscopy. Ultramicroscopy 11, 95–102.
Travier, C. (1994). An introduction to photo-injector design. Nucl. Instrum. Meth. Phys. Res. A 340, 26–39.
Vitol, E. N., and Orlova, K. B. (1984). The surface tension of liquid metals. Russ. Metall. 4, 34–40.
Wang, X. Y., Riffe, D. M., Lee, Y. S., and Downer, M. C. (1994). Time-resolved electron temperature measurement in a highly excited Au target using femtosecond thermionic emission. Phys. Rev. B 50, 8016–8019.
Watari, F., and Yada, K. (1986). Photoemission from LaB6 cathode using an excimer laser. Proceedings of The Eleventh International Congress on Electron Microscopy, Vol. 1, edited by T. Imura, S. Maruse, and T. Suzuki. Tokyo: Japanese Society for Electron Microscopy, pp. 261–262.
Weingärtner, M., and Bostanjoglo, O. (1998). Pulsed photoelectron microscope for time-resolved surface investigations. Surface and Coating Technol. 100/101, 85–89.
Weingärtner, M., Elschner, R., and Bostanjoglo, O. (1999). Patterning of silicon-differences between nanosecond and femtosecond laser pulses. Appl. Surf. Sci. 138–139, 499–502.
Yablonovitch, E., Allara, D. L., Chang, C. C., Gmitter, T., and Bright, T. B. (1986). Unusually low surface-recombination velocity on silicon and germanium surfaces. Phys. Rev. Lett. 57, 249–252.
Yada, K. (1986). Researches of cathode materials for thermionic emission. Proceedings of The Eleventh International Congress on Electron Microscopy, Vol. 1, edited by T. Imura, S. Maruse, and T. Suzuki. Tokyo: Japanese Society for Electron Microscopy, pp. 227–228.
Yagi, K. (1993). Reflection electron microscopy: Studies of surface structures and surface dynamic processes. Surf. Sci. Rep. (Netherlands) 17, 305–362.
Yamanoka, A., and Yagi, K. (1989). Surface electromigration of metal atoms on Si (111) surfaces studied by UHV reflection electron microscopy. Ultramicroscopy 29, 161–167.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 121
Applications of Transmission Electron Microscopy in Mineralogy
P. E. CHAMPNESS
Department of Earth Sciences, University of Manchester, Manchester M13 9PL, United Kingdom
I. Introduction
II. Analytical Electron Microscopy of Minerals
III. Phase Separation (Exsolution)
    A. Alkali Feldspars
    B. Amphiboles
        1. Exsolution in Monoclinic Amphiboles
        2. Exsolution in Orthorhombic Amphiboles
IV. HRTEM and Defect Structures
    A. Biopyriboles and Polysomatic Defects
        1. New Biopyriboles
        2. Chain-Width Disorder in Pyriboles
        3. Polysomatic Reactions in Pyriboles
V. Concluding Remark
References
I. Introduction

Although transmission electron microscopy (TEM) became a routine tool for the physical metallurgist in the 1960s, and the theory of image formation from crystalline materials was well established by then, it was not until the 1970s that TEM was adopted to any great extent by workers in the earth sciences. The main reason for the long delay was that there was no reliable method for preparing thin foils of nonmetallic materials; studies were restricted to cleavage fragments of layered structures or to powdered fragments sedimented onto carbon films. The latter technique allowed examination only of microstructural features smaller than about 1 μm, and spatially related information on a larger scale was largely lost. The advent of reliable, commercial beam-thinning devices in the early 1970s solved the problem of specimen preparation. Foils in which hundreds of square microns are transparent to the electron beam can now be prepared almost routinely. Disks 3 mm in diameter can be drilled from petrographic thin sections approximately 25 μm thick and thinned with a beam of energetic ions or atoms (usually argon) until perforation. The thin sections can be studied beforehand in the petrographic optical microscope, the scanning electron microscope (SEM), or the electron-microprobe analyzer (EMPA), and regions of interest for TEM study can be chosen.

At almost the same time that beam-thinning machines came on the market, the first moon-rock samples started arriving on Earth as a result of the Apollo space missions. For a time in the early 1970s, more moon-rock samples had been studied in the TEM than terrestrial samples. My first ion-thinned mineral specimen was a pyroxene (a single-chain silicate, Fig. 1) from Apollo 11. Since those days, TEM has become an integral part of much mineralogical research. In this review I highlight just a few examples that illustrate the impact TEM has had on mineralogy in the last 25 years. I have chosen to concentrate on two of the commonest silicate groups, the alkali (Na–K) feldspars (framework aluminosilicates, Fig. 1) and the amphiboles (double-chain silicates, Fig. 1), although I also describe the important contribution that high-resolution transmission electron microscopy (HRTEM) has made to our understanding of mixed-chain structures.

Figure 1. Partial projections of the linkages of the Si–O tetrahedra in pyroxenes (single-chain silicates), amphiboles (double-chain silicates), micas (sheet silicates), and feldspars (framework silicates).

Advances in Imaging and Electron Physics, Volume 121. ISBN 0-12-014763-7. ISSN 1076-5670/02 $35.00. Copyright © 2002 by Academic Press. All rights of reproduction in any form reserved.
APPLICATIONS OF TEM IN MINERALOGY
II. Analytical Electron Microscopy of Minerals

The advent of X-ray analysis in the TEM has allowed us to identify fine-scale mineral phases that would have been impossible, or extremely tedious, to identify by electron diffraction, given the large unit cells, complex chemistry, and low symmetries involved in most cases. As will be seen in Section III.B, investigations of phase separation (or exsolution) in the amphibole group have relied very heavily on analytical electron microscopy (AEM), so it is worth outlining here some of the procedures that need to be adopted in the AEM of minerals and some of the precautions that need to be taken.

The basis of quantification of mineral analyses is the thin-film criterion of Cliff and Lorimer (1975), in which X-ray absorption and secondary X-ray fluorescence are assumed to be negligible to a first approximation, and the ratio of the concentrations of two elements, $C_A/C_B$, is related to the ratio of their measured X-ray intensities, $I_A/I_B$, by the equation

$$\frac{C_A}{C_B} = k_{AB}\,\frac{I_A}{I_B} \tag{1}$$
where $k_{AB}$ is a sensitivity factor that accounts for the relative efficiency of production and detection of the X-rays. For silicates, the reference element B is silicon. Because silicates are composed predominantly of oxygen and their specific gravities are normally between 2.5 and 3.5, the thickness $t_{\max}$ at which Eq. (1) breaks down, and corrections for absorption and fluorescence must be made, is larger than that for metallic systems. Nord (1982) calculated $t_{\max}$ for Mg/Si, Ca/Si, and Fe/Si for members of the pyroxene quadrilateral. Figure 2 shows a compilation of the minimum value of $t_{\max}$ for all three elemental ratios and indicates that, if absorption effects are to be insignificant, analysis must be carried out in areas less than 130–300 nm thick, depending on the bulk composition. As it happens, the maximum thickness at which microstructures in silicates can be observed with 100-kV electrons is about 200 nm, so if microscopy can be carried out in an area of the foil at ∼100 kV, it can be assumed that the foil fulfills the thin-film criterion for elements with Z ≥ 11. For higher voltages or lighter elements, this rule of thumb cannot be used, and care must be taken to work in suitably thin areas or, alternatively, to correct for absorption.

Oxygen, the main constituent of silicates, cannot be reliably quantified by AEM, even with ultrathin-window detectors. The method adopted for quantification of anhydrous silicate (or other oxide) phases is therefore to assume that all cations are present as oxides and that the sum of the oxides is 100%. The chemical formula is then recalculated to a suitable number of oxygens: for example, six in the case of the pyroxene (a single-chain silicate,
Figure 2. Maximum thickness (in nanometers) of Ca–Mg–Fe pyroxenes for which absorption corrections can be ignored. (Source: After Nord, 1982.)
Fig. 1) in Table 1, because the general formula for pyroxene is $M_2\mathrm{Si_2O_6}$, where M stands for cations other than silicon. Problems obviously arise for cations, such as Fe, that can take more than one valence, and when elements other than oxygen cannot be detected. The most common of these is hydrogen, as many silicates are hydrated (Fig. 1). For hydrated samples, if all other cations can be detected, a total appropriate to the mineral type can be assumed for the oxide analysis (e.g., 95 wt % for the sheet silicate mica, Fig. 1), or the formula can be normalized to an appropriate number of oxygens [22 for micas, Table 1, as the general formula for mica is $X_2Y_{4-6}Z_8\mathrm{O_{20}(OH)_4}$, where X and Y are nontetrahedral cations and Z is Si or Al]. In the general case it is recommended that, when possible, normalization be carried out on the basis of the known number of cations in a particular crystallographic site (Peacor, 1992). For instance, in Table 1 the tetrahedral sites in pyroxene and mica have been assigned 2 and 8 (Si + Al), respectively. For the mica, the cations known to occupy the X and Y crystallographic sites have been grouped to give totals of 1.78 and 5.71, respectively. A more complex assignment of cations to particular sites will be encountered in Section III.B, where the amphibole group is considered in some detail.

Perhaps the most severe problem encountered in the AEM of silicates is specimen damage during analysis. Silicates are known to suffer from radiolysis (i.e., electronic excitation leading to atomic displacement) during electron
57
APPLICATIONS OF TEM IN MINERALOGY TABLE 1 AEM Analyses of Two Silicatesa Pyroxene
Mica (biotite)
1 SiO2
48.01
Al2O3
4.88
TiO2 FeO MnO MgO Na2O CaO K2 O
0.00 29.91 1.89 15.31 0.00 0.00 0.00
Total
100.00
2
3
Si
1.87
AlIV AlVI Ti Fe2+ Mn Mg Na Ca K
0.13 ⎫ 0.09 ⎪ ⎪ ⎪ 0.00 ⎪ ⎪ ⎪ ⎪ 0.97 ⎪ ⎪ ⎬ 0.06 0.89 ⎪ ⎪ ⎪ ⎪ 0.00 ⎪ ⎪ ⎪ ⎪ 0.00 ⎪ ⎭ 0.00
O
6
2.00
2.01
4
38.13
Si
5.31
23.20
AlIV AlVI Ti Fe2+ Mn Mg Na Ca K
2.69 ⎫ 1.12 ⎪ ⎪ 0.16 ⎪ ⎬ 1.60 ⎪ ⎪ 0.00 ⎪ ⎭ 2.83
0.16 0.00 1.62
O
22
1.58 13.75 0.00 13.58 0.62 0.00 9.14
8.00
5.71
1.78
100.00
Source: Champness (1995); reproduced by permission of Chapman & Hall. The oxide weight percentages in columns 1 and 3 were derived assuming a total of 100%. The atomic formulas in columns 2 and 4 were calculated assuming a total of 6 and 22 oxygens and a total of 2 and 8 (Si + Al) for the pyroxene and the biotite, respectively. All iron has been assumed to be Fe2+. a
irradiation. The high current densities used for high-resolution AEM can lead to significant structural and chemical changes which ultimately limit the accuracy of analyses. The degree of sensitivity to damage depends on a number of factors, among which are the type of linkage of the Si–O tetrahedra, the nature of the cations (Na and K being the most vulnerable to loss), and the presence or absence of hydroxyl ions (Champness and Devenish, 1992; Hobbs, 1984; Veblen and Buseck, 1983). Champness and Devenish (1992) and Devenish and Champness (1993) have shown that all silicates experience some mass loss at the highest current densities used in AEM, but that there is a threshold of the current density for each element in a particular structure for which no loss occurs. For instance, the threshold values of the current density for which no loss occurs for any element is ≈105 A/m2 for calcic pyroxene (diopside) and about 3 × 104 A/m2 for calcic mica (margarite). Notice that both these values are lower than the current density in a focused beam from a LaB6 gun. At the highest current densities available [i.e., those obtainable with a field emission gun (FEG) plagioclase (Na–Ca) feldspar is reduced to the composition of SiO2 after 200 s (Fig. 3).
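The recalculation behind Table 1 (oxide wt % summed to 100%, moles of each oxide scaled so that the oxygens total 6 or 22, then Al split so that Si + AlIV fills the tetrahedral sites) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce the table: the molar masses are rounded standard values, all Fe is treated as Fe2+ as in Table 1, the dictionary and function names are hypothetical, and AlIV is simply taken as the Al needed to bring Si up to the assumed number of tetrahedral sites.

```python
# Oxide data: (molar mass g/mol, cations per oxide, oxygens per oxide, cation label).
# Rounded standard molar masses; all Fe taken as Fe2+ (as in Table 1).
OXIDES = {
    "SiO2": (60.084, 1, 2, "Si"),
    "TiO2": (79.866, 1, 2, "Ti"),
    "Al2O3": (101.961, 2, 3, "Al"),
    "FeO": (71.844, 1, 1, "Fe2+"),
    "MnO": (70.937, 1, 1, "Mn"),
    "MgO": (40.304, 1, 1, "Mg"),
    "CaO": (56.077, 1, 1, "Ca"),
    "Na2O": (61.979, 2, 1, "Na"),
    "K2O": (94.196, 2, 1, "K"),
}


def normalize_oxides(wt_percent, total=100.0):
    """Scale oxide wt % so that they sum to `total` (the 100% assumption)."""
    s = sum(wt_percent.values())
    return {ox: w * total / s for ox, w in wt_percent.items()}


def cations_per_formula(wt_percent, n_oxygens, tetrahedral_sites):
    """Cations per formula unit on an n_oxygens basis, with Al split into AlIV/AlVI."""
    moles = {ox: w / OXIDES[ox][0] for ox, w in wt_percent.items()}
    total_oxygen = sum(m * OXIDES[ox][2] for ox, m in moles.items())
    scale = n_oxygens / total_oxygen
    cations = {}
    for ox, m in moles.items():
        label = OXIDES[ox][3]
        cations[label] = cations.get(label, 0.0) + m * OXIDES[ox][1] * scale
    # Fill the tetrahedral sites with Si first, then Al; the remaining Al is octahedral.
    al = cations.pop("Al", 0.0)
    al_iv = min(al, max(tetrahedral_sites - cations.get("Si", 0.0), 0.0))
    cations["AlIV"] = al_iv
    cations["AlVI"] = al - al_iv
    return cations


# Oxide analyses from columns 1 and 3 of Table 1.
PYROXENE = {"SiO2": 48.01, "Al2O3": 4.88, "TiO2": 0.00, "FeO": 29.91, "MnO": 1.89,
            "MgO": 15.31, "Na2O": 0.00, "CaO": 0.00, "K2O": 0.00}
BIOTITE = {"SiO2": 38.13, "Al2O3": 23.20, "TiO2": 1.58, "FeO": 13.75, "MnO": 0.00,
           "MgO": 13.58, "Na2O": 0.62, "CaO": 0.00, "K2O": 9.14}

if __name__ == "__main__":
    px = cations_per_formula(normalize_oxides(PYROXENE), n_oxygens=6, tetrahedral_sites=2)
    bt = cations_per_formula(normalize_oxides(BIOTITE), n_oxygens=22, tetrahedral_sites=8)
    for name, formula in (("pyroxene", px), ("biotite", bt)):
        print(name, {k: round(v, 2) for k, v in formula.items()})
```

With these inputs the pyroxene formula reproduces column 2 of Table 1 to the two decimals shown, while the biotite formula agrees with column 4 to within about 0.01 per cation; the small residual presumably reflects rounding or slightly different atomic weights in the original calculation. Normalizing on a fixed number of cations in a particular site, as recommended in the text, would replace the `n_oxygens` constraint with a cation-sum constraint.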
Figure 3. Energy-dispersive X-ray spectra from plagioclase feldspar: (a) defocused beam rastered over specimen for 200 s; (b) beam focused at an approximate current density of 1.8 × 10⁸ A/m² in a dedicated scanning transmission electron microscope (STEM). The accelerating voltage was 100 kV.
It is clearly important that, when possible, the analyst operate below the current density at which damage occurs if quantitative results are required. Because of the dependence of the rate of damage on the current density, rather than on the total dose, defocusing the electron beam is more effective in minimizing mass loss than is rastering a focused beam over the same area. The effect of mass loss may also be minimized by using the highest voltage available (Fig. 4a) and by using a cooling stage (Fig. 4b).
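Because damage here is governed by current density rather than dose, a quick way to see why defocusing helps is to compute the mean current density of a probe treated as a uniform disc. The sketch below is an assumption-laden estimate: the 1-nA probe current and 10-nm probe diameter are hypothetical, real probes have non-uniform (roughly Gaussian) profiles whose peak density exceeds the mean, and the only values taken from the text are the no-loss thresholds of ≈10⁵ A/m² for diopside and about 3 × 10⁴ A/m² for margarite.

```python
import math


def current_density(probe_current_a, probe_diameter_m):
    """Mean current density (A/m^2) of a probe, approximated as a uniform disc."""
    return probe_current_a / (math.pi * (probe_diameter_m / 2.0) ** 2)


def min_safe_diameter(probe_current_a, threshold_a_per_m2):
    """Smallest uniform-disc diameter (m) at which the mean density equals the threshold."""
    return 2.0 * math.sqrt(probe_current_a / (math.pi * threshold_a_per_m2))


# No-loss threshold current densities quoted in the text (A/m^2).
THRESHOLDS = {"diopside": 1e5, "margarite": 3e4}

if __name__ == "__main__":
    probe_current = 1e-9  # hypothetical 1-nA probe
    print(f"focused 10-nm probe: {current_density(probe_current, 10e-9):.2e} A/m^2")
    for mineral, j_max in THRESHOLDS.items():
        d = min_safe_diameter(probe_current, j_max)
        print(f"{mineral}: defocus to at least {d * 1e9:.0f} nm")
```

For a 1-nA probe this gives a mean density of roughly 1.3 × 10⁷ A/m² when focused to 10 nm, and diameters of about 113 nm (diopside) and 206 nm (margarite) at which the thresholds are reached. Rastering the focused probe over the same area would deliver a similar average dose, but each point would still see the focused current density, which is why defocusing is the more effective option.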
Figure 4. Semilog plot for the loss of Na from plagioclase feldspar at a current density of 1.8 × 10³ A/m²: (a) dependence on voltage; (b) dependence on temperature. (Source: After Devenish and Champness, 1993; reproduced by permission of the Institute of Physics.)
III. Phase Separation (Exsolution)

It is in the field of phase transformations that TEM has probably had the widest influence in mineralogy. It had long been known from the study of petrographic thin sections in the polarizing microscope that phase separation (or exsolution) is common in the pyroxenes, amphiboles, and feldspars from slowly cooled rocks such as large igneous intrusions. In the 1950s and 1960s, studies by single-crystal X-ray diffraction (XRD) (e.g., Bown and Gay, 1959; Smith and MacKenzie, 1955) were able to indicate the lattice orientations of these intergrowths and to show that exsolution was present in many minerals from more quickly cooled rocks, although the intergrowth was below the resolution of the light-optical microscope. XRD could not, however, give any indication of the mechanisms of exsolution, nor, in general, of the size of the precipitates or the orientation of their interfaces. These are the areas in which TEM has come into its own. During the early days of the investigation of exsolution in silicates by TEM, it became apparent that two mechanisms that are extremely rare in metallic systems are very common in silicates: spinodal decomposition (the gradual evolution of sinusoidal compositional waves, without a nucleation stage) and homogeneous nucleation and growth of the equilibrium phase (nucleation without the aid of structural defects). The reason for this difference in behavior between metals and silicates lies in the fact that, whereas the crystal structures of the matrix and equilibrium product phases are different in metallic systems, in most cases the structures of the two silicate phases involved in exsolution are identical (Aaronson et al., 1974). In addition, in silicate systems the equilibrium solubility at high temperatures is relatively small and the volume change involved in the transformation is small.
As a result, the coherent spinodal is depressed only slightly below the equilibrium solvus, so relatively rapid diffusion can still take place once the temperature drops below the coherent spinodal. The factors that favor spinodal decomposition also favor homogeneous nucleation, although homogeneous nucleation is the more difficult process. However, because the equilibrium phases in silicates usually share a common framework of Si–O tetrahedra, and only second, third, or even more distant neighbors need be in the "wrong" positions across the interphase interface, the chemical component of the interfacial energy is small. In addition, the appreciable decrease in solubility with temperature that occurs in silicates provides a high driving force for nucleation and growth. Nevertheless, the cooling rate needs to be extremely slow, as it is in many plutonic and metamorphic rocks, for homogeneous nucleation to occur before the coherent spinodal is reached.
My examples of exsolution come from the alkali (Na–K) feldspars and the amphiboles and nicely illustrate the diversity of microstructures in the mineral kingdom. They also provide some very spectacular textures.
A. Alkali Feldspars

The feldspars are the commonest silicates in the earth's crust, making up some 54% of it. They largely belong to the ternary system NaAlSi3O8 (albite)–KAlSi3O8 (orthoclase)–CaAl2Si2O8 (anorthite), the NaAlSi3O8–KAlSi3O8 series being known as the alkali feldspars and the NaAlSi3O8–CaAl2Si2O8 series as the plagioclase feldspars. The alkali feldspars show (almost) complete solid solution at temperatures above 660°C, but there is a solvus at lower temperatures that extends to almost pure albite and orthoclase at low temperatures (Fig. 5). For most of the composition range, the alkali feldspars are monoclinic C2/m above the solvus, but both end members undergo a transition to triclinic C1̄∗ symmetry at lower temperatures. For the sodic phase the transition is the result of distortion of the Si/Al–O framework and is rapid (it is classed as displacive by mineralogists and may well be martensitic), whereas the transition in the potassic phase is slow because it involves Si/Al ordering.

The alkali feldspars show coarser precipitation structures (called perthites) than any other silicates; lamellae can be several millimeters wide in plutonic (slowly cooled) rocks. This can be attributed to the relatively high diffusivities of K and Na ions within the Si/Al–O framework and to the fact that, unlike in the plagioclase feldspars, precipitation does not require diffusion of Si and Al. McConnell (1969) was the first to examine the microstructure of a volcanic alkali feldspar (composition 36% K-feldspar) in the TEM. He showed that it consisted of coherent compositional modulations with a wavelength of about 10 nm approximately parallel to (100). The diffraction pattern showed a single reciprocal lattice with strong streaks approximately parallel to a*. This was the first direct evidence that spinodal decomposition is an important mechanism of phase transformation in the alkali feldspars, as had first been suggested by Christie (1968). Since then, natural samples have been homogenized and heat-treated to reproduce the modulated structures (Fig. 6; Owen and McConnell, 1971; Yund et al., 1974). Owen and McConnell were able to show that the wavelength of the modulation was characteristic of the annealing temperature and was larger for higher temperatures, as predicted by spinodal theory. Yund et al. (1974) annealed an initially homogeneous alkali feldspar for several days at 600°C and found that the modulations eventually developed into two separate lamellar phases approximately parallel to (6̄01).

Figure 5. Simplified subsolidus phase diagram for the alkali feldspar binary NaAlSi3O8 (albite)–KAlSi3O8 (orthoclase) at 1 kbar as calculated by Robin (1974). The dashed line is the coherent solvus and the dotted line is the coherent spinodal. (Source: Champness and Lorimer, 1976; reproduced by permission of Springer-Verlag.)
Calculations by Willaime and Brown (1974) of the elastic energy at the boundary between two alkali feldspars, where both are monoclinic or where the Na-feldspar is monoclinic on average owing to periodic twinning, showed that a minimum occurs at approximately (6̄01). Hence the orientation of the interphase boundary is determined predominantly by minimization of elastic strain. The chemical component of the interphase boundary energy is much less important because the Si/Al–O framework is unchanged across the interface. Although exsolution textures that can be attributed to nucleation and growth (including homogeneous nucleation) have been identified in natural alkali feldspars (e.g., Brown and Parsons, 1988; Snow and Yund, 1988), the interdiffusion of Na and K is too slow to allow nucleation of exsolution lamellae to occur in alkali feldspars in the laboratory. To circumvent this problem, Kusatz et al. (1987) carried out exsolution experiments on alkali feldspars in which some of the Si had been replaced by Ge† to give compositions along the binary NaAlGe2.1Si0.9O8–KAlGe2.1Si0.9O8. This substitution causes the incoherent and coherent solvi to rise (to almost 900°C for the critical composition of the incoherent solvus), the solidus to be depressed, and the displacive transformation to move toward the K-rich side of the phase diagram. Kusatz et al. (1987) found two types of textures in their experiments. Short, widely spaced, lens-shaped lamellae were produced between the incoherent solvus and the coherent spinodal and were ascribed to nucleation and growth, whereas thin, closely spaced, and branching lamellae formed only in the central part of the solvus and were ascribed to spinodal decomposition.

∗ The nonstandard space group is used so that the monoclinic and triclinic phases have the same unit cells.
† This is a trick that mineralogists often employ. For instance, Ge has been substituted for Si in olivine, Mg2SiO4, so that the olivine → spinel transition that occurs at a depth of about 400 km in the earth can be studied in the laboratory (e.g., Rubie and Champness, 1987). The transition occurs at a lower pressure in the germanate because Ge has a larger ionic radius than Si.

Figure 6. Natural alkali feldspar (36 mol % K-feldspar) that has been homogenized and annealed at 540°C for 48 h at 1 kbar to produce a modulated structure approximately parallel to (6̄01). Inset is an enlargement of a diffraction spot that shows satellites in a direction perpendicular to the modulations. (Source: Owen and McConnell, 1971; reproduced by permission of Nature.)
In a detailed study of the coarsening of spinodal textures in alkali feldspars, Yund and Davidson (1978) found that, at constant temperature, the lamellar spacing could be described as increasing with the cube root of the annealing time according to the relation

$$\lambda = \lambda_0 + k\,t^{1/3} \tag{2}$$

where λ0 is the spacing at zero time and k is a rate constant for each temperature. An Arrhenius plot of the natural logarithm of k against 1/T, where T is the temperature, showed a linear relationship within experimental error. However, as Yund and Davidson (1978) acknowledged, the t^(1/3) law applies to the coarsening of spherical particles and is not appropriate to the coarsening of lamellae. Brady (1987) proposed that the principal mechanism for coarsening in this case is diffusional exchange between the wedge-shaped terminations of exsolution lamellae, as seen in the TEM (Fig. 6), and the large, flat sides of adjacent lamellae. Having derived a formula for the chemical potential gradient due to interfacial energy effects, Brady extended the work of Cline (1971) on the coarsening and stability of lamellar eutectics to show that the appropriate rate law for lamellar coarsening in silicates is

$$\lambda^2 = \lambda_0^2 + k\,t \tag{3}$$
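The two rate laws can be compared by linear regression: Eq. (2) is linear in t^(1/3) with intercept λ0, Eq. (3) is linear in t with intercept λ0², and the Arrhenius plot of ln k against 1/T has slope −Ea/R. The Python sketch below implements those three fits with ordinary least squares. The spacings and rate constants in it are synthetic numbers generated from the equations themselves (only Ea = 33 kcal/mol is taken from the text), so the example checks the arithmetic rather than reproducing Yund and Davidson's data; the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_cube_root_law(times, spacings):
    """Eq. (2): lambda = lambda0 + k t^(1/3). Returns (lambda0, k)."""
    return linear_fit([t ** (1.0 / 3.0) for t in times], spacings)


def fit_brady_law(times, spacings):
    """Eq. (3): lambda^2 = lambda0^2 + k t. Returns (lambda0, k)."""
    intercept, k = linear_fit(times, [lam * lam for lam in spacings])
    if intercept < 0:
        raise ValueError("negative lambda0^2: data inconsistent with Eq. (3)")
    return math.sqrt(intercept), k


def activation_energy(temps_kelvin, rate_constants):
    """Arrhenius fit of ln k against 1/T; the slope is -Ea/R. Returns Ea in kcal/mol."""
    _, slope = linear_fit([1.0 / t for t in temps_kelvin],
                          [math.log(k) for k in rate_constants])
    return -slope * R_KCAL


if __name__ == "__main__":
    # Synthetic spacings from Eq. (3) with lambda0 = 20 nm and k = 5 nm^2/h (hypothetical).
    hours = [0, 24, 96, 240, 480]
    lam = [math.sqrt(20.0 ** 2 + 5.0 * t) for t in hours]
    print("Eq. (3) fit:", fit_brady_law(hours, lam))      # ~ (20.0, 5.0)
    print("Eq. (2) fit:", fit_cube_root_law(hours, lam))  # same data, other law
    # Synthetic rate constants obeying Arrhenius behavior with Ea = 33 kcal/mol.
    temps = [773.0, 823.0, 873.0]
    ks = [1e12 * math.exp(-33.0 / (R_KCAL * T)) for T in temps]
    print("Ea =", activation_energy(temps, ks), "kcal/mol")
```

Applied to measured spacings, one would fit both laws and compare the quality of fit and the temperature trend of the fitted λ0, which is the kind of comparison Brady (1987) made.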
Brady replotted Yund and Davidson's (1978) data on a graph of λ² versus t (Fig. 7) and found an excellent fit, which gave an activation energy for coarsening of 33 kcal/mol. Further evidence for the correctness of Brady's model was provided by the fact that the values of λ0, the lamellar wavelength at the beginning of coarsening, as derived from the graphs, increased systematically with temperature, as predicted by the theory of spinodal decomposition. The λ0 values obtained by Yund and Davidson from the t^(1/3) rate law did not increase in this way.

Figure 7. Plot of λ² versus time, t, for the coarsening experiments of Yund and Davidson (1978) on alkali feldspars. λ is the lamellar wavelength. (Source: Brady, 1987; reproduced by permission of the Mineralogical Society of America.)

Equations (2) and (3) predict different lamellar wavelengths for long coarsening times (a difference of more than an order of magnitude after 10⁶ years of coarsening at 500°C) but comparable wavelengths for rapidly cooled rocks (Brady, 1987). However, attempts to determine the cooling history of relatively quickly cooled rocks from the spacing of the lamellae have met with mixed success. There was good agreement between the lamellar spacings observed in a 5.2-m-wide dike and those predicted from heat-flow calculations and Eq. (2) (Christoffersen and Schedl, 1980), but less good agreement for lamellar spacings in a lava flow (Yund and Chapple, 1980) and in a large rhyolitic ash flow (Snow and Yund, 1988). It is also apparent that Si/Al ordering and twinning inhibit coarsening in more slowly cooled rocks (Brown et al., 1983).

In some more slowly cooled alkali feldspars, the two-phase lamellar intergrowths have coarsened to the scale of the wavelength of visible light, so that light scattered from their regular interfaces produces iridescence. It was a TEM study by Lorimer and Champness (1973) of two gem-quality varieties of these feldspars, known as moonstones, that led to an understanding of the later stages of coarsening. Fleet and Ribbe (1963) were the first to examine a moonstone in the TEM, using crushed grains. They showed that it contained coherent, lamellar precipitates of triclinic Na-feldspar and monoclinic K-feldspar approximately parallel to (6̄01), the plane of iridescence.
The Na-feldspar contained regularly spaced Albite twins,∗ as had been predicted by Laves (1952) from the presence of superlattice reflections parallel to b* in X-ray diffraction patterns. (The regularity of the twins, Laves suggested, reduces the strain energy of the interface between the two phases, a suggestion that was subsequently verified by calculations of the strain energy by Willaime and Gandais, 1972.) Lorimer and Champness's samples had similar compositions (57.3 and 53.7 wt % K-feldspar) but showed markedly different phase distributions. The first sample, which exhibits a blue iridescence, was shown to contain wavy lamellae of regularly Albite-twinned Na-feldspar approximately parallel to (6̄01), together with apparently monoclinic K-feldspar (Fig. 8a). The other moonstone, which shows a white iridescence, has a coarser microstructure containing discrete lozenge-shaped particles of regularly twinned Na-feldspar with boundaries approximately parallel to (6̄6̄1) (Fig. 8b). Significantly, this sample also contained zigzag lamellae of Na-feldspar that were smaller than the lozenge-shaped particles and therefore must have predated them. Detailed investigation of the K-feldspar showed that it was triclinic and mostly twinned on the diagonal association (basically Albite-twinned, but slightly deformed).

∗ Albite twins arise during the monoclinic → triclinic transition in Na-feldspar. They are normal twins with (010) as the twin and composition plane.

Figure 8. Microstructure of two moonstones: (a) feldspar with bulk composition 57.3 wt % K-feldspar contains wavy lamellae of regularly Albite-twinned Na-feldspar approximately parallel to (6̄01); (b) feldspar with bulk composition 53.7 wt % K-feldspar has a coarser microstructure with lozenge-shaped particles of Na-feldspar with boundaries approximately parallel to (6̄6̄1) and smaller, zigzag lamellae approximately parallel to (6̄01). (Source: Lorimer and Champness, 1973; reproduced by permission of Philosophical Magazine.)

The preceding observations suggest the sequence shown in Figure 9 for the evolution of the microstructure in the coarser moonstone. After coarsening of the spinodal modulations has produced distinct lamellae approximately parallel to (6̄01), the Na-feldspar becomes triclinic and twins on the Albite law. The periodic twinning relieves the strain at the interphase interface, and the Na-feldspar remains monoclinic on average. As the K-feldspar becomes triclinic, however, the lowest-energy interface becomes approximately (6̄6̄1) (as shown in calculations by Willaime and Brown, 1974), and the interface gradually changes during the coarsening process, producing first wavy lamellae and later discrete, lozenge-shaped particles. Examination of the phase distribution in the coarser of the two samples examined by Lorimer and Champness (1973) shows that rafting of the Na-rich particles has taken place (Fig. 8b) as a result of interaction of their strain fields during coarsening. This phenomenon has also been reported in metallic systems (Ardell et al., 1966).

Figure 9. Sequence of evolution of the microstructure in the moonstones in Figure 8. (Source: Putnis, 1992; reproduced by permission of Cambridge University Press.)

Although the presence of a fluid phase is known to have no effect on lattice diffusion (Yund, 1983) or on the coarsening of coherent lamellae in alkali feldspars (Yund and Davidson, 1978), it has a dramatic effect on the coarsening of alkali feldspar intergrowths once coherency is lost. Almost all plutonic igneous rocks are affected to a greater or lesser extent by water derived from the magma (deuteric alteration) at temperatures below 450°C. The familiar pink, white, or creamy translucent feldspar crystals in granites owe their distinctive appearance to the presence of numerous small, tubular micropores that were originally, or in some cases still are, filled with fluid (Worden et al., 1990). Parsons (1978) and Parsons and Brown (1984) showed, using light-optical microscopy, that there was a connection between the turbidity of these feldspars and the development of coarse, irregular intergrowths of two alkali feldspar phases. Worden et al. (1990), in a TEM and SEM investigation of the microstructure of alkali feldspars from the Klokken syenitic intrusion, Greenland, showed that micropores are abundant in areas where the microstructure is coarsened but are almost absent from uncoarsened areas. The coarsening is patchy and involves an increase in scale of up to 10³ without a change in the composition of the phases or in the bulk composition of the crystal. It occurs abruptly along an irregular front; the regular intergrowth that contains coherent, lozenge-shaped particles gives way, over a few micrometers, to a highly coarsened, irregular, semicoherent or incoherent intergrowth (Fig. 10). The pores occur along subgrain boundaries within the phases or along the boundaries between them. It is clear that the coarsening has been facilitated by pervasive dissolution–redeposition in an aqueous fluid. The driving force for the coarsening is the reduction in total surface energy of the feldspar intergrowth, including the release of elastic strain energy. What is not so clear is why the fluid, which would be expected to flow along grain boundaries, gives rise to micropores that migrate into the crystal.

Figure 10. Micrograph of an alteration front showing fully coherent, unaltered exsolution structure on the right and a deuterically coarsened, irregular, semicoherent microstructure containing micropores on the left. Ab, albite; Ksp, K-feldspar. (Source: Worden et al., 1990; reproduced by permission of Springer-Verlag.)

B. Amphiboles

Amphiboles have an extremely varied chemistry (the name is derived from the Greek amphibolos, 'ambiguous', in allusion to the great variety of composition and appearance within this mineral group). Their chemical complexity explains why amphiboles occur in such a wide variety of igneous, metamorphic, and sedimentary rocks. The standard amphibole formula is taken to contain eight tetrahedral sites and can be expressed as

$$A_{0-1}\,B_2\,{}^{\mathrm{VI}}C_5\,{}^{\mathrm{IV}}T_8\,\mathrm{O}_{22}(\mathrm{OH},\mathrm{F},\mathrm{Cl})_2$$
where the Roman numeral superscripts refer to coordination numbers. The F and Cl content of the OH site is normally minor. The structure consists of double chains of Si/Al–O tetrahedra that run parallel to the z axis, with cations between them that are coordinated to oxygens from the chains and to the OH at the centers of the hexagonal rings of the chains (Fig. 11). The large A site may be vacant or contain varying amounts of Na/Ca, while the B site in the formula corresponds to the M4 site in the structural diagram and may contain Ca, Na, Al, Fe2+, Mg, or Mn. The M4 site is either six- or eightfold coordinated by oxygen, depending on the chemistry; in the former case the symmetry is orthorhombic, Pnma, and in the latter case it is monoclinic, C2/m (or occasionally P2₁/m). C in the formula represents the M1, M2, and M3 sites in the structure, all of which are sixfold coordinated by oxygen (and also by OH in the case of M1 and M3). The cations Fe2+, Mg, Fe3+, Al, Cr, and Ti can occupy these sites. The tetrahedral sites, T, are occupied by Si and Al; the limit of Al substitution for Si appears to be Al2Si6. There is an elaborate scheme for naming the amphiboles (Leake, 1978), but a simplified scheme is shown in Table 2. Amphiboles may be considered as ordered stacking sequences of alternate layers of M–O polyhedra and tetrahedra along the x axis (Fig. 12), and there is a stagger of approximately ±c/3 between adjacent tetrahedral layers. For the monoclinic structures, this stagger is always in the same direction, but in the orthorhombic amphiboles there is a regular reversal of the stagger. The sequence is +c/3, +c/3, −c/3, −c/3 (or simply + + − −). It is this difference between the monoclinic and orthorhombic structures that makes the coordination of the M4 site eightfold in monoclinic amphiboles but sixfold in orthorhombic ones.

Figure 11. Diagrammatic representation of the structure of the double-chain silicate, amphibole. Left: double chain of Si–O tetrahedra extending along the c axis and, below, a representation of the chain viewed end on; right: arrangement of the double chains viewed along the c axis. The M1, M2, and M3 cations form chains of edge-sharing octahedra between the apices of the tetrahedra, and the M4 polyhedra form similar chains between the bases of the tetrahedra. The large 10- to 12-fold coordinated polyhedral positions (the A sites) and the OH sites lie in the rings formed along the double chains. One I-beam has been shaded. (Source: Putnis, 1992; reproduced by permission of Cambridge University Press.)

TABLE 2
Simplified Classification for End-Member Amphibolesᵃ

Name                        Aᵇ    M4 (B)    M1 + M2 + M3 (C)    T
Ferromagnesian amphiboles
  Magnesioanthophylliteᶜ    □     Mg2       Mg5                 Si8
  Magnesiogedrite           □     Mg2       Mg3Al2              Si6Al2
  Magnesiocummingtoniteᶜ    □     Mg2       Mg5                 Si8
  Grunerite                 □     (Fe2+)2   (Fe2+)5             Si8
Calcic amphiboles
  Tremolite                 □     Ca2       Mg5                 Si8
  Ferroactinolite           □     Ca2       (Fe2+)5             Si8
  Magnesiohornblende        □     Ca2       Mg4Al               Si7Al
  Pargasite                 Na    Ca2       Mg4Al               Si6Al2
  Edenite                   Na    Ca2       Mg5                 Si7Al
  Tschermakite              □     Ca2       Mg3Al2              Si6Al2
Sodic–calcic amphiboles
  Richterite                Na    CaNa      Mg5                 Si8
  Ferriwinchite             □     CaNa      Mg4Fe3+             Si8
Alkali amphiboles
  Glaucophane               □     Na2       Mg3Al2              Si8
  Riebeckite                □     Na2       (Fe2+)3(Fe3+)2      Si8
  Arfvedsonite              Na    Na2       (Fe2+)4Fe3+         Si8

ᵃ There is complete solid solution between Mg and Fe in the M1–M4 sites. Mg-rich members have the prefix magnesio- and Fe-rich members have the prefix ferro- (or ferri-). Intermediate members have no prefix.
ᵇ □ denotes a vacant cation site.
ᶜ The ferromagnesian amphiboles may be monoclinic (the magnesiocummingtonite–grunerite series) or orthorhombic (the magnesioanthophyllite–gedrite series). All other amphiboles are monoclinic.

Figure 12. Schematic representation of the stacking of the double chains in monoclinic (left) and orthorhombic (right) amphiboles projected along the b axis. Notice that the (+ + − − + +) sequence in the orthorhombic structure, compared with (+ + +) [or (− − −)] for the monoclinic structure, results in a doubling of the a axis for the former compared with the latter structure (∼1.9 and ∼1.0 nm, respectively). (Source: Hawthorne, 1981; reproduced by permission of the Mineralogical Society of America.)

During the last three decades, considerable effort has been expended toward an understanding of the extent of solid solution and phase separation within and between the different amphibole series. There are miscibility gaps between all pairs of the major amphibole groups in Table 2, but there is also incomplete solid solution between some members of the individual groups, the solvus in the orthorhombic anthophyllite–gedrite series below about 600°C being the best documented (Spear, 1980). Some evidence for incomplete solid solution comes from the coexistence of two amphiboles that grew under equilibrium conditions. However, it can be difficult to establish that equilibrium has been attained (see the discussion in Smelik et al., 1991). As Robinson, Spear, et al. (1982) have pointed out, "The presence of one set of amphibole lamellae in another is one of the surest and soundest pieces of evidence for a . . . miscibility gap." It is TEM that has often provided that evidence; although some of the coarser exsolution textures have been investigated by light-optical microscopy and EMPA, TEM and AEM have played a very important role in unraveling phase relations and exsolution mechanisms in the amphiboles because of the small scale of some of the intergrowths.
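The site notation of the standard formula (T, C, B, and A) and of Table 2 can be made concrete with a small allocation routine that fills each site in turn from a list of cations. The fill order below is a simplified assumption for illustration, loosely in the spirit of conventions such as those of Leake (1978) but not a statement of any published scheme; it ignores Fe3+/Fe2+ estimation, Cr, and OH-site chemistry, and the names `allocate_sites`, `CAPACITY`, and `FILL_ORDER` are hypothetical. The input is a set of cations per formula unit, for example normalized to 23 oxygens (the anhydrous equivalent of O22(OH)2).

```python
# Site capacities per formula unit of A(0-1) B2 C5 T8 O22(OH)2.
CAPACITY = {"T": 8.0, "C": 5.0, "B": 2.0, "A": 1.0}

# Simplified fill order (an assumption for illustration): each site takes the
# listed species in turn until it is full; whatever remains passes on.
FILL_ORDER = [
    ("T", ["Si", "Al"]),
    ("C", ["Al", "Ti", "Fe3+", "Mg", "Fe2+", "Mn"]),
    ("B", ["Mg", "Fe2+", "Mn", "Ca", "Na"]),
    ("A", ["Na", "K", "Ca"]),
]


def allocate_sites(cations):
    """Return ({site: {species: amount}}, leftover) for cations per formula unit."""
    remaining = dict(cations)
    sites = {site: {} for site in CAPACITY}
    for site, species_list in FILL_ORDER:
        free = CAPACITY[site]
        for species in species_list:
            amount = min(remaining.get(species, 0.0), free)
            if amount > 0:
                sites[site][species] = amount
                remaining[species] -= amount
                free -= amount
    # Anything not placed (e.g., excess cations from a poor analysis) is reported.
    leftover = {sp: v for sp, v in remaining.items() if v > 1e-9}
    return sites, leftover


if __name__ == "__main__":
    # Ideal end-member compositions from Table 2, as cations per formula unit.
    pargasite = {"Si": 6, "Al": 3, "Mg": 4, "Ca": 2, "Na": 1}
    glaucophane = {"Si": 8, "Al": 2, "Mg": 3, "Na": 2}
    for name, cations in (("pargasite", pargasite), ("glaucophane", glaucophane)):
        sites, leftover = allocate_sites(cations)
        print(name, sites, "unassigned:", leftover)
```

For these two ideal compositions the routine returns the site populations listed in Table 2: Na in A, Ca2 in B, Mg4Al in C, and Si6Al2 in T for pargasite; a vacant A site, Na2 in B, Mg3Al2 in C, and Si8 in T for glaucophane.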
1. Exsolution in Monoclinic Amphiboles

It has long been known that exsolution occurs between calcic and monoclinic ferromagnesian amphiboles and between the members of the orthoamphibole series (see Ross et al., 1969, for a review), because the textures produced are large enough to be visible in the polarizing microscope. However, it was not possible to determine the exact chemical composition of the precipitates by EMPA, or in some cases even to determine their chemical nature at all, because they are beyond the resolution of the instrument. X-ray single-crystal photographs indicate that the two sets of exsolution lamellae that are visible optically in many slowly cooled calcic and monoclinic ferromagnesian amphiboles usually share a common (1̄01) or (100) lattice plane. However, careful light-optical studies by Robinson, Jaffe, et al. (1971) showed that the orientations of the lamellar boundaries (habit planes) were not exactly parallel to these planes but could differ from them by 10° or more. Robinson, Jaffe, et al. (1971) used the symbols "1̄01" and "100" to indicate the irrational orientations. I will use the same convention in this review.

The relative cell parameters of Ca-rich and ferromagnesian monoclinic amphiboles are such that, for coherent precipitation of one phase from the other, one principal strain is of opposite sign to the other two (the strain quadric is a hyperboloid). Thus there are two directions perpendicular to the intermediate principal axis of strain (the y axis) for which the strain is zero. One is "1̄01" and the other is "100." As long as the b-axial lengths are nearly identical, as they are for the phases in question, these two orientations will provide lamellar interfaces of minimum strain, the exact orientations being determined by the relative values of the a and c repeats.∗ This treatment neglects elastic anisotropy and the chemical component of the interfacial energy. However, calculations of the three-dimensional variation of elastic strain energy in monoclinic pyroxenes (Fletcher and McCallister, 1974), whose structures and phase relations mirror those of the amphiboles, show that the energy minima are within a few degrees of those calculated from Robinson et al.'s (1971, 1977) two-dimensional geometric model. Thus, as in the case of the alkali feldspars (Section III.A), the chemical component of the interphase boundary energy is unimportant because the two structures are identical except for the cation distribution between the Si/Al–O double chains.

∗ Because the relative values of the a and c repeats of monoclinic pyroxenes vary considerably with temperature in the range in which exsolution occurs, the orientation of the lamellae also varies. This variation can be used to estimate the temperature at which exsolution began (Robinson, Ross, et al., 1977). However, the cell parameters of calcic and ferromagnesian amphiboles do not vary so drastically with temperature, and the range of exsolution temperatures is lower than that for pyroxenes, so thermal histories cannot be estimated for amphiboles from lamellar orientations in the same way as for pyroxenes.
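The geometric argument can be illustrated with a two-dimensional calculation in the a–c plane: map the host a and c vectors onto those of the precipitate and find the directions whose length is unchanged. When the in-plane principal strains have opposite signs there are two such lines, and an interface containing b and either line is strain-free in this approximation. The Python sketch below is not Robinson et al.'s procedure; it assumes the lattice correspondence a→a, c→c, neglects the b-axis misfit and elastic anisotropy, and uses hypothetical cell parameters chosen only to be of roughly amphibole-like size. The function names are likewise hypothetical.

```python
import math


def basis(a, c, beta_deg):
    """Columns are the a and c lattice vectors in a Cartesian frame of the a-c plane.

    a lies along x, c makes the monoclinic angle beta with a, and b is normal to the plane.
    """
    beta = math.radians(beta_deg)
    return [[a, c * math.cos(beta)],
            [0.0, c * math.sin(beta)]]


def matmul(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def inverse(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def deformation(host, precipitate):
    """F maps host a, c onto precipitate a, c (assumed correspondence a->a, c->c)."""
    return matmul(basis(*precipitate), inverse(basis(*host)))


def zero_strain_directions(host, precipitate):
    """Angles (degrees from host a, in [0, 180)) of in-plane lines with zero linear strain."""
    f = deformation(host, precipitate)
    ftf = [[sum(f[k][i] * f[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    p, q, r = ftf[0][0] - 1.0, ftf[0][1], ftf[1][1] - 1.0
    if max(abs(p), abs(q), abs(r)) < 1e-15:
        raise ValueError("identical cells: every direction is strain-free")
    # Zero strain along u = (cos t, sin t):  p cos^2 t + 2 q sin t cos t + r sin^2 t = 0
    disc = q * q - p * r
    if disc < 0:
        return []  # both in-plane principal strains have the same sign
    if abs(r) < 1e-15:
        # Degenerate case: cos t * (p cos t + 2 q sin t) = 0
        angles = [90.0] + ([math.degrees(math.atan2(-p, 2.0 * q))] if q else [])
    else:
        angles = [math.degrees(math.atan((-q + s * math.sqrt(disc)) / r)) for s in (1, -1)]
    return sorted(ang % 180.0 for ang in angles)


def stretch(host, precipitate, theta_deg):
    """Length ratio |F u| / |u| for the host direction at theta_deg."""
    f = deformation(host, precipitate)
    t = math.radians(theta_deg)
    ux, uy = math.cos(t), math.sin(t)
    return math.hypot(f[0][0] * ux + f[0][1] * uy, f[1][0] * ux + f[1][1] * uy)


def direction_angle(cell, u, w):
    """Angle (degrees from a, in [0, 180)) of the lattice direction [u 0 w]."""
    m = basis(*cell)
    x = m[0][0] * u + m[0][1] * w
    y = m[1][0] * u + m[1][1] * w
    return math.degrees(math.atan2(y, x)) % 180.0


if __name__ == "__main__":
    host = (9.87, 5.30, 105.0)    # hypothetical Ca-rich cell: a, c (angstroms), beta (degrees)
    precip = (9.55, 5.32, 101.9)  # hypothetical Ca-poor cell
    print("zero-strain lines:", zero_strain_directions(host, precip))
    print("[101], in the (-101) plane:", direction_angle(host, 1, 1))
    print("[001], in the (100) plane:", direction_angle(host, 0, 1))
```

With these illustrative cells the two zero-strain lines fall within a few degrees of [101] and [001], i.e., near the traces of (1̄01) and (100) in the a–c plane, which corresponds to the "1̄01"/"100" pair discussed above. Because the angles depend on the relative a, c, and β values, changing the cell parameters (for example with temperature, as in the footnote) shifts them.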
Figure 13. Exsolution in monoclinic amphiboles. (a) Exsolution of grunerite (Ca-poor, ¯ monoclinic amphibole) from hornblende (Ca-rich, monoclinic amphibole) by nucleation of “101” lamellae on a (100) twin boundary, T–T. Notice the growth ledges along some of the interfaces. (b) Exsolution of hornblende from grunerite in the same rock as in (a). Notice the homogeneously ¯ distributed “100” platelets of hornblende between the large “101,” X–X and “100,” Y–Y lamellae and the platelet-free zone adjacent to the large lamellae. (Source: Gittos et al., 1976; reproduced by permission of Springer-Verlag.)
TEM of calcic and ferromagnesian amphiboles that show optically visible “1̄01” and “100” exsolution lamellae has revealed that the microstructure is more complex than it appears in the light microscope. Gittos et al. (1974, 1976) studied the amphiboles in three metamorphic rocks that contained coexisting grunerite–cummingtonite and hornblende. The large (up to 0.5 μm thick) “1̄01” and “100” exsolution lamellae were found to be coherent with the matrix; they had nucleated heterogeneously on twin boundaries or dislocations and had thickened by the movement of ledges across the interfaces (Fig. 13a). In addition, the Ca-poor amphibole contained a much finer, homogeneously distributed set of “100” platelets of hornblende between the lamellae (Fig. 13b). A zone free of platelets occurred adjacent to each lamella. Gittos et al. (1974) concluded that the platelets had nucleated homogeneously in regions where the calcium supersaturation was still high enough, the growth of the lamellae having left a diffusion profile (and hence a depleted, platelet-free zone) around each of them. The formation of the platelets, or Guinier-Preston (GP) zones, in the Ca-poor but not in the Ca-rich amphiboles can be explained by the difference in the shape of the solvus on the two sides of the phase diagram, the Ca-poor side being much steeper than the Ca-rich side (Champness, in preparation).
APPLICATIONS OF TEM IN MINERALOGY
Figure 14. Two-stage exsolution involving three different amphiboles: (a) primary exsolution of cummingtonite (Cum) lamellae from glaucophane (Gl) parallel to “281̄” and “28̄1̄”; (b) secondary exsolution of actinolite parallel to “100” in the cummingtonite lamellae, in a different area of the same specimen as in (a). (Source: Smelik and Veblen, 1994; reproduced by permission of the Mineralogical Association of Canada.)
An example of a two-stage exsolution process involving three different monoclinic amphiboles was described by Smelik and Veblen (1994). The matrix phase is the alkali amphibole glaucophane, and the first stage of exsolution consists of coherent cummingtonite lamellae, parallel to the irrational planes “281̄” and “28̄1̄”, that reach a maximum thickness of 60–80 nm (Fig. 14a). The most common mechanism of exsolution appears to have been homogeneous nucleation and growth, although there was some nucleation on dislocations and chain-width errors (see Section IV). Some of the cummingtonite lamellae contained periodic lamellae, up to 7.5 nm wide, of a second amphibole parallel to “100” (Fig. 14b). The periodicity of these lamellae and their thickness depended on the thickness of the host cummingtonite lamellae.
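Irrational habit planes such as these are specified only approximately by Miller indices, and it is often useful to know how far they lie from a rational reference plane such as (010). A minimal Python sketch, assuming a b-unique monoclinic cell (x along a, y along b, c in the x–z plane), computes the angle between two lattice planes from their reciprocal-lattice normals. The function names are illustrative.

```python
import math

def reciprocal_rows(a, b, c, beta_deg):
    """Rows are a*, b*, c* in Cartesian coordinates (the inverse of the matrix
    whose columns are a, b, c) for a b-unique monoclinic cell."""
    beta = math.radians(beta_deg)
    return [[1.0 / a, 0.0, -math.cos(beta) / (a * math.sin(beta))],
            [0.0, 1.0 / b, 0.0],
            [0.0, 0.0, 1.0 / (c * math.sin(beta))]]

def plane_normal(hkl, cell):
    """Reciprocal-lattice vector h a* + k b* + l c* (normal to plane hkl)."""
    rec = reciprocal_rows(*cell)
    return [sum(hkl[i] * rec[i][j] for i in range(3)) for j in range(3)]

def angle_between_planes(hkl1, hkl2, cell):
    """Acute angle in degrees between lattice planes hkl1 and hkl2.
    cell = (a, b, c, beta_deg); use beta_deg = 90 for orthorhombic cells."""
    n1, n2 = plane_normal(hkl1, cell), plane_normal(hkl2, cell)
    dot = sum(x * y for x, y in zip(n1, n2))
    norm = math.sqrt(sum(x * x for x in n1) * sum(y * y for y in n2))
    cos_angle = min(1.0, abs(dot) / norm)  # guard against rounding above 1
    return math.degrees(math.acos(cos_angle))
```

As a check, an orthorhombic cell with a ≈ 18.6 Å and b ≈ 18.0 Å (roughly anthophyllite-like values, assumed here only for illustration) puts (120) about 26° from (010), the magnitude of deviation described below for orthoamphibole lamellae. For a clinoamphibole, the measured β is passed in the cell tuple, e.g. `angle_between_planes((2, 8, -1), (0, 1, 0), cell)`.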
Figure 15. Ca–Fe–Mg amphibole quadrilateral showing the compositions of the primary cummingtonite lamellae in Figure 14 as determined by AEM. The shaded areas show the normal compositions of natural amphiboles. The compositions of the lamellae fall inside the miscibility gap between Ca-rich and Ca-poor amphiboles. (Source: Smelik and Veblen, 1994; reproduced by permission of the Mineralogical Association of Canada.)
Compared with normal cummingtonite, the cummingtonite lamellae were significantly enriched in Ca; when their compositions were plotted on the ternary Ca–Mg–Fe amphibole composition diagram, they fell well within the actinolite–cummingtonite miscibility gap, and they are thus metastable (Fig. 15). The secondary exsolution lamellae inside the cummingtonite lamellae were too narrow for quantitative analysis, but AEM showed that the Ca had segregated almost entirely to one of the phases, while the other phase was richer in Mg and Fe. Thus stable compositions of actinolite and cummingtonite were produced by the second exsolution process. Smelik and Veblen (1991) showed by calculation that “281̄” and “28̄1̄” are the planes of minimum misfit [or optimal phase boundaries (OPBs)] for a coherent intergrowth of glaucophane and cummingtonite (elastic strain was ignored in the calculations). The glaucophane cell parameters used for the calculation were measured by powder XRD of glaucophane grains from the rock, while those of the cummingtonite were derived from its average composition, as determined by AEM, using the regression equations of Viswanathan and Ghose (1965) (the cell parameters vary nearly linearly with composition). The largest difference in the cell parameters was in b, consistent with the expectation that the plane(s) of minimum misfit lie close to (010). Smelik and Veblen (1994) showed from calculations of misfit and elastic strain that although the “100” boundary between the actinolite and cummingtonite is optimal, the “281̄” interface between the actinolite and glaucophane has relatively high strain. The periodic nature of the secondary exsolution is a result of the minimization of the total elastic strain associated with the intergrowth of the three amphiboles.
The existence, or otherwise, of a miscibility gap between members of the calcic-amphibole group has been the subject of considerable debate over the last 25 years (see Smelik et al., 1991, for a review). Although some authors have argued for the existence of a gap from the presence of primary actinolite and hornblende grains in the same rock, others have argued that these occurrences represent metastable assemblages. Experimental studies have also yielded contradictory results. Unequivocal evidence for such a miscibility gap has been provided by TEM of calcic amphiboles from metagabbros in Wyoming, which contain another example of a two-stage exsolution process involving three different amphiboles (Smelik et al., 1991). The calcic amphiboles, which range in composition from actinolite to hornblende, contain sparse “1̄01” and “100” lamellae of cummingtonite that are just visible in the light microscope. Between them is a fine, tweedlike structure parallel to two irrational, symmetrically equivalent planes, “132̄” and “13̄2̄” (Fig. 16a). Diffraction patterns from the tweed structure showed a single reciprocal lattice with four satellites about each spot, lying in directions approximately perpendicular to the modulations. HRTEM showed that the interfaces between the elements of the tweed were coherent, with no change in the orientation or spacing of the lattice fringes between them (Fig. 16b).
Figure 16. Microstructure of a calcic amphibole from Wyoming, showing a two-stage exsolution process involving three different amphiboles. (a) Pervasive tweed exsolution parallel to “132̄” and “13̄2̄” between two larger “100” cummingtonite lamellae; (b) high-resolution image of an area showing a coarse tweed. The microstructure is coherent, with no change in the orientation or spacing of the 020 lattice fringes. Cum, cummingtonite; Act, actinolite; Hbl, hornblende. (Source: Smelik et al., 1991; reproduced by permission of the Mineralogical Society of America.)
TABLE 3
EMPA and AEM Analyses of Phases in a Calcic Amphibole (a)

                                 Analysis (b)
Site          Cation          1       2       3       4       5
T site        Si           7.02    7.88    7.10    7.62    6.52
              Al           0.99    0.12    0.90    0.38    1.48
              Total        8.00    8.00    8.00    8.00    8.00
M(1, 2, 3)    Al           0.50    0.03    0.52    0.38    0.76
              Ti           0.05    0.01    0.03    0.02    0.05
              Mg           2.63    3.45    2.62    3.05    1.80
              Fe2+         1.82    1.51    1.83    1.55    2.40
              Total        5.00    5.00    5.00    5.00    5.00
M(4)          Ca           1.87    0.10    1.80    1.71    1.88
              Na            —      0.08    0.08    0.18     —
              Fe2+         0.18    1.64    0.11    0.08    0.16
              Mn           0.03    0.18    0.02    0.03    0.02
              Total        2.08    2.00    2.01    2.00    2.00
A site        Na           0.18    0.15    0.34    0.12    0.36
              K            0.06    0.01    0.05    0.03    0.14
              Total        0.23    0.16    0.39    0.15    0.50
Fe2+/(Fe2+ + Mg)           0.44    0.48    0.47    0.35    0.59

Source: Smelik et al. (1991).
(a) Amphibole formulas are based on normalization to 23 oxygens and the assumption that all Fe is Fe2+. For the method of allocation of cations to the various crystallographic sites, see Robinson, Spear, et al. (1982). The amphiboles were compositionally zoned (cored); columns 1 and 2 represent averages of analyses with a wide range. The total Al contents ranged from 0.559 to 2.581 per formula unit.
(b) 1, bulk analysis of composite grains by EMPA (average of 8); 2, AEM analysis of cummingtonite lamellae (average of 6); 3, bulk AEM analysis of tweed structure (average of 8); 4, AEM analysis of actinolite lamellae in tweed structure (average of 20); 5, AEM analysis of hornblende lamellae in tweed structure (average of 21).
Smelik et al. (1991) used EMPA and AEM to investigate the compositions of the phases in Figure 16 (Table 3). As expected, the only significant exchanges during the first stage of the exsolution (that producing the cummingtonite lamellae) were Ca ↔ (Fe, Mg) and AlVI, AlIV ↔ MgVI, SiIV, the latter being called the tschermakite substitution; the Fe/(Fe + Mg) ratio did not change.
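The normalization described in note (a) of Table 3 can be sketched in Python as follows. The oxide list and rounded molar masses are standard values, but the site allocation is deliberately simplified (Si, then Al, fills the eight T sites and the remaining Al is octahedral); the full allocation scheme used for Table 3 is that of Robinson, Spear, et al. (1982). Function names are illustrative.

```python
# Oxide: (cation, molar mass in g/mol, cations per oxide, oxygens per oxide).
# All iron is entered as FeO, i.e. treated as Fe2+, as in Table 3.
OXIDES = {
    "SiO2": ("Si", 60.08, 1, 2),
    "TiO2": ("Ti", 79.87, 1, 2),
    "Al2O3": ("Al", 101.96, 2, 3),
    "FeO": ("Fe2+", 71.84, 1, 1),
    "MnO": ("Mn", 70.94, 1, 1),
    "MgO": ("Mg", 40.30, 1, 1),
    "CaO": ("Ca", 56.08, 1, 1),
    "Na2O": ("Na", 61.98, 2, 1),
    "K2O": ("K", 94.20, 2, 1),
}

def cations_per_23_oxygens(wt_percent):
    """Convert oxide weight percentages to cations per 23 oxygens."""
    cations, oxygens = {}, 0.0
    for oxide, wt in wt_percent.items():
        cation, molar_mass, n_cation, n_oxygen = OXIDES[oxide]
        moles = wt / molar_mass
        cations[cation] = cations.get(cation, 0.0) + moles * n_cation
        oxygens += moles * n_oxygen
    scale = 23.0 / oxygens  # rescale so that the oxygens sum to 23
    return {cation: n * scale for cation, n in cations.items()}

def fill_tetrahedral_site(cations, t_sites=8.0):
    """Assign Si, then Al, to the T site; leftover Al is octahedral (AlVI)."""
    si = cations.get("Si", 0.0)
    al = cations.get("Al", 0.0)
    al_iv = min(al, max(0.0, t_sites - si))
    return {"Si": si, "AlIV": al_iv, "AlVI": al - al_iv}
```

For the bulk tweed analysis in Table 3 (column 3), for example, Si = 7.10 leaves room for 0.90 Al on the T site, which is the AlIV value listed there.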
The tweed structure was coarse enough in places to allow semiquantitative analysis of the individual components by AEM. The tweed was found to consist of two chemically different regions that approached actinolite and hornblende in composition (Table 3). The actinolite regions have higher Si, lower Al, a lower Fe2+/(Fe2+ + Mg) ratio, and an apparently lower A-site occupancy than the hornblende regions. Na also appears to be slightly redistributed, with slightly more NaM4 in the actinolite and more NaA in the hornblende. However, this apparent redistribution probably results from the difficulty of analyzing Na in the AEM (see Section II) and from the difference in the Fe contents of the two phases: all the Fe was assumed to be Fe2+ (although the amphiboles are likely to contain some Fe3+), which may have led to overestimation of the A-site occupancy. The miscibility gap defined by these compositional differences is shown graphically in Figure 17. In Figure 17a total Al is plotted against Fe2+/(Fe2+ + Mg) for the individual analyses that contributed to the averages in Table 3; Figure 17b shows the gap in terms of the calculated A-site occupancy versus AlIV; and in Figure 17c total Al is plotted against AlIV. The gap is well defined in each case. The substitutions that occurred were AlVI, AlIV ↔ MgVI, SiIV (tschermakite); (Na, K)A, AlIV ↔ □A, SiIV, called the edenitic substitution; and Fe2+ ↔ Mg, with the tschermakite exchange being dominant. Smelik et al. (1991) interpreted the tweed texture as having been produced by spinodal decomposition between two calcic phases at a lower temperature than that at which exsolution of the cummingtonite lamellae took place. They attempted to calculate the orientation of the planes of minimum misfit for the actinolite–hornblende pair. However, the cell parameters of the two phases are very similar, and their variation with temperature and pressure is not accurately known for the compositions involved. The calculations failed to show conclusively that “132̄” and “13̄2̄” are the planes of minimum misfit, but it is clear that a range of lamellar orientations may be possible, depending on the exact compositions and unit-cell parameters of the phases involved.
2. Exsolution in Orthorhombic Amphiboles
a. Exsolution between Two Orthoamphiboles
Evidence from exsolution textures for a miscibility gap between the orthorhombic amphiboles anthophyllite and gedrite was first reported by Bøggild (1905, 1924), who used light-optical microscopy, but the complexity of the phase distributions has become apparent only in the last 25 years from TEM observations (Gittos et al., 1976; Smelik and Veblen, 1993). As in the case of the monoclinic amphiboles, the habit plane is determined by the differences in the cell parameters of the two phases. The usual plane is (010) because the misfit in b is considerably larger than that in a or c [although in absolute terms it is still small, being around 1.5% or less at room temperature (Smelik and Veblen, 1993)].
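The argument that a dominant misfit in b favours an interface near (010) can be made semi-quantitative. The Python sketch below is an illustration under explicit assumptions, not the published OPB method: it ignores elastic relaxation (as the calculations of Smelik and Veblen, 1991, did), measures the misfit within a candidate interface by the norm of the misfit strain projected onto that plane, and searches a grid of plane normals. The cell values in the test are hypothetical, chosen with a 1.5% difference in b (the magnitude quoted above) and much smaller differences in a and c.

```python
import math

def cell_matrix(a, b, c, beta_deg):
    """Columns are the a, b, c vectors (x along a, y along b, c in x-z)."""
    beta = math.radians(beta_deg)
    return [[a, 0.0, c * math.cos(beta)],
            [0.0, b, 0.0],
            [0.0, 0.0, c * math.sin(beta)]]

def cell_matrix_inverse(a, b, c, beta_deg):
    beta = math.radians(beta_deg)
    return [[1.0 / a, 0.0, -math.cos(beta) / (a * math.sin(beta))],
            [0.0, 1.0 / b, 0.0],
            [0.0, 0.0, 1.0 / (c * math.sin(beta))]]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def misfit_strain(cell_host, cell_lamella):
    """Green-Lagrange strain E = (F^T F - I)/2, F mapping host onto lamella."""
    F = matmul(cell_matrix(*cell_lamella), cell_matrix_inverse(*cell_host))
    return [[(sum(F[k][i] * F[k][j] for k in range(3))
              - (1.0 if i == j else 0.0)) / 2.0 for j in range(3)]
            for i in range(3)]

def in_plane_misfit(E, n):
    """Frobenius norm of the strain restricted to the plane with unit normal n."""
    P = [[(1.0 if i == j else 0.0) - n[i] * n[j] for j in range(3)]
         for i in range(3)]
    R = matmul(matmul(P, E), P)
    return math.sqrt(sum(x * x for row in R for x in row))

def best_interface_normal(cell_host, cell_lamella, step_deg=1):
    """Grid search over a hemisphere of unit normals; returns (misfit, normal)."""
    E = misfit_strain(cell_host, cell_lamella)
    best = None
    for theta in range(0, 91, step_deg):
        for phi in range(0, 360, step_deg):
            t, p = math.radians(theta), math.radians(phi)
            n = (math.sin(t) * math.cos(p), math.sin(t) * math.sin(p), math.cos(t))
            m = in_plane_misfit(E, n)
            if best is None or m < best[0]:
                best = (m, n)
    return best
```

The returned normal is Cartesian; it can be compared with (hkl) normals calculated for the host cell, and a coarser `step_deg` trades angular resolution for speed.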
Figure 17. Plots of AEM analyses of actinolite (filled diamonds) and hornblende (open squares) regions of the tweed structure, showing the miscibility gap. The open circles connected by a tie line are the average compositions. (a) Plot of total Al versus Fe2+/(Fe2+ + Mg); (b) plot of calculated A-site occupancy versus AlIV; (c) plot of Altot versus AlIV. End-member abbreviations are as follows: tr, tremolite; ed, edenite; pa, pargasite; ts, tschermakite. (Source: Smelik et al., 1991; reproduced by permission of the Mineralogical Society of America.)
However, the relative cell parameters are extremely sensitive to composition and temperature (Smelik and Veblen, 1993); if the amphibole contains relatively large amounts of Ca and/or Fe, the habit plane changes from (010) to an (hk0) orientation up to 26° from (010) (approximately {120}). This variation in orientation can be seen in different areas of the same zoned crystal, and sometimes a fall in temperature has caused the orientation to change within the same area (Gittos et al., 1976; Smelik and Veblen, 1993). There is evidence from TEM studies that heterogeneous nucleation, homogeneous nucleation, and spinodal decomposition can all occur during exsolution of the orthoamphiboles. In samples showing the coarsest textures, nucleation appears to have taken place on (010) chain-width defects (Smelik and Veblen, 1993). In Figure 18a almost all the (010) lamellae contain a chain-width defect that probably acted as a nucleation site. Later nucleation of (010) platelets took place on stacking faults parallel to (100) in regions between the large lamellae that had a higher concentration of solute.
Figure 18. Microstructures in exsolved orthorhombic amphiboles. (a) Sample with bulk composition 60% gedrite showing large (010) lamellae of anthophyllite with a complex morphology. The lamellae have probably nucleated on (010) chain-width errors. Terminations of the chain-width errors have impeded lamellar growth (arrowed). Between the lamellae are small (010) platelets, some of which have nucleated on (100) stacking faults. Others appear to have nucleated homogeneously. Notice the precipitate-free zone adjacent to the large lamellae. (b) HRTEM image of a (100) stacking fault in a homogeneous region of exsolved anthophyllite. The image is taken along [011]. The regular alternation of the stacking (+ + − − + + − −) along the a axis can be seen in the orthorhombic phase (compare with Fig. 12). In the stacking fault the stacking is (+ + + +) (or − − − −), which indicates that it is a narrow strip of monoclinic material. The faults are thought to form by deformation. (c) TEM image of an orthoamphibole that contains curved lamellae straddling (010). The dark lamellae are gedrite and the light ones are anthophyllite. Note the branching of the lamellae. The electron beam is near [001]. (Sources: (a) Gittos et al., 1976; reproduced by permission of Springer-Verlag; (b) and (c) Smelik and Veblen, 1993; reproduced by permission of the Mineralogical Society of America.)
These faults have been shown to be narrow strips of monoclinic material (Fig. 18b) that predate all the exsolution (because they pass through it undisturbed) and were probably produced by deformation (Smelik and Veblen, 1993). The final stage of exsolution in the sample illustrated in Figure 18a appears to have been homogeneous nucleation of the (010) platelets in regions devoid of defects. The large lamellae in Figure 18a show an unusual morphology, in the development of which the (010) chain-width defects appear to have played an important role. Lamellar growth is impeded in the vicinity of the dislocation that forms the termination of the chain-width defect (arrow in Fig. 18a), and an embayment is formed in the lamella; the defect thus effectively pins the boundary. Similar embayments form at the terminations of (100) stacking faults (Smelik and Veblen, 1993). It is noticeable that in regions where there are no chain-width terminations, the lamellae are straight (bottom right, Fig. 18a), but that where the orientation deviates from (010) there are terminations. This suggests that the strain produced by the terminations can influence the orientation of the lamellae, an effect aided by the very small anisotropy of the misfit (Smelik and Veblen, 1993). Orthoamphiboles with a somewhat finer exsolution texture than that shown in Figure 18a and compositions near the centroid of the solvus show characteristics that are consistent with spinodal decomposition (Fig. 18c and Gittos et al., 1976, Fig. 6). Although the interfaces of the lamellae are now sharp, the lamellae are long and thin, their distribution is very regular, and they show evidence of branching similar to that of lamellae produced experimentally by spinodal decomposition in Ge-substituted alkali feldspars (Kusatz et al., 1987) and in pyroxenes (Buseck et al., 1980).
b. Exsolution of a Monoclinic Amphibole from an Orthoamphibole
Exsolution between orthorhombic and monoclinic amphiboles has been postulated for many years by analogy with the single-chain pyroxenes. In the latter system it is common for ferromagnesian orthopyroxenes to contain exsolution lamellae of Ca-rich clinopyroxene (and vice versa) parallel to (100). Because of the close chemical and structural similarities between the pyroxenes and amphiboles, one would expect similar exsolution between calcic clinoamphiboles and orthoamphiboles. However, despite the abundance of orthoamphibole-bearing rocks, many of which contain coexisting calcic amphiboles, no such microstructures had been reported until 1992. Smelik and Veblen (1992) found that (100) lamellae of hornblende up to 80 nm wide had exsolved from an orthoamphibole that also contained earlier-formed lamellae of a second orthoamphibole with a habit plane that varied from (010) to ∼{120}. The hornblende lamellae were semicoherent and had nucleated on (100) stacking faults. Because the faults are narrow strips of monoclinic material, they act as ideal templates for the hornblende structure. Semiquantitative analysis of the matrix and hornblende lamellae showed that the main chemical change is CaM4 ↔ (Mg, Fe, Mn)M4, as would be expected from the chemistry of the calcic and ferromagnesian amphiboles (Table 2). However, other coupled substitutions involving the M4, M2, and T sites are also important in the exsolution (Smelik and Veblen, 1992). The analysis also showed that during the first stage of exsolution, the Ca segregated largely to the gedrite rather than to the anthophyllite.
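Spinodal decomposition, invoked above for the actinolite–hornblende tweed and for the finer orthoamphibole textures, differs from nucleation in that small composition fluctuations grow spontaneously, with a preferred wavelength, from the outset. A minimal one-dimensional Cahn–Hilliard model in Python illustrates this. It is a generic textbook sketch in dimensionless units with an assumed double-well free energy; none of its parameters is fitted to amphiboles, and coherency strain is not included.

```python
import random

def periodic_laplacian(u, dx):
    """Second difference with periodic boundaries (u[-1] wraps around)."""
    n = len(u)
    return [(u[i - 1] - 2.0 * u[i] + u[(i + 1) % n]) / dx ** 2 for i in range(n)]

def initial_profile(n=64, noise=0.05, seed=1):
    """Small random composition fluctuations about a mean inside the spinodal."""
    rng = random.Random(seed)
    return [noise * (2.0 * rng.random() - 1.0) for _ in range(n)]

def cahn_hilliard_1d(c, steps=5000, dt=0.01, dx=1.0, kappa=1.0, mobility=1.0):
    """Explicit Euler integration of dc/dt = M d2/dx2 (f'(c) - kappa d2c/dx2)
    with the double well f(c) = (c**2 - 1)**2 / 4, so that f'(c) = c**3 - c."""
    c = list(c)
    for _ in range(steps):
        lap_c = periodic_laplacian(c, dx)
        mu = [ci ** 3 - ci - kappa * li for ci, li in zip(c, lap_c)]
        lap_mu = periodic_laplacian(mu, dx)
        c = [ci + dt * mobility * lm for ci, lm in zip(c, lap_mu)]
    return c

def count_interfaces(c):
    """Sign changes around the periodic profile (two per lamella)."""
    return sum(1 for i in range(len(c)) if c[i - 1] * c[i] < 0)
```

The modulation that emerges is roughly periodic because, in this model, the fastest-growing wavelength is set by the gradient-energy coefficient `kappa` and the curvature of f rather than by the positions of nucleation sites; the mean composition is conserved throughout.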
IV. HRTEM and Defect Structures
The study of defects by conventional amplitude-contrast imaging, such as the dark-field technique, has revealed a vast amount of information about phase transitions (see Nord, 1992, for a review) and deformation structures and mechanisms in minerals (see Green, 1992, for a review). However, in this section, I concentrate on describing how HRTEM has furthered our understanding of the nature of planar defects, in particular polysomatic defects and their role in replacement reactions in the pyroxenes and amphiboles.
A. Biopyriboles and Polysomatic Defects
Thompson (1978) defined a polysome as “a crystal . . . that can be regarded as made up of chemically distinct layer modules.” It is thus distinct from a polytype, in which there is no chemical variation between the layers. A polysomatic series is a group of crystalline compounds (e.g., minerals) that possess the same types of modules in different ratios or sequences, the general term for this structural mixing being polysomatism. Because polysomatic defects and small regions of ordered polysomatic structures have finite width, HRTEM can be used to resolve details within them and hence to identify them. In a polysomatic series in which the two types of modules have the same width, there are commonly certain defects that produce virtually no net displacement in the surrounding structure and thus would produce little contrast in conventional bright- or dark-field images (Veblen, 1992). The pyroxenes, amphiboles, and sheet silicates (e.g., mica and talc) can be regarded as belonging to a polysomatic series known as the biopyriboles, a term derived from biotite (a variety of mica), pyroxene, and amphibole (Johannsen, 1911). Pyribole is the name given to biopyriboles, excluding the sheet silicates. When projected along the c axis, the amphibole and pyroxene structures can be described in terms of the stacking, along the b axis, of I-beams, each consisting of a pair of Si–O chains and the cations between them (Figs. 11 and 19). The pyroxene I-beam is one chain wide, the amphibole I-beam is two chains wide, and the mica structure has infinitely wide I-beams (Fig. 1).
Figure 19. Schematic diagram showing I-beams projected onto the (001) plane in orthopyroxene, orthoamphibole, jimthompsonite, and chesterite. The digits refer to the number of chains in each I-beam. (Source: Klein and Hurlbut, 1993; reproduced by permission of John Wiley & Sons, Inc.)
1. New Biopyriboles
Theoretically, there is a complete, homologous series from pyroxene to mica, and in the last 25 years there have been numerous reports of natural and synthetic biopyriboles other than the three described previously. From light-optical and XRD studies, Veblen and Burnham (1978a, 1978b) described four new minerals that were intergrown with anthophyllite and cummingtonite in a metamorphosed rock near Chester, Vermont. The new minerals have either triple chains [jimthompsonite, (Mg, Fe)10Si12O32(OH)4] or both double and triple chains in regular alternation [chesterite,∗ (Mg, Fe)17Si20O54(OH)6] (Fig. 19) and, like the pyroxenes and amphiboles, can occur in both monoclinic and orthorhombic forms. Specimens of the new, ordered pyriboles have been studied extensively by HRTEM (Fig. 20), but of more interest are the reports of ordered pyriboles in materials not previously known to contain them: for instance, in nephrite (actinolite) jade (Jefferson et al., 1978) and altered pyroxene (Nakijima and Ribbe, 1980). Several new, ordered pyribole structures were also discovered by HRTEM in specimens from Chester (Veblen and Buseck, 1979). Structures with the following statistically significant, ordered mixed-chain sequences were found: (2233), (233), (232233), (222333), (2332323), (2333) (Fig. 21a), and (433323), where the numbers 2, 3, and 4 indicate the number of chains in each I-beam. The number and complexity of these structures suggest that they are unlikely to be stable, and the reason for their formation remains obscure. Although jimthompsonite and chesterite are far more abundant than the other phases noted, it is still unclear whether they have true stability fields under geological conditions or whether they are always metastable (Droop, 1994).
∗ New minerals are commonly named after the locality at which they were first found (e.g., chesterite) or after a distinguished scientist [e.g., jimthompsonite (there was already a mineral named thompsonite, hence the use of J. B. Thompson’s forename)].
Figure 20. HRTEM images, viewed down the c axis, of anthophyllite (An), jimthompsonite (Jt), and chesterite (Ch). The white spots are the projected positions of the A sites, which are located between the I-beams. The structural interpretation is shown in terms of the I-beams in Figure 19. Unit cells are indicated. (Source: Veblen and Buseck, 1979; reproduced by permission of the Mineralogical Society of America.)
2. Chain-Width Disorder in Pyriboles
Chisholm (1973) was the first to report the existence of chain-width defects (otherwise known as crystallographic shear planes or Wadsley defects) in chain silicates. Chisholm examined a number of amphibole asbestos samples by electron diffraction and conventional TEM and surmised that the (010) defects were intercalated slabs of pyroxene or slabs with more than two chains.
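The polysomatic description lends itself to simple bookkeeping. The Python sketch below treats an I-beam n chains wide as (n − 1) talc-like M modules plus one pyroxene-like P module. The module compositions are an assumption, chosen because they reproduce the anthophyllite formula Mg7Si8O22(OH)2, the triple-chain (jimthompsonite) formula Mg10Si12O32(OH)4 and the mixed double–triple (chesterite) formula Mg17Si20O54(OH)6, with Mg standing for (Mg, Fe). The formula of any chain-width sequence, or of a slab inserted as a chain-width error, then follows by addition.

```python
from collections import Counter

# Hypothetical module compositions; "Mg" stands for (Mg, Fe).
M_MODULE = Counter({"Mg": 3, "Si": 4, "O": 10, "OH": 2})  # talc-like slab
P_MODULE = Counter({"Mg": 4, "Si": 4, "O": 12})           # pyroxene-like slab

def ibeam(width):
    """One I-beam `width` chains wide = (width - 1) M modules + 1 P module."""
    if width < 1:
        raise ValueError("chain width must be at least 1")
    total = Counter(P_MODULE)
    for _ in range(width - 1):
        total.update(M_MODULE)  # Counter.update adds counts
    return total

def sequence_formula(widths):
    """Sum the I-beams of a chain-width sequence such as (2, 3, 3, 3)."""
    total = Counter()
    for w in widths:
        total.update(ibeam(w))
    return total

def formula_string(f):
    """Write a formula as, e.g., Mg7Si8O22(OH)2."""
    parts = [f"{el}{f[el]}" for el in ("Mg", "Si", "O") if f[el]]
    if f["OH"]:
        parts.append(f"(OH){f['OH']}")
    return "".join(parts)
```

Comparing Mg/Si across the series (7/8 for amphibole, 17/20 for chesterite, 10/12 for jimthompsonite, 3/4 for talc) shows the loss of octahedral cations and the gain of OH that a polysomatic reaction from amphibole toward talc requires.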
Figure 21. HRTEM images viewed down the c axis of pyriboles from Chester, Vermont. (a) The ordered sequence (2333). The double-chain slabs are unlabeled. The diffraction pattern is on the right. (b) An area containing triple, quadruple, and quintuple chains and exhibiting extreme chain-width disorder. (Source: Veblen and Buseck, 1979; reproduced by permission of the Mineralogical Society of America.)
Since then, HRTEM has shown that the defects are mostly of the triple-chain variety and that they are generally far more common in asbestos than in nonasbestos amphiboles (Fig. 22; see Veblen, 1981, 1992, for reviews). As shown in Section III.B, chain-width errors can act as nucleation sites for exsolution in orthoamphiboles and can have a strong influence on the growth of the coarser lamellae. Isolated chain-width errors in amphiboles are usually thought to be primary growth features, whereas those that have been reported in pyroxenes, jimthompsonite, and chesterite are associated primarily with alteration reactions (see the next section). Some of the pyriboles from Chester, Vermont, are extremely disordered (Fig. 21b).
Figure 22. HRTEM image of riebeckite asbestos (crocidolite) showing fibrils, which contain chain-width errors, separated by low-angle boundaries. (Source: Ahn and Buseck, 1991; reproduced by permission of the Mineralogical Society of America.)
3. Polysomatic Reactions in Pyriboles
Polysomatic reactions can be defined as reactions that turn one polysome into another. In biopyriboles, any reaction that changes the widths or sequences of the silicate chains is thus a polysomatic reaction. TEM observations have shown that polysomatic reaction of pyriboles is common whenever such minerals are in contact with hydrous fluids at moderate temperatures during retrograde metamorphism. Although bulk processes, in which transformation occurs along a broad reaction front, may operate in many cases of polysomatic reaction in pyriboles, most TEM observations have involved materials that have been replaced wholly or in part by a lamellar mechanism. In such cases a lamella or zipper of material having a chain sequence different from that of the matrix nucleates and grows. In most cases the lamellae terminate coherently, but the termination may also be associated with a dislocation. The lamellae thicken by the propagation of ledges along the interface. Figure 23a shows an HRTEM image of an amphibole lamella in pyroxene that is thickening from four to five unit cells wide by the migration of a ledge that has a width of two amphibole chains (one unit cell); Figure 23b depicts an I-beam model of the ledge, showing that it terminates coherently. Random nucleation and later growth of zippers in pyriboles will inevitably result in chain-width errors in the resultant phase. Figure 24 shows a possible mechanism by which such errors can be eliminated (Veblen and Buseck, 1980). The material in the top of the micrograph is perfectly ordered chesterite, while that at the bottom contains chain-width errors. The two regions are separated by an en echelon series of planar faults; migration of these faults toward the bottom of the figure would result in the replacement of disordered pyribole by ordered chesterite. Veblen and Buseck (1980) have also suggested that the tunnels that exist at the terminations of the zippers in these reactions (e.g., Fig. 23b) provide a route for ultrafast (pipe) diffusion of the chemical species (hydrogen and octahedral cations) that are needed for the change in stoichiometry of the polysomatic reaction. It is thought that chesterite and jimthompsonite usually form as intermediate phases in the retrograde reaction of amphibole to the sheet silicate talc by the mechanisms just outlined, but this does not always appear to be the case (Droop, 1994).
Figure 23. (a) HRTEM image of an amphibole lamella in pyroxene. A ledge two chains wide is arrowed. (b) An I-beam diagram of the image in (a). (Source: Veblen, 1981; reproduced by permission of the Mineralogical Society of America.)
Figure 24. Possible mechanism by which chain-width errors can be eliminated. The material in the top of the micrograph is perfectly ordered chesterite, while that at the bottom contains chain-width errors. The two regions are separated by an en echelon series of planar faults; the migration of these faults toward the bottom of the figure would result in the replacement of disordered pyribole by ordered chesterite. (Source: Veblen and Buseck, 1980; reproduced by permission of the Mineralogical Society of America.)
V. Concluding Remark
Although I have been able to cover only a few of the many applications of TEM to mineralogy in the past 30 years, and have not had the space to cover convergent-beam electron diffraction and electron energy-loss spectroscopy (both of which are now being applied to mineralogical problems), I hope I have made clear that, in the late 20th and early 21st centuries, TEM has had an impact second only to that of XRD in unraveling the complexities of mineral behavior.
References
Aaronson, H. I., Lorimer, G. W., Champness, P. E., and Spooner, E. T. C. (1974). On differences between phase transformations (exsolution) in metals and minerals. Chem. Geol. 14, 75–80.
Ahn, J. H., and Buseck, P. R. (1991). Microstructures and fiber formation mechanisms of crocidolite asbestos. Am. Mineral. 76, 1467–1478.
Ardell, A. J., Nicholson, R. B., and Eshelby, J. D. (1966). On the modulated structure of aged Ni–Al alloys. Acta Metall. 14, 1295–1309.
Bøggild, O. B. (1905). Mineralogia Groenlandica. Medd. Groenl. 32, 400.
Bøggild, O. B. (1924). On the labradorization of the feldspars. K. Dan. Vidensk. Selsk. Mat. Fys. Medd. 6, 1–79.
Bown, M. G., and Gay, P. (1959). Identification of oriented inclusions in pyroxene crystals. Am. Mineral. 44, 592–602.
Brady, J. B. (1987). Coarsening of fine-scale exsolution lamellae. Am. Mineral. 72, 697–706.
Brown, W. L., Becker, S. M., and Parsons, I. (1983). Cryptoperthites and cooling rate in a layered syenite pluton: a chemical and TEM study. Contrib. Mineral. Petrol. 82, 13–25.
Brown, W. L., and Parsons, I. (1988). Zoned ternary feldspars in the Klokken intrusion: exsolution microtextures and mechanisms. Contrib. Mineral. Petrol. 98, 444–454.
Buseck, P. R., Nord, G. L., Jr., and Veblen, D. R. (1980). Subsolidus phenomena in pyroxenes. Rev. Mineral. 7, 117–211 (C. T. Prewitt, Ed.).
Champness, P. E. (1995). Analytical electron microscopy, in Microprobe Techniques in the Earth Sciences, edited by P. J. Potts, J. F. W. Bowles, S. J. B. Reed, and M. R. Cave. New York: Chapman & Hall, pp. 91–139.
Champness, P. E. (in preparation). Spinodal decomposition versus homogeneous nucleation in silicates.
Champness, P. E., and Devenish, R. W. (1992). Radiation damage in silicate minerals: implications for AEM. Proc. EUREM ’92, Granada, 2, 541–545.
Champness, P. E., and Lorimer, G. W. (1976). Exsolution in silicates, in Electron Microscopy in Mineralogy, edited by H. R. Wenk, et al. New York: Springer-Verlag, pp. 174–204.
Chisholm, J. E. (1973). Planar defects in fibrous amphiboles. J. Mater. Sci. 8, 475–483.
Christie, O. H. J. (1968). Spinodal precipitation in silicates. 1. Introductory application to exsolution in feldspars. Lithos 1, 187–192.
Christoffersen, P., and Schedl, A. (1980). Microstructure and thermal history of cryptoperthites in a dike from Big Bend, Texas. Am. Mineral. 65, 444–448.
Cliff, G., and Lorimer, G. W. (1975). The quantitative analysis of thin specimens. J. Microsc. 103, 203–207.
Cline, H. E. (1971). Shape instabilities of eutectic composites at elevated temperatures. Acta Metall. 19, 481–490.
Devenish, R. W., and Champness, P. E. (1993). The rate of mass loss in silicate minerals during X-ray analysis, in Proceedings of the Thirteenth International Congress on X-Ray Optics and Microanalysis, Manchester, 1992. London/Bristol: Institute of Physics, pp. 233–236.
Droop, G. T. R. (1994). Triple-chain pyriboles in Lewisian ultramafic rocks. Mineral. Mag. 58, 1–20.
Fleet, S. G., and Ribbe, P. H. (1963). An electron microscope investigation of a moonstone. Philos. Mag. 8, 1179–1187.
Fletcher, R. C., and McCallister, R. H. (1974). Spinodal decomposition as a possible mechanism in the exsolution of clinopyroxene. Carnegie Inst. Washington Yearb. 396–399.
Gittos, M. F., Lorimer, G. W., and Champness, P. E. (1974). An electron-microscopic study of precipitation (exsolution) in an amphibole (the hornblende–grunerite system). J. Mater. Sci. 9, 184–192.
Gittos, M. F., Lorimer, G. W., and Champness, P. E. (1976). The phase distributions in some exsolved amphiboles, in Electron Microscopy in Mineralogy, edited by H. R. Wenk, et al. Berlin: Springer-Verlag, pp. 238–247.
Green, H. W., II (1992). Petrology—high-temperature and deformation-induced reactions. Rev. Mineral. 27, 425–454 (P. R. Buseck, Ed.).
Hawthorn, F. C. (1981). Crystal chemistry of the amphiboles, in Amphiboles and Other Hydrous Pyriboles: Mineralogy, edited by D. R. Veblen. Washington, DC: Mineralogical Society of America, pp. 1–188.
Hobbs, L. W. (1984). Radiation effects in analysis by TEM, in Quantitative Electron Microscopy (Scottish Universities Summer School in Physics), edited by J. N. Chapman and A. J. Craven. Edinburgh: SUSSP Publications, pp. 399–445.
Jefferson, D. A., Mallinson, L. G., Hutchison, J. L., and Thomas, J. M. (1978). Multiple-chain and other unusual faults in amphiboles. Contrib. Mineral. Petrol. 66, 1–4.
Johannsen, A. (1911). Petrographic terms for field use. J. Geol. 19, 317–322.
Klein, C., and Hurlbut, C. S., Jr. (1993). Manual of Mineralogy. New York: John Wiley.
Kusatz, B., Kroll, H., and Kaiping, A. (1987). Mechanismus und Kinetik von Entmischungsvorgängen am Beispiel Ge-substituierter Alkalifeldspäte. Forsch. Mineral. 65, 203–248.
Laves, F. (1952). The phase relations of the alkali feldspars. II. J. Geol. 60, 549–574.
Leake, B. E. (1978). Nomenclature of amphiboles. Am. Mineral. 63, 1023–1052.
Lorimer, G. W., and Champness, P. E. (1973). The origin of the phase distribution in two perthitic alkali feldspars. Philos. Mag. 28, 1391–1403.
McConnell, J. D. C. (1969). Electron optical study of incipient exsolution and inversion phenomena in the system NaAlSi3O8–KAlSi3O8. Philos. Mag. 19, 221–229.
APPLICATIONS OF TEM IN MINERALOGY
89
Nakijima, Y., and Ribbe, P. H. (1980). Alteration of pyroxenes from Hokkaido, Japan, to amphibole, clays and other biopyriboles. Neus Jahrb. Mineral. Monatsh. 6, 258–268. Nord, G. L., Jr. (1982). Analytical electron microscopy in mineralogy: exsolved phases in pyroxenes. Ultramicroscopy 8, 109–120. Nord, G. L., Jr. (1992). Imaging transformation-induced microstructures. Rev. Mineral. 27, 455– 508 (P. R. Buseck, Ed.). Owen, D. C., and McConnell, J. D. C. (1971). Spinodal decomposition in an alkali feldspar. Nature Phys. Sci. 230, 118–119. Parsons, I. (1978). Feldspars and fluids in cooling plutons. Mineral. Mag. 42, 1–17. Parsons, I., and Brown, W. L. (1984). Feldspars and the thermal history of igneous rocks, in Feldspars and Feldspathoids, edited by W. L. Brown. Dordrecht, The Netherlands: Reidel, pp. 317–371. Peacor, D. R. (1992). Analytical electron microscopy: X-ray analysis. Rev. Mineral. 27, 113–140 (P. R. Buseck, Ed.). Putnis, A. (1992). Introduction to Mineral Sciences. Cambridge, UK: Cambridge University Press. Robin, Y.-P. F. (1974). Stress and strain in cryptoperthitic lamellae and the coherent solvus of alkali feldspars. Am. Mineral. 59, 1299–1318. Robinson, P., Jaffe, H. W., Ross, M., and Klein, C., Jr. (1971). Orientation of exsolution lamellae in clinopyroxenes and clinoamphiboles: consideration of optimal phase boundaries. Am. Mineral. 56, 909–939. Robinson, P., Ross, M., Nord, G. L., Jr., Smyth, J. R., and Jaffe, H. W. (1977). Exsolution lamellae in augite and pigeonite: fossil indicators of lattice parameters at high temperature and pressure. Am. Mineral. 62, 857–873. Robinson, P., Spear, F. S., Schumacher, J. C., Laird, J., Klein, C., Evans, B. W., and Doolan, B. L. (1982). Phase relations of metamorphic amphiboles: natural occurrence and theory. Rev. Mineral. 9B, 1–227 (D. R. Veblen and P. H. Ribbe, Eds.). Ross, M., Papike, J. J., and Wier Shaw, K. (1969). 
Exsolution textures in amphiboles as indicators of subsolidus thermal histories, in Pyroxenes and Amphiboles: Crystal Chemistry and Phase Petrology, edited by J. J. Papike. Mineralogical Society of America. Rubie, D. C., and Champness, P. E. (1987). The evolution of microstructure during the transformation of Mg2GeO4 olivine to spinel. Bull. Mineral. 110, 471–480. Smelik, E. A., Nyman, M. W., and Veblen, D. R. (1991). Pervasive exsolution within the calcic amphibole series: TEM evidence for a miscibility gap between actinolite and hornblende in natural samples. Am. Mineral. 76, 1184–1204. Smelik, E. A., and Veblen, D. R. (1989). A five-amphibole assemblage from blueschists in northern Vermont. Am. Mineral. 74, 960–964. Smelik, E. A., and Veblen, D. R. (1991). Exsolution of cummingtonite from glaucophane: a new orientation for exsolution lamellae in clinoamphiboles. Am. Mineral. 76, 971–984. Smelik, E. A., and Veblen, D. R. (1992). Exsolution of hornblende and the solubility limits of calcium in orthoamphibole. Science 257, 1669–1672. Smelik, E. A., and Veblen, D. R. (1993). A transmission and analytical electron microscope study of exsolution microstructures and mechanisms in the orthoamphiboles anthophyllite and gedrite. Am. Mineral. 78, 511–532. Smelik, E. A., and Veblen, D. R. (1994). Complex exsolution in glaucophane from Tillotson Park, north-central Vermont. Can. Mineral. 32, 233–255. Smith, J. V., and MacKenzie, W. S. (1955). The alkali feldspars. II. A simple X-ray technique for the study of alkali feldspars. Am. Mineral. 40, 733–747. Snow, E., and Yund, R. A. (1988). Origin of cryptoperthites in the Bishop Tuff and their bearing in its thermal history. J. Geophys. Res. 93, 8975–8984.
90
P. E. CHAMPNESS
Spear, F. S. (1980). The gedrite–anthophyllite solvus and the composition limits of orthoamphibole from the Post Ponds Volcanics, Vermont. Am. Mineral. 65, 1103–1118. Thompson, J. B., Jr. (1978). Biopyriboles and polysomatic series. Am. Mineral. 63, 239–249. Veblen, D. R. (1981). Non-classical pyriboles and polysomatic reactions in biopyriboles. Rev. Mineral. 9A, 189–236 (D. R. Veblen, Ed.). Veblen, D. R. (1992). Electron microscopy applied to nonstoichiometry, polysomatism and replacement reactions in minerals. Rev. Mineral. 27, 181–229. (P. R. Buseck, Ed.). Veblen, D. R., and Burnham, C. W. (1978a). New biopyriboles from Chester, Vermont. I. Descriptive mineralogy. Am. Mineral. 63, 1000–1009. Veblen, D. R., and Burnham, C. W. (1978b). New biopyriboles from Chester, Vermont. II. Crystal chemistry of jimthompsonite, clinojimthompsonte and chesterite, and the amphibole–mica reaction. Am. Mineral. 63, 1053–1073. Veblen, D. R., and Buseck, P. R. (1979). Chain-width order and disorder in biopyriboles. Am. Mineral. 64, 687–700. Veblen, D. R., and Buseck, P. R. (1980). Microstructures and reaction mechanisms in biopyriboles. Am. Mineral. 65, 599–623. Veblen, D. R., and Buseck, P. R. (1983). Radiation effects on minerals in the electron microscope. Proc. Annu. EMSA Meet. 41, 350–353. Viswanathan, K., and Ghose, S. (1965). The effect of Mg2+ substitution on the cell parameters of cummingtonite. Am. Mineral. 50, 1106–1112. Willaime, C., and Brown, W. L. (1974). A coherent elastic model for the determination of the orientation of exsolution boundaries: application to feldspars. Acta Crystallogr. A 30, 316–331. Willaime, C., and Gandais, M. (1972). Study of exsolution in alkali feldspars: calculation of elastic stresses inducing periodic twins. Phys. Status Solidi 9, 529–539. Worden, R. H., Walker, F. D. L., Parsons, I., and Brown, W. L. (1990). Development of microporosity, diffusion channels and deuteric coarsening in perthitic alkali feldspars. Contrib. Mineral. Petrol. 
104, 507–515. Yund, R. A. (1983). Diffusion in feldspars. Feldspar Mineralogy. Rev. Mineral. 2, 203–222. (P. H. Ribbe, Ed.). Yund, R. A., and Chapple, W. M. (1980). Thermal histories of two lava flows estimated from cryptoperthite lamellar spacings. Am. Mineral. 65, 438–443. Yund, R. A., and Davidson, P. (1978). Kinetics of lamellar coarsening in cryptoperthites. Am. Mineral. 63, 470–477. Yund, R. A., McLaren, A. C., and Hobbs, B. E. (1974). Coarsening kinetics of the exsolution microstructure in alkali feldspar. Contrib. Mineral. Petrol. 48, 45–55.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 121
Three-Dimensional Fabrication of Miniature Electron Optics
A. D. FEINERMAN AND D. A. CREWE
Microfabrication Applications Laboratory, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, Illinois 60607-7053
I. Introduction
II. Scaling Laws for Electrostatic Lenses
III. Fabrication of Miniature Electrostatic Lenses
    A. Review
    B. Stacking
        1. Description of Silicon Die Processing
        2. Pyrex Fiber Processing
        3. Stacked MSEM Assembly
        4. Stacked MSEM Electrostatic Deflector and Stigmator
    C. Slicing
        1. Slicing Processing
    D. LIGA Lathe
        1. LIGA Lathe Processing
        2. LIGA Lathe Dose Calculation
IV. Fabrication of Miniature Magnetostatic Lenses
V. Electron Source
    A. Spindt Source
    B. Silicon Source
VI. Detector
VII. Electron-Optical Calculations
    A. A Tilted MSEM
VIII. Performance of a Stacked Einzel Lens
    A. MSEM Construction
    B. MSEM Operation and Image Formation
IX. Summary and Future Prospects
References
Advances in Imaging and Electron Physics, Volume 121. ISBN 0-12-014763-7. Copyright © 2002 by Academic Press. All rights of reproduction in any form reserved. ISSN 1076-5670/02 $35.00

I. Introduction

The term miniature electron optics is used in this article to refer to electrostatic lenses that are smaller than 10 cm. The technology to reduce the size of the lens is being used to reduce the beam voltage and miniaturize the scanning electron microscope (SEM). There are several applications for a miniature SEM (MSEM). An MSEM can be brought to the sample instead of bringing the sample to a standard SEM. This would be convenient when access to the sample is limited, for example, when the researcher is inspecting the hull of a spacecraft or the inside of a fusion reactor, or when it is desirable to inspect objects in situ instead of bringing them to the analytical laboratory. In semiconductor processing there is a need for a low-voltage, high-resolution SEM that could observe integrated circuits in situ during each deposition and etching process. In biology the same instrument could observe specimens immediately after they were sliced with a microtome, minimizing sample degradation.

MSEMs can complement other analytical instruments such as the scanning tunneling microscope (STM) and the atomic-force microscope (AFM). When the STM and AFM are operated at atomic resolution, their field of view is limited to a few tens of nanometers, and the researcher can spend hours trying to determine whether the atoms under view are the atoms of interest. An MSEM viewing the same sample would allow the researcher to quickly locate the interesting areas. Miniaturization will also speed up the stereo observation of three-dimensional samples, which at present proceeds in three steps: observation, rotation, and observation. Two or more MSEMs mounted at 10° with respect to each other can acquire a stereo image directly. Three-dimensional samples of interest range from minerals whose pore size and permeability are evaluated in the petroleum industry (Huggett, 1990) to submicron lines on an integrated circuit.

The technology used to make one MSEM could make an array of MSEMs, which would be useful for electron beam lithography and wafer inspection. The present state-of-the-art dynamic random access memory (DRAM) technology is 256 Mbit with a minimum feature size of 0.4 μm (Adler et al., 1994). In general, every 3 years the area of a memory chip doubles and its smallest feature shrinks to 70% of its previous size, which quadruples (2/0.7² ≈ 4) the amount of information that can be stored on a chip (Sematech, 1994).
DRAMs are often developed with electron beam lithography and then manufactured with optical steppers (Larrabee and Chatterjee, 1991). The reason for switching technologies is throughput: an optical stepper can process an order of magnitude more wafers per hour. Optical steppers are faster because all the pixels are exposed in parallel, whereas an electron beam machine exposes pixels serially. An array of N beams would reduce the total writing time by a factor of N and could make electron beam lithography economically competitive. In semiconductor processing the minimum feature size will soon be less than 0.1 μm across an 8-in.-diameter wafer. Determining the most economical method of patterning wafers is an active area of research, with X-ray lithography (Fleming et al., 1992), deep ultraviolet (UV) steppers with phase shifting (Lin, 1991), and arrays of electron beam columns (Feinerman, Crewe, Perng, Shoaf et al., 1992a) or STMs (Marrian et al., 1992) under consideration. Regardless of the lithography method chosen, a method will be required to rapidly inspect large wafers with a resolution of one tenth of the minimum feature size, or 10 nm. This indicates that an inexpensive array of STMs, AFMs, or SEMs will be essential for the continued growth of this industry.

The inspection problem will not be trivial, however, and it may be simpler to fabricate an array of high-resolution SEMs with the methods discussed in this review than to process the data they will generate. For example, if we examine an 8-in. wafer consisting of 250 identical 1-cm² die with 250 parallel beams and 100 × 100-nm pixels, there will be 2.5 × 10¹² pixels/wafer and 10¹⁰ pixels/die. A foreign particle 10 nm or larger will change the backscattered or secondary electron signal of a pixel just enough that, by comparing the 250 channels simultaneously, the equipment can flag areas that might contain particles or defects; these areas must then be examined at higher magnification to resolve a 10-nm particle. If we assume that the data can be processed as fast as they arrive on the 250 channels and that each pixel takes 0.1 μs to examine, it would take at least 1000 s (10¹⁰ pixels per die × 0.1 μs, with each beam covering one die) to observe the entire wafer.

In the following sections, three techniques are described for miniaturizing electrostatic lenses that operate in different voltage regimes. The integration of an electron source, a deflector, and a detector into the electrostatic lens to make an MSEM, and a method to miniaturize a pancake magnetic lens, are also discussed.

II. Scaling Laws for Electrostatic Lenses

There are two common types of scaling: constant potential, in which all the lengths are reduced by a factor k, where k is less than 1, and constant electric field, in which both the lengths and the voltages are reduced by the factor k. The effect of scaling is shown in Table 1, where α is the maximum emission angle of an electron that travels down the electrostatic column.
TABLE 1
Effect of Scaling

Quantity | Constant potential | Constant field
Lengths | $k$ | $k$
Potentials | $1$ | $k$
Fields | $k^{-1}$ | $1$
Spherical aberration, $d_{cs} = 0.5\,C_s\,\alpha^3$ | $k$ | $k$
Chromatic aberration, $d_c = C_c\,\alpha\,(\Delta V/V)$ | $k$ | $1$
Interactions, $d_e \sim L/V^{1.5}$ | $k$ | $k^{-1/2}$
Stray magnetic field deflection | $k^{5/2}$ | $k^{3/2}$
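The fields, aberration, and interaction rows of Table 1 follow from the formulas in the first column once every length is multiplied by k and every potential by 1 (constant potential) or by k (constant field). The sketch below tracks only the exponent of k. It assumes, consistently with the table, that C_s, C_c, and L scale like lengths and that the emission angle α and the energy spread ΔV are unchanged by scaling; the function name is hypothetical.

```python
from fractions import Fraction


def scaling_exponents(potential_exp: int) -> dict:
    """Exponent n such that each quantity scales as k**n.

    potential_exp = 0 for constant-potential scaling, 1 for constant-field scaling.
    Assumption: C_s, C_c and the column length L scale like lengths (exponent 1),
    while alpha and the energy spread dV do not scale (exponent 0).
    """
    length = Fraction(1)
    v = Fraction(potential_exp)
    alpha = Fraction(0)
    dv = Fraction(0)
    return {
        "lengths": length,
        "potentials": v,
        "fields": v - length,                          # E ~ V / L
        "spherical": length + 3 * alpha,               # d_cs = 0.5 C_s alpha^3
        "chromatic": length + alpha + dv - v,          # d_c = C_c alpha (dV / V)
        "interactions": length - Fraction(3, 2) * v,   # d_e ~ L / V^1.5
    }


if __name__ == "__main__":
    for name, p in (("constant potential", 0), ("constant field", 1)):
        exps = scaling_exponents(p)
        print(name, {q: str(n) for q, n in exps.items()})
```

Running it reproduces the fields, spherical, chromatic, and interaction entries of Table 1 for both columns (for example, an interaction exponent of -1/2 under constant-field scaling); the stray-magnetic-field row is not modeled here.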
The scaling relations in Table 1 are from Chang et al. (1990). Constant potential scaling provides the largest improvement in resolution. In this case the electric field increases as 1/k until the maximum field that a given gap can sustain is reached. In our research we have held off 2.5 kV across 138-μm gaps, or 18 kV/mm, at 8 × 10⁻⁹ torr.

III. Fabrication of Miniature Electrostatic Lenses

A. Review

This section briefly reviews the SEM miniaturization methods developed by other research groups and the three methods developed at the University of Illinois at Chicago (UIC): stacking, slicing, and the LIGA lathe (LIGA is a German acronym for Lithographie, Galvanoformung, Abformung, that is, lithography, electroplating, and molding) (Feinerman, Crewe, and Crewe, 1994; Feinerman, Crewe, Perng, Shoaf et al., 1992a; Feinerman, Lajos et al., 1996). Miniaturizing the SEM involves miniaturizing each component: electron source, deflector, detector, and one or more lenses to focus the beam. The lenses and deflector can be electrostatic or magnetic, but electrostatic devices are more easily micromachined and do not dissipate power in vacuum (Trimmer and Gabriel, 1987). There are two main approaches to miniaturizing an electrostatic lens: either assemble layers and then make apertures (method 1) or make apertures in the individual components and then assemble the components (method 2). The drawback of the first approach is the limited flexibility to vary the aperture size along the column. As discussed in Section VII, einzel lenses perform better if the aperture of the second (focusing) electrode can be made larger than the first and third apertures. The drawback of the second approach is the aperture alignment error during assembly of the components. An example of the first miniaturization method is the proposed lithography wand that would be fabricated by thin-film deposition of several layers followed by reactive ion etching (RIE) of the apertures (Jones et al., 1989).
The maximum column length with this method is ∼10 μm; it is set by the thickness that can be reliably etched anisotropically to form the apertures and by the maximum thickness of the conductor and insulator thin films. A standard vacuum electrostatic design guideline is to restrict the maximum field between electrodes in a column to ∼10 kV/mm (Chang et al., 1990); at that field, a 10-μm-long column can accelerate and deflect only a 100-V beam (10 kV/mm × 10 μm), which could expose only a very thin resist layer. Another problem with a very short column is that, since the working distance is approximately half the column length, it would be difficult to mount two columns at 10° with respect to each other for stereo microscopy.

An example of the second type of miniaturization method has been developed at IBM. Layers with pre-etched apertures are optically aligned to assemble a 2- to 3-mm-long column with a scanning tunneling microscope tip as the electron source (Muray et al., 1991). The disadvantages of this approach are the elaborate column fabrication method, in which the operator uses a sophisticated optical inspection system to manually align and then epoxy individual layers, and the use of a large but well-characterized electron source. The maximum length of a column fabricated with this technique is ∼10 mm and is determined by the accuracy of the optical inspection system as it examines layers at different heights. In another example of the second method, an electrostatic lens is made from a perforated carbon film mounted on a transmission electron microscope (TEM) grid, which is placed over a second TEM grid containing several hundred 20-μm-diameter holes; an 8-μm-thick insulating sheet of polyimide separates the two electrodes (Shedd et al., 1993). This technique relies on the random alignment of one of the several thousand perforations in the carbon film with one of the 20-μm holes, so there is only a small but finite chance of creating a well-aligned column.

Our research program has developed three simple methods for manufacturing extremely accurate and inexpensive electron beam columns: stacking, slicing, and the LIGA lathe. All three methods can vary the aperture size and location along the optical column, the electrode thickness and spacing, and the position of the deflector within the column. Stacking and the miniaturization methods discussed previously approximate the SEM as a series of infinite planes with circular apertures separated by thin insulating layers (Feinerman, Crewe, Perng, Shoaf et al., 1992a). In the slicing method the electrodes are not apertures in planes but cylinders bonded to an insulating substrate, where the common cylinder axis defines the electron-optical axis (Feinerman et al., 1994).
The maximum length of the column is limited by the size of the substrate and can be 300 mm or longer. Several sliced columns can be fabricated in parallel. As is shown later, the LIGA lathe method is capable of fabricating electrodes with the widest variety of shapes (Feinerman, Lajos et al., 1996).
B. Stacking∗

In stacking, a (100) silicon wafer is anisotropically etched to create an array of die, as shown in Figures 1 and 2. On each die there is an aperture etched through a membrane and four v-grooves on the top and bottom surfaces of the die. Precision Pyrex fibers sit in the v-grooves on both surfaces of the die, aligning the die and bonding them together.

∗ Portions of this section are reprinted, with permission, from Journal of Vacuum Science and Technology A, 10(4), 611–616, July 1992. Copyright 1992 American Vacuum Society.

Figure 1. (a) Silicon die (D1–D4) are stacked with Pyrex fibers that align and bond to the dies’ v-grooves. The v-grooves are staggered and truncated to increase the die strength. The top and bottom surfaces of each die are optically aligned during fabrication. The first silicon die contains a micromachined field-emission electron source and a gate electrode to generate the emitting field. The next three silicon die form an einzel lens. The last die, D4, has an electron detector on the surface facing the sample. The MSEM is mounted on a Pyrex die to provide electrical insulation between the electron source and the vacuum chamber. (b) The stacked design approximates the SEM as a series of infinite planes with circular apertures separated by thin insulating layers. A design guideline is to make the membrane surrounding the aperture 10 times larger than the aperture diameter. (c) One of the die is diced into eight electrically insulated sections (V1–V8) to generate a transverse electric field in the center of the die to deflect the electron beam. This die can also correct for astigmatism. The Pyrex washer holds the die together. The die are rectangular instead of square to facilitate electrical contact to the stack. As indicated in Figure 1a, the contact region is to the right on D1, out of the page on D2, to the left on D3, and into the page on D4.

Figure 2. Silicon wafers are anisotropically etched to create four v-grooves on the top and bottom surfaces of each die (only three grooves are shown), and an aperture to allow the electron beam to pass through the die. One 4-in.-diameter wafer contains a hundred 7 × 9-mm die. Rectangular die are used to facilitate electrical contact to the column. Precision Pyrex fibers are diced to the proper length and placed in the v-grooves. The Pyrex fibers provide electrical insulation between the die, align the die in three directions, and are bonded to both die.

The structure can be designed to have the fibers rest either on the etched groove surface or on the groove's edges (Fig. 3). The relationship among the groove width W, the fiber diameter D, and the gap between silicon die is given by the following equations (Mentzer, 1990), where ϑ = cos⁻¹(√(2/3)) = 35.26° is the angle between the normal to the (100) surface and a (111) plane:

$$\text{Gap} = \frac{D}{\sin\vartheta} - \frac{W}{\tan\vartheta} \qquad \text{if } D \le \frac{W}{\cos\vartheta} \qquad (1)$$

$$\text{Gap} = \sqrt{D^2 - W^2} \qquad \text{if } D \ge \frac{W}{\cos\vartheta} \qquad (2)$$

If the v-grooves are allowed to etch to completion, their depth will be W/√2. We have found that for structural integrity the wafer thickness should be at least W, and that a large gap can be obtained by choosing D = W/cos ϑ, which makes Gap = W/√2. With these choices, the length of the column shown in Figure 1 is 5W + 3W/√2, or 7.1W. Adhering to a maximum electric field design guideline of 10 kV/mm, a 15-kV column would require 1.5-mm gaps and 2.1-mm-thick wafers, and it would be 15 mm long. A 1-kV column would require 0.1-mm gaps and 0.14-mm-thick wafers, and it would be 1 mm long. The stacked design can be scaled to a wide range of voltages since silicon wafers and Pyrex fibers of almost any dimension can be commercially manufactured, processed, and assembled.
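Equations (1) and (2), together with the design choice D = W/cos ϑ, are easy to check numerically. The sketch below uses hypothetical helper names and takes dimensions in micrometers for the gap functions and millimeters for the column design. It assumes, as the 15-kV and 1-kV examples do, that the full column voltage appears across one gap at the 10 kV/mm limit and that the wafer thickness equals W.

```python
import math

# Groove half-angle: angle between the (100) surface normal and a (111)
# sidewall, theta = arccos(sqrt(2/3)) ~ 35.26 degrees.
THETA = math.acos(math.sqrt(2.0 / 3.0))


def stack_gap(d: float, w: float) -> float:
    """Gap between two stacked die for fiber diameter d and groove width w.

    Eq. (1) applies when the fiber touches the (111) sidewalls (d <= w/cos theta);
    Eq. (2) applies when it rests on the groove edges (d >= w/cos theta).
    Both give w*tan(theta) = w/sqrt(2) at the threshold.
    """
    if d <= w / math.cos(THETA):
        return d / math.sin(THETA) - w / math.tan(THETA)  # Eq. (1)
    return math.sqrt(d * d - w * w)  # Eq. (2)


def contact_depth(d: float, w: float) -> float:
    """Depth of the fiber/sidewall contact below the wafer surface (Eq. (1) regime)."""
    center_height = stack_gap(d, w) / 2.0
    return (d / 2.0) * math.sin(THETA) - center_height


def design_for_voltage(v_kv: float, e_max_kv_per_mm: float = 10.0):
    """Gap, wafer thickness W, and column length (all in mm) for the Fig. 1 stack.

    Assumes the full voltage appears across one gap at the field limit and
    D = W/cos(theta), so Gap = W/sqrt(2) and length = 5W + 3W/sqrt(2).
    """
    gap = v_kv / e_max_kv_per_mm
    w = gap * math.sqrt(2.0)
    length = 5.0 * w + 3.0 * w / math.sqrt(2.0)
    return gap, w, length


if __name__ == "__main__":
    for d in (308.0, 450.0):  # fiber diameters from Fig. 3, groove width 270 um
        print(f"D = {d:.0f} um: gap = {stack_gap(d, 270.0):.1f} um")
    print(f"contact depth (308-um fiber): {contact_depth(308.0, 270.0):.1f} um")
    for v in (15.0, 1.0):
        gap, w, length = design_for_voltage(v)
        print(f"{v:.0f} kV: gap {gap:.2f} mm, wafer {w:.2f} mm, length {length:.1f} mm")
```

The computed gaps (about 151.6 and 360 μm) and contact depth (about 13 μm) agree with the Figure 3 values and with the 152- and 360-μm gaps reported for assembled stacks, and the 15-kV and 1-kV designs come out at about 15.1 and 1.0 mm long.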
Figure 3. The gap between silicon die is determined by the v-groove width (W) and the fiber diameter (D = 2R). The half angle of the v-groove is ϑ = 35.26°, and the depth is h = W/√2. (a) The center of a 308-μm fiber is positioned 76 μm above a 270-μm-wide v-groove. The fiber contacts the silicon within the v-groove, 13 μm below the silicon wafer surface. (b) The center of a 450-μm fiber is positioned 180 μm above a 270-μm-wide v-groove. The fiber contacts the silicon at the silicon wafer surface and rests on the groove's edges.
1. Description of Silicon Die Processing (Fig. 4)

Figure 4. The process sequence for the silicon die used in the MSEM. The starting point is a silicon wafer covered with a dielectric layer consisting of Si3N4 on SiO2. (1) Rectangular windows (381 × 5000 μm²) are opened in the bottom dielectric layer. (2) Rectangular and square windows (381 × 3400 μm² and 1500 × 1500 μm²) are opened in the top dielectric layer. (3) A 30- to 50-μm-deep circular aperture is etched into the silicon by using aluminum as the etch mask. (4) The aluminum is etched away and 2 μm of SiO2 is grown on the exposed silicon. The silicon below the Si3N4 is not oxidized. (5) The oxide protecting the silicon on top of the aperture is removed and the wafer is placed in an anisotropic etchant. The anisotropic etch is interrupted when the etch depth is about half the thickness of the wafer. (6) The oxide protecting the v-grooves is removed and the wafer is placed into the anisotropic etchant until the silicon above the aperture is removed. (7) The Si3N4 and SiO2 layers are removed and the wafer is cut into individual die.

A silicon wafer was cleaned and then oxidized in steam at 900°C to grow 40 nm of SiO2. A 200-nm Si3N4 layer was then deposited over the SiO2 in a low-pressure chemical vapor deposition (LPCVD) reactor. Both sides of the wafer were coated with photoresist, and rectangular and square windows were opened in the photoresist on the bottom of the wafer after the pattern was aligned to the wafer flat, which indicates the silicon ⟨110⟩ direction. The rectangular and square windows were processed to produce v-grooves and apertures, respectively. The Si3N4 was etched in a plasma etcher, and then the photoresist was removed. The pattern on the top surface of the wafer was aligned to the etched features on the bottom with an infrared aligner and then plasma etched. An aluminum film was deposited on the bottom of the wafer and circular holes were etched into this film. The patterned aluminum film served as a mask for a vertical plasma etch ∼30–60 μm into the silicon. The metal mask was then removed and a thick (∼2 μm) SiO2 layer was grown on any exposed silicon. The oxide protecting the silicon on top of the aperture (on the opposite side of the wafer) was removed and the wafer was placed in an anisotropic etchant (44% by weight KOH in H2O at 82°C). This solution etches silicon along ⟨100⟩ 400 times faster than along ⟨111⟩ (Petersen, 1982); it has a slight etch rate for SiO2 and a negligible etch rate for Si3N4 (Bean, 1978). The wafers were kept in this solution until the etch had progressed about halfway through the wafer. The SiO2 protecting the v-grooves was then removed and the wafer was returned to the anisotropic etchant until the silicon above the aperture was removed. Finally, the wafer was cut into individual 7 × 9-mm² die, and the Si3N4 and SiO2 layers were removed with a 10-min immersion in 50% HF acid followed by a 5-min deionized H2O rinse.
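The self-terminating v-grooves and the shrinking bottom of the aperture pit both follow from the slow etch of the (111) planes, which meet the (100) surface at 54.74° (the complement of the 35.26° half-angle used in Eqs. (1) and (2)). A minimal geometric sketch, assuming ideal (111) sidewalls and neglecting the roughly 1/400 lateral etch of the (111) planes; the function names and the 340-μm example depth are hypothetical illustrations, not process values from the text:

```python
import math

# Angle between a (111) sidewall and the (100) wafer surface: atan(sqrt(2)) ~ 54.74 deg.
SIDEWALL_ANGLE = math.atan(math.sqrt(2.0))


def vgroove_depth(window_width: float) -> float:
    """Depth at which a v-groove etched through a window of this width self-terminates."""
    return window_width / math.sqrt(2.0)


def pit_bottom_width(window_width: float, depth: float) -> float:
    """Width of the flat (100) bottom of a pit etched to `depth`.

    Each (111) sidewall advances inward by depth / tan(54.74 deg), so the bottom
    narrows by 2 * depth / sqrt(2) = sqrt(2) * depth. Lateral etching of the
    (111) planes (about 1/400 of the vertical rate in KOH) is neglected.
    Returns 0.0 once the pit has closed into a v-groove.
    """
    return max(window_width - 2.0 * depth / math.tan(SIDEWALL_ANGLE), 0.0)


if __name__ == "__main__":
    print(f"v-groove depth for W = 270 um: {vgroove_depth(270.0):.1f} um")
    # Hypothetical 340-um etch depth under the 1500-um square window of Fig. 4.
    print(f"pit bottom at 340-um depth: {pit_bottom_width(1500.0, 340.0):.0f} um")
```

For W = 270 μm the groove stops at about 190.9 μm, matching h = W/√2 in Figure 3; the square-window example shows how the top-side window size sets the width of the flat region left above the aperture.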
2. Pyrex Fiber Processing

Precision Pyrex fibers were drawn on a fiber-optic draw tower controlled by a laser micrometer. Duran and Pyrex were chosen because their thermal expansion coefficient of 3.2 × 10⁻⁶/°C closely matches that of silicon, 2.6 × 10⁻⁶/°C. Both glasses have nearly identical chemical compositions and are trademarks of Schott and Corning, respectively. The Pyrex fibers were waxed to a silicon wafer and cut to the desired length on a MicroAutomation 1006A dicing saw. The fibers were then solvent cleaned before being used in the MSEM assembly.

3. Stacked MSEM Assembly

The die were aligned and anodically bonded with 308-μm Pyrex fibers, as shown in Figures 5 and 6 (Feinerman, Crewe, Perng, Shoaf et al., 1992a). Pyrex can be bonded to silicon at 250°C with a bond strength of 350 psi (Wallis and Pomerantz, 1969). The bond is strong enough (1.0 ± 0.5 lb) to allow the die to be wire bonded. The glass deforms by up to 1.6 μm during anodic bonding to silicon (Carlson, 1974; Carlson et al., 1974). This deformation increases the fiber/silicon contact area, and the increase is larger if the contact point is below the silicon wafer surface (Feinerman, Shoaf et al., 1991; Fig. 3a). The bond strength between the fiber and the silicon increases with the contact area. Die have been stacked with 308- and 450-μm-diameter fibers in 270-μm-wide grooves, yielding 152- and 360-μm gaps between the silicon die, respectively. Attempts to bond 510-μm fibers into the 270-μm grooves have not been successful, possibly because of the small fiber/silicon contact area.
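How close the quoted expansion coefficients are can be put in numbers with a first-order estimate of the free thermal-mismatch strain, Δα·ΔT, that a Pyrex fiber and a silicon die accumulate on cooling from the 250°C bonding temperature. This is only a sketch: the 20°C final temperature and the 5-mm bonded length are assumptions chosen for illustration, the coefficients are treated as constant, and elastic accommodation at the bond is ignored.

```python
# Thermal expansion coefficients quoted in the text (per degree C).
ALPHA_PYREX = 3.2e-6
ALPHA_SILICON = 2.6e-6


def mismatch_strain(bond_temp_c: float, final_temp_c: float = 20.0) -> float:
    """Free differential strain (Pyrex minus silicon) after cooling from bonding.

    First-order estimate: constant coefficients, no elastic relaxation.
    The default final temperature of 20 degC is an assumption.
    """
    return (ALPHA_PYREX - ALPHA_SILICON) * (bond_temp_c - final_temp_c)


def differential_contraction_um(length_um: float, bond_temp_c: float,
                                final_temp_c: float = 20.0) -> float:
    """Unconstrained length mismatch, in micrometers, over `length_um` of bond."""
    return length_um * mismatch_strain(bond_temp_c, final_temp_c)


if __name__ == "__main__":
    print(f"mismatch strain after a 250 degC bond: {mismatch_strain(250.0):.2e}")
    # Illustrative (assumed) 5-mm bonded fiber length.
    print(f"over 5 mm: {differential_contraction_um(5000.0, 250.0):.2f} um")
```

Under these assumptions the mismatch strain is about 1.4 × 10⁻⁴, or roughly 0.7 μm over 5 mm of bonded fiber, before any elastic accommodation.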
Figure 5. Two silicon die are aligned and anodically bonded to a 308-μm-diameter Duran fiber. The die are aligned to within the accuracy of the optical micrograph (∼±2 μm). The separation between the die is 152 μm.
Figure 6. (a) Optical micrograph of three 381-μm-thick silicon die stacked with glass fibers. V-grooves on the right are to check infrared alignment, while the rest are for fibers. At present the overall structure is limited by a ±5-μm infrared alignment of the die's top surface to its bottom surface. (b) Three silicon die will form an einzel lens. The 0.16-in. vacuum pickup tool is visible in the micrograph, showing that the stack is self-supporting. The overhang of the die is rotated 90° between layers to facilitate electrical connections.
The accuracy of the stacking technique is limited by the precision of the glass fibers, the silicon die, and the v-groove etching. Optical fibers have a diameter tolerance of ±0.1%/km of fiber (Gowar, 1984), or ±0.3 μm/km for a 308-μm fiber. A kilometer of fiber would provide enough material for several thousand microscopes. The total indicated runout (TIR), defined as the maximum surface deviation, on a 7 × 9-mm² double-polished silicon die is much less than 1 μm. The etched (111) v-groove surfaces also have less than 1 μm of TIR (Feinerman, Shoaf et al., 1991). At present, the overall accuracy of the column is limited by the ±5-μm infrared alignment of etched features in the top and bottom surfaces of the silicon die. This accuracy can be improved by exposing the bottom surface of the wafer with X-rays through a metal mask on the top surface of the wafer. The stacking technique should achieve submicron accuracy.

4. Stacked MSEM Electrostatic Deflector and Stigmator

A compact MSEM requires a micromachined electrostatic or magnetostatic deflector and stigmator integrated in the column (Figs. 1 and 7). Electrostatic deflector/stigmators can be implemented by generating a transverse electric field within a single die (Fig. 1) or between two die (Figs. 7 and 8). The first design generates the field within a single die, which minimizes the column length. In the second approach a transverse field is generated between two successive die. This design has an advantage when one is building an array of MSEMs, because integrated circuit technology can be used to fabricate the multilevel interconnects that drive all the electrodes in parallel (Fig. 8d). If a single die generated the transverse electric field, wire bonding or a similar technique would be required to drive all the electrodes.

Figure 7. (a) Deflecting the electron beam inside a decelerating einzel lens increases the field of view and working distance at the expense of increased circuit complexity. (b) The beam is focused by a three-electrode einzel lens and then deflected. The deflector operates near ground potential.

Figure 8. (a) A silicon die at a single potential will have a uniform coating of metal on its top and bottom surfaces. (b) If a pair of silicon surfaces is used to deflect the electron beam and correct for astigmatism, one surface of each die will have eight independently controlled metal electrodes insulated from the silicon with a thick high-quality SiO2 layer. (c) Cross-sectional view of deflector indicating the transverse electric field between the pair of die. (d) The deflectors for an array of MSEMs can be operated in parallel with integrated circuit interconnection technology. The interrupted lines indicate where a second level of metallization is required to avoid shorts between potentials. The contacts at the edge of the array (V1–V8) have been repeated for visual clarity. Only eight contacts are needed to drive an N × N array of deflectors in parallel.

The beam deflection angle θ is given by

$$\tan\theta = \frac{L\,E_{tr}}{2V_b}$$

where L is the axial length of the deflector (the thickness of D3 in Fig. 1 or the gap between D3 and D4 in Fig. 7a), V_b is the beam energy (in electron volts) as the beam enters the deflector, and E_tr is the uniform transverse electric field created between the deflector plates. If the transverse field is 30 V/mm and the beam travels 220 μm through the gap between D3 and D4 in Figure 7a, a 3-milliradian (mrad) deflection angle (v_trans/v_beam) would produce a 5-μm beam deflection at a target 500 μm beyond D5. Deflecting the beam by more than the 4-mrad convergence angle would introduce higher-order aberrations.
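The deflection formula can be checked against the worked example. The text does not state V_b; solving tan θ = L E_tr/(2V_b) for the quoted 30 V/mm, 220 μm, and 3 mrad gives the beam energy those numbers imply, about 1.1 kV. The helper names below are hypothetical, and the formula assumes a uniform transverse field with no fringing.

```python
import math


def deflection_angle(e_tr: float, length: float, v_beam: float) -> float:
    """Deflection angle (rad) from tan(theta) = L * E_tr / (2 * V_b).

    e_tr in V/m, length in m, v_beam in volts (beam energy in eV).
    Assumes a uniform transverse field over the deflector length.
    """
    return math.atan(length * e_tr / (2.0 * v_beam))


def implied_beam_voltage(e_tr: float, length: float, theta: float) -> float:
    """Beam energy (eV) that yields deflection angle theta for the given field."""
    return length * e_tr / (2.0 * math.tan(theta))


def spot_displacement(theta: float, throw: float) -> float:
    """Lateral displacement (m) at a distance `throw` beyond the deflector,
    treating the deflection as a kink at the deflector."""
    return throw * math.tan(theta)


if __name__ == "__main__":
    e_tr = 30e3    # 30 V/mm, from the text
    gap = 220e-6   # 220 um between D3 and D4, from the text
    v_b = implied_beam_voltage(e_tr, gap, 3e-3)
    print(f"implied V_b = {v_b:.0f} V")
    print(f"check: theta = {deflection_angle(e_tr, gap, v_b) * 1e3:.2f} mrad")
```

In the same kink picture, a 5-μm displacement at 3 mrad corresponds to an effective lever arm of about 1.7 mm (5 μm / 3 mrad), measured from the deflector to the target.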
A. D. FEINERMAN AND D. A. CREWE
Figure 8. (Continued)
As shown in Figure 7, the minimum working distance for an MSEM is obtained when the deflector is inside the einzel lens. A practical problem with this choice is that the deflector's electronics operate at the einzel electrode potential rather than at ground potential.

C. Slicing∗

As discussed earlier, the electrodes fabricated in the slicing method are not apertures in planes but conducting cylinders bonded to an insulating substrate, where the common cylinder axis defines the electron-optical axis (Fig. 9)

∗ Portions of Section III.C are reprinted, with permission, from the Journal of Vacuum Science and Technology B, 12(6), 3182–3186, November 1994. Copyright 1994 American Vacuum Society.
Figure 9. Sliced MSEM. (a) A (100) silicon wafer with a patterned silicon nitride layer is anodically bonded to a Pyrex wafer and anisotropically etched. The nitride is removed with buffered hydrofluoric acid. A dicing saw separates the silicon into electrically isolated electrode sections. (b) Precision GE772 capillary tubes are anodically bonded into the v-grooves. The glass has a thermal expansion coefficient of 3.6 × 10⁻⁶/°C and contains 2% PbO. (c) The capillary tubes are separated into electrodes with a dicing saw and a micromachined field-emission source is added to the column. (d) A three-dimensional view of a sliced electrostatic column. Electrodes E1, E2, E3, and E4 are 1.5, 1, 1.5, and 1 mm long, respectively. A 1-mm gap separates electrodes E1, E2, and E3, and a 1.5-mm gap separates E3 and E4. Electrodes E2–E4 have a 300-μm inner diameter and a 500-μm outer diameter.
Figure 10. Sliced deflector. (a) A (100) silicon wafer with a patterned dielectric layer is anodically bonded to a Pyrex wafer and anisotropically etched. The dielectric is removed with buffered hydrofluoric acid. A second layer is “stacked” over the first layer. (b) A preform consisting of GE772 and Pyrex is drawn and anodically bonded into the v-grooves. The structure is then diced as in the previous figure, with the blade electrically isolating the tube sections. (c) A second double-layer substrate is bonded to the top of the composite fibers. Electrical contact can now be made to each section of GE772 glass.
(Feinerman, Crewe, and Crewe, 1994). The maximum length of the column is limited by the size of the substrate and can be 300 mm or longer. The electrode inner and outer diameters, length, and aperture size can all be varied in the design. The slicing method can also produce an integrated electrostatic deflector (Fig. 10).

1. Slicing Processing

A (100) silicon wafer with a patterned Si3N4 film is bonded to a Pyrex (Corning 7740) wafer and anisotropically etched (Fig. 9). The anisotropic etchant removes silicon faster in the ⟨100⟩ direction than in the ⟨111⟩ direction and has a negligible etch rate for Pyrex and the nitride film. This etch creates v-grooves in a silicon wafer whose normal is parallel to a ⟨100⟩ direction, and the depth of the v-groove is W/√2, where W is the opening in the nitride film.
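The W/√2 depth follows from the {111} sidewalls meeting the (100) surface at arctan √2 ≈ 54.7°, so the depth is (W/2) tan 54.7° = W/√2, and a groove etches through a wafer of thickness t when W/√2 > t. A minimal sketch with hypothetical helper names; the mask openings are example values, and the 381-μm thickness is borrowed from the stacked-column example in Section VII purely for illustration:

```python
import math

# {111} sidewalls meet the (100) surface at arctan(sqrt(2)), about 54.74 degrees.
SIDEWALL_ANGLE = math.atan(math.sqrt(2.0))


def v_groove_depth(opening_um):
    """Depth of a self-terminating v-groove under a mask opening W: W / sqrt(2)."""
    return 0.5 * opening_um * math.tan(SIDEWALL_ANGLE)


def etches_through(opening_um, wafer_thickness_um):
    """True when the groove would be deeper than the wafer, i.e. W > sqrt(2) * t."""
    return v_groove_depth(opening_um) > wafer_thickness_um


print(f"sidewall angle = {math.degrees(SIDEWALL_ANGLE):.2f} deg")
for w in (300.0, 500.0, 600.0):  # example openings in um (assumed)
    print(f"W = {w:3.0f} um: depth = {v_groove_depth(w):5.1f} um, "
          f"through 381-um wafer: {etches_through(w, 381.0)}")
```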
If the opening in the nitride is larger than √2 t, where t is the wafer thickness, the etch will terminate on the Pyrex. The nitride etch mask is designed to create v-grooves in silicon islands on the Pyrex wafer. A Corning 7720 glass capillary is anodically bonded to the silicon and a dicing saw is used to create the required gaps in the capillary. Anodic bonding is a technique in which glass is bonded to silicon at elevated temperatures by passing a current from the silicon into the glass (Wallis and Pomerantz, 1969). As discussed later in this section, the anodic bond is sufficiently strong that solid fibers and capillaries can be diced without any organic "potting" compound. In Section VII the resolution of the proposed sliced column is calculated when each electrode has 300- and 500-μm inner and outer diameters. The electrode aperture size can be varied along the column by bonding capillaries with different inside diameters into the v-grooves holding electrodes E2–E4 (Fig. 9d). After the structure is fabricated, the glass surfaces that will be exposed to the electron beam must be made sufficiently conductive to form an electrostatic column. Electrical contact to the conductive glass can be made by attaching leads to the silicon sections. A crucial question for the slicing method is the minimum conductive coating required for each electrode in the column. As is well known, insulators exposed to an electron beam will charge, and the resulting electrostatic fields will have a deleterious effect on the electron beam itself. A starting point for determining the minimum conductive layer is to assume that, after the first beam-limiting aperture, no more than 1% of the beam will strike any surface. The stray current striking the middle of the electrode's walls should not raise the potential of the wall by more than one tenth of a volt, which is the variation in beam voltage expected from a cold field-emitter source.
If the glass surface has a sheet resistance of R_sq Ω/square, the resistance of the electrode R_el is given by the following formula, where D_I and D_o are the inner and outer diameters and L is the axial length of the electrode:

$$R_{el} = \frac{R_{sq}}{4\pi}\left[\frac{L}{D_I} + \ln\frac{D_o}{D_I} + \frac{L}{2D_o}\right] \qquad (3)$$

The preceding formula assumes that the length of the v-groove is half that of the electrode and ignores current bunching where the electrode makes contact to the v-groove. If L, D_o, and D_I are 1.5, 0.5, and 0.3 mm, respectively, then R_el = 0.56 R_sq. With a 1-nA beam, the 1% stray current (10 pA) flowing through 0.56 R_sq must produce less than 0.1 V, so R_sq must be less than ∼1.8 × 10¹⁰ Ω/square to avoid a 0.1-V variation in the electrode's potential. This is an approximation, and the assumptions will have to be confirmed by experiment. We have not yet made the glass conductive, but we have three proposed solutions. Our first solution is to use a glass containing PbO wherever the glass electrode surface will be exposed to the electron beam. This PbO could be reduced in a hydrogen ambient with the process used to create microchannel
plates. A typical microchannel plate produced at Galileo Electro-Optics in Massachusetts is 400 μm thick with an active area that contains 3.4 × 10⁶ capillaries, 10 μm in diameter. The resistance of each capillary is approximately 13 R_sq. The minimum microchannel plate resistance reached is 10–100 kΩ when Corning 8161 (which contains 51% PbO) or Galileo MCP-10 glass is used (Feller, 1990; Laprade, 1989). Because the capillaries conduct in parallel, this translates into a conductive layer of 3 × 10⁹ to 3 × 10¹⁰ Ω/square, which is sufficient for the low beam currents used in imaging. A second solution would be to metallize the glass by chemical vapor deposition. For example, a thin coating of polycrystalline silicon deposited on glass could be exposed to tungsten hexafluoride to form a tungsten film with a sheet resistance of 2–100 Ω/square (Busta et al., 1985). An alternative procedure is to electroplate a thin layer of gold onto the reduced glass surface on the short capillary sections. The deflector shown in Figure 10 will require that complex glass cross sections be drawn (Jansen and Ulrich, 1991) and selectively made conductive. Any metallic coating will lower the sheet resistance of the glass to a point at which it can be used in an electron-optical column. The overall accuracy of a sliced column depends on the accuracy with which a v-groove can be etched and a capillary can be drawn and diced. The total indicated runout, or maximum surface deviation, on an etched v-groove (111) surface is less than 1 μm (Feinerman, Shoaf et al., 1991). Fibers can be purchased that are drawn with laser micrometer control and have a 1-μm or better tolerance on their diameter. The largest error is in the control of the length of each capillary section, which is ∼3 μm. Most commercial electron-optical columns have a dimensional tolerance of ∼0.1%. The errors in a sliced column are on the order of 0.3% (100 × 3 μm/1000 μm) and are expected to degrade the performance of an electron-optical column slightly.
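The sheet-resistance budget above can be reproduced with a few lines of arithmetic. The sketch evaluates Eq. (3), the R_sq limit for a 0.1-V shift from 1% of a 1-nA beam, and the R_sq implied by a microchannel plate whose 3.4 × 10⁶ capillaries conduct in parallel. The per-capillary factor L/(πD) for a uniform film lining the bore is an assumption used here to recover the "approximately 13 R_sq" figure; the function names are hypothetical.

```python
import math


def electrode_resistance_factor(length, d_inner, d_outer):
    """R_el / R_sq from Eq. (3): (1/4pi) * [L/D_I + ln(D_o/D_I) + L/(2 D_o)]."""
    return (length / d_inner + math.log(d_outer / d_inner)
            + length / (2.0 * d_outer)) / (4.0 * math.pi)


def max_sheet_resistance(beam_current, stray_fraction, max_shift_v, factor):
    """Largest R_sq (ohm/square) for which I_stray * R_el stays below max_shift_v."""
    return max_shift_v / (stray_fraction * beam_current * factor)


def capillary_factor(length, diameter):
    """R / R_sq for a uniform film lining a capillary bore: L / (pi * D) (assumed)."""
    return length / (math.pi * diameter)


def mcp_sheet_resistance(plate_resistance, n_capillaries, factor):
    """R_sq implied by n identical capillaries conducting in parallel."""
    return plate_resistance * n_capillaries / factor


# Electrode from the text: L = 1.5 mm, D_o = 0.5 mm, D_I = 0.3 mm.
k = electrode_resistance_factor(1.5, 0.3, 0.5)
print(f"R_el = {k:.2f} R_sq")
# 1% of a 1-nA beam may shift the electrode by at most 0.1 V.
print(f"R_sq limit = {max_sheet_resistance(1e-9, 0.01, 0.1, k):.1e} ohm/square")

# Microchannel plate: 400-um-long, 10-um-diameter capillaries, 3.4e6 of them.
kc = capillary_factor(400e-6, 10e-6)
print(f"capillary R = {kc:.1f} R_sq")
for r_plate in (10e3, 100e3):
    r_sq = mcp_sheet_resistance(r_plate, 3.4e6, kc)
    print(f"{r_plate / 1e3:.0f}-kOhm plate -> R_sq = {r_sq:.1e} ohm/square")
```

Comparing the two results shows why reduced lead glass is a candidate: the MCP-derived range of roughly 3 × 10⁹ to 3 × 10¹⁰ Ω/square overlaps the ∼1.8 × 10¹⁰ Ω/square limit.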
The impact each error will have on the column's resolution will have to be measured directly.

D. LIGA Lathe∗

As shown in Figure 11, the LIGA lathe is capable of patterning the widest variety of electrode shapes on a micron scale, including shapes impossible to achieve with a conventional lathe (Feinerman, Lajos et al., 1996). The electrode spacing and aperture size within an electron-optical column can also be varied. The maximum length of the column is limited by the size of the X-ray exposure at a synchrotron, which is 100 mm at Argonne's Advanced Photon Source (APS). However, successive exposures can be stitched together.

∗ Portions of Section III.D are reprinted, with permission, from the IEEE Journal of Microelectromechanical Systems 5(4), 250–255, December 1996.
Figure 11. Three-dimensional views of electrostatic columns that can be produced on a LIGA lathe. The technique can create the widest variety of electrode shapes and can vary the aperture diameter along the length of the column. The technique used to create these structures requires that a cylindrical layer of X-ray resist be exposed and developed. After resist development, metal can be electroplated into the regions where the resist was removed or a conformal metal coating can be deposited around the structure.
1. LIGA Lathe Processing

In the standard LIGA process (LIGA is a German acronym for Lithographie, Galvanoformung, Abformung: lithography, electroplating, and molding), a planar substrate is covered with an X-ray-sensitive resist and exposed with a collimated X-ray source (Guckel et al., 1990). A typical X-ray resist is poly(methyl methacrylate), or PMMA (also known as Lucite). The exposed resist is removed in a developer (positive resist); this process is the analog of a binary mill operating on a micron scale, capable of creating two-dimensional structures that are as thick as the PMMA. Metal is electroplated into the voids formed in the exposed and developed resist. The modifications developed in our laboratory extend LIGA into a variety of three-dimensional structures. A cylindrical core coated with an X-ray-sensitive resist is schematically illustrated in Figure 12. Nylon filament 460 μm in diameter has been coated with PMMA, as has 125-μm-diameter gold-plated copper wire. The PMMA is built up to the desired thickness with multiple layers. This core is mounted under slight tension between the headstock and tailstock of a custom-built glassblower's lathe shown in Figure 13. The two chucks on the lathe rotate synchronously to avoid twisting the core during exposure. The lathe rotates at 1 rpm during 30-min and longer exposures.
Figure 12. A blank substrate ready for use on the X-ray lathe. An X-ray-sensitive resist surrounds an opaque core. A solid rod of X-ray-sensitive resist could also be used as substrate.
A two-level (binary) surface possessing cylindrical symmetry was fabricated by exposing the substrate with a mask consisting of opaque bars (Fig. 14). Micrographs taken after PMMA development are shown in Figure 15. The current cylindrical resist layers are not as uniform as planar resist coatings. If the coating technology cannot be significantly improved, a uniform layer could be achieved by exposing the resist through a mask that absorbs all X-rays below the desired radius and removing the excess resist in a developer. The starting material can also be solid PMMA rod. A cylindrically symmetric pattern with a variable radius was fabricated as shown in Figures 16 and 17. The radial penetration of the X-rays is determined by the shape of the X-ray absorber. If the mask extends beyond the outer radius of the resist, no resist is exposed. Conversely, if the mask does not block the exposure all the resist will be exposed. The X-ray mask becomes the analog of the cutting tool of a conventional lathe.
Figure 13. LIGA lathe prototype. Both ends of the substrate shown in Figure 12 rotate at the same rate. Antibacklash gears are used to prevent the substrate from twisting during the exposure.
Figure 14. X-ray mask used to create a two-level cylindrically symmetric surface. The X-ray resist exposed below the transparent regions of the X-ray mask is subsequently removed in the developer.
There are other possible modifications of LIGA technology in which the X-ray exposure is modulated in time. As indicated in Figure 18, octupoles for an electrostatic deflector/stigmator can be created if the substrate is exposed through an aperture and the exposure is chopped synchronously with the rotation. The shutter motion in this case would have to be much faster than the time needed to make one complete rotation, which is 1 min with our current fixture. Solid rods of PMMA were machined and exposed at Argonne's Advanced Photon Source (APS) with the mask design in Figure 19. The rods are shown in Figure 20 after the exposed PMMA has been removed in a developer. The more energetic X-rays available at the APS allow the micromachining of macroscopic electrostatic lenses.

2. LIGA Lathe Dose Calculation

The binary exposure doses (Fig. 14) are compared with that of a planar slab with the same resist thickness. The exposure time calculation for the binary radius cylinder structure assumes an opaque core with radius R_i covered with resist to a radius R_o. The variables are defined in Figure 21a. This structure rotates with an angular speed ω while illuminated with collimated X-rays. The X-ray path length h at a particular radius r and angle ϑ is given by the following formulas:

$$\beta = \sin^{-1}\left(\frac{r\sin\vartheta}{R_o}\right) \qquad (4)$$

$$h = R_o\cos\beta - r\cos\vartheta \qquad (5)$$
Figure 15. (a) An ∼55-μm-thick PMMA coating on a 125-μm-diameter Au-plated Cu wire. The PMMA cross-section thickness is not uniform with thick coatings. (b) An ∼15-μm-thick PMMA coating on a 125-μm Au-plated Cu wire.
Figure 16. X-ray mask used to create a variable-level cylindrically symmetric surface. The separation of the transparent region of the mask and the rotating substrate axis determines the final radius of the resist. The exposure time can be reduced by 50% with a mask that exposes both sides of the substrate simultaneously.
Figure 17. Micrograph of a variable PMMA surface. A 460-μm nylon fiber was coated with ∼125 μm of PMMA. The substrate was intentionally overexposed and the nylon was damaged by the X-rays.
Figure 18. (a) Quadrupoles and other nonazimuthally symmetric shapes can be achieved with the LIGA lathe. (b) Cross-sectional view of a quadrupole exposure. The shutter position determines if X-rays will be transmitted through an aperture in an X-ray mask. The hatched areas represent the resist not exposed. The shutter motion has to be significantly faster than the time to make one revolution, which is ∼1 min.
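The exposure at a particular radius takes place between ±ϑ_c, where

$$\vartheta_c = \frac{\pi}{2} + \cos^{-1}\left(\frac{R_i}{r}\right) \qquad (6)$$

The exposure in one revolution of the substrate is given by

$$E_{\text{binary}}(r) = I_{\text{inc}} \times \frac{2}{\omega}\int_0^{\vartheta_c} |\cos\vartheta|\,\exp\left(-\frac{h}{\xi}\right) d\vartheta \qquad (7)$$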
where Iinc is the incident power/area, and ξ is the X-ray absorption length at a particular wavelength. If the core is not opaque and has the same ξ as the resist,
Figure 19. (a) Solid rods of PMMA (Lucite) were machined to make a two-conductor corrugated filter. A two-conductor design simplifies radiofrequency (RF) testing. (b) Brass mask 0.8 mm thick used to pattern 0.06-nm X-rays. These X-rays are blocked by 50 μm of Au, 250 μm of Cu, or 7.4 mm of Si.
the integral extends between 0 and π. The |cos ϑ| factor in the integral takes into account the angle between the resist and the radiation. This calculation neglects refraction because the index of refraction for PMMA differs from 1 by less than 10⁻⁴ across the range of X-rays used in the LIGA lathe exposures (Cerrina et al., 1993). In one revolution of the substrate, the bottom of a planar resist layer t μm thick would receive the following exposure:

$$E_{\text{planar}}(t) = I_{\text{inc}} \times \frac{2\pi}{\omega} \times \exp\left(-\frac{t}{\xi}\right) \qquad (8)$$

The ratio of the cylinder to planar exposure times for a comparable resist thickness varies slightly with the exact values of the X-ray absorption length, R_o, and R_i. The longer exposure time is a consequence of the cylindrical geometry (longer path length and an opaque core blocking the X-rays); however, the cylindrical case has more PMMA per unit mask opening, by a factor of 0.5π(R_o + R_i)/R_o, or 2.4 with a 125-μm-diameter core covered with 50 μm of resist. As shown in Table 2, the exposure ratios range from 3 to 4 for the values of R_i, resist thickness, and wavelength used at the CXrL (Center for X-ray Lithography, Stoughton, WI, USA). The calculations in Table 2 assume an X-ray absorption length of 100 μm, which corresponds to 0.36-nm X-rays. At the CXrL facility with a 1-GeV beam and a 25-μm beryllium window, the X-rays range from 0.25 to 0.5 nm, with absorption lengths ranging from 310 to 40 μm (Cerrina et al., 1993). A planar PMMA sheet 50 μm thick is exposed in 6 min with a
Figure 20. (a) Top and (b) side views of a 3-mm outside diameter PMMA rod that has been exposed and developed to create 0.15-mm-wide slots that are ∼0.9 mm deep.
Figure 21. (a) The X-ray dose at radius r with the mask shown in Figure 14 depends on the X-ray path length h(ϑ) and the radius of the opaque core. The exposure takes place from −ϑ_c to +ϑ_c. The hatched area represents the resist, which is not exposed, since the core blocks the X-rays. (b) The X-ray dose at radius r with the mask shown in Figure 16 depends on the X-ray path length h(ϑ) and the amount of the incident radiation the mask intercepts. The exposure takes place between ϑ_mi and ϑ_mf on both sides of the substrate. The hatched area represents the resist, which is not exposed, since the mask blocks the X-rays.
storage ring current of 150 mA, and 21 min were required to expose the 50-μm-thick layer of PMMA surrounding a 125-μm-diameter core. In a planar resist geometry the ratio of the exposure on the top and bottom surfaces is exp(t/ξ). In a cylindrical geometry this ratio increases by a factor of ∼1.2 because the core shadows the inner surface more than the outer surface during each revolution. Consequently, planar resist can be slightly thicker than cylindrical resist for any given X-ray energy.
TABLE 2 Exposure Geometry

Core diameter (μm)   Resist thickness (μm)   Cylinder/planar exposure ratio
125                  5                       3.20
125                  10                      3.23
125                  50                      3.37
125                  124                     3.45
460                  5                       3.21
460                  10                      3.26
460                  50                      3.54
460                  124                     3.81
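Equations (4) through (8) can be integrated numerically to check Table 2. The sketch below assumes that the tabulated ratio is the planar dose of Eq. (8) divided by the cylindrical dose of Eq. (7) evaluated at the core surface r = R_i (the least-exposed resist), with ξ = 100 μm; that reading of the table is an assumption, not something the text spells out. Under it, hand checks of the 125/5, 125/124, and 460/124 entries come out near 3.20, 3.45, and 3.81. Simpson's rule is used only to keep the sketch free of dependencies, and all function names are hypothetical.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule over [a, b] with an even number n of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def path_length(r, theta, r_out):
    """Eqs. (4)-(5): X-ray path h through the resist to the point (r, theta)."""
    beta = math.asin(r * math.sin(theta) / r_out)
    return r_out * math.cos(beta) - r * math.cos(theta)


def exposure_binary(r, r_core, r_out, xi):
    """Eqs. (6)-(7) for an opaque core, in units of I_inc / omega."""
    theta_c = math.pi / 2 + math.acos(min(1.0, r_core / r))

    def integrand(theta):
        return abs(math.cos(theta)) * math.exp(-path_length(r, theta, r_out) / xi)

    return 2.0 * simpson(integrand, 0.0, theta_c)


def exposure_planar(t, xi):
    """Eq. (8) at the bottom of a planar layer of thickness t, units of I_inc / omega."""
    return 2.0 * math.pi * math.exp(-t / xi)


def exposure_ratio(core_diameter, thickness, xi=100.0):
    """Assumed Table 2 definition: planar dose / cylinder dose at the core surface."""
    r_core = core_diameter / 2.0
    cyl = exposure_binary(r_core, r_core, r_core + thickness, xi)
    return exposure_planar(thickness, xi) / cyl


for d in (125.0, 460.0):
    for t in (5.0, 10.0, 50.0, 124.0):
        print(f"core {d:3.0f} um, resist {t:3.0f} um: ratio {exposure_ratio(d, t):.2f}")

# Extra PMMA per unit mask opening, 0.5*pi*(Ro + Ri)/Ro, for a 125-um core + 50 um resist.
r_i, r_o = 62.5, 112.5
print(f"PMMA factor: {0.5 * math.pi * (r_o + r_i) / r_o:.2f}")
```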
The exposure time for the variable radius cylinder shown in Figure 16 is calculated next. In this structure, incident radiation is blocked by the X-ray mask at radii less than R_m from the substrate's axis of rotation, as shown in Figure 21b. The exposure at a radius greater than R_m takes place on either side of the core between ϑ_mi and ϑ_mf, where

$$\vartheta_{mi} = \frac{\pi}{2} - \cos^{-1}\left(\frac{R_m}{r}\right) \qquad (9)$$

$$\vartheta_{mf} = \frac{\pi}{2} + \cos^{-1}\left(\frac{R_m}{r}\right) \qquad (10)$$

The exposure in one revolution is then given by

$$E_{\text{variable}}(r) = I_{\text{inc}} \times \frac{2}{\omega}\int_{\vartheta_{mi}}^{\vartheta_{mf}} |\cos\vartheta|\,\exp\left(-\frac{h}{\xi}\right) d\vartheta \qquad (11)$$
There is no exposure at radius R_m since ϑ_mi = ϑ_mf. The radius at which the resist is sufficiently exposed (R_c) is determined by the actual exposure time. If the ratio of the exposure at the outer and inner surfaces were 5, then with a 125-μm-diameter core covered with 50 μm of resist and R_m = R_i, R_c = 1.16 R_i. The mask must therefore be undersized to achieve the desired radius.

IV. Fabrication of Miniature Magnetostatic Lenses

Researchers have developed impressive techniques to fabricate electromagnetic components on silicon wafers (Ahn and Allen, 1993). Such techniques include winding conductors around electroplated magnetic material or winding magnetic material around conductors (Ahn and Allen, 1994). These techniques are compatible with the stacked MSEM fabrication approach described in
Section III.B and could be used to create a magnetic pancake lens (Mulvey, 1982). As already discussed, magnetostatic lenses have the disadvantage, relative to electrostatic lenses, of dissipating power in vacuum, but they have the benefit of lower aberration coefficients. It is highly unlikely, however, that magnetic pancake lenses can be micromachined to the same tolerances as electrostatic lenses.
V. Electron Source

A micromachined field-emission electron source is an essential component of an MSEM. As discussed in Section VIII, initial images in our laboratory were obtained with a commercial thermal field-emission electron source that was an order of magnitude larger than the micromachined einzel lens. A macroscopic source is also very difficult to align to the lens. The majority of the research on micromachined electron sources concerns stable current versus voltage characteristics. A good review of existing work on micromachined sources can be found in an earlier volume in this series (Brodie and Spindt, 1992). An ideal source for an MSEM will produce at least 1 nA of electrons, which will travel down the electron-optical axis. The electrons will appear to originate from a small source on the order of 1 nm in diameter and will have an energy spread no larger than 0.1 eV. The current from field-emission sources depends exponentially on the tip work function φ and on the proportionality factor β between the applied voltage and the electric field at the tip's surface. Common sources of variation in these parameters for microfabricated field emitters are carbonaceous contamination and oxidation of the emitter surface (Somorjai, 1981) and field-enhancing protuberances on the emitter surface. The micromachined electron sources discussed in this section can be easily incorporated into a stacked MSEM and are rugged enough to withstand processes that can remove these deposits from the emitting surface.
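To illustrate how strongly the emission depends on φ and β, the sketch below uses the elementary Fowler–Nordheim equation, a textbook form without image-charge corrections that is not taken from this chapter. The field factor β = 3 × 10⁷ m⁻¹ and the 10⁻¹⁵ m² emitting area are hypothetical values chosen only for illustration. The last lines check that ln(I/V²) plotted against 1/V, the form of the FN data in Figure 24, is a straight line with slope −Bφ^{3/2}/β, which is why a change in φ or β shows up as a change in slope.

```python
import math

# Elementary Fowler-Nordheim constants (no image-charge correction).
A_FN = 1.541434e-6   # A eV V^-2
B_FN = 6.830890e9    # eV^-3/2 V m^-1


def fn_current(voltage, phi_ev, beta_per_m, area_m2):
    """I = area * (A F^2 / phi) * exp(-B phi^1.5 / F), with local field F = beta * V."""
    field = beta_per_m * voltage
    j = (A_FN * field ** 2 / phi_ev) * math.exp(-B_FN * phi_ev ** 1.5 / field)
    return j * area_m2


# Hypothetical tip: field factor beta = 3e7 m^-1 and an emitting area of 1e-15 m^2.
beta, area = 3.0e7, 1.0e-15
for phi in (4.5, 4.0, 3.5):
    print(f"phi = {phi} eV: I(150 V) = {fn_current(150.0, phi, beta, area):.2e} A")

# FN plot: ln(I / V^2) versus 1/V is a straight line with slope -B phi^1.5 / beta.
phi = 4.5
v1, v2 = 120.0, 180.0
y1 = math.log(fn_current(v1, phi, beta, area) / v1 ** 2)
y2 = math.log(fn_current(v2, phi, beta, area) / v2 ** 2)
slope = (y2 - y1) / (1.0 / v2 - 1.0 / v1)
print(f"FN slope {slope:.1f} V, predicted {-B_FN * phi ** 1.5 / beta:.1f} V")
```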
A. Spindt Source

The fabrication of an array of Spindt sources is shown schematically in Figure 22 (Spindt, 1968; Spindt et al., 1991). A 0.5-μm-thick thermal silicon dioxide (SiO2) layer is grown on a high-conductivity silicon wafer (∼0.01 Ω·cm). The SiO2 is then coated with a 0.25-μm-thick film of molybdenum. Submicron holes ∼0.8 μm in diameter are etched in the Mo film and electropolished to smooth off any deviations from a circular opening. The Mo film serves as an etch mask when holes are isotropically etched in the SiO2 layer with buffered hydrofluoric acid. A second mask opens rectangular patterns in the Mo and SiO2 films, and the wafer is etched in KOH to form
Figure 22. Schematic of a thin-film field-emission array fabricated by using anisotropic etching. The base is single-crystal silicon and the field-emitter cathodes and gate film are vapor-deposited molybdenum. The insulating layer is thermally grown SiO2. (a) Schematic of a Spindt cathode array. (b) Scanning electron micrograph of Spindt cathode array. (c) Scanning electron micrograph of Spindt cathode.
v-grooves for aligning the source to the electron-optical axis. A third mask patterns the Mo film into a gate electrode. The wafer is then cut into individual die. Aluminum oxide (Al2O3) is deposited on a die at 60° with respect to the surface normal. This shallow-angle deposition is used so that the Al2O3 does not reach the bottom of the submicron hole. A second Mo layer is deposited at normal incidence, forming a cone in the hole. An immersion in KOH removes the Al2O3 and the second Mo layer everywhere except where it formed a cone in the hole. Micrographs of an array of Mo cones are shown in Figures 22b and 22c. The maximum temperature of this source is approximately 550 °C and is limited by reaction between the molybdenum and silicon. The effects of pure H2 and 9/1 H2/Ne plasma glow discharges on the I–V characteristics and emission uniformity of a single emitter tip have been investigated (Schwoebel and Spindt, 1993). The discharges were operated with a current-regulated direct current supply. During glow discharge processing the emitter tip and gate were electrically connected and served as the cathode in the discharge; thus, the emitter was bombarded only with positive ions. The anode of the glow discharge was ∼1 cm from the cathode. At this distance and with a pressure of ∼1 torr, operating voltages of 275–450 V were required to sustain a glow discharge in the gases employed. Total ion current densities at the emitter array were on the order of 10¹⁶ ions cm⁻² s⁻¹. The gases used were of Matheson research-grade purity and were admitted to the system from 1-liter glass flasks. Following the glow discharge treatment, the system was evacuated to ultrahigh vacuum (UHV) conditions prior to array operation. An in situ hydrogen plasma treatment to doses of 10¹⁸ to 10¹⁹ ions/cm² has been shown to reduce the work function of a single tip by 0.5 to 1.5 eV and to increase its emission uniformity.
The emission patterns before and after the H2 plasma are shown in Figures 23a and 23b, respectively. The hydrogen cleaning allowed the tips to be immediately operated at 5 μA without the usual 50- to 100-h seasoning time. The hydrogen cleaning was followed by a H2/Ne plasma treatment, which further improved the emission uniformity (Fig. 23c). The two plasma treatments lower the voltage required to achieve a given emission current compared with that of samples that have not been plasma cleaned, as shown in Figure 24.
B. Silicon Source

Conventional field-emission sources for electron microscopes can either "flash" a cold tip (heat it briefly to 1600 K) to clean and reform its surface or operate a zirconium/tungsten tip at 1800 K for thermally assisted field emission. Our laboratory is investigating a micromachined field-emission source on a single-crystal silicon microbridge that can be flashed or operated continuously
Figure 23. (a) Field electron micrograph of a single tip prior to hydrogen plasma treatment (V = 175 V, I = 1 μA). (b) Field electron micrograph of the single tip following a H2 plasma treatment (V = 133 V, I = 1 μA, total dose ∼10¹⁸ ions/cm²). (c) Field electron micrograph of the single tip after 9/1 H2/Ne plasma treatment (V = 130 V, I = 1 μA, total dose ∼7 × 10¹⁸ ions/cm²).
at 1000–1200 K. The thermally assisted sources are not as bright as cold field-emission sources but have more stable emission characteristics. The energy spread of these tips is approximately twice that of a cold field-emission tip. Even operating a cold field-emission tip at 500 K improves its stability. In the past, micromachined sources were developed for high-density displays or for vacuum integrated circuits, and designers could not afford the space or heat
Figure 24. Fowler–Nordheim (FN) data showing the effect of pure hydrogen and hydrogen + 10% neon plasma treatment on a microfabricated single-tip field emitter. Line A: FN data prior to hydrogen plasma treatment. Line B: FN data after hydrogen plasma treatment (total dose ∼10¹⁸ ions/cm²). Line C: FN data after hydrogen + 10% neon plasma treatment (total dose ∼1 × 10¹⁸ ions/cm²). Line D: FN data following an additional hydrogen + 10% neon plasma treatment (total dose ∼7 × 10¹⁸ ions/cm²).
dissipation required with this approach. However, as shown in Figure 25, a stacked MSEM has the space to accommodate a miniature heater. In addition, a heat source would be valuable because silicon micromachined tips (Ravi and Marcus, 1991) are often contaminated with a thin SiO2 layer, and there are data showing that heating them to 1000 K at 1.5 × 10⁻⁸ torr desorbs this layer (Yamazaki et al., 1992). A 15-μm-high silicon tip without a gate has been fabricated in the center of a micromachined microbridge 5 mm long with a cross-sectional area of 1.1 × 10⁻² mm². The center of the bridge has been heated continuously
Figure 25. Fabrication of a field emitter on bridge center. (a) Anisotropic etching of silicon to form an ∼100-μm-thick membrane. (b) Isotropic etch of silicon using a 30-μm-diameter SiO2 circle island to form a 15-μm-high silicon tip on the membrane center. (c) Anisotropic etching of the membrane to form the bridge and v-grooves. (d) Oxidation sharpening of tip. (e) Device is bonded to Pyrex die. (f) Dicing saw electrically isolates bridge from the rest of die.
up to 935 °C. The field-emission characteristics of this tip at room temperature, 395 °C, and 935 °C have been investigated. The current fluctuations during 20 min of operation were reduced from 150% at room temperature to 24% at 935 °C. SEM micrographs in Figure 26 indicate that after the tip operated at 935 °C, it became rounded and some protrusions and concentric rings developed around the tip. Further investigation is needed before this tip can be used in an MSEM.
VI. Detector

A drawback of the MSEM design is that the short working distance reduces the number of choices for detecting secondary and backscattered electrons and X-rays. In the basic MSEM design shown in Figure 1, the working distance is only 0.5 mm and the beam energy is 2.7 keV. The simplest electron detector is a Faraday cup, which collects electrons with unity gain; the absence of gain limits how short the pixel acquisition time can be. An approximation to a Faraday cup can be achieved by grounding the last electrode (D4) shown in Figure 1a. A metallic layer can be patterned and electrically isolated from the last silicon chip in the column so that the detector can be operated at an accelerating potential with respect to the last die, improving the electron collection efficiency.
Figure 26. SEM micrographs of the 15-μm-high silicon tips after operation at (a) 25 °C, (b) 395 °C, and (c) 935 °C.
A standard silicon detector for secondary electrons is a reverse-biased p–n or Schottky junction. A problem with implementing this in an MSEM is the shallow penetration of low-energy electrons into silicon, with 1000-, 100-, and 10-eV electrons penetrating up to 20, 0.5, and 0.02 nm into silicon, respectively. Our laboratory is investigating a surface p–n junction (Fig. 27), which should be able to detect low-energy secondary electrons. The energy of the electrons
Figure 26. (Continued )
can be increased with the electrode arrangement shown in Figure 28 (Crewe, 1994). A microchannel plate is another secondary electron detector under consideration for an MSEM. The typical microchannel plate consists of an array of glass capillaries that are 400 μm long, with a 10-μm diameter, which would not fit in the 0.5-mm working distance. Researchers at Cornell’s Nanofabrication facility have fabricated a microchannel plate into a silicon chip and achieved a gain of 10 (Tasker, 1990).
VII. Electron-Optical Calculations

The resolution of an ideal electrostatic column is determined by the diffraction-, spherical-aberration-, and chromatic-aberration-limited probe sizes d_d, d_cs, and d_c. These quantities are given by the following formulas, where α is the final convergence angle, V_b and ΔV are the beam voltage and its spread, and C_s and C_c are the spherical and chromatic aberration coefficients:

$$d_d = \frac{0.61\lambda}{\alpha} = \frac{7.5\times10^{-8}}{\alpha\sqrt{V_b}}\ \text{(cm)} \qquad (12)$$

$$d_{cs} = 0.25\,C_s\,\alpha^3\ \text{(cm)} \qquad (13)$$

$$d_c = C_c\,\alpha\,\frac{\Delta V}{V_b}\ \text{(cm)} \qquad (14)$$
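Equations (12) through (14) and the graphical procedure described next can be sketched as below: evaluate the three probe sizes and bisect for the α at which d_d meets the larger aberration term. C_s = 18.2 cm, C_c = 2.2 cm, and V_b = 2.5 kV are the stacked-column values given in the text; the energy spread ΔV = 0.1 eV is an assumption taken from the ideal-source target in Section V, since the text does not state the value used. With that assumption the crossing falls near 4.1 mrad and about 3.6 nm, close to the quoted 4 mrad and 3.8 nm. Function names are hypothetical, and the source size is ignored.

```python
import math


def probe_sizes_cm(alpha, v_beam, cs_cm, cc_cm, delta_v):
    """Eqs. (12)-(14): diffraction, spherical and chromatic probe sizes in cm."""
    d_d = 7.5e-8 / (alpha * math.sqrt(v_beam))
    d_cs = 0.25 * cs_cm * alpha ** 3
    d_c = cc_cm * alpha * delta_v / v_beam
    return d_d, d_cs, d_c


def crossing_angle(v_beam, cs_cm, cc_cm, delta_v, lo=1e-4, hi=5e-2):
    """Bisect for the alpha where d_d equals the larger aberration term."""
    def gap(a):
        d_d, d_cs, d_c = probe_sizes_cm(a, v_beam, cs_cm, cc_cm, delta_v)
        return d_d - max(d_cs, d_c)  # positive while diffraction dominates

    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Stacked column from the text; the 0.1-eV energy spread is an assumption.
V_B, CS, CC, DV = 2500.0, 18.2, 2.2, 0.1
d_d, d_cs, d_c = probe_sizes_cm(4e-3, V_B, CS, CC, DV)
print(f"4 mrad: d_d {d_d * 1e7:.2f} nm, d_cs {d_cs * 1e7:.2f} nm, d_c {d_c * 1e7:.2f} nm")
a = crossing_angle(V_B, CS, CC, DV)
print(f"crossing at {a * 1e3:.2f} mrad, d_d = {probe_sizes_cm(a, V_B, CS, CC, DV)[0] * 1e7:.2f} nm")
```

At 4 mrad the chromatic term exceeds the spherical term, consistent with the statement below that this column is chromatic-aberration limited.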
Figure 27. Surface p–n junction detector. Top and bottom square contacts are for the vertical n+ fingers. Left and right contacts are to the p-type bulk. Low energy incident electrons on the detector surface will be collected by the n+ fingers. Guard rings have been omitted for visual clarity.
The method used to calculate the aberration coefficients of a stacked lens was reported previously and is also used to calculate the properties of a sliced column (Crewe, Perng et al., 1992). The ultimate resolution is frequently reported as the root-mean-square of $d_d$, $d_{cs}$, $d_c$, and the source size, but this has been shown to be incorrect (Born and Wolf, 1980; Crewe, 1987). A more accurate determination is obtained by graphing $d_d$, $d_{cs}$, and $d_c$ versus $\alpha$, the final convergence angle. Two resolution plots, one for a stacked lens and one for a sliced lens, are shown in Figure 29. With this method, the resolution of the column is the largest value of $d_d$ at which it crosses $d_{cs}$, $d_c$, or the source size. In general, electrostatic einzel lenses above 5 kV are spherical-aberration limited, while below 5 kV chromatic aberration dominates their performance. When the system is spherical-aberration limited, the resolution is most accurately determined by the intersection of the diffraction curve with the line $0.15C_s\alpha^3$ (Crewe, 1987).
A stacked column 3.2 mm long with a 0.5-mm working distance has spherical and chromatic aberration coefficients of 18.2 and 2.2 cm, respectively. The calculations for this stacked lens assume the column shown in Figure 7a, constructed from 381-μm wafers separated by 220-μm gaps and including the two-layer deflector. The structure would have −1-, −2.25-, −2.25-, and 0-kV potentials applied to electrodes D2–D5, respectively, which would make the maximum electric field between electrodes ∼10 kV/mm. This column would be chromatic-aberration limited and should have a resolution of 3.8 nm when operated at 2.5 kV with a 4-mrad convergence angle. The stacked electron-optical calculations assume a point electron source at −2.5 kV located 220 μm below D2. As mentioned in Section III.A, there is an advantage to etching the apertures separately and then assembling the electrodes, rather than assembling the electrodes and then making the apertures. The three-electrode lens shown in Figure 7b, with all apertures 100 μm in diameter, operating at a beam energy of 2.5 keV and a working distance of 0.5 mm, will have an optimal resolution of 5.25 nm when the final convergence angle is constrained to 3.25 mrad. If the focusing electrode aperture (D3 in Fig. 7b) is 200 μm in diameter, the same lens will have an optimal resolution of 3.8 nm at the same working distance and beam energy, with a final convergence angle of 4 mrad.
Figure 28. Interdigitated fingers on the detector surface at +1000 and −1000 V can boost the incident electron energy from 0–10 eV to several hundred eV. A circularly symmetric interdigitated pattern, such as two spirals rotated 180° relative to each other, would be preferred since it would introduce less astigmatism. The n+ fingers shown in Figure 25 have been omitted to simplify the figure.
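The ∼10 kV/mm figure for the stacked column can be checked with a parallel-plate estimate. The sketch below divides the potential difference between adjacent electrodes D2–D5 by the 220-μm gap. It assumes uniform fields between flat die faces and ignores the field enhancement near aperture edges. The function and variable names are illustrative only.

```python
# Electrode potentials (kV) for the stacked column (D2-D5) and the 220-um
# die-to-die gap, both as quoted in the text.
potentials_kv = [("D2", -1.0), ("D3", -2.25), ("D4", -2.25), ("D5", 0.0)]
GAP_MM = 0.220


def gap_fields(potentials, gap_mm):
    """Parallel-plate estimate |dV| / gap (kV/mm) for each adjacent electrode pair.

    Assumes a uniform field between flat electrode faces; aperture-edge
    enhancement is not modeled.
    """
    return {
        f"{a}-{b}": abs(v_b - v_a) / gap_mm
        for (a, v_a), (b, v_b) in zip(potentials, potentials[1:])
    }


fields = gap_fields(potentials_kv, GAP_MM)
for pair, e in fields.items():
    print(f"{pair}: {e:.1f} kV/mm")
print("largest:", max(fields, key=fields.get))
```

In this estimate the D4–D5 gap, where the full 2.25 kV drops to ground, carries the largest field, at about 10.2 kV/mm.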
Figure 29. (a) A stacked lens 3.2 mm long at 2.5 kV is chromatic-aberration limited to a resolution of 3.8 nm at a working distance of 0.5 mm. (b) A sliced lens 8.5 mm long at 15 kV is spherical-aberration limited to a resolution of 2.2 nm at a working distance of 1 mm.
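The graphical method behind Figure 29 can be sketched numerically by evaluating Eqs. (12)–(14) and locating the angle where the falling diffraction curve meets the largest of the rising aberration terms (or a constant source size). The sketch makes the following assumptions:

- It uses the $C_s$ and $C_c$ values quoted in this section for the stacked (18.2, 2.2 cm) and sliced (30.98, 2.57 cm) columns.
- It takes $\Delta V$ = 0.1 eV for both columns. The text gives this spread only for the sliced column, so for the stacked column it is an assumption.
- It uses the non-relativistic wavelength built into Eq. (12).

The function names are hypothetical. Under these assumptions the crossings fall near 3.6 nm at 4.1 mrad (chromatic-limited) for the stacked column and 2.05 nm at 3.0 mrad (spherical-limited) for the sliced column. These are close to, but not identical with, the 3.8- and 2.2-nm values read from the plots.

```python
import math


def d_diffraction(alpha, vb):
    """Eq. (12): 0.61*lambda/alpha in cm (non-relativistic; vb in volts)."""
    return 7.5e-8 / (alpha * math.sqrt(vb))


def d_spherical(alpha, cs, coeff=0.25):
    """Eq. (13): coeff * Cs * alpha**3 in cm (Cs in cm). Crewe's 0.15 is an option."""
    return coeff * cs * alpha ** 3


def d_chromatic(alpha, cc, dv, vb):
    """Eq. (14): Cc * alpha * dV / Vb in cm (Cc in cm; dV and Vb in volts)."""
    return cc * alpha * dv / vb


def competing_terms(alpha, cs, cc, dv, vb, source_size, cs_coeff):
    return {
        "spherical": d_spherical(alpha, cs, cs_coeff),
        "chromatic": d_chromatic(alpha, cc, dv, vb),
        "source": source_size,
    }


def optimum(cs, cc, vb, dv, source_size=0.0, cs_coeff=0.25):
    """Return (alpha_rad, d_cm, limiting_term) where d_d meets the largest other term."""
    lo, hi = 1e-6, 1e-1  # bracketing angles in radians
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # geometric bisection: alpha spans several decades
        terms = competing_terms(mid, cs, cc, dv, vb, source_size, cs_coeff)
        if d_diffraction(mid, vb) > max(terms.values()):
            lo = mid  # still diffraction dominated: move to larger angles
        else:
            hi = mid
    alpha = math.sqrt(lo * hi)
    terms = competing_terms(alpha, cs, cc, dv, vb, source_size, cs_coeff)
    return alpha, d_diffraction(alpha, vb), max(terms, key=terms.get)


for name, cs, cc, vb in [("stacked", 18.2, 2.2, 2500.0), ("sliced", 30.98, 2.57, 15000.0)]:
    a, d, limit = optimum(cs, cc, vb, dv=0.1)
    print(f"{name}: alpha = {a * 1e3:.2f} mrad, d = {d * 1e7:.2f} nm ({limit} limited)")
```

Passing `cs_coeff=0.15` applies the alternative spherical-aberration line mentioned above. Passing a nonzero `source_size` (in cm) adds the horizontal source-size line to the comparison.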
The need for apertures of different diameters in the column also becomes apparent when one considers that the final convergence angle must be limited to a few milliradians. Restricting the beam to such a small angle requires a beam-limiting aperture of ∼10 μm or less somewhere in the optical column. If the apertures in the column are fabricated simultaneously, then all of them would have to be ∼10 μm, which would result in high axial field gradients near the apertures and large aberration coefficients. With the ability to vary the aperture diameters, the final electrode could be used to limit the beam angle without affecting the preceding resolution calculations, since that electrode is held at ground potential.
A sliced column 8.5 mm long with a 1-mm working distance has spherical and chromatic aberration coefficients of 30.98 and 2.57 cm. The inner and outer diameters of all electrodes are 300 and 500 μm (see Fig. 9 for other column dimensions). The electron-optical calculations for the sliced column assume a point electron source at −15 kV with a 0.1-eV energy spread, located 1 mm below E2. The structure would require −5-, −13.7-, and 0-kV potentials on electrodes E2–E4, respectively, to focus the beam at the 1-mm working distance. This column should have a resolution of 2.2 nm when operated at 15 kV with a 3-mrad convergence angle. Since the column operates at approximately unity magnification, the source size becomes important only when it exceeds the minimum attainable resolution of 2.2 nm shown in Figure 29. A 125-μm-diameter tungsten wire at 25°C oriented along the ⟨111⟩ or ⟨310⟩ direction, whose tip has been electrochemically etched to a 0.1-μm radius, will have a 1- to 2-nm source size and is therefore acceptable.
The sliced electrode fabrication method proposed in Section III.C makes the electrode surface highly resistive, on the order of $10^{10}$ Ω/square. If stray current strikes the electrode surface, the electrode potential will no longer be constant. A fraction of the beam symmetrically striking the electrode's inside surface was simulated by choosing a voltage perturbation that increased linearly from zero at the edge of the electrode to a maximum at its center. A 100-V perturbation on electrodes E2 and E4 (Fig. 9) changed $C_c$ and $C_s$ of the column by less than 1%. Ten- and 100-V perturbations of electrode E3 had a negligible effect on $C_c$ and increased $C_s$ by 5 and 11%, respectively.
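The tilted MSEM discussed in Section VII.A must clear a sample inclined at 60°. The minimum working distance then follows from simple clearance geometry. A sample tilted by θ about an axis through the focus rises r·tan θ at a lateral distance r from the electron-optical axis. An electrode whose edge is at r, and whose lower face sits a height z above the lower face of the bottom die, therefore needs WD ≥ r·tan θ − z. The sketch below applies this under these assumptions:

- In the untrimmed stack, the 3.5-mm half-width of the 7-mm die side faces the tilt.
- In the trimmed stacks, D4 is etched narrow enough, and the dies above D3 are trimmed at least as far, so that D3 sets the limit.
- D3's lower face sits one die thickness plus one gap (381 + 269 = 650 μm) above D4's lower face.

The function name is hypothetical. With these assumptions the estimate appears to reproduce the quoted values of just over 6 mm, 1.082 mm, and 649 μm.

```python
import math


def min_working_distance(layers, tilt_deg):
    """Smallest working distance (mm) at which an electrode stack clears a sample
    tilted by tilt_deg about an axis through the beam focus.

    layers: (half_width_mm, height_mm) pairs. half_width_mm is the distance from
    the optical axis to the electrode edge on the side facing the raised part of
    the sample; height_mm is that electrode's lower face above the bottom die's
    lower face.
    """
    t = math.tan(math.radians(tilt_deg))
    # The tilted surface rises r * tan(theta) at lateral distance r, so each
    # layer requires WD >= r * tan(theta) - height.
    return max(r * t - z for r, z in layers)


DIE_MM, GAP_MM = 0.381, 0.269
D3_HEIGHT_MM = DIE_MM + GAP_MM  # D3 lower face above D4 lower face (assumed)

untrimmed = min_working_distance([(3.5, 0.0)], 60)         # full 7-mm die
trimmed_1mm = min_working_distance([(1.0, D3_HEIGHT_MM)], 60)
trimmed_075 = min_working_distance([(0.75, D3_HEIGHT_MM)], 60)
print(f"{untrimmed:.3f} mm, {trimmed_1mm:.3f} mm, {trimmed_075 * 1e3:.0f} um")
```

The same function accepts several layers at once and returns the most restrictive one. This makes it easy to test how far an upper die may protrude before it, rather than D3, limits the working distance.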
A. A Tilted MSEM
The stacked design can easily be modified to create an array of high-resolution MSEMs that can be tilted 60° or more with respect to the sample. Very large tilt angles are required for defect review and inspection. As shown in Figure 30, the working distance of a 2-kV MSEM exceeds 6 mm when the sample is tilted 60°, and the smallest probe size will then be several microns. This hypothetical stack would be constructed from 7 × 9-mm² silicon die that are 381 μm thick with 269-μm gaps, and the aperture radii in D2, D3, and D4 (Fig. 1) are 100, 100, and 5 μm, respectively. A linear array of MSEMs could perform high-throughput inspection of wafers. As shown in Figure 31, the working distance can be reduced substantially, to 1.082 mm, by using anisotropic etching to reduce the width of D4. A dicing saw would trim D3 so that its edge is 1 mm from the electron-optical axis, or 10 times the 100-μm aperture radius. The smallest probe size with a 2-kV beam is ∼5 nm in this case, ignoring any astigmatism introduced by the stack. The working distance can be reduced further, to 649 μm, by reducing the aperture radius to 75 μm and using a dicing saw to trim D3 so that it comes within 0.75 mm of the electron-optical axis. The smallest probe size with a 2-kV beam is then ∼4 nm, again ignoring any astigmatism introduced by the stack. A 20-kV MSEM built along the same lines could achieve a 1.5-nm probe at a working distance of 2 mm and a 60° sample tilt. The higher voltage would also allow the MSEM to perform chemical analysis of the detected particles.
Figure 30. An MSEM constructed from 7 × 9-mm² silicon die that are 381 μm thick with 269-μm gaps will have a working distance exceeding 6 mm when tilted at 60° with respect to the sample. In this figure the wafer is 30° from the beam and 60° from the horizontal.
Figure 31. Trimming the MSEM shown in Figure 30 allows the working distance to be reduced to 1.08 mm when the sample is tilted 60° with respect to the horizontal.
VIII. Performance of a Stacked Einzel Lens∗
∗ Portions of Section VIII are reprinted, with permission, from the Journal of Vacuum Science and Technology A, 14(6), 3808–3812, November 1996. Copyright 1996 American Vacuum Society.
A. MSEM Construction
The focusing properties of a stacked electrostatic electron lens have been evaluated within the macroscopic assembly shown in Figure 32 (Crewe, Ruffin et al., 1996). The entire MSEM test structure is a cylinder 7.5 cm in diameter and 10 cm tall. This assembly consists of a 2.5-kV einzel lens, an electron source, parallel plate deflectors, and a Faraday cup as an electron detector. The test assembly positions the electron source over the silicon lens. The beam is electrostatically scanned over the sample, and an image can be formed from a current signal taken either from the sample itself or from the detector below the sample. The apparatus can easily be modified to incorporate the other micromachined components (deflector/stigmator, detector, and electron source) in the column as they are developed. The electron source is a macroscopic zirconiated tungsten thermally assisted Schottky field emitter operating at 1800 K. The thermally assisted ZrO2 field-emission source available from FEI Inc. was chosen because it can provide highly stable field emission in a desirable current range (1–25 μA) at readily achieved vacuum levels ($10^{-9}$ torr). The chief drawbacks of this source are its large physical size (a cylinder 2 cm in diameter and 2 cm tall) relative to the micromachined lens, the need for a mechanism to align the emitted electrons to the optical axis of the micromachined lens, and the relatively high extraction voltage required to achieve field emission (>3 kV).
The design of the test structure was dictated by the need to secure and align the source and to insulate it electrically from an extractor electrode. As recommended by the manufacturer, the source is placed 500 μm from an extractor electrode containing a commercially available 500-μm-diameter Pt–Ir aperture. We chose to machine a bulky stainless-steel extractor electrode, fitted with a commercially available aperture, so that it would absorb most of the electrons emitted from the source. The source and extractor are placed 1 cm before the silicon lens. The test assembly consists of alternating stainless-steel and Macor (a machinable glass ceramic made by Corning Inc.) rings. From the bottom up, the structure consists of a Faraday cup to collect electrons; a sample holder designed to house a commercial 3-mm gold grid; a parallel plate deflector assembly, which must electrically isolate the deflectors from each other as well as from the elements above and below; the micromachined electrostatic lens, which is mounted to a 16-pin Airpax header; an extractor electrode; and the FEI source. The assembly is stacked one ring above another and is held together under compression in a mu-metal exterior can, which provides both the structural integrity of the assembly and magnetic shielding of the optical column (Fig. 32). The critical alignments in the structure are the alignment of the lens electrode apertures to one another and the alignment of the electron source to the lens apertures.
Figure 32. (a) A commercial thermal field-emission (TFE) source can be aligned to a micromachined einzel lens with this experimental arrangement. Two Macor push rods move the TFE source; each rod is driven by a linear-motion feedthrough and works against a UHV spring. Below the einzel lens are electron beam deflectors; a 3-mm TEM grid, which serves as the sample; and a Faraday cup to detect the transmitted beam. The entire arrangement is surrounded by a mu-metal can to shield the electrons from the earth's magnetic field. (b) Thermally assisted field-emission source positioned over the micromachined silicon lens, demonstrating electron beam focusing to a point on the wire grid sample.
The electrode-to-electrode alignment is accomplished through our micromachining technique. The electron source alignment is accomplished by means of two insulated linear-motion feedthroughs, which push on the FEI source at 90° against a return spring. This allowed most of the pieces in the assembly to be machined to relatively loose tolerances (∼±50 μm), which kept the machining cost low. The entire assembly is inserted into a commercially available 6-in. UHV vacuum chamber containing a 30-liter/s nonevaporable getter pump mounted to a 120-liter/s ion pump. The motion feedthroughs are attached, electrical connections are made, and the system is evacuated. A base pressure of 1 × $10^{-9}$ torr is achieved in 48 h. The silicon lens was fabricated from 380-μm-thick silicon chips separated by 250-μm gaps. The performance of a three-element lens with these physical parameters has been calculated, and the results are shown in Figure 33. These calculations indicate that the lens can produce a high-quality focus from a position near the exit aperture of the lens out to a working distance of a few centimeters, with focusing-electrode potentials that the die-to-die gaps can sustain.
Figure 33. Solid and dash–dot lines represent 4- and 0.5-mm working distances. (1) Current MSEM operating point; expected resolution of 425 nm; (2) 4-mm working distance; expected resolution of 6.2 nm; (3) 0.5-mm working distance; expected resolution of 2.3 nm.
The extractor aperture is optically aligned to the silicon apertures in the micromachined lens as follows. The assembly is placed under a microscope, and bottom illumination is used to view the bright circular spot formed by the apertures in the silicon. The 500-μm extractor aperture is then centered over that spot and secured in place. Typical operating potential differences between the source and the extractor electrode are in the range of 2.5–3.75 kV for an emission current of 1–25 μA. The FEI source also contains a suppressor electrode, which is biased negative with respect to the tip to prevent thermally generated electron emission from escaping the source. Initially our micromachined silicon apertures were only 3.5 μm thick, which was probably not thick enough to withstand bombardment by ∼30 μA of 3-keV electrons. However, we subsequently improved the silicon process to give 100-μm-thick apertures and will later remove the extractor from the assembly. With the stainless-steel electrode in the system, the first two silicon electrodes can be operated in parallel as one optically long focusing electrode. This has been calculated to produce a higher-quality probe as well as to provide more flexibility in operation (Feinerman, Crewe, Perng, Spindt et al., 1994). Calculations indicate that a stacked lens with 150-μm-diameter apertures will produce a 425-nm focus with a 2.5-kV beam at a working distance of 4 mm and a field-emission source 1 cm above the lens (Fig. 33). If the final convergence angle is reduced from 10 to 2.6 mrad, the focus improves to 6.2 nm. If the working distance is reduced to 0.5 mm, a 2.3-nm resolution can be achieved at a final convergence angle of 6.5 mrad. The efficiency of the electron detector will have to be increased, however, since the probe current is proportional to the square of the convergence angle. Images of 200- and 1000-mesh gold TEM wire grids at a working distance of 4 mm have been obtained in transmission. The beam is scanned over the sample by parallel plate deflectors. The silicon lens is 1.64 mm long and consists of three silicon die separated by Pyrex optical fibers, as shown in Figure 2. Images of the grid at magnifications above 7000× are now being obtained.
B. MSEM Operation and Image Formation
Figure 34. Flowchart of the MSEM control and image-acquisition system. The use of two PCs is redundant and will be reduced to one computer controlling both the high-voltage gun control unit and the scan-generator/image-acquisition electronics.
The potentials applied to the source and lens electrodes and the filament heating current are supplied by a computer-controlled set of electronics (Fig. 34). Three high-voltage power supplies and a constant-current supply are floated with their virtual ground at the beam potential. Isolation from earth ground is achieved through optical couplers. The suppressor and focus potentials and the filament heating current are controlled through an RS232 serial connection to a personal computer (PC). The beam potential is set manually on an externally regulated high-voltage power supply. After initial conditioning of the extractor electrode to allow for electron-stimulated desorption of gas ions, the total emission is increased to ∼3 μA and the source-to-silicon aperture alignment is performed. Once a beam is brought through the lens, the focus electrode potential is optimized by comparing successive line scans over the gold grid. The optimal focusing potential agrees well with calculated values, differing by less than 10%. The deflection potential signals are generated, and the image data received, by data-acquisition boards in a PC. The low-voltage deflection ramps are the input to a high-speed, high-voltage amplifier capable of generating −500- to +500-V signals at a rate of 10 kHz. The faces of the deflectors that are perpendicular to the electron beam measure 1.5 × 1.5 mm and are spaced 1.25 mm apart. A simple time-of-flight deflection calculation predicted a beam deflection of 0.5 μm per volt of applied deflection signal. Experimentally, we have observed that one volt of deflection potential yields approximately 0.4 μm of beam deflection. Typical deflection signals are staircase ramps in the range −150 to +150 V (for a field of view of 120 by 120 μm) generated at a line rate of 10 Hz. The image-acquisition time for a 512 × 512-pixel image is then 51.2 s. The image data consist of the Faraday cup current (for a dark-field image) or the sample current (for a bright-field image), passed through a current-to-voltage amplifier with a gain of approximately $10^{10}$ V/A and a maximum pixel rate of 100 kHz.
This 0- to 1-V signal is the input to a 12-bit analog-to-digital converter that acquires the image data synchronously with the deflection ramp generation. The raw image data are then normalized and imported into a commercially available image-processing software package for viewing. The initial and final voltages of the X and Y deflection ramps can be selected in software, and the magnitude of the amplified deflection signal can be varied. This allows the user to perform a dc-offset, high-magnification scan of a region of interest that is not in the center of a low-magnification image. Low- and high-magnification images of a 1000-mesh gold wire grid are shown in Figures 35–38. The 10–90% rise of the line scan shown in Figure 39 covers a lateral distance of 2.1 μm. If the probe is Gaussian, this indicates a sigma of 0.75 μm. This is a worst-case estimate of the probe size, since the grid wires in reality have a finite slope, but it gives a value that agrees well with calculations.
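The scan arithmetic described above can be sketched as follows. Under the measured sensitivity of 0.4 μm/V, a staircase ramp from −150 to +150 V spans a 120-μm field. At 512 lines and 10 lines/s, a frame takes 51.2 s. A dc-offset zoom simply moves the ramp endpoints. The sketch assumes that the deflection is linear in voltage and that each 512-step ramp includes both endpoints. The names (`staircase`, `scan_summary`) are hypothetical and do not model the actual data-acquisition hardware.

```python
UM_PER_VOLT = 0.4    # measured deflection sensitivity quoted in the text
LINE_RATE_HZ = 10.0  # line rate quoted in the text
N_PIXELS = 512       # pixels per line and lines per frame


def staircase(v_start, v_stop, n_steps):
    """Equally spaced deflection voltages from v_start to v_stop inclusive."""
    step = (v_stop - v_start) / (n_steps - 1)
    return [v_start + i * step for i in range(n_steps)]


def scan_summary(v_start, v_stop, n=N_PIXELS, line_rate=LINE_RATE_HZ):
    """Field of view, pixel size, and scan center (um); frame time (s); pixel rate (Hz)."""
    ramp = staircase(v_start, v_stop, n)
    fov_um = (ramp[-1] - ramp[0]) * UM_PER_VOLT
    return {
        "fov_um": fov_um,
        "pixel_um": fov_um / (n - 1),
        "center_um": 0.5 * (ramp[0] + ramp[-1]) * UM_PER_VOLT,
        "frame_time_s": n / line_rate,   # one staircase ramp per image line
        "pixel_rate_hz": n * line_rate,  # must stay below the 100-kHz amplifier limit
    }


full = scan_summary(-150.0, 150.0)  # full 120-um field of view
zoom = scan_summary(50.0, 80.0)     # dc-offset zoom on a region 26 um off center
print(full)
print(zoom)
```

At this line rate the pixel rate is only about 5 kHz, comfortably below the 100-kHz limit of the current amplifier. The line rate, rather than the amplifier, therefore sets the frame time.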
Figure 35. This image, obtained with the test apparatus, demonstrates the ability of the micromachined silicon electron lens to focus on a 1000-mesh gold TEM grid. The grid wires are 6 μm wide and are spaced 19 μm apart; the signal is from the Faraday cup current.
Figure 36. Magnification ∼2000× of the grid with 6-μm wires; the signal is from the Faraday cup current. The defect in the center of the image was caused by the screen saver turning on.
Figure 37. Magnification ∼3500× of the grid with 6-μm wires; the signal is from the Faraday cup current.
Figure 38. High-magnification image of a defect on the wire grid. The image has been electronically rotated to bring the wire to a nearly vertical position. The cross wire is not at a right angle to the vertical wire, possibly because the sample was deformed when it was fitted into the test assembly. The defect is approximately 0.5 μm wide.
Figure 39. Line scan data from a high-magnification image of one period of the 1000-mesh grid. The scan signals were electrically rotated so that the beam was deflected perpendicular to the wires. The 10–90% rise time of 2.1 μm corresponds to a Gaussian probe sigma of 0.75 μm. In its present configuration the MSEM is spherical-aberration limited, so the Gaussian probe is a good approximation to the actual beam.
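The conversion from edge rise to probe size can be sketched under an idealized model: an infinitely sharp wire edge scanned by a Gaussian probe. In that model the line-scan profile is the Gaussian's cumulative distribution, and its 10–90% width is about 2.563σ. For a 2.1-μm rise this gives σ ≈ 0.82 μm, somewhat larger than the 0.75 μm quoted above, so the exact figure depends on the edge model assumed. The finite slope of real grid wires makes any such value an upper bound on the probe size. The sketch uses `statistics.NormalDist` (Python 3.8+), and the function name is hypothetical.

```python
from statistics import NormalDist


def sigma_from_edge_rise(rise_10_90):
    """Gaussian probe sigma implied by the 10-90% rise distance of a line scan
    across an ideally sharp edge (edge profile = Gaussian cumulative distribution).
    """
    std = NormalDist()  # standard normal: mean 0, sigma 1
    width_in_sigmas = std.inv_cdf(0.9) - std.inv_cdf(0.1)  # about 2.563
    return rise_10_90 / width_in_sigmas


sigma_um = sigma_from_edge_rise(2.1)
print(f"sigma = {sigma_um:.2f} um")
```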
IX. Summary and Future Prospects
Microfabrication techniques have advanced to the point where conductors, semiconductors, and insulators can be positioned in complex three-dimensional arrangements with very high precision. This is equivalent to a conventional machinist operating miniature milling machines and lathes with micron-sized bits. This flexible machining capability allows the creation of electric and magnetic fields that can accelerate, focus, steer, and/or align charged particles, because the fields occupy a volume of space rather than simply existing next to a surface. Specific fabrication techniques developed at UIC include stacking silicon chips with Pyrex fibers, selective anodic bonding (slicing), and using a LIGA lathe. These techniques are being used to integrate charged-particle sources, electrodes, and detectors into various miniature instruments. These include a subcentimeter SEM, a 10-cm time-of-flight mass spectrometer, a 10-cm nuclear magnetic resonance instrument, and a 5-m linear accelerator/undulator capable of producing hard X-rays. Analytical instruments of this size will allow the analytical laboratory to be brought to the sample, which will be essential when the sample must be observed in situ (e.g., at a toxic waste site or in outer space).
References
Adler, E., DeBrosse, J. K., Geissler, S. F., Holmes, S. J., Jaffe, M. D., Johnson, J. B., Koburger, C. W., III, Lasky, J. B., Lloyd, B., Miles, G. L., Nakos, J. S., Noble, W. P., Jr., Voldman, S. H., Armacost, M., and Ferguson, R. (1994). The evolution of IBM CMOS DRAM technology. IBM J. Res. Dev. 39(1/2).
Ahn, C. H., and Allen, M. G. (1993). A planar micromachined spiral inductor for integrated magnetic microactuator applications. J. Micromech. Microeng. 3(2), 37–44.
Ahn, C. H., and Allen, M. G. (1994). A new toroidal-meander type integrated inductor with a multi-level meander magnetic core. IEEE Trans. Magn. 30, 73–79.
Bean, K. E. (1978). IEEE Trans. Electron Devices ED-25, 1185.
Born, M., and Wolf, E. (1980). In Principles of Optics, 6th ed. Oxford, UK: Pergamon, p. 206.
Brodie, I., and Spindt, C. A. (1992). Vacuum microelectronics. Adv. Electron. Electron Phys. 83 (P. Hawkes and B. Kazan, Eds.).
Busta, H. H., Feinerman, A. D., Ketterson, J. B., and Wong, G. K. (1985). J. Appl. Phys. 58, 987–989.
Carlson, D. E. (1974). J. Am. Ceram. Soc. 57, 291.
Carlson, D. E., Hang, K. W., and Stockdale, G. F. (1974). J. Am. Ceram. Soc. 57, 295.
Cerrina, F., Turner, B. S., and Khan, M. (1993). Microelectron. Eng. 21, 103–106.
Chang, T. H. P., Kern, D. P., and Muray, L. P. (1990). Microminiaturization of electron optical systems. J. Vac. Sci. Technol. B 8, 1698–1705.
Crewe, A. V. (1987). Ultramicroscopy 23, 159–168.
Crewe, A. V. (1994). Private communication.
Crewe, D. A., Perng, D. C., Shoaf, S. E., and Feinerman, A. D. (1992). A micromachined electrostatic electron source. J. Vac. Sci. Technol. B 10, 2754–2758.
Crewe, D. A., Ruffin, M. M., and Feinerman, A. D. (1996). Initial tests of a micromachined SEM. J. Vac. Sci. Technol. B 14(6), 3808–3812.
Feinerman, A. D., Crewe, D. A., and Crewe, A. V. (1994). Microfabrication of arrays of scanning electron microscopes. J. Vac. Sci. Technol. B 12, 3182–3186.
Feinerman, A. D., Crewe, D. A., Perng, D. C., Shoaf, S. E., and Crewe, A. V. (1992a). Subcentimeter micromachined electron microscope. J. Vac. Sci. Technol. A 10, 611–616.
Feinerman, A. D., Crewe, D. A., Perng, D. C., Shoaf, S. E., and Crewe, A. V. (1992b). SPIE: Imaging Technologies and Applications 1778, 78.
Feinerman, A. D., Crewe, D. A., Perng, D. C., Spindt, C. A., Schwoebel, P. R., and Crewe, A. V. (1994). Miniature electron microscopes for lithography. SPIE: Microlithography ’94 2194, 262–273.
Feinerman, A. D., Lajos, R., White, V., and Denton, D. (1996). X-ray lathe: an X-ray lithographic exposure tool for nonplanar objects. J. Microelectromech. Syst. 5(4), 250–255.
Feinerman, A. D., Shoaf, S. E., and Crewe, D. A. (1991). Precision aligning and bonding of silicon die, in Patterning Science and Technology II/Interconnection and Contact Metallization for ULSI, PV92-6 (Electrochemical Society Proceedings), edited by W. Greene, G. J. Hefferon, L. K. White, T. L. Herndon, and A. L. Wu.
Feller, B. (1990). SPIE 1243, 149–161.
Fleming, D., Maldonado, J. R., and Neisser, M. (1992). J. Vac. Sci. Technol. B 10, 2511.
Gowar, J. (1984). In Optical Communication Systems. London: Prentice Hall International, p. 99.
Guckel, H., Christenson, T. R., Skrobis, K. J., Denton, D. D., Choi, B., Lovell, E. G., Lee, J. W., Bajikar, S. S., and Chapman, T. W. (1990). Deep X-ray and uv lithographies for micromechanics, in Proceedings of IEEE Solid State Sensor and Actuator Workshop, Hilton Head, South Carolina, 4–7 June, pp. 118–122.
Huggett, J. M. (1990). Adv. Electron. Electron Phys. 77, 139 (P. W. Hawkes, Ed.).
Jansen, K., and Ulrich, R. (1991). J. Lightwave Technol. 9, 2–6.
Jones, G. W., Jones, S. K., Walters, M. D., and Dudley, B. W. (1989). IEEE Trans. Electron Devices 36, 2686.
Laprade, B. (1989). SPIE 1072, 102–110.
Larrabee, G., and Chatterjee, P. (1991). DRAM manufacturing in the 90s, Part 1: The history lesson. Semiconductor Int. 84.
Lin, B. J. (1991). Quarter- and sub-quarter-micron optical lithography, in Patterning Science and Technology II/Interconnection and Contact Metallization for ULSI, PV92-6 (Electrochemical Society Proceedings), edited by W. Greene, G. J. Hefferon, L. K. White, T. L. Herndon, and A. L. Wu, pp. 3–15.
Marrian, C. R. K., Dobisz, E. A., and Dagata, J. A. (1992). J. Vac. Sci. Technol. B 10, 2877.
Mentzer, M. A. (1990). In Principles of Optical Circuit Engineering, Appendix IV. New York: Dekker, pp. 301–307.
Mulvey, T. (1982). Unconventional lens design, in Magnetic Electron Lenses, edited by P. W. Hawkes, pp. 359–412.
Muray, L. P., Staufer, U., Bassous, E., Kern, D. P., and Chang, T. H. P. (1991). J. Vac. Sci. Technol. B 9, 2955.
Petersen, K. E. (1982). Proc. IEEE 70, 422.
Ravi, T. S., and Marcus, R. B. (1991). Oxidation sharpening of silicon tips. J. Vac. Sci. Technol. B 9, 2733–2737.
Schwoebel, P. R., and Spindt, C. A. (1993). Glow discharge processing to enhance field emitter array performance. Appl. Phys. Lett. 63, 33.
Sematech (1994). SIA National Technology Roadmap for Semiconductors. Semiconductor Industry Association, 181 Metro Drive, Suite 450, San Jose, California 95110, http://www.sematech.org/public/roadmap/doc/toc.html
Shedd, G. M., Schmid, H., Unger, P., and Fink, H.-W. (1993). Rev. Sci. Instrum. 64, 2579.
Somorjai, G. A. (1981). Chemistry in Two Dimensions: Surfaces. Ithaca, NY: Cornell Univ. Press.
Spindt, C. A. (1968). A thin-film field emission cathode. J. Appl. Phys. 39, 3504–3505.
Spindt, C. A., Holland, C. E., Rosengreen, A., and Brodie, I. (1991). Field emitter arrays for vacuum microelectronics. IEEE Trans. Electron Devices 38, 2355–2363.
Tasker, G. W. (1990). SPIE 2640, 58.
Trimmer, S. N., and Gabriel, K. J. (1987). Sensors and Actuators 11, 189.
Wallis, G., and Pomerantz, D. I. (1969). J. Appl. Phys. 40, 3946.
Yamazaki, T., Miyata, N., Aoyama, T., and Ito, T. (1992). Investigation of thermal removal of native oxide from Si(100) surfaces in hydrogen for low-temperature Si CVD epitaxy. J. Electrochem. Soc. 139, 1175–1180.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 121
A Reference Discretization Strategy for the Numerical Solution of Physical Field Problems
CLAUDIO MATTIUSSI∗
Clampco Sistemi-NIRLAB, AREA Science Park, Padriciano 99, 34012 Trieste, Italy
I. Introduction
II. Foundations
  A. The Mathematical Structure of Physical Field Theories
  B. Geometric Objects and Orientation
    1. Space–Time Objects
  C. Physical Laws and Physical Quantities
    1. Local and Global Quantities
    2. Equations
  D. Classification of Physical Quantities
    1. Space–Time Viewpoint
  E. Topological Laws
  F. Constitutive Relations
    1. Constitutive Equations and Discretization Error
  G. Boundary Conditions and Sources
  H. The Scope of the Structural Approach
III. Representations
  A. Geometry
    1. Cell Complexes
    2. Primary and Secondary Mesh
    3. Incidence Numbers
    4. Chains
    5. The Boundary of a Chain
  B. Fields
    1. Cochains
    2. Limit Systems
  C. Topological Laws
    1. The Coboundary Operator
    2. Properties of the Coboundary Operator
    3. Discrete Topological Equations
  D. Constitutive Relations
  E. Continuous Representations
    1. Differential Forms
    2. Weighted Integrals
∗ Current affiliation: Evolutionary and Adaptive Systems Team, Institute of Robotic Systems (ISR), Department of Micro-Engineering (DMT), Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, Volume 121
ISBN 0-12-014763-7
Copyright © 2002 by Academic Press. All rights of reproduction in any form reserved.
ISSN 1076-5670/02 $35.00
    3. Differential Operators
    4. Spread Cells
    5. Weak Form of Topological Laws
IV. Methods
  A. The Reference Discretization Strategy
    1. Domain Discretization
    2. Topological Time Stepping
    3. Strategies for Constitutive Relations Discretization
    4. Edge Elements and Field Reconstruction
  B. Finite Difference Methods
    1. The Finite Difference Time-Domain Method
    2. The Support Operator Method
    3. Beyond the FDTD Method
  C. Finite Volume Methods
    1. The Discrete Surface Integral Method
    2. The Finite Integration Theory Method
  D. Finite Element Methods
    1. Time-Domain Finite Element Methods
    2. Time-Domain Edge Element Method
    3. Time-Domain Error-Based FE Method
V. Conclusions
VI. Coda
References
NUMERICAL METHODS FOR PHYSICAL FIELD PROBLEMS

I. Introduction

One of the fundamental concepts of mathematical physics is that of field; that is, naively speaking, of a spatial distribution of some mathematical object representing a physical quantity. The power of this idea lies in the fact that it allows the modeling of a number of very important phenomena (for example, those grouped under the labels "electromagnetism," "thermal conduction," "fluid dynamics," and "solid mechanics," to name a few) and of their combinations. When the concept of field is used, a set of "translation rules" is devised, which transforms a physical problem belonging to one of the aforementioned domains (a physical field problem) into a mathematical one. The properties of this mathematical model of the physical problem, which usually takes the form of a set of partial differential or integrodifferential equations supplemented by a set of initial and boundary conditions, can then be analyzed in order to establish whether the mathematical problem is well posed (Gustafsson et al., 1995). If the result of this inquiry is judged satisfactory, it is possible to proceed to the actual derivation of the solution, usually with the aid of a computer.

The recourse to a computer implies, however, a further step after the modeling step described so far, namely, the reformulation of the problem in discrete terms, as a finite set of algebraic equations, which suit the number-crunching capabilities of present-day computing machines better than a set of partial differential equations does. If this discretization step is made by starting from the mathematical problem in terms of partial differential equations, the resulting procedures can logically be called numerical methods for partial differential equations. This is indeed how the finite difference (FD), finite element (FE), finite volume (FV), and many other methods are often categorized. Finally, the system of algebraic equations produced by the discretization step is solved, and the result is interpreted from the point of view of the original physical problem.

More than 30 years ago, while considering the impact of the digital computer on mathematical activity, Bellman (1968) wrote:

"Much of the mathematical analysis that was developed over the eighteenth and nineteenth centuries originated in attempts to circumvent arithmetic. With our ability to do large-scale arithmetic . . . we can employ simple, direct methods requiring much less old-fashioned mathematical training. . . . This situation by no mean implies that the mathematician has been dispossessed in mathematical physics. It does signify that he is urgently needed . . . to transform the original mathematical problems to the stage where a computer can be utilized profitably by someone with a suitable scientific training. . . . Good mathematics, like politics, is the art of the possible. Unfortunately, people quickly forget the origins of a mathematical formulation with the result that it soon acquires a life of its own. Its genealogy then protects it from scrutiny. Because the digital computer has so greatly increased our ability to do arithmetic, it is now imperative that we reexamine all the classical mathematical models of mathematical physics from the standpoints of both physical significance and feasibility of numerical solution. It may well turn out that more realistic descriptions are easier to handle conceptually and computationally with the aid of the computer." (pp. 44–45)
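As a concrete illustration of the classical path described above (a PDE turned into a finite set of algebraic equations), here is a minimal sketch, not a method taken from this work: the one-dimensional Poisson problem −u″ = f on (0, 1), with u(0) = u(1) = 0, is replaced by central finite differences on n interior nodes, which yields a tridiagonal linear system solved with the Thomas algorithm. The function name `solve_poisson_1d` and the test right-hand sides are hypothetical choices made for illustration.

```python
import math


def solve_poisson_1d(f, n):
    """Central-difference sketch for -u''(x) = f(x) on (0, 1), u(0) = u(1) = 0.

    The n interior nodes x_i = i*h, h = 1/(n + 1), turn the differential
    equation into n algebraic equations
        -u[i-1] + 2*u[i] - u[i+1] = h**2 * f(x_i),
    a tridiagonal system solved here with the Thomas algorithm.
    """
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    lower, diag, upper = -1.0, 2.0, -1.0  # constant tridiagonal coefficients
    rhs = [h * h * f(x) for x in xs]

    # Forward elimination.
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        m = diag - lower * c[i - 1]
        c[i] = upper / m
        d[i] = (rhs[i] - lower * d[i - 1]) / m

    # Back substitution.
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return xs, u


# f = 2 has the exact solution x(1 - x).
xs, u = solve_poisson_1d(lambda x: 2.0, 9)
print(max(abs(ui - x * (1 - x)) for x, ui in zip(xs, u)))

# f = pi^2 sin(pi x) has the exact solution sin(pi x); the nodal error is O(h^2).
xs, u = solve_poisson_1d(lambda x: math.pi ** 2 * math.sin(math.pi * x), 99)
print(max(abs(ui - math.sin(math.pi * x)) for x, ui in zip(xs, u)))
```

For f = 2 the first print is at round-off level, because the central second difference has zero truncation error for polynomials of degree up to three; for the sine case the error shrinks as h². Note that every step here starts from the differential equation, which is exactly the viewpoint the present work proposes to revisit.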
In this spirit, the present work describes an alternative to the classical partial differential equations–based approach to the discretization of physical field problems. This alternative is based on a preliminary reformulation of the mathematical model in a partially discrete form, which preserves as much as possible the physical and geometric content of the original problem, and is made possible by the existence and properties of a common mathematical structure of physical field theories (Tonti, 1975). The goal is to maintain the focus, both in the modeling step and in the discretization step, on the physics of the problem, thinking in terms of numerical methods for physical field problems, and not for a particular mathematical form (e.g., a partial differential equation) into which the original physical problem happens to be translated (Fig. 1).
Figure 1. The alternative paths leading from a physical field problem to a system of algebraic equations. p.d.e., partial differential equation.
The advantages of this approach are various. First, it provides a unifying viewpoint for the discretization of physical field problems, valid for a multiplicity of theories. Second, by basing the discretization of a problem on the structural properties of the theory to which it belongs, it yields discrete formulations that preserve many physically significant properties of the original problem. Finally, being based on very intuitive geometric and physical concepts, it facilitates both the analysis of existing numerical methods and the development of new ones. The present work considers both these aspects, first introducing a reference discretization strategy directly inspired by the results of the analysis of the structure of physical field theories. Then, a number of popular numerical methods for partial differential equations are considered, and their workings are compared with those of the reference strategy, in order to ascertain to what extent these methods can be interpreted as discretization methods for physical field problems. The realization of this plan requires the preliminary introduction of the basic ideas of the structural analysis of physical field theories. These ideas are simple, but unfortunately they were formalized and given physically unintuitive names at the time of their first application, within certain branches of advanced mathematics. Therefore, in applying them to other fields, one faces the dilemma of either inventing new and, one would hope, more meaningful names for these concepts, or keeping the names inherited from mathematical tradition. After some hesitation, I chose to keep the original names, to avoid a proliferation of typically ephemeral new definitions and in consideration of the fact that there can be difficult concepts, not difficult names; we must try to clarify the former, not avoid the latter (Dolcher, 1978).

The intended audience for this article is wide. On the one hand, novices to the field of numerical methods for physical field problems will find herein a framework that will help them intuitively grasp the common concepts hidden under the surface of a variety of methods and thus smooth the path to their mastery. On the other hand, the ideas presented should also prove helpful to the experienced numerical practitioner and to the researcher as additional tools for evaluating existing methods and developing new ones. Finally, it is worth remembering that the result of the discretization must also be subjected to analysis, in order to establish its properties as a new mathematical problem and to measure the effects of the discretization on the solution when compared with that of nondiscrete mathematical models. This further analysis will not be dealt with here, the emphasis being on unveiling the common discretization substratum of existing methods, whose convergence, stability, consistency, and error analyses abound in the literature.
II. Foundations

A. The Mathematical Structure of Physical Field Theories

It was mentioned in the Introduction that the approach to discretization presented in this work is based on the observation that physical field theories possess a common structure. Let us, therefore, start by explaining what we mean when we talk of the structure of a physical theory. It is a common experience that exposure to more than one physical field theory (e.g., thermal conduction and electrostatics) aids the comprehension of each single one and facilitates the quick grasping of new ones. This occurs because there are easily recognizable similarities in the mathematical formulation of theories describing different phenomena, which permit the transfer of intuition and imagery developed for more familiar cases to unfamiliar realms.∗

∗ One may say that this is the essence of explanation (i.e., the mapping of the unexplained on something that is considered obvious).

Building in a systematic way on these similarities, one can fill a correspondence table that relates physical quantities and laws playing a similar role within different theories. Usually we say that there are analogies between these theories. These analogies are often reported as a trivial, albeit useful, curiosity, but some scholars have devoted considerable effort to unveiling their origin and meaning. In their quest, these scholars have discovered that these similarities can be traced to the common geometric background upon which the "physics" is built. In the book that, building on a long tradition, took these enquiries almost to their present state, Tonti (1975) emphasized the following:
• The existence within physical theories of a natural association of many physical quantities with geometric objects in space and space–time∗
• The necessity to consider as oriented the geometric objects to which physical quantities are associated
• The existence of two kinds of orientation for these geometric objects
• The primacy and priority, in the foundation of each theory, of global physical quantities associated with geometric objects over the corresponding densities
From this set of observations there follows naturally a classification of physical quantities, based on the type and kind of orientation of the geometric object with which they are associated. The next step is the consideration of the relations holding between physical quantities within each theory. Let us call them generically the physical laws. From our point of view, the fundamental observation in this context relates to the following:
• The existence within each theory of a set of intrinsically discrete physical laws
∗ For the time being, we give the concept of oriented geometric object an intuitive meaning (points and sufficiently regular lines, surfaces, volumes, and hypervolumes, along with time instants and time intervals).

These observations can be given a graphical representation as follows. A classification diagram for physical quantities is devised, with a series of "slots" for housing physical quantities, each slot corresponding to a different kind of oriented geometric object (see Figs. 7 and 8). The slots of this diagram can be filled for a number of different theories. Physical laws will be represented in this diagram as links between the slots housing the physical quantities (see Fig. 17). The classification diagram of physical quantities, complemented by the links representing physical laws, will be called the factorization diagram of the physical field problem, to emphasize its role in singling out the terms in the governing equations of a problem according to their mathematical and physical properties. The classification and factorization diagrams will be used extensively in this work. They seem to have been first introduced by Roth (see the discussion in Bowden, 1990, who calls them Roth's diagrams). Branin (1966) used a modified version of Roth's diagrams, calling them transformation diagrams. Tonti (1975, 1976a, 1976b, 1998) refined and used these diagrams, which he called classification schemes, as the basic representational tool for the analysis of the formal structure of physical theories. We will refer here to this last version of the diagrams, which were subsequently adopted by many authors with slight graphical variations and under various names (Baldomir and Hammond, 1996; Bossavit, 1998a; Palmer and Shapiro, 1993; Oden and Reddy, 1983) and for which the name Tonti diagrams was suggested.∗

The Tonti classification and factorization diagrams are an ideal starting point for the discretization of a field problem. The association of physical quantities with geometric objects provides a rationale for constructing the discretization meshes and for associating the variables with the constituents of the meshes, whereas singling out in the diagram the intrinsically discrete terms of the field equations permits us both to render these terms directly in discrete form and to focus the discretization effort on the remaining terms. Having found this common starting point for the discretization of field problems, one might be tempted to adopt a very abstract viewpoint, based on a generic field theory, with a corresponding generic terminology and factorization diagram. However, although many problems share the same diagram structure, there are classes of theories whose diagrams differ markedly, and consequently a generic diagram would be either too simple to encompass all the cases or too complicated to work with. For this reason we are going to proceed in concrete terms, selecting a model field theory and referring mainly to it, in the belief that this can aid intuition, even if the reader's main interest is in a different field.
Considering the focus of the series in which this article appears, electromagnetism was selected as the model theory. Readers with another background can easily translate what follows by comparing the factorization diagram for electromagnetism with that of the theory they are interested in. To give a feeling of what is required for the development of the factorization diagram for other theories, we discuss the case of heat transfer, taken as representative of a class of scalar transport equations. It must be said that some issues relating to the factorization diagrams and the mathematical structure of physical theories remain to be clarified. This is true in particular for some issues concerning the position of energy quantities within the diagrams and the role of orientation with reference to time. Luckily, this touches only marginally on the application of the theory to the discretization of physical problems aimed at their numerical solution.

∗ In fact, the diagrams used in this work (and in Mattiussi, 1997) differ from those originally conceived by Tonti in admitting only cochains within the slots, whereas the latter had chains in some slots and cochains in others (depending on the kind of orientation of the underlying geometric object). This difference reflects our advocating the use of the chain–cochain pair to distinguish the discrete representation of the geometry (which is always made in terms of chains) from that of the fields (which is always based on cochains).
B. Geometric Objects and Orientation

The concept of geometric object is ubiquitous in physical field theories. For example, in the theory of thermal conduction the heat balance equation links the difference between the amounts of heat contained inside a volume V at the initial and final time instants Ti and Tf of a time interval I to the heat flowing through the surface S, which is the boundary of V, and to the heat produced or absorbed within the volume during the time interval. In this case, V and S are geometric objects in space, whereas I, Ti, and Tf are geometric objects in time. The combination of a space object and a time object (e.g., the surface S considered during the time interval I, or the volume V at the time instant Ti or Tf) gives a space–time geometric object. These examples show that by "geometric object" we mean the points and the sufficiently well-behaved lines, surfaces, volumes, and hypervolumes contained in the domain of the problem, and their combinations with time instants and time intervals. This somewhat vague definition will be replaced later by the more detailed concept of the p-dimensional cell. The preceding example also shows that each mention of an object comes with a reference to its orientation. To write the heat balance equation, we must specify whether the heat flowing out of a volume or that flowing into it is to be considered positive. This corresponds to the selection of a preferred direction through the surface.∗ Once this direction is chosen, the surface is said to have been given external orientation, where the qualifier "external" hints at the fact that the orientation is specified by means of an arrow that does not lie on the surface. Correspondingly, we will call internal orientation of a surface the orientation specified by an arrow that lies on the surface and indicates a sense of rotation on it (Fig. 2).
Note that the idea of internal orientation for surfaces is seldom mentioned in physics but is very common in everyday objects and in mathematics (Schutz, 1980). For example, a knob that must be rotated counterclockwise to produce a certain effect is usually designed with a suitable curved arrow drawn on its surface, and in plane affine geometry, the ordering of the coordinate axes corresponds to the choice of a sense of rotation on the plane and defines the orientation of the space.

∗ Of course, it must be possible to assign such a direction consistently, which is true if the geometric object is orientable (Schutz, 1980), as we will always suppose to be the case. Once the selection is made, the object acquires a new status. As pointed out by MacLane (1986): "A plane with orientation is really not the same object as one without. The plane with an orientation has more structure—namely, the choice of the orientation" (p. 84).
Figure 2. (a) External and (b) internal orientations for surfaces.
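The heat balance of the opening example can be sketched in discrete form to show how a chosen arrow through each surface fixes the signs. In this minimal sketch (a hypothetical one-dimensional chain of cells with constant rates during one time interval; the names `heat_balance_step` and `boundary_inflow` are illustrative assumptions), each face stores a heat flow rate counted positive in the +x direction, which is the arrow that externally orients it. Cell k, lying between faces k and k + 1, gains the flow through its left face, loses the flow through its right face, and adds its own production.

```python
def heat_balance_step(q, face_flux, production, dt):
    """Advance the heat content of a 1D chain of cells over one interval.

    q[k]          heat content of cell k at the initial instant
    face_flux[j]  heat flow rate through face j, positive in the +x
                  direction (the arrow that externally orients face j)
    production[k] heat production rate inside cell k
    Cell k lies between faces k and k + 1, so it receives face_flux[k]
    and loses face_flux[k + 1]. Rates are assumed constant over dt.
    """
    n = len(q)
    if len(face_flux) != n + 1 or len(production) != n:
        raise ValueError("need n + 1 faces and n production values")
    return [q[k] + dt * (face_flux[k] - face_flux[k + 1] + production[k])
            for k in range(n)]


def boundary_inflow(face_flux):
    """Net heat flow rate entering the whole chain through its end faces."""
    return face_flux[0] - face_flux[-1]


q = [1.0, 2.0, 3.0]
flux = [0.5, 0.2, -0.1, 0.3]
prod = [0.0, 0.1, 0.0]
dt = 0.5

q_new = heat_balance_step(q, flux, prod, dt)
# Interior faces cancel in pairs: the total changes only through the two
# end faces and the production.
change = sum(q_new) - sum(q)
expected = dt * (boundary_inflow(flux) + sum(prod))
print(q_new, abs(change - expected) < 1e-12)
```

Summing the cell balances, each interior face appears once with a plus sign and once with a minus sign, so the balance of the union of the cells follows exactly from the balances of its parts. Reversing the arrow on a face would only change the sign of the number stored for it, as expected of a global quantity associated with an externally oriented surface.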
In fact, all geometric objects can be endowed with two kinds of orientation but, for historical reasons, almost no mention of this distinction survives in physics.∗ Since both kinds of orientation are needed in physics, we will show how to build the complete orientation apparatus. We will start with internal orientation, using the preceding affine geometry example as inspiration. An n-dimensional affine space is oriented by fixing an order of the coordinate axes: in the three-dimensional case this corresponds to the choice of a screw-sense, or that of a vortex; in the two-dimensional case, to the choice of a sense of rotation on the plane; and in the one-dimensional case, to the choice of a sense (an arrow) along the line. These images can be extended to geometric objects. Therefore, the internal orientation of a volume is given by a screw-sense; that of a surface, by a sense of rotation on it; and that of a line, by a sense along it (see Fig. 5). Before we proceed further, it is instructive to consider an example of a physical quantity that, contrary to common belief, is associated with internally oriented surfaces: the magnetic flux φ. This association is a consequence of the requirement that Maxwell's equations be invariant under improper coordinate transformations, that is, those that invert the orientation of space, transforming a right-handed reference system into a left-handed one. Imagine an experimental setup to probe Faraday's law, for example, by verifying the link between the magnetic flux φ "through" a disk S and the circulation U of the electric field intensity E around the loop Γ which is the border of S. If we suppose, as is usually the case, that the sign of φ is determined by a direction through the disk, and that of U by the choice of a sense around the loop, a mirror reflection through a plane parallel to the disk axis changes the sign of U but not that of φ.
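The mirror argument can be checked numerically. The sketch below is an illustration added here, not part of the original argument, and it deliberately uses the right-hand rule: it computes the right-hand-rule vector area of a polygonal loop whose vertex order gives the sense around the loop, and tests whether that vector agrees with a chosen direction through the disk. Reflecting both the loop and the direction in the plane x = 0, which contains the disk axis, flips the answer. The helper names are hypothetical.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def vector_area(loop):
    """Right-hand-rule vector area of a closed polygon.

    The vertex order encodes the sense around the loop (an internal
    orientation of the curve); the result is 1/2 * sum_i p_i x p_{i+1}.
    """
    total = (0.0, 0.0, 0.0)
    for p, q in zip(loop, loop[1:] + loop[:1]):
        total = tuple(t + c for t, c in zip(total, cross(p, q)))
    return tuple(0.5 * t for t in total)


def reflect_x(v):
    """Mirror reflection in the plane x = 0 (an improper transformation)."""
    return (-v[0], v[1], v[2])


def right_hand_pairing(loop, through):
    """+1 if the sense around the loop and the direction through the disk
    are related by the right-hand rule, -1 otherwise."""
    return 1 if dot(vector_area(loop), through) > 0 else -1


# Square loop in the plane z = 0, traversed counterclockwise seen from +z;
# the disk axis is the z-axis and the direction through the disk is +z.
loop = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
through = (0, 0, 1)

mirrored_loop = [reflect_x(p) for p in loop]  # same vertex order, mirrored
mirrored_through = reflect_x(through)         # an arrow along the axis: unchanged

print(right_hand_pairing(loop, through))                    # 1
print(right_hand_pairing(mirrored_loop, mirrored_through))  # -1
```

Nothing physical changes under the reflection; the flip only shows that the right-hand rule pairs a sense around Γ with a direction through S in a way that an improper transformation does not preserve.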
Usually the incongruence is avoided by using the right-hand rule to define B and invoking for it the status of axial vector (Jackson, 1975). In other words, we are told that for space reflections, the sense of the "arrow" of the B vector does not count; only the right-hand rule does. It is, however, apparent that for the invariance of Faraday's law to hold without such tricks, all we have to do is either to associate φ with internally oriented surfaces and U with internally oriented lines, or to associate φ with externally oriented surfaces and U with lines oriented by a sense of rotation around them (i.e., externally oriented lines, as will soon be clear). Since the effects of an electric field act along the field lines and not around them, the first option seems preferable (Schouten, 1989; Fig. 3). This example shows that the need for the right-hand rule is a consequence of our disregarding the existence of two kinds of orientation. This attitude seems reasonable in physics because we have become accustomed to it in the course of our education, but consider that if it were applied systematically to everyday objects, we would be forced to glue an arrow pointing outward from the aforementioned knob and to accompany it with a description of the right-hand rule. Note also that the difficulties in the classical formulation of Faraday's law stem from the impossibility of comparing the orientation of the surface directly with that of its boundary when the surface is externally oriented and the bounding line is internally oriented. In this case, "directly" means "without recourse to the right-hand rule" or similar tricks. The possibility of making this direct comparison is fundamental for the correct statement of many physical laws. This comparison is based on the idea of an orientation induced by an object on its boundary. For example, the sense of rotation that internally orients a surface induces a sense along its bounding curve, which can be compared directly with the sense that internally orients that curve. The same is true for the internal orientation of volumes and of their bounding surfaces.

∗ However, Maxwell (1871), for example, was well aware of the necessity, within the context of electromagnetism, of at least four kinds of mathematical entities for the correct representation of the electromagnetic field (entities referred to lines or to surfaces and endowed with internal or with external orientation).

Figure 3. Orientational issues in Faraday's law. The intervention of the right-hand rule, required in the classical version (a), can be avoided by endowing both geometric objects Γ and S with the same kind of orientation (b).
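The induced orientation just described can be made computable with the standard simplicial boundary operator of algebraic topology, used here as an illustrative assumption rather than as the author's construction. In this sketch an internally oriented simplex is an ordered tuple of vertex labels, orderings differing by an even permutation count as the same orientation, and each face receives the induced orientation through the alternating sign (−1)^i. The function names are hypothetical.

```python
def permutation_sign(seq):
    """+1 if seq is an even permutation of its sorted order, -1 if odd
    (parity computed by counting inversions)."""
    sign = 1
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            if seq[i] > seq[j]:
                sign = -sign
    return sign


def chain_from(simplex):
    """Chain {sorted vertex tuple: +1 or -1} for one ordered (oriented) simplex."""
    return {tuple(sorted(simplex)): permutation_sign(simplex)}


def boundary(chain):
    """Boundary of a chain {sorted vertex tuple: integer coefficient}.

    Face i of [v0, ..., vp] (vertex v_i omitted) receives the induced
    orientation (-1)**i; 0-simplices (points) have empty boundary here.
    Keys must be sorted tuples, so every face is sorted as well.
    """
    result = {}
    for simplex, coeff in chain.items():
        if len(simplex) < 2:
            continue
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]
            result[face] = result.get(face, 0) + coeff * (-1) ** i
    return {face: c for face, c in result.items() if c != 0}


edge = chain_from((0, 1))
triangle = chain_from((0, 1, 2))
flipped = chain_from((0, 2, 1))  # odd permutation: opposite orientation

print(boundary(edge))                # {(1,): 1, (0,): -1}
print(boundary(triangle))            # {(1, 2): 1, (0, 2): -1, (0, 1): 1}
print(boundary(boundary(triangle)))  # {}
```

For the edge [0, 1], the end point and the start point receive opposite signs, the two point orientations a line induces on its boundary. The triangle's boundary [0, 1] + [1, 2] − [0, 2] is a closed circulation, reversing the vertex order reverses every induced orientation, and the boundary of a boundary is always empty.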
The reader can check that direct comparison is indeed possible if the object and its boundary are both endowed with internal orientation as defined previously for volumes, surfaces, and lines. However, this raises an interesting issue, since our list of internally oriented objects does not so far include points, which nevertheless form the boundary of a line. To make internal orientation a coherent system, we must, therefore, define internal orientations for points (as in algebra we extend the definition of the nth power of a number to include the case n = 0). This can be done by means of a pair of symbols meaning "inward" and "outward" (e.g., defining the point as a sink or a source, or drawing arrows pointing inward or outward), for these images are directly comparable with the internal orientation of a line that starts or ends at the point (Fig. 4). This completes our definition of internal orientation for geometric objects in three-dimensional space, which we will indicate with the terms P, L, S, and V.

Figure 4. Each internally oriented geometric object induces an internal orientation on the objects that constitute its boundary.

Let us now tackle the definition of external orientation for the same objects. We said before that in three-dimensional space the external orientation of a surface is given by specifying what turns out to be the internal orientation of a line that does not lie on the surface. This is a particular case of the very definition of external orientation: in an n-dimensional space, the external orientation of a p-dimensional object is specified by the internal orientation of a dual (n − p)-dimensional geometric object (Schouten, 1989). Hence, in three-dimensional space, external orientation for a volume is specified by an inward or outward symbol; for a surface, by a direction through it; for a line, by a sense of rotation around it; and for a point, by the choice of a screw-sense. To distinguish internally oriented objects from externally oriented ones, we will add a tilde to the terms for the latter, thus writing P̃, L̃, S̃, and Ṽ for externally oriented points, lines, surfaces, and volumes, respectively (Fig. 5).

The definition of external orientation in terms of internal orientation has many consequences. First, contrary to internal orientation, which is a combinatorial concept∗ and does not change when the dimension of the embedding space varies, external orientation depends on the dimension. For example, external orientation for a line in two-dimensional space is assigned by a direction through it and not around it as in three-dimensional space.∗ Another consequence is the inheritance from internal orientation of the possibility of comparing the orientation of an object with that of its boundary, when both are endowed with external orientation. This involves once again the concept of induced orientation, applied in this case to externally oriented objects (Fig. 6). The duality of internal and external orientation gives rise to another important pairing, that between dual geometric objects; that is, between pairs of geometric objects that in an n-dimensional space have dimensions p and (n − p), respectively, and have different kinds of orientation (Fig. 5). Note that in this case, too, the orientations of the objects paired by the duality can be directly compared. However, contrary to what happens for a geometric object and its boundary, the objects have different kinds of orientation. In the context of the mathematical structure of physical theories, this duality plays an important role; for example, it is used in the definition of energy quantities and it accounts for some important adjointness relationships between differential operators.

∗ For example, a line can be internally oriented by selecting a permutation class (an ordering) of two distinct points on it; these become three nonaligned points for a surface, four noncoplanar points for a volume, and so on.
∗ Note, however, that the former can be considered the "projection" onto the surface of the latter.

Figure 5. (a) Internal and (b) external orientations for geometric objects in three-dimensional space. The disposition of objects reflects the pairing of reciprocally dual geometric objects.

Figure 6. Each externally oriented geometric object induces an external orientation on the objects that constitute its boundary.

We now have at our disposal all the elements required for the construction of a first version (referring to the objects of three-dimensional space) of the classification diagram of physical quantities. As anticipated, it consists of a series of slots for housing physical quantities, each slot corresponding to an oriented geometric object. To represent graphically the distinction between internal and external orientation, the slots of the diagram are subdivided between two columns. To reflect the important relationship represented by duality, these two columns (for internal and external orientation, respectively) are reversed with respect to each other, which makes dual objects row-adjacent (Fig. 7).

Figure 7. The Tonti classification diagram of physical quantities in three-dimensional space. Each slot is referred to an oriented geometric object; that is, points P, lines L, surfaces S, and volumes V. The left column is devoted to internally oriented objects, and the right column to externally oriented ones. The slots are paired horizontally so as to reflect the duality of the corresponding objects.

1. Space–Time Objects

In the heat balance example that opens this section, it was shown how geometric objects in space, time, and space–time make their appearance in the foundation of a physical theory. Until now, we have focused on objects in space; let us extend our analysis to space–time objects. If we adopt a strict space–time viewpoint, that is, if we consider space and time as one and our objects as p-dimensional objects in a generic four-dimensional space, the extension from space to space–time requires only that we apply to the four-dimensional case the definitions given previously for oriented geometric objects. However, one cannot deny that in all practical cases (i.e., if a reference frame has to be meaningful for an actual observer) the time coordinate is clearly distinguishable from the spatial coordinates. Therefore, it seems advisable to consider, in addition to space–time objects per se, the space–time objects considered as Cartesian products of a space object by a time object. Let us list these products. Time can house zero- and one-dimensional geometric objects: time instants T and time intervals I. We can combine these time objects with the four space objects: points P, lines L, surfaces S, and volumes V. We thus obtain eight combinations that, considering the two kinds of orientation they can be endowed with, give rise to the 16 slots of the space–time classification diagram of physical quantities (Tonti, 1976b; Fig. 8). Note that the eight combinations correspond, in fact, to five space–time geometric objects (e.g., a space–time volume can be obtained as a volume in space considered at a time instant, that is, as the topological product V × T, or as a surface in space considered during a time interval, which corresponds to S × I). This is reflected within the diagram by the sharing of a single slot by the combinations corresponding to the same oriented space–time object. To distinguish space–time objects from merely spatial ones, we will use the symbols 𝒫, ℒ, 𝒮, 𝒱, and ℋ for the former and the symbols P, L, S, and V for the latter. As usual, a tilde will signal external orientation.
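The bookkeeping behind Figs. 7 and 8 can be sketched in a few lines. The sketch below uses a hypothetical encoding (space and time objects indexed by dimension, with a trailing "~" standing for the tilde of external orientation). It builds the rows of the classification diagram by pairing each internally oriented p-object with its externally oriented dual (n − p)-object, and it groups the eight space × time products by the dimension of the resulting space–time object.

```python
from itertools import product

# Space objects indexed by dimension: points, lines, surfaces, volumes.
SPACE = ["P", "L", "S", "V"]
# Time objects indexed by dimension: instants and intervals.
TIME = ["T", "I"]


def classification_rows(n=3):
    """Rows of the n-dimensional classification diagram (n <= 3).

    Row p pairs the internally oriented p-dimensional object with the
    externally oriented (n - p)-dimensional object, its dual; a trailing
    "~" marks external orientation.
    """
    return [(SPACE[p], SPACE[n - p] + "~") for p in range(n + 1)]


def space_time_products():
    """Group the products (space object) x (time object) by the dimension
    of the space-time object they produce."""
    groups = {}
    for (ds, s), (dt, t) in product(enumerate(SPACE), enumerate(TIME)):
        groups.setdefault(ds + dt, []).append(f"{s}×{t}")
    return groups


for internal, external in classification_rows():
    print(f"{internal:>2} | {external}")

for dim, combos in sorted(space_time_products().items()):
    print(dim, combos)
```

The second loop shows that S × I and V × T both yield a space–time volume, as noted in the text, and that the eight products collapse onto five space–time objects. With n = 2, the same pairing makes a line dual to an externally oriented line, consistent with a direction through it in two-dimensional space.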
Figure 8. The Tonti space–time classification diagram of physical quantities. Each slot is referred to an oriented space–time geometric object, which is thought of as obtained in terms of a product of an object in space by an object in time. The space objects are those of Figure 7. The time objects are time instants T and time intervals I. This diagram can be redrawn with the slots referring to generic space–time geometric objects; that is, points 𝒫, lines ℒ, surfaces 𝒮, volumes 𝒱, and hypervolumes ℋ (see Fig. 11).
C. Physical Laws and Physical Quantities In the previous sections, we have implicitly defined a physical quantity (the heat content, the heat flow, and the heat production, in the heat transfer example) as an entity appearing within a physical field theory, which is associated with one (and only one) kind of oriented geometric object. Strictly speaking, the individuation within a physical theory of the actual physical quantities and the attribution of the correct association with oriented geometric objects should be based on an analysis of the formal properties of the mathematical entities that appear in the theory (e.g., considering the dimensional properties of those entities and their behavior with respect to coordinate transformations). Given that formal analyses of this kind are available in the literature (Post, 1997; Schouten, 1989; Truesdell and Toupin, 1960), the approach within the present work will be more relaxed. To fill in the classification diagram of the physical quantities of a theory, we will look first at the integrals which appear within the theory, focusing our attention on the integration domains in space and time. This will give us a hint about the geometric object that a quantity is associated with. The attribution of orientation to these objects will be based on heuristic considerations deriving from the following fundamental property: the sign of a global quantity associated with a geometric object changes when
CLAUDIO MATTIUSSI
the orientation of the object is inverted. Further hints can be drawn from physical effects and from the presence of the right-hand rule in the traditional definition of a quantity, as well as from the global coherence of the orientation system thus defined. The reader can find in Tonti (1975) an analysis based on a similar rationale, applied to a large number of theories, accompanied by the corresponding classification and factorization diagrams.

1. Local and Global Quantities

By their very definition, our physical quantities are global quantities, for they are associated with macroscopic space–time domains. This complies with the fact that actual field measurements are always performed on domains having finite extension. When local quantities (densities and rates) can be defined, it is natural to make them inherit the association with the oriented geometric object of the corresponding global quantity. However, the familiar tools of vector analysis do not allow this association to be represented. This causes a loss of information in the transition from the global to the local representation when ordinary scalars and vectors are used. For example, the representation of magnetic flux density by the vector field B gives no hint of internally oriented surfaces, nor can an association with externally oriented volumes be derived from the representation of charge density by the scalar field ρ. Usually the association with geometric objects (but not the distinction between internal and external orientations) is reinserted when writing integral relations, by means of the “differential term,” so that we write, for example,

$$\int_S \mathbf{B} \cdot d\mathbf{s} \tag{1}$$

and

$$\int_V \rho \, dv \tag{2}$$
However, given the presence of the integration domains S and V, which accompany the integration signs, the terms ds and dv look redundant. It would be better to use a mathematical representation that refers directly to the oriented geometric object that a quantity is associated with. Such a representation exists within the formalism of ordinary and twisted differential forms (Burke, 1985; de Rham, 1931). Within this formalism, the vector field B becomes an ordinary 2-form $b^2$ and the scalar field ρ a twisted 3-form $\tilde{\rho}^3$, as follows:

$$\mathbf{B} \Rightarrow b^2 \tag{3}$$

$$\rho \Rightarrow \tilde{\rho}^3 \tag{4}$$
The symbols $b^2$ and $\tilde{\rho}^3$ explicitly refer to the fact that magnetic induction and charge density are associated with (and can be integrated only on) internally oriented two-dimensional domains and externally oriented three-dimensional domains, respectively. Thus, everything seems to conspire for an early adoption of a representation in terms of differential forms. We prefer, however, to delay this step in order to show first how this continuous representation tool can be founded on discrete concepts. Until suitable discrete concepts are available, we will temporarily stick to the classical tools of vector calculus. In the meantime, the only concession to the differential-form spirit will be the systematic dropping of the “differential” under the integral sign, so that we write, for example,

$$\int_S \mathbf{B} \tag{5}$$

and

$$\int_{\tilde V} \rho \tag{6}$$
instead of Eqs. (1) and (2).

2. Equations

After the introduction of the concept of oriented geometric objects, the next step would ideally be the discussion of the association of the physical quantities of the field theory (in our case, electromagnetism) with the objects. This would parallel the typical development of physical theories, in which the discovery of quantities upon which the phenomena of the theory may be conceived to depend precedes the development of the mathematical relations that link those quantities in the theory (Maxwell, 1871). It turns out, however, that the establishment of the association between physical quantities and geometric objects is based on the analysis of the equations appearing in the theory itself. In particular, it is expedient to list all pertinent equations for the problem considered and to isolate a subset of them that represent physical laws lending themselves naturally to a discrete rendering, for these clearly expose the correct association. We start, therefore, by listing the equations of electromagnetism. We will first give a local rendition of all the equations, even of those that will eventually turn out to have an intrinsically discrete nature, since this is the form that is typically considered in mathematical physics. The first pair of electromagnetic equations that we consider represents in local form Gauss’s law for magnetic flux [Eq. (7)] and Faraday’s induction law [Eq. (8)]:

$$\operatorname{div} \mathbf{B} = 0 \tag{7}$$

$$\operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0 \tag{8}$$

where B is the magnetic flux density and E is the electric field intensity. We will show next that these equations have a counterpart in the law of charge conservation [Eq. (9)]:

$$\operatorname{div} \mathbf{J} + \frac{\partial \rho}{\partial t} = 0 \tag{9}$$

where J is the electric current density and ρ is the electric charge density. Similarly, Eqs. (10) and (11), which define the scalar potential V and the vector potential A,

$$\operatorname{curl} \mathbf{A} = \mathbf{B} \tag{10}$$

$$-\operatorname{grad} V - \frac{\partial \mathbf{A}}{\partial t} = \mathbf{E} \tag{11}$$

are paralleled by Gauss’s law of electrostatics [Eq. (12)] and Maxwell–Ampère’s law [Eq. (13)], which close the list of differential statements (D is the electric flux density and H is the magnetic field intensity):

$$\operatorname{div} \mathbf{D} = \rho \tag{12}$$

$$\operatorname{curl} \mathbf{H} - \frac{\partial \mathbf{D}}{\partial t} = \mathbf{J} \tag{13}$$

Finally, we have a list of constitutive equations. A very general form for the case of electromagnetism, accounting for most material behaviors, is

$$\mathbf{D}(\mathbf{r}, t) = \int_{t_0}^{t} \int_{\mathcal{D}} F_\varepsilon(\mathbf{E}, \mathbf{r}', \tau) \tag{14}$$

$$\mathbf{B}(\mathbf{r}, t) = \int_{t_0}^{t} \int_{\mathcal{D}} F_\mu(\mathbf{H}, \mathbf{r}', \tau) \tag{15}$$

$$\mathbf{J}(\mathbf{r}, t) = \int_{t_0}^{t} \int_{\mathcal{D}} F_\sigma(\mathbf{E}, \mathbf{r}', \tau) \tag{16}$$
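but, typically, the purely local relations

$$\mathbf{D}(\mathbf{r}, t) = f_\varepsilon(\mathbf{E}, \mathbf{r}, t) \tag{17}$$

$$\mathbf{B}(\mathbf{r}, t) = f_\mu(\mathbf{H}, \mathbf{r}, t) \tag{18}$$

$$\mathbf{J}(\mathbf{r}, t) = f_\sigma(\mathbf{E}, \mathbf{r}, t) \tag{19}$$

or the even simpler relations

$$\mathbf{D}(\mathbf{r}) = \varepsilon(\mathbf{r})\,\mathbf{E}(\mathbf{r}) \tag{20}$$

$$\mathbf{B}(\mathbf{r}) = \mu(\mathbf{r})\,\mathbf{H}(\mathbf{r}) \tag{21}$$

$$\mathbf{J}(\mathbf{r}) = \sigma(\mathbf{r})\,\mathbf{E}(\mathbf{r}) \tag{22}$$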
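adequately represent most actual material behaviors.

We will now consider all these equations, aiming at their exact rendering in terms of global quantities. Integrating Eqs. (7) through (13) on suitable spatial domains, writing $\partial\mathcal{D}$ for the boundary of a domain $\mathcal{D}$, and making use of Gauss’s divergence theorem and Stokes’s theorem, we obtain the following integral expressions:

$$\int_{\partial V} \mathbf{B} = \int_V 0 \tag{23}$$

$$\int_{\partial S} \mathbf{E} + \frac{d}{dt}\int_S \mathbf{B} = \int_S 0 \tag{24}$$

$$\int_{\partial \tilde V} \mathbf{J} + \frac{d}{dt}\int_{\tilde V} \rho = \int_{\tilde V} 0 \tag{25}$$

$$\int_{\partial S} \mathbf{A} = \int_S \mathbf{B} \tag{26}$$

$$-\int_{\partial L} V - \frac{d}{dt}\int_L \mathbf{A} = \int_L \mathbf{E} \tag{27}$$

$$\int_{\partial \tilde V} \mathbf{D} = \int_{\tilde V} \rho \tag{28}$$

$$\int_{\partial \tilde S} \mathbf{H} - \frac{d}{dt}\int_{\tilde S} \mathbf{D} = \int_{\tilde S} \mathbf{J} \tag{29}$$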
Note that in Eqs. (23), (24), and (25) we have integrated the null term on the right-hand side. This was done in consideration of the fact that the corresponding equations assert the vanishing of some kind of physical quantity, and we must investigate what kind of association it has. Moreover, in Eqs. (25), (28), and (29) we added a tilde to the symbol of the integration domains. These are the domains which will turn out later to have external orientation.
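As a quick numerical sanity check of the step from the local Eq. (10) to the global Eq. (26), the sketch below compares the circulation of a vector potential around a circle with the flux of its curl through the enclosed disk. The potential A = (−y³/3, x³/3, 0), for which curl A = (0, 0, x² + y²), is an arbitrary choice made for illustration, and the function names are hypothetical; both integrals should approach πr⁴/2.

```python
import math


def circulation_of_A(r, n=64):
    """Counterclockwise line integral of A = (-y**3/3, x**3/3, 0) around the
    circle of radius r (the boundary of the disk with its induced orientation)."""
    dt = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt                      # midpoint rule in the angle
        x, y = r * math.cos(t), r * math.sin(t)
        dx, dy = -r * math.sin(t) * dt, r * math.cos(t) * dt
        total += (-y**3 / 3.0) * dx + (x**3 / 3.0) * dy
    return total


def flux_of_curl_A(r, n=2000):
    """Flux of B = curl A = (0, 0, x**2 + y**2) through the disk of radius r,
    oriented by the +z normal. In polar coordinates B_z = rho**2, so the flux
    is 2*pi times the integral of rho**3 from 0 to r (midpoint rule)."""
    h = r / n
    return 2.0 * math.pi * sum(((i + 0.5) * h) ** 3 * h for i in range(n))


if __name__ == "__main__":
    r = 1.0
    print(circulation_of_A(r), flux_of_curl_A(r), math.pi * r**4 / 2)
```

The angular integrand is a low-degree trigonometric polynomial, so the equally spaced rule reproduces the circulation to rounding error; the radial quadrature converges quadratically in the step size.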
In Eqs. (24), (25), (27), and (29) a time derivative remains. A further integration can be performed on a time interval I = [T1, T2] as a way to eliminate this residual derivative. For example, Eq. (24) becomes
$$\int_{T_1}^{T_2}\int_{\partial S} \mathbf{E} + \left[\int_S \mathbf{B}\right]_{T_1}^{T_2} = \int_{T_1}^{T_2}\int_S 0 \tag{30}$$
We adopt a more compact notation, which uses I for the time interval. Moreover, we will consider as an “integral on time instants” a term evaluated at that instant, according to the following symbolism:

$$\int_T \int_S (\cdot) \overset{\text{def}}{=} \left.\int_S (\cdot)\right|_T \tag{31}$$
Correspondingly, since the initial and final instants of a time interval I are actually the boundary ∂I of I, we write boundary terms as follows:

$$\int_{\partial I} \int_S (\cdot) \overset{\text{def}}{=} \left[\int_S (\cdot)\right]_{T_1}^{T_2} \tag{32}$$
Remark II.1 The boundary of an oriented geometric object is constituted by its faces endowed with the induced orientation (Figs. 4 and 6). For the case of a time interval I = [T1, T2], the faces that appear in the boundary ∂I correspond to the two time instants T1 and T2. If the time interval I is internally oriented in the direction of increasing time, T1 appears in ∂I oriented as a source, whereas T2 appears in it oriented as a sink. However, as time instants, T1 and T2 are endowed with a default orientation of their own. Let us assume that the default internal orientation of all time instants is as sinks; it follows that ∂I is constituted by T2 taken with its default orientation and by T1 taken with the opposite of its default orientation. We can express this fact symbolically, writing ∂I = T2 − T1, where the “minus” sign signals the inversion of the orientation of T1. Correspondingly, if there is a quantity Q associated with the time instants, and Q1 and Q2 are associated with T1 and T2, respectively, the quantity Q2 − Q1 will be associated with ∂I. We will give these facts a more precise formulation later, using the concepts of chain and cochain. For now, this example gives a first idea of the key role played by the orientation of space–time geometric objects in a number of common mathematical operations, such as the increment of a quantity and the fact that an expression like $\int_{T_1}^{T_2} df$ corresponds to $(f|_{T_2} - f|_{T_1})$ and not to its opposite. In this context, we alert the reader to the fact that if the time axis is externally oriented, it is the time instants that are oriented by means of a (through) direction, whereas the time intervals themselves are oriented as sources or sinks.
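The bookkeeping of Remark II.1 can be mimicked in a few lines. In the minimal sketch below (all names are hypothetical, and the representation is our own, anticipating the chain and cochain concepts only informally), a chain on the time axis is a dictionary mapping time instants to integer coefficients, a cochain is a dictionary mapping instants to values of a quantity Q, and evaluating a cochain on a chain yields the associated global value. With the default sink orientation for instants, ∂I = T2 − T1 yields Q2 − Q1, reversing the orientation of I flips the sign, and the instant shared by two adjacent intervals cancels.

```python
from collections import defaultdict


def interval_boundary(t1, t2):
    """0-chain of the boundary of I = [t1, t2], internally oriented toward
    increasing time. Instants carry the default 'sink' orientation, so the
    boundary is T2 - T1: coefficient +1 on t2 and -1 on t1."""
    return {t2: 1, t1: -1}


def reverse(chain):
    """The same cells taken with the opposite orientation."""
    return {cell: -coeff for cell, coeff in chain.items()}


def add_chains(*chains):
    """Formal sum of chains; cells whose coefficients cancel are dropped."""
    total = defaultdict(int)
    for chain in chains:
        for cell, coeff in chain.items():
            total[cell] += coeff
    return {cell: coeff for cell, coeff in total.items() if coeff != 0}


def evaluate(cochain, chain):
    """Global value of a cochain (one value per instant) on a chain."""
    return sum(coeff * cochain[cell] for cell, coeff in chain.items())


if __name__ == "__main__":
    Q = {0: 1.5, 1: 4.0, 2: 3.0}   # hypothetical values of Q at T = 0, 1, 2
    print(evaluate(Q, interval_boundary(0, 1)))           # Q2 - Q1
    print(evaluate(Q, reverse(interval_boundary(0, 1))))  # opposite sign
    print(add_chains(interval_boundary(0, 1), interval_boundary(1, 2)))
```

Run as a script, this prints 2.5, -2.5, and {0: -1, 2: 1}; the last line is the boundary of [0, 2], since the shared instant T = 1 drops out.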
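With these definitions [Eqs. (31) and (32)], Eqs. (23) through (29) become

$$\int_T \int_{\partial V} \mathbf{B} = \int_T \int_V 0 \tag{33}$$

$$\int_I \int_{\partial S} \mathbf{E} + \int_{\partial I} \int_S \mathbf{B} = \int_I \int_S 0 \tag{34}$$

$$\int_{\tilde I} \int_{\partial \tilde V} \mathbf{J} + \int_{\partial \tilde I} \int_{\tilde V} \rho = \int_{\tilde I} \int_{\tilde V} 0 \tag{35}$$

$$\int_T \int_{\partial S} \mathbf{A} = \int_T \int_S \mathbf{B} \tag{36}$$

$$-\int_I \int_{\partial L} V - \int_{\partial I} \int_L \mathbf{A} = \int_I \int_L \mathbf{E} \tag{37}$$

$$\int_{\tilde T} \int_{\partial \tilde V} \mathbf{D} = \int_{\tilde T} \int_{\tilde V} \rho \tag{38}$$

$$\int_{\tilde I} \int_{\partial \tilde S} \mathbf{H} - \int_{\partial \tilde I} \int_{\tilde S} \mathbf{D} = \int_{\tilde I} \int_{\tilde S} \mathbf{J} \tag{39}$$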
The equations in this form can be used to determine the correct association of physical quantities with geometric objects.

D. Classification of Physical Quantities

In Eqs. (33) through (39), we can identify a number of recurrent terms and deduce from them an association of physical quantities with geometric objects. From Eqs. (33) and (34) we get

$$\int_I \int_L \mathbf{E} \Rightarrow (L \times I) \tag{40}$$

$$\int_T \int_S \mathbf{B} \Rightarrow (S \times T) \tag{41}$$
where the arrow means “is associated with.” The term in Eq. (41) confirms the association of magnetic induction with surfaces and suggests a further one with time instants, whereas Eq. (40) shows that the electric field is associated with lines and time intervals. These geometric objects are endowed with internal orientation, as follows from the analysis made previously for the orientational issues in Faraday’s law.
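The status of electric current and charge as physical quantities can be deduced from Eq. (35), which gives the terms

$$\int_{\tilde I} \int_{\tilde S} \mathbf{J} \Rightarrow (\tilde S \times \tilde I) \tag{42}$$

$$\int_{\tilde T} \int_{\tilde V} \rho \Rightarrow (\tilde V \times \tilde T) \tag{43}$$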
which show that electric current is associated with surfaces and time intervals, whereas charge is associated with volumes and time instants. Since the current is due to a flow of charges through the surface, a natural external orientation for surfaces follows. Given this association of electric current with externally oriented surfaces, the volumes with which charge content is associated must also be externally oriented to permit direct comparison of the signs of the quantities in Eq. (35). The same rationale can be applied to the terms appearing in Eqs. (38) and (39); that is,

$$\int_{\tilde I} \int_{\tilde L} \mathbf{H} \Rightarrow (\tilde L \times \tilde I) \tag{44}$$

$$\int_{\tilde T} \int_{\tilde S} \mathbf{D} \Rightarrow (\tilde S \times \tilde T) \tag{45}$$
This shows that the magnetic field is associated with lines and time intervals and the electric displacement with surfaces and time instants. As for orientation, the magnetic field is traditionally associated with internally oriented lines, but this choice requires the right-hand rule to compare, in Eq. (39), the direction of H along $\partial\tilde S$ with the direction of the current flow through the surface $\tilde S$. Hence, so that the use of the right-hand rule can be dispensed with, the magnetic field must be associated with externally oriented lines. The same argument suggests an external orientation for the surfaces with which electric displacement is associated. Finally, Eqs. (36) and (37) give the terms

$$\int_I \int_P V \Rightarrow (P \times I) \tag{46}$$

$$\int_T \int_L \mathbf{A} \Rightarrow (L \times T) \tag{47}$$
which show that the scalar potential is associated with points and time intervals, whereas the vector potential is associated with lines and time instants. From the association of the electric field with internally oriented lines, it follows that for the electromagnetic potentials, the orientation is also internal.
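The associations in Eqs. (40) through (47) can be written down as a small lookup table and checked mechanically. The sketch below uses a table layout and names of our own choosing (hypothetical, not the author's); it verifies two structural facts discussed in this chapter: each time-integrated equation (33) through (39) involves quantities of a single orientation kind only, whereas each simple constitutive relation (20) through (22) links an internally oriented quantity with an externally oriented one.

```python
# Association of each local quantity with (space object, time object,
# orientation), as read off Eqs. (40)-(47).
ASSOCIATION = {
    "E":   ("line",    "interval", "internal"),   # Eq. (40)
    "B":   ("surface", "instant",  "internal"),   # Eq. (41)
    "J":   ("surface", "interval", "external"),   # Eq. (42)
    "rho": ("volume",  "instant",  "external"),   # Eq. (43)
    "H":   ("line",    "interval", "external"),   # Eq. (44)
    "D":   ("surface", "instant",  "external"),   # Eq. (45)
    "V":   ("point",   "interval", "internal"),   # Eq. (46)
    "A":   ("line",    "instant",  "internal"),   # Eq. (47)
}

# Quantities appearing in each time-integrated equation, Eqs. (33)-(39).
TOPOLOGICAL = {
    33: ["B"], 34: ["E", "B"], 35: ["J", "rho"], 36: ["A", "B"],
    37: ["V", "A", "E"], 38: ["D", "rho"], 39: ["H", "D", "J"],
}

# Quantities linked by the simple constitutive relations, Eqs. (20)-(22).
CONSTITUTIVE = {20: ["D", "E"], 21: ["B", "H"], 22: ["J", "E"]}


def orientation_kinds(names):
    """Set of orientation kinds ('internal'/'external') among the quantities."""
    return {ASSOCIATION[name][2] for name in names}


def check():
    single_kind = all(len(orientation_kinds(q)) == 1 for q in TOPOLOGICAL.values())
    bridging = all(orientation_kinds(q) == {"internal", "external"}
                   for q in CONSTITUTIVE.values())
    return single_kind, bridging


if __name__ == "__main__":
    print(check())   # (True, True)
```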
Figure 9. The Tonti classification diagram of local electromagnetic quantities.
The null right-hand-side terms in Eqs. (33) through (35) remain to be taken into consideration. We will see subsequently that these terms express the vanishing of magnetic flux creation (or the nonexistence of magnetic charge) and the vanishing of electric charge creation, respectively. For now, we will simply insert them as zero terms in the appropriate slot of the classification diagram for the physical quantities of electromagnetism, which summarizes the results of our analysis (Fig. 9).

1. Space–Time Viewpoint

The terms $\int_{\tilde T}\int_{\tilde V}\rho$ and $\int_{\tilde I}\int_{\tilde S}\mathbf{J}$ in Eqs. (43) and (42) refer to the same global physical quantity: electric charge. Moreover, total integration is performed in both cases on externally oriented, three-dimensional domains in space–time. We can, therefore, say that electric charge is actually associated with externally oriented, three-dimensional space–time domains, of which a three-dimensional space volume considered at a time instant and a two-dimensional space surface considered during a time interval are particular cases. To distinguish these two embodiments of the charge concept, we use the terms charge content, referring to volumes and time instants, and charge flow, referring to surfaces and time intervals. A similar distinction can be drawn for other quantities. For example, the terms $\int_I\int_L\mathbf{E}$ and $\int_T\int_S\mathbf{B}$ in Eqs. (40) and (41) are both magnetic fluxes associated with two-dimensional space–time domains: we could say that the electric field refers to a “flow” of magnetic flux tubes which cross internally oriented lines, while magnetic induction refers to a surface “content” of such tubes. Since the term content refers properly to volumes, and the term flow to surfaces, it appears preferable to distinguish the two manifestations of each global quantity by using an index derived from the letter traditionally used for the corresponding local quantity, as in

$$\int_{\tilde T}\int_{\tilde V} \rho = Q^\rho(\tilde V \times \tilde T) \tag{48}$$

and

$$\int_{\tilde I}\int_{\tilde S} \mathbf{J} = Q^j(\tilde S \times \tilde I) \tag{49}$$

and, for magnetic flux,

$$\int_T\int_S \mathbf{B} = \phi^b(S \times T) \tag{50}$$

$$\int_I\int_L \mathbf{E} = \phi^e(L \times I) \tag{51}$$

The same argument can be applied to electric flux,

$$\int_{\tilde T}\int_{\tilde S} \mathbf{D} = \psi^d(\tilde S \times \tilde T) \tag{52}$$

$$\int_{\tilde I}\int_{\tilde L} \mathbf{H} = \psi^h(\tilde L \times \tilde I) \tag{53}$$

and to the potentials in global form,

$$\int_T\int_L \mathbf{A} = U^a(L \times T) \tag{54}$$

$$\int_I\int_P V = U^v(P \times I) \tag{55}$$
With these definitions we can fill in the classification diagram of global electromagnetic quantities (Fig. 10). Note that the classification diagram of Figure 10 emphasizes the pairing of physical quantities that happen to be the static and dynamic manifestations of a unique space–time entity. We can group these variables under a single heading, obtaining a classification diagram of the space–time global electromagnetic quantities U, φ, ψ, and Q (Fig. 11), which corresponds to the one that could be drawn for local quantities in four-dimensional notation. Note also that all the global quantities of a column possess the same physical dimension; for example, the terms in Eqs. (48), (49), (52), and (53) all have the physical dimension of electric charge. Nonetheless, quantities appearing in different rows of a column refer to different physical quantities since, even
Figure 10. The Tonti classification diagram of global electromagnetic quantities.
if the physical dimension is the same, the underlying space–time oriented geometric object is not. This fact is reflected in the relativistic behavior of these quantities. When an observer changes his or her reference frame, his or her perception of what is time and what is space changes and with it his or her method of splitting a given space–time physical quantity into its two “space plus time” manifestations. Hence, the transformation laws, which account for
Figure 11. The Tonti classification diagram of global electromagnetic quantities, referring to space–time geometric objects.
the change of reference frame, will combine only quantities referring to the same space–time oriented object. In a four-dimensional treatment such quantities will be logically grouped within a unique entity (e.g., the charge–current vector; the four-dimensional potentials; the first and second electromagnetic tensors, or the corresponding differential forms, with groupings E and B, and H and D, respectively; and so on).

E. Topological Laws

Now that we have seen how to proceed to the individuation and classification of the physical quantities of a theory, there remains, as a last step in the determination of the structure of the theory itself, the establishment of the links existing between the quantities, accompanied by an analysis of the properties of these links. As anticipated, the main result of this further analysis, valid for all field theories, will be the singling out of a set of physical laws that lend themselves naturally to a discrete rendering, as opposed to another set of relations that constitute instead an obstacle to the complete discrete rendering of field problems. It is apparent from the definitions given in Eqs. (48) through (55) that Eqs. (33) through (39) can be rewritten in terms of global quantities only, as follows:

$$\phi^b(\partial V \times T) = 0(V \times T) \tag{56}$$

$$\phi^e(\partial S \times I) + \phi^b(S \times \partial I) = 0(S \times I) \tag{57}$$

$$Q^j(\partial \tilde V \times \tilde I) + Q^\rho(\tilde V \times \partial \tilde I) = 0(\tilde V \times \tilde I) \tag{58}$$

$$U^a(\partial S \times T) = \phi^b(S \times T) \tag{59}$$

$$-U^v(\partial L \times I) - U^a(L \times \partial I) = \phi^e(L \times I) \tag{60}$$

$$\psi^d(\partial \tilde V \times \tilde T) = Q^\rho(\tilde V \times \tilde T) \tag{61}$$

$$\psi^h(\partial \tilde S \times \tilde I) - \psi^d(\tilde S \times \partial \tilde I) = Q^j(\tilde S \times \tilde I) \tag{62}$$
Note that no material parameters appear in these equations, and that the transition from the local, differential statements in Eqs. (7) through (13) to these global statements was performed without recourse to any approximation. This demonstrates their intrinsically discrete nature. Let us examine and interpret these statements one by one. Gauss’s magnetic law [Eq. (56)] asserts the vanishing of magnetic flux associated with closed surfaces ∂V in space considered at a time instant T. From
Figure 12. Faraday’s induction law admits a geometric interpretation as a conservation law on a space–time cylinder. The (internal) orientation of geometric objects is not represented.
what we said previously about space–time objects, there must be a corresponding assertion for timelike closed surfaces. Faraday’s induction law [Eq. (57)] is indeed such an assertion for a cylindrical closed surface in space–time constructed as follows (Fig. 12): the surface S at the time instant T1 constitutes the first base of a cylinder; the boundary of S, ∂S, considered during the time interval I = [T1, T2], constitutes the lateral surface of the cylinder, which is finally closed by the surface S considered at the time instant T2 [remember that T1 and T2 together constitute the boundary ∂I of the time interval I, hence the term S × ∂I in Eq. (57) represents the two bases of the cylinder] (Bamberg and Sternberg, 1988; Truesdell and Toupin, 1960). This geometric interpretation of Faraday’s law is particularly interesting for numerical applications, for it is an exact statement linking physical quantities at times T < T2 to a quantity defined at time T2. Therefore, this statement is a good starting point for the development of the time-stepping procedure. In summary, Gauss’s law and Faraday’s induction law are the space and the space–time parts, respectively, of a single statement: the magnetic flux associated with the boundary of a space–time volume $\mathcal{V}$ is always zero:

$$\phi(\partial \mathcal{V}) = 0(\mathcal{V}) \tag{63}$$
(Remember that the boundary of an oriented geometric object must always be thought of as endowed with the induced orientation.) Equation (63), also called the law of conservation of magnetic flux (Truesdell and Toupin, 1960), gives its right-hand-side term the meaning of a null production of magnetic flux. From another point of view, the right-hand side of Eq. (56) expresses
the nonexistence of magnetic charge and that of Eq. (57) the nonexistence of magnetic charge current. The other conservation statement of electromagnetism is the law of conservation of electric charge [Eq. (58)]. In strict analogy with the geometric interpretation of Faraday’s law, a cylindrical, space–time, closed hypersurface is constructed as follows: the volume $\tilde V$ at the time instant $\tilde T_1$ constitutes the first base of a hypercylinder; the boundary of $\tilde V$, $\partial\tilde V$, considered during the time interval $\tilde I = [\tilde T_1, \tilde T_2]$, constitutes the lateral surface of the hypercylinder, which is finally closed by the volume $\tilde V$ considered at the time instant $\tilde T_2$. The law of charge conservation asserts the vanishing of the electric charge associated with this closed hypercylinder. This conservation statement can be referred to the boundary of a generic space–time hypervolume $\tilde{\mathcal{H}}$, which yields the following statement, analogous to Eq. (63):

$$Q(\partial \tilde{\mathcal{H}}) = 0(\tilde{\mathcal{H}}) \tag{64}$$

In Eq. (64) the zero on the right-hand side states the vanishing of the production of electric charge. Note that in this case a purely spatial statement, corresponding to Gauss’s law of magnetostatics [Eq. (56)], is not given, for in four-dimensional space–time a hypervolume can be obtained only as the product of a volume in space and a time interval. The two conservation statements [Eqs. (63) and (64)] can be considered the two cornerstones of electromagnetic theory (Truesdell and Toupin, 1960). de Rham (1931) proved that the global validity of statements of this kind [or, if you prefer, of Eqs. (33) through (35)] in a homologically trivial space implies the existence of field quantities that can be considered the potentials of the densities of the physical quantities appearing in the global statements. In our case, the field quantities V and A, defined by Eqs. (10) and (11), are indeed traditionally called the electromagnetic potentials.
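The exactness of Eqs. (56) and (57) on a mesh can be illustrated with a small cubical grid. The sketch below is our own construction under stated assumptions, not the author’s procedure: cells of a structured grid are labeled by a corner and the axes they span, the boundary operator follows the standard cubical sign convention, and random numbers stand in for the electric fluxes φ^e on edges during each time interval. Starting from face fluxes φ^b that satisfy Eq. (56), the update implied by Eq. (57) keeps the net flux through every closed cube boundary at zero whatever the values of φ^e, because the boundary of a boundary is empty. The same structure on the dual grid would underlie charge conservation, Eq. (58).

```python
import itertools
import random
from collections import defaultdict

# A k-cell of an n x n x n grid of unit cubes is (corner, axes): corner is an
# integer triple, axes the sorted tuple of directions the cell spans
# (() node, (0,) x-edge, (0, 1) xy-face, (0, 1, 2) cube). The order of the
# axes fixes the cell's internal orientation.


def cells(n, k):
    """All k-cells of the grid."""
    out = []
    for axes in itertools.combinations(range(3), k):
        for corner in itertools.product(range(n + 1), repeat=3):
            if all(corner[a] < n for a in axes):   # stay inside the grid
                out.append((corner, axes))
    return out


def boundary(cell):
    """Cubical boundary: faces with the induced orientation, as {face: +-1}."""
    corner, axes = cell
    chain = defaultdict(int)
    for m, axis in enumerate(axes):
        rest = axes[:m] + axes[m + 1:]
        far = list(corner)
        far[axis] += 1
        sign = (-1) ** m
        chain[(tuple(far), rest)] += sign
        chain[(corner, rest)] -= sign
    return {c: s for c, s in chain.items() if s != 0}


def evaluate(cochain, chain):
    """Global value of a cochain (one number per cell) on a chain."""
    return sum(s * cochain[c] for c, s in chain.items())


def boundary_of_boundary_vanishes(cell):
    total = defaultdict(int)
    for face, s in boundary(cell).items():
        for sub, t in boundary(face).items():
            total[sub] += s * t
    return all(v == 0 for v in total.values())


def max_net_flux_after_steps(n=2, steps=5, seed=0):
    rng = random.Random(seed)
    edges, faces, cubes = cells(n, 1), cells(n, 2), cells(n, 3)
    # Initial phi_b from an arbitrary edge cochain U^a, as in Eq. (59),
    # so that Eq. (56) holds at the first instant.
    u_a = {e: rng.uniform(-1, 1) for e in edges}
    phi_b = {f: evaluate(u_a, boundary(f)) for f in faces}
    worst = 0.0
    for _ in range(steps):
        phi_e = {e: rng.uniform(-1, 1) for e in edges}   # arbitrary E-fluxes
        # Eq. (57): phi_b(S, T2) = phi_b(S, T1) - phi_e(boundary of S, I)
        phi_b = {f: phi_b[f] - evaluate(phi_e, boundary(f)) for f in faces}
        worst = max(worst, max(abs(evaluate(phi_b, boundary(c))) for c in cubes))
    return worst


if __name__ == "__main__":
    print(max_net_flux_after_steps())   # zero up to rounding error
```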
Correspondingly, the field quantities H and D defined by Eqs. (12) and (13) are also potentials and can be called the charge–current potentials (Truesdell and Toupin, 1960). In fact, the definition of H and D is a consequence of charge conservation, exactly as the definition of V and A is a consequence of magnetic flux conservation; therefore, neither pair is uniquely defined by the conservation laws of electromagnetism. Only the choice of a gauge for the electromagnetic potentials, and the hypothesis about the media properties for the charge–current potentials, remove this nonuniqueness. In any case, the global renditions [Eqs. (59) through (62)] of the equations defining the potentials prove the intrinsically discrete status of Gauss’s law of electrostatics, of Maxwell–Ampère’s law, and of the defining equations of the electromagnetic potentials. A geometric interpretation can be given to these laws, too. Gauss’s law of electrostatics asserts the balance of the electric charge contained in a volume with the electric flux through the surface that bounds
Figure 13. Maxwell–Ampère’s law admits a geometric interpretation as a balance law on a space–time cylinder. The (external) orientation of geometric objects is not represented.
the volume. Similarly, Maxwell–Ampère’s law defines this balance between the charge contained within a space–time volume and the electric flux through its boundary, which is a cylindrical space–time closed surface analogous to the one appearing in Faraday’s law, but with external orientation (Fig. 13). This geometric interpretation, like that of Faraday’s law, is instrumental for a correct setup of the time stepping within a numerical procedure. Equations (61) and (62) can be condensed into a single space–time statement that asserts the balance of the electric charge associated with arbitrary space–time volumes with the electric flux associated with their boundaries:

$$\psi(\partial \tilde{\mathcal{V}}) = Q(\tilde{\mathcal{V}}) \tag{65}$$

Analogous interpretations hold for Eqs. (59) and (60), relative to a balance of magnetic fluxes associated with space–time surfaces and their boundaries:

$$U(\partial \mathcal{S}) = \phi(\mathcal{S}) \tag{66}$$
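The balance law for magnetic fluxes can also be checked on a grid: if the surface fluxes φ^b are defined from a line cochain U^a through Eq. (59), and the line fluxes φ^e from U^v and U^a through Eq. (60), then Faraday’s law, Eq. (57), is satisfied identically, with no approximation. The sketch below assumes a planar N × N grid of unit squares with random potential values (a hypothetical setup of our own; the array layout and names are illustrative only) and evaluates the residual of Eq. (57) on every face.

```python
import random

N = 4  # hypothetical grid: N x N unit squares, nodes (i, j) with 0 <= i, j <= N


def grid(rows, cols, rng):
    """rows x cols table of random values (a stand-in for a cochain)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def circulation(ax, ay, i, j):
    """Edge cochain (ax on x-edges, ay on y-edges) summed counterclockwise
    around face (i, j): its value on the boundary with induced orientation."""
    return ax[i][j] + ay[i + 1][j] - ax[i][j + 1] - ay[i][j]


def faraday_residual(seed=1):
    rng = random.Random(seed)
    v = grid(N + 1, N + 1, rng)                          # U^v on nodes, over I
    ax1, ay1 = grid(N, N + 1, rng), grid(N + 1, N, rng)  # U^a on edges at T1
    ax2, ay2 = grid(N, N + 1, rng), grid(N + 1, N, rng)  # U^a on edges at T2
    # Eq. (60): phi_e(L x I) = -U^v(boundary of L, over I) - [U^a(L, T2) - U^a(L, T1)]
    ex = [[-(v[i + 1][j] - v[i][j]) - (ax2[i][j] - ax1[i][j])
           for j in range(N + 1)] for i in range(N)]
    ey = [[-(v[i][j + 1] - v[i][j]) - (ay2[i][j] - ay1[i][j])
           for j in range(N)] for i in range(N + 1)]
    worst = 0.0
    for i in range(N):
        for j in range(N):
            b1 = circulation(ax1, ay1, i, j)   # Eq. (59) at T1
            b2 = circulation(ax2, ay2, i, j)   # Eq. (59) at T2
            # Eq. (57): phi_e(boundary of S, over I) + phi_b(S, T2) - phi_b(S, T1) = 0
            worst = max(worst, abs(circulation(ex, ey, i, j) + b2 - b1))
    return worst


if __name__ == "__main__":
    print(faraday_residual())   # zero up to rounding error
```

The residual vanishes because the circulation of a node difference around any face is identically zero, which is the discrete counterpart of curl grad = 0.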
We can insert the global space–time statements [Eqs. (63) through (66)] in the space–time classification diagram of the electromagnetic physical quantities (Fig. 14). Note that all these statements appear as vertical links. These links relate a quantity associated with an oriented geometric object with a quantity associated with the boundary of that object (which has, therefore, the same kind of orientation). What is shown here for the case of electromagnetism applies to the great majority of physical field theories. Typically, a subset of the equations that form a physical field theory links a global quantity associated with an oriented geometric object to the global quantity that, within
Figure 14. The position of topological laws in the Tonti classification diagram of electromagnetic quantities.
the theory, is associated with the boundary of that object (Tonti, 1975). These laws are intrinsically discrete, for they state a balance of these global quantities (or a conservation of them, if one of the terms is zero) whose validity does not depend on metrical or material properties, and is, therefore, invariant for very general transformations. This gives them a “topological significance” (Truesdell and Toupin, 1960), which justifies our calling them topological laws. The significance of this finding for numerical methods is obvious: once the domain of a field problem has been suitably discretized, topological laws can be written directly and exactly in discrete form.

F. Constitutive Relations

To complete our analysis of the equations of electromagnetism, we must consider the set of constitutive equations, represented, for example, by Eqs. (14) through (16). We emphasize once again that each instance of this kind of equation is only a particular case of the various forms that the constitutive links between the problem’s quantities can take. In fact, while topological laws can be considered universal laws linking the field quantities of a theory, constitutive relations are merely definitions of ideal materials given within the framework of that particular field theory (Truesdell and Noll, 1965). In other words, they are abstractions inspired by the observation of the behavior of actual materials. More sophisticated models have terms that account for a wider range of observed material behaviors, such as nonlinearity, anisotropy, nonlocality,
Figure 15. The Tonti factorization diagram of electromagnetism in local form. Topological laws are represented by vertical links within columns, whereas constitutive relations are represented by transverse links bridging the two columns of the diagram.
hysteresis, and combinations thereof (Post, 1997). This added complexity usually implies a greater sophistication of the numerical solvers, but does not change the essence of what we are about to say concerning the discretization of constitutive relations. If we consider the position of constitutive relations in the classification diagram of the physical quantities of electromagnetism, we observe that they constitute a link that connects the two columns (Fig. 15). This fact reveals that, unlike topological laws, constitutive relations link quantities associated with geometric objects endowed with different kinds of orientation. From the point of view of numerical methods, the main differences from topological laws are that constitutive relations contain material parameters∗ and that they are not intrinsically discrete. The presence of a term of this kind in the field equations is not surprising, since otherwise (given the intrinsic discreteness of topological laws) it would always be possible to discretize exactly and solve numerically a field problem, and we know that this is not the case.

∗ In some cases material parameters seemingly disappear from constitutive equations. This is the case, for example, with electromagnetic equations in empty space when we adopt Gaussian units and set c = 1. This induces the temptation to identify physical quantities, in this case E with D, and B with H. However, the approach based on the association with oriented geometric objects reveals that these quantities have a distinct nature.

Constitutive relations can be transformed into exact links between global quantities only if the local properties do not vary in the domain where the link must be valid. This means that we must impose a series of uniformity requirements on material and field properties for a global statement to hold true. On the contrary, since, aside from discontinuities, these requirements are automatically satisfied in the small, the local statement always applies. The uniformity requirement is in fact the method used to investigate these laws experimentally. For example, we can investigate the constitutive relation

$$\mathbf{D} = \varepsilon \mathbf{E} \tag{67}$$
by examining a capacitor with two planar parallel plates of area A, having a distance l between them and filled with a uniform, linear, isotropic medium having relative permittivity $\varepsilon_r$ (so that $\varepsilon = \varepsilon_0 \varepsilon_r$). With this assumption, Eq. (67) corresponds approximately to

$$\frac{\psi}{A} = \varepsilon \frac{V}{l} \tag{68}$$

where ψ is the electric flux and V the voltage between the plates. Note that to write Eq. (68), besides using the material parameter ε, we invoke the concepts of planarity, parallelism, area, distance, and orthogonality, which are not topological concepts. This shows that, unlike topological laws, constitutive relations imply recourse to metrical concepts. This is not apparent in Eq. (67), for (as explained previously) the use of vectors to represent field quantities tends to hide the geometric details of the theory. Equation (67) written in terms of differential forms, or a geometric representation thereof, reveals the presence, within the link, of the metric tensor (Burke, 1985; Post, 1997). The local nature of constitutive relations can be interpreted by saying that these equations summarize at a macroscopic level something going on at an underlying scale. This hypothesis may help the intuition, but it is not necessary if we are willing to interpret them as definitions of ideal materials. By so doing, we can avoid the difficulties implicit in the creation of a convincing derivation of field concepts from a corpuscular viewpoint. Further information about constitutive equations can be derived by observing their position in the factorization diagram. It is not of direct relevance from a numerical viewpoint, but it can help us to understand better the nature of each term. For example, it has been observed that when the two columns of the factorization diagram are properly aligned according to duality, constitutive relations linked to irreversible processes (e.g., Ohm’s law linking E and J in Fig. 15) appear as slanted links, whereas those representing reversible processes appear as horizontal links (Tonti, 1975).
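The uniformity requirement behind Eq. (68) can be made concrete with the parallel-plate example. The sketch below is illustrative only: the layered dielectric (slabs parallel to the plates) is our own assumption, not part of the text, and fringing fields are neglected. With a uniform filling, Eq. (68) links the global quantities ψ and V exactly; with two different slabs, no single local ε reproduces the global link, and the effective value is the thickness-weighted harmonic mean rather than the arithmetic one.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m (CODATA 2018 value)


def flux_uniform(eps_r, area, gap, voltage):
    """Eq. (68) solved for psi: psi = eps0 * eps_r * A * V / l (no fringing)."""
    return EPS0 * eps_r * area * voltage / gap


def flux_layered(layers, area, voltage):
    """Plates separated by slabs parallel to them; layers = [(eps_r, thickness)].
    With no free charge between the plates, D (hence psi / A) is the same in
    every slab, and the slab voltages add up: V = (psi / A) * sum(t_i / eps_i)."""
    return area * voltage / sum(t / (EPS0 * er) for er, t in layers)


if __name__ == "__main__":
    A, V = 1e-2, 10.0                     # hypothetical plate area (m^2), voltage (V)
    two_slabs = [(1.0, 0.5e-3), (9.0, 0.5e-3)]
    print(flux_layered(two_slabs, A, V))  # equals flux_uniform(1.8, A, 1e-3, V)
    print(flux_uniform(5.0, A, 1e-3, V))  # arithmetic-mean eps_r: wrong result
```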
1. Constitutive Equations and Discretization Error

We anticipated in the preceding discussion that, from our point of view, the main consequence of the peculiar nature of constitutive relations lies in their preventing, in general, the attainment of an exact discrete solution. By “exact discrete solution,” we mean the exact solution of the continuous mathematical model (e.g., a partial differential equation) into which the physical problem is usually transformed. We hinted in the Introduction at the fact that the numerical solution of a field problem implies three phases (Fig. 1):

1. The transformation of the physical problem into a mathematical model
2. The discretization of the mathematical model
3. The solution of the system of algebraic equations produced by the discretization

(The fourth phase represented in Fig. 1, the approximate reconstruction of the field function based on the discrete solution, obviously does not affect the accuracy of the discrete solution.) Correspondingly, there will be three kinds of errors (Fig. 16; Ferziger and Perić, 1996; Lilek and Perić, 1995):

1. The modeling error
2. The discretization error
3. The solver error
Figure 16. The three kinds of errors associated with the numerical solution of a field problem.
CLAUDIO MATTIUSSI
Modeling errors are a consequence of the assumptions about phenomena and processes made during the transition from the physical problem to its mathematical model in terms of equations and boundary conditions. Solver errors are a consequence of the limited numerical precision and time available for solving the system of algebraic equations. Discretization errors act between these two steps, preventing the attainment of the exact discrete solution of the mathematical model even if our algebraic solvers were perfect. The existence of discretization errors is well known, but it is the analysis based on the mathematical structure of physical theories that reveals where the discretization obstacle lies: within constitutive relations, since topological laws do not in themselves imply any discretization error. As anticipated in the Introduction, this in turn suggests a discretization strategy in which what is intrinsically discrete is included as such in the model, and the discretization effort is focused on what remains. It must be said, however, that once the discretization error has been introduced by the constitutive terms, the actual discretization error is shaped jointly by the approximation implied by discretizing these terms and by the fact that we enforce only a finite number of topological relations in place of the infinitely many implied by the corresponding physical law. This fact will be examined in detail subsequently.
G. Boundary Conditions and Sources

A field problem includes, in addition to the field equations, a set of boundary conditions and the specification that certain terms appearing in the equations are assigned as sources. Boundary conditions and sources are a means to limit the scope of the problem actually analyzed, since they summarize the effects of interactions with domains or phenomena that we choose not to consider in detail. Let us see how boundary conditions and sources enter into the framework developed in the preceding sections for the equations, with a classification that parallels the distinction between topological laws and constitutive relations. When boundary conditions and sources are specified as given values of some of the field quantities of the problem, they correspond in our scheme to global values assigned to some geometric object placed along the boundary or lying within the domain. Hence, the corresponding values enter the calculations exactly, except for the possibly limited precision with which they are calculated from the corresponding field functions (usually by numerical integration) when they are not given directly as global quantities. Consequently, in this case these terms can be assimilated to topological prescriptions.
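The following Python sketch shows how an assigned boundary value enters as a global quantity when it is given as a field function. It assumes a steady problem (so the time integral is omitted), a flux density prescribed along a straight boundary segment per unit depth, and a three-point Gauss–Legendre rule; the names global_value and q_boundary are hypothetical. The only error introduced is that of the quadrature, which here vanishes because the integrand is a low-degree polynomial.

```python
import math

# 3-point Gauss-Legendre rule on [-1, 1]; exact for polynomials up to degree 5.
_GL_NODES = (-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0))
_GL_WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)


def global_value(density, a: float, b: float) -> float:
    """Integrate a local density over the segment [a, b] to get a global value."""
    half = 0.5 * (b - a)
    mid = 0.5 * (a + b)
    return half * sum(w * density(mid + half * x)
                      for x, w in zip(_GL_NODES, _GL_WEIGHTS))


# Hypothetical prescribed boundary flux density (W/m^2, unit depth, steady).
def q_boundary(s: float) -> float:
    return 2.0 + s ** 2


Q = global_value(q_boundary, 0.0, 1.0)  # exact value is 7/3 W
print(Q)
```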
In other cases boundary and source terms are assigned in the form of equations linking a problem’s field variable to a given excitation. In these cases, these terms must be considered as additional constitutive relations to which all the considerations made previously for this kind of equation apply. In particular, within a numerical formulation, such terms must be subjected to a specific discretization process. For example, this is the case for convective boundary conditions in heat transfer problems. In still other cases boundary conditions summarize the effects on the problem domain of the structure of that part of space–time which lies outside the problem domain. Think, for example, about radiative boundary conditions in electrodynamics, and inlet and outlet boundary conditions in fluid dynamics. In these cases, one cannot give general prescriptions, for the representation depends on the geometric and physical structure of this “outside.” Physically speaking, a good approach consists of extending the problem’s domain, enclosing it in a (thin) shell whose properties account, with a sufficient approximation, for the effect of the whole space surrounding the domain, and whose boundary conditions belong to one of the previous kinds. This shell can then be modeled and discretized by following the rules used for the rest of the problem’s domain. However, devising the properties of such a shell is usually not a trivial task. In any case, the point is that boundary conditions and source terms can be brought back to topological laws and constitutive relations by physical reasoning, and from there they require no special treatment with respect to what applies to these two categories of relations.
H. The Scope of the Structural Approach

The example of electromagnetism, examined in detail in the previous sections, shows that to approach the numerical solution of a field problem by taking into account its mathematical structure, we must first classify the physical quantities appearing in the field equations, according to their association with oriented geometric objects, and then factorize the field equations themselves to the point of being able to draw the factorization diagram for the field theory to which the problem belongs. The result will be a distinction of topological laws, which are intrinsically discrete, from constitutive relations, which admit only approximate discrete renderings (Fig. 17). Let us examine briefly how this process works for other theories and the difficulties we can expect to encounter. From electromagnetism we can easily derive the diagrams of electrostatics and magnetostatics. If we drop the time dependence, the factorization diagram for electromagnetism splits naturally into the two distinct diagrams of electrostatics and magnetostatics (Figs. 18 and 19).
Figure 17. The distinction between topological and constitutive terms of the field equations, as it appears in the Tonti factorization diagram. Topological laws appear as vertical links and are intrinsically discrete, whereas constitutive relations appear as transverse links and in general permit only approximate discrete renderings.
Given the well-known analogy between stationary heat conduction and electrostatics (Burnett, 1987; Maxwell, 1884), one would expect to derive the diagram for heat conduction directly from that of electrostatics. An analysis of the physical quantities reveals, however, that the analogy is not perfect. Temperature, which the analogy links to the electrostatic potential V, is indeed associated, like V, with internally oriented points and time intervals; but the heat flow density, traditionally considered analogous to the electric displacement D, is in fact associated with externally oriented surfaces and time intervals, whereas D is associated with surfaces and time instants. In the stationary case this distinction makes little difference, but, as we will see in Fig. 20, it results in a slanting of the constitutive link between the temperature gradient g and the diffusive heat flux density q_d, whereas the constitutive link between E and D is not slanted. This reflects the irreversible nature of the former process, as opposed to the reversible nature of the latter. Since the heat transfer equation can be considered a prototype of all scalar transport equations, it is worth examining in detail, including both the nonstationary and the convective terms. A heat transfer equation that is general enough for our purposes can be written as follows (Versteeg and Malalasekera, 1995):

$$\frac{\partial(\rho c\theta)}{\partial t} + \operatorname{div}(\rho c\theta\,\mathbf{u}) - \operatorname{div}(k\,\operatorname{grad}\theta) = \sigma \qquad (69)$$
Figure 18. The Tonti factorization diagram for electrostatics in local form.
Figure 19. The Tonti factorization diagram for magnetostatics in local form.

where θ is the temperature, ρ the mass density, c the specific heat, u the fluid velocity, k the thermal conductivity, and σ the rate of heat production per unit volume. Note that we always start with field equations written in local form, since these equations usually include constitutive terms. We must first factor out these terms before we can write the topological terms in their primitive, discrete form. Disentangling the constitutive relations from the topological laws, we obtain the following set of topological equations,

$$\operatorname{grad}\theta = \mathbf{g} \qquad (70)$$

$$\frac{\partial q_c}{\partial t} + \operatorname{div}\mathbf{q}_u + \operatorname{div}\mathbf{q}_d = \sigma \qquad (71)$$

and the following set of constitutive equations,

$$\mathbf{q}_u = \rho c\theta\,\mathbf{u} \qquad (72)$$

$$\mathbf{q}_d = -k\,\mathbf{g} \qquad (73)$$

$$q_c = \rho c\theta \qquad (74)$$
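To see the split between Eqs. (70)–(71) and (72)–(74) at work, here is a minimal Python sketch for the steady, purely diffusive, one-dimensional case. No convection, uniform k, unit cross-section, and a uniform mesh are all assumptions of the illustration, and the function names are hypothetical. The topological steps are exact differences and sums; only the constitutive step uses the mesh size h.

```python
# Hypothetical 1D, steady, purely diffusive illustration of Eqs. (70)-(74).
# Temperatures sit on primary nodes x_i = i*h; heat flows sit on the
# secondary "faces" halfway between nodes (unit cross-section assumed).


def tension(theta):
    """Eq. (70), global form: G on edge [x_i, x_i+1] is a difference (exact)."""
    return [theta[i + 1] - theta[i] for i in range(len(theta) - 1)]


def diffusive_flow(G, k, h):
    """Eq. (73), discretized: Q_d = -k G / h (approximate, needs the metric h)."""
    return [-k * g / h for g in G]


def balance_residual(Q_d, sigma, h):
    """Eq. (71), global form, for each interior dual cell of width h:
    net outflow minus heat production (exact bookkeeping)."""
    return [Q_d[i] - Q_d[i - 1] - sigma * h for i in range(1, len(Q_d))]


# Check with theta(x) = sigma x (L - x) / (2k), which solves -k theta'' = sigma.
L, k, sigma, n = 1.0, 2.0, 5.0, 10
h = L / n
theta = [sigma * (i * h) * (L - i * h) / (2 * k) for i in range(n + 1)]
res = balance_residual(diffusive_flow(tension(theta), k, h), sigma, h)
print(max(abs(r) for r in res))  # expected to be at round-off level
```

For this quadratic profile the discretized constitutive link happens to be exact, so the residual vanishes; for a general profile the same check would leave a nonzero residual, and that residual comes from the approximation of Eq. (73), not from the topological steps.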
To write Eqs. (70) through (74), we have introduced four new local physical quantities: the temperature gradient g, the diffusive heat flow density q_d, the convective heat flow density q_u, and the heat content density q_c. Note that, of the three constitutive equations, Eq. (72) results from a driving source term, with the parameter u derived from an “external” problem. This is an example of how information about interacting phenomena is carried by terms appearing in the form of constitutive relations. Another example is given by boundary conditions describing a convective heat exchange through a part ∂D_v of the domain boundary. If θ_∞ is the external ambient temperature and h is the coefficient of convective heat exchange, and we denote by q_v and θ_v the convective heat flow density and the temperature at a generic point of ∂D_v, we can write

$$q_v = h\,(\theta_v - \theta_\infty) \qquad (75)$$
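The convective boundary condition of Eq. (75) can be sketched the same way: it is one more constitutive link, attached to the half dual cell at the boundary. The Python sketch below assumes steady conduction in a uniform bar of unit cross-section, a fixed temperature at x = 0, no internal source, and a Thomas (tridiagonal) solver; bar_with_convective_end and solve_tridiagonal are hypothetical names.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def bar_with_convective_end(length, k, h_conv, theta_0, theta_inf, n):
    """Steady 1D conduction, theta fixed at x = 0, Eq. (75) at x = length.

    Unknowns are the nodal temperatures theta_1..theta_n (n >= 2 segments).
    """
    if n < 2:
        raise ValueError("use at least two segments")
    dx = length / n
    lower, diag, upper, rhs = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    for r in range(n - 1):  # interior dual cells: heat in = heat out
        lower[r], diag[r], upper[r] = 1.0, -2.0, 1.0
    rhs[0] = -theta_0  # known boundary temperature moved to the right side
    # Half dual cell at x = length: conduction in = convection out (Eq. 75).
    lower[-1] = -k / dx
    diag[-1] = k / dx + h_conv
    rhs[-1] = h_conv * theta_inf
    return [theta_0] + solve_tridiagonal(lower, diag, upper, rhs)


theta = bar_with_convective_end(1.0, 2.0, 10.0, 100.0, 20.0, n=8)
q = (100.0 - 20.0) / (1.0 / 2.0 + 1.0 / 10.0)  # series resistances L/k + 1/h
print(theta[-1], 100.0 - q * 1.0 / 2.0)
```

Since the exact profile is linear, the discrete end temperature should match the series-resistance value θ(L) = θ_0 − qL/k, with q = (θ_0 − θ_∞)/(L/k + 1/h), to round-off.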
An alternative approach is to treat this as an instance of coupled problems, where the phenomena that originate the external driving terms are treated as separate interacting problems, which must also be discretized and solved. In this case, a factorization diagram must be built for each physical field problem involved in the whole problem, and what are treated here as driving terms become links between the diagrams. In these cases, a preliminary classification of all the physical variables appearing in the different phenomena is required, so that we can select the best common discretization substratum, especially as far as the geometry is concerned. Putting the topological laws, with the new boundary term [Eq. (75)], in full integral form, we have

$$\int_{I}\int_{\partial L} \theta = \int_{I}\int_{L} \mathbf{g} \qquad (76)$$

$$\int_{\tilde I}\int_{\partial\tilde V} q_v + \int_{\tilde I}\int_{\partial\tilde V} \mathbf{q}_u + \int_{\tilde I}\int_{\partial\tilde V} \mathbf{q}_d + \int_{\partial\tilde I}\int_{\tilde V} q_c = \int_{\tilde I}\int_{\tilde V} \sigma \qquad (77)$$
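We can define the following global quantities:

$$\int_{I}\int_{P} \theta = \Theta(P \times I) \qquad (78)$$

$$\int_{I}\int_{L} \mathbf{g} = G(L \times I) \qquad (79)$$

$$\int_{\tilde I}\int_{\tilde S} \mathbf{q}_u = Q_u(\tilde S \times \tilde I) \qquad (80)$$

$$\int_{\tilde I}\int_{\tilde S} q_v = Q_v(\tilde S \times \tilde I) \qquad (81)$$

$$\int_{\tilde I}\int_{\tilde S} \mathbf{q}_d = Q_d(\tilde S \times \tilde I) \qquad (82)$$

$$\int_{\tilde T}\int_{\tilde V} q_c = Q_c(\tilde V \times \tilde T) \qquad (83)$$

$$\int_{\tilde I}\int_{\tilde V} \sigma = F(\tilde V \times \tilde I) \qquad (84)$$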
with the temperature impulse Θ associated with internally oriented points and time intervals; the thermal tension G associated with internally oriented lines and time intervals; the convective and diffusive heat flows Q_u, Q_v, and Q_d associated with externally oriented surfaces and time intervals; the heat content Q_c associated with externally oriented volumes and time instants; and the heat production F associated with externally oriented volumes and time intervals. The same associations hold for the corresponding local quantities. This permits us to write Eqs. (76) and (77) in terms of global quantities only:

$$\Theta(\partial L \times I) = G(L \times I) \qquad (85)$$

$$Q_v(\partial\tilde V \times \tilde I) + Q_u(\partial\tilde V \times \tilde I) + Q_d(\partial\tilde V \times \tilde I) + Q_c(\tilde V \times \partial\tilde I) = F(\tilde V \times \tilde I) \qquad (86)$$
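Read as an update rule, Eq. (86) gives a time-stepping scheme. The minimal Python sketch below assumes an insulated 1D rod (so Q_v, Q_u, and F vanish), a uniform mesh whose secondary cells are half cells at the two ends, a lumped form of Eq. (74), and an explicit evaluation of the diffusive flows at the start of each interval; these choices are assumptions made for the illustration, and the names are hypothetical. Because every face flow is subtracted from one cell and added to its neighbor, the total heat content is conserved to round-off however crudely the constitutive links are approximated.

```python
def dual_volumes(n_nodes, dx):
    """Lengths of the secondary (dual) cells: half cells at the two ends."""
    vol = [dx] * n_nodes
    vol[0] = vol[-1] = dx / 2
    return vol


def total_heat(theta, rho_c, dx):
    """Sum of Q_c over all dual cells, using the lumped Eq. (74)."""
    return sum(rho_c * t * v for t, v in zip(theta, dual_volumes(len(theta), dx)))


def explicit_step(theta, rho_c, k, dx, dt):
    """One step of Eq. (86) for an insulated rod (Q_v = Q_u = F = 0).

    Q_c at the end of the interval = Q_c at its start minus the diffusive
    heat leaving each dual cell; the flows are evaluated at the start of the
    interval (an explicit approximation of the constitutive link).
    """
    vol = dual_volumes(len(theta), dx)
    q_c = [rho_c * t * v for t, v in zip(theta, vol)]
    for j in range(len(theta) - 1):
        q_d = -k * (theta[j + 1] - theta[j]) / dx * dt  # heat crossing face j+1/2
        q_c[j] -= q_d      # leaves dual cell j
        q_c[j + 1] += q_d  # enters dual cell j + 1
    return [q / (rho_c * v) for q, v in zip(q_c, vol)]


rho_c, k, dx = 1.0, 1.0, 0.1
dt = 0.4 * rho_c * dx ** 2 / k  # within the explicit stability limit
theta = [1.0 if i < 5 else 0.0 for i in range(11)]
heat0 = total_heat(theta, rho_c, dx)
for _ in range(200):
    theta = explicit_step(theta, rho_c, k, dx, dt)
print(heat0, total_heat(theta, rho_c, dx))
```

The time step is kept below the explicit stability limit dt ≤ ρc·dx²/(2k), which this particular explicit choice requires; an implicit evaluation of the flows would lift that restriction without changing the exact balance.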
Note that Eq. (86) is the natural candidate for the setup of a time-stepping scheme within a numerical procedure, since it exactly links the quantities defined at times preceding the final instant of the interval Ĩ to the heat content Q_c at the final instant. This completes our analysis of the structure of heat transfer problems represented by Eq. (69) and establishes the basis for their discretization. The corresponding factorization diagram in terms of local field quantities is depicted in Fig. 20.

Figure 20. The Tonti factorization diagram for the heat transfer equation in local form. Note the presence of terms derived from the diagrams of other theories or other domains.

Along similar lines one can conduct the analysis for many other theories. No difficulties are to be expected for those that, like electromagnetism and heat transfer, are characterized by scalar global quantities. More complex are theories in which the global quantities associated with geometric objects are vectors or more complex mathematical entities. This is the case of fluid dynamics and continuum mechanics (in which vector quantities such as displacements, velocities, and forces are associated with geometric objects). In these cases, deducing the factorization diagram can be difficult, because one must first tackle a nontrivial classification of quantities that have, in local form, a tensorial nature, and then disentangle the constitutive and topological factors of the corresponding equations. Moreover, for vector theories it is harder to pass silently over the fact that to compare or add quantities defined at different space–time locations (in fact, even scalar quantities), we actually need a connection defined in the domain. To simplify things, one could be tempted to write the equations of fluid dynamics as a collection of scalar transport equations, hiding within the source term everything that does not fit an equation of the form of Eq. (69), and to apply to these equations the results of the analysis of the scalar transport equation. However, this approach clearly prevents the correct association of physical quantities with geometric objects and is, therefore, far from the spirit advocated in this work. Moreover, the inclusion of too many interaction terms within the source terms can spoil the significance of the analysis, for example, by hiding essential nonlinearities.∗

Finally, it must be said that, given a field problem, one could consider adopting a Lagrangian viewpoint in place of the Eulerian one that we have considered so far. Strictly speaking, the approach presented here applies only to an Eulerian approach. Nevertheless, the benefits derived from a proper association of physical quantities with oriented geometric objects extend to a Lagrangian approach as well. Moreover, the case of moving meshes is included without difficulty in the space–time discretization described subsequently, and in particular in the reference discretization strategy that will be introduced in the section on numerical methods (Section IV).

III. Representations

We have analyzed the structure of field problems, aiming at their discretization. Our final goal is the actual derivation of a class of discretization strategies that comply with that structure. To this end, we must first ascertain what has to be modeled in discrete terms. A field problem includes the specification of a space–time domain and of the physical phenomena that are to be studied within it. The representation of the domain requires the development of a geometric model to which mathematical models of physical quantities and material properties must be linked, so that physical laws can finally be modeled as relations between these entities. Hence, our first task must be the development of a discrete mathematical model for the domain geometry. This will subsequently be used as a support for a discrete representation of fields, complying with the principles derived from the analysis of the mathematical structure of physical theories. The discrete representation of topological laws then follows naturally and univocally. This is not the case for constitutive relations, for whose discretization various options exist.
In the next sections we will examine a number of discrete mathematical concepts that can be used in the various discretization steps.

∗ As quoted by Moore (1989), Schrödinger, in a letter to Born, wrote: “ ‘If everything were linear, nothing would influence nothing,’ said Einstein once to me. That is actually so. The champions of linearity must allow zero-order terms, like the right side of the Poisson equation, ∇²V = −4πρ. Einstein likes to call these zero-order terms ‘asylum ignorantiae’” (p. 381).

A. Geometry

The result of the discretization process is the reduction of the mathematical model of a problem having an infinite number of degrees of freedom to one with a finite number. This means that we must find a finite number of entities which are related in a known way to the physical quantities of interest. If we focus our attention on the fields, and think in terms of the usual continuous representations by scalar or vector functions, the first thing that comes to mind is the plain sampling of the field functions at a finite number of points, usually called nodes, within the domain. This sampling produces a collection of nodal scalar or vector values, which eventually appear in the system of algebraic equations produced by the discretization. Our previous analysis reveals, however, that this nodal sampling of local field quantities is unsuitable for a discretization that aims at preserving the mathematical structure of the field problem, since such a discretization requires the association of global physical quantities with geometric objects that are not necessarily points. From this point of view, a sound discretization of geometry must provide all the kinds of oriented geometric objects that are actually required to support the global physical quantities appearing within the problem, or at least those appearing in its final formulation as a set of algebraic equations. Let us see how this reflects on mesh properties.

1. Cell Complexes

Our meshes must allow the housing of global physical quantities. Hence, their basic building blocks must be oriented geometric objects. Since we are going to make heavy use of concepts belonging to the branch of mathematics called algebraic topology, we will adopt the corresponding terminology. Algebraic topology is a branch of mathematics that studies the topological properties of spaces by associating them with suitable algebraic structures, the study of which gives information about the topological structure of the original space (Hocking and Young, 1988). In the first stages of its development, this discipline considered mostly spaces topologically equivalent to polytopes (polygons, polyhedra, etc.).
Many results of algebraic topology are obtained by subdividing the spaces under scrutiny into collections of simple subspaces. Understandably, then, many concepts used within the present work were formalized in that context. In later developments of algebraic topology, much of the theory was extended from polytopes to arbitrary compact spaces. The concepts involved necessarily became more abstract, and the recourse to simple geometric constructions waned. Since all our domains are assumed to be topologically equivalent to polytopes, we need, and will refer to, only the ideas and methods of the first, more intuitive version of algebraic topology. With the new terminology, what we have so far called an oriented p-dimensional geometric object will be called an oriented p-dimensional cell, or simply a p-cell, since all cells will be assumed to be oriented even when this is not explicitly stated. From the point of view of algebraic topology, a p-cell τ_p in a domain D can be defined simply as a set of points that is homeomorphic to a closed p-ball B^p = {x ∈ R^p : ‖x‖ ≤ 1} of the Euclidean p-dimensional space (Franz, 1968; Hocking and Young, 1988; Whitney, 1957). To model our domains as generic topological spaces, however, would be far too general. We can assume, without loss of generality, that the domain D of our problem is an n-dimensional differentiable manifold of which our p-cells are p-dimensional regular subdomains∗ (Boothby, 1986). With these hypotheses a p-cell τ_p is the same p-dimensional “blob” that we adopted as a geometric object. The boundary ∂τ_p of a p-cell τ_p is the subset of D that the preceding homeomorphism links to the boundary ∂B^p = {x ∈ R^p : ‖x‖ = 1} of B^p. A cell is internally (externally) oriented when we have selected one of its two possible internal (external) orientations as the positive orientation. According to our established convention, we will add a tilde to distinguish externally oriented cells τ̃_p from internally oriented cells τ_p. To simplify the notation, in presenting new concepts we will usually refer to internally oriented cells; the results obviously apply to externally oriented objects as well.

Figure 21. (a) Improper and (b) proper joining of cells.

In assembling the cells to form meshes, we must follow certain rules. These rules are dictated primarily by the necessity of relating in a certain way the physical quantities that are associated with the cells to those that are associated with their boundaries. Think, for example, of two adjacent 3-cells in a heat transfer problem; these cells can exchange heat through their common boundary, and we want to be able to associate this heat with a 2-cell belonging to the mesh. For this goal to be achieved, the cells of the mesh must be properly joined (Fig. 21). In addition, since the heat balance equation for each 3-cell involves the heat associated with the boundary of the cell, this boundary must be paved with a finite number of 2-cells of the mesh. Finally, to avoid the association of a given global quantity with multiple cells, we should ensure that two distinct cells do not overlap.

∗ In actual numerical problems p-cells are usually nothing more than bounded, convex, oriented polyhedra in R^n.

A structure that complies with these requirements is an n-dimensional finite cell complex K. This is a finite set of cells with the following two properties:

1. The boundary of each p-cell of K is the union of lower-dimensional cells of K (these cells are called the proper q-dimensional faces of τ_p, with q ranging from 0 to p − 1; it is useful to consider a cell an improper face of itself).
2. The intersection of any two cells of K is either empty or a (proper or improper) face of both cells.

This last requirement specifies the property of two cells’ being “properly joined.” We can, therefore, say that a finite cell complex K is a finite collection of properly joined cells with the property that if τ_p is a cell of K, then every face of τ_p belongs to K. Note that the term face without specification of the dimension usually refers only to the (p − 1)-dimensional faces. We say that a cell complex K decomposes, or is a subdivision of, a domain D (written |K| = D) if D is equal to the union of the cells in K. The collection of the p-cells and of all cells of dimension lower than p of a cell complex is called its p-skeleton. We will assume that our domains are always decomposable into finite cell complexes, and that all our cell complexes are finite, even when this is not explicitly stated. The requirement that the meshes be cell complexes may seem severe, for it implies proper joining of cells and covering of the entire domain without gaps or overlapping. A bit of reflection reveals, however, that it includes all structured and most unstructured meshes, excluding only a minority of cases such as composite and nonconformal meshes.
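The closure property can be checked mechanically for the simplest kind of mesh. The Python sketch below assumes simplicial cells identified by their vertex labels (orientation is ignored here) and assumes the simplices are geometrically realized consistently with those labels; the helper names and the two-triangle square used as an example are hypothetical.

```python
from itertools import combinations


def proper_faces(cell):
    """All proper faces of a simplex given as a sorted tuple of vertex labels."""
    return [f for r in range(1, len(cell)) for f in combinations(cell, r)]


def is_closed_complex(cells):
    """Property 1 for simplicial cells: every face of a cell belongs to K.

    Under the labelling assumption, two simplices can only meet in the simplex
    spanned by their shared vertices, so property 2 reduces to this same check;
    geometric overlaps are NOT detected here.
    """
    K = {tuple(sorted(c)) for c in cells}
    return all(f in K for c in K for f in proper_faces(c))


def skeleton(cells, p):
    """The p-skeleton: all cells of dimension <= p (a p-cell has p + 1 vertices)."""
    return {tuple(sorted(c)) for c in cells if len(c) <= p + 1}


square = [(0,), (1,), (2,), (3,),
          (0, 1), (1, 2), (2, 3), (0, 3), (0, 2),
          (0, 1, 2), (0, 2, 3)]
print(is_closed_complex(square))                              # True
print(is_closed_complex([c for c in square if c != (0, 2)]))  # False: diagonal missing
print(sorted(skeleton(square, 1)))
```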
Nonetheless, this requirement will be relaxed later or, better, the concept of a cell will be generalized, so as to include structures that can be considered as derived from a cell complex by means of a limit process. This is the case in the finite element method and in some of its generalizations, for example, meshless methods. For now, however, we will base the next steps of our quest for a discrete representation of geometry and fields on the hypothesis that the meshes are cell complexes. Note that for time-dependent problems we assume that the cell complexes subdivide the whole space–time domain of the problem.

2. Primary and Secondary Mesh

The requirement of housing the global physical quantities of a problem implies that both objects with internal orientation and objects with external orientation must be available. Hence, two logically distinct meshes must be defined, one with internal orientation and the other with external orientation. Let us denote them with the symbols K and K̃, respectively. Note that this requirement does not necessarily imply that two staggered meshes must always be used, for the two can share the same nonoriented geometric structure. There are, however, usually good reasons to differentiate the two meshes geometrically as well. In particular, the adoption of two dual cell complexes as meshes endows the resulting discrete mathematical model with a number of useful properties. In an n-dimensional domain, geometric duality means that to each p-cell τ_p^i of K there corresponds an (n − p)-cell τ̃_{n−p}^i of K̃, and vice versa. Note that in this case we are purposely using the same index to denote the two cells, for this not only is natural but facilitates a number of proofs concerning the relation between quantities associated with the two dual complexes. We will denote by n_p the number of p-cells of K and by ñ_p the number of p-cells of K̃. If the two n-dimensional cell complexes are duals, we have n_p = ñ_{n−p}. The names primal and dual meshes are often adopted for dual meshes. To allow for the case of nondual meshes, we will call the internally oriented mesh the primary mesh and the externally oriented one the secondary mesh. Note that the preceding discussion applies to the discretization of domains of any geometric dimension. Figure 22 shows an example of the two-dimensional case with dual grids, whereas Fig. 33 represents the same situation for the three-dimensional case.
Figure 22. The primary and secondary meshes, for the case of a two-dimensional domain and dual meshes. Note that dual geometric objects share a common index and the symbol which assigns the orientation. All the geometric objects of both meshes must be considered as oriented.
3. Incidence Numbers

Given a cell complex K, we want to give it an algebraic representation. Obviously, the mere list of cells of K is not enough, for it lacks all information concerning the structure of the complex; that is, it does not tell us how the cells are assembled to form the complex. Since in a cell complex two cells can meet at most on common faces, we can represent the complex connectivity by means of a structure that collects the data about cell-face relations. We must also include information concerning the relative orientation of cells. This can be done as follows. Each oriented geometric object induces an orientation on its boundary (Figs. 4 and 6); therefore, each p-cell of an oriented cell complex induces an orientation on its (p − 1)-faces. We can compare this induced orientation with the default orientation of the faces as (p − 1)-cells in K. Given the ith p-cell τ_p^i and the jth (p − 1)-cell τ_{p−1}^j of a complex K, we define an incidence number [τ_p^i, τ_{p−1}^j] as follows (Fig. 23):

$$[\tau_p^i, \tau_{p-1}^j] \stackrel{\text{def}}{=} \begin{cases} 0 & \text{if } \tau_{p-1}^j \text{ is not a face of } \tau_p^i \\ +1 & \text{if } \tau_{p-1}^j \text{ is a face of } \tau_p^i \text{ and has the induced orientation} \\ -1 & \text{as above, but with opposite orientation} \end{cases} \qquad (87)$$

This definition associates with an n-dimensional cell complex K a collection of n incidence matrices

$$\mathbf{D}_{p,p-1} = \left([\tau_p^i, \tau_{p-1}^j]\right) \qquad (88)$$

where the index i runs over all the p-cells of K, and j runs over all the (p − 1)-cells. We will denote by D̃_{p,p−1} the incidence matrices of K̃. In the particular case of dual cell complexes K and K̃, if the same index is assigned to pairs of dual cells, the following relations hold:

$$\tilde{\mathbf{D}}_{p,p-1} = \mathbf{D}^{T}_{n-p+1,\,n-p} \qquad (89)$$

Figure 23. Incidence numbers describe the cell-face relations within a cell complex. All the other 3-cells of the complex have 0 as their incidence number corresponding to the 2-cell τ̃_2^k.
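The following Python sketch builds D_{1,0} and D_{2,1} from Eq. (87) for a unit square split into two triangles. The conventions are assumptions of the illustration: an edge is stored as (tail, head) and induces −1 on its tail and +1 on its head; a triangle is stored with its vertices in counterclockwise order, which fixes its orientation. The names are hypothetical, and the duality relation (89) is not checked, since no secondary mesh is built.

```python
# Oriented cells of a unit square split along the diagonal 0-2 (hypothetical example).
VERTICES = [0, 1, 2, 3]
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # (tail, head)
TRIANGLES = [(0, 1, 2), (0, 2, 3)]                 # counterclockwise


def incidence_edges_vertices(edges, vertices):
    """D_{1,0}: the induced orientation gives -1 to the tail and +1 to the head."""
    D = [[0] * len(vertices) for _ in edges]
    for i, (tail, head) in enumerate(edges):
        D[i][vertices.index(tail)] = -1
        D[i][vertices.index(head)] = +1
    return D


def incidence_faces_edges(faces, edges):
    """D_{2,1}: compare each side, traversed as induced, with the edge's own orientation."""
    D = [[0] * len(edges) for _ in faces]
    for i, face in enumerate(faces):
        for a, b in zip(face, face[1:] + face[:1]):
            if (a, b) in edges:
                D[i][edges.index((a, b))] = +1
            elif (b, a) in edges:
                D[i][edges.index((b, a))] = -1
            else:
                raise ValueError(f"side {(a, b)} of face {face} is not an edge of K")
    return D


print(incidence_edges_vertices(EDGES, VERTICES))
print(incidence_faces_edges(TRIANGLES, EDGES))
```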
It can be proved with simple algebraic manipulations (Hocking and Young, 1988) that for an arbitrary p-cell τ_p^i and an arbitrary (p − 2)-cell τ_{p−2}^k, the following relationship holds among incidence numbers:

$$\sum_j [\tau_p^i, \tau_{p-1}^j]\,[\tau_{p-1}^j, \tau_{p-2}^k] = 0 \qquad (90)$$
Even if at first sight this relation does not convey any geometric ideas, from it there follow many fundamental properties of the discrete operators that we shall introduce subsequently. The set of oriented cells in K and the set of incidence matrices constitute an algebraic representation of the structure of the cell complex. Browsing through the incidence matrices, we can know everything concerning the orientation and connectivity of cells within the complex. In particular, we can know if two adjacent cells induce on the common face opposite orientations, in which case they are said to have compatible or coherent orientation. This is an important concept, for it expresses algebraically the intuitive idea of two adjacent p-cells’ having the same orientation (Figs. 23 and 24). Conversely, given an oriented p-cell, we can use this definition to propagate its orientation to neighboring p-cells [on orientable n-dimensional domains it is always possible to propagate the orientation of an n-cell to all the n-cells of the complex (Schutz, 1980)].
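In matrix form, Eq. (90) states that D_{p,p−1} D_{p−1,p−2} = 0, and the compatibility of two adjacent cells can be read off a pair of rows of a single incidence matrix. The Python sketch below checks both on a unit square split into two counterclockwise triangles (a hypothetical example; the matrices are written out by hand under the assumed convention that an edge induces −1 on its tail and +1 on its head).

```python
# Rows: triangles (0,1,2) and (0,2,3); columns: edges 0-1, 1-2, 2-3, 3-0, 0-2.
D21 = [[1, 1, 0, 0, -1],
       [0, 0, 1, 1, 1]]
# Rows: the five edges; columns: vertices 0..3.
D10 = [[-1, 1, 0, 0],
       [0, -1, 1, 0],
       [0, 0, -1, 1],
       [1, 0, 0, -1],
       [-1, 0, 1, 0]]


def matmul(A, B):
    """Plain matrix product; entry (i, k) is the sum over j in Eq. (90)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def compatible(D, i, k):
    """Cells i and k are compatible if they share a face and induce opposite
    orientations on every shared face; returns False if they share none."""
    shared = [j for j in range(len(D[i])) if D[i][j] != 0 and D[k][j] != 0]
    return bool(shared) and all(D[i][j] == -D[k][j] for j in shared)


print(matmul(D21, D10))       # all zeros, as Eq. (90) requires
print(compatible(D21, 0, 1))  # True: the diagonal gets -1 and +1
flipped = [D21[0], [-x for x in D21[1]]]  # reverse the second triangle
print(compatible(flipped, 0, 1))          # False
```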
Figure 24. Two adjacent cells have compatible orientation if they induce on the common face opposite orientations. The concept of induced orientation can be used to propagate the orientation of a p-cell to neighboring p-cells.
4. Chains

Now that we know how to represent algebraically the cell complex that discretizes the domain, we want to construct a machinery to represent generic parts of it. This means that we want to represent an assembly of cells, each with a given orientation and weight of our choice. A first requirement for this task is the ability to represent cells with the default orientation and cells with the opposite one. This is most naturally achieved by denoting a cell with its default orientation by τ_p and one with the opposite orientation by −τ_p. We can then represent a generic p-dimensional domain c_p composed of p-cells of the complex K as a formal sum,

$$c_p = \sum_{i=1}^{n_p} w_i\,\tau_p^i, \qquad \tau_p^i \in K \qquad (91)$$
where the coefficient w_i can take the value 0, +1, or −1, to denote a cell of the complex not included in c_p, or included in it with the default orientation or its opposite, respectively. This formalism, therefore, allows the algebraic representation of discrete subdomains as “sums” of cells. We now make a generalization, allowing the coefficients of the formal sum [Eq. (91)] to take arbitrary real values w_i ∈ R. To preserve the representation of the orientation inversion as a sign inversion, we assume that the following property holds true:

$$w_i\,(-\tau_p^i) = -w_i\,\tau_p^i \qquad (92)$$
With this extension, we can represent oriented p-dimensional domains in which each cell is weighted differently. This entity is analogous, in a discrete setting, to a subdomain with a weight function defined on it; thus it will be useful in giving a geometric interpretation to the discretization strategies of numerical methods, such as finite elements, that make use of weight functions. In algebraic topology, given a cell complex K, a formal sum like Eq. (91), with real weights satisfying Eq. (92), is called a p-dimensional chain with real coefficients, or simply a p-chain c_p (Fig. 25). If it is necessary to specify explicitly the coefficient space for the weights w_i and the cell complex on which a particular chain is built, we write c_p(K, R). We can define in an obvious way an operation of addition of chains defined on the same complex, and one of multiplication of a chain by a real number λ, as follows:

$$c_p + c'_p = \sum_i w_i\,\tau_p^i + \sum_i w'_i\,\tau_p^i = \sum_i (w_i + w'_i)\,\tau_p^i \qquad (93)$$

$$\lambda c_p = \lambda \sum_i w_i\,\tau_p^i = \sum_i (\lambda w_i)\,\tau_p^i \qquad (94)$$
Figure 25. Given an oriented cell complex (top), a p-chain (bottom) represents a weighted sum of oriented p-cells. The weights are represented as shades of gray. Notice that negative weights make the corresponding cell appear in the chain with its orientation reversed with respect to the default orientation of the cell in the cell complex.
With these definitions the set of p-chains with real coefficients on a complex K becomes a vector space $C_p(K, \mathbb{R})$ over $\mathbb{R}$, often written simply as $C_p(K)$ or $C_p$. The dimension of this space is the number $n_p$ of p-cells in K. Note that each p-cell $\tau_p$ can be considered an elementary p-chain $1 \cdot \tau_p$. These elementary p-chains constitute a natural basis in $C_p$, which permits the representation of a chain by the $n_p$-tuple of its weights:

$$ c_p = (w_1, w_2, \ldots, w_{n_p}) \tag{95} $$
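In the natural basis of Eq. (95) a chain is simply a tuple of real weights, so the operations of Eqs. (93) and (94) act weight by weight. The following Python sketch assumes that representation; the function names (`chain_add`, `chain_scale`, `elementary_chain`) are illustrative and not taken from any library.

```python
# A p-chain on a complex with n_p p-cells is stored as its n_p-tuple of weights
# in the natural basis (Eq. 95). A negative weight means the cell enters the
# chain with its orientation reversed (Eq. 92).

def chain_add(c, c_prime):
    """Eq. (93): sum of two chains on the same complex, weight by weight."""
    if len(c) != len(c_prime):
        raise ValueError("both chains must be defined on the same complex")
    return tuple(w + wp for w, wp in zip(c, c_prime))


def chain_scale(lam, c):
    """Eq. (94): product of a chain by the real number lam."""
    return tuple(lam * w for w in c)


def elementary_chain(i, n_p):
    """The elementary chain 1 * tau_p^i, i.e. the i-th natural basis vector."""
    return tuple(1.0 if j == i else 0.0 for j in range(n_p))


c = (0.5, -1.0, 0.0)   # weights on three p-cells
c2 = (0.5, 1.0, 2.0)
print(chain_add(c, c2))     # (1.0, 0.0, 2.0)
print(chain_scale(2.0, c))  # (1.0, -2.0, 0.0)

# Every chain is the weighted sum of the elementary chains (the natural basis).
rebuilt = (0.0, 0.0, 0.0)
for i, w in enumerate(c):
    rebuilt = chain_add(rebuilt, chain_scale(w, elementary_chain(i, 3)))
print(rebuilt == c)  # True
```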
Working with the natural basis, we can easily define linear operators on chains as linear extensions of their action on cells. In particular, this is the case for the definition of the boundary of a chain.

5. The Boundary of a Chain

The boundary $\partial\tau_p$ of a cell $\tau_p$ is by definition the collection of its faces, endowed with the induced orientation (Figs. 4 and 6). Recalling the definition of the incidence numbers, we can write

$$ \partial\tau_p = \sum_{j=1}^{n_{p-1}} [\tau_p, \tau_{p-1}^j]\,\tau_{p-1}^j \tag{96} $$
where the index j runs over all the (p − 1)-cells of the complex. Note that Eq. (96) gives a geometric operation an algebraic representation based solely on incidence matrices. Since the p-cells constitute a natural basis
CLAUDIO MATTIUSSI
for the space of p-chains, we can extend the definition of ∂ linearly to an operator, the boundary operator, acting on arbitrary p-chains, as follows:

$$ \partial c_p = \partial\Big(\sum_i w_i\,\tau_p^i\Big) = \sum_i w_i\,\partial\tau_p^i \tag{97} $$
Thus the boundary of a p-chain is a (p − 1)-chain, and ∂ is a linear mapping $\partial : C_p(K) \to C_{p-1}(K)$ of the space of p-chains into that of (p − 1)-chains. It can be proved (Hocking and Young, 1988), by using Eq. (90), that for any chain $c_p$ the following identity holds:

$$ \partial(\partial c_p) = 0 \tag{98} $$
That is, the boundary of a chain has no boundary, a result that, when applied to elementary chains (i.e., to p-cells), agrees with our geometric intuition. The boundary of a cell defined by Eq. (96) coincides essentially with the usual geometric idea of the boundary of a domain, complemented by the fact that the faces are endowed with the induced orientation. The boundary of a chain defined by Eq. (97), instead, can give a nonobvious result. Let us consider p-chains built with a set of cells that form a p-dimensional domain (Fig. 26). For some chains of this kind, the result of applying the boundary operator includes (p − 1)-cells that we typically do not consider as belonging to the boundary of the domain. In fact, it turns out
Figure 26. Given a p-chain $c_p$ (top), its boundary $\partial c_p$ is a (p − 1)-chain (bottom) that usually includes internal “vestiges” with respect to what we are used to considering the boundary of the domain spanned by the p-cells appearing in the p-chain. The weights of 2-cells are represented as shades of gray and those of 1-cells by the thickness of lines.
that this is the rule, not the exception, since each “internal” (p − 1)-cell of the domain $c_p$ appears in Eq. (97) unless the sum of the weights (each multiplied by the corresponding incidence number) that it receives from the p-cells of which it is a face [the so-called cofaces of the (p − 1)-cell] vanishes. Obviously, this cancellation occurs only for particular sets of weights, that is, for particular chains. Later, we shall build a correspondence between chains and weighted domains. In that context, the boundary of a weighted domain will be defined, and the result will turn out to be confined to the traditional boundary only for particular weight functions.
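The appearance of internal “vestiges” can be checked on a small example. The sketch below assumes a hypothetical complex of two triangles sharing one edge, with orientations chosen by hand; it stores the incidence numbers of Eq. (96) as integer matrices (the names `D21`, `D10`, and `boundary` are illustrative), applies Eq. (97) to two 2-chains, and verifies Eq. (98).

```python
# Hypothetical 2-D complex: vertices v0..v3, edges e0..e4, triangles t0, t1.
# D21[i][j] = [t_i, e_j], the incidence number of edge j in triangle i.
D21 = [
    [1, 1, 1, 0, 0],   # t0 = (v0, v1, v2): boundary e0 + e1 + e2
    [0, -1, 0, 1, 1],  # t1 = (v1, v3, v2): boundary e3 + e4 - e1
]
# D10[j][k] = [e_j, v_k]; the boundary of an edge a -> b is b - a.
D10 = [
    [-1, 1, 0, 0],   # e0: v0 -> v1
    [0, -1, 1, 0],   # e1: v1 -> v2 (shared by t0 and t1)
    [1, 0, -1, 0],   # e2: v2 -> v0
    [0, -1, 0, 1],   # e3: v1 -> v3
    [0, 0, 1, -1],   # e4: v3 -> v2
]


def boundary(D, weights):
    """Eq. (97): (p-1)-weight j of the boundary is sum_i w_i [tau_p^i, tau_{p-1}^j]."""
    n_faces = len(D[0])
    return tuple(sum(w * row[j] for w, row in zip(weights, D)) for j in range(n_faces))


# Equal weights: the contributions of t0 and t1 cancel on the shared edge e1.
print(boundary(D21, (1, 1)))   # (1, 0, 1, 1, 1)
# Different weights: e1 survives as an internal "vestige" with weight 0.5.
b = boundary(D21, (1.0, 0.5))
print(b)                       # (1.0, 0.5, 1.0, 0.5, 0.5)
# Eq. (98): the boundary of a boundary vanishes.
print(boundary(D10, b))        # (0.0, 0.0, 0.0, 0.0)
```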
B. Fields

A consequence of our traditional mathematical education is that when we hear the word field we tend to think immediately of its representation in terms of some kind of field function, that is, of some continuous representation. If we refrain from this premature association, we can easily recognize that the transition from what is observed to this kind of representation requires a nontrivial abstraction. In practice, we can measure only global quantities, that is, quantities related to macroscopic p-dimensional space–time subdomains of a given domain. It is, however, natural to imagine that we could potentially perform an infinite number of measurements, one for each possible subdomain. We then conceive this collection of possible measurements as a single entity, which we call the field, and we represent this entity mathematically in a way that permits the modeling of these measurements, for example, as a field function that can be integrated on arbitrary p-dimensional subdomains.

Consider now a domain in which we have built a mesh, say, a cell complex K. By so doing, we have selected a particular collection of subdomains, the cells of the complex K. Consequently we must (and can) deal only with the global quantities associated with these subdomains. The fields will manifest themselves on this mesh as collections of global quantities associated with these cells only. Of course, this association will be sensitive to orientation and linear with respect to cell assembly. This, in essence, is the idea behind the representation of fields on discretized domains in terms of cochains.

1. Cochains

Given an oriented cell complex K and an (algebraic) field F, consider a function $c^p$ which assigns to each cell $\tau_p^i$ of K (thought of as an elementary chain) an element $c_i$ of F, written

$$ \langle \tau_p^i, c^p \rangle = c_i \tag{99} $$
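and which is linear with respect to the operation of cell assembly represented by chains, that is, satisfies

$$ \langle c_p, c^p \rangle = \Big\langle \sum_i w_i\,\tau_p^i,\ c^p \Big\rangle = \sum_i w_i\,\langle \tau_p^i, c^p \rangle \tag{100} $$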
This function $c^p$ is called a p-dimensional cochain, or simply a p-cochain $c^p$. It can be written as $c^p(K, F)$ or $c^p(K)$ to designate explicitly the cell complex and the algebraic field involved in the definition [when the complex is externally oriented, we will write $c^p(\tilde K)$ if the complex is explicitly mentioned, and $\tilde c^p$ if it is not]. We will call ordinary cochains those defined on an internally oriented cell complex, and twisted cochains those defined on an externally oriented one (Burke, 1985; Teixeira and Chew, 1999b). We can readily see that this definition captures the essence of what we said previously concerning the action of physical fields on domains partitioned into cell complexes. The cochain, like a field, associates a value with each cell, and the association is additive on cell assembly. Note that from Eq. (100) it follows that

$$ \langle -\tau_p, c^p \rangle = -\langle \tau_p, c^p \rangle \tag{101} $$
That is, as expected, the value assumed by a cochain on a cell changes sign when the orientation of the cell is inverted. Thus, the only thing that must be added to the mathematical definition of a cochain to make it suitable for the representation of fields is the attribution of a physical dimension to the values associated with cells. With this further attribution the values can be interpreted as global physical quantities (which, we stress again, need not be scalars), and the corresponding entity can be called a physical p-cochain. All cochains considered in this work are to be understood as physical cochains, even when the qualifier “physical” is omitted. From Eq. (100) we see that a cochain $c^p$ is actually a linear mapping $c^p : C_p(K) \to F$ of the space of chains $C_p(K)$ into the algebraic field F, which assigns to each chain $c_p$ the value

$$ \langle c_p, c^p \rangle \tag{102} $$
This representation emphasizes the equal roles of the chain and of the cochain in the pairing. To assist our intuition, we can think of Eq. (102) as a discrete counterpart of the integral of a field function on a weighted domain, and this suggests the following alternative notation for the pairing (Bamberg and Sternberg, 1988):

$$ \langle c_p, c^p \rangle \equiv \int_{c_p} c^p \tag{103} $$
We can define the sum of two cochains and the product of a cochain by an element λ of F, as follows:

$$ \langle c_p, c^p + c'^p \rangle = \langle c_p, c^p \rangle + \langle c_p, c'^p \rangle \tag{104} $$

$$ \langle c_p, \lambda c^p \rangle = \lambda \langle c_p, c^p \rangle \tag{105} $$

This definition turns the set of cochains into a vector space $C^p(K, F)$ over F, usually written simply as $C^p(K)$ or $C^p$. A natural basis for this vector space is constituted by the elementary p-cochains, which assign the unity of F to one p-cell and the null element of F to all other p-cells of the complex. The dimension of $C^p(K)$ is, therefore, the number $n_p$ of p-cells in K, and in the natural basis we can represent a cochain uniquely as the $n_p$-tuple of its values on cells:

$$ c^p = (c_1, c_2, \ldots, c_{n_p})^T, \qquad c_i = \langle \tau_p^i, c^p \rangle \in F \tag{106} $$

With this representation, and with the corresponding one for a chain [Eq. (95)], the pairing of a chain and a cochain is given by

$$ \langle c_p, c^p \rangle = \sum_{i=1}^{n_p} w_i\,c_i \tag{107} $$
In the case of a physical cochain, the natural representation is an $n_p$-tuple of global physical quantities associated with p-cells. For example, in a heat transfer problem the heat content 3-cochain $\tilde Q_c^3$ is represented by the $\tilde n_3$-tuple of the heat contents of the 3-cells $\tilde\tau_3$ of the cell complex $\tilde K$ which discretizes the domain:

$$ \tilde Q_c^3 = \big(Q_c^1, Q_c^2, \ldots, Q_c^{\tilde n_3}\big)^T \tag{108} $$

where

$$ Q_c^i = Q_c(\tilde\tau_3^i) = \langle \tilde\tau_3^i, \tilde Q_c^3 \rangle \tag{109} $$

The heat $Q_c$ associated with a chain $\tilde c_3 = \sum_{i=1}^{\tilde n_3} w_i\,\tilde\tau_3^i$ therefore corresponds to

$$ Q_c = \langle \tilde c_3, \tilde Q_c^3 \rangle = \sum_{i=1}^{\tilde n_3} w_i\,Q_c^i \tag{110} $$

Note the similarity with a weighted integral:

$$ Q_c = \int_{\tilde V} w\,q_c \tag{111} $$
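Equations (107) and (110) reduce the pairing to a weighted sum. As a sketch, with hypothetical heat contents for three 3-cells of $\tilde K$ (the numbers are invented for illustration only), the linearity properties of Eqs. (100), (101), and (104) can be checked directly; `pairing` is an illustrative name.

```python
def pairing(chain, cochain):
    """Eq. (107): <c_p, c^p> = sum_i w_i c_i, both given in the natural basis."""
    if len(chain) != len(cochain):
        raise ValueError("chain and cochain must refer to the same complex")
    return sum(w * c for w, c in zip(chain, cochain))


# Hypothetical heat-content 3-cochain (joules) on three 3-cells, as in Eq. (108).
Q_c = (120.0, 80.0, 40.0)
# A weighted 3-chain: full weight on the first cell, half on the second, none on the third.
c3 = (1.0, 0.5, 0.0)

print(pairing(c3, Q_c))                 # 160.0, the heat Q_c of Eq. (110)
# Eq. (101): reversing the orientation of a cell reverses the sign.
print(pairing((-1.0, 0.0, 0.0), Q_c))   # -120.0
# Eq. (104): the pairing is additive in the cochain as well.
Q_extra = (10.0, 10.0, 10.0)
Q_sum = tuple(a + b for a, b in zip(Q_c, Q_extra))
print(pairing(c3, Q_sum) == pairing(c3, Q_c) + pairing(c3, Q_extra))  # True
```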
Using the concept of cochain, we can redraw the classification diagrams of physical quantities for a discretized domain, substituting the field functions
Figure 27. The Tonti classification diagram of global electromagnetic physical quantities in terms of cochains. Note the presence of two null cochains, corresponding to the absence of magnetic flux production and to the absence of electric charge production.
with the corresponding cochains. For example, in electromagnetism we have the 1-cochain $U^1$ of electromagnetic potential; the 2-cochains $\Phi^2$ and $\tilde\Psi^2$ of magnetic flux and electric flux, respectively; and the 3-cochain $\tilde Q^3$ of electric charge (to which we must add the null 3-cochain $0^3$ of magnetic flux production and the null 4-cochain $\tilde 0^4$ of electric charge production). The corresponding classification diagram is depicted in Figure 27.

Remark III.1 It is sometimes argued that on finite complexes, cochains and chains coincide, since both associate numbers with a finite number of cells (Hocking and Young, 1988). Even disregarding the fact that the numbers associated by chains are dimensionless multiplicities whereas those associated by cochains are physical quantities, the two concepts are quite different. Chains can be seen as functions which associate numbers with cells. The only requirement is that the number change sign if the orientation of the cell is inverted. Note that no mention is made of values associated with collections of cells, nor could it be made, for this concept is still undefined. Before the introduction of the concept of chain we have at our disposal only the bare structure of the complex: the set of cells in the complex and their connectivity as described by the incidence matrices. It is the very definition of chain which provides the concept of an assembly of cells. Only at this point can cochains be defined, which associate numbers not only with single cells, as chains do, but also with assemblies of cells. This association is required to be not only orientation dependent, but also linear with respect to the assembly of cells represented by
chains. This extension from weights associated with single cells to quantities associated with assemblies of cells is not trivial and makes cochains a very different entity from chains, even on finite cell complexes.

2. Limit Systems

The idea of the field as a collection of its manifestations in terms of cochains on the cell complexes that subdivide the domain of a problem finds a representation in certain mathematical structures called limit systems. The basic idea is that we can consider in a domain D the set $\mathcal{K}$ of all the cell complexes that can be built on it (with the kind of orientation that suits the field at hand). We can then form the collection of all the corresponding physical p-cochains on the complexes in $\mathcal{K}$. This collection can be considered, intuitively, the collection of all possible measurements for all possible field configurations on D. Next we want to partition this collection of cochains into sets, each set including only measurements that derive from a given field configuration. We define for this task a selection criterion based on the additivity of global quantities. This criterion is the relation that links the cochains within each set and allows us to consider each of these sets as a new entity, which in our interpretation is a particular field configuration thought of as a collection of its manifestations in terms of cochains. We can define operations between fields, and operators acting on them, deriving naturally from the corresponding ones defined for cochains. For example, we can define the addition of fields and the analogues of the traditional differential operators (gradient, curl, and divergence) in intuitive discrete terms. This allows an easy transition from the discrete, observable properties to the corresponding continuous abstractions. The reader is warned that the rest of this section is abstract compared with the prevailing style of the present work.
The details, however, can be skipped at first reading, since only the main ideas are required in the sequel. The point is not to give a sterile formalization of the ideas presented so far, but to provide conceptual tools for representing the link between discrete and continuous models. Let us now address the mathematics. Consider the set $\mathcal{K} = \{K_\alpha\}$ of all cell complexes which subdivide a domain D. In this case the complexes are internally oriented, but they could be externally oriented as well. We will say that a complex $K_\beta$ is a refinement of $K_\alpha$, written $K_\alpha < K_\beta$, if each cell of $K_\alpha$ is a union of cells of $K_\beta$. The set $\mathcal{K}$ is partially ordered by the relation <, and for any pair of complexes $K_\alpha$, $K_\beta$ there exists a complex $K_\gamma$ in $\mathcal{K}$ which is a refinement of both (remember that our domains are homeomorphic to polytopes) (Whitney, 1957). This property makes $\mathcal{K}$ a directed set (Eilenberg and Steenrod, 1952; Hocking and Young, 1988).
For each complex $K_\alpha$ in $\mathcal{K}$ let us consider the set $C^p(K_\alpha)$ of all the physical p-cochains on $K_\alpha$ with a given physical dimension [with our choice of the space where the cochains take their values, $C^p(K_\alpha)$ is a vector space, but for simplicity we will consider only the group operation in the sequel]. Whenever $K_\alpha < K_\beta$ in $\mathcal{K}$, there is a transformation $f_{K_\alpha K_\beta} : C^p(K_\beta) \to C^p(K_\alpha)$ which, to each cochain $c^p(K_\beta)$, associates the cochain $c^p(K_\alpha)$ taking on each cell $\tau_p$ of $K_\alpha$ the sum (with proper signs, to take care of orientation) of the values taken by $c^p(K_\beta)$ on the cells of $K_\beta$ which compose $\tau_p$. Physically speaking, the transformations $f_{K_\alpha K_\beta}$ are based on the additivity of global physical quantities upon cell assembly. These transformations satisfy the following properties:

• $f_{K_\alpha K_\alpha}$ is the identity transformation for each $K_\alpha$ in $\mathcal{K}$.
• $f_{K_\alpha K_\beta}\, f_{K_\beta K_\gamma} = f_{K_\alpha K_\gamma}$ whenever $K_\alpha < K_\beta < K_\gamma$.
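For nested meshes the maps $f_{K_\alpha K_\beta}$ are simple sums. The sketch below assumes a hypothetical family of uniform 1-D complexes on D = [0, 1] with 8, 4, and 2 segments, all oriented left to right, and a 1-cochain obtained by integrating the (arbitrarily chosen) field function x over each segment; exact fractions avoid round-off. The helper names `restrict` and `integrate_x` are illustrative. It checks the two properties listed above and that cochains coming from the same field are linked by these maps.

```python
from fractions import Fraction


def restrict(fine_values, ratio):
    """Hypothetical f_{K_alpha K_beta} for nested uniform 1-D meshes.

    Each coarse 1-cell is the union of `ratio` consecutive fine 1-cells with
    the same orientation, so its value is the sum of theirs (additivity).
    """
    if len(fine_values) % ratio:
        raise ValueError("the fine mesh must refine the coarse one")
    return [sum(fine_values[i:i + ratio]) for i in range(0, len(fine_values), ratio)]


def integrate_x(n):
    """1-cochain of the field x on n equal segments: integral over [k/n, (k+1)/n]."""
    return [Fraction(2 * k + 1, 2 * n * n) for k in range(n)]


c_gamma = integrate_x(8)   # cochain on K_gamma (finest)
c_beta = integrate_x(4)    # cochain on K_beta
c_alpha = integrate_x(2)   # cochain on K_alpha (coarsest)

# Cochains of the same field are linked by the restriction maps.
print(restrict(c_gamma, 2) == c_beta)   # True
# Composition: f_{alpha beta} f_{beta gamma} = f_{alpha gamma}.
print(restrict(restrict(c_gamma, 2), 2) == restrict(c_gamma, 4) == c_alpha)  # True
# f_{alpha alpha} is the identity.
print(restrict(c_alpha, 1) == c_alpha)  # True
```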
If $\mathcal{F}$ denotes the collection $\{f_{K_\alpha K_\beta}\}$ of all such transformations, and $\mathcal{C}^p$ the collection $\{C^p(K_\alpha),\ K_\alpha \in \mathcal{K}\}$ of all the physical p-cochains, homogeneous with respect to the physical nature of the field, on cell complexes in $\mathcal{K}$, the pair $\{\mathcal{C}^p, \mathcal{F}\}$ is called an inverse limit system over the directed set $\mathcal{K}$. Each set $C^p(K_\alpha)$ in the collection $\mathcal{C}^p$ is a group, and each $f_{K_\alpha K_\beta}$ in $\mathcal{F}$ is a homomorphism. The inverse limit group $C^p(K_\infty)$ of the system $\{\mathcal{C}^p, \mathcal{F}\}$ is the subgroup of the direct sum $\bigoplus_{\mathcal{K}} C^p(K_\alpha)$ consisting of all sets $\{c^p(K_\alpha)\}$ (one element from each group $C^p(K_\alpha)$) for which $f_{K_\alpha K_\beta}(c^p(K_\beta)) = c^p(K_\alpha)$ whenever $K_\alpha < K_\beta$ in $\mathcal{K}$. The group operation in $C^p(K_\infty)$ is defined naturally by the formula

$$ \{c^p(K_\alpha)\} + \{c'^p(K_\alpha)\} = \{c^p(K_\alpha) + c'^p(K_\alpha)\} \tag{112} $$
where the sum on the right indicates the group operation in each $C^p(K_\alpha)$. For each $K_\beta$ in $\mathcal{K}$ there is a natural projection $\pi_{K_\beta} : C^p(K_\infty) \to C^p(K_\beta)$, defined by $\pi_{K_\beta}(\{c^p(K_\alpha)\}) = c^p(K_\beta)$. Each projection $\pi_{K_\alpha}$ is a homomorphism.

As anticipated, we can interpret all this physically as follows. Each element $\{c^p(K_\alpha)\}$ of the inverse limit group $C^p(K_\infty)$ represents a physical field defined in the domain D, which associates physical quantities with p-dimensional geometric objects (let us call it a physical p-field, or simply a p-field). The set $\{c^p(K_\alpha)\}$ is the collection of its manifestations [in terms of cochains $c^p(K_\alpha)$] on the cell complexes $K_\alpha \in \mathcal{K}$ which decompose D. The cochains that correspond to a given field can be recognized because they are linked by the functions in $\mathcal{F}$. The group operation in $C^p(K_\infty)$ is the addition of fields. The projection $\pi_{K_\beta}$ associates with each field $\{c^p(K_\alpha)\}$ the corresponding cochain $c^p(K_\beta)$ on the cell complex $K_\beta$. In the previous example we would have on each complex $\tilde K_\alpha$ the heat content cochain $\tilde Q_c^3(\tilde K_\alpha)$, and each set $\{\tilde Q_c^3(\tilde K_\alpha)\}$ of the inverse limit group $\tilde Q_c^3(\tilde K_\infty)$ would be a particular heat content field.
To complete this exposition of limit systems, we now anticipate some ideas, related to the action of an operator between cochain spaces, that will be introduced in the next section. For each cell complex $K_\alpha$ we will define an operator $\delta_{K_\alpha}$, the coboundary operator, which will be used to write topological laws in discrete form. This is a homomorphism of $C^p(K_\alpha)$ into $C^{p+1}(K_\alpha)$. Given the two inverse limit systems $\{\mathcal{C}^p, \mathcal{F}\}$ and $\{\mathcal{C}^{p+1}, \mathcal{F}'\}$ over the directed set $\mathcal{K}$ [i.e., the inverse limit systems constructed with the p-cochains and the (p + 1)-cochains on the cell complexes which decompose D], we define a transformation δ of $\{\mathcal{C}^p, \mathcal{F}\}$ into $\{\mathcal{C}^{p+1}, \mathcal{F}'\}$ based on the action of $\delta_{K_\alpha}$. The transformation δ consists, for each $K_\alpha$ in $\mathcal{K}$, of the transformation $\delta_{K_\alpha}$, with the condition that whenever $K_\alpha < K_\beta$ in $\mathcal{K}$ the commutative relation $\delta_{K_\alpha} f_{K_\alpha K_\beta} = f'_{K_\alpha K_\beta} \delta_{K_\beta}$ holds, so that the sum on p-cells goes into the sum on (p + 1)-cells. Such a transformation δ of $\{\mathcal{C}^p, \mathcal{F}\}$ into $\{\mathcal{C}^{p+1}, \mathcal{F}'\}$ induces a homomorphism $\delta_\infty$ of the inverse limit group $C^p(K_\infty)$ into $C^{p+1}(K_\infty)$ as follows. If $\{c^p(K_\alpha)\}$ is an element of $C^p(K_\infty)$ and $K_\alpha$ in $\mathcal{K}$ is given, set $c^{p+1}(K_\alpha) = \delta_{K_\alpha}(c^p(K_\alpha))$. Note that if $K_\alpha < K_\beta$, the preceding commutative relation tells us that $f'_{K_\alpha K_\beta}(c^{p+1}(K_\beta)) = c^{p+1}(K_\alpha)$. Thus $\{c^{p+1}(K_\alpha)\}$ is an element of $C^{p+1}(K_\infty)$. We define $\delta_\infty(\{c^p(K_\alpha)\}) = \{c^{p+1}(K_\alpha)\}$. Since each element $\{c^p(K_\alpha)\}$ of the inverse limit group $C^p(K_\infty)$ represents a p-field defined in the domain D, $\delta_\infty$ represents an operator which transforms a p-field into a (p + 1)-field on D. We will see that it can be considered a way to define “differential” operators without the use of derivatives. All this shows how the idea of field can be considered a limit concept abstracted from a collection of discrete manifestations of the field on cell complexes or, if you prefer, from a collection of possible measurements of global physical quantities.
Correspondingly, a physical law concerning a field can be considered a collection of relations between the cochains that constitute the field. Note that the idea of field is an abstraction that remains at a higher logical level than that of actual measurements. So, we must take care not to treat a single cochain on a particular cell complex as if it were a field (which is instead a class of cochains), for this would be an error of logical typing (such as eating the menu card instead of the dinner) (Bateson, 1972).
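The commutative relation $\delta_{K_\alpha} f_{K_\alpha K_\beta} = f'_{K_\alpha K_\beta} \delta_{K_\beta}$ can be observed in the simplest setting. As an illustrative sketch, assume nested uniform 1-D complexes with edges oriented left to right: a 0-cochain (e.g., potential values at nodes) is restricted by keeping its values at the coarse nodes, a 1-cochain by summing over the fine edges composing each coarse edge, and δ takes differences of nodal values (the incidence numbers of an edge being −1 at its start and +1 at its end). The function names are hypothetical.

```python
def coboundary_0(node_values):
    """delta on a 0-cochain of a 1-D complex whose edges run left to right:
    value on edge [x_k, x_(k+1)] = c(x_(k+1)) - c(x_k)."""
    return [b - a for a, b in zip(node_values, node_values[1:])]


def restrict_0(fine_nodes, ratio):
    """f for 0-cochains: every coarse node is also a fine node; keep its value."""
    if (len(fine_nodes) - 1) % ratio:
        raise ValueError("the fine mesh must refine the coarse one")
    return fine_nodes[::ratio]


def restrict_1(fine_edges, ratio):
    """f' for 1-cochains: a coarse edge is the union of `ratio` fine edges."""
    if len(fine_edges) % ratio:
        raise ValueError("the fine mesh must refine the coarse one")
    return [sum(fine_edges[i:i + ratio]) for i in range(0, len(fine_edges), ratio)]


phi_fine = [0, 3, 1, 4, 1, 5, 9, 2, 6]          # arbitrary 0-cochain on the 9 nodes of 8 edges
left = coboundary_0(restrict_0(phi_fine, 2))    # restrict, then take delta
right = restrict_1(coboundary_0(phi_fine), 2)   # take delta, then restrict
print(left, right)   # [1, 0, 8, -3] [1, 0, 8, -3]
```

The equality is the telescoping of nodal differences: summing the fine differences along a coarse edge leaves only the values at its endpoints.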
C. Topological Laws

Equipped with the concept of the cochain, we are now in a position to give topological laws a discrete representation. Previously, we derived the topological laws of electromagnetism in discrete form [Eqs. (56) through (62)]. The fundamental property of all these relations, shared by all topological laws, is that they equate a global physical quantity associated with a geometric object to
another global quantity associated with its boundary. This appears even more clearly in the space–time formulation of the same laws [Eqs. (63) through (66), repeated next for easy reference]:

$$ \phi(\partial V) = 0(V) \tag{113} $$

$$ Q(\partial \tilde H) = 0(\tilde H) \tag{114} $$

$$ U(\partial S) = \phi(S) \tag{115} $$

$$ \psi(\partial \tilde V) = Q(\tilde V) \tag{116} $$
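If the domain is meshed with the primary and secondary cell complexes, we have to substitute the generic geometric objects appearing in Eqs. (113) through (116) with the cells of K and $\tilde K$. Equations (113) through (116) then become

$$ \phi(\partial\tau_3) = 0(\tau_3) \quad \forall \tau_3 \in K \tag{117} $$

$$ Q(\partial\tilde\tau_4) = 0(\tilde\tau_4) \quad \forall \tilde\tau_4 \in \tilde K \tag{118} $$

$$ U(\partial\tau_2) = \phi(\tau_2) \quad \forall \tau_2 \in K \tag{119} $$

$$ \psi(\partial\tilde\tau_3) = Q(\tilde\tau_3) \quad \forall \tilde\tau_3 \in \tilde K \tag{120} $$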
Equations (117) through (120) have simple interpretations. For example, Eq. (120) says that the charge associated with each 3-cell of the secondary mesh $\tilde K$ equals the electric flux associated with the boundary of the 3-cell.

1. The Coboundary Operator

Each of Eqs. (117) through (120) is a list of equalities between global quantities. We can ask whether the discrete representation of fields in terms of cochains can be used to write this list in a more compact way. A first step in this direction consists of writing each global quantity as a chain–cochain pairing. For example, Eq. (120) becomes

$$ \langle \partial\tilde\tau_3, \tilde\Psi^2 \rangle = \langle \tilde\tau_3, \tilde Q^3 \rangle \quad \forall \tilde\tau_3 \in \tilde K \tag{121} $$
However, this is still a list of equalities between global quantities, not an equality between two cochains. In addition, we cannot equate directly the two cochains $\tilde\Psi^2$ and $\tilde Q^3$, because a domain and its boundary have different geometric dimensions, and the corresponding cochains consequently belong to two different spaces, in this case $C^2(\tilde K)$ and $C^3(\tilde K)$, respectively. We can circumvent this problem by defining an operator δ which, on a given cell complex K, transforms a p-cochain $c^p$ into a (p + 1)-cochain $\delta c^p$ satisfying the following
relation:

$$ \langle \tau_{p+1}, \delta c^p \rangle \overset{\text{def}}{=} \langle \partial\tau_{p+1}, c^p \rangle \quad \forall \tau_{p+1} \in K \tag{122} $$
We can extend this definition linearly to arbitrary chains, as follows:

$$ \langle c_{p+1}, \delta c^p \rangle \overset{\text{def}}{=} \langle \partial c_{p+1}, c^p \rangle \tag{123} $$
In this way we have defined an operator δ, which is a linear mapping of the cochain space $C^p(K)$ into $C^{p+1}(K)$ (to see how this operator acts on combinations of cochains, simply substitute $c^p + c'^p$ or $\lambda c^p$ for $c^p$ on the left side and observe the effects on the right side). This operator is called the coboundary operator, and it is just the operator needed to construct the linear transformation $\delta_\infty$ of the inverse limit group $C^p(K_\infty)$ into $C^{p+1}(K_\infty)$ [i.e., the transformation of p-fields into (p + 1)-fields] anticipated in the final paragraphs of the section devoted to limit systems (Section III.B.2). Let us apply the definition of this new operator to Eq. (121). Particularizing Eq. (122) for the electric flux cochain $\tilde\Psi^2$, we have

$$ \langle \tilde\tau_3, \delta\tilde\Psi^2 \rangle \overset{\text{def}}{=} \langle \partial\tilde\tau_3, \tilde\Psi^2 \rangle \quad \forall \tilde\tau_3 \in \tilde K \tag{124} $$
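which, substituted in Eq. (121), gives

$$ \langle \tilde\tau_3, \delta\tilde\Psi^2 \rangle = \langle \tilde\tau_3, \tilde Q^3 \rangle \quad \forall \tilde\tau_3 \in \tilde K \tag{125} $$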
Since Eq. (125) asserts the identity of the components of the two cochains in the natural basis representation, it affirms, in fact, the identity of the two cochains. We can, therefore, thanks to the definition of the operator δ, write the topological law [Eq. (116)] in terms of cochains only, as follows:

$$ \delta\tilde\Psi^2 = \tilde Q^3 \tag{126} $$
The definition [Eq. (123)] of the coboundary operator may seem abstract. However, it has a very intuitive meaning that can be exemplified as follows (Tonti, 1975). Equation (124) can be rewritten by replacing $\partial\tilde\tau_3$ with its expression in terms of incidence numbers. Exploiting the linearity of the chain–cochain pairing, after some reordering of terms, we obtain

$$ \langle \tilde\tau_3, \delta\tilde\Psi^2 \rangle = \sum_i [\tilde\tau_3, \tilde\tau_2^i]\,\langle \tilde\tau_2^i, \tilde\Psi^2 \rangle \tag{127} $$

More generally, Eq. (122) becomes

$$ \langle \tau_{p+1}, \delta c^p \rangle = \sum_i [\tau_{p+1}, \tau_p^i]\,\langle \tau_p^i, c^p \rangle \tag{128} $$
This means that the coboundary operator acts on a p-cochain $c^p$ and builds a (p + 1)-cochain $\delta c^p$, which assumes on each (p + 1)-cell $\tau_{p+1}$ the global
Figure 28. The coboundary of a p-cochain is a (p + 1)-cochain, which assigns to each (p + 1)-cell the sum of the values that the p-cochain assigns to the p-cells forming the boundary of the (p + 1)-cell. Note that each quantity appears in the sum multiplied by the corresponding incidence number.
physical quantity associated by $c^p$ with the boundary of $\tau_{p+1}$. This value is equal to the sum of the physical quantities associated by $c^p$ with the faces of $\tau_{p+1}$, endowed with the induced orientation (Fig. 28). In other words, the coboundary operator takes a quantity associated with the boundary of a geometric object and transfers it to the object itself. From Eq. (128) we also see that if we use the natural representation for the cochains, the coboundary operator admits the following matrix representation in terms of incidence matrices:

$$ \delta c^p = D_{p+1,p} \cdot c^p \tag{129} $$
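Equation (129) turns the coboundary into a matrix–vector product with the same incidence matrix whose transpose yields the boundary of chains. The sketch below assumes a hypothetical two-triangle complex (vertices v0 to v3, edges e0 to e4, triangles t0 and t1) with orientations fixed by hand; it checks the defining relation (123) on a weighted 2-chain and the property δ(δc^p) = 0 stated in the following subsection. The helper names are illustrative.

```python
# Hypothetical complex: incidence matrices D_{1,0} (edges x vertices) and
# D_{2,1} (triangles x edges); an edge a -> b has boundary b - a.
D10 = [[-1, 1, 0, 0], [0, -1, 1, 0], [1, 0, -1, 0], [0, -1, 0, 1], [0, 0, 1, -1]]
D21 = [[1, 1, 1, 0, 0], [0, -1, 0, 1, 1]]


def coboundary(D, cochain):
    """Eq. (129): delta c^p = D_{p+1,p} . c^p (one row per (p+1)-cell)."""
    return [sum(d * c for d, c in zip(row, cochain)) for row in D]


def boundary(D, chain):
    """Eq. (97) with the same matrix: the boundary uses its transpose."""
    return [sum(row[j] * w for row, w in zip(D, chain)) for j in range(len(D[0]))]


def pairing(chain, cochain):
    """Eq. (107): <c_p, c^p> = sum_i w_i c_i."""
    return sum(w * c for w, c in zip(chain, cochain))


c1 = [1.0, 2.0, 3.0, 4.0, 5.0]   # arbitrary 1-cochain (values on e0..e4)
c2 = [1.0, 0.5]                  # weighted 2-chain (weights on t0, t1)
# Eq. (123): <c_2, delta c^1> equals <boundary of c_2, c^1>.
print(pairing(c2, coboundary(D21, c1)), pairing(boundary(D21, c2), c1))  # 9.5 9.5

phi = [2.0, -1.0, 4.0, 0.5]      # arbitrary 0-cochain (values on v0..v3)
print(coboundary(D21, coboundary(D10, phi)))  # [0.0, 0.0]
```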
2. Properties of the Coboundary Operator

The coboundary operator enjoys a number of useful properties that are discrete versions of familiar properties of differential operators (in light of our discussion about limit systems, it is more appropriate to say that the properties of the differential operators follow from those of the coboundary operator). First, as a consequence of the relationship [Eq. (90)] holding between incidence numbers [or, if you prefer, of the property $\partial(\partial c_p) = 0$, holding for any chain, together with the adjointness of the boundary and coboundary operators], for any cochain $c^p$ we have $\delta(\delta c^p) = 0$ (Hocking and Young, 1988). We will show later that the coboundary operator is the discrete counterpart (or, from another point of view, the precursor) of a differential operator d, which generalizes the traditional differential operators: gradient, curl, and divergence. This means in particular that the geometric
interpretation of the action of the coboundary operator given in the previous section can be used to explain the action of these three familiar differential operators. In fact, a bit of reflection reveals that this geometric construction is implicitly used in their traditional textbook definitions. In this light, for example, the identity $\delta(\delta c^p) = 0$ corresponds to the well-known identities curl(grad ϕ) = 0, holding for any 0-field ϕ, and div(curl A) = 0, holding for any 1-field A.

Another interesting observation concerns the coboundary operators acting at the same level and on opposite sides of the factorization diagram, on a pair of dual n-dimensional cell complexes. The coboundary transforming p-cochains $c^p$ into (p + 1)-cochains of the primary complex is the adjoint, relative to a natural duality between cochains defined by suitable bilinear forms $[\![\cdot,\cdot]\!]$, of the coboundary transforming (n − (p + 1))-cochains into (n − p)-cochains of the secondary complex (Mattiussi, 1997). For example, in Figure 29, this is the case of the operators acting in $\delta U^1$ and $\delta\tilde\Psi^2$, which satisfy $[\![\delta U^1, \tilde\Psi^2]\!] = [\![U^1, \delta\tilde\Psi^2]\!]$. For generic cochains this property corresponds to

$$ [\![\delta c^p, \tilde c^{\,n-(p+1)}]\!] = [\![c^p, \delta\tilde c^{\,n-(p+1)}]\!] \tag{130} $$
and is expressed in terms of incidence matrices by Eq. (89). The corresponding property for differential operators is the (formal) adjointness of −grad and div, and of curl and −curl. The nondegenerate bilinear forms which put the cochain spaces of Eq. (130) in duality are, in the natural basis representation, $[\![U^1, \tilde Q^3]\!] = \sum_{i=1}^{n_1 = \tilde n_3} U_i\,Q_i$ and $[\![\Phi^2, \tilde\Psi^2]\!] = \sum_{j=1}^{n_2 = \tilde n_2} \phi_j\,\psi_j$ (where we have assumed that dual cells share the same index). These bilinear forms are discrete counterparts of the energy integrals for the corresponding local field quantities, on which the adjointness of the differential operators is based.

In summary, by adopting the coboundary operator for the discrete representation of topological laws, and a pair of dual cell complexes as primary and secondary meshes, one automatically builds into the resulting numerical method a number of important properties of the continuous mathematical model. By so doing, contrary to what happens in many numerical methods, one is not forced to check after the discretization whether these properties are satisfied, or to enforce them explicitly as additional constraints in the discretization phase. Note that the prescription to write the topological equations in terms of the coboundary operator does not imply the use of some exotic mathematical entity. It means simply that one adopts the correct association of global physical quantities with oriented geometric objects, and writes the topological equations in integral form, equating the global quantity associated with each cell with that associated with its boundary. From this point of view, when all the formal properties have been
proved, to say that we are using the coboundary operator is simply a shortcut to signify that the sequence of steps just described is being executed.

3. Discrete Topological Equations

Applying the definition of the coboundary operator δ, we can rewrite the topological laws of electromagnetism [Eqs. (113) through (116)] as relations between physical cochains. The steps are those leading from Eq. (116) to Eq. (126). The result is

$$ \delta\Phi^2 = 0^3 \tag{131} $$

$$ \delta\tilde Q^3 = \tilde 0^4 \tag{132} $$

$$ \delta U^1 = \Phi^2 \tag{133} $$

$$ \delta\tilde\Psi^2 = \tilde Q^3 \tag{134} $$
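Equation (131) need not be imposed separately once Eq. (133) holds: if $\Phi^2 = \delta U^1$, then $\delta\Phi^2 = \delta\delta U^1 = 0^3$. The sketch below illustrates this in a deliberately simplified, purely spatial setting (an assumption made for brevity: a single tetrahedron at one time instant, rather than the space–time complexes of the text), with edges and faces oriented by increasing vertex index; the values of U are arbitrary and the matrix names are illustrative.

```python
# Single tetrahedron with vertices 0..3. Edges a -> b with a < b:
# e0=(0,1), e1=(0,2), e2=(0,3), e3=(1,2), e4=(1,3), e5=(2,3).
# A face (a,b,c) with a < b < c has boundary (a,b) + (b,c) - (a,c).
D21 = [
    [1, -1, 0, 1, 0, 0],   # f0 = (0,1,2): e0 + e3 - e1
    [1, 0, -1, 0, 1, 0],   # f1 = (0,1,3): e0 + e4 - e2
    [0, 1, -1, 0, 0, 1],   # f2 = (0,2,3): e1 + e5 - e2
    [0, 0, 0, 1, -1, 1],   # f3 = (1,2,3): e3 + e5 - e4
]
# The 3-cell (0,1,2,3) has boundary f3 - f2 + f1 - f0.
D32 = [[-1, 1, -1, 1]]


def coboundary(D, cochain):
    """delta c^p = D_{p+1,p} . c^p, as in Eq. (129)."""
    return [sum(d * c for d, c in zip(row, cochain)) for row in D]


U1 = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0]   # arbitrary potential 1-cochain on edges
Phi2 = coboundary(D21, U1)              # Eq. (133): flux on the four faces
print(Phi2)                             # [-0.5, 5.0, 1.0, -4.5]
print(coboundary(D32, Phi2))            # [0.0]: Eq. (131) holds automatically
```

Note that no sign appears explicitly in the calls: the orientations are encoded once in the incidence matrices, which is the point made in the text that follows.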
We can thus redraw the space–time classification diagram of Figure 14 in terms of cochains and coboundaries (Fig. 29). Note, in the diagram of Figure 29 and in Eqs. (131) and (132), the presence of the null 3- and 4-cochains on K and $\tilde K$, respectively. Note also that, contrary to the traditional Maxwell’s equations in differential vector notation, in which both positive and negative terms appear, in the cochain–coboundary notation all signs are positive. This happens because the coboundary operator automatically takes care of the signs, by considering values on
Figure 29. The Tonti classification diagram of electromagnetic physical quantities in terms of cochains, showing the topological laws in terms of the coboundary operator.
the boundary as endowed with the induced orientation. The presence of negative signs in the traditional form of these equations is due to the fact that the default orientation of a geometric object in the mathematical definition of an operator and that in the traditional definition of a physical quantity may not agree. For example, the term −grad V in Eq. (11) defining the electromagnetic potential is due to the fact that in mathematics the default orientation of points is as “sinks,” so that the boundary of an internally oriented line is the endpoint “minus” the starting point, whereas the traditional definition of the scalar potential V implies a default orientation of points as “sources” (inherited, in fact, from the default external orientation of volumes to which electric charge is associated). Remember also, when one is considering the meaning of signs, that the quantities are associated with space–time objects and not merely with geometric objects in space.
D. Constitutive Relations

We said previously that the constitutive relations do not lend themselves to a natural discrete representation. Nonetheless, some kind of discrete constitutive link must be given in order to complete the discretization of the physical field problem and to arrive finally at a finite system of algebraic equations that can be handed to an algebraic solver. The task of finding the discrete constitutive link is, in fact, the central problem of the discretization step. We will consider later in detail a number of possible approaches to this task, mainly inspired by existing numerical methods. For now, we shall limit ourselves to a generic analysis of the structure that this representation must possess. Given the discrete representation of fields in terms of cochains presented earlier, the discrete constitutive links must be operators linking cochain spaces (usually an ordinary cochain space to a twisted one, or vice versa),∗ such as

$$ F_1 : C^p(K) \to C^q(\tilde K) \tag{135} $$

$$ F_2 : C^r(\tilde K) \to C^s(K) \tag{136} $$
Usually, these operators are discrete links whose structure is directly inspired by that of the local ones between field functions. For example, if we denote with ∗ The analysis of the factorization diagram of a great number of physical theories (Tonti, 1975) suggests that this could always be the case; that is, that every constitutive link could turn out to be a bridge between physical quantities associated with geometric objects endowed with different kinds of orientation. However, no formal proof of this conjecture seems to have been produced so far.
CLAUDIO MATTIUSSI
$\tilde{\mathbf d}$ the electric part of the electric flux 2-cochain, and with $\mathbf e$ the electric part of the magnetic flux 2-cochain, the electric constitutive relation [Eq. (14)] can be given an approximate discrete representation as an operator $F_\varepsilon$ as follows:

$$\tilde{\mathbf d} = F_\varepsilon(\mathbf e) \tag{137}$$

If the required link goes from $\tilde{\mathbf d}$ to $\mathbf e$, we shall represent it as

$$\mathbf e = F_{\varepsilon^{-1}}(\tilde{\mathbf d}) \tag{138}$$

instead of as $\mathbf e = F_\varepsilon^{-1}(\tilde{\mathbf d})$. This allows for discrete operators obtained by direct discretization of the local link going from $\mathbf D$ to $\mathbf E$, without limiting the choice to those obtained by inverting the link $F_\varepsilon$. Analogous representations hold, in both directions, for the magnetic constitutive relation [Eq. (15)] and the generalized form of Ohm's law [Eq. (16)], the discrete versions of which we will write as

$$\mathbf b = F_\mu(\tilde{\mathbf h}) \tag{139}$$

$$\tilde{\mathbf Q}_j = F_\sigma(\mathbf e) \tag{140}$$

whereas the discrete links in the opposite direction will be written as

$$\tilde{\mathbf h} = F_{\mu^{-1}}(\mathbf b) \tag{141}$$

$$\mathbf e = F_{\sigma^{-1}}(\tilde{\mathbf Q}_j) \tag{142}$$

These links are usually linear operators between cochain spaces and can, therefore, be given a natural matrix representation, which we will write, in the case of Eq. (137), as

$$\tilde{\mathbf d} = \mathbf F_\varepsilon\,\mathbf e \tag{143}$$

with analogous renderings for the other electromagnetic constitutive links just considered. The widespread use of linear discrete constitutive links stems from the desire to obtain a linear system of equations by composing the discrete constitutive operator with the coboundary operator (which is a linear operator between cochain spaces). This choice, however, does not imply that the local constitutive relation from which they derive must also be linear.∗ When the local constitutive relation is nonlinear whereas the discrete constitutive equation is linear, the latter must be considered as obtained from the former by means of some kind of linearization technique within the numerical procedure. Note that links of the kind in Eqs. (135) and (136) have an even broader scope than may seem at first sight. Consider, for example, a time-dependent problem to be solved in a domain D for a time interval I. The space–time domain

∗ In fact, this will often not be the case: since topological equations are always linear, the nonlinearities present in the field equations are due to the constitutive terms, where they can be located once the field equations have been factorized.
NUMERICAL METHODS FOR PHYSICAL FIELD PROBLEMS
of the problem is the Cartesian product $D \times I$, which is decomposed by the primary and secondary cell complexes $K$ and $\tilde K$. Usually the decomposition is the product of a spatial decomposition $K_s$ of $D$ by a decomposition $K_t$ of $I$ (or $\tilde K_s$ and $\tilde K_t$, respectively). Under this hypothesis, Eq. (135), for example, includes relations linking the values taken by the twisted q-cochain on all q-cells of $\tilde K_s \times \tilde K_t$ [that is, on all q-cells of $\tilde K_s$ for all time instants (0-cells) of $\tilde K_t$ and on all (q − 1)-cells of $\tilde K_s$ for all time intervals (1-cells) of $\tilde K_t$] to the values taken by the ordinary p-cochain on all p-cells of $K_s \times K_t$ [that is, on all p-cells of $K_s$ for all time instants of $K_t$ and on all (p − 1)-cells of $K_s$ for all time intervals of $K_t$]. Of course, most of the time, constitutive equations are very simple particular cases of this general relation. A very common case, for example, with dual cell complexes and the natural dual indexing of cells, is a diagonal operator. However, we will see later that more general cases can be usefully considered. In particular, it may happen that the actual discrete constitutive link is never explicitly determined. It may instead be obtained as the result of an algorithm (for example, an optimization procedure) that takes as input a given known cochain and returns the cochain linked to it by the constitutive relation. The availability of a discrete representation for the constitutive relations allows the redrawing of the complete factorization diagram in discrete terms. For the case of electromagnetism, the resulting factorization diagram is depicted in Figure 30. Note the distinction between the parts of the cochains referring to time instants and those referring to time intervals, and the corresponding distinction in the action of the coboundary operator.
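The diagonal case and the linearity argument can be illustrated with a small example. The sketch below poses a hypothetical one-dimensional electrostatic problem on a uniform primal grid whose edges are dual to points of the secondary complex. The names `coboundary_0`, `solve_dense`, and `electrostatics_1d` are illustrative. The unit dual cross-section, the uniform charge density, and the homogeneous Dirichlet conditions are likewise assumptions of this sketch. The diagonal entries ε·(dual area)/(primal length) are one common choice, not the only one. Composing the linear constitutive matrix with the linear coboundary yields the linear system $\delta^{T}\mathbf F_\varepsilon\,\delta\,\mathbf v = \tilde{\mathbf q}$.

```python
# Sketch only: 1D electrostatics, -d/dx(eps dV/dx) = rho on [0, 1], V(0) = V(1) = 0.
# Primal complex K: nodes x_0..x_n and edges [x_i, x_{i+1}] oriented toward +x.
# Dual complex K~: one dual point per primal edge, one dual cell per primal node.


def coboundary_0(n_nodes):
    """Incidence matrix delta: node potentials (0-cochain) -> edge values (1-cochain)."""
    delta = [[0] * n_nodes for _ in range(n_nodes - 1)]
    for e in range(n_nodes - 1):
        delta[e][e], delta[e][e + 1] = -1, 1  # boundary of edge e: node e+1 minus node e
    return delta


def solve_dense(a, b):
    """Gaussian elimination with partial pivoting (adequate for tiny systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[p] = m[p], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def electrostatics_1d(n_cells, eps=1.0, rho=1.0):
    h = 1.0 / n_cells
    n_nodes = n_cells + 1
    delta = coboundary_0(n_nodes)
    # Diagonal constitutive matrix F_eps: d~_e = eps * (dual area / primal length) * e_e,
    # with a unit dual "area" in 1D (an assumption of this sketch).
    f_eps = [eps / h for _ in range(n_cells)]
    # e = -delta v and Gauss's law -delta^T d~ = q~ give (delta^T F_eps delta) v = q~.
    k = [[sum(delta[e][i] * f_eps[e] * delta[e][j] for e in range(n_cells))
          for j in range(n_nodes)] for i in range(n_nodes)]
    q = [rho * h] * n_nodes           # charge contained in each interior dual cell
    inner = range(1, n_nodes - 1)     # Dirichlet nodes 0 and n carry V = 0 and are dropped
    v_inner = solve_dense([[k[i][j] for j in inner] for i in inner], [q[i] for i in inner])
    return [0.0] + v_inner + [0.0], k, h


v, k, h = electrostatics_1d(8)
print(v)
```

For a uniform charge density the nodal values should coincide, up to rounding, with the exact parabola $V(x) = \rho x(1-x)/(2\varepsilon)$. The assembled matrix is symmetric, with zero row sums.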
This distinction is particularly useful in view of the application of the diagram to numerical methods, but remember that it applies only to space–time meshes obtained as Cartesian products of separate discretizations of the space and time domains.

E. Continuous Representations

The last few sections showed how to represent discretized domains and subdomains, fields, and topological laws in terms of cell complexes, cochains, and the coboundary operator. This can be applied straightforwardly to obtain a formal treatment of the finite volume method, which writes balance and conservation equations over subdomains that are actually simple cells, such as

$$\int_{\tau_2}\left(\operatorname{curl}\mathbf E + \frac{\partial\mathbf B}{\partial t}\right) = 0 \tag{144}$$

and

$$\int_{\tau_3}\operatorname{div}\mathbf B = 0 \tag{145}$$
Figure 30. The Tonti discrete factorization diagram of electromagnetism, with the fields represented in terms of cochains, the topological laws in terms of the coboundary operator, and the constitutive links as operators between cochain spaces.
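which, after application of the integral theorems of vector calculus, become

$$\int_{\partial\tau_2}\mathbf E + \int_{\tau_2}\frac{\partial\mathbf B}{\partial t} = 0 \tag{146}$$

$$\int_{\partial\tau_3}\mathbf B = 0 \tag{147}$$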
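The cell-by-cell balances of Eqs. (146) and (147) telescope. On an interior face, the two adjacent cells induce opposite orientations, so summing all cell balances leaves only the flux through the outer boundary. The sketch below checks this on a hypothetical two-dimensional grid of square cells, which is the planar analogue of Eq. (147). The face fluxes are arbitrary random integers, and the function names are illustrative.

```python
import random


def cell_outflows(fx, fy, nx, ny):
    """Net outward flux of each cell of an nx-by-ny grid.

    fx[i][j]: flux in +x through the vertical face at x-index i of row j (i = 0..nx).
    fy[i][j]: flux in +y through the horizontal face at y-index j of column i (j = 0..ny).
    The sign of each term is the orientation induced on the face by the cell.
    """
    return [[fx[i + 1][j] - fx[i][j] + fy[i][j + 1] - fy[i][j]
             for j in range(ny)] for i in range(nx)]


def outer_boundary_flux(fx, fy, nx, ny):
    """Net outward flux through the boundary of the whole grid."""
    return (sum(fx[nx][j] - fx[0][j] for j in range(ny))
            + sum(fy[i][ny] - fy[i][0] for i in range(nx)))


rng = random.Random(0)
nx, ny = 5, 4
fx = [[rng.randint(-9, 9) for _ in range(ny)] for _ in range(nx + 1)]
fy = [[rng.randint(-9, 9) for _ in range(ny + 1)] for _ in range(nx)]
total = sum(sum(col) for col in cell_outflows(fx, fy, nx, ny))
print(total == outer_boundary_flux(fx, fy, nx, ny))  # True: interior faces cancel
```

Consequently, if every cell balance vanishes, as Eq. (147) requires of $\mathbf B$, the flux through the outer boundary vanishes as well.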
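The case of finite element methods, however, does not fit equally well in a representation built on these purely discrete concepts. For example, a weighted residual formulation of Eqs. (144) and (145) is

$$\int_{\tau_3}\mathbf w\cdot\left(\operatorname{curl}\mathbf E + \frac{\partial\mathbf B}{\partial t}\right) = 0 \tag{148}$$

$$\int_{\tau_3} w\,\operatorname{div}\mathbf B = 0 \tag{149}$$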
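where $\mathbf w$ is a vector-valued weight function, $w$ is a scalar one, and $\tau_3$ is a regular three-dimensional domain. After integration by parts, Eqs. (148) and (149) become

$$\int_{\tau_3}\operatorname{curl}\mathbf w\cdot\mathbf E - \int_{\partial\tau_3}\mathbf w\times\mathbf E + \int_{\tau_3}\mathbf w\cdot\frac{\partial\mathbf B}{\partial t} = 0 \tag{150}$$

$$\int_{\partial\tau_3} w\,\mathbf B - \int_{\tau_3}\operatorname{grad} w\cdot\mathbf B = 0 \tag{151}$$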
which do not convey the immediate geometric meaning of Eqs. (146) and (147). The weighted residual technique just shown in action, taken as representative of the strategy adopted by the finite element methods, nonetheless permits a geometric interpretation that parallels that of the finite volume methods. To display this analogy, we will introduce some continuous concepts that correspond to the discrete ones introduced so far in the representation of geometry, fields, and topological laws (Table 1). Note that the aim of the present section is not the construction of an abstract correspondence between discrete
TABLE 1 Table of Correspondences between Discrete and Continuous Concepts

| Discrete | Continuous |
|---|---|
| p-cell: $\tau_p$ | p-dimensional domain: $D_p$ |
| Boundary of a p-cell: $\partial\tau_p$ | Boundary of a p-dimensional domain: $\partial D_p$ |
| p-chain: $c_p$ | Weighted p-domain: $w_p$ |
| p-skeleton of a cell complex: $K_p$ | Collection of weighted p-domains: $W_p$ |
| p-cochain: $c^p$ | p-form: $\omega^p$ |
| Pairing of p-chain and p-cochain: $(c_p, c^p)$ | Weighted p-integral of a p-form: $\int_{w_p}\omega^p$ |
| Coboundary operator: $\delta$ | Exterior differential operator: $d$ |
| Discrete generalized Stokes's theorem: $(\tau_{p+1}, \delta c^p) = (\partial\tau_{p+1}, c^p)$ | Generalized Stokes's theorem: $\int_{D_{p+1}} d\omega^p = \int_{\partial D_{p+1}}\omega^p$ |
| Discrete Green's formula (summation by parts): $(c_{p+1}, \delta c^p) = (\partial c_{p+1}, c^p)$ | Continuous Green's formula (integration by parts): $\int_{w_{p+1}} d\omega^p = \int_{\partial w_{p+1}}\omega^p$ |
| Discrete topological equation, weak form: $(\partial c_{p+1}, a^p) = (c_{p+1}, b^{p+1})\ \forall c_{p+1}$; strong form: $\delta a^p = b^{p+1}$ | Continuous topological equation, weak form: $\int_{\partial w_{p+1}}\alpha^p = \int_{w_{p+1}}\beta^{p+1}\ \forall w_{p+1}$; strong form: $d\alpha^p = \beta^{p+1}$ |
and continuous concepts. The inspiration for the parallelism comes from (and is instrumental to the application to) numerical methods. The actual formal justification of a particular correspondence may be very difficult, or even impossible; by formal justification we mean the proof that in the limit the discrete concept goes into the continuous one. In such cases we will simply rely on the heuristic interpretive value of the correspondence thus built. To parallel the presentation of discrete representations made so far, we should ideally deal first with continuous models for the domain geometry. It turns out, however, that a preliminary discussion of continuous field representations paralleling cochains is preferable. The only concept that we must suppose available is that of an n-dimensional domain of integration $D_n$ contained in the domain $D$. This domain can be considered as an n-dimensional differentiable manifold and usually is merely a regular subdomain of $\mathbb R^n$.

1. Differential Forms

Given expressions such as Eqs. (150) and (151), if we want to find a geometric interpretation for the weighted residual methods, it is clear that we will have to deal with integrals. Strictly speaking, the “thing” that is subjected to integration on oriented p-dimensional domains is a p-dimensional differential form, usually called simply a p-form (Deschamps, 1981; Warnick et al., 1997). If the domain is internally oriented, we speak of an ordinary p-form, which we denote by $\omega^p$; otherwise, the p-form is called twisted and is denoted by $\tilde\omega^p$ (Burke, 1985). We said previously that p-cochains associate values with p-cells and that this association is additive on the sum of cells. This is true also for the integration of p-forms on p-dimensional domains. In fact, a p-form on a cellulizable domain $D$ gives rise to a p-cochain $c^p(K_\beta)$ on each cell complex $K_\beta$ that subdivides $D$, since it associates with each cell $\tau_p^i$ a value $c^i$, where

$$c^i = \int_{\tau_p^i}\omega^p \tag{152}$$
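The next sketch illustrates Eq. (152) and the projection idea. It assumes a 1-form $\omega^1 = f(x)\,dx$ on $[0, 1]$, with $f = \cos$, and uses a composite Simpson rule as a stand-in for exact integration; the helper names are hypothetical. Integrating the form over the cells of two nested complexes yields two cochains that are consistent under refinement, which is the additivity property just mentioned. Pairing the result with a weighted chain anticipates the integral on a chain discussed next.

```python
import math


def simpson(f, a, b, n=64):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3


def project_1form(f, nodes):
    """Cochain induced on a 1D cell complex by the 1-form f(x) dx, as in Eq. (152)."""
    return [simpson(f, a, b) for a, b in zip(nodes, nodes[1:])]


def pair_chain(weights, cochain):
    """Pairing of the chain sum_i w_i tau_i with a cochain: sum_i w_i c^i."""
    return sum(w * c for w, c in zip(weights, cochain))


coarse = [0.0, 0.5, 1.0]
fine = [0.0, 0.25, 0.5, 0.75, 1.0]        # a refinement of the coarse complex
c_coarse = project_1form(math.cos, coarse)
c_fine = project_1form(math.cos, fine)
# Additivity: each coarse-cell value equals the sum of the values on its two subcells.
print(abs(c_coarse[0] - (c_fine[0] + c_fine[1])) < 1e-9)   # True
print(pair_chain([2.0, -1.0], c_coarse))
```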
Note how this parallels the original definition of the action of a cochain, as associating with the cell the value

$$c^i = (\tau_p^i, c^p) \tag{153}$$

To emphasize this parallelism, Bamberg and Sternberg (1988) also call p-forms incipient p-cochains. In fact, we can think of the p-form $\omega^p$ as a particular representation of an element $\{c^p(K_\alpha)\}$ of the inverse limit group $C^p(K_\infty)$. The particular cochain $c^p(K_\beta)$ is then the projection of $\omega^p$ on the cell complex $K_\beta$. Given the correspondence of Eq. (152) with Eq. (153), we can ask what corresponds to the more general pairing of chains and cochains $(c_p, c^p)$. At first sight, we could consider an integral on a chain, using the definition

$$\int_{c_p}\omega^p = \int_{\sum_i w_i\tau_p^i}\omega^p \overset{\text{def}}{=} \sum_i w_i\int_{\tau_p^i}\omega^p \tag{154}$$
This is, in fact, the most general definition of integration on manifolds usually given in textbooks (Choquet-Bruhat and DeWitt-Morette, 1977; Flanders, 1989). A bit of reflection, however, shows that this extension of the concept of integral, from p-dimensional domains to p-chains, does not fulfill our needs. The integrals appearing in weighted residual expressions such as Eqs. (148) and (149), in the language of differential forms, correspond to

$$\int_{\tau_p} w\,\omega^p \tag{155}$$
where $w$ is a weight function defined on $\tau_p$. To find a way out of this impasse, we could think of Eq. (155) as derived from the right-hand side of Eq. (154) by a limit process that considers chains built on finer and finer meshes. Alternatively, we could reconsider the way a Riemann integral is evaluated by partitioning the integration domain. We will pursue both lines of thought in the next subsection. One could at this point argue that finite element methods do not actually use formulations based on differential forms, since expressions such as Eqs. (148) and (149) have a scalar or vector field function in place of the differential form $\omega^p$. This is, however, merely an unfortunate consequence of the historical development of vector calculus as applied to physics. The reduction of what are actually differential p-forms to mere scalars and vectors in fact makes things more difficult to understand physically and to represent geometrically. One has only to browse through books such as Burke (1985), Schouten (1989), and Misner et al. (1970), and papers such as Warnick et al. (1997), with their fascinating geometric representations of forms and integration, to be convinced of this fact.

2. Weighted Integrals

Although the idea behind the operation of actually integrating a p-form on a differentiable manifold is intuitive, one must face some technical problems. For example, to define an analog of Riemann integration, one must represent the integration domains and their partitions. This problem is usually circumvented thanks to a collection of maps, the so-called coordinate maps, which send a collection of open sets of the manifold where the integration takes place onto open subsets of a suitable Euclidean space (Bishop and Goldberg, 1980; de Rham, 1960). This allows the definition of the pullback of a form
from the manifold to the Euclidean space (Burke, 1985), where the familiar machinery of Riemann integration is usually invoked. Justification of this step often involves the idea that the differential form subject to integration in the manifold becomes, after pullback, an ordinary scalar function in Euclidean space. At the same time, the integration domain within the manifold becomes a Euclidean domain, which can be easily partitioned into pieces of known extension. But how does a differential form within the manifold become an ordinary function in the Euclidean space? There is, in fact, a step that is usually passed over silently in forming the Riemann sums that define the integral. This step is the pairing of the pulled-back form with a collection of multivectors, which represent the parallelepipeds partitioning the domain in the Riemann integration procedure. The concept of multivector just introduced requires a brief digression. The calculus of differential forms is built on the algebra of forms, which defines forms as linear functions on spaces of multivectors. To this end one starts from a vector space and first defines the concept of p-vector $\mathbf v^p$ (Birss, 1980). You can think of a p-vector as defining a p-dimensional oriented domain within an affine space (Tonti, 1975). Thus:

- a 0-vector $v^0$ is a scalar;
- a 1-vector $\mathbf v^1$ is the familiar vector and defines an oriented segment along a line;
- a 2-vector $\mathbf v^2$, or bivector, defines an oriented surface on a plane;
- a 3-vector $\mathbf v^3$, or trivector, defines an oriented volume;

and so on, up to the maximum dimension allowed by the affine space.∗ Paralleling the distinction between internal and external orientation for geometric objects, one can define ordinary multivectors, corresponding to internally oriented geometric objects, and twisted multivectors, corresponding to externally oriented geometric objects (Burke, 1985; Fig. 31).
Note that p-vectors in turn constitute a vector space, and that we can define the extension of a p-vector without recourse to metric concepts. Given the concept of multivector, one can define an algebraic p-form as a linear function on the space of p-vectors, with values in an algebraic field. From this definition it follows that the pairing of a p-vector $\mathbf v^p$ and a p-form $\omega^p$ gives a value, exactly like the pairing of a chain and a cochain (see Misner et al., 1970, and Warnick et al., 1997, for fascinating geometric illustrations of this pairing). This analogy suggests the following representation of the pairing:

$$(\mathbf v^p, \omega^p) \tag{156}$$
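To make Eq. (156) concrete, the sketch below represents a constant ordinary 2-form on $\mathbb R^3$ by its three coefficients. A bivector $\mathbf u\wedge\mathbf v$ is represented by its three components; the function names and numbers are illustrative assumptions. Reversing the order of the two vectors reverses the orientation and hence the sign of the pairing. Shearing $\mathbf v$ along $\mathbf u$ leaves the bivector, and therefore the value, unchanged. Only differences of products of components appear, so no lengths or angles are used.

```python
def wedge(u, v):
    """Components of the bivector u ^ v in R^3 on the basis (e_y^e_z, e_z^e_x, e_x^e_y).

    Only differences of products of components appear: no metric is used.
    """
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def pair(form, bivector):
    """(v^2, omega^2) for omega^2 = a dy^dz + b dz^dx + c dx^dy given as (a, b, c)."""
    return sum(f * b for f, b in zip(form, bivector))


omega = (2, -1, 3)              # constant coefficients (a, b, c), chosen arbitrarily
u, v = (1, 0, 0), (1, 2, 0)
print(pair(omega, wedge(u, v)))                  # 6
print(pair(omega, wedge(v, u)))                  # -6: opposite orientation
sheared = tuple(vi + 5 * ui for ui, vi in zip(u, v))
print(pair(omega, wedge(u, sheared)))            # 6: same bivector, same value
```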
∗ Actually, the situation can be more complex, as a result of the fact that, for example, in four-dimensional space a generic multivector can be compound (Schouten, 1989; Tonti, 1975). However, compound multivectors are not required to represent common geometric objects (Birss, 1980).
Figure 31. Ordinary and twisted multivectors correspond to internally and externally oriented geometric objects, respectively. Here the case of a three-dimensional vector space is represented.
With these preliminaries in place, the Riemann integral of a p-form in a Euclidean space is defined as follows. The integration domain $D_p$ is subdivided into a collection $V_p$ of p-dimensional parallelepipeds, which can be thought of as p-vectors $\mathbf v_i^p$ (Dezin, 1995). The corresponding Riemann sum is, therefore,

$$S = \sum_i (\mathbf v_i^p, \omega^p) \tag{157}$$
In the sequence of domain partitions considered in the Riemann integration process, the maximum p-vector extension tends to zero. The corresponding limit, which exists under suitable regularity conditions, is the integral of the form

$$\int_{D_p}\omega^p$$
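The sketch below is a minimal instance of Eq. (157) for a hypothetical 2-form $\omega^2 = f(x, y)\,dx\wedge dy$ on the unit square. It assumes a uniform partition into squares, each represented by the bivector spanned by its two edge vectors; `riemann_sum_2form` is an illustrative name. Pairing each cell with $\omega^2$ gives $f(\xi)$ times the $dx\wedge dy$ component of the cell's bivector. As the partition is refined, the sum approaches $\int_{D_2}\omega^2$, which is $5/6$ for the chosen $f$.

```python
def riemann_sum_2form(f, n, sample=(0.5, 0.5)):
    """Riemann sum (157) for omega = f(x, y) dx^dy over the unit square.

    Each cell is the bivector (h e_x) ^ (h e_y); pairing it with omega gives
    f(xi) times the dx^dy component of the bivector.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            u, v = (h, 0.0), (0.0, h)              # edge vectors of the cell
            dxdy = u[0] * v[1] - u[1] * v[0]       # dx^dy component of u ^ v
            xi = ((i + sample[0]) * h, (j + sample[1]) * h)
            total += f(*xi) * dxdy
    return total


f = lambda x, y: x * x + y      # omega = (x^2 + y) dx^dy, exact integral 5/6
for n in (4, 16, 64):
    print(n, abs(riemann_sum_2form(f, n) - 5 / 6))    # the error shrinks as n grows
```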
We see, therefore, that in building a Riemann sum we actually pair the differential form with a chain composed of the collection $V_p$ of multivectors that partition the domain. An obvious extension, paralleling the assignment of real weights to cells in the formation of chains, consists of using collections of weighted multivectors. This can be done by defining a scalar weight function $w$ on the integration domain. We can then define the new Riemann sums as follows:

$$S = \sum_i \left(w(\xi_i)\,\mathbf v_i^p, \omega^p\right) \tag{158}$$
where $\xi_i$ is a point belonging to the p-vector $\mathbf v_i^p$ of the domain decomposition. Under the usual regularity conditions, the value of the limit of the sequence of Riemann sums with decreasing maximum extension of the multivectors does not depend on the actual position of $\xi_i$ in $\mathbf v_i^p$. This limit is the weighted integral and can be denoted by

$$\int_{w_p}\omega^p \tag{159}$$
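The independence from the sample point $\xi_i$ can be observed numerically. This one-dimensional sketch assumes a hat-shaped weight $w(x) = 1 - |2x - 1|$ and the 1-form $x\,dx$ on $[0, 1]$, whose weighted integral is exactly $1/4$. It evaluates Eq. (158) with $\xi_i$ placed at different fractions `s` of each cell; `weighted_riemann_sum` is an illustrative name.

```python
def weighted_riemann_sum(w, f, n, s):
    """Eq. (158) in 1D: sum_i w(xi_i) (v_i, f dx), with v_i = h e_x and xi_i = x_i + s*h."""
    h = 1.0 / n
    return sum(w((i + s) * h) * f((i + s) * h) * h for i in range(n))


hat = lambda x: 1.0 - abs(2.0 * x - 1.0)    # weight function (an assumption)
form = lambda x: x                          # coefficient of the 1-form x dx
for s in (0.0, 0.5, 0.9):                   # position of xi_i inside each cell
    print(s, weighted_riemann_sum(hat, form, 2000, s))    # each value is close to 0.25
```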
This symbolism emphasizes the fact that the function $w$ weights the integration domain and is not an ordinary function that multiplies the form. This can be thought of as saying that integrals of the kind in Eq. (159) must be considered Stieltjes integrals and not ordinary integrals of products of functions (Lebesgue, 1973). Such integrals include as particular cases those appearing in Eqs. (148) through (151). If the integral is defined on a regular p-dimensional domain that lies in a manifold, we can, as anticipated, pull back the form and apply the procedure just described for integration in Euclidean spaces. It is, however, instructive to consider the possibility of going the other way; that is, to push forward the multivectors that partition the Euclidean domain (Burke, 1985). This can be thought of as providing a decomposition of the integration domain in the manifold. There are actually some technical difficulties, since the vectors thus pushed forward do not “belong” to the manifold but to its tangent space (Burke, 1985). We can circumvent this problem, while preserving the heuristic value of the idea, by thinking, for example, of manifolds that are subsets of a suitable Euclidean space. Tangent multivectors are then actually tangent parallelepipeds that approximate the true image of the Euclidean parallelepiped on the manifold (e.g., see Figs. 8.24 and 8.25 and the related discussion in Bamberg and Sternberg, 1988). All the considerations just made for Riemann sums and weighted integrals apply to this “decomposition” of the integration domain in the manifold. In particular, the function $w$ in Eq. (159) plays the same role, weighting the “cells” of the decomposition of the domain considered in a Riemann sum. It thus gives a collection of finer and finer chains, which can be thought of as the continuous counterpart of a chain.

3. Differential Operators

To develop further the correspondence between differential forms and cochains, let us consider the action of the coboundary operator. The definition [Eq. (122)]
of the coboundary operator on the cochains of a cell complex $K$ was given in order to allow the transition from a topological equation in the form

$$(\partial\tau_{p+1}, a^p) = (\tau_{p+1}, b^{p+1}) \qquad \forall\tau_{p+1}\in K \tag{160}$$
to the following direct relation between cochains:

$$\delta a^p = b^{p+1} \tag{161}$$
We can mimic the definition [Eq. (122)] of the coboundary operator in the terminology of differential forms. We obtain

$$\int_{D_{p+1}} d\omega^p \overset{\text{def}}{=} \int_{\partial D_{p+1}}\omega^p \qquad \forall D_{p+1}\subset D \tag{162}$$
where $d$ is an operator transforming p-forms into (p + 1)-forms. This operator is called the exterior differential. It inherits the property of the coboundary operator of allowing the transition from Eq. (160) to Eq. (161), by transforming a topological equation given in integral form, such as

$$\int_{\partial D_{p+1}}\alpha^p = \int_{D_{p+1}}\beta^{p+1} \qquad \forall D_{p+1}\subset D \tag{163}$$
into

$$d\alpha^p = \beta^{p+1} \tag{164}$$
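The intrinsic definition of Eq. (162) can be tested numerically on a 1-form in the plane. This minimal illustration takes the field `A`, the evaluation point, the use of Simpson's rule along each edge, and the helper names as assumptions. It estimates $d$ of the 1-form at a point as its circulation around a small square divided by the square's area. No derivative of the components is used, yet the estimate approaches the familiar expression $\partial A_y/\partial x - \partial A_x/\partial y$ as the side $h$ decreases.

```python
import math


def line_integral(A, p, q):
    """Integral of the 1-field A along the segment p -> q (Simpson's rule)."""
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    t = (q[0] - p[0], q[1] - p[1])
    dot = lambda pt: A(*pt)[0] * t[0] + A(*pt)[1] * t[1]
    return (dot(p) + 4 * dot(mid) + dot(q)) / 6


def curl_from_circulation(A, x0, y0, h):
    """Circulation on the boundary of a small square D_2, divided by its area."""
    c = [(x0 - h / 2, y0 - h / 2), (x0 + h / 2, y0 - h / 2),
         (x0 + h / 2, y0 + h / 2), (x0 - h / 2, y0 + h / 2)]   # counterclockwise
    circ = sum(line_integral(A, c[k], c[(k + 1) % 4]) for k in range(4))
    return circ / (h * h)


A = lambda x, y: (math.sin(y), x * x * y)       # a hypothetical 1-field
exact = lambda x, y: 2 * x * y - math.cos(y)     # dAy/dx - dAx/dy
for h in (1e-1, 1e-2):
    print(h, curl_from_circulation(A, 0.3, 0.7, h) - exact(0.3, 0.7))
```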
Note that the exterior differential is usually defined in terms of derivatives of the form's components, whereas Eq. (162) constitutes an intrinsic definition (Isham, 1989). As emphasized by one of the creators of the calculus of forms,∗ this definition does not require the existence of the derivatives of the form's components. The generic operator $d$ defined by Eq. (162) combines in a single operator the actions of the familiar differential operators gradient, curl, and divergence, which can also be given intrinsic definitions. Recalling that we call p-field a quantity associated with p-dimensional geometric objects, we can give the following definitions. The gradient operator acts on 0-fields and gives 1-fields, which satisfy

$$\int_{D_1}\operatorname{grad}\varphi \overset{\text{def}}{=} \int_{\partial D_1}\varphi \qquad \forall D_1\subset D \tag{165}$$
The curl operator acts on 1-fields and produces 2-fields according to

$$\int_{D_2}\operatorname{curl}\mathbf A \overset{\text{def}}{=} \int_{\partial D_2}\mathbf A \qquad \forall D_2\subset D \tag{166}$$
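∗ “On conçoit donc la possibilité de définir la dérivation extérieure comme une opération autonome, indépendante de la dérivation classique” [One can thus conceive of the possibility of defining exterior differentiation as an autonomous operation, independent of classical differentiation] (Cartan, 1922, p. 69).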
Likewise, the divergence operator acts on 2-fields and gives 3-fields satisfying

$$\int_{D_3}\operatorname{div}\mathbf B \overset{\text{def}}{=} \int_{\partial D_3}\mathbf B \qquad \forall D_3\subset D \tag{167}$$
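On a cell complex, these operators reduce to incidence matrices. The sketch below builds the node-to-edge and edge-to-face incidence matrices of a small two-dimensional grid; the grid layout, orientation conventions, and function names are assumptions of this illustration, and the three-dimensional divergence would require a third matrix. It first checks that the product of the two matrices vanishes, which is the discrete form of curl grad = 0. It then verifies the summation by parts identity $(c_2, \delta a^1) = (\partial c_2, a^1)$. That identity holds automatically, because the boundary matrix is the transpose of the coboundary matrix.

```python
import random


def grid_incidence(nx, ny):
    """Incidence matrices of an nx-by-ny grid of squares.

    d0 (edges x nodes) acts as the discrete gradient, d1 (faces x edges) as the discrete curl.
    Edges point in +x or +y; faces are oriented counterclockwise.
    """
    node = lambda i, j: i + j * (nx + 1)
    n_h = nx * (ny + 1)                          # number of horizontal edges
    h_edge = lambda i, j: i + j * nx
    v_edge = lambda i, j: n_h + i + j * (nx + 1)
    n_nodes, n_edges = (nx + 1) * (ny + 1), n_h + (nx + 1) * ny
    d0 = [[0] * n_nodes for _ in range(n_edges)]
    for j in range(ny + 1):
        for i in range(nx):
            d0[h_edge(i, j)][node(i, j)], d0[h_edge(i, j)][node(i + 1, j)] = -1, 1
    for j in range(ny):
        for i in range(nx + 1):
            d0[v_edge(i, j)][node(i, j)], d0[v_edge(i, j)][node(i, j + 1)] = -1, 1
    d1 = [[0] * n_edges for _ in range(nx * ny)]
    for j in range(ny):
        for i in range(nx):
            row = d1[i + j * nx]
            row[h_edge(i, j)], row[v_edge(i + 1, j)] = 1, 1        # bottom, right
            row[h_edge(i, j + 1)], row[v_edge(i, j)] = -1, -1      # top, left
    return d0, d1


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(a, x):
    return [sum(r * xi for r, xi in zip(row, x)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


d0, d1 = grid_incidence(3, 2)
print(all(x == 0 for row in matmul(d1, d0) for x in row))          # True: curl grad = 0
rng = random.Random(1)
a = [rng.randint(-5, 5) for _ in range(len(d0))]                  # 1-cochain on edges
c = [rng.randint(-5, 5) for _ in range(len(d1))]                  # 2-chain on faces
lhs = sum(ci * di for ci, di in zip(c, matvec(d1, a)))             # (c, delta a)
rhs = sum(bi * ai for bi, ai in zip(matvec(transpose(d1), c), a))  # (boundary c, a)
print(lhs == rhs)                                                  # True
```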
It is worth noting once more that the property $\delta\delta = 0$ of the coboundary operator is reflected in the property $dd = 0$ of the exterior differential operator. In vector calculus notation, this in turn corresponds to curl grad = 0 and div curl = 0. Given its properties, the exterior differential $d$ appears as the equivalent of the limit operator $\delta_\infty$ defined at the end of the section on limit systems (Section III.B.2). It is no wonder, then, that its definition can be based on global concepts. Of course, global physical quantities are additive, and adjacent, coherently oriented domains induce opposite orientations on their common boundary, which gives the telescoping property. Consequently, if the definition [Eq. (162)] of $d$ is enforced in the small, it holds for every geometric object; the same is true of the definitions [Eqs. (165) through (167)] of the traditional differential operators of vector calculus. This is why textbook expositions apply the definitions [Eqs. (165) through (167)] to infinitesimal one-, two-, and three-dimensional rectangles to derive local definitions of the operators. This gives the familiar expressions in terms of derivatives, but our approach shows that these operators have a more general significance. The three-to-one relation of the differential operators of vector analysis with the exterior differential of forms stems from the already noted limitations of a representation in terms of vectors alone, which hides the true “p-nature” of p-fields. Our treatment reveals, for example, that an expression of the kind

$$\operatorname{curl}(\operatorname{curl}\mathbf A) \tag{168}$$

is meaningless as such, for the 2-field produced by the first application of the operator cannot be operated on by the second operator. The actual expression should, therefore, be something such as

$$\operatorname{curl}(k(\operatorname{curl}\mathbf A)) \tag{169}$$
where the intermediate operator $k$ transforms a 2-field into a 1-field; it may be, for example, a constitutive link. If $k$ is a constitutive operator, the resulting 1-field is usually endowed with a different kind of orientation with respect to $\mathbf A$. Using differential forms and the exterior differential, one can rewrite the equations of electromagnetism compactly. One starts by grouping everything related to a given space–time geometric object in a single differential form. Thus, remembering the classification of electromagnetic quantities already defined, one has an ordinary 2-form, the electromagnetic 2-form $F^2$,
which “groups” $\mathbf E$ and $\mathbf B$ (or, better, the local counterparts of $\phi_e$ and $\phi_b$), and a twisted 3-form, the charge–current 3-form $\tilde J^3$, grouping $\mathbf J$ and $\rho$. The local conservation of magnetic flux is expressed by $dF^2 = 0^3$, and that of electric charge by $d\tilde J^3 = \tilde 0^4$. The charge–current potentials $\mathbf H$ and $\mathbf D$ go into a twisted 2-form $\tilde G^2$, related to $\tilde J^3$ by $d\tilde G^2 = \tilde J^3$, whereas the electromagnetic potentials go into an ordinary 1-form $A^1$ satisfying $dA^1 = F^2$. The constitutive relations are expressed by a mapping between differential forms. For example, the electric and magnetic constitutive relations are expressed by $\tilde G^2 = \chi(F^2)$, where $\chi$ is a generic operator from the space of ordinary 2-forms to that of twisted 2-forms. As detailed subsequently, this operator is often erroneously identified with the Hodge star operator. The construction of the corresponding factorization diagram in terms of differential forms is straightforward.

4. Spread Cells

Let us now go back to weighted integrals and combine their properties with those of the newly defined differential operator. With $\int_{w_p}\omega^p$ we have at last an expression that corresponds fully, in a continuous setting, to the chain–cochain pairing $(c_p, c^p)$. This is a bilinear pairing with respect to which the boundary and coboundary operators are mutually adjoint, satisfying the relation $(c_{p+1}, \delta c^p) = (\partial c_{p+1}, c^p)$. In a similar way we can define the adjoint of the exterior differential as the boundary of the weighted domain. Formally this produces

$$\int_{w_{p+1}} d\omega^p = \int_{\partial w_{p+1}}\omega^p \tag{170}$$
This relation can be given an explicit expression in terms of differential forms (Bamberg and Sternberg, 1988). Since we are interested in its application within the weighted residual method, to interpret formulas such as Eqs. (148) and (149), let us see instead how it appears in the familiar language of vector calculus. Remember that the exterior differential operator $d$ corresponds to the gradient, curl, or divergence operators, depending on the type of field under consideration. Integrating by parts the expression that corresponds to the left side of Eq. (170) when $d$ is the divergence operator, we have

$$\int_{\tau_3} w\,\operatorname{div}\mathbf D = \int_{\partial\tau_3} w\,\mathbf D - \int_{\tau_3}\operatorname{grad} w\cdot\mathbf D \tag{171}$$
where the 3-cell $\tau_3$ can be taken as the support of the weight function $w$.∗ The right side of Eq. (171) can be considered to correspond to that of Eq. (170);

∗ The support of a function is the closure of the set of points where it does not vanish (Bossavit, 1998a).
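that is, the expression for the “boundary” of a weighted three-dimensional geometric object. In other words, we can give the following formal definition (Mattiussi, 1997):

$$\int_{\partial(w\tau_3)}\mathbf D \overset{\text{def}}{=} \int_{\partial\tau_3} w\,\mathbf D - \int_{\tau_3}\operatorname{grad} w\cdot\mathbf D \tag{172}$$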
where $w\tau_3$ denotes a weighted 3-cell. As anticipated while speaking of the boundary of chains, this “boundary” actually includes an integral on the whole 3-cell $\tau_3$, and not only on $\partial\tau_3$. The only exception is the particular case of a weight function that is constant on its support. We can, therefore, give the following geometric interpretation to the corresponding weighted residual formulas. The weight function $w$ defines the continuous counterpart of a chain. We can think of it as a “spread” or “smeared out” cell (Mattiussi, 1997), to be compared with the “crisp” cells considered so far. Crisp cells can be characterized by a weight function that is constant on its support (Oñate and Idelsohn, 1992; Fig. 32). Suppose an expression such as

$$\int_{\tau_3} w\,\operatorname{div}\mathbf D = \int_{\tau_3} w\rho \tag{173}$$

is written within a finite element formulation of an electromagnetic problem, and the left-hand side is integrated by parts to get

$$\int_{\partial\tau_3} w\,\mathbf D - \int_{\tau_3}\operatorname{grad} w\cdot\mathbf D = \int_{\tau_3} w\rho \tag{174}$$
we can consider this last formula as the expression of the balance between the electric charge associated with the corresponding spread cell and the electric
Figure 32. Weight functions which are constant on their support define crisp cells (left). Generic weight functions define instead the continuous counterpart of a chain that can be thought of as a spread cell (right).
flux associated with the boundary of that spread cell; that is,

$$\int_{\partial(w\tau_3)}\mathbf D = \int_{w\tau_3}\rho \tag{175}$$
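A one-dimensional analogue of Eq. (174) makes the two limiting cases tangible. In the sketch below, the flux density $D(x) = x^3$, the cell $[0.2, 0.6]$, and all names are illustrative assumptions. A crisp cell, with weight 1 on its support, leaves only the boundary term, exactly as in the finite volume method. A hat-shaped spread cell that vanishes at the ends of its support leaves only the interior term $-\int w' D$. That term is then the whole “boundary” of the spread cell in the sense of Eq. (172).

```python
def simpson(f, a, b, n=64):
    """Composite Simpson rule (exact for cubics up to rounding)."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3


def balance_terms(pieces, D, dD):
    """1D analogue of Eq. (174): int w D' = [w D] at the ends - int w' D.

    pieces: list of (lo, hi, w, dw) giving w and w' on consecutive intervals of the support.
    Returns (int w D', boundary term, interior term).
    """
    lhs = sum(simpson(lambda x: w(x) * dD(x), lo, hi) for lo, hi, w, _ in pieces)
    a, b = pieces[0][0], pieces[-1][1]
    boundary = pieces[-1][2](b) * D(b) - pieces[0][2](a) * D(a)
    interior = sum(simpson(lambda x: dw(x) * D(x), lo, hi) for lo, hi, _, dw in pieces)
    return lhs, boundary, interior


D = lambda x: x ** 3            # a hypothetical 1D flux density
dD = lambda x: 3 * x ** 2       # its divergence, playing the role of rho
a, m, b = 0.2, 0.4, 0.6
crisp = [(a, b, lambda x: 1.0, lambda x: 0.0)]
spread = [(a, m, lambda x: (x - a) / (m - a), lambda x: 1 / (m - a)),
          (m, b, lambda x: (b - x) / (b - m), lambda x: -1 / (b - m))]
print(balance_terms(crisp, D, dD))     # interior term is 0: a finite volume balance
print(balance_terms(spread, D, dD))    # boundary term is 0: only the smeared part remains
```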
If the weight function is proportional to the characteristic function of a cell (that is, it is constant on the cell and zero outside), the second term of the left-hand side of Eq. (174) vanishes. The finite element method then corresponds to the finite volume method (Fletcher, 1984; Oñate and Idelsohn, 1992). Otherwise, the finite collection $W = \{w_p^i\}$ of weight functions used within a weighted residual finite element formulation can be thought of as defining a continuous counterpart of a cell complex, composed of spread cells. Of course, these spread cells usually overlap, whereas the p-cells of a cell complex meet at most on lower-dimensional cells. However, if the weight functions constitute a partition of unity in the domain (Belytschko et al., 1996), something of the spirit that dictated that requirement for cell complexes remains valid. Indeed, the sum of the physical quantities associated with the spread cells of $W$ then equals the amount of that quantity associated with the entire domain. Note that integration by parts, or, if you prefer, Green's formulas, is interpreted geometrically as defining implicitly the boundary of a spread cell. For this reason, the corresponding discrete formula [Eq. (123)] can be called the discrete Green's formula or the summation by parts formula. This summation by parts formula is based on topological concepts only and does not require the preliminary definition of an inner product. In this respect it differs from those used in the context of compact finite difference methods (Bodenmann, 1995; Strand, 1994; Lele, 1992). Moreover, the summation by parts formula [Eq. (123)] is automatically satisfied by adopting a discretization based on cell complexes, chains, cochains, and the corresponding operators. It therefore need not be imposed explicitly on the discrete operators that substitute for the differential ones. The relation corresponding to Eq. (171) for the case of two-dimensional domains is

$$\int_{\tau_2} w\,\operatorname{curl}\mathbf E = \int_{\partial\tau_2} w\,\mathbf E - \int_{\tau_2}\operatorname{grad} w\times\mathbf E \tag{176}$$
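where $\mathbf E$ is a generic 1-field. This leads to the following formal definition

$$\int_{\partial(w\tau_2)}\mathbf E \overset{\text{def}}{=} \int_{\partial\tau_2} w\,\mathbf E - \int_{\tau_2}\operatorname{grad} w\times\mathbf E \tag{177}$$

for the boundary of a spread 2-cell. This definition can be used, for example, to enforce the following relation:

$$\int_{\partial(w\tau_2)}\mathbf E + \int_{w\tau_2}\frac{\partial\mathbf B}{\partial t} = 0 \tag{178}$$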
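Eq. (176) can be checked numerically for a weight that vanishes on $\partial\tau_2$. The following choices are assumptions made for illustration:

- the square $\tau_2 = [0, 1]^2$;
- the bubble weight $w = x(1-x)y(1-y)$;
- the 1-field $\mathbf E = (y^2, x^3)$;
- a midpoint quadrature rule.

The boundary term drops out, so the “boundary” of the spread 2-cell in Eq. (177) reduces to the interior integral alone. The exact value of both sides is $-1/360$.

```python
def midpoint_2d(g, n=200):
    """Midpoint rule for the integral of g(x, y) over the unit square."""
    h = 1.0 / n
    return sum(g((i + 0.5) * h, (j + 0.5) * h) for i in range(n) for j in range(n)) * h * h


# Hypothetical data: a bubble weight vanishing on the boundary of tau_2 = [0, 1]^2
# and a 1-field E = (Ex, Ey) = (y^2, x^3).
w = lambda x, y: x * (1 - x) * y * (1 - y)
w_x = lambda x, y: (1 - 2 * x) * y * (1 - y)
w_y = lambda x, y: x * (1 - x) * (1 - 2 * y)
E = lambda x, y: (y * y, x ** 3)
curl_E = lambda x, y: 3 * x * x - 2 * y          # dEy/dx - dEx/dy

lhs = midpoint_2d(lambda x, y: w(x, y) * curl_E(x, y))
# z-component of grad w x E; the boundary term of Eq. (176) vanishes because w = 0 there.
cross = midpoint_2d(lambda x, y: w_x(x, y) * E(x, y)[1] - w_y(x, y) * E(x, y)[0])
print(lhs, -cross)      # both approximate -1/360
```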
An expression such as Eq. (177) would, however, find application within the finite element formulation of a three-dimensional problem only under one condition. A peculiar kind of discretization, mixing discrete and continuous concepts, would have to be defined for the domain. An example of such a discretization would be a collection of weight functions defined on the 2-cells of a cell complex that subdivides the three-dimensional domain of the problem. For weighted one-dimensional domains the following definition would apply:

$$\int_{\partial(w\tau_1)}\varphi \overset{\text{def}}{=} \int_{\partial\tau_1} w\,\varphi - \int_{\tau_1}\operatorname{grad} w\,\varphi \tag{179}$$
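The partition-of-unity remark made earlier can be illustrated with Eq. (179). The sketch below uses piecewise-linear hat functions on a non-uniform one-dimensional mesh as spread 1-cells; the mesh, the field $\varphi = \sin x$, and the helper names are assumptions. Because the hats sum to one, their derivatives sum to zero. When the “boundaries” of all spread 1-cells are added, the interior integrals therefore cancel. Only $\varphi(1) - \varphi(0)$, the boundary of the whole domain, survives, just as it would for a sum of crisp cells.

```python
import math


def simpson(f, a, b, n=32):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3


def hat(i, x, nodes):
    """Piecewise-linear hat function of node i, used as a spread 1-cell."""
    n = len(nodes) - 1
    if i > 0 and nodes[i - 1] <= x <= nodes[i]:
        return (x - nodes[i - 1]) / (nodes[i] - nodes[i - 1])
    if i < n and nodes[i] <= x <= nodes[i + 1]:
        return (nodes[i + 1] - x) / (nodes[i + 1] - nodes[i])
    return 0.0


def spread_boundary(i, phi, nodes):
    """Eq. (179): [w phi] at the ends of the support minus the integral of (dw/dx) phi."""
    n = len(nodes) - 1
    lo, hi = nodes[max(i - 1, 0)], nodes[min(i + 1, n)]
    boundary = hat(i, hi, nodes) * phi(hi) - hat(i, lo, nodes) * phi(lo)
    interior = 0.0
    if i > 0:       # rising part of the hat, slope 1/(x_i - x_{i-1})
        interior += simpson(phi, nodes[i - 1], nodes[i]) / (nodes[i] - nodes[i - 1])
    if i < n:       # falling part of the hat, slope -1/(x_{i+1} - x_i)
        interior -= simpson(phi, nodes[i], nodes[i + 1]) / (nodes[i + 1] - nodes[i])
    return boundary - interior


nodes = [0.0, 0.1, 0.35, 0.5, 0.8, 1.0]    # a non-uniform mesh, chosen arbitrarily
phi = math.sin
print(sum(hat(i, 0.42, nodes) for i in range(len(nodes))))     # partition of unity
total = sum(spread_boundary(i, phi, nodes) for i in range(len(nodes)))
print(total - (phi(1.0) - phi(0.0)))       # near zero: the spread boundaries telescope
```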
As before, its use within a numerical method requires a mesh including a collection of spread 1-cells distributed within the domain of the problem. These kinds of formulations, mixing continuous and discrete concepts in the construction of the meshes, are not currently used in numerical practice. Instead, in an n-dimensional domain only n-dimensional weighted integrals are considered, such as the first term on the left side of Eq. (148) in place of that of Eq. (176). In this case, let $\xi$ be a point within the support of the weight function $\mathbf w$. One can still think of the vector $\mathbf w(\xi)$ as locally defining a weight, either for bivectors orthogonal to $\mathbf w(\xi)$ (if the entities subjected to weighted integration are 2-fields) or for vectors parallel to $\mathbf w(\xi)$ (if the entities are 1-fields). Thus some remnant of the geometric meaning of the weighted residual equation is still present in these formulations. Note that if the well-known integrability conditions hold, the support of $\mathbf w$ can be thought of as sliced into a collection of spread cells.∗ For example, irrotational weight functions $\mathbf w(\xi)$ define a collection of surfaces orthogonal to the field $\mathbf w$, whereas solenoidal weight functions define a collection of lines along it. These cases correspond to the absence of terms integrated on the interior of the support of the weight function in the expression for the boundary of the corresponding weighted domain. This is the continuous counterpart of the presence of “interior” cells in the boundary of a generic chain.

5. Weak Form of Topological Laws

We call the strong solution of a physical field problem that which satisfies its mathematical model in terms of partial integrodifferential equations

∗ The collection of domains supporting the cells can be a foliation, or, in the presence of singularities, a stratification (Abraham et al., 1988).
NUMERICAL METHODS FOR PHYSICAL FIELD PROBLEMS
supplemented by a set of boundary conditions (Oden, 1973). Correspondingly, this mathematical model is called the strong formulation of the problem. Let us borrow this name and apply it to the differential formulation of topological laws. Hence, we shall call Eqs. (7) through (13) the strong form of the topological laws of electromagnetism, and Eqs. (70) and (71) the strong form of those of heat transfer. In the language of differential forms, these equations can all be rewritten as follows:

$$d\alpha^{p} = \beta^{p+1} \qquad (180)$$
where $\alpha^p$ and $\beta^{p+1}$ are suitable differential forms representing the fields involved, and d is the exterior differential operator. As stated earlier, from the point of view of inverse limit systems the operator d can be interpreted as the operator δ∞, that is, as a collection of coboundary operators acting between the projections of the fields represented by $\alpha^p$ and $\beta^{p+1}$ on the cell complexes of the directed set $\mathcal{K}$ of all the cell complexes that subdivide the domain of the problem. Therefore, a strong topological statement such as Eq. (180) can be interpreted as the collection of all the corresponding discrete topological statements (in terms of cochains and coboundary). Thus, Eq. (180) is equivalent to

$$\delta A^{p}(K) = B^{p+1}(K) \qquad \forall K \in \mathcal{K} \qquad (181)$$
where $A^p(K)$ and $B^{p+1}(K)$ are the cochains resulting from the projection of $\alpha^p$ and $\beta^{p+1}$ on the cell complex K. Seen in this light, the weak and strong formulations of topological laws differ only in whether we consider the collection of topological statements as an assembly or as a single entity. This approach also applies to spread cells, that is, to the enforcement of topological laws in terms of the weighted integrals discussed previously. Of course, the collection of spread cells must be rich enough that practically all conceivable topological statements are enforced. Thus, when we select a suitable space W of weight functions, a statement such as

$$\int_{\partial w} \alpha^{p} = \int_{w} \beta^{p+1} \qquad \forall w \in W \qquad (182)$$

(where, with some notational abuse, we have identified the weight function with the weighted domain of integration) “leaves nothing to be desired” (Bossavit, 1998b) as regards the enforcement of the topological law expressed by Eq. (180). In fact, Eq. (182) turns out to be a more comprehensive statement than Eq. (180), since it is not disturbed by discontinuities in the field, which instead require the enforcement of separate interface conditions when the strong formulation is adopted. Borrowing the language of functional analysis, we can call Eq. (182) the weak formulation of the topological law. The equivalence between weak and strong formulations of topological laws no longer holds if we consider one, or at most a few, cell complexes instead
CLAUDIO MATTIUSSI
of the complete collection of cell complexes that subdivide the domain. This is the case for numerical methods in which only one mesh for each kind of orientation is built in the domain, and consequently only the topological statements corresponding to the actual meshes are enforced. Of course, since we consider in the domain only the physical quantities associated with the geometric objects of the meshes, we cannot hope to enforce a wider set of topological equations than those which involve these quantities (which, however, are enforced exactly). In particular, if we build a field function defined on the whole domain, starting from the finite collection of global quantities defined on the complex and satisfying the corresponding topological law on it, we can expect the topological prescriptions not included among those enforced to be violated (Bossavit, 1998a).
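To make Eq. (181) concrete in the simplest setting, the sketch below (an illustration under stated assumptions, not a procedure from the text) takes the one-dimensional domain [0, 1], the 0-form α = sin 2πx and β = dα, projects α on the 0-cells (point values) and β on the 1-cells (integrals computed by quadrature, independently of α), and checks that δA(K) = B(K) holds on three different cell complexes K. The field, the meshes, the Simpson quadrature, and all function names are illustrative choices.

```python
import math
import random

def alpha(x):
    """0-form alpha^0 = sin(2*pi*x)."""
    return math.sin(2.0 * math.pi * x)

def beta(x):
    """Coefficient of the 1-form beta^1 = d(alpha^0) = 2*pi*cos(2*pi*x) dx."""
    return 2.0 * math.pi * math.cos(2.0 * math.pi * x)

def simpson(f, a, b, m=1000):
    """Composite Simpson rule with m (even) subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for j in range(1, m):
        s += (4 if j % 2 else 2) * f(a + j * h)
    return s * h / 3.0

def project_0(nodes):
    """Projection of alpha^0 on the 0-cells: point values (a 0-cochain A)."""
    return [alpha(x) for x in nodes]

def project_1(nodes):
    """Projection of beta^1 on the 1-cells [x_i, x_{i+1}]: integrals computed
    by quadrature, without using alpha (a 1-cochain B)."""
    return [simpson(beta, a, b) for a, b in zip(nodes[:-1], nodes[1:])]

def coboundary_0(cochain):
    """delta on a 1-D complex whose 1-cells are oriented left to right:
    (delta A)_i = A_{i+1} - A_i (incidence numbers +1 and -1)."""
    return [cochain[i + 1] - cochain[i] for i in range(len(cochain) - 1)]

def max_mismatch(nodes):
    """Largest violation of delta A(K) = B(K) on the complex with these nodes."""
    dA = coboundary_0(project_0(nodes))
    B = project_1(nodes)
    return max(abs(u - v) for u, v in zip(dA, B))

random.seed(0)
meshes = {
    "uniform, 4 cells": [i / 4 for i in range(5)],
    "uniform, 32 cells": [i / 32 for i in range(33)],
    "random, 11 cells": [0.0] + sorted(random.random() for _ in range(10)) + [1.0],
}
for name, nodes in meshes.items():
    print(f"{name:18s} max |delta A - B| = {max_mismatch(nodes):.1e}")
```

Each complex checks only the statements attached to its own cells; the strong statement, Eq. (180), corresponds to the whole collection of such checks, whereas a numerical method works with a single mesh of each kind.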
IV. Methods

A. The Reference Discretization Strategy

We now have all the elements to ascertain whether or not a discretization strategy complies with the tenets derived from an analysis of the mathematical structure of physical field theories. To provide a framework for the development of new methods that satisfy these requirements, and to facilitate the comparison of these principles with those adopted by a number of popular numerical methods, we will now describe a reference discretization strategy directly based on the ideas developed so far. Note that this reference strategy does not qualify as a complete numerical method, since the discretization of the constitutive relations is described in generic terms only. However, it will become clear that a whole class of methods complying with the analysis discussed so far can be obtained by combining the elements of the reference strategy. In fact, and this is one of the central points of this article, the reference discretization strategy is intended as a template for the systematic construction of numerical methods for field problems that comply with the structure of the underlying physical theory. The only prerequisite for applying this template is the determination of the factorization diagram of the problem for which a numerical method is sought. Seen in this more limited practical perspective, all the discussion so far can be interpreted as an introduction to the language and tools that allow the correct execution of this preliminary step, while the reference discretization strategy itself can be reformulated in traditional mathematical terms without reference to the ideas of algebraic topology. The reference discretization strategy is presented here for the case of time-dependent electromagnetic problems, since at this point we know the structure of the factorization diagram for this theory. Moreover, although the analysis
introduced in the present work applies to static problems as well, it is in space–time that it comes to full fruition. As a consequence of this choice of problem for the reference strategy, the comparison that follows will consider mostly time-domain methods, be they finite difference, finite volume, or finite element methods. In summary, we consider as the subject of the discretization, within a bounded space–time domain, a problem constituted by Maxwell’s Eqs. (7), (8), (12), and (13) [or any of the integral formulations derived from them within the present work; in particular, the fully discrete form represented by Eqs. (56), (57), (61), and (62)], supplemented by a set of constitutive relations [e.g., Eqs. (14) through (16)]. To complete the definition of the problem, we will assume as given a set of initial and boundary conditions that make the problem well posed. Imposed currents and charges can also be specified as independent sources; for example, a convective current term such as $\mathbf{J}_v = \rho\mathbf{v}$ is very common in problems arising in particle accelerator design.

1. Domain Discretization

The reference strategy discretizes the space–time domain using two dual oriented cell complexes, which act as primary and secondary meshes.∗ We assume that each mesh is obtained as the Cartesian product of the elements of a cell complex that subdivides the problem domain in space with those of a cell complex that discretizes the time interval for which a solution is sought. The more complex case of moving meshes, and the even more general case of generic space–time cell complexes, could be contemplated as well, but they entail a number of difficulties in attributing a physical meaning to the quantities and in deducing suitable constitutive equations (Nguyen, 1992), which we choose to avoid in this context. Remember, however, that the reference method can be extended to include these cases as well.
Given this choice, we have in space, for p = 0, 1, 2, 3, four collections of indexed primary p-cells $\{\tau_p^i,\ i = 1, \dots, n_p\}$. To each primary p-cell $\tau_p^i$ there corresponds a dual secondary (3 − p)-cell $\tilde\tau_{3-p}^i$ with the same index and the default orientation defined by that of the dual primary cell. The secondary cells therefore also constitute four collections of p-cells $\{\tilde\tau_p^i,\ i = 1, \dots, \tilde n_p = n_{3-p}\}$ (Fig. 33). Notice that, to facilitate drawing, the 1- and 2-cells in Figure 33 and in Figures 35 and 36 are straight or planar, but this is not required by the definition of cell on which the reference method is based. However, the
∗ In fact, the reference discretization strategy as described next applies to nondual primary and secondary meshes as well, and in particular to the case of geometrically coincident (although logically distinct) primary and secondary meshes (this last choice simplifies the setup of boundary conditions and the treatment of material discontinuities). However, if we were to use nondual meshes, some significant algebraic properties of the discretized model based on dual meshes (in particular, the adjointness of operators) would risk being lost.
Figure 33. Reference discretization of the domain in space. Note that the orientation and index of each primary p-cell are used to index and orient the dual secondary (3 − p)-cell (the orientation of 0-cells and 3-cells is not represented).
use of planar and straight cells greatly simplifies the calculations, especially as regards the discretization of the constitutive relations. The actual construction of the two meshes is problem dependent, since it must consider what kind of boundary conditions are specified (which determines what kind of cells are needed at the boundary), where the material discontinuities, if any, are located, and to what constitutive parameter they refer. This last point will become clearer after the discussion of constitutive relations discretization (in Section IV.A.3). In general, one can start by defining a primary mesh that conforms to material and domain boundaries, and then construct the secondary cells by defining a secondary 0-cell within each n-cell of the primary mesh (n being the dimension of the domain). The secondary mesh is then built starting from these 0-cells so as to make the two cell complexes reciprocally dual. The actual position of each secondary 0-cell within its dual n-cell also depends on the problem and on the strategy adopted for the discretization of constitutive relations. In many cases, the barycenter of the n-cell is a good choice but, as will be hinted at subsequently, not always the optimal one.

The domain in time is a time interval I subdivided by two dual cell complexes. The primary one is constituted by two collections of indexed p-cells, with p = 0, 1. The 0-cells are time instants $\{t_0^n,\ n = 1, \dots, N\}$ indexed according to increasing time. To simplify the notation and facilitate the comparison with existing methods, the time interval going from $t_0^n$ to $t_0^{n+1}$ is indexed as $t_1^{n+1/2}$. In time, to each primary cell $t_p^n$ there corresponds a dual secondary cell $\tilde t_{1-p}^n$, inheriting the index of its dual cell. Thus, the time interval $\tilde t_1^n$ goes from the time instant $\tilde t_0^{n-1/2}$ to the time instant $\tilde t_0^{n+1/2}$ (Fig. 34). Primary
Figure 34. Reference discretization of the domain in time.
space–time cells are obtained as Cartesian products $\tau_p^i \times t_q^n$, and secondary ones as products $\tilde\tau_p^i \times \tilde t_q^n$. Note that the duality of the meshes applies in both space and time. This discretization supplies the oriented geometric objects needed to support the global physical quantities of electromagnetism, that is, $Q^\rho$, $Q^j$, $\phi^b$, $\phi^e$, $\psi^d$, $\psi^h$, $U^a$, and $U^v$ [defined in Eqs. (48) through (55)]. However, the only quantities actually appearing in the formulas of the reference strategy are $Q^j$, $\phi^b$, $\phi^e$, $\psi^d$, and $\psi^h$. The association of a global quantity with a geometric object will be denoted by a pair of indexes, according to the following convention:

$$\phi^b(\tau_2^i \times t_0^n) = \phi^b_{i,n} \qquad (183)$$
$$\phi^e(\tau_1^i \times t_1^{n+1/2}) = \phi^e_{i,n+1/2} \qquad (184)$$
$$\psi^d(\tilde\tau_2^i \times \tilde t_0^{n+1/2}) = \psi^d_{i,n+1/2} \qquad (185)$$
$$\psi^h(\tilde\tau_1^i \times \tilde t_1^n) = \psi^h_{i,n} \qquad (186)$$
$$Q^j(\tilde\tau_2^i \times \tilde t_1^n) = Q^j_{i,n} \qquad (187)$$

2. Topological Time Stepping

Previously, we explained how Faraday’s law and Maxwell–Ampère’s law can be given a geometric interpretation in terms of global quantities associated with a space–time cylinder (Figs. 12 and 13). Within a domain discretized following the prescriptions of the previous subsection, we can apply this property to build a topological time-stepping procedure. Faraday’s law is used to time-step $\phi^b$ as follows. We build a space–time cylinder on a primary 2-cell $\tau_2^i$
Figure 35. Blown-up representation of the geometric objects and global physical quantities involved in topological time stepping on the primary mesh. The (internal) orientation of the geometric objects is not represented.
considered at the time instant $t_0^n$. The resulting 2-cell $\tau_2^i \times t_0^n$ is the first base of the cylinder. The boundary $\partial\tau_2^i$ of $\tau_2^i$, considered during the time interval $t_1^{n+1/2}$, is a finite collection of 2-cells $\partial\tau_2^i \times t_1^{n+1/2} = \sum_k [\tau_2^i, \tau_1^k]\,\tau_1^k \times t_1^{n+1/2}$ that constitutes the lateral surface of the cylinder. The cylinder is closed by the 2-cell $\tau_2^i \times t_0^{n+1}$ (Fig. 35). If we assume as known the primary global quantities at times $t < t_0^{n+1}$, that is, $\phi^b_{i,n}$ and $\phi^e_{k,n+1/2}$, we can calculate $\phi^b_{i,n+1}$ exactly from the topological equation $\delta\Phi^2 = 0$ [Eq. (131)], which, if we isolate the unknown term, becomes

$$\phi^b_{i,n+1} = \phi^b_{i,n} \pm \sum_{k=1}^{n_1} [\tau_2^i, \tau_1^k]\,\phi^e_{k,n+1/2} \qquad (188)$$
where the actual sign of the second term on the right side depends on the default orientation assumed for the primary 2-cells with which a positive $\phi^e$ is associated. Using the representation [Eq. (106)] of cochains as vectors of global physical quantities and the definition [Eq. (88)] of the incidence matrices, we can rewrite the topological time-stepping formula [Eq. (188)] in matrix terms, as follows:

$$\Phi^b_{n+1} = \Phi^b_n - D_{2,1}\,\Phi^e_{n+1/2} \qquad (189)$$

where we assume the default orientation that gives the last term a minus sign.
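The update of Eq. (189) is a sparse matrix–vector product. The sketch below, written under simplifying assumptions, builds the incidence matrices $D_{2,1}$ and $D_{3,2}$ of a small tetrahedral mesh (a unit cube split into six tetrahedra, with every cell oriented by the ascending order of its vertex labels, a convention chosen here and not prescribed by the text), applies Eq. (189) repeatedly with arbitrary voltage impulses, and checks that the net magnetic flux out of every 3-cell does not change, which is the property proved algebraically below with the help of Eq. (90). The helper names (`boundary`, `build_complex`, `incidence`, `matvec`) are hypothetical.

```python
import itertools
import random

def boundary(simplex):
    """Oriented boundary of a simplex whose vertices are listed in ascending
    order: d[v0..vk] = sum_i (-1)^i [v0..(vi omitted)..vk]."""
    return [((-1) ** i, simplex[:i] + simplex[i + 1:]) for i in range(len(simplex))]

def build_complex(tets):
    """All p-cells (p = 0..3) of a tetrahedral mesh, each oriented by the
    ascending order of its vertex labels."""
    cells = {p: set() for p in range(4)}
    for t in tets:
        t = tuple(sorted(t))
        for p in range(4):
            cells[p].update(itertools.combinations(t, p + 1))
    return {p: sorted(c) for p, c in cells.items()}

def incidence(cells, p):
    """Sparse incidence matrix D_{p,p-1}: row i maps each (p-1)-cell index k
    to the incidence number [tau_p^i, tau_{p-1}^k]."""
    index = {c: k for k, c in enumerate(cells[p - 1])}
    return [{index[f]: s for s, f in boundary(c)} for c in cells[p]]

def matvec(D, x):
    """Product of a sparse incidence matrix with a cochain vector."""
    return [sum(s * x[k] for k, s in row.items()) for row in D]

# Unit cube (vertex label = x + 2y + 4z) split into six tetrahedra along the
# main diagonal from vertex 0 to vertex 7.
tets = [(0, u, u + v, 7) for u, v, _ in itertools.permutations((1, 2, 4))]
cells = build_complex(tets)
D21 = incidence(cells, 2)   # primary 2-cells x primary 1-cells
D32 = incidence(cells, 3)   # primary 3-cells x primary 2-cells

random.seed(1)
# Integer data keep the arithmetic exact; the initial flux need not satisfy
# Gauss's magnetic law, so any violation should simply be preserved.
b = [random.randint(-5, 5) for _ in cells[2]]           # phi^b_{i,0}
gauss_initial = matvec(D32, b)                          # net flux out of each 3-cell
for _ in range(10):
    e = [random.randint(-5, 5) for _ in cells[1]]       # arbitrary phi^e_{k,n+1/2}
    b = [bi - ci for bi, ci in zip(b, matvec(D21, e))]  # Eq. (189)
gauss_final = matvec(D32, b)
print(gauss_initial == gauss_final)  # True
```

The check succeeds for any choice of voltage impulses because $D_{3,2}D_{2,1} = 0$ holds for the simplicial incidence numbers used here, the discrete counterpart of div curl = 0.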
Figure 36. Blown-up representation of the geometric objects and global physical quantities involved in topological time stepping on the secondary mesh. The (external) orientation of the geometric objects is not represented.
An analogous procedure holds for the time stepping of $\psi^d$ by means of Maxwell–Ampère’s law $\delta\tilde\Psi^2 = \tilde Q^3$ on the secondary mesh (Fig. 36). The result is

$$\psi^d_{i,n+1/2} = \psi^d_{i,n-1/2} \pm \sum_k [\tilde\tau_2^i, \tilde\tau_1^k]\,\psi^h_{k,n} \pm Q^j_{i,n} \qquad (190)$$

where $Q^j_{i,n}$ is the charge associated with the electric current flowing through $\tilde\tau_2^i$ during the time interval $\tilde t_1^n$. With a suitable choice of default orientations, the matrix representation of Eq. (190) corresponding to Eq. (189) is

$$\tilde\Psi^d_{n+1/2} = \tilde\Psi^d_{n-1/2} + \tilde D_{2,1}\,\tilde\Psi^h_n - \tilde Q^j_n \qquad (191)$$
If we consider the collection $\{\phi^b_{i,0}\}$ given as part of the initial conditions, we can use Eq. (188) to start a time stepping for $\phi^b$, provided the set of values $\{\phi^e_{k,1/2}\}$ is known. Of course, we cannot expect these values to be also given as initial conditions. We can, however, assume the set of values $\{\psi^d_{i,1/2}\}$ to be given as initial conditions. Hence, we can derive $\{\phi^e_{k,1/2}\}$ from $\{\psi^d_{i,1/2}\}$ by means of a discrete constitutive link $F_{\varepsilon^{-1}}$, and in this way advance $\phi^b$ in time from $\{\phi^b_{i,0}\}$ to $\{\phi^b_{i,1}\}$. At this point we know $\{\phi^b_{i,1}\}$, and we can derive from it $\{\psi^h_{i,1}\}$ by means of a discrete constitutive link $F_{\mu^{-1}}$, and $\{Q^j_{i,1}\}$ from $\{\phi^e_{j,1/2}\}$ and $\{\phi^e_{j,3/2}\}$ or, better, indirectly from $\{\psi^d_{i,1/2}\}$ and $\{\psi^d_{i,3/2}\}$ by means of a constitutive link
Figure 37. The two half-time steps of the reference method. Topological time stepping is applied on each side of the diagram, to update $\phi^b$ and $\psi^d$, respectively. The discrete constitutive links supply the quantities required by the time-stepping formulas, that is, $\phi^e$ for the updating of $\phi^b$, and $\psi^h$ and $Q^j$ for the updating of $\psi^d$.
$F_{\sigma,\varepsilon^{-1}}$, which includes the action of the constitutive links $f_{\varepsilon^{-1}}$ and $f_\sigma$. This allows the determination of $\{\psi^d_{i,3/2}\}$ by time-stepping $\{\psi^d_{i,1/2}\}$ using Eq. (190), and so on (Fig. 37). In matrix representation, for a generic time step n, this corresponds to

$$\Phi^b_{n+1} = \Phi^b_n - D_{2,1}\,F_{\varepsilon^{-1}}(\tilde\Psi^d_{n+1/2}) \qquad (192)$$
$$\tilde\Psi^d_{n+1/2} = \tilde\Psi^d_{n-1/2} + \tilde D_{2,1}\,F_{\mu^{-1}}(\Phi^b_n) - F_{\sigma,\varepsilon^{-1}}(\tilde\Psi^d) \qquad (193)$$

where in the term $F_{\sigma,\varepsilon^{-1}}(\tilde\Psi^d)$ the cochain has no time-step subscript, as a reminder that both $\tilde\Psi^d_{n-1/2}$ and $\tilde\Psi^d_{n+1/2}$ are involved in the link. The topological time-stepping formulas [Eqs. (188) and (190)] are based on two of Maxwell’s four equations. We will now prove that, in adopting the time-stepping scheme thus described, we do not need to explicitly enforce the other pair of Maxwell’s equations. As usual for numerical methods devoted to time-domain electromagnetic problems, it will suffice to show that, if these equations are satisfied at a given time instant, they remain so after the execution of a time step. Consider first Gauss’s magnetic law. This asserts the vanishing of the magnetic flux associated with the boundary of any 3-cell $\tau_3$ at any time instant $t_0^n$. To simplify the notation, we denote by $\Phi_n$ the magnetic flux cochain at time $t_0^n$, thus avoiding the use of products of spacelike and timelike geometric objects in the formulas. With this provision, Gauss’s law for a particular 3-cell
$\tau_3^*$, considered at time $t_0^n$, reads as follows:

$$\phi^b(\partial\tau_3^* \times t_0^n) = (\partial\tau_3^*, \Phi_n) = 0 \qquad (194)$$

where, in the middle term, we have represented the quantity in terms of a chain–cochain pairing. We must now show that, from the validity of Eq. (194) and the application of the time-stepping formula [Eq. (188)], there follows the validity of Gauss’s law at time $t_0^{n+1}$. Substituting in Eq. (194) the expression [Eq. (96)] of the boundary in terms of incidence numbers, we have

$$\phi^b(\partial\tau_3^* \times t_0^n) = \sum_i [\tau_3^*, \tau_2^i]\,(\tau_2^i, \Phi_n) = \sum_i [\tau_3^*, \tau_2^i]\,\phi^b_{i,n} = 0 \qquad (195)$$

The same substitution, applied to the expression of $\phi^b$ at time $t_0^{n+1}$, gives

$$\phi^b(\partial\tau_3^* \times t_0^{n+1}) = \sum_i [\tau_3^*, \tau_2^i]\,\phi^b_{i,n+1} \qquad (196)$$

Substituting the time-stepping formula of Eq. (188) in the right side of Eq. (196), we obtain

$$\sum_i [\tau_3^*, \tau_2^i]\,\phi^b_{i,n+1} = \sum_i [\tau_3^*, \tau_2^i]\Big(\phi^b_{i,n} \pm \sum_k [\tau_2^i, \tau_1^k]\,\phi^e_{k,n+1/2}\Big) \qquad (197)$$

Rearranging the terms, we have

$$\sum_i [\tau_3^*, \tau_2^i]\,\phi^b_{i,n+1} = \sum_i [\tau_3^*, \tau_2^i]\,\phi^b_{i,n} \pm \sum_k \Big(\sum_i [\tau_3^*, \tau_2^i]\,[\tau_2^i, \tau_1^k]\Big)\phi^e_{k,n+1/2} \qquad (198)$$

The first term on the right side of Eq. (198) vanishes, since we have assumed Eq. (195) to hold true; the second term vanishes by virtue of the relation [Eq. (90)] holding among incidence numbers. Hence, remembering Eq. (196), we finally have

$$\phi^b(\partial\tau_3^* \times t_0^{n+1}) = 0 \qquad (199)$$
This proves that if Gauss’s magnetic law is satisfied at time $t_0^n$, then, upon application of the topological time-stepping formula, it is also satisfied at time $t_0^{n+1}$. A more general conclusion follows from Eq. (198), namely, that the amount of violation of Gauss’s magnetic law, if any, does not change upon execution of topological time stepping. In other words, topological time stepping on $\phi^b$ automatically enforces the law of magnetic charge conservation. In the case of Gauss’s electric law, the balance to be enforced is that between the electric charge associated with each 3-cell and the electric flux through its boundary. Suppose that at time $\tilde t_0^{n-1/2}$ there is a charge $Q^\rho_{*,n-1/2}$ associated with the 3-cell $\tilde\tau_3^*$, and that the following relation holds:

$$\psi^d(\partial\tilde\tau_3^* \times \tilde t_0^{n-1/2}) = Q^\rho_{*,n-1/2} \qquad (200)$$

Repeating the steps of the preceding proof, but using instead the time-stepping formula of Eq. (190), we obtain

$$\psi^d(\partial\tilde\tau_3^* \times \tilde t_0^{n+1/2}) = Q^\rho_{*,n-1/2} \pm \sum_i [\tilde\tau_3^*, \tilde\tau_2^i]\,Q^j_{i,n} \qquad (201)$$
This shows that after topological time stepping the electric flux associated with the boundary of $\tilde\tau_3^*$ may have changed. However, this change is consistent with the law of charge conservation, since the new term on the right side of Eq. (201) is the result of the electric current flowing through the boundary of the cell during the time interval $\tilde t_1^n$. Hence, topological time stepping on $\psi^d$ automatically enforces the law of electric charge conservation and preserves the violation of Gauss’s electric law, if any. Note how the recognition of the space–time nature of topological laws suggests the adoption of a uniquely determined time-stepping procedure on each side of the factorization diagram of physical quantities, an observation that holds for any theory admitting such a factorization diagram. We are revealing here the roots of the adoption, within many numerical methods for partial differential equations, of a leapfrog time-stepping procedure based on two half-time steps. Without the justification given in the present work, which is based on the analysis of the structure of physical theories, this choice is often regarded as an oddity, justified only by the good results it offers. [For further details on the topological time-stepping process see the discussion in Mattiussi (2001).] The limits that the uniqueness of the topological time-stepping process puts on the form of the complete time stepping are not as severe as might appear at first. It is indeed true that the topological time-stepping formulas (189) and (191) are based on a topological law applied to a single space–time 3-cell, and therefore that each newly calculated value depends directly only on quantities associated with the cell itself and with its boundary. However, the complete time-stepping operator includes, in addition to the topological relation, at least one discrete constitutive operator.
This constitutive operator links the quantities directly involved in the topological time-stepping formula to other quantities. The constitutive link, therefore, allows the extension both in space and in time of the dependence of the newly calculated value on quantities associated with cells other than those for which the topological law is enforced. Thus, the newly calculated values associated with a given cell can be made to depend on quantities associated with cells of a generic neighborhood of that cell in space, and extending in time deeper into the past than the single time step considered
by the topological relation, or on other quantities at the time instant at which the new quantity is calculated. This can be expressed by rewriting the particular time-stepping formulas of Eqs. (192) and (193) as follows:

$$\Phi^b_{n+1} = \Phi^b_n - D_{2,1}\,F_{\varepsilon^{-1}}(\tilde\Psi^d) \qquad (202)$$
$$\tilde\Psi^d_{n+1/2} = \tilde\Psi^d_{n-1/2} + \tilde D_{2,1}\,F_{\mu^{-1}}(\Phi^b) - F_{\sigma,\varepsilon^{-1}}(\tilde\Psi^d) \qquad (203)$$
that is, with the time subscript removed from the cochains entering the discrete constitutive links, to emphasize the involvement of the whole space–time cochain in the link. Observe that if the discrete constitutive operators are given explicitly, then by actually substituting them in the topological time-stepping formulas we can make the formulas depend on only two kinds of variables, in this case $\phi^b$ and $\psi^d$ (but other pairs of variables can be selected to appear in the formulas by applying the constitutive links differently). In summary, it is possible to build a variety of time-stepping procedures (including implicit ones) complying with the adoption of a topological time-stepping operator. Given the variety of discrete constitutive links that can be built, and the uniqueness of the topological time-stepping links, one could therefore conceive a numerical package offering a choice between different discretization strategies for constitutive relations, to be combined with the unique discretization of topological laws based on the coboundary operator. Note finally that we can expect problems in trying to use a weighted residual approach to build a topological time-stepping procedure, for the geometric ideas upon which we have based the topological time-stepping procedure cannot be easily extended to spread cells.

3. Strategies for Constitutive Relations Discretization

The task of constitutive relations discretization consists of determining a link between cochains that approximates the local constitutive equation and, with it, the ideal material behavior it represents. There are many possible approaches to this task, and from this point of view the three cases presented next are in no way exhaustive.
However, since many numerical methods do not explicitly consider the particular problem posed by the discretization of constitutive relations, and perform this task in a manner that appears at best as some form of educated guessing, we will at least try to present discretization procedures for constitutive relations that can be applied systematically. Our inspiration, as usual, comes from existing numerical methods. We will first consider one of the simplest and most intuitive approaches that qualify as a systematic technique for the discretization of constitutive relations; then two more sophisticated classes of strategies are presented.
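Before turning to specific strategies, the structure of Eqs. (192) and (193) can be illustrated with a one-dimensional reduction. The sketch below assumes fields $E_y$ and $H_z$ varying only along x, a slab of unit thickness in y and z, units with ε = μ = 1, perfectly conducting ends, no conduction current, and a Gaussian initial electric flux; the constitutive links are the simplest ones, obtained by applying D = εE and B = μH cell by cell in the manner of Strategy 1 below. Under these assumptions the scheme coincides with the familiar one-dimensional Yee (FDTD) leapfrog. The function name and all parameter values are illustrative.

```python
import math

def leapfrog_1d(n_cells=200, steps=50, dx=1.0, dt=0.5, eps=1.0, mu=1.0, sigma=8.0):
    """1-D analogue of Eqs. (192)-(193), fields E_y and H_z varying along x.

    Node k (k = 0..n_cells) stands for a unit primary 1-cell along y
    (carrying phi^e) and a secondary 2-cell of width dx normal to y
    (carrying psi^d). Cell i, between nodes i and i+1, stands for a primary
    2-cell of area dx normal to z (carrying phi^b) and a unit secondary
    1-cell along z (carrying psi^h). With eps = mu = 1, c = 1 and the
    Courant number c * dt / dx is 0.5.
    """
    k0 = n_cells // 2
    # Initial data: psi^d at t = dt/2 is a Gaussian, phi^b at t = 0 is zero.
    psi_d = [dx * math.exp(-0.5 * ((k - k0) / sigma) ** 2) for k in range(n_cells + 1)]
    psi_d[0] = psi_d[-1] = 0.0   # perfectly conducting ends: E_y = 0 there
    phi_b = [0.0] * n_cells
    for _ in range(steps):
        # F_{eps^-1} by Strategy 1: E = psi^d / (eps * dx), phi^e = E * 1 * dt
        phi_e = [dt * d / (eps * dx) for d in psi_d]
        # Topological step on the primary mesh (Faraday), cf. Eq. (189)
        for i in range(n_cells):
            phi_b[i] -= phi_e[i + 1] - phi_e[i]
        # F_{mu^-1} by Strategy 1: H = phi^b / (mu * dx), psi^h = H * 1 * dt
        psi_h = [dt * b / (mu * dx) for b in phi_b]
        # Topological step on the secondary mesh (Maxwell-Ampere, no current),
        # cf. Eq. (191); the end nodes are never updated and stay at zero
        for k in range(1, n_cells):
            psi_d[k] -= psi_h[k] - psi_h[k - 1]
    return [d / dx for d in psi_d]   # D_y at the nodes, time (steps + 1/2) * dt

D = leapfrog_1d()
peak = max(abs(v) for v in D)
print(f"peak |D_y| after 50 steps: {peak:.3f}")
```

Each iteration performs exactly the two half steps of Fig. 37: a constitutive link feeding a topological update, first on the primary mesh and then on the secondary one. The printed peak should be roughly one half, since the initial pulse splits into two counter-propagating halves.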
Remark IV.1

The discretization of constitutive relations is often referred to as the discretization of the Hodge star operator ∗ (Tarhasaari et al., 1999; Teixeira and Chew, 1999b; see Bamberg and Sternberg, 1988, for a formal definition of its action). Considering the great variety of possible constitutive links, this point of view appears too restrictive. The Hodge star operator indeed institutes a one-to-one correspondence between ordinary p-forms $\alpha^p$ and twisted (n − p)-forms $\tilde\beta^{n-p}$ defined on an n-dimensional manifold. We can represent this relation as follows:

$$\tilde\beta^{n-p} = \ast\,\alpha^p \qquad (204)$$
It is apparent from Eq. (204) that if we adopt the representation of fields as differential forms, the Hodge operator can play the part of a constitutive link (provided we include within it the required material parameters). However, in this role, the Hodge operator is a mathematical model for the behavior of a particular class of ideal materials (and when it is considered merely in mathematical terms, that is, without the intervention of material parameters, not even that), and cannot be considered as a model for all material behaviors. It is the fact that the Hodge star operator constitutes the traditional bridge between ordinary and twisted forms that tempts us to consider it the constitutive operator. That this is not so can be seen by considering that constitutive equations can be mathematically much more complex than the simple correspondence brought about by the Hodge operator (see, for example, the discussion in Post, 1997). Of course, since the transition from ordinary to twisted differential forms (or vice versa) is implied by the constitutive links, the Hodge operator or something analogous, capable of “crossing the bridge” in the factorization diagram, will be required, but typically only as a part of the complete constitutive link. In other words, every operator linking two differential forms that represent two fields plays the role of the constitutive relation of a particular ideal material [perhaps a nonphysical one, as in the case of the materials that form the so-called perfectly matched layer, used for the implementation of absorbing boundary conditions (Berenger, 1994; Teixeira and Chew, 1999a)]. However, contrary to what happens with the topological equations, no single operator can claim a privileged role as the constitutive operator.

a. Discretization Strategy 1: Global Application of Local Constitutive Statements

While introducing the idea of constitutive links, we hinted at the fact that a local constitutive equation of the kind D = εE also holds true in a macroscopic space–time region, provided that the fields are uniform in space and constant in time, and that the material is homogeneous. In this case, if the surface S to which the electric flux is associated is planar and orthogonal to the straight
line segment L to which the voltage is associated, we can write

$$\frac{\psi^d}{S} = \varepsilon\,\frac{\phi^e}{L\,I} \qquad (205)$$

where I is the time interval during which the voltage is considered and, with some notational abuse, we have identified the symbols of the geometric objects with the value of their extension. The uniformity conditions on which the transition from the local statement to the global one is based are admittedly severe. Consequently, these requirements are not satisfied in most cases, and equations such as Eq. (205) are only rough approximations of the actual relation holding between the global variables. Nonetheless, many numerical methods adopt this approach, more or less explicitly, to obtain a discrete version of the constitutive links. The rationale behind this choice is that as the maximum space and time discretization steps decrease, the uniformity hypothesis is approached more and more closely, and Eq. (205) becomes an acceptable discrete approximation of D = εE. Note that in addition to uniformity there is a requirement of geometric regularity and orthogonality of the geometric objects and, therefore, of the discretization meshes. Hence, this approach requires meshes with dual, orthogonal cells, for example, the regular orthogonal grids of the FDTD method or the more general Delaunay–Voronoi meshes (Guibas and Stolfi, 1985). This last requirement, however, can usually be relaxed, at the price of a more complex evaluation of the terms appearing in the link, to take into account, for example, the angles between the cells or their curvature. In summary, by consistently applying this simple discretization technique to the local expression of all the constitutive relations appearing in a problem, one obtains a series of links between cochains, which are the required discrete constitutive links.
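To see the rationale quoted above at work, the following sketch applies Eq. (205) to a field that is not uniform along L. It assumes that E(x) = sin x varies only along the segment L (of length h, centred at x0), is uniform over S and constant in time, and that S sits at the midpoint of L. The exact flux is then εE(x0)S, while Eq. (205) effectively uses the average of E over L. The field, the values of ε, S, I, and x0, and the function names are illustrative assumptions.

```python
import math

def strategy1_flux(phi_e, eps, S, L, I):
    """Eq. (205) solved for psi^d: psi^d = eps * S * phi^e / (L * I)."""
    return eps * S * phi_e / (L * I)

def relative_error(h, x0=0.7, eps=2.0, S=1.0, I=1.0):
    """Relative error of Eq. (205) for E(x) = sin(x) along a segment of
    length h centred at x0, with the dual surface S located at x0."""
    # Exact voltage impulse: I times the integral of sin over [x0 - h/2, x0 + h/2]
    phi_e = I * (math.cos(x0 - h / 2) - math.cos(x0 + h / 2))
    exact = eps * S * math.sin(x0)   # psi^d = eps * E(x0) * S
    return abs(strategy1_flux(phi_e, eps, S, h, I) - exact) / abs(exact)

for h in (0.4, 0.2, 0.1, 0.05):
    print(f"h = {h:5.2f}   relative error = {relative_error(h):.2e}")
```

Here the ratio between the Strategy 1 estimate and the exact flux is 2 sin(h/2)/h ≈ 1 − h²/24, so halving h reduces the error by about a factor of four, which is the sense in which mesh refinement makes Eq. (205) acceptable.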
Since this approach can give only very crude approximations of the actual constitutive relations, it is acceptable only if the fields do not vary rapidly in space and time or if recourse to a very fine mesh is acceptable. To find a more accurate discrete approximation of the constitutive relations than the one just presented, we must make more extensive use of the information represented by the local constitutive equations. In the next subsections we will consider two approaches based on the preliminary reconstruction, from the corresponding cochains, of one or both of the field functions appearing in the constitutive equation written in local form.

b. Discretization Strategy 2: Field Function Reconstruction and Projection

The method considered in the present section requires the reconstruction of only one of the field functions appearing in the constitutive equation in local form. An example will clarify the actual workings of the strategy. Suppose that we want to discretize the following constitutive equation:

$$\mathbf{B} = f_\mu(\mathbf{H}) \qquad (206)$$
As usual, to simplify the notation we represent the field functions using the tools of vector calculus, even though a formulation in terms of differential forms would be more appropriate (see Teixeira and Chew, 1999b, for a description of the strategy in both differential-form and vector-calculus language). In the discrete setting of the reference strategy, the field functions B and H appearing in Eq. (206) are not among the problem’s variables; we have instead the magnetic flux cochain $\Phi^b$ and the cochain $\tilde\Psi^h$, associated with H on the secondary mesh, which at the end of the process must be linked by the relation $\Phi^b = F_\mu(\tilde\Psi^h)$. In order to use the information contained in Eq. (206), we proceed by deriving a field function H from the cochain $\tilde\Psi^h$. To this end, we select a reconstruction operator $R_h$ giving a field function H for each cochain $\tilde\Psi^h$, as follows:

$$\mathbf{H} = R_h(\tilde\Psi^h) \qquad (207)$$
Note that the reconstruction in Eq. (207) starts from space–time global quantities, and, therefore, the reconstructed field is intended as given in space– time also, as a function H(r, t). We can now apply to H the local constitutive link [Eq. (206)], obtaining the field function B. We must finally return to cochains, and we do this by means of a projection operator Pb, which produces a cochain b for each field function B, as follows: b = Pb (B)
(208)
The composition of the operators [Eqs. (207), (206), and (208)] gives the desired discrete constitutive link $\mathcal{F}_\mu$:

$$\Phi^b = \mathcal{F}_\mu(\tilde{F}^h) = P_b\big(f_\mu(R_h(\tilde{F}^h))\big) \qquad (209)$$
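The composition (209) can be tried out in one dimension. The Python fragment below is a minimal illustration under assumptions of our own, not the author's implementation. The primary cells are the intervals between mesh nodes, and each secondary cell surrounds one node (clipped at the domain ends). $R_h$ is a piecewise-constant reconstruction, and `f_mu` is a hypothetical nonlinear law $B = \mu_0 H (1 + \beta H^2)$. The projection $P_b$ integrates the reconstructed $B$ over each primary cell with a composite midpoint rule. All function names are hypothetical.

```python
def secondary_cells(nodes):
    """Dual 1-cells: one around each primary node, clipped to the domain ends."""
    mids = [(a + b) / 2.0 for a, b in zip(nodes[:-1], nodes[1:])]
    bounds = [nodes[0]] + mids + [nodes[-1]]
    return list(zip(bounds[:-1], bounds[1:]))


def reconstruct(cochain, cells):
    """R: piecewise-constant field equal to value / length on each cell."""
    def field(x):
        for value, (a, b) in zip(cochain, cells):
            if a <= x <= b:
                return value / (b - a)
        raise ValueError("point outside the mesh")
    return field


def project(field, cells, n_sub=8):
    """P: integral of a field over each cell (composite midpoint rule, n_sub even)."""
    result = []
    for a, b in cells:
        step = (b - a) / n_sub
        result.append(step * sum(field(a + (k + 0.5) * step) for k in range(n_sub)))
    return result


def f_mu(h, mu0=1.0, beta=0.1):
    """Hypothetical nonlinear local law B = mu0 * H * (1 + beta * H**2)."""
    return mu0 * h * (1.0 + beta * h * h)


def discrete_link(f_h, nodes):
    """Phi^b = P_b(f_mu(R_h(F^h))), the composition of Eq. (209)."""
    primary = list(zip(nodes[:-1], nodes[1:]))
    h_field = reconstruct(f_h, secondary_cells(nodes))
    return project(lambda x: f_mu(h_field(x)), primary)


nodes = [0.0, 0.2, 0.5, 0.7, 1.0]
f_h = [0.05, 0.3, -0.2, 0.4, 0.1]   # one value per secondary cell
phi_b = discrete_link(f_h, nodes)    # one value per primary cell
print(phi_b)
```

In this toy setting each `phi_b[i]` depends only on `f_h[i]` and `f_h[i + 1]`, because a primary cell overlaps exactly two secondary cells. This is the kind of locality that keeps the resulting matrices sparse.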
The same approach applies to the discretization of the electrostatic link and of any other constitutive relation whose local expression, analogous to Eq. (206), is known (Fig. 38). A natural joint requirement for the reconstruction and projection operators is that for every cochain $c^p$ the following relation holds:

$$P_*(R_*(c^p)) = c^p \qquad (210)$$
That is, projecting back the reconstructed field must return the original cochain, so the combined operator $P_* R_*$ must be the identity operator. Note, however, that this is not true in general for the operator $R_* P_*$. Because of the limitations of the reconstruction operator, projecting a generic field and then reconstructing it typically does not return the original field (Tarhasaari et al., 1999).
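Property (210), and its failure in the reverse order, can be checked on the simplest possible case. The Python sketch below rests on our own assumptions: a one-dimensional mesh of primary 1-cells, a piecewise-constant reconstruction, and a projection computed with a composite midpoint rule. The helper names are hypothetical. Projecting the reconstruction of a cochain returns the cochain. Reconstructing the projection of the field $x^2$ does not return $x^2$.

```python
def reconstruct(cochain, cells):
    """R: piecewise-constant field equal to c_i / |cell_i| on each 1-cell."""
    def field(x):
        for c, (a, b) in zip(cochain, cells):
            if a <= x <= b:          # shared endpoints go to the left cell
                return c / (b - a)
        raise ValueError("point outside the mesh")
    return field


def project(field, cells, n_sub=200):
    """P: integral of the field over each 1-cell (composite midpoint rule)."""
    values = []
    for a, b in cells:
        step = (b - a) / n_sub
        values.append(step * sum(field(a + (k + 0.5) * step) for k in range(n_sub)))
    return values


nodes = [0.0, 0.1, 0.35, 0.6, 1.0]
cells = list(zip(nodes[:-1], nodes[1:]))        # primary 1-cells of a non-uniform mesh

c = [0.3, -1.2, 2.0, 0.5]
print(project(reconstruct(c, cells), cells))    # P(R(c)): equals c up to rounding

g = reconstruct(project(lambda x: x * x, cells), cells)
print(g(0.05), 0.05 ** 2)                       # R(P(f)) differs from f
```

The reconstructed `g` is only the cell average of $x^2$. Many different fields share the same cochain, so no reconstruction can recover the original field in general.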
NUMERICAL METHODS FOR PHYSICAL FIELD PROBLEMS
Figure 38. The reconstruction–projection method for the discretization of constitutive relations. Given the starting cochain, an approximation of the corresponding field function is determined by means of a reconstruction operator $R_*$. The result is then subjected to the action of one or more local constitutive operators $f_*$. The resulting field function is finally projected onto the cell complex by means of an operator $P_*$, thus recovering a cochain.
Note also that in Eqs. (207) and (208) we added to the symbols $R$ and $P$ of the reconstruction and projection operators an index referring to the field function involved in the process. This serves as a reminder that the operators must comply with the association of physical quantities with oriented geometric objects. Each operator is therefore tailored to the actual nature of the fields on which it operates. In particular, the reconstruction operator acts on p-cochains and produces p-fields that must be compatible with the original cochain. The proper selection of the reconstruction operator is therefore instrumental in attaining a good discrete solution. For this reason, a separate section is devoted later to some actual reconstruction operators. Using this representation of the discrete constitutive operators, we can rewrite the generic time-stepping formulas (202) and (203) for this particular discretization strategy as follows:

$$\Phi^b_{n+1} = \Phi^b_n + D_{2,1}\, P_e\big(f_\varepsilon^{-1}(R_d(\tilde{\Psi}^d_{n+1/2}))\big) \qquad (211)$$

$$\begin{aligned}\tilde{\Psi}^d_{n+1/2} = \tilde{\Psi}^d_{n-1/2} &+ \tilde{D}_{2,1}\, P_h\big(f_\mu^{-1}(R_b(\Phi^b_n))\big)\\ &+ P_j\big(f_\sigma(f_\varepsilon^{-1}(R_d(\tilde{\Psi}^d_{n-1/2})))\big)\end{aligned} \qquad (212)$$
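A one-dimensional sketch in the spirit of Eqs. (211) and (212) may help fix ideas. It is not the author's code, and its setting is hypothetical:

- a periodic domain of unit length with uniform unit $\varepsilon$ and $\mu$;
- no losses, so the $P_j$ term is absent;
- piecewise-constant reconstructions $R_b$ and $R_d$;
- projections that evaluate the reconstructed field at the node of the target cell and multiply by the time step.

The incidence operators reduce to differences between neighboring values, written out explicitly below. These topological updates telescope, so the totals of both cochains are conserved to rounding error whatever constitutive approximation is used.

```python
import math


def leapfrog_1d(n_cells=50, n_steps=200, courant=0.5, eps=1.0, mu=1.0):
    """Toy 1D staggered update in the spirit of Eqs. (211)-(212), lossless, periodic.

    Assumed setting (illustrative only): fields E = E_y(x), H = H_z(x) on [0, 1).
    phi_b[i]: magnetic flux on primary cell [x_i, x_{i+1}].
    psi_d[i]: electric flux on the secondary cell centred on node x_i.
    """
    dx = 1.0 / n_cells                      # primary and secondary cells have equal length here
    dt = courant * dx * math.sqrt(eps * mu)
    # Initial electric flux: a Gaussian pulse of D, integrated approximately over each secondary cell.
    psi_d = [dx * math.exp(-((i * dx - 0.5) / 0.05) ** 2) for i in range(n_cells)]
    phi_b = [0.0] * n_cells
    for _ in range(n_steps):
        # R_d (uniform D on each secondary cell), f_eps^{-1} (E = D / eps), then P_e:
        # value of E at primary node i times dt = electric voltage impulse.
        e_imp = [dt * psi_d[i] / (dx * eps) for i in range(n_cells)]
        # Topological (Faraday) update: the boundary of primary cell i is node i+1 minus node i.
        phi_b = [phi_b[i] - (e_imp[(i + 1) % n_cells] - e_imp[i]) for i in range(n_cells)]
        # R_b (uniform B on each primary cell), f_mu^{-1} (H = B / mu), then P_h:
        # value of H at the secondary node x_{i+1/2} times dt = magnetic voltage impulse.
        h_imp = [dt * phi_b[i] / (dx * mu) for i in range(n_cells)]
        # Topological (Ampere) update: secondary cell i spans x_{i-1/2} .. x_{i+1/2};
        # h_imp[i - 1] wraps around to the last entry for i = 0 (periodic domain).
        psi_d = [psi_d[i] - (h_imp[i] - h_imp[i - 1]) for i in range(n_cells)]
    return phi_b, psi_d


phi_b, psi_d = leapfrog_1d()
print(sum(phi_b), sum(psi_d))
```

Only the two "constitutive" lines depend on the choice of $R$, $f$, and $P$. Replacing them with richer reconstructions and integral projections leaves the two topological lines untouched.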
Once the reconstruction and projection operators appearing in Eqs. (211) and (212) are chosen, these equations define a specific numerical method. The reference discretization strategy thus becomes a class of numerical methods, one for each choice of operators. Let us analyze in general terms where the discretization error might enter this class of methods. The strategy first requires reconstructing the field function from a cochain. This opens the first door to possible errors, since we cannot expect the
true solution to be in the range of our reconstruction operator. Recall what has been said about limit systems, and about the concept of a field as the collection of its manifestations in terms of cochains on the directed set of all the cell complexes that subdivide the domain. In those terms, a single cochain cannot determine the field it derives from. More precisely, given a cell complex and a cochain defined on it, there are in general infinitely many fields that admit that cochain as their projection on that complex. The choice of the reconstruction operator therefore corresponds to selecting a particular field among the many fields compatible with the discrete image we start from.

After the reconstruction step and the application of the local constitutive operator, we project the resulting field function onto the cell complex, obtaining a cochain, and we impose on it the topological equation. Since the cell complex is finite, we are enforcing only a finite subset of all the topological relations implied by the corresponding topological laws. This leaves the solution more freedom than the original physical field problem allows. It is the combination of these two processes that gives rise to the discretization error finally found in the solution of the discrete problem. This double nature of the discretization error was lucidly analyzed by Schroeder and Wolff (1994). The reconstruction–projection strategy for the discretization of constitutive relations, and the corresponding error analysis, were given a formal treatment based on concepts from category theory (Tarhasaari et al., 1999). In this context, the names Whitney functor and de Rham functor were suggested for the reconstruction and projection operators, respectively. Let us comment further on the properties of the projection operator.
Speaking of limit systems, we said that a field can be thought of as a collection of cochains, each of which is a projection of the field on a particular cell complex. Adopting the natural representation of a cochain as a vector of global physical quantities associated with cells, projection therefore amounts to evaluating these global quantities on the p-cells of the complex. Here p and the orientation of the complex must suit the nature of the field. In practice, this evaluation usually corresponds to integrating the field function on these p-cells. The reconstruction operator, moreover, only approximates the field function, and these two facts together open the door to some optimization opportunities. We know from approximation theory that the reconstructed function typically has a set of loci where the approximation is of higher order than at generic points of the domain. Therefore, by making the cells where the projection is performed coincide with these loci, we can obtain higher accuracy in the resulting discrete constitutive equation. This means that the accuracy of the results can benefit from the proper selection and placement of the primary and secondary meshes. In particular, it can be shown that the choice of two suitably placed dual grids can
be desirable also in this respect (Mattiussi, 1997). For low-order polynomial reconstructions on regular cells, the center of the cells is usually the optimal location for the dual object. This extends to the time-stepping procedure, where it suggests placing the secondary time instants in the middle of the primary time intervals. However, for higher-order polynomial and for nonpolynomial reconstructions, as well as for nonregular cells, determining the optimal reciprocal position and shape of the cells of the two cell complexes can be far from obvious. This usually requires solving an algebraic or differential system of equations derived by enforcing some best-approximation constraints. [See Mattiussi (1997) for a description of some possible approaches to this problem.]

Note that the reconstruction–projection strategy gives rise to discrete constitutive links in which the value of the resulting cochain on each cell depends on the values of the starting cochain on many cells. Potentially, it depends on all the cells entering the reconstruction process. To preserve the sparsity of the matrices appearing in the system of algebraic equations, the reconstruction is usually performed locally. The value of the reconstructed field function at each point then depends only on the values of the original cochain on the cells of a sufficiently small neighborhood of the point. In particular, the simple discretization strategy described in the previous subsection is a special case of the strategy described here. In that case the reconstruction operator works locally on a single cell, giving a uniform field that is then projected onto the dual cell.

c. Discretization Strategy 3: Error-Based Discretization

There is another approach to the discretization of constitutive relations which is based on the reconstruction of field functions.
Let us describe the workings of this strategy using the same example as for the previous strategy. As before, we assume as known the secondary cochain $\tilde{F}^h$ associated with $\mathbf{H}$, and we want to determine the magnetic flux cochain $\Phi^b$. Contrary to the previous case, however, we apply a reconstruction operator to both cochains, as follows:

$$\mathbf{H} = R_h(\tilde{F}^h) \qquad (213)$$

$$\mathbf{B} = R_b(\Phi^b) \qquad (214)$$
Since the cochain $\Phi^b$ is the unknown term of the discrete link, the reconstruction of $\mathbf{B}$ is made in formal terms only. Ideally, the relation holding between $\mathbf{B}$ and $\mathbf{H}$ is the local constitutive equation $\mathbf{B} = f_\mu(\mathbf{H})$. From the discussion in the previous subsection we know that both fields, being obtained by reconstruction from cochains, are forced to lie in the range of the reconstruction operators. Consequently, we cannot expect the reconstructed fields to satisfy the local constitutive equation exactly.
We therefore define in the domain an error density function, which we denote by

$$\xi(\mathbf{B}, \mathbf{H}) \qquad (215)$$
which is intended to give a local estimate of the amount by which the constitutive link is violated. As a minimal set of requirements, we ask that this scalar measure of the local constitutive error be nonnegative everywhere and vanish only for $\mathbf{B}$ and $\mathbf{H}$ satisfying $\mathbf{B} = f_\mu(\mathbf{H})$. The actual definition of the function $\xi$ is a nontrivial task, which depends on the problem and on its constitutive relations. We will not consider this subject in detail here, and we assume the function as given. For this important topic the reader is referred to the literature, in particular to that dealing with complementary variational techniques. These appear especially suited to determining physically significant local error functions linked to local and global energy estimates (Albanese and Rubinacci, 1998; Golias et al., 1994; Marmin et al., 1998; Oden, 1973; Penman, 1988; Remacle et al., 1998; Rikabi et al., 1988). Substituting the reconstruction operators [Eqs. (213) and (214)] into the error function [Eq. (215)], we obtain the local error function in terms of the cochains $\tilde{F}^h$ and $\Phi^b$:

$$\xi(\mathbf{B}, \mathbf{H}) = \xi\big(R_b(\Phi^b), R_h(\tilde{F}^h)\big) \qquad (216)$$
Integrating $\xi$ on a space–time domain $D$, we obtain a global error functional

$$\mathcal{E}(\Phi^b, \tilde{F}^h) = \int_D \xi\big(R_b(\Phi^b), R_h(\tilde{F}^h)\big)$$

Since the cochain $\tilde{F}^h$ is known, we can determine the optimal cochain $\Phi^b$ by means of the following optimization problem:

$$\Phi^b = \arg\min_{\Phi^b} \mathcal{E}(\Phi^b, \tilde{F}^h) \qquad (217)$$
This procedure implicitly defines a link of the kind $\Phi^b = \mathcal{F}_\mu(\tilde{F}^h)$ and, therefore, establishes a discrete constitutive relation approximating the local constitutive relation. Note that this approach to the discretization of a constitutive relation puts the two fields linked by the equation on a more equal footing than does the previous strategy. There is, of course, still a direction of the link, going from the known cochain to the unknown one. In terms of the coefficients of the cochains involved, however, the link is no longer many-to-one but many-to-many. From a physical point of view this appears sound. A constitutive relation should not be regarded as a cause–effect relationship in which one given field fully determines another, but instead must be viewed as
a constraint which codetermines both fields. Note also that, in general, the minimization problem [Eq. (217)] takes place in space–time and not in space only. In a time-dependent problem, for example, a minimization must be performed at each time step. It runs on the space–time domain formed by the Cartesian product of the spatial domain $D$ and the time step $\Delta t$, as follows:

$$\mathcal{E}(\Phi^b, \tilde{F}^h) = \int_{\Delta t}\int_D \xi\big(R_b(\Phi^b), R_h(\tilde{F}^h)\big)$$
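The minimization (217) can be illustrated on a static one-dimensional toy problem of our own choosing, with a single minimization and no time stepping. The setting is hypothetical:

- primary cells lie between mesh nodes, and secondary cells surround the nodes;
- $R_b$ and $R_h$ are piecewise-constant reconstructions;
- `f_mu` is a hypothetical nonlinear law;
- the error density is $\xi = (B - f_\mu(H))^2$, which is nonnegative and vanishes only where $B = f_\mu(H)$.

The functional is evaluated with a composite midpoint rule and minimized by plain gradient descent. For this quadratic $\xi$ and piecewise-constant $R_b$, one can verify that the minimizer is $\Phi^b_i = \int_{\text{cell } i} f_\mu(H)$, which is exactly the reconstruction–projection result. Richer reconstructions of $\mathbf{B}$, with overlapping basis functions, would couple the unknown coefficients.

```python
def secondary_cells(nodes):
    """Dual 1-cells: one around each primary node, clipped to the domain ends."""
    mids = [(a + b) / 2.0 for a, b in zip(nodes[:-1], nodes[1:])]
    bounds = [nodes[0]] + mids + [nodes[-1]]
    return list(zip(bounds[:-1], bounds[1:]))


def reconstruct(cochain, cells):
    """Piecewise-constant reconstruction: value / length on each cell."""
    def field(x):
        for value, (left, right) in zip(cochain, cells):
            if left <= x <= right:
                return value / (right - left)
        raise ValueError("point outside the mesh")
    return field


def f_mu(h, mu0=1.0, beta=0.1):
    """Hypothetical nonlinear local law B = mu0 * H * (1 + beta * H**2)."""
    return mu0 * h * (1.0 + beta * h * h)


def xi(b, h):
    """Hypothetical error density: >= 0, and zero only where B = f_mu(H)."""
    return (b - f_mu(h)) ** 2


def error_functional(phi_b, f_h, nodes, n_sub=8):
    """E(phi_b, f_h): integral of xi(R_b(phi_b), R_h(f_h)) over the domain."""
    primary = list(zip(nodes[:-1], nodes[1:]))
    b_field = reconstruct(phi_b, primary)
    h_field = reconstruct(f_h, secondary_cells(nodes))
    total = 0.0
    for left, right in primary:
        step = (right - left) / n_sub
        for k in range(n_sub):
            x = left + (k + 0.5) * step
            total += step * xi(b_field(x), h_field(x))
    return total


def minimize_error(f_h, nodes, n_iter=80, n_sub=8):
    """Gradient descent on E with respect to the unknown cochain phi_b."""
    primary = list(zip(nodes[:-1], nodes[1:]))
    h_field = reconstruct(f_h, secondary_cells(nodes))
    phi_b = [0.0] * len(primary)
    for _ in range(n_iter):
        for i, (left, right) in enumerate(primary):
            length = right - left
            step = length / n_sub
            points = [left + (k + 0.5) * step for k in range(n_sub)]
            # dE/dphi_i = sum over quadrature points of 2 * step * (phi_i / length - f_mu(H)) / length
            grad = sum(2.0 * step * (phi_b[i] / length - f_mu(h_field(x))) / length for x in points)
            # E is quadratic in phi_i with curvature 2 / length, so this step halves the error each sweep.
            phi_b[i] -= 0.25 * length * grad
    return phi_b


nodes = [0.0, 0.2, 0.5, 0.7, 1.0]
f_h = [0.05, 0.3, -0.2, 0.4, 0.1]
phi_opt = minimize_error(f_h, nodes)
print(phi_opt, error_functional(phi_opt, f_h, nodes))
```

The residual error at the optimum is generally nonzero. A uniform $B$ on each primary cell cannot match the two different values of $f_\mu(H)$ coming from the two secondary cells it overlaps.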
The necessity of solving an optimization problem at each time step can greatly increase the computational cost of this strategy. In fact, the error-based approach has been applied in the past mainly to static or quasistatic problems, or to frequency-domain problems (Albanese and Rubinacci, 1993). This is partly because the required theoretical analyses and error functions were first given for these cases. However, the growth of computing power may soon make this approach an attractive alternative to the reconstruction–projection strategy for time-dependent problems as well (Albanese et al., 1994; Albanese and Rubinacci, 1998).

4. Edge Elements and Field Reconstruction

The discretization strategies for constitutive relations that we have presented require the reconstruction of field functions from the unknowns of the discrete formulation of the problem. At first this appears to be a familiar function approximation problem. In our case, however, the starting data are cochains, that is, collections of global values on cells, not nodal samples of field functions. In other words, instead of a traditional nodal-based approximation problem, we must consider a problem of cochain-based field function approximation (Mattiussi, 1997, 1998; Fig. 39). The two concepts coincide
Figure 39. (Left) A nodal-based field function approximation is based on a set of local scalar or vector values defined on a grid of points. (Right) A cochain-based field function approximation takes instead as its starting point a p-cochain, that is, a set of global values associated with the oriented p-cells of the mesh. Here the case of ordinary 1-cochains on two-dimensional domains is considered.
only for quantities associated with points. For theories having scalar global values, such quantities are represented in the continuous case by 0-forms (i.e., by scalar field functions) and in the discrete case by scalar-valued 0-cochains. Working only with unknowns associated with points, one can easily overlook the fact that the reconstruction is actually based on cochains. This is what happens, for example, with a potential formulation of electrostatics. However, already in magnetostatics one faces the fact that neither the fields nor the vector potential is associated with points. To use the traditional nodal-based tools of approximation theory in this case, one is therefore forced to ignore the correct association of physical quantities with geometric objects. One must consequently also abandon any hope of complying with the structure of the field problem. The alternative is to introduce new approximation tools tailored to the characteristics of cochains. In this sense, one needs to consider at least three-dimensional magnetostatics to start appreciating the true nature of the task constituted by the discretization of an electromagnetic problem. In summary, an ordinary approximation problem asks: “Find on a given domain a scalar-valued or vector-valued function which approximates the data constituted by local scalar or vector values defined on a set of points.” Our approximation problem states instead: “Find on a given domain a p-form that approximates the data constituted by global values associated with the p-cells of a cell complex.” In other words, we require the reconstructed p-form to have the given cochain as its projection onto the complex. A traditional approach to the solution of approximation problems within numerical methods is the selection of a set of shape functions.
In our case, these are suitable forms $\sigma_i^p(\mathbf{r})$, that is, p-forms with the correct kind of orientation for the p-cochain one starts with. They can be used as a basis for the reconstruction, for example in terms of a linear combination of them, as in

$$\omega^p(\mathbf{r}) = \sum_i a_i\, \sigma_i^p(\mathbf{r}) \qquad (218)$$
The reconstruction must, of course, be uniquely determined (and, in fact, it must be a well-conditioned operation), and this is reflected in independence requirements on the shape functions. If instead of differential forms we choose to work with the traditional tools of vector calculus, the entity to be approximated is a scalar or vector function, that is, a p-field defined in the domain. Correspondingly, the forms $\sigma_i^p(\mathbf{r})$ become scalar field functions $s_i^p(\mathbf{r})$ or vector field functions $\mathbf{s}_i^p(\mathbf{r})$. From now on we will speak generically of shape functions, including in this definition differential forms and scalar and vector functions. In general, the shape functions for an approximation problem are defined globally; that is, they are nonzero on the whole domain. In numerical methods
for field problems it is often preferable to define shape functions of a local nature, that is, functions that are nonzero only within the domain constituted by a small number of adjacent cells. A class of shape functions that complies with all the requirements listed so far is that of the so-called edge elements. These shape functions were introduced about 20 years ago in finite element practice (Ahagan et al., 1996; Albanese and Rubinacci, 1998; Webb, 1993; Jin, 1993). Edge elements are usually defined in terms of the kind of interelement discontinuities they permit. We will instead offer the following definition. An edge element is a shape function $\sigma^p$, defined in a domain subdivided by a cell complex, whose projection on the cell complex is an elementary p-cochain. An elementary p-cochain is one whose value is one on a particular p-cell $\tau_i^p$ and zero on all other p-cells of the cell complex. In formulas,

$$\int_{\tau_i^p} \sigma_j^p = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \qquad (219)$$
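As a concrete instance of definition (219), consider the lowest-order Whitney 1-forms on a triangle, $w_{ab} = \lambda_a \nabla\lambda_b - \lambda_b \nabla\lambda_a$, where the $\lambda_a$ are barycentric coordinates. This standard example is our choice and is not taken from the text. The Python sketch below checks numerically that the line integral of each $w_{ab}$ is one on its own oriented edge and zero on the other two. It also checks that the reconstruction (218), with the cochain values as coefficients, reproduces a uniform field exactly. All function names are hypothetical.

```python
def barycentric(tri):
    """Barycentric coordinates lambda_i(p) and their constant gradients on a triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    two_area = (x1 - x0) * (y2 - y0) - (y1 - y0) * (x2 - x0)   # twice the signed area
    grads = [((y1 - y2) / two_area, (x2 - x1) / two_area),
             ((y2 - y0) / two_area, (x0 - x2) / two_area),
             ((y0 - y1) / two_area, (x1 - x0) / two_area)]

    def lam(i, p):
        # lambda_i is affine and equals 1 at vertex i
        return 1.0 + grads[i][0] * (p[0] - tri[i][0]) + grads[i][1] * (p[1] - tri[i][1])

    return lam, grads


def whitney_1form(tri, a, b):
    """Lowest-order edge element w_ab = lambda_a grad(lambda_b) - lambda_b grad(lambda_a)."""
    lam, grads = barycentric(tri)

    def w(p):
        la, lb = lam(a, p), lam(b, p)
        return (la * grads[b][0] - lb * grads[a][0],
                la * grads[b][1] - lb * grads[a][1])

    return w


def edge_integral(field, p, q, n=4):
    """Projection onto the oriented edge p -> q: line integral of the tangential component."""
    tx, ty = q[0] - p[0], q[1] - p[1]
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        fx, fy = field((p[0] + t * tx, p[1] + t * ty))
        total += (fx * tx + fy * ty) / n
    return total


def reconstruct(cochain, tri, edges):
    """Field sum_i c_i w_i, as in Eq. (218), with the cochain values as coefficients."""
    elements = [whitney_1form(tri, a, b) for a, b in edges]

    def field(p):
        vals = [w(p) for w in elements]
        return (sum(c * v[0] for c, v in zip(cochain, vals)),
                sum(c * v[1] for c, v in zip(cochain, vals)))

    return field


tri = [(0.0, 0.0), (2.0, 0.5), (0.3, 1.5)]
edges = [(0, 1), (1, 2), (2, 0)]              # internally oriented 1-cells
for p, q in edges:
    print([round(edge_integral(whitney_1form(tri, a, b), tri[p], tri[q]), 12) for a, b in edges])
```

Each printed row is one row of the identity matrix, up to rounding. This is Eq. (219) for 1-cells, so $P_* R_* = I$ holds for this reconstruction. The reproduction of uniform fields follows because the Whitney 1-forms contain the gradients of all affine functions.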
Note that Eq. (219) is a natural extension to generic geometric objects of the requirement traditionally imposed on the nodal values of scalar shape functions. If the shape functions in the reconstruction [Eq. (218)] satisfy Eq. (219), they also automatically satisfy the property expressed by Eq. (210); that is, we have $P_* R_* = I$. To comply with the requirements of numerical methods, we also require this shape function to be nonzero only in a small neighborhood of $\tau_i^p$. We will call such a shape function an ordinary or a twisted p-edge element, depending on the orientation of the corresponding cochain, to emphasize the correspondence to a particular oriented geometric object. This definition of edge elements is intended to unify them by the role they play in the discretization process, that of cochain-based field function approximation. (Their possible role as weight functions is discussed later, in the context of finite element methods.) Paralleling the reasons behind the introduction of the reference discretization strategy, the definition is not meant as a sterile classification. Rather, it should help in testing existing elements for their consistency with this role, guide the development of new elements, and assist in extending the application of edge elements to new fields. Our definition may seem strange to edge element practitioners, who are accustomed to taking as their starting point the averaged components of the field to be reconstructed (be they tangent or normal to p-cells). A bit of reflection, however, reveals that the two ideas are perfectly equivalent. Multiplying the averaged field component by the extension of the cell gives the global value associated with the cell. Conversely, the fact that only the averaged tangential or normal component (to accommodate internal or
Figure 40. (Left Column) Edge elements are usually considered as based on the averaged components of the field tangent to the cells or normal to them. This is, however, equivalent to considering the corresponding global values associated with internally or externally oriented cells, respectively (right column). Edge elements for two-dimensional problems are considered here. Note that wavy arrows represent external orientation of geometric objects and not the presence of vectors associated with them.
external orientation, respectively) is considered ensures that the field quantities contain no more information than the global value (Fig. 40). From our definition of edge elements, we can reconstruct a field from a cochain as follows. Given a cochain $c^p$ on a cell complex $K$ and a set of edge elements $\{\sigma_i^p\}$, one for each p-cell of $K$, we can construct a field function in the domain $|K|$ as a linear combination of the kind of Eq. (218). This time, the coefficients $c_i$ are the values taken by the cochain to be reconstructed on the p-cells of the complex, that is, the components of the vector that represents the cochain with respect to the natural basis for cochains [Eq. (106)]. The simple requirement of having an elementary cochain as their projection does not uniquely determine edge elements. One can indeed find a multitude of shape functions that comply with this requirement, and in particular with Eq. (219). For this reason, in selecting edge elements for a problem one tries to satisfy other properties as well, in particular those related to the accuracy of the reconstruction and, therefore, of the computation. These include, for example, the presence of polynomial terms up to a given order in the reconstructed functions or in a transformation thereof (Sun et al., 1995). Note that in some cases a certain number of missing terms in the reconstructed function can be dispensed with through a proper placement of the meshes (Mattiussi, 1997). In the quest for higher-order edge elements, one may end up defining shape functions that resemble edge elements but, according to our definition,
Figure 41. Anomalous edge elements mix internal and external orientations and associate multiple quantities with a single geometric object, thus violating the principles of the association of physical quantities with geometric objects.
are not. This is the case for a number of the so-called vector elements that have been proposed to improve the behavior of the first generation of edge elements (Cendes, 1991; Sun et al., 1995). Let us follow the path that leads to the introduction of these elements. We consider 1-edge elements (i.e., elements whose projections are 1-cochains) for two-dimensional problems, and we represent them by the degrees of freedom they associate with a triangle (Fig. 40). It is apparent that edge elements of this kind permit the reconstruction of 1-fields on the primary and secondary meshes. These elements are characterized, however, by a small number of degrees of freedom and, therefore, by a small number of terms in the approximating polynomials. This translates into a slow rate of convergence for the methods in which they are employed (Sun et al., 1995). To circumvent this problem, one would naturally increase the number of degrees of freedom associated with each edge element. Figure 41 shows the result, in terms of degrees of freedom, for a popular vector element derived following this idea. It is apparent that these elements mix internal and external orientations and associate multiple quantities with a single geometric object. Thus, reconstruction based on these kinds of elements violates two of the fundamental principles of the association of physical quantities with geometric objects. Since they do not comply with our definition of edge elements, we will call these kinds of elements anomalous edge elements. Anomalous edge elements show that not every non-nodal-based shape function is an edge element. There is no doubt, however, that the introduction of such elements was dictated by the necessity to overcome some real problems. Let us, therefore, try to bring these elements back within our definition of edge element.
To this end, we must first ensure that the number of geometric objects appearing in the element equals the number of instances of the physical quantity that we want to associate with it (i.e., the number of degrees of freedom of the element). We do this by adding to the original element a suitable number of geometric objects of the correct dimension and orientation. In this way we end up with an element with the
Figure 42. Anomalous edge elements can be brought back to conformity with the prescriptions deriving from the structural analysis of physical field theories, by introducing additional geometric objects.
same number of degrees of freedom as the original anomalous element. The difference is that each instance of the physical quantity is now associated with a distinct geometric object, and the kind of orientation is the same for all the cells intervening in the reconstruction. For example, in the case of Figure 41 the quantity under consideration is associated with internally oriented cells and, therefore, the new geometric objects are primary 1-cells; Figure 42 shows a possible result of the process just described. Figure 42 also shows that the support of an edge element can be composed of the assembly of several p-cells of the cell complex. This determines an interpolation domain which is the union of several n-cells of the corresponding cell complex (where n is the dimension of the domain). In fact, assembling the p-cells belonging to several neighboring n-cells of the cell complex is the easiest way to increase the number of terms in the resulting interpolating polynomial while preserving the local nature of the interpolation process. It appears, therefore, that using field function reconstruction to discretize the constitutive equations requires more than the primary and secondary meshes. It also requires an additional discretization structure for the geometry. This additional structure is composed of what we will call the elements. These are the domains on which separate approximation problems are solved to reconstruct the field function from the cochain coefficients. For example, for the reconstruction of an ordinary (or a twisted) p-field, we consider within each element the ordinary (or twisted) p-cells. We then use the physical quantities associated with these cells, and only these, to build the field function approximation which holds true within the element. We will call element mesh the structure determined by the elements (Fig. 43).
Note that an element mesh is required for each field to be reconstructed. Note also that each element of the mesh usually consists of the union of a small number of n-cells of the primary or secondary mesh, but this is not mandatory. One could in fact conceive of a separately defined element mesh for this purpose, which may itself be a cell complex
Figure 43. The recourse to field reconstruction for the discretization of the constitutive relations requires the definition of a geometric structure besides the primary and secondary meshes, namely, an element mesh for each field to be reconstructed. Here each object of the new mesh is obtained as the union of cells of the primary or secondary mesh, but more general geometric structures not based on the two preexisting meshes can be used as well.
or not (e.g., different elements could overlap), or, as in the case of meshless methods (Belytschko et al., 1996), no element mesh at all. Finally, we make a few scattered comments on edge elements and on the operation of reconstruction in general. First, we should mention that edge elements alone do not guarantee that a discretization complies with the results of the analysis of the structure of physical theories. Given a cochain, edge elements reconstruct a field which complies with that cochain, in the sense that the latter is a projection of the former. However, the cochain we start with may be nonphysical, for example because it does not satisfy the topological laws of our theory. In that case we cannot expect the reconstructed field to satisfy them. Hence, only a proper formulation of the field problem, together with the use of edge elements in the reconstruction of field functions, guarantees the physical soundness of the solution (Mur, 1994). Next, note that within the approach presented in this section, shape functions are not used to obtain a continuous field defined on the whole domain. They serve only as a step in the realization of the discretization strategies for constitutive relations. From this point of view, they are a tool used temporarily in one phase of the discretization process and discarded afterward. Of course, one must not be careless in using this tool. In particular, one must consider that discontinuities in the properties of materials usually produce corresponding discontinuities in some of the field functions. Therefore, to properly approximate the constitutive equations in elements containing material discontinuities, one sees here, for the first time, the necessity of taking these discontinuities into account. They must coincide with element boundaries, so that the reconstruction process can model the discontinuities of the field functions. We emphasize
“for the first time” because the discrete rendering of topological laws, as presented previously, is not disturbed by the presence of material discontinuities, since topological laws do not depend on material parameters. In fact, when one deals with global quantities only, the very concept of field function continuity is meaningless. Note also that the reconstruction serves only the discretization of the constitutive relations. Hence we do not ask the reconstructed fields to satisfy the topological laws (in local, differential form), since these are imposed only (in global form) at the cell-complex level.
B. Finite Difference Methods

We now compare existing methods with the reference discretization strategy just detailed, starting with the finite difference (FD) methods. We should mention in advance that the references cited in this presentation are typically not founding papers but survey works with extensive bibliographies. The classical FD approach to the discretization of field problems is based on the use of FD formulas to approximate locally the derivatives entering the expression of differential operators. A structured grid of points, usually a very regular one, is defined first, and a local field quantity is attached to each point. Then, for each of these points, the differential operators appearing in the problem's equations are given a discrete expression by means of the aforementioned FD formulas. This approach makes no reference to the association of physical quantities with geometric objects other than points. One can therefore hardly expect it to yield results consistent with those of the analysis developed thus far. This was indeed the case for the first attempts to give an FD formulation for electromagnetic problems, which resulted in methods having practically nothing in common with the reference method developed in the previous section. The first notable exception was the well-known finite difference time-domain method, the analysis of which offers some interesting results.

1. The Finite Difference Time-Domain Method

Consider an electromagnetic problem with constitutive equations $\mathbf{B} = \mu\mathbf{H}$ and $\mathbf{D} = \varepsilon\mathbf{E}$ and, for simplicity, no electric losses. To solve this problem, the finite difference time-domain (FDTD) method starts by defining within the space–time domain of the problem two dual orthogonal Cartesian grids. These subdivide the space domain into parallelepipeds of size $\Delta x$, $\Delta y$, and $\Delta z$. The time domain is subdivided into time steps of size $\Delta t$.
The primary and secondary grids are staggered by a half step in both space and time. To preserve the distinction emphasized thus far between primary and secondary
NUMERICAL METHODS FOR PHYSICAL FIELD PROBLEMS
247
Figure 44. The FDTD method makes use of two dual orthogonal grids staggered by a half step in space and in time. The variables appearing in the time-stepping formulas are the field components of E and H tangent to the grid edges and evaluated in the midpoint of each edge.
geometric objects, and the corresponding one between the related quantities, we will write $\Delta\tilde{x}$, $\Delta\tilde{y}$, $\Delta\tilde{z}$, and $\Delta\tilde{t}$ for the discretization steps of the secondary grid, even if in the FDTD method these quantities coincide numerically with the primary ones. The variables used within the method are the x, y, and z components of the electric and magnetic fields E and H, which are attached to the midpoints of the grid edges, and the local values of the material properties at the same locations. The nodes and the associated quantities are identified by integer or half-integer indexes; for example, $E^x_{(i,j+1/2,k+1/2),n+1/2}$ stands for $E^x\big(i\Delta x, (j+1/2)\Delta y, (k+1/2)\Delta z; (n+1/2)\Delta t\big)$ (Fig. 44). With this notation, the FDTD time-stepping formula for $H^x$ is

$$
\begin{aligned}
\Delta y\,\Delta z\,\mu_{(i+1/2,j,k)}\left(H^x_{(i+1/2,j,k),n+1}-H^x_{(i+1/2,j,k),n}\right)
={}& \Delta y\,\Delta t\left(E^y_{(i+1/2,j,k+1/2),n+1/2}-E^y_{(i+1/2,j,k-1/2),n+1/2}\right)\\
& -\Delta z\,\Delta t\left(E^z_{(i+1/2,j+1/2,k),n+1/2}-E^z_{(i+1/2,j-1/2,k),n+1/2}\right)
\end{aligned}
\tag{220}
$$

and the time-stepping formula for $E^x$ is

$$
\begin{aligned}
\Delta\tilde{y}\,\Delta\tilde{z}\,\varepsilon_{(i,j+1/2,k+1/2)}\left(E^x_{(i,j+1/2,k+1/2),n+1/2}-E^x_{(i,j+1/2,k+1/2),n-1/2}\right)
={}& -\Delta\tilde{y}\,\Delta\tilde{t}\left(H^y_{(i,j+1/2,k+1),n}-H^y_{(i,j+1/2,k),n}\right)\\
& +\Delta\tilde{z}\,\Delta\tilde{t}\left(H^z_{(i,j+1,k+1/2),n}-H^z_{(i,j,k+1/2),n}\right)
\end{aligned}
\tag{221}
$$
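To make the staggering of Eqs. (220) and (221) concrete, here is a minimal sketch of the same leapfrog reduced to one space dimension (a plane wave with one E component and one H component varying along x). The normalization ε = μ = 1 and Δx = 1, the perfectly conducting end walls, and the function names `fdtd_1d` and `discrete_energy` are illustrative assumptions, not part of the method as described above. As in the text, E lives at half-integer time levels and H at integer ones.

```python
import numpy as np

def fdtd_1d(E, H, r, steps):
    """Lossless 1D Yee leapfrog in normalized units (eps = mu = 1, dx = 1, r = dt).

    E: samples at integer nodes 0..N, time level n - 1/2; E[0] = E[-1] = 0 (PEC walls).
    H: samples at half-integer nodes 1/2..N-1/2, time level n.
    Returns E at level n + steps - 1/2 and H at level n + steps.
    """
    E = np.array(E, dtype=float)
    H = np.array(H, dtype=float)
    for _ in range(steps):
        # 1D analog of Eq. (221): E^{n+1/2} from E^{n-1/2} and differences of H^n
        E[1:-1] -= r * (H[1:] - H[:-1])
        # 1D analog of Eq. (220): H^{n+1} from H^n and differences of E^{n+1/2}
        H -= r * (E[1:] - E[:-1])
    return E, H

def discrete_energy(E, H, r):
    """|H^n|^2 + E^{n-1/2} . E^{n+1/2}: an invariant of the lossless leapfrog."""
    E_next = np.array(E, dtype=float)
    E_next[1:-1] -= r * (H[1:] - H[:-1])
    return float(H @ H + E @ E_next)

# Usage: a Gaussian pulse between two PEC walls, Courant number r = 0.5
N = 60
x = np.arange(N + 1, dtype=float)
E0 = np.exp(-((x - 30.0) / 5.0) ** 2)
E0[0] = E0[-1] = 0.0
H0 = np.zeros(N)
E1, H1 = fdtd_1d(E0, H0, r=0.5, steps=400)
print(discrete_energy(E0, H0, 0.5), discrete_energy(E1, H1, 0.5))
```

The quantity computed by `discrete_energy` is a standard invariant of the lossless leapfrog (it is not introduced in the text); it is conserved up to rounding error, and for Courant numbers r < 1 it is positive definite, which makes it a convenient sanity check of the staggered update.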
Analogous relations hold for the time stepping of the other components of E and H (Kunz and Luebbers, 1993; Taflove, 1995). Note that the method seemingly does not make use of global physical quantities, resorting instead
CLAUDIO MATTIUSSI
to nodal values of local vector quantities. We can, however, observe that each of the field components appearing in these formulas can be considered the ratio of the global quantity associated with a cell and the extension of the cell itself. Interpreting the local field components in this way, that is, as averaged field components with respect to spacelike and space–time 2-cells, and remembering that from the local constitutive equations it follows that $B^x_{(i+1/2,j,k),n} = \mu_{(i+1/2,j,k)} H^x_{(i+1/2,j,k),n}$ and $D^x_{(i,j+1/2,k+1/2),n+1/2} = \varepsilon_{(i,j+1/2,k+1/2)} E^x_{(i,j+1/2,k+1/2),n+1/2}$, we can write

$$\phi^{b_x}_{(i+1/2,j,k),n+1} = \Delta y\,\Delta z\,\mu_{(i+1/2,j,k)}\,H^x_{(i+1/2,j,k),n+1} \tag{222}$$

$$\phi^{e_y}_{(i+1/2,j,k\pm1/2),n+1/2} = \Delta y\,\Delta t\,E^y_{(i+1/2,j,k\pm1/2),n+1/2} \tag{223}$$

$$\phi^{e_z}_{(i+1/2,j\pm1/2,k),n+1/2} = \Delta z\,\Delta t\,E^z_{(i+1/2,j\pm1/2,k),n+1/2} \tag{224}$$

and

$$\psi^{d_x}_{(i,j+1/2,k+1/2),n+1/2} = \Delta\tilde{y}\,\Delta\tilde{z}\,\varepsilon_{(i,j+1/2,k+1/2)}\,E^x_{(i,j+1/2,k+1/2),n+1/2} \tag{225}$$

$$\psi^{h_y}_{(i,j+1/2,k+1),n} = \Delta\tilde{y}\,\Delta\tilde{t}\,H^y_{(i,j+1/2,k+1),n} \tag{226}$$

$$\psi^{h_z}_{(i,j+1,k+1/2),n} = \Delta\tilde{z}\,\Delta\tilde{t}\,H^z_{(i,j+1,k+1/2),n} \tag{227}$$
With these definitions the FDTD method can be described as working in terms of global quantities. In particular, the FDTD time-stepping formula for $H^x$ [Eq. (220)] becomes the following time-stepping formula for $\phi^{b_x}$:

$$
\begin{aligned}
\phi^{b_x}_{(i+1/2,j,k),n+1} ={}& \phi^{b_x}_{(i+1/2,j,k),n}
+ \phi^{e_y}_{(i+1/2,j,k+1/2),n+1/2} - \phi^{e_y}_{(i+1/2,j,k-1/2),n+1/2}\\
& - \phi^{e_z}_{(i+1/2,j+1/2,k),n+1/2} + \phi^{e_z}_{(i+1/2,j-1/2,k),n+1/2}
\end{aligned}
\tag{228}
$$

and the FDTD time-stepping formula for $E^x$ [Eq. (221)] becomes

$$
\begin{aligned}
\psi^{d_x}_{(i,j+1/2,k+1/2),n+1/2} ={}& \psi^{d_x}_{(i,j+1/2,k+1/2),n-1/2}
- \psi^{h_y}_{(i,j+1/2,k+1),n} + \psi^{h_y}_{(i,j+1/2,k),n}\\
& + \psi^{h_z}_{(i,j+1,k+1/2),n} - \psi^{h_z}_{(i,j,k+1/2),n}
\end{aligned}
\tag{229}
$$
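Eqs. (228) and (229) split the scheme into a purely topological part, which uses only the incidence numbers 0 and ±1, and constitutive links that carry all metric and material information. A minimal sketch of this split follows; it assumes a 1D reduction (one E and one H component, uniform steps, secondary steps numerically equal to primary ones, unit cell size in the transverse directions, perfectly conducting ends), and the function names `incidence_1d` and `step_global` are hypothetical.

```python
import numpy as np

def incidence_1d(N):
    """Integer incidence matrix of N primary edges on N+1 primary nodes (edge k: node k -> k+1)."""
    B = np.zeros((N, N + 1), dtype=int)
    k = np.arange(N)
    B[k, k] = -1
    B[k, k + 1] = 1
    return B

def step_global(psi_d, phi_b, B, eps, mu, dx, dt):
    """One leapfrog step in global quantities (1D, uniform grid, dual steps equal to primary ones).

    psi_d: electric fluxes on the secondary cells around the nodes (time n - 1/2); PEC ends stay 0.
    phi_b: magnetic fluxes on the primary cells between nodes (time n).
    """
    psi_h = dt * phi_b / (mu * dx)        # constitutive link phi^b -> psi^h (1D analog of Eq. (232))
    psi_d = psi_d.copy()
    psi_d[1:-1] += (B.T @ psi_h)[1:-1]    # topological step, incidence numbers only (cf. Eq. (229))
    phi_e = dt * psi_d / (eps * dx)       # constitutive link psi^d -> phi^e (1D analog of Eq. (233))
    phi_b = phi_b - B @ phi_e             # topological step, incidence numbers only (cf. Eq. (228))
    return psi_d, phi_b

B = incidence_1d(4)
psi_d0 = np.array([0.0, 1.0, 0.0, 0.0, 0.0])
phi_b0 = np.zeros(4)
psi_d1, phi_b1 = step_global(psi_d0, phi_b0, B, eps=1.0, mu=1.0, dx=1.0, dt=0.5)
# expected: psi_d1 unchanged (psi_h was zero) and phi_b1 == [-0.5, 0.5, 0.0, 0.0]
```

Only the two constitutive lines involve ε, μ, Δx, or Δt; the two topological lines would be unchanged on any grid with the same connectivity.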
Comparing Eqs. (228) and (229) with the time-stepping formulas of the reference discretization method [Eqs. (188) and (190)], we recognize that the former are a particular case of the latter. The signs of the $\phi^e$ and $\psi^h$ terms on the right sides of Eqs. (228) and (229) correspond to the incidence numbers appearing in the reference formulas. From Eq. (222) we see that the following relation holds:

$$H^x_{(i+1/2,j,k),n} = \frac{1}{\mu_{(i+1/2,j,k)}}\,\frac{1}{\Delta y\,\Delta z}\,\phi^{b_x}_{(i+1/2,j,k),n} \tag{230}$$
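Conversely, from the equation corresponding to Eq. (226) for the case of the x component, we find

$$H^x_{(i+1/2,j,k),n} = \frac{1}{\Delta\tilde{x}\,\Delta\tilde{t}}\,\psi^{h_x}_{(i+1/2,j,k),n} \tag{231}$$

Comparing Eqs. (230) and (231), we obtain

$$\phi^{b_x}_{(i+1/2,j,k),n} = \mu_{(i+1/2,j,k)}\,\frac{\Delta y\,\Delta z}{\Delta\tilde{x}\,\Delta\tilde{t}}\,\psi^{h_x}_{(i+1/2,j,k),n} \tag{232}$$

This is the constitutive equation linking $\phi^{b_x}$ to $\psi^{h_x}$. The same procedure, applied to Eq. (225) and to the x-component analog of Eq. (223), gives

$$\psi^{d_x}_{(i,j+1/2,k+1/2),n+1/2} = \varepsilon_{(i,j+1/2,k+1/2)}\,\frac{\Delta\tilde{y}\,\Delta\tilde{z}}{\Delta x\,\Delta t}\,\phi^{e_x}_{(i,j+1/2,k+1/2),n+1/2} \tag{233}$$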
Equations (232) and (233) are clearly discrete constitutive equations of the simplest type, like Eq. (205), obtained by extending the local constitutive equations B = μH and D = εE and exploiting the planarity, regularity, and orthogonality of cells in the meshes adopted by the FDTD method. This is even clearer when we write Eqs. (232) and (233) as follows:

$$\frac{\phi^{b_x}_{(i+1/2,j,k),n}}{\Delta y\,\Delta z} = \mu_{(i+1/2,j,k)}\,\frac{\psi^{h_x}_{(i+1/2,j,k),n}}{\Delta\tilde{x}\,\Delta\tilde{t}} \tag{234}$$

$$\frac{\psi^{d_x}_{(i,j+1/2,k+1/2),n+1/2}}{\Delta\tilde{y}\,\Delta\tilde{z}} = \varepsilon_{(i,j+1/2,k+1/2)}\,\frac{\phi^{e_x}_{(i,j+1/2,k+1/2),n+1/2}}{\Delta x\,\Delta t} \tag{235}$$
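Because Eqs. (234) and (235) only rescale one global quantity into its dual partner, they can be checked with a few lines of arithmetic. The sketch below evaluates Eqs. (232) and (233) and confirms that, for a field that is uniform over the cells, they return exactly $\Delta y\,\Delta z\,\mu H^x$ and $\Delta\tilde{y}\,\Delta\tilde{z}\,\varepsilon E^x$, that is, the fluxes defined in Eqs. (222) and (225). The function names and all sample numbers are made up for illustration.

```python
def phi_b_from_psi_h(psi_h, mu, dy, dz, dx_dual, dt_dual):
    """Eq. (232): phi^{b_x} = mu * (dy * dz) / (dx~ * dt~) * psi^{h_x}."""
    return mu * (dy * dz) / (dx_dual * dt_dual) * psi_h

def psi_d_from_phi_e(phi_e, eps, dy_dual, dz_dual, dx, dt):
    """Eq. (233): psi^{d_x} = eps * (dy~ * dz~) / (dx * dt) * phi^{e_x}."""
    return eps * (dy_dual * dz_dual) / (dx * dt) * phi_e

# Uniform local fields H^x = 2.0 and E^x = 3.0 (arbitrary illustrative values)
H, E = 2.0, 3.0
mu, eps = 1.5, 2.0
dx, dy, dz, dt = 0.3, 0.1, 0.2, 1e-3        # primary steps
dxd, dyd, dzd, dtd = 0.3, 0.1, 0.2, 1e-3    # secondary steps (numerically equal in FDTD)

phi_b = phi_b_from_psi_h(dxd * dtd * H, mu, dy, dz, dxd, dtd)   # psi^{h_x} = dx~ dt~ H^x
psi_d = psi_d_from_phi_e(dx * dt * E, eps, dyd, dzd, dx, dt)    # phi^{e_x} = dx dt E^x
print(phi_b, dy * dz * mu * H)     # both 0.06 up to rounding
print(psi_d, dyd * dzd * eps * E)  # both 0.12 up to rounding
```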
Therefore, we can affirm that the FDTD method implicitly uses discretization strategy 1 for the discretization of constitutive relations. Substituting these discrete constitutive equations, or their inverses, in the time-stepping formulas [Eqs. (228) and (229)], we obtain time-stepping formulas in terms of two global variables only. In particular, the formulas in terms of $\phi^b$ and $\psi^d$ are

$$
\begin{aligned}
\phi^{b_x}_{(i+1/2,j,k),n+1} ={}& \phi^{b_x}_{(i+1/2,j,k),n}
+ \frac{\Delta y\,\Delta t}{\Delta\tilde{z}\,\Delta\tilde{x}}\left[\frac{\psi^{d_y}_{(i+1/2,j,k+1/2),n+1/2}}{\varepsilon_{(i+1/2,j,k+1/2)}} - \frac{\psi^{d_y}_{(i+1/2,j,k-1/2),n+1/2}}{\varepsilon_{(i+1/2,j,k-1/2)}}\right]\\
& - \frac{\Delta z\,\Delta t}{\Delta\tilde{x}\,\Delta\tilde{y}}\left[\frac{\psi^{d_z}_{(i+1/2,j+1/2,k),n+1/2}}{\varepsilon_{(i+1/2,j+1/2,k)}} - \frac{\psi^{d_z}_{(i+1/2,j-1/2,k),n+1/2}}{\varepsilon_{(i+1/2,j-1/2,k)}}\right]
\end{aligned}
\tag{236}
$$
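and

$$
\begin{aligned}
\psi^{d_x}_{(i,j+1/2,k+1/2),n+1/2} ={}& \psi^{d_x}_{(i,j+1/2,k+1/2),n-1/2}
- \frac{\Delta\tilde{y}\,\Delta\tilde{t}}{\Delta z\,\Delta x}\left[\frac{\phi^{b_y}_{(i,j+1/2,k+1),n}}{\mu_{(i,j+1/2,k+1)}} - \frac{\phi^{b_y}_{(i,j+1/2,k),n}}{\mu_{(i,j+1/2,k)}}\right]\\
& + \frac{\Delta\tilde{z}\,\Delta\tilde{t}}{\Delta x\,\Delta y}\left[\frac{\phi^{b_z}_{(i,j+1,k+1/2),n}}{\mu_{(i,j+1,k+1/2)}} - \frac{\phi^{b_z}_{(i,j,k+1/2),n}}{\mu_{(i,j,k+1/2)}}\right]
\end{aligned}
\tag{237}
$$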
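Analogous formulas can be written for the other components. It is interesting to consider also the case of lossy materials, for which Eq. (221) becomes (Taflove, 1995)

$$
\begin{aligned}
\Delta\tilde{y}\,\Delta\tilde{z}\,\varepsilon_{(i,j+1/2,k+1/2)}\left(E^x_{(i,j+1/2,k+1/2),n+1/2}-E^x_{(i,j+1/2,k+1/2),n-1/2}\right)
={}& -\Delta\tilde{y}\,\Delta\tilde{t}\left(H^y_{(i,j+1/2,k+1),n}-H^y_{(i,j+1/2,k),n}\right)\\
& +\Delta\tilde{z}\,\Delta\tilde{t}\left(H^z_{(i,j+1,k+1/2),n}-H^z_{(i,j,k+1/2),n}\right)\\
& -\Delta\tilde{y}\,\Delta\tilde{z}\,\Delta\tilde{t}\,\sigma_{(i,j+1/2,k+1/2)}\,\frac{E^x_{(i,j+1/2,k+1/2),n+1/2}+E^x_{(i,j+1/2,k+1/2),n-1/2}}{2}
\end{aligned}
\tag{238}
$$

The new term with respect to Eq. (221) represents the charge that flows through the surface $\Delta\tilde{y}\,\Delta\tilde{z}$ during the time interval $\Delta\tilde{t}$ and can, therefore, be written as

$$\Delta\tilde{y}\,\Delta\tilde{z}\,\Delta\tilde{t}\,\sigma_{(i,j+1/2,k+1/2)}\,\frac{E^x_{(i,j+1/2,k+1/2),n+1/2}+E^x_{(i,j+1/2,k+1/2),n-1/2}}{2} = Q^{j_x}_{(i,j+1/2,k+1/2),n} \tag{239}$$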
Thus, with the new term, the lossless time-stepping formula [Eq. (229)] becomes

$$
\begin{aligned}
\psi^{d_x}_{(i,j+1/2,k+1/2),n+1/2} ={}& \psi^{d_x}_{(i,j+1/2,k+1/2),n-1/2}
- \psi^{h_y}_{(i,j+1/2,k+1),n} + \psi^{h_y}_{(i,j+1/2,k),n}\\
& + \psi^{h_z}_{(i,j+1,k+1/2),n} - \psi^{h_z}_{(i,j,k+1/2),n} - Q^{j_x}_{(i,j+1/2,k+1/2),n}
\end{aligned}
\tag{240}
$$
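and we have the new discrete constitutive equation

$$Q^{j_x}_{(i,j+1/2,k+1/2),n} = \sigma_{(i,j+1/2,k+1/2)}\,\frac{\Delta\tilde{y}\,\Delta\tilde{z}\,\Delta\tilde{t}}{\Delta x\,\Delta t}\,\frac{\phi^{e_x}_{(i,j+1/2,k+1/2),n+1/2}+\phi^{e_x}_{(i,j+1/2,k+1/2),n-1/2}}{2} \tag{241}$$

which can also be written as

$$\frac{Q^{j_x}_{(i,j+1/2,k+1/2),n}}{\Delta\tilde{y}\,\Delta\tilde{z}\,\Delta\tilde{t}} = \sigma_{(i,j+1/2,k+1/2)}\,\frac{\phi^{e_x}_{(i,j+1/2,k+1/2),n+1/2}+\phi^{e_x}_{(i,j+1/2,k+1/2),n-1/2}}{2\,\Delta x\,\Delta t} \tag{242}$$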
Note that, unlike Eqs. (232) and (233), which are one-to-one discrete constitutive equations, Eq. (241) links $Q^{j_x}$ to two distinct values of $\phi^{e_x}$, which correspond to two consecutive primary time intervals. This can be explained by considering that the space–time volume with which $Q^{j_x}$ is associated spans a secondary time interval $\Delta\tilde{t}$, which is only half covered by a primary time interval $\Delta t$. For this reason, it is desirable to involve at least two instances of $\phi^{e_x}$ in the determination of a single instance of $Q^{j_x}$. To obtain an actual time-stepping formula from Eq. (240), the last term must be expressed in terms of electric fluxes only. To this end, we invert the discrete constitutive relation [Eq. (233)], obtaining

$$\phi^{e_x}_{(i,j+1/2,k+1/2),n+1/2} = \frac{\Delta x\,\Delta t}{\Delta\tilde{y}\,\Delta\tilde{z}}\,\frac{1}{\varepsilon_{(i,j+1/2,k+1/2)}}\,\psi^{d_x}_{(i,j+1/2,k+1/2),n+1/2} \tag{243}$$
so that $Q^{j_x}$ in the time-stepping formula [Eq. (240)] can be expressed as

$$Q^{j_x}_{(i,j+1/2,k+1/2),n} = \Delta\tilde{t}\,\frac{\sigma_{(i,j+1/2,k+1/2)}}{\varepsilon_{(i,j+1/2,k+1/2)}}\,\frac{\psi^{d_x}_{(i,j+1/2,k+1/2),n+1/2}+\psi^{d_x}_{(i,j+1/2,k+1/2),n-1/2}}{2} \tag{244}$$
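Since $Q^{j_x}$ in Eq. (244) contains the unknown $\psi^{d_x}_{n+1/2}$, the lossy step looks implicit but can be solved cell by cell. Assuming, as in Ampère's law with a conduction current, that the charge term is subtracted from the topological sum, collecting terms gives $(1+a)\,\psi_{n+1/2} = (1-a)\,\psi_{n-1/2} + C$, with $a = \sigma\Delta\tilde{t}/(2\varepsilon)$ and $C$ the sum of the $\psi^h$ terms. A minimal sketch, with hypothetical function names and made-up numbers:

```python
def lossy_psi_d_update(psi_prev, curl_h, sigma, eps, dt_dual):
    """Solve psi_new = psi_prev + curl_h - Q for psi_new, with Q given by Eq. (244).

    curl_h: the topological sum of the psi^h terms (right side of Eq. (229) minus psi_prev).
    """
    a = sigma * dt_dual / (2.0 * eps)
    return ((1.0 - a) * psi_prev + curl_h) / (1.0 + a)

def conduction_charge(psi_new, psi_prev, sigma, eps, dt_dual):
    """Eq. (244): Q = dt~ * (sigma / eps) * (psi_new + psi_prev) / 2."""
    return dt_dual * sigma / eps * (psi_new + psi_prev) / 2.0

psi_prev, curl_h = 1.0, 0.3
sigma, eps, dtd = 2.0, 4.0, 0.5
psi_new = lossy_psi_d_update(psi_prev, curl_h, sigma, eps, dtd)
# The solved value satisfies the time-stepping formula with the charge term exactly
residual = psi_new - (psi_prev + curl_h - conduction_charge(psi_new, psi_prev, sigma, eps, dtd))

# With no curl term the flux decays geometrically by (1 - a) / (1 + a) per step
psi = 1.0
for _ in range(10):
    psi = lossy_psi_d_update(psi, 0.0, sigma, eps, dtd)
```

The factor (1 − a)/(1 + a) has magnitude below 1 for every a > 0, so the time-averaged conduction term is dissipative for any time step; it is the familiar update coefficient of lossy FDTD.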
Note how Eq. (244) involves two constitutive links: one going from primary to secondary quantities, and one going in the opposite direction (Fig. 37). In summary, with the interpretation of physical quantities suggested in Eqs. (222) through (224), (225) through (227), and (239), the FDTD method appears to adopt a discretization strategy fully consistent with the prescriptions of the analysis of the structure of physical field theories, both in the discretization of the geometry and in the association of global physical quantities with
space–time geometric objects. The FDTD method thus appears as a particular instance of the reference discretization method presented previously. The time marching is performed by means of the truly topological time-stepping formulas [Eqs. (228) and (240)], supplemented by the discrete constitutive relations [Eqs. (232), (233), and (241)]. Note that to determine the discrete constitutive equations, the constitutive relations are implicitly subjected to the simplest of the three kinds of discretization strategies considered previously. One should therefore expect that, unless this consequence of the original, intuitive approach that led to the FDTD method is properly recognized beforehand, efforts to extend the method to higher orders in space and time, or to nonorthogonal and unstructured meshes, are bound to produce mediocre results or to meet with severe difficulties. [For a more detailed analysis and a four-dimensional interpretation of the physical quantities and topological time stepping within the FDTD method see Mattiussi (2001).]

2. The Support Operator Method

The support operator method (SOM; Hyman and Shashkov, 1997, 1999; Shashkov, 1996; Shashkov and Steinberg, 1995) is an FD technique for deriving discrete approximations to differential operators that preserve some properties of the original continuous mathematical model in which the operators appear. In particular, the focus is on the simultaneous preservation of an integral identity used in writing a topological law in continuous terms and of an adjointness relation between pairs of topological statements that face each other in the factorization diagram of the corresponding physical theory. Given this emphasis on integral relations, it is instructive to compare this approach with that of the reference discretization strategy.
The discretization of geometry adopted by the SOM is typical of FD methods in that a sufficiently well-behaved set of nodes is considered within the domain in space. For example, in two dimensions it is assumed that by properly joining these nodes one can construct at least a logically rectangular grid; that is, one that is homeomorphic to an actual rectangular grid (Shashkov, 1996). This implies that the resulting grid is, in fact, a cell complex, since it derives from the topological distortion of a subdivision of a domain in simple rectangular cells. Therefore, in addition to the 0-cells constituted by the original set of nodes, sets of p-cells with p up to the dimension of the domain are implicitly defined. This constitutes the primary mesh. The SOM does not make explicit use of a secondary mesh. The variables used by the method are the field components perpendicular or tangential to the cells, and associated with the centers of the cells (which, of course, in the case of the nodes are the nodes themselves). As in the case of the FDTD method, this opens the door to their interpretation as representatives of
global physical quantities. In fact, apart from nodal quantities, these components appear in the method's formulas multiplied by the extension of the corresponding cell. Therefore, if we think of these variables as averaged values of the field over the cell, their products correspond to global quantities. Up to this point, we have merely described some general premises of FD discretization. It is in the discretization of the field equations that the SOM differentiates itself from a classical FD approach. Instead of discretizing separately the differential operators appearing in the field equations, the SOM selects one of the differential operators to play a privileged role in the discretization. This is called the prime operator. The prime operator is discretized with an FD approach, that is, by substituting its derivatives with FD approximations. However, during this process one must try to preserve as much as possible the integral properties of the prime operator. For example, if the prime operator is a divergence, a discrete version of Gauss's divergence theorem must hold for the discretized operator, which in this case is designated DIV. The other differential operators that appear in the field problem are then considered for discretization. These are related to the prime operator by some integral relation. We know from our previous analysis that these relations amount substantially to the continuous counterparts of two properties of the coboundary operator: the fact that δδ = 0 (from which properties such as dd = 0, curl grad = 0, and div curl = 0 are derived), and the adjointness (with respect to a suitable bilinear form, which puts the corresponding cochain spaces in duality) of the coboundary acting on ordinary p-cochains to produce ordinary (p + 1)-cochains on the primary mesh, with the coboundary acting on twisted (n − (p + 1))-cochains to give twisted (n − p)-cochains on the dual secondary mesh.
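On a logically rectangular grid the property δδ = 0 just mentioned becomes an exact integer identity between incidence matrices, with no approximation involved. The sketch below builds, for a small 2D grid, the node-to-edge matrix G (a discrete gradient) and the edge-to-face matrix C (a discrete curl) of the primary mesh and checks C G = 0, the discrete form of curl grad = 0. The transposes act, up to sign, as the coboundaries of the dual mesh, so the transposed identity holds there automatically. The helper name `grid_incidence`, the node numbering, and the counterclockwise face orientation are assumptions for illustration.

```python
import numpy as np

def grid_incidence(nx, ny):
    """Node->edge (G) and edge->face (C) incidence matrices of an nx-by-ny grid of cells."""
    def node(i, j):
        return i * (ny + 1) + j

    edges = [(node(i, j), node(i + 1, j)) for i in range(nx) for j in range(ny + 1)]   # x-directed
    edges += [(node(i, j), node(i, j + 1)) for i in range(nx + 1) for j in range(ny)]  # y-directed
    index = {e: k for k, e in enumerate(edges)}

    G = np.zeros((len(edges), (nx + 1) * (ny + 1)), dtype=int)
    for k, (a, b) in enumerate(edges):
        G[k, a], G[k, b] = -1, 1                 # edge oriented from node a to node b

    C = np.zeros((nx * ny, len(edges)), dtype=int)
    for i in range(nx):
        for j in range(ny):
            f = i * ny + j                       # face oriented counterclockwise
            C[f, index[(node(i, j), node(i + 1, j))]] = 1           # bottom edge
            C[f, index[(node(i + 1, j), node(i + 1, j + 1))]] = 1   # right edge
            C[f, index[(node(i, j + 1), node(i + 1, j + 1))]] = -1  # top edge, traversed backward
            C[f, index[(node(i, j), node(i, j + 1))]] = -1          # left edge, traversed backward
    return G, C

G, C = grid_incidence(3, 2)
print(G.shape, C.shape)          # (17, 12) (6, 17)
print(np.abs(C @ G).max())       # 0: "curl grad = 0" holds exactly
print(np.abs(G.T @ C.T).max())   # 0: the same identity on the dual mesh
```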
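For example, if the prime operator is a divergence, the corresponding integral relation is

$$\int_V \varphi\,\operatorname{div}\mathbf{A}\,dV + \int_V \mathbf{A}\cdot\operatorname{grad}\varphi\,dV = \oint_{\partial V} \varphi\,(\mathbf{A}\cdot\mathbf{n})\,dS \tag{245}$$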
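A one-dimensional sketch may clarify how the SOM turns Eq. (245) into a definition. Taking DIV (nodes to cells) as the prime operator, the derived gradient is chosen as the negative adjoint of DIV with respect to weighted inner products, so that a discrete analog of Eq. (245), with the boundary term reduced to the end values, holds identically. The inner-product weights (cell lengths on cells, half-sums of adjacent cell lengths at interior nodes), the simplified boundary treatment, and the name `som_div_and_grad` are assumptions; this is not the general SOM construction on logically rectangular grids.

```python
import numpy as np

def som_div_and_grad(x):
    """Prime DIV (nodes -> cells) and derived GRAD (cells -> interior nodes) on the 1D grid x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.diff(x)                                        # cell lengths
    DIV = (np.eye(n - 1, n, k=1) - np.eye(n - 1, n)) / h[:, None]
    Mc = np.diag(h)                                       # inner-product weights on cells
    Mn = np.diag(0.5 * (h[:-1] + h[1:]))                  # weights on interior nodes (assumed)
    # Derived operator: (A, GRAD phi)_nodes = -(DIV A, phi)_cells for A vanishing at the ends
    GRAD = -np.linalg.solve(Mn, DIV[:, 1:-1].T @ Mc)
    return DIV, GRAD, Mc, Mn

x = np.array([0.0, 0.1, 0.25, 0.3, 0.55, 0.7, 0.9, 1.0])   # nonuniform grid
DIV, GRAD, Mc, Mn = som_div_and_grad(x)
rng = np.random.default_rng(0)
A = rng.standard_normal(x.size)          # nodal "vector" field
phi = rng.standard_normal(x.size - 1)    # cell-centered scalar
# Discrete analog of Eq. (245): the two volume terms add up to the boundary term
lhs = phi @ Mc @ DIV @ A + A[1:-1] @ Mn @ GRAD @ phi
rhs = phi[-1] * A[-1] - phi[0] * A[0]
```

The derived operator turns out to be the centered difference over a dual cell, $(\varphi_{j+1/2} - \varphi_{j-1/2})/\hat{h}_j$, obtained here from the adjointness requirement alone rather than from a separate FD approximation.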
In this case, that is, when the divergence is the prime operator, the SOM discretizes the gradient by putting the discrete spaces of the variables in duality by means of a suitable inner product and enforcing a discrete counterpart of Eq. (245). The resulting discrete operator is marked in some way to signify that it is a derived operator instead of a prime operator. For example, if the divergence is adopted as the prime operator, and Eq. (245) is used to obtain a discrete counterpart of the gradient, the resulting discrete operator is denoted by $\overline{\mathrm{GRAD}}$. We have not yet spoken of the constitutive links. The SOM does not discretize these terms separately but instead includes this task in the discretization of some differential operator appearing in the field equations. Therefore, the SOM produces a discretized compound operator in place of a separate discrete constitutive operator. For example, the operators ${}_{\varepsilon}\mathrm{curl}_{\mu} \stackrel{\mathrm{def}}{=} \frac{1}{\varepsilon}\,\mathrm{curl}\,\frac{1}{\mu}$ and $\mathrm{div}_{\varepsilon} \stackrel{\mathrm{def}}{=} \mathrm{div}\,\varepsilon$ are defined, and they are discretized with the same procedure adopted for the purely differential operators. The only difference is that a bilinear form, which includes the constitutive links, is used to put the spaces of discrete variables in duality prior to discretization. Hence, three types of discrete operators result: a first class of operators determined from prime differential operators, denoted by GRAD, CURL, and DIV; a second class of operators determined as derived discrete operators, denoted by $\overline{\mathrm{GRAD}}$, $\overline{\mathrm{CURL}}$, and $\overline{\mathrm{DIV}}$; and a third class of operators determined as derived operators from compound differential operators, denoted, for example, by ${}_{\varepsilon}\overline{\mathrm{DIV}}$ and ${}_{\varepsilon}\overline{\mathrm{CURL}}_{\mu}$. Finally, the time derivatives that remain in the semidiscretized model once the differential operators have been replaced by their discrete counterparts are discretized with the traditional approaches, that is, by approximating the time derivative with an FD formula. In particular, the standard leapfrog method is suggested for this task. Examining the workings of the SOM, we can see that the method recognizes the need to take into consideration a number of structural properties of the field problem. However, this is done when the problem itself has already been modeled in continuous terms (i.e., the branch on the right has been selected in Fig. 1). Therefore, properties such as quantity conservation and the adjointness of operators, which can be easily expressed and automatically enforced in discrete terms by adopting the discrete mathematical model of the reference strategy, are instead considered first at the continuous level and only subsequently enforced in a discrete setting. For this reason, the SOM appears to take a long detour to enforce properties that are automatically satisfied by using the approach based on the structure of physical theories.
Moreover, the coboundary operator, which is the unique discrete operator for the representation of topological laws, and the related topological time-stepping process are not considered, and separate discretizations in space and in time are performed. In this sense, the SOM illustrates the difficulty of the task, and the possible pitfalls, implied by the search for a structurally sound discretization strategy that takes the continuous mathematical model as its starting point.

3. Beyond the FDTD Method

The preceding analysis shows that the discretization strategy adopted by the FDTD method is a physically sound one. Moreover, the method is easy to understand and to implement, at least for simple materials. All this makes FDTD a very successful method. This success has naturally led to many efforts focused on removing its limitations. These lie mainly in the limited flexibility of its orthogonal grids in modeling complex geometries, and in the low order of accuracy of the method, both in space and in time. Consequently
FDTD extensions to nonorthogonal or unstructured meshes, and to formulas having higher accuracy in space and time, have been presented (Taflove, 1998). Given the analysis presented in this work, and equipped with the conceptual tool represented by the reference discretization strategy, we know in what direction to look for an extension of FDTD capable of preserving its favorable qualities. The rationale can be stated as follows: “Keep two space–time cell complexes as meshes, keep the topological time stepping, and improve the discretization of the constitutive relations.” Unfortunately, the classical FD approach says instead: “Use an expression of differential operators in generic curvilinear coordinates and increase the accuracy of the approximation of derivatives appearing in the partial differential equation.” However, this does not lend itself to an easy generalization beyond logically rectangular grids. Moreover, since the derivatives appear in the equations as a consequence of the local representation of the topological operators, a brute-force approach to increasing the accuracy of their approximation, whether they are expressed in Cartesian or in curvilinear coordinates, is likely to lead to time-stepping formulas that cannot be considered as derived from a coboundary operator. We will, therefore, not analyze the extensions of FDTD based on local viewpoints, such as the classical FD approach, or strategies based on ideas borrowed from differential geometry, which make use, for example, of covariant and contravariant local basis vectors to express the differential operators on nonorthogonal grids (Taflove, 1998). We will instead look directly at methods that preserve the focus on the discrete nature of topological laws, namely, finite volume methods.
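A time-stepping formula derived from a coboundary operator has a simple practical signature even on unstructured meshes: the update uses only integer incidence numbers, so the total flux out of every closed surface made of 2-cells is left unchanged by construction. The sketch below illustrates this on a single tetrahedron. The mesh, the orientation conventions, the random electric fluxes, and the helper `face_edge_incidence` are assumptions for illustration, and the minus sign in the update reflects one choice in the usual orientation dilemma for the electric term.

```python
import numpy as np

def face_edge_incidence(faces, edges):
    """Incidence numbers [face, edge]: +1 or -1 if the face boundary runs along or against the edge."""
    index = {e: k for k, e in enumerate(edges)}
    F = np.zeros((len(faces), len(edges)), dtype=int)
    for f, (a, b, c) in enumerate(faces):
        for u, v in ((a, b), (b, c), (c, a)):
            if (u, v) in index:
                F[f, index[(u, v)]] = 1
            else:
                F[f, index[(v, u)]] = -1
    return F

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]   # oriented from lower to higher vertex
faces = [(1, 2, 3), (0, 3, 2), (0, 1, 3), (0, 2, 1)]       # oriented boundary of the simplex (0, 1, 2, 3)
F = face_edge_incidence(faces, edges)
D = np.ones(len(faces), dtype=int)                         # volume-face incidence: all faces +1

rng = np.random.default_rng(1)
phi_b = rng.standard_normal(len(faces))                   # magnetic fluxes on the four faces
total_before = float(D @ phi_b)
for _ in range(100):
    phi_e = rng.standard_normal(len(edges))               # arbitrary space-time electric fluxes
    phi_b = phi_b - F @ phi_e                             # topological Faraday step, cf. Eq. (250)
total_after = float(D @ phi_b)                            # equals total_before up to rounding
```

Because D F = 0 holds exactly (each edge is shared by two faces with opposite incidence numbers), the net flux is preserved whatever values the electric fluxes take; a brute-force higher-order stencil offers no such guarantee.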
C. Finite Volume Methods

In general, we can say that a numerical method is a finite volume (FV) method if, to discretize the field equations of a problem, it subdivides the problem domain into cells and writes the field equations in integral form on these cells. If we accept this definition, we see at once that the FV approach is very similar to that advocated by the reference method. However, the adoption of an integral approach alone does not ensure that all the requirements of the physical approach to the discretization are recognized and implemented. For example, one could write an integral statement in which topological and constitutive links are mixed, thus missing a fundamental distinction. Moreover, usually the discretization produced by the integral statements of the FV method does not include the time variable, which is instead subjected to a separate discretization. This opens the door to the possibility of time-stepping formulas that cannot be derived in a natural way from a space–time coboundary operator. From this point of view, the first FV method for time-dependent electromagnetic
problems that we will examine will turn out to be well behaved, in that it can be interpreted as a particular case of the reference discretization strategy.

1. The Discrete Surface Integral Method

The discrete surface integral (DSI) method was suggested by Madsen (1995) for the solution of time-dependent electromagnetic problems on domains discretized by using unstructured grids. The method was first presented for the case of lossless, linear, isotropic materials, but it can easily be extended to lossy materials (Taflove, 1998) and to more complex electric and magnetic constitutive relations. For the discretization of the domain in space the method requires two dual meshes constructed exactly like those of the reference strategy (Fig. 33), except that the DSI method makes no mention of the distinction between external and internal orientations. To simplify things with respect to a generic cell complex, we will assume that all 1-cells are straight lines and all 2-cells are planar. At first sight, the variables used as unknowns by the method are not global quantities but field quantities associated with the edges or the faces of the two grids. However, we proceed to show that in this case, too, the field quantities are used in such a way that global quantities are actually intended. Let us start with the quantity associated with the primary 1-cells $\tau_1^k$. This quantity is defined by the DSI method to be the projection $\mathbf{E}\cdot\mathbf{s}_k$ of the electric field intensity vector E, assumed to be constant along the cell, onto the primary 1-cell $\tau_1^k$ represented as a vector $\mathbf{s}_k$. This means that a global value is actually associated with each primary 1-cell. In fact, if E is assumed constant also during a time interval $\Delta t$, we can directly consider the global space–time quantity $\phi^e_k = \mathbf{E}\cdot\mathbf{s}_k\,\Delta t$, associated with a space–time primary 2-cell, since this is the form in which E always appears in the DSI formulas.
Correspondingly, the quantity associated with the secondary 1-cells $\tilde{\tau}_1^k$ is the projection $\mathbf{H}\cdot\tilde{\mathbf{s}}_k$ of the magnetic field intensity vector H, assumed to be constant along the cell, onto the secondary 1-cells $\tilde{\tau}_1^k$ represented as vectors $\tilde{\mathbf{s}}_k$. This association can also be extended to consider the space–time global quantity $\psi^h_k = \mathbf{H}\cdot\tilde{\mathbf{s}}_k\,\Delta\tilde{t}$. A full magnetic flux density vector B is associated with each primary 2-cell, and a full electric flux density vector D is associated with each secondary 2-cell. Both vectors are assumed to be constant over the corresponding cell. Even if these quantities are given as full-vector quantities, they appear in the DSI discretized Maxwell's equations only as $\mathbf{B}\cdot\mathbf{N}_i$ and $\mathbf{D}\cdot\tilde{\mathbf{N}}_i$, where $\mathbf{N}_i$ and $\tilde{\mathbf{N}}_i$ are the so-called area-normal vectors, defined so that the two scalar products are actually the magnetic flux $\phi^b_i = \mathbf{B}\cdot\mathbf{N}_i$ and the electric flux $\psi^d_i = \mathbf{D}\cdot\tilde{\mathbf{N}}_i$ associated with the corresponding cells. In this way we have interpreted all DSI variables as global quantities.
We can now consider the DSI discretization of Faraday's law and Maxwell–Ampère's law in light of this interpretation of the variables. Faraday's law at time $t_{n+1/2} = (n+1/2)\Delta t$ is written for a primary 2-cell $\tau_2^i$, represented by the vector $\mathbf{N}_i$, as

$$\frac{d\mathbf{B}^{n+1/2}}{dt}\cdot\mathbf{N}_i = -\oint_{\partial\tau_2^i}\mathbf{E}^{n+1/2}\cdot d\mathbf{l} \tag{246}$$

To discretize the time derivative, the DSI method adopts a time-centered leapfrog algorithm, which sets

$$\frac{d\mathbf{B}^{n+1/2}}{dt} = \frac{\mathbf{B}^{n+1}-\mathbf{B}^{n}}{\Delta t} \tag{247}$$

Substituting Eq. (247) in Eq. (246), we obtain the DSI time-stepping formula for the magnetic flux:

$$\mathbf{B}^{n+1}\cdot\mathbf{N}_i = \mathbf{B}^{n}\cdot\mathbf{N}_i - \Delta t\oint_{\partial\tau_2^i}\mathbf{E}^{n+1/2}\cdot d\mathbf{l} \tag{248}$$

Remembering the expression of the boundary in terms of incidence numbers, we find that Eq. (248) becomes

$$\mathbf{B}^{n+1}\cdot\mathbf{N}_i = \mathbf{B}^{n}\cdot\mathbf{N}_i \pm \sum_k \left[\tau_2^i, \tau_1^k\right]\mathbf{E}^{n+1/2}\cdot\mathbf{s}_k\,\Delta t \tag{249}$$

(with the uncertainty on the sign of the last term due to the usual dilemma regarding the relative default orientation of primary 1-cells with respect to the default positive direction of E). With the aforementioned interpretation of the variables as global quantities, Eq. (249) becomes

$$\phi^b_{i,n+1} = \phi^b_{i,n} \pm \sum_k \left[\tau_2^i, \tau_1^k\right]\phi^e_{k,n+1/2} \tag{250}$$

which is exactly the topological time-stepping formula for $\phi^b$ of the reference discretization method [Eq. (188)]. The same procedure can be applied to show that the DSI time-stepping formula for D, which is

$$\mathbf{D}^{n+1/2}\cdot\tilde{\mathbf{N}}_i = \mathbf{D}^{n-1/2}\cdot\tilde{\mathbf{N}}_i + \Delta t\oint_{\partial\tilde{\tau}_2^i}\mathbf{H}^{n}\cdot d\mathbf{l} \tag{251}$$
reduces to the reference topological time-stepping formula for $\psi^d$ [Eq. (190)]. It remains now to examine how the DSI method proceeds to the discretization of the constitutive relations. We assume as given the local constitutive equation B = μH, and we consider how it is used to determine a discrete link going from $\phi^b$ to $\psi^h$. The DSI method adopts a reconstruction–projection method that is a particular case of discretization strategy 2 described previously for the discretization of the constitutive relations. The reconstruction is performed
Figure 45. The discrete surface integral method adopts a reconstruction–projection strategy for the discretization of the constitutive equations, which is based on the boundary cells of pairs of adjacent 3-cells.
with the following procedure. Consider two adjacent primary 3-cells, $\tau_3^1$ and $\tau_3^2$ (Fig. 45). They have in common a primary 2-cell $\tau_2^c$. The boundary of $\tau_2^c$ is composed of primary 1-cells whose boundaries constitute a collection of 0-cells $\tau_0^m$. The DSI method associates with each of these 0-cells two magnetic flux density vectors $\mathbf{B}_{1,m}$ and $\mathbf{B}_{2,m}$, one for each of the two 3-cells, $\tau_3^1$ and $\tau_3^2$. Each of these vectors is derived from a system of equations requiring the fluxes calculated by integrating the vectors over three 2-cells (over which they are assumed to be constant) to equal the fluxes associated with these same cells as DSI variables. In more detail, calling $\tau_2^j$ and $\tau_2^k$ the other cells (besides the common cell $\tau_2^c$) belonging to the boundary of $\tau_3^1$ that meet in the node $\tau_0^m$; calling $\mathbf{N}_c$, $\mathbf{N}_j$, $\mathbf{N}_k$ the corresponding area-normal vectors; and calling $\phi^b_c$, $\phi^b_j$, $\phi^b_k$ the variables associated with these 2-cells, the DSI method sets

$$
\begin{cases}
\phi^b_c = \mathbf{B}_{1,m}\cdot\mathbf{N}_c\\
\phi^b_j = \mathbf{B}_{1,m}\cdot\mathbf{N}_j\\
\phi^b_k = \mathbf{B}_{1,m}\cdot\mathbf{N}_k
\end{cases}
\tag{252}
$$
and determines $\mathbf{B}_{1,m}$ in terms of $\phi^b_c$, $\phi^b_j$, and $\phi^b_k$. The same process is repeated for the adjacent 3-cell $\tau_3^2$ to determine $\mathbf{B}_{2,m}$, and for all the nodes $\tau_0^m$ belonging to the common 2-cell $\tau_2^c$. The information constituted by all the $\mathbf{B}_{1,m}$ and $\mathbf{B}_{2,m}$ thus determined is then merged by using a weighting formula to produce a single vector B. The seminal paper on DSI (Madsen, 1995) examines three weighting formulas for this task. The vector B is then assumed to be the reconstructed, constant field within the two adjacent 3-cells, $\tau_3^1$ and $\tau_3^2$. It depends on the values of the variable
$\phi^b_c$ associated with the common 2-cell $\tau_2^c$ and on the values $\phi^b_i$ associated with all the 2-cells belonging to the boundary of $\tau_3^1$ and $\tau_3^2$ that touch $\tau_2^c$. The inverse local constitutive equation $\mathbf{H} = \frac{1}{\mu}\mathbf{B}$ is then applied to the reconstructed field. The resulting field H must at this point be projected on the secondary 1-cell $\tilde{\tau}_1^c$, dual to $\tau_2^c$, to determine $\mathbf{H}\cdot\tilde{\mathbf{s}}_c$. This is done by using the formula

$$\mathbf{H}\cdot\tilde{\mathbf{s}}_c = \frac{\mathbf{B}\cdot\tilde{\mathbf{s}}_c}{\bar{\mu}_c} \tag{253}$$

where $\bar{\mu}_c$ is obtained by averaging μ along $\tilde{\tau}_1^c$. Finally, the global space–time value $\psi^h_c = \mathbf{H}\cdot\tilde{\mathbf{s}}_c\,\Delta\tilde{t}$ is determined by multiplying the result of Eq. (253) by the time step $\Delta\tilde{t}$. This whole process produces a discrete constitutive link that relates $\psi^h_c$ to the values $\phi^b_i$ on which the reconstructed field B depends. A similar procedure can be applied to discretize the constitutive equation D = εE, yielding a relation linking the global space–time quantity $\phi^e_c$ associated with each primary space–time 2-cell to the values $\psi^d_k$ associated with a small number of neighboring secondary 2-cells. Note that for boundary 2-cells the DSI reconstruction procedure is based on a single 3-cell.

In summary, this analysis shows that the DSI method is, like the FDTD method, fully compliant with the prescriptions of the reference discretization strategy. It adopts a topological time-stepping formula for the global electromagnetic variables, although this remains hidden because of the use of local field quantities in the original description of the method. Compared with the FDTD method, the DSI method defines the quantities involved in more general terms, so that its time-stepping formulas apply to generic unstructured grids (provided only that they are two dual cell complexes), on which they preserve their topological nature. Moreover, the DSI method adopts a more sophisticated approach to the discretization of constitutive equations than the FDTD method, since the DSI approach is based on a more complex reconstruction–projection strategy. All these properties make the DSI method a generalization of the FDTD method that, from the point of view of the structure of physical field theories, preserves the favorable characteristics of that method. Considering its reconstruction–projection strategy in detail from the same point of view, however, the DSI method appears to be far from optimal.
The reconstruction strategy is actually focused on the determination of nodal field quantities and does not make use of edge elements. It is, therefore, likely that experiments with different reconstruction–projection operators more intimately related to the cochain concept would lead to further improvements of the method, not only in terms of compliance with the structure of electromagnetism but also in terms of accuracy.
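The reconstruction–projection link of Eqs. (252) and (253) can be sketched in a few lines for a single pair of adjacent 3-cells and a single node. The sketch assumes a plain average as the weighting formula (none of the three formulas examined by Madsen (1995) is reproduced here), equally spaced samples of μ along the dual edge for computing $\bar{\mu}_c$, and made-up geometry; all function names are hypothetical. For a field that is actually uniform, the 3 × 3 systems of Eq. (252) recover it exactly, which serves as a check.

```python
import numpy as np

def nodal_B(normals, fluxes):
    """Eq. (252): solve B . N_c = phi_c, B . N_j = phi_j, B . N_k = phi_k for B."""
    return np.linalg.solve(np.asarray(normals, dtype=float), np.asarray(fluxes, dtype=float))

def reconstruct_B(systems):
    """Merge the nodal vectors B_{1,m}, B_{2,m}, ...; a plain average is assumed as the weighting."""
    return np.mean([nodal_B(N, phi) for N, phi in systems], axis=0)

def psi_h_on_dual_edge(B, s_dual, mu_samples, dt_dual):
    """Eq. (253) followed by multiplication by dt~: psi^h_c = (B . s~_c / mu_bar_c) * dt~."""
    mu_bar = np.mean(mu_samples)   # average of mu along the dual edge (equally spaced samples)
    return float(B @ np.asarray(s_dual, dtype=float)) / mu_bar * dt_dual

B_true = np.array([1.0, -2.0, 0.5])                                       # uniform test field
N_cell1 = np.array([[2.0, 0.0, 0.0], [0.5, 1.5, 0.0], [0.3, 0.2, 1.0]])   # rows: N_c, N_j, N_k
N_cell2 = np.array([[2.0, 0.0, 0.0], [0.0, 1.0, 0.4], [-0.2, 0.3, 1.2]])  # shares N_c (common face)
systems = [(N_cell1, N_cell1 @ B_true), (N_cell2, N_cell2 @ B_true)]

B_rec = reconstruct_B(systems)                                              # B_true up to rounding
psi_h = psi_h_on_dual_edge(B_rec, [0.0, 0.0, 0.2], [2.0, 2.0, 2.0], 0.01)   # 0.5 * 0.2 / 2 * 0.01
```

Note that the reconstruction produces nodal vectors, in line with the remark above that the DSI strategy does not make use of edge elements.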
2. The Finite Integration Theory Method

The finite integration theory (FIT) method is an FV method for time-dependent electromagnetic problems that was developed independently of the FDTD method (Weiland, 1984, 1996). It is interesting to examine this method for two reasons. First, it has undergone a series of improvements since its first appearance in the literature, which have made it more and more similar to the reference discretization strategy described previously. Second, it clearly distinguishes the various phases of the discretization process for a field problem. In the first phase of its development (Weiland, 1984) the discretization of geometry adopted by the FIT method was based on two dual orthogonal grids, G and $\tilde{G}$. The idea of two kinds of orientation was not explicitly mentioned, but its consequences in terms of association of physical quantities with the two grids were implicitly used. In a more recent reformulation of the FIT method (Weiland, 1996), the orthogonal grids are abandoned and the fact that the two meshes need only be cell complexes is recognized. The new kind of mesh G is constructed by partitioning the spatial domain into volumes $V^i$, whose nonempty pairwise intersections form a set of surfaces $A^j$, whose pairwise intersections in turn form a set of lines $L^k$, whose pairwise intersections finally form a set of points $P^l$. Comparing this procedure with the definition of a cell complex given previously, we recognize in the resulting structure G our primary cell complex K, and in the sets $\{V^i\}$, $\{A^j\}$, $\{L^k\}$, and $\{P^l\}$ four sets of p-cells $\{\tau_p^i\}$, $p = 0, \ldots, 3$. The dual mesh $\tilde{G}$, which corresponds to the secondary cell complex $\tilde{K}$, is constructed by defining for each $V^i$ of G a dual point $\tilde{P}^i$ located within $V^i$ and then proceeding to define the other dual objects $\tilde{L}^j$, $\tilde{A}^k$, and $\tilde{V}^l$. The discretization of fields, like that of the geometry, changed with the development of the method.
Originally (Weiland, 1984), the quantities considered were the field components tangent to the lines of the grid for the field intensities E and H, and the field components perpendicular to the cells for the flux densities B and D. These components were assumed to be evaluated at midcell and were numerically integrated to approximate the global quantities appearing in Maxwell’s equations in integral form. More recent formulations of the FIT method (Weiland, 1996) show that it has since been recognized that there is no reason to introduce the local field variables first when writing these equations. Consequently, the variables became the global field quantities associated with the geometric objects of the meshes. This process, however, was not carried far enough to include space–time geometric objects, so only global quantities associated with spacelike objects are considered. Adapting the notation to that of the present work, these are:

- the electric voltages $V^e_k$, associated with the primary 1-cells $\tau_1^k \equiv L^k$;
- the magnetic fluxes $\phi^b_j$, associated with the primary 2-cells $\tau_2^j \equiv A^j$;
- the magnetic voltages $F^h_j$, associated with the secondary 1-cells $\tilde\tau_1^j \equiv \tilde L^j$;
- the electric fluxes $\psi^d_k$ and the electric currents $I^j_k$, associated with the secondary 2-cells $\tilde\tau_2^k \equiv \tilde A^k$;
- the electric charges $Q^\rho_l$, associated with the secondary 3-cells $\tilde\tau_3^l \equiv \tilde V^l$.

In formulas,

$$V^e_k = \int_{\tau_1^k} E \tag{254}$$
$$\phi^b_j = \int_{\tau_2^j} B \tag{255}$$
$$F^h_j = \int_{\tilde\tau_1^j} H \tag{256}$$
$$\psi^d_k = \int_{\tilde\tau_2^k} D \tag{257}$$
$$I^j_k = \int_{\tilde\tau_2^k} J \tag{258}$$
$$Q^\rho_l = \int_{\tilde\tau_3^l} \rho \tag{259}$$
Note that if we consider only the geometric objects in space, two distinct quantities of the same physical theory appear to be associated with the same geometric object $\tilde\tau_2^k$. The FIT method groups the variables just defined into vectors $V^e$, $\Phi^b$, $\tilde F^h$, $\tilde\Psi^d$, $\tilde I^j$, and $\tilde Q^\rho$. We can interpret these as natural representations of spacelike cochains. From this point, the discretization of the topological laws follows easily. Maxwell’s equations are considered in integral form with respect to space but in differential form with respect to time [Eqs. (23), (28), (29), and (24)]. Using the coboundary operator in space implicitly, these equations are semidiscretized. That is, the spacelike part of each topological relation is written in terms of the preceding cochains, while the time derivative is left in its original differential form. In matricial form, this reads
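$$\frac{d\Phi^b}{dt} + D_{2,1} V^e = 0 \tag{260}$$
$$D_{3,2}\,\Phi^b = 0 \tag{261}$$
$$-\frac{d\tilde\Psi^d}{dt} + \tilde D_{2,1}\tilde F^h = \tilde I^j \tag{262}$$
$$\tilde D_{3,2}\,\tilde\Psi^d = \tilde Q^\rho \tag{263}$$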
where $D_{2,1}$ and $D_{3,2}$ are the incidence matrices on the primary cell complex, and $\tilde D_{2,1}$ and $\tilde D_{3,2}$ those on the secondary complex. The following relations hold true (Weiland, 1996):

$$D_{2,1} = \tilde D_{2,1}^{T} \tag{264}$$
$$D_{3,2}\,D_{2,1} = 0 \tag{265}$$
$$\tilde D_{3,2}\,\tilde D_{2,1} = 0 \tag{266}$$
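Relations (265) and (266) can be checked on a small example. The Python sketch below is illustrative and not taken from the FIT literature. It assumes a primary complex made of a single tetrahedron, orients every cell by increasing vertex order, and builds the secondary incidence matrices as transposes of primary ones, following the pattern of Eq. (264). The choice $\tilde D_{3,2} = D_{1,0}^T$ is an assumption of this sketch, and the helper names are hypothetical.

```python
from itertools import combinations


def cells(n_vertices, p):
    """All p-cells of a simplex on n_vertices vertices, as sorted vertex tuples."""
    return list(combinations(range(n_vertices), p + 1))


def incidence(n_vertices, p):
    """Incidence matrix D_{p,p-1}: one row per p-cell, one column per (p-1)-cell.

    Each cell is oriented by increasing vertex order; the face obtained by
    dropping the j-th vertex has incidence number (-1)**j.
    """
    rows = cells(n_vertices, p)
    col_index = {c: k for k, c in enumerate(cells(n_vertices, p - 1))}
    D = [[0] * len(col_index) for _ in rows]
    for i, cell in enumerate(rows):
        for j in range(len(cell)):
            face = cell[:j] + cell[j + 1:]
            D[i][col_index[face]] = (-1) ** j
    return D


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def is_zero(M):
    return all(x == 0 for row in M for x in row)


# Primary complex: one tetrahedron (4 nodes, 6 edges, 4 faces, 1 volume).
D10 = incidence(4, 1)  # 6 x 4
D21 = incidence(4, 2)  # 4 x 6
D32 = incidence(4, 3)  # 1 x 4

# Secondary complex (assumed): transposes of the primary matrices, in reverse order.
Dt21 = transpose(D21)  # 6 x 4, as in Eq. (264)
Dt32 = transpose(D10)  # 4 x 6

print(is_zero(matmul(D21, D10)))    # delta delta = 0 from nodes to faces
print(is_zero(matmul(D32, D21)))    # Eq. (265)
print(is_zero(matmul(Dt32, Dt21)))  # Eq. (266)
```

In this tetrahedron every edge belongs to exactly two faces whose contributions cancel. Flipping the sign of a single entry of `D21` therefore makes the check for Eq. (265) fail, which is a quick way to catch orientation mistakes.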
Since the incidence matrices are the matricial representations of the coboundary operator, relations (264) through (266) correspond to two properties discussed earlier. The first is the adjointness of pairs of coboundary operators acting on the primary and secondary meshes. The second is the relation $\delta\delta = 0$ on each mesh. The constitutive equations are then discretized. The literature on the method is somewhat vague about the details of this process. The method used seems to correspond to discretization strategy 1 (discussed previously): two dual cells are considered, and the local constitutive equation is extended to the global quantities associated with these cells. For example, to discretize the constitutive equation $B = \mu H$, two dual orthogonal cells, $\tau_2^j$ and $\tilde\tau_1^j$, are considered, and the corresponding global quantities are assumed to be linked by a relation of the kind

$$\frac{\phi^b_j}{F^h_j} = C_{\mu,j} \tag{267}$$
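As a worked illustration of Eq. (267), which is not necessarily the rule adopted in the FIT literature, make the following assumptions. Let μ be constant over the pair of dual cells, and let H be uniform across them, with magnitude $\bar H$ directed along $\tilde\tau_1^j$ and normal to $\tau_2^j$. Writing $|\tau_2^j|$ for the area of the face and $|\tilde\tau_1^j|$ for the length of the dual edge, the coefficient follows directly from the two integrals:

$$
\begin{aligned}
\phi^b_j &= \int_{\tau_2^j} B \approx \mu\,\bar H\,|\tau_2^j|, \qquad
F^h_j = \int_{\tilde\tau_1^j} H \approx \bar H\,|\tilde\tau_1^j| \\
C_{\mu,j} &= \frac{\phi^b_j}{F^h_j} \approx \mu\,\frac{|\tau_2^j|}{|\tilde\tau_1^j|}
\end{aligned}
$$

On a uniform Cartesian pair of grids with spacing h, this gives $C_{\mu,j} \approx \mu h^2/h = \mu h$. One coefficient per dual cell pair is consistent with the diagonal form of $C_\mu$ for orthogonal dual cells.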
The coefficients $C_{\mu,j}$ constitute a matrix $C_\mu$. This matrix is diagonal for simple constitutive equations and meshes with orthogonal dual cells, but it can be nondiagonal in more complex cases. The discrete constitutive equations are, therefore, linear relations of the kind

$$\tilde\Psi^d = C_\varepsilon V^e \tag{268}$$
$$\Phi^b = C_\mu \tilde F^h \tag{269}$$
$$\tilde I^j_L = C_\sigma V^e \tag{270}$$
where the subscript L in $\tilde I^j_L$ signals that the electric current can have other contributions besides this one. Using these equations, we find that the semidiscretized Maxwell’s equations [Eqs. (260) through (263)] can be rewritten in terms of only two cochains, which the FIT method chooses to be $\Phi^b$ and $V^e$. Eq. (260) already depends only on these two quantities. The time-dependent relation [Eq. (262)], which corresponds to Maxwell–Ampère’s law, becomes

$$-\frac{d}{dt}\left(C_\varepsilon V^e\right) + \tilde D_{2,1} C_\mu^{-1} \Phi^b = \tilde I^j \tag{271}$$

The semidiscrete equations obtained in this way are called the Maxwell grid equations.
Note that up to this point no time-stepping procedure has been defined. Because $\Phi^b$ and $V^e$ are the privileged variables, the time-stepping process will be based on the two time-dependent semidiscretized equations in terms of $\Phi^b$ and $V^e$ [Eqs. (260) and (271)]. Various strategies are considered for discretizing the time derivatives that remain in these two equations. In particular, the usual leapfrog method is pointed out as the method of choice in most cases. For a time step $\Delta t$ this gives the following time-stepping formulas (Weiland, 1996):

$$\Phi^b_{n+1} = \Phi^b_n - \Delta t\, D_{2,1} V^e_{n+1/2} \tag{272}$$
$$V^e_{n+1/2} = V^e_{n-1/2} + \Delta t\, C_\varepsilon^{-1}\left(\tilde D_{2,1} C_\mu^{-1} \Phi^b_n - \tilde I^j_n\right) \tag{273}$$
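The leapfrog update (272) and (273) can be sketched in a few lines. The Python below is a toy illustration, not Weiland's implementation. It reuses a single-tetrahedron primary complex and assumes $\tilde D_{2,1} = D_{2,1}^T$ as in Eq. (264). The diagonal matrices $C_\varepsilon$ and $C_\mu$ have made-up coefficients, and the constant source current is also made up. The sketch checks one structural property. Because $D_{3,2}D_{2,1} = 0$, the discrete magnetic Gauss law value $D_{3,2}\Phi^b$ is left unchanged by every step, whatever the constitutive coefficients.

```python
from itertools import combinations


def incidence(n_vertices, p):
    """D_{p,p-1} for the simplex on n_vertices vertices (cells oriented by vertex order)."""
    rows = list(combinations(range(n_vertices), p + 1))
    cols = {c: k for k, c in enumerate(combinations(range(n_vertices), p))}
    D = [[0] * len(cols) for _ in rows]
    for i, cell in enumerate(rows):
        for j in range(len(cell)):
            D[i][cols[cell[:j] + cell[j + 1:]]] = (-1) ** j
    return D


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


D21 = incidence(4, 2)                # primary faces x primary edges
D32 = incidence(4, 3)                # primary volume x primary faces
Dt21 = [list(c) for c in zip(*D21)]  # secondary incidence, Eq. (264)

# Hypothetical diagonal constitutive coefficients (one per dual cell pair).
C_EPS = [1.0, 2.0, 1.5, 1.0, 0.5, 1.2]  # V^e on edges -> Psi~^d on secondary faces
C_MU = [1.0, 0.8, 1.1, 0.9]             # F~^h on secondary edges -> Phi^b on faces
DT = 0.01


def fit_leapfrog(phi_b, v_e, i_j, steps):
    """Apply Eqs. (273) and (272) `steps` times with a constant source current i_j."""
    for _ in range(steps):
        f_h = [p / c for p, c in zip(phi_b, C_MU)]    # F~^h = C_mu^-1 Phi^b
        circ_h = matvec(Dt21, f_h)                     # D~_{2,1} F~^h
        v_e = [v + DT * (ch - i) / c
               for v, ch, i, c in zip(v_e, circ_h, i_j, C_EPS)]   # Eq. (273)
        circ_e = matvec(D21, v_e)                      # D_{2,1} V^e_{n+1/2}
        phi_b = [p - DT * ce for p, ce in zip(phi_b, circ_e)]     # Eq. (272)
    return phi_b, v_e


phi0 = [0.3, -0.1, 0.2, 0.4]
v0 = [0.0] * 6
source = [0.1, 0.0, -0.2, 0.0, 0.05, 0.0]
phi, v = fit_leapfrog(phi0, v0, source, 200)

print(matvec(D32, phi0), matvec(D32, phi))  # equal up to rounding
```

The invariance comes from the incidence matrices alone. Replacing `C_EPS` and `C_MU` with any other positive values leaves `matvec(D32, phi)` unchanged up to rounding.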
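The formulas (272) and (273) cannot be considered topological time-stepping relations. To arrive at a topological time-stepping relation, we must first rearrange them as follows:

$$\Phi^b_{n+1} = \Phi^b_n - D_{2,1}\left(\Delta t\, V^e_{n+1/2}\right) \tag{274}$$
$$C_\varepsilon V^e_{n+1/2} = C_\varepsilon V^e_{n-1/2} + \tilde D_{2,1}\left(\Delta t\, C_\mu^{-1} \Phi^b_n\right) - \Delta t\, \tilde I^j_n \tag{275}$$

Next we take into account the constitutive relations [Eqs. (268) and (269)] and introduce the cochains $\tilde\Phi^h$, $\Phi^e$, and $\tilde Q^j$, defined by $\Delta t\, \tilde F^h = \tilde\Phi^h$, $\Delta t\, V^e = \Phi^e$, and $\Delta t\, \tilde I^j = \tilde Q^j$. We can then finally rewrite the formulas as topological time-stepping relations

$$\Phi^b_{n+1} = \Phi^b_n - D_{2,1} \Phi^e_{n+1/2} \tag{276}$$
$$\tilde\Psi^d_{n+1/2} = \tilde\Psi^d_{n-1/2} + \tilde D_{2,1} \tilde\Phi^h_n - \tilde Q^j_n \tag{277}$$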
which correspond to those of the reference discretization strategy. In summary, the developers of the FIT method appear to have recognized, in the course of its evolution, the desirability of adopting a number of features that are suggested by the structural analysis of physical theories. These include the choice of cell complexes to discretize the domain and the priority of global physical quantities associated with geometric objects over local field quantities and their adoption as the method’s variables. Moreover, the distinction of topological laws from constitutive relations was built into the method from the start, along with the preservation in the semidiscrete system of equations of many structural properties of the continuous model of the original problem. The strategy adopted for the discretization of constitutive equations appears, however, elementary. Moreover, the method falls short of recognizing the desirability of a truly space–time approach to the discretization. The resulting choice of variables in the time-stepping formulas and the time-stepping formulas themselves suffer from this oversight. Even with the adoption of leapfrog time stepping, the interpretation of the FIT time-stepping formula
as a topological time stepping appears artificial. Meanwhile, the properties of the continuous mathematical model that were preserved in the semidiscrete model risk being lost in the time discretization step.

D. Finite Element Methods

The finite element (FE) method was originally conceived as an analytical tool for solid mechanics, and its first formulation was based on a direct physical approach (Burnett, 1987; Fletcher, 1984). Because it was more flexible than FD methods and produced good results, the FE approach was applied to many other fields, with the variations required by the nature of the new problems. A whole class of FE methods ensued, and these were soon given a rigorous mathematical foundation using the ideas of functional analysis. Despite this later formalization, the origins of the method lead one to expect some similarity between the FE approach to discretization and the “physical” approach of the reference discretization strategy. Let us, therefore, examine the FE methods from this point of view. We must first define what we consider an FE method, and we will do so in operational terms. To be concrete, let us consider a simple electrostatic problem. We assume that a distribution of charge ρ is given in a domain D, along with suitable boundary conditions along ∂D, and we seek the electrostatic potential V on D. We know (Fig. 18) that the field equations for this problem can be factorized into the following pair of topological equations

$$-\operatorname{grad} V = E \tag{278}$$
$$\operatorname{div} D = \rho \tag{279}$$

supplemented by a constitutive equation of the kind

$$D = f_\varepsilon(E) \tag{280}$$
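For comparison with the FE treatment described in the following paragraphs, here is a sketch of how the factorized equations (278) through (280) could be discretized in one dimension in the spirit of the reference strategy. The sketch makes the following assumptions:

- a uniform primary mesh on $[0, L]$ with $V = 0$ at both ends;
- a linear constitutive law $D = \varepsilon E$, approximated on each edge by $\psi = (\varepsilon/h)V^e$ (uniform field along the edge);
- secondary cells of length h centred on the interior nodes.

Eq. (278) becomes the coboundary $V^e = -D_{1,0}V^v$, and Eq. (279) becomes a flux balance on each secondary cell. For a uniform charge density, the nodal values should coincide with the exact solution $V = \rho x (L - x)/(2\varepsilon)$. All function names are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def electrostatics_1d(n_edges, length, eps, rho):
    """Cochain-style 1D electrostatics with V = 0 at both ends (illustrative)."""
    h = length / n_edges
    g = eps / h              # constitutive link: psi_k = g * V^e_k on each primary edge
    n = n_edges - 1          # one unknown per interior node, one secondary cell each
    # Flux balance on the secondary cell of interior node i: psi_i - psi_{i-1} = rho * h,
    # with V^e_k = V_k - V_{k+1}, i.e. V^e = -D_{1,0} V^v (Eq. (278)).
    lower = [-g] * n
    diag = [2.0 * g] * n
    upper = [-g] * n
    rhs = [rho * h] * n
    v_nodes = [0.0] + solve_tridiagonal(lower, diag, upper, rhs) + [0.0]
    v_edges = [v_nodes[k] - v_nodes[k + 1] for k in range(n_edges)]  # V^e
    psi = [g * ve for ve in v_edges]                                  # Eq. (280)
    return v_nodes, psi


length, eps, rho = 1.0, 2.0, 3.0
v_nodes, psi = electrostatics_1d(10, length, eps, rho)
xs = [length * i / 10 for i in range(11)]
exact = [rho * x * (length - x) / (2 * eps) for x in xs]
print(max(abs(a - b) for a, b in zip(v_nodes, exact)))  # rounding-level error
```

The tridiagonal matrix is the product $D_{1,0}^T\, g\, D_{1,0}$ restricted to the interior nodes. It thus chains the coboundary, the constitutive link, and the balance on the secondary cells.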
The FE discretization procedure starts by subdividing the n-dimensional domain D into elements. In the simplest cases the elements correspond to the n-cells of the reference method and define a mesh in the domain. The field quantities selected as unknowns are then given a discrete representation, in terms of a finite number of variables associated with geometric objects belonging to the mesh. In our case, since the unknown is a 0-field, these objects are a set of 0-cells, the so-called nodes in FE terminology. At this point two approaches are available for discretizing the equations: the variational approach and the weighted residual approach (Fletcher, 1984). Since the latter is more general, we will take the weighted residual approach as characteristic of FE methods. To apply this technique, we must consider the complete field equation; that is, we must
reassemble Eqs. (278) through (280) to give

$$-\operatorname{div}\left(f_\varepsilon(\operatorname{grad} V)\right) = \rho \tag{281}$$
The first step of the FE discretization of this continuous formulation is to reconstruct the unknown quantity from its discrete representation. This uses a set of shape functions $s_j(r)$ (see the section on edge elements, Section IV.A.4), as follows:

$$V(r) = \sum_j V_j\, s_j(r) \tag{282}$$
This transforms Eq. (281) into

$$-\operatorname{div}\left(f_\varepsilon\left(\operatorname{grad} \sum_j V_j s_j\right)\right) = \rho \tag{283}$$
Despite the discrete representation adopted for the unknown field, Eq. (283) is still a partial differential equation. To obtain a system of algebraic equations from it, a set of weight functions $w^i$ is selected, and the following set of residual equations is written:

$$-\int_D \operatorname{div}\left(f_\varepsilon\left(\operatorname{grad} \sum_j V_j s_j\right)\right) w^i = \int_D \rho\, w^i \qquad \forall i \tag{284}$$
To obtain a sparse coefficient matrix, FE methods adopt shape and weight functions having local character. Therefore, the support of each weight function is a small subdomain of D, which is in fact a 3-cell that we can denote by $\tau_3^i$. Using the prior notation for weighted integrals, we can rewrite Eq. (284) as follows:

$$-\int_{w^i \tau_3^i} \operatorname{div}\left(f_\varepsilon\left(\operatorname{grad} \sum_j V_j s_j\right)\right) = \int_{w^i \tau_3^i} \rho \qquad \forall i \tag{285}$$
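The left side of each of these equations can be integrated by parts, which gives

$$-\int_{\partial(w^i \tau_3^i)} f_\varepsilon\left(\operatorname{grad} \sum_j V_j s_j\right) = \int_{w^i \tau_3^i} \rho \qquad \forall i \tag{286}$$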
where the meaning of the term on the left side is defined by Eq. (172). The set of equations represented by Eq. (286) is the system of algebraic equations produced by the FE method. We can interpret this strategy in light of the reference method. First, we must reconstruct the field V, starting from the 0-cochain $V^v = \{V_j\}$, using the shape functions $s_j$. We can represent this as the action of a reconstruction operator $R_v$ as follows:

$$V = R_v(V^v)$$
Next, we apply the local topological relation [Eq. (278)] to the reconstructed field, which gives the E field. The constitutive relation [Eq. (280)] is then applied to E, giving the electric flux density D. Finally, for each weight function, that is, on each spread cell $w^i \tau_3^i$, we impose the following topological equation:

$$\int_{\partial(w^i \tau_3^i)} D = \int_{w^i \tau_3^i} \rho \qquad \forall i \tag{287}$$

The reference strategy enforces both topological equations in discrete terms on crisp cells. The approach just described differs in that it applies one topological equation in differential terms and the other in integral terms on spread cells. The difference could be reduced by starting the reconstruction of E from the $V^e$ cochain and then applying the corresponding topological equation to it in coboundary terms. In any case, the projection of the reconstructed field is not performed. The reason is the presence of (secondary) spread cells, which, unlike crisp cells, do not require this step. Note that if the reconstruction is based on $V^e$, edge elements rather than nodal interpolation must be used to obtain a physically sound reconstruction of E. This is always the case in magnetostatics problems. There the magnetic potential A is a 1-field, so a correct reconstruction of it must start from the cochain $V^a$ and use (ordinary) 1-edge elements. Recognizing this fact was one of the reasons that led the computational electromagnetics community to introduce edge elements. This analysis also reveals the different roles of shape functions and weight functions in the discretization process. Shape functions are used to reconstruct the fields in order to approximate the constitutive equations. Weight functions, by contrast, define the spread cells, which form a continuous counterpart of the secondary mesh on which the corresponding topological equations are imposed.
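The Galerkin variant of Eq. (286) can be illustrated on the one-dimensional analogue of Eq. (281) with a linear law: $-(\varepsilon V')' = \rho$ on $[0, L]$, with $V = 0$ at both ends. In this sketch (the names are illustrative and not taken from the cited references), the shape functions $s_j$ are piecewise-linear hat functions on a nonuniform mesh. The weight functions are the same hats, which is the Galerkin choice of Eq. (289), so each hat's support plays the role of a spread cell $w^i\tau_3^i$. Integration by parts leaves $\int \varepsilon V' (w^i)' = \int \rho\, w^i$, which is assembled element by element. For constant ε and ρ, linear elements in 1D with exactly integrated loads reproduce the exact solution at the nodes, and the check below relies on this.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def galerkin_1d(nodes, eps, rho):
    """Assemble and solve the Galerkin system for -(eps V')' = rho, V = 0 at both ends."""
    n = len(nodes) - 2                    # interior nodes = unknowns
    lower, diag, upper, rhs = ([0.0] * n for _ in range(4))
    for e in range(len(nodes) - 1):       # element between nodes e and e+1
        h = nodes[e + 1] - nodes[e]
        k = eps / h                       # element matrix (eps/h) [[1, -1], [-1, 1]]
        f = rho * h / 2.0                 # exact integral of rho times each local hat
        unknowns = (e - 1, e)             # unknown index of node m is m - 1
        for a, u in enumerate(unknowns):
            if not 0 <= u < n:
                continue                  # boundary node: V = 0, no equation
            rhs[u] += f
            for b, w in enumerate(unknowns):
                if not 0 <= w < n:
                    continue              # known zero value contributes nothing
                entry = k if a == b else -k
                if w == u:
                    diag[u] += entry
                elif w == u - 1:
                    lower[u] += entry
                else:
                    upper[u] += entry
    return [0.0] + solve_tridiagonal(lower, diag, upper, rhs) + [0.0]


nodes = [0.0, 0.05, 0.2, 0.3, 0.55, 0.6, 0.8, 1.0]
eps, rho = 2.0, 3.0
v = galerkin_1d(nodes, eps, rho)
exact = [rho * x * (1.0 - x) / (2 * eps) for x in nodes]
print(max(abs(a - b) for a, b in zip(v, exact)))  # rounding-level error
```

On a uniform mesh this system reduces to the same tridiagonal matrix as a cochain-based finite-volume balance. The FE and FV viewpoints differ mainly in how the weight functions, that is, the spread or crisp secondary cells, are chosen.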
Different joint choices of the two sets of functions yield different categories of methods. Suppose the weight functions $w^i$ are the characteristic functions of their supports $\tau_3^i$, that is,

$$w^i = \begin{cases} 1 & \text{in } \tau_3^i \\ 0 & \text{outside } \tau_3^i \end{cases} \tag{288}$$

Then the method is called a subdomain method. Considering Eq. (287), we recognize that a subdomain method is actually an FV method, since it applies the topological equations to a set of crisp cells $\tau_3^i$. If the weight functions coincide with the shape functions, that is, if

$$w^i(r) = s_i(r) \qquad \forall i \tag{289}$$
then the method is called a Galerkin method. Other choices are, of course, possible and give rise, for example, to the so-called collocation method and to the least squares method (Fletcher, 1984). The Galerkin choice is undoubtedly the most widely used in FE. One reason is that, when a variational formulation exists for the problem, Galerkin’s choice gives a system of equations that corresponds to the one derived from the variational approach. Given their different roles in the discretization process, however, the systematic choice of coincident weight and shape functions appears questionable. Edge elements may be adopted as shape functions on physical grounds, namely the association of physical quantities with geometric objects. In the same spirit, weight functions should then be chosen so as to impose the topological equations optimally, and some kinds of shape functions may turn out to be ill suited to this task. [For an analysis of the different roles of shape and weight functions from a functional analysis viewpoint, see Schroeder and Wolff (1994).]

1. Time-Domain Finite Element Methods

The FE discretization strategy exemplified in the preceding paragraphs does not lend itself easily to the discretization of time-dependent problems. It has been noted (Fletcher, 1984) that the classical FE method is intrinsically “elliptic.” That is, it solves problems by “propagating” the sources and boundary conditions of the problem simultaneously over the whole domain. It is therefore ideally suited to boundary-value problems but does not apply well to initial-value problems.
Adapting the FE method to a transient problem defined in a time interval $[t_0, t_1]$ would require space–time shape functions $s(r, t)$ and weight functions $w(r, t)$. It would also require transforming the initial-boundary-value problem into a boundary-value problem by generating, in some way, the missing boundary condition at the final time instant $t = t_1$. One option is to put $t_1 = \infty$ and use the steady-state solution of the problem with time-infinite elements. Another is to use finite elements and a $t_1$ large enough that the solution at that time is sufficiently close to the steady state (Burnett, 1987). Understandably, neither of these approaches enjoyed great popularity. A first alternative is the separation of variables technique. Assume that, as usual, we consider the problem formed by Faraday’s law [Eq. (8)] and Maxwell–Ampère’s law [Eq. (13)], with the simple constitutive equations of electromagnetics [Eqs. (20) through (22)] and suitable initial and boundary conditions. As for the preceding electrostatic problem, we first combine the equations into a single partial
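differential equation, for example (Lee et al., 1997)

$$\operatorname{curl}\left(\frac{1}{\mu}\operatorname{curl} E\right) + \varepsilon\frac{\partial^2 E}{\partial t^2} + \sigma\frac{\partial E}{\partial t} + \frac{\partial J_i}{\partial t} = 0 \tag{290}$$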
where $J_i$ are the impressed currents. The space–time shape functions are expressed as products of functions that depend separately on space and time, and the unknown field is reconstructed as follows:

$$E(r, t) = \sum_j V^e_j(t)\, s_j(r) \tag{291}$$
In this case, the shape functions are assumed to be 1-edge elements, and the vector of coefficients $\{V^e_j(t)\}$ can be considered a time-dependent cochain $V^e(t)$. Next, a set of suitable weight functions $w^i(r)$ is considered at a generic time instant t, and the weighted residual method is applied to Eq. (290). The result is a system of ordinary differential equations
$$A\frac{d^2 V^e}{dt^2} + B\frac{dV^e}{dt} + C V^e + d = 0 \tag{292}$$

where A, B, and C are matrices and d is a vector. This system of equations is finally discretized and solved with some time-stepping method for ordinary differential equations. Because it adopts edge elements for the reconstruction, the approach just examined shares one feature with the reference discretization strategy: the discrete representation of a field as a cochain. The similarities stop there, however. The discrete representation of fields does not extend to space–time, and the distinction between topological and constitutive equations is not recognized. In the electrostatic problem, the FE approach did not recognize this distinction explicitly, but it could be considered implicitly built into the method. Here, by contrast, the two cannot be disentangled, since Eq. (290) inextricably mixes the two time-dependent Maxwell’s equations and the constitutive equations. The same holds for the other approach to time-dependent problems that preserves the “ellipticity” suited to the classical FE approach, namely the one that considers time-harmonic fields (Jin, 1993). In that case Eq. (290) becomes

$$\operatorname{curl}\left(\frac{1}{\mu}\operatorname{curl} E\right) - \omega^2\varepsilon E + j\omega\sigma E + j\omega J_i = 0 \tag{293}$$
where not only are the equations mixed, but the possibility of a space–time approach is also definitely lost. Thus it seems that despite its physical origin, the classical FE approach is not capable of producing a truly physical discretization of time-dependent problems. This is even more surprising if one considers that we owe to the FE method ideas such as that of edge elements, of the reconstruction–projection
strategy, and of the error-based strategies for discretizing constitutive equations. FE practitioners are aware that FE methods lack a convincing treatment of transient problems. In recent times they have looked with interest at the simplicity and effectiveness of the FDTD method (Lee et al., 1997). In particular, they have realized that a physical discretization must start from the factorized equations, that is, treat the two time-dependent Maxwell’s equations and the constitutive equations separately. Let us examine two methods that adopt this discretization philosophy.

2. Time-Domain Edge Element Method

An FE-like time-domain method based directly on the two time-dependent Maxwell’s equations has been suggested, with minor variations, by many authors. This method is described in the survey paper on time-domain FE methods by Lee et al. (1997). It adopts a single discretization mesh for the spatial domain and defines as variables the global quantities associated with the $p$-cells of this mesh. Unlike the reference method, therefore, it has no pair of distinct dual meshes. However, both the primary and the secondary variables are associated with the cells of the single mesh. We can therefore assume that two “logical” meshes are implicitly defined, which have different kinds of orientation and share the same geometric support. In other words, we have $\tau_n^i = \tilde\tau_n^i$ for every $i$ and for $n = 0, 1, 2, 3$. With this provision the variables of the method are defined like those of the FIT method, by Eqs. (254) through (258). They are the electric voltages $V^e_k$, the magnetic fluxes $\phi^b_i$, the magnetic voltages $F^h_k$, the electric fluxes $\psi^d_i$, and the electric currents $I^j_i$. The fields are thus correctly represented by cochains, which, like those of the FIT method, we denote $V^e$, $\Phi^b$, $\tilde F^h$, $\tilde\Psi^d$, and $\tilde I^j$. Following the FE tradition exemplified in the preceding electrostatic example, the method then reconstructs the fields E, B, H, D, and J. The shape functions are a set of 1-edge elements $s_1^k$ and a set of 2-edge elements $s_2^i$, as in

$$\begin{aligned}
E &= \sum_k V^e_k s_1^k = R_e(V^e) \\
H &= \sum_k F^h_k s_1^k = R_h(\tilde F^h) \\
D &= \sum_i \psi^d_i s_2^i = R_d(\tilde\Psi^d) \\
B &= \sum_i \phi^b_i s_2^i = R_b(\Phi^b) \\
J &= \sum_i I^j_i s_2^i = R_j(\tilde I^j)
\end{aligned} \tag{294}$$

These expressions are substituted into Maxwell’s time-dependent equations, and the collocation method is then applied to the resulting equations. This means that Maxwell’s equations are applied in integral form to the 2-cells.
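After application of Stokes’s theorem, this produces the following equations. Here the summation index over 2-cells is renamed m to distinguish it from the cell index i.

$$\int_{\partial\tau_2^i} \sum_k V^e_k s_1^k + \frac{d}{dt}\int_{\tau_2^i} \sum_m \phi^b_m s_2^m = 0 \tag{295}$$
$$\int_{\partial\tau_2^i} \sum_k F^h_k s_1^k - \frac{d}{dt}\int_{\tau_2^i} \sum_m \psi^d_m s_2^m = \int_{\tau_2^i} \sum_m I^j_m s_2^m \tag{296}$$

which corresponds to

$$\sum_k V^e_k \int_{\partial\tau_2^i} s_1^k + \frac{d}{dt}\sum_m \phi^b_m \int_{\tau_2^i} s_2^m = 0 \tag{297}$$
$$\sum_k F^h_k \int_{\partial\tau_2^i} s_1^k - \frac{d}{dt}\sum_m \psi^d_m \int_{\tau_2^i} s_2^m = \sum_m I^j_m \int_{\tau_2^i} s_2^m \tag{298}$$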
Recall the characteristic property of edge elements expressed by Eq. (219), and express the boundaries of cells in terms of incidence numbers. We then find that this corresponds to

$$\frac{d\Phi^b}{dt} + D_{2,1} V^e = 0 \tag{299}$$
$$-\frac{d\tilde\Psi^d}{dt} + D_{2,1}\tilde F^h = \tilde I^j \tag{300}$$
where $D_{2,1}$ is the incidence matrix between the 2-cells and the 1-cells of the mesh. Note that this corresponds to the semidiscrete relations of the FIT method [Eqs. (260) and (262)]. This is no surprise, since we are actually performing the same steps as in the FIT method. Reconstructing the field quantities before enforcing Maxwell’s equations is a heritage of the FE approach, but it appears completely superfluous. The subsequent projection, performed while the equations are enforced in integral form, exactly reproduces the starting cochain. This is in fact the characteristic property of edge elements, as expressed by Eq. (219) or Eq. (210). The reconstruction is actually required only in a later phase, when the discrete constitutive equations are determined, expressing $V^e$ in terms of $\tilde\Psi^d$ and $\tilde F^h$ in terms of $\Phi^b$. This is done by imposing the constitutive equations on the reconstructed fields [Eq. (294)] and then projecting the result to obtain a cochain, according to

$$\begin{aligned}
V^e_k &= \int_{\tau_1^k} \frac{1}{\varepsilon} D = P_e\left(f_\varepsilon^{-1}\left(R_d(\tilde\Psi^d)\right)\right) \\
F^h_k &= \int_{\tau_1^k} \frac{1}{\mu} B = P_h\left(f_\mu^{-1}\left(R_b(\Phi^b)\right)\right)
\end{aligned} \tag{301}$$
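The characteristic property invoked here can be checked numerically: projecting the edge-element reconstruction returns the starting cochain. The sketch makes three assumptions:

- lowest-order Whitney edge elements $w_{ij} = \lambda_i \nabla\lambda_j - \lambda_j \nabla\lambda_i$ on a single triangle, one standard choice of 1-edge element, where $\lambda_i$ are barycentric coordinates;
- a reconstruction $R_e$ as in Eq. (294);
- a projection $P_e$ computed as edge line integrals with two-point Gauss–Legendre quadrature.

The triangle, the cochain values, and all function names are illustrative.

```python
import math

EDGES = [(0, 1), (0, 2), (1, 2)]  # oriented from lower to higher vertex index


def barycentric_gradients(tri):
    """Constant gradients of the barycentric coordinates on triangle tri."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    a2 = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)  # twice the signed area
    return [((y1 - y2) / a2, (x2 - x1) / a2),
            ((y2 - y0) / a2, (x0 - x2) / a2),
            ((y0 - y1) / a2, (x1 - x0) / a2)]


def whitney(tri, grads, q):
    """Values at point q of the Whitney 1-forms w_ij = l_i grad l_j - l_j grad l_i."""
    dx, dy = q[0] - tri[0][0], q[1] - tri[0][1]
    lam = [(1.0 if i == 0 else 0.0) + grads[i][0] * dx + grads[i][1] * dy
           for i in range(3)]
    return [(lam[i] * grads[j][0] - lam[j] * grads[i][0],
             lam[i] * grads[j][1] - lam[j] * grads[i][1]) for i, j in EDGES]


def reconstruct(tri, grads, v_e):
    """R_e: return the field q -> sum_k V^e_k w_k(q)."""
    def field(q):
        ws = whitney(tri, grads, q)
        return (sum(v * w[0] for v, w in zip(v_e, ws)),
                sum(v * w[1] for v, w in zip(v_e, ws)))
    return field


def project(tri, field):
    """P_e: line integral of field along each oriented edge (2-point Gauss-Legendre)."""
    nodes = (0.5 - 0.5 / math.sqrt(3.0), 0.5 + 0.5 / math.sqrt(3.0))
    out = []
    for i, j in EDGES:
        tx, ty = tri[j][0] - tri[i][0], tri[j][1] - tri[i][1]
        total = 0.0
        for t in nodes:
            ex, ey = field((tri[i][0] + t * tx, tri[i][1] + t * ty))
            total += 0.5 * (ex * tx + ey * ty)  # weight 1/2 on [0, 1]
        out.append(total)
    return out


tri = [(0.0, 0.0), (2.0, 0.3), (0.5, 1.5)]
grads = barycentric_gradients(tri)
v_e = [1.0, -0.5, 2.25]
print(project(tri, reconstruct(tri, grads, v_e)))  # matches v_e up to rounding
```

Along its own edge, the tangential component of each $w_{ij}$ integrates to 1. Along the other two edges it vanishes identically. This is why the projection returns the cochain exactly, whatever the triangle shape.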
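Substituting the reconstructions of Eq. (294) into Eq. (301) and integrating gives the matricial links

$$V^e = C_\varepsilon^{-1}\tilde\Psi^d \tag{302}$$
$$\tilde F^h = C_\mu^{-1}\Phi^b \tag{303}$$

Substituting these links into the semidiscrete relations [Eqs. (299) and (300)] and discretizing in time with a leapfrog scheme, we have

$$\Phi^b_{n+1} = \Phi^b_n - D_{2,1}\,\Delta t\, C_\varepsilon^{-1}\tilde\Psi^d_{n+1/2} \tag{304}$$
$$\tilde\Psi^d_{n+1/2} = \tilde\Psi^d_{n-1/2} + D_{2,1}\,\Delta t\, C_\mu^{-1}\Phi^b_n - \Delta t\,\tilde I^j_n \tag{305}$$

Finally, putting $F_\varepsilon^{-1} = \Delta t\, C_\varepsilon^{-1}$, $F_\mu^{-1} = \Delta t\, C_\mu^{-1}$, and $\tilde Q^j = \Delta t\, \tilde I^j$, we obtain

$$\Phi^b_{n+1} = \Phi^b_n - D_{2,1} F_\varepsilon^{-1}\tilde\Psi^d_{n+1/2} \tag{306}$$
$$\tilde\Psi^d_{n+1/2} = \tilde\Psi^d_{n-1/2} + D_{2,1} F_\mu^{-1}\Phi^b_n - \tilde Q^j_n \tag{307}$$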
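A structural consequence of Eqs. (306) and (307) is that charge is conserved exactly. Since $D_{3,2}D_{2,1} = 0$, each step changes $D_{3,2}\tilde\Psi^d$ by exactly $-D_{3,2}\tilde Q^j_n$, which is a discrete continuity equation, while $D_{3,2}\Phi^b$ stays constant. The Python sketch below checks this on a single tetrahedron, with coincident primary and secondary meshes as in this method. The link matrices $F_\varepsilon^{-1}$ and $F_\mu^{-1}$ are arbitrary small 6 × 4 arrays standing in for the reconstruction–projection links, and the source $\tilde Q^j$ is made up. None of these values come from Lee et al. (1997).

```python
from itertools import combinations


def incidence(n_vertices, p):
    """D_{p,p-1} for the simplex on n_vertices vertices (cells oriented by vertex order)."""
    rows = list(combinations(range(n_vertices), p + 1))
    cols = {c: k for k, c in enumerate(combinations(range(n_vertices), p))}
    D = [[0] * len(cols) for _ in rows]
    for i, cell in enumerate(rows):
        for j in range(len(cell)):
            D[i][cols[cell[:j] + cell[j + 1:]]] = (-1) ** j
    return D


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


D21 = incidence(4, 2)  # 4 faces x 6 edges (single mesh)
D32 = incidence(4, 3)  # 1 volume x 4 faces

# Hypothetical links mapping 2-cochains (faces) to 1-cochains (edges), already scaled by dt.
F_EPS_INV = [[0.001 * ((i + 2 * j) % 3 + 1) for j in range(4)] for i in range(6)]
F_MU_INV = [[0.001 * ((2 * i + j) % 4 + 1) for j in range(4)] for i in range(6)]


def step(phi_b, psi_d, q_j):
    """One leapfrog step: Eq. (307), then Eq. (306)."""
    circ_h = matvec(D21, matvec(F_MU_INV, phi_b))
    psi_d = [p + c - q for p, c, q in zip(psi_d, circ_h, q_j)]  # Eq. (307)
    circ_e = matvec(D21, matvec(F_EPS_INV, psi_d))
    phi_b = [p - c for p, c in zip(phi_b, circ_e)]              # Eq. (306)
    return phi_b, psi_d


phi, psi = [0.1, 0.2, -0.1, 0.05], [0.0] * 4
q_j = [0.02, 0.01, 0.0, 0.03]  # made-up current impulses through the faces
steps = 50
for _ in range(steps):
    phi, psi = step(phi, psi, q_j)

print(matvec(D32, psi)[0], -steps * matvec(D32, q_j)[0])  # equal up to rounding
```

Both checks depend only on $D_{3,2}D_{2,1} = 0$. This is the same topological identity that underlies the FIT relations (265) and (266), which is consistent with reading this method as a particular FV method.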
The time-stepping formulas (306) and (307) coincide with those of the reference method [Eqs. (192) and (193)]. In particular, they could be put in the form of Eqs. (211) and (212), since the discrete constitutive links are determined with a reconstruction–projection strategy. In summary, the time-domain edge element method just described is in fact a particular FV method. It adopts coincident primary and secondary meshes and applies the topological equations in terms of cochains; the premature reconstruction of fields before the topological equations are applied is immaterial. It then discretizes the constitutive equations with a reconstruction–projection method based on edge elements. Unlike the reference discretization strategy, this method does not associate the global variables with space–time geometric objects. However, because it applies the topological equations to crisp cells, its time-stepping formulas can be considered to implement a topological time stepping.

3. Time-Domain Error-Based FE Method

The examples of FE methods given so far, and the accompanying discussion, may have given the reader the impression that a truly “physical” time-domain method cannot be built with an FE approach based on spread cells. To forestall this premature conclusion, we now present an interesting error-based time-domain FE method that uses spread cells (Albanese et al., 1994; Albanese and Rubinacci, 1998). The method uses potentials on both the primary and the secondary mesh. To this end, a slightly modified form of Maxwell’s
equations must be considered, namely the set

$$\operatorname{curl} E + \frac{\partial B}{\partial t} = 0 \tag{308}$$
$$\operatorname{curl} H - \frac{\partial D_T}{\partial t} = 0 \tag{309}$$

where $D_T$ is a modified electric flux density that includes the current density term. In this way both statements appear as flux conservation statements, and the corresponding quantities admit two potentials A and W such that (Albanese et al., 1994)

$$E = -\frac{\partial A}{\partial t} \tag{310}$$
$$H = \frac{\partial W}{\partial t} \tag{311}$$
$$B = B_0 + \operatorname{curl} A \tag{312}$$
$$D_T = D_{T0} + \operatorname{curl} W \tag{313}$$
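Fields obtained from Eqs. (310) through (313) satisfy Eqs. (308) and (309), and this can be verified in one line each. The verification assumes, which is an assumption of this check, that $B_0$ and $D_{T0}$ are time-independent and that curl commutes with the time derivative:

$$
\begin{aligned}
\operatorname{curl} E + \frac{\partial B}{\partial t}
&= -\operatorname{curl}\frac{\partial A}{\partial t} + \frac{\partial B_0}{\partial t} + \frac{\partial}{\partial t}\operatorname{curl} A = \frac{\partial B_0}{\partial t} = 0 \\
\operatorname{curl} H - \frac{\partial D_T}{\partial t}
&= \operatorname{curl}\frac{\partial W}{\partial t} - \frac{\partial D_{T0}}{\partial t} - \frac{\partial}{\partial t}\operatorname{curl} W = -\frac{\partial D_{T0}}{\partial t} = 0
\end{aligned}
$$

Both identities hold for any choice of A and W. This is why, once the potentials are used, only the constitutive equations remain to be enforced.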
The potentials A(r, t) and W(r, t) are formally reconstructed by using edge elements, as follows:

$$A = R_a(U^a) \tag{314}$$
$$W = R_w(\hat W^w) \tag{315}$$
where $U^a$ and $\hat W^w$ are the corresponding cochains. Next, Eqs. (310) through (313) are used to determine the fields, which ensures that the fields satisfy Eqs. (308) and (309). To enforce the constitutive equations, an error density function $\xi(E, D_T, B, H)$ is defined, following the criteria sketched in the definition of Eq. (215). It is integrated in space over the domain D and in time over a time step $\Delta t$, which gives an error functional

$$\mathcal{E} = \int_{t_n}^{t_{n+1}} \int_D \xi(E, D_T, B, H) \tag{316}$$

Given Eqs. (314) and (315), the error functional $\mathcal{E}$ can be expressed in terms of the cochains $U^a$ and $\hat W^w$. Therefore, a minimization problem based on $\mathcal{E}$ can be set up at each time step, as follows:

$$\left(U^a_{n+1}, \hat W^w_{n+1}\right) = \arg\min_{\{U^a_{n+1},\, \hat W^w_{n+1}\}} \mathcal{E}\left(U^a_{n+1}, U^a_n, \hat W^w_{n+1}, \hat W^w_n\right) \tag{317}$$

This minimization allows the potential cochains to be stepped in time.
From the physical point of view, this approach to the solution of Maxwell’s equations leaves much to be desired, in particular because it uses a modified form of Maxwell’s equations. However, it appears to be a promising step toward two goals. One is a physically consistent method based on the traditional approach of FE methods. The other is the extension of error-based methods to time-dependent problems, within either the FE or the FV approach.
V. Conclusions

We have presented a set of conceptual tools for formulating physical field problems in discrete terms. These tools represent the geometry and the fields in discrete terms, using the concepts of oriented cell complex, chain, and cochain. Moreover, through the idea of a limit system, they bridge the gap between the continuous and the discrete concepts of field. The analysis of the structure of physical field theories is based on these tools. This analysis reveals the importance of thinking of physical quantities as associated with oriented space–time geometric objects. It also shows that these objects must be thought of as endowed with one of two kinds of orientation. Moreover, this analysis exposes the distinction between topological laws and constitutive relations and shows that they behave differently with regard to discretization. It also makes clear that a privileged discrete operator, the coboundary operator, exists for representing topological laws. We have presented a reference discretization strategy that complies with these concepts. It is based on the idea of topological time stepping for time-dependent equations, which operates on global quantities and derives from applying the coboundary operator in space–time. We then showed how topological time stepping can be combined with different strategies for discretizing the constitutive relations, and presented and examined three of these strategies in detail. By analyzing how a number of popular methods operate, we have shown that numerical methods for field problems have steadily tended to adopt techniques that follow the philosophy described previously. In particular, we have shown that many methods can be thought of as implicitly adopting the topological time-stepping procedure.
Given this trend, even if the concept of topological time stepping seems so far to have eluded the creators of these methods, we can expect it to be recognized and included explicitly in future formulations of these methods.
In the long run, this trend will probably lead to classes of methods that are modeled on the reference discretization strategy and combine the best features of the various methods. In particular, we have shown that methods such as the finite difference and finite volume methods do well with regard to time stepping. However, they usually fail to treat the constitutive relations adequately, ending up with very crude discretizations that are scarcely applicable to unstructured grids. Conversely, finite element methods discretize the constitutive relations well and deal easily with arbitrary meshes, but they fail with regard to topological laws, especially when applied to time-dependent problems. The time therefore seems ripe for combining the best features of these categories of methods. This means devising methods that carefully discretize both topological laws and constitutive relations, bringing the advantages of a correct topological time stepping to unstructured meshes. In particular, combining error-based discretization strategies for the constitutive relations with topological time-stepping schemes seems a promising and as yet unexplored field of inquiry. As anticipated in the Introduction, we have not considered questions such as the rate of convergence, the stability, and the error analyses of the methods. In light of the present discussion, however, one observation is worth making. The various approaches increasingly share a number of common ideas, including the use of global variables associated with geometric objects to represent fields discretely. It therefore seems logical to focus the error analyses on the global quantities as well, rather than applying these analyses to the local quantities reconstructed from the global ones once the numerical problem has been solved.
The error arising from this last step is one of cochain-based field function approximation. It derives from reconstructing the field functions starting from the aforementioned global values. This error is obviously relevant to the solution of physical field problems, but it can be considered separately from the previous phases of the numerical method. For example, the final field reconstruction can follow criteria different from those of any reconstructions performed during the discretization phase. Finally, the emphasis placed in this study on a discrete approach to modeling does not mean that the alternative continuous approaches should be abandoned in the near future or considered “bad” approaches. These alternative approaches are needed today and will continue to be needed in the future. The reason is that a discrete approach modeled on the reference discretization strategy presented in this work places some constraints on the relation between the problem to be solved and the resources required to solve it. There will always be cases in which a solution is sought for which the discrete approach presented here appears, at a given moment, not to be feasible. Returning to the theme of the Introduction, since good numerical mathematics is also the
NUMERICAL METHODS FOR PHYSICAL FIELD PROBLEMS
art of the possible, covering these cases will require the ingenuity of the mathematician to produce a numerically feasible formulation of these problems. This usually means embedding in the discrete formulation of a problem the available physical and mathematical knowledge about the problem and the general behavior of its solution, including the exploitation of the properties of the continuous mathematical model. Present-day examples of this are the spectral methods and compact finite difference methods applied to fluid dynamics. However, as time goes by, we can expect the approach to the discrete formulation of field problems outlined in the present work to become numerically manageable for a steadily widening range of problems, and to be recognized and adopted as the method of choice for them.
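The distinction drawn above between global quantities and the local values reconstructed from them can be illustrated with a one-dimensional toy example. The following is a minimal Python sketch under stated assumptions, not a method taken from this chapter: it assumes a uniform mesh on [0, 1], a potential V(x) = sin x (so that E = −dV/dx = −cos x), and hypothetical helper names (`incidence_matrix`, `edge_voltages`, `reconstruction_error`). The edge voltages, obtained by applying the incidence matrix to the nodal potentials, satisfy the discrete topological law exactly, whereas recovering a point value of E from them introduces a mesh-dependent approximation error.

```python
import math


def incidence_matrix(n_nodes: int) -> list[list[int]]:
    """Edge-node incidence matrix of a 1D chain (edge j goes from node j to node j + 1).

    It is purely combinatorial: no mesh size or material data enter it.
    """
    d = [[0] * n_nodes for _ in range(n_nodes - 1)]
    for j in range(n_nodes - 1):
        d[j][j], d[j][j + 1] = -1, 1
    return d


def edge_voltages(d: list[list[int]], v: list[float]) -> list[float]:
    """Global quantities U_j = integral of E = -dV/dx over edge j, i.e. U = -D v."""
    return [-sum(dij * vi for dij, vi in zip(row, v)) for row in d]


def reconstruction_error(n_edges: int, a: float = 0.0, b: float = 1.0) -> float:
    """Max error of the local reconstruction E(midpoint_j) ~ U_j / h
    against the exact field E(x) = -cos(x) of the assumed potential V(x) = sin(x)."""
    h = (b - a) / n_edges
    x = [a + i * h for i in range(n_edges + 1)]
    u = edge_voltages(incidence_matrix(len(x)), [math.sin(xi) for xi in x])
    return max(abs(uj / h + math.cos(a + (j + 0.5) * h)) for j, uj in enumerate(u))


if __name__ == "__main__":
    x = [i / 10 for i in range(11)]
    u = edge_voltages(incidence_matrix(len(x)), [math.sin(xi) for xi in x])
    # The discrete topological law holds exactly: the edge voltages telescope
    # to -(V(1) - V(0)), up to floating-point rounding.
    print(abs(sum(u) + (math.sin(1.0) - math.sin(0.0))))
    # The reconstruction error, by contrast, depends on the mesh:
    # roughly 4x smaller when h is halved (second order).
    print(reconstruction_error(10) / reconstruction_error(20))
```

In this sketch the only source of error is the final reconstruction step, which is why it can be analyzed separately from the exact relation between the global quantities.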
VI. Coda
A first version of this work was published as Mattiussi (2000) and is presented here with minor corrections, some additions (mainly intended to improve readability and to clarify points that readers of the previous version found obscure), and a slightly different emphasis on some topics. The response to the first appearance of this material [and to that of a previous paper dealing with similar issues but with greater attention to implementation details, in particular concerning the optimization of the reconstruction–projection process (Mattiussi, 1997)] made me feel that only one part of the message it was trying to convey was getting through, namely, the part concerning the possibility of reinterpreting and comparing the workings of existing numerical methods. This was probably a result of the amount of material devoted to the analysis of some popular methods in light of the reference discretization strategy. In fact, the possibilities opened by the application of algebraic topology and of the structural analysis of physical theories to the discretization of field problems go well beyond that. For example, they include the development of the reference discretization strategy introduced in this work, which, once abstracted from the particular physical theory for which it is presented here and thought of instead in terms of the factorization diagram, can be used as a template for the systematic discretization of generic field problems (and, therefore, for the development of new methods for their numerical solution) in compliance with the structure of the underlying physical theory. In this respect, much work remains to be done to apply this approach to field problems for which the computational community is striving to improve the numerical solutions (in particular, where the improvements concern the physical soundness of the solutions).
CLAUDIO MATTIUSSI
In relation to the history of exterior algebra, Gian Carlo Rota (1997) once wrote: Evil tongues whispered that there was really nothing new in Grassmann’s exterior algebra. . . . The standard objection was expressed by the notorious question, “What can you prove with exterior algebra that you cannot prove without it?” Whenever you hear this question raised about some piece of new mathematics, be assured that you are likely to be in the presence of something important. . . . A proper retort might be: “You are right. There is nothing in yesterday’s mathematics that you can prove with exterior algebra that could not also be proved without it. Exterior algebra is not meant to prove old facts, it is meant to disclose a new world” (pp. 47–48).
I hope that the publication of this contribution in its present form can help to convey better the neglected part of its message, namely, that, in its most ambitious embodiment, the application of the analysis of the structure of physical theories to the discretization of field problems is not meant to reinterpret old methods; it is meant to disclose a new world.
References
Abraham, R., Marsden, J. E., and Ratiu, T. (1988). Manifolds, Tensor Analysis, and Applications. Berlin: Springer-Verlag.
Ahagon, A., Fujiwara, K., and Nakata, T. (1996). Comparison of various kinds of edge elements for electromagnetic field analysis. IEEE Trans. Magn. 32, 898–901.
Albanese, R., Fresa, T., Martone, R., and Rubinacci, G. (1994). An error based approach to the solution of full Maxwell equations. IEEE Trans. Magn. 30, 2968–2971.
Albanese, R., and Rubinacci, G. (1993). Analysis of three-dimensional electromagnetic fields using edge elements. J. Comput. Phys. 108, 236–245.
Albanese, R., and Rubinacci, G. (1998). Finite element methods for the solution of 3D eddy current problems, in Advances in Imaging and Electron Physics, Vol. 102, edited by P. Hawkes. Boston: Academic Press, pp. 1–86.
Baldomir, D., and Hammond, P. (1996). Geometry of Electromagnetic Systems. Oxford, UK: Oxford Univ. Press.
Bamberg, P., and Sternberg, S. (1988). A Course in Mathematics for Students of Physics: 1–2. Cambridge, UK: Cambridge Univ. Press.
Bateson, G. (1972). The logical categories of learning and communication, in Steps to an Ecology of Mind. New York: Ballantine Books.
Bellman, R. (1968). Some Vistas of Modern Mathematics. Univ. of Kentucky Press.
Belytschko, T., Krongauz, Y., Organ, D., Fleming, M., and Krysl, P. (1996). Meshless methods: an overview and recent developments. Comput. Methods Appl. Mechanics Eng. 139, 3–47.
Berenger, J. P. (1994). A perfectly matched layer for the absorption of electromagnetic waves. J. Comput. Phys. 114, 185–200.
Birss, R. R. (1980). Multivector analysis I–II. Phys. Lett. 78A, 223–230.
Bishop, R. L., and Goldberg, S. I. (1980). Tensor Analysis on Manifolds. New York: Dover.
Bodenmann, R. (1995). Summation by Parts Formula for Noncentered Finite Differences. Research Report 95-07, Seminar für Angewandte Mathematik, ETH Zürich.
Boothby, W. M. (1986). An Introduction to Differentiable Manifolds and Riemannian Geometry, 2nd ed. San Diego: Academic Press.
Bossavit, A. (1998a). Computational Electromagnetism. San Diego: Academic Press.
Bossavit, A. (1998b). How weak is the “weak solution” in finite element methods? IEEE Trans. Magn. 34, 2429–2432.
Bowden, K. (1990). On general physical systems theories. Int. J. Gen. Syst. 18, 61–79.
Branin, F. H., Jr. (1966). The Algebraic-Topological Basis for Network Analogies and the Vector Calculus. Paper presented at the Symposium on Generalized Networks, Polytechnic Institute of Brooklyn, New York.
Burke, W. L. (1985). Applied Differential Geometry. Cambridge, UK: Cambridge Univ. Press.
Burnett, D. S. (1987). Finite Element Analysis. Reading, MA: Addison-Wesley.
Cartan, E. (1922). Leçons sur les invariants intégraux. Paris: Hermann.
Cendes, Z. J. (1991). Vector finite elements for electromagnetic field computation. IEEE Trans. Magn. 27, 3958–3966.
Choquet-Bruhat, Y., and DeWitt-Morette, C. (1977). Analysis, Manifolds and Physics. Amsterdam: North-Holland.
de Rham, G. (1931). Sur l’Analysis situs des variétés à n dimensions. J. Math. 10, 115–199.
de Rham, G. (1960). Variétés Différentiables. Paris: Hermann.
Deschamps, G. A. (1981). Electromagnetics and differential forms. Proc. IEEE 69(6), 676–696.
Dezin, A. A. (1995). Multidimensional Analysis and Discrete Models. Boca Raton, FL: CRC Press.
Dolcher, M. (1978). Algebra Lineare. Bologna: Zanichelli.
Eilenberg, S., and Steenrod, N. (1952). Foundations of Algebraic Topology. Princeton, NJ: Princeton Univ. Press.
Ferziger, J. H., and Perić, M. (1996). Computational Methods for Fluid Dynamics. Berlin: Springer-Verlag.
Flanders, H. (1989). Differential Forms with Applications to the Physical Sciences. New York: Dover.
Fletcher, C. A. J. (1984). Computational Galerkin Methods. Berlin: Springer-Verlag.
Franz, W. (1968). Algebraic Topology. New York: Ungar.
Golias, N. A., Tsiboukis, T. D., and Bossavit, A. (1994). Constitutive inconsistency: rigorous solution of Maxwell equations based on a dual approach. IEEE Trans. Magn. 30, 3586–3589.
Guibas, L., and Stolfi, J. (1985). Primitives for the manipulation of general subdivisions and the computation of Voronoi diagrams. ACM Trans. Graphics 4, 74–123.
Gustafsson, B., Kreiss, H.-O., and Oliger, J. (1995). Time Dependent Problems and Difference Methods. New York: Wiley.
Hocking, J. G., and Young, G. S. (1988). Topology. New York: Dover.
Hurewicz, W., and Wallman, H. (1948). Dimension Theory. Princeton, NJ: Princeton Univ. Press.
Hyman, J. M., and Shashkov, M. (1997). Natural discretization for the divergence, gradient, and curl on logically rectangular grids. Comput. Math. Applic. 30, 81–104.
Hyman, J. M., and Shashkov, M. (1999). Mimetic discretization for Maxwell’s equations and equations of magnetic diffusion. J. Comput. Phys. 151, 881–909.
Isham, C. J. (1989). Modern Differential Geometry for Physicists. Singapore: World Scientific.
Jackson, J. D. (1975). Classical Electrodynamics. New York: Wiley.
Jin, J. (1993). The Finite Element Method in Electromagnetics. New York: Wiley.
Kunz, K. S., and Luebbers, R. J. (1993). Finite Difference Time Domain Method for Electromagnetics. Boca Raton, FL: CRC Press.
Lebesgue, H. (1973). Leçons sur l’Intégration. New York: Chelsea.
Lee, J., Lee, R., and Cangellaris, A. (1997). Time-domain finite element methods. IEEE Trans. Antennas Propagat. 45, 430–442.
Lele, S. K. (1992). Compact finite difference schemes with spectral-like resolution. J. Comput. Phys. 103, 16–42.
Lilek, Z., and Perić, M. (1995). A fourth-order finite volume method with colocated variable arrangement. Comput. Fluids 24, 239–252.
MacLane, S. (1986). Mathematics Form and Function. Berlin: Springer-Verlag.
Madsen, N. K. (1995). Divergence preserving discrete surface integral methods for Maxwell’s curl equations using non-orthogonal unstructured grids. J. Comput. Phys. 119, 34–45.
Marmin, F., Clénet, S., Piriou, F., and Bussy, P. (1998). Error estimation of finite element solution in non-linear magnetostatic 2D problems. IEEE Trans. Magn. 34, 3268–3271.
Mattiussi, C. (1997). An analysis of finite volume, finite element, and finite difference methods using some concepts from algebraic topology. J. Comput. Phys. 133, 289–309.
Mattiussi, C. (1998). Edge elements and cochain-based field function approximation, in Proceedings of the Fourth International Workshop on Electric and Magnetic Fields, Marseilles (France). pp. 301–306.
Mattiussi, C. (2000). The finite volume, finite element, and finite difference methods as numerical methods for physical field problems. Adv. Imaging Electron Phys. 113, 1–146 (P. Hawkes, Ed.).
Mattiussi, C. (2001). The geometry of time-stepping. Prog. Electromagn. Res. 32, 123–149.
Maxwell, J. C. (1871). Remarks on the mathematical classification of physical quantities. Proc. London Math. Soc. 3, 224–232.
Maxwell, J. C. (1884). Traité Élémentaire d’Électricité. Paris: Gauthier-Villars.
Misner, C. W., Thorne, K. S., and Wheeler, J. A. (1970). Gravitation. New York: Freeman.
Moore, W. (1989). Schrödinger, Life and Thought. Cambridge, UK: Cambridge Univ. Press.
Mur, G. (1994). Edge elements, their advantages and their disadvantages. IEEE Trans. Magn. 30, 3552–3557.
Nguyen, D. B. (1992). Relativistic constitutive relations, differential forms, and the p-compound. Am. J. Phys. 60, 1134–1144.
Oden, J. T. (1973). Finite element applications in mathematical physics, in The Mathematics of Finite Elements and Applications, edited by J. R. Whiteman. San Diego: Academic Press, pp. 239–282.
Oden, J. T., and Reddy, J. N. (1983). Variational Methods in Theoretical Mechanics, 2nd ed. Berlin: Springer-Verlag.
Oñate, E., and Idelsohn, S. R. (1992). A comparison between finite element and finite volume methods in CFD. Comput. Fluid Dynamics 1, 93.
Palmer, R. S., and Shapiro, V. (1993). Chain models of physical behavior for engineering analysis and design. Computer Science Technical Report TR93-1375, Cornell University, New York.
Penman, J. (1988). Dual and complementary variational techniques for the calculation of electromagnetic fields. Adv. Electron. Electron Phys. 70, 315–364 (P. Hawkes, Ed.).
Post, E. (1997). Formal Structure of Electromagnetics. General Covariance and Electromagnetics. New York: Dover.
Remacle, J.-F., Geuzaine, C., Dular, P., Hedia, H., and Legros, W. (1998). Error estimation based on a new principle of projection and reconstruction. IEEE Trans. Magn. 34, 3264–3267.
Rikabi, J., Bryant, C. F., and Freeman, E. M. (1988). An error-based approach to complementary formulations of static field solutions. Int. J. Numer. Methods Eng. 26, 1963–1987.
Rota, G. C. (1997). Combinatorics, Representation Theory and Invariant Theory, in Indiscrete Thoughts, edited by F. Palombi. Boston: Birkhäuser, pp. 39–54.
Schouten, J. A. (1989). Tensor Analysis for Physicists. New York: Dover.
Schroeder, W., and Wolff, I. (1994). The origin of spurious modes in numerical solutions of electromagnetic field eigenvalue problems. IEEE Trans. Microwave Theory Tech. 42, 644–653.
Schutz, B. (1980). Geometrical Methods of Mathematical Physics. Cambridge, UK: Cambridge Univ. Press.
Shashkov, M. (1996). Conservative Finite-Difference Methods on General Grids. Boca Raton, FL: CRC Press.
Shashkov, M., and Steinberg, S. (1995). Support-operator finite-difference algorithms for general elliptic problems. J. Comput. Phys. 118, 131–151.
Strand, B. (1994). Summation by parts for finite difference approximation for d/dx. J. Comput. Phys. 110, 47–67.
Sun, D., Manges, J., Yuan, X., and Cendes, Z. (1995). Spurious modes in finite-element methods. IEEE Antennas Propagat. Mag. 37, 12–24.
Taflove, A. (1995). Computational Electrodynamics. The Finite-Difference Time-Domain Method. Boston: Artech House.
Taflove, A. (1998). Advances in Computational Electrodynamics. The Finite-Difference Time-Domain Method. Boston: Artech House.
Tarhasaari, T., Kettunen, L., and Bossavit, A. (1999). Some realizations of a discrete Hodge operator: a reinterpretation of finite element techniques. IEEE Trans. Magn. 35, 1494–1497.
Teixeira, F. L., and Chew, W. C. (1999a). Differential forms, metrics and the reflectionless absorption of electromagnetic waves. J. Electromagn. Wave 13, 665–686.
Teixeira, F. L., and Chew, W. C. (1999b). Lattice electromagnetic theory from a topological viewpoint. J. Math. Phys. 40, 169–187.
Tonti, E. (1975). On the Formal Structure of Physical Theories. Milano: Consiglio Nazionale delle Ricerche.
Tonti, E. (1976a). The reason for analogies between physical theories. Appl. Math. Modelling 1, 37–50.
Tonti, E. (1976b). Sulla struttura formale delle teorie fisiche, in Rendiconti del Seminario Matematico e Fisico di Milano, Vol. XLVI. pp. 163–257.
Tonti, E. (1998). Algebraic topology and computational electromagnetism, in Proceedings of the Fourth International Workshop on Electric and Magnetic Fields, Marseilles (France). pp. 285–294.
Truesdell, C., and Noll, W. (1965). The non-linear field theories of mechanics, in Handbuch der Physik, Vol. 3/3, edited by S. Flügge. Berlin: Springer-Verlag.
Truesdell, C., and Toupin, R. A. (1960). The classical field theories, in Handbuch der Physik, Vol. 3/1, edited by S. Flügge. Berlin: Springer-Verlag, pp. 226–793.
Versteeg, H. K., and Malalasekera, W. (1995). An Introduction to Computational Fluid Dynamics. The Finite Volume Method. Harlow, England: Longman.
Warnick, K. F., Selfridge, R. H., and Arnold, D. V. (1997). Teaching electromagnetic field theory using differential forms. IEEE Trans. Educ. 40, 53–68.
Webb, J. P. (1993). Edge elements and what they can do for you. IEEE Trans. Magn. 29, 1460–1465.
Weiland, T. (1984). On the numerical solution of Maxwell’s equations and applications in the field of accelerator physics. Particle Accelerators 15, 245–292.
Weiland, T. (1996). Time domain electromagnetic field computation with finite difference methods. Int. J. Numer. Modelling 9, 295–319.
Whitney, H. (1957). Geometric Integration Theory. Princeton, NJ: Princeton Univ. Press.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 121
The Imaging Plate and Its Applications
NOBUFUMI MORI Fuji Photo Film Co., Ltd., 798, Miyanodai, Kaisei, Ashigarakami, Kanagawa, 258-8538 Japan
TETSUO OIKAWA JEOL Ltd., Shin-Suzuharu Bld. 3F, 2-8-3 Akebono-cho, Tachikawa, Tokyo, 180-0012 Japan
I. Introduction
II. Mechanism of Photostimulated Luminescence (PSL)
III. Imaging Plate (IP)
IV. Elements of the IP System
    A. Exposure
    B. Reading
    C. Erasing
    D. Image Processor
V. Characteristics of the IP System
    A. Sensitivity
    B. Resolution
    C. Fading
    D. Granularity and Uniformity
VI. Practical Systems
    A. Transmission Electron Microscope (TEM) System
    B. Computed Radiography and Radio Luminography System
VII. Applications of the IP
    A. High Sensitivity
    B. Wide Dynamic Range
    C. Quantitative Image Analysis
    D. Image Processing
    E. Other Fields of Application of the IP
VIII. Conclusion
References
I. Introduction
The imaging plate (IP) system was first developed for diagnostic X-ray radiography (Sonoda et al., 1983). It has since been used for quantitative analysis in X-ray diffraction experiments and in autoradiography, that is, the imaging of radiation emitted by radioactive nuclei (Amemiya et al., 1988). It is also useful for detecting neutron images (Niimura et al., 1994), and transmission electron microscope (TEM) images can be recorded on the IP as well (Mori, Oikawa, Katoh, et al., 1988). The results have shown that the IP system is much more
Volume 121 ISBN 0-12-014763-7
Copyright © 2002 by Academic Press. All rights of reproduction in any form reserved.
ADVANCES IN IMAGING AND ELECTRON PHYSICS ISSN 1076-5670/02 $35.00
NOBUFUMI MORI AND TETSUO OIKAWA
useful than the conventional photographic film system or other image sensors such as television (TV) cameras, solid-state imaging devices, and gas-type area detectors. The IP is approximately 0.5 mm thick and consists of a flexible plastic film coated with photostimulable phosphor powder (BaFX : Eu²⁺, X = Cl, Br, I) mixed with an organic binder. In the IP system, a radiation pattern is temporarily stored in the IP. The stored data are read out by an image reader, which scans the IP with a focused red laser beam that causes photostimulated luminescence (PSL). The emitted light is collected by a photomultiplier tube (PMT), whose output is then converted into a time series of digital signals. This new radiographic technology is called radio luminography, or computed radiography in diagnostic X-ray imaging. The advantages of the IP system are a wide dynamic range covering five orders of magnitude, a response that is linear in the radiation intensity throughout this range, and a sensitivity 100 to 1000 times higher than that of photographic film. Next, we discuss the mechanism of PSL, which has attracted many researchers because it is of interest not only for phosphor research but also for the study of the relation between luminescent centers and defects. Several different interpretations have been proposed, and we discuss the latest ones. We also deal with the complete system, including the IP, scanner (reader), printer, and processor. The structure of the IP is discussed in relation to image quality. There are various ways of scanning the IP, each with its own merits and drawbacks. In addition, we discuss the sensitivity, linearity, resolution, noise, and uniformity of the system. The extensive practical data presented will show the reader the usefulness of the IP system and should serve as a good reference for new applications of the IP.
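To make the quoted five-decade dynamic range concrete, the following minimal Python sketch (not part of the original article; the function names are hypothetical) computes how many bits a linear digitization would need for the weakest signal to remain at least one quantization step when the strongest signal fills the full scale, and, for comparison only, the relative step size of an assumed logarithmic encoding spread over the same range. The article does not state how the PMT output is digitized, so the logarithmic case is purely illustrative.

```python
import math


def decades(max_signal: float, min_signal: float) -> float:
    """Dynamic range expressed in decades (orders of magnitude)."""
    return math.log10(max_signal / min_signal)


def linear_adc_bits(dynamic_range: float) -> int:
    """Bits needed by a linear encoding so that the weakest signal still spans
    one quantization step when the strongest one fills the full scale."""
    return math.ceil(math.log2(dynamic_range))


def log_step_percent(n_decades: float, n_levels: int) -> float:
    """Relative spacing (in percent) between adjacent codes when n_levels
    codes are spread logarithmically over n_decades (assumed encoding)."""
    return (10.0 ** (n_decades / n_levels) - 1.0) * 100.0


if __name__ == "__main__":
    print(decades(1e5, 1.0))            # five decades
    print(linear_adc_bits(1e5))         # bits for a linear encoding
    print(log_step_percent(5.0, 4096))  # step size of a 12-bit log encoding
```

A linear encoding of five decades needs 17 bits, since 2¹⁶ = 65,536 levels are fewer than 10⁵; a logarithmic encoding instead keeps the relative step constant across the range.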
II. Mechanism of Photostimulated Luminescence (PSL)
A phosphor is a material that emits light when excited by energy such as light, ionizing radiation (e.g., electrons or X-rays), electric power, mechanical force, or chemical reactions (Blasse and Grabmaier, 1994). In these types of luminescence, the excitation energy is converted directly into photon energy. In photostimulated or thermally stimulated luminescence, by contrast, the energy of a first excitation is stored, and light is emitted only upon a second excitation by light or heat. Phosphors of this second type can therefore be used as image memories. BaFX : Eu²⁺ (X = Cl, Br, I) (Sonoda et al., 1983), RbBr : Tl⁺ (Amitani et al., 1986), Ba₅SiO₄Br₆ : Eu²⁺ and Ba₅GeO₄Br₆ : Eu²⁺ (Meijerink and Blasse, 1991), Ba₃(PO₄)₂ : Eu²⁺ (Schipper et al., 1993), Y₂SiO₅ : Ce³⁺ (Meijerink et al., 1991), and (Ca, Sr)S : Eu²⁺, Sm³⁺ (Gasiot et al., 1982; Keller and Pettit, 1958) have been reported as suitable phosphors of the second type. Among these, the most popular phosphor for PSL is BaFX
THE IMAGING PLATE AND ITS APPLICATIONS
Figure 1. The lattice structure of BaFX (X = Cl, Br, I). The lattice parameters depend on the larger halogen component. When Eu²⁺ ions are introduced into the crystal as luminescent centers, they replace Ba²⁺ ions. The layer structure is shown on the right-hand side (Liebich and Nicollin, 1977).
(X = Cl, Br, I) : Eu²⁺ phosphor. In this article we discuss the mechanism of PSL in BaFBr : Eu²⁺ phosphor. The crystal structure of BaFX is shown in Figure 1 (Liebich and Nicollin, 1977). The crystal has the layer structure F-Ba-X-X-Ba-F and belongs to the D₄ₕ⁷ (P4/nmm) space group. Because the ionic bonding across the X-X plane is weaker, the crystal tends to cleave along this plane. BaFX becomes colored when irradiated with ionizing radiation or ultraviolet light; the color originates from a defect called an F-center. F-centers have been studied extensively in the alkali halides, where they were shown to consist of halogen vacancies (F⁺ centers) that have captured electrons (Fowler, 1968). Yuste et al. (1976) observed two types of F-centers in BaFCl, F(Cl⁻) and F(F⁻), corresponding to the two kinds of halogen atoms; Takahashi, Miyahara, et al. (1985) observed F(Br⁻) and F(F⁻) in BaFBr. BaFX : Eu²⁺ phosphor emits blue light, as shown in Figure 2 (Sonoda et al., 1983). The blue emission at about 390 nm originates from Eu²⁺ ions, which are present in the BaFX crystal in small amounts as luminescent centers. Direct optical excitation of Eu²⁺ ions requires about 4.6 eV (270 nm), which lies outside the range shown in the figure. The excitation spectrum of PSL in BaFBr : Eu²⁺ phosphor extends over the visible region (Fig. 3; Umemoto et al., 1988). The efficiency is particularly good in the red region, which makes this material well suited to red lasers. Takahashi, Kohda, et al. (1984) proposed the following model for the mechanism of PSL in BaFBr : Eu²⁺. Ionizing radiation creates electrons and holes in the crystal. The electrons are trapped by halogen vacancies, forming F-centers. Holes
Figure 2. Luminescence spectra of BaFX : Eu²⁺ (X = Cl, Br, I) phosphors at room temperature (Sonoda et al., 1983).
are trapped by Eu²⁺ ions. Light irradiation then liberates electrons from the F-centers into the conduction band, after which the electrons recombine with the holes trapped at the Eu²⁺ ions. The recombination energy is emitted as luminescence from the Eu²⁺ ions. These processes are represented in the energy-level diagram of Figure 4 (Iwabuchi, Mori, et al., 1994). The following experimental findings support this model. Electron spin resonance measurements showed that F-centers are formed by X-irradiation (Takahashi, Miyahara, et al., 1985). The optical absorption spectrum of the F-centers, the excitation spectrum of photostimulated photoconductivity, and the excitation spectrum of PSL were measured in an X-irradiated BaFBr : Eu²⁺ single crystal (Fig. 5). The good agreement between the F-center absorption spectrum and the PSL excitation spectrum indicated that electrons released from F-centers are involved in the PSL mechanism, and the simultaneous observation of photostimulated photoconductivity showed that these electrons move through the conduction band. The temperature dependence of the photoconductivity intensity
Figure 3. Photostimulated luminescence excitation spectra of BaFBr : Eu²⁺ and BaFBr₀.₈₅I₀.₁₅ : Eu²⁺ phosphors. Partial replacement of bromine by iodine in the BaFBr crystal shifts the spectrum toward longer wavelengths, because the resulting lattice expansion reduces the binding energy of the electron in the F(X⁻) center. The shifted spectrum is better suited to a 680-nm semiconductor laser (Umemoto et al., 1988).
revealed that the thermal energy difference between the first excited state and the conduction band is 37 meV for F(Br⁻) and 1.3 meV for F(F⁻) (Iwabuchi, Umemoto, et al., 1990). The change in the luminescence intensity of Eu²⁺ ions before and after exposure to ionizing radiation or light stimulation supported the hypothesis that the Eu²⁺ ion traps a hole (Takahashi, Kohda, et al., 1984). In the preceding discussion, the F-center in the BaFX system was assumed to originate from the trapping of an electron by a preexisting halogen vacancy. However, the formation mechanism of color centers has long been debated for the alkali halides (Williams and Song, 1990), and two different processes of F-center formation have been discussed: (1) The lattice originally
Figure 4. Energy-level diagram illustrating the photostimulated luminescence mechanism in BaFBr : Eu²⁺. The band gap is 8.2 eV, and the optical absorption energies of the F-centers formed by ionizing radiation are 2.1 eV for F(Br⁻) and 2.5 eV for F(F⁻). Holes are trapped by Eu²⁺ ions. The ionization energy of Eu²⁺ is 6.2 eV; ionizing an Eu²⁺ ion creates a hole directly at that ion, and the released electron is trapped by an F⁺ center to form an F-center. The memory process is shown by solid lines and the readout process by dashed lines (Iwabuchi, Mori, et al., 1994).
has defects such as halogen vacancies. When free electrons are introduced by ionizing radiation or from an electrode, F-centers are easily created by the trapping of electrons at these vacancies. This is the process assumed in the proposal of Takahashi, Kohda, et al. (2) High-energy ionizing radiation distorts the lattice and displaces atoms from their lattice sites, thereby introducing new defects, including F-centers, into the crystal. The growth curve of F-centers as a function of radiation dose makes it possible to determine which process is dominant. In the alkali halides, Rabin and Klick (1960) found linear growth, indicating the continued creation of new F-centers; the second process is therefore dominant there. In BaFBr, however, Kondo et al. (1994) found that the F(Br⁻) band grew and then saturated after 60 min of X-irradiation. This saturation suggested that the filling of preexisting vacancies with electrons is the main process of F-center creation in BaFBr (the first process).
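The photon energies and wavelengths quoted in this section are related by E = hc/λ, with hc ≈ 1239.84 eV·nm, and the activation energies of 37 meV and 1.3 meV can be compared with the thermal energy kT. The following minimal Python sketch (helper names are hypothetical) performs these conversions. The Boltzmann factor exp(−ΔE/kT) is used here only as a rough indicator of how readily an electron in the excited F-center state reaches the conduction band; it is an assumption for illustration, not a model taken from the cited studies.

```python
import math

HC_EV_NM = 1239.84198       # Planck constant times speed of light, in eV*nm
K_B_EV_PER_K = 8.617333e-5  # Boltzmann constant, in eV/K


def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy E = hc / lambda, in eV."""
    return HC_EV_NM / wavelength_nm


def wavelength_nm(energy_ev: float) -> float:
    """Wavelength (nm) corresponding to a photon energy (eV)."""
    return HC_EV_NM / energy_ev


def boltzmann_factor(delta_e_mev: float, temperature_k: float) -> float:
    """exp(-dE / kT) for an energy gap dE given in meV."""
    return math.exp(-delta_e_mev * 1e-3 / (K_B_EV_PER_K * temperature_k))


if __name__ == "__main__":
    for label, nm in [("Eu2+ emission", 390), ("He-Ne laser", 633), ("semiconductor laser", 680)]:
        print(f"{label}: {nm} nm -> {photon_energy_ev(nm):.2f} eV")
    for label, ev in [("Eu2+ direct excitation", 4.6), ("F(F-) absorption", 2.5),
                      ("F(Br-) absorption", 2.1)]:
        print(f"{label}: {ev} eV -> {wavelength_nm(ev):.0f} nm")
    for gap in (37.0, 1.3):
        print(f"{gap} meV gap: {boltzmann_factor(gap, 300.0):.2f} at 300 K, "
              f"{boltzmann_factor(gap, 4.2):.1e} at 4.2 K")
```

Running it reproduces the correspondences quoted in the text (for example, 4.6 eV corresponds to about 270 nm) and shows that the factor for the 37-meV gap is about 0.24 at room temperature but vanishingly small near liquid-helium temperature.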
Figure 5. Comparison of three spectra of a BaFBr : Eu²⁺ single crystal (optical absorption, photoconductivity, and photostimulation spectra). The sample was heavily exposed to X-rays; the measurements were made at room temperature with E ⊥ c. (Reprinted from Takahashi et al., J. Lumin., 31 & 32 (1984), 266, “Mechanism of Photostimulated Luminescence in BaFX : Eu²⁺ (X = Cl, Br) Phosphors,” with kind permission from Elsevier Science-NL, Sara Burgerhartstraat 25, 1055 KV Amsterdam, The Netherlands.)
Another candidate that was discussed is a complex center formed by an F-center together with a hole center and an Eu²⁺ ion. The question at issue was which process, single molecular or bimolecular, better accounts for the linear response of the IP system. On the one hand, in the proposal of Takahashi, Kohda, et al., PSL occurs when electrons meet holes, and the probability of meeting depends on the product of their concentrations. This is essentially
a bimolecular process. On the other hand, for complex centers formed by F-centers and hole centers, the recombination probability of an electron and a hole should be a fixed value, independent of their concentrations (a single molecular process). In 1988, von Seggern et al. calculated that the bimolecular reaction leads to a quadratic dependence on dose, whereas the single molecular reaction leads to a linear one. However, Iwabuchi, Mori, et al. (1994) repeated this calculation much more precisely and found that the bimolecular process also yields linearity, the quadratic relationship occurring only under restricted conditions. The model of Takahashi, Kohda, et al. is therefore consistent with the observed linearity. Only at low temperatures, where an electron cannot escape from the F-center without thermal activation, must complex centers be considered, with electron–hole recombination taking place through tunneling. Hangleiter et al. (1990) observed this tunneling process at liquid-helium temperature. In summary, the photostimulation process in BaFBr : Eu²⁺ has been widely discussed, and the model proposed by Takahashi, Kohda, et al. explains most of the experimental results. The study of BaFBr : Eu²⁺ nevertheless continues to attract researchers: Ohnishi et al. (1994) and Radzhabov and Egranov (1994) observed the luminescence of self-trapped excitons, and Koschnick et al. (1992) and Kondo et al. (1994) discussed the stability of F-centers, which is important for the fading characteristics discussed later. Such studies will contribute to improving the characteristics of the IP system.
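The distinction between single-molecular and bimolecular recombination, and the point that a bimolecular process need not produce a quadratic dose response, can be illustrated with textbook rate equations. The following Python sketch is a toy model under stated assumptions, not the calculation of von Seggern et al. or of Iwabuchi, Mori, et al.: electrons and holes start with equal densities n₀ taken as proportional to dose; the bimolecular case obeys dn/dt = −k n², so that n(t) = n₀/(1 + k n₀ t); the single-molecular case obeys dn/dt = −k n; and the emitted light is the number of recombinations up to the readout time t. All helper names are hypothetical.

```python
import math
from typing import Callable


def bimolecular_light(n0: float, k: float, t: float) -> float:
    """Recombinations up to time t for dn/dt = -k n^2 (equal electron and hole
    densities n0): n0 - n0 / (1 + k n0 t), written to avoid cancellation."""
    x = k * n0 * t
    return n0 * x / (1.0 + x)


def monomolecular_light(n0: float, k: float, t: float) -> float:
    """Recombinations up to time t for dn/dt = -k n (fixed probability per
    unit time, independent of concentration): n0 (1 - exp(-k t))."""
    return n0 * -math.expm1(-k * t)


def dose_exponent(light: Callable[[float, float, float], float],
                  n_lo: float, n_hi: float, k: float, t: float) -> float:
    """Log-log slope of emitted light versus trapped-charge density (taken as
    proportional to dose): 1 means linear, 2 means quadratic."""
    return math.log(light(n_hi, k, t) / light(n_lo, k, t)) / math.log(n_hi / n_lo)


if __name__ == "__main__":
    k, n_lo, n_hi = 1.0, 1e-3, 2e-3   # arbitrary units
    for t in (1e-3, 1e6):             # short (partial) and long (near-complete) readout
        print(t, dose_exponent(bimolecular_light, n_lo, n_hi, k, t),
              dose_exponent(monomolecular_light, n_lo, n_hi, k, t))
```

Under these assumptions the bimolecular response is quadratic only when the readout releases a small fraction of the stored charge (k n₀ t ≪ 1) and becomes linear when the readout is nearly complete, while the single-molecular response is linear throughout; in simplified form, this mirrors the conclusion that the quadratic relationship arises only under restricted conditions.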
III. Imaging Plate (IP)
The IP is essentially composed of a protective layer, a phosphor layer, and a support layer (Fig. 6). The phosphor layer consists of phosphor particles dispersed in an organic binder. Since the amount of binder is of the order of one tenth that of the phosphor by weight, the phosphor particles are not fully embedded in the binder but are bound to neighboring particles only at their points of contact (not shown in the figure). The phosphor layer, which is the part sensitive to ionizing radiation, is about 50–300 μm thick. The protective layer, 3–10 μm thick, protects the phosphor layer from dust, stains, and damage by external forces. The support gives the IP the rigidity needed for transport by the mechanical system, and its flexibility is also useful in autoradiography for ensuring close contact between the sample and the plate. The luminescence intensity of the IP depends on the excitation wavelength and energy density of the reading system, but the characteristics of the phosphor are the most important factor. Although BaFBr : Eu²⁺ phosphor was originally used with a He-Ne laser (633 nm) as the stimulating light source, recently BaFBr₀.₈₅I₀.₁₅ : Eu²⁺ has been used much
THE IMAGING PLATE AND ITS APPLICATIONS
289
Figure 6. The structure of the imaging plate (IP). The IP consists of three main parts: protective layer, phosphor layer, and support. In this schematic drawing of the IP, phosphor grains of various forms, which are arranged irregularly in the layer, cause scattering of both the reading laser light and the generated luminescence. The bell-shaped pattern represents the intensity distribution of the reading light. Broadening of the shape causes degradation of the resolution. The degree of scattering is larger in the deeper part (B > A).
more together with a high-power semiconductor laser of 680 nm. This is because the semiconductor laser is preferable for reliability and reducing the system size. Figure 3 shows a comparison of the stimulation spectra of the two phosphors. The advantage of the BaFBr0.85I0.15 : Eu2+ is apparent for the 680-nm laser. The particle size of the phosphor affects the resolution and the noise of the image. The IP in the 1980s was made from about 7-μm phosphor particles, but in the 1990s, 4-μm phosphor particles were made. Using a smaller particle size improves the resolution and reduces the noise of the phosphor grain. Therefore, the direction of phosphor development is to obtain the phosphor of higher luminescent intensity and smaller particle size. Figure 6 also illustrates
290
NOBUFUMI MORI AND TETSUO OIKAWA
schematically how the reading light scatters in the phosphor layer. The degree of scattering determines the resolution in position and the detection intensity of the luminescence. The thicker phosphor layer increases the absorption efficiency of ionizing radiation and luminescent intensity, especially for X-rays; however, scattering degrades the resolution. To improve the resolution, a blue pigment, which absorbs only reading light, is some-times useful. As the phosphor particle scatters light, it may be useful to use a transparent phosphor layer such as a single crystal; however, the reading light is reflected at the other side of the layer, and this reflection occurs back and forth many times. Thus, the resolution should become much worse, and another technology is needed to reduce the broadening of the laser beam. The protective layer is important for the durability of using the IP many times; however, the thickness of the layer affects the resolution of the image. Generally, a thicker protective layer is better for durability but worse for resolution. Another point concerning the protective layer is the attenuation effect for ionizing radiation. The penetration depth of the electron at the protective layer estimated by the equation of Katz and Penfold (1952) is 70 μm for 100-keV, 2 μm for 10-keV, and 0.07 μm for 1-keV electrons. The maximum energy of an electron of tritium is about 10 keV; more than half its energy will be dissipated even for the 1-μm protective layer. Thus, the Ip for tritium has no protective layer. The phosphor does not degrade in normal humidity, but it does decompose on contact with water. In the field of autoradiography, where the surface is in contact with the sample, water contained in the sample often permeates the phosphor through the protective layer. Thus, sample dryness is important for durability of the IP. 
These features give an overview of the structure; however, many commercially available types of IP look similar but differ in size, thickness, and flexibility. There are high-resolution types and high-sensitivity types; in practice, some are better suited to X-rays, others to TEM and autoradiography. Each type is designed to be used almost exclusively with a particular reading system. Thus, in selecting an IP for a specific purpose, one should study the characteristics of the whole system.
IV. Elements of the IP System

Figure 7 shows the typical configuration of the IP system. After the IP has been exposed to ionizing radiation, the reader reads it out: the luminescence from the IP is detected photoelectrically and converted to a digital electrical signal, which is processed by the computer system. The eraser then exposes the IP to visible light to erase the stored data, and the IP becomes reusable. The details of this procedure are as follows.
Figure 7. The configuration of the IP system. The IP system comprises the IP, the reader, the eraser, and the processor. After exposure to ionizing radiation, the IP is fed into the reader, where it is scanned with a visible laser beam. The IP emits blue light whose intensity is proportional to the dose of ionizing radiation. The luminescence is detected by a photomultiplier tube and converted to an electrical signal. In the processor, the data are enhanced or analyzed, for example to measure intensities. After reading, the IP is irradiated with light to erase the data stored in it.
A. Exposure

The IP is a two-dimensional sensor for ionizing radiation. When the IP is exposed to ionizing radiation, it stores the radiation energy as a latent image. Since the stored energy is released by exposure to light, the ionizing radiation must reach the IP in the dark. For diagnostic X-ray imaging, it is convenient to use a light-tight case known as a cassette, which is almost the same as that used in a conventional film-screen system. For autoradiography, when the sample is very thin, like a membrane, the IP can be exposed in contact with the sample inside the cassette; when the sample is too thick for the cassette, a light-tight box is needed. The IP system is so sensitive that environmental radioactivity produces a noiselike background, comparable to the fog level of photographic film. To avoid this, it is preferable to erase the IP just before exposure, removing any previously stored signal, and to use a lead shield box for long exposures such as those in autoradiography. After exposure, the plate should be read as soon as possible, because the stored energy gradually escapes even in the dark. This phenomenon is called fading.
B. Reading

In the reader, the IP is scanned with a red light beam focused on its surface. The luminescence of BaFX : Eu2+ peaks at about 390 nm and reaches the detector together with scattered excitation light, so an optical filter in front of the photodetector is used to cut off the laser light. There are many possible ways of scanning (Fig. 8). In a flat-bed scanner, the IP is held on a flat bed and transported. A rotating mirror deflects the laser beam so that the focused spot moves in a straight line across the IP, and an F-θ lens makes the spot move at uniform velocity. The plate moves perpendicular to the direction of the spot motion. In the spinner-type scanner, the IP is fixed along the inner surface of a cylinder. Laser light passes through the dichroic mirror (A), is reflected by the mirror (B), and is then focused on the IP surface. The luminescence of the IP is collected by a lens; the lens and mirror rotate as indicated by the dashed line in the figure. The pair of lenses forms a confocal configuration, which is used in the PIXsysTEM; in the FDL-5000 system, however, lens (C) is not used. In the drum-type scanner, the IP is fixed on a cylindrical drum and the reading head moves parallel to the drum axis. In the disk-type scanner, the IP rotates and the reading head moves along the radial direction; in this type of scanner, the spatial sampling density must be kept the same at inner and outer positions on the plate. The flat-bed type is the most popular for medical and biotechnology applications; the spinner type is used for TEM.
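The benefit of the F-θ lens can be illustrated with a short calculation. If the beam deflection angle θ increases at a constant rate, an ordinary focusing lens places the spot at y = f tan θ, so the spot speeds up toward the ends of the scan line, whereas an F-θ lens is designed so that y = f θ and the spot moves at uniform velocity. The Python sketch below compares the two mappings and also computes, for the disk-type scanner, how many samples a circular track of radius r needs so that the arc length between samples equals the pixel size. The focal length, scan angle, pixel size, and function names are hypothetical choices for illustration, not parameters of any particular reader.

```python
import math

def spot_positions(focal_mm, max_angle_rad, n_samples, f_theta=True):
    """Spot positions (mm) on the plate at equally spaced times.

    The beam deflection angle is assumed to change linearly with time
    (constant angular velocity). An F-theta lens maps angle to position as
    y = f * theta; an ordinary lens maps it as y = f * tan(theta).
    """
    positions = []
    for k in range(n_samples):
        theta = -max_angle_rad + 2 * max_angle_rad * k / (n_samples - 1)
        y = focal_mm * theta if f_theta else focal_mm * math.tan(theta)
        positions.append(y)
    return positions

def step_sizes(positions):
    """Distance moved by the spot between consecutive time samples."""
    return [b - a for a, b in zip(positions, positions[1:])]

def samples_per_revolution(radius_mm, pixel_mm):
    """Disk-type scanner: samples needed on one circular track so that the
    arc length between samples equals the pixel size."""
    return max(1, round(2 * math.pi * radius_mm / pixel_mm))

if __name__ == "__main__":
    f, half_angle, n = 300.0, math.radians(20), 9   # hypothetical values
    for name, ft in (("F-theta", True), ("f*tan", False)):
        steps = step_sizes(spot_positions(f, half_angle, n, ft))
        print(name, "min/max step (mm): %.2f / %.2f" % (min(steps), max(steps)))
    for r in (10.0, 50.0, 100.0):
        print("radius", r, "mm ->", samples_per_revolution(r, 0.05), "samples/rev")
```

With the F-θ mapping every step is the same, so equal time intervals give equal pixel spacing; with f tan θ the steps grow toward the edges. In the disk-type case the number of samples per revolution grows in proportion to the radius, which is one way of keeping the reading density uniform.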
C. Erasing

After reading, the data stored in the IP are erased by exposing it to visible light. The light source is an ordinary fluorescent lamp or sodium lamp, chosen for its electrical power efficiency. The erasing level limits the lowest level that can be detected in the next measurement: with such high sensitivity, detection is essentially a competition with unwanted environmental radioactivity. Whereas film stores information irreversibly, the IP can be reset by erasing. This is one of the reasons why the IP system can achieve high-sensitivity detection.
Figure 8. Various types of scanners. (a) Flat-bed-type scanner: The IP is held on the flat bed. The laser beam reflected by a rotating or turning mirror scans the IP. The luminescence from the IP is guided to the photomultiplier tube (PMT) through the light guide. (b) Spinner-type scanner: The IP is held on the inner side of the cylinder and moves along the direction of the cylinder axis while the reading head (spinner) rotates. (c) Drum-type scanner: The IP is mounted on the rotating cylinder (drum). The reading head moves parallel to the cylinder axis. (d) Disk-type scanner: The IP rotates. The reading head, which irradiates the IP with the laser beam and collects the luminescence, moves along the radial direction.
D. Image Processor

The latest computer and memory technology makes it possible to execute complicated image-processing tasks much more rapidly. Image processing is useful for distinguishing patterns and for measuring the amount of activity and the shape of patterns; typical operations include gradation processing, narrowing or broadening of the displayed range, and contrast enhancement. By broadening the range, we can easily observe the whole of a pattern with a large dynamic range, such as a diffraction pattern. Since the IP system has good linearity, reliable quantitative measurements can be made directly from the image data, and displaying intensity profiles is also useful for comparing activities. Fast Fourier transform (FFT) or contour-map processing can improve the ability to distinguish features. A “superimpose” function allows letters or arrows to be written on the recorded image, which is useful for presentations. In diagnosis, image processing may point out features that doctors should note. Processing of these kinds enhances the value of the image and is one of the merits of the IP system. Image data can be stored on a large-capacity device such as a magneto-optical disk, so that images can be archived and retrieved quickly.
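As an illustration of gradation processing, the sketch below maps linear IP intensities spanning several decades onto an 8-bit display with a logarithmic window, one simple way of broadening the displayed range so that both the strong and the weak parts of a diffraction pattern stay visible. The function name, the window limits, the 8-bit output, and the sample profile are illustrative assumptions, not the processing implemented in any commercial IP system.

```python
import math

def log_display(values, low, high, levels=256):
    """Map linear intensities to display levels 0..levels-1 on a log scale.

    Values at or below `low` map to 0 and values at or above `high` map to
    levels - 1; in between, each decade of intensity gets an equal share of
    the display levels.
    """
    if not (0 < low < high):
        raise ValueError("need 0 < low < high")
    span = math.log10(high) - math.log10(low)
    out = []
    for v in values:
        v = min(max(v, low), high)          # clip to the display window
        frac = (math.log10(v) - math.log10(low)) / span
        out.append(round(frac * (levels - 1)))
    return out

# Hypothetical profile across a diffraction pattern: a strong central spot
# and weak reflections about four decades fainter.
profile = [1e5, 3e3, 40.0, 12.0, 10.0]
print(log_display(profile, low=10.0, high=1e5))
```

On a linear 8-bit scale, the 40-count reflection would be mapped to level 0 and lost; on the logarithmic scale it still receives a distinct gray level.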
V. Characteristics of the IP System

In this section, we mainly discuss sensitivity and dynamic range, resolution, fading, and noise. The noise characteristic is important for assessing the efficiency of the detector, although it is difficult to calculate. These characteristics are discussed using data from the TEM system, but the discussion should apply to other fields if differences in the type of ionizing radiation are taken into account.
A. Sensitivity

Here, sensitivity means the detected luminescent intensity, so the flow of image data through the system is the natural basis for discussing it (Fig. 9). Let N be the initial number of quanta of ionizing radiation. This number is multiplied by a series of efficiency factors. The efficiency of the IP is represented by α(I), which includes the absorption efficiency for ionizing radiation, the efficiency of electron–hole creation, and the readout efficiency. α depends on the intensity I of the reading light, increasing with I but gradually saturating in a practical system. For 80-kVp X-rays, α is estimated to be about 10–200 in practical systems. The factor β is the light-collection efficiency, which includes the transmission characteristics of optical elements such as filters, light guides, or lenses; it is normally 0.1–0.5. The factor χ is the quantum efficiency of the photodetector; for a PMT, it is the quantum efficiency of the photocathode, typically 0.1–0.3. The factor δ is the gain of the PMT or electrical circuit, normally 10²–10⁷. In this notation, the detected luminescent intensity is

$$\text{Luminescent intensity} = N\,\alpha(I)\,\beta\,\chi\,\delta$$
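The chain of efficiencies in Fig. 9 can be followed numerically. The sketch below takes mid-range values from the ranges quoted above (α = 100, β = 0.3, χ = 0.2, δ = 10⁵; these particular choices are ours) and computes the number of image carriers after each stage. The stage with the fewest carriers, often called the quantum sink, sets the statistical noise of the chain; with these values it is the incident quanta themselves, but if α falls toward the low end of its range the photoelectron stage can become the sink.

```python
def carrier_cascade(n_quanta, alpha, beta, chi, delta):
    """Number of image carriers after each stage of the detection chain
    (Fig. 9): incident quanta, emitted photons, collected photons,
    photoelectrons, and amplified electrons."""
    stages = [("incident quanta", n_quanta)]
    for name, gain in (("photons emitted by IP", alpha),
                       ("photons reaching detector", beta),
                       ("photoelectrons", chi),
                       ("electrons after gain", delta)):
        stages.append((name, stages[-1][1] * gain))
    return stages

def quantum_sink(stages):
    """Stage with the smallest number of carriers."""
    return min(stages, key=lambda s: s[1])

if __name__ == "__main__":
    stages = carrier_cascade(1000, alpha=100, beta=0.3, chi=0.2, delta=1e5)
    for name, count in stages:
        print(f"{name:28s} {count:12.4g}")
    print("quantum sink:", quantum_sink(stages)[0])
```

The last entry equals N α β χ δ, the detected intensity in the equation above; the intermediate entries show where carriers are lost.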
Figure 9. The flow of image carriers. N quanta of ionizing radiation fall on the IP, which absorbs ionizing radiation and emits photons with efficiency α(I), when stimulated with light of intensity I. Photons from the IP reach the photodetector with efficiency β, which is defined by the light-collecting efficiency and transmission coefficient of the optics. Photons are converted to electrons by the photodetector with efficiency χ , and the number of electrons increases in both the photodetector and the electrical amplifier by a factor δ. The final number is the product of these efficiencies.
Figure 10 shows the signal intensity of the PIXsysTEM reader as a function of electron dose, together with data for photographic film (Mori, Oikawa, Harada, et al., 1990). The signal intensity is linear in electron dose over five decades. The IP is used for many other types of ionizing radiation, and the PSL intensity is generally found to be linear in the dose of radiation. This is because ionizing radiation creates electrons and holes in the phosphor without any nonlinear process, whatever the kind of radiation, although the efficiency differs. For film, the vertical axis is optical density. Although a straight line might be obtained by using another unit or calibrated data, film has the drawbacks of a narrow dynamic range of about two decades and somewhat poorer reproducibility, since the density depends on chemical conditions such as the concentration and temperature of the developer. Using the IP therefore improves precision compared with the photographic film method. Figure 11 shows the dependence of the sensitivity of PIXsysTEM on accelerating voltage (Mori, Oikawa, Harada, et al., 1990); the IP system shows its maximum intensity at about 150 keV. Ogura and Nishioka (1995) measured the sensitivity over 40–200 keV for the FDL-5000 system and obtained results similar to those of Figure 11. The decrease below 100 keV is thought to be due to absorption of electrons by the protective layer.
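Linearity over several decades is conveniently checked on a log-log plot, where a detector whose signal is proportional to dose gives a straight line of slope 1, and the dynamic range is the number of decades the line spans. The sketch below fits that slope by least squares. The dose and signal values in the example are synthetic, generated from an ideal linear response with an arbitrary gain; they are not the measured data of Figure 10.

```python
import math

def loglog_slope(dose, signal):
    """Least-squares slope of log10(signal) versus log10(dose).
    A slope of 1 means the signal is proportional to dose."""
    xs = [math.log10(d) for d in dose]
    ys = [math.log10(s) for s in signal]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def decades(values):
    """Dynamic range, in decades, covered by a set of positive values."""
    return math.log10(max(values) / min(values))

# Synthetic example: ideal linear response, gain chosen arbitrarily.
dose = [10.0 ** k for k in range(-13, -7)]       # C/cm^2, six points
signal = [2.5e15 * d for d in dose]              # hypothetical gain
print(loglog_slope(dose, signal), decades(dose))
```

A response that went as the square of the dose would give a slope of 2, so the fitted slope is a direct test of the linearity claimed for the IP.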
Figure 10. Sensitivity characteristics of the IP system (PIXsysTEM). The signal intensity of the IP is plotted. The density curve of FG film, developed by D-19 for 2 min, is also plotted as a reference (Mori, Oikawa, Harada, et al., 1990).
The decrease in the higher-energy region is interpreted as follows. As the electron energy increases, the incident electrons penetrate deeper, and their energy is dissipated mainly in the deeper part of the phosphor layer. The reading light, however, becomes weaker in the deeper part of the layer because of absorption and scattering, and the luminescence from the deeper part is also scattered and attenuated on its way out. As a result, the detected luminescence is weaker. Electrons of much higher energy pass right through the phosphor layer, and the intensity then decreases substantially.

B. Resolution

The IP itself has no discrete pixels; pixels are created by the reader as it samples the electrical signal. The response of the signal chain is therefore very important for resolution.
Figure 11. Dependence of sensitivity on accelerating voltage. The signal intensity of the IP reader (PIXsysTEM) was measured by changing the accelerating voltage of the electrons (Mori, Oikawa, Harada, et al., 1990).
One of the factors determining the resolution is the scattering of the laser beam in the phosphor layer, as discussed in Section III. Another factor is the time response of the luminescence and of the photodetecting system. The decay time of the luminescence, in which it falls to 1/e of its initial intensity, is about 0.6 μs for BaFBr0.85I0.15 : Eu2+; the reading time for one pixel should be longer than this. The response time of the electrical system, which converts the luminescence into a digital electrical signal, should be shorter than the time for one pixel. Of the many ways of evaluating resolution, some researchers use the lattice image of a gold crystal or of graphitized carbon. This method is very practical, but the result is affected by the characteristics of the TEM and by the operating conditions under which the images are taken. A method using a metal wire has also been examined (Burmester et al., 1994; Isoda et al., 1992). The wire was fixed directly on the IP, and uniform electron irradiation cast a shadow of the wire on the IP. The resolution, expressed as the MTF (modulation transfer function), was determined by frequency analysis of the observed image relative to the theoretical image:

$$\mathrm{MTF}(q) = \frac{F_{\mathrm{obs}}(q)}{F_{\mathrm{theo}}(q)}$$

where q is the spatial frequency, F_obs(q) is the amplitude of the Fourier spectrum of the observed shadow profile, and F_theo(q) is the amplitude of the spectrum of the theoretical square-well profile. Instead of Fourier analysis, one may use a metal mask with a pattern of openings of various spatial frequencies (Mori, Oikawa, Harada, et al., 1990). Uniform exposure produces a square-wave pattern on the IP, and the readout amplitude of the pattern declines at higher spatial frequencies. The resolution is then expressed by

$$\mathrm{Response}(q) = \frac{A(q)}{A(0)}$$

where A(q) is the amplitude of the image profile at spatial frequency q. This corresponds to the contrast transfer function (CTF). The MTF and the CTF give almost the same result; however, the MTF is more convenient for theoretical analysis. Figure 12 shows the improvement in resolution by comparing three systems measured with the metal mask method: squares indicate flat-bed scanning with a pixel size of 100 μm, closed circles the PIXsysTEM with a pixel size of 50 μm, and open circles the FDL-5000 with a pixel size of 25 μm. The improvement in resolution is important for TEM systems, since images covering a wider area can be taken at lower TEM magnification.

Figure 12. The resolution of the IP system. The response measured with the metal mask method is summarized. Squares: HR-II IP and CR-101 system; closed circles: UR-III IP and PIXsysTEM (Mori, Oikawa, Harada, et al., 1990). Open circles: FDL-URV and FDL-5000 system (Ogura et al., 1994).
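The wire-shadow method amounts to dividing two Fourier amplitudes. The pure-Python sketch below builds a theoretical square-well shadow, blurs it with a Gaussian point-spread function of hypothetical width to stand in for an observed profile, and evaluates MTF(q) = |F_obs(q)|/|F_theo(q)| with a direct discrete Fourier transform. Because a Gaussian blur has the known MTF exp(−2π²σ²q²), the result can be checked. Frequencies near the zeros of the square-well spectrum must be avoided, since the ratio is undefined there. The profile length, wire width, and blur width are assumptions; real measurements would use the recorded profile and, in practice, an FFT library.

```python
import cmath
import math

def dft_amplitude(profile, k):
    """|F(k)| of a sampled profile at integer frequency bin k (direct DFT)."""
    n = len(profile)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                   for i, v in enumerate(profile)))

def wire_shadow(n, start, width):
    """Theoretical shadow of a wire: 0 under the wire, 1 elsewhere."""
    return [0.0 if start <= i < start + width else 1.0 for i in range(n)]

def gaussian_blur(profile, sigma):
    """Circular convolution with a normalized, sampled Gaussian PSF."""
    n, half = len(profile), int(5 * sigma)
    kernel = [math.exp(-0.5 * (j / sigma) ** 2) for j in range(-half, half + 1)]
    total = sum(kernel)
    kernel = [g / total for g in kernel]
    return [sum(kernel[j + half] * profile[(i - j) % n]
                for j in range(-half, half + 1)) for i in range(n)]

def mtf(observed, theoretical, bins):
    """MTF(q) = |F_obs(q)| / |F_theo(q)| at the requested frequency bins."""
    return [dft_amplitude(observed, k) / dft_amplitude(theoretical, k)
            for k in bins]

if __name__ == "__main__":
    n, sigma = 256, 3.0                     # samples; hypothetical PSF width
    theo = wire_shadow(n, start=118, width=20)
    obs = gaussian_blur(theo, sigma)        # stands in for a measured profile
    for k, m in zip(range(1, 11), mtf(obs, theo, range(1, 11))):
        q = k / n                           # cycles per pixel
        expected = math.exp(-2 * (math.pi * sigma * q) ** 2)
        print(f"q={q:.4f}  MTF={m:.4f}  Gaussian={expected:.4f}")
```

The circular convolution is used so that the discrete convolution theorem holds exactly; with a real shadow the ends of the profile should be flat so that the same assumption is reasonable.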
C. Fading

The intensity of the image stored on the IP decreases with time. Figure 13 shows the fading characteristics of the PIXsysTEM (Oikawa, Shindo, and Hiraga, 1994). The degree of fading depends on temperature, generally increasing at higher temperatures. It also depends on the phosphor itself and on the wavelength of the reading light. There is no precise comparison among the various types of IP, but the differences do not appear to be large. Oikawa, Shindo, and Hiraga proposed empirical equations for fading as a function of temperature, which are useful for estimating the degree of fading. The fading characteristic does not depend on the dose; this is very important because it means that intensities can still be compared after fading. Fading is negligible if the IP is kept cool, but this is not practical in TEM use. For autoradiography, however, cooling should help to increase the sensitivity because of the long exposure times.

Figure 13. Fading characteristics. The change in intensity with time is plotted at 0 and 25 °C. The measurements were made with doses of 1 × 10⁻¹⁰, 10⁻¹¹, and 10⁻¹² C/cm² (Oikawa, Shindo, and Hiraga, 1994).

D. Granularity and Uniformity

Image noise (granularity) directly affects how well details in the image can be perceived; in this sense, noise is another aspect of the sensitivity of the system. Granularity is the pixel-to-pixel fluctuation of intensity and has two main components. One depends on the number of image carriers; the other does not and has a fixed value. The former follows Poisson statistics, (Noise)² = 1/n, where n is the number of image carriers (Dainty and Shaw, 1974); this number changes as the detection process proceeds (Fig. 9). The fixed component is electrical noise or the fixed noise of the IP. The square of the total noise [the reciprocal of the signal-to-noise ratio (S/N)] is the sum of the squares of the individual noise contributions:

$$\frac{1}{(S/N)^2} = \frac{1}{N}\left(1 + \frac{1}{\alpha(I)} + \frac{1}{\alpha(I)\beta} + \frac{1}{\alpha(I)\beta\chi} + \frac{1}{\alpha(I)\beta\chi\delta}\right) + \mathrm{Noise}_{\mathrm{fix}}^2$$

According to this equation, the fixed noise dominates at high dose (large N) and sets a floor on the noise, that is, an upper limit on the signal-to-noise ratio of the system. Conversely, the 1/N term dominates at low dose (small N). Its coefficient is built from α, β, χ, and δ, and α, the efficiency of the IP, appears in every term except the first. The factors β and χ are important because they are less than unity and may become the dominant part of the noise at low dose. Figure 14 shows the noise characteristics of the FDL-5000 system (Ogura and Nishioka, 1995).
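The noise equation above is easy to evaluate. The sketch below computes 1/(S/N)² as a function of the number N of incident quanta per pixel for hypothetical efficiencies and a hypothetical fixed-noise level, and shows the two regimes described in the text: at low dose the squared noise falls as 1/N, and at high dose it levels off at Noise_fix². Because the input quanta obey Poisson statistics, (S_i/N_i)² = N, so the same expression also gives the detective quantum efficiency (DQE) discussed below, as DQE = 1/[N · (1/(S/N)²)]. The parameter values and function names are illustrative assumptions.

```python
def inverse_snr_squared(n, alpha, beta, chi, delta, noise_fix):
    """1/(S/N)^2 from the cascade noise equation: Poisson terms for each
    stage plus a dose-independent fixed-noise term."""
    cascade = (1 + 1 / alpha + 1 / (alpha * beta)
               + 1 / (alpha * beta * chi) + 1 / (alpha * beta * chi * delta))
    return cascade / n + noise_fix ** 2

def dqe(n, **params):
    """DQE = (S/N)_out^2 / (S/N)_in^2, with (S/N)_in^2 = n for Poisson input."""
    return 1.0 / (n * inverse_snr_squared(n, **params))

if __name__ == "__main__":
    params = dict(alpha=100, beta=0.3, chi=0.2, delta=1e5, noise_fix=0.005)  # hypothetical
    for n in (10, 100, 1_000, 10_000, 100_000, 1_000_000):
        print(f"N={n:>9,d}  noise={inverse_snr_squared(n, **params) ** 0.5:.4f}  "
              f"DQE={dqe(n, **params):.3f}")
```

With these values the DQE is close to 1/(1 + 1/α + ...) at low dose and falls once N · Noise_fix² becomes comparable to that sum, which is why a DQE figure is only meaningful together with the dose at which it was measured.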
The noise decreases as the electron dose increases: the noise power is inversely proportional to the number of incident electrons, but the improvement saturates because of the fixed noise. The figure thus shows that the noise follows the relation just given. The efficiency of detectors is often discussed in terms of the detective quantum efficiency (DQE), which is related to the noise characteristics and has the advantage of not depending on the method of detection. The DQE is expressed as

$$\mathrm{DQE} = \frac{(S_o/N_o)^2}{(S_i/N_i)^2}$$

where, as usual, S is signal and N is noise, and the subscripts o and i denote output and input, respectively. The denominator equals the number of incident quanta of ionizing radiation, and the numerator is generally given by the equation in the preceding paragraph. However, when calculating the numerator for the DQE, it is incorrect to use the measured granularity directly, because the finite resolution smooths the noise and makes it appear smaller. To compensate for this, the noise must be analyzed in the frequency domain together with the resolution. For 80-kVp X-rays at 1 mR, Ogawa et al. (1995) reported a DQE of 0.2 at 1 line pair per mm for the FCR 9000/ST-V system. This is an accurate way of characterizing a system, but it is laborious because it requires resolution data (the MTF) and FFT processing of the data (Dainty and Shaw, 1974). A convenient alternative, sometimes used to minimize the effect of the response, is to calculate the DQE with larger pixels, such as 3 × 3 blocks, although information about the frequency dependence is lost; in this case the value should be discussed together with the resolution. Using this convenient method, Burmester et al. (1994) estimated the DQE of their IP system to be about 0.9 for 120-keV electrons at about 10⁻¹³ C/cm². Ogura and Nishioka (1995) also calculated the DQE of the FDL-5000 from the data of Figure 14, taking care to measure the electron dose accurately, and found a value of almost unity for 100-keV electrons in about the same dose region.

Figure 14. Noise characteristics of the IP system (FDL-5000; Ogura and Nishioka, 1995).

The DQE result of Ogawa et al. differs greatly from that of Burmester et al. We suppose that this is due to the difference in ionizing radiation (X-rays versus electrons). Electrons in this energy range are all stopped in the phosphor, as predicted by the Katz–Penfold equation, whereas more than 50% of the X-ray photons pass through the phosphor layer. The efficiency α is therefore very different for the two kinds of radiation.

Another factor enters when the continuous signal is converted to a digital signal: the density resolution of the signal. If the quantization step is not as small as the noise level of the image data, the pattern will show artifacts such as false contours, or the precision of quantitative analysis will be degraded. On the other hand, too fine a density resolution wastes memory and image-processing time. Normally, the data are transformed logarithmically, as expressed by

$$I = A \cdot 10^{L\,Q/m}$$

where L is the dynamic range of the image in decades, m is the density resolution, given by the bit number p as m = 2^p, and Q denotes the digital data. The fractional change in intensity between Q and Q + 1, sometimes called the quantization error, is

$$D = \frac{L}{m}\,\ln 10$$

D should be about the same as the image noise. For example, with L = 4 and a noise level of 0.4%, m should be at least 4 ln 10/0.004 ≈ 2300, so a density resolution of 12 bits (4096 levels) is needed; at 10 bits (1024 levels), D is about 0.9%. The density resolution should be selected according to the application, because the required signal-to-noise ratio depends on the field of application.

Uniformity of sensitivity is important for quantitative analysis. In the flat-bed scanner, the uniformity of the laser light intensity and of the light-collection efficiency governs the uniformity characteristics. These nonuniformities are reproducible and can in principle be calibrated by the system.
In some systems, the calibration is performed automatically, and the user need not be aware of this factor. Nonuniformity of the IP itself arises mainly from variations in the thickness of the phosphor layer. Amemiya et al. (1988) reported a uniformity error of about 1.3% and concluded that this degree of uniformity is sufficient for their X-ray diffraction analysis.
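The logarithmic quantization described above can be checked numerically. With I = A · 10^(L·Q/m), consecutive codes differ by the factor 10^(L/m), so the fractional step is exactly 10^(L/m) − 1, which is approximately (L/m) ln 10 when L/m is small. The sketch below evaluates the step for a given bit depth and finds the smallest bit depth whose step does not exceed a target noise level; the function names are ours.

```python
import math

def quantization_step(dynamic_range_decades, bits, exact=False):
    """Fractional intensity change between consecutive codes Q and Q+1 for
    I = A * 10**(L * Q / m), with m = 2**bits levels."""
    m = 2 ** bits
    ratio = dynamic_range_decades / m
    return 10 ** ratio - 1 if exact else ratio * math.log(10)

def bits_needed(dynamic_range_decades, noise):
    """Smallest bit depth p for which the step D = (L/m) ln 10 <= noise."""
    m_min = dynamic_range_decades * math.log(10) / noise
    return math.ceil(math.log2(m_min))

if __name__ == "__main__":
    L = 4                                   # four decades of dynamic range
    for p in (8, 10, 12, 14):
        print(f"{p:2d} bits: D = {100 * quantization_step(L, p):.3f} %")
    print("bits for 0.4 % noise:", bits_needed(L, 0.004))
```

For four decades, 10 bits gives a step of about 0.9%, 11 bits about 0.45%, and 12 bits about 0.22%, so a 0.4% noise level calls for 12 bits under this criterion.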
VI. Practical Systems

In the previous sections, we discussed the principles of the IP system and dealt with the basic ideas. In this section, we consider practical systems.
A. Transmission Electron Microscope (TEM) System

Figure 15 shows the layout and components of the FDL-5000 TEM system (Ogura et al., 1994). The IP can be used in the TEM just like photographic film, because an ordinary film cassette can hold the IP and be used with the film magazine of the TEM. After removal from the cassette, the exposed IPs are loaded into the magazine of the reading system. Once information such as the TEM operating conditions, sample names, and reading parameters has been entered, the reader reads all the IPs automatically. The data from each IP are stored on the digital data storage (DDS) unit while the reader is reading. If a printer is connected to the IP reader, hard copies of the images are produced at the same time. The image data in the DDS are transferred to the processor, where they are processed and displayed. Since the processor is independent of the reader, image capture and image analysis can be performed separately. The IP measures about 94 × 75 mm, the pixel size is 25 μm, and the data volume is about 23 MB per plate.

Figure 15. Transmission electron microscope (TEM) system (FDL-5000). In this configuration, the IP is used with the TEM cassette in the TEM and with an IP magazine in the reader. Data from the IP are transferred to the computer system on DDS data storage media, because the quantity of data in the system amounts to several tens of megabytes and the data transfer time is not negligible. Separating data processing from reading takes advantage of the independent operation of each step (Ogura et al., 1994).

Conventionally, photographic film or a TV camera system has been used in the TEM (Reimer, 1984). Burmester et al. (1994) summarized the DQE of image devices as less than 0.35 for photographic film and 0.4–0.7 for slow-scan charge-coupled devices (SS-CCDs; Kujawa and Krahl, 1992). They also reported that the DQE of their own IP system was about 0.9, as discussed in Section V.D. This high efficiency is one of the merits of the IP system. The high sensitivity not only protects the sample from electron-beam damage but also makes it possible to use a high-speed shutter, which helps avoid deterioration of image quality due to vibration of the sample. Since the pixel size is 25 μm and the resolution limit of the naked eye is about 100 μm, the image can be enlarged four times linearly (16 times in area) without looking unnatural. This digital enlargement introduces none of the distortion caused by the optics of an enlarger in photographic systems.
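The plate-level numbers quoted for the FDL-5000 fit together, as the short calculation below shows. It assumes, although the text does not say so, that each pixel is stored in 2 bytes, the natural container for a 10- to 16-bit density value. The minimum reading time uses the 0.6-μs luminescence decay time from Section V.B as a lower bound on the dwell time per pixel; a real reader dwells longer. The enlargement factor follows from the 100-μm resolution limit of the eye. The function name and its defaults are hypothetical.

```python
def plate_summary(width_um, height_um, pixel_um, bytes_per_pixel=2,
                  decay_time_s=0.6e-6, eye_limit_um=100):
    """Pixel count, data volume, a lower bound on read time, and the area
    enlargement at which single pixels stay below the eye's limit."""
    nx, ny = width_um // pixel_um, height_um // pixel_um
    pixels = nx * ny
    return {
        "pixels": (nx, ny, pixels),
        "megabytes": pixels * bytes_per_pixel / 1e6,
        "min_read_time_s": pixels * decay_time_s,
        "area_enlargement": (eye_limit_um / pixel_um) ** 2,
    }

if __name__ == "__main__":
    # 94 x 75 mm plate read at 25-um pixels (values from the text),
    # with sizes given in integer micrometers to avoid rounding issues.
    for key, value in plate_summary(94_000, 75_000, 25).items():
        print(key, value)
```

About 11.3 million pixels at 2 bytes each give roughly 22.6 MB, consistent with the quoted 23 MB, and even the decay-time bound alone implies several seconds per plate.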
B. Computed Radiography and Radio Luminography System

The IP system was first used in the medical field for X-ray imaging, where the technique is called computed radiography (Tateno et al., 1987). High sensitivity helps to reduce the dose to the patient. Digital images make it possible to build a picture archiving and communication system (PACS) and allow comparative diagnosis between distant hospitals by transmitting the images. The IP system is widely used in this field, and various systems are now available. A built-in system, in which the IP circulates inside one unit that performs exposure, reading, and erasing, is very convenient for examinations. Applications such as TEM, autoradiography, and X-ray diffraction are called radio luminography. In these fields, the most popular scanner is the flat-bed type, with the spinner type used for high resolution. The IP system was evaluated for X-ray diffraction in 1986 (Miyahara et al., 1986). Its high DQE and wide dynamic range, together with the absence of count-rate limitation, resulted in a significant reduction of exposure time. The IP has thus helped protein crystallographers obtain accurate measurements in a shorter time. The shorter exposure also reduces beam damage, so a full data set can be obtained from only one sample, whereas with photographic film many samples are needed, which degrades the accuracy of the data. This is why the IP system has led to much progress in this field (Amemiya et al., 1988; Sakabe, 1991). In X-ray diffraction, the combination with a synchrotron-radiation source has been most successful, but the IP system should also be promising for use with a conventional laboratory-scale X-ray source (Sato et al., 1993). In the biotechnology industry, autoradiography is commonly used to analyze gene and protein sequences.
Since exposure times with photographic film conventionally range from a day to a month, the reduction of exposure time by a factor of more than 10 with the IP system is very useful (Amemiya and Miyahara, 1988). In addition, the radioactivity of part of the sample can be measured by image processing, without cutting out that part of the sample and measuring it with a liquid scintillation counter. These merits raise the importance of the autoradiography method.

Neutrons are used for nondestructive testing, such as neutron radiography to inspect organic material inside a metal vessel, and for neutron diffraction analysis to locate hydrogen atoms in proteins. However, the conventional IP is not sensitive to neutrons. Niimura et al. (1994) developed an IP that contains a Gd or Li compound in the phosphor layer. Gd and Li have large neutron-absorption cross sections, and after absorbing neutrons they emit secondary radiation (gamma rays, electrons, or charged particles) that can be detected by the phosphor. These researchers demonstrated the merits of this system, including neutron radiography with the IP. Katto et al. (1993) measured the beam profile of an ultraviolet (UV) laser with the IP for tritium. Since the BaFX : Eu2+ phosphor is sensitive to UV and vacuum ultraviolet (VUV) light (Iwabuchi, Mori, et al., 1994), the IP is a valuable imaging device in this region. Nishikawa, Akimoto, et al. (1994) recorded field-emission and field-ion microscope images with the IP, the latter formed by He+ or Ne+ ions. They showed the possibility of quantitatively analyzing electron tunneling and field ionization probabilities over individual surface atoms.

It is the combination of these characteristics (sensitivity, dynamic range, resolution, and a large effective area) that makes the IP system superior. In individual characteristics, other imaging systems may be better. For example, the film system has good resolution and a wide effective area, but its sensitivity and dynamic range are insufficient. The TV camera system has good sensitivity, spatial resolution, and time resolution, but its effective area is small. The IP system does not suffer from the drawbacks of the film system and is well suited to imaging ionizing radiation. Furthermore, the ease of handling of the IP system deserves comment.
The IP itself needs no electric power. It is merely a thin plate, and the only essential precaution is to exclude stray light. Reading out the plate requires a large precision reader, but this equipment is not needed at the exposure stage. This ease of handling is another merit of the IP system. Thus, the IP can be applied to many fields of imaging: electromagnetic waves from the UV region to the gamma-ray region, electrons, ion beams, and neutrons. Its characteristics overcome the drawbacks of conventional image sensors. With the development of new types of IPs, such as those for neutron imaging, this new technology, called radioluminography, will expand its field of application and become even more valuable.
VII. Applications of the IP

In this section, application data obtained by many researchers are introduced to illustrate the advantages of the IP. The application fields in which the IP is expected to show its performance are listed in Table 1. In these fields, observation has been limited by conventional photographic film, and use of the IP is expected to break through those limitations.

NOBUFUMI MORI AND TETSUO OIKAWA

TABLE 1
Likely Fields of Application of the IP

No.   Advantages of the IP
1     High sensitivity
2     Wide dynamic range
3     Linear sensitivity
4     High-precision digital image
5     Dry system and others

Application fields
a) Observation of beam-sensitive specimens
b) Data acquisition with high-speed shutters (low- and high-temperature stages, etc.)
c) Dark-field and weak-beam methods
d) High-contrast images
e) Electron diffraction and CBED(a) patterns
f) Electron intensity measurement
g) Quantitative image analysis
h) Image processing
i) Image contrast enhancement
j) Image filing and retrieval
k) Reduction of personnel

(a) CBED, convergent-beam electron diffraction.
A. High Sensitivity

In this section, application data illustrating the high sensitivity of the IP are introduced. For example, Ayato et al. (1990) applied the IP to TEM observation of silver bromide microcrystals, a typical electron-sensitive material. Silver bromide (AgBr) microcrystals are so susceptible to beam irradiation damage that they are destroyed during recording at room temperature with conventional photographic film, which makes recording difficult. Figure 16a shows AgBr microcrystals destroyed during exposure with conventional photographic film. The authors therefore reduced the electron dose by a factor of 100 by using the IP and succeeded in recording AgBr microcrystals without destroying them (Fig. 16b). The high sensitivity of the IP thus made it possible to record images of the silver bromide microcrystals at room temperature with very little irradiation damage, by reducing the electron dose at the specimen. In low-dose observation, the IP is of great use for recording an image with good contrast even at low electron intensity, because its response to exposure remains linear even at low exposure levels. Another example is the measurement of electron irradiation damage to a polyethylene single crystal (Oikawa, Shindo, Kudoh, et al., 1992). The degree of specimen damage was evaluated from the degree of intensity fading of an
Figure 16. Electron micrographs of silver bromide microcrystals taken at room temperature (direct magnification: ×15,000). (a) Recorded with conventional photographic film (Fuji FG). Electron dose: 700 electrons/nm². (b) Recorded with the IP. Electron dose: 7 electrons/nm².
electron diffraction spot from the specimen (Kobayashi and Sakaoku, 1964). Figure 17 shows electron diffraction patterns of a polyethylene single crystal. These diffraction patterns were obtained at an accelerating voltage of 200 kV and an extremely low electron dose rate, 1 electron/(nm² · s). Moreover, the exposure time was set to 0.1 s to improve the time resolution of each image during the irradiation. Figure 17a shows an electron diffraction pattern taken by irradiating a fresh field of view with the electron beam; even higher-order diffraction spots are clearly visible. Figure 17b shows a pattern taken after a dose of 600 electrons/nm². The logarithmic intensity distributions of the two patterns along the horizontal lines are also shown in the figures. Figure 18 shows, as three-dimensional distributions, the background subtraction of the intensity of (200) diffraction spots irradiated with 200-kV electrons at doses of 250 and 480 electrons/nm². Figures 18a and 18d show the original intensity distributions of the (200) diffraction spots. Figures 18b and 18e show background intensity distributions obtained by a background fitting method (Shindo, Hiraga, Iijima, et al., 1993). Figures 18c and 18f show the net intensity distributions of the (200) spots after background subtraction. Figure 19 shows the changes in the net intensity distribution of the (200) diffraction spots, after background subtraction, following irradiation with
Figure 17. Electron diffraction patterns of a polyethylene single crystal with a thickness of about 10 nm, and their intensity distributions obtained by the IP (Oikawa, Shindo, Kudoh, et al., 1992) (200 kV, room temperature). (a) Electron dose: 0.1 electrons/nm² (fresh field of view). (b) Electron dose: 600 electrons/nm².
0.1, 250, and 480 electrons/nm². Integrating the spot intensity allowed the change of the diffraction intensity with electron irradiation to be measured. Figure 20 shows the change in the integrated intensity of the (200) reflection spots at 200 and 100 kV. In this case, the incident electron intensity was taken as the total intensity of the diffraction pattern, and the integrated (200) reflection intensities were normalized to this incident intensity. Under the same irradiation conditions, the reflection intensity at 100 kV fades more rapidly than that at 200 kV. At 200 kV, the reflection intensity after 730 electrons/nm² had faded to one twentieth of its original value, whereas at 100 kV the intensity after 480 electrons/nm² had faded to one tenth of its original value. Because of its wide dynamic range, the IP records both high intensities (diffraction spots) and weak intensities (halo rings) in a single image. In addition, its linear response allows quantitative measurement of
Figure 18. Background subtraction process for the intensity of (200) electron diffraction spots irradiated with 200-kV electrons at doses of (a–c) 250 electrons/nm² and (d–f) 480 electrons/nm².
the beam intensity. Furthermore, the high sensitivity of the IP, combined with a high-speed shutter, allows each exposure to be made with a very low dose. The intensity fading of the diffraction spots of polyethylene under electron irradiation had already been measured by the X-ray diffraction method (Kawaguchi, 1979). However, the electron diffraction method is more useful
Figure 19. Change of intensity distribution of (200) diffraction spots of polyethylene irradiated with 200-kV electrons: (a) 0.1 electrons/nm², (b) 250 electrons/nm², (c) 480 electrons/nm².
Figure 20. Change of diffraction intensity of the integrated (200) reflection spots irradiated with 200-kV (closed circles) and 100-kV (open circles) electrons.
than the X-ray diffraction method because the electron diffraction intensity is recorded simultaneously from the same specimen field of view, during electron irradiation.
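The spot-intensity measurement described above can be sketched in Python: a fitted background is subtracted from a window around a diffraction spot, the net intensity is integrated, and it is normalized by the total intensity of the pattern, which the text uses as the incident intensity. The cited background fitting method (Shindo, Hiraga, Iijima, et al., 1993) is not described in this chapter, so the sketch assumes a least-squares plane fitted to the border pixels of a square window; the function names, window size, and synthetic data are hypothetical.

```python
import numpy as np


def net_spot_intensity(image, center, half_width):
    """Integrated net intensity of one diffraction spot.

    Assumption (not the published method): the background under the spot
    is a plane a*x + b*y + c fitted by least squares to the border pixels
    of a square window of side 2*half_width + 1 centred on the spot.
    """
    r, c = center
    win = image[r - half_width:r + half_width + 1,
                c - half_width:c + half_width + 1].astype(float)
    n = win.shape[0]
    yy, xx = np.mgrid[0:n, 0:n]
    border = np.ones((n, n), dtype=bool)
    border[1:-1, 1:-1] = False
    # Fit the plane to the border pixels only.
    design = np.column_stack([xx[border], yy[border], np.ones(border.sum())])
    coef, *_ = np.linalg.lstsq(design, win[border], rcond=None)
    background = coef[0] * xx + coef[1] * yy + coef[2]
    return float((win - background).sum())


def normalized_spot_intensity(image, center, half_width):
    """Net spot intensity divided by the total pattern intensity,
    which stands in for the incident intensity."""
    return net_spot_intensity(image, center, half_width) / float(image.sum())


# Synthetic example: tilted planar background plus a Gaussian spot whose
# amplitude drops to one twentieth after (hypothetical) irradiation.
yy, xx = np.mgrid[0:64, 0:64]
bg = 10.0 + 0.05 * xx + 0.02 * yy


def pattern(amplitude):
    return bg + amplitude * np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / 8.0)


fresh = net_spot_intensity(pattern(1000.0), (32, 32), 10)
dosed = net_spot_intensity(pattern(50.0), (32, 32), 10)
fading_ratio = dosed / fresh
print(round(fading_ratio, 3))  # prints 0.05
```

Because the planar background is removed exactly and the fit is linear in the data, the net intensity scales with the spot amplitude, so the fading ratio is recovered regardless of the background tilt. With real data, `center` would be the measured (200) spot position and `half_width` chosen so that the window border lies outside the spot.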
B. Wide Dynamic Range

In this section, application data illustrating the wide dynamic range of the IP are introduced. Since a convergent-beam electron diffraction (CBED) pattern has an intensity range covering about three orders of magnitude, the entire pattern cannot be recorded in a single image with conventional photographic film. With the IP, the dynamic range covers four orders of magnitude in a single image, which allows all the intensities of a CBED pattern to be recorded. Figure 21a shows a CBED pattern recorded using the IP. Figure 21b shows a line profile (intensity distribution) through the center of Figure 21a (indicated by the horizontal line). This profile shows that the pattern was recorded without saturation or loss from the center to the periphery of the CBED pattern, demonstrating the width of the dynamic range. Figure 22 is a form of contour-map presentation, obtained by dividing the intensity range of the image of Figure 21a into 16 parts and rendering the intensity steps of successive parts alternately white and black (Oikawa,
Figure 21. CBED patterns taken with the IP and a JEM-2000FX II TEM at 100 kV. The specimen was a silicon (111) single crystal. (a) Low-contrast print. (b) Line profile (the intensity distribution) of part (a).
Mori, et al., 1990). This presentation not only allows the pattern of the entire image to be recognized but is also effective for extracting the features of the fine structures. With the IP, which has a high intensity resolution (4096 gray levels), contrast enhancement and image analysis can be carried out with high precision. Shindo, Hiraga, Oikawa, et al. (1990) quantitatively analyzed electron diffraction patterns of a Cu3Pd alloy by making good use of the wide dynamic range and good linearity of the IP. The intensities of both fundamental and superlattice reflections of the alloy, which has a one-dimensional, long-period superstructure, were measured in situ as a function of temperature. The quantitatively evaluated intensity changes of the superlattice reflections clearly show the characteristic disordering process of the Cu3Pd alloy. It was demonstrated that quantitative structure analysis by electron diffraction is possible with the IP if the dynamical diffraction effect is taken into account. In this study, by measuring the intensities of the superlattice reflections and the short-range-order diffuse scattering, the researchers quantitatively investigated the order–disorder transition of the Cu3Pd alloy, using the advantages of the IP: a wide dynamic range and good linearity for the electron beam.
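The alternating white-and-black rendering used for Figure 22 can be sketched as below. The text states only that the intensity range was divided into 16 parts; this sketch assumes 16 equal linear steps between the minimum and maximum intensity, and the function name is hypothetical.

```python
import numpy as np


def alternating_band_map(image, n_bands=16):
    """Binary contour-like map: 1 (white) for even-numbered intensity
    bands, 0 (black) for odd-numbered ones.

    The range [min, max] of the image is split into n_bands equal linear
    steps (an assumption); band edges then appear as black/white borders.
    """
    img = np.asarray(image, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.ones(img.shape, dtype=np.uint8)
    band = np.floor((img - lo) / (hi - lo) * n_bands).astype(int)
    band = np.clip(band, 0, n_bands - 1)  # the maximum falls in the top band
    return (band % 2 == 0).astype(np.uint8)


# A linear ramp over 16 pixels yields one pixel per band: 1, 0, 1, 0, ...
ramp = np.arange(16)
print(alternating_band_map(ramp).tolist())
```

Each black/white border marks one intensity step, so the density of borders shows how steeply the intensity changes, much as in the contour map of Figure 22.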
Figure 22. A contour map of the data in Figure 21a, showing that the intensity is recorded well over the whole pattern (Oikawa, Mori, et al., 1990).
In Figure 23, an electron diffraction pattern of Cu3Pd obtained with the IP is shown. The original signal intensities of 4096 gray levels were simply converted to 256 gray levels for the output; that is, each group of 16 consecutive gray levels in the original data was converted into 1 gray level in the output print of the diffraction pattern. The electron diffraction pattern shows sharp superlattice reflections, labeled A1, A2, B1, B2, and C, which indicate a one-dimensional, long-period superstructure. In a single-crystal film, superlattice reflections from three variants are usually observed; the spots A, B, and C indicated in the pattern correspond to these three variants. The reflections A1 and B1 correspond to the periodicity of the basic ordered structure of the L1₂ type, whereas A2, A3, B2, and B3 correspond to the periodicity of the long-period superstructure along each direction. By measuring the separation of superlattice reflections such as A2 and A3, the researchers obtained the period of the one-dimensional, long-period superstructure as M = 3.6. Figures 24a and 24b are electron diffraction patterns observed with the IP after the alloy was heated to 823 K in the electron microscope. Figure 24a is output in the same manner as Figure 23, whereas in Figure 24b only the original signal intensities below gray level 1400 were converted into 256 gray levels; the gray levels above gray level 1400 were
Figure 23. Electron diffraction pattern of a single-crystal Cu3Pd observed by the IP. A1–A3, B1–B3, and C indicate the superlattice reflections corresponding to three variants.
set to the value 256 for the output. Note that in Figure 24a the superlattice reflections observed sharply in Figure 23 become faint. In Figure 24b, however, diffuse scattering broadened around the positions of reflections such as A2 and A3 is clearly observed, which suggests the existence of a short-range-ordered state, although the intensities of the transmitted beam and the fundamental reflections are saturated in this case.
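The two output conversions described for Figures 23 and 24 can be sketched as follows: a plain 4096-to-256 mapping that merges each group of 16 input levels, and a window that spreads only the levels below 1400 over the output range and saturates the rest. The sketch assumes 0-based 8-bit output levels (0 to 255), so the text's saturation value "256" corresponds to level 255 here; the function names are hypothetical.

```python
import numpy as np


def to_8bit_linear(data12):
    """4096 input levels (0-4095) -> 256 output levels: every 16
    consecutive input levels become one output level."""
    return (np.asarray(data12, dtype=np.int64) // 16).astype(np.uint8)


def to_8bit_window(data12, upper=1400):
    """Spread input levels 0 .. upper-1 over the 256 output levels and
    saturate everything at or above `upper` to the top level (255)."""
    d = np.asarray(data12, dtype=np.int64)
    return np.clip(d * 256 // upper, 0, 255).astype(np.uint8)


levels = [0, 15, 16, 700, 1399, 1400, 4095]
print(to_8bit_linear(levels).tolist())   # [0, 0, 1, 43, 87, 87, 255]
print(to_8bit_window(levels).tolist())   # [0, 2, 2, 128, 255, 255, 255]
```

The windowed mapping assigns about 4096/1400 ≈ 2.9 times as many output levels per input level as the plain mapping, at the cost of saturating the strong transmitted and fundamental beams, which is the trade-off that makes the weak diffuse scattering visible in Figure 24b.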
Figure 24. (a) Electron diffraction pattern of Cu3Pd after heating to 823 K in an electron microscope. The conversion of the original intensity into the output is the same as in Figure 23. (b) The same electron diffraction pattern as that in part (a), but only the gray levels less than level 1400 of the original intensity were converted into 256 gray levels in the output print.
Figure 25. The change of electron diffraction patterns as a function of temperature. The intensity distribution of the superlattice reflection (or diffuse scattering) and that of the fundamental reflection are represented as a contour map. The area of the electron diffraction patterns corresponds to the square of that of the electron diffraction pattern shown in Figure 23.
In Figure 25, the intensity distribution of the electron diffraction patterns is plotted as a contour map to make clear how it changes with increasing temperature. Note that even the intensity of the fundamental reflection is not saturated, owing to the wide dynamic range of the IP. Although reflections such as A2 and B1 come from different regions belonging to different variants, it was possible to compare these two reflections quantitatively to examine the disordering process, on the assumption that the two variant regions have almost equal thickness. This pair was chosen because a small drift of the sample was noticed during heating, so intensity variations due to changes in the excitation errors could be considerable if superlattice reflections situated relatively far from each other, such as A2 and A1, were compared. It is interesting to point
out that the intensity of superlattice reflections such as A2, which corresponds to the periodicity of the one-dimensional, long-period superstructure, decreases first, and that above 790 K the intensity of the superlattice reflection B1 decreases next. The different rates at which the intensities of these superlattice reflections decrease with increasing temperature are consistent with another report (Hirabayashi and Ogawa, 1957), which indicates that disordering occurs preferentially at the antiphase boundaries of the long-period superlattice, leaving a fairly highly ordered state between the boundaries below 790 K. By utilizing the IP, the researchers quantitatively analyzed the disordering process of Cu3Pd by measuring the intensities of both superlattice reflections and fundamental reflections. The characteristic disordering process and the transition to the short-range-order state were quantified from the in situ experiment. It was demonstrated that the IP can be used for quantitative analysis when dynamical diffraction effects are taken into account.
C. Quantitative Image Analysis

In this section, application data of quantitative image analysis illustrating the linear response of the IP are introduced. For instance, Shindo, Hiraga, Oku, et al. (1991) quantitatively observed high-resolution electron microscope (HREM) images of W8Ta2O29 by using the IP with a 400-kV electron microscope. Figure 26 is an example of an HREM image taken with the IP. The specimen was W-Ta-O; the image was recorded with a JEM-4000EX HREM at an accelerating voltage of 400 kV, a direct magnification of ×1,500,000, a current density of 10 pA/cm², and an exposure time of 2 s. The image data were subjected to contrast adjustment and ×2 magnification by using the image-processing software of the IP processor (Oikawa, Mori, et al., 1990). An original print magnified ×1.8 with the IP printer (×3.6 in total) was used directly for reproduction. Figure 27 shows a three-dimensional presentation of the electron intensity distributions in areas a and b of Figure 26, as measured from the IP. In area a (where the specimen is thin), the measured intensity is lowest at the heavy atomic columns (indicated by arrows H in Fig. 27a), in good agreement with the projected potential of the atoms in the structure model (the inset in Fig. 26). In area b (where the specimen is a little thicker), the intensity is highest in the low-potential regions (indicated by arrows L in Fig. 27b). This quantitative measurement thus made it clear that region b was subjected to a strong dynamical diffraction effect. Likewise, Shindo, Oku, et al. (1994) quantitatively observed an HREM image of the high-Tc superconductor Tl2Ba2Cu1Oy by using the IP. In order to evaluate quantitatively the difference between the intensity of the
Figure 26. Example of an HREM image (Shindo, Hiraga, Oku, et al., 1991). Specimen was W-Ta-O and accelerating voltage was 400 kV.
observed image and that of calculated images, the researchers calculated a residual index RHREM for 743 sampling points in the unit cell projected along the [010] direction. Although the compound has a rather complicated layered structure, RHREM = 0.0473 was obtained by choosing the experimental parameters and taking into account the partial occupancy of Tl atoms. On the basis of the analysis of the HREM image of Tl2Ba2Cu1Oy, several requirements for further refinement of crystal structure analysis by quantitative HREM were discussed. The observed intensity of the HREM image was compared with the calculated intensity while changing experimental parameters such as the crystal thickness and defocus value. So that the difference between the intensity of an observed image and that of calculated images could be evaluated quantitatively, a residual index RHREM, which indicates the accuracy of the simulated images, was
Figure 27. Three-dimensional presentation of intensity distribution, measured from the image data in Figure 26 (Shindo, Hiraga, Oku, et al., 1991).
introduced and evaluated. In the final refinement to reduce the value of RHREM, partial occupancy of Tl atoms was taken into account. On the basis of the quantitative analysis of the HREM image of Tl2Ba2Cu1Oy, some requirements for quantitative HREM were pointed out and briefly discussed in comparison with those for the standard X-ray and neutron diffraction methods. The HREM study was carried out with a JEM-4000EX electron microscope. HREM images were recorded on the IP and converted into digital data (2048 × 1536 pixels, 4096 gray levels) at the JEOL Laboratory. After the image intensity was examined in the image-processing system (PIXsysTEM) (Oikawa, Mori, et al., 1990), the digital data were transferred to Tohoku University on magnetic tape and analyzed there with an engineering workstation (Sun: Argoss 5230) and a mainframe (NEC: ACOS-2020). An HREM image of Tl2Ba2Cu1Oy is shown in Figure 28. The incident electron beam was parallel to the [010] direction. The image was taken with a 2-s exposure at a direct magnification of ×1,500,000. Note that the image was observed with a defocus value rather smaller than the so-called Scherzer focus value (∼48 nm). Although the image was recorded with 2048 × 1536 pixels and 4096 gray levels, only a 1024 × 1024-pixel part is output with 256 gray levels in Figure 28. In the image, small dark dots show heavy-atom positions projected along the incident electron beam. In Figure 29, the number of pixels used for recording this HREM image is shown as a function of the gray level. Although the number of gray levels
Figure 28. HREM image of Tl2Ba2Cu1Oy recorded with the IP. The small rectangle shows a unit cell of Tl2Ba2Cu1Oy (Shindo, Oku, et al., 1994).
needed for recording HREM images seems to be much smaller than that for electron diffraction patterns, about 1000 gray levels were actually used for recording this HREM image. A model of the atomic arrangement of Tl2Ba2Cu1Oy, proposed earlier from an X-ray diffraction study (Parkin et al., 1988), is presented in Figure 30a. In Figure 30b, the intensity distribution of a part of the image near the crystal edge is shown as a contour map. The rectangles in the model of Figure 30a and in the intensity distribution of Figure 30b indicate unit cells of Tl2Ba2Cu1Oy, which has a tetragonal structure with the lattice constants a = 0.3866 nm and c = 2.324 nm. To remove noise such as quantum noise, the contour map was produced by smoothing the data over 2 × 2 sampling points and averaging the intensity after displacing the image
Figure 29. Number of pixels as a function of the gray level used for recording the HREM image of Figure 28.
by +a and −a. Even after the averaging process, there is a small asymmetry around metal atom positions in the contour map. The asymmetry is considered to come from the crystal thickness change. The observed intensity of the HREM image was divided by the intensity of the incident electron beam, which was measured at the vacuum region near the crystal edge. Thus, the normalized observed intensity can be directly compared with the calculated intensity without any scaling factor. Although the contour map reveals the detailed intensity distribution of the HREM image, it is not easy to distinguish the intensity maxima from the minima, since both intensity maxima and minima appear as similar dense contour lines. As a way to make a detailed investigation of both high intensity and low intensity, which may correspond to low and high potential regions, respectively, the contour map of Figure 30b was separated into two contour maps as shown in Figures 30c and 30d. In Figure 30c, the grid
Figure 30. (a) Structure model of Tl2Ba2Cu1Oy. (b) Contour map showing the intensity distribution of the HREM image of Tl2Ba2Cu1Oy in Figure 28. (c) High-intensity region of the contour map (b). The grid corresponds to the sampling points at which the observed and calculated intensities were compared to evaluate a residual index RHREM. (d) Low-intensity regions of the contour map (b).
indicates the positions where the observed intensities were measured with the IP. There were 743 sampling points on the grid in the unit cell, and the observed intensities at these points were compared with the calculated ones. In the contour map of Figure 30d, which shows low intensity, the heavier atomic columns of Tl and Ba can be easily distinguished from those of Cu. Note that there is no marked difference between the density of the contour lines at the Tl site and that at the Ba site, although the potential of Tl atoms is much larger than that of Ba atoms. This is taken into account in the refinement of the computer simulation that follows. An image calculation was first carried out based on the structure model suggested by the X-ray diffraction study, shown in Figure 30a. So that the difference between the observed intensity and the calculated one could be
evaluated, a residual index RHREM was calculated:

$$ R_{\mathrm{HREM}} = \frac{\sum |I_{\mathrm{obs}} - I_{\mathrm{cal}}|}{\sum I_{\mathrm{obs}}} \qquad (1) $$
RHREM is an index for the observed and calculated image intensities and is basically different from the so-called R-factor, or residual index, generally used in diffraction studies, where the index is evaluated for the absolute values of the structure factors. In Eq. (1), Σ denotes summation over the sampling points in the unit cell, which number 743 in this study and correspond to the grid of Figure 30c. To obtain smaller values of RHREM, parameters that depend on the experimental conditions (i.e., crystal thickness, defocus, and chromatic aberration) were varied. With the structure model of Figure 30a, RHREM = 0.0506 was obtained with the experimental parameters shown in Table 2, where the parameters that were varied to obtain a smaller RHREM are marked with asterisks. Images simulated with RHREM = 0.0506 are shown in Figure 31, where three types of contour maps (whole intensity, higher intensity, and lower intensity) are presented in Figures 31a through 31c in the same manner as the observed images shown in Figures 30b through 30d, respectively. To show how RHREM varies with the calculation parameters, RHREM was plotted as a function of crystal thickness t and of defocus Δf, as shown in Figures 32 and 33. In the calculation of Figure 32, all parameters except crystal thickness were set equal to those in Table 2. RHREM is smaller than 0.07 in the crystal thickness range 4–6 slices. Figure 33a shows the variation of RHREM as a function of defocus Δf in the range 5–75 nm; RHREM is smaller than 0.06 in the range 15–45 nm. Figure 33b shows the fine variation of RHREM as a function of Δf in the range 14–35 nm.
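Equation (1) and the parameter search described above can be sketched as follows. The simulated intensities would come from a multislice image calculation that is not shown; here they are hypothetical precomputed arrays, keyed by illustrative (thickness, defocus) tuples and sampled at the same grid points as the observation. The normalization by the mean vacuum intensity follows the text's description of how the observed image was scaled; all names and numbers are hypothetical.

```python
import numpy as np


def normalize_by_vacuum(image, vacuum_mask):
    """Divide observed intensities by the incident-beam intensity,
    estimated as the mean over a vacuum region (boolean mask)."""
    img = np.asarray(image, dtype=float)
    return img / img[vacuum_mask].mean()


def r_hrem(i_obs, i_cal):
    """Residual index of Eq. (1) over the given sampling points:
    sum |I_obs - I_cal| / sum I_obs."""
    i_obs = np.asarray(i_obs, dtype=float)
    i_cal = np.asarray(i_cal, dtype=float)
    return float(np.abs(i_obs - i_cal).sum() / i_obs.sum())


def best_parameters(i_obs, simulations):
    """Return the parameter key whose simulated intensities give the
    smallest R_HREM, together with that value."""
    scores = {key: r_hrem(i_obs, sim) for key, sim in simulations.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Toy data at four sampling points (the real study used 743 per unit cell).
observed = np.array([1.0, 0.8, 1.2, 1.0])
simulated = {
    # (thickness in slices, defocus in nm): simulated intensities
    (4, 23.0): np.array([0.9, 0.9, 1.2, 1.0]),
    (5, 23.0): np.array([1.0, 0.8, 1.1, 1.0]),
}
params, r = best_parameters(observed, simulated)
print(params, round(r, 4))  # (5, 23.0) 0.025
```

Because the observed intensities are normalized to the incident beam and the simulations are computed for unit incident intensity, no extra scaling factor enters Eq. (1); an exhaustive search over a grid of parameters, as plotted in Figures 32 and 33, is the simplest way to locate the minimum.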
TABLE 2
Parameters Used for the Calculation of the HREM Image in Figure 31(a)

Wavelength: 0.00164 nm
Spherical aberration constant: 1.0 mm
Thickness of one slice: 0.3866 nm
Number of beams: 32 × 128
*Defocus of objective lens: 23.0 nm
*Defocus due to chromatic aberration: 24 nm
*Crystal thickness: 5 slices (= 1.93 nm)

(a) Asterisks indicate the parameters that were changed to obtain a smaller RHREM in the calculation.
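The tabulated wavelength and the Scherzer focus value (∼48 nm) quoted for the Tl2Ba2Cu1Oy image can be checked against each other. The sketch below assumes the standard relativistic electron wavelength, a spherical aberration constant of 1.0 mm (the value for which ∼48 nm is recovered), and the common extended-Scherzer expression Δf = 1.2 (Cs λ)^1/2; the classical factor (4/3)^1/2 ≈ 1.15 would give about 47 nm instead. The function names are hypothetical.

```python
import math

H = 6.62607015e-34      # Planck constant, J s
M0 = 9.1093837015e-31   # electron rest mass, kg
E = 1.602176634e-19     # elementary charge, C
C = 299792458.0         # speed of light, m/s


def electron_wavelength_nm(volts):
    """Relativistic electron wavelength (nm) for accelerating voltage `volts`."""
    ev = E * volts
    p = math.sqrt(2.0 * M0 * ev * (1.0 + ev / (2.0 * M0 * C ** 2)))
    return H / p * 1e9


def scherzer_defocus_nm(cs_mm, wavelength_nm, factor=1.2):
    """Scherzer defocus magnitude, factor * sqrt(Cs * lambda), in nm."""
    return factor * math.sqrt(cs_mm * 1e6 * wavelength_nm)


lam = electron_wavelength_nm(400e3)
print(round(lam, 5))                            # 0.00164
print(round(scherzer_defocus_nm(1.0, lam), 1))  # 48.7
```

The 400-kV wavelength reproduces the 0.00164 nm of Table 2, and the defocus of about 48 nm is consistent with the objective-lens defocus of 23.0 nm being "rather smaller" than the Scherzer value, as the text states.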
Figure 31. (a) Calculated image of Tl2Ba2Cu1Oy shown with a contour map. Parameters used for the calculation are listed in Table 2. (b) High-intensity region. (c) Low-intensity region.
The difference between the intensity of the observed image and that of the calculated image with RHREM = 0.0506 is shown as a contour map in Figure 34a. There are small peaks, such as those indicated by A and B, where the calculated intensity deviates widely from the observed intensity. Region A corresponds to the positions around the Tl atomic columns. As pointed out for the observed image of Figure 30d, the contrast of Tl atoms is similar to that of Ba atoms despite the much larger atomic number of Tl. It is thus reasonable to attribute the discrepancy to a concentration of Tl atoms lower than the nominal concentration. This was noticed by Shindo, Hiraga, Oku, et al. (1991) in their previous HREM experiment on Tl2Ba2Cu1Oy. The researchers therefore took into account the partial occupancy of Tl atoms and made new image calculations. RHREM became smaller when the partial occupancy of Tl atoms was taken into account; as a result, RHREM = 0.0473 was obtained with an 87% occupancy of Tl atoms, as shown in Table 3, where the parameters marked with asterisks are those changed to obtain a small value of RHREM. Figure 35 indicates the variation of RHREM
Figure 32. Variation of RHREM as a function of crystal thickness.
as a function of the occupancy of Tl atoms. In Figure 34b, the low-intensity part of the calculated image with RHREM = 0.0473 is plotted as a contour map. The density of the contour lines at the Tl positions is slightly lower than that in Figure 31c, which was calculated with full occupancy of Tl atoms. In Figure 34c, the difference between the observed and calculated images is shown as a contour map. Some of the contour lines around the Tl atom positions (see the model in Figure 30a) that appeared in Figure 34a disappear. However, a fairly large difference remains at the positions indicated by B. These positions correspond to the interstices among the oxygen and Ba atoms in Figure 30a. As pointed out previously, there is some oxygen deficiency in the quenched samples. Thus,
Figure 33. Variation of RHREM as a function of defocus. The other parameters except the defocus value are the same as those in Table 2. (a) The range of defocus values is 5–75 nm. (b) The range is 14–35 nm.
the difference between the observed and calculated intensities in the preceding refinement may be attributed to some oxygen deficiency. In summary, in the analysis of an HREM image of Tl2Ba2Cu1Oy, a residual index RHREM of 0.0473 was obtained by changing the experimental parameters and introducing partial occupancy of Tl atoms. Through refinement of the computer simulation, deficient oxygen positions were also detected. It was pointed out that a smaller residual index RHREM and a higher resolution limit are indispensable for obtaining more accurate atomic arrangements from HREM images observed with the IP.

D. Image Processing

Since the IP generates digital image data, it is convenient for digital image processing. In this section, two types of image-processing application are introduced.
TABLE 3
Parameters Used for the Calculation of the Final Refinement Corresponding to the Contour Map of Figure 34b(a)

Wavelength: 0.00164 nm
Spherical aberration constant: 1.0 mm
Thickness of one slice: 0.3866 nm
Number of beams: 32 × 128
*Defocus of objective lens: 24.5 nm
*Defocus due to chromatic aberration: 24 nm
*Crystal thickness: 5 slices (= 1.93 nm)
*Occupancy of Tl atoms: 87%

(a) Asterisks indicate the parameters that were changed to obtain a smaller RHREM in the calculation.
Figure 34. (a) Difference between observed and calculated intensities of HREM images with RHREM = 0.0506. (b) Lower-intensity distribution of the calculated images taking into account 87% occupancy of Tl atoms. (c) Difference between observed and calculated intensities of HREM images with RHREM = 0.0473. Note that there are still some peaks at positions indicated by B.
Figure 35. Variations of RHREM as a function of occupancy of Tl atoms. The other parameters except the occupancy of Tl atoms are the same as those in Table 3.
One is simple contrast enhancement of an image. Figure 36 shows an example of image contrast enhancement of a biological specimen (a thin section of a dragonfly). The image contrast was enhanced with the look-up table (LUT) shown in Figure 37. Here, the image contrast γ is defined as

$$ \gamma = \frac{W_o}{W_i} \qquad (2) $$

where Wi is the dynamic range of the input data and Wo is the dynamic range of the output data.
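A linear LUT of the kind shown in Figure 37, together with its contrast γ from Eq. (2), can be sketched as follows. The sketch assumes a single linear segment: input levels inside a chosen window are stretched onto the full output range and levels outside are clipped, so γ is simply the slope of the table. The window limits and function names are hypothetical.

```python
def make_linear_lut(in_low, in_high, n_in=4096, out_low=0, out_high=255):
    """Return (lut, gamma) for a clipped linear look-up table.

    W_i = in_high - in_low is the input range used, W_o = out_high - out_low
    the output range, and gamma = W_o / W_i as in Eq. (2).
    """
    w_i = in_high - in_low
    w_o = out_high - out_low
    gamma = w_o / w_i
    lut = []
    for level in range(n_in):
        value = out_low + (level - in_low) * gamma
        # Clip to the output range, then round to an integer gray level.
        lut.append(int(round(min(max(value, out_low), out_high))))
    return lut, gamma


def apply_lut(pixels, lut):
    """Map every input gray level through the table."""
    return [lut[p] for p in pixels]


# Stretch an 8-bit window 64..192 onto 0..255: gamma is about 2.
lut, gamma = make_linear_lut(64, 192, n_in=256)
print(round(gamma, 3))                        # 1.992
print(apply_lut([0, 64, 96, 192, 255], lut))  # [0, 0, 64, 255, 255]
```

A window narrower than the output range gives γ > 1 (contrast enhancement), whereas mapping all 4096 input levels of 12-bit data onto 256 output levels gives γ = 255/4095, about 1/16.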
Figure 36. Contrast enhancement of an image from a thin section of a dragonfly. The contrast enhancement was carried out by the look-up-table (LUT) as shown in Figure 37.
The other is spatial frequency filtering. Figure 38 shows an example of the Fourier transformation of an HREM image (Si3N4 single crystal taken with the JEM-2010F 200-kV field-emission HREM). Figure 38a shows an original image, Figure 38b shows the Fourier-transformed two-dimensional power spectrum pattern (diffractogram), and Figure 38c shows an image reconstructed by selecting periodic spots in the spectrum, as indicated by the circles in 38b. The IP has a wide dynamic range and high intensity resolution (16,384 gray levels). Contrast enhancement and image analysis applications can hence be carried out with high precision.
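The spatial-frequency filtering of Figure 38 can be sketched with NumPy's FFT routines: compute the centered spectrum, keep circular regions around selected periodic spots (and their centrosymmetric partners, so that the result stays real), and transform back. The mask radius and spot positions are hypothetical inputs; in practice they would be picked from the diffractogram, as with the circles in Figure 38b.

```python
import numpy as np


def power_spectrum(image):
    """Diffractogram: squared modulus of the centred 2-D Fourier transform."""
    return np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2


def fourier_spot_filter(image, spots, radius):
    """Rebuild an image from selected spectral spots.

    spots  -- (row, col) offsets of the chosen spots from the zero-frequency
              centre of the fftshift-ed spectrum; each spot's partner at
              (-row, -col) is kept too, so the filtered image is real.
    radius -- radius in pixels of the circular pass mask around each spot.
    """
    img = np.asarray(image, dtype=float)
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    ny, nx = img.shape
    cy, cx = ny // 2, nx // 2
    yy, xx = np.mgrid[0:ny, 0:nx]
    mask = np.zeros((ny, nx), dtype=bool)
    partners = [(-dy, -dx) for dy, dx in spots]
    for dy, dx in list(spots) + partners:
        mask |= (yy - (cy + dy)) ** 2 + (xx - (cx + dx)) ** 2 <= radius ** 2
    return np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real


# Synthetic test: a lattice fringe with period 8 px along x plus an
# unwanted slow modulation along y; keeping only the fringe spots removes it.
y, x = np.mgrid[0:64, 0:64]
fringe = np.cos(2 * np.pi * 8 * x / 64)
image = fringe + 0.5 * np.cos(2 * np.pi * 3 * y / 64)
filtered = fourier_spot_filter(image, spots=[(0, 8)], radius=1)
print(np.allclose(filtered, fringe))  # True
```

Keeping each selected spot together with its partner at the opposite position is what makes the inverse transform real; selecting only one of the pair would produce a complex image whose real part has half the fringe amplitude.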
E. Other Fields of Application of the IP

The IP has begun to be used in reflection high-energy electron diffraction (RHEED) (Miura et al., 1995). In this field, as in transmission electron diffraction, the superior characteristics of the IP are valuable. The IP was originally developed as a highly sensitive recording device for X-ray images, and today it is widely used in clinical medicine (Sonoda et al., 1983) and in medical and bioscience research (Nakajima, 1993). The IP has also begun to be used in X-ray crystallography (Fuji and Kozaki, 1993). Since the IP has good sensitivity to ultraviolet rays and ions (Nishikawa, Kimoto, et al., 1995), applications in these fields have also been started.
NOBUFUMI MORI AND TETSUO OIKAWA
Figure 37. Look-up table (LUT) used for the contrast enhancement in Figure 36, together with the gray-level histogram of the original image data.
Figure 38. Image processing by spatial frequency filtering. (a) HREM image of an Si3N4 single crystal taken with the JEM-2010F field-emission TEM and the IP. (b) Fourier-transformed two-dimensional power spectrum of (a). (c) Image reconstructed (spatial frequency filtered) by selecting the periodic spectral spots indicated by the circles in (b).
Figure 39. Comparison of some characteristics of the image detection devices widely used today.
VIII. Conclusion
The TEM is an instrument for observing magnified images of microscopic objects, and it delivers its experimental results in the form of images. These images record more than the morphology of the specimen. They also record the results of the interaction between the incident electrons and the specimen. In this sense, a TEM image is not a mere “picture” but a “message from the microscopic world.” Of course, TEM imaging is modulated by instrumental factors such as lens aberrations, and image detection devices also have their own specific characteristics. Figure 39 compares some characteristics of the image detection devices widely used today. Each of these devices has advantages and disadvantages, and their characteristics differ considerably. Among them, the IP offers high sensitivity, high quantitative precision in measuring beam intensity, and good suitability for image processing. It is hoped that the IP will come into wide use and support new research with the TEM.
Acknowledgments
Among the application data presented in this article, Figure 1 was obtained in a joint research project between Dr. Hiroshi Ayato of the Ashigara Research Laboratory of Fuji Photo Film Co., Ltd., and the authors. Many of the application data in this article were obtained in a joint research project between Professor Daisuke Shindo of the Institute for Advanced Materials Processing, Tohoku University, and one of the authors (T. O.). We are grateful to them for allowing us to use these data in this article.
References
Amemiya, Y., and Miyahara, J. (1988). Nature 336, 89–90.
Amemiya, Y., Satow, Y., Matsushita, T., Chikawa, J., Wakabayashi, K., and Miyahara, J. (1988). In Topics in Current Chemistry, Vol. 147. Berlin/Heidelberg: Springer-Verlag, pp. 121–144.
Amitani, K., Kano, A., Tsuchino, H., and Shimada, F. (1986). SPSE’s Conference and Exhibition on Electronic Imaging, 26th. A Fall Symposium, Advance Printing of Paper Summaries. p. 180.
Ayato, H., Mori, N., Miyahara, J., and Oikawa, T. (1990). J. Electron Microsc. 39, 444–448.
Blasse, G., and Grabmaier, B. C. (1994). Luminescent Materials. Berlin/Heidelberg: Springer-Verlag.
Burmester, C., Braum, H. G., and Schroder, R. R. (1994). Ultramicroscopy 55, 55–65.
Dainty, J. C., and Shaw, R. (1974). Image Science. New York: Academic Press.
Fowler, W. B. (1968). Physics of Color Centers. New York: Academic Press.
Fujii, N., and Kozaki, S. (1993). Adv. X-Ray Anal. 36, 505.
Gasiot, J., Braulich, P., and Fillard, J. P. (1982). Appl. Phys. Lett. 40, 376.
Hangleiter, T. K., Koschnick, F., Spaeth, J.-M., Nuttall, R. H. D., and Eachus, R. S. (1990). J. Phys.: Condens. Matter 2, 6837–6846.
Hirabayashi, M., and Ogawa, S. (1957). J. Phys. Soc. Jpn. 12, 259–271.
Isoda, S., Saitoh, K., Ogawa, T., Moriguchi, S., and Kobayashi, T. (1992). Ultramicroscopy 41, 99–104.
Iwabuchi, Y., Mori, N., Takahashi, K., Matsuda, T., and Shionoya, S. (1994). Jpn. J. Appl. Phys. 33, 178–185.
Iwabuchi, Y., Umemoto, C., Takahashi, K., and Shionoya, S. (1990). J. Lumin. 48/49, 481–484.
Katto, M., Matumoto, R., Kurosawa, K., Sasaki, W., Takigawa, Y., and Okuda, M. (1993). Rev. Sci. Instrum. 64, 319–324.
Katz, L., and Penfold, A. S. (1952). Rev. Mod. Phys. 24, 30.
Kawaguchi, A. (1979). Bull. Inst. Chem. Res., Kyoto Univ. 206.
Keller, S. P., and Pettit, G. D. (1958). Phys. Rev. 111, 1533–1539.
Kobayashi, K., and Sakaoku, K. (1964). Proceedings of Symposium on Quantitative Electron Microscopy. Washington, DC. The Electron Microscopy Society of America, p. 359.
Kondo, Y., Konno, Y., Tamura, N., Mori, N., and Iwabuchi, Y. (1994). Nucl. Instrum. Methods Phys. Res. B91, 219–222.
Koschnick, F. K., Spaeth, J.-M., and Eachus, R. S. (1992). J. Phys.: Condens. Matter 4, 3015–3029.
Kujawa, S., and Krahl, D. (1992). Ultramicroscopy 46, 395.
Liebich, B. W., and Nicollin, D. (1977). Acta Crystallogr. B33, 2790–2794.
Meijerink, A., and Blasse, G. J. (1991). J. Phys. D: Appl. Phys. 24, 626.
Meijerink, A., Schipper, W. J., and Blasse, G. (1991). J. Phys. D: Appl. Phys. 24, 997.
Miura, H., Ohtaka, K., Shindo, D., and Oikawa, T. (1995). Mater. Trans., JIM 36, (in press).
Miyahara, J., Takahashi, K., Amemiya, Y., Kamiya, N., and Satow, Y. (1986). Nucl. Instrum. Methods Phys. Res. A246, 572–578.
Mori, N., Oikawa, T., Harada, Y., and Miyahara, J. (1990). J. Electron Microsc. 39, 433–436.
Mori, N., Oikawa, T., Katoh, T., Miyahara, J., and Harada, Y. (1988). Ultramicroscopy 25, 195–202.
Nakajima, E. (1993). Radioisotopes 42, 228.
Niimura, N., Karasawa, Y., Tanaka, I., Miyahara, J., Takahashi, K., Saito, H., Koizumi, S., and Hidaka, M. (1994). Nucl. Instrum. Methods Phys. Res. A349, 521–525.
Nishikawa, O., Akimoto, T., Tsuchiya, T., Yoshimura, T., and Ishikawa, Y. (1994). Appl. Surface Sci. 76/77, 359–366.
Nishikawa, O., Kimoto, M., Fukui, K., Yanagisawa, H., Takai, M., Akimoto, T., and Tuchiya, T. (1995). Surface Sci. 325, 288.
Ogawa, E., Arakawa, S., Ishida, M., and Kato, H. (1995). SPIE 2432, 421.
Ogura, N., and Nishioka, Y. (1995). Private communication.
Ogura, N., Yoshida, K., Kojima, Y., and Saito, H. (1994). Proceedings of the Thirteenth ICEM. Les Editions de Physique, pp. 219–220.
Ohnishi, A., Kan’no, K., Iwabuchi, Y., and Mori, N. (1994). Nucl. Instrum. Methods Phys. Res. B91, 210–214.
Oikawa, T., Mori, N., Takano, N., and Ohnishi, M. (1990). J. Electron Microsc. 39, 437–443.
Oikawa, T., Shindo, D., and Hiraga, K. (1994). J. Electron Microsc. 43, 402–405.
Oikawa, T., Shindo, D., Kudoh, J., Aita, S., and Kersker, M. (1992). Proceedings of the Fiftieth Annual Meeting of the Electron Microscopy Society of America. The Electron Microscopy Society of America, p. 382.
Parkin, S. S. P., Lee, V. Y., Nazzal, A. I., Savoy, R., Huang, T. C., Gorman, G., and Beyers, R. (1988). Phys. Rev. B38, 6531.
Rabin, H., and Klick, C. C. (1960). Phys. Rev. 117, 1005–1010.
Radzhabov, E. A., and Egranov, A. E. (1994). J. Phys.: Condens. Matter 6, 5639.
Reimer, L. (1984). Transmission Electron Microscopy. Berlin: Springer-Verlag.
Sakabe, N. (1991). Nucl. Instrum. Methods Phys. Res. A303, 448.
Sato, M., Katsube, Y., and Hayashi, K. (1993). J. Appl. Crystallogr. 26, 733–735.
Schipper, W. J., Hamelink, J. J., Langeveld, E. M., and Blasse, G. (1993). J. Phys. D: Appl. Phys. 26, 1487.
von Seggern, H., Voigt, T., Knupfer, W., and Lange, G. (1988). J. Appl. Phys. 64, 1405–1412.
Shindo, D., Hiraga, K., Iijima, S., Kudoh, J., Nemoto, Y., and Oikawa, T. (1993). J. Electron Microsc. 42, 227–230.
Shindo, D., Hiraga, K., Oikawa, T., and Mori, N. (1990). J. Electron Microsc. 39, 449–453.
Shindo, D., Hiraga, K., Oku, T., and Oikawa, T. (1991). Ultramicroscopy 39, 50–57.
Shindo, D., Oku, T., Kudoh, J., and Oikawa, T. (1994). Ultramicroscopy 54, 221–228.
Sonoda, M., Takano, M., Miyahara, J., and Kato, H. (1983). Radiology 148, 833–838.
Takahashi, K., Kohda, K., Miyahara, J., Kanemitsu, Y., Amitani, K., and Shionoya, S. (1984). J. Lumin. 31/32, 266–268.
Takahashi, K., Miyahara, J., and Shibahara, Y. (1985). J. Electrochem. Soc. 132, 1492–1494.
Tateno, Y., Iinuma, T., and Takano, M. (1987). Computed Radiography. Tokyo: Springer-Verlag.
Umemoto, C., Kitada, A., Takahashi, K., and Matsuda, T. (1988). Extended Abstracts, the One Hundred Seventy-Fourth Electrochemical Society Meeting. The Electrochemical Society, p. 918.
Williams, R. T., and Song, K. S. (1990). J. Phys. Chem. Solids 51, 679–716.
Yuste, M., Taurel, L., Rahmani, M., and Lemoyne, D. (1976). J. Phys. Chem. Solids 37, 961–966.
Index
A Ablation of metal films, 20–22 Albite twins, 64, 65 Algebraic topology, 184 Alkali feldspars, phase separation of, 60–68 Amphiboles, phase separation of, 68–71 monoclinic, 71–77 orthorhombic, 77–81 structure and classification, 68–69 Analogies between theories, 148 Analytical electron microscopy (AEM), 55–58, 74, 76–77 Atomic force microscope (AFM), 92 Atomic scattering, 7
B BaFX, 282–288 Bethe stopping power formula, 24 Biopyriboles, 81 chain-width disorder in, 83–84 new, 82–83 polysomatic reactions in, 84–87 Boundary conditions and sources, 176–177 Bragg scattering, 7
C Cassette, 291 Cell complexes, 184–186 Chains, 190–191
boundary of, 191–193 co-, 193–197 Chain-width disorder in, 83–84 Charge content, 165 Charge-current potentials, 170 Charge flow, 165 Chesterite, 82–83 Classification diagrams, 148–149 Classification schemes, 149 Clausius-Clapeyron equation, 21 Coboundary operator, 200–204, 214–215 Cochains, 193–197 field function approximation, 239–240 Computed radiography, 282, 304–305 Constant electric field scaling, 93–94 Constant potential scaling, 93–94 Constitutive equations/relations, 160–161, 172–176 discrete representation, 205–207 strategies for discretization, 231–239 Contour mapping, 294 Contrast transfer function (CTF), 298 Convergent-beam electron diffraction (CBED), 310–311 Coordinate maps, 211 Crystallographic shear planes, 83–84
D Dark-field technique, 81 Deflector sliced, 105–106 stacked, 102–104 De Rham functor, 236 Detective quantum efficiency (DQE), 300–302 Detector, 124–126 Differential forms, 210–211 Differential operators, 214–217 Discrete Green’s formula, 219 Discrete surface integral (DSI), 256–259 Discretization error, 175–176, 237–239 Discretization of the Hodge star operator, 232 Discretization strategy, reference, 222 constitutive relations, 231–239 domain discretization, 223–225 edge elements and field reconstruction, 239–246 error-based, 237–239 field function reconstruction and projection, 233–237 global application of local constitutive statements, 232–233 topological time stepping, 225–231 Domain discretization, 223–225 Dynamic random access memory (DRAM), 92
E Edenitic substitution, 77 Edge elements and field reconstruction, 239–246 Einzel lens. See Stacked einzel lens Electric charge, law of conservation of, 170
Electromagnetic potentials, 170 Electron-microprobe analyzer (EMPA), 54, 76 Electron microscopy applications, 1–2 Electron microscopy, high-speed applications, 45 flash photoelectron, 25–36 pulsed high-energy reflection, 36–40 pulsed mirror electron, 40–45 techniques, 2–6 time-resolving, 6–45 transmission electron, 7–25 Electron-optical calculations, 126–132 Electron source silicon, 121–124 Spindt, 119–121 Electrostatic lenses. See also Fabrication of miniature electrostatic lenses scaling laws for, 93–94 Erasing, imaging plate, 292 Error, discretization, 175–176, 237–239 Error-based finite element method, time-domain, 271–273 Euclidean space, 211–212 Exposure, imaging plate, 291–292 Exsolution (phase separation), 55, 59 alkali feldspars, 60–68 amphiboles, 68–81 Exterior differential, 215–216
F Fabrication of miniature electrostatic lenses detector, 124–126 electron-optical calculations, 126–132
electron source, 119–124 future for, 140 LIGA lathe, 108–118 review, 94–95 slicing, 104–108 stacked einzel lens, 132–140 stacking, 95–104 Fabrication of miniature magnetostatic lenses, 118–119 Factorization diagrams, 148–149 Fading, 292, 299–300 Faraday cup, 124, 134 Faraday’s induction law, 151–152, 159–160, 163–164, 169, 170, 171, 225–226, 257 Fast Fourier transform (FFT), 294 F-centers, 283–288 Field, concept of, 144 Field function reconstruction and projection, 233–237 Field reconstruction, edge elements and, 239–246 Fields, discrete representation cochains, 193–197 limit systems, 197–199 Finite difference (FD), 145 methods, 246–255 support operator method (SOM), 252–254 Finite difference time-domain method (FDTD), 246–252, 254–255 Finite element (FE), 145, 219 methods, 264–273 time-domain, 267–269 time-domain edge, 269–271 time-domain error-based, 271–273 Finite integration theory (FIT), 260–264 Finite volume (FV), 145
discrete surface integral (DSI), 256–259 finite integration theory (FIT), 260–264 methods, 207–209, 219, 255–264 Flash photoelectron microscopes. See Photoelectron microscopes, flash
G Galerkin method, 267 Gauss’s divergence theorem, 161 Gauss’s law for electrostatics, 160, 170–171, 229–230 for magnetic flux, 159–160, 168–169 for magnetostatics, 170, 228–229 Geometric objects and orientation, 150–157 Geometry, discrete representation, 183 boundary of a chain, 191–193 cell complexes, 184–186 chains, 190–191 incidence numbers, 188–189 primary and secondary mesh, 186–187 Granularity and uniformity, imaging plate, 300–302 Green’s formulas, 219 Guinier-Preston (GP) zones, 72–73
H Hertz-Knudsen-Langmuir equation, 21 High-resolution electron microscope (HREM), quantitative image analysis, 315–324
High-resolution TEM (HRTEM), 54 biopyriboles and polysomatic defects, 81–87 Hodge star operator, 232
I Image intensity tracking, 2, 6 space-time resolution, 24–25 in transmission microscopes, 10–12 Image processing, 293–294, 324–327 Imaging plate (IP) advantages of, 282 computed radiography and radio luminography systems, 304–305 configuration of, 290–291 description of layers, 288–290 development of, 281 fading, 292, 299–300 granularity and uniformity, 300–302 photomultiplier tube (PMT), 282 photostimulated luminescence (PSL), 282–288 resolution, 296–299 sensitivity, 294–296, 306–310 transmission electron microscope, 303–304 Imaging plate (IP), applications, 281, 305 high sensitivity, 306–310 image processing, 324–327 miscellaneous areas, 327–329 quantitative image analysis, 315–324 wide dynamic range, 310–315 Imaging plate (IP), elements erasing, 292 exposure, 291–292 image processor, 293–294 reading, 292 Incidence numbers, 188–189
J Jimthompsonite, 82–83
L Laser-driven guns photoelectron, 4–5 thermionic, 3–4 Law of conservation of electric charge, 170 of magnetic flux, 169–170 LIGA (lithography and galvo-forming or electroplating) lathe, 94, 108 dose calculation, 111–118 processing, 109–111 Light-optical microscopy, 2 Limit systems, 197–199 Lucite, 109–111, 115–118
M Magnetic flux φ, 151–152 Gauss’s law for, 159–160, 168–169 law of conservation of, 169–170 Magnetostatic lenses, fabrication of miniature, 118–119 Material parameters, 173 Maxwell-Ampère’s law, 160, 170, 171, 225, 227–228, 257, 262 Maxwell grid equations, 262 time-domain edge element method, 269–271
time-domain error-based finite element method, 271–273 time-domain finite element methods, 267–269 Meshes primary and secondary, 186–187, 223 Metal films, ablation of, 20–22 Metal melts, hydrodynamic instabilities of, 12–20 Microchannel plate, 126 Miniature electron optics, use of term, 91 Miniature scanning electron microscope (MSEM) See also under Fabrication applications of, 91–93 electron source, 119–124 stacked assembly, 100–102 stacked electrostatic deflector and stigmator, 102–104 tilted, 130–132 Mirror electron microscopy, pulsed, 40–45 Modulation transfer function (MTF), 297–298 Moon-rock samples, 54 Moonstone, 64–65 Multivectors, 212–214
N Noise, imaging plate, 300–302 Nucleation, homogeneous, 59 in alkali feldspars, 61 in amphiboles, 73, 80
O Orientation compatible or coherent, 189 external, 150, 153–154, 164
geometric objects and, 150–157 internal, 150, 151, 164 propagate, 189
P p-dimensional cell, 150, 155 differential forms, 210–211 oriented, 184–186 p-dimensional cochains, 193–197 incipient, 210 Perthites, 61 Petrographic optical microscope, 54 Phase separation (exsolution), 55, 59 alkali feldspars, 60–68 amphiboles, 68–81 Photoelectron gun, laser-driven, 4–5 Photoelectron microscopes, flash, 25–27 applications, 29–34 limitations, 34–36 short-time exposure imaging, 27–29 Photoionization, 35 Photomultiplier tube (PMT), 282 Photostimulated luminescence (PSL), 282–288 Physical field problems, continuous representations, 207–208 compared with discrete, 209 differential forms, 210–211 differential operators, 214–217 spread cells, 217–220 weak form of topological laws, 220–222 weighted integrals, 211–214 Physical field problems, discrete representations compared with continuous, 209 constitutive relations, 205–207 fields, 193–199
Physical field problems (Cont.) geometry, 183–193 topological laws, 199–205 Physical field problems, methods finite difference methods, 246–255 finite element methods, 264–273 finite volume methods, 255–264 reference discretization strategy, 222–246 Physical field problems, numerical solutions alternative methods, 145–147 boundary conditions and sources, 176–177 classification of physical quantities, 163–168 constitutive equations, 172–176 discretization step, 145 geometric objects and orientation, 150–157 mathematical structure of theories, 147–150 modeling step, 144 physical laws and quantities, 148, 157–163 scope of structural approach, 177–183 topological laws, 168–172 Physical laws and quantities, 148, 157 equations, 159–163 local and global quantities, 158–159 Physical quantities classification of, 163–168 Plagioclase feldspars, 60 Poly(methyl methacrylate) (PMMA), 109–111, 115–118 Polysomatic defects, 81–87 Polysomatic series, 81 Polysomatism, 81 Polysome, 81
Polytype, 81 Potentials charge-current, 170 electromagnetic, 170 Pullback, 211–212 Pump-probe technique, 2 Push forward, 214 p-vector, 212–214 Pyrex fiber processing, 100 Pyriboles, 81 chain-width disorder in, 83–84 polysomatic reactions in, 84–87
Q Quantitative image analysis, 315–324
R Radio luminography, 282, 304–305 Reading, imaging plate, 292 Reference discretization strategy. See Discretization strategy, reference Reflection electron microscopy, 36–40 Reflection high-energy electron diffraction (RHEED), 327 Residual equations, 265 Resolution, imaging plate, 296–299 Reverse-biased p-n, 125 Richardson-Dushman expression, 35 Riemann integral, 211–214 Roth’s diagrams, 149
S Scaling laws for electrostatic lenses, 93–94
Scanning electron microscope (SEM), 54 miniature, 91–93 Scanning tunnel microscope (STM), 92 Schottky junction, 125 Sensitivity, imaging plate, 294–296, 306–310 Shape functions, 240–241, 265–266 Short-time exposure imaging, 2, 3–5 bright-field, 22–24 flash photoelectron microscopy and, 27–29 in transmission microscopes, 7–9 Silicates alkali feldspars, 60–68 amphiboles, 68–81 analytical electron microscopy of (AEM), 55–58 phase separation, 59 Silicon die processing, 98–99 Silicon source, 121–124 Slicing, 104–106 processing, 106–108 Space-time discretization, 223–225 objects, 155–157 viewpoint, 165–168 Space-time resolution image intensity tracking, 24–25 photoelectron microscopes and, 34–35 short-time exposure bright-field imaging, 22–24 streak imaging, 24 Spindt source, 119–121 Spinodal decomposition, 59 in alkali feldspars, 61, 62–63, 64 Spread cells, 217–220 Stacked einzel lens
MSEM construction, 132–136 MSEM operation and image formation, 136–140 Stacking, 95–97 MSEM assembly, 100–102 MSEM electrostatic deflector and stigmator, 102–104 pyrex fiber processing, 100 silicon die processing, 98–99 Stokes’s theorem, 161 Streak imaging, 2, 5–6 space-time resolution, 24 in transmission microscopes, 9–10 Structure of a physical theory, 147–148 Subdomain method, 266 Summation by parts formula, 219 Support operator method (SOM), 252–254
T Thermionic gun, laser-driven, 3–4 Thin-film criterion, 55 Tilted MSEM, 130–132 Time-domain edge element method, 269–271 Time-domain error-based finite element method, 271–273 Time-domain finite element methods, 267–269 Time-harmonic fields, 268 Time-resolving microscopes, 6 flash photoelectron, 25–36 pulsed high-energy reflection, 36–40 pulsed mirror electron, 40–45 transmission electron, 7–25 Tonti diagrams, 149
Topological laws, 168–172 coboundary operator, 200–204 discrete representation, 199–205 weak form of, 220–222 Topological time stepping, 225–231 Transformation diagrams, 149 Transformation laws, 167–168 Transmission electron microscope (TEM), imaging plate, 303–304 Transmission electron microscopy, applications in mineralogy alkali feldspars, 60–68 amphiboles, 68–81 analytical electron microscopy (AEM), 55–58 high-resolution (HRTEM), 54, 81–87 phase separation (exsolution), 55, 59–81 specimen preparation problem, initial, 53–54 Transmission electron microscopy, time-resolving applications, 12–22 image intensity tracking, 10–12
instrumentation, 7–12 short-time exposure imaging, 7–9 space-time resolution, 22–25 streak imaging, 9–10 Tschermakite substitution, 76
V Variational approach, 264 Vector elements, 243
W Wadsley defects, 83–84 Wehnelt bias, 7 Weighted integrals, 211–214 Weighted multivectors, 213 Weighted residual approach, 264 Weight functions, 265 Whitney functor, 236 Wide dynamic range, imaging plate, 310–315
X X-ray diffraction (XRD), 59, 74
ISBN 0-12-014763-7