ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 90
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES/Laboratoire d'Optique Electronique du Centre National de la Recherche Scientifique, Toulouse, France
ASSOCIATE EDITORS
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
TOM MULVEY Department of Electronic Engineering and Applied Physics, Aston University, Birmingham, United Kingdom
Advances in Imaging and Electron Physics
EDITED BY
PETER W. HAWKES
CEMES/Laboratoire d'Optique Electronique du Centre National de la Recherche Scientifique
Toulouse, France
VOLUME 90
ACADEMIC PRESS San Diego New York Boston London Sydney Tokyo Toronto
This book is printed on acid-free paper.

Copyright © 1995 by ACADEMIC PRESS, INC. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Academic Press, Inc. A Division of Harcourt Brace & Company 525 B Street, Suite 1900, San Diego, California 92101-4495

United Kingdom Edition published by Academic Press Limited 24-28 Oval Road, London NW1 7DX

International Standard Serial Number: 1076-5670
International Standard Book Number: 0-12-014732-7

PRINTED IN THE UNITED STATES OF AMERICA
95 96 97 98 99 BC 9 8 7 6 5 4 3 2 1
CONTENTS

CONTRIBUTORS ..... vii
PREFACE ..... ix

Minimax Algebra and Applications
R. A. CUNINGHAME-GREEN

I. Discrete Events ..... 2
II. Critical Events ..... 16
III. Scheduling and Approximation ..... 26
IV. Path Problems ..... 36
V. Connectivity ..... 45
VI. The Steady State ..... 58
VII. Infinite Processes ..... 75
VIII. Maxpolynomials ..... 85
IX. Efficient Rational Algebra ..... 99
X. Miscellaneous Topics ..... 109
References ..... 120

Physical Information and the Derivation of Electron Physics
B. ROY FRIEDEN

I. Introduction ..... 124
II. The Zero Property of Lagrangians ..... 126
III. Fisher Information ..... 128
IV. Principle of Extreme Physical Information ..... 139
V. Special Relativity ..... 147
VI. Classical Electrodynamics ..... 149
VII. Quantum Mechanics ..... 154
VIII. Uncertainty Principles ..... 165
IX. General Relativity ..... 170
X. Power Spectral 1/f Noise ..... 174
XI. Synopsis and Highlights of Derivations ..... 185
Appendix A. Fisher Information Obeys Additivity ..... 190
Appendix B. Maximal Information and Minimal Error in Characteristic State ..... 191
Appendix C. Properties of Information Divergence Quantity I(θ, θ′) ..... 194
Appendix D. Maxwell's Equations from the Vector Wave Equation ..... 196
Appendix E. Derivation of Eq. (VII.39) ..... 198
Appendix F. Evaluation of Certain Integrals ..... 201
References ..... 202

New Developments of Electron Diffraction Theory
LIAN-MAO PENG

I. Introduction ..... 206
II. General Theory ..... 207
III. Dynamical Elastic Diffraction by Crystals ..... 221
IV. Perturbation Methods for Periodic Structures ..... 250
V. Perturbation Methods for Nonperiodic Structures ..... 272
VI. Bloch Wave Channeling and Resonance Scattering ..... 293
Appendix A. Green's Functions ..... 334
Appendix B. Crystal Structure Factors and Potential ..... 338
Appendix C. Optical Potential ..... 341
References ..... 350

Parallel Image Processing with Image Algebra on SIMD Mesh-Connected Computers
HONGCHI SHI, GERHARD X. RITTER, AND JOSEPH N. WILSON

I. Introduction ..... 353
II. Overview of Image Algebra ..... 357
III. SIMD Mesh-Connected Computers ..... 363
IV. Parallel Algorithms for Image Algebra Primitives ..... 368
V. Parallel Image Processing with Image Algebra ..... 373
VI. Concluding Remarks and Future Research ..... 424
References ..... 427

INDEX ..... 433
CONTRIBUTORS

Numbers in parentheses indicate the pages on which the authors' contributions begin.

R. A. CUNINGHAME-GREEN (1), School of Mathematics and Statistics, Birmingham University, Birmingham B15 2TT, United Kingdom

B. ROY FRIEDEN (123), Optical Sciences Center, University of Arizona, Tucson, Arizona 85721

LIAN-MAO PENG (205), Department of Materials, University of Oxford, Oxford OX1 3PH, United Kingdom

GERHARD X. RITTER (353), Center for Computer Vision and Visualization, Department of Computer and Information Sciences, University of Florida, Gainesville, Florida 32611

HONGCHI SHI (353), Center for Computer Vision and Visualization, Department of Computer and Information Sciences, University of Florida, Gainesville, Florida 32611

JOSEPH N. WILSON (353), Center for Computer Vision and Visualization, Department of Computer and Information Sciences, University of Florida, Gainesville, Florida 32611
PREFACE
The first volume of these Advances appeared in 1948 with the title Advances in Electronics, but shortly after, the founding editor Ladislaus ("Bill") Marton added the words "and Electron Physics" to form the title that has remained unchanged for 84 volumes and 40 years. For most of the existence of the series, this title faithfully reflected the contents, as can be seen from the cumulative index that was included in Volume 81. Image processing, however, has long been regarded as a suitable topic for inclusion and indeed, the supplement by W. O. Saxton, Computer Techniques for Image Processing in Electron Microscopy (Academic Press, 1978), is still the only monograph on the subject and has become something of a classic. Over the past few years, many aspects of imaging have been surveyed, including image algebra and mathematical morphology, electron holography, optical computing, electron image formation and simulation, neural networks, edge detection, and coding theory. It therefore seemed sensible to recognize the importance of this theme by a small change in the title of the series, which from now on will be Advances in Imaging and Electron Physics. No change in editorial policy is intended and future volumes will continue to cover the physics of electron devices, and especially semiconductor devices, particle optics for accelerators, electron microscopes and related instruments, micro- and nanolithography, antennas, and the computing methods employed in these domains, as well as aspects of imaging and image processing, and this list is far from exhaustive! Occasional historical articles will likewise continue to be included. And, of course, we shall try to cover new subjects that fall within our field, perhaps the most important role of a series such as this. Image pickup and display are also surveyed here, thanks to Ben Kazan, who became an associate editor when his series of Advances was amalgamated with AIEP.
I am happy to welcome a second associate editor, Professor Tom Mulvey, formerly editor with C. J. R. Sheppard of Advances in Optical and Electron Microscopy. The latter has now merged with these Advances, at the suggestion of Tom Mulvey, but this merger will have little effect on the coverage of AIEP, for developments in electron microscopy have always been regularly reported, hardly surprising when we remember that the first electron micrograph of a biological specimen was obtained by Bill Marton in 1934. I have been closely involved with Advances in Optical and Electron Microscopy in one way or another since it was launched by V. E. Cosslett, with R. E. Barer as his coeditor, in 1966 and am pleased that, if separate publication could no longer be
justified in a time of shrinking library budgets, it should join forces with this series.
This first volume under the new title opens with a chapter on minimax algebra and its applications by R. A. Cuninghame-Green, whose seminal book on the subject is now 15 years old. This very full and liberally illustrated account of the subject will, I hope, reveal its relevance in fields hitherto unaware of its potential interest. Readers of the chapter by J. L. Davidson in Volume 84 will already be aware of the fascinating relation between minimax algebra and mathematical morphology.
The second chapter is by B. Roy Frieden, who has made many highly original contributions to imaging science over the years, notably in the use of maximum entropy and in statistical optics. Recently, he has made a discovery of the greatest interest, namely, that the Lagrangians that lead to Maxwell's equations, Schrödinger's equation, and many other major fundamental equations of physics can be united with the aid of an information measure introduced by R. A. Fisher in 1925, within a few years of the innovatory publications of de Broglie, Schrödinger, and Dirac. I am very pleased that Roy Frieden has agreed to prepare a full account of these exciting developments for these Advances.
The volume continues with a description of recent developments in electron diffraction theory. Despite the existence of numerous textbooks on this subject, many important topics have not been surveyed in depth. In this chapter, L.-M. Peng examines quantitative aspects of electron diffraction, the retrieval of structural information from dynamical diffraction data, and the imaging of atom strings and planes. This presentation will surely become an essential complement to the standard textbooks.
The volume ends with a lengthy description of parallel image processing with image algebra on single-instruction multiple-data mesh-connected computers by H. Shi, G. X. Ritter, and J. N. Wilson. One of these authors, G. X. Ritter, has already contributed a review of developments in image algebra to these Advances and the present chapter shows how efficiently this algebra can be implemented on a particular computer architecture.
As always, I am extremely grateful to all the authors who have contributed to this volume; I am sure that readers will appreciate the trouble they have taken to render so much new material accessible. I conclude with a list of articles to appear in forthcoming volumes, several of which will follow close on the heels of this volume, in order to keep the publication time as short as possible.
FORTHCOMING ARTICLES

Group invariant Fourier transform algorithms (Y. Abdelatif and colleagues)
Nanofabrication (H. Ahmed)
Use of the hypermatrix (D. Antzoulatos)
Image processing with signal-dependent noise (H. H. Arsenault)
The Wigner distribution (M. J. Bastiaans)
Parallel detection (P. E. Batson)
Hexagon-based image processing (S. B. M. Bell)
Microscopic imaging with mass-selected secondary ions (M. T. Bernius)
Nanoemission (Vu Thien Binh)
Metareasoning in image interpretation (P. Bottoni and P. Mussio)
Magnetic reconnection (A. Bratenahl and P. J. Baum)
Sampling theory (J. L. Brown)
ODE methods (J. C. Butcher)
The artificial visual system concept (J. M. Coggins)
Projection methods for image processing (P. L. Combettes)
Corrected lenses for charged particles (R. L. Dalglish)
The development of electron microscopy in Italy (G. Donelli)
Space-time algebra and electron physics (C. Doran and colleagues)
The study of dynamic phenomena in solids using field emission (M. Drechsler)
Gabor filters and texture analysis (J. M. H. Du Buf)
Group algebra in image processing (D. Eberly)
Miniaturization in electron optics (A. Feinerman)
Crystal aperture STEM (J. T. Fourie)
The critical-voltage effect (A. Fox)
Amorphous semiconductors (W. Fuhs)
Stack filtering (M. Gabbouj)
Bayesian image analysis (S. Geman and D. Geman)
RF tubes in space (A. S. Gilmour)
Mirror electron microscopy (R. Godehardt)
Relativistic microwave electronics (V. L. Granatstein)
Rough sets (J. W. Grzymala-Busse)
The quantum flux parametron (W. Hioe and M. Hosoya)
The de Broglie-Bohm theory (P. Holland)
Contrast transfer and crystal images (K. Ishizuka)
Seismic and electrical tomographic imaging (P. D. Jackson and colleagues)
Morphological scale-space operations (P. Jackway)
Algebraic approach to the quantum theory of electron optics (R. Jagannathan and S. Khan)
Electron holography in conventional and scanning transmission electron microscopy (F. Kahl and H. Rose)
Quantum neurocomputing (S. Kak)
Applications of speech recognition technology (H. R. Kirby)
Spin-polarized SEM (K. Koike)
Sideband imaging (W. Krakow)
Highly anisotropic media (C. M. Krowne)
High-definition television (M. Kunt)
Regularization (A. Lannes)
Numerical methods for electron optics (B. Lencová)
Near-field optical imaging (A. Lewis)
SEM image processing (N. C. MacDonald)
Electronic tools in parapsychology (R. L. Morris)
Image formation in STEM (C. Mory and C. Colliex)
The Growth of Electron Microscopy (T. Mulvey, ed.)
Phase retrieval (N. Nakajima)
The Gaussian wavelet transform (R. Navarro and colleagues)
Phase-space treatment of photon beams (G. Nemes)
Image plate (T. Oikawa and N. Mori)
Z-contrast in materials science (S. J. Pennycook)
Electron scattering and nuclear structure (G. A. Peterson)
Multislice theory of electron lenses (G. Pozzi)
The wave-particle dualism (H. Rauch)
Electrostatic lenses (F. H. Read and I. W. Drummond)
Scientific work of Reinhold Rüdenberg (H. G. Rudenberg)
Electron holography (D. Saldin)
X-ray microscopy (G. Schmahl)
Accelerator mass spectroscopy (J. P. F. Sellschop)
Applications of mathematical morphology (J. Serra)
Set-theoretic methods in image processing (M. I. Sezan)
Texture analysis (H. C. Shen)
Focus-deflection systems and their applications (T. Soma)
Information measures (I. J. Taneja)
New developments in ferroelectrics (J. Toulouse)
Orientation analysis (K. Tovey)
Knowledge-based vision (J. K. Tsotsos)
Electron gun optics (Y. Uchikawa)
Very high resolution electron microscopy (D. van Dyck)
Spin-polarized SEM (T. R. van Zandt and R. Browning)
Morphology on graphs (L. Vincent)
Cathode-ray tube projection TV systems (L. Vriens, T. G. Spanjer, and R. Raue)
Canonical aberration theory (J. Ximen)
Image enhancement (P. Zamperoni)
Signal description (A. Zayezdny and I. Druckmann)
The Aharonov-Casher effect (A. Zeilinger, E. Rasel, and H. Weinfurter)
Minimax Algebra and Applications

R. A. CUNINGHAME-GREEN

School of Mathematics and Statistics, University of Birmingham, Birmingham B15 2TT, England
I. Discrete Events ..... 2
   A. Discrete-Event Systems ..... 2
   B. Forward Recursion: Max Algebra ..... 4
   C. Processes of Max Algebra ..... 10
   D. Complexity Considerations ..... 13
   E. Finiteness Considerations ..... 14
II. Critical Events ..... 16
   A. Event Times ..... 16
   B. Conjugation: The *-Operation ..... 23
III. Scheduling and Approximation ..... 26
   A. Minimax Algebra ..... 26
   B. Linear Equations ..... 30
   C. Chebyshev Approximation ..... 34
   D. Diverse Interpretations ..... 35
IV. Path Problems ..... 36
   A. Directed Graphs ..... 36
   B. Weak Transitive Closure ..... 41
V. Connectivity ..... 45
   A. Strong Transitive Closure ..... 45
   B. Connected Graphs ..... 48
   C. Acyclic Graphs ..... 52
   D. Further Properties of Delta ..... 54
VI. The Steady State ..... 58
   A. The Speed of a System ..... 58
   B. The Eigenvalue ..... 60
   C. Finite Eigenvectors ..... 63
   D. The Eigenspace ..... 67
   E. Steady State without Strong Connectivity ..... 70
VII. Infinite Processes ..... 75
   A. Convergence to Steady State ..... 75
   B. Power Series ..... 79
VIII. Maxpolynomials ..... 85
   A. Siting a Service Facility ..... 85
   B. Maxpolynomials ..... 88
   C. Extrema of Product Forms ..... 91
   D. Evolution ..... 94
IX. Efficient Rational Algebra ..... 99
   A. Resolution ..... 99
   B. Linear-Time Rational Calculation ..... 106
   C. Convexity and Concavity ..... 107

Copyright © 1995 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-014732-7
X. Miscellaneous Topics ..... 109
   A. Approximation and Residuation ..... 110
   B. General Linear Dependence ..... 114
   C. Cayley-Hamilton and Realizability ..... 117
References ..... 120
I. DISCRETE EVENTS

A. Discrete-Event Systems

1. Events and Activities
In many systems which are of interest in engineering or physics, the state varies continuously through time. A familiar example is an electric circuit, where the voltage V at a particular point may be described by a function of a continuous variable t representing time: V = V(t).
Equations, usually differential equations, are then used to show how variables relating to different parts of the circuit influence one another. By contrast, many other systems, especially those which occur in digital signal processing or industrial production, are often more conveniently thought of in terms of events. A machine slitting sheet steel into strips proceeds from its first to its second ... to its rth sheet. A descriptive variable s, say, might now define the total stock of unslit sheets standing in front of the machine when it completes its rth event: s = s(r).
Similarly, in operating a bus service, an individual vehicle experiences a sequence of events consisting of its arrivals and departures, and at each transfer point events occur corresponding to arrivals and departures on the various routes passing through that point. We may speak of a discrete-event system (DES), in which the individual components move from event to event rather than varying continuously through time. A characteristic of many such DESs, which will be at the heart of our discussion, is that any given component must wait before proceeding to its next event until certain others have completed their current events. A convenient illustration is afforded by a robot welding machine on an assembly line, whose task is to spot-weld together two sub-assemblies A and B produced by two presses in different parts of the factory. The welder cannot begin a new welding operation until it has completed its current
job and a new sub-assembly A has been pressed and fetched and a new sub-assembly B has been pressed and fetched. It is the conjunction of all three of these events which releases the robot for its next event. In the management of such a system, some questions naturally arise. If everything is synchronized as carefully as possible, what is the maximum speed at which the system could run? If a project involves the processing of 100 workpieces, and delivery promises have been made, what is the latest time at which the project could begin? It is to answer questions of this sort that the techniques of minimax algebra were developed.

2. The Model System
Figure 1 represents a hypothetical DES of the sort described in the preceding section, containing four machines. We shall call it the model system. Suppose that machine 1, once it has begun a new piece of work, takes four units of time to finish. If the rth event on any machine is by definition its completion of its rth workpiece, then certainly the (r + 1)st event on machine 1 cannot happen until at least four time-units after the rth. Figure 1 denotes this by an arrow, marked with the number of time-units, directed from machine 1 in the (r + 1)st-event column to machine 1 in the rth-event column.
FIGURE 1. The model system.
We suppose also that machine 1 must receive work from machines 2 and 3 and that the (r + 1)st event on machine 1 is thereby similarly constrained not to take place until at least nine and three units of time after the rth events on machines 2 and 3, respectively. We denote these constraints similarly by labelled arrows in Fig. 1.
Let xi(r) represent the earliest time at which the rth event can occur on the ith machine, for general values i = 1, ..., 4 and r = 1, 2, .... If we know x1(r), ..., x4(r) for some particular r, then the constraints shown in Fig. 1 determine x1(r + 1) as

    x1(r + 1) = max(x1(r) + 4, x2(r) + 9, x3(r) + 3).    (1.1)
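With illustrative rth event-times (the values below are assumptions, not taken from the text), Eq. (1.1) is a single max over sums; a quick Python check:

```python
# Hypothetical rth event-times on machines 1-3 (illustrative values only)
x1, x2, x3 = 4, 5, 5

# Eq. (1.1): earliest (r+1)st event-time on machine 1
x1_next = max(x1 + 4, x2 + 9, x3 + 3)
print(x1_next)  # 14
```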
B. Forward Recursion: Max Algebra

1. Forward Equations

Machines 2, 3, and 4 of the model system are supposedly constrained in a similar manner to machine 1, though for clarity this is not indicated fully in Fig. 1. In general, a system of this kind with n machines is governed by a set of forward recursions of the form

    xi(r + 1) = max(x1(r) + ai1, ..., xn(r) + ain),    (1.2)

holding for all i = 1, ..., n and r = 1, 2, ..., and showing how later events depend upon earlier. In a practical situation, not every machine will constrain every other; thus, there is no entry corresponding to a14 in Eq. (1.1). This is mathematically rather inconvenient, so we introduce the symbol ε to represent −∞, a hypothetical number smaller than any real number. Any aij which is not defined naturally by the system itself is assumed to equal ε, and in effect will be ignored by the operator max. Thus, the RHS of Eq. (1.1) becomes

    max(x1(r) + 4, x2(r) + 9, x3(r) + 3, x4(r) + ε).    (1.3)
This convention allows us to write, for a general system with n machines, a full n × n system matrix A = [aij], each of whose elements is either a real number or equal to ε. For the model system shown incompletely in Fig. 1, we shall take the system matrix to be

    (1.4)
Reserved notation

F denotes the real-number set; W denotes F ∪ {ε}; Wm,n denotes the set of all m × n matrices over W. The usual notation for the real-number system is avoided, since the algebra will differ from conventional algebra: The notation F emphasizes finiteness, and elements of F will also be called finite scalars. Elements of W will be called scalars, and elements of Wn,1 will be called vectors.

2. The Notation of Max Algebra
Max algebra depends on a crucial change of notation. In place of the operator max, use the symbol ⊕, reminiscent of an addition symbol; instead of + use ⊗, reminiscent of a multiplication symbol. Thus,

    x ⊕ y = max(x, y);    x ⊗ y = x + y.    (1.5)
Because this will be the prevailing notation, the words addition and multiplication, and related terms such as sum and product, will always refer to the operations ⊕ and ⊗ (or later, to their duals), not to the more usual arithmetical operations, which when necessary will be referred to by explicitly using one of the adjectives arithmetical or conventional. Following this, Eq. (1.2) becomes

    xi(r + 1) = ai1 ⊗ x1(r) ⊕ ai2 ⊗ x2(r) ⊕ ... ⊕ ain ⊗ xn(r),    (1.6)

which has now taken the form of a simple linear function. In fact, since Eq. (1.6) is just the inner-product of row i of the matrix A with a vector x(r) defined by

    x(r) = (x1(r), ..., xn(r))^T,    (1.7)

we may rewrite the entire set of n relationships (1.6) as

    x(r + 1) = A ⊗ x(r).    (1.8)
More generally, x(r) may be regarded as describing the state of the system at stage r. Equation (1.8) shows how the state of the system evolves from stage to stage under the action of a linear operator A. The symbol ⊗ in Eq. (1.8) manifests the fact that in carrying out the matrix multiplication the operators ⊕, ⊗ play the roles of addition and multiplication.
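The recursion x(r + 1) = A ⊗ x(r) is easy to sketch in Python, using float('-inf') for ε so that absent constraints are ignored by max. The 3-machine matrix and starting times below are assumptions for illustration, not the system matrix of Eq. (1.4):

```python
E = float("-inf")  # epsilon: E + t == E, so max ignores "no constraint" entries

def maxplus_mv(A, x):
    """Max-plus product A (x) x: component i is max_j (a_ij + x_j), as in Eq. (1.6)."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]

# Illustrative 3-machine system (entries assumed)
A = [[4, 9, 3],
     [E, 5, 6],
     [3, E, 5]]
x1 = [4, 5, 5]          # first event-times
x2 = maxplus_mv(A, x1)  # second event-times via Eq. (1.8)
print(x2)               # [14, 11, 10]
```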
In summary, the definitions in Eq. (1.5) define an algebraic structure (W, ⊕, ⊗) which can be extended to suitably dimensioned matrices in the usual way by the definitions

    ]A ⊕ B[ij = aij ⊕ bij;    ]A ⊗ B[ij = Σ⊕k aik ⊗ bkj.    (1.9)

In Eq. (1.9), the notation Σ⊕, suggestive of the arithmetical summation notation Σ, denotes in the obvious way repeated use of the addition operator ⊕. Thus, in Eq. (1.9),

    Σ⊕k aik ⊗ bkj = ai1 ⊗ b1j ⊕ ... ⊕ ain ⊗ bnj.
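The two matrix operations of Eq. (1.9) can be sketched directly; the matrices below are illustrative assumptions:

```python
E = float("-inf")  # epsilon

def mp_add(A, B):
    """Matrix sum of Eq. (1.9): entry (i, j) is max(a_ij, b_ij)."""
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mp_mul(A, B):
    """Matrix product of Eq. (1.9): entry (i, j) is max_k (a_ik + b_kj)."""
    return [[max(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Illustrative matrices (assumed values)
A = [[2, E],
     [0, 3]]
B = [[1, 4],
     [E, 0]]
print(mp_add(A, B))  # [[2, 4], [0, 3]]
print(mp_mul(A, B))  # [[3, 6], [1, 4]]
```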
The name max algebra will be used in this book both for the algebraic structure (W, ⊕, ⊗) and, more informally, for the body of manipulative processes based upon it.

Reserved notation

Matrices will generally be denoted by upper-case letters and their components by corresponding lower-case letters: A = [aij]. Introduction of either the lower- or the upper-case notation in a given context automatically implies introduction of the other. ]A[ij denotes the (i, j)th element of a given matrix A.

Application 1.1
An AND-gate is a device used in signal processing. It is characterized by a single output and a number of inputs, each of which may be active or quiescent. If any input is quiescent, then the output is quiescent; if at some given instant all inputs are for the first time simultaneously active, then the output becomes active instantaneously. A delay is characterized by a single input, a number of outputs, and a given fixed time-interval. If the input becomes for the first time active, then after the given time-interval has elapsed, the outputs become active. Conventional symbols for these devices are shown in Figs. 2a and 2b. For the system shown in Fig. 3, let x1, x2, x3 represent the times at which the system inputs first become active, from an initial state in which everything is quiescent. Let y1, y2 represent the same for the system outputs.
FIGURE 2. (a) An AND gate. (b) A delay.
Then y = B ⊗ x, where

    y = (y1, y2)^T,    B = [ 3 2 3 ],    x = (x1, x2, x3)^T.
                           [ 4 1 5 ]

We call B the transfer matrix. A suitable transfer matrix D for the system in Fig. 4 may be calculated as D = B ⊗ C.
3. Delivery Dates

A particular project involves altogether five stages of activity of the model system introduced in Section A.2. If the project begins by all machines being set in motion at time zero, at what time will each machine finish work on the project? The diagonal elements of the system matrix A of Eq. (1.4) give the completion times of the first stages, at which the machines clearly do not constrain one another. Hence, the first event-times are given by

    (1.10)
FIGURE 3. An AND network.
Now, x(2) = A ⊗ x(1), x(3) = A ⊗ x(2), etc. Hence,

    x(5) = A ⊗ A ⊗ A ⊗ A ⊗ x(1).    (1.11)

Reserved notation

    A(p) denotes A ⊗ A ⊗ ... ⊗ A (p factors).    (1.12)
Application 1.2

Multiplying the system matrix A by itself gives

    A(2) = [ 12 13 12 15 ]
           [  7 12  9 10 ]
           [  7 10 12 10 ]
           [  ·  6  8  8 ]    (1.13)
FIGURE 4. A two-stage AND network.
Squaring again gives

    A(4) = [ 24 25 24  2· ]
           [  ·  ·  ·  ·  ]
           [  ·  ·  ·  ·  ]
           [ 15 18 20 18  ]    (1.14)
From Eqs. (1.11)-(1.14), the desired fifth-event times are found as

    x(5) = ( ·, ·, ·, 26 )^T.    (1.15)
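The power computation of Application 1.2 can be sketched with a hypothetical 4 × 4 system matrix (the entries are assumptions for illustration, not the matrix of Eq. (1.4)); by associativity, A(4) is equally obtainable as A(2) ⊗ A(2):

```python
E = float("-inf")  # epsilon

def mp_mul(A, B):
    """Max-plus matrix product, as in Eq. (1.9)."""
    return [[max(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mp_power(A, p):
    """A(p) = A (x) ... (x) A, p factors (Eq. (1.12))."""
    R = A
    for _ in range(p - 1):
        R = mp_mul(R, A)
    return R

# Hypothetical 4x4 system matrix (assumed entries)
A = [[4, 9, 3, E],
     [3, 5, E, 1],
     [E, 7, 5, E],
     [E, E, 6, 8]]
A2 = mp_power(A, 2)
A4 = mp_power(A, 4)
# By the associative law for (x), A(4) = A(2) (x) A(2)
assert A4 == mp_mul(A2, A2)
```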
C. Processes of Max Algebra

1. Axiomatic Justification
The validity of the preceding calculations follows from the observation that the operators ⊕, ⊗ satisfy many of the rules of conventional algebra:

    x ⊕ y = y ⊕ x,    Commutative law for ⊕,
    x ⊕ (y ⊕ z) = (x ⊕ y) ⊕ z,    Associative law for ⊕,
    x ⊗ y = y ⊗ x,    Commutative law for ⊗,
    x ⊗ (y ⊗ z) = (x ⊗ y) ⊗ z,    Associative law for ⊗.

These facts may be trivially verified from the definitions of the operators ⊕, ⊗. In addition, we have

    x ⊗ (y ⊕ z) = x ⊗ y ⊕ x ⊗ z,    Distributive law for ⊗ over ⊕.

This last follows from the fact that

    x + max(y, z) = max(x + y, x + z).

As in conventional linear algebra, it follows that the operations ⊕, ⊗ as defined for matrices are associative and distributive, and ⊕ is commutative. For example, by virtue of the associative law for ⊗, the power A(4) could be calculated as A(2) ⊗ A(2) in Application 1.2.

2. Further Properties of Scalars
In addition to the axioms noted earlier, the following hold good and are easily verified:

    x ⊕ ε = ε ⊕ x = x,    Identity property of ε under ⊕,
    x ⊗ ε = ε ⊗ x = ε,    Null property of ε under ⊗,
    x ⊗ 0 = 0 ⊗ x = x,    Identity property of 0 under ⊗.
If a scalar x is multiplied by itself one or more times, we follow the notation already established for matrices.

Reserved notation

    x(p) = x ⊗ ... ⊗ x (p factors);    x(0) denotes zero.    (1.16)

Evidently, in conventional notation, x(p) is just the arithmetical product px, and so in place of the conventional binomial theorem for scalars
we have

    (x ⊕ y)(p) = x(p) ⊕ y(p)    (p ≥ 0),

whence in general

    (Σ⊕ xj)(p) = Σ⊕ (xj)(p),    Principle of exponentiation.

Notwithstanding the many similarities, max algebra differs from conventional algebra in at least two important respects, which follow trivially from the definition of ⊕:

    x ⊕ x = x,    Idempotent law of addition,
    x ⊕ y ≥ x,    Majority law of addition.

From the latter follows

    Σ⊕ (j = 1 to k) xj ≥ xi    (∀i),    Principle of majority.

Subsequent arguments will apply these principles freely.

3. Further Properties of Matrices
As will be familiar from conventional linear algebra, when a matrix product A ⊗ B is formed, each column of the product matrix is precisely the vector which would arise by applying the matrix A to the corresponding column of B. This fact will be used in later arguments under the name principle of column action.
To multiply a matrix A by a scalar δ, we again proceed as in conventional linear algebra to define

    ]δ ⊗ A[ij = δ ⊗ aij,    (1.17)

and then it easily follows as usual that

    δ ⊗ (A ⊗ x) = A ⊗ (δ ⊗ x).    (1.18)
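The commutation of Eq. (1.18), which underlies Application 1.3 below, can be checked numerically; the matrix, vector, and delay used here are assumptions for illustration:

```python
E = float("-inf")  # epsilon

def mp_mv(A, x):
    """Max-plus matrix-vector product A (x) x."""
    return [max(a + xi for a, xi in zip(row, x)) for row in A]

def mp_scale(d, x):
    """delta (x) x: add the scalar delay d to every component of x."""
    return [d + xi for xi in x]

# Illustrative data (assumed values)
A = [[4, 9, 3],
     [E, 5, 6],
     [3, E, 5]]
x = [4, 5, 5]
d = 2
# Eq. (1.18): delta (x) (A (x) x) == A (x) (delta (x) x)
assert mp_scale(d, mp_mv(A, x)) == mp_mv(A, mp_scale(d, x))
```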
Application 1.3

Suppose that a power cut occurs during a project, imposing a time delay equal to δ on all the rth events, for some r. So the rth event-times now become δ ⊗ x(r), and hence the new (r + 1)st event-times will be

    A ⊗ (δ ⊗ x(r)) = δ ⊗ (A ⊗ x(r)),

on using Eq. (1.18). Hence the (r + 1)st events and similarly all subsequent events will be affected by exactly the same delay δ.
In max algebra, we define a diagonal matrix as one in which all the off-diagonal elements equal ε.

Reserved notation

diag(λ1, ..., λn) denotes the n × n diagonal matrix having λ1, ..., λn on the main diagonal, e.g.:

    diag(3, −2) = [ 3  ε ]
                  [ ε −2 ]

Reserved notation

    I = diag(0, ..., 0)    and    Φ = diag(ε, ..., ε).
It is straightforward to show that I and Φ have respectively the identity and null property for matrices over max algebra. Evidently, the idempotent law of addition for scalars immediately implies the same for matrices: A ⊕ A = A.

Application 1.4

For matrices A, B and p ≥ 1, the idempotent law of addition simplifies the binomial theorem to

(A ⊕ B)^(p) = A^(p) ⊕ (A^(p−1) ⊗ B) ⊕ ··· ⊕ (A ⊗ B^(p−1)) ⊕ B^(p).

In later chapters, an important role will be played by the following power-sum of a certain matrix D:

Γ_p = I ⊕ D ⊕ D^(2) ⊕ ··· ⊕ D^(p).

It is clear that

Γ_p = (I ⊕ D)^(p).
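This power-sum identity is easy to check numerically. The following Python sketch uses an illustrative 2 × 2 matrix D (not taken from the text), with ε represented by −∞, and compares Γ_3 against (I ⊕ D)^(3):

```python
# Numerical check of Gamma_p = (I (+) D)^(p), here with p = 3.
# epsilon is represented by -infinity.
EPS = float("-inf")

def maxplus_mm(A, B):
    """Max-algebra matrix product: (max, +) replaces (+, *)."""
    return [[max(a + b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def maxplus_add(A, B):
    """Max-algebra matrix sum: entrywise max."""
    return [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

n = 2
I = [[0 if i == j else EPS for j in range(n)] for i in range(n)]
D = [[1, 5], [2, EPS]]           # illustrative matrix, not from the text

gamma, Dk = I, I                 # accumulate Gamma_3 = I (+) D (+) D^(2) (+) D^(3)
for _ in range(3):
    Dk = maxplus_mm(Dk, D)
    gamma = maxplus_add(gamma, Dk)

S = maxplus_add(I, D)            # compute (I (+) D)^(3) directly
M = S
for _ in range(2):
    M = maxplus_mm(M, S)

print(gamma == M)                # → True
```

The identity holds because I commutes with D, so the simplified binomial theorem of Application 1.4 applies.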
Application 1.5

Section VI shows that the diagonal elements of a system matrix A and its powers have crucial relevance to the question of the maximum speed at which the system can run.
From the formula for the diagonal elements of A^(2),

[A^(2)]_ii = Σ⊕_j a_ij ⊗ a_ji.

By the majority principle,

[A^(2)]_ii ≥ (a_ii)^(2),

and a straightforward generalization yields

[A^(p)]_ii ≥ (a_ii)^(p)   (p ≥ 1).
D. Complexity Considerations

1. The Orbit
If the state of a DES evolves from an initial state x under the action of a matrix A, the sequence of states x, A ⊗ x, ..., A^(p) ⊗ x constitutes the (p+1)-stage forward orbit based on x. Thus, the five-stage forward orbit for the model system, based on x(1) given by Eq. (1.10), can be calculated as

r:      1    2    3    4    5
x(r):   4   13   19   25   31
        4   10   16   22   28
        6   12   18   24   30
        4    8   14   20   26
How much computational effort is involved in calculating such an orbit? To form the inner product of a row of a matrix with a vector of order n involves, as shown in Eq. (1.3), altogether n arithmetical additions and n magnitude comparisons to implement the operator max. If we take the view that each of these primitive tasks takes one unit of computational effort, then the inner product may be calculated in order of n steps, abbreviated as O(n) steps. For brevity, we may loosely say that the task is O(n). When an n × n matrix A is applied to a vector x to form the product A ⊗ x, altogether n inner products must be formed, so this process is clearly O(n²). In calculating the (p+1)-stage orbit, the process must be carried out p times, establishing the following.

Theorem 1.1. A (p+1)-stage orbit for a DES with n machines may be calculated in O(pn²) steps. ∎

A statement such as Theorem 1.1, about how the amount of calculation grows with the size of a problem, is usually called a complexity statement.
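As a concrete sketch of the O(pn²) orbit calculation (in Python, with ε represented by −∞ and an illustrative system matrix rather than the model system's):

```python
# Forward orbit in max algebra: x, A (x) x, ..., A^(p) (x) x.
EPS = float("-inf")

def maxplus_mv(A, x):
    """One inner product uses n additions and n max-comparisons: O(n^2) per product."""
    return [max(a + xi for a, xi in zip(row, x)) for row in A]

def forward_orbit(A, x, p):
    """(p+1)-stage forward orbit based on x -- O(p n^2) steps, as in Theorem 1.1."""
    orbit = [x]
    for _ in range(p):
        x = maxplus_mv(A, x)
        orbit.append(x)
    return orbit

A = [[4, 3, EPS],
     [EPS, 2, 5],
     [1, EPS, 3]]     # illustrative 3-machine system matrix
print(forward_orbit(A, [0, 0, 0], 2))   # → [[0, 0, 0], [4, 5, 3], [8, 8, 6]]
```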
The modern theory of complexity is highly developed, and the foregoing use of its ideas is distinctly oversimplified. However, it will be adequate for our purpose, which is to give a consistent measure of the amount of work involved in the execution of the algorithms we encounter. A fuller account is to be found in Papadimitriou and Steiglitz (1982).

2. Matrix-Powering

The generation of delivery dates does not require a calculation of every stage of an orbit, but merely of the last stage. This may be calculated by applying a suitable power A^(p) of the system matrix A to the initial state x. In the case that the exponent p is a power of 2, we can calculate A^(p) by a process of repeated matrix-squaring. Now, forming a matrix-vector product is O(n²), so by the principle of column action, forming the product of two n × n matrices is O(n³). In particular, squaring a matrix is O(n³), so to form A^(p), where p = 2^k, is O(kn³), in other words, O(n³ log p). The logarithm is in principle to base 2, but this equals a constant times the natural logarithm and the units are arbitrary.

When p is not a power of 2, we may proceed as follows. Suppose, for example, that p = 13. In the scale of 2, p would be written 1101, because

13 = 2³ + 2² + 2⁰.

Hence, we can calculate A^(13) as A^(8) ⊗ A^(4) ⊗ A. We generate two matrix-sequences, one consisting of the consecutive powers of A and the other a running product B, say, of those powers of A we wish to include. For this example, we initialize B as A; we calculate A^(2), then square it to give A^(4), which we multiply into B; we square A^(4) to give A^(8), which we multiply into B to give B = A^(13). It is obvious how this generalizes, and not hard to see that in the worst case, when p is one less than a power of 2, p = 2^(k+1) − 1, we carry out 2k matrix multiplications altogether, whence the following result.

Theorem 1.2. For an n × n matrix A, the power A^(p) may be calculated in at most O(n³ log p) steps. ∎
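The powering scheme just described can be sketched as follows (Python; the binary digits of p, scanned from the least significant end, decide which successive squarings are multiplied into the running product):

```python
def maxplus_mm(A, B):
    """Max-algebra matrix product -- O(n^3) for n x n factors."""
    return [[max(a + b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def maxplus_power(A, p):
    """A^(p) by repeated squaring: O(n^3 log p) steps, as in Theorem 1.2 (p >= 1)."""
    result = None
    square = A                       # successively A, A^(2), A^(4), ...
    while p:
        if p & 1:                    # this binary digit of p is set
            result = square if result is None else maxplus_mm(result, square)
        square = maxplus_mm(square, square)
        p >>= 1
    return result

A = [[0, 3], [2, 0]]                 # illustrative matrix
print(maxplus_power(A, 13))          # → [[30, 33], [32, 30]]
```

Since all factors are powers of the same matrix, the order of the multiplications into the running product does not matter.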
It would be rather strange if a system matrix turned up with either a complete row or a complete column consisting of the element ε, since in the first such case we would have a machine wholly uninfluenced by the system (including itself!) and in the second a machine exerting no influence on the
system. Equally, although ε is a necessary element in a system matrix, it will often not have a very natural interpretation in the description of states and event-times.
Reserved notation

F_{m,n} ⊆ W_{m,n} denotes the set of all m × n matrices in which no row or column contains ε exclusively. Notice that, according to this notation, F_{m,1} denotes the set of all finite m-rowed vectors.
Theorem 1.3. If A ∈ F_{m,n} and B ∈ F_{n,q}, then A ⊗ B ∈ F_{m,q}.

Proof. Define C = A ⊗ B. It is easy to see that C will be m × q. Let i be any row index. Matrix A has at least one finite element on row i: say a_ij. Matrix B has at least one finite element on row j: say b_jh. Then

c_ih = Σ⊕_k a_ik ⊗ b_kh

is finite because at least one finite term appears, namely a_ij ⊗ b_jh. Hence the arbitrarily chosen row i of C contains at least one finite element. Similarly, so does any arbitrarily chosen column. ∎

Application 1.6
Theorem 1.3 indicates one of the most important properties of F_{n,n}: that the action of A ∈ F_{n,n} on a finite vector is to produce another finite vector. In particular, for a system matrix A ∈ F_{n,n}, the forward orbit based on a finite initial state-vector consists entirely of finite state-vectors. In a DES of the kind we have discussed, the (r+1)st event cannot precede the rth on a given machine, so

x_i(r+1) ≥ x_i(r)   (i = 1, ..., n; r = 1, ...),

regardless of how the initial event-times are chosen. We call this the increasing property of the DES.

Theorem 1.4. If A ∈ W_{n,n} has a_ii ≥ 0 (i = 1, ..., n), then A ∈ F_{n,n}, and a DES having A as system matrix has the increasing property.

Proof. Clearly, A ∈ F_{n,n}. Also,

x_i(r+1) = Σ⊕_j a_ij ⊗ x_j(r) ≥ a_ii ⊗ x_i(r) ≥ x_i(r),

by the majority principle and the hypothesis a_ii ≥ 0, and the result follows. ∎
Application 1.7
Regardless of the increasing property, the proof of Theorem 1.4 gives

x_i(r+1) ≥ a_ii ⊗ x_i(r),

and hence by iteration,

x_i(p+1) ≥ (a_ii)^(p) ⊗ x_i(1)   (p ≥ 1).
II. CRITICAL EVENTS

A. Event Times

1. Notation of Min Algebra
Figure 5 depicts a circular bus route in a city, serving four districts N_1, ..., N_4, and also indicates the transit times between consecutive districts. With this diagram we can associate a 4 × 4 matrix D = [d_ij], in which d_ij gives the relevant transit time if N_i follows N_j on the route, and d_ij = ε* otherwise, where ε* represents +∞, a hypothetical number greater than any real number.

FIGURE 5. A circular bus route.
Suppose four buses set off, one from each district, at times v_1, ..., v_4, respectively, and then circulate. What are the earliest times, u_1, ..., u_4, respectively, at which a bus will be available for departure at each district? Evidently, such an event at district N_i is caused either by a bus coming into service there, or by the arrival of a bus from the preceding district. Thus,

u_i = min( v_i, min_j (d_ij + u_j) ).   (II.1)
By analogy with max algebra, introduce the symbols ⊕′, ⊗′ with the meanings

x ⊕′ y = min(x, y);   x ⊗′ y = x + y.   (II.2)

(We discuss later the distinction between ⊗ and ⊗′.)
Reserved notation

W* denotes F ∪ {ε*}, the set of dual scalars. W*_{m,n} denotes the set of m × n matrices over W*. Σ⊕′ denotes repeated use of the operator ⊕′.

The algebraic structure (W*, ⊕′, ⊗′), and the manipulative processes based on it, will be called min algebra (some authors say min-plus algebra). It is clear that every result proved in max algebra has a valid corresponding result in min algebra, which we shall call its dual. Extending the operators ⊕′, ⊗′ to matrices by analogy with Eqs. (1.9) of Section I, we arrive at the following reformulation of Eq. (II.1): Given v ∈ W*_{n,1}, D ∈ W*_{n,n}, find a solution u of

u = v ⊕′ (D ⊗′ u).   (II.3)
Reserved notation

The pth power (p ≥ 1) of a matrix B in min algebra will be denoted B^[p].

By repeatedly substituting the formula for u from Eq. (II.3) in its own RHS, we obtain

u = v ⊕′ D ⊗′ v ⊕′ D^[2] ⊗′ v ⊕′ ···,   (II.4)

with the evident interpretation that the presence of the first bus at any point results from its having started there, or having arrived after one or after two or more transitions. Since in this example there are only four districts, it is clear that we may truncate the series in Eq. (II.4) at the term D^[3] ⊗′ v. In general, for n districts, we truncate at D^[n−1] ⊗′ v.
Application 11.1
The matrix I * , whose diagonal elements are zero and off-diagonal elements are E * , has the identity property of matrices over min algebra. From Eq. (11.4), u = (I*
0' D 0' D[Z]0' ...D'n-11) 0' u,
so by the dual of Application 1.4, u = (I*
0' D)'"-'l 6' u.
The required power of I* 0' D can be calculated in O(n3log n ) steps following the dual of Theorem 1.2. For the particular four-district case just presented, suppose that buses start from districts 1,2, 3, respectively, at times 9, 0, 2; no bus starts from district 4. Then u is given by
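Since the figure's transit-time matrix has not survived reproduction here, the Python sketch below uses hypothetical transit times (3, 4, 2, 5 around the route); it forms (I* ⊕′ D)^[n−1] and applies it to the start-time vector v = (9, 0, 2, ε*)ᵀ:

```python
# Min-algebra computation of Application II.1 with hypothetical transit times.
# epsilon* is represented by +infinity.
INF = float("inf")

def minplus_mm(A, B):
    return [[min(a + b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def minplus_mv(A, x):
    return [min(a + xi for a, xi in zip(row, x)) for row in A]

n = 4
# D[i][j] = hypothetical transit time if district i+1 follows district j+1, else eps*.
D = [[INF, INF, INF, 5],
     [3, INF, INF, INF],
     [INF, 4, INF, INF],
     [INF, INF, 2, INF]]
I_star = [[0 if i == j else INF for j in range(n)] for i in range(n)]

M = [[min(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(I_star, D)]  # I* (+)' D
P = M
for _ in range(n - 2):               # P = (I* (+)' D)^[n-1]
    P = minplus_mm(P, M)

v = [9, 0, 2, INF]                   # no bus starts from district 4
print(minplus_mv(P, v))              # → [9, 0, 2, 4]
```

With these assumed transit times, the bus starting at district 3 reaches district 4 after two time units, giving the only availability time not equal to a start time.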
The relationship between Eqs. (II.3) and (II.4) is of central importance in this general theory and was studied in detail by Gondran and Minoux and others; see Zimmermann (1981). We shall return to this topic.

Application II.2
An OR-gate, depicted in Fig. 6, is characterized by a single output and a number of inputs. If all the inputs are quiescent, then the output is quiescent; if at some given instant, for the first time, some input becomes active, then the output becomes active instantaneously. Thus, the earliest-occurring input triggers the output. For the system shown in Fig. 7, let y_1, y_2 represent the times at which the system inputs first become active, from an initial state in which everything is quiescent. Let z_1, z_2 represent the same for the system outputs.

FIGURE 6. An OR gate.
FIGURE 7. An OR network.
Then z = E ⊗′ y, where z = [z_1, z_2]ᵀ, y = [y_1, y_2]ᵀ, and E is the 2 × 2 matrix of delays read from Fig. 7.

If the inputs to this system are the outputs of the system considered in Fig. 3 (Application 1.1), then z = E ⊗′ (B ⊗ x). Hence, in Fig. 8, if the inputs all become active at time zero, we can calculate that the outputs become active at times 8 and 6, respectively.
2. Backward Recursion

Suppose that the event-times x(5) for the model system, as calculated in Eq. (1.15), are now used as a fixed planning target, or as the basis for delivery promises. In the course of the project, however, some unforeseen delay might occur, perhaps as a result of machine breakdown. How great a delay, and to which events, could be tolerated without prejudice to the calculated finishing times x(5)? Introduce the notation y_i(r) to denote the latest possible time for the rth event on the ith machine. From a knowledge of the y_i(r+1) for any particular value of r, we can infer the values of the y_i(r). A glance at the
FIGURE 8. A mixed network.
situation of machine 1 in the rth-event column in Fig. 1 will make this clear. Machine 1 must complete its rth event no later than four time units before its (r+1)st and no later than three time units before machine 2's (r+1)st. Thus, the latest possible time for machine 1's rth event is

y_1(r) = min( y_1(r+1) − 4, y_2(r+1) − 3 ).

In general, then, a system of this kind with n machines is governed by a set of backward recursions of the form

y_i(r) = min( y_1(r+1) + ā_i1, ..., y_n(r+1) + ā_in ),   (II.5)

holding for all i = 1, ..., n and r = 1, 2, ..., for suitable constants ā_ij. Where no relevant constraint actually exists between machines i and j, we set ā_ij = ε*.
In the notation of min algebra, the set of recursions in Eq. (II.5) assumes the compact form

y(r) = A* ⊗′ y(r+1),   (II.6)

where A* is the matrix [ā_ij]. Consideration of a general diagram based on Fig. 1 easily shows that the elements ā_ij in Eq. (II.5) are related to the elements a_ij of the system matrix by

ā_ij = −a_ji.   (II.7)

Thus, the matrices A, A* are mutually related by the rule transpose and negate. Evidently,

(A*)* = A.   (II.8)
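A brief Python sketch of the transpose-and-negate rule and one backward step follows; the 2 × 2 matrix A below is illustrative, chosen so that machine 1's constraints match the worked example above (a_11 = 4, a_21 = 3):

```python
# Conjugation A* (transpose and negate) and one backward-recursion step,
# y(r) = A* (x)' y(r+1).  eps = -inf maps to eps* = +inf under negation.
EPS = float("-inf")

def conjugate(A):
    """Transpose and negate: the (i, j) entry of A* is -a_ji."""
    return [[-A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

def minplus_mv(A, y):
    return [min(a + yi for a, yi in zip(row, y)) for row in A]

A = [[4, EPS],
     [3, 2]]                    # illustrative system matrix
A_star = conjugate(A)           # [[-4, -3], [+inf, -2]]
y_next = [20, 19]               # latest permissible (r+1)st event-times
print(minplus_mv(A_star, y_next))   # latest rth event-times → [16, 17]
```

The first component reproduces the hand calculation: y_1(r) = min(20 − 4, 19 − 3) = 16.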
Application II.3
For the model system, A* is found by transposing and negating the system matrix A given in Eq. (1.4). If the fifth-event times x(5) are now taken as latest target times y(5), we can calculate the latest fourth-event times y(4) consistent with these targets, using Eq. (II.6):

y(4) = A* ⊗′ y(5).

If we compare y(4) with x(4) in the forward-orbit calculation of Section I.D.1, we see that they agree in the first three components, but that y(4) exceeds x(4) by 2 in the fourth component. This calculation has management significance. It shows that, on machines 1, 2, and 3, the earliest and the latest permissible times for the fourth events are the same. Thus, no delay in these events can be tolerated without detriment to the promised delivery dates y(5). By contrast, a delay of two time units can be tolerated on machine 4 at this stage.

3. Critical Events

Events for which the earliest possible, and latest permissible, times are the same are called critical events. Following Application II.3, we can now proceed to calculate the latest allowable stage-3 event times y(3) = A* ⊗′ y(4), and so on. In general, a sequence of the form y, A* ⊗′ y, ... will be called the backward orbit based on y. For the model
system, we may arrange the forward orbit based on x(1) and the backward orbit based on y(5) in a double orbit table, listing under each stage r = 1, ..., 5 the forward event-times x(r) together with the backward event-times y(r).   (II.9)

It can be seen that there is a critical event at each stage, not always on the same machine. In general, more than one critical event can occur at a given stage, but as Application III.3 in the next section will show, at least one event at each stage must be critical. In the forward recursion of Eq. (1.6), it is clear that for general i, j, and r, one of the following occurs: either

x_i(r+1) > a_ij + x_j(r)
or x_i(r+1) = a_ij + x_j(r). In the first case, a sufficiently small delay to the rth event on machine j will not affect the timing of the (r+1)st on machine i. In the second case, a delay, however small, to the rth event on machine j will cause a delay to the (r+1)st on machine i, and the events are said to be critically related: if the later is a critical event, so must the earlier be. In a double-orbit table, joining with a line any two elements in the forward-orbit half of the table which correspond to critically related critical events produces a diagram showing how a small delay to any critical event will propagate to later critical events. Figure 9 illustrates this for the preceding double-orbit table. We call this a critical diagram.
FIGURE 9. A critical diagram.
The foregoing results put into an algebraic context a number of ideas familiar in project management under the names critical path analysis and project evaluation and review technique.
B. Conjugation: The *-Operation

1. Conjugation of Scalars

The *-operation used earlier is called conjugation. For a scalar λ, finite or otherwise, λ* is defined to be −λ, which is consistent with the notation ε*, and also with Eq. (II.7), because a scalar has a natural interpretation as a 1 × 1 matrix. Clearly, (λ*)* = λ. A simple but important principle follows. Given a set of scalars {λ_1, ..., λ_n}, of which the greatest has the value λ, say, then the least of the scalars {−λ_j} is obviously equal to −λ. In other words,

[Σ⊕_j λ_j]* = Σ⊕′_j λ_j*,   (II.10)

and similarly

[Σ⊕′_j λ_j]* = Σ⊕_j λ_j*.   (II.11)

Clearly, λ* ⊗ λ = 0 for any λ ∈ F, and if x ∈ F_{n,1}, then also x* ⊗ x = 0. For any λ ∈ F,

|λ| = λ ⊕ λ*.

Application II.4
In approximation theory, it is necessary to measure the closeness of two vectors x, y ∈ F_{n,1}. A convenient way of doing this is to use the Chebyshev distance ζ, which equals the greatest componentwise absolute difference between the vectors:

ζ(x, y) = max_i |x_i − y_i|
        = max_i ( max(x_i − y_i, y_i − x_i) )
        = max( max_i (x_i − y_i), max_i (y_i − x_i) ),

so

ζ(x, y) = (y* ⊗ x) ⊕ (x* ⊗ y).
If the components of y are known to be greater than or equal to the corresponding components of x, then clearly ζ(x, y) = x* ⊗ y. In particular, for a DES with system matrix A, having the increasing property, a measure of the elapsed time between the first stage and the (p+1)st is

ζ(x(1), x(p+1)) = x* ⊗ A^(p) ⊗ x,   where x = x(1).

These ideas will be used extensively later.
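In code, using the model system's first- and fifth-stage event-times from the orbit calculations, the distance and its conjugate form read (a minimal Python sketch):

```python
# Chebyshev distance via conjugation: zeta(x, y) = (y* (x) x) (+) (x* (x) y).
def chebyshev(x, y):
    x_star_y = max(yi - xi for xi, yi in zip(x, y))   # x* (x) y
    y_star_x = max(xi - yi for xi, yi in zip(x, y))   # y* (x) x
    return max(x_star_y, y_star_x)

x1 = [4, 4, 6, 4]        # x(1) for the model system
x5 = [31, 28, 30, 26]    # x(5); here x5 >= x1 componentwise
print(chebyshev(x1, x5)) # elapsed time x(1)* (x) x(5) → 27
```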
2. Conjugation of Matrices

In introducing the matrix A*, the temporary notation ā_ij was used to avoid an ambiguity in the notation a*_ij, which could be read as meaning ([A]_ij)*. In fact, it is more useful to make the following convention.

Reserved notation

Given an upper-case symbol representing a matrix, a starred occurrence of the corresponding lower-case symbol denotes an element of the conjugate matrix. Thus, a*_ij = [A*]_ij.

Obviously, transposition-and-negation does not require a matrix to be square.

From the way matrix addition is defined, it is clear that the principles embodied in Eqs. (II.10) and (II.11) also apply to matrices. In particular, for similarly dimensioned matrices A, B:

(A ⊕ B)* = A* ⊕′ B*;   (A ⊕′ B)* = A* ⊕ B*.   (II.12)
3. Conjugation of Products

As for max algebra, and with essentially the same motivation, we make the following conventions.

Reserved notation

F*_{m,n} ⊆ W*_{m,n} is the set of all m × n matrices in which no row or column contains ε* exclusively. F* is defined to equal F, the set of finite scalars. Evidently, F*_{n,1} = F_{n,1}, the set of finite n-vectors.
The dual of Theorem 1.3 holds good, and clearly A ∈ F_{m,n} if and only if A* ∈ F*_{n,m}. If λ, μ ∈ F, we may trivially verify the identity

(λ ⊗ μ)* = μ* ⊗′ λ*,   (II.13)

since both sides of this equation equal −λ − μ, and it is easy to see that the identity remains valid if either or both of λ, μ equal ε.

Theorem II.1. If A ∈ F_{m,n} and B ∈ F_{n,q}, then (A ⊗ B)* = B* ⊗′ A*.

Proof. If we define C = [c_ij] = A ⊗ B, then by definition [C*]_ij is (c_ji)*, i.e.,

[C*]_ij = (Σ⊕_k a_jk ⊗ b_ki)* = Σ⊕′_k b*_ik ⊗′ a*_kj   (by Eq. (II.13)).

This last expression is clearly the inner product of row i of B* and column j of A*, and the result follows. ∎
Reserved notation

For A ∈ F_{n,n}, A^[k]* denotes (A*)^[k]. Theorem II.1 implies for A ∈ F_{n,n},

(A^(k))* = A^[k]*.   (II.14)
Application II.5
Given the target finishing-times y(5) for the model system, the managers may not wish to consider the entire backward orbit, but simply calculate the latest start-times for the whole project. Evidently, the latest allowable first-event times are obtained by applying the operator A* four times to y(5), in other words, by calculating A^[4]* ⊗′ y(5). Equation (II.14) shows that A^[4]* need not be calculated afresh, but can be derived from A^(4), given in Eq. (1.14) of Section I, by simply transposing-and-negating. Thus, the latest allowable first-event times are

y(1) = A^[4]* ⊗′ y(5),   with y(5) = (31, 28, 30, 26)ᵀ,

as previously found.
4. Finiteness Considerations

If x(1) is finite, and A ∈ F_{n,n}, then we know from Application 1.6 that the entire (p+1)-stage forward orbit, and in particular x(p+1) = y(p+1), consists of finite vectors. Since the dual of this result holds good, and moreover A* ∈ F*_{n,n}, the backward orbit, and therefore the entire double orbit, consists of finite vectors. The term minimax algebra will be used for manipulations in which the operations of max algebra and min algebra are mixed. In fact, x(1) and y(1) are related by

y(1) = A^[p]* ⊗′ (A^(p) ⊗ x(1)).   (II.16)

Expressions of this nature will be studied in some detail in the next section. In minimax algebra, a technical problem arises from the use of the infinite elements ε, ε*. If the meaning of the symbols ⊗, ⊗′ is ordinary arithmetical addition, what value should be given to such expressions as ε ⊗ ε*, ε ⊗′ ε*? In fact, to get a consistent algebra, we must define

ε ⊗ ε* = ε* ⊗ ε = ε,   (II.17)
ε ⊗′ ε* = ε* ⊗′ ε = ε*.   (II.18)

Further consideration of this point is beyond the scope of the present text, but a detailed discussion may be found in Cuninghame-Green (1979). For our purposes, the introduction of F_{n,n} and the assumption of finite state-vectors x(r) enable us to avoid the problem. For example, in Eq. (II.16), A may well involve ε, but the application of A^[p]* to a finite vector gives a finite result. It is evident that an extensive duality theory can be developed between the definitions and results of min algebra and max algebra. We shall make frequent informal use of this, but it is not part of the aim of the present book to give rigorous expression to it. The interested reader is referred to Cuninghame-Green (1979).

III. SCHEDULING AND APPROXIMATION

A. Minimax Algebra

1. A Scheduling Problem
In the five-stage project for the model system, the target fifth-event times 31, 28, 30, 26 were calculated from an assumed set of start-times, and were therefore known to be achievable.
Suppose instead that a vector c of target times is dictated by other circumstances. For example, can the project be started in such a way that all machines finish at time 30? Writing B for A^(4), this leads to the problem:

Find x such that B ⊗ x = c.   (III.1)
This is essentially the problem of the solution of linear equations in max algebra. It is clear that by starting early enough, the project can be completed before any given target times, so by artificially imposing delays, a way of finishing exactly on time can be found. This would mean, however, that one or more machines either stood idle at the end, or were restrained at some stage from proceeding at the earliest possible moment. The discovery of exact solutions to Eq. (III.1), or the demonstration of the impossibility of this, therefore has an obvious management significance. If indeed Eq. (III.1) is insoluble for given B, c, then it is natural to ask how close one may come to achieving a solution:

Find x such that B ⊗ x = c approximately.   (III.2)
Application II.4 commented on the naturalness of the Chebyshev distance as a measure of approximation in minimax algebra.

Reserved notation

For x, y ∈ F_{n,1},

ζ(x, y) = max_i |x_i − y_i|.

The form in which Eq. (III.2) will be analyzed will accordingly be the following:

Find x to minimize ζ(B ⊗ x, c).   (III.3)
2. Inequalities

Reserved notation

For similarly dimensioned matrices A, B:
A < B indicates that a_ij < b_ij for all i, j.
A ≤ B indicates that a_ij ≤ b_ij for all i, j.
A ≤_c B indicates that A ≤ B, with equality occurring in at least one entry in each column.
A ≤_r B indicates that A ≤ B, with equality occurring in at least one entry in each row.
From the meaning of the operators max and min, it is clear that the following relations are all equivalent:

A ≤ B  ⟺  B = A ⊕ X for some X  ⟺  A = B ⊕′ Y for some Y.   (III.4)

The inequality relation ≤ defines a partial order in that it satisfies the following axioms:

Transitivity. If A ≤ B and B ≤ C, then A ≤ C.
Reflexivity. A ≤ A for all A.
..
If S,, . ,S,,Tare given sets, each with a suitable partial order 5 , and f is a mapping from S,x -.. x S,to T, then f is called isotone if it preserves the inequality I, i.e.: if Xi IY;. (i = 1, ..., t ) , then f ( X , , ...,Xr)If ( Y , , ..., X).
(111.5)
It is virtually self-evident that the elementary arithmetical functions min, max, and + are isotone: if x1 Iy 1
x,
and
then min(xl, x2) Imin(y,, YZ), and x,
Iy,,
m a x h ,X Z ) 5 max(Y1 ,YZ),
+ x, Iy1 + y,.
Since matrix addition and multiplication in both min algebra and max algebra are compositions of these elementary functions, the following result easily follows.

Theorem III.1. All scalar and matrix additions and multiplications, and compositions thereof, in both min algebra and max algebra, are isotone. ∎

It is easily seen that conjugation is not isotone but antitone: if X ≤ Y, then Y* ≤ X*.
Application III.1
Theorem 1.4 can be restated as follows. If A ∈ W_{n,n} satisfies A ≥ I, then A ∈ F_{n,n}, and a DES having A as system matrix has the increasing property. For, by isotonicity,

A ⊗ x ≥ I ⊗ x = x.
Application III.2

For a DES with system matrix A ∈ F_{n,n}, Application 1.7 gives for x(1) ∈ F_{n,1},

x_i(p+1) ≥ (a_ii)^(p) ⊗ x_i(1).

Thus, making free use of isotonicity,

x_i(1)* ⊗ x_i(p+1) ≥ x_i(1)* ⊗ (a_ii)^(p) ⊗ x_i(1) = (a_ii)^(p)

(since x_i(1)* ⊗ x_i(1) = 0). Hence,

x(1)* ⊗ x(p+1) = Σ⊕_i x_i(1)* ⊗ x_i(p+1) ≥ Σ⊕_i (a_ii)^(p).

Referring to Application II.4, we can interpret this in the form

ζ(x(1), x(p+1)) ≥ (Σ⊕_i a_ii)^(p),

by the principle of exponentiation.

3. Minimax Algebra
The following is a fundamental result in minimax algebra.

Theorem III.2. For any B ∈ F_{m,n} and x ∈ F_{n,1}, there holds

x ≤_c B* ⊗′ (B ⊗ x).

Proof. Define w = B ⊗ x and y = B* ⊗′ (B ⊗ x). Since x is finite, so are w and y (Theorem 1.3 and its dual, Section I). Consider the product w* ⊗′ w. This equals zero. On the other hand, it is

(B ⊗ x)* ⊗′ (B ⊗ x) = (x* ⊗′ B*) ⊗′ (B ⊗ x)   (Theorem II.1)
                    = x* ⊗′ (B* ⊗′ (B ⊗ x))   (associativity)
                    = x* ⊗′ y.

Hence, x* ⊗′ y = 0. But this says

min_i ( −x_i + (ith component of B* ⊗′ (B ⊗ x)) ) = 0.

This clearly implies the result. ∎
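The theorem is easy to check numerically. In the Python sketch below (illustrative B and x), every component of B* ⊗′ (B ⊗ x) dominates the corresponding component of x, with equality in at least one place:

```python
# Numerical check of Theorem III.2: x <=_c B* (x)' (B (x) x).
def maxplus_mv(B, x):
    return [max(b + xi for b, xi in zip(row, x)) for row in B]

def minplus_mv(B, x):
    return [min(b + xi for b, xi in zip(row, x)) for row in B]

def conjugate(B):
    """Transpose and negate."""
    return [[-B[j][i] for j in range(len(B))] for i in range(len(B[0]))]

B = [[0, 3], [2, 0], [1, 4]]          # illustrative 3 x 2 matrix
x = [5, 1]
y = minplus_mv(conjugate(B), maxplus_mv(B, x))
print(y)                               # → [5, 2]: y >= x, with equality in the first component
```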
Application III.3
For the five-stage project of the model system, if we take A^(4) in the role of B in Theorem III.2, then

x(1) ≤_c A^[4]* ⊗′ (A^(4) ⊗ x(1)),

confirming that at least one first-cycle event will be critical, however x(1) is chosen. More generally, with the notation of Section II, consider the (p+1)-stage double orbit based on some finite x(1) for a DES with system matrix A, taking y(p+1) = x(p+1). Defining B = A^(p−r) (0 < r < p), Theorem III.2 implies

x(r+1) ≤_c B* ⊗′ (B ⊗ x(r+1))
       = B* ⊗′ (A^(p−r) ⊗ A^(r) ⊗ x(1))
       = A^[p−r]* ⊗′ x(p+1) = y(r+1).

Hence, there is at least one critical event at every intermediate stage as well, as noticed in relation to the double orbit in Section II. By the principle of column action, x may be replaced in Theorem III.2 by any n-rowed finite matrix. By adapting and dualizing the proof, we easily establish the following results, which we shall use later:

Theorem III.3. If B ∈ F_{m,n} and X is finite, then

(i) B ⊗ (B* ⊗′ X) ≤_c X;
(ii) B ⊗ (B* ⊗′ (B ⊗ X)) = B ⊗ X.

B. Linear Equations

1. Inverting Inequalities
The following result shows that conjugation provides a mechanism for inverting inequalities.
Theorem III.4. If B ∈ F_{m,n}, x = [x_j] ∈ F_{n,1}, and c = [c_i] ∈ F_{m,1}, then

B ⊗ x ≤ c if and only if x ≤ B* ⊗′ c.

Proof. The relation B ⊗ x ≤ c is a relation between finite vectors (Theorem 1.3), and it means

max_{j=1,...,n} (b_ij + x_j) ≤ c_i   (i = 1, ..., m),

i.e.,

b_ij + x_j ≤ c_i   (i = 1, ..., m; j = 1, ..., n),

and this happens if and only if

x_j ≤ b*_ji + c_i   (i = 1, ..., m; j = 1, ..., n),

because this is certainly true if b*_ji = ε*, and if b_ij is finite, then b*_ji = −b_ij. But the last inequality is clearly equivalent to

x_j ≤ min_{i=1,...,m} (b*_ji + c_i)   (j = 1, ..., n),

which in turn is equivalent to x ≤ B* ⊗′ c. ∎
Reserved notation

In relation to a given inequality B ⊗ x ≤ c, x̂ denotes

x̂ = B* ⊗′ c.   (III.6)

Notice that x̂ is finite when B ∈ F_{m,n} and c is finite. Theorem III.4 may now be reformulated in the following way:

x̂ is the greatest solution of B ⊗ x ≤ c.   (III.7)

Accordingly, x̂ is called the principal solution of the inequality. The idea used in backward recursion is recognizable here. If B = A^(p), where A is the system matrix of a DES, then x̂ gives the latest first-event times which will not cause target last-event times c to be overrun in a (p+1)-stage project.

Theorem III.5. Among solutions of B ⊗ x ≤ c, for B ∈ F_{m,n} and c ∈ F_{m,1}, no B ⊗ x is closer to c in any component than B ⊗ x̂ is.
Proof. If x is any solution, then x ≤ x̂ by Theorem III.4, and therefore by isotonicity, B ⊗ x ≤ B ⊗ x̂ ≤ c. ∎

2. Linear Equations

As in conventional algebra, the linear-equations problem posed in
Eq. (III.1) is of central importance, but not always soluble.
Theorem III.6. Given B ∈ F_{m,n}, c ∈ F_{m,1}, then B ⊗ x = c is soluble if and only if x̂ ∈ F_{n,1} is a solution; x̂ is then the greatest solution.

Proof. This easily follows from Theorem III.5, since any solution of B ⊗ x = c is obviously a solution of B ⊗ x ≤ c. ∎
A useful reformulation of Theorem III.6 is embodied in the following solubility criterion for given B ∈ F_{m,n}, c ∈ F_{m,1}:

B ⊗ x = c is soluble if and only if B ⊗ (B* ⊗′ c) = c.   (III.8)
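The criterion translates directly into code. In this Python sketch (a finite illustrative matrix B, not the model system's), the first target vector is achievable exactly and the second is not:

```python
# Solubility test (III.8): B (x) x = c is soluble iff B (x) (B* (x)' c) = c.
# Assumes a finite matrix B.
def maxplus_mv(B, x):
    return [max(b + xi for b, xi in zip(row, x)) for row in B]

def principal_solution(B, c):
    """x-hat = B* (x)' c, the greatest solution of B (x) x <= c."""
    m, n = len(B), len(B[0])
    return [min(c[i] - B[i][j] for i in range(m)) for j in range(n)]

def soluble(B, c):
    return maxplus_mv(B, principal_solution(B, c)) == c

B = [[0, 3], [2, 0]]                             # illustrative matrix
print(soluble(B, [3, 2]), soluble(B, [0, 3]))    # → True False
```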
In the event of solubility, we again call x̂ the principal solution.

Application III.4

At the outset of the section, we asked whether a five-stage project for the model system could be initiated in such a way that the earliest possible times for the fifth events were given by

c = (30, 30, 30, 30)ᵀ.

Defining B = A^(4), and taking B* from Application II.5, we calculate the principal solution y(1) = B* ⊗′ c. Using B = A^(4) from Application 1.2 then gives

B ⊗ y(1) = (30, 29, 30, 26)ᵀ,

so the work finishes early on machines 2 and 4. Since y(1) is the principal solution, delaying the first event on any machine would cause at least one of the target times to be overrun. On the other hand, by isotonicity, taking any first event earlier cannot make machines 2 and 4 finish later. Hence, the target times are not achievable.

3. Weak Realization Problem

Suppose that the first few terms

x(1), ..., x(p+1)

are observed of a long forward orbit of a DES whose system matrix is unknown. Is it possible to predict the further evolution of the orbit? This
problem may arise in the control of a signal-processing system whose detailed working is inaccessible, or in the prediction of the behavior of a commercial competitor whose outputs are observable, but whose system is confidential. Obviously, if the system matrix is deducible, then the orbit is easily extrapolated, so we seek a matrix X such that

X ⊗ x(r) = x(r+1)   (r = 1, ..., p).

We call this the weak realization problem (a stronger related realization problem is considered briefly later). If G is the matrix whose columns are x(1), ..., x(p), and H the matrix whose columns are x(2), ..., x(p+1), then by the principle of column action, we seek X such that

X ⊗ G = H.   (III.9)

Now, Theorem III.6 may be generalized and dualized in a variety of ways, by very simple adaptation of the argument. For example, the unknown vector x may be replaced by an unknown matrix X; the argument is virtually unchanged whether X multiplies from the left or the right. This leads to the conclusion that Eq. (III.9) has a finite solution if and only if X̂ = H ⊗′ G* is a solution; X̂ is then the greatest solution.

Application III.5
From the first four terms of the forward orbit of the model system in Section I.E.1, we have

G = [  4  13  19      H = [ 13  19  25
       4  10  16            10  16  22
       6  12  18            12  18  24
       4   8  14 ],          8  14  20 ],

whence

X̂ = H ⊗′ G* = [ 6  9  7  9
                3  6  4  6
                5  8  6  8
                1  4  2  4 ].

Calculating X̂ ⊗ x(4), we obtain a correct prediction of x(5).
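The whole computation can be reproduced in a few lines of Python from the orbit data above (G* is G transposed and negated):

```python
# Application III.5 recomputed: X-hat = H (x)' G*, then X-hat (x) x(4).
def minplus_mm(A, B):
    return [[min(a + b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def maxplus_mv(A, x):
    return [max(a + xi for a, xi in zip(row, x)) for row in A]

G = [[4, 13, 19], [4, 10, 16], [6, 12, 18], [4, 8, 14]]      # columns x(1), x(2), x(3)
H = [[13, 19, 25], [10, 16, 22], [12, 18, 24], [8, 14, 20]]  # columns x(2), x(3), x(4)

G_star = [[-G[j][i] for j in range(4)] for i in range(3)]    # transpose and negate
X_hat = minplus_mm(H, G_star)
print(X_hat)        # → [[6, 9, 7, 9], [3, 6, 4, 6], [5, 8, 6, 8], [1, 4, 2, 4]]
print(maxplus_mv(X_hat, [25, 22, 24, 20]))   # x(4) → predicted x(5) = [31, 28, 30, 26]
```

The prediction agrees with the fifth-event times 31, 28, 30, 26 quoted for the model system.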
C. Chebyshev Approximation
Suppose f is a mapping from some set S, to Fm,l, and consider a general constrained approximation problem of the form minimize (( f ( x ) ,c),
subject to x E S.
Any minimizing solution x will be called Chebyshev-best in relation to any particular instance of this problem. It will not be unique in general, and we may therefore constrain the problem with another criterion-e.g., we may seek the greatest or the least Chebyshev-best solution.
Theorem 111.7. Given B E F&, c E F,, a Chebyshev-best solution to the approximation of c by B @ x subject to the constraint B Q x Ic is given by x = x'; x' is the greatest such solution.
Proof. This follows directly from Theorem 111.5. H Application 111.6 In Application 111.4, we found that the (greatest) Chebyshev-best solution to the problem of achieving fifth-event times of 30, without any overshoot, resulted in an undershoot of one time-unit on machine 2 and four time-units on machine 4:
C(B Q Y U ) , 4 = 4. If we now delay the first-event times relative to y(1) by one-half of (, i.e., by two time-units, then (Application 1.3) the fifth-event times will also be delayed by two time-units and the maximum undershoot will be reduced to two time-units. The maximum overshoot will increase from zero to two time-units, so ((B @ *I), c) = 2, where
It is clear that x = x(1) minimizes ξ(B ⊗ x, c), since if z ∈ F_{n,1} could be found making φ < 2, where φ is ξ(B ⊗ z, c), then reducing the components of z by φ would give a possible first-event vector producing zero overshoot with undershoot of at most 2φ < 4, contradicting the Chebyshev-best status of y(1).
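The two-stage recipe (greatest no-overshoot solution, then a half-residual shift) can be sketched on a small hypothetical B and c; the model-system data of Application III.4 is not reproduced here, so the numbers below are purely illustrative:

```python
EPS = float("-inf")

def maxplus_vec(A, x):
    """y = A (x) x in max algebra."""
    return [max(a + xi for a, xi in zip(row, x)) for row in A]

def minplus_vec(A, x):
    """y = A (x)' x in min algebra."""
    return [min(a + xi for a, xi in zip(row, x)) for row in A]

def conjugate(A):
    return [[-a for a in col] for col in zip(*A)]

B = [[3, 1], [0, 4]]   # hypothetical data
c = [10, 5]

# Theorem III.7: the greatest solution of B (x) x <= c is x' = B* (x)' c,
# and it is Chebyshev-best subject to "no overshoot".
x_best = minplus_vec(conjugate(B), c)
u = maxplus_vec(B, x_best)
assert all(ui <= ci for ui, ci in zip(u, c))       # zero overshoot

# Shifting x' up by half the worst undershoot balances overshoot
# against undershoot, minimizing the Chebyshev error (cf. Theorem III.8).
mu = max(ci - ui for ui, ci in zip(u, c)) / 2
x_cheb = [xi + mu for xi in x_best]
err = max(abs(vi - ci) for vi, ci in zip(maxplus_vec(B, x_cheb), c))
print(x_best, mu, err)   # [5, 1] 1.0 1.0
```

The final Chebyshev error equals half the worst undershoot of the no-overshoot solution, exactly as in the time-unit argument above.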
MINIMAX ALGEBRA AND APPLICATIONS
The ideas of Application III.6 obviously generalize immediately, to give the following result.
Theorem III.8. Given B ∈ F_{m,n}, c ∈ F_{m,1}, a Chebyshev-best solution of B ⊗ x = c is given by μ ⊗ x′, where μ is a scalar given by

μ^(2) = ξ(B ⊗ x′, c) = (B ⊗ x′)* ⊗ c.

D. Diverse Interpretations
1. Management Interpretations
The inequalities in Theorems III.2 and III.3(i) have somewhat contrasting interpretations. Theorem III.2 is concerned with forward recursion followed by backward recursion, and it is the components where equality occurs which are of management significance, leading to the idea of critical events. Theorem III.3(i) is concerned with backward recursion followed by forward recursion, and it is the components where inequality occurs which point to the management problem of inactive machines.

In the theory of machine-scheduling, the lateness l of an event is defined by subtracting the desired from the actual time of the event. Lateness may be positive or negative. If the lateness is positive, then the tardiness t is defined to equal the lateness, and the earliness e is defined to be zero; if the lateness l is negative, the tardiness is defined to be zero and the earliness is defined to be −l. Thus,

t = l ⊕ 0;  e = l* ⊕ 0.
If we define the system lateness (or tardiness or earliness) to be the greatest lateness (or tardiness or earliness) experienced at the last stage of any machine in the system, then we can interpret the results of the preceding sections in the following way. Theorem III.7 is concerned with the problem of minimal system earliness, subject to zero system tardiness. Theorem III.8 is concerned with the problem of minimal absolute system lateness.
2. Simple Linear Dependence

As in conventional linear algebra, there are several different ways of looking at the application of a matrix B to a vector x to produce a vector c:

B ⊗ x = c, (III.10)
or

c_i = ⊕_j (b_ij ⊗ x_j)  (i = 1, …, m).

First, if c is given, these formulae have to do with solving linear equations: finding x such that B ⊗ x = c. This is the view taken predominantly in the section so far. On the other hand, we may rewrite Eq. (III.10), using the commutative law of scalar multiplication, as

c = x_1 ⊗ b(1) ⊕ ⋯ ⊕ x_n ⊗ b(n),

expressing c as a linear combination of the columns b(1), …, b(n) of B. In general, then, a given vector c is said to be expressible as a linear combination of given vectors b(1), …, b(n), if and only if there exist suitable scalar multipliers x_1, …, x_n such that c = ⊕_j x_j ⊗ b(j), which happens if and only if the equation B ⊗ x = c is soluble, where B is the matrix having b(1), …, b(n) as columns. A relation among a set of vectors expressing one of them as a linear combination of the others will be called a simple linear dependence among them, to distinguish it from other forms of linear dependence which can occur in minimax algebra, as we discuss later.

IV. PATH PROBLEMS

A. Directed Graphs

1. Some Definitions
The managers of a finance company will move capital regularly from one investment to another to make a profit or avoid a loss. A DES, be it mechanical, electrical, logical, or economic, will incur costs or benefits by moving from one state to another, and the managers of such systems must find sequences of transitions which maximize the total benefit or minimize the total cost. The initial and final states need not be different: Money may attract interest by being left in one account, and physical systems may consume energy just ticking over. To discuss problems of this kind, we must introduce some more terminology.
Although different in detailed appearance, most of the diagrams in this book are directed graphs. A graph 𝒢 is given by two sets 𝒩 and 𝒜 constituting its node-set and arc-set, respectively. According to application, nodes may be notated in various ways in this book; arcs will usually be represented by arrows, drawn from one node to another. Formally, an arc is an ordered pair of nodes:

(N_i, N_j).
We say that this arc is incident from N_i and incident to N_j, indicating the latter diagrammatically by an arrowhead. We allow the possibility that N_i = N_j (the arc is then a loop), and a pair of antiparallel arcs (N_i, N_j), (N_j, N_i) may occur; but a given ordered pair (N_i, N_j) may occur in the arc-set 𝒜 at most once, so if 𝒢 has n nodes, then there are at most n² arcs. If 𝒢 has its full complement of n² arcs, we say that 𝒢 is complete.

In this book, directed graphs will usually be arc-weighted. According to context, we shall use one of two arc-weighting systems: (i) Primal weighting system: each weight is an element of W. (ii) Dual weighting system: each weight is an element of W*.
Because these weighting systems will not be mixed, we can unambiguously take arithmetical sums of weights, subject to the usual conventions that

x + ε = ε;  x + ε* = ε*.
In summary, then, a directed, weighted graph 𝒢 is a triple (𝒩, 𝒜, w), where 𝒩 is a nonempty finite set of distinct nodes, 𝒜 is a finite set of distinct ordered pairs of nodes, and w is a function mapping elements of 𝒜 to one of the weighting systems just defined. To save constantly repeating the adjectives “directed” and “weighted,” we shall simply speak of a graph where this will cause no confusion.

Reserved notation

If 𝒢 is a complete graph having n nodes, then D(𝒢) denotes the n × n matrix D with d_ij = w((N_i, N_j)). We say that the graph and the matrix are corresponding.
Thus, for the complete graph of Fig. 10, the corresponding matrix is the matrix D of Eq. (IV.1).
FIGURE 10. A complete graph.
Reserved notation
Given a square matrix D with elements from one of the preceding weighting systems, 𝒢(D) denotes the corresponding complete graph.

If a given graph 𝒢 is not complete, we can adjoin the “missing” arcs and attach weights to them equal to ε or ε* according to context. We call this the completion of 𝒢. Conversely, from a complete graph, as in Fig. 10, we may delete any arcs whose weight is not finite and obtain the underlying finite graph, as illustrated in Fig. 11.

Reserved notation
UFG(𝒢) denotes the underlying finite graph of a given complete graph 𝒢. Given a square matrix D, UFG(D) denotes UFG(𝒢(D)).

For brevity, most of the ensuing presentation will be in terms of the primal weighting system. A graph whose arc-weights all lie in this system will be called primal-weighted. Obviously, D ∈ W_{n,n} if and only if 𝒢(D) is primal-weighted.

2. Paths and Cycles

A path in a graph is a sequence of nodes

⟨N_{i_1}, …, N_{i_p}⟩ (IV.2)
FIGURE 11. Underlying finite graph.
such that p ≥ 2 and

(N_{i_t}, N_{i_{t+1}}) ∈ 𝒜  (t = 1, …, p − 1). (IV.3)

The path is said to be from N_{i_1} to N_{i_p}, to contain the nodes N_{i_1}, …, N_{i_p}, and to be of length p − 1. When p > 2, any node other than N_{i_1} and N_{i_p} will be called an intermediate node. The use of angled brackets in Eq. (IV.2) reflects the fact that order is important in listing the nodes of a path. It has the incidental advantage of not distinguishing notationally between an arc and a path of length 1, and we shall use these concepts interchangeably.

The definition of a path allows the sequence to contain repetitions. If there are no repetitions, we speak of an elementary path. If N_{i_1} and N_{i_p} are the same, we speak of a cycle; if there are no other repetitions than this, it is an elementary cycle, otherwise a non-elementary cycle. Thus, a loop is an elementary cycle of length 1. A path which is not a cycle and is not an elementary path is called a non-elementary path. Obviously, non-elementary paths and non-elementary cycles contain cycles as subsequences, and if the graph contains n nodes, then a path or cycle of length exceeding n cannot be elementary.

The weight of the path in Eq. (IV.2) is the arithmetical sum of the weights of the p − 1 arcs (N_{i_t}, N_{i_{t+1}}). For example,

⟨N_2, N_1, N_2, N_1, N_1, N_2, N_1, N_3⟩ (IV.4)
is a non-elementary path of length 7 and weight 23. It contains several cycles as subsequences, e.g.,

⟨N_1, N_2, N_1, N_1⟩.

If this cycle is replaced by the single node N_1 in Eq. (IV.4), we obtain a path of reduced length 4:

⟨N_2, N_1, N_2, N_1, N_3⟩,

which is still not elementary; but if the subsequence

⟨N_2, N_1, N_2⟩

is now replaced by the single node N_2, an elementary path results.

The process of replacing a cycle ⟨N_k, …, N_k⟩ by the single node N_k will
be called cycle deletion. It is trivial to prove the following.

Theorem IV.1. If there exists a path from N_i to N_j in a given graph 𝒢, then either it is an elementary path, or it may be transformed to an elementary path from N_i to N_j, of reduced length, by a finite number of cycle deletions. ∎

3. Cycle Means
Theorem IV.2. If D ∈ F_{n,n}, then 𝒢(D) has at least one cycle of finite weight, and hence UFG(D) has at least one cycle.
Proof. By definition, D ∈ F_{n,n} has at least one finite element on row 1: say a_{1i} ∈ F. Again, row i contains at least one finite element: say a_{ij} ∈ F. Continuing, we produce an index-sequence 1, i, j, …, in which eventually some index (say, k) will recur. If k, h, …, q, k is the index-subsequence between the two occurrences, then clearly

⟨N_k, N_h, …, N_q, N_k⟩

is a cycle with finite weight in 𝒢(D). ∎
For any cycle, the cycle mean is defined as the arithmetical ratio of the weight of the cycle to its length. Thus, in the graph depicted in Fig. 10, corresponding to the matrix in Eq. (IV.1), the loop ⟨N_1, N_1⟩ has weight 7 and length 1 and therefore a cycle mean of 7; the cycle ⟨N_1, N_2, N_1⟩ has weight 9 and length 2 and therefore a cycle mean of 4.5. The cycle mean will be of central importance in subsequent discussions of the stable states and maximum speed of a DES.
Reserved notation

For D ∈ W_{n,n}, λ(D) denotes the greatest cycle mean of all elementary cycles in 𝒢(D).

Thus, for the matrix of Eq. (IV.1), λ(D) = 7, given by the loop ⟨N_1, N_1⟩; for the system matrix A of the model system, it is not hard to find λ(A) = 6, corresponding to a loop of length 1 and to a cycle of length 2.

We note en passant a couple of easy but important facts. If UFG(D) has no cycles, then λ(D) = ε. If λ(D) = 0, UFG(D) has at least one cycle of weight zero and no cycle of positive weight.
Theorem IV.3. For D ∈ W_{n,n}, λ(D) is well defined and equals the greatest cycle mean of all cycles in 𝒢(D).

Proof. Because there are only finitely many elementary cycles, the maximum λ(D) is well defined. Moreover, the cycle mean of any non-elementary cycle is a weighted average of the cycle means of its constituent elementary sub-cycles and so cannot exceed λ(D). ∎

Any cycle in 𝒢(D) with cycle mean equal to λ(D) will be called a critical cycle.

B. Weak Transitive Closure

1. The Extremal-Weight Path Problem
One of the most obvious applications for a graph is to depict a transportation network, with the nodes representing towns or districts, the arcs representing road or rail connections, and the weights representing distances, times, or fares along individual roads or tracks. In travelling from one town to another, using one or more of the given connections, it is natural to enquire what the minimum total travel distance, time, or cost needs to be. These are all forms of the least-weight path problem (LWPP). There is, self-evidently, a dual problem, the greatest-weight path problem (GWPP), with a wide range of applications. The nodes represent the states of some DES, mechanical, electrical, economic, or logical; the weight w((N_i, N_j)) represents the profit of a direct transition from state i to state j. What is the greatest profit achievable by moving the system from its present state to some desired state, if necessary via intermediate states?
Application IV.1

If ⟨N_{i_1}, …, N_{i_t}, …, N_{i_p}⟩ is a path of greatest weight from N_{i_1} to N_{i_p}, and this weight is finite, it is clear that

⟨N_{i_1}, …, N_{i_t}⟩ and ⟨N_{i_t}, …, N_{i_p}⟩

are paths of greatest weight from N_{i_1} to N_{i_t} and from N_{i_t} to N_{i_p}, respectively (and these weights are finite); for otherwise, replacing for example the first of these sub-paths by a path of greater weight from N_{i_1} to N_{i_t} would contradict the weight-maximality of the given path from N_{i_1} to N_{i_p}. This is a form of the principle of optimality and will be used in subsequent arguments.

The LWPP and the GWPP may be referred to collectively as the extremal-weight path problem (EWPP).

2. Max Algebra and the GWPP
The GWPP has a natural expression in max algebra, as has the LWPP in min algebra: All statements dualize in the obvious way. We develop the discussion in terms of the GWPP, formulated as follows: Given a complete graph 𝒢, find for each ordered pair of nodes N_i, N_j the greatest weight of any path from N_i to N_j.

For a complete graph 𝒢, the elements of D = D(𝒢) have the following interpretation: d_ij is the weight of the (only) path of length 1 from N_i to N_j. Now consider paths of length 2 from N_i to N_j. Each is of the form ⟨N_i, N_k, N_j⟩, with associated weight d_ik + d_kj. As N_k ranges over all nodes, the greatest of these weights is given by

max_k (d_ik + d_kj),

or in max-algebraic notation,

⊕_k (d_ik ⊗ d_kj) = ]D^(2)[_ij, (IV.5)

which is just the (i, j) element of the max-algebraic square of D. A straightforward argument generalizes this result to the following.

Theorem IV.4. Given a complete (primal-weighted) graph 𝒢, and any exponent r = 1, 2, …, the greatest weight of any path of length r from N_i to N_j in 𝒢 is given by ](D(𝒢))^(r)[_ij.
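Theorem IV.4 is easy to check numerically. In the sketch below, D is a hypothetical completion of the graph of Fig. 10, chosen to agree with the weights quoted later in the text (a loop of weight 7 at N_1 and a cycle ⟨N_1, N_2, N_1⟩ of weight 9); the remaining entries are invented for illustration:

```python
EPS = float("-inf")   # epsilon, the weight of a missing arc

def maxplus(A, B):
    """Max-plus product: iterated, its entries are greatest path weights."""
    n = len(A)
    return [[max(A[i][k] + B[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

D = [[7, 5, EPS],     # loop (N1, N1) of weight 7; d_12 + d_21 = 9
     [4, EPS, 2],
     [EPS, 1, EPS]]

D2 = maxplus(D, D)    # ]D^(2)[_ij = greatest weight of a length-2 path
print(D2[0][0])       # 14: the loop traversed twice beats (N1, N2, N1) = 9
print(D2[0][1])       # 12: (N1, N1, N2) with weight 7 + 5
```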
Reserved notation

For D ∈ W_{n,n}, Δ(D) denotes the formal matrix power-sum

Δ(D) = D ⊕ D^(2) ⊕ D^(3) ⊕ ⋯. (IV.6)
The matrix Δ(D) is called the (max-algebraic) weak transitive closure of D.

Application IV.2

The GWPP calls for the greatest weight of all paths from N_i to N_j if there is no restriction on the length. From Theorem IV.4, this is given for a complete graph 𝒢 by the greatest of the elements ](D(𝒢))^(r)[_ij (r = 1, 2, …), which is exactly ]Δ(D(𝒢))[_ij. Thus, the weak transitive closure matrix gives a formal solution to the GWPP.

Application IV.3

In the light of Theorem IV.4, the diagonal elements of D^(r) have the following interpretation: ]D^(r)[_ii is the greatest weight of any cycle of length r containing node N_i in 𝒢(D). This suggests a procedure to compute λ(D). We form the successive powers D^(r) (r = 1, 2, …, n). If δ_r is the greatest diagonal element of D^(r), then λ(D) is the greatest of the arithmetical ratios δ_r/r. This provides a method of computing λ(D) in O(n⁴) steps. For further discussion, see Zimmermann (1981).
Theorem IV.5. Given D ∈ W_{n,n}, suppose that 𝒢(D) contains a critical cycle of length L. Then, for arbitrary x ∈ F_{n,1} and t ≥ 1,

x* ⊗ (D^(tL) ⊗ x) ≥ (λ(D))^(tL).

Proof. From Application IV.3, the greatest diagonal element of D^(tL) equals tLλ(D):

max_i (]D^(tL)[_ii) = (λ(D))^(tL).

Arguing as in Application III.2 shows that for arbitrary x ∈ F_{n,1} and C ∈ W_{n,n},

x* ⊗ (C ⊗ x) ≥ max_i (]C[_ii).

The result follows on taking C as D^(tL). ∎
Theorem IV.6. For D ∈ W_{n,n}, if 𝒢(D) has a critical cycle

⟨N_{i_1}, …, N_{i_L}, N_{i_1}⟩,

then ]D^(q)[_{i_s i_s} = (λ(D))^(q) for all s = 1, …, L and all multiples q of L.

Proof. With the notation of the theorem statement, the arithmetical ratio (]D^(q)[_{i_s i_s})/q is at most equal to λ(D), by Application IV.3. But since q is a multiple of the length of the given critical cycle, we may concatenate copies of that cycle to give a cycle of length q and weight qλ(D), containing each N_{i_s}, so that by Application IV.3,

]D^(q)[_{i_s i_s} ≥ qλ(D).

Hence, ]D^(q)[_{i_s i_s} = qλ(D) = (λ(D))^(q). ∎

Application IV.4
As noted earlier, the system matrix A for the model system has one critical cycle of length 1, and one of length 2. In Eq. (I.13), the first three diagonal elements of A^(2) equal 12 = (λ(A))^(2), fulfilling Theorem IV.6.

3. p-Regularity
Equation (IV.6) raises the question of the propriety of writing an infinite series in max algebra, but, as we shall show in a later chapter, a matrix power-series in max algebra either diverges or else converges in a finite number of terms. If we define Δ_p to be the expression obtained by terminating the RHS of Eq. (IV.6) at the term in D^(p), then convergence in a finite number of terms means in this context that for some non-negative integer p,

Δ(D) = Δ_p = Δ_{p+1} = Δ_{p+2} = ⋯, (IV.7)

and the matrix D is then called p-regular.
Theorem IV.7. D ∈ W_{n,n} is p-regular for some integer p ≤ n if λ(D) ≤ 0; and D is not p-regular for any p if λ(D) > 0.
Proof. If λ(D) ≤ 0, then all cycle means, and therefore all cycle weights, are non-positive in 𝒢(D). For any i, j ≤ n and k > n, consider ]D^(k)[_ij. Since this represents the weight of τ, some path or cycle of length exceeding n, τ cannot be elementary. Hence, cycle deletions may be made until the length is less than or equal to n. No such deletion, since it involves a cycle of non-positive weight, can decrease the total weight. Therefore, τ does not have a weight greater than the finally resulting elementary path or cycle, whose weight is however accounted for by ]D^(r)[_ij for some r ≤ n. Hence, the inclusion of D^(k) in Eq. (IV.6) will not affect the value of the RHS; thus,

D^(k) ≤ Δ_n, for k > n, (IV.8)

and so the series may be terminated at the term D^(n) at latest.

On the other hand, if λ(D) > 0, then some cycle mean, and therefore the weight of some cycle τ, is positive. By concatenating τ with itself an arbitrary number of times, we can create cycles of arbitrarily great weight and arbitrarily great length. It follows that the RHS of Eq. (IV.6) increases without bound in at least one entry, and therefore cannot converge in a finite number of terms. ∎
V. CONNECTIVITY

A. Strong Transitive Closure

1. Strong Form of EWPP
In the applications considered in the previous section, the diagonal elements of the weak transitive closure matrix, measuring the economic effect of starting and finishing in the same position, were significant. In many other application fields, however, maintenance of the status quo incurs zero profit or loss: drawing up a table of fares or inter-city distances, for example. In such applications, it is natural to force the diagonal elements of the transitive closure matrix to be zero, and an algebraically convenient way is to add the identity matrix I = diag(0, …, 0) to Δ(D) to give the strong transitive closure Γ(D) of D:

Γ(D) = I ⊕ Δ(D) = I ⊕ D ⊕ D^(2) ⊕ ⋯. (V.1)

Application V.1
In a mountainous region, there is always a risk during the winter that a road may be closed. Suppose that Fig. 12 represents five villages with roads directly connecting certain pairs of villages. Knowing for each road the probability that it may be closed, how can we compute the probability of being able to drive from one given village to another, if necessary via other villages, going by the most reliable route?

FIGURE 12. A road network.

Let the probability that the direct road from village N_i to village N_j will stay open be p_ij. The reliability of a route ⟨N_i, N_k, …, N_h, N_j⟩ may be defined as the arithmetical product of the probabilities p_ik, …, p_hj. More conveniently, the logarithm of the reliability is

log p_ik + ⋯ + log p_hj.
To calculate the reliabilities of the most reliable routes between each pair of villages, introduce arc-weights equal to the logarithms of the p_ij. This leads to an instance of the GWPP. The strong transitive closure is appropriate, since zero diagonal elements are logarithms of p_ii = 1, corresponding to the certainty of accessibility of each village from itself.
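As a sketch of this application in code (the road network of Fig. 12 is not reproduced in this chunk, so the probabilities below are hypothetical): take logarithms, solve the GWPP by squaring I ⊕ D, and exponentiate the result:

```python
import math

EPS = float("-inf")

def maxplus(A, B):
    n = len(A)
    return [[max(A[i][k] + B[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

# p[i][j]: probability that the direct road i -> j stays open (None: no road).
p = [[None, 0.9, 0.5, None],
     [0.9, None, 0.8, 0.7],
     [0.5, 0.8, None, 0.95],
     [None, 0.7, 0.95, None]]
n = len(p)
D = [[math.log(p[i][j]) if p[i][j] is not None else EPS
      for j in range(n)] for i in range(n)]

# Strong transitive closure: zero diagonal elements are log(p_ii) = log 1.
M = [[0 if i == j else D[i][j] for j in range(n)] for i in range(n)]
for _ in range(2):          # squaring twice gives (I + D)^(4), power >= n - 1
    M = maxplus(M, M)

best = math.exp(M[0][3])    # reliability of the most reliable route 1 -> 4
print(round(best, 3))       # 0.684, via the route 1 -> 2 -> 3 -> 4
```

The direct-ish routes 1 → 2 → 4 (0.63) and 1 → 3 → 4 (0.475) are both beaten by the longer route through villages 2 and 3.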
2. Properties of Γ
Suppose D is p-regular. Then, using Eq. (IV.6) in Eq. (V.1),

Γ(D) = I ⊕ D ⊕ D^(2) ⊕ ⋯ ⊕ D^(p).

Then Application I.4 shows that

Γ(D) = (I ⊕ D)^(p).

Moreover, writing Γ(D), Δ(D) as Γ, Δ for brevity, consider

I ⊕ D ⊗ Γ = I ⊕ D ⊗ (I ⊕ Δ)
          = I ⊕ D ⊗ (I ⊕ D ⊕ ⋯ ⊕ D^(p))
          = I ⊕ D ⊕ D^(2) ⊕ ⋯ ⊕ D^(p+1)
          = I ⊕ Δ
          = Γ.
Hence, Γ satisfies

Γ = I ⊕ D ⊗ Γ. (V.2)
Suppose Ω is another solution of Eq. (V.2), so

Ω = I ⊕ D ⊗ Ω.

By repeatedly substituting this equation in itself, we derive

Ω = I ⊕ D ⊗ (I ⊕ D ⊗ Ω)
  = I ⊕ D ⊕ D^(2) ⊗ Ω
  = ⋯
  = I ⊕ D ⊕ ⋯ ⊕ D^(p+1) ⊕ D^(p+2) ⊗ Ω
  = Γ ⊕ D^(p+2) ⊗ Ω
  ≥ Γ (by the majority principle).

Hence, Γ is actually the least solution of Eq. (V.2). The following theorem summarizes.
Theorem V.1. If D ∈ W_{n,n} is p-regular for some integer p ≥ 1, then the strong transitive closure matrix Γ(D) exists and equals (I ⊕ D)^(p); moreover, the equation

X = I ⊕ D ⊗ X

is soluble, and its least solution is X = Γ(D).

Application V.2
An equation similar to Eq. (V.2) was given as Eq. (II.3), namely,

u = v ⊕′ D ⊗′ u. (V.3)

Define the min-algebraic strong transitive closure matrix

Γ* = I* ⊕′ D ⊕′ D^[2] ⊕′ ⋯.

Along the lines of Theorem V.1, it is easy to prove that u = Γ* ⊗′ v provides a solution of Eq. (V.3) for general v and general (min-algebraically) p-regular D, and that it is the greatest solution. It was essentially this solution which was developed in Application II.1.

3. Matrix-Squaring
Suppose λ(D) ≤ 0. Consider D^(n) in relation to I ⊕ Δ_{n−1}. The diagonal elements of D^(n) represent the weights of cycles in 𝒢(D). However, the condition λ(D) ≤ 0 implies that all such weights are non-positive and therefore dominated by the (zero) diagonal elements of I. On the other hand, the off-diagonal elements of D^(n) represent weights of paths of length n, which cannot be elementary. The cycle-deletion argument used in the proof of Theorem IV.1 shows that these elements will be dominated by corresponding elements in some summand in Δ_{n−1}.
It follows, therefore, that the power-series in Eq. (V.1) may be terminated at any term from D^(n−1) onward, so we may compute Γ(D) as, for any p ≥ n − 1,

Γ(D) = (I ⊕ D)^(p), (V.4)

for which purpose we can use the matrix-squaring method, continuing squaring until power n − 1 is first reached or exceeded. Adapting the arguments of Section I,D,2, this establishes the following result.

Theorem V.2. If D ∈ W_{n,n} has λ(D) ≤ 0, then Γ(D) can be found by the matrix-squaring method in O(n³ log n) steps. ∎
Because

Δ(D) = D ⊕ D^(2) ⊕ ⋯ = D ⊗ (I ⊕ D ⊕ ⋯) = D ⊗ Γ(D), (V.5)

we can calculate Δ(D) from Γ(D) by one more matrix multiplication, and so calculate Δ(D) also in O(n³ log n) steps. Matrix-squaring is very simple to organize and therefore remains a useful practical method of calculating transitive closures. But it is not quite the most efficient, as discussed later in this section.
B. Connected Graphs

1. Strong Connectivity

In the maze depicted in Fig. 13, is it possible to get from P to Q? A graph is called strongly connected if for each ordered pair N_i, N_j of distinct nodes, there exists a path from N_i to N_j. This definition entails that a graph with only one node is strongly connected whether or not there is a loop.

Application V.3

Given a graph 𝒢, weighted or not, we can determine whether it is strongly connected as follows. Assign the conventional weight 0 to every arc and let D be the matrix corresponding to the completion of 𝒢. Clearly, λ(D) ≤ 0. Compute the strong transitive closure Γ(D). If any element, say ]Γ(D)[_ij, is equal to ε, then 𝒢 is not strongly connected because there is no path of finite total weight from N_i to N_j.

If a graph 𝒢 has only one node, then 𝒢 is necessarily strongly connected and Γ(D(𝒢)) = [0]. Combining this with the ideas of Application V.3 leads to the following result.
FIGURE 13. A maze.
Theorem V.3. UFG(D), for p-regular D ∈ W_{n,n}, is strongly connected if and only if Γ(D) is finite.

Application V.4
Following the procedure of Application V.3, we find for the graph of Fig. 14

I ⊕ D(𝒢) =
[ 0 0 ε ]
[ ε 0 0 ]
[ 0 ε 0 ]

Γ(D(𝒢)) = (I ⊕ D(𝒢))^(2) is quickly calculated as

[ 0 0 0 ]
[ 0 0 0 ]
[ 0 0 0 ]

Hence, the graph is strongly connected.
FIGURE 14. A strongly connected graph.
2. Isolated Node

If Δ(D) is finite, then so is Γ(D) = I ⊕ Δ(D). If n > 1, the converse is true. For the finiteness of Γ(D) implies strong connectedness of UFG(D), so for every index i and distinct index j, there is a path from N_i to N_j in UFG(D). Not only does this imply that the off-diagonal elements of Δ(D) are finite, but concatenating a path from N_i to N_j with a path from N_j to N_i gives a cycle containing N_i, so every diagonal element of Δ(D) is finite also.

But consider the case where n is 1 and D = [ε]; here, Γ(D) = [0] but Δ(D) = [ε]. In Theorem V.3, therefore, we cannot replace the condition “Γ(D) is finite” by “Δ(D) is finite,” unless we set this one case aside by a suitable extra condition, such as n > 1, or D ∈ F_{n,n}, or Δ(D) is finite, as will be done without comment in the sequel.

3. Connected Components
It is easy to show that the definition of strong connectivity is equivalent to the following: for any two distinct nodes N_i, N_j, there exists a cycle containing both N_i and N_j.

Now choose any node N_i of a graph 𝒢. Form a (sub)set 𝒩_1 of the node-set 𝒩 to include N_i together with every node N_j which is such that there exists a cycle containing both N_i and N_j. We call 𝒩_1 a strongly connected node-component (briefly, a component) of 𝒢. If (and only if) 𝒩_1 exhausts the entire node-set 𝒩, the given graph 𝒢 was strongly connected. Otherwise, we can choose a node N_k not in 𝒩_1 and construct another component 𝒩_2 containing N_k. Continuing, we can decompose 𝒩 in a finite number of steps into a finite number of disjoint components.

Application V.5
From a strong transitive closure matrix Γ(D), we can determine the components of 𝒢 = UFG(D) as follows. Choose any row index i. Associating with N_i any node N_j for which the entries in position (i, j) and (j, i) of Γ are both finite defines the first component. Delete the rows and columns with index i or any associated j. Repeat until the whole of Γ is deleted.

Application V.6
For the graph in Fig. 15, the procedure of Application V.3 defines

FIGURE 15. A graph with three components.

D =
[ ε 0 0 ε ε ε ]
[ ε ε ε 0 ε ε ]
[ ε ε 0 ε ε 0 ]
[ 0 ε ε ε ε ε ]
[ ε ε ε ε ε ε ]
[ ε ε 0 ε ε 0 ]

and we calculate

Γ(D) =
[ 0 0 0 0 ε 0 ]
[ 0 0 0 0 ε 0 ]
[ ε ε 0 ε ε 0 ]
[ 0 0 0 0 ε 0 ]
[ ε ε ε ε 0 ε ]
[ ε ε 0 ε ε 0 ]

To find 𝒩_1, first pick N_1, say. Following Application V.5, we find

𝒩_1 = {N_1, N_2, N_4}.

Continuing, we find that the graph has three components:

{N_1, N_2, N_4}, {N_3, N_6}, {N_5}.

From the components 𝒩_1, …, 𝒩_t of a given graph 𝒢, construct a new graph 𝒬, the condensed form of 𝒢, as follows. The nodes of 𝒬 are identified with 𝒩_1, …, 𝒩_t. An arc (𝒩_α, 𝒩_β) is defined in 𝒬 if and only if there is some arc (N_i, N_j) in 𝒢 such that N_i ∈ 𝒩_α and N_j ∈ 𝒩_β. Thus, Fig. 16 shows the condensed form of the graph in Fig. 15.
FIGURE 16. Condensed form.
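The procedure of Application V.5 can be sketched as code. Γ below corresponds to the three-component graph of Fig. 15 (0 for a finite entry, ε where no path exists); the function name is illustrative:

```python
EPS = float("-inf")

# Gamma(D) for the graph of Fig. 15: components {N1, N2, N4}, {N3, N6}, {N5}.
G = [[0, 0, 0, 0, EPS, 0],
     [0, 0, 0, 0, EPS, 0],
     [EPS, EPS, 0, EPS, EPS, 0],
     [0, 0, 0, 0, EPS, 0],
     [EPS, EPS, EPS, EPS, 0, EPS],
     [EPS, EPS, 0, EPS, EPS, 0]]

def components(G):
    """Group N_j with N_i when entries (i, j) and (j, i) are both finite."""
    n = len(G)
    remaining, comps = set(range(n)), []
    while remaining:
        i = min(remaining)
        comp = {j for j in remaining if G[i][j] != EPS and G[j][i] != EPS}
        comps.append(sorted(j + 1 for j in comp))   # report 1-based names
        remaining -= comp
    return comps

print(components(G))   # [[1, 2, 4], [3, 6], [5]]
```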
C. Acyclic Graphs

1. Coherent Numbering

A graph which contains no cycles other than loops will be called acyclic. Figure 16 shows an acyclic graph. We may confirm that it has another property also: no arc, and therefore no path, is from a higher- to a lower-numbered node. We call this property coherent (node-)numbering.

Theorem V.4. Coherent node-(re)numbering is always possible in an acyclic graph.
Proof. If the graph has any loops, delete them. The resulting graph has no cycle and therefore must contain at least one terminal node, that is, a node from which no arc is incident; for otherwise we could pick an arbitrary node and trace an indefinitely long path from it, which would eventually contain a repeated node and therefore exhibit a cycle. So, assign the highest node-numbers to the terminal nodes, and then delete them, together with any arc incident to any of them. In the resulting graph, there are again no cycles, so we may assign the highest remaining node-numbers to its terminal nodes. Continuing, we implement coherent numbering in a finite number of steps. ∎

The converse of Theorem V.4 is easily proved: a coherently numbered graph is acyclic.
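The proof of Theorem V.4 is effectively an algorithm, sketched below with an illustrative arc list: repeatedly give the highest remaining numbers to terminal nodes and delete them:

```python
def coherent_numbering(n, arcs):
    """Renumber nodes 0..n-1 of an acyclic graph so that every non-loop arc
    goes from a lower to a higher number (proof of Theorem V.4)."""
    arcs = {(i, j) for (i, j) in arcs if i != j}     # delete loops first
    remaining = set(range(n))
    number, next_high = {}, n
    while remaining:
        # terminal nodes: no arc incident from them within the remaining graph
        terminal = [v for v in remaining
                    if not any(i == v and j in remaining for (i, j) in arcs)]
        for v in terminal:         # an acyclic graph always has at least one
            number[v] = next_high
            next_high -= 1
            remaining.discard(v)
    return number

# Illustrative acyclic graph: arcs 2 -> 0, 2 -> 1, 0 -> 1, and a loop at 1.
num = coherent_numbering(3, [(2, 0), (2, 1), (0, 1), (1, 1)])
assert all(num[i] <= num[j] for (i, j) in [(2, 0), (2, 1), (0, 1)])
print(num)   # {1: 3, 0: 2, 2: 1}
```

This is, of course, just a topological numbering of the loop-free part of the graph.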
2. Upper-Triangular Matrices

If a graph 𝒢 is coherently numbered, it is quickly seen that D(𝒢) has upper-triangular form,

[ d_11 d_12 …  d_1n ]
[  ε   d_22 …  d_2n ]
[  …    …   …   …   ]
[  ε    ε   …  d_nn ]

in which all entries below the diagonal are ε.
Theorem V.5. If 𝒰_{n,n} denotes the class of n × n upper-triangular matrices, then 𝒰_{n,n} is closed with respect to ⊕ and ⊗, in the sense that if A, B ∈ 𝒰_{n,n}, then also

A ⊕ B ∈ 𝒰_{n,n} and A ⊗ B ∈ 𝒰_{n,n}.

Proof. The case of A ⊕ B is clear, so consider

]A ⊗ B[_ij = ⊕_k (a_ik ⊗ b_kj) with i > j.

For every index k, either i > k or k > j, so at least one of a_ik, b_kj is ε. Thus ]A ⊗ B[_ij = ε, so A ⊗ B ∈ 𝒰_{n,n}. ∎
It is clear that the condensed form 𝒬 of a given graph 𝒢 can have no cycles other than loops, and therefore by Theorem V.4 the components of 𝒢 can always be (re)numbered in such a way that if 𝒬 has an arc (𝒩_α, 𝒩_β) then α ≤ β. Then the nodes of 𝒢 can also be (re)numbered, by assigning the lowest indices to nodes in 𝒩_1, the next lowest to nodes in 𝒩_2, and so on. In this way, the matrix D corresponding to the completion of a primal-weighted graph 𝒢 can always, without loss of generality, be assumed to have upper block-triangular form:

[ D_11 D_12 …  D_1t ]
[  ε   D_22 …  D_2t ]
[  …    …   …   …   ]
[  ε    ε   …  D_tt ]
Application V.7

Having regard to the condensed form in Fig. 16, we may renumber the graph of Fig. 15 to that shown in Fig. 17.

FIGURE 17. A coherently renumbered graph.

The matrix corresponding to its completion is now in upper block-triangular form (with the renumbering N_1 → 1, N_2 → 2, N_4 → 3; N_3 → 4, N_6 → 5; N_5 → 6):

[ ε 0 ε | 0 ε | ε ]
[ ε ε 0 | ε ε | ε ]
[ 0 ε ε | ε ε | ε ]
[ ε ε ε | 0 0 | ε ]
[ ε ε ε | 0 0 | ε ]
[ ε ε ε | ε ε | ε ]
D. Further Properties of Δ

1. Floyd-Warshall Algorithm
Given a primal-weighted n × n matrix D, construct a sequence of matrices

D^{1} = D; D^{2}, …, D^{n+1}.

For k = 1, …, n, the elements d_ij^{k+1} of the matrix D^{k+1} are derived from the elements d_ij^{k} of the matrix D^{k} as follows:

d_ij^{k+1} = d_ij^{k} ⊕ (d_ik^{k} ⊗ d_kj^{k})  (i ≠ k and j ≠ k);
d_ij^{k+1} = d_ij^{k}  (i = k or j = k). (V.6)

This procedure is usually known as the Floyd-Warshall algorithm and is used to compute transitive closure matrices:

Theorem V.6. If λ(D) ≤ 0 for D ∈ W_{n,n}, then Δ(D) = D^{n+1}, which the Floyd-Warshall algorithm computes in O(n³) steps.
Proof. For 𝒢(D), make the hypothesis that for each (i, j), ]D^{k}[_ij represents the greatest weight attainable by any elementary path from N_i to N_j not having, as an intermediate node, any N_r with index r ≥ k. This is trivially true when k = 1. Now consider the set of weights of elementary paths from N_i to N_j not having, as an intermediate node, any N_r with index r ≥ k + 1. The greatest weight in this set may already be achieved by a path also not passing through N_k, in which case

]D^{k+1}[_ij = ]D^{k}[_ij.

This will certainly be the case if either i or j = k, since otherwise the path would be non-elementary. Otherwise, the greatest weight in the set must be achieved as the sum of two weights, corresponding to an elementary path from N_i to N_k joined to an elementary path from N_k to N_j, in which case

]D^{k+1}[_ij = d_ik^{k} ⊗ d_kj^{k}.

It follows that application of Eq. (V.6) will ensure that the hypothesis made about D^{k} is also true of D^{k+1}, and so by induction that D^{n+1} gives the greatest weights of elementary paths not having any intermediate node N_r with index r ≥ n + 1. Since there are no such nodes, all elementary paths have been considered. And since λ(D) ≤ 0, a simple cycle-deletion argument shows that non-elementary paths need not be considered, whence

D^{n+1} = Δ(D).

At each stage of the process, (n − 1)² matrix elements have to be transformed, and this must be carried out n times. Hence, the algorithm takes O(n³) steps. ∎

Since the strong transitive closure matrix may be derived as I ⊕ Δ, Theorem V.6 shows that the Floyd-Warshall algorithm has lower computational complexity than matrix-squaring as a way of calculating transitive closures.
Application V.8

Applying the Floyd-Warshall algorithm to the matrix D of Application V.6 gives successively

D^{2} =
[ ε 0 0 ε ε ε ]
[ ε ε ε 0 ε ε ]
[ ε ε 0 ε ε 0 ]
[ 0 0 0 ε ε ε ]
[ ε ε ε ε ε ε ]
[ ε ε 0 ε ε 0 ]

D^{3} =
[ ε 0 0 0 ε ε ]
[ ε ε ε 0 ε ε ]
[ ε ε 0 ε ε 0 ]
[ 0 0 0 0 ε ε ]
[ ε ε ε ε ε ε ]
[ ε ε 0 ε ε 0 ]

… and finally,

Δ(D) = D^{7} =
[ 0 0 0 0 ε 0 ]
[ 0 0 0 0 ε 0 ]
[ ε ε 0 ε ε 0 ]
[ 0 0 0 0 ε 0 ]
[ ε ε ε ε ε ε ]
[ ε ε 0 ε ε 0 ]
This differs from Γ(D), as calculated in Application V.5, in that ]Δ(D)[₅₅ = ε, there being no path from N₅ to itself.
Application V.9. The existence of the Floyd-Warshall algorithm shows that the strongly connected components of a graph 𝒢 may be found, the nodes coherently renumbered, and the upper block-triangular matrix developed, in O(n³) steps at worst, since the calculation of Γ(D(𝒢)) dominates the computational complexity. In fact, this is not the most efficient way of doing this, but it suffices for the procedures which follow.
2. Definite Matrices

Section VI, following, will consider the stable states and maximum speed of a DES. The rest of the present section prepares the ground for this. Any node N_j in 𝒢(D) which is contained in a critical cycle will be called an eigen-node and the index j an eigen-index. (The appropriateness of these names will emerge in the next section.) Adapting a term introduced by Carré (1971), we shall call D ∈ W_{n,n} definite if

λ(D) = 0 and 𝒢 = UFG(D) is strongly connected.  (V.7)
The following result summarizes a number of properties of definite matrices which follow easily from previous work.
Theorem V.7. Let the n × n primal-weighted matrix D be definite, and let 𝒢 = UFG(D). Then

(i) D ∈ F_{n,n}; Δ = Δ(D) exists and is finite, and (if n > 1) each δ_ij with i ≠ j gives the weight of some path from N_i to N_j in 𝒢 (and in its completion) which is of maximum weight and whose length does not exceed n − 1.
(ii) Each diagonal element δ_jj gives the weight of a cycle of maximum weight among cycles containing N_j in 𝒢 (and in its completion) and whose length does not exceed n.

(iii) Both 𝒢 and its completion contain at least one elementary cycle of weight zero and no cycle of positive weight.

(iv) There is at least one eigen-node, and for each eigen-index j, δ_jj = 0; for every other index k, δ_kk < 0.

The foregoing results will be used freely in what follows.
Theorem V.8. Let Δ = Δ(D), where D ∈ F_{n,n} is definite. If j is an eigen-index, then for each index i, there exists some index k such that δ_ij = d_ik + δ_kj.

Proof. If a maximum-weight path from N_i to N_j in 𝒢(D) is of length 1, then δ_ij = d_ij and the result holds on taking k = j (since δ_jj = 0). If a maximum-weight path is of length greater than 1, suppose it is

(N_i, N_k, ..., N_j).

Then the result follows by applying the principle of optimality (Application IV.1). ∎
Theorem V.9. Let Δ = Δ(D), where D ∈ F_{n,n} is definite. For every index j, there is an eigen-index i such that δ_ij gives the weight of a path of length exactly n + 1 from N_i to N_j.

Proof. Choose an eigen-index k and a path π in 𝒢(D) of maximum weight from N_k to N_j and length not exceeding n. Let σ be a cycle of weight zero containing N_k. Append π to enough copies of σ to give a path of length exceeding n from N_k to N_j, still of maximum weight. Progressively cancel initial nodes from this path until the length is reduced to n + 1. The resulting path is still to N_j from some eigen-node N_i (all the nodes in σ are eigen-nodes), and being a sub-path of a maximum-weight path, it is of weight δ_ij by the principle of optimality. ∎
Theorem V.10. If D ∈ F_{n,n} is definite, then λ(D^(p)) = 0 (p ≥ 1), and UFG(D), UFG(D^(p)) have the same eigen-nodes.

Proof. Any cycle containing N_i in UFG(D^(p)) corresponds to some diagonal element ](D^(p))^(r)[_ii in a power of D^(p). But this is a diagonal element in a power of D, so it cannot exceed zero and will equal zero only if N_i is an eigen-node in UFG(D). On the other hand, if N_i is an eigen-node in UFG(D), then some cycle σ containing N_i in UFG(D) has weight zero. If σ has length r, then concatenating p copies of σ gives a cycle of length pr and weight zero in 𝒢(D), which corresponds to a cycle of length r, and weight zero, containing N_i in UFG(D^(p)).
Thus, the greatest cycle-weight, and therefore the greatest cycle-mean, in UFG(D^(p)) is zero. ∎

The increasing property of a DES, discussed earlier, motivates the following definition. D ∈ W_{n,n} is increasing if D ⊗ x ≥ x for all x ∈ W_n.
Theorem V.11. D ∈ W_{n,n} is increasing if and only if D ≥ I. If D is increasing and definite, then

d_ii = 0  (i = 1, ..., n),

and then

Γ(D) = D^(n−1) = Δ(D);  and  D ⊗ Δ(D) = Δ(D).

Proof. If D ≥ I, then by isotonicity D ⊗ x ≥ I ⊗ x = x. Conversely, if D is increasing, then by the principle of column action, D ⊗ I ≥ I, so D ≥ I. Since this makes the diagonal elements finite, D ∈ F_{n,n}. If D is also definite, then each d_ii ≤ 0, since λ(D) = 0; but each d_ii ≥ 0, since D ≥ I; so each d_ii = 0. Moreover, using D ≥ I in Eq. (V.4),

Γ(D) = D^(n−1) = D^(n) = D ⊗ Γ(D) = Δ(D).

This also shows that

D ⊗ Δ(D) = D^(n+1) ≤ Δ(D)  (by Eq. (IV.8))

and Δ(D) ≤ D ⊗ Δ(D), since D is increasing; so

D ⊗ Δ(D) = Δ(D). ∎
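A quick numerical check of Theorem V.11 can be sketched as follows, using an invented 2 × 2 example (the helper name is not from the text):

```python
def mp_mul(A, B):
    """Max-plus matrix product: (A (x) B)_ij = max_k (a_ik + b_kj)."""
    return [[max(A[i][k] + B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# An increasing, definite matrix: zero diagonal (so D >= I) and lambda(D) = 0.
D = [[0, -1],
     [-2, 0]]
n = len(D)

# Theorem V.11: Gamma(D) = D^(n-1) = Delta(D). Here n = 2, so D^(n-1) = D,
# and squaring changes nothing: the powers have already stabilized.
assert mp_mul(D, D) == D

# D is increasing: D (x) x >= x componentwise, for any finite x.
x = [5, -7]
Dx = [max(D[i][j] + x[j] for j in range(n)) for i in range(n)]
assert all(Dx[i] >= x[i] for i in range(n))
```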
VI. THE STEADY STATE

A. The Speed of a System

1. Mean Stage-Time
How fast can a DES run? In a general way, it is clear that the slower machines will constrain the speed of the faster, but what will be the net effect of this in the long run? Such topics were considered by Cohen et al. (1985), for example. It is convenient to work in terms of the time to complete a large number of stages, following completion of the first stage. In order to complete M stages beyond the first, all machines must complete their (M + 1)st event. So, from Application II.4, a fair measure of the time taken is

T(M) = x(1)* ⊗ (A^(M) ⊗ x(1)),

where A ∈ F_{n,n} is the system matrix and x(1) the finite vector of first-event times. Suppose the maximum cycle-mean λ(A) of matrix A is achieved with a cycle of length L. Let r ≥ 1 be any integer. Putting M equal to Lr and using Theorem IV.5,

T(Lr) ≥ (λ(A))^(Lr) = (Lr)λ(A).

It follows that there exist arbitrarily large values of M for which the average stage time T(M)/M is at least λ(A). Otherwise expressed: the reciprocal of λ(A) sets a bound on the sustainable speed of the system in terms of stages completed per unit time. Summarizing:

Theorem VI.1. lim sup_{M→∞} (T(M)/M) ≥ λ(A). ∎
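The bound of Theorem VI.1 can be observed numerically. In the sketch below (an invented 2 × 2 system), T(M) is computed as x(1)* ⊗ (A^(M) ⊗ x(1)) = max_i (x_i(M + 1) − x_i(1)), and T(M)/M settles at λ(A) = 2.5, the greatest cycle-mean:

```python
def T_over_M(A, x1, M):
    """Average stage time T(M)/M for the orbit x(r+1) = A (x) x(r)."""
    n = len(A)
    x = x1[:]
    for _ in range(M):                 # advance the orbit to x(M+1)
        x = [max(A[i][j] + x[j] for j in range(n)) for i in range(n)]
    # x(1)* (x) x(M+1) = max_i (x_i(M+1) - x_i(1))
    return max(x[i] - x1[i] for i in range(n)) / M

A = [[0, 3],
     [2, 1]]                           # greatest cycle-mean: (3 + 2)/2 = 2.5
print(T_over_M(A, [0, 0], 100))        # -> 2.5
```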
2. Steady State
In Cuninghame-Green (1962), a case-study is described in which the managers of a DES wished to operate the system in such a way that there would be a constant interval λ between each two consecutive events on each machine, in other words:

x_i(r + 1) = λ + x_i(r)  for all i and all r,

or equivalently,

x(r + 1) = λ ⊗ x(r)  for all r.  (VI.1)

If this could be achieved, we would say that the system was in a steady state. Then it is clear that

x(M + 1) = λ^(M) ⊗ x(1),

and therefore,

T(M) = x(1)* ⊗ (λ^(M) ⊗ x(1)) = λ^(M) ⊗ (x(1)* ⊗ x(1)) = λ^(M),

since x* ⊗ x = 0; so a mean stage time equal to λ is attained. From Theorem VI.1, therefore, λ ≥ λ(A), and the question arises as to whether a steady state is achievable with λ = λ(A). We examine this next.
B. The Eigenvalue

1. The Eigenproblem
Since x(r + 1) = A ⊗ x(r), Eq. (VI.1) implies, on writing x for x(r),

A ⊗ x = λ ⊗ x.  (VI.2)

This is recognizable as the eigenvector-eigenvalue problem (briefly: eigenproblem) for A, in which we seek an x (an eigenvector) on which the action of the matrix is the same as that of a scalar λ (the corresponding eigenvalue). Since we wish A, x, λ to have a physical interpretation as a system matrix, a state, and a time-lapse, respectively, we shall assume A ∈ F_{n,n} and say that the eigenproblem for A is finitely soluble if there exist finite x, λ such that Eq. (VI.2) holds.

Theorem VI.2. If the eigenproblem for A ∈ F_{n,n} is finitely soluble, then the eigenvalue is unique and equals λ(A). The eigenproblem for A^(p) is then also finitely soluble for any integer p > 1, and λ(A^(p)) = (λ(A))^(p).
Proof. By hypothesis, we can find finite numbers λ, x₁, ..., x_n such that for i = 1, ..., n,

λ + x_i = max_j (a_ij + x_j).  (VI.3)

Now, for any choice of i, say i = i₁, equality occurs between the LHS and some j-th term in the RHS of Eq. (VI.3), say for j = i₂:

λ + x_{i₁} = a_{i₁i₂} + x_{i₂}.

If i₂ ≠ i₁, we can next consider the equation with i = i₂ and find a similar equality for j = i₃, say:

λ + x_{i₂} = a_{i₂i₃} + x_{i₃}.

Continuing in this manner, we generate a sequence of indices i₁, i₂, ... and eventually some index i_k will recur. If we sum all the equations, from the first occurrence of i_k in the role of i to its first occurrence in the role of j, we obtain

rλ + τ = σ + τ,

where r is the number of equations summed, and

τ = x_{i_k} + ... + x_{i_{k+r−1}},  σ = a_{i_k i_{k+1}} + ... + a_{i_{k+r−1} i_k}.

Since τ is finite, it may be canceled to give rλ = σ, so λ equals the cycle-mean σ/r. Hence, λ is not greater than the greatest cycle-mean λ(A). But we have already argued that λ cannot be less than λ(A), so the first result follows. Moreover, for the finite eigenvector x, by iteration,

A^(p) ⊗ x = A ⊗ ... ⊗ A ⊗ x = (λ(A))^(p) ⊗ x,

so the eigenproblem for A^(p) is finitely soluble with eigenvector x and eigenvalue λ(A^(p)) = (λ(A))^(p). ∎

Application VI.1. The work of Section IV shows that the mean stage time in the steady state for the model system equals 6.
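Checking a proposed eigenpair against Eq. (VI.2) is a one-line computation; the example pair below is invented for illustration:

```python
def is_eigenpair(A, lam, x):
    """Does A (x) x = lam (x) x hold, i.e. max_j(a_ij + x_j) = lam + x_i for all i?"""
    n = len(A)
    return all(max(A[i][j] + x[j] for j in range(n)) == lam + x[i]
               for i in range(n))

A = [[0, 3],
     [2, 1]]
assert is_eigenpair(A, 2.5, [0, -0.5])      # lambda(A) = 2.5
assert not is_eigenpair(A, 2.0, [0, -0.5])  # the eigenvalue is unique
```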
2. Algorithm of Karp

The steady-state interpretation of λ(A) lends importance to the need to calculate it efficiently. The straightforward approach of Application IV.3 can be improved upon by a procedure which is O(n³), as discussed next. Given A ∈ F_{n,n}, if UFG(A) is not strongly connected, then A may be brought to upper block-triangular form [A_rs] in O(n³) steps at worst, as in Application V.9. It is clear that any cycle containing nodes from different components of 𝒢(A) will have cycle-sum ε. Hence, it will suffice to find the values of λ(A_ii) for the diagonal blocks and take the greatest; therefore, we confine our attention now to the case where 𝒢(A) is strongly connected.
Theorem VI.3. Let A ∈ F_{n,n} have strongly connected UFG, and let a_i^k denote the element on row i of the first column of A^(k) (k = 1, 2, ...). Then

λ(A) = max_{i=1...n} ( min_{k=1...n} ((a_i^{n+1} − a_i^k)/(n + 1 − k)) ),

in which the scope of the operators is understood to be restricted to indices i, k for which both a_i^{n+1} and a_i^k are finite.

Proof. λ(A) is finite, since A ∈ F_{n,n} (Theorem IV.2). Let D ∈ F_{n,n} be defined by subtracting the (unknown) finite number λ = λ(A) from every element of A. Clearly, all cycle-means are thereby also reduced by λ, so λ(D) = 0, and since the UFG remains strongly connected, D is definite. Introduce the analogous quantities d_i^k for the matrix D. The symbols a_i^k, d_i^k denote the greatest weight of a path of length k from N_i to N₁ in 𝒢(A), 𝒢(D), respectively, so

a_i^k = d_i^k + kλ,
and hence, for all i and all k ≠ n + 1, restricted as before,

(a_i^{n+1} − a_i^k)/(n + 1 − k) − λ = (d_i^{n+1} − d_i^k)/(n + 1 − k).  (VI.4)

From Theorem V.7, δ_{i1}, the i-th entry in column 1 of Δ(D), is finite, and

For each i, δ_{i1} ≥ d_i^k for all k ≥ 1.  (VI.5)

For each i, δ_{i1} = d_i^r for some r (1 ≤ r ≤ n).  (VI.6)

So if we choose k in Eq. (VI.4) to be r from Eq. (VI.6), we can make the RHS of Eq. (VI.4) less than or equal to zero. Hence (for any i with a_i^{n+1} finite),

min_k ((a_i^{n+1} − a_i^k)/(n + 1 − k)) − λ ≤ 0.  (VI.7)

And there is at least one i with d_i^{n+1} (therefore a_i^{n+1}) finite, since from Theorem V.9,

For some i, δ_{i1} = d_i^{n+1}.  (VI.8)

Moreover, for this i, Eq. (VI.5) shows that the RHS of Eq. (VI.4) is greater than or equal to zero for all k. So, for this i, the inequality sign in Eq. (VI.7) is reversed. We conclude that

max_i min_k ((a_i^{n+1} − a_i^k)/(n + 1 − k)) − λ = 0,

and the result follows. ∎

We remark that the restriction of indices in the theorem statement is purely a convenience to save attributing meaning to expressions such as ε − ε. The restriction never makes the process vacuous because, as we have seen, at least one a_i^{n+1} must be finite; and for each such i, at least one a_i^k is finite because A^(k) ∈ F_{n,n} (Theorem II.1). The symbols a_i^k have the following interpretation. Define x(1) to be column 1 of A, and develop the orbit x(1), ..., x(n + 1) under the action of A. Then a_i^k = x_i(k). It is also easy to see that the proof of Theorem VI.3 remains valid if we choose any column-index j and let the symbols a_i^k now represent the greatest weight of any path of length k from N_i to N_j. The use of column 1 was arbitrary.

Application VI.2
Given a matrix A [not reproduced], choose x(1) as column 1 of A and form the orbit [not reproduced]. Row 1 can be ignored, since x₁(3) = ε. On row 2, ε can be ignored, whence λ(A) = (10 − 4)/(3 − 1) = 3.
Application VI.3

For the system matrix of the model system, the orbit based on the first column is [not reproduced]. We readily confirm that λ(A) = 6, as previously found.

The procedure implemented in Applications VI.2 and VI.3 is from Karp (1978). It is easily seen that the computation of the orbit dominates the computational complexity, whence by Theorem I.1, we have

Theorem VI.4. Karp's algorithm may be implemented in O(n³) steps. ∎
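A direct transcription of Karp's formula (Theorem VI.3) into Python might look as follows; −∞ stands for ε, the orbit is based on column 1 as in the text, and the function name is an invented one:

```python
import math

EPS = -math.inf  # epsilon

def karp(A):
    """lambda(A) = max_i min_k (a_i^{n+1} - a_i^k)/(n+1-k), with a_i^k = x_i(k),
    for A in F_{n,n} with strongly connected UFG (Theorem VI.3)."""
    n = len(A)
    x = [row[0] for row in A]                 # x(1): column 1 of A, i.e. a_i^1
    orbit = [x]
    for _ in range(n):                        # extend to x(n+1), i.e. a_i^{n+1}
        x = [max(A[i][j] + x[j] for j in range(n)) for i in range(n)]
        orbit.append(x)
    best = EPS
    for i in range(n):
        if orbit[n][i] == EPS:                # a_i^{n+1} must be finite
            continue
        best = max(best,
                   min((orbit[n][i] - orbit[k - 1][i]) / (n + 1 - k)
                       for k in range(1, n + 1) if orbit[k - 1][i] != EPS))
    return best

assert karp([[0, 3], [2, 1]]) == 2.5   # greatest cycle-mean (3 + 2)/2
```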
C. Finite Eigenvectors

1. Fundamental Eigenvectors
Let Δ = Δ(D) for D ∈ F_{n,n} with λ(D) ≤ 0. Then D is p-regular, and from Eq. (V.2) and the majority principle,

Γ(D) ≥ D ⊗ Γ(D).  (VI.9)

By isotonicity,

D ⊗ Γ(D) ≥ D ⊗ D ⊗ Γ(D),

whence, by Eq. (V.5),

Δ(D) ≥ D ⊗ Δ(D).  (VI.10)

Hence, if d is any column of Δ,

D ⊗ d ≤ d.  (VI.11)

The material on definite matrices in Section V,D,2 now enters the argument.
Theorem VI.4. If D ∈ F_{n,n} is definite, j is an eigen-index, and d is column j of Δ = Δ(D), then d is a finite eigenvector of D (with eigenvalue zero).

Proof. Certainly, d is finite because Δ is. For any row-index i, Theorem V.8 guarantees the existence of an index k such that

δ_ij = d_ik + δ_kj.  (VI.12)

So the majority principle applied to the element on row i of D ⊗ d gives

max_r (d_ir + δ_rj) ≥ d_ik + δ_kj = δ_ij,

which is the element on row i of d. Thus, column j of Δ satisfies D ⊗ d ≥ d, which coupled with Eq. (VI.11) gives

D ⊗ d = d. ∎

The columns of Δ corresponding to eigen-indices are called fundamental eigenvectors (of D).
Application VI.4

For the following matrix D [not reproduced], Karp's algorithm finds λ(D) = 0. The Floyd-Warshall algorithm is therefore applicable and gives Δ(D) [not reproduced], which is finite, so Γ(D) is finite. Hence, the UFG is strongly connected, and D is definite. The eigen-indices j, for which δ_jj = 0 (Theorem V.7), are j = 1 and 2. We readily confirm that the first two columns of Δ are eigenvectors of D, with eigenvalue zero.
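The whole recipe of Application VI.4 (Karp's algorithm for λ, subtract λ, Floyd-Warshall, read off the zero-diagonal columns) fits in a few lines; the example matrix and helper names below are invented:

```python
import math

EPS = -math.inf

def fundamental_eigenvectors(A):
    """Return (lam, columns of Delta(D) with zero diagonal), assuming UFG(A)
    is strongly connected; each column d satisfies A (x) d = lam (x) d."""
    n = len(A)
    # Karp's algorithm for lam = lambda(A), as in Theorem VI.3
    x = [row[0] for row in A]
    orbit = [x]
    for _ in range(n):
        x = [max(A[i][j] + x[j] for j in range(n)) for i in range(n)]
        orbit.append(x)
    lam = max(min((orbit[n][i] - orbit[k - 1][i]) / (n + 1 - k)
                  for k in range(1, n + 1) if orbit[k - 1][i] != EPS)
              for i in range(n) if orbit[n][i] != EPS)
    # D with A = lam (x) D is definite; compute Delta(D) by Floyd-Warshall
    D = [[a - lam if a != EPS else EPS for a in row] for row in A]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                D[i][j] = max(D[i][j], D[i][k] + D[k][j])
    # eigen-indices j are those with Delta(D)_jj = 0 (Theorem V.7(iv))
    return lam, [[D[i][j] for i in range(n)] for j in range(n) if D[j][j] == 0]

A = [[0, 3], [2, 1]]
lam, eigs = fundamental_eigenvectors(A)
for d in eigs:   # verify A (x) d = lam + d
    assert all(max(A[i][j] + d[j] for j in range(2)) == lam + d[i]
               for i in range(2))
```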
2. Eigenproblem for A ∈ F_{n,n}

From the known complexities of the relevant algorithms, the work carried out in Application VI.4 easily generalizes to the following result.

Theorem. If D ∈ F_{n,n} is definite, then the eigenproblem for D is finitely soluble and the eigenvalue and fundamental eigenvectors may be found in O(n³) steps. ∎

This theorem is not numbered because it immediately generalizes.
Theorem VI.5. If A ∈ F_{n,n} and UFG(A) is strongly connected, then the eigenproblem for A is finitely soluble in O(n³) steps.

Proof. As in Theorem VI.3, D ∈ F_{n,n} is definite, where D is defined by

A = λ(A) ⊗ D.  (VI.13)

First λ(A), and then the fundamental eigenvectors of D, may be found in O(n³) steps, and are all finite. If d is a fundamental eigenvector,

A ⊗ d = λ(A) ⊗ D ⊗ d = λ(A) ⊗ d,

so d is an eigenvector also of A, with eigenvalue λ(A). ∎

The terms fundamental eigenvector, eigen-index, and eigen-node will henceforth be extended to A and 𝒢(A) from D and 𝒢(D) in the obvious way.

Application VI.5
If A ⊗ x = λ ⊗ x, then following Theorem VI.2, x is also an eigenvector of A ⊕ A^(2) ⊕ ... ⊕ A^(n), with eigenvalue λ ⊕ λ^(2) ⊕ ... ⊕ λ^(n). Thus, if D is definite, and y is a finite eigenvector,

Δ(D) ⊗ y = y.
3. Equivalent Eigenvectors

Reserved notation. For given definite D ∈ F_{n,n}, column j of Δ(D) is denoted by Λ_j (j = 1, ..., n).
Two eigen-indices i, j (and the corresponding eigen-nodes N_i, N_j, and the corresponding Λ_i, Λ_j) will be called equivalent if there is a critical cycle containing both N_i and N_j in UFG(D). According to context, we write i ≡ j, or N_i ≡ N_j, or Λ_i ≡ Λ_j.

Application VI.6

For the definite matrix D in Application VI.4, d₁₂ + d₂₁ = 0; thus, the eigen-nodes N₁, N₂ both lie on the same critical cycle, so N₁ ≡ N₂.

Theorem VI.6. Let Δ = Δ(D) for given definite D ∈ F_{n,n}. Let i be an eigen-index, j any other index. Then δ_ij + δ_ji ≤ 0, with equality if and only if j is an eigen-index and i ≡ j.

Proof. Obviously, δ_ij + δ_ji is the weight of a cycle and so cannot be positive; if it is zero, then by definition i ≡ j. Conversely, if i ≡ j then there is a cycle

(N_i, ..., N_j, ..., N_i)

of weight zero. Since this is the greatest possible weight for a cycle, then by the principle of optimality, the constituent paths (N_i, ..., N_j) and (N_j, ..., N_i) must have maximal weights. Thus, δ_ij + δ_ji = 0. ∎

In Application VI.4, Λ₂ can be obtained from Λ₁ by adding a constant (= 1) to each element. In max-algebraic terms, Λ₁, Λ₂ are scalar multiples of one another:

Λ₂ = 1 ⊗ Λ₁;  Λ₁ = −1 ⊗ Λ₂.

This symmetric relationship is associated with the fact that Λ₁ ≡ Λ₂. The following result explains.

Theorem VI.7. Let Δ = Δ(D) for given definite D ∈ F_{n,n}. Let i, j be eigen-indices. Then Λ_i = α ⊗ Λ_j for some finite α if and only if i ≡ j.

Proof.
If indeed Λ_i = α ⊗ Λ_j, then, on rows i and j,

δ_ii = α + δ_ij  and  δ_ji = α + δ_jj.

Since δ_ii = δ_jj = 0, it follows that δ_ij + δ_ji = 0, so i ≡ j by Theorem VI.6. Conversely, if i ≡ j, let r be any row-index. We have

δ_ri + δ_ij ≤ δ_rj,  (VI.14)

since clearly the LHS gives the weight of a possible path from N_r to N_j. Similarly,

δ_rj + δ_ji ≤ δ_ri.  (VI.15)

If Eq. (VI.15) could hold as strict inequality, we could add it to Eq. (VI.14) and produce

δ_ri + δ_rj + δ_ij + δ_ji < δ_ri + δ_rj,

which is impossible, since from Theorem VI.6, δ_ij + δ_ji = 0. Hence, Eq. (VI.15) holds with equality for every row r, so Λ_i = α ⊗ Λ_j, where α = δ_ji. ∎

D. The Eigenspace

1. Independence of Fundamental Eigenvectors
In contrast to Theorem VI.7, the following result shows that fundamental eigenvectors which are not equivalent have an important measure of independence. It uses the concept of linear combination introduced in Section III,D,2.

Theorem VI.8. No fundamental eigenvector Λ_i is a linear combination of others not equivalent to Λ_i.

Proof. Suppose that we can write Λ_i as a linear combination of other columns, with finite coefficients α_k:

Λ_i = ⊕_{k≠i} α_k ⊗ Λ_k.

On row i, equality must occur between the LHS and some term on the RHS, say a term with index j:

δ_ii = α_j ⊗ δ_ij.  (VI.16)

On row j we have, by the majority principle,

δ_ji ≥ α_j ⊗ δ_jj.  (VI.17)

Equations (VI.16) and (VI.17) give

α_j + δ_ij + δ_ji ≥ α_j + δ_ii + δ_jj.

Canceling finite α_j, and using δ_ii = δ_jj = 0,

δ_ij + δ_ji ≥ 0,

which by Theorem VI.6 can happen only if i ≡ j. ∎
2. The Eigenspace

Reserved notation. For A ∈ F_{n,n} with finitely soluble eigenproblem, V(A) denotes the set of all finite eigenvectors of A. V(A) is called the eigenspace of A. The use of the term space is justified by the following easily proved result.

Theorem VI.9. If A ∈ F_{n,n} has a finitely soluble eigenproblem, then for all x, y ∈ V(A) and α ∈ F,

x ⊕ y ∈ V(A)  and  α ⊗ x ∈ V(A). ∎

It follows that any linear combination of finite eigenvectors is again a finite eigenvector. We are now in a position to characterize the eigenspace V(A) completely in the most important case.
Theorem VI.10. If A ∈ F_{n,n} has strongly connected UFG, then V = V(A) consists precisely of all linear combinations of the fundamental eigenvectors of A.

Proof. By Theorem VI.9, all such linear combinations lie in V. On the other hand, let y ∈ V. As before, define the definite matrix D such that A = λ(A) ⊗ D, and to simplify the notation suppose without loss of generality that the fundamental eigenvectors are columns Λ₁, ..., Λ_p of Δ = Δ(D). We shall show that

y = ⊕_{j=1...p} y_j ⊗ Λ_j,  (VI.18)

where y₁, ..., y_n are the components of y. For we know from Application VI.5 that

y = Δ ⊗ y ≥ ⊕_{j=1...p} y_j ⊗ Λ_j  (VI.19)

by the majority principle. But on an arbitrary row i₁ of y = D ⊗ y, equality holds between the LHS and some term on the RHS, with index i₂, say. Exactly as in Theorem VI.2, we may construct the index-sequence i₁, i₂, ..., until some index i_k recurs, which will be an eigen-index. Adding these equalities from i₁ to the first occurrence of i_k, and canceling y_j for j = i₂ to i_{k−1}, gives

y_{i₁} = d_{i₁i₂} + ... + d_{i_{k−1}i_k} + y_{i_k} ≤ δ_{i₁i_k} + y_{i_k},  (VI.20)

since d_{i₁i₂} + ... + d_{i_{k−1}i_k} is the weight of a path from N_{i₁} to N_{i_k}. Since i_k is one of the indices 1, ..., p, the majority principle applied to Eq. (VI.20) gives

y_{i₁} ≤ ]⊕_{j=1...p} y_j ⊗ Λ_j[_{i₁}.

And since row-index i₁ was arbitrary, we have shown

y ≤ ⊕_{j=1...p} y_j ⊗ Λ_j,

which, combined with Eq. (VI.19), delivers the result. ∎

Thus, the whole eigenspace is generated by the fundamental eigenvectors. In fact, since any fundamental eigenvector generates all equivalent fundamental eigenvectors by taking scalar multiples, it is clear that the generating set may be reduced to any maximal set of inequivalent fundamental eigenvectors.
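Theorem VI.10 can be exercised numerically: any max-plus linear combination ⊕_j α_j ⊗ Λ_j of fundamental eigenvectors is again an eigenvector. The sketch below uses an invented 2 × 2 system with λ(A) = 2.5 (here the two fundamental eigenvectors happen to be equivalent, but the check is the same):

```python
A = [[0, 3],
     [2, 1]]
lam = 2.5
L1 = [0, -0.5]       # the two fundamental eigenvectors of A; equivalent,
L2 = [0.5, 0]        # since L2 = 0.5 (x) L1
alpha1, alpha2 = 4, -7

# y = alpha1 (x) L1 (+) alpha2 (x) L2: componentwise max of shifted columns
y = [max(alpha1 + L1[i], alpha2 + L2[i]) for i in range(2)]

# y is again an eigenvector: A (x) y = lam (x) y
Ay = [max(A[i][j] + y[j] for j in range(2)) for i in range(2)]
assert Ay == [lam + y[i] for i in range(2)]
```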
Application VI.7

The eigenproblem for the model system can now be solved completely. We have already found λ(A) = 6. Form the usual matrix D such that A = 6 ⊗ D and use the Floyd-Warshall algorithm to find Δ(D) [not reproduced]. The first three columns have zero diagonal elements and are therefore fundamental eigenvectors, but the first two are equivalent, since d₁₂ + d₂₁ = 0. Hence, every eigenvector for the model system can be generated by taking linear combinations of Λ₁ and Λ₃ [not reproduced].
Application VI.8

In the model system, if the five-stage project is initiated by all machines being set in motion at time zero, the first events occur at times given by x(1) of Eq. (I.10). From the forward orbit in Section I,D,1,

A ⊗ x(1) = x(2) ≠ 6 ⊗ x(1),

so x(1) is not an eigenvector of the system matrix A, and therefore the system does not enter a steady state. Replacing x(1) by any eigenvector will imply start times which take the system into a steady state, but from a management point of view it may be undesirable for these to be wildly different from zero. Accordingly, we seek a Chebyshev-best approximation to x(1) from the eigenspace of A. Using the result of Application VI.7, this requires the calculation of the principal solution x̂ of [equation not reproduced]. This leads to the Chebyshev-best eigenvector [not reproduced], with corresponding start-times [not reproduced].
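The principal-solution computation invoked here can be sketched as follows. For a generator matrix G whose columns are fundamental eigenvectors, x̂_j = min_i (b_i − g_ij) is the greatest x with G ⊗ x ≤ b, so G ⊗ x̂ is the greatest eigenvector dominated by b; shifting this projection up by half its maximum deviation from b then gives a Chebyshev-best element. The matrix and target vector below are invented, not the model system's:

```python
def principal(G, b):
    """Principal solution: x_hat_j = min_i (b_i - g_ij), the greatest x
    with G (x) x <= b."""
    n, m = len(G), len(G[0])
    return [min(b[i] - G[i][j] for i in range(n)) for j in range(m)]

def mp_apply(G, x):
    return [max(G[i][j] + x[j] for j in range(len(x))) for i in range(len(G))]

G = [[0, 0.5],       # columns: fundamental eigenvectors of an invented
     [-0.5, 0]]      # system with lambda = 2.5
b = [10, 10]         # desired start-times

x_hat = principal(G, b)
proj = mp_apply(G, x_hat)               # greatest eigenvector <= b
dev = max(b[i] - proj[i] for i in range(len(b)))
cheb = [v + dev / 2 for v in proj]      # Chebyshev-best eigenvector
```

Here x̂ = [10, 9.5], the projection is [10, 9.5], and the Chebyshev-best eigenvector is [10.25, 9.75], at Chebyshev distance 0.25 from b.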
E. Steady State without Strong Connectivity

1. Possible Finite Insolubility

Figure 18 depicts, in the style of Fig. 1, a DES with two machines. [Figure 18: a two-machine DES.] The UFG of its system matrix [not reproduced] is not strongly connected. The nodes have been coherently numbered, so A is upper-triangular. Can such a system have a steady state? If a₂₂ < a₁₁, the answer is negative, and this may be understood both intuitively and algebraically. Since machine 2 is unconstrained by machine 1, it is intuitively clear that it is free to run with an average stage-time of a₂₂, whereas machine 1 will require a greater average stage-time, of a₁₁. Indeed, if the physical function of machine 2 is to pass material to machine 1, then in the long run an indefinitely large stock will accumulate between the machines, unless some further constraint is introduced. Algebraically, we find that the eigenproblem for A is not finitely soluble: λ(A) = a₁₁, so row 2 of the eigenvector-eigenvalue relation A ⊗ x = λ ⊗ x would give

a₂₂ + x₂ = a₁₁ + x₂,

which cannot hold.

In this simple example, it is not hard to see that a steady state is achievable when a₁₁ ≤ a₂₂, but more complex structures are more difficult to analyze, as Section E,3 will show.
2. Equality of Eigenvalues

It is clear that, for a steady state to exist, all strongly connected components of the system must be retarded as necessary to run with a mean stage time equal to λ(A). In practice, therefore, nothing is lost by attaching if necessary an extra machine, a pacemaker, to each component of UFG(A) to force this to happen. Select any node N_j in a given component 𝒢_k and adjoin an extra node N_l, say, setting

a_ll = λ(A);  a_lj = a_jl = 0;  a_il = a_li = ε otherwise.
Assuming λ(A) is not negative, since the process moves forward in time, this obviously gives every connected component the same mean stage-time λ(A), and we can then appeal to the following theorem.

Theorem VI.11. Suppose the nodes of UFG(A) for given A ∈ F_{n,n} are coherently numbered, so that A has upper block-triangular form A = [A_rs] (r, s = 1, ..., t). If all the maximum cycle-means λ(A_rr) are equal, then the eigenproblem for A is finitely soluble.

Proof. Theorem IV.2 shows that λ(A) is finite, and since any cycle containing nodes from two different components will have cycle-mean ε, it is clear that each λ(A_rr) = λ(A). This precludes the possibility that any component is an isolated node, so from the discussion in Section V,B,2, each A_rr satisfies the conditions of Theorem VI.5 and therefore has a finitely soluble eigenproblem. Let X₁, ..., X_t be respective finite eigenvectors. We construct for A an eigenvector

y = [Y₁; ...; Y_t]

by the following algorithm. First, set Y_t = X_t. Then, for r = t − j (j = 1, 2, ...), successively set Y_r = α_r ⊗ X_r, where α_r is chosen large enough to make

λ(A) ⊗ Y_r ≥ ⊕_{s=r+1...t} A_rs ⊗ Y_s.

Then Y_r is still an eigenvector of A_rr, and, from the block triangularity of A,

A ⊗ Y = [A_rs] ⊗ [Y_s] = λ(A) ⊗ Y. ∎

Application VI.9

The following matrix A is in upper block-triangular form:
[matrix not reproduced]

By inspection or calculation, λ(A₁₁) = λ(A₂₂) = 3. Applying the methods of Application VI.4, we find the fundamental eigenvectors for A₁₁ and for A₂₂ [not reproduced]. Taking the first fundamental eigenvector of A₂₂ as X₂, we find A₁₂ ⊗ X₂ [not reproduced]. We must take a sufficiently large scalar multiple of X₁ to dominate this, say [not reproduced], giving Y [not reproduced]. It is readily confirmed that this is indeed a finite eigenvector of A. Another may be found, using the second fundamental eigenvector of A₂₂.
3. A General Condition

As remarked earlier for the 2 × 2 case, a steady state may be possible without equality of cycle-means. To investigate this further, suppose A is in upper block-triangular form [A_rs] (r, s = 1, ..., t) and satisfies the following condition:

λ^(−1) ⊗ A_rr ⊗ B_r ≤ B_r  (r = 1, ..., t − 1),  (VI.21)

where

λ = λ(A_tt)  and  B_r = [A_{r,r+1}, ..., A_rt].
If X is any finite eigenvector of A_tt, an eigenvector

y = [Y₁; ...; Y_t]

of A is obtained by the following algorithm. Set Y_t = X. Then, for reducing r = t − 1, ..., 1, set

Y_r = λ^(−1) ⊗ B_r ⊗ [Y_{r+1}; ...; Y_t].

For, application of the r-th block-row of A to Y gives

A_rr ⊗ Y_r ⊕ B_r ⊗ [Y_{r+1}; ...; Y_t] = (λ^(−1) ⊗ A_rr ⊗ B_r ⊕ B_r) ⊗ [Y_{r+1}; ...; Y_t] = B_r ⊗ [Y_{r+1}; ...; Y_t] = λ ⊗ Y_r.

Assuming that no B_r has a row consisting entirely of ε, it is easily seen that Y will be finite.
Application VI.10

For the following matrix A [not reproduced] we have λ = 4, X = [0], and A₁₁, B₁ [not reproduced]. The condition of Eq. (VI.21) is readily confirmed, and the foregoing procedure finds Y [not reproduced].
VII. INFINITE PROCESSES

A. Convergence to Steady State
1. The Orbit

Application VI.8 showed that starting all machines at time zero does not cause the model system to enter a steady state at its first events. However, from the orbit computed in Section I,D,1, the components of x(3) can be obtained by adding λ(A) = 6 to the corresponding components of x(2). So x(2) is an eigenvector, and the system has eventually entered a steady state despite the arbitrary choice of starting times. But suppose instead that the project started at times x(1) [not reproduced]. The five-stage orbit now is

x(1), ..., x(5) [not reproduced].  (VII.1)

None of these is an eigenvector of A, but x(5) is now obtainable from x(3) by adding 12 to each component. Thus, A^(2) ⊗ x(3) = 6^(2) ⊗ x(3), so x(3) ∈ V(A^(2)). It is clear what will happen if the orbit is continued: each x(r) from r = 5 onward will be obtained by adding 12 to the components of x(r − 2) (simple proof by induction). This is certainly stable behavior of a kind. In general, given a DES whose system matrix A ∈ F_{n,n} has a finitely soluble eigenproblem: if, after a finite number N of stages, the orbit reaches an eigenvector of some power A^(p) of A, then the total time taken to complete any p consecutive stages thereafter is always λ(A^(p)) = (λ(A))^(p) = pλ(A).
Thus the average stage time is λ(A), and the system is running at maximum theoretical speed. The case p = 1, however, remains of special interest, in view of its simple character from a management and control point of view. We consider this special case first.
2. Robustness

If, unlike the model system, a given system has the property that from any initial state it will reach an eigenvector of the system matrix in a finite number of stages, then it has a particularly useful property: even if some mishap interrupts its running, it will automatically return to a stable state in due course. A matrix A ∈ F_{n,n} will be called robust if, for any finite x, there is an integer N such that A^(N) ⊗ x ∈ V(A). A system with a robust system matrix will also be called robust.
Theorem. If D ∈ F_{n,n} is increasing and definite, then D is robust. Specifically, D^(n−1) ⊗ x ∈ V(D) for any finite x.

Proof. From Theorem V.11, we know that D^(n−1) = Δ(D), and all diagonal elements of D are zero. Hence, all columns of Δ(D) are fundamental eigenvectors. So D^(n−1) ⊗ x = Δ(D) ⊗ x is a linear combination of eigenvectors and therefore is itself an eigenvector. ∎

The preceding theorem is not numbered because it immediately generalizes.
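As a small numerical check of this unnumbered theorem, take the invented increasing definite matrix below: with n = 2, D^(n−1) ⊗ x = D ⊗ x is already an eigenvector for every finite starting vector x.

```python
D = [[0, -1],
     [-2, 0]]        # increasing (zero diagonal) and definite (lambda(D) = 0)

def act(D, x):
    return [max(D[i][j] + x[j] for j in range(len(x))) for i in range(len(D))]

x = [9, 0]           # an arbitrary finite starting vector
y = act(D, x)        # D^(n-1) (x) x, with n = 2
assert act(D, y) == y    # y is an eigenvector (eigenvalue zero): orbit stable
```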
Theorem VII.1. If the UFG of A ∈ F_{n,n} is strongly connected and a_ii = λ(A) (i = 1, ..., n), then A is robust.

Proof. This follows because A = λ(A) ⊗ D, where D is definite and increasing. ∎

Application VII.1
The following matrix A, with λ(A) = 2, satisfies the conditions of Theorem VII.1:

A = [not fully reproduced].

A quick calculation shows A^(3) = 2 ⊗ A^(2), so the columns of A^(2) are all eigenvectors. In the orbit

x(1), x(2), x(3) [not fully reproduced],

we find x(4) = 2 ⊗ x(3), so the orbit reaches an eigenvector x(3) in two stages, despite the disparate sizes of the components of x(1).

3. A Sharper Result

The conditions of Theorem VII.1 are unnecessarily restrictive, as a slightly more elaborate argument shows.
Theorem. If D ∈ F_{n,n} is definite, and d_ii = 0 for every eigen-index i, then D is robust.

Proof. Choose N ≥ pn, where p is any arbitrarily large integer, and consider c_ij, where C = D^(N). Clearly, c_ij is the greatest weight of any path of length N from N_i to N_j in 𝒢(D), and each such path necessarily contains some q ≥ p cycles, whose deletion would leave an elementary path. Thus, c_ij ≤ δ_ij + ω, where ω is the total weight of some q cycles. If we provisionally consider only paths having no eigen-node as an intermediate node, then all q cycles have strictly negative weight, and the path weight can be made into a negative number less than any given finite number by taking p large enough. We can therefore ignore this possibility, since if k is any eigen-index we can realize the constant value δ_ik + δ_kj as a candidate for c_ij by a path from N_i to N_k, followed by a suitable number of repetitions of the loop (N_k, N_k), followed by a path from N_k to N_j. Choosing the greatest of these with respect to k, it easily follows that for N sufficiently large,

c_ij = ⊕_{eigen-indices k} (δ_ik + δ_kj).  (VII.2)

Notice that this argument is essentially independent of whether i = j, or whether i or j may also be an eigen-index. Now let P (respectively, Q) be derived from Δ(D) by deleting all columns (respectively, rows) whose index is not an eigen-index. Then Eq. (VII.2) says that for all sufficiently large N: D^(N) = P ⊗ Q. But the columns of P are all (fundamental) eigenvectors, so

D ⊗ D^(N) = D ⊗ P ⊗ Q = P ⊗ Q = D^(N).
Thus, the columns of D^(N) are all eigenvectors. Hence, D^(N) ⊗ x, for any finite x, is a linear combination of eigenvectors and thus is itself an eigenvector. ∎

Again, this theorem is not numbered because it immediately generalizes as follows. The easy proof is omitted.

Theorem VII.2. If the UFG of A ∈ F_{n,n} is strongly connected and a_ii = λ(A) for every eigen-index i, then A is robust.

4. Ultimate Periodicity
A sequence (x(r)) (r = 1, 2, ...) of finite vectors will be called ultimately periodic of period p if there exists an integer N such that for all r ≥ N: x(r + p) = x(r).
If λ is finite, a sequence (y(r)) will be called ultimately λ-periodic of period p if y(r) = λ^(r) ⊗ x(r) (r = 1, 2, ...), where (x(r)) is ultimately periodic of period p. The phrase of period p may be omitted, with the understood meaning of some period.

Application VII.2
The discussion in Section VII.A.1 can be paraphrased by saying that the orbit in Eq. (VII.1) is ultimately 6-periodic of period 2. This is explained by calculating A^(2) and observing that it satisfies the conditions of Theorem VII.2 and is therefore robust. Hence, the orbit under the action of A, based on any finite starting vector x, will always reach an eigenvector of A^(2). To generalize Application VII.2, we define another matrix parameter.

Reserved notation
For A ∈ F_{n,n}, p(A) denotes the least common multiple of the lengths of all critical cycles.

Theorem VII.3. For A ∈ F_{n,n}, if UFG(A^(t)) is strongly connected for some multiple t of p(A), then the orbit under A, based on any finite starting vector, is ultimately λ(A)-periodic of period t.
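As a hedged numerical illustration of this kind of result (the 2 × 2 matrix and all numbers below are invented for the sketch, not taken from the text), one can iterate the max-algebraic orbit x(r+1) = A ⊗ x(r) and watch it become ultimately λ(A)-periodic, here with period t = 1:

```python
# Illustrative sketch: the max-algebraic orbit x(r+1) = A (x) x(r).
# The matrix A below is a made-up example; its cycle means are 3 (the loop
# at node 1), 2, and 0.5, so lambda(A) = 3.

def maxplus_matvec(A, x):
    """Max-algebraic product: y_i = max_k (a_ik + x_k)."""
    return [max(a + xk for a, xk in zip(row, x)) for row in A]

A = [[3, 0],
     [1, 2]]
lam, t = 3, 1          # eigenvalue and (assumed) ultimate period

x = [0, 10]            # any finite starting vector
orbit = [x]
for _ in range(12):
    orbit.append(maxplus_matvec(A, orbit[-1]))

r = 8                  # well past the transient
print(orbit[r + t] == [t * lam + xi for xi in orbit[r]])   # -> True
```

After a short transient the orbit advances by exactly λ(A) = 3 at every step, i.e. it has reached an eigenvector.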
Proof. A^(t) ∈ F_{n,n} by Theorem II.1. And, by Theorem IV.4, the weight of each arc in the graph of A^(t) is the weight of some path of length t in the graph of A, so UFG(A) is also strongly connected. Thus, A has finitely soluble eigenproblem and λ(A^(t)) = tλ(A), using Theorems VI.2 and VI.5. Any cycle of cycle-mean λ(A^(t)) in the graph of A^(t) determines a cycle of cycle-mean λ(A) in the graph of A and shows that every eigen-node for A^(t) is an eigen-node for A. Theorem IV.6 now shows that A^(t) satisfies all the conditions of the foregoing Theorem VII.2. ∎

We may now derive one further result in this spirit, under conditions scarcely more stringent than just strong connectivity.

Theorem VII.4. For A ∈ F_{n,n}, if UFG(A) is strongly connected and at least one diagonal element a_kk is finite, then the orbit under A, based on any finite starting vector, is ultimately λ(A)-periodic.
Proof. The UFG of every sufficiently high power A^(t) of A is strongly connected, since by taking a path from N_i to N_k, followed by a suitable number of repetitions of (N_k, N_k), followed by a path from N_k to N_j, we can find a path of length t from any N_i to any N_j. Taking t as a sufficiently large multiple of p(A), we meet the conditions of Theorem VII.3. ∎

From a practical point of view, we would expect the system matrix of any DES of physical relevance to have all its diagonal elements finite, since any machine must constrain its (r + 1)st event to follow its rth. The effect of the preceding result is therefore quite far-reaching: it shows that a wide class of DES will ultimately run at maximum theoretical speed from any initial state.

Application VII.3
The system matrix of the model system satisfies the conditions of Theorem VII.4, so the orbit based on any finite vector will be ultimately 6-periodic. These results can be refined, especially as regards the precise calculation of the period, and can be extended to certain DES which, though not strongly connected, have periodic behavior. Further consideration of this issue is, however, beyond the scope of the present text.

B. Power Series
1. Generalized Transitive Closure

The cost of traveling from one place to another will usually depend not only on the mileage, but also on the subsistence costs incurred at each stop. In the money market, it may be advantageous to change dollars into
pounds and then into yen, but a greater brokerage fee is incurred for the double transaction. Thus, the profit or loss associated with the operation of a DES may depend not only on the route by which the system moves from one state to another, but also on the number of stages passed through. Accordingly, we may consider c_r ⊗ D^(r) in place of D^(r) when considering the utility of r-stage transitions. Here c_r represents a one-time gain or tariff associated with using the system for exactly r stages. The GWPP then involves consideration of
⊕_r c_r ⊗ D^(r)    rather than    ⊕_r D^(r),
i.e., of a general matrix power series rather than a simple matrix geometric series. The transitive closure matrices Γ(D), Δ(D) were introduced essentially as geometric series in the matrix D, which converge in a finite number of steps if λ(D) ≤ 0, and do not converge if λ(D) > 0. In generalizing these results, we must first develop a convergence theory for scalar power series ⊕_r c_r ⊗ z^(r). That is the objective of the following section. In Sections B.3 and B.4, we shall then show how the problem for matrices can be completely reduced to that for scalars.

2. Scalar Power Series
Strictly speaking, in considering scalar power series, it is not necessary to place restrictions on either the coefficients c_r or the argument value z: each may be finite, or ε, or ε*, as shown in Cuninghame-Green and Huisman (1982). However, to avoid tedious special cases, without losing any practical generality, we shall make the following conventions. The coefficients will be taken from the primal weighting system: each will be finite or equal to ε. Use of ε allows us to consider series with some terms "missing"; but an infinite number of them must be finite, otherwise we are essentially considering a maxpolynomial, as discussed in Section VIII, and no convergence question arises. Argument values z will be assumed to be finite. Dual conventions and results will hold for min-algebraic power series with coefficients from the dual weighting system.

Theorem VII.5.
The sequence σ_N, where

σ_N = ⊕_{r≤N} c_r ⊗ z^(r),    (VII.3)

converges with respect to N if and only if the terms c_r ⊗ z^(r) have a finite upper bound.
Proof. As N increases, the operator ⊕ takes the greatest of more and more terms, so σ_N is monotone non-decreasing with respect to N; hence, by the monotone convergence theorem, it is either upper-bounded and convergent, or ultimately exceeds any bound, as N → ∞. But again, by the nature of the operator ⊕, σ_N is upper-bounded if and only if the term-sequence (c_r ⊗ z^(r)) is upper-bounded. ∎

The scalar power series will be said to converge finitely (for a given argument z) if there is an index N such that
⊕_{r≤N} c_r ⊗ z^(r) = ⊕_{r≤p} c_r ⊗ z^(r)    for all p > N.
An index N will be called a terminating index of the scalar power series if
⊕_{r≤N} c_r ⊗ z^(r) ≥ c_p ⊗ z^(p)    for all p > N.
It is clear that the scalar series converges finitely if and only if it has a terminating index.
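As a hedged illustration (the coefficient sequence is chosen here for the sketch), finite convergence can be observed by computing the partial maxima σ_N directly:

```python
# Sketch: partial maxima sigma_N = max_{r <= N} (c_r + r*z) of a scalar
# max-algebraic power series. With c_r = -r**2, the convergence bound
# rho = liminf(-c_r/r) is +infinity, so every finite z converges finitely.

def sigma(N, z, c):
    """sigma_N = greatest of the terms c(r) (x) z^(r) = c(r) + r*z, r <= N."""
    return max(c(r) + r * z for r in range(N + 1))

c = lambda r: -r * r
z = 3.0
values = [sigma(N, z, c) for N in range(20)]
print(values)   # non-decreasing, and constant (2.0) from N = 1 on
```

The smallest N past which the partial maxima never change is a terminating index in the sense just defined; here N = 1 terminates, since the term at r = 1 already dominates all later terms.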
Reserved notation

The convergence bound ρ = ρ((c_r)) of the given scalar power series is defined by

ρ = lim inf_{r→∞} (−c_r / r).
In the definition of ρ, we can either regard c_r as restricted to the finite coefficients, or interpret −c_r as ε* = +∞ if c_r = ε.

Application VII.4

Consideration of the following three coefficient sequences shows that ρ may equal ε, ε*, or a finite number, even under the given conventions: (i) (c_r) = (r^2), (ii) (c_r) = (r), (iii) (c_r) = (−r^2).

Theorem VII.6.
If z < ρ, then the scalar power series converges finitely.
Proof. Select any finite coefficient c_j and let u = c_j ⊗ z^(j). Choose a finite α such that α > 0 and

z + α < ρ.    (VII.4)

Then, by the definition of ρ, for all sufficiently large r, −c_r/r > z + α,
i.e. (including any coefficients c_r equal to ε),

c_r + rz < −rα.    (VII.5)
Hence, the term-sequence (c_r ⊗ z^(r)) is ultimately dominated by a sequence which is decreasing linearly, and therefore for some N > j,

c_r ⊗ z^(r) < u    for all r ≥ N.
Clearly, N is a terminating index. ∎

In the light of this, our next result justifies the term convergence bound.

Theorem VII.7. If z > ρ, then
⊕_{r≤N} c_r ⊗ z^(r)

has no finite upper bound as N → ∞.
Proof. We shall show that no given finite number K can upper-bound the term-sequence (c_r ⊗ z^(r)). For we may choose ζ, η such that

z > ζ > η > ρ;    (VII.6)
so ζ, η are finite. Now choose M such that

M > K/(ζ − η).    (VII.7)
By the definition of ρ, there exists an index r > M such that

−c_r/r < η    (VII.8)
(so c_r is finite). Hence,

c_r ⊗ z^(r) = c_r + rz
  > c_r + rζ    (by Eq. (VII.6))
  = (c_r + rη) + r(ζ − η)
  > r(ζ − η)    (by Eq. (VII.8))
  > M(ζ − η)    (since r > M)
  > K    (by Eq. (VII.7)). ∎

Notice that the convergence or unboundedness of the series is unresolved for z = ρ. This is a common situation in convergence theory generally, where behavior at the boundary necessitates more elaborate criteria, beyond our present scope.
3. Projection Matrices
Theorem VII.8. If A ∈ F_{n,n} has finitely soluble eigenproblem, then there exists an n × n finite matrix Q such that

Q^(2) = Q;    Q* = Q;    A ⊗ Q = λ(A) ⊗ Q;    ((λ(A))^(−1) ⊗ A)^(r) ⊗ Q = Q    (r = 1, 2, ...).    (VII.9)
Proof. Let d be any finite eigenvector of A; define Q = d ⊗ d*. Then Q ⊗ Q = d ⊗ d* ⊗ d ⊗ d* = Q, since d* ⊗ d = 0. Also A ⊗ Q = A ⊗ d ⊗ d* = λ(A) ⊗ d ⊗ d* = λ(A) ⊗ Q. Now, q_ij = d_i − d_j, so Q is skew-symmetric and Q* = Q. Hence also Q = (d ⊗ d*)* = d ⊗′ d*. In Theorem III.3(ii), writing E = (λ(A))^(−1) ⊗ A gives E^(r) ⊗ Q = (E^(r) ⊗ d) ⊗ d* = d ⊗ d* = Q, establishing the last part of Eq. (VII.9). ∎

Application VII.5

For any vector x,

Q ⊗ x = d ⊗ d* ⊗ x = α ⊗ d,
where the scalar α = d* ⊗ x. In other words, the action of the matrix Q of Theorem VII.8 on any vector is to map it into a multiple of d. Hence the name projection matrix.

Application VII.6

Since d is also an eigenvector of

⊕_{r=1}^{N} c_r ⊗ A^(r),

with eigenvalue

⊕_{r=1}^{N} c_r ⊗ (λ(A))^(r),

the argument of Theorem VII.8 shows that

(⊕_{r=1}^{N} c_r ⊗ A^(r)) ⊗ Q = (⊕_{r=1}^{N} c_r ⊗ (λ(A))^(r)) ⊗ Q.
4. Matrix Power Series
For the coefficients of a matrix power series, we make the same conventions as for a scalar power series, and we use the terminology finitely convergent and terminating index in the obvious analogous way.

Theorem VII.9. If the eigenproblem for A ∈ F_{n,n} is finitely soluble, then the matrix power series ⊕_{r≤N} c_r ⊗ A^(r) converges to some matrix B ∈ F_{n,n}, or is without bound in at least one element-position, as N → ∞, exactly as the scalar power series ⊕_{r≤N} c_r ⊗ (λ(A))^(r) converges to a finite limit or is without bound.
Proof. Under the operator ⊕, each element-position of the matrix series will be monotone non-decreasing; hence, by the monotone convergence theorem, each is either bounded and convergent or increases without bound. However, from Application VII.6,

(⊕_{r≤N} c_r ⊗ A^(r)) ⊗ Q = (⊕_{r≤N} c_r ⊗ (λ(A))^(r)) ⊗ Q,

and in every element-position, the RHS is some finite multiple of the scalar series. Hence, if the scalar series converges and is therefore bounded, the LHS is bounded and therefore convergent. Moreover, since the LHS lies in F_{n,n} for all N, is non-decreasing, and is finitely bounded, it is clear that the limit matrix B ∈ F_{n,n}. On the other hand, if the scalar series increases without bound, so does the RHS in every element-position; and since the maximum in the max-algebraic product on the LHS is attained somewhere on each row, so does the LHS in at least one element-position (per row). ∎

We are now in a position to show that convergence of a matrix power series depends essentially only on the size of the eigenvalue.

Theorem VII.10. If the eigenproblem for A ∈ F_{n,n} is finitely soluble, then as N → ∞, the matrix power series ⊕_{r≤N} c_r ⊗ A^(r) converges finitely to some matrix B ∈ F_{n,n} if λ(A) < ρ((c_r)), and is without bound in at least one element-position per row if λ(A) > ρ((c_r)).
Proof. In view of Theorems VII.6, VII.7, and VII.9, we have only to show that convergence is finite when λ(A) < ρ. Certainly, this is true in any element-position which stays constantly equal to ε. Otherwise, some finite value u will occur. But the matrix term-sequence (c_r ⊗ A^(r)) is dominated by (c_r ⊗ (λ(A))^(r) ⊗ Q), and the proof of Theorem VII.6 shows that this ultimately decreases linearly in every element-position, falling below any such chosen finite u and giving a terminating index valid for all element-positions. ∎
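The projection matrix of Theorem VII.8 is easy to experiment with numerically. In this sketch (the vector d is arbitrary illustrative data), Q = d ⊗ d* with entries q_ij = d_i − d_j is idempotent and maps any vector onto a max-algebraic multiple of d:

```python
# Sketch of the projection matrix Q = d (x) d* of Theorem VII.8,
# for a finite vector d (made-up data).

def maxplus_matmul(A, B):
    """Max-algebraic matrix product: (A (x) B)_ij = max_k (a_ik + b_kj)."""
    return [[max(a + b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

d = [2, 5, 1]
Q = [[di - dj for dj in d] for di in d]

# Idempotence: Q (x) Q = Q
print(maxplus_matmul(Q, Q) == Q)            # -> True

# Projection: Q (x) x = alpha (x) d, with alpha = d* (x) x = max_j (x_j - d_j)
x = [7, 3, 4]
Qx = [max(q + xj for q, xj in zip(row, x)) for row in Q]
alpha = max(xj - dj for xj, dj in zip(x, d))
print(Qx == [alpha + di for di in d])       # -> True
```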
VIII. MAXPOLYNOMIALS

A. Siting a Service Facility

1. The Local Absolute Center
The plan for a new mountain holiday-home development foresees five principal housing districts connected by roads, as shown in Fig. 19. A fire-station to serve all five districts will be built somewhere on the two-way road along the ridge between N_1 and N_2; there is no further constraint on the site to be selected. The developers wish to choose the site to give the promptest possible service, and in the provision of such emergency services a minimax criterion is often adopted: each potential site is evaluated in terms of the time taken to reach the furthest potential demand point, and a site is then chosen for which that time is least. We may suppose that the transit time for a fire-appliance in each direction along each of the roads is known. If D is the matrix corresponding to the completion of the given graph, weighted in the dual weighting system using these transit times, then we may compute the strong (min-algebraic) transitive closure (γ_ij) of D, using the dual form of the Floyd-Warshall algorithm. Clearly, γ_ij gives the shortest time to reach N_j from N_i through the road network. Suppose we consider a site which is at a distance z from N_1. To reach another district N_j from that site, the fire-appliance must go either via N_1, taking z + γ_1j units of time, or via N_2, taking w((N_1, N_2)) − z + γ_2j units of time. Since it will always take the quicker route, it is the smaller of these quantities which is relevant, and therefore the greatest time to reach any
FIGURE 19. A facility-location problem.
potential demand point from a point z units along (N_1, N_2) is
t_12(z) = max_j (min(p_j + z, q_j − z)),    (VIII.1)
where p_j = γ_1j and q_j = w((N_1, N_2)) + γ_2j. A point on (N_1, N_2) for which t_12 is minimized is known as a local absolute center. (More generally, by considering each arc in turn, and choosing one for which the analogous minimum value of t_ij is least, we would find an absolute center.)

Application VIII.1
For the graph of Fig. 19, the function t_12 is

max[z, 7 − z, min(z + 4, 10 − z), min(z + 2, 8 − z), min(z + 3, 11 − z)].
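A brute-force numerical check of this function (a sketch using a simple grid search, rather than the algebraic methods developed below) already locates a best site:

```python
# Evaluate t12(z) from Application VIII.1 on a grid and find where it is least.

def t12(z):
    return max(z, 7 - z,
               min(z + 4, 10 - z),
               min(z + 2, 8 - z),
               min(z + 3, 11 - z))

zs = [k / 10 for k in range(71)]     # z = 0.0, 0.1, ..., 7.0 along the ridge road
best = min(zs, key=t12)
print(best, t12(best))               # -> 1.5 5.5
```

(The grid happens to hit the exact minimizer; `min` with `key=` returns the first grid point achieving the least value, and z = 5.5 achieves the same value 5.5.)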
2. Reformulation

The max-algebraic convention that x^(r) denotes the ordinary arithmetical product rx will now be adopted generally, whether r is an integer or not. (Thus, for example, x^(1/2) = x/2, a fact used in Section IX.) In particular, x^(−1) denotes −x. If P, Q are any algebraic expressions, then P − Q is denoted by P ⊗ Q^(−1). In this context, it is convenient to introduce double fraction bars to suggest a quotient: P//Q = P ⊗ Q^(−1). Thus,
z ⊕ 7//z = max(z, 7 − z).

It is natural to ask whether such an expression can be "rationalized" as in ordinary elementary algebra, to give

(7 ⊕ z^(2)) // z,
and it is easily verified that this may be done, since max(2z, 7) − z = max(z, 7 − z). This is a simple consequence of the distributive law for max algebra. The function t_12, defined earlier, contains a mixture of the operators min and max. Because the problem under consideration is no longer formally linear, it is not helpful to make use here of the duality between max and min algebra; it is better to work entirely within one or the other. We shall proceed in max algebra. The smaller of any two quantities results if we subtract the greater from their arithmetical sum. This enables us to write the operator min rationally in max algebra:

min(x, y) = (x ⊗ y) // (x ⊕ y).    (VIII.2)
Application VIII.2

Following Application VIII.1, the function t_12 can be written

(7 ⊕ z^(2))//z ⊕ (10 ⊗ z)//(6 ⊕ z^(2)) ⊕ (8 ⊗ z)//(6 ⊕ z^(2)) ⊕ (11 ⊗ z)//(8 ⊕ z^(2)).

For example, min(z + 4, 10 − z) can be written using Eq. (VIII.2) as

(4 ⊗ z ⊗ 10 ⊗ z^(−1)) // ((10 ⊗ z^(−1)) ⊕ (4 ⊗ z)).

On multiplying both numerator and denominator by z//4, this tidies up to

(10 ⊗ z) // (6 ⊕ z^(2)),

which is the second term in the proposed expression. Notice that it dominates the third term, exceeding it by 2 for all values of z. To simplify the subsequent work, therefore, drop the third term and obtain for t_12

(7 ⊕ z^(2))//z ⊕ (10 ⊗ z)//(6 ⊕ z^(2)) ⊕ (11 ⊗ z)//(8 ⊕ z^(2)).
Application VIII.3

The expression for t_12 can be rationalized further. Take the second and third terms and set them over a common denominator of (6 ⊕ z^(2)) ⊗ (8 ⊕ z^(2)). Working exactly as in conventional elementary algebra, the numerator will be

(10 ⊗ z) ⊗ (8 ⊕ z^(2)) ⊕ (11 ⊗ z) ⊗ (6 ⊕ z^(2))
= (18 ⊕ 17) ⊗ z ⊕ (10 ⊕ 11) ⊗ z^(3)
= 18 ⊗ z ⊕ 11 ⊗ z^(3).
Thus, t_12 becomes

(7 ⊕ z^(2))//z ⊕ (18 ⊗ z ⊕ 11 ⊗ z^(3)) // ((6 ⊕ z^(2)) ⊗ (8 ⊕ z^(2))).

The rationalization can be completed to obtain

(21 ⊕ 18 ⊗ z^(2) ⊕ 11 ⊗ z^(4) ⊕ z^(6)) // (z ⊗ (6 ⊕ z^(2)) ⊗ (8 ⊕ z^(2))).
Finally, the principle of exponentiation gives

t_12(z) = (21 ⊕ 18 ⊗ z^(2) ⊕ 11 ⊗ z^(4) ⊕ z^(6)) // (z ⊗ (3 ⊕ z)^(2) ⊗ (4 ⊕ z)^(2)).
The production of the preceding formula for t_12 raises a number of questions, which will be addressed in the following sections, following Cuninghame-Green and Meijer (1980): How efficiently can such algebraic manipulations be carried out? What is the relationship between the kinds of expression occurring in the numerator and denominator? How can the minima and maxima of a function such as t_12 be found?

B. Maxpolynomials

1. Adding by Merging
An expression of the form ⊕_r c_r ⊗ z^(j_r), such as occurs in the numerator of the foregoing formula for t_12, will be called a maxpolynomial. The name arises from the similarity of form and property between such expressions and the familiar polynomials of elementary algebra. However, there is no presupposition that the exponents of z constitute a set of consecutive nonnegative integers. We allow negative and fractional exponents, as in

6 ⊗ z^(−3.5) ⊕ (−4.3) ⊗ z ⊕ (−2) ⊗ z^(9.9),
but to avoid certain technical problems, we assume that all coefficients are finite. A maxpolynomial will normally be presented with its terms in increasing order of exponent, as in all the foregoing examples. Thus, a maxpolynomial is essentially a list of pairs

((c_1, j_1), ..., (c_{p+1}, j_{p+1})),

where (if p > 0) j_1 < ... < j_{p+1}. We call j_{p+1} the degree of the maxpolynomial and p + 1 the length of the list or of the maxpolynomial. The top pair of the list is (c_{p+1}, j_{p+1}), and the top term of the maxpolynomial is c_{p+1} ⊗ z^(j_{p+1}); the bottom pair and bottom term are similarly (c_1, j_1) and c_1 ⊗ z^(j_1). Thus, the immediately preceding maxpolynomial is of degree 9.9 and could be presented as a list of length three, with top pair (−2, 9.9) and with bottom pair (6, −3.5). A maxpolynomial in which the coefficient c_{p+1} in the top term, and the exponent j_1 in the bottom term, are both zero will be called standard.
Thus, the numerator of t_12 in Application VIII.3 is a standard maxpolynomial in z. By reducing the list, we shall mean deleting the top pair. If p = 0, we thereby produce an empty list; otherwise, we produce a list of length p, with top pair (c_p, j_p). By extending the list, we shall mean appending a new bottom pair. If the list was empty, we thereby create a list of length one; otherwise, we turn a list of length p + 1 into a list of length p + 2. Adding two maxpolynomials may be thought of as a process of merging the two corresponding lists, i.e., producing a new list into which the pairs from both lists have been incorporated, in increasing order of exponent. Thus, forming

[3 ⊕ 2 ⊗ z^(3) ⊕ (−1) ⊗ z^(5)] ⊕ [(−2) ⊗ z^(−2) ⊕ z^(2)]
= (−2) ⊗ z^(−2) ⊕ 3 ⊕ z^(2) ⊕ 2 ⊗ z^(3) ⊕ (−1) ⊗ z^(5)

may be regarded as merging the two lists

((3, 0), (2, 3), (−1, 5))    and    ((−2, −2), (0, 2)).
Of course, if the same exponent occurs in both lists, then it is the pair with the greater coefficient which is carried forward.

Application VIII.4

Consider an algorithm called Merge(L′, L″, ∘), for merging two given lists of pairs L′, L″ into a single list L. The symbol ∘ represents some algebraic operation such as ⊕, ⊗, or +. At each step, the algorithm compares the top pairs, (a′, p′) of L′ and (a″, p″) of L″. If p′ and p″ are unequal, then the pair with the greater value of p is used to extend the list L, and the appropriate one of L′, L″ is reduced. If p′ = p″, then L′, L″ are both reduced and the list L is extended using the pair (a′ ∘ a″, p′). When one of L′, L″ becomes empty, any remaining pairs in the other list are carried forward in the obvious way. In applying the algorithm Merge just described, it is clear that the number of steps equals the number of times the list L is extended, and that this is at most the total number of pairs in the two lists. Hence the following result.
Theorem VIII.1. The sum of two maxpolynomials, of total length l, may be computed in O(l) steps by the algorithm Merge. ∎
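A runnable sketch of Merge, with the operation taken as max (the case needed for adding maxpolynomials), follows. Lists hold (coefficient, exponent) pairs in increasing exponent order; we scan from the top (greatest exponent) downward and reverse the accumulated output at the end, which is equivalent to repeatedly extending at the bottom:

```python
# Sketch of the Merge algorithm of Application VIII.4.

def merge(L1, L2, op=max):
    L1, L2, out = list(L1), list(L2), []
    while L1 and L2:
        (a1, p1), (a2, p2) = L1[-1], L2[-1]     # the two top pairs
        if p1 > p2:
            out.append(L1.pop())                # extend L, reduce L1
        elif p2 > p1:
            out.append(L2.pop())
        else:                                   # equal exponents: combine coefficients
            L1.pop(); L2.pop()
            out.append((op(a1, a2), p1))
    return (L1 or L2) + out[::-1]               # carry the remainder forward

# The worked example: [3 + 2z^(3) + (-1)z^(5)] merged with [(-2)z^(-2) + z^(2)]
print(merge([(3, 0), (2, 3), (-1, 5)], [(-2, -2), (0, 2)]))
# -> [(-2, -2), (3, 0), (0, 2), (2, 3), (-1, 5)]
```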
2. Multiplying by Merging

By analogy with the summation symbol ⊕, we introduce the following.

Reserved notation

The product symbol ∏⊗ denotes iterated use of the operation ⊗, e.g.:

∏⊗_{j=1}^{p} t_j = t_1 ⊗ ... ⊗ t_p.
An expression of the form

∏⊗_r (β_r ⊕ z)^(e_r)    (VIII.3)

will be called a product form; the elements β_r are taken from the primal weighting system W and, when finite, will be called corners, for reasons which will emerge later. We allow the possibility that β_1 may have the value ε, so that, for example, the denominator of the expression for t_12 at the end of Application VIII.3 is a product form:

(ε ⊕ z) ⊗ (3 ⊕ z)^(2) ⊗ (4 ⊕ z)^(2).
Application VIII.5

In conventional notation, the expression in Eq. (VIII.3) would be Σ_r e_r max(β_r, z). Thus, the preceding denominator for t_12 is

z + max(6, 2z) + max(8, 2z).

This is identical with the function

z + 14    for z ≤ 3,
3z + 8    for 3 ≤ z ≤ 4,
5z        for z ≥ 4,
and is therefore a piecewise linear function. The exponents in a product form may be positive or negative finite numbers and need not be integers. We shall, however, suppose that a product form is normally presented with its factors in increasing order of β_r, as in the foregoing example. Thus, a product form is essentially a list of pairs

((e_1, β_1), ..., (e_p, β_p)),

where (if p > 1), β_1 < ... < β_p.
It is then clear that the product of two product forms can be calculated by merging the two corresponding lists L′, L″ according to ascending values of β_r, which can be achieved using a suitably modified form of the algorithm Merge(L′, L″, ⊗) from Application VIII.4.
Application VIII.6

Using the algorithm Merge, we find

[z^(3) ⊗ (−1 ⊕ z)^(−2) ⊗ (2 ⊕ z)] ⊗ [z^(−2) ⊗ (−1 ⊕ z)^(2) ⊗ (3 ⊕ z)^(2)]

in the form

z^(3−2) ⊗ (−1 ⊕ z)^(−2+2) ⊗ (2 ⊕ z) ⊗ (3 ⊕ z)^(2) = z ⊗ (2 ⊕ z) ⊗ (3 ⊕ z)^(2).
Theorem VIII.2. The product of two product forms, with a total number I of factors, may be computed in O(1)steps by the algorithm Merge. There is of course nothing peculiar to max algebra about Theorems VIII.1 and VIII.2. The validity of the merge algorithm depends essentially on the laws of associativity and commutativity. C. Extrema of Product Forms
1. Global Behavior As in Application VIII.5, it is a simple matter to determine how any product form behaves, as a function of z. A factor of the form ( E @ z)@)is the linear ) function ez, of slope e ; a factor of the form ( p @ z ) ( ~ with finite is constant for z I/3 and behaves like the linear function ez, of slope e, for z > 8. Hence, the expression in Eq. (V111.3), for any value of z, is the arithmetical sum of some constants and some linear functions, specifically
∏⊗_r (β_r ⊕ z)^(e_r) = b + fz,
92
R. A. CUNINGHAME-GREEN
where

b = Σ_{β_r ≥ z} e_r β_r,    f = Σ_{β_r < z} e_r.    (VIII.4)

As z increases from ε through finite values, the mixes of constants contributing to b and f change only when z passes through a corner. For very small values of z, the slope of the function is zero if β_1 is finite; otherwise it equals e_1. At a corner β_r, the slope changes by e_r; between corners, the function has constant slope; for very large values of z, the slope equals the arithmetical sum of all exponents. Hence the function is piecewise linear and, being the arithmetical sum of continuous functions, is continuous. These ideas are illustrated in Fig. 20 (a graph with corners β_2, β_3, β_4 marked on the z-axis), and explain the choice of the word corner.
2. Local Extrema

At corners where the slope of a product form changes sign, a local minimum or a local maximum occurs.

Application VIII.7
Consider

z^(−1) ⊗ (1.5 ⊕ z)^(2) ⊗ (3 ⊕ z)^(−2) ⊗ (3.5 ⊕ z)^(2) ⊗ (4 ⊕ z)^(−2) ⊗ (5.5 ⊕ z)^(2).    (VIII.5)

This function has slope −1 for z < 1.5; at z = 1.5, the slope changes to −1 + 2 = 1; at z = 3, to 1 − 2 = −1; and so on.
FIGURE 21. Local minima and maxima.
We find that the function has local minima at z = 1.5, 3.5, and 5.5, and local maxima at z = 3 and z = 4. (The function is depicted in Fig. 21.) Generalizing on the foregoing example, it is clear that the local minimizers and maximizers of a product form may be found by making a pass through the list of pairs (e_r, β_r) in sequence and taking a cumulative sum of the exponents e_r. For indices r where that sum changes sign, the corner β_r is a local minimizer or maximizer as appropriate. This establishes the following result.

Theorem VIII.3. The local minimizers and maximizers of a product form with p factors may be found in O(p) steps. ∎

Application VIII.8
In fact, local minima and maxima of a piecewise-linear function occur not only at a sign-change of the slope, but also anywhere the slope is zero. The function

z ⊗ (1 ⊕ z)^(−1) ⊗ (2 ⊕ z) ⊗ (3 ⊕ z)^(−2),

shown in Fig. 22, has local minima for 1 < z ≤ 2, and local maxima for 1 ≤ z < 2 and at z = 3. Theorem VIII.3 remains true, however, because in the general case it is clear that the bounds of the intervals of minimizers and maximizers, as well as the isolated minimizers and maximizers, are all established by one pass through the list.
FIGURE 22. Flat local extrema.
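The one-pass procedure behind Theorem VIII.3 can be sketched as follows. (This version reports only corners where the cumulative slope strictly changes sign; the flat intervals of Application VIII.8 would need the small extension just described. Pairs are (exponent, corner), with float('-inf') standing for an ε corner.)

```python
# One-pass local-extrema finder for a product form given as a list of
# (exponent, corner) pairs in increasing corner order.

def local_extrema(pairs):
    minima, maxima = [], []
    # slope to the left of all finite corners: sum of exponents of epsilon factors
    slope = sum(e for e, beta in pairs if beta == float('-inf'))
    for e, beta in pairs:
        if beta == float('-inf'):
            continue
        new_slope = slope + e
        if slope < 0 < new_slope:       # slope goes - to +: local minimizer
            minima.append(beta)
        elif slope > 0 > new_slope:     # slope goes + to -: local maximizer
            maxima.append(beta)
        slope = new_slope
    return minima, maxima

# The product form of Eq. (VIII.5):
pairs = [(-1, float('-inf')), (2, 1.5), (-2, 3), (2, 3.5), (-2, 4), (2, 5.5)]
print(local_extrema(pairs))   # -> ([1.5, 3.5, 5.5], [3, 4])
```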
D. Evolution

1. Multiplying Out
In the facility-location problem, it is now clear that we could find a local absolute center on (N_1, N_2) if we could express the function t_12(z) in product form and so determine its local minima. In fact, the product form considered in Eq. (VIII.5) is exactly such a representation of t_12. We establish this next, deferring until the following section the question of how this form was actually discovered. By segregating the factors with positive exponents from those with negative, the function in Eq. (VIII.5) may be written as a max-algebraic quotient:

((1.5 ⊕ z)^(2) ⊗ (3.5 ⊕ z)^(2) ⊗ (5.5 ⊕ z)^(2)) // (z ⊗ (3 ⊕ z)^(2) ⊗ (4 ⊕ z)^(2)).    (VIII.6)

The denominator is recognizable as that of t_12 at the end of Application VIII.3; we shall show that the numerators are also equal. In fact, it is clear that a product form with positive exponents may always be multiplied out using the rules of elementary algebra. This process is called evolution. Since the numerator in the foregoing expression is the square of
0 z) 0 (3.5 0 z ) 0 (5.5 0 z),
(VIII.7)
first consider the evolution of a general three-factor product form:
(P, 0 z ) 0 ( P 2 0 z) 0 (P3 0 z), in which PI, p2, p3 are finite.
By analogy with conventional algebra, we know that this will multiply out into a standard maxpolynomial of degree three, with

constant term: β_1 ⊗ β_2 ⊗ β_3,
coefficient of z: β_1 ⊗ β_2 ⊕ β_2 ⊗ β_3 ⊕ β_1 ⊗ β_3,

and in general, coefficients equal to the sum of r-at-a-time products of β's. However, since ⊕ denotes max, it is clear that the sum of r-at-a-time products is just the product of the greatest r of the β's. Since our convention is that β_1 < β_2 < ..., we find for the evolution of the three-factor form:
β_1 ⊗ β_2 ⊗ β_3 ⊕ β_2 ⊗ β_3 ⊗ z ⊕ β_3 ⊗ z^(2) ⊕ z^(3).    (VIII.8)

Application VIII.9

The evolution of Eq. (VIII.7) is

10.5 ⊕ 9 ⊗ z ⊕ 5.5 ⊗ z^(2) ⊕ z^(3).
Squaring this, using the principle of exponentiation, and comparing with Application VIII.3, shows that Eq. (VIII.6) does indeed represent t_12. From Application VIII.7 it follows that candidates for the local absolute center occur at z = 1.5, 3.5, and 5.5. A product form like that just considered, in which all exponents equal 1 and all β's are finite, will be called simple. The preceding discussion for the three-factor case generalizes trivially, to show that the coefficient of z^(j) in the standard maxpolynomial evolution of a p-factor simple product form will be β_{j+1} ⊗ ... ⊗ β_p (j < p). We can form the constant term with one pass through the list to accumulate the arithmetical sum of the β's, and then generate all the maxpolynomial coefficients with one more pass which arithmetically subtracts the β's one at a time, smallest first. Hence:
Theorem VIII.4. The evolution of a simple product form having p factors may be achieved in O(p) steps. ∎
2. Evolution in General

The algorithm for the evolution of a simple product form may be described in the following way. Form the constant term as the product of the β's; then, for j = 1, ..., p, form the term involving z^(j) by replacing β_j by z in the term involving z^(j−1), as illustrated in Eq. (VIII.8).
This motivates the following possible algorithm Evolution for a general product form. From its list

((e_1, β_1), ..., (e_p, β_p)),

form the first (constant) term β_1^(e_1) ⊗ ... ⊗ β_p^(e_p); then, for j = 2, ..., p + 1, form the jth term by replacing β_{j−1}^(e_{j−1}) by z^(e_{j−1}) in the (j − 1)st term.
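A direct sketch of this algorithm, transforming a product-form list of (exponent, corner) pairs into a list of (coefficient, exponent) pairs: the rth output pair arithmetically sums e_s·β_s over the factors not yet replaced by z, and e_s over those already replaced.

```python
# Sketch of the Evolution algorithm for a product-form list.

def evolution(pairs):
    out = []
    for r in range(len(pairs) + 1):
        c = sum(e * beta for e, beta in pairs[r:])   # remaining constant part
        j = sum(e for e, beta in pairs[:r])          # total exponent of z so far
        out.append((c, j))
    return out

# Application VIII.10: (1 (+) z) (x) (3 (+) z)^(2)
print(evolution([(1, 1), (2, 3)]))     # -> [(7, 0), (6, 1), (0, 3)]
# Application VIII.11: exponents come out of order for a non-standard form
print(evolution([(-1, 1), (2, 2)]))    # -> [(3, 0), (4, -1), (0, 1)]
```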
Application VIII.10

Applying the algorithm Evolution to the product form

(1 ⊕ z) ⊗ (3 ⊕ z)^(2),

with list ((1, 1), (2, 3)), we generate: constant term 1 ⊗ 3^(2) = 7; next term z ⊗ 3^(2) = 6 ⊗ z; top term z ⊗ z^(2) = z^(3). This gives a proposed evolution as a standard maxpolynomial

7 ⊕ 6 ⊗ z ⊕ z^(3),

with list ((7, 0), (6, 1), (0, 3)).

Application VIII.11
Consider

(1 ⊕ z)^(−1) ⊗ (2 ⊕ z)^(2),

with list ((−1, 1), (2, 2)). The algorithm Evolution generates in turn

1^(−1) ⊗ 2^(2);    z^(−1) ⊗ 2^(2);    z^(−1) ⊗ z^(2),

corresponding to a list

((3, 0), (4, −1), (0, 1)),

but this is not the list of a maxpolynomial, because the exponents 0, −1, 1 are not in increasing sequence. This difficulty motivates the following definition.
A product form in which all the β's are finite and all the exponents are positive will be called standard.

Theorem VIII.5. The algorithm Evolution converts any standard product form having p factors into a standard maxpolynomial of length p + 1, in O(p) steps. The product form and the maxpolynomial represent the same function of z.
Proof. It is clear that the algorithm is essentially a procedure for transforming a product-form list

(..., (e_r, β_r), ...)

into a maxpolynomial list

(..., (c_r, j_r), ...)

using the formula

c_r = Σ_{s≥r} e_s β_s,    j_r = Σ_{s<r} e_s

(arithmetical summations, an empty summation counting as zero). It may obviously be carried out in O(p) steps. If p = 1, it is readily verified that the procedure is correct. For p > 1, it is clear that the list generated is that of a legal maxpolynomial, since the exponents j_r increase with r because all the exponents e_s are positive. And the maxpolynomial is standard, since the arithmetical coefficient-summation is vacuous in the top term and the arithmetical exponent-summation is vacuous in the bottom term. Moreover, consider the behavior of the proposed maxpolynomial for z in the range β_{r−1} < z ≤ β_r (1 < r ≤ p). The value is determined by the term

c_r ⊗ z^(j_r) = Σ_{s≥r} e_s β_s + (Σ_{s<r} e_s) z,

because any other term either has some z's replaced by some β's (β_s ≤ β_{r−1} < z), or some β's (β_s ≥ β_r ≥ z) replaced by z's, and will therefore not be greater. Hence, for this range of z, the function has constant slope Σ_{s<r} e_s. Similar arguments show that for z > β_p the function has slope Σ_{s≤p} e_s, and for z ≤ β_1 it has slope zero. So the function is piecewise linear and is continuous, being compounded of continuous functions. Comparison with the description at the end of Section VIII.C.1 shows that the product form and the proposed maxpolynomial have equal slopes
everywhere (except at corners, where no slope is defined). Hence, they cannot differ by more than a constant. But if we evaluate them at, say, z = β_p, they have the same value. Hence, they are equal everywhere. ∎
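The list transformation that Evolution performs is compact enough to sketch directly; the following Python sketch is ours (function name and list layout chosen for illustration only):

```python
def evolve(factors):
    """Evolution: product-form list [(e_r, beta_r)] (betas strictly increasing,
    all e_r positive) -> standard maxpolynomial list [(c_r, j_r)].

    The bottom term has the full coefficient summation sum(e_s * beta_s) and
    vacuous exponent 0; each later term drops one e_s * beta_s from the
    coefficient and adds e_s to the exponent, so one pass (O(p) steps) suffices.
    """
    c = sum(e * b for e, b in factors)  # coefficient of the bottom term
    j = 0                               # exponent of the bottom term
    terms = [(c, j)]
    for e, b in factors:
        c -= e * b
        j += e
        terms.append((c, j))
    return terms

# The product form (1 + z) (x) (3 + z)^(2) of Application VIII.10:
print(evolve([(1, 1), (2, 3)]))  # [(7, 0), (6, 1), (0, 3)]
```

Running the sketch on (1 ⊕ z) ⊗ (3 ⊕ z)^(2) reproduces the maxpolynomial 7 ⊕ 6 ⊗ z ⊕ z^(3).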
Application VIII.12
We may establish the identity

3 ⊗ z^(−1) ⊗ (1 ⊕ z) ⊗ (3 ⊕ z)^(2) = 10 ⊗ z^(−1) ⊕ 9 ⊕ 3 ⊗ z^(2)

by setting the factor 3 ⊗ z^(−1) to one side, making an evolution of the resulting standard product form, then multiplying the result through by 3 ⊗ z^(−1). This gives a method of evolution of more general product forms c ⊗ z^(j) ⊗ P(z), where c is constant, j has any value and P is a standard product form. The final multiplication involves O(p) steps, so the whole process remains O(p). Algorithmically, of course, it is not necessary to carry out a two-stage process: It suffices to start the algorithm with a bottom term c ⊗ z^(j) ⊗ β₁^(e₁) ⊗ ⋯ ⊗ β_p^(e_p) instead of β₁^(e₁) ⊗ ⋯ ⊗ β_p^(e_p).
By reference to the discussion of Eq. (VIII.4), it is easily seen how a product form may be evaluated for any given value of the variable z, by taking one pass through the list ((e_r, β_r)), accumulating arithmetical sums Σ e_r β_r and Σ e_r. This justifies the following result.
Theorem VIII.6. Evaluation of a p-factor product form for a given value of z may be achieved in O(p) steps. ∎

Application VIII.13
From Eq. (VIII.5), t₁₂(z) may be written as a product form with list

((−1, ε), (2, 1.5), (−2, 3), (2, 3.5), (−2, 4), (2, 5.5)).

Evaluation at the three local minimizers found in Application VIII.7 gives t₁₂(1.5) = 5.5; t₁₂(3.5) = 6.5; t₁₂(5.5) = 5.5. From this, it is clear that two local absolute centers exist on (N₁, N₂), namely at z = 1.5 and 5.5.
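The one-pass evaluation of Theorem VIII.6, and the figures just quoted, can be checked with a small sketch (ours; ε is represented by Python's -inf, for which max(ε, z) = z):

```python
EPS = float("-inf")  # the max-algebra zero element epsilon

def eval_product_form(factors, z):
    """Value at z of the product form prod_s (beta_s + z)^(e_s):
    one O(p) pass accumulating the arithmetical sum of e_s * max(beta_s, z)."""
    return sum(e * max(b, z) for e, b in factors)

# The product form for t12(z) from Application VIII.13:
t12 = [(-1, EPS), (2, 1.5), (-2, 3), (2, 3.5), (-2, 4), (2, 5.5)]
print([eval_product_form(t12, z) for z in (1.5, 3.5, 5.5)])  # [5.5, 6.5, 5.5]
```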
MINIMAX ALGEBRA AND APPLICATIONS
99
IX. EFFICIENT RATIONAL ALGEBRA

A. Resolution

1. Resolving Evolved Forms
In the analysis of the local absolute center problem, we were able to discuss local minima of t₁₂ because of the availability of a product form equal to the maxpolynomial numerator. Where did this come from? A complete technique clearly requires an algorithm which will turn a general maxpolynomial into a product form. Because of its resemblance to resolving a polynomial into linear factors, this process will be called resolution. It is easy to resolve a standard maxpolynomial which was itself produced by the evolution of a standard product form, because consecutive maxpolynomial terms then differ exactly in that one more β_r^(e_r) has been replaced by z^(e_r). So we may achieve the resolution by taking consecutive pairs of terms and cancelling common factors so that one term becomes just a power of z and the other just a constant. Thus, from two consecutive terms
β_r^(e_r) ⊗ ⋯ ⊗ β_p^(e_p) ⊗ z^(e₁+⋯+e_{r−1}) ⊕ β_{r+1}^(e_{r+1}) ⊗ ⋯ ⊗ β_p^(e_p) ⊗ z^(e₁+⋯+e_r)

in a standard maxpolynomial produced by applying Evolution to the standard product form

⊗_{r=1}^{p} (β_r ⊕ z)^(e_r),

delete common factors to find

β_r^(e_r) ⊕ z^(e_r), which is (β_r ⊕ z)^(e_r).

Application IX.1
From Application VIII.10, the standard maxpolynomial

7 ⊕ 6 ⊗ z ⊕ z^(3)

is the evolved form of a standard product form. From the lowest pair of terms, cancel 6 to get 1 ⊕ z. From the next pair, cancel z to get 6 ⊕ z^(2) = (3 ⊕ z)^(2), thus completing the resolution into the product form

(1 ⊕ z) ⊗ (3 ⊕ z)^(2),

from which it was evolved.
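The cancellation step is mechanical; a sketch (ours) using exact rationals for the corners:

```python
from fractions import Fraction

def resolve(terms):
    """Resolution: maxpolynomial list [(c_r, j_r)] -> candidate product-form
    list [(e_r, beta_r)], with e_r = j_{r+1} - j_r and
    beta_r = (c_r - c_{r+1}) / (j_{r+1} - j_r), as in Eq. (IX.1)."""
    return [(j2 - j1, Fraction(c1 - c2, j2 - j1))
            for (c1, j1), (c2, j2) in zip(terms, terms[1:])]

# 7 + 6 (x) z + z^(3) resolves back into (1 + z) (x) (3 + z)^(2):
print(resolve([(7, 0), (6, 1), (0, 3)]))
```

Applied to an evolved form this inverts Evolution; applied to an arbitrary maxpolynomial it merely proposes a product-form list, which need not be valid.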
Application IX.2
The procedure may be extended to express a given non-standard maxpolynomial in the form c ⊗ z^(j) ⊗ P(z), where P(z) is a standard product form. Consider

10 ⊗ z^(−1) ⊕ 9 ⊕ 3 ⊗ z^(2).

First remove a factor 3 ⊗ z^(−1) to produce a standard maxpolynomial:

7 ⊕ 6 ⊗ z ⊕ z^(3).

Resolving this as in Application IX.1, then re-introducing the factor 3 ⊗ z^(−1), gives the required expression

3 ⊗ z^(−1) ⊗ (1 ⊕ z) ⊗ (3 ⊕ z)^(2),
agreeing with Application VIII.12. The foregoing procedure can be embodied in an algorithm Resolution, formally applicable to any arbitrary standard maxpolynomial of length p + 1,

Σ⊕_{r=1}^{p+1} c_r ⊗ z^(j_r),

whether or not produced by a previous evolution. For r = 1, ..., p, take each pair of consecutive terms

c_r ⊗ z^(j_r) ⊕ c_{r+1} ⊗ z^(j_{r+1})

and cancel a common factor c_{r+1} ⊗ z^(j_r) to obtain

(c_r // c_{r+1}) ⊕ z^(j_{r+1} − j_r) = (β_r ⊕ z)^(e_r), where

e_r = j_{r+1} − j_r;  β_r = (c_r − c_{r+1})/(j_{r+1} − j_r),   (IX.1)

giving a proposed resolution

⊗_{r=1}^{p} (β_r ⊕ z)^(e_r).

Theorem IX.1. The algorithms Evolution and Resolution are mutually inverse.
Proof. Both algorithms are essentially list-transformations. Evolution maps (..., (e_r, β_r), ...) to (..., (Σ_{s≥r} e_s β_s, Σ_{s<r} e_s), ...). Resolution maps (..., (c_r, j_r), ...) to (..., (j_{r+1} − j_r, (c_r − c_{r+1})/(j_{r+1} − j_r)), ...). If we apply
Resolution to the outcome of Evolution, it maps (e_r, β_r) first to

(Σ_{s≥r} e_s β_s, Σ_{s<r} e_s)

and then to

(j_{r+1} − j_r, (c_r − c_{r+1})/(j_{r+1} − j_r)) = (e_r, e_r β_r / e_r),

which is exactly (e_r, β_r). Similarly, Evolution reverses Resolution. ∎

Application IX.3
The algorithm Resolution, applied to the standard maxpolynomial

3 ⊕ 1 ⊗ z ⊕ z^(2),

transforms the maxpolynomial list ((3, 0), (1, 1), (0, 2)) into ((1, 2), (1, 1)).
This is not the list of a valid product form, because the β's are not in increasing sequence. This difficulty, analogous to that discussed in Application VIII.11, motivates the following definition. A maxpolynomial Σ⊕_{r=1}^{p+1} c_r ⊗ z^(j_r) of length 1 or 2, or of length p + 1 > 2 with the arithmetical ratios

(c_{r−1} − c_r)/(j_r − j_{r−1})   (r = 2, ..., p + 1)

increasing strictly with respect to r, will be said to satisfy the concavity condition. The use of this particular term will be justified later in the section.

Theorem IX.2. For a standard maxpolynomial of length p + 1 which satisfies the concavity condition, the algorithm Resolution finds in O(p) steps a resolution as a standard product form with p factors. The product form and the maxpolynomial represent the same function of z.
Proof. From Eq. (IX.1), it is clear that the concavity condition ensures that the list produced is a valid product-form list, in having β_r increasing in r if p > 1. Moreover, because the given maxpolynomial has finite coefficients and increasing exponents j_r, the removal of common factors from each pair of consecutive terms will always give finite corners β_r and positive exponents e_r. Thus, a standard product form results.
If we now apply Evolution to this product form, then Theorem IX.1 says that the original maxpolynomial is recovered, and Theorem VIII.5 says that it represents the same function of z as the product form. Clearly, one pass through the list for the maxpolynomial suffices to implement Resolution, so the process is O(p). ∎

Application IX.4
The procedure used in Application IX.2, combined with Theorem IX.2, justifies the following conclusion: For any maxpolynomial of length p + 1 which satisfies the concavity condition, a resolution which represents the same function of z may be found in O(p) steps in the form c ⊗ z^(j) ⊗ P(z),
where P(z) is a standard product form.

2. Inessential Terms

If we modify (1 ⊕ z)^(2) using the principle of exponentiation, we get 2 ⊕ z^(2). On the other hand, if we carry out the multiplication, we find

(1 ⊕ z)^(2) = (1 ⊕ z) ⊗ (1 ⊕ z) = (1 ⊗ 1) ⊕ (1 ⊗ z) ⊕ (1 ⊗ z) ⊕ (z ⊗ z) = 2 ⊕ 1 ⊗ z ⊕ z^(2).

Although this is formally different from 2 ⊕ z^(2), it defines the same function of z, because the term 1 ⊗ z is dominated by the other two terms: by isotonicity,

if z ≤ 1 then 1 ⊗ z ≤ 1 ⊗ 1,
if z > 1 then 1 ⊗ z < z ⊗ z,

and so for all z,

1 ⊗ z ≤ (1 ⊗ 1) ⊕ (z ⊗ z) = 2 ⊕ z^(2).
Application IX.5
In conventional notation, the maxpolynomial

2 ⊕ 1 ⊗ z ⊕ z^(2) = max(2, 1 + z, 2z).

Figure 23 shows the three functions

y = 2,  y = 1 + z,  y = 2z.

Clearly, their upper envelope is not changed if the function 1 + z is removed.
FIGURE 23. An inessential term.
In a maxpolynomial

Σ⊕_{r=1}^{p+1} c_r ⊗ z^(j_r),

a term c_s ⊗ z^(j_s) (1 < s < p + 1) will be called inessential if, for all z,

c_s ⊗ z^(j_s) ≤ Σ⊕_{r≠s} c_r ⊗ z^(j_r).

Evidently an inessential term may simply be canceled from a maxpolynomial formula without changing the function defined by the formula. All other terms (in particular, the top and bottom terms) will be called essential. This property is related to the concavity condition, as the following result shows.
Theorem IX.3. If the term c_s ⊗ z^(j_s) (1 < s < p + 1) is essential in the maxpolynomial

Σ⊕_{r=1}^{p+1} c_r ⊗ z^(j_r),

then

(c_s − c_{s+1})/(j_{s+1} − j_s) > (c_{s−1} − c_s)/(j_s − j_{s−1}).
Proof. There is a value α for z for which this term is not dominated, so

c_s + j_s α > c_{s−1} + j_{s−1} α  and  c_s + j_s α > c_{s+1} + j_{s+1} α.

Recalling that the exponents j_r are increasing in r, we have

(c_s − c_{s+1})/(j_{s+1} − j_s) > α > (c_{s−1} − c_s)/(j_s − j_{s−1}). ∎
Application IX.6
By definition, the top and bottom terms of a maxpolynomial are classified as essential. This is justified by the readily confirmed fact that these terms are dominant for very large and very small values, respectively, of z.

Theorem IX.4. A standard maxpolynomial resulting from use of the algorithm Evolution has no inessential terms.

Proof. Let the maxpolynomial be of length p + 1. The only case of interest is for p + 1 > 2, so choose any term

β_r^(e_r) ⊗ ⋯ ⊗ β_p^(e_p) ⊗ z^(e₁+⋯+e_{r−1})   (1 < r < p + 1).

Because of the numerical ranking of the β's, it is clear that if we assign a value to z such that β_{r−1} < z < β_r, then this chosen term will have a value strictly greater than that of any other term

β_s^(e_s) ⊗ ⋯ ⊗ β_p^(e_p) ⊗ z^(e₁+⋯+e_{s−1})

with s > r or s < r, and is thus essential. ∎

Application IX.7
In Application VIII.9, the evolution is given of a product form. We easily confirm that this evolution has no inessential terms by finding the values of the terms for z = 2 and z = 4.

Theorem IX.5. A maxpolynomial has no inessential terms if and only if it satisfies the concavity condition.

Proof. Notice first that multiplying by or removing a common factor c ⊗ z^(j) does not affect either the concavity condition of a maxpolynomial or the essential/inessential status of any term. Hence, without loss of generality we may consider a standard maxpolynomial. Suppose the maxpolynomial satisfies the concavity condition. By Theorem IX.2, we may resolve it into a standard product form, to which we could then apply the algorithm Evolution. By Theorem IX.1, the original maxpolynomial would be recovered, so by Theorem IX.4 it has no inessential terms. The converse is already embodied in Theorem IX.3. ∎
Application IX.8
Checking the concavity condition shows that the following maxpolynomial has no inessential terms:

15 ⊗ z^(−4.5) ⊕ 11 ⊗ z^(−0.5) ⊕ 9 ⊗ z^(0.5) ⊕ 3 ⊗ z^(2.5).
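The concavity condition itself is a one-pass check on the list; a small sketch (ours):

```python
def satisfies_concavity(terms):
    """True if the ratios (c_r - c_{r+1}) / (j_{r+1} - j_r) increase
    strictly with r; maxpolynomials of length 1 or 2 pass trivially."""
    ratios = [(c1 - c2) / (j2 - j1)
              for (c1, j1), (c2, j2) in zip(terms, terms[1:])]
    return all(a < b for a, b in zip(ratios, ratios[1:]))

print(satisfies_concavity([(15, -4.5), (11, -0.5), (9, 0.5), (3, 2.5)]))  # True
print(satisfies_concavity([(3, 0), (1, 1), (0, 2)]))                      # False
```

The first list is the maxpolynomial of Application IX.8; the second is the maxpolynomial of Application IX.3, whose failed resolution motivated the definition.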
3. Rectification

It is clear from Theorem IX.3 that if a given maxpolynomial does not satisfy the concavity condition, then it contains an inessential term which may simply be canceled. The process may be repeated if necessary, but in a finite number of steps the concavity condition will be met (or p will be reduced to 1) and a maxpolynomial will result to which the algorithm Resolution may be validly applied. The removal of inessential terms will be called rectification. It is easy to devise an algorithm Rectify for this purpose. From the list L of a given maxpolynomial, create a new list L′ by first moving the top pair from L to L′. At each subsequent step until L is empty, move the top pair from L to become the bottom pair of L′, unless L′ has at least two entries and the bottom pair (c_s, j_s), say, of L′ satisfies

(c_s − c_t)/(j_t − j_s) ≤ (c_r − c_s)/(j_s − j_r),   (IX.2)

in which case delete this bottom pair from L′; the indices r and t refer to the pairs currently at the top of L and next-to-bottom in L′, respectively.
Application IX.9
Consider the standard maxpolynomial

12 ⊕ 10 ⊗ z^(2) ⊕ 4 ⊗ z^(4) ⊕ z^(6) ⊕ z^(7),

with list L: ((12, 0), (10, 2), (4, 4), (0, 6), (0, 7)). The algorithm Rectify begins by moving the top pair to L′, and then, since L′ does not contain at least two pairs, the next pair is moved to L′. Equation (IX.2) now holds because

(0 − 0)/(7 − 6) < (4 − 0)/(6 − 4),

so the bottom pair (0, 6) of L′ must be deleted. Since L′ now contains less than two pairs, the next pair is moved across and the lists now appear as

L: ((12, 0), (10, 2));  L′: ((4, 4), (0, 7)).

Proceeding, the algorithm further deletes (4, 4). The two remaining pairs are moved in turn from L to L′ without Eq. (IX.2) being again satisfied, and the list L′ finally appears as

((12, 0), (10, 2), (0, 7)),

defining a standard maxpolynomial

12 ⊕ 10 ⊗ z^(2) ⊕ z^(7),

with no inessential terms.
Theorem IX.6. Given a maxpolynomial of length p + 1, the algorithm Rectify transforms it in O(p) steps into a maxpolynomial representing the same function of z, having no inessential terms and satisfying the concavity condition. If the given maxpolynomial was standard, so is the resulting maxpolynomial.
Proof. The algorithm drops only inessential terms, so it does not change the function represented. It is clear that the maxpolynomial defined by L′ satisfies the concavity condition at every step, including the last, and therefore has no inessential terms. Each pair from the original maxpolynomial is moved once and possibly dropped once, so the process takes O(p) steps. Since neither the top nor the bottom term is dropped, standard maxpolynomials remain standard. ∎
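The transfer between L and L′ is, in effect, a concave-hull scan over the pairs (c_r, j_r); an equivalent stack-based Python sketch (ours):

```python
def rectify(terms):
    """Drop inessential terms from a maxpolynomial list [(c_r, j_r)] with
    j_r strictly increasing, leaving a list that satisfies the concavity
    condition.  Each pair is pushed once and popped at most once: O(p)."""
    out = []
    for c, j in terms:
        while len(out) >= 2:
            (ct, jt), (cs, js) = out[-2], out[-1]
            # keep (cs, js) only if the slope into it strictly exceeds the
            # slope out of it; otherwise the test of Eq. (IX.2) succeeds
            if (cs - ct) * (j - js) <= (c - cs) * (js - jt):
                out.pop()   # bottom pair of L' is inessential: delete it
            else:
                break
        out.append((c, j))
    return out

print(rectify([(12, 0), (10, 2), (4, 4), (0, 6), (0, 7)]))
# [(12, 0), (10, 2), (0, 7)]
```

Run on the list of Application IX.9 it deletes (0, 6) and (4, 4), exactly as in the worked example.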
B. Linear-Time Rational Calculation

1. Addition and Multiplication
Let P(z), Q(z) be maxpolynomials of length p, q, respectively. The results of this and the previous chapter justify the following conclusions. We can form a maxpolynomial equal to P ⊕ Q by merging, in O(p + q) steps. We can also rectify both P and Q, and then resolve both into product form (perhaps times a constant), again in a total of O(p + q) steps. These product forms may be multiplied by merging, and then evolved into a maxpolynomial, again in O(p + q) steps. Hence, both P ⊕ Q and P ⊗ Q may be computed in maxpolynomial form in a number of computational steps depending only linearly on the total length of P and Q. Let us now define the length of a rational function P(z)//Q(z) as the sum of the lengths of P and Q. Given two rational functions P//Q, R//S of lengths l, m, respectively, we can evidently form their product and sum

(P ⊗ R)//(Q ⊗ S),  (P ⊗ S ⊕ Q ⊗ R)//(Q ⊗ S),

again in a number of steps depending no more than linearly on l + m.
2. Minima, Maxima, and Roots

The minima and maxima of a rational function of length l may be found in O(l) steps by rectifying numerator and denominator, resolving into product forms, and merging into one product form. This was essentially how the local absolute center problem was addressed. Finding the solutions of a rational equation of the form

P//Q = R//S
is clearly the same as finding the zeros of (P ⊗ S)//(Q ⊗ R), and this is clearly the same as finding the minimizers of the absolute value |(P ⊗ S)//(Q ⊗ R)|. From the fact that |x| = max(x, −x) = x ⊕ x^(−1), we require the minimizers of

(P ⊗ S)//(Q ⊗ R) ⊕ (Q ⊗ R)//(P ⊗ S),

and so once again we may solve this problem in a number of steps depending at worst linearly on the total length of the expressions.

C. Convexity and Concavity

1. The Concavity Condition
A continuous function f(z), whose slope never decreases as z increases, as shown in Fig. 23, is usually called convex. In applying this term, we can allow the possibility that the slope may not be well defined at a finite number of places, as in the piecewise-linear function in Fig. 20. From the discussions in Section VIII of the global behavior of maxpolynomials, we know that they have constant slope for very large and very small values of z, and that the slope increases as z passes through a corner. Hence, maxpolynomials are piecewise-linear continuous convex functions. Similarly, a continuous function whose slope, defined at all but a finite number of points in the domain of definition of the function, never increases, is called concave. Suppose we take the list (..., (c_r, j_r), ...) of a maxpolynomial, plot points in the coordinate plane having ordinates c_r and abscissae j_r, and join consecutive points to give a piecewise-linear function. If the maxpolynomial has no inessential terms, then the concavity condition holds by Theorem IX.3, so

(c_{s+1} − c_s)/(j_{s+1} − j_s) < (c_s − c_{s−1})/(j_s − j_{s−1})   (2 ≤ s ≤ p),

which implies that the constructed function is concave; indeed, the slope decreases strictly at each plotted point. Hence the term concavity condition.

Application IX.10
Plotting the points

(−4.5, 15), (−0.5, 11), (0.5, 9), (2.5, 3)

and joining consecutive points gives a piecewise-linear continuous concave function, whose slope decreases strictly at each plotted point, consistent with the fact that the maxpolynomial in Application IX.8 has no inessential terms.
2. Diagonal Realization

Suppose a DES produces an orbit (x(r) = A^(r−1) ⊗ x(1)). Observation of the orbit is not possible directly, but only indirectly through a linear observation process represented by c ∈ F̄₁ₙ. Thus, what is observed is the sequence of numbers {g_r}, where

g_r = c ⊗ A^(r−1) ⊗ x(1).   (IX.3)

The numbers g_r are called Markov parameters. It is assumed that A, x(1), and c are unknown. The strong realization problem is to calculate possible A, x(1), and c so as to satisfy Eq. (IX.3) for an observed sequence {g_r} (Olsder, 1986). This is a stronger, and more difficult, version of the realization problem considered in Section III.B.3. Under conditions of convexity, however, it is relatively straightforward, as the following shows.

Application IX.11
At stages r = 1, 2, ..., a DES produces the following sequence {g_r} of Markov parameters:

31, 28, 27, 26, 26, 27, 29, 33, 37, ...,

being thereafter 4-periodic of period 1. Plotting the points (r, g_r) in the (z, y) coordinate plane and joining consecutive points discloses a piecewise-linear convex function y of z defined for z ≥ 1. It is not hard to calculate that y is the upper envelope of the four linear functions

34 − 3z,  30 − z,  21 + z,  1 + 4z,

or in max-algebraic notation,

y = 34 ⊗ z^(−3) ⊕ 30 ⊗ z^(−1) ⊕ 21 ⊗ z ⊕ 1 ⊗ z^(4).

This maxpolynomial gives a mathematical model which for values r of z (r = 1, 2, ...) represents g_r. In general, if a process is ultimately periodic of period 1, and the Markov parameters have the property that the first differences g_{r+1} − g_r never decrease with respect to r, then a piecewise-linear convex function may be fitted as in Application IX.11 and we may speak of a maxpolynomial realization. A maxpolynomial realization may be very simply converted to the form required by Eq. (IX.3), by using the fact that in max algebra j^(r) = r^(j). Thus, if

g_r = Σ⊕_{s=1}^{p+1} c_s ⊗ r^(j_s)
is a maxpolynomial realization of {g_r}, then we have

g_r = Σ⊕_{s=1}^{p+1} c_s ⊗ j_s^(r−1) ⊗ j_s,

whence

g_r = c ⊗ A^(r−1) ⊗ x(1),

where c = [c₁, ..., c_{p+1}], A = diag(j₁, ..., j_{p+1}), and x(1) = [j₁, ..., j_{p+1}]ᵀ.
In Cuninghame-Green and Butkovič (1993), conditions are given under which realizations constructed this way are minimal-dimensional.

Application IX.12

For the Markov parameter sequence of Application IX.11, a diagonal realization is found in the form

c = [34, 30, 21, 1],  A = diag(−3, −1, 1, 4),  x(1) = [−3, −1, 1, 4]ᵀ.
Application IX.13
If a Markov parameter sequence {g_r} has the property that the first differences are non-decreasing and are constant for r ≥ N, then one pass through g₁, ..., g_N will establish the values of the slopes and the indices for which the slopes change. In effect, this gives a list of corners and exponents for a product form, which may be converted to a maxpolynomial in O(N) steps. Hence, a diagonal realization may be constructed in O(N) steps from the given {g_r}.
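The diagonal construction can be checked numerically for Application IX.11 (a sketch, ours; ⊗-powers of the diagonal matrix reduce to the scalar sums shown):

```python
# Markov parameters of Application IX.11 (r = 1, ..., 9):
g = [31, 28, 27, 26, 26, 27, 29, 33, 37]

# (c_s, j_s) pairs of the maxpolynomial realization
# 34 (x) z^(-3) + 30 (x) z^(-1) + 21 (x) z + 1 (x) z^(4):
cs = [34, 30, 21, 1]
js = [-3, -1, 1, 4]

def markov(r):
    """c (x) A^(r-1) (x) x(1) with c = [c_1..c_4], A = diag(j_1..j_4) and
    x(1) = [j_1..j_4]^T: the maximum over s of c_s + (r - 1) j_s + j_s."""
    return max(c + (r - 1) * j + j for c, j in zip(cs, js))

print([markov(r) for r in range(1, 10)])  # reproduces g
```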
X. MISCELLANEOUS TOPICS

This monograph concludes by looking briefly at some topics showing the further scope of minimax algebra. The aim will be more to give an appreciation of the ideas than to give a rigorous justification.
A. Approximation and Residuation

1. Period-1 DES

Given a sequence {g_r} (r = 1, 2, ...) of Markov parameters which is ultimately (say for r ≥ N) λ-periodic of period 1, it is straightforward to find a product-form realization, that is an expression

f(z) = c ⊗ (1 ⊕ z)^(e₁) ⊗ ⋯ ⊗ (N ⊕ z)^(e_N),

where c is a constant, such that g_r = f(r) (r = 1, 2, ...). We may rewrite f(z) as

c + Σ_{s=1}^{N} e_s max(s, z),
so, in order for f and g to agree for z = 1, ..., N, the unknown exponents {e_s} must satisfy the linear equations
[ 1  2  3  ⋯  N ] [ e₁ ]   [ −c + g₁ ]
[ 2  2  3  ⋯  N ] [ e₂ ]   [ −c + g₂ ]
[ 3  3  3  ⋯  N ] [ e₃ ] = [ −c + g₃ ]
[ ⋮           ⋮ ] [ ⋮  ]   [    ⋮    ]
[ N  N  N  ⋯  N ] [ e_N ]  [ −c + g_N ],

the coefficient matrix having (r, s)th entry max(r, s).
It is not hard to invert the matrix to obtain

[ e₁ ]   [ −1   1   0  ⋯   0      0      ] [ −c + g₁ ]
[ e₂ ]   [  1  −2   1  ⋯   0      0      ] [ −c + g₂ ]
[ e₃ ] = [  0   1  −2  ⋯   0      0      ] [ −c + g₃ ]   (X.1)
[ ⋮  ]   [  ⋮   ⋮   ⋮  ⋱   ⋮      ⋮     ] [    ⋮    ]
[ e_N ]  [  0   0   0  ⋯   1  (1 − N)/N ] [ −c + g_N ],

rows 2 through N − 1 carrying the second-difference pattern 1, −2, 1.
A rational realization of a sequence {g_r} of Markov parameters is an expression f(z) = P(z)//Q(z), where P, Q are maxpolynomials such that

f(r) = g_r   (r = 1, 2, ...).
The foregoing analysis leads to the following conclusion. A Markov parameter sequence is ultimately λ-periodic of period one if and only if it has a rational realization. Since the equations for the required exponents may be solved explicitly, as in Eq. (X.1), the rational realization may be constructed in O(N) steps.

Application X.1
To find a rational realization of

0, 1, 0, 1, 2, 0, −2, −4, −6, ...,

observe that the last change of slope occurs at N = 5. From Eq. (X.1),

e₁ = 1,  e₂ = −2,  e₃ = 2,  e₄ = 0.

For large z, the slope of the function will be e₁ + ⋯ + e₅, which is 1 + e₅. But from the given sequence, this slope must be −2, so e₅ = −3. Equation (X.1) with N = 5 gives e₅ = (−3/5) − (c/5), so c = 12 and the product-form realization is

12 ⊗ (1 ⊕ z) ⊗ (2 ⊕ z)^(−2) ⊗ (3 ⊕ z)^(2) ⊗ (5 ⊕ z)^(−3).

The rational realization is

12 ⊗ (1 ⊕ z) ⊗ (3 ⊕ z)^(2) // (2 ⊕ z)^(2) ⊗ (5 ⊕ z)^(3).
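Because the inverse in Eq. (X.1) consists mostly of second-difference rows, the whole computation of Application X.1 takes only a few lines (a sketch, ours):

```python
g = [0, 1, 0, 1, 2, 0, -2, -4, -6]   # Markov parameters, r = 1, ..., 9
N = 5                                 # last change of slope

# Rows of the inverse in Eq. (X.1): e_1 = g_2 - g_1, and for 1 < r < N
# the second difference e_r = g_{r-1} - 2 g_r + g_{r+1}.
e = [g[1] - g[0]]
e += [g[r - 2] - 2 * g[r - 1] + g[r] for r in range(2, N)]

lam = g[-1] - g[-2]          # the ultimate slope (here -2)
e.append(lam - sum(e))       # e_N makes the total slope e_1 + ... + e_N = lam

c = g[N - 1] - N * sum(e)    # from f(N) = c + N (e_1 + ... + e_N) = g_N

def f(z):
    return c + sum(e_s * max(s, z) for s, e_s in enumerate(e, start=1))

print(e, c)                          # [1, -2, 2, 0, -3] 12
print([f(r) for r in range(1, 10)])  # reproduces g
```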
For a product-form realization to correspond to a maxpolynomial, all the exponents e_s must be non-negative, giving a product form which can be evolved. Equation (X.1) then implies that the first differences of the Markov parameters are non-decreasing, giving a convex function. The condition g₂ ≥ g₁ entailed in Eq. (X.1) limits the method as described to increasing sequences with a maxpolynomial realization, but this limitation can be avoided by allowing a factor (ε ⊕ z)^(e) in the product form.

2. General Approximation
Let h(z) be any given function. Take any arbitrary closed interval, say the unit interval [0, 1]. For any large integer N, define the N points
β_r = r/(N + 1), for r = 1, ..., N. As discussed earlier, we can construct a rational approximation f(z) to h(z) in the form

f(z) = c + ez + Σ_{r=1}^{N} e_r max(β_r, z).   (X.2)
The natural extension of the Chebyshev distance function ζ to real-valued functions is given by

ζ(f, h) = max_z |f(z) − h(z)|.   (X.3)

In Cuninghame-Green (1983), it is shown that for any continuous function h, we can always choose N large enough to make the value of ζ(f, h) arbitrarily small. Recalling that e_r represents the change of slope of f at z = β_r, we may let N → ∞ and obtain a (purely formal!) argument representing f(z) on [0, 1] in the form

c + ez + ∫₀¹ (t ⊕ z) f″(t) dt.   (X.4)
Application X.2
For any function f with continuous second derivative, Eq. (X.4) can be justified by splitting the integral into ∫₀ᶻ + ∫ᶻ¹ and evaluating the second integral by parts. We find c = f(1) − f′(1) and e = f′(0). For example, for the function z²,

z² = −1 + 2 ∫₀¹ (t ⊕ z) dt.
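The representation can be sanity-checked numerically (a sketch, ours; the midpoint rule stands in for the integral):

```python
# z^2 = c + e z + int_0^1 max(t, z) h''(t) dt with c = h(1) - h'(1) = -1,
# e = h'(0) = 0 and h''(t) = 2, checked by the midpoint rule:
def rhs(z, n=20000):
    dt = 1.0 / n
    integral = sum(2.0 * max((k + 0.5) * dt, z) for k in range(n)) * dt
    return -1.0 + 0.0 * z + integral

for z in (0.0, 0.3, 0.7, 1.0):
    print(z, round(rhs(z), 4))  # approximately z^2 at each point
```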
3. Generalized Matrices

In the rational approximations considered earlier, we fixed the corners β_r and calculated optimal exponents e_r. A maxpolynomial approximation may also be approached by fixing the exponents j_r and seeking the optimal coefficients c_r to approximate a given function h(z) by

Σ⊕_r c_r ⊗ z^(j_r).

Some light on this is provided by the observation that a maxpolynomial in effect maps a function c_r of the discrete variable r into a function P(z) of the continuous variable z. If we think of a generalized matrix M, whose (z, r)th element m_{z,r} is z^(j_r), we may write

P(z) = Σ⊕_r m_{z,r} ⊗ c_r,
achieving an analogue of matrix multiplication: P = M ⊗ c. The task of finding the coefficients to give a maxpolynomial approximation of a given function h(z) may therefore be regarded as finding the best solution of

M ⊗ c = h.

By analogy with Section II,D, it is natural to seek a formal solution c = M* ⊗′ h, that is,

c_r = min_z (−m_{z,r} + h(z)).   (X.5)
Since maxpolynomials are convex, they are likely to be most suitable for approximating convex functions. So now suppose that h(z) is twice differentiable and h″(z) > 0 for all z. This implies that h′ is strictly increasing in z, so h is convex, and also h′ possesses an inverse. The minimum in Eq. (X.5) is attained where the derivative of h(z) − j_r z vanishes, that is, where h′(z) = j_r, giving

z = h′⁻¹(j_r), and so c_r = h(h′⁻¹(j_r)) − j_r h′⁻¹(j_r).
Application X.3
For the convex function z², the preceding formula leads to c_r = −j_r²/4.
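Application X.3 can be confirmed both from the closed formula and from the residual (X.5) directly, minimizing over a fine grid (a sketch, ours):

```python
# For h(z) = z^2: h'(z) = 2z, so h'^{-1}(j) = j / 2 and
# c_r = h(j/2) - j * (j/2) = -j^2 / 4.
def c_formula(j):
    z = j / 2.0
    return z * z - j * z

# The residual c_r = min over z of (-j z + h(z)), taken over a grid:
def c_residual(j):
    return min(-j * (k / 1000.0) + (k / 1000.0) ** 2
               for k in range(-5000, 5001))

for j in (-2, 0, 1, 3):
    print(j, c_formula(j), c_residual(j))  # both close to -j^2/4
```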
4. Residuation
If S, S* are two given sets, each with a suitable partial order, and f, f* are isotone functions mapping from S to S* and from S* to S, respectively, then the pair f, f* are said to form a residuation and to be each other's residual if for every s ∈ S and s* ∈ S*, there hold

f*(f(s)) ≥ s  and  f(f*(s*)) ≤ s*.   (X.6)

For example, given a matrix A, define

f(x) = A ⊗ x  and  f*(y) = A* ⊗′ y;

then Theorem III.3(i) (together with its dual) shows that this is a residuation. For any residuation, we may prove a result exactly analogous to that of Theorem III.7, as shown in Blyth and Janowitz (1972). In particular, the
formal process presented in the preceding section for approximating a function by a maxpolynomial can be justified in exactly this way. Finally, we may generalize the concept of a matrix [a_{ij}] by replacing both the row and the column indices i, j by continuous variables and arrive at the maximum transform which maps a function h to a function h*:

h*(x) = max_y (a(x, y) + h(y)).

This evidently has a formal residual in the minimum transform

h(y) = min_x (−a(x, y) + h*(x)).

Bellman and Karush (1962) discuss the use of this in the solution of problems of optimization. Cuninghame-Green and Burkard (1987) consider the analogue of the eigenvector-eigenvalue problem. A form of this arises in the Frenkel-Kontorova model for stable configurations of atoms in a periodic potential.

B. General Linear Dependence
1. Permanents and Assignments

Suppose n people must receive one each of n tasks and that a_{ij} represents the value of assigning person i to job j. What assignment will produce the greatest possible total value? This classical assignment problem arises in a number of contexts, and a well-known algorithm exists for solving it in O(n³) steps; see, for example, Papadimitriou and Steiglitz (1982). Essentially, an assignment is a permutation π of the set {1, 2, ..., n}, with the interpretation that person i receives job π(i). The total value of the permutation is Σ_i a_{iπ(i)}, and therefore a maximizing permutation produces total value

max_π Σ_i a_{iπ(i)}.

In the notation of max algebra, this is

Σ⊕_π ⊗_i a_{iπ(i)}.   (X.7)

The resemblance of this to the formula for a determinant in conventional linear algebra is striking:

det([a_{ij}]) = Σ_π σ(π) Π_i a_{iπ(i)},

where σ(π) is ±1 according as π is an even or odd permutation. If the sign function σ is omitted from the definition of a determinant, the resulting expression is usually called a permanent, and that, by analogy, is the term
also used for Eq. (X.7). A maximizing permutation, which produces the value of the permanent, may be an even or an odd permutation. In general, it need not be unique, and the case where, for a given matrix A, there are maximizing permutations of both parities has a special significance which we discuss below. The conventional theory also defines for a given square matrix A its adjugate matrix adj(A), and this may again be imitated in max algebra, as follows. The cofactor ā_{ij} is defined as the value of the permanent obtained by omitting row j and column i from A; then adj(A) = [ā_{ij}]. If A is definite and increasing, then adj(A) = Γ(A). See Cuninghame-Green (1979) for further discussion of this result, due to M. Yoeli.
From the matrix A of Application VII.1, the matrix B obtained there is definite and increasing. The cofactor ā₁₂ equals the permanent of the corresponding 2 × 2 submatrix, namely max(3, −5) = 3. Calculating the other eight cofactors similarly gives adj(B). It is quickly verified that B^(2) = adj(B), and since

Γ(B) = (I ⊕ B)^(2) = B^(2),

Yoeli's result is confirmed.

2. Theorem of Gondran and Minoux
Given a collection of vectors a(1), a(2), ..., it may be that one of them equals a linear combination of the others. Or it may be that a linear combination of some of them equals another linear combination of
some of them. In conventional linear algebra, these conditions are essentially equivalent and simply amount to saying that the vectors are linearly dependent. But because of the lack of a subtraction operation, these conditions are not interchangeable in max algebra.

Application X.5

The techniques of Section II show that no one of the vectors a(1), a(2), a(3), a(4) equals a linear combination of the others. Nevertheless, it is clear that linear dependences exist, for example, a(1) ⊕ a(2) = a(3) ⊕ a(4). We shall say that a general linear dependence holds among vectors a(1), ..., a(n) if there exist scalar coefficients λ₁, ..., λₙ such that

Σ⊕_{j∈J} λ_j ⊗ a(j) = Σ⊕_{j∈K} λ_j ⊗ a(j),

where J, K are disjoint subsets of {1, ..., n}. In Gondran and Minoux (1978), the following criterion was given.
Theorem X.1. A general linear dependence holds among the columns of a square matrix A if and only if the value of perm(A) is achieved by both an even and an odd permutation. ∎

This result is the direct analog of the vanishing of the determinant as a criterion for linear dependence in conventional algebra.

Application X.6
For the matrix

A = [  0  7  1 ]
    [ −1  6  2 ]
    [ −1  3  3 ],

the value of the permanent (i.e., 9) is achieved by two permutations of opposite parity, namely the identity permutation achieving a₁₁ + a₂₂ + a₃₃ and the permutation (12)(3) achieving a₁₂ + a₂₁ + a₃₃, so a general linear dependence holds among the columns. There being only three columns, this must necessarily take the form of a simple linear
dependence, which is easily found, using the techniques of Section II, to be

−7 ⊗ (col 2) ⊕ −4 ⊗ (col 3) = col 1.
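For a 3 × 3 matrix, Theorem X.1 and the dependence just stated can be verified by brute force over the 3! permutations (a sketch, ours):

```python
from itertools import permutations

A = [[0, 7, 1],
     [-1, 6, 2],
     [-1, 3, 3]]

def parity(p):
    # +1 for an even permutation, -1 for an odd one (count inversions)
    inv = sum(1 for i in range(len(p)) for k in range(i + 1, len(p))
              if p[i] > p[k])
    return -1 if inv % 2 else 1

values = [(sum(A[i][p[i]] for i in range(3)), parity(p))
          for p in permutations(range(3))]
perm_value = max(v for v, _ in values)
parities = {s for v, s in values if v == perm_value}
print(perm_value, sorted(parities))  # 9 [-1, 1]: both parities occur

# The simple dependence: -7 (x) (col 2) + -4 (x) (col 3) = col 1
print(all(max(-7 + A[i][1], -4 + A[i][2]) == A[i][0] for i in range(3)))  # True
```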
C. Cayley-Hamilton and Realizability

1. Characteristic Maxpolynomial

Given a square matrix A, a principal permanent (of order k) of A is any permanent perm(X), where X is of order k and is either A itself or any matrix obtained by deleting from A some rows and then the columns with the same index numbers. For example, if we delete row 2 and then column 2 from the matrix in Application X.6, we obtain a principal permanent

perm [  0  1 ]
     [ −1  3 ]  = 3.
The corresponding principal permanent mean is given by the arithmetical ratio perm(X)/k. The notation μ(A) denotes the greatest principal permanent mean derivable from A.
Theorem X.2. For A ∈ F̄ₙₙ, μ(A) = λ(A).
Proof. Since the permutation which achieves the value of a principal permanent can be decomposed into its constituent cyclic permutations, every principal permanent value is the arithmetical sum of cycle-weights, and so every principal permanent mean is a weighted arithmetical average of cycle means. It follows that μ(A) ≤ λ(A). But suppose without loss of generality that the cycle mean λ(A) is achieved by the cycle-weight w = a₁₂ + a₂₃ + ⋯ + a_{p1}. Let X be the principal permanent obtained by deleting from A any row and then column with an index greater than p. It is clear that perm(X) = w, that X produces a principal permanent mean of λ(A), and therefore that μ(A) = λ(A). ∎

If A is a given square matrix, z a scalar variable, and I the identity matrix, then we may formally multiply out perm(A ⊕ z ⊗ I) to produce a maxpolynomial. Thus, for a 3 × 3 matrix A, expanding

perm [ a₁₁ ⊕ z    a₁₂        a₁₃     ]
     [ a₂₁        a₂₂ ⊕ z    a₂₃     ]
     [ a₃₁        a₃₂        a₃₃ ⊕ z ]

by cofactors along the first row eventually leads, in the worked example, to

7 ⊕ 6 ⊗ z ⊕ 2 ⊗ z^(2) ⊕ z^(3).
We call this the characteristic maxpolynomial of A. It is not hard to see that each coefficient of z^(n−k) is exactly the greatest principal permanent of order k, from which we may derive as in Cuninghame-Green (1983a) the following.
Theorem X.3. If A ∈ F̄ₙₙ, then the greatest corner of the characteristic maxpolynomial is λ(A). ∎

This result corresponds to the classical result that the eigenvalues are the roots of the characteristic equation.

2. Cayley-Hamilton and Hankel
Another classical result of linear algebra, related to that just mentioned, is that a square matrix satisfies its own characteristic equation. In Olsder and Roos (1988) and elsewhere, a similar result is shown for max algebra, one version of which can be formulated in the following way.
Theorem X.4. If A ∈ F_nn, there exist scalar coefficients λ_0, ..., λ_n such that

⊕_{j∈J} λ_j ⊗ A^(j) = ⊕_{j∈K} λ_j ⊗ A^(j),

where J, K are disjoint subsets of {0, 1, ..., n}, and A^(0) = I.

This result has important consequences in the theory of realizability of DES. For, if a sequence of Markov parameters {g_j} has a realization as in Eq. (X.8),
then Theorem X.4 clearly implies that there exist coefficients λ_0, ..., λ_n such that, for integer r ≥ 1,

⊕_{j∈J} λ_j ⊗ g_{j+r} = ⊕_{j∈K} λ_j ⊗ g_{j+r},

where J, K are disjoint subsets of {0, ..., n}. In other words, the same relationship is satisfied by any n + 1 consecutive Markov parameters. This in turn implies an unchanging linear dependence among any n + 1 consecutive columns of the infinite Hankel matrix
If such a dependence does not hold, we can conclude that there is no solution to the strong realization problem with A of dimension n or less.

Application X.7
Does the Markov-parameter sequence 0, 1, 2, 2, 2, ... have a realization as in Eq. (X.8) with A of dimension 2 or less? To answer this, consider the three-row Hankel matrix

[ 0   1   2   2   2   ... ]
[ 1   2   2   2   2   ... ]
[ 2   2   2   2   2   ... ].
There can be no linear dependence among the first three columns. We can check this either using the techniques of Section III, or by observing that

perm[ 0   1   2 ]
    [ 1   2   2 ]
    [ 2   2   2 ]  =  6,

a value attained by one unique permutation, corresponding to the antidiagonal of this matrix. Hence there cannot be maximizing permutations of both odd and even parity; Theorem X.1 therefore implies that the columns are linearly independent, and so no realization of dimension less than 3 is possible.
REFERENCES*

Baccelli, F., Cohen, G., Olsder, G. J., and Quadrat, J.-P. (1992). "Synchronization and Linearity: An Algebra for Discrete Event Systems." Wiley, Chichester.
Bellman, R., and Karush, W. (1962). Mathematical programming and the maximum transform. SIAM J. Appl. Math. 10.
Blyth, T. S., and Janowitz, M. F. (1972). "Residuation Theory." Pergamon, Oxford.
Carré, B. A. (1971). An algebra for network routing problems. J. Inst. Math. Appl. 7, 273.
Cohen, G., Dubois, D., Quadrat, J.-P., and Viot, M. (1985). A linear system-theoretic view of discrete event processes and its use for performance evaluation in manufacturing. IEEE Trans. Autom. Control AC-30, 210.
Cuninghame-Green, R. A. (1962). Describing industrial processes with interference and approximating their steady-state behaviour. Oper. Res. Q. 13, 95.
Cuninghame-Green, R. A. (1979). "Minimax Algebra." Lecture Notes in Economics and Mathematical Systems, No. 166. Springer-Verlag, Berlin/New York.
Cuninghame-Green, R. A. (1983). Minimax approximation of continuous functions. Ekonomicko-matematický Obzor 19, 388.
Cuninghame-Green, R. A. (1983a). The characteristic maxpolynomial of a matrix. J. Math. Anal. Appl. 95(1), 110.
Cuninghame-Green, R. A., and Burkard, R. E. (1987). Eigenfunctions and optimal orbits. J. Math. Anal. Appl. 99, 83.
Cuninghame-Green, R. A., and Butković, P. (1993). Discrete-event dynamic systems: The strictly convex case (in press).
Cuninghame-Green, R. A., and Huisman, F. (1982). Convergence problems in minimax algebra. J. Math. Anal. Appl. 88(1), 196.
Cuninghame-Green, R. A., and Meijer, P. F. J. (1980). An algebra for piecewise-linear minimax problems. Discrete Appl. Math. 2, 267.
Gondran, M., and Minoux, M. (1978). L'indépendance linéaire dans les dioïdes. Bulletin de la Direction des Études et Recherches, Série C (Mathématiques, Informatique), (1), 67-90. E.D.F., Clamart, France.
Gondran, M., and Minoux, M. (1984). Linear algebra in dioids: A survey of recent results. Ann. Discrete Math. 19, 147.
Karp, R. M. (1978). A characterization of the minimum cycle mean in a digraph. Discrete Math. 23, 309.
Olsder, G. J. (1986). On the characteristic equation and minimal realizations for discrete event systems. In "Analysis and Optimization of Systems" (A. Bensoussan and J. L. Lions, Eds.), pp. 189-201. Springer-Verlag, Berlin/New York.
Olsder, G. J., and de Vries, R. E. (1988). On an analogy of minimal realizations in conventional and discrete event dynamic systems. In "Discrete Event Systems: Models and Applications," Vol. 103 of Lecture Notes in Control and Information Sciences (P. Varaiya and A. B. Kurzhanski, Eds.), pp. 149-161. Springer-Verlag, Berlin/New York.
Olsder, G. J., and Roos, C. (1988). Cramer and Cayley-Hamilton in the max-algebra. Linear Algebra Appl. 101, 87.
* The following references do not constitute an exhaustive bibliography, but list some titles of direct relevance to topics in the text. Inevitably, in a short book, the important contributions of many people remain unacknowledged. For a much more extensive list of references, consult Baccelli et al. (1992), Gondran and Minoux (1984), and Zimmermann (1981).
Papadimitriou, C. H., and Steiglitz, K. (1982). "Combinatorial Optimization: Algorithms and Complexity." Prentice-Hall, Englewood Cliffs, N.J.
Zimmermann, U. (1981). "Linear and Combinatorial Optimization in Ordered Algebraic Structures." North-Holland, Amsterdam.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 90

Physical Information and the Derivation of Electron Physics

B. ROY FRIEDEN
Optical Sciences Center, University of Arizona, Tucson, Arizona
I. Introduction . . . 124
II. The Zero Property of Lagrangians . . . 126
III. Fisher Information . . . 128
   A. Parameter Estimation Channel . . . 128
   B. Cramer-Rao Error Inequality . . . 129
   C. Derivation of Cramer-Rao Inequality . . . 130
   D. Multidimensional Parameters . . . 133
   E. Resulting Scalar Information . . . 135
   F. Shift-Invariant Case . . . 135
   G. Information I as a Measure of Disorder . . . 136
   H. "Characteristic" Information State and Covariance . . . 137
   I. Information I as a "Mother" Information . . . 138
IV. Principle of Extreme Physical Information (EPI) . . . 139
   A. Axiomatic Approach . . . 140
   B. Solution . . . 141
   C. Resulting Variational Principle . . . 142
   D. Why Zero Information, Physically? . . . 143
   E. I as the Self-Distance of an Information Divergence Measure . . . 144
   F. Comparison with Huber's Probability Law-Estimation Procedure: Estimation Becomes Derivation . . . 145
   G. Agenda for Derivations to Follow . . . 146
V. Special Relativity . . . 147
VI. Classical Electrodynamics . . . 149
   A. Characteristic State . . . 150
   B. Conditional Information J and Solution q . . . 151
VII. Quantum Mechanics . . . 154
   A. Gauge Covariance . . . 154
   B. Transition to Complex Modes . . . 156
   C. Definition of Momentum-Energy Space . . . 157
   D. Finding What I Equals, so as to Form I . . . 158
   E. Definition of Mass, Resulting Energy-Mass Relation . . . 159
   F. Klein-Gordon Equation (Free Field) . . . 160
   G. Klein-Gordon Equation (with Fields) . . . 161
   H. Dirac Equation (Free Field) . . . 161
   I. Dirac Equation (with Fields) . . . 163
   J. Dimensionality N, Resulting Spin, Nonrelativistic Limit . . . 163
   K. Discussion . . . 164
Copyright © 1995 by Academic Press, Inc. All rights of reproduction in any form reserved.
ISBN 0-12-014732-7
VIII. Uncertainty Principles . . . 165
   A. Position-Momentum Relation . . . 166
   B. Time-Energy Relation . . . 167
   C. Discussion: What Do the Heisenberg Relations Really Mean? . . . 168
   D. Efficient Estimator and Minimum Uncertainty Product . . . 169
IX. General Relativity . . . 170
X. Power Spectral 1/f Noise . . . 174
   A. Problem Definition . . . 175
   B. Temporal Evolution and Disorder . . . 176
   C. Review of EPI Procedure . . . 178
   D. Application of EPI to 1/f Scenario . . . 179
   E. Finding F[S(ω), ω] . . . 181
   F. Finding G(ω) . . . 181
   G. Solution . . . 182
   H. Discussion . . . 184
XI. Synopsis and Highlights of Derivations . . . 185
Appendix A: Fisher Information Obeys Additivity . . . 190
Appendix B: Maximal Information and Minimal Error in Characteristic State . . . 191
Appendix C: Properties of Information Divergence Quantity I(θ, θ′) . . . 194
Appendix D: Maxwell's Equations from the Vector Wave Equation . . . 196
Appendix E: Derivation of Eq. (VII.39) . . . 198
Appendix F: Evaluation of Certain Integrals . . . 201
References . . . 202
I. INTRODUCTION

At the quarter-mark of this century, the celebrated statistician Ronald A. Fisher invented a form of information that bears his name (Fisher, 1925). At about the same time, the eminent physicist Erwin Schrödinger was publishing his first papers on quantum mechanics, e.g., Schrödinger (1926). In the latter, he wrote down his famous differential equation describing the motion of a non-relativistic particle, the Schrödinger wave equation (SWE). He also worked backwards and found a Lagrangian principle for deriving the SWE, but could not attach any physical (or other) meaning to it, calling it "incomprehensible." This began a trend in physics that continues to this day. Lagrangians may be used to derive most disciplines of physics. Yet they remain enigmas. No one really knows where they come from, what their rationale is, or whether they have a common origin in a higher principle. The typical attitude is perhaps:

A still more general comment is that the variational principle is generally useful in unifying a subject and consolidating a theory rather than in breaking ground for a new advance. It usually happens [author: as in the Schrödinger case above] that the
differential equations for a given phenomenon are worked out first, and only later is the Lagrange function found, from which the differential equations can be obtained. (Morse and Feshbach, 1953) Apparently Schrodinger was not aware of Fisher’s work because the answer to his problem of the “incomprehensible” Lagrangian is Fisher’s measure of information. Moreover, Fisher’s information provides a basis for virtually every other Lagrangian of physical theory. It is the higher principle we sought, uniting most of physics under one idea, that of information theory. The thesis of this chapter is that Fisher Information I is one-half of a two-term information concept, called “physical information” I, whose extremization (or zero-root) derives most of known physical theory. Before embarking on this quest, it is instructive to visit the Lagrangians for the various fields of physics (Table I). What these disparate Lagrangians have in common is a term that is quadratic in the field function of interest, in the form of a dot or inner product,
As will be seen, this term provides the link to information theory. It is
basically Fisher’s information I. Hence, Fisher information I occurs, not only in the Lagrangian for the SWE (fourth item down in the table), but in most Lagrangians of physics. It provides the unifying concept we sought. At this point, the reader might be surprised that the information quantity of use is not the Shannon or Boltzmann form of entropy. Certainly a classical grounding in communication theory or in thermodynamics would suggest that these, respectively, are the important information measures. It will become apparent, later, why these are not the appropriate measures for our purposes: Basically, the Lagrangians, and laws, of physics arise out of a parameter measurement-estimation effect; in particular, a gedanken experiment whereby the mean, or ideal, value of a parameter is to be estimated from experimental data. This naturally brings in Fisher information I, and not Shannon or Boltzmann entropy. By contrast, for the Shannon or Boltzmann forms to naturally occur, the gedanken scenario would have to describe a multiplicity of signal transmissions through a communication channel (Reza, 1961). This does not appear to be a proper model for defining most physical laws. As will become apparent, physics is well modeled as nature’s response to an optimized measurement-estimation procedure, and not to the transmission of parameters over a channel. We now note an essential property of the Lagrangians in Table I that is later interpreted as a defining property of physical information.
TABLE I
LAGRANGIANS FOR VARIOUS PHYSICAL PHENOMENA. WHERE DO THESE COME FROM, AND, IN PARTICULAR, WHY DO THEY ALL CONTAIN A SQUARED GRADIENT TERM?

Phenomena listed: classical mechanics; flexible string or compressible fluid; diffusion equation; Schrödinger wave equation; Klein-Gordon equation; elastic wave equation; electromagnetic equations; Dirac equations; general relativity (equations of motion); Boltzmann law; Maxwell-Boltzmann law; Lorentz transformation (special relativity), with Lagrangian ∂_i q_n ∂_i q_n (invariance of the integral); Helmholtz wave equation. [The remaining Lagrangian entries are not reproduced here; each contains a squared gradient term such as −∇ψ·∇ψ*.]
II. THE ZERO PROPERTY OF LAGRANGIANS

A second property of the Lagrangians of physics is that most have value zero when evaluated at their extremum solutions. For fields of physics that do not obey this property, the physical information (PI) theory below will define alternative Lagrangians that do. The Lagrangians we will work with
are (except for the electromagnetic case) of the form

I = ∫ dr Σ_{n=1}^{N} [∇q_n · ∇q_n − f²(r) q_n²],   (II.1a)

r = (x_1, x_2, ..., x_K),   ∇ = (∂/∂x_1, ∂/∂x_2, ..., ∂/∂x_K).   (II.1b)

Parameter K is the dimension of the coordinate space, e.g., K = 4 in relativistic theories (later). Function f²(r) is real. Integration limits are infinite unless otherwise stated. We show next that the extremum value of I is zero (Cocke, 1993).

Proof. Denote

∂q_n/∂x_k = q_nk,   n = 1, ..., N;  k = 1, ..., K.   (II.2)

Then (II.1a) may be placed in the more compact form

I = ∫ dr Σ_n [Σ_k q_nk² − f²(r) q_n²].   (II.3)

The solution to

I = extremum   (II.4)

is the Euler-Lagrange equation

Σ_{k=1}^{K} d/dx_k (∂L/∂q_nk) = ∂L/∂q_n,   n = 1, ..., N,   (II.5)

where L is the integrand of Eq. (II.3). Since from (II.3)

∂L/∂q_nk = 2q_nk  and  ∂L/∂q_n = −2f²(r)q_n,   (II.6)

the Euler-Lagrange equation (II.5) becomes

Σ_{k=1}^{K} ∂q_nk/∂x_k = −f²(r)q_n.   (II.7)

Now, integration by parts of the first right-hand term in Eq. (II.1a) gives

∫ dr ∇q_n · ∇q_n = −∫ dr q_n Σ_{k=1}^{K} ∂q_nk/∂x_k.   (II.8)

Using result (II.7) in this equation gives I = 0, the desired result.

In the theory that follows, function f(r) will play the role of a potential, or some other function defining the scenario. Quantity I will be one form of "physical information" (PI) that naturally arises out of an axiomatic definition of PI. At this point we make a necessary detour into parameter estimation theory. The following section provides the foundation for all subsequent physical derivations.
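The zero property can also be illustrated numerically. The sketch below (NumPy; a one-dimensional, single-mode N = 1 specialization, with the integrand ∫dx [q′² − f²(x)q²] implied by the derivatives in (II.6)) uses the trial amplitude q(x) = exp(−x²/2), which satisfies the Euler-Lagrange condition (II.7) with f²(x) = 1 − x², since q″ = (x² − 1)q; the information integral then vanishes to numerical accuracy.

```python
# Numerical illustration of the zero property: at an extremum solution
# of (II.5), the information I of Eq. (II.1a) evaluates to zero.
import numpy as np

x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]

q = np.exp(-x**2 / 2)        # trial mode q(x)
dq = np.gradient(q, x)       # dq/dx by central differences
f2 = 1 - x**2                # chosen so that q'' = -f2 * q holds exactly

I = np.sum(dq**2 - f2 * q**2) * dx
print(abs(I))                # ~ 0, limited only by discretization error
assert abs(I) < 1e-6
```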
III. FISHER INFORMATION
Suppose that an object is to be measured for a general coordinate c (of length, momentum, energy, etc., in any combination). The aim of making the measurement is, most basically, to verify the truth of a test hypothesis about the object. The more perfectly the basic parameters of the object can be known, the more confidence there is in any resulting judgment on the truth of the hypothesis. These considerations will directly lead to the relevant form of information for establishing physical laws.

A. Parameter Estimation Channel
Knowledge of the parameters is acquired, in general, as the result of measurements, followed by estimates of the parameters based upon the measurements. Such a flow of operations defines a parameter estimation channel. In Fig. 1 a parameter estimation channel is shown for estimation of a single parameter θ based upon a vector of measurements

y = (y_1, y_2, ..., y_N).

The observer is free to form any estimate θ̂ of θ based on the observed y. This takes the form of an estimator function θ̂(y). At this point, the reader may wonder what this has to do with derivation of laws of physics. Figure 1 is a model scenario for creation of a law. A physical law p(y|θ) must certainly exist at the time that it is tested by a measurement. This simple model is a measurement-induced definition of physics.
FIGURE 1. Parameter estimation channel. A parameter source emits the ideal value θ; the likelihood channel yields the data observables y_1, ..., y_N; the estimator then forms the estimate θ̂(y_1, ..., y_N) of θ.
It is tempting to make the act of measurement an act of creation, asserting that the law p(y|θ) does not exist until the measurement is made. That is, since the measurement channel ultimately derives the law, it also literally creates the law. We leave this as food for thought. The error limitation of the estimator step will define the form of information pertinent to formation of the physical law. The achievement of channel capacity (in the language of Shannon information theory), i.e., extremizing the information, gives rise to the law.

B. Cramer-Rao Error Inequality

To define the information that matters, consider the errors due to many θ determinations. In any one determination the error will be [θ − θ̂(y)]. Then the mean-square error over many determinations will be

e² = ∫ dy p(y|θ)[θ − θ̂(y)]²,   (III.1)

where p(y|θ) is the probability of the data in the presence of the ideal parameter value θ. (This probability law is commonly called the "likelihood law" in statistics.) Consider, for the moment, the class of estimators θ̂(y) that are unbiased, i.e., that obey

⟨θ̂(y)⟩ = θ.   (III.2)

This says that the estimate is right, on average, even though it may not be right at each determination. This is the estimator cognate to unbiased apparatus. Under restriction (III.2) alone, a powerful statement may be made about e². This is the Cramer-Rao (C-R) inequality

e² ≥ 1/I,   (III.3)

where I is defined as

I = ∫ dy p(y|θ) (∂/∂θ[ln p(y|θ)])².   (III.4)

The action of I in Eq. (III.3) is such that, the larger I is, the smaller is the permissible error value. For this reason, I is called an "information" measure for the channel. It is, specifically, the Fisher information in data y about parameter θ. Equation (III.4) states that the likelihood law p(y|θ) alone defines information level I. This is handy for later setting up a procedure for estimating a law of physics p(y|θ) based upon extremization of I. If other quantities entered in, such a procedure would be untenable, or at least more difficult to carry through. The proof of Cramer-Rao result (III.3) is straightforward and is given next.
C. Derivation of Cramer-Rao Inequality

The derivation generally follows Van Trees (1968). First, by definition of the average,

⟨θ̂(y) − θ⟩ = ∫ dy p(y|θ)[θ̂(y) − θ] = 0,   (III.5)

the latter by Eq. (III.2). Differentiating Eq. (III.5) with respect to θ gives

∫ dy (∂p/∂θ)[θ̂(y) − θ] − ∫ dy p = 0.   (III.6)

By normalization, the second integral is 1. Also,

∂p/∂θ = p ∂ln p/∂θ

identically. Then Eq. (III.6) becomes

∫ dy p (∂ln p/∂θ)[θ̂(y) − θ] = 1.   (III.7)

Preparing for the use of the Schwarz inequality, we factor the integrand and square the whole equation:

[∫ dy (√p ∂ln p/∂θ)(√p [θ̂(y) − θ])]² = 1.   (III.8)

By the Schwarz inequality, the left-hand side (lhs) obeys

lhs ≤ ∫ dy p (∂ln p/∂θ)² ∫ dy p [θ̂(y) − θ]².   (III.9)

But by definition (III.4) the first integral is the Fisher I. Also, by (III.1) the second integral defines the mean-square error e² due to the estimate θ̂. Hence,

lhs ≤ Ie²,  e² ≥ lhs/I,   (III.10a)

and by Eq. (III.8) lhs = 1, so that Eq. (III.3) follows as required. This is a general, and hence powerful, result regarding the ability to estimate.
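A quick numerical sanity check of the bound: for a single Gaussian measurement y ~ N(θ, σ²), the Fisher information of Eq. (III.4) works out to I = 1/σ², and the unbiased estimator θ̂(y) = y attains e² = σ² = 1/I. The sketch below (plain Python; seed and sample size are arbitrary) confirms this by simulation.

```python
# Monte Carlo check of the Cramer-Rao bound for one Gaussian measurement
# y ~ N(theta, sigma^2); theta_hat(y) = y is unbiased and efficient here.
import random

random.seed(1)
theta, sigma = 3.0, 0.5
n_trials = 200000

e2 = sum((random.gauss(theta, sigma) - theta)**2
         for _ in range(n_trials)) / n_trials
I = 1 / sigma**2             # Fisher information of N(theta, sigma^2)
print(round(e2, 3), 1 / I)   # e2 ~ 1/I = 0.25: the bound is attained
assert abs(e2 - 1 / I) < 0.01
```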
An estimation rule θ̂(y) that accomplishes the equality in (III.3) is called an "efficient estimator." Hence, the error e² due to the efficient estimate obeys

e²_eff = 1/I.   (III.10b)

Not all probability laws p(y|θ) allow for an efficient estimator to exist. An interesting example is provided by the case N = 1 of a single data measurement y. Let p(y|θ) be normal in y with mean value θ. In this case an efficient estimator exists, and it is θ̂(y) = y, the measurement itself. (Note that a Gaussian law describes the ground state of the simple harmonic oscillator. Accordingly, its mean or rest position θ can be efficiently estimated.) As a counterexample, a probability law that is a squared sinusoid, corresponding to the case of a free particle in a box, does not admit of an efficient estimate of mean particle position.

1. Generalization to Case of Complex Measurements

We have derived the C-R inequality for the case of a real parameter θ. By contrast, the parameter can sometimes be a pure imaginary number θ = ia, i = √(−1), a real. An example is in estimating the mean time ⟨t⟩ of detection of a photon, where θ = ic⟨t⟩, c the speed of light. The corresponding measurements y would also be pure imaginary, y = ict, t real. With knowledge that θ is pure imaginary, the estimator θ̂ is likewise made pure imaginary, θ̂ = icb, b real. In summary, we have a scenario (for simplicity, take c = 1) for which all quantities are imaginary,

θ = ia,  y = it,  θ̂ = ib.   (III.10c)

Do the key results (III.3), (III.4), (III.10b) still hold? At first sight, Eq. (III.4) seems to now be meaningless as an information measure, since it would give a negative value (because of the square of ∂/∂θ, θ = ia). Also, Eq. (III.1) would now give a negative mean-squared error e². What we will show is that although both I and e² are now negative, their product in (III.3) remains positive and again obeys the C-R inequality. Therefore, the efficient error e², where (by definition) the equality in (III.3) is attained, is still of value 1/I (Eq. (III.10b)). To show this, start out as at Eq. (III.5),

∫ dt p(t|a)[b(t) − a] = 0.   (III.10d)

Note: Conventional probability theory handles complex variables by treating them as joint real variables consisting of the real and imaginary parts (Frieden, 1991a). Here there are purely imaginary parts it, ia present. Hence, the expectation integration is over variables dt. Likewise, p(y|θ) really means p(t|a) and was so replaced.
Next, follow the exact procedure beyond (III.5), differentiating with respect to a, etc. The result corresponding to Eq. (III.9) is now

1 ≤ ∫ dt (∂ln p/∂a)² p ∫ dt p(b − a)².   (III.10e)

Change variables from a, b back to θ, θ̂ via Eqs. (III.10c). This gives

1 ≤ ∫ dt (∂ln p/∂θ)²(i²)p ∫ dt p(θ̂ − θ)²/(i²).   (III.10f)

The factors (i²) cancel, and we can still define quantities

I = ∫ dt (∂ln p/∂θ)² p  and  e² = ∫ dt p(θ̂ − θ)²   (III.10g)

that obey a C-R inequality

e²I ≥ 1.   (III.10h)

Efficiency is again defined as accomplishment of the equality, so we again have Eq. (III.10b),

e²_eff = 1/I.   (III.10i)
We conclude that Eq. (III.4) for I can be meaningfully used whether θ is real or pure imaginary. These are just the cases that occur in the physical derivations that follow.

2. General State of Bias

Finally, we generalize to the situation where, contrary to Eq. (III.2), there is a general state of bias present,

⟨θ̂(y)⟩ = θ + g(θ),   (III.10j)

g some unknown function. The preceding derivation can again be carried through, now with the result

e² ≥ [1 + ∂g(θ)/∂θ]²/I,   (III.11)

I as before (Van Trees, 1968, p. 147). Hence, once again the quality of the parameter estimation channel is defined by the Fisher information I.

3. Single Data Value

From this point on we specialize to the case N = 1 of a single data value y (which may itself be multidimensional, see below). Fisher I obeys additivity (Appendix A; also Frieden, 1990), so that the I in M data values is just M
times that in one. Then the Lagrangian formed from an M data-channel would simply be M times that of the one data-channel, so that extremizing or zeroing the former would lead to the same solution p as by use of the latter.

D. Multidimensional Parameters
In most physical situations parameter θ becomes a vector of unknowns

θ = (θ_1, θ_2, ..., θ_K).   (III.12)

Hence, the "single" data value y is now itself a vector. It is convenient to now abandon y as a notation for data and use instead the vector

p = (p_1, p_2, ..., p_K).   (III.13)

Corresponding to (III.12) is a vector of estimators

θ̂(p) = (θ̂_1(p), ..., θ̂_K(p))   (III.14)

to be formed. See Fig. 2. Finally, there is resultingly a vector of errors

e_i² = ⟨[θ_i − θ̂_i(p)]²⟩,  i = 1, ..., K.   (III.15)

We now seek a single figure of merit, analogous to e² in the one-dimensional case, that measures the errors in all the estimated components θ_i. This is the familiar problem of forming one figure of merit from a vector of inputs. We show next that one such figure of merit leads to a simple K-dimensional generalization of the Fisher information Eq. (III.4). The precision h_i associated with a standard deviation e_i obeys (DeGroot, 1970)

h_i = 1/(√2 e_i).   (III.16)

For a figure of merit, let us use the total squared precision over the efficient estimates of θ_i,

h² = Σ_{i=1}^{K} h_i² = ½ Σ_{i=1}^{K} 1/e_i²,   (III.17)

by (III.16).
FIGURE 2. The gedanken measurement experiment. Imperfect measurement p of ideal value θ is made. Given p, an optimum estimate θ̂(p) of θ is to be formed.
We next relate error variances e_i² to Fisher information terms. Consider e_1² = ⟨[θ̂_1(p) − θ_1]²⟩, where θ̂_1(p) is a chosen estimation function of data p. Form an auxiliary vector

v = [(θ̂_1(p) − θ_1)  ∂ln p/∂θ_1  ∂ln p/∂θ_2  ...  ∂ln p/∂θ_K].   (III.18)

Next, form the matrix

⟨vᵀv⟩ = [ e_1²   1     0    . . .   0    ]
        [  1    J_11  J_12  . . .  J_1K  ]
        [  0    J_21  J_22  . . .  J_2K  ]
        [  .     .     .     .      .    ]
        [  0    J_K1  J_K2  . . .  J_KK  ],   (III.19)

whose elements J_ij obey

J_ij = ∫ dp (∂ln p/∂θ_i)(∂ln p/∂θ_j) p.   (III.20)

The matrix, by its construction, must be positive definite, so that all its principal minors are non-negative. In particular,

det[ e_1²   1   ]
   [  1    J_11 ]  ≥  0,  or  e_1² ≥ 1/J_11.   (III.21)

It is readily shown that the lower bound 1/J_11 is achievable by an efficient estimator. This occurs when p is separable in coordinates p_i. By analogous steps, similar results follow,

e_i² ≥ 1/J_ii,  i = 1, ..., K,   (III.22)

with the lower bound achievable by an efficient estimator. We now combine results. By Eq. (III.20),

J_ii = ∫ dp (∂ln p/∂θ_i)² p.   (III.23)

Then, by relation (III.22), precision h² of (III.17) becomes

h² = ½ Σ_{i=1}^{K} ∫ dp (∂ln p/∂θ_i)² p.   (III.24)

This is a direct K-dimensional generalization of the one-dimensional Fisher information in Eq. (III.4).
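The matrix construction lends itself to a direct numerical check. The sketch below (NumPy; a separable two-dimensional Gaussian likelihood with arbitrarily chosen widths, evaluated on a grid) computes the elements J_ij of Eq. (III.20): the off-diagonal elements vanish, each diagonal element is 1/σ_i², and so the efficient errors e_i² = 1/J_ii of (III.22) are just the per-coordinate variances.

```python
# Numerical evaluation of the Fisher matrix elements J_ij of Eq. (III.20)
# for a separable 2-D Gaussian likelihood (sketch; widths are arbitrary).
import numpy as np

s1, s2 = 0.5, 2.0
x = np.linspace(-12.0, 12.0, 1201)
dx = x[1] - x[0]
X1, X2 = np.meshgrid(x, x, indexing="ij")

p = np.exp(-X1**2 / (2 * s1**2) - X2**2 / (2 * s2**2)) / (2 * np.pi * s1 * s2)
g1 = X1 / s1**2              # d ln p / d theta_1, evaluated at theta = 0
g2 = X2 / s2**2              # d ln p / d theta_2

J = np.array([[np.sum(g1 * g1 * p), np.sum(g1 * g2 * p)],
              [np.sum(g2 * g1 * p), np.sum(g2 * g2 * p)]]) * dx**2
print(np.round(J, 3))        # ~ diag(1/s1^2, 1/s2^2) = diag(4, 0.25)
```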
E. Resulting Scalar Information
The relation of Eq. (III.24) to Fisher information may be regained. In this multidimensional scenario a Fisher information matrix exists, whose elements obey Eq. (III.20),

J_ij = ∫ dp (∂ln p/∂θ_i)(∂ln p/∂θ_j) p.   (III.25)

Comparison with Eq. (III.24) shows that

2h² = Σ_{k=1}^{K} J_kk = Tr(I),   (III.26)

the trace of the Fisher information matrix. We call this trace quantity the scalar Fisher information I, so that by Eq. (III.25)

I = Σ_{k=1}^{K} ∫ dp (∂ln p/∂θ_k)² p.   (III.27)
From this point on, the term "Fisher information I" shall mean this trace information or one of its subsequent forms. For K = 1, (III.27) is the one-dimensional Fisher I of Eq. (III.4). Hence, even in a multidimensional parameter estimation channel, we can use a scalar information quantity to measure the overall information level. This is important, since Lagrangians are scalar quantities, and our immediate goal is to use information to form the Lagrangians of physics.

F. Shift-Invariant Case
Any measurement p suffers a random error r from the ideal θ, where

p = θ + r.   (III.28)

All physical laws p(p|θ) to be derived will obey shift invariance, i.e., preserve the same shape irrespective of the size of θ. This corresponds to Galilean invariance in the scenario of nonrelativistic quantum mechanics, or Lorentz invariance in others. Invariance of shape means that

p(p|θ) = p(p − θ),   (III.29)

where the right-hand side p is the p.d.f. law for r. Substitute this into Eq. (III.27) for I, and change integration variable to r = p − θ. The result
is the "additive" form for Fisher I,

I = ∫ dr [∇ ln p(r) · ∇ ln p(r)] p(r).   (III.30)

Remarkably, θ has dropped out. This is important, since a derived form for p certainly ought to be independent of any particular parameter value θ. Because of the ln p terms in Eq. (III.30), the impression is that if p → 0 then I → ∞. In fact, this is not true, because of the multiplication by p → 0 at the far right of (III.30). The way to see this is to work with another function q(r), such that

p(r) = q²(r).   (III.31)

Function q(r) is thus a kind of "amplitude" function, which suggests a connection with quantum mechanics. We defer this thought to a further section. Substitution of (III.31) into (III.30), and explicitly evaluating the ∇ of the logarithm, gives the simple result

I = 4 ∫ dr ∇q · ∇q.   (III.32)

There is now no logarithm in the integrand to worry about. Equation (III.32) states that I is totally a measure of the gradient content in q(r) [and hence in p(r)].

G. Information I as a Measure of Disorder
From the form of Eq. (III.32), I measures the gradient content of q(r) or p(r). In particular, a broad, smooth p(r) causes a small I. Now a smooth p(r) also represents strong disorder in random variable r, since then all r values are nearly equally probable. See Fig. 3. Hence, the smaller I is,

FIGURE 3. (a) High gradients, therefore high I. Narrow effective range in r, therefore small disorder. (b) Low gradients, therefore low I. Wide effective range in r, therefore strong disorder. The upshot is that I varies inversely with disorder.
the more disordered is the system represented by p(r). Since I varies monotonically with the degree of disorder, we use it as a measure of disorder. See Frieden (1990) for further details on this measure.
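Both of these points — the equivalence of the "log" form (III.30) and the "amplitude" form (III.32), and the inverse relation between I and disorder — can be seen numerically. The sketch below (NumPy; a one-dimensional Gaussian of width σ, for which I = 1/σ² analytically) evaluates both forms on a grid and shows that broadening the density lowers I.

```python
# Sketch: Eqs. (III.30) and (III.32) agree, and I falls as p(r) broadens.
import numpy as np

def fisher_I(sigma, r=np.linspace(-12.0, 12.0, 240001)):
    dr = r[1] - r[0]
    p = np.exp(-r**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    q = np.sqrt(p)                                           # amplitude
    I_log = np.sum(np.gradient(np.log(p), r)**2 * p) * dr    # Eq. (III.30)
    I_amp = 4 * np.sum(np.gradient(q, r)**2) * dr            # Eq. (III.32)
    return I_log, I_amp

for sigma in (0.5, 1.0, 2.0):
    I_log, I_amp = fisher_I(sigma)
    print(sigma, round(I_log, 4), round(I_amp, 4))   # both ~ 1/sigma^2
```

As σ grows (more disorder), the printed information values fall as 1/σ², in keeping with Fig. 3.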
H. "Characteristic" Information State and Covariance

Let us consider a system in a state whereby the single p.d.f. function p(r) is actually composed of a sequence of N non-overlapping functions p_n(r);

p = (p_1, ..., p_N),  p_m(r)p_n(r) = 0,  m ≠ n,

and correspondingly for amplitude function q(r),

q = (q_1, ..., q_N),  p_n = q_n²,  q_m(r)q_n(r) = 0,  m ≠ n.   (III.33a)

See Fig. 4. We call these q_n the "modes" of the law p(r). According to Eq. (III.31),

q_n(r) = p_n^{1/2}(r);   (III.33b)

that is, the modes are probability "amplitudes." Result (III.33b) holds generally, not just in a quantum scenario (see succeeding sections). A state of distinct modes (Fig. 4) is called the "characteristic" information state of the system. It has some special properties, found next. Breaking up the integration dr in Eq. (III.32) into a sum of integrals over the discrete support regions of the mode functions (Fig. 4), Eq. (III.32) becomes

I = 4 Σ_{n=1}^{N} ∫ dr ∇q_n(r) · ∇q_n(r).   (III.34)

This will be our working version of the scalar Fisher information I. It will be used in most of the physical derivations below.
FIGURE 4. Separated modes q_n(r). These are so configured during the gedanken measurement procedure.
Return to the estimation problem for a moment. With knowledge of separated modes qn(r),the observer is free to fashion a distinct estimator 8( p, n) for each region n of data space into which p falls. Hence, there are N degrees of freedom present, as compared with but one (the single estimator) before. Therefore, the r.m.s. error e of estimation will have a reduced lower bound in Eq. (111.3). As might be expected, the information I is increased as well. See Appendix B. At this point it is important to emphasize that the modes qn(r)comprise, region-by-region over space r, the single scalar function q(r). Hence the qn(r)do not comprise a vector which, by contrast, has many qn at each r. The concept of a covariant derivative may now be used. A covariant derivative D has the defining property of transforming as a tensor (Lawrie, 1990, p. 34). That is, it transforms in a covariant way. Also, the covariant derivative of a scalar is the ordinary derivative. Hence, Eq. (111.32) may be recast as
I = 4 ∫ dr Dq · Dq.   (III.35)
Again using the trick of replacing the one integral dr by a sum over the subranges corresponding to the individual support regions of the modes q_n, Eq. (III.35) directly goes over into

I = 4 Σ_{n=1}^{N} ∫ dr Dq_n · Dq_n.   (III.36)
Hence, I is a coordinate-covariant quantity. It will not change if space is generally warped, e.g., by a gravitational field. A second form of covariance is gauge covariance. Information I must obey gauge covariance as well. See Axiom (iii) of Section IV,A.
1. Information I as a "Mother" Information

Information I is the driving force in a Poisson information equation

∇²K = I,   (III.37)

where ∇ is the gradient operator in parameter space, Eq. (II.1b). Quantity K in Eq. (III.37) is any of the information measures due to Kullback-Leibler, Jeffries, Rao, Wootters, or a host of others (Frieden, 1993, Section 1.5). Thus, I is a measure of the curvature (∇²) of any information K. So far we have been addressing issues of the estimation channel of Fig. 1. Now we make the transition to physics.
PHYSICAL INFORMATION
139
IV. PRINCIPLE OF EXTREME PHYSICAL INFORMATION (EPI)
We have shown above that the elementary act of measure-estimation (m.e.) may be described by an appropriate channel, that of Fig. 1. (We noted that this is not the familiar communication channel of Shannon theory.) The next step is to use this channel to suggest a procedure for deriving physical laws. The procedure is an outgrowth of the following hypothesis: The inability to know, i.e., a state of extreme ignorance, gives rise to the laws of physics. This seems rather strange at first, since the laws of physics are statements of extreme knowledge, and certainly not ignorance, let alone extreme ignorance. Nevertheless, we work with the following premise (EPI principle): Each physical law can be derived by the condition that any attempt at measure-estimation of an appropriate parameter gives an estimate that has minimum (more generally, extreme) physical information, even under the most optimum of measurement conditions (the characteristic state, defined earlier). The amount of information that accompanies the law is zero.   (IV.1)
Thus, the parameter estimation channel of Fig. 1 has the physical significance of being the scenario that gives rise to a physical law. Also, as described in credo (IV.1), during the m.e. procedure the modes q_n(r) that define the law are to be in the optimum state for measure-estimating the parameter, i.e., the characteristic information state. It results that the p.d.f. p(r) and the Fisher I are to take the forms (III.33b) and (III.34), respectively. These define, then, an ideal, gedanken m.e. scenario for fixing the physical law q(r), p(r). However, the following distinction must be kept in mind. The gedanken measurement experiment is a theoretical construct. Operationally, whether or not a solution q(r) to the EPI principle actually exhibits such separation, the gedanken experiment (of Fig. 2) that is the basis for EPI places them in strong separation. This implies a duality in the use of Eq. (III.33b) to compute p(r). The q_n(r) components to be used are only placed in their characteristic information state during the gedanken experiment of Fig. 2. After the solution is obtained from the EPI principle implied by the gedanken experiment, the solution is directly used in Eq. (III.33b), i.e., without enforced separation. Hence, there are two p(r) laws to consider: the one used during the gedanken experiment, and the physically observable one that results from the solution to EPI. Physical information theory rests upon a number of correspondences between Fisher information theory and Lagrangian physical theory. Chief
among these is that characteristic information (III.34) is physically realized by a special field condition. This is a free-field situation. This condition fixes the Fisher I part of a total physical information I (defined later). But, of course, a field dependence must be present in any Lagrangian that is to derive a field-dependent law of physics. To accommodate this, there is a second part J of the physical information I, and this depends upon the fields. Statement (IV.1) is called the "principle of extreme physical information" (EPI). We propose it as a distinct law of statistical physics: the meeting of estimation theory with physical theory. This is a kind of counterpart to the concept of maximum (again, extreme) entropy as the meeting of communication theory and statistical mechanics.

A. Axiomatic Approach
Principle (IV.1) and the definition of I arise out of the following four axioms:

(i) Disorder aspect. Physical information I measures the disorder of a phenomenon through a linear dependence upon Fisher information I, where the phenomenon is described by coordinate r (not necessarily a length), and where p(r) is its unknown p.d.f.

(ii) Second law aspect. I is minimized, or more generally extremized, due to formation of p(r). This parallels the second law of thermodynamics, but is not the same, since Fisher I and thermodynamic entropy are not the same quantity. That I is extremized is taken to be a new law of statistical physics. Its truth is verified by the large number of different phenomena that it derives (see following sections).

(iii) Equivalence of all physical paradigms. The value of the extremized I should be a universal invariant over all phenomena. Its value is zero. A requirement that I be constant is met by the demand that its two constituents I, J (see below) be constants. This is to be true (a) for all phenomena, and (b) for all equivalent ways of viewing a phenomenon, i.e., with respect to choice of coordinate system, gauge, reference frame velocity, etc. These are the usual demands on Lagrangian formulations, and we see that they originate in this axiom of the EPI approach. In this way, the axiom will be found to derive the requirements of coordinate covariance and gauge covariance, the Lorentz transformation group, the relativistic equivalence of energy and mass Eq. (VII.25), the Dirac equation, the Einstein equations of motion, and the 1/f power noise law.
(iv) Equivalence of conjugate coordinate spaces. The same information I should arise whether p(r) is initially expressed in direct (r-) space or in its Fourier conjugate space (coordinate p). The choice of space is subjective, and hence should not affect the information value.

B. Solution

By Axiom (i), I depends upon I. By (iv), I must also depend upon the Fourier space p representation of I,

I = ∫ dp A(p) = ∫ dr F[q(r), r] = J.   (IV.2)
The first equality, defining A(p), can arise from Parseval's theorem. The second equality represents a return to r-space based upon prior physical knowledge F about the specific scenario. In other words, it is a version of general form I that is specialized to, or contingent upon, the scenario. Hence, it is called the conditional information, and is denoted as J. For simplicity, denote

x = I,   y = ∫ dp A(p).   (IV.3)
Thus, x is specifically the r-space representation Eq. (III.34) of I, while y is the p-space representation. Of course, by (IV.2), x = y, and this will be used later to numerically evaluate I. Meanwhile, we seek the functional dependence of I upon x and y, and so temporarily regard x, y as distinct quantities. To satisfy (iv), I must be a function of x and y; also, by (i), if I is initially expressed in r-space, then I must have a general linear form in x,

I = I(x, y) = x f(y) + g(y),   (IV.4)
f(·), g(·) as yet unspecified functions. Alternatively, if I is initially expressed in Fourier space, then the roles of x and y are interchanged, and

I = I(y, x) = y f(x) + g(x).   (IV.5)
However, by (i), I must still be linear in x. Therefore, (IV.5) requires

f(x) = ax + b,   g(x) = cx + d,   (IV.6)

where a, b, c, and d are constants to be determined. Use of forms (IV.6) in (IV.4) gives

I(x, y) = x(ay + b) + cy + d.   (IV.7)
By Axiom (iii), the extremum value of (IV.7) must always be the same number. In order to accomplish this, we note that the numerical value of I(x, y) in (IV.7) will always be I(x, x), since numerically x = y by Eqs. (IV.2), (IV.3). Then the extremized I is I(x, x) extremized through choice of x. By Eqs. (III.34) and (IV.3), the latter depends upon the solution q(r), and this, in turn, depends upon scenario input F. Hence, the only way to make the extremized I a constant number is to make I(x, x) independent of x. From (IV.7),

I(x, x) = ax² + (b + c)x + d.   (IV.8)

This can only be independent of x if

a = 0,   c = −b.

Then (IV.7) becomes

I(x, y) = b(x − y) + d.   (IV.9)
This causes an information value I = I(x, x) = d, with or without the extremum condition enforced. Since, by (iii), this information value is to be zero, necessarily d = 0. Finally, any value of b will suffice in the derivations to follow, which use the I constructed here. For simplicity, then, take b = 1. In summary,

I = I − J,   I = 4 Σ_{n=1}^{N} ∫ dr ∇q_n(r) · ∇q_n(r),   J = ∫ dr F[q(r), r].   (IV.10)
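The algebra that fixed f and g above can be restated numerically. The snippet below (illustrative only, with arbitrary sample values) confirms that demanding I(x, x) be independent of x forces a = 0 and c = −b, after which the choices b = 1, d = 0 give I = x − y.

```python
# Numerical restatement of the argument leading to Eq. (IV.10):
# with I(x, y) = x*(a*y + b) + c*y + d, demanding that I(x, x) be
# independent of x forces a = 0 and c = -b; the choices b = 1, d = 0
# then give I = x - y (i.e., I = I - J).  Sample values are arbitrary.
def info(x, y, a, b, c, d):
    return x * (a * y + b) + c * y + d

# With general coefficients (here a = 1), I(x, x) varies with x.
varies = info(1.0, 1.0, a=1, b=1, c=0, d=0) != info(2.0, 2.0, a=1, b=1, c=0, d=0)

# With a = 0, c = -b, d = 0:  I(x, x) = 0 for every x,
# and I(x, y) = b*(x - y).
xs = [0.5, 1.0, 2.0, 7.0]
self_values = [info(x, x, a=0, b=1, c=-1, d=0) for x in xs]
difference = info(5.0, 3.0, a=0, b=1, c=-1, d=0)   # = 5 - 3
```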
C. Resulting Variational Principle

By Axiom (ii) and Eqs. (IV.2) and (IV.10), the physical information I obeys
I = 4 Σ_{n=1}^{N} ∫ dr ∇q_n(r) · ∇q_n(r) − ∫ dr F[q(r), r] = extremum = 0   (IV.11)
at a solution q(r). To accomplish the zero at extremization, F must have the form in Eq. (II.1a). The value of zero, in particular, makes sense on other grounds as well (Section IV,D). What extremization in Eq. (IV.11) accomplishes is to make the zero an extremum value. This has the following benefits: (a) Problem (IV.11), without an extremum condition, is potentially satisfied by an infinite number of possible solutions q(r). But a given physical scenario (defined by input F) should usually (exception in Section VII) have only one paradigm q(r). The extremum condition picks out this paradigm. (b) The extremum problem (IV.11) has a variational solution

δI = 0   (IV.12)
that is satisfied by a differential range δq(r) of laws about the solution q(r). This is a statement of stationarity, or stability, for the solution law q(r). A requirement of stability is often reasonable, and is the basis for past Lagrangian extremum approaches. By Eq. (IV.11), I is explicitly the difference between two information forms: The first is the Fisher information form, Eq. (III.34). This is the same form for all physical scenarios. The second information term in (IV.11) is the form J = ∫ dr F, which depends upon the scenario and is therefore called the "conditional" physical information. F is then its density. (Analogies of these quantities with an information divergence measure are discussed in Section IV,E.) Definition (IV.11) of I has the major benefit that its extremum/zero solutions q define the correct physical laws for the scenario.

D. Why Zero Information, Physically?
Axiom (iii) and Eq. (IV.11) are interesting in that they define the physical information I to be precisely zero at a physical law solution q(r). Moreover, the effect is universal: It holds for all physical scenarios defined by functions F. A universal principle of zero information is a kind of "principle of austerity," although it differs from Wheeler's (1988) version. Is there a precedent for a state of zero universal information? One possibility is the following effect, conjectured by Hawking (1988): The total mass-energy of the universe is zero. (The positive energy mc² due to each mass m, with c the speed of light, is canceled by the negative, attractive field energy surrounding the mass.) See also Vilenkin (1982), who postulates that the universe was created out of pure vacuum. On the other hand, "... there is no such thing as disembodied information, information in the abstract. Information, of whatever kind, must be associated with matter, radiation or fields of some sort" (Bekenstein, 1990). If so, the combined implication with Hawking's conjecture of zero mass-energy overall is that the information level overall should be zero. The alternative is a situation of infinite information density (information/mass-energy), which seems implausible. Finally, since a physical law q(r) operates on a universal scale, it seems reasonable to associate the zero information level with the law. More compelling reasons exist, as well. That information I is zero has the immediate benefit of implying the Lorentz group of transformations of special relativity for components q(r) (Section V). Also, zero information is central to the derivation of Dirac's equation of relativistic quantum mechanics (Section VII). Finally, Eq. (IV.11) gives the same value of information (again zero) regardless of
which space (r or p) is initially used for representation of I. This satisfies Axiom (iv). Next, we relate I to a more general information measure that is, in general, not zero. (In a first reading, the section may be skipped.)
E. I as the Self-Distance of an Information Divergence Measure

An "information divergence" is a functional measure of the distance between two p.d.f.'s. A famous example is that of Kullback-Leibler information, defined in Appendix C, Eq. (C.1). We show, next, that I is the "self-distance" of an associated information divergence, one which shares many properties with other information divergences. As is standard in the study of information divergences, imagine each mode q_n(r) to be defined by a trial set of parameters θ′_{ni},

q_n(r) = q_n(r | θ′_n),   θ′_n ≡ θ′_{n1}, θ′_{n2}, ...,   θ′_1, ..., θ′_N ≡ θ′,   (IV.13)

the latter the total set of parameters defining a trial solution q or p.d.f. p. As an example, the θ′_{ni} can be coefficients in an orthonormal series representation

q_n(r | θ′_n) = Σ_{i=1}^{∞} θ′_{ni} φ_i(r)   (IV.14)
for q_n(r). The φ_i(r) are any complete set of orthogonal functions. The solution q(r) to a given physical problem, defined by function F in Eq. (IV.11), is given by the particular set of coefficients θ′ = θ. The variational procedure used to solve (IV.11) would now be variation of parameters θ′, rather than by direct variation of the q_n(r) through Euler-Lagrange equations. The two problems are equivalent. Define the "physical information divergence" I(θ, θ′) between trial solution θ′ and actual solution θ as

I(θ, θ′) = 4 Σ_{n=1}^{N} ∫ dr ∇q_n(r | θ′_n) · ∇q_n(r | θ′_n) − ∫ dr F[q(r | θ′), r].   (IV.15)
The right-hand side is merely the evaluation of Eq. (IV.11) at a general trial solution θ′. Hence, neither of the conditions (extremum, or 0) of Eq. (IV.11) will be met by θ′ unless θ′ = θ, the solution. An information divergence D(θ, θ′) typically obeys the following properties (among others; see Amari, 1985):

(i) D(θ, θ) = 0; zero self-length.

(ii) ∂D(θ, θ′)/∂θ′_{ni} = 0 at θ′ = θ, all n, i; extremum property.
(iii) ∂²D(θ, θ′)/∂θ′_{ni}∂θ′_{nj} = g_{nij}(θ) at θ′ = θ, where g_{nij} is the metric tensor for mode q_n in the space θ′. This is a measure of the curvature of information space θ′ for mode q_n.

(iv) Metric g_{nij}(θ) = ∂²w(θ)/∂θ_{ni}∂θ_{nj}, in terms of a "potential function" w(θ).

An example of a measure D(θ, θ′) is Kullback-Leibler cross-entropy. We show in Appendix C that physical information divergence I(θ, θ′), Eq. (IV.15), obeys properties (i)-(iv). Hence, physical information I, given by Eq. (IV.11), is the particular value, at extremum solution θ′ = θ, of an information divergence quantity I(θ, θ′). It is also shown that the metric tensor for mode n obeys

g_{nij} = g_{ij} = − ∫ dr φ_i(r)[∇² + f²(r)]φ_j(r),   (IV.16)

independent of n. Quantity f defines the physical scenario through its definition (C.4) in terms of F. See also Eq. (II.1a). Equation (IV.16) defines a quantum-like matrix element that is valid, however, over the wider range of physical scenarios F.
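Properties (i)-(iii) can be illustrated concretely (an example of ours, not from the text) with the Kullback-Leibler divergence between two equal-width Gaussians differing only in mean, for which D(θ, θ′) = (θ − θ′)²/(2σ²) in closed form; its curvature at θ′ = θ is exactly the Fisher metric 1/σ² of the location family.

```python
# Illustration (not from the text) of divergence properties (i)-(iii),
# using the Kullback-Leibler divergence between two Gaussians of equal
# width sigma that differ only in mean:
#   D(theta, theta') = (theta - theta')^2 / (2 sigma^2).
# Then D(theta, theta) = 0 (i), dD/dtheta' = 0 at theta' = theta (ii),
# and d^2 D / dtheta'^2 = 1/sigma^2 (iii), the Fisher metric of the
# location family.
sigma = 2.0
theta = 1.0

def D(theta_prime):
    return (theta - theta_prime)**2 / (2 * sigma**2)

self_distance = D(theta)                                 # property (i): 0
h = 1e-5
slope_at_theta = (D(theta + h) - D(theta - h)) / (2 * h)  # property (ii): ~0
curvature = (D(theta + h) - 2 * D(theta) + D(theta - h)) / h**2  # (iii)
```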
F. Comparison with Huber's Probability Law-Estimation Procedure: Estimation Becomes Derivation

A variational principle of analogous form to Eq. (IV.11) has, in the past, been proposed and occasionally used by statisticians (Huber, 1981). Their aim has been to uniquely estimate an unknown p.d.f. p(r) in the presence of data that are insufficient to form a unique estimate. Given the ambiguity, an empirically smoothest estimate of p(r) is then sought that is consistent with the data. Viewed in this light, the first term in (IV.11) is used to force the empirically smooth estimate, while the second term represents the insufficient data used as a constraint. The mathematics are, then, analogous to ours, although the interpretation is much different. Hence, the new information approach (IV.11) can be analogously interpreted to mean that physical laws are formed as the "outputs" of a statistical estimation procedure, where the shape p(r) of a given law is a tradeoff between absolute smoothness (due to the first term in (IV.11)) and a "constraint" (the second term) that expresses one physical fact F about the given scenario. The one fact is insufficient to fix p(r), as in the preceding. Since smoothness measures disorder (Section III,G), the output law p(r) then expresses maximal disorder consistent with the single physical fact. A departure from Huber's approach is the physical significance of the zeros q(r) of (IV.11). Huber's solutions are limited to extrema. Of course, the
main departure is that, whereas Huber's outputs p(r) are merely estimates, the EPI outputs from (IV.11) are precise physical laws. Estimation becomes derivation.

G. Agenda for Derivations to Follow
Principle (IV.11) will be used below (Sections VI-XI) to derive the physical laws for many classical and quantum scenarios. (Section V uses Axiom (iii) above.) To clarify these derivations, steps (1)-(4) below will be uniformly followed. These steps may alternatively be regarded as defining a general procedure for finding an unknown p.d.f. p(r) under new physical circumstances. In this context, the derivations that follow are verifications of the procedure, since they confirm known laws p(r) (for the most part).

(1) Identify the physical quantity of the scenario that is a probability density function p(r). This requires as well the definition of the gedanken measurement coordinates p, θ, r of Eq. (III.28). As examples: In quantum mechanics, p is the observed position of a particle, with p(r) the p.d.f. on position fluctuation r from the classical particle position θ (Section VII). In gas theory, p is the observed velocity of a particle and p(r) is the p.d.f. on velocity fluctuation r from a net drift velocity θ (Frieden, 1993).

(2) Identify the component functions q(r) of Eq. (III.33b) with physical amplitudes or field functions appropriate to the scenario. As examples: In quantum mechanics these are the usual probability amplitudes (Section VII). In electromagnetic theory these are the four-potentials consisting of the vector and scalar potentials (Section VI). Quantity N is left as a free parameter that is fixed, at the end, on the basis of sufficiency.

(3) With the q(r) so identified, the scalar information I given by Eq. (III.34) can be formed. Then, with r physically identified, I can be expressed in the Fourier space that is conjugate to r (e.g., momentum space; see Eq. (VII.21)). Next, the result can be re-expressed (e.g., Eq. (VII.27)) as an average in r-space,

I = ∫ dr F[q(r), r],   (IV.17)

where F is a known function. Alternatively, through the use of Axiom (iii), I can sometimes be expressed directly as (IV.17), without the use of intermediary Fourier space.

(4) With F so identified, it is used in principle Eq. (IV.11) to form the solution q(r) for the scenario.
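Steps (3) and (4) can be walked through on a toy one-dimensional problem. In the sketch below, the conditional information density F = k²q² is an invented example (it is not one of the physical scenarios treated in the text). The Euler-Lagrange equation of the resulting EPI integrand 4q′² − k²q² is 8q″ = −2k²q, i.e., q″ = −(k²/4)q, solved by q(r) = cos(kr/2); this is confirmed by finite differences.

```python
import numpy as np

# Toy walk-through of agenda steps (3)-(4), with an invented conditional
# information density F = k^2 q^2 (purely illustrative; not one of the
# physical scenarios of the text).  The EPI integrand 4 q'^2 - k^2 q^2
# has Euler-Lagrange equation 8 q'' = -2 k^2 q, i.e. q'' = -(k^2/4) q,
# solved by q(r) = cos(k r / 2).  Confirm by finite differences.
k = 3.0
r = np.linspace(0.0, 2 * np.pi, 20001)
dr = r[1] - r[0]
q = np.cos(k * r / 2)

# Second derivative at interior points, then the Euler-Lagrange residual.
q_rr = (q[2:] - 2 * q[1:-1] + q[:-2]) / dr**2
residual = np.max(np.abs(8 * q_rr + 2 * k**2 * q[1:-1]))
```

The residual is at the level of the finite-difference truncation error, confirming that the candidate q extremizes the toy EPI functional.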
We next apply steps (1)-(4) above to various physical scenarios, with the aim of deriving the equality laws p(r), q(r) appropriate to each. The derivations cover Sections V-XI and are independent applications of the agenda (1)-(4). Hence, they may be read in any order. The derivations may alternatively be viewed as a probability law-estimation procedure (see Section IV,F preceding). Finally, a class of inequality laws (uncertainty relations, and a bound to entropy increase) are derived in Sections VIII and XI, again using the Fisher information concept.
V. SPECIAL RELATIVITY
We show here that Axiom (iii) of Section IV,A implies the Lorentz transformation group of special relativity. Moreover, this is to hold for any physical set of modes q, indicating that special relativity is basic to every field of physics. In this way, a framework for subsequent derivations (Maxwell's equations, Dirac equation, etc.) will be constructed. Let p(r) be any physical p.d.f., with q(r) its modes. It is constructive to rewrite the basic Eqs. (III.34) and (III.33b) for information I and p(r) using the Einstein implied summation convention,

I = 4 ∫ dr ∂_i q_n ∂_i q_n,   (V.1)

p(r) = q_n(r) q_n(r).   (V.2)

We use N = 4 modes,

q = (q_1, ..., q_4).   (V.3)

Also, dimension K = 4, with

r = (x_1, x_2, x_3, x_4),   (x_1, x_2, x_3) = (x, y, z),   x_4 = ict,   (V.4)

so that unknown parameters θ number four as well. The derivative operator ∂_i obeys

∂_i ≡ ∂/∂x_i,   i = 1, 2, 3, 4.   (V.5)

Equations (V.1) and (V.2) naturally express Fisher information I and p.d.f. p as inner products. We next show that Axiom (iii), invariance of I over all phenomena, implies the Lorentz group of coordinate transformations. Consider the gedanken estimation experiment of Figs. 1 and 2 to be performed in a flat-space laboratory coordinate system, but viewed in a reference frame moving at constant velocity v along (say) the x-direction.
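The frame invariance about to be demanded can be previewed numerically. The check below is an illustration of ours, with arbitrary numbers: it verifies that the proper Lorentz boost, written in the x_4 = ict convention and given later as Eq. (V.14), preserves four-vector inner products and has unit determinant.

```python
import numpy as np

# Illustrative check that the proper Lorentz boost, in the x4 = i c t
# convention used here, preserves four-vector inner products (as Axiom
# (iii) demands) and is a proper (det = +1) transformation.  The matrix
# is the one written later as Eq. (V.14); numbers are arbitrary.
c = 1.0
v = 0.6
gamma = 1.0 / np.sqrt(1 - v**2 / c**2)

A = np.array([[gamma, 0, 0, 1j * gamma * v / c],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [-1j * gamma * v / c, 0, 0, gamma]])

x = np.array([1.0, 2.0, 3.0, 1j * 4.0])   # (x, y, z, i c t), arbitrary event
x_prime = A @ x

norm = np.sum(x * x)                # invariant: x^2 + y^2 + z^2 - c^2 t^2
norm_prime = np.sum(x_prime * x_prime)
unit_det = np.linalg.det(A)         # proper transformation: det = +1
```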
Axiom (iii) must hold in either coordinate system, since it holds for all phenomena. Thus,

I′ = I,   (V.6)

where the prime denotes the quantity as observed in the moving system. This states that the physical information

I = 4 ∫ dr ∂_i q_n ∂_i q_n − ∫ dr F[q_n, r]   (V.7)

must be invariant to reference frame. Furthermore, since the two right-hand terms are equal, each must be separately invariant. The first term I, in particular, is an integral of scalar inner products. It will be invariant to reference frame, by definition, if

∫ dr′ ∂′_i q′_n ∂′_i q′_n = ∫ dr ∂_i q_n ∂_i q_n.   (V.8)
Primes denote quantities as observed from the moving system, and

∂′_i ≡ ∂/∂x′_i,   i = 1, 2, 3, 4.   (V.9)

Invariance (V.8) will be obeyed if there exists one 4 × 4 transformation matrix [A] of coordinates r and the derivative vectors ∂_i q_n (a vector for each n), where

r′ = [A]r,   (V.10)

∂′_i q′_n = [A]⁻¹ ∂_i q_n,   (V.11)

such that both the volume elements

dr′ = dr   (V.12)

and inner products (norms)

∂′_i q′_n ∂′_i q′_n = ∂_i q_n ∂_i q_n,   n fixed = 1, 2, 3, or 4,   (V.13)

remain invariant. There are many solutions (Jackson, 1975) for [A] obeying requirements (V.10)-(V.13). (Also, see explanation in footnote.*) These

* In the derivation in Jackson (1975), his four-vector x corresponds to our derivative vector ∂_i q_n, n fixed, i = 1-4. Also, our inner product requirement (V.13) corresponds to Jackson's norm requirement (11.85). Thus, we derive the Lorentz group of transformations using ∂_i q_n as a basis vector instead of the usual quantity x. Each n-value gives the same answer, the Lorentz group. The key equation (V.13) is valid for the following reasons (exterior to the preceding proof). A derivative vector ∂_i q_n, n fixed, is well known to transform as a covariant derivative, i.e., a four-vector, in flat space. See, e.g., Jackson (1975), p. 31, Eq. (2.20). Thus, ∂_i q_n has a proper length, and this is given by the invariant equation (V.13). That derivatives transform according to (V.11) is given in Lawrie (1990), p. 61. It is also implied by Eqs. (V.10) and (V.12).
are called the Lorentz group of transformations. The most well-known is

[A] = [  γ       0   0   iγv/c ]
      [  0       1   0   0     ]
      [  0       0   1   0     ]
      [ −iγv/c   0   0   γ     ],   γ ≡ (1 − v²/c²)^(−1/2),   (V.14)
where c is the speed of light in vacuum. (That c should be a constant is derived in Section IX.) This is the ordinary, proper Lorentz transformation. The invariance of I has an intuitive appeal. It states that the information, or accuracy, in determining θ should not depend upon absolute speed. There is no preferred speed for estimating a four-parameter θ. A Lorentz transformation [A] makes the inner product of any four-vector frame-invariant. We worked with the particular four-vectors r and ∂_i q_n, n fixed, but another example is the four-vector q itself. Its inner product is p(r), by Eq. (V.2). Hence, the output probability density p(r) is also frame-invariant. Also, we demanded the first integral I in (V.7) to be frame-invariant. Then, since the second integral J in (V.7) equals the first (at solution q), conditional information J must likewise be frame-invariant. In summary, Axiom (iii) of invariance of I to phenomena leads naturally to its invariance to reference frame, and this gives rise to the Lorentz group of transformations. All major components p(r), I, J, and I of the theory obey Lorentz invariance, i.e., covariance. Thus, the concept of physical information I forms a natural bridge into the special theory of relativity. Many of the derived laws p due to the information principle Eq. (IV.11) are explicitly Lorentz invariant as well (Sections VI and VII).

VI. CLASSICAL ELECTRODYNAMICS

The previous scenario suggests that a covariant (four-vector) approach should be taken. We follow the agenda of Section IV,G. In step (1) let the gedanken measurement experiment be the estimation of the mean space-time coordinate

θ = (θ_1, θ_2, θ_3, θ_4),   θ_4 = ic⟨t⟩,   (VI.1)

over the "particles" of an electromagnetic field, i.e., photons. The measurement channel is Fig. 1. The estimated θ̂ is based upon one space-time measurement

p = θ + r   (VI.2)
of photon position in the field. All quantities are four-space coordinates, with

r = (x_1, x_2, x_3, x_4) = (x, y, z, ict).   (VI.3)

Thus, dimension K = 4. Also, define a four-current vector in the usual way,

J = (j, icρ),   (VI.4)

where j is the current density and ρ is the charge density.

A. Characteristic State
The density p(r) of the field particles is identified with the Poynting flux density rate, since this measures the space-time density of photons. Of course, the expression for the Poynting flux in the presence of a general source J does not follow the characteristic information form Eq. (III.34) for I required by the information approach. However, as discussed below statement (IV.1), in the gedanken scenario sources are assumed to be absent. Then the Poynting flux becomes proportional to the square of the complex four-potential (Morse and Feshbach, 1953, p. 223)

q = (q_1, q_2, q_3, q_4) = (A, φ),   (VI.5)
where A is the vector potential and φ is the scalar potential. This is step (2) of the agenda. Probability law p(r) is now in the required summation form Eq. (III.33b),

p(r) = Σ_{n=1}^{4} q_n²(r)   (VI.6)

(with N = 4). An irrelevant proportionality constant has been made unity, for simplicity. The distinction between the gedanken p(r) and the physical p(r) (here the Poynting flow rate) should continue to be kept in mind (see Section IV). Thus, the gedanken p(r) is constructed from the q(r) as if it obeyed Eq. (III.33b). However, the physical p(r) is formed from the q(r) (VI.5) as, directly, the Poynting flow rate. The foregoing has accomplished steps (1) and (2) of the agenda of Section IV,G. Step (3) is as follows. With N = 4, the Fisher information obeys the four-space version of Eq. (III.34),

I = 4 Σ_{n=1}^{4} ∫ dr □q_n(r) · □q_n(r),   (VI.7)
where □ is the "box" operator

□ ≡ (∂/∂x_1, ∂/∂x_2, ∂/∂x_3, ∂/∂x_4),   (VI.8)

so that, by Eq. (VI.3), □² = ∇² − (1/c²)∂²/∂t² is the d'Alembertian. We now seek the function F in Eq. (IV.11) that forms conditional information J.
B. Conditional Information J and Solution q

We proceed to step (3) of the agenda. For any one order n, let

M_n ≡ ∫ dr □q_n(r) · □q_n(r),   (VI.9)

by definition (VI.8). To evaluate the value M_n of (VI.9), take recourse to the Fourier space representation of q_n(r),

q_n(r) = ∫ dp Q_n(p) exp(ip · r),   p_k real, k = 1, 2, 3,   p_4 imaginary.   (VI.10)

Then, applying Parseval's theorem to (VI.9) casts it in frequency space as

M_n = ε_n Σ_{i=1}^{4} ∫ dp p_i² |Q_n(p)|²,   ε_n = +1 for n = 1, 2, 3,   ε_n = −1 for n = 4.   (VI.11)

Here we have used the fact that q_n(r) is purely real for n ≤ 3, and purely imaginary for n = 4. Interchanging orders of summation and integration gives

M_n = ε_n ∫ dp p² |Q_n(p)|²,   (VI.12)

p² ≡ Σ_{i=1}^{4} p_i².   (VI.13)
M,, =
-
s
dr(~2qn(r))qn(r)-
(VI.14)
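The Parseval manipulations taking (VI.9) to (VI.14) can be checked in a one-dimensional analogue (an illustration of ours, with ordinary d/dr in place of □ and an arbitrary Gaussian mode): the integral of (q′)² equals the conjugate-space second moment of |Q|², and equals −∫ dr q q″ after integration by parts.

```python
import numpy as np

# One-dimensional analogue (illustrative only) of the Parseval steps that
# take Eq. (VI.9) to Eq. (VI.14).  With q a smooth, rapidly decaying mode:
#   integral of (dq/dr)^2 dr
#     = (1/2pi) * integral of k^2 |Q(k)|^2 dk   (Parseval)
#     = -integral of q d2q/dr2 dr               (integration by parts).
N = 4096
L = 40.0
r = np.linspace(-L / 2, L / 2, N, endpoint=False)
dr = r[1] - r[0]
q = np.exp(-r**2 / 2)

direct = np.sum(np.gradient(q, dr)**2) * dr

k = 2 * np.pi * np.fft.fftfreq(N, dr)
dk = 2 * np.pi / (N * dr)
Q = np.fft.fft(q) * dr                 # approximates the continuous transform
fourier = np.sum(k**2 * np.abs(Q)**2) * dk / (2 * np.pi)

q_rr = np.gradient(np.gradient(q, dr), dr)
byparts = -np.sum(q * q_rr) * dr
```

All three agree with the exact value √π/2 for this Gaussian to finite-difference accuracy.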
Therefore, summing Eq. (VI.14) over n gives, by Eq. (VI.7),

I = −4 Σ_{n=1}^{4} ∫ dr (□²q_n(r)) q_n(r) = −4 ∫ dr q · □²q.   (VI.15)
We can now form the physical information. By Eq. (IV.11) it is the difference between the general Fisher I and what I equals for the physical (not gedanken) scenario. In the physical scenario, the sources J now enter in. By Eqs. (VI.7) and (VI.15),

I = 4 Σ_{n=1}^{4} ∫ dr □q_n · □q_n + 4a ∫ dr q · □²q = 0.   (VI.16)
The new parameter a is introduced anticipating that □²q will next be expressed in terms of a function F having arbitrary units. Represent

□²q = −F(J, q),   or   Σ_{k=1}^{4} q_{n,kk} = −F_n(J, q),   n = 1, ..., 4,   (VI.17)

where q_{n,kk} denotes ∂²q_n/∂x_k².
This brings in the sources J, as required. Four-function F is to be found. We use a self-consistency argument for this purpose. Substitute Eq. (VI.17) into (VI.16), giving

I = 4 Σ_{n=1}^{4} ∫ dr □q_n · □q_n − 4a ∫ dr q · F.   (VI.18)
According to the EPI approach (IV.11) this is to be extremized to give q. Accordingly, use the Euler-Lagrange equations

Σ_{k=1}^{4} (d/dx_k)(∂ℒ/∂q_{n,k}) = ∂ℒ/∂q_n,   n = 1, ..., 4,   (VI.19)

where ℒ is the integrand of Eq. (VI.18):

ℒ = 4 Σ_n □q_n · □q_n − 4a q · F.   (VI.20)

The result is a solution

□²q_n = −(a/2)[F_n + Σ_m q_m (∂F_m/∂q_n)],   n = 1, ..., 4.   (VI.21)

Compare this result with the right-hand Eq. (VI.17). By consistency the two should be the same. Directly, they are if a and F satisfy

(a/2)[F_n + Σ_m q_m (∂F_m/∂q_n)] = F_n.   (VI.22)
The simplest solution is

∂F_m/∂q_n = 0,   a = 2.   (VI.23)

Thus, F is not a function of q, or F = F(J) alone. The simplest example of such a function is

F(J) ∝ J = (4π/c) J,   (VI.24)

c the speed of light. This choice also has the advantage of being a four-vector, and hence Lorentz covariant (see later discussion). Thus, on the basis of consistency, Lorentz covariance, and simplicity, Eq. (VI.17) gives a solution

□²q = −(4π/c) J.   (VI.25)
This accomplishes step (4) of the agenda. Equation (VI.25) is the e.m. wave equation in the Lorentz gauge, one important milestone of electromagnetic theory. We note that it is a vector equation in four-vectors q, J. This makes the equation Lorentz covariant, as required of any physical quantity q (see Section V). It is known (Jackson, 1975, p. 220) that the combination of the wave equation (VI.25) and the Lorentz gauge condition give rise to Maxwell's equations. This is shown in Appendix D. Because Eq. (VI.25) is Lorentz covariant, so are the Maxwell's equations derived from it, as required. In this way, the EPI principle (IV.11) can be used to derive classical e.m. theory. It is interesting to consider alternatives to (VI.23) as solutions to (VI.22). One class of such solutions is

a = 1,   F_n = q_n G(J)   (VI.26)

for some function G. In this case Eq. (VI.17) gives

Σ_{k=1}^{4} q_{n,kk} = −q_n G(J),   n = 1, ..., 4,   (VI.27)

as a "new" e.m. theory. Now the right-hand side must be expressible as a four-vector (see material following Eq. (VI.25)). Then since q_n is already a four-vector, necessarily its multiplier G(J) is a scalar in J, or is independent of J. Either alternative is unreasonable physically. If G(J) is a scalar function of J, then each component q_n in (VI.27) has the same general solution. This is too specialized an answer for a general scenario J. Alternatively, if G(J) is independent of J, then the general equation (VI.27) is independent of sources. Again, this cannot represent the general e.m. scenario.
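The source-free case of Eq. (VI.25) can be verified by finite differences in a one-dimensional scalar analogue (an illustration of ours): any travelling wave q(x, t) = f(x − ct) satisfies ∂²q/∂x² − (1/c²)∂²q/∂t² = 0.

```python
import numpy as np

# Finite-difference check (one-dimensional scalar analogue, illustrative)
# of the source-free case of Eq. (VI.25): a travelling wave
# q(x, t) = f(x - c t) satisfies  d2q/dx2 - (1/c^2) d2q/dt2 = 0.
c = 2.0
x = np.linspace(-5.0, 5.0, 801)
t = np.linspace(0.0, 1.0, 801)
dx, dt = x[1] - x[0], t[1] - t[0]
X, T = np.meshgrid(x, t, indexing="ij")

q = np.exp(-(X - c * T)**2)          # a travelling Gaussian pulse

# Interior second differences in x and in t, then the wave-equation residual.
q_xx = (q[2:, 1:-1] - 2 * q[1:-1, 1:-1] + q[:-2, 1:-1]) / dx**2
q_tt = (q[1:-1, 2:] - 2 * q[1:-1, 1:-1] + q[1:-1, :-2]) / dt**2
residual = np.max(np.abs(q_xx - q_tt / c**2))
```

The residual is small relative to the unit amplitude of the pulse, at the level of the second-difference truncation error.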
VII. QUANTUM MECHANICS

Usually quantum mechanics is introduced in the non-relativistic scenario, for reasons of simplicity. The relativistic Dirac equation is usually more formidable to derive than the nonrelativistic Schrodinger wave equation (SWE). Here, however, it is simpler to first derive the relativistic versions (Klein-Gordon and Dirac equations). This is because EPI is naturally a covariant, vector theory (see preceding sections). In fact, EPI demands all physical theory q to be covariant (see Section V). Hence, a four-dimensional approach will be taken, K = 4, with time as the fourth coordinate. This has the virtue of naturally giving the time dependence for solution q. Previous, three-dimensional versions of EPI derived the stationary, time-independent form of the SWE. Time dependence had to be tacked on in a somewhat ad hoc manner at the end (see Frieden, 1990, 1991b, 1993). The EPI approach allows a general number N of modes q to be present at the outset. EPI is naturally a vector theory (as Dirac requires). In all physical scenarios, parameter N is fixed at the end as the minimum number needed to describe the solution q. It will naturally result that a value N = 1 suffices for the Klein-Gordon solution, whereas value N = 4 is the minimum value required by the Dirac solution. Complexity itself follows from the many-estimators θ̂_1, ..., θ̂_N effect defined in Section III,H. The nonrelativistic limit will be taken at the end, to give the Schrodinger wave equation. As mentioned earlier, a 3-D EPI approach may be used to directly give the stationary SWE, thereby avoiding the relativistic derivation (Frieden, 1990, 1991b). Hence, a stationary theory naturally arises from a 3-D EPI approach, while the full time-dependent theory arises from a 4-D EPI approach. EPI works at any level of dimension. Of course, the 4-D approach predicts spin as well. As an unexpected dividend, the relativistic energy balance Eq. (VII.25) will also be derived.

A. Gauge Covariance
It was shown (Section III,H) that Fisher information I is coordinate covariant. By Axiom (iii), I should be gauge covariant, and since J ∝ I, so should J. As is known (Lawrie, 1990, pp. 159, 164-165), a Lagrangian may be made gauge covariant by initially forming it in a free-field scenario, and then replacing its partial derivatives ∇ and ∂/∂t as

$$\nabla \to \nabla - \frac{ie\mathbf{A}}{c\hbar}, \qquad \frac{\partial}{\partial t} \to \frac{\partial}{\partial t} + \frac{ie\phi}{\hbar}. \quad \text{(VII.1)}$$
Here ∇ is the specifically three-dimensional gradient operator (VII.6), made boldface in the original to distinguish it from the generally K-dimensional operator (II.1b).
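The gauge behavior of the replacements (VII.1) can be checked symbolically: a gauge change of the potential, together with the usual compensating phase change of the mode, leaves the covariant derivative form-invariant. The following is a minimal illustrative sketch (my own, not from the text; the function names are arbitrary), testing the one-dimensional spatial part:

```python
import sympy as sp

x = sp.symbols('x', real=True)
e, c, hbar = sp.symbols('e c hbar', positive=True)
A = sp.Function('A')(x)       # one component of the vector potential
chi = sp.Function('chi')(x)   # arbitrary gauge function
psi = sp.Function('psi')(x)   # a mode function

def D(f, Afield):
    # Gauge-covariant spatial derivative of (VII.1), one dimension
    return sp.diff(f, x) - sp.I*e*Afield/(c*hbar)*f

# Gauge transformation: A -> A + chi', psi -> exp(i e chi/(c hbar)) psi
phase = sp.exp(sp.I*e*chi/(c*hbar))
lhs = D(phase*psi, A + sp.diff(chi, x))   # transformed derivative of transformed mode
rhs = phase*D(psi, A)                     # phase times the original derivative

assert sp.simplify(lhs - rhs) == 0        # the derivative transforms covariantly
```

The multiplier e/(cℏ) in (VII.1) is exactly what makes the phase factor cancel the gauge term ∇χ.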
PHYSICAL INFORMATION
Also, 𝐀 and φ are the electromagnetic potentials; see Eqs. (VI.5), (D.2); ℏ is Planck’s constant h divided by 2π, and e is the charge on the particle. The multipliers (e/cℏ) and (e/ℏ) of 𝐀 and φ in Eq. (VII.1) must be constants (Lawrie, as before). It will be shown that quantities c and ℏ are universal constants (Sections IX and VII,E, respectively). Hence, charge e must also be a universal constant. Luckily, Fisher information I depends on precisely the derivatives (VII.1). This permits it to be made gauge covariant in the same way. This is another attractive property of Fisher information from the physical point of view. The free-field scenario dovetails, as well, with our requirement that I be formed in a scenario for which it is ordinarily maximal, i.e., the characteristic information state (Section III,H). We will see that, in all physical scenarios, it is precisely a free-field scenario that gives rise to the characteristic information state. (For example, see preceding Section VI,A.)

We again follow the EPI procedure (1)-(4) of Section IV,G. The physical scenario is one of a material particle in a general e.m. field of potentials 𝐀, φ. The gedanken scenario places the material particle in a free field. (See preceding paragraphs.) Then a transit to the general e.m. field is made, at the end, through the replacements (VII.1). The aim is to determine the physics of the particle, as defined by its modes q.

Step (1) is to define coordinate space, which is, initially, (x, y, z) at a given time t. Let the parameter estimation in Fig. 1 now be of the ideal, or mean, position (θ₁, θ₂, θ₃) of the particle at time t. This would imply that coordinate space is the usual (x, y, z) three-dimensional space. However, we have to dispense with a three-dimensional theory. Section V showed that, because of Axiom (iii) of invariance of I to reference frame, all coordinates of the theory must be cast in four-vector form.
Hence, define a four-position vector (x₁, x₂, x₃, x₄),

$$x_1 = ix, \quad x_2 = iy, \quad x_3 = iz, \quad x_4 = ct, \qquad (x_1, x_2, x_3) = i\mathbf{r}, \quad \mathbf{r} = (x, y, z). \quad \text{(VII.2)}$$

The dimension of coordinate space is now K = 4, and we have made a four-vector out of the usual (x, y, z, t) real coordinates by the usual insertion of imaginary i’s. This means that, effectively, the measurement problem of Fig. 1 is transformed into one of determining the ideal space and (now) time coordinates

$$\theta_1, \theta_2, \theta_3, \theta_4 \quad \text{(VII.3)}$$

of a material particle. The θₙ are defined as in Eqs. (VII.2), with the first three components imaginary. Hence, what was initially a three-dimensional estimation problem has been transformed, by the demands of covariance, into a four-dimensional one.
This brings us to an interesting issue. As in Eqs. (VII.2), the first three components θ₁, θ₂, θ₃ are imaginary numbers. How does the EPI principle accommodate imaginary parameters? This was taken up in Section III,C,1. The real coefficients x, y, z of the imaginary data are simply used to form the estimates. Also, the resulting Fisher information (III.10g) still has significance as the information specifier of the data. The upshot is that in Eq. (III.34) some of the components ∂qₙ/∂xᵢ of the gradient are now imaginary. These give negative contributions to I when the dot product is taken. (These negatives are in fact required to form the correct Lagrangian for the problem.) Hence, we seek modes q in a (now) four-dimensional space where, by Eq. (III.33b),

$$p(x, y, z, t) = \sum_{n=1}^{N} q_n^2(x, y, z, t). \quad \text{(VII.4)}$$
Here we use the fact that the p.d.f. p of a complex number (ix, iy, iz, ct) is equal by definition to the joint p.d.f. of its real components (Frieden, 1991a). Since (VII.4) defines modes q, these are defined to have this property as well. The Fisher I corresponding to a p.d.f. (VII.4) is, by Eq. (III.34),

$$I = 4c \sum_{n=1}^{N} \iint dx\,dy\,dz\,dt\; \nabla q_n \cdot \nabla q_n, \quad \text{(VII.5a)}$$
using the notation (VI.8), (VII.2). We want to explicitly show the space and time dependencies. Defining a three-dimensional (ordinary) gradient

$$\nabla \equiv (\partial/\partial x,\; \partial/\partial y,\; \partial/\partial z), \quad \text{(VII.6)}$$

by Eqs. (VII.2), Eq. (VII.5a) becomes

$$I = 4c \sum_{n=1}^{N} \iint d\mathbf{r}\,dt \left[ -\nabla q_n \cdot \nabla q_n + \frac{1}{c^2}\left(\frac{\partial q_n}{\partial t}\right)^2 \right]. \quad \text{(VII.7)}$$
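The sign structure produced by the imaginary coordinates (VII.2) can be checked with a two-line symbolic computation: with x₁ = ix the chain rule gives ∂q/∂x₁ = qₓ/i, and with x₄ = ct it gives ∂q/∂x₄ = q_t/c, so the Euclidean sum of squared derivative components becomes the hyperbolic combination −qₓ² + q_t²/c². A minimal sketch (my own illustration, not from the text; the symbols q_x, q_t stand for the real derivatives ∂q/∂x, ∂q/∂t):

```python
import sympy as sp

qx, qt = sp.symbols('q_x q_t', real=True)   # stand-ins for dq/dx, dq/dt
c = sp.symbols('c', positive=True)

# Chain rule with the coordinates (VII.2): x1 = i x gives dq/dx1 = q_x/i,
# and x4 = c t gives dq/dx4 = q_t/c.
dq_dx1 = qx/sp.I
dq_dx4 = qt/c

# The Euclidean sum of squares is then the hyperbolic form -q_x^2 + q_t^2/c^2
assert sp.simplify(dq_dx1**2 + dq_dx4**2 - (-qx**2 + qt**2/c**2)) == 0
```

This is the mechanism by which the Lagrangian acquires its relativistic (d'Alembertian-type) signature.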
B. Transition to Complex Modes

In contrast with our purposeful use of imaginary coordinates (VII.2) (demanded by covariance), the real modes q will naturally pack into complex pairs. Whereas covariance must be built in, wave complexity occurs automatically. A further benefit is a natural interpretation of the complex modes as probability amplitudes. There is no need for the usual ad hoc assumption (due to Born) of this effect. These results are shown next.
Define complex wave functions, from the real modes q, as

$$\psi_n = q_{2n-1} + i q_{2n}, \qquad n = 1, \ldots, N. \quad \text{(VII.8)}$$

Then directly

$$\sum_{n=1}^{N} \psi_n^* \psi_n = \sum_{n=1}^{N} \left(q_{2n-1}^2 + q_{2n}^2\right) = \sum_{n} q_n^2. \quad \text{(VII.9)}$$
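The packing identity (VII.9) is easy to spot-check numerically for sample mode values (an illustrative sketch, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=4)          # sample values of real modes q_1..q_4 at a point

# Pack real modes pairwise into complex modes, Eq. (VII.8)
psi = q[0::2] + 1j*q[1::2]

# Identity (VII.9): the sums of squared amplitudes agree
assert np.isclose(np.sum(np.abs(psi)**2), np.sum(q**2))
```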
In the same way,

$$\sum_{n=1}^{N} \nabla\psi_n^* \cdot \nabla\psi_n = \sum_{n} \nabla q_n \cdot \nabla q_n \quad \text{(VII.10)}$$

and

$$\sum_{n=1}^{N} \frac{\partial \psi_n^*}{\partial t}\,\frac{\partial \psi_n}{\partial t} = \sum_{n} \left(\frac{\partial q_n}{\partial t}\right)^2. \quad \text{(VII.11)}$$

Using identities (VII.10) and (VII.11) in Eq. (VII.7) gives
$$I = 4c \sum_{n=1}^{N} \iint d\mathbf{r}\,dt \left[ -\nabla\psi_n^* \cdot \nabla\psi_n + \frac{1}{c^2}\,\frac{\partial \psi_n^*}{\partial t}\,\frac{\partial \psi_n}{\partial t} \right]. \quad \text{(VII.12)}$$

Hence, information I, despite being defined in terms of purely real mode functions q, can be re-expressed in terms of complex mode functions ψ. The interpretation of these new complex modes ψ is immediate. From Eqs. (III.33b) and (VII.9),

$$p = \sum_{n=1}^{N} \psi_n^* \psi_n. \quad \text{(VII.13)}$$
Thus, the modes ψ become the familiar probability amplitudes of quantum mechanics. Notice that this correspondence arises naturally from the theory. There is no need for the usual Born interpretation as an ad hoc addendum.

C. Definition of Momentum-Energy Space

So far, such concepts as mass, momentum, and energy have not been used or defined. Momentum-energy space is defined, now, as the Fourier conjugate to space-time, or

$$(i\mathbf{p}/\hbar,\; E/c\hbar) = (ip_x/\hbar,\; ip_y/\hbar,\; ip_z/\hbar,\; E/c\hbar) \;\;\text{Fourier conj. to}\;\; (ix, iy, iz, ct). \quad \text{(VII.14)}$$
Vector 𝐩 is defined to be the momentum vector, with components pₓ, p_y, p_z. Parameter E is defined to be the total energy. These, then, connect with the space and time coordinates (𝐫, t) through a Fourier relation

$$\psi_n(\mathbf{r}, t) = \frac{1}{(2\pi\hbar)^2} \iint d\mathbf{p}\,dE\; \phi_n(\mathbf{p}, E)\, \exp[-(i/\hbar)(\mathbf{p}\cdot\mathbf{r} - Et)]. \quad \text{(VII.15)}$$

The φₙ are new spectral functions that are conjugate to the ψₙ. Their physical significance is derived below. At this point, ℏ could conceivably be a parameter that varies from problem to problem.
D. Finding What I Equals, so as to Form I

Our objective is to form the physical information. Following step (3) of the EPI procedure of Section IV,G, we have to first find what Fisher I equals for this scenario. This is now easily done. By Parseval’s theorem, since φₙ and ψₙ are Fourier mates,

$$\iint d\mathbf{r}\,dt\; \nabla\psi_n^* \cdot \nabla\psi_n = \frac{1}{\hbar^2} \iint d\mathbf{p}\,dE\; p^2 |\phi_n|^2 \quad \text{(VII.16)}$$

and

$$\iint d\mathbf{r}\,dt\; \frac{\partial\psi_n^*}{\partial t}\,\frac{\partial\psi_n}{\partial t} = \frac{1}{\hbar^2} \iint d\mathbf{p}\,dE\; E^2 |\phi_n|^2. \quad \text{(VII.17)}$$

Using these two relations in Eq. (VII.12) gives

$$I = \frac{4c}{\hbar^2} \sum_{n=1}^{N} \iint d\mathbf{p}\,dE \left( -p^2 + \frac{E^2}{c^2} \right) |\phi_n|^2. \quad \text{(VII.18)}$$

At this point we need a physical interpretation for the spectral functions φₙ. Again using Parseval’s theorem,

$$\iint d\mathbf{r}\,dt\, |\psi_n|^2 = \iint d\mathbf{p}\,dE\, |\phi_n|^2, \qquad n = 1, \ldots, N. \quad \text{(VII.19a)}$$
Summing both sides over n, using correspondence (VII.13), and using the normalization of p, gives

$$1 = \iint d\mathbf{p}\,dE\; P(\mathbf{p}, E), \qquad P(\mathbf{p}, E) \equiv \sum_{n=1}^{N} |\phi_n(\mathbf{p}, E)|^2. \quad \text{(VII.19b)}$$
Then this new quantity P obeys P ≥ 0 and normalization. This implies that P is the p.d.f. in (𝐩, E) space. Then Eq. (VII.18) becomes

$$I = \frac{4c}{\hbar^2} \iint d\mathbf{p}\,dE \left( -p^2 + \frac{E^2}{c^2} \right) P(\mathbf{p}, E). \quad \text{(VII.20)}$$

This is now an expectation,

$$I = \frac{4c}{\hbar^2} \left\langle -p^2 + \frac{E^2}{c^2} \right\rangle. \quad \text{(VII.21)}$$
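The Parseval route of (VII.16)-(VII.21) can be illustrated numerically in one dimension: the derivative integral 4∫|ψ′|² dx equals (4/ℏ²)⟨p²⟩ computed from the momentum-space p.d.f. The following is an illustrative sketch only (not from the text); the grid, the mode width, and the choice ℏ = 1 are arbitrary assumptions of the example:

```python
import numpy as np

hbar = 1.0                      # units chosen so hbar = 1
sigma = 0.7
x = np.linspace(-20.0, 20.0, 4096)
dx = x[1] - x[0]
psi = (np.pi*sigma**2)**-0.25 * np.exp(-x**2/(2*sigma**2))   # normalized Gaussian mode

# Coordinate-space side: 4 * Int |dpsi/dx|^2 dx
lhs = 4.0*np.sum(np.abs(np.gradient(psi, dx))**2)*dx

# Momentum-space side: (4/hbar^2) <p^2> from the spectral p.d.f., cf. (VII.19b)
phi = np.fft.fftshift(np.fft.fft(psi))
p = 2.0*np.pi*hbar*np.fft.fftshift(np.fft.fftfreq(x.size, dx))
dp = p[1] - p[0]
P = np.abs(phi)**2
P /= np.sum(P)*dp               # normalize the p.d.f. P(p)
rhs = (4.0/hbar**2)*np.sum(p**2*P)*dp

assert np.isclose(lhs, rhs, rtol=1e-3)
assert np.isclose(lhs, 2.0/sigma**2, rtol=1e-3)   # analytic value for the Gaussian
```

The agreement reflects the fact that differentiation in coordinate space corresponds to multiplication by p/ℏ in the conjugate space.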
E. Definition of Mass, Resulting Energy-Mass Relation

To further evaluate (VII.21) we invoke Axiom (iii) (Section IV,A), that I is a constant, regardless of scenario. It suffices for the Fisher I of (VII.21) to be a constant. Then both factors in (VII.21) must be constant. In the first factor, quantity c is shown elsewhere (Section IX) to be constant. Hence ℏ must be a constant. Or, Planck’s parameter is a constant because of the invariance of information I.

We turn next to the second factor in (VII.21). The statistical fluctuations of E, 𝐩 in (VII.21) necessarily change from one set of initial conditions to the next. This would make I a variable, contrary to our aims, unless

$$-p^2 + \frac{E^2}{c^2} = \text{constant} = A^2(m, c), \quad \text{(VII.22)}$$
where A is some function of the rest mass m and the speed of light c (the only other variables of the free-field scenario). Solving for E gives

$$E^2 = c^2 p^2 + A^2(m, c)\,c^2. \quad \text{(VII.23)}$$
To balance units in (VII.23), function A(m, c) must obey a relation

$$A = mc, \quad \text{(VII.24)}$$
where m is defined to be the mass of the particle. Equation (VII.23) then becomes the usual relativistic energy balance

$$E^2 = c^2 p^2 + m^2 c^4, \quad \text{(VII.25)}$$
which, of course, was our aim. Hence, the relativistic energy balance equation (VII.25), and the concept of mass, grow out of the demand that physical information I be constant. Of further interest is the resulting value of I. This defines the conditional information J.
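The algebra of (VII.22)-(VII.25), and the constant free-field value it forces on I, can be confirmed with a short symbolic computation (an illustrative sketch, not part of the original text):

```python
import sympy as sp

p, E, m, c, hbar = sp.symbols('p E m c hbar', positive=True)

# Constraint (VII.22) with A(m, c) = m*c as fixed by units, Eq. (VII.24):
constraint = sp.Eq(-p**2 + E**2/c**2, (m*c)**2)

# Solving for E reproduces the relativistic energy balance (VII.25)
E2 = sp.solve(constraint, E)[0]**2
assert sp.simplify(E2 - (c**2*p**2 + m**2*c**4)) == 0

# Back-substitution into I = (4c/hbar^2)(-p^2 + E^2/c^2), Eq. (VII.21),
# gives the constant free-field information value 4 m^2 c^3 / hbar^2
I = (4*c/hbar**2)*(-p**2 + E2/c**2)
assert sp.simplify(I - 4*m**2*c**3/hbar**2) == 0
```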
Substituting Eq. (VII.25) into Eq. (VII.21) gives directly

$$I = \frac{4m^2 c^3}{\hbar^2} = J. \quad \text{(VII.26)}$$
Thus, I is proportional to the square of the rest energy mc². This is, then, the Fisher information carried by a relativistic particle in a free field. Returning to the requirement that I be a constant, we see from (VII.26) that the rest mass m must then be a constant (c and ℏ having already been found to be constant). For example, if the particle is an electron, this condition fixes its rest mass as a universal constant.

F. Klein-Gordon Equation (Free Field)

By step (3) of the agenda (Section IV,G), the physical information is formed as the difference between the general form for Fisher I, Eq. (VII.12), and what it equals, Eq. (VII.26). The result is
$$I = 4c \sum_{n=1}^{N} \iint d\mathbf{r}\,dt \left[ -\nabla\psi_n^* \cdot \nabla\psi_n + \frac{1}{c^2}\,\frac{\partial\psi_n^*}{\partial t}\,\frac{\partial\psi_n}{\partial t} \right] - \frac{4m^2 c^3}{\hbar^2} \iint d\mathbf{p}\,dE\; P(\mathbf{p}, E). \quad \text{(VII.27)}$$
The far-right integral is unity, by normalization, and traces from (VII.26). We note that (VII.27) is in the required form (II.1a) for attaining the value zero at its extremum, as required. EPI axioms (ii), (iii) postulate that all solutions I = extremum or I = 0 are physically meaningful. The extremum solution is found by using the integrand of (VII.27) as the Lagrangian L in the Euler-Lagrange equations
$$\frac{d}{dx}\left[\frac{\partial L}{\partial \psi_{n,x}^*}\right] + \frac{d}{dy}\left[\frac{\partial L}{\partial \psi_{n,y}^*}\right] + \frac{d}{dz}\left[\frac{\partial L}{\partial \psi_{n,z}^*}\right] + \frac{d}{dt}\left[\frac{\partial L}{\partial \psi_{n,t}^*}\right] = \frac{\partial L}{\partial \psi_n^*}, \quad \text{(VII.28)}$$

where

$$\psi_{n,x}^* \equiv \frac{\partial \psi_n^*}{\partial x}, \quad \text{(VII.29)}$$

etc., for the y, z, t derivatives. The result is, after multiplying through by c²ℏ²,

$$-c^2\hbar^2\,\nabla^2\psi_n + \hbar^2\,\frac{\partial^2 \psi_n}{\partial t^2} + m^2 c^4\,\psi_n = 0, \qquad n = 1, \ldots, N. \quad \text{(VII.30)}$$
This is the free-field Klein-Gordon equation (see Schiff, 1955, p. 320). There are, at this point, N Eqs. (VII.30). Parameter N must be fixed. The EPI approach allows for a general N until a solution is formed, at which point the minimum N that suffices to define the solution is the choice. Since Eq. (VII.30) is a second-order differential equation, a value N = 1 suffices (recalling that each complex ψₙ has two components).
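As a consistency check, a free plane wave solves (VII.30) precisely when the energy balance (VII.25) holds. A symbolic sketch (my own illustration, not from the text):

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t', real=True)
px, py, pz, E, m, c, hbar = sp.symbols('p_x p_y p_z E m c hbar', positive=True)

# A free plane-wave mode
psi = sp.exp(sp.I*(px*x + py*y + pz*z - E*t)/hbar)

# Left-hand side of the free-field Klein-Gordon equation (VII.30)
lap = sum(sp.diff(psi, v, 2) for v in (x, y, z))
kg = -c**2*hbar**2*lap + hbar**2*sp.diff(psi, t, 2) + m**2*c**4*psi

# The residual vanishes exactly on the energy shell E^2 = c^2 p^2 + m^2 c^4
residual = sp.simplify(kg/psi)
assert sp.simplify(residual.subs(E, sp.sqrt(c**2*(px**2 + py**2 + pz**2) + m**2*c**4))) == 0
```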
G. Klein-Gordon Equation (with Fields)
The replacements (VII.1) make the theory covariant, as required from Section V. Making these replacements in Eqs. (VII.27) and (VII.30) gives a field-dependent information

$$I = 4c \sum_{n=1}^{N} \iint d\mathbf{r}\,dt \left[ -\left(\nabla + \frac{ie\mathbf{A}}{c\hbar}\right)\psi_n^* \cdot \left(\nabla - \frac{ie\mathbf{A}}{c\hbar}\right)\psi_n + \frac{1}{c^2}\left(\frac{\partial}{\partial t} - \frac{ie\phi}{\hbar}\right)\psi_n^* \left(\frac{\partial}{\partial t} + \frac{ie\phi}{\hbar}\right)\psi_n - \frac{m^2 c^2}{\hbar^2}\,\psi_n^*\psi_n \right] \quad \text{(VII.31)}$$

and a field-dependent solution

$$\left[ -c^2\hbar^2 \left(\nabla - \frac{ie\mathbf{A}}{c\hbar}\right)^2 + \hbar^2 \left(\frac{\partial}{\partial t} + \frac{ie\phi}{\hbar}\right)^2 + m^2 c^4 \right] \psi_n = 0. \quad \text{(VII.32)}$$
The latter is the Klein-Gordon (K-G) equation (Schiff, 1955, p. 320). As a further check, Eq. (VII.31) has the correct Lagrangian to produce the K-G solution (VII.32) (see Morse and Feshbach, 1953, p. 316). As with the free-field solution (VII.30), the value N = 1 suffices to define a general solution to (VII.32). This represents one complex wave function. The non-relativistic limit of Eq. (VII.32) can readily be taken (Schiff, 1955, p. 320). It gives the Schrödinger wave equation, as usual.

H. Dirac Equation (Free Field)
An extremum problem usually has a unique solution, i.e., the one obeying the Euler-Lagrange equations. By contrast, we want a multiplicity of solutions ψ describing the physics of particles with different spin values. Our thesis is that these correspond to the multiple “roots” ψ of the information I = 0, defining the alternative [Axiom (iii)] set of EPI solutions. By Eq. (VII.27), these satisfy

$$I = 4c \iint d\mathbf{r}\,dt \sum_{n=1}^{N} \left[ -(\nabla\psi_n)^* \cdot \nabla\psi_n + A^2 \left(\frac{\partial\psi_n}{\partial t}\right)^{\!*} \frac{\partial\psi_n}{\partial t} - q^2 |\psi_n|^2 \right] = 0, \quad \text{(VII.33)}$$

where we have introduced two parameters

$$A = 1/c, \qquad q = mc/\hbar. \quad \text{(VII.34)}$$
The roots of Eq. (VII.33) may be found by a factorization approach analogous to Dirac’s (the latter shown in Schiff, 1955, p. 324). Noting that (VII.33) is quadratic in ∇ψ, factorization will lead to a product of forms linear in ∇ψ. Setting either form to zero will satisfy the zero of (VII.33) and result in a differential equation linear in ∇ψ. This is the Dirac equation. Hence, in the following, we generally follow Dirac’s factorization procedure. The conceptual difference is that Dirac started from an ad hoc Hamiltonian operator equation, where derivative operators represent momentum or energy (Schiff, 1955, p. 323). We make no such assumptions here. The mathematical distinction is that Dirac factored the operator expression, whereas we factor an algebraic form (VII.33).

Define (Dirac’s) N × N matrices [αₓ], [α_y], [α_z], [β] with elements to be determined. For convenience of notation, define a vector of matrices

$$[\alpha] \equiv ([\alpha_x], [\alpha_y], [\alpha_z]). \quad \text{(VII.35)}$$

Denote the vector of all ψₙ as

$$\boldsymbol{\psi} = (\psi_1, \ldots, \psi_N)^T. \quad \text{(VII.36a)}$$

The inner product of [α] with ∇ψ is then

$$[\alpha] \cdot \nabla\boldsymbol{\psi} \equiv [\alpha]^T \nabla\boldsymbol{\psi} = [\alpha_x]\,\frac{\partial\boldsymbol{\psi}}{\partial x} + [\alpha_y]\,\frac{\partial\boldsymbol{\psi}}{\partial y} + [\alpha_z]\,\frac{\partial\boldsymbol{\psi}}{\partial z}. \quad \text{(VII.36b)}$$

This is a vector of rank N. To aid in the factorization, introduce two helper vectors of rank N,

$$\mathbf{v}_1 = i[\alpha] \cdot \nabla\boldsymbol{\psi} - [\beta]\,q\boldsymbol{\psi} + iA\,\frac{\partial\boldsymbol{\psi}}{\partial t}, \quad \text{(VII.37)}$$

$$\mathbf{v}_2 = i[\alpha^*] \cdot \nabla\boldsymbol{\psi}^* + [\beta^*]\,q\boldsymbol{\psi}^* - iA\,\frac{\partial\boldsymbol{\psi}^*}{\partial t}. \quad \text{(VII.38)}$$

It is shown in Appendix E that if the matrices [αₓ], [α_y], [α_z], [β] are Hermitian and mutually anticommute, then

$$I = 4c \iint d\mathbf{r}\,dt\; \mathrm{Re}(\mathbf{v}_1 \cdot \mathbf{v}_2). \quad \text{(VII.39)}$$
The right-hand side is proportional to the form (VII.33) for I. Hence, by Axiom (iii) (Section IV,A) we seek a vector solution ψ to the associated problem
$$I = 4c \iint d\mathbf{r}\,dt\; \mathrm{Re}(\mathbf{v}_1 \cdot \mathbf{v}_2) = 0. \quad \text{(VII.40)}$$

This is satisfied by either of v₁, v₂ having the value zero. We arbitrarily choose v₁, noting that instead choosing v₂ leads to essentially the complex conjugate of the result:

$$\mathbf{v}_1 = i[\alpha] \cdot \nabla\boldsymbol{\psi} - [\beta]\,q\boldsymbol{\psi} + iA\,\frac{\partial\boldsymbol{\psi}}{\partial t} = 0. \quad \text{(VII.41)}$$
The right-hand equality is the free-field Dirac equation (Morse and Feshbach, 1953, p. 264).

I. Dirac Equation (with Fields)
The replacements (VII.1) make the theory covariant, as required from Section V. Making these replacements in Eq. (VII.41) gives a field-dependent solution

$$i[\alpha] \cdot \left(\nabla - \frac{ie\mathbf{A}}{c\hbar}\right)\boldsymbol{\psi} - q[\beta]\boldsymbol{\psi} + iA\left(\frac{\partial}{\partial t} + \frac{ie\phi}{\hbar}\right)\boldsymbol{\psi} = 0. \quad \text{(VII.42)}$$
This is the Dirac equation, describing the probability amplitudes ψ of a particle with spin. The spin is embedded in the matrices [α], as shown next.

J. Dimensionality N, Resulting Spin, Nonrelativistic Limit

Thus far, the length N of vector ψ has been left arbitrary. In fact, the smallest value of N that suffices to describe four N × N matrices [α], [β] that mutually anticommute and are Hermitian is N = 4 (Schiff, 1955, p. 326). This describes a spin-½ particle, the electron. Explicit representations are
$$[\alpha_i] = \begin{pmatrix} [0] & [\sigma_i] \\ [\sigma_i] & [0] \end{pmatrix}, \quad i = x, y, z; \qquad [\beta] = \begin{pmatrix} [I] & [0] \\ [0] & -[I] \end{pmatrix}, \quad \text{(VII.43)}$$

where each “element” is a 2 × 2 matrix,

$$[\sigma_x] = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad [\sigma_y] = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad [\sigma_z] = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad [I] = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad [0] = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}. \quad \text{(VII.44)}$$

Matrices [σₓ], [σ_y], [σ_z] are the Pauli spin matrices. Presumably, higher values of N will lead to solutions ψ defining other fundamental particles. The non-relativistic limit of Eq. (VII.42) can be directly taken (Schiff, 1955, pp. 329-330). As is well known, it gives back the Schrödinger wave equation plus a term involving the interaction of particle spin with the magnetic field H. This term does not disappear unless the particle has zero spin.
K. Discussion

EPI principle (IV.11) has been shown to derive the Klein-Gordon and Dirac equations of relativistic particles. According to the choice of parameter N describing the number of modes ψₙ(𝐫) in the output law p(𝐫), one or the other answer results. Each formulation describes the mechanics of a different fundamental particle. Presumably, the use of higher N describes particles with higher spin values.

Aside from deriving the spin dependence, the approach also derived the relativistic energy balance Eq. (VII.25). This required a definition (VII.14) of momentum-energy space and (VII.24) of mass. The constancy of the electronic charge e, Planck’s constant ℏ, and rest mass m were also derived.

It is interesting that, whereas the derivation of the Klein-Gordon equation involves the solution to a variational problem, namely the extremization of the information (VII.27), the derivation of the Dirac equation is instead algebraic in nature. It follows from equating the information (VII.27) to zero. Then the helper vectors v₁, v₂ of Eqs. (VII.37), (VII.38) accomplish the factorization of the quadratic form (VII.27) into two linear forms, Eq. (VII.40). Setting either linear form equal to zero gives the Dirac formulation Eq. (VII.41). This is in the spirit of Dirac’s factorization, although from a purely algebraic standpoint. Zero-information solutions were sought, in cases of spin, because an extremum solution is unique, whereas we want multiple solutions corresponding to different particle spin states.

A unique physical aspect of the derivations is that they do not rely on the usual association of gradient operators with momentum or energy. All gradients arise naturally: in the definition (III.34) of Fisher information,
and in either the Euler-Lagrange solution (VII.28) or the trial solution forms (VII.37), (VII.38). Gradients need not have prior physical meaning in the information-based approach. In effect, the approach derives their physical meaning.

It is instructive to compare this derivation of quantum mechanics with that of classical electromagnetic theory in Section VI. Both arise out of the same gedanken measure-estimation experiment, i.e., where the space-time coordinate of a particle is to be estimated. The main difference is the way the sources are defined. Thus, in the absence of sources and rest mass the two results (VI.25) (e.m. vector wave equation) and (VII.30) (Klein-Gordon equation) are essentially the same. Also, the Dirac equations can then be placed in a vector form that is essentially Maxwell’s equations (Bocker, 1994). The structural similarity between classical e.m. theory and relativistic quantum mechanics has previously been noted (Eisele, 1970), and we see that it traces from a common measure-estimation scenario.

VIII. UNCERTAINTY PRINCIPLES

The Heisenberg uncertainty principle for a joint (y, p) measurement of (position, momentum) is

$$e_x^2 \langle p^2 \rangle \geq (\hbar/2)^2, \qquad e_x^2 = \langle (X - y)^2 \rangle. \quad \text{(VIII.1a)}$$

The principle for a joint (τ, E) measurement of (time, energy) is

$$e_T^2 \langle E^2 \rangle \geq (\hbar/2)^2, \qquad e_T^2 = \langle (T - \tau)^2 \rangle. \quad \text{(VIII.1b)}$$

These define the ability to measure pairs of quantities (X, ⟨p⟩) or (T, ⟨E⟩) simultaneously, where X, T are the ideal position and time values. These are physical results, but they can be compared with a corresponding result from pure estimation theory, the Cramer-Rao (C-R) inequality Eq. (III.3). Let us entertain the thought that the Heisenberg relations (VIII.1a,b) somehow derive from the C-R inequality. This is reasonable, since quantum mechanics derives from optimum measurement considerations (Section VII), and the C-R inequality defines the ability to measure and estimate. A complication is that the C-R inequality defines the ability to measure a single parameter θ, and not two simultaneously.
Therefore, if the Heisenberg relations are derivable from the C-R inequality, where does the second, conjugate variable enter in? In fact, the Fisher information I in the C-R inequality will provide the link. It will be found to be proportional to the mean-square spread in the variable that is conjugate to the measured one. Generalizations of Eqs. (VIII.1a,b) will be derived from the C-R inequality in the next two sections (see also Frieden, 1992).
A. Position-Momentum Relation

Most of the work has already been done in Section VII,C. Here we are interested in establishing the uncertainty relation for the coordinate components (x, p). To bring in the C-R inequality, we need to choose a parameter to measure. Let this be the ideal (mean) position θ = X of a particle, based upon a measurement y obeying

$$y = X + x. \quad \text{(VIII.2)}$$
The random variable is the position increment x. This causes y to be random, and hence the estimate x̂(y) as well, and so we seek the mean-square error of x̂(y) from the ideal value X. Temporarily let the system be in its characteristic state. Then the information I in a measurement y is the one-dimensional version of Eq. (III.34),

$$I = 4 \int dx \sum_{n=1}^{N} q_n'^2 = 4 \int dx \sum_{n=1}^{N} \psi_n'^{\,*} \psi_n', \qquad q_n = q_n(x), \quad \psi_n = \psi_n(x), \quad \text{(VIII.3)}$$
where the prime denotes d/dx. The second equality is the one-dimensional version of Eq. (VII.10), due to packing the real modes qₙ(x) into associated complex modes ψₙ(x) according to Eq. (VII.8). The latter were found to be “probability amplitudes,” the sum of whose squares gives the p.d.f., Eq. (VII.13). The momentum space conjugate to the wave amplitudes ψₙ obeys the one-dimensional version of Eq. (VII.15),

$$\psi_n(x) = \frac{1}{\sqrt{2\pi\hbar}} \int dp\; \phi_n(p)\, \exp(-ipx/\hbar). \quad \text{(VIII.4)}$$

Using this relation in Eq. (VIII.3) gives a Parseval’s theorem,

$$I = \frac{4}{\hbar^2} \int dp\; p^2 \sum_{n=1}^{N} |\phi_n(p)|^2. \quad \text{(VIII.5)}$$

By Eq. (VII.19b), the sum is just the marginal p.d.f. P(p). Then (VIII.5) is in the form of an expectation,

$$I = \frac{4}{\hbar^2} \langle p^2 \rangle. \quad \text{(VIII.6)}$$

As we mentioned, this assumes the system to be in its characteristic state, i.e., with artificially separated modes q. Instead, now imagine the same modes to be in a natural, generally overlapped state because of being a physical solution to the problem. Call the resulting Fisher information Iₓ. The r.m.s. error eₓ in the estimate x̂ of the mean position X of a particle,
based upon one measurement y, obeys the C-R bound in Eq. (III.3):

$$e_x^2 I_x \geq 1, \qquad e_x^2 = \langle (X - \hat{x}(y))^2 \rangle. \quad \text{(VIII.7)}$$

It is shown in Appendix B that the characteristic state has the property that its Fisher information exceeds that of a system where the same modes are made to overlap,

$$I_x \leq I. \quad \text{(VIII.8)}$$

A Heisenberg result of the type (VIII.1a) follows from the combination of Eqs. (VIII.6)-(VIII.8). By Eqs. (VIII.6) and (VIII.8),

$$I_x \leq \frac{4}{\hbar^2} \langle p^2 \rangle. \quad \text{(VIII.9)}$$

Multiplying the latter by eₓ² gives

$$e_x^2 I_x \leq \frac{4}{\hbar^2}\, e_x^2 \langle p^2 \rangle. \quad \text{(VIII.10)}$$

Then by the C-R result (VIII.7),

$$e_x^2 \langle p^2 \rangle \geq \frac{\hbar^2}{4}. \quad \text{(VIII.11)}$$

This is of the same form as the Heisenberg principle (VIII.1a), with one notable difference. See the following discussion.
B. Time-Energy Relation

This derivation closely parallels the preceding one, and so will be briefer. Here we form a gedanken estimate of the ideal event time (say, of arrival) θ = T of a particle, based upon an event time measurement τ. The latter obeys

$$\tau = T + t. \quad \text{(VIII.12)}$$
The random variable is the time increment t, and we want to establish the resulting error in the estimate T̂(τ) from T. Letting the system be in its characteristic state results in a one-dimensional Fisher information (III.34) on the modes qₙ(t),

$$I = 4 \int dt \sum_{n=1}^{N} q_n'^2 = 4 \int dt \sum_{n=1}^{N} \psi_n'^{\,*} \psi_n', \quad \text{(VIII.13)}$$

where the prime now denotes d/dt. The second equality is the one-dimensional version of Eq. (VII.10), due to packing the real modes qₙ(t) into associated complex modes ψₙ(t) according to Eq. (VII.8). The latter were found to be “probability amplitudes,” the sum of whose squares gives the p.d.f., Eq. (VII.13).
The energy space conjugate to the wave amplitudes ψₙ obeys the one-dimensional version of Eq. (VII.15),

$$\psi_n(t) = \frac{1}{\sqrt{2\pi\hbar}} \int dE\; \phi_n(E)\, \exp(-iEt/\hbar). \quad \text{(VIII.14)}$$

Using this relation in Eq. (VIII.13) gives a Parseval’s theorem,

$$I = \frac{4}{\hbar^2} \int dE\; E^2 \sum_{n=1}^{N} |\phi_n(E)|^2. \quad \text{(VIII.15)}$$

By Eq. (VII.19b), the sum is just the marginal p.d.f. P(E). Then (VIII.15) is in the form of an expectation,

$$I = \frac{4}{\hbar^2} \langle E^2 \rangle. \quad \text{(VIII.16)}$$

Now let I_T be the information for the same modes in a natural, generally overlapped state. The r.m.s. error e_T in the estimate T̂ of the ideal time T of a particle, based upon one time measurement τ (see Eq. (VIII.12)), obeys the C-R bound Eq. (III.3),

$$e_T^2 I_T \geq 1, \qquad e_T^2 = \langle (T - \hat{T}(\tau))^2 \rangle. \quad \text{(VIII.17)}$$

But

$$I_T \leq I. \quad \text{(VIII.18)}$$

Then a Heisenberg result of the type (VIII.1b) follows from the combination of Eqs. (VIII.16)-(VIII.18),

$$e_T^2 \langle E^2 \rangle \geq \frac{\hbar^2}{4}. \quad \text{(VIII.19)}$$

This is of the same form as the Heisenberg principle (VIII.1b), with one notable difference. See the following discussion.

C. Discussion: What Do the Heisenberg Relations Really Mean?
It is important to interpret the meaning of the derived inequalities (VIII.11), (VIII.19). These each follow from an attempt at measure-estimating a single (but not two) variable. The prediction is that the other variable is fatally “blurred out” even before any attempt is made at its measurement. Thus, Eq. (VIII.11) was derived on the basis that position X alone is to be measure-estimated. This results in a blurring (Eq. (VIII.11)) of the conjugate momentum coordinate. According to the derivation, this blur exists even before momentum is measured. Thus, a definite momentum value does not exist prior to its measurement, confirming very well the Copenhagen interpretation of quantum mechanics.
Of course, a measurement can only add error to an intrinsic spread. Thus, any actual attempt to measure-estimate the ideal momentum must suffer at least the uncertainty predicted by (VIII.11). This leads to the usual statement of the Heisenberg principle, that any attempt at measurement of both (now) quantities will lead to errors that obey result (VIII.11). However, as we have seen, this statement glosses over an interesting quantum measurement effect en route. [This effect would not be seen if the usual route to the Heisenberg relations, Fourier relations (VIII.4) or (VIII.14) (Bracewell, 1965), were taken.]

The errors eₓ, e_T in (VIII.1a,b) represent errors in the measured values from the mean, or true, values. These are the raw, observable errors, with no processing of the data. By contrast, the measure-estimation channel of Fig. 1 (or Section III,A) has an estimation step following any measurement. Since the estimation step can be an optimum one (attaining “efficiency,” as in Section III,C), so that the individual errors would now be smaller, this would seem to imply that the Heisenberg relations (VIII.1a,b) might be overcome to a degree. For example, perhaps the right-hand sides could be made smaller because of the processing.

In fact, the derived uncertainty relations (VIII.11), (VIII.19) have precisely the forms of (VIII.1a), (VIII.1b). As far as uncertainty products are concerned, there is nothing to be gained by the estimation step. The only difference is that the errors are now those in the estimates, rather than in the direct observables. One concludes that the Heisenberg uncertainty relations bind refined estimates as well as raw (direct) observables. Of course, results (VIII.11), (VIII.19) are directly the standard Heisenberg forms (VIII.1a,b) when there is no processing step, i.e., when x̂(y) = y and T̂(τ) = τ. Hence, results (VIII.11), (VIII.19) are generalizations of the usual Heisenberg results.
D. Efficient Estimator and Minimum Uncertainty Product

An “efficient estimator” (see Eq. (III.10a) et seq.) scenario is defined to exist when the equality in the C-R inequality is accomplished. As was discussed, an example is the scenario where the “noise” x is Gaussian and the estimator is x̂(y) = y, the data value itself. It is instructive to consider what this means with respect to the Heisenberg inequality (VIII.11). The “minimum uncertainty product” in (VIII.11) occurs when there is no processing, so that x̂(y) = y, and when all qₙ(x), φₙ(p) are normal (Schiff, 1955, p. 56). Since the amplitudes are normal, the p.d.f. p(x) is normal as well. Then, by the preceding paragraph, the scenario is “efficient” as well. Hence, the presence of a “minimum uncertainty product” always implies the presence of an “efficient estimator.” This is another connection between Fisher estimation theory and physical theory.
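The efficiency of the Gaussian scenario with x̂(y) = y can be illustrated by simulation: the mean-square error times the Fisher information of the Gaussian location family equals one, i.e., the C-R bound is attained. A sketch (my own illustration; the noise spread and parameter value are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0                 # Gaussian noise spread
theta = 5.0                 # ideal (mean) position to be estimated
y = theta + rng.normal(0.0, sigma, size=200_000)   # measurements y = theta + x

e2 = np.mean((y - theta)**2)    # mean-square error of the estimator xhat(y) = y
I = 1.0/sigma**2                # Fisher information of a Gaussian location family

# Efficiency: the C-R bound e^2 I >= 1 is attained (to sampling accuracy)
assert abs(e2*I - 1.0) < 0.02
```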
IX. GENERAL RELATIVITY

The equations of motion of general relativity can also be derived by the EPI approach. This is somewhat surprising, because relativistic mechanics is usually regarded as a deterministic theory. Following, as usual, the procedure of Section IV,G, we have to define an appropriate gedanken measure-estimation scenario. As defined in Section VII,A, the scenario consists of particles moving about in free space. The particles are allowed to interact later, in the physical scenario. With no gravitational fields present, space is assumed to be flat, so we can use Euclidean coordinates. The particles are assumed to be classical (non-quantum), and to each have mass m (defined in Section VII,E). The particles are detected through their mass.

It is assumed that the cosmological principle (Narlikar, 1986) and Hubble’s law (Harwit, 1973) hold true. By the former, space is isotropic and homogeneous in its distribution of matter. Hubble’s law, in fact, follows from the isotropy condition (Harwit, 1973). Let the gedanken laboratory have origin O somewhere (anywhere) in space. Here we use the homogeneity property, according to which the distribution of matter looks the same from every vantage point. The particle trajectories
$$\mathbf{x}(\tau), \qquad \mathbf{x} = (x_1, x_2, x_3, x_4) = (x, y, z, -ict), \qquad 0 \leq t \leq t_A, \quad 0 \leq \tau \leq \tau_A, \quad \text{(IX.1)}$$
are to be found, using the EPI Eq. (IV.11). As usual, τ is the proper time. Because of isotropy, the distribution of matter is the same in all directions. There is no preferred direction for the location of a particle. Hence, the probability density for locating a particle at spherical coordinate position 𝐑 obeys

$$p(\mathbf{R}) = p(R), \qquad \mathbf{R} = (R, \theta, \phi), \qquad R_B \leq R \leq R_A, \quad R_B > 0. \quad \text{(IX.2)}$$
Radii (R_B, R_A) define the spatial measurement interval. Furthermore, the combination of homogeneity and isotropy in flat space implies that every point in space is a priori equally probable to detect a particle. This is a state of maximum a priori ignorance as to particle location. Anticipating a logarithmic Hubble effect, particle detectors are placed isotropically in space according to a density

$$\rho(R)\,dR \propto d(\ln R), \qquad R_B \leq R \leq R_A, \quad R_B > 0. \quad \text{(IX.3)}$$
This choice of density function is arbitrary, but we will find that the information I that results is of the form Eq. (III.34) required of the information approach, i.e., the optimum form for estimation purposes (Section III,H).
The probability density p(R) for a particle detection can now be ascertained. Let the detector have cross-sectional area a. Imagine a spherical shell of radius R to be centered at the laboratory origin O. Then the effective number of detector positions on the shell is its area divided by the detector area, or 4πR²/a. Since, by isotropy, each such position is equally likely to register the detection, the probability density p(R) for a detection within the shell (R, R + dR) obeys

$$p(R) \propto (4\pi R^2/a)\,\rho(R) \qquad \text{or} \qquad p(R) \propto R, \quad \text{(IX.4)}$$

by (IX.3). This states that a detection is more likely to occur at large, than at small, R because there are more opportunities for detection at large R. This result can alternatively be derived as the transformation to spherical coordinates of a uniform distribution in Cartesian coordinates. The derivation is independent of the size of a. Hence, we drop the non-essential a-dependence from p(R). The particle’s detection is due to its mass. Assuming each detector to have a constant detection efficiency α ≡ Prob(detection)/mass, then p obeys

$$p \propto \alpha m \quad \text{(IX.5)}$$

as well as the proportionality in (IX.4). The exact units for α are fixed later.
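The geometric content of (IX.4) — uniform positions give a radial density ∝ R², which the d(ln R) detector density of (IX.3) reduces to p(R) ∝ R — can be checked by simulation. An illustrative sketch only (not from the text; sample size, bins, and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
# Homogeneous, isotropic particle positions: uniform in the unit ball
xyz = rng.uniform(-1.0, 1.0, size=(1_000_000, 3))
R = np.linalg.norm(xyz, axis=1)
R = R[R <= 1.0]

# Raw radial density of positions goes as R^2 (shell-area effect) ...
hist, edges = np.histogram(R, bins=10, range=(0.2, 1.0), density=True)
mid = 0.5*(edges[:-1] + edges[1:])

# ... so weighting by the d(ln R) detector density rho(R) ~ 1/R of (IX.3)
# leaves a detection density p(R) ~ R, as in (IX.4)
p = hist/mid                 # apply detector density ~ 1/R
ratio = p/mid                # p(R)/R should be roughly constant
assert np.all(np.abs(ratio/ratio.mean() - 1.0) < 0.05)
```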
as well as the proportionality in (IX.4). The exact units for Q are fixed later. Next we define the gedanken parameter to be estimated. Consider a particle detection that occurs during the finite time interval 0 5 ‘5 5 r, . The proper time of detection is registered. From this we want to estimate the mean proper particle detection time ( 7 ) = 0 over the given time interval. Since the parameter to be estimated is proper time ( r ) , we need to know the probability density function p ( r ) . It is easiest to first find p ( t ) , t the laboratory time. We can get this from Eq. (IX.4) for p ( R )as follows. By the Jacobian approach (Frieden, 1991a, pp. 99-loo),
By Hubble’s law, dR/dt is known to obey d -(lnR) = H dt
dR dt
or
-=
HR,
with H Hubble’s constant. Combining Eqs. (IX.4)-(IX.7) gives p ( t ) u amHR2,
or
R
=
R(t),
(IX.7)
B. ROY FRIEDEN
By Eq. (III.34), the corresponding information is (IX.9). Results (IX.8), (IX.9) are valid for laboratory coordinates (x, y, z, t). But by Axiom (iii), information I should be independent of the particular inertial reference frame chosen. That is,
$$I' = I, \tag{IX.10}$$
where I′ is I in a reference frame moving with respect to the laboratory with arbitrary constant velocity. [See also Eq. (V.6) et seq.] By the well-known rules of special relativity (see Section V), information (IX.9) should simply be extended to a four-sum
$$I = \alpha H m \sum_{n=1}^{4} \int_0^{\tau_1} d\tau \left(\frac{dx_n}{d\tau}\right)^2, \tag{IX.11}$$
where proper time replaces laboratory time. Comparing Eqs. (III.33b) and (III.34), this I would arise out of a probability law
$$p(\tau) \propto \alpha m H R^3(\tau) \tag{IX.12}$$
on proper time (compare with (IX.8)). This is the familiar expression for the area of a three-dimensional hypersurface on a four-dimensional hypersphere (Harwit, 1973). In effect, it extends isotropy to the four-vector position x as well. Equation (IX.12) is the required probability law for this particle scenario. Placing Eq. (IX.12) within the framework of Eq. (III.33b), we have (IX.13). Since qₙ has units of t^(−1/2), H has units of t^(−1), and xₙ has units of length, by (IX.13) α has units of probability/(mass·area). The particles are now placed in their physical scenario, i.e., the gravitational fields are turned "on." The space may no longer be flat. Expressing Eq. (IX.11) in any generally curved coordinate system x gives
$$I = \alpha H m \sum_{m,n=1}^{4} \int_0^{\tau_1} d\tau\, g_{mn}(x(\tau))\,\frac{dx_m}{d\tau}\,\frac{dx_n}{d\tau}. \tag{IX.14}$$
Here g_mn is the metric tensor of the coordinate system. By Axiom (iii) (Section IV,A) physical information I must be independent of the curved coordinate system, or symbolically,
$$I' = I. \tag{IX.15}$$
(Compare with Eq. (IX.10).) Then, form (IX.14) for I must also have this property of invariance. A sufficient condition for accomplishing the required information invariance is
$$\sum_{m,n} g_{mn}(x(\tau))\,\frac{dx_m}{d\tau}\,\frac{dx_n}{d\tau} = \text{constant}, \tag{IX.16}$$
by direct substitution into (IX.14). This constant obviously has units of velocity-squared. It defines c², the squared speed of light in vacuum (see Lawrie, 1990, p. 14). In this way, the information approach implies the constancy of the speed of light. Substituting Eq. (IX.16) into (IX.14) gives
$$I = \alpha H m c^2 \int_0^{\tau_1} d\tau = \text{constant} = J. \tag{IX.17}$$
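Condition (IX.16) is the familiar statement that the four-velocity has constant squared length. A minimal numerical sketch, assuming flat space-time with metric diag(c², −1) on the (t, x) plane and a uniformly moving particle (both assumptions illustrative, not from the text):

```python
# Numerical sketch of condition (IX.16) in flat space-time: along a
# worldline parametrized by proper time tau, the quadratic form
# g_mn (dx_m/dtau)(dx_n/dtau) is constant and equals c**2.  A uniformly
# moving particle and the metric diag(c**2, -1) on the (t, x) plane are
# illustrative assumptions.
c = 3.0e8
v = 0.6 * c                                  # arbitrary velocity < c
gamma = 1.0 / (1.0 - (v / c) ** 2) ** 0.5

dt_dtau = gamma                              # four-velocity components
dx_dtau = gamma * v

invariant = c ** 2 * dt_dtau ** 2 - dx_dtau ** 2
print(invariant / c ** 2)                    # 1.0: the constant is c**2
```

The γ factors cancel exactly, leaving the constant c² regardless of the chosen v, which is the content of (IX.16) for inertial motion.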
Then by the EPI principle Eq. (IV.11) and Eqs. (IX.14) and (IX.17), the physical information is
$$I \equiv I - J = \alpha H m \int_0^{\tau_1} d\tau \left( g_{mn}(x(\tau))\,\frac{dx^m}{d\tau}\,\frac{dx^n}{d\tau} - c^2 \right) = \text{extrem.} \tag{IX.18}$$
in summation notation. The extremum is to be attained through variation of the particle path amplitude x(τ). The solution (Lawrie, 1990, pp. 74-75) for Lagrangian problem (IX.18) is the Einstein differential equations of motion for the particle. This describes a geodesic path. Hence, all particles of the system travel along geodesic paths. This result is independent of value c² for the constant in the Lagrangian, as surmised before. For nonrelativistic particles, with velocity v ≪ c, the geodesic solution goes over into Newton's equations of motion (Lawrie, 1990, pp. 74-75). In this way, the EPI approach derives Newtonian mechanics. Of further interest is Eq. (IX.17). It suggests the existence of a "proper" information flow rate dI/dτ obeying
$$dI/d\tau = \alpha H m c^2 \equiv \alpha H E, \tag{IX.19}$$
where E is the rest energy of the particle. Equation (IX.19) shows the equivalence of information flow rate to rest mass and rest energy. Proportionality constant α, previously defined at Eq. (IX.5), now takes on the role of a conversion factor between information on one hand, and mass or energy on the other. Parameter α was also a measure of mass detection efficiency (IX.5), so it has an interesting dual use. That energy and information are equivalent is not too surprising, in view of the quantum mechanical correspondence Eq. (VII.26). However,
Eq. (IX.19) shows that information and matter are also equivalent, perhaps a more exciting prospect. Moreover, because of the large conversion factor c² it may be that a particle of macroscopic size can "release" a prodigious amount of information about its mean proper time. Hence, the particle can be located temporally with very small error (by Eq. (VIII.17)). Rest mass m is seen to always be associated with H, as product Hm (see Eqs. (IX.8)-(IX.19)). This might indicate that mass and space expansion rate are related quantities. For example, if (as has been conjectured) H actually changes with time, perhaps m likewise changes such that product Hm remains constant. That m goes inversely as H makes some intuitive sense. We are currently working on deriving the Einstein field equations by the information approach. In this regard, Eq. (III.37), showing that information I is proportional to the local curvature of information, is highly suggestive of a link to the Ricci curvature scalar R. Equation (IX.14) may be described as giving the Fisher information in a proper time measurement. As we saw, this obeys invariance property (IX.15) with respect to coordinate system. Suppose, more generally, that instead of proper time, the full space-time measurement of the particle were now made. By Axiom (iii) (Section IV,A), the information Eq. (III.34) is still to remain invariant. What form should it take? As is well known (Lawrie, 1990, p. 148), such a covariance property holds if, in Eq. (III.34), the volume element dτ is replaced by the covariant space-time volume element
$$d\tau \rightarrow c^{-1}\,[-g(x)]^{1/2}\,d^4x, \tag{IX.20}$$
and if all derivatives ∇ are replaced by covariant derivatives D (Section III,H). Notation g(x) denotes the determinant of the space metric g_mn(x). In the limit of flat coordinate space, this new expression goes back into Eq. (III.34). The ramifications of this covariant form of Eq. (III.34) will be left to future research.
X. POWER SPECTRAL 1/f NOISE

One of the great mysteries of physics and engineering is the phenomenon called "1/f noise." This defines a power spectrum of the form 1/ω^α, ω the frequency. As a physical phenomenon, 1/f noise describes an astonishingly diverse range of phenomena. Only a partial list includes voltage fluctuations in resistors, semiconductors, vacuum tubes and cell membranes (!), traffic density on a highway, economic time series, musical pitch and volume, sunspot activity, flood levels on the river Nile, and the rate of insulin uptake
by diabetics. (See main reference Frieden and Hughes, 1994, and subsequent references for a description of these and other 1/f phenomena. Also see summary articles by Weissman, 1988, and Bell, 1980.) What single effect could exist that would cause such a disparate array of phenomena to share the same form of power spectrum? The name "1/f noise" implies that a 1/f power spectrum describes "noise" behavior, as if noise is the only phenomenon that all such effects could conceivably have in common. In fact, this intuitive notion agrees with the theme of this chapter. This is that the related concept of disorder, in particular extreme physical disorder, gives rise to all 1/f phenomena.

A. Problem Definition
Let S(ω) denote the power spectrum (defined below) for a temporal signal X(t). A 1/f power spectrum S(ω) = 1/ω^α must obey non-stationary statistics, since (as has been amply confirmed experimentally) the spectrum generally holds down to the smallest ω that is measurable. For example, in weather data a 1/f noise phenomenon has been observed down to ω = 10⁻¹⁰ Hz, or 1 cycle in 300 years. A small ω corresponds to a large time t, indicating a correlation time extending back to the onset of the process. Hence, fluctuations X(t) have an absolute dependence upon time, and are therefore non-stationary. The "strength" of the non-stationarity is, on this basis, dependent upon the strength of S(ω) near the origin, i.e., the magnitude of α. In the context of musical compositions X(t), which obey a 1/f phenomenon, it has been observed that power α = 0 defines music that sounds too discordant or random, α = 2 defines music that is too repetitious and "boring," and α = 1 defines just the right trade-off between randomness (novelty) and repetition. Mozart's music reputedly obeys α = 1. Correlation with the past implies memory. Keshner (1982) plots the autocorrelation functions for RC circuits that approximate a 1/f spectral law for each of α = 0, 1 and 2, and finds these to have increasingly negative slopes in the order α = 1, 2, 0. Thus, a system with α = 1 has a very long memory. The closer α is to 1, the greater is the influence of the distant past when compared with that of the recent past. For α near either 0 or 2 the X(t) process is influenced by the recent past much more strongly than by the distant past. In summary, a 1/f noise process has memory, and the extent of memory is governed by the size of α. Non-stationary statistics, however, present a problem of definition of the power spectrum. The usual route to its definition is the Wiener-Khintchine theorem, according to which S(ω) is the Fourier transform of a stationary
autocorrelation function. However, there is an alternative (Frieden, 1991a). Consider a real-valued, temporal, stochastic signal X(t) over a time interval (0, T), T finite. It has an associated (complex) Fourier spectrum
$$Z_T(\omega) = \int_0^T dt\, X(t)\, e^{-i\omega t}/\sqrt{T}, \qquad i = \sqrt{-1}, \tag{X.1}$$
and a periodogram
$$I_T(\omega) = |Z_T(\omega)|^2. \tag{X.2}$$
As an example, the signal X(t) may be a randomly selected musical composition, where X(t) is the instantaneous squared voltage waveform. For simplicity, assume that the dc component of X(t) has been subtracted out, so that ⟨X(t)⟩ = 0. (This is equivalent to subtracting out a fixed amount from the power spectrum at the origin, which has no effect on its shape elsewhere.) Define a power spectrum
$$S(\omega) = \lim_{T\to\infty} \langle I_T(\omega)\rangle. \tag{X.3}$$
In practice, the infinite limit can be well approximated by practicable time spans of modest length, since most musical compositions (and signals) are eventually ergodic. Any of Mahler's symphonies, e.g., are certainly long enough to be ergodic. We seek to derive S(ω) as obeying a 1/ω^α form, α a constant. Equation (X.3) shows that we are seeking an equilibrium, or time-invariant, form for S(ω). The principle of extreme physical information (EPI) may be used to derive such equilibrium functions (Frieden, 1993a).
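Definitions (X.1)-(X.3) translate directly into code. In the sketch below, the AR(1) surrogate signal, the pure-Python DFT, and all parameter values are illustrative assumptions; averaging periodograms over independent realizations stands in for the T → ∞ limit of (X.3):

```python
import cmath, random

# Sketch of definitions (X.1)-(X.3): finite-time Fourier spectrum,
# periodogram, and power spectrum as the average periodogram.  The AR(1)
# test signal and all parameter values are illustrative assumptions.
random.seed(2)

def spectrum_Z(x, dt, omega):
    """Finite-time Fourier spectrum Z_T(omega) of Eq. (X.1)."""
    T = len(x) * dt
    Z = sum(xt * cmath.exp(-1j * omega * k * dt) for k, xt in enumerate(x)) * dt
    return Z / T ** 0.5            # the 1/sqrt(T) normalization of (X.1)

def periodogram(x, dt, omega):
    """Periodogram I_T(omega) = |Z_T(omega)|**2 of Eq. (X.2)."""
    return abs(spectrum_Z(x, dt, omega)) ** 2

def ar1(n, a=0.9):
    """Correlated surrogate signal with its dc component subtracted."""
    x, s = [], 0.0
    for _ in range(n):
        s = a * s + random.gauss(0, 1)
        x.append(s)
    m = sum(x) / n
    return [v - m for v in x]      # so <X(t)> = 0

def S_est(omega, trials=50):
    """Eq. (X.3): approximate S(omega) by averaging periodograms."""
    return sum(periodogram(ar1(512), 1.0, omega) for _ in range(trials)) / trials

S_low, S_high = S_est(0.2), S_est(2.5)
print(S_low > S_high)
```

For the correlated test signal the estimate concentrates power at low ω, as expected of a long-memory process; `numpy.fft` would replace the explicit DFT in practical work.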
B. Temporal Evolution and Disorder

We next describe the evolution of the time signal X(t) in terms of Fisher information. It will be shown that as T → ∞, the disorder of X(t) increases and consequently I tends toward a minimum value. This provides a basis for use of the EPI approach. For convenience, the language of acoustics is used. However, voltage fluctuations, or any other 1/f phenomenon, can be described analogously. Consider the gedanken measurement experiment in Fig. 5. Time signal X(t) is a musical composition, say, a randomly selected violin sonata. Signal X(t) is produced over increasing time intervals (0, T₀), (0, T₁), (0, T₂), ..., where T₀ < T₁ < T₂ < ⋯. Suppose that a note ω occurs in the first interval (0, T₀), and with complex amplitude Z₀(ω) = θ(ω) via Eq. (X.1). However, we are not listening during the first interval, and so do not know either X(t)
FIGURE 5. Gedanken measure-estimation experiment. The unknown tone amplitude θ(ω) is caused by signal X(t) over ideal interval (0, T₀). Subsequent tone amplitudes Z₁(ω), Z₂(ω), ... are due to listening over ever-longer time intervals.
over the interval or θ(ω), ω fixed. Instead, we know spectral amplitudes Zₙ(ω), n = 1, 2, ..., over increasing, nested time intervals (0, Tₙ). The observable numbers Zₙ(ω) are formed through Eq. (X.1), but without our knowledge of the underlying X(t) values. From simple observation of any one Zₙ(ω) we are to best estimate θ(ω). Which value Zₙ(ω) ought to lead, on average, to the best estimate? How should mean-square error in the estimate vary with interval length Tₙ? If the time sequence X(t) over interval (0, T₀) were known, by Eq. (X.1) θ(ω) would be known with zero error. Therefore we call interval (0, T₀) the "ideal data interval." Suppose that the next interval (0, T₁) includes the ideal interval plus a small amount. Then, by Eq. (X.1), its Fourier transform Z₁(ω) should depart from θ(ω) by a small amount. Likewise, an optimum estimate of θ(ω) made on the basis of observation Z₁(ω) should incur small mean-square error. The trend continues. Interval (0, T₂) includes the ideal plus more "tail" of X(t) than its predecessor. Therefore, the resulting Z₂(ω) will incur more error from θ(ω) than did Z₁(ω), and so will any estimate of θ(ω) based upon Z₂(ω). Hence, as time T increases, the optimized mean-square error e² in knowledge of θ(ω), ω fixed, should increase. The Cramer-Rao inequality Eq. (III.3) states that optimum error e² varies inversely with available information. This is for a one-dimensional unknown θ. However, our unknown θ(ω) is complex and therefore two-dimensional (K = 2). An outgrowth of the Cramer-Rao inequality was that for a 2-D unknown, the error relates to a trace Fisher information I
obeying Eqs. (III.17), (III.24), (III.29),
$$e^2 \geq 4/I, \tag{X.4}$$
$$I = \int dZ_r\, dZ_i\, \frac{1}{p}\left[\left(\frac{\partial p}{\partial Z_r}\right)^2 + \left(\frac{\partial p}{\partial Z_i}\right)^2\right], \tag{X.5}$$
where p = p(Z_r, Z_i) is the probability law defining the joint fluctuations of the real and imaginary parts Z_r, Z_i, respectively, of Z_T. Mean-square error e² in estimation of θ was found to increase with T. Also, by (X.4) with e_r² = e_i² = e²/2 (since Z_r and Z_i are identically distributed, see below),
$$I = 4/e^2. \tag{X.6}$$
Then I must decrease with T. It follows that as T → ∞, I tends toward a minimum value,
$$I(p) = \text{minimum}. \tag{X.7}$$
Since physical information I ∝ I(p), Eq. (X.7) suggests the use of the EPI principle.

C. Review of EPI Procedure
The EPI procedure (Section IV,C) is briefly as follows: (1) Form a total information quantity I, which is the difference between a Fisher information term I and a "constraint" information J,
$$I = I - J. \tag{X.8}$$
Fisher information I is of a universal form (X.5), while J defines the particular scenario. Both I and J are to be expressed as functionals of the unknown distribution, here S(ω). (2) The latter is then varied so that both conditions
$$I = I - J = \text{extremum} \tag{X.9}$$
and
$$I = I - J = 0, \qquad I = I[S(\omega)], \quad J = J[S(\omega)], \tag{X.10}$$
are met. This procedure will be followed below to form an output equilibrium law S(ω). In any scenario the solution [here S(ω)] will satisfy (X.10), since (X.10) is an axiom [Axiom (iii)] of the approach. However, a solution to (X.10) does not necessarily satisfy (X.9), since generally a root of a function
(say, a polynomial) is not necessarily an extremum as well. For example, in the scenario of relativistic quantum mechanics (Section VII), the Klein-Gordon equation obeyed both (X.9) and (X.10), while the Dirac equation obeyed (X.10) but not necessarily (X.9) (depending on the form of the potential field present). A tenet of EPI theory is that every solution to either (X.9) or (X.10) has physical significance, i.e., occurs in nature. We call a solution that obeys both (X.9) and (X.10) a "principal solution" of the EPI problem. A power spectrum S(ω) that is a principal solution of EPI will be sought next. This is for two reasons: (1) Since a principal solution arises as the solution to either (X.9) or (X.10), it is, in a sense, a dominant solution, which complies with the ubiquitous nature of the 1/f law. (2) An information I that satisfies both properties (X.9) and (X.10) is also, mathematically, an "information divergence"; see Appendix C. This class of information quantities includes Kullback-Leibler entropy (C.1) and Shannon information as members. [Recall that both are expressible as the difference of two entropy terms, as in (X.8).] Extremum principle (X.9) then represents a generalized second law of thermodynamics, where the maximum is replaced by an extremum. This gives added physical significance to the solution S(ω) found later.

D. Application of EPI to 1/f Scenario
In the context of our problem, EPI principle (IV.11) takes the form
$$I = I - J = \text{extremum} = 0. \tag{X.11}$$
The first right-hand term is Fisher information I; see Eq. (X.5). This is of a fixed form independent of scenario. Its effect on the solution is to produce a smooth output p (by principle (X.7)) regardless of scenario. The second term is J in Eqs. (X.8)-(X.10). Information J and functional F identify the particular physical scenario. This gives the principle its scope of application. Specific forms for I and J are given next. As was discussed, a time signal X(t) that exhibits 1/f behavior is intrinsically nonstationary, essentially because of its long memory. The latter is indicated by the blowup of 1/f near the origin [the so-called "infrared catastrophe" (Mandelbrot, 1977)]. A wide class of nonstationary signals X(t) was recently defined and analyzed by Solo (1992). This is the class of intrinsic random fields (IRF₀) of order zero. An IRF₀ is a second-order, mean-square continuous process X(t) obeying X(0) = 0, whose values are
nonstationary but whose increments are stationary. A particle exhibiting ordinary Brownian motion, for example, has these properties. The IRF₀ class of signals achieves nonstationarity as, effectively, a time-dependent sequence of stationary processes of short duration (as anticipated by Keshner, 1982). We shall regard X(t) as an IRF₀. Solo (1992) shows that such a process obeys a central limit theorem. This is a key result. Then, both the real and imaginary parts of Z_T(ω) are independent Gaussian, with the same variance, at each ω, and over all ω. This allows us to compute I. If a density p(x) is Gaussian, with variance σ², a simple calculation [using one component of Eq. (X.5)] shows that
$$I = 1/\sigma^2. \tag{X.12}$$
Here we have p(Z_r, Z_i) separable Gaussian, with σ² = S(ω)/2. Then Eq. (X.5) gives 1/σ² for each term, or a total of 2/σ² = 4/S(ω). Hence,
$$I(\omega) = 4/S(\omega). \tag{X.13}$$
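The "simple calculation" behind Eq. (X.12) can be reproduced by quadrature: for a Gaussian p, the integrand of one component of (X.5) is p(z)(z/σ²)², whose integral is 1/σ². The value of σ and the quadrature grid below are arbitrary choices:

```python
import math

# Numerical check of Eq. (X.12): one component of the Fisher integral
# (X.5), evaluated for a Gaussian density of variance sigma**2, equals
# 1/sigma**2.  Since d(ln p)/dz = -z/sigma**2 for the Gaussian, the
# integrand is p(z) * (z/sigma**2)**2.  Midpoint quadrature suffices;
# sigma and the grid are arbitrary.
sigma = 1.7

def p(z):
    return math.exp(-z * z / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

lo, hi, n = -10 * sigma, 10 * sigma, 20000
h = (hi - lo) / n
I = sum(p(z) * (z / sigma ** 2) ** 2
        for z in (lo + (j + 0.5) * h for j in range(n))) * h

print(I, 1 / sigma ** 2)   # the two agree
```

With σ² = S(ω)/2 and two identical components, the total is 4/S(ω), reproducing (X.13).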
This is the behavior at one frequency ω. Since Z_T(ω) is independent over frequencies, the information quantities (X.13) add (Appendix A), and the total information is
$$I = 4 \int_\Omega d\omega / S(\omega). \tag{X.14}$$
This is the amount of Fisher information present about many (now) unknown tone amplitudes θ(ω), ω ∈ Ω, Ω = (ω₁, ω₂), in independent, Gaussian data values Z_T(ω), ω ∈ Ω. The dc "tone" ω = 0 is excluded from Ω; it has no physical reality. All subsequent integrals are over range Ω. The other contributor to I in (X.8) is the second term J. At first allow J to have a general form
$$J = \lambda \int d\omega\, F[S(\omega), \omega], \tag{X.15}$$
where F is a general function of S and ω. Obviously F must be known if solution S(ω) is to be found. Subtracting (X.15) from (X.14) results in a physical information (X.8) obeying
$$I = 4 \int d\omega / S(\omega) - \lambda \int d\omega\, F[S(\omega), \omega]. \tag{X.16}$$
We next find function F by demanding that S(ω) be a principal solution of EPI.
E. Finding F[S(ω), ω]

A principal solution S satisfies both Axioms (ii) and (iii). Then the solution obtained by extremizing (X.16) is to be the same as that obtained by equating (X.16) to zero. The Lagrangian for the problem is
$$L = 4/S - \lambda F(S, \omega). \tag{X.17a}$$
The Euler-Lagrange extremum solution is
$$\partial L/\partial S = 0 = -4/S^2 - \lambda_1 (\partial F/\partial S). \tag{X.17b}$$
The condition that (X.16) be zero is satisfied by equating L of (X.17a) to zero,
$$0 = 4/S - \lambda_2 F(S, \omega). \tag{X.17c}$$
We allow for different Lagrange parameters λ₁, λ₂ in (X.17b), (X.17c) since they are independent solutions. Placing (X.17c) in the same form as (X.17b) by multiplying through (X.17c) by −1/S gives
$$0 = -4/S^2 + \lambda_2 F(S, \omega)/S. \tag{X.17d}$$
Since both (X.17b) and (X.17d) must have one solution, we equate the two. The result is a simple differential equation
$$-\lambda_1 (\partial F/\partial S) = \lambda_2 F(S, \omega)/S. \tag{X.17e}$$
This has a solution
$$F(S, \omega) = G(\omega) S^b, \qquad b = -\lambda_2/\lambda_1. \tag{X.18}$$
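That (X.18) solves (X.17e) follows by direct differentiation; a numerical spot check at arbitrary test values (all constants below are illustrative):

```python
# Spot check that Eq. (X.18), F(S, omega) = G(omega) * S**b with
# b = -lam2/lam1, satisfies the differential equation (X.17e):
#   -lam1 * dF/dS = lam2 * F / S.
# All numerical values below are arbitrary test points.
lam1, lam2, G, S = 2.0, 3.0, 1.4, 0.8
b = -lam2 / lam1

def F(s):
    return G * s ** b

h = 1e-6                                   # central-difference step
dF_dS = (F(S + h) - F(S - h)) / (2 * h)

lhs = -lam1 * dF_dS
rhs = lam2 * F(S) / S
print(abs(lhs - rhs))                      # small: the two sides agree
```

Analytically, −λ₁·bGS^(b−1) = λ₂GS^(b−1) once b = −λ₂/λ₁ is inserted, so the agreement is exact up to the finite-difference error.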
The new function G(ω) arises out of the partial derivative ∂/∂S operation in (X.17e), causing an integration constant G to become an integration function G(ω). The information (X.16) now becomes
$$I = 4 \int d\omega / S(\omega) - \lambda \int d\omega\, S(\omega)^b\, G(\omega). \tag{X.19}$$
The form of G(ω) is found next.

F. Finding G(ω)
By Axiom (iii), I should remain invariant, at value zero, to different choices of the underlying coordinate system (here ω). In past uses of Axiom (iii), invoking invariance to moving frame of reference gave rise to the Lorentz transformation group of special relativity (Section V), and invoking invariance to arbitrary geometrical distortion of coordinate space gave rise to the kinetic equations of general relativity (Section IX).
Imagine that a solution S(ω) to (X.16) has achieved I = 0. Axiom (iii) requires that I remain zero under, in particular, an arbitrary change of units in ω. Define a new unit ω₁ = aω, a constant. Then the new power spectrum S₁ obeys
$$S_1(\omega_1) = \frac{1}{a}\, S\!\left(\frac{\omega_1}{a}\right). \tag{X.20}$$
The new information I₁ is still of the form (X.19),
$$I_1 = 4 \int d\omega_1 / S_1(\omega_1) - \lambda(a) \int d\omega_1\, S_1(\omega_1)^b\, G(\omega_1). \tag{X.21}$$
We used the fact that parameter λ = λ(a) will generally vary with unit a. Substituting (X.20) into (X.21), and changing integration variables back to
$$\omega = \omega_1/a, \tag{X.22}$$
gives
$$I_1 = 4a^2 \int d\omega / S(\omega) - \lambda(a)\, a^{1-b} \int d\omega\, S(\omega)^b\, G(a\omega). \tag{X.23}$$
Compare Eqs. (X.19) and (X.23). The extremum solution S(ω) to (X.19) attained I = 0. In order for the extremum solution to (X.23) to retain I₁ = 0, the Lagrangians in (X.19) and (X.23) must be proportional. We see that they are proportional (with proportionality constant a²) if and only if G(ω) satisfies
$$\lambda(a)\, a^{1-b}\, G(a\omega) = \lambda(1)\, a^2\, G(\omega), \qquad \lambda(1) = \lambda. \tag{X.24}$$
If λ depends upon unit a as a power law,
$$\lambda(a) = \lambda(1)\, a^c, \qquad c \text{ constant}, \tag{X.25}$$
then the solution to (X.24) is
$$G(\omega) = \omega^k, \qquad k = 1 + b - c. \tag{X.26}$$
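The chain (X.24)-(X.26) can be spot-checked numerically: with a power-law λ(a) and G(ω) = ω^k, k = 1 + b − c, the two sides of (X.24) agree for any choice of unit a. All test values below are arbitrary:

```python
# Spot check of Eqs. (X.24)-(X.26): with lam(a) = lam1 * a**c (a power
# law) and G(omega) = omega**k, the unit-invariance condition
#   lam(a) * a**(1 - b) * G(a*omega) = lam(1) * a**2 * G(omega)
# holds for every unit a exactly when k = 1 + b - c.  Test values are
# arbitrary illustrative choices.
lam1, b, c = 1.0, -1.5, 0.3
k = 1 + b - c

lam = lambda a: lam1 * a ** c
G = lambda w: w ** k

dev = max(abs(lam(a) * a ** (1 - b) * G(a * w) - lam(1.0) * a ** 2 * G(w))
          for a in (0.5, 2.0, 7.3)
          for w in (0.4, 1.0, 5.0))
print(dev)   # zero up to rounding
```

Collecting powers of a on the left gives a^(c + 1 − b + k), which matches the a² on the right precisely when k = 1 + b − c, as (X.26) states.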
Interestingly, this is independent of unit a. If, on the other hand, λ(a) does not have the special form (X.25), the answer for G(ω) will still be a power-law solution as in (X.26), but the power will now depend on unit a.

G. Solution
With F and G now known by Eqs. (X.18) and (X.26), the physical information (X.16) becomes
$$I = 4 \int d\omega / S(\omega) - \lambda \int d\omega\, S(\omega)^b\, \omega^k. \tag{X.27}$$
Parameters b and k are undetermined numbers. The information quantity J (far-right term) that fixes the scenario is a generalized Mellin transform of S(ω). In the particular case b = 1, J becomes the ordinary Mellin transform. The Mellin transform has been shown (Zwillinger, 1992) to be a solution to classes of fractional differential equations. Fractional and fractal effects of many types dominate the analyses of 1/f noise (Flandrin, 1989; Frieden and Hughes, 1994). We may now find the equilibrium solution S(ω). The Lagrangian in (X.27) is
$$L = L[\omega, S(\omega)] = 4/S - \lambda S^b \omega^k. \tag{X.28}$$
The solution by either Euler-Lagrange equation ∂L/∂S = 0 or L = 0 is the same (as required earlier),
$$S(\omega) = C\omega^{-\alpha}, \qquad C \text{ a constant}, \quad \alpha = 1 - \frac{c}{b+1} \geq 0. \tag{X.29}$$
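A quick check of Eq. (X.29): with α = 1 − c/(b + 1) and the matching constant C, the Lagrangian (X.28) vanishes identically in ω, so the extremum and zero conditions are met simultaneously. Parameter values below are arbitrary test points:

```python
# Check of Eq. (X.29): the power law S(omega) = C * omega**(-alpha) with
# alpha = 1 - c/(b + 1) and C = (4/lam)**(1/(b + 1)) makes the Lagrangian
# (X.28), L = 4/S - lam * S**b * omega**k, vanish at every omega.
# Parameter values are arbitrary test points.
lam, b, c = 2.0, 1.0, 0.5
k = 1 + b - c                      # from Eq. (X.26)
alpha = 1 - c / (b + 1)            # exponent of Eq. (X.29)
C = (4 / lam) ** (1 / (b + 1))

S = lambda w: C * w ** (-alpha)
residual = max(abs(4 / S(w) - lam * S(w) ** b * w ** k)
               for w in (0.3, 1.0, 4.0))
print(residual)   # zero up to rounding
```

Matching the ω exponents of the two Lagrangian terms gives α(b + 1) = k, i.e., α = k/(b + 1) = 1 − c/(b + 1), which is how (X.29) follows from (X.26).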
Equation (X.26) was also used. The exponent is negative because, physically, S(ω) should attenuate with ω, not grow. The case α = 0 represents white noise. The case c = 0 is of interest. By Eq. (X.29) it causes pure 1/ω noise, and this is independent of b. Also, by Eq. (X.25), λ does not depend upon the choice of unit a. Perhaps this indicates a dominant solution. Solo (1992) has shown that solution (X.29) is consistent with the IRF₀ assumption if 1 ≤ α ≤ 2. Empirically, this includes the majority of 1/f phenomena (Keshner, 1982). However, there are physical cases for which α < 1 (Bell, 1980) or α > 2 (Flandrin, 1989). These are beyond the scope of this derivation. It appears that the IRF₀ assumption is slightly too restrictive in this regard. Indeed, the only property of an IRF₀ process that was used is that its spectrum Z_T(ω) obeys a central limit theorem (see Eq. (X.12) et seq.). It may be that a less restrictive process exists that likewise obeys a central limit theorem. The scope of the approach can be somewhat broadened. The same solution (X.29) results from extremizing the information at a single frequency ω,
$$I = I(\omega) = 4/S(\omega) - \lambda F[S(\omega), \omega]. \tag{X.30}$$
Arguments (X.17a)-(X.18) and (X.20)-(X.26) follow for this I as well. Therefore, the condition for integral form (X.14) to hold may now be lifted: this is that Z_T(ω) be independent over frequencies ω. The EPI approach allows further generalization. Instead of Eq. (X.16), which allows for one input of scenario information, postulate the simultaneous presence of many such inputs, as in
$$I = 4 \int d\omega / S(\omega) - \sum_{n=1}^{N} \lambda_n \int d\omega\, F_n(S, \omega). \tag{X.31}$$
This physically represents the presence of N competing processes. Interestingly, as in the previous case (N = 1) the functions Fₙ(S, ω) may again be fixed by the argument (X.17a)-(X.18) that the solution S(ω) should be a principal solution, and argument (X.20)-(X.26) that I should remain zero under a linear change of coordinate ω. The result is an information
$$I = 4 \int d\omega / S(\omega) - \sum_{n=1}^{N} \lambda_n \int d\omega\, S(\omega)^{b_n}\, \omega^{k_n} \tag{X.32}$$
(compare with Eq. (X.27)). The principal solution must then obey a transcendental equation
$$4/S(\omega) - \sum_{n=1}^{N} \lambda_n S(\omega)^{b_n}\, \omega^{k_n} = 0. \tag{X.33}$$
This is a polynomial equation of power $\beta = \max_n\,(b_n + 1)$
in S, and so does not have a closed-form solution unless β is 4 or less. The solution simplifies if all bₙ = b, a constant, to
$$S(\omega) = \left(\frac{4}{\sum_{n=1}^{N} \lambda_n \omega^{k_n}}\right)^{1/(b+1)}. \tag{X.34}$$
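The closed form (X.34) can be cross-checked against the transcendental equation (X.33) by root finding. The λₙ, kₙ values below are arbitrary test data:

```python
# Cross-check of Eqs. (X.33)-(X.34): when all b_n = b, the root of the
# transcendental equation 4/S - sum_n lam_n * S**b * omega**k_n = 0
# agrees with the closed form S = (4 / sum_n lam_n*omega**k_n)**(1/(b+1)).
# The lam_n, k_n values are arbitrary test data.
lams = [0.7, 1.3, 0.4]
ks = [0.0, 1.0, 2.0]
b, w = 2.0, 1.6

poly = sum(l * w ** k for l, k in zip(lams, ks))
S_closed = (4 / poly) ** (1 / (b + 1))

def f(S):
    return 4 / S - sum(l * S ** b * w ** k for l, k in zip(lams, ks))

# Bisection: f is strictly decreasing in S for S > 0.
lo, hi = 1e-6, 100.0
for _ in range(200):
    mid = (lo + hi) / 2
    if f(mid) > 0:
        lo = mid
    else:
        hi = mid

S_root = (lo + hi) / 2
print(abs(S_root - S_closed))   # agreement to machine precision
```

Multiplying (X.33) through by S shows why: with a common b the equation collapses to a single power S^(b+1) = 4/Σλₙω^(kₙ), which is (X.34).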
In the case b = −4, kₙ = 0, 1, 2, ..., this becomes Burg's (1978) maximum entropy spectral estimate. Hence, the two estimation principles of extreme physical information and maximum entropy are convergent in this case.
H. Discussion

The principle of extreme physical information derives paradigms of physics. These are phenomena that are unexplainable by other, known phenomena. The wave equations of quantum mechanics (Section VII) are good examples. The EPI approach requires one physical fact, defining the particular scenario, which is insufficient in itself to derive the paradigm. [Here, it is that X(t) is an IRF₀.] This fact, combined with a condition of extreme disorder in the Fisher sense, derives the paradigm. The answer that EPI provides to the ultimate question of why a paradigm arises is that the paradigm is an expression by nature of extreme disorder. No other mechanism need be invoked. Turning to the problem at hand, we note that attempts at unifying 1/f power spectra from a phenomenological viewpoint have been only partially successful; see surveys of Bell (1980) and Weissman (1988) covering decades
of past work. To us, this suggests that the phenomenon is a distinct paradigm, unexplainable by other phenomena, and hence of the type derivable by EPI. The EPI derivation rests upon the validity of the IRF₀ assumption, and upon internal consistency of the EPI approach including its axioms. No other physical mechanism has been used. The random fields considered are, roughly speaking, filtered versions of "nicely behaved" white noise. To the extent that such a field is present, the 1/f result Eq. (X.29) follows as an expression of extreme disorder. It would be useful to determine the extent to which the numerous physical 1/f phenomena follow the IRF₀ model, and if the model can be broadened to permit a slightly wider range of α values. We leave these to future research.

XI. SYNOPSIS AND HIGHLIGHTS OF DERIVATIONS
The Fisher trace information I of (III.34), derived in Sections III,E-H, has been found to provide a basis for the physical laws obeyed by the electron. The approach also provides definitions for the basic electron properties of mass (Eq. (VII.24)), momentum (Eq. (VII.13)), and energy (Eq. (VII.15)). The approach requires the conjoint use of I and a single physical truth defining each field of physics. The latter defines a conditional, scenario-dependent information quantity J that when subtracted from I forms a "physical information" I. This representation for I derives from Axioms (i)-(iv) of Section IV,A. Extremizing, or zeroing, I gives rise to the physical law q for the scenario. A premise of EPI theory is that every zero or extremum solution for I has physical significance. This is a new route to defining physical phenomena, some of which yet lie undiscovered. For example, in quantum mechanics, the definition of momentum-energy space leads to conditional information J obeying Eq. (VII.6). Then zeroing I leads to the Dirac Eq. (VII.42), while extremizing I leads to the Klein-Gordon Eq. (VII.32). The minimum necessary dimension N for the zero-root problem (VII.40) was found to be N = 4, corresponding to a spin-1/2 particle. But alternative, higher dimension-N roots should exist as well. These should imply the existence of fundamental particles having higher spin than 1/2. Axiom (iii) of EPI theory (Section IV,A) was found to be essential to all manner of physical phenomena. It states that information I must always have the constant value zero. This is regardless of phenomenon or choice of representation for the phenomenon. By the latter is meant all arbitrary choices made in describing a phenomenon, such as choice of coordinate
system, or gauge, or relative frame velocity. Undoubtedly there are others as well, and their use must lead to further advances. Since I is the difference of I and J, we wanted these quantities as well to obey such invariances. But is I of the proper mathematical form to obey these? A strength of the theory is that information I is, in fact, of the proper form. Thus, I can be placed generally in the coordinate-covariant form (III.36) or in the gauge-covariant forms (VII.31) or (VII.42). Mathematically, coordinate covariance for I holds because of use of the characteristic information state (Section III,H). Coordinate covariance was essential in deriving the Einstein equations of motion (Section IX). It also permits the extension of other laws, such as the Klein-Gordon equation, to curved space (as in Lawrie, 1990, pp. 147-153). The other covariance, gauge covariance, was used to derive the field dependence of the Klein-Gordon and Dirac formulations (Section VII). Interestingly, the recipe for accomplishing gauge covariance dovetails with the requirement of EPI that information I be in the optimized form (III.34). The recipe for gauge covariance is first to form I in a free-field state, and then to insert the fields through replacements (VII.1). Fortunately, it is in this very free-field state that I always has the form (III.34) required by EPI (see, e.g., Eqs. (VI.7) or (IX.11)). The EPI principle (IV.11) is based upon the credo (IV.1), which is a statement of the "perversity of nature." It has obvious parallels with the second law of thermodynamics, in that both state that, by some measure, the "disorder" of an isolated system must increase.
However, the measures of disorder are different (Boltzmann entropy on the one hand, Fisher information on the other), and this is basically because the two information channels differ: Entropy derives from a “communication” channel model, where the central issue is the number of ways a given signal or system configuration can be formed, whereas Fisher information derives from a measure-estimation channel (Section III,A), where the central issue is how accurately a parameter can be estimated after a measurement. The rules of special relativity, Section V, were derived on the basis of Axiom (iii), that information I must always be zero. Here it is required to be zero regardless of the relative (constant) velocity of a reference frame. The Lorentz group of transformations result. This result affects most of the subsequent derivations based upon EPI, since an implication is that dimension K should normally be 4. A dimension of 4 permits frame invariance to hold for any physical law derived from EPI. The Maxwell equations of classical electrodynamics (Section VI) derive from EPI, again using dimension K = 4. The scenario must be defined by a choice of conditional information J , and this is done by a self-consistency argument. The result is the e.m. wave equation (VI.25) in the Lorentz gauge
(the only covariant gauge). It is then shown (Appendix D) that this implies the Maxwell equations. Also, see Frieden (1992a). The quantum mechanics of a relativistic particle is derived in Section VII. The requirements of the "characteristic state" are that a free-field situation be present initially (see previous discussion). Also, dimension K = 4 is required, as before. Although modes q are purely real, they may be packed as complex modes ψ that naturally become "probability amplitudes," i.e., quantities whose squares define the p.d.f. for the particle. (There is no need for the Born ad hoc interpretation.) Also, the development never requires the interpretation of mathematical derivative operators as physical operators, i.e., iℏ∇ → momentum, etc. In fact, these identifications may be regarded as results of the development. The equivalence of energy, mass, and momentum, Eq. (VII.25), is seen to derive as well from the EPI approach. Once more, Axiom (iii) is used. For Fisher I to be a constant it cannot depend on the particular p, E for a problem. This directly implies (VII.25). That parameters e, c, h, and m are universal constants is also shown (Sections IX and VII) to follow from Axiom (iii). The invariance of information implies the invariance of the physical constants. The Klein-Gordon equation follows as the extremum solution to the EPI problem, while the Dirac equation follows as the zero-root solution. EPI thereby provides a unified approach to quantum mechanics. The Dirac route, in particular, is new since it does not follow from extremizing a Lagrangian or postulating an energy eigenvalue equation (the usual approaches). Finally, it is interesting to consider where the concept of phase arises in the theory. It traces from the characteristic measure-estimation scenario, whereby separate estimators may be formed depending upon which support region the data space-time measurement falls into (see Appendix B).
Each support region defines a different mode qn, and successive pairs of these define a complex probability amplitude wn (Section VII,B). Thus, phase originates as a concept of the Fisher measure-estimation channel. The Heisenberg uncertainty principle (position-momentum and time-energy versions) was derived in Section VIII. These are seen to be manifestations of the Cramer-Rao inequality (III.3). In these derivations the Fisher information takes on the physical significance of either mean-square momentum or mean-square energy. Also used was the fact that separated modes q in the characteristic state produce a greater information I than if the modes are made to overlap. The derivations grow out of an attempt at measuring the mean coordinate of a particle. The ensuing C-R inequality shows that, as a result of the coordinate measurement, the conjugate coordinate is uncertain. As an example, measuring position causes an intrinsic uncertainty in momentum, even before measurement of
the latter is attempted. Another interesting result is that the positional uncertainty ex in the derived Heisenberg law (VIII.11) represents the r.m.s. error in the processed data value, not just in the data value alone. Hence, the Heisenberg principle is stronger than as usually stated. It binds processed data as well as raw data. The equations of motion of general relativity were derived in Section IX. This derivation is interesting in that cosmological principles are used to derive the kinematics. These are found to obey the Einstein equations. The cosmological principle and Hubble's law are used as inputs defining the scenario. The transformation rules of special relativity, derived in Section V, require coordinate dimension K = 4 to be used, and the modes q to form a four-vector, i.e., N = 4. This results in an expression (IX.14) for Fisher I in any generally curved coordinate system. Once again we invoke Axiom (iii), that physical information I should be zero independent of the coordinate system. The resulting information I, Eq. (IX.18), has as its extremum solution the Einstein differential equations of motion for the particle. This describes a geodesic path in the generally curved space. A second interesting output of the theory is Eq. (IX.19), stating that an information flow rate dI/dτ exists, τ the proper time, and is proportional to both Hubble's constant and the rest energy mc² of the particle. A third item of interest is the linkage between the mass m and Hubble's constant H. These always occur in the theory as the simple product mH, so that if this is regarded as a constant (as it usually is), then rest mass varies inversely with the universal expansion rate. This prediction seems plausible. Finally, the EPI principle was used to derive the ubiquitous 1/f power spectral noise effect.
The appropriate gedanken measurement is that of the spectral amplitude of a tone (at a fixed frequency ω). The gedanken aim is to estimate the amplitude of the tone that occurred over an initial, shorter time interval. The physical scenario is defined by a time series X(t) that is an intrinsic random field of order zero. This type of signal obeys a central limit, which means that its spectrum is Gaussian. This allows the Fisher I to be expressed analytically as 4/S(ω) in Eq. (X.13), with S the power spectrum. A general physical information I is formed from this in Eq. (X.16), in terms of a functional F[S(ω), ω] to be determined. The functional F is found from two requirements: (a) that S(ω) be a principal solution of EPI, i.e., satisfy both its extremum and zero requirements; and (b) that, invoking Axiom (iii), the information I should be invariant to a linear change of coordinate in frequency space. The S(ω) that satisfies (a) and (b) is then found to be of the general 1/f form (X.29). If many competing processes are more generally present, EPI predicts a power spectrum obeying (X.33) to result, with particular solution (X.34) resembling J. P. Burg's maxent formulation (Burg, 1978).
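The 1/f character of (X.29) can be illustrated with a synthetic series. This is not the EPI derivation itself, only a numerical check that a series whose spectrum is shaped as S(f) ∝ 1/f shows a log-log periodogram slope of −1; the sizes and seed below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthesize a time series with power spectrum S(f) ~ 1/f by shaping the
# frequency-domain amplitudes of random-phase noise (a standard trick).
n = 2**16
freqs = np.fft.rfftfreq(n, d=1.0)
amp = np.zeros_like(freqs)
amp[1:] = 1.0/np.sqrt(freqs[1:])          # |X(f)| ~ f^(-1/2)  =>  S(f) ~ 1/f
phases = rng.uniform(0.0, 2.0*np.pi, size=freqs.size)
x = np.fft.irfft(amp*np.exp(1j*phases), n=n)

# Periodogram of the series, and its log-log slope over mid frequencies.
S = np.abs(np.fft.rfft(x))**2
band = (freqs > 1e-3) & (freqs < 1e-1)
slope = np.polyfit(np.log(freqs[band]), np.log(S[band]), 1)[0]

assert abs(slope + 1.0) < 0.05            # slope ~ -1: the 1/f law
```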
The EPI principle is a new tool of physical analysis and prediction. It continues to be developed and used. We briefly report on two new uses:

1. The second law of thermodynamics states that the entropy H of an isolated system must increase, dH/dt ≥ 0. However, the law does not provide a limit for the amount of increase possible. We have found (Nikolov and Frieden, 1994) that

dH/dt ≤ aI,   (XI.1)

a a constant and I the Fisher information about the mean particle position. This holds for classical particles or for generally relativistic electrons. Hence, entropy must increase, but not by too much, in the presence of an information level I.

2. The Lorenz equations (Lorenz, 1963), describing a chaotic system, are of the form

q̇n(t) = fn(q),   n = 1, …, N,   q ≡ (q1, …, qN),   (XI.2)
where the dot denotes a derivative d/dt. Each fn(q) is a known function of the modes q. If these functions are nonlinear in q, then chaos results. We show next that the Lorenz equations readily follow from EPI theory, as a zero-root solution. Using the time t as the parameter to measure in the gedanken experiment, the Fisher information (III.34) is

I = 4 Σn=1..N ∫ dt q̇n²(t).   (XI.3)

By the standard EPI procedure, we must find what I equals for the particular scenario. Since by (XI.3) I is positive-definite, it may generally be expressed as a form

I = 4 Σn=1..N ∫ dt fn²(q) ≥ 0,   q = q(t),   (XI.4)

where fn(q), n = 1, …, N, is a real, vector function of the q. Next, the physical information is formed, and we seek a zero-root

I = 4 Σn=1..N ∫ dt [q̇n²(t) − fn²(q)] = 0.   (XI.5)

One particular set of roots are those satisfying

q̇n²(t) − fn²(q) = 0,   n = 1, 2, … .   (XI.6)
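As an aside, the chaotic behavior claimed for nonlinear fn(q) can be demonstrated by integrating one such zero-root system, the Lorenz equations (XI.2), numerically. The parameter values below are the conventional Lorenz choices, an assumption not stated in the text:

```python
import numpy as np

def lorenz(q):
    # The classic Lorenz choice of f_n(q); sigma=10, rho=28, beta=8/3 are the
    # conventional parameter values (assumed).  Nonlinear in q, hence chaotic.
    x, y, z = q
    return np.array([10.0*(y - x), x*(28.0 - z) - y, x*y - (8.0/3.0)*z])

def rk4_step(f, q, dt):
    # One fourth-order Runge-Kutta step for dq/dt = f(q).
    k1 = f(q)
    k2 = f(q + 0.5*dt*k1)
    k3 = f(q + 0.5*dt*k2)
    k4 = f(q + dt*k3)
    return q + dt*(k1 + 2*k2 + 2*k3 + k4)/6.0

# Two trajectories starting 1e-6 apart: sensitive dependence on initial data.
qa = np.array([1.0, 1.0, 1.0])
qb = np.array([1.0, 1.0, 1.0 + 1e-6])
dt = 0.001
max_sep = 0.0
for _ in range(25000):                    # integrate to t = 25
    qa = rk4_step(lorenz, qa, dt)
    qb = rk4_step(lorenz, qb, dt)
    max_sep = max(max_sep, float(np.linalg.norm(qa - qb)))

assert max_sep > 1.0                      # divergence from a 1e-6 perturbation
```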
This, of course, is factorable, with a solution (XI.2) as required. Thus, the Lorenz equations arise readily as a zero-root of the physical information, for a scenario defined by the condition that Fisher I be positive. Since this is a weak condition, the Lorenz equations should have widespread occurrence. If the functions fn(q) are nonlinear in the q, chaos results. Hence, EPI accommodates chaos and predicts its widespread occurrence.

ACKNOWLEDGMENTS

I thank B. H. Soffer, J. Hess, and J. O. Kessler for their initial, and continuing, encouragement of this work. Conversations with R. N. Silver and E. M. Wright on "the basics" have been particularly informative and helpful. Philosophical discussions with B. H. Soffer have been inspiring. An ongoing collaboration with W. J. Cocke has opened up cosmological vistas. R. J. Hughes originally suggested that the information approach might imply the constancy of the universal physical constants. Finally, my remote but enthusiastic correspondence with B. Nikolov, V. Solo, and R. J. Hughes has kept the juices flowing and the ideas forming.
APPENDIX A: FISHER INFORMATION OBEYS ADDITIVITY

Suppose that each parameter θ is K-dimensional, and that there are M of these, so that the data are p_mk, m = 1, …, M; k = 1, …, K. The scalar Fisher information was defined [Eq. (III.27)] to be the trace of the Fisher information matrix,

I = Σi ∫ dp (∂ ln p/∂θi)² p,   p = p(p|θ).   (A.1)

Because the data are independent,

p = p(p|θ) = Πn p(pn|θ),   (A.2)

so that

ln p = Σn ln p(pn|θ).

Taking the partial derivative,

∂ ln p/∂θi = Σn ∂ ln p(pn|θ)/∂θi.   (A.3)
Substituting this into Eq. (A.1) gives

I = Σi ∫ dp p [Σn ∂ ln p(pn|θ)/∂θi]².   (A.4)

When the indicated right-hand square is taken, all integrated cross terms vanish. This is shown as follows. In the square of the inner sum, there will be perfect squares and cross terms. The perfect-squares contribution to I consists of the terms, identified by index n,

Σn Σi ∫ dp p [∂ ln p(pn|θ)/∂θi]² ≡ Σn I(n).   (A.5)

Comparing this with Eq. (A.1) shows that I is now the sum (over n) of M information contributions I(n), or one from each data value. This is what we set out to prove. The remainder is all cross terms, a sum

T = 2 Σi Σ_{m≠k} T_mk,   (A.7)

where

T_mk = ∫ dp p [∂ ln p(pm|θ)/∂θi][∂ ln p(pk|θ)/∂θi].   (A.8)

By the independence (A.2), each factor of T_mk integrates separately, and

∫ dpm p(pm|θ) ∂ ln p(pm|θ)/∂θi = ∫ dpm ∂p(pm|θ)/∂θi = ∂/∂θi ∫ dpm p(pm|θ),   (A.9)

taking the derivative outside. But the integral that remains is the normalization integral, or unity. Therefore, its derivative is zero. Thus, all cross terms in (A.7) are zero, and the net I is the sum of informations I(n) over all data values, Eq. (A.5), as we were to prove.
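The additivity just proved can be checked by direct quadrature in a simple case; the Gaussian likelihoods below are an illustrative choice, not taken from the text:

```python
import numpy as np

def gauss(x, s):
    return np.exp(-x**2/(2.0*s**2))/(s*np.sqrt(2.0*np.pi))

# Fisher information about a common location parameter theta (at theta = 0),
# I = Int dx p (d ln p/d theta)^2, evaluated by direct quadrature.
def fisher_1d(s):
    x = np.linspace(-12.0, 12.0, 4001)
    dx = x[1] - x[0]
    score = x/s**2                        # d ln p(x - theta)/d theta at theta = 0
    return np.sum(gauss(x, s)*score**2)*dx

# Joint Fisher information for two independent measurements of theta.
def fisher_2d(s1, s2):
    x = np.linspace(-12.0, 12.0, 1201)
    dx = x[1] - x[0]
    X, Y = np.meshgrid(x, x)
    p = gauss(X, s1)*gauss(Y, s2)         # independence: p = p1 * p2, as in (A.2)
    score = X/s1**2 + Y/s2**2             # scores add, as in (A.3)
    return np.sum(p*score**2)*dx*dx

Ia, Ib = fisher_1d(1.0), fisher_1d(2.0)   # analytically 1 and 0.25
Iab = fisher_2d(1.0, 2.0)

assert np.isclose(Iab, Ia + Ib, rtol=1e-3)   # additivity over independent data
```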
APPENDIX B: MAXIMAL INFORMATION AND MINIMAL ERROR IN THE CHARACTERISTIC STATE

As described below Eq. (III.34), with knowledge of separated modes qn(r) the observer is free to form a distinct estimator θ̂(p, n) for each region n of data space into which the observable p falls. We show here that the separated modes qn thereby give a higher information I and a smaller error e than when the same modes are allowed to overlap.
For simplicity, we work with a one-dimensional (r = x) case q1(x), q2(x) of two modes. We want to show that the use of two optimum strategies θ̂(y, i), i = 1, 2, leads to a smaller estimation error e² and a larger information I than if only one strategy is used. If the modes q1(x), q2(x) generally overlap, then only a single estimation rule θ̂(y) may be used, and the derivation (III.5)-(III.10) gives

e′²min = 1/I′,   (B.1)

I′ = ∫ dx p′²(x)/p(x),   p′(x) ≡ dp(x)/dx.   (B.2)

Or, using q(x) defined by

p(x) = q(x)²,   (B.3)

Eq. (B.2) becomes

I′ = 4 ∫ dx q′²(x).   (B.4)

In summary, the two component laws p1(x), p2(x) contribute to I′ only through their sum p(x) when a single estimation rule θ̂(y) is used. By contrast, if the modes q1(x), q2(x) do not overlap, two distinct estimators θ̂1(y), θ̂2(y) may be used. Now the derivation steps (III.5)-(III.10) give the results (Frieden, 1991b)

e²min = 1/I,   (B.5)

I = ∫ dx [p1′²(x)/p1(x) + p2′²(x)/p2(x)].   (B.6)

Here the individual components p1(x), p2(x) explicitly contribute to I. If we can show that

I ≥ I′,   (B.7)

it will follow, by Eqs. (B.1) and (B.5), that

e²min ≤ e′²min,   (B.8)

one aim of this appendix. To show (B.7), we have to express I and I′ in terms of the same quantities, q1(x), q2(x). Toward this end, by Eq. (III.33a) information I in Eq. (B.6) becomes

I = 4 ∫ dx [q1′²(x) + q2′²(x)].   (B.9)
Also, by Eqs. (III.31), (III.33b), and (B.3),

q(x)² = q1(x)² + q2(x)².   (B.10)

Differentiation gives

qq′ = q1q1′ + q2q2′,   (B.11)

so that

q′² = (q1q1′ + q2q2′)²/(q1² + q2²).   (B.12)

Using this in Eq. (B.4) gives

I′ = 4 ∫ dx (q1q1′ + q2q2′)²/(q1² + q2²).   (B.13)

Now the quantities I and I′ can be directly compared. By Eqs. (B.9) and (B.13), inequality (B.7) will hold if, at each x,

q1′² + q2′² ≥ (q1q1′ + q2q2′)²/(q1² + q2²).   (B.14)

This inequality is easily verified. Square out the right-hand side and cross-multiply. Then (B.14) holds if

q1′²q1² + q1′²q2² + q2′²q1² + q2′²q2² ≥ q1²q1′² + q2²q2′² + 2q1q1′q2q2′.   (B.15)

After cancellation, (B.15) becomes a requirement

q1′²q2² + q2′²q1² ≥ 2q1q1′q2q2′.   (B.16)

But, identically,

(q1q2′ − q2q1′)² = q1²q2′² + q2²q1′² − 2q1q1′q2q2′.   (B.17)

Transposing, we have

q1²q2′² + q2²q1′² = (q1q2′ − q2q1′)² + 2q1q1′q2q2′.   (B.18)

Because the squared term is positive or zero, it follows that

q1²q2′² + q2²q1′² ≥ 2q1q1′q2q2′.   (B.19)

This is exactly requirement (B.16). To summarize, we have shown that inequality (B.16) is true. This implies the truth of inequality (B.14), which in turn implies the truth of inequality (B.7). Then by (B.1) and (B.5), inequality (B.8) holds. These results are easily generalized to the presence of N modes qn(r), where r has a general dimension K. In summary, when a given law p(r) consists of N separated modes, N distinct estimation rules may be used, the information from each mode adds maximally, and the result is a decreased error of estimation.
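The comparison I ≥ I′ can be checked numerically for a concrete pair of separated modes; the Gaussian shapes below are illustrative only:

```python
import numpy as np

# Two well-separated modes (Gaussians; an illustrative choice) and the two
# informations compared in this appendix.
x = np.linspace(-12.0, 12.0, 6001)
dx = x[1] - x[0]
q1 = np.exp(-(x + 4.0)**2)
q2 = np.exp(-(x - 4.0)**2)
dq1 = np.gradient(q1, x)
dq2 = np.gradient(q2, x)

# I of (B.9): separated modes, two estimators.
I = 4.0*np.sum(dq1**2 + dq2**2)*dx

# I' of (B.13): a single estimator, acting only through p = q1^2 + q2^2.
Iprime = 4.0*np.sum((q1*dq1 + q2*dq2)**2/(q1**2 + q2**2))*dx

assert I >= Iprime                        # inequality (B.7)

# The pointwise inequality (B.14) also holds across the grid:
lhs = dq1**2 + dq2**2
rhs = (q1*dq1 + q2*dq2)**2/(q1**2 + q2**2)
assert np.all(rhs <= lhs + 1e-12)
```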
APPENDIX C: PROPERTIES OF THE INFORMATION DIVERGENCE QUANTITY I(θ, θ′)

A typical information divergence is the Kullback-Leibler form

G(θ, θ′) = ∫ dp p(p|θ) ln[p(p|θ)/p(p|θ′)].   (C.1)

This is a measure of the "distance" between the two curves p(p|θ) and p(p|θ′), θ′ ≠ θ generally. The quantities θ, θ′ are alternative parameter lists that quantify the shape of the curve p. For example, θ is one pair of values of mean and variance, while θ′ is another. The aim here is to show that the quantity I(θ, θ′) defined at Eq. (IV.15) obeys properties (i)-(iv) (listed below Eq. (IV.15)) that define an information divergence (Amari, 1985). We examine these in turn:

(i) Comparing Eqs. (IV.11) and (IV.15), with θ the solution, by Eq. (IV.11) I(θ, θ) = 0 identically. Thus, I(θ, θ′) is a measure of the distance between a trial solution θ′ and the actual solution θ, and I(θ, θ) is the self-distance of zero.

(ii) The extremization requirement in Eq. (IV.11) is now to be effected through variation of the discrete parameters θ′. Assuming the solution θ′ = θ to exist at an interior point of its space, the ordinary rules of calculus require that property (ii) be obeyed, with I replacing D.

(iii) Define the physical information as the difference

I(θ, θ′) = I(θ, θ′) − J(θ, θ′),   (C.2)

where

I(θ, θ′) = 4 Σn=1..N ∫ dr ∇qn(r|θ′) · ∇qn(r|θ′),   J(θ, θ′) = ∫ dr F[q(r|θ′), r].   (C.3)

Equations (C.3) are direct θ′-parametrizations of the Fisher information I and the conditional information J treated previously. Property (iii) requires us to evaluate I(θ, θ + dθ) as an expansion up to second order in powers of dθ, with g_nij identified as the coefficient of the quadratic expansion term dθ_ni dθ_nj (Caianiello, 1992). From definition (C.3),

I(θ, θ + dθ) = 4 Σn ∫ dr ∇qn(r|θ + dθ) · ∇qn(r|θ + dθ).   (C.4)

Use the expansion

qn(r|θ + dθ) = qn(r|θ) + Σi dθ_ni ∂qn/∂θ_ni,   (C.5)
noting that higher-order derivatives in θ will not contribute because of the linear representation (IV.14) for q. It results that

I(θ, θ + dθ) = I(θ, θ) + (first-order terms) + 4 Σn Σij dθ_ni dθ_nj ∫ dr ∇(∂qn/∂θ_ni) · ∇(∂qn/∂θ_nj).   (C.6)

We next evaluate J(θ, θ + dθ). To facilitate the calculation, specialize to the case

F[q(r), r] = 4 f²(r) Σn=1..N qn²(r|θ).   (C.7)

This separation of F is valid in all physical scenarios below except for the electromagnetic one (Section VI). We defer the electromagnetic case to future study. The quantity f²(r) now replaces F as defining the particular scenario. As shown before (Section II), it also allows the requirement I = 0 at the extremum to be mathematically satisfied. Now using the expansion

qn(r|θ + dθ) = qn + Σi dθ_ni ∂qn/∂θ_ni   (C.8)

(again ignoring higher-order derivatives in θ, as above), definition (C.3) gives

J(θ, θ + dθ) = J(θ, θ) + (first-order terms) + 4 Σn Σij dθ_ni dθ_nj ∫ dr f²(r) (∂qn/∂θ_ni)(∂qn/∂θ_nj).   (C.9)

Then combining Eqs. (C.2), (C.6), and (C.9) gives

I(θ, θ + dθ) = I(θ, θ) + (first variation) + 4 Σ_{n,i,j} dθ_ni dθ_nj g_nij,   (C.10)

where

g_nij = ∫ dr [∇(∂qn/∂θ_ni) · ∇(∂qn/∂θ_nj) − f²(r)(∂qn/∂θ_ni)(∂qn/∂θ_nj)]   (C.11)

is the metric tensor for these physical scenarios. But the first right-hand term in (C.10) is zero by property (i) of this appendix. Also, the second right-hand term is the first variation of I about the solution point θ, and
by (ii) this is zero as well. The result is that

I(θ, θ + dθ) = 4 Σ_{n,i,j} dθ_ni dθ_nj g_nij.   (C.12)

If we also directly expand I(θ, θ + dθ) in a Taylor series about θ, and compare terms with (C.12), result (iii) follows. Using the fact that the ∇ and ∂/∂θ_ni operations commute, and using Green's theorem, gives the simpler form

g_nij = −∫ dr (∂qn/∂θ_ni)[∇² + f²(r)](∂qn/∂θ_nj).   (C.13)

(iv) For the expansion choice (IV.14), (C.13) becomes

g_nij = −∫ dr φi(r)[∇² + f²(r)]φj(r) = g_ij,   (C.14)

the n-dependence dropping out. All modes qn have the same metric tensor. Form (C.14) of the tensor looks like a quantum-mechanical energy matrix element, but it applies to the wider scope of physical scenarios f(r) (all but Section VI). Hence, the physical information divergence I describes each scenario f(r) by a generally different metric g_ij. In the scenario of quantum mechanics, (C.14) is a modified energy matrix element. The modification is that the φi, φj are not energy eigenfunctions, as would be required for (C.14) to be an energy matrix element. The eigenfunctions are the qn(r), while the φi(r) are arbitrary input functions; see (IV.14). Finally, because of the choice (IV.14), Eq. (C.13) may alternatively be expressed in terms of a "potential function" v(θ). It is easy to show that v(θ) is, in fact, I/8, in terms of the information itself.
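The quadratic behavior of a divergence about its solution point, with the Fisher information supplying the coefficient, can be checked numerically for the Kullback-Leibler form (C.1); the Gaussian location family below is an illustrative choice:

```python
import numpy as np

# Kullback-Leibler divergence (C.1) between two members p(x|theta) of a
# Gaussian location family, by quadrature; near the solution point it is
# quadratic, G(theta, theta + d) ~ (1/2) g d^2, with g the Fisher
# information of the family (= 1/sigma^2 here).
x = np.linspace(-15.0, 15.0, 60001)
dx = x[1] - x[0]
sigma = 1.5

def gauss(mu):
    return np.exp(-(x - mu)**2/(2.0*sigma**2))/(sigma*np.sqrt(2.0*np.pi))

def kl(p, q):
    return np.sum(p*np.log(p/q))*dx

g = 1.0/sigma**2                          # Fisher information of the family
for d in (0.1, 0.05, 0.025):
    assert np.isclose(kl(gauss(0.0), gauss(d)), 0.5*g*d**2, rtol=1e-4)

assert kl(gauss(0.0), gauss(0.0)) == 0.0  # self-distance zero, property (i)
```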
APPENDIX D: MAXWELL'S EQUATIONS FROM THE VECTOR WAVE EQUATION

We derive Maxwell's equations in vacuum, assuming Eq. (VI.25),

□²A = −(4π/c) j,   □²φ = −4πρ,   (D.1)
and the Lorentz gauge,

∇·A + (1/c) ∂φ/∂t = 0.   (D.2)

Definitions:

B ≡ ∇ × A,   (D.4)

E ≡ −∇φ − (1/c) ∂A/∂t.   (D.5)

The latter two of course define the fields B, E. The four Maxwell equations derive as follows:

1.

∇·B = 0,   (D.6)

since

∇·(∇ × A) = 0   (D.7)

is a vector identity.

2.

(1/c) ∂B/∂t = (1/c) ∇ × ∂A/∂t = ∇ × (−∇φ − E)   (D.8)

by (D.5),

= −∇ × E,   (D.9)

since

∇ × ∇φ = 0   (D.10)

is a vector identity.

3.

∇ × B = ∇ × (∇ × A) = ∇(∇·A) − ∇²A   (D.11)

is a vector identity. Here

∇(∇·A) − ∇²A = −(1/c) ∂(∇φ)/∂t − (1/c²) ∂²A/∂t² + (4π/c) j   (D.12)

by (D.1) and (D.2),

= (1/c) ∂E/∂t + (4π/c) j   (D.13)

identically, by (D.5). Therefore,

∇ × B − (1/c) ∂E/∂t = (4π/c) j.   (D.14)
4.

∇·E = −∇²φ − (1/c) ∂(∇·A)/∂t = −∇²φ + (1/c²) ∂²φ/∂t²   (D.16)

by (D.2),

= −□²φ = 4πρ   (D.17)

by (D.1). Therefore,

∇·E = 4πρ.   (D.18)
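The two homogeneous Maxwell equations above rest purely on vector identities in the potentials, which can be verified symbolically. A small SymPy sketch for arbitrary potentials follows; the inhomogeneous pair additionally requires (D.1)-(D.2) and is not checked here:

```python
import sympy as sp

# Symbolic check that div B = 0 and curl E + (1/c) dB/dt = 0 hold identically
# for arbitrary potentials A(x,y,z,t), phi(x,y,z,t), with B, E as in (D.4), (D.5).
x, y, z, t, c = sp.symbols('x y z t c')
Ax, Ay, Az, phi = (sp.Function(f)(x, y, z, t) for f in ('Ax', 'Ay', 'Az', 'phi'))

def curl(F):
    Fx, Fy, Fz = F
    return (sp.diff(Fz, y) - sp.diff(Fy, z),
            sp.diff(Fx, z) - sp.diff(Fz, x),
            sp.diff(Fy, x) - sp.diff(Fx, y))

def div(F):
    Fx, Fy, Fz = F
    return sp.diff(Fx, x) + sp.diff(Fy, y) + sp.diff(Fz, z)

A = (Ax, Ay, Az)
B = curl(A)                                          # (D.4)
E = tuple(-sp.diff(phi, v) - sp.diff(Av, t)/c        # (D.5)
          for v, Av in zip((x, y, z), A))

assert sp.simplify(div(B)) == 0                      # (D.6)
faraday = tuple(sp.simplify(ce + sp.diff(bb, t)/c) for ce, bb in zip(curl(E), B))
assert all(e == 0 for e in faraday)                  # Faraday's law, identically
```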
APPENDIX E: DERIVATION OF EQ. (VII.39)

From definitions (VII.37) and (VII.38) we form the product Re(v1 · v2) explicitly; call the resulting expansion Eq. (E.1). The notation is as follows. The quantity α_lmn denotes the (m, n) element of the matrix [α_l], where l = 1, 2, 3 corresponds to components x, y, z, respectively, and ψ_mj denotes the m-component of ∇wj (E.2). We also used

α*_lnk = α_lkn,   β*_nk = β_kn,   (E.3)

a hermiticity property for the matrices.

Evaluation of the individual terms comprising Re(v1 · v2): Consider all possible products in the sum (E.1). The sum of first products is

S1 = −Σ_{n=1}^{N} Σ_{l,m=1}^{3} Σ_{j,k=1}^{N} α*_lkn α_mnj ψ_mj ψ*_lk   (E.4)
after renaming l → m, m → l in the second sum,

S1 = −Σ_{j,k} Σ_{l≠m} (ψ_mj ψ*_lk Σn α_lkn α_mnj + ψ_lj ψ*_mk Σn α_mkn α_lnj) − Σ_{j,k,m} ψ_mj ψ*_mk Σn α_mkn α_mnj,   (E.5)

where the last sum is for m = l. Now suppose that

Σn α_lkn α_mnj = −Σn α_mkn α_lnj,   all j, k, l, m,   l ≠ m.   (E.6)

In matrix notation, this means that

[α_l][α_m] = −[α_m][α_l],   l ≠ m,   (E.7)

or the matrices [α_x], [α_y], [α_z] anticommute. The effect on Eq. (E.5) is directly

S1 = −Σ_{j,k,l,m} α_mkn α_lnj (ψ_lj ψ*_mk − ψ_mj ψ*_lk) − Σ_{j,k,m} ψ_mj ψ*_mk Σn α_mkn α_mnj.   (E.8)

When S1 is integrated over dr, the first double sum contributes zero. This is shown in Appendix F, Eqs. (F.1)-(F.5). Hence, the first double sum effectively contributes zero to S1 of (E.4). Regarding the second double sum in (E.8), suppose that the matrices [α_m] obey

Σn α_mkn α_mnj = δ_jk.

This is equivalent to

[α_m]² = [1],   m = 1, 2, 3,   (E.9)

where [1] denotes the unit diagonal matrix. Then directly the second sum becomes

−Σ_{j,k,m} ψ_mj ψ*_mk δ_jk = −Σ_{j=1}^{N} ∇wj · ∇wj*,   (E.10)
the latter by notation (E.2). This has the form of the first right-hand term in the required form (VII.39) [Eq. (E.11)]. Next consider the sum of second products in Eq. (E.1),

S2 = −q² Σ_{j,k,n} β_kn β_nj wj wk*.   (E.12)
Suppose that the matrix [β] obeys

Σn β_kn β_nj = δ_jk.

This is equivalent to requiring

[β]² = [1].   (E.13)

Then directly S2 becomes

S2 = −q² Σ_{j,k} wj wk* δ_jk = −q² Σ_{j=1}^{N} wj wj*.   (E.14)

This has the form of the third right-hand term in form (VII.39), as required. Next, consider the cross terms in (E.1) coupling the [α] and [β] matrices; call their sum S3 (E.15). We renamed the dummy summation index m to l in the left-hand multiple sum. Suppose that

Σn α_lkn β_nj = −Σn β_kn α_lnj.   (E.16)

In matrix form, this states that

[α_l][β] = −[β][α_l],   (E.17)

or the matrix [β] anticommutes with the matrices [α_x], [α_y], [α_z]. Using identity (E.16) in Eq. (E.15) gives

S3 = −iq Σ_{j,k,l,n} α_lkn β_nj (wk* ψ_lj + wj ψ*_lk).   (E.18)

It is shown in Appendix F that the integral dr dt of this sum vanishes,

∫∫ dr dt S3 = 0.   (E.19)

(See Eqs. (F.6)-(F.9).) Next consider the cross terms in (E.1) involving the terms iλw_4n, iλw*_4n. The interaction with the terms in [β] is a sum

S4 = iqλ Σ_{k,n} (wk* w_4n β_kn + wk w*_4n β_nk)   (E.20)

after renaming the dummy index j to k. Replacing β_kn by β*_nk (hermiticity), the first term in the sum is the complex conjugate of the second. Therefore, the sum of the two is purely real. This makes S4 pure imaginary. Therefore, the Re operation in (E.1) gives zero,

Re(S4) = 0.   (E.21)
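The matrix requirements (E.7), (E.9), (E.13), and (E.17), together with the hermiticity (E.3), are exactly the Dirac-matrix algebra. One concrete realization, the standard Dirac representation (a conventional choice assumed here, not taken from the text), can be checked numerically:

```python
import numpy as np

# Standard Dirac representation of the alpha and beta matrices, built from
# the Pauli matrices; this is one realization satisfying the stated algebra.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
Z = np.zeros((2, 2), dtype=complex)
E2 = np.eye(2, dtype=complex)

alpha = [np.block([[Z, s], [s, Z]]) for s in (sx, sy, sz)]
beta = np.block([[E2, Z], [Z, -E2]])
E4 = np.eye(4)

for l in range(3):
    assert np.allclose(alpha[l], alpha[l].conj().T)              # hermiticity (E.3)
    assert np.allclose(alpha[l] @ alpha[l], E4)                  # (E.9)
    assert np.allclose(alpha[l] @ beta + beta @ alpha[l], 0.0)   # (E.17)
    for m in range(l + 1, 3):
        assert np.allclose(alpha[l] @ alpha[m]
                           + alpha[m] @ alpha[l], 0.0)           # (E.7)
assert np.allclose(beta @ beta, E4)                              # (E.13)
assert np.allclose(beta, beta.conj().T)
```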
The other cross terms of this type in (E.1) are the interactions of the terms iλw_4n, iλw*_4n with the terms in [α]. This gives a sum

S5 = −λ (Σ_{k,l,n} α*_lnk ψ*_lk w_4n − Σ_{k,l,n} α_lnk ψ_lk w*_4n)   (E.22)

after renaming the dummy indices j, m to k, l, respectively, and replacing α_lkn by α*_lnk (hermiticity). The first sum is the complex conjugate of the second. Therefore, the difference of the two is pure imaginary. Hence, its real part is zero,

Re(S5) = 0.   (E.23)

Finally, the end terms iλw_4n, −iλw*_4n in form (E.1) multiply each other to give a contribution

S6 = Re λ² Σn w_4n w*_4n,   (E.24)

by notation (E.2). This is the form of the second right-hand term in the required form (VII.39). We can now combine results. Using (E.11), (E.14), (E.19), (E.21), (E.23), and (E.24) in Eq. (E.1) gives the required form (VII.39) for −∫∫ dr dt Re(v1 · v2). Q.E.D.

APPENDIX F: EVALUATION OF CERTAIN INTEGRALS
We first show that the integrals of all terms in the first double sum in Eq. (E.8) are zero. Arbitrarily consider the case l = 3, m = 1, where the integrals are of the form

T ≡ ∫∫ dr dt (ψ_3j ψ*_1k − ψ_1j ψ*_3k) ≡ T_1 − T_2.   (F.1)
It is convenient to work in frequency space. Represent each of wj(r, t), wk(r, t) by its Fourier transform, Eq. (VI.10). Then, by Parseval's theorem, T_1 may be evaluated as an integral over the transforms (F.2). Likewise, we get the corresponding frequency-space expression for T_2 (F.3). Comparing Eqs. (F.2) and (F.3), we get

T_1 = T_2.   (F.4)

Hence, by (F.1),

T = 0.   (F.5)
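The Parseval step used here can be illustrated in its discrete form; the sizes and seed below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

# Discrete Parseval/Plancherel check: inner products are preserved (up to the
# FFT normalization 1/n) on passing to frequency space, which is the step
# used to equate the two T-integrals above.
n = 1024
a = rng.standard_normal(n) + 1j*rng.standard_normal(n)
b = rng.standard_normal(n) + 1j*rng.standard_normal(n)

A, B = np.fft.fft(a), np.fft.fft(b)
lhs = np.sum(a*np.conj(b))
rhs = np.sum(A*np.conj(B))/n

assert np.allclose(lhs, rhs)
```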
Now the integral of the first double sum in Eq. (E.8) is a weighted sum of integrals of the form (F.1). Hence its value is zero, as was to be shown. We next show that, as discussed at Eq. (E.19),

∫∫ dr dt S3 = 0.   (F.6)

First consider the case l = 3, with j, k arbitrary. Then the integral is

∫∫ dr dt (wk* ψ_3j + wj ψ*_3k).   (F.7)

Again working in frequency space, the first term evaluates by Parseval's theorem to a frequency-space expression (F.8). Likewise, the second term evaluates to an expression (F.9),
the negative of (F.8). Therefore, the sum of (F.8) and (F.9) is zero, confirming assertion (F.6).

REFERENCES

Amari, S. (1985). Differential-geometrical methods in statistics. In "Lecture Notes in Statistics," Vol. 28, pp. 80-85. Springer-Verlag, Heidelberg, Germany.
Bell, D. A. (1980). A survey of 1/f noise in electrical conductors. J. Phys. C 13, 4425.
Bekenstein, J. D. (1990). Quantum limitations on the storage and transmission of information. Int. J. Mod. Phys. C 1, 355.
Bocker, R. P. (1994), private communication.
Bracewell, R. (1965). "The Fourier Transform and Its Applications," pp. 160-161. McGraw-Hill, New York.
Burg, J. P. (1978). Maximum entropy spectral analysis. In "Modern Spectrum Analysis" (D. G. Childers, Ed.). Wiley, New York.
Caianiello, E. R. (1992). Quantum and other physics as systems theory. Rivista del Nuovo Cimento 15, 7.
Cocke, W. J. (1993), private communication.
DeGroot, M. H. (1970). "Optimal Statistical Decisions." McGraw-Hill, New York.
Eisele, J. A., and Mason, R. M. (1970). "Applied Matrix and Tensor Analysis." Wiley, New York.
Fisher, R. A. (1925). Theory of statistical estimation. Proc. Cambridge Phil. Soc. 22, 700.
Flandrin, P. (1989). On the spectrum of fractional Brownian motions. IEEE Trans. Inform. Theory IT-35, 197.
Frieden, B. R. (1990). Fisher information, disorder, and the equilibrium distributions of physics. Phys. Rev. A, 4265.
Frieden, B. R. (1991a). "Probability, Statistical Optics, and Data Testing," 2nd Ed. Springer-Verlag, Heidelberg, Germany.
Frieden, B. R. (1991b). Fisher information and the complex nature of the Schrödinger wave equation. Founds. Phys. 21, 757.
Frieden, B. R. (1992a). Fisher information as the basis for Maxwell's equations. Physica A 180, 359.
Frieden, B. R. (1992b). Fisher information and uncertainty complementarity. Phys. Lett. A 169, 123.
Frieden, B. R. (1993). Estimation of distribution laws, and physical laws, by a principle of extremized physical information. Physica A 198, 262.
Frieden, B. R., and Hughes, R. J. (1994). Spectral 1/f noise derived from extremized physical information. Phys. Rev. E 49, 2644.
Harwit, M. (1973). "Astrophysical Concepts," pp. 438-439. Wiley, New York.
Hawking, S. W. (1988). "A Brief History of Time," p. 129. Bantam Books, Toronto, Canada.
Huber, P. J. (1981). "Robust Statistics," pp. 77-86. Wiley, New York.
Jackson, J. D. (1975). "Classical Electrodynamics," 2nd ed., pp. 536-541, 549. Wiley, New York.
Keshner, M. S. (1982). 1/f noise. Proc. IEEE 70, 212.
Lawrie, I. D. (1990). "A Unified Grand Tour of Theoretical Physics." Adam Hilger, Bristol, England.
Lorenz, E. N. (1963). Deterministic nonperiodic flows. J. Atmos. Sci. 20, 130.
Mandelbrot, B. B. (1977). "Fractals: Form, Chance, and Dimension." Freeman, San Francisco.
Morse, P. M., and Feshbach, H. (1953). "Methods of Theoretical Physics, Part I," p. 278. McGraw-Hill, New York.
Narlikar, J. V., and Padmanabhan, T. (1986). "Gravity, Gauge Theories, and Quantum Cosmology," p. 235. Reidel, Dordrecht.
Nikolov, B., and Frieden, B. R. (1994). Limitation on entropy increase imposed by Fisher information. Phys. Rev. E 49, 4815.
Reza, F. M. (1961). "An Introduction to Information Theory." McGraw-Hill, New York.
Schiff, L. I. (1955). "Quantum Mechanics." McGraw-Hill, New York.
Schrödinger, E. (1926). Quantization as a problem of characteristic values. Ann. Phys. 79, 361.
Solo, V. (1992). Intrinsic random functions and the paradox of 1/f noise. SIAM J. Appl. Math. 52, 270.
Van Trees, H. L. (1968). "Detection, Estimation, and Modulation Theory, Part I." Wiley, New York.
Vilenkin, A. (1982). Creation of universes from nothing. Phys. Lett. 117B, 25.
Weissman, M. B. (1988). 1/f noise and other slow, nonexponential kinetics in condensed matter. Rev. Mod. Phys. 60, 537.
Wheeler, J. A. (1988). World as system self-synthesized by quantum networking. IBM J. Res. Develop. 32, 4.
Zwillinger, D. (1992). "Handbook of Differential Equations," 2nd ed., p. 261. Academic Press, New York.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 90
New Developments of Electron Diffraction Theory

LIAN-MAO PENG*
Department of Materials, University of Oxford, Oxford, United Kingdom
I. Introduction
II. General Theory
   A. Scattering Amplitude and Scattering Cross-Section
   B. Diffracted Beam Amplitude and Electron Diffraction Pattern
   C. Scattering by an Average Potential
   D. Elastic Scattering
   E. Quasi-elastic and Diffuse Scattering
   F. Correlation between Theory and Experiment
III. Dynamical Elastic Diffraction by Crystals
   A. Fundamental Equations
   B. Solutions of the Fundamental Equation
   C. Boundary Conditions, Transmission, and Reflection Amplitudes
   D. Two-Beam Approximation
   E. Transmission High-Energy Electron Diffraction
   F. Reflection High-Energy Electron Diffraction
IV. Perturbation Methods for Periodic Structures
   A. Bloch Waves, Left-Hand and Right-Hand Eigenvectors
   B. Non-degenerate Perturbation Theory
   C. Tensor THEED
   D. Direct Inversion of Crystal Structure Factors
   E. Direct Determination of Crystal and Surface Structures
V. Perturbation Methods for Nonperiodic Structures
   A. Distorted Wave Approximation
   B. Tensor RHEED
   C. Diffuse Scattering
   D. Z-Contrast Imaging
VI. Bloch Wave Channelling and Resonance Scattering
   A. Two-Dimensional Bloch Waves and Axial Resonance Diffraction
   B. One-Dimensional Bloch Waves and Planar Resonance Diffraction
   C. Surface Resonance
Appendix A. Green's Functions
Appendix B. Crystal Structure Factors and Potential
Appendix C. The Optical Potential
References
* Present address: Beijing Laboratory of Electron Microscopy, Chinese Academy of Sciences, P.O. Box 2724, Beijing 100080, China.
I. INTRODUCTION

Electrons interact with atoms through the electrostatic potential. For high-energy electrons, and to an excellent approximation, the interactions between the incident high-energy electrons and a solid may be approximated by an effective one-electron local potential V(r):

V(r) = Σi φi(r − ri) + ΔV(r),   (1.1)

in which the first term represents the contribution from an assembly of neutral atoms, and the second term describes much smaller modifications due to such processes as the ionization, bonding, and vibration of atoms. In principle, if the kinematical or single-scattering approximation holds for electron diffraction, the intensities of the diffracted beams from a crystal may be written as

Ig ∝ |Vg|²,   (1.2)

where Vg is related to the crystal structure via the relation

Vg = (h²/2πm₀eVc) Σi fi^(e)(g) exp(−2πi g · ri),   (1.3)

fi^(e) is the electron atomic scattering factor of the ith atom in a unit cell, Vc is the volume of a unit cell, and all other physical constants have their usual meanings. The Fourier series method, like that used in the x-ray case (Lipson and Cochran, 1968), can then be used to obtain an electrostatic potential map

V(r) = Σg Vg exp(ig · r).   (1.4)
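The Fourier synthesis (1.4) can be sketched numerically in one dimension. Everything below is hypothetical illustration data: the cell length, "atom" positions, relative strengths, and the Gaussian falloff standing in for the scattering factors are made up, not taken from any real crystal:

```python
import numpy as np

# 1D toy Fourier synthesis of a potential map, Eq. (1.4).
a = 5.0                                   # cell length (arbitrary units)
atoms = [1.2, 3.7]                        # atom positions in the cell (assumed)
f_at = [2.0, 1.0]                         # relative scattering strengths (assumed)

n = np.arange(-40, 41)
g = 2.0*np.pi*n/a                         # reciprocal-lattice "vectors"
Vg = np.array([sum(f*np.exp(-0.05*gi**2)*np.exp(-1j*gi*r)
                   for f, r in zip(f_at, atoms)) for gi in g])

x = np.linspace(0.0, a, 1000, endpoint=False)
V = np.real(np.exp(1j*np.outer(x, g)) @ Vg)   # V(x) = sum_g Vg exp(igx)

# The synthesized map peaks at the atom positions, strongest atom highest,
# with a low background between the atoms.
i_max = int(np.argmax(V))
assert abs(x[i_max] - atoms[0]) < 0.05
j = int(np.argmin(np.abs(x - atoms[1])))
assert V[j] > 0.4*V[i_max]
assert V[0] < 0.05*V[i_max]
```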
In principle this potential map should have well-defined maxima at the atom positions, and it provides detailed information on the charge redistribution in crystals (Cowley, 1966; Smart and Humphreys, 1980; Zuo et al., 1988). The interaction of electrons with atoms is, however, very strong. Typically, the diffracted beams can gain intensities comparable to that of the incident beam for a crystal of less than 100 Å. Hence, it cannot normally be assumed that the kinematical or single-scattering approximation is valid for electron diffraction. In general, it was realized from the earliest years that electron diffraction by crystals is dominated by multiple-scattering events (Bethe, 1928). The strong interactions between the incident electrons and the crystal nuclei and electrons provide both advantages and disadvantages for the technique of electron diffraction. It is these strong interactions which enable good statistics to be obtained from submicron regions (down to 2 Å). On the other hand, the strong interaction
complicates the interpretation of electron diffraction and microscopy experiments and dynamical theory must be used (Cowley, 1993). The dynamical theory of electron diffraction by crystals was first developed by Bethe (1928), based on the dynamical theory of x-ray diffraction of Ewald (1917). The theory has then been re-formulated and developed using various different notations and sign conventions (Saxton et al., 1983) in the many text books (Hirsch et al., 1965; Cowley, 1981; Reimer, 1989; Spence and Zuo, 1992) and review articles (Howie, 1978; Humphreys, 1979; Dederichs, 1972; Steeds, 1984; Eades, 1989; Cowley, 1993). In this article we follow the sign convention of Hirsch et al. (1965). It is not the intention of the present article to repeat the well-documented theories and applications of electron diffraction. Rather, the article is aimed to provide an account of some of the more recent developments which have not been discussed in depth in the previously mentioned text books and reviews. In particular this article is concerned with the quantitative aspects of electron diffraction, the problem of direct retrieval of structural information from dynamical diffraction data, and the prospects of direct imaging of atom strings, planes, and surfaces. In Section 11, a brief introduction will be given of the basic concepts and quantities to be discussed in the subsequent sections. In Section 111, a general matrix representation of the dynamical theory of electron diffraction will be described, and shown to be able to deal with a wide range of diffraction problems, including transmission through a multilayer structure and reflection from surfaces of crystals. In Sections IV and V, perturbation methods will be given that are capable of dealing with non-Hermitian problems and non-periodic structures. 
Based on the perturbation methods, procedures for direct inversion of crystal and surface structures will be given, and a simple dynamical framework for describing diffuse scattering will be presented. In Section VI, one- and two-dimensional bound Bloch waves will be discussed in depth. Diffraction conditions under which these bound Bloch waves may be selectively excited will be given, and the prospects for imaging atom strings, interfaces, and surfaces will be discussed.

II. GENERAL THEORY
A. Scattering Amplitude and Scattering Cross-Section

A formal theory of electron diffraction inevitably deals with time-dependent processes. Electrons are emitted from a source, such as an electron gun in an electron microscope, and are sent toward the sample. In the course of time the electrons interact with the sample via the electrostatic potential
LIAN-MAO PENG
and are scattered. At a large distance from the sample the electrons move outward in all directions and are counted by detectors arranged far from the scattering centre. For convenience of mathematical treatment, the incident electron wave function is often assumed to be a plane wave,

$$\psi_0(\mathbf{r}) = \exp(i\mathbf{k}_0\cdot\mathbf{r}), \tag{II.1}$$

and far from the scattering centre the electron wave function takes the asymptotic form

$$\psi(\mathbf{r}) \to \psi_0(\mathbf{r}) + \psi_s(\mathbf{r}) = \exp(i\mathbf{k}_0\cdot\mathbf{r}) + f(\theta,\phi)\,\frac{\exp(ikr)}{r}, \tag{II.2}$$

where $\psi_0$ and $\psi_s$ denote the incident wave and scattered wave, respectively, and $f(\theta,\phi)$ is called the scattering amplitude. In practice, the incident electrons from the source are collimated by an aperture, as shown schematically in Fig. 1, into a fairly well-defined beam. Such a collimated beam is not an infinite plane wave of the form (II.1), but can be made up by superposing infinite plane waves whose propagation vectors differ slightly in magnitude and direction. The total angular spread is of the order of the ratio of the electron wavelength to the diameter of the collimating aperture and can be made extremely small in practice. Since the scattering amplitude $f(\theta,\phi)$ does not vary rapidly with angle, the small directional spread of the incident propagation vectors does not affect $f(\theta,\phi)$ significantly, and the assumption (II.1) is thus justified. In scattering experiments a useful concept is the differential cross-section, defined as the number of electrons scattered into a solid angle $d\Omega$ per unit incident flux (Newton, 1966). Assuming a plane wave
FIGURE 1. Schematic diagram showing the arrangement of electron scattering measurements.
NEW DEVELOPMENTS OF ELECTRON DIFFRACTION THEORY
incidence of the form (II.1), the current density of the incident beam is given by

$$\mathbf{j}_0 = \frac{\hbar\mathbf{k}_0}{m}. \tag{II.3}$$

For a scattered spherical wave of the form

$$\psi_s(\mathbf{r}) = f(\theta,\phi)\,\frac{\exp(ikr)}{r}, \tag{II.4}$$

we have

$$\mathbf{j} = -\frac{i\hbar}{2m}\left[\psi_s\nabla\psi_s^{*} - \psi_s^{*}\nabla\psi_s\right] = \frac{\hbar k}{m}\,\frac{\mathbf{r}}{r^{3}}\,|f(\theta,\phi)|^{2}. \tag{II.5}$$

The flux of scattered electrons through a surface element $dS = r^{2}\,d\Omega$ of a sphere having a very large radius is given by

$$dI = j_r\,dS = \frac{\hbar k}{m}\,|f(\theta,\phi)|^{2}\,d\Omega. \tag{II.6}$$

Upon dividing by the incident flux $j_0$, we obtain an expression for the differential cross-section

$$\frac{d\sigma}{d\Omega} = \frac{dI}{j_0\,d\Omega} = \frac{k}{k_0}\,|f(\theta,\phi)|^{2}. \tag{II.7}$$
B. Diffracted Beam Amplitude and Electron Diffraction Pattern
In the theory of electron diffraction, the crystal is assumed to be a slab or a semi-infinite crystal. It is sometimes useful to distinguish between the surface-parallel and surface-normal components of a real space vector $\mathbf{r} = (\mathbf{x}, z)$ and a reciprocal space vector $\mathbf{k} = (\mathbf{q}, k_z)$, where the positive $z$ direction has been chosen as the inward surface normal. The diffracted beam amplitude is defined as the two-dimensional Fourier transform of the electron wave function at a large distance from the crystal, i.e.,

$$S(\mathbf{q}) = \lim_{z\to\pm\infty}\int\psi_s(\mathbf{x}, z)\exp(-i\mathbf{q}\cdot\mathbf{x})\,d\mathbf{x}, \tag{II.8}$$

where the $\pm$ signs refer to forward scattering and backward scattering, respectively. The diffracted beam intensity is given by

$$I(\mathbf{q}) = |S(\mathbf{q})|^{2}. \tag{II.9}$$
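In discrete form, Eq. (II.8) is just a two-dimensional fast Fourier transform of the wave function over the exit surface. The sketch below applies it to a made-up periodic pure-phase exit wave (the grid size and modulation strength are arbitrary illustration values) and checks that the resulting pattern (II.9) conserves the total intensity.

```python
import numpy as np

# A toy exit-plane wave function on an N x N grid: a hypothetical
# periodic phase modulation standing in for the true exit wave.
N, a = 128, 4.0                       # grid points and lattice period (illustrative)
x, y = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
psi = np.exp(1j * 0.3 * (np.cos(2 * np.pi * x / a) + np.cos(2 * np.pi * y / a)))

# Eq. (II.8): the diffracted beam amplitude S(q) is the 2D Fourier
# transform of the exit-surface wave function.
S = np.fft.fft2(psi) / N**2
I = np.abs(S)**2                      # Eq. (II.9): diffracted intensity

# A pure phase object conserves electrons, so by Parseval's theorem the
# intensities over all beams sum to the mean of |psi|^2, i.e. unity.
print(I.sum())
# Bragg peaks appear only at spatial frequencies commensurate with the
# period a, i.e. at multiples of N/a = 32 in FFT index units.
```

The same construction, applied to the output of a dynamical multi-beam calculation, is how computed exit waves are compared with experimental diffraction patterns.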
The scattering amplitude $f(\theta,\phi)$ introduced in Eq. (II.2) can be related to the diffracted beam amplitude $S(\mathbf{q})$ by the relation

$$\lim_{z\to\pm\infty}\psi_s(\mathbf{x}, z) = \lim_{z\to\pm\infty} f(\theta,\phi)\,\frac{\exp(ikr)}{r}. \tag{II.10}$$

Fourier transforming both sides of the preceding relation, we obtain

$$\int\psi_s(\mathbf{x}, z)\exp(-i\mathbf{q}\cdot\mathbf{x})\,d\mathbf{x} = f(\theta,\phi)\int\frac{\exp\!\left(ik\sqrt{x^{2}+z^{2}}\right)}{\sqrt{x^{2}+z^{2}}}\exp(-i\mathbf{q}\cdot\mathbf{x})\,d\mathbf{x}.$$

Since (see Appendix A)

$$\int\frac{\exp(ikr)}{r}\exp(-i\mathbf{q}\cdot\mathbf{x})\,d\mathbf{x} = \frac{2\pi i}{|k_z|}\exp(ik_z z),$$

we then have an expression for the scattering amplitude in terms of the diffracted beam amplitude,

$$f(\theta,\phi) = \frac{|k_z|}{2\pi i}\lim_{z\to\pm\infty}\exp(-ik_z z)\int\psi_s(\mathbf{x}, z)\exp(-i\mathbf{q}\cdot\mathbf{x})\,d\mathbf{x}, \tag{II.11}$$

in which $k_z = \pm\sqrt{k^{2}-q^{2}}$, and the $\pm$ signs refer to forward ($z > 0$) and backward ($z < 0$) scattering, respectively. Using Eq. (II.7), we obtain

$$\frac{d\sigma}{d\Omega} = \frac{k}{k_0}\left(\frac{|k_z|}{2\pi}\right)^{2}\lim_{z\to\pm\infty}\iint\psi_s(\mathbf{x}, z)\,\psi_s^{*}(\mathbf{x}', z)\exp(-i\mathbf{q}\cdot\mathbf{x} + i\mathbf{q}\cdot\mathbf{x}')\,d\mathbf{x}\,d\mathbf{x}'. \tag{II.12}$$
Shown in Fig. 2 is a typical transmission high-energy electron diffraction (THEED) pattern, obtained from a GaAs single crystal along the (111) zone axis for 100 keV incident electrons. This diffraction pattern is largely generated by dynamical diffraction of inelastically scattered electrons, and consists of complex Kikuchi line, band, ring, and parabola patterns. Shown in Fig. 3 is a reflection high-energy electron diffraction (RHEED) pattern from a cleavage GaAs(110) surface. This RHEED pattern was obtained in a transmission electron microscope (TEM) using a 120 keV acceleration voltage. The similarity between this pattern and Fig. 2 is evident. The difference
FIGURE 2. GaAs(111) zone axis electron diffraction pattern. The accelerating voltage used is 100 keV.
results, however, from the fact that while Fig. 2 is formed exclusively by forward scattered electrons, Fig. 3 is generated by electrons backscattered from the surface. For quantitative electron diffraction studies, the convergent-beam electron diffraction (CBED) geometry shown in Fig. 4 is particularly favourable and can be applied to either THEED or RHEED. In this geometry, an electron beam is focused onto the specimen. If the incident beam is defined by a circular aperture, each diffraction spot is spread into a circular disk, and each point in the disk corresponds to a particular angle of incidence. The variation of intensity across each disk represents the variation of the diffracted beam intensity associated with that disk as a function of the incidence angle. These curves of variation are called rocking curves. CBED patterns are essentially two-dimensional rocking curves from a very small illuminated crystal area, typically of the order of a few hundred angstroms, which is well defined, free of defects and bending, and well suited for comparison with theoretical calculations. Shown in Fig. 5 is a transmission CBED pattern, obtained from the Si [111] zone axis, showing the transmission disk and six (220)-type disks. Shown in Fig. 6 is a bright field (BF) TEM image of the Si sample in which the bright spot corresponds to the illuminated area used for obtaining Fig. 5.
FIGURE 3. RHEED pattern from a cleavage GaAs(110) surface. 120 keV high-energy electrons are incident on the surface near the [001] zone axis.
Figure 5 is recorded under a two-beam diffraction geometry. The thickness fringes in the figure result mainly from the two strongly excited Bloch waves associated with the (000) and (200) beams, and provide a fairly good estimate of the thickness of the sample from which the CBED pattern is obtained.
FIGURE 4. Schematic ray diagrams showing CBED diffraction geometry for THEED and RHEED.
FIGURE 5. [111] zone axis CBED pattern from a Si crystal. The pattern is recorded for 200 keV incident electrons.
FIGURE 6. Two-beam BF TEM image of a Si single crystal. The white spot in the figure is the illuminated area used for obtaining the CBED pattern shown in Fig. 5. The acceleration voltage is 200 keV, and the (200) Bragg condition is satisfied.
For a very small electron probe, it is reasonable to assume that the CBED pattern is obtained from an area of uniform crystal thickness. A further advantage of the CBED geometry is demonstrated by Fig. 7, which is obtained from the same sample as Fig. 6, but from a different area. The crystal is seen to be severely bent, and the angle of incidence changes appreciably across the sample. An ordinary selected area electron diffraction (SAED) pattern obtained from this sample would therefore need to be corrected for the distortion. For a small electron probe, as in the case of CBED, the incidence conditions may, however, be assumed to be uniform.

C. Scattering by an Average Potential
We now consider the problem of calculating the electron wave function $\psi(\mathbf{r})$. The total Hamiltonian for the system shown in Fig. 1 is

$$H = -\frac{\hbar^{2}}{2m}\nabla^{2} + V(\mathbf{r}, \ldots, \mathbf{r}_j, \ldots, \mathbf{R}_n, \ldots) + H_c, \tag{II.13}$$
FIGURE 7. Many-beam BF TEM image of a bent Si single crystal, showing three pairs of 200-type "extinction contours."
where the first term represents the free electron Hamiltonian, the second term denotes the interaction potential between the incident electron and the crystal (in which $\mathbf{r}$ denotes the spatial coordinate of the incident high-energy electron, and $\mathbf{r}_j$ and $\mathbf{R}_n$ denote the coordinates of the $j$th electron and the $n$th nucleus of the crystal, respectively), and the third term $H_c$ is the crystal Hamiltonian. The interaction potential is given by

$$V(\mathbf{r}, \ldots, \mathbf{r}_j, \ldots, \mathbf{R}_n, \ldots) = \sum_{n}\frac{-Ze^{2}}{|\mathbf{r}-\mathbf{R}_n|} + \sum_{j}\frac{e^{2}}{|\mathbf{r}-\mathbf{r}_j|}. \tag{II.14}$$

By writing the $n$th crystal state as $|n\rangle$, we have

$$H_c|n\rangle = E_n|n\rangle, \tag{II.15}$$

where $E_n$ is the corresponding energy eigenvalue of the $n$th crystal state. A formal treatment of electron diffraction must be based on the time-dependent Schrödinger wave equation
$$i\hbar\frac{\partial\Psi}{\partial t} = H\Psi. \tag{II.16}$$
To a first-order approximation, however, the incident electrons are diffracted by an averaged potential, defined as

$$\langle V(\mathbf{r})\rangle = \frac{1}{Z}\sum_{n}\exp(-E_n/k_B T)\,\langle n|V(\mathbf{r}, \ldots, \mathbf{r}_j, \ldots, \mathbf{R}_n, \ldots)|n\rangle, \tag{II.17}$$

where $Z$ is the partition function of the crystal,

$$Z = \sum_{n}\exp(-E_n/k_B T),$$

in which $T$ is the absolute temperature and $k_B$ is the Boltzmann constant. Since the averaged potential $\langle V(\mathbf{r})\rangle$ is time-independent, the total energy of the system is conserved. The electron wave function can then be chosen to be an energy eigenfunction and written as

$$\Psi(\mathbf{r}, t) = \psi(\mathbf{r})\exp(-iEt/\hbar), \tag{II.18}$$

where $E$ is the total energy of the system. The time-dependent Schrödinger wave equation (II.16) then reduces to the time-independent equation

$$\left[-\frac{\hbar^{2}}{2m}\nabla^{2} + \langle V(\mathbf{r})\rangle\right]\psi(\mathbf{r}) = E\psi(\mathbf{r}). \tag{II.19}$$

For crystal diffraction, the averaged potential $\langle V(\mathbf{r})\rangle$ is periodic. In Appendix B, some details are given of how to calculate this potential for several different diffraction geometries.

D. Elastic Scattering
For thin films, such as those used in high-resolution electron microscopy (HREM) imaging (Cowley, 1981; Spence, 1988), the average potential approximation is adequate. For thicker crystals, effects due to the difference potential $\delta V$,

$$\delta V(\mathbf{r}, \ldots, \mathbf{r}_j, \ldots, \mathbf{R}_n, \ldots) = V(\mathbf{r}, \ldots, \mathbf{r}_j, \ldots, \mathbf{R}_n, \ldots) - \langle V(\mathbf{r})\rangle, \tag{II.20}$$

become increasingly important and cannot normally be neglected (Yoshioka, 1957; Howie, 1963; Hashimoto et al., 1962; Whelan, 1965; Humphreys and Hirsch, 1968; Dederichs, 1972). For elastic scattering, defined as those processes for which the initial and final states of the crystal are identical, some of the effects of diffuse and inelastic scattering resulting from $\delta V$ may be taken into account by the introduction of an optical potential $V_{op}$. In general, the optical potential is a non-local
operator. When acting on a real space wave function, we have

$$V_{op}\psi = \int V_{op}(\mathbf{r}, \mathbf{r}')\,\psi(\mathbf{r}')\,d\mathbf{r}'. \tag{II.21}$$

Fortunately, for high-energy electron diffraction the non-locality of the optical potential operator (II.21) is very small and negligible (see Appendix C). The equation governing the elastic diffraction of electrons by crystals then becomes

$$\left[-\frac{\hbar^{2}}{2m}\nabla^{2} + \langle V(\mathbf{r})\rangle + V_{op}(\mathbf{r})\right]\psi(\mathbf{r}) = E\psi(\mathbf{r}). \tag{II.22}$$

An important property of the optical potential is that, like the average potential $\langle V(\mathbf{r})\rangle$, the optical potential $V_{op}$ is periodic (Dederichs, 1972). It should be noted that while the average potential $\langle V(\mathbf{r})\rangle$ appearing in (II.19) is always a real quantity, as required by conservation of the number of electrons, the optical potential is complex. This is because the optical potential operator is only an approximate mathematical means of describing the effects of many-body and inelastic scattering processes on the elastically scattered electrons. That the optical potential is complex reflects the fact that some of the incident electrons have been scattered from the elastic channel into inelastic channels and are therefore lost from the elastic channel. The optical potential may be constructed in exactly the same way as the averaged potential (see Appendix B), except that an addition to the elastic atomic scattering factor is required; this additional absorptive atomic scattering factor may be calculated using one of the published computer routines (Bird and King, 1990; Weickenmeier and Kohl, 1991).

E. Quasi-elastic and Diffuse Scattering
In the presence of $\delta V$ as defined in Eq. (II.20), the elastic wave function $\psi(\mathbf{r})$ is distorted. When $\delta V$ is small and perturbation treatments apply, the effects of $\delta V$ on the elastic wave function may be described by the optical potential operator (II.21). Electrons which are scattered by $\delta V$ from the elastic channel into inelastic or diffuse channels may be described, to a first-order approximation, by the distorted wave Born approximation (DWBA) (Mott, 1965). In an energy filtering diffraction experiment (Reimer, 1991), measurements are made of elastically and quasi-elastically scattered electrons. The scattered beam amplitude is given by (Dudarev et al., 1993a,b)

$$f(\mathbf{k}, \mathbf{k}_0) = f_{el}(\mathbf{k}, \mathbf{k}_0) - \frac{m}{2\pi\hbar^{2}}\int\psi_{-\mathbf{k}}(\mathbf{r})\,\delta V(\mathbf{r})\,\psi_{\mathbf{k}_0}(\mathbf{r})\,d\mathbf{r}, \tag{II.23}$$
where the suffix $el$ denotes the elastic scattering amplitude, and $\mathbf{k}$ and $\mathbf{k}_0$ are used explicitly to reflect the fact that the scattering amplitude is a function of the incident and scattered electron wave vectors $\mathbf{k}_0$ and $\mathbf{k}$. The notation $\psi_{-\mathbf{k}}(\mathbf{r})$ denotes the dynamical electron wave function corresponding to an incident plane wave along the $-\mathbf{k}$ direction. The angular distribution of the elastic and quasi-elastic scattered electrons is described by the averaged differential cross-section

$$\left\langle\frac{d\sigma}{d\Omega}\right\rangle = \frac{k}{k_0}\,\left\langle|f(\mathbf{k}, \mathbf{k}_0)|^{2}\right\rangle. \tag{II.24}$$

A detailed derivation of the DWBA (II.23) and its extension to higher order diffraction will be given in Section V.

F. Correlation between Theory and Experiment
Shown in Fig. 8 are two sets of zero-loss experimental and computed one-dimensional CBED rocking curves along the [002] direction (Spence and Zuo, 1992). These curves were obtained from a BeO crystal at 80 keV, using a systematic diffraction orientation near the [130] zone axis. Figures 8a and 8b are recorded at two different crystal thicknesses, 709 Å and 1,060 Å, respectively. Both calculated rocking curves in Figs. 8a and 8b use the same set of structure factors. The plots below these figures show the difference between calculation and experiment. The agreement between theory and experiment is seen to be excellent. Shown in Fig. 9 are two sets of energy unfiltered experimental and simulated rocking curves across a (200) disk, obtained from a Si single crystal at 80 keV. Figures 9a and 9b are recorded at two different crystal thicknesses: (a) t = 2,860 Å and (b) t = 3,420 Å. In the figure, the curves labelled "dynamical theory" are calculated using the optical potential method. The agreement between theory and experiment is rather poor. This is because the optical potential method aims only at calculating elastic scattering amplitudes; all inelastic contributions and interactions between the elastic and inelastic channels are neglected. The curves labelled "kinetic equation" are calculated from a time-independent quantum kinetic equation for the density matrix of high-energy electrons. The kinetic equation takes into account the effects of multiple inelastic scattering events and partial coherence (Dudarev et al., 1993b). This figure shows that a fair agreement between theoretical and energy unfiltered experimental rocking curves has been achieved, but the
FIGURE 8. Zero-loss experimental and computed CBED rocking curves obtained from BeO for 80 keV and at two crystal thicknesses. (a) The sample thickness is 709 Å, and (b) the sample thickness is 1,060 Å. The plots below the rocking curves are the difference between the calculation and experiment. [Courtesy of Dr. J. M. Zuo.]
FIGURE 9. Energy unfiltered experimental and calculated CBED rocking curves within the (200) diffraction disk. The experimental CBED pattern is recorded from Si at 80 keV. The comparison is made for (a) t = 2,860 Å; (b) t = 3,420 Å.
agreement is poorer than that of Fig. 8. It should be pointed out that the agreement achieved here is the best ever achieved for energy unfiltered rocking curves (Peng et al., 1993). It is evident therefore that quantitative electron diffraction studies are best conducted with an energy filtering facility.

III. DYNAMICAL ELASTIC DIFFRACTION BY CRYSTALS

A. Fundamental Equations
The idealized experimental arrangement to be analyzed in this and the following sections is shown in Fig. 10. The specimen is assumed to be a perfect, parallel-sided crystal slab or a semi-infinite crystal; the crystal structure is assumed to be infinite and periodic parallel to the surface. In real space, the x-y plane is chosen to be parallel to the surface, and the positive z direction points into the crystal. As shown in Section II, for electron energies in the non-relativistic range, the steady state elastic wave function $\psi(\mathbf{r})$ obeys Schrödinger's wave equation (II.22), which may be rewritten as

$$\left[k_0^{2} + \nabla^{2} + U(\mathbf{r})\right]\psi(\mathbf{r}) = 0, \tag{III.1}$$

where

$$k_0^{2} = \frac{2mE}{\hbar^{2}}, \tag{III.2}$$

$U(\mathbf{r}) = -(2m/\hbar^{2})\left[\langle V(\mathbf{r})\rangle + V_{op}(\mathbf{r})\right]$ is the reduced potential, and $k_0$ is the magnitude of the incident electron wave vector in the vacuum. The relativistic correction to the wave equation (III.1) may be most conveniently introduced at this stage by replacing the definition (III.2)

FIGURE 10. Schematic diagram showing the real space coordinate system for a crystal slab or a semi-infinite crystal.
with the expression

$$k_0^{2} = \frac{2m_0 E}{\hbar^{2}}\left(1 + \frac{E}{2m_0 c^{2}}\right). \tag{III.3}$$

It has been shown that the relativistically corrected wave equation (III.1) works well up to the mega-electron-volt level (Fujiwara, 1961; Howie, 1962). For dynamical electron diffraction by crystals, the effective one-electron potential $U(\mathbf{r})$ is periodic and can be expanded as a Fourier series

$$U(\mathbf{r}) = \sum_{\mathbf{g}} U_{\mathbf{g}}\exp(i\mathbf{g}\cdot\mathbf{r}), \tag{III.4}$$

where $\mathbf{g}$ are three-dimensional reciprocal lattice vectors, and the Fourier coefficients $U_{\mathbf{g}}$ are related to the atomic scattering factors via the relation

$$U_{\mathbf{g}} = \frac{4\pi}{V_c}\sum_{i} f_i(g/4\pi)\exp(-M_{\mathbf{g}}^{i})\exp(-i\mathbf{g}\cdot\mathbf{r}_i), \tag{III.5}$$

in which $V_c$ is the volume of a crystal unit cell, the summation over $i$ is taken over all atoms within the unit cell, $f_i$ is the atomic scattering factor of the $i$th atom, $M_{\mathbf{g}}^{i} = 8\pi^{2}\langle u_i^{2}\rangle(g/4\pi)^{2}$ is the usual Debye-Waller factor, and $\langle u_i^{2}\rangle$ is the mean-square thermal displacement ("temperature factor") of the atom (International Tables for X-ray Crystallography, 1974). Following Bethe (1928), we write a solution of Eq. (III.1) for a periodic potential $U(\mathbf{r})$ in the form

$$\psi(\mathbf{r}) = \sum_{\mathbf{g}} C_{\mathbf{g}}\exp[i(\mathbf{k}+\mathbf{g})\cdot\mathbf{r}]. \tag{III.6}$$
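The structure of Eq. (III.5) is easy to sketch numerically. In the fragment below the single-Gaussian scattering factor (constants A, B), the cell volume, the mean-square displacement, and the single atom position are all made-up stand-in values; a real calculation would use tabulated multi-Gaussian fits to the atomic scattering factors.

```python
import numpy as np

A, B = 3.0, 2.5            # hypothetical Gaussian fit, f(s) = A exp(-B s^2)
Vc = 40.0                  # unit-cell volume, cubic angstroms (illustrative)
u2 = 0.005                 # assumed mean-square thermal displacement <u^2>, A^2
r_atoms = np.array([[0.0, 0.0, 0.0]])   # atom positions in the cell, angstroms

def U_g(g_vec):
    """Fourier coefficient of the reduced potential, after Eq. (III.5)."""
    g = np.linalg.norm(g_vec)
    s = g / (4.0 * np.pi)                      # s = g/(4 pi) = sin(theta)/lambda
    f = A * np.exp(-B * s * s)                 # atomic scattering factor f(s)
    M = 8.0 * np.pi**2 * u2 * s * s            # Debye-Waller exponent M_g
    phase = np.exp(-1j * (r_atoms @ g_vec))    # exp(-i g . r_i) for each atom
    return (4.0 * np.pi / Vc) * np.sum(f * np.exp(-M) * phase)

print(U_g(np.array([0.0, 0.0, 0.0])))          # U_0, sets the mean inner potential
```

Thermal vibration only damps the coefficients through the factor exp(−M); for an atom at the origin the coefficient is real, and a complex value appears as soon as the cell loses a centre of symmetry.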
Substituting (III.6) and (III.4) into (III.1), we obtain the fundamental equation of the dynamical theory of electron diffraction by crystals,

$$\left[K^{2} - (\mathbf{k}+\mathbf{g})^{2}\right]C_{\mathbf{g}} + \sum_{\mathbf{h}\neq\mathbf{g}} U_{\mathbf{g}-\mathbf{h}}C_{\mathbf{h}} = 0, \tag{III.7}$$

where $\mathbf{K}$ is the electron wave vector derived from $\mathbf{k}_0$ after correction for the mean inner potential, and $K^{2}$ is given by

$$K^{2} = k_0^{2} + U_0. \tag{III.8}$$
The set of equations (III.7) are second-order eigenvalue equations. Since in dynamical electron diffraction by crystals the surface-parallel component of the electron wave vector $\mathbf{k}$ is a good quantum number, only the surface-normal component $k_z$ is left with some freedom. The general three-dimensional eigensystem (III.7) can be reduced to a one-dimensional eigensystem by introducing an eigenvalue $\gamma$:

$$\mathbf{k} = \mathbf{k}_0 + \gamma\mathbf{n}, \tag{III.9}$$
where $\mathbf{n}$ is a unit vector pointing in the positive $z$ direction. The set of equations (III.7) then becomes

$$\left[\gamma^{2} + 2(\mathbf{k}_0+\mathbf{g})_z\gamma - 2k_0 S_{\mathbf{g}}\right]C_{\mathbf{g}} - \sum_{\mathbf{h}\neq\mathbf{g}} U_{\mathbf{g}-\mathbf{h}}C_{\mathbf{h}} = 0, \tag{III.10}$$

where

$$2k_0 S_{\mathbf{g}} = K^{2} - (\mathbf{k}_0+\mathbf{g})^{2}, \tag{III.11}$$

and $S_{\mathbf{g}}$ is the usual excitation error for reflection $\mathbf{g}$. In matrix notation, Eq. (III.10) can be rewritten as

$$(\gamma^{2}\mathbf{I} + \gamma\mathbf{D} + \mathbf{Q})\mathbf{C} = 0, \tag{III.12}$$

in which $\mathbf{I}$ is an identity matrix, $\mathbf{D}$ is a diagonal matrix with

$$\{\mathbf{D}\}_{\mathbf{gg}} = 2(\mathbf{k}_0+\mathbf{g})_z, \tag{III.13}$$

and $\mathbf{Q}$ is a general matrix whose elements are

$$\{\mathbf{Q}\}_{\mathbf{gh}} = -2k_0 S_{\mathbf{g}}\delta_{\mathbf{gh}} - U_{\mathbf{g}-\mathbf{h}}(1 - \delta_{\mathbf{gh}}), \tag{III.14}$$

where $\delta_{\mathbf{gh}}$ is the Kronecker delta, and $\mathbf{C}$ is a column vector with

$$\{\mathbf{C}\}_{\mathbf{h}} = C_{\mathbf{h}}. \tag{III.15}$$
B. Solutions of the Fundamental Equation

Mathematically, the fundamental equation (III.12) is only a special case of the general high-degree eigenvalue problem

$$\left(\gamma^{m}\mathbf{C}_0 + \gamma^{m-1}\mathbf{C}_1 + \cdots + \gamma\mathbf{C}_{m-1} + \mathbf{C}_m\right)\mathbf{X} = 0, \tag{III.16}$$
which can be solved by forming two matrices $\mathbf{A}$ and $\mathbf{B}$:

$$\mathbf{A} = \begin{pmatrix} -\mathbf{C}_1 & -\mathbf{C}_2 & \cdots & -\mathbf{C}_m \\ \mathbf{I} & \mathbf{0} & \cdots & \mathbf{0} \\ \vdots & \ddots & & \vdots \\ \mathbf{0} & \cdots & \mathbf{I} & \mathbf{0} \end{pmatrix} \quad\text{and}\quad \mathbf{B} = \begin{pmatrix} \mathbf{C}_0 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{I} & \cdots & \mathbf{0} \\ \vdots & & \ddots & \vdots \\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{I} \end{pmatrix}, \tag{III.17}$$

such that the original high-degree problem (III.16) is transformed into a first-order problem

$$\mathbf{A}\mathbf{Z} = \gamma\mathbf{B}\mathbf{Z}, \tag{III.18}$$

with $\mathbf{Z}^{T} = (\gamma^{m-1}\mathbf{X}, \gamma^{m-2}\mathbf{X}, \ldots, \mathbf{X})$,
where the superscript $T$ denotes transposition of the vector. The first-order linear system (III.18) can then be solved using standard numerical routines, such as NAG (1989) and EISPACK (Garbow et al., 1977). For electron diffraction, $m = 2$, and Eq. (III.12) becomes

$$\begin{pmatrix} -\mathbf{D} & -\mathbf{Q} \\ \mathbf{I} & \mathbf{0} \end{pmatrix}\begin{pmatrix} \gamma\mathbf{C} \\ \mathbf{C} \end{pmatrix} = \gamma\begin{pmatrix} \gamma\mathbf{C} \\ \mathbf{C} \end{pmatrix}, \tag{III.19}$$
and this is the equation first introduced by Colella (1972) for solving the RHEED problem. For cases in which a limited number $N$ of reciprocal lattice points lie on or close to the Ewald sphere, only $N$ diffracted beams will be appreciably excited in the crystal. The infinite set of equations (III.19) may then be truncated to a set of $2N$ equations, giving in general $2N$ eigenvalues $\gamma^{(j)}$ ($j = 1, \ldots, 2N$) and $2N$ associated Bloch waves $b^{(j)}(\mathbf{k}^{(j)}, \mathbf{r})$ ($j = 1, \ldots, 2N$). It has been shown by Peng et al. (1992) that associated with each reciprocal lattice point in the crystal there exist two diffracted waves, propagating forward and backward with respect to the surface normal direction. For reciprocal lattice vectors having a small surface-normal component, both the forward and backward waves are important and must be included. For a reciprocal lattice vector with a large surface-normal component, on the other hand, only one of the two waves has appreciable amplitude. The number of beams to be included in the eigensystem (III.19) may therefore be reduced. This procedure is equivalent to neglecting $\gamma^{2}$ in (III.10) for certain reflections. The criterion for this neglect is that the relative contribution of the $\gamma^{2}$ term compared with the $\gamma$ term is small,

$$\left|\frac{\gamma}{2(k_{0z} + g_z)}\right| \ll 1. \tag{III.20}$$
Since the eigenstates and eigenvalues of the fundamental equation (III.10) are periodic functions in reciprocal space (Ashcroft and Mermin, 1976), it is always possible to restrict all non-equivalent eigenvalues $\gamma^{(j)}$ to the first Brillouin zone, such that for all distinct Bloch waves we have $|\gamma^{(j)}| \le 1/2d$, where $d$ is the planar spacing of the reflecting planes. Condition (III.20) now becomes

$$|k_{0z} + g_z| \gg 1/4d. \tag{III.21}$$

If Eq. (III.21) is satisfied for reflection $\mathbf{g}$, the fundamental equation (III.10) reduces to a first-order equation:

$$\gamma C_{\mathbf{g}} = \frac{1}{2(\mathbf{k}_0+\mathbf{g})_z}\left[2k_0 S_{\mathbf{g}}C_{\mathbf{g}} + \sum_{\mathbf{h}\neq\mathbf{g}} U_{\mathbf{g}-\mathbf{h}}C_{\mathbf{h}}\right]. \tag{III.22}$$
Assume that among the $N$ reflections involved, the first $m$ beams do not satisfy (III.21) whereas the remaining $n = N - m$ beams do. We can then rearrange Eq. (III.10) to obtain

$$\gamma(\gamma C_{\mathbf{g}}) = -2(\mathbf{k}_0+\mathbf{g})_z(\gamma C_{\mathbf{g}}) + 2k_0 S_{\mathbf{g}}C_{\mathbf{g}} + \sum_{\mathbf{h}\neq\mathbf{g}} U_{\mathbf{g}-\mathbf{h}}C_{\mathbf{h}} \tag{III.23}$$

for $\mathbf{g} = \mathbf{g}_1, \ldots, \mathbf{g}_m$, and

$$\gamma C_{\mathbf{g}} = \frac{1}{2(\mathbf{k}_0+\mathbf{g})_z}\left[2k_0 S_{\mathbf{g}}C_{\mathbf{g}} + \sum_{\mathbf{h}\neq\mathbf{g}} U_{\mathbf{g}-\mathbf{h}}C_{\mathbf{h}}\right] \tag{III.24}$$

for $\mathbf{g} = \mathbf{g}_{m+1}, \ldots, \mathbf{g}_N$. In matrix notation, we can combine Eqs. (III.23) and (III.24) to obtain

$$\gamma\begin{pmatrix} \gamma\mathbf{C}_1 \\ \mathbf{C} \end{pmatrix} = \begin{pmatrix} -\mathbf{D} & -\mathbf{Q} \\ \begin{matrix}\mathbf{I}\\ \mathbf{0}\end{matrix} & \begin{matrix}\mathbf{0}\\ \mathbf{A}\end{matrix} \end{pmatrix}\begin{pmatrix} \gamma\mathbf{C}_1 \\ \mathbf{C} \end{pmatrix}, \tag{III.25}$$

in which $\mathbf{C}_1$ and $\mathbf{C}$ are an $m$- and an $N$-dimensional column vector, respectively:

$$\mathbf{C}_1^{T} = (C_{\mathbf{g}_1}, \ldots, C_{\mathbf{g}_m}); \qquad \mathbf{C}^{T} = (C_{\mathbf{g}_1}, \ldots, C_{\mathbf{g}_N}); \tag{III.26}$$

$\mathbf{D}$ is an $m \times m$ diagonal matrix with

$$\{\mathbf{D}\}_{\mathbf{gg}} = 2(\mathbf{k}_0+\mathbf{g})_z, \qquad \mathbf{g} = \mathbf{g}_1, \ldots, \mathbf{g}_m; \tag{III.27}$$

$\mathbf{Q}$ is an $m \times N$ matrix,

$$\{\mathbf{Q}\}_{\mathbf{gh}} = -2k_0 S_{\mathbf{g}}\delta_{\mathbf{gh}} - U_{\mathbf{g}-\mathbf{h}}(1 - \delta_{\mathbf{gh}}), \qquad \mathbf{g} = \mathbf{g}_1, \ldots, \mathbf{g}_m; \;\mathbf{h} = \mathbf{g}_1, \ldots, \mathbf{g}_N; \tag{III.28}$$

$\mathbf{I}$ is an identity matrix, $\mathbf{0}$ is a null matrix, and $\mathbf{A}$ is an $n \times N$ matrix whose elements are

$$\{\mathbf{A}\}_{\mathbf{gh}} = \frac{2k_0 S_{\mathbf{g}}\delta_{\mathbf{gh}} + U_{\mathbf{g}-\mathbf{h}}(1 - \delta_{\mathbf{gh}})}{2(\mathbf{k}_0+\mathbf{g})_z}, \qquad \mathbf{g} = \mathbf{g}_{m+1}, \ldots, \mathbf{g}_N. \tag{III.29}$$

When $m = N$, as in the case of glancing incidence RHEED, all beams need to be treated fully, and the matrix equation (III.25) reduces to a form identical to (III.19). When $m = 0$, as in the transmission Laue case where $k_{0z} + g_z \approx k_0 \gg 1/4d$, a first-order $N \times N$ eigensystem

$$\mathbf{A}\mathbf{C} = \gamma\mathbf{C} \tag{III.30}$$

is obtained.
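In the transmission Laue limit the first-order eigensystem (III.30) can be diagonalized directly, and the Bloch-wave superposition then gives the familiar thickness (pendellösung) fringes. The two-beam sketch below uses illustrative reduced units; the values of k0, Ug, and Sg are made up and not taken from the text.

```python
import numpy as np

k0 = 270.0       # magnitude of the incident wave vector (illustrative, 1/A)
Ug = 0.05        # assumed Fourier coefficient of the potential, 1/A^2
Sg = 0.0         # excitation error: exact Bragg condition

# First-order structure matrix for the (0, g) beams, rows after Eq. (III.22)
# with (k0 + g)_z ~ k0 in the transmission geometry:
A = np.array([[0.0,             Ug / (2.0 * k0)],
              [Ug / (2.0 * k0), Sg]])

gam, C = np.linalg.eigh(A)               # Bloch-wave eigenvalues and coefficients
alpha = C.T @ np.array([1.0, 0.0])       # boundary condition: incident beam only

t = np.linspace(0.0, 2.0 * np.pi * k0 / Ug, 200)   # one full fringe period
phases = np.exp(1j * np.outer(gam, t))             # exp(i gamma^(j) t)
psi_0 = (C[0, :] * alpha) @ phases                 # transmitted beam amplitude
psi_g = (C[1, :] * alpha) @ phases                 # diffracted beam amplitude
I_g = np.abs(psi_g)**2                             # pendelloesung fringes

# For a real potential the two-beam exchange of intensity is lossless:
print(np.max(np.abs(I_g + np.abs(psi_0)**2 - 1.0)))   # ~ 0
```

In these reduced units the fringe period works out to 2πk0/|Ug|; giving Ug a small imaginary part, as the optical potential of Section II.D does, would make the matrix non-Hermitian and damp the fringes.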
C. Boundary Conditions, Transmission, and Reflection Amplitudes

Having solved the eigenvalue problem (III.10) for $N$ beams, the total wave function within the crystal can be written as a summation of Bloch waves,

$$\psi(\mathbf{r}) = \sum_{j=1}^{2N}\alpha^{(j)}\sum_{\mathbf{g}} C_{\mathbf{g}}^{(j)}\exp[i(\mathbf{k}_0 + \gamma^{(j)}\mathbf{n} + \mathbf{g})\cdot\mathbf{r}]. \tag{III.31}$$

It should be noted, however, that to within a phase factor the Bloch wave solutions of Eq. (III.10) are periodic in reciprocal space, i.e.,

$$b^{(j)}(\mathbf{k}, \mathbf{r}) = b^{(j)}(\mathbf{k}+\mathbf{g}, \mathbf{r}). \tag{III.32}$$
It is a corollary of this periodicity that not all Bloch waves resulting from Eq. (III.10) are separate and distinct: those which differ in $\mathbf{k}$ by only a reciprocal lattice vector are physically equivalent (see Fig. 11). When $n$ distinct reciprocal lattice rods are involved, there exist $2n$ independent Bloch waves and a total of $N$ strongly excited diffracted beams within the crystal. The total electron wave field within the crystal
FIGURE 11. Dispersion surface construction for a reciprocal lattice rod or systematic row of reciprocal lattice points, showing the periodicity of the dispersion surface along the surface normal direction. A Brillouin zone boundary is marked in the figure.
slab can be written as

$$\psi(\mathbf{r}) = \sum_{j=1}^{2n}\alpha^{(j)}b^{(j)}(\mathbf{k}^{(j)}, \mathbf{r}) = \sum_{j=1}^{2n}\alpha^{(j)}\sum_{\mathbf{g}=\mathbf{g}_1}^{\mathbf{g}_N} C_{\mathbf{g}}^{(j)}\exp[i(\mathbf{k}_0 + \gamma^{(j)}\mathbf{n} + \mathbf{g})\cdot\mathbf{r}], \tag{III.33}$$

and the diffracted beam amplitude associated with the $m$th reciprocal lattice rod is given by

$$\psi_m(z) = \sum_{j=1}^{2n}\alpha^{(j)}\sum_{\mathbf{g}_m} C_{\mathbf{g}_m}^{(j)}\exp[i(k_{0z} + \gamma^{(j)} + g_{mz})z]. \tag{III.34}$$

The boundary conditions are continuation conditions imposed on the diffracted beam amplitudes and their surface normal derivatives, i.e.,

$$\exp(-ik_{0z}z)\,\psi_m(z) = \sum_{j=1}^{2n}\alpha^{(j)}\sum_{\mathbf{g}_m} C_{\mathbf{g}_m}^{(j)}\exp[i(\gamma^{(j)} + g_{mz})z], \tag{III.35}$$

$$-i\exp(-ik_{0z}z)\,\psi_m'(z) = \sum_{j=1}^{2n}\alpha^{(j)}\sum_{\mathbf{g}_m}(k_{0z} + \gamma^{(j)} + g_{mz})\,C_{\mathbf{g}_m}^{(j)}\exp[i(\gamma^{(j)} + g_{mz})z], \tag{III.36}$$

where the prime attached to $\psi_m$ denotes differentiation with respect to $z$. For convenience, hereafter we shall still use $\psi_m$ to denote the left-hand side of (III.35) and use $\psi_m'$ to denote that of (III.36). Equations (III.35) and (III.36) can be written in matrix notation (Peng and Whelan, 1990)

$$\mathbf{\Psi}(z) = \mathbf{S}\mathbf{P}(z)\mathbf{C}\mathbf{T}(z)\boldsymbol{\alpha}, \tag{III.37}$$
where $\mathbf{\Psi}$ is a $2n$-dimensional column vector,

$$\mathbf{\Psi}^{T}(z) = (\psi_1, \ldots, \psi_n, \psi_1', \ldots, \psi_n'); \tag{III.38}$$

$\mathbf{S}$ is a $(2n \times 2N)$ matrix whose elements are

$$\{\mathbf{S}\}_{m\mathbf{g}} = \begin{cases} 1, & \text{if } \mathbf{g} \text{ belongs to the } m\text{th rod} \\ 0, & \text{otherwise}; \end{cases} \tag{III.39}$$

and (III.40) and (III.41) define $\mathbf{P}(z)$ as a $(2N \times 2N)$ diagonal phase matrix,

$$\{\mathbf{P}(z)\}_{\mathbf{hh}} = \{\mathbf{P}(z)\}_{\mathbf{h}+N,\mathbf{h}+N} = \exp(ih_z z);$$
$\mathbf{C}$ is a $(2N \times 2n)$ matrix,

$$\{\mathbf{C}\}_{\mathbf{h}i} = C_{\mathbf{h}}^{(i)} \quad\text{for } \mathbf{h} \le N, \qquad \{\mathbf{C}\}_{\mathbf{h}+N,i} = (k_{0z} + \gamma^{(i)} + h_z)\,C_{\mathbf{h}}^{(i)}; \tag{III.42}$$

$\mathbf{T}$ is a $(2n \times 2n)$ diagonal matrix,

$$\{\mathbf{T}\}_{ij} = \exp(i\gamma^{(i)}z)\,\delta_{ij}; \tag{III.43}$$

and $\boldsymbol{\alpha}$ is a $2n$-dimensional column vector with $\{\boldsymbol{\alpha}\}_i = \alpha^{(i)}$. For a crystal slab of thickness $t$, if we choose the plane $z = z_1$ to lie at the upper surface and the plane $z = z_2$ to lie at the lower surface ($z_2 - z_1 = t$), we then have from (III.37)
$$\mathbf{\Psi}(z_1) = \mathbf{S}\mathbf{P}(z_1)\mathbf{C}\mathbf{T}(z_1)\boldsymbol{\alpha}, \tag{III.44}$$

$$\mathbf{\Psi}(z_2) = \mathbf{S}\mathbf{P}(z_2)\mathbf{C}\mathbf{T}(z_2)\boldsymbol{\alpha}. \tag{III.45}$$

In (III.45), $\boldsymbol{\alpha}$ can be eliminated using (III.44),

$$\boldsymbol{\alpha} = [\mathbf{S}\mathbf{P}(z_1)\mathbf{C}\mathbf{T}(z_1)]^{-1}\mathbf{\Psi}(z_1),$$

to give

$$\mathbf{\Psi}(z_2) = \mathbf{M}(z_2, z_1)\mathbf{\Psi}(z_1), \tag{III.46}$$

in which the matrix $\mathbf{M}(z_2, z_1)$ is called the scattering matrix and is given by

$$\mathbf{M}(z_2, z_1) = [\mathbf{S}\mathbf{P}(z_2)\mathbf{C}]\,\mathbf{T}(z_2 - z_1)\,[\mathbf{S}\mathbf{P}(z_1)\mathbf{C}]^{-1}. \tag{III.47}$$
If the origin of the coordinate system is shifted by $\mathbf{R}$, the $j$th Bloch wave becomes $b^{(j)}(\mathbf{r}+\mathbf{R})$. Since Bloch waves cannot depend on the choice of origin, from the invariance of $b^{(j)}$ we obtain

$$b^{(j)}(\mathbf{r}+\mathbf{R}) = \sum_{\mathbf{g}} C_{\mathbf{g}}^{(j)}\exp[i(\mathbf{k}^{(j)}+\mathbf{g})\cdot(\mathbf{r}+\mathbf{R})] = b'^{(j)}(\mathbf{r}) = \sum_{\mathbf{g}} C_{\mathbf{g}}'^{(j)}\exp[i(\mathbf{k}^{(j)}+\mathbf{g})\cdot\mathbf{r}],$$

and since the equation must hold for all $\mathbf{r}$ we then have

$$C_{\mathbf{g}}'^{(j)} = \exp(i\mathbf{k}^{(j)}\cdot\mathbf{R})\left[\exp(i\mathbf{g}\cdot\mathbf{R})\,C_{\mathbf{g}}^{(j)}\right], \tag{III.48}$$

where the primes denote quantities in the new coordinate system. Since the constant phase factor $\exp(i\mathbf{k}^{(j)}\cdot\mathbf{R})$ can be arbitrarily removed, we
then have

$$\mathbf{M}(z_2, z_1) = [\mathbf{S}\mathbf{P}(z_2)\mathbf{Q}\mathbf{C}]\,\mathbf{T}(z_2 - z_1)\,[\mathbf{S}\mathbf{P}(z_1)\mathbf{Q}\mathbf{C}]^{-1}, \tag{III.49}$$

with $\mathbf{Q}$ being a $2N \times 2N$ diagonal matrix,

$$\{\mathbf{Q}\}_{\mathbf{g},\mathbf{g}} = \{\mathbf{Q}\}_{\mathbf{g}+N,\mathbf{g}+N} = \exp(i\mathbf{g}\cdot\mathbf{R}). \tag{III.50}$$

If, instead of shifting the origin of the coordinate system, the crystal is displaced by $\mathbf{R}$, the Bloch waves become $b^{(j)}(\mathbf{r}-\mathbf{R})$. Following a similar argument to that leading to (III.50), we obtain

$$\{\mathbf{Q}\}_{\mathbf{g},\mathbf{g}} = \{\mathbf{Q}\}_{\mathbf{g}+N,\mathbf{g}+N} = \exp(-i\mathbf{g}\cdot\mathbf{R}). \tag{III.51}$$
The scattering matrix method can also be generalized to describe diffraction from an assembly of crystal slabs, each having thickness $t_n$ and displacement $\mathbf{R}_n$, giving

$$\mathbf{\Psi}(z) = \mathbf{M}(z)\mathbf{\Psi}(0), \qquad \mathbf{M}(z) = \prod_{n}\mathbf{M}_n(z_n), \tag{III.52}$$

where the scattering matrix $\mathbf{M}_n(z_n)$ of the $n$th crystal slab is given by

$$\mathbf{M}_n(z_n) = [\mathbf{S}_n\mathbf{P}_n(z_n)\mathbf{Q}_n\mathbf{C}_n]\,\mathbf{T}_n(t_n)\,[\mathbf{S}_n\mathbf{P}_n(z_{n-1})\mathbf{Q}_n\mathbf{C}_n]^{-1}, \tag{III.53}$$

in which $z_n = \sum_{k=1}^{n} t_k$. We now consider the general case of diffraction by a crystal slab. Since the vacuum region above the upper surface contains only the incident beam and Bragg reflected beams, the wave function there will be of the form
$$\psi_{vac}(\mathbf{r}) = \exp(i\mathbf{k}_0\cdot\mathbf{r}) + \sum_{m=1}^{n}\mathcal{R}_m\exp[i(\mathbf{k}_{mt} - k_{mz}\mathbf{n})\cdot\mathbf{r}], \tag{III.54}$$

in which the first term represents the incident beam and the $\mathcal{R}_m$ are the reflected beam amplitudes ($m = 1$ refers to the reciprocal lattice rod through the origin of reciprocal space). The reflected wave vectors are given by

$$\mathbf{k}_{mt} = (\mathbf{k}_0 + \mathbf{g}_m)_t, \qquad k_{mz} = \left[k_0^{2} - (\mathbf{k}_0 + \mathbf{g}_m)_t^{2}\right]^{1/2}, \tag{III.55}$$

where the subscript $t$ denotes the tangential component of the wave vector, and in particular $\mathbf{k}_{1t} = \mathbf{k}_{0t}$. At the upper surface, where the origin lies, we have

$$\psi_m(0) = \delta_{m1} + \mathcal{R}_m, \qquad \psi_m'(0) = k_{0z}\delta_{m1} - k_{mz}\mathcal{R}_m.$$
In the vacuum region below the lower surface, only the transmitted Bragg beams exist, and the wave function is given by

$$\psi_{vac}(\mathbf{r}) = \sum_{m=1}^{n}\mathcal{T}_m\exp[i(\mathbf{k}_{mt} + k_{mz}\mathbf{n})\cdot\mathbf{r}], \tag{III.56}$$

where $\mathcal{T}_m$ is the transmitted beam amplitude associated with the $m$th reciprocal lattice rod. At the lower surface,

$$\psi_m(t) = \mathcal{T}_m\exp(ik_{mz}t), \qquad \psi_m'(t) = k_{mz}\mathcal{T}_m\exp(ik_{mz}t),$$

where $t$ is the total thickness of the crystal slab system. Explicitly, the matrix equation (III.52) can then be rewritten as

$$\begin{pmatrix} \{\mathcal{T}_m\exp(ik_{mz}t)\} \\ \{k_{mz}\mathcal{T}_m\exp(ik_{mz}t)\} \end{pmatrix} = \begin{pmatrix} \{\mathbf{M}_{11}\}_{mn} & \{\mathbf{M}_{12}\}_{mn} \\ \{\mathbf{M}_{21}\}_{mn} & \{\mathbf{M}_{22}\}_{mn} \end{pmatrix}\begin{pmatrix} \{\delta_{n1} + \mathcal{R}_n\} \\ \{k_{nz}(\delta_{n1} - \mathcal{R}_n)\} \end{pmatrix},$$

where $\mathbf{M}_{11}$, $\mathbf{M}_{12}$, $\mathbf{M}_{21}$, and $\mathbf{M}_{22}$ are the $n \times n$ sub-matrices of $\mathbf{M}(z)$, and $k_{mz}$, $k_{nz}$ are understood as the diagonal matrices of the surface-normal wave vector components. Expanding the preceding matrix equation and rearranging, we thus obtain the reflected beam amplitudes $\{\mathcal{R}_m\}$,

$$\{\mathcal{R}_m\} = -\frac{(\mathbf{M}_{21} - k_{mz}\mathbf{M}_{11}) + (\mathbf{M}_{22} - k_{mz}\mathbf{M}_{12})k_{nz}}{(\mathbf{M}_{21} - k_{mz}\mathbf{M}_{11}) - (\mathbf{M}_{22} - k_{mz}\mathbf{M}_{12})k_{nz}}\{\delta_{n1}\}, \tag{III.57}$$

and the transmitted beam amplitudes $\{\mathcal{T}_m\}$,

$$\{\mathcal{T}_m\} = \left[\{\mathbf{M}_{11}\exp(-ik_{mz}t)\} + \{\mathbf{M}_{12}k_{nz}\exp(-ik_{mz}t)\}\right]\{\delta_{n1}\} + \left[\{\mathbf{M}_{11}\exp(-ik_{mz}t)\} - \{\mathbf{M}_{12}k_{nz}\exp(-ik_{mz}t)\}\right]\{\mathcal{R}_n\}. \tag{III.58}$$

It should be noted that we have used the convention that in (III.57) and (III.58) the reciprocal of the matrix in the denominator pre-multiplies the matrix in the numerator, i.e., $\mathbf{M}/\mathbf{N} = \mathbf{N}^{-1}\mathbf{M}$, and the same convention will be used subsequently. The problem of high-energy electron diffraction and reflection by a crystal slab is thus formally solved.
D. Two-Beam Approximation

As a simple application, we now consider a two-beam case, assuming that within the crystal only one reflected beam, associated with the reflection $\mathbf{g}$, is appreciably excited. The fundamental equation (III.7) is then truncated to a $2 \times 2$ matrix equation,

$$\begin{pmatrix} K^{2} - k^{2} & U_{-\mathbf{g}} \\ U_{\mathbf{g}} & K^{2} - (\mathbf{k}+\mathbf{g})^{2} \end{pmatrix}\begin{pmatrix} C_0 \\ C_{\mathbf{g}} \end{pmatrix} = 0. \tag{III.59}$$
FIGURE 12. Schematic diagram showing the two-beam hyperbola approximation for two diffraction geometries.
For a non-trivial solution, the determinant of the preceding matrix must be zero, giving the dispersion equation:
(K2 - k2)(K2 - ki) =
(I 11.60)
To obtain some idea of the form of the dispersion surface, which is defined as the plot of k as a function of the surface parallel component of the incident electron wave vector kOt,it is useful to consider a limiting case for which U,= 0. The dispersion surface then degenerates into two spheres with radius K, one centred on the origin and the other on the reciprocal lattice vector g, as shown in Fig. 12. As U,is increased from zero to a finite value, the lines of intersection of the two spheres will be modified, giving two branches of dispersion surface. Since the region of modification is very small in comparison to the radii of the two spheres which intersect, near the modified region the two spheres may be considered as straight lines and the section of the dispersion surface may be approximated by a hyperbola. Letting k = kB + 6k, (111.61) where k, denotes the wave vector satisfying the Bragg condition for g, i.e., = (kB + g)' = K2, and using the hyperbola approximation to the dispersion equation, i.e., neglecting triple and quartic terms in Eq. (111.60), we obtain a useful relation
    (k_B·δk) = ½[−(g·δk) ± √((g·δk)² + |U_g|²)].        (III.62)
LIAN-MAO PENG
We now introduce a deviation parameter ω:

    ω = (g·δk)/|U_g|.        (III.63)

In terms of ω, Eq. (III.62) can be rewritten as

    (k_B·δk) = ½|U_g|[−ω ± √(ω² + 1)].        (III.64)
From Eq. (III.59) and using Eq. (III.62), we have

    C_g/C₀ = −ω ± √(ω² + 1).        (III.65)

Using the relations

    1/[ω ± √(ω² + 1)] = −ω ± √(ω² + 1)

and

    |C₀|² + |C_g|² = 1,

we obtain for real ω and positive U_g

    C₀^(1) = {½[1 + ω/√(1 + ω²)]}^(1/2),   C_g^(1) = {½[1 − ω/√(1 + ω²)]}^(1/2),
    C₀^(2) = −{½[1 − ω/√(1 + ω²)]}^(1/2),   C_g^(2) = {½[1 + ω/√(1 + ω²)]}^(1/2).        (III.66)

This expression can be further simplified by introducing a useful parameter β (Hirsch et al., 1965),

    ω = cot β,        (III.67)
where the deviation ω varies from −∞ to +∞ and the angle β varies from π to 0. Equation (III.66) now becomes

    C₀^(1) = cos(β/2),   C_g^(1) = sin(β/2),        (III.68)

and

    C₀^(2) = −sin(β/2),   C_g^(2) = cos(β/2).        (III.69)
The conventional eigenvalue γ as defined in (III.9) is related to the parameter δk used here by the relation

    k = k₀ + γn = k_B + δk.

Substituting γ into the second-order dispersion equation (III.60), using the hyperbola approximation, and noting that k_B·δk₀ = 0, we obtain two solutions for γ:

    γ^(1,2) = [−(g·δk₀) ± √((g·δk₀)² + (1 + g_z/k_Bz)|U_g|²)] / [2(k_Bz + g_z)],        (III.70)

where δk₀ = k₀ − k_B.
In the symmetric Laue case of transmission electron diffraction (see Fig. 12), we have g_z = 0, k_Bz = k₀z, k_Bx = −g/2, and

    g·δk₀ = g·(k₀ − k_B) = g·(k₀ + ½g) = −k₀S_g,        (III.71)

where

    2k₀S_g = k₀² − (k₀ + g)² = −(2k₀ + g)·g.

Substitution of Eq. (III.71) into Eq. (III.70) then gives the conventional form of the two-beam eigenvalue solutions

    γ^(1,2) = [(k₀S_g) ± √((k₀S_g)² + |U_g|²)] / (2k₀z).        (III.72)
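As a quick numerical check (not from the text; the parameter values below are arbitrary illustrations), the two-beam eigenvalues of Eq. (III.72) coincide with direct diagonalization of the corresponding 2 × 2 structure matrix:

```python
import numpy as np

# Illustrative check of the two-beam eigenvalue formula (III.72):
# gamma = [k0*Sg +/- sqrt((k0*Sg)^2 + |Ug|^2)] / (2*k0z).
# The numbers (k0z, k0Sg, Ug) are arbitrary test values, not data from the text.
k0z = 270.0    # surface-normal wave-vector component
k0Sg = 0.8     # excitation-error term k0*Sg
Ug = 0.5       # Fourier coefficient of the scaled potential

# Structure matrix of the truncated two-beam eigenvalue problem
A = np.array([[0.0, Ug],
              [Ug, 2.0 * k0Sg]]) / (2.0 * k0z)

gammas = np.sort(np.linalg.eigvalsh(A))
analytic = np.sort((k0Sg + np.array([-1.0, 1.0]) *
                    np.sqrt(k0Sg**2 + Ug**2)) / (2.0 * k0z))
assert np.allclose(gammas, analytic)
```

The matrix and the closed form agree to machine precision, which is the content of the hyperbola approximation in this limit.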
In the symmetric Bragg case of electron diffraction (see Fig. 12), g_z = −g, k_Bz = g/2, k_Bz + g_z = −g/2, and we have

    γ^(1,2) = −[(k₀S_g) ± √((k₀S_g)² − |U_g|²)] / (2k_Bz).        (III.73)
This equation is similar in form to Eq. (III.72), but the sign of |U_g|² within the square root is now negative. It is a direct consequence of this sign reversal that a band gap now exists, within which k becomes imaginary. An exact solution (rather than the hyperbola approximation; see, e.g., Peng, 1989) gives the boundaries of the band gap:

    K_z² = (½g)² − |U_g|   and   K_z² = (½g)² + |U_g|.        (III.74)
It can readily be shown that within the band gap ω is a purely imaginary quantity. Letting ω → iω, we obtain from Eq. (III.65) two sets of solutions for the Bloch waves:

    C₀ = (1/√2)[iω ± √(1 − ω²)]^(1/2),   C_g = (1/√2)[−iω ± √(1 − ω²)]^(1/2).        (III.75)

A phase angle η can be introduced to simplify the solution further:

    iω + √(1 − ω²) = exp(iη),   η = tan⁻¹[ω/√(1 − ω²)],

where η is the phase angle of the complex quantity. We now have

    C₀^(±) = (1/√2) exp(iη^(±)/2),   C_g^(±) = (1/√2) exp(−iη^(±)/2),

where the angle η^(−) = π − η^(+).
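The onset of the band gap can be illustrated numerically. The sketch below evaluates the assumed form of Eq. (III.73) with arbitrary illustrative values and confirms that the two eigenvalues acquire imaginary parts exactly when |k₀S_g| < |U_g|:

```python
import numpy as np

kBz = 135.0   # arbitrary illustrative value
Ug = 0.5

def gamma_bragg(k0Sg, Ug, kBz):
    """Two-beam Bragg-case eigenvalues, cf. Eq. (III.73); complex inside the gap."""
    root = np.sqrt(complex(k0Sg**2 - Ug**2))
    return -(k0Sg + np.array([1.0, -1.0]) * root) / (2.0 * kBz)

inside = gamma_bragg(0.2, Ug, kBz)    # |k0*Sg| < |Ug|: total-reflection band
outside = gamma_bragg(2.0, Ug, kBz)   # |k0*Sg| > |Ug|: propagating solutions
assert np.all(np.abs(inside.imag) > 0)
assert np.allclose(outside.imag, 0.0)
```

The imaginary eigenvalues inside the gap correspond to Bloch waves that decay into the crystal, which is what produces the flat-topped Bragg reflection band of the Darwin-type rocking curve discussed below.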
The matrices and vectors defined by Eqs. (III.39)–(III.43) can readily be written out for this two-beam case.
Now consider reflection from a crystal slab of thickness t. Following (III.47), the scattering matrix M is given by

    M = [SP(t)C]T(t)[SP(0)C]⁻¹.

Since the vacuum region above the crystal slab contains only the incident beam and the specularly reflected beam, the super vector Ψ is of the form

    Ψ(0) = (1 + ℛ₀, k₀z(1 − ℛ₀))        (III.78)

at the upper surface of the crystal slab, and

    Ψ(t) = (𝒯₀ exp(ik₀z t), k₀z 𝒯₀ exp(ik₀z t))        (III.79)

at the lower surface. In the preceding expressions, ℛ₀ and 𝒯₀ represent the specularly reflected and transmitted beam amplitudes, respectively. Substitution of (III.78) and (III.79) into (III.46) leads to

    M (1 + ℛ₀, k₀z(1 − ℛ₀)) = (𝒯₀ exp(ik₀z t), k₀z 𝒯₀ exp(ik₀z t)),
giving the specular reflected beam amplitude ℛ₀, Eq. (III.80).
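This boundary-value solve can be sketched numerically. The 2 × 2 scattering matrix M below is a made-up example (not the author's data), and the (III.78)–(III.79) form of the boundary vectors is assumed:

```python
import numpy as np

# Solve the slab boundary condition  M @ [1+R, k*(1-R)] = [Tp, k*Tp]
# for the specular reflectivity R; Tp absorbs the exp(i*k0z*t) phase factor.
k = 1.7                                    # vacuum k0z, arbitrary units
t = 0.3
M = np.array([[np.cos(t), np.sin(t) / k],  # illustrative scattering matrix
              [-k * np.sin(t), np.cos(t)]], dtype=complex)

# Eliminating Tp between the two rows gives a single linear equation for R
a = M[1, 0] - k * M[0, 0]
b = M[1, 1] - k * M[0, 1]
R = -(a + k * b) / (a - k * b)
Tp = M[0, 0] * (1 + R) + M[0, 1] * k * (1 - R)

# The solved amplitudes reproduce both rows of the boundary condition
lhs = M @ np.array([1 + R, k * (1 - R)])
assert np.allclose(lhs, np.array([Tp, k * Tp]))
```

The same elimination, done with the full N-beam super vectors, is what yields the closed-form reflectivity of (III.80).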
For RHEED from a semi-infinite bulk crystal surface, it should be noted that of the two non-equivalent eigenvalues γ^(1) and γ^(2) given by (III.70), only the one with positive imaginary component leads to a Bloch wave decaying into the bulk crystal and is physically allowed (Lamla, 1938; Miyake et al., 1968; Marks and Ma, 1988). If we denote this eigenvalue by γ and the associated eigenvector by (C₀, C_g), the matrices C and T reduce to a vector and a scalar, respectively, and the Bloch wave amplitude vector α reduces to a scalar α. Expressing the boundary condition at the upper surface explicitly, with the origin chosen to lie at the surface, we obtain the specular reflected beam amplitude (III.81).

To illustrate the effects arising from the finite thickness of the crystal slab, Fig. 13 shows two rocking curves calculated for a two-beam case around the Bragg incidence angle. The calculations are made for 100 keV electrons and Au single crystals, with crystal slabs of thickness t = 61.2 Å and 183.6 Å for the two curves, respectively. This type of rocking curve corresponds
FIGURE 13. Two-beam specular RHEED rocking curves from the Au(001) surface of a crystal slab having thicknesses of 61.2 Å and 183.6 Å. [From Peng and Whelan, 1990.]
to the so-called "Ewald solution" in the x-ray case (Ewald, 1917; Zachariasen, 1945). Oscillation fringes on either side of the Bragg reflection region arise from the interference of the beams backscattered from the top and bottom surfaces of the crystal slab. Differing from the case of reflection from a crystal slab, the curves shown in Fig. 14 have been calculated for a semi-infinite bulk crystal. This type of curve corresponds to the "Darwin solution" in the x-ray case (Darwin, 1914; Zachariasen, 1945). No oscillation is observed on the two sides of the Bragg peak. The curve is asymmetric about the Bragg position, even for a non-absorbing crystal, in contrast with the more weakly scattering x-ray case, and the reflection peak is more strongly damped on the lower-angle side.

E. Transmission High-Energy Electron Diffraction
In THEED, high-energy electrons are incident nearly normally on the crystal surface, such that

    k₀z + γ^(j) + g_z ≈ k₀z
FIGURE 14. Two-beam specular RHEED rocking curves from the (001) surface of a semi-infinite bulk gold crystal. The curves have been calculated for 100 keV incident electrons, with absorption of 0, 1, and 5%, respectively. [From Peng and Whelan, 1990.]
for all g. The boundary conditions (III.36) therefore reduce to simple continuity of the transmitted and diffracted wave amplitudes (III.35). When n distinct reciprocal lattice rods are involved, there exist 2n independent Bloch waves within the crystal. Since in the Laue case the scattering is predominantly forward, the n backward-propagating Bloch waves among the total 2n may be neglected. This procedure is equivalent to neglecting the γ² term in Eq. (III.10). Equation (III.10) thus becomes a linear eigenvalue equation (III.30), giving n distinct values of γ and a total of n independent forward-propagating Bloch waves. The diffracted beam amplitude associated with the mth reciprocal lattice rod is given by

    ψ_m(z) = Σ_{j=1}^{n} α^(j) Σ_{g_m} C_{g_m}^(j) exp[i(k₀z + g_mz + γ^(j))z],        (III.82)

or, in matrix form,

    Ψ = SPCTα,        (III.83)

where S is an n × N matrix, P is an N × N diagonal matrix, T is an n × n diagonal matrix, and α is an n-dimensional vector. All these matrices are identical to the corresponding top left submatrices of (III.39)–(III.43).
1. THEED by a Multilayer System

We now consider THEED by a multilayer system consisting of n layers of crystal, each having a thickness tᵢ. Following the same procedure that leads to (III.53), we obtain

    Ψ(z) = ∏_{i=1}^{n} Mᵢ(zᵢ) Ψ(0),        (III.84)

where

    Mᵢ(zᵢ) = [SᵢPᵢ(zᵢ)QᵢCᵢ] Tᵢ(tᵢ) [SᵢPᵢ(z_{i−1})QᵢCᵢ]⁻¹.        (III.85)

Q is the same matrix as defined in (III.51), which allows for a shift between different layers of the crystal, and zᵢ = Σ_{k=1}^{i} t_k. For plane wave incidence, we have ψ_m(0) = δ_{m0}. The diffracted beam amplitudes are given by

    (ψ₁(z), ψ₂(z), …)ᵀ = M (1, 0, …)ᵀ = (M₁₁, M₂₁, …)ᵀ,        (III.86)

i.e., the first column of the total scattering matrix M. The diffracted beam amplitudes ψ_m depend in general on the incident beam direction. For the CBED diffraction geometry, the intensity distribution within the gth disk is given by

    I_g(k₀t) = |ψ_g(k₀t)|².        (III.87)

We now consider an application of Eq. (III.86) to a Si/GeₓSi₁₋ₓ strained-layer superlattice (SLS). The SLS is assumed to consist of alternating layers of Si and GeₓSi₁₋ₓ. Shown in Fig. 15 are an experimental [102] large-angle CBED (LACBED) pattern from a Si/GeₓSi₁₋ₓ SLS sample and the corresponding simulated pattern. The simulation is performed using 31 beams, for a primary beam energy of 99.4 keV, a crystal thickness of t = 2,450 Å, and a Ge concentration x = 0.37. The surface-normal components of the strained epilayer lattice constants are a_n(Si) = 5.427 Å and a_n(GeₓSi₁₋ₓ) = 5.568 Å. The simulated LACBED pattern is seen to agree very well with the experimentally obtained pattern (Wang et al., 1992a).
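The layer-by-layer bookkeeping of Eqs. (III.84)–(III.86) can be sketched as follows; the per-layer matrices here are random placeholders for the [SPQC]T[SPQC]⁻¹ factors, purely to show the composition and the extraction of the diffracted amplitudes:

```python
import numpy as np

rng = np.random.default_rng(0)

def total_scattering_matrix(layers):
    """Compose per-layer scattering matrices into the full-slab matrix.
    The first layer acts first (assumed convention), so it sits rightmost
    in the ordered product of Eq. (III.84)."""
    n = layers[0].shape[0]
    M = np.eye(n, dtype=complex)
    for Mi in layers:
        M = Mi @ M
    return M

layers = [rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
          for _ in range(3)]
M = total_scattering_matrix(layers)

psi0 = np.zeros(4, dtype=complex)
psi0[0] = 1.0                      # plane-wave incidence, psi_m(0) = delta_m0
psi = M @ psi0
assert np.allclose(psi, M[:, 0])   # amplitudes = first column of M, cf. (III.86)
```

With physical layer matrices, |psi[m]|² would give the CBED disk intensity of (III.87) at one incident-beam direction; a LACBED pattern is built by repeating the product over a grid of incidence directions.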
2. THEED by a Deformed Crystal

We now consider diffraction by a deformed crystal. For a general strain field, following Whelan and Hirsch (1957), we make the usual column approximation. Along each column, we further assume that the deformation field is a slowly varying function, so that the eigenvalues and eigenvectors of a thin crystal slice in a column may be taken to be unaffected by the presence of the slowly varying deformation field. The slowly varying
FIGURE 15. (a) Experimental and (b) simulated [102] LACBED patterns from a Si/Ge₀.₃₇Si₀.₆₃ SLS, for 99.4 keV. [From Wang et al., 1992a.]
displacement field along a column may be simulated by considering the column as composed of an assembly of thin slices, each having a thickness Δt_k and a rigid shift R_k(x, y) with respect to the origin. Along the column, we have

    Ψ(x, y, t) = ∏_{k=1}^{n} M_k(x, y) Ψ(x, y, 0),        (III.88)

where M_k is the scattering matrix

    M_k(x, y) = [S⁰P⁰(z_k)Q_k(x, y)C⁰] T⁰(Δt_k) [S⁰P⁰(z_{k−1})Q_k(x, y)C⁰]⁻¹,        (III.89)
in which the superscript 0 denotes that the relevant matrices are associated with the perfect crystal, and Q_k is a diagonal matrix as defined in Eq. (III.51). Alternatively, differential equations similar to the Howie–Whelan equations (Howie and Whelan, 1961) can be derived to simulate multiple diffraction effects within each column, with high-order Laue zone (HOLZ) effects included. For simplicity, we assume that the various HOLZ layers are not very close together, so that along each reciprocal lattice rod only a single beam is appreciably excited. We then have a case where n = N, and the matrix S reduces to an identity matrix. For a single crystal slab, Eqs. (III.84) and (III.85) give
    Ψ(z + Δz) = P(z + Δz)Q(z)CT(Δz)C⁻¹Q⁻¹(z)P⁻¹(z)Ψ(z).        (III.90)
By expanding the preceding expression up to first order, we obtain a set of differential equations (III.91). Using Eq. (III.30), i.e., AC = C{γ^(j)}, we then arrive at (III.92), which is the generalized Howie–Whelan equation with HOLZ effects explicitly included. Differential equations for the Bloch wave excitation amplitudes can also be derived if we go back to the Bloch wave picture and directly differentiate the expression

    Ψ(z) = P(z)Q(z)CT(z)α(z).        (III.93)

We obtain

    dΨ/dz = (dP/dz)QCTα + P(dQ/dz)CTα + P(z)Q(z)C{iγ^(j)}T(z)α(z) + P(z)Q(z)CT(z)(dα/dz).        (III.94)

Comparing (III.92) with (III.94), we obtain (III.95), which has a form identical to that involving only zero-order Laue zone (ZOLZ) reflections (Hirsch et al., 1965). Dynamical HOLZ effects are nevertheless included, through the dependence of the matrices T and C on the inclusion of HOLZ reflections; the diffracted beam amplitude expression (III.84) also includes the matrix P, which depends explicitly on HOLZ reflections.

We first consider an application to the simplest case of planar defects, i.e., stacking faults. Shown in Fig. 16a is a bright-field [011] LACBED pattern from a Si crystal, showing clearly a superimposed stacking fault (SF) shadow image, and in Fig. 16b is a simulated SF LACBED image. The simulation was made using 15 beams, a displacement vector R = (1/3)[11̄1], a crystal thickness of t = 1,650 Å, and a defocus value of Δf = 30 μm. Four HOLZ reflections are included. The simulated image is seen to agree very well with the experimental image (Wang et al., 1992b).
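The ZOLZ limit of the Howie–Whelan equations referred to above can be integrated directly. The sketch below uses the standard two-beam constant-coefficient form (with an illustrative extinction distance and excitation error, not values from the text) and checks the result against the familiar pendellösung formula:

```python
import numpy as np

# Two-beam ZOLZ Howie-Whelan equations, d/dz (phi0, phig) = i*A*(phi0, phig),
# integrated in closed form via diagonalization. xi_g, s, t are illustrative.
xi_g = 200.0          # extinction distance
s = 0.002             # excitation error
t = 150.0             # crystal thickness
A = np.pi * np.array([[0.0, 1.0 / xi_g],
                      [1.0 / xi_g, 2.0 * s]])

w, V = np.linalg.eigh(A)                       # A is real symmetric here
phi = V @ np.diag(np.exp(1j * w * t)) @ V.T @ np.array([1.0, 0.0])

# Without absorption the total intensity is conserved,
assert np.isclose(abs(phi[0])**2 + abs(phi[1])**2, 1.0)
# and the diffracted intensity follows the two-beam pendelloesung formula.
seff = np.sqrt(s**2 + 1.0 / xi_g**2)
Ig = (np.sin(np.pi * t * seff) / (xi_g * seff))**2
assert np.isclose(abs(phi[1])**2, Ig)
```

In a deformed-crystal column calculation the same integration is carried out slice by slice, with the phase matrix Q_k(x, y) of (III.89) inserting the local displacement R_k between slices.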
FIGURE 16. (a) Experimental and (b) simulated BF [011] LACBED patterns of a Si single crystal, showing superimposed shadow images of a stacking fault. The simulation is made for 100 keV and a sample thickness of 1,650 Å. [From Wang et al., 1992b.]
We now consider the more complicated case of a dislocation. Shown in Figs. 17a–c is a series of three bright-field [111] LACBED images of an edge dislocation, recorded under varying diffraction conditions; Figs. 17A–C show the corresponding four-beam simulated images. The simulations have been made for a crystal thickness of t = 1,500 Å, a defocus value of Δf = 10 μm, a dislocation line vector u = [011], and a Burgers vector b = (1/2)[01̄1]. The column size used is 25 Å × 25 Å, and the slice thickness is Δt = 25 Å (Wang et al., 1992b).
F. Reflection High-Energy Electron Diffraction

In RHEED, we are concerned with a problem of the type illustrated schematically in Fig. 18. A primary beam of electrons is incident on a two-dimensionally periodic crystal surface at a glancing angle, typically of the order of one degree. The model crystal consists of a semi-infinite substrate and a surface region called the selvage (Wood, 1964). The selvage could consist, in the simplest case, of a surface potential barrier, or it could contain in addition one or more atom layers with scattering properties different from those of the substrate. The two-dimensional nets of reciprocal lattice
FIGURE 17. Simulated (A–C) and experimental (a–c) BF LACBED patterns of a Si crystal containing an edge dislocation. The simulation is made for a sample thickness of 1,500 Å and 100 keV. [From Wang et al., 1992b.]
FIGURE 18. Schematic diagram showing a cross-sectional view of a model for a semi-infinite substrate plus selvage system.
in the plane parallel to the surface describing the periodicity of the substrate and the selvage need not be identical, but it is necessary that they be rationally related, so that the crystal surface as a whole is periodic in two dimensions.

1. RHEED from a Semi-infinite Crystal

a. Substrate Scattering. Since the substrate is three-dimensionally periodic, we can use the Bloch waves as previously discussed. For an n-rod case, we have a total of 2n non-equivalent eigenvalues and corresponding Bloch waves. However, only the n eigenvalues which lead to Bloch waves propagating down into, or decaying into, the bulk crystal are physically allowed (Lamla, 1938; Miyake et al., 1968; Marks and Ma, 1988). The boundary condition at the interface between the substrate and the selvage gives, for the mth reciprocal lattice rod:
    ψ_m(t_s) = Σ_{j=1}^{n} α^(j) Σ_{g_m} C_{g_m}^(j) exp[i(γ^(j) + g_mz)t_s],        (III.96)

    ψ'_m(t_s) = Σ_{j=1}^{n} α^(j) Σ_{g_m} (k₀z + γ^(j) + g_mz) C_{g_m}^(j) exp[i(γ^(j) + g_mz)t_s],        (III.97)

or, in matrix form,

    Ψ(t_s) = S^b P^b(t_s) C^b T^b(t_s) α,        (III.98)

where t_s is the total thickness of the selvage and the superscript b refers to the substrate bulk crystal. The matrices S^b and P^b have their usual forms as given in Eqs. (III.39)–(III.43), but C^b and T^b reduce to 2N × n and n × n matrices, respectively, and the vector α becomes n-dimensional. The last
FIGURE 19. Calculated one-dimensional potential variation along the surface normal for a truncated and a full potential model. The region z > 0 represents the periodic bulk substrate, the region −5.0 Å < z < 0 represents the selvage, and the region z < −5.0 Å represents free space, where U(r) = 0.
equation (III.98) can also be written as

    Ψ(t_s) = (B₁α, B₂α),   with   (B₁; B₂) = S^b P^b(t_s) C^b T^b(t_s).        (III.99)
By neglecting the selvage scattering, i.e., letting t_s = 0, the boundary condition at the entrance surface (z = 0) requires (III.100), which gives the reflected beam amplitudes from a truncated surface (III.101).

Shown in Fig. 19 are two potential models for a Pt(001) surface. The "truncated" curve in the figure represents a truncated potential model, for which t_s = 0: for z > 0 the potential is periodic, and for z < 0 the
FIGURE 20. Calculated absolute specular beam amplitude for the Pt(001) surface and the [010] zone axis, as a function of the angle of incidence. The two curves shown in the figure have been calculated for 100 keV incident electrons and five reciprocal lattice rods, using a full and a truncated potential approximation.
potential is zero. The transition from the vacuum region to the crystal is assumed to occur abruptly. The "truncated potential" curve shown in Fig. 20 is the corresponding specular reflected beam rocking curve, calculated based on Eq. (III.101) for the truncated potential model.

b. Selvage Scattering. We now consider a full potential model, shown in Fig. 19 as the "full" curve. This is a more realistic potential model, which includes a finite transition region from the vacuum to the periodic substrate. If we write the scattering matrix associated with the selvage as M^s(t_s), following the general equation (III.46) we obtain

    Ψ(t_s) = M^s(t_s)Ψ(0).        (III.102)
Combination of (III.99) and (III.102) leads to

    (B₁α)   ( [M^s₁₁(t_s)]_mn   [M^s₁₂(t_s)k_nz]_mn ) ( {δ_n0 + ℛ_n} )
    (B₂α) = ( [M^s₂₁(t_s)]_mn   [M^s₂₂(t_s)k_nz]_mn ) ( {δ_n0 − ℛ_n} ).        (III.103)
In Eq. (III.103) the Bloch wave excitation amplitude vector α is unknown. By eliminating this vector from the preceding equation, we arrive at a solution (III.104) for the reflection vector ℛ from a semi-infinite crystal consisting of a substrate and a selvage.
We now consider the calculation of the selvage scattering matrix M^s. A straightforward way is to construct a super unit cell having a large dimension perpendicular to the surface, as shown in Fig. 21a. Although a crystal composed of such unit cells is rather artificial, Fig. 21a shows clearly that the potential distribution of the left half unit cell represents excellently the potential variation of the selvage, shown as the "full" curve in Fig. 19. The relevant matrices S, C, P, T can be calculated exactly via the same formulae (III.39)–(III.43), and the selvage scattering matrix is identical to that associated with the left half super unit cell shown in Fig. 21a. For a large unit cell dimension normal to the surface, however, many reciprocal lattice points along the surface normal are needed to achieve a convergent result. A more efficient way to calculate the selvage scattering matrix M^s is to use two-dimensional Bloch waves. The potential variation along the surface normal across the selvage can be simulated by dividing the selvage into many slices, and assuming that within each slice the two-dimensional potential field parallel to the surface is constant along the surface normal (see Fig. 21b). Since the potential field in each slice is constant normal to the surface, only ZOLZ reflections appear in Eqs. (III.4) and (III.6). Equation (III.10) then reduces accordingly,
leading to 2n independent Bloch waves and n independent eigenvectors:

    C^(i) = (C₁^(i), C₂^(i), …, C_n^(i))   (i = 1, …, n).
FIGURE 21. One-dimensional potential models for the selvage based on (a) a large super unit cell and (b) an assembly of thin slices, each having a constant potential normal to the surface.
The diffracted beam amplitude associated with the mth reciprocal lattice rod, (III.34), reduces to

    ψ_m(z) = Σ_{j=1}^{2n} α^(j) C_m^(j) exp(iX^(j)z),

and the general equation (III.52) becomes

    M^s(z) = ∏_k (C_k T_k C_k⁻¹),        (III.106)

in which T is a 2n × 2n diagonal matrix with (T)_{i,i} = exp(iX^(i)t) and (T)_{i+n,i+n} = exp(−iX^(i)t) (i = 1, …, n).
The "full potential" rocking curve shown in Fig. 20 is the corresponding specular RHEED rocking curve, calculated based on the full potential model shown as the "full" curve in Fig. 19. The selvage scattering is seen to have modified the rocking curve to a large extent at small angles of incidence. Over the larger angular region (θ > 15 mrad), the truncated potential model is seen to provide a fairly good representation of the potential for dynamical RHEED calculations: all major features of the full RHEED rocking curve are reproduced by the truncated potential model. Quantitatively, however, some differences remain between the absolute amplitudes for the two potential models.

2. RHEED from a Crystal Slab

Alternatively, dynamical RHEED calculations can be performed for a slab system (Maksym and Beeby, 1981; Ichimiya, 1983; Smith and Lynch, 1988). If the thickness of the crystal slab is much larger than the mean free path for absorption, the slab method yields effectively the same reflection coefficients as a semi-infinite crystal surface. In principle the scattering matrix could be constructed for either the bulk crystal slab or the selvage using (III.106). In practice, however, the scattering matrix as defined in (III.106) contains exponential terms of the form exp(iX^(j)z). For evanescent waves, with negative imaginary part of X^(j), the scattering matrix M diverges rapidly as the thickness of the crystal slab increases. Numerically, a better means of calculating RHEED from an assembly of slices is to propagate an S matrix, defined as the ratio between the surface-normal derivative of the wave function vector and the wave function vector itself (Ichimiya, 1983; Zhao et al., 1988). At the bottom surface of the crystal
slab, we have

    (Ψ, Ψ')_{k−1} = M'_k (Ψ, Ψ')_k,

where

    M'_k = M_k⁻¹ = CT(z_{k−1} − z_k)C⁻¹.        (III.108)

Letting

    (S)_k = [M'₂₁ + M'₂₂{k_nz}][M'₁₁ + M'₁₂{k_nz}]⁻¹,        (III.109)

we then have

    {Ψ'}_{k−1} = (S)_k {Ψ}_{k−1}.        (III.110)

Within the crystal slab, we have for the ith slice (i = k − 1, …, 1)

    (Ψ, Ψ')_{i−1} = ( M'₁₁  M'₁₂ ; M'₂₁  M'₂₂ )ᵢ (Ψ, Ψ')ᵢ,        (III.111)

giving

    {Ψ'}_{i−1} = (S)_{i−1} {Ψ}_{i−1},        (III.112)

where

    (S)_{i−1} = [M'₂₁ + M'₂₂ Sᵢ][M'₁₁ + M'₁₂ Sᵢ]⁻¹.        (III.113)

At the top surface we have the vacuum boundary condition (III.114), and we therefore obtain the reflected beam amplitudes (III.115).
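The recursion (III.113) can be exercised numerically. The slice matrices below are random stand-ins for the C T C⁻¹ blocks, and the check verifies the defining property of S at every step: the propagated S matrix keeps reproducing the derivative-to-wavefunction ratio.

```python
import numpy as np

rng = np.random.default_rng(1)

def propagate_S(S, blocks):
    """One S-matrix update, S_{i-1} = (M21 + M22 S)(M11 + M12 S)^{-1};
    only ratios are propagated, so growing exponentials never accumulate."""
    M11, M12, M21, M22 = blocks
    return (M21 + M22 @ S) @ np.linalg.inv(M11 + M12 @ S)

n = 3
S = 1j * np.diag(rng.uniform(1.0, 2.0, n))   # illustrative start value at the bottom
for _ in range(5):
    Mp = rng.standard_normal((2 * n, 2 * n)) + 1j * rng.standard_normal((2 * n, 2 * n))
    blocks = (Mp[:n, :n], Mp[:n, n:], Mp[n:, :n], Mp[n:, n:])
    psi = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    full = Mp @ np.concatenate([psi, S @ psi])   # (psi, psi') propagated exactly
    S = propagate_S(S, blocks)
    assert np.allclose(S @ full[:n], full[n:])   # new S = psi'/psi at the new level
```

This is the numerical stabilization that makes the slab formulation usable for thick slabs, where direct multiplication of the matrices (III.106) would overflow.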
Shown in Fig. 22 are three RHEED rocking curves calculated for 20 keV electrons and a Si(001) surface, using a truncated potential model. The "3D" curve in the figure is calculated using three-dimensional Bloch waves; the "2D" curve is calculated using two-dimensional Bloch waves. The resulting RHEED rocking curves are seen to be almost identical. The three-dimensional Bloch wave calculations have been made using from 10 to 26 beams distributed
FIGURE 22. Calculated specular RHEED rocking curves for the Si(001) surface and 20 keV incident electrons. The three curves shown in the figure have been calculated using a three-dimensional Bloch wave scheme ("3D"), a two-dimensional Bloch wave scheme ("2D"), and three-dimensional Bloch waves (for the semi-infinite substrate bulk crystal) plus two-dimensional Bloch waves (for the selvage) ("2D + 3D").
along three reciprocal lattice rods. The number of beams used is surprisingly small, but the result is seen to agree well with the full two-dimensional Bloch wave approach, which uses a slice thickness of less than 0.01 Å. Also shown in the figure is a "2D + 3D" curve. This curve is calculated based on Eq. (III.104), in which the substrate matrices B₁ and B₂ are calculated using three-dimensional Bloch waves and the selvage matrix M^s is calculated using two-dimensional Bloch waves.
IV. PERTURBATION METHODS FOR PERIODIC STRUCTURES

In this and the next section we develop perturbation methods which are suitable for non-Hermitian eigensystems. For simplicity, in this section we will be concerned only with periodic structures and consider a transmission diffraction geometry. The perturbation method more appropriate for the reflection geometry will be given in the next section.
For a periodic structure, it is convenient to use the Bloch wave method. The perturbation method aims to find the changes in the eigenvalues and eigenfunctions of a system when a small disturbance is applied. In the context of electron diffraction theory, the perturbation method has been extensively used and developed. Applications have been made to take into account the effects of weak beams (Bethe, 1928; Gjønnes, 1962); inelastic scattering (Hirsch et al., 1965); and HOLZ diffraction (Bird, 1989); and to crystal structure determination (Vincent et al., 1984) and crystal structure factor refinement (Zuo, 1991; Bird and Saunders, 1992). A formal mathematical expression for the first-order partial derivatives of the scattering matrix can be found in a paper by Speer et al. (1990). It is assumed from the outset that the crystal potential may be written as a sum of two parts: V(r) = V₀(r) + ΔV(r); one of these, V₀(r), is a known potential, and the other, ΔV(r), is a small quantity and may be regarded as a perturbation on V₀(r).
A. Bloch Waves, Left-Hand and Right-Hand Eigenvectors
For the convenience of the following discussion, we first define a transformation of the eigenvector C_g^(j) defined in Section III:

    B_g^(j) = √(1 + g_z/k₀z) C_g^(j),        (IV.1)

and B_g^(j) is called the right-hand eigenvector (Wilkinson, 1988). Substituting the preceding definition into (III.30), we obtain a linear eigensystem
In matrix notation, Eq. (IV.2) can be rewritten as

    (S + U)B = BT,        (IV.3)

in which the matrices S and T are diagonal matrices, with elements as defined in (IV.4),
and the elements of the matrices U and B are given by
"'"
ug-h
+ g,/koz.\Il + h,/ko,
= J1
,
(BJgi= B f ) .
(IV.5)
Similarly, we can define a set of left-hand eigenvectors B̄_g^(j), satisfying (IV.6). In matrix notation, we can rewrite Eq. (IV.6) as

    B̄(S + U) = TB̄,   with   (B̄)_jg = B̄_g^(j).        (IV.7)

By multiplying Eq. (IV.3) by B⁻¹, first from the left-hand side and then from the right-hand side, we obtain

    B⁻¹(S + U) = TB⁻¹.        (IV.8)

Comparing Eq. (IV.8) with Eq. (IV.7), we have B̄ = B⁻¹, which gives the orthogonality relations B̄B = I and BB̄ = I. Explicitly, we have
    Σ_g B̄_g^(j) B_g^(j') = δ_jj',   Σ_j B̄_h^(j) B_g^(j) = δ_hg.        (IV.9)
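The left/right eigenvector pairing is easy to verify numerically for a generic non-Hermitian matrix; the sketch below (random illustrative matrix standing in for S + U) checks both eigenvalue equations and the orthogonality relations (IV.9):

```python
import numpy as np

# Left- and right-hand eigenvectors of a non-Hermitian matrix (cf. Sec. IV.A):
# columns of B are right eigenvectors, rows of Bbar = B^{-1} are left
# eigenvectors, and together they satisfy the biorthogonality relations (IV.9).
rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))  # plays S + U

gam, B = np.linalg.eig(A)       # right eigenvectors: (S+U) B = B T
Bbar = np.linalg.inv(B)         # left eigenvectors:  Bbar (S+U) = T Bbar

assert np.allclose(A @ B, B @ np.diag(gam))
assert np.allclose(Bbar @ A, np.diag(gam) @ Bbar)
assert np.allclose(Bbar @ B, np.eye(5))   # sum_g Bbar_g^(j) B_g^(j') = delta_jj'
assert np.allclose(B @ Bbar, np.eye(5))   # sum_j Bbar_h^(j) B_g^(j) = delta_hg
```

Note that for a non-Hermitian matrix the left eigenvectors are not the conjugate transposes of the right ones, which is exactly why the B̄ notation is needed in the perturbation expansion that follows.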
B. Non-degenerate Perturbation Theory
The assumption that ΔV(r) is small suggests that expansions may be made of both the perturbed eigenfunctions and eigenvalues as power series in ΔV(r). Up to second order, we have

    U(r) = U₀(r) + λΔU(r),
    B = B₀ + λB₁ + λ²B₂,        (IV.10)
    T = T₀ + λT₁ + λ²T₂,

in which the parameter λ has been chosen in such a way that the equation to which Eq. (IV.10) reduces when λ → 0,

    (S + U₀)B₀ = B₀T₀,        (IV.11)

can be directly solved. This equation is called the equation for the reference structure. Substituting Eq. (IV.10) into Eq. (IV.3), the coefficient of λ gives the equation

    (S + U₀)B₁ + ΔUB₀ = B₀T₁ + B₁T₀,        (IV.12)

and the coefficient of λ² gives

    (S + U₀)B₂ + ΔUB₁ = B₀T₂ + B₁T₁ + B₂T₀.        (IV.13)
1. First-Order Perturbation

We first consider Eq. (IV.12). Let

    B₁ = B₀α₁,        (IV.14)

where α₁ is a coefficient matrix. By using Eq. (IV.11) for the reference structure, Eq. (IV.12) becomes

    B₀(T₀α₁ − α₁T₀) + ΔUB₀ = B₀T₁.

Multiplying both sides of the preceding equation by B̄₀, we obtain

    (B̄₀B₀)(T₀α₁ − α₁T₀) + B̄₀ΔUB₀ = (B̄₀B₀)T₁.

For the diagonal terms, j = j', we have

    (T₁)_jj = (B̄₀ΔUB₀)_jj,

or explicitly,

    γ₁^(j) = (1/2k₀z) U^(jj),        (IV.15)

in which

    U^(jj') = Σ_{g,h} B̄₀g^(j) ΔU_gh B₀h^(j').        (IV.16)

For the off-diagonal terms, j' ≠ j,

    [(T₀)_j'j' − (T₀)_jj](α₁)_j'j = −(B̄₀ΔUB₀)_j'j = −U^(j'j),

and for a non-degenerate case, γ₀^(j) ≠ γ₀^(j'), we have

    (α₁)_j'j = (1/2k₀z) U^(j'j) / (γ₀^(j) − γ₀^(j')).

The diagonal elements (α₁)_jj can be obtained from the normalization condition (B̄₀ + λB̄₁)(B₀ + λB₁) = I. Neglecting the second-order term, this condition requires that (ᾱ₁)_jj + (α₁)_jj = 0, which can be satisfied if we choose

    (ᾱ₁)_jj = (α₁)_jj = 0.

Using the definition (IV.14), we then have

    B₁g^(j) = (1/2k₀z) Σ_{i≠j} B₀g^(i) U^(ij) / (γ₀^(j) − γ₀^(i)).        (IV.17)

Similarly, we can define

    B̄₁ = −ᾱ₁B̄₀,        (IV.18)

to obtain (for j ≠ j')

    (ᾱ₁)_jj' = (1/2k₀z) U^(jj') / (γ₀^(j') − γ₀^(j)),

and

    B̄₁g^(j) = (1/2k₀z) Σ_{i≠j} B̄₀g^(i) U^(ji) / (γ₀^(j) − γ₀^(i)).        (IV.19)
2. Second-Order Perturbation

We now consider the second-order equation (IV.13). Letting

    B₂ = B₀α₂,        (IV.20)

using (IV.11), and multiplying both sides of (IV.13) by B̄₀, we obtain

    (B̄₀B₀)(T₀α₂ − α₂T₀) + B̄₀ΔUB₁ = (B̄₀B₀)(T₂ + α₁T₁).

For the diagonal terms, j = j', we have

    γ₂^(j) = (1/2k₀z)² Σ_{i≠j} U^(ji) U^(ij) / (γ₀^(j) − γ₀^(i)).        (IV.21)

For the off-diagonal terms, j' ≠ j, we have

    [(T₀)_j'j' − (T₀)_jj](α₂)_j'j = −(B̄₀ΔUB₁)_j'j + (α₁)_j'j(T₁)_jj.        (IV.22)
For the non-degenerate case, substitution of Eqs. (IV.17) and (IV.15) into Eq. (IV.22) gives

    (α₂)_j'j = (1/2k₀z)² [ Σ_{i≠j} U^(j'i) U^(ij) / ((γ₀^(j) − γ₀^(j'))(γ₀^(j) − γ₀^(i))) − U^(j'j) U^(jj) / (γ₀^(j) − γ₀^(j'))² ].        (IV.23)

Similar steps give the corresponding expression for (ᾱ₂)_j'j. The diagonal elements of α₂ and ᾱ₂ can be obtained by requiring

    (B̄₀ + λB̄₁ + λ²B̄₂)(B₀ + λB₁ + λ²B₂) = I.

Collecting the second-order terms, we obtain the condition

    B̄₀B₂ + B̄₁B₁ + B̄₂B₀ = 0,

which gives an equation for (α₂)_jj and (ᾱ₂)_jj:

    (α₂)_jj + (ᾱ₂)_jj − (ᾱ₁α₁)_jj = 0.

This equation can be satisfied if we choose

    (α₂)_jj = ½(ᾱ₁α₁)_jj.        (IV.24)

Similarly, we have

    (ᾱ₂)_jj = ½(ᾱ₁α₁)_jj.        (IV.25)
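The eigenvalue corrections can be checked against exact diagonalization. The sketch below uses a diagonal (hence trivially diagonalized) reference, so B₀ = I and the matrix elements U^(jj') are just the entries of ΔU; the factors 2k₀z of (IV.15) and (IV.21) are absorbed into the eigenvalues, and all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
g0 = np.arange(1.0, n + 1.0)                # reference eigenvalues, well separated
dU = 1e-4 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
Up = dU                                     # U^(jj') matrix elements, since B0 = I

g1 = np.diag(Up)                                            # first order, cf. (IV.15)
g2 = np.array([sum(Up[j, i] * Up[i, j] / (g0[j] - g0[i])    # second order, cf. (IV.21)
                   for i in range(n) if i != j) for j in range(n)])

exact = np.sort_complex(np.linalg.eigvals(np.diag(g0) + dU))
err1 = np.max(np.abs(exact - np.sort_complex(g0 + g1)))
err2 = np.max(np.abs(exact - np.sort_complex(g0 + g1 + g2)))
assert err2 < err1 and err2 < 1e-8          # second order beats first order
```

In the general case B₀ is not the identity, and Up would be computed as Bbar0 @ dU @ B0 with Bbar0 = inv(B0), exactly as in Section IV.A.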
C. Tensor THEED
Although the expressions for T₁, T₂, B₁, B₂, B̄₁, and B̄₂ are complicated, their dependence on the variations of the structure factors is relatively simple. In matrix notation, letting λ = 1 and ΔT = T₁ + T₂, and using Eqs. (IV.15) and (IV.21), we have

    Δγ^(j) = 𝒰^(j)·ΔU + ΔU·𝒜^(j)·ΔU,        (IV.26)

with the vector 𝒰^(j) and matrix 𝒜^(j) as defined by (IV.27) and (IV.28).
Letting ΔB = B₁ + B₂ and using Eqs. (IV.17) and (IV.24), we have

    ΔB_g^(j) = ℬ_g^(j)·ΔU + ΔU·𝒟_g^(j)·ΔU,        (IV.29)

and using Eqs. (IV.19) and (IV.25),

    ΔB̄_g^(j) = ℬ̄_g^(j)·ΔU + ΔU·𝒟̄_g^(j)·ΔU,        (IV.30)

in which the vectors and matrices are defined by (IV.31)–(IV.33). The diffracted beam amplitude is then given by (IV.34).

For given vectors and matrices, the calculation of the diffracted beam amplitude is an operation of order n(p + p²), where n is the number of Bloch waves having appreciable excitation amplitudes and p is the number of varying crystal structure factors. For simple cases, we have typically n ≈ 10 and p < 30. This situation should be compared with the case of a full dynamical calculation, where each calculation scales as N³, with N being the total number of beams, which varies typically from 30 to 150. Previously, tensor expressions for low-energy electron diffraction (LEED) beam intensities had been obtained by Rous and Pendry (1989).

Shown in Fig. 23 are three variation curves of the transmitted beam intensity with varying U₂₀₀, for a MgO single crystal and a systematic diffraction geometry. The calculation has been made for 100 keV and a crystal thickness of 1,000.0 Å, using the full dynamical theory ("Full"), first-order ("1st order"), and second-order ("2nd order") perturbation theory. It is seen that while the first-order perturbation approximation deviates slightly from the full dynamical theory, the second-order perturbation theory agrees almost perfectly with the full dynamical calculations.
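The practical point of the tensor formulation — that each trial set of structure-factor changes costs only O(p + p²) once the tensors are tabulated — can be sketched with placeholder tensors (all values below are random stand-ins, not physical quantities):

```python
import numpy as np

# Tensor-THEED-style amplitude update in the spirit of (IV.37):
#   psi_g = psi_g^(0) + X.dU + dU.A.dU,
# quadratic in the structure-factor perturbation vector dU.
rng = np.random.default_rng(4)
p = 6
psi0 = rng.standard_normal() + 1j * rng.standard_normal()
X = rng.standard_normal(p) + 1j * rng.standard_normal(p)
A = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))

def amplitude(dU):
    """Cheap O(p + p^2) evaluation; no full dynamical recalculation needed."""
    return psi0 + X @ dU + dU @ A @ dU

assert np.isclose(amplitude(np.zeros(p)), psi0)

# Quadratic scaling of the second-order part: halving dU quarters it
dU = 1e-4 * rng.standard_normal(p)
second = amplitude(dU) - psi0 - X @ dU
second_half = amplitude(dU / 2) - psi0 - X @ (dU / 2)
assert np.isclose(second / 4, second_half)
```

In a refinement loop, `amplitude` replaces the N³-scaling dynamical calculation inside the minimizer, which is what makes the direct-inversion scheme of the next subsection feasible.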
FIGURE 23. Calculated variations of the transmitted beam intensity as a function of the (200) structure factor. The calculation has been made for 100 keV and a MgO single crystal; the sample thickness used is 1,000 Å. The curves denoted "Full," "1st order," and "2nd order" refer to calculations using the full dynamical theory, first-order, and second-order tensor theory, respectively.
D. Direct Inversion of Crystal Structure Factors
A quantitative electron diffraction study normally involves defining and minimizing a merit function which measures the agreement between the theoretical model and the experimental measurements (Zuo and Spence, 1991):

    χ² = Σᵢ (Iᵢ^exp − C Iᵢ^th)² / σᵢ²,        (IV.35)

where Iᵢ^exp and Iᵢ^th are the experimental and calculated beam intensities, respectively; C is a normalization constant; and σᵢ² is the variance of the ith experimental measurement. The structural information is obtained by adjusting a set of parameters of the model, such as structure factors and atomic coordinates, until the minimum of the merit function is reached. The adjustment process is therefore a problem of many-dimensional minimization, and in general the problem is tedious and complicated.
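A merit function of this kind can be sketched directly; the data below are synthetic, and the closed-form least-squares fit of the normalization constant C is a common choice assumed here, not a prescription from the text:

```python
import numpy as np

rng = np.random.default_rng(5)
I_th = rng.uniform(0.1, 1.0, 20)                 # "calculated" intensities
sigma2 = np.full(20, 1e-4)                       # measurement variances
I_exp = 2.5 * I_th + rng.normal(0.0, 1e-2, 20)   # synthetic "measurement"

def chi2(I_exp, I_th, sigma2, C):
    """Merit function in the form of Eq. (IV.35)."""
    return np.sum((I_exp - C * I_th) ** 2 / sigma2)

# chi^2 is quadratic in C, so d(chi^2)/dC = 0 gives the optimum in closed form
C_opt = np.sum(I_exp * I_th / sigma2) / np.sum(I_th ** 2 / sigma2)
assert chi2(I_exp, I_th, sigma2, C_opt) <= chi2(I_exp, I_th, sigma2, C_opt * 1.01)
assert chi2(I_exp, I_th, sigma2, C_opt) <= chi2(I_exp, I_th, sigma2, C_opt * 0.99)
```

The remaining parameters (structure factors, coordinates) enter chi² nonlinearly in general, which is the many-dimensional minimization the tensor expansion is designed to avoid.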
LIAN-MAO PENG
For cases when the perturbation methods discussed in the preceding sections are applicable, the general problem of minimizing χ² in the multidimensional parameter space may be reduced to a simple problem of matrix inversion, and structural information may be inverted directly from experimental dynamical beam intensities. We first discuss the problem of direct inversion of structure factors. As mentioned in the introduction, when a crystal is formed by bringing together an assembly of neutral atoms, charges will be redistributed in the crystal to reduce the total energy of the system and to form crystalline bonds. The effective one-electron potential may be written as

V(r) = Σ_i V_i^a(r − r_i) + ΔV(r) = V₀(r) + ΔV(r),  (IV.36)

in which the first term represents the contribution from neutral atoms, and the second term describes the charge redistribution in the material. A characteristic feature of the charge distribution in a solid is that most of the charge overlap is already included in the first term. The additional charge redistribution due to bonding, as represented by the second term, is very small in comparison with the first term, typically of the order of less than 0.01% of the total charge in covalent crystals. This therefore provides a perfect basis for a perturbation treatment of the influence of ΔV(r) on the diffracted beam amplitudes. Using Eq. (IV.34), the diffracted beam amplitude can be expanded into a power series of ΔU. Up to second order, we have

S_g = S_g^(0) + X_g · ΔU + ΔU · A_g · ΔU,  (IV.37)

in which X_g and A_g are the first- and second-order derivative tensors of the amplitude with respect to the structure factors.
In the following developments it is convenient to treat the real and imaginary parts of the structure factors as separate parameters. Letting

a_{2k} = Re(ΔU_k),  a_{2k+1} = Im(ΔU_k);
X′_{2k} = X_k,  X′_{2k+1} = iX_k;
A′_{2l,2k} = A_{l,k},  A′_{2l,2k+1} = A′_{2l+1,2k} = iA_{l,k},  A′_{2l+1,2k+1} = −A_{l,k},

we then have

S_g = S_g^(0) + X′ · a + a · A′ · a,  (IV.38)

where the parameter vector a is now a real vector. Following Pendry et al. (1988), we assume that |ΔS_g| ≪ |S_g^(0)|. Using Eq. (IV.38), we then have, for the changes of the beam intensities,

ΔI_g ≡ I_g − |S_g^(0)|² ≈ Y_g · a + a · D_g · a,  (IV.39)

where Y_g and D_g are real first- and second-order intensity derivative tensors constructed from S_g^(0), X′, and A′. The merit function (IV.35) then becomes

χ² = Σ_i (1/σ_i²)[ΔI^(i) − Y^(i) · a − a · D^(i) · a]².  (IV.40)
1. Linear Model
We first consider the simplest linear model for the direct inversion of crystal structure factors. Retaining in (IV.40) only linear terms, the χ² reduces to

χ² = Σ_i (1/σ_i²)[ΔI^(i) − Y^(i) · a]².  (IV.41)

The minimum of the χ² occurs when the first-order derivatives with respect to the a_k vanish,

(∇χ²)_k = 0 = Σ_i (1/σ_i²)[ΔI^(i) − Σ_p Y_p^(i) a_p] Y_k^(i),  (IV.42)

for all parameters. By defining a design matrix (M)_{ik} = Y_k^(i)/σ_i and a vector b by b_i = ΔI^(i)/σ_i, the normal equation (IV.42) can be rewritten as

(MᵀM) · a = Mᵀ · b.

Formally, this matrix equation can be inverted directly to give

a = (MᵀM)⁻¹ Mᵀ · b.  (IV.43)

In practice, the solution is best obtained by the method of singular value decomposition (Press et al., 1986).
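A minimal sketch of this recommended route, using the SVD-based least-squares solver in NumPy on a hypothetical design matrix (the data here are invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix M (rows: weighted measurements) and data b.
a_true = np.array([0.004, -0.001])      # parameter vector to recover
M = rng.normal(size=(50, 2))
b = M @ a_true                          # noise-free "measurements"

# Solve (M^T M) a = M^T b stably; numpy's lstsq uses the SVD internally,
# as recommended by Press et al. (1986).
a, *_ = np.linalg.lstsq(M, b, rcond=None)
print(a)                                # recovers a_true
```

For noise-free linear data the recovery is exact to machine precision; with noisy data the same call returns the least-squares estimate.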
2. Quadratic Model
We now extend the linear model to include quadratic terms. If the χ² function is sufficiently close to the minimum, we may expand χ²(a) in the vicinity of a reference vector a₀ quadratically:

χ²(a) = χ²(a₀) + ∇χ²(a₀) · (a − a₀) + ½(a − a₀) · H · (a − a₀),  (IV.44)

where ∇χ²(a) is the gradient vector of the χ² function (IV.40) with respect to a,

(∇χ²)_k = −2 Σ_i (1/σ_i²)[ΔI^(i) − Y^(i) · a − a · D^(i) · a][Y_k^(i) + 2(D^(i) · a)_k],  (IV.45)

and H is the Hessian matrix of second derivatives,

H_{kl} = ∂²χ²/∂a_k ∂a_l.  (IV.46)

For the true structure, having a parameter vector a_min which minimizes the χ² function, we have

∇χ²(a_min) = ∇χ²(a₀) + H · (a_min − a₀) = 0.  (IV.47)

Formally, the parameter vector is given by

a_min = a₀ − H⁻¹ · ∇χ²(a₀);  (IV.48)

again the structure factors are inverted directly. For cases where the second-order matrix D^(i) is small, Eqs. (IV.45) and (IV.46) become

(∇χ²)_k = −2 Σ_i (1/σ_i²)[ΔI^(i) − Y^(i) · a] Y_k^(i),  H_{kl} = 2 Σ_i (1/σ_i²) Y_k^(i) Y_l^(i).

We therefore have from Eq. (IV.47):

Σ_i (1/σ_i²)[ΔI^(i) − Y^(i) · a_min] Y_k^(i) = 0,
and this is identical to Eq. (IV.42). The quadratic model thus reduces to the linear model discussed previously.

To test our models, we show in Fig. 24 some results from an “ideal experiment.” The “actual” rocking curve in the figure is calculated for a structure with U_200 = (0.058, 0.001) and U_400 = (0.024, 0.00043), using full dynamical theory. This curve is then taken to be the “experimental curve,” and Poisson-distributed noise is added to the curve to simulate experimental measurement errors. The starting structure is the reference structure, which is defined to have U_200 = (0.054, 0.001) and U_400 = (0.023, 0.0004). The two “actual” and “reference” curves shown in Fig. 24a are then used to calculate the design matrix M and to invert the parameter vector a using the linear model. The “inverted” curve in Fig. 24a is calculated based on the inverted structure factors. The difference between the “actual” and “inverted” rocking curves is shown below the rocking curves and is seen to be noticeable, indicating that the linear model is not an accurate description of the scattering processes by ΔV(r). Nevertheless, the two rocking curves in Fig. 24a are seen to be closer together than the “actual” and “reference” curves. The standard variation σ² = χ²(a_min)/h, h being the total number of measurements, returned from the inversion process is 4.74. A new linear least-squares system is then set up based on the restored structure. The results from this new linear system are shown in Fig. 24b. The fit between the “actual” and “restored” rocking curves is seen to be close to perfect. The σ² value obtained is 1.05, which approaches the ideal Poisson situation. Similar procedures are then repeated using the quadratic model, and the results are shown in Figs. 25a and 25b for the first and second inversion processes. The σ² value resulting from the first direct inversion is 1.2, and the value returned from the second iteration step equals 1.05, exactly the same as for the linear model. The determined U_200 and U_400 values are also similar, to within an accuracy of 0.5% for the real parts and 15.0% for the imaginary parts of the true values. The relatively larger error in the determination of the imaginary part of the crystal potential is partly due to the use of the normalization constant C in Eq. (IV.35). This is because a uniform decay of the rocking curve due to the imaginary potential is absorbed into the normalization constant C. As an extreme case, information concerning the imaginary part of the mean inner potential is completely lost in (IV.35).

We now turn to the real test of our models, i.e., applying them to invert real experimental data (Peng and Zuo, 1994). Shown in Fig. 26 are an experimental energy-filtered rocking curve (“actual”) and a “restored” curve for a MgO single crystal, using the quadratic model and 94 beams. The starting reference structure factors are obtained from neutral atoms (Doyle and Turner, 1968), and the absorption effects are included using
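The single inverse-Hessian step of Eq. (IV.48) is easily checked on a toy quadratic χ² surface. The Hessian and parameter values below are hypothetical, chosen only so that the step can be verified to land exactly on the minimum:

```python
import numpy as np

# For a chi^2 that is exactly quadratic, one inverse-Hessian step (IV.48)
# lands on the minimum: a_min = a0 - H^{-1} grad chi^2(a0).
H = np.array([[4.0, 1.0], [1.0, 3.0]])       # illustrative Hessian
a_min_true = np.array([0.058, 0.024])        # "true" parameters

def grad(a):
    # gradient of the quadratic 0.5 (a - a_min)^T H (a - a_min)
    return H @ (a - a_min_true)

a0 = np.array([0.054, 0.023])                # reference structure
a_min = a0 - np.linalg.solve(H, grad(a0))    # one Newton step
print(a_min)                                 # -> [0.058 0.024]
```

When χ² is only approximately quadratic, the step must be iterated, which is exactly the situation of the second iteration discussed above.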
FIGURE 24. Calculated rocking curves for the “actual” and “reference” structures, and the curve calculated using the inverted structure factors. The plots below the rocking curves show the difference between the “actual” and “inverted” curves. All curves are calculated using a linear model, and (a) and (b) correspond to the first and second iteration steps, respectively.
FIGURE 25. Same as Fig. 24, except that all calculations have been made using a quadratic model.
FIGURE 26. Energy-filtered zero-loss experimental and restored rocking curves for a MgO single crystal and a systematic diffraction geometry. The acceleration voltage used is 100 keV, and the sample thickness is determined to be 954 Å.
an Einstein model for TDS scattering (Bird and King, 1990). This procedure gives U_200 = (0.057334, 0.000791) and U_400 = (0.024531, 0.000606). The initial inversion of the experimental rocking curve was made for four parameters, i.e., the real and imaginary parts of the structure factors U_200 and U_400, giving U_200 = (0.055109 ± 0.000001, 0.000697 ± 0.0000004) and U_400 = (0.024211 ± 0.000003, 0.000412 ± 0.0000001), and a σ² value of 8.7. A new reference structure has also been constructed, but further iteration steps are found not to improve the σ² value significantly, indicating that the higher-order scattering effects are small. On the other hand, procedures involving more parameters progressively improve the σ² values. For nine parameters, including the real and imaginary parts of U_200, U_400, U_600, and U_800 and the crystal thickness, the final σ² value approaches 8.2. The latter procedure results, however, in increased variances for the determined U_600 and U_400. The variance for U_600 is comparable with its determined value, and those for the crystal thickness and U_800 exceed the corresponding determined values of the parameters. These determined parameters must not be taken seriously. Although the inclusion of these parameters does improve the fit, the residual σ² value is more likely to result from other factors, such as changes of the experimental conditions during the measurements.
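The role of the normalization constant C can be made quantitative: minimizing Eq. (IV.35) with respect to C gives a closed-form scale factor, after which the standard variation σ² = χ²/h follows directly. A sketch with invented intensities:

```python
import numpy as np

def fit_scale_and_quality(I_exp, I_calc, sigma):
    """Normalization constant C that minimizes Eq. (IV.35), obtained from
    d(chi^2)/dC = 0, together with sigma^2 = chi^2 / h."""
    w = 1.0 / sigma**2
    C = np.sum(w * I_exp * I_calc) / np.sum(w * I_calc**2)
    chi2 = np.sum(w * (I_exp - C * I_calc) ** 2)
    return C, chi2 / len(I_exp)

I_calc = np.array([10.0, 20.0, 30.0, 40.0])
I_exp = 3.0 * I_calc                    # perfectly scaled hypothetical data
C, gof = fit_scale_and_quality(I_exp, I_calc, np.ones(4))
print(C, gof)                           # -> 3.0 0.0
```

Because any uniform attenuation of the rocking curve is absorbed into C, this step is exactly what removes sensitivity to the imaginary part of the mean inner potential, as noted in the text.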
For the linear and quadratic models as discussed in this section, the structure factors obtained from the inversion processes are unique. Since triple scattering processes affect only the degree of asymmetry or anharmonicity of the χ²(a) surface around its minimum, we may conclude that a unique solution exists for the structure factors obtained from electron diffraction, unless multiple scattering processes involving terms of higher order than (ΔU)³ are significant. The latter condition corresponds to a sample thickness of the order of 18,000 Å, which is much thicker than any practical sample thickness used in THEED.

E. Direct Determination of Crystal and Surface Structures
Having determined the crystal structure factors, the problem of determining the atomic coordinates is then a trivial one. This is because, given sufficient crystal structure factors, a Fourier summation,

V(r) = Σ_g V_g exp(ig · r),  (IV.49)

will reveal the atom positions directly as the intensity maxima, via the Poisson relation (Spence, 1993). This method is not very efficient, however, since a high-resolution structure map requires the knowledge of both the amplitudes and phases of many diffracted beams. In general, given that the point group and space group of a crystal may be determined from dynamical CBED patterns (Gjønnes and Moodie, 1965; Buxton et al., 1976; Tanaka et al., 1983), the number of unknown atomic parameters is far less than the number of structure factors required to obtain a high-resolution structure map. In principle, a more efficient method for crystal structure determination is to utilize the characteristic changes of the diffracted beam intensities with varying atomic coordinates. In this and the next subsections we will explore this method in some detail. We start from a reference structure which is characterized by a set of atomic coordinates {r_i}. For this reference structure we have

U_g(ref) = Σ_i f_i(g/4π) exp(−M_i) exp(−ig · r_i).  (IV.50)
By writing the set of atomic coordinates for the actual structure as {r_i + δr_i}, we then have for the actual structure

U_g(act) = Σ_i f_i(g/4π) exp(−M_i) exp(−ig · r_i) exp(−ig · δr_i).  (IV.51)

The difference structure factors are given by

ΔU_g = U_g(act) − U_g(ref) = Σ_i U_{i,g} s_{i,g},  (IV.52)

where

U_{i,g} = f_i(g/4π) exp(−M_i) exp(−ig · r_i)  and  s_{i,g} = exp(−ig · δr_i) − 1.  (IV.53)
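Eqs. (IV.50)–(IV.53) translate directly into code. The sketch below uses a hypothetical two-atom cell with constant scattering amplitudes, omits the Debye-Waller factors, and checks the small-displacement limit s_{i,g} ≈ −ig · δr_i used later in the linearized theory:

```python
import numpy as np

# Difference structure factors of Eqs. (IV.52)-(IV.53): each displaced atom i
# contributes U_{i,g} * s_{i,g} with s_{i,g} = exp(-i g.dr_i) - 1.
g = np.array([2 * np.pi, 0.0, 0.0])          # one reciprocal-lattice vector
r = np.array([[0.0, 0.0, 0.0],               # reference atomic coordinates
              [0.5, 0.5, 0.0]])
dr = np.array([[0.01, 0.0, 0.0],             # small hypothetical displacements
               [0.0, 0.0, 0.0]])
f = np.array([1.0, 1.0])                     # constant amplitudes (illustrative)

U_ig = f * np.exp(-1j * (r @ g))             # U_{i,g} (Debye-Waller omitted)
s_ig = np.exp(-1j * (dr @ g)) - 1.0          # s_{i,g}
dU_g = np.sum(U_ig * s_ig)                   # Eq. (IV.52)

# First-order check: s_{i,g} ~ -i g.dr_i for small displacements.
approx = np.sum(U_ig * (-1j * (dr @ g)))
print(abs(dU_g - approx) < 1e-2)             # -> True
```

The residual of the first-order approximation is of order (g · δr)²/2, which is the term kept by the second- and third-order tensor expansions discussed below.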
1. Linear Least-Squares Method
For crystal structure determination and to a first-order approximation, substitution of {ΔU_g} into (IV.34) gives

ΔS_g = Σ_h X_{g,h} ΔU_h,  (IV.54)

where the first-order derivative tensor

X_{g,h} = ∂S_g/∂U_h  (IV.55)

is given by (IV.28). Substituting (IV.52) into (IV.54), we obtain

ΔS_g = Σ_i Σ_h X_{g,h} U_{i,h} s_{i,h} = Σ_{i,h} T_{i,h}^{(g)} s_{i,h},  with  T_{i,h}^{(g)} = X_{g,h} U_{i,h}.  (IV.56)
For surface structure determination (Takayanagi et al., 1985), we consider a model system consisting of a reconstructed surface layer and a bulk crystal slab having thickness t. For the reconstructed surface layer, a tensor expression for ΔS_g can be obtained as just discussed. On exiting the surface layer, the diffracted beam amplitudes can be written as

ˢS_g = ˢS_g(ref) + Σ_{i,h} ˢT_{i,h}^{(g)} s_{i,h},  (IV.57)

where the superscript s denotes the surface layer. The dynamical scattering processes occurring in the bulk crystal lying beneath the surface layer can be represented by a scattering matrix Mᵇ (Peng and Whelan, 1991a). The final diffracted beam amplitudes from the exit face of the bulk crystal slab are given by

ˢ⁺ᵇS_g = Σ_h Mᵇ_{gh} ˢS_h = Σ_h Mᵇ_{gh} ˢS_h(ref) + Σ_h Σ_{i,l} Mᵇ_{gh} ˢT_{i,l}^{(h)} s_{i,l},  (IV.58)

where the superscript b denotes the bulk crystal alone and s + b denotes the whole model system. By introducing a new matrix

ˢ⁺ᵇT_{i,l}^{(g)} = Σ_h Mᵇ_{gh} ˢT_{i,l}^{(h)},  (IV.59)

we then arrive at

Δˢ⁺ᵇS_g = ˢ⁺ᵇS_g − Σ_h Mᵇ_{gh} ˢS_h(ref) = Σ_{i,l} ˢ⁺ᵇT_{i,l}^{(g)} s_{i,l}.  (IV.60)
Again a tensor expression is obtained. To first order in δr,

s_{i,h} ≈ −i Σ_k h_k δr_{i,k},

and we have from (IV.56) or (IV.60)

ΔS_g ≈ Σ_{i,h} T_{i,h}^{(g)} Σ_k (−i h_k δr_{i,k}) = Σ_l T′_{g,l} δr_l,  (IV.61)

where l = (i, k), the index k denotes one of the three orthogonal Cartesian axes, and δrᵀ = (δx₁, δy₁, δz₁, …);

T′_{g,l} = −i Σ_h h_k T_{i,h}^{(g)}.  (IV.62)
The χ² function becomes

χ² = Σ_i (1/σ_i²)[ΔI^(i) − Σ_l (∂I^(i)/∂δr_l) δr_l]²,  (IV.63)

where

∂I_g/∂δr_l = 2C Re[S_g*(ref) T′_{g,l}]  and  ΔI^(i) = I_exp^(i) − C|S^(i)(ref)|².

The minimum of the χ² function occurs where the first-order derivatives of the function with respect to all parameters vanish:

Σ_i (1/σ_i²)[ΔI^(i) − Σ_l (∂I^(i)/∂δr_l) δr_l] ∂I^(i)/∂δr_k = 0.  (IV.64)

We can write the preceding set of equations in matrix form,

X′ᵀ · ΔI′ = (X′ᵀ · X′) · δr,  (IV.65)

in which ΔI′_i = ΔI^(i)/σ_i and X′_{i,l} = (∂I^(i)/∂δr_l)/σ_i. Formally the solution of this matrix equation can be written as

δr = (X′ᵀ · X′)⁻¹ X′ᵀ · ΔI′,  (IV.66)

i.e., a direct solution for the crystal or surface structure is obtained under the THEED diffraction geometry.
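A sketch of the direct solution (IV.66): build the intensity-derivative matrix from hypothetical complex tensors and reference amplitudes, generate linearized intensity changes, and recover the displacements by least squares (all arrays here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n_meas, n_par = 40, 3

# Hypothetical complex first-order tensors T'_{g,l} and reference amplitudes.
T = rng.normal(size=(n_meas, n_par)) + 1j * rng.normal(size=(n_meas, n_par))
S_ref = rng.normal(size=n_meas) + 1j * rng.normal(size=n_meas)
C = 1.0

# Intensity derivatives of Eq. (IV.63): dI_g/d(dr_l) = 2C Re[S_g*(ref) T'_{g,l}].
X = 2 * C * np.real(np.conj(S_ref)[:, None] * T)

dr_true = np.array([0.01, -0.02, 0.005])
dI = X @ dr_true                             # linearized intensity changes

# Direct solution of Eq. (IV.66), dr = (X^T X)^{-1} X^T dI, via the SVD.
dr, *_ = np.linalg.lstsq(X, dI, rcond=None)
print(np.allclose(dr, dr_true))              # -> True
```

With real noisy data the same machinery returns the best-fit displacements together with the residual χ².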
2. Nonlinear Least-Squares Method

In the general case the atomic displacements δr may not be small, and the linear model discussed in the previous subsection does not apply. For many practically important cases, however, such as surface diffraction and light-atom scattering, the perturbation theory works well. We start from a χ² function for the crystal structure factors:

χ²(δr) = Σ_g (1/σ_g²)|ΔU_g − Σ_i U_{i,g} s_{i,g}(δr)|²,  (IV.67)

in which σ_g is the variance associated with the determination of the crystal structure factor U_g. If the actual structure is sufficiently close to the reference structure, we may approximate the χ² function by a quadratic form, which we can write as

χ²(δr) = χ²(0) + d · δr + ½ δr · D · δr,  (IV.68)

where d is the gradient vector of the χ² function,

d_k = ∂χ²/∂δr_k |_{δr=0},  (IV.69)

and D is the Hessian matrix whose elements are

D_{kl} = ∂²χ²/∂δr_k ∂δr_l |_{δr=0}.  (IV.70)

If the preceding quadratic approximation (IV.68) is a good one, the minimizing parameter vector δr_min may be obtained directly, starting from a reference vector δr₀, via the relation ∇χ²(δr_min) = 0:

δr_min = δr₀ − D⁻¹ · d.  (IV.71)

On the other hand, if the quadratic form (IV.68) is not a good one, we may take a step down the gradient,

δr_next = δr₀ − constant × d,  (IV.72)

where the constant is small enough not to exhaust the downhill direction. In practical nonlinear least-squares routines, the inverse Hessian method (IV.71) is combined with the steepest descent method (IV.72). The latter method is used for δr far from the minimum δr_min, switching continuously
FIGURE 27. Schematic diagram showing the [001] projected structure of a SrTiO₃ single crystal. During the phase transition, the TiO₆ octahedra are rotated by an angle φ with respect to the normal structure.
to the former as the minimum is approached. This is the Levenberg-Marquardt method (Press et al., 1986). The crucial difference from the general full minimization method is that we are now given the ability to evaluate the gradient and Hessian of the χ² function, whereas in the general nonlinear function minimization case we had to resort to iterative methods in order to build up information about the Hessian matrix. We now consider two examples of direct structure determination via the least-squares method. The first example concerns the structure determination of the low-temperature phase of a SrTiO₃ crystal (see Fig. 27). The SrTiO₃ crystal undergoes a phase transition from the high-temperature cubic form, with space group Pm3m, to the low-temperature nonpolar tetragonal form, with space group I4/mcm, through a tetragonal rotation of the TiO₆ octahedra (Müller et al., 1968). The structure analysis of the low-temperature form aims to determine the rotation angle of the TiO₆ octahedra. Previously, this structure was studied in the context of electron diffraction by Tanaka and Tsuda (1990), using full dynamical calculations and the trial-and-error method. The rotation angle φ of the TiO₆ octahedra was found to vary continuously from zero at 103 K to 2.1° at 4.2 K. Shown in Fig. 28 are variations of the transmitted beam intensity with the TiO₆ octahedral rotation angle φ. The calculations are made for a 100 keV primary beam energy, a zone-axis incidence, and a crystal thickness of 500 Å. For simplicity, we have used the same Debye-Waller factor of 1.2 for all atoms, and a mean absorption of 0.045. In the figure the “full” curve is calculated using the full dynamical theory, the “1st” curve is calculated using Eq. (IV.61), and the “2nd” curve is calculated based on Eq. (IV.60) using up to second-order terms in expanding s_{i,h}. This figure clearly shows that whereas the linear expression (IV.61) holds well for almost the whole range of possible rotation angles, the second-order expansion of Eq. (IV.60) works almost perfectly.
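The Levenberg-Marquardt blending of (IV.71) and (IV.72) can be sketched generically. The exponential test model below is not from the text, and the damping schedule is one common choice among many:

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, p, lam=1e-3, steps=50):
    """Minimal Levenberg-Marquardt loop: damp the Gauss-Newton Hessian so
    the step interpolates between inverse-Hessian (IV.71) and steepest
    descent (IV.72)."""
    for _ in range(steps):
        r, J = residual(p), jacobian(p)
        g = J.T @ r                                    # gradient of 0.5*chi^2
        JTJ = J.T @ J
        step = np.linalg.solve(JTJ + lam * np.diag(np.diag(JTJ)), g)
        if np.sum(residual(p - step) ** 2) < np.sum(r ** 2):
            p, lam = p - step, lam * 0.3               # accept: more Newton-like
        else:
            lam *= 10.0                                # reject: more gradient-like
    return p

# Toy model: fit amplitude and decay rate of an exponential curve.
t = np.linspace(0.0, 1.0, 30)
p_true = np.array([2.0, 1.3])
data = p_true[0] * np.exp(-p_true[1] * t)
res = lambda p: p[0] * np.exp(-p[1] * t) - data
jac = lambda p: np.stack([np.exp(-p[1] * t), -p[0] * t * np.exp(-p[1] * t)], axis=1)
p = levenberg_marquardt(res, jac, np.array([1.8, 1.0]))
print(p)
```

For this zero-residual problem the loop behaves essentially as Gauss-Newton near the solution and recovers the parameters to high accuracy.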
FIGURE 28. Variations of the transmitted beam intensity with varying rotation angle of the TiO₆ octahedra as shown in Fig. 27. The three curves shown in the figure are calculated for 100 keV and a sample thickness of 100 Å, using full dynamical theory, first-order, and second-order tensor theory, respectively.
Results for the direct inversion of the rotation angle φ are shown in Fig. 29, which is calculated for the transmitted beam and 100 keV incident electrons. The “actual” rocking curve in this figure is calculated for a rotation angle of φ = 0.034. The “reference” curve is calculated for a reference structure having φ = 0.0. The quadratic model is used for inverting the rotation angle, and the restored rocking curve from the least-squares procedure is shown in Fig. 29 as the “inverted” curve. This restored curve is seen to be practically indistinguishable from the ideal “actual” curve. The σ² value returned by the NAG routine F04JGF is 5.107 × 10⁻⁷, which represents a perfect fit between the “actual” curve and the restored curve. As the second example, we consider the determination of a Si(001)(2 × 1) reconstructed surface in the THEED geometry. For simplicity, we consider a model system which consists of a 500 Å thick crystal slab and a reconstructed top surface layer having a thickness of 5.43 Å. Shown in Fig. 30 are two schematic diagrams, illustrating the top view (along the [001] zone axis) and a side view (along the [1̄10] zone axis) of a bulk-terminated ideal Si(001) surface structure. To reach the minimum
FIGURE 29. Calculated and restored transmitted beam rocking curves for 100 keV and a SrTiO₃ single crystal. The “actual” curve represents an ideal “experimental” rocking curve, and the “reference” curve is calculated for a reference structure having zero TiO₆ octahedral rotation.
energy structure, the surface atoms are known to relax in both the x and z directions. Since under the Laue transmission diffraction geometry the diffracted beam amplitudes are not sensitive to atomic displacements along the beam direction, only the atomic displacements along the x axis will be considered in the following discussion. Shown in Fig. 31 are calculated (400) diffracted beam amplitudes for a Si(001)2 × 1 reconstructed surface. The primary beam energy is 100 keV, and the sample temperature is 93 K. The surface unit cell used in this study is the same as that used by Yin and Cohen (1981): twice as big as the bulk unit cell along the x direction, and the same along the y direction. All indices used here follow this convention. The variation curves shown in Fig. 31 have been calculated for an incidence such that the (300) Bragg condition is exactly satisfied. The “exact” curve in the figure is calculated using the full dynamical formulation, while the “1st tensor” curve is calculated using Eq. (IV.61), and the “3rd tensor” curve is calculated based on Eq. (IV.60) using up to third-order terms in expanding s_{i,h}. The “3rd tensor” curve is seen to agree well with the exact dynamical results for atomic displacements of up to 0.3 Å.
FIGURE 30. (a) Top and (b) side views of a Si(001) surface. The dimerization of the surface leads to the formation of strong bonds between the atoms 1 and 1′.
Shown in Fig. 32 are two THEED rocking curves for the (200) surface superlattice diffracted beam from a Si(001)2 × 1 surface. The “actual” curve is calculated for the actual Yin and Cohen model, and the “reference” curve is calculated for a reference structure having two atoms (atoms 1 and 1′ in Fig. 30) that deviate from the actual structure by 0.1146 Å and −0.2076 Å, respectively. The inversion is based on the quadratic model, and the restored rocking curve is given in Fig. 32 as the “restored” curve. It is seen that the restored rocking curve is indeed indistinguishable from the “actual” curve.
V. PERTURBATION METHODS FOR NONPERIODIC STRUCTURES In this section we will develop perturbation methods which are particularly suited for treating non-periodic structures.
FIGURE 31. Calculated (400) beam intensity as a function of the atomic displacement of atom 1 as shown in Fig. 30. The primary beam energy is 100 keV.
A. Distorted Wave Approximation

We start by separating the total potential U(r) into two parts,

U(r) = U₀(r) + ΔU(r),  (V.1)

such that ΔU(r) introduces only a perturbation to the motion of the electron in the potential U₀(r). The motion of an electron in the potential U₀(r) is described by a wave function ψ₀(r) and a Green function G(r, r′) satisfying

[k₀² + ∇² + U₀(r)]ψ₀(r) = 0  (V.2)

and

[k₀² + ∇² + U₀(r)]G(r, r′) = δ(r − r′).  (V.3)

We now seek an approximate solution of the inhomogeneous equation

[k₀² + ∇² + U₀(r)]ψ(r) = −ΔU(r)ψ(r).  (V.4)

Formally, the solution to (V.4) is given by

ψ(r) = ψ₀(r) − ∫ G(r, r′) ΔU(r′)ψ(r′) dr′;  (V.5)
FIGURE 32. Calculated and restored (200) superlattice diffracted beam rocking curves for a Si(001)2 × 1 reconstructed surface. The “actual” structure used is the Yin and Cohen model, and the inversion is made based on a linear least-squares procedure.
the validity of this expression can be easily verified by substituting (V.5) back into (V.4). The scattering amplitude of the crystal is determined by the asymptotic form of the wave function in the region where U(r) = 0. For ψ₀(r) we have

ψ₀(r) = exp(ik₀ · r) + [exp(ikr)/r] f₀(k),  (V.6)

and for ψ(r),

ψ(r) = exp(ik₀ · r) + [exp(ikr)/r] f(k).  (V.7)

Substitution of (V.6) and (V.7) into (V.5) gives

f(k) [exp(ikr)/r] = f₀(k) [exp(ikr)/r] − ∫ G(r, r′) ΔU(r′)ψ(r′) dr′.  (V.8)

To obtain an explicit expression for f(k) in terms of f₀(k) and ΔU(r), we now perform a two-dimensional Fourier transform on both sides of Eq. (V.8). Using Eqs. (A.10) and (A.21) and letting z₀ = z in Eq. (A.21),
we obtain

S(k, k₀) = S₀(k, k₀) − (1/4π) ∫ ψ₋ₖ(r) ΔU(r)ψ(r) dr.  (V.9)

Here we have used the notation ψ₋ₖ(r) to denote the dynamical electron wave function for an incident plane wave of the form exp(−ik · r). In particular, we have ψ₀(r) = ψ_{k₀}(r). An approximate solution of the desired accuracy can be obtained for both the wave function and the scattering amplitude by iteration. For the wave function, we have from Eq. (V.5)

ψ(r) = ψ₀(r) − ∫ G(r, r′) ΔU(r′)ψ₀(r′) dr′ + ∫∫ G(r, r′) ΔU(r′)G(r′, r″) ΔU(r″)ψ₀(r″) dr′ dr″ − ⋯,  (V.10)

and for the scattering amplitude, we have from Eq. (V.9)

S(k, k₀) = S₀(k, k₀) − (1/4π) ∫ ψ₋ₖ(r) ΔU(r)ψ₀(r) dr + (1/4π) ∫∫ ψ₋ₖ(r) ΔU(r)G(r, r′) ΔU(r′)ψ₀(r′) dr dr′ − ⋯,  (V.11)
and this is the distorted wave approximation (DWA) for scattering by the perturbation potential ΔU(r) (Dudarev et al., 1993a, 1993b). It should be noted that here we have replaced the scattering amplitude by a new notation S(k, k₀), to reflect explicitly the fact that the scattering amplitude depends on both the incident and the scattered wave vectors. On the right-hand side of Eq. (V.11), the first term denotes the scattering amplitude of the high-energy electrons by the potential U₀(r). The second- and higher-order terms represent the corrections to the scattering amplitude S(k, k₀) due to single, double, and multiple scattering processes by the perturbing potential ΔU(r). In many applications, and with a good choice of the potential U₀(r), the first-order correction due to the perturbing potential will be sufficient, giving

ΔS(k, k₀) = S(k, k₀) − S₀(k, k₀) ≈ −(1/4π) ∫ ψ₋ₖ(r) ΔU(r)ψ_{k₀}(r) dr,  (V.12)

and this is the distorted wave Born approximation (DWBA) (Schiff, 1968).
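The structure of the DWBA matrix element (V.12) can be verified numerically in the simplest limit where the distorted waves reduce to plane waves, so that the integral becomes the Fourier transform of the perturbing potential at the momentum transfer. The one-dimensional Gaussian perturbation below is purely illustrative:

```python
import numpy as np

# First-order DWBA matrix element (V.12) in the plane-wave limit (1D toy).
x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]
a = 1.0
dU = np.exp(-x**2 / (2 * a**2))               # Gaussian perturbing potential

k0, k = 3.0, 2.5                               # incident / scattered wave numbers
psi_in = np.exp(1j * k0 * x)                   # psi_{k0}
psi_out = np.exp(-1j * k * x)                  # psi_{-k}
dS = -np.sum(psi_out * dU * psi_in) * dx / (4 * np.pi)

# Analytic Fourier transform of the Gaussian at momentum transfer q = k0 - k.
q = k0 - k
analytic = -np.sqrt(2 * np.pi) * a * np.exp(-(q * a) ** 2 / 2) / (4 * np.pi)
print(abs(dS - analytic) < 1e-8)               # -> True
```

In the full DWBA the plane waves are replaced by the dynamical wave functions ψ₋ₖ and ψ_{k₀}, but the quadrature has exactly this form.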
B. Tensor RHEED

For RHEED from a semi-infinite bulk crystal, the potential variation along the surface normal is not periodic. In this case, to calculate the variations of the diffracted beam amplitudes with ΔU(r), the DWA must be used, rather than the Bloch wave methods developed in Section IV (Peng and Dudarev, 1994). We now consider the problem of surface structure refinement. We assume that our starting reference structure is not far from the actual structure, so that the distorted wave approximation applies. Starting from this reference structure, which is characterized by a set of atomic coordinates {r_i}, we then have for the actual structure a set of atomic coordinates r_i(act) = r_i + δr_i. The potential distribution is given by

U(r) = −(2m/ħ²) Σ_i φ_i(r − r_i − δr_i) = U₀(r) + ΔU(r),  (V.13)

where

U₀(r) = −(2m/ħ²) Σ_i φ_i(r − r_i),  (V.14)

with φ(r) being given by Eq. (B.7), and

ΔU(r) = Σ_{i,k} F^(1)_{i,k}(r) δr_{i,k} + Σ_{i,k,l} F^(2)_{i,k,l}(r) δr_{i,k} δr_{i,l} + Σ_{i,k,l,m} F^(3)_{i,k,l,m}(r) δr_{i,k} δr_{i,l} δr_{i,m} + ⋯,  (V.15)

where the indices k, l, m denote the three orthogonal Cartesian axes. The first two expansion functions are the derivatives of the atomic potential terms with respect to the atomic coordinates, and analytical expressions for them can be obtained readily using (B.8). Substituting (V.15) into (V.12), we obtain a tensor expression for the variation of the scattering amplitude,

ΔS(k₁, k₀) = Σ_{i,k} T^(1)_{i,k} δr_{i,k} + Σ_{i,k,l} T^(2)_{i,k,l} δr_{i,k} δr_{i,l} + Σ_{i,k,l,m} T^(3)_{i,k,l,m} δr_{i,k} δr_{i,l} δr_{i,m} + ⋯,  (V.17)

in which

T^(n) = −(1/4π) ∫ ψ₋ₖ₁(r) F^(n)(r) ψ_{k₀}(r) dr.  (V.18)
Expression (V.17) is a general expression, which can be used in either the RHEED or the THEED diffraction geometry, and for either periodic or aperiodic structures. When the reference structure is very close to the actual structure, then to a good approximation the first-order tensor theory applies. We therefore have

ΔS(k₁, k₀) ≈ Σ_{i,k} T^(1)_{i,k} δr_{i,k}.  (V.19)

For high-energy electron diffraction by crystals, either in the form of a crystal slab or a semi-infinite bulk crystal, it is often convenient to retain the two-dimensional periodicity of the crystal parallel to the surface and write

U(r) = Σ_G U_G(z) exp(iG · x),  (V.20)

where the G are a set of two-dimensional reciprocal lattice vectors parallel to the surface. The Fourier coefficient U_G(z) in the preceding expression is given by (B.15). Applying Bloch's theorem to the two-dimensionally periodic crystal, we have

ψ_k(r) = Σ_G ψ_G(k, z) exp[i(k_t + G) · x].  (V.21)

Substituting (V.20) and (V.21) into (V.12), we obtain

ΔS(k₁, k₀) = −(1/4π) Σ_G Σ_{G′} Σ_{G″} ∫ dx exp[i(k₀t − k₁t + G + G′ + G″) · x] ∫ dz ψ_G(−k₁, z) ΔU_{G′}(z)ψ_{G″}(k₀, z)
= −π Σ_G Σ_{G′} Σ_{G″} δ(k₀t − k₁t + G + G′ + G″) ∫ dz ψ_G(−k₁, z) ΔU_{G′}(z)ψ_{G″}(k₀, z),  (V.22)

in which the subscript t denotes the surface-parallel component of the relevant vectors. In RHEED, the wave function in the vacuum region above the surface is normally expanded in plane waves,

ψ(r) = exp(ik₀ · r) + Σ_G R_G exp{i(k₀t + G) · x − i[k₀² − (k₀t + G)²]^{1/2} z},  (V.23)
FIGURE 33. Schematic diagrams showing a top and a side view of a Ni(100)p(2 × 2) surface, and the top surface layer relaxation.
and Eq. (V.22), combined with (V.23), then reduces to a corresponding tensor expression (V.24) for the variations of the reflected beam amplitudes R_G.
As the first numerical example, we consider the case of surface relaxation occurring on a Ni(001) “p(2 × 2)” surface (see Fig. 33 and Rous and Pendry, 1989). The diffraction geometry used is RHEED, and the primary beam energy is 12.5 keV. Shown in Fig. 34 are two curves of the variation of the specular reflected beam amplitude with the atomic displacement of the displaced Ni atoms with respect to the bulk-terminated atom position. Positive values in the figure correspond to surface relaxation. The calculations have been made for an angle of incidence of 58 mrad, using exact systematic or one-rod RHEED theory (Peng and Whelan, 1991b; Ichimiya, 1987) and the tensor approximation. The effects of TDS on the elastic wave were treated using the Einstein model (Hall and Hirsch, 1965) and a Debye-Waller factor of 0.16 Ų, which corresponds to a sample temperature of 93 K (Radi, 1970). It is evident from this figure that the tensor approximation works well over a rather wide range of atom displacements. Having obtained the tensor expressions for the variations of the diffracted beam amplitudes, the linear least-squares method as discussed in Section IV can then be used to invert the surface structure directly. To illustrate the procedure, in Fig. 35 we show two RHEED rocking curves from a Ni(001) “p(2 × 2)” surface. The “exact” curve is calculated for a relaxed surface having δz = 0.15 Å, and the “reference” curve is calculated for a bulk-terminated reference surface. With the set of ΔI_i between the
FIGURE 34. Variations of the specular reflected beam amplitude with the top atom displacement of a Ni(001) p(2 × 2) surface. The origin in the figure corresponds to the bulk-terminated atom position. The calculations have been made for a primary beam energy of 12.5 keV and an angle of incidence of 58 mrad.
two curves and the calculated tensor expressions for the reference structure, a linear least-squares problem is set up and solved by using NAG routine F04JGF. The variances associated with the determined parameters are estimated using NAG routine F04YAF. The restored rocking curve from this linear least-squares procedure is also given in Fig. 35 as the “inverted” curve. The restored rocking curve is seen to be indistinguishable from the actual curve. The calculations are based on Eq. (V.24), and up to third-order terms in (V.15) have been used for evaluating ΔU (Peng and Dudarev, 1994).
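The variance-estimation step (performed in the text with NAG routine F04YAF) has a standard textbook equivalent: scale the diagonal of (XᵀX)⁻¹ by the residual variance. A sketch on invented data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical linear least-squares problem with known parameters.
X = rng.normal(size=(200, 2))
p_true = np.array([0.15, -0.05])
y = X @ p_true + 0.01 * rng.normal(size=200)

p, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ p
s2 = resid @ resid / (200 - 2)                # residual variance
cov = s2 * np.linalg.inv(X.T @ X)             # parameter covariance matrix
sigma_p = np.sqrt(np.diag(cov))               # one-sigma error bars
print(p, sigma_p)
```

As in the text, a determined parameter whose estimated variance is comparable to (or exceeds) its value should not be taken seriously.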
C. Diffuse Scattering

The DWBA (V.12) may be conveniently applied to deal with diffuse scattering from disorder, defects, and thermal vibrations. A comprehensive theory of diffuse scattering should ideally include dynamical Bragg diffraction effects by the average periodic structure as well as multiple diffuse scattering (for a review, see Gjønnes, 1993). Whereas numerical evaluations can be made using the DWA (V.11), in practice the kinematic diffuse
FIGURE 35. One-rod RHEED rocking curves from a Ni(001) p(2 × 2) surface. The “exact” curve is calculated for a relaxed surface with δz = 0.15 Å, the “reference” curve is calculated for a bulk-terminated reference surface, and the “inverted” curve is restored from a least-squares procedure.
scattering theory, which is based on first-order diffuse scattering, is often used. For diffuse scattering, the most commonly used quantity is the differential cross-section da/dn. Starting from the DWBA (V.12), we obtain
(V.25) here we have chosen the reference structure to be the averaged structure, and the difference potential is given by
W(r) = V(r) - (V(r)). To a good approximation, the interaction between the incident electrons and an assembly of N atoms may be written as
V r , r l , ...,r N ) =
N
C f(ri)q+(r - ri),
i= 1
(V.26)
where φ_i(r) is the potential associated with the ith atom, r_i = R_i + u_i, R_i is the equilibrium position of the ith atom, and u_i represents the thermal displacement of the atom from its equilibrium position. The distribution of defects is described by f(r), which for the simplest, vacancy type of defect can be written as

f(r_i) = 1 if there is an atom at r_i, and 0 otherwise. (V.27)
The averaged potential is given by

⟨V(r)⟩ = Σ_{i=1}^{N} ⟨f(r_i)⟩_c ⟨φ_i(r − r_i)⟩_T = Σ_{i=1}^{N} Θ_i ⟨φ_i(r − r_i)⟩_T, (V.28)
where Θ_i = ⟨f(r_i)⟩_c, ⟨···⟩_c denotes averaging over the statistically distributed disorder configurations, and ⟨···⟩_T denotes averaging over the thermodynamical equilibrium. The fluctuating potential is defined as

δV(r, r_1, ..., r_N) = Σ_{i=1}^{N} [f(r_i) φ_i(r − r_i) − ⟨f(r_i) φ_i(r − r_i)⟩] = Σ_{i=1}^{N} [f(r_i) φ_i(r − r_i) − Θ_i ⟨φ_i(r − r_i)⟩_T]. (V.29)
By defining the average probability of simultaneously finding two atoms at the sites r_i and r_j,

ν_ij = ⟨f(r_i) f(r_j)⟩_c, (V.30)

we obtain

⟨δV(r) δV*(r′)⟩ = Σ_ij [ν_ij ⟨φ_i(r − r_i) φ_j*(r′ − r_j)⟩_T − Θ_i Θ_j ⟨φ_i(r − r_i)⟩_T ⟨φ_j*(r′ − r_j)⟩_T].

In reciprocal space, we write

φ_i(r) = (1/V) Σ_g φ̃_i(g) exp(ig·r) → (1/(2π)³) ∫ dg φ̃_i(g) exp(ig·r), (V.31)

where V is the volume of the crystal. In terms of φ̃_i(g), we have

⟨δV(r) δV*(r′)⟩ = Σ_ij (1/(2π)⁶) ∫∫ dg dg′ φ̃_i(g) φ̃_j*(g′) exp[ig·(r − R_i) − ig′·(r′ − R_j)] × [ν_ij ⟨exp(−ig·u_i + ig′·u_j)⟩ − Θ_i Θ_j ⟨exp(−ig·u_i)⟩⟨exp(+ig′·u_j)⟩]. (V.32)
1. Defects Diffuse Scattering
To concentrate on diffuse scattering resulting from defects, we first ignore any displacement of the atoms around the vacancies and use a frozen lattice model, assuming ui = uj = 0. The expression (V.32) thus becomes
⟨δV(r) δV*(r′)⟩ = Σ_ij Θ_ij φ_i(r − R_i) φ_j*(r′ − R_j), (V.33)

where

Θ_ij = ν_ij − Θ_i Θ_j. (V.34)
Substituting (V.33) into (V.25), we obtain the angular distribution of the diffusely scattered electrons:

dσ/dΩ = Σ_ij Θ_ij S_i(k, k₀) S_j*(k, k₀), (V.35)

where S_i(k, k₀) is the dynamical scattering amplitude of the ith atom,

S_i(k, k₀) = (m/2πℏ²) ∫ dr Ψ_{−k}(r) φ_i(r − R_i) Ψ_{k₀}(r). (V.36)

By applying the kinematic approximation to both the initial and final Bloch states, i.e., letting Ψ_{k₀}(r) = exp(ik₀·r) and Ψ_{−k}(r) = exp(−ik·r), we obtain

S_i(k, k₀) = S_i(s) = f_i(s/4π) exp(−is·R_i), (V.37)

in which s = k − k₀, and f_i(s/4π) is the usual atomic scattering amplitude. Equation (V.35) now becomes

dσ/dΩ = Σ_ij Θ_ij f_i f_j* exp[−is·(R_i − R_j)], (V.38)

and for a monatomic lattice, f_i = f_j = f,

dσ/dΩ = N |f(s/4π)|² Σ_u Θ(u) exp(−is·u), (V.39)

where Θ(u) is the average two-site correlation function,

Θ(u) = (1/N) Σ_ij Θ_ij, the sum running over all pairs with R_i − R_j = u. (V.40)
The diffuse diffraction pattern is therefore simply the Fourier transform of the average two-site correlation function. In the general case, however, the kinematic approximation for both the initial and final states is not adequate, and Bloch states must be used.
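The Fourier relation (V.39) is easy to evaluate numerically. The sketch below assumes a hypothetical one-dimensional short-range-order correlation Θ(u) = c(1 − c)α^|u| and sums the series directly, with |f(s/4π)|² set to unity; for α → 0 the sum collapses to a featureless background, while a finite correlation length concentrates the diffuse intensity around s = 0.

```python
import numpy as np

# Kinematic diffuse intensity as the Fourier transform of the two-site
# correlation function Theta(u), Eq. (V.39), for a 1-D lattice.
# Hypothetical short-range-order model: Theta(u) = c(1-c) * alpha^|u|.
N = 256
c, alpha = 0.8, 0.5            # occupancy and correlation parameter (assumed)
u = np.arange(-N // 2, N // 2)
theta = c * (1 - c) * alpha ** np.abs(u)

# I(s) = N |f|^2 sum_u Theta(u) exp(-i s u), with |f|^2 = 1 here
s = 2 * np.pi * np.fft.fftfreq(N)
I = np.real(np.sum(theta * np.exp(-1j * np.outer(s, u)), axis=1))

# correlated occupation -> diffuse intensity peaked around s = 0;
# alpha -> 0 would give a flat c(1-c) background instead
print(I.max(), I.min())
```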
a. Diffuse Scattering from a Three-Dimensional Structure. For a solid which has an average three-dimensional periodic structure and a monatomic lattice, we can apply the Bloch theorem to the electron wave function associated with the average lattice to obtain

Ψ_k(r) = exp(ik·r) 𝒰_k(r), (V.41)

where 𝒰_k(r) is a three-dimensional periodic function, i.e., 𝒰_k(r − R_i) = 𝒰_k(r). Following (V.36) and using (V.41), we obtain

S_j(k, k₀) = exp(−is·R_j) f^(D)(k, k₀), (V.42)

where f^(D)(k, k₀) is the dynamical scattering factor, given by

f^(D)(k, k₀) = (m/2πℏ²) ∫ dr exp(−is·r) 𝒰_{−k}(r) φ(r) 𝒰_{k₀}(r). (V.43)

It should be noted that f^(D) in (V.43) is independent of the atomic site index i. When the kinematic approximation applies, this expression reduces to the usual electron atomic scattering factor f(s/4π).
Substitution of (V.43) into (V.35) gives

dσ/dΩ = N |f^(D)(k, k₀)|² Σ_u Θ(u) exp(−is·u), (V.44)

and this expression is almost identical to the kinematic expression (V.39), except that f^(D) is now the dynamical scattering factor and is a function of both the incident wave vector k₀ and the scattered wave vector k. An approximate form for the dynamical scattering factor f^(D) may be obtained by utilizing the fact that the atomic potential V(r) is usually much more localized than the Bloch waves. The dynamical wave functions Ψ_{k₀} and Ψ_{−k} may then be replaced by their values at the coordinate origin to give

f^(D)(k, k₀) ≈ Ψ_{k₀}(0) Ψ_{−k}(0) f(s/4π). (V.45)

As a simple example, we now consider a case of random vacancies (Cowley, 1981). Assume there are a total of N atom sites, of which a number n, distributed at random, are vacant. For i = j we have ν_ii = (N − n)/N, Θ_i = (N − n)/N, and Θ_ii = ν_ii − Θ_i² = n(N − n)/N². For i ≠ j we have Θ_ij = 0. The angular distribution (V.35) becomes

dσ/dΩ = [n(N − n)/N²] Σ_i |S_i(k, k₀)|². (V.46)

Substitution of Eq. (V.45) into Eq. (V.46) gives

dσ/dΩ = [n(N − n)/N] |Ψ_{k₀}(0)|² |Ψ_{−k}(0)|² |f(s/4π)|². (V.47)
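The occupation statistics quoted for this example can be checked by direct simulation. The sketch below assumes vacancies distributed independently with concentration c = n/N and estimates Θ_ii and Θ_ij from Eq. (V.34); the lattice size, concentration, and trial count are arbitrary.

```python
import numpy as np

# Monte Carlo check of the random-vacancy statistics used above:
# for independently distributed vacancies of concentration c = n/N,
# Theta_ii = nu_ii - Theta_i^2 = c(1 - c)  (i.e. n(N - n)/N^2),
# while Theta_ij = 0 for i != j.
rng = np.random.default_rng(1)
N, trials = 16, 100000
c = 0.25                                          # vacancy concentration

f = (rng.random((trials, N)) >= c).astype(float)  # f = 1: atom present

Theta_i = f.mean(axis=0)                          # <f(r_i)> ~ 1 - c
Theta_00 = (f[:, 0] * f[:, 0]).mean() - Theta_i[0] ** 2
Theta_01 = (f[:, 0] * f[:, 1]).mean() - Theta_i[0] * Theta_i[1]

print(Theta_00, c * (1 - c), Theta_01)
```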
(V.47) This expression clearly shows that Kikuchi features (or variations with k) in a diffuse diffraction pattern result mainly from the dependence of the dynamical wave function w-k(o) on the scattered electron wave vector k. The dependence of the initial dynamical electron wave function wko(o) on the incident electron wave vector ko affects only the overall diffuse pattern intensity, rather than the angular distribution of the diffusely scattered electrons. Under the kinematic approximation, the dynamical wave function w-k(o) reduces to exp( - ik * r), the intensity of which does not depend on k. Kikuchi features in a diffuse pattern disappear under the kinematic approximation, and the general expression (V.46)reduces to that of Cowley (1981). b. Dufuse Scattering from a Two-Dimensional Structure. We now consider an application of (V.35)to a two-dimensional structure, as in the case of RHEED from a molecular beam epituxy (MBE) growing surface (Harris et ul., 1981;Lent and Cohen, 1984). Since the averaged crystal has
two-dimensional periodicity parallel to the surface, we then have

Ψ_k(r) = exp(ik·x) 𝒰_k(x, z), (V.48)

where 𝒰_k(x, z) is a two-dimensional periodic function parallel to the surface. For a monatomic lattice, from Eq. (V.36) we have

S_i(k, k₀) = exp(−is·x_i) f(z_i), (V.49)

where f(z_i) is the dynamical scattering amplitude of an atom at depth z_i. The differential cross-section (V.35) now becomes

dσ/dΩ = Σ_ij Θ_ij exp[−is·(x_i − x_j)] f(z_i) f*(z_j). (V.51)

For a random deposition model, Θ_ii = θ_i − θ_i², where θ_i is the layer coverage of the ith growing layer, and Θ_ij = 0 for all i ≠ j. From (V.51) we thus have

dσ/dΩ = N₀ Σ_i (θ_i − θ_i²) |f(z_i)|², (V.52)

where N₀ is the total number of atom sites within a layer. In particular, for a two-level system we have

dσ/dΩ = N₀ θ(1 − θ) |f|², (V.53)

i.e., the diffusely scattered beam intensity oscillates with the layer coverage θ. The elastically scattered beam intensity also oscillates during MBE growth, via its dependence on the average potential and therefore on the layer coverage (Peng and Whelan, 1991b,c,d). For a general layer-by-layer growth model, z_i = 0 for all pairs of atoms whose Θ_ij does not vanish, and the angular distribution of the diffuse scattering is given by

dσ/dΩ = N₀ |f(0)|² Σ_u Θ(u) exp(−is·u). (V.54)

This result is similar to that obtained by Cohen and co-workers on the basis of a kinematic theory of electron diffraction. It should be noted, however, that our definition of the correlation function Θ(u) differs from that of Lent and Cohen (1984), and that our result is derived from the full dynamical theory of RHEED from the average structure.
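The coverage dependence of Eq. (V.53) can be visualized with a few lines of code. The sketch below assumes an idealized two-level growth front whose top-layer coverage θ rises linearly with deposition and resets at each completed monolayer; the diffuse intensity then oscillates as θ(1 − θ), peaking at half-filled layers and vanishing at layer completion.

```python
import numpy as np

# Sketch of the diffuse-intensity oscillation of Eq. (V.53): for a
# two-level growth front the diffusely scattered intensity varies as
# theta(1 - theta).  The linear deposition model and the units
# (N0 |f|^2 = 1) are illustrative.
t = np.linspace(0.0, 3.0, 301)        # time, in monolayers deposited
theta = t % 1.0                       # coverage of the growing layer
I_diffuse = theta * (1.0 - theta)     # diffuse intensity, Eq. (V.53)

# maxima at half-filled layers, zeros at completed layers
print(I_diffuse.max())
```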
2. TDS Scattering
We now consider TDS from a perfect crystal free of defects, i.e., f(r_i) = 1 for all atom sites. Expression (V.32) then becomes

⟨δV(r) δV*(r′)⟩ = Σ_ij (1/(2π)⁶) ∫∫ dg dg′ φ̃_i(g) φ̃_j*(g′) exp[ig·(r − R_i) − ig′·(r′ − R_j)] × [⟨exp(−ig·u_i + ig′·u_j)⟩ − ⟨exp(−ig·u_i)⟩⟨exp(+ig′·u_j)⟩]. (V.55)

By neglecting the effects of anharmonicity of the crystal lattice vibrations, the preceding expression can be simplified to give

⟨δV(r) δV*(r′)⟩ = Σ_ij (1/(2π)⁶) ∫∫ dg dg′ φ̃_i(g) φ̃_j*(g′) exp[ig·(r − R_i) − ig′·(r′ − R_j)] × exp[−M_i(g) − M_j(g′)] {exp[Y_ij(g, g′)] − 1}, (V.56)

in which M_i is the usual Debye-Waller factor of the ith atom,

M_i(g) = ½⟨(g·u_i)²⟩, (V.57)

and Y_ij is the correlation function of the atomic displacements,

Y_ij(g, g′) = ⟨(g·u_i)(g′·u_j)⟩. (V.58)
If a crystal lattice is of finite dimension, there exists only a limited number N of distinct lattice wave vectors, N being the total number of unit cells in the crystal. Since for each wave vector q there are 3p modes of thermal vibration, where p is the number of atoms in a unit cell, there are in total 3pN independent lattice waves. For small lattice displacements, the principle of superposition applies, and the total atomic displacement u_i may then be written as (James, 1962)

u_i = Σ_q Σ_{α=1}^{3p} e_α(q) a_{qα} cos[q·R_i − ω_α(q)t + δ_{qα}], (V.59)

where ω_α(q) is the circular frequency of the lattice wave with wave vector q polarized in the direction of the unit vector e_α, a_{qα} is the amplitude of the elastic wave, and δ_{qα} is a random phase factor, reflecting the fact that there exists no definite phase relationship between different lattice waves. In
general, all frequencies ω_α(q) are functions of q. To a good approximation, however, we can divide the 3p modes for each wave vector into two groups, i.e., acoustic (α = 1, 2, 3) and optical (α = 4, ..., 3p) branches. The three acoustic waves describe vibrations whose frequencies vanish linearly in the long-wavelength limit, i.e., ω_α(q) = C_α q, with C_α the velocity of propagation of the wave in the solid. The frequencies of the other 3(p − 1) branches do not vanish in the long-wavelength limit, and the dispersion relations for these branches may be approximated as ω_α(q) = ω_α = constant. For the optical branches, by neglecting the dependence of the polarization vector e_α on the wave vector q, it can be shown that (Dudarev et al., 1993a)

Y_ij(g, g′) = δ_{nn′} ⟨(g·u_{n,k})(g′·u_{n′,k′})⟩; (V.60)
here we have used the notation i = (n, k), j = (n′, k′), where n is the index of the unit cell and k distinguishes different atoms within a cell. Equation (V.60) clearly shows that the optical branches describe primarily the mutual motion of atoms within a unit cell, but do not contribute to the correlation function of displacements of atoms belonging to different cells. The correlation function of atomic displacements in different cells is determined mainly by the acoustic branches of the lattice waves. For an isotropic solid and in the long-wavelength limit, the correlation function is given by

Y_ij(g, g′) = (k_B T / 4πρC²) (g·g′)/|R_i − R_j|, (V.61)

where ρ = MN/V is the mass density of the substance, and M = Σ_k M_k is the total mass of a unit cell. In deriving (V.61) it has been assumed that the sample temperature is relatively high and that both the transverse and the longitudinal waves have the same velocity C. Equations (V.60) and (V.61) clearly show that the behavior of the correlation function at large distances is determined by the acoustic lattice vibrations. The correlation radius of the optical branches is limited to a relatively small scale, of the order of the lattice constant. Since the optical correlation function is spatially well localized, a simple argument based on the uncertainty principle suggests that the angular distribution of the electrons diffusely scattered by optical phonons depends only weakly on the angle of incidence and contributes only to the smooth diffuse background between the Bragg spots. On the other hand, since the acoustic modes of vibration are long-ranged in real space, they must be well defined in momentum space, as opposed to the optical modes. The acoustic phonon excitations are therefore
FIGURE 36. RHEED from a Pt(111) surface. 100 keV high-energy electrons are incident at the surface along the [110] zone axis.
responsible for the sharp diffuse maxima surrounding the Bragg spots in the THEED geometry (Rez et al., 1977), and give rise to the appearance of transmission-like spots in RHEED patterns, as shown in Fig. 36 (Peng and Cowley, 1988b).
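The division of labour between acoustic and optical branches can be illustrated with a one-dimensional toy estimate (a kinematic, first-order one-phonon argument, not an evaluation of (V.56) itself). Using the classical equipartition result ⟨|u_q|²⟩ ∝ k_BT/Mω(q)² and an acoustic dispersion ω(q) ∝ |sin(qa/2)|, the diffuse intensity piles up sharply where the scattering vector approaches a Bragg position, because ω vanishes there; a dispersionless optical branch (ω = constant) would instead give a smooth background.

```python
import numpy as np

# Toy check that acoustic modes concentrate diffuse intensity near the
# Bragg positions: first-order one-phonon TDS from a 1-D chain scales
# as I(s) ~ s^2 <|u_q|^2> with <|u_q|^2> ~ kT / (M w(q)^2) and the
# acoustic dispersion w(q) = w_max |sin(q a / 2)|.  Units arbitrary.
a = 1.0                                   # lattice constant
G = 2 * np.pi / a                         # first Bragg (reciprocal) point
s = np.linspace(0.55 * G, 1.45 * G, 901)  # scattering vectors around G
q = s - G                                 # reduced phonon wave vector
w = np.abs(np.sin(q * a / 2.0)) + 1e-6    # acoustic dispersion (scaled)
I_tds = s**2 / w**2                       # ~ one-phonon TDS intensity

# intensity is sharply peaked where s approaches the Bragg position G
print(s[np.argmax(I_tds)] / G)
```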
FIGURE 37. Schematic diagram showing the formation of a Z-contrast image in a STEM instrument.
D. Z-Contrast Imaging
Although diffusely scattered electrons result exclusively from the nonperiodic deviations of the crystal structure from its mean periodic lattice, they may be used to form high-resolution images of atomic resolution (Cowley, 1988). A fine example is provided by the Z-contrast imaging technique developed by Pennycook and co-workers (Pennycook and Boatner, 1989; Jesson et al., 1991; Pennycook and Jesson, 1993). Z-contrast imaging is normally performed in a scanning transmission electron microscope (STEM), as shown in Fig. 37. A finely focused electron probe, typically of the order of 5 Å or less, is formed at the specimen, and images of the sample are formed by scanning the probe across the sample. Although most of the beam is diffracted through quite small angles, some is scattered through larger angles, and it is this component that is collected by an annular detector (AD) and used to form the Z-contrast image, i.e.,
I_AD = ∫_AD dk_t (dσ/dΩ), (V.62)

with dσ/dΩ the differential cross-section given by Eq. (V.25).
In THEED, high-energy electrons are not sensitive to variations of the crystal potential along the beam direction, and it is then a good approximation to use a projected potential, V(x, z) ≈ V(x). A two-dimensional Fourier transform of this potential gives

φ_i(x) = (1/(2π)²) ∫ dq φ̃_i(q) exp(iq·x). (V.63)

Considering only the TDS contribution, Eq. (V.56) then reduces to

⟨δV(x) δV*(x′)⟩ = Σ_ij (1/(2π)⁴) ∫∫ dq dq′ φ̃_i(q) φ̃_j*(q′) exp[iq·(x − x_i) − iq′·(x′ − x_j)] × exp[−M_i(q) − M_j(q′)] {exp[Y_ij(q, q′)] − 1}.
In general, both Ψ_{k₀}(r) and Ψ_{−k}(r) in Eq. (V.25) are dynamical and should be expanded in terms of Bloch waves. For large-angle scattering, however, it is a reasonable approximation to assume that the scattered electron wave function has the form of a plane wave (Pennycook and Jesson, 1993),

Ψ_{−k}(r) = exp(−ik·r). (V.64)
For the initial dynamical wave function Ψ_{k₀}(r), we use the general expression

Ψ_{k₀}(r) = (1/(2π)²) ∫ dg w_g(z) exp(ig·x), (V.65)

in which w_g(z) are depth-dependent amplitudes of the transverse plane-wave components. Substituting Eqs. (V.63)-(V.65) into Eq. (V.25), letting

F_ij(g, h) = ∫_AD dk_t (m/2πℏ²)² φ̃_i(k_t − g) φ̃_j*(k_t − h) exp[−ik_t·(x_i − x_j)] × exp[−M_i(k_t − g) − M_j(k_t − h)] {exp[Y_ij(k_t − g, k_t − h)] − 1}, (V.66)
and integrating the differential cross-section over the annular detector, we obtain the detected annular dark-field signal

I_AD = ∫_AD dk_t (dσ/dΩ) = Σ_ij (1/(2π)⁴) ∫∫ dg dh F_ij(g, h) exp(ig·x_i − ih·x_j) ∫∫ dz dz′ w_g(z) w_h*(z′) exp(−ik_z z) exp(ik_z z′). (V.67)
Following Pennycook and Jesson (1993), we assume that h, g ≪ k_t. Equation (V.66) then becomes

F_ij(g, h) ≈ ∫_AD dk_t (m/2πℏ²)² φ̃_i(k_t) φ̃_j*(k_t) exp[−ik_t·(x_i − x_j)] exp[−M_i(k_t) − M_j(k_t)] {exp[Y_ij(k_t, k_t)] − 1}. (V.68)

The preceding expression may be further simplified by noting that, for x_i ≠ x_j, the integrand depends on the phase difference between the two scattering centres at x_i and x_j via the exponential function exp[−ik_t·(x_i − x_j)]. Since this function oscillates rapidly compared with all other functions in (V.68), contributions from different atom sites almost completely cancel upon integration, and the net contribution to I_AD from different atom sites is therefore much smaller than the self-correlation term. To a good approximation we thus have

F_ij(g, h) ≈ δ_ij σ_i, (V.69)
where

σ_i = ∫_AD dk_t |f_i(k_t)|² {1 − exp[−2M_i(k_t)]}, (V.70)

and this is equivalent to the independent vibration model of Einstein. Using the approximate expression (V.69), we have from Eq. (V.67)
I_AD = Σ_i σ_i ∫ dz |Ψ_{k₀}(x_i, z)|², (V.71)

and this is almost identical to the expression obtained by Pennycook and Jesson (1993) based on a δ-function approximation to the high-angle potential. Since the detected signal I_AD depends only on the electron density at the atomic sites, only the bound states (we will discuss these bound states in more detail in Section VI) will contribute appreciably to the image. The use of the annular dark-field detector, or Howie detector (Howie, 1979), thus serves as a bound Bloch wave selector, and the resulting image shows mainly the selected bound Bloch wave. For the case of Si, the selected bound Bloch wave is the 1s state. Shown in Fig. 38 is a [110] Z-contrast STEM image of a nominal (Si₄Ge₈) superlattice, together with the simulated images and the structure model derived from the image simulation. In this image a 2.2 Å probe is used. Each bright spot or column in the image corresponds to an individual dumbbell (the atoms composing the dumbbells are 1.36 Å apart and are not resolved) and is slightly elongated along the [001] direction. The column positions are independent of objective-lens defocus and specimen thickness (Jesson et al., 1991).
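A numerical sketch of Eq. (V.70) shows how this cross-section produces atomic-number contrast. The fragment below assumes a hypothetical Wentzel-type screened amplitude f(k) = Z/(k² + k_s²) and an isotropic Einstein-model Debye-Waller factor M(k) = ⟨u²⟩k²/2; the detector range and the displacement parameter are illustrative only.

```python
import numpy as np

# Numerical sketch of Eq. (V.70): sigma_i = Int_AD dk_t |f_i(k_t)|^2
# {1 - exp[-2 M_i(k_t)]}.  A hypothetical Wentzel-type screened
# amplitude f(k) = Z/(k^2 + ks^2) and an Einstein-model Debye-Waller
# factor M(k) = <u^2> k^2 / 2 are assumed; the detector range roughly
# corresponds to 50-150 mrad at 100 keV.
def sigma_AD(Z, u2, k_in=8.0, k_out=25.0, n=2000, ks=2.0):
    k = np.linspace(k_in, k_out, n)          # detector range (1/angstrom)
    f2 = (Z / (k**2 + ks**2)) ** 2           # |f(k)|^2 (illustrative)
    tds = 1.0 - np.exp(-u2 * k**2)           # 1 - exp(-2M), M = u2 k^2 / 2
    return float(np.sum(2 * np.pi * k * f2 * tds) * (k[1] - k[0]))

s_light = sigma_AD(Z=14, u2=0.006)   # Si-like column (assumed <u^2>)
s_heavy = sigma_AD(Z=32, u2=0.006)   # Ge-like column

# heavier columns scatter more strongly into the annular detector,
# which is the origin of the Z-contrast
print(s_heavy / s_light)
```

With this amplitude model the contrast scales exactly as Z², since the thermal factor is independent of Z; a more realistic screening model would soften this to the familiar Z^1.7-ish dependence.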
VI. BLOCH WAVE CHANNELLING AND RESONANCE SCATTERING

As was first shown by Kikuchi and Nishikawa (1928) for calcite, the diffraction pattern of high-energy electrons from a single crystal consists of intersecting straight lines and bands in addition to the Bragg diffraction spots. Kikuchi (1928) correctly explained the lines as due to Bragg reflection of the diffusely scattered electrons in the crystal. Shortly after, Shinohara (1935) observed that, as well as the straight lines, there exist circular arcs and parabolic curves. The parabolas were explained as envelopes of families of Kikuchi lines corresponding to sets of co-zonal planes, and the rings as envelopes of families of parabolas. Alternatively, the ring patterns
were explained by Emslie (1934), based on the earlier idea of Kikuchi (1928), as due to one-dimensional diffraction by atom strings lying in a direction nearly parallel to the incident electron beam. The mechanism by which the electrons are trapped onto the atom strings was originally proposed by Emslie (1934) to be inelastic scattering. Detailed studies show, however, that only ultrahigh-energy particles (several mega-electron-volts and higher) can be inelastically captured by an atom string. The original mechanism is therefore not applicable to high-energy electrons of typically 100 keV to 400 keV. In this section we present a dynamical elastic scattering mechanism for the localization of electrons around atom strings (giving rise to ring patterns) and atom planes (giving rise to parabola patterns), and we show how these localized waves may be deliberately excited and imaged.

A. Two-Dimensional Bloch Waves and Axial Resonance Diffraction
For high-energy electron diffraction, the scattering is predominantly in the forward direction. It is then convenient to factor out the rapid variation of the electron wave function along the z axis by letting

ψ(r) = ψ(x, z) = exp(ik₀z z) φ₀(x, z), (VI.1)

where we have used the notation r = (x, z) and k₀ = (q, k₀z). By substituting (VI.1) into (IV.1), neglecting the second derivative ∂²φ₀/∂z², and using a projected potential approximation to the crystal potential, i.e., U(r) ≈ U(x), we obtain

2ik₀z ∂φ₀(x, z)/∂z = [−∇ₓ² + U(x) − q²] φ₀(x, z). (VI.2)
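Equation (VI.2) has the form of a two-dimensional time-dependent Schrödinger equation with z playing the role of time, and can be integrated by the standard split-step Fourier method. The sketch below does this in one transverse dimension for an invented Gaussian-well "atom string" potential (all parameter values are illustrative); the norm of φ₀ is conserved exactly, and intensity becomes concentrated on the attractive string, which is the channelling effect discussed in this section.

```python
import numpy as np

# Split-step Fourier sketch of the parabolic wave equation (VI.2):
# 2 i k0z dphi/dz = [-Laplacian_x + U(x) - q^2] phi, here in one
# transverse dimension.  Lengths in angstroms; k0z, the well depth,
# and the propagation distance are invented for illustration.
nx, L = 256, 40.0
x = np.linspace(-L / 2, L / 2, nx, endpoint=False)
kx = 2 * np.pi * np.fft.fftfreq(nx, d=L / nx)

k0z = 170.0                          # longitudinal wave vector (1/angstrom)
U = -8.0 * np.exp(-x**2 / 0.6**2)    # projected "atom string" potential
q = 0.0                              # zone-axis incidence

phi = np.exp(1j * q * x).astype(complex)   # boundary condition (VI.3)
dz = 0.5
for _ in range(200):                 # propagate to z = 100 angstrom
    # half kinetic step (reciprocal space), full potential step (real space)
    phi = np.fft.ifft(np.exp(-1j * (kx**2 - q**2) * dz / (4 * k0z)) * np.fft.fft(phi))
    phi = np.exp(-1j * U * dz / (2 * k0z)) * phi
    phi = np.fft.ifft(np.exp(-1j * (kx**2 - q**2) * dz / (4 * k0z)) * np.fft.fft(phi))

# intensity piles up on the attractive string at x = 0 (channelling)
print(np.abs(phi[np.argmin(np.abs(x))])**2)
```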
1. Two-Dimensional Bloch Waves
To find the solution of Eq. (VI.2) which satisfies the boundary condition at the entrance surface z = 0,

φ₀(x, 0) = exp(iq·x), (VI.3)

we now introduce a complete system of two-dimensional Bloch waves,

b^(j)(q, x) = Σ_h C_h^(j)(q) exp[i(q + G_h)·x], (VI.4)
for the transverse motion of the fast electrons within the crystal. In Eq. (VI.4) the superscript j denotes the transverse energy band index for a given wave vector q in the two-dimensional Brillouin zone, and the summation over h is carried out over the two-dimensional reciprocal lattice
vectors G_h. If we choose the vectors C^(j) to be normalized to unity, i.e., Σ_h |C_h^(j)|² = 1, the matrix C_h^(j) is then a unitary matrix (in the absence of absorption) which satisfies the orthogonality relations

Σ_h C_h^(j)*(q) C_h^(j′)(q) = δ_jj′,  Σ_j C_h^(j)(q) C_h′^(j)*(q) = δ_hh′. (VI.5)

Under the projected potential approximation, we can separate the solutions of (VI.2) into x- and z-dependent parts, i.e., b^(j)(q, x) and Z^(j)(q, z), and expand the electron wave function φ₀(x, z) as

φ₀(x, z) = Σ_j a^(j)(q) b^(j)(q, x) Z^(j)(q, z). (VI.6)
Substituting (VI.6) into (VI.2), we obtain a set of differential equations for the Bloch waves b^(j)(q, x),

[−∇ₓ² + U(x)] b^(j)(q, x) = (2m/ℏ²) E^(j)(q) b^(j)(q, x), (VI.7)

and a set of ordinary differential equations for Z^(j)(q, z),

2ik₀z dZ^(j)(q, z)/dz = [(2m/ℏ²) E^(j)(q) − q²] Z^(j)(q, z), (VI.8)

where the function Z^(j)(q, z) satisfies the boundary condition Z^(j)(q, 0) = 1. Since the projected potential U(x) is periodic in the (x, y) plane, we can expand U(x) as a Fourier series, i.e.,

U(x) = Σ_h U_h exp(iG_h·x), (VI.9)
and transform the set of differential equations (VI.7) into a set of eigenvalue equations for C_h^(j)(q) and E^(j)(q):

|q + G_h|² C_h^(j)(q) + Σ_h′ U_{h−h′} C_h′^(j)(q) = (2m/ℏ²) E^(j)(q) C_h^(j)(q), (VI.10)

in which the energy eigenvalue E^(j)(q) is usually referred to as the transverse energy in THEED. For a given energy eigenvalue E^(j)(q) we can easily solve Eq. (VI.8) to obtain

Z^(j)(q, z) = exp{−(i/2k₀z)[(2m/ℏ²) E^(j)(q) − q²] z}. (VI.11)
In the absence of absorption, the Bloch wave excitation amplitudes a^(j)(q) in (VI.6) are determined by applying the boundary condition (VI.3) and using the orthogonality properties (VI.5), giving a^(j)(q) = C₀^(j)*(q). From Eq. (VI.6), the electron wave function φ₀(x, z) within the crystal is then

φ₀(x, z) = Σ_j C₀^(j)*(q) b^(j)(q, x) Z^(j)(q, z). (VI.12)
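Once the reciprocal-lattice sum is truncated, the eigenvalue problem (VI.10) is a finite matrix diagonalization. The toy model below (units with ℏ²/2m = 1, all parameters invented) uses a square lattice of attractive Gaussian wells as the projected potential and recovers a handful of negative transverse energies, i.e., the bound, string-localized Bloch states discussed next.

```python
import numpy as np

# Toy version of the eigenvalue problem (VI.10): two-dimensional Bloch
# waves for a square lattice of attractive Gaussian "atom string"
# potentials.  Units with hbar^2/2m = 1; depth U0, width w and lattice
# constant a are arbitrary illustrative values.
a, U0, w, nmax = 4.0, 8.0, 0.6, 4
A = a * a                                  # unit-cell area
g = 2 * np.pi / a

# 2-D reciprocal lattice vectors G_h with |indices| <= nmax
hs = [(i, j) for i in range(-nmax, nmax + 1) for j in range(-nmax, nmax + 1)]
G = np.array([(g * i, g * j) for (i, j) in hs])

def U_fourier(dG):
    # Fourier coefficient of -U0 exp(-x^2/w^2), per unit cell
    return -U0 * np.pi * w**2 / A * np.exp(-np.dot(dG, dG) * w**2 / 4.0)

q = np.array([0.0, 0.0])                   # zone-axis incidence
n = len(G)
H = np.zeros((n, n))
for i in range(n):
    H[i, i] = np.dot(q + G[i], q + G[i])   # kinetic term |q + G_h|^2
    for j in range(n):
        H[i, j] += U_fourier(G[i] - G[j])  # potential coupling U_{h-h'}

E = np.linalg.eigvalsh(H)                  # transverse energies, ascending
bound = E[E < 0]
print(bound)       # negative eigenvalues: bound, string-localized states
```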
Two-dimensional Bloch waves can be classified as bound Bloch waves or free Bloch waves according to whether or not the transverse energy E^(j)(q) is negative. For all bound states the transverse energies are negative, and the corresponding Bloch waves are localized around atom strings. In particular, for the tightly bound Bloch states with large negative values of E^(j)(q), the Bloch states show little dispersion for different values of q. In real space the corresponding Bloch waves are then so highly localized that the overlap between electron wave functions localized around neighbouring atom strings is very small. Shown in Fig. 39 are 10 branches of the band structure (a plot of E^(j) as a function of q) for 1T-VSe₂ along the [0001] zone axis. At room temperature this crystal has a trigonal layered structure, and along the [0001] zone axis the projected potential exhibits three well-separated atom strings (one of V atoms and two of Se atoms; Bird, 1989). The calculation is made for a primary beam energy of 100 keV, using 61 ZOLZ reflections. This figure clearly shows that the band structure is characterized by the presence of three branches, with E^(0)(q) and E^(1)(q) near −15 eV and E^(2)(q) near −5 eV, each having only

FIGURE 39. A cross-section of the two-dimensional Bloch wave band structure for 100 keV electrons and a 1T-VSe₂ single crystal along the [0001] zone axis. [From Dudarev and Peng, 1993.]
little dispersion and being well separated from all other branches E^(j) (j = 3, 4, 5, ...). Extensive numerical calculations have further shown that the situation illustrated in Fig. 39 is very typical, i.e., in general for each principal zone axis there exist from one to several branches of the transverse energy which are hardly dispersive, or for which E^(j)(q) is approximately constant, corresponding to the tightly bound states of electrons around the atom string potential. Shown in Fig. 40 are the projected potential V(x) and the three most tightly bound Bloch waves for [0001] 1T-VSe₂. Since the nuclear charge of the Se atom is larger than that of the V atom, the projected potential wells around the Se strings in Fig. 40a are deeper than that around the V string at the centre of the figure. The three most tightly bound Bloch waves are localized around the Se atom strings (Bloch waves 0 and 1) and the V atom string (Bloch wave 2), respectively. The other branches, with relatively higher energies near or above zero, are characterised by strong dispersion and describe almost free electron motion in the projected potential field V(x).
2. One-Dimensional Diffraction of Two-Dimensional Bound Bloch Waves

We now consider the scattering of a tightly bound Bloch wave by an atom string, as shown in Fig. 41. Electrons within the tightly bound Bloch state will be scattered in all directions by the successive atoms along the string. For constructive interference between the waves scattered from all atoms along the string to occur, the optical path difference between the waves scattered from successive atoms must be a multiple of the electron wavelength within the crystal, i.e.,

n′b − nb cos θ = lλ, (VI.13)

where n is the refractive index for the scattered waves, n′ is that for the tightly bound Bloch wave propagating along the atom string, b is the crystal repeat distance along the zone axis, λ is the electron wavelength within the crystal, and l is an integer. By neglecting dynamical diffraction effects on the scattered waves, n can be approximated as unity, and n′ can be obtained from the ratio of the electron wave vectors of the tightly bound Bloch wave and of the scattered wave (Peng and Gjønnes, 1989):

n′ = k^(j)/k₀ ≈ 1 − mE^(j)/(ℏ²k₀²). (VI.14)
FIGURE 40. Contour maps of (a) the projected potential, and (b), (c), and (d) the electron density distribution for Bloch waves 0, 1, and 2, respectively. The maps are calculated for 100 keV electrons incident at a 1T-VSe₂ single crystal along the [0001] zone axis. [From Dudarev and Peng, 1993.]
Substituting (VI.14) into (VI.13) and noticing that λ = 2π/k₀, we obtain

cos θ^(j) = 1 − mE^(j)/(ℏ²k₀²) − 2πl/(k₀b), (VI.15)

i.e., for each excited bound Bloch wave with a distinctive transverse energy E^(j) there is a characteristic scattering angle at which constructive interference occurs. If in Eq. (VI.15) we neglect E^(j), Eq. (VI.15) reduces to the set of angles at which the Ewald sphere intersects the HOLZs. The key result of Eq. (VI.15) is that a separation between distinct bound Bloch states is now realized in reciprocal space, as shown schematically in Fig. 41 and in Fig. 2 as the HOLZ fine lines.
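The size of the resulting fine-line splitting is easy to estimate from the small-angle form of (VI.15), θ_j² ≈ 2λl/b + E^(j)/E₀ with E₀ = (ℏk₀)²/2m. The numbers below are illustrative (non-relativistic values for a 100 keV beam and b = 3.5 Å); bound states of a few electron volts shift the resonance angle by a fraction of a milliradian relative to the geometric HOLZ angle.

```python
import numpy as np

# Characteristic angles from the small-angle form of Eq. (VI.15):
# theta_j = sqrt(2*lambda*l/b + E_j/E0).  All values are illustrative.
E0 = 100e3                # primary beam energy, eV (non-relativistic)
lam = 0.037               # electron wavelength at 100 keV, angstrom
b = 3.5                   # repeat distance along the zone axis, angstrom
l = 1                     # interference (Laue zone) order

theta_free = np.sqrt(2 * lam * l / b)      # geometric HOLZ angle, E_j = 0
for E_j in (0.0, -5.0, -15.0):             # transverse energies, eV
    theta = np.sqrt(2 * lam * l / b + E_j / E0)
    print(f"E = {E_j:6.1f} eV  ->  theta = {1e3 * theta:7.3f} mrad")
```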
FIGURE 40 (continued)
3. Axial Resonance Scattering and Selective Excitation of Two-Dimensional Bound Bloch Waves

An inverse process to the zone-axis diffraction discussed earlier is the off-axis or inverse HOLZ diffraction shown in Fig. 42. Instead of sending the electron beam down a crystal zone axis, we now send the electron beam down one of the directions θ^(j) for which constructive interference between the electron waves scattered by successive atoms along a string occurs, as in the inverse HOLZ diffraction geometry (Steeds et al., 1982). A simple application of the reciprocity principle (Pogany and Turner, 1968) then suggests that selective excitation of the bound Bloch waves may be achieved in real space (Dudarev and Peng, 1993).
FIGURE 40 (continued)
As shown in Fig. 42, under off-axis incidence, reflections in the minus high-order Laue zones (MHOLZs) are involved. A projected potential approximation is therefore inadequate. To discuss dynamical diffraction processes in this off-axis geometry, we need to include at least the reflections of the ZOLZ and of one of the MHOLZs. Assuming that the offset between successive Laue zones is (ΔG, g), we can write a general reciprocal lattice vector in the nth HOLZ as

g_h^(n) = (G_h + n ΔG, ng). (VI.16)

The three-dimensional crystal potential U(r) can then be expanded as

U(r) = Σ_{n,h} U_h^(n) exp(ig_h^(n)·r) = Σ_n exp(ingz) U^(n)(x), (VI.17)
FIGURE 40 (continued)
with

U^(n)(x) = Σ_h U_h^(n) exp[i(G_h + n ΔG)·x]. (VI.18)

We now search for a solution of the wave equation (VI.2) of the form

φ₀(x, z) = Σ_n ψ_n(x, z) exp(ingz), (VI.19)

rather than of the simple form (VI.6). By substituting (VI.19) into (VI.2), we obtain the following set of equations for the wave functions ψ_n (n = 0, 1, 2, ...):

2ik₀z ∂ψ_n/∂z = [−∇ₓ² + U^(0)(x) − q² + (k₀z + ng)² − k₀z²] ψ_n + Σ_{l≠n} U^(n−l)(x) ψ_l. (VI.20)
FIGURE 41. Schematic diagram showing a one-dimensional zone-axis diffraction geometry.
The boundary conditions for ψ_n(x, z) at the entrance face z = 0 are

ψ_n(x, 0) = δ_n0 exp(iq·x). (VI.21)

By analogy with zone-axis diffraction, we can expand the wave functions ψ_n (n = 0, 1, 2, ...) in terms of the two-dimensional Bloch waves b^(j)(q + n ΔG, x),

ψ_n(x, z) = Σ_j β_n^(j)(q + n ΔG, z) b^(j)(q + n ΔG, x), (VI.22)

where the β_n^(j)(q + n ΔG, z) are subject to the boundary conditions

β_n^(j)(q + n ΔG, 0) = a^(j)(q) for n = 0, and 0 otherwise. (VI.23)
FIGURE 42. (a) Schematic diagram showing an off-axis diffraction geometry. The corresponding experimental electron diffraction patterns shown in (b) and (c) are obtained from the GaAs [001] zone axis using 100 keV electrons. Shown in (b) is a portion of the ZOLZ pattern, and in (c) is the first MHOLZ pattern.
FIGURE 42 (continued)
To find the coefficients β_n^(j)(q + n ΔG, z), we first let t = z/v, where v = ℏk₀z/m. Substituting (VI.22) into (VI.20), noticing that for high-energy electrons k₀z ≫ ng, and using the fact that the Bloch waves b^(j)(q + n ΔG, x) satisfy Eq. (VI.7), we obtain

iℏ Σ_j (d/dt) β_n^(j) b^(j)(q + n ΔG, x) = Σ_j {E^(j)(q + n ΔG) + (ℏ²/2m)[(k₀z + ng)² − k₀²]} β_n^(j) b^(j)(q + n ΔG, x) − Σ_{l≠n} Σ_j V^(n−l)(x) β_l^(j) b^(j)(q + l ΔG, x).

By multiplying the preceding equation by b^(J)*(q + n ΔG, x), integrating the relations thus obtained, and using the orthogonality property (VI.5) for the two-dimensional Bloch waves, we arrive at the following set of ordinary differential equations for the β_n^(J):

iℏ (d/dt) β_n^(J) = {E^(J)(q + n ΔG) + (ℏ²/2m)[(k₀z + ng)² − k₀²]} β_n^(J) − Σ_{l≠n} Σ_j V^(n−l)(J|j) β_l^(j), (VI.24)

where

V^(n−l)(J|j) = − Σ_{h,h′} C_h^(J)*(q + n ΔG) V_{h−h′}^(n−l) C_h′^(j)(q + l ΔG). (VI.25)
The effects of absorption can be taken into account in (VI.24) by using a first-order perturbation theory (Hirsch et al., 1965), i.e., neglecting the change in the eigenvectors C,?(k) and considering only the changes in the energy eigenvalues E(j)(k): E"'(k)
+
E("(k) - ipCJ'(k)
with (VI.26)
in which V'(x) is the absorptive part of the effective atomic potential. Substituting (VI.6) into (VI.24) we obtain
-
c j
V("-')(JIj)p,")- ip"'(q I#n
+ n AG)piJ).
(VI.27)
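The selective excitation contained in Eqs. (VI.27) can be demonstrated with a minimal two-channel model: one incident (plane-wave) channel coupled to a single tightly bound Bloch channel with detuning δ, coupling V, and absorption μ (ℏ = 1; all numbers illustrative). The bound channel acquires appreciable amplitude only when the detuning vanishes, i.e., at θ = θ_n^(J).

```python
import numpy as np

# Minimal two-channel model of the resonance equations (VI.27): an
# incident channel coupled to one tightly bound Bloch channel with
# detuning delta, coupling V and absorption mu (hbar = 1; the numbers
# are illustrative, in eV-like units).
def excitation(delta, V=2.0, mu=0.5, T=4.0):
    # i d/dt (b0, bJ) = H (b0, bJ),  H = [[0, -V], [-V, delta - i*mu]]
    H = np.array([[0.0, -V], [-V, delta - 1j * mu]])
    # exact propagation exp(-i H T) via eigendecomposition of complex H
    w, S = np.linalg.eig(H)
    b0 = np.array([1.0, 0.0], dtype=complex)
    b = S @ (np.exp(-1j * w * T) * np.linalg.solve(S, b0))
    return abs(b[1])                  # amplitude in the bound channel

on_res = excitation(delta=0.0)
off_res = excitation(delta=50.0)      # |delta| >> V, mu

# the bound state is appreciably excited only near the resonance
# angle, where the detuning parameter vanishes
print(on_res, off_res)
```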
Equations of this form are well known in the theory of resonance scattering (Mott and Massey, 1965; McRae, 1979). Their solutions depend on the matrix elements V^(n−l)(J|j) of the channel coupling between two Bloch waves, the absorption coefficients μ^(J)(q + n ΔG), and the resonance detuning parameter δ_n^(J)(q),

δ_n^(J)(q) = E^(J)(q + n ΔG) + (ℏ²/2m)[(k₀z + ng)² − k₀²]. (VI.28)

To find a solution to the set of differential equations (VI.27), we first consider the dependence of the detuning parameter δ_n^(J)(q) on the angle of incidence θ of the electron beam with respect to the z axis. For θ = 0 we have k₀ = k₀z, and at a primary beam energy ε₀ = (ℏk₀)²/2m ≈ 10⁵ eV all δ_n^(J)(q) are of the order of δ_n^(J)(0) ≈ nℏ²k₀g/m ≈ 10³ eV and are positive. With an increase in the angle of incidence θ, the function δ_n^(J)(q) decreases and approaches zero at angles θ_n^(J) satisfying

E^(J)(q + n ΔG) + (ℏ²/2m)[(k₀ cos θ_n^(J) + ng)² − k₀²] = 0. (VI.29)
Noticing that g = 2π/b, we see that the preceding expression is identical to Eq. (VI.15). If in (VI.29) we neglect the E^(J) term, the condition reduces to the geometrical condition for the Ewald sphere to be tangential to the nth MHOLZ (see Fig. 42). It should be noted that expression (VI.29) is an implicit equation for θ_n^(J), since the function E^(J)(q + n ΔG) depends on the angle of incidence θ as well as on the beam azimuth through q. For the tightly bound Bloch states, E^(J)(q) = E^(J) = constant, the solutions of Eq. (VI.29) are independent of the incident beam azimuth φ,

θ_n^(J) = [2ng/k₀ + 2mE^(J)/(ℏ²k₀²)]^{1/2}, (VI.30)

and the solutions describe directional cones with apex angle 2θ_n^(J) around the crystal zone axis. For branches E^(J)(q) lying close to or above the barrier of the atom string potential, the solutions of (VI.29) generally take a form which depends strongly on the azimuthal angle φ, and they are in general rather complicated. For two tightly bound Bloch states J and J′, the angular separation between the angles θ_n^(J) and θ_n^(J′) may be approximated as

Δθ = θ_n^(J) − θ_n^(J′) ≈ m(E^(J) − E^(J′))/(ℏ²k₀² θ_n^(J)), (VI.31)

i.e., the angular separation Δθ is directly proportional to the transverse energy separation ΔE of the two corresponding tightly bound Bloch states.
In what follows we will consider the solutions of Eq. (VI.27) under the condition 8 = OiJ), or more quantitatively,
(VI.32)

where n, m = 1, 2, ..., J′ ≠ J, and the Jth Bloch state is a tightly bound state. Hereafter we shall refer to the angle θ_n^(J) as the axial resonance angle and to the condition (VI.32) as the axial bulk resonance condition for the Jth tightly bound Bloch state. It can readily be shown that, in the indicated range of incident angles, the following inequality is valid:

\[ \left|\delta_n^{(J)}(\mathbf{q})\right| \ll \left|\delta_m^{(J')}(\mathbf{q})\right|; \tag{VI.33} \]
consequently, within the bulk crystal effectively only the Jth tightly bound Bloch state will be appreciably excited. For simplicity, here we will consider a one-beam case in which the incident beam is so tilted that, apart from the incident beam, no other ZOLZ reflections are appreciably excited, and in which the Ewald sphere is approximately tangential to a MHOLZ having index n (see Fig. 42). To a good approximation we may retain in (VI.22) only a plane wave component for φ₀ and Bloch waves which are associated with the nth MHOLZ reciprocal lattice plane only. Equation (VI.22) thus reduces to
\[ \phi(\mathbf{x}, z) = \alpha(z)\exp(i\mathbf{q}\cdot\mathbf{x}) + \sum_J \beta_n^{(J)}(z)\, b^{(J)}(\mathbf{q} + n\,\Delta\mathbf{G}, \mathbf{x}). \tag{VI.34} \]
If we further note that the absorption coefficients μ^(J)(q + nΔG) and the matrix elements V^(n−m)(J′|j) in (VI.27) do not generally exceed several electron volts, the following inequalities may then be obtained if the bulk resonance condition (VI.32) for the Jth bound Bloch state is satisfied:

\[ \left|\mu^{(J')}(\mathbf{q} + m\,\Delta\mathbf{G})\right| \ll \left|\delta_m^{(J')}(\mathbf{q} + m\,\Delta\mathbf{G})\right|; \qquad \left|V^{(n-m)}(J'\,|\,j)\right| \ll \left|\delta_m^{(J')}(\mathbf{q} + m\,\Delta\mathbf{G})\right|, \tag{VI.35} \]
for J′ ≠ J. Also noticing that β_m^(J′)(z) is a smooth function of z, we obtain an upper estimate of β_m^(J′)(z) for J′ ≠ J, which suggests that, if the bulk resonance condition (VI.32) is satisfied, we need to retain in (VI.34) only the Jth tightly bound Bloch state. Shown in Fig. 43 are α(z) and the Bloch wave excitation amplitudes β_n^(J)(z) (J = 0, 1, 2) for [0001] 1T-VSe₂, calculated for a primary beam energy of
NEW DEVELOPMENTS OF ELECTRON DIFFRACTION THEORY
FIGURE 43. Bloch wave excitation amplitudes as a function of (a) incidence angle θ, and (b) crystal thickness for 100 keV electrons and a 1T-VSe₂ crystal. [From Dudarev and Peng, 1993.]
100 keV as a function of incidence angle (Fig. 43a) and crystal thickness (Fig. 43b). Two resonance peaks associated with Bloch waves 1 and 2 are clearly seen in Fig. 43a, at angles of 109.5 and 109.9 mrad, respectively. The excitation amplitude of Bloch wave 0 is negligible. This is because Bloch wave 0 results from a bonding combination of 1s states associated with two separate Se strings, and the contributions from the two strings to the matrix element V^(n) of the channel coupling cancel each other out. For the anti-bonding combination (Bloch wave 1) the situation is the
opposite. The two Se atom strings contribute almost equally to the matrix element of the channel coupling, resulting in an enhancement of the channel coupling between the incident electron beam and Bloch wave 1 near 109.5 mrad. This figure clearly shows that under the bulk resonance condition effectively only a single Bloch wave is excited within the crystal. The general set of equations (VI.27) then reduces to a set of two equations relating the two coefficients α(t) ≡ α(z) and β(t) ≡ β_n^(J)(z),

\[ i\hbar\,\frac{d}{dt}\alpha(t) = -V^{*}\beta(t) - i\gamma\,\alpha(t), \qquad i\hbar\,\frac{d}{dt}\beta(t) = \delta_n^{(J)}\beta(t) - V\alpha(t) - i\mu\,\beta(t), \tag{VI.37} \]
in which

\[ V = \frac{1}{A}\int_A b^{(J)*}(\mathbf{q} + n\,\Delta\mathbf{G}, \mathbf{x})\, V^{(n)}(\mathbf{x})\, \exp(i\mathbf{q}\cdot\mathbf{x})\, d\mathbf{x}, \tag{VI.38} \]

and, according to (VI.26) and (VI.28),

\[ \gamma = \frac{1}{A}\int_A V'(\mathbf{x})\, d\mathbf{x}. \tag{VI.39} \]
The set of equations (VI.37) is similar to the well-known two-beam Howie-Whelan equations (Howie and Whelan, 1961), and the solution of this set of equations takes the form
(VI.40)
From (VI.19) we obtain the total electron wave function within the crystal,

\[ \psi(\mathbf{r}) = \exp(i\mathbf{k}_0\cdot\mathbf{r})\,\alpha(t) + \exp[i(k_{0z} + ng)z]\, b^{(J)}(\mathbf{q} + n\,\Delta\mathbf{G}, \mathbf{x})\,\beta(t), \tag{VI.41} \]
i.e., for fast electrons which are incident on the crystal at an angle θ = θ_n^(J) with respect to the crystal zone axis, the incident plane wave ψ₀ = exp(ik₀·r) will be resonantly coupled with a tightly bound Bloch wave b^(J)(q + nΔG, x) belonging to the nth MHOLZ. Effectively only one bound Bloch wave is excited, and a selection may be made among the bound Bloch waves by appropriately tilting the crystal.
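The reduced two-level system (VI.37) can be integrated numerically. The sketch below uses a fourth-order Runge-Kutta step with ħ = 1 and purely illustrative values of the coupling V, the detuning δ, and the absorption coefficients γ and μ (none of these numbers are taken from the text). On resonance (δ = 0) the flux oscillates between the plane-wave channel α and the bound Bloch-wave channel β, a Pendellösung-like exchange damped by absorption.

```python
import numpy as np

# Integrate the two-level system of Eq. (VI.37) with hbar = 1:
#   i da/dt = -V* b - i*gamma*a
#   i db/dt = delta*b - V*a - i*mu*b
# All parameter values are illustrative, not values from the text.
V, delta, gamma, mu = 1.0, 0.0, 0.05, 0.05   # coupling, detuning, absorption

def rhs(y):
    a, b = y
    da = -1j * (-np.conj(V) * b - 1j * gamma * a)   # da/dt
    db = -1j * (delta * b - V * a - 1j * mu * b)    # db/dt
    return np.array([da, db])

def evolve(t_max, n_steps=2000):
    y = np.array([1.0 + 0j, 0.0 + 0j])   # start in the plane-wave channel
    h = t_max / n_steps
    for _ in range(n_steps):
        k1 = rhs(y); k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2); k4 = rhs(y + h * k3)
        y = y + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return y

# After a quarter Rabi period the flux has moved almost entirely into
# the bound channel, reduced only by the weak absorption:
a, b = evolve(t_max=np.pi / (2 * abs(V)))
print(abs(a) ** 2, abs(b) ** 2)
```

With equal absorption in both channels the damping simply multiplies the undamped oscillation, so |β|² ≈ e^(−2γt) at the quarter period while |α|² ≈ 0.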
B. One-Dimensional Bloch Waves and Planar Resonance Diffraction

We now consider another useful diffraction geometry, i.e., the systematic diffraction geometry, in which only reflections lying on a rod of the reciprocal lattice are appreciably excited. In this diffraction geometry, the crystal potential may be approximated as a one-dimensional potential, i.e.,

\[ U(\mathbf{r}) = U(x) = \sum_n U_n \exp(ingx), \tag{VI.42} \]
where g is the shortest basic reciprocal lattice vector along the reciprocal lattice rod. The two-dimensional Bloch wave equation (VI.2) then reduces to

\[ 2ik_{0z}\,\frac{\partial}{\partial z}\phi(x, z) - \frac{\partial^2}{\partial x^2}\phi(x, z) + U(x)\,\phi(x, z) = q^2\,\phi(x, z). \tag{VI.43} \]
1. One-Dimensional Bloch Waves
We first introduce a complete system of one-dimensional Bloch waves,

\[ b^{(j)}(k_{0x}, x) = \sum_n C_n^{(j)}(k_{0x})\, \exp[i(k_{0x} + ng)x]. \tag{VI.44} \]
Since the potential involved in Eq. (VI.43) is a one-dimensional potential, we can separate the solution of Eq. (VI.43) into x- and z-dependent parts,
and expand the wave function in terms of one-dimensional Bloch waves,

\[ \psi(\mathbf{r}) = \sum_j \varepsilon^{(j)}(k_{0x})\, b^{(j)}(k_{0x}, x)\, \exp[i(k_{0y} + G_y)y]\, Z^{(j)}(k_{0x}, z). \tag{VI.45} \]
Substituting Eqs. (VI.44) and (VI.45) into Eq. (VI.43), we obtain a set of differential equations for the one-dimensional Bloch waves b^(j)(k_{0x}, x):

\[ \left[-\frac{\partial^2}{\partial x^2} + U(x)\right] b^{(j)}(k_{0x}, x) = \frac{2m}{\hbar^2}\,E^{(j)}(k_{0x})\, b^{(j)}(k_{0x}, x), \tag{VI.46} \]
and a set of equations for the z-dependent component of the wave function:

\[ 2ik_{0z}\,\frac{dZ^{(j)}(k_{0x}, z)}{dz} = \left[q^2 - \frac{2m}{\hbar^2}\,E^{(j)}(k_{0x})\right] Z^{(j)}(k_{0x}, z). \tag{VI.47} \]
Substituting Eq. (VI.44) into Eq. (VI.46), we obtain the fundamental equation for the one-dimensional Bloch waves,
\[ (k_{0x} + ng)^2\, C_n^{(j)}(k_{0x}) + \sum_{n'} U_{n-n'}\, C_{n'}^{(j)}(k_{0x}) = \frac{2m}{\hbar^2}\,E^{(j)}(k_{0x})\, C_n^{(j)}(k_{0x}). \tag{VI.48} \]
The orthogonality relations for the one-dimensional Bloch waves are as follows:

\[ \sum_j C_n^{*(j)} C_{n'}^{(j)} = \delta_{nn'}, \qquad \sum_n C_n^{*(j)} C_n^{(j')} = \delta_{jj'}. \tag{VI.49} \]
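Equation (VI.48) is the one-dimensional analogue of the central equation of band theory: truncating the sum to a finite set of plane waves turns it into a matrix eigenvalue problem. The sketch below diagonalizes this matrix in units where 2m/ħ² = 1, using a single-harmonic model potential whose Fourier coefficients U_n are purely illustrative (they are not the MgO values used in the text); for a sufficiently deep potential the lowest band comes out negative, i.e., bound.

```python
import numpy as np

# Diagonalize the one-dimensional "central equation" (VI.48):
#   (k0x + n g)^2 C_n + sum_n' U_{n-n'} C_n' = (2m/hbar^2) E C_n
# in units with 2m/hbar^2 = 1.  g and the Fourier coefficients U_n are
# illustrative numbers, not the values used in the text.
g = 2.0                            # shortest reciprocal lattice vector
U = {0: 0.0, 1: -2.0, -1: -2.0}    # U_n for U(x) = -4 cos(gx)
N = 15                             # plane waves n = -N .. N

def bands(k0x, n_bands=4):
    n = np.arange(-N, N + 1)
    H = np.diag((k0x + n * g) ** 2).astype(float)   # kinetic term
    for i, ni in enumerate(n):
        for j, nj in enumerate(n):
            H[i, j] += U.get(ni - nj, 0.0)          # potential coupling
    return np.sort(np.linalg.eigvalsh(H))[:n_bands]

# Scan k0x across the first Brillouin zone [-g/2, g/2]:
ks = np.linspace(-g / 2, g / 2, 41)
E = np.array([bands(k) for k in ks])
print(E[:, 0].min(), E[:, 0].max())   # lowest (bound) band stays negative
```

The lowest eigenvalue is negative across the whole zone, and the corresponding eigenvector is concentrated on few plane-wave coefficients, i.e., the Bloch wave is localized around the atom planes.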
Like the two-dimensional Bloch waves, the one-dimensional Bloch waves can be classified as bound or free according to whether or not the energy eigenvalue E^(j)(k_{0x}) is negative. For bound Bloch waves, the energy eigenvalues are negative and the Bloch waves are localized around the atom planes. Shown in Fig. 44a are six branches of the one-dimensional band structure for 400 keV electrons and a MgO single crystal, and shown in Fig. 44b are the corresponding excitation amplitudes of these one-dimensional Bloch waves. In this figure, the horizontal axis is labelled in terms of the (200) Bragg condition, i.e., for an angle of incidence of 1.0, the (200) Bragg condition is exactly satisfied. From Fig. 44b it is seen that for this systematic diffraction condition the total electron wave function is dominated by the lowest three Bloch waves, and in all angular ranges the excitation amplitudes for these three Bloch waves are appreciable.

2. Planar Resonance Scattering and Selective Excitation of One-Dimensional Bound Bloch Waves
To selectively excite one-dimensional bound Bloch waves, we now consider a diffraction geometry as shown in Fig. 45, where the incident beam and a
FIGURE 44. (a) One-dimensional band structure for 400 keV electrons and a MgO single crystal; and (b) the corresponding one-dimensional Bloch wave excitation amplitudes. The horizontal axis origin corresponds to [001] zone axis incidence, and the boundaries correspond to the first Brillouin zone boundaries where (±200) reflections are excited.

FIGURE 45. Schematic diagram showing the formation of a resonance parabola as an envelope of a set of Kikuchi lines for a MgO single crystal.
side reciprocal lattice rod are involved. In general the projected potential distribution can be conveniently expanded as

\[ U(x, y) = \sum_{G_y} U_{G_y}(x)\, \exp(iG_y y), \qquad \text{with} \qquad U_{G_y}(x) = \sum_n U_{(ng,\,G_y)}\, \exp(ingx), \tag{VI.50} \]
and for simplicity here we have assumed that the offset between different reciprocal lattice rods is zero, such that for a general two-dimensional reciprocal lattice vector we have G = (ng, G_y). By expanding the electron wave function in terms of the one-dimensional Bloch waves,
similar procedures leading to (VI.24) give
(VI.52) where
In analogy to (VI.29), the planar resonance condition is given by
(VI.54)

For tightly bound Bloch waves, E^(j)(k_{0x}) = E^(j) = constant. For these waves, the preceding equation (VI.54) then gives a set of parabolas on the q = (k_{0x}, k_{0y}) plane, each corresponding to a tightly bound one-dimensional Bloch wave. For E^(j) = 0, Eq. (VI.54) reduces to a geometric condition for the Laue circle (the intersection of the Ewald sphere with the ZOLZ) being tangential to the reciprocal rod G_y, and the set of parabolas defined by Eq. (VI.54) then reduces to the envelope of a family of Kikuchi lines, as shown in Fig. 45. When the incident beam is so tilted that the center of the Laue circle lies on the jth parabola, only the jth tightly bound one-dimensional Bloch wave will be strongly excited. A TEM image recorded at this angle of incidence will then be dominated by the jth tightly bound one-dimensional Bloch wave. Since the planar bonding energy E^(j) depends sensitively on the composition, occupation probability, and acceleration voltage, the method of selective excitation of one-dimensional Bloch waves may be used for studies of interfaces and surfaces.

Shown in Fig. 46a is the one-dimensional band structure corresponding to a diffraction geometry where the center of the Laue circle lies along the line connecting the origin and the (060) reflection as shown in Fig. 45. The corresponding Bloch wave excitation amplitudes are shown in Fig. 46b. It is seen that when the projected center of the Ewald sphere lies to the far left of the exact Bragg condition for the (060) reflection (3.0 on the horizontal axis), only the incident plane wave is present in the crystal. When approaching the Bragg condition from the left of the figure, the incident plane wave is resonantly coupled to the most tightly bound Bloch wave, number 1, and within a narrow band the electron wave function in the crystal is dominated only by this tightly bound Bloch wave and the incident plane wave.
On passing this tightly bound Bloch wave, the incident wave is then seen to interact strongly with other Bloch waves, and these
FIGURE 46. (a) Calculated one-dimensional band structure and (b) corresponding Bloch wave excitation amplitudes for 400 keV incident electrons and a MgO crystal. The horizontal axis is directed along the line connecting the (000) and (600) disks in Fig. 45 and is labelled in terms of the Bragg angle for the (200) reflection.
FIGURE 47. Simulated BF CBED disk for 400 keV electrons and a MgO single crystal.
Bloch waves are seen to have been selectively excited within certain angular ranges, within which they dominate over all other Bloch waves. Shown in Fig. 47 is a simulated BF CBED disk, for a MgO crystal and 400 keV incident electrons. Some straight black bands are seen in this figure, which result from Bragg diffraction of the type (0 6±2n) as shown in Fig. 45. The most striking feature is, however, the parabola to the left of the CBED pattern. This parabola results from the most tightly bound Bloch state as previously discussed, and to the left of this parabola diffraction effects are seen to be absent. For incidence angles corresponding to a projected center of the Ewald sphere lying on this parabola, only the most tightly bound Bloch wave will be excited, and the wave is highly localized around the atom planes parallel to the line connecting the (000) and (060) reflections.
C. Surface Resonance

1. Surface States, Resonance, and Enhancement Conditions

A rough comparison between a THEED (Fig. 2) and a RHEED (Fig. 3) pattern suggests that there exist some common features between the two kinds of patterns. These include the ring and parabola patterns previously discussed in the context of bulk resonance scattering. For high-energy electron diffraction, this fact may be understood by looking at the problem of surface scattering from a transmission point of view (Peng and Cowley, 1986 and 1988a; Wang et al., 1989). The problem of RHEED is then reduced to a problem which involves only forward scattering, and THEED theory can therefore be applied. Both the one- and two-dimensional resonance scattering mechanisms may thus be adopted for interpreting surface resonance phenomena. In principle, these features are associated with the properties of the bulk crystal (Marten and Meyer-Ehmsen, 1985; Lehmpfuhl and Dowell, 1986; Peng et al., 1988; Dudarev and Whelan, 1993) and are not surface-specific. When the surface disturbance is strong enough, however, the energy eigenvalues associated with atom strings or planes near the surface may deviate appreciably from those of the bulk. The phenomenon then becomes surface-specific, and the associated electron waves are localized around the surface. These states correspond to the so-called Tamm states (Tamm, 1932). Surface resonances associated with these states have the same origin as the bulk resonances. In the presence of external surfaces, evanescent waves are allowed in addition to the bulk propagating Bloch waves, and electrons can be reflected by the surfaces. Some new features are present as a result of surface reflection (McRae, 1979; Echenique and Pendry, 1978). To illustrate these features, we first consider a scattering problem as shown in Fig. 48, where electrons are reflected between two interfaces.
If a wave ψ⁺ carrying unit flux propagates towards the interface I, the wave will be reflected by interface I and become r₁ψ⁻, where ψ⁻ carries unit flux away from interface I towards interface II. In its turn ψ⁻ will impinge on interface II and be reflected back towards I to become r₁r₂ψ⁺. The total amplitude of the wave propagating towards interface I is therefore given by

\[ \psi_{\mathrm{I}} = \psi^{+} + r_1 r_2 \psi^{+} + (r_1 r_2)^2 \psi^{+} + \cdots = \frac{1}{1 - r_1 r_2}\,\psi^{+}, \tag{VI.55} \]

which has a pole at r₁r₂ = 1. A true resonance occurs when the following resonance conditions are satisfied:

\[ |r_1| = |r_2| = 1, \qquad \phi_1 + \phi_2 = 2n\pi, \tag{VI.56} \]
FIGURE 48. Schematic diagram showing a wave that is reflected forward and backward between two boundaries.
where r = |r| exp(iφ), and n is an integer. A somewhat relaxed enhancement condition is given by the second equation of (VI.56), i.e.,

\[ \phi_1 + \phi_2 = 2n\pi. \tag{VI.57} \]
2. Transmission and Reflection Coefficients, Resonance, and Enhancement Effects

The kind of scattering problem shown in Fig. 48 can be solved exactly for a simple one-dimensional square potential well model: V(x) = 0 for x < 0 and x > a, and V(x) = −V₀ for 0 < x < a. In this problem an electron approaches the potential well from x = −∞ and is reflected and transmitted by the well. Outside the potential well, the required asymptotic solution has the form

\[ u(x) = \begin{cases} \exp(ikx) + R\,\exp(-ikx), & x \le 0, \\ T\,\exp(ikx), & x > a, \end{cases} \tag{VI.58} \]

where k = (2mE/ħ²)^{1/2}. Within the potential well, since E + V₀ > 0, we can define a wave number α = (2m(E + V₀)/ħ²)^{1/2}. The electron wave function inside the potential well is

\[ u(x) = A\,\exp(i\alpha x) + B\,\exp(-i\alpha x), \qquad 0 \le x \le a. \tag{VI.59} \]
The continuity of u(x) and du/dx at x = 0 and x = a required by the boundary conditions provides four relations. We can eliminate A and B and solve for the reflection and transmission coefficients:
Alternatively, the reflectivity r from the boundaries at x = 0 and x = a can readily be shown to be

\[ r = \frac{\alpha - k}{\alpha + k} \qquad \text{and} \qquad \phi = 0. \tag{VI.61} \]
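The matching problem of Eqs. (VI.58)-(VI.59) can be solved directly as a 4 × 4 linear system for R, A, B, and T. The sketch below does this numerically in units where ħ²/2m = 1, so that k = √E and α = √(E + V₀); the well depth V₀ and width a are illustrative numbers, not parameters from the text. Choosing the energy so that αa = 2π reproduces the enhancement condition |T| = 1.

```python
import numpy as np

# Continuity of u and du/dx at x = 0 and x = a (Eqs. (VI.58)-(VI.59))
# gives four linear equations for R, A, B, T.  Units: hbar^2/2m = 1.
# V0, a, and the chosen energy are illustrative numbers.
def coefficients(E, V0, a):
    k = np.sqrt(E)
    al = np.sqrt(E + V0)
    ea, eb, ek = np.exp(1j * al * a), np.exp(-1j * al * a), np.exp(1j * k * a)
    # unknowns: [R, A, B, T]
    M = np.array([
        [-1.0,     1.0,      1.0,     0.0],     # u continuous at x = 0
        [  k,       al,      -al,     0.0],     # u' continuous at x = 0
        [ 0.0,      ea,       eb,    -ek],      # u continuous at x = a
        [ 0.0,  al * ea, -al * eb, -k * ek],    # u' continuous at x = a
    ], dtype=complex)
    rhs = np.array([1.0, k, 0.0, 0.0], dtype=complex)
    R, A, B, T = np.linalg.solve(M, rhs)
    return R, T

# Enhancement condition (VI.57): |T| = 1 whenever alpha*a = n*pi.
V0, a = 10.0, 1.0
al = 2 * np.pi / a            # choose alpha so that alpha*a = 2*pi
E_res = al ** 2 - V0          # the matching incident energy (> 0 here)
R, T = coefficients(E_res, V0, a)
print(abs(T), abs(R))         # resonant transmission: |T| = 1, |R| = 0
```

Away from αa = nπ, part of the flux is reflected and |T| < 1; flux conservation |R|² + |T|² = 1 holds since the wave number k is the same on both sides of the well.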
Since both α and k are real and positive, from (VI.61) it is seen that the true resonance conditions (VI.56) are not satisfied for this simple potential model. The enhancement condition (VI.57) can, however, be satisfied for the transmission coefficient when sin αa = 0, i.e., αa = nπ, n = 1, 2, 3, ..., giving |T| = 1. Physically, this means that the phase advance of an electron wave in each cycle of multiple scattering between the two boundaries of the potential well is an integer multiple of 2π. Constructive interference then occurs between the waves propagating in the positive x direction. On the other hand, when |sin αa| = 1, constructive interference occurs between waves propagating in the negative x direction, giving a maximum reflection coefficient R. Shown in Fig. 49 are calculated transmission and reflection coefficients, for 20 eV electrons, an Au thin film, and 10% mean absorption. These coefficients are shown as a function of the film thickness. All diffraction
FIGURE 49. Transmission (solid line) and reflection (dotted line) coefficients as a function of the crystal thickness for an Au thin film and 20 eV low-energy electrons.
effects are neglected. Both transmission and reflection coefficients are seen to exhibit periodic oscillations with the crystal thickness. The transmission and reflection coefficient oscillations are seen to be out of phase, and this agrees well with our discussion based on the simple phase relation (VI.57). Suppose now that the electron energy E takes a value which is negative but above the mean inner potential of the crystal, i.e., −V₀ < E < 0. The reflectivity from the boundaries is then given by

(VI.62)

where α = (2m(V₀ + E)/ħ²)^{1/2} and k = (2m(−E)/ħ²)^{1/2}. The phase φ₁ varies from −π for E = −V₀ to 0 for E = 0. We then have |r₁| = |r₂| = 1 and φ₁ + φ₂ = 2αa + 2φ₁. The resonance conditions (VI.56) can then be satisfied for certain crystal thicknesses which satisfy αa = nπ − φ₁. This situation corresponds to the excitation of finite amplitude with zero input. For electron diffraction, the total energy E of the incident electron must be greater than zero. In the presence of diffraction effects, however, the electron total energy can be partitioned into longitudinal and transverse energies. It is then possible for the transverse energy of the high-energy electron to be negative. To illustrate this point, we consider a set of general two-dimensional wave equations (McRae, 1979; Maksym and Beeby, 1981) which can be obtained by substituting Eqs. (V.20) and (V.21) into Eq. (III.1):
where δ_G = k₀² − (k₀ₜ + G)². For the Gth reciprocal lattice rod, δ_G is negative if the Gth reciprocal lattice rod lies outside the Ewald sphere. Neglecting coupling between different reciprocal lattice rods and potential variation along the surface normal, i.e., U_{G−G′}(z) = 0, U₀(z) = U₀, we arrive at an equation
and this is the one-dimensional potential well problem we have just discussed. Shown in Fig. 50 is a schematic diagram showing a projection of the Ewald sphere and the five reciprocal lattice rods which are used for calculating Fig. 51. The calculations performed are based on Eq. (VI.63) for the Si[001] zone axis. The five reciprocal lattice rods involved are (00),
FIGURE 50. Schematic diagram showing a diffraction geometry where the Ewald sphere is just about to touch the (40) type reciprocal lattice rods.
(±40), and (0 ±4). Shown in Fig. 51a are the reflection coefficients of the (00) and (40) rods, and in Fig. 51b the corresponding transmission coefficients, as a function of the incident wave vector k₀. The corresponding energy variation in the figure is from 1 to 100 eV. This is the low-energy range, where virtual and exchange effects are not negligible (Pendry, 1974). These effects may, however, be included by a suitable choice of the optical potential V_op, and it has been shown by Qian et al. (1993) that a set of equations similar to Eq. (VI.63) may well be used for calculating transmission low-energy electron diffraction (TLEED). The beam threshold condition for a particular reciprocal lattice rod is defined by the condition δ = 0. For Si[001] incidence, this condition for the (40) type of reciprocal lattice rods gives k₀ = 0.73665 Å⁻¹. Peaks in both the transmission and reflection coefficient curves are seen to occur at this condition. To the left of this value of k₀, the waves associated with the (40) type of reciprocal lattice rods are all evanescent waves. A resonance coupling is seen to occur at a k₀ value just below 0.7 Å⁻¹, resulting in a resonance peak in the reflection coefficient curve and a dip in the transmission coefficient curve of the (00) rod. To investigate the origin of the resonance scattering, the transmission coefficients for the (00) and (40) rods are calculated and shown in Fig. 52 as a function of the crystal thickness for two slightly different k₀ values: 0.69 Å⁻¹ and 0.65 Å⁻¹. While, roughly speaking, two frequencies are
FIGURE 51. (a) Reflection and (b) transmission coefficients of (00) and (40) type reciprocal lattice rods as a function of k₀. The calculations have been made for Si along the [001] zone axis, and a crystal thickness of 5.43 Å. The (40) type beam threshold condition corresponds to k₀ = 0.74 Å⁻¹.
FIGURE 52. Transmission coefficients of (a) the (00) rod and (b) the (40) rod as a function of the crystal thickness. The calculations have been made for k₀ = 0.69 Å⁻¹ (solid line) and k₀ = 0.65 Å⁻¹ (dotted line), and for the Si[001] zone axis.
observed in Fig. 52a (one corresponds to the enhancement condition, which depends on the incident wave vector and crystal thickness, and the other is associated with the atom positions along the [001] zone axis; see below), only one frequency is observed in Fig. 52b, and the peak positions are almost independent of the incident wave vectors and are coincident with the atom positions in the crystal. These curves thus show that below the beam threshold condition, the incident wave interacts mainly with one-dimensional bound Bloch states, which are associated with the (40) type of reciprocal rods and are localized around atom planes parallel to the surface. The resonance scattering is enhanced whenever a new layer of atoms is included in the diffraction process.
3. Surface Resonance Scattering in RHEED

For RHEED from a semi-infinite crystal having a truncated potential at the surface, the two boundaries shown in Fig. 48 may be taken to be virtual interfaces between the substrate bulk crystal and the selvage, and the width of the potential well may be taken to be zero, i.e., a = 0. As shown in the preceding section, for δ < 0 the reflection coefficient from the surface potential barrier is given by Eq. (VI.62) and its absolute amplitude is unity. The phase of the surface potential barrier reflectivity varies from 0 to π with increasing δ, which is shown in Fig. 53 as the "barrier" curve. For a truncated surface potential model and to a two-beam approximation, the reflectivity from the bulk crystal is given by (Kambe, 1988)
\[ r_b = \begin{cases} -W - \sqrt{W^2 - 1}, & W < -1, \\ -W + i\sqrt{1 - W^2}, & -1 < W < +1, \\ -W + \sqrt{W^2 - 1}, & 1 < W, \end{cases} \tag{VI.65} \]

where W = (k_z² − g²/4)/U₋g.
For −1 < W < +1, we have

(VI.66)

The resonance condition (VI.56) requires
\[ \phi_r + \phi_b = \tan^{-1}\!\left(\frac{2\alpha k}{\alpha^2 - k^2}\right) + \tan^{-1}\!\left(\frac{\sqrt{1 - W^2}}{-W}\right) = 2n\pi, \]

which has a solution φ_r + φ_b = 0 within the region −1 < W < +1 (or 0.3 < E/V₀ < 0.7 in Fig. 53) when U₋g < 0 (see the "barrier + bulk 2" curve in Fig. 53), giving a true surface state localized around the surface region z = 0. This solution is often called a Shockley state (Shockley, 1939) in the theory of surface states.
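The piecewise two-beam bulk reflectivity can be evaluated directly. The sketch below implements the standard form r_b = −W ∓ √(W² − 1) for |W| > 1 and r_b = −W + i√(1 − W²) for |W| < 1 (the conventional branch choices of two-beam theory, cf. Eq. (VI.65)): inside the band gap |W| < 1 the modulus is exactly unity, i.e., total reflection, and only the phase varies, sweeping from 0 to π across the gap.

```python
import numpy as np

# Two-beam bulk reflectivity as a function of the normalized parameter W
# (cf. Eq. (VI.65)):
#   r_b = -W - sqrt(W^2 - 1)    for W < -1
#   r_b = -W + i*sqrt(1 - W^2)  for -1 < W < 1
#   r_b = -W + sqrt(W^2 - 1)    for W > 1
def r_bulk(W):
    if W < -1:
        return -W - np.sqrt(W * W - 1.0)
    if W > 1:
        return -W + np.sqrt(W * W - 1.0)
    return complex(-W, np.sqrt(1.0 - W * W))

# Inside the "band gap" |W| < 1 the modulus is exactly unity and only
# the phase varies; outside the gap, |r_b| < 1.
for W in (-0.9, -0.5, 0.0, 0.5, 0.9):
    print(W, abs(r_bulk(W)), np.angle(r_bulk(W)))
print(abs(r_bulk(2.0)))   # |r_b| < 1 outside the gap
```

Unit modulus inside the gap follows immediately from |−W + i√(1 − W²)|² = W² + (1 − W²) = 1, which is the total-reflection behaviour invoked in the surface-state argument above.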
FIGURE 53. Phase variations of φ_b (bulk reflectivity) and φ_r (surface potential barrier) as a function of the fractional electron energy in terms of the crystal inner potential, i.e., E/V₀. In the figure, the curve "bulk 1" refers to the case of V_g < 0 and the curve "bulk 2" refers to that of V_g > 0.
Although possible, the requirement U₋g < 0 for the existence of a Shockley state in RHEED is rather restrictive. An example is provided by the case of the GaAs(100) surface, where for E = 100 keV, U₂₀₀ = −0.0039 Å⁻² compared with an inner potential of U₀ = 0.115 Å⁻². It is, however, difficult to isolate effects from other strong reflections, such as U₁₁₁ = 0.037 Å⁻², so that the diffraction processes could be dominated by the Shockley state associated with the (200) reflection. In three-dimensional diffraction cases, the reflection coefficients r₁ and r₂ in Eq. (VI.56) are no longer simple complex quantities as in the one-dimensional case. A general matrix is now required to describe the reflectivities of the many Bloch waves impinging on the interface. The resonance conditions (VI.56) may be generalized to mean a condition for which Bloch waves would have finite amplitude for zero incident amplitude. If the incident wave has finite amplitude, the excitation amplitudes of the Bloch waves will become infinite, and this is the condition for the existence of surface states (McRae, 1979). To illustrate the conditions and the effects of surface resonance in RHEED, we now consider a two-rod case for the Pt(001) surface.
FIGURE 54. Dispersion surface construction for an empty Pt lattice. Calculations have been made for 100 keV, using many reflections of the type (002n) and (402n), lying on the (00) and (40) reciprocal rods, respectively.
Shown in Fig. 54 is a cross-sectional view of the dispersion surface for an empty Pt lattice as a function of the angle of incidence. Two rods of the reciprocal lattice are involved, i.e., the (00) rod, which contains the incident beam and (002n)-type reflections, and the (40) side rod, which contains (402n)-type reflections. In this figure, three Brillouin zones are shown, of which the middle one corresponds to the first zone. To the right of the figure, the parabolic-shaped curves are in fact distorted circles centred on (402n)-type reflections. The horizontal lines to the left of these circles simply reflect the fact that for the corresponding angles of incidence the (402n)-type reflections are outside the Ewald sphere. The eigenvalues associated with these reflections therefore have only varying imaginary components. The inclined lines in the figure correspond to the circles centred on (002n)-type reflections. In particular, within the first Brillouin zone, the two lines are associated with the (000) and (008) reflections, respectively, and the circle to the right is that associated with the (404) reflection. The intersection point between the two (000) and (008) lines corresponds to the (008) Bragg reflection condition, and the intersection of the (404) circle with the horizontal line corresponds to the beam threshold condition for the Ewald
sphere to touch the (40) reciprocal lattice rod. To the left of this beam threshold condition δ is negative, and the associated wave in the vacuum is therefore an evanescent wave localized at the surface. The (40) rod is then called a closed channel for angles of incidence smaller than the beam threshold condition. Resonance diffraction is known to occur only when closed channels are involved, and the resonance conditions are expected to be satisfied to the left of the beam threshold condition.

As a first example demonstrating surface resonance scattering in RHEED, we consider a three-beam case. The three beams involved are (000), (800), and (404). Calculations have been performed for Pt and 100 keV incident electrons. The real part of the dispersion surface (Fig. 55a) is seen to have been modified compared with Fig. 54 because of the lattice interaction, and two terraces are formed. The left terrace in the figure is associated with the beam threshold condition previously discussed in connection with Fig. 54, and the right terrace is primarily associated with the (800) Bragg diffraction condition. The imaginary part of the dispersion surface (Fig. 55b) shows that within the left terrace all four Bloch waves are evanescent waves, in contrast to the right terrace where only two of the four Bloch waves are evanescent. The absolute amplitudes of the Bloch wave excitations (Fig. 55c) show clearly that resonance (or singularity in excitation amplitude) occurs around the right edge of the left terrace associated with the beam threshold condition. It has been shown that this type of resonance scattering results from Bloch wave degeneracy (Peng, 1994). The behaviour of the resonance scattering is distinct from that of Bragg diffraction; the latter is seen to have only finite excitation amplitude. The absolute amplitudes of the (00) and (40) reflected beams shown in Fig. 55d exhibit two peaks. For angles of incidence which give the total reflection peak to the left of Fig. 55d, all Bloch waves in the crystal are evanescent waves localized at the surface. Since the (40) reciprocal rod is a closed channel for this angular range of incidence, the specular reflected beam is totally reflected by the surface. On the other hand, the right peak in Fig. 55d corresponds to (800) Bragg diffraction. Both the (00) and (40) reciprocal rods are open channels, and it is seen that the electron flux is distributed between the two rods, and neither of them shows total reflection.

We now consider a general two-rod case where many reflections along the rods are included. Shown in Fig. 56 is a dispersion surface construction based only on the (40) reciprocal lattice rod. Unlike the three-beam case shown in Fig. 55, a tightly bound band is seen to have been formed to the left, which is well separated from all other bands to the right. Shown in Fig. 57a is a similar dispersion surface plot, but now all (00) rod interactions are also involved. The (800) Bragg condition may be recognized to the right of the figure, at θ = 32.8 mrad. The band associated with the tightly bound
FIGURE 55. Three-beam calculations of RHEED from the Pt(001) surface. (a) Real part and (b) imaginary part of the dispersion surface; (c) the corresponding absolute amplitudes of the Bloch wave excitations; and (d) absolute amplitudes of the (00) and (40) reflected beams.
FIGURE 56. Dispersion surface construction for (402n) type systematic reflections lying along the (40) reciprocal lattice rod.
Bloch state is seen to lie between 27.8 and 28.8 mrad. This band interacts strongly with the (00) rod. The absolute excitation amplitude curves (Fig. 57c) show two singular peaks at the two edges of this band (the peaks would be truly singular had absorption not been included). The (800) Bragg condition hardly shows up in Fig. 57b, but the reflection curves in Fig. 57d show that the Bragg condition does give rise to a peak in the RHEED rocking curve. The reflection peak associated with the band of the tightly bound Bloch state is seen to have split into two peaks, which roughly correspond to the two edges of the tightly bound band. It should also be pointed out that under the resonance conditions all Bloch waves are evanescent waves. Had absorption not been included, the reflection peaks in Fig. 57d would have been total reflection peaks. Shown in Fig. 58 are two sets of one-dimensional curves, showing the spatial distribution of the two excited Bloch waves associated with the two rods, (00) and (40). Figures 58a and 58b have been calculated for incidence angles of 28.3 and 32.95 mrad, which correspond to the (40) rod resonance and the (800) Bragg conditions, respectively. It is seen that while both Bloch waves excited under the resonance condition (Fig. 58a) are truly localized, one of the Bloch waves excited under the (800) Bragg condition (Fig. 58b) is spatially delocalized. In the figures, atoms are at multiples
FIGURE 57. Two-rod RHEED calculations for a Pt(001) surface and 100 keV electrons. Shown in (a) and (b) are the real and imaginary parts of the dispersion surface; in (c) the absolute amplitudes of the Bloch wave excitations; and in (d) the absolute amplitudes of the (00) and (40) rod reflected beams.
FIGURE 57-continued
FIGURE 58. One-dimensional normalized electron density distributions of two Bloch waves for two angles of incidence: (a) θ = 28.3 mrad; (b) θ = 32.95 mrad.
of 0.98 Å. The peaks in the Bloch waves excited under the resonance conditions are seen to deviate slightly from the atom positions. The localization is seen to be within a single monolayer of the atom plane.

ACKNOWLEDGMENTS

This work was supported by the Violette and Samuel Glasstone Benefaction and SERC grant GR/H58278. The author thanks Professors J. M. Cowley, J. C. H. Spence (Arizona State University), M. J. Whelan, P. B. Hirsch (University of Oxford), J. K. Gjønnes (University of Oslo), and K. H. Kuo (Beijing Laboratory of Electron Microscopy) for advice and encouragement, and Drs. S. L. Dudarev (University of Oxford) and J. M. Zuo (Arizona State University) for many stimulating discussions. Figures 8 and 38, kindly provided by Drs. J. M. Zuo and S. J. Pennycook, are gratefully acknowledged.

APPENDIX A. GREEN'S FUNCTIONS

The free-space Green function is defined as the solution of the following Helmholtz equation:
∇²G₀(r, r′) + K²G₀(r, r′) = δ(r − r′).   (A.1)
Fourier transforming both sides of the equation,

G̃₀(k) = (1/(2π)³) ∫ G₀(r, r′) exp[−ik · (r − r′)] d(r − r′),   (A.2)
and using the Fourier representation of the Dirac δ function,

δ(r − r′) = (1/(2π)³) ∫ exp[ik · (r − r′)] dk,   (A.3)

we obtain

(K² − k²)G̃₀(k) = (2π)⁻³.   (A.4)

Formally, the general solution of this equation can be written as
G̃₀(k) = (2π)⁻³/(K² − k²) + Cδ(K² − k²),   (A.5)

in which the constant C is arbitrary (this undetermined multiple of δ(K² − k²) is equivalent to the lack of specification of the contour around
FIGURE 59. Schematic diagram showing the contour used for evaluating (A.7).
the poles at k = ±K). In general, for any C we have

G₀(r − r′) = ∫ G̃₀(k) exp[ik · (r − r′)] dk = (1/(2π)³) ∫ exp[ik · (r − r′)]/(K² − k²) dk.   (A.6)

Letting r − r′ → r and choosing the z axis in k space along the direction of r, we find

G₀(r) = (1/(2π)²) (1/ir) ∫_{−∞}^{∞} k exp(ikr)/(K² − k²) dk.

For the outgoing-wave Green function, we choose the contour C₁ shown in Fig. 59. Since r > 0, we can complete the contour in the upper half plane and find

G₀(r) = −(1/(2π)²) (1/ir) ∮_{C₁} k exp(ikr)/(k² − K²) dk = −(1/4π) exp(iKr)/r.   (A.7)

Alternatively, using the Dirac formula, we can rewrite the solution of Eq. (A.4) in the form

G̃₀(k) = (2π)⁻³ {℘[1/(K² − k²)] ∓ iπδ(K² − k²)},   (A.8)

where ℘ denotes the Cauchy principal-value integral, and the ∓ signs correspond to the outgoing- and ingoing-wave Green functions, respectively.
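As a quick sanity check of the result (A.7), one can verify symbolically that G₀(r) = −exp(iKr)/(4πr) satisfies the homogeneous Helmholtz equation away from the source point, where the δ function vanishes. The sketch below uses SymPy and is an illustrative aside, not part of the original derivation.

```python
import sympy as sp

# Verify that G0(r) = -exp(iKr)/(4*pi*r) from (A.7) solves the Helmholtz
# equation (A.1) away from the source (the delta function vanishes for r > 0).
r, K = sp.symbols('r K', positive=True)
G0 = -sp.exp(sp.I * K * r) / (4 * sp.pi * r)

# radial part of the Laplacian for a spherically symmetric function
laplacian = sp.diff(r**2 * sp.diff(G0, r), r) / r**2
assert sp.simplify(laplacian + K**2 * G0) == 0
print("Helmholtz equation satisfied for r != 0")
```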
Sometimes it is convenient to work with a mixed real- and reciprocal-space Green function. Using the notation r = (x, z), k = (q, k_z), we define a mixed-space Green function

G₀(q, x′; z, z′) = ∫ G₀(x, z; x′, z′) exp(−iq · x) dx.   (A.9)
Substituting the reciprocal-space representation of the Green function (A.8),

G₀(x, z; x′, z′) = (1/(2π)³) ∫∫ dq′ dk′_z exp[iq′ · (x − x′) + ik′_z(z − z′)] / [K² + iε − (q′² + k′_z²)],

into (A.9) then gives for the outgoing wave

G₀(q, x′; z, z′) = (1/2π) exp(−iq · x′) ∫ dk′_z exp[ik′_z(z − z′)] / [K² + iε − (q² + k′_z²)]
= −(i/2K_z) exp(−iq · x′) exp[±iK_z(z − z′)],   (A.10)

where K_z = √(K² − q²) and the ± signs correspond to forward (z > z′) and backward (z < z′) scattering, respectively. We now consider a general Green function in the presence of a potential field U(r). The wave function of the fast electron in the potential field satisfies the Schrödinger wave equation

[∇² + K² + U(r)]Ψ(r) = S(r),   (A.11)
where S(r) is an electron source function. A general Green function G(r, r′) is defined as satisfying

[∇² + K² + U(r)]G(r, r′) = δ(r − r′).   (A.12)
Provided that the general Green function is known, the general solution of Eq. (A.11) is given by

Ψ(r) = ∫ G(r, r′)S(r′) dr′.   (A.13)
The asymptotic form of the general Green function can be written as

G(r, r′) = G₀(r, r′) + G_s(r, r′),   (A.14)
where G_s(r, r′) represents the wave scattered by the crystal for point-source illumination. In general, the Green function G(r, r′) is difficult to calculate, but certain properties of this function can nevertheless be obtained. We now consider a two-dimensional Fourier transform of this function:

G(q, x; z, z′) = ∫ G(x, z; x′, z′) exp(−iq · x′) dx′.   (A.15)
To obtain an explicit expression, we first consider an electron source function of the form

S(x′, z′) = exp(−iq · x′)δ(z′ − z₀),   (A.16)

where z₀ < z. Substitution of Eq. (A.16) into (A.13) gives

Ψ(r) = Ψ(x, z) = ∫ G(x, z; x′, z₀) exp(−iq · x′) dx′.   (A.17)
Since the Green function is symmetric, i.e., G(x, z; x′, z₀) = G(x′, z₀; x, z), we therefore have from (A.9)

Ψ(x, z) = ∫ G(x′, z₀; x, z) exp(−iq · x′) dx′ = G(q, x; z₀, z).   (A.18)
Fourier transforming (A.14), using (A.18) and (A.10), we obtain

Ψ(x, z) = G(q, x; z₀, z) = −(i/2|k_z|) exp(−iq · x) exp[ik_z(z₀ − z)] + Ψ_s(x, z).   (A.19)

But this is just the solution of the diffraction problem for a plane-wave incidence

Ψ₀(x, z) = −(i/2|k_z|) exp(ik_z z₀) exp(−ik · r),  k = (q, k_z).   (A.20)

By writing the solution of the diffraction problem for a plane-wave incidence Ψ₀ = exp(ik · r)
as Ψ_k, we therefore have

Ψ_{−k}(x, z) = exp(−ik · r) + Ψ_s(x, z),   (A.21)

and

G(q, x; z₀, z) = −(i/2|k_z|) exp(ik_z z₀) Ψ_{−k}(x, z).   (A.22)
On the other hand, for an electron source function of the form

S(x′, z′) = −2i|k_z| exp(iq · x′) exp(ik_z z₁)δ(z′ − z₁),   (A.23)

where z₁ > z, we have

Ψ(x, z) = exp(ik · r) + Ψ_s(x, z) = Ψ_k.   (A.24)
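The mixed-space result (A.10) rests on the one-dimensional Green function −(i/2K_z) exp(iK_z|z − z′|). The following SymPy sketch (an illustrative aside) confirms its two defining properties: it solves the one-dimensional Helmholtz equation away from z = z′, and its derivative jumps by unity across the source, reproducing the δ function.

```python
import sympy as sp

# Check the 1-D Green function -(i/2Kz) exp(i Kz |z|) underlying (A.10).
z, Kz = sp.symbols('z K_z', positive=True)
G = -sp.I * sp.exp(sp.I * Kz * z) / (2 * Kz)   # branch for z > 0

# (d^2/dz^2 + Kz^2) G = 0 away from the source point
assert sp.simplify(sp.diff(G, z, 2) + Kz**2 * G) == 0

# by symmetry G(-z) = G(z), so the derivative jump at z = 0 is 2 G'(0+);
# it must equal 1 to reproduce the delta-function source
jump = 2 * sp.limit(sp.diff(G, z), z, 0, '+')
assert sp.simplify(jump - 1) == 0
print("1-D Green function verified")
```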
APPENDIX B. CRYSTAL STRUCTURE FACTORS AND POTENTIAL
To a first-order approximation, for electrons the atomic potential is given by the screened Coulomb potential

φ(r) = −(Ze²/r) exp(−r/a),   (B.1)

where a is the screening length. For the screened Coulomb potential we have

φ̃(g) = (1/(2π)³) ∫ φ(r) exp(−ig · r) dr
= (1/(2π)³)(−2πZe²) ∫₀^∞ dr r exp(−r/a) ∫₀^π exp(−igr cos θ) sin θ dθ
= −Ze²/[2π²(g² + 1/a²)].   (B.2)
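The closed form in (B.2) can be spot-checked numerically. The sketch below (an illustrative aside, with units chosen so that Z = e = 1) reduces the three-dimensional Fourier integral to a radial one and compares it with −1/[2π²(g² + 1/a²)].

```python
import numpy as np
from scipy.integrate import quad

# Numerical check of (B.2) with Z = e = 1: the screened Coulomb potential
# -exp(-r/a)/r transforms to -1/(2 pi^2 (g^2 + 1/a^2)).
a, g = 0.5, 2.0

# angular integration of exp(-i g.r) gives a factor (4 pi / g) sin(g r)/r,
# so the radial integrand for phi(r) = -exp(-r/a)/r is -exp(-r/a) sin(g r)
radial = quad(lambda r: -np.exp(-r / a) * np.sin(g * r), 0, np.inf)[0]
lhs = (4 * np.pi / g) * radial / (2 * np.pi) ** 3
rhs = -1.0 / (2 * np.pi**2 * (g**2 + 1 / a**2))
assert abs(lhs - rhs) < 1e-8
print(lhs, rhs)
```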
In a more sophisticated model, we can write out explicitly the interaction between an incident electron and the atomic nucleus and electrons:

φ(r, …, R_n, …, r_j, …) = −Ze²/|r − R_n| + Σ_j e²/|r − r_j|,   (B.3)

where r, R_n, and r_j denote the coordinates of the incident electron, the atomic nucleus, and the atomic electrons, respectively. Using a more compact notation we
may write

φ(r, …, r_α, …) = Σ_α Z_α e²/|r − r_α| = (e²/r) ∗ Σ_α Z_α δ(r − r_α),   (B.4)

in which Z_α = −Z for the nucleus and Z_α = 1 for the atomic electrons. In terms of the coordinates {r_α}, the atomic scattering factor is given by

φ̃(g) = (1/(2π)³) ∫ φ(r, …, r_α, …) exp(−ig · r) dr = [e²/(2π²g²)] Σ_α Z_α exp(−ig · r_α).   (B.5)
Numerically, the coordinates {r_α} and an accurate atomic potential φ(r) can be calculated using relativistic Hartree-Fock atomic wave functions. It has been shown by Doyle and Turner (1968) that, for elastic scattering, the electron atomic scattering factors can be expressed analytically:

f^(e)(s) = Σ_{j=1}^{4} a_j exp(−b_j s²) exp(−Bs²),   (B.6)

in which a_j, b_j are the Doyle-Turner fitting parameters, B is related to the usual Debye-Waller factor M_g by the relation M_g = Bg²/(4π)², and s = g/4π. By definition the atomic potential is given by the inverse Fourier transform of the atomic scattering factor,
φ(r) = [h²/(2πm₀)] (1/(2π)³) ∫ f^(e)(g/4π) exp(ig · r) dg.   (B.7)
Substitution of (B.6) into (B.7) then gives an analytical expression for the atomic potential:

φ(r) = [h²/(2πm₀)] (1/(2π)³) Σ_j a_j ∫ exp[−(b_j + B)g²/(4π)²] exp(ig · r) dg
= [h²/(2πm₀)] (1/(2π)³) Σ_j 2πa_j ∫₀^∞ g² exp[−(b_j + B)g²/(4π)²] dg ∫_{−1}^{1} exp(igr cos θ) d(cos θ)
= [h²/(2πm₀)] Σ_j a_j [4π/(b_j + B)]^{3/2} exp[−4π²r²/(b_j + B)].   (B.8)
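The only transform needed in (B.8) is that of a Gaussian. The following numerical spot check is an illustrative aside; the b_j and B values below are hypothetical, not tabulated Doyle-Turner data.

```python
import numpy as np
from scipy.integrate import quad

# Check the Gaussian transform used in (B.8): for alpha = (b_j + B)/(4 pi)^2,
# (1/(2pi)^3) * Int exp(-alpha g^2) exp(i g.r) d^3g
#   = (1/(2pi)^3) * (pi/alpha)^(3/2) * exp(-r^2/(4 alpha)).
b_j, B = 8.62, 0.5                   # hypothetical values (angstrom^2)
alpha = (b_j + B) / (4 * np.pi) ** 2
r = 0.7                              # angstrom

lhs = (4 * np.pi / r) * quad(
    lambda g: g * np.exp(-alpha * g**2) * np.sin(g * r), 0, np.inf)[0]
lhs /= (2 * np.pi) ** 3
rhs = (np.pi / alpha) ** 1.5 * np.exp(-r**2 / (4 * alpha)) / (2 * np.pi) ** 3
assert abs(lhs - rhs) < 1e-6 * abs(rhs)
print(lhs, rhs)
```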
To a good approximation, the crystal potential can be expressed as

V(r) = Σ_n Σ_i φ_i(r − R_n − r_i),   (B.9)
in which the index n refers to the nth unit cell of the crystal, and i denotes the ith atom in the unit cell. For a three-dimensional periodic crystal, we can expand the potential in terms of a set of plane waves associated with the
three-dimensional reciprocal lattice vectors {g}:

V(r) = Σ_g V_g exp(ig · r).   (B.10)
The gth Fourier coefficient of the potential is given by

V_g = (1/V) ∫ V(r) exp(−ig · r) dr,   (B.11)
where V is the volume of the crystal. Substituting (B.9) into (B.11) and noting the relation exp(−ig · R_n) = 1, we obtain

V_g = (1/V) Σ_n Σ_i [∫ φ_i(r − R_n − r_i) exp[−ig · (r − R_n − r_i)] dr] exp(−ig · r_i)
= (N/V) Σ_i [∫ φ_i(r′) exp(−ig · r′) dr′] exp(−ig · r_i),   (B.12)
where N is the total number of unit cells in the crystal. By substituting Eq. (B.7) into (B.12) and writing V/N = V_c, where V_c is the volume of a unit cell, we then obtain for a three-dimensional periodic crystal potential

V_g = [h²/(2πm₀V_c)] F_g, with F_g = Σ_i f_i^(e)(g/4π) exp(−ig · r_i),   (B.13)

where f_i^(e) denotes the scattering factor of the ith atom in the unit cell, and F_g is the usual crystal structure factor for the gth reflection. For RHEED, it is convenient to use a mixed real- and reciprocal-space representation of the crystal potential:
V(x, z) = Σ_G V_G(z) exp(iG · x),   (B.14)
where {G} are two-dimensional reciprocal lattice vectors parallel to the surface. The two-dimensional Fourier coefficients V_G(z) are given by
V_G(z) = (1/S) ∫ V(x, z) exp(−iG · x) dx,   (B.15)
in which S is the illuminated surface area. When dealing with two-dimensional periodic crystals, it is useful to consider the crystal as consisting of a stack of atomic layers with index ℓ. In analogy to (B.9), we can write the crystal potential as follows:

V(r) = Σ_ℓ Σ_n Σ_i φ_i(r − x_n − r_i − d_ℓ),   (B.16)

where n is the index of the two-dimensional unit cell, r_i = (x_i, z_i), i denotes the ith atom in the unit cell, and d_ℓ = (x_ℓ, z_ℓ) denotes the separation of the
origin of the ℓth atomic layer from the origin of the 0th layer. Substitution of (B.16) into (B.15), noticing that exp(−iG · x_n) = 1, then gives

V_G(z) = (1/S) Σ_ℓ Σ_n Σ_i [∫ φ_i(x′, z′) exp(−iG · x′) dx′] exp[−iG · (x_i + x_ℓ)],   (B.17)
where z′ = z − z_i − z_ℓ. We now consider the integral in the bracket. Substituting expression (B.7) for φ_i(r) and (B.6) for f_i^(e) into Eq. (B.17), we obtain
∫ φ_i(x′, z′) exp(−iG · x′) dx′ = [h²/(2πm₀)] Σ_{j=1}^{4} a_j [4π/(b_j + B)]^{1/2} exp[−(b_j + B)G²/(4π)²] exp[−4π²z′²/(b_j + B)].   (B.18)
By writing N as the total number of surface unit cells within the illuminated area and S₀ = S/N as the area of a single surface unit cell, and substituting (B.18) into (B.17), we then obtain

V_G(z) = (1/S₀) Σ_ℓ Σ_i [∫ φ_i(x′, z − z_i − z_ℓ) exp(−iG · x′) dx′] exp[−iG · (x_i + x_ℓ)].   (B.19)
APPENDIX C. THE OPTICAL POTENTIAL
In the usual dynamical diffraction theory of electrons, the interaction between the incident fast electrons and the crystal is described by a simple periodic potential V(r). The wave function involved in the wave equation is assumed to be a single-electron wave function which depends only on the coordinate r of the incident fast electron. This is a very simplified version of the complicated scattering processes of electrons by crystals. A crystal is by no means simple: it is a very complex system of electrons and nuclei which can be excited by the incident and scattered electrons. The scattering processes are therefore complicated many-body processes, and in general the problem cannot be solved exactly. Instead, an effective potential, called the optical potential or pseudo-potential, is introduced to include the complication of many-body interactions (Molière, 1939; Yoshioka, 1957). Following Dederichs (1972), we write the total Hamiltonian of the system as

H = h₀ + H₀ + V,   (C.1)

in which h₀ = −(ℏ²/2m)∇² is the free Hamiltonian of the incident electron, H₀ is the Hamiltonian for all the electrons and nuclei of the crystal, and V represents the interaction of the incident electron with the crystal. In general V depends on all the coordinates of the electrons and nuclei of the crystal, i.e., V = V(r, …, R_n, …, r_j, …), where R_n and r_j denote the coordinates of the nth nucleus and the jth crystal electron, respectively. For a plane-wave incidence we have

h₀φ_k = E_k φ_k, with φ_k = exp(ik · r).   (C.2)
Assume that before the scattering occurs the initial crystal state is φ_a = φ_a(…, R_n, …, r_j, …), which satisfies

H₀φ_a = E_a φ_a.   (C.3)
The total wave function Ψ_{k,a} = Ψ_{k,a}(r, …, R_n, …, r_j, …) satisfies the full stationary Schrödinger equation

(h₀ + H₀ + V)Ψ_{k,a} = (E_k + E_a)Ψ_{k,a}.   (C.4)
To obtain a formal solution of (C.4), we write

Ψ_{k,a} = φ_k φ_a + Φ_{k,a}.   (C.5)

Substitution of (C.5) into the wave equation (C.4) shows that Φ_{k,a} satisfies the differential equation

(E_k + E_a − h₀ − H₀)Φ_{k,a} = VΨ_{k,a}.   (C.6)

Symbolically, this equation is easily solved with the help of a free-particle Green function G₀ defined as

G₀ = 1/(E_k + E_a + iε − H₀ − h₀),   (C.7)
to give

Φ_{k,a} = G₀VΨ_{k,a}.   (C.8)

Combination of (C.5) and (C.8) shows that Ψ_{k,a} satisfies an inhomogeneous equation, called the Lippmann-Schwinger equation,

Ψ_{k,a} = φ_k φ_a + G₀VΨ_{k,a}.   (C.9)

For many experiments, such as energy-filtering experiments and electron holography, only the elastic component of the total wave function is relevant. By elastic scattering here we mean those processes for which the initial and final states of the crystal are identical. For the total state Ψ_{k,a}, the elastic component is defined as

Ψ_{k,a}^{el} = φ_a ψ_{k,a}(r), with ψ_{k,a}(r) = ∫ dR_n … dr_j … φ_a*(…, R_n, …, r_j, …) Ψ_{k,a}(r, …, R_n, …, r_j, …).   (C.10)
By averaging (C.10) over the thermal distribution of the initial crystal states we obtain for the elastic wave

ψ_k = ⟨ψ_{k,a}⟩ = (1/Z) Σ_a exp(−E_a/k_B T) ψ_{k,a},   (C.11)

with

Z = Σ_a exp(−E_a/k_B T).   (C.12)

Having defined the elastic wave function (C.11), we now seek an equation for this elastic wave function. The concept of the optical potential V^op is introduced for this purpose. The defining equation for the optical potential is similar to Eq. (C.9):

ψ_k = φ_k + G₀V^op ψ_k,   (C.13)
but now only the elastic wave functions are involved. By writing V = V^op + (V − V^op) and using Eq. (C.9), we obtain

Ψ_{k,a} = (1 − G₀V^op)⁻¹ φ_k φ_a + G_V(V − V^op)Ψ_{k,a},   (C.14)

where

G_V = G₀/(1 − G₀V^op) = 1/(E_k + E_a + iε − H₀ − h₀ − V^op).   (C.15)

To manipulate Eq. (C.14) any further, we multiply (C.13) from the right-hand side by φ_a; after some algebra, we obtain

(1 − G₀V^op)⁻¹ φ_k φ_a = ψ_k φ_a.   (C.16)
Substitution of Eq. (C.16) into Eq. (C.14) gives

Ψ_{k,a} = ψ_k φ_a + G_V(V − V^op)Ψ_{k,a} = [1 − G_V(V − V^op)]⁻¹ ψ_k φ_a.   (C.17)
Averaging the total wave function over the initial states and using the definition (C.11), we obtain

⟨Ψ_{k,a}⟩ = ψ_k,   (C.18)

which gives

ψ_k = ⟨[1 − G_V(V − V^op)]⁻¹⟩ ψ_k,   (C.19)

and finally

V^op = ⟨V[1 − G_V(V − V^op)]⁻¹⟩,   (C.20)

and this is the equation for the optical potential. For a weak interaction, i.e., V ≪ h₀, we can iterate Eq. (C.20) up to second order and obtain

V^op ≈ ⟨V⟩ + ⟨(V − ⟨V⟩)G₀(V − ⟨V⟩)⟩.   (C.21)

In this expression, the first term is simply the averaged elastic potential. The second term represents the first correction, coming from the effect of diffuse and inelastic scattering on the elastic wave. It has now been firmly established that among the three main inelastic scattering mechanisms, i.e., TDS, plasmon, and inner-shell electronic excitation, the TDS contribution dominates over the other two by over an order of magnitude for g ≠ 0 (Whelan, 1965; Hall and Hirsch, 1965; Kainuma and Yoshioka, 1966). In what follows we shall therefore be
concerned only with the TDS contribution to V^op. To a good approximation we may assume that the atomic electrons follow the motion of the nucleus adiabatically and that all atomic electrons are in the ground state. The interacting potential is then given by

V(r, …, r_i, …) = Σ_i ∫ dR φ_i(r − R)δ(R − r_i),   (C.22)
in which ρ_i(R′) is the electron density of the ith atom in its ground state, the summation on i is over all atoms in the crystal, and the atomic potential φ_i(r) is given by

φ_i(r) = −Z_i e²/r + e² ∫ ρ_i(r′)/|r − r′| dr′.   (C.23)
The averaged elastic potential, i.e., the first term in Eq. (C.21), is a local potential. By writing r_i → R_i + u_i, where R_i denotes the equilibrium position of the ith atom and u_i represents the thermal displacement of the atom from its equilibrium position, we have

⟨V(r)⟩ = Σ_i ∫ dR φ_i(r − R)⟨δ(R − R_i − u_i)⟩.   (C.24)

In reciprocal space:

⟨V_g⟩ = ∫ ⟨V(r)⟩ exp(−ig · r) dr = Σ_i v_i(g) exp(−ig · R_i) exp(−M_i^g),   (C.25)

where

v_i(g) = ∫ φ_i(r) exp(−ig · r) dr,

M_i^g = ⟨(g · u_i)²⟩/2 = g²⟨u_i²⟩/6 is the usual Debye-Waller factor, and values of ⟨u_i²⟩ for certain elements and compounds can be found in Radi (1970). The first-order correction to the average potential is given by the second term of (C.21). In real-space representation, substituting Eqs. (C.22) and (C.24) into (C.21), we obtain
V^(1)(r, r′) = Σ_{i,j} ∫∫ dR dR′ φ_i(r − R) φ_j(r′ − R′) ⟨[δ(R − r_i) − ⟨δ(R − r_i)⟩] G₀ [δ(R′ − r_j) − ⟨δ(R′ − r_j)⟩]⟩.   (C.26)
Using the following identity, valid for ε > 0,

(1/iℏ) ∫₀^∞ dt exp[(i/ℏ)(E_k + E_a + iε − h₀ − H₀)t] = 1/(E_k + E_a + iε − h₀ − H₀),   (C.27)
we obtain from Eq. (C.26):

V^(1)(r, r′) = Σ_{i,j} ∫∫ dR dR′ φ_i(r − R) φ_j(r′ − R′) (1/iℏ) ∫₀^∞ dt exp[(i/ℏ)(E_k + E_a + iε − h₀)t] (1/Z) Σ_a exp(−E_a/k_B T)
× ⟨φ_a| [δ(R − r_i) − ⟨δ(R − r_i)⟩] exp[−(i/ℏ)H₀t] [δ(R′ − r_j) − ⟨δ(R′ − r_j)⟩] |φ_a⟩
= Σ_{i,j} ∫∫ dR dR′ φ_i(r − R) φ_j(r′ − R′) (1/iℏ) ∫₀^∞ dt exp[(i/ℏ)(E_k + iε − h₀)t] × {⟨δ[R − r_i(t)] δ[R′ − r_j(0)]⟩ − ⟨δ[R − r_i(t)]⟩⟨δ[R′ − r_j(0)]⟩}.   (C.28)
It should be noted that in Eq. (C.28) we have used the Heisenberg representation for the time-dependent operator r_i(t):

r_i(t) = exp[(i/ℏ)H₀t] r_i exp[−(i/ℏ)H₀t].   (C.29)

Since the energies of the phonons are much smaller than the energy of the incident electrons, we may use the static approximation and replace the time-dependent operator r_i(t) by the static operator r_i(0) = r_i.
Equation (C.28) then becomes

V^(1)(r, r′) = −(2m/ℏ²) Σ_{i,j} ∫∫ dR dR′ φ_i(r − R) G₀(r, r′) φ_j(r′ − R′) × [⟨δ(R − r_i)δ(R′ − r_j)⟩ − ⟨δ(R − r_i)⟩⟨δ(R′ − r_j)⟩],   (C.30)

where

G₀(r, r′) = −(1/4π) exp(ik₀|r − r′|)/|r − r′|   (C.31)

is the free-space Green function (A.7). When acting on the dynamical electron wave function, we have
V^(1)Ψ(r) = −(2m/ℏ²) Σ_{i,j} ∫ dr′ ∫∫ dR dR′ φ_i(r − R) G₀(r, r′) φ_j(r′ − R′) × [⟨δ(R − r_i)δ(R′ − r_j)⟩ − ⟨δ(R − r_i)⟩⟨δ(R′ − r_j)⟩] Ψ(r′).   (C.32)
In Eq. (C.32), we note that the integrand depends on the phase difference between the waves scattered at r and r′ via a rapidly oscillating function exp(ik₀|r − r′|). For i ≠ j, contributions from different atoms cancel each other after the integration. The total contribution will therefore be the same as from independent atoms. This result is equivalent to the use of an Einstein model for TDS (Hall and Hirsch, 1965). Equation (C.32) now becomes

V^(1)Ψ(r) = −(2m/ℏ²) Σ_i ∫ dr′ ∫∫ dR dR′ φ_i(r − R) G₀(r, r′) φ_i(r′ − R′) × [⟨δ(R − r_i)δ(R′ − r_i)⟩ − ⟨δ(R − r_i)⟩⟨δ(R′ − r_i)⟩] Ψ(r′).   (C.33)
Fourier transforming Eq. (C.33), using (A.6) and the definitions

v_i(g) = ∫ φ_i(r) exp(−ig · r) dr,  Ψ(g) = (1/(2π)³) ∫ Ψ(r) exp(−ig · r) dr,

we obtain

ℱ[V^(1)Ψ](g) = ∫ V(g, h)Ψ(h) dh,   (C.34)
where V is the volume of the crystal, and

V(g, h) = −(2m/ℏ²)(1/V) Σ_i ∫ dk [v_i(g − k)v_i(k − h)/(k₀² − k² + iε)] exp[−i(g − h) · R_i] × {⟨exp[−i(g − h) · u_i]⟩ − ⟨exp[−i(g − k) · u_i]⟩⟨exp[−i(k − h) · u_i]⟩}.   (C.35)
Using Dirac's formula,

1/(k₀² − k² + iε) = ℘[1/(k₀² − k²)] − iπδ(k₀ − k)/(2k₀),   (C.36)

and substituting (C.36) into (C.35), we see that V^(1)Ψ is a complex quantity. The real part of this complex quantity is due to virtual diffuse scattering, and the imaginary part is due to real diffuse scattering. It has
been shown that the contribution resulting from virtual diffuse scattering is an order of magnitude less than that resulting from real diffuse scattering (Rez, 1978). In what follows we shall therefore consider only the imaginary part. Equation (C.35) then becomes

V′(g, h) = iπ(2m/ℏ²)(1/2k₀V) Σ_i ∫ dk δ(k − k₀) v_i(g − k)v_i(k − h) exp[−i(g − h) · R_i] × {exp[−M(g − h)] − exp[−M(g − k) − M(k − h)]},   (C.37)

where we have used ⟨exp(−ig · u_i)⟩ = exp[−M(g)].
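The δ function that (C.36) introduces into (C.37) arises as the ε → 0 limit of a narrow Lorentzian. A small numerical check of that limit (an illustrative aside, with an arbitrary smooth test function):

```python
import numpy as np
from scipy.integrate import quad

# Check the delta-function limit behind Dirac's formula (C.36):
# (1/pi) * eps / (x^2 + eps^2)  ->  delta(x)  as eps -> 0,
# so integrating it against a smooth f should return f(0).
eps = 1e-4
f = lambda x: np.exp(-(x - 0.3) ** 2)    # arbitrary smooth test function

val = quad(lambda x: f(x) * eps / np.pi / (x**2 + eps**2),
           -10, 10, points=[0], limit=400)[0]
assert abs(val - f(0)) < 2e-3
print(val, f(0))
```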
Using a high-energy electron approximation, we may neglect the curvature of the Ewald sphere, letting k − h = q with k lying on the sphere k = k₀. Under this approximation, Eq. (C.37) reduces to

V′(g, h) = iπ(2m/ℏ²)(1/2k₀V) Σ_i ∫ dq v_i(q)v_i(g − h − q) exp[−i(g − h) · R_i] × {exp[−M(g − h)] − exp[−M(q) − M(g − h − q)]} ≡ V′(g − h).   (C.38)
Substitution of (C.38) into (C.34) gives

ℱ[V^(1)Ψ_k](g) = ∫ V′(g − h)Ψ(h) dh = V′(g) ∗ Ψ(g).   (C.39)
If we inverse Fourier transform (C.39), we thus obtain an approximate expression for V^(1)Ψ_k,

V^(1)Ψ_k = ℱ⁻¹[V′(g) ∗ Ψ(g)] = ℱ⁻¹[V′(g)] Ψ_k(r),   (C.40)

which gives

V^(1)(r, r′) = V^(1)(r) = ∫ V′(g) exp(ig · r) dg,   (C.41)
i.e., the optical potential V^op = ⟨V(r)⟩ + V^(1)(r) is now a simple local potential. Procedures for calculating the optical potential are similar to those for the elastic potential, as discussed in Appendix B. The only difference is that now, for each atom, the atomic scattering amplitude is composed of two terms. The first term results from the averaged potential ⟨V(r)⟩ and is identical to that used in Appendix B. The second term results from V^(1) and can be calculated using (C.41) or one of the published routines, such as those by Bird and King (1990) and Weickenmeier and Kohl (1991).
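The step from (C.39) to (C.40) is the Fourier convolution theorem: a convolution in reciprocal space corresponds to a pointwise product in real space. A discrete analogue of this step can be checked with NumPy's DFT convention (an illustrative aside; the arrays are random stand-ins, not physical quantities):

```python
import numpy as np

# Discrete analogue of (C.39)-(C.40): a circular convolution in g-space
# maps, under the inverse DFT, to a pointwise product in real space.
rng = np.random.default_rng(0)
n = 64
V_r = rng.standard_normal(n)     # stand-in for F^{-1}[V'(g)](r)
psi_r = rng.standard_normal(n)   # stand-in for Psi(r)

V_g = np.fft.fft(V_r)
psi_g = np.fft.fft(psi_r)

# circular convolution (V' * Psi)(g_k) = sum_m V'(g_m) Psi(g_{k-m})
conv_g = np.array([sum(V_g[m] * psi_g[(k - m) % n] for m in range(n))
                   for k in range(n)])

# inverse transform of (1/n) * convolution equals the real-space product
assert np.allclose(np.fft.ifft(conv_g / n), V_r * psi_r)
print("convolution theorem verified")
```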
REFERENCES

Ashcroft, N. W., and Mermin, N. D. (1976). "Solid State Physics." Saunders, Philadelphia.
Bethe, H. (1928). Ann. Phys. 87, 55.
Bird, D. M. (1989). J. Electron Microsc. Tech. 13, 77.
Bird, D. M., and King, Q. A. (1990). Acta Crystallogr., Sect. A 46, 202.
Bird, D. M., and Saunders, M. (1992). Acta Crystallogr., Sect. A 48, 555.
Buxton, B. F., Eades, J. A., Steeds, J. W., and Rackham, G. M. (1976). Philos. Trans. R. Soc. London 281, 171.
Colella, R. (1972). Acta Crystallogr., Sect. A 28, 11.
Cowley, J. M. (1966). Prog. Mater. Sci. 13, 269.
Cowley, J. M. (1981). "Diffraction Physics." North-Holland, Amsterdam.
Cowley, J. M. (1988). Acta Crystallogr., Sect. A 44, 847.
Cowley, J. M., editor (1993). "Electron Diffraction Techniques." Oxford Univ. Press, Oxford.
Darwin, C. G. (1914). Philos. Mag. 27, 315 and 675.
Dederichs, P. H. (1972). In "Solid State Physics" (H. Ehrenreich, F. Seitz, and D. Turnbull, Eds.), Vol. 27, p. 125. Academic Press, New York.
Doyle, P. A., and Turner, P. S. (1968). Acta Crystallogr., Sect. A 24, 390.
Dudarev, S. L., and Peng, L.-M. (1993). Proc. R. Soc. London A 440, 95 and 117.
Dudarev, S. L., Peng, L.-M., and Whelan, M. J. (1993a). Proc. R. Soc. London A 440, 567.
Dudarev, S. L., Peng, L.-M., and Whelan, M. J. (1993b). Phys. Rev. B 48, 13408.
Dudarev, S. L., and Whelan, M. J. (1993). Phys. Rev. Lett. 70, 2904.
Eades, J., editor (1989). J. Electron Microsc. Tech., Vol. 13 (a special issue on CBED).
Echenique, P. M., and Pendry, J. B. (1978). J. Phys. C: Solid State Phys. 11, 2065.
Ewald, P. P. (1917). Ann. Phys. 54, 519.
Emslie, A. G. (1934). Phys. Rev. 45, 43.
Fujiwara, K. (1961). J. Phys. Soc. Jpn. 16, 2226.
Garbow, B. S., Boyle, J. M., Dongarra, J. J., and Moler, C. B. (1977). "Lecture Notes in Computer Science," Vol. 51. Springer-Verlag, New York/Berlin.
Gjønnes, J. (1962). Acta Crystallogr. 15, 703.
Gjønnes, J., and Moodie, A. F. (1965). Acta Crystallogr. 19, 65.
Gjønnes, J. (1993). In "Electron Diffraction Techniques" (J. M.
Cowley, Ed.), Oxford Univ. Press, Oxford.
Lehmpfuhl, G., and Dowell, W. C. T. (1986). Acta Crystallogr., Sect. A 42, 569.
Hall, C. R., and Hirsch, P. B. (1965). Proc. R. Soc. London A 286, 158.
Harris, J. J., Joyce, B. A., and Dobson, P. J. (1981). Surf. Sci. 103, L90.
Hashimoto, H., Howie, A., and Whelan, M. J. (1962). Proc. R. Soc. London A 269, 80.
Hirsch, P. B., Howie, A., Nicholson, R. B., Pashley, D. W., and Whelan, M. J. (1965). "Electron Microscopy of Thin Crystals." Butterworths, London.
Howie, A., and Whelan, M. J. (1961). Proc. R. Soc. London A 263, 217.
Howie, A. (1962). J. Phys. Soc. Jpn. 17 (Suppl. BII), 118 (see discussion of Fujiwara's paper by Whelan).
Howie, A. (1963). Proc. R. Soc. London A 271, 268.
Howie, A. (1978). In "Diffraction and Imaging Techniques in Materials Science" (S. Amelinckx, R. Gevers, and J. van Landuyt, Eds.), p. 295. North-Holland, Amsterdam.
Howie, A. (1979). J. Microsc. 117, 11.
Humphreys, C. J., and Hirsch, P. B. (1968). Philos. Mag. 18, 115.
Humphreys, C. J. (1979). Rep. Prog. Phys. 42, 1825.
Ichimiya, A. (1983). Jpn. J. Appl. Phys. 22, 176.
Ichimiya, A. (1987). Surf. Sci. 187, 194.
James, R. W. (1962). "The Optical Principles of the Diffraction of X-rays." Bell, London.
Jesson, D. E., Pennycook, S. J., and Baribeau, J.-M. (1991). Phys. Rev. Lett. 66, 750.
Kainuma, Y., and Yoshioka, H. (1966). J. Phys. Soc. Jpn. 21, 1352.
Kambe, K. (1988). Ultramicroscopy 25, 259.
Kikuchi, S. (1928). Proc. Imp. Acad. Jpn. 4, 275.
Kikuchi, S., and Nishikawa, S. (1928). Proc. Imp. Acad. Jpn. 4, 475.
Lamla, E. (1938). Ann. Phys. 32, 178 and 225.
Lent, C. S., and Cohen, P. I. (1984). Surf. Sci. 139, 121.
Lipson, H., and Cochran, W. (1968). "The Determination of Crystal Structures." Bell, London.
Maksym, P. A., and Beeby, J. L. (1981). Surf. Sci. 110, 423.
McRae, E. G. (1979). Rev. Mod. Phys. 51, 541.
Marks, L. D., and Ma, Y. (1988). Acta Crystallogr., Sect. A 45, 392.
Marten, H., and Meyer-Ehmsen, G. (1985). Surf. Sci. 151, 570.
Miyake, S., Hayakawa, K., and Miida, R. (1968). Acta Crystallogr., Sect. A 24, 182.
Molière, G. (1939). Ann. Phys. 35, 172 and 297.
Mott, N. F., and Massey, H. S. W. (1965). "The Theory of Atomic Collisions," Ch. 13. Clarendon Press, Oxford.
Müller, K. A., Berlinger, W., and Waldner, F. (1968). Phys. Rev. Lett. 21, 814.
NAG (1989). NAG Fortran Library Manual. NAG, Oxford.
Newton, R. G. (1966). "Scattering Theory of Waves and Particles." McGraw-Hill, New York.
Pendry, J. B. (1974). "Low Energy Electron Diffraction." Academic Press, New York/London.
Pendry, J. B., Heinz, K., and Oed, W. (1988). Phys. Rev. Lett. 61, 2953.
Peng, L.-M., and Cowley, J. M. (1986). Acta Crystallogr., Sect. A 42, 545.
Peng, L.-M., and Cowley, J. M. (1988a). Surf. Sci. 199, 609.
Peng, L.-M., and Cowley, J. M. (1988b). Ultramicroscopy 26, 277.
Peng, L.-M., Cowley, J. M., and Yao, N. (1988). Ultramicroscopy 26, 189.
Peng, L.-M. (1989). Surf. Sci. 222, 296.
Peng, L.-M., and Gjønnes, J. (1989). Acta Crystallogr., Sect. A 45, 699.
Peng, L.-M., and Whelan, M. J. (1990). Proc. R.
Soc. London A 431, 111 and 125.
Peng, L.-M., and Whelan, M. J. (1991a). Acta Crystallogr., Sect. A 47, 101.
Peng, L.-M., and Whelan, M. J. (1991b). Surf. Sci. 238, L446.
Peng, L.-M., and Whelan, M. J. (1991c). Proc. R. Soc. London A 432, 195.
Peng, L.-M., and Whelan, M. J. (1991d). Proc. R. Soc. London A 435, 257 and 269.
Peng, L.-M., Gjønnes, K., and Gjønnes, J. (1992). Microsc. Res. Tech. 20, 360.
Peng, L.-M., Dudarev, S. L., and Whelan, M. J. (1993). Phys. Lett. A 175, 461.
Peng, L.-M., and Dudarev, S. L. (1993). Surf. Sci. 298, 316; Ultramicroscopy 52, 319.
Peng, L.-M. (1994). Surf. Sci. 316, L1049.
Peng, L.-M., and Zuo, J. M. (1994), submitted for publication.
Pennycook, S. J., and Boatner, L. A. (1989). Nature (London) 336, 565.
Pennycook, S. J., and Jesson, D. E. (1993). Ultramicroscopy 37, 14.
Pogany, A. P., and Turner, P. S. (1968). Acta Crystallogr., Sect. A 24, 103.
Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. (1986). "Numerical Recipes: The Art of Scientific Computing." Cambridge Univ. Press, Cambridge.
Qian, W., Spence, J. C. H., and Zuo, J. M. (1993). Acta Crystallogr., Sect. A 49, 436.
Radi, G. (1970). Acta Crystallogr., Sect. A 26, 41.
Reimer, L. (1989). "Transmission Electron Microscopy," 2nd ed. Springer-Verlag, Berlin/New York.
Reimer, L. (1991). In "Advances in Electronics and Electron Physics" (P. W. Hawkes, Ed.), Vol. 81, p. 43. Academic Press, New York.
Rez, P., Humphreys, C. J., and Whelan, M. J. (1977). Philos. Mag. 35, 81.
Rez, P. (1978). D. Phil. thesis, University of Oxford.
Rous, P. J., and Pendry, J. B. (1989). Surf. Sci. 219, 355.
Saxton, O., O'Keefe, M. A., Cockayne, D. J., and Wilkens, M. (1983). Ultramicroscopy 12, 75.
Schiff, L. I. (1938). "Quantum Mechanics." McGraw-Hill, London.
Shinohara, K. (1935). Phys. Rev. 47, 730.
Shockley, W. (1939). Phys. Rev. 56, 317.
Smart, D. J., and Humphreys, C. J. (1980). In "EMAG 1980" (T. Mulvey, Ed.), p. 211. Institute of Physics, London.
Smith, A. E., and Lynch, D. F. (1988). Acta Crystallogr., Sect. A 44, 780.
Speer, S., Spence, J. C. H., and Ihrig, E. (1990). Acta Crystallogr., Sect. A 46, 763.
Spence, J. C. H. (1988). "Experimental High Resolution Electron Microscopy." Oxford Univ. Press, Oxford.
Spence, J. C. H., and Zuo, J. M. (1992). "Electron Microdiffraction." Plenum, New York.
Spence, J. C. H. (1993). Acta Crystallogr., Sect. A 49, 231.
Steeds, J. W. (1984). In "Quantitative Electron Microscopy" (J. N. Chapman and A. J. Craven, Eds.), p. 49.
Steeds, J. W., Baker, J. R., and Vincent, R. (1982). In "Electron Microscopy," Vol. 1, p. 617.
Takayanagi, K., Tanishiro, Y., Takahashi, M., and Takahashi, S. (1985). J. Vac. Sci. Technol. A 3, 1502.
Tamm, I. (1932). Phys. Z. Sowjetunion 1, 733.
Tanaka, M., Sekii, H., and Nagasawa, T. (1983). Acta Crystallogr., Sect. A 39, 825.
Tanaka, M., and Tsuda, K. (1990). In Proc. XII Int. Congress for Electron Microscopy. San Francisco Press, San Francisco.
Vincent, R., Bird, D. M., and Steeds, J. W. (1984). Philos. Mag. A 50, 765.
Wang, S. Q., Peng, L.-M., Duan, X. F., and Chu, Y. M. (1992a). Ultramicroscopy 45, 405.
Wang, S. Q., Peng, L.-M., Xin, Y., Chu, Y. M., and Duan, X. F. (1992b). Philos. Mag. Lett. 66, 225.
Wang, Z. L., Liu, J., Lu, P., and Cowley, J. M. (1989). Ultramicroscopy 27, 101.
Weickenmeier, A., and Kohl, H. (1991). Acta Crystallogr., Sect. A 47, 590.
Whelan, M. J., and Hirsch, P. B. (1957). Philos. Mag. 2, 1121 and 1303.
Whelan, M. J. (1965). J. Appl. Phys. 36, 2099 and 2104.
Wilkinson, J. H. (1988). "The Algebraic Eigenvalue Problem." Clarendon Press, Oxford.
Wood, E. A. (1964). J. Appl. Phys. 35, 1306.
Yin, M. T., and Cohen, M. L. (1981). Phys. Rev. Lett. 25, 2303.
Yoshioka, H. (1957). J. Phys. Soc. Jpn. 12, 618.
Zachariasen, W. H. (1945). "Theory of X-ray Diffraction in Crystals." Wiley, New York.
Zhao, T. C., Poon, H. C., and Tong, S. Y. (1988). Phys. Rev. B 38, 1172.
Zuo, J. M., Spence, J. C. H., and O'Keefe, M. (1988). Phys. Rev. Lett. 61, 353.
Zuo, J. M. (1991). Acta Crystallogr., Sect. A 47, 87.
Zuo, J. M., and Spence, J. C. H. (1991). Ultramicroscopy 35, 185.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 90
Parallel Image Processing with Image Algebra on SIMD Mesh-Connected Computers

HONGCHI SHI, GERHARD X. RITTER, AND JOSEPH N. WILSON

Center for Computer Vision and Visualization, Department of Computer and Information Sciences, University of Florida, Gainesville, Florida
I. Introduction . . . . . . . . . . . . . . . . . . . . . . . 353
   A. Image Processing and Parallelism . . . . . . . . . . . . 354
   B. Image Algebra and Parallel Image Processing . . . . . . . 355
   C. Notation . . . . . . . . . . . . . . . . . . . . . . . 357
II. Overview of Image Algebra . . . . . . . . . . . . . . . . 357
   A. Images . . . . . . . . . . . . . . . . . . . . . . . . 358
   B. Templates . . . . . . . . . . . . . . . . . . . . . . 359
   C. Pixelwise Operations . . . . . . . . . . . . . . . . . 360
   D. Global Operations . . . . . . . . . . . . . . . . . . 360
   E. Image-Template Operations . . . . . . . . . . . . . . 361
III. SIMD Mesh-Connected Computers . . . . . . . . . . . . 363
   A. SIMD Parallel Computers . . . . . . . . . . . . . . . 364
   B. Mesh-Connected Interconnection Networks . . . . . . . . 365
   C. Mapping Images onto SIMD Mesh-Connected Computers . . 366
IV. Parallel Algorithms for Image Algebra Primitives . . . . . . 368
   A. Pixelwise Operations . . . . . . . . . . . . . . . . . 369
   B. Global Operations . . . . . . . . . . . . . . . . . . 369
   C. Image-Template Operations . . . . . . . . . . . . . . 371
V. Parallel Image Processing with Image Algebra . . . . . . . . 382
   A. Abingdon Cross Benchmark . . . . . . . . . . . . . . 383
   B. Shrinking Binary Image Components . . . . . . . . . . 389
   C. Labeling Binary Image Components . . . . . . . . . . . 396
   D. Computing Properties of Image Components . . . . . . . 415
VI. Concluding Remarks and Future Research . . . . . . . . . 424
   A. Concluding Remarks . . . . . . . . . . . . . . . . . 424
   B. Future Research . . . . . . . . . . . . . . . . . . . 425
References . . . . . . . . . . . . . . . . . . . . . . . . 427
I. INTRODUCTION
Image processing and image analysis involve intensive computations, demanding high-performance computers for practical applications. Parallel computing appears to be the only economical way to achieve the level of
performance required by image processing and analysis (Chaudhary and Aggarwal, 1990). Programming parallel computers for image processing is quite difficult without a high-level language well suited to image processing. Image algebra is a unified mathematical theory for image processing and analysis. It also provides a highly parallel language for image processing. This article explores the relationship between image algebra and parallel image-processing algorithms, and how well single instruction multiple data (SIMD) mesh-connected computers are suited to image algebra.

In this article, we select a group of image algebra primitives useful for parallel image processing and develop efficient algorithms to implement these primitives on SIMD mesh-connected computers. Image algebra treats images as primary operands and addresses implicitly the parallelism in image processing. We demonstrate that image algebra can serve as a good model for parallel image processing. This is accomplished by using image algebra to describe several highly parallel algorithms we have developed for various image-processing tasks. We describe an efficient algorithm for the Abingdon Cross image-processing benchmark. We propose a new binary image component shrinking algorithm. We describe and analyze several local image component labeling algorithms, one of which positively answers an open question. We also define a special class of image-template operations that prove useful for computing properties of image components and develop an efficient general algorithm for them. Finally, we give some suggestions for future research on parallel image processing with image algebra on parallel computers.

A. Image Processing and Parallelism
Image processing is an important area of research concerned with the manipulation and analysis of images by computers (Rosenfeld and Kak, 1982). It is routinely employed in many application domains, such as document analysis, industrial robots for product assembly and inspection, medical diagnosis, military reconnaissance, and machine processing of aerial and satellite imagery for weather prediction and crop assessment (Gonzalez and Woods, 1992). Image processing is also a challenging area. Image-processing problems usually involve very intensive computations. Consider a sequence of images at medium resolution (512 x 512 pixels) and standard frame rate (30 frames per second) in color (24 bits per pixel) which represents a data input rate of about 23.6 million bytes per second (Weems et al., 1991). In order to enhance and segment such a sequence of images and to extract various
features from it, we may have to apply many thousands of operations to each input pixel. Such an intensive computation, if handled by even the largest and most powerful serial computer, can be hopelessly slow. A serial computer is considered inadequate for image-processing problems (Uhr, 1984), and parallel processing is now generally accepted as necessary to support image-processing applications (Weems et al., 1992; Chaudhary and Aggarwal, 1990).

On the other hand, most image-processing algorithms are inherently parallel because they involve similar computations for all pixels in an image, which makes load balancing easy. The massive amount of data in an image provides a natural source of parallelism, which can be exploited by partitioning the image into sections (Jamieson and Tanimoto, 1987; Chaudhary and Ranka, 1992; Webb, 1992). It is generally realized that parallel processing appears to be the only way to achieve the speedups required by image processing (Chaudhary and Aggarwal, 1990), and image processing provides significant data parallelism that can be used in parallel processing. However, it is not always clear how to employ this data parallelism effectively. In fact, effective employment of data parallelism in image processing remains an active area of research.

B. Image Algebra and Parallel Image Processing
To support parallel image processing, we need a unified theory that can serve as a model for image-processing algorithms and that also fits well into the theory and practice of parallel architectures. The search for such a theory started more than 30 years ago. Building on von Neumann's logical theory of automata (von Neumann, 1951), Unger proposed a cellular array machine that could implement in parallel many algorithms for image processing and analysis (Unger, 1958). Over time, more cellular array machines have been proposed. Cellular array machines are usually assumed to have a fixed capacity of local memory. With the advent of VLSI technology, many mesh-connected computers have been developed for image processing. Although mesh-connected computers are closely related to cellular array machines, they allow their local memory to increase with their array sizes. Architectures such as the ILLIAC (McCormick, 1963), the CLIP series (Duff, 1973; Duff, 1982; Fountain et al., 1988), the MPP (Batcher, 1980), the GAPP (Cloud and Holsztynski, 1984), and the Hughes 3D machine (Hughes Aircraft Company, 1992) are arrays of processing elements arranged as two-dimensional meshes with hardwired communication between neighboring processors.

The elementary operations of these machines create a mathematical basis
for the theoretical formalism capable of expressing a large number of algorithms for image processing and analysis. The formalism associated with these architectures is that of pixel neighborhood arithmetic and mathematical morphology, which is concerned with image filtering and analysis by structuring elements. Morphological concepts and methods were first unified by Serra and Sternberg into a coherent algebraic theory specifically designed for image processing and image analysis (Sternberg, 1980; Serra, 1982; Sternberg, 1985). More recently, a new theory was introduced by Maragos (1985) to unify a large class of linear and nonlinear systems under the theory of mathematical morphology. However, morphological methods have some well-known limitations because of their set-theoretic formulation, which is based on the Minkowski addition and subtraction of sets (Hadwiger, 1957). The morphology-based image algebras ignore the linear domains, transformations between different domains, and transformations between different value sets (Ritter et al., 1990; Ritter, 1991). Another algebraic structure formulated by Dougherty and Giardina (1987) defines a set of basic image operations. However, it provides too low a level of specification for expressing various image processing tasks (Ritter et al., 1990). Another class of parallel image-processing architectures is based on optical computing concepts and is intended to remove the interconnection bottleneck common to conventional cellular arrays (Huang et al., 1989; Huang, 1990; Fukui and Kitayama, 1992). These architectures use the inherent optical parallelism and three-dimensional free space interconnection capabilities to do parallel binary image processing. The formalism associated with these architectures is a coherent homogeneous algebra referred to as image logic algebra (Fukui and Kitayama, 1992) or binary image algebra (Huang et al., 1989; Huang, 1990). Binary image algebra is based on the set-theoretic formulation. 
It has three fundamental operations: the complement of an image, the union of two images, and the dilation of two images. For this reason, binary image algebra falls into the same class as other algebras based solely on mathematical morphology. Image logic algebra is also based on the set-theoretic formulation. It contains all the logic operations, dilation, and erosion of images. It also contains the image shift transformation, which translates an image vertically and horizontally; the image casting transformation, which allows the domain to be shrunk; and the multiple imaging transformation, which allows the domain to be enlarged. However, these transformations between domains are very simple and are not capable of effectively performing higher-level vision tasks.

The development of a more general unified algebraic structure grew out of a need, by the U.S. Air Force Systems Command, for a common image-processing language. The goal was the development of a complete, unified algebraic structure that provides a common mathematical environment for
image-processing algorithm development, optimization, comparison, coding, and performance evaluation. The image algebra developed by Ritter and his colleagues at the University of Florida proved to be highly successful and capable of fulfilling the tasks set forth by the government (Ritter et al., 1990; Ritter, 1991). It is a comprehensive and unified mathematical theory concerned with image processing and image analysis. It incorporates and extends mathematical morphology. Furthermore, it allows transformations between different domains and transformations between different value sets. It extends the neighborhood operations of morphology to more general image-template operations.

Programming parallel computers is very difficult at this time, as was programming serial computers in the 1960s (Webb, 1992). Each parallel computer has its own language, which runs efficiently on that computer only, and high-level programming tools for parallel computers tend to be tailored to particular machines. A high-level, architecture-independent language for parallel image processing is therefore desirable. Image algebra provides a powerful algebraic language for image processing that, if properly embedded into a high-level parallel language, will greatly increase a programmer's productivity: the programmer can concentrate on developing effective algorithms for particular image-processing problems without worrying about the parallel architecture.
C. Notation

The notation used in this paper is similar to that used in most technical texts. Any new notation introduced will be explained where it first appears. We will use log to denote the base 2 logarithm. We will also use the following notation to describe the asymptotic behavior of functions:

1. f(n) = O(g(n)) denotes the fact that there exist constants c and n₀ such that f(n) ≤ cg(n) for all n ≥ n₀;
2. f(n) = Ω(g(n)) denotes the fact that there exist constants c and n₀ such that f(n) ≥ cg(n) for all n ≥ n₀;
3. f(n) = Θ(g(n)) denotes the fact that there exist constants c₁, c₂, and n₀ such that c₁g(n) ≤ f(n) ≤ c₂g(n) for all n ≥ n₀.

II. OVERVIEW OF IMAGE ALGEBRA

Image algebra (IA) is a heterogeneous algebra concerned with image processing and image analysis. It provides a very general algebraic language for image processing. We present an overview of image algebra and select
a group of image algebra primitives for parallel image processing from the viewpoint of parallel processing.

In image algebra, the basic objects of image processing such as images, point sets, and templates are formally defined as its operands. A set of unary and binary operations is defined on and between these operands. Image algebra is inherently parallel in that all the primitive operations are defined to work on images, which are collections of pixels. The parallelism of image algebra is inherent in its operations, which removes the need for the sophisticated compiler analysis normally required to uncover available parallelism.

For parallel processing, image algebra can be considered a collection-oriented language that manipulates aggregate data structures and operations as a whole (Sipelstein and Blelloch, 1991). When implemented on parallel computers, image algebra can have images distributed across the available processing elements. Operations can then be performed by all the processing elements in single instruction multiple data (SIMD) fashion. Here, we describe image algebra only from the viewpoint of parallel image processing. A complete description of image algebra can be found in the papers by Ritter, Wilson, and Davidson (1990) and Ritter (1991, 1992).
A. Images
Images are the most fundamental operands in image algebra. They are defined in terms of two other types of elementary operands: value sets and point sets. Images are endowed with two types of information, namely the spatial relationship of the points and some type of numeric or other descriptive information associated with these points.

A value set F is a set of values that an image may assume. The most commonly used value sets in image processing are the sets of integers, real numbers, complex numbers, and binary numbers of length l, which are denoted by Z, R, C, and Z_2^l, respectively. The operations on and between elements of a given value set F are the usual arithmetic operations, lattice operations, or logic operations associated with F.

A point set X is simply a subset of some topological space. The most commonly used point sets in image processing are subsets of n-dimensional Euclidean space R^n. The operations on and between elements of point sets are the usual operations on coordinate points. The operations on and between point sets are ∪, ∩, \, choice, and card. The choice function returns an arbitrary element of a point set, while the card function returns the number of elements in a point set.
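Point sets and their operations can be modeled directly in a few lines of Python. The sketch below is purely illustrative and is not part of any image algebra implementation; the function names card and choice are our own, chosen to mirror the image algebra terms. A point set is represented as a Python set of coordinate tuples, so ∪, ∩, and \ map onto the built-in set operators.

```python
# Illustrative sketch: a point set as a set of coordinate tuples,
# with the card and choice operations described in the text.
def card(point_set):
    # card returns the number of elements in a point set
    return len(point_set)

def choice(point_set):
    # choice returns an arbitrary element of a point set
    return next(iter(point_set))

# A 2x2 rectangular point set, a subset of Z^2, and a second point set
X = {(0, 0), (0, 1), (1, 0), (1, 1)}
Y = {(0, 0), (2, 2)}

print(card(X))         # 4
print(choice(Y) in Y)  # True
print(X | Y)           # union; intersection is X & Y, difference X - Y
```

Because point sets are plain sets, the set-theoretic operations need no extra code, which reflects the claim in the text that these are "the usual" operations.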
Given a point set X and a value set F, an F-valued image a on X is a function a: X → F, which is usually expressed in terms of the graph of a by

a = {(x, a(x)): x ∈ X},

where a(x) ∈ F. An element (x, a(x)) of a is called a pixel, where x is the pixel location and a(x) is the pixel value at location x. The set of all F-valued images on X is denoted by F^X. On parallel computers, pixels of an image are commonly assigned to the processing elements by a mapping that relates each pixel location to the index of a processing element.

B. Templates
In image algebra, the definition of a template unifies and generalizes the usual concepts of templates, masks, windows, structuring elements, and neighborhood functions into one general mathematical entity. Templates are special types of images. An F-valued template from Y to X is an element of (F^X)^Y. If t ∈ (F^X)^Y, then for notational convenience we define t_y = t(y) in order to denote the image t(y) for each y ∈ Y. The pixel values t_y(x) of the image t_y = {(x, t_y(x)): x ∈ X} are called the weights of the template t at the point y, and y is called the target point of the image t_y. The pixel values of the template t = {(y, t_y): y ∈ Y} are F-valued images on X.

A parameterized F-valued template from Y to X with parameters in P is a function of the form t: P → (F^X)^Y. Here, P is called the set of parameters, and each p ∈ P is called a parameter for t. For each p ∈ P, t(p) is an F-valued template from Y to X.

Let X ⊂ R^n. A template t ∈ (F^X)^X is called a translation-invariant template or, simply, invariant template if and only if for x, y ∈ X with x + z, y + z ∈ X, where z ∈ R^n, we have t_y(x) = t_{y+z}(x + z). A template that is not translation-invariant is called a translation-variant template or, simply, variant template.

Translation-invariant templates can be defined pictorially. For example, a template t ∈ (R^X)^X with X ⊂ Z² defined by
t_y(x) = 1 if x = y,
         2 if x = y + (0, 1),
         3 if x = y + (1, 0),
         4 if x = y + (1, 1),
         0 otherwise
FIGURE 1. An invariant template.
is an invariant template, which can be pictorially represented as in Fig. 1, where the hashed cell denotes the location of the target pixel y.

The weight t_y(x) for a template t is usually a simple function of the target point y and the point x. It can be expressed by the same formula for every target point y. Thus, on a parallel computer, the weights t_y(x) can be computed by the corresponding processing elements in SIMD fashion.

C. Pixelwise Operations
The pixelwise operations form a major class of operations in image algebra. A pixelwise operation applies a function to every pixel of an image or to corresponding pixels of images. This class of operations maps perfectly to the parallel programming paradigm: all processors apply the same function to different elements at the same time.

Pixelwise operations on and between F-valued images are the natural induced operations of the algebraic system F. Any function f: F → F induces a function F^X → F^X, again denoted by f, and defined by

f(a) = {(x, c(x)): c(x) = f(a(x)), x ∈ X}.

In the same fashion, we can define n-ary functions of images. When an argument is just an element of F, we first extend it to a constant image that has the value of that argument at each location of X. The pixelwise operations are the operations on and between images defined in detail in Ritter et al. (1990) and Ritter (1991, 1992) and include the unary and binary arithmetic and logic operations, the characteristic functions, the restriction function that restricts an image to a subset of its point set, and the extension function that extends an image to another image on a larger point set.

D. Global Operations

The global operations act on images in their entirety. These operations usually require global communication when implemented on parallel computers.
Two simple global operations on images are the card function, which returns the number of pixels in an image, and the choice function, which returns an arbitrary point from the point set of an image.

Another important global operation is the global reduce operation. Suppose that X ⊂ R^n is finite, say X = {x₁, x₂, ..., xₘ}. If γ is an associative and commutative binary operation on the value set F, then the global reduce operation Γ on F^X induced by γ is defined by

Γa = Γ_{x∈X} a(x) = a(x₁) γ a(x₂) γ ... γ a(xₘ),

where a ∈ F^X. Thus, Γ: F^X → F. Different associative and commutative binary operations induce different global reduce operations. Typical uses of global reduce operations are summing the pixel values of an image, finding the minimum/maximum pixel value of an image, and using logic reduction to determine the and/or of a boolean image.

Image comparison operations are also global operations. A comparison of images can be performed by first comparing images pixelwise and then applying a logic global reduce operation to the pixel comparison result.

E. Image-Template Operations
In terms of image processing, operations that combine images and templates are the most powerful tools of image algebra. The image-template operations combine images and templates by using appropriate binary operations. They may be used for transformations between different domains and transformations between different value sets.

Suppose that A = (F, F₁, F₂, ○, γ, 1_γ, 0_○) is an algebra. The sets F₁, F₂, and F are value sets. The operation ○: F₁ × F₂ → F is a binary operation, and the operation γ: F × F → F is an associative and commutative binary operation on F. The constant 1_γ ∈ F is the identity element of the γ operation, i.e., f γ 1_γ = 1_γ γ f = f for every f ∈ F. The constant 0_○ ∈ F₂ is called the right zero element of the ○ operation, i.e., f₁ ○ 0_○ = 1_γ for every f₁ ∈ F₁. If a ∈ F₁^X and t ∈ (F₂^X)^Y, then the generalized product of a with t is the binary operation ⊛: F₁^X × (F₂^X)^Y → F^Y defined by

a ⊛ t = {(y, b(y)): b(y) = Γ_{x∈X} a(x) ○ t_y(x), y ∈ Y}.

We define the support of t_y with respect to the algebra A as S(t_y) = {x ∈ X: t_y(x) ≠ 0_○}. For example, if t ∈ (R^X)^Y, the support of t_y with respect to the algebra (R, R, R, ·, +, 0, 0) = (R, ·, +, 0) is S(t_y) = {x ∈ X: t_y(x) ≠ 0}. Whenever the algebra is understood, the support with
respect to the algebra is simply called the support. For a template t ∈ (F^X)^Y with support S(t_y) for t_y, we define the relative support of t_y as R(t_y) = {x − y: x ∈ S(t_y)}, and define the relative support of t as

R(t) = ∪_{y∈Y} R(t_y).

For example, the relative support of the template t shown in Fig. 1 is {(0, 0), (0, 1), (1, 0), (1, 1)}. Note that the relative supports of an invariant template at different target points are the same. A template t is called a square (rectangular, diamond, or circular, respectively) template if and only if all the points in its relative support form a square (rectangle, diamond, or circle, respectively).

A neighborhood on X is a function N: X → 2^X such that x ∈ N(x). A template t ∈ (F^X)^X is local with respect to N if and only if S(t_y) ⊆ N(y) for all y ∈ X. If N is understood, then we simply say that t is a local template. For example, if N(x) is a Moore neighborhood of x ∈ X and t is a local template with respect to N, then the support of t at each y ∈ X always fits in the Moore neighborhood of y. An image-template operation with a local template is called a local image-template operation. An image-template operation with a nonlocal template is called a nonlocal image-template operation.

One can derive many image-template operations by substituting appropriate binary operations for γ and ○ in the definition of the generalized product. Among the most commonly used image-template operations in image processing are the linear product ⊕, the additive maximum ⊞ and minimum ⊟, the multiplicative maximum ⊠ and minimum ⊡, and the bitwise exclusive-or or ⊻ and exclusive-or and ⊼, described as follows.

The linear image-template product ⊕ is obtained from the generalized product by substituting + and · for γ and ○, and is defined for integer-, real-, or complex-valued images. Specifically,

a ⊕ t = {(y, b(y)): b(y) = Σ_{x∈X} a(x) · t_y(x), y ∈ Y}.

The additive maximum ⊞ and minimum ⊟ may be applied to integer- or real-valued images. The operation ⊞ is obtained from the generalized product by substituting ∨ and + for γ and ○, while the operation ⊟ is obtained by substituting ∧ and + for γ and ○. Specifically,

a ⊞ t = {(y, b(y)): b(y) = ⋁_{x∈X} a(x) + t_y(x), y ∈ Y}

and

a ⊟ t = {(y, b(y)): b(y) = ⋀_{x∈X} a(x) + t_y(x), y ∈ Y}.
The multiplicative maximum ⊠ and minimum ⊡ may be applied to nonnegative integer- or real-valued images. The operation ⊠ is obtained from the generalized product by substituting ∨ and · for γ and ○, while the operation ⊡ is obtained by substituting ∧ and · for γ and ○. Specifically,

a ⊠ t = {(y, b(y)): b(y) = ⋁_{x∈X} a(x) · t_y(x), y ∈ Y}

and

a ⊡ t = {(y, b(y)): b(y) = ⋀_{x∈X} a(x) · t_y(x), y ∈ Y}.

The bitwise exclusive-or or ⊻ and exclusive-or and ⊼ may be applied to l-bit binary number-valued images. The operation ⊻ is obtained from the generalized product by substituting bitwise or (∨) and bitwise exclusive-or (xor) for γ and ○, while the operation ⊼ is obtained by substituting bitwise and (∧) and bitwise exclusive-or (xor) for γ and ○. Specifically,

a ⊻ t = {(y, b(y)): b(y) = ⋁_{x∈X} a(x) xor t_y(x), y ∈ Y}

and

a ⊼ t = {(y, b(y)): b(y) = ⋀_{x∈X} a(x) xor t_y(x), y ∈ Y}.
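The common thread in these products is the substitution of different (γ, ○) pairs into one generic loop. The Python sketch below is our own illustrative code (the names generalized_product, rel_weights, and the boundary convention of skipping points that fall outside the image are our assumptions, not part of the image algebra specification). It represents an image as a dictionary from points to values and an invariant template by its relative support and weights, and recovers the linear product and the additive maximum by substituting (γ, ○) = (+, ·) and (max, +).

```python
from functools import reduce

def generalized_product(a, rel_weights, gamma, circ):
    """Generic image-template product: b(y) is the gamma-reduction of
    a(x) circ t_y(x) over the translated support of the template.

    a: image as {point: value}.  rel_weights: invariant template given by
    its relative support, {offset: weight}.  gamma: reduction operation.
    circ: weighting operation.  Points outside the image are skipped
    (one of several possible boundary conventions).
    """
    b = {}
    for y in a:
        terms = [circ(a[(y[0] + dx, y[1] + dy)], w)
                 for (dx, dy), w in rel_weights.items()
                 if (y[0] + dx, y[1] + dy) in a]
        b[y] = reduce(gamma, terms)
    return b

# The invariant template of Fig. 1: weights 1..4 on a 2x2 relative support
t = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
# A constant image with value 1 on a 3x3 point set
a = {(i, j): 1 for i in range(3) for j in range(3)}

linear  = generalized_product(a, t, lambda u, v: u + v, lambda u, v: u * v)
add_max = generalized_product(a, t, max,                lambda u, v: u + v)

print(linear[(0, 0)])   # 1*1 + 1*2 + 1*3 + 1*4 = 10
print(add_max[(0, 0)])  # max(1+1, 1+2, 1+3, 1+4) = 5
```

Because γ is required to be associative and commutative, the reduction order inside each neighborhood does not matter, which is exactly what lets a parallel implementation evaluate the terms in any order or in parallel.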
The image-template operations are appropriate for use whenever any neighborhood- or component-based operation is to be performed on portions of an image. They are applicable in areas such as filtering, edge finding, region growing, and feature matching. In these applications, the template defines the relationship of a neighborhood of source image points to result image locations and gives values to these points. The weighting operation ○ determines the contribution of each of the neighborhood points to the reduction operation Γ, which determines the final result image value. The image-template operations implicitly specify communications between processing elements when implemented on parallel computers.

III. SIMD MESH-CONNECTED COMPUTERS

We now introduce the parallel architecture model that we intend to explore for parallel image processing using image algebra. Fine-grain SIMD parallel computers are believed to be well suited for applications that exhibit
significant data parallelism, such as that encountered in image processing (Maresca and Fountain, 1991). These computers are referred to as massively parallel computers and are based on a large number of small processing elements, regular interconnection networks, and a simple control mechanism. Although a large number of parallel architectures with various interconnection networks have been proposed, massively parallel architectures with mesh interconnection networks are suggested as natural parallel architectures for image processing because their structures closely mimic the structures of images and provide efficient local communication (Rosenfeld, 1983; Preston, 1983; Chaudhary and Ranka, 1992). We present an overview of SIMD mesh-connected computers.
Massively parallel computers are primarily SIMD architectures because of their massive-parallelism nature. The concept of an SIMD parallel computer is illustrated in Fig. 2. SIMD massively parallel computers are made of individual processing elements (PEs), each having a small arithmetic-logic unit (ALU) and a small local memory. Each PE is given a unique index. An SIMD massively parallel computer has a separate program memory and control unit. The control unit performs instruction sequencing, fetching, and decoding. Instructions are broadcast by the control unit to the PEs for execution.

PEs are connected by an interconnection network that plays a very important role in the performance of parallel algorithms. While many interconnection networks have been proposed and reviewed (Feng, 1981; Hwang and

FIGURE 2. The concept of an SIMD parallel computer: a control unit with program memory broadcasts instructions to an array of PEs connected by an interconnection network, with I/O attached.
Briggs, 1984; Nigam, 1992), the two-dimensional mesh network is the simplest one and can be efficiently implemented with VLSI technology. Here, we consider SIMD massively parallel computers with mesh-connected interconnection networks. Since most images in image-processing applications are two-dimensional and most processor arrays are also two-dimensional, we consider processing two-dimensional images on two-dimensional processor arrays. Extensions to higher dimensions can be worked out from the discussion of the two-dimensional case.

B. Mesh-Connected Interconnection Networks
In a two-dimensional m × n mesh-connected computer (MCC), the PEs are logically arranged as a two-dimensional array. Each PE is connected to its nearest 4 or 8 neighbors. A 4-connected mesh is shown in Fig. 3, while an 8-connected mesh is shown in Fig. 4. We denote the processor in the ith row and jth column of the mesh by PE(i, j), with PE(0, 0) in the upper left corner of the mesh. Sometimes we use a one-dimensional indexing of the mesh in which (i, j) is mapped to i·n + j. Data may be transmitted from one PE to another via the mesh network. It is assumed that a PE can transmit data to any of its nearest neighbors in unit time, but all PEs must transmit in the same direction to their neighbors.

A minor variation on the mesh is a mesh with wraparound connections between PEs on the edges. For a 4-connected mesh, the right end of each row connects to the left end of the row, and the bottom of each column connects to its top. For an 8-connected mesh, the diagonal connections are also wraparound. The ILLIAC, the CLIP, and the MPP are examples of mesh-connected computers.
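The indexing conventions just described are easy to make concrete. The short Python sketch below is our own illustrative code (the function names are hypothetical): it converts between the two-dimensional index (i, j) and the one-dimensional index i·n + j, and computes the wraparound neighbors of a PE in a 4-connected mesh with wraparound connections.

```python
def to_1d(i, j, n):
    # One-dimensional index of PE(i, j) in an m x n mesh (row-major)
    return i * n + j

def to_2d(k, n):
    # Inverse mapping: recover (i, j) from the one-dimensional index
    return divmod(k, n)

def torus_neighbors(i, j, m, n):
    # 4-connected mesh with wraparound: each row and each column closes
    # into a ring, so indices are taken modulo the mesh dimensions
    return {
        "north": ((i - 1) % m, j),
        "south": ((i + 1) % m, j),
        "west":  (i, (j - 1) % n),
        "east":  (i, (j + 1) % n),
    }

print(to_1d(2, 3, 8))                        # 19
print(to_2d(19, 8))                          # (2, 3)
print(torus_neighbors(0, 0, 4, 4)["north"])  # (3, 0): top wraps to bottom
```

The modulo arithmetic in torus_neighbors is exactly the wraparound rule stated in the text: dropping the `% m` and `% n` (and rejecting out-of-range indices instead) would give the plain, non-wraparound 4-connected mesh.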
FIGURE 3. Diagram of a 4-connected mesh array.
FIGURE 4. Diagram of an 8-connected mesh array.
MCCs have regular interconnection structures suitable for VLSI implementation. The area occupied by the mesh network in an MCC does not increase faster than the size of the mesh. However, the worst-case data movement in an n × n mesh may take O(n) time due to its large diameter of O(n). MCCs seem to be natural architectures for image processing because their network structures are the same as image structures. Their large diameters, however, make image-processing algorithms involving communication over long distances inefficient.

In the word model, we assume that the interconnection links between PEs are word-wide. A word operation or transmission of a word can be accomplished in a single step. When we consider the time complexity of an algorithm in this model, we count only the number of word steps. In the bit model, we assume that the interconnection links between PEs are bit-wide. Operations on words are broken down into operations on individual bits. A bit operation or transmission of a bit can be done in one step. In this model, time is measured in bit steps. Usually, a time complexity measured in the bit model is larger than that in the word model by a factor of the number of bits in a word. We will use the word model unless we mention the bit model explicitly.

C. Mapping Images onto SIMD Mesh-Connected Computers
To map an image onto the array of processors, we wish to assign each PE a pixel. If the array is smaller than the image, there are two ways to handle this (Rosenfeld, 1983). One way is to process the image a block at a time and keep track of what happens where the blocks meet or overlap, as shown in Fig. 5. Alternatively, if the PEs have enough storage capacity, we can assign
FIGURE 5. Processing of a large image (n × n) on a small processor array (m × m) a block at a time.
each PE a block of image pixels, and neighboring PEs must then exchange information about all the pixels located on the borders of their blocks, as shown in Fig. 6.

For parallel computations on MCCs, it is always easy to convert an algorithm designed for a large network of processors into an equally efficient algorithm for a smaller network of processors (Leighton, 1992). For an algorithm processing an n × n image on an n × n processor array PA₁, with each pixel assigned to a PE, we can always convert the algorithm so that it can run on an m × m processor array PA₂, where m ≤ n, by having each PE of PA₂ process ⌈n²/m²⌉ pixels to simulate
FIGURE 6. Processing of a large image (n × n) on a small processor array (m × m) by assigning an (n/m) × (n/m) block to each PE.
⌈n²/m²⌉ PEs of PA₁. This will induce a slowdown of at most ⌈n²/m²⌉, which is to be expected, since we use a factor of ⌈n²/m²⌉ fewer processors. Thus, if we can design an algorithm that is efficient for a large array of processors, we can scale it down to obtain an algorithm that is just as efficient for a smaller array. We will only consider mapping one pixel to each PE.

A mapping of an image a ∈ F^X onto a processor array PA is a simple function h that assigns the pixel at location x ∈ X of the image to a processing element PE(i, j) of PA, i.e., h(x) = (i, j). The simplest mapping is the identity function that maps the pixel at location (i, j) to PE(i, j). When an image is mapped to a processor array, each PE should have a flag to indicate whether or not it corresponds to a pixel of the image.

IV. PARALLEL ALGORITHMS FOR IMAGE ALGEBRA PRIMITIVES
Image algebra provides a powerful language for image processing. Several image algebra programming languages have been developed to implement some subalgebras of image algebra. They include Image Algebra FORTRAN (Wilson et al., 1989), Image Algebra C (Perry, 1987), Image Algebra Ada (Wilson, 1991; Yoder and Cockerham, 1990), and Image Algebra C++ (Wilson, 1993). They are high-level languages with extensions of images as basic data types and image algebra primitives as basic operators. On serial machines, these languages simplify programming tasks for image processing because they replace large blocks of code with short algebraic statements. In order to increase the performance of image algebra implementations, researchers have attempted to implement image algebra primitives on special-purpose architectures. The main idea is that a host machine controls and communicates with a special-purpose architecture to execute image algebra primitives (Langhorne, 1990). The architectures that have been targeted thus far include cellular arrays (Ritter and Gader, 1987), transputers (Crookes et al., 1990), ERIM's Cytocomputer, Honeywell's PREP, the CM2 Connection Machine (Wilson, 1988; Wilson et al., 1988), Aladdin (Wilson and Sweat, 1991), VITec-50 (Shi and Wilson, 1993), MasPar (Meyer and Davidson, 1991), and digital-optical architectures (Coffield, 1992a, 1992b). To increase the performance of image algebra implementations on architectures that are designed only for fast, small-neighborhood operations, researchers have also developed methods to decompose templates to make full use of those architectures (Gader, 1988; Gader and Dunn, 1989; Ritter and Li, 1989; Li and Ritter, 1990; Lucas and Gibson, 1991; Manseur and Wilson, 1992; Li, 1992). All these attempts are restricted to some specific available machines, and the implementations have been incomplete.
PARALLEL IMAGE PROCESSING WITH IMAGE ALGEBRA
Programming image-processing algorithms using a set of architecture-independent primitives from image algebra, we abstract away from any particular architecture. We do not have to worry about machine features such as the number of processors and the type of communication network connecting them. However, we still need to worry about the relative performance of the primitives on particular architectures. To search for architectures well suited to image algebra, we map image algebra primitives onto SIMD mesh-connected computers and analyze their performance.

A. Pixelwise Operations

A pixelwise operation of image algebra applies a function to its image operand(s) pixelwise and requires no communication between processing elements as long as the same image-to-processor-array mapping is used for the operand images. We can always obtain a linear speedup for any pixelwise operation. However, global operations and image-template operations are not so simple.
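To make this concrete, here is a minimal sequential sketch of a pixelwise operation; the dictionary-of-points image representation and the function name are our own illustrative choices, not notation from the text.

```python
def pixelwise(op, *images):
    """Apply op to corresponding pixels of one or more operand images.

    Each image is a dict mapping a point x to a pixel value; because every
    output pixel depends only on the operand pixels at the same point, no
    inter-PE communication is needed and the speedup is linear.
    """
    return {x: op(*(img[x] for img in images)) for x in images[0]}
```

For example, `pixelwise(lambda u, v: u + v, a, b)` computes the pixelwise sum of two images over the same point set.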
B. Global Operations

Image algebra global operations perform basically semigroup computations. Thus, we discuss semigroup computations on MCCs first.

1. Semigroup Computations
Suppose that γ is an associative operation on the value set F. The semigroup computation for a set A = {a_{i,j} ∈ F : 0 ≤ i, j < n} on an n x n processor array with a_{i,j} assigned to PE(i, j) is to compute

Γ_{0 ≤ i,j < n} a_{i,j} = a_{0,0} γ a_{0,1} γ ··· γ a_{n-1,n-1}.
Suppose that we store the final result in PE(0, 0). With a simple modification of the algorithm given next, we could store the final result in any PE. Since the diameter of an n x n MCC is O(n), any global operation is expected to take Ω(n) time. An O(n) algorithm for the semigroup computation is as follows:

Algorithm Semigroup_computation.
1. Move data column by column to the first column. Every time a column of data is moved to the first column, a γ operation is performed by each PE in the first column.
2. Move the data in the first column, one by one, toward PE(0, 0). Every time a datum is moved into PE(0, 0), it performs a γ operation.
end algorithm.
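The two steps of the algorithm can be sketched sequentially as follows; each Python loop stands in for a wavefront of parallel data moves, and the grid-of-lists image representation is our own assumption.

```python
def semigroup_computation(grid, gamma):
    """Simulate the two-step MCC semigroup computation sequentially.

    Step 1 folds each row's values into its first-column PE;
    step 2 folds the first column's values into PE(0, 0).
    """
    # Step 1: every PE in the first column accumulates its row.
    first_column = []
    for row in grid:
        acc = row[0]
        for value in row[1:]:
            acc = gamma(acc, value)
        first_column.append(acc)
    # Step 2: PE(0, 0) accumulates the first column.
    result = first_column[0]
    for value in first_column[1:]:
        result = gamma(result, value)
    return result
```

With `gamma = max` this computes a global maximum; with addition it computes a global sum, as used by the card operation below.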
2. Global Reduce, Card, Choice, and Image Comparison Operations
For a global reduce operation Γ induced by γ, we can compute Γ on an image by setting each PE not holding any pixel to the identity of the associative binary operation γ and then performing the semigroup computation. Thus, we have the following result.
Theorem IV.1. There is an O(n) time MCC algorithm to compute any global reduce operation Γ on an image with domain within an n x n point set.

The card operation counts the number of pixels in an image. The algorithm can be described as follows:

Algorithm Card.
1. Each PE holding a pixel of the image is set to value 1, and each PE holding no pixel of the image is set to value 0.
2. Perform the semigroup computation with arithmetic addition as the associative operation.
end algorithm.

Therefore, we have the following result.
Theorem IV.2. There is an O(n) time MCC algorithm for the card operation applied to an image with domain within an n x n point set.

The choice operation randomly chooses a point from the point set of an image. To perform the choice operation, we use a one-dimensional index for each PE in the processor array. The algorithm is as follows:

Algorithm Choice.
1. Each PE holding a pixel of the image uses its index as the value, and each PE holding no pixel of the image uses minus infinity for maximum or infinity for minimum.
2. Perform the semigroup computations to obtain the maximum and the minimum.
3. Generate a random number between the minimum and the maximum obtained in Step 2.
4. Broadcast the random number to every PE.
5. Each PE holding a pixel of the image with index less than or equal to the broadcast number uses its index as the value, and any other PE uses minus infinity as its value.
6. Perform the semigroup computation to obtain the maximum.
7. Find the point from the index using the information of the pixel-to-PE mapping.
end algorithm.
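A sequential sketch of Algorithm Choice; the dictionary of PE indices is our own representation, and `random.randint` stands in for the generated and broadcast random number.

```python
import random

def choice(held):
    """Simulate Algorithm Choice: pick a random occupied PE index.

    held maps each PE's one-dimensional index to True if that PE
    holds a pixel of the image.
    """
    occupied = [i for i, h in held.items() if h]
    lo = min(occupied)               # step 2: semigroup minimum
    hi = max(occupied)               # step 2: semigroup maximum
    r = random.randint(lo, hi)       # steps 3-4: generate and broadcast
    # Steps 5-6: the largest occupied index not exceeding r wins.
    return max(i for i in occupied if i <= r)
```

Since r is at least the minimum occupied index, the final maximum in steps 5-6 always exists.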
Steps 1, 3, 5, and 7 take constant time. Step 4 takes O(n) time. Steps 2 and 6 perform semigroup computations and take O(n) time. Therefore, we have the following result.

Theorem IV.3. There is an O(n) time MCC algorithm for the choice operation applied to an image with domain within an n x n point set.

For an image comparison operation, we can first perform the comparison pixelwise, which takes O(1) time. We then perform the semigroup operation on the result image with the logical and operation as the associative operation, which takes O(n) time. Thus, we have the following theorem.

Theorem IV.4. There is an O(n) time MCC algorithm for any image comparison operation applied to images with domains within an n x n point set.

C. Image-Template Operations
The image-template operations derived from the generalized image-template product of image algebra are powerful operations for image processing. The generalized image-template product requires intensive computation. In this section, we discuss efficient algorithms for the product. The more we know about the specific product type, the more efficient the algorithms we can design for it.

1. General Image-Template Product
The image-template product in its general form is difficult to implement efficiently on SIMD mesh-connected computers. We have to compute the product pixel by pixel over the point set of the result image. For each pixel of the result image, we can apply a global reduce operation to the source image weighted with the weights of the template at each pixel in the point set of the source image. Suppose a ∈ F^X is the source image, t ∈ (F^X)^Y is the template, and b = a ⊕ t ∈ F^Y is the result image. The algorithm is as follows:

Algorithm Image_template_product.
Iterate over the points of Y in some order. For each y ∈ Y, do the following:
1. Compute each weight t_y(x) using the PE(i, j) assigned to the pixel of a at x, x ∈ X.
2. Compute a(x) ∘ t_y(x) using PE(i, j).
3. Perform the global reduce operation induced by the γ operation and store the result in the PE assigned to the pixel of b at y.
end algorithm.
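The algorithm can be sketched sequentially as follows; the dictionary image representation and the parameter names are illustrative assumptions, with one global reduce simulated per result pixel.

```python
def image_template_product(a, t, Y, gamma, o, identity):
    """Sequential sketch of the general image-template product b = a (+) t.

    a        : dict point -> value, the source image over X
    t        : function y -> (function x -> weight), the template
    Y        : point set of the result image
    gamma    : the associative reduce operation
    o        : the pointwise 'o' operation
    identity : identity element of gamma
    """
    b = {}
    for y in Y:                      # one global reduce per result pixel
        acc = identity
        for x, value in a.items():
            acc = gamma(acc, o(value, t(y)(x)))
        b[y] = acc
    return b
```

For instance, with `gamma = max` and addition as the ∘ operation, a template whose weight is 0 at the target point and minus infinity elsewhere reproduces the source image.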
The complexity of the algorithm is the complexity of the global reduce operation times the number of points in the point set of the result image. Thus, we have the following result.

Theorem IV.5. Suppose that the result image has m pixels and the source image is on an n x n point set. There is an O(mn) time MCC algorithm for any image-template product.
The preceding algorithm is very general but not efficient. If the image-template product does not change the image domain, i.e., if the result image domain is the same as the source image domain, we can have more efficient algorithms.

2. Image-Template Product without Changing Domain
The image-template product that does not involve changing domain covers many image-processing operations. Fundamental operations in computer vision and image processing such as convolutions and correlations (Rosenfeld and Kak, 1982; Ballard and Brown, 1982) are special cases of this product. Convolutions require intensive computations. For an m x n image and a template of t elements, a convolution takes O(tmn) time on a sequential computer. Thus, much attention has been devoted to the development of efficient parallel algorithms for convolutions (Ranka and Sahni, 1990). We assume that images are mapped to the processor array by the identity mapping. On MCCs, the convolution problem has been studied by several researchers (Lee and Aggarwal, 1987; Maresca and Li, 1986; Ranka and Sahni, 1990). However, the algorithms proposed for convolutions only work for invariant templates. The algorithms in Maresca and Li (1986) and Ranka and Sahni (1990) require square templates, and the algorithms in Lee and Aggarwal (1987) and Maresca and Li (1986) rely on broadcasting template weights to processing elements. Although the algorithm in Ranka and Sahni (1990) does not broadcast weights, it assumes that weights are somehow distributed over the processing elements initially and are shifted during the convolution computation, and the algorithm becomes complicated in order to reduce the local memory requirement to O(1) per processing element. The algorithm in Lee and Aggarwal (1987) moves partial convolution results across processing elements during the computation, which is not efficient when the size of image pixel values (such as binary images) is small compared to the size of the convolution values. We propose a simple and optimal algorithm for the image-template product on an m x n MCC for an m x n image and a template of t elements (Shi et al., 1993). The algorithm takes O(t) time and requires O(1) local
memory per PE. The image-template product is computed along disjoint convolution paths of the template. In order to compute the image-template product more efficiently, we modify and extend the concept of convolution paths introduced in Lee and Aggarwal (1987). Compared with the previously proposed algorithms, our algorithm works for both invariant and variant templates without restrictions on template shape. Furthermore, it does not broadcast template weights and does not move partial convolution results.
a. Convolution Paths. The concept of convolution paths was introduced by Lee and Aggarwal (1987), who also proposed an algorithm to perform convolutions along convolution paths. Under the definition there, however, a convolution path starts with some pixel in the window and ends at the target pixel. This causes their algorithms to have t - 1 partial result movements, where t is the length of the convolution path in the window. We define a convolution path to start with the target pixel in order to move source image pixel values instead of partial results. Let t ∈ (F^X)^X be a template and let R(t) denote its relative support. A convolution path of length k for t is a sequence of points P = p_0, p_1, ..., p_k such that

(a) p_0 = (0, 0);
(b) p_i ∈ R(t), 1 ≤ i ≤ k;
(c) p_i is 4-connected or 8-connected to p_{i+1}, 0 ≤ i < k; and
(d) p_i ≠ p_j if i ≠ j, 0 ≤ i, j ≤ k.
For example, a 3 x 3 template t has a convolution path (0, 0), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1) in its relative support R(t) = {(i, j) : -1 ≤ i, j ≤ 1}, as shown in Fig. 7. The arrows indicate the directions of pixel value movement in the computation of the image-template product along the convolution path. For a ∈ F^X, t ∈ (F^X)^X, and a convolution path P = p_0, p_1, ..., p_k, the partial result (a ⊕ t)_P of the image-template product a ⊕ t along P is as follows:

(a ⊕ t)_P(y) = Γ_{i=1}^{k} [a(y + p_i) ∘ t_y(y + p_i)],  y ∈ X.
FIGURE 7. A convolution path of a 3 x 3 template in its relative support.
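Conditions (a)-(d) can be checked mechanically; the following small helper (our own, not from the text) validates the 3 x 3 example path of Fig. 7.

```python
def is_convolution_path(path, support):
    """Check conditions (a)-(d) for a convolution path of a template
    whose relative support is `support`, a set of (i, j) offsets."""
    if path[0] != (0, 0):
        return False                            # (a) start at the target
    if any(p not in support for p in path[1:]):
        return False                            # (b) stay inside R(t)
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        if max(abs(i0 - i1), abs(j0 - j1)) != 1:
            return False                        # (c) 4- or 8-connected steps
    return len(set(path)) == len(path)          # (d) no repeated points
```

Chebyshev distance 1 between consecutive points covers both 4-connected and 8-connected steps.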
FIGURE 8. Two disjoint convolution paths of a template.
Two convolution paths of a template are disjoint if they have no common points in their paths except their first point (0, 0). A template t with relative support R(t) is partitionable by the disjoint convolution paths P_1, P_2, ..., P_l if every point of R(t) is in one of the convolution paths. For example, the template in Fig. 8 has two disjoint convolution paths and is partitionable by them. For a ∈ F^X, t ∈ (F^X)^X, if t is partitionable by disjoint convolution paths P_1, P_2, ..., P_l, we can obtain a ⊕ t from the partial results along the disjoint convolution paths as follows:

(a ⊕ t)(y) = [a(y) ∘ t_y(y)] γ (a ⊕ t)_{P_1}(y) γ ··· γ (a ⊕ t)_{P_l}(y),  y ∈ X.
A convolution path is called an ideal convolution path if it contains no 8-connected edges: on a mesh-connected computer whose processing elements are 4-connected, the fewer 8-connected edges in the convolution path, the more efficient the computation of the image-template product along the path, as will become clear later. For example, the 3 x 3 template in Fig. 7 has an ideal convolution path. Unfortunately, not all templates are partitionable by disjoint ideal convolution paths. For example, the template in Fig. 9 is not partitionable by disjoint ideal convolution paths. We now discuss the construction of disjoint paths for a given template. Lee and Aggarwal (1987) allow only one convolution path for a template, while we allow more than one disjoint convolution path for a template. It should be easier to construct efficient disjoint convolution paths for a template under our definition. Lee and Aggarwal (1987) have proposed convolution
FIGURE 9. A non-ideal convolution path.
FIGURE 10. The neighbors of a point p.
paths for rectangular, diamond-shaped, and circular templates with target pixels in the center. Here, we consider more general cases of templates. A shrinking spiral path of a point set S is a path starting with one point on the external contour of S and following points counterclockwise along the external contour of the set formed by the points of S not followed yet. To introduce a simple procedure to obtain a shrinking spiral path for a point set, we define the s-neighbor of a point p to be the neighbor point marked with s in Fig. 10 and denote it as N8(p, s), s = 0, 1, ..., 7. The relative supports of most commonly used templates, such as rectangular, diamond, and circular templates, have shrinking spiral paths. The shrinking-spiral-path finding procedure is similar to the chain code finding procedure (Pavlidis, 1982). The procedure starts with the left topmost point p_0 of the point set S and traces points counterclockwise along the external contour formed by the points of S not traced yet. It follows 4-connected edges whenever possible during tracing. Assuming the numerical operations on neighbor numbers to be modulo 8, the procedure can be described as follows:

Procedure Shrinking_spiral_path.
1. The path initially consists of p_0. Set the current point p equal to p_0, and the search direction s equal to 6.
2. While S is not empty, do the following:
2.1. If N8(p, s - 2) is in S, then set p equal to N8(p, s - 2) and s equal to s - 2. Go to substep 2.10.
2.2. If N8(p, s) is in S and, if N8(p, s - 1) is also in S, N8(p, s - 1) has another neighbor in S, then set p equal to N8(p, s). Go to substep 2.10.
2.3. If N8(p, s + 2) is in S and, if N8(p, s + 1) is also in S, N8(p, s + 1) has another neighbor in S, then set p equal to N8(p, s + 2) and s equal to s + 2. Go to substep 2.10.
2.4. If N8(p, s + 4) is in S and, if N8(p, s + 3) is also in S, N8(p, s + 3) has another neighbor in S, then set p equal to N8(p, s + 4) and s equal to s + 4. Go to substep 2.10.
2.5. If N8(p, s - 1) is in S, then set p equal to N8(p, s - 1) and s equal to s - 2. Go to substep 2.10.
2.6. If N8(p, s + 1) is in S, then set p equal to N8(p, s + 1). Go to substep 2.10.
2.7. If N8(p, s + 3) is in S, then set p equal to N8(p, s + 3) and s equal to s + 2. Go to substep 2.10.
2.8. If N8(p, s + 5) is in S, then set p equal to N8(p, s + 5) and s equal to s + 4. Go to substep 2.10.
2.9. Otherwise, no shrinking spiral path exists for this set.
2.10. Append p to the end of P and set S equal to S \ {p}.
end procedure.
For a template t with relative support R(t), suppose that the foregoing procedure finds a shrinking spiral path P = p_0, p_1, ..., p_{r-1}, p_r for R(t) ∪ {(0, 0)}. Let p_q = (0, 0). We can easily construct two disjoint convolution paths, P_1 = p_q, p_{q-1}, ..., p_0 and P_2 = p_q, p_{q+1}, ..., p_{r-1}, p_r, for the template t. The total number of 8-connected edges in the disjoint convolution paths is the same as the number of 8-connected edges in the shrinking spiral path. To see how efficient the disjoint paths for a template are, we only need to consider the number of 8-connected edges in the shrinking spiral path of its relative support. Obviously, the worst case is that all the edges in the path are 8-connected, and the best case is that all the edges in the path are 4-connected. To compare with Lee and Aggarwal's convolution path construction method, we apply our method to the templates discussed in Lee and Aggarwal (1987).
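Constructing P_1 and P_2 from a shrinking spiral path is a simple list split; this sketch (the helper name is our own) assumes the spiral is given as a list of points containing the target (0, 0).

```python
def disjoint_paths_from_spiral(spiral, target=(0, 0)):
    """Split a shrinking spiral path through the target point into the
    two disjoint convolution paths P1 and P2 described in the text."""
    q = spiral.index(target)
    p1 = spiral[q::-1]    # p_q, p_{q-1}, ..., p_0
    p2 = spiral[q:]       # p_q, p_{q+1}, ..., p_r
    return p1, p2
```

Both paths begin at the target point, and they share no other point, exactly as the disjointness definition requires.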
Result IV.l. A K x L rectangular template has disjoint paths with no 8-connected edges. Applying the shrinking spiral path finding procedure to the relative support of a K x L rectangular template, we obtain a spiral path as shown in Fig. 11 for the relative support. There are no 8-connected edges in the shrinking spiral path. However, Lee and Aggarwal(l987) cannot construct ideal convolution paths for all rectangular templates even if their target points are in the center.
FIGURE 11. The shrinking spiral path of a rectangular template (5 x 6).
FIGURE 12. The shrinking spiral path of an even diamond template (8 x 8).
Result IV.2. An even K x K diamond template has disjoint convolution paths with no 8-connected edges.
The shrinking spiral path of an even K x K diamond template is shown in Fig. 12. Lee and Aggarwal (1987) also give ideal convolution paths for even diamond templates, but they require the target points to be in the center.

Result IV.3. An odd K x K diamond template has disjoint convolution paths with K - 1 8-connected edges in total.
We consider two cases: K = 4k + 1 and K = 4k - 1. For the case of K = 4k + 1, we can have a shrinking spiral path as shown in Fig. 13a by
FIGURE 13. The shrinking spiral paths of odd diamond templates. (a) 9 x 9 diamond template. (b) 7 x 7 diamond template.
applying the shrinking-spiral-path finding procedure. A turn of the spiral path reduces the template size by 4 in both dimensions. Thus, we need k turns to reach the center. It is easy to see that each turn has four 8-connected edges. Hence, the spiral path has 4k (which is K - 1) 8-connected edges. For the case of K = 4k - 1, we can have a shrinking spiral path as shown in Fig. 13b. After k - 1 turns of the spiral path, which have four 8-connected edges each, we need another turn that has two 8-connected edges. Thus, the spiral path has 4(k - 1) + 2 (which is also K - 1) 8-connected edges. Lee and Aggarwal (1987) have K - 1 extra points for K = 4k + 1 and K extra points for K = 4k - 1, and they require the target pixel of the template to be in the center.

Result IV.4. A circular template has disjoint convolution paths with some 8-connected edges.

The shrinking spiral path of a circular template can be obtained as shown in Fig. 14. It is not easy to give the exact number of 8-connected edges in the shrinking spiral path for a general circular template. For a 9 x 9 circular template, we have a shrinking spiral path with five 8-connected edges. However, Lee and Aggarwal (1987) need seven extra points and require the target point to be in the center. For most commonly used templates, we have a simple algorithm to construct disjoint convolution paths. For the various templates Lee and Aggarwal (1987) have studied, we have results as good as or better than theirs. For a template t whose relative support R(t) has no shrinking spiral path, we have the following two methods to find disjoint convolution paths. One method is to divide R(t) into subsets such that each subset is adjacent to (0, 0), find a shrinking spiral path for each subset, and construct
FIGURE 14. The shrinking spiral path of a circular template (9 x 9).
disjoint convolution paths from those shrinking spiral paths. Another method is to extend R(t) by adding extra points to it, find a shrinking spiral path for the extended set, and construct disjoint convolution paths from the shrinking spiral path. Since the weights for the extra points are 0_∘ (the zero element of the ∘ operation), the result is still correct, but the computation takes extra steps.

b. Algorithm for Image-Template Product without Changing Domain. The result of the image-template product is obtained from all the partial results along the disjoint convolution paths. To compute a partial result along a convolution path, we shift the source image according to the points on the convolution path; perform the ∘ operation between the shifted source image and the corresponding template weights; and perform the γ operation on the intermediate results of the ∘ operation. In this process, we need the template weights along the convolution path. There are three methods to make weights available along the convolution path. The first one broadcasts the weights. The second one assumes that the weights are initially distributed over the processing elements and shifts the weights during the computation. The third method is to compute weights on the fly along the convolution path. The first and second methods have more communication overhead and only work for invariant templates, since the weights of variant templates are different for different target points. The third one works for both invariant and variant templates. Furthermore, it has less communication overhead. We compute weights on the fly along convolution paths. The weight t_(i,j)(x) is a function of the target point (i, j) and the point x. It can be expressed by the same formula for every target point (i, j). Thus, each PE(i, j) can compute t_(i,j)(x) in the SIMD mode. We use weight(p) to denote the function computing t_(i,j)((i, j) + p) using each PE(i, j).
We use shift(a, p, q) to denote the procedure moving the pixel value a(i, j) from PE(i, j) to PE((i, j) + p - q) for each i, j. It is easy to see that shift(a, p, q) can be done in one route step on an 8-connected MCC. On a 4-connected MCC, it can be done in one route step if p and q are 4-connected, and in two route steps if p and q are 8-connected. Using weight(p) and shift(a, p, q), we have the following procedure to compute the image-template product b = a ⊕ t along a convolution path P = p_0, p_1, ..., p_k.
Procedure Product_along_convolution_path.
1. Initialize the partial result as the identity of the γ operation. Copy the source image a to c.
2. For each point p_r on the convolution path, r = 1, 2, ..., k, do the following:
2.1. Use the procedure shift(c, p_{r-1}, p_r) to move the pixel value originally at PE((i, j) + p_r) to PE(i, j).
2.2. Compute the weight t_(i,j)((i, j) + p_r) by the procedure weight(p_r).
2.3. Perform the ∘ operation between the pixel value and the weight.
2.4. Perform the γ operation between the partial result and the result obtained in substep 2.3. Update the partial result.
end procedure.
Since shift(a, p, q) takes one step on 8-connected MCCs, it can easily be proved by induction that a((i, j) + p_r) will be shifted to PE(i, j) in step r of the foregoing procedure. On 4-connected MCCs, since shift(a, p, q) takes one step for 4-connected p and q and two steps for 8-connected p and q, it can also be proved by induction that a((i, j) + p_r) will be shifted to PE(i, j) at the (r + n_E(r))th step, where n_E(r) is the number of 8-connected edges in the path up to p_r. Thus, the foregoing procedure correctly computes the image-template product along the convolution path in k steps on 8-connected MCCs and in k + n_E(k) steps on 4-connected MCCs. Furthermore, the procedure requires no partial result movement. Using this procedure, we present an efficient algorithm for the image-template product on SIMD mesh-connected computers. Suppose that P_1, P_2, ..., P_l are the disjoint convolution paths for the template.

Algorithm Image_template_product_without_changing_domain.
1. Compute the weight t_(i,j)(i, j) using each PE(i, j). Initialize the final result as the result of the ∘ operation between the pixel value at (i, j) and the corresponding weight.
2. For each convolution path P_r, r = 1, 2, ..., l, do the following:
2.1. Use the procedure Product_along_convolution_path(P_r) to compute the partial result along the path P_r.
2.2. Perform the γ operation between the final result and the partial result obtained in substep 2.1. Update the final result.
end algorithm.

Since the number of 8-connected edges in a convolution path is at most the length of the convolution path, it is easy to see that the just-presented algorithm correctly computes the image-template product a ⊕ t in O(t) steps, where t is the total number of points in the relative support of t. Thus, we have the following result.

Theorem IV.6. Suppose that the template t is partitionable by disjoint convolution paths with t elements in its relative support.
There is an O(t) time MCC algorithm to compute the image-template product b = a ⊕ t.
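A sequential sketch of the full algorithm, folding the target-point term first and then the partial result along each disjoint path; the dictionary image representation and the skipping of out-of-border points are our own assumptions, not part of the original formulation.

```python
def product_without_changing_domain(a, paths, weight, gamma, o):
    """Sequential sketch of the algorithm above.

    a      : dict (i, j) -> pixel value (source and result share the domain)
    paths  : disjoint convolution paths, each beginning with (0, 0)
    weight : function (y, p) -> t_y(y + p), computed "on the fly"
    """
    b = {}
    for y in a:
        # Step 1: the target point's own contribution.
        acc = o(a[y], weight(y, (0, 0)))
        # Step 2: fold in the partial result along every disjoint path.
        for path in paths:
            for p in path[1:]:                       # skip p_0 = (0, 0)
                x = (y[0] + p[0], y[1] + p[1])
                if x in a:                           # border handling choice
                    acc = gamma(acc, o(a[x], weight(y, p)))
        b[y] = acc
    return b
```

With unit weights, addition as γ, and multiplication as ∘, this reduces to an ordinary local sum over the template points.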
The preceding simple algorithm works for both invariant and variant templates without restrictions on the template shapes. It does not broadcast template weights and does not move partial results. When the template is a local template, the algorithm is very efficient on MCCs. If the template is not local, its relative support could be as large as the image point set. In this case, the algorithm takes O(mn) time for an m x n source image. For a special class of nonlocal image-template products, we can develop a more efficient algorithm.
3. Image-Template Product with a Special Nonlocal Template

It is not easy to design an efficient algorithm for the image-template product with a general nonlocal template on SIMD mesh-connected computers. For the image-template product with a special class of nonlocal templates that is very useful in the computation of image component properties, we develop a more efficient algorithm. Suppose that c is a binary image. A connected component in a binary image is a maximal connected set of black pixels. Let C_c(x) be the point set of all the black pixels of c in the same connected component as the pixel at x if the pixel at x is black; otherwise, let C_c(x) = ∅. Note that if x and y are the locations of two pixels in the same connected component of c, then C_c(x) = C_c(y). A parameterized template t(c) is called a component template with respect to the binary image c if

t(c)_y(x) = w(x) if x ∈ C_c(y), and t(c)_y(x) = 0_∘ otherwise,

where 0_∘ is the zero element of the ∘ operation and w is called a weight image with w(x) = 0_∘ whenever c(x) = 0. By the definition, the support set of t(c) at target point y is C_c(y), i.e., S(t(c)_y) = C_c(y). Let a ∈ F^X be a source image. Suppose that t(c) ∈ (F^X)^X is a component template with respect to c. Let b = a ⊕ t(c) ∈ F^X. We have
b(y) = Γ_{x ∈ X} a(x) ∘ t(c)_y(x) = Γ_{x ∈ C_c(y)} a(x) ∘ t(c)_y(x) = Γ_{x ∈ C_c(y)} a(x) ∘ w(x).
Note that the source image a and the result image b are partitioned into components by the binary image c. If the pixels of c at y_1 and y_2 are in the same connected component of c, the pixels of b at y_1 and y_2 have the same value, since
b(y_1) = Γ_{x ∈ C_c(y_1)} a(x) ∘ w(x) = Γ_{x ∈ C_c(y_2)} a(x) ∘ w(x) = b(y_2).
Thus, we only need to compute one value for each component of b and propagate that value to all the pixels of the component.
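This strategy can be sketched sequentially with a flood fill standing in for the MCC component-labeling step; 4-connectivity and the dictionary image representation are our own assumptions here.

```python
from collections import deque

def component_template_product(a, c, w, gamma, o):
    """Sketch of b = a (+) t(c) for a component template.

    c is a binary image (dict point -> 0/1); a and w share its domain.
    For each connected component of black pixels, reduce a(x) o w(x)
    once with gamma, then propagate the value to every component pixel.
    """
    b, seen = {}, set()
    for start in c:
        if c[start] == 0 or start in seen:
            continue
        # Flood-fill one 4-connected component of black pixels.
        component, queue = [], deque([start])
        seen.add(start)
        while queue:
            i, j = queue.popleft()
            component.append((i, j))
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if c.get(nb, 0) == 1 and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        # One reduction per component, shared by all its pixels.
        values = [o(a[x], w[x]) for x in component]
        acc = values[0]
        for v in values[1:]:
            acc = gamma(acc, v)
        for x in component:
            b[x] = acc
    return b
```

White pixels receive no value, matching C_c(x) = ∅ at white points.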
The computation of the image-template product with a component template is highly related to the image component labeling problem, which is a complex process on MCCs. Thus, we discuss the implementation details of the image-template product with a component template on MCCs after we discuss labeling image components. Here, we only state the following result.

Theorem IV.7. Suppose that the number of bits in each pixel value of the result image is b. There is an O(kn) time algorithm to compute the image-template product with a component template on MCCs with O(bkn^{1/k}) bits of local memory per PE, where k is an integer between 1 and log(2n).

V. PARALLEL IMAGE PROCESSING WITH IMAGE ALGEBRA
Parallel image-processing activities are currently growing with the development of new parallel image-processing architectures and new image-processing algorithms. However, current algorithm development is not based on an efficient mathematical structure specifically designed for image manipulation and image analysis. Each developer builds his or her own architecture and ad hoc image-processing tools. This leads to an abundance of nonstandard notation and to inefficient development of highly sophisticated and cost-effective image-processing algorithms. Image algebra provides an excellent algebraic notation and a common mathematical framework for image-processing algorithm specification, comparison, and performance evaluation. It is a purely mathematical structure, independent of any computer architecture or language. Image algebra provides highly parallel primitives and is a good tool for writing parallel image-processing programs when it is implemented on a well-suited parallel architecture. Low-level image-processing tasks require computations corresponding to each pixel of an image. The computations are usually highly regular and can be characterized by their local nature. Intermediate-level image processing involves reduction of image pixel information to a form that will be effective for high-level image analysis. The computations for intermediate-level image processing are not very regular, and parallelism is not immediately evident. High-level image analysis involves the cognitive use of knowledge. There is much less agreement about the computational view of high-level image analysis (Chaudhary and Aggarwal, 1990). Many low-level image-processing algorithms such as image enhancement, edge detection, morphological image processing, and thresholding have been described using image algebra (University of Florida Center for Computer Vision and Visualization, 1993). Thus, for low-level image
processing applications we only develop and describe algorithms with image algebra for the Abingdon Cross image-processing benchmark and for shrinking binary image components. For intermediate-level image-processing applications, we develop a few algorithms for labeling image components and computing properties of image components on SIMD mesh-connected computers and describe the algorithms with image algebra. This demonstrates that image algebra characterizes very well the low-level and intermediate-level image processing on parallel computers.

A. Abingdon Cross Benchmark

The Abingdon Cross benchmark was developed to compare the performance of computers used in digital image processing (Preston, 1986, 1989). The benchmark involves extraction of a skeletonized line cross from a solid cross embedded in white Gaussian noise. No algorithm for the benchmark is given, and any algorithm may be used. Usually it includes linear and nonlinear operations. The standard 512 x 512 Abingdon Cross image shown in Fig. 15 is generated using the method described by Preston (1986). Note that other images appear in this section magnified by a factor of 2 relative to the 512 x 512 image in this figure. The Abingdon Cross image has gray levels between 0 and 255. The vertical stroke occupies columns 224 through 288 and rows 64 through 488, and the horizontal stroke occupies columns 64 through 488 and rows 224 through 288. Both arms have gray level 160, and the cross center has gray level 192, while the background has gray level 128. The cross image is embedded in white Gaussian noise. The noise added to all pixels has zero mean and a standard deviation of 32. Since the strokes of the cross have a gray level 32 above that of the background, they have a signal-to-noise ratio of zero decibels [S/N = 20 log_10(32/32) = 0]. Similarly, the cross center has a signal-to-noise ratio of 6 decibels [S/N = 20 log_10(64/32) ≈ 6].
Because of the size limitation of the machine on which we perform the benchmark, we scale the standard Abingdon Cross image down to a 64 x 64 image. There are two methods for the scaling process (Preston, 1986). In the first method, the value of each pixel is taken randomly from the corresponding 8 x 8 block of pixels in the standard image. The noise in the image obtained this way still has zero mean and a standard deviation of 32, so the signal-to-noise ratio is maintained. However, as the two edges of each cross stroke move closer because of the scaling, the total signal energy decreases while the noise energy remains the same. In the second method, the value of each pixel is the local average of the pixel
HONGCHI SHI, GERHARD X. RITTER, and JOSEPH N. WILSON
FIGURE 15. The standard 512 x 512 Abingdon Cross image.
values of the corresponding 8 x 8 block in the standard image. The noise still has zero mean, but its standard deviation is reduced to (64/512) · 32 = 4. The second method is usually used if the benchmark has to be performed on a smaller image. The image in Fig. 16a is generated using the second scaling method; the noise added to the image has a standard deviation of 4. To illustrate the robustness of our algorithm for the Abingdon Cross benchmark, we also generated the test image shown in Fig. 16b with higher-level noise, having zero mean and a standard deviation of 16. To obtain a skeleton cross from the noisy cross image, we proceed in four steps: thresholding, enhancement, erosion, and thinning. The input image a is a 64 x 64 Abingdon Cross image with Gaussian noise. The output image e
FIGURE 16. (a) The scaled-down Abingdon Cross image. (b) The test image with higher-level noise.
is a one-pixel-thick cross image. The algorithm is described as follows:
1. Thresholding: Since the gray level of the background is 128 and the gray level of the arms is 160, we choose 144 as the threshold value for the noisy cross image. This can be done by

b := χ_{≥144}(a).
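In numpy terms, the characteristic-function threshold χ_{≥144} is just a comparison (a sketch; the pixel values below are illustrative):

```python
import numpy as np

# A small illustrative "noisy" image: background near 128, arm near 160.
a = np.array([[130, 125, 158],
              [127, 161, 164],
              [122, 159, 140]])

# b := chi_{>=144}(a): 1 where the pixel reaches the threshold, else 0.
b = (a >= 144).astype(np.uint8)
```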
The thresholded image is shown in Fig. 17.
2. Enhancement: This step enhances the thresholded image by removing noise. We first apply a 1 x 7 horizontal binary median filter to the thresholded image to obtain the image shown in Fig. 18a, then apply a 7 x 1 vertical binary median filter to the thresholded image to obtain the image shown in Fig. 18b, and finally or together the two filtered images to obtain the image shown in Fig. 18c.
FIGURE 17. The thresholded cross image.
FIGURE 18. (a) The horizontally filtered image. (b) The vertically filtered image. (c) The or of the filtered images.
This step can be done by

c₁ := χ_{>3}(b ⊕ h)
c₂ := χ_{>3}(b ⊕ v)
c := c₁ ∨ c₂,
where the templates h and v are defined as shown in Fig. 19.
3. Erosion: This step makes the cross thinner. We apply the image-template operation with the 3 x 3 template t shown in Fig. 20 to erode the enhanced image three times, as follows:

d := c
loop 3 times
  d := d ⊕ t
end loop.
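A plain-numpy sketch of the enhancement and erosion steps, writing each binary median filter as a "more than 3 of the 7 window pixels are 1" test and the erosion as a 3 x 3 all-ones neighborhood test. The window shapes follow the text, but the all-ones erosion template is our assumption, since Fig. 20 is not reproduced here:

```python
import numpy as np

def count_window(b, height, width):
    """Sum of a binary image over a centered height x width window (zero padding)."""
    ph, pw = height // 2, width // 2
    p = np.pad(b, ((ph, ph), (pw, pw)))
    out = np.zeros_like(b, dtype=int)
    for di in range(height):
        for dj in range(width):
            out += p[di:di + b.shape[0], dj:dj + b.shape[1]]
    return out

def enhance(b):
    """OR of the 1x7 horizontal and 7x1 vertical binary median filters."""
    c1 = (count_window(b, 1, 7) > 3).astype(np.uint8)   # horizontal median
    c2 = (count_window(b, 7, 1) > 3).astype(np.uint8)   # vertical median
    return c1 | c2

def erode(c, times=3):
    """Erosion with an (assumed) 3x3 all-ones template, applied repeatedly."""
    d = c
    for _ in range(times):
        d = (count_window(d, 3, 3) == 9).astype(np.uint8)
    return d
```

The median filters keep long horizontal or vertical runs (the cross strokes) while dropping isolated noise pixels, which is why the two 1-D filters are or'd rather than composed.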
The eroded image is shown in Fig. 21.
4. Thinning: This step thins the cross to a one-pixel-thick line cross. We adopt a morphological thinning algorithm (Jang and Chin, 1990) for this step. The algorithm uses four structuring templates d₁, d₂, d₃, and d₄, as shown in Fig. 22, to remove pixels from four directions, and four structuring templates e₁, e₂, e₃, and e₄, as shown in Fig. 23, to remove extra pixels at skeleton junctions. The algorithm consists of four passes. The first pass changes the value of a pixel from 1 to 0 if the neighborhood of the pixel matches one of the configurations shown in d₁, d₂, and e₁. The second pass changes the value of a pixel from 1 to 0 if its neighborhood matches one of the configurations shown in d₂, d₃, and e₂. The third pass changes the value of a pixel from 1 to 0 if its neighborhood matches one of the configurations shown in d₃, d₄, and e₃. The fourth pass changes the value of a pixel from 1 to 0 if its neighborhood matches one of the configurations shown in d₄, d₁, and e₄.
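The pass structure can be sketched generically with don't-care mask matching. The mask below is a hypothetical north-deleting template for illustration only; it is not Jang and Chin's actual d₁, which is defined in Fig. 22:

```python
import numpy as np

def matches(e, mask):
    """Boolean map of pixels whose 3x3 neighborhood matches `mask`
    (entries 0/1 must match exactly; None is a don't-care). Zero padding."""
    p = np.pad(e, 1)
    ok = np.ones(e.shape, dtype=bool)
    for di in range(3):
        for dj in range(3):
            want = mask[di][dj]
            if want is not None:
                ok &= p[di:di + e.shape[0], dj:dj + e.shape[1]] == want
    return ok

def thinning_pass(e, masks):
    """One pass: change a black pixel to white if its 3x3 neighborhood
    matches any of the given structuring templates."""
    remove = np.zeros(e.shape, dtype=bool)
    for m in masks:
        remove |= matches(e, m)
    return np.where(remove, 0, e).astype(e.dtype)

# Hypothetical north-deleting template (illustration only): delete a black
# pixel whose row above is all white and whose row below is all black.
d_north = [[0, 0, 0],
           [None, 1, None],
           [1, 1, 1]]
```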
FIGURE 19. The templates used in the enhancement step.
FIGURE 20. The template t used in the erosion step.
FIGURE 21. The eroded cross image.
FIGURE 22. The structuring templates d₁, d₂, d₃, and d₄.
We can express the thinning process as follows:

e := d
loop 2 times
  e := e ∧ (e ⊕ d₁) ∧ (e ⊕ d₂) ∧ (e ⊕ e₁)
  e := e ∧ (e ⊕ d₂) ∧ (e ⊕ d₃) ∧ (e ⊕ e₂)
  e := e ∧ (e ⊕ d₃) ∧ (e ⊕ d₄) ∧ (e ⊕ e₃)
  e := e ∧ (e ⊕ d₄) ∧ (e ⊕ d₁) ∧ (e ⊕ e₄)
end loop.

FIGURE 23. The structuring templates e₁, e₂, e₃, and e₄.
FIGURE 24. The thinned cross image.
The thinned image shown in Fig. 24 is obtained from the eroded image by the preceding process. The preceding algorithm for the Abingdon Cross image-processing benchmark has been implemented by hand on the Hughes 3D machine emulator (Hughes Aircraft Company, 1992). The Hughes 3D computer is a 3D stacked mesh-connected computer with a clock rate of 10 MHz. The experiment has shown an impressive result. The benchmark measures are a quality factor and a price-performance factor. The quality factor is defined as the size of the image processed divided by the execution time. The price-performance factor is defined as the number of pixels processed divided by the product of the execution time and the cost of the computer. The fastest result reported so far (Preston, 1992) is 42.2 microseconds for a 104 x 104 Abingdon Cross image on Martin Marietta's GAPP II, whose clock rate is also 10 MHz, giving a quality factor of 2.5 million. Our benchmark result for a 64 x 64 Abingdon Cross image is 15.5 microseconds, giving a quality factor of 4.1 million. If we performed the benchmark on a 128 x 128 Abingdon Cross image, we would obtain an even larger quality factor, since the execution time would not increase significantly. Because the 3D machine is not commercially available, we are unable to compute the price-performance factor for our benchmark result at this time.

B. Shrinking Binary Image Components
Shrinking image components is a useful process in counting and labeling the connected components of an image. A set S of black pixels in a binary image can be considered connected according to two different distance definitions (Rosenfeld, 1970; Rosenfeld and Pfaltz, 1968; Kong and Rosenfeld, 1989). For any two pixels in S, if there exists a path joining
them such that all pixels of the path are in S and all successive pixels at (i, j) and (h, k) satisfy d₁[(i, j), (h, k)] = |i - h| + |j - k| ≤ 1, then S is called 4-connected; if they satisfy d₂[(i, j), (h, k)] = max(|i - h|, |j - k|) ≤ 1, then S is called 8-connected. A connected component in a binary image is a maximal connected set of black pixels. A parallel-shrink operator, applied repeatedly to a binary image, shrinks binary image components into representative pixels while preserving connectivity. To count the connected components in the binary image, we count the representative pixels instead. To label the connected components, we assign unique labels to the representative pixels and propagate the labels to all the pixels of their corresponding components. From now on, we assume that black pixels have value 1 and white pixels have value 0. We also assume 8-connectivity among the pixels belonging to the same component. With simple modification, however, the techniques can be extended to the 4-connectivity case.

1. Levialdi's Parallel-Shrink Operators
The basic idea of Levialdi's parallel-shrink algorithm is to shrink each component in parallel to an isolated pixel at one corner of its bounding rectangle, which is the smallest upright rectangle containing all the black pixels of the component (Levialdi, 1972). We refer to this isolated pixel as the representative pixel of the component. Let a ∈ {0, 1}^X denote the source image, where X is an n x n point set. Levialdi defined four parallel-shrink operators in terms of the Heaviside function. Each uses a different 2 x 2 window and is distinguished by the direction in which it shrinks components. Since a parallel-shrink operator changes pixel values of an image in parallel when applied to the image, it is helpful to describe a parallel-shrink operator by the configurations of the neighborhood used for changing pixel values. Levialdi's parallel-shrink operator φ_ul uses the information in the 2 x 2 neighborhood shown in Fig. 25 and shrinks components toward the upper left corners of their bounding rectangles.
FIGURE 25. The neighborhood for Levialdi's parallel-shrink operator φ_ul.
FIGURE 26. The configurations for the parallel-shrink operator φ_ul, where * is either 1 or 0.
The operator can be described as follows: (a) when the window has the configuration shown in Fig. 26a, change the black pixel to white; (b) when the window has the configuration shown in Fig. 26b, change the white pixel to black; (c) otherwise, do not change. If a* is the image obtained by applying a shrinking operation to the image a, then

a* = [a ∧ (a ⊕ t₁)] ∨ (a ⊕ t₂),

where t₁ and t₂ are as shown in Fig. 27. Obviously, Levialdi's parallel-shrink operator φ_ul can be implemented in O(1) time on MCCs. Similarly, Levialdi defined three other parallel-shrink operators, which shrink connected components toward the three other corners. Levialdi (1972) proved that when any one of his parallel-shrink operators is applied repeatedly in parallel to a binary image,
1. a correct shrinking will be obtained, i.e., only black pixels that do not disconnect a component will be changed to white, and white pixels will not be changed to black when this would merge two or more components;
2. a single isolated pixel will be obtained for each component;
3. the maximum number of steps necessary for completely shrinking a given component with an r x r bounding rectangle is 2r - 1.
FIGURE 27. The templates for Levialdi's parallel-shrink operator φ_ul.
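Our reconstruction of the upper-left shrink rules (a black pixel turns white when its right, lower, and lower-right neighbors are all white; a white pixel turns black when its right and lower neighbors are both black) can be sketched in numpy; the templates of Fig. 27 are not reproduced here, so the rules are an assumption drawn from the description above:

```python
import numpy as np

def shrink_ul(a):
    """One application of the upper-left parallel shrink to a binary (0/1)
    image, computed for all pixels in parallel."""
    p = np.pad(a, ((0, 1), (0, 1)))    # zero pad below and to the right
    right = p[:-1, 1:]
    below = p[1:, :-1]
    diag = p[1:, 1:]                   # lower-right neighbor
    keep = right | below | diag        # a black pixel survives with such a neighbor
    grow = right & below               # a white pixel becomes black here
    return np.where(a == 1, keep, grow).astype(a.dtype)

def steps_to_vanish(a):
    """Number of applications until the image is all zero."""
    n = 0
    while a.any():
        a = shrink_ul(a)
        n += 1
    return n
```

A 2 x 2 block (r = 2) reaches a single representative pixel after two steps and vanishes after 2r - 1 = 3, consistent with properties 2 and 3 above.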
2. New Parallel-Shrink Operators

Levialdi's parallel-shrink operators shrink each component toward one corner of its bounding rectangle. If we design a new parallel-shrink operator that shrinks toward more than one corner, we should be able to shrink an image completely in fewer steps. Thus, we try to combine Levialdi's parallel-shrink operators to obtain new parallel-shrink operators (Shi and Ritter, 1994b). Since a parallel-shrink operator combined from two of Levialdi's parallel-shrink operators that shrink toward opposite directions cannot determine from local information which direction it should shrink toward, we can only combine pairs of Levialdi's parallel-shrink operators that do not shrink toward opposite directions, obtaining four new parallel-shrink operators. As we know, Levialdi's parallel-shrink operator φ_ul shrinks toward the upper left corner, and another of his operators, φ_ll, shrinks toward the lower left corner. Thus, combining the effects of φ_ul and φ_ll, we have a new parallel-shrink operator φ_l that shrinks each component toward the left side of its bounding rectangle. The new operator φ_l uses a 3 x 2 neighborhood, as shown in Fig. 28. When applied to an image, the operator φ_l changes the value of a pixel if and only if one of φ_ul and φ_ll changes it. Thus, the operator φ_l can be expressed as follows: (a) when the neighborhood matches one of the configurations shown in Fig. 29a, change the black pixel to white; (b) when the neighborhood matches one of the configurations shown in Fig. 29b, change the white pixel to black; (c) otherwise, do not change.
If a* is the image obtained by applying φ_l to the image a, then

a* = [a ∧ (a ⊕ t₁) ∧ (a ⊕ t₃)] ∨ [(a ⊕ t₂) ∨ (a ⊕ t₄)],

where t₁ and t₂ are as before, and t₃ and t₄ are as shown in Fig. 30.
FIGURE 28. The neighborhood for the new parallel-shrink operator φ_l.
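Under the same reconstructed rules, the combined operator flips a pixel exactly when the upper-left or the lower-left shrink would flip it. A sketch (the template details of Figs. 28 through 30 are not reproduced here, so the rules are our assumption):

```python
import numpy as np

def _flips(a, up=False):
    """Pixels flipped by one Levialdi-style shrink: toward the upper left
    (up=False: right/lower neighbors) or the lower left (up=True:
    right/upper neighbors). 1->0 if all three corner-ward neighbors are 0;
    0->1 if the horizontal and vertical ones are both 1."""
    p = np.pad(a, 1)
    r = p[1:-1, 2:]                              # right neighbor
    v = p[:-2, 1:-1] if up else p[2:, 1:-1]      # vertical neighbor
    d = p[:-2, 2:] if up else p[2:, 2:]          # diagonal toward that corner
    black_to_white = (a == 1) & (r == 0) & (v == 0) & (d == 0)
    white_to_black = (a == 0) & (r == 1) & (v == 1)
    return black_to_white | white_to_black

def shrink_left(a):
    """phi_l: flip a pixel iff phi_ul or phi_ll would flip it."""
    flip = _flips(a, up=False) | _flips(a, up=True)
    return (a ^ flip.astype(a.dtype)).astype(a.dtype)
```

On a 2 x 2 block (r = 2), one application leaves two vertically connected pixels in the left column, and a second application removes them: ceil(1.5r) - 1 = 2 steps.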
FIGURE 29. The configurations for the parallel-shrink operator φ_l, where * is either 1 or 0.
It is not difficult to see that the new parallel-shrink operator φ_l can also be implemented in O(1) time on MCCs. Similarly, we can define three other new parallel-shrink operators that shrink toward the three other directions. We can prove that any new parallel-shrink operator can shrink an n x n binary image in at most ⌈1.5n⌉ - 1 parallel steps, preserving 8-connectivity during the shrinking process. Due to similarity, we only give proofs for the operator φ_l.
Theorem V.1. The parallel shrinking algorithm is correct, i.e., when φ_l is applied repeatedly in parallel to a binary image,
1. only black pixels that do not disconnect a component will be changed to white;
2. white pixels will not become black if this merges two or more components.
Proof. (1) The cases in which a*(p) = 0 when a(p) = 1 are shown in Fig. 31. We consider the three possibilities that the shrinking process might
FIGURE30. Two more templates for the new parallel-shrink operator 9,.
FIGURE 31. The cases in which a*(p) = 0 when a(p) = 1.
disconnect a component:
(a) a = 1, b = e = 0, and at least one of c and d is 1: then b* = 1, so 8-connectivity is preserved.
(b) e = 1, d = a = 0, and at least one of c and b is 1: then d* = 1, so 8-connectivity is preserved.
(c) a = e = 1, b = d = 0, and c may be 0 or 1: then b* = 1 and d* = 1, so 8-connectivity is still preserved.
(2) The cases in which a*(p) = 1 when a(p) = 0 are shown in Fig. 32. The only possibility that this may merge two or more components is that c = 1, b = d = 0, and a and e may have any values of 0 and 1. However, c* will be 0. Thus, 8-connectivity is preserved. Q.E.D.

Theorem V.2. When φ_l is applied repeatedly in parallel to a binary image, each connected component in the image will be completely shrunk.
Proof. Suppose a component is bounded by a rectangle formed by the four lines i = i₁, i = i₂, j = j₁, and j = j₂, as shown in Fig. 33. Line L₁ and line L₂ divide the component into three areas. Let A denote the set of all the pixels of the rectangle in area A, including all the pixels on the boundary of A. Let B denote the set of all the pixels of the rectangle in area B, and C
FIGURE 32. The cases in which a*(p) = 1 when a(p) = 0.
FIGURE 33. Illustration of the parallel shrinking process.
denote the set of all the pixels of the rectangle in area C. Assume that all the pixels in B and C are white, and that there are some black pixels on both line L₁ and line L₂. Initially, B and C may be empty, and A may contain black and white pixels of the rectangle. Now consider applying our parallel-shrink operator φ_l to this component. We may change some black pixels to white and some white pixels to black in A. However, we keep all pixels in B and C white, and we change all the black pixels on lines L₁ and L₂ to white, which means that we can move line L₁ and line L₂ one pixel to the left. Thus, after one application of φ_l, the size of A is decreased. If we apply φ_l repeatedly, we can decrease the size of A to 0. Therefore, we can completely shrink the component. Q.E.D.

Unlike Levialdi's parallel-shrink operators, the operator φ_l, when applied repeatedly, might shrink a component into either a single isolated pixel or two vertically connected pixels surrounded by white pixels, and then shrinks it completely when applied one more time.

Theorem V.3. The maximum number of steps necessary for completely shrinking a given component with an r x r bounding rectangle by using φ_l is ⌈1.5r⌉ - 1.
Proof. Suppose a component is bounded by an r x r rectangle, as shown in Fig. 33. As we observed in the proof of Theorem V.2, every application of φ_l moves both line L₁ and line L₂ one pixel to the left. Thus, they move two pixels closer along the right side of the bounding rectangle. After at most ⌈r/2⌉ - 1 applications of φ_l, line L₁ and line L₂ will be one or two pixels apart on the right side of the rectangle. From then on, every application of φ_l will turn the black pixel(s) on the rightmost column into white pixel(s) and leave one or two black pixels on the column immediately to the left of the rightmost column. Thus, r more applications of φ_l will completely shrink the component. Therefore, the maximum number of steps to completely shrink a component bounded by an r x r rectangle is ⌈1.5r⌉ - 1. Q.E.D.

For a component bounded by an r x r rectangle, it is not difficult to see that any shrinking algorithm that shrinks components pixel by pixel, preserving 8-connectivity in the shrinking process, takes at least r/2 steps in the worst case, since there cannot exist a pixel in the rectangle such that the maximum 4-connected distance between that pixel and any pixel on the component boundary is less than r/2. Thus, this kind of parallel shrinking algorithm shrinks an n x n binary image with complexity Ω(n) in the worst case. Levialdi's parallel shrinking algorithm shrinks an n x n binary image within O(n) steps by changing some black pixels to white and some white pixels to black as necessary. Thus, his algorithm is asymptotically optimal, but with a multiplicative constant of 2. The new parallel shrinking algorithm we proposed is also asymptotically optimal and shrinks an n x n binary image within O(n) steps, but with a smaller multiplicative constant, namely, 1.5.
When used in a local image component labeling algorithm, a parallel-shrink operator taking fewer steps requires the labeling algorithm to store fewer intermediate images.

C. Labeling Binary Image Components
Labeling the connected components of a digitized image is a fundamental process in image analysis and machine vision (Rosenfeld and Kak, 1982). The process of labeling assigns a unique label to each connected component in the image. Once an image has been labeled, the components that correspond to different objects can be studied, described, and possibly recognized by higher-level analysis algorithms. From the definition of connectedness of image components in a binary image, we know that labels can be propagated locally among adjacent pixels. However, two pixels in the same component can be connected by a
relatively long path. Thus, the image labeling problem has both local and global features, which can be exploited in developing local and global labeling algorithms. The labeling of connected components has been studied intensively, and many algorithms for different architectures have been proposed (Alnuweiri and Prasanna, 1992). On an n x n MCC, Nassimi and Sahni (1980) gave a well-known labeling algorithm that uses a divide-and-conquer technique with global operations and complex pointer operations to label an n x n binary image in O(n) time with O(log n) bits of local memory. However, this algorithm has a very large multiplicative constant in its time complexity. Recently, some parallel algorithms using only local operations have been proposed (Cypher et al., 1990; Alnuweiri and Prasanna, 1991). These algorithms use a fast binary image shrinking algorithm devised by Levialdi (1972); they have very small multiplicative constants in their complexities and are more practical. Cypher, Sanz, and Snyder's first algorithm (Cypher et al., 1990) takes O(n) time and O(n) bits of local memory. To reduce the space requirement of their algorithm, we use the new parallel shrinking algorithm discussed in the last section and propose an algorithm that takes O(n) time and reduces the local space requirement by a constant factor. Several algorithms have been proposed to further reduce the space requirement (Cypher et al., 1990; Alnuweiri and Prasanna, 1991; Shi and Ritter, 1993b; Shi and Ritter, 1994a). Cypher, Sanz, and Snyder's second algorithm (Cypher et al., 1990) requires O(log n) bits of local memory, but it takes O(n log n) time. Alnuweiri and Prasanna (1991) provided an algorithm with an integer parameter k between 1 and log(2n) that requires O(kn^{1/k}) bits of local memory and takes O(kn) time. These well-known local labeling algorithms fail to achieve the lower bound of O(n) time and O(log n) space achieved by the global labeling algorithm.
Whether there exists a local labeling algorithm that can achieve that lower bound has remained an open question (Alnuweiri and Prasanna, 1991; 1992). We propose a fast algorithm that answers this open question positively. In order to label an n x n binary image in O(n) time and O(log n) space, our algorithm uses a pipeline mechanism with stack-like data structures to shrink O(log n) images at the same time. The algorithm achieves the complexity lower bound with small multiplicative constants, making it the most efficient algorithm in both practical and asymptotic complexity measures. We consider labeling an n x n image on MCCs and present labeling algorithms in terms of image algebra primitives (Shi and Ritter, 1993a; Shi, 1994). In the following presentation, we define d ∈ ℤ^X by d(i, j) = i · n + j; i.e., d has a unique value at each pixel. Given an 8-connected component S and pixels p and q in S, the 8-distance between p and q is the length of a
shortest path joining p and q such that neighboring pixels in the path are 8-connected. If we restrict all the pixels of the path to be contained in S, then the length of a shortest such path is called the intrinsic 8-distance between p and q. The 8-diameter of S is defined as the greatest 8-distance between any pair of pixels in S. The intrinsic 8-diameter of S is the greatest intrinsic 8-distance between any pair of pixels in S.

1. Naive Local Labeling Algorithm
A naive local labeling algorithm starts by assigning each black pixel a unique label and propagates labels in 3 x 3 neighborhoods repeatedly until the maximum label of each component has been propagated to all its pixels.

a. Label Propagating Operator. Let p₀ be the template shown in Fig. 34, whose support is a 3 x 3 neighborhood. The label propagating operator ψ can easily be expressed in image algebra. If b = ψ(a), then

b = (a ⊞ p₀) · χ_{>0}(a).

b. The Algorithm. Since any image comparison takes O(n) time on MCCs, we only compare two images every n steps in the naive algorithm. The algorithm is expressed in image algebra as follows:

Algorithm Label-1.
  l := d · a;
  k := 1;
  loop
    c := l;
    l := ψ(l);
    if k mod n = 0 then
      exit when l = c;
    end if;
    k := k + 1;
  end loop;
end algorithm.

FIGURE 34. The template for the label propagating operator ψ.
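Label-1 can be sketched in numpy, with ψ as a 3 x 3 neighborhood maximum masked to black pixels. We start the labels at i·n + j + 1 rather than i·n + j so that the pixel at (0, 0) is distinguishable from the background, and for clarity the sketch compares images on every step rather than every n steps:

```python
import numpy as np

def propagate(lab):
    """psi: each black pixel takes the maximum label in its 3x3 neighborhood."""
    p = np.pad(lab, 1)
    m = lab.copy()
    for di in range(3):
        for dj in range(3):
            m = np.maximum(m, p[di:di + lab.shape[0], dj:dj + lab.shape[1]])
    return m * (lab > 0)                    # restrict to black pixels

def label_naive(a):
    """Label-1 sketch: unique initial labels, then propagate until stable."""
    n = a.shape[0]
    lab = (np.arange(n * n).reshape(n, n) + 1) * a
    while True:
        new = propagate(lab)
        if np.array_equal(new, lab):
            return lab
        lab = new
```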
The algorithm is very simple and stores only a couple of images. The execution time depends on the intrinsic 8-diameter of the largest component in the image, since the label of each component is propagated pixel by pixel. Suppose that the intrinsic 8-diameter of the largest component is d. The time complexity of the naive algorithm is O(max(n, d)), since every label propagation takes constant time and every image comparison takes O(n) time. The intrinsic 8-diameter of a component can be directly proportional to n². Thus, the naive local labeling algorithm takes O(n²) time in the worst case. To be precise, we consider a worst-case image that contains only a spiral component, as shown in Fig. 35a. For this specific example, the algorithm needs to loop 60 times to propagate the maximum label of the pixel at the lower right corner to the interior endpoint of the spiral. In general, for an n x n image containing a spiral component as shown in Fig. 35a, let Lₙ be the number of iterations the algorithm needs to propagate the maximum label to all the pixels in the component. The maximum label can be propagated to a pixel (the pixel with dark shade in Fig. 35b) on the layer next to the outside layer in 4(n - 2) steps, and the remaining pixels in the spiral component form a spiral component in an (n - 4) x (n - 4) image. Thus, we have the following recurrence equation:

Lₙ = 4(n - 2) + Lₙ₋₄, with L₁ = 0, L₂ = 1, L₃ = 4, and L₄ = 7.

Solving the recurrence equation, we have Lₙ = ⌈n²/2⌉ - 1.
FIGURE 35. (a) An image containing a spiral component. (b) Labeling process of the spiral component.
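The recurrence Lₙ = 4(n - 2) + Lₙ₋₄ and the closed form ⌈n²/2⌉ - 1 can be checked against each other numerically (a quick sketch):

```python
from math import ceil

def L(n):
    """Iteration count for the spiral worst case, from the recurrence
    L_n = 4(n - 2) + L_{n-4} with the base cases given in the text."""
    base = {1: 0, 2: 1, 3: 4, 4: 7}
    return base[n] if n in base else 4 * (n - 2) + L(n - 4)
```

In particular, L(11) = 60, matching the spiral example above.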
FIGURE 36. An example of the simple labeling algorithm Label-1.
Therefore, the naive algorithm takes ⌈n²/2⌉ - 1 = O(n²) local operations in the worst case. For the 11 x 11 image shown on the left of Fig. 36, the naive local labeling algorithm Label-1 takes at least 23 iterations to obtain the label image shown on the right of Fig. 36.

2. Basic Two-Phase Local Labeling Algorithm

A good method to reduce the time complexity of the naive labeling algorithm is to precede the label-propagating phase with a fast image-shrinking phase. The number of iterations required by an algorithm using both a shrinking operator and a label-propagating operator is directly proportional only to n. However, the price for decreasing the time complexity is an increase in space complexity. The basic two-phase local labeling algorithm proceeds in two phases (Cypher et al., 1990). In the first phase, it applies the parallel-shrink operation devised by Levialdi (1972) repeatedly to a binary n x n source image a₀ until an all-zero image is obtained, resulting in 2n binary images a₀, a₁, ..., a_{2n-1}. This phase shrinks connected components to their representative pixels. Note that the image a_{2n-1} is an all-zero image. We refer to l as the index of the image a_l in the sequence. In the second phase, it assigns labels to image a_{2n-2}, which consists of representative pixels only. These labels are then propagated to the pixels connected to them in a_{2n-3} by applying a label-propagate operation, then to a_{2n-4}, and so on until a₀ is labeled. In the labeling process, new representative pixels that represent new components may be encountered. When this happens, new labels are assigned to the representative pixels and the labeling process continues. Note that more than one connected component may be shrunk to the same representative pixel, but at different iterations. Thus, a unique label assigned to a representative pixel should include the iteration number at which it is generated.
The preceding simple algorithm applies the label-propagate operations in reverse order to the images generated by the parallel-shrink operations. Thus, it needs to store 2n images, requiring each PE to have 2n bits of local memory.
FIGURE 37. The neighborhood for the label-propagate operator ψ_ul.
a. Label-Propagate Operator. We discussed Levialdi's parallel-shrink operator φ_ul in the last section. Corresponding to the parallel-shrink operator, we have a label-propagate operator, which labels the black pixels of a_l from the already labeled image a_{l+1}. Since a parallel-shrink operator, applied to a binary image iteratively, may shrink different components into the same isolated pixel at different shrinking steps, we assign a new label (l + 1, i, j) to an isolated black pixel in a_l that represents a new component, where l + 1 is used to distinguish the component of a single pixel at (0, 0) from the background (white pixels). Since 0 ≤ i, j ≤ n - 1 and 0 ≤ l < 2n - 1, (l + 1, i, j) can be considered a (3 log n + 1)-bit number with l + 1 in the first log n + 1 bits, i in the middle log n bits, and j in the last log n bits. The label-propagate operator ψ_ul corresponding to Levialdi's parallel-shrink operator φ_ul uses the information of the neighborhood shown in Fig. 37 to propagate the labels of black pixels in image a_{l+1} to their neighboring black pixels in image a_l. Let la_l be the label image of a_l. If la_l = ψ_ul(la_{l+1}, a_l), then

la_l = ((la_{l+1} ∨ a_l) ⊞ p₁) · a_l,
where the template p₁ is shown in Fig. 38. The operator ψ_ul overlaps the label image of a_{l+1} and the image a_l, and then propagates the labels in the first image to the black pixels in the second image. It is not difficult to see that the label-propagate operator can be implemented on MCCs in O(1) time.
FIGURE 38. The template p₁ for the label-propagate operator ψ_ul.
b. The Algorithm. The basic two-phase local labeling algorithm using Levialdi's parallel-shrink operator φ_ul and its corresponding label-propagate operator ψ_ul is as follows:
Algorithm Label-2.
  a₀ := a;
  /* Shrink image components */
  for k := 1 to 2n - 2 loop
    a_k := φ_ul(a_{k-1}); /* Shrink */
  end loop;
  /* Assign and propagate component labels */
  l := (d + n²(2n - 1)) · a_{2n-2}; /* Assign new labels */
  for k := 2n - 3 down to 0 loop
    l := ψ_ul(l, a_k); /* Propagate labels */
    l := ((d + n²(k + 1)) · χ_{=1}(l)) ∨ l; /* Assign new labels */
  end loop;
end algorithm.
Levialdi (1972) has shown that when the operator φ_ul is applied 2n - 1 times to an n x n image, the image becomes an all-zero image. Thus, the preceding algorithm is correct and uses 2n - 2 parallel-shrink operations and 2n - 2 label-propagate operations. It stores 2n - 2 intermediate images. Therefore, both the time complexity and the space complexity of the algorithm are O(n). The algorithm is illustrated in Fig. 39. It takes 20 iterations to shrink the 11 x 11 image and 20 iterations to label the image components. Note that only nonzero images are shown in the figure.
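A sequential numpy sketch of the two phases follows. The shrink is the reconstruction used earlier; the propagation window p₁ is taken to be the 2 x 2 block of a pixel and its upper-left neighbors, mirroring the shrink direction (an assumption, since Figs. 37 and 38 are not reproduced here):

```python
import numpy as np

def shrink_ul(a):
    """Upper-left parallel shrink (reconstruction; see Section V.B.1)."""
    p = np.pad(a, ((0, 1), (0, 1)))
    r, b, br = p[:-1, 1:], p[1:, :-1], p[1:, 1:]
    return np.where(a == 1, r | b | br, r & b).astype(a.dtype)

def propagate_ul(lab, a):
    """psi_ul sketch: la = ((lab v a) max-plus p1) . a, with p1 assumed to
    cover a pixel and its left, upper, and upper-left neighbors."""
    m = np.maximum(lab, a)
    p = np.pad(m, ((1, 0), (1, 0)))        # pad top and left
    best = np.maximum.reduce([p[1:, 1:], p[1:, :-1], p[:-1, 1:], p[:-1, :-1]])
    return best * a                        # restrict to black pixels of a

def label_two_phase(a):
    """Label-2 sketch: shrink to all-zero, then label the images in reverse.
    A black pixel still holding the value 1 after propagation has no labeled
    neighbor, so it is a new representative and receives d + n^2 (k + 1)."""
    n = a.shape[0]
    d = np.arange(n * n).reshape(n, n)
    seq = [a]
    for k in range(1, 2 * n - 1):
        seq.append(shrink_ul(seq[-1]))
    lab = (d + n * n * (2 * n - 1)) * seq[2 * n - 2]
    for k in range(2 * n - 3, -1, -1):
        lab = propagate_ul(lab, seq[k])
        lab = np.where(lab == 1, d + n * n * (k + 1), lab)
    return lab
```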
FIGURE 39. Illustration of the labeling algorithm Label-2.
FIGURE 40. The neighborhood for the label-propagate operator ψ_l.
3. Improved Basic Two-Phase Local Labeling Algorithm

As we discussed in the last section, the new parallel-shrink operator φ_l requires storage of fewer intermediate images when used in a local image component labeling algorithm. We use the new parallel-shrink operator to reduce the memory requirement of the basic two-phase local labeling algorithm, obtaining an improved basic two-phase local labeling algorithm.

a. Label-Propagate Operator. Corresponding to the new parallel-shrink operator φ_l, a label-propagate operator ψ_l is designed which uses the information of the neighborhood shown in Fig. 40. Let la_l be the label image of a_l. If la_l = ψ_l(la_{l+1}, a_l), then

la_l = ((la_{l+1} ∨ a_l) ⊞ p₂) · a_l,

where the template p₂ is shown in Fig. 41. The operator ψ_l overlaps the label image of a_{l+1} and the image a_l, and then propagates the labels in the first image to the black pixels in the second image.

b. The Algorithm. Since the new shrinking operator φ_l may shrink a connected component to two neighboring representative pixels in a column, after assigning unique labels to representative pixels we should propagate labels with the 2 x 1 template shown in Fig. 42, so that the representative pixels of each component have the same label.
FIGURE 41. The template p₂ for the label-propagate operator ψ_l.
FIGURE 42. The template for making two representative pixels of a component have the same label.
The algorithm using the new parallel-shrink operator sponding label-propagate operator I,U/ is as follows:
(pI
and its corre-
Algorithm Label-3.
a_0 := a;
/* Shrink image components */
for k := 1 to ⌈1.5n⌉ - 2 loop
  a_k := φ_l(a_{k-1}); /* Shrink */
end loop;
/* Assign and propagate component labels */
l := (d + n^2(⌈1.5n⌉ - 1)) · a_{⌈1.5n⌉-2}; /* Assign new labels */
l := (l ⊕ p_3) * χ_{≥1}(l); /* Make two new labels for a component the same */
for k := ⌈1.5n⌉ - 3 down to 0 loop
  l := ψ_l(l, a_k); /* Propagate labels */
  l := ((d + n^2(k + 1)) * χ_{=1}(l)) ∨ l; /* Assign new labels */
  l := (l ⊕ p_3) * χ_{≥1}(l); /* Make two new labels for a component the same */
end loop;
end algorithm.

It has been shown that the new shrinking operator can shrink any n × n image completely in at most ⌈1.5n⌉ - 1 steps. The algorithm Label-3 is correct and takes ⌈1.5n⌉ - 2 parallel-shrink operations, ⌈1.5n⌉ - 2 label-propagate operations, and ⌈1.5n⌉ - 2 additional image-template operations involving a 2 × 1 template, but it requires the storage of only ⌈1.5n⌉ - 2 intermediate images. Thus, the time complexity of the algorithm is O(n). The space complexity is also O(n). However, the labeling algorithm Label-3 stores fewer intermediate images than the labeling algorithm Label-2. The algorithm is illustrated in Fig. 43. It takes 15 iterations to shrink the 11 × 11 image and 15 iterations to label the image components. Note that only nonzero images are shown in the figure.
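For reference, the result that all of these labeling algorithms must agree with can be computed sequentially. The sketch below (function name is ours) labels 8-connected components of a binary image by breadth-first search, assigning one unique label per component.

```python
from collections import deque

def label_components(img):
    """Sequential 8-connected component labeling of a 0/1 image.
    Returns an image in which all pixels of a component share a
    unique positive label; background pixels stay 0."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    next_label = 0
    for i in range(rows):
        for j in range(cols):
            if img[i][j] == 1 and out[i][j] == 0:
                next_label += 1
                out[i][j] = next_label
                q = deque([(i, j)])
                while q:
                    x, y = q.popleft()
                    for dx in (-1, 0, 1):
                        for dy in (-1, 0, 1):
                            nx, ny = x + dx, y + dy
                            if (0 <= nx < rows and 0 <= ny < cols
                                    and img[nx][ny] == 1 and out[nx][ny] == 0):
                                out[nx][ny] = next_label
                                q.append((nx, ny))
    return out
```

The parallel algorithms differ from this only in how the work is distributed, not in the labeling they produce (up to a renaming of labels).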
PARALLEL IMAGE PROCESSING WITH IMAGE ALGEBRA
FIGURE 43. Illustration of the labeling algorithm Label-3.
4. Advanced Local Labeling Algorithm
In the basic local labeling algorithms, the label-propagate operations are applied in reverse order to the images generated by the parallel-shrink operations. Thus, we have to store the intermediate images or generate them on the fly in the label-propagating process. Suppose that for the source image a = a_0, we have intermediate images a_0, a_1, ..., a_{s-1} to be generated by the shrinking process. If fewer than s images are allowed, then only a subset of the s images can be stored at any time, and the remaining images have to be generated in order during the label-propagating process. Suppose that the image a_{l+1} has been labeled and the next image to be labeled is a_l. If a_l is stored, it can be labeled using a label-propagate operation; if not, it has to be generated from a stored image a_{l'} with maximum index l' < l. Since Levialdi's parallel-shrink operator and the new parallel-shrink operator have the same asymptotic complexity in shrinking binary images, we simply use Levialdi's parallel-shrink operator. With Levialdi's parallel-shrink operator, the number of intermediate images needed for labeling an n × n image is at most s = 2n. Cypher, Sanz, and Snyder's log-space algorithm (Cypher et al., 1990) reduces the local memory requirement to O(log n) binary images, but it increases the time to O(n log n). Alnuweiri and Prasanna's algorithm (Alnuweiri and Prasanna, 1991) requires O(kn^{1/k}) bits of local memory with k in the range between 1 and log(2n), but it takes O(kn) time. Both the log-space algorithm (Cypher et al., 1990) and the stack-based algorithm (Alnuweiri and Prasanna, 1991) can work under the constraint of space for storing only O(log n) binary images, taking O(n log n) time.
Although they are conceptually similar and have to use more time if less space is allowed, the log-space algorithm stores intermediate images in an order determined by two functions and is not general, while the stack-based algorithm determines the storage order in a natural way using stacks and is more general. The stack-based algorithm uses k stacks, each having capacity for storing (2n)^{1/k} binary images, where k is an integer in the range
between 1 and log(2n). The stack-based algorithm takes O(kn) time and requires space for storing O(kn^{1/k}) binary images. Here, we describe the stack-based algorithm using image algebra. Let 2n = m^k for any integer k with 1 ≤ k ≤ log(2n). We use k stacks S_0, S_1, ..., S_{k-1}, each having capacity for storing m binary images. In the algorithm, stack S_q stores each image a_r with index r divisible by m^q. We first define some primitive operations on stacks as follows:

set-empty(S): set stack S empty;
empty(S): return true if stack S is empty and false otherwise;
pop(S): pop the top image from stack S;
push(a, S): push image a into stack S.
The algorithm is as follows:

Algorithm Label-4.
/* Initialize */
l := 0; I := 2n;
for h := 0 to k - 1 loop
  set-empty(S_h);
end loop;
q := k;
loop
  for h := q - 1 down to 0 loop
    /* Shrink the image m^h(m - 1) times and push m images into S_h */
    push(a, S_h);
    for r := 1 to m^{h+1} - m^h loop
      a := φ_{ul}(a);
      if r mod m^h = 0 then push(a, S_h);
    end loop;
    if h ≠ 0 then a := pop(S_h);
  end loop;
  /* Label m images popped from S_0 */
  for r := 1 to m loop
    l := ψ_{ul}(l, pop(S_0));
    l := ((d + n^2 I) * χ_{=1}(l)) ∨ l;
    I := I - 1;
  end loop;
  /* Find the highest nonempty stack */
  q := 1;
  while q < k and empty(S_q) loop
    q := q + 1;
  end loop;
  exit when q = k;
  a := pop(S_q);
end loop;
end algorithm.

Obviously, the algorithm only requires space for storing O(km) = O(kn^{1/k}) images. Let T_h be the number of parallel-shrink operations needed for labeling image a_{rm^h} in stack S_h after image a_{(r+1)m^h} is labeled. To label a_{rm^h}, the algorithm generates the m^h intermediate images between a_{rm^h} and a_{(r+1)m^h}, and stores m intermediate images in stack S_{h-1}. The image a_{rm^h} is labeled after all m pairs of images in consecutive locations of S_{h-1} are labeled. Thus, we have T_h = m^h + mT_{h-1}, with T_0 = 0. Solving the recurrence equation, we have T_h = hm^h. Since the labeling problem can be considered as the problem of labeling a_0 after a_{m^k} is labeled in an imaginary stack S_k, the number of parallel-shrink operations for the labeling problem is km^k. Obviously, the algorithm takes m^k label-propagate operations. Therefore, the time complexity of the algorithm is O(kn).

5. Fast Advanced Local Labeling Algorithm
The previous algorithms reduce the local space requirement at the price of increased time complexity. The reason is that they shrink only one image at a time. Let 2n = m^k for any integer k with 1 ≤ k ≤ log(2n). We propose a fast local labeling algorithm (Shi and Ritter, 1993; Shi and Ritter, 1994) that shrinks k images at a time in order to keep the time complexity O(n) regardless of k, using O(km) local space. In the algorithm, any intermediate image a_l has been generated and is available for use by the time it is needed. The algorithm is based on a pipeline mechanism and stack-like data structures.

a. Shrinking Multiple Images Each Time. A collection of k binary images can be considered as an image containing k-bit binary numbers. Suppose that ac is a collection of k binary images with k = O(log n), and ac[h] is some binary image a_l for 0 ≤ h ≤ k - 1. We can shrink the k images in ac simultaneously to obtain a new collection of k images in ac* as follows:

ac* := [ac ∧ (ac ⊠ t_1)] ∨ (ac ⊠ t_2),
where ∧ and ∨ are k-bit bitwise logical operations. This shrinking operator will be referred to as φ^k. If the hth image in ac is some binary image a_{l_h}, i.e., ac[h] = a_{l_h}, where l_h is an index and 0 ≤ h ≤ k - 1, then ac*[h] = a_{l_h + 1}.
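The mechanism can be sketched in Python, with arbitrary-precision integers standing in for the k-bit words of the PEs. The shrink rule used below (a black pixel survives iff its north or west neighbor is black) is only an illustrative stand-in, not the operator of the text; the point is that one pass of bitwise ∧/∨ over packed words processes all k bit-planes at once.

```python
def shrink_packed(ac):
    """One bitwise shrink step on a packed image.  Bit h of ac[i][j]
    is pixel (i, j) of plane h.  Illustrative rule only: a pixel
    survives iff it is black and its north or west neighbor is black;
    out-of-range neighbors count as white."""
    rows, cols = len(ac), len(ac[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            north = ac[i - 1][j] if i > 0 else 0
            west = ac[i][j - 1] if j > 0 else 0
            out[i][j] = ac[i][j] & (north | west)  # acts on all planes at once
    return out

def pack(planes):
    """Pack a list of 0/1 images into one integer image, plane h -> bit h."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[sum(planes[h][i][j] << h for h in range(len(planes)))
             for j in range(cols)] for i in range(rows)]

def unpack(ac, k):
    """Split an integer image back into k separate 0/1 bit-plane images."""
    return [[[(ac[i][j] >> h) & 1 for j in range(len(ac[0]))]
             for i in range(len(ac))] for h in range(k)]
```

Because shrink_packed uses only bitwise word operations, applying it to the packed image is equivalent to applying it to each plane separately, which is exactly the property φ^k exploits.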
It is not difficult to see that the parallel-shrink operator φ^k, with k = O(log n), can be implemented in O(1) time on MCCs.

b. Pipeline Mechanism. We design a pipeline consisting of k stages Stage_{k-1}, Stage_{k-2}, ..., Stage_0, as shown in Fig. 44. Stage_{k-1} takes image a_0 as input, shrinks it m^k - 1 times, stores the intermediate images a_0, a_{m^{k-1}}, ..., a_{(m-1)m^{k-1}}, and outputs a_{(m-1)m^{k-1}}, a_{(m-2)m^{k-1}}, ..., a_0 to Stage_{k-2}. For each a_{lm^{k-1}} taken from Stage_{k-1}, where l = m - 1, m - 2, ..., 0, Stage_{k-2} shrinks it m^{k-1} - 1 times, stores the intermediate images a_{lm^{k-1}}, a_{lm^{k-1}+m^{k-2}}, ..., a_{lm^{k-1}+(m-1)m^{k-2}}, and outputs a_{lm^{k-1}+(m-1)m^{k-2}}, a_{lm^{k-1}+(m-2)m^{k-2}}, ..., a_{lm^{k-1}} to Stage_{k-3}. In general, for each image a_{lm^{h+1}} taken from Stage_{h+1}, where l = m^{k-(h+1)} - 1, m^{k-(h+1)} - 2, ..., 1, 0, Stage_h shrinks it m^{h+1} - 1 times, stores a_{lm^{h+1}}, a_{lm^{h+1}+m^h}, ..., a_{lm^{h+1}+(m-1)m^h}, and outputs a_{lm^{h+1}+(m-1)m^h}, a_{lm^{h+1}+(m-2)m^h}, ..., a_{lm^{h+1}} to Stage_{h-1}. Thus, for each a_{lm} from Stage_1, where l = m^{k-1} - 1, m^{k-1} - 2, ..., 1, 0, Stage_0 shrinks it m - 1 times, stores a_{lm}, a_{lm+1}, ..., a_{lm+(m-1)}, and provides the sequence a_{lm+(m-1)}, a_{lm+(m-2)}, ..., a_{lm+1}, a_{lm} to the labeling process.

From the preceding analysis, we can see that Stage_h takes m^{h+2} steps to consume a sequence of m images supplied by Stage_{h+1} and takes m^{h+1} steps to produce a sequence of m images for Stage_{h-1}. Similarly, Stage_{h+1} takes m^{h+2} steps to produce a sequence of m images needed by Stage_h, and Stage_{h-1} takes m^{h+1} steps to consume a sequence of m images provided by Stage_h. Thus, each time Stage_h consumes a sequence of m images supplied by Stage_{h+1}, another sequence of m images has been generated by Stage_{h+1}. On the other hand, each time Stage_h produces a sequence of m images, the previous sequence of m images has been consumed by Stage_{h-1}.
Therefore, we can organize the pipeline stages to work on k different images at each step, such that Stage_h produces a sequence of m images exactly when Stage_{h-1} needs such a sequence. When Stage_h produces its first sequence of m images, Stage_{h-1} starts to work. After Stage_h produces its last sequence of m images, it stops working. Whenever Stage_0 produces a sequence of m images, we label them by applying m label-propagate operations.
FIGURE 44. The pipeline mechanism (stages Stage_{k-1} through Stage_0).
FIGURE 45. Illustration of how the pipeline works.
We can use the step number to coordinate the pipeline stages. Let Stage_{k-1} start to work at step Start_{k-1} = 0. With this information, we can determine when each stage starts to work and when it stops working. Stage_h has to start to work at step Start_h = (m^{k-(h+1)} - 1)m^{h+1} and stops working after step Stop_h = (m^{k-(h+1)} - 1)m^{h+1} + (m^{k-h} - 1)m^h, producing its first full S(h) at step Start_h + (m - 1)m^h = (m^{k-h} - 1)m^h and its last full S(h) at step Start_h + (m^{k-h} - 1)m^h = (m^{k-(h+1)} - 1)m^{h+1} + (m^{k-h} - 1)m^h. With this pipeline scheduling, Stage_h inputs an image from Stage_{h+1} at every step in its working period that is divisible by m^{h+1}, and stores an image at every step that is divisible by m^h. The pipeline starts with the source image a_0 at the highest stage and ends at the lowest stage with a sequence of m^k images a_{m^k - 1}, ..., a_1, a_0, whose indices are in descending order. A tiny example, labeling the 4 × 4 image shown in Fig. 45, illustrates how the pipeline works. Here, n = 4. Choosing k = 3, we have m = 2. We want the sequence of images a_7, a_6, ..., a_0 for the labeling process. The intermediate images shown in squares, generated by each stage, need to be stored.
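The start and stop steps (as reconstructed above) can be tabulated directly. For the 4 × 4 example with k = 3 and m = 2, the sketch below yields Start_2 = 0, Start_1 = 4, Start_0 = 6, and Stop_0 = 13, consistent with a pipeline that finishes at step 2m^k - m - 1 = 13.

```python
def schedule(m, k):
    """Working period of each pipeline stage:
       Start_h = (m^(k-h-1) - 1) * m^(h+1)
       Stop_h  = Start_h + (m^(k-h) - 1) * m^h"""
    return {h: ((m ** (k - h - 1) - 1) * m ** (h + 1),
                (m ** (k - h - 1) - 1) * m ** (h + 1)
                + (m ** (k - h) - 1) * m ** h)
            for h in range(k)}
```

A usage example: schedule(2, 3) returns {2: (0, 4), 1: (4, 10), 0: (6, 13)}.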
c. Data Structure. Because Stage_h outputs images to Stage_{h-1} in the reverse of the order in which it shrinks them, it is natural to use stack data structures to
FIGURE 46. The stack-like data structure (pop-pointer at the top, push-pointer at the first unused element from the bottom).
store the intermediate images. Since, in the pipeline mechanism, Stage_h prepares the next sequence of m images while the previous sequence of m images is being used by Stage_{h-1}, we combine two stacks to form a stack-like data structure for the intermediate images flowing from Stage_h to Stage_{h-1}. A stack-like data structure S of depth m is shown in Fig. 46. The pointer pop-pointer always points to the top, and the pointer push-pointer always points to the first unused element from the bottom. When push-pointer points to the bottom, S is empty; and when m elements have been pushed into an empty S, S is full. Popping an element from S, we obtain the top element and shift the elements in area data1 one position toward the top. Pushing an element into S, we put the element at the position pointed to by push-pointer and adjust push-pointer to (push-pointer + 1) mod m. We define the following basic operations, each of which can be done in O(1) time:

set-empty(S): set S empty;
pop(S): return the top image in S and shift the images in area data1 one position toward the top;
push(a, S): push image a into area data2 of S.
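A functional Python model of this structure may help (our own sketch; a real PE would realize the shifts as bit-plane moves): pops drain the older area data1, pushes fill data2, and data2 is promoted to data1 when the older area runs out.

```python
class StackLike:
    """Functional model of the two-area stack-like structure of Fig. 46.
    Stage h-1 pops the old sequence from area data1 while Stage h pushes
    the next sequence into area data2; when data1 drains, data2 becomes
    the new data1."""
    def __init__(self, m):
        self.m = m
        self.data1 = []   # old sequence, top element at the end
        self.data2 = []   # next sequence being pushed
    def set_empty(self):
        self.data1, self.data2 = [], []
    def empty(self):
        return not self.data1 and not self.data2
    def push(self, a):
        assert len(self.data1) + len(self.data2) < self.m, "would overflow"
        self.data2.append(a)
    def pop(self):
        if not self.data1:                    # promote the newer area
            self.data1, self.data2 = self.data2, []
        return self.data1.pop()               # top of the old area
```

With the pipeline's scheduling, a pop of the previous sequence always precedes a push of the next, so the capacity m is never exceeded.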
We use k + 1 stack-like structures S(k), S(k - 1), ..., S(0), where S(k) is 1 bit deep while each S(h), 0 ≤ h ≤ k - 1, is m bits deep. The indices of the images stored in each stack-like data structure S(h) are multiples of m^h. Since Stage_h pushes a new sequence of m images into the stack-like data structure S(h) while Stage_{h-1} pops the previous sequence of m images out of S(h), pushing images into S(h) will never overwrite it. As the example in Fig. 45 shows, Stage_1 pushes the image sequence a_0, a_2 into S(1) while Stage_0 pops the previous sequence a_6, a_4 out of S(1). Also, when Stage_{h-1} needs another sequence and starts to pop images out of S(h), Stage_h has already pushed a new sequence into S(h). Thus, we will never pop an empty stack-like data structure. In the same example, when Stage_0 needs the sequence a_2, a_0, Stage_1 has already pushed that sequence into S(1).

d. The Algorithm. The fast local labeling algorithm is presented next. Variable step is used to coordinate all the pipeline stages to work
concurrently and correctly. Variables high-stage and low-stage are used to indicate that only Stage_{high-stage} through Stage_{low-stage} are in their working periods. Variable ac holds all the current binary images of the k pipeline stages, with ac[h] storing the current image of Stage_h. When performing push(ac[h], S(h)), each PE extracts the hth bit from variable ac and pushes that bit into S(h). When performing ac[h] := pop(S(h + 1)), each PE pops a bit from S(h + 1) and puts that bit into the hth bit position of variable ac.
Algorithm Label-5.
input: a (a binary image to be labeled);
output: l (an image with components uniquely labeled);
/* Initialize */
1   step := 0;
2   l := 0; I := 2n;
3   low-stage := high-stage := k - 1;
4   for h := 0 to k loop
5     set-empty(S(h));
6   end loop;
7   push(a, S(k));
8   while (high-stage >= 0) loop
9     if (step mod m^{low-stage} = 0) then
        /* Find the last stage to update */
10      last-stage := low-stage;
11      while (step mod m^{last-stage + 1} = 0) loop
12        last-stage := last-stage + 1;
13      end loop;
        /* Update stack-like data structures */
14      for h := last-stage down to low-stage loop
15        if (step mod m^{h+1} = 0) then
16          ac[h] := pop(S(h + 1));
17        end if;
18        if (step mod m^h = 0) then
19          push(ac[h], S(h));
20        end if;
21      end for;
        /* Label the images in S(0) when it is full */
22      if (low-stage = 0) and ((step + 1) mod m = 0) then
23        for r := 0 to m - 1 loop
24          l := ψ_{ul}(l, pop(S(0)));
25          l := ((d + n^2 I) * χ_{=1}(l)) ∨ l;
26          I := I - 1;
27        end loop;
28      end if;
        /* Adjust low-stage and high-stage */
29      if (low-stage > 0) and (step >= Start_{low-stage - 1}) then
30        low-stage := low-stage - 1;
31        ac[low-stage] := pop(S(low-stage + 1));
32        push(ac[low-stage], S(low-stage));
33      end if;
34      if (step >= Stop_{high-stage}) then
35        high-stage := high-stage - 1;
36      end if;
37    end if;
      /* Shrink k images at the same time */
38    ac := φ^k(ac);
39    step := step + 1;
40  end loop;
end algorithm.
e. Algorithm Correctness and Complexities. We define the output of a pipeline stage Stage_h to be the sequence of images popped out of S(h) by Stage_{h-1}, in the order they are popped. The output of Stage_0 is the sequence of images used by the label-propagate operations, in the order they are popped out of S(0). We know from the pipeline scheduling that Stage_{h+1} produces a sequence of m images forming a full S(h + 1) exactly when Stage_h needs that full S(h + 1). For each image with index lm^{h+1} from the output of Stage_{h+1}, Stage_h produces a sequence of m images whose indices are lm^{h+1} + (m - 1)m^h, lm^{h+1} + (m - 2)m^h, ..., lm^{h+1} + m^h, lm^{h+1}, forming a full S(h) for the output of Stage_h. Thus, Stage_h converts the output of Stage_{h+1}, a sequence of images whose indices are m^{h+1} apart in descending order, into a sequence of images whose indices are m^h apart in descending order. To argue the correctness of the algorithm, we first give a theorem. The proof of the theorem is a little tedious, but it is easy to understand with the example given later.

Theorem V.4. In the preceding local labeling algorithm, the output of Stage_h is the sequence of m^{k-h} images a_{(m^{k-h} - 1)m^h}, a_{(m^{k-h} - 2)m^h}, ..., a_{lm^h}, ..., a_0, for 0 ≤ h ≤ k - 1.
Proof. We prove the theorem by induction on h.

(i) h = k - 1: Stage_{k-1} starts to work at step 0 with the input image a_0 popped from S(k) (in line 16) and pushes a_{lm^{k-1}} into S(k - 1) at step lm^{k-1} (in line 19), l = 0, 1, ..., m - 1, producing its only full S(k - 1) at step (m - 1)m^{k-1}. At this step, Stage_{k-2} starts to work with that full S(k - 1) and pops a_{lm^{k-1}} from S(k - 1) at step (m - 1)m^{k-1} + (m - 1 - l)m^{k-1} (in line 16), l = m - 1, m - 2, ..., 0. Thus, the output of Stage_{k-1} is a_{(m-1)m^{k-1}}, a_{(m-2)m^{k-1}}, ..., a_{m^{k-1}}, a_0.

(ii) Induction hypothesis: The theorem is true for h + 1. That is, the output of Stage_{h+1} is a_{(m^{k-(h+1)} - 1)m^{h+1}}, a_{(m^{k-(h+1)} - 2)m^{h+1}}, ..., a_{lm^{h+1}}, ..., a_0.

(iii) We show that the theorem is also true for h. Stage_h starts to work at step Start_h = (m^{k-(h+1)} - 1)m^{h+1} and pops a_{lm^{h+1}} out of S(h + 1) at step Start_h + (m^{k-(h+1)} - 1 - l)m^{h+1} (in line 16), l = m^{k-(h+1)} - 1, m^{k-(h+1)} - 2, ..., 0. Starting at that step, Stage_h shrinks a_{lm^{h+1}} and pushes a_{lm^{h+1} + rm^h} into S(h) at step Start_h + (m^{k-(h+1)} - 1 - l)m^{h+1} + rm^h (in line 19), r = 0, 1, ..., m - 1, giving the lth sequence of m images a_{lm^{h+1} + (m-1)m^h}, a_{lm^{h+1} + (m-2)m^h}, ..., a_{lm^{h+1}} for the output of Stage_h, l = m^{k-(h+1)} - 1, m^{k-(h+1)} - 2, ..., 1, 0. Putting all the m^{k-(h+1)} sequences of m images together, we have the output of Stage_h: a_{(m^{k-h} - 1)m^h}, a_{(m^{k-h} - 2)m^h}, ..., a_{m^h}, ..., a_0. Q.E.D.
Corollary V.1. The local labeling algorithm is correct.

Proof. From Theorem V.4, we know that the output of Stage_0 is a_{m^k - 1}, a_{m^k - 2}, ..., a_0. This is exactly the sequence of images the label-propagate procedure should use. Thus, the algorithm is correct. Q.E.D.
Theorem V.5. The space complexity of the local labeling algorithm is O(kn^{1/k}) and the time complexity is O(n).

Proof. We have one stack-like data structure with a depth of 1 bit and k stack-like data structures with a depth of m bits. Each stack-like data structure has two pointers that take log m bits each. Thus, the stack-like data structures take km + 1 + 2k log m bits in each PE. The label itself takes 3 log n + 1 bits. Thus, we need km + 1 + 2k log m + 3 log n + 1 bits of local memory per PE. Therefore, the space complexity of the local labeling algorithm is O(kn^{1/k}), since m = (2n)^{1/k} and 1 ≤ k ≤ log(2n). The time complexity of the algorithm is dominated by the number of k-bit parallel-shrink operations, the number of label-propagate operations, and the number of stack-like data structure push/pop operations. Since
Stage_0 produces its last full S(0) at step (m^{k-(0+1)} - 1)m^{0+1} + (m^{k-0} - 1)m^0 = 2m^k - m - 1, the while loop (lines 8-40) is executed 2m^k - m - 1 times, which means the number of k-bit parallel-shrink operations needed is 2m^k - m - 1. Obviously, the number of label-propagate operations executed (in lines 23-27) is 2n. Since Stage_h pops m^{k-(h+1)} images from S(h + 1) and pushes m^{k-h} images into S(h) in its working period, and the label-propagate procedure pops 2n images from S(0), the number of stack-like data structure push/pop operations can be computed as

Σ_{h=0}^{k-1} (m^{k-(h+1)} + m^{k-h}) + 2n = ((m + 1)/(m - 1))(m^k - 1) + 2n.
Since the k-bit parallel-shrink operations, the label-propagate operations, and the stack-like data structure push/pop operations each take O(1) time, and 2n = m^k, the algorithm takes O(k) time for the initialization (lines 1-7), O(n) time for the k-bit parallel-shrink operations, O(n) time for the label-propagate operations, and O(n) time for the stack-like data structure push/pop operations. Therefore, the time complexity of the algorithm is O(n). Q.E.D.
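The push/pop count can be checked numerically; the sketch below compares the stage-by-stage sum with the closed form (m + 1)(m^k - 1)/(m - 1).

```python
def pushpop_ops(m, k):
    """Stack push/pop operations of the stages: Stage h pops m^(k-h-1)
    images from S(h+1) and pushes m^(k-h) images into S(h).  The 2n pops
    by the label-propagate procedure are accounted for separately."""
    return sum(m ** (k - h - 1) + m ** (k - h) for h in range(k))

def closed_form(m, k):
    # (m + 1)(m^k - 1)/(m - 1); note (m^k - 1) is divisible by (m - 1)
    return (m + 1) * ((m ** k - 1) // (m - 1))
```

Since both sides are geometric sums, the identity holds for every m ≥ 2 and k ≥ 1.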
Corollary V.2. The local labeling algorithm is an O(n)-time and O(log n)-space algorithm.

Proof. When k = O(log n), we have O(kn^{1/k}) = O(log n). Thus, the local labeling algorithm is an O(n)-time and O(log n)-space algorithm. Q.E.D.
The fast advanced local labeling algorithm we have presented takes O(n) time and requires O(kn^{1/k}) bits of local memory per PE to label an n × n image on an n × n MCC, where k is in the range between 1 and log(2n). When k is 1, the time and space complexities of our algorithm are basically the same as those of the first algorithm in Cypher et al. (1990) and the algorithm in Alnuweiri and Prasanna (1991). When k is O(log n), our algorithm requires O(log n) bits of local memory per PE, the same as the second algorithm in Cypher et al. (1990) and the algorithm in Alnuweiri and Prasanna (1991). However, our algorithm takes only O(n) time, whereas their algorithms take O(n log n) time. The fast advanced local labeling algorithm gives a positive answer to the question posed by Alnuweiri and Prasanna (1991, 1992). Furthermore, the algorithm uses local operators and involves low communication overhead, having very small multiplicative constants in its complexities. The algorithm presented here is the most efficient algorithm for image component labeling on mesh-connected computers in both theoretical and practical complexity measures.
f. An Example. We illustrate how our local labeling algorithm works by labeling an 8 × 8 image. If we use Levialdi's parallel-shrink operator, we have 16 intermediate images a_0, a_1, ..., a_15. Choosing k = 4, we have m = 2, four pipeline stages Stage_3, ..., Stage_0, and five stack-like data structures S(4), S(3), ..., S(0), where S(4) is 1 bit deep and S(3), ..., S(0) are 2 bits deep. In Table I, we show which image each stage obtains, by shrinking a previous image or by moving an image down from the higher stage, at each step. Note that the image indices instead of the images themselves are shown in the table. We also show the output of each stage and the content of each stack-like data structure at each moment. The St_h column gives the image Stage_h works on at each step. The O_h column gives the output of Stage_h. The S(h) column shows the stack-like data structure S(h) at each step. As can be seen, the output of Stage_0 is the sequence of images whose indices are 15, 14, ..., 0.

D. Computing Properties of Image Components

Computing geometric properties of each connected component in an image is an important step toward high-level analysis of the image (Duda and Hart, 1973; Rosenfeld and Kak, 1982; Gonzalez and Woods, 1992). Some algorithms for computing geometric properties of image components on MCCs have been proposed (Dyer and Rosenfeld, 1981; Miller and Stout, 1985). Dyer and Rosenfeld (1981) have proposed MCC algorithms for computing geometric properties of image components such as area, perimeter, compactness, height, width, diameter, and moments. However, their algorithms take O(n^2) time in the worst case. Miller and Stout (1985) have proposed O(n)-time MCC algorithms for determining the convex hull of each image component, for determining whether each component is convex, for computing the distance to the nearest neighbor component, for computing the internal diameter and external diameter, etc.
However, their algorithms use complex global operations such as random access read and random access write, leading to large multiplicative factors in their time complexities. Furthermore, they have not discussed computing the area, perimeter, height, width, and moments of each image component. We develop a general algorithm to compute geometric properties of each image component in an image on MCCs (Shi and Ritter, 1994c). The geometric properties to compute are area, perimeter, compactness, height, width, diameter, moments, and centroid. Each of the properties can be computed by the image-template product with a specific component template. After the general algorithm for the image-template product with a component template is implemented, we only need to design a component template to compute each property. The general algorithm for
TABLE I. Illustration of labeling an 8 × 8 image by the fast advanced local labeling algorithm.
the image-template product with a component template on MCCs is modified from the stack-based image component labeling algorithm and is discussed in detail in this section. It has a small multiplicative factor in its time complexity and is practical, since the algorithm uses only local operators.

1. Algorithm for Image-Template Product with Component Template
We modify the stack-based algorithm for binary image component labeling to compute the image-template product with a component template on MCCs. During the process of shrinking each connected component to a representative pixel, we compute the result of the product for the pixels in the component. In the propagating process, we propagate the result stored in the representative pixel of each connected component to all the pixels of the connected component. Let a ∈ F^X be a source image. Suppose that t(c) ∈ (F^X)^X is a component template with respect to a binary image c. To compute b = a ⊕ t(c) ∈ F^X, we use Levialdi's parallel-shrink operator φ_{ul} to shrink binary image c to obtain a sequence of images c_0, c_1, ..., with c_0 = c. We associate an image a_r with each image c_r, with a_0 = a ∘ w, where w is the weight image. As we know, φ_{ul} shrinks each component toward the upper left corner of its bounding rectangle. Thus, when a black pixel in c_{r-1} is changed to white in c_r by applying a parallel-shrink operation to c_{r-1}, we accumulate the value of that pixel in a_{r-1} to one of its left, upper left, and upper neighbors in a_r using the γ operation. When a pixel in c_r is a representative pixel, the corresponding pixel in a_r will hold the image-template product result for all the pixels in the corresponding component. We propagate the result to all the pixels in the connected component in the propagating process.
Suppose that the black pixel of c_{r-1} at x disappears in c_r. If the black pixel is not an isolated pixel in c_{r-1}, we try to find a black neighbor among its neighbors in the following order: the left neighbor, the upper left neighbor, and the upper neighbor. We then accumulate the intermediate result of a_{r-1} at x into the intermediate result of the found black neighbor in a_r. To use image algebra to describe the process of computing a_r from a_{r-1}, we design a parameterized template u(c_{r-1}, c_r) ∈ (F^X)^X as follows:

u(c_{r-1}, c_r)_y(x) =
  1_∘  if c_r(y) = 1 and x = y;
  1_∘  if c_r(y) = 1 and x = y + (0, 1) and c_{r-1}(x) = 1 and c_r(x) = 0;
  1_∘  if c_r(y) = 1 and x = y + (1, 1) and c_{r-1}(x) = 1 and c_r(x) = 0 and c_r(x - (0, 1)) = 0;
  1_∘  if c_r(y) = 1 and x = y + (1, 0) and c_{r-1}(x) = 1 and c_r(x) = 0 and c_r(x - (0, 1)) = 0 and c_r(x - (1, 1)) = 0;
  0_∘  otherwise,

where 1_∘ is the identity of the ∘ operation and 0_∘ is the zero element of the ∘ operation. The image a_r can then be computed simply by

a_r := a_{r-1} ⊕ u(c_{r-1}, c_r).
Note that the γ operation here is the same γ operation as in b = a ⊕ t(c), but the ∘ operation here may be different from the ∘ operation in b = a ⊕ t(c). Since the template u is a local template and only uses information in a 2 × 2 neighborhood, computing a_r from a_{r-1} can be done in O(1) time on MCCs. Let b_r be the image holding the results corresponding to the black pixels of c_r. The pixel values of b_r corresponding to the white pixels of c_r are 0. The image b = a ⊕ t(c) will be b_0 ∨ (1 - c_0) · 1_γ, where 1_γ is a constant image whose pixel values are the identity element of the γ operation, and ∨ is a bitwise logical or operation. To obtain b_r from b_{r+1}, we first propagate the results corresponding to the black pixels of c_{r+1} to their black neighbors in c_r, and then add the values corresponding to all the representative pixels appearing in c_r to b_r. The first step can be done by

b_r := (b_{r+1} ⊕ p_1) * c_r,
where p_1 is the template shown in Fig. 38. Note that the ∨ operation used in the image-template operation is a bitwise logical or operation, since the pixels in a connected component should have the same result. To do the
FIGURE 47. The template v for isolation of representative pixels.
second step, we design a template v as shown in Fig. 47 and use an image-template operation with that template to extract all the representative pixels in c_r. Thus, the second step can be done by

b_r := b_r · [1 - (c_r ⊕ v)] + (c_r ⊠ v) · a_r.

Therefore, to compute b_r from b_{r+1}, we can use the following IA expression:

b_r := (b_{r+1} ⊕ p_1) * [c_r - (c_r ⊕ v)] + (c_r ⊠ v) · a_r.
It is easy to see that computing b_r from b_{r+1} can be implemented in O(1) time on MCCs. Let 2n = m^k for any integer k with 1 ≤ k ≤ log(2n). We use k stacks S_0, S_1, ..., S_{k-1}, each having capacity for storing m pairs of images. Stack S_q stores each pair of images (c_r, a_r) with index r divisible by m^q. The algorithm to compute b = a ⊕ t(c), with t(c) being a component template with respect to a binary image c, is as follows:

Algorithm Image-template-product-with-component-template.
/* Initialize */
b := 0;
a := a ∘ w;
for h := 0 to k - 1 loop
  set-empty(S_h);
end loop;
q := k;
loop
  for h := q - 1 down to 0 loop
    /* Compute m^h(m - 1) image pairs and push m pairs into S_h */
    push((c, a), S_h);
    for r := 1 to m^{h+1} - m^h loop
      c' := c;
      c := φ_{ul}(c'); /* Shrink components of c */
      a := a ⊕ u(c', c); /* Compute a corresponding to c */
      if r mod m^h = 0 then push((c, a), S_h);
    end loop;
    if h ≠ 0 then (c, a) := pop(S_h);
  end loop;
  /* Propagate component results m times */
  for r := 1 to m loop
    (c, a) := pop(S_0);
    b := (b ⊕ p_1) * c + (c ⊠ v) · a;
  end loop;
  /* Find the highest nonempty stack */
  q := 1;
  while q < k and empty(S_q) loop
    q := q + 1;
  end loop;
  if q = k then
    b := b ∨ (1 - c) · 1_γ;
    exit;
  end if;
  (c, a) := pop(S_q);
end loop;
end algorithm.

Following the analysis of the stack-based algorithm, we claim that the preceding algorithm requires O(bkn^{1/k}) bits of local memory per PE and takes O(kn) time on MCCs, where b is the number of bits in each pixel value of the result image, and k is any integer between 1 and log(2n).
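What the product b = a ⊕ t(c) delivers can be stated directly: each black pixel of c receives the γ-reduction of the weighted values over its whole component, and each white pixel receives 0. A sequential reference implementation (names and the γ = + default are ours, not from the chapter):

```python
from collections import deque

def component_reduce(c, a, combine=lambda u, v: u + v, identity=0):
    """For each black pixel of binary image c, reduce the values of
    image a over that pixel's 8-connected component with the given
    gamma operation; white pixels get 0."""
    rows, cols = len(c), len(c[0])
    b = [[0] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if c[i][j] == 1 and not seen[i][j]:
                comp, acc = [], identity
                q = deque([(i, j)])
                seen[i][j] = True
                while q:
                    x, y = q.popleft()
                    comp.append((x, y))
                    acc = combine(acc, a[x][y])       # gamma-accumulate
                    for dx in (-1, 0, 1):
                        for dy in (-1, 0, 1):
                            nx, ny = x + dx, y + dy
                            if (0 <= nx < rows and 0 <= ny < cols
                                    and c[nx][ny] == 1 and not seen[nx][ny]):
                                seen[nx][ny] = True
                                q.append((nx, ny))
                for x, y in comp:                     # propagate the result
                    b[x][y] = acc
    return b
```

With a the all-ones image this yields the per-pixel component area used in the next subsection; with combine=max it yields a per-component maximum.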
2. Computing Area, Perimeter, and Compactness of Each Image Component

The area of a component in a binary image is defined to be the number of pixels in the component. To compute the area of each component in a binary image a, we design a component template t(a) with respect to a as follows:

t(a)_y(x) =
  1  if x ∈ C_a(y);
  0  otherwise.
Let b be the image storing the area of each component in each pixel of the component. We can compute b as b := a ⊕ t(a), since we have

b(y) = Σ_{x ∈ C_a(y)} a(x) · t(a)_y(x) = Σ_{x ∈ C_a(y)} 1 = card(C_a(y)).
42 1
FIQURE48. The template for identifying the boundary pixels of each component.
The perimeter of a component in a binary image can be defined as the number of pixels in its boundary. We assume that each component of the source binary image a forms a digital manifold. To compute the perimeter of each component, we need to set all the interior black pixels to white. If the template v is defined as shown in Fig. 48, then a A (a v) is the image with interior black pixels being set white. Let c be the image storing the perimeter of each component in each pixel of the component. Using the same component template t(a), we can compute c as follows:
6
c := [a A (a
6v)] 0 t(a).
The compactness of a component in a binary image is usually measured by a / p 2 ,where a is the area of the component and p is the perimeter of the component. The compactness of a component is one of the measures for quantifying the shape of the component. Let d be the image storing the compactness of each component in each pixel of the component. After b and c are computed, we can compute d as follows: d := b/c2.
Note that the / is a pseudo-division with r/0 = 0, where r is any number. Therefore, we can use the general algorithm for the image-template product with a component template on MCCs to compute the area, perimeter, and compactness of each image component on MCCs. We have the following theorem.

Theorem V.6. There is an O(kn) time algorithm to compute the area, perimeter, and compactness of each component in a binary image on MCCs with O(kn^{1/k} log n) bits of local memory, where k is any integer between 1 and log(2n).

3. Computing Height, Width, and Diameter of Each Image Component
HONGCHI SHI, GERHARD X. RITTER, and JOSEPH N. WILSON

The height of a component in a binary image is the distance between the highest row and the lowest row occupied by the component. Likewise, the width of a component is the distance between the leftmost and rightmost columns. To compute the height of each component in a, we define a component template t1(a) with respect to a as follows:

t1(a)_y(i, j) = i if (i, j) ∈ C_a(y), and 0 otherwise.
The template t1(a) is intended to assign to each black pixel its row coordinate. Let b be the image storing the height of each component in each pixel of the component. We can compute b as follows:

b := (a ⊗ t1(a)) − (a ⊛ t1(a)),

where ⊗ and ⊛ denote the image-template products computed with (max, ·) and (min, ·), respectively, since we have

b(y) = max_{(i,j) ∈ C_a(y)} {a(x) · t1(a)_y(x)} − min_{(i,j) ∈ C_a(y)} {a(x) · t1(a)_y(x)}
     = max_{(i,j) ∈ C_a(y)} i − min_{(i,j) ∈ C_a(y)} i.
Similarly, to compute the image c storing the width of each component in each pixel of the component, we define a component template t2(a) with respect to a as follows:

t2(a)_y(i, j) = j if (i, j) ∈ C_a(y), and 0 otherwise.

The image c can be computed as follows: c := (a ⊗ t2(a)) − (a ⊛ t2(a)).
The diameter of a component in a binary image is the maximum distance between any two black pixels of the component. Since we assume 8-connectivity, we use the chessboard distance here. It is shown in Dyer and Rosenfeld (1981) that the diameter of a component is equal to the length of the longest side of its bounding rectangle. Thus, once we have the height and width of a component, we can simply compute its diameter by taking the maximum of the two. Let d be the image storing the diameter of each component in each pixel of the component. We can compute d as follows: d := b ∨ c. Therefore, we can use the general algorithm to compute the height, width, and diameter of each image component. We have the following theorem.
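A sequential sketch of the three measures (again our own illustration, not the parallel algorithm): after gathering a component, its height, width, and chessboard diameter follow directly from the extreme row and column coordinates, exactly as the max-product minus min-product computations above express.

```python
from collections import deque

def component_extents(a):
    """For each black pixel record (height, width, diameter) of its
    8-connected component: height = max row - min row, width =
    max col - min col, diameter = max(height, width), the chessboard
    diameter (Dyer and Rosenfeld, 1981). White pixels hold None."""
    n, m = len(a), len(a[0])
    out = [[None] * m for _ in range(n)]
    seen = [[False] * m for _ in range(n)]
    for si in range(n):
        for sj in range(m):
            if a[si][sj] and not seen[si][sj]:
                comp, q = [], deque([(si, sj)])
                seen[si][sj] = True
                while q:
                    i, j = q.popleft()
                    comp.append((i, j))
                    for di in (-1, 0, 1):
                        for dj in (-1, 0, 1):
                            ni, nj = i + di, j + dj
                            if (0 <= ni < n and 0 <= nj < m
                                    and a[ni][nj] and not seen[ni][nj]):
                                seen[ni][nj] = True
                                q.append((ni, nj))
                rows = [i for i, _ in comp]
                cols = [j for _, j in comp]
                h, w = max(rows) - min(rows), max(cols) - min(cols)
                for i, j in comp:
                    out[i][j] = (h, w, max(h, w))
    return out
```

Note that with this convention a single-pixel component has height, width, and diameter 0.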
Theorem V.7. There is an O(kn) time algorithm to compute the height, width, and diameter of each component in a binary image on MCCs with O(kn^{1/k} log n) bits of local memory, where k can be any integer between 1 and log(2n).

4. Computing Moments and Centroid of Each Image Component
The moments and centroid of a component are useful as measures of the location and shape of the component. Various kinds of moments, such as basic moments, central moments, normalized moments, and invariant moments, have been defined to describe the shape of a component (Hu, 1962; Gonzalez and Woods, 1992). However, the other moments are all defined in terms of the basic moments. Thus, we only discuss how to compute the basic moments of each component. Suppose that a is a gray-level image whose pixels form a group of components and the components are represented by the black pixels in the binary image c. A moment of order p + q of a component whose pixel locations form a set C is defined as
m_pq = Σ_{(i,j) ∈ C} i^p j^q a(i, j).
The centroid (ī, j̄) of the component is defined by

ī = m_10/m_00 and j̄ = m_01/m_00.

Let m_pq be the image storing the moment of order p + q of each component in each pixel of the component. Let (ī, j̄) be the image storing the centroid of each component in each pixel of the component. Defining a component template t_pq(c) with respect to c as follows:

t_pq(c)_y(i, j) = i^p j^q if (i, j) ∈ C_c(y), and 0 otherwise,

we can compute m_pq and (ī, j̄) as follows:

m_pq := a ⊕ t_pq(c), ī := m_10/m_00, and j̄ := m_01/m_00.
Thus, we can use the general algorithm to compute moments and centroid of each component in an image. We have the following result.
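The basic moments and the centroid of a single component can be sketched in Python as follows; `comp` is assumed to be the set of pixel coordinates of one component (as a component template's support would supply), and the centroid uses the same pseudo-division r/0 = 0 as before.

```python
def moments_and_centroid(a, comp):
    """m_pq = sum of i**p * j**q * a(i, j) over the pixel set comp of
    one component; the centroid is (m10/m00, m01/m00), computed with
    the pseudo-division r/0 = 0."""
    def m(p, q):
        return sum(i ** p * j ** q * a[i][j] for i, j in comp)

    def pseudo_div(r, s):
        return 0 if s == 0 else r / s

    m00 = m(0, 0)
    return pseudo_div(m(1, 0), m00), pseudo_div(m(0, 1), m00)
```

For a binary component (all pixel values 1), m00 reduces to the area, and the centroid is the mean pixel coordinate.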
Theorem V.8. There is an O(kn) time algorithm to compute the moments and centroid of each component in an image on MCCs with O(kn^{1/k} log n) bits of local space, where k is an integer between 1 and log(2n).
VI. CONCLUDING REMARKS AND FUTURE RESEARCH

This paper has addressed several issues of parallel image processing with image algebra on SIMD mesh-connected computers. We now give a brief summary of what we have achieved in this research and pose some possible future research problems.

A. Concluding Remarks
Parallel computing is the only economical way to achieve the level of performance demanded by image-processing tasks. Many parallel architectures have been proposed and developed. Because of the simplicity of mesh-connected computers and the similarity of their structures to image structures, we have chosen and discussed SIMD mesh-connected computers for image processing with image algebra. We have mapped image algebra primitives onto mesh-connected computers and explored the architectural features that are important for image algebra. For pixelwise operations, SIMD mesh-connected computers fit quite well, since no communication between processing elements is involved. For global operations, SIMD mesh-connected computers do not work as well because of the large diameters of their communication networks. For the sophisticated image-template operations, if they involve only local templates, SIMD mesh-connected computers work well because of the regular communication patterns of the operations; for some special image-template operations that are useful for a class of applications, we also have efficient MCC algorithms; for general image-template operations, it is difficult to design efficient MCC algorithms because of the complex communication patterns of the operations.

Image algebra is a unified mathematical theory for image processing and image analysis. It treats images as primary objects, and its operations are highly parallel. It addresses implicitly the parallelism in image processing, and its image-template operations characterize many communication patterns between processing elements. We have identified and discussed important primitives of image algebra for parallel image processing. We have demonstrated that it can serve as a good model for parallel image processing by using image algebra to describe several highly parallel image-processing algorithms. The language provided by image algebra abstracts
away enough from specific architectures and allows image-processing application researchers to concentrate on their application problems instead of worrying about implementation details on specific machines.

In addition to the efficient algorithms for image algebra primitives on SIMD mesh-connected computers, we have developed and analyzed several application algorithms on SIMD mesh-connected computers and described them concisely with image algebra. We have developed an efficient algorithm for the Abingdon Cross image-processing benchmark. The algorithm has been tested on the Hughes 3D machine, and the result is very impressive. A new binary image component shrinking algorithm has been developed that takes fewer steps than the well-known Levialdi's shrinking algorithm by a constant factor. The algorithm, if used as a primitive operator in labeling binary image components, can reduce the local space requirement of a local labeling algorithm by a constant factor. We have developed several fast local image component labeling algorithms for SIMD mesh-connected computers. One of them answers in the affirmative the open question of whether there exists a local labeling algorithm to label an n × n binary image in O(n) time with O(log n) bits of local memory. Finally, we have defined component templates and developed an efficient general algorithm for the image-template product with a component template on SIMD mesh-connected computers. The general algorithm, when given two specific operators and a component template, can compute properties of each component in an image such as the area, perimeter, compactness, height, width, diameter, moments, and centroid.
B. Future Research

The sophisticated image-template operations are very general and unify many image operations and transformations. They are mathematically elegant and give users a powerful tool for describing image-processing applications. However, the general image-template operations pose difficulties for the implementors of image algebra. There seems to be no efficient way to map the general operations onto parallel architectures, since the communication patterns involved in the operations are very complex in general. A practical approach is to classify the image-template operations into several commonly used classes and develop an efficient implementation for each class. We have defined a class of image-template operations with local templates and a class of nonlocal image-template operations with component templates. The class of local image-template operations is very useful in low-level image processing, while the class of nonlocal image-template operations with component templates is very useful for computing
properties of image components. We may define more useful classes of image-template operations that can be efficiently implemented.

Architecture researchers have realized that local communication alone is not enough to solve image-processing problems, and they have augmented the basic mesh network of mesh-connected computers to develop more powerful parallel architectures. For example, the DAP (Parkinson and Litt, 1990) and the 3D machine (Hughes Aircraft Company, 1992) are augmented with row and column broadcasting buses to speed up communication between distant processing elements. The Polymorphic Torus (Li and Maresca, 1989a, b) and the CAAPP (Levitan et al., 1987; Weems et al., 1989) are augmented with a reconfigurable network that can be dynamically reconfigured into a large number of topologies. Some architectures are augmented with more links to form a pyramid (Uhr, 1983) or a hypercube (Hillis, 1985). The morphology-based image algebra characterizes only local communication between processing elements and fails to serve as a model for image-processing algorithms on these architectures. Image algebra is a more general algebraic structure that can also specify nonlocal communication patterns through its image-template operations. It may serve as a model for image processing on these architectures. Pixelwise operations can certainly be implemented very efficiently on these architectures. Global operations can easily take advantage of the global communication features of such architectures. However, it is not clear how to implement image-template operations to take advantage of the global communication features of those architectures, since image-template operations imply very complex communication patterns in general, while those architectures provide only a limited set of communication patterns. Another mathematical framework closely related to image algebra is the generalized matrix product.
The generalized matrix product is a heterogeneous matrix product that provides transforms combining the same or different values (or objects) into values of a possibly different type (Ritter, 1991; Ritter and Zhu, 1992). It includes the common matrix and vector products of linear algebra; it has been proved that these products can be obtained by substituting specific values for p in the generalized matrix product. It has many applications in signal and image processing (Ritter, 1991; Ritter and Zhu, 1992; Zhu and Ritter, 1993; Zhu, 1993). It is worth exploring the relationship between the generalized matrix product and parallel processing.

ACKNOWLEDGMENTS

The authors thank Drs. Sam Lambert and Patrick Coffield of Wright Laboratories, Eglin AFB, for their continued support of this research.
This research was partially supported by U.S. Air Force Contract F08635-894-0134.
REFERENCES

Alnuweiri, H. M., and Prasanna, V. K. (1991). Fast image labeling using local operators on mesh-connected computers. IEEE Trans. Patt. Anal. Mach. Intell. PAMI-13(2), 202.
Alnuweiri, H. M., and Prasanna, V. K. (1992). Parallel architectures and algorithms for image component labeling. IEEE Trans. Patt. Anal. Mach. Intell. PAMI-14(10), 1014.
Ballard, D. H., and Brown, C. M. (1982). "Computer Vision." Prentice-Hall, Englewood Cliffs, NJ.
Batcher, K. E. (1980). Design of a massively parallel processor. IEEE Trans. Comput. 29(9), 836.
Chaudhary, V., and Aggarwal, J. K. (1990). Parallelism in computer vision: A review. In "Parallel Algorithms for Machine Intelligence and Vision" (P. S. Gopalakrishnan, V. Kumar, and L. N. Kanal, Eds.), pp. 271-309. Springer-Verlag, New York/Berlin.
Choudhary, A., and Ranka, S. (1992). Parallel processing for computer vision and image understanding. Computer 7.
Cloud, E. L., and Holsztynski, W. (1984). Higher efficiency for parallel processors. In "Proceedings IEEE Southcon 84," pp. 416-422. Orlando, FL.
Coffield, P. C. (1992a). An architecture for processing image algebra operations. In "Image Algebra and Morphological Image Processing III," Vol. 1769 of Proceedings of SPIE, pp. 178-189. San Diego, CA.
Coffield, P. C. (1992b). "An Electro-Optical Image Algebra Processing System for Automatic Target Recognition." PhD thesis, University of Florida, Gainesville.
Crookes, D., Morrow, P. J., and McParland, P. J. "An Implementation of Image Algebra on Transputers." Technical report, Dept. of Computer Science, Queen's University of Belfast, Northern Ireland.
Cypher, R., Sanz, J. L. C., and Snyder, L. (1990). Algorithms for image component labeling on SIMD mesh connected computers. IEEE Trans. Comput. 39(2), 276.
Dougherty, E. R., and Giardina, C. R. (1987). Image algebra-induced operators and induced subalgebras. In "Visual Communication and Image Processing II," Vol. 845 of Proceedings of the SPIE, pp. 270-275. Cambridge, MA.
Duda, R. O., and Hart, P. E. (1973). "Pattern Classification and Scene Analysis." Wiley, New York.
Duff, M. J. B., Watson, D. M., Fountain, T. J., and Shaw, G. K. (1973). A cellular logic array for image processing. Patt. Recog. 5(3), 229.
Duff, M. J. B. (1982). CLIP4. In "Special Computer Architectures for Pattern Processing" (K. S. Fu and T. Ichikawa, Eds.), pp. 65-85. CRC Press, Boca Raton, FL.
Dyer, C. R., and Rosenfeld, A. (1981). Parallel image processing by memory-augmented cellular automata. IEEE Trans. Patt. Anal. Mach. Intell. PAMI-3(1), 29.
Feng, T. Y. (1981). A survey of interconnection networks. Computer 14(12), 12.
Fountain, T. J., Matthews, K. N., and Duff, M. J. B. (1988). The CLIP7A image processor. IEEE Trans. Patt. Anal. Mach. Intell. 10.
Fukui, M., and Kitayama, K. I. (1992). Image logic algebra and optical implementations. Appl. Opt. 31(5), 581.
Gader, P. D. (1988). Necessary and sufficient conditions for the existence of local matrix decompositions. SIAM J. Matrix Anal. Appl. 305.
Gader, P. D., and Dunn, E. G. (1989). Image algebra and morphological template decomposition. In "Aerospace Pattern Recognition," Vol. 1098 of Proceedings of SPIE, pp. 134-145. Orlando, FL.
Gonzalez, R. C., and Woods, R. E. (1992). "Digital Image Processing." Addison-Wesley, Reading, MA.
Hadwiger, H. (1957). "Vorlesungen über Inhalt, Oberfläche und Isoperimetrie." Springer-Verlag, Berlin.
Hillis, W. D. (1985). "The Connection Machine." The MIT Press, Cambridge, MA.
Hu, M. K. (1962). Visual pattern recognition by moment invariants. IRE Trans. Inform. Theor. IT-8, 179.
Huang, K. S. (1990). "A Digital Optical Cellular Image Processor: Theory, Architecture and Implementation." World Scientific, Singapore.
Huang, K. S., Jenkins, B. K., and Sawchuk, A. A. (1989). Binary image algebra and optical cellular logic processor design. Comput. Vis. Graph. Image Process. 45(3), 295.
Hughes Aircraft Company (1992). "128 x 128 Array 3-D Computer Emulator User's Manual."
Hwang, K., and Briggs, F. A. (1984). "Computer Architecture and Parallel Processing." McGraw-Hill, New York.
Jamieson, L. H., and Tanimoto, S. L. (1987). Special issue on parallel image processing and pattern recognition. J. Parallel Distrib. Comput. 4, 1.
Jang, B. K., and Chin, R. T. (1990). Analysis of thinning algorithms using mathematical morphology. IEEE Trans. Patt. Anal. Mach. Intell. 12(6), 541.
Kong, T. Y., and Rosenfeld, A. (1989). Digital topology: Introduction and survey. Comput. Vis. Graph. Image Process. 48(3), 357.
Langhorne, D. (1990). The retargeting of image algebra FORTRAN to special-purpose architectures. Master's thesis, University of Florida, Gainesville.
Lee, S.-Y., and Aggarwal, J. K. (1987). Parallel 2-D convolution on a mesh connected array processor. IEEE Trans. Patt. Anal. Mach. Intell. PAMI-9(4), 590.
Leighton, F. T. (1992). "Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes." Morgan Kaufmann Publishers, Inc., San Mateo, CA.
Levialdi, S. (1972). On shrinking binary picture patterns. Commun. ACM 15(1).
Levitan, S. P., Weems, C. C., Hanson, A. R., and Riseman, E. M. (1987). The UMASS image understanding architecture. In "Parallel Computer Vision" (L. Uhr, Ed.), pp. 215-248. Academic Press, Boston.
Li, D. (1992). Morphological template decomposition with max-polynomials. J. Math. Imaging Vis. 1, 215.
Li, D., and Ritter, G. X. (1990). Decomposition of separable and symmetric convex templates. In "Image Algebra and Morphological Image Processing," Vol. 1350 of Proceedings of SPIE, pp. 408-418. San Diego, CA.
Li, H., and Maresca, M. (1989a). Polymorphic-torus architecture for computer vision. IEEE Trans. Patt. Anal. Mach. Intell. 11(3), 233.
Li, H., and Maresca, M. (1989b). Polymorphic-torus network. IEEE Trans. Comput. C-38(9), 1345.
Lucas, D., and Gibson, L. (1991). Template decomposition and inversion over hexagonally sampled images. In "Image Algebra and Morphological Image Processing II," Vol. 1568 of Proceedings of SPIE, pp. 157-163. San Diego, CA.
Manseur, Z. Z., and Wilson, D. C. (1992). Invertibility of a special class of mean filter operators. J. Math. Imaging Vis. 1(2), 137.
Maragos, P. (1985). "A Unified Theory of Translation-Invariant Systems with Applications to Morphological Analysis and Coding of Images." PhD thesis, Georgia Inst. Tech., Atlanta.
Maresca, M., and Fountain, T. J. (1991). Scanning the issue: Massively parallel computers. Proc. IEEE 79(4), 395.
Maresca, M., and Li, H. (1986). Morphological operations on mesh connected architecture: A
generalized convolution algorithm. In "Proceedings of 1986 IEEE Computer Society Conference on Computer Vision and Pattern Recognition," pp. 299-304.
McCormick, B. H. (1963). The Illinois pattern recognition computer-ILLIAC III. IEEE Trans. Electron. Comput. 12, 791-813.
Meyer, T., and Davidson, J. L. (1991). Image algebra preprocessor for the MasPar parallel computer. In "Image Algebra and Morphological Image Processing II," Vol. 1568 of Proceedings of SPIE, pp. 92-100. San Diego, CA.
Miller, R., and Stout, Q. F. (1985). Geometric algorithms for digitized pictures on a mesh-connected computer. IEEE Trans. Patt. Anal. Mach. Intell. 7(2), 216.
Nassimi, D., and Sahni, S. (1980). Finding connected components and connected ones on a mesh-connected parallel computer. SIAM J. Comput. 9(4), 744.
Nigam, M. (1992). "Parallel Computations on Reconfigurable Meshes with Buses." PhD thesis, University of Florida, Gainesville.
Parkinson, D., and Litt, J., Eds. (1990). "Massively Parallel Computing with the DAP." Research Monographs in Parallel and Distributed Computing. The MIT Press, Cambridge, MA.
Pavlidis, T. (1982). "Algorithms for Graphics and Image Processing." Computer Science Press, Rockville, MD.
Perry, W. K. (1987). IAC: Image algebra C. Master's thesis, University of Florida, Gainesville.
Preston, K. (1983). Cellular logic computers for pattern recognition. Computer 36.
Preston, K. (1986). Benchmark results: The Abingdon Cross. In "Evaluation of Multicomputers for Image Processing" (S. Levialdi, L. Uhr, K. Preston, and M. J. B. Duff, Eds.), pp. 23-54. Academic Press, Orlando.
Preston, K. (1989). The Abingdon Cross benchmark survey. Computer 9.
Preston, K. (1992). Scientific/industrial image processing: New system benchmark results. Adv. Imaging.
Ranka, S., and Sahni, S. (1990). Convolution on mesh connected multicomputers. IEEE Trans. Patt. Anal. Mach. Intell. 12(3), 315.
Ranka, S., and Sahni, S. (1990). Parallel algorithms for image template matching. In "Parallel Algorithms for Machine Intelligence and Vision" (P. S. Gopalakrishnan, V. Kumar, and L. N. Kanal, Eds.), pp. 360-399. Springer-Verlag, New York/Berlin.
Ritter, G. X. (1991). Heterogeneous matrix products. In "Image Algebra and Morphological Image Processing II," Vol. 1568 of Proceedings of SPIE, pp. 92-100. San Diego, CA.
Ritter, G. X. (1991). Recent developments in image algebra. In "Advances in Electronics and Electron Physics" (P. Hawkes, Ed.), Vol. 80, pp. 243-308. Academic Press, New York.
Ritter, G. X. (1992). Image algebra. Technical Report CCVR-92-1, University of Florida Center for Computer Vision and Visualization, Gainesville, FL.
Ritter, G. X., and Gader, P. D. (1985). Image algebra implementation on cellular array computers. In "IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management," pp. 430-438. Miami Beach, FL.
Ritter, G. X., and Li, D. (1989). Template decomposition and image algebra. Technical report, University of Florida, Gainesville.
Ritter, G. X., Wilson, J. N., and Davidson, J. L. (1990). Image algebra: An overview. Comput. Vis. Graph. Image Process. 49(3), 297.
Ritter, G. X., and Zhu, H. (1992). The generalized matrix product and its applications. J. Math. Imaging Vis. 1, 201.
Rosenfeld, A. (1970). Connectivity in digital pictures. J. ACM 17(1), 146.
Rosenfeld, A. (1983). Parallel image processing using cellular array computers. Computer 1(1), 177.
Rosenfeld, A., and Kak, A. C. (1982). "Digital Picture Processing," 2nd ed. Academic Press, New York.
Rosenfeld, A., and Pfaltz, J. L. (1968). Distance functions on digital pictures. Patt. Recog. 1(1), 33.
Serra, J. (1992). "Image Analysis and Mathematical Morphology." Academic Press, New York/London.
Shi, H., and Ritter, G. X. (1993a). Image component labeling using local operators. In "Image Algebra and Morphological Image Processing IV," Proceedings of SPIE, pp. 303-314. San Diego, CA.
Shi, H., and Ritter, G. X. (1993b). O(n)-time and O(log n)-space image component labeling with local operators on SIMD mesh connected computers. In "Proceedings of the 1993 International Conference on Parallel Processing," St. Charles, IL.
Shi, H. (1994). Image algebra techniques for binary image component labeling with local operators. J. Math. Imaging Vis., to appear.
Shi, H., and Ritter, G. X. (1994a). A fast algorithm for image component labeling with local operators on mesh connected computers. J. Parallel Distrib. Comput., to appear.
Shi, H., and Ritter, G. X. (1994b). A new parallel binary image shrinking algorithm. IEEE Trans. Image Process., to appear.
Shi, H., and Ritter, G. X. (1994c). A special class of non-local image-template operations. In "Image Algebra and Morphological Image Processing V," Proceedings of SPIE, San Diego, CA.
Shi, H., Ritter, G. X., and Wilson, J. N. (1993). An efficient algorithm for image-template product on SIMD mesh connected computers. In "Proceedings of the 1993 International Conference on Application-Specific Array Processors," Venice, Italy.
Shi, H., and Wilson, J. N. (1993). Implementation of image-template operations on an image computer. In "The Third International Conference for Young Computer Scientists," Beijing, China.
Sipelstein, J. M., and Blelloch, G. E. (1991). Collection-oriented languages. Proc. IEEE 79(4), 504.
Sternberg, S. R. (1980). Languages and architectures for parallel image processing. In "Pattern Recognition in Practice." North Holland, New York.
Sternberg, S. R. (1985). Overview of image algebra and related issues. In "Integrated Technology for Parallel Image Processing" (S. Levialdi, Ed.). Academic Press, New York/London.
Uhr, L. (1983). Pyramid multi-computer structures, and augmented pyramids. In "Computing Structures for Image Processing" (M. J. B. Duff, Ed.), pp. 95-112. Academic Press, New York/London.
Uhr, L. (1984). "Algorithm-Structured Computer Arrays and Networks." Academic Press, New York.
Unger, S. H. (1958). A computer oriented toward spatial problems. Proc. IRE 46, 1144.
University of Florida Center for Computer Vision and Visualization, Gainesville. (1993). "Handbook of Image Algebra."
von Neumann, J. (1951). The general logical theory of automata. In "Cerebral Mechanisms in Behavior: The Hixon Symposium." Wiley, New York.
Webb, J. A. (1992). Steps toward architecture-independent image processing. Computer 21.
Weems, C., Riseman, E., and Hanson, A. (1992). Image understanding architecture: Exploiting potential parallelism in machine vision. Computer 65.
Weems, C., Riseman, E., Hanson, A., and Rosenfeld, A. (1991). The DARPA image understanding benchmark for parallel computers. J. Parallel Distrib. Comput. 11, 1.
Weems, C. C., Levitan, S. P., Hanson, A. R., Riseman, E. M., Shu, D. B., and Nash, J. G. (1989). The Image Understanding Architecture. Int. J. Comput. Vis. 2(3), 251.
Wilson, J. N. (1988). Implementation of the image algebra on the connection machine. University of Florida CIS Department, Gainesville.
Wilson, J. N. (1991). An introduction to image algebra Ada. In "Image Algebra and Morphological Image Processing II," Vol. 1568 of Proceedings of SPIE, pp. 101-112. San Diego, CA.
Wilson, J. N. (1993). Supporting image algebra in the C++ language. In "Image Algebra and Morphological Image Processing IV," Vol. 2030 of Proceedings of SPIE, San Diego, CA.
Wilson, J. N., Fischer, G. R., and Ritter, G. X. (1988). Implementation and use of an image processing algebra for programming massively parallel computers. In "Frontiers '88: The Second Symposium on the Frontiers of Massively Parallel Computation," pp. 587-594. Fairfax, VA.
Wilson, J. N., and Sweat, M. (1991). "Multiprocessor Implementation of Image Algebra: Targeting the Aladdin Architecture." Technical report, Center for Computer Vision and Visualization, University of Florida, Gainesville.
Wilson, J. N., Wilson, D. C., Ritter, G. X., and Langhorne, D. (1989). "Image Algebra FORTRAN Language, Version 3.0." Technical Report TR-89-03, University of Florida CIS Department, Gainesville.
Yoder, M., and Cockerham, B. (1990). Image Algebra Ada Language. Software Leverage, Inc., Boston, MA.
Zhu, H. (1993). "The Generalized Matrix Product and its Applications in Signal Processing." PhD thesis, University of Florida, Gainesville.
Zhu, H., and Ritter, G. X. (1993). The generalized matrix product and the wavelet transform. J. Math. Imaging Vis. 3(1).
Index
A
Abingdon Cross benchmark, 383-389, 425
Absolute center, 86
Absorptive atomic scattering factor, 217
Acyclic graphs, 52-54
Addition
  linear-time rational calculation, 106
  maxpolynomials, 88-89
Additivity, Fisher information, 190-191
Adjugate matrix, minimax algebra, 115
Algebra
  efficient rational algebra, 99-119
  image algebra, 355-363, 368-382
  minimax algebra, 1-98
Algorithms
  discrete-event system evolution algorithm, 96-97, 100-102
  Floyd-Warshall algorithm, 54-56, 64
  Karp algorithm, 61-63
  merging lists, 89
  rectify algorithm, 105
  resolution algorithm, 100-101, 105
  image processing, 355, 368-425
  parallel image processing, 424-425
    Abingdon Cross benchmark, 386, 388-389, 425
    fast local labeling algorithm, 407, 410-412
    global operations, 369-371
    image-template operations, 371-382, 417-420
    labeling, binary image components, 396-415
    Levialdi's parallel-shrink algorithm, 390-396, 415, 425
    local labeling algorithm, 400-415
    log-space algorithm, 405
    naive labeling algorithm, 398-400
    stack-based algorithm, 405, 417
AND gate, minimax algebra, 6-7
Antiparallel arc, definition, 37
Approximations
  Chebyshev, 34-35
  distorted wave, 273-275
  distorted wave Born, 217, 218, 275, 279
  general, 111-112
Arc-set, definition, 37
Arc weighting, 37
Assignment, minimax algebra, 114
Average stage-time, 59
Axial resonance scattering, Bloch waves, 302-310
Axiomatic justification, 16

B
Backward orbit, 21
Backward recursion, 19-21, 35
Beam threshold condition, 331-332
Bias, 132
Binary image algebra, 356
Binary image components
  area, 420
  compactness, 421
Binary image components (continued)
  centroid, 423
  diameter, 422-423
  height, 421, 423
  labeling, 396-415
    fast local labeling algorithm, 407, 410-412
    local labeling algorithm, 400-415
    log-space algorithm, 405
    naive labeling algorithm, 398-400
    stack-based algorithm, 405, 417
  moments, 423
  perimeter, 421
  shrinking, 389-396, 407-408
  width, 422, 423
Bloch waves
  axial resonance scattering, 302-310
  bound, 296
  channelling, 293-334
  electron diffraction, 226, 229, 240
  one-dimensional, 310-317
  perturbation method, 251-252
  planar resonance scattering, 313-317
  two-dimensional, 294-310
Boltzmann entropy, 125, 186
Bound Bloch waves, definition, 296
Bright field TEM, 211

C
Cayley-Hamilton, 118-119
CBED, see Convergent-beam electron diffraction
Cellular array machines, 355
Characteristic maxpolynomial, 118
Chebyshev approximation, 34-35
Chebyshev distance, 23
Classical electrodynamics
  information approach, 149-153
  Maxwell's equations, 186-187
  from vector equation, 196-198
Coherent node renumbering, 52
Complexity statement, definition, 13-14
Complex modes, quantum mechanics, 156-157
Components, minimax algebra, 50
Computer programming, see Programming
Computers, parallel image processing on, 353-426
Concavity, maxpolynomials, 107
Conditional information, 151-153, 185
Conjugation
  inverting inequalities, 30-31
  matrix, 4
  products, 24-25
  scalars, 23-24
Connected components, minimax algebra, 50
Connectivity, 45-51
  strong connectivity, 48-49
Convergence, power series, 82
Convergence bound, 81
Convergent-beam electron diffraction, 211, 214, 218-221
Convexity, maxpolynomials, 107
Convolution paths, 373-379
Correlation function of atomic displacements, 286
Cosmological principle, information approach, 170, 188
Covariance
  information approach, 186
  quantum mechanics, 154-156
Cramer-Rao inequality, 129-133, 165-166, 177-178, 187
Critical cycle, 41
Critical diagram, minimax algebra, 22
Critical events, 16-26
Critical path analysis, 2-23
Crystals
  deformed, 238-241
  dynamical elastic diffraction, 221-250
    boundary conditions, 226-230
    equations, 221-224
    reflection amplitude, 229-230
    reflection high-energy diffraction, 241-250
    transmission high-energy electron diffraction, 236-241
    two-beam approximation, 230-236
  elastic scattering, 216
  multilayer system, 238
  selvage, 241, 245-248
  semi-infinite, 235, 243-248
  slab, 229-230
    RHEED, 248-250
  structure determination, 265
    linear least-squares method, 266-267
    nonlinear least-squares method, 268-272
  structure factors, 257-265
  substrate, 241, 243-245
Crystal structure factors
  direct inversion, 257-265
    linear model, 259
    quadratic model, 260-265
  potential and, 338-341
Cycle deletion, 40, 47
D Darwin’s solution, 236 Defects diffuse scattering, 282-285 Definite matrix, 56-58 Deformed crystals, 238-241 Delay, minimax algebra, 6-7 Delta (A),54-58 DES, see Discrete-event system Diagonal realization, DES, 108-109 Differential cross-section, scattering experiments, 208 Diffraction, see Dynamical elastic diffraction; Electron diffraction Diffraction pattern, electron diffraction, 209-2 I4 Diffuse scattering, 217-218. 279-285 defects diffuse scattering. 282-285 real diffuse scattering, 348-349 virtual diffuse scattering. 348-349 Dimensionality, quantum mechanics, 163 Dirac equation, 154. 161-164. 179, 187 Directed graphs, 36-41 Direct inversion, crystal structure factors, 257-265 Discrete-event system. 2-4 connectivity, 45-58 critical events, 16-26 efficient rational algebra, 99-1 19 infinite processes. 75-84 maxpolynomials. 88-98 orbit, 13-14 path problems. 36-45 Period-I DES. 110-11 I realizability, 117-1 19 scheduling, 26-36 steady state, 58-74 strong realization problem, 108-109 Disorder Fisher information as measure, 136-137 information approach, 175, 186 Distorted wave approximation, 273-275 Distorted wave Born approximation, 217. 218. 275, 279 Double orbit table, 22 Duality theory, minimax algebra, 26 DWA, see Distorted wave approximation DWBA, see Distorted wave Born approximation
Dynamical elastic diffraction, 221-250 boundary conditions, 226-230 equations, 221-224 reflection amplitude, 229-230 reflection high-energy diffraction, 241-250 transmission high-energy electron diffraction, 236-241 two-beam approximation, 230-236
E Efficient estimator, 169 Efficient rational algebra, 99-119 Eigen-index, 56, 64 Eigen-node, 56 Eigenproblem, 60-61 Eigenspace, 68-79 Eigenvalue, 60-63 Eigenvectors equivalent eigenvectors, 65-67 finite eigenvectors, 63-67 fundamental eigenvectors, 63-64 independence, 67 left-hand, 252 right-hand, 251-252 Einstein field equations, information approach, 174, 186 Elastic scattering, electron diffraction, 216-217 Electrodynamics information approach, 149-153 Maxwell’s equations, 186-187 from vector equation, 196-198 Electron diffraction, 206-350 background, 206-207 Bloch wave channelling, 293-334 crystal structure factors direct inversion, 257-265 potential and, 338-341 diffracted beam amplitude, 209 diffraction pattern, 209-214 dynamical elastic diffraction, 221-250 Green’s functions, 334-338 perturbation methods non-periodic structures, 272-293 periodic structures, 250-272 potential, 251 crystal structure factors, 338-341 full potential mode, 245, 248 optical potential, 216-217, 341-349 truncated potential mode, 244-245 scattering axial resonance scattering, 302-310 by average potential, 214-216
Electron diffraction, scattering (continued) diffuse scattering, 217-218 elastic scattering, 216-217 planar resonance scattering, 313-317 quasi-elastic scattering, 217-218 real diffuse scattering, 348-349 resonance scattering, 293-334 selvage scattering, 241, 245-248 substrate scattering, 241, 243-245 surface resonance scattering, 323-334 TDS scattering, 286-287 virtual diffuse scattering, 348-349 scattering amplitude, 207-209 scattering cross-section, 207-209 theory, 207-221 Electron microscopy high-resolution electron microscopy, 216 scanning transmission electron microscopy, 289 transmission electron microscopy, 210, 222 Electron wave function, 216 Elementary path, 39, 40 Energy eigenfunction, 216 Energy-mass relation, quantum mechanics, 159-160, 187 Energy-time relation, uncertainty principle, 167-168, 187 Entropy Boltzmann entropy, 125 information approach, 186, 189 Kullback-Leibler cross-entropy, 144, 145 EPI, see Principle of extreme physical information Equation for the reference structure, 252 Equivalent eigenvectors, 65-67 Estimation, probability law-estimation procedure, 145-146 Euler-Lagrange equation, 127 Event times, 16-23 Evolution, minimax algebra, 94-98, 100-102 Ewald’s solution, 236 EWPP, see Extremal-weight path problem Extension function, 360 Extremal-weight path problem, 41-42 strong form, 45-46 Extrema product forms, 92-93 Extreme information, principle of, 139-147
F Fast local labeling algorithm, 407, 410-412 Finite eigenvectors, 63-67
Finiteness, minimax algebra, 14-16, 26-27 Finite scalars, 5 First-order perturbation, 253-254 Fisher information, 124-125, 128-139 additivity, 190-191 classical electrodynamics, 149-153, 186-187, 196-198 Cramer-Rao inequality, 129-133, 165-166, 177-178, 187 general relativity, 170-174, 188 information divergence, 144-145, 194-196 maximal information and minimal error, 191-193 measure of disorder, 136-137 multidimensional parameters, 133-135 parameter estimation channel, 128-129 Poisson information equation, 139 power spectral 1/f noise, 174-185, 188 quantum mechanics, 154-165, 184, 185, 187 scalar information, 135 shift-invariant case, 135-136 special relativity, 147-149, 186 uncertainty principle, 165-169, 187-188 zero information, 143-144, 164, 185 Floyd-Warshall algorithm, 54-56, 64 Forward equations, 4-5 Forward orbit, 13 Forward recursion, 4-9, 35 Free Bloch waves, definition, 296 Free space Green's function, 334 Full potential mode, 245, 248 Fundamental eigenvectors, 63-64
G Gamma (Γ), 46-47 Gedanken experiment, 125, 133, 138, 147 classical electrodynamics, 149-150 general relativity, 170, 171 power spectral 1/f noise, 176-177, 188 quantum mechanics, 155, 165 special relativity, 147 General approximation, minimax algebra, 111-112 Generalized matrix, 112-113 Generalized transitive closure, 79-80 General linear dependence, minimax algebra, 114-117 General relativity, information approach, 170-174, 188 Global reduce operation, 361 Gondran and Minoux, theorem, 115-117
Graphs acyclic graphs, 52-54 connected graphs, 48-51 directed graphs, 36-41 underlying finite graph, 38 Greatest-weight path problem, 41-42 matrix power series, 80, 84 max algebra, 42-44 Green’s functions, electron diffraction, 334-338 GWPP, see Greatest-weight path problem
H Hankel matrix, 119 Heisenberg principle, information approach, 167, 169, 187-188 High-order Laue zone, 239-240, 302-303 High-resolution electron microscopy, 216 HOLZ, see High-order Laue zone Howie-Whelan equation, 240 HREM, see High-resolution electron microscopy Hubble’s law, information approach, 170, 188
I Ideal data interval, 177 Image algebra, parallel image processing, 355-426 global operations, 360-361, 369-371, 424 image-template operations, 361-363, 371-382, 417-420, 424 pixelwise operations, 360, 369, 424 templates, 359-360 Image components, binary computing properties, 415-424 labeling, 396-415 shrinking, 389-396 Image logic algebra, 356 Image processing Abingdon Cross benchmark, 383-389, 425 algorithms, 355, 368-425 Abingdon Cross benchmark, 386, 388-389, 425 fast local labeling algorithm, 407, 410-412 global operations, 369-371 image-template operations, 371-382, 417-420 labeling of binary image components, 396-415
Levialdi’s parallel-shrink algorithm, 390-396, 415, 425 local labeling algorithm, 400-415 log-space algorithm, 405 naive labeling algorithm, 398-400 stack-based algorithm, 405, 417 computing component geometric properties, 415-424 image-template product, 417-420 labeling of binary image components, 396-415 SIMD mesh-connected computers, 353-426 Image-template operations, 361-363, 424 algorithms, 371-382, 417-420 Imaging, Z-contrast imaging, 289-293 Incident wave, 208 Inequalities, inverting, 30-31 Inessential terms, maxpolynomials, 102-103 Infinite processes, minimax algebra, 75-84 Information divergence, 144-145, 194-196 Information flow rate, 173 Information, see Fisher information; Physical information Infrared catastrophe, 179 Intermediate node, 39 Inversion process, crystal structure factors, 257-265 Isolated node, 50
K Karp algorithm, 61-63 Klein-Gordon equation, 154, 160-161, 164, 179, 185-187 Kullback-Leibler cross-entropy, 144, 145
L Labeling, binary image components, 396-415 fast local labeling algorithm, 407, 410-412 local labeling algorithm, 400-415 log-space algorithm, 405 naive labeling algorithm, 398-400 stack-based algorithm, 405, 417 LACBED, see Large-angle CBED Large-angle CBED, 238, 240-241 Least-weight path problem, 41-42 LEED, see Low-energy electron diffraction Levialdi’s parallel-shrink algorithm, 390-391, 415, 425 Linear dependence, minimax algebra, 36 Linear equations, 30-35 Local absolute center, 85-86, 94, 106
Local image-template operation, 362 Local labeling algorithm, 400-415 Local maximum, minimax algebra, 92 Local minimum, minimax algebra, 92 Log-space algorithm, 405 Loop, definition, 37 Lorentz equations, 189-190 Lorentz transformations, 149 Low-energy electron diffraction, 256 LWPP, see Least-weight path problem
M Mapping, SIMD mesh-connected computers, 366-368 Markov parameters, 108, 111 Mass, quantum mechanics, 159-160, 187 Massively parallel computers, 364 Mathematical morphology, 356 Matrices, 4, 11-13 adjugate matrix, 115 conjugation matrix, 4 definite matrix, 56-58 Fisher information, 134 generalized matrix, 112-113 general linear dependence, 116 Hankel matrix, 119 matrix power series, 80, 84 notation, 6 Pauli spin matrix, 164 powering, 14 p-regularity, 44-45 principal permanent matrix, 117 projection matrix, 83 scattering matrix, 228-229 selvage scattering matrix, 246 square matrix, 116, 117 system matrix, 4, 14, 79 transitive closure matrix, 54-55, 79-80 upper-triangular matrix, 53-54 Matrix power series, 80, 84 Matrix-squaring, 47-48 Matter, information approach, 171-174 Max algebra conjugation of products, 24-25 greatest-weight path problem, 41-44, 80, 84 notation, 5-6, 16-19 processes, 10-13 Maximum local, 92 rational functions, 106-107 Maxpolynomials, 80, 88-98 characteristic maxpolynomial, 118
concavity, 107 convexity, 107 Evolution algorithm, 99-102 generalized matrices, 112-113 inessential terms, 102-103 merging, 88-91 rectification, 105-106 Resolution algorithm, 99-103 Maxwell’s equations information approach, 186-187 from vector wave equation, 196-198 Mean stage-time, 58-59, 71 Measure of disorder, 137-138 Measure estimation, 139, 186, 187 Measurement, Heisenberg’s uncertainty principle, 169 Merging, maxpolynomials, 88-91 Merit function, 257 Mesh-connected computers, 356-357 SIMD mesh-connected computers, 353-368 MHOLZ, see Minus high-order Laue zones Min algebra, notation, 16-19 Minimax algebra Chebyshev approximations, 34-35 connectivity, 45-51 critical events, 16-26 discrete events, 2-16 efficient rational algebra, 99-119 infinite processes, 75-84 maxpolynomials, 88-98 notation, 5-6, 16-19 path problems, 36-45 scheduling, 26-36 steady state, 58-74 Minimum local, 92 rational functions, 106-107 Minimum transform minimax algebra, 114 square matrix, 116, 117 Minimum uncertainty product, 169 Minus high-order Laue zones, 304 Momentum momentum-energy space, 157-158, 185 position-momentum relation, 166-167, 187 Momentum-energy space, information approach, 157-158, 185 Multidimensional parameters, Fisher information, 133-135 Multiplication linear-time rational calculation, 106 maxpolynomials, 89-91
N Naive labeling algorithm, 398-400 Newtonian mechanics, EPI approach, 173 Nodes coherent node renumbering, 52 eigen-node, 56 intermediate node, 39 isolated node, 50 Node-set, definition, 37 Noise, power spectral 1/f noise, 174-185, 188 Non-elementary cycles, 39 Non-elementary path, 39 Non-Hermitian eigensystems, perturbation methods nonperiodic structures, 272-293 periodic structures, 250-272 Non-periodic structures, perturbation method, 272-293
O Optical potential, electron diffraction, 216-217, 341-349 Orbit, minimax algebra, 13-14, 21-23, 75-76 Ordered pair, definition, 37 OR gate, minimax algebra, 18-19 Outgoing wave Green’s function, 334
P Parallel image processing, 353-357 Abingdon Cross benchmark, 383-389, 425 algorithms, 424-425 Abingdon Cross benchmark, 386, 388-389, 425 fast local labeling algorithm, 407, 410-412 global operations, 369-371 image-template operations, 371-382, 417-420 labeling of binary image components, 396-415 Levialdi’s parallel-shrink algorithm, 390-396, 415, 425 local labeling algorithm, 400-415 log-space algorithm, 405 naive labeling algorithm, 398-400 stack-based algorithm, 405, 417 SIMD mesh-connected computers, 353-426 Parallel-shrink operators, 390-396, 417 Parameter estimation channel, 128-129 Path definition, 38-39
minimax algebra, 36-45 Path weight, 39-40 Pauli spin matrix, 164 Period-1 DES, 110-111 Periodic structures, perturbation method, 250-272 Permanent, minimax algebra, 114 Perturbation methods, electron diffraction nonperiodic structures, 272-293 periodic structures, 250-272 Perturbation theory, non-degenerate, 252-255 Physical information classical electrodynamics, 149-153, 186-187 Fisher information, 124-139 general relativity, 170-174, 188 Poisson information equation, 139 power spectral 1/f noise, 174-185, 188 principle of extreme physical information, 138, 139-147 quantum mechanics, 154-165, 184, 185, 187 special relativity, 147-149, 186 uncertainty principle, 165-169 zero information, 143-144, 164, 185 Physical information divergence, 144 Pixelwise operations, 360, 369, 424 Planar resonance scattering, Bloch waves, 313-317 Point set, 358 Poisson information equation, 139 Position-momentum relation, uncertainty principle, 166-167, 187-188 Potential, 251 electron diffraction crystal structure factors, 338-341 full potential mode, 245, 248 optical potential, 216-217, 341-349 truncated potential mode, 244-245 Power series matrix power series, 80, 84 minimax algebra, 79-84 scalar power series, 80-83 Power spectral 1/f noise, 174-185, 188 Poynting flux, 150 P-regularity, 44-45 Primal weighting, 37, 38 Principal permanent matrix, 117 Principal permanent mean, 117 Principle of extreme physical information, 138, 139-147, 178-179, 185-190 classical electrodynamics, 153, 186-187
Principle of extreme physical information (continued) general relativity, 170, 173, 188 power spectral 1/f noise, 176, 179-180, 183-185, 188
quantum mechanics, 154, 160, 164, 184, 185, 187 Probability law-estimation procedure, 145-146 Product forms, 91-93 Programming, SIMD mesh-connected computers, 357, 369-382 Projection matrix, 83
Q Quantum mechanics, information approach, 154-165, 184, 185, 187 Quasi-elastic scattering, electron diffraction, 217-218
R Rational realization, 111 Real diffuse scattering, 348-349 Realizability, 117-119 Rectification, maxpolynomials, 105-106 Rectify algorithm, 105 Reflected beam amplitude, 229 Reflection coefficients, 318 Reflection high-energy electron diffraction, 210, 241, 243, 317 from crystal slab, 248-250 from semi-infinite crystal, 235, 243-248 surface resonance scattering, 323-334 tensor RHEED, 276-279 Reformulation, minimax algebra, 86-88 Relativity general relativity, 170-174, 188 special relativity, 147-149, 186 Residuation, minimax algebra, 113-114 Resolution, minimax algebra, 99-106 Resolution algorithm, 100-101, 105 Resonance detuning parameter, 306 Resonance scattering, electron diffraction, 293-334 Rest mass, general relativity, 174 Restriction function, 360 RHEED, see Reflection high-energy electron diffraction Robustness, minimax algebra, 76 Rocking curves, CBED, 211, 218-221
S SAED, see Selected area electron diffraction Scalar information, 135 Scalar power series, 80-83 Scalars, 5, 10, 23-24 Scanning transmission electron microscope, 289 Scattered wave, 208 Scattering axial resonance scattering, 302-310 Bloch waves axial resonance scattering, 302-310 planar resonance scattering, 313-317 defects diffuse scattering, 282-285 diffuse scattering, 217-218, 279-285 electron diffraction, 214-218 planar resonance scattering, 313-317 real diffuse scattering, 348-349 selvage scattering, 241, 245-248 substrate scattering, 241, 243-245 surface resonance scattering, 323-334 TDS scattering, 286-287 virtual diffuse scattering, 348-349 Scattering amplitude, 207-209 Scattering cross-section, 207-209 Scattering matrix, 228-229 Schrödinger wave equation, 124, 161, 164, 215 Second law of thermodynamics, information approach, 189 Second-order perturbation, 254-255 Selected area electron diffraction, 214 Self-distance, 144 Selvage, crystal, 241 Selvage scattering, 241, 245-248 Semi-infinite crystals, 235, 243-248 Sensor THEED, 255-256 Shannon entropy, 125 Shift-invariant case, 135-136 Shrinking, binary image components, 389-396, 407-408 Shrinking spiral path, 375 SIMD mesh-connected computers, parallel image processing, 353-426 Simple linear dependence, minimax algebra, 36 Single instruction multiple data mesh-connected computer, see SIMD mesh-connected computers Slab crystals, 229-230 RHEED, 248-250
SLS, 238 Special relativity, information approach, 147-149, 186 Spin, quantum mechanics, 163-164 Square matrix, minimax algebra, 116, 117 Square template, 362 Stack-based algorithm, 405, 417 Steady state convergence to, 75-79 minimax algebra, 58-74 without strong connectivity, 70-74 STEM, see Scanning transmission electron microscope Strained-layer superlattice, 238 Strong connectivity, 48-49 Strong realization problem, DES, 108-109 Strong transitive closure, 45-46 Substrate scattering, 241, 243-245 Surface resonance, 317-334 Surface resonance scattering, RHEED, 323-334 System matrix, 4, 14, 79
T TDS scattering, 286-287 TEM, 210, 211, 289 Tensor RHEED, 276-279 Terminating index, 81 THEED, see Transmission high-energy electron diffraction Theorem of Gondran and Minoux, 115-117 Thermodynamics, information approach, 189 Time-energy relation, uncertainty principle, 167-168 TLEED, see Transmission low-energy electron diffraction Transitive closure matrix, 54-55, 79-80 Transmission coefficients, 318 Transmission electron microscopy, see TEM
Transmission high-energy electron diffraction, 210, 236-237, 290, 317 by deformed crystal, 238-241 by multilayer system, 238 sensor THEED, 255-256 Transmission low-energy electron diffraction, 321 Transmitted beam amplitude, 229 Truncated potential mode, 244-245
U
Ultimate periodicity, minimax algebra, 78-79 Uncertainty principle, information approach, 165-169, 187-188 Underlying finite graph, 38 Upper-triangular matrix, 53-54
V Value set, 358 Vectors eigenvectors, 63-67 minimax algebra, 5 Virtual diffuse scattering, 348-349
W Wave equations, 124 information approach, 161, 164, 184 Wave functions elastic, 217 electron, 216 Weak realization problem, 32-33 Weak transitive closure, 41-45
Z Z-contrast imaging, 289-293 Zero information, 143-144, 164, 185 Zero-order Laue zone, 240 Zero property, Lagrangians, 126-127 ZOLZ, see Zero-order Laue zone