ADVANCES IN IMAGING AND ELECTRON PHYSICS
VOLUME 107
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES/Laboratoire d'Optique Electronique du Centre National de la Recherche Scientifique, Toulouse, France
ASSOCIATE EDITORS
BENJAMIN KAZAN Xerox Corporation, Palo Alto Research Center, Palo Alto, California
TOM MULVEY Department of Electronic Engineering and Applied Physics Aston University Birmingham, United Kingdom
Advances in Imaging and Electron Physics
EDITED BY PETER W. HAWKES CEMES/Laboratoire d'Optique Electronique du Centre National de la Recherche Scientifique, Toulouse, France
VOLUME 107
ACADEMIC PRESS
San Diego London Boston New York Sydney Tokyo Toronto
This book is printed on acid-free paper.
Copyright © 1999 by Academic Press
The chapter written by Jeffrey Wood appearing on page 309 is Crown copyright © 1995 and is published with the permission of DERA on behalf of the Controller of HMSO.
All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
The appearance of the code at the bottom of the first page of a chapter in this book indicates the Publisher’s consent that copies of the chapter may be made for personal or internal use of specific clients. This consent is given on the condition, however, that the copier pay the stated per-copy fee through the Copyright Clearance Center, Inc. (222 Rosewood Drive, Danvers, Massachusetts 01923), for copying beyond that permitted by Sections 107 or 108 of the U.S. Copyright Law. This consent does not extend to other kinds of copying, such as copying for general distribution, for advertising or promotional purposes, for creating new collective works, or for resale. Copy fees for pre-1998 chapters are as shown on the title pages: if no fee code appears on the title page, the copy fee is the same as for current chapters. 1076-5670/99 $30.00
ACADEMIC PRESS, 525 B Street, Suite 1900, San Diego, California 92101-4495, USA, http://www.apnet.com
Academic Press, 24-28 Oval Road, London NW1 7DX, UK, http://www.hbuk.co.uk/ap/
International Standard Serial Number: 1076-5670
International Standard Book Number: 0-12-014749-1
Printed in the United States of America
99 00 01 02 03 QW 9 8 7 6 5 4 3 2 1
CONTENTS
CONTRIBUTORS
PREFACE

Magneto-Transport as a Probe of Electron Dynamics in Open Quantum Dots
J. P. BIRD, R. AKIS, D. K. FERRY, AND M. STOPA
I. Introduction
II. Magneto-Transport in Open Quantum Dots: Some Theoretical Considerations
III. Weak-Field Magneto-Transport in Open Quantum Dots: Low-Temperature Properties
IV. Weak-Field Magneto-Transport in Open Quantum Dots: High-Temperature Properties
V. High-Field Magneto-Transport in Open Quantum Dots
VI. Concluding Discussion
References

External Optical Feedback Effects in Distributed Feedback Semiconductor Lasers
MOHAMMAD F. ALAM AND MOHAMMAD A. KARIM
I. Introduction
II. Distributed Feedback Laser Fundamentals
III. Experimentally Observed Effects
IV. Theories on Optical Feedback
V. External Optical Feedback Sensitivity
VI. Conclusion
References

Atomic Scale Strain and Composition Evaluation from High-Resolution Transmission Electron Microscopy Images
A. ROSENAUER AND D. GERTHSEN
I. Introduction
II. Strain-State Analysis
III. Composition Evaluation by Lattice Fringe Analysis
IV. Applications
V. Summary and Discussion of the Atomic Scale Analysis Methods
Appendix A: List of Variables

Hexagonal Sampling in Image Processing
R. C. STAUNTON
I. Introduction
II. Image Sampling on a Hexagonal Grid
III. Processor Architecture
IV. Binary Image Processing
V. Monochrome Image Processing
VI. Conclusions
References

The Group Representation Network: A General Approach to Invariant Pattern Classification
JEFFREY WOOD
I. Pattern Classification and the Invariance Problem
II. Group Representation Theory
III. Linear and Nonlinear Concomitants
IV. Adaptivity in Group Representation Networks
V. Practical Considerations and Simulations
VI. The Computational Power of the Group Representation Network Model
VII. The Group Representation Network and Other Invariant Classification Methods
VIII. Summary and Open Questions
Proof of Theorem III.1
References

Index
CONTRIBUTORS Numbers in parentheses indicate the pages on which the author’s contribution begins.
R. AKIS (1), Center for Solid State Electronics Research and Department of Electrical Engineering, Arizona State University, Tempe, Arizona, 85287
MOHAMMAD F. ALAM (73), Electro-Optics Program, University of Dayton, Dayton, Ohio, 45469
J. P. BIRD (1), Center for Solid State Electronics Research and Department of Electrical Engineering, Arizona State University, Tempe, Arizona, 85287
D. K. FERRY (1), Center for Solid State Electronics Research and Department of Electrical Engineering, Arizona State University, Tempe, Arizona, 85287
D. GERTHSEN (121), Laboratory for Electron Microscopy, University of Karlsruhe, 76128 Karlsruhe, Germany
MOHAMMAD A. KARIM (73), Department of Electrical Engineering, University of Tennessee, Knoxville, Tennessee, 37996
A. ROSENAUER (121), Laboratory for Electron Microscopy, University of Karlsruhe, 76128 Karlsruhe, Germany
R. C. STAUNTON, Department of Engineering, University of Warwick, Coventry, CV4 7AL, United Kingdom
M. STOPA (1), Nanoelectronic Materials Laboratory, Frontier Research Program, RIKEN, 2-1 Hirosawa, Wako-shi, Saitama 351-01, Japan
JEFFREY WOOD (309), ISIS Group, Department of Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, United Kingdom
PREFACE

In this volume, we find surveys on the study of quantum dots, distributed feedback lasers, image analysis techniques for electron micrographs, hexagonal sampling, and pattern recognition.

We begin with an account of a procedure that sheds light on the transmission properties of open quantum dots, which are quasi-zero-dimensional semiconductor structures. Application of a magnetic field sweeps successive dot states past the Fermi surface and this in turn provokes regular oscillations of the magneto-conductance of the dot at low temperatures. The effects observed for weak magnetic fields are different from those found with high fields. The physics of these new and still imperfectly understood phenomena is examined here by J.P. Bird, R. Akis, D.K. Ferry, and M. Stopa.

In the second chapter, M.F. Alam and M.A. Karim describe external optical feedback effects in distributed feedback semiconductor lasers. Such lasers are particularly useful in wavelength-division multiplexed optical communication systems but they suffer from a serious problem: they are highly sensitive to any light that may re-enter the cavity. It is these effects and the various remedies that are discussed in this contribution. The authors first describe this family of lasers and the effects that have been observed experimentally. The theory of these optical feedback effects is then presented and the sensitivity to external optical feedback is assessed.

We now turn to electron microscopy. Digital image processing is highly developed in this field, with numerous software packages available, ranging from all-purpose suites to those designed for specific tasks. Nevertheless, areas remain for which no suitable software has been developed. Now that electron microscopy has moved out of the purely qualitative phase (in which conclusions were based on visual scrutiny of electron micrographs) into the quantitative era, such gaps are rapidly being filled and the chapter by A. Rosenauer and D. Gerthsen is a good example of this development. The authors are interested in strain in semiconductor heteroepitaxial layers and the composition of such structures and have developed a software package (Digital Analysis of Lattice Images, or DALI) for analyzing high-resolution electron images. They first provide a full account of strain-state analysis and of composition evaluation by the analysis of lattice fringes. They then describe in detail applications to several semiconductor heterostructures, notably ZnCdSe/ZnSe and InGaAs/GaAs, and conclude with a discussion of atomic-scale analytical methods.
Next, an account of a topic that I have found very elusive. All the books on image processing mention the advantages of hexagonal sampling over the standard square-grid technique, but none provide a thorough discussion of hexagonal sampling with examples of the hardware requirements. I am therefore particularly pleased to include this chapter by R.C. Staunton, who has long advocated the use of this sampling pattern. After a general introduction, image sampling on a hexagonal grid is presented at length, from both the theoretical and practical viewpoints. This is followed by an account of the appropriate processor architecture. The fourth section deals with binary images, and covers connectivity, distances and the morphological operators; a comparison of skeletonization in hexagonal and rectangular grids is included and it will be recalled that the early studies of mathematical morphology all included discussion of the hexagonal case. The fifth section is devoted to monochrome images; in it, such topics as the Fourier transform, various geometric transforms, filters, and edge detectors are examined.

The final contribution, by J. Wood, is concerned with a central problem of pattern recognition: how can pattern classifiers be designed in such a way that they are insensitive to certain kinds of transformation of the input data? In particular, how can the classifier be made invariant under linear transformations that form a group? There are numerous practical problems in this category: classification of an object specified by 3-D coordinates when rigid-body motion in three dimensions is allowed, for example. In this chapter, J. Wood first recapitulates the necessary background knowledge of group representation theory. He then introduces concomitants and adaptivity in group representation networks. The following sections cover practical considerations, the computational power of the group approach and a comparison with other methods. This full account of these original ideas should be widely welcomed.

As always, I thank all the authors most sincerely for all the trouble they have taken over their surveys and conclude with a list of forthcoming contributions.
FORTHCOMING CONTRIBUTIONS
L. Alvarez Leon and J.-M. Morel (vol. 111) Mathematical models for natural images
I. Andreadis (vol. 110) Soft morphology
D. Antzoulatos Use of the hypermatrix
W. Bacsa (vol. 110) Interference scanning optical probe microscopy
M. Berz and colleagues (vol. 108) Modern map methods for particle optics
N. D. Black, R. Millar, M. Kunt, F. Ziliani, and M. Reid Second generation image coding
N. Bonnet Artificial intelligence and pattern recognition in microscope image processing
G. Borgefors Distance transforms
A. van den Bos and A. Dekker Resolution
O. Bostanjoglo (vol. 110) High-speed electron microscopy
S. Boussakta and A. G. J. Holt (vol. 111) Number-theoretic transforms and image processing
J. A. Dayton Microwave tubes in space
E. R. Dougherty and Y. Chen Granulometric filters
J. M. H. Du Buf Gabor filters and texture analysis
D. van Dyck Very high resolution electron microscopy
R. G. Forbes Liquid metal ion sources
E. Forster and F. N. Chukhovsky X-ray optics
M. J. Fransen, T. L. van Rooy, and P. Kruit On the electron optical properties of the ZrO/W Schottky emitter
A. Fox The critical-voltage effect
M. Gabbouj Stack filtering
W. C. Henneberger The Aharonov-Bohm effect
M. I. Herrera and L. Brú The development of electron microscopy in Spain
K. Ishizuka Contrast transfer and crystal images
C. Jeffries Conservation laws in electromagnetics
M. Jourlin and J.-C. Pinoli Logarithmic image processing
E. Kasper Numerical methods in particle optics
A. Khursheed Scanning electron microscope design
G. Kogel Positron microscopy
K. Koike Spin-polarized SEM
P. V. Kolev and M. Jamal Deen (vol. 109) Development and applications of a new deep-level transient spectroscopy method and new averaging techniques
W. Krakow Sideband imaging
D. J. J. van de Laak-Tijssen, E. Coets, and T. Mulvey Memoir of J. B. Le Poole
L. J. Latecki Well-composed sets
W. Li Vector transformation
J.-M. Lina, B. Goulard, and P. Turcotte (vol. 109) Complex wavelets
C. Mattiussi The finite volume, finite element, and finite difference methods
S. Mikoshiba and F. L. Curzon Plasma displays
R. L. Morris Electronic tools in parapsychology
J. G. Nagy Restoration of images with space-variant blur
P. D. Nellist and S. J. Pennycook Z-contrast in the STEM and its applications
G. Nemes Phase-space treatment of photon beams
M. A. O’Keefe Electron image simulation
B. Olstad Representation of image operators
M. Omote and S. Sakoda (vol. 110) Aharonov-Bohm scattering
C. Passow Geometric methods of treating energy transport phenomena
E. Petajan HDTV
F. A. Ponce Nitride semiconductors for high-brightness blue and green light emission
J. W. Rabalais Scattering and recoil imaging and spectrometry
H. Rauch The wave-particle dualism
D. Saldin Electron holography
G. E. Sarty (vol. 111) Reconstruction from non-Cartesian grids
G. Schmahl X-ray microscopy
J. P. F. Sellschop Accelerator mass spectroscopy
S. Shirai Cathode-ray tube gun design methods
M. Shnaider and A. P. Paplinski (vol. 110) Vector coding and wavelets
T. Soma Focus-deflection systems and their applications
I. Talmon Study of complex fluids by transmission electron microscopy
S. Tari (vol. 111) Shape skeletons and greyscale images
J. Toulouse New developments in ferroelectrics
T. Tsutsui and Z. Dechun Organic electroluminescence, materials and devices
Y. Uchikawa Electron gun optics
J. S. Villarrubia Mathematical morphology and scanned probe microscopy
L. Vincent Morphology on graphs
J. B. Wilburn Generalized ranked-order filters
C. D. Wright and E. W. Hill Magnetic force microscopy
T. Yang (vol. 109) Fuzzy cellular neural networks
Magneto-Transport as a Probe of Electron Dynamics in Open Quantum Dots
J. P. BIRD, R. AKIS, and D. K. FERRY
Center for Solid State Electronics Research and Department of Electrical Engineering, Arizona State University, Tempe, Arizona 85287, U.S.
M. STOPA†
Nanoelectronic Materials Laboratory, Frontier Research Program, RIKEN, 2-1 Hirosawa, Wako-shi, Saitama 351-01, JAPAN
I. Introduction
II. Magneto-Transport in Open Quantum Dots: Some Theoretical Considerations
III. Weak-Field Magneto-Transport in Open Quantum Dots: Low-Temperature Properties
    A. Magneto-Conductance Fluctuations in Open Quantum Dots
    B. Probing Wavefunction Scarring at Zero Magnetic Field
    C. Zero-Field Magneto-Resistance Peak
IV. Weak-Field Magneto-Transport in Open Quantum Dots: High-Temperature Properties
    A. Temperature-Dependent Characteristics of the Magneto-Conductance Fluctuations
    B. Phase-Breaking in Open Quantum Dots
    C. Zero-Field Magneto-Resistance Peak: A Probe of Quantum Chaos?
V. High-Field Magneto-Transport in Open Quantum Dots
    A. Aharonov-Bohm Magneto-Resistance Oscillations
    B. Giant Backscattering Resonances
    C. Time-Dependent Magneto-Transport
VI. Concluding Discussion
References
† Current address: Walter-Schottky Institut, Technische Universität München, D-85748 Garching, Germany.

In this review, we discuss the use of magneto-transport studies to probe electron dynamics in open quantum dots, which are quasi-zero-dimensional semiconductor structures in which electrical current flow is confined on length scales that approach the size of the electron itself. The transmission properties of these structures are strongly regulated by their quantum mechanical lead openings, which inject electrons into the dot in a highly collimated beam. This beam in turn only couples favorably to a small set of
states within the dot and, at temperatures where electron phase coherence is maintained over long distances, interference of these states becomes the dominant process in the resulting electrical behavior. A powerful experimental tool for probing the interference is provided by the application of a weak magnetic field, which shifts the phase of the electron wavefunction and sweeps successive dot states past the Fermi surface. The resulting fluctuations in the local density of states are thought to be reflected directly in the magneto-conductance of the dot, which exhibits a series of regular oscillations at low temperatures. Numerical simulations reveal the oscillations to be correlated to the recurrence of wavefunction scarring within the dots, the details of which are produced by a small number of semiclassical orbits. These orbits appear to be highly stable, a property that is thought to arise from the role of the quantum point contact leads, and the discrete quantization within the cavity itself. In contrast to previous suggestions, we therefore conclude that chaotic scattering is suppressed in these structures, in which current flow occurs instead in a highly spatially nonuniform manner. Application of a magnetic field also allows an estimate of the phase breaking time of the electrons to be obtained, and the influence of temperature, environmental coupling, and disorder on this parameter will be considered. The origin of a zero-field peak in the magneto-resistance will also be discussed, and will be argued to provide a signature of energy averaging in these dots. At sufficiently high magnetic fields, the transport properties of the quantum dots are dramatically modified by the formation of well-defined edge states, which may be selectively confined within the dot. A striking observation in this regime is a resonant breakdown of the quantum Hall effect, which is correlated to the depopulation of Landau levels in the dot. 
Numerical simulations, in which the self-consistent evolution of the quantum dot profile with magnetic field is properly accounted for, suggest this breakdown results from a sudden increase in backscattering via trapped edge states, whose widths swell significantly as a Landau level depopulates and charge is redistributed within the dot. In this regard, the resonances may be considered as resulting from van Hove-like singularities in the coupling between different Landau levels.
I. INTRODUCTION
A fundamental issue in quantum mechanics concerns the manner in which the discrete level spectrum of an isolated system is modified when it is coupled to some external, macroscopic measuring environment. From an
experimental perspective, an ideal system for the study of this issue is provided by semiconductor quantum dots, which are quasi-zero-dimensional semiconductor structures in which the flow of electrical current is confined on length scales comparable to the size of the electron itself (Ferry and Goodnick, 1997). The key components of a quantum dot are illustrated schematically in Fig. 1, in which the basic idea is that current flow between the macroscopic source and drain reservoirs is forced to occur via a central cavity. The strong confinement of motion that this cavity generates quantizes the electronic energy spectrum into a discrete ladder of states, while the coupling between the dot and its environment may be tuned directly in experiment by suitable adjustment of quantum point contact leads.

FIGURE 1. Schematic diagram illustrating the key features of a quantum dot. A central scattering region is connected to the source and drain by means of one-dimensional (1-D) leads, each of which supports a small number (N) of propagating modes.

In most situations of interest here, quantum dots are realized using the split-gate technique, the basic principles of which are illustrated schematically in Fig. 2 (Thornton et al., 1986). According to this approach, metal gates with a fine-line pattern defined by electron beam lithography are first deposited on the surface of a GaAs/AlGaAs heterojunction. Application of a suitable negative bias to the gates depletes the regions of electron gas from directly underneath them, forming a dot whose lead openings are defined by means of quantum point contacts (van Wees et al., 1988; Wharam et al., 1988a). Use of electron beam lithography to define the gates allows the realization of submicron sized dots, whose size is therefore comparable to the spatial extent of short-range potential fluctuations in the underlying two-dimensional (2-D) electron gas (Nixon and Davies, 1990). These fluctuations are associated with the statistical distribution of donors in the AlGaAs layer and recent studies suggest that minimum-energy considerations cause their ionization to order into a quasi-lattice structure (Stopa, 1996). In the presence of the resulting weak disorder, electronic motion within the dots should therefore be predominantly ballistic in nature, with large-angle scattering events being restricted to their confining walls (Richter et al., 1996).

FIGURE 2. Realization of quantum dots using the split-gate technique. (a) SEM micrograph of a 1-μm quantum dot, defined on a GaAs/AlGaAs heterojunction. The darker regions correspond to the semiconductor substrate, while the lighter ones are Ti-Au gates; (b) schematic diagram illustrating the depletion of a two-dimensional (2-D) electron gas through the application of a negative bias to suitable surface gates; the solid line indicates the shape of the conducting electron channel formed between the gates; (c) schematic diagram illustrating the depletion edge induced around the surface gates of a split-gate quantum wire.

With a sufficiently negative bias applied to the quantum point contact leads of the dot, electron transmission through them may occur only by tunneling and the transport behavior is dominated by the Coulomb blockade effect (Grabert and Devoret, 1991; McEuen et al., 1991; Ashoori et al., 1992; Waugh et al., 1995; van der Vaart et al., 1995; Yacoby et al., 1995; Tarucha et al., 1996). However, we focus here on the behavior exhibited by open quantum dots, whose point contacts are instead configured to allow electron transmission via a finite number of transversely quantized modes
(Marcus et al., 1992; Chang et al., 1994; Bird et al., 1994a; Keller et al., 1994; Persson et al., 1995). In these open dots, the Coulomb blockade effect is suppressed and electron transport instead provides a natural connection to the study of quantum chaos (Jalabert et al., 1990). In particular, since open dots are energetically coupled to their external environment, it has been suggested that their discrete level spectrum should be obscured by lifetime broadening effects (Jalabert et al., 1990). The problem with these arguments, however, is that they ignore the few mode nature of the point contact leads, which act as quantum mechanical filters between the dot and its environment, greatly restricting the excitation of phase space within the central dot. The filtering action itself arises from the need for incoming electrons to match their transverse momentum component to one of the strongly quantized values within the input contact. Consequently, electrons are injected into the dot in a highly directed, or collimated, beam (Akis et al., 1996a, 1996b, 1997a), which is only thought to couple efficiently to a small set of preferred dot states (Zozoulenko et al., 1997). At suitably low temperatures, where phase coherence of the electrons is maintained over long distances, subsequent interference of these eigenstates then becomes the dominant factor in determining the resulting electrical behavior (Marcus et al., 1993a, 1993b; Bird et al., 1995a, 1998a; Clarke et al., 1995). In other words, rather than obscuring the discrete level spectrum of the dot, the introduction of environmental coupling by means of quantum point contacts is instead thought to filter the effective density of states that contributes to transport. Transport measurements are therefore expected to be ideally suited as an experimental probe of this filtering effect. 
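As a rough illustration of what "a finite number of transversely quantized modes" means in practice, the following sketch counts the modes that can propagate through a constriction at zero magnetic field. It assumes an idealized hard-wall channel of width W in a spin-degenerate 2-D electron gas, for which the Fermi wavevector is k_F = sqrt(2 pi n_s) and the mode count is the integer part of k_F W / pi. The function names, the sheet density, and the widths are illustrative assumptions and are not taken from the experiments discussed here; real split-gate point contacts have softer, gate-dependent confinement.

```python
import math


def fermi_wavevector(sheet_density_m2: float) -> float:
    """Fermi wavevector (1/m) of a spin-degenerate 2-D electron gas."""
    return math.sqrt(2.0 * math.pi * sheet_density_m2)


def propagating_modes(width_m: float, sheet_density_m2: float) -> int:
    """Count the transverse modes below the Fermi energy.

    Assumes a hard-wall channel at zero magnetic field, where mode n
    propagates if n * pi / W < k_F.
    """
    k_f = fermi_wavevector(sheet_density_m2)
    return math.floor(k_f * width_m / math.pi)


if __name__ == "__main__":
    n_s = 4.0e15  # sheet density in m^-2 (assumed, illustrative value)
    for width_nm in (50, 100, 200):
        modes = propagating_modes(width_nm * 1e-9, n_s)
        print(f"W = {width_nm} nm: {modes} propagating modes")
```

In this picture, narrowing the constriction with a more negative gate bias removes modes one at a time, which is the sense in which the leads restrict the transverse momenta available to incoming electrons.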
Motivated by the preceding, we discuss here the use of magneto-transport studies to probe the dynamics of electrons in open quantum dots. At sufficiently weak magnetic fields, the main effect of the applied field is to modulate the intrinsic motion of electrons, by shifting the phase of their wavefunction. Important information is thus obtained on the nature of electrical current flow through these strongly quantized structures, and on the factors that limit the phase coherence of electrons trapped within them. At higher magnetic fields, however, well-defined Landau levels ultimately form, giving rise to a dramatic redistribution of charge within the quantum dot (Stopa et al., 1996). In this edge state regime, the dot may be considered as an artificially engineered atom, whose “level transitions” may be studied directly in experiment (van der Vaart et al., 1994a).

The organization of this chapter is as follows. In Section II, we discuss some of the theoretical concepts that will be important when interpreting the results of the magneto-transport studies. Because our main interest lies in probing the intrinsic dynamics of electrons in the dots, in Section III we begin by discussing the results of our weak field studies and focus on the
behavior obtained at low temperatures, where the influence of electron dephasing may be reasonably neglected. The importance of dephasing is then considered in Section IV, in which we present the results of studies performed at higher temperatures. In Section V, we discuss the behavior observed at high magnetic fields, where novel resonant scattering effects are found to dominate the magneto-transport. Finally, we present our conclusions in Section VI.
II. MAGNETO-TRANSPORT IN OPEN QUANTUM DOTS: SOME THEORETICAL CONSIDERATIONS
FIGURE 3. Conductance measurements of open quantum dots provide a spectroscopic probe of their discrete level spectrum. The energy level spectrum of an isolated (0.3 μm) dot is shown on the left-hand side, while on the right-hand side we show the corresponding conductance contour plot, obtained with four modes now propagating in the dot leads. Lighter regions correspond to higher conductance. (See also Plate 1.) [Both panels are plotted against magnetic field (T) from -0.1 to 0.1, with vertical-axis ticks at 14.0 and 15.5.]

The importance of electron transport measurements as a spectroscopic probe of open quantum dots derives from the connection between conductance and the density of states. In order to illustrate this connection, in Fig. 3 we show the computed energy levels of an isolated dot and the evolution of these levels with magnetic field. In the same figure, we also show the conductance contour plot that is obtained for the same dot, when four propagating modes are present in its point contact leads. While a number of subtle modifications may be resolved, it is nonetheless apparent that the basic details of this contour reflect the underlying energy level structure of
the isolated dot. This correspondence between conductance and the density of states may easily be understood by recalling that the former quantity is expected to be proportional to the transmission probability of electrons through the dot (Landauer, 1957; Biittiker et ul., 1985). As this probability should in turn be proportional to the density of final states that may be accessed in transport, we thus arrive at the connection between conductance and the density of states (Ferry et al., 1998). At sufficiently low temperatures, where electron phase coherence is maintained over long distances, we therefore expect that transport measurements should provide an experimental probe of the discrete level structure of open quantum dots. Transport measurements are also expected to provide an important tool that may be used to clarify the correct semiclassical description of electron dynamics in open quantum dots. The connection here arises via periodicorbit theory, according to which the density of states of any quantum system may be decomposed in terms of contributions from closed, semiclassical orbits (Gutzwiller 1971, 1990; Berry and Tabor, 1976, 1977; Jalabert et ul., 1990, Nakamura 1993, Casati and Chirikov, 1995; Brack and Bhaduri, 19973. The important point t o note is that the introduction of environmental coupling by means of quantum point contacts is thought to filter the effective density of states that contributes to the conductance [Akis et a/., 1996a; Ferry et al., 19981. Consequently, it is expected that the appropriate semiclassical description of these devices will be one in which a small number of the intrinsic periodic orbits of the dot are coupled and participate in transport. As already mentioned, the filtering action of the leads is thought to arise from their ability to collimate electrons into a highly directed beam (Fig. 4). From the perspective of the semiclassical motion of
FIGURE 4. Quantum mechanical wavefunction simulation showing electrons emerging in a highly collimated beam from a quantum point contact. The gate geometry is taken to be the same as the asymmetric pattern shown in Fig. 2(a). In the left-hand figure, only one occupied mode is present in the quantum point contact, while in the right-hand one seven modes are supported. (See also Plate 2.)
J. P. BIRD, R. AKIS, D. K. FERRY, AND M. STOPA
the electrons, it may be imagined that this beam is only able to favorably couple to those few orbits whose momentum components it closely matches (Akis et al., 1996a; Zozoulenko et al., 1997).

At sufficiently high magnetic fields, well-defined Landau levels form in the dot, whose transport properties are dramatically modified by a redistribution of charge into compressible and incompressible regions of electron gas (Beenakker, 1990; Chang, 1990; McEuen et al., 1991; Chklovskii et al., 1992; Stopa et al., 1996). The compressible regions are characterized by noninteger filling factor and consequently exhibit metallic-like screening. The incompressible regions, on the other hand, correspond to the special situation of integer filling factor, and their screening properties are very poor. Because the incompressible regions are usually much narrower than their compressible counterparts, this spatial redistribution of charge strongly modifies the shape of the dot from its zero-field form. Its walls, in particular, develop a series of broad (compressible) terraces, which are separated from each other by much narrower (incompressible) regions, in which the confining profile suddenly changes (Stopa et al., 1996).
III. WEAK-FIELD MAGNETO-TRANSPORT IN OPEN QUANTUM DOTS: LOW-TEMPERATURE PROPERTIES

In this section, we consider the transport properties of open dots at low temperatures, where experiment has shown that the wave-like nature of the electrons is maintained over time scales up to a hundred times longer than the ballistic transit time across the dot (Bird et al., 1995a; Clarke et al., 1995). To illustrate the behavior typically observed in this regime, we show in Fig. 5 the results of a magneto-resistance measurement, performed over a wide range of magnetic field. At the highest fields shown here, the cyclotron radius of the electrons

$$ r_c = \frac{\hbar k_F}{eB} \qquad (1) $$

(where k_F is the Fermi wavevector and B is the magnetic field) is significantly smaller than the size of the dot and current flow occurs via well-defined edge states (Büttiker, 1988). The edge states result from the intersection of successive Landau levels with the Fermi surface, and may be thought of as analogous to classical skipping orbits. For suitable gate voltages, Aharonov-Bohm oscillations are observed in this regime and are understood to result from electron tunneling via edge states trapped in the dot (Fig. 5, expanded section) (van Wees, 1989; Bird et al., 1994b). In this section, we focus on the
FIGURE 5. Typical magneto-resistance trace, measured in a 1-µm split-gate quantum dot at 0.01 K. The noise-like features are in fact highly reproducible, and an expanded view of the structure at high magnetic fields reveals highly periodic oscillations.
TABLE 1
BASIC PROPERTIES OF THE SEMICONDUCTOR WAFERS USED IN THIS STUDY
(TRANSPORT CHARACTERISTICS OBTAINED AT TEMPERATURES OF A FEW DEGREES KELVIN)

Wafer mobility (µ)   Carrier density (n_s)   Mean free path (l)   Fermi wavelength (λ_F)
m²/Vs                × 10^15 m^-2            µm                   nm
36-78                4.5-5.0                 4-9                  ~35
behavior observed at weaker magnetic fields, where the cyclotron radius is much larger than, or comparable to, the dimensions of the dot (Tables 1 and 2). In this regime, the low-temperature magneto-resistance is typically dominated by dense fluctuations, and closer inspection of these reveals a highly periodic nature. As can be seen from Fig. 6, this periodicity is quite distinct from that of the Aharonov-Bohm oscillations seen in the edge state regime. Another feature that may be observed at weak fields is a central magneto-resistance peak (Fig. 7) (Marcus, 1992; Chang et al., 1994; Bird et al., 1995b), and at first glance this appears reminiscent of that which arises due to weak localization in disordered thin-films (Bergmann, 1983, 1984).
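As a rough numerical illustration of the weak-field criterion used here, the sketch below evaluates the cyclotron radius r_c = ħk_F/eB (Eq. (1)) and the field at which the cyclotron diameter 2r_c matches the dot size L. It assumes a Fermi wavelength of 35 nm, taken as representative of Table 1, and uses the three larger effective dot sizes of Table 2; the function names are illustrative only and are not part of the original analysis.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
E_CHARGE = 1.602176634e-19  # elementary charge, C


def fermi_wavevector(lambda_f_m):
    """Fermi wavevector k_F = 2*pi / lambda_F (lambda_F in metres)."""
    return 2.0 * math.pi / lambda_f_m


def cyclotron_radius(k_f, b_tesla):
    """Cyclotron radius r_c = hbar * k_F / (e * B), in metres."""
    return HBAR * k_f / (E_CHARGE * b_tesla)


def crossover_field(k_f, dot_size_m):
    """Field (tesla) at which the cyclotron diameter 2*r_c equals the dot size."""
    return 2.0 * HBAR * k_f / (E_CHARGE * dot_size_m)


# Assumption: lambda_F = 35 nm is used as a representative value.
LAMBDA_F = 35e-9
k_f = fermi_wavevector(LAMBDA_F)

for size_um in (1.8, 0.8, 0.5):
    b_cross = crossover_field(k_f, size_um * 1e-6)
    print(f"L = {size_um} um: 2 r_c = L at B of about {b_cross:.2f} T")
```

Under these assumptions the crossover fields come out at a few tenths of a tesla, the same scale as the values listed in Table 2. Below these fields the cyclotron orbit does not fit inside the dot, which is the weak-field regime discussed in this section.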
A. Magneto-Conductance Fluctuations in Open Quantum Dots
The regular nature of the magneto-conductance fluctuations, observed in the weak-field regime, is confirmed by the results of Fourier analysis, which typically reveals the presence of just a few distinct frequencies (Fig. 8) (Bird et al., 1996a, 1997a). The fluctuations themselves are thought to arise from an interference effect, involving electron partial waves that propagate through the dot via a small number of periodic orbits (Akis et al., 1996a).
TABLE 2
TYPICAL PARAMETERS FOR THE QUANTUM DOTS^a

Gate size   Effective dot size (L)   L/λ_F   B (2r_c = L)   Δ/k_B
µm          µm                               T              K
2           1.8                      ~50     0.13           0.03
1           0.8                      ~25     0.29           0.13
0.6         0.5                      ~15     0.45           0.33
0.4         0.2-0.3                  ~5      0.7-0.9        ~1

^a The effective size of the dots was inferred from the observation of Aharonov-Bohm oscillations in the edge state regime (Bird et al., 1994b).
FIGURE 6. Comparison of the magneto-resistance over different ranges of magnetic field. In the lower plot, the strikingly periodic oscillations result from tunneling via edge states that are trapped in the dot. The basic periodicity of these oscillations is clearly different from those shown in the upper figure, which spans the same increment of magnetic field (0.5 T). The experiment was performed at a temperature of 0.01 K.
As such, the simple periodicity of the fluctuations noted here suggests this interference is dominated by contributions from a small number of these semiclassical orbits. This characteristic is fully consistent with the results of electron transport simulations, which reveal the wavefunction within the dot to be strongly scarred by the remnants of a few closed orbits (Akis et al., 1996a, 1996b, 1997b). While the details of this scarring vary sensitively with magnetic field, we nonetheless find that certain scars may recur quite regularly, with a basic periodicity that is in good agreement with that of the measured fluctuations. Fourier analysis of the fluctuations shown in Fig. 8,
FIGURE 7. Low-temperature magneto-resistance of a 1-µm GaAs/AlGaAs split-gate quantum dot. The inset shows an expanded view of the region around zero magnetic field, where a central magneto-resistance peak is apparent. The experiment was performed at a temperature of 0.01 K.
for example, reveals a fundamental magnetic frequency of approximately 9 T⁻¹. This in turn corresponds very closely to the magnetic field period over which a diamond-like scar is found to recur in the simulations (ΔB = 0.11 T, Fig. 8) (Akis et al., 1996a). With a small number of orbits involved in transport, electron interference occurs when the orbits return to their initial point, in this case the input point contact (Berry, 1984). Because the experiment here is performed in the presence of a weak magnetic field, this typically causes electrons to precess around the cavity, so that many rotations of the dot are required before orbit closure may be achieved (Bird et al., 1997b). At sufficiently low temperatures, electrons undergo multiple traversals of this basic orbit, while
FIGURE 8. The well-defined periodicity observed in the weak field magneto-conductance fluctuations is found to be correlated to the recurrence of well-defined wavefunction scars within the dot. In this figure we show the behavior observed in a 0.4-µm split-gate dot, which reveals fluctuations with a fundamental frequency of 9 T⁻¹. This frequency content does not change significantly as the dot lead openings are varied, and corresponds closely to the field scale over which a diamond scar recurs. Lighter regions in these probability density plots correspond to regions of enhanced probability density. The experiment was performed at a temperature of 0.01 K. (See also Plate 3.)
maintaining phase coherence, and it is this highly recursive process that builds up the scar. Given such considerations, it is clear that a crucial requirement for the observation of well-defined scarring is that electrons remain coherently trapped within the dot over very long time scales. This notion is confirmed by the results of temperature dependent studies, which, as we will discuss in Section IV, may be considered as probing the time dependent evolution of the scars.

We have already mentioned that periodic orbit theory reveals the underlying connection between quantum states and their associated semiclassical orbits. In this regard, the simple periodicity of the fluctuations noted here may also be considered to demonstrate that the transport properties of open dots are dominated by a small set of preferred quantum states. The selection of these states is thought to result from the collimating action of the input contact, in support of which we note that corresponding calculations in which the point contacts are replaced with uniform tunnel barriers reveal much weaker scarring behavior (Akis et al., 1996a). This conclusion is also supported by the results of an independent study, which has shown how quantum point contacts might be used to preferentially excite discrete states within open dots (Zozoulenko et al., 1997). The situation here is similar to that encountered in resonant tunneling diodes, in which only those states of the well, which couple effectively to the barriers, are excited in tunneling to exhibit clearly defined scars (Fromhold et al., 1995).

1. Stability of the Dominant Orbits

Further experiments reveal a number of important properties of the orbits that dominate interference in open dots, the first of which is that these orbits appear to be highly stable. This characteristic is revealed by studies in which the magneto-conductance fluctuations are measured at a series of different gate voltages (Bird et al., 1996a, 1997a, 1997b). These studies reveal that the main effect of varying gate voltage is to modulate the amplitude of the dominant Fourier peaks, while leaving their frequency values unaffected (Fig. 8). The stability of the selected orbits is further suggested by the results of numerical calculations, which reveal essentially the same scarred features when the number of modes in the leads is varied over a similar range to that considered in experiment (Fig. 9).
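The kind of Fourier analysis described above can be sketched on synthetic data. The example below is only an illustration under stated assumptions: the trace is an invented sum of a fundamental at 9 T⁻¹ and a weaker second harmonic, the overall amplitude stands in for the effect of changing gate voltage, and a plain discrete Fourier transform is used in place of whatever spectral estimator was applied to the measured data.

```python
import cmath
import math


def power_spectrum(samples, step):
    """Return (frequencies, power) from a plain discrete Fourier transform.

    samples: evenly spaced conductance values; step: field spacing in tesla,
    so the returned frequencies are in 1/tesla. Only frequencies below the
    Nyquist limit are returned.
    """
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]  # remove the constant background
    freqs, power = [], []
    for k in range(n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * j / n)
                    for j, x in enumerate(centred))
        freqs.append(k / (n * step))
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def dominant_frequency(samples, step):
    """Frequency (1/tesla) of the strongest non-zero-frequency peak."""
    freqs, power = power_spectrum(samples, step)
    peak = max(range(1, len(power)), key=power.__getitem__)
    return freqs[peak]


def synthetic_trace(amplitude, f0=9.0, n=256, span=1.0):
    """Hypothetical fluctuation trace: fundamental at f0 plus a weak harmonic."""
    step = span / n
    trace = [amplitude * math.cos(2 * math.pi * f0 * i * step)
             + 0.3 * amplitude * math.cos(2 * math.pi * 2 * f0 * i * step + 0.7)
             for i in range(n)]
    return trace, step


# The amplitude plays the role of a gate-voltage-dependent modulation.
for amplitude in (0.2, 0.5, 1.0):
    trace, step = synthetic_trace(amplitude)
    f_peak = dominant_frequency(trace, step)
    print(f"amplitude {amplitude}: peak at {f_peak:.1f} 1/T, "
          f"period {1.0 / f_peak:.3f} T")
```

In this toy model the peak position stays at 9 T⁻¹, a field period of about 0.11 T, while only its height follows the amplitude. This mirrors the observation that gate voltage modulates the amplitude of the dominant Fourier peaks without shifting their frequencies.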
FIGURE 9. Diamond scars formed in quantum dots with different numbers of modes present in the quantum point contact leads. The dot size here is 0.3 µm, which corresponds to the effective size of the experimental dot studied in Fig. 8. (See also Plate 4.)

2. Size-Dependent Scaling

The periodic nature of the fluctuations is more clearly resolved in smaller dots, in which the increased importance of energy quantization presumably results in a smaller number of dot states being excited in transport. In order
to quantify the scaling behavior of the fluctuations, we compute their averaged Fourier spectra by summing over traces obtained at a number of different gate voltages. A fundamental peak typically survives this averaging and its amplitude is found to increase by more than two orders of magnitude when the dot size is reduced in experiment (Fig. 10). The frequency of this peak appears to show a linear scaling with dot length, rather than area, a surprising observation that is nonetheless confirmed by the results of numerical simulations (Bird et al., 1997b; Holmberg et al., 1998).

In order to account for this unexpected scaling, we note that the basic periodicity of the fluctuations should reflect the rate at which the magnetic flux enclosed by the dominant orbits varies. In the presence of a weak magnetic field, these orbits precess around the dot so that many traversals are required before orbit closure may be achieved (Akis et al., 1996a). For the periodicity of the fluctuations, the relevant magnetic flux is, therefore, that enclosed between multiple traversals, which in turn may be modified due to flux cancellation effects (Beenakker and van Houten, 1988). Indeed, from studies of the conductance fluctuations in disordered systems, it is known that the flux enclosed between different trajectories may scale in proportion to their length (Ferry and Goodnick, 1997; Ferry et al., 1997). While the length-dependent scaling observed here appears consistent with these arguments, it is alternatively possible that it somehow reflects the nonergodic sampling of phase space in these strongly scarred dots, and further studies are required to resolve this issue.

3. Universality of the Scarring Behavior

While we have thus far restricted our discussion to the results of experiments performed on dots with the same gate geometry as that shown in Fig. 2, we emphasize that the selective excitation of stable orbits appears to be a generic property of all dots whose environmental coupling is provided by
FIGURE 10. Power spectrum obtained by averaging the spectra of the fluctuations, measured in the same 0.4-µm dot at a number of different gate voltages. First and second harmonics remain resolved. Upper inset: Amplitude of the averaged fundamental peak as a function of dot size. Lower inset: Frequency of the averaged fundamental peak as a function of dot size. The experiment was performed at a temperature of 0.01 K.
means of few-mode quantum point contacts. In particular, studies performed on dots with chaotic gate geometries, and with different lead opening orientations, have also been found to reveal a highly regular nature to their weak field fluctuations (Fig. 11) (Marcus et al., 1992; Okubo et al., 1997a). As the associated wavefunction scarring implies a highly nonuniform sampling of phase space within these dots, an important conclusion is therefore that chaotic scattering is suppressed in open quantum dots when their discrete quantum mechanical nature becomes sufficiently resolved.
FIGURE 11. Observation of scarring effects in a stadium-shaped dot. The gate geometry studied in experiment is shown in the upper left inset and the spacer bar indicates 1 µm. The periodic nature of the weak field magneto-conductance is shown in the lower right inset, while in the main figure the frequency content of the fluctuations, obtained in experiment and in simulations, is compared. An example of scarring for this dot is shown in the lower left inset. The experiment was performed at a temperature of 0.01 K.

B. Probing Wavefunction Scarring at Zero Magnetic Field

As magneto-transport studies reveal a highly selective nature to electron transport in open dots, a natural question now arises as to whether this
selectivity is intrinsic to these structures or if it is related instead to the application of the magnetic field. In order to resolve this issue, we note that Fig. 3 implies that Fermi energy variations should provide an equally effective probe of the discrete eigenspectrum of open dots. In experiment, we mimic this variation by modulating the voltage applied to the gates of the dot at zero magnetic field. At liquid helium temperatures, this modulation gives rise to a series of smooth plateaus in the resistance, reminiscent of those exhibited by pairs of quantum point contacts aligned in series (Fig. 12) (Wharam et al., 1988b). As the temperature is lowered, however, the phase
FIGURE 12. The resistance-gate voltage characteristic of a 1-µm split-gate dot, measured at two different temperatures (indicated).
coherent lifetime of the electrons increases (Bird et al., 1995a) and the plateaus become disrupted by the growth of reproducible fluctuations (Fig. 12). After subtracting the quantized background from the total resistance variation, we thus obtain a series of highly regular conductance oscillations, reminiscent of those seen in the magneto-transport studies (Fig. 13) (Bird et al., 1998b). These oscillations, too, are found to be a generic feature of these devices, and suggest that the selective nature to electron transport inferred previously is indeed an intrinsic property of these devices. As for the origin of the gate-voltage induced oscillations, these are thought to arise from an associated modulation of the size of the dot. In particular, an analysis of the high field Aharonov-Bohm oscillations (Bird et al., 1994b) indicates that the magnitude of this modulation may be as much as several tens of percent, at least in the smallest dots studied. Varying the size of the dot in this manner,
FIGURE 13. Gate voltage induced conductance variation, measured in a 0.4-µm split-gate dot at 0.01 K. Upper curve: variation of dot resistance with gate voltage. Lower curve: corresponding conductance oscillations, obtained after subtracting a monotonic background from the original data. The dot geometry employed here is shown in the inset.
it should be possible to sweep its eigenstates past the Fermi surface, generating fluctuations in the local density of states that should, in turn, be reflected directly in the measured conductance.

In our earlier magneto-transport studies, an important feature was found to be the existence of a correlation between the periodicity of the magneto-conductance fluctuations and the recurrence of specific wavefunction scars. In order to consider the possibility of a similar correlation here, we have performed electron transport simulations in which the influence of the gate voltage variation on the confining profile of the dot is properly accounted
for (Vasileska et al., 1998). In this way, we obtain conductance oscillations as a function of gate-voltage whose frequency content agrees excellently with that obtained in experiment (Fig. 14). We also find that the wavefunction may be strongly scarred at zero-field, and that specific scars may recur quite regularly with gate voltage, in good correspondence with the periodicity of the associated conductance oscillations (Figs. 15 and 16). Consequently, we conclude that the selective nature to electron transport inferred from the magneto-transport studies is indeed intrinsic to open dots. Finally, in this section, we note that by combining the results of studies in which magnetic field and gate voltage are independently varied, it should
FIGURE 14. Comparison of the gate voltage induced conductance oscillations, measured in experiment at 0.01 K and computed using self-consistent numerical simulations. The dot geometry employed here is shown in the inset.

FIGURE 15. Self-consistently computed wavefunction plots, obtained from simulations of the split-gate dot geometry shown in Fig. 14. The plots were obtained at three different gate voltages and the darker regions correspond to enhanced probability density. A typical dot profile is shown in the upper left figure. (See also Plate 5.)
be possible to construct conductance contour plots that provide information on the coupling modified density of states of the dots (Chan et al., 1995; Persson et al., 1995; Berggren et al., 1996). An example of one such contour is plotted in Fig. 17, in which the color scale indicates the measured variation of dot conductance with magnetic field and gate voltage. While a direct comparison to the form of Fig. 3 (in which the variation of magneto-conductance with energy is instead computed) cannot be made here, for now we are at least encouraged by observation of the well-defined striations that run through the experimental plot. Similar striations are also resolved in the numerical contours (Figs. 3 and 18), and have previously been shown to correspond to points of constant scarring (Akis et al., 1998b). Before a more detailed comparison of experiment and theory can be made, however, it will first be necessary to compute contour plots in which the influence of the gate voltage variation on the self-consistent profile of the dots is properly considered.
FIGURE 16. The Fourier spectrum of the experimental conductance oscillations shown in Fig. 14 reveals a peak at the gate voltage frequency that corresponds to the recurring scars shown in Fig. 15 (see arrow). The inset shows the original, experimental oscillations.
C. Zero-Field Magneto-Resistance Peak
Another feature that may be observed in the magneto-resistance of open dots is a central peak at zero-field (Fig. 19) (Marcus et al., 1992; Berry et al., 1994a, 1994b; Chang et al., 1994; Bird et al., 1995a; Keller et al., 1996), which has previously been attributed to a weak localization effect in which electrons are backscattered through a series of collisions with the confining walls of the dot (Baranger et al., 1993). The study of this peak is often difficult at low temperatures, due to the obscuring effect of the surrounding fluctuations (Figs. 19 and 20), and previous studies have emphasized the importance of using a suitable energy average to resolve it (Baranger et al., 1993; Keller et al., 1996). In what follows, we will argue that the observation
FIGURE 17. Experimentally determined conductance contour plot, obtained for a 0.4-µm split-gate quantum dot. The color scale ranges from red to blue, indicating low to high conductance, respectively. (See also Plate 6.)
of a well-defined peak in such experiments reflects the influence of energy averaging on the discrete spectrum of the dot. In particular, we will show how distinct peak lineshapes may be obtained, depending on the range of energy over which this average is taken. This in turn leads us to a very different interpretation of these peaks, in which the role played by closed semiclassical orbits is once again emphasized (Akis et al., 1998b). We
FIGURE 18. Numerically determined conductance contour plot, for a 0.3-µm quantum dot. The well-defined lines indicated by the arrows correspond to lines of constant scarring. (See also Plate 7.)
postpone a discussion of this issue for now, however, until the role of electron dephasing in these structures has been considered.
FIGURE 19. Weak field magneto-resistance of a 0.4-µm split-gate quantum dot, measured at a temperature of 0.01 K and a series of different gate voltages. Successive curves are offset by 5-kΩ increments.

IV. WEAK-FIELD MAGNETO-TRANSPORT IN OPEN QUANTUM DOTS: HIGH-TEMPERATURE PROPERTIES

While we have suggested that the transport properties of open quantum dots are strongly influenced by electron interference, we have thus far neglected the fact that the wave-like nature of the electrons is not preserved indefinitely in condensed matter systems (Ferry and Goodnick, 1997). While this approach should be quite reasonable at low temperatures, where the
phase breaking time of the electrons may be very long, it is expected that the increased importance of dephasing should result in a suppression of interference at higher temperatures. In this section, we therefore consider the manner in which this suppression arises.
FIGURE 20. Temperature dependence of the magneto-resistance of a 1-µm split-gate quantum dot, measured in the region near zero magnetic field. Curves are shown for 0.13 K, 0.36 K, and 1.4 K, offset vertically for clarity.

A. Temperature Dependence Characteristics of the Magneto-Conductance Fluctuations

Reminiscent of the universal conductance fluctuations observed in disordered quantum wires (Lee et al., 1987), the fluctuations in open dots are found to be
strongly suppressed on raising the temperature to above a degree Kelvin (Fig. 21). The quantitative manner in which this quenching arises is found to be very different to that exhibited by disordered wires, however, and is thought to reflect the different nature of electron dynamics in these systems. In particular, disordered wires are known to exhibit a power law scaling of their fluctuation amplitude, which results from the broad distribution of diffusive electron trajectories that contribute to transport (Lee et al., 1987; Bird et al., 1990, 1991). Here, however, the fluctuations are exponentially suppressed with temperature (Fig. 22), and we have argued that this behavior is consistent with the notion that a stable set of orbits dominates the transport behavior (Bird et al., 1994a). In particular, as the phase-breaking
FIGURE 21. Temperature dependence of the magneto-resistance of a 1-µm split-gate quantum dot. Successive curves are offset by 10-kΩ increments.
time shortens at higher temperatures, the number of electrons that are able to propagate coherently along these orbits should decrease exponentially. As the fluctuations are thought to result from interference between electrons that propagate coherently along the stable orbits, it thus seems reasonable that this exponential reduction in transmission probability should be reflected directly in the amplitude of the fluctuations. Indeed, similar considerations are known to govern the temperature dependent decay of the Aharonov-Bohm oscillations in disordered rings, which also result from a geometrically defined interference effect and which are found to reveal a similar exponential decay to that shown here (Milliken et al., 1987; Chang et al., 1988; Kurdak et al., 1992). The exponential temperature scaling of the fluctuations is also confirmed by numerical simulations (Akis et al., 1996b), in which dephasing is accounted for phenomenologically by the introduction of an imaginary
FIGURE 22. Left: The root mean square (rms) amplitude of the conductance fluctuations decreases exponentially with increasing temperature in experiment. Right: The experimental temperature variation is thought to reflect a similar exponential quenching of the wavefunction scarring, which is induced as the electron dephasing rate increases. In this figure the computed wavefunction in a 0.3-µm dot is shown for a number of different phase breaking times (τ_φ). (See also Plate 8.)
potential (Wang et al., 1993). These latter studies reveal that the exponential suppression of the fluctuations appears to be associated with a simultaneous disruption of wavefunction scarring due to phase-breaking scattering. This notion is illustrated in Fig. 22, in which we consider the influence of electron dephasing on a representative scar. Note how reducing the phase-breaking time to 500 ps has little effect on the initial scar, which is computed at absolute zero and for no dephasing. On reducing the phase-breaking time to 50 ps, however, the scar is almost completely destroyed, while for a coherent lifetime of just 1 ps only the collimation of incoming electrons may be resolved. As we will show, experimental studies of phase breaking in open dots yield typical lifetimes of the order of a few hundred ps at low temperatures, which decrease by more than an order of magnitude on warming to a degree Kelvin (Bird et al., 1995a; Clarke et al., 1995). In this regard, the temperature-dependent decay of the fluctuations in Fig. 21 appears consistent with a disruption of scarring, similar to that shown in Fig. 22.

The wavefunction plots of Fig. 22 reveal an important general property of the scarring, namely that it results from interference of coherent electrons that are trapped in the dot for very long time scales. In particular, when viewed in order of increasing phase coherence, the series of plots in Fig. 22 may be considered to show the time dependent growth of scarring, subsequent to the initial injection of electrons into the dot. From this perspective, the important feature to note is that, while the direct transit time across the dot is no more than a few picoseconds, a well-defined scar is only built up once electrons have been coherently trapped for at least a hundred times longer than this. The implication is, therefore, that temperature dependent transport studies may be used to probe the temporal evolution of the scars.
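The time scales quoted above can be compared with a very simple estimate. The sketch below assumes that phase coherence is lost exponentially, so that a fraction exp(-t/τ_φ) of electrons remains coherent after a trapping time t. The transit time of 2 ps and the trapping time of one hundred transits are illustrative assumptions chosen to match the orders of magnitude given in the text; they are not values taken from the simulations.

```python
import math


def coherent_fraction(trapping_time_ps, tau_phi_ps):
    """Fraction of electrons still phase coherent after the trapping time,
    assuming simple exponential dephasing exp(-t / tau_phi)."""
    return math.exp(-trapping_time_ps / tau_phi_ps)


# Assumptions for illustration only.
TRANSIT_TIME_PS = 2.0                      # "a few picoseconds" direct transit
TRAPPING_TIME_PS = 100 * TRANSIT_TIME_PS   # about 100 transits to build a scar

# Phase-breaking times quoted for the wavefunction plots of Fig. 22.
for tau_phi in (500.0, 50.0, 1.0):
    frac = coherent_fraction(TRAPPING_TIME_PS, tau_phi)
    print(f"tau_phi = {tau_phi:6.1f} ps: coherent fraction about {frac:.2e}")
```

Within this crude picture most electrons survive the full trapping time when τ_φ is 500 ps, only a few percent do so at 50 ps, and essentially none at 1 ps. This is qualitatively in line with the progressive loss of the scar described for Fig. 22.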
B. Phase Breaking in Open Quantum Dots
An important parameter for characterizing electron interference in open quantum dots is provided by the phase-breaking time (τ_φ), the average time scale over which the quantum mechanical phase of the electrons is preserved. In this section, we describe an experimental technique for determining the phase-breaking time, and consider some of the physical factors that limit its value. The approach that we employ exploits the magnetically induced increase in the average periodicity of the fluctuations, which is observed at high fields as skipping orbits begin to form (Fig. 23) (Bird et al., 1991, 1995a, 1995c, 1995d; Geim et al., 1992; Brown et al., 1993; Ferry et al., 1995). In this regime, fluctuations are thought to arise from interference between different skipping orbits, whose coupling is predominantly generated by
FIGURE 23. Conductance fluctuations measured in a 1-µm split-gate quantum dot at two distinct temperatures (the higher temperature trace has been shifted upwards by 0.75 e²/h for clarity). In both cases, the traces were obtained by subtracting a smoothed polynomial fit from the raw magneto-conductance data. The form of the background did not change significantly over the temperature range shown, and its average resistance was of order 16 kΩ.
FIGURE 24. Upper figure: schematic diagram illustrating, from left to right, electron trajectories in weak, intermediate, and strong magnetic fields, respectively. Middle figure: the quantum mechanical probability density, calculated in a 0.8-µm dot at a magnetic field of 1.55 T. For reasons of clarity, only one edge state is shown in this picture. Lower figure: in the skipping orbit regime, fluctuations are thought to result from interference between different orbits, associated with different Landau levels. The characteristic area enclosed between these orbits is estimated by computing that enclosed between a single, average orbit and the walls of the dot.

scattering in the point contact leads. To compute the characteristic magnetic flux enclosed between these orbits, we consider the area that a single orbit encloses as it skips coherently along the walls of the dot (Fig. 24) (Ferry et al., 1995)

$$ A_\phi = \frac{N \pi r_c^2}{2} = \frac{v_F \tau_\phi r_c}{2} \qquad (2) $$

where N is the number of bounces the electron makes before losing phase coherence and v_F is the Fermi velocity. Given this definition, we obtain a simple expression relating the average period of the fluctuations to the magnetic field

$$ B_c(B) = \frac{\phi_0}{A_\phi} = \frac{h}{e A_\phi} = \frac{8 \pi^2 m^*}{h k_F^2 \tau_\phi}\, B \qquad (3) $$

where m* is the effective mass of the electron and we have exploited the relations v_F = ħk_F/m*, r_c = ħk_F/eB (Eq. (1)) and ħ = h/2π; B_c is the correlation field that arises in the fluctuation correlation function (Lee et al., 1987)

$$ F(\Delta B) = \langle \delta g(B)\, \delta g(B + \Delta B) \rangle \qquad (4) $$
where the angled brackets indicate an average over a suitable range of magnetic field. According to Eq. (3), when the phase-breaking time is independent of magnetic field the average period of fluctuation should increase as a linear function of the applied field. Such behavior is indeed found to be typical of experiment (Fig. 25, inset) (Bird et al., 1995a, Okubo et al., 1997b) and from the slope of the resulting straight line fit we are able to obtain an estimate for the phase-breaking time. Due to the approximations involved in computing the effective area for interference in the
[Plot axes: temperature (kelvin); inset: magnetic field (tesla).]
FIGURE 25. Main figure: Experimentally determined variation of the phase-breaking time with temperature in two different quantum dots (solid circles: 1 μm; open circles: 0.6 μm). The markers on the upper axis indicate the temperature where the thermal smearing ($k_B T$) becomes comparable to the average level spacing ($\Delta$) in the dots, while the dashed lines indicate a 1/T variation. Inset: Experimentally determined variation of the fluctuation correlation field with magnetic field. At intermediate fields, an approximately linear variation may be resolved.
skipping orbit regime, the value of this estimate is only expected to be correct within a numerical factor of order unity. Even so, the technique should track accurately how the phase-breaking time evolves with external parameters, such as temperature or gate voltage.
1. Phase Breaking at Finite Temperatures
In the main panel of Fig. 25, we show the measured variation of the phase-breaking time with temperature in two different quantum dots. At temperatures of order a degree Kelvin, the phase-breaking time varies roughly inversely with temperature in both dots, similar to the behavior
found in an independent study (Clarke et al., 1995). While the origin of this inverse variation remains undetermined, we note that it is reminiscent of that obtained for electron-electron scattering in two-dimensional (2-D) disordered systems (Altshuler et al., 1982; Fukuyama and Abrahams, 1983; Takane, 1998). As the temperature is lowered, however, both dots show a tendency to saturated behavior, which sets in at a higher temperature in the smaller dot. We have suggested that saturation is related to a crossover from 2-D to zero-dimensional phase-breaking behavior, which occurs when the discrete levels of the dot become thermally resolved (Bird et al., 1995a, 1997c). In support of this argument, we note that the transition between the two regimes of phase-breaking behavior appears to occur when the thermal energy becomes comparable to the average level spacing in the dots (the markers on the upper axis of Fig. 25)
where $L^2$ is the effective area of the dot. Further evidence for the influence of dimensionality on phase breaking is provided by the results of nonequilibrium studies, in which the transport properties of the dot are measured in the presence of a superimposed dc source-drain bias (Fig. 26) (Linke et al., 1997a, 1997b). As the magnitude of this voltage is varied, the enhanced probability for electron-electron scattering is expected to quench phase coherence (Yacoby et al., 1991), and experiment reveals a monotonic suppression of the dot resistance (Fig. 26). An analysis of the resulting lineshape variation yields an estimate for the bias dependence of the phase-breaking time, the results of which are shown in Fig. 26 (Linke et al., 1997a). As can be seen from this figure, the phase-breaking time deduced in this manner is independent of the voltage bias until the corresponding excitation energy ($eV_{\mathrm{dc}}$) becomes comparable to the average level spacing in the dot. This threshold is therefore tentatively associated with a process in which electrons transition between dot levels once the bias voltage becomes sufficiently high (Bird et al., 1997c; Linke et al., 1997a).
2. Environmental Coupling and Electron Dephasing
In Fig. 27, we study the influence of dot lead opening on phase coherence in three different quantum dots. For relatively low values of dot resistance, the phase-breaking time takes a value of roughly 40 ps in all three dots. As the dot leads are narrowed by increasing the negative voltage applied to their gates, however, a clear increase in $\tau_\phi$ is observed in each dot. This appears to set in at roughly the same resistance in each case (13 kΩ). Further
[Plot axis: dc bias voltage (μV).]
FIGURE 26. Variation of the phase-breaking time with dc source-drain bias voltage, measured in a 0.6-μm dot at 10 mK. The arrow indicates the crossover voltage ($\Delta/e$). Inset: zero-bias resistance peak, generated in the 0.6-μm dot by sweeping dc bias at zero magnetic field. For a more detailed discussion of such nonequilibrium measurements, we refer the reader to [Linke et al., 1997a].
increasing the dot resistance beyond this transition leads to no additional change in phase coherence, and the overall impression is one of a step-like transition (Bird et al., 1998a). We do note from this figure, however, that the value of the phase-breaking time in this high resistance regime shows considerable variations from one dot to another, which do not appear to follow any well-defined scaling with dot size. The influence of environmental coupling on the behavior of mesoscopic devices has been widely considered in the literature. For the open dots of interest here, we expect that the phase-breaking rate should be proportional to the number of available states that electrons may access during transport. In this regard, the variation shown in Fig. 27 suggests that the phase space available for scattering in the dot is significantly reduced when the quantum point contact leads are narrowed. For the origin of this effect, we note from Fig. 4 that reducing the width of the leads is expected to suppress flaring of the incoming electron beam, and so should reduce the excitation of phase space within the dot. While further theoretical studies are required to
[Plot axis: resistance (kΩ).]
FIGURE 27. Variation of the phase-breaking time with lead opening, measured in three different quantum dots. Solid circles: 1-μm dot. Open circles: 0.4-μm dot. Inset: 0.6-μm dot. Lines are intended to guide the eye and additional error bars are omitted for clarity.
confirm such a mechanism, another independent study has also argued for an increase in phase coherence in small dots when their environmental coupling is reduced (Bøggild et al., 1998).
3. Device-Dependent Variations in Phase Coherence
An intriguing feature of Fig. 27 is the absence of a well-defined scaling of the phase-breaking time with dot size. This finding is confirmed by the results of other studies, which reveal considerable variations in phase coherence in the regime of few-mode coupling. For example, in Fig. 28 we show conductance fluctuations measured in two lithographically identical dots, which were patterned on Hall bars with similar characteristics. While the zero-field resistance was adjusted to be roughly the same in both devices, a striking difference is nonetheless apparent in the amplitude of the resulting
[Plot axis: magnetic field (tesla).]
FIGURE 28. Conductance fluctuations in two 1-μm dots, patterned on nominally identical Hall bars (R = 20 kΩ, $n_s$ = 4.4 × 10¹⁵ m⁻² and μ = 40 m²/Vs). The lower curve ($\tau_\phi$ = 30 ps) has been shifted downwards from the upper one ($\tau_\phi$ = 200 ps) by 0.6 e²/h.
fluctuations. The difference in amplitude suggests a considerable difference in phase-breaking time between the two dots. This poor correlation of phase coherence to the average properties of the host substrate is further illustrated in Fig. 29, in which we compare the lead opening dependence of the phase-breaking time, measured in three different dots. Note how the two dots fabricated in the lower quality material exhibit no noticeable variation in phase coherence with lead opening, while the higher mobility one shows an order of magnitude change. While a very rough correlation to wafer mobility can be resolved here, it is nonetheless clear that the magnitude of the phase-breaking time in the high resistance regime does not appear to be distributed in simple accordance with the mobilities of the bulk wafers (Bird et al., 1998a; Huibers et al., 1998). The poor correlation of the phase-breaking time to the average properties of their host material suggests a strong sensitivity of phase coherence in these dots to their microscopic disorder configuration. One obvious source of disorder is potential fluctuations in the 2-D electron gas layer, which
FIGURE 29. Variation of the phase-breaking time with lead opening in three 1-μm dots. Solid circles: $n_s$ = 5 × 10¹⁵ m⁻² and μ = 70 m²/Vs; open circles: $n_s$ = 4.4 × 10¹⁵ m⁻² and μ = 40 m²/Vs; open squares: $n_s$ = 4.1 × 10¹⁵ m⁻² and μ = 20 m²/Vs. Lines are intended to guide the eye.
arises due to the discrete distribution of donors in the AlGaAs layer (Nixon and Davies, 1990; Stopa, 1996a). The effect of these fluctuations should be to mix the otherwise discrete states of the dot, so that electrons may populate additional states that would otherwise remain inaccessible in transport (Altland and Gefen, 1995). In other words, while disorder is more normally thought of as giving rise to elastic scattering, by increasing the density of available states into which electrons may scatter, its presence may actually increase the dephasing rate in these dots. The very different behaviors apparent in Fig. 29 may then be understood as follows. If the phase-breaking time is limited by scattering from disorder within the dot, varying the lead opening should have little effect on the resulting phase-coherent characteristics. In contrast, in the limit of weak disorder, electrons may escape from the dot before their phase is randomized within it. Because the phase of these electrons will therefore be broken in the external reservoirs (or within the quantum point contact leads), it seems quite reasonable that reducing the width of the dot leads should enhance their phase-breaking time. Once
again, however, theoretical studies are required to clarify the influence of disorder on the phase-breaking process within these dots.
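The phase-breaking times discussed throughout this section follow from the linear relation of Eq. (3) between the correlation field and the applied field. The Python sketch below inverts that relation to turn a fitted slope into an estimate of the phase-breaking time. The GaAs effective mass of 0.067 free-electron masses, the relation k_F = sqrt(2π n_s) for a spin-degenerate 2-D gas, the function names, and the example slope and density are all assumptions made for illustration; the result carries the same order-unity uncertainty as Eq. (3) itself.

```python
import math

H = 6.62607015e-34        # Planck constant (J s)
M_E = 9.1093837015e-31    # free-electron mass (kg)
M_EFF_GAAS = 0.067 * M_E  # assumed GaAs effective mass (kg)


def fermi_wavevector(n_s):
    """k_F (1/m) of a spin-degenerate 2-D electron gas of sheet density n_s (1/m^2)."""
    return math.sqrt(2.0 * math.pi * n_s)


def tau_phi_from_slope(slope, n_s, m_eff=M_EFF_GAAS):
    """Invert Eq. (3), B_c = [8 pi^2 m* / (h k_F^2 tau_phi)] B, for tau_phi (s).

    slope is the dimensionless gradient dB_c/dB of a straight-line fit.
    """
    k_f = fermi_wavevector(n_s)
    return 8.0 * math.pi ** 2 * m_eff / (H * k_f ** 2 * slope)


def slope_from_tau_phi(tau_phi, n_s, m_eff=M_EFF_GAAS):
    """Forward form of Eq. (3): the slope dB_c/dB expected for a given tau_phi (s)."""
    k_f = fermi_wavevector(n_s)
    return 8.0 * math.pi ** 2 * m_eff / (H * k_f ** 2 * tau_phi)


if __name__ == "__main__":
    n_s = 4.4e15     # hypothetical sheet density (1/m^2)
    slope = 0.01     # hypothetical fitted slope dB_c/dB
    tau = tau_phi_from_slope(slope, n_s)
    print(f"tau_phi = {tau * 1e12:.1f} ps")
```

Because the slope and the phase-breaking time are inversely proportional, a steeper variation of the correlation field with magnetic field corresponds to a shorter phase-breaking time.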
C. Zero-Field Magneto-Resistance Peak: A Probe of Quantum Chaos?
Weak localization is a well-known correction to the conductance of disordered systems that results from a process known as coherent backscattering (Bergmann, 1984). The origin of this quantum mechanical effect is the finite probability that exists for diffusing electrons to return to their initial point, after randomly scattering within a disordered medium. At sufficiently low temperatures, and in the absence of a magnetic field, constructive interference between these backscattered electrons and their time-reversed counterparts produces an enhancement of the sample resistance. The interference is suppressed by the application of a weak magnetic field, which breaks time-reversal symmetry. This leads to a magneto-resistance peak at zero field. The open dots we study here may also exhibit a zero-field peak in their magneto-resistance (Figs. 19 and 20), which has been argued to result from the ballistic analog of weak localization in which electrons backscatter within the dot through a series of collisions with its confining walls. According to one semiclassical theory, in particular, the lineshape of this peak is expected to depend very sensitively on the nature of the electron dynamics in the dot, with Lorentzian and linear magnetic field dependencies predicted for chaotic and regular scattering, respectively (Baranger, 1993). In order to resolve a zero-field peak in experimental studies of open dots, it is usually necessary to suppress the influence of the surrounding fluctuations that may dominate the magneto-conductance at low temperatures (Fig. 20). Techniques that may be used to achieve this suppression include measuring the response of large arrays of lithographically identical dots (Chang et al., 1994), or averaging magneto-conductance traces obtained in single dots at different gate voltages (Keller et al., 1994; Chan et al., 1995; Huibers et al., 1998).
Alternatively, by raising the measurement temperature to roughly a degree Kelvin, it has been found possible to quench the fluctuations while leaving the central peak resolved (Fig. 20) (Bird et al., 199%). Motivated by these observations, we have suggested an alternative interpretation of this peak, according to which the peak is thought to provide a signature of energy averaging of specific spectral features in these dots (Akis et al., 1998b). As we discuss in greater detail in what follows, simply by varying the range over which this average is taken, we are able to obtain both Lorentzian and linear peak lineshapes in the same dot geometry! While this observation is quite consistent with the results of experiment (Bird et al., 1995b), it clearly contradicts the suggestions of the previous paragraph.
1. Zero-Field Resistance Peaks: A Spectral Interpretation
To emphasize the connection of the zero-field peak to the spectral properties of open dots, in Fig. 30 we show the computed level spectrum of an isolated dot, and the conductance contour that is obtained at absolute zero when two propagating modes are present in its point contact leads. A clear correlation to the underlying level spectrum is once again apparent in this contour, from which it is also clear that sweeping magnetic field at fixed energy will not always yield a zero-field magneto-resistance peak. One way in which such a peak may be observed, however, is by averaging the results of magneto-resistance calculations, performed at a series of different Fermi energies. In order to illustrate the effects of such a uniform energy average, consider the magneto-resistance curve shown in Fig. 30. This represents an average of more than 60 distinct magneto-resistance traces, which are used to construct the contour plot shown in the same figure. After this uniform average is performed, a central peak is found to remain at zero magnetic field, reminiscent of the behavior observed in experiment. In Fig. 31, we show how the lineshape of this peak varies as we increase the energy range over which the average magneto-resistance is computed. While the geometry of the dot itself is held constant here, a transition between Lorentzian and linear lineshapes may nonetheless be resolved (see also Fig. 32). This transition is not restricted to the perfectly square dots we consider here, but is also observed in calculations performed for self-consistently computed profiles (Akis et al., 1998b). Based on these findings, we therefore conclude that the lineshape of the zero-field peak does not provide a reliable indicator of chaos in open dots!
Instead, the observation of this peak is thought to reflect the fact that the transport properties of these dots are dominated by the details of their energy spectra, even at temperatures where specific spectral features may no longer be resolved. The observation of a peak at zero magnetic field, in particular, is thought to reflect the highly degenerate nature of the open dot energy spectrum in this region (Akis et al., 1998b).
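The uniform energy average described in this subsection amounts to averaging a stack of magneto-resistance traces over a chosen window of Fermi energy. The Python sketch below shows only that bookkeeping step. The data layout, the function name, and the tiny synthetic input are assumptions made for illustration; they are not the simulation data behind Figs. 30 and 31.

```python
def window_average(traces, energies, e_min, e_max):
    """Uniformly average the traces whose Fermi energy lies in [e_min, e_max].

    traces[i][j] is the resistance at energies[i] and magnetic-field point j.
    Returns a single averaged trace (a list over the field points).
    """
    selected = [trace for trace, energy in zip(traces, energies)
                if e_min <= energy <= e_max]
    if not selected:
        raise ValueError("no traces fall inside the requested energy window")
    n_fields = len(selected[0])
    return [sum(trace[j] for trace in selected) / len(selected)
            for j in range(n_fields)]


if __name__ == "__main__":
    # Synthetic placeholder input: three traces sampled at three field points.
    energies = [10.0, 10.5, 11.0]      # hypothetical Fermi energies (meV)
    traces = [[1.0, 3.0, 1.0],
              [2.0, 2.0, 2.0],
              [3.0, 4.0, 3.0]]
    print(window_average(traces, energies, 10.0, 10.5))   # narrow window
    print(window_average(traces, energies, 10.0, 11.0))   # wide window
```

Widening the window brings more spectral features into the average, which is the parameter varied between the lineshapes of Fig. 31.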
2. Connection to Previous Experiment
Zero-field peaks, whose lineshape was found to transition between Lorentzian and linear forms as gate voltage was varied, have been reported in studies of single dots with very different geometries (Fig. 32) (Bird et al., 1995b; Taylor, 1997). While this transition has been argued to reflect a crossover from chaotic to regular scattering in the dots (Bird et al., 1995b, 1996b), it now seems more likely that varying gate voltage instead changes the specific dot states that contribute to the associated energy average. This notion is further confirmed by the results of Fig. 33, in which we obtain
FIGURE 30. Upper left: Computed energy levels of an isolated 0.3-μm dot and their evolution with magnetic field and Fermi energy. Upper right: Corresponding conductance contour plot, obtained for the same range of magnetic field and Fermi energy, in an open 0.3-μm dot. Lighter regions correspond to higher conductance. Lower figure: Corresponding magneto-resistance lineshape, obtained by averaging over all curves in the upper right plot.
different peak lineshapes by averaging over different regions of the same dot spectrum. A transition between linear and Lorentzian lineshapes has also been observed in studies of nominally regular dots, on raising the measurement temperature to above a degree Kelvin (Chang et al., 1994; Bird et al., 1995c). While this transition was argued to reflect a thermally induced disruption of regular electron scattering (Chang et al., 1994), we instead suggest it arises as the effective window over which averaging is performed increases at higher temperatures. Indeed, an important question, which we have thus far neglected to consider, concerns our choice of energy window in the simulations. In experiment, the two main sources of energy averaging are thought to be thermal smearing and lifetime broadening ($\hbar/\tau_\phi$). Making suitable estimates for these quantities, we obtain a total broadening of order
FIGURE 31. Left: Computed conductance contour plot obtained for a 0.3-μm, open quantum dot. Right: Corresponding magneto-resistance lineshapes, obtained by averaging over different sized energy windows (indicated on the energy axis).
a few tenths of a millielectron volt at a degree Kelvin (see Fig. 22), a value that is quite consistent with that required to obtain smooth peaks in the numerical simulations.
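The size of this energy window can be estimated with a few lines of arithmetic. The Python sketch below adds a thermal contribution to the lifetime broadening ħ/τ_φ. Taking the thermal smearing as 3.5 k_B T (the width of the derivative of the Fermi function), simply summing the two terms, and the example value of the phase-breaking time are assumptions of this illustration, not choices documented in the text.

```python
K_B_MEV = 8.617333262e-2       # Boltzmann constant (meV / K)
HBAR_MEV_S = 6.582119569e-13   # reduced Planck constant (meV s)


def thermal_smearing_mev(temperature_k, factor=3.5):
    """Thermal smearing in meV, taken here as factor * k_B * T (assumed form)."""
    return factor * K_B_MEV * temperature_k


def lifetime_broadening_mev(tau_phi_s):
    """Lifetime broadening hbar / tau_phi in meV."""
    return HBAR_MEV_S / tau_phi_s


def total_broadening_mev(temperature_k, tau_phi_s, factor=3.5):
    """Sum of the thermal and lifetime contributions (assumed to add linearly)."""
    return (thermal_smearing_mev(temperature_k, factor)
            + lifetime_broadening_mev(tau_phi_s))


if __name__ == "__main__":
    temperature = 1.0    # kelvin
    tau_phi = 10e-12     # hypothetical phase-breaking time at 1 K (s)
    print(f"thermal : {thermal_smearing_mev(temperature):.3f} meV")
    print(f"lifetime: {lifetime_broadening_mev(tau_phi):.3f} meV")
    print(f"total   : {total_broadening_mev(temperature, tau_phi):.3f} meV")
```

Under these assumptions the thermal term dominates at a degree Kelvin, and the total falls in the range of a few tenths of a millielectron volt.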
3. Absence of Weak Localization in Open Quantum Dots
In previous studies of open dots, a distinction has been made between the zero-field magneto-resistance peak and the reproducible fluctuations that persist over a wider range of field. In particular, while the fluctuations have been explained in terms of an interference effect involving electron partial waves that connect the source and drain (Jalabert, 1990), the zero-field peak has been viewed as an additive contribution to the conductance, which results from interference between backscattered orbits (Baranger et al., 1993). In this review, on the other hand, we have argued that both of these magneto-conductance features are characteristic of the same density of states, which in turn is determined solely by contributions from backscattered orbits
[Plot axis: magnetic field (mT).]
FIGURE 32. Comparison of the zero-field peak lineshape, obtained numerically (left) and in experiment (right). In theory, the different lineshapes are obtained by averaging over different regions of the quantum dot spectrum, while in experiment they are obtained on changing gate voltage [Bird et al., 1995b].
FIGURE 33. Different lineshapes are obtained for the zero-field peak by averaging over different regions of the quantum dot spectrum.
(Gutzwiller, 1971, 1990; Berry and Tabor, 1976, 1977; Jalabert et al., 1990; Nakamura, 1993; Casati and Chirikov, 1995; Brack and Bhaduri, 1997). According to our interpretation, the zero-field peak cannot therefore be considered as an additive contribution to the conductance, because the orbits that determine its details are the only orbits involved in transport. Consequently, the only possible interpretation of this peak is that of a probe of the underlying level spectrum, and we emphasize that there is no weak localization in the sense of an additive contribution to the conductance (Akis et al., 1998b).
V. HIGH-FIELD MAGNETO-TRANSPORT IN OPEN QUANTUM DOTS
The transport properties of quantum dots are dramatically modified at high magnetic fields, where the formation of well-defined Landau levels results in current flow at the Fermi surface being carried by one-dimensional (1-D) edge states (Fig. 34) (Büttiker, 1992). These narrow channels are confined very closely to the walls of the dot and propagate while following equipotential paths, whose guiding center energies may be written as
FIGURE 34. Schematic diagram illustrating the formation of edge states at high magnetic fields. The upper figure shows the location of the edge states relative to the sample boundaries, while the lower figure shows the resulting potential profile (thick curve) and Landau level structure. The dotted line shows the position of the Fermi level.
FIGURE 35. The tunable profile of the quantum dot may be used to selectively trap edge states within it. Scattering between the transmitted and confined edge states is assumed to occur at the positions marked a-d. The R is the area enclosed by the confined edge state while A , and A , are the interedge state areas defined by the scattering events at a-b and c-d, respectively.
where n is the Landau level index. An important consequence of Eq. (6) is therefore that quantum dots may be used to selectively trap edge states, by reflecting them from the saddle point barrier that is formed in their point contact leads (Fig. 35) (Glazman and Jonson, 1989; van Wees et al., 1989). In situations where the interaction between the edge states is weak, the resistance of the dot then exhibits a series of perfectly quantized plateaus as gate voltage or magnetic field is varied (von Klitzing et al., 1980; Büttiker, 1992). When scattering between the trapped and transmitted edge states is possible, however, the trapped states provide an additional route for current flow through the dot and dramatic departures from Hall quantization may occur. In particular, a magnetically induced modulation of tunneling via the trapped edge states is found to give rise to Aharonov-Bohm oscillations and giant backscattering resonances, the properties of which we discuss in what follows.
A. Aharonov-Bohm Magneto-Resistance Oscillations
An important consequence of confining edge states in quantum dots is that their energy becomes quantized into a discrete spectrum, successive states of
which are swept past the Fermi surface each time their enclosed magnetic flux increases by one quantum ($\phi_0 = h/e$) (Sivan and Imry, 1988; Sivan et al., 1989). In situations where the trapped edge states provide a tunneling route for current flow through the dot, its magneto-resistance is then found to exhibit a series of highly periodic oscillations, which may persist over wide ranges of magnetic field (Figs. 5 and 36) (van Wees et al., 1989; Taylor et al., 1992; Sachrajda et al., 1993; Simpson et al., 1993; Bird et al., 1994b). An example of this periodicity is shown in Fig. 36, in which the magnetic field position of successive oscillations is plotted. Equating the average period of oscillation to a corresponding magnetic flux, we typically obtain enclosed edge state areas that agree very closely with self-consistent calculations of the effective size of the dot (Bird et al., 1994b, 1997d). Furthermore, small variations in period observed over larger ranges of magnetic field (Fig. 36) appear consistent with the expected movement of the edge guiding centers, relative to the walls of the dot (van Wees et al., 1989; Marcus et al., 1994; Bird et al., 1994b). The Aharonov-Bohm oscillations are rapidly quenched with increasing temperature and are typically no longer resolved by a degree Kelvin (Fig. 37). While we expect that both thermal averaging and phase breaking should be efficient in reducing the amplitude of the oscillations, the strong sensitivity to temperature that is apparent in Fig. 37 seems inconsistent with the effects of phase breaking (see Fig. 25). We have, therefore, previously suggested that the quenching of these oscillations is related to thermal smearing of the discrete energy levels of the trapped edge states that mediate the tunneling process (Bird et al., 1994b).
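Equating the oscillation period to one flux quantum through the enclosed area is a one-line conversion, sketched below in Python. The function names and the example period are hypothetical, and quoting the result as the diameter of an equivalent circle is merely one convenient way to compare the area with a dot size.

```python
import math

H = 6.62607015e-34    # Planck constant (J s)
E = 1.602176634e-19   # elementary charge (C)


def area_from_period(period_t):
    """Area (m^2) through which one period (T) adds one flux quantum h/e."""
    return H / (E * period_t)


def period_from_area(area_m2):
    """Aharonov-Bohm period (T) expected for an enclosed area (m^2)."""
    return H / (E * area_m2)


def equivalent_diameter(area_m2):
    """Diameter (m) of a circle having the same area."""
    return 2.0 * math.sqrt(area_m2 / math.pi)


if __name__ == "__main__":
    period = 5e-3    # hypothetical average oscillation period (T)
    area = area_from_period(period)
    print(f"area = {area * 1e12:.3f} um^2, "
          f"equivalent diameter = {equivalent_diameter(area) * 1e6:.2f} um")
```

An enclosed area of one square micron corresponds to a period of roughly 4 mT, which sets the scale for oscillations in micron-sized dots.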
1. Aharonov-Bohm Oscillations: Precise Departures from h/e Periodicity
A number of recent studies have shown that the Aharonov-Bohm oscillations in small dots may exhibit the phenomenon of frequency-doubling, which appears to result from the generation of two separate sets of h/e oscillations by opposite spin-branches of the same Landau level (Fig. 38) (Sachrajda et al., 1993; Simpson et al., 1993). The oscillations remain locked in antiphase over wide ranges of magnetic field and it has been speculated that the Coulomb interaction between the edge states plays an important role in maintaining this phase rigidity. From studies of different sized dots, we have found the frequency-doubled oscillations to be a generic feature of micron-sized devices (Bird et al., 1997d). In this regard, it seems reasonable that these oscillations are indeed associated with some novel charging effect, as has been speculated in the literature (Taylor et al., 1992; Sachrajda et al., 1993; Simpson et al., 1993).
[Plot axes: magnetic field (tesla); inset: oscillation number.]
FIGURE 36. The Aharonov-Bohm oscillations observed in the edge state regime exhibit striking periodicity over a wide range of magnetic field. Results presented here were obtained in a 1-μm split-gate dot at a temperature of 0.01 K.
[Plot axis: magnetic field (tesla).]
FIGURE 37. The Aharonov-Bohm oscillations observed in the edge state regime are rapidly washed out with increasing temperature. Results presented here were obtained in a 1-μm split-gate dot.
Period-doubling of the Aharonov-Bohm oscillations has also been reported and while, at present, we do not understand the origin of this effect, its precise nature once again suggests an interpretation in terms of the single particle spectrum of the trapped edge states (Fig. 39) (Bird et al., 1996c, 1997d). In particular, it is well understood that an increasing magnetic field causes the depopulation of successive Landau levels. As doubling of the oscillation period may be thought of as arising from a suppression of every other tunneling event, one idea is that as a depopulation event is approached internal redistribution of charge might somehow compete with the tunneling process responsible for the oscillations (Bird et al., 1996c). The difficulty with such a notion, however, is that it would require the competing tunneling processes to operate strictly sequentially. As this seems rather unlikely, further studies are required to determine the origin of the period-doubling.
[Plot axis: magnetic field (tesla).]
FIGURE 38. The phenomenon of frequency-doubling of Aharonov-Bohm oscillations, in this case measured in a 0.4-μm split-gate dot at 0.01 K. The oscillations are determined to be frequency-doubled because the assumption that they are single period oscillations results in an effective edge state area that is bigger than the lithographic size of the dot!
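The area argument in the caption of Fig. 38 can be phrased as a simple consistency test. In the Python sketch below, an observed period is converted to an enclosed area under two hypotheses (a single h/e series, or two interleaved h/e series locked in antiphase), and each area is compared with the lithographic area of the dot. The function names, the treatment of the dot as a square whose side equals its lithographic size, and the example period are assumptions for illustration only, not measured values.

```python
FLUX_QUANTUM = 6.62607015e-34 / 1.602176634e-19   # h/e (Wb)


def implied_area(observed_period_t, interleaved_series=1):
    """Edge-state area (m^2) implied by an observed oscillation period (T).

    With k interleaved h/e series the observed period is 1/k of the true
    h/e period, so the enclosed area is (h/e) / (k * observed period).
    """
    return FLUX_QUANTUM / (interleaved_series * observed_period_t)


def fits_inside_dot(area_m2, lithographic_size_m):
    """True if the implied area does not exceed a square dot of the given side."""
    return area_m2 <= lithographic_size_m ** 2


if __name__ == "__main__":
    period = 0.020    # hypothetical observed period (T)
    size = 0.4e-6     # lithographic size of the dot (m)
    for k in (1, 2):
        area = implied_area(period, k)
        print(k, f"{area * 1e12:.3f} um^2", fits_inside_dot(area, size))
```

For these illustrative numbers only the two-series hypothesis yields an area that fits within the dot, which mirrors the reasoning used to identify frequency-doubling.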
2. Magneto-Coulomb Oscillations: The Transition to Tunneling Transport
While we have thus far focused on the behavior exhibited by open dots whose leads are initially biased to support one or more modes, a strong magnetic field may be used to induce a transition to tunneling in these structures. This transition is indicated by the magneto-resistance of the dot rising above the last quantum Hall plateau, corresponding to the point where the guiding center of the lowest Landau level drops below the saddle point minimum in the leads of the dot (Fig. 40). In addition to a monotonically increasing background, the magneto-resistance in this tunneling regime is characterized by the observation of a series of periodic oscillations (Bird et al., 1994b, 1994c). The period of these oscillations shows little sensitivity to changes in gate voltage and is found to be more than two orders of magnitude larger than that expected for the Aharonov-Bohm effect. In order
[Plot axis: magnetic field (tesla).]
FIGURE 39. Period-doubling of the Aharonov-Bohm oscillations measured in a 1.0-μm dot at 0.01 K. Upper figure: Broad range of magnetic field illustrating the transition between h/e and 2h/e oscillations. Middle figure: Expanded view of the h/e Aharonov-Bohm oscillations. Lower figure: Expanded view of the 2h/e oscillations.
[Plot axis: magnetic field (tesla).]
FIGURE 40. Observation of magneto-Coulomb oscillations in a 2-μm split-gate dot at 0.01 K. The left-hand figure shows the magneto-resistance of the dot at two different gate voltages. At a critical magnetic field, the quantum point contacts of the dot depopulate and the magneto-resistance rises rapidly, indicating the transition to the tunneling regime. At even higher magnetic fields periodic oscillations are observed in the magneto-resistance. In this figure, the curve that pinches off more rapidly corresponds to a more negative gate voltage. The right figure shows an expanded view of the high-field oscillations, which was obtained by subtracting a monotonic background from the magneto-resistance.
to account for this observation, we note that in the tunneling regime the energy of the isolated dot may vary with magnetic field, while the electrochemical potential in the reservoirs remains pinned due to the presence of the attached battery. With all electrons occupying the lowest Landau level, the energy of the single particle states in the dot should vary with magnetic field according to
$$E(B) = \frac{1}{2}\hbar\omega_c = \frac{\hbar e B}{2m^*}. \qquad (7)$$
In this regime, increasing magnetic field should cause electrons to tunnel successively out of the dot and, in order to determine the rate at which this depopulation occurs, we note that each tunneling event should be accompanied by a charging energy (Grabert and Devoret, 1991)
$$E_C = \frac{e^2}{C}, \qquad (8)$$
where $C$ is the effective capacitance of the dot. In the presence of this charging energy, the magnetic field period between successive depopulation events ($\Delta B$) should therefore be given as
$$E_{N-1}(B + \Delta B) - E_N(B) = E_C, \qquad (9)$$
where $E_N(B)$ is the energy of the $N$th single particle state in the dot. In sufficiently large dots, the spacing between successive states is very small and we may therefore write
$$\Delta B = \frac{2m^* e}{\hbar C}. \qquad (10)$$
For the quantum dot shown in Fig. 40, we compute a capacitance $C = 0.73 \times 10^{-15}$ F and, substituting this value into Eq. (10), a corresponding depopulation period of 0.25 T, in satisfactory agreement with the oscillations observed in experiment (Figs. 40 and 41). We therefore conclude that the oscillations observed in the tunneling regime result from a Coulomb blockade effect, and so refer to them as magneto-Coulomb oscillations (Beenakker et al., 1991; Bird et al., 1994b, 1994c; van der Vaart et al., 1994b).
3. Giant Backscattering Resonances
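Before turning to backscattering, the magneto-Coulomb period of Eq. (10) can be checked numerically. The Python sketch below evaluates the depopulation period for a given capacitance and confirms that the shift of E(B) in Eq. (7) over one period equals a charging energy of e²/C. The GaAs effective mass of 0.067 free-electron masses, the capacitance of 0.73 × 10⁻¹⁵ F, the form e²/C for the charging energy, and the function names are assumptions of this sketch rather than values confirmed by the text.

```python
HBAR = 1.054571817e-34    # reduced Planck constant (J s)
E = 1.602176634e-19       # elementary charge (C)
M_E = 9.1093837015e-31    # free-electron mass (kg)
M_EFF_GAAS = 0.067 * M_E  # assumed GaAs effective mass (kg)


def lowest_landau_energy(b_field, m_eff=M_EFF_GAAS):
    """Eq. (7): E(B) = hbar * e * B / (2 m*), in joules."""
    return HBAR * E * b_field / (2.0 * m_eff)


def charging_energy(capacitance):
    """Charging energy taken as e^2 / C (assumed form), in joules."""
    return E ** 2 / capacitance


def depopulation_period(capacitance, m_eff=M_EFF_GAAS):
    """Eq. (10): Delta B = 2 m* e / (hbar C), in tesla."""
    return 2.0 * m_eff * E / (HBAR * capacitance)


if __name__ == "__main__":
    capacitance = 0.73e-15    # assumed dot capacitance (F)
    delta_b = depopulation_period(capacitance)
    print(f"Delta B = {delta_b:.3f} T")
    print(f"E(Delta B) = {lowest_landau_energy(delta_b) / E * 1e3:.3f} meV")
    print(f"e^2/C      = {charging_energy(capacitance) / E * 1e3:.3f} meV")
```

With these inputs the period comes out close to a quarter of a tesla, far larger than the millitesla-scale period expected for Aharonov-Bohm oscillations in a dot of this size.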
The remarkable quantization of the Hall resistance, observed in 2-D electron gas systems (von Klitzing et al., 1980), is understood to result from the formation of well-defined edge states, which become pinned within a few magnetic lengths of the sample boundaries at high magnetic fields (Buttiker,
MAGNETO-TRANSPORT AS A PROBE OF ELECTRON DYNAMICS
FIGURE 41. The periodic nature of the oscillations shown in Fig. 40 is confirmed by a plot of their index as a function of magnetic field. [Axis label recovered from the plot: oscillation index.]
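The periodicity test summarized in Fig. 41 amounts to a straight-line fit of peak position against oscillation index, and its slope can be compared with the period predicted by Eq. (10). The sketch below does both under stated assumptions: the GaAs effective mass (0.067 free-electron masses) is assumed, the capacitance is taken as 0.73 fF (the power of ten is an assumption, chosen because it reproduces the quoted 0.25 T period), and the peak positions are synthetic rather than measured.

```python
HBAR = 1.054571817e-34         # reduced Planck constant (J s)
E_CHARGE = 1.602176634e-19     # elementary charge (C)
M_ELECTRON = 9.1093837015e-31  # free-electron mass (kg)
M_EFF_GAAS = 0.067 * M_ELECTRON  # ASSUMPTION: GaAs effective mass


def coulomb_period(capacitance_farad, m_eff=M_EFF_GAAS):
    """Eq. (10): Delta B = 2 m* e / (hbar C), in tesla."""
    return 2.0 * m_eff * E_CHARGE / (HBAR * capacitance_farad)


def fitted_period(peak_fields):
    """Least-squares slope of peak field versus oscillation index (T per index)."""
    n = len(peak_fields)
    if n < 2:
        raise ValueError("need at least two peaks to fit a period")
    mean_idx = (n - 1) / 2.0
    mean_b = sum(peak_fields) / n
    num = sum((i - mean_idx) * (b - mean_b) for i, b in enumerate(peak_fields))
    den = sum((i - mean_idx) ** 2 for i in range(n))
    return num / den


if __name__ == "__main__":
    # HYPOTHETICAL peak positions, evenly spaced for illustration only.
    peaks = [5.0 + 0.25 * i for i in range(16)]
    print(f"predicted period: {coulomb_period(0.73e-15):.3f} T")
    print(f"fitted period:    {fitted_period(peaks):.3f} T")
```

A linear index plot, as in Fig. 41, corresponds to a well-defined slope in this fit; curvature in real data would show up as systematic residuals.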
1988). Edge states located at opposite edges of the sample propagate in opposite directions and this spatial separation of current-carrying states is responsible for a strong suppression of electron backscattering. In particular, at sufficiently high magnetic fields, the edge states propagate as independent, 1-D channels and the quantization of the Hall resistance is simply determined by the number of occupied Landau levels. The situation is very different in quantum dots, however, in which the mesoscopic geometry greatly enhances the interaction between the opposite edge states (Glazman and Jonson, 1989; Kirczenow, 1994). It is therefore expected that adiabatic edge state transport may break down in these structures, as the dot geometry is tuned by means of changes to the applied gate voltage (Stopa et al., 1996; Bird et al., 1997e). In Fig. 42, we show the magneto-resistance of a split-gate dot at a number of different gate voltages. With the gates grounded at the drain potential, no dot is formed and the quantum Hall effect is clearly observed (Fig. 42, expanded section). With a negative bias applied to the gates, however, the
FIGURE 42. The evolution of the magneto-resistance of a 1-µm dot at 0.01 K, as the voltage applied to its split gates is varied. The upper curves are offset by 10 kΩ and 20 kΩ, respectively, while the lowest curve was obtained with the gates grounded and exhibits well-quantized plateaus (see expanded section).
initial effect is to shift the Hall plateaus to lower magnetic fields, indicative of selective edge state confinement in the dot. In particular, when interedge state scattering can be neglected, the conductance of the dot should be given as (Büttiker, 1988)
where N is the total number of spin-degenerate edge states, R is the number of these trapped in the dot and T is the number transmitted through it. As the negative gate bias is further increased, however, it is expected that interedge state scattering should become more likely and that significant departures from Eq. (11) should thus be observed. This is indeed found to be the case in experiment, where a series of highly resonant peaks emerge in the magneto-resistance as the gates are more strongly biased (Figs. 42 and 43). The peaks imply a resonant backscattering of edge states at certain magnetic fields ($R(B) \to 1$), which careful studies have shown to be correlated to the depopulation of Landau levels within the dot (Bird et al., 1997e). This depopulation is accompanied by a swelling of the remaining edge states in the dot, and we have argued that the backscattering arises in a process in which initially transmitted edge states tunnel into their oppositely propagating counterparts, via trapped edge states in the dot (Fig. 35) (Stopa et al., 1996). While the swelling effect mentioned here is found to be most dramatic for the innermost edge states, the large separation of these from the transmitted edge states ensures that they are ineffective as a backscattering path. Instead, the resonant reflection is thought to be dominated by the outermost trapped edge state, whose width grows steadily with magnetic field but whose separation from the transmitted edge states simultaneously increases. Consequently, the magneto-resistance is found to be strongly peaked in the narrow range of magnetic field close to the depopulation event (Bird et al., 1997e). The amplitude of the resonances is suppressed with increasing temperature, until little evidence for their existence may be resolved at liquid helium temperatures (Fig. 44).
Such sensitivity is suggestive of a phase coherent effect and we have therefore modeled edge state transport using a quantum mechanical approach, in which the backscattering probability is computed by considering the phase interference between a number of different edge state areas (Fig. 45) (Kirczenow, 1994). These areas may be computed realistically, by considering the evolution of the dot profile with magnetic field, and by assuming that interedge state scattering arises predominantly in the quantum point contact leads (Stopa et al., 1996). Curvature of the dot profile should be strongest in these regions, allowing for enhanced edge state
FIGURE 43. Growth of giant-backscattering resonances with increasing gate bias, measured at 0.01 K in a 0.4-µm (left) and a 1.0-µm dot. [Horizontal axes: magnetic field (tesla).]
FIGURE 44. The giant resonances are also suppressed with increasing temperature. The results shown here were obtained in a 1-µm split-gate dot. [Horizontal axis: magnetic field (tesla).]
coupling (Glazman and Jonson, 1989), a notion that seems quite consistent with experiment in which the resonances grow as the point contacts are narrowed (Fig. 43). The edge states are also wider in these regions, as can be seen from Fig. 45, in which we plot the convolution of the density of states with the derivative of the Fermi function. In that the strength of the scattering should depend on the local density of states in the destination channel, this figure gives further evidence that the coupling between Landau levels should be strongest in the point contact regions. In Fig. 46, we compare the results of numerical simulations of the edge state transport with an experimental magneto-resistance trace (Bird et al., 1997e). The lower curve is obtained by assuming a fixed dot profile as a function of magnetic field, and it is clear that this is unable to account for the resistance variation seen in experiment. The middle curve, on the other hand, is obtained by computing the self-consistent evolution of the dot profile with magnetic field, and corresponds very closely to the behavior found in experiment. In particular, both curves are seen to be peaked in the
FIGURE 45. Convolution of the density of states with the derivative of the Fermi function in a 1-µm gated dot at 2.9 T (only the two lowest Landau levels are shown). In this figure, only the upper left corner of the dot is shown, in the region near the input quantum point contact (see Fig. 2 for the dot geometry) (Bird et al., 1997e). (See also Plate 9.)
magnetic field range over which the depopulation occurs, and exhibit fine structure with similar field scales. This latter observation is thought to provide support for the origin of these resonances as an interference effect involving different edge state areas (Fig. 35) (Stopa et al., 1996; Bird et al., 1997e).

C. Time-Dependent Magneto-Transport

1. Level Transitions of Artificial Atoms
The analogy of quantum dots to artificial atoms is particularly clear at high magnetic fields, where edge states trapped within the dot may be thought of as analogous to atomic levels (Fig. 47). For transitions to occur between these levels, electrons must tunnel across the incompressible gaps that
FIGURE 46. Main figure: Comparison of the magneto-resistance measured in a 1-µm dot at 0.01 K (top), with the results of calculations that incorporate a magnetic-field-evolving (middle) and a magnetic-field-independent (bottom) quantum dot profile (Bird et al., 1997e). Inset: Comparison of the Aharonov-Bohm oscillations, computed numerically and measured in experiment at 0.01 K. [Horizontal axis: magnetic field (tesla).]
separate the edge states, and at sufficiently high magnetic fields the likelihood of this tunneling should be very small. One phenomenon that may be observed in closed quantum dots at high fields is therefore metastable switching of the conductance between a number of discrete values (Fig. 48) (van der Vaart et al., 1994a; Bird et al., 1994b). The switching is thought to result as single electrons tunnel between different edge states of the dot, while the relatively long time between such events, of order several minutes, is thought to reflect the large edge state separation. Indeed, in another study it was shown that the time between switching events increases at higher
FIGURE 47. The edge state structure in a quantum dot at high magnetic fields suggests an analogy to the level structure of atoms. In this case, the red regions correspond to compressible electron gas and the calculation is performed for the gate geometry shown. (See also Plate 10.)
fields, where the edge state separation is thought to be similarly increased (van der Vaart et al., 1997).
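The time between switching events discussed above can be extracted from a recorded trace by thresholding it into two states and measuring how long each state persists. The sketch below is a minimal illustration of that bookkeeping, not the analysis used in the cited studies; the threshold, the sample values and the function names are hypothetical, and a real trace with more than two discrete levels would need a more careful classification.

```python
def dwell_times(times, values, threshold):
    """Split a two-level trace into dwell times for the low and high states.

    The first segment is measured from the start of the record, and the
    last, unfinished segment is dropped because its end is not observed.
    """
    states = [v > threshold for v in values]
    dwells = {"low": [], "high": []}
    start = times[0]
    for k in range(1, len(states)):
        if states[k] != states[k - 1]:
            key = "high" if states[k - 1] else "low"
            dwells[key].append(times[k] - start)
            start = times[k]
    return dwells


def mean(xs):
    """Arithmetic mean, or NaN for an empty list."""
    return sum(xs) / len(xs) if xs else float("nan")


if __name__ == "__main__":
    # HYPOTHETICAL trace: time in minutes, signal in arbitrary units.
    t = list(range(10))
    r = [400, 400, 400, 450, 450, 400, 400, 400, 400, 450]
    d = dwell_times(t, r, threshold=425)
    print("low-state dwells: ", d["low"], "mean", mean(d["low"]))
    print("high-state dwells:", d["high"], "mean", mean(d["high"]))
```

Repeating such an analysis at several magnetic fields would give the field dependence of the mean dwell time that the text attributes to the growing edge state separation.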
2. Zero Current Voltage Fluctuations

An interesting phenomenon that is observed in poorer quality devices, whose resistance may exhibit time-dependent drift under fixed gate voltage conditions, is zero-current voltage noise (Ishibashi et al., 1993; Bird et al., 1995e). The origin of this noise, and of the time-dependent drift of the resistance, is thought to be a slow movement of ionization in the AlGaAs donor layer (Stopa, 1996), which in turn should yield a time-dependent perturbation to the confining profile of the dot. An example of the zero-current voltage noise is shown in Fig. 49, in which, with the dot gates grounded and no current flowing, varying the magnetic field has no effect on the measured noise level. With the quantum dot formed, however, a
FIGURE 48. In the edge state regime, switching noise observed in the magneto-resistance of closed quantum dots is thought to arise from electron tunneling between discrete edge states within the dot. In this regard, the switching noise may be considered as arising from level transitions in an artificial atom! Results shown here were obtained in a 2-µm split-gate dot at a temperature of 0.01 K. [Horizontal axes: time (minutes) and time (hours).]
FIGURE 49. Zero-current voltage noise, measured in a 1-µm split-gate dot at 0.01 K. In this case, the current leads were disconnected at the top of the cryostat and the voltage across the sample was measured with a lock-in amplifier. The upper curve therefore represents the noise level in the experimental setup. [Horizontal axis: magnetic field (tesla).]
dramatic enhancement of the noise level is observed for certain ranges of magnetic field, which subsequent measurements reveal to be related to the confinement of edge states in the dot. The voltage noise persists with unaltered characteristics when the magnetic field is held constant, and its amplitude is also found to be unaffected by the application of a measurement current (Fig. 50). The mesoscopic origin of the noise is suggested by temperature dependent studies, which reveal that it may be quenched on raising the temperature to around a degree Kelvin (Fig. 51). As voltage noise is only observed over magnetic field ranges where one or more edge states are trapped in the dot, this suggests that these trapped modes must somehow be able to modify the electrochemical potential of the transmitted edge states. To appreciate how this modification might arise, we first consider the situation in which all occupied edge states are transmitted through the dot. With the contacts of the Hall bar floating, all edge states will fill to the same potential and the measured voltage across the dot will, therefore, be zero (Buttiker, 1988). If we now allow the size of the dot to vary by some small amount, the edge states will move so as to remain
FIGURE 50. With a measurement current applied, the voltage noise level was found to be constant, while the corresponding resistance fluctuation followed ΔR = ΔV/I. This indicates conclusively that the origin of the noise is a voltage fluctuation. Measurements shown are from a 1-µm split-gate dot at 0.01 K. [Horizontal axis: current.]
pinned to the walls but, as this process does not change their chemical potential, the voltage measured across the dot will remain fixed at zero. Now consider what happens if we allow the dot size to vary while one or more edge states is trapped in the dot. The transmitted edge states will once more follow the movement of the dot, while the confined ones will remain pinned at its center. This latter property is a consequence of flux quantization, which requires the trapped edge states to enclose a fixed magnetic flux when their electron occupation does not change (Stopa et al., 1994). Consequently, any time dependent perturbations to the dot geometry should give rise to a variable capacitive coupling between the transmitted and the confined edge states, and it is this effect which is believed to be responsible for the zero-current voltage noise (Fig. 52) (Bird et al., 1995e).
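To make the capacitive picture concrete, the sketch below works through a deliberately simple toy version of it. It is an illustrative assumption, not the model of the cited work: a fixed trapped charge is taken to induce a potential Q/C on each of two transmitted edge states, so that any asymmetry between the two capacitances appears as a voltage difference. The capacitance values and all function names are hypothetical. The second function restates the relation ΔR = ΔV/I from the caption of Fig. 50.

```python
E_CHARGE = 1.602176634e-19  # elementary charge (C)


def voltage_difference(charge_c, cap_a_farad, cap_b_farad):
    """Toy estimate: potential difference Q/C_a - Q/C_b between two edges."""
    return charge_c / cap_a_farad - charge_c / cap_b_farad


def resistance_fluctuation(delta_v_volt, current_amp):
    """Apparent resistance fluctuation for a fixed voltage fluctuation."""
    return delta_v_volt / current_amp


if __name__ == "__main__":
    # HYPOTHETICAL values: one electron charge and a 1% capacitance asymmetry.
    dv = voltage_difference(E_CHARGE, 1.00e-16, 1.01e-16)
    print(f"voltage difference: {dv * 1e6:.2f} microvolts")
    # A constant delta V reads as a smaller delta R at larger current.
    for current in (1e-10, 1e-9, 1e-8):
        print(f"I = {current:.0e} A: delta R = {resistance_fluctuation(dv, current):.3g} ohm")
```

In this toy picture a perfectly symmetric coupling gives no voltage, and a slow drift of either capacitance produces a slowly varying voltage even with no current flowing.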
VI. CONCLUDING DISCUSSION

We have discussed here the use of magneto-transport studies to probe electron dynamics in open quantum dots, which are quasi-zero-dimensional
FIGURE 51. The mesoscopic origin of the voltage fluctuations is suggested by the fact that they may be quenched with increasing temperature. Measurements shown here are from a 1-µm split-gate dot. [Horizontal axis: magnetic field (tesla).]
devices in which electrical current flow is confined on length scales comparable to the size of the electron itself. The transmission properties of these structures are strongly regulated by means of their quantum mechanical lead openings, which inject electrons into the dot in a highly collimated beam. This beam couples favorably to only a small set of states within the dot and, at temperatures where electron phase coherence is maintained over long distances, interference of these states becomes the dominant process in determining the resulting electrical behavior. A powerful tool for probing the interference in experiment is provided by the application of a weak magnetic field, which shifts the phase of the electron wavefunction and sweeps successive dot states past the Fermi surface. The resulting fluctuations in the local density of states are thought to be reflected directly in the magneto-conductance of the dot, which exhibits a series of regular oscillations at low temperatures. These in turn are consistent with the notion that electron transport through these structures is dominated by the selective
FIGURE 52. Proposed model for the origin of the voltage noise, in a dot whose precise confining profile varies with time. A confined edge state at the center of the dot is assumed to contain a fixed charge Q, which couples to the transmitted edge states via time-dependent capacitances C,,(t).
excitation of a small number of discrete dot states. Furthermore, the simple periodicity of these oscillations implies that the proper semiclassical description of electron transport in these structures is one in which a small number of discrete orbits are predominantly excited during transport. The orbits themselves appear highly stable, a characteristic suggested by the results of experiment in which the frequency content of the fluctuations is studied as a function of gate voltage. Previous treatments of electron transport in open quantum dots have started from an assumption of ergodicity, according to which electrons are considered to scatter chaotically from the confining walls of the dot. The quantum mechanical nature of the lead openings is neglected in these treatments, and the discrete quantization within the dot is also assumed to be obscured by lifetime broadening effects. Clearly, such approaches are quite inconsistent with the results presented here, which reveal a highly nonergodic nature to electron transport in these dots. This conclusion is supported by the results of numerical studies, which reveal the wavefunction within the dots to be scarred by the remnants of a small number of semiclassical orbits. The details of this scarring are not independent of magnetic field but instead recur periodically when this parameter is varied,
with a frequency that corresponds very closely to that of the conductance oscillations seen in experiment. The implication is therefore that chaotic scattering is suppressed in these open dots, in which the transport behavior is instead dominated by a small number of regular orbits. These in turn are thought to be stabilized by the role of the quantum point contact leads, and by the discrete quantization within the cavity itself. Support for these notions is provided by the results of calculations performed for isolated dots, in which the collimation effect is absent and in which current flow occurs by tunneling. In such weakly coupled dots, the wavefunction typically exhibits the more uniform sampling of phase space that is expected for chaotic dynamics (Akis et al., 1996a; Stopa, 1998). The conductance fluctuations are exponentially suppressed with increasing temperature and simulations suggest this behavior is correlated to a simultaneous disruption of scarring, which occurs as the phase coherent lifetime of the electrons shortens at higher temperatures. A useful parameter for characterizing this disruption is provided by the phase-breaking time, which may be thought of as the average time scale over which the wave-like nature of the electrons is preserved. In temperature-dependent studies of this parameter, its value is found to saturate at low temperatures and we have suggested that this behavior results from a crossover to zero-dimensional phase breaking, which sets in once the discrete levels of the dot become thermally resolved. In other experiments, the influence of environmental coupling on the phase-breaking behavior has been demonstrated, although the manner in which this coupling modifies phase coherence remains poorly understood. Another unresolved issue here is the origin of large variations in phase coherence, seen from one nominally similar device to another.
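The exponential suppression of the fluctuation amplitude with temperature can be quantified by a straight-line fit to the logarithm of the amplitude. The sketch below assumes the specific form A(T) = A0 exp(-T/T0), which is one simple reading of "exponentially suppressed" and is not a functional form given in the text; the data are synthetic and the function name is illustrative.

```python
import math


def fit_exponential_decay(temperatures, amplitudes):
    """Fit A(T) = A0 * exp(-T / T0) by linear least squares on ln(A).

    Returns the pair (A0, T0). Amplitudes must be strictly positive.
    """
    n = len(temperatures)
    logs = [math.log(a) for a in amplitudes]
    mean_t = sum(temperatures) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in temperatures)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(temperatures, logs))
    slope = sxy / sxx
    a0 = math.exp(mean_l - slope * mean_t)
    t0 = -1.0 / slope
    return a0, t0


if __name__ == "__main__":
    # SYNTHETIC data generated with A0 = 2.0 and T0 = 0.5 (arbitrary units).
    temps = [0.1, 0.2, 0.4, 0.8, 1.6]
    amps = [2.0 * math.exp(-t / 0.5) for t in temps]
    a0, t0 = fit_exponential_decay(temps, amps)
    print(f"A0 = {a0:.3f}, T0 = {t0:.3f}")
```

On measured data, a saturation of the amplitude at the lowest temperatures would appear as a departure from the fitted line, in the same spirit as the saturation of the phase-breaking time described above.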
In addition to regular oscillations, the weak field magneto-conductance of open quantum dots may also exhibit a zero-field peak, which has previously been argued to result from the ballistic analog of weak localization. We have presented a very different interpretation of this feature here, according to which it is thought to provide a signature of energy averaging in these dots. An interesting conclusion is that there is no weak localization in these quantum dots, at least not in the sense normally implied in disordered systems. In these latter systems, weak localization essentially provides an additive contribution to the conductance, which arises from a set of backscattered orbits whose importance is rapidly quenched in the presence of a magnetic field. In contrast, we have argued that the transport properties of open quantum dots are related directly to the details of their density of states, which in turn is determined solely by contributions from backscattered orbits. When the magnetic field is increased sufficiently, the formation of well-defined Landau levels results in current flow at the Fermi surface being
carried by a finite number of edge states. These may be selectively confined in open quantum dots, in which the mesoscopic geometry also enhances the interaction between the different edge states. A striking observation in this regime is a resonant breakdown of the quantum Hall effect, which is correlated to the depopulation of Landau levels in the dot. According to the results of numerical simulations, in which the self-consistent evolution of the quantum dot profile with magnetic field is properly accounted for, this breakdown results from a sudden increase in backscattering via trapped edge states, whose widths swell significantly as a Landau level depopulates and charge is redistributed within the dot. In this regard, the resonances may be considered as resulting from van Hove-like singularities in the coupling between different Landau levels. In conclusion, we once again emphasize the powerful role that magneto-transport studies may play in the characterization of mesoscopic devices. Recently, much interest has been focused on the potential application of quantum dots in novel technologies such as quantum computing and ultrahigh frequency signal processing. Applications such as these offer the possibility of a genuine paradigm shift in microelectronics, which in turn is expected to derive from many of the fundamental phenomena revealed by the studies presented here.
ACKNOWLEDGMENTS

In the course of the work presented here, the authors have benefited from invaluable interactions with a number of individuals, including: Y. Aoyagi; J. R. Barker; K. F. Berggren; K. M. Connolly; J. Cooper; L. Eaves; C. Ford; T. M. Fromhold; H. L. Grubin; H. Hofmann; N. Holmberg; K. Ishibashi; S. Komiyama; M. Keller; H. Linke; P. Main; C. M. Marcus; A. P. Micolich; K. Nakamura; R. Newbury; Y. Ochiai; Y. Okubo; D. M. Olatona; P. Omling; D. P. Pivin, Jr.; T. Sugano; R. P. Taylor; D. Vasileska; and R. Wirtz. The work presented here was supported in part by The Institute for Physical and Chemical Research, Japan (RIKEN); The Office of Naval Research (ONR); and The Defense Advanced Research Projects Agency (DARPA).
REFERENCES

Akis, R., Ferry, D. K., and Bird, J. P. (1996a). Phys. Rev., B 54: 17705.
Akis, R., Bird, J. P., and Ferry, D. K. (1996b). J. Phys. Condens. Matter, 8: L667.
Akis, R., Ferry, D. K., and Bird, J. P. (1997a). Phys. Rev. Lett., 79: 123.
Akis, R., Ferry, D. K., and Bird, J. P. (1997b). Jpn. J. Appl. Phys., 36: 3981.
Akis, R., Vasileska, D., Ferry, D. K., and Bird, J. P. (1998). Submitted for publication.
Altland, A. and Gefen, Y. (1995). Phys. Rev., B 51: 10671.
Altshuler, B. L., Aronov, A. G., and Khmelnitsky, D. E. (1982). J. Phys., C 15: 7367.
Ashoori, R. C., Stormer, H. L., Weiner, J. S., Pfeiffer, L. N., Pearton, S. J., Baldwin, K. W., and West, K. W. (1992). Phys. Rev. Lett., 68: 3088.
Baranger, H. U., Jalabert, R. A., and Stone, A. D. (1993). Phys. Rev. Lett., 70: 3876.
Beenakker, C. W. J. and van Houten, H. (1988). Phys. Rev., B 37: 6544.
Beenakker, C. W. J. (1990). Phys. Rev. Lett., 64: 216.
Beenakker, C. W. J., van Houten, H., and Staring, A. A. M. (1991). Phys. Rev., B 44: 1657.
Berggren, K. F., Ji, Z. L., and Lundberg, T. (1996). Phys. Rev., B 54: 11612.
Bergmann, G. (1983). Phys. Rev., B 28: 2914.
Bergmann, G. (1984). Phys. Rep., 107: 1.
Berry, M. V. and Tabor, M. (1976). Proc. Roy. Soc. (London), A 349: 101.
Berry, M. V. and Tabor, M. (1977). J. Phys., A 10: 371.
Berry, M. V. (1984). In The Wave-Particle Dualism. S. Diner et al., eds., Dordrecht: Riedel.
Berry, M. J., Katine, J. A., Marcus, C. M., Westervelt, R. M., and Gossard, A. C. (1994a). Surf. Sci., 305: 495.
Berry, M. J., Baskey, J. H., Westervelt, R. M., and Gossard, A. C. (1994b). Phys. Rev., B 50: 8857.
Bird, J. P., Grassie, A. D. C., Lakrimi, M., Hutchings, K. M., Harris, J. J., and Foxon, C. T. (1990). J. Phys. Condens. Matter, 2: 7847.
Bird, J. P., Grassie, A. D. C., Lakrimi, M., Hutchings, K. M., Meeson, P., Harris, J. J., and Foxon, C. T. (1991). J. Phys. Condens. Matter, 3: 2897.
Bird, J. P., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1994a). Phys. Rev., B 50: 18678.
Bird, J. P., Ishibashi, K., Stopa, M., Aoyagi, Y., and Sugano, T. (1994b). Phys. Rev., B 50: 14983.
Bird, J. P., Ishibashi, K., Stopa, M., Taylor, R. P., Aoyagi, Y., and Sugano, T. (1994c). Phys. Rev., B 49: 11488.
Bird, J. P., Ishibashi, K., Ferry, D. K., Ochiai, Y., Aoyagi, Y., and Sugano, T. (1995a). Phys. Rev., B 51: R18037.
Bird, J. P., Olatona, D. M., Newbury, R., Taylor, R. P., Ishibashi, K., Stopa, M., Aoyagi, Y., Sugano, T., and Ochiai, Y. (1995b). Phys. Rev., B 52: R14336.
Bird, J. P., Ishibashi, K., Ferry, D. K., Aoyagi, Y., Sugano, T., and Ochiai, Y. (1995c). Phys. Rev., B 52: 8295.
Bird, J. P., Ishibashi, K., Ochiai, Y., Lakrimi, M., Grassie, A. D. C., Hutchings, K. M., Aoyagi, Y., and Sugano, T. (1995d). Phys. Rev., B 52: 1793.
Bird, J. P., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1995e). J. Phys. Soc. Jpn., 10: 3618.
Bird, J. P., Ferry, D. K., Akis, R., Ishibashi, K., Aoyagi, Y., Sugano, T., and Ochiai, Y. (1996a). Europhys. Lett., 35: 529.
Bird, J. P., Ferry, D. K., Edwards, G., Olatona, D. M., Newbury, R., Taylor, R. P., Ishibashi, K., Aoyagi, Y., Sugano, T., and Ochiai, Y. (1996b). Physica, B 227: 148.
Bird, J. P., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1996c). Phys. Rev., B 53: 3642.
Bird, J. P., Akis, R., Ferry, D. K., Pivin, Jr., D. M., Connolly, K. M., Taylor, R. P., Newbury, R., Olatona, D. M., Ochiai, Y., Okubo, Y., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1997a). Chaos, Solitons and Fractals, 8: 1299.
Bird, J. P., Akis, R., Ferry, D. K., Aoyagi, Y., and Sugano, T. (1997b). J. Phys. Condens. Matter, 9: 5935.
Bird, J. P., Linke, H., Cooper, J., Micolich, A. P., Ferry, D. K., Akis, R., Ochiai, Y., Taylor, R. P., Newbury, R., Omling, P., Aoyagi, Y., and Sugano, T. (1997c). Phys. Stat. Sol. (b), 204: 314.
Bird, J. P., Stopa, M., Taylor, R. P., Newbury, R., Aoyagi, Y., and Stopa, M. (1997d). Superlatt. Microstruct., 22: 57.
Bird, J. P., Stopa, M., Connolly, K., Pivin, Jr., D. M., Ferry, D. K., Aoyagi, Y., and Sugano, T. (1997e). Phys. Rev., B 56: 7477.
Bird, J. P., Micolich, A. P., Linke, H., Ferry, D. K., Akis, R., Ochiai, Y., Aoyagi, Y., and Sugano, T. (1998a). J. Phys. Condens. Matter, 10: L55.
Bird, J. P., Akis, R., Ferry, D. K., Cooper, J., Ishibashi, K., Ochiai, Y., Aoyagi, Y., and Sugano, T. (1998b). Semicond. Sci. Tech., 13: A4.
Bøggild, P., Kristensen, A., Bruus, H., Reimann, S. M., and Lindelof, P. E. (1998). Phys. Rev., B 57: 15408.
Brack, M. and Bhaduri, R. K. (1997). Semiclassical Physics. Reading, MA: Addison-Wesley.
Brown, C. V., Geim, A. K., Foster, T. J., Langerak, C. J. G. M., and Main, P. C. (1993). Phys. Rev., B 47: 10935.
Büttiker, M., Imry, Y., Landauer, R., and Pinhas, S. (1985). Phys. Rev., B 31: 6207.
Büttiker, M. (1988). Phys. Rev., B 33: 3020.
Büttiker, M. (1992). In Semiconductors and Semimetals, Volume 35, M. Reed, ed., pp. 191-277, New York: Academic Press.
Casati, G. and Chirikov, B. (eds.). (1995). Quantum Chaos. Cambridge: Cambridge University Press.
Chan, I. H., Clarke, R. M., Marcus, C. M., Campman, K., and Gossard, A. C. (1995). Phys. Rev. Lett., 74: 3876.
Chang, A. M., Timp, G., Chang, T. Y., Cunningham, J. E., Chelluri, B., Mankiewich, P. M., Behringer, R. E., and Howard, R. E. (1988). Surf. Sci., 196: 46.
Chang, A. M. (1990). Solid State Comm., 74: 871.
Chang, A. M., Baranger, H. U., Pfeiffer, L. N., and West, K. W. (1994). Phys. Rev. Lett., 73: 2111.
Chklovskii, D. B., Shklovskii, B. I., and Glazman, L. I. (1992). Phys. Rev., B 46: 4026.
Clarke, R. M., Chan, I. H., Marcus, C. M., Duruoz, C. I., Harris, Jr., J. S., Campman, K., and Gossard, A. C. (1995). Phys. Rev., B 52: 2656.
Ferry, D. K., Edwards, G., Ochiai, Y., Yamamoto, K., Bird, J. P., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1995). Jpn. J. Appl. Phys., 34: 4338.
Ferry, D. K. and Goodnick, S. M. (1997). Transport in Nanostructures. Cambridge: Cambridge University Press.
Ferry, D. K., Bird, J. P., Akis, R., Pivin, D. P. Jr., Connolly, K. M., Ishibashi, K., Aoyagi, Y., Sugano, T., and Ochiai, Y. (1997). Jpn. J. Appl. Phys., 36: 3944.
Ferry, D. K., Akis, R., and Bird, J. P. (1998). Superlatt. Microstruct., 23: 611.
Fromhold, T. M., Wilkinson, P. B., Sheard, F. W., Eaves, L., Miao, J., and Edwards, G. (1995). Phys. Rev. Lett., 75: 1142.
Fukuyama, H. and Abrahams, E. (1983). Phys. Rev., B 27: 5976.
Geim, A. K., Main, P. C., Beton, P. H., Eaves, L., Beaumont, S. P., and Wilkinson, C. D. W. (1992). Phys. Rev. Lett., 69: 1248.
Glazman, L. I. and Jonson, M. (1989). J. Phys. Condens. Matter, 1: 5547.
Grabert, H. and Devoret, M. H. (eds.). (1991). Single Charge Tunneling. Volume 294, NATO Advanced Study Institute, Series B: Physics, New York: Plenum.
Gutzwiller, M. C. (1971). J. Math. Phys., 12: 343.
Gutzwiller, M. C. (1990). Chaos in Classical and Quantum Mechanics. Berlin: Springer-Verlag.
Holmberg, N., Akis, R., Pivin, Jr., D. P., Bird, J. P., and Ferry, D. K. (1998). Semicond. Sci. Tech., 13: A21.
Huibers, A. G., Switkes, M., Marcus, C. M., Campman, K., and Gossard, A. C. (1998). Phys. Rev. Lett., 81: 1917.
Ishibashi, K., Bird, J. P., Stopa, M., Sugano, T., and Aoyagi, Y. (1993). Jpn. J. Appl. Phys., 32: 6246.
Jalabert, R. A., Baranger, H. U., and Stone, A. D. (1990). Phys. Rev. Lett., 65: 2442.
Keller, M. W., Millo, O., Mittal, A., and Prober, D. E. (1994). Surf. Sci., 305: 501.
Keller, M. W., Mittal, A., Sleight, J. W., Wheeler, R. G., Prober, D. E., Sacks, R. N., and Shtrikmann, H. (1996). Phys. Rev., B 53: R1693.
Kirczenow, G. (1994). Phys. Rev., B 50: 1649.
Kurdak, C., Chang, A. M., Chin, A., and Chang, T. Y. (1992). Phys. Rev., B 46: 6846.
Landauer, R. (1957). IBM J. Res. Develop., 1: 223.
Lee, P. A., Stone, A. D., and Fukuyama, H. (1987). Phys. Rev., B 35: 1039.
Linke, H., Bird, J. P., Cooper, J., Omling, P., Aoyagi, Y., and Sugano, T. (1997a). Phys. Rev., B 56: 14397.
Linke, H., Bird, J. P., Cooper, J., Omling, P., Aoyagi, Y., and Sugano, T. (1997b). Phys. Stat. Sol., 204: 318.
Marcus, C. M., Rimberg, A. J., Westervelt, R. M., Hopkins, P. F., and Gossard, A. C. (1992). Phys. Rev. Lett., 69: 506.
Marcus, C. M., Westervelt, R. M., Hopkins, P. F., and Gossard, A. C. (1993a). Chaos, 3: 643.
Marcus, C. M., Westervelt, R. M., Hopkins, P. F., and Gossard, A. C. (1993b). Phys. Rev., B 48: 2460.
Marcus, C. M., Westervelt, R. M., Hopkins, P. F., and Gossard, A. C. (1994). Surf. Sci., 305: 480.
McEuen, P. L., Foxman, E. B., Meirav, U., Kastner, M. A., Meir, Y., and Wingreen, N. S. (1991). Phys. Rev. Lett., 66: 1926.
Milliken, F. P., Washburn, S., Umbach, C. P., Laibowitz, R. B., and Webb, R. A. (1987). Phys. Rev., B 36: 4465.
Nakamura, K. (1993). Quantum Chaos: A New Paradigm of Non-Linear Dynamics. Cambridge: Cambridge University Press.
Nixon, J. A. and Davies, J. H. (1990). Phys. Rev., B 41: 7929.
Okubo, Y., Ochiai, Y., Vasileska, D., Akis, R., Ferry, D. K., Bird, J. P., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1997a). Phys. Lett., A 236: 120.
Okubo, Y., Bird, J. P., Ochiai, Y., Ferry, D. K., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1997b). Phys. Rev., B 54: 1368.
Person, M., Petterson, J., von Sydow, B., Lindelof, P. E., Kristensen, A., and Berggren, K. F. (1995). Phys. Rev., B 52: 8921.
Richter, K., Ullmo, D., and Jalabert, R. A. (1996). Phys. Rev., B 54: R5219.
Sachrajda, A. S., Taylor, R. P., Dharma-Wardana, C., Zawadzki, P., Adams, J. A., and Coleridge, P. T. (1993). Phys. Rev., B 47: 6811.
Simpson, P. J., Mace, D. R., Ford, C. J. B., Zailer, I., Pepper, M., Ritchie, D. A., Frost, J. E. F., Grimshaw, M. P., and Jones, G. A. C. (1993). Appl. Phys. Lett., 63: 3191.
Sivan, U. and Imry, Y. (1988). Phys. Rev. Lett., 61: 1001.
Sivan, U., Imry, Y., and Hartzstein, C. (1989). Phys. Rev., B 39: 1242.
Stopa, M. (1996). Phys. Rev., B 53: 9595.
Stopa, M. (1998). Semicond. Sci. Technol., 13: A55.
Stopa, M., Bird, J. P., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1994). Superlatt. Microstruct., 15: 99.
Stopa, M., Bird, J. P., Ishibashi, K., Aoyagi, Y., and Sugano, T. (1996). Phys. Rev. Lett., 76: 2145.
Takane, Y. (1998). J. Phys. Soc. Jpn., 67: 3003.
Tarucha, S., Austing, D. G., Honda, T., van der Hage, R. J., and Kouwenhoven, L. P. (1996). Phys. Rev. Lett., 71: 3613.
MAGNETO-TRANSPORT AS A PROBE O F ELECTRON DYNAMICS
71
Taylor, R. P., Sachrajda, A. S., Zawadzki, P., Coleridge, P. T., and Adams, J. A. (1992). Phys. Rev. Lett., 69: 1989. Taylor, R. P., Newbury, R., Sachrdjda, A. S., Feng, Y., Coleridge, P. T., Dettmann, C., Zhu, N., Guo, H., Delage, A,, Kelly, P. J., and Wasilewski, 2. (1997). Phys. Rev. Lett., 7 8 1952. Thornton, T. J., Pepper, M., Ahmed, H., Andrews, D., and Davies, G. J. (1986). Phys. Rev. Lett., 5 6 1198. van der Vaart, N. C., de Ruyter van Steveninck, M. P., Kouwenhoven, L. P., Johnson, A. T., Nazarov, Y. V., Harmans, C. J. P. M., and Foxon, C . T. (1994a). Phys. Reu. Lett., 7 3 320. van der Vaart, N. C., de Ruyter van Steveninck, M. P., Harmans, C. J. P. M., and Foxon, C. T. (1994b). Physica B 194-6 1251. van der Vaart, N. C., Godijn, S. F., Nazarov, Y. V., Harmans, C. J. P. M., Mooij, J. E., Molenkamp, L. W., and Foxon, C. T. (1995). Phys. Rev. Lett., 7 4 4702. van der Vaart, N. C., Kouwenhoven, L. P.. de Ruyter van Steveninck, M. P., Nazarov, Y. V., Harmans, C . J. P. M., and Foxon, C. T. (1997). Phys. Rev., B 55: 9746. van Wees, B. J., van Houten, H., Beenakker, C. W. J., Williamson, J. G., Kouwenhoven, L. P., van der Marel, D., and Foxon. C. T. (1989). Phys. Rev. Lett., 6 0 848. van Wees, B. J., Kouwenhoven. L. P., Harmans, C. J. P. M., Williamson, J. G., Timmering, C . E., Broekaart, M. E. I., Foxon, C. T., and Harris, J. J. (1989). Phys. Rev. Lett., 62: 2523. Vasileska, D., Wybourne, M. N., Goodnick, S. M., and Gunther, A. D. (1998). Semicond. Sci. Tech., 1 3 A37. von Klitzing, K., Dorda, G., and Pepper, M. (1980). Phys. Rev. Lett., 45: 494. Wang, Y., Wang, J., and Guo, H. (1993). Phys. Rev., B 4 7 4348. Waugh, F. R., Berry, M. J., Mar, D. J., Westervelt, R. M., Campman, K. L., and Gossard, A. C . (1995). Phys. Rev. Lett., 75: 705. Wharam, D. A,, Thornton, T. J., Newbury, R., Pepper M., Ahmed H., Frost J. E. F., Hasko D. G., and Peacock D. C . (1988a). J . Phys. C 21 L209. Wharam, D. A,, Pepper, M., Ahmed, H. Frost, J. E. F., Hasko, D. G., Peacock, D. 
C., Ritchie, D. A., and Jones, G. A. C . (1988b). J . Phys. C 21: L887. Yacoby A,, Sivan U., Umbach, C . P., and Hong, J. M. (1991). Phys. Rev. Lett., 66: 1938. Yacoby, A,, Heiblum, M., Mahalu, D. and Shtrikman, H. (1995). Phys. Reu. Lett., 74: 4047. Zozoulenko, 1. V., Schuster, R., Berggren, K. F., and Ensslin, K. (1997). Phys. Rev., B 55: R 10209.
External Optical Feedback Effects in Distributed Feedback Semiconductor Lasers
MOHAMMAD F. ALAM¹ and MOHAMMAD A. KARIM²
¹Electro-Optics Program, University of Dayton, Dayton, Ohio 45469-0245, U.S.
²Department of Electrical Engineering, University of Tennessee, Knoxville, Tennessee 37996-2100, U.S.
I. Introduction
II. Distributed Feedback Laser Fundamentals
   A. Physical Structures for Distributed Feedback Lasers
   B. Distributed Feedback Laser Electromagnetics
   C. Oscillation Condition for a Distributed Feedback Laser
   D. General Characteristics of Distributed Feedback Lasers
III. Experimentally Observed Effects
   A. Intensity Fluctuation and Spectral Characteristics
   B. Linewidth Reduction, Broadening, and Chaos
   C. Noise Generation
   D. Regimes of External Feedback
   E. Other Phenomena
IV. Theories on Optical Feedback
   A. Compound Cavity Model
   B. Coherence Collapse
   C. Bistability Under Optical Feedback
   D. Mode Competition Noise
V. External Optical Feedback Sensitivity
   A. Sensitivity to Threshold Gain and Spectrum
   B. Feedback Sensitivity Based on Mode Competition Theory
VI. Conclusion
References
I. INTRODUCTION

Semiconductor lasers are used as coherent light sources in a number of applications, including optical communication systems, optical data storage systems (compact discs (CDs), digital versatile discs (DVDs), etc.), optical measurement systems, and printing. These lasers are inexpensive, lightweight, highly efficient, and can be mass-produced using integrated circuit fabrication technology. They use a forward-biased p-n junction (or diode) to achieve optical gain (Chuang, 1995). The simplest type of semiconductor laser (also called a laser diode) is the Fabry-Perot (FP) type semiconductor
laser. A Fabry-Perot type laser diode uses the reflectivity of the facets (semiconductor-air interfaces) to provide the optical feedback needed to sustain laser oscillation. Distributed feedback (DFB) semiconductor lasers, on the other hand, use an internal Bragg grating within the laser cavity to provide optical feedback. The grating works as a wavelength-selective device to achieve highly stable narrow-linewidth laser operation. Distributed feedback lasers are particularly useful in wavelength-division multiplexed (WDM) optical communication systems, where different DFB lasers transmit optical signals at very closely spaced wavelengths. Due to their present and future technological significance, research activity on DFB lasers has increased manyfold in recent years.

A major problem with semiconductor lasers, both FP and DFB types, is that they are highly sensitive to laser light that re-enters the laser cavity after being reflected by an external reflector. External optical feedback of the laser light usually causes instability in the operation of a laser diode and generates excessive noise in optical communication systems (Petermann, 1988; Lenstra, 1991a, b; Twu et al., 1992; Park et al., 1998). A variety of optical elements, including lenses and fiber endfaces, can be the source of unwanted optical feedback. Rayleigh backscatter from fiber can be another source of optical feedback. Packaged laser diodes may also receive optical feedback due to the external cavity formed between the laser diode chip and a transparent window of the package. Furthermore, in many cases an external cavity integrated with a laser diode is unavoidable in an optoelectronic device or circuit. For these reasons, costly and bulky optical isolators are required in most applications to protect semiconductor lasers from optical feedback-induced noise. However, an isolator can still cause very weak optical feedback to a diode laser.
We will discuss in an exploratory fashion the basic mechanisms of noise generation, methods of analyzing external optical feedback performance, and various factors that contribute to the external optical feedback performance in semiconductor lasers, with special emphasis on DFB lasers. In Section II, we discuss the basic electromagnetic equations for DFB semiconductor lasers. Sections III and IV discuss experimental effects and theoretical investigations, respectively, related to external optical feedback. External optical feedback sensitivity is described in Section V. Section VI summarizes this chapter.

II. DISTRIBUTED FEEDBACK LASER FUNDAMENTALS
In DFB semiconductor lasers, distributed feedback is achieved by periodic longitudinal modulation of either the index of the waveguiding layer, or the net optical gain in the active layer, or both. A DFB laser with only index modulation is called an index-coupled (IC) DFB laser, while a laser with gain modulation is termed a gain-coupled DFB laser. When both index and gain coupling are present, a DFB laser is said to be complex-coupled. However, in the literature, complex-coupled DFB lasers are sometimes termed simply "gain-coupled lasers" (to differentiate them from pure index-coupled lasers) although they contain both index and gain coupling. Consequently, DFB lasers that have only gain coupling and no index coupling are referred to as "pure gain-coupled" lasers.
A. Physical Structures for Distributed Feedback Lasers

A DFB structure can be incorporated into a double-heterostructure (DH) semiconductor laser. In the first DFB semiconductor lasers fabricated, the optical feedback needed for lasing was provided by a corrugated interface between the active layer and an outer p-AlGaAs layer. Fabricating the grating on the active layer created interface recombination centers that increased the threshold current density substantially at high temperatures (Casey and Panish, 1978). Accordingly, it was impossible to operate such lasers around 300 K even at low current densities. This problem was overcome by the separate (optical and carrier) confinement heterostructure (SCH) developed by Aiki et al. (1975). In this structure, the carriers are confined to the active layer while the optical field is confined to a larger region that includes two additional layers of p-AlGaAs. The grating is made on the p-AlGaAs layer to obtain optical feedback. Because the active layer is separated from the corrugated interface, the threshold current density was found to be low enough to operate the laser diode at higher temperatures.

Recent advances in fabrication technology have produced pure gain-coupled lasers (Luo et al., 1990, 1991), mixed-coupled lasers (Luo et al., 1991; Li et al., 1992), and loss-coupled lasers (Tsang et al., 1992a; Bourchert et al., 1993).

B. Distributed Feedback Laser Electromagnetics

An analysis of the electromagnetic fields inside a DFB laser begins with Maxwell's equations. First, the wave equation for a simple semiconductor laser cavity is developed, and then this equation is solved considering the index and/or gain grating or corrugation present inside the cavity. Figure 1 shows the schematic of the model for electromagnetic analysis of a DFB laser with coherent external optical feedback.

FIGURE 1. Schematic of a DFB semiconductor laser with a phase shift and external feedback.

Amplitude reflectivities of the left and right facets are $r_1$ and $r_2$, and the corresponding power reflectivities of the facets are $R_1$ and $R_2$, respectively. It is assumed that the output beam from the right facet of the DFB laser is reflected by an external reflector and the reflected light re-enters the laser cavity. Here $\Gamma$ is the ratio of the feedback power to the output power at the facet, and $\eta$ is the coupling ratio of optical feedback into the active region in the laser cavity. Thus, $\eta\Gamma$ is the effective feedback ratio of the laser. The model for a DFB laser as shown in Fig. 1 consists of two sections with lengths $l_1$ and $l_2$, respectively. The regions $-l_1 < z < 0$ and $0 < z < l_2$ are denoted region 1 and region 2, respectively, for convenience. For either region, the index of refraction $n(z)$ is assumed to vary along the (axial) z-direction as
where $\bar{n}$ is the average refractive index over the z-direction and $\Delta n$ is its amplitude variation. The corrugation is assumed to form a Bragg grating with a spatial period $\Lambda$. Thus, the Bragg wave number is $\beta_B = \pi/\Lambda$. The initial phase of the corrugation at the plane $z = 0$ is assumed to be $\phi$, and the specific values of $\phi$ are $\phi_1$ and $\phi_2$ in regions 1 and 2, respectively. A DFB laser with a quarter-wave or $\lambda/4$ phase shift has $\phi_1 = \phi_2 = 0.5\pi$. Due to gain variation in the gain grating, the susceptibility $\chi(z)$ varies according to

where $\bar{\chi}$ is the average susceptibility over the z-direction and $\Delta\chi$ is its amplitude variation; $\psi$ denotes the phase difference between the index and the gain grating. The quantities $n^2(z)$ and $\chi(z)$ as used in Eqs. (1) and (2) are interrelated by the well-known relationships

$$\vec{D} = \epsilon_0 n^2 \vec{E} + \vec{P} \tag{3}$$

and

$$\vec{P} = \epsilon_0 \chi \vec{E} \tag{4}$$

where $\vec{D}$ is the electric flux density vector, $\vec{E}$ is the electric field vector, $\vec{P}$ is the electric polarization vector, and $\epsilon_0$ is the permittivity of free space. Using Eqs. (1)-(4) in Maxwell's equations, we get the wave equation for a DFB semiconductor laser to be

$$\nabla^2 E - \mu\sigma \frac{\partial E}{\partial t} - \mu\epsilon_0 n^2 \frac{\partial^2 E}{\partial t^2} - \mu\epsilon_0 \chi \frac{\partial^2 E}{\partial t^2} = 0 \tag{5}$$

where $\mu$ is the magnetic permeability and $\sigma$ is the conductivity of the material. To find a solution to Eq. (5), we may assume a solution of the form

$$E = \sum_N \left[ A_N(z) F_N(x, y)\, e^{j\omega_N t - j\beta_B z} + B_N(z) F_N(x, y)\, e^{j\omega_N t + j\beta_B z} + \text{c.c.} \right] \tag{6}$$

where $N$ denotes a mode, either an internal or an external cavity mode, and $A_N$ and $B_N$ are the amplitudes of the forward and backward components of mode $N$, respectively. $F_N(x, y)$ is the normalized transverse component of the field distribution of mode $N$. The propagation constant of mode $N$, denoted by $\beta_N$, satisfies the wave equation for a DFB laser in an unperturbed medium without any loss, corrugation, or external feedback, given by

$$\left( \nabla^2 + \mu\epsilon_0 \bar{n}^2 \omega_N^2 \right) F_N(x, y)\, e^{-j\beta_N z} = 0. \tag{7}$$

The angular lasing frequency of the mode $N$ is given by

$$\omega_N = 2\pi f_N = \frac{\beta_N}{\bar{n}\sqrt{\mu\epsilon_0}}. \tag{8}$$

By substituting $E$ from Eq. (6) into Eq. (5), and replacing $n^2$ and $\chi$ in Eq. (5) using Eqs. (1) and (2), respectively, we arrive at an equation involving space dependence in the x, y, and z directions, and also time dependence. In that equation, the variation of the field amplitudes is rather small in the z-direction. As a result, the second-order derivatives with respect to $z$ may be neglected. We also take the time average over a period $\Delta T = 2\pi/\omega_N$ and integrate with respect to $x$ and $y$ from $-\infty$ to $+\infty$ to obtain a transverse spatial average. In addition, we choose $F_N(x, y)$ such that $\iint_{-\infty}^{\infty} |F_N(x, y)|^2\, dx\, dy = 1$. Also, with the rotating wave approximation (RWA), some other terms disappear. After these approximations, we arrive at a pair of coupled wave equations (Kogelnik and Shank, 1972):

where $\delta\beta_N$ is the deviation of the wave number of mode $N$ from the Bragg wave number, and $g$ and $\alpha$ are the gain and loss coefficients for the traveling wave, given by

and $\kappa_i$ and $\kappa_g$ are the coupling coefficients for the index and gain coupling, respectively. The quantities $\delta\beta_N$, $\kappa_i$, and $\kappa_g$ are given by

Thus, the angular lasing frequency can be written as

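By differentiating the coupled wave equations (Eqs. (9) and (10)) with respect to $z$, we obtain

$$\frac{d^2 A_N(z)}{dz^2} = \left[ \left( \frac{g - \alpha}{2} - j\delta\beta_N \right)^2 + \left( \kappa_i + j\kappa_g e^{j\psi} \right)\left( \kappa_i + j\kappa_g e^{-j\psi} \right) \right] A_N(z) \tag{16}$$

$$\frac{d^2 B_N(z)}{dz^2} = \left[ \left( \frac{g - \alpha}{2} - j\delta\beta_N \right)^2 + \left( \kappa_i + j\kappa_g e^{j\psi} \right)\left( \kappa_i + j\kappa_g e^{-j\psi} \right) \right] B_N(z) \tag{17}$$

Equations (16) and (17) can be written as

provided
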
The phase angle $\psi$ between the gain and index corrugations is usually 0 (in-phase) or $\pi$ (antiphase). Assuming that $\psi$ is either zero or $\pi$, the contributions of both gain and index coupling can be expressed by a single complex coupling factor $\kappa$, where $\kappa = \kappa_i + j\kappa_g$. The solutions of Eqs. (18)-(19) can be written as

$$A_N(z) = a_1 e^{\gamma_N z} + a_2 e^{-\gamma_N z} \tag{21}$$

$$B_N(z) = b_1 e^{\gamma_N z} + b_2 e^{-\gamma_N z} \tag{22}$$

where $a_{1,2}$ and $b_{1,2}$ are constants to be evaluated from the boundary conditions for each section. The boundary conditions for the two sections of the laser are utilized to find the different solutions $A_{N,1}(z)$ and $B_{N,1}(z)$ in section 1, and $A_{N,2}(z)$ and $B_{N,2}(z)$ in section 2. Due to the external feedback, the modified reflectivity of the right facet becomes (Favre, 1987)

$$r_2' = r_2 + (1 - R_2)\sqrt{\eta\Gamma}\, e^{-j\theta_{ext}} \tag{23}$$

where $\theta_{ext} = 4\pi d_{ext}/\lambda_{ext}$ is the phase delay of the feedback light in the external cavity of length $d_{ext}$, and $\lambda_{ext} = 2\pi n_r/(\beta_B + \delta\beta_N)$, where $n_r$ is the refractive index. Multiple reflection of the feedback light between the laser right facet and the external reflector is neglected, assuming that $\Gamma$ is sufficiently small.
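The effect of Eq. (23) on the facet reflectivity can be illustrated numerically. The following is a minimal Python sketch, not the authors' code: it assumes that the effective feedback ratio $\eta\Gamma$ enters under the square root, and the function name, argument names, and all numerical values are hypothetical choices made for illustration only.

```python
import cmath
import math


def effective_reflectivity(r2, eta, feedback_ratio, d_ext, lambda_ext):
    """Modified right-facet amplitude reflectivity in the form of Eq. (23).

    r2             : amplitude reflectivity of the right facet without feedback
    eta            : coupling ratio of the feedback light into the active region
    feedback_ratio : ratio of feedback power to output power at the facet (Gamma)
    d_ext          : external cavity length (same length unit as lambda_ext)
    lambda_ext     : wavelength in the external cavity
    """
    R2 = abs(r2) ** 2                               # power reflectivity of the facet
    theta_ext = 4.0 * math.pi * d_ext / lambda_ext  # round-trip phase delay
    feedback_term = (1.0 - R2) * math.sqrt(eta * feedback_ratio)
    return r2 + feedback_term * cmath.exp(-1j * theta_ext)


# Hypothetical values: a cleaved facet and a weak effective feedback ratio of 1e-4.
r2 = 0.565
wavelength = 1.55e-6                          # metres
d_in_phase = 1000 * wavelength / 2            # round-trip phase is a multiple of 2*pi
d_anti_phase = d_in_phase + wavelength / 4    # round-trip phase shifted by pi

for d in (d_in_phase, d_anti_phase):
    r_eff = effective_reflectivity(r2, eta=0.5, feedback_ratio=2e-4,
                                   d_ext=d, lambda_ext=wavelength)
    print(f"d_ext = {d:.9e} m  ->  |r2'| = {abs(r_eff):.5f}")
```

Under these assumptions, moving the external reflector by a quarter of a wavelength switches the feedback term from adding to $r_2$ to subtracting from it, which is one way to see why the laser response is so sensitive to the external cavity phase.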
C. Oscillation Condition for a Distributed Feedback Laser

For a DFB laser with a phase shift at the plane $z = 0$, the boundary conditions are given by

$$r_2' A_{N,2}(l_2)\, e^{-j\beta_B l_2} = B_{N,2}(l_2)\, e^{j\beta_B l_2} \tag{24}$$

$$r_1 A_{N,1}(-l_1)\, e^{j\beta_B l_1} = B_{N,1}(-l_1)\, e^{-j\beta_B l_1} \tag{25}$$

$$A_{N,1}(0) = A_{N,2}(0) \tag{26}$$

$$B_{N,1}(0) = B_{N,2}(0). \tag{27}$$

Using the conditions given by Eqs. (24)-(27), we obtain the following condition for oscillation of a phase-shifted complex-coupled DFB semiconductor laser:

[Equation (28)]

Equation (28) involves complex quantities. Thus, both the real and imaginary parts of Eq. (28) must be simultaneously satisfied by each of the allowed lasing modes. Equation (28) is developed assuming that the phase angle $\psi$ between the gain and index corrugations is either zero or $\pi$. However, for the more general case when $\psi$ is neither zero nor $\pi$, the contributions of both gain and index coupling can no longer be expressed by a single complex coupling factor $\kappa$, and we need two different complex factors $\kappa_+$ and $\kappa_-$, defined as follows:

$$\kappa_+ = \kappa_i + j\kappa_g e^{+j\psi} \tag{29}$$

and

$$\kappa_- = \kappa_i + j\kappa_g e^{-j\psi}. \tag{30}$$
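The role of the phase angle in Eqs. (29) and (30) can be checked with a few lines of Python. This is only an illustrative sketch: the function name and the coupling values (in cm⁻¹) are hypothetical and are not taken from the chapter.

```python
import cmath
import math


def coupling_factors(kappa_i, kappa_g, psi):
    """Return (kappa_plus, kappa_minus) as defined in Eqs. (29) and (30)."""
    kappa_plus = kappa_i + 1j * kappa_g * cmath.exp(+1j * psi)
    kappa_minus = kappa_i + 1j * kappa_g * cmath.exp(-1j * psi)
    return kappa_plus, kappa_minus


# Hypothetical coupling strengths: mostly index coupling with 10% gain coupling.
kappa_i, kappa_g = 40.0, 4.0

for label, psi in (("in-phase", 0.0), ("antiphase", math.pi), ("quadrature", math.pi / 2)):
    k_plus, k_minus = coupling_factors(kappa_i, kappa_g, psi)
    print(f"{label:10s}  kappa+ = {k_plus:.3f}  kappa- = {k_minus:.3f}")
```

For the in-phase and antiphase cases the two factors coincide, so a single complex coupling factor suffices; for any other phase angle they differ, which is why the general case needs both $\kappa_+$ and $\kappa_-$.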
For a DFB laser that does not have any phase shift within the structure but has a dephased grating, the boundary conditions for the coupled wave equations become

$$r_2 A(L/2) = B(L/2) \tag{31}$$

$$r_1 B(-L/2) = A(-L/2). \tag{32}$$

The oscillation condition obtained from the preceding boundary conditions becomes:

where $\gamma$ is given by Eq. (20). In Eq. (33), $r_2$ must be replaced by $r_2'$ when external optical feedback is applied to the facet with reflectivity $r_2$.

D. General Characteristics of Distributed Feedback Lasers
Each lasing mode of a DFB laser is characterized by its threshold gain $(g - \alpha)$ and its lasing wave vector $\beta$. However, the departure of the lasing wave vector from the Bragg wave vector (denoted by $\delta\beta_N$) is the parameter considered in theory. Thus, in the plane of $(g - \alpha)$ versus $\delta\beta_N$, each mode solution represents a point where both the real and imaginary parts of the oscillation condition (Eq. (28) or Eq. (33)) are satisfied. The mode solutions of a DFB laser reveal a number of modes with different threshold gains and oscillation frequencies around the Bragg frequency. The mode with the lowest threshold gain is the dominant mode (i.e., the main mode). In a correctly designed DFB laser, the threshold gain difference between the dominant mode and other modes is very large (at least 30 dB). In such lasers, only the dominant mode oscillates, carrying almost all of the output power, which results in single-mode operation with a single narrow output spectrum.

Pure index-coupled (IC) lasers require a phase shift within the laser structure to achieve single-mode operation. On the other hand, pure or partly gain-coupled (GC) lasers can achieve single-mode operation without any phase-shifting structure. Theoretically, pure gain-coupled lasers show a large threshold gain difference between the main lasing mode and the side modes (Morthier et al., 1990), higher stability due to the standing wave effect (Kogelnik and Shank, 1972; David et al., 1992), and less spatial hole burning (SHB) (Kapon et al., 1982; David et al., 1991). However, it is difficult to fabricate pure gain gratings because gain variation causes carrier density variation, which in turn causes the index of refraction to vary. Thus, in most gain-coupled lasers, both index and gain coupling are present, and such lasers are characterized by a complex coupling coefficient. Lasers with complex coupling coefficients have been reported to exhibit excellent performance (Tsang et al., 1992b; Li et al., 1992). It has also been reported that partly gain-coupled lasers may show better modulation characteristics than pure GC lasers (Lowery and Novak, 1994; Zhang and Carroll, 1993; Hong et al., 1995). Even a small amount of gain coupling (5% of index coupling) present in complex-coupled DFB lasers can significantly improve the threshold gain difference between the main mode and the side modes (Morthier et al., 1990).

Facet reflectivities of a laser play a crucial role in determining the lasing characteristics. In almost all cases, antireflection (AR) coatings are needed on the facets to eliminate the effect of the uncertainty of the corrugation phase at the facet during fabrication. Pure gain-coupled lasers also have high yield despite the corrugation phase variation at the facets (Nakano et al., 1992). One of the facets in a GC laser can be made highly reflecting (HR) to create an HR-AR gain-coupled laser with a high facet efficiency.
III. EXPERIMENTALLY OBSERVED EFFECTS

A number of effects are experimentally observed when a semiconductor laser is subjected to external feedback. When the distance to the external reflector is less than the coherence length of the laser, the feedback is termed coherent. On the other hand, distant reflectors located farther away than the coherence length of the semiconductor laser produce incoherent feedback. A laser diode with a fiber as the external cavity has an external cavity length typically greater than 10 cm, whereas a GRIN-rod lens external coupled cavity laser has a cavity length of less than 1 cm. The major effects observed under feedback are line broadening and an increase in the number of modes of oscillation due to external cavity modes. These may lead to mode hopping, intensity fluctuation, and generation of excess noise in optical communication systems. The exact characteristics depend on the feedback level, laser structure, laser driving current, presence or absence of a modulating signal, etc. Some of the observed characteristics include excess noise in the low and high frequency range (Broom et al., 1970; Salathe, 1979; Temkin et al., 1986; Fujita et al., 1984; Goldberg et al., 1980; Goldberg et al., 1982; Park et al., 1998), suppression of excess noise with a small modulating signal (Fujita et al., 1984), self-modulation of the optical intensity (Salathe, 1979; Temkin et al., 1986; Fujiwara et al., 1981; Park et al., 1989), successive subharmonic-oscillation cascade leading to optical chaos (Mukai and Otsuka, 1985), kink-shaped light-current (L-I) characteristics (Temkin et al., 1986; Fujiwara et al., 1981), self mode locking of external cavity modes, coherence collapse of output intensity (Lenstra et al., 1985; Miles et al., 1980; Temkin et al., 1986), and multiple-pass resonances superimposed on the noise spectrum (Seo et al., 1988).
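The distinction between coherent and incoherent feedback drawn above can be turned into a rough numerical check. The Python sketch below is illustrative only: it assumes the order-of-magnitude estimate $L_c \approx c/\Delta\nu$ for the coherence length (other definitions differ by factors of order $\pi$, depending on the line shape), and the 30 MHz linewidth and reflector distances are hypothetical values.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def coherence_length(linewidth_hz):
    """Order-of-magnitude coherence length in metres, assuming L_c ~ c / linewidth."""
    return C / linewidth_hz


def feedback_type(reflector_distance_m, linewidth_hz):
    """Classify feedback as coherent or incoherent by comparing distance with L_c."""
    if reflector_distance_m < coherence_length(linewidth_hz):
        return "coherent"
    return "incoherent"


linewidth = 30e6  # hypothetical solitary-laser linewidth, Hz
print(f"L_c ~ {coherence_length(linewidth):.2f} m")
for distance in (0.01, 0.10, 100.0):  # GRIN-rod scale, short fiber pigtail, long fiber
    print(f"reflector at {distance:7.2f} m -> {feedback_type(distance, linewidth)} feedback")
```

With these assumed numbers, centimetre-scale external cavities fall well inside the coherent regime, while a reflection from far down a long fiber would be classified as incoherent.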
A. Intensity Fluctuation and Spectral Characteristics

The effects of external optical feedback on semiconductor lasers have been extensively studied experimentally. These effects include narrowing of the emission spectrum width (Bogatov et al., 1973; Voumard et al., 1977; Kikuchi and Okoshi, 1982), reduction or distortion of modulated output (Kobayashi, 1976), use of a semiconductor laser as a detector (Mitsuhashi et al., 1976), and low-frequency intensity fluctuation in continuous-wave (CW) operated lasers (Risch and Voumard, 1977). On the other hand, from the early days of optical communication, unwanted external feedback in diode lasers has been known to cause degradation of the modulation response and an increase in intensity noise (due to fluctuations in intensity) (Broom et al., 1970; Morikawa et al., 1976; Risch and Voumard, 1977; Ikushima and Maeda, 1978; Chinone et al., 1978; Hirota and Suematsu, 1979). Properties of a semiconductor laser under optical feedback depend closely on its operating condition. Based on the injection current $J$ of a semiconductor laser relative to the solitary laser threshold $J_{th}$, three distinctive regions may be defined (Besnard et al., 1993):

(i) $J < J_{th}$: Coherent and additive optical interference effects between the feedback light and the light reflected from the facet are necessary for the laser to work. The compound cavity delivers a stable intensity output. The coherence length is fixed by the external cavity. In this case, the behavior of the system is extraordinarily sensitive to fine adjustments of the external optics, which fix the light distribution on the laser facet.

(ii) $J > J_{th}$: The laser can operate on its own (i.e., lasing can occur without feedback). The optical feedback becomes uncorrelated with the field inside the laser. This yields an unstable, noisy intensity output.

(iii) $J \approx J_{th}$: In the vicinity of the solitary laser threshold, a mixing of, or hopping between, the coherent and the incoherent behavior yields bursts of noise in the output intensity.
There are two components in the increased fluctuation frequency spectrum when the laser current is above threshold. One is the high-frequency component peaked at frequency $1/\tau$, where $\tau$ is the round-trip time for the light in the external cavity formed by the external reflector and the laser diode facet facing it (Broom et al., 1970; Risch and Voumard, 1977; Ikushima and Maeda, 1978). The other is termed low-frequency fluctuations (LFF), which peak at a frequency reported to be one to two orders of magnitude smaller than $1/\tau$. Lang and Kobayashi (1980) reported extensive experimental observations for CW AlGaAs double heterostructure diode lasers. Their findings demonstrated that external feedback can make the injection laser multistable and cause hysteresis. Typical light-output-versus-current characteristics with and without external optical feedback are shown in Fig. 2. The two curves with external optical feedback suggest the presence of hysteresis effects. The hysteresis has been explained as being caused by variation of the crystal refractive index due to the change of the active region temperature with current. The jumps in the light-current characteristics are thought to be due to mode switching.

FIGURE 2. Light-output-versus-current (L-I) characteristics for a CW AlGaAs double heterostructure laser diode with and without external optical feedback. (After Lang and Kobayashi, 1980.)
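The two frequency scales mentioned above follow directly from the external cavity length. The short Python sketch below assumes the standard round-trip expression $\tau = 2 n_{ext} d_{ext}/c$ for a cavity of length $d_{ext}$ and refractive index $n_{ext}$; the function names and the 15 cm air-cavity example are hypothetical, and the LFF band is simply taken as one to two orders of magnitude below $1/\tau$, as reported in the text.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def round_trip_time(d_ext_m, n_ext=1.0):
    """Round-trip time in the external cavity, tau = 2 * n_ext * d_ext / c (seconds)."""
    return 2.0 * n_ext * d_ext_m / C


def high_frequency_peak(d_ext_m, n_ext=1.0):
    """Frequency 1/tau at which the high-frequency noise component peaks (Hz)."""
    return 1.0 / round_trip_time(d_ext_m, n_ext)


def lff_band(d_ext_m, n_ext=1.0):
    """Rough LFF range: one to two orders of magnitude below 1/tau (Hz)."""
    f_hf = high_frequency_peak(d_ext_m, n_ext)
    return f_hf / 100.0, f_hf / 10.0


d_ext = 0.15  # hypothetical 15 cm external cavity in air
f_low, f_high = lff_band(d_ext)
print(f"tau   = {round_trip_time(d_ext) * 1e9:.3f} ns")
print(f"1/tau = {high_frequency_peak(d_ext) / 1e9:.3f} GHz")
print(f"LFF expected roughly between {f_low / 1e6:.1f} and {f_high / 1e6:.1f} MHz")
```

For a fiber external cavity, `n_ext` would be set to the fiber group index rather than 1.0, which lowers both frequency scales proportionally.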
Mode jumping depends primarily on whether the external cavity length is close to an integral multiple of the wavelength (large output amplitude undulation) or not (small output amplitude undulation). The transient response of the laser diode to a current pulse at different biasing levels was also investigated by Lang and Kobayashi (1980). The results showed that the relaxation oscillation due to the input current pulse was suppressed at dc bias current levels corresponding to the peaks of the undulations of the L-I curve, but was enhanced at bias levels corresponding to the undulation valleys.

Besnard et al. (1993) observed that slight adjustments to the experimental conditions may result in dramatic changes of the L-I curves. Figure 3 shows several typical cases of different L-I characteristics observed under different experimental conditions. Curves (c) and (d) correspond to the situation where alignment was optimally set for maximum feedback. Both curves (c) and (d) follow an identical path for low injection currents with coherent feedback effects. However, they may separate into two different branches above the solitary laser (without feedback) threshold. In curve (d), the output intensity remains stable along the entire curve, while in curve (c), there is a noisy region and a reduction in the final output power. This reduced power is attributed to noise generated as the mean time between successive intensity breakdowns decreases at high bias currents (Henry and Kazarinov, 1986). Both curves (a) and (b) were obtained under poor optical alignment. Curve (a) resembles the curves that were reported in earlier investigations (e.g., Temkin et al., 1986). Depending on experimental conditions, a kink may or may not be present in the curve, as shown by the two different branches of curve (a).

FIGURE 3. L-I characteristics of GaAs-AlGaAs channel substrate planar (CSP) double heterostructure laser diodes under different experimental conditions. Curves (a) and (b) are obtained under poor optical alignment. Curves (c) and (d) are obtained when optical alignment is optimally set for maximum feedback. In (b) and (c), a region of instability is observed when the laser diode current exceeds the solitary laser threshold. (After Besnard et al., 1993.)

Fujiwara et al. (1981) carried out experiments to determine the mechanism of low-frequency fluctuation (LFF) enhancement and the relationship between the peak LFF frequency and the basic parameters of a laser diode with an external reflector. They reported L-I characteristics similar to those in Fig. 2 (Lang and Kobayashi, 1980). The LFF peak power was maximum at the undulation valleys and minimum at the undulation peaks. Several experimenters (unpublished References 10-12 in Fujiwara et al., 1981) also found that above a certain current level, the undulation amplitude and lasing differential efficiency are suddenly reduced. Fujiwara et al. (1981) experimentally established a simple relationship $f_0 = f_R/(\kappa\tau)$, where $f_0$ is the LFF peak frequency, $f_R$ is the intensity fluctuation peak frequency in the absence of external optical feedback, $\kappa$ is the coupling parameter between the laser diode cavity and the external cavity, and $\tau$ is the round-trip time in the external cavity. This simple relationship was found to be qualitatively in agreement with predictions based on the compound cavity model (see Section IV-A).
The spectral properties of a semiconductor laser under feedback from a reflector at a distance longer than the coherence length of the laser output were investigated by Cohen et al. (1990). They measured the spectrum and the visibility (absolute value of the field-autocorrelation function) of a laser for different values of feedback. Their analytical model required that the damping rate of the relaxation oscillations change with the amount of feedback. Hamel et al. (1992) measured the visibility of a semiconductor laser for a wider range of feedback levels and compared their results with numerical solutions of the Lang and Kobayashi (1980) equations. Hamel et al. (1992) were able to solve the problem of feedback-dependent damping (Cohen et al., 1990), but they required an unusually large value of the linewidth enhancement factor to fit theory to experiment.

Sigg (1993) studied CW output power versus current characteristics of semiconductor lasers. In addition to L-I characteristics, he also reported the effect of external reflector reflectivity on the threshold current of metal-clad-ridge-waveguide (MCRW) type and buried heterostructure (BH) type laser diodes. Figure 4 shows experimentally fitted curves for the normalized threshold current reduction $\Delta I_{th}/I_{th}$ ($I_{th}$ is the solitary laser threshold current and $\Delta I_{th}$ is the change in threshold current due to external optical feedback) of a laser diode as a function of the square root of the reflectivity of the external reflector ($\sqrt{R_{ext}}$). Achtenhagen et al. (1996) reported experimental results on external optical feedback in complex-coupled DFB lasers.

FIGURE 4. Normalized threshold current change ($\Delta I_{th}/I_{th}$) as a function of the square root of the reflectivity of the external mirror ($\sqrt{R_{ext}}$) in InGaAsP-InP laser diodes. (After Sigg, 1993.)

Giles et al. (1994) reported the spectral behavior of a high-power 980-nm InGaAs strained quantum-well laser diode used for pumping erbium-doped fiber amplifiers (EDFAs). They reported that a 2% reflection from an external mirror caused the output spectrum to shift from 970 nm to 1000 nm. Because pump lasers used for EDFAs must meet stringent wavelength requirements, they proposed the use of external narrowband grating reflectors to control the laser emission wavelength.
B. Linewidth Reduction, Broadening, and Chaos
Linewidth reduction has been reported by many experimenters (Patzak et al., 1983; Kikuchi and Okoshi, 1982; Agrawal, 1984) with small amounts of feedback and proper phase matching. However, high levels of optical feedback result in relaxation oscillation and multiple external cavity mode
operation, which in turn results in linewidth broadening (Miles et al., 1980; Goldberg et al., 1980, 1982; Acket et al., 1984; Osmundsen et al., 1983). Lenstra et al. (1985) experimentally studied the effects of higher levels of external optical feedback in detail. They observed that at high levels of feedback, the output becomes multimode with a complex line shape that is almost insensitive to the external cavity length. As the amount of feedback increases further, spectral details in the emission line shape tend to disappear until finally a single, dramatically broadened line results, with a width of the order of 25 GHz. The coherence length of the laser light was found to collapse from 10 m without feedback to about 10 mm with relatively high feedback. This phenomenon is termed coherence collapse and is observed in both Fabry-Perot and DFB laser diodes. In most cases, coherence collapse is harmful to the operation of a semiconductor laser diode. However, it may also be useful in suppressing coherent backscatter and speckle effects, for example in optical disk data storage systems. Miles et al. (1980) and Goldberg et al. (1980, 1982) also reported linewidth broadening of the order of 20 GHz compared to the solitary laser linewidth of about 60 MHz. Li et al. (1993) carried out a detailed study of coherence collapse and found a number of new phenomena that can collectively be described as coherence collapse. These include subharmonic bifurcation (Mukai and Otsuka, 1985), self-pulsation (Park et al., 1990), intermittent behavior (Sacher et al., 1989), and staircase fluctuations involving random power drops followed by step-wise recoveries (Temkin et al., 1986; Henry and Kazarinov, 1986). Li et al.
(1993) demonstrated that coherence collapse can be reached via a period-doubling route (when the relaxation oscillation and the external cavity modes or their harmonics were locked together), suggesting the onset of deterministic chaos, although a quasi-periodic route to chaos was observed by other experimenters (e.g., Lenstra et al., 1985). Coherence collapse has also been reported in distributed Bragg reflector (DBR) lasers by Woodward et al. (1990). For applications of laser diodes requiring narrow-linewidth operation, coherence collapse places severe demands on the optical isolation of the laser diode. Spurious back reflections of the order of -30 dB of the emitted laser beam from various interfaces (such as optical fibers) are sufficient to drive the laser into coherence collapse. Cho and Umeda (1984) reported that extremely high levels of optical feedback (5-10%) cause the optical field of a diode laser to be in a state of chaos. Dente et al. (1988) and Mørk et al. (1992) further investigated the transition to chaos in semiconductor lasers. Mørk et al. (1992) demonstrated that the coherence-collapsed state is a chaotic attractor and that, with increasing feedback level, the laser undergoes a quasi-periodic route to chaos that may be interrupted by frequency locking. They obtained experimental phase portraits of the output of a laser diode to study the dynamic behavior of the
FIGURE 5. The time-averaged optical spectrum of a 1.3-μm DFB semiconductor laser under optical feedback (horizontal axis: optical frequency in GHz). The central peak at zero frequency (A) represents the solitary laser oscillation frequency (internal cavity mode). The peaks B and C and their symmetrical counterparts are observed under external optical feedback (external cavity modes). The laser randomly jumps between the two external cavity modes B and C. (After Mørk et al., 1992.)
relaxation oscillation. The phase portraits were found to support the concept of a quasi-periodic route to chaos. A typical time-averaged optical spectrum under optical feedback is shown in Fig. 5. In the figure, A represents the central peak of the external cavity mode. There are two relaxation oscillation sidebands B and C (and their symmetrical counterparts) corresponding to two strong peaks in the intensity noise spectrum. By tuning an interferometer to different optical frequencies simultaneously, Mørk et al. (1990) found a high degree of anticorrelation between the power levels of the peaks around modes B and C. This suggests that the laser jumps randomly between modes B and C. At low feedback levels, the average time between jumps is very small; it increases with increasing feedback level, attains a maximum, and finally drops rapidly. They argued that the final decrease in jumping time indicates that the system has become chaotic. Recently, Lam et al. (1996) studied the chaotic stability of DFB and ridge-waveguide external cavity semiconductor lasers under modulation and concluded that dynamic determinism is present in the broadband chaotic state of these laser diodes. Another interesting phenomenon is the stable operation of semiconductor lasers with very strong external feedback. For example, when an antireflection (AR) coating is used to increase the relative feedback strength further, stable operation has been demonstrated in AlGaAs (Fleming and Mooradian, 1981) and InGaAsP (Wyatt and Devlin, 1983) diode lasers. Similar stability at high feedback has also been reported for InGaAsP laser diodes by Temkin et al. (1986).
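The link between the linewidths and coherence lengths quoted above can be checked with a short calculation. The sketch below assumes a Lorentzian line shape, for which one common convention gives the coherence length as L_c = c/(π Δν). That convention and the function name are assumptions made for illustration, and other definitions differ by factors of order unity, so only the orders of magnitude should be compared with the reported values.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def coherence_length_m(linewidth_hz: float) -> float:
    """Coherence length for an assumed Lorentzian line: L_c = c / (pi * linewidth)."""
    if linewidth_hz <= 0:
        raise ValueError("linewidth must be positive")
    return C_VACUUM / (math.pi * linewidth_hz)


# Linewidths taken from the text: about 60 MHz (solitary laser) and 25 GHz (collapsed).
for label, linewidth in [("solitary, 60 MHz", 60e6), ("collapsed, 25 GHz", 25e9)]:
    print(f"{label}: L_c = {coherence_length_m(linewidth):.3g} m")
```

Under this convention a line broadened to tens of gigahertz corresponds to a coherence length of a few millimeters, the same order as the roughly 10 mm reported for the collapsed state.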
C. Noise Generation
The noise characteristics of semiconductor lasers under optical feedback are extremely important for applications such as optical communications and optical measurement. The generation of noise is the direct effect of a number of interrelated phenomena, such as intensity fluctuation, linewidth broadening, and coherence collapse, as discussed earlier. External feedback results in excess noise generation at frequencies corresponding to integral multiples of the inverse of the external cavity roundtrip time (Broom et al., 1970; Salathe, 1979; Lang and Kobayashi, 1980). For an external cavity length of the order of 1 m, such noise peaks can be observed at gigahertz frequencies and are referred to as high-frequency noise. There is also a low-frequency noise in the < 100 MHz region, the frequency of which is inversely proportional to the length of the external cavity (Hirota and Suematsu, 1979; Fujiwara et al., 1981; Morikawa et al., 1976). Temkin et al. (1986) carried out a number of experiments investigating the role of feedback intensity, bias current, and external cavity length on several types of index-guided InGaAsP laser diodes. Figure 6 shows the experimental setup for reflection noise measurement by Temkin et al. (1986). Reflection feedback was provided by a flat front-surface mirror mounted at distances between 10 and 150 cm from the laser diode.
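As a rough numerical illustration of these frequency scales, the sketch below computes the external cavity roundtrip time and the resulting noise frequencies. It assumes an air-filled cavity, so that the roundtrip time is τ = 2L/c, and it uses the empirical relation f_L = a/τ with a ≈ 0.08 that Temkin et al. (1986) report later in this section. The function names are hypothetical.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def roundtrip_time_s(cavity_length_m: float, refractive_index: float = 1.0) -> float:
    """External cavity roundtrip time, tau = 2 n L / c (n = 1 assumed for air)."""
    return 2.0 * refractive_index * cavity_length_m / C_VACUUM


def high_frequency_peaks_hz(cavity_length_m: float, n_harmonics: int = 3) -> list:
    """First few high-frequency noise peaks at integer multiples of 1/tau."""
    tau = roundtrip_time_s(cavity_length_m)
    return [k / tau for k in range(1, n_harmonics + 1)]


def low_frequency_noise_hz(cavity_length_m: float, a: float = 0.08) -> float:
    """Low-frequency noise peak, f_L = a / tau, with the empirical a of about 0.08."""
    return a / roundtrip_time_s(cavity_length_m)


length_m = 1.5  # the longest mirror distance used in the experiment described above
print(f"tau  = {roundtrip_time_s(length_m) * 1e9:.2f} ns")
print(f"f_L  = {low_frequency_noise_hz(length_m) / 1e6:.2f} MHz")
print("peaks (MHz):", [round(f / 1e6, 1) for f in high_frequency_peaks_hz(length_m)])
```

Both frequencies scale as 1/L, so shortening the cavity moves the low-frequency peak and the comb of high-frequency peaks upward together.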
FIGURE 6. Experimental setup for reflection noise measurement of InGaAsP index-guided laser diodes; the setup includes a spectrometer and an attenuator. (After Temkin et al., 1986.)
Figure 7 shows the L-I characteristics with and without external optical feedback. High feedback results in a decrease in threshold and a kink-shaped L-I curve with an inflection point. In contrast to close-reflector experiments (Fujiwara et al., 1981), periodic undulations in the power output corresponding to longitudinal mode jumps were not observed for distant reflectors even at the highest feedback intensity (Ito and Kimura, 1980). The noise spectrum of the laser studied by Temkin et al. (1986) consisted of a large number of sharp and intense peaks, equally spaced at the external roundtrip frequency. Noise intensity and spectral details were found to depend only on the bias and external cavity conditions and not on the laser structure. The overall spectrum found by Temkin et al. (1986) was very broad, extending from 0.2 to 6 GHz at low bias, with the envelope peak around 3 GHz. Figure 8 shows the optical spectra of an index-guided InGaAsP laser diode near threshold with and without external optical feedback.
FIGURE 7. Light-current characteristics of a 1.55-μm ridge waveguide laser with and without external optical feedback. An expanded view of the near-threshold region is also shown. (After Temkin et al., 1986.)
FIGURE 8. Longitudinal mode spectra of an index-guided InGaAsP laser diode near threshold: (a) without external optical feedback and (b) with moderate optical feedback. Note that under external feedback, a number of side modes have power comparable to the main mode and each mode has increased linewidth. (After Temkin et al., 1986.)
A change in the bias current causes a change of both high-frequency and low-frequency noise peaks as shown in Fig. 9. Temkin et al. (1986) observed low-frequency noise in their studies in the 2-60 MHz frequency range. The noise intensity builds up with increasing bias current, and a maximum is observed near the inflection point of the L-I curve in Fig. 7. Above the inflection point, the noise intensity rapidly decreases. They also observed that the frequency $f_L$ of the low-frequency noise varies with the external cavity lifetime $\tau$ as $f_L = a/\tau$, where $a \approx 0.08$, while the frequency $f_H$ of the high-frequency noise varies as multiples of $1/\tau$. The same phenomenon was also reported by Morikawa et al. (1976). The noise spectra observed by Temkin et al. (1986) show a flattening effect when external feedback is increased. The individual modes are also downshifted in wavelength while their linewidths are greatly increased. External cavity modes were observed by Temkin et al. (1986) at low feedback levels
FIGURE 9. Bias current dependence of the first harmonic of the high- and low-frequency noise components (horizontal axis: diode current in mA). Both components follow an identically shaped curve, except that the scales are different for these two components. The left scale is for low-frequency noise while the right scale is for high-frequency noise. (After Temkin et al., 1986.)
very close to threshold. At higher feedback and at bias currents above the inflection point of the L-I characteristics in Fig. 7, the external cavity modes broadened very rapidly and could not be resolved. This broadening is similar to the coherence collapse phenomenon observed by Lenstra et al. (1985). Schunk and Petermann (1989a) reported measured feedback-induced intensity noise for 1.3-μm DFB laser diodes. They compared their measurements with their theoretical predictions (Schunk and Petermann, 1988) and found that the relative intensity noise (RIN) starts to increase abruptly from its minimum value as the external feedback power ratio is increased from very low values, over a range of the order of -50 to -20 dB. Figure 10 shows a typical plot of RIN versus feedback ratio. They also observed that the RIN depended on the index coupling coefficient $\kappa$ of an index-coupled DFB laser, and that a change of the coupling strength from $\kappa L = 1.5$ to $\kappa L = 3$ yields an improvement by more than one order of magnitude. RIN measurements have also been reported by Kawai et al. (1995) and Park et al. (1998). Park et al. reported the effects of external optical feedback on the power penalty of commercial DFB laser modules. They suggested that optical isolators for DFB laser modules used in 2.5-Gb/s systems require an isolation ratio of
FIGURE 10. Relative intensity noise as a function of feedback level (horizontal axis: feedback ratio in dB, from -50 to -20) for a DFB semiconductor laser for two different output powers. Note the abrupt rise in RIN above a certain threshold feedback level. (After Schunk and Petermann, 1989.)
better than 54.5 dB for negligible power penalty induced by external optical feedback.
D. Regimes of External Feedback
Although many authors have studied external feedback effects within a narrow range of external feedback ratios, some authors have classified the effects of external optical feedback into a number of regimes according to the distinct characteristics observed at different external feedback levels. Tkach and Chraplyvy (1986) measured the effects of feedback on the spectra of 1.5-μm DFB semiconductor lasers for feedback power ratios ranging from -80 dB (very weak feedback) up to -8 dB (very strong feedback). They proposed five regimes of operation depending on the observed effects.
Regime I: In this regime of extremely weak feedback, narrowing or broadening of the emission line is observed, depending on the phase of the feedback. Linewidth changes of the order of 30% are observable in this regime.
Regime II: The emission line starts to show splitting that arises from rapid mode hopping. The magnitude of the splitting depends on the strength of the feedback and on the distance to the reflector.
Regime III: The mode hopping is suppressed and the laser operates on a single narrow line. This regime is very narrow, from -45 to -39 dB, and is independent of the distance to the reflector.
Regime IV: At around -40 dB, satellite modes, separated from the main mode by the relaxation oscillation frequency, start to appear. These grow as the feedback increases, and the laser line eventually broadens to as much as 50 GHz. This regime corresponds to the coherence-collapsed region. The effects in this regime are independent of the distance to the reflector and of the feedback phase.
Regime V: Extended cavity operation with a narrow linewidth is observed at the highest levels of feedback (usually greater than -10 dB). Typically it is necessary to antireflection (AR) coat the laser facet to reach this regime. In this regime the laser operates as a long-cavity laser with a short active region. The laser is relatively insensitive to additional external optical perturbations.
Tkach and Chraplyvy (1986) also measured the feedback level at which each transition occurs as a function of the distance to the external reflector; the regions are shown in Fig. 11.
E. Other Phenomena
When the bias current of a semiconductor laser is suddenly changed from a value below threshold to a value above threshold, there is a delay time before the laser switches on. The turn-on time varies statistically, and the mean turn-on time (MTOT) is an important parameter of a laser diode. The standard deviation of the turn-on time is called the turn-on jitter (TOJ). Low TOJ is required for high bit-rate optical communication systems. Experimental results by Simonsen (1993) reveal that relaxation oscillation sidebands due to external optical feedback can be suppressed and the linewidth can be strongly reduced for CW operation of a laser diode.
For large external cavity roundtrip times, TOJ increases with feedback under both weak (Langley and Shore, 1992; Langley and Shore, 1993) and moderate (Wu and Chang, 1992) optical feedback. Recent results show that the MTOT and TOJ oscillate periodically with external-cavity roundtrip time of the order of picoseconds under repetitive gain switching (Hernandez-Garcia et al., 1994). Recently, Besnard et al. (1993) carried out an in-depth investigation of
FIGURE 11. Various regions of feedback (I, II, III, IV, and V) when the distance to the reflector (horizontal axis, in cm) and the feedback ratio (vertical axis) are varied. (After Tkach and Chraplyvy, 1986.)
reflection-induced behavior of a semiconductor laser with a distant reflector. They reported a number of new effects, including multiple-pass resonances at $c/(3nL)$ and $c/(4nL)$ in the noise spectrum, where n is an integer, L is the laser length, and c is the velocity of light. These resonances are observed when the optical system is slightly misaligned. They also observed noise bursts in the output signal that eventually culminated in coherence collapse. Besnard et al. (1993) also observed subharmonic generation when a modulating signal was applied to a laser diode. They further observed switching between the first- and second-order triple-pass resonances of the external cavity. Besnard et al. showed that most of the new effects have their origin in two distinct physical mechanisms: (i) dynamical effects: a locking of the statistically distributed intensity drops by deterministic effects; and (ii) spatial effects: breaking of the symmetry of the optical beam that
propagates inside the passive external cavity, brought about by asymmetries of the geometrical configuration of the experimental setup. Mink and Verbeek (1986) observed asymmetry in the output power and in the power spectrum of the light emitted by the two facets of a laser when one of the facets is subjected to external optical feedback. Tomasi et al. (1994) observed an asymmetric pulse shape in the low-frequency fluctuations (LFF) of a semiconductor laser subjected to external optical feedback. Nakano et al. (1991) reported the fabrication of gain-coupled DFB lasers and measured the intensity noise in such lasers. They reported that gain-coupled DFB lasers were less sensitive to external optical feedback in terms of RIN. Kurosaki et al. (1994) reported improvement in external optical feedback sensitivity in quarter-wave-shifted (QWS), index-coupled DFB lasers by moving the phase shift away from the center of the DFB laser towards one of the facets. Recently, Chuang et al. (1996) reported a complex-coupled DFB laser with a current-blocking grating that showed high resistance to external optical feedback effects (Wang et al., 1997). The measured relative intensity noise of the lasers was as low as -160 dB/Hz even in the presence of an external feedback level of -15 dB. Without using an optical isolator, transmission over 235 km of fiber was demonstrated with a power penalty of only 1.55 dB at a bit error rate (BER) of [value missing] with this DFB laser. A partially corrugated-waveguide laser diode (PC-LD) was also recently demonstrated to be resistant to high levels of external optical feedback (Huang et al., 1996). Benoist (1996) recently investigated the possibility of optical isolation of a semiconductor laser by frequency-shifted feedback obtained through acousto-optic interaction, and reported better performance of a laser diode with frequency-shifted feedback compared to conventional feedback.
IV. THEORIES ON OPTICAL FEEDBACK
Theoretical models for optical feedback are usually based on the Lang and Kobayashi (1980) rate equations, which have proven to contain all the dominant effects observed experimentally. For weak feedback, regimes I-II, a small-signal analysis has been demonstrated by many authors to give a correct description of linewidth, spectral behavior, and modulation properties (Petermann, 1988). For moderate or strong feedback, regimes II-V, the nonlinearities must be taken into account.
A. Compound Cavity Model
Lang and Kobayashi (1980) analyzed the effect of external optical feedback when the distance to the reflector is smaller than the coherence length, using a compound-cavity model. They demonstrated that external feedback can make an injection laser multistable and cause hysteresis phenomena, which follow the same mechanism as in a nonlinear Fabry-Perot resonator (Szöke et al., 1969). Lang and Kobayashi used the following form of rate equation for the electric field in the compound-cavity laser configuration:
$$\frac{d}{dt}\left[E(t)e^{j\Omega t}\right] = \left\{ j\omega_N(n) + \tfrac{1}{2}\left[G(n) - \Gamma_0\right] \right\} E(t)e^{j\Omega t} + \kappa E(t-\tau)e^{j\Omega(t-\tau)}. \qquad (34)$$
Here, $E$ is the electric field, $\Omega$ is the laser oscillation frequency, $\tau$ is the external cavity transit time, and $\omega_N(n)$ is the diode cavity longitudinal mode resonance frequency, which is defined with an integer $N$ as $\omega_N = N\pi c/(\eta l_d)$, where $\eta$ is the active region refractive index, $c$ is the velocity of light, and $l_d$ is the diode cavity length. $G(n)$ is the gain of the laser medium, and $\Gamma_0$ is the loss of the diode cavity. The last term in Eq. (34) represents the external feedback. The coefficient $\kappa$ is related to the cavity parameters as:
$$\kappa = \frac{c\,a}{2\eta l_d}, \qquad (35)$$
where the parameter $a$, defined with the facet and external mirror reflectivities $R_2$ and $R_3$, respectively, as
$$a = (1 - R_2)\left(R_3/R_2\right)^{1/2}, \qquad (36)$$
is a measure of the coupling strength between the two cavities. Multiple reflections in the external cavity are neglected here. The rate equation for the carrier density $n$ is given by
$$\frac{d}{dt}n = -\gamma n - G(n)|E|^2 + P, \qquad (37)$$
where $P$ denotes the injection rate per unit volume of excited carriers, which is related to the current density $J$, the electronic charge $e$, and the diode active layer thickness $d$ as $P = J/ed$, and $\gamma$ is the inverse spontaneous lifetime of the excited carriers. Under steady-state conditions, the real and imaginary parts of Eq. (34) are set equal to zero. With the refractive index expanded for small variations $\Delta n$, $\Delta\Omega$, and $\Delta x$ of the carrier density, the laser oscillation frequency, and an external parameter around their reference values $(n_r, \Omega_r, x_r)$, the solitary diode cavity resonance is expressed as:
where $\eta_r = \eta(n_r, \Omega_r, x_r)$ is the reference value of $\eta$, and $\eta_\Omega = \partial\eta/\partial\Omega$, and likewise for $\eta_n$ and $\eta_x$. Under steady-state conditions, Lang and Kobayashi (1980) obtained the following equation from the preceding equations:
$$\eta_x\,\Delta x = \frac{\eta_{eff}}{\Omega_r\tau}\left\{\beta\left[R\cos(\Delta\Omega\,\tau) - \sin(\Delta\Omega\,\tau) - R\right] - \Delta\Omega\,\tau\right\}, \qquad (39)$$
where $\tau$ is the external cavity transit time, $\beta = aL_e/L_d$, $L_e$ is the external cavity length, and $L_d = \eta_{eff} l_d$ is the effective diode cavity length. The factor $R$ depends on cavity parameters and critically affects the lateral transverse mode stability in stripe-geometry lasers against spatial hole burning (Lang, 1979; Thompson et al., 1978). The calculated oscillation frequency is a multivalued function of the external parameter $x$, and that is why an external cavity laser is multistable. Lang and Kobayashi found that multistability arises when
$$(1 + R^2)^{1/2}\,aL_e/L_d > 1. \qquad (40)$$
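The multistability condition can be illustrated numerically. The sketch below takes the static relation in the form g(y) = β[R cos y − sin y − R] − y, with y = ΔΩ·τ and β = aL_e/L_d, which is our reading of Eq. (39) up to the prefactor. It counts how many values of y give the same value of g, that is, the same external parameter. The function names, grid, and parameter values are illustrative assumptions. Because dg/dy = −β(R sin y + cos y) − 1, the curve stays monotonic, with a single solution, as long as β(1 + R²)^{1/2} ≤ 1, which is the complement of Eq. (40).

```python
import numpy as np


def static_curve(y, beta, R):
    """g(y) = beta*[R cos(y) - sin(y) - R] - y, with y = delta_Omega * tau (assumed form)."""
    return beta * (R * np.cos(y) - np.sin(y) - R) - y


def is_multistable(beta, R):
    """Multistability criterion of Eq. (40): beta * sqrt(1 + R^2) > 1."""
    return bool(beta * np.sqrt(1.0 + R**2) > 1.0)


def count_solutions(beta, R, target, y_max=40.0, n_points=400_001):
    """Count the solutions of g(y) = target on [-y_max, y_max] via sign changes on a grid."""
    y = np.linspace(-y_max, y_max, n_points)
    h = static_curve(y, beta, R) - target
    return int(np.sum(np.sign(h[:-1]) * np.sign(h[1:]) < 0))


for beta in (0.1, 2.0):  # weak and strong coupling, illustrative values only
    print(beta, is_multistable(beta, R=3.0), count_solutions(beta, R=3.0, target=-1.0))
```

With R = 3, the weakly coupled case has a single solution, while the strongly coupled case has several coexisting solutions for the same external parameter.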
As R is large in semiconductor lasers, multistability is possible when the other parameters are suitably chosen. Under dynamic conditions, the stability of a stationary solution has been examined by Lang and Kobayashi by studying the time development of infinitesimal fluctuations in the field and the carrier density around it. They found that the stationary solutions of Eq. (39) for which F < 0 represent dynamically unstable (DU) states provided,
$$\kappa_c = \kappa\cos(\Delta\Omega\,\tau), \qquad (42)$$
$$\kappa_s = \kappa\sin(\Delta\Omega\,\tau). \qquad (43)$$
Lang and Kobayashi also found regions of gain-spectrumwise unstable (GU) ranges of frequencies by considering frequency dependence of gain. The laser frequency was found to be a multivalued function of the external parameter x and that can explain the multistability of the compound cavity. They also studied the dynamic response of the laser output to a small amplitude current modulation. The amplitude response curve as a function
of frequency was found to be dependent on the parameter $\Delta\Omega\,\tau$. The peak in the response spectrum depends on $\Delta\Omega\,\tau$ and indicates that the relaxation oscillation in the laser output due to external feedback is suppressed when the returned light interferes favorably ($\Delta\Omega\,\tau \approx 0$) with the field in the diode, while it is enhanced and prolonged when $\Delta\Omega\,\tau$ has negative values of appreciable magnitude. Olesen et al. (1986) and Tromborg et al. (1984) examined the influence of nonlinear dynamics on the linewidth, spectral behavior, and stability properties of a semiconductor laser with an external cavity. Olesen et al. separated the electric field equation given by Lang and Kobayashi, Eq. (34), into two separate equations for the amplitude and phase of the electric field. They used the noise-driven rate equations instead of linearized small-signal equations for their simulations. The linewidth increase was found to be connected to an abrupt transition from a coherent to an incoherent state, or in other terms, a transition from a fixed-point attractor to a strange attractor (chaotic state). They determined the stability limits of a laser diode under optical feedback by considering phase condition and gain condition limits and also identified regions of dynamic instability. They showed that the ratio of linewidths with and without optical feedback in the stable region of operation is given by
$$\frac{\Delta\nu}{\Delta\nu_0} = \left\{1 + X\sqrt{1+\alpha^2}\cos(\omega\tau + \psi)\right\}^{-2}, \qquad (44)$$
where $\Delta\nu$ is the spectral width with optical feedback, and $\Delta\nu_0$ is the corresponding width without feedback. $X$ is the feedback parameter given by $X = \kappa\tau/\tau_{in}$, where $\tau_{in}$ and $\tau$ are the roundtrip times in the laser cavity and the external cavity, respectively, and $\kappa^2$ is the power reflected from the external cavity relative to the power reflected from the laser mirror. Here $\alpha$ is the linewidth enhancement factor, $\omega$ satisfies the phase condition
$$\omega_0\tau = \omega\tau + X\sqrt{1+\alpha^2}\sin(\omega\tau + \psi), \qquad (45)$$
and
$$\psi = \arctan\alpha, \qquad (46)$$
where $\omega_0$ is the solitary laser (without feedback) angular oscillation frequency. They introduced the concept of a "coherent feedback level" and showed that a strong increase in linewidth is possible under certain conditions. The three rate equations for field amplitude, phase, and carrier density (Olesen et al., 1986; Mørk et al., 1992) can be solved when the initial conditions for field amplitude, phase, and carrier density are specified. However, due to the roundtrip delay time $\tau$, the amplitude and phase of the
field for $-\tau < t < 0$ also need to be specified. This makes the system infinite-dimensional because the time evolution depends on the values specified for the amplitude and phase of the field over a continuous time interval. A solution to the three equations describes a trajectory in the three-dimensional (3-D) space $(E_0, \phi, N)$, where the symbols represent the field amplitude, the field phase, and the carrier density, respectively. The trajectory obtained after the transients have died away constitutes the attractor. There are in general several coexisting attractors for a given feedback level. Some typical attractors are the fixed-point solution, the limit cycle (a closed cyclic trajectory), the torus, and the strange (chaotic) attractor. Kikuchi and Lee (1987) analyzed the spectral stability of weakly coupled external cavity semiconductor lasers. They used the same type of equations as Lang and Kobayashi, Eqs. (34) and (37), and solved the two equations numerically to find the field spectrum by simulation. They concluded that the relation between the external cavity mode spacing $f_{ext}$ and the relaxation frequency of the carrier density $f_R$ determines the spectral stability. For good spectral stability, it is important that the external mode spacing be much larger than the relaxation resonance frequency of the solitary laser. Results from computer simulations suggest that the occurrence of mode hopping (Schunk and Petermann, 1988; Mørk and Tromborg, 1990), low-frequency intensity fluctuations (Mørk et al., 1988), and the onset of coherence collapse (Olesen et al., 1986; Schunk and Petermann, 1988; Schunk and Petermann, 1989b) can be correctly predicted by the Lang-Kobayashi equations. For feedback levels below about -45 dB one may assume that the light intensity is constant, which leaves the phase of the electric field as an independent variable of the system.
This approximation leads to a potential model (Mørk and Tromborg, 1990; Mørk et al., 1990b; Lenstra, 1991), which explains why external cavity lasers prefer to oscillate at the mode with minimum linewidth instead of the mode with minimum threshold gain. The model also predicts the experimentally observed rates of mode hopping to good accuracy. However, the potential model does not apply to the regime of coherence collapse. Mirasso and Hernandez-Garcia (1994) studied the effects of current modulation on the timing jitter of semiconductor lasers with short external cavities of the kind suitable for packaged laser diodes. Using the usual Lang-Kobayashi equations, they analyzed the statistical properties of the turn-on time of a laser diode. Besnard et al. (1993) added multiple feedback contributions (Hjelme and Mickelson, 1987; Favre and Le Guen, 1985) to the basic Lang-Kobayashi equations and qualitatively explained many of the new phenomena observed by them (see Section IV-B that follows).
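The linewidth ratio of Eq. (44) can be evaluated once the phase condition of Eq. (45) has been solved for ωτ. The sketch below is a minimal illustration that only handles the case C = X(1 + α²)^{1/2} < 1, where the phase condition has a single solution and plain bisection is sufficient. The function names and the example parameter values are assumptions made for illustration, not values from the cited works.

```python
import math


def solve_phase_condition(omega0_tau, X, alpha, iterations=200):
    """Solve omega0*tau = y + C*sin(y + psi) for y = omega*tau by bisection (C < 1 only)."""
    C = X * math.sqrt(1.0 + alpha**2)
    if not 0.0 <= C < 1.0:
        raise ValueError("this sketch only handles 0 <= X*sqrt(1 + alpha^2) < 1")
    psi = math.atan(alpha)

    def residual(y):
        return y + C * math.sin(y + psi) - omega0_tau

    # residual is increasing (slope >= 1 - C > 0) and changes sign on this bracket
    lo, hi = omega0_tau - 1.0, omega0_tau + 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def linewidth_ratio(omega0_tau, X, alpha):
    """Linewidth with feedback relative to the solitary linewidth, Eq. (44)."""
    C = X * math.sqrt(1.0 + alpha**2)
    y = solve_phase_condition(omega0_tau, X, alpha)
    return 1.0 / (1.0 + C * math.cos(y + math.atan(alpha))) ** 2


# Scan the feedback phase over one period for an illustrative X = 0.1, alpha = 3.
ratios = [linewidth_ratio(2.0 * math.pi * k / 50.0, 0.1, 3.0) for k in range(50)]
print(f"min ratio = {min(ratios):.3f}, max ratio = {max(ratios):.3f}")
```

Depending on the feedback phase, the same feedback strength narrows or broadens the line, in keeping with the phase-dependent narrowing and broadening described for weak feedback.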
B. Coherence Collapse
Lenstra et al. (1985) developed a set of equations for a laser diode under optical feedback to model the coherence-collapsed state by assuming that the fluctuating phase difference $\phi(t) - \phi(t-\tau)$ is not very small, where $\tau$ is the external cavity roundtrip time. Their equations are:
$$\frac{d}{dt}\phi(t) = \tfrac{1}{2}\alpha\xi\,\Delta N(t) + F_\phi(t) + \gamma S_i(t), \qquad (47)$$
$$\frac{d}{dt}\Delta I(t) = \xi I\,\Delta N(t) + F_I(t) + 2I\gamma S_r(t), \qquad (48)$$
$$\frac{d}{dt}N(t) = -2\lambda_R\,\Delta N(t) - \frac{\Omega_R^2}{\xi I}\,\Delta I(t) + F_N(t). \qquad (49)$$
$\Delta I(t)$ and $\Delta N(t)$ are the fluctuations in photon number and carrier number, respectively, $\alpha$ is the linewidth enhancement factor (Spano et al., 1984; Agrawal, 1984; Henry, 1983), and $\xi = \partial g/\partial N$ is the differential gain ($g$ is the stimulated emission rate or gain and $N$ is the mean carrier number); $I$ is the mean number of photons in the mode. In addition, $\Omega_R$ and $\lambda_R$ are the relaxation oscillation frequency and damping constant, respectively, and $F_\phi$, $F_I$, and $F_N$ are the Langevin forces, which model spontaneous transitions as a noise source acting on the phase, photon number, and carrier number, respectively. Here $c$ is the velocity of light, $R$ the power reflectivity of the laser mirror, $r$ the fraction of power reflected back onto the laser facet, $l_{d,eff}$ the effective diode cavity length, and $f$ is the fraction of the reflected field that couples back into the lasing mode due to diffraction-limited imaging. The quantities $S_r$ and $S_i$ are given by:
$$S_r(t) + jS_i(t) = S(t) = e^{-j\omega\tau}\left[e^{-j[\phi(t) - \phi(t-\tau)]} - G(\tau)\right], \qquad (51)$$
where $\omega$ is the mean frequency of the laser field and $G$ is the stationary-state correlation function
$$G(t) = \left\langle \exp\{-j[\phi(t') - \phi(t' - t)]\} \right\rangle, \qquad (52)$$
where $\langle\ \rangle$ denotes averaging over $t'$. Using these equations, they derived the mean square fluctuations as
$$[\Delta_\phi(t)]^2 = \left\langle [\phi(t'+t) - \phi(t')]^2 \right\rangle$$
where
$$A_1 = \int |G(t)|^2\,dt, \qquad (54)$$
$$A_2 = \int |G(t)|^2\cos(\Omega_R t)\,dt, \qquad (55)$$
$$G(t) = \exp\left(-\tfrac{1}{2}[\Delta_\phi(t)]^2\right), \qquad (56)$$
and a correlation function that self-consistently satisfies Eqs. (54) and (55) for very small $|G(t)|$ has been derived. For such a $|G(t)|$, the quantity $-\ln(G(t))/t$ has a slope given by
p- 8
r;/3a4/3
(1 -
9)2i3 p 3 .
(57)
The preceding expression was used for verifying experimental results on coherence collapse, and was able to explain the experimentally obtained linewidth broadening. Cohen, Lenstra, and coworkers (Cohen and Lenstra, 1989; Cohen et al., 1990) further studied coherence collapse by considering the light injected from the external cavity as a noise source. They obtained a statistical description (Dorizzi et al., 1987) of the collapsed state by self-consistency calculations that agree well with experiments. However, some discrepancies between theory and measurements have been detected (Cohen et al., 1990). Wang and Petermann (1991) used a similar approach to obtain an upper limit on the RIN due to optical feedback. The approach cannot, however, determine which route to chaos a laser follows under optical feedback. Another approach to a simplified analysis of the collapsed state is the injection locking model introduced by Henry and Kazarinov (1986). In this model, the feedback system is replaced by a laser diode exposed to injection of the stationary feedback field. Mørk et al. (1988) showed that the injection locking model can explain the characteristic pattern of intensity dropouts observed in the time evolution of the intensity for low bias currents (Temkin et al., 1986; Sacher et al., 1989). Mørk et al. (1992) analyzed the stability of semiconductor lasers under optical feedback and found that within the regime of coherence collapse, the laser dynamics display the typical characteristics of chaos. They also found that two attractors associated with the same external cavity mode coexist but have different relaxation oscillation frequencies. Spontaneous emission leads to random jumping between the two attractors, which results in two strong peaks in the intensity noise spectrum. The origin of the second attractor was identified as a second Hopf bifurcation from an unstable external cavity mode. Li and McInerney (1993) showed from theoretical
calculations that the coherence collapsed state is a chaotic attractor with a fractal dimension between 2 and 3, even when realistic spontaneous emission is included in the equations.

C. Bistability under Optical Feedback
In a semiconductor laser operating close to threshold under strong feedback, the transition to chaos is preceded by random drops of the intensity, which give rise to a kink in the light-current characteristics (Henry and Kazarinov, 1986). Weak optical feedback can transform the relaxation oscillation of the solitary laser diode into self-sustained oscillations. The low-frequency fluctuations can be explained by a transient bistability (Mørk et al., 1988). The bistability is caused by the competition between the external resonator mode of maximum gain reduction and another mode of smaller linewidth. The latter mode can live on a timescale of less than one round trip in the external cavity. Above a critical feedback level the stationary solution of the deterministic equation loses stability. One or two coexisting limit cycles with different oscillation frequencies close to the relaxation oscillation frequency $\Omega$ turn up in numerical simulations, in agreement with experimental findings (Mørk et al., 1990b; Mørk et al., 1992). The presence of limit cycles depends on $\Omega\tau$, where $\tau$ is the round-trip time in the external cavity. Starting from the Lang-Kobayashi equations, Ritter and Haug (1993) studied the bistability of limit cycles created by Hopf bifurcations from the same external cavity mode of a single-mode semiconductor laser with optical feedback. They analyzed the pulsation amplitudes, frequencies, and the range of bistability under weak optical feedback. A short external cavity with a length of less than a few millimeters has been shown to avoid the coherence collapse regime (Mørk et al., 1992; Schunk and Petermann, 1989b). A 50% increase in the bandwidth of direct current modulation was also shown to be attainable for short external cavity laser diodes, owing to the dependence of the effective differential gain on the detuning between the laser diode cavity and the external cavity (Lau and Yariv, 1985; Agrawal and Henry, 1988; Elenkrig et al., 1990).
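To give a feel for the scales involved, the following minimal sketch evaluates the external cavity round-trip time and its product with the relaxation oscillation angular frequency for a short and a long cavity. The relation tau = 2nL/c for a simple single-pass external cavity, the 3 GHz relaxation oscillation frequency, and the two cavity lengths are illustrative assumptions, not values taken from the studies cited above.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def round_trip_time(cavity_length_m: float, refractive_index: float = 1.0) -> float:
    """Round-trip delay tau = 2 n L / c of a simple external cavity, in seconds."""
    return 2.0 * refractive_index * cavity_length_m / C_LIGHT


def omega_tau(relaxation_freq_hz: float, cavity_length_m: float) -> float:
    """Dimensionless product Omega * tau, with Omega = 2 * pi * f_R."""
    return 2.0 * math.pi * relaxation_freq_hz * round_trip_time(cavity_length_m)


if __name__ == "__main__":
    f_r = 3e9  # assumed relaxation oscillation frequency (illustrative), Hz
    for length in (2e-3, 0.3):  # hypothetical 2 mm and 30 cm air cavities
        tau = round_trip_time(length)
        print(f"L = {length:g} m: tau = {tau:.3e} s, "
              f"Omega*tau = {omega_tau(f_r, length):.3f}")
```

Under these assumptions the millimeter-scale cavity gives a product well below unity, whereas the 30 cm cavity gives a product much larger than unity, which is one way to see why short and long external cavities are usually treated as distinct regimes.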
The effect of a short external cavity on the modulation characteristics has been studied by a number of researchers (Schunk and Petermann, 1989a,b; Lau and Yariv, 1985; Agrawal and Henry, 1988; Suris and Tager, 1985; Lau, 1988; Tromborg et al., 1984; Tager and Elenkrig, 1993).

D. Mode Competition Noise

Due to external optical feedback, multiple modes can oscillate simultaneously in a semiconductor laser. The nonlinear interactions among various
FIGURE 12. Variation of the ratio of powers between two modes as a function of the threshold gain difference between the two modes (horizontal axis: normalized threshold level difference). There is a region where the modal power ratio is multivalued. Bistability is observed in this multivalued region. (After Yamada, 1986.)
modes give rise to mode competition, that is, competition among the modes for the total optical power. Yamada (1986) investigated the mode competition noise in diode lasers. An example for the case of two competing modes is shown in Fig. 12, a typical plot of the modal power ratio (between the two competing modes) as a function of the normalized threshold gain difference between the two modes. There is a region in which multiple values of the modal power ratio are possible for the same threshold level difference between the two modes. This can give rise to bistability as well as mode competition between the two modes. Yamada and Suhara (1990) analyzed the noise properties of semiconductor lasers from the viewpoint of mode competition. The rate equations for the mean photon number $S_N$ of mode $N$ and the injected electron density $q$ were obtained by taking the nonlinear interaction among lasing modes into account as (Suhara et al., 1994; Alam et al., 1997a):
D
= 2B
[
x j(w,
- W&f)
+ 7s1 I,,I - I , I,. -*-
-
where $A_N$ is the linear gain coefficient, $\xi$ is the confinement factor of the optical field into the active region, $a$ and $b$ are the coefficients giving the gain slope and the wavelength dispersion relation, respectively, $\lambda_0$ is the wavelength at the gain peak, $q_g$ is the transparent level of the electron density, $B$ and $D$ are the gain saturation coefficients for the identical mode and different mode, respectively, due to the burning effect on energy of the laser polarization characterized by the intraband relaxation time $\tau_{in}$ for the electron wave (Yamada, 1983), $R_{cv}$ is the dipole moment, $n$ is the refractive index, and $q_s$ is an injection level characterizing the gain saturation coefficient. $H_{N(M)}$ is another saturation coefficient due to the beating vibration of the injected electron density, giving an asymmetric saturation profile on photon energy (Ogasawara and Ito, 1988; Yamada, 1989); $q_{th}$ is the threshold electron density, $V$ is the volume of the active region, $\tau_s$ is the electron lifetime, and $I_g$ and $I_{th}$ are the transparent current level and the threshold current level, respectively, given by
and $\alpha$ is the linewidth enhancement factor (Henry, 1982). $G_{th(N)}$ is the threshold gain level for mode $N$ given by

$$G_{th(N)} = \frac{\int_0^L g_{th(N)} \left[ |E_N^{(+)}(z)|^2 + |E_N^{(-)}(z)|^2 \right] |F_N(x, y)|^2 \, dz}{\int_0^L \left[ |E_N^{(+)}(z)|^2 + |E_N^{(-)}(z)|^2 \right] |F_N(x, y)|^2 \, dz} \qquad (65)$$

where $g_{th(N)}$
is a mode solution of the oscillation condition Eq. (28) or Eq. (33). $C$ is the spontaneous emission factor (Suematsu and Furuya, 1977), defined as the ratio of the spontaneous field going into a lasing field. Here $\mathcal{F}_N(t)$ and $\mathcal{F}_q(t)$ are fluctuation components due to spontaneous emission. Correlation functions among these fluctuation terms are given as follows:
Here, $\mathcal{F}_{N\Omega}$ and $\mathcal{F}_{q\Omega}$ are the frequency components of the fluctuation terms $\mathcal{F}_N(t)$ and $\mathcal{F}_q(t)$, respectively. $\bar{S}_N$ and $\bar{q}$ are the dc components of photon number and electron density, respectively. The nonlinear interactions among lasing modes are described in terms of $D$ and $H_{N(M)}$ in Eq. (58), and are called the mode competition phenomena. Mode competition enhances fluctuations due to spontaneous emission, causing excess intensity noise when more than one mode is present, because the $H_{N(M)}$ terms are zero for single-mode oscillation but nonzero for multimode operation. The RIN can be calculated from
where $S_{N\Omega}$ is the fluctuation component of the photon number at the angular frequency $\Omega$. Wu and Chang (1993b) analyzed the mode partition noise in semiconductor lasers with optical feedback for CW and dynamic operation. They used simulation techniques to study photon statistics and RIN spectra for the main mode and one side mode under optical feedback.
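As a rough illustration of what a RIN figure means in practice, the sketch below estimates RIN from a sampled photon-number (or power) record. It uses the common time-domain definition, the variance of the fluctuation divided by the squared mean, and then divides by a measurement bandwidth on the assumption of a flat noise spectrum. This is a simplification and not the spectral expression of the mode competition theory described above; the function names and the synthetic samples are hypothetical.

```python
import math


def integrated_rin(samples):
    """Variance of the samples divided by the squared mean (dimensionless)."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / n
    return variance / mean ** 2


def rin_db_per_hz(samples, bandwidth_hz):
    """Average RIN density in dB/Hz, assuming the noise is flat over the bandwidth."""
    return 10.0 * math.log10(integrated_rin(samples) / bandwidth_hz)


if __name__ == "__main__":
    # Synthetic record: mean level 100 with a +/-1 fluctuation (illustrative only).
    record = [99.0, 101.0] * 500
    print(f"integrated RIN = {integrated_rin(record):.1e}")
    print(f"RIN density    = {rin_db_per_hz(record, 1e6):.1f} dB/Hz over an assumed 1 MHz")
```

Because the result scales with the squared mean, the same absolute fluctuation corresponds to a lower RIN at higher output power, which is why RIN values are usually quoted together with the operating point.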
V. EXTERNAL OPTICAL FEEDBACK SENSITIVITY

A distributed feedback semiconductor laser usually operates with one or more modes with extremely narrow linewidth compared to Fabry-Perot type semiconductor lasers. Solutions of Eq. (28) or Eq. (33), for example, give the modes of oscillation of a DFB laser. Each mode has its own
oscillation frequency and threshold gain. Analysis of optical feedback usually begins from the oscillation condition Eq. (28) or Eq. (33).

A. Sensitivity to Threshold Gain and Spectrum

Favre (1987) analyzed the effect of external optical feedback on threshold gain, resonant frequency, and spectral linewidth in DFB semiconductor lasers. Starting from the oscillation condition (Eq. (28) with different notation), Favre used a linear expansion around the solitary laser mode solutions for threshold gain $\alpha$ and wave vector deviation $\delta$ ($\delta$ is related to the departure of the oscillation frequency $\omega$ from the Bragg oscillation frequency $\omega_B$ by $\delta = n(\omega - \omega_B)/c$, where $n$ is the mean effective refractive index and $c$ is the velocity of light in free space). He defined an external optical feedback sensitivity parameter $C$ as follows:

$$\Delta\alpha L - j\,\Delta\delta L = C r e^{-j\omega\tau} \qquad (70)$$
where $\Delta\alpha$ is the change in threshold gain, $\Delta\delta$ is the change in wave vector deviation, $r$ is the power reflectivity of the external reflector, and $\tau$ is the round-trip delay in the external cavity. The $C$ parameter is a complex quantity and depends only on the solitary DFB laser modal characteristics. This parameter is a measure of how strongly a change in the reflectivity of one of the facets affects the threshold gain and oscillation frequency of the DFB laser. Favre showed that for a DFB laser, the $C$ parameters for the left and right facets are related by
where $P_r$ and $P_l$ are the transmitted powers through the right and left facets, respectively, and $C_r$ and $C_l$ are the C-parameters for the right and left facets, respectively. In the case of Fabry-Perot type laser diodes, the C-parameter for a facet under optical feedback was found to be
where $\rho$ is the reflectivity of the facet without optical feedback. For a DFB laser with its left facet antireflection (AR) coated, the C-parameters for the right and left facets, $C_r$ and $C_l$, respectively, were derived by Favre as:
where $q_0 = \alpha_0 - j\delta_0$ is the value without feedback, $L$ is the laser length, $\kappa$ is the index coupling coefficient, $\rho_r$ is the reflectivity of the right facet, and $\Psi_r$ ($\Psi_l$) is the grating phase at the right (left) facet. A similar equation for the C parameter of an antireflection (AR) coated phase-shifted DFB laser subjected to weak external feedback was derived as:
c = -(Y0/40) S
= (1 -
+
U
'
+ p ; ) - (1 - p;02)e-YoL- ( p i - OZ)eYoL - (2~20- 02)eYoL + 2(1 - 0 2 ) ] [ 4 p t / ( i+
02)(1
7'= [O'e-Yo" U
S T
= po(pi - OZ)eYOL - ( 1
- pit)2)e-y0L/po
(75)
where $\gamma$ is defined by

$$\gamma^2 = (\alpha - j\delta)^2 + \kappa^2. \qquad (76)$$
A mode solution of the oscillation condition (Eq. (28), for example) represents a point $(\alpha_0, \delta_0)$ in the $\alpha$-$\delta$ plane and defines the corresponding $\gamma_0$ obtained from Eq. (76). Other parameters appearing in Eq. (75) are as follows: $\rho_0 = (-\gamma_0 + \alpha_0 - j\delta_0)/j\kappa$, $q_0 = \alpha_0 - j\delta_0$, and $\Theta = e^{-j\Omega}$, where $\Omega$ is the corrugation initial phase at the phase shift position. Using the foregoing analysis, Favre's 1987 study of index-coupled conventional (without any phase shift) DFB lasers with a cleaved right facet and an AR-coated left facet (CL-AR) having low coupling strengths ($\kappa L < 1$) showed that the sensitivity to optical feedback for these DFB lasers through their cleaved facet is similar to that of Fabry-Perot lasers. A conventional CL-AR DFB laser with $\kappa L = 4$ was found to be about five times less sensitive to optical feedback through the cleaved facet than Fabry-Perot lasers. Considering feedback through the AR-coated facet, Favre found that DFB lasers are ten times more sensitive to feedback than Fabry-Perot lasers for $\kappa L = 0.35$, and that the DFB laser is less sensitive through the AR-coated facet than a Fabry-Perot laser only for coupling strength $\kappa L > 4$. Favre also analyzed the phase-shifted index-coupled DFB laser and found that for large values of $\kappa L$, the optimum phase shift for the lowest sensitivity is $2\Omega = \pi$, which is the quarter-wave-shifted (QWS) DFB laser. A plot of the absolute value of the C parameter for a QWS index-coupled DFB laser with AR-coated facets is shown in Fig. 13. It can be noted that QWS DFB lasers are less sensitive to optical feedback than Fabry-Perot type lasers with cleaved facets for $\kappa L > 2.5$. Favre also found that the ratio of the intensity envelope at the end
FIGURE 13. Plot of the absolute value of the C-parameter as a function of coupling strength $\kappa L$ for an AR-coated QWS DFB laser. The C-parameter for a cleaved Fabry-Perot type semiconductor laser is also shown for comparison. (After Favre, 1987.)
to the intensity at the center, $I(L/2)/I(0)$, also varies with $\kappa L$ almost identically to the way $|C|$ varies with $\kappa L$. Beylat and Jacquet (1988) reported simulation results on the optimum facet reflectivity for minimum sensitivity to external optical feedback. Favre (1987, 1991) used a linear gain theory, which provides steady-state lasing conditions at threshold, to analyze the effect of external optical feedback on DFB lasers. However, it has been shown that the axial nonuniformity of carrier density, that is, longitudinal spatial hole burning (SHB), is an important phenomenon in DFB lasers (Whiteaway et al., 1989; Ketelsen et al., 1991; Phillips et al., 1992) and should be taken into account when a DFB laser operates above threshold. Wu and Chang (1993a) used an axially nonuniform carrier density to take into account the axial variations of gain and refractive index due to SHB effects above threshold. They proposed an axially averaged C-parameter and compared the external feedback sensitivity of AR-coated QWS DFB lasers for different injection currents. They found that increasing the injection current also increases the averaged C-parameter. Favre (1991) extended his analysis to complex-coupled lasers as well. He concluded that although the modal characteristics (selectivity, threshold,
etc.) of pure gain-coupled or partially gain-coupled (complex-coupled) DFB semiconductor lasers showed immunity to external feedback, the external feedback sensitivity parameter (C-parameter) for gain-coupled DFB lasers is comparable to that of index-coupled lasers. The basic parameter that mainly determines feedback sensitivity is the absolute value of the complex coupling strength $\tilde{\kappa}L$ ($\tilde{\kappa}$ is the complex coupling coefficient and $L$ is the laser length). Recently, Hirono et al. (1992) proposed an analytical expression for the sensitivity of DFB lasers to external optical feedback. They reported that the sensitivity is proportional to the ratio between the output power from the reflector-side facet and the magnitude of the Lagrangian of the electromagnetic field in the cavity. Hui et al. (1994) analyzed the external feedback sensitivity of complex-coupled DFB lasers above threshold, taking into account SHB effects. They found that although pure index gratings and partial gain gratings in DFB lasers exhibit comparable sensitivity to external optical feedback at threshold, a gain grating reduces the feedback sensitivity when the lasers operate well above threshold, especially when the coupling strength $\tilde{\kappa}L$ is high.
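Returning to the definition of the C parameter in Eq. (70), a short numerical sketch may help make its role concrete. Given a complex C, an external reflectivity r, and a round-trip phase, the real part of the right-hand side gives the normalized threshold gain change and the negative of the imaginary part gives the normalized wave vector deviation change. The value of C used below is hypothetical and chosen only for illustration; it is not taken from Favre (1987).

```python
import cmath
import math
from typing import Tuple


def feedback_shifts(c_param: complex, r: float, omega_tau: float) -> Tuple[float, float]:
    """Return (delta_alpha * L, delta_delta * L) from
    delta_alpha*L - j*delta_delta*L = C * r * exp(-j * omega * tau)."""
    rhs = c_param * r * cmath.exp(-1j * omega_tau)
    return rhs.real, -rhs.imag


if __name__ == "__main__":
    c_param = 0.5 - 0.2j  # hypothetical sensitivity parameter (illustrative)
    r = 1e-3              # external reflectivity entering Eq. (70)
    for k in range(4):
        phase = k * math.pi / 2.0  # round-trip phase omega * tau
        d_alpha_l, d_delta_l = feedback_shifts(c_param, r, phase)
        print(f"omega*tau = {phase:.2f}: d(alpha)L = {d_alpha_l:+.2e}, "
              f"d(delta)L = {d_delta_l:+.2e}")
```

As the round-trip phase is swept, both shifts oscillate with an amplitude of |C| r, so a laser with a smaller |C| sees proportionally smaller excursions of threshold gain and oscillation frequency for the same reflector.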
B. Feedback Sensitivity Based on Mode Competition Theory

Most of the experimentally observed phenomena, including intensity fluctuations, mode hopping, transition to a chaotic state, and coherence collapse, usually require relatively high levels of feedback, of the order of -30 dB. On the other hand, mode-competition-induced noise begins to increase at much lower feedback levels, of the order of -50 dB (Yamada, 1986; Yamada and Suhara, 1990). Based on the mode competition theory, a critical feedback ratio can be defined above which external cavity modes start to appear around the main laser modes, generating excess noise. Figure 14 shows a typical optical spectrum in the presence of external optical feedback. The semiconductor laser exhibits two groups of lasing modes under the influence of external feedback. One group consists of the internal cavity modes (p-modes), which are determined by the structural parameters of the laser cavity itself. The other group consists of external cavity modes (m-modes), which build up around each internal cavity mode with frequency separations from the p-modes characterized by the distance to the reflection point. When the effective feedback ratio $q_T$ (see Section II-B) is small enough, a correctly designed DFB laser operates in a single longitudinal mode. On the other hand, when the external optical feedback ratio is increased beyond a certain minimum value of $q_T$ (the critical feedback ratio), external cavity modes start
FIGURE 14. Optical spectrum of internal and external cavity modes of a DFB laser. The internal cavity modes are denoted by p-modes (p-1, p, p+1), while the external cavity modes are denoted m and m'.
to build up with oscillation frequencies close to each of the internal cavity mode oscillation frequencies. The critical feedback ratio above which RIN increases abruptly to a high value (Schunk and Petermann, 1989b; Suhara et al., 1994) also corresponds to the feedback ratio below which only a single internal cavity mode around the Bragg wavelength can exist. Thus, a semiconductor laser with a higher critical feedback ratio is one that can withstand higher levels of external optical feedback without generating excess noise, that is, a laser that is less sensitive to external optical feedback. Suhara et al. (1994) and Alam et al. (1997a) analyzed the critical feedback ratio in index-coupled and complex-coupled DFB semiconductor lasers for various configurations of DFB laser structural parameters. They found that the critical feedback ratio depends significantly on the reflectivities of the facets as well as on the phase of the grating at a facet if that facet is not antireflection (AR) coated. They also found that phase-shifted index-coupled lasers show better immunity to external optical feedback when the phase shift is near $\pi/2$, although the optimum phase shift for the highest critical feedback ratio may vary slightly from $\pi/2$. The QWS laser was found to have the highest critical feedback ratio, especially at higher coupling strengths. The critical feedback ratio of a QWS laser as a function of coupling strength $\kappa_i L$ is shown in Fig. 15. The partly gain-coupled DFB laser was found to have a lower critical feedback ratio, and the conventional DFB laser was found to have the lowest. For complex-coupled DFB lasers, Alam et al. (1997b) found that the critical feedback ratio varied widely when the relative strengths of index and gain coupling are changed while the total coupling strength $|\tilde{\kappa}L|$ is kept constant. Figure 16 shows the critical feedback ratio as a function of the index-to-gain coupling ratio $\kappa_i/\kappa_g$ for $|\tilde{\kappa}L| = 1.5$.
This is in contrast to the
FIGURE 15. The critical feedback ratio in a QWS DFB semiconductor laser as a function of coupling strength $\kappa_i L$ (horizontal axis: coupling coefficient × laser length). (After Alam et al., 1997b.)
finding of Favre (1991), who concluded that the relative strengths of index and gain coupling have virtually no effect on external feedback sensitivity when $|\tilde{\kappa}L|$ remains constant. Alam et al. (1997b) also analyzed the external feedback sensitivity of asymmetric QWS DFB lasers, which were found experimentally (Kurosaki et al., 1994) to be less sensitive to external feedback than QWS lasers with the phase shift at the center of the laser. They found that asymmetry in the facet reflectivities combined with asymmetry in the position of the phase shift can increase the mode selectivity of DFB lasers as well as reduce their sensitivity to external feedback. For higher mode selectivity as well as a higher critical feedback ratio, the position of the phase shift has to be moved axially towards the facet with the higher reflectivity of the two. In the case of a DFB laser with one facet cleaved and the other facet AR-coated (CL-AR structure), the optimum phase shift position was found to be about one-third of the total laser length away from the cleaved facet. Increasing the reflectivity of a facet, however, reduces the yield because of the statistical variation of the corrugation phase at the facet during DFB laser fabrication.
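To put the feedback levels quoted in this subsection in perspective, the sketch below converts the two decibel levels to linear power ratios and estimates the frequency spacing of external cavity modes for a given distance to the reflection point. The spacing formula c/(2nl) is the textbook result for a simple two-mirror external cavity and is used here as an assumption; the 15 cm distance is a hypothetical example.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def db_to_power_ratio(level_db: float) -> float:
    """Convert a feedback level in dB to a linear power ratio."""
    return 10.0 ** (level_db / 10.0)


def external_mode_spacing_hz(distance_m: float, refractive_index: float = 1.0) -> float:
    """Spacing c / (2 n l) between adjacent external cavity modes (simple-cavity assumption)."""
    return C_LIGHT / (2.0 * refractive_index * distance_m)


if __name__ == "__main__":
    for level in (-30.0, -50.0):
        print(f"{level:.0f} dB -> power ratio {db_to_power_ratio(level):.1e}")
    # Hypothetical reflection point 15 cm from the laser facet, in air.
    spacing = external_mode_spacing_hz(0.15)
    print(f"external cavity mode spacing: {spacing / 1e9:.2f} GHz")
```

The two levels differ by a factor of 100 in reflected power, which illustrates how much earlier mode competition noise can set in than the more dramatic feedback effects.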
FIGURE 16. Variation of the critical feedback ratio in a complex-coupled DFB laser as a function of the index-to-gain coupling ratio ($\kappa_i/\kappa_g$) when the total coupling $|\tilde{\kappa}L| = 1.5$. (After Alam et al., 1997a, b.)
VI. CONCLUSION

As we move towards the twenty-first century, DFB semiconductor lasers will play increasingly important roles due to the ever-increasing demand for high-speed optical communications, optical data storage, and other applications that require narrow-linewidth, cost-effective, lightweight laser sources. We have introduced here the basic concepts of DFB laser electromagnetics and the characteristics that are relevant to external optical feedback. We then described a number of experimentally observed phenomena in semiconductor lasers in general, although some experimental results are particular to DFB lasers. We also discussed some of the theoretical models that explain these experimental phenomena. Finally, we discussed the external optical feedback sensitivity of semiconductor lasers, where most of the discussion is specific to DFB lasers.
REFERENCES Achtenhagen, M., Miles, R. O., Hardy, A,. and Reinhart, F. K. (1996). Erect of the external reflector position on the threshold current in complex-coupled D F B laser diodes, Electron. Lett., 32: 334. Acket, G. A,, Lenstra, D., den Boef, A. J., and Verbeek, B. H. (1984). The influence of feedback intensity on longitudinal mode properties and optical noise in index-guided semiconductor lasers, lEEE J . Quantum Elecron., QE-20: 1163. Agrawal. G. P. (1984). Line narrowing in a single-mode injection laser due to external optical feedback. IEEE J . Qunntum Electron., QE-20: 468. Agrawal, G . P. and Henry, C. H. (1988). Modulation performance of a semiconductor laser coupled to an external high-Q resonator, IEEE J . Quantum Electron., QE-24: 134. Aiki, K., Nakamura, M., Umeda, J., Yariv A,, Katzir, A,, and Yen, H. W. (1975). GaAs-GaAIAs distributed feedback diode lasers with separate optical and carrier confinement, Appl. Phys. Lett.. 27: 145. Alam, M. F., Karim, M. A,, and Islam. S. (1997a). Effects of structural parameters on the external optical feedback sensitivity in DFB semiconductor lasers, IEEE J . Quantum Electron., 3 3 424. Alam, M. F., Karim, M. A., and Islam, S. (1997b). Analysis of external optical feedback characteristics of asymmetric, quarter-wave-shifted, distributed feedback semiconductor lasers, Appl. Opt., 36: 4131. Benoist. K. W. (1996). Influence of external frequency shifted feedback on a D F B semiconductor laser, IEEE Phaton. Technol. Lett., 8: 25. Besnard, P., Meziane, B., and Stephan, G. M. (1993). Feedback phenomena in a semiconductor laser induced by distant reflectors, IEEE J . Quantum Electron., 2 9 1271. Beylat, J. L. and Jacquet, J. (1988). Analysis of DFB semiconductor lasers with external optical feedback, Electron. Lett., 24: 509. Bogatov, A. P., Eliseev, P. G.,Ivanov, L. P., Logginov, A. S., Manko, M. A,. Senatorov, K. Ya. (1973). Study of the single-mode injection laser, l E E E J . Quantum Electron., QE-9: 392. 
Bourchert, B., Stegmuller, B., and Gessner, R. (1993). Fabrication and characteristics of improved strained quantum-well GaInAlAs gain-coupled DFB lasers, Electron. Lett., 2 9 2 10. Broom, R. F., Mohn, E., Risch, C., and Salathe, R. (1970). Microwave self-modulation of a diode coupled to an external cavity, IEEE J . Quantum Elerrron., QE-6 328. Casey Jr., H. C., and Panish, M. B. (1978). Hcterostructure Lusers. New York: Academic Press. Chinone. N., Aiki, K., and Ito, R. (1978). Stabilization of semiconductor laser outputs by a mirror close to a laser facet, Appl. Phys. Lett., 3 3 990. Cho. Y. and Umeda, M. (1984). Chaos in laser oscillations with delayed feedback: numerical analysis and observation using semiconductor laser, J . Opt. Soc. Am., B 1: 497. Chuang, S. L. (1995). Physics of Optoelectronic Devices. Chapter 10, New York: Wiley. Chuang, Z. M., Wang, C. Y., Lin, W., Liao, H. H., Su, J. Y., and Tu. Y. K. (1996). Very-low-threshold, highly efficient, and low-chirp 1.55-pn complex-coupled D F B lasers with a current-blocking grating, IEEE Photon. Techno/. Lett.. 8: 1438. Cohen, J. S. and Lenstra, D. (1989). Spectral properties of the coherence collapsed state of a semiconductor laser with delayed optical feedback. IEEE J . Quantum Eletran., 2 5 1143. Cohen, J. S., Wittgrefe, F., Hoogerland, M. D., and Woerdinan. J. P. (1990). Optical spectra of a semiconductor laser with incoherent optical feedback, IEEE J . Quantum Electron., 26: 982. David, K., Morthier, C., Vankwikelberge, P., and Baets, R. (1991). Gain-coupled DFB lasers versus index-coupled and phase-shifted DFB lasers: a comparison based on spatial hole burning corrected yield, IEEE J . Quantum Electron., QE-27: 1714.
116
MOHAMMAD F. ALAM AND MOHAMMAD A. KARIM
David, K., Buus, J., and Baets, R. G. (1992). Basic analysis of AR-coated, partly gain-coupled DFB lasers: the standing wave effect, IEEE J . Quanrum Electron., QE-28: 427. Dente, G. C., Durkin, P. S., Wilson, K. A., and Moeller, C. E. (1988). Chaos in the coherence collapse of semiconductor lasers, IEEE J . Quantum Electron., 2 4 2441. Dorizzi, B., Grammaticos, B., Le Berre, M., Pomeau, Y., Ressayre, E., and Tallet, A. (1987). Statistics and dimension of chaos in differential delay systems, Phys. Rev., A 35: 328. Elenkrig, B. B., Nesterenko, A. G., and Tager, A. A. (1990). Modulation bandwidth limits for semiconductor lasers with compound selective cavities, Int. J . Optoelectron., 5: 523. Favre, F. and Le Guen, D. (1985). Spectral properties of a semiconductor laser coupled to a single-mode fiber resonator, l E E E J . Quantum Electron., QE-21: 19. Favre, F. (1987). Theoretical analysis of external optical feedback on DFB semiconductor lasers, IEEE J . Quantum Electron., QE-23 81. Favre, F. (1991). Sensitivity to external optical feedback for gain-coupled DFB semiconductor lasers, Electron. Lett., 27: 433. Fleming, M. W. and Mooradian, A. (1981). Spectral characteristics of external-cavity controlled semiconductor lasers, IEEE J . Quantum Electron., QE-17: 44. Fujita, T., Ishizuka, S., Fujito, K., Serizawa, H., and Sato, H. (1984). Intensity noise suppression and modulation characteristics of a laser diode coupled to an external cavity, IEEE J . Quantum Electron., QE20: 492. Fujiwara, M., Kubota, K., and Lang, R. (1981). Low-frequency intensity fluctuation in laser diodes with optical feedback, Appl. Phys. Lett., 3 8 217. Giles, C. R., Erdogan, T., and Mizrahi, V. (1994). Reflection-induced changes in the optical spectra of 980-nm QW lasers, IEEE Photonics Techno/. Lett., 6 903. Goldberg, L., Taylor, H. F., Dandridge, A,, Weller, J. F., and Miles, R. 0. (1980). Spectral characteristics of semiconductor lasers with optical feedback, IEEE Trans. 
Microwave Theory Tech., MTT-30 401. Goldberg, L., Taylor, H. F., Dandridge, A., Weller, J. F., and Miles, R. 0. (1982). Spectral characteristics of semiconductor lasers with optical feedback, IEEE J . Quantum Electron., QE-18: 555. Hamel, W. A,, van Exter, M. P., and Woerdman, J. P. (1992). Coherence properties of a semiconductor laser with feedback from a distant reflector: experiment and theory, IEEE J . Quuntum Electron., 2 8 1459. Henry, C. H. (1982). Theory of the linewidth of semiconductor lasers, IEEE J . Quuntum Electron., QE-18: 259. Henry, C. H. (1983). Theory of phase noise and power spectrum of a single-mode injection laser, IEEE J . Quantum Electron., QE-19 1391. Henry, C. H., and Kazarinov, R. F. (1986). Instability of semiconductor lasers due to optical feedback from distant reflectors, I E E E J . Quantum Electron, QE-22: 294. Hernandez-Garcia, E., Mirasso, C. R., Shore, K. A,, and San Miguel, M. (1994). Turn on jitter of external cavity semiconductor lasers, IEEE J . Quantum Electron., 30: 241. Hirono, T., Kurosaki, T., and Fukuda, M. (1992). A novel analytical expression of sensitivity to external optical feedback for DFB semiconductor lasers, I E E E J . Quantum Electron., 28: 2674. Hirota, 0. and Suematsu, Y.(1979). Noise properties of injection lasers due to reflected waves, IEEE J . Quantum Electron., QE-15: 142. Hjelme, A. R., and Mickelson, A. R. (1987). On the theory of external cavity operated single-mode semiconductor lasers, I E E E J . Quantum Electron., QE-23 1000. Hong, J., Makino, T., Lu, H., and Li, G. P. (1995). Effect of in-phase and antiphase gain coupling on high-speed properties of MQW DFB lasers, IEEE Photon. Technol. Leu., 7: 956. Huang, Y., Yamada, H., Okuda, T., Torikai, T., and Uji, T. (1996). External optical feedback
EXTERNAL OPTICAL FEEDBACK EFFECTS IN LASERS
117
resistant characteristics in partially corrugated-waveguide laser diodes, Electron. Letr., 32: 1008. Hui, R., Kavehrad, M., and Makino, T. (1994). External feedback sensitivity of partly gain-coupled DFB semiconductor lasers, IEEE Photon. Technol. Lett., 6 897. Ikushima, I. and Maeda, M. (1978). Self-coupled phenomena of semiconductor lasers caused by an optical fiber, IEEE J . Quantum Electron., QE-14: 331. Ito, M. and Kimura, T. (1980). Oscillation properties of AlGaAs DH lasers with an external grating, IEEE J . Quantum Electron., QE-16: 69. Kapon, E., Hardy, A., and Katzir, A. (1982). The effect of complex coupling coefficients on distributed feedback lasers, IEEE J . Quantum Electron., QE-18: 66. Kawai, T., Rahwanto, A,, Kitajima, K., Mori, M., Goto, T., and Miyauchi, A. (1995). Relative intensity noise of DFB L D s with near and far end reflections, IEICE Pans. Electron., E78 1779. Ketelsen, L. J. P., Hoshino, I., and Ackerman, D. A. (1991). The role of axially non-uniform carrier density in altering the TE-TE gain margin in InGaAsP-IP DFB lasers, IEEE J . Quantum Electron., 27: 957. Kikuchi, K. and Okoshi, T. (1982). Simple formula giving spectrum-narrowing ratio of semiconductor laser output obtained by optical feedback, Electron. Lett., 18: 10. Kikuchi, K. and Lee, T. (1987). Spectral stability analysis of weakly coupled external-cavity semiconductor lasers, J. Lightwave Technol., LT-5: 1269. Kobayashi, K. (1976). Improvements in direct pulse code modulation of semiconductor lasers by optical feedback, Pans. I.E.C.E. Japun, E59 8. Kogelnik, H., and Shank, C. V. (1972). Coupled wave theory of distributed feedback lasers, J . Appl. Phys., 43 2327. Kurosaki, T., Hirono, T., and Fukuda, M. (1994). Suppression of external cavity modes in DFB lasers with a high endurance against optical feedback, IEEE Photon. Technol. Lett., 6: 900. Lam, B., Kellner, A. L., Yu, P. K., Sushchik, M. M., and Abarbanel, H. D. (1996). 
Chaotic instabilities in modulated external-cavity semiconductor lasers, Pror. SPIE, 2610: 13. Lang, R. (1979). Lateral transverse mode instability and its stabilization in stripe geometry injection lasers, IEEE J . Quantum Electron., QE-15: 718. Lang, R. and Kobayashi, K. (1980). External optical feedback effects on semiconductor laser properties, IEEE J. Quantum Electron., QE-16: 347. Langley, L. N. and Shore, K. A. (1992). The effect of external optical feedback on the turn-on delay statistics of laser diodes under pseudo-random modulation, IEEE Photon. Technol. Lett., 4 1207. Langley, L. N. and Shore, K. A. (1993). The effect of external optical feedback on timing jitter in modulated laser diodes, J. Lightwave Technol.. 11: 434. Lau, K. Y. and Yariv, A. (1985). Detuned loading in coupled cavity semiconductor laserseffect on quantum noise and dynamics, IEEE J. Quantum Electron., QE-21: 121. Lau, K. Y. (1988). Efficient narrow-band direct modulation of semiconductor injection lasers at millimeter wave frequencies of 100GHz and beyond, Appl. Phys Lett., 5 2 2214. Lenstra, D., Verbeek, 9. H., and den Boef, A. J. (1985). Coherence collapse in single-mode semiconductor lasers due to optical feedback, IEEE J . Quantum Electron., QE-21: 674. Lenstra, D. (1991a). Feedback noise in single mode semiconductor lasers, SPIE Proc., 1376: 245. Lenstra, D. (1991b). Statistical theory of the multi-stable external-feedback laser, Opt. Commun., 81: 209. Li, G. P., Makino, T., Moore, R., and Puetz, N. (1992). 1.55 pm index/gain coupled DFB lasers with strained layer multi-quantum-well active grating, Elecrron. Lett., 28: 1726. Li, H., Ye, J., and Mclnerney, J. G. (1993). Detailed analysis of coherence collapse in semiconductor lasers, IEEE J . Quantum Electron., 2 9 2421.
Lowery, A. J., and Novak, D. (1994). Performance comparison of gain-coupled and indexcoupled DFB semiconductor lasers, I E E E J . Quantum Electron., 30: 2051. Luo, Y., Nakano, Y.. Tada, K., Inoue, T., Hosomatsu, H., and Iwaoka, H. (1990). Purely gain-coupled distributed feedback semiconductor lasers, Appl. Phys. Lett., 56: 1620. Luo, Y., Nakano, Y., Tada, K., Inoue, T., Hosomatsu, H., and Iwaoka, H. (1991). Fabrication and characteristics of gain-coupled distributed feedback semiconductor lasers with a corrugated active layer, IEEE J . Quantum Electron., 27: 1724. Miles, R. O., Dandridge, A., Tveten, A. B., Taylor, H. F., and Giallorenzi, T. G. (1980). Feedback-induced line broadening in CW channel-substrate planar laser diodes, Appl. Phys Lett., 37: 990, Mink, J. and Verbeek, B. H. (1986). Asymmetric noise and output power in semiconductor lasers with optical feedback near threshold, Appl. Phys. Lett., 48: 745. Mirasso, C. R. and Hernindez-Garcia, E. (1994). Effects of current modulation on timing jitter of single-mode semiconductor lasers in short external cavities, IEEE J . Quantum Electron., 30: 2281. Mitsuhashi, Y., Morikawa, T., Sakurai, K., Seko. A,, and Shimada, J. (1976). Self-coupled optical pickup, Opt. Commun., 1 7 95. Morikawa, T., Mitsubishi, Y., Shimada, J., and Kojima, Y. (1976). Return-beam induced oscillations in self-coupled semiconductor lasers, Electron. Lett., 12: 435. Msrk, J., Tromborg, B., and Christiansen, P. L. (1988). Bistability and low-frequency fluctuations in semiconductor lasers with optical feedback: a theoretical analysis, IEEE J . Quuntum Electron., 24: 123. Mark, J., Mark, J., and Tromborg, B. (1990a). Route to chaos and competition between relaxation oscillations for a semiconductor laser with optical feedback. Pliys. Rev. Lett., 65: 1999. Msrk, J. and Tromborg, B. (1990). The mechanism of mode selection for an external cavity laser, IEEE Plioton. Technol. Lett., 2: 21. Msrk, J., Semkow, M., and Tromborg, B. (1990b). 
Measurement and theory of mode hopping in external cavity lasers, Eleciron. Leit., 26: 609. Mark, J., Tromborg, B., and Mark, J. (1992). Chaos in semiconductor lasers with optical feedback: theory and experiment, IEEE J . Quantum Electron., 28: 93. Morthier, G., Vankwikelberge, P., David, K., and Baets, R. (1990). Improved performance of AR-coated DFB lasers by the introduction of gain coupling, IEEE Photon. Techno/. Lett., 2: 170. Mukai, T. and Otsuka, K. (1985). New route to optical chaos: Successive subharmonicoscillation cascade in a semiconductor laser coupled to an external cavity, Phys. Rev. Lett., 55: 1711. Nakano, Y., Deguchi, Y., Ikeda, K., Luo, Y. and Tada, K. (1991). Reduction of excess intensity noise by external reflection in a gain-coupled distributed feedback semiconductor laser, l E E E J . Qucintum Electron., 2 1 1732. Nakano, Y., Uchida, Y., and Tada, K. (1992). Highly efficient single longitudinal-mode oscillation capability of gain-coupled distributed feedback semiconductor lasers -advantage of asymmetric facet coating, IEEE Photon. Technol. Lett., 4 308. Ogasawara, N. and Ito, R. (1988). Longitudinal mode competition and asymmetric gain saturation in semiconductor injection lasers 11. Theory, Jupcan J . Appl. Phys., 27: 615. Olesen, H., Osmundsen, J. H., and Tromborg, B. (1986). Nonlinear dynamics and spectral behavior for an external cavity laser, IEEE J . Quantum Electron., QE-22: 762. Osmundsen, J. H., Tromborg, B., and Olesen, H. (1983). Experimental investigation of stability properties for a semiconductor laser with optical feedback, Electonics Lett., 19: 1068. Park, J. D., Seo, D. S., Mclnerney, J. G., Dente, G. C., and Osinski, M., (1989). Low frequency
self-pulsations in asymmetric external-cavity semiconductor lasers due to multiple feedback effects, Opt. Lett., 14 1054. Park, J. D., Seo, D. S., and McInerney, J. G. (1990). Self-pulsations in strongly-coupled asymmetric external-cavity semiconductor lasers, IEEE J . Quantum Electron., 26: 1353. Park, K. H., Lee, J. K., Han, J. H., Cho, H. S., Jang, D. H., Park, C. S., Pyun, K. E., and Jeong, J. (1998). Effects of external optical feedback on the power penalty of DFB-LD modules for 2.5 Gb/s optical transmission systems, Optical arid Quuntuni Electron., 30: 23. Patzak, E., Olesen, H., Sagimura, A,, Saito, S., and Mukai, T. (1983). Spectral linewidth reduction in semiconductor lasers by an external cavity with weak optical feedback, Electron. . h / t . . 19: 938. Petermann, K. (1988). Laser Diode Modttltrrion und Noise. Dordrecht: Kluwer Academic. Phillips, M. R., Darcie, T. E., and Flynn, E. J. (1992). Experimental measure of dynamic spatial-hole burning in DFB lasers. IEEE Photon, Tec/7nOl. Lett., 4: 1201. Risch, Ch. and Vouniard, C. (1977). Self-pulsation in the output intensity and spectrum of GaAs-AIGaAs C W diode lasers coupled to a frequency selective external optical cavity, J . Appl. Phys., 4 8 2083. Ritter, A. and Haug, H. (1993). Theory of bistable limit cycle behavior of laser diodes induced by weak optical feedback, IEEE J. Qiiantum Electron., 29: 1064. Sacher, J., Elsaesscr, W., and Goebel, E. 0. (1989). Interniittcncy i n the coherence collapse of a semiconductor laser with external feedback. Phys. Rei,. Lrrr., 63: 2224. Salathe, R. P. (1979). Diode lasers coupled to external resonators, Appl. Phys., 20: 1. Schunk. N. and Petermann, K. (1988). Numerical analysis of the feedback regimes for a single-mode semiconductor laser with external feedback, IEEE J . Quuntum Electron., QE-24: 1242. Schunk, N. and Petermann, K. (1989a). Measured feedback-induced intensity noise for 1.3 pm DFB laser diodes, Electron. Lett., 25: 63. Schunk, N. and Petermann. K. 
(1989b). Stability analysis for laser diodes with short external cavities, IEEE Photon. Teclinol. Lett., I: 49. Seo, D. S., Park, J. D., Mclnerney, J. G., and Osinski, M . ( 1 9 8 8 ) . Effects of feedback asymmetry in external-cavity semiconductor laser systems, Eleciron. Lett., 2 4 726. Sigg, J. (1993). ENects of optical feedback on the light-current characteristics of semiconductor lasers, IEEE J . Quuntum Electron., 29: 1262. Simonsen, H. (1993). Frequency noise reduction of visible InGaAsP laser diodes by different optical feedback niethods, IEEE J . Quantum Electron.. 29: 877. Spano, P., Piazzolla, S., and Tamburrini, M. (1984). Theory of noise in semiconductor lasers in the presence of optical feedback, IEEE J . Quuntum Electron., QE-20: 350. Suematsu, Y. and Furuya, K. (1977). Theoretical spontaneous emission factor of injection lasers, Trans. I.E.C.E. Japan, E60: 467. Suhara, M., Islam. S., and Yamada, M. (1994). Criicrion of external feedback sensitivity in index-coupled and gain-coupled DFB semiconductor lasers to be free from excess intensity noise, IEEE J. Quantum Electron., 30: 3. Suris, R. A. and Tager, A. A. (1985). Influence of the carrier-density dependence of the refractive index on the emission spectrum of an injection laser, Sou. Phys. Semicond.. 19: 266. Szoke, A., Daneu, V., Goldhar, J., and Kurnit, N. A. (1969). Bistable optical element and its applications. Appl. Phys. Lett., 15: 376. Tager, A. A. and Elenkrig, B. 8.(1993). Stability regimes and high-frequency modulation of laser diodes with short external cavity, IEEE J. Quuntum Electron.. 29: 2886. Temkin, H., Olsson, N. A,, Abeles, J. H., Logan, R. A,, and Panish, M. B. (1986). Reflection noise in index-guided InGaAsP lasers, I E E E J . Quaniurn Elrmolt., QE-22 286. Thompson, G. H. B., Lovelace. D. F., and Turlcy, S. E. H . (1978). Kinks in the lightjcuirent
characteristics and near field shifts in (GaA1)As heterostructure stripe lasers and their explanation by the effect of self-focusing on a built-in optical waveguide, IEEE J . Solid State and Electron Devices, 2 12. Tkach, R. W. and Chraplyvy, A. R. (1986). Regimes of feedback effects in 1.5-pm distributed feedback lasers, J. Lightwave Technol., LT-4: 1655. Tomasi, F., Cerboneschi, E., and Arimondo, E. (1994). Asymmetric pulse shape in the LFF instabilities of a semiconductor laser with optical feedback, IEEE J . Quantum Electron., 30: 2217. Tromborg, B., Osmundsen, J. H., and Olesen, H. (1984). Stability analysis for a semiconductor laser in an external cavity, I E E E J . Quantum Electron., QE-20: 1023. Tsang, W. T., Choa, F. S., Wu, M. C., Chen, Y. K., Logan, R. A,, Chu, S. N. G . , and Sergent, A. M. (1992a). Semiconductor distributed feedback lasers with quantum well or superlattice gratings for index or gain-coupled optical feedback, Appl. Phys. Lett., 60: 2580. Tsang, W. T., Choa, F. S., Wu, M. C., Chen, Y. K., Logan, R. A,, Sergent, A. M., and Burrus, C. A. (1992b). Long-wavelength InGaAsP/InP distributed feedback lasers incorporating gain-coupled mechanism, IEEE Photon. Technol. Lett., 4: 212. Twu, Y., Parayanthal, P., Dean, B. A., and Hartman, R. L. (1992). Studies of reflection effects on device characteristics and system performances of 1.5 pm semiconductor DFB lasers, .I. Lighrwave Technol., 10: 1267. Voumard, C., Salathe, R., and Weber, H. (1977). Resonance amplifier model describing diode lasers coupled to short external resonators, Appl. Phys., 12: 369. Wang, C. Y., Chuang, Z. M., Liao, H. H., Lin, W., Tu, Y. K., and Lee, C. T. (1997). Resistance to external optical feedback of low-chirp strained-quantum-well complex-coupled distributed-feedback laser, Japanese J . Appl. Phys., Part-I 36: 2685. Wang, J. and Petermann, K. (1991). Noise analysis of semiconductor lasers within the coherence collapse regime, IEEE J . Quantum Electron., 2 7 3. 
Whiteaway, J. E. A., Thompson, G. H. B., Collar, A. J., and Armistead, C. J. (1989). The design and assessment of 4 4 phase-shifted DFB laser structures, I E E E J . Quantum Electron.. 25: 1261. Woodward, S. L., Koch, T. L., and Koren, U. (1990). The onset of coherence collapse in DBR lasers, IEEE Photon. Technol. Lett., 2: 391. Wu, H. and Chang, H. (1992). Turn-on jitter in semiconductor lasers with moderate reflecting feedback, IEEE Photon. Technol. Lett., 4 339. Wu, H. and Chang, H. (1993a). Analysis of external optical feedback on distributed-feedback semiconductor lasers above threshold, IEEE Photon. Technol. Lett., 5: 1168. Wu, H. and Chang, H. (1993b). Mode partition in semiconductor lasers with optical feedback, IEEE J . Quantum Electron., 2 9 2154. Wyatt, R. and Devlin, W. J. (1983). 10 kHz linewidth 1.5-pm InGaAsP external cavity laser with 55 nm tuning range, Electron. Lett., 19 110. Yamada, M. (1983). Transverse and longitudinal mode control in semiconductor injection lasers, IEEE J . Quantum Electron., QE-19: 1365. Yamada, M. (1986). Theory of mode competition noise in semiconductor injection lasers, IEEE J . Quantum Electron., QE-22: 1052. Yamada, M. (1989). Theoretical analysis of nonlinear optical phenomena taking into account the beating vibration of the electron density in semiconductor laser, J . Appl. Phys., 6 6 81. Yamada, M. and Suhara, M. (1990). Analysis of excess noise induced by optical feedback in semiconductor lasers based on mode competition theory, Trans. I.E.I.C.E. Japun, E73: 77. Zhang, L. M. and Carroll, J. E. (1993). Enhanced AM and FM response of complex coupled DFB lasers, IEEE Photon. Technol. Lett., 5: 506.
ADVANCES IN IMAGING AND ELECTRON PHYSICS. VOL. 107
Atomic Scale Strain and Composition Evaluation from High-Resolution Transmission Electron Microscopy Images

A. ROSENAUER and D. GERTHSEN

Laboratory for Electron Microscopy, University of Karlsruhe, 76128 Karlsruhe, Germany
I. Introduction
II. Strain-State Analysis
   A. The Measurement of Displacements and Lattice Spacings on an Atomic Scale
   B. Determination of the Sample Thickness
   C. Consideration of the Elastic Relaxation
III. Composition Evaluation by Lattice Fringe Analysis
   A. The Basic Idea behind Composition Evaluation by Lattice Fringe Analysis Method
   B. The Fringe Images
   C. Theoretical Considerations
   D. Determination of Sample Thickness and Phase x
   E. The Evaluation Procedure
   F. Correction of Imaging Conditions Varying Across the Image
   G. Errors of the Composition Detection Due to Sample Thickness Uncertainties
IV. Applications
   A. Strain-State Analysis of Zn$_x$Cd$_{1-x}$Se/ZnSe Heterostructures
   B. In$_x$Ga$_{1-x}$As/GaAs Stranski-Krastanow Islands
   C. Strain State Analysis of an Array of Misfit Dislocations
   D. Composition Evaluation by Lattice Fringe Analysis Method: Evaluation of a CdSe/ZnSe(001) Heterostructure
V. Summary and Discussion of the Atomic Scale Analysis Methods
Appendix A: List of Variables
I. INTRODUCTION

Regardless of the high degree of development of transmission electron microscopes, which allows high-resolution images of almost all materials to be obtained, the extraction of quantitative information requires considerable additional effort. Digital cameras, in particular charge-coupled device (CCD) cameras with a pixel resolution of at least 1024 × 1024, are an important prerequisite for processing high-resolution transmission electron microscopy (HRTEM) images without distortions and with a linear contrast transfer between electron intensity and gray levels. As far as software
is concerned, well-developed high-resolution image simulation program packages are available (Stadelmann, 1987; MacTempas; the NCEM Simulation System) that help in understanding how the imaging conditions influence image appearance. General-purpose image processing program libraries such as SEMPER (Saxton et al., 1979) contain a large variety of image processing functions. However, software for the specialized purpose of HRTEM image evaluation is generally not commercially available. The present article focuses on the description of the program package DALI (digital analysis of lattice images), which was developed to quantify HRTEM image information. These programs are applied to semiconductor heteroepitaxial layers, where the strain state and the composition on an atomic scale are of particular interest. However, application to other materials can well be envisaged.

Highly perfected crystal growth techniques, for example molecular beam epitaxy (MBE) and the different variants of chemical vapor deposition (CVD), allow the growth of epitaxial layers with monolayer control. However, a lack of basic understanding of growth processes prevails, particularly for the three-dimensional (3-D) growth modes where isolated islands are nucleated on a continuous wetting layer covering the substrate (Stranski-Krastanow growth mode (Stranski and Krastanow, 1939)) or directly on the substrate (Volmer-Weber growth mode (Volmer-Weber, 1974)). In this context, segregation and interdiffusion effects must be investigated with atomic-scale spatial resolution. The 3-D growth modes are used to obtain self-organized nanostructures (“quantum dots”) whose optical and electronic properties are intensively studied by many groups; their correlation with structural and compositional properties is also required.
The basis of one possible approach to determining strain and composition on an atomic scale is the measurement of local lattice parameters, that is, of the distance between adjacent atomic columns. This simply requires the detection of the positions of the intensity maxima in a high-resolution image, which can be considered a fingerprint of the local lattice parameter. It is not necessary to know the actual position of the atomic columns with respect to the intensity maxima if the TEM specimen thickness does not change significantly and composition-insensitive imaging conditions are chosen so as to avoid chemical shifts of the contrast pattern. The local composition can be extracted directly if the relationship between composition and lattice parameter is known. For many compound semiconductors, for example In$_x$Ga$_{1-x}$As and Cd$_x$Zn$_{1-x}$Se, Vegard's law (Eq. (1)) can be applied, in which the lattice parameter and the composition are linearly related:

$$a_{A_xB_{1-x}C} = a_{BC} + x\,(a_{AC} - a_{BC}). \tag{1}$$
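As a small illustration of Eq. (1), the following Python sketch evaluates Vegard's law and inverts it to obtain the composition from a measured lattice parameter. The function names are hypothetical, and the lattice parameters of InAs and GaAs are commonly quoted room-temperature values that are assumed here for illustration; they are not taken from this chapter.

```python
def vegard_lattice_parameter(x, a_ac, a_bc):
    """Lattice parameter of A_x B_(1-x) C according to Eq. (1)."""
    return a_bc + x * (a_ac - a_bc)


def composition_from_lattice_parameter(a, a_ac, a_bc):
    """Invert Eq. (1): composition x from a (relaxed) lattice parameter a."""
    return (a - a_bc) / (a_ac - a_bc)


# Assumed literature values in angstrom (illustrative only).
A_INAS = 6.0583
A_GAAS = 5.6533

# Round trip for a nominal In content of 60 %.
a_alloy = vegard_lattice_parameter(0.60, A_INAS, A_GAAS)
x_back = composition_from_lattice_parameter(a_alloy, A_INAS, A_GAAS)
print(f"a = {a_alloy:.4f} angstrom, x = {x_back:.2f}")
```

Note that this inversion applies to the relaxed lattice parameter only; for a strained epilayer the distortion of the unit cell has to be accounted for before Eq. (1) is used.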
If a lattice mismatch $f = (a_f - a_s)/a_s$ exists between the lattice parameters of the substrate $a_s$ and the epilayer $a_f$, the distortion of the unit cells in the epilayer must be taken into account. The tetragonal distortion can be easily calculated for coherently strained two-dimensional (2-D) layers below the critical thickness for plastic relaxation by misfit dislocations, as seen in the work by Hull and Bean (1992). Measurement of the local lattice parameters was done previously by Bierwolf et al. (1993) and Jouneau et al. (1994) to investigate the strain distribution of thin epitaxial layers. Robertson et al. (1995) used Fourier-filtered HRTEM images to measure the distance and lattice fringe deviations.

The situation becomes more difficult for 3-D growth modes. Deviations from tetragonal distortion occur close to the surface due to the elastic relaxation of the strained lattice. The finite element method (FEM) was first applied by Christiansen et al. (1994) to compute the strain distribution in nanoscaled SiGe islands on Si(001) substrates. A complete relaxation of the misfit strain close to the top surface is obtained. This is the major driving force for island growth. An accurate knowledge of the strain distribution is therefore a necessary prerequisite for composition evaluation that is based on the measurement of the local lattice parameters in epitaxial islands.

Another question to be addressed is the elastic relaxation of strained structures due to the small HRTEM specimen thickness (typically 20 nm at most), which can significantly modify the tetragonal distortion depending on the local specimen thickness and the dimension of the strained structures. The specimen thickness must be accurately measured in the region of interest of the evaluated HRTEM image. A further important step in quantification is the calculation of elastic relaxation as a function of TEM specimen thickness and layer morphology.
Analytical solutions to this problem exist for simple layer structures, as was shown by Treacy and Gibson (1986). For more complicated morphologies, FEM simulations can be applied (Tillmann et al., 1996). Because local lattice parameters are modified by both the local specimen thickness and the elastic strain relaxation in the islands, a different and less elaborate approach would be desirable for composition determination. A solution for the InGaAs system, presented here, can be extended to other ternary compound semiconductors such as CdZnSe and AlGaAs. The approach is based on the evaluation of the Fourier amplitudes of lattice fringe images (CELFA: composition evaluation by lattice fringe analysis), which are shown to depend sensitively on the indium concentration (Rosenauer et al., 1998). The enhancement of chemical contrast under off-axis imaging conditions was previously applied by Jia et al. (1993).

This chapter is organized in the following way: Chapter II presents details of strain-state analysis. The measurement of displacements and lattice
spacings on an atomic scale (Section II-A) includes the noise reduction (Section II-A-1), the detection of lattice sites and subdivision into image unit cells (Section II-A-2), the calculation of lattice base vectors (Section II-A-3), and the calculation of local (Section II-A-4) and averaged (Section II-A-5) displacements and lattice spacings. The thickness measurement is outlined in Section II-B; the cell transformation is found in Section II-B-1, and the determination of relative and absolute thickness values is found in Sections II-B-2 and II-B-3, respectively. The elastic relaxation of the thin TEM specimen is considered in Section II-C, which is subdivided into the analytical solution of the thin- and the thick-sample limit (Section II-C-1) and the finite element simulations (Section II-C-2) to quantify composition.

Chapter III outlines the composition evaluation by lattice fringe analysis (CELFA) procedure. First, the basic idea behind CELFA is explained in Section III-A, followed by a discussion of the fringe images and the theoretical background of their formation in Sections III-B and III-C, respectively. The analysis of the contrast patterns of the images of a defocus series leads to the determination of sample thickness (Section III-D), knowledge of which is necessary for the evaluation procedure explained in Section III-E. The approximate correction of the effect of imaging conditions or sample thickness varying across the image is shown in Section III-F. Finally, Section III-G provides an estimate of the errors of composition detection that are due to uncertainties in the sample thickness measurement.

Chapter IV is concerned with the presentation of selected evaluation examples. Section IV-A begins with the strain-state analysis of a variety of Cd$_x$Zn$_{1-x}$Se/ZnSe heterostructures with different layer thicknesses and Cd contents, which helps explain the determination of local concentrations and of the overall amount of CdSe that was deposited.
This section also shows that the results of strain-state analyses are in good agreement with in situ reflection high-energy electron diffraction (RHEED) measurements (see Section IV-A-1). Section IV-A-2 deals with the determination of diffusion coefficients for the diffusion of Cd in ZnSe in the temperature range 330-400°C. The investigation of In$_x$Ga$_{1-x}$As/GaAs Stranski-Krastanow islands is described in Section IV-B, which deals with free-standing islands (Section IV-B-1) and islands capped with 10-nm GaAs (Section IV-B-2). It is shown that all analysis methods, which include strain-state analysis, thickness measurement, finite element calculations, and conventional (002) dark-field imaging, together with the CELFA method, yield a consistent and convincing picture of the morphology and compositional inhomogeneities of the specimen. Section IV-C demonstrates that strain-state analysis is also applicable to interfaces containing an array of misfit dislocations. Finally, Section IV-D provides a further example of application of the CELFA method, which consists of an evaluation of a CdSe/ZnSe(001)
heterostructure. This section also contains a discussion of the effect of crystal tilt around an axis parallel to the interface plane on the evaluated concentration profile, which turns out to be negligible under certain conditions.

II. STRAIN-STATE ANALYSIS

A. The Measurement of Displacements and Lattice Spacings on an Atomic Scale
We describe here the measurement of local and averaged lattice parameters and displacements. Our method is similar to those suggested by Bierwolf et al. (1993), Brandt et al. (1992), Paciornik et al. (1993), Seitz et al. (1995), and Jouneau et al. (1994). It contains the following analysis steps:

1. Noise reduction;
2. Detection of lattice sites and gridding;
3. Calculation of lattice base vectors; and
4. Analysis of displacements and lattice spacings.
We will apply the individual analysis steps to the HRTEM micrograph depicted in Fig. 1, which shows an In$_x$Ga$_{1-x}$As island on a GaAs(001) substrate. The cross-sectional image was taken along the [110] projection. The nominal In content was 60%, the nominal In$_x$Ga$_{1-x}$As layer thickness was 1.5 nm, and the growth temperature during MBE was 500°C. This micrograph will also serve as an application example for the description of the measurement of local sample thickness and for finite element modeling. We will acquire a model of the local In content inside the island. This evaluation example was chosen specifically because the determination of local In contents in the “bulk” of the island cannot easily be performed with atomic-scale spatial resolution using other methods. The reason is that the lattice parameters (parallel as well as perpendicular to the interface plane) change inside the island due to the elastic relaxation of the island at its free surfaces. This circumstance excludes methods in which a lattice-parameter fluctuation may affect the In-content measurement (this is also the case for the lattice fringe analysis that will be outlined in Section III).

FIGURE 1. (110) HRTEM image of an In$_x$Ga$_{1-x}$As/GaAs(001) Stranski-Krastanow island containing the grid that connects the local brightness maxima of the dumbbells. The marked area of interest (AOI, blue frame) is used for the determination of the In concentration inside the island. The reference area (green frame) is used for the calculation of the basis vectors of the reference lattice. (See also Plate 11.)

1. Noise Reduction
Images digitized with either an off-line or an on-line CCD camera attached directly to the microscope contain some noise. One source of noise is the thin amorphous layers on both the top and the bottom surface of the sample, which are formed during the ion-milling step of sample preparation for the TEM (as seen in Schuhrke et al., 1992). Other possible sources are the grain of the photographic negative film emulsion and the electronic noise of the device used for digitizing. For noise reduction, we use a Wiener-filtering technique (Press et al., 1992) in which the noise level is estimated locally in the Fourier-transformed image $C$. It consists of the undisturbed signal $S$ and the noise part $N$:

$$C = S + N. \tag{2}$$
C=cx0.
(3)
C is the Fourier transform of the noise-reduced image if
else where yx and gy are the spatial frequencies. It is appropriate to note that the filter CD calculated according to Eq. (4) is often called the “optimum” or
ATOMIC SCALE STRAIN AND COMPOSITION EVALUATION
127
“conventional” Wiener filter. Other choices (e.g., the parametric Wiener filter) are compared in Marks, 1996. In Fourier-transformed HRTEM images of defect-free lattice structures the information IS/’ is predominantly localized around those spatial frequencies that correspond to lattice spacings in real space whereas the noise part INI2 has a low, slowly variable intensity. Therefore, the noise part IN\’ is estimated using the following procedure: the power spectrum ICI2 is divided into equally sized areas A , (Fig. 2). The area extension must exceed the Bragg spot extensions, which are contained in the Fourier-transformed image. Furthermore, each area is subdivided into blocks B,. For each block the intensity IBm is calculated as the maximum of all pixel intensities in B,. For each area A, the values I s , of the blocks in A , are averaged:
The weighting factors wB, ensure that the intensive Bragg peaks contribute
FIGURE2. Schematic drawing, which shows the decomposition of the Fourier-filtered image into areas and blocks. The smallest detectable unit is given by a pixel (picture element).
128
A. ROSENAUER AND D. GERTHSEN
much less (we, z 1) to the noise part estimation than blocks with low block values (we,, maxs,(I8,,,)) and that a smooth variation of the noise part inside the area is taken into account. As a result, the values I A n form a map of the noise part The noise part for each pixel is calculated by bilinear interpolation with respect to IA,,. The example in Fig. 3 demonstrates the efficiency of the described procedure (Rosenauer et ul., 1996). Figure 3a shows a part of the power spectrum ICI2 and a small part of the lattice image in the insert. The power depicted in Fig. 3b also contains a small spectrum after noise reduction part of the lattice image that results from the inverse Fourier transform of 5;.
[el2
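The noise-reduction procedure of this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the DALI implementation: the function names, the default area and block sizes, and in particular the weighting $w_{B_m} = \max(I_{B_m})/I_{B_m}$ are assumptions chosen only to reproduce the limiting behavior described above (weight close to 1 for Bragg-peak blocks, large weight for low blocks). The filter itself is taken as the power-spectrum ratio $(|C|^2 - |N|^2)/|C|^2$, set to zero wherever the estimated noise exceeds the power.

```python
import numpy as np


def estimate_noise_power(power, area=32, block=4):
    """Estimate the slowly varying noise part |N|^2 of a power spectrum |C|^2.

    The image size must be a multiple of `area`, and `area` a multiple of `block`.
    """
    ny, nx = power.shape
    if ny % area or nx % area or area % block:
        raise ValueError("sizes must divide evenly")
    nay, nax, k = ny // area, nx // area, area // block

    # Block value I_B: maximum of all pixel intensities inside each block.
    block_max = power.reshape(ny // block, block, nx // block, block).max(axis=(1, 3))
    floor = max(1e-12 * power.max(), np.finfo(float).tiny)
    block_max = np.maximum(block_max, floor)  # avoid division by zero

    # Collect the k*k block values belonging to each area.
    per_area = (block_max.reshape(nay, k, nax, k)
                .transpose(0, 2, 1, 3).reshape(nay, nax, k * k))

    # ASSUMED weighting: w = max(I_B) / I_B, so Bragg-peak blocks get w ~ 1.
    weights = per_area.max(axis=2, keepdims=True) / per_area
    area_noise = (weights * per_area).sum(axis=2) / weights.sum(axis=2)

    # Bilinear interpolation from the area centres to every pixel.
    yc = (np.arange(nay) + 0.5) * area - 0.5
    xc = (np.arange(nax) + 0.5) * area - 0.5
    along_x = np.array([np.interp(np.arange(nx), xc, row) for row in area_noise])
    return np.array([np.interp(np.arange(ny), yc, col) for col in along_x.T]).T


def wiener_denoise(image, area=32, block=4):
    """Apply the filter Phi = (|C|^2 - |N|^2) / |C|^2 (zero where |C|^2 <= |N|^2)."""
    spectrum = np.fft.fft2(image)
    power = np.abs(spectrum) ** 2
    noise = estimate_noise_power(power, area, block)
    safe_power = np.maximum(power, np.finfo(float).tiny)
    phi = np.where(power > noise, (power - noise) / safe_power, 0.0)
    # The interpolated noise map is not exactly symmetric in g, so the tiny
    # imaginary part of the inverse transform is discarded.
    return np.fft.ifft2(spectrum * phi).real
```

With a synthetic lattice image (two crossed cosine fringes plus Gaussian noise), `wiener_denoise(noisy)` should return an image that is markedly closer to the noise-free lattice than the input. Because the block values are maxima, the noise level tends to be overestimated, which makes this sketch a rather strong filter.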
2. Detection of Lattice Sites and Gridding
The contrast of HRTEM images is the result of the dynamical diffraction in the crystalline specimen, depending on sample thickness, microscope parameters (defocusing distance, electron energy spread, beam convergence angle), and nonlinear image formation. As a consequence, the exact determination of atom positions usually requires the comparison of experimental with simulated images. We use the positions of the intensity maxima to obtain a lattice that represents the dimensions of the projected unit cells. The positions of the intensity maxima may correspond to the location of the columns of atoms, to the tunnel sites, or sometimes to neither of the two. However, our approach does not rely on the knowledge of the positions of atomic columns with respect to the intensity maxima. It is based only on the assumption of a constant spatial relationship between the intensity maxima positions and the columns of atoms. This requirement is often fulfilled in small investigated areas with an insignificant change in specimen thickness. The formation of the 2-D grid is performed in five steps:

1. Finding the positions $M^{(1)}$ of pixels corresponding to local brightness maxima (Fig. 4a).
2. Fitting parabolas to the intensity profiles along two lines $L_x$ and $L_y$ across $M^{(1)}$ that run along the x- and y-directions, yielding a more accurate position $M^{(2)}$ (Fig. 4b).
3. Fitting parabolas to the intensity profiles along four lines $L_1$ to $L_4$ (Fig. 4c) across $M^{(2)}$. Averaging the positions of the parabolas' maxima results in the final position $M^{(3)}$.
4. Formation of grid lines by connecting positions along each of two selected directions (Fig. 4d, e).
5. Continuously numbering the grid lines, separately for each of the two sets of lines (Fig. 4d, e), yielding the 2-D grid where a pair of indices is assigned to each position.
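Step 1 of the list above can be sketched as follows. This is an illustrative variant, not the authors' code: the function name `find_maxima` is hypothetical, and the suppression of secondary maxima is done here by accepting candidates in order of decreasing intensity and rejecting any candidate that lies closer than a chosen minimum distance to an already accepted position, which keeps the brighter of two competing maxima.

```python
import numpy as np


def find_maxima(image, min_distance):
    """Return (row, col) pixel positions of local brightness maxima.

    A pixel counts as a maximum if it is strictly brighter than its eight
    neighbours. Positions are returned brightest first, and any maximum
    closer than `min_distance` to a brighter accepted one is discarded.
    Border pixels are not considered.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    core = img[1:-1, 1:-1]
    is_max = np.ones(core.shape, dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
            is_max &= core > neighbour

    rows, cols = np.nonzero(is_max)
    rows, cols = rows + 1, cols + 1  # back to full-image coordinates
    order = np.argsort(img[rows, cols])[::-1]  # brightest first

    kept = []
    for r, c in zip(rows[order], cols[order]):
        if all((r - kr) ** 2 + (c - kc) ** 2 >= min_distance ** 2
               for kr, kc in kept):
            kept.append((int(r), int(c)))
    return kept
```

A reasonable choice for `min_distance` would be somewhat more than half the expected spacing of the main bright spots, so that weaker maxima at half-spacing positions are rejected while neighboring main maxima are retained.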
FIGURE 3. Three-dimensional plots of the intensities of the Fourier-transformed images of (a) the original image, and (b) the image after Wiener noise reduction. The insets show the corresponding lattice images.
FIGURE 4. Schematic drawing showing the procedure of the lattice site determination. In the first step (a) the brightness maxima positions $M^{(1)}$, marked by crosses, have to be found. Intensity profiles along the straight lines $L_x$ and $L_y$ are fitted by parabolas. One of these is shown in (b) as an example. Their vertices ($M_x^{(2)}$, $M_y^{(2)}$) give the position with improved accuracy. The same procedure is applied to intensity profiles along $L_{1,2,3,4}$ (c) for further accuracy enhancement, yielding the final position $M^{(3)}$. The detected positions form a 2-D grid whose grid lines are numbered continuously with respect to their points of intersection (marked with rectangles for the set of horizontal grid lines and with squares for the vertical ones) with two lines x and y (d). The dot marks the chosen point of intersection of the lines x and y; (e) shows the numbering for grid lines where each of them connects positions that belong to the same {111} plane; (f) shows the resulting indexing for the gridding in the case (d).
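The parabola fit of the second step (Fig. 4b) can be illustrated with a short NumPy sketch. The names `parabola_vertex` and `refine_maximum` and the window half-width are hypothetical choices for this illustration; only the two profiles along the x- and y-directions are treated, and the four-line refinement of the third step is not included.

```python
import numpy as np


def parabola_vertex(values):
    """Offset of the vertex of a least-squares parabola through `values`.

    The samples are assumed to be equally spaced and centred on index 0
    (odd number of samples). Returns 0.0 if the fit has no maximum.
    """
    n = len(values)
    t = np.arange(n) - n // 2
    a, b, _ = np.polyfit(t, values, 2)
    if a >= 0:
        return 0.0
    return -b / (2.0 * a)


def refine_maximum(image, row, col, half_width=2):
    """Sub-pixel position from parabola fits along the x- and y-directions.

    (row, col) is the integer pixel maximum; the fit window of
    2 * half_width + 1 pixels must lie completely inside the image.
    """
    img = np.asarray(image, dtype=float)
    profile_x = img[row, col - half_width:col + half_width + 1]
    profile_y = img[row - half_width:row + half_width + 1, col]
    return row + parabola_vertex(profile_y), col + parabola_vertex(profile_x)
```

For an intensity spot that is exactly parabolic the fit recovers the true center; for the nearly sinusoidal spots of a lattice image the result is an approximation whose quality depends on the window half-width relative to the spot size.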
In the first step, care has to be taken that local brightness maxima between the “main” maxima positions are excluded. An enhanced intensity between the “main” bright spots of the HRTEM image is often observed under imaging conditions close to the “half spacing” contrast, where the tunnel positions as well as the positions of the rows of atoms show similar brightness. To avoid these “artificial” positions, a minimum distance between adjacent positions is defined. If a new maximum position is found during the search procedure, its distance to all positions found previously is checked. If a distance is smaller than the predefined minimum distance, the position with the lower intensity is deleted.

The detection of pixels with a local intensity maximum performed in the first step is inaccurate due to residual noise and the finite pixel size. Because an intensity profile across a spot is nearly sinusoidal, it can be fitted by a parabolic curve in a region close to the estimated intensity maximum. The maximum of the parabolic curve fitted in the second step yields a more accurate estimate of the peak position, because the position is no longer determined from the maximum intensity alone (which can be affected severely by noise). Instead, many intensity values around the true maximum position are used together with an approximate functional relationship. A further gain in accuracy is reached in the third step, where parabolas are fitted along four lines, as shown in Fig. 4c. The maxima positions of the parabolas are averaged, leading to the final position $M^{(3)}$. For each position, the standard deviation $\sigma = (\sigma_x, \sigma_y)$ of the maxima positions of the four parabolas is stored. A typical value for $|\sigma|$ is 0.2 pixels.

The generation of grid lines is performed in the fourth step. The procedure starts with the selection of two directions $d_1$ and $d_2$ along which the grid lines are intended to run.
For each of the detected positions $P$, its next neighbor $N^1$ ($N^2$) lying in the positive direction $\mathbf{d}_1$ ($\mathbf{d}_2$) is searched (Fig. 5). The next neighbor position in the direction $-\mathbf{d}_1$ ($-\mathbf{d}_2$) is $A^1$ ($A^2$). In this way, strings of neighboring positions are formed that represent two sets of grid lines, one set (1) with lines along $\mathbf{d}_1$ and the other (2) with lines along $\mathbf{d}_2$. In the fifth step, a point of the image is chosen that will be used as the intersection point (marked with a dot in Fig. 4d,e) of two axes x and y, with the x-axis in the horizontal and the y-axis in the vertical direction of the image. The grid lines of set 1 (2) intersect the x-axis at an angle of $\alpha_x^1$ ($\alpha_x^2$) and the y-axis at $\alpha_y^1$ ($\alpha_y^2$). If $|\alpha_x^1 - \pi/2| < |\alpha_y^1 - \pi/2|$, then the x-axis is used for the indexing of the grid lines of set 1, otherwise the y-axis. The indexing is performed in such a way that the indices correspond to the order of the intersection points of the grid lines with the appropriate axis x or y. An analogous procedure is performed with the grid lines of set 2. The result is shown for two different choices of $\mathbf{d}_1$ and $\mathbf{d}_2$: in Fig. 4d for the [110] and the [001] directions and in Fig. 4e for the two ⟨111⟩ directions. In this way, we obtain
FIGURE 5. Schematic drawings that show the positions $N^1$, $N^2$, and $P$ as well as the positions $A^1$ and $A^2$. The distance arrows point from a position to its neighbor position. Therefore, only the $N^1$ and $N^2$ positions are considered to be neighbors of $P$, whereas $P$ is a neighbor of both $A^1$ and $A^2$. The parameters $\gamma_{1,2,3,4}$ regulate the contribution of the individual distance arrows to the calculation of local distances: (a) corresponds to the grid lines shown in Fig. 4e, whereas (b) corresponds to Fig. 4d.
a 2-D grid where each lattice point $P$ is characterized by two indices $i$ and $j$. Therefore, a lattice point $P$ will be denoted by $\mathbf{P}_{i,j}$. An example shown in Fig. 4f was obtained with the indexing shown in Fig. 4d. Note that there may be some index pairs such as $i,j = 1,2$ that do not belong to an existing lattice position $\mathbf{P}_{i,j}$. The gridding of the evaluation example is also shown in Fig. 1, where the directions $\mathbf{d}_1 = [110]$ and $\mathbf{d}_2 = [001]$ were chosen.

3. Calculation of Lattice Base Vectors
In this section we describe the generation of lattice base vectors. They will be used in the next section for the generation of a reference lattice that facilitates the calculation of local displacement vectors. The lattice base vectors should be deduced in a reference lattice region without deviations from the perfect crystal structure, located far away from any lattice defects. The region indicated by the green rectangle in Fig. 1 is chosen as a reference area. The lattice positions inside the reference area are used to calculate two lattice base vectors $\mathbf{a}_1$ and $\mathbf{a}_2$, which correspond to the directions $\mathbf{d}_1$ and $\mathbf{d}_2$.
For each grid line, all those positions that lie within the reference region are used to fit a straight line; this results in two sets of straight lines. The directions $\hat{\mathbf{a}}_1$ and $\hat{\mathbf{a}}_2$ of the lattice base vectors are calculated by averaging the gradients of the fitted straight lines. The positions of each grid line found inside the reference area are projected onto the directions $\hat{\mathbf{a}}_1$ and $\hat{\mathbf{a}}_2$. The distances between neighboring projected points are averaged for each of the sets 1 and 2, which yields the lengths of the base vectors $\mathbf{a}_1$ and $\mathbf{a}_2$, respectively. It is appropriate to note that the lattice base vectors obtained in this way are not meant to be lattice translation vectors. According to Fig. 4d, the lattice base vector parallel to the x-direction points from a lattice position to a (virtual) position (Fig. 4f). In this case, the grid consists of two sublattices that are marked by dark and white grid lines in Fig. 4f.

4. Calculation of Local Displacements and Lattice Spacings

The lattice base vectors $\mathbf{a}_1$ and $\mathbf{a}_2$ obtained in the previous section are now used to calculate reference lattice positions. The positions
$$\mathbf{R}_{i,j} = i\,\mathbf{a}_1 + j\,\mathbf{a}_2 - \mathbf{a}_0 \tag{6}$$
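that form a reference lattice can be directly compared with the corresponding positions $\mathbf{P}_{i,j}$. The vector $\mathbf{a}_0$ in Eq. (6) results from the condition that the sum of the deviations $\mathbf{R}_{i,j} - \mathbf{P}_{i,j}$ calculated inside the reference region (Ref. R.) vanishes:
$$\sum_{\mathrm{Ref.R.}} \left(\mathbf{R}_{i,j} - \mathbf{P}_{i,j}\right) = 0 \tag{7}$$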
The standard deviation of $\mathbf{R}_{i,j} - \mathbf{P}_{i,j}$ computed inside the reference region can be used to estimate the accuracy of the position determination. We define
$$\delta := \sqrt{\frac{1}{N_{\mathrm{Ref.R.}}} \sum_{\mathrm{Ref.R.}} \left|\mathbf{R}_{i,j} - \mathbf{P}_{i,j}\right|^2} \tag{8}$$
where $N_{\mathrm{Ref.R.}}$ is the number of lattice positions inside the reference region. Typical values are of the order of 0.005 nm. In the case of our evaluation example, $\delta = 0.004$ nm is obtained. Assuming a crystal lattice parameter of 0.6 nm, we can estimate an accuracy of the local lattice-parameter determination of 0.5% to 2%. The next step consists in the definition of local displacement vectors
$$\mathbf{u}_{i,j} = \mathbf{P}_{i,j} - \mathbf{R}_{i,j} \tag{9}$$
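The construction of the reference lattice and of the displacement vectors can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DALI code: the function name `reference_lattice`, the dictionary layout of the positions, and the root-mean-square form used for the accuracy estimate $\delta$ are choices made here. The offset vector follows from requiring that the deviations between reference and measured positions sum to zero inside the reference region.

```python
import math


def reference_lattice(P, a1, a2, ref):
    """Compute reference positions R, displacements u = P - R, and delta.

    P   : dict {(i, j): (x, y)} of detected lattice positions
    a1  : base vector belonging to index i, as (x, y)
    a2  : base vector belonging to index j, as (x, y)
    ref : iterable of (i, j) index pairs inside the reference region
    """
    ref = list(ref)
    n = len(ref)
    # a0 from the condition that the sum of (R - P) over the reference region vanishes
    a0x = sum(i * a1[0] + j * a2[0] - P[i, j][0] for i, j in ref) / n
    a0y = sum(i * a1[1] + j * a2[1] - P[i, j][1] for i, j in ref) / n
    R = {(i, j): (i * a1[0] + j * a2[0] - a0x, i * a1[1] + j * a2[1] - a0y)
         for (i, j) in P}
    u = {ij: (P[ij][0] - R[ij][0], P[ij][1] - R[ij][1]) for ij in P}
    # assumed RMS form of the accuracy estimate inside the reference region
    delta = math.sqrt(sum(u[ij][0] ** 2 + u[ij][1] ** 2 for ij in ref) / n)
    return R, u, delta
```

For a perfect lattice inside the reference region, `delta` is zero and a position shifted outside that region shows up directly as its displacement vector.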
For most purposes the displacement vectors $\mathbf{u}_{i,j}$ have to be projected onto
a certain lattice direction parallel to the selected direction $\mathbf{Q}$ given by
$$\mathbf{Q} = k\,\mathbf{a}_1 + l\,\mathbf{a}_2, \tag{10}$$
where the values $k$ and $l$ are small integers that determine the direction onto which the vectors will be projected. Using Eq. (10), we obtain the projected displacements
which is consistent with the definition used in the DALI program package. Here $N_{SL}$ is the number of sublattices of the 2-D grid; it is 2 in the case of Fig. 4f. The choice of the grid line directions shown in Fig. 4e would lead to $N_{SL} = 1$. Local lattice distances between next neighbor positions are defined by
m=1
(12)
where $N^{1,2}_{i,j}$ and $A^{1,2}_{i,j}$ have previously been defined in Section II-A-2. For clarification, $N^{1,2}_{i,j}$ and $A^{1,2}_{i,j}$ as well as the position $\mathbf{P}_{i,j}$ are shown in Fig. 5a, b for the most common choices of $\mathbf{d}_1$ and $\mathbf{d}_2$. The values $\gamma_{1,2,3,4} \in \{-1, 0, 1\}$ regulate the contribution of the four distance vectors to $A_{i,j}$. Figure 6 depicts the local displacement vectors according to Eq. (9), which were evaluated from the right-hand side of the island shown in Fig. 1 with the DALI program package. Due to the larger lattice parameter in the island, the lengths of the displacement vectors grow from the interface to the top. In the vicinity of the island border the displacement vectors exhibit a component parallel to the interface plane, which is due to the elastic relaxation of the strained island. The local projected displacements according to Eq. (11) are displayed in Fig. 7 as color-coded maps. In Fig. 7a, the displacement vectors are projected onto the growth direction using $k = -2$, $l = 0$ in Eq. (10). It may seem surprising that $k = -2$ is chosen instead of $k = 1$. First, a negative value for $k$ is selected because the indices of the horizontal grid lines in Fig. 1 increase from the top to the bottom. Hence, the lattice base vector $\mathbf{a}_1$ points towards the bottom. In order to achieve positive values for the projected displacement vectors, $\mathbf{Q}$ has to be chosen pointing in growth direction. Second, $k = -2$ instead of $k = -1$ is
FIGURE 6. Part of the displacement vector fields evaluated from Fig. 1 (drawn in red) and obtained by finite element calculation (blue) as outlined in Section II-C-2. Figure labels: magnified 2x; Experiment; FEM. (See also Plate 12.)
chosen because the selection of $\mathbf{Q}$ occurs in the DALI program by two mouse clicks on two adjacent lattice positions, which have to be $\mathbf{P}_{i,j}$ and $\mathbf{P}_{i-2,j}$. As already discussed at the end of the previous section, the position $\mathbf{P}_{i-1,j}$ does not exist. To take the indexing into account, the normalization factor $(|\mathbf{Q}|/N_{SL})^{-1}$ was introduced in Eqs. (10) and (11); it halves the length of $\mathbf{Q}$ in the case of $N_{SL} = 2$. Figure 7b shows the projection of the displacement vectors onto the direction parallel to the interface plane, which was achieved by $\mathbf{Q} = 2\mathbf{a}_2$ in Eq. (10). Red regions indicate displacement vectors pointing to the right, whereas blue regions correspond to a component to the left.

5. Calculation of Averaged Displacements and Lattice Spacings
In some cases it is appropriate to average the local displacements and lattice spacings. In the DALI program package, the scalar values $u_{i,j}$ and $A_{i,j}$ can be averaged either along the whole length or along only a part of the grid lines. The region in which the averaging is performed can be chosen by the selection of an “area of interest” (AOI). As an example, Fig. 8 shows averaged displacements as a function of the monolayer number along the growth direction that were obtained from the local displacements of Fig. 7a. The displacements were averaged along the horizontal grid lines inside an AOI, which is marked by a blue rectangle in Fig. 1. The vertical dashed line in Fig. 8 indicates the position of the surface beside the island. It is conspicuous in Fig. 8 that the displacements to the
FIGURE 7. Color-coded maps of the components of the displacement vector field (a) in growth direction and (b) in interface direction (a positive value indicates a displacement vector pointing to the right) deduced from Fig. 1. (See also Plate 13.)
right of the surface marker are not equal to zero, as might be expected, but show a weak slope. This effect can be explained by a strain inside the GaAs buffer that is caused by the biaxially strained island; this will be verified using finite element simulations in Section II-C-2. This observation clearly shows that a quantitative interpretation of the displacement vector field requires a known correlation between the strain field and the chemical and geometric structure of the investigated sample area. One way to find such a correlation is the application of finite element calculations based on a sample geometry derived from the HRTEM image.
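The averaging of projected displacements along grid lines inside an area of interest, as used for the monolayer-resolved profile, might be sketched as follows. The function name `average_along_grid_lines` and the representation of the AOI as inclusive index bounds are assumptions for illustration; DALI selects the AOI interactively in the image.

```python
def average_along_grid_lines(u, aoi):
    """Average scalar values u[(i, j)] over j for every grid line i inside the AOI.

    u   : dict {(i, j): value}, e.g. projected displacements
    aoi : (i_min, i_max, j_min, j_max), inclusive index bounds (assumed layout)
    Returns a dict {i: mean value along grid line i}.
    """
    i_min, i_max, j_min, j_max = aoi
    sums, counts = {}, {}
    for (i, j), value in u.items():
        if i_min <= i <= i_max and j_min <= j <= j_max:
            sums[i] = sums.get(i, 0.0) + value
            counts[i] = counts.get(i, 0) + 1
    return {i: sums[i] / counts[i] for i in sorted(sums)}
```

Here the index `i` is taken to number the grid lines parallel to the interface, so that the result is one averaged value per monolayer.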
FIGURE 8. Components of the displacement vector field averaged for each monolayer in growth direction along the horizontal direction in the area of interest (AOI) of Fig. 1, marked with solid dots. The open squares are the result of the finite element method (FEM) simulations obtained for the FE model with the best fit (see Section II-C-2). Horizontal axis: (002) plane number.
Whereas the projected shape of the island is directly visible in the HRTEM image, the evaluation of the local sample thickness constitutes a more complicated problem, for which we will give a possible solution in the next section.

B. Determination of the Sample Thickness

This section deals with the determination of the local sample thickness, which is the basis for the finite-element modeling that will be described in the next section. Our approach is based on the quantitative analysis of the information from transmission electron micrographs (QUANTITEM) procedure suggested by Schwander et al. (1993), Ourmazd et al. (1989), Ourmazd et al. (1990, 1993), and Kisielowski et al. (1995). The QUANTITEM procedure has recently been discussed with regard to composition evaluation in ternary mixed crystals like In$_x$Ga$_{1-x}$As (Maurice et al., 1997). QUANTITEM detects the projected crystal potential that is proportional
to the sample thickness. In Kisielowski et al. (1995) it is stated that the QUANTITEM analysis is valid to the extent that dynamical scattering in the investigated material can be described in terms of two Bloch waves. However, it is also shown in Kisielowski et al. (1995) that the procedure can be used for III-V semiconductors like GaAs or AlAs, where three Bloch waves are excited with substantial intensity. Our implementation of the QUANTITEM procedure as a part of the DALI program package uses a decomposition of the HRTEM micrograph into image unit cells that are given by the 2-D grid from Section II-A. The QUANTITEM method is based on the interpretation of each image unit cell as an image vector. Its dimension is given by the number of pixels included in the cell and should be equal for all cells of the image. The image unit cells from Section II-A may differ in their sizes and angles. Therefore, the first step in the thickness determination procedure (described in the following section) is a transformation of the image cells that provides square cells of identical size $2^n \times 2^n$ pixels (typically $n = 5$).
1. Cell Transformation
In this section we show an algorithm that transforms an irregularly shaped cell $Z$ into the regularly shaped cell $Z'$ (Fig. 9b). To keep all of the information contained in cell $Z$, the size of cell $Z'$ significantly exceeds the size of cell $Z$. For a description of the transformation procedure we have to define the four vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, and $\mathbf{d}$ ($\mathbf{a}'$, $\mathbf{b}'$, $\mathbf{c}'$, and $\mathbf{d}'$) that point from the center of area $\mathbf{M}$ ($\mathbf{M}'$) of the cell $Z$ ($Z'$) to the four points at its corners, where $\mathbf{M}$ is given by
$$\mathbf{M} = \frac{\int_Z (x, y)\, dx\, dy}{\int_Z dx\, dy}.$$
The following procedure is applied to obtain the intensity of a pixel $p'_{k,l}$ inside the regularly shaped cell $Z'$. For each pixel $p'_{k,l}$, its midpoint $\mathbf{p}'_{k,l}$ is described by a linear combination
$$\mathbf{p}'_{k,l} = \alpha\,\mathbf{v}'_1 + \beta\,\mathbf{v}'_2 \tag{13}$$
of the two adjacent vectors $\mathbf{v}'_1$ and $\mathbf{v}'_2$ (e.g., $\mathbf{v}'_1 = \mathbf{d}'$ and $\mathbf{v}'_2 = \mathbf{c}'$ for the pixel $p'_{k,l}$ shown in Fig. 9b). The corresponding position $\mathbf{p}$ in the cell $Z$ is given by
$$\mathbf{p} = \alpha\,\mathbf{v}_1 + \beta\,\mathbf{v}_2, \tag{14}$$
which generally does not coincide with the midpoint of any pixel in $Z$. The vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are the two vectors in the cell $Z$ that correspond to $\mathbf{v}'_1$ and $\mathbf{v}'_2$ in $Z'$ (e.g., $\mathbf{v}_1 = \mathbf{d}$ and $\mathbf{v}_2 = \mathbf{c}$ in Fig. 9b). We find four pixels $p_{n,m}$, $p_{n,m+1}$, $p_{n+1,m}$, and $p_{n+1,m+1}$ in $Z$ whose midpoints define a square containing $\mathbf{p}$ (Fig. 9c). The values of these four pixels are used to calculate the intensity of the pixel $p'_{k,l}$.
FIGURE 9. Schematic drawing that explains the transformation of the original image unit cells as determined by DALI into square lattice cells $Z'$: (a) depicts a small part of the HRTEM image shown in Fig. 1. The detected lattice sites at the centers of the bright dots are connected by white lines. The black rectangle indicates the original cell $Z$; (b) illustrates the transformation of the cell $Z$ into the square cell $Z'$. The point $\mathbf{p}$ inside the cell $Z$ corresponds to the midpoint of the pixel $p'_{k,l}$ inside the cell $Z'$; (c) shows the system of coordinates used for the description of the transformation procedure. The crosses mark the midpoints of the pixels. The brightness of each pixel is described by its intensity value $I_{n,m}$.
For that purpose, four vectors $\mathbf{r}_{n,m}$, $\mathbf{r}_{n,m+1}$, $\mathbf{r}_{n+1,m}$, and $\mathbf{r}_{n+1,m+1}$ that point from $\mathbf{p}$ to the midpoints of the four pixels $p_{n,m}$, $p_{n,m+1}$, $p_{n+1,m}$, and $p_{n+1,m+1}$ inside the cell $Z$ (Fig. 9c) are defined. Using a coordinate system where the distance between adjacent pixels is 1 (Fig. 9c), the intensity $I'_{k,l}$ of pixel $p'_{k,l}$ is calculated by
$$I'_{k,l} = \frac{1}{S} \sum_{i=n,n+1}\ \sum_{j=m,m+1} \left(1 - |\mathbf{r}_{i,j}|^2\right) I_{i,j} \tag{15}$$
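The cell transformation can be illustrated by the following Python sketch. It maps a point of the regular cell into the irregular cell by expressing it in terms of two adjacent corner vectors, as in Eq. (14), and then samples the image. Note the assumptions: the function names are hypothetical, the corner vectors are passed in cyclic order, and standard bilinear interpolation is used in place of the weighting of the four neighboring pixels described in the text.

```python
import math

EPS = 1e-12  # tolerance for points lying exactly on a corner vector


def solve_2x2(v1, v2, p):
    """Return (alpha, beta) such that p = alpha*v1 + beta*v2 (Cramer's rule)."""
    det = v1[0] * v2[1] - v1[1] * v2[0]
    alpha = (p[0] * v2[1] - p[1] * v2[0]) / det
    beta = (v1[0] * p[1] - v1[1] * p[0]) / det
    return alpha, beta


def map_point(p_prime, corners_prime, corners):
    """Map p' (relative to the centre of Z') to p (relative to the centre of Z).

    corners_prime, corners: the four corner vectors of Z' and Z in the same
    cyclic order, each measured from the respective centre of area.
    """
    for k in range(4):
        v1p, v2p = corners_prime[k], corners_prime[(k + 1) % 4]
        alpha, beta = solve_2x2(v1p, v2p, p_prime)
        if alpha >= -EPS and beta >= -EPS:   # p' lies between v1' and v2'
            v1, v2 = corners[k], corners[(k + 1) % 4]
            return (alpha * v1[0] + beta * v2[0],
                    alpha * v1[1] + beta * v2[1])
    raise ValueError("point does not lie in any corner sector")


def bilinear(image, x, y):
    """Bilinear interpolation; pixel image[n][m] has its midpoint at (x, y) = (m, n)."""
    m, n = int(math.floor(x)), int(math.floor(y))
    fx, fy = x - m, y - n
    return ((1 - fx) * (1 - fy) * image[n][m] + fx * (1 - fy) * image[n][m + 1]
            + (1 - fx) * fy * image[n + 1][m] + fx * fy * image[n + 1][m + 1])
```

To fill a transformed cell, one would loop over the pixel midpoints of $Z'$, call `map_point`, add the centre of $Z$ in image coordinates, and evaluate `bilinear` there.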
2. Determination of Relative Thickness Values

The intensity values $I'_{k,l}$ of the pixels $p'_{k,l}$ inside a cell $Z'$ are used to define the vector
$$\mathbf{R} = (I'_{1,1}, \ldots, I'_{1,N}, I'_{2,1}, \ldots, I'_{2,N}, \ldots, I'_{N,N}), \tag{16}$$
with $N = 2^n$ and $n$ being an integer. Three template images $\mathbf{R}^T_{1,2,3}$ are calculated by averaging the image vectors of the cells contained in three small regions. As shown in Kisielowski et al. (1995), the result of the QUANTITEM evaluation is nearly independent of the selection of the three (different) regions. The further steps of QUANTITEM are based on the assumption that each image vector $\mathbf{R}$ can be expressed as a linear combination of the template image vectors
$$\mathbf{R} \approx p_1 \mathbf{R}^T_1 + p_2 \mathbf{R}^T_2 + p_3 \mathbf{R}^T_3, \tag{17}$$
which defines a 3-D subspace in the $N^2$-dimensional image vector space. Furthermore, the tips of all image vectors $\mathbf{R}$ lie on a plane $E$ in the 3-D image vector space (Fig. 10a). The vectors $\hat{\mathbf{B}}_{1,2}$ given by
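$$\hat{\mathbf{B}}_1 := \frac{\mathbf{R}^T_2 - \mathbf{R}^T_1}{|\mathbf{R}^T_2 - \mathbf{R}^T_1|} \quad \text{and} \quad \hat{\mathbf{B}}_2 := \frac{\mathbf{V} - \hat{\mathbf{B}}_1 (\mathbf{V} \cdot \hat{\mathbf{B}}_1)}{|\mathbf{V} - \hat{\mathbf{B}}_1 (\mathbf{V} \cdot \hat{\mathbf{B}}_1)|} \quad \text{with} \quad \mathbf{V} = \frac{\mathbf{R}^T_3 - \mathbf{R}^T_1}{|\mathbf{R}^T_3 - \mathbf{R}^T_1|} \tag{18}$$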
are an orthonormal basis of $E$ (Fig. 10b). Due to the noise of the HRTEM image, the tips of the evaluated image vectors $\mathbf{R}_{eval}$ may deviate slightly from the plane $E$. Therefore, we use $\mathbf{R}_{eval}$ to define an in-plane vector $\mathbf{T}^{\parallel}_{eval}$ and a vector $\mathbf{T}^{\perp}_{eval}$ perpendicular to $E$ (Fig. 10b).
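The construction of the orthonormal basis of the plane $E$ and the decomposition of an image vector into in-plane and perpendicular parts can be sketched in Python as follows. The basis follows the Gram-Schmidt construction described above; the helper names are hypothetical, and measuring the decomposed vector from the tip of the first template vector is an assumption made here, since the defining relation is not reproduced in the text.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def scale(a, s):
    return [x * s for x in a]


def unit(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def plane_basis(r1, r2, r3):
    """Orthonormal basis (b1, b2) of the plane through the tips of r1, r2, r3."""
    b1 = unit(sub(r2, r1))
    v = unit(sub(r3, r1))
    b2 = unit(sub(v, scale(b1, dot(v, b1))))   # remove the b1 component of v
    return b1, b2


def decompose(r_eval, r1, b1, b2):
    """Split r_eval - r1 into in-plane coordinates (x, y), T_par and T_perp."""
    t = sub(r_eval, r1)          # assumption: T is measured from the tip of r1
    x, y = dot(t, b1), dot(t, b2)
    t_par = [x * p + y * q for p, q in zip(b1, b2)]
    t_perp = sub(t, t_par)
    return (x, y), t_par, t_perp
```

The ratio of the lengths of `t_perp` and `t_par` could then serve as one possible reliability measure for a cell, although the text does not confirm that this is the quantity used.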
FIGURE 10. Visualization of the QUANTITEM procedure: (a) shows the plane $E$, which is defined by three template image vectors $\mathbf{R}^T_{1,2,3}$. It contains the cloud of experimental data points represented by vectors (one of them is marked by $\mathbf{R}_{eval}$) whose tips form an ellipse; (b) shows the decomposition of the vector $\mathbf{T}$ into the components $\mathbf{T}^{\parallel}_{eval}$ parallel and $\mathbf{T}^{\perp}_{eval}$ perpendicular to the plane $E$; $\hat{\mathbf{B}}_1$ and $\hat{\mathbf{B}}_2$ form an orthonormal basis of $E$; (c) illustrates the meaning of some variables used in Eqs. (20) to (22).
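The value $\Delta_{eval}$, which is in the range $\Delta_{eval} \approx 0.01$ to $0.1$, is used to estimate the reliability of each value $\mathbf{T}^{\parallel}_{eval}$. The data $\mathbf{T}^{\parallel}_{eval,i} = x_i \hat{\mathbf{B}}_1 + y_i \hat{\mathbf{B}}_2$ ($i = 1, 2, \ldots$ numbers the unit cells) obtained for each cell $Z_i$ are used to fit an ellipse given by $x(\varphi)\hat{\mathbf{B}}_1 + y(\varphi)\hat{\mathbf{B}}_2$, with
$$x(\varphi) = a \cos\varphi + x_0, \qquad y(\varphi) = b \sin(\varphi - \varphi_0) + y_0. \tag{20}$$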
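The values $a$, $b$, $x_0$, $y_0$, and $\varphi_0$, which are explained in Fig. 10c, are obtained from a fit procedure. They are used to calculate the angle $\varphi_i$ corresponding to each cell $Z_i$ by
$$\varphi_i = \arctan\left(\frac{a \sin\tilde{\varphi}_i + b \sin\varphi_0 \cos\tilde{\varphi}_i}{b \cos\tilde{\varphi}_i \cos\varphi_0}\right) \quad \text{where} \quad \tilde{\varphi}_i = \arctan\left(\frac{y_i - y_0}{x_i - x_0}\right). \tag{21}$$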
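The conversion of a data point in the plane $E$ into the ellipse parameter angle of Eq. (21) can be sketched as below. The function name `ellipse_angle` is hypothetical. As a deliberate deviation from the plain arctangent in the text, the sketch multiplies numerator and denominator by the distance from the ellipse centre and uses `math.atan2`, so that the quadrant is resolved over the full range; this assumes $a, b > 0$ and $\cos\varphi_0 > 0$.

```python
import math


def ellipse_angle(x, y, a, b, x0, y0, phi0):
    """Return the ellipse parameter phi of the point (x, y).

    The ellipse is x = a*cos(phi) + x0, y = b*sin(phi - phi0) + y0.
    With dx = x - x0 and dy = y - y0 the ratio below equals tan(phi);
    atan2 keeps the quadrant information that a plain arctan would lose.
    """
    dx, dy = x - x0, y - y0
    num = a * dy + b * math.sin(phi0) * dx
    den = b * math.cos(phi0) * dx
    return math.atan2(num, den)
```

For points that do not lie exactly on the fitted ellipse, the function returns the angle of the ellipse point on the same ray from the centre.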
To obtain the sample thickness $d_i$ at the position of each image unit cell, we use the following approximation given in Kisielowski et al. (1995):
where $\xi$ is the extinction distance of the undiffracted beam along the ⟨110⟩ zone axis (e.g., $\xi = 14.7$ nm for GaAs at 200 kV), and $\Theta$ is an angle that defines the origin of the thickness scale. The angle $\Theta$ is unknown, and its determination requires some additional information. If the thickness of one unit cell is exactly known, $\Theta$ is obtained from Eq. (22). The procedure to obtain $\Theta$ is outlined in the next section. Another approach that does not require three template image vectors uses correspondence analysis (CA), as described in Aebersold et al. (1996), which is also implemented in the DALI program package. This procedure is analogous to interpreting the cloud of data points given by Eq. (16) as a distribution of masses that approximates the shape of a nearly 2-D ellipsoid. The CA is used to find the axes of least inertia, which are the "main axes" of the ellipsoid, by calculating the eigenvectors and eigenvalues of the matrix of inertia. The two eigenvectors that correspond to the two largest eigenvalues define a plane that is analogous to the plane $E$ shown in Fig. 10. The projection of the image vectors onto the plane yields an ellipse that can be evaluated analogously to Eqs. (20) to (22). However, we found that the application of the CA does not improve significantly upon the use of the three image template vectors $\mathbf{R}^T_{1,2,3}$. Furthermore, the calculation of the two largest eigenvalues and their corresponding eigenvectors takes about 30 min in the case of a 1024 x 1024 matrix of inertia. We turn now to the evaluation of the experimental HRTEM image shown in Fig. 1. However, in this section we use a region of the photographic
FIGURE 11. The open circles indicate the tips of the vectors $\mathbf{T}^{\parallel}_{eval}$ on the plane $(\hat{\mathbf{B}}_1, \hat{\mathbf{B}}_2)$. The solid line shows an ellipse fitted to the experimental data. Plot labels: sample thickness = 0 nm; axes in arbitrary units.
negative that contains a larger part of the substrate (Fig. 13). Figure 11 shows the tips of the vectors $\mathbf{T}^{\parallel}_{eval}$ on the $(\hat{\mathbf{B}}_1, \hat{\mathbf{B}}_2)$ plane. The solid line is the ellipse that was fitted to the experimental data according to Eqs. (20) and (21). The thicknesses that correspond to the data points increase in the clockwise direction. The next section outlines two methods to calculate $\Theta$, which can be used to determine absolute thickness values.

3. Determination of Absolute Thickness Values
A simple method to obtain $\Theta$ that works with sufficient accuracy in most cases uses an image vector $\mathbf{R}_0 = (I'_{1,1}, \ldots, I'_{N,N})$ with $I'_{1,1} = \ldots = I'_{N,N} = c$. Here $\mathbf{R}_0$ represents an image cell that contains only one gray level $c$, corresponding to the image that is expected for a vanishing sample thickness. This cell is added as another data point to the cloud of points formed by Eq. (16), leading to an additional point $(\tilde{x}_0, \tilde{y}_0)$ in $E$, as indicated with a cross in Fig. 11. A straight line that connects the origin of the ellipse with the point $(\tilde{x}_0, \tilde{y}_0)$ intersects the ellipse at a point $(x_\Theta, y_\Theta)$, which can be regarded as the point of the ellipse that corresponds to a
sample thickness of 0 nm. However, in systems where three Bloch waves are excited with substantial intensity, this may constitute an insufficient approximation. A second procedure to determine the offset angle $\Theta$ of Eq. (22) is based on the method of Stenkamp and Jäger (1996) and Stenkamp and Strunk (1996), who suggested considering the amplitudes of appropriate Fourier coefficients $J(\mathbf{g})$. The intensity distribution $I(\mathbf{r})$ in the high-resolution image is derived from a Fourier sum
in which the complex Fourier coefficients $J(\mathbf{g})$ depend on the beam amplitudes and phases and on the microscope parameters, as described in Stenkamp and Jäger (1996) and Stenkamp and Strunk (1996). A fast Fourier transform algorithm is used to obtain the Fourier coefficients $J(\mathbf{g})$ and to calculate the amplitude
The offset angle $\Theta$ may now be derived using a known functional relationship between $J(\mathbf{g})$ and the specimen thickness. In particular, the relation
can be used because the amplitudes of all Bloch waves with $\mathbf{g} \neq (000)$ are zero for a vanishing sample thickness. It is appropriate to note that double spacing images, which are dominated by the (220) lattice fringes, and images showing complex contrast features that originate from low intensities of the $J_{111}$ coefficients cannot be used. However, this restriction does not affect the applicability of the described procedure in practice if a defocus series is taken instead of a single HRTEM image. Both for the conditions mentioned here and for sufficiently small thicknesses the following approximation can be made
$$\frac{|J_{111}(d)|}{|J_{000}(d)|} \propto d, \tag{26}$$
which works for GaAs in ⟨110⟩ projection up to a sample thickness of about 6 nm, as verified by EMS simulations. Figure 12 displays $|J_{111}(d)|/|J_{000}(d)|$ evaluated from the part of the experimental HRTEM image previously used for the computation of Fig. 11. In this case, the offset angle $\Theta$ was calculated as an average of the results obtained for the two methods described in the foregoing. The first method leads to an origin of the thickness scale that is indicated by a cross in Fig. 12.
FIGURE 12. The ratio $|J_{111}|/|J_{000}|$ for each image unit cell is plotted versus the measured thickness. The dashed line shows the extrapolation of the line that is fitted to the data below 6 nm. The cross on the thickness axis marks the origin of the thickness scale that is obtained using an image unit cell of uniform intensity. Horizontal axis: measured thickness / [nm].
The second method yields an origin given by the intersection of the straight line fitted to the data points below 6 nm (marked with a dashed line) with the thickness axis in Fig. 12. Together with an error of the thickness determination, which is obtained from the spread of the data points in the horizontal direction as indicated in Fig. 12, we can estimate a maximum error of ±1.5 nm in the present example. Figure 13 shows the resulting thickness map, which reveals a wedge-shaped crystal whose thickness increases from the upper part to the bottom of the image.
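The second method, in which the origin of the thickness scale is taken from the intersection of a fitted straight line with the thickness axis, can be sketched as follows. The function name `thickness_origin` and the default cutoff `d_max` are illustrative assumptions; the sketch uses an ordinary least-squares line through the data points below the cutoff.

```python
def thickness_origin(thickness, ratio, d_max=6.0):
    """Return the thickness at which a line fitted to (thickness, ratio) vanishes.

    Only data points with thickness below d_max (in nm) enter the fit, since
    the linear relation is expected to hold only for small thicknesses.
    """
    pts = [(d, r) for d, r in zip(thickness, ratio) if d < d_max]
    n = len(pts)
    mean_d = sum(d for d, _ in pts) / n
    mean_r = sum(r for _, r in pts) / n
    slope = (sum((d - mean_d) * (r - mean_r) for d, r in pts)
             / sum((d - mean_d) ** 2 for d, _ in pts))
    intercept = mean_r - slope * mean_d
    return -intercept / slope   # intersection of the line with the thickness axis
```

The returned value is the shift that would have to be subtracted from the measured thicknesses so that a vanishing amplitude ratio corresponds to zero thickness.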
C. Consideration of the Elastic Relaxation

The determination of the local sample thickness is the basis for the development of a 3-D model for the application of the finite-element (FE) method (which represents the main subject of the current section). The FE method is introduced to take into account the elastic relaxation of the tetragonally distorted epitaxial layer, which is due to the small sample thickness in electron beam direction of typically less than 20 nm. In the next
FIGURE13. Color-coded map of the evaluated thicknesses. (See also Plate 14.)
section we calculate the tetragonal distortion for two cases: the thin- and the thick-sample limits.

1. The Analytical Solution of the Thin- and the Thick-Sample Limits

Figure 14a, b depicts the situation that applies for cross-sectional samples of quantum well type heterostructures. In both cases the strained layer (gray) is able to expand at the free sample surfaces. In the case of the thin sample (Fig. 14a), the strained layer is elastically relaxed to the maximum extent. This
FIGURE 14. Sketch showing the reduction of the tetragonal distortion of the strained layer (marked in gray) in a specimen that is thin in electron beam direction (a) in comparison with a biaxially strained thick sample (b). Figure label: electron beam direction.
leads to a diminished tetragonal distortion in comparison to Fig. 14b, which corresponds to the case of a very thick sample that is equivalent to the bulk structure. The displacements evaluated in Section II-A depend on the local composition according to Vegard’s law, which for a ternary material $A_xB_{1-x}C$ is given by
$$a_{A_xB_{1-x}C} = x\,a_{AC} + (1 - x)\,a_{BC}, \tag{27}$$
where $a_{AC}$ and $a_{BC}$ are the lattice parameters of the binary components. The main application of strain-state analysis is the composition evaluation in pseudomorphically grown structures. In this case, the unit cells of the epitaxial layers are tetragonally distorted. The lattice parameter $a_{\parallel}$ of a layer unit cell parallel to the interface plane and perpendicular to the electron
beam direction is defined by the lattice parameter $a_s$ of the substrate (Fig. 14). The lattice parameter parallel to the interface plane and parallel to the electron beam direction, as well as the parameter $a_{\perp}$ perpendicular to the interface plane, vary locally. For comparison of the FE calculation with the experimental displacement field we use the approximation that an atomic distance measured from the HRTEM contrast pattern corresponds to a lattice parameter $a_{\perp}$ that is averaged along the electron beam direction. In the following, the lattice parameters $a_{\perp}$ are calculated under the assumption of a cubic crystal structure for the electron beam directions ⟨110⟩ and ⟨100⟩. If the reference area is chosen inside the substrate (lower crystal part in Fig. 14), the lattice parameters that correspond to measured lattice spacings in growth direction (Eq. (12)) are, in the two limiting cases of a thin or a thick sample, given by Eq. (28),
where $a$ is the (local) bulk lattice parameter and $C_{ij}$ are the elastic constants of the strained layer. Table I gives an overview of the elastic constants and the lattice parameters of the semiconductors used here. Equation (28) can be used to calculate the error of the composition determination if the sample thickness is not known.

TABLE I
ELASTIC CONSTANTS AND LATTICE PARAMETERS OF THE SEMICONDUCTORS THAT ARE OF INTEREST

Material                            GaAs     InAs     ZnSe     ZnTe     CdSe
Lattice parameter at 20°C / [nm]    0.5653   0.6058   0.5669   0.6104   0.6081
c11 / [GPa]                         118.1    83.3     81.0     71.2     66.7
c12 / [GPa]                         53.2     45.3     48.8     40.7     46.3
c44 / [GPa]                         59.4     39.6     44.1     31.2     22.3

An analytical solution analogous to Eq. (28) for any
sample thickness is given in Treacy and Gibson (1986). Assuming one of the extreme sample thicknesses, we can also give an approximation for the measured displacements. From Eqs. (9) and (11) we obtain
i=l
us
i=l
where $u_i$ is the (averaged) displacement of the grid line $i$ parallel to the interface plane (Fig. 15). If the strained layer is a ternary material, we define the “integral” AC-content $C_{AC}$ of a ternary material $A_xB_{1-x}C$ in the layer by
$$C_{AC} = \sum_{i=1}^{n} x(i) \quad \text{in units of [ML AC]}, \tag{30}$$
where the distance between adjacent planes i and i + 1 of grid lines parallel to the interface plane is designated to be one monolayer (ML). From Eqs. (27) and (29) we deduce assuming a, = a&
where $u_{max}$ is the maximum displacement that is measured on top of the
FIGURE 15. Schematic drawing that explains the increasing displacements $u_i$ in a region of a ternary material $A_xB_{1-x}C$ with a larger lattice parameter than the binary compound BC if the reference region is chosen inside the material BC. Legend: lattice position; reference lattice position.
strained layer and $a^{\perp}_{BC}$ is the lattice parameter in the “substrate” perpendicular to the interface plane. Note that $a^{\perp}_{BC}$ is used instead of the bulk lattice parameter $a_{BC}$ in order to take into account cases where the “substrate” BC is a buffer layer that may itself be tetragonally distorted to a certain extent.

2. Finite Element Calculations

In the previous section it was shown that the lattice mismatch of a strained layer causes a tetragonal distortion that is reduced by the small sample thickness. A further elastic relaxation can take place for an island that is able to expand at its free surfaces. In the case of the composition evaluation of islands the application of FE calculations is recommended. It is also advisable for the investigation of 2-D buried or free-standing layers if the composition estimation based on Eq. (28) is not sufficiently accurate. The FE calculation starts with the generation of a 3-D geometric model of the sample, which we perform with the MSC-PATRAN program (see reference). For that purpose, the projected shape of the sample that is visible in the HRTEM micrograph as well as the local sample thickness that is determined according to Section II-B are exploited. The 3-D model is composed of “solids” of uniform composition. An island or a 2-D layer is therefore represented by a stack of slices, where the slice thicknesses in growth direction usually are of the order of two to four monolayers. Figure 16 shows the decomposition of the FE model into solids for the evaluation example. In order to simulate an assumed concentration profile, elastic constants, a virtual thermal expansion coefficient $\alpha_{Thermal}$, and a heating temperature $\Delta T$ are assigned to each solid with the appropriate material parameters. The expansion coefficients are introduced to simulate a local lattice mismatch that will occur during the FE calculation by a heating of $\Delta T$.
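The assignment of a virtual thermal expansion coefficient to a solid of a given composition might be sketched as follows. This is an illustration under an explicit assumption: the thermal strain $\alpha_{Thermal}\,\Delta T$ is set equal to the lattice mismatch $(a - a_s)/a_s$ between the bulk material of the solid and the substrate. The function names are hypothetical, and the lattice parameters are the GaAs and InAs values of Table I.

```python
A_GAAS = 0.5653   # lattice parameter of GaAs in nm (Table I)
A_INAS = 0.6058   # lattice parameter of InAs in nm (Table I)


def vegard(x, a_ac, a_bc):
    """Bulk lattice parameter of A(x)B(1-x)C according to Vegard's law."""
    return x * a_ac + (1.0 - x) * a_bc


def virtual_expansion_coefficient(a_layer, a_substrate, delta_t=1.0):
    """Expansion coefficient whose thermal strain reproduces the lattice mismatch.

    Assumption: alpha * delta_t = (a_layer - a_substrate) / a_substrate.
    """
    return (a_layer - a_substrate) / (a_substrate * delta_t)


def alpha_for_indium_content(x):
    """Virtual expansion coefficient of an In(x)Ga(1-x)As solid on GaAs."""
    return virtual_expansion_coefficient(vegard(x, A_INAS, A_GAAS), A_GAAS)
```

With a heating of one temperature unit, a solid of pure InAs on GaAs would receive a coefficient of about 0.072, and a solid of the substrate material a coefficient of zero.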
In practice, $\Delta T = 0$ is chosen for the solids of the substrate and $\Delta T = 1$ for the solids of the strained material. Therefore, the thermal expansion coefficient of a solid has to fulfill
$$\alpha_{Thermal}\,\Delta T = \frac{a - a_s}{a_s}, \tag{32}$$
where $a_s$ is the lattice parameter of the substrate and $a$ the bulk lattice parameter corresponding to the material of the solid. One also has to define the orientation of the coordinate system $(\hat{x}_{ec}, \hat{y}_{ec}, \hat{z}_{ec})$ associated with the elastic constants, which may deviate from the orientation of the coordinate axes $(\hat{x}_{geom}, \hat{y}_{geom}, \hat{z}_{geom})$ used for the generation of the geometric model.

FIGURE 16. Geometry of the FE model and its decomposition into solids of uniform composition.

For the generation of the geometry the $\hat{y}_{geom}$-axis is usually chosen parallel to the growth direction and the $\hat{z}_{geom}$-axis parallel to the electron beam direction. If the latter is a crystallographic ⟨110⟩ direction, the coordinate system associated with the elastic constants can be formulated as $(\hat{x}_{ec}, \hat{y}_{ec}, \hat{z}_{ec}) = (\hat{x}_{geom} + \hat{z}_{geom},\ \hat{y}_{geom},\ -\hat{x}_{geom} + \hat{z}_{geom})$. Furthermore, boundary conditions are defined for the displacements $u$ of the bottom plane of the FE model ($u_{x_{geom}} = u_{y_{geom}} = u_{z_{geom}} = 0$) as well as for the side planes ($u_{x_{geom}} = 0$). The third step is the decomposition of the solids into finite elements, where care has to be taken that the element density is high in regions where large displacement changes are expected (e.g., inside the island or the 2-D strained layer). Figure 17 shows the decomposition of the FE model into finite elements for the evaluation example. In our case, the structural data are written to a file that serves as the input file for the ABAQUS solver (see references). Figure 17 depicts a color-coded map of the resulting displacements in growth direction, given in nanometers. We apply the following steps to the result of the FEM to allow a direct comparison with the experimental displacement values determined by the strain-state analysis of the HRTEM image. In the first step, atomic positions are calculated in the 3-D FE model of the specimen, depending on crystal structure and orientation. Next, atomic displacements are determined by
A. ROSENAUER AND D. GERTHSEN
FIGURE 17. FE model with color-coded values of the components of the displacement vectors in growth direction. The color-coded scale is given in nanometers. The light-blue grid indicates the finite elements. (See also Plate 15.)
interpolation of the surrounding nodal displacements. Finally, the atomic positions and displacements are averaged along the atomic rows in electron beam direction. As a result, a 2-D field of projected atomic positions and displacements is obtained that can be evaluated with the DALI program. The result of the FEM simulation is then compared with the experimental displacements. In an iterative process, the solid compositions are changed until sufficient agreement with the experiment is achieved. The displacement vectors that result from the FE-calculation (blue) can be compared in Fig. 6 with the experimental displacement vectors (red) evaluated from a part of the example HRTEM image. The high degree of coincidence shows the validity of the FEM approach to the nanoscopic problem. Figure 18 displays the In-concentration profile in growth direction used for the FEM. Note that the assumed In-concentration does not vary along the planes parallel to the interface. Figure 19 shows color-coded maps of (a) the displacements in growth direction and (b) the displacements in [110] direction. The scaling is chosen identically to that of Fig. 7, which
ATOMIC SCALE STRAIN AND COMPOSITION EVALUATION
[Plot: In-concentration versus (002)-plane number.]
FIGURE 18. Resulting In-concentration plotted versus the (002)-plane number. The dashed line marks the position of the surface next to the island. The finite In-content to the right-hand side of the dashed line corresponds to the wetting layer with a thickness of one ML.
FIGURE 19. Components of the displacement vector field in (a) growth direction, and (b) horizontal direction evaluated from the FE calculation as described in Section II-C-2. (See also Plate 16.)
reveals the good agreement between Fig. 19 and Fig. 7. Figure 8 contains the FE-displacements averaged in a region corresponding to the AOI in Fig. 1.
III. COMPOSITION EVALUATION BY LATTICE FRINGE ANALYSIS
The alternative composition evaluation procedure presented in this section does not exploit the information of lattice parameter fluctuations. Therefore, it may be regarded as an analysis method that is complementary to the strain-state analysis described in Section II. First, we will explain the basic idea that leads to the composition evaluation by lattice fringe analysis (CELFA) method.
A. The Basic Idea Behind Composition Evaluation by Lattice Fringe Analysis
To outline the basic idea of CELFA, let us use the abstraction of the composition evaluation problem sketched in Fig. 20. The investigated crystal is considered to be a system that is defined by N parameters $P_1 \ldots P_N$, examples of which are the crystal structure, the orientation, or the composition. The electron beam constitutes an incoming test signal $S_{input}$. The response of the system to the test signal is the intensity distribution of the real image or the diffraction pattern, which is called the outgoing signal $S_{output}$. The response signal is proportional to the test signal and contains a function of the parameters $P_1 \ldots P_N$ as well as an additional noise signal $S_{noise}$. The noise signal disturbs the measurement and leads to errors in the interpretation of the response signal, which is exploited for computation of some of the parameters $P_1 \ldots P_N$.
[Diagram: incoming test signal $S_{input}$ → system defined by the parameters $P_1 \ldots P_N$ → outgoing signal $S_{output}$.]
FIGURE 20. Sketch showing the abstraction of the composition determination problem where the investigated crystal is regarded as a system defined by N parameters. The incident electron beam corresponds to the incoming test signal that interacts with the system. The response signal of the system contains a function of the parameters and an additive noise signal.
In our case, the parameter “composition” is of special interest. The information concerning the composition is contained in many of the Bloch waves. In the sphalerite structure, the structure factor $SF_{hkl}$ of a reflection (hkl) is given by
\[ SF_{hkl} = 4\left[ f_A + f_B \exp\{ i 2\pi (h + k + l)/4 \} \right], \tag{33} \]
where $f_A$ and $f_B$ are the atomic scattering factors of the two-atomic basis, for example, A = Ga, In and B = As. In the case of the (002) reflection we obtain $SF_{002} = 4(f_A - f_B)$. Therefore, the (002) reflection depends strongly on the chemical composition and is called a “chemically sensitive” reflection. It is appropriate to note that this is not always the case. For an electron beam direction close to, for example, a ⟨110⟩-zone axis, an “artificial” excitation of the (002) beam can occur due to multiple scattering (e.g., $(1\bar{1}1) + (\bar{1}11) = (002)$). Figure 21a shows, as an example, an image of an InGaAs layer buried under 10 nm of GaAs, which was taken with a strongly excited (002) beam (close to the [100]-zone axis) centered on the optical axis. Figure 22 depicts the dependence between (002)-beam intensity and In-concentration for sample thicknesses up to 150 nm, calculated with the EMS program package using the Bloch-wave method. In Fig. 22, each intensity curve is normalized in such a way that it is 1 in GaAs. Figure 22 can be used to compute a color-coded map of the In-content from Fig. 21a, which is presented in Fig. 21b, where the sample thickness was assumed to be smaller than 100 nm. In this way one can gain an overview of the In-concentration in the wetting layer, which covers the GaAs substrate, and in the islands. However, local composition evaluation is inaccurate because of the large amount of noise that is visible in Fig. 21a, b. The amount of noise can be reduced by filtering techniques that are applicable to nonperiodic images. An example given in Baba and Kanaya (1989) is an approach using the autocorrelation (AC) function where uncorrelated information close to the center of the AC spectrum is removed (Fig. 21c). However, Fig. 21c reveals that the accuracy of the local composition detection is not sufficient even in the case of the noise-reduced image. Moreover, blurring of the interfaces is observed.
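As a small numerical illustration of Eq. (33), the sketch below evaluates the structure factor for the (002) and (004) reflections. The scattering factors `F_A` and `F_B` are hypothetical placeholder numbers chosen only to show the difference/sum behavior; they are not tabulated values for Ga, In, or As.

```python
import cmath

def structure_factor(h, k, l, f_a, f_b):
    """Eq. (33): SF_hkl = 4 [f_A + f_B exp(i 2 pi (h + k + l) / 4)]."""
    phase = cmath.exp(1j * 2 * cmath.pi * (h + k + l) / 4)
    return 4 * (f_a + f_b * phase)

# Hypothetical scattering factors in arbitrary units (assumption, illustration only).
F_A, F_B = 3.0, 2.5

sf_002 = structure_factor(0, 0, 2, F_A, F_B)  # phase factor -1: depends on f_A - f_B
sf_004 = structure_factor(0, 0, 4, F_A, F_B)  # phase factor +1: depends on f_A + f_B

print("|SF_002| =", abs(sf_002))
print("|SF_004| =", abs(sf_004))
```

Because (002) depends on the difference of the two scattering factors, a change of $f_A$ (for example, by replacing Ga with In) alters it proportionally much more than (004), which depends on their sum.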
For further development of the CELFA procedure, let us again consider Fig. 20. The situation shown in Fig. 20 reflects not only experimental methods such as electron microscopy, but applies to a wide variety of measurement techniques. A simple example is the recording of the current versus voltage curve of some resistor or semiconductor samples. In terms of Fig. 20, the incoming test signal is the applied voltage, the system is the resistor or the semiconductor, and the response signal is the measured current. The system is defined by its resistance, which is calculated by $S_{input}/S_{output}$. If the resistor is used for the temperature measurement in a
FIGURE 21. (a) {100} TEM dark field image of an In$_x$Ga$_{1-x}$As/GaAs(001) Stranski-Krastanow layer capped with 10 nm GaAs, obtained with the strongly excited (002) reflection centered on the optical axis; (b) color-coded map of the local In-content calculated according to Fig. 22; (c) TEM image; and (d) color-coded map after noise reduction. (See also Plate 17.)
helium cryostat, the produced heat has to be kept as low as possible. In this case, the measured currents are low and the resistance measurement is disturbed significantly by noise. The signal-to-noise ratio (SNR) is improved by the use of the lock-in technique. This means that the test signal $S_{input}$ is modulated with a defined frequency $f_{mod}$. The output signal that is
[Plot: indium concentration versus normalized image intensity.]
FIGURE 22. Indium concentration plotted versus the normalized image intensity calculated for an image that is obtained with the (002) beam centered on the optical axis. The curves corresponding to different sample thicknesses as given in the legend are calculated with the Bloch-wave method of the EMS program package. Each computed curve is normalized with respect to the intensity in the GaAs (x = 0) at the appropriate thickness.
proportional to $S_{input}$ shows the same modulation. A Fourier filter is used to measure the amplitude of the Fourier coefficient corresponding to $f_{mod}$. In this way, the SNR is improved significantly because the Fourier spectrum of the noise signal generally contains only negligible contributions at the frequency $f_{mod}$. The question that arises at this point is: How can we make use of the lock-in technique for composition evaluation with chemically sensitive reflections? Of course, modulation of the test signal “beam current” is the wrong way because the noise signal that stems mainly from amorphous surface layers is also proportional to the test signal. In this case one has to consider a spatial modulator rather than a modulator in time. If we remember that HRTEM images are composed of a limited number of spatial frequencies, we see that the crystal lattice itself can take over the task of a spatial modulator. From the preceding discussion, the closest approach to the lock-in technique would be an image that shows only one spatial frequency, leading to a fringe pattern where the local amplitude of the fringes is proportional to the local amplitude of the (002) reflection. In this case, the noise-filtering technique described in Section II-A-1, which works with periodic images, could be used. The measurement of the local amplitude of the (002) reflection in image unit cell diffractograms would yield undisturbed information. How can we approach this idealized concept in practice? First, we have to consider that the (002) beam that carries relevant information should not be modified by lens aberrations. Therefore, the (002) beam has to be centered on the optic axis. In this way delocalization effects (Thust et al., 1996) are also avoided because they depend on the spatial derivative of the aberration function, which vanishes on the optic axis.
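The lock-in principle referred to above can be illustrated with a minimal sketch: a weak signal modulated at a known frequency is buried in much stronger noise, and its amplitude is recovered from the single Fourier coefficient at the modulation frequency. All numbers (`F_MOD`, `DT`, `N_SAMPLES`, `AMPLITUDE`, `NOISE_SIGMA`) are arbitrary assumptions chosen for illustration, and Gaussian white noise is assumed.

```python
import math
import random

def fourier_amplitude(samples, freq, dt):
    """Amplitude of the Fourier coefficient of `samples` at frequency `freq`."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq * i * dt) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * i * dt) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n

random.seed(0)
F_MOD = 50.0        # modulation frequency (assumed)
DT = 1e-3           # sampling interval (assumed)
N_SAMPLES = 20000   # covers an integer number of modulation periods
AMPLITUDE = 0.1     # weak signal amplitude (assumed)
NOISE_SIGMA = 1.0   # noise ten times larger than the signal (assumed)

noisy = [AMPLITUDE * math.sin(2 * math.pi * F_MOD * i * DT)
         + random.gauss(0.0, NOISE_SIGMA) for i in range(N_SAMPLES)]

recovered = fourier_amplitude(noisy, F_MOD, DT)
print(f"true amplitude {AMPLITUDE}, recovered {recovered:.3f}")
```

In the spatial analogue discussed in the text, the sample index plays the role of the pixel position and the modulation frequency is the spatial frequency of the lattice fringes.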
Second, adsorbed objects at the surface of the specimen can lead to a local modification of the signal, which would induce an error in composition detection. Therefore, we suggest the use of two spatial frequencies, where one of them does not carry significant chemical information, as is approximately the case for the (004) beam. The amplitude of the second reflection depends on local absorption but not on composition. Therefore, we suggest the measurement of the ratio of the amplitudes of the (002) and the (004) beam. Third, one has to take into account the nonlinear image formation if the Fourier amplitudes of image cell diffractograms are measured. In summary, we suggest the following imaging condition in the case of the sphalerite structure: A three-beam condition close to the [100]-zone axis is required where only the (000), (002), and (004) beams are strongly excited. The chemically sensitive (002) reflection has to be centered on the optic axis. Figure 23a,b shows the imaging condition that will be used for the
[Plate 1 plot; horizontal axis: magnetic field (T).]
PLATE 1. Conductance measurements of open quantum dots provide a spectroscopic probe of their discrete level spectrum. The energy level spectrum of an isolated (0.3 μm) dot is shown on page 6; here, we show the corresponding conductance contour plot, obtained with four modes now propagating in the dot leads. Lighter regions correspond to higher conductance.
PLATE 2. Quantum mechanical wavefunction simulation showing electrons emerging in a highly collimated beam from a quantum point contact. The gate geometry is taken to be the same as the asymmetric pattern shown in Fig. 2(a). In the left-hand figure, only one occupied mode is present in the quantum point contact, while in the right-hand one, seven modes are supported.
PLATE 3. The well-defined periodicity observed in the weak field magneto-conductance fluctuations is found to be correlated to the recurrence of well-defined wavefunction scars within the dot. In this figure we show the behavior observed in a 0.4-μm split-gate dot, which reveals fluctuations with a fundamental frequency of 9 T$^{-1}$. This frequency content does not change significantly as the dot lead openings are varied, and corresponds closely to the field scale over which a diamond scar recurs. Lighter regions in these probability density plots correspond to regions of enhanced probability density. The experiment was performed at a temperature of 0.01 K.
PLATE 4. Diamond scars formed in quantum dots with different numbers of modes present in the quantum point contact leads. The dot size here is 0.3 μm, which corresponds to the effective size of the experimental dot studied in Fig. 8.
PLATE 5. Self-consistently computed wavefunction plots, obtained from simulations of the split-gate dot geometry shown in Fig. 14. The plots were obtained at three different gate voltages and the darker regions correspond to enhanced probability density. A typical dot profile is shown in the upper left figure.
[Plate 6 plot; horizontal axis: magnetic field (T).]
PLATE 6. Experimentally determined conductance contour plot, obtained for a 0.4-μm split-gate quantum dot. The color scale ranges from red to blue, indicating low to high conductance, respectively.
PLATE 7. Numerically determined conductance contour plot, for a 0.3-μm quantum dot. The well-defined lines indicated by the arrows correspond to lines of constant scarring.
PLATE 8. Left: The root mean square (rms) amplitude of the conductance fluctuations decreases exponentially with increasing temperature in experiment. Right: The experimental temperature variation is thought to reflect a similar exponential quenching of the wavefunction scarring, which is induced as the electron dephasing rate increases. In this figure the computed wavefunction in a 0.3-μm dot is shown for a number of different phase breaking times ($\tau_\phi$).
PLATE 9. Convolution of the density of states with the derivative of the Fermi function in a 1-μm gated dot at 2.9 T (only the two lowest Landau levels are shown). In this figure, only the upper left corner of the dot is shown, in the region near the input quantum point contact (see Fig. 2 for the dot geometry) (Bird et al., 1997e).
PLATE 10. The edge state structure in a quantum dot at high magnetic fields suggests an analogy to the level structure of atoms. In this case, the red regions correspond to compressible electron gas and the calculation is performed for the gate geometry shown.
PLATE 11. {110} HRTEM image of an In$_x$Ga$_{1-x}$As/GaAs(001) Stranski-Krastanow island containing the grid that connects the local brightness maxima of the dumbbells. The marked area of interest (AOI, blue frame) is used for the determination of the In-concentration inside the island. The reference area (green frame) is used for the calculation of the basis vectors of the reference lattice.
PLATE 12. Part of the displacement vector fields evaluated from Fig. 1 (drawn in red) and obtained by finite element calculation (blue) as outlined in Section II.C.2.
PLATE 13. Color-coded maps of the components of the displacement vector field (a) in growth direction and (b) in interface direction (a positive value indicates a displacement vector pointing to the right) deduced from Fig. 1.
PLATE 14. Color-coded map of the evaluated thicknesses.
PLATE 15. FE model with color-coded values of the components of the displacement vectors in growth direction. The color-coded scale is given in nanometers. The light-blue grid indicates the finite elements.
PLATE 16. Components of the displacement vector field in (a) growth direction, and (b) horizontal direction evaluated from the FE calculation as described in Section II.C.2.
PLATE 17. (a) {100} TEM dark field image of an In$_x$Ga$_{1-x}$As/GaAs(001) Stranski-Krastanow layer capped with 10 nm GaAs, obtained with the strongly excited (002) reflection centered on the optical axis; (b) color-coded map of the local In-content calculated according to Fig. 22; (c) TEM image; and (d) color-coded map after noise reduction.
PLATE 18. Color-coded maps of the local In-content x of a nominally 2 nm thick In$_x$Ga$_{1-x}$As layer capped with 10 nm GaAs. The cap layer is to the right of the In$_x$Ga$_{1-x}$As. The maps are obtained from (a) $|J_{002}|/|J_{004}|$ and (b) $|J_{002}|$ of local unit cells using a mean value of (a) $T_{002}/T_{004}$ and (b) $T_{002}$ calculated from the GaAs buffer on the left-hand side of the In$_x$Ga$_{1-x}$As.
PLATE 19. Color-coded maps of the local In-content x obtained from $|J_{002}|/|J_{004}|$ of local unit cells using (a) a mean value of $T_{002}/T_{004}$ calculated from an area of 10×10 cells in the upper left corner of the image and (b) a local map of $T_{002}/T_{004}$.
PLATE 20. (a) Local values of $T_{002}/T_{004}$ computed in two regions to the left and the right of the In$_x$Ga$_{1-x}$As; (b) local map of $T_{002}/T_{004}$ obtained after averaging and extrapolation of the values shown in (a).
PLATE 21. Color-coded maps of the relative error Δx/x per Δt = 1 nm uncertainty of the measured sample thickness for (a,d) In$_x$Ga$_{1-x}$As, (b,e) Cd$_x$Zn$_{1-x}$Se, and (c,f) Al$_x$Ga$_{1-x}$As using (a,b,c) $|J_{002}|/|J_{004}|$ and (d,e,f) $|J_{002}|$ for the composition determination. The graphs were computed according to Eq. (65) using the definition of S(x,t,0) given in Eq. (59) for (a,b,c) and in Eq. (66) for (d,e,f).
PLATE 22. Color-coded maps of (a) local lattice parameters in growth direction; (b) local displacement vector field components in growth direction; and (c) thicknesses evaluated with the QUANTITEM procedure in the region indicated with a black rectangle in (a). The arrow in (c) marks the region corresponding to the In$_x$Ga$_{1-x}$As where the thickness map yields invalid results.
FIGURE 23. (a) Electron diffraction pattern that illustrates the imaging condition. The white cross in the upper left corner marks the [100]-zone axis; (b) schematically shows the (000), (002), and (004) beams that are used for the formation of the HRTEM image. The chemically sensitive (002) beam is centered on the optic axis; (c) illustrates the possibilities of tilting the crystal around an axis perpendicular to the interface plane (light gray) and parallel to it (dark gray). The former case results in a fringe pattern parallel to the interface and the latter leads to a pattern with fringes running perpendicular to it. Note that in the latter case the tilt angle should be smaller than 4° in order to prevent a significant broadening of the evaluated concentration profile.
description of the CELFA method. Figure 23c visualizes the crystal orientation for a specimen containing a thin buried layer with a [001] growth direction. There are two possibilities to obtain the required three-beam condition: one uses the (000), (020), and (040) beams, the other the (000), (002), and (004) beams. The difference between the imaging conditions using either the (00j) or the (0j0) beams (j = 2, 4) is that the fringes run either parallel or perpendicular to the (001) interface. In the first case, the fringe spacings depend on the local composition in strained heterostructures. In the second case, all fringes have the same spacing, the substrate lattice
parameter, in pseudomorphically grown structures. Therefore, strain effects that may influence the HRTEM image patterns are avoided if the (0j0) beams are used. As shown in Fig. 23c, the evaluation with the (0j0) beams requires a tilt of the sample around an axis parallel to the interface plane, which results in a broadening of the projected interlayer. Therefore, the tilt angle has to be small and should be in the range of 2°-4°. In the following treatment, we will not distinguish between the (00j) and the (0j0) beams. The “standard procedure” for determination of the local composition is based on consideration of the ratio $|J_{002}|/|J_{004}|$ of the amplitudes of the (002) and the (004) reflections of Fourier-transformed image unit cells. The suggested procedure requires a defocus series of n images (typically n = 10), which enables evaluation of the local sample thickness. All free parameters of the image formation are derived directly from the defocus series with a simple procedure. Therefore, the CELFA method does not require any knowledge concerning imaging conditions (except sample orientation). Furthermore, we will show how the effect of locally changing imaging conditions across the HRTEM image can be taken into account.
B. The Fringe Images
In Section III we change the evaluation example and use a buried In$_x$Ga$_{1-x}$As Stranski-Krastanow layer that was grown on a GaAs (001) substrate and capped with 10 nm GaAs. Figure 24a depicts a fringe image, which is the first image of a defocus series of 10 images. The defocus stepsize was adjusted to 9 nm. The image contains a large amount of noise. Figure 24b represents the same image after noise reduction, performed with the Wiener filtering method described in Section II-A-1. Additionally, the back transformation of the noise-reduced Fourier transform was performed with only circular areas around the (002) and (004) reflections. The radii of the circular apertures were chosen in such a manner that the circles overlap. Inside the GaAs buffer and cap layer, Fig. 24b reveals a contrast pattern consisting of alternating bright, dark, less bright, and again dark fringes. The distance between the bright fringes is about 0.28 nm. With increasing In-content (from the left to the middle part of Fig. 24b), the intensity of the bright fringes decreases whereas the brightness of the darker fringes increases. As we will see later, their intensity becomes equal at an In-content of about 22%. From Fig. 24b we realize that the imaging condition used seems to be well suited for the compositional analysis that we intend. The next section provides the theoretical description of the observed contrast pattern, which is necessary for understanding the described analysis procedure.
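The back transformation restricted to apertures around the (002) and (004) reflections can be sketched in one dimension. The example below is only a simplified stand-in: it uses a 1-D intensity profile instead of a 2-D image, a naive discrete Fourier transform, and it omits the Wiener filter of Section II-A-1. The bin positions `K002` and `K004`, the fringe amplitudes, and the noise level are assumptions made for illustration.

```python
import cmath
import math
import random

def dft(values):
    """Naive discrete Fourier transform (adequate for short profiles)."""
    n = len(values)
    return [sum(values[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]

def aperture_filter(profile, centers, radius):
    """Back-transform using only Fourier bins within `radius` of the given centers."""
    n = len(profile)
    kept = []
    for k, coeff in enumerate(dft(profile)):
        freq = k if k <= n // 2 else n - k          # |frequency| of bin k
        inside = any(abs(freq - c) <= radius for c in centers)
        kept.append(coeff if inside else 0j)
    return [v.real for v in inverse_dft(kept)]

def rms_difference(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

N = 128
K002, K004 = 8, 16   # assumed Fourier bins of the (002) and (004) fringes
clean = [0.5 * math.cos(2 * math.pi * K002 * j / N)
         + 0.8 * math.cos(2 * math.pi * K004 * j / N) for j in range(N)]

random.seed(1)
noisy = [1.0 + c + random.gauss(0.0, 0.5) for c in clean]  # mean level plus noise

# Radius 4 makes the two apertures (centered 8 bins apart) just touch/overlap.
filtered = aperture_filter(noisy, centers=(K002, K004), radius=4)

print("rms error before:", rms_difference([v - 1.0 for v in noisy], clean))
print("rms error after: ", rms_difference(filtered, clean))
```

The apertures pass both fringe frequencies and the range between them while rejecting the mean intensity and most of the broadband noise.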
FIGURE 24. HRTEM fringe image obtained with the imaging condition shown in Fig. 23; (a) the micrograph before; and (b) after Wiener noise filtering. The white rectangle in (a) marks the area that is used for the calculation of the sample thickness.
C. Theoretical Considerations
We start with a consideration of the (000), (002), and (004) beams that contribute to the image formation. The amplitudes of all other zero-order Laue zone (ZOLZ) beams are comparably low and will be neglected in the following. Figure 25 shows the amplitudes and phases of the three beams in In$_x$Ga$_{1-x}$As as a function of the sample thickness and the In-concentration
[Plots: amplitudes and phases versus sample thickness [nm].]
FIGURE 25. Results of Bloch-wave computations performed with the EMS program package for In$_x$Ga$_{1-x}$As for different indium concentrations. Amplitudes of (a) the (000), (c) the (002), and (e) the (004) beams are plotted versus the sample thickness; (b), (d), and (f) show the corresponding phases.
x calculated with EMS. Figure 25 reveals that the amplitudes and phases of the (000) and (004) beams show a rather weak dependence on x. This is also the case for the phase $p_{002}$ of the (002) beam, whereas its amplitude $a_{002}$ varies strongly with x. Note that $a_{002}$ changes its sign as a function of x. This is due to a phase shift of π, which for clarity was attributed to the amplitude $a_{002}$ instead of to $p_{002}$. For x = 25%, the curve $a_{002}(t)$ (t denotes the sample thickness) is close to zero throughout the entire sample thickness range. In this case, the image is formed by the interference of only the (000) and (004) beams, which leads to an image pattern consisting of fringes with the same maximum intensity at a distance of 0.14 nm. This is the case for all sample thicknesses t and defocus values Δf. Figures 26 and 27 show the amplitudes and phases for Cd$_x$Zn$_{1-x}$Se and Al$_x$Ga$_{1-x}$As, respectively. In Cd$_x$Zn$_{1-x}$Se, the amplitude of the (002) beam vanishes at a Cd-content of approximately x = 40%. In contrast, Al$_x$Ga$_{1-x}$As represents a material where the amplitude of the (002) beam remains positive in the whole range of x. Let us now consider the nonlinear image formation in some detail for the given conditions. According to Ishizuka (1980), the complex amplitude $J'(\mathbf{g})$ of the reflection $\mathbf{g}$ of the image power spectrum is, in the case of the untilted electron beam, given by
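\[ J'(\mathbf{g}) = \sum_{\mathbf{h}} T(\mathbf{g} + \mathbf{h}, \mathbf{h}; \Delta f)\, F(\mathbf{g} + \mathbf{h})\, F^{*}(\mathbf{h}), \tag{34} \]
where $F(\mathbf{g})$ is the Fourier transform of the object transmission function. The $T(\mathbf{g} + \mathbf{h}, \mathbf{h}; \Delta f)$ is the transmission cross coefficient defined by Ishizuka (1980), which has the properties
\[ T(\mathbf{g}, \mathbf{h}; \Delta f) = T^{*}(\mathbf{h}, \mathbf{g}; \Delta f) \tag{35} \]
and
\[ T(\mathbf{g}, \mathbf{h}; \Delta f) = T(-\mathbf{g}, -\mathbf{h}; \Delta f). \tag{36} \]
As described in Section III-A, the incident electron beam is tilted in such a way that the (002) beam is parallel to the optical axis. In this case, Eq. (34) is modified to
\[ J(\mathbf{g}) = \sum_{\mathbf{h}} T(\mathbf{g} + \mathbf{h} - \mathbf{g}_{002}, \mathbf{h} - \mathbf{g}_{002}; \Delta f)\, F(\mathbf{g} + \mathbf{h})\, F^{*}(\mathbf{h}). \tag{37} \]
As mentioned in the preceding, we only consider the beams (000), (002), and (004), which leads us to
\[ F(\mathbf{g}_{00l}) = 0 \quad \text{for } l \neq 0, 2, 4. \tag{38} \]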
Therefore, we obtain the complex Fourier coefficients of the three relevant
[Plots: amplitudes and phases versus sample thickness [nm].]
FIGURE 26. Results of Bloch-wave computations performed with the EMS program package for Cd$_x$Zn$_{1-x}$Se for different cadmium concentrations. Amplitudes of (a) the (000), (c) the (002), and (e) the (004) beams are plotted versus the sample thickness; (b), (d), and (f) show the corresponding phases.
[Plots: amplitudes and phases versus sample thickness [nm].]
FIGURE 27. Results of Bloch-wave computations performed with the EMS program package for Al$_x$Ga$_{1-x}$As for different aluminium concentrations. Amplitudes of (a) the (000), (c) the (002), and (e) the (004) beams are plotted versus the sample thickness; (b), (d), and (f) show the corresponding phases.
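reflections in the image power spectrum as follows:
\[ J(0) = T(-\mathbf{g}_{002}, -\mathbf{g}_{002}; \Delta f)\,|F(0)|^{2} + T(0, 0; \Delta f)\,|F(\mathbf{g}_{002})|^{2} + T(\mathbf{g}_{002}, \mathbf{g}_{002}; \Delta f)\,|F(\mathbf{g}_{004})|^{2}, \tag{39} \]
\[ J(\mathbf{g}_{002}) = T(0, -\mathbf{g}_{002}; \Delta f)\, F(\mathbf{g}_{002})\, F^{*}(0) + T(\mathbf{g}_{002}, 0; \Delta f)\, F(\mathbf{g}_{004})\, F^{*}(\mathbf{g}_{002}), \tag{40} \]
\[ J(\mathbf{g}_{004}) = T(\mathbf{g}_{002}, -\mathbf{g}_{002}; \Delta f)\, F(\mathbf{g}_{004})\, F^{*}(0), \tag{41} \]
where, according to Eqs. (35) and (36),
\[ T(0, -\mathbf{g}_{002}; \Delta f) \overset{(35)}{=} T^{*}(-\mathbf{g}_{002}, 0; \Delta f) \overset{(36)}{=} T^{*}(\mathbf{g}_{002}, 0; \Delta f). \tag{42} \]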
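The selection of terms that survive in Eq. (37) under the three-beam restriction of Eq. (38) can be checked by simple enumeration. The following sketch labels each (00l) beam by its index l; the function name `contributing_terms` and the tuple layout are illustrative choices, not notation from the text.

```python
BEAMS = (0, 2, 4)  # indices l of the (00l) beams retained by Eq. (38)
CENTER = 2         # the (002) beam is centered on the optic axis

def contributing_terms(g):
    """List the nonzero terms of Eq. (37) for the reflection (00g).

    Each entry is ((t1, t2), f, f_conj), standing for the product
    T(g_00t1, g_00t2; df) * F(g_00f) * conj(F(g_00f_conj)).
    """
    terms = []
    for h in BEAMS:                 # F*(h) vanishes unless h is a retained beam
        if g + h in BEAMS:          # F(g + h) vanishes unless g + h is retained
            terms.append(((g + h - CENTER, h - CENTER), g + h, h))
    return terms

for g in BEAMS:
    print(f"J(g_00{g}):", contributing_terms(g))
```

For g = 2 the enumeration returns two terms, one with T(0, -g_002) and one with T(g_002, 0), and for g = 4 a single term with T(g_002, -g_002), which are the coefficients that enter the abbreviations introduced next.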
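The quantity $T(\mathbf{g}_{002}, -\mathbf{g}_{002}; \Delta f)$ in Eq. (41) is real because of
\[ T(\mathbf{g}_{002}, -\mathbf{g}_{002}; \Delta f) \overset{(35)}{=} T^{*}(-\mathbf{g}_{002}, \mathbf{g}_{002}; \Delta f) \overset{(36)}{=} T^{*}(\mathbf{g}_{002}, -\mathbf{g}_{002}; \Delta f). \tag{43} \]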
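Therefore, we may use the abbreviations
\[ T_{002} \exp(-i(\chi_f + \chi_s)) := T(0, -\mathbf{g}_{002}; \Delta f), \quad T_{002} \exp(i(\chi_f + \chi_s)) := T(\mathbf{g}_{002}, 0; \Delta f), \quad \text{and} \quad T_{004} = T(\mathbf{g}_{002}, -\mathbf{g}_{002}; \Delta f), \tag{44} \]
where $T_{002}$ and $T_{004}$ are real numbers and $\chi_f$ and $\chi_s$ are phase shifts introduced by the objective lens defocus Δf and the spherical aberration, given by
\[ \chi_f = \pi \lambda\, \Delta f\, g_{002}^{2}; \qquad \chi_s = \frac{\pi}{2} C_s \lambda^{3} g_{002}^{4}. \tag{45} \]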
In Eq. (45), λ is the wavelength of the incident electron beam and $C_s$ the spherical aberration constant. Inserting the abbreviations in Eq. (44) into Eqs. (40) and (41) yields
\[ J(\mathbf{g}_{002}) = T_{002}\left[ \exp(-i(\chi_f + \chi_s))\, F(\mathbf{g}_{002})\, F^{*}(0) + \exp(i(\chi_f + \chi_s))\, F(\mathbf{g}_{004})\, F^{*}(\mathbf{g}_{002}) \right] \tag{46} \]
\[ J(\mathbf{g}_{004}) = T_{004}\, F(\mathbf{g}_{004})\, F^{*}(0). \tag{47} \]
Furthermore, we introduce
\[ F(\mathbf{g}_{00l}) = a_{00l}\, e^{i p_{00l}}, \quad l = 0, 2, 4. \tag{48} \]
From Eqs. (46) and (47) we obtain the amplitudes of $J(\mathbf{g}_{002})$ and $J(\mathbf{g}_{004})$ by
\[ |J(\mathbf{g}_{002})| = T_{002} \sqrt{(a_{002} a_{000})^{2} + (a_{004} a_{002})^{2} + 2 a_{002} a_{000} a_{004} a_{002} \cos(\varphi_n)} \tag{49} \]
\[ |J(\mathbf{g}_{004})| = T_{004}\, a_{004}\, a_{000}, \tag{50} \]
where $\varphi_n$ is given by
\[ \varphi_n = -2(\chi_f + \chi_s) + (p_{002} - p_{000}) - (p_{004} - p_{002}) = -2\chi_n + 2 p_{002} - p_{000} - p_{004}. \tag{51} \]
Here $\chi_n = \chi_f + \chi_s$ corresponds to the image n of the defocus series. Equations (45), (49), and (51) show that a linear change of the defocus results in an oscillation of $|J(\mathbf{g}_{002})|$. The defocus change corresponding to a full oscillation of $|J(\mathbf{g}_{002})|$ is given by
\[ \Delta_{Period}(\Delta f) = \frac{1}{\lambda g_{002}^{2}}. \tag{52} \]
The ratio $|J_{002}|/|J_{004}|$ is calculated from Eqs. (49) and (50), leading to
\[ |J_{002}|/|J_{004}| = \frac{T_{002}}{T_{004}}\, a_{002} \sqrt{\frac{1}{a_{004}^{2}} + \frac{1}{a_{000}^{2}} + \frac{2}{a_{004} a_{000}} \cos(-2\chi_n + 2 p_{002} - p_{000} - p_{004})}. \tag{53} \]
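The defocus dependence expressed by Eqs. (45), (49), (50), and (51) can be explored numerically. The sketch below is a toy evaluation under stated assumptions: the wavelength (0.00251 nm, roughly that of 200 kV electrons), the GaAs lattice parameter (0.5653 nm), the beam amplitudes, and the constant phase `offset` that lumps together $\chi_s$ and the beam phases are all illustrative values, not results from the text. Positive amplitudes are assumed.

```python
import math

WAVELENGTH_NM = 0.00251      # assumed electron wavelength (about 200 kV)
LATTICE_NM = 0.5653          # GaAs lattice parameter (assumed for g_002)
G002 = 2.0 / LATTICE_NM      # spatial frequency of the (002) reflection in 1/nm

def phi(defocus_nm, offset=0.3):
    """Eq. (51) with chi_f from Eq. (45); `offset` lumps -2*chi_s and the beam phases."""
    return -2.0 * math.pi * WAVELENGTH_NM * G002 ** 2 * defocus_nm + offset

def j002(t002, a000, a002, a004, angle):
    """Eq. (49)."""
    return t002 * math.sqrt((a002 * a000) ** 2 + (a004 * a002) ** 2
                            + 2 * a002 * a000 * a004 * a002 * math.cos(angle))

def j004(t004, a000, a004):
    """Eq. (50)."""
    return t004 * a004 * a000

def ratio_eq53(t002, t004, a000, a002, a004, angle):
    """Eq. (53), written for positive amplitudes."""
    return (t002 / t004) * a002 * math.sqrt(
        1 / a004 ** 2 + 1 / a000 ** 2 + 2 * math.cos(angle) / (a004 * a000))

# Hypothetical beam amplitudes and transfer factors (illustration only).
T002, T004, A000, A002, A004 = 1.0, 1.0, 0.8, 0.1, 0.3

period_nm = 1.0 / (WAVELENGTH_NM * G002 ** 2)  # defocus change for a full oscillation
print(f"defocus period: {period_nm:.1f} nm")
for df in (0.0, 10.0, 20.0, 30.0):
    print(df, j002(T002, A000, A002, A004, phi(df)))
```

Under these assumptions the amplitude of the (002) reflection repeats after a defocus change of one period, while the (004) amplitude of Eq. (50) carries no defocus-dependent cosine term.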
D. Determination of Sample Thickness and Phase $\chi_n$
Equation (49) indicates an oscillation of $|J(\mathbf{g}_{002})|$ as a function of $\varphi_n$, which is a linear function of the defocus according to Eqs. (45) and (51). Figure 28 gives an example of a defocus series consisting of 10 exposures (9 of them are shown), where the defocus stepsize of the microscope was adjusted to 9 nm. Each image is a small part of an image with a size of 1024 × 1024 pixels. An area of known composition in the GaAs buffer layer is used for the thickness determination and is marked by the white rectangle in Fig. 24a. Figure 29 shows the amplitudes $|J(\mathbf{g}_{002})|$ plotted versus the image number (running from 1 to 10, 1 corresponding to the largest underfocus). Figure 29 was obtained in the following way: From each of the ten images, a region of the same size and position (indicated in Fig. 24a) was Fourier-transformed. The 10 pixels with the largest amplitudes enclosed in a circular area around the (002) reflection were summed for each image. The data points were fitted by a curve according to Eq. (49) given by
\[ |J(\mathbf{g}_{002})| = (1 + An)\sqrt{B + C\cos(D(n - E))}, \quad n = 1 \ldots 10, \tag{54} \]
where A, B, C, D, E are the fit parameters and n is the image number. The factor (1 + An) in Eq. (54) takes into account the (weak) defocus
FIGURE 28. Small parts of the HRTEM images of a defocus series of 10 images (nine of them are shown), each showing the In$_x$Ga$_{1-x}$As interlayer as well as the GaAs cap and buffer layers. A defocus stepsize of the microscope of 9 nm was chosen. The images 2, 5, and 8 show a fringe pattern that is dominated by the (004) fringes corresponding to minima of the amplitude of the (002) reflection in Fig. 29.
[Plot: amplitude of the (002) reflection versus number of image of defocus series.]
FIGURE 29. Amplitude of the (002) reflection in the diffractogram plotted versus the image number shown in Fig. 28. The values $J_{max}$ and $J_{min}$ are used to derive the specimen thickness.
dependence of $T_{002}$ in Eq. (49) that contains the source-size-dependent envelope function $E_s(\mathbf{g} + \mathbf{h}, \mathbf{h}; \Delta f)$ (Ishizuka, 1980). The values obtained for the fit parameters are
\[ A = 0.012, \quad B = 1.88 \times 10^{8}, \quad C = 1.7 \times 10^{8}, \quad D = 2.06 \pm 0.04, \quad E = 0.6. \tag{55} \]
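For orientation, the fitted curve of Eq. (54) can be evaluated with the parameter values of Eq. (55). The sketch below only re-evaluates the published fit; it does not reproduce the fitting itself, and the helper names are illustrative.

```python
import math

# Fit parameters as quoted in Eq. (55).
A, B, C, D, E = 0.012, 1.88e8, 1.7e8, 2.06, 0.6

def fitted_amplitude(n):
    """Eq. (54): |J(g_002)| as a function of the image number n."""
    return (1 + A * n) * math.sqrt(B + C * math.cos(D * (n - E)))

# Number of images per full oscillation of the cosine term.
images_per_period = 2 * math.pi / D

# Image numbers at which the fitted curve has a local minimum.
minima = [n for n in range(2, 10)
          if fitted_amplitude(n) < fitted_amplitude(n - 1)
          and fitted_amplitude(n) < fitted_amplitude(n + 1)]

for n in range(1, 11):
    print(n, round(fitted_amplitude(n)))
print("images per oscillation:", round(images_per_period, 2))
print("local minima at images:", minima)
```

The minima of the fitted curve fall on the same image numbers that the caption of Fig. 28 identifies as being dominated by the (004) fringes.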
According to Eqs. (52) and (54), the defocus stepsize between adjacent images of the defocus series is given by
\[ \Delta_{Stepsize}(\Delta f) = \frac{D}{2\pi \lambda g_{002}^{2}} = (10.3 \pm 0.2)\ \text{nm}. \]
The angles $\varphi_n$ of Eq. (49) follow from Eq. (54) as $\varphi_n = D(n - E)$. Finally, the phases $\chi_n$ of Eq. (51) are calculated by
\[ \chi_n = p_{002} - \tfrac{1}{2}(\varphi_n + p_{000} + p_{004}) = p_{002} - \tfrac{1}{2}(D(n - E) + p_{000} + p_{004}), \tag{56} \]
where the $p_{00j}$ are computed by the Bloch-wave method for x = 0 for the appropriate thickness, whose determination is described in the following. From Eq. (49) we recognize that the maxima $J_{max}$ and minima $J_{min}$ of $|J(\mathbf{g}_{002})|$ correspond to $\cos(\varphi_n) = \pm 1$, leading to
\[ J_{max/min} = T_{002} \sqrt{(a_{002} a_{000})^{2} \pm 2 a_{002} a_{000} a_{004} a_{002} + (a_{004} a_{002})^{2}} = T_{002}\,(a_{002} a_{000} \pm a_{004} a_{002}). \tag{57} \]
[Plot: amplitude ratio versus sample thickness [nm].]
FIGURE 30. Amplitude ratio of the (000) and (004) beams plotted versus the sample thickness. The graph is used to determine the specimen thickness.
As already mentioned in this section, $T_{002}$ contains the source-size-dependent envelope function $E_s(\mathbf{g}+\mathbf{h}, \mathbf{h}; \Delta f)$ (Ishizuka, 1980), which depends weakly on the defocus $\Delta f$. Therefore, the values of $J_{\max}$ and $J_{\min}$ change slightly between two adjacent oscillation periods of $|J(\mathbf{g}_{002})|$. For the following we use $J_{\max}$ and $J_{\min}$ as shown in Fig. 29 and deduce from Eq. (57):

$$\frac{J_{\max} + J_{\min}}{J_{\max} - J_{\min}} = \frac{a_{000}}{a_{004}}. \tag{58}$$

In our case, the value $(J_{\max} + J_{\min})/(J_{\max} - J_{\min}) = 1.6$ is calculated from Fig. 29. The sample thickness can be obtained directly from the thickness dependence of $a_{000}/a_{004}$ for GaAs shown in Fig. 30, which was calculated with the Bloch-wave method. Figure 30 also contains $a_{000}/a_{004}$ for ZnSe. The calculations were performed with an absorption coefficient of 0.04. However, to a good approximation the curves of $a_{000}/a_{004}$ versus thickness do not depend on the absorption. In the present case, we find a sample thickness of 16 nm.
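The thickness determination just described can be sketched numerically. The snippet below is a minimal illustration and not the DALI implementation: it forms the ratio $(J_{\max} + J_{\min})/(J_{\max} - J_{\min})$, which equals $a_{000}/a_{004}$, and then finds every thickness at which a tabulated $a_{000}/a_{004}$ curve takes that value. The function names and the sample curve are hypothetical; in practice the curve would come from a Bloch-wave calculation such as the one plotted in Fig. 30.

```python
import numpy as np

def amplitude_ratio(j_max, j_min):
    """Return (J_max + J_min) / (J_max - J_min), which equals a000 / a004."""
    if j_max <= j_min:
        raise ValueError("j_max must be larger than j_min")
    return (j_max + j_min) / (j_max - j_min)

def thickness_candidates(ratio, t_nm, ratio_curve):
    """Return all thicknesses where the tabulated a000/a004 curve equals `ratio`.

    Crossings between table points are located by linear interpolation.
    """
    t = np.asarray(t_nm, dtype=float)
    r = np.asarray(ratio_curve, dtype=float) - ratio
    hits = []
    for i in range(len(t) - 1):
        if r[i] == 0.0:
            hits.append(float(t[i]))
        elif r[i] * r[i + 1] < 0.0:
            frac = r[i] / (r[i] - r[i + 1])
            hits.append(float(t[i] + frac * (t[i + 1] - t[i])))
    if r[-1] == 0.0:
        hits.append(float(t[-1]))
    return hits

# Hypothetical, coarse stand-in for a Bloch-wave a000/a004 table (not real data).
toy_thickness = [0.0, 10.0, 20.0, 30.0]
toy_curve = [4.0, 2.0, 1.0, 3.0]
candidates = thickness_candidates(amplitude_ratio(13.0, 3.0), toy_thickness, toy_curve)
```

A thickness-dependent amplitude ratio need not be monotonic, so the lookup returns all crossings rather than a single value; selecting among them requires additional knowledge of the specimen.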
E. The Evaluation Procedure

In this section the individual steps of the evaluation process are listed. Figure 31 gives an overview of the analysis steps that are described in the
FIGURE 31. Schematic drawing showing the individual analysis steps of the CELFA procedure. The rectangles indicate procedures and the ellipses are the data that are used for a particular procedure. In the first step (1), N images of a defocus series (squares with rounded corners) are used to evaluate the sample thickness $t$ (Eq. (58) and Figs. 29 and 30) and the phase angles $\chi_i$ (Eqs. (49) and (51)). One image $i$ with a large value of $|J_{002}|$ is noise reduced (2) and oriented so that the fringes run in a horizontal direction (3). Using a Fourier-filtered image formed by the (004) reflection only, a 2-D grid is calculated that subdivides the image into image unit cells (4). The next step (5) involves the computation of reflection amplitudes and phases and the measurement of the real factors $T_{002}$ and $T_{004}$, either as mean values inside a reference area or locally inside regions with $x = 0$. In the latter case, a local map of either $T_{002}/T_{004}$ or $T_{002}$ is then computed, which is used for concentration determination according to Eq. (53) or Eq. (49), respectively. In steps (6) and (7), experimental values of either $|J_{002}|/|J_{004}|$ or $|J_{002}|$ obtained from each image unit cell are compared with a list of values of $|J_{002}|/|J_{004}|$ or of $|J_{002}|$, which are computed with Eqs. (49) and (50). The entry with the best fit yields the evaluated concentration $x$.
following:

1. Determination of sample thickness and $\chi_n$. Sample thickness and angle $\chi_n$ are measured from the unprocessed N images of a defocus series as described in Section III-D.

2. Noise reduction. The image $n$ of the defocus series to be evaluated is Fourier transformed, and the filter of Eq. (4) is computed (Section II-A-1) and subsequently applied to the Fourier transform. Only circular areas around the ±(002) and the ±(004) reflections as well as the central pixel are used for the inverse Fourier transformation. The circles have to be chosen large enough that relevant information is not lost (e.g., satellite reflections close to the (002) reflection in the case of a compositional superlattice must be included).

3. Correction of the image orientation. The fringe patterns recorded with an on-line CCD camera are arbitrarily oriented. However, the CELFA procedure of the DALI program package requires the fringes to run along the horizontal direction. Therefore, the image has to be rotated with the procedure outlined in Section II-B-1, "cell transformation," where the whole image is used as the "cell". The rotation angle is computed from a line that is drawn by the user parallel to a fringe.

4. Subdivision of the image into image unit cells. In the case of the fringe pattern, the 2-D gridding involves two difficulties. First, no information is available for the positions of the grid lines perpendicular to the fringes. Their distance, therefore, has to be chosen. In most cases square image cells seem meaningful. The horizontal grid lines parallel to the fringe pattern are found by searching for brightness maxima along the chosen vertical grid lines perpendicular to the fringes. For measurement of the Fourier amplitudes, their minimum distance is given by the spacing of two bright fringes, $d_{002} = a_0/2$, with $a_0$ being the lattice parameter. Often, the distance $2d_{002}$ is chosen in order to improve the accuracy of the composition evaluation. In both cases, the horizontal grid lines lie on either the bright or the less bright fringes in the regions with $x = 0$. However, the search condition, in which the grid lines have to be positioned on either the bright or the less bright fringes, is not sufficient if their brightnesses interchange for concentrations $x > x_0$. The value $x_0$ corresponds to the concentration where the amplitude $a_{002}$ of the (002) reflection vanishes, for example, $x_0 = 0.22$ for In$_x$Ga$_{1-x}$As. This second difficulty of the gridding can be surmounted by the following procedure. The image is Fourier transformed and the inverse transformation is performed with the ±(004) reflections only. Intensity maxima positions are searched along the chosen vertical grid lines, leading to a distance $d_{002}/2$ of the horizontal grid lines. Then, horizontal grid lines are deleted in such a way that either each second
or each fourth line is kept. The reduced grid is indexed as described in Section II-A-2 and lattice base vectors are calculated according to Section II-A-3. Next, the positions of the grid lines with respect to the original image have to be checked if one aims at positioning the grid lines either (a) on the bright or (b) on the less bright fringes in the region with $x = 0$. Figure 32 explains why these two options are not equivalent in particular
FIGURE 32. Schematic drawings explaining the difference if either (a) the bright fringes or (b) the dark fringes are lying at the image unit cell corners. Note that the effect may also occur conversely.
cases. Let us assume that the bright fringe in the center is induced by one row containing Al atoms with the concentration $x_1$, which is embedded in GaAs with the interfaces parallel to the vertical grid lines. Let us furthermore assume that the distance of horizontal grid lines is $d_{002}$. In case (a), there will be two rows of cells that reveal an Al-concentration of $x_1/2$, whereas in case (b) we find only one row that shows a concentration $x_1$. If the grid lines are not positioned as intended, they can be shifted by one-fourth or one-half of the vertical lattice base vector. The resulting grid that decomposes the image into unit cells will be used for the following analysis steps.

5. Determination of $T_{002}$ and $T_{004}$. A table is calculated with the Bloch-wave method and lists the values $a_{000}$, $a_{002}$, $a_{004}$ and $p_{000}$, $p_{002}$, $p_{004}$ as a function of the In-concentration $x$ for the relevant sample thickness. A step size of 1% is chosen for the In-concentration. A "reference region" is selected in an area with $x = 0$ of the image $n$, and $|J(\mathbf{g}_{002})|$ and $|J(\mathbf{g}_{004})|$ are measured and averaged over all unit cells contained in the reference region. Then, $T_{002}$ and $T_{004}$ are calculated from Eqs. (49) and (50) using the sample thickness (Eq. (58)) and the angle $\chi_n$ (see Eq. (56)) obtained by the procedure in the previous section as well as $a_{000}$, $a_{002}$, $a_{004}$ and $p_{000}$, $p_{002}$, $p_{004}$ calculated for $x = 0$ with EMS using the Bloch-wave method. The procedure that is indicated by the dotted box in Fig. 31 is only applied if the sample thickness and defocus change significantly in the area of interest (see Section III-F).

6. A second table, computed from the first one generated in step 5, lists the values of the right-hand side of Eq. (53).

7. The ratio $|J_{002}|/|J_{004}|$ for each image unit cell of the experimental image is compared with the calculated values of the second table formed in step 6. The table entry with the best agreement yields the local In-content $x$.

Figure 33a is a color-coded map of the local In-content obtained for image 1 of the defocus series. The yellow regions correspond to the GaAs and the green to red regions show the In-content in the In$_x$Ga$_{1-x}$As layer. Steps 1 to 7 describe the "standard" evaluation procedure CELFA. However, in some cases it may be favorable to apply variants. A first variant of the standard procedure that is contained in the DALI program package does not use the ratio $|J_{002}|/|J_{004}|$ for composition determination but only exploits $|J_{002}|$. This variant is meaningful if only small variations of $|J_{002}|$ have to be detected with high accuracy. It is advantageous that errors of the measurement of $|J_{004}|$ do not contribute to the evaluation of $x$. The disadvantage is that this variant does not take account of disturbances on the sample surfaces (amorphization by the ion-milling procedure, oxides, contaminants). However, the presence of such
FIGURE 33. Color-coded maps of the local In-content $x$ of a nominally 2 nm thick In$_x$Ga$_{1-x}$As layer capped with 10 nm GaAs. The cap layer is to the right of the In$_x$Ga$_{1-x}$As. The maps are obtained from (a) $|J_{002}|/|J_{004}|$ and (b) $|J_{002}|$ of local unit cells using a mean value of (a) $T_{002}/T_{004}$ and (b) $T_{002}$ calculated from the GaAs buffer on the left-hand side of the In$_x$Ga$_{1-x}$As. (See also Plate 18.)
imperfections can be checked beforehand with the standard procedure. In this case, steps 6 and 7 of the standard procedure have to be altered. In step 6, the second table has to list the values of the right-hand side of Eq. (49) instead of Eq. (53). In step 7, $|J_{002}|$ is compared with the second table formed in step 6. Figure 33b gives the result for the described variant. Differences between Fig. 33a and b can only be recognized for the highest In-concentrations, where the errors of the measurement of $|J_{004}|$ lead to the largest deviations $\Delta x$. A second variant that is described in the next section takes account of imaging conditions that may vary across the image.
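The comparison in steps 6 and 7 amounts to a nearest-entry search in a precomputed table. The following is a minimal sketch of that idea, not the DALI code: `best_fit_concentration` is a hypothetical helper, and the table values are a toy stand-in (a linear fall-off that vanishes at $x_0 = 0.22$ with an arbitrary prefactor) rather than results of a Bloch-wave calculation.

```python
import numpy as np

def best_fit_concentration(measured, x_grid, table_values):
    """For every image unit cell, return the x whose table entry is closest.

    measured     : array of |J002|/|J004| (or |J002|) values, one per cell
    x_grid       : concentrations for which the table was computed
    table_values : calculated values, one per entry of x_grid
    """
    measured = np.asarray(measured, dtype=float)
    x_grid = np.asarray(x_grid, dtype=float)
    table_values = np.asarray(table_values, dtype=float)
    # Distance of every cell value to every table entry: shape (..., n_table).
    distance = np.abs(measured[..., np.newaxis] - table_values)
    # argmin returns the first entry if several entries are equally close.
    return x_grid[np.argmin(distance, axis=-1)]

# Toy table with a 1% step in x (hypothetical numbers, for illustration only).
x_grid = np.linspace(0.0, 0.20, 21)
table = 2.0 * (1.0 - x_grid / 0.22)

# Toy "measured" values for a 2 x 2 block of image unit cells.
cells = np.array([[2.0, 1.0], [0.6, 1.55]])
x_map = best_fit_concentration(cells, x_grid, table)
```

The toy table is restricted to a range where it is monotonic. If the calculated values are not monotonic in $x$, a nearest-entry search alone cannot distinguish between concentrations that give the same value.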
F. Correction of Imaging Conditions Varying Across the Image

Figure 34a shows a map of the In-content calculated from the largest possible section of image 1. The reference area was chosen in the upper left corner. The evaluated In-content does not vanish in the whole GaAs region but increases from the top to the bottom. This artifact is due to the slow variation of the imaging conditions such as the defocus and the sample thickness, which affects the local validity of $T_{002}/T_{004}$ calculated in the reference region. In this section we describe the correction of this effect, leading to a higher accuracy of the evaluated In-concentrations.
FIGURE 34. Color-coded maps of the local In-content $x$ obtained from $|J_{002}|/|J_{004}|$ of local unit cells using (a) a mean value of $T_{002}/T_{004}$ calculated from an area of 10 x 10 cells in the upper left corner of the image, and (b) a local map of $T_{002}/T_{004}$. (See also Plate 19.)
The procedure is based on the calculation of a map that yields the appropriate values of $T_{002}/T_{004}$ for each image unit cell. $T_{002}/T_{004}$ is easily obtained for each unit cell inside the regions with $x = 0$ by the application of Eqs. (49) and (50). Figure 35a shows a color-coded map of $T_{002}/T_{004}$ calculated in the GaAs. We clearly recognize the variation of $T_{002}/T_{004}$ from the top to the bottom of the image. The values for $T_{002}/T_{004}$ in the regions with $x > 0$ are extrapolated from those with $x = 0$. Optionally, the value $T_{002}/T_{004}$ of each cell inside the region with $x = 0$ can be averaged by the computation of a mean value with neighbor cells. Figure 35b shows the completed map for $T_{002}/T_{004}$. The evaluation process of Section III-E has to be altered in such a way that the second table generated in step 6 is calculated for each image unit
FIGURE 35. (a) Local values of $T_{002}/T_{004}$ computed in two regions to the left and the right of the In$_x$Ga$_{1-x}$As; (b) local map of $T_{002}/T_{004}$ obtained after averaging and extrapolation of the values shown in (a). (See also Plate 20.)
cell separately using the appropriate value for $T_{002}/T_{004}$ contained in the map (Fig. 35b). The result is shown in Fig. 34b. The artifacts visible in Fig. 34a are removed in Fig. 34b. In cases where only $|J_{002}|$ is exploited for the composition evaluation instead of $|J_{002}|/|J_{004}|$, as described in the previous section, a map of local values of $T_{002}$ is generated instead of $T_{002}/T_{004}$.
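One possible way to complete such a map is sketched below. The text does not specify how the averaging and extrapolation are carried out, so the scheme here is an assumption: an optional mean over the valid 3 x 3 neighborhood, followed by row-wise linear interpolation between the valid cells on either side of the layer (with constant continuation at the image edges). The helper names `neighbor_mean` and `fill_rows` are hypothetical.

```python
import numpy as np

def neighbor_mean(values, valid):
    """Average each valid cell with its valid 3 x 3 neighbors; others stay NaN."""
    rows, cols = values.shape
    v = np.pad(np.where(valid, values, 0.0), 1)   # zero-padded values
    w = np.pad(valid.astype(float), 1)            # zero-padded weights
    total = np.zeros((rows, cols))
    count = np.zeros((rows, cols))
    for dy in range(3):
        for dx in range(3):
            total += v[dy:dy + rows, dx:dx + cols]
            count += w[dy:dy + rows, dx:dx + cols]
    out = np.full((rows, cols), np.nan)
    out[valid] = total[valid] / count[valid]
    return out

def fill_rows(values, valid):
    """Fill cells with x > 0 (valid == False) by interpolation along each row."""
    out = np.array(values, dtype=float)
    columns = np.arange(out.shape[1])
    for r in range(out.shape[0]):
        mask = valid[r]
        if mask.any():
            out[r, ~mask] = np.interp(columns[~mask], columns[mask], out[r, mask])
    return out

# Toy map: two reference columns (x = 0) with a layer (NaN) in between.
ratio = np.array([[1.0, np.nan, np.nan, 4.0],
                  [2.0, np.nan, np.nan, 2.0]])
is_reference = ~np.isnan(ratio)
completed = fill_rows(ratio, is_reference)
```

Row-wise interpolation suits a layer that runs vertically through the image with reference material on both sides, as in Fig. 35; other geometries would call for a different fill direction or a 2-D surface fit.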
G. Errors of the Composition Detection Due to Sample Thickness Uncertainties

In the following, we discuss the accuracy of the evaluation of the In-concentration depending on the error $\Delta t$ for determination of sample
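thickness. First, we introduce the abbreviation

$$S(x, t, \varphi) = \sqrt{\frac{1}{a_{004}^{2}} + \frac{1}{a_{000}^{2}} + \frac{2}{a_{004}a_{000}}\cos(\varphi)}. \tag{59}$$

Next, we use the following approximation for $a_{002}(x, t)$:

$$a_{002}(x, t) = a_{002}(0, t)\, c(x) \quad \text{with } c(0) = 1 \text{ and } c(x_0) = 0, \tag{60}$$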
where $x_0$ is the concentration for which the amplitude $a_{002}$ vanishes. For Al$_x$Ga$_{1-x}$As with $a_{002} > 0$ for the whole range of concentrations, $x_0$ is obtained by extrapolation. The values are $x_0^{\text{InGaAs}} = 0.22$, $x_0^{\text{AlGaAs}} = -0.28$ and $x_0^{\text{CdZnSe}} = 0.41$. The function $a_{002}(x, t)/a_{002}(0, t)$ is shown in Fig. 36a, b, c for In$_x$Ga$_{1-x}$As, Cd$_x$Zn$_{1-x}$Se, and Al$_x$Ga$_{1-x}$As, respectively. Figure 36 clearly indicates the validity of the approximation in Eq. (60) because the curve is nearly independent of sample thickness $t$. In the case of Al$_x$Ga$_{1-x}$As, Eq. (60) can be applied either for Al-contents below 50% or for sample thicknesses below 20 nm. Furthermore, we recognize that $c(x)$ can be regarded as a linear function $c(x) = c \times (x - x_0)$ to a good approximation. Now we use Eq. (53) to deduce:
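$$\left(\frac{|J_{002}|}{|J_{004}|}\right)_{\text{measured}} = \underbrace{\left(\frac{T_{002}}{T_{004}}\right)_{\text{measured}} a_{002}(0, t)\, S(0, t, \varphi)}_{=:\,G}\;\underbrace{c(x)\,\frac{S(x, t, \varphi)}{S(0, t, \varphi)}}_{=\,1 \text{ for } x = 0}. \tag{61}$$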
The value of $G$ is obtained for $x = 0$ in the fifth step of the evaluation procedure of Section III-E by the determination of $(T_{002}/T_{004})_{\text{measured}}$ from $(|J_{002}|/|J_{004}|)_{\text{measured}}$ inside the reference region. The obtained value of $G$ is valid for all In-concentrations $x$ because all factors that contribute to $G$ are independent of $x$. An error $\Delta t$ of the thickness measurement, therefore, affects only $S(x, t, \varphi)$. Furthermore, it is necessary to understand that it is not the absolute value of $S(x, t, \varphi)$ that may cause an error of the $x$-measurement; only deviations of $S(x, t, \varphi)/S(0, t, \varphi)$ for different thicknesses are relevant. For the explanation let us assume that the measured thickness $t$ deviates from the real thickness $t_{\text{real}}$ by $\Delta t$. In this case, the value $(T_{002}/T_{004})_{\text{measured}}$ that is determined from Eq. (61) (using the measured thickness $t$) in the GaAs will also deviate from the real value. Let us assume that we now use Eq. (61) for the measurement of $x$ in a region with $x > 0$. From Eq. (61) it becomes obvious that the determined value of $x$ will be correct if $S(x, t, \varphi)/S(0, t, \varphi) = S(x, t_{\text{real}}, \varphi)/S(0, t_{\text{real}}, \varphi)$. Therefore, only deviations of $S(x, t, \varphi)/S(0, t, \varphi)$
FIGURE 36. Amplitude $a_{002}(x, t)$ of the (002) beam normalized with respect to $a_{002}(0, t)$, plotted versus the sample thickness $t$ and the concentration $x$ for: (a) In$_x$Ga$_{1-x}$As; (b) Cd$_x$Zn$_{1-x}$Se; and (c) Al$_x$Ga$_{1-x}$As.
from $S(x, t_{\text{real}}, \varphi)/S(0, t_{\text{real}}, \varphi)$ may cause an error of the determination of $x$. For convenience, we will use the abbreviations $(|J_{002}|/|J_{004}|)_{\text{measured}} =: M$ and $S(x, t, \varphi)/S(0, t, \varphi) =: S_{\mathrm{n}}(x, t, \varphi)$ in the following. From Eq. (61) we obtain
From Eq. (62) we obtain the error
In order to estimate the maximum error with respect to $\varphi$, we use the phase $\varphi_{\max}$ that maximizes $dS_{\mathrm{n}}(x, t, \varphi)/dt$, yielding
Finally, we obtain by inserting $S_{\mathrm{n}}(x, t, \varphi) = S(x, t, \varphi)/S(0, t, \varphi)$, multiplying with $(x - x_0)/x$ and taking the absolute value
The color-coded map of Fig. 37a shows the result of Eq. (65) for In$_x$Ga$_{1-x}$As. If we assume a measured thickness of $t = 15$ nm with an error of $\Delta t = \pm 5$ nm, we obtain an error of $\Delta x/x = 0.007\ \text{nm}^{-1} \times 5\ \text{nm} \approx 0.04$ at an In-content of 60%, which would lead to $x = (60 \pm 2)\%$. The plot shows large errors at thicknesses close to 45 nm, which can be attributed to the complex behavior of the $a_{000}(x, t)$- as well as the $a_{004}(x, t)$-curves in this region, as can be seen in Fig. 25. The error vanishes at an In-concentration of 22% because $a_{002}(x = 22\%) \approx 0$ for all thicknesses. Similar results are obtained for Cd$_x$Zn$_{1-x}$Se and Al$_x$Ga$_{1-x}$As as shown in Figs. 37b and c. If only $|J_{002}|$ is used for the evaluation instead of $|J_{002}|/|J_{004}|$, Eq. (65) holds if the following definition for $S(x, t, \varphi)$ is used instead of Eq. (59):
$$S(x, t, \varphi) = \sqrt{a_{004}^{2} + a_{000}^{2} + 2a_{004}a_{000}\cos(\varphi)}. \tag{66}$$
In Figs. 37d, e, and f the results are plotted for Eq. (65) using Eq. (66) for In$_x$Ga$_{1-x}$As, Cd$_x$Zn$_{1-x}$Se and Al$_x$Ga$_{1-x}$As, respectively.
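The error estimate of this section can be illustrated with a short numerical sketch. It rests on an assumption about the form of the result: with the measured quantity written as $M = G\,c\,(x - x_0)\,S_{\mathrm{n}}$, first-order error propagation gives $|\Delta x/x| = |(x - x_0)/x| \cdot |\partial \ln S_{\mathrm{n}}/\partial t| \cdot \Delta t$, and the phase is scanned for the largest value. The function names are hypothetical, and `amplitudes(x, t)` stands for a user-supplied source of $a_{000}$ and $a_{004}$ (for instance a Bloch-wave table); the toy function used below is not physical data. Only the $|J_{002}|$ variant of $S$, Eq. (66), is implemented.

```python
import math

def s_amplitude(a000, a004, phi):
    """S(x, t, phi) of Eq. (66), used when only |J002| is evaluated."""
    return math.sqrt(a004**2 + a000**2 + 2.0 * a004 * a000 * math.cos(phi))

def relative_error_per_nm(x, x0, amplitudes, t, s_func=s_amplitude,
                          h=1e-3, n_phi=360):
    """Estimate |dx/x| per nm of thickness error (assumed first-order form).

    amplitudes(x, t) must return (a000, a004). The derivative of
    ln(S(x)/S(0)) with respect to t is taken by central differences and
    the largest magnitude over a scan of phi is used.
    S must stay positive, i.e. a000 != a004 for the supplied amplitudes.
    """
    worst = 0.0
    for k in range(n_phi):
        phi = 2.0 * math.pi * k / n_phi

        def log_sn(tt):
            s_x = s_func(*amplitudes(x, tt), phi)
            s_0 = s_func(*amplitudes(0.0, tt), phi)
            return math.log(s_x / s_0)

        slope = (log_sn(t + h) - log_sn(t - h)) / (2.0 * h)
        worst = max(worst, abs(slope))
    return abs((x - x0) / x) * worst

def absolute_error(x, rel_error_per_nm, delta_t_nm):
    """Convert a relative error per nm into an absolute error of x."""
    return x * rel_error_per_nm * delta_t_nm

# Worked numbers from the text: 0.007 per nm, thickness error 5 nm, x = 60%.
delta_x_percent = absolute_error(60.0, 0.007, 5.0)  # about 2 (percent In)

# Toy amplitudes (hypothetical, for illustration only).
toy = lambda x, t: (1.0 + 0.01 * t * (1.0 + x), 0.5)
example = relative_error_per_nm(0.6, 0.22, toy, 15.0)
```

In this form the estimate vanishes at $x = x_0$, in line with the statement that the error disappears at an In-concentration of 22%.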
FIGURE 37. Color-coded maps of the relative error $\Delta x/x$ per $\Delta t = 1$ nm uncertainty of the measured sample thickness for (a, d) In$_x$Ga$_{1-x}$As, (b, e) Cd$_x$Zn$_{1-x}$Se, and (c, f) Al$_x$Ga$_{1-x}$As using (a, b, c) $|J_{002}|/|J_{004}|$ and (d, e, f) $|J_{002}|$ for the composition determination. The graphs were computed according to Eq. (65) using the definition of $S(x, t, \varphi)$ given in Eq. (59) for (a, b, c) and in Eq. (66) for (d, e, f). (See also Plate 21.)
IV. APPLICATIONS

We describe here some applications of the evaluation techniques outlined in the previous sections. The examples are given in chronological order. The strain-state analysis was first carried out with ZnSe/Cd$_x$Zn$_{1-x}$Se/ZnSe/GaAs (001) heterostructures (Section IV-A). A first aim (Section IV-A-1) was verification of the validity of our implementation of the strain-state analysis in the DALI program package. For that purpose, our results were compared with reflection high-energy electron diffraction (RHEED) investigations of as-grown MBE samples. The next step was the measurement of the diffusion coefficient for the diffusion of Cd in ZnSe in the temperature range 340-400°C. Then we investigated free-standing (Section IV-B-1) and buried (Section IV-B-2) In$_x$Ga$_{1-x}$As Stranski-Krastanow islands. In free-standing islands, an inhomogeneous distribution of the In-concentration was measured where the mean In-content depends on the growth temperature. One of our main interests concerned the transformation of the morphology of the islands, which is caused by overgrowth with GaAs. The results that were obtained by strain-state analysis and the CELFA method provided valuable structural data for interpretation of optical spectra. The high accuracy of the CELFA procedure facilitated the quantitative investigation of the thickness and composition of the wetting layer (Section IV-B-2). A further application of the strain-state analysis that is given in Section IV-C deals with the measurement of displacements at a ZnSe/ZnTe interface that contains an array of misfit dislocations. Moreover, first results of recent CELFA evaluations concerning Cd-content fluctuations of CdSe layers in ZnSe are presented in Section IV-D.
A. Strain-State Analysis of Zn$_x$Cd$_{1-x}$Se/ZnSe Heterostructures

The Cd$_x$Zn$_{1-x}$Se/ZnSe quantum wells (QWs) were grown and investigated in the Institute for Experimental and Applied Physics, University of Regensburg (Reisinger et al., 1996). The growth was performed on (001)-oriented GaAs substrates in a conventional MBE system with elemental sources. The substrates were degreased and etched in standard solutions and then transferred into the MBE chamber where they were annealed for 5 min at 350°C before purging in H-plasma from an RF-plasma discharge source. The deoxidation process runs for 10 min at 300 W RF-power. The substrate temperature was lowered after 5 min to the growth temperature, which was kept constant at $T_G = 300$°C. For the growth of the QWs, Cd is supplied additionally, whereas the Zn and Se fluxes remained unchanged. Growth and composition of the heterostructures were controlled by RHEED. The
RHEED system consists of a 35 keV electron gun and a fluorescence screen from which the RHEED pattern was scanned with a high-sensitivity TV camera connected to a personal computer. In order to investigate the RHEED oscillations, the time-dependent integral intensity of a rectangular area around the specular spot was measured. The incident angle of the electron beam was 1.2°, the accelerating voltage 10 kV and the azimuth either [110] or [$1\bar{1}0$]. For the TEM investigations {110} cross-sectional samples were conventionally prepared. In the final stage of thinning two Ar$^+$ ion guns were used under an incidence angle of 14°. During ion milling with a GATAN Dual-Ion mill the sample was kept in an LN$_2$-cooled specimen holder. The HRTEM micrographs were obtained with a Philips CM30 microscope equipped with a twin lens. The acceleration voltage was 300 kV and the Scherzer resolution 0.23 nm.

1. Investigations of As-Grown Samples by Strain-State Analysis and RHEED
Five samples were grown with different Cd$_x$Zn$_{1-x}$Se interlayer thicknesses and Cd-concentrations $x$. Figure 38a shows two example micrographs of the samples MBE180 and MBE336, which represent the samples with the thinnest and the thickest Cd$_x$Zn$_{1-x}$Se insertions. The interlayers can be identified as vertically oriented dark bands. Figure 38b represents the displacement components in growth direction of three samples, which were averaged in planes parallel to the interface plane. The regions with increasing displacement values allow the determination of the QW thicknesses that are given in Table II. According to Eq. (31), the maximum displacements $u_{\max}$ yield the integral Cd-content. The factor $\alpha_R$ of Eq. (31) was calculated according to an analytical solution for the elastic relaxation of the tetragonal distortion in interlayers with homogeneous composition, which was found by Treacy and Gibson (1986). The sample thicknesses were assumed to lie in the range of 10-30 nm. Averaged Cd-concentrations $x$ are determined by dividing the integral Cd-contents by the thicknesses in Table II. The results are shown in Table III. The averaged lattice plane distances normalized with the ZnSe lattice parameter are given in Fig. 38c. Tables II and III also contain the QW thicknesses and Cd-concentrations $x$, which were determined by RHEED using the following procedure (Reisinger et al., 1996; Rosenauer et al., 1995): The intensity oscillations (Fig. 39) of the specular spot (SS) are recorded with a high-sensitivity TV camera and a personal computer equipped with a frame grabber. Each period in the RHEED oscillation curve corresponds to the growth of one monolayer (ML). With the preceding experimental parameters, the curve starts with a maximum. Each of the following maxima corresponds to a
FIGURE 38. (a) (110) HRTEM micrographs of the Cd$_x$Zn$_{1-x}$Se/ZnSe quantum well structures MBE180 and MBE336; (b) averaged displacements for each monolayer in growth direction; and (c) averaged lattice parameters obtained for the samples MBE180, MBE337, and MBE336. Evaluation results and corresponding RHEED results can be found in Tables II and III.
TABLE II
Cd$_x$Zn$_{1-x}$Se interlayer thicknesses $d_{\text{Cd}}$ determined by the strain-state analysis and RHEED for various ZnSe/Cd$_x$Zn$_{1-x}$Se/ZnSe/GaAs (001) samples grown by MBE

d_Cd / [ML]              MBE 180      MBE 216       MBE 335       MBE 336       MBE 337
Strain-state analysis    6.4 ± 0.5    23 ± 1        13.7 ± 0.7    36.9 ± 1.6    11.3 ± 1.0
RHEED                    6.9 ± 0.2    23.4 ± 0.6    14.1 ± 0.4    36.2 ± 0.7    12.0 ± 0.3
TABLE III
Comparison of the Cd concentrations $x_{\text{Cd}}$ obtained by strain-state analysis with RHEED measurements for the samples that are listed in Table II

x_Cd / [%]               MBE 180    MBE 216    MBE 335    MBE 336    MBE 337
Strain-state analysis    46 ± 4     61 ± 5     38 ± 3     29 ± 5     31 ± 3
RHEED                    45 ± 3     59 ± 4     42 ± 4     29 ± 4     31 ± 4

FIGURE 39. Specular-spot intensity plotted versus the growth time (horizontal axis: growth time in s) for (upper curve) ZnSe and (lower curve) a Cd$_x$Zn$_{1-x}$Se quantum well. The exponential decay as well as the decreasing oscillation period with increasing growth time of the lower curve indicate a broadening of the transition region between ZnSe and Cd$_x$Zn$_{1-x}$Se.
further complete ML. After the growth of a Cd$_x$Zn$_{1-x}$Se QW it is observed that the barrier grows faster, that is, the time interval between two maxima is shortened compared to the deposition of the pure ZnSe buffer layer. This behavior is connected with the appearance of alloy mixing. In the present case Cd$_x$Zn$_{1-x}$Se is formed where $x$ decreases with increasing number of monolayers. The local Cd-concentration is determined by using the change of growth rate. The exact time period $T_{\text{ZnSe}}$ of the ZnSe growth is obtained from the RHEED oscillation curve of the ZnSe buffer. This curve (upper curve in Fig. 39) is fitted by
where $I_{\cos}(t)$ describes a damped cosine oscillation with the frequency $1/T$. The damping is due mainly to a successive increase of the surface roughness. In Eq. (67) $I_{\exp}(t)$ is an exponential function that takes into account the change of intensity when different sources are offered due to different atomic scattering factors of the elements. From the work of Gaines and Ponzoni (1994) we use the relation
for the determination of the Cd-content $x$ where $1/T_{\text{CdZnSe}}$ indicates the growth rate of the ternary and $1/T_{\text{ZnSe}}$ that of the binary compound. Note that Eq. (68) is applied to each single monolayer $j$. Using the detection of RHEED oscillations during growth, the computer control of the epitaxial process enables the growth of QWs with monolayer accuracy. Comparing the results of the strain-state analysis with the RHEED measurements in Tables II and III reveals a good agreement. Figure 40 shows the mean displacements and lattice parameters obtained for the sample MBE216. In Fig. 40b a subdivision of the region with mean lattice parameters (normalized with the ZnSe lattice parameter) larger than 1 into both transition regions and the QW region is shown. In the following we will estimate the amount of Cd contained in the transition regions, which can be regarded as a measure of the interfacial roughness. The number of MLs Cd$_x$Zn$_{1-x}$Se with a composition corresponding to the mean lattice parameter $A$ of the QW region (Fig. 40b) is obtained by

$$N_{\text{Cd}_x\text{Zn}_{1-x}\text{Se}} = \frac{u_{\max}}{A - 1}. \tag{69}$$
For the sample MBE216 we obtain $N_{\text{Cd}_x\text{Zn}_{1-x}\text{Se}} = 1.62/0.074$ [ML] $= 21.9$ [ML]. As shown in Fig. 40b, the QW region consists of 19 MLs. Therefore, 21.9 - 19.0 = 2.9 MLs Cd$_x$Zn$_{1-x}$Se are contained in the transition regions. Each interface represents a transition region containing about 1.5 ML
FIGURE 40. (a) Averaged displacements and (b) averaged lattice parameters plotted versus the number $j$ of the (002) plane in growth direction; $u_{\max}$ and $A - 1$ as well as the thickness of the transition regions are used to determine the amount of Cd contained in the transition regions.
Cd$_x$Zn$_{1-x}$Se. With the Cd-concentration of 61% given in Table III we obtain an amount of Cd inside each of the two transition regions corresponding to 0.9 ML CdSe. Table IV gives an overview of the Cd-contents of the transition regions of the five investigated samples, ranging from 0.2 to 0.9 ML. However, these values should be regarded as estimates because the subdivision into QW region and transition region is somewhat arbitrary. In conclusion we can give a mean value of 0.4 ML CdSe per transition region.
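The estimate for sample MBE216 can be retraced step by step. The short sketch below only repeats the arithmetic quoted in the text (1.62/0.074 ML in total, 19 ML in the QW region, 61% Cd, two interfaces); the function name is hypothetical.

```python
def transition_region_cdse(u_max, a_minus_1, n_qw_ml, x_cd, n_interfaces=2):
    """Estimate the CdSe content per transition region.

    u_max      : maximum displacement (in units of the ZnSe (002) spacing)
    a_minus_1  : mean normalized lattice parameter of the QW region minus 1
    n_qw_ml    : number of MLs assigned to the QW region
    x_cd       : mean Cd-concentration of the QW (fraction, not percent)
    Returns (total MLs, MLs per transition region, ML CdSe per region).
    """
    n_total = u_max / a_minus_1
    n_per_interface = (n_total - n_qw_ml) / n_interfaces
    return n_total, n_per_interface, n_per_interface * x_cd

n_total, n_per_interface, cdse_ml = transition_region_cdse(1.62, 0.074, 19, 0.61)
# n_total is about 21.9 ML, n_per_interface about 1.4 to 1.5 ML,
# and cdse_ml about 0.9 ML CdSe.
```

The result depends directly on how many monolayers are assigned to the QW region, which is why the text treats the values as estimates.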
TABLE IV
CdSe content in each transition region between the Cd$_x$Zn$_{1-x}$Se and the ZnSe matrix

                MBE 180    MBE 216    MBE 335    MBE 336    MBE 337
Content [ML]    0.23       0.9        0.4        0.45       0.2
2. The Determination of Cd Diffusion in CdSe/ZnSe Single Quantum Wells
A ZnSe/CdSe/ZnSe/GaAs (001) sample was grown by MBE as described in the preceding. The deposited amount of Cd was equivalent to 2 ML CdSe. The ZnSe/CdSe/ZnSe thickness was about 60 nm, which is well below the critical thickness for the generation of misfit dislocations as seen in Rosenauer et al. (1996). The sample was cleaved into four pieces. The pieces were annealed at different temperatures (337°C, 367°C, 382°C, and 394°C) for 1 h in an N$_2$ atmosphere and then rapidly cooled to room temperature (RT). After annealing, the samples were prepared for TEM. The Cd-concentration profile shown in Fig. 41, detected in situ with RHEED as described in Section IV-A-1, represents the profile of the as-grown sample. Figure 41 clearly shows that a 6-ML thick mixed crystal is obtained instead of a 2-ML thick CdSe interlayer. Figure 42a, b, and c shows HRTEM images of the specimens that were
FIGURE 41. Cd-concentration profile in growth direction (horizontal axis: number of monolayer $j$) evaluated from RHEED data for the nominally 2 ML thick CdSe quantum well embedded in a ZnSe matrix (Rosenauer et al., 1995).
FIGURE 42. (110) HRTEM lattice images of the interlayer regions (ZnSe/Cd$_x$Zn$_{1-x}$Se/ZnSe) of the samples annealed at (a) 337°C, (b) 367°C, and (c) 394°C (Rosenauer et al., 1995).
annealed at 337°C, 367°C, and 394°C, respectively. Figure 43 gives the displacement curves that were measured with the DALI program package. The displacement curves reveal a significant broadening of the QW region that increases with annealing temperature. In Fig. 43, the 337°C plot also shows the displacement curve that is directly evaluated from the RHEED data of Fig. 41. Good agreement with the strain-state analysis suggests a negligible diffusion of Cd at an annealing temperature of 337°C.
FIGURE 43. Averaged displacements plotted versus the (002)-plane number $j$ for the different annealing temperatures. The upper curve represents the displacement curve expected for the concentration profile shown in Fig. 41 that corresponds to a diffusion coefficient close to zero. The given values for the diffusion coefficients $D^{\text{Eval}}$ are directly determined from the experimental data by a fit of Eqs. (77) and (78) (Rosenauer et al., 1995).
PLATE 23. Schematic drawing on the origin of the red regions in Fig. 50. The region with increased In-concentration induces a bending of the horizontal lattice planes corresponding to an enlarged displacement component in growth direction.
PLATE 24. Color-coded maps of the local In-concentration of (a) the wetting layer and (b) an
island.
PLATE25. (a) Finite element model for the specimen region evaluated in Fig. 50. The thickness in the In,Ga,.,As region is obtained by an extrapolation of the thicknesses of neighboring GaAs regions. The light-blue grid shows the finite elements. The shape of the model also contains the influence of the displacement field (bowing of the surface of the InXGal,As) amplified by a factor of 20. The colors correspond to the displacement vector field component in growth direction. The legend is given in (b), which shows displacement vector field components evaluated from a projected 3-D atomic model that was deduced from the results of the FE-calculation shown in (a).
PLATE26. { 110) HRTEM image and gridding (red) of the interface between the ZnSe and ZnTe of a heterostructure grown on GaAs(001) with MOVPE. The Burgers circuits around the misfit dislocations are marked in green indicating three Lomer (LO) and one 60" dislocation.
PLATE 27. (a) Displacement vector field and color-coded displacement vector field components in (b) growth direction and (c) {110} direction of the ZnSe/ZnTe interface region.
PLATE 28. Evaluation of the calculated 2-D grid that contains an array of Lomer misfit dislocations; (a) displacement vector field and color-coded displacement vector field components in (b) vertical direction and (c) horizontal direction.
PLATE 29. Color-coded map of the Cd-concentration of a CdSe layer with a nominal thickness of 1 ML buried in a ZnSe matrix.
ATOMIC SCALE STRAIN AND COMPOSITION EVALUATION
The Cd-concentration profile of a sample that was annealed at a temperature T for a duration t can be described by Eq. (70), where z denotes the distance in growth direction and j the (002)-plane number. x_single(z, t, T) is the Cd-concentration profile of an ideal interlayer of 1 ML CdSe and x_j is the Cd-concentration of monolayer j of the as-grown sample shown in Fig. 41. For t = 0, Eq. (70) constitutes the decomposition of the composition profile of the Cd_xZn_{1-x}Se mixed crystal into slices of Δz = 1 ML thickness where each slice is represented by a Heaviside function

\[ x_{\mathrm{single}}(z, 0, T) = \begin{cases} 1 & \text{for } -\Delta z/2 \le z \le \Delta z/2 \\ 0 & \text{otherwise.} \end{cases} \tag{71} \]

After an annealing time t, x_single(z, t, T) is given by Eq. (72), where D(T) is the temperature-dependent diffusion coefficient and erf is the error function defined by Eq. (73). Equations (70) and (72) are solutions of the linear diffusion equation, Eq. (74).
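For orientation, the relations referred to as Eqs. (70) and (72) to (74) can be written in the standard form for one-dimensional diffusion out of a slab source. This is a sketch under assumptions, not a verbatim restatement: the plane positions jΔz and the symbol x(z, t, T) for the total profile are labels chosen here, while the erf arguments are the quantities z′ and z″ that appear in Eq. (78).

```latex
\begin{align}
x(z,t,T) &= \sum_j x_j \, x_{\mathrm{single}}(z - j\,\Delta z,\, t,\, T) \tag{70}\\
x_{\mathrm{single}}(z,t,T) &= \frac{1}{2}\left[
  \operatorname{erf}\!\left(\frac{z + \Delta z/2}{2\sqrt{D(T)\,t}}\right)
  - \operatorname{erf}\!\left(\frac{z - \Delta z/2}{2\sqrt{D(T)\,t}}\right)\right] \tag{72}\\
\operatorname{erf}(s) &= \frac{2}{\sqrt{\pi}} \int_0^{s} e^{-\xi^{2}}\, d\xi \tag{73}\\
\frac{\partial x}{\partial t} &= D(T)\, \frac{\partial^{2} x}{\partial z^{2}} \tag{74}
\end{align}
```

In the limit t → 0 the bracket in (72) reduces to the rectangular slice of width Δz, and because the diffusion equation is linear, any sum of shifted single-layer solutions weighted with x_j is again a solution.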
The relation between the Cd-concentration profile of Eq. (70) and the measured displacements is given by Eq. (31). Because the reference region has to be chosen inside the ZnSe, which is grown pseudomorphically on the GaAs substrate, we have to consider the tetragonal distortion of the ZnSe buffer. The lattice parameter of the strained ZnSe is derived from Eq. (28):

\[ a^{t}_{\mathrm{ZnSe}} = a_{\mathrm{GaAs}} - \alpha_R \,(a_{\mathrm{GaAs}} - a_{\mathrm{ZnSe}}). \tag{75} \]
From Eq. (31) we obtain the displacement in the thin and thick sample limits as

\[ u(z) = A \int_{-\infty}^{z} x(\zeta, t, T)\, d\zeta; \qquad A = \begin{cases} 0.1 & \text{for thin specimens} \\ 0.15 & \text{for thick specimens,} \end{cases} \tag{76} \]
A. ROSENAUER AND D. GERTHSEN
where the sum in Eq. (31) has been replaced by an integral in order to take account of the continuum character of Eqs. (70) and (74). The integral in Eq. (76) calculated for the whole nominally 2-ML thick CdSe interlayer yields a value of 2. In the thin sample limit we therefore expect a maximum displacement of 0.2, which is in good agreement with the displacement curves shown in Fig. 43. However, the effect of the elastic relaxation does not significantly influence the determination of the diffusion coefficients. The application of Eq. (70) yields Eq. (77), in which the function g(z, t, T) depends on the arguments

\[ z' = \frac{z + \Delta z/2}{2\sqrt{D(T)\,t}}, \qquad z'' = \frac{z - \Delta z/2}{2\sqrt{D(T)\,t}}. \tag{78} \]
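The displacement model can be sketched numerically. The code below is a minimal illustration and not the authors' fit routine. It assumes that u(z) is A times the sum over planes of x_j g(z − jΔz), where g is the integral of the single-layer erf profile from −∞ to z, and it measures z in monolayers and the product D(T)·t in ML². The function names and the closed form of g are introduced here for illustration only.

```python
import math


def x_single(z, dt, dz=1.0):
    """Concentration of one diffused slab of width dz; dt = D(T)*t in ML^2 (assumed units)."""
    if dt <= 0.0:
        return 1.0 if abs(z) <= dz / 2 else 0.0
    s = 2.0 * math.sqrt(dt)
    return 0.5 * (math.erf((z + dz / 2) / s) - math.erf((z - dz / 2) / s))


def g(z, dt, dz=1.0):
    """Integral of x_single from -infinity to z, using the antiderivative of erf."""
    if dt <= 0.0:
        return min(max(z + dz / 2, 0.0), dz)
    r = math.sqrt(dt)
    z1 = (z + dz / 2) / (2.0 * r)   # z' of Eq. (78)
    z2 = (z - dz / 2) / (2.0 * r)   # z'' of Eq. (78)

    def antiderivative(s):
        # d/ds [s*erf(s) + exp(-s^2)/sqrt(pi)] = erf(s)
        return s * math.erf(s) + math.exp(-s * s) / math.sqrt(math.pi)

    return r * (antiderivative(z1) - antiderivative(z2)) + dz / 2


def displacement(z, x_planes, dt, A=0.1, dz=1.0):
    """u(z) for as-grown plane concentrations x_planes = {j: x_j} (assumed form of the fit curve)."""
    return A * sum(xj * g(z - j * dz, dt, dz) for j, xj in x_planes.items())


if __name__ == "__main__":
    # Two fully occupied CdSe monolayers in the thin-sample limit (A = 0.1).
    planes = {0: 1.0, 1: 1.0}
    for z in (-5.0, 0.0, 0.5, 1.0, 5.0):
        print(z, round(displacement(z, planes, dt=0.5), 4))
```

Far above the interlayer the displacement saturates at A times the total Cd content, which reproduces the value 0.1 · 2 = 0.2 quoted for the thin sample limit. In a fit, dt and A would be the free parameters.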
Equations (77) and (78) were used to fit a curve u(z) to each of the experimental displacement curves of Fig. 43. The diffusion coefficient D(T) as well as the factor A introduced in Eq. (76) were treated as fit parameters. The resulting fit curves and the obtained diffusion coefficients D^Eval(T) are also shown in Fig. 43. However, the fitted diffusion coefficients D^Eval(T) do not constitute the final result. One should note that each bright spot in the HRTEM image corresponds either to a tunnel position or to an atomic position, which consists of two columns of atoms of different kinds. The position of each intensity maximum depends on the scattering factors of the atoms involved as well as on the imaging conditions. Therefore, the measured averaged displacement profile does not depend only on the positions of the atoms but includes additional “chemical” shifts of the intensity maxima due to the varying Cd-concentration. In order to investigate the influence of this effect on the measured diffusion coefficients, image simulations were carried out using the EMS simulation software package (Stadelmann, 1987). The applied procedure is illustrated in Fig. 44. From the preparation conditions the minimum sample thickness in beam direction was estimated to be 15 nm. Furthermore, in order to reduce the effects of specimen surface roughness and contamination, only the thicker regions were used for the HRTEM images. A comparison of experimental
FIGURE 44. Schematic drawing of the procedure applied for the determination of the diffusion coefficients. In order to determine the influence of phase contrast imaging on the measured diffusion coefficients D^Eval, several model crystals with diffusion coefficients D_S are generated and the imaging process is simulated with EMS. The simulated images are evaluated with DALI. The relation between D_S and D_S^Eval is used to correct the experimental diffusion coefficients D^Eval (Rosenauer et al., 1995).
and simulated images calculated for a wedge-shaped ZnSe crystal gives an estimate of 30 nm for the maximum thickness. Therefore, images were simulated for several crystals with thicknesses of 15, 20, 25, and 30 nm according to the diffusion coefficients given in Fig. 43. The simulations were carried out for defocus values Δf between 0 and 140 nm underfocus. Those lattice images showing spot patterns were evaluated with DALI. Figure 45a and b show the averaged displacements obtained for the two
FIGURE 45. (a, b) Averaged displacements obtained from the DALI evaluations of simulated lattice images, plotted versus the (002) plane number. The simulations were performed at a thickness of 20 nm for (a) D_S = 0.16 · 10^-18 cm²/s and (b) D_S = 3.8 · 10^-18 cm²/s and two different electron beam directions B. The solid lines represent the input for the calculated model crystal. The displacement of the solid line from the squares or triangles is due to a “chemical shift” of the intensity maxima of the phase contrast pattern; (c) diffusion coefficients D_S^Eval [cm²/s] (logarithmic scale) plotted versus the defocus Δf. The vertical straight lines indicate the diffusion coefficients D_S used for the formation of the crystal models. The squares and triangles show the diffusion coefficients D_S^Eval obtained from the evaluation of the simulated images on the basis of the input diffusion coefficients D_S corresponding to the electron beam directions indicated in (a). The results of both electron beam directions are shown for the two extreme cases of D_S (Rosenauer et al., 1995).
extreme diffusion coefficients and for electron beam directions B = [110] and [1̄10]. The defocus value was in this case Δf = 50 nm underfocus. If the crystal is projected along [110], the positions of the Zn and Se atoms are interchanged compared to the [1̄10] projection. The two projections are not equivalent because the interlayer is not symmetrical. Figure 45a and b show “chemical shifts” that slightly influence the shape of the displacement curve and therefore also affect the measured diffusion coefficients. The averaged displacement curves obtained by the DALI evaluations of the simulated
lattice images were fitted by u(z) as given in Eqs. (77) and (78). The obtained diffusion coefficients are plotted versus the defocus Δf in Fig. 45c for the 20-nm thick specimen. Obviously, the measured diffusion coefficients D_S^Eval (polygons in Fig. 45c) are smaller than the values D_S used to generate the model structures that were the basis of the simulated images (straight lines in Fig. 45c). It can be seen that the determined diffusion coefficients also depend on the defocus Δf. Since the experimental defocus is not known well enough, we have to conclude that this effect contributes to the experimental error. Figure 46a gives the calibration curve derived from the simulation of crystals with the thicknesses given in the foregoing. This calibration curve was applied to the measured values D^Eval(T). The final result is given as an Arrhenius plot of D(T) shown in Fig. 46b. The values are fitted by an Arrhenius expression for D(T) with the prefactor 1.9 · 10^-4 cm²/s given in the legend of Fig. 46b.

FIGURE 46. (a) Calibration curve for measured diffusion coefficients D^Eval. The diffusion coefficients D_S^Eval obtained from the DALI evaluation of simulated lattice images are plotted versus the values D_S [10^-18 cm²/s] that were used for the crystal simulation. The error bars result from the variation of crystal thickness and defocus Δf; (b) Arrhenius plot of the calibrated diffusion coefficients D versus 1000/T [K]. The solid line is a linear fit to the measured values of D. The dotted line corresponds to D(T) = 6.4 · 10^-4 cm²/s · exp(−1.87 eV/kT) given in Martin (1973) (W. E. Martin, J. Appl. Phys. 44 (1973) 5639), measured in the temperature range of 700 to 900°C (Rosenauer et al., 1995).
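The correction step D = f^-1(D^Eval) of Fig. 44 amounts to reading a monotonic calibration curve backwards. The sketch below shows one simple way to do this by piecewise-linear interpolation. It is an illustration only: the calibration points are hypothetical placeholders, not values read from Fig. 46a, and the function name is introduced here.

```python
def invert_calibration(d_eval, cal_true, cal_eval):
    """Map an evaluated coefficient back to the true one.

    cal_true[i] is the coefficient D_S put into a model crystal and
    cal_eval[i] the coefficient recovered from its simulated image.
    The curve is assumed to be strictly monotonic.
    """
    points = sorted(zip(cal_eval, cal_true))
    for (e0, t0), (e1, t1) in zip(points, points[1:]):
        if e0 <= d_eval <= e1:
            weight = (d_eval - e0) / (e1 - e0)
            return t0 + weight * (t1 - t0)
    raise ValueError("value lies outside the calibrated range")


if __name__ == "__main__":
    # HYPOTHETICAL calibration points in units of 1e-18 cm^2/s.
    d_s = [0.5, 1.0, 2.0, 4.0]
    d_s_eval = [0.4, 0.8, 1.5, 2.9]
    print(invert_calibration(1.15, d_s, d_s_eval))
```

The spread of the simulated points over thickness and defocus, shown as error bars in Fig. 46a, would carry over into the uncertainty of the corrected value.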
Figure 46b also contains the diffusion coefficient curve derived from high temperature measurements given in the indicated reference, which is in very good agreement with the D ( T ) determined in this study.
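As a plausibility check of the Arrhenius comparison, the reference curve quoted in the caption of Fig. 46 can be extrapolated to the annealing temperatures of this study. The sketch below uses only the prefactor 6.4 · 10^-4 cm²/s and the activation energy 1.87 eV attributed to Martin (1973); the function names are introduced here, and the evaluation temperature 1000/T = 1.55 is simply a value inside the axis range of Fig. 46b.

```python
import math

K_B = 8.617333e-5  # Boltzmann constant in eV/K


def arrhenius(temperature_k, d0, e_a):
    """D(T) = d0 * exp(-e_a / (k*T)); d0 in cm^2/s, e_a in eV."""
    return d0 * math.exp(-e_a / (K_B * temperature_k))


def activation_energy(t1, d1, t2, d2):
    """Activation energy in eV from two points of an Arrhenius plot (slope of ln D versus 1/T)."""
    return -K_B * math.log(d1 / d2) / (1.0 / t1 - 1.0 / t2)


if __name__ == "__main__":
    d0_ref, ea_ref = 6.4e-4, 1.87   # values quoted from Martin (1973)
    t = 1000.0 / 1.55               # about 645 K
    print(f"D({t:.0f} K) = {arrhenius(t, d0_ref, ea_ref):.2e} cm^2/s")
```

The extrapolated value is of the order of 10^-18 cm²/s, the same order as the diffusion coefficients D_S used for the model crystals in Fig. 45.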
B. In_xGa_{1-x}As/GaAs Stranski-Krastanow Islands
Stranski-Krastanow island structures that consist of 3-D islands on a 2-D wetting layer are candidates for quantum-dot laser fabrication. In the Stranski-Krastanow growth mode, the spontaneous formation of small islands is due to the lattice mismatch between layer and substrate. The generation of islands is energetically favorable compared to a 2-D tetragonally strained layer because the islands are able to expand at their free surfaces. One of the main methods for the characterization of islands is photoluminescence spectroscopy (PL). However, the differentiation between compositional effects and the quantum size effect is difficult (O’Donnell and Woggon, 1997) because both lead to a shift of the transition energy of the radiative electron-hole pair recombination. The quantitative evaluation of the local composition by HRTEM can therefore give important additional information to resolve this ambiguity (Woggon et al., 1997). In the following section we first show the evaluation of In-concentration profiles in free-standing layers, which was carried out by means of strain-state analysis combined with both the measurement of the local sample thickness and FE calculations. Section IV-B-2 focuses on capped islands and includes CELFA investigations that are compared with the strain-state analysis. However, one should bear in mind that these sections constitute application examples of the evaluation procedures described in previous sections and do not aim at an exhaustive presentation of our investigations on Stranski-Krastanow islands (which will be published elsewhere). The investigated samples were grown by molecular beam epitaxy (MBE) at growth temperatures of 500°C (this sample served as an evaluation example in the second section of this paper) and 560°C on GaAs (001) substrates using a Varian MOD Gen II system. To provide a clean surface, a 0.1 µm GaAs buffer layer was deposited at a substrate temperature of 610°C prior to the In_xGa_{1-x}As growth.
The nominal composition during the deposition of the In_xGa_{1-x}As was chosen to be x = 60% and x = 100%, respectively. Cross-sectional TEM samples were conventionally prepared. The HRTEM lattice images were obtained in the (110)- as well as the (001)-zone axis using a Philips CM 200 FEG/ST electron microscope operated at 200 kV. The photographic negatives were digitized with either an on-line or an off-line CCD camera at a resolution of 1024 × 1024 picture elements and with a gray-scale depth of 12 bits per pixel.
1. Free-Standing Islands
A free-standing In_xGa_{1-x}As island grown at 500°C was used as an example for the demonstration of the individual analysis steps described in Section II. In the following, we briefly recapitulate the figures concerning its evaluation. Figure 1 shows the HRTEM micrograph of the island. Figure 6 represents the experimental displacement vector field. Figure 7 shows color-coded maps of the components of the displacement vectors in growth direction (Fig. 7a) as well as in interface direction (Fig. 7b). The displacements averaged in the AOI (area of interest) marked in Fig. 1 are shown in Fig. 8. This displacement profile was used for the iterative adaptation of the In-concentrations inside the solids (Fig. 16) of the FE-model. The geometry of the FE-model was formed according to the color-coded map of the local specimen thicknesses displayed in Fig. 13, which was calculated with the QUANTITEM procedure from Fig. 1. The finite elements of the model are depicted in Fig. 17, which also shows the displacements in growth direction at the model surfaces. The In-concentration profile with the best fit of the experimental data can be found in Fig. 18. For comparison, the corresponding results from the FE-simulation were included in Figs. 6 and 8. Figure 19 shows a map of the components of the simulated displacement vector field perpendicular and parallel to the interface, which is in good agreement with Fig. 7. Figure 18 shows that the In-concentration is not uniform inside the island. Instead, an almost linear increase from the bottom towards the top is observed. In this way, the total strain energy of the island is lowered because the indium is concentrated at the top where a more efficient elastic relaxation is possible. Furthermore we recognize that the average In-content of the island that was grown at 500°C is close to the nominal In-concentration of 60%. In the following we will show that this is not the case for islands grown at 560°C.
Figures 47a and 48a show mean experimental and FEM-simulated displacements of islands grown at 560°C with a nominal In-concentration of 60% and 100%, respectively. Figures 47b and 48b depict the corresponding In-concentration profiles. Figure 48b contains results of nominal InAs islands viewed along the (110) and (010)-zone axes. The (110) and (010) projections differ because in the first case each bright (or dark) spot in the HRTEM micrograph results from a pair of atomic columns consisting of group III and group V elements that cannot be resolved. As the Ga and In atoms have different scattering amplitudes, a varying “chemical” shift of the intensity maxima position with respect to the atom column position is expected if the composition changes (see Fig. 45). This is not the case for the (010) projection where each atomic column gives rise to a separate spot
FIGURE 47. (a) Averaged displacement components in [001] direction (experiment and FE-simulation) plotted versus the distance in [001] direction in monolayers (number j of the (002) plane) for an island of a sample with a nominal layer thickness of 3.5 nm and a nominal In-content of 60% investigated in (110) projection; (b) evaluated In-concentration profile (Rosenauer et al., 1997).
in the HRTEM contrast pattern under appropriate imaging conditions. However, HRTEM image simulations revealed that the chemical shift is rather small. The use of the (010)-zone axis is a further test showing that the error due to a possible chemical shift is small because both orientations yield similar composition profiles (Fig. 48b).
FIGURE 48. (a) Averaged displacement components in [001] direction plotted versus the distance in [001] direction (number j of the (002) plane) for an island of a sample with a nominal layer thickness of 0.6 nm and a nominal In-content of 100% along the (110) projection; (b) evaluated In-concentration profiles for the (110)- and the (010)-zone axis (Rosenauer et al., 1997).
Figures 47b and 48b reveal that the In-concentration of islands grown at 560°C also increases from the bottom towards the top of the island. However, we found some islands where the In-concentration drops again at the top (Fig. 47b), which can be attributed to In-desorption at the rather high growth temperature. For islands grown at 560°C the mean In-concentration is 24% and 45% for islands grown with nominal In-concentrations of 60% and 100%, respectively. These results clearly show that segregation of In takes place during the growth of islands. The segregation of In in In_xGa_{1-x}As is a well-known effect (Moison et al., 1989; Gerard and Marzin, 1992). The presence of a significant amount of gallium in the islands that nominally consist of InAs is a noticeable finding because Ga atoms were not offered during the growth. We suppose that the decreased In-concentration is due to the diffusion of Ga atoms from the GaAs buffer layer into the island. The Ga diffusion is facilitated by the strong Ga gradient at the interface and the relaxation effect, which is achieved by the reduction of the lattice parameter.
2. Islands Capped with 10 nm GaAs

In this section we present the investigation of a nominally 2-nm thick In_xGa_{1-x}As layer capped with 10 nm GaAs after a growth interruption of 60 s. This sample belongs to a series that was grown with different In_xGa_{1-x}As layer thicknesses as well as with varying durations of the growth interruption. However, the evaluation of only one sample will be shown here because a systematic presentation combined with PL results will be published elsewhere. This section gives an example of how the whole variety of evaluation techniques described in Sections II and III can be combined to gain conclusive quantitative results on the morphology of buried quantum well and quantum dot structures. In order to give an overview of the island density, Fig. 49 displays a g/3g planview weak-beam image obtained with a g = (220) reflection. The image shows a high density of 9 × 10^14 m^-2 of coherent islands that exhibit only strain contrast. In addition, some plastically relaxed
FIGURE 49. g/3g planview weak-beam image of a buried In_xGa_{1-x}As Stranski-Krastanow layer with 60 s growth interruption obtained with g = (220).
islands with misfit dislocations are observed at a comparatively low density estimated to be 1.3 × 10^14 m^-2. We begin the quantitative investigation of the buried Stranski-Krastanow layer with the strain-state analysis carried out along a (110) direction of a cross-sectional specimen. Figure 50a is a color-coded map of local lattice
FIGURE 50. Color-coded maps of: (a) local lattice parameters in growth direction; (b) local displacement vector field components in growth direction; and (c) thicknesses evaluated with the QUANTITEM procedure in the region indicated with a black rectangle in (a). The arrow in (c) marks the region corresponding to the In_xGa_{1-x}As where the thickness map yields invalid results. (See also Plate 22.)
distances along the [001]-growth direction. The blue regions at the bottom and on the top of the map correspond to the GaAs buffer and cap layers. The green, yellow, and red areas in the middle represent the In_xGa_{1-x}As layer. The green region constitutes the wetting layer, whose width is significantly increased compared to the uncovered structures. The red inclusions are regions with increased In-content. The color-coded map given in Fig. 50b depicts the component of the displacement vectors in growth direction. The displacements vanish in the GaAs buffer where the reference lattice was chosen. Inside the In_xGa_{1-x}As, the displacements increase from the bottom towards the top. It is conspicuous that the displacements are not uniform inside the GaAs cap layer above the In_xGa_{1-x}As layer. We recognize red areas with large displacements that are situated directly above the regions with increased In-content that can be seen in Fig. 50a. This behavior can easily be understood by looking at the sketch in Fig. 51, which illustrates that the (002)-lattice planes are bent towards the top of the layer directly above the regions with increased In-content. The bent areas are therefore regions with enlarged displacement in growth direction. Figure 50c is a map of local thicknesses calculated inside the region marked by a black rectangle in Fig. 50a. Figure 50c reveals a wedge-shaped specimen with a wedge angle of about 25°, which is expected from the sample preparation because the specimen was ion milled with two Ar⁺-ion guns at an angle of 12° each. The red image unit cells marked with an arrow correspond to the In_xGa_{1-x}As, which therefore must be excluded from the interpretation of the thickness map that will be needed later to generate an FE model.
FIGURE 51. Schematic drawing on the origin of the red regions in Fig. 50. The region with increased In-concentration induces a bending of the horizontal lattice planes corresponding to an enlarged displacement component in growth direction. (See also Plate 23.)
FIGURE 52. Color-coded maps of the local In-concentration of: (a) the wetting layer; and (b) an island. (See also Plate 24.)
Figure 52a, b shows color-coded maps of the local In-concentrations that were evaluated with the CELFA procedure described in Section III. Figure 52a contains mostly the wetting layer whereas Fig. 52b shows an island. These results were obtained from another piece of the wafer from which a [001] cross-sectional specimen was prepared. Therefore, the island that is visible in Fig. 52 cannot be compared directly with the island shown in Fig. 50a, b because different islands possess different sizes and In-contents, which becomes obvious in the (002)-dark-field overview micrograph (Fig. 21) of the same sample. For a quantitative comparison between the strain-state analysis and the CELFA evaluation we therefore have to focus on the wetting layer. Figure 53 gives the In-concentration profile averaged inside the wetting layer of Fig. 52a. The profile has a maximum at an In-content of 25% and shows an In-distribution that is slightly asymmetrical. To compare the results of the strain-state analysis shown in Fig. 50 with the In-concentration profile obtained with CELFA (Fig. 53), the latter is used to generate an FE model with a geometrical shape according to the map of local thicknesses that is depicted in Fig. 50c. This wedge-shaped FE model is shown in Fig. 54a. The model's colors correspond to the [001] components of the simulated displacements. The model is then filled with atoms according to the crystal structure and orientation of the experimental image (Fig. 50). By averaging the displacements along the atomic rows in
FIGURE 53. In-concentration profile in growth direction (In-concentration versus number j of the (002) plane) obtained for a part of Fig. 52 that only contains the wetting layer.
electron beam direction and subsequent evaluation of the resulting 2-D grid we obtained the map of local simulated displacements shown in Fig. 54b. The displacements are then averaged along the interface direction and are shown as the solid curve in Fig. 55a. Figure 55a also contains the experimental displacements obtained in the wetting layer region of Fig. 50b. The simulated curve that represents the result of the CELFA evaluation is obviously in good agreement with the experimental displacement data. The good coincidence on the right-hand side of Fig. 55a indicates in particular the high accuracy of the CELFA evaluation because the maximum displacement at the top of the layer results from a sum over all In-concentrations inside the layer (compare with Eq. (31)) and hence contains the sum of the errors that occur in the CELFA evaluation of the In-concentrations inside the whole layer. Small systematic errors of the In-concentrations would thus sum up to a large deviation in the region of the maximum displacement in Fig. 55a. As described here, the FE-model shown in Fig. 54a as well as the displacement vector field simulated with the FE method were used to generate
FIGURE 54. (a) Finite element model for the specimen region evaluated in Fig. 50. The thickness in the In_xGa_{1-x}As region is obtained by an extrapolation of the thicknesses of neighboring GaAs regions. The light-blue grid shows the finite elements. The shape of the model also contains the influence of the displacement field (bowing of the surface of the In_xGa_{1-x}As) amplified by a factor of 20. The colors correspond to the displacement vector field component in growth direction. The legend is given in (b), which shows displacement vector field components evaluated from a projected 3-D atomic model that was deduced from the results of the FE-calculation shown in (a). (See also Plate 25.)
a 3-D model of the atomic structure of the specimen. This 3-D atomic model, whose projection in electron beam direction is visible in Fig. 54b, can also be used to perform an image simulation with the EMS program package. This step is carried out to check whether the averaging of atomic displacements along the electron beam direction performed for the generation of Fig. 54b is a good approximation compared to an approach that fully takes the lattice bending into consideration. For that purpose, the 3-D model was subdivided into 20 slices along the electron beam direction, yielding 20 super-cell files that serve as input for the “we2” program of the EMS package. The “we2” program performs a multislice simulation and gives the exit-wave function as an output. Next, the “iml” program simulates the HRTEM imaging, where the appropriate parameters were chosen for the Philips CM 200 FEG/ST microscope: spherical aberration constant 1.2 mm; diameter of the objective lens aperture 8.6 nm^-1; defocus step size 5 nm; defocus spread 5.9 nm; and beam semiconvergence angle 0.3 mrad.
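The error-accumulation argument made for Fig. 55a can be illustrated with a short numerical sketch. It assumes the discrete counterpart of Eq. (76), in which the displacement of plane j is a constant factor times the running sum of the plane concentrations below it. The factor A = 0.1, the flat 25% profile, and the 1% bias are hypothetical illustration values, not data from the evaluation.

```python
def cumulative_displacement(x_planes, A=0.1):
    """Displacement after each plane as A times the running sum of concentrations (assumed form)."""
    total = 0.0
    profile = []
    for x in x_planes:
        total += A * x
        profile.append(total)
    return profile


if __name__ == "__main__":
    # HYPOTHETICAL profile: 20 planes at 25 % In, and the same profile
    # with a systematic error of +1 % (absolute) on every plane.
    x_true = [0.25] * 20
    x_biased = [x + 0.01 for x in x_true]
    u_true = cumulative_displacement(x_true)
    u_biased = cumulative_displacement(x_biased)
    print("deviation after first plane:", u_biased[0] - u_true[0])
    print("deviation at top of layer:  ", u_biased[-1] - u_true[-1])
```

The deviation grows linearly with the number of planes, so agreement at the top of the layer is a much stricter test of the concentration profile than agreement near the bottom.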
FIGURE 55. (a) Experimental (dots with error bars) and simulated displacements plotted versus the number of the (002) plane in growth direction. The simulated displacements contain (solid line) the evaluation of the 2-D projected atomic model generated according to Fig. 54a as well as (two types of crosses and open circles) the evaluation of HRTEM images simulated with EMS for different defocus values where the object exit wavefunction was calculated with the 3-D atomic model fully taking into account the lattice plane bending; (b) map of simulated images (GaAs buffer at the bottom and cap at the top) for objective lens underfocus values from −110 to 0 nm. The evaluated displacements of the images with defocus values of 0, −40, and −90 nm are shown in (a). The In_xGa_{1-x}As region is marked at the left of (b).
Figure 55b gives the resulting HRTEM images. Every stripe corresponds to one defocus value. The images are oriented in such a way that the GaAs buffer is at the bottom and the GaAs cap layer is at the top. As shown in Fig. 55b, we obtain three windows in the defocus range from 0 to −110 nm where the images are free of a half-spacing contrast pattern from bottom to top, as is required for an evaluation. Note that only these regions are in agreement with the experimental image. One image stripe was
chosen from each of those windows corresponding to defocus values of 0, −40, and −90 nm. The three images were then evaluated with the DALI program. The resulting displacements in growth direction are shown in Fig. 55a. The good agreement with the solid curve indicates that lattice bending does not produce noticeable errors in this case.

C. Strain-State Analysis of an Array of Misfit Dislocations
In this section we show that the strain-state analysis procedure described in Section II is also applicable to systems containing dislocations. The investigation of the displacement vector field of an array of 60°- and Lomer-misfit dislocations in the highly lattice mismatched heterostructure ZnTe/ZnSe is presented as an example. Moreover, we will show that continuum elasticity theory can be used to obtain an approximation for an atomic model of a faulted crystal that can be applied for a comparison with experimental results. The ZnTe/ZnSe heterostructure with thicknesses of 1 µm and 0.6 µm of the ZnTe and the ZnSe regions, respectively, was grown by atmospheric pressure metalorganic vapor phase epitaxy (MOVPE) on (001) GaAs substrates without misorientation at a growth temperature of 340°C. Dimethylzinc-triethylamine (DMZn-TEN), di-iso-propyltelluride (DiPTe) and di-tert-butylselenide (DTBSe) served as metalorganic precursors. Prior to growth, the substrates were prepared with an etching solution of 4 : 1 : 1 [H2SO4] : [H2O2] : [H2O]. Growth was performed in a horizontal reactor with H2 as carrier gas. For TEM investigations {110} cross-sectional samples were conventionally prepared. A two-gun Ar⁺-ion mill with an Ar⁺ energy of 5 keV was applied at an angle of 14° on an LN2-cooled specimen stage. The TEM micrographs were obtained with a Philips CM30 microscope equipped with a twin lens (C_s = 2.0 mm). The accelerating voltage was 300 kV and the point resolution was 0.23 nm. In this system, the misfit (lattice parameters: Table 1)
\[ f = \frac{a_{\mathrm{ZnSe}} - a_{\mathrm{ZnTe}}}{a_{\mathrm{ZnTe}}} = -7.2\% \tag{79} \]
causes the nucleation of misfit dislocations as soon as the layer thickness reaches a critical value. Misfit dislocation half loops are generated and glide towards the interface along {111} planes (Marte, 1987). Therefore, the interface contains an array of misfit dislocations. The misfit dislocations are mostly perfect b = a/2 (110) dislocations. In the (001)-oriented substrate surface two types of perfect dislocations are observed: 60° dislocations with the Burgers vector b_60 inclined to the interface plane and Lomer dislocations with b_Lomer parallel to the interface plane, as seen in Bauer et al. (1993). Lomer dislocations are twice as efficient in strain relaxation compared to 60° dislocations. Assuming total strain relaxation exclusively by Lomer misfit dislocations their distance would be:
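A plausible form of this relation, assuming that every Lomer dislocation accommodates exactly one Burgers vector length of in-plane mismatch, is d_Lomer = |b_Lomer| / |f|. The sketch below evaluates it for the misfit of Eq. (79) and also derives the bulk lattice parameter ratio implied by that misfit. The function names are introduced here for illustration, and the factor of two for 60° dislocations simply restates the efficiency ratio given in the text.

```python
def lomer_spacing_in_burgers_vectors(misfit):
    """Spacing d/|b| for full relaxation by Lomer dislocations, assuming d = |b| / |f|."""
    return 1.0 / abs(misfit)


def sixty_degree_spacing_in_burgers_vectors(misfit):
    """Spacing d/|b| if only 60-degree dislocations (half as efficient) relieved the strain."""
    return 0.5 / abs(misfit)


def lattice_parameter_ratio(misfit):
    """a_ZnTe / a_ZnSe implied by f = (a_ZnSe - a_ZnTe) / a_ZnTe."""
    return 1.0 / (1.0 + misfit)


if __name__ == "__main__":
    f = -0.072  # misfit from Eq. (79)
    print("d_Lomer / |b_Lomer|:", round(lomer_spacing_in_burgers_vectors(f), 2))
    print("d_60 / |b|:         ", round(sixty_degree_spacing_in_burgers_vectors(f), 2))
    print("a_ZnTe / a_ZnSe:    ", round(lattice_parameter_ratio(f), 4))
```

With f = −7.2% this gives a spacing of about 13.9 Burgers vector lengths and a lattice parameter ratio of about 1.078.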
By the investigation of dislocation distances in HRTEM micrographs of ZnTe/ZnSe an equivalent experimental value d_Lomer/|b_Lomer| = 13.8 was found. This indicates a nearly complete strain relaxation for sufficiently thick layers. Figure 56 shows the HRTEM micrograph that contains a part of the interface region with four dislocations. The horizontal positions of the dislocation cores vary due to the rough growth surface that is often observed in MOVPE-grown samples. The Burgers circuits drawn in green indicate three Lomer and one 60° dislocation. The dislocation lines run most probably parallel to the electron beam direction (Bauer et al., 1992). Note
FIGURE 56. ⟨110⟩ HRTEM image and gridding (red) of the interface between ZnSe and ZnTe of a heterostructure grown on GaAs (001) with MOVPE. The Burgers circuits around the misfit dislocations are marked in green, indicating three Lomer (LO) and one 60° dislocation. (See also Plate 26.)
that the Burgers vector lies in the viewing plane in the case of the Lomer dislocations, which are of the edge type, whereas it is inclined for the 60° dislocation, which therefore contains a screw component. The 2-D grid that is marked in red in Fig. 56 consists of (002) planes in the horizontal and (220) planes in the vertical direction. For a Lomer dislocation, two {220} planes of the ZnSe terminate at the dislocation core, whereas one horizontal as well as one vertical grid plane terminate at the 60° dislocation core. Figure 57a depicts the displacement vector field, which had to be evaluated with a reference region in the ZnSe because the reference region must contain continuously numbered lattice planes without missing planes. Note that vanishing displacement vectors are not drawn in Fig. 57a. Figure 57b is a color-coded map of the ⟨110⟩ components of the displacement vectors, showing that they point towards the (220) planes that are missing in the ZnTe. Figure 57c illustrates the [001] components of the displacement vectors, which in the red region point towards the (002) plane that is missing at the left of the 60° dislocation. Furthermore, one recognizes a tilt between the ZnSe and ZnTe crystals due to the 60° dislocations because the green region between the red and blue ones is tilted against the (001) interface plane. Figure 58 shows the lattice parameter in the [001] direction averaged along the (002) planes. The lattice parameter in the ZnTe crystal (at the right) is 1.07 times that in the ZnSe. From Eq. (79) one deduces that $a_{\mathrm{ZnTe}}/a_{\mathrm{ZnSe}} = 1.078$ for the bulk materials, which is close to this measured value. Therefore, the dislocation network results in an almost complete strain relaxation, which is consistent with the results of the investigation of dislocation distances in larger regions mentioned in the preceding. In the following, we will make a qualitative comparison of Fig. 57 with a simple theoretical approximation of an array of Lomer dislocations using the elastic continuum theory of isotropic media. In the case of an array of Lomer dislocations with $\mathbf{b}_{\mathrm{Lomer}} = \frac{a_0}{2}[110]$ and a dislocation line direction $[1\bar{1}0]$, the $[1\bar{1}0]$ component of the displacement vector field vanishes because of the edge character of the dislocations. Therefore, the displacement calculation is reduced to two dimensions. The following calculation of a 2-D atomic model is accomplished in three steps. In the first step, a perfect bicrystal is formed, which is shown in Fig. 60a. Figure 59 illustrates that each 2-D unit cell contains four projected atoms at the positions $\mathbf{r}_1 = (0, 0)$, $\mathbf{r}_2 = (0, -\frac{1}{4}a_\perp)$, $\mathbf{r}_3 = (\frac{1}{2}a_\parallel, -\frac{1}{2}a_\perp)$, $\mathbf{r}_4 = (\frac{1}{2}a_\parallel, -\frac{3}{4}a_\perp)$. The lattice parameter parallel to the interface plane is
The lattice parameters $a_\perp$ perpendicular to the interface plane are different
FIGURE 57. (a) Displacement vector field and color-coded displacement vector field components in (b) growth direction and (c) ⟨110⟩ direction of the ZnSe/ZnTe interface region. The horizontal axis gives the distance in ⟨110⟩ direction in pixels. (See also Plate 27.)
FIGURE 58. Averaged lattice parameter in growth direction plotted versus the number j of (002) planes. The left part corresponds to the ZnSe and the right to the ZnTe.
inside the ZnSe and the ZnTe regions, given by
where $\nu$ is Poisson's ratio and the term proportional to $f/(1-\nu)$ represents the tetragonal distortion, which will be eliminated by the displacement field of the dislocations. For this calculation $f = 1/14$ has been chosen. Figure 60a shows the resulting perfect 2-D crystal. Atomic rows $n$ perpendicular to the interface plane are found at $x_n = n a_\parallel$, $n \in \{\ldots, -1, -1/2, 0, 1/2, 1, \ldots\}$. In the second step of our simple calculation, two specific atomic rows as indicated in Fig. 60a are deleted. All rows to the right and to the left of the row that is marked with an arrow in Fig. 60a, b, c are shifted by $-\frac{1}{4}a_\parallel$ and $\frac{1}{4}a_\parallel$, respectively. The atomic model that results after the second step is shown in Fig. 60b. In the third step of the calculation, the displacement vector field of an array of Lomer dislocations is applied. According to the elasticity theory of isotropic media (which of course is an approximation in the case of
FIGURE 59. The schematic drawing shows two unit cells of projected atomic positions along the ⟨110⟩ direction in a 2-D model. Each unit cell contains four atomic positions as indicated by the numbers in the lower part of the figure. ZnSe and ZnTe unit cells differ in the parameters $a_\perp^{\mathrm{ZnSe}}$ and $a_\perp^{\mathrm{ZnTe}}$ due to a tetragonal distortion.
ZnTe/ZnSe), the displacement vector field of a single Lomer dislocation is given by Hirth and Lothe (1968), Mazzer et al. (1990), and Fortini and Brault (1990):
$$u_x(x, y) = \frac{b}{2\pi}\left[\tan^{-1}\!\left(\frac{y}{x}\right) + \frac{1}{2(1-\nu)}\,\frac{xy}{r^2}\right]$$
$$u_y(x, y) = -\frac{b}{4\pi(1-\nu)}\left[(1-2\nu)\log(r) + \frac{x^2 - y^2}{2r^2}\right]$$
where $b = a_\parallel$ is the magnitude of the Burgers vector and $r = \sqrt{x^2 + y^2}$. Note that the crystal is assumed to extend infinitely in the $y$ and $-y$ directions. Therefore, dislocation image forces that occur due to surfaces are not taken into account. Complete strain relaxation is achieved if the Lomer dislocation
FIGURE 60. Schematic drawing which shows the three steps to obtain the 2-D atomic model for a Lomer misfit dislocation. The arrows indicate the same row in (a), (b), and (c); (a) shows the perfect bicrystal. The rows that have to be deleted are surrounded by a dashed box; (b) shows the result after the deletion of the two rows and a shift that is explained further in the text; (c) is obtained after the addition of the displacement field of the Lomer dislocation array.
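cores are at a distance $d_\parallel$ then given by
$$d_\parallel = \frac{a_{\mathrm{ZnSe}}}{\sqrt{2}\,f}.$$
The total displacement vector field is
$$\mathbf{u}^{\mathrm{total}}(x, y) = \sum_{n = n_{\min}}^{n_{\max}} \mathbf{u}\left(x - n d_\parallel - s_x,\; y - s_y\right)$$
with
$$s_x = \frac{1}{4} a_\parallel \quad \text{and} \quad s_y = -\frac{1}{4} a_\perp^{\mathrm{ZnTe}} \qquad (84)$$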
where $s_x$ and $s_y$ are used to shift the dislocation cores to positions corresponding to the one marked with a cross in Fig. 59. The values $n_{\min}$ and $n_{\max}$ determine the number of contributing Lomer dislocations, which is $10^3$ in this calculation. The result is shown in Fig. 60c. Note that the lattice parameters perpendicular as well as parallel to the interface plane were chosen in such a way (Eqs. (82), (83)) that the averaged lattice parameters $a_\parallel$ parallel to the interface plane are $a_{\mathrm{ZnSe}}/\sqrt{2}$ and $a_{\mathrm{ZnTe}}/\sqrt{2}$ in the ZnSe and ZnTe regions of Fig. 60c and the average lattice parameters $a_\perp$ perpendicular to the interface plane are $a_{\mathrm{ZnSe}}$ and $a_{\mathrm{ZnTe}}$, respectively. Figure 61a, b, and c displays the result of the DALI evaluation of the atomic positions that were calculated as described in the preceding. Note that the position of a dumbbell of a ⟨110⟩ HRTEM image is approximated by averaging two closely neighboring atomic columns, which are occupied by different atomic species. These graphs can qualitatively be compared with the displacement field of the left two Lomer dislocations shown in Fig. 57. Obviously, the main features are in good agreement. However, one should bear in mind that the comparison can only be qualitative because the theoretical calculation is based on the elasticity theory of isotropic media, whereas the real structure is neither continuous at the dislocation core nor isotropic. Moreover, the difference between the elastic constants of ZnSe and ZnTe has not been taken into account.
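The third step of the model calculation, the superposition of single-dislocation fields into the field of a periodic array, can be sketched numerically as follows. This is only an illustration under stated assumptions: the single-dislocation field is taken in the standard isotropic edge-dislocation form of Hirth and Lothe, the value $\nu = 0.3$ and the spacing $d = 14\,b$ are chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def edge_displacement(x, y, b, nu):
    """Displacement (u_x, u_y) of one edge dislocation at the origin.

    Isotropic continuum elasticity; Burgers vector of magnitude b along x.
    Points with x == 0 lie on the cut of tan^-1(y/x) and must be avoided.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r2 = x**2 + y**2
    ux = b / (2.0 * np.pi) * (np.arctan(y / x) + x * y / (2.0 * (1.0 - nu) * r2))
    uy = -b / (4.0 * np.pi * (1.0 - nu)) * (
        (1.0 - 2.0 * nu) * 0.5 * np.log(r2)      # 0.5*log(r^2) = log(r)
        + (x**2 - y**2) / (2.0 * r2)
    )
    return ux, uy

def array_displacement(x, y, b, nu, d, sx=0.0, sy=0.0, n_min=-500, n_max=499):
    """Sum the fields of dislocations with cores at (n*d + sx, sy), n_min <= n <= n_max."""
    x, y = np.broadcast_arrays(np.asarray(x, dtype=float), np.asarray(y, dtype=float))
    ux_total = np.zeros(x.shape)
    uy_total = np.zeros(x.shape)
    for n in range(n_min, n_max + 1):
        ux, uy = edge_displacement(x - n * d - sx, y - sy, b, nu)
        ux_total += ux
        uy_total += uy
    return ux_total, uy_total

# Illustrative values (assumptions): lengths in units of b, 1000 dislocations.
b, nu, d = 1.0, 0.3, 14.0
ux_here, uy_here = array_displacement(3.0, 5.0, b, nu, d)
ux_next, uy_next = array_displacement(3.0 + d, 5.0, b, nu, d)
```

With $10^3$ contributing dislocations, the summed field at a point and at the point one period $d$ further along the interface agree closely, which is the behavior expected of a periodic array far from its ends.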
D. CELFA Evaluation of a CdSe/ZnSe(001) Heterostructure
The CELFA evaluations that have been presented in the previous sections were carried out on In$_x$Ga$_{1-x}$As/GaAs(001) Stranski-Krastanow layers. In this section we give an example showing that the CELFA procedure can also be applied to other materials, for example, CdSe and ZnSe. The CdSe/ZnSe heterostructure was grown by migration enhanced epitaxy (MEE) at a substrate temperature of 280°C. The structure consists of a
FIGURE 61. Evaluation of the calculated 2-D grid that contains an array of Lomer misfit dislocations; (a) displacement vector field and color-coded displacement vector field components in (b) vertical direction and (c) horizontal direction. (See also Plate 28.)
GaAs (001) substrate, a 50-nm thick ZnSe buffer layer, a 1 ML CdSe interlayer, and a 10-nm thick ZnSe cap layer. The overall thickness of the strained layers is well below the critical thickness for the formation of misfit dislocations. The preparation of [100] cross-sectional samples and the HRTEM imaging were performed in analogy to the procedure described in Section IV-B. The (020) and (040) lattice fringes perpendicular to the interface plane are used for HRTEM imaging. The crystal tilt was close to 3°. Figure 62 shows a color-coded map of the local Cd-concentration, which was evaluated using the amplitude of the (020) reflection of the image unit cell diffractograms. At the lower part of the map a region with enlarged Cd-content is visible. Figure 63 depicts the corresponding Cd-concentration profiles obtained from a lower and an upper section of Fig. 62. The maximum of the Cd-concentration profile is about 13% in the upper part of Fig. 62, whereas it reaches nearly 20% in the lower part with enlarged Cd-content. Figure 64 shows a graph that is obtained by the integration of the Cd-concentration profile of the whole image in Fig. 62. Figure 64 reveals that the entire deposited Cd-content is equivalent to 0.97 ML CdSe, which is close to the nominally deposited amount of 1 ML. Figure 63 indicates a significant broadening of the interlayer compared to the intended width of 1 ML. This behavior is similar to that observed in capped In$_x$Ga$_{1-x}$As/GaAs(001) heterostructures in Section IV-B-2. In the following we will discuss to what extent the crystal tilt contributes to the observed interlayer width. In Section IV-B-2, the results of the CELFA method were in good agreement with strain-state analysis, which suggests a small effect of the crystal tilt in this case. However, the extent of the broadening may depend on the morphology of the transition regions between layer and matrix.
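The integration that leads from a concentration profile to the total deposited amount can be illustrated with a short sketch. It assumes that one (002) plane corresponds to one monolayer, so that summing the Cd fractions of successive planes gives the content in ML of CdSe. The profile used here is synthetic and hypothetical, not the measured data of Fig. 63, and the function name is an assumption.

```python
import numpy as np

def integrated_cd_content(x_cd):
    """Cumulative Cd content (in ML of CdSe) up to and including each (002) plane.

    x_cd: Cd fraction (0..1) of each (002) plane, ordered along the growth
    direction. Assumes one (002) plane corresponds to one monolayer.
    """
    return np.cumsum(np.asarray(x_cd, dtype=float))

# Synthetic example (not measured data): 1 ML of CdSe spread over several
# planes with a Gaussian shape.
planes = np.arange(41)
profile = np.exp(-0.5 * ((planes - 20) / 2.5) ** 2)
profile /= profile.sum()           # total content is exactly 1 ML

content = integrated_cd_content(profile)
total_ml = content[-1]             # plateau value of the integrated curve
peak_fraction = profile.max()      # peak Cd fraction of the broadened layer
```

In this synthetic case, a single monolayer broadened over several planes has a peak concentration well below 100% while the plateau of the integrated curve still returns the full deposited amount, which is the reasoning applied to Fig. 64.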
For a quantitative discussion of this effect on the evaluated concentration profile we need an upper limit of the sample thickness. Figure 65 shows the oscillation of the amplitude of the (002) reflection as a function of the objective lens defocus. This curve was obtained as described in Section III-D from a region below the Cd(Zn)Se interlayer inside the ZnSe buffer. Figure 65 yields a defocus step size of 8.2 nm (Eq. (52)), and the values $J_{\max}$ and $J_{\min}$ lead to a crystal thickness of 15 nm according to Eq. (58) and Fig. 30. In the following we focus on the concentration profile obtained from the upper part of Fig. 62, which is shown in Fig. 63. The shape of the profile is approximately a Gaussian curve that is given by
$$X(z) = \frac{A}{w\sqrt{\pi/2}}\,\exp\left(-\frac{2z^2}{w^2}\right) \qquad (85)$$
where A is the area below the curve, w the half width and z the coordinate in
FIGURE62. Color-coded map of the Cd-concentration of a CdSe layer with a nominal thickness of 1 ML buried in a ZnSe matrix. (See also Plate 29.)
FIGURE 63. Cd-concentration profiles in growth direction, plotted versus the number j of (002) planes, for sections in the lower and upper part of Fig. 62.
FIGURE 64. The curve is obtained by the integration of the Cd-concentration profile of the whole image in Fig. 62 and shows the integral Cd-content of the crystal below a (002) plane j, plotted versus the number j of (002) planes. The right-hand part of the curve (j > 25) indicates an overall Cd content that is equivalent to 0.97 ML CdSe.
FIGURE 65. Amplitude of the (002) reflection in the diffractogram plotted versus the image number of the images of a defocus series. The values $J_{\max}$ and $J_{\min}$ are used to derive the specimen thickness. The corresponding crystal region was chosen inside the ZnSe buffer layer, yielding an upper limit of the thickness.
growth direction. Fitting Eq. (85) to the experimental profile yields A = 0.89 ML CdSe and w = 5.19 ML. To investigate the influence of the crystal tilt as it relates to the shape of the real concentration profile, we first have to find a function $X_r$ that describes the real profile. Furthermore, we will approximate the projected profile $X_r^{\mathrm{tilted}}$ of the tilted specimen, which will then be compared with the experimental profile. Using this procedure, we will determine which “real” profiles $X_r$ lead to matching profiles in the case of a tilted crystal. For this purpose, the functional relationship
previously used in Eq. (72) seems to be well suited. Parameter P describes the sharpness of the transition regions of the profile, which approaches a Heaviside-type function of width b for P → 0. In this case, the function constitutes the “ideally sharp” profile with a homogeneous Cd-content of 89% and a width of b MLs, which is chosen as b = 1 ML to be consistent with the experiment. Parameter A is equal to the area below the curve (as
in Eq. (85)). Therefore, parameter P can be varied without changing the area below the curve, which is identical with the entire deposited amount of CdSe. Note that the entire amount of CdSe is known from Fig. 63 because it cannot be affected by crystal tilt. Figure 67a shows the profiles obtained from Eq. (86) for several values of the parameter P. The next step is the approximation of the crystal tilt sketched in Fig. 66. The crystal of thickness T in electron beam direction is decomposed into slices of thickness dt, which are shifted in such a way that the projected profile approximates (and for dt → 0 is identical with) the projected profile of the tilted sample. This transformation of the profile is described by
$$X_r^{\mathrm{tilted}}(z, P) = \frac{1}{T}\int_0^T X_r\bigl(z - (t - T/2)\tan\varphi,\; P\bigr)\, dt. \qquad (87)$$
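The slice decomposition behind Eq. (87) can be sketched numerically as follows. Because Eq. (86) is not reproduced here, a hypothetical box profile with tanh-shaped edges is used as a stand-in for the "real" profile; the (002) plane spacing of ZnSe (0.2834 nm, assuming a ≈ 0.567 nm) and all function names are likewise assumptions made for illustration.

```python
import numpy as np

ML_NM = 0.2834  # assumed (002) plane spacing of ZnSe in nm (a/2 with a ~ 0.567 nm)

def smooth_box(z, area=0.89, width=1.0, edge=0.05):
    """Hypothetical box-like profile with tanh edges (a stand-in, not Eq. (86)).

    z, width and edge are in ML; the area below the curve equals `area`.
    """
    z = np.asarray(z, dtype=float)
    return area / (2.0 * width) * (
        np.tanh((z + width / 2.0) / edge) - np.tanh((z - width / 2.0) / edge)
    )

def tilted_profile(profile, z, thickness, tilt_deg, n_slices=400):
    """Approximate Eq. (87): average over slices shifted by (t - T/2) tan(phi)."""
    t = (np.arange(n_slices) + 0.5) * thickness / n_slices   # slice midpoints in [0, T]
    shifts = (t - thickness / 2.0) * np.tan(np.radians(tilt_deg))
    return np.mean([profile(z - s) for s in shifts], axis=0)

z = np.linspace(-20.0, 20.0, 4001)      # growth coordinate in ML
dz = z[1] - z[0]
thickness_ml = 15.0 / ML_NM             # T = 15 nm expressed in ML
real = smooth_box(z)
tilted = tilted_profile(smooth_box, z, thickness_ml, tilt_deg=4.0)

area_real = real.sum() * dz             # area below the untilted profile
area_tilted = tilted.sum() * dz         # area below the projected profile
```

In this sketch the area below the curve is the same before and after the projection, while the peak of a sharp profile is strongly reduced, in line with the argument that the total amount of CdSe cannot be affected by the crystal tilt.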
Figure 67b depicts the result of Eq. (87), which was calculated with $\varphi$ = 4° for a specimen thickness T = 15 nm and for the same set of parameters P that was used for the computation of Fig. 67a. Figure 67b also contains the
FIGURE 66. Schematic drawing that shows the approximation that is made for the calculation of the concentration profile of the tilted specimen.
FIGURE 67. (a) Cd-concentration profiles, plotted versus the number j of (002) planes, computed according to Eq. (86) for different width parameters P under the condition of a constant area equivalent to 0.89 ML CdSe (in accordance with the dashed profile shown in Fig. 63) below the profile curves; (b) Cd-concentration profiles of the tilted specimen with a thickness of 15 nm computed according to Eq. (87) with the same values for P as in (a), together with the experimental profile; (c) difference between the curves in (b) and (a) for P ≥ 0.5.
experimental concentration profile, which is well fitted by the curve with P = 1.82. Obviously, the profile with the sharpest transition regions, which corresponds to P = 0.001, is most affected by the tilt. Figure 67c shows the difference $X_r - X_r^{\mathrm{tilted}}$ for the other curves with P ≥ 0.5. One clearly sees that the difference decreases with increasing width of the transition region. For P = 1.82, which gives the best fit to the experimental profile, the difference becomes negligible. We therefore have to conclude that the profiles shown in Fig. 63 are not affected by crystal tilt. First, profiles with sharper transition regions do not fit the experimental data because their height is increased due to the fact that all curves have to enclose areas of the same size, a consequence of the known total amount of CdSe deposited. Second, the effect of the crystal tilt can be neglected for profiles corresponding to a parameter P > 1 at conditions $\varphi$ < 4° and T ≤ 15 nm.
V. SUMMARY AND DISCUSSION OF THE ATOMIC-SCALE ANALYSIS METHODS
This work was concerned with two methods to evaluate the composition of ternary sphalerite-structure crystals, which rely on different sources of information in the HRTEM micrograph. The first procedure reviewed was strain-state analysis. It exploits the different lattice parameters of layer and substrate/buffer in strained layer heteroepitaxy. This procedure has to be regarded as an indirect method of composition evaluation because the strain state of the strained layer under investigation is influenced by several factors of which a researcher must be aware. First, the tetragonal distortion of a pseudomorphically grown structure depends on specimen thickness due to the very small values < 20 nm that are necessary in HRTEM. In infinitely thin specimens, the biaxial strain of the bulk sample is reduced to a uniaxial strain state. The amount of strain relief itself, however, does not depend only on the sample thickness but is also influenced by the concentration profile of the strained layer. This is due to the fact that different Fourier coefficients of the profile show a different relaxation behavior of the tetragonal distortion. To take this effect into consideration one has to know the local sample thickness, obtained by application of the QUANTITEM procedure to the investigated micrograph. The next step is the generation of a hypothetical model of the compositional morphology of the specimen, which then is the basis for a finite element calculation. The resulting displacements of the FE calculation are then used to compute a 3-D atomic model. To simulate the imaging process the displacements are averaged along atomic rows in the electron beam direction. The resulting 2-D grid is again evaluated with the DALI procedure analogously to the experimental image. Comparison of the simulated
and experimental displacements gives an indication as to how the guessed input model of the FE calculation has to be modified. In this way, the model is improved iteratively until the experimental and simulated displacement vector fields show sufficient agreement. As an alternative test, HRTEM image simulations can be performed on the basis of an atomic model deduced from the FE simulations as shown in Section IV-B-2 (Fig. 55). The sources of error for this analysis procedure are as follows: First of all, detection of the local brightness maxima in the HRTEM image has an accuracy of about 0.2 pixels. In TEM micrographs that are obtained for a ⟨110⟩ direction of the incident electron beam, the position of the brightness maximum of a dumbbell that cannot be resolved at a resolution of 0.15 nm depends on both sample thickness and defocus. Therefore, a ⟨100⟩ projection, where each spot in the HRTEM image stems from one column of atoms, seems to be preferable to the ⟨110⟩ projection. However, the spacing between the projected rows is small (0.28 nm for GaAs) and hence the contrast pattern depends strongly on variations in defocus, sample thickness, and orientation, so that generally only small areas of the image can be evaluated. Most of the examples of the strain-state analysis presented in this review are therefore based on the ⟨110⟩ orientation. The application of the ⟨100⟩ projection seems promising if the TEM resolution increases and the sample preparation techniques are improved with respect to the wedge shape of the specimen as well as the smoothness of its surfaces, which would reduce the variations in thickness in the electron beam direction. A third source of error, whose effect becomes obvious mainly at interfaces with sharp chemical transitions, is the delocalization that is given by Thust et al. (1996) and Lichte (1991)
$$R = \left| C_s \lambda^3 g^3 + \Delta f\, \lambda g \right| \qquad (88)$$
where $C_s$ is the spherical aberration constant, $\lambda$ the electron wavelength, and $\Delta f$ the defocus. Equation (88) represents a spatial delocalization imposed by the microscope on a certain spatial frequency $g$. The delocalization is minimized for a particular $g$ at a defocus of $\Delta f = -C_s \lambda^2 g^2$. The second method that has been outlined in this review is the composition evaluation by lattice fringe analysis (CELFA) procedure. This method uses a (000), (040), and (020) 3-beam imaging condition with the chemically sensitive (020) reflection centered on the optical axis. This condition is simple and hence nonlinear imaging can be solved analytically. Furthermore, all free parameters of the imaging can readily be extracted from a defocus series and the electron microscope parameters do not need to be
known. Analysis of the series leads to values for the sample thickness t in electron beam direction and the defocus-dependent phases $\varphi_n$ of the oscillation of $|J_{002}|$ with advancing image number n. It was shown in Section III-G and Fig. 37 that errors of thickness determination only weakly influence the evaluated concentrations. Moreover, a variation of either imaging conditions or specimen thickness throughout the image can be corrected. Two modes are possible for the evaluation. First, the amplitude ratio $|J_{002}|/|J_{004}|$ of image cell diffractograms can be used. This method has the advantage that adsorbates at the surfaces of the specimen that lead to a modification of the electron wave function do not influence the measurement. Second, it is possible to exploit only the amplitude $|J_{002}|$, which offers the advantage of better accuracy. This is due to the fact that only errors of the determination of $|J_{002}|$ are relevant for the measurement, whereas the errors of both $|J_{002}|$ and $|J_{004}|$ contribute in the first case. Furthermore, the error introduced by an uncertainty of the thickness determination influences the concentration determination to a smaller degree as shown in Fig. 37. In pseudomorphically grown strained heterostructures, the lattice fringes perpendicular to the interfaces must be used for evaluation because the contrast pattern may be influenced by a variation of fringe distance. For that purpose, the crystal has to be tilted around an axis parallel to the interface plane. To keep the induced blurring of the interface composition profile small, the tilt angle has to be kept below 4°. In this case, the considerations in Section IV-D revealed that the tilt noticeably influences the measured concentration profile only for extremely sharp interfacial transitions.
This conclusion is supported by the fact that the measurement of the entire deposited amount of interlayer material, which is equivalent to the area underneath the concentration profile, does not depend on the crystal tilt. The CELFA procedure seems to be an efficient and accurate image evaluation tool both because the electron microscopist does not need to know any of the imaging parameters and because conventional comparisons between experimental and simulated images are not necessary. Moreover, the accuracy of the evaluation is high and the method is applicable to a whole variety of compound semiconductor systems, of which Cd$_x$Zn$_{1-x}$Se, In$_x$Ga$_{1-x}$As, and Al$_x$Ga$_{1-x}$As are only a few examples.
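The defocus that minimizes the delocalization of Eq. (88) for a given spatial frequency can be estimated with a short calculation. The sketch below assumes the delocalization in the form $R = |C_s\lambda^3 g^3 + \Delta f\,\lambda g|$, which is consistent with the stated optimum $\Delta f = -C_s\lambda^2 g^2$. The numerical inputs (300 kV, $C_s$ = 2.0 mm, and the (020) spatial frequency of ZnSe with an assumed lattice parameter of about 0.567 nm) are illustrative, and the function names are hypothetical.

```python
import math

def electron_wavelength_nm(voltage_v):
    """Relativistic electron wavelength in nm for an accelerating voltage in volts."""
    h = 6.62607015e-34      # Planck constant, J s
    m = 9.1093837015e-31    # electron rest mass, kg
    e = 1.602176634e-19     # elementary charge, C
    c = 299792458.0         # speed of light, m/s
    energy = e * voltage_v
    momentum = math.sqrt(2.0 * m * energy * (1.0 + energy / (2.0 * m * c**2)))
    return h / momentum * 1e9

def delocalization_nm(g, defocus_nm, cs_nm, wavelength_nm):
    """Assumed form R = |C_s lambda^3 g^3 + defocus * lambda * g|, with g in 1/nm."""
    return abs(cs_nm * wavelength_nm**3 * g**3 + defocus_nm * wavelength_nm * g)

def optimum_defocus_nm(g, cs_nm, wavelength_nm):
    """Defocus that minimizes R for a given g: -C_s lambda^2 g^2."""
    return -cs_nm * wavelength_nm**2 * g**2

lam = electron_wavelength_nm(300e3)   # 300 kV accelerating voltage
cs = 2.0e6                            # C_s = 2.0 mm expressed in nm
g_020 = 2.0 / 0.567                   # (020) of ZnSe in 1/nm, assuming a ~ 0.567 nm
df_opt = optimum_defocus_nm(g_020, cs, lam)
r_at_zero_defocus = delocalization_nm(g_020, 0.0, cs, lam)
r_at_optimum = delocalization_nm(g_020, df_opt, cs, lam)
```

Under these assumptions the delocalization of the chosen reflection vanishes at the optimum (negative) defocus, whereas it remains finite at zero defocus.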
ACKNOWLEDGMENTS
This work is supported by the Volkswagen Stiftung under contract number 1/71 014. The authors would like to thank U. Fischer, S. Kaiser, T. Reisinger,
and T. Remmele for contributing valuable results within the scope of their diploma theses and dissertations. We also would like to express our thanks to A. Forster (Institute for Thin Film and Ion Beam Technology of the Research Center, Julich) for MBE growth of the InGaAs samples, N. N. Ledentsov and his group at the Ioffe Institute (St. Petersburg) for providing CdZnSe specimens, W. Gebhardt, H. Stand, and B. Hahn for the growth of the MOVPE samples, and J. Zweck for many valuable discussions and remarks.
APPENDIX A: LIST OF VARIABLES
In the following, a list of variables is given that does not include abbreviations that are very limited in scope.
$a$  lattice parameter of the bulk material
$a_\parallel$  lattice parameter parallel to the interface plane and perpendicular to the electron beam
$a_y$  lattice parameter parallel to the interface plane and parallel to the electron beam
$a_\perp$  lattice parameter perpendicular to the interface plane
$a_\perp^{BC}$  lattice parameter perpendicular to the interface plane in the “substrate” that is a binary material BC
$\mathbf{a}, \mathbf{b}, \mathbf{c}, \mathbf{d}$  vectors pointing from the center of the image unit cell $Z$ to its corners
$\mathbf{a}', \mathbf{b}', \mathbf{c}', \mathbf{d}'$  vectors pointing from the center of the image unit cell $Z'$ to its corners
$a_{00j}$  real amplitude of the (00j) beam
$\mathbf{a}_0$  origin of the grid of lattice positions
$\mathbf{a}_{1,2}$  lattice base vectors
$a_f$  lattice parameter of the film
$a_s$  lattice parameter of the substrate
$a_{A_xB_{1-x}C}$  lattice parameter of a ternary material A$_x$B$_{1-x}$C
$a_{AC}$  lattice parameter of a binary material AC
$A$  area below the concentration profile in Section IV-C
$A, B, C, D, E$  fit parameters in Eq. (54)
$A^{1,2}$  antineighbor position
$A$  “area” defined for the decomposition of the Fourier-transformed image
$\alpha_{\mathrm{thermal}}$  virtual “thermal” expansion coefficient used for the FE calculation
$\alpha_{1,2}$  angle of the grid lines of set (1) or (2) with the x- or the y-axis
$\mathbf{b}$  Burgers vector
$\mathbf{b}_{60}$  Burgers vector of a 60° misfit dislocation
$\mathbf{b}_{\mathrm{Lomer}}$  Burgers vector of a Lomer misfit dislocation
$\mathbf{b}_{1,2}$  orthonormal basis of the plane $E$ of the QUANTITEM image vectors
$B$  “block” defined for the decomposition of the Fourier-transformed image
$C$  Fourier-transformed image
$C_{ij}$  components of the elastic tensor
$C_{AC}$  integral AC-content of a ternary material A$_x$B$_{1-x}$C in units of [ML]
$C_s$  spherical aberration constant
$\chi_f$  wave aberration due to the defocus
$\chi_s$  spherical wave aberration
$\chi_n$  $\chi_f + \chi_s$ for the image n of a defocus series
$d_i$  sample thickness corresponding to the image unit cell $Z_i$
$\mathbf{d}_{1,2}$  direction of grid lines
$D(T)$  diffusion coefficient in dependence of the temperature T in Section IV-A-2
$\delta$  accuracy of the position detection, calculated from deviations inside the reference lattice
$\delta(\Delta f)$  defocus change corresponding to a full oscillation of $|J(\mathbf{g}_{002})|$
$\Delta_{i,j}$  local lattice distance corresponding to the position $P_{i,j}$
$\Delta_{\mathrm{Stepsize}}(\Delta f)$  step size of the defocus change between adjacent images of a defocus series
$\bar{\Delta}$  mean lattice parameter in growth direction in Section IV-A-1
$E$  plane of QUANTITEM image vectors
$f$  misfit
$f_A$  atomic scattering factor of the atom A
$f_{\mathrm{mod}}$  modulation frequency of the input signal in the lock-in technique
$F(\mathbf{g})$  complex amplitude of the beam that corresponds to the reciprocal lattice vector $\mathbf{g}$
$\Delta f$  objective lens defocus
$\varphi$  Wiener filter and tilt angle in Section IV-C
$\varphi_i, \Phi_i$  angles describing a position on the QUANTITEM ellipse corresponding to the image unit cell $Z_i$
$\varphi_n$  phase of $|J(\mathbf{g}_{002})|$ in dependence of the defocus for the image n of a defocus series
$\mathbf{g}$  vector of the reciprocal lattice
$g$  factors with values −1, 0, 1 defining the contribution of the distance vectors (Eq. (12))
$I$  intensity
$J(\mathbf{g}_{h,k,l})$ or $J_{hkl}$  Fourier coefficient of the diffractogram corresponding to the (h, k, l) reflection
$J_{\max}, J_{\min}$  maximum and minimum of the oscillation of $|J(\mathbf{g}_{002})|$ in dependence of the defocus
$L_i$  line i
$\lambda$  electron wavelength
$M, M'$  midpoints of the cells $Z$ and $Z'$ in Section II-B-1
$M^{1,2,3}$  lattice positions used for the position detection procedure
$N^2$  dimension of the image vector space of the QUANTITEM procedure in Section II-B
$N$  noise part of the Fourier-transformed image
$N^{1,2}$  neighbor position
$N_{\mathrm{Ref}}$  number of lattice positions inside the reference lattice
$N$  number of sublattices of the grid of positions
$N_{\mathrm{Cd}_x\mathrm{Zn}_{1-x}\mathrm{Se}}$  number of MLs contained in the Cd$_x$Zn$_{1-x}$Se interlayer
$\nu$  Poisson's ratio
$p_{00j}$  phase of the (00j) beam
$P$  parameter describing the abruptness of the transition region of the Cd-concentration profile in Section IV-C, defined in Eq. (86)
$P_1 \ldots P_n$  some parameters
$P_{i,j}$  lattice position with indices i, j
$Q$  direction on which displacements and distance vectors are projected
$\Theta$  offset angle of the QUANTITEM procedure corresponding to a sample thickness → 0
$R$  delocalization in Section V
$\mathbf{R}$  image vector of the QUANTITEM procedure
$\mathbf{R}^{1,2,3}$  template image vector of the QUANTITEM procedure
$R_{i,j}$  reference lattice position corresponding to the position $P_{i,j}$
$R_c$  image cell of the QUANTITEM procedure that contains only one gray level c
$S_{\mathrm{input}}$  test signal
$S_{\mathrm{output}}$  response signal
$SF_{hkl}$  structure factor of the beam with Miller indices h, k, l
$\sigma$  standard deviation of the intensity maxima determination
$t$  duration of the annealing in Section IV-A-2
$T$  oscillation period of the RHEED signal in Section IV-A-1 and annealing temperature in Section IV-A-2 as well as the crystal thickness in electron beam direction in Section IV-C
$\mathbf{T}$  image vector corresponding to $\mathbf{R}^{\mathrm{eval}}$ inside the plane $E$
$\mathbf{T}_\parallel^{\mathrm{eval}}$  component of the evaluated image vector $\mathbf{T}$ parallel to the plane $E$
$\mathbf{T}_\perp^{\mathrm{eval}}$  component of the evaluated image vector $\mathbf{T}$ perpendicular to the plane $E$
$T(\mathbf{g}+\mathbf{h}, \mathbf{h}; \Delta f)$  transmission cross coefficient of the nonlinear image formation
$\Delta T^{\mathrm{eval}}$  value used to estimate the reliability of each vector $\mathbf{T}^{\mathrm{eval}}$
$\Delta T$  virtual heating temperature used for the FE calculation
$\mathbf{u}_{i,j}$  displacement vector corresponding to the position $P_{i,j}$
$u_{\max}$  maximum displacement measured on top of a strained layer
$w$  width of the Gaussian curve in Eq. (85)
$w_B$  weighting factor of the block $B$
$x$  composition or spatial coordinate
$x_n$  positions of atomic rows in Section IV-C
$(x_{\mathrm{el}}, y_{\mathrm{el}}, z_{\mathrm{el}})$  coordinate system associated with the elastic constants
$(x_{\mathrm{geom}}, y_{\mathrm{geom}}, z_{\mathrm{geom}})$  coordinate system associated with the geometry of the FE model
$X(z, t, T)$  Cd-concentration profile in dependence of the distance z in growth direction, the heating duration t, and the temperature T in Section IV-A-2
$X_{\mathrm{single}}(z, t, T)$  Cd-concentration profile of an ideal interlayer of 1 ML CdSe in Section IV-A-2
$X_r(z)$  real concentration profile in Section IV-C
$X_r^{\mathrm{tilted}}(z)$  real concentration profile of the tilted specimen in Section IV-C
$\xi$  extinction distance of the undiffracted beam
$y$  coordinate
$z$  coordinate in growth direction
$Z$  irregularly shaped image unit cell
$Z'$  quadratic image unit cell
REFERENCES ABAQUS 5.5, Hibbitt, Karlsson & Sorenson Inc. Aebersold, J. F., Stadelmann, P. A., and Rouviere, J.-L. (1996). UltiumicloscopJ: 62: 171 189. Baba, N. and Kanaya, K. (1989. Reseurcli Reports of Koyctkuin Uiriuersity, 66: 97-101. Bauer, S., Rosenauer, A., Link, P., Kuhn. W., Zweck, J., and Gebhardt, W. (1993). Ultrcimicroscopy, 51: 221 -227. Bauer, S., Rosenauer, A., Skorsetz, J., Kuhn, W., Wagner, H. P., Zweck, J., and Gebhardt, W. (1992). J . C r p t . Growth, 117: 297-302. Bierwolf, R., Hohenstein, M., Phillipp, F., Brandt, O., Crook, G. E., and Ploog, K. (1993). Ultmmicroscopy, 49: 273-285. Brandt, O., Ploog, K., Bierwolf, R., and Hohenstein, M. (1992). P/iyx. Rev. Lett., 68: 1339-1342. Y , 64: 3617. Christiansen, S., Albrecht, M., Strunk, H. P., and Maier, H. J. (1994). AppL P / I ~ . Lett., ~
ADVANCES IN IMAGING AND ELECTRON PHYSICS VOL 107
Hexagonal Sampling in Image Processing
R. C. STAUNTON
Department of Engineering, University of Warwick, Coventry CV4 7AL, UK
I. Introduction
A. Sampling
B. Processor Architectures
C. Binary Image Processing
D. Monochrome Image Processing
II. Image Sampling on a Hexagonal Grid
A. The Hexagonal Packing of Sensory Elements in the Eye
B. Hexagonal-Shaped Sensor Elements
C. Two-Dimensional Sampling Theory
D. Noise and Quantization Error
E. Practical Aspects of Digital Image Acquisition
F. Measurement of Two-Dimensional Modulation Transfer Function and Bandlimit Shape
III. Processor Architecture
A. Single Instruction, Single Data Computers (SISD)
B. Parallel Processors
C. Two- and Multidimensional Processor Arrays
D. Pyramid Processors
E. Pipelined Processors
F. Hexagonal Image-Processing Pipelines
IV. Binary Image Processing
A. Connectivity
B. Measurement of Distance
C. Distance Functions
D. Morphological Operators
E. Line Thinning and the Skeleton of an Object
F. Comparison Between Hexagonal and Rectangular Skeletonization Programs
V. Monochrome Image Processing
A. The Hexagonal Fourier Transform
B. Geometric Transformations
C. Point Source Location
D. Image-Processing Filters
E. Edge Detectors
F. Hexagonal Edge-Detection Operators
G. The Visual Appearance of Hexagonal Edges and Features
VI. Conclusions
References
Volume 107 ISBN 0-12-014749-1
ADVANCES IN IMAGING AND ELECTRON PHYSICS Copyright © 1999 by Academic Press. All rights of reproduction in any form reserved. ISSN 1076-5670/99 $30.00
I. INTRODUCTION

This chapter argues the case for the hexagonal sampling of images. Historically, square sampling has always predominated, even though good hexagonal alternatives have been advanced at each stage in the development of digital image processing over the last thirty years. Advantages have been shown for the hexagonal scheme in sampling efficiency, processing algorithms, and parallel processors; these are discussed in the following sections.

A. Sampling
Classically, the brightness at a point in a continuous two-dimensional (2-D) field is $b = f(x, y)$, where $x$ and $y$ are the horizontal and vertical distances of the point from the origin. This field can be considered to be sampled by a grid of delta functions to produce a spatially discrete set of brightness values, and these brightness values can themselves be discretized to form what is usually considered to be a digital image (Gonzalez and Woods, 1992). Figure 1 shows two of the many possible regular grids of delta functions that can be used for spatial sampling. The vertical spacing of each has been chosen to be identical. If the sampled brightness is included as an orthogonal vector at each point, then a digital image is formed. This image may be viewed on a TV monitor if the 2-D space is tiled by picture elements (pixels), where each pixel is associated with one sampled value and is filled with the sampled brightness value. Referring to Fig. 1a, if $t_1 = t_2$, then the image may be completely tiled by square pixels; in Fig. 1b, if $t_3 = (2/\sqrt{3})\,t_2$, then the image may be completely tiled by regular hexagons or by rectangles. The hexagonal tiling shown in Fig. 2 has resulted in the term "hexagonal sampled image."
FIGURE 1. Sampling grids: (a) square; (b) hexagonal.

FIGURE 2. Image tilings: (a) square grid with square tiles; (b) hexagonal grid with hexagonal tiles; (c) hexagonal grid with rectangular tiles.
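The two grids of Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration rather than anything taken from the chapter: the function names are hypothetical, and shifting odd rows by half a sample is an assumed layout convention. The sketch generates sampling positions with a common vertical spacing and measures the distances from an interior point to its nearest neighbors.

```python
import math

def square_grid(rows, cols, t):
    """Sampling positions (x, y) on a square grid with spacing t."""
    return [(c * t, r * t) for r in range(rows) for c in range(cols)]

def hexagonal_grid(rows, cols, t2):
    """Sampling positions on a regular hexagonal grid.

    t2 is the vertical (row) spacing; the horizontal spacing is
    t3 = (2 / sqrt(3)) * t2, and odd rows are shifted by t3 / 2
    (an assumed layout convention).
    """
    t3 = 2.0 / math.sqrt(3.0) * t2
    return [((c + 0.5 * (r % 2)) * t3, r * t2)
            for r in range(rows) for c in range(cols)]

def nearest_distances(points, centre, count):
    """Sorted distances from centre to its `count` nearest other points."""
    d = sorted(math.dist(centre, p) for p in points if p != centre)
    return d[:count]

hex_pts = hexagonal_grid(5, 5, t2=1.0)
six = nearest_distances(hex_pts, hex_pts[2 * 5 + 2], 6)   # row 2, column 2

sq_pts = square_grid(5, 5, t=1.0)
eight = nearest_distances(sq_pts, sq_pts[2 * 5 + 2], 8)   # row 2, column 2
```

Under these assumptions the six hexagonal neighbors all lie at the same distance, $t_3$, whereas the eight square-grid neighbors fall into two groups at distances $t$ and $\sqrt{2}\,t$.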
Various aspects of hexagonal sampling are investigated in Section II. These include the evolution of hexagonal arrays of biological sensors, the effect of the choice of sampling grid and tiling on the appearance of the image, and the light-gathering properties of the sensor. To show an advantage of one sampling scheme over another in terms of the information content, the bandlimiting of the image by the sensor system must be investigated. An example of the measurement of the bandlimiting characteristics of some TV camera-frame grabber systems is given at the end of the section.
B. Processor Architectures

Much image processing is accomplished using single-instruction, single-datum (SISD) computers (Flynn, 1966). A single-processor PC is an example of such a computer. Here, images are stored in semiconductor memory, in disk files, or, during computation, in an array. For square sampled images the data map directly into a 2-D array, each element of which can be accessed by a pair of row and column pointers that directly relate to the original position of the pixel. For hexagonally sampled images the data can again be readily stored, but the mapping of the data onto a square array within a program requires some care, as described in Section III.

With multiprocessor systems, the processing task is divided and distributed among the processor elements (PEs). This can be accomplished simply by organizing the PEs in a pipeline and assigning each a different task, such as smoothing or edge detection. Other architectures that readily allow such task divisions include hypercubes and shared-memory machines. Another way to divide the task is to assign each PE to a local area of the image and then to allow it to apply separate tasks to that area sequentially. With this arrangement, the shape of the sampling grid can affect the way the PEs are interconnected for communication.

Image-processing tasks can be categorized as low, middle, or high level (Luck, 1987). Low-level processes have input data that are associated with the original sampling grid, and their output is also associated with it. The Sobel edge detector (Gonzalez and Woods, 1992) is an example of such a process. Middle-level processes again take data that are associated with the grid, but the output is often symbolic and not locked to the grid. A Hough transform (Illingworth and Kittler, 1988) that determines the angle of a straight line is an example of such a process. High-level processes have both input and output data that are not locked to the sampling grid. For example, in optical character recognition a process may take as input a set of features, including stroke end points and junctions, and output the ASCII code of the character.

The effect of the sampling grid on the structure of multiprocessor systems is discussed in Section III. A comparison between the processing of rectangularly and hexagonally sampled images by a pipeline processor is also presented there.
C. Binary Image Processing

With a binary image, one brightness level is often used to distinguish foreground objects and the other to distinguish the background. However, in realistic images containing noise, some pixels are invariably classified incorrectly. With binary images, many processing algorithms are concerned with how pixels are connected, and hence a tile or pixel model of the image is used. Connectivity is easily defined for the hexagonal scheme (Rosenfeld, 1970) and holds for either tiling shown in Fig. 2. With hexagonal sampling, both the set of pixels in the object and the set in the background can be defined as 6-way connected. If the cluster of hexagons in Fig. 2(b) is considered as an object, then the connectivity between a central pixel and each of its six surrounding neighbors is identical apart from the orientation of the border between the pixels. For the square scheme, pixels can be considered to be part of an object if they are either 4-way connected, that is, along a vertical or horizontal border, or 8-way connected, where corner-to-corner connectivity is also allowed. Background pixels can likewise be either 4- or 8-way connected, but if the foreground is 8-way connected, the background must be 4-way connected, or vice versa; otherwise foreground and background features may cross over one another.

Many hexagonal algorithms for binary image processing have been researched and published. Section IV discusses some of the advantages and disadvantages of hexagonal operators relative to their square counterparts. A comparison between hexagonal and rectangular skeletonization programs is presented.

D. Monochrome Image Processing

With gray-scale images, pixel, sampling-point, and other models of the image structure have been used in the development of processes. Hexagonal counterparts of well-known square-grid processes have been designed, and accuracy and computation-speed comparisons have been made between the two schemes. Many of the hexagonal algorithm designs have exploited the equidistance between neighboring sampling points rather than any notional pixel shape. Figure 2b shows a regular hexagonal grid with a circle imposed on the six nearest-neighbor sampling points that surround a central point. Algorithms often utilize masks of coefficients that are convolved with local areas of the image. Coefficient weights are often a function of distance from the center, and thus the symmetry of the hexagonal scheme can result in simplified processing. Some square-scheme masks used with such convolution operators are separable, but this does not apply to hexagonal-scheme operators. In some cases this can partly remove the advantage of a hexagonal operator. Small-area masks can be convolved efficiently with the image in the spatial plane, but with large masks greater efficiency can be achieved by first transforming the mask and data to the Fourier plane. Efficient hexagonal-scheme transforms (Rivard, 1977) have been developed that compare favorably with the square-system fast Fourier transform (FFT).

In Section V some simple hexagonal algorithms for gray-scale image processing are presented, and their advantages and disadvantages relative to their square counterparts are discussed. The design of a simple hexagonal-grid edge detector is discussed and its operation compared with that of the square-grid Sobel operator. Finally, comments are made on the visual appearance of hexagonally and rectangularly sampled images.
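The connectivity rules for binary images described in Section I.C can be illustrated with a small component-counting sketch. The neighbor tables, the flood-fill routine, and the storage convention for the hexagonal case (a rectangular array in which odd rows are displaced half a sample to the right) are all assumptions made for this example; they are not the chapter's own implementation.

```python
# Neighbor offsets (d_row, d_col) for each connectivity rule.
SQUARE_4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
SQUARE_8 = SQUARE_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]
# Hexagonal image stored in a rectangular array with odd rows shifted
# half a sample to the right (assumed convention): the neighbors in
# the rows above and below depend on the row parity.
HEX_EVEN_ROW = [(0, -1), (0, 1), (-1, -1), (-1, 0), (1, -1), (1, 0)]
HEX_ODD_ROW = [(0, -1), (0, 1), (-1, 0), (-1, 1), (1, 0), (1, 1)]

def neighbors(r, c, rule):
    """Candidate neighbor coordinates of pixel (r, c) under `rule`."""
    if rule == "hex6":
        offsets = HEX_ODD_ROW if r % 2 else HEX_EVEN_ROW
    else:
        offsets = SQUARE_8 if rule == "sq8" else SQUARE_4
    return [(r + dr, c + dc) for dr, dc in offsets]

def count_components(image, value, rule):
    """Count connected regions of pixels equal to `value`."""
    rows, cols = len(image), len(image[0])
    seen = set()
    components = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != value or (r, c) in seen:
                continue
            components += 1
            stack = [(r, c)]
            seen.add((r, c))
            while stack:                      # depth-first flood fill
                cr, cc = stack.pop()
                for nr, nc in neighbors(cr, cc, rule):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and image[nr][nc] == value
                            and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
    return components

# Square grid: a one-pixel-wide diagonal line (1 = foreground).
diagonal = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

# Hexagonal grid (offset-row storage): a 6-connected oblique line.
hex_line = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
```

On the square grid, the diagonal line is a single object only under the 8-way rule, and the background is split into two regions only under the 4-way rule, so the two rules must be paired. On the hexagonal grid, the single 6-way rule gives one foreground line and two background regions.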
II. IMAGE SAMPLING ON A HEXAGONAL GRID

A. The Hexagonal Packing of Sensory Elements in the Eye
Biological and ophthalmic observations on the human eye indicate that a hexagonal packing of retinal sensory elements has evolved in nature. This was a motivation for the study of hexagonal sampling schemes for computer vision covered in this chapter. Behind the eye, ganglion cells and neurons connect to the retinal sensory elements and to each other to provide processing of the image focused on the retina. Further image processing occurs in the visual cortex and the brain. Models of biological image processing have led to the development of computer architectures such as artificial neural networks and pyramid processors for computer vision. However, in this section, the discussion is limited to the structure of the sensor elements.

Helmholtz includes an anatomical description of the eye in his Treatise on Physiological Optics (Helmholtz, 1911, 1962). The higher orders of life have eyes capable of distinguishing not only light and darkness but also form, and these eyes take one of two forms. The first, common among insects, is a composite eye, in which sensory elements separated by opaque septa cover the surface of the eye. The elements at the surface of the eye are usually hexagonal and sometimes square in shape. The second form, as in the eyes of many vertebrates, has a lens that focuses light onto a retina. A section of a retina is shown in Fig. 3. The retina comprises rod and cone sensory elements. In the human eye there are approximately 100 million elements of the smaller rod type, and 5 million cones that are distributed among the rods in varying densities depending on the particular part of the retina (Wandell, 1995). In the so-called "yellow spot" only cones are found, whereas towards the periphery of the retina there are only rods. The rods primarily initiate vision at low light levels and the cones initiate vision at high light levels. Behind the surface layer of rods and cones are layers of fine fibers connecting these elements to a layer of ganglion cells. These cells perform many processes, one of which is to pass information to the optic nerve. Thus the retina is a complicated array of different types of sensory element and has a number of layers associated with detection and interconnection; possibly some image processing is also performed (Watson and Ahumada, 1989).

FIGURE 3. The human retina: R, rods; C, cones; G, ganglion cells.

From the anatomic drawings in the Helmholtz treatise, it can be observed that the roughly circular sensory elements tend to pack together efficiently, which leads to a closely packed hexagonal lattice. Ophthalmic experiments reported in Helmholtz's second volume support this. In one experiment, Helmholtz set up a grating of light and dark lines of equal thickness, viewed at various distances and under differing lighting conditions, to measure the spatial resolution of the eye. His results indicated that two bright lines could be distinguished only if an unstimulated retinal element existed between the elements on which the images of the lines fell. This is in accordance with Nyquist's sampling theorem (Nyquist, 1928). He also noted that for grid spacings close to the resolution limit of the eye, the lines appeared wave-like or modulated, with repeated thick and thin sections, as shown in Fig. 4. From this effect he inferred that the cone sensors, the only type of sensor in the high-resolution part of the retina, were packed in a hexagonal pattern.
FIGURE 4. The wave-like appearance of parallel lines when viewed close to the eye's resolution limit, and the hexagonal sensor pattern that produces this effect. (Source: Helmholtz's Treatise; Helmholtz, 1911.)
Images of sections through cone sensors in the yellow spot of a human retina (Curcio et al., 1987) show a roughly hexagonal shape for each cone, and sections through regions containing only rods show a hexagonal shape for each rod (Wandell, 1995). However, where cones exist in mixed regions, their shape becomes more circular (Wandell, 1995).

B. Hexagon-Shaped Sensor Elements

For any vision system, be it electronic, photochemical (Mitchell, 1993), biological, or other, there are certain design parameters that can be optimized to increase its usefulness for a particular purpose or in a particular environment. Scenes with low light levels can best be imaged using sensors with active areas that completely tile the image plane and that have long integration times or low shutter speeds. These techniques, together with the use of large-area sensors, will increase the brightness signal-to-noise ratio (SNR). However, a smaller number of larger sensors will result in lower spatial accuracy, and a longer integration time will result in motion artifacts or missed events.

The 2-D shape of the sensor elements and the geometry of the sampling grid will affect the efficiency with which the image is acquired. The sensor element shape, and any analogue signal processing by, for example, the lens, will bandlimit the signal in two dimensions before digitization. In the general case, oversampling a spatial-frequency bandlimited signal will not provide any increase in information, just more data.
C. Two-Dimensional Sampling Theory

Before a computer can process an image, the image must be sampled and the sampled quantity digitized. Real-world scenes can be considered as continuous 2-D brightness fields. These brightness fields can be transformed to the Fourier plane and their spatial-frequency components analyzed. The magnitudes of these transformed images can be considered as 2-D signals and plotted against vertical and horizontal spatial frequency (Gonzalez and Woods, 1992), and their spectra analyzed. The phase information can be analyzed in a similar way.

The image of the scene is focused onto a detector and then sampled. The sampling process can be considered as the multiplication of the continuous image by a 2-D grid of delta functions, which corresponds to a convolution of their spectra in the Fourier plane. For 1-D signals, the sampling theorem (Nyquist, 1928) states that if a signal is to be perfectly recovered from its sampled version, then there must be no frequency component in the pre-sampled signal that is greater than one-half the sampling frequency. A more recent theorem (Petersen and Middleton, 1962) allows consideration of multidimensional signals, and for a 2-D image can be stated as follows: A brightness function whose Fourier transform is zero outside a finite area of the Fourier plane can be everywhere reconstructed from its sampled values, provided that this finite area and its periodic extensions in the Fourier plane are nonoverlapping.

Any real-life continuous scene will contain spatial-frequency components throughout the spectrum, and the direct sampling of such a brightness field would result in frequency aliasing, in which frequencies above half the sampling frequency are folded about the half-sampling frequency and superimposed on the lower-frequency components. This aliasing corrupts the discrete image and makes it impossible to reconstruct the continuous image perfectly.
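The folding of frequencies about the half-sampling frequency can be sketched for the 1-D case as follows. The helper name is hypothetical and the numbers are arbitrary; the arithmetic is the usual folding rule for a real sinusoid sampled at rate `fs`.

```python
import math

def aliased_frequency(f, fs):
    """Apparent frequency of a real sinusoid of frequency f sampled at fs.

    Frequencies above fs / 2 are folded back into the range [0, fs / 2].
    """
    f = f % fs                     # the sampled spectrum repeats every fs
    return fs - f if f > fs / 2 else f

# A 70-unit component sampled at 100 units is indistinguishable
# from a 30-unit component: the two sets of samples coincide.
fs = 100.0
samples_70 = [math.cos(2 * math.pi * 70.0 * k / fs) for k in range(8)]
samples_30 = [math.cos(2 * math.pi * 30.0 * k / fs) for k in range(8)]
```

Once the samples coincide in this way, no processing of the discrete data can separate the two components, which is why the bandlimiting has to take place before sampling.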
It is important to bandlimit the 2-D signal before sampling so that, in the sampled signal, the magnitude of the spectrum tends towards zero before components from periodic extensions of the spectrum interfere with the signal and cause aliasing. Figure 5 shows an example of the spectrum of a discrete 2-D signal. The central hill at the origin is identical to the spectrum of the continuous signal, and the other hills are some of the closer periodic extensions of it. Here, the hills do not overlap, so there will be no aliasing. However, the gaps between the hills indicate inefficient sampling, in that the vertical and horizontal sampling frequencies could be reduced by a factor of approximately two before aliasing would occur. The spectrum shown in Fig. 5 results from an image sampled on a square grid. The periodically extended hills are located on a square grid in the Fourier plane, each centered on integer multiples of the horizontal and vertical sampling frequencies. Other sampling grids will lead to other extension patterns in the Fourier plane.

FIGURE 5. Example of the spectrum of a 2-D discrete signal.

Each conical hill has a circular cross section, and a signal with such a spectrum is known as circularly bandlimited (Mersereau, 1979). If the cross section is taken at the base of the hill, then all the signal information will be contained within this 2-D bandlimit region. The efficient packing of these all-inclusive band regions has been studied (Petersen and Middleton, 1962). Figure 6 shows the 2-D spectrum periodicity for an octagonally bandlimited signal on a skewed grid. The regions are quite separate, and the sampling efficiency could be increased by reducing the sampling frequency in the U and V directions. However, such octagonal regions will not pack together to tile the plane completely, as do some other shapes located at certain positions.

FIGURE 6. An octagonal band region, shown hashed, and some of its periodic extensions on a skewed grid.

Image sampling is usually achieved with a periodic sampling grid, and that grid is often square, but sometimes rectangular and occasionally hexagonal. The skewed sampling grid has been shown to be the general periodic grid (Petersen and Middleton, 1962), of which the foregoing are only special cases. We can now determine which is the most efficient grid. The minimum number of sampling points required to cover the image completely, so that no information is lost, must be found. This number will be a function of the grid geometry and the bandlimit of the image signal. If the bandlimit shape can be found and it completely tiles the Fourier plane, then we will have 100% efficiency. In theory there are many shapes that will completely tile a plane, including a square, a rectangle, a hexagon, an octagon with a small square extension in one corner, and a triangle that is alternately inverted. In practice, the bandlimit shape will be determined by the shape and characteristics of the sensor and by any optical preprocessing by, for example, the lens. Theorists often choose a circular bandlimit shape to work with because the spatial frequency is then limited equally in each direction throughout the image plane. This means that a feature presented to the imaging system and detected at one angle would be equally well detected if presented at any other angle. Early work (Petersen and Middleton, 1962) has shown that circularly bandlimited images can be most efficiently sampled on a regular hexagonal grid, as the bandlimit regions pack optimally in the Fourier plane. Such a packing is shown in Fig. 7a. Petersen and Middleton (1962) quote an efficiency of 90.8% for the regular hexagonal grid compared with a maximum efficiency of 78.5% for the square grid.
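The efficiency figures can be checked with a short calculation, given here as a sketch under one stated assumption: efficiency is taken to be the fraction of the periodic cell in the Fourier plane that is occupied by a circular band region just touching the cell boundary. The variable names are illustrative.

```python
import math

# Fraction of the periodic cell occupied by an inscribed circular
# band region of radius W.
# Square cell of side 2W: area 4 W^2.
square_efficiency = math.pi / 4.0
# Regular hexagonal cell with apothem W: area 2 sqrt(3) W^2.
hexagon_efficiency = math.pi / (2.0 * math.sqrt(3.0))

# The number of samples per unit area is proportional to the cell area,
# so the ratio of sampling densities (hexagonal / square) for the same
# circular bandlimit is the inverse ratio of the efficiencies.
density_ratio = square_efficiency / hexagon_efficiency   # sqrt(3) / 2
saving = 1.0 - density_ratio
```

Under this assumption the square-grid value is about 0.785 and the hexagonal value about 0.907, close to the figures quoted from Petersen and Middleton (1962), and the saving in sampling points is about 0.134, in line with the 13.4% attributed to Mersereau (1979) in this section.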
FIGURE 7. Tilings of the Fourier plane: (a) circular; (b) square; (c) regular hexagonal.
Mersereau (1979) has calculated that 13.4% fewer samples are required when a circularly bandlimited signal is sampled on a hexagonal grid than when it is sampled on a rectangular grid. He continued to investigate bandlimit shapes in the Fourier plane. If a 2-D continuous image is given by $f_c(x, y)$, where $x$ is the horizontal and $y$ is the vertical distance from the origin, then a discrete rectangularly sampled image can be described by

$$f_d(n_1, n_2) = f_c(n_1 t_1,\; n_2 t_2), \tag{1}$$

where $t_1$ and $t_2$ are the horizontal and vertical sampling intervals as shown in Fig. 1a, and $n_1$ and $n_2$ are integer indexes to the image array. If $F_c(\Omega_x, \Omega_y)$ is the continuous image $f_c(x, y)$ transformed to the Fourier plane, then the image is bandlimited within a shape $S$ if

$$F_c(\Omega_x, \Omega_y) = 0, \qquad (\Omega_x, \Omega_y) \notin S. \tag{2}$$

For the continuous image to be completely recoverable from a rectangularly sampled image, it must be bandlimited within the rectangular region defined by

$$\omega_1 < \pi/t_1, \qquad \omega_2 < \pi/t_2, \tag{3}$$

where $\omega_1$ is the horizontal and $\omega_2$ is the vertical bandwidth in radians m$^{-1}$. If square sampling has been employed, then $\omega_1 = \omega_2$ and the band region will be square, as shown by the crosshatched region in Fig. 7(b). For the discrete image, the Fourier plane will be tiled with periodic extensions of this base region, with each square centered on coordinates that are $2n$ multiples of $\omega_1$, where $n$ is an integer. With a square band region, it is interesting to note that the image will have a frequency response at $\pm 45^\circ$ that is $\sqrt{2}$ times that for the horizontal direction.

A hexagonal sampling theorem has been developed (Mersereau, 1979). A hexagonally sampled image can be described by

$$f_d(n_1, n_2) = f_c\big((n_1 - n_2/2)\,t_3,\; n_2 t_2\big), \tag{4}$$

where $t_2$ and $t_3$ are defined in Fig. 1b, $n_1$ is an integer index along the horizontal axis, and $n_2$ is an integer index along an oblique axis at $120^\circ$ to the horizontal. The vertical spacing of this grid ($t_2$) has been chosen to be the same as the vertical spacing of the previous rectangular grid. If the hexagonal grid is regular, then $t_3 = (2/\sqrt{3})\,t_2$. For the continuous image to be completely recoverable from a regular hexagonally sampled image, it must be bandlimited within the hexagonal region defined by

$$\omega_2 < \pi/t_2, \qquad \omega_3 < 4\pi/(3 t_3), \tag{5}$$

where, as shown in Fig. 7c, $\omega_3$ is the horizontal and $\omega_2$ is the vertical bandwidth in radians m$^{-1}$. Substituting for $t_3$ in Eq. (5) gives $\omega_3 < 2\pi/(\sqrt{3}\,t_2)$, and the maximum values of $\omega_2$ and $\omega_3$ are related by

$$\omega_3 = \frac{2}{\sqrt{3}}\,\omega_2.$$
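The index-to-position mappings of Eqs. (1) and (4) and the band limits of Eqs. (3) and (5) can be written out directly, as in the following minimal sketch. The function names are hypothetical, and the regular-grid relation $t_3 = (2/\sqrt{3})\,t_2$ is assumed throughout the hexagonal functions.

```python
import math

def rect_position(n1, n2, t1, t2):
    """Sample position for index (n1, n2) on a rectangular grid, Eq. (1)."""
    return (n1 * t1, n2 * t2)

def hex_position(n1, n2, t2):
    """Sample position on a regular hexagonal grid, Eq. (4)."""
    t3 = 2.0 / math.sqrt(3.0) * t2
    return ((n1 - n2 / 2.0) * t3, n2 * t2)

def rect_band_limits(t1, t2):
    """Upper bounds (w1, w2) on the bandwidths, from Eq. (3)."""
    return (math.pi / t1, math.pi / t2)

def hex_band_limits(t2):
    """Upper bounds (w2, w3) on the bandwidths, from Eq. (5)."""
    t3 = 2.0 / math.sqrt(3.0) * t2
    return (math.pi / t2, 4.0 * math.pi / (3.0 * t3))

w2, w3 = hex_band_limits(t2=1.0)

# The six nearest neighbors of the origin in the oblique index system.
neighbor_indices = [(1, 0), (-1, 0), (0, 1), (1, 1), (0, -1), (-1, -1)]
neighbor_distances = [math.hypot(*hex_position(n1, n2, t2=1.0))
                      for n1, n2 in neighbor_indices]
```

With these definitions the ratio `w3 / w2` evaluates to $2/\sqrt{3}$, and all six neighbor distances equal $t_3$, consistent with the regular grid of Fig. 1b.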
The horizontal extent of the band region is larger than the vertical extent. The hashed regions shown in Figs. 7b and c represent the largest band regions for images that can be sampled on square and hexagonal grids. In practice, the image may be bandlimited to any arbitrary shape, but if this fits within the appropriate hashed region, then the bandlimiting will be sufficient to enable the image to be perfectly reconstructed. Considering a circular bandlimited image, then as shown in Figs. 8a and b, the band region
n
n
FIGURE8. Utilization of available bandwidth by a circular bandlimited image sampled on various grids: (a) square; (b) regular hexagonal; (c) rectangular.
HEXAGONAL SAMPLING IN IMAGE PROCESSING
245
can be made to fit exactly within the square or hexagonal region by adjusting the common grid parameter t,. The circle more completely covers the hexagonal region than the square and there is less wasted bandwidth. The circularly bandlimited image of radius w 1 can be sampled by either grid, the maximum spatial frequency will be equal in any direction within the sampled image, and frequency aliasing will not occur. The vertical spacing of each grid is identical t,, but the horizontal spacing on the hexagonal grid is larger, resulting in a 13.4%' saving in sampling points and an advantage for the hexagonal grid. Fig. 8c shows a rectangular band region containing the circular region of radius wl. The rectangular case has been fully analyzed elsewhere (Mersereau, 1979), but graphical observation indicates poor utilization of the available bandwidth. On the other hand, if the image was square bandlimited, the square grid would have an advantage. The bandlimiting of the image must be investigated before an advantage for one grid can be identified. D. Noise and Quantization Error Noise from a number of sources can corrupt the image. Before sensing, lowand high-frequency lighting can modulate the image. Atmospheric distortion, rain, and vibration of the sensor can also add noise. Electronic noise can be additive or multiplicative, and introduced at the sensor or by the electronics. Quantization error will be introduced in both the spatial and brightness digitizations of the image. The average quantization error can be estimated (Kamgar-Parsi, 1989) and its effect on various image-processing operations evaluated. Quantization error can be estimated for hexagonal grids of sensors (Kamgar-Parsi, 1992). The average error and the distribution of a function on an arbitrary number of independently quantized variables can be estimated and used to compare the relative noise sensitivity of hexagonal and square sampling grids. 
It has been shown (Kamgar-Parsi, 1992) that, depending on the image-processing operation, the effect of quantization error on a hexagonal grid can range from 10% below to 5% above that on a square sampling grid. It is concluded here that there is little difference between the effects of the quantization error for the two systems.
E. Practical Aspects of Digital Image Acquisition

Digital image acquisition systems generally provide several serially organized functions including: (a) continuous (analogue) image forming; (b)
R. C. STAUNTON
antialias filtering; (c) spatial discretization; (d) analogue-to-digital conversion (ADC); and (e) signal processing. In addition, reconstruction, the forming of a continuous image at the output of a digital system, is also considered by some researchers (Burton et al., 1991) when estimating the quality of a system. Functions (a) and (b) can readily be considered together and are sometimes referred to as the image-gathering section (Burton et al., 1991). Functions (c) and (d) cover spatial and brightness discretization and are often referred to jointly as digitization. Function (e) refers here to processes such as amplification or impedance matching within the electronics of the system. The transfer functions of the image-gathering, sampling and reconstruction sections can be analyzed separately and then cascaded to determine the total effect on the reconstructed image as a part of the design. Sometimes it is possible to make a total system measurement (Staunton, 1998). By making this separation between the analogue and digital sections of the system we can consider that the image-gathering components bandlimit the analogue image before digitization (Staunton, 1996b). The shape of the bandlimit region can be determined, and, as discussed in Section II(C), must be known before the most efficient sampling grid can be chosen. Antialias filtering is important if the image is to be perfectly reconstructed. Such filters of various orders can readily be designed for time-varying voltage signals, and Fig. 9 shows the modulus of the gain against frequency for such a 1-D filter together with the sampling and folding frequencies. The slope in the cutoff region is determined by the order, and the design will typically require aliased components above the folding frequency to be reduced to less than the resolution ($1/2^n$) of the ADC, where n is the number of bits in the ADC output.
FIGURE 9. One-dimensional antialiasing filter. (Axes: |Gain| against frequency; the ADC resolution level $1/2^n$ is marked.)
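The requirement that aliased components fall below the ADC resolution above the folding frequency can be turned into a rough estimate of filter order. The following is a minimal sketch, assuming a Butterworth lowpass response (the text does not name a filter family); the function names and the example frequencies are illustrative only.

```python
import math


def butterworth_gain(f, f_cutoff, order):
    """Gain magnitude of an ideal Butterworth lowpass filter (assumed response)."""
    return 1.0 / math.sqrt(1.0 + (f / f_cutoff) ** (2 * order))


def min_order_for_adc(f_cutoff, f_fold, adc_bits):
    """Smallest order whose gain at the folding frequency is <= 1 / 2**adc_bits.

    The Butterworth gain falls monotonically with frequency, so meeting the
    target at the folding frequency also meets it everywhere above it.
    """
    if f_fold <= f_cutoff:
        raise ValueError("the folding frequency must lie above the cutoff")
    target = 2.0 ** (-adc_bits)
    order = 1
    while butterworth_gain(f_fold, f_cutoff, order) > target:
        order += 1
    return order


# Example: cutoff at half the folding frequency, 8-bit ADC.
print(min_order_for_adc(f_cutoff=1.0, f_fold=2.0, adc_bits=8))
```

Each additional ADC bit halves the permitted gain at the folding frequency, so under this assumption the required order grows roughly linearly with the number of bits for a fixed ratio of folding to cutoff frequency.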
For an imaging system the nature of the antialias filter will be determined by the physics of the imaging being undertaken. It will be 2-D and the magnitude of its gain can be plotted as a series of 2-D contours in the Fourier plane. Ideally, to avoid aliasing, the magnitude of the gain contour that indicates that the filter output is below the resolution of the ADC should coincide with the baseband spatial-frequency limit imposed by the sampling grid. The baseband region is shown crosshatched in Figs. 7b and 7c for square and hexagonal grids. In practice, a circular bandlimit region that lies within the ideal band region will be used to ensure equal resolution in each direction. Often the antialiasing filter cutoff frequency is determined only by the focusing and limitations of the lens and the receptive area of the sensor. An example of an optical design is given in Section II(F). The sensor array is discrete. It samples the image, but the finite receptive area of each sensor also smooths it. The sampling function can be considered as the convolution of the continuous image with a grid of Dirac delta functions. This can be expressed mathematically by Eqs. (1) and (4), or in a vector form (Ulichney, 1987; Burton et al., 1991) by

$$s(\mathbf{x}) = \sum_{\mathbf{n}} \delta(\mathbf{x} - \mathbf{V}\mathbf{n}), \qquad (7)$$
where $\mathbf{n}$ is a 2-D integer column vector, $\delta(\mathbf{x})$ is a delta function, and $\mathbf{V} = [\mathbf{v}_1 \; \mathbf{v}_2]$ is a 2-D sampling matrix whose columns $\mathbf{v}_1$ and $\mathbf{v}_2$ are linearly independent vectors.
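The vector form of Eq. (7) maps directly onto a short numerical sketch. The code below builds a sampling matrix for a square grid and for a regular hexagonal grid with the same row spacing, generates the sample positions x = Vn, and compares the number of samples per unit area. The function names, the unit row spacing, and the particular choice of v1 and v2 for the hexagonal grid are assumptions made for illustration.

```python
import numpy as np


def sampling_matrix(grid, t=1.0):
    """Return V = [v1 v2] for a 'square' or regular 'hexagonal' grid with row spacing t."""
    if grid == "square":
        v1, v2 = (t, 0.0), (0.0, t)
    elif grid == "hexagonal":
        h = 2.0 * t / np.sqrt(3.0)        # horizontal spacing of a regular hexagonal grid
        v1, v2 = (h, 0.0), (-h / 2.0, t)  # equal moduli, 120 degrees apart
    else:
        raise ValueError(f"unknown grid: {grid}")
    return np.column_stack([v1, v2])


def sample_points(V, n_max):
    """Positions x = V n for all integer vectors n with entries in [-n_max, n_max]."""
    idx = np.arange(-n_max, n_max + 1)
    n1, n2 = np.meshgrid(idx, idx, indexing="ij")
    n = np.stack([n1.ravel(), n2.ravel()])  # shape (2, N)
    return (V @ n).T                        # shape (N, 2)


def angle_between_deg(V):
    """Angle between the two sampling vectors, in degrees."""
    v1, v2 = V[:, 0], V[:, 1]
    c = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))


def samples_per_unit_area(V):
    """Sampling density: one sample per lattice cell of area |det V|."""
    return 1.0 / abs(np.linalg.det(V))


square = sampling_matrix("square")
hexagonal = sampling_matrix("hexagonal")
saving = 1.0 - samples_per_unit_area(hexagonal) / samples_per_unit_area(square)
print(angle_between_deg(square), angle_between_deg(hexagonal), round(100 * saving, 1))
```

With equal row spacing the hexagonal lattice cell is larger by a factor of 2/√3, so the sketch reports a saving of about 13.4% in sampling points, in agreement with the figure quoted in Section II(C).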
The angle between v1 and v2 sets the geometry of the sampling grid, that is, 90° for rectangular and 120° for regular hexagonal, and their moduli set the distance between samples. Figure 10 illustrates the geometry of the regular hexagonal grid. Images can be formed from scenes reflecting or emitting electromagnetic radiation from any or several parts of the spectrum. No use is made of the frequency information in monochromatic images, but for color images and other multidimensional images, brightness planes are stored for each of several frequency bands. Visible, infrared (IR), x-ray, radio and ultraviolet images are commonly captured and processed. Other image sources employ ultrasound, seismic waves, surface-point contact measurements and atomic-particle emissions. In each case a large sensing area can increase the sensitivity of the detector and improve the signal-to-noise ratio (SNR), but may reduce the maximum spatial frequency that can be captured. Focusing devices (lenses) can improve the situation. In many imaging cases the sensor
FIGURE 10. Regular hexagonal grid with sampling vectors.
transforms an energy signal into a voltage signal that can be further processed. The sensor designer may begin with the idea of completely tiling the image plane with sensors, as in the retina of the human eye, and then leading the electrical connections away from the rear. Hexagonal packing may be advantageous, as has been found with radar systems (Sharp, 1961), point contact measurement (Whitehouse and Phillips, 1985), and medical gamma cameras. Figure 11 shows a hexagonal-faced photomultiplier tube from such a camera. A completely tiled sensor array can be analyzed using a pixel model. Each sensor element can be considered to provide the brightness information for one pixel. This implies that the sensor is a perfect integrator over its entire surface, that there is no signal leakage between sensor elements, and that there is no radiation scattering within the array that can result in more than one element responding to a single photon. In practice, these three conditions are seldom true. With integrated sensor technologies such as charge coupled devices (CCD) and CMOS, it is not easy to make large numbers of electrical connections to the rear of the array, and circuits are often laid out alongside the sensor elements to effect data transfer. The active areas of the sensor can be kept large compared to the communications and power circuits, but the pixel model is effectively further compromised. A wafer scale image-processing system has been proposed with connections made to the rear of the wafers, but this did not extend to the hundreds of thousands of connections required for pixel-to-pixel transfers (Nudd et al., 1985).
FIGURE 11. Hexagonal-faced photomultiplier tube.
1. CCD TV Camera
CCD image arrays can be 1-D or 2-D, with 2-D image capture being achieved with a 1-D array by scanning the object past it. Images are often large, with 2-D arrays of 512 x 512 or 682 x 512 (4:3 aspect ratio) the most readily available. These sensors discretize the image spatially, but the brightness value remains an analogue value. There are various array architectures (Batchelor et al., 1985), but the interline transfer (ILT) device is the most popular. The image is focused onto the sensor area and during the acquisition phase the elements store an electric charge that is proportional to the intensity of the light falling on them multiplied by the exposure time. In the readout phase the electric charge is transferred to storage registers that run parallel to the columns of sensor elements. This arrangement is illustrated in Fig. 12. Once the charge is transferred, the sensor elements can begin to receive the next image and the storage registers can begin to communicate the current image to the camera electronics. The registers are analogue devices
FIGURE 12. Interline transfer CCD sensor. (Labels: sensing element; output shift register; video output.)
and rely on multiphase clocks to shift the charges and synchronize the process. The column storage registers shift the data a row at a time into an output register, which in turn shifts the data to produce the raster scan output stream to the camera electronics. The electronics should include a reconstruction filter to correctly reproduce the image (Oakley and Cunningham, 1990), as well as amplification and impedance matching circuits. The camera output is therefore a time-varying continuous voltage signal. Considering a frame of this signal as a 2-D image, then it is horizontally continuous and vertically discrete. There are many errors associated with CCD sensors (Schroder, 1980). In particular, there will be light reflection and charge leakage within the array, and frequency bandlimiting caused by the electronics. The area of the light-sensitive elements can be maximized with respect to the shifting elements, but complete tilings of the surface are not possible. The tile and grid shape can be chosen by the designer. Square and rectangular shapes predominate for both, but a small (8 x 8) hexagonal tiled, hexagonal grid sensor array has been fabricated (Hanzal et al., 1985). Large RAM devices are often fabricated with cells on a hexagonal grid to save space. The technology to fabricate hexagonal grids exists. The sensor array discretizes the image. At this stage the bandlimiting of the image can be analyzed so that the best shape can be chosen for the sensor element, and to ensure that there is no signal aliasing. The modulation transfer function (MTF) of the individual components, that is, the atmosphere, the lens and the CCD elements, can be estimated theoretically using simplistic models, and a composite figure is obtained. The MTF is
analogous to the modulus of the frequency response of a system for processing time-varying signals. If the distance between the object and the lens is not great, the MTF of the atmosphere can be neglected (Tzannes and Mooney, 1995). The ideal sensor element integrates the light intensity over its active area and can thus be considered a lowpass spatial filter. If the element is rectangular, then its horizontal 1-D MTF can be found by Fourier transforming the square-profile 1-D window of width $x_m$:

$$\mathrm{MTF}(f) = \left| \frac{\sin(\pi x_m f)}{\pi x_m f} \right|. \qquad (9)$$

The spatial cutoff frequency is given by

$$f_c = \frac{1}{x_m}. \qquad (10)$$

The model is simplistic and provides only a 1-D MTF. Techniques exist for measuring the MTF of individual sensor elements within an array (Sensiper et al., 1993). The lens is the final component to be analyzed. Its primary purpose is to focus the image, but in addition it acts as a lowpass spatial filter and can reduce aliased components. Both diffraction and aberration limiting occur within the lens (Ray, 1988). Diffraction limiting results in a high spatial-frequency cutoff

$$f_c = \frac{1}{\lambda N}, \qquad (11)$$

where $\lambda$ is the wavelength of the electromagnetic (EM) radiation and N is the f-number of a circular aperture. A smaller aperture thus results in a lower cutoff frequency. If the aperture is circular, then the resulting 2-D bandlimiting will be circular. A 1-D profile through a circular 2-D MTF can be calculated (Gaskill, 1978), with $f_c$ taken from Eq. (11):

$$\mathrm{MTF}(f) = \frac{2}{\pi}\left[\cos^{-1}\!\left(\frac{f}{f_c}\right) - \frac{f}{f_c}\sqrt{1 - \left(\frac{f}{f_c}\right)^{2}}\,\right], \quad f \le f_c. \qquad (12)$$

There are various aberrations that limit the frequency response of a lens. For monochromatic light these are spherical aberration, coma, astigmatism, curvature of field and distortion (Ray, 1988). The lens designer uses multiple elements to correct these aberrations, but the lenses that are often used in cost-effective TV systems still exhibit such defects. Aberration limiting
results in a cutoff frequency that is proportional to the f-number of the aperture, with a wide aperture resulting in a low cutoff frequency. The cutoff frequency can be calculated for thin lenses (Black and Linfoot, 1957), but calculations are complicated by the choice of definition for “in focus” and by the compounding of thin elements. A computer-aided design (CAD) system or practical measurements should be employed. Figure 13 shows MTFs for an ideal sensor element plotted using Eq. (9), a diffraction-limited lens (f/8, visible wavelength) plotted using Eq. (12), and the product of these two, which can be considered as the system MTF. The frequency axis has been normalized to the Nyquist frequency of the array. The frequency-limiting components of this system do not provide sufficient filtering to remove aliased components: the response is still greater than 0.4 at the Nyquist frequency. These simple theoretical techniques are limited in that they do not include aberration-limiting or 2-D information. Practical methods of measuring the 2-D MTF exist, and the results can be compared with the theoretical calculations.
FIGURE 13. MTFs for: (a) ideal sensor element; (b) diffraction-limited lens (f/8, visible wavelength); (c) system MTF (product of a and b). (Horizontal axis: normalized frequency, f/fnyq.)
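A calculation of the kind plotted in Fig. 13 can be sketched numerically. The code below assumes the standard textbook forms for the two components: a |sinc| response for an element that integrates uniformly over its width, and the diffraction-limited response of an aberration-free lens with a circular aperture. The element width (10 µm, with no gaps between elements) and the wavelength (550 nm) are illustrative assumptions, as are the function names.

```python
import numpy as np


def sensor_mtf(f, width):
    """|sinc| MTF of an ideal element that integrates uniformly over `width`."""
    return np.abs(np.sinc(np.asarray(f, dtype=float) * width))  # np.sinc(x) = sin(pi x)/(pi x)


def diffraction_mtf(f, wavelength, f_number):
    """1-D profile of the MTF of an aberration-free lens with a circular aperture."""
    f_cut = 1.0 / (wavelength * f_number)
    r = np.clip(np.asarray(f, dtype=float) / f_cut, 0.0, 1.0)  # zero response beyond cutoff
    return (2.0 / np.pi) * (np.arccos(r) - r * np.sqrt(1.0 - r * r))


pitch_mm = 0.010        # assumed: 10 um elements with no gaps between them
wavelength_mm = 550e-6  # assumed: green light
f_number = 8.0
f_nyq = 1.0 / (2.0 * pitch_mm)  # Nyquist frequency of the array, cycles/mm

f = np.linspace(0.0, 2.0 * f_nyq, 201)
system = sensor_mtf(f, pitch_mm) * diffraction_mtf(f, wavelength_mm, f_number)
at_nyquist = sensor_mtf(f_nyq, pitch_mm) * diffraction_mtf(f_nyq, wavelength_mm, f_number)
print(f"system MTF at Nyquist: {float(at_nyquist):.2f}")
```

Under these assumptions the product at the Nyquist frequency is a little above 0.4, which is consistent with the observation that the sensor element and the lens alone do not suppress aliased components.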
F. Measurement of 2-D Modulation Transfer Function and Bandlimit Shape

The MTF of continuous optical processing systems can be measured using traditional techniques (Ray, 1988), but these fail with digital acquisition systems due to signal aliasing. Various techniques have been researched to overcome the problems introduced by discrete sensor arrays. The simplest method is the knife-edge technique, which is suitable for use here to measure the bandlimit region and the filtering of aliased components. The method involves a shifting technique to produce a high-resolution profile across an image edge, and this renders the technique insensitive to geometric distortions. Geometric distortion information is important for applications such as image restoration, and for these, alternative techniques should be used (Zandhuis et al., 1997; Boudin et al., 1998). The measurement can be extended to include the frame-grabber digitizer as indicated in Fig. 14. The measurement now encompasses two discrete stages, the sensor array and the frame-grabber ADC. The array digitization is 2-D, but the ADC is operating on a partly discrete raster-scanned image and only digitizes in the horizontal direction. The bandlimit region measurements can be analyzed to show the contributions from each system component. The knife edge is provided by a long straight-edge object, the image of which is dark on one side and bright on the other. It is focused onto the sensor array and a TV frame is grabbed. The MTF is calculated from the stored image, which is a smoothed version of the input step. The technique works by aligning an edge slightly off vertical or horizontal. In this way, the straight edge cuts each element along the line of the edge so that it records a slightly different brightness value than its neighbor. Assuming the edge to be straight, edge profiles along the edge can be aligned and a single
FIGURE 14. TV camera-digitizer system. (Block labels: camera; CCD array; frame memory; 2-D MTF; stages marked as continuous or discrete.)
high-resolution profile known as the edge spread function (ESF) is assembled from them. As this is high resolution, this edge contains nonaliased information beyond the sampling frequencies of the array and the ADC. Early implementations of the technique (Reichenbach et al., 1991) were limited to MTF measurements in the vertical and horizontal directions and required several parallel edges. The use of spatial domain calculations (Tzannes and Mooney, 1995) enabled a single edge to be used, and the consideration of plane waves and interpolation has enabled 2-D measurements to be made (Staunton, 1996b; 1997a; 1998). It is important to set up the acquisition system to be as linear as possible for the technique to be effective. The automatic gain control of the camera must be disabled, and the gamma correction removed. The ESF can be differentiated and transformed to the Fourier plane to give the transfer function of the system, the modulus of which is the 1-D MTF for the particular orientation of the edge profile; MTFs can be obtained for several edge-profile orientations and combined to form a 2-D MTF (Staunton, 1997a; 1998). A comparison of measured 2-D MTFs has been made between six acquisition systems (Staunton, 1998). The systems were made from combinations of three cameras and two frame grabbers. The component specifications as obtained from the manufacturers' data sheets are as follows:

Camera A: 2/3 in. CCD array. Square element shape. Sample spacing: 10 µm horizontal and vertical. Resolution: 756 x 581 elements. Lens: Fixed focal length, 16 mm.
Camera B: 1/2 in. CCD array. Resolution: 752 x 582 elements. Lens: Fixed focal length, 16 mm.
Camera C: 1/3 in. CCD array. Resolution: 750 elements horizontal, vertical not stated. Lens: Fixed focal length, 16 mm.
Frame Grabber X: Frame store: 512 x 512. Aspect ratio: 1:1.
Frame Grabber Y: Frame store: 512 x 512. Aspect ratio: 4:3.

Figure 15a shows measured and simulated 1-D MTFs for one acquisition system.
The measured MTF cuts off at a lower frequency than the simulated one. This is to be expected as the simulation of the MTF of the lens did not include aberration limiting, and only ideal CCD array characteristics were used. The measured response at the Nyquist frequency is still 0.2 and significant aliasing will occur. Figure 15b shows a typical high-resolution ESF from which the MTF would have been calculated. Figure 16 shows 1-D MTFs obtained for edge profiles oriented in 15° steps from 0° to 90° to the horizontal. The cutoff frequency increases with the angle of the profile, reaching a maximum for a vertical profile. The reduced cutoff frequency in the horizontal direction could be caused by
FIGURE 15. Camera A, frame grabber X: (a) a typical MTF; (b) a typical ESF. (Reprinted from IEE Proc. Vision, Image and Signal Processing, 145(3): 229-235. Staunton, R. C. (1998). Edge operator error estimation incorporating measurements of CCD TV camera transfer function, with permission from the IEE Publishing Department.)
FIGURE 16. The 1-D MTFs obtained from edge normals at angles of 0° to 90° to the horizontal. Camera A, frame grabber X. (Reprinted from IEE Proc. Vision, Image and Signal Processing, 145(3): 229-235. Staunton, R. C. (1998). Edge operator error estimation incorporating measurements of CCD TV camera transfer function, with permission from the IEE Publishing Department.)
filtering in the camera electronics, or by an antialiasing filter in the frame grabber. These circuits only operate on the raster-scanned signal. Figure 17 shows a quadrant of a 2-D MTF where the results for edge profiles at angles other than those given in Fig. 16 have been found by interpolation. Figure 18 shows slices through the 2-D MTFs for each camera and frame-grabber system. The slices are located at the -3 dB modulation level and have been normalized to the vertical Nyquist frequency of the CCD array of camera A. The vertical cutoff frequency for each combination is between 0.37 and 0.69, whereas the horizontal cutoffs are between 0.27 and 0.48. The vertical cutoff is limited mainly by the lens and the CCD element area, whereas horizontally, the camera and frame-grabber electronics also provide limiting. The different horizontal and vertical charge-shifting registers in the
FIGURE 17. Quadrant of a 2-D MTF interpolated from the data in Fig. 16.
CCD array (Fig. 12) may also lead to differences in the horizontal and vertical responses. The system combinations (Camera A, grabber X; Camera A, grabber Y; Camera C, grabber X; Camera C, grabber Y) each show an increase in cutoff frequency with increasing edge-profile angle. The bandlimiting is not circular. The horizontal cutoff frequency of Camera A is probably being limited by a reconstruction filter in the output electronics of the camera, as the cutoffs are very similar for connections to grabber X and grabber Y. The horizontal cutoffs for Camera B and Camera C are nearly identical when connected to the same frame grabber. The differences here are dependent on the frame grabber, and could be caused by filtering in the input circuitry of the grabber. The traces for the systems including Camera B are nearly circular and thus there is an advantage in using a hexagonal sampling grid. The grid pattern can be realized by the digitization circuits of the grabber by adding a half-sampling period at the beginning of each line in alternate TV fields. If the square CCD element shape were adjusting the shape of the 2-D MTF, then a deviation would be expected in the trace at 45°. No such deviations were observed, indicating that other limitations were dominant. This has shown that square CCD elements do not necessarily require a square sampling grid for optimum performance.
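The half-period offset described above can be illustrated by generating the sample positions it would produce. This is a minimal sketch: the function name is hypothetical, the fields are assumed to be interlaced so that alternate fields supply alternate lines, and the sample period is assumed to be set to 2/√3 of the line spacing so that the resulting grid is regular.

```python
import numpy as np


def raster_sample_positions(n_lines, n_samples, line_spacing=1.0, hexagonal=True):
    """Return (x, y) arrays of sample positions, one row per TV line."""
    # Assumed: a regular hexagonal grid needs a sample period of 2/sqrt(3) line spacings.
    sample_period = 2.0 * line_spacing / np.sqrt(3.0) if hexagonal else line_spacing
    k = np.arange(n_samples)
    lines = np.arange(n_lines)
    # With interlace, odd and even lines belong to different fields, so delaying
    # the sampling clock by half a period in one field offsets every other line.
    offset = 0.5 * sample_period * (lines % 2) if hexagonal else np.zeros(n_lines)
    x = k[None, :] * sample_period + offset[:, None]
    y = np.repeat(lines[:, None] * line_spacing, n_samples, axis=1)
    return x, y


x, y = raster_sample_positions(n_lines=4, n_samples=5)
along_line = x[0, 1] - x[0, 0]
between_lines = np.hypot(x[1, 0] - x[0, 0], y[1, 0] - y[0, 0])
print(np.isclose(along_line, between_lines))
```

With these spacings each sample is equidistant from its neighbors along the line and on the adjacent lines, which is the defining property of the regular hexagonal grid.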
FIGURE 18. Polar plots of the -3 dB modulation points of the 2-D MTFs obtained from edge normals at angles of 0° to 90° to the horizontal: (a) Camera A, grabber Y; (b) Camera A, grabber X; (c) Camera B, grabber Y; (d) Camera B, grabber X; (e) Camera C, grabber Y; (f) Camera C, grabber X. Axes: normalized frequency (radial) and angle in degrees. (Reprinted from IEE Proc. Vision, Image and Signal Processing, 145(3): 229-235. Staunton, R. C. (1998). Edge operator error estimation incorporating measurements of CCD TV camera transfer function, with permission from the IEE Publishing Department.)
The systems containing Camera A or Camera C ideally require more antialias filtering in the vertical direction. This would also even up the horizontal and vertical responses. Such filtering is difficult to achieve physically without defocusing the lens or allowing vertical charge leakage between CCD elements. The images produced by the systems containing Camera B are nearly circularly bandlimited and can be sampled most efficiently on a hexagonal grid.
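The final step of the knife-edge calculation described in this section (differentiate the ESF, transform the result, and take the modulus) can be sketched as follows. The test edge is synthetic: a unit step blurred by a Gaussian, sampled at a quarter of the element pitch. The function name and all numerical values are assumptions for illustration, and the assembly of the high-resolution ESF from a grabbed frame is not shown.

```python
import math

import numpy as np


def mtf_from_esf(esf, dx):
    """Estimate a 1-D MTF from an edge spread function sampled every `dx`."""
    lsf = np.gradient(np.asarray(esf, dtype=float), dx)  # line spread function
    spectrum = np.abs(np.fft.rfft(lsf))
    freqs = np.fft.rfftfreq(lsf.size, d=dx)
    return freqs, spectrum / spectrum[0]                 # normalize to unity at f = 0


# Synthetic edge (an assumption): Gaussian blur with a standard deviation of
# one element pitch, sampled at a quarter of the pitch.
sigma, dx = 1.0, 0.25
x = (np.arange(512) - 256) * dx
esf = np.array([0.5 * (1.0 + math.erf(v / (sigma * math.sqrt(2.0)))) for v in x])

freqs, mtf = mtf_from_esf(esf, dx)
nyquist = 0.5  # cycles per element pitch
print(f"MTF at the array Nyquist frequency: {np.interp(nyquist, freqs, mtf):.3f}")
```

Because the ESF is sampled more finely than the array pitch, the frequency axis extends beyond the Nyquist frequency of the array, which is what allows the residual response in the aliasing region to be read off.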
III. PROCESSOR ARCHITECTURE

The objective of digitizing the image is usually so that it can be processed using a digital computer. This section considers the storage of image data and the spatial relationship between the data. In particular, the square and hexagonal sampling schemes are compared and the advantages and disadvantages of processing them with computers of various architectures are discussed. A detailed comparison of some specific image-processing algorithms is given in Sections IV and V. Parallel computer architecture is a large research area. The parallel processing of images is a smaller area, and the parallel processing of hexagonally sampled images is even smaller. Most machines can nevertheless process hexagonal images, although with varying degrees of efficiency. Surveys of parallel computer architectures include that of Fountain (1987), which provides an in-depth study of systems up to 1986. A special 1988 issue of the Proceedings of the IEEE on computer vision, edited by Li and Kender (1988), provides survey papers on architecture (Cantoni and Levialdi, 1988; Maresca et al., 1988). There has been a special section on computer architecture in the IEEE Transactions on Pattern Analysis and Machine Intelligence (Dyer, 1989). A more recent general survey that includes a new taxonomy of processors has been published (Ekmecic et al., 1996).

A. Single Instruction, Single Data Computers (SISD)
This is the conventional computer (Flynn, 1966). The program and data are stored in memory, and the memory is addressed in the correct order so that each particular datum is accessed and operated on as required. As reviewed in Section II(C), a circularly bandlimited image that has been hexagonally sampled will contain 13.4% less data for an equivalent information content than a square sampled image, but the addressing of the data will be less straightforward. Programs running on such computers need to have the image data or their subsets stored in indexable arrays because the value of an output pixel is often a function of several pixels in the input image. A 2-D square sampled image can be mapped one-to-one into an integer-indexed array where an increment of the index represents a step of one sampling distance in the image. The indexing of a hexagonal image stored in an array is less straightforward. A hexagonal pattern could be set up in a square array by filling only every other cell and shifting this pattern by one cell on alternate rows. Then the array would be twice as large, require double increments of the row address pointers, and the warp introduced would mean that either
the horizontal or vertical increments would no longer be equivalent to the sample spacing. If the hexagonal image stored in the array is addressed with 60° or 120° axes, then indexing is possible (Mersereau, 1979). An example of such indexing is shown in Fig. 19. For practical use within the computer program the pixel addresses can be mapped to complex numbers (Bell et al., 1989). Alternatively, for small local area calculations, the hexagonal data can be mapped directly into a square array and different convolution masks used depending on whether the central pixel of the area is on an odd or even scan line (Staunton and Storey, 1990). Figure 20 shows a 7-neighbor hexagonal local area where six neighbors are equidistant from a central element, and the position shifting of the neighbors that occurs as the central element is located on either an odd or even scan line within a square 3 x 3 array. With such a scheme two sets of convolution masks are needed for each image-processing operation, although each is applied to only one-half of the image. A simple 3-integer coordinate scheme has been researched (Her, 1995) and is known as the symmetrical hexagonal coordinate frame. Three integers point to each pixel. The frame overcomes difficulties experienced with other coordinate systems when designing some types of processing operator. Figure 21 illustrates the coordinate indexing, with the center of the image shifted to the origin at x = 0, y = 0, z = 0. With this scheme y points to the image scan line number, and x to the individual pixels along the scan line. The image is planar, and if the origin is located centrally,

$$x + y + z = 0. \qquad (13)$$

FIGURE 19. Hexagonal pixels and indexing scheme.
FIGURE 20. A 7-neighbor hexagonal local area: (a) the neighbors' positions within the image; (b) odd-row array positions; (c) even-row array positions.
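The odd and even scan-line arrangement of Fig. 20 can be sketched as two sets of neighbor offsets applied to a square array. The convention that odd rows are shifted half a sample to the right of even rows is an assumption (the opposite convention swaps the two offset lists), and the 7-point mean is used only as a simple example of a local operator.

```python
import numpy as np

# Offsets (d_row, d_col) of the six hexagonal neighbors in the square array.
# Assumed convention: odd rows sit half a sample to the right of even rows.
EVEN_ROW = [(0, -1), (0, 1), (-1, -1), (-1, 0), (1, -1), (1, 0)]
ODD_ROW = [(0, -1), (0, 1), (-1, 0), (-1, 1), (1, 0), (1, 1)]


def hex_neighbor_mean(image):
    """Mean over each interior pixel and its six hexagonal neighbors."""
    img = np.asarray(image, dtype=float)
    out = img.copy()  # border pixels are left unchanged
    rows, cols = img.shape
    for r in range(1, rows - 1):
        offsets = ODD_ROW if r % 2 else EVEN_ROW  # select the mask by row parity
        for c in range(1, cols - 1):
            total = img[r, c] + sum(img[r + dr, c + dc] for dr, dc in offsets)
            out[r, c] = total / 7.0
    return out


impulse = np.zeros((7, 7))
impulse[3, 3] = 7.0
print(hex_neighbor_mean(impulse))
```

Applied to a single bright pixel, the operator spreads the value over exactly seven array cells, and the pattern of those cells differs between odd and even rows as in Fig. 20b and 20c.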
The image plane cuts through a 3-D Cartesian space, and each indexed image point coincides with an integer-indexed point in the 3-D space, as illustrated in Fig. 22. This can be useful when processing operators, and especially geometric transformation matrices, are being designed. However, loading the image into a 3-D array would lead to poor memory utilization. Thus for good memory utilization, the image should be loaded into a 2-D array that is indexed by the oblique (60°) system. Now, allowing for the shift of the origin to the center of the image, x and y for the three-integer coordinate scheme are identical to the two coordinates in the oblique scheme as displayed in Fig. 19. In conclusion, a hexagonally sampled image can be efficiently stored within the memory of a SISD computer. When stored in program arrays the image can be efficiently indexed and data pointers easily calculated. In the
FIGURE 21. Three-integer index scheme. (The x = 0 and z = 0 axes are marked.)
FIGURE 22. Three-integer index scheme sampling points embedded within a 3-D Cartesian coordinate grid.
general case the 3-integer index scheme is the most efficient, but the oblique axis and the method involving the direct mapping of data into a square array, and the shifting of convolution masks may also be used.
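The relationship between the oblique indices and the three-integer scheme of Eq. (13) can be sketched in a few lines. The helper names are hypothetical, and the particular assignment of the six neighbor steps to image directions depends on how the axes are drawn, which is assumed here rather than taken from Her (1995).

```python
def oblique_to_three(x, y):
    """Three-integer coordinates of a pixel from its two oblique indices."""
    return x, y, -x - y


def three_to_oblique(x, y, z):
    """Drop the redundant third index after checking the plane constraint."""
    if x + y + z != 0:
        raise ValueError("not a point on the image plane x + y + z = 0")
    return x, y


def hex_distance(a, b):
    """Number of nearest-neighbor steps between two pixels."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]), abs(a[2] - b[2]))


# The six nearest neighbors of the origin: each step changes two of the
# three indices by +1 and -1, so the sum stays at zero.
NEIGHBORS = [(1, -1, 0), (1, 0, -1), (0, 1, -1), (-1, 1, 0), (-1, 0, 1), (0, -1, 1)]

origin = oblique_to_three(0, 0)
print([hex_distance(origin, n) for n in NEIGHBORS])
print(hex_distance(origin, oblique_to_three(2, -5)))
```

Only two integers need to be stored per pixel, as the text notes; the third is recomputed when the symmetry of the three-index form is useful, for example when measuring distances or designing rotations.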
B. Parallel Processors

This section surveys some of the computer architectures used for image processing that either have been designed to process hexagonally sampled images, or have hexagonal interconnections between processors but have been designed to process high-level information that is not locked to a sampling grid. Two-dimensional arrays of fine- and coarse-grain processors are discussed, as are pipeline, vector, pyramid, hypercube, and shared-memory devices. A more general review of parallel architectures for image processing has been published (Downton and Crookes, 1998). If the total image-processing task is considered, parallel processing can be applied in various ways: (a) the image can be divided into local areas, possibly overlapping, and a processor assigned to each area; (b) processes can be pipelined so that one processor completes the first task on the whole image and passes the resultant image on to the next processor for further processing, while waiting for the next image; (c) a pyramid of planes of 2-D arrays of processors can be constructed in which partly processed images are passed up to the next level for further processing, with a reduction in the number of processors and interconnections at each level; and (d) a particular task may be readily performed on a general-purpose array, hypercube, vector, or shared-memory processor.
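Option (a) can be illustrated with a small sketch in which the image is divided into bands of rows that overlap by one row, each band is processed independently, and only the non-overlapping part of each result is kept. The 3 x 3 mean on a square grid, the band layout, and the function names are assumptions chosen for brevity; the bands are processed in a loop here rather than on separate processors.

```python
import numpy as np


def local_mean_3x3(img):
    """3 x 3 mean on interior pixels; border pixels are copied unchanged."""
    img = np.asarray(img, dtype=float)
    out = img.copy()
    acc = np.zeros((img.shape[0] - 2, img.shape[1] - 2))
    for dr in range(3):
        for dc in range(3):
            acc += img[dr:dr + acc.shape[0], dc:dc + acc.shape[1]]
    out[1:-1, 1:-1] = acc / 9.0
    return out


def process_in_bands(img, n_bands, halo=1):
    """Process overlapping row bands independently and stitch the results."""
    img = np.asarray(img, dtype=float)
    rows = img.shape[0]
    edges = np.linspace(0, rows, n_bands + 1).astype(int)
    out = np.empty(img.shape, dtype=float)
    for start, stop in zip(edges[:-1], edges[1:]):
        lo, hi = max(start - halo, 0), min(stop + halo, rows)
        band = local_mean_3x3(img[lo:hi])  # would run on its own processor
        out[start:stop] = band[start - lo:start - lo + (stop - start)]
    return out


rng = np.random.default_rng(0)
image = rng.random((12, 9))
print(np.allclose(process_in_bands(image, 3), local_mean_3x3(image)))
```

The overlap must be at least as wide as the reach of the local operator; with that condition met the stitched result matches the result of processing the whole image at once.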
Computational or execution bandwidth can be defined as the number of instructions processed per second, and the storage bandwidth as the retrieval rate of data from memory (Flynn, 1966). Latency can be defined as the total time associated with a process from excitation to response for a particular datum. In practical terms, the latency is the number of processor clock cycles that elapse between the input of a datum and the output of the processed result.
C. Two- and Multi-dimensional Processor Arrays

An array of processor elements (PEs) is a group of elements that operates in parallel to process a set of data. Consider initially a simple image-intensity transform where each member of a data set $b_{i,j}$ is multiplied by a scaling constant K, so that each transformed pixel $a_{i,j} = K b_{i,j}$. An array of PEs of size i x j could transform the entire array in one clock operation period. However, in many image-processing problems, $a_{i,j}$ would be a function of pixels within a local or global area. To facilitate these operations, interconnection is provided between PEs. The topology of the interconnections determines the dimensionality of the PE array. Figure 23 shows two examples of 2-D interconnection topology. The 8-way interconnection of PEs has also been realized. Three-dimensional interconnection involves the vertical stacking of such 2-D planes, or the formation of a torus (Li and Maresca, 1989). Multidimensional interconnection topologies are also realizable. The PEs in an array can be single instruction, multiple data (SIMD), or multiple instruction, multiple data (MIMD); MIMD implies a high level of processor autonomy,
FIGURE 23. Two-dimensional processor array interconnection topology: (a) rectangular; (b) hexagonal.
but some autonomy is possible for PEs within the SIMD definition (Maresca et al., 1988).

1. Fine-Grain Arrays
Fine-grain arrays are more likely to be used for low-level image processing, where there is an advantage in associating the array structure with the original data structure and applying connections between elements in local areas. Hexagonal interconnections, where each processor connects to its six nearest neighbors, have been realized, and some of these machines are listed in the following. Fine-grain array machines are especially useful for morphological image processing. Simple SIMD PEs are often used because arrays are large, many PEs can be integrated onto each VLSI device, and the single instruction processing reduces communication overhead. Figure 24 shows a fine-grain array together with some of the other units required to realize a system. The control unit broadcasts image-processing instructions to the PEs in the array, and receives back busy and finished signals from each. Many array processors are used with a raster-scanned input device such as a TV camera. The TV picture is captured by a frame grabber and then reformatted so that efficient array loading can be achieved. Loading such images requires the hardware overhead of the reformatting logic and a loading time overhead. Loading is often achieved by transferring a complete column of data from the reformatting logic to the array and rippling this and subsequent columns across the array until all columns are filled. In the Clip4 array (Fountain, 1987) the data and control paths are separated so that image loading can be performed concurrently with image processing. This requires an additional hardware overhead.

FIGURE 24. A fine-grain array system.

Some examples of fine-grain arrays that incorporate hexagonal interconnections are as follows.

Clip4. The Clip4 system (Fountain, 1987) was a fine-grain SIMD processor that embodied the features of Fig. 24. It was developed from Clip2 and Clip3, which also allowed hexagonal interconnections between PEs. The Clip4 chip, which is used to assemble the array, was designed in 1974, and was limited by the fabrication technology available. Limitations included the number of transistors per device (5000), the packaging (40 pin), and the clock speed (5 MHz). The resulting device contained eight PEs and has been used to build arrays of from 32 x 32 to 128 x 128 elements. The arrays can be connected in square or hexagonal 2-D meshes. The processor data width was 1 bit, and as an example of the processing speed, an 8-bit addition could be performed in 80 µs. The processor could perform Boolean operations; a 32-bit RAM was provided in each PE, and input gating was used on the near-neighbor input connections to facilitate efficient morphological operator implementation. Individual PEs could be switched off by certain processes, and a global propagation function allowed data to be passed through the array 50 times more efficiently than if propagation was limited to near-neighbor-only communication. Clip4 arrays have been used for many applications (Duff, 1985). A process involving the measurement of the rate of growth of biological cell cultures was possible for a large number of samples, as computation could be performed in less time than it took physically to change the sample (Fountain, 1987).

Illiac III. The Illinois pattern recognition computer, Illiac III (McCormick, 1963), allowed 4-, 6-, and 8-way interconnections between PEs. It was used for analyzing bubble chamber traces.

The PSC Circuit. This was a programmable systolic processor that had three 8-bit input channels and three 8-bit output channels (Fisher et al., 1983).
Each PE contained an arithmetic and logic unit (ALU), multiplier, microcode store and sequencer, and RAM. It could be connected two, four, and six way into arrays.

Silicon Retina with Correlation-Based, Velocity-Tuned Pixels. This is a hexagonal architecture implemented on a CMOS chip (Delbruck, 1993). Visual motion computation is implemented using an analogue space-time algorithm.

Analog Neural Network. This has been used for image processing (Kobayashi et al., 1995). Various interconnection topologies were researched, including hexagonal.
Kydon (Bourbakis and Mertoguno, 1996). This is a multilayer image-understanding system. The processors in the lower-level arrays are connected in a hexagonal mesh.

2. Coarse-Grain Arrays

With coarse-grain arrays, one PE will be associated with many data, or large local areas of pixels. In some systems memory may be shared among PEs. The PEs are likely to be sophisticated microcomputers, and considerable processing and communication autonomy will be devolved to them. The array is likely to be a MIMD processor. Communication overhead limits the number of processors that can be inserted in an array to obtain faster processing. For some processes, such as low-level image processes, it may be advantageous to divide the image space and assign one PE to each local area. For higher-level processes, the computer programmer may perceive an advantage in redistributing the processing in a different way across the array. This is easier with a shared-memory system. Each of the relatively sophisticated communication channels supported by the PEs requires significant chip area for its implementation. As a result, early devices were limited to 4-way interconnection, as, for example, in the transputer (Inmos Ltd., 1989). With the development of hypercubes, etc., connectivity has increased again. Six-way interconnectivity has been reported for a system referred to as HARTS (Dolter et al., 1991). The optimum interconnectivity of these arrays is not primarily a function of the sampling grid of an original image. Arrays of PEs are used to speed up processing, but as the number of PEs added increases, the communications bandwidth limits the increase in speed. The interconnection topology can be chosen to optimize processing speed.
D. Pyramid Processors
A typical pyramid architecture is shown in Fig. 25. At the base of the pyramid, level 1 is the input image. This is connected upwards to level 2 so that four level-1 pixels connect to one level-2 pixel. This is known as a quadrature pyramid. Binary, hexagonal, 16-way, and other connection systems have also been realized. In its simplest form the structure may be a pyramid of memory elements so that reduced resolution images are stored at each level, as with the pipelined parallel machine (Burt et al., 1986). Pyramids of PEs are also realizable with architectural variations in the types of PE at each level and in the autonomy of control. Processor elements communicate between neighbors within their level, and also pass data upwards to their associated PEs at the higher level. Some processes require that data pass
FIGURE 25. A typical pyramid architecture.
both up and down the pyramid (Watson and Ahumada, 1989). In addition, PEs can work autonomously within the pyramid, or control can be passed down layer to layer from the apex. In one type of pyramid, the PEs are of the same type in each level and the arrays can be coarse grained (Handler, 1984), or more usually fine grained (Tanimoto et al., 1987). In another type, different PEs will be incorporated at the different levels (Nudd et al., 1989), where level 1 is populated by a 256 x 256 array of SIMD PEs, level 2 by a 16 x 16 MIMD array, level 3 by an 8 x 8 transputer array, and level 4 by a host Sun workstation. Pyramid processors are efficient architectures for image-understanding systems. The input is any general image at level 1 and the output would be a description of the scene in the form of, for example, a list of objects at the highest level.
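The four-to-one connection between levels described at the start of this section can be illustrated with a short Python sketch. It assumes a square image whose side is a power of two and uses a plain 2 x 2 average as the reduction rule; the averaging rule and the names reduce_level and build_pyramid are illustrative assumptions, not details of any machine cited here.

```python
def reduce_level(image):
    """Combine each non-overlapping 2 x 2 block into one pixel of the next level.

    Assumption: the reduction is a plain average of the four pixels.
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1]
             + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def build_pyramid(image):
    """Return [level 1, level 2, ...] ending with a single-pixel apex.

    Assumption: the image is square and its side is a power of two.
    """
    levels = [image]
    while len(levels[-1]) > 1:
        levels.append(reduce_level(levels[-1]))
    return levels


base = [
    [0, 2, 4, 6],
    [2, 4, 6, 8],
    [8, 8, 0, 0],
    [8, 8, 0, 0],
]
pyramid = build_pyramid(base)  # 4 x 4 base, 2 x 2 middle level, 1 x 1 apex
```

Each level holds one quarter of the pixels of the level below it, which is the memory-element form of the pyramid; a pyramid of PEs would attach processing to each node instead of a stored value.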
1. Hexagonal Pyramids
Hartman and Tanimoto (1984) investigated a hexagonal pyramid data structure for image processing. Level 1 was tiled with hexagons, but each hexagon was subdivided into six equilateral triangular pixels. Four triangles were then combined to give a single equilateral triangle at the next level, as shown in Fig. 26, where level-1 triangles [a, b, c, d] combine to give a level-2 triangle [A]. PEs could also be arranged in such a scheme, but the basic triangular pixel scheme is difficult to sample directly with a raster-scanned device as the image line spacings would be uneven. Resampling hexagonally sampled data to the triangular scheme could be achieved relatively easily. A hexagonal pyramid structure that models the processing structure of the human visual cortex has been researched (Watson and Ahumada, 1989). Anatomically, behind the hexagonally packed retinal sensors is a layer of retinal ganglion cells, which, in the center of the retina, connect one-to-one with the sensors (Perry and Cowey, 1985). The ganglion cells can also be considered to be connected on a hexagonal grid. The 2 × 10⁶ ganglion cells connect to the visual cortex, which contains approximately 10⁹ neurons. Physiological experiments have shown that between the retina and the visual brain the image undergoes a sequence of transformations, and sets of cells in the cortex can be identified with these various transforms. The research considers a transform performed by the ganglion cells and a subsequent one performed within the cortex. The ganglion cells transfer spatial and brightness information. Their transfer function is broadband and
FIGURE 26. Hartman and Tanimoto's pyramid structure.
they provide local adaptive gain control. The transform within the cortex is different. The cells are narrowband and employ a so-called hybrid space-frequency code to convey the position, spatial variation and orientation of a region. The process in this group of cells has been modeled by a hexagonal orthogonal-oriented quadrature pyramid. The image transform performed in the cortex can be considered as image coding and the aim of the research was to model the transform with a pyramid constructed from elements that were themselves modeled on known physiological components. The pyramid had a hexagonal lattice input layer, the transform was invertible, and the overall process was found to be efficient. The input image is passed on by the retinal ganglion cells to the lowest level of the pyramid. This level can be considered to be tiled with hexagonally shaped pixels. The transformation to the next highest level in the pyramid involves taking a group of seven of these pixels (the shaded area in Fig. 27) and producing one output pixel that contains a vector of values from a set of seven kernels, one of which produces the average brightness value of the local area, and the other six of which are bandpass and localized in space, spatial frequency, orientation, and phase. Each low-level pixel only
FIGURE 27. A 7-element local area that produces one value in a reduced resolution image.
FIGURE 28. The hexagonal pyramid structure. (Generated using the program listed in the Appendix of Watson and Ahumada’s paper (1989).)
contributes to one next-level pixel, so the next level contains only one-seventh the number of pixels, and so on, until the apex of the pyramid is reached. The resulting hexagonal pyramid structure is shown in Fig. 28. In this figure, the input image lattice is represented by the vertices and centers of the smallest hexagons and the highest level, which is also the lowest resolution image, is represented by the largest, thickest line hexagon. At the highest level there may only be one pixel, but the vector associated with it encodes all the image information and can be decoded back down the pyramid to reconstruct the original image. The model produces results that agree reasonably closely with physiological measurements, but some modifications, such as using larger kernels, are needed to produce a better match.
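The seven-to-one reduction between levels can be made concrete with a small Python sketch. It only counts pixels per level and evaluates the one kernel that the text describes explicitly, the local average; the six bandpass kernels are not reproduced. The function names and the requirement that the base size be a power of seven are assumptions made for the illustration.

```python
def level_sizes(base_pixels):
    """Number of pixels at each level when seven pixels feed one parent.

    Assumption: base_pixels is a power of seven, so every level tiles exactly.
    """
    sizes = [base_pixels]
    while sizes[-1] > 1:
        if sizes[-1] % 7 != 0:
            raise ValueError("level size is not divisible by seven")
        sizes.append(sizes[-1] // 7)
    return sizes


def average_kernel(group):
    """Average brightness of a seven-pixel local area (center plus six neighbors)."""
    if len(group) != 7:
        raise ValueError("a hexagonal local area has seven pixels")
    return sum(group) / 7


sizes = level_sizes(7 ** 3)                         # three reductions to reach the apex
mean = average_kernel([10, 12, 14, 16, 18, 20, 22])
```

Because each parent pixel carries a vector of seven kernel outputs for its seven children, the number of stored values does not shrink at a reduction step, which is consistent with the transform being invertible.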
E. Pipelined Processors

With pipelined processors there is a single stream of data from the memory or input device, and the stream passes serially through several PEs, each of which performs a different operation on the data before the data are finally sent to their destination. This is shown diagrammatically in Fig. 29. At a particular time PE3 is operating on data (0), PE2 is operating on data (1), and PE1 is operating on data (2). Each PE is operating on a different data set and computing a different process. The pipeline can be termed a parallel processor. Programming flexibility can be compromised by such an arrangement. Events are separated in time, as with the SISD processor, but the sequence of instructions performed in a single pipe does not allow branching or easy rescheduling of instruction order. Efficient computation is achieved by applications where the same set of instructions must be applied to large sets of data. With images captured under controlled lighting conditions, local image-processing operations can be sufficient. Local image-processing operations do not require a knowledge of the complete frame of an image, but only of a group of adjacent pixels. If these operations are performed on a pipeline processor there is no need to store a complete image. Such a processor may operate directly on the serial data stream from the digitized output of a raster-scanned device. The PEs must operate in real time, but as the operations are very simple only a few lines of the image must be stored in line-length digital-shift registers within each PE. Figure 30 shows a pipeline processing element that stores two lines of the video image. In this example, three bytes of each of the two line-storage registers, together with three bytes from the previous line, form a 3 x 3 local image area on which processing is performed.
Real-time processing operations are performed on this array of elements and a new value for the center pixel is calculated and output to form the output video stream. Storing more lines of an image enables larger local areas to be used. For example, a 5 x 5 pixel
FIGURE 29. A pipeline processor.

FIGURE 30. A pipeline processor element.
area could be used by storing four video lines. Processes of increased complexity can be achieved by cascading a number of PEs in a string. The pipelined processor operates in real time, although an increasing latency is introduced by the successive line delays. Many pipelined image-processing machines containing PE architectures based on that of Fig. 30 have been reported in the literature. Few are limited to this simple PE design alone; most also have general-purpose ALU and lookup table elements. If the “Warp” system (Annaratone et al., 1986; Crisman and Webb, 1991) is considered as a pipeline, then each PE can also contain local memory to aid multipass algorithm calculation. However, this feature causes Warp to be classified as a 1-D MIMD array. Some pipelined systems are listed in the following, together with notes on those designed to process hexagonally sampled images.

1. Pipelined Systems

Early pipelined image processors have been reviewed (Preston et al., 1979). The basic PE architecture of Fig. 30 is evident in these systems, but of those early pipelined image processors, only the Cytocomputer was capable of
real-time processing. Some more recent systems are reviewed in what follows. Recirculating pipeline systems are characterized by having only short pipelines of PEs, and by individual PEs in the line being of a different hardware construction. Frame stores are used to enable data to be recirculated through the pipeline as shown in Fig. 31, and many video and system buses are employed for data communication and system control. These systems are capable of performing a wide variety of complicated image processes, some of which can be classified as being in the mid-level vision range. For example, the convex hull process has been realized (Bowman, 1988). First, data are scanned horizontally out of one frame store, processed, and stored in the second frame store. The data are then scanned vertically out of the second frame store, processed, and stored back in the original frame store.

Cytocomputer (Lougheed and McCubbrey, 1980). The PE design conforms closely to Fig. 30, but with a programmable operator function. The hardware of the PEs in the pipeline is identical. The operator processor is limited to morphological and logical operations on a 3 x 3 local area. Scan line lengths up to 2048 pixels can be accommodated, and PEs are constructed on individual circuit boards from large-scale integration (LSI) and very large-scale integration (VLSI) components.

Cyto-HSS (Lougheed, 1985). This is a recirculating system based on a pipeline of Cytocomputer PEs.
[FIGURE 31. Caption not recovered. Labels legible in the diagram: frame store, PE sum, PE convolve, PE lookup table, frame store.]
PIPE (Luck, 1986; 1987). Each PE contains three lookup table operators, image-combining units, a 3 x 3 arithmetic or Boolean local operator, and output crossbar switching logic. Images with resolutions from 256 x 256 to 1024 x 1024 can be processed. The crossbar switching enables data to progress normally along the pipeline, to be switched in reverse direction along the pipe, or for the PEs to operate independently. Morphological and filtering operations are possible.

University of California machine (Ruetz and Brodersen, 1986). This system provides a custom chip set for the designer. Each PE function is realized by a different VLSI chip. Advantages include real-time operation and potential cost reduction through the use of VLSI, but the nonprogrammability of PEs has led to a dynamic inflexibility and a requirement to design a different chip for each PE function.

University of Strathclyde (McCafferty et al., 1987). The system uses LSI components, operates in real time and employs sophisticated image-processing algorithms for edge detection.

University of Belfast (McIlroy et al., 1984). This system contains a real-time PE incorporating LSI logic devices that perform the Roberts edge-detection algorithm.

TITAN (Lenoir et al., 1989). In this design the PE has been implemented on a gate array. It is capable of several binary and gray-level morphological operations. The local operator size is 4 x 3 pixels.

Elor Optronics Ltd. (Goldstein and Nagler, 1987). This is a pipeline processor system for detecting surface defects in metal parts. Each PE is a single-board SIMD computer.

Kiwivision (Bowman and Batchelor, 1987). There are three PEs in a recirculating pipeline. Each PE performs a different set of operations. The first PE is a 16-bit ALU, the second, a general-purpose local filter, operating on a 3 x 3 local area, and the third, a lookup table processor. In Kiwivision II (Valkenburg and Bowman, 1988), a pipeline of Datacube PEs feeds an Inmos transputer array.

Datacube (Datacube Inc., 1989). A series of single-board PEs has been produced that can be configured as a recirculating pipeline.

PREP (Wehner, 1989). Here, several parallel recirculating pipelines are used to speed processing by operating on distinct areas of the image.

IDSP (Minami et al., 1991). This is a four-pipeline system implemented on a single VLSI chip. Additions and subtractions are allowed between data in each pipe. Applications: Video codec.
Cheng Kung University (Sheu et al., 1992). This system uses a pipeline architecture to perform gray-scale morphological operations. It is suitable for VLSI implementation.

Pipeline Processor Farm (Downton et al., 1994) System. This contains several pipes. Applications include general image processing and coding.

Chung Cheng Institute (Lin and Hseih, 1994) Modular System. This contains three pipelines. It works in real time on 512 x 512 images. It is suitable for VLSI implementation. Applications: Template matching.

New Jersey Institute of Technology (Shih et al., 1995). This pipeline architecture has been implemented as a systolic system. Applications: Recursive morphological operations.

Jaguar (Kovac and Ranganathan, 1995). This is a fully pipelined single-chip VLSI device used for color JPEG compression of images of up to 1024 x 1024 pixels.

Texas Instruments Pipelines (Olson, 1996). These have been discussed, with particular emphasis on how to program them.

Yonsei University (Lee et al., 1997) Real-time System. This is used for HDTV applications and can perform edge detection. Another paper (Lee et al., 1998) includes a discussion on de-interlacing and color processing.

F. Hexagonal Image-Processing Pipelines
GLOPR (Golay, 1969) was a pipeline for processing hexagonally sampled images. It operated on a seven-element local area that was passed to it from a host computer. It contained delay lines and could process images up to 128 x 128 in size at 3 µs per pixel operation (Preston et al., 1979). It was used extensively for processing medical images, and was produced commercially as the Perkin-Elmer Diff3. It could also perform many image-processing tasks including the basic morphological operations (Preston, 1971). A University of Warwick pipeline system that can process hexagonal or square sampled images has been designed (Storey and Staunton, 1989, 1990; Staunton and Storey, 1990). The specification required operation at the video rate, construction from reconfigurable hardware PEs, and a VLSI implementation. A lack of resources has allowed only a simulation of the pipeline to be completed. The PE was designed at a functional level that could be configured to provide one of a number of image-processing operations. The initial device
operated on a 3 x 3 local area. The processed images could be viewed directly on a TV monitor or transferred to a computer for high-level image processing. The PE has been designed to operate on sampled images up to 512 x 512 in size. Figure 32 shows a simple pipeline comprising an analogue-to-digital converter, two PEs, a digital-to-analogue converter and a control unit; PE (1) is performing edge detection and PE (2) binary line thinning. A novel feature of the PE design was that an image-processing operation such as convolution, edge detection, median filtering, gray-level morphological, or a binary operation could be completely performed with a single PE in one pass of the image data. Figure 33 shows the PE input and output signals. The clock is at the pixel rate. There are two 8-bit image data input channels to each PE. In the figure, one channel is connected to the output of the previous PE in the pipeline, and the other to a second source, which could possibly be the output from a second pipeline. The two channels are combined arithmetically inside the PE. Within the PE the image datum is clocked at the pixel rate through the various processing stages. A pair of unprocessed data is clocked in, and a processed datum clocked out from the PE with every clock pulse. The bandwidth of the PE is equal to the video rate, but the pipelining of the processes within the PE introduces a latency equal to an integer number of pixel clock periods. The PE hardware is based on that shown in Fig. 30. A block diagram of the basic PE is shown in Fig. 34. The video image enters the PE as a raster-scanned 1-D stream and the 3 x 3 local image area is assembled by employing two TV line length, 8-bit wide, digital-shift registers. The image adder allows two separate images to be combined at the PE input.

FIGURE 32. A simple pipeline processor consisting of two PEs.

[FIGURE 33. Caption not recovered. Labels legible in the diagram: previous PE, PE, next PE; 8-bit image data; 8-bit second image input channel; 1-bit TV sync; control data.]
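The way two line-length shift registers assemble a 3 x 3 local area from a raster-scanned stream can be sketched in Python as follows. This is a software illustration only, not the Warwick PE design: the function name stream_3x3, the zero-filled initial register contents, and the use of collections.deque to stand in for hardware shift registers are all assumptions.

```python
from collections import deque


def stream_3x3(pixels, width, operator):
    """Pass a raster-scanned pixel stream through two line delays.

    For every input pixel, a 3 x 3 window is formed from the current line
    and the two delayed lines, and operator(window) is emitted. The window
    is centered one line and one pixel behind the current input, which is
    the latency of the element. Registers start filled with zeros, and
    windows that straddle a line end mix pixels from adjacent lines.
    """
    line1 = deque([0] * width, maxlen=width)   # delays the stream by one line
    line2 = deque([0] * width, maxlen=width)   # delays it by a second line
    taps = [deque([0] * 3, maxlen=3) for _ in range(3)]
    out = []
    for p in pixels:
        one_line_ago = line1[0]                # oldest entry: same column, previous line
        two_lines_ago = line2[0]
        line2.append(one_line_ago)             # oldest entries fall off the left
        line1.append(p)
        taps[0].append(two_lines_ago)          # top row of the window
        taps[1].append(one_line_ago)           # middle row
        taps[2].append(p)                      # bottom row (current line)
        out.append(operator([list(row) for row in taps]))
    return out


image = list(range(1, 17))                     # a 4 x 4 image in raster order
centres = stream_3x3(image, 4, lambda w: w[1][1])
sums = stream_3x3(image, 4, lambda w: sum(map(sum, w)))
```

With the identity operator the output is the input delayed by one line plus one pixel, which makes the latency visible. Storing four lines instead of two would give a 5 x 5 window in the same way.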
To enable hexagonally sampled images to be processed, the horizontal sampling spacing was increased by a factor of 2/√3, and the first sampling point on alternate lines delayed by half-a-point spacing. By definition only even-numbered scan lines are delayed and the first image line is numbered one. The number of points per scan line is reduced by the 2/√3 factor, giving typical image sizes of 721 x 625 or 443 x 512 pixels in comparison with the equal resolution square-sampled image sizes of 833 x 625 and 512 x 512. With hexagonal sampling the data rate is reduced by the 2/√3 factor from 13.0 MBytes s⁻¹ to 11.3 MBytes s⁻¹. For use in recirculating pipelines the only system modifications are to the initial image frame-grabbing module. Data can be processed by the pipeline at the designed 13.0 MBytes s⁻¹ rate and thus stored hexagonally sampled images can be processed in 13.4% less time than equivalent square images. Changes to the PE architecture were minimal so as not to affect hexagonal processing, and extra taps were added to the line delays to reflect the reduced number of pixels per line. The operator processor was also modified. For square-sampled data, some operations performed by the processor require the convolution of the nine image pixels comprising the local area with one or more 3 x 3 arrays of constants stored within the processor. The equivalent for hexagonally sampled data requires convolution with a seven-element array. With the foregoing system modifications for hexagonal data, the position of the central pixel with respect to the six neighbors changes within the grid of nine input pixels on alternate scan lines. This is illustrated in Fig. 20.
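The figures quoted above all follow from the single factor 2/√3. The short Python check below reproduces them; the variable and function names are illustrative, and rounding to the nearest whole pixel is an assumption about how the line lengths were obtained.

```python
import math

FACTOR = 2 / math.sqrt(3)          # increase in horizontal sample spacing, about 1.1547


def hex_points_per_line(square_points):
    """Points per scan line after the spacing is widened by 2/sqrt(3)."""
    return round(square_points / FACTOR)


line_833 = hex_points_per_line(833)             # square image 833 x 625
line_512 = hex_points_per_line(512)             # square image 512 x 512
hex_rate = round(13.0 / FACTOR, 1)              # MBytes per second
saving_percent = round((1 - 1 / FACTOR) * 100, 1)
```

The same 13.4% appears three times in the text, as the reduction in points per line, in data rate, and in recirculating-pipeline processing time, because all three scale with the number of samples per line.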
FIGURE 34. A block diagram of the pipeline processor element.
It was necessary to store an extra set of convolution coefficient arrays within the PE's operator processor and to toggle between sets on alternate lines. This required the line-synchronization signals to be detected and the extra control signal to be processed by the control unit. The convolution coefficient magnitude range was identical to the square range, as was the scaling capability provided. In practice the amount of scaling was less as fewer coefficients are employed. For the processing of the hexagonal edge detector, changes were needed to the square-system edge-detection hardware module to reflect the different magnitude equations. For the other operators implemented within the PE the modifications were minimal.
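The toggling between two coefficient sets on alternate lines can be sketched in Python. Fig. 20 is not reproduced here, so the positions of the seven active elements within the 3 x 3 window are inferred from the sampling scheme described above (even-numbered lines delayed by half a point spacing, lines numbered from one, scanning left to right); the mask layout, the names, and the all-ones coefficients are assumptions for illustration only.

```python
# Rows of each 3 x 3 array are [line above, current line, line below].
# A zero marks a window position that is not one of the seven hexagonal
# neighbors for that line parity.
ODD_LINE_COEFFS = (
    (1, 1, 0),
    (1, 1, 1),
    (1, 1, 0),
)
EVEN_LINE_COEFFS = (
    (0, 1, 1),
    (1, 1, 1),
    (0, 1, 1),
)


def hex_convolve(window, line_number,
                 odd_coeffs=ODD_LINE_COEFFS, even_coeffs=EVEN_LINE_COEFFS):
    """Multiply-accumulate over the window using the set for this line parity."""
    coeffs = even_coeffs if line_number % 2 == 0 else odd_coeffs
    return sum(
        coeffs[r][c] * window[r][c]
        for r in range(3)
        for c in range(3)
    )


window = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
odd_result = hex_convolve(window, line_number=1)
even_result = hex_convolve(window, line_number=2)
```

Both sets contain exactly seven nonzero entries, which is why only seven multipliers are needed per array when the element is dedicated to hexagonal data.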
In conclusion, with a pipeline processor, the processing time for a real-time video image will be unaltered for a particular operation regardless of whether square or hexagonal digitization is employed. One image is processed in one frame time, although a latency delay period is introduced by the string of PEs in the pipeline. Even so, there are still advantages for hexagonal digitization in pipelined systems despite the requirement of some extra control information. The line delays can be reduced in length by 13.4% and the PE master clock can be reduced by the same factor. The shorter line delays reduce process latency and the size of the circuit. In a recirculating pipeline system hexagonally sampled images will be able to be processed in 13.4% less time than square-sampled images. As the local image area contains only seven elements for a hexagonally sampled image, many of the processing modules would be simpler than for a square-system PE. For example, only seven multipliers would be required as opposed to nine in each multiplier array module.
IV. BINARY IMAGE PROCESSING

With hexagonal binary image-processing operator design, the simple six-way connectivity definition is exploited, and usually an equivalent hexagonal operator will be smaller and more easily computed than its square grid counterpart. Many hexagonal processing algorithms for binary image processing have been researched. As discussed in Section II(C) there will be fewer samples (pixels) covering a given area, but if the hexagonal operators are simpler, or the processes are recursive, greater savings may be possible. The basic binary image-processing operations are described in textbooks (Davies, 1990). Some processes for the hexagonal grid are reviewed in what follows.
A. Connectivity
In determining if a group of pixels is connected together to form an object, a definition of connectivity must first be stated. On a hexagonal grid, all neighboring sampling points, with associated pixels touching a central pixel, are equidistant from the central sampling point. If the pixel shape is hexagonal, then all the nearest neighbors touch the central pixel along equal length sides. This scheme is known as 6-connectedness. Hexagonal grids with rectangular pixels, as shown in Fig. 2(c) can also be defined as 6-connected.
On a rectangular grid, there are four nearest-neighboring pixels, but four additional pixels touch the central pixel at each corner. There are two definitions of connectivity: 4-connectedness, where only edge-adjacent pixels are neighbors; and 8-connectedness, where corner-adjacent pixels are also considered as neighbors.
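The two rectangular-grid definitions can be compared with a small Python sketch that counts connected groups of pixels under each neighbor set. The function name count_components and the two test shapes are illustrative assumptions; pixels are given as (row, column) pairs.

```python
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]            # edge-adjacent neighbors
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]     # plus corner-adjacent neighbors


def count_components(pixels, offsets):
    """Count groups of pixels that are connected under the given neighbor offsets."""
    remaining = set(pixels)
    count = 0
    while remaining:
        stack = [remaining.pop()]                  # seed a new component
        count += 1
        while stack:
            r, c = stack.pop()
            for dr, dc in offsets:
                neighbor = (r + dr, c + dc)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    stack.append(neighbor)
    return count


diagonal = {(0, 0), (1, 1), (2, 2)}                # a one-pixel-wide diagonal line
diamond = {(0, 1), (1, 0), (1, 2), (2, 1)}         # a ring around the pixel (1, 1)

diagonal_4 = count_components(diagonal, N4)
diagonal_8 = count_components(diagonal, N8)
diamond_4 = count_components(diamond, N4)
diamond_8 = count_components(diamond, N8)
```

The same shape is one object under 8-connectedness and several separate objects under 4-connectedness, so any measurement that depends on object membership has to state which definition is in force.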
A problem arises because the connectivity of the background pixels can also be considered. Now, if the 4-connected definition is used on both foreground and background, some pixels will not appear in either set. A simple closed curve should be able to separate the background and object into distinct connected regions, but this is not the case. Again, if the 8-connected definition is used, some pixels will appear in both sets. One solution is to use 4-connectedness for the object, and 8-connectedness for the background. Another is to define a 6-connectedness that involves only two corners. The hexagonal system's unambiguous definition is more convenient. Connectivity is an important consideration in many image processes, especially where groups of pixels are being considered for membership of a particular feature, or the edges of a feature are being traced out and coded. The use of connectivity in shrinking and edge-following algorithms has been explored (Rosenfeld, 1970). Consideration has been given to the more general topological properties of digitized spaces, and in particular to connectivity and the order of connectivity (Mylopoulos and Pavlidis, 1971).

B. Measurement of Distance

Useful measurements include the distance between points, the dimensions of a part, the area of an object, and its perimeter, etc. (Rosenfeld and Kak, 1982). Connectivity evaluation, counting and edge following also are important operations.

C. Distance Functions
Distance functions are used in shape analysis. The distance of each pixel in an object from the boundary of the object is measured and overlaid on the binary image of the object, and this information is then analyzed (Rosenfeld and Pfaltz, 1968). Metrics using a 4-, 6-, or 8-way connectivity have been compared. The 8-way distance involves a √2 step for diagonals. The hexagonal 6-way function was found to give a better approximation to Euclidean distance than the other functions (Luczak and Rosenfeld, 1976).
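The three kinds of metric can be written down compactly. The Python sketch below is an illustration under stated assumptions rather than the formulation used in the cited comparison: the hexagonal grid is addressed with axial coordinates (q, r) and unit spacing between neighbors, the 8-way metric charges √2 for each diagonal step, and all function names are hypothetical.

```python
import math


def four_way(dx, dy):
    """Number of edge-adjacent steps on a square grid (city-block distance)."""
    return abs(dx) + abs(dy)


def eight_way(dx, dy):
    """Path length on a square grid with unit straight steps and sqrt(2) diagonals."""
    short, long_ = sorted((abs(dx), abs(dy)))
    return short * math.sqrt(2) + (long_ - short)


def six_way(dq, dr):
    """Number of neighbor-to-neighbor steps on a hexagonal grid (axial coordinates)."""
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def hex_euclidean(dq, dr):
    """Straight-line distance between hexagonal grid points with unit spacing."""
    x = dq + dr / 2
    y = dr * math.sqrt(3) / 2
    return math.hypot(x, y)


square_steps = four_way(3, 4)          # Euclidean distance for this offset is 5
octile = eight_way(3, 4)
hex_steps = six_way(2, -1)
hex_true = hex_euclidean(2, -1)
```

Evaluating each metric against the straight-line distance over many offsets is one way to reproduce the kind of comparison the text refers to; the sketch does not by itself establish which metric is closest.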
D. Morphological Operators
Mathematical morphology is an approach to computer vision based on searching for the shape of an object or the texture of a surface. Morphological operators are applied repeatedly to the image to remove irrelevant information and to enhance the essential shape of the objects within the scene. These methods are based on set theory. Operator design and application have been considered by various researchers (Matheron, 1975; Serra, 1982, 1986, 1988; Haralick et al., 1987). Hexagonal sampling grids and morphological image processing have been strongly linked since they were first introduced. Hexagonal parallel pattern transformations involving morphological operations have also been reported (Golay, 1969; Preston, 1971). The main reason researchers chose hexagonal sampling was to avoid the ambiguous connectivity definitions between pixels on a square array. One of the most active researchers in this area, Serra (1982, 1986, 1988), makes extensive use of the hexagonal grid, preferring it to the square because of the connectivity definition, its large possible rotation group on the grid, and the simple processing algorithms that result.
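Because these operators are defined on sets, the two basic ones are short to express for a hexagonal grid. The Python sketch below represents a binary image as a set of foreground points in axial coordinates (q, r) and uses the seven-point hexagon (a pixel and its six neighbors) as the structuring element. The coordinate convention, the choice of structuring element, and the function names are assumptions for illustration, not a transcription of any cited operator.

```python
# The six neighbors of a hexagonal grid point in axial coordinates.
HEX_NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def dilate(points):
    """Add every point that touches the foreground."""
    grown = set(points)
    for q, r in points:
        for dq, dr in HEX_NEIGHBORS:
            grown.add((q + dq, r + dr))
    return grown


def erode(points):
    """Keep only points whose six neighbors are all foreground."""
    return {
        (q, r)
        for q, r in points
        if all((q + dq, r + dr) in points for dq, dr in HEX_NEIGHBORS)
    }


seed = {(0, 0)}
hexagon = dilate(seed)          # the seed plus its six neighbors
restored = erode(hexagon)       # erosion undoes the dilation for this shape
vanished = erode(seed)          # an isolated point does not survive erosion
```

All six neighbors are treated identically, with no separate edge and corner cases, which is the simplification that the single hexagonal connectivity definition brings to operator design.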
E. Line Thinning and the Skeleton of an Object

The skeleton or medial axis of a shape can be used as a basis for object recognition. In particular, it is often used in optical character recognition systems. There are several steps involved in the process:

* Thresholding: The gray-level image is converted to a binary image in such a way as to maintain the shape.
* Thinning: The shape, which may have a width of several pixels, is analyzed, or eroded, to find a one-pixel thick line that fits centrally within it.
* Line tracking: The thinned lines are chain coded.
* Line segmentation: The chain-coded information is converted to vector form.

This point is the limit of the skeleton-forming process. Subsequent processes analyze the vectors to identify junctions and then the object.
Variations on this procedure exist, and a large number of algorithms have been developed for the processes at each step. Surveys of these algorithms have also been reported (Smith, 1987; Lam et al., 1992). There are two main classes of thinning algorithm, namely, iterative and methodical. In iterative methods a local area on the edge of the object is examined, and the central
R. C. STAUNTON
pixel of the area is removed if certain rules designed to preserve the connectivity of the final skeleton are obeyed. The process is repeated on the image in a way that removes pixels equally from both sides of the object until no further pixels are changed. The resulting skeleton is connected, its pixels are a subset of the original object’s pixels, and it can be sufficient for many recognition tasks. Jang and Chin (1990) have used mathematical morphology to formally define thinning, and produced a set of operators that are proved to produce single-pixel thick connected skeletons. However, this resulting skeleton may lie only approximately in the correct place. The methodical algorithms aim to ensure a correctly positioned skeleton, but the iterative methods can produce sufficiently accurate skeletons for many applications; as they compute efficiently and can be easily realized using local operators, they have been applied to many problems. Deutsch (1972) reports similar thinning algorithms developed for use with rectangular, hexagonal, and triangular arrays, and has compared their operation. The triangular algorithm produced a skeleton with the least number of points, but was sensitive to noise and image irregularities. The hexagonal algorithm was the most computationally efficient, produced a skeleton with fewer points than the rectangular algorithm, and was easily chain coded. Deutsch concluded that of the three algorithms, the hexagonal was optimal. Other hexagonal skeletonization algorithms have been reported (Meyer, 1988; Staunton, 1996a).
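The iterative class of algorithm described above can be illustrated with a short sketch. The code below is not the Jang and Chin or Staunton template scheme, whose templates are given only in the figures; it substitutes the widely known Zhang-Suen two-subiteration rules for a square grid, purely to show the loop structure: examine each border pixel's local area, mark it for removal if connectivity-preserving rules hold, delete the marked pixels in parallel, and repeat until nothing changes. The function name and the test image are illustrative assumptions.

```python
def zhang_suen_thin(image):
    """Thin a binary image (list of lists of 0/1) on a square grid.

    Zhang-Suen rules are used here only as a stand-in for an
    iterative border-removal scheme; they are not the templates
    discussed in the text.
    """
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9, clockwise from north; pixels outside the image count as 0.
        offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
                   (1, 0), (1, -1), (0, -1), (-1, -1)]
        return [img[r + dr][c + dc]
                if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
                for dr, dc in offsets]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):          # two sub-iterations per pass
            to_delete = []
            for r in range(rows):
                for c in range(cols):
                    if img[r][c] == 0:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)       # number of object neighbours
                    # number of 0 -> 1 transitions around the pixel
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_delete.append((r, c))
            # Delete in parallel: decisions were made on the unchanged image.
            for r, c in to_delete:
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return img


# A 3-pixel-thick horizontal bar inside a 5 x 9 image.
bar = [[1 if 1 <= r <= 3 and 1 <= c <= 7 else 0 for c in range(9)]
       for r in range(5)]
skeleton = zhang_suen_thin(bar)
```

The resulting skeleton is a subset of the original object pixels, as the text requires of iterative methods. A hexagonal version would follow the same loop with six neighbours and its own removal rules.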
F. Comparison Between Hexagonal and Rectangular Skeletonization Programs
A comparison (Staunton, 1996a) has been made between an algorithm designed for the rectangular grid (Jang and Chin, 1990) and a similar one designed for the hexagonal grid. There are many rectangular grid algorithms for the iterative removal of border pixels. Jang and Chin's (1990) was used for this comparison as it was designed using a mathematical framework based on morphological set transforms. Using these it can be proved that the final skeleton will conform to most of the following properties:
1. It will contain a number of single-pixel width lines.
2. Each skeletal element will be connected to at least one other. The skeleton will contain no gaps.
3. Skeletal legs will be preserved.
4. It will be accurately positioned.
5. Noise-induced pixels will be ignored, that is, limbs will not be formed towards single-pixel edge protrusions.
FIGURE 35. Rectangular scheme thinning templates D = {D1, D2, D3, D4} and E = {E1, E2, E3, E4}. (Reprinted from Staunton, R. C. (1996a). An analysis of hexagonal thinning algorithms and skeletal shape representation, Pattern Recognition, 29(7): 1131-1146; Copyright (1996), with permission from Elsevier Science.)
It has been possible (Staunton, 1996a) to use similar mathematics to design a hexagonal algorithm that was close in operation to Jang and Chin's (1990). In each case the analysis led to the design of a set of thinning templates, as shown in Fig. 35 for the rectangular algorithm and Fig. 36 for the hexagonal algorithm. Further analysis proved that the templates can be applied in parallel pairs to the image for the hexagonal case, and in parallel triplets for the rectangular case. If the skeleton is to be positioned correctly, the pairs or triplets must be applied in a particular order, as shown in Fig. 37 for both the rectangular and hexagonal algorithms. The templates can alternatively be applied sequentially, and for the hexagonal case this produces a better preservation of skeletal legs and a slightly more accurate positioning of the skeleton. However, the parallel application of the templates resulted in a converged skeleton in approximately half the time required for the sequential application of the templates.
FIGURE 36. Hexagonal scheme thinning templates F = {F1, F2, F3, F4, F5, F6}. (Reprinted from Staunton, R. C. (1996a). An analysis of hexagonal thinning algorithms and skeletal shape representation, Pattern Recognition, 29(7): 1131-1146; Copyright (1996), with permission from Elsevier Science.)
1. Skeletal Quality
Figure 38 shows some examples of the skeletons of four geometric shapes and a sample of text digitized on a rectangular grid. Figure 39 shows the same examples digitized on a hexagonal grid. These images can be compared and evaluated with respect to the five good skeletal qualities listed earlier in this section. Properties 1 and 2 hold for each algorithm. Property 3 concerns the preservation of skeletal legs. These are not preserved to 90° corners by the rectangular algorithm, and there is some shrinkage to the corners introduced by the hexagonal algorithm. This shrinkage was less with the sequential version of the hexagonal algorithm (Staunton, 1996a). Property 4 concerns the accurate positioning of the skeleton. Both algorithms have made good attempts at positioning the major axes of each shape, but few minor axes have been preserved by the rectangular algorithm. Considering the skeleton of Fig. 38a, the single component could have been produced by any one of many rectangular shapes, and the information remaining can only indicate the position of the shape and the orientation of its major axis. The hexagonal skeletons contain more limbs and more information. The extra limbs within the rectangular shapes digitized on the
[Flowchart. (a) Rectangular: Pass 1: D1, D2, E1; Pass 2: D2, D3, E2; Pass 3: D3, D4, E3; Pass 4: D4, D1, E4; repeated until converged. (b) Hexagonal: ... Pass 3: F6, F1; Pass 4: F3, F4; Pass 5: F5, F6; Pass 6: F2, F3; repeated until converged.]
FIGURE 37. Template application order for thinning algorithms: (a) rectangular; (b) hexagonal.
hexagonal grid give an indication of their original size. Other rectangular algorithms have been researched that retain skeletal branches (Guo and Hall, 1989). The triangular shapes of Figs. 38b and 39b have both been processed to produce good skeletons. The rectangular algorithm has shifted the center of gravity, or branch junction, towards the base of the triangle. The hexagonal algorithm has produced an unusual step pattern in the lower left limb, and some limb shortening. Both algorithms have produced good skeletons of the text. The long thin strokes and the acute corners have not resulted in missing legs using the rectangular algorithm. The text images are from real scanned text, whereas the geometric shapes were computer generated. Property 5 states that
FIGURE 38. Skeletons produced by the new rectangular algorithm.
noise-induced pixels should be ignored. The text images contain single pixel protrusions in the edges of each letter that can be defined as noise. The hexagonal algorithm was insensitive to these noise pixels, some of which can be observed in Fig. 39e on the cross stroke of the “A” and the bottom of the
FIGURE 39. Skeletons produced by the hexagonal algorithm.
“B.” The rectangular algorithm was sensitive to the noise and limbs were formed to these pixels. A 2-pass morphological filter was designed to remove the noise; this resulted in the acceptable skeletons seen in Fig. 38e, but introduced a processing overhead.
2. Program Efficiency
Both the rectangular and hexagonal algorithms can be computed on parallel machines, or alternatively, for a SISD machine the templates to be applied in parallel can be logically combined and then computed. The hexagonal templates compute more quickly as they have only seven elements as opposed to nine for the rectangular ones. The removal of a complete layer of pixels from the outside of an object can be referred to as an iteration of the algorithm. For each application of the rectangular 4-pass scheme illustrated in Fig. 37, the “corner” templates are applied twice, and the “edge” templates once, whereas each application of the 6-pass hexagonal scheme applies each template twice. Table I compares the number of applications of the algorithm required to produce a converged skeleton from the images presented in Figs. 38 and 39. In each case the hexagonal algorithm is at least as efficient as the rectangular algorithm. For real rectangularly sampled images edge-noise removal requires the equivalent of an additional two passes of the algorithm. Counting passes, the average hexagonal computation requires only 80% of the passes needed for the rectangular computation. If the reduced time to compute the smaller templates (Figs. 35 and 36) is considered, the hexagonal computation will require only 63% of the time of the rectangular computation to calculate the average skeleton. The test images were obtained by subsampling a double-resolution rectangularly sampled image in such a way that each shape contained the same number of pixels whether sampled on the rectangular or hexagonal grid. If a regular hexagonal grid had been used, then 13.4% fewer pixels would have been required for each shape, and the time to calculate a skeleton would have been reduced to 55% of that to calculate it on the rectangular grid.
TABLE I
A COMPARISON OF THE NUMBER OF PASSES REQUIRED TO FORM SKELETONS BY THE RECTANGULAR AND HEXAGONAL ALGORITHMS
Image                   Rectangular algorithm passes   Hexagonal algorithm passes
Vertical rectangle      20                             20
Horizontal rectangle    23                             16
Triangle                27                             24
Corner                  12                             12
Text                    19                             15
In conclusion, both schemes produced good quality skeletons, although there were differences in skeletal attributes. The hexagonal scheme could compute the average skeleton in 55% of the time required to compute it with the rectangular scheme.
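The chain of percentages quoted in this section (80%, then 63%, then 55%) can be approximately retraced from Table I. The sketch below is an illustration under stated assumptions, not the author's calculation: it assumes the two extra noise-removal passes are added to every rectangular image, that the average is a mean of per-image ratios, and that computation time scales with template size (7 versus 9 elements) and with pixel count. The 13.4% pixel saving is taken as 1 - sqrt(3)/2, the density ratio for hexagonal versus square sampling of a circularly bandlimited signal. Under these assumptions the results land within a couple of percentage points of the quoted figures.

```python
import math

# Passes to convergence, transcribed from Table I.
rect_passes = {"vertical rectangle": 20, "horizontal rectangle": 23,
               "triangle": 27, "corner": 12, "text": 19}
hex_passes = {"vertical rectangle": 20, "horizontal rectangle": 16,
              "triangle": 24, "corner": 12, "text": 15}

# Assumption: edge-noise removal costs two extra rectangular passes per image.
NOISE_PASSES = 2

ratios = [hex_passes[name] / (rect_passes[name] + NOISE_PASSES)
          for name in rect_passes]
pass_ratio = sum(ratios) / len(ratios)

template_ratio = 7 / 9             # 7-element hexagonal vs 9-element rectangular templates
density_ratio = math.sqrt(3) / 2   # about 0.866, i.e. 13.4% fewer pixels

time_same_pixel_count = pass_ratio * template_ratio
time_regular_hex_grid = time_same_pixel_count * density_ratio

print(f"pass ratio:                   {pass_ratio:.2f}")
print(f"time ratio, equal pixels:     {time_same_pixel_count:.2f}")
print(f"time ratio, regular hex grid: {time_regular_hex_grid:.2f}")
```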
V. MONOCHROME IMAGE PROCESSING
This section contains a review of gray-scale operators that have been designed to work on hexagonally sampled images. As they are grid-dependent, they can be defined as low-level processes (Section I). Some operations can be computed more efficiently in the Fourier domain and thus hexagonal transforms have been developed; others are applied in the spatial domain. Where possible, comparisons have been made between these and similar operators designed for the rectangular grid.
A. The Hexagonal Fourier Transform
A hexagonal Fourier transform and hexagonal fast Fourier transform (HFFT) have been developed (Mersereau, 1979; Mersereau and Speake, 1981; Dudgeon and Mersereau, 1984; Guessoum and Mersereau, 1986). It was found that the HFFT required 25% less storage of complex variables than the rectangular fast Fourier transform (RFFT), and that it computed more efficiently. The algorithm is based on the Rivard procedure (Rivard, 1977), rather than the decomposition of the 2-D kernel into 1-D FFTs. Decomposition to 1-D FFTs is not possible in the hexagonal case. This alternative procedure is a direct extension of the 1-D FFT algorithm to the 2-D case, which can increase the computational efficiency of the RFFT by 25%. Mersereau has shown that his HFFT increased computational efficiency by an additional 25% in comparison to the Rivard RFFT.
B. Geometric Transformations
Geometric transformations have been researched using a three-integer coordinate frame (Her, 1995) as described in Section III.A. The coordinate frame is easy to use and the symmetry of the grid has enabled the design of simple efficient operators. Operators have been designed for: rounding that finds the nearest integer grid point to a point calculated with real coordinates; translations and reflections; scalings and shearings; and rotations.
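A minimal sketch of such operators follows. It assumes the common redundant convention in which the three coordinates of every grid point sum to zero; whether this matches the axis conventions of Her (1995) exactly is an assumption, and the function names are illustrative only.

```python
def hex_round(x, y, z):
    """Round real three-coordinate values to the nearest grid point.

    Assumes the convention x + y + z = 0 for points on the grid.
    """
    rx, ry, rz = round(x), round(y), round(z)
    dx, dy, dz = abs(rx - x), abs(ry - y), abs(rz - z)
    # Recompute the coordinate with the largest rounding error so that
    # the three integers still sum to zero.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, ry, rz


def translate(p, d):
    """Translate grid point p by the grid displacement d."""
    return tuple(a + b for a, b in zip(p, d))


def rotate60(p):
    """Rotate a grid point by 60 degrees about the origin.

    The sense of rotation depends on how the axes are drawn.
    """
    x, y, z = p
    return (-z, -x, -y)


def reflect(p):
    """Reflect a grid point in one mirror axis by swapping two coordinates."""
    x, y, z = p
    return (x, z, y)


point = hex_round(1.2, -0.1, -1.1)
turned = point
for _ in range(6):          # six 60-degree steps return to the start
    turned = rotate60(turned)
```

Because every operator is a permutation, negation, or integer addition of the coordinates, no trigonometric evaluation is needed for rotations by multiples of 60 degrees.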
C. Point Source Location
This task, also known as star tracking, involves the tracking of a moving point light source across the array. The image of the source is a blurred spot. The centroid of the spot is calculated to within subpixel accuracy to give the position of the source. Accuracy is improved if the sensor array has a high fill factor, that is, the sensor elements tile the image window as completely as possible. For a 100% fill factor, a hexagonal array of hexagonally shaped sensors has been shown (Cox, 1987) to out-perform a square array of square-shaped sensors. Detection error and sensitivity to noise are reduced, and computational load and data storage are reduced by 24%. For lower fill factors the advantages of a hexagonal array are less pronounced (Cox, 1989).
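The centroid calculation can be sketched as an intensity-weighted mean of the sensor-element centres. The sample layout below (a centre element and its six hexagonal neighbours at unit spacing) and all names are assumptions made for illustration; they are not taken from Cox (1987).

```python
import math


def hex_centre(q, r):
    """Cartesian centre of the element at axial grid position (q, r), unit spacing."""
    return (q + r / 2.0, r * math.sqrt(3) / 2.0)


def spot_centroid(samples):
    """Intensity-weighted centroid of a blurred spot.

    samples: list of (x, y, intensity) tuples, one per sensor element.
    """
    total = sum(w for _, _, w in samples)
    if total == 0:
        raise ValueError("spot has no intensity")
    cx = sum(x * w for x, _, w in samples) / total
    cy = sum(y * w for _, y, w in samples) / total
    return cx, cy


ring = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

# Symmetric spot: the centroid falls on the central element.
symmetric = [(*hex_centre(0, 0), 4.0)] + [(*hex_centre(q, r), 1.0) for q, r in ring]
centre_x, centre_y = spot_centroid(symmetric)

# Brightening the neighbour at (1, 0) pulls the centroid towards +x
# by a fraction of the element spacing.
shifted = list(symmetric)
shifted[1] = (*hex_centre(1, 0), 3.0)
shift_x, shift_y = spot_centroid(shifted)
```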
D. Image-Processing Filters
1. Linear Filters
A series of general-purpose hexagonal FIR and IIR filters have been developed (Mersereau, 1979; Mersereau and Speake, 1983) and compared to rectangular filters with similar frequency responses. The hexagonal filters were found to be superior in terms of computational efficiency, and as they could be designed with 12-fold symmetry, they had a more circular frequency response. These filters concern 2-D signal processing, in general, as opposed to only image processing. Savings of up to 58% in memory and similar gains in computational efficiency were reported for hexagonal filters compared to their rectangular counterparts.
Considering filters for image processing in more detail, the regular hexagonal structure leads to easy spatial plane local operator design. The local area can be defined to include the central pixel and any number of concentric “shells” of pixels at increasing distances from the center. All the members of a particular shell can be assigned equal weighting factors in many local operator designs. For example, consider a 4-shell Gaussian filter operating on a hexagonal grid, where four weighting factors are initially calculated as shown in Fig. 40, and the final algorithm will be of the form of Eq. (14):
$$P_h = k\,i_{1,1} + l\sum_{p=1}^{6} i_{2,p} + m\sum_{q=1}^{6} i_{3,q} + n\sum_{r=1}^{6} i_{4,r}, \tag{14}$$
where k, l, m, and n are filter weights associated with the four shells, and i denotes image points. Four multiplications and 19 additions are required for the computation of each output pixel.
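The shell-weighted sum of Eq. (14) can be sketched as follows. The sketch assumes axial grid coordinates (q, r) with unit spacing, for which the squared Euclidean distance of an offset is q*q + q*r + r*r, and an image stored as a dictionary keyed by (q, r). These representation choices, the Gaussian width, and the function names are illustrative assumptions rather than the chapter's implementation.

```python
import math


def hex_shells(radius=2):
    """Group the offsets within `radius` hexagonal steps into shells of
    equal Euclidean distance from the centre."""
    shells = {}
    for q in range(-radius, radius + 1):
        for r in range(-radius, radius + 1):
            if abs(q + r) <= radius:
                d2 = q * q + q * r + r * r      # squared distance, unit spacing
                shells.setdefault(d2, []).append((q, r))
    return [shells[d2] for d2 in sorted(shells)]


def gaussian_weights(shells, sigma=1.0):
    """One weight per shell, normalised so the whole mask sums to one."""
    raw = []
    for shell in shells:
        q, r = shell[0]
        raw.append(math.exp(-(q * q + q * r + r * r) / (2.0 * sigma ** 2)))
    total = sum(w * len(shell) for w, shell in zip(raw, shells))
    return [w / total for w in raw]


def filter_pixel(image, q, r, shells, weights):
    """Output value at (q, r): sum each shell first, then one multiplication per shell."""
    return sum(w * sum(image[(q + dq, r + dr)] for dq, dr in shell)
               for w, shell in zip(weights, shells))


shells = hex_shells()                     # centre plus three shells of six
k, l, m, n = gaussian_weights(shells)
flat = {(q, r): 10.0 for q in range(-3, 4) for r in range(-3, 4)}
p_h = filter_pixel(flat, 0, 0, shells, (k, l, m, n))
```

Summing each shell before multiplying is what keeps the multiplication count at four for the 19-point mask.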
FIGURE 40. Four-shell hexagonal and 6-shell square local operators.
In comparison, a similar filter on a square grid (5 x 5) requires six different weighting factors and a correspondingly more complicated algorithm of the form of Eq. (15):
$$P_s = a\,i_{1,1} + b\sum_{p=1}^{4} i_{2,p} + c\sum_{p=1}^{4} i_{3,p} + d\sum_{p=1}^{4} i_{4,p} + e\sum_{p=1}^{8} i_{5,p} + f\sum_{p=1}^{4} i_{6,p}, \tag{15}$$
where a, b, c, d, e, and f are filter weights associated with the six shells. Six multiplications and 25 additions are required for the computation of each output pixel. Both filters are convolved with a similar image area, but, in general, 13.4% fewer points will be required for the hexagonal filter than for the square. However, in this case, the square-system operator kernel is separable, giving an alternative computation algorithm of the form of Eq. (16):
Now, 6 multiplications and 8 additions are required for the computation of each output pixel. The hexagonal operator requires only 4 multiplications, compared with 6 for both rectangular algorithms, but the number of additions is larger than that for the separable kernel rectangular method. Computational efficiency will be determined by the architecture of the computer arithmetic and logic unit, and depend upon whether the filter coefficients are integer or real numbers.
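The operation count for the separable case can be checked with a small sketch. It assumes a symmetric 5-tap one-dimensional kernel with weights (w0, w1, w2) applied first along rows and then along columns, so that each output pixel costs 3 multiplications and 4 additions per direction. The six shell weights of the 5 x 5 mask then correspond to the products w0*w0, w0*w1, w1*w1, w0*w2, w1*w2, and w2*w2. The integer weights and test image are arbitrary illustrative choices, and the code is not a transcription of the chapter's equation.

```python
def symmetric_1d(values, i, w):
    """Symmetric 5-tap filter at index i: 3 multiplications, 4 additions."""
    w0, w1, w2 = w
    return (w0 * values[i]
            + w1 * (values[i - 1] + values[i + 1])
            + w2 * (values[i - 2] + values[i + 2]))


def separable_5x5(img, w):
    """Row pass followed by column pass; only fully covered pixels are returned."""
    rows, cols = len(img), len(img[0])
    row_pass = [[symmetric_1d(row, c, w) for c in range(2, cols - 2)]
                for row in img]
    columns = list(zip(*row_pass))                      # transpose
    col_pass = [[symmetric_1d(col, r, w) for r in range(2, rows - 2)]
                for col in columns]
    return [list(row) for row in zip(*col_pass)]        # transpose back


def direct_5x5(img, w):
    """Reference: full 25-point convolution with the outer-product mask."""
    rows, cols = len(img), len(img[0])
    return [[sum(w[abs(dr)] * w[abs(dc)] * img[r + dr][c + dc]
                 for dr in range(-2, 3) for dc in range(-2, 3))
             for c in range(2, cols - 2)]
            for r in range(2, rows - 2)]


weights_1d = (6, 4, 1)                                   # arbitrary integer example
image = [[(3 * r + 5 * c) % 11 for c in range(7)] for r in range(7)]
fast = separable_5x5(image, weights_1d)
slow = direct_5x5(image, weights_1d)
```

The two results agree exactly for integer data, which confirms that the separable route computes the same 5 x 5 filter with fewer operations per pixel.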
2. Nonlinear Filters
This class of filters includes designs such as the median filter and gray-scale morphologic filters (Sternberg, 1986; Haralick et al., 1987). Hexagonal grid median filters should be more computationally efficient than their square-grid counterparts, because for the same area of support, 13.4% fewer values exist. This will significantly simplify the sorting procedure.
E. Edge Detectors
Edges correspond to intensity discontinuities in the image. These discontinuities may correspond to the edges of an object, but unfortunately sometimes they do not. For example, the edge of a shadow is likely to be detected. Many algorithms have been researched, but here some of the simplest are compared. Differential operators model local edges by fitting the best plane over a convenient size of neighborhood. In square arrays two orthogonal operators are applied to a pixel and from the response of these, the magnitude m of the gradient of the plane and the edge angle, α, can be calculated:
$$m = (t_h^2 + t_v^2)^{1/2}, \tag{17}$$
$$\alpha = \arctan(t_v / t_h), \tag{18}$$
where t_v and t_h are the responses of operators designed to respond maximally to vertical and horizontal edges. Fig. 41 shows Sobel operators designed to be convolved with a 3 x 3 area of the image. For edge detection, the response magnitude is compared with a threshold to determine if a significant edge exists. The Sobel operator has a computational processing time advantage over some other operators as only integer arithmetic is required and the local area in which it operates is relatively small. It has been shown by some researchers to be the optimum 3 x 3 operator (Davies, 1984; Staunton, 1997b).
 1  2  1         1  0 -1
 0  0  0         2  0 -2
-1 -2 -1         1  0 -1
FIGURE 41. Sobel differential operators with 3 x 3 area.
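A minimal sketch of Sobel edge detection on a single 3 x 3 patch, following Eqs. (17) and (18), is given below. Which mask is labelled as the horizontal-edge or vertical-edge operator is an assumption here, as are the function names and the threshold value. `math.atan2` is used in place of a plain arctangent of the ratio so that a zero horizontal response does not cause a division by zero.

```python
import math

# Assumed labelling: SOBEL_H responds maximally to horizontal edges,
# SOBEL_V to vertical edges.
SOBEL_H = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
SOBEL_V = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]


def response(mask, patch):
    """Sum of element-wise products of a 3 x 3 mask and a 3 x 3 image patch."""
    return sum(mask[r][c] * patch[r][c] for r in range(3) for c in range(3))


def sobel_edge(patch, threshold):
    """Return (magnitude, angle in radians, is_edge) for one 3 x 3 patch."""
    t_h = response(SOBEL_H, patch)
    t_v = response(SOBEL_V, patch)
    m = math.hypot(t_h, t_v)          # Eq. (17)
    alpha = math.atan2(t_v, t_h)      # Eq. (18), quadrant-aware
    return m, alpha, m > threshold


# A vertical step edge: dark on the left, bright on the right.
patch = [[0, 0, 10],
         [0, 0, 10],
         [0, 0, 10]]
magnitude, angle, is_edge = sobel_edge(patch, threshold=20)
```

Only the two mask responses need integer arithmetic; the square root and arctangent are evaluated once per pixel and can be skipped when only a thresholded edge map is needed.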
F. Hexagonal Edge-Detection Operators
Hexagonal operators have been researched (Staunton, 1989). The regular hexagonal data structure leads to easy local operator design. The central element of the local area is surrounded by shells of elements. Figure 42 shows a set of edge-detection operators exploiting only the inner shell of neighbors, and these are of a comparable order to the 3 x 3 operators in Fig. 41. These hexagonal operators will respond maximally to edges at 60° angular intervals from the horizontal. The weighting functions of the shell elements are chosen as 1 or -1 to reflect the regular structure of the grid of sampling points. Davies' design principle (Davies, 1984) indicates “1” to be nearly optimal. Again only integer arithmetic is required for computation. If these masks are used as differential operators, the slope magnitude m becomes relatively complicated compared with Eq. (17). The equation of m is derived as follows. The output of each of the three hexagonal operators, as shown in Fig. 42, can be represented as a vector. An edge can be modeled by a plane, and the three vectors, t_1, t_2, t_3, lie within this plane. Assuming orthogonal x and y axes, t_3 is aligned with the y axis, t_1 is at 60° to t_3, and t_2 at 60° to t_1. The resultant vector m can be found:
$$\mathbf{m} = \mathbf{t}_1 + \mathbf{t}_2 + \mathbf{t}_3. \tag{19}$$
Examination of Fig. 42 indicates the simple relationship t_3 = t_1 - t_2, giving
$$\mathbf{m} = \frac{\sqrt{3}}{2}(t_1 + t_2)\,\mathbf{i} + \frac{3}{2}(t_1 - t_2)\,\mathbf{j}.$$
The slope magnitude, m, is
$$m = \left[3\left(t_1^2 + t_2^2 - t_1 t_2\right)\right]^{1/2}.$$
FIGURE 42. Hexagonal differential edge-detection operators.
The angle that m makes with the x axis is known as the edge angle α:
$$\alpha = \arctan\left(\frac{\sqrt{3}\,(t_1 - t_2)}{t_1 + t_2}\right).$$
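The hexagonal slope magnitude and edge angle can be checked numerically. The sketch below computes both from two operator responses and compares them with an explicit vector sum. The direction angles used for the three response vectors (30°, -30°, and 90° from the x axis) are an assumption chosen to be consistent with the component form of m and with the third response being the difference of the first two; the function names are illustrative.

```python
import math


def hex_edge(t1, t2):
    """Slope magnitude and edge angle (radians) from two hexagonal responses."""
    mx = math.sqrt(3) / 2.0 * (t1 + t2)     # i component of m
    my = 1.5 * (t1 - t2)                    # j component of m
    m = math.sqrt(3.0 * (t1 * t1 + t2 * t2 - t1 * t2))
    alpha = math.atan2(my, mx)              # quadrant-aware arctangent
    return m, alpha


def vector_sum(t1, t2):
    """Resultant of the three response vectors under the assumed directions."""
    t3 = t1 - t2
    directions = [math.radians(30), math.radians(-30), math.radians(90)]
    mx = sum(t * math.cos(a) for t, a in zip((t1, t2, t3), directions))
    my = sum(t * math.sin(a) for t, a in zip((t1, t2, t3), directions))
    return mx, my


m, alpha = hex_edge(5, 2)
sum_x, sum_y = vector_sum(5, 2)
```

Only two of the three masks need to be evaluated per pixel, since the third response follows from the other two.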
A comparison between the computational efficiency and accuracy of local edge-detection operators in the two systems has been made (Staunton, 1989). The hexagonal system detector was found to compute more efficiently than the square-system Sobel detector as the mask weights are fewer in number and are all unity. On a SISD computer the hexagonal program is computed in 55% of the time required by the Sobel program. The accuracy of the two detectors was found to be equivalent, with the hexagonal being more accurate with one type of sensor model, and the square more accurate with a second type.
G. The Visual Appearance of Edges and Features
The visual appearance of monochrome images is illustrated here using hexagonal and rectangular sampled images of a sand core used for metal casting. The core contained three small surface scratch defects that can be seen in Figs. 43 and 44. The illumination employed divided the image of
FIGURE 43. Rectangular sampled sand core image, 64 x 64 resolution.
FIGURE 44. Hexagonal sampled sand core image, 64 x 64 resolution.
each defect into a bright and a dark (shadow) segment. There is one large circular defect, a long thin defect, and a small defect with dimensions comparable with the pixel size. The core was 14 cm high. The large circular defect has a diameter of 6 mm, the long thin scratch has dimensions 30 mm by 2 mm, and the small circular defect has a diameter of 1 mm. On comparing the square and hexagonal images in Figs. 43 and 44, the defects can generally be seen more clearly in the hexagonal. The offsetting of pixels on alternate lines enables the eye to trace their outlines more readily at this resolution. The large circular defect appears more circular, and the light and dark segments are more easily discerned. The long thin defect is more easily discernible as a connected component. The object edges, which in these examples are near vertical, are easier to localize in the hexagonal image. Long repeating brightness step sequences are observed in the rectangular image, whereas a small castellated effect is observed in the hexagonal. In an attempt to segment the defects from the remainder of the image, the two images were then edge detected using the optimum Sobel and corresponding hexagonal operators introduced in Section V(E). The threshold level was set manually so that the resulting edge images contained, where possible, connected edges around the defects, and so that the number of false detections was minimized. Fig. 45 shows the resulting square edge-detected image, and Fig. 46 the resulting hexagonal image. The large
FIGURE 45. Rectangular sampled sand core image edge detected, 64 x 64 resolution.
FIGURE 46. Hexagonal sampled sand core image edge detected, 64 x 64 resolution.
FIGURE 47. Rectangular sampled sand core image, thinned, 64 x 64 resolution.
circular defect appears to be square in overall shape in the rectangular image and there is a small disconnection in the outline. In the hexagonal image, it appears more circular, and the structure, such as the central dividing line between the light and dark segments, is more easily discerned. There is also a small gap in the outline. The long thin defect has a break in its outline in the square image, whereas the outline is complete in the hexagonal. The equal width of this defect along its length is more discernible in the hexagonal image. The presence of the small defect is indicated by a small group of edge pixels in each image. There are also more detected false edge points in the square image. These are seen as unconnected black pixels in various parts of the image. Figs. 47 and 48 show the square and hexagonal edge-detected images after thinning. The points made above concerning the defects are still evident. The near vertical object edges appear as gradually increasing steps in the rectangular image, whereas in the hexagonal image a castellated effect is visible.
FIGURE 48. Hexagonal sampled sand core image, thinned, 64 x 64 resolution.
1. Human Interpretation
Interpretation of the images depends on the individual observer, and the resolution of the image being viewed. At the low 64 x 64 resolution of the aforementioned images, features are easily discerned in the hexagonal images, and their true shapes, whether circular or rectangular, can be more easily estimated. At the higher resolutions of 256 x 256 and 512 x 512, the aliasing effects at the object edges are less troubling to the eye and may be indiscernible at even higher resolutions. With the offsetting of pixels on alternate lines in the monochrome hexagonal images, the human eye may be able to estimate the boundaries between features more accurately as the pixel boundaries do not align to form long vertical features as in the square system. However, with the binary line images the pixel offsetting may appear troublesome to the human eye. This has been reported by other researchers (Preston et al., 1979). Machine interpretation will not depend on the visual appearance of the image, but on the efficiency of the higher-level processes. High-level processing will be easier if a detected edge contains fewer gaps.
VI. CONCLUSIONS
This chapter has reviewed research on the sampling of images on a hexagonal grid, the processing of hexagonally sampled images by single and multiprocessor computers, and the computation of image-processing operations on both binary and monochrome hexagonally sampled images. In the following, conclusions are drawn on each of these areas, and an attempt is made to answer the questions of when research should be conducted using hexagonally sampled images, and when it may be commercially advantageous to implement a hexagonally sampled image-processing system.
The hexagonal packing of sensors together with a hexagonal sensor shape is found in eyes. Evolution has favored the hexagon. Some man-made sensors have hexagonal shapes, and others circular or rectangular shapes. Each of these shapes has been shown to pack together efficiently on a hexagonal grid. A high fill factor, or complete tiling of the area, can lead to a high signal-to-noise ratio; however, for integrated sensor arrays, fill factors below 100% are necessary as communication circuits are required on the surface of the chip to transfer the image signals to the processor.
Two-dimensional sampling theory was reviewed, with consideration being given to the aliasing of high-frequency components and the necessity to band limit analog signals before digitization to prevent this. If signals are circularly bandlimited, then their high-frequency information content is limited equally for any direction within the image. This is advantageous, as a feature detected when presented at one orientation to the sensor array can, in theory, be equally well detected when presented at any other orientation. If signals have been circularly bandlimited, then the hexagonal grid is more efficient than the square as 13.4% fewer sampling points will be required to give equal high-frequency information.
This reduction in the number of sampling points for the hexagonally sampled images leads to reductions in image-storage requirements and faster subsequent processing. A circular bandlimit has many advantages, and it was shown to be achievable for two CCD TV camera-frame grabber systems. The first discrete stage in such a system is the CCD sensor array, and if a circular bandlimit is to be achieved, then the lens and the active area of the sensor that integrates the brightness signal focused on it will be the main frequency-limiting components. The modulation transfer function (MTF) of the lens can be modeled most simply by the diffraction limit. More sophisticated models include aberration limits, but these are best evaluated using a specialist CAD system at the time the lens is designed. Simple 1-D sensor models regard the sensor as a spatial window. Transforming this window to the Fourier domain results in a spectrum that is a sinc function of spatial frequency.
The theoretical MTF of the system can be found by combining the lens and sensor MTFs, but this was found to overestimate the cutoff frequency. A knife-edge method for measuring the MTF of these discrete systems was outlined and applied to six TV camera-frame grabber systems. A circular bandlimit was found for two systems, and an elliptical bandlimit with a high vertical cutoff frequency was found for the others. Methods to reduce the vertical frequency response to make the bandlimiting more circular were discussed. Once a hexagonal image has been acquired it can be processed using a conventional SISD or a multiprocessor computer. Hexagonal images can be processed by most computer architectures capable of processing square images. With some architectures the structure of the processor interconnections is fixed by the image-sampling grid. When processing hexagonal images, each processor will be connected to six neighbors, and when processing square images, each processor will be connected to four or eight neighbors. For some machines of this type it is possible to set up the connections to enable both types of sampled image to be processed. With architectures based on the sampling pattern, the processing task is divided between processors using spatial criteria. Other divisions are possible and can make it easier to use general-purpose multiprocessors such as shared-memory, hypercube, or pipeline systems. It depends on the application how the task is best divided between processors. Communications need to be established between the processors, and within a 2-D plane, general-purpose systems employing six-way (hexagonal) communications have been realized. Within a computer program a square image will map directly into a 2-D array, and 2-integer indexing is possible. For hexagonal images several indexing methods have been proposed, but the 3-integer scheme appears to be the most efficient for general use.
Hexagonal pyramid systems are interesting to research, first because they can be used to model processing within the human visual cortex, and second because the structure enables the efficient processing of low-, medium-, and high-level operations on arrays of different processors at each level in the pyramid. An example of a pipeline processor that was capable of processing both square and hexagonal images was given. Small changes to each processor element were required to enable this dual role. For hexagonal-only processing, less data storage (13.4%), a lower clock rate for real-time operation (13.4%), fewer multipliers and adders, but twice as many convolution masks were required compared to square image processing. For recirculating pipelines faster processing was possible with hexagonal images. The processing of binary hexagonal images was reviewed. It is preferred by some researchers to square processing due to its simple definition of connectivity, its large possible rotation group, and the 13.4% reduction in
the number of sampling points. Advantages have been found for hexagonal images with distance measurement, distance functions, morphologic operators, and skeletonizing programs. Two similar skeletonizing algorithms, one for hexagonal and one for square images were compared. Both were designed according to the same criterion, and had been proved to produce good-quality skeletons. On a single-processor computer, the hexagonal program was found to calculate the skeleton in 55% of the time required by the square program. The processing of monochrome hexagonal images was reviewed. Hexagonal FFT algorithms have been researched (Mersereau and Speake, 1981; Guessoum and Mersereau, 1986), and in a comparison with a similar square program, a hexagonal program was shown to require 25% less storage of complex variables and to exhibit a 25% increase in computational efficiency. Geometric transforms have been researched (Her, 1995) and shown to compute efficiently when a 3-integer image indexing scheme was used. Hexagonal and square convolution filters have been compared (Mersereau, 1979; Mersereau and Speake, 1983). Due to the symmetry of the filter weight masks savings of up to 58% in memory and computations were demonstrated. The convolution mask weights tend to be arranged in equal value shells around a central value. Fewer shells are required to cover a particular area if hexagonal sampling is used. The details of the design of a hexagonal edge detector and its comparison with the square-system Sobel detector have been presented. Again the symmetry of the hexagonal convolution masks that leads to unit weight coefficients resulted in a detector with a similar accuracy to the Sobel detector, but that could be computed in 55% of the time required by the Sobel detector. In Section V, a pair of resampled hexagonal and square-grid images was used for a visual comparison of edges and features. 
The hexagonal sampling enabled defects to be seen more clearly and their size better estimated. It is possible that the eye was able to estimate boundaries more accurately when the pixels were offset in the hexagonal image. After edge detection and line thinning, the hexagonal edges were better connected, but the “zipper” effect caused by the offsetting of pixels in a thin vertical line of binary brightness was not as pleasing to the observer as the single-pixel-thick lines in the square edge map.

To answer the question of when hexagonally sampled images should be used, the following conclusions can clarify the choice:

* The quality of circularly bandlimited images is similar between hexagonally and square-sampled images. This has been shown theoretically in terms of information content, and in practice by observation.
* Hexagonally sampled images can be processed by most types of computer.
* A circularly bandlimited hexagonal image requires 13.4% less storage than a square image.
* For image processes of a similar quality, a hexagonal process may compute in only 55% of the time required by the square process.
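The 13.4% storage figure can be checked with a line of arithmetic. The sketch below assumes the standard sampling result that, for a circularly bandlimited image, the sparsest alias-free hexagonal grid needs √3/2 times as many samples per unit area as the sparsest square grid. The function name is hypothetical and chosen only for illustration.

```python
import math

def hexagonal_sample_saving():
    """Fractional reduction in sample count, hexagonal versus square grid.

    Assumption (not derived here): for a circularly bandlimited image the
    minimum hexagonal sampling density is sqrt(3)/2 of the minimum square
    sampling density.
    """
    density_ratio = math.sqrt(3) / 2   # hexagonal density / square density
    return 1.0 - density_ratio

saving = hexagonal_sample_saving()
print(f"{100 * saving:.1f}% fewer samples")   # prints "13.4% fewer samples"
```

Because storage scales with the number of samples, the same fraction applies to the storage saving quoted in the list above.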
Hexagonal sampling and processing will always be important when modeling processes in human vision. For general research the position is less clear, as vast libraries of software and a large choice of hardware are available to support the square scheme. This support is important if new ideas are to be tested and published quickly. Processing speed is important for real-time applications. At present, if a computer is not fast enough, the researcher can rely on a faster one shortly becoming available. For this group of researchers, switching to hexagonal processing may enable them to stay one jump ahead of the computer technology. The author has found that researching hexagonal processes at the same time as square processes can often lead to a deeper understanding of the problem. Commercially, the higher processing speed and reduced storage requirement of hexagonally sampled images may be attractive. The printing of images and text on a hexagonal grid has already been done (Ulichney, 1987). Other self-contained products such as document scanners could well be produced at a lower cost if hexagonal processing were employed.
REFERENCES

Annaratone, M., Arnold, E., Gross, T., Kung, H. T., Lam, M. S., Menzilcioglu, O., Sarocky, K., and Webb, J. A. (1986). Warp architecture and implementation. Proc. IEEE 13th Int. Symposium on Computer Architecture, 346-356.
Batchelor, B. G., Hill, D. A., and Hodgson, D. C. (1985). Automated Visual Inspection, Bedford, UK: IFS Publications Ltd.
Bell, S. B. M., Holroyd, F. C., and Mason, D. C. (1989). A digital geometry for hexagonal pixels. Image and Vision Computing, 7(3): 194-204.
Black, G. and Linfoot, E. H. (1957). Spherical aberration and the information content of optical images. Proc. Roy. Soc. A, 239: 522-540.
Boudin, J. P., Wang, D., Lecoq, J. P., and Xuan, N. P. (1998). Model for the charged coupled video camera and its application to image reconstruction. Optical Engineering, 37(4): 1268-1274.
Bourbakis, N. G. and Mertoguno, J. S. (1996). Kydon: An autonomous multi-layer image-understanding system: Lower layers. Engineering Applications of Artificial Intelligence, 9(1): 43-52.
Bowman, C. C. and Batchelor, B. G. (1987). Kiwivision a high speed architecture for machine vision. Proc. SPIE, 849: 42-51.
Bowman, C. C. (1988). Getting the most from your pipelined processor. Proc. SPIE, 1004: 202-210.
Burt, P. J., Anderson, C. H., Sinniger, J. O., and van der Wal, G. (1986). A pipelined pyramid machine. In Pyramidal Systems for Computer Vision (V. Cantoni and S. Levialdi, eds.). Berlin: Springer Verlag, pp. 133-152.
Burton, J., Miller, K., and Park, S. (1991). Fidelity metrics for hexagonally sampled digital imaging systems. J. Imaging Technology, 17(6): 279-283.
Cantoni, V. and Levialdi, S. (1988). Multiprocessor computing for images. Proc. IEEE, 76(8): 959-969.
Cox, J. A. (1987). Point source location using hexagonal detector arrays, Optical Engineering, 26(1): 69-74.
Cox, J. A. (1989). Advantages of hexagonal detectors and variable focus for point source sensors, Optical Engineering, 28(11): 1145-1150.
Crisman, J. D. and Webb, J. A. (1991). The warp machine on navlab, IEEE Trans. PAMI, 13(5): 451-465.
Curcio, C. A., Sloan, K. R., Packer, O., Hendrickson, A. E., and Kalina, R. E. (1987). Distribution of cones in human and monkey retina: Individual variability and radial asymmetry. Science, 236: 579-582.
Datacube Inc. (1989). Maxvideo System, Peabody, MA.
Davies, E. R. (1984). Circularity a new principle underlying the design of accurate edge orientation operators, Image and Vision Computing, 2(3): 134-142.
Davies, E. R. (1990). Machine Vision: Theory, Algorithms, Practicalities. London: Academic Press.
Delbruck, T. (1993). Silicon retina with correlation-based, velocity-tuned pixels. IEEE Trans. Neural Networks, 4(3): 529-541.
Deutsch, E. S. (1972). Thinning algorithms on rectangular hexagonal and triangular arrays. Communications ACM, 15(9): 827-837.
Dolter, J. W., Ramanathan, P., and Shin, K. G. (1991). Performance analysis of virtual cut-through switching in HARTS: A hexagonal mesh multicomputer, IEEE Trans. Computing, 40(6): 669-679.
Downton, A. C., Tregidgo, R. W. S., and Cuhadar, A. (1994). Top-down structured parallelization of embedded image-processing applications, IEE Proc. Vision Image and Signal Processing, 141(6): 431-437.
Downton, A. and Crookes, D. (1998). Parallel architectures for image processing, IEE Electronics Communication Engineering J., 10(3): 139-151.
Dudgeon, D. E. and Mersereau, R. M. (1984). Multidimensional Digital Signal Processing, Englewood Cliffs, NJ: Prentice-Hall Inc.
Duff, M. J. B. (1985). Real Applications on Clip4. In Integrated Technology for Parallel Image Processing (S. Levialdi, ed.). London: Academic Press, pp. 153-165.
Dyer, C. R. (1989). Introduction to the special section on computer architectures and parallel algorithms, IEEE Trans. PAMI, 11(3): 225-226.
Ekmecic, I., Tartalja, I., and Milutinovic, V. (1996). A survey of heterogeneous computing: Concepts and systems. Proc. IEEE, 84(8): 1127-1143.
Fisher, A. L., Kung, H. T., Monier, L. M., Walker, H., and Dohi, Y. (1983). Design of the psc: a programmable systolic chip, Proc. 3rd Caltech Conf. on VLSI, pp. 287-302.
Flynn, M. J. (1966). Very high speed computing systems. Proc. IEEE, 54(12): 1901-1909.
Fountain, T. J. (1987). Processor Arrays Architecture and Applications. London: Academic Press.
Gaskill, J. D. (1978). Linear Systems, Fourier Transforms, and Optics. New York: Wiley.
Golay, M. J. E. (1969). Hexagonal parallel pattern transformations, IEEE Trans. Computers, 18(8): 733-740.
Goldstein, M. D. and Nagler, M. (1987). Real time inspection of a large set of surface defects in metal parts, Proc. SPIE, 849: 184-190.
Gonzalez, R. C. and Woods, R. E. (1992). Digital Image Processing. Reading, MA: Addison Wesley.
Guessoum, A. and Mersereau, R. M. (1986). Fast algorithms for the multidimensional discrete Fourier transform. IEEE ASSP, 34(4): 937-943.
Guo, Z. and Hall, R. W. (1989). Parallel thinning algorithms: Parallel speed and connectivity preservation. Communications ACM, 32(1): 124-131.
Handler, W. (1984). Multiprozessoren fur breite answendungsgebiete erlangen, general purpose array. GI NTG Fachtagung Architektur und Betrieb von Rechensystemen Informatik Fachberichte. Berlin: Springer-Verlag, pp. 195-208.
Hanzal, B. R., Joseph, J. D., Cox, J. A., and Schwanebeck, J. C. (1985). PtSi hexagonal detector focal plane arrays, Proc. SPIE, 570: 163-171.
Haralick, R. M., Sternberg, S. R., and Zhuang, X. (1987). Image analysis using mathematical morphology, IEEE Trans. PAMI, 9(4): 532-550.
Hartman, P. and Tanimoto, S. (1984). A hexagonal pyramid data structure for image processing, IEEE Trans. SMC, 14(2): 247-256.
Helmholtz, H. L. F. (1911). Handbuch der Physiologischen Optik. Volume 2, Hamburg, Germany: Verlag von Leopold Voss.
Helmholtz, H. L. F. (1962). Treatise on Physiological Optics. Volume 2 (Translated by J. P. C. Southall), New York: Dover Publications.
Her, I. (1995). Geometric transformations on the hexagonal grid, IEEE Trans. Image Processing, 4(9): 1213-1222.
Illingworth, J. and Kittler, J. (1988). A survey of the Hough transform, Computer Vision Graphics and Image Processing, 44: 87-116.
Inmos Ltd. (1989). The Transputer Databook, Second edition, Bristol, UK.
Jang, B. K. and Chin, R. T. (1990). Analysis of thinning algorithms using mathematical morphology, IEEE Trans. PAMI, 12(6): 541-551.
Kamgar-Parsi, B. and Kamgar-Parsi, B. (1989). Evaluation of quantization error in computer vision, IEEE Trans. PAMI, 11(9): 929-940.
Kamgar-Parsi, B. and Kamgar-Parsi, B. (1992). Quantization error in hexagonal sensory configurations, IEEE Trans. PAMI, 14(6): 665-671.
Kobayashi, H., Matsumoto, T., and Sanekata, J. (1995). Two dimensional spatio-temporal dynamics of analog image processing neural networks, IEEE Trans. Neural Networks, 6(5): 1148-1164.
Kovac, M. and Ranganathan, N. (1995). Jaguar: a fully pipelined VLSI architecture for JPEG image compression standard, Proc. IEEE, 83(2): 247-258.
Lam, L., Lee, S. W., and Suen, C. Y. (1992). Thinning methodologies, a comprehensive survey, IEEE Trans. PAMI, 14(9): 869-885.
Lee, J. W., Yang, M. H., Kang, S. H., and Choe, Y. (1997). An efficient pipelined parallel architecture for blocking effect removal in HDTV, IEEE Trans. Consumer Electronics, 43(2): 149-156.
Lee, J. W., Park, J. W., Yang, M. H., Kang, S. H., and Choe, Y. (1998). Efficient algorithm and architecture for post-processor in HDTV, IEEE Trans. Consumer Electronics, 44(1): 16-26.
Lenoir, F., Bouzar, S., and Gauthier, M. (1989). Parallel architecture for mathematical morphology, Proc. SPIE, 1199: 471-482.
Li, H. and Kender, J. R. (1988). Special issue on computer vision scanning the issue, Proc. IEEE, 76(8): 859-862.
Li, H. and Maresca, M. (1989). Polymorphic torus architecture for computer vision. IEEE Trans. PAMI, 12(3): 233-243.
Lin, T. P. and Hsieh, C. H. (1994). A modular and flexible architecture for real-time image template matching, IEEE Trans. Circuits and Systems: I-Fundamental Theory and Applications, 41(6): 457-461.
Lougheed, R. M. and McCubbrey, D. L. (1980). The cytocomputer a practical pipelined image processor, 7th. Int. Symposium in Comput. Architecture, pp. 271-277.
Lougheed, R. M. (1985). A high speed recirculating neighborhood processing architecture. Proc. SPIE, 534: 22-33.
Luck, R. L. (1986). Using PIPE for inspection applications, Proc. SPIE, 730: 12-19.
Luck, R. L. (1987). Implementing an image understanding system architecture using pipe, Proc. SPIE, 489: 35-41.
Luczak, E. and Rosenfeld, A. (1976). Distance on a hexagonal grid, IEEE Trans. Comput., 25: 532-533.
Maresca, M., Lavin, M. A., and Hungwen, L. (1988). Parallel architectures for vision, Proc. IEEE, 76(8): 970-981.
Matheron, G. (1975). Random Sets and Integral Geometry, New York: Wiley.
McCafferty, J. D., Fryer, R. J., Codutti, S., and Monai, G. (1987). Edge detection algorithm and its video rate implementation, Image and Vision Computing, 5(2): 155-160.
McCormick, B. (1963). The Illinois pattern recognition computer-Illiac 3, IEEE Trans. Electronic Computers, 12(6): 791-813.
McIlroy, C. D., Linggard, R., and Monteith, W. (1984). Hardware for real time image processing, IEE Proc. Part E, 131(6): 223-229.
Mersereau, R. M. (1979). The processing of hexagonally sampled two dimensional signals, Proc. IEEE, 67(6): 930-949.
Mersereau, R. M. and Speake, T. C. (1981). A unified treatment of Cooley-Tukey algorithms for the evaluation of multidimensional DFT, IEEE Trans. ASSP, 29(5): 1011-1018.
Mersereau, R. M. and Speake, T. C. (1983). The processing of periodically sampled multidimensional signals, IEEE ASSP, 31(1): 188-194.
Meyer, F. (1988). Skeletons in digital spaces. In Image Analysis and Mathematical Morphology, Volume 2: Theoretical Advances (J. Serra, ed.). London: Academic Press.
Minami, T., Kasai, R., Yamauchi, H., Tashiro, Y., Takahashi, Y., and Data, S. (1991). A 300-mops video signal processor with a parallel architecture, IEEE J. Solid-State Circuits, 26(12): 1868-1875.
Mitchell, J. W. (1993). The silver halide photographic emulsion grain. J. Imaging Science and Technology, 37(4): 331-343.
Mylopoulos, J. P. and Pavlidis, T. (1971). On the topological properties of quantized spaces: II connectivity and order of connectivity. J. Assoc. Comput. Machinery, 18(2): 247-254.
Nudd, G. R., Grinberg, J., Etchells, R. D., and Little, M. (1985). The application of three dimensional microelectronics to image analysis, In Integrated Technology for Parallel Image Processing (S. Levialdi, ed.). London: Academic Press, pp. 256-282.
Nudd, G. R., Atherton, T. J., Howarth, R. M., Clippingdale, S. C., Francis, N. D., Kerbyson, D. J., Packwood, R. A., Vaudin, G. J., and Walton, D. W. (1989). WPM: A multiple-simd architecture for image processing, IEE 3rd Int. Conf. on Image Proc., Warwick, UK, Publication No. 307, 161-165.
Nyquist, H. (1928). Certain topics in telegraph transmission theory, Trans. AIEE, 47: 617-644.
Oakley, J. P. and Cunningham, M. J. (1990). A function space model for digital image sampling and its application to image reconstruction, Computer Vision Graphics and Image Processing, 49: 171-197.
Olson, T. J., Taylor, J. R., and Lockwood, R. J. (1996). Programming a pipelined image processor, Computer Vision and Image Understanding, 64(3): 351-367.
Petersen, D. P. and Middleton, D. (1962). Sampling and reconstruction of wave number limited functions in n dimensional euclidean spaces, Information and Control, 5: 279-323.
Preston, K. (1971). Feature extraction by Golay hexagonal pattern transforms, IEEE Trans. Computers, 20(9): 1007-1014.
Preston, K., Duff, M. J. B., Levialdi, S., Norgren, P. E., and Toriwaki, J. (1979). Basics of cellular logic with some applications in medical image processing, Proc. IEEE, 67(5): 826-856.
Ray, S. F. (1988). Applied Photographic Optics. London: Focal Press.
Reichenbach, S. E., Park, S. K., and Narayanswamy, R. (1991). Characterizing digital image acquisition devices, Optical Engineering, 30(2): 170-177.
Rivard, G. E. (1977). Direct fast Fourier transform of bivariate functions, IEEE Trans. ASSP, 25: 250-252.
Rosenfeld, A. and Pfaltz, J. L. (1968). Distance functions on digital pictures, Patt. Rec., 1: 33-61.
Rosenfeld, A. (1970). Connectivity in digital pictures, J. Assoc. Comput. Machinery, 17(1): 146-160.
Rosenfeld, A. and Kak, A. C. (1982). Digital Picture Processing. Volume 1, New York: Academic Press.
Ruetz, P. A. and Broderson, R. W. (1986). A custom chip set for real time image processing, Conf. ICASSP Tokyo, 801-804.
Schroder, D. K. (1980). Extrinsic silicon focal plane arrays, In Charge Coupled Devices (D. F. Barbe, ed.). New York: Springer-Verlag.
Sensiper, M., Boreman, G. D., Ducharme, A. D., and Snyder, D. R. (1993). Modulation transfer function testing of detector arrays using narrow-band laser speckle, Optical Engineering, 32(2): 395-400.
Serra, J. (1982). Image Analysis and Mathematical Morphology. London: Academic Press.
Serra, J. (1986). Introduction to mathematical morphology, Computer Vision Graphics and Image Processing, 35: 283-305.
Serra, J. (1988). Image Analysis and Mathematical Morphology. Volume 2, Theoretical Advances. London: Academic Press.
Sharp, E. D. (1961). A triangular arrangement of planar-array elements that reduces the number needed, IRE Trans. Antennas Propagat., 3: 445-476.
Sheu, M. H., Wang, J. F., Chen, A. N., Suen, A. N., and Jeang, Y. L., Lee, J. Y. (1992). A data-reuse architecture for gray-scale morphologic operations, IEEE Trans. Circuits and Systems, II-Analog and Digital Signal Processing, 39(10): 753-756.
Shih, F. Y. and King, C. P., Pu, C. C. (1995). Pipeline architectures for recursive morphological operations, IEEE Trans. Image Processing, 4(1): 11-18.
Smith, R. W. (1987). Computer processing of line images a survey, Patt. Rec., 20(1): 7-15.
Staunton, R. C. (1989). The design of hexagonal sampling structures for image digitization and their use with local operators, Image and Vision Computing, 7(3): 162-166.
Staunton, R. C. and Storey, N. (1990). A comparison between square and hexagonal sampling methods for pipeline image processing, Proc. SPIE, 1194: 142-151.
Staunton, R. C. (1996a). An analysis of hexagonal thinning algorithms and skeletal shape representation, Patt. Rec., 29(7): 1131-1146.
Staunton, R. C. (1996b). Edge detector error estimation incorporating CCD camera limitations, IEEE Norsig94 Signal Processing Conference, Espoo, Finland, pp. 243-246.
Staunton, R. C. (1997a). Measuring the high frequency performance of digital image acquisition systems, IEE Electronics Letters, 33(17): 1448-1450.
Staunton, R. C. (1997b). Measuring image edge detector accuracy using realistically simulated edges, IEE Electronics Letters, 33(24): 2031-2032.
Staunton, R. C. (1998). Edge operator error estimation incorporating measurements of CCD TV camera transfer function, IEE Proc. Vision, Image and Signal Processing, 145(3): 229-235.
Sternberg, S. R. (1986). Greyscale morphology, Computer Vision, Graphics and Image Processing, 35: 333-355.
Storey, N. and Staunton, R. C. (1989). A pipeline processor employing hexagonal sampling for surface inspection, 3rd Int. Conf. on Image Processing and Its Applications, IEE Conference Publication No. 307, 156-160.
Storey, N. and Staunton, R. C. (1990). An adaptive pipeline processor for real-time image processing, Proc. SPIE, 1197: 238-246.
Tanimoto, S. L., Ligocki, T. J., and Ling, R. (1987). A prototype pyramid machine for hierarchical cellular logic. In Parallel Computer Vision (L. Uhr, ed.). Boston: Academic Press, pp. 43-83.
Tzannes, A. P. and Mooney, J. M. (1995). Measurement of the modulation transfer function of infrared cameras, Optical Engineering, 34(6): 1808-1817.
Ulichney, R. (1987). Digital Halftoning, Cambridge, MA: MIT Press.
Valkenburg, R. J. and Bowman, C. C. (1988). Kiwivision II a hybrid pipelined multitransputer architecture for machine vision, Proc. SPIE, 1004: 91-96.
Wandell, B. A. (1995). Foundations of Vision, Sunderland, MA: Sinauer Associates Inc.
Watson, A. B. and Ahumada, A. J. (1989). A hexagonal orthogonal oriented pyramid as a model of image representation in visual cortex, IEEE Trans. BME, 36(1): 97-106.
Wehner, B. (1989). Parallel recirculating pipeline for signal and image processing, Proc. SPIE, 1058: 27-33.
Whitehouse, D. J. and Phillips, M. J. (1985). Sampling in a two-dimensional plane, J. Physics A, Math. Gen., 18: 2465-2477.
Zandhuis, J. A., Pycock, D., Quigley, S. F., and Webb, P. W. (1997). Sub-pixel non-parametric PSF estimation for image enhancement, IEE Proc. Vis. Image Signal Process., 144(5): 285-292.
ADVANCES IN IMAGING AND ELECTRON PHYSICS. VOL. 107
The Group Representation Network: A General Approach to Invariant Pattern Classification

JEFFREY WOOD

ISIS Group, Department of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, U.K.
I. Pattern Classification and the Invariance Problem
II. Group Representation Theory
    A. Irreducible Representations
    B. Direct Sum and Tensor Product of Representations
    C. Homomorphisms and Intertwining Spaces
    D. Characters
    E. Frobenius Reciprocity
    F. Special Classes of Representations
III. Linear and Nonlinear Concomitants
    A. Linear Concomitants
    B. Transmutation
    C. Fixed Weight Group Representation Networks
    D. Redundancy of Noninduced Representations
IV. Adaptivity in Group Representation Networks
    A. Parameterized Homomorphisms
    B. Parameterized Homomorphisms for Induced Representations
    C. Number of Parameters and Parameter Reduction
    D. Algorithm for Group Representation Network Construction
    E. Symmetry Networks
V. Practical Considerations and Simulations
    A. Learning Algorithms
    B. The Learning Process
    C. Generalization
    D. Discriminability
    E. Simulations
VI. The Computational Power of the Group Representation Network Model
    A. Polynomial Invariants
    B. Construction of Basic Invariants
VII. The Group Representation Network and Other Invariant Classification Methods
    A. Integral Transform Invariants
    B. Fast Translation-Invariant Transforms
    C. Invariant First Order Networks
    D. Higher-Order Neural Networks
    E. Moment Invariants
VIII. Summary and Open Questions
Proof of Theorem III.1
References
Volume 107 ISBN 0-12-014749-1
Crown copyright 1995. Published with permission of DERA on behalf of the Controller of HMSO. All rights of reproduction in any form reserved. 1076-5670/99 $30.00
I. PATTERN CLASSIFICATION AND THE INVARIANCE PROBLEM
Any computational scheme for the modeling of an unknown function requires prior knowledge (even if that knowledge amounts only to an estimate of the complexity of the function to be modeled). Furthermore, the more prior knowledge is used by the modeling algorithm, the greater its performance is expected to be. This is a crucial consideration in classification problems such as pattern recognition. In this context, prior knowledge can take many forms: for example, in an image classification problem, that the objects of interest are dark; or in a signal processing problem, that the important part of the signals has frequency in a given bandwidth. Often, the incorporation of prior knowledge involves construction of the pattern classifier in such a way that certain properties of the input data are transparent to it. We will be concerned here with the problem in which the prior knowledge to be included is that there are certain transformations of the input domain under which the classification remains unchanged. Rephrased, the pattern classifier should be constructed in such a manner that it ignores the application of such input transformations; that is, it should be invariant under those transformations. More specifically, we are going to suppose that the transformations concerned are linear, and that they form a group. Although slightly restrictive, the first assumption covers many problems of interest; the second assumption is quite natural, and amounts only to supposing that each invariance transformation is invertible. The problem now becomes one of invariance under a given group (or more precisely, under a given group representation). Examples of group invariance problems include the following:
1. Recognition of a signal, independently of linear translations along the time axis.
2. Character recognition, the classification being unchanged by translation in two dimensions.
3. Image identification, independently of two-dimensional (2-D) translation, rotation, and scaling.
4. Classification of an object specified by three-dimensional (3-D) coordinates, invariant under the 3-D rigid motion group (i.e., the group of linear transformations that can be performed on a physical object).
5. Checking for membership of a binary string in a given cyclic code, where cyclic permutation will not change the membership property of a string.
6. The bipolar parity problem, in which the classification of an n-bit string is invariant under multiplication by -1 of an even number of bits.
7. Recognition of a graph, independent of permutations of the vertices (i.e., the graph isomorphism problem).

The author proposes a highly general model for the group invariance problem. This model is called the group representation network (GRN). In principle, a GRN can be constructed for any linear transformation invariance problem, though to date the supporting theory has only been developed for the case of a finite (or compact) invariance group. This universality makes the GRN particularly useful for problems for which there exist no established methods for producing invariant pattern classifiers: that is, those for which the invariance group is “unusual.” Keeping this basic intention in mind, our approach to the invariance problem will be discussed and the contents of this chapter will be described.

There are two principal ways of solving the invariant pattern classification problem. The first is to extract a set of features from the inputs that are invariant under the given group, and then to process these features using some standard pattern classifier. Examples of this method include Fourier analysis or integral transform-based methods (Caelli and Liu, 1988; Sheng and Arsenault, 1986; Wechsler, 1987), and the use of moment invariants (Hu, 1962; Li, 1992). The second method is to build an adaptive invariant, that is, a function that is parameterized (and can thus be adapted to learn a desired mapping) and that remains invariant under the prescribed transformations for all values of these parameters. The second method includes a number of neural network-type approaches, such as higher-order networks (Giles and Maxwell, 1987; Spirkovska and Reid, 1992). The first method is conceptually easier but involves certain difficulties.
When performing an initial feature extraction, there is the problem of choosing the features. These features have to be invariant under the transformation group, and they must contain enough information for the pattern classification process. In addition, it is important that the extracted features do not have additional invariances that will render the classification problem impossible. Often these problems are solved by choosing the features to be a complete set of invariants, that is, a set of invariants under the transformation group which allow an arbitrarily accurate distinction between any two inputs not in the same orbit. However, the construction of a complete set of invariants is, in general, a difficult task. Moreover, the principle of Occam’s Razor dictates that it is important not to use too many features, as this may result in an unnecessarily complex classification system. We will instead advocate the second method of constructing adaptive invariants. Our approach will be to identify a class of basic building blocks
or modular units. These units will each be parameterized, and, furthermore, each will have a property (in some sense) of transmitting the group action from its inputs to its outputs. This will be explained in detail in Section III. By connecting these modular units together in a very flexible fashion, we will be able to build invariant systems of arbitrary functional complexity. These are group representation networks (GRNs). We can match the complexity of the GRN to the complexity of the invariance problem. Furthermore, although our construction method requires knowledge of the group’s representations, it does not require knowledge of the group’s invariants. As information on the representations of most groups is more readily available or more easily constructed than information on the group’s invariants, the general problem of invariant pattern classification becomes more tractable with this approach. Another advantage of our approach is that it generalizes readily to problems where the action of the group on the system output (the classification domain) is linear but not trivial. This is a problem of group concomitance rather than one of invariance. An example might be a signal processing problem in which a translation of the input is required to result in the same translation applied to the output.

Section II provides background material on group representation theory. Section III introduces the GRN, and defines and analyzes the simple “modular units” comprising it. This section also shows that a GRN can be naturally viewed as an artificial feedforward neural network. The next section introduces the adaptivity into the GRN and provides formulas for its parameterized structures (the weight matrices), and an algorithm for general GRN construction.
Section V discusses the practical issues of learning, discriminability, and generalization in a GRN, and describes several simulations in which GRNs are shown to have better learning and/or generalization performances than comparable neural networks without in-built invariance. Section VI presents the conjecture that any group invariant can be approximated to an arbitrary desired degree of accuracy by some GRN. In support of this, it is shown that any polynomial invariant under a real finite-dimensional representation of a finite group can be computed exactly by a GRN. Section VII adds further weight to the conjecture by demonstrating that many standard invariants used in pattern recognition (e.g., integral transform invariants) can be viewed as GRNs. Section VIII provides a summation. This material is taken largely from the author’s thesis (Wood, 1995). The concept of a group representation network (GRN) is based on the symmetry networks of Shawe-Taylor (1989; 1993), which form an important subclass of GRNs. The structure of a simplified GRN model was discussed in Wood
and Shawe-Taylor, 1996a. The results relating other invariant pattern classification techniques to GRNs have been presented in Wood and Shawe-Taylor, 1996b. The general form of the GRN model and the discussion of computational power in Section VI have not previously appeared in the literature, other than in the form of a thesis (Wood, 1995).
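The difference between invariance and concomitance discussed in this section can be illustrated on the simplest case, cyclic translation of a finite signal. The following is a minimal sketch, not the GRN construction itself: the magnitudes of the discrete Fourier coefficients serve as an example of invariant features, and a circular filter serves as an example of a map that passes a translation of its input through to its output. All function names, the test signal, and the filter weights are hypothetical and chosen only for illustration.

```python
import cmath

def cyclic_shift(x, k):
    """Translate the sequence x by k places, wrapping around at the ends."""
    n = len(x)
    return [x[(i - k) % n] for i in range(n)]

def dft_magnitudes(x):
    """Moduli of the discrete Fourier coefficients of x."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n)))
        for f in range(n)
    ]

def circular_filter(x, w):
    """Circular convolution of the signal x with the weights w."""
    n = len(x)
    return [sum(w[j] * x[(i - j) % n] for j in range(n)) for i in range(n)]

def close(a, b, tol=1e-9):
    """True if two equal-length sequences agree entry by entry within tol."""
    return all(abs(p - q) <= tol for p, q in zip(a, b))

x = [0.3, -1.2, 2.0, 0.7, -0.5, 1.1]    # arbitrary test signal
w = [0.5, 0.25, 0.0, 0.0, 0.0, -0.25]   # arbitrary filter weights

# Invariance: the features do not change when the input is translated.
invariant_ok = all(
    close(dft_magnitudes(cyclic_shift(x, k)), dft_magnitudes(x))
    for k in range(len(x))
)

# Concomitance: translating the input translates the output the same way.
concomitant_ok = all(
    close(circular_filter(cyclic_shift(x, k), w),
          cyclic_shift(circular_filter(x, w), k))
    for k in range(len(x))
)

print(invariant_ok, concomitant_ok)
```

Both checks hold for every shift and for any choice of weights, which is the sense in which the filter remains concomitant "for all values of its parameters".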
II. GROUP REPRESENTATION THEORY
This section contains a discussion of some elements of group representation theory, which is central to the approach of this paper. For further information, consult any book on representation theory, for example Cohn (1989), Fulton and Harris (1991), or Ledermann (1977). Another very appropriate summary of representation theory and its application to image processing problems is given by Lenz (1990). While the discussion here is limited to finite groups, it also applies (through the use of tools such as Haar integrals (Lenz, 1990)) to finite-dimensional representations of compact groups, including, for example, the natural representations of the classical Lie groups U(n), O(n), SU(n), SO(n).

A. Irreducible Representations
We assume the reader is familiar with the notions of a (not necessarily commutative) group, a subgroup, and a group action. We will also need the notion of conjugacy: a group element $g \in G$ is said to be conjugate to any element of the form $s^{-1}gs$, $s \in G$. A representation of a group is essentially a linear action of that group on some vector space. More formally, a representation of the group $G$ over the field $F$ is a mapping $A$ from $G$ to the set $GL(V)$ of all invertible linear operators on some vector space $V$ over $F$, which satisfies:

$$\forall g_1, g_2 \in G, \qquad A(g_1 g_2) = A(g_1)A(g_2). \tag{1}$$
Thus a representation of a group defines a linear action of the group on some vector space. Note that to specify a representation, it is sufficient to specify it for a set of generators of the group.
Example II.1
1. The natural actions of the classical Lie groups, for example, the action of SO(2) on the space $V = \mathbb{R}^2$ by rotation about the origin.
2. The natural "permutation representations" of the cyclic groups $C_n$, acting by permutation of components on the space $\mathbb{R}^n$. For example, the group $C_3 = \{e, g_1, g_1^2\}$, defined by $g_1^3 = e$, acts on the space $V = \mathbb{R}^3$ according to the representation:

$$A(e) = \begin{pmatrix} 1&0&0\\0&1&0\\0&0&1 \end{pmatrix}, \quad A(g_1) = \begin{pmatrix} 0&0&1\\1&0&0\\0&1&0 \end{pmatrix}, \quad A(g_1^2) = \begin{pmatrix} 0&1&0\\0&0&1\\1&0&0 \end{pmatrix}.$$
We will often use the "cycle notation" for this and other groups: $g_1 = (1\,2\,3)$, $g_1^2 = (1\,3\,2)$. This notation, which refers to a permutation group's abstract structure, means that $g_1$ sends 1 to 2, 2 to 3, and 3 back to 1, whereas $g_1^2$ does the opposite.
3. $C_4$ has a natural permutation representation similar to that just discussed for $C_3$. Another representation of $C_4$ is the one-dimensional (1-D) complex representation

$$A(e) = (1), \quad A((1\,2\,3\,4)) = (i), \quad A((1\,3)(2\,4)) = (-1), \quad A((1\,4\,3\,2)) = (-i),$$

acting on the space $V = \mathbb{C}$.
4. Consider the symmetric group $S_n$ of all permutations of $n$ objects. One representation of $S_n$ is given by permutation of the components of $\mathbb{R}^n$, as in Example 2 for the cyclic group. However, the group $S_n$ has another action on the set $X$ of all unordered pairs $(i, j)$ of distinct elements of $\{1, \ldots, n\}$: a group element $g$ takes $(i, j)$ to $(g(i), g(j))$. We can interpret this action as the permutation of the edges of an $n$-vertex graph induced by a permutation of the vertices. The given action induces an action of $S_n$ on the set of all functions $f$ from $X$ to $\mathbb{R}$, according to $(gf)(i, j) = f(g^{-1}(i), g^{-1}(j))$. This is a linear action of $S_n$, that is, a representation of the group. In the case $n = 4$, there are six distinct unordered pairs of elements, and so the vector space in question is the space $V = \mathbb{R}^6$. We can identify the natural basis vectors with the unordered pairs (1,2), (1,3), (1,4), (2,3), (2,4) and (3,4), respectively. With this correspondence, and using the standard cycle notation again, the representation is generated by
$$A((1\,2\,3\,4)) = \begin{pmatrix} 0&0&1&0&0&0\\0&0&0&0&1&0\\0&0&0&0&0&1\\1&0&0&0&0&0\\0&1&0&0&0&0\\0&0&0&1&0&0 \end{pmatrix}, \qquad A((1\,2)) = \begin{pmatrix} 1&0&0&0&0&0\\0&0&0&1&0&0\\0&0&0&0&1&0\\0&1&0&0&0&0\\0&0&1&0&0&0\\0&0&0&0&0&1 \end{pmatrix}.$$

5. The group $\mathbb{R}$ (with group operation addition) acts on the set $V$ of all functions from $\mathbb{R}$ to $\mathbb{C}$ by translation, that is, for any real $\tau$, an operator $A(\tau)$ on this set of functions is defined for each $v: \mathbb{R} \to \mathbb{C}$ by

$$\forall t \in \mathbb{R}, \qquad (A(\tau)v)(t) = v(t - \tau). \tag{2}$$

This defines a representation $A$ of the group $\mathbb{R}$. Another representation $B$ is defined on the same set of functions by

$$\forall s \in \mathbb{R}, \qquad (B(\tau)v)(s) = e^{2\pi i s \tau} v(s). \tag{3}$$
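As a rough numerical illustration of Example II.1.4, the following Python sketch builds the 6-D edge representation of $S_4$ and checks the defining property of Eq. (1) on all pairs of group elements. The helper names (`compose`, `edge_matrix`, `matmul`) and the 0-based labelling of the vertices are assumptions made for this sketch; they do not come from the text.

```python
from itertools import combinations, permutations

def compose(g, h):
    """Permutation 'g after h'; a permutation is a tuple of 0-based images."""
    return tuple(g[h[i]] for i in range(len(h)))

def edge_matrix(g, edges):
    """Matrix with a 1 in row r, column c when g sends edge c to edge r."""
    index = {edge: k for k, edge in enumerate(edges)}
    mat = [[0] * len(edges) for _ in edges]
    for c, (i, j) in enumerate(edges):
        mat[index[tuple(sorted((g[i], g[j])))]][c] = 1
    return mat

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

edges = list(combinations(range(4), 2))   # (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
group = list(permutations(range(4)))      # all 24 elements of S_4

# Representation property of Eq. (1): A(gh) = A(g)A(h) for every pair.
is_representation = all(
    edge_matrix(compose(g, h), edges)
    == matmul(edge_matrix(g, edges), edge_matrix(h, edges))
    for g in group for h in group
)

four_cycle = (1, 2, 3, 0)                 # the cycle (1 2 3 4), written 0-based
A_four_cycle = edge_matrix(four_cycle, edges)
```

Under these conventions `A_four_cycle` should coincide with the generator matrix for (1 2 3 4) given in the example, which is one way to cross-check a matrix recovered by hand.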
The representations of $C_n$ and $S_n$ that we have discussed are examples of permutation representations, which are discussed fully in Section II.F.

Representation theory is often couched in terms of modules (the module corresponding to the representation $A$ in the foregoing is the vector space $V$ on which $A$ acts together with the given group action), but because we will deal exclusively with modules in which a basis has been chosen, it seems more natural to work with the matrices of the representation themselves. For any representation $A$, we denote by $\Omega(A) := V$ the corresponding module.

A finite-dimensional representation is one for which the space $V$ is finite-dimensional. All the examples in the preceding list, except for the last, are finite-dimensional. We will deal here almost exclusively with finite-dimensional representations. We will also assume that the field $F$ is fixed and has characteristic zero (e.g., $F = \mathbb{R}, \mathbb{C}$).

A representation $A$ of a finite group $G$ is called irreducible if $\Omega(A)$ has no proper nontrivial subspace which is mapped into itself by each $g \in G$ under the action $A$. Otherwise, $A$ is called reducible.
Example II.2
1. For any group, the trivial representation $A(g) = 1$ for all $g \in G$ is irreducible.
2. Many groups also have alternating representations, which are 1-D, irreducible, and defined by the property that $A(g) = \pm 1$ for all $g \in G$. One-dimensional representations are of course always irreducible.
3. For the group $C_n$, the irreducible representations are all 1-D, and are of the form $R_k(g_1^l) = (e^{2\pi i (k-1) l / n})$ for $k = 1, \ldots, n$. For example, the irreducible representations of $C_4$ are generated by $R_1(g_1) = 1$, $R_2(g_1) = i$, $R_3(g_1) = -1$ and $R_4(g_1) = -i$.
4. The natural representation of the 2-D rotation group SO(2) is irreducible. It is clear that $\mathbb{R}^2$ does not have an invariant subspace under this representation.
5. Consider the representation of the symmetric group $S_3$, which acts on $\mathbb{R}^2$ according to: the permutation (1 2 3) is a rotation of 120° about the origin, and the permutation (1 2) is a reflection in the y-axis. This is irreducible, as again there is no stabilized subspace.
6. The natural representation of $C_3$ discussed in Example II.1.2 is reducible, because the span of the vector $(1, 1, 1)^T$ is mapped into itself by cyclic permutation.
7. The representation of the additive group $\mathbb{R}$ discussed in Example II.1.5 is reducible because the subspace of affine functions $v(t) = at + b$ is mapped into itself by the group action.
B. Direct Sum and Tensor Product of Representations

Consider two finite-dimensional representations $A$ and $B$ of a group $G$ on vector spaces $V$ and $W$, respectively. Then the direct sum of $A$ and $B$, denoted $A \oplus B$, is a representation of $G$ on $V \oplus W$ defined by:

$$(A \oplus B)(g) := \begin{pmatrix} A(g) & 0 \\ 0 & B(g) \end{pmatrix}. \tag{4}$$

The tensor product of $A$ and $B$, denoted by $A \otimes B$, is a representation of $G$ on $V \otimes W$ defined by

$$(A \otimes B)(g) := A(g) \otimes B(g), \tag{5}$$

where $\otimes$ on the right-hand side denotes the tensor product (Kronecker product) of linear transformations (matrices). We similarly define the tensor powers $\otimes^k A$ of any representation $A$. Note that matrices $P$, $Q$, $R$, and $S$ of appropriate dimensions satisfy

$$(P \otimes Q)(R \otimes S) = PR \otimes QS. \tag{6}$$
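The block-diagonal direct sum and the mixed-product identity of Eq. (6) can be tried out on small matrices. The sketch below uses plain Python lists of rows; the functions `kron`, `direct_sum`, and `matmul`, and the particular test matrices, are illustrative assumptions rather than anything defined in the text.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker product: block (i, j) of the result is a[i][j] * b."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def direct_sum(a, b):
    """Block-diagonal matrix with a in the top-left and b in the bottom-right."""
    top = [row + [0] * len(b[0]) for row in a]
    bottom = [[0] * len(a[0]) + row for row in b]
    return top + bottom

P = [[1, 2], [3, 4]]
Q = [[0, 1], [1, 0]]
R = [[2, 0], [1, 1]]
S = [[1, 1], [0, 2]]

# Mixed-product identity (P (x) Q)(R (x) S) = PR (x) QS.
lhs = matmul(kron(P, Q), kron(R, S))
rhs = kron(matmul(P, R), matmul(Q, S))
mixed_product_holds = lhs == rhs

block = direct_sum(P, Q)
```

The identity is what makes the tensor product of two representations a representation again: applying it with $P = A(g)$, $Q = B(g)$, $R = A(h)$, $S = B(h)$ gives $(A \otimes B)(g)(A \otimes B)(h) = (A \otimes B)(gh)$.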
C. Homomorphisms and Intertwining Spaces

Two representations $A$ and $B$ of a group $G$ are said to be equivalent if there exists an invertible linear transformation $T$ satisfying

$$\forall g \in G, \qquad TA(g) = B(g)T. \tag{7}$$
This means that $A$ and $B$ can be thought of as the same action on the same space, but defined with respect to different bases. A representation $A$ over a complex field is unitary if the representative matrices $A(g)$ are unitary; when the underlying field is real, such a representation is said to be orthogonal, and we have $A(g)^T = A(g)^{-1} = A(g^{-1})$ for all $g \in G$. The following result is standard (e.g., Cohn, 1989; Lenz, 1990):

Lemma II.1 Any finite-dimensional complex representation of a finite group is equivalent to a unitary representation. Any finite-dimensional real representation of a finite group is equivalent to an orthogonal representation.
Let $G$ be a group and let $A$ and $B$ denote two representations of $G$. A concomitant from $A$ to $B$ is a function $\phi: \Omega(A) \to \Omega(B)$ with the property:

$$\forall g \in G, \qquad \phi \circ A(g) = B(g) \circ \phi. \tag{8}$$

If the concomitant is a linear map (of vector spaces), then it is called a homomorphism or intertwining operator from $A$ to $B$. A group invariant can be seen to be a concomitant from a given representation of $G$ to the trivial representation. Two representations are equivalent if and only if they are isomorphic (i.e., there is an invertible homomorphism from one to the other).
Example II.3
1. A famous example of an intertwining operator is the discrete Fourier transform (DFT). One of the defining properties of the DFT is that cyclic translations in the original signal domain become componentwise phase shifts in the spectral domain, for example, in the four-dimensional (4-D) case:

$$\Lambda_{\mathrm{DFT}} \begin{pmatrix} 0&0&0&1\\1&0&0&0\\0&1&0&0\\0&0&1&0 \end{pmatrix} = \begin{pmatrix} 1&0&0&0\\0&i&0&0\\0&0&-1&0\\0&0&0&-i \end{pmatrix} \Lambda_{\mathrm{DFT}}, \tag{9}$$
where $\Lambda_{\mathrm{DFT}}$ is the matrix representing the 4-D DFT operator. It follows that similar identities hold for other cyclic translation operators, and Eq. (9) essentially states that the DFT is a homomorphism from the natural permutation representation of the cyclic group $C_4$ (see Example II.1.2) to the direct sum of all irreducible representations (see Example II.2.3). The DFT is also invertible, that is, an isomorphism, and these two representations of $C_4$ are thus equivalent. Analogous results hold for a DFT of any order, that is, for any cyclic group. This observation is fundamental to group-theoretic harmonic analysis (Clausen and Baum,
1993; Hewitt and Ross, 1979). We will return to the DFT in Section VII. The continuous Fourier transform has a similar property with respect to certain representations of the additive group $\mathbb{R}$, as indicated in Example II.1.5.
2. Consider three representations of the symmetric group $S_3$. The first is the natural 3-D permutation representation $A$. Second, we have the trivial representation $B$, and third the irreducible 2-D representation $C$ given in Example II.2.5, where (1 2 3) is a 120° rotation and (1 2) a reflection. These representations are generated by

$$A((1\,2\,3)) = \begin{pmatrix} 0&0&1\\1&0&0\\0&1&0 \end{pmatrix}, \qquad A((1\,2)) = \begin{pmatrix} 0&1&0\\1&0&0\\0&0&1 \end{pmatrix},$$

$$B((1\,2\,3)) = B((1\,2)) = 1,$$

$$C((1\,2\,3)) = \frac{1}{2}\begin{pmatrix} -1 & -\sqrt{3} \\ \sqrt{3} & -1 \end{pmatrix}, \qquad C((1\,2)) = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}.$$

The invertible linear operator

$$W = \begin{pmatrix} 1 & 1 & 1 \\ 3+\sqrt{3} & -3-\sqrt{3} & 0 \\ 1+\sqrt{3} & 1+\sqrt{3} & -2-2\sqrt{3} \end{pmatrix}$$

is therefore an isomorphism from $A$ to $B \oplus C$. Homomorphisms from $A$ to $B$ or from $A$ to $C$ can be extracted as submatrices of $W$.
3. As another example of a noninvertible homomorphism, consider the representation $A$ of $S_4$ given in Example II.1.4 as the action induced by the permutation of edges on a 4-vertex graph:
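$$A((1\,2\,3\,4)) = \begin{pmatrix} 0&0&1&0&0&0\\0&0&0&0&1&0\\0&0&0&0&0&1\\1&0&0&0&0&0\\0&1&0&0&0&0\\0&0&0&1&0&0 \end{pmatrix}, \qquad A((1\,2)) = \begin{pmatrix} 1&0&0&0&0&0\\0&0&0&1&0&0\\0&0&0&0&1&0\\0&1&0&0&0&0\\0&0&1&0&0&0\\0&0&0&0&0&1 \end{pmatrix}.$$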
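As a second representation of $S_4$, consider the natural permutation representation $B$ given by

$$B((1\,2\,3\,4)) = \begin{pmatrix} 0&0&0&1\\1&0&0&0\\0&1&0&0\\0&0&1&0 \end{pmatrix}, \qquad B((1\,2)) = \begin{pmatrix} 0&1&0&0\\1&0&0&0\\0&0&1&0\\0&0&0&1 \end{pmatrix}.$$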
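A homomorphism from $A$ to $B$ is given by the following matrix, as can easily be checked (it is sufficient to show that $WA((1\,2\,3\,4)) = B((1\,2\,3\,4))W$ and $WA((1\,2)) = B((1\,2))W$):

$$W = \begin{pmatrix} p_1&p_1&p_1&p_2&p_2&p_2\\ p_1&p_2&p_2&p_1&p_1&p_2\\ p_2&p_1&p_2&p_1&p_2&p_1\\ p_2&p_2&p_1&p_2&p_1&p_1 \end{pmatrix}, \tag{10}$$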
where $p_1$ and $p_2$ can be any real values. In fact, $W$ can be viewed as a general incidence matrix for the fully connected 4-vertex graph.

The set of all homomorphisms from $A$ to $B$ can easily be seen to be a subspace of the vector space of all linear transforms from $\Omega(A)$ to $\Omega(B)$. This set is called the intertwining space of the pair $(A, B)$, and is denoted by $\mathrm{Hom}_G(A, B)$, or by $\mathrm{Hom}(A, B)$ if the group is understood from the context. The dimension of the intertwining space of $(A, B)$ is called the intertwining number of $(A, B)$. For a finite group, the intertwining number of $(A, B)$ is the same as that of $(B, A)$ and so we can simply refer to the intertwining number of $A$ and $B$. Schur's Lemma is another classical result of representation theory (e.g., Cohn, 1989; Ledermann, 1977).
Lemma II.2 (Schur's Lemma) Let $A$ and $B$ be irreducible finite-dimensional representations of a finite group $G$ over a field of characteristic zero. Let $W$ be a homomorphism from $A$ to $B$. Then:
1. $W$ is either invertible or 0.
2. If $A$ and $B$ are equal then $W = kI$ for some scalar $k$.
Schur’s Lemma states that the intertwining number of two inequivalent irreducible representations is 0, whereas the intertwining number of two equivalent irreducible representations is 1.
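Returning to the DFT of Example II.3.1, the intertwining property can be checked numerically. The sketch below assumes the kernel $\omega^{jk}$ with $\omega = e^{2\pi i/4} = i$ for the DFT matrix; with the opposite sign convention the two phase factors $i$ and $-i$ swap places. All names (`dft`, `shift`, `phases`) are chosen for this illustration only.

```python
import cmath

def matmul(a, b):
    """Plain matrix product of two lists of rows (works for complex entries)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

n = 4
omega = cmath.exp(2j * cmath.pi / n)      # assumed sign convention: omega = i

# DFT matrix with entries omega^(j*k), indices starting at 0.
dft = [[omega ** (j * k) for k in range(n)] for j in range(n)]

# Cyclic shift: a 1 in row r, column c when r = c + 1 (mod n).
shift = [[1 if r == (c + 1) % n else 0 for c in range(n)] for r in range(n)]

# Direct sum of the four 1-D irreducibles evaluated at the generator:
# diag(1, i, -1, -i) under the assumed convention.
phases = [[omega ** j if j == k else 0 for k in range(n)] for j in range(n)]

lhs = matmul(dft, shift)
rhs = matmul(phases, dft)
max_error = max(abs(lhs[j][k] - rhs[j][k]) for j in range(n) for k in range(n))
```

Each row of `dft` is then a homomorphism from the natural permutation representation to a single 1-D irreducible representation, in line with the intertwining numbers given by Schur's Lemma.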
D. Characters

Let $A$ denote a finite-dimensional representation of a group $G$ over the field $F$. The character of $A$ is a function $\chi[A]: G \to F$, defined by:

$$\forall g \in G, \qquad \chi[A](g) = \mathrm{Trace}(A(g)). \tag{11}$$
The character is so-called because in many senses it characterizes the representation; for example, it is easy to show that equivalent representations have the same character. The character is also constant on conjugacy classes, for example, for $S_4$ it must hold that $\chi[A]((1\,2\,3)) = \chi[A]((1\,2\,4)) = \chi[A]((2\,4\,3)) = \ldots$, etc.

Example II.4
1. For the natural permutation representation $A$ of $C_n$ described in Example II.1.2, $\chi[A](g) = n$ if $g = e$, and $\chi[A](g) = 0$ otherwise. More generally, for any permutation representation $A$ of a group $G$, we see that $\chi[A](g)$ equals the number of basis elements of $\Omega(A)$ which are fixed by $g$.
2. Consider the permutation representation $B$ of $S_4$, induced from the permutation of the edges of a 4-vertex graph (Example II.1.4). The character of this representation is more complex; we have

$$\chi[B](e) = 6,$$

$$\chi[B]((1\,2\,3\,4)) = \chi[B]((1\,2\,4\,3)) = \chi[B]((1\,3\,2\,4)) = \chi[B]((1\,3\,4\,2)) = \chi[B]((1\,4\,2\,3)) = \chi[B]((1\,4\,3\,2)) = 0,$$

$$\chi[B]((1\,2\,3)) = \chi[B]((1\,3\,2)) = \chi[B]((1\,2\,4)) = \chi[B]((1\,4\,2)) = \chi[B]((1\,3\,4)) = \chi[B]((1\,4\,3)) = \chi[B]((2\,3\,4)) = \chi[B]((2\,4\,3)) = 0,$$

$$\chi[B]((1\,2)(3\,4)) = \chi[B]((1\,3)(2\,4)) = \chi[B]((1\,4)(2\,3)) = 2,$$

$$\chi[B]((1\,2)) = \chi[B]((1\,3)) = \chi[B]((1\,4)) = \chi[B]((2\,3)) = \chi[B]((2\,4)) = \chi[B]((3\,4)) = 2.$$
3. The irreducible representation $A$ of $S_3$ given in Example II.2.5 has the following character:

$$\chi[A](e) = 2,$$

$$\chi[A]((1\,2\,3)) = \chi[A]((1\,3\,2)) = -1,$$

$$\chi[A]((1\,2)) = \chi[A]((1\,3)) = \chi[A]((2\,3)) = 0.$$
For any finite group $G$, the functions from $G$ to a subfield $F$ of $\mathbb{C}$ form a Hilbert space with inner product

$$\forall f_1, f_2: G \to F, \qquad \langle f_1, f_2 \rangle_G := \frac{1}{|G|} \sum_{g \in G} f_1(g) f_2^{\dagger}(g), \tag{12}$$

where $\dagger$ denotes complex conjugate. The following standard result uses this Hilbert space to characterize intertwining numbers.

Lemma II.3 Let $A$ and $B$ be two finite-dimensional representations of a finite group $G$ over a subfield of $\mathbb{C}$; then the intertwining number of $A$ and $B$ is equal to the inner product of $\chi[A]$ and $\chi[B]$.

The intertwining number of two representations can generally be readily calculated as the inner product of characters. We will find this invaluable in the design of invariant pattern classification systems.
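A small sketch of how Lemma II.3 can be used in practice, for the two permutation representations of $S_4$ (on the four vertices and on the six edges of a 4-vertex graph). It relies on the fact from Example II.4.1 that the character of a permutation representation counts fixed basis elements. The function names and the 0-based vertex labels are assumptions of this sketch; both characters are real, so no conjugation is needed.

```python
from fractions import Fraction
from itertools import combinations, permutations

def vertex_character(g):
    """Number of vertices fixed by the permutation g (tuple of 0-based images)."""
    return sum(1 for i in range(len(g)) if g[i] == i)

def edge_character(g):
    """Number of unordered vertex pairs mapped onto themselves by g."""
    return sum(1 for i, j in combinations(range(len(g)), 2)
               if {g[i], g[j]} == {i, j})

group = list(permutations(range(4)))      # all 24 elements of S_4

# Inner product of Eq. (12), averaged over the whole group.
intertwining_number = Fraction(
    sum(edge_character(g) * vertex_character(g) for g in group),
    len(group),
)
```

Summing over all 24 group elements rather than over conjugacy classes is slower but avoids having to tabulate class sizes by hand.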
Example II.5
1. Consider the two permutation representations $A$ and $B$ of $S_4$ discussed in Example II.3.3. The intertwining number of $A$ and $B$ is found as follows (using the fact that characters are constant on conjugacy classes):

$$\langle \chi[A], \chi[B] \rangle_{S_4} = \frac{1}{24}\left(1 \cdot 6 \cdot 4 + 6 \cdot 2 \cdot 2 + 8 \cdot 0 \cdot 1 + 6 \cdot 0 \cdot 0 + 3 \cdot 2 \cdot 0\right) = 2,$$

which means that the space of homomorphisms from $A$ to $B$ (or vice versa) has dimension 2. In fact, as we shall see later, all homomorphisms from $A$ to $B$ have the form of Eq. (10) in Example II.3.3.
2. Let $A$ denote the irreducible representation of $S_3$ considered in Example II.2.5 and Example II.4.3, and let $B$ denote the trivial representation, which is also irreducible. Now the inner product of the characters of these representations is

$$\langle \chi[A], \chi[B] \rangle_{S_3} = \frac{1}{6}\big(\chi[A](e)\chi[B](e) + 2\chi[A]((1\,2\,3))\chi[B]((1\,2\,3)) + 3\chi[A]((1\,2))\chi[B]((1\,2))\big) = \frac{1}{6}\left(1 \cdot 2 \cdot 1 + 2 \cdot (-1) \cdot 1 + 3 \cdot 0 \cdot 1\right) = 0,$$

meaning that the only homomorphism from $A$ to $B$ is 0. This confirms Schur's Lemma. In general, the inner product of distinct characters of irreducible representations must be 0.

The following standard results are central to representation theory:
Lemma II.4 Let $G$ be a finite group.
1. $G$ has a finite number of irreducible representations (to within equivalence), equal to the number of conjugacy classes of the elements of $G$.
2. Any character is constant on conjugacy classes.
3. For two irreducible representations $A$ and $B$ of $G$,

$$\langle \chi[A], \chi[B] \rangle_G = \begin{cases} 1 & \text{if } A \text{ and } B \text{ are equivalent} \\ 0 & \text{otherwise.} \end{cases} \tag{13}$$

Furthermore, $\langle \chi[A], \chi[A] \rangle_G = 1$ if and only if $A$ is irreducible.
4. Any finite-dimensional representation of $G$ is equivalent to a direct sum of irreducible representations:

$$A \cong \bigoplus_{i=1}^{K} n_i R_i, \tag{14}$$

where $n_1, \ldots, n_K \in \mathbb{N}$ and $R_1, \ldots, R_K$ are the inequivalent irreducible representations of $G$. Such a decomposition is unique (to within permutation of the indices $i$).
5. The multiplicities $n_i$ in the decomposition Eq. (14) are the respective intertwining numbers of $A$ and $R_i$, that is,

$$n_i = n_i(A) := \langle \chi[A], \chi[R_i] \rangle_G. \tag{15}$$

6. The intertwining number of two finite-dimensional representations $A$ and $B$ is given by

$$\dim \mathrm{Hom}_G(A, B) = \sum_{i=1}^{K} n_i(A)\, n_i(B). \tag{16}$$
Proof The last result is less standard, so we derive it from the others. By Lemma II.3, the intertwining number is given by the inner product of the characters of $A$ and $B$. From the decomposition of Eq. (14), the character of $A$ can be written

$$\chi[A] = \sum_{i=1}^{K} n_i(A)\, \chi[R_i],$$

and similarly for $\chi[B]$. Now we use the linearity of the inner product, and apply law 3 to derive the required result.

The results of Lemma II.4 are very helpful in constructing the irreducible representations of a given finite group, and their characters. The characters of a group are often displayed in a character table, which lists the characters of all the irreducible representations in terms of the value each takes on each
conjugacy class. For example, for $S_4$ we have:

S_4      e     (1 2)    (1 2 3)    (1 2 3 4)    (1 2)(3 4)
R_1      1       1         1           1             1
R_2      1      -1         1          -1             1
R_3      3       1         0          -1            -1
R_4      3      -1         0           1            -1
R_5      2       0        -1           0             2
Note that the first column of the table gives the dimension of the corresponding representation.
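The orthogonality relation of Lemma II.4.3 and the multiplicity formula of Lemma II.4.5 can be checked directly against the $S_4$ character table. In the sketch below the table entries and class sizes (1, 6, 8, 6, 3) are transcribed by hand, and the two permutation characters are those of the vertex and edge representations of Example II.3.3; the names `class_sizes`, `table`, and `inner` are assumptions of this sketch.

```python
from fractions import Fraction

# Conjugacy classes in the order e, (1 2), (1 2 3), (1 2 3 4), (1 2)(3 4).
class_sizes = [1, 6, 8, 6, 3]
table = {
    "R1": [1, 1, 1, 1, 1],
    "R2": [1, -1, 1, -1, 1],
    "R3": [3, 1, 0, -1, -1],
    "R4": [3, -1, 0, 1, -1],
    "R5": [2, 0, -1, 0, 2],
}

def inner(chi1, chi2):
    """Inner product of two real class functions, weighted by class size."""
    total = sum(n * a * b for n, a, b in zip(class_sizes, chi1, chi2))
    return Fraction(total, sum(class_sizes))

orthonormal = all(inner(table[r], table[s]) == (1 if r == s else 0)
                  for r in table for s in table)

chi_edges = [6, 2, 0, 0, 2]       # 6-D edge representation
chi_vertices = [4, 2, 1, 0, 0]    # natural 4-D permutation representation

mult_edges = {r: inner(chi_edges, chi) for r, chi in table.items()}
mult_vertices = {r: inner(chi_vertices, chi) for r, chi in table.items()}

# Eq. (16): sum of products of multiplicities.
intertwining_number = sum(mult_edges[r] * mult_vertices[r] for r in table)
```

With these values the edge representation decomposes as $R_1 \oplus R_3 \oplus R_5$ and the vertex representation as $R_1 \oplus R_3$, so Eq. (16) gives an intertwining number of 2.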
E. Frobenius Reciprocity

Frobenius reciprocity is a useful tool for relating homomorphisms between representations of a given group to those between representations of its subgroups. Let $G$ be a group, and let $H$ denote an arbitrary subgroup. Let $A$ be a representation of $G$, and denote by $\mathrm{res}^G_H A$ the restriction of $A$ (as a function) to the subgroup $H$; this is clearly a representation of $H$ and is called the restriction of $A$ to $H$.

Now let $G$ be a group with subgroup $H$ of finite index $m$. The right cosets $H_i$ of $H$ are the sets $H_i = Hg \subseteq G$, $g \in G$. When $G$ is finite, the cosets are of equal size $|H|$ and they partition the group; thus there are $m$ cosets. The cosets are defined by a set of right coset representatives, that is, a set $h_i \in H_i$ for $i \in \{1, \ldots, m\}$. Each right coset representative defines its right coset by $H_i = Hh_i$. Now let $A$ denote a representation of $H$; then the induced representation of $G$ from $A$, denoted by $\mathrm{ind}^G_H A$, is defined by the formula:

$$\forall g \in G, \qquad (\mathrm{ind}^G_H A)(g) := \begin{pmatrix} T_{11} & \cdots & T_{1m} \\ \vdots & & \vdots \\ T_{m1} & \cdots & T_{mm} \end{pmatrix}, \tag{17}$$

where the submatrices $T_{ij}$ are given by:

$$T_{ij} := \begin{cases} A(h_i g h_j^{-1}) & \text{if } h_i g h_j^{-1} \in H \\ 0 & \text{otherwise.} \end{cases} \tag{18}$$

We also say that $\mathrm{ind}^G_H A$ is induced from the representation $A$. The representation induced depends upon the choice of right coset representatives, but all such representations are equivalent.
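The block formula for the induced representation can be exercised on a small case. The sketch below takes $G = S_3$, the two-element subgroup $H$ generated by one transposition, and the 1-D alternating representation of $H$; these choices, the product convention (`compose(g, h)` means "g after h"), and all function names are assumptions made for illustration. Since the representation of $H$ is 1-D, each block $T_{ij}$ is a single number.

```python
from itertools import permutations

def compose(g, h):
    """Group product used here: 'g after h' on tuples of 0-based images."""
    return tuple(g[h[i]] for i in range(len(h)))

def inverse(g):
    inv = [0] * len(g)
    for i, image in enumerate(g):
        inv[image] = i
    return tuple(inv)

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

group = list(permutations(range(3)))
# Alternating representation of H = {e, swap of the first two points}.
H = {(0, 1, 2): 1, (1, 0, 2): -1}

# Pick one representative h_i from each right coset H h_i.
reps, covered = [], set()
for g in group:
    if g not in covered:
        reps.append(g)
        covered.update(compose(h, g) for h in H)

def induced(g):
    """Entry (i, j) is A(h_i g h_j^{-1}) when that product lies in H, else 0."""
    mat = [[0] * len(reps) for _ in reps]
    for i, hi in enumerate(reps):
        for j, hj in enumerate(reps):
            x = compose(compose(hi, g), inverse(hj))
            if x in H:
                mat[i][j] = H[x]
    return mat

is_homomorphism = all(
    induced(compose(g1, g2)) == matmul(induced(g1), induced(g2))
    for g1 in group for g2 in group
)
```

Every matrix produced this way has exactly one nonzero entry, equal to $\pm 1$, in each row and column, which is the kind of structure taken up again in Section II.F.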
The following theorem (see e.g., Mackey, 1976) describing a relationship between restricted and induced representations is crucial here.

Theorem II.1 (Frobenius Reciprocity Theorem) Let $G$ be a finite group with subgroup $H$, let $A$ be a finite-dimensional representation of $G$ and $B$ be a finite-dimensional representation of $H$. Then:
1. The intertwining space $\mathrm{Hom}_G(A, \mathrm{ind}^G_H B)$ is isomorphic to the space $\mathrm{Hom}_H(\mathrm{res}^G_H A, B)$.
2. $\langle \chi[A], \chi[\mathrm{ind}^G_H B] \rangle_G = \langle \chi[\mathrm{res}^G_H A], \chi[B] \rangle_H$.
Examples of an induced representation and Frobenius reciprocity are given in the next section, following a discussion of permutation representations.

F. Special Classes of Representations
Any subgroup $H$ of a given $G$ defines a permutation action of $G$ on the (right) cosets of $H$ in $G$. This is given for any right coset $H_i$ by $g(H_i) := H_i g$.

A finite-dimensional representation $A$ of a group $G$ is called a permutation representation if every matrix $A(g)$ has exactly one '1' entry in each row and in each column, and all other entries are '0'. The term "permutation representation" comes from the observation that the representation acts by permuting the vectors of the natural basis. Conversely, consider a finite set $X$ on which $G$ acts by permutation. This action induces a linear action of $G$ on the space of all functions from $X$ to any field $F$, which (with an appropriate choice of basis) is a permutation representation.

This situation is particularly interesting when the action of $G$ on $X$ is transitive, that is, for any pair of elements $x_1, x_2$ of $X$ there exists a group transformation mapping $x_1$ to $x_2$. In this case, we can find a corresponding action of $G$ on the cosets of some subgroup. To do this, pick an element $x$ of $X$, and construct the subgroup $H$ of $G$ which fixes $x$ ($hx = x$ for all $h \in H$). The given permutation action is then identical to the permutation action of $G$ on the right cosets of $H$. The permutation representation $P$ can be constructed by ordering the right cosets of $H$, say $H_1 = H, H_2, \ldots, H_n$. Now we construct the $n \times n$ permutation matrices $P(g)$ element-wise according to

$$P(g)_{i,j} := \begin{cases} 1 & \text{if } H_i g = H_j \\ 0 & \text{otherwise.} \end{cases} \tag{19}$$

It is easy to see that this is a permutation representation and is furthermore induced from the trivial representation of the subgroup $H$. Such a permutation representation, which acts transitively on the natural basis vectors, is said to be transitive. Given a transitive permutation representation $P$, we can recover a corresponding subgroup and cosets by:

$$H_j = \{ g \in G \mid P(g)_{1,j} = 1 \}; \qquad H = H_1. \tag{20}$$
An important example of a permutation representation is the so-called (right) regular representation of $G$, which is induced from the trivial representation of the trivial subgroup $H = \{e\}$, and thus describes the (right) action of $G$ on its own elements. The dimension of the regular representation is equal to the order of the group, and the regular representation is equivalent to a direct sum of irreducible representations, in which each irreducible representation occurs a number of times equal to its dimension.

Example II.5 The natural permutation representation of the cyclic groups $C_n$, discussed in Example II.1.2, is induced from the trivial representation of the trivial subgroup $\{e\}$. This is the right regular representation of $C_n$, and it is equivalent to the direct sum of all $n$ (1-D) irreducible representations, as discussed in Example II.3.1; an isomorphism is the DFT.

Here is an example of Frobenius reciprocity. Take $G = S_4$, the symmetric group of degree 4, and $H$ to be the subgroup

$$H = \{e, (2\,3), (2\,4), (3\,4), (2\,3\,4), (2\,4\,3)\}$$

of all elements which fix the object 1. Set $H_1 = H$, $H_2 = H(1\,2)$, $H_3 = H(1\,3)$, $H_4 = H(1\,4)$. Take $C$ to be the trivial representation of $H$; then $\mathrm{ind}^G_H C$ is a 4-D representation given by

$$(\mathrm{ind}^G_H C)(g)_{i,j} = \begin{cases} 1 & \text{if } H_i g = H_j \\ 0 & \text{otherwise,} \end{cases}$$

which is precisely the natural permutation representation $B$ given in Example II.3.3. Now take $A$ to be the six-dimensional (6-D) permutation representation of $G$ again in Example II.3.3, and consider the restriction of $A$ to $H$, generated by

$$A((2\,3)) = \begin{pmatrix} 0&1&0&0&0&0\\1&0&0&0&0&0\\0&0&1&0&0&0\\0&0&0&1&0&0\\0&0&0&0&0&1\\0&0&0&0&1&0 \end{pmatrix}, \qquad A((2\,3\,4)) = \begin{pmatrix} 0&0&1&0&0&0\\1&0&0&0&0&0\\0&1&0&0&0&0\\0&0&0&0&1&0\\0&0&0&0&0&1\\0&0&0&1&0&0 \end{pmatrix}.$$
Now a homomorphism from $\mathrm{res}^G_H A$ to the trivial representation $C$ of $H$ is given by

$$W' = (p_1 \;\; p_1 \;\; p_1 \;\; p_2 \;\; p_2 \;\; p_2), \tag{21}$$
where $p_1$ and $p_2$ are any real values. By consideration of the inner product of characters (Lemma II.3), it is not hard to see that any intertwining operator between these representations must have the form of Eq. (21). Now Frobenius reciprocity states that there is a one-to-one linear mapping between homomorphisms $W'$ as in Eq. (21), and homomorphisms $W$ from $A$ to $B = \mathrm{ind}^G_H C$, as constructed in Eq. (10). Such an isomorphism is given in this case by
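$$(p_1 \;\; p_1 \;\; p_1 \;\; p_2 \;\; p_2 \;\; p_2) \;\mapsto\; \begin{pmatrix} p_1&p_1&p_1&p_2&p_2&p_2\\ p_1&p_2&p_2&p_1&p_1&p_2\\ p_2&p_1&p_2&p_1&p_2&p_1\\ p_2&p_2&p_1&p_2&p_1&p_1 \end{pmatrix}.$$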
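The correspondence in this Frobenius reciprocity example can be reproduced numerically. The sketch below is one way to realize it, under the assumption that row $i$ of $W$ is obtained as $W' A(h_i)$, with $h_i$ the coset representatives $e$, (1 2), (1 3), (1 4); this construction is an observation checked by the code, not a formula quoted from the text. Vertices are labelled from 0, and the values 5 and 7 stand in for $p_1$ and $p_2$.

```python
from itertools import combinations, permutations

edges = list(combinations(range(4), 2))
index = {edge: k for k, edge in enumerate(edges)}

def edge_matrix(g):
    """6-D edge representation: a 1 in row r, column c when g maps edge c to edge r."""
    mat = [[0] * 6 for _ in range(6)]
    for c, (i, j) in enumerate(edges):
        mat[index[tuple(sorted((g[i], g[j])))]][c] = 1
    return mat

def vertex_matrix(g):
    """Natural 4-D permutation representation: a 1 in row g(c), column c."""
    return [[1 if r == g[c] else 0 for c in range(4)] for r in range(4)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

p1, p2 = 5, 7                                   # arbitrary illustrative values
w_prime = [[p1 if 0 in edge else p2 for edge in edges]]

# Coset representatives e, (1 2), (1 3), (1 4), written 0-based.
coset_reps = [(0, 1, 2, 3), (1, 0, 2, 3), (2, 1, 0, 3), (3, 1, 2, 0)]
W = [matmul(w_prime, edge_matrix(h))[0] for h in coset_reps]

group = list(permutations(range(4)))
stabilizer = [g for g in group if g[0] == 0]    # the subgroup H fixing vertex 1

# W' intertwines res A with the trivial representation of H.
w_prime_ok = all(matmul(w_prime, edge_matrix(h)) == w_prime for h in stabilizer)
# W intertwines A with the natural permutation representation B.
w_ok = all(matmul(W, edge_matrix(g)) == matmul(vertex_matrix(g), W) for g in group)
```

The resulting `W` is the vertex-edge incidence pattern of Eq. (10): the entry is $p_1$ when the vertex lies on the edge and $p_2$ otherwise.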
The concept of a permutation representation is standard; however we will have recourse to consider certain other classes of group representation, which to our knowledge have not been considered previously in the literature.
Definition II.1 Let $A$ be a finite-dimensional representation of a group $G$ over the real field. $A$ is called a perm-diagonal representation if every row and column of each $A(g)$ contains exactly one nonzero entry. $A$ is called an inversion representation if every row and column of each $A(g)$ contains exactly one nonzero entry, which is $\pm 1$. $A$ is called a positive representation if every entry of each $A(g)$ is nonnegative. $A$ is called a unit-row representation if the entries in each row of each $A(g)$ sum to 1.

For any perm-diagonal representation $A$, we can write $A(g) = D(g)P(g)$, where $D(g)$ is nonsingular and diagonal, and $P(g)$ is a permutation matrix (hence the name, "perm-diagonal"). We denote this by $A = DP$, and we remark that $P$ is also a representation of $G$, called the underlying permutation representation of $A$ (Wood, 1995). Furthermore, any positive representation is also perm-diagonal; see Lemma A.1 in the Appendix that begins on p. 395. A perm-diagonal representation is said to be transitive if its underlying permutation representation is transitive.

We will find that inversion representations are particularly interesting. The concept of an inversion representation is motivated by the following idea: consider a group that acts on a set of coins in two ways: by permuting them and by inverting (flipping) them. This induces an inversion representation of $G$ on the functions from this set of coins to a field $F$ in a similar way that a permutation representation is induced from a permutation action of a group. An important subclass of inversion representations consists of those induced from an alternating representation of some subgroup.

Example II.6
1. A perm-diagonal representation of the group $S_3$:
lo
0
2\
\o
0
lo 2 3))
=
0
-1
__
-3,
0
-I/ 0
-2\
2. An inversion representation of the group $C_3$:
A((l
3
2)) =
[” “I 0 0 -1
III. LINEAR AND NONLINEAR CONCOMITANTS
The most important property of the GRN model we are going to introduce is that it is modular, in the sense that any GRN can be decomposed into some fundamental building blocks. This is more interesting viewed in terms of synthesis: these building blocks can be combined in a very flexible manner to produce GRNs.

Our principal requirement of an invariant pattern classifier is that it is invariant under the action of some representation $A$ of a group $G$. The function computed by a given classifier is therefore a concomitant from $A$ to the trivial representation of $G$. In fact, the theory that follows can be equally easily applied to the construction of systems with general concomitance properties (Wood, 1995).

The obvious way to decompose a concomitant of group representations is into other (in some sense, simpler) concomitants. This can be done in two ways: Given three representations $A$, $B$, $C$ of a group $G$, a concomitant $\phi_1$ from $A$ to $B$ and a concomitant $\phi_2$ from $B$ to $C$, clearly $\phi_2 \circ \phi_1$ is a concomitant from $A$ to $C$. Alternatively, if $\phi_1$ is a concomitant from $A$ to $C$ and $\phi_2$ is a concomitant from $B$ to $C$, then we construct a concomitant $\phi_3$ from $A \oplus B$ to $C$ as follows:

$$\forall v_1 \in \Omega(A),\ v_2 \in \Omega(B), \qquad \phi_3(v_1, v_2) = \phi_1(v_1) + \phi_2(v_2).$$
These compositions correspond to the graphical structures shown in Fig. 1. By connecting such concomitants in a conceptually hierarchical structure, we can construct highly complex structures in the form of a directed acyclic graph where each node corresponds to a group representation and each edge to a concomitant between the corresponding pair of representations.

FIGURE 1. Connecting concomitants in a directed acyclic graph structure.

This solves the problem of how to combine "basic concomitants" together in a complex fashion, but we still have the question as to what these basic concomitants should be. An obvious starting point is the class of linear concomitants or intertwining operators. By themselves, these are certainly not enough, as the addition and functional composition of linear operators yields only more linear operators! However, by combining linear concomitants with a special highly nonlinear class of concomitants (to be introduced in Section III.B) we will be able to construct complex concomitant functions. Our next task is to analyze the structure of linear concomitants.

A. Linear Concomitants
As discussed in Section II, a linear concomitant between two representations $A$ and $B$ of a group $G$ is precisely a homomorphism or intertwining operator from $A$ to $B$. We consider exclusively the case where $G$ is finite and $A$ and $B$ are finite-dimensional (the extension to the case where $G$ is compact requires a change of summations to Haar integrals (Wood, 1995)). In the case where $A$ and $B$ are irreducible representations, such concomitants are described fully by Schur's Lemma (Lemma II.2). In the general case, there is a formula parameterizing all such homomorphisms.

Lemma III.1 Let $A$ and $B$ be two representations of a finite group $G$ defined over a field $F$. Then any homomorphism $W$ from $A$ to $B$ is of the form

$$W = \sum_{s \in G} B(s)\, X_W\, A(s^{-1}) \tag{22}$$

for some linear transform $X_W$ over $F$, and conversely.
Proof Let $W$ be a homomorphism from $A$ to $B$. From the definition of a homomorphism of group representations, we find that $X_W = \frac{1}{|G|} W$ satisfies the equation of the lemma, so $W$ is of the required form. The converse is to prove that $W$ defined by Eq. (22) satisfies $WA(g) = B(g)W$ for all $g \in G$. This is a well-known result (e.g., Ledermann, 1977), and can be easily proved by a change of variable in the summation. □
By making substitutions into Eq. (22), we can now construct arbitrary homomorphisms between any finite-dimensional representations of a finite group.
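Example III.1 Take $G = S_2$, the two-element group $\{e, g\}$, and let $A = B$ = the natural 2-D permutation representation. Let $X_W$ denote an arbitrary $2 \times 2$ real matrix, with entries $a$, $b$, $c$, $d$. Then the resulting homomorphism $W$ is given by

$$W = \begin{pmatrix} a & b \\ c & d \end{pmatrix} + \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} a + d & b + c \\ b + c & a + d \end{pmatrix}.$$

Observe that interchanging the rows of $W$ is equivalent to interchanging the columns. This means that $W$ is a homomorphism from $A$ to $B$, as required.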
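The averaging construction of Lemma III.1 is easy to try numerically. The following sketch applies it to the natural permutation representations of $S_2$ and $S_3$, using the fact that the inverse of a permutation matrix is its transpose. The function names and the seed matrices are arbitrary choices for this illustration.

```python
from itertools import permutations

def perm_matrix(g):
    """Natural permutation representation: a 1 in row g(c), column c."""
    n = len(g)
    return [[1 if r == g[c] else 0 for c in range(n)] for r in range(n)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def average(x, group):
    """Sum over s of P(s) x P(s^{-1}), with P(s^{-1}) = P(s)^T."""
    n = len(x)
    w = [[0] * n for _ in range(n)]
    for s in group:
        p = perm_matrix(s)
        term = matmul(matmul(p, x), transpose(p))
        w = [[w[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return w

# S_2 with seed entries a=1, b=2, c=5, d=7: expect [[a+d, b+c], [b+c, a+d]].
W2 = average([[1, 2], [5, 7]], list(permutations(range(2))))

# S_3 with an arbitrary 3x3 seed matrix.
s3 = list(permutations(range(3)))
W3 = average([[1, 2, 3], [4, 5, 6], [7, 8, 10]], s3)
intertwines = all(matmul(W3, perm_matrix(g)) == matmul(perm_matrix(g), W3)
                  for g in s3)
```

For $S_3$ the result has one value on the diagonal and another off it, which reflects the two-dimensional intertwining space of the natural representation with itself.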
B. Transmutation

Unfortunately, the composition of linear concomitants (homomorphisms) of group representations can only result in more linear concomitants. In order to construct more sophisticated concomitants, we need also to consider a class of nonlinear concomitants. The class we are going to consider is the class of functions on finite-dimensional vector spaces that act in a component-wise fashion (with respect to some fixed basis).
Definition III.1 Let $V$ be a finite-dimensional vector space over the field $F$, and let $f$ be a function from $F$ to $F$. We define a function $\bar{f}: V \to V$ to be the component-wise action of $f$ with respect to the natural basis $e_1, \ldots, e_n$; formally

$$\forall a_1, \ldots, a_n \in F, \qquad \bar{f}(a_1 e_1 + \cdots + a_n e_n) := f(a_1)e_1 + \cdots + f(a_n)e_n. \tag{23}$$

Now let $G$ be a group, and let $A$ and $B$ be two representations of $G$ on the space $V$. The function $f$ is said to transmute the representation $A$ into the representation $B$ if $\bar{f}$ is a concomitant from $A$ to $B$, that is,

$$\forall v \in V,\ g \in G, \qquad \bar{f}(A(g)v) = B(g)\bar{f}(v). \tag{24}$$
In the case $A = B$, the function $f$ is said to preserve the representation $A$. The function $f$ is referred to as a transmutation function (or preservation function). The transmutation condition is this: applying the transformation $A(g)$ and then the function $f$ component-wise gives the same result as applying the function $f$ component-wise and then the transformation $B(g)$ (for all $g$). It is possible to extend Definition III.1 to the case where $V$ is not finite-dimensional, but is instead a space of functions from some set $X$ to $F$.
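The transmutation condition of Eq. (24) can be tested on a small case. In the sketch below, `A_gen` is a hypothetical inversion matrix of order 3 (chosen for this illustration, not taken from the text) that generates an inversion representation of $C_3$, and `B_gen` is its underlying permutation matrix. The even function cos should satisfy the condition, whereas the odd function sin should not.

```python
import math

# Hypothetical generator of an inversion representation of C_3 (an assumption):
# a signed permutation matrix whose cube is the identity.
A_gen = [[0, 0, 1], [-1, 0, 0], [0, -1, 0]]
# Underlying permutation matrix: absolute values of the entries.
B_gen = [[abs(x) for x in row] for row in A_gen]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(m, v):
    """Matrix-vector product."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def componentwise(f, v):
    """Component-wise action of f with respect to the natural basis."""
    return [f(x) for x in v]

def transmutes(f, v):
    """Check f(A v) == B f(v) (both taken component-wise) for one test vector."""
    lhs = componentwise(f, apply(A_gen, v))
    rhs = apply(B_gen, componentwise(f, v))
    return all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(lhs, rhs))

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
has_order_three = matmul(A_gen, matmul(A_gen, A_gen)) == identity

v = [0.3, -1.2, 2.5]
cos_ok = transmutes(math.cos, v)
sin_ok = transmutes(math.sin, v)
```

Checking the generator alone is enough here: if the condition holds for $A(g)$ and $B(g)$, applying it twice gives it for $A(g)^2$ and $B(g)^2$ as well.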
In the infinite-dimensional case, the function $\bar{f}$ is interpreted as composition of $f$ with a given vector $v$, that is, $\bar{f}(v) = f \circ v$, and we also require that $V$ should be closed under such a composition. The rest of Definition III.1 is unchanged. We will not have much need for this extension of the transmutation concept.

Example III.2
1. Consider again the following inversion representation $A$ of $C_3$:
0
-1
0
[pl $ 0
A((l
3 2))
=
Let B denote the underlying permutation representation of A :
,
B((l
2
3))
=
Then the function $f(x) = \cos x$, over the field $\mathbb{R}$, transmutes $A$ into $B$ (as does any even function).
2. Consider the additive group $\mathbb{Z}$, with two representations $A$, $B$ generated by: A(1)
=
(i i),
B(l)
=
('
0 O). s
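The function $f(x) = \mathrm{sgn}(x)/x$, over the field $\mathbb{R}$, transmutes $A$ into $B$.
3. Here is a well-known example of complex transmutation. Take $G = C_4$, and let $A$ be the direct sum of irreducible representations, that is,

$$A(e) = \begin{pmatrix} 1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1 \end{pmatrix}, \qquad A((1\,2\,3\,4)) = \begin{pmatrix} 1&0&0&0\\0&i&0&0\\0&0&-1&0\\0&0&0&-i \end{pmatrix},$$

$$A((1\,3)(2\,4)) = \begin{pmatrix} 1&0&0&0\\0&-1&0&0\\0&0&1&0\\0&0&0&-1 \end{pmatrix}, \qquad A((1\,4\,3\,2)) = \begin{pmatrix} 1&0&0&0\\0&-i&0&0\\0&0&-1&0\\0&0&0&i \end{pmatrix}.$$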
Let $B$ denote the direct sum of four copies of the trivial representation, that is, $B(g) = I_4$ for all $g \in G$. Then the complex modulus function $f(x) = |x|$ transmutes $A$ into $B$.
4. Now take the additive group $G = \mathbb{R}$, acting on the space $V$ of all functions from $\mathbb{R}$ to $\mathbb{C}$. Let $A$ be the representation of $G$ defined for any $\tau \in G$, $v \in V$ by
$$\forall s \in \mathbb{R} \quad (A(\tau)v)(s) = e^{2\pi i \tau}\, v(s).$$
Let $B$ be the representation of $G$ which is the identity operation for all group elements. Then again the complex modulus function $f(s) = |s|$ transmutes $A$ into $B$:
$$(\underline{f}(A(\tau)v))(s) = (f \circ (A(\tau)v))(s) = |(A(\tau)v)(s)| = |e^{2\pi i \tau} v(s)| = |v(s)| = (B(\tau)(f \circ v))(s) = (B(\tau)(\underline{f}(v)))(s),$$
as required. The transmutation functions are a general class of nonlinear concomitants. We will see that this class of concomitants is sufficiently rich that we can obtain highly complex structures by combining them with representation homomorphisms. However, the transmutation functions are also restrictive enough that we can provide an almost exhaustive characterization of them in the case F = R, which is the most interesting case for practical purposes. To do this in a straightforward way, it is convenient to eliminate from consideration two classes of functions:
Definition III.2 A function $f : \mathbb{R} \to \mathbb{R}$ is called t-exceptional (transmutation-exceptional) if one of the following cases holds.
1. $f$ is discontinuous everywhere on either the positive real or the negative real line.
2. There exist $z_1, z_2 \in \mathbb{R}$, $z_1 \notin \{-1, 0, 1\}$, $z_2 \neq 0$, such that $f(z_1 x) = z_2 f(x)$ for all $x \in \mathbb{R}$, but $f$ is not of the form
$$\forall x \in \mathbb{R} \quad f(x) = \begin{cases} b_1 x^a & x > 0 \\ b_2 x^a & x < 0 \\ b_3 & x = 0 \end{cases}$$
for any $a, b_1, b_2, b_3 \in \mathbb{R}$. If $f$ is not t-exceptional, it is called t-unexceptional.

In the following theorem, which characterizes transmutation, we are going to assume that the transmutation function $f$ is t-unexceptional. We therefore require that $f$ is continuous at at least one positive point and at least one negative point; not a very stringent requirement! We also wish to eliminate functions satisfying condition (2) in the foregoing, and this may seem rather contrived. In fact it is possible to characterize t-exceptional (case 2) functions, but not succinctly. Comparatively simple t-exceptional (case 2) functions include:
$z_1 = 10$, $z_2 = 1$, $f(x) = \sin(2\pi \log_{10}|x|)$;
$z_1 = 2$, $z_2 = 2$, $f(x) =$ the largest integral power of 2 not greater than $|x|$;
$z_1 = 2$, $z_2 = 3$, $f(x) = 3^{\sin(2\pi \log_2 x) + (\log_2 x)}$.
Generally, t-exceptional (case 2) functions have a property that might be described as geometric periodicity, as is the case for the t-unexceptional function $f(x) = bx^a$. For a further discussion of these functions, consult Wood (1995). The omission of the t-exceptional (case 2) functions is not significant here, as to include them in our characterization (Theorem III.1) would require only a modification of the transmutation conditions for general positive and perm-diagonal representations, whereas we will see in Section III.D that in fact these cases of transmutation are redundant in group representation networks. We will require some further notation. Let $p_a$ denote the function which, applied (where applicable) to a matrix of arbitrary dimensions, raises each entry to the power of $a$. Similarly, let $\tilde{p}_a$ denote the function that raises each entry to the power of $a$ and multiplies by the sign of the original entry. Our first main result is as follows.
Theorem III.1 Let $A$ and $B$ be finite-dimensional representations of a group $G$ over the field $\mathbb{R}$, and let $f$ be a t-unexceptional function from $\mathbb{R}$ to $\mathbb{R}$. Then $f$ transmutes $A$ into $B$ precisely when one of the following holds:
1. $A$ is a permutation representation and $B = A$.
2. $A$ is an inversion representation, $B = A$ and $f$ is odd.
3. $A$ is an inversion representation, $B$ is the underlying permutation representation of $A$ and $f$ is even.
4. $A$ is a positive representation and some reals $a, b_1, b_2$ exist such that $B(g) = p_a(A(g))$ for all $g \in G$, and $f$ has the form:
$$\forall x \in \mathbb{R} \quad f(x) = \begin{cases} b_1 x^a & x > 0 \\ b_2 x^a & x < 0 \\ 0 & x = 0. \end{cases}$$
5. $A$ is a perm-diagonal representation and some reals $a, b$ exist such that one of the following holds:
(a) $B(g) = p_a(A(g))$ for all $g \in G$ and $f$ has the form $f(x) = bx^a$ for all $x$.
(b) $B(g) = \tilde{p}_a(A(g))$ for all $g \in G$ and $f$ has the form $f(x) = \operatorname{sgn}(x)\,bx^a$ for all $x$.
6. $A$ is a unit-row representation, $B = A$ and $f$ is affine.
7. $B = A$ and $f$ is linear.
8. $B$ is a unit-row representation and $f$ is constant.
9. $f$ is the zero function.
Proof See the Appendix. □
In the previous paper (Wood and Shawe-Taylor, 1996a), only the case of preservation ($A = B$) was considered. As we will see, this omits an important and useful case of transmutation. In the preceding list, evidently cases 6-9 are not of any real interest, as the functions $f$ are too simple to be useful in constructing advanced concomitant structures. However, cases 1-5 offer a large class of nonlinear concomitants. Finally, note that while our general discussion proceeds on the assumption that the invariance group is finite, Theorem III.1 in fact holds for an arbitrary group $G$.
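As an illustration of the power-law case 5(b) of Theorem III.1, the sketch below spot-checks the identity $\underline{f}(Mv) = \tilde{p}_a(M)\,\underline{f}(v)$ for a single perm-diagonal matrix. It is not taken from the original text: the matrix `M`, the constants `a` and `b`, and the reading of $x^a$ as $|x|^a$ for negative $x$ are all assumptions made so that non-integer exponents are well defined.

```python
import math

def sgn(x):
    """Sign of x as -1, 0 or 1."""
    return (x > 0) - (x < 0)

def p_tilde(M, a):
    """Entry-wise signed power sgn(m) * |m|**a; zero entries stay zero."""
    return [[sgn(m) * abs(m) ** a if m != 0 else 0.0 for m in row] for row in M]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

a, b = 0.5, 3.0  # hypothetical exponent and scale

def f(x):
    """Assumed reading of f(x) = sgn(x) b x^a, with x^a taken as |x|^a."""
    return sgn(x) * b * abs(x) ** a

# Hypothetical perm-diagonal matrix: one nonzero entry per row and column.
M = [[0.0, -4.0],
     [9.0, 0.0]]
v = [2.0, -0.25]

lhs = [f(x) for x in matvec(M, v)]            # f applied after the transformation
rhs = matvec(p_tilde(M, a), [f(x) for x in v])  # transformation applied after f
print(lhs, rhs)
```

Because each row of a perm-diagonal matrix has a single nonzero entry, the power law factors through the product entry by entry, which is why the two sides agree.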
C. Fixed-Weight Group Representation Networks

By combining representation homomorphisms and transmutation functions together in a structured manner, we find that it is possible to construct highly complex functions with a group concomitance property. This can be done using the two composition operations shown in Fig. 1. Furthermore, by composing these linear and nonlinear concomitants together in such a way, we naturally obtain a feedforward neural network structure. A feedforward neural network is essentially a directed acyclic graph, with input nodes (those nodes with no parents) and output nodes (those nodes with no children). The network acts as an information processor, with each edge having a corresponding multiplicative weight, and a non-input node’s output being determined by summing the node’s weighted inputs and then applying a nonlinear function, the activation function. For more information on artificial neural networks consult, for example, (Fausett, 1994; Hertz et al., 1991). The initial definition of a GRN, which does not include a parameterization to allow adaptation, is as follows.
Definition III.3 A fixed-weight Group Representation Network (GRN) over the group $G$ is a feedforward neural network $N$ having the following properties:
1. The nodes of $N$ are partitioned into layers, the input nodes forming a single layer and each output node being a layer by itself. There are no connections between nodes in the same layer, and no feedback loops in the interlayer connections. Each layer $i$ has an associated pair of representations $(A_i, B_i)$ of $G$, each of dimension equal to the number of nodes of that layer. The representation $A_i$ is called the input representation of the layer, and $B_i$ is called the output representation. For each output layer $l$, $B_l = 1$, the trivial representation of $G$. The representations of the network are all over some fixed field $F$.
2. The connections of the network have weights that are fixed in $F$. The weight matrix yielding the inputs to layer $j$ from the outputs of layer $i$ is a homomorphism from $B_i$ to $A_j$.
3. All the nodes in a given non-input layer $i$ have the same activation function, $f_i : F \to F$, which transmutes $A_i$ into $B_i$.
The representation $B_0$ corresponding to the input layer is called the input representation of the network. When $F = \mathbb{R}$, the GRN is said to be real. The input representation of a fixed-weight GRN is the output representation of the input layer, rather than the input representation $A_0$ (which is completely redundant in Definition III.3). The functionality of a GRN is that of its underlying neural network. Normally, we will restrict our attention to real GRNs. Occasionally we will have recourse to consider the complex variety. We will also wish at one point to extend conceptually the notion of the GRN to a network where some of the layers contain an infinite number of nodes, but this is not the usual situation. The group representation network is designed to meet an invariance condition:
Theorem III.2 Let $N$ be a fixed-weight GRN over the group $G$, and let $A$ be the input representation of $N$. Then the output $f_N$ of $N$ is invariant under the action of $A$ on the input layer, that is, for any network input vector $v$ we have
$$\forall g \in G \quad f_N(A(g)v) = f_N(v),$$
that is, $f_N$ is a concomitant from $A$ to the trivial representation 1 of $G$.
Proof The essence of the proof is as follows. Consider the propagation of an arbitrary input vector $v$ through the network $N$, and then consider the result of inputting the transformed vector $A(g)v$ for some $g \in G$. The output of the input layer is transformed by $A(g) = B_0(g)$. Now for each layer $i$ of $N$ with connections leading only from the input layer, the homomorphism property of the weight matrix between the two layers induces the action $A_i(g)$ on the input vector to that layer. In other words, if $v^{(i)}$ was the vector of inputs to that layer with $v$ as the network input, then $A_i(g)v^{(i)}$ will be the corresponding input to that layer with $A(g)v$ as the network input. Next, as the activation function for that layer transmutes $A_i$ into $B_i$, there is a transformation $B_i(g)$ induced on the vector of outputs from layer $i$. Proceeding through the network in a feedforward fashion, we eventually find that the action of $A(g)$ on the network input induces an action $B_l(g)$ on the output of each network output layer $l$, where $B_l$ is the output representation of that layer. But each $B_l(g) = 1$, the trivial representation, so the network outputs are unchanged. This proof can easily be formalized by induction through the network structure (Wood, 1995). □

Thus any GRN, constructed by connecting representation homomorphisms and transmutation functions in a feedforward structure, is invariant under the action of the group on the inputs. To construct fixed-weight GRNs under arbitrary finite-dimensional representations of finite (or compact) groups, it is only necessary to have enough information about the representations of the groups to be able to compute homomorphisms between them. Direct information on the invariants of the group is not needed. We can also consider GRNs in which the output representations of the network are nontrivial (Wood, 1995). In such a case, the GRN is computing a concomitant of group representations, rather than an invariant.
This concept will be useful in some later proofs, but it is not part of our definition of a GRN.
Example III.3
1. Examples of GRNs will appear throughout the remainder of the paper; however, we give two examples of fixed-weight GRNs here. First, consider the bipolar XOR problem, defined as follows:

v_1    v_2    v_1 XOR v_2
-1     -1     -1
-1      1      1
 1     -1      1
 1      1     -1

Given its size, this problem is notoriously hard to learn (e.g., to train a neural network on). However, the problem is almost entirely defined by a symmetry: if the two inputs $v_1$, $v_2$ are both multiplied by $-1$, the output is unaffected. If we can construct a GRN invariant under this operation, and which furthermore discriminates between the input pair $(-1, -1)$ and the input pair $(-1, 1)$, then we have effectively solved the bipolar XOR problem. The invariance group in question is $S_2 = C_2 = \{e, (1\ 2)\}$, and the input representation of the desired GRN is given by
$$B_0((1\ 2)) = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}.$$
This representation is the direct sum of two copies of the group’s alternating representation. We choose a single hidden layer (“layer 1”) with two nodes, for which the input and output representations are given by
$$A_1((1\ 2)) = B_1((1\ 2)) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
The single output node has connections only from the hidden layer, and its representations $A_2$, $B_2$ are both chosen to be trivial. The activation function of the hidden layer nodes is $f_1(x) = \theta(x - 1.5)$, where $\theta$ is the Heaviside function
$$\forall x \in \mathbb{R} \quad \theta(x) = \begin{cases} 1 & x \geq 0 \\ 0 & x < 0. \end{cases}$$
The activation function of the output layer node is $f_2(x) = 2\theta(x - 0.5) - 1$.
By Theorem III.1, these functions have the transmutation property with respect to the corresponding pairs of representations. The weight matrix from the input layer to the hidden layer is chosen as
$$W = \begin{pmatrix} 2 & -3 \\ -2 & 3 \end{pmatrix}.$$
This is a homomorphism from $B_0$ to $A_1$, as
$$\begin{pmatrix} 2 & -3 \\ -2 & 3 \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 2 & -3 \\ -2 & 3 \end{pmatrix}.$$
The weights from the hidden to the output layer are set to be 1; the matrix $(1\ \ 1)$ is clearly a homomorphism from $B_1$ to $A_2 = 1$. The resulting network, shown in Fig. 2, is therefore a group representation network.
2. Second, consider the problem of constructing a neural network with input space $\mathbb{R}^2$, which is invariant under 120° rotations about the origin, and under reflection in the y-axis. The invariance group is $G = S_3$, and the representation in question is the 2-D irreducible representation discussed in Examples II.2.5, II.3.2, and II.5.2. This representation is defined by
[matrices not recoverable from the source]
FIGURE 2. Example of a fixed-weight GRN.

We wish to construct a GRN invariant under this representation. An obvious first choice is simply to have no hidden nodes and an output node connected directly to the inputs, with $A_1 = B_1 =$ the trivial representation. However, the weight matrix of connections from the input nodes to the output node is an element of the intertwining space of $(B_0, A_1)$, and by Schur’s Lemma this space is 0, the two representations being irreducible and inequivalent. In Example II.5.2 we showed that the intertwining number of these representations was 0. Another possibility is to still have no hidden nodes, but to change $A_1$ to one of the group’s alternating representations. However, this meets with the same problem, as again $B_0$ and $A_1$ are irreducible and inequivalent. Hidden nodes are therefore essential to obtain a nonconstant network output. Let us take a single hidden layer (“layer 1”) with three nodes, where the input and output representations are the natural permutation representation $A_1 = B_1$, defined by
A_1((1 2 3)) = [3 × 3 permutation matrix not recoverable from the source]
By Theorem III.1, the activation function $f_1$ of layer 1 can be chosen arbitrarily, say $f_1(x) = \tanh x$. A suitable weight matrix for the connections from layer 0 to layer 1 is the transpose of a submatrix of the isomorphism given in Example II.3.2:
W_{0,1} = [3 × 2 matrix not recoverable from the source]
The choice of the weight matrix is far from unique, as any element of the intertwining space can be chosen. However, in this case the intertwining number is equal to 1, as can be seen by considering the equivalence derived in Example II.3.2, together with law 6 of Lemma II.4. Hence any valid weight matrix is a multiple of the one given here. We will choose the output layer representations $A_2 = B_2 = 1$; the input to the output layer can simply be chosen to be the sum of the outputs of the hidden nodes. The activation function $f_2$ of the output layer is again arbitrary, so we take $f_2(x) = \tanh x$. The resulting network is shown in Fig. 3.
FIGURE 3. Example of a fixed-weight GRN.
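The bipolar XOR network of Example III.3.1 is small enough to evaluate directly. The sketch below is illustrative only: it assumes the input-to-hidden weight matrix is (2 −3; −2 3), as the damaged equations and Fig. 2 suggest, with hidden activation θ(x − 1.5), unit hidden-to-output weights and output activation 2θ(x − 0.5) − 1. The function names are hypothetical.

```python
def theta(x):
    """Heaviside function: 1 for x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0

# Assumed input-to-hidden weights (reconstructed, see the lead-in).
W = [[2, -3],
     [-2, 3]]

def grn_xor(v1, v2):
    """Forward pass: hidden layer theta(x - 1.5), output 2*theta(x - 0.5) - 1."""
    hidden = [theta(w1 * v1 + w2 * v2 - 1.5) for w1, w2 in W]
    return 2 * theta(sum(hidden) - 0.5) - 1

table = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
outputs = [grn_xor(a, b) for a, b in table]

# Invariance under the group action v -> -v, checked on a few real inputs.
samples = [(0.3, -2.0), (1.7, 0.4), (-5, 2)]
invariant = all(grn_xor(-a, -b) == grn_xor(a, b) for a, b in samples)
print(outputs, invariant)
```

Negating the input swaps the two hidden pre-activations, so the hidden outputs are permuted and their sum, hence the network output, is unchanged.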
D. Redundancy of Noninduced Representations

The previous sections describe the structure of (fixed-weight) group representation networks (GRN), the activation functions in these networks being chosen to satisfy the transmutation law, Theorem III.1. Now in fact many of the different classes of activation function suggested by this theorem can be eliminated, in the sense that any real GRN using such an activation function can be replaced by a simpler one computing precisely the same function. We will see in this section that, in fact, we can reduce our consideration to the first three cases of Theorem III.1, and that, furthermore, some inversion representations can also be discounted. Aside from eliminating the need to consider many classes of representations when designing GRNs, this also simplifies the problem of fully describing the homomorphism (weight matrix) structure of GRNs. We are going to show that cases 4-9 of Theorem III.1 can be eliminated from consideration in a GRN. Case 9, the zero function, is trivial. Case 7, the linear function, is also straightforward, since any layer having a linear activation function can be removed, with appropriate new connections being introduced elsewhere (or changes made to existing connections) to ensure
that the functionality of the network is unaltered. It is easy to see (from simple predictable rules obeyed by representation homomorphisms) that the resulting network is still a GRN. Case 8, that of a constant activation function, can also be eliminated by replacing any layer using such a function by a single “threshold node” providing a constant output of 1 to all layers to which the original hidden layer fed input. This node is fixed by the group, and can be thought of either as a new input node, or as a hidden layer (by itself) with no inputs! The last straightforward case to eliminate is case 6, an affine activation function, which can be reduced to a combination of the linear and constant cases. Full details are given in Wood (1995). This leaves us with cases of transmutation concerning perm-diagonal representations. However the next result shows us that without loss of generality we can assume that all representations in the hidden layers of a GRN are induced from 1-D representations of some subgroup. For a finite group, and over a real field, this means that such representations are either permutation representations or a special class of inversion representations. Note that it entails no loss of generality to assume that a perm-diagonal representation is transitive; if a given layer has representations that are perm-diagonal but nontransitive, then it can be broken up into sublayers that correspond to transitive perm-diagonal representations.
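The removal of a layer with a linear activation function (case 7) amounts to multiplying the adjacent weight matrices together. The following is a minimal sketch with hypothetical weights `W_in`, `W_out` and a hypothetical linear activation f(x) = c·x; it checks only that the network function is unchanged and does not verify the homomorphism property of the merged matrix.

```python
def matmul(X, Y):
    """Matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical weights around a three-node layer with linear activation.
W_in = [[1, 2], [0, -1], [3, 1]]   # previous layer -> linear layer
W_out = [[2, 0, 1]]                # linear layer -> next layer
c = 4                              # linear activation f(x) = c * x

def with_linear_layer(v):
    hidden = [c * x for x in matvec(W_in, v)]
    return matvec(W_out, hidden)

# Direct connection replacing the linear layer: c * W_out * W_in.
W_direct = [[c * w for w in row] for row in matmul(W_out, W_in)]

def without_linear_layer(v):
    return matvec(W_direct, v)

inputs = [[1, -2], [5, 7], [0, 3]]
same = all(with_linear_layer(v) == without_linear_layer(v) for v in inputs)
print(W_direct, same)
```

Integer weights are used so that the comparison is exact.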
Lemma III.1 Let $A$ and $B$ be the input and output representations respectively of some non-input layer of a fixed-weight real GRN; assume $A$ and $B$ are perm-diagonal and transitive. By Theorem III.1, $A$ and $B$ have the same underlying permutation representation $P$. Let $H_1 = H, \ldots, H_m$ be the right cosets of a subgroup $H$ associated with $P$, as defined in Eq. 20 in Section II.F by:
[equation not recoverable from the source]
Let $\alpha$ and $\beta$ be the representations of $H$ defined by:
[equation not recoverable from the source]
Then we can replace $A$ by $\operatorname{ind}_H^G \alpha$ and $B$ by $\operatorname{ind}_H^G \beta$ in the GRN, with appropriate changes to the weight matrices connected to this layer, without changing the network functionality.
Proof For the sake of brevity, we provide only a sketch of the proof here; the full-length version is given in Wood (1995). Take $A$, $B$, $H$, $H_1, \ldots, H_m$, $\alpha$ and $\beta$ to be as given in the lemma statement, and let $f$ be the activation function, which transmutes $A$ into $B$. Further, let $A'$ and $B'$ denote $\operatorname{ind}_H^G \alpha$ and $\operatorname{ind}_H^G \beta$, respectively. The first step shows that $A$ and $A'$ are equivalent representations. An isomorphism from $A$ to $A'$ can be identified: Let $h_1, \ldots, h_m$ be a set of right coset representatives of $H$ in $G$, that is, $H_i = Hh_i$ for all $i \in 1, \ldots, m$. Then construct the matrix
[equation defining $T$ not recoverable from the source]
It is easy to see that $T$ is diagonal and nonsingular, and it is not hard to show that $TA(g) = A'(g)T$ for any $g \in G$. Hence $T$ is an isomorphism from $A$ to $A'$. Clearly one of the cases 1-5 of Theorem III.1 must hold. In each case, we have either
$$\forall g \in G \quad B(g) = p_a(A(g)) \tag{28}$$
or
$$\forall g \in G \quad B(g) = \tilde{p}_a(A(g)), \tag{29}$$
for some real $a$, where $p_a$ and $\tilde{p}_a$ are the functions introduced preceding Theorem III.1. In the permutation and inversion representation cases, we have $a = 1$ in either Eq. (28) or Eq. (29). The proof is essentially the same whether we have Eq. (28) or Eq. (29), so for simplicity we will consider only the case of Eq. (28). The set of integer powers of the invertible matrix $T$ forms a representation of the additive group $\mathbb{Z}$. The entries of $T$ are drawn from the entries of $A(g)$, $g \in G$, and so this representation of $\mathbb{Z}$ is perm-diagonal. The matrix $p_a(T)$ similarly generates a perm-diagonal representation of $\mathbb{Z}$. Applying the transmutation law, Theorem III.1, we see that for all $v$ in the representation space of $A$,
$$\underline{f}(Tv) = p_a(T)\,\underline{f}(v), \tag{30}$$
and similarly for $T^{-1}$ and $p_a(T^{-1})$. Furthermore, for perm-diagonal matrices $M$, $N$ it holds that $p_a(MN) = p_a(M)p_a(N)$. From this, we have that $p_a(T)B(g) = p_a(T)p_a(A(g)) = p_a(TA(g)) = p_a(A'(g)T) = p_a(A'(g))p_a(T)$ for all $g \in G$. Next, Eq. (28) yields that $\beta(h) = p_a(\alpha(h))$ for all $h \in H$, and from this we can show that $p_a(A'(g)) = B'(g)$. Consequently, $p_a(T)$ is a homomorphism (an isomorphism) from $B$ to $B'$. Combining this fact with Eqs. (28) and (30), we deduce that for all $v$ in the representation space of $A$ and all $g \in G$,
$$\underline{f}(A'(g)v) = B'(g)\,\underline{f}(v), \tag{31}$$
that is, $f$ transmutes $A'$ into $B'$.
Now build a new GRN, which is identical to the old one except that $A$ and $B$ are replaced by $A'$ and $B'$ in the layer concerned, and, furthermore, that any weight matrix $W$ of connections leading into that layer is replaced by $TW$, whereas any weight matrix $W'$ of connections leading out of that layer is replaced by $W'p_a(T)^{-1}$. It is easy to see that each such replacement is a homomorphism between the appropriate pair of representations, and this, together with the transmutation result Eq. (31), establishes that the new network really is a GRN. It remains to prove that its functionality is the same as that of the old one. For ease of notation, let us suppose that there is only one such weight matrix $W$, and only one such $W'$. Applying Eq. (30), we find that
$$(W'p_a(T)^{-1}) \circ \underline{f} \circ (TW) = (W'p_a(T)^{-1}) \circ p_a(T) \circ \underline{f} \circ W = W' \circ \underline{f} \circ W,$$
which demonstrates that the functionality of the new GRN is the same as that of the old one. This concludes the proof. □

Over a finite group, the only 1-D real representations are trivial or alternating. Hence the only representations of use in the hidden and output layers of a real GRN over a finite group are permutation and (a subclass of) inversion representations:

Corollary III.1 Any real GRN over a finite group computes the same function as another real GRN, with at most the same number of nodes, in which the hidden and output layers use only representations induced from 1-D representations of subgroups. In particular, such representations are either permutation representations or inversion representations.

Lemma III.1 and Corollary III.1 greatly reduce the task of describing the structure of GRNs in more detail.
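The final step of the proof of Lemma III.1, in which the incoming weights are replaced by $TW$ and the outgoing weights by $W'p_a(T)^{-1}$, can be checked on a toy layer. This sketch uses hypothetical values throughout: a diagonal `T`, the power-law activation f(x) = 5x³ (so a = 3), and arbitrary weight matrices. It verifies only that the network function is unchanged, using exact rational arithmetic.

```python
from fractions import Fraction as Fr

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def p_a(M, a):
    """Raise each nonzero entry to the power a; zero entries stay zero."""
    return [[m ** a if m != 0 else Fr(0) for m in row] for row in M]

def diag_inverse(D):
    """Inverse of a diagonal matrix with nonzero diagonal."""
    n = len(D)
    return [[1 / D[i][i] if i == j else Fr(0) for j in range(n)] for i in range(n)]

a = 3
def f_vec(v):
    """Component-wise activation f(x) = 5 x^3 (hypothetical power law)."""
    return [Fr(5) * x ** a for x in v]

T = [[Fr(2), Fr(0)], [Fr(0), Fr(-3)]]        # hypothetical diagonal isomorphism
W = [[Fr(1), Fr(2)], [Fr(-1), Fr(4)]]        # weights into the layer
W_out = [[Fr(3), Fr(-2)]]                    # weights out of the layer

def old_net(v):
    return matvec(W_out, f_vec(matvec(W, v)))

TW = matmul(T, W)
W_out_new = matmul(W_out, diag_inverse(p_a(T, a)))

def new_net(v):
    return matvec(W_out_new, f_vec(matvec(TW, v)))

inputs = [[Fr(1), Fr(-2)], [Fr(3, 2), Fr(7)]]
same = all(old_net(v) == new_net(v) for v in inputs)
print(same)
```

The cancellation relies on Eq. (30): the power law turns the rescaling by `T` before the activation into a rescaling by `p_a(T)` after it.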
IV. ADAPTIVITY IN GROUP REPRESENTATION NETWORKS
We have discussed fixed-weight GRNs, which are fixed-weight neural networks invariant under a given group representation. There is no reason why such networks could not be used as invariant feature extractors in an invariant pattern classifier. However, it is quite easy to introduce adaptivity directly into the GRN model, and this is what we discuss in this section. We also derive formulas for the weight matrices of connections in a GRN, which can be applied in both the adaptive and fixed-weight cases. At the end of the section, we provide an algorithm for GRN construction.
A. Parameterized Homomorphisms
A group representation network is in particular a feedforward neural network. Feedforward neural networks are normally considered to be adaptive, because their weights are generally variable and can be adapted so that the network “learns” a given function (Fausett, 1994; Hertz et al., 1991). The weight matrices of a feedforward neural network are, therefore, parameterized linear transformations. We would like to be able to parameterize the weight matrices of a GRN in a similar way. Unfortunately, we have the tricky restriction that the weight matrix must be a homomorphism between two group representations, that is, for any value of the matrix parameters. How can we accomplish this? Surprisingly, we already have an initial solution in Lemma III.1, which provides a parameterization of all homomorphisms from a representation $A$ to a representation $B$:
[Eq. (32) not recoverable from the source]
where $X_p$ is an arbitrary linear transform. If we treat this equation as a map from a parameter space (space of entries of $X_p$) to the intertwining space of $(A, B)$ (the space of homomorphisms $W_{A,B}$ from $A$ to $B$), then our problem is solved. Furthermore, if we regard the entries of $X_p$ as indeterminates, the resulting $W_{A,B}$ is a linear polynomial matrix that represents the entire parameterization map.
Definition IV.1 Let $A$ and $B$ be two finite-dimensional representations of a group $G$. Then a parameterized homomorphism from $A$ to $B$ is a matrix $W_{A,B}$ with entries that are linear combinations of the indeterminates of a finite set $S$, such that the evaluation of $W_{A,B}$ at any point is a homomorphism from $A$ to $B$.

Note that, by Eq. (32), any intertwining space (between finite-dimensional representations of a finite group) admits a parameterized homomorphism, as the entries of $W_{A,B}$ are a linear combination of the indeterminates of $X_p$. It is important that in this case the parameterized homomorphism can take any value in the intertwining space, by Lemma III.1. A parameterized homomorphism with this property is said to be complete.

Definition IV.2 An adaptive group representation network (GRN) is a fixed-weight GRN in which the weight matrix homomorphisms are replaced by parameterized homomorphisms.

The parameterized homomorphisms in an adaptive GRN are normally taken to be complete, so as not to restrict the adaptive power of the system unnecessarily. The problem with the parameterized homomorphisms described by Eq. (32) is that there is in general a huge amount of redundancy in the parameter space. In other words, most intertwining spaces can be parameterized using far fewer parameters. A reduction in the parameter space can be achieved by using an echelon-form reduction on the map from the parameter space to the intertwining space. However, for the purpose of inclusion in a GRN we have seen that we need to consider only parameterized homomorphisms where $B$ is an induced representation. For this special case, we can produce a formula using significantly fewer parameters, as will be seen in the next section.
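The redundancy of a group-sum parameterization can be seen on the two-element group. The sketch below assumes that Eq. (32) has the group-sum form quoted in the proof of Theorem IV.1 (a sum over group elements of target(s)·X·source(s⁻¹), with any normalizing constant omitted), and it assumes the representations of the XOR example are −I on the input layer and the coordinate swap on the hidden layer. All names are hypothetical.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

I2 = [[1, 0], [0, 1]]
NEG = [[-1, 0], [0, -1]]
SWAP = [[0, 1], [1, 0]]

# Assumed representations of C_2 = {e, t}: source B0 (input), target A1 (hidden).
B0 = {"e": I2, "t": NEG}
A1 = {"e": I2, "t": SWAP}
inverse = {"e": "e", "t": "t"}

def group_sum_hom(X):
    """W = sum over s of A1(s) X B0(s^-1); satisfies W B0(g) = A1(g) W."""
    W = [[0, 0], [0, 0]]
    for s in B0:
        W = matadd(W, matmul(matmul(A1[s], X), B0[inverse[s]]))
    return W

W1 = group_sum_hom([[3, 2], [1, 5]])
W2 = group_sum_hom([[2, -3], [0, 0]])   # a different X giving the same W

is_hom = all(matmul(W1, B0[g]) == matmul(A1[g], W1) for g in B0)
print(W1, W2, is_hom)
```

Two different parameter matrices yield the same homomorphism, which is the redundancy described above: four entries of X, but only two independent weights.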
Example IV.1
1. We have already produced an example of a parameterized homomorphism in Example II.3.3. The matrix
$$W_{A,B} = \begin{pmatrix} p_1 & p_1 & p_1 & p_2 & p_2 & p_2 \\ p_1 & p_2 & p_2 & p_1 & p_1 & p_2 \\ p_2 & p_1 & p_2 & p_1 & p_2 & p_1 \\ p_2 & p_2 & p_1 & p_2 & p_1 & p_1 \end{pmatrix}$$
in the two indeterminates $p_1$, $p_2$, is a parameterized homomorphism between two permutation representations $A$, $B$ of $S_4$. Here $B$ is the natural permutation representation of $S_4$ induced by the permutations of the nodes of a 4-vertex graph, and $A$ is similarly induced by the corresponding permutations of the graph's edges. In Example II.5.2, we reasoned that the intertwining space of $(A, B)$ has dimension 2. Hence the set of matrices represented by $W_{A,B}$, which is clearly a 2-D subspace of the space of all 4 × 6 matrices, is the whole intertwining space, that is, $W$ is a complete parameterized homomorphism. As an (adaptive) weight matrix in a neural network, $W$ is shown in Fig. 4. An example of a noncomplete parameterized homomorphism would be the matrix $W'_{A,B}$ obtained from $W_{A,B}$ by replacing $p_2$ by $p_1$.

FIGURE 4. A parameterized homomorphism as a weight matrix in a GRN. Connections drawn in the same style are constrained to have equal weights.

2. Schur's Lemma II.2 provides complete parameterized homomorphisms between any pair of irreducible finite-dimensional representations; when the representations are inequivalent, a complete parameterized homomorphism is given by $W = 0$, and when they are equal, it is given by $W = kI$, with $k$ now treated as an indeterminate.
3. To illustrate the redundancy of parameters that generally results when applying Eq. (32), with $X_p$ as a matrix of indeterminates, let us construct parameterized versions of the homomorphisms between the input and hidden layers of the fixed-weight GRNs in Example III.3. In Example III.3.1, the group is $S_2 = C_2$, with two representations given by
[equation not recoverable from the source]
A complete parameterized homomorphism from $B_0$ to $A_1$ can be constructed via Eq. (32) as follows:
[equation not recoverable from the source]
Thus Eq. (32) provides a parameterized homomorphism with four parameters, whereas we evidently need only two.
4. A still worse case of parameter redundancy appears when we try to construct an adaptive version of the fixed-weight GRN in Example
III.3.2. Starting with a 3 × 2 parameter matrix $X_p$ with entries $p_1, \ldots, p_6$, we obtain the following complete parameterized homomorphism:
[matrix not recoverable from the source]
which involves five parameters (or, if the algorithm is automated, six, unless the trivial redundancy of $p_5$ is detected). However, we can rewrite $W$ in the form
[matrix not recoverable from the source]
for a single parameter. In Example IV.4 we will construct a complete adaptive GRN. Other examples will be presented.

B. Parameterized Homomorphisms for Induced Representations

From Lemma III.1, we have seen that in a GRN, all hidden layer representations can be taken to be induced from a 1-D representation of some subgroup. This is a very useful assumption, as it allows us to describe in more detail the general structure of the network's parameterized homomorphisms.

Theorem IV.1 Let $G$ be a finite group, $H$ a subgroup and $A$ an arbitrary finite-dimensional representation of $G$. Let $h_1, \ldots, h_m$ be a complete set of right coset representatives of $H$ in $G$, and let $\beta$ be a 1-D representation of $H$. Now let $B$ denote the representation of $G$ given by $B = \operatorname{ind}_H^G \beta$, constructed with respect to the given right cosets. Finally, let $Q_H(\beta, A)$ denote the matrix
[equation not recoverable from the source]
Then a complete parameterized homomorphism from $A$ to $B$ is given by
[equation not recoverable from the source]
where $p$ is a parameter vector.
Proof From Eq. (22), a complete parameterized homomorphism is given by
$$W_{A,B} = \sum_{s \in G} B(s)\, X_p\, A(s^{-1}).$$
We can rewrite this using tensor (Kronecker) products in the form of a linear map from the parameter space to the intertwining space. This requires some notation: for any $p \times q$ matrix $C$, let $\operatorname{row}(C)$ denote the $pq$-dimensional vector obtained by writing out the rows of $C$ in a column vector. We now find (Wood, 1995):
$$\operatorname{row}(W_{A,B}) = Y \operatorname{row}(X_p),$$
where $Y$ is the linear transformation represented by the matrix
$$Y = \sum_{s \in G} B(s) \otimes A(s^{-1})^T.$$
Next, we use the block matrix form $Y = (v_{ij})$, $i, j \in 1, \ldots, m$, where each block $v_{ij}$ is given by
$$v_{ij} = \sum_{s \in G} B(s)_{i,j}\, A(s)^{-T}.$$
We now apply the fact that $B$ is induced from $\beta$, which means that $B(s)_{i,j} = \beta(h_i s h_j^{-1})$ when $h_i s h_j^{-1} \in H$, and 0 otherwise. We extend the domain of $\beta$ to $G$ by defining it to take value 0 on $G \setminus H$. Now we have
$$v_{ij} = \sum_{s \in G} \beta(h_i s h_j^{-1})\, A(s)^{-T}$$
as $\beta$ vanishes outside $H$. Now we sum over $h^{-1}$ rather than $h$, to obtain
$$v_{ij} = m\, A(h_i^{-1})^{-T}\, Q_H(\beta, A)^T\, A(h_j)^{-T} = m\, A(h_i)^{T}\, Q_H(\beta, A)^T\, A(h_j)^{-T}.$$
We define a new parameter vector $p$ by
$$p = m\,\bigl(A(h_1)^{-T} \mid A(h_2)^{-T} \mid \cdots \mid A(h_m)^{-T}\bigr) \operatorname{row}(X_p);$$
note that this mapping is epic, so any $p$ in the space $\mathbb{R}^n$ can be obtained from a suitable $X_p$. The re-parameterized homomorphism $W_{A,B}$ now appears as follows, in column vector form:
[equation not recoverable from the source]
Rewriting this in matrix form gives us the desired result. □
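The construction described by Theorem IV.1 can be sketched in code under stated assumptions, since the theorem's displayed formulas are missing from this copy. Following the prose remarks on the theorem, the sketch assumes $Q_H(\beta, A) = \frac{1}{|H|}\sum_{h \in H} \beta(h)^{-1} A(h)$ and takes row $i$ of $W_{A,B}$ to be $p^T Q_H(\beta, A) A(h_i)$. The group ($S_3$ as permutations of three points), the representation `A` (the natural permutation representation) and the parameter values are hypothetical choices for illustration.

```python
from fractions import Fraction as Fr
from itertools import permutations

def compose(p, q):
    """Composition (p o q)(i) = p[q[i]] of permutations given as tuples."""
    return tuple(p[j] for j in q)

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def A(p):
    """Permutation matrix with A(p) e_j = e_{p[j]}, so A(p) A(q) = A(p o q)."""
    n = len(p)
    return [[Fr(int(i == p[j])) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

G = list(permutations(range(3)))
e, t = (0, 1, 2), (1, 0, 2)
beta = {e: Fr(1), t: Fr(-1)}        # alternating representation of H = {e, t}

# Right coset representatives h_1, ..., h_m (cosets of the form H h).
reps, seen = [], set()
for g in G:
    if g not in seen:
        reps.append(g)
        seen.update(compose(h, g) for h in beta)

def B(s):
    """Induced representation: B(s)_ij = beta(h_i s h_j^-1) if in H, else 0."""
    return [[beta.get(compose(compose(hi, s), inverse(hj)), Fr(0))
             for hj in reps] for hi in reps]

# Assumed form of Q_H(beta, A): average over H of beta(h)^-1 A(h).
Q = [[sum(A(h)[i][j] / beta[h] for h in beta) / len(beta) for j in range(3)]
     for i in range(3)]

def parameterized_hom(p):
    """Rows p^T Q A(h_i), one per coset representative."""
    row = matmul([p], Q)
    return [matmul(row, A(h))[0] for h in reps]

W = parameterized_hom([Fr(2), Fr(-1), Fr(5)])
is_hom = all(matmul(W, A(s)) == matmul(B(s), W) for s in G)
print(is_hom)
```

Any nonzero normalizing constant in `Q` could be absorbed into the parameter vector, so the homomorphism check does not depend on the assumed factor 1/|H|.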
Let us make some remarks on Theorem IV.1. First, note that by Corollary III.1, the theorem effectively describes the weight matrices of any real GRN over a finite group. Also, over a real field, $\beta$ will always be either trivial or alternating, and so in particular $\beta(h) = \beta(h)^{-1}$ for any $h \in H$. We have used the form $\beta(h)^{-1}$ in defining $Q_H(\beta, A)$ because the preceding theorem also works in the case of complex representations. Furthermore, Theorem IV.1 can be applied to finite-dimensional representations of compact groups, using Haar integrals instead of summations (Wood, 1995).

The matrix $Q_H(\beta, A)$ can be thought of as a weighted average over H of the values of A. These parameterized homomorphisms (weight matrices in an adaptive GRN) can be easily constructed: we only have to take a simple weighted sum of the values of $A(h)$ over H, pre-multiply the result $Q_H(\beta, A)$ by a parameter vector $p^T$, and then post-multiply by the coset representative matrices $A(h_i)$ to obtain the final form. Furthermore, we can see that $p^T Q_H(\beta, A)$ is itself a complete parameterized homomorphism (from $\mathrm{res}^G_H A$ to $\beta$). The mapping from $p^T Q_H(\beta, A)$ to $W_{A,B}$ given by Eq. (34) can be thought of as a mapping from the intertwining space of $(\mathrm{res}^G_H A, \beta)$ to the intertwining space of $(A, \mathrm{ind}^G_H \beta)$. Theorem IV.1 therefore amounts to a restatement of the well-known Frobenius Reciprocity Theorem II.1 for 1-D $\beta$.

Example IV.2

1. Let G be the cyclic group $C_5$, and consider the problem of constructing a complete parameterized homomorphism from the natural permutation representation A of $C_5$ to itself. Denoting by g a generator of $C_5$, the representation is defined by
$$A(g) = \begin{pmatrix} 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}.$$
The subgroup corresponding to the permutation representation A is the trivial subgroup $H = \{e\}$, and $A = \mathrm{ind}^G_H \beta$, where $\beta$ denotes the trivial representation of H. The right coset representatives of H in G are $h_1 = e$, $h_2 = g$, $h_3 = g^2$, $h_4 = g^3$, $h_5 = g^4$. Finally, we see that $Q_H(\beta, A) = I_5$, and so the complete parameterized homomorphism is given from Theorem IV.1 by
It is easy to see that this is a parameterized homomorphism from A to A. It is also complete, and there is clearly no redundancy of parameters.

2. Let us now take $G = S_3$ and A to be the 2-D irreducible representation considered earlier, where $A((1\,2\,3))$ is a rotation of 120° about the origin in $\mathbb{R}^2$, and $A((1\,2))$ is a reflection in the y-axis. The complete representation is:
I!
1-1 3
-1 2
2) = (;I
Let B be the inversion representation induced from the alternating representation $\beta$ of the subgroup $H = \{e, (1\,2)\}$. We use the following enumeration of cosets (and choice of coset representatives): $H_1 = H$, $H_2 = H(1\,3\,2)$, $H_3 = H(1\,2\,3)$.
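$$B((1\,3)) = \begin{pmatrix} 0 & -1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \qquad B((2\,3)) = \begin{pmatrix} 0 & 0 & -1 \\ 0 & -1 & 0 \\ -1 & 0 & 0 \end{pmatrix}.$$
Now the matrix $Q_H(\beta, A)$ is easily seen to be
$$Q_H(\beta, A) = \frac{1}{2}\left[\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}\right] = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$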
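Applying Theorem IV.1, a complete parameterized homomorphism from A to B is found to be
$$W = \begin{pmatrix} (p_1 \ \ p_2)\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} A(e) \\[4pt] (p_1 \ \ p_2)\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} A((1\,3\,2)) \\[4pt] (p_1 \ \ p_2)\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} A((1\,2\,3)) \end{pmatrix},$$
whose first row is $(p_1 \ \ 0)$.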
3. In the two preceding examples, the complete parameterized homomorphism we constructed had no parameter redundancy. Let us now consider an example where there is redundancy, though to a considerably lesser degree than when using the technique in Section IV.A. By the remarks given about Frobenius reciprocity, it is enough to consider the redundancy in the first row of the parameterized homomorphism, which is a parameterized homomorphism from $\mathrm{res}^G_H A$ to $\beta$. Here we take
$$G = S_4, \qquad H = \{e, (2\,3\,4), (2\,4\,3), (2\,3), (2\,4), (3\,4)\},$$
$\beta$ to be the trivial representation of H and A the 6-D permutation representation of G induced by the permutation action on the edges of a 4-vertex graph. This is the situation considered in Examples II.5.2 and IV.1.1. In Theorem IV.1, we obtain the following value for $Q_H(\beta, A)$:
$$Q_H(\beta, A) = \frac{1}{3}\begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{pmatrix},$$
which (neglecting the constant factor) leads to the following as a complete parameterized homomorphism from $\mathrm{res}^G_H A$ to $\beta$:
$$(p_1+p_2+p_3 \quad p_1+p_2+p_3 \quad p_1+p_2+p_3 \quad p_4+p_5+p_6 \quad p_4+p_5+p_6 \quad p_4+p_5+p_6),$$
where $p_1, \ldots, p_6$ are indeterminates. This leads to a complete parameterized homomorphism from A to $\mathrm{ind}^G_H \beta$ which has six parameters. Clearly there is redundancy in this parameterization, and it can be reduced to the two-parameter form proposed in the earlier examples. Note that application of the general technique in Section IV.A would result in a parameterized homomorphism with 24 parameters!

C. Number of Parameters and Parameter Reduction

We have seen how to construct the parameterized homomorphisms in a GRN. Our initial construction, Eq. (32), of a complete parameterized homomorphism between two arbitrary finite-dimensional representations A and B used mn parameters, where n and m are the dimensions of A and B respectively. However, we saw that in the (effectively completely general) case where B is induced from a 1-D representation, we can instead use Eq. (34) in Theorem IV.1, which involves only n parameters. This is obviously a vast improvement; however, it leads to the question: can we reduce the number of parameters still further? In general, the answer is "yes." In fact, we can easily find the minimum number of parameters; it is given by the dimension of the space of homomorphisms from A to B, that is, the intertwining number of A and B. By Lemma II.3, this number is given by
$$\langle \chi[A], \chi[B] \rangle = \frac{1}{|G|} \sum_{g \in G} \chi[A](g)\, \chi[B](g),$$
where $\chi[A]$ and $\chi[B]$ are the characters of the representations A and B, given by the trace of the representation matrices, and we have removed the complex conjugate symbol as we are dealing with real representations. Using basic tensor product laws, we can easily show that this inner product is given by the trace or equivalently the rank of the matrix T introduced in the proof of Theorem IV.1. However, because we know that without loss of generality we can write $B = \mathrm{ind}^G_H \beta$, we can find a simpler expression using Frobenius reciprocity. By Theorem II.1, the intertwining number of A and B is also the intertwining number of $\mathrm{res}^G_H A$ and $\beta$, and is, therefore, given by
$$\frac{1}{|H|} \sum_{h \in H} \chi[A](h)\, \beta(h) = \operatorname{tr} Q_H(\beta, A).$$
Furthermore, $Q_H(\beta, A)$ is a projection matrix (Wood, 1995), and so its rank is equal to its trace. Summarizing this:
Lemma IV.1 Let A and B be finite-dimensional real representations of a finite group G, where B is induced from a 1-D representation $\beta$ of a subgroup H. Then the minimum number of parameters in terms of which a complete parameterized homomorphism from A to B can be expressed is given by the intertwining number of A and B. This is equal to the intertwining number of $\mathrm{res}^G_H A$ and $\beta$, or to the rank and trace of the matrix $Q_H(\beta, A)$ defined in Theorem IV.1.
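As an illustration of Lemma IV.1, the following sketch recomputes $Q_H(\beta, A)$ for the situation of Example IV.2.3 (edges of a 4-vertex graph, H the permutations fixing vertex 1, trivial $\beta$) and reads off the minimum number of parameters as its trace. The function names and the column-vector convention for the permutation matrices are assumptions made for this sketch, not notation from the text.

```python
from fractions import Fraction
from itertools import combinations, permutations

# Edges of the 4-vertex graph in lexicographic order: (1,2), (1,3), ..., (3,4).
EDGES = list(combinations(range(1, 5), 2))

def edge_matrix(vertex_map):
    """Permutation matrix A(h) of the edge action induced by a vertex map.

    Assumed convention: entry [i][j] is 1 when edge j is mapped to edge i.
    """
    size = len(EDGES)
    matrix = [[Fraction(0)] * size for _ in range(size)]
    for j, (a, b) in enumerate(EDGES):
        image = tuple(sorted((vertex_map[a], vertex_map[b])))
        matrix[EDGES.index(image)][j] = Fraction(1)
    return matrix

def q_matrix(matrices, beta_values):
    """Q_H(beta, A) = (1/|H|) * sum over h of beta(h)^{-1} A(h), beta real 1-D."""
    size, order = len(matrices[0]), len(matrices)
    q = [[Fraction(0)] * size for _ in range(size)]
    for matrix, beta in zip(matrices, beta_values):
        for i in range(size):
            for j in range(size):
                q[i][j] += matrix[i][j] / (beta * order)
    return q

def trace(matrix):
    return sum(matrix[i][i] for i in range(len(matrix)))

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# H: the six permutations of {1, 2, 3, 4} that fix vertex 1.
H = [dict(zip((1, 2, 3, 4), (1,) + image)) for image in permutations((2, 3, 4))]
Q = q_matrix([edge_matrix(h) for h in H], [1] * len(H))  # beta is trivial

print(trace(Q))            # 2: the minimum number of parameters
print(matmul(Q, Q) == Q)   # True: Q is a projection
```

Exact rational arithmetic is used so that the projection property can be checked with plain equality rather than a tolerance.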
As $Q_H(\beta, A)$ is a projection matrix, it has full rank only in the case where it is the identity, and this is very uncommon (in fact it occurs only when $A(h) = I$ for all $h \in H$). Therefore, in general, Theorem IV.1 still uses an unnecessarily high number n of parameters. To find a complete parameterized homomorphism using the minimum number of parameters, we need to find a matrix $Q'$ and parameter vector $p'$ such that $p'^T Q'$ is a complete parameterized homomorphism from $\mathrm{res}^G_H A$ to $\beta$, and $Q'$ has full row rank. Substituting this for $p^T Q_H(\beta, A)$ into Eq. (34) in Theorem IV.1 gives us a complete parameterized homomorphism from A to B with the minimum number of parameters. A suitable matrix $Q'$ can be found by row-reducing $Q_H(\beta, A)$ to echelon form (and then eliminating redundant rows and corresponding components of $p^T$). This does not change the span of the rows of the matrix, and so the result $p'^T Q'$ is also a complete parameterized homomorphism.

Example IV.3 Consider Example IV.2.3, in which we derived the following complete parameterized homomorphism from $\mathrm{res}^G_H A$ to $\beta$ (notation as in Example IV.2.3):
$$p^T Q_H(\beta, A) = \frac{1}{3}(p_1+p_2+p_3 \quad p_1+p_2+p_3 \quad p_1+p_2+p_3 \quad p_4+p_5+p_6 \quad p_4+p_5+p_6 \quad p_4+p_5+p_6),$$
where
$$Q_H(\beta, A) = \frac{1}{3}\begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 & 1 \end{pmatrix}.$$
We can reduce the number of parameters by row-reducing $Q_H(\beta, A)$ to its echelon form $Q'_H(\beta, A)$:
$$Q'_H(\beta, A) = \frac{1}{3}\begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$
The new complete parameterized homomorphism from $\mathrm{res}^G_H A$ to $\beta$ is then
$$p^T Q'_H(\beta, A) = \frac{1}{3}(p_1 \quad p_1 \quad p_1 \quad p_4 \quad p_4 \quad p_4),$$
which obviously contains no redundancy.

D. Algorithm for Group Representation Network Construction
We are now in a position to provide an algorithm for the construction of GRNs for arbitrary problems of invariance under a finite-dimensional real representation of a finite group. There is a great deal of freedom in such construction, which means that a network can be tailored (in terms of functional complexity) to suit a given problem. However, the choice of network connectivity and hidden layer representations has to be made heuristically. The method described can also be applied to finite-dimensional representations of compact groups, with the proviso that all representations used in the network are also finite-dimensional.

Algorithm IV.1

1. Assume that the invariance group G and input representation $B_0$ are given, where G is finite, and that $B_0$ is real and finite-dimensional.
2. Choose the number of layers of the network, and their connectivity. Choose a pair of real finite-dimensional representations $A_i$, $B_i$ (of the same dimension) for each non-input layer i, and an activation function $f_i$ which transmutes from $A_i$ to $B_i$. For each output layer l, $B_l$ must be trivial. The possible choices of triples $(A_i, B_i, f_i)$ are given by Theorem III.1. Without loss of generality (Corollary III.1), $A_i$ and $B_i$ can always be taken (for non-input layers) to be induced from trivial or alternating representations of a subgroup $H_i$ of G, and so are either permutation or inversion representations. The following considerations may help to choose the hidden layer representations:
(a) The number of nodes in a given hidden layer i is equal to the dimension of $A_i$ or of $B_i$. The number of hidden nodes in the network, and the connectivity of layers, are determined by considerations similar to those in standard feedforward network construction.
(b) The effective number of free parameters in the network, a crucial consideration for training and generalization, is given by the total of the effective number of free parameters in each layer of connections (plus any additional degrees of freedom introduced by parameterizing the network's activation functions, for example). In the layer of connections between layer i and layer j, the effective number of free parameters is given by the intertwining number of $B_i$ and $A_j$. This number can be found by applying Lemma IV.1. It is generally good to maintain a low ratio of number of connections to number of parameters.
3. For each set of connections from a layer i to a layer j, construct the weight matrix as a (preferably complete) parameterized homomorphism from $B_i$ to $A_j$. By assumption, $A_j$ is induced from a trivial or alternating representation $\alpha_j$ of a subgroup $H_j$. A complete parameterized homomorphism can be constructed by applying Theorem IV.1.
4. If desired, the actual number of parameters in the network can be reduced to the effective number by row-reducing each matrix $Q_{H_j}(\alpha_j, B_i)$ to echelon form.
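Step 4 of the algorithm can be sketched as follows. This is a minimal illustration using exact fractions, with a hypothetical helper name `reduced_rows`; it computes the reduced row echelon form (pivots scaled to 1, so constant factors such as 1/3 are absorbed) and drops the zero rows, which is slightly different in presentation from the echelon form displayed in Example IV.3 but has the same row span.

```python
from fractions import Fraction

def reduced_rows(matrix):
    """Row-reduce `matrix` and return only the nonzero rows.

    The surviving rows span the same space as the rows of the input, so they
    can serve as a full-row-rank replacement Q' for Q_H(beta, A).
    """
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivot_row = 0
    for col in range(len(rows[0])):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        scale = rows[pivot_row][col]
        rows[pivot_row] = [x / scale for x in rows[pivot_row]]
        # Clear this column in every other row.
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == len(rows):
            break
    return [row for row in rows if any(x != 0 for x in row)]

# Q_H(beta, A) from Example IV.3: two 3x3 blocks of ones, scaled by 1/3.
third = Fraction(1, 3)
Q = [[third] * 3 + [0] * 3 for _ in range(3)] + [[0] * 3 + [third] * 3 for _ in range(3)]

for row in reduced_rows(Q):
    print([int(x) for x in row])
# [1, 1, 1, 0, 0, 0]
# [0, 0, 0, 1, 1, 1]
```

The two surviving rows correspond to the two parameters that remain after the reduction in Example IV.3.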
One of the central issues in GRN design is the choice of representations for the hidden layers of the network. An important guide to this choice is the number of free parameters, that is, the intertwining numbers. While these can be calculated for a given pair of representations by Lemma IV.1, this technique is inefficient when designing a large network, as it has to be repeated for many candidate pairs of representations. Instead, it is better to find or construct the character table of the group, and then to decompose each candidate representation in terms of the irreducible representations, using the laws given in Lemma II.4. All the intertwining numbers of interest can then be easily computed by applying law 6 of Lemma II.4.

Note that a GRN can include a threshold parameter in each non-input layer for which the input and output representations are permutation representations. This follows from the transmutation law, Theorem III.1. We can also deduce this by considering the threshold as the weight of a connection from an extra input node which outputs a constant value of 1. It is crucial that all the nodes in a given layer have the same threshold.

Example IV.4 Here is a complete example of an adaptive group representation network. The invariance group will be the symmetric group of degree 3:
$$G = S_3 = \{e, (1\,2\,3), (1\,3\,2), (1\,2), (1\,3), (2\,3)\}.$$
We will build a 4-layer network. The representations and activation function corresponding to each layer are as follows.
1. The input representation of the GRN (i.e., the output representation of the input layer) is the 2-D irreducible representation $B_0$ considered in Example IV.2.2.
2. The input representation of the first hidden layer corresponds to an inversion representation $A_1$ induced from the alternating representation of the subgroup $\{e, (1\,2)\}$, again as considered in Example IV.2.2. The output representation of this layer is the underlying permutation representation $B_1$ of $A_1$:
$$B_1(e) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad B_1((1\,2\,3)) = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \quad B_1((1\,3\,2)) = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix},$$
$$B_1((1\,2)) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad B_1((1\,3)) = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad B_1((2\,3)) = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}.$$
According to the transmutation law, Theorem III.1, the activation function can be any even function.
3. The input and output representations of the second hidden layer are both the regular representation $A_2$, and the activation function of this layer is arbitrary.
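$$A_2((1\,2\,3)) = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix}, \qquad A_2((1\,2)) = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix}.$$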
4. The input and output representations of the output layer are both trivial, and the activation function is again arbitrary.

The numbers of nodes in each layer are given by the dimensions of the corresponding representations, that is, 2, 3, 6, and 1, respectively. For simplicity, we take each layer to have connections leading only to the layer above.

Let us begin by considering the layer of connections from the input layer to the $(A_1, B_1)$-layer. The minimum parameter dimension of these connections is the intertwining number of $(B_0, A_1)$, which by Lemma IV.1 is equal to the trace of the matrix
$$Q_{\{e,(1\,2)\}}(\alpha, B_0) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},$$
where $\alpha$ denotes the alternating representation of $\{e, (1\,2)\}$. Thus the minimum number of free parameters is 1. The weight matrix of the complete formal homomorphism, as given by Theorem IV.1, has already been constructed in Example IV.2.2, and is equal to
$$W_{B_0,A_1} = \begin{pmatrix} (p_1 \ \ 0) \\ (p_1 \ \ 0)\, B_0((1\,3\,2)) \\ (p_1 \ \ 0)\, B_0((1\,2\,3)) \end{pmatrix}.$$
Let us now move on to the next layer of connections, that is, between the $B_1$ and $A_2$ representations. The characteristic matrix $Q_{\{e\}}(1, B_1)$ is clearly the 3 × 3 identity matrix, and so by Lemma IV.1 the minimum number of parameters is 3. Furthermore, from Theorem IV.1 the weight matrix $W_{B_1,A_2}$ is the 6 × 3 matrix whose rows are $p^T B_1(h_i)$, $i = 1, \ldots, 6$, with $p^T = (p_3 \ \ p_4 \ \ p_5)$ and $h_i$ running over the elements of G; each row is therefore a permutation of $(p_3, p_4, p_5)$. The parameters $p_3$, $p_4$, and $p_5$ are so enumerated to avoid confusion with those of the previous layer. Finally, the connections to the output layer can be constructed trivially; there is one free parameter $p_6$, and the weight matrix is:
$$W_{B_2,A_3} = (p_6 \quad p_6 \quad p_6 \quad p_6 \quad p_6 \quad p_6).$$
This completely specifies an invariant GRN. There are five effective degrees of freedom in the network. Without using echelon form reduction of the coefficient matrices, or any other kind of simplification technique, the sixth parameter $p_2$ would also be included in $W_{B_0,A_1}$, despite the fact that it has no actual effect on the operation of the network. Additional degrees of freedom could be incorporated by parameterizing the activation functions; in particular by including thresholds. As the activation function of the first hidden layer must be even, thresholds are not allowed in this layer; however, thresholds can be added to the second hidden layer and the output layer.
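The invariance of a network with this layer structure can be checked numerically. The sketch below is an illustration under stated assumptions, not the construction of Theorem IV.1: it builds equivalent copies of the four representations of $S_3$ (the 2-D irreducible representation is obtained by restricting the 3-D permutation representation to the plane $x + y + z = 0$), and it obtains intertwining weight matrices by averaging random matrices over the group rather than from the closed form, so it carries more raw parameters than the five effective ones. The even activation $t \mapsto t^2$ and all function names are choices made for this sketch.

```python
import math
import random
from itertools import permutations

G = list(permutations(range(3)))  # g[i] is the image of point i

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def perm(g):
    """B_1: 3-D permutation representation, basis vector j is sent to g[j]."""
    return [[1.0 if g[j] == i else 0.0 for j in range(3)] for i in range(3)]

def sign(g):
    inversions = sum(g[i] > g[j] for i in range(3) for j in range(i + 1, 3))
    return -1.0 if inversions % 2 else 1.0

# Orthonormal basis (as columns) of the plane x + y + z = 0.
E = [[1 / math.sqrt(2), 1 / math.sqrt(6)],
     [-1 / math.sqrt(2), 1 / math.sqrt(6)],
     [0.0, -2 / math.sqrt(6)]]

def irrep2(g):
    """B_0: a copy of the 2-D irreducible representation."""
    return matmul(transpose(E), matmul(perm(g), E))

def inversion(g):
    """A_1: inversion representation, sign(g) times the permutation matrix."""
    return [[sign(g) * x for x in row] for row in perm(g)]

def regular(g):
    """A_2 = B_2: regular representation (left multiplication on G)."""
    def times_g(h):
        return tuple(g[h[i]] for i in range(3))
    return [[1.0 if times_g(h) == k else 0.0 for h in G] for k in G]

def trivial(g):
    return [[1.0]]

def symmetrize(x, rep_in, rep_out):
    """Average rep_out(g)^T x rep_in(g) over G; the result W satisfies
    W rep_in(g) = rep_out(g) W (all representations here are orthogonal)."""
    rows, cols = len(x), len(x[0])
    w = [[0.0] * cols for _ in range(rows)]
    for g in G:
        term = matmul(transpose(rep_out(g)), matmul(x, rep_in(g)))
        for i in range(rows):
            for j in range(cols):
                w[i][j] += term[i][j] / len(G)
    return w

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
W1 = symmetrize(random_matrix(3, 2, rng), irrep2, inversion)
W2 = symmetrize(random_matrix(6, 3, rng), perm, regular)
W3 = symmetrize(random_matrix(1, 6, rng), regular, trivial)

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def network(v):
    column = [[t] for t in v]
    h1 = [[t[0] ** 2] for t in matmul(W1, column)]   # even activation
    h2 = [[sigmoid(t[0])] for t in matmul(W2, h1)]   # arbitrary activation
    return sigmoid(matmul(W3, h2)[0][0])

v = [0.3, -1.2]
outputs = [network([row[0] for row in matmul(irrep2(g), [[t] for t in v])]) for g in G]
print(max(outputs) - min(outputs))   # differences at round-off level only
```

The six transformed copies of the input produce the same output up to floating-point error, which is the invariance property the example is built to guarantee.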
Note that all the examples of GRNs which we have given have simple connectivity, with each layer feeding direct information only to a single higher layer. As in a classical feedforward neural network, GRNs can have more general underlying structures; the only requirement on the connectivity is that it cannot involve loops.
E. Symmetry Networks

An important subclass of GRNs, sufficient for many applications, are the "symmetry networks" introduced by Shawe-Taylor (1989, 1993). A symmetry network is a GRN in which every representation is a (finite-dimensional) transitive permutation representation. The structure of a symmetry network can be described in more detail than that of a general GRN; in particular:

1. The input and output representations of each layer are equal, and there are no restrictions on the network's activation functions. In particular this means that thresholds can be added to the hidden and output nodes without affecting the transmutation property. This does however require that all the nodes in a given layer have the same threshold, a special case of the "weight sharing" property discussed in what follows.
2. Each layer has not only a corresponding (permutation) representation but also a corresponding subgroup, which is the set of all group operations fixing an arbitrarily chosen node in the layer. This is the subgroup denoted by H (for a given hidden layer) in the preceding sections.
3. The complete parameterized homomorphisms in a symmetry network have a special structure. The weight of a given connection is equal to the value of a single parameter, this parameter being shared by a number of connections in the same layer. In other words, the structural restrictions in a symmetry network take the form of ("hard") weight sharing. Symmetry networks can therefore be easily drawn, using a different color or style for all the connections sharing a given weight parameter.
4. In a symmetry network, the group not only has a well-defined permutation action on the nodes of each layer, it also has a permutation action on the connections between a given pair of layers. This action is simply induced by the permutations on the nodes in the corresponding layers, and connections with the same weight are precisely those in the same orbit.
This alternative description of a symmetry network is given by Shawe-Taylor (1989, 1993). Also, the group action on the connections between two layers can be described in terms of the double cosets of the corresponding pair of subgroups (Wood, 1995).
Figure 4 in Example IV.1.1 shows an adaptive weight matrix in a GRN. The representations concerned are permutation representations, and so the resulting subnetwork is in fact a subnetwork of a symmetry network. We will encounter other examples of symmetry networks in the next section.

V. PRACTICAL CONSIDERATIONS AND SIMULATIONS

In this section, we look at some practical issues concerning the application of GRNs to invariant pattern classification problems. We also discuss some simulations in which adaptive GRNs are applied to various problems, and we show that generally they exhibit higher generalization performance than neural nets without inbuilt invariance.

A. Learning Algorithms

Typically feedforward neural networks are trained using gradient-descent-based algorithms such as backpropagation (Rumelhart et al., 1986b; Werbos, 1974). The backpropagation algorithm and its variants can be easily adjusted to work on group representation networks, simply by applying the chain rule for differentiation. Instead of updating the weights of the connections themselves, the changes to the weights suggested by the standard algorithm are "backpropagated" to the parameters of the network. To be more precise, if $p_k$ is a network parameter, and E is the error function of the network, then the derivatives required by the learning algorithm can be found by
$$\frac{\partial E}{\partial p_k} = \sum_{w} \frac{\partial E}{\partial w}\, \frac{\partial w}{\partial p_k},$$
where the sum is over all weights w (in effect, over all weights in a single layer of connections). The quantities $\partial E/\partial w$ are computed by the original version of the learning algorithm, and the partial derivatives $\partial w/\partial p_k$ are given by the formulas for the weight matrix (parameterized homomorphism) in the previous section. One unfortunate aspect of this algorithm is that its computational complexity is of the order of $N_w N_p$, where $N_w$ is the number of weights and $N_p$ is the average number of parameters in terms of which each weight is defined. For a given training pattern, each iteration of the modified algorithm will therefore be slower than an iteration of the basic algorithm. The reduction in the number of training patterns (see what follows) should, however, make up for this.
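The chain-rule step can be sketched in a few lines. The helper name `parameter_gradient` and the small 2 × 3 weight-sharing pattern are hypothetical, chosen only to make the arithmetic visible; in a GRN the matrices $\partial W/\partial p_k$ would come from the parameterized homomorphism of the layer.

```python
def parameter_gradient(grad_w, dw_dp):
    """Return dE/dp_k = sum over weights w of (dE/dw) * (dw/dp_k), for each k.

    grad_w: matrix of dE/dw for one layer, as produced by ordinary backprop.
    dw_dp:  list over k of matrices dW/dp_k (same shape as the weight matrix).
    """
    return [
        sum(gw * dw
            for grad_row, basis_row in zip(grad_w, basis)
            for gw, dw in zip(grad_row, basis_row))
        for basis in dw_dp
    ]

# Hypothetical shared-weight layer W = [[p0, p1, p1], [p1, p0, p1]].
dw_dp = [
    [[1, 0, 0], [0, 1, 0]],   # dW/dp0
    [[0, 1, 1], [1, 0, 1]],   # dW/dp1
]
grad_w = [[0.5, -1.0, 2.0], [0.25, 4.0, -3.0]]   # dE/dW from standard backprop

print(parameter_gradient(grad_w, dw_dp))   # [4.5, -1.75]
```

The double loop over weights and parameters makes the $N_w N_p$ cost mentioned above explicit; when each weight equals a single parameter, as in a symmetry network, the inner sums reduce to adding up the gradients of the connections that share that parameter.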
When a GRN has a large number of parameters in some of its connection layers, special attention has to be paid to the range in which those parameters are initialized (assuming a randomizing technique is used). This is because the magnitude of each weight will be of the order of $N_p$ times the average magnitude of the parameters. Consequently the initialization range for the parameters should be scaled down by a factor of $N_p$ from the normal range used to initialize weights.

Due to the structural restrictions on a GRN, batch learning (as opposed to on-line learning) may be particularly inadvisable for such networks. This is because batch learning introduces additional local minima into the error surface of the network, and a GRN will be more likely to fall into these than a standard neural network (this problem is discussed in more detail in Wood, 1995). Instead, on-line learning is preferred, where a change in parameters is made after the presentation of each training pattern in turn.

It may be that techniques other than gradient descent are more suitable for highly constrained networks such as the GRN. Other possible training methods include simulated annealing and genetic algorithms (Davis, 1987). An interesting possibility for further experimental research is the development of constructive algorithms such as cascade correlation (Hertz et al., 1991) for GRNs. However, the simulations to be described in Section V.E use gradient descent techniques.

B. The Learning Process
The invariance in a GRN is enforced, so it is unnecessary (and possibly detrimental) to include more than one training pattern per group orbit in the training set. That is to say, if v is a training pattern, then there is no need for $B_0(g)v$ to be a training pattern for any $g \in G$ (here $B_0$ is the input representation of the GRN), as the response of the network to these two patterns is guaranteed to be the same. This is very important, and can greatly reduce (by a factor of the order of $|G|$) the amount of training data required by the network. This makes the job of the training supervisor easier, and speeds the learning process.

In the previous section we noted that each iteration of a gradient-descent-based learning algorithm will take longer in a GRN than in an ordinary neural network. This is, however, counterbalanced, first by the fact that the number of training samples is reduced in a GRN, and second by the fact that there are fewer parameters to optimize. However, it seems that there is a greater tendency for GRNs to get stuck in local minima than for conventional neural networks; this is probably a consequence of the highly constrained network structure, and is the main reason why other learning algorithms may be preferable for GRNs.
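Pruning a training set to one pattern per orbit can be sketched as below for the simple case where the group acts by permuting input coordinates. The helper names and the toy data are illustrative assumptions; for a general linear representation $B_0$ the `act` function would be replaced by multiplication with $B_0(g)$ and the canonical form would need a tolerance-aware comparison.

```python
from itertools import permutations

def act(perm, pattern):
    """Permute the coordinates of `pattern`: the entry at position i moves to perm[i]."""
    out = [None] * len(pattern)
    for i, value in enumerate(pattern):
        out[perm[i]] = value
    return tuple(out)

def one_per_orbit(patterns, group):
    """Keep the first pattern encountered from each group orbit."""
    seen, kept = set(), []
    for pattern in patterns:
        # The smallest element of the orbit serves as a canonical label for it.
        canonical = min(act(g, pattern) for g in group)
        if canonical not in seen:
            seen.add(canonical)
            kept.append(pattern)
    return kept

group = list(permutations(range(3)))   # S_3 permuting three input nodes
training = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]

print(one_per_orbit(training, group))
# [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
```

Six patterns collapse to three, one for each orbit present in the data.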
C. Generalization
The main advantage of a GRN over a standard neural network, when applied to an invariance problem, is that the invariance is already inbuilt and therefore does not need to be learned. To put it another way, part of the job of generalization (the invariant transformations applied to the test data) has already been done during the prior construction of the network. In some applications, such as that of graph isomorphism or the parity problem (both discussed in Section V.E), the invariance completely determines the problem! In such a case, a "perfectly discriminating GRN" (see what follows) can classify all patterns correctly without training; it is only necessary to compare the response of the network to a test pattern with the responses to the training patterns. However, there is a caveat here: the parameters of the network have to have been initialized in a suitable range, as discussed earlier, to avoid problems caused by round-off errors.

In general, we expect that a neural network's generalization performance will be increased by the incorporation of prior knowledge. A reduction in the number of free parameters of a network (such as accomplished by the process of in-building invariance) also generally improves generalization ability (Le Cun, 1989).
D. Discriminability

The discriminability of a GRN (or any invariant function) is its ability to discriminate between inputs not in the same orbit (this is only a qualitative measure). One danger with the construction of a GRN is that additional unwanted invariances may be introduced into the network inadvertently. This is also an issue for other invariant pattern classifiers. As an example, the power spectrum of the discrete Fourier transform is frequently used to provide cyclic translation invariance; however, this function is also invariant under "reflection" of the input sequence, which may be undesirable in some applications.

In a GRN, poor discriminability can occur simply when the network structure (i.e., connectivity and hidden layer group representations) is badly chosen. As an example, in a symmetry network we could choose to have no hidden nodes and connect the input directly to the output. When the input representation of the network is transitive, this requires that all the weights from the input nodes to the output node are the same, and the network is now invariant under all permutations of the input, and, furthermore, under other operations as well. This somewhat pathological example should give an indication as to the general problem. We can avoid poor discriminability by building a sufficiently large network and by choosing representations of connected layers in such a way that they have a "large" effective number of parameters.

The ideal case is "perfect discriminability." An adaptive GRN is said to discriminate perfectly if for any two inputs $v_1$, $v_2$ which are not in the same orbit under the group action, and for any real value $\varepsilon > 0$, there exists an assignment to the network parameters such that the corresponding outputs differ by at least $\varepsilon$. Given a finite-dimensional permutation representation of a finite group, we can build a symmetry network invariant under this representation which has a single hidden layer corresponding to the regular representation. Such a network discriminates perfectly (Shawe-Taylor, 1993). However, the problem with this network is that it contains $|G|$ hidden nodes, which for some applications is a great many. In general, the problem of ascertaining the discriminability properties of a GRN is difficult, even for symmetry networks; see Shawe-Taylor (1993) and Wood (1995). The best approach in practice seems to be heuristic: a GRN architecture with sensible connectivity and a "reasonable" number of free parameters will have good or perfect discriminability.
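The unwanted invariance of the discrete Fourier power spectrum mentioned at the start of this subsection is easy to reproduce. The sketch below uses a direct $O(n^2)$ transform from the standard library for a real-valued sequence; the test sequence and the helper names are arbitrary choices.

```python
import cmath

def power_spectrum(x):
    """Squared magnitudes of the discrete Fourier transform of a real sequence."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n)]

def close(a, b, tol=1e-9):
    return all(abs(p - q) < tol for p, q in zip(a, b))

x = [1.0, 4.0, 2.0, 0.0, 3.0, 5.0]
shifted = x[2:] + x[:2]      # cyclic translation: the intended invariance
reflected = x[::-1]          # reflection: not a cyclic shift of x

print(close(power_spectrum(x), power_spectrum(shifted)))     # True
print(close(power_spectrum(x), power_spectrum(reflected)))   # True
```

The reflected sequence is not in the cyclic-shift orbit of `x`, yet the power spectrum cannot tell the two apart, so a classifier built on it has imperfect discriminability with respect to the cyclic group.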
E. Simulations

We now look at a variety of small problems that have been used to test the learning and generalization abilities of group representation networks, in comparison with standard neural networks. In all cases, the networks were trained using gradient-descent-based algorithms, and all non-input nodes in the networks used the binary sigmoidal function $f(x) = 1/(1 + \exp(-x))$ as the activation function.

1. The Graph Recognition Problem

The problem is to identify a graph isomorphism class; that is, to construct a system giving output 1 for any graph isomorphic to some chosen prototype, and output 0 for any graph that is not isomorphic. Graph isomorphism is important in the identification of fingerprints. The graph isomorphism problem is "difficult": no polynomial-time algorithm for it is known, although it is not known to be NP-complete either (Luks, 1982). An n-vertex graph is encoded by a binary sequence of length $n(n-1)/2$, where each 1 represents the presence of an edge, and each 0 the absence of an edge. The edges are represented in a fixed order, for example, lexicographically:
$$(1\ 2), (1\ 3), (1\ 4), \ldots, (1\ n), (2\ 3), (2\ 4), \ldots, (2\ n), \ldots, ((n-1)\ n).$$
Because graphs with a different number of vertices are trivially nonisomorphic, we consider only isomorphism tests for graphs with the same number of vertices as the prototype. This enables us to use a fixed GRN architecture. The invariance group of this problem is $S_n$ (n is the number of vertices), and the input representation is the permutation of the input nodes (identified with graph edges) induced by the permutation of the n vertices. This action maps graphs to isomorphic graphs, and the graph recognition problem is to build a pattern classification system that is invariant under this group action.

Our first example of a symmetry network for graph recognition is displayed in Fig. 5 (for n = 5). This network has two free parameters (including a threshold in the activation function), among 11 connections (again including the threshold). However, it discriminates only between graphs with a different number of edges, so is not suitable for the problem. The second network (Fig. 6, again for n = 5) has five free parameters among its 61 connections. It can distinguish between graphs with distinct degree (valency) sequences (Shawe-Taylor, 1989). Again this is not sufficient for graph isomorphism. Finally, we have a network in Fig. 7 with perfect discriminability (this is proved in Shawe-Taylor, 1993). This network has two hidden layers, with 9 degrees of freedom and 171 connections.

Dodd (1988) successfully trained a network to recognize three isomorphism classes of 5-vertex graphs. The learning algorithm (back-propagation) took approximately 40,000 iterations to converge. Shawe-Taylor used a symmetry network similar to that in Fig. 7 (it had an additional two output nodes and all layers were connected). This network has a similar underlying architecture to that used by Dodd, with 21 free parameters among 293 connections.
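The edge encoding and the two coarse invariants just mentioned (number of edges, degree sequence) can be illustrated with a short sketch. The helper names and the example graphs are assumptions made for illustration; the code does not implement any of the networks in Figs. 5 to 7, only the quantities that the first two of them are said to be able to distinguish.

```python
from itertools import combinations

def encode(n, edges):
    """Binary edge sequence of length n(n-1)/2, vertex pairs in lexicographic order."""
    present = {tuple(sorted(e)) for e in edges}
    return [1 if pair in present else 0 for pair in combinations(range(1, n + 1), 2)]

def relabel(edges, vertex_map):
    """Apply a vertex permutation to a graph given as an edge list."""
    return [(vertex_map[a], vertex_map[b]) for a, b in edges]

def degree_sequence(n, code):
    degrees = [0] * (n + 1)
    for bit, (a, b) in zip(code, combinations(range(1, n + 1), 2)):
        degrees[a] += bit
        degrees[b] += bit
    return sorted(degrees[1:])

n = 5
path = [(1, 2), (2, 3), (3, 4), (4, 5)]
star = [(1, 2), (1, 3), (1, 4), (1, 5)]
vertex_map = {1: 3, 2: 5, 3: 1, 4: 4, 5: 2}

x = encode(n, path)
y = encode(n, relabel(path, vertex_map))   # an isomorphic copy of the path
z = encode(n, star)                        # same edge count, not isomorphic

print(x)                                   # [1, 0, 0, 0, 1, 0, 0, 1, 0, 1]
print(y)                                   # [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
print(sum(x), sum(y), sum(z))              # 4 4 4
print(degree_sequence(n, x), degree_sequence(n, y), degree_sequence(n, z))
# [1, 1, 2, 2, 2] [1, 1, 2, 2, 2] [1, 1, 1, 1, 4]
```

The path and the star have the same number of edges, so an invariant that only counts edges cannot separate them, whereas the degree sequence can; neither invariant is sufficient for isomorphism in general.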
FIGURE 5. Symmetry network classifying by number of edges.

FIGURE 6. Symmetry network classifying by degree sequence.

FIGURE 7. Perfectly discriminating symmetry network.
Shawe-Taylor's network converged in 350 iterations of the linear programming algorithm (also a gradient technique; see Shawe-Taylor and Cohen (1990)). It should be noted that this algorithm was not designed with symmetry networks or similar architectures in mind.

A further experiment was also performed on 10-vertex graph recognition. This involved learning to distinguish a random graph from 19 similar graphs, obtained from the original by minor distortions. A symmetry network was constructed for this problem. It trained in 190 iterations of the linear programming algorithm, and scored well on a test set, again obtained by random distortion. The symmetry network used had 28 parameters among its 6586 connections, and effectively learned to classify correctly a number of graphs of the order of $20 \times 10! = 72{,}576{,}000$. For further details on these experiments, consult Shawe-Taylor (1989).

2. The Parity Problem
The parity of a bipolar (±1-valued) string is unchanged by the inversion (of sign) of an even number of bits. Furthermore, any bipolar n-bit string can be obtained from another bipolar n-bit string of the same parity by such an inversion. The parity problem is therefore fully described by enforcing invariance under this linear group action. The invariance group G is the direct product of $n - 1$ copies of $C_2$, $G = \times^{n-1} C_2$, and the input representation is the direct sum of the $n - 1$ distinct alternating representations of G and the trivial representation.

A class of GRNs invariant under these groups was constructed (one network for each value of n). These networks each had a single hidden layer of $2^{n-1}$ neurons, for which both the input and output representations were the regular representation. The input and output representations of the output node were always trivial. These networks were compared with two other types of network:

1. A standard feed-forward neural network, unconstrained. This network had the same number of nodes and the same connectivity as the GRN already described herein.
2. A network invariant under both input inversions (as described here) and input permutations. The input and output representations of the hidden layer were chosen to be the permutation representation corresponding to the subgroup of all input node permutations.
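The two facts about parity used above (even sign inversions preserve parity, and they connect all strings of equal parity) can be checked exhaustively for a small n. This is a brute-force sketch with illustrative names, not part of the network construction.

```python
from itertools import combinations, product
from math import prod

def parity(bits):
    """Parity of a bipolar string: the product of its entries, +1 or -1."""
    return prod(bits)

def flip(bits, positions):
    """Invert the sign of the entries at the given positions."""
    return tuple(-b if i in positions else b for i, b in enumerate(bits))

n = 4
# All subsets of positions of even size: the elements of the invariance group.
even_subsets = [set(s) for k in range(0, n + 1, 2) for s in combinations(range(n), k)]
strings = list(product((1, -1), repeat=n))

invariant = all(parity(flip(s, f)) == parity(s) for s in strings for f in even_subsets)
orbit = {flip(strings[0], f) for f in even_subsets}

print(len(even_subsets), invariant, len(orbit))   # 8 True 8
```

For n = 4 the group has $2^{n-1} = 8$ elements, and the orbit of the all-ones string consists of all 8 strings of parity +1, so the two parity classes are exactly the two orbits.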
For n = 2 and n = 4 an additional comparison was made with a symmetry network (invariant under all input node permutations) with the same underlying architecture as the preceding networks. A symmetry network was not used for other values of n since a comparable architecture did not exist.
Each of the networks described here was trained for n = 2, 3, 4, 5, twenty times in each case from random initial parameter values. A backpropagation-style algorithm was used, with batch updating. In some cases, the network failed to reach the solution, due to being caught in a local minimum or tending towards a minimum at infinity. The results of these simulations are shown in Table I. Numbers of iterations are averages over 20 runs. It can be seen that the number of iterations required to train the networks with the inversion invariance constraints was significantly fewer. However, the price of this faster convergence appears to be an increased rate of failure to converge at all. We hope that this is a property of the particular problem and specific network architecture (as defined by the hidden layer subgroups) and not one of the technique in general. It is also quite likely that the use of batch learning (which was required by the particular variant of backpropagation used) was partly to blame for this convergence failure. It is interesting to note that the symmetry network never managed to converge for n = 4. It is possible that no solution exists with the network weights constrained in the specific manner of this network. The choice of subgroups for the inversion-invariant networks is severely limited by the need to maximize the number of parameters.

TABLE I
SIMULATION RESULTS FOR THE PARITY PROBLEM

n    Invariance    # h. layer    # connects    # free        Percentage of    Mean # iterations
     group(a)      nodes                       parameters    convergence      when converged

2    -             2             9             9             80%              52
2    P             2             9             5             95%              43
2    I             2             9             5             100%             30
2    IP            2             9             4             60%              47

3    -             4             21            21            100%             69
3    I             4             21            6             100%             45
3    IP            4             21            4             80%              35

4    -             8             49            49            100%             201
4    P             8             49            5             0%               N/A
4    I             8             49            7             95%              63
4    IP            8             49            4             45%              62

5    -             16            112           112           100%             229
5    I             16            112           8             90%              100
5    IP            16            112           4             40%              54

(a) '-': no invariances (standard neural network); 'P': permutation invariances only (symmetry network); 'I': inversion invariances only (GRN); 'IP': inversion and permutation invariances (GRN).

The choice of the hidden layer representations unfortunately resulted in a number of nodes that was exponential in n. However, this is not too serious a problem, because the number of free parameters in the network is still only linear (or in the case of the inversion plus permutation-invariant networks, constant!). It is also worth noting that for a perfectly discriminating inversion-invariant network applied to the parity problem only two examples (one positive and one negative) need to be presented to the training algorithm to enable learning of the entire data set. The classification of all other possible examples is then defined automatically by the invariance. This is extremely useful because the total number of inputs increases exponentially with n.

3. A Key Feature Character Recognition Problem
To test GRN adaptivity under transformations other than permutations and inversions, we attempted a character recognition problem. The characters used were capital letters from three classes: “A,” “M,” and “X.” Each letter was specified by the coordinates of five key features (the endpoints, points of intersection and “corner” points) in a fixed order. The first point was always taken to be the origin, so that each pattern was an 8-dimensional real vector. A GRN was constructed with invariance under 30° rotations and reflection in the y-axis. The invariance group was therefore the dihedral group $D_{12}$. This GRN was compared in performance to a standard feedforward network with the same connectivity. Each network had 3 output nodes and two hidden layers each of 12 hidden nodes. Both networks were trained on 72 patterns (equal numbers of “A”s, “M”s, and “X”s) and then tested on a further set of 72 patterns. The training set and test set were both generated randomly, according to some simple rules. One of the pattern sets is shown in Fig. 8. Over 10 test runs, the GRN took an average of about 2800 iterations of a gradient descent algorithm to converge on this problem, but the conventional network trained in only 700 iterations. However, when shown the test set, the standard network had an average correct classification rate of 91.9%, whereas the GRN classified 100% of the test patterns correctly on each run.
VI. THE COMPUTATIONAL POWER OF THE GROUP REPRESENTATION NETWORK MODEL

We have introduced the group representation network model, and presented an analysis of its structure. We are now going to argue that the GRN is a very general model for the construction of group invariants.
FIGURE 8. Sample patterns for the A-M-X problem.
Any continuous function between real vector spaces can be approximated arbitrarily closely over any compact set by some feedforward neural network using the sigmoidal activation function $f(x) = 1/(1 + \exp(-x))$. For information on approximation abilities of neural networks, consult, for example, Cybenko (1989), Hornik (1991), or Leshno et al. (1993). Now the group representation networks form a very rich subclass of the class of invariant feedforward neural networks. The number of layers in the network is unconstrained. The representations associated with each layer can be chosen freely. The weight matrices themselves can take any value in the appropriate intertwining space. Finally, the activation functions of the network also allow great scope for freedom of choice in GRN construction, the only restriction on these being the transmutation criterion. As GRNs
often use permutation representations acting on both the input and output of their hidden layers, and any function will preserve a permutation representation, we are often able to choose any functions as activation functions. Given the approximation power of feedforward neural networks in general, and the flexibility of the class of GRNs as invariant neural networks, we expect a result such as the following:
Conjecture VI.1 (Universal Invariance Conjecture) Any function that is invariant under a finite-dimensional real representation of a finite group can be approximated arbitrarily closely over any compact set by a group representation network.

In this section we will present evidence in support of this conjecture, centering on a proof that any polynomial invariant under such a representation can be computed precisely by a GRN.

A. Polynomial Invariants
The computation of polynomial invariants can be expressed by means of tensors. We therefore have to introduce some preliminary notation and results concerning the implementation of tensor functions in GRNs.
Definition VI.1 The product transformation $P_{V,W}$ on the vector spaces V and W is the natural mapping from $V \times W$ to $V \otimes W$ defined by
\[ P_{V,W}(v, w) := v \otimes w \quad \forall v \in V,\ w \in W. \tag{38} \]
The r-power transformation of the vector space V is the natural mapping $P_V^r : V \to \otimes^r V$ defined by
\[ P_V^r(v) := \otimes^r v \quad \forall v \in V. \tag{39} \]
If a group G acts on the vector spaces V and W with respective representations A and B, we easily find that $P_{V,W}$ is a concomitant for the pair $(A \oplus B, A \otimes B)$, and $P_V^r$ is a concomitant for the pair $(A, \otimes^r A)$. We will now present a result describing the construction within a GRN structure of the two transformations given in the preceding. These transformations are not generally invariants, so we cannot strictly say that they are “computable by a GRN.” However, they can be computed by appropriate applications of homomorphisms and transmutation functions, so it seems appropriate to say that they are “computable within a GRN.” We will only use this nomenclature in the current section.

Lemma VI.1 Let V and W denote two modules over the finite group G which are finite-dimensional as vector spaces over $\mathbb{R}$. Then:
1. The product transformation on V and W is computable within a GRN.
2. The r-power transformation on V is computable within a GRN, for any r.
Proof 1. Let A and B be the finite-dimensional representations of G on the spaces V, W respectively. As observed, the product transformation on V and W is a concomitant from $A \oplus B$ (acting on $V \times W$) to $A \otimes B$. We prove that the product transformation on V and W can be computed within a GRN, in two stages. First, we take the case where A and B are permutation representations. Then we consider the general case and reduce it to the permutation representation case. The first part of the proof is constructive.
(a) Suppose that A and B are permutation representations, that is, with a basis of V and W fixed, they act by permuting the components. Then $A \otimes B$ is also a permutation representation. Furthermore, if X is a basis set for V, and Y is a basis set for W, then $X \times Y$ is a basis for $V \otimes W$. It is convenient to identify V with the set of real-valued functions on X, and similarly for W and $V \otimes W$. Note that $(v \otimes w)(x, y) = v(x)w(y)$ for any v, w, x, y. For any of these spaces, let sq denote the function that performs the squaring operation component-wise with respect to the identified basis, for example, for any $v \in V$ and $x \in X$, $(\mathrm{sq}(v))(x) = (v(x))^2$. Furthermore, let $T_1 : V \to V \otimes W$ be the mapping $v \mapsto (v \otimes \mathbf{1})$, where $\mathbf{1}$ is the 1-vector in W (constant function on Y). Similarly, let $T_2 : W \to V \otimes W$ be the mapping $w \mapsto (\mathbf{1} \otimes w)$. Finally, let $T_3 : V \to V$ and $T_4 : W \to W$ be the respective identity operators I, and define $T_5 : V \otimes W \to V \otimes W$ to be $\tfrac{1}{2} I$. Clearly $T_1, \ldots, T_5$ are all linear transformations, and it is also easy to see that they are homomorphisms between the appropriate representations of G. For any $v \in V$, $w \in W$ and $(x, y) \in X \times Y$ we now find
\[ (P_{V,W}(v, w))(x, y) = v(x)w(y) = \tfrac{1}{2}\,(v(x) + w(y))^2 - \tfrac{1}{2}\, v(x)^2 - \tfrac{1}{2}\, w(y)^2 . \]
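The identity above, which expresses a product through squares only, can be checked numerically. The following is a minimal sketch, assuming V and W are identified with real-valued functions on the index sets `range(len(v))` and `range(len(w))`; the helper names are hypothetical.

```python
import random

def outer(v, w):
    """Tensor product as a function on pairs: (v (x) w)(x, y) = v(x) * w(y)."""
    return {(x, y): v[x] * w[y] for x in range(len(v)) for y in range(len(w))}

def product_via_squares(v, w):
    """Same quantity, using only sums, component-wise squaring and scaling."""
    out = {}
    for x in range(len(v)):
        for y in range(len(w)):
            lifted_sum = v[x] + w[y]   # value of (T1 v + T2 w) at (x, y)
            out[(x, y)] = (0.5 * lifted_sum ** 2
                           - 0.5 * v[x] ** 2
                           - 0.5 * w[y] ** 2)
    return out

random.seed(0)
v = [random.uniform(-1, 1) for _ in range(3)]
w = [random.uniform(-1, 1) for _ in range(4)]
direct = outer(v, w)
via_squares = product_via_squares(v, w)
max_error = max(abs(direct[k] - via_squares[k]) for k in direct)
```

The only nonlinearity used is the squaring function applied component-wise, which is why the construction fits a permutation representation.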
This allows us to decompose the product transformation into the form
The function $z \mapsto z^2$ preserves the permutation representation $A \otimes B$, and so the decomposition here can be computed within a GRN, as shown in Fig. 9. The activation function of the output layer is the identity.

FIGURE 9. The GRN computation of the product transformation.

(b) The first part constructs a GRN structure for computing the product transformation on two real finite-dimensional vector spaces acted upon by permutation representations of a finite group. Now suppose that the representations A and B are real, finite-dimensional but otherwise arbitrary. By Lemma II.1, A must be equivalent to an orthogonal representation A'; let $T_A$ be an isomorphism from A to A'. Define B' and $T_B$ similarly. Let $P_1$ denote the direct sum of (dim A') copies of the regular representation of G. Thus every irreducible subrepresentation of A' is guaranteed to occur as a subrepresentation of $P_1$, a number of times at least equal to the number of its occurrences in A'. It follows from consideration of the inner product of characters (Lemma II.3) that the intertwining number of A' and $P_1$ is equal to the dimension of A', and hence there exists a full column rank homomorphism $W_1$ from A' to $P_1$ (Wood, 1995). We can then construct a left inverse $W_3 := (W_1^T W_1)^{-1} W_1^T$ of $W_1$, and as A' is orthogonal it follows that $W_3$ is a homomorphism from $P_1$ to A'. Analogously we have $P_2$, $W_2$, and $W_4 := (W_2^T W_2)^{-1} W_2^T$, defined w.r.t. B'. Next, note that $W_5 := (T_A^{-1} \otimes T_B^{-1})(W_3 \otimes W_4)$ is a homomorphism from $P_1 \otimes P_2$ to $A \otimes B$. We can now compute the product transformation as follows:
\[ P_{V,W}(v, w) = W_5 (W_1 T_A v \otimes W_2 T_B w) = W_5\, P_{\Omega[P_1],\Omega[P_2]}(W_1 T_A v,\ W_2 T_B w). \tag{41} \]
By the previous argument, we can compute $P_{\Omega[P_1],\Omega[P_2]}$ within a GRN. Since $T_A$, $T_B$, $W_1$, $W_2$, and $W_5$ are all homomorphisms between the appropriate pairs of representations, we can also compute the product transformation on V and W within a GRN.

2. The r-power transformation on V is easily computed within a GRN by performing a duplication $V \to (V \oplus V)$ and then the successive product transformations $(V \times V) \to (V \otimes V)$, $((V \otimes V) \times V) \to ((V \otimes V) \otimes V)$, etc. A formal proof is given in Wood (1995). □

This leads us to our central result on the computational abilities of GRNs.
Theorem VI.1 Let G be a finite group, and let A denote a finite-dimensional real representation of G. Then any polynomial invariant under A is computable by a GRN.

Proof This proof is partly based on the polarization process for group concomitants (see Dieudonné and Carrell, 1971; Weyl, 1946). Let $\phi = \sum_{i=1}^{k} \phi_i$ denote a polynomial invariant under A, where the $\phi_i$'s are its homogeneous terms. Each $\phi_i$ (and $\phi$ itself) is a function on the space $\Omega[A]$. As the (real) ground field has characteristic zero, the terms $\phi_i$ are also polynomial invariants under A (see Dieudonné and Carrell, 1971). Furthermore, the summation operation $(\phi_1, \ldots, \phi_k) \mapsto \sum_{i=1}^{k} \phi_i = \phi$ is a homomorphism from $\oplus_{i=1}^{k} \mathbf{1}$ to $\mathbf{1}$, where $\mathbf{1}$ denotes the trivial representation. Therefore if each homogeneous term $\phi_i$ is computable by a GRN, $\phi$ is also computable by a GRN. Using this argument, we can now assume without loss of generality that $\phi$ is a homogeneous polynomial concomitant. Let r denote the degree of $\phi$; we define a linear transformation $T_\phi$ as follows:
\[ \phi(v) = T_\phi P^r_{\Omega[A]}(v) \quad \forall v \in \Omega[A], \tag{42} \]
where $P^r_{\Omega[A]}$ is the r-power transformation on the representation space of A. By the concomitance properties of $\phi$ and $P^r_{\Omega[A]}$, we find for any $v \in \Omega[A]$ and $g \in G$:
\[ T_\phi \left( \otimes^r A(g) \right) P^r_{\Omega[A]}(v) = T_\phi P^r_{\Omega[A]}(A(g)v) = \phi(A(g)v) = \phi(v) = T_\phi P^r_{\Omega[A]}(v), \]
and hence $T_\phi$ is a homomorphism from $\otimes^r A$ to the trivial representation. By Lemma VI.1, $P^r_{\Omega[A]}$ is computable within a GRN. Therefore $T_\phi P^r_{\Omega[A]}$ is computable by a GRN, that is, $\phi$ is computable by a GRN. □

The proof of Theorem VI.1 (and of Lemma VI.1, which it invokes) is constructive in the case where A is a permutation representation.
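As a small illustration of the factorization of a homogeneous invariant into a power transformation followed by a linear map, the sketch below takes the cyclic group $C_3$ permuting three coordinates and the degree-2 invariant $v_0v_1 + v_1v_2 + v_2v_0$. The example, the 0-based indexing and the names `power2`, `T_phi` and `shift` are illustrative assumptions, not taken from the chapter.

```python
def power2(v):
    """2-power transformation: the components of v (x) v, indexed by pairs (i, j)."""
    n = len(v)
    return {(i, j): v[i] * v[j] for i in range(n) for j in range(n)}

def shift(v, k):
    """Cyclic translation: component i of v moves to position (i + k) mod n."""
    n = len(v)
    out = [0.0] * n
    for i in range(n):
        out[(i + k) % n] = v[i]
    return out

n = 3
# Linear map from the tensor square to the trivial representation:
# weight 1/2 on every off-diagonal pair, 0 on the diagonal.
T_phi = {(i, j): (0.5 if i != j else 0.0) for i in range(n) for j in range(n)}

def phi(v):
    """phi(v) = T_phi applied to the 2-power of v; equals v0*v1 + v1*v2 + v2*v0."""
    p = power2(v)
    return sum(T_phi[k] * p[k] for k in p)

v = [1.0, 2.0, 4.0]
value = phi(v)
```

Because the weights of `T_phi` are unchanged when both indices are shifted together, the linear stage commutes with the group action, and `phi` returns the same value for every cyclic translate of `v`.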
B. Construction of Basic Invariants

The fact that we can compute precisely any polynomial invariant using a GRN is itself a strong indication of the Universal Invariance Conjecture for GRNs, because polynomials are also well known to be a class of (local) universal approximators. Furthermore, the class of concomitants computable by GRNs is by no means limited to polynomials. Given a GRN $N_1$ with multiple output nodes computing a set of polynomial invariants, we can “tack on” a standard feedforward network to the outputs of $N_1$, and the output of this neural network will also be invariant. Furthermore, it follows that the entire neural network is also a GRN, in which the group acts trivially on each node in the added part of the network (and each weight individually in this subnetwork is a homomorphism). In particular, any invariant $\phi$ which is a function $\Psi$ of a set of invariants that are computable by GRNs is also computable by a GRN, on the condition that $\Psi$ can be computed precisely by a feedforward neural network (with arbitrary activation functions). This is particularly interesting given the notion of a finite integrity basis (Weyl, 1946). A finite integrity basis (of a given representation of a given group) is a finite set of functionally independent invariants, in terms of which any invariant can be expressed. The invariants corresponding to a given representation of a finite group or compact Lie group have a finite integrity basis. From these observations, we can argue that to approximate a vast number of invariants (perhaps all) using GRNs, it is only necessary to be able to compute precisely the invariants of a finite integrity basis:
Lemma VI.2 Let G be a group with a representation A having finite integrity basis $\{\phi_1, \ldots, \phi_k\}$ for some k, such that each $\phi_i$ is computable by a GRN. Let $\phi$ be an invariant under A, which can be written:
\[ \phi(v) = \Psi(\phi_1(v), \phi_2(v), \ldots, \phi_k(v)) \tag{43} \]
for some function $\Psi$. If $\Psi$ can be approximated arbitrarily closely by a feedforward neural network, then $\phi$ can be approximated arbitrarily closely by a GRN.

Proof We can construct a neural network N to approximate $\Psi$; this network is a GRN over G in which every node is fixed by the group. We can also construct GRNs to compute the basic invariants $\phi_1, \ldots, \phi_k$. Connecting these GRNs together in the obvious way gives us a GRN which approximates $\phi$. □

In any case where a (finite-dimensional) representation admits a finite integrity basis of polynomial invariants, Theorem VI.1 and Lemma VI.2 show us that we can approximate arbitrarily closely perhaps any invariant at all; certainly we can approximate arbitrarily closely any continuous function of the finite integrity basis. For one class of representations, this situation indeed holds.
Corollary VI.1 Let $S_n$ denote the symmetric group of degree n. Let A denote the natural permutation representation of $S_n$ on a real n-dimensional vector space. Then a finite integrity basis for A is given by the functions:
\[ \phi_1(v) = \sum_{i} v_i, \quad \phi_2(v) = \sum_{i<j} v_i v_j, \quad \phi_3(v) = \sum_{i<j<k} v_i v_j v_k, \quad \ldots, \quad \phi_n(v) = v_1 v_2 \cdots v_n, \tag{44} \]
where $v_i$ denotes the ith component of the vector v. Hence any invariant $\phi$ under A can be written in the form
\[ \phi(v) = \Psi(\phi_1(v), \phi_2(v), \ldots, \phi_n(v)) \tag{45} \]
for some function $\Psi$. If $\Psi$ can be approximated arbitrarily closely by a feedforward neural network, then $\phi$ can be approximated arbitrarily closely by a GRN.
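The basic invariants in this corollary are the elementary symmetric polynomials. A minimal sketch of how they might be evaluated and checked for permutation invariance follows; the function name and the sum-of-squares example are illustrative choices, not part of the original text.

```python
from itertools import combinations, permutations
from math import prod

def elementary_symmetric(v):
    """Return [phi_1(v), ..., phi_n(v)]: phi_k sums all products of k distinct components."""
    n = len(v)
    return [sum(prod(c) for c in combinations(v, k)) for k in range(1, n + 1)]

v = (1, 2, 3)
basis_values = elementary_symmetric(v)

# Every permutation of the input yields the same basis values.
invariant = all(elementary_symmetric(p) == basis_values for p in permutations(v))

# Another symmetric function written through the basis:
# v_1^2 + ... + v_n^2 = phi_1^2 - 2 * phi_2.
sum_of_squares = basis_values[0] ** 2 - 2 * basis_values[1]
```

The last line shows one concrete instance of writing an invariant as a function of the basis, here a polynomial in $\phi_1$ and $\phi_2$.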
Proof That $\phi_1, \ldots, \phi_n$ form a finite integrity basis is proved by Weyl (1946). The other results follow from Theorem VI.1 and Lemma VI.2. □

We have argued here that the GRN model is capable of computing a large (perhaps even universal) class of invariants under any finite-dimensional representation of any finite group. In particular, we have shown that any such invariant which is a polynomial can be computed by a GRN. The computation of other standard finite integrity bases by GRNs is an open question, as is the resolution of the Universal Invariance Conjecture for GRNs.

VII. THE GROUP REPRESENTATION NETWORK AND OTHER INVARIANT CLASSIFICATION METHODS

In this section, we look at some other techniques for invariant pattern classification in the light of the GRN model. We will see that many such techniques can be viewed as special cases of the GRN model. This new perspective will allow us to improve our understanding of the invariance problem and its solutions, and may suggest new approaches to solving invariance problems. The results in this section also provide new evidence in favor of the Universal Invariance Conjecture VI.1. Recall that in the introduction we discussed two different approaches to the problem of invariant pattern recognition. One of these approaches was the use of adaptive invariants, and the other was the process of invariant feature extraction followed by feature classification. The feature classification part of the latter two-stage process is often performed by a neural network. If in this case we can show that the invariant feature vector (for example the DFT power spectrum) is computable by a GRN, then the entire process can be performed by a GRN, because the neural network for feature classification can be added onto the GRN for invariant feature extraction. Each neuron of this added network is fixed by the group action, because it is computing a function of group invariants.

A. Integral Transform Invariants

The general integral transform (Caelli and Liu, 1988; Lenz, 1990) of the function v over the set X is given by
\[ z(s) := \int_X v(x)\,\kappa(s, x)\,dx \quad \forall s \in S, \tag{46} \]
where S is another set, $\kappa$ (the “kernel”) is a function on $S \times X$ and z (the output of the integral transform) is a function on S. The group G acts on the set X, and this induces a linear action of G on the space of functions v. The complex modulus $|z(s)|$ of z at any point s is required to be invariant under this action, and the transform is defined to be invertible. We begin by looking at several well-known classes of integral transform invariants.
1. The Discrete Fourier Transform As discussed in Example 11.3.1, the classical discrete Fourier transform can be defined as an isomorphism from the regular representation of the group C , to the direct sum of its irreducible representations (Clausen and Baum, 1993). For example, in the 4-D case we have 1 0
0
h,m
(47)
where the matrix appearing in Eq. (47) represents the 4-D DFT operator. The power spectrum of the classical DFT is invariant under cyclic translation of its input. By definition, the power spectrum is obtained by applying the complex modulus function, which we denote here by f, component-wise to the transformed vector. Let $R_j$ denote the jth irreducible representation of $C_n$, that is,
\[ R_j(g_k) = \left( e^{2\pi i k (j-1)/n} \right), \quad j = 1, 2, \ldots, n, \tag{48} \]
where the group elements are enumerated by $g_0 = e,\ g_1,\ g_2 = g_1^2,\ \ldots,\ g_{n-1} = g_1^{n-1}$. It is evident that $f(e^{i\theta}x) = f(x)$ for any real $\theta$ and any x, so f transmutes each irreducible representation $R_j$ into the trivial representation. From this it follows, as discussed in Example III.2.3, that f transmutes the direct sum of all irreducibles into the direct sum of n copies of the trivial representation. The power spectrum is hence obtained by first applying a homomorphism (the Fourier transform) and then a transmutation function (the complex modulus function). It therefore fits naturally into the framework of the group representation network: the DFT power spectrum is, not only functionally but structurally, a GRN. The power spectrum GRN for n = 4 is shown in Fig. 10.
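The two-stage structure described here (a linear map followed by the complex modulus) can be sketched directly. The code below is a minimal illustration, assuming a DFT normalized by $1/\sqrt{n}$ with a negative exponent and 0-based indices; these conventions and the function names are assumptions made for the example.

```python
import cmath

def dft(v):
    """Linear stage: DFT of v, normalized by 1/sqrt(n) (assumed convention)."""
    n = len(v)
    return [sum(v[s] * cmath.exp(-2j * cmath.pi * r * s / n) for s in range(n)) / n ** 0.5
            for r in range(n)]

def power_spectrum(v):
    """Nonlinear stage: complex modulus applied component-wise."""
    return [abs(z) for z in dft(v)]

def cyclic_shift(v, k):
    """Cyclic translation: component s of v moves to position (s + k) mod n."""
    n = len(v)
    k %= n
    return v[n - k:] + v[:n - k]

v = [0.3, -1.2, 2.0, 0.7]
n = len(v)
spectrum = power_spectrum(v)
```

Under these conventions a translation by k multiplies the rth DFT component by the unit-modulus factor $e^{-2\pi i r k/n}$, which the modulus then removes; this is the transmutation into the trivial representation.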
FIGURE 10. The GRN for the power spectrum of the DFT (n = 4).
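Note that, although the input of this network is generally taken to be real, and the output is always real, the network requires complex-valued computations to arrive at its output. An obvious question is “can we implement the DFT in a GRN in which all connections (representations, etc.) are real?”. The answer must be “yes,” for the DFT power spectrum is a polynomial invariant, and by Theorem VI.1 can therefore be computed by a real GRN. Indeed, we will now show that the formula for the power spectrum can be reformulated as a real-weighted sum of products of the input values. Let the input components be denoted by $v_1, \ldots, v_n$. Then the rth component of the DFT power spectrum, $p_r$, is given by
\[
\begin{aligned}
p_r &= \left| \frac{1}{\sqrt{n}} \sum_{s=1}^{n} e^{-2\pi i (r-1)(s-1)/n}\, v_s \right| \\
&= \sqrt{ \left( \frac{1}{\sqrt{n}} \sum_{s=1}^{n} \cos\!\left( \frac{2\pi (r-1)(s-1)}{n} \right) v_s \right)^{2} + \left( \frac{1}{\sqrt{n}} \sum_{s=1}^{n} \sin\!\left( \frac{2\pi (r-1)(s-1)}{n} \right) v_s \right)^{2} } \\
&= \sqrt{ \frac{1}{n} \sum_{s=1}^{n} \sum_{t=1}^{n} v_s v_t \left[ \cos\!\left( \frac{2\pi (r-1)(s-1)}{n} \right) \cos\!\left( \frac{2\pi (r-1)(t-1)}{n} \right) + \sin\!\left( \frac{2\pi (r-1)(s-1)}{n} \right) \sin\!\left( \frac{2\pi (r-1)(t-1)}{n} \right) \right] } \\
&= \sqrt{ \frac{1}{n} \sum_{s=1}^{n} \sum_{t=1}^{n} v_s v_t \cos\!\left( \frac{2\pi (r-1)(s-t)}{n} \right) } .
\end{aligned}
\tag{49}
\]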
The power spectrum can be computed in the form of Eq. (49) using a real GRN, as shown in Fig. 11. This GRN works by computing a 2-power transformation (as discussed in Section VI.A) and then performing a linear mapping before taking the final square root. Associating each node in the layer forming the output of the 2-power transformation with a pair (s, t) (where $s, t \in \{1, \ldots, n\}$), we find that the weight from the node (s, t) to the rth node of the output layer is given by:
\[ \frac{1}{n} \cos\!\left( \frac{2\pi (r-1)(s-t)}{n} \right). \tag{50} \]
Thus we have a real finite GRN with a relatively simple structure that is computing the DFT power spectrum. It is worth pointing out that in the final formula for $p_r$ the cosinusoidal weight terms could be replaced by any periodic function of (s - t) with period n without affecting the invariance.
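The real-weighted form can be compared numerically with the complex definition. The following is a minimal sketch, assuming the $1/\sqrt{n}$ normalization and 0-based indices (so that the factor $r-1$ of the text becomes `r`); the weight expression is the one obtained by expanding the squared modulus, and the function names are hypothetical.

```python
import cmath
import math

def power_spectrum_complex(v):
    """Reference: modulus of the DFT normalized by 1/sqrt(n)."""
    n = len(v)
    return [abs(sum(v[s] * cmath.exp(-2j * cmath.pi * r * s / n) for s in range(n))) / math.sqrt(n)
            for r in range(n)]

def power_spectrum_real(v):
    """Real-only route: 2-power layer, cosine-weighted linear map, square root."""
    n = len(v)
    products = {(s, t): v[s] * v[t] for s in range(n) for t in range(n)}  # 2-power stage
    out = []
    for r in range(n):
        total = sum(math.cos(2 * math.pi * r * (s - t) / n) / n * products[(s, t)]
                    for (s, t) in products)
        out.append(math.sqrt(max(total, 0.0)))  # clamp tiny negative rounding errors
    return out

v = [0.3, -1.2, 2.0, 0.7]
reference = power_spectrum_complex(v)
real_route = power_spectrum_real(v)
max_gap = max(abs(a - b) for a, b in zip(reference, real_route))
```

The weights depend on s and t only through s - t taken modulo n, which is what makes the linear stage commute with cyclic translation.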
FIGURE 11. GRN (2) for the power spectrum. The weights shown have values dictated by Eq. (50).
2. The Continuous Fourier Transform

The power spectrum of the continuous Fourier transform is defined by the equation

where p is the power spectrum (a function on the real line) and v is the input function. Each component p(s) of the power spectrum is therefore obtained by performing a linear transformation (the integral with fixed s), and then applying the complex modulus function. The invariance group is the 1-D translation group (the additive group $\mathbb{R}$). If we extend the concept of a GRN to allow infinite numbers of neurons, then we can construct such a network for computing the continuous Fourier transform, in the same way as in the discrete case. This has been indicated in Examples II.1.5 and III.2.4.

3. The Mellin Transform

One solution to the problem of 1-D scale invariance is provided by the Mellin transform (Li, 1992; Sheng and Arsenault, 1986), defined for any function v(r) on $\mathbb{R}$ by:
A standard trick is to define the following coordinate mapping $T_{\log}$, called a logarithmic mapping, on the input space:
\[ v_1(x) := (T_{\log}(v))(x) := -2\pi\, v(e^{-2\pi x}) \quad \forall x \in \mathbb{R}. \tag{53} \]
We now find that
\[ z(s) = \int_{-\infty}^{\infty} v_1(a)\, e^{-2\pi i s a}\, da \quad \forall s \in \mathbb{R}, \tag{54} \]
that is, the coordinate mapping $T_{\log}$ turns the Mellin transform into a Fourier transform. Furthermore, applying a scale factor K to the function v results via this coordinate mapping in a shift proportional to ln K of $v_1$. Consequently the modulus of the Mellin transform in this form is invariant (by the invariance laws of the Fourier transform) under scaling, and this modulus can be calculated by performing a logarithmic mapping and then computing the Fourier transform power spectrum. Now let us define two representations of the additive group $\mathbb{R}$. First, the scaling representation $A_1$: for all elements $g \in \mathbb{R}$ and functions v on the positive real line, we have
\[ (A_1(g)v)(x) := v(e^{2\pi g}x) \quad \forall x \in \mathbb{R}^{+}. \tag{55} \]
Now we define another representation $A_2$ of $\mathbb{R}$, for any function $v_1$ on the real line and any group element g, by
\[ (A_2(g)v_1)(x) := v_1(x - g) \quad \forall x \in \mathbb{R}. \tag{56} \]
The logarithmic mapping $T_{\log}$ from $\Omega[A_1]$ to $\Omega[A_2]$ is linear because the value of a vector $T_{\log}(v) \in \Omega[A_2]$ at a point x is a multiple of the value of v at the point $e^{-2\pi x}$. It is routine to show that $T_{\log}$ is a homomorphism from $A_1$ to $A_2$. Furthermore, we have seen that the Fourier transform power spectrum can be computed by a GRN, and so the Mellin transform power spectrum can as well.

4. The General Integral Transform
Recall the form of the general integral transform:
\[ z(s) := \int_X v(x)\,\kappa(s, x)\,dx \quad \forall s \in S, \tag{57} \]
the modulus of z being unchanged by the action of a group G on the space V of functions v on X. Let us consider the case where X is finite, in which case V is finite-dimensional, and let the representation of G defining the induced action on V be denoted by A. We now show that the function z is a homomorphism from A to some representation $g \mapsto (e^{i\theta(g)})$ of G.
Lemma VII.1 Suppose that:
1. V is a finite-dimensional Hilbert space over a subfield of $\mathbb{C}$ on which the group G acts according to some representation A.
2. z is a linear function from V to $\mathbb{C}$.
3. z satisfies
\[ |z(A(g)v)| = |z(v)| \quad \forall g \in G,\ v \in V. \tag{58} \]
Then there exists a function $\theta$ from G to $\mathbb{R}$ that satisfies
\[ z(A(g)v) = e^{i\theta(g)} z(v) \quad \forall g \in G,\ v \in V, \tag{59} \]
and, furthermore, $e^{i\theta(g_1 g_2)} = e^{i\theta(g_1)} e^{i\theta(g_2)}$ for all $g_1, g_2$ in G.

Proof The lemma is trivial for z = 0, so we assume $z \neq 0$. By the Riesz Representation Theorem (Kreyszig, 1978), we can identify the functional z with an element z of V, that is, $z(v) = \langle z, v \rangle$ for any $v \in V$, where $\langle\,,\rangle$ denotes the Euclidean inner product. Define $\hat{z} \in V$ by $\hat{z} = z / \|z\|$, so that $\langle \hat{z}, \hat{z} \rangle = 1$. Let B be an orthonormal basis for V which includes $\hat{z}$. By Eq. (58), for any $b \in B$ and $g \in G$ we have
\[ |\langle \hat{z}, A(g)b \rangle| = |\langle \hat{z}, b \rangle| , \]
and hence, for some function $\theta : G \to \mathbb{R}$, we have
\[ \langle \hat{z}, A(g)b \rangle = \begin{cases} e^{i\theta(g)} & \text{if } b = \hat{z}, \\ 0 & \text{if } b \neq \hat{z}. \end{cases} \]
We now find
\[ z(A(g)b) = e^{i\theta(g)} z(b) \quad \forall b \in B,\ g \in G. \]
By linearity, we obtain Eq. (59) as required, and it is now straightforward to show that $e^{i\theta(g_1 g_2)} = e^{i\theta(g_1)} e^{i\theta(g_2)}$ for all $g_1, g_2$ in G. □
From Lemma VII.1 it follows that $g \mapsto (e^{i\theta(g)})$ is a representation of G, which the complex modulus function transmutes into the trivial representation. We have also deduced that the linear function z is a homomorphism from A to $g \mapsto (e^{i\theta(g)})$, so each element of the integral transform invariant is computable by a GRN, and hence (by computing these elements in parallel) every integral transform invariant over a finite set can be computed by a GRN. It remains an open question as to whether Lemma VII.1 can be generalized to the case where X is infinite. The proof of the lemma applies when V is infinite-dimensional and z is bounded, but may not hold more generally.
B. Fast Translation-Invariant Transforms

Another technique for achieving cyclic translation invariance is that of fast translation-invariant transforms (FTIT), introduced by Wagh and Kanetkar (1977); see also Moharir (1992). These are transforms that can be computed using a scheme such as that illustrated in Fig. 12 for the case $G = C_8$. Here the leftmost set of nodes are the input nodes, and the information flow (of real values) is from left to right. All connections in the diagram have a weight of 1, and $\psi_1$ and $\psi_2$ can be any commutative functions of their two inputs. The output of a fast translation-invariant transform is invariant under cyclic translation. This can be seen in the example in Fig. 12 as follows. A cyclic permutation in the input layer of the diagram induces a permutation in each of the two separate sets of nodes in the second layer. This, in turn, induces a permutation in each of the four sets of nodes in the next layer, and finally a (trivial) permutation in the eight singleton sets of nodes in the output layer.

FIGURE 12. Fast translation-invariant transform. Node markers indicate which of the functions $\psi_1$ and $\psi_2$ each node uses. All weights are 1, and information flow is from left to right.

Special examples include the R-transform (Wagh and Kanetkar, 1977), defined by:
\[ \psi_1(x_1, x_2) = x_1 + x_2, \quad \psi_2(x_1, x_2) = |x_1 - x_2|, \tag{60} \]
the M-transform (Wagh and Kanetkar, 1977):
\[ \psi_1(x_1, x_2) = \max(x_1, x_2), \quad \psi_2(x_1, x_2) = \min(x_1, x_2), \tag{61} \]
and the fast correlation transform (Duren and Peikari, 1991):
Any fast translation-invariant transform can be either approximated or computed by some GRN. Let us first consider the special case in which both $\psi_1$ and $\psi_2$ are functions of the sum of their inputs, that is, $\psi_i(x_1, x_2) = \xi_i(x_1 + x_2)$ for some function $\xi_i$, i = 1, 2. In this case, the scheme illustrated for computing the FTIT can clearly be regarded as a feedforward neural network. As already noted, the permutation action of the group on the input nodes induces a permutation action on each higher layer successively. Thus the nodes of each layer of the network can be divided into subsets that are acted upon by permutation representations of the group. In making these observations, we are implicitly deducing that the linear transformations defining the function of the FTIT are homomorphisms. Because any activation function $\xi_i$ preserves permutation representations, we can conclude that the network just described is a GRN (and a symmetry network).

Furthermore, we will now see that we can substitute networks approximating the propagation/activation functions $\psi_1$ and $\psi_2$ in order to build GRNs for any FTIT. Let us restrict our consideration to the function $\psi_1$; the same argument can be applied to $\psi_2$. The condition on $\psi_1$ is that it is symmetric, that is, invariant under the natural representation of $S_2 = C_2$. By Corollary VI.1, we should therefore be able to build a GRN $N_2$ to compute or approximate $\psi_1$. The network $N_2$ is formed by constructing the basic invariants $x_1 + x_2$ and $x_1 x_2$, and then applying some function to those invariants. The basic invariants can be computed using only the regular and trivial representations of $S_2$ within $N_2$ (in other words, using only the permutation representations). Every node in $N_2$ not involved in the computation of these invariants computes a function of these invariants and is thus fixed by $S_2$. Therefore we can assume that every representation used in $N_2$ is either a trivial or a regular representation.

Now consider the FTIT network $N_1$ (as shown in Fig. 12 for the case n = 8). The nodes of each layer of $N_1$ can be divided into subsets acted upon by transitive representations of $C_n$. These subsets halve in size in each successive layer, and all the nodes in a given noninput subset have the same propagation function ($\psi_1$ or $\psi_2$). Consider one such subset $B_1$. Those nodes in the next layer to which $B_1$ is connected can themselves be divided into two equal-sized sets according to propagation function; consider the set $B_2$ with transition function $\psi_1$. We now consider the substitution of $N_2$ for every node in $B_2$. We can collect each set of corresponding layers ($S_2$ representations) of the separate $N_2$ networks. Let $B_3$ denote a set of nodes formed in this way. Figure 13 shows the two possible cases that need to be considered regarding the structure of $B_3$.

FIGURE 13. Substituting the $N_2$ networks into a given layer of connections in the $N_1$ network. In case 1, the group acts on the $B_3$ nodes in the same way as on the $B_1$ nodes. In case 2, the group acts on the $B_3$ nodes in the same way as on the $B_2$ nodes. The dotted arrows represent general information flow through the $N_2$ networks. It is possible that some branches of the $N_2$ networks skip the $B_3$ nodes altogether, but to add such connections would overcomplicate the picture.

1. Consider the case where $B_3$ comprises $N_2$ layers identified with the regular representation of $S_2$. The two nodes in such a given $N_2$ layer can be identified with the two input nodes of that $N_2$ network; thus we have an identification of the nodes of $B_3$ with the nodes of $B_1$. The group G acts on $B_1$ by cyclic permutation, and hence on the $N_2$ networks by a combination of cyclic permutation and application of the transposition of $S_2$ to each $N_2$ network. By arranging the nodes of $B_3$ according to the identification described in the foregoing, the action of G on $B_3$ again becomes a cyclic permutation. In other words, the action of G on $B_3$ is identical to that on $B_1$.
2. Now consider the case where $B_3$ comprises $N_2$ layers identified with the trivial representation of $S_2$. The nodes in $B_3$ have a one-to-one correspondence with the nodes in $B_2$ (each $B_3$ node is identified with the $B_2$ node in the same $N_2$ network). It is clear that the $B_3$ nodes are fixed by a given group element if and only if the corresponding $B_2$ nodes are fixed, and that they are cyclically permuted precisely when the corresponding nodes are cyclically permuted, as well. Thus the action of G on $B_3$ is identical to that on $B_2$.
We have now seen that the group G has a defined linear action on the layers
THE GROUP REPRESENTATION NETWORK
of the network after substitution. It follows from this that all the linear transformations concerned are homomorphisms. Finally, as all representations concerned are permutations, each activation function must preserve the corresponding representation. This concludes our argument that the network after substitution is a GRN. We can therefore approximate any FTIT using a GRN.

In many cases, we can not only approximate but also compute an FTIT exactly in a GRN structure. As an example, we consider the M-transform, defined by Eq. (61). The function $\psi_1$ can be computed by a GRN, as shown in Fig. 14. The output of the GRN in Fig. 14 is given by

$$\begin{aligned}
\tfrac{1}{2}(x_1 + x_2) + \tfrac{1}{2}\sqrt{2x_1^2 + 2x_2^2 - (x_1 + x_2)^2}
&= \tfrac{1}{2}(x_1 + x_2) + \tfrac{1}{2}\sqrt{(x_1 - x_2)^2} \\
&= \tfrac{1}{2}(x_1 + x_2) + \tfrac{1}{2}\,|x_1 - x_2| \\
&= \max(x_1, x_2).
\end{aligned}$$
We can similarly construct a GRN for the function $\psi_2$. Substituting these networks into the FTIT network will give us a GRN for computing the M-transform. We can apply the same technique to construct GRNs for the R-transform and for the fast correlation transform.
FIGURE 14. A GRN for the maximum function.
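As a quick numerical check of the closed-form output quoted for the Fig. 14 network, the following minimal Python sketch evaluates the expression and compares it with the built-in maximum. The function name `grn_max` and the sample inputs are illustrative assumptions; the sketch evaluates only the formula, not the layered network itself.

```python
import math


def grn_max(x1: float, x2: float) -> float:
    """Evaluate 0.5*(x1 + x2) + 0.5*sqrt(2*x1^2 + 2*x2^2 - (x1 + x2)^2)."""
    s = x1 + x2
    # The radicand simplifies to (x1 - x2)^2, so its root is |x1 - x2|.
    radicand = 2.0 * x1 * x1 + 2.0 * x2 * x2 - s * s
    # Guard against a tiny negative value caused by floating-point rounding.
    return 0.5 * s + 0.5 * math.sqrt(max(radicand, 0.0))


# Hypothetical sample inputs, chosen to cover mixed signs and a tie.
for a, b in [(3.0, 1.0), (-2.0, 5.0), (4.0, 4.0), (-1.5, -7.0)]:
    assert math.isclose(grn_max(a, b), max(a, b), abs_tol=1e-9)
    # Swapping the inputs (the action of S_2) leaves the output unchanged.
    assert math.isclose(grn_max(a, b), grn_max(b, a), abs_tol=1e-9)
```

The expression depends on the inputs only through the symmetric quantities $x_1 + x_2$ and $x_1^2 + x_2^2$, which is why it is unchanged when the two inputs are swapped.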
C. Invariant First-Order Networks

Let us now consider another class of systems used for invariant pattern recognition: invariant (or tolerant) first-order neural networks. Weight sharing is a common approach for introducing invariance into feedforward neural networks. Fukumi et al. (1992) used it for enforcing rotation invariance in two dimensions. In their network, each input image is represented by a discrete polar-coordinate grid. Invariance is achieved under a finite, though arbitrarily large, rotation group. This works by constructing a first hidden layer of nodes, each of which is connected to the input with weights that are shared among pixels with the same radius from the image center. The network structure from the first layer upwards has no weight sharing, because the output of the first hidden layer is an invariant feature vector. The network from Fukumi et al. is invariant under $C_n$ for some $n$ acting by rotation on a discretized polar coordinate grid. Each hidden node is fixed by the action of the group, due to the weight sharing, and it is clear that this network is a symmetry network with a comparatively straightforward structure.

Many networks designed to solve a problem of transformation invariance are unfortunately only transformation-tolerant, due to practical system limitations. Among these are the time delay neural networks (TDNNs) (Lang and Hinton, 1988; Waibel et al., 1989), which have been used in speech processing to recognize patterns independent of their occurrence in time, that is, with invariance under the 1-D translation group. The basic structure of a layer of connections in a TDNN is illustrated in Fig. 15. Here each node in the higher layer is connected to the nodes of the lower layer by a weight vector, which is a translate of the weight vector of the first node in the layer. Furthermore, some of the connections are absent.
At first glance it seems that a network formed by layers of this type, with the layers shrinking in size until they reach the output node, would be invariant under linear translation. A closer inspection reveals that this is not
FIGURE 15. Basic structure of a time delay neural network (TDNN) layer
so; to see this consider Fig. 15. Let the weights leading to the first higher layer node be 1, 2, and 4 from left to right, and consider the vectors $v_1 = (0, 1, 1, 0, 0, 1, 0)$ and $v_2 = (0, 0, 1, 1, 0, 0, 1)$ as output vectors from the lower layer. With $v_1$ as input to this layer of connections, the net input to the higher layer is given by the vector $y_1 = (6, 3, 1, 4, 2)$, and that for $v_2$ is $y_2 = (4, 6, 3, 1, 4)$. Vector $y_2$ is not a translation of $y_1$, so this connection structure has failed to pass the linear translations upwards through the network. A TDNN-like network will be truly invariant under translations in a small central window of the input layer; these will produce no "edge effects" of the type seen in the preceding and will not affect network output. A TDNN can be described as "translation-tolerant."

As a TDNN is not truly translation-invariant (and even if it were, the invariances would not form a group), we cannot directly compare its structure to that of a GRN. Nevertheless, it is plain from diagrams such as that of Fig. 15 that the structure of a TDNN is based on the same principles as the symmetry network, and the TDNN is certainly a related model.

Networks possessing tolerance under 2-D linear translation tend to have an architecture that is a generalization to two dimensions of that described in the preceding. This covers the neocognitron (Fukushima, 1980), the feature extraction part of the digit recognition network of Le Cun et al. (1989), and also the network of Rumelhart et al. (1986a) for solving the T-C problem. Again we have a GRN-like structure, although true translation invariance is not achieved.
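The TDNN edge effect in the worked example with weights 1, 2, and 4 can be reproduced with a few lines of Python. This is a minimal sketch: the helper names `tdnn_layer` and `cyclic_shifts` are illustrative assumptions, and the layer is modeled simply as a sliding window with shared weights and no wrap-around.

```python
def tdnn_layer(weights, v):
    """Net input to each higher-layer node: one shared weight window slid along v."""
    k = len(weights)
    return [
        sum(w * x for w, x in zip(weights, v[i:i + k]))
        for i in range(len(v) - k + 1)
    ]


def cyclic_shifts(y):
    """All cyclic rotations of the list y (including y itself)."""
    return [y[s:] + y[:s] for s in range(len(y))]


weights = [1, 2, 4]
v1 = [0, 1, 1, 0, 0, 1, 0]
v2 = [0, 0, 1, 1, 0, 0, 1]  # v1 shifted one place to the right

y1 = tdnn_layer(weights, v1)
y2 = tdnn_layer(weights, v2)

# The shift is passed on correctly away from the edge ...
interior_agrees = y2[1:] == y1[:-1]
# ... but y2 as a whole is not a translate of y1, not even a cyclic one.
is_translate = y2 in cyclic_shifts(y1)
print(y1, y2, interior_agrees, is_translate)
```

Only the first component of `y2` disagrees with a shifted copy of `y1`, which is the edge effect described in the text.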
D. Higher-Order Neural Networks

A higher-order neural network (HONN) is an obvious generalization of a standard (first-order) network (Giles and Maxwell, 1987). Let $P$ denote the power set of the set $\{1, \ldots, n\}$. Formally, a higher-order neuron of degree $r$ with inputs $(v_1, v_2, \ldots, v_n)$ has the functionality
where the $w$'s are weights, $f$ denotes some activation function and $F(v)$ denotes the output of the neuron. Higher-order neurons can be connected together into networks in the same way that first-order neurons are, but due to the power of a single neuron, the networks can be much smaller. Often only a single computational layer is required. Higher-order networks have been used by many researchers for achieving invariant pattern recognition. They are sometimes applied to problems of linear or cyclic 1-D translation invariance (Duren and Peikari, 1991), or
more commonly to 2-D invariance under a group of some combination of rotations, translations, and dilations (Kwon et al., 1993; Perantonis and Lisboa, 1992; Spirkovska and Reid, 1992).

We can implement (emulate) a higher-order neuron of degree $r$ within a first-order network. With the definition of Eq. (63), this works by computing the $r$-power transform (see Section VI.A) of the input vector, and then performing a linear transformation. Algebraically, Eq. (63) becomes:
where $W_r$ is a row vector of dimension $n^r$. Suppose that the original higher-order neuron is invariant under a finite-dimensional representation $A$ of some finite group $G$ (as in the case of the cyclic translation invariance of Duren and Peikari (1991)). As the $r$-power transform is a concomitant, we find that the weight vector $W_r$ is a homomorphism from $\otimes^r A$ to $1_G$. Furthermore, by Lemma VI.1, the $r$-power transform of $A$ can be computed in a GRN. Hence the emulating network must be a GRN. In cases of transformation tolerance, we will find that the emulating network still has a GRN-like structure.

In conclusion, if a higher-order neuron is invariant under a real finite-dimensional representation of a finite group, then its output is computable by a real GRN.

E. Moment Invariants
A moment is a weighted sum of the input pattern u(x) over the input field, with weights given by some polynomial in x. The most common kinds of moment used in invariant pattern classification are the regular, central, and normalized moments, normally given for an input pattern defined on a 2-D grid (Li, 1992; Srivastava, 1991). The central moments, defined in terms of the regular moments, are invariant under translation. The normalized moments, defined in terms of the central moments, are invariant under translation and scaling. However, the moment invariants do not seem to fit in any natural way into the GRN framework, perhaps because of the awkward nature of their recursive definition. It may be that this is an oversight and that, in fact, we can construct a GRN (or GRN-like network) to emulate a moment invariant. It also seems more than likely, from the discussion in Section VI, that we could at least approximate a moment invariant using some GRN. Although invariants such as the central moments have not been shown to fit the GRN model, we may still benefit from an understanding of this model
in studying them. This is because the basic building blocks of the GRN, namely the homomorphism and transmutation function, often occur in connection with these and other methods. For example, the regular moments $m_{0,0}$ and $m_{1,0}$ are defined by

$$m_{0,0} := \iint v(x, y)\,dx\,dy \tag{65}$$

$$m_{1,0} := \iint x\,v(x, y)\,dx\,dy. \tag{66}$$

Defining the vector $m(v)$ by $m(v)^T = (m_{1,0}, m_{0,0})$, we find that if the original image $v$ is translated by $(a, b)$ to give an image $w$, then $m(v)$ is transformed by a matrix $B(a, b)$:

$$m(w) = B(a, b)\,m(v), \qquad B(a, b) = \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}.$$

Thus the double integral defining $m(v)$ (obtained by combining the definitions of $m_{0,0}$ and $m_{1,0}$) is a homomorphism from the regular representation of the 2-D translation group to the representation $B$.

In contrast to regular moments, Zernike moments (Perantonis and Lisboa, 1992) are much easier to deal with. These invariants are defined by the complex modulus of the following function for each pair of values $d$ and $m$: (67) where $K$ is some function defined earlier. Clearly then, the Zernike moments are integral transform invariants. When written in discrete form, the argument of Section VII.A.4 can be applied to them. Thus the Zernike moments, which are invariant under 2-D rotation, can be approximated by a GRN.
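The translation law for the regular moments $m_{0,0}$ and $m_{1,0}$ can be checked on a small discrete image. The sketch below is an illustration under stated assumptions: the integrals are replaced by sums over pixel coordinates, the translation is by whole pixels, and the pattern is assumed to stay inside the field after the shift. The names `regular_moments`, `translate`, and `apply_B` are hypothetical helpers, not part of the original text.

```python
def regular_moments(image):
    """Return (m10, m00) for image[y][x], with sums standing in for the integrals."""
    m00 = sum(val for row in image for val in row)
    m10 = sum(x * val for row in image for x, val in enumerate(row))
    return m10, m00


def translate(image, a, b):
    """Shift the pattern by (a, b) pixels with zero padding.

    Assumes a, b >= 0 and that every nonzero pixel stays inside the grid.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x]:
                out[y + b][x + a] = image[y][x]
    return out


def apply_B(a, m):
    """Apply the matrix [[1, a], [0, 1]] to the vector m = (m10, m00)."""
    m10, m00 = m
    return (m10 + a * m00, m00)


# Hypothetical 4 x 6 test image with a small pattern in the top-left corner.
image = [
    [1, 2, 0, 0, 0, 0],
    [0, 3, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

before = regular_moments(image)
after = regular_moments(translate(image, 2, 1))
assert after == apply_B(2, before)
```

Because $m_{1,0}$ weights only the $x$ coordinate, the vertical component $b$ of the shift does not appear in the matrix.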
VIII. SUMMARY AND OPEN QUESTIONS

We have presented a general neural network model, the group representation network, for invariant pattern classification. These GRNs are constructed from two basic building blocks, the homomorphism and transmutation function, which can be combined in a very flexible manner to produce networks of any desired complexity. This makes the GRN especially suitable for invariance problems for which no standard techniques exist. The GRN can be parameterized in a natural manner similar to that in a standard feedforward neural network, and standard learning algorithms can
be adapted for GRN training. The incorporation of prior invariance knowledge improves generalization ability, reduces the size of the training set and may also lessen training time. We have conjectured that a GRN can approximate any real invariant under a finite group to an arbitrary accuracy (over a compact set). In support of this, we showed that a GRN can compute precisely any polynomial invariant under a finite-dimensional representation of a finite group. We have also described the relationship between the GRN and other techniques used for invariant pattern classification.

We conclude with a list of open questions:

1. How can we extend the principle of the GRN to problems where the invariant transformations do not form a group? This issue often arises when we try to approximate the action of an infinite group on some space by discretizing the space. For example, the action of rotations about the origin on a 2-D Cartesian pixel grid cannot in general be formulated as a group action.

2. What are the most suitable learning algorithms for GRNs? It seems that basic gradient descent methods may be inappropriate, and more success may be obtained using methods such as simulated annealing or genetic algorithms. Another possibility is to introduce a constructive algorithm for GRN training.

3. How can we characterize transmutation over a field other than $\mathbb{R}$, for example, over the complex numbers? Can we reduce any GRN over $\mathbb{C}$ to a GRN in which the hidden and output layers use only representations induced from 1-D representations of subgroups?

4. How can we characterize the discriminability of a GRN? This issue is discussed for symmetry networks in Shawe-Taylor (1993), though even for this class of GRNs a conceptually simple answer has not been obtained to date. Connections between discriminability and graph isomorphism (Wood, 1995) suggest that the problem of ascertaining the discriminability of a given symmetry network may be NP-complete.

5. We have seen that the basic invariants of the natural representation of the symmetric group can be computed exactly using a GRN (Corollary VI.1). Can GRNs compute other known basic invariants (see, e.g., Weyl, 1946)?

6. Does the Universal Invariance Conjecture VI.1 hold?

7. We can apply the concepts of Section VII to find new techniques for invariant pattern recognition. This can be done in two ways:

(a) Given a computational system applying as its first stage an invariant transform (such as the DFT), if we can emulate it using a GRN then we have an adaptive invariant system that is at least as powerful.
(b) Alternatively, given a GRN which is an invariant under some representation, we can substitute values for the network's parameters to obtain an invariant transform. Furthermore, due to the high degree of flexibility in choice of parameter values and activation functions, we should be able to make such transforms highly efficient.

8. Can we use the fundamental building blocks of the parameterized homomorphism and transmutation function in other neural network structures, for example, recurrent networks? The application of these principles to a Boltzmann machine, for example, may prove relevant to finding fast solutions to highly symmetric combinatorial optimization problems.
ACKNOWLEDGMENTS
I would like to thank John Shawe-Taylor for being a helpful and inspiring Ph.D. supervisor, and for proofreading this paper for me. This work was carried out while I was studying at the University of London, and was supported by the Engineering and Physical Sciences Research Council and the Defence Evaluation and Research Agency under CASE Award no. 92567042.
Appendix: Proof of Theorem III.1
For convenience we restate the theorem:
Theorem III.1 Let $A$ and $B$ be finite-dimensional representations of a group $G$ over the field $\mathbb{R}$, and let $f$ be a t-unexceptional function from $\mathbb{R}$ to $\mathbb{R}$. Then $f$ transmutes $A$ into $B$ precisely when one of the following holds:

1. $A$ is a permutation representation and $B = A$.

2. $A$ is an inversion representation, $B = A$ and $f$ is odd.

3. $A$ is an inversion representation, $B$ is the underlying permutation representation of $A$ and $f$ is even.

4. $A$ is a positive representation and some reals $a, b_1, b_2$ exist such that $B(g) = p_a(A(g))$ for all $g \in G$, and $f$ has the form:

$$\forall x \in \mathbb{R} \qquad f(x) = \begin{cases} b_1 x^a & x > 0 \\ b_2 x^a & x < 0 \\ 0 & x = 0. \end{cases}$$

5. $A$ is a perm-diagonal representation and some reals $a, b$ exist such that one of the following holds:

(a) $B(g) = p_a(A(g))$ for all $g \in G$ and $f$ has the form $f(x) = b x^a$ for all $x$.

(b) $B(g) = \bar{p}_a(A(g))$ for all $g \in G$ and $f$ has the form $f(x) = \mathrm{sgn}(x)\, b x^a$ for all $x$.

6. $A$ is a unit-row representation, $B = A$ and $f$ is affine.

7. $B = A$ and $f$ is linear.

8. $B$ is a unit-row representation and $f$ is constant.

9. $f$ is the zero function.

Proof Throughout the proof, $A$ and $B$ are $n$-dimensional representations of a group $G$ over the field $\mathbb{R}$ ($n < \infty$), and $f$ is a function from $\mathbb{R}$ to $\mathbb{R}$. We will first prove the easy part, namely that any function $f$ and pair of representations $A$, $B$ obeying one of the 9 cases of transmutation listed in the foregoing do indeed have the property that $f$ transmutes $A$ into $B$. We consider each case in turn, following two initial results:
Lemma A.1 A positive finite-dimensional representation of a group is perm-diagonal.

Proof For any matrix $M$ appearing in the group representation, with inverse $N$, neither $M$ nor $N$ can contain a negative entry. The $i$th row of $M$ must have at least one nonzero entry, say $M_{i,k}$, and the inner product of this row with the $j$th column of $N$ (for any $j \neq i$) is 0, and so $N_{k,j} = 0$ for all $j \neq i$. Hence the $k$th row of $N$ contains exactly one nonzero entry, $N_{k,i}$. Using an analogous argument we now deduce that $M_{j,k} = 0$ for $j \neq i$. Hence each column of $M$ and each row of $N$ contains exactly one nonzero entry. Reversing the roles of $M$ and $N$, we conclude that the representation is perm-diagonal. □

Lemma A.2 The function $f: \mathbb{R} \to \mathbb{R}$ transmutes the representation $A$ into the representation $B$ if and only if the following holds for all $i \in 1, \ldots, n$, all $v \in \mathbb{R}^n$ and all $g \in G$:

$$f\left(\sum_{j=1}^{n} A(g)_{i,j} v_j\right) = \sum_{j=1}^{n} B(g)_{i,j} f(v_j). \tag{A.1}$$
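Let $g \in G$, $i \in 1 \ldots n$ and $v \in \mathbb{R}^n$ all be arbitrary. We mean to prove that, in each of the 9 given cases, Eq. (A.1) holds.

1. Let $A = B$ be a permutation representation. Then

$$f\left(\sum_{j=1}^{n} A(g)_{i,j} v_j\right) = f(v_k) = \sum_{j=1}^{n} B(g)_{i,j} f(v_j),$$

where $k$ is such that $A(g)_{i,k} = B(g)_{i,k} = 1$.

2. Let $A = B$ be an inversion representation and $f$ be an odd function. Let $k$ be such that $A(g)_{i,k} = B(g)_{i,k} = \pm 1$. If $A(g)_{i,k} = 1$, then we have

$$f\left(\sum_{j=1}^{n} A(g)_{i,j} v_j\right) = f(v_k) = \sum_{j=1}^{n} B(g)_{i,j} f(v_j).$$

Otherwise, $A(g)_{i,k} = -1$, and as $f$ is odd we have

$$f\left(\sum_{j=1}^{n} A(g)_{i,j} v_j\right) = f(-v_k) = -f(v_k) = \sum_{j=1}^{n} B(g)_{i,j} f(v_j).$$

3. Let $A$ be an inversion representation, $B$ the underlying permutation representation of $A$, and $f$ be an even function. Let $k$ be such that $|A(g)_{i,k}| = B(g)_{i,k} = 1$. Then:

$$\begin{aligned}
f\left(\sum_{j=1}^{n} A(g)_{i,j} v_j\right) &= f(A(g)_{i,k} v_k) \\
&= B(g)_{i,k} f(v_k) \qquad \text{(as $f$ is even)} \\
&= \sum_{j=1}^{n} B(g)_{i,j} f(v_j).
\end{aligned}$$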
4. Let $A$ be a positive representation, $B(g) = p_a(A(g))$ for all $g \in G$ and some $a \in \mathbb{R}$, and $f$ be of the form specified for case 4 of the theorem for some reals $b_1, b_2$. By Lemma A.1, $A$ is a perm-diagonal representation. Note that $B$ must also be a positive representation. Let $k$ be such that $A(g)_{i,k}^a = B(g)_{i,k} > 0$. For brevity we take $v_k > 0$; the proof for $v_k < 0$ is analogous and the proof for $v_k = 0$ is trivial. Thus we obtain:

$$\begin{aligned}
f\left(\sum_{j=1}^{n} A(g)_{i,j} v_j\right) &= f(A(g)_{i,k} v_k) \\
&= b_1 (A(g)_{i,k} v_k)^a \\
&= A(g)_{i,k}^a f(v_k) \\
&= \sum_{j=1}^{n} B(g)_{i,j} f(v_j).
\end{aligned}$$
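5. Now let $A$ be an arbitrary perm-diagonal representation. We have two cases to consider:

(a) $B(g) = p_a(A(g))$ for all $g \in G$ and some $a \in \mathbb{R}$, and $f$ is defined by $f(x) = b x^a$ for all $x$, where $b$ is another constant real. Again let $k$ be such that $A(g)_{i,k}^a = B(g)_{i,k} \neq 0$. Hence:

$$\begin{aligned}
f\left(\sum_{j=1}^{n} A(g)_{i,j} v_j\right) &= f(A(g)_{i,k} v_k) \\
&= b (A(g)_{i,k} v_k)^a \\
&= A(g)_{i,k}^a f(v_k) \\
&= \sum_{j=1}^{n} B(g)_{i,j} f(v_j).
\end{aligned}$$

(b) $B(g) = \bar{p}_a(A(g))$ for all $g \in G$ and some $a \in \mathbb{R}$, and $f$ is defined by $f(x) = \mathrm{sgn}(x)\, b x^a$ for all $x$, where $b$ is another constant real. Defining $k$ as usual, this time we have:

$$\begin{aligned}
f\left(\sum_{j=1}^{n} A(g)_{i,j} v_j\right) &= f(A(g)_{i,k} v_k) \\
&= \mathrm{sgn}(A(g)_{i,k} v_k)\, b (A(g)_{i,k} v_k)^a \\
&= \mathrm{sgn}(A(g)_{i,k})\, A(g)_{i,k}^a f(v_k) \\
&= B(g)_{i,k} f(v_k) \\
&= \sum_{j=1}^{n} B(g)_{i,j} f(v_j).
\end{aligned}$$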
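6. Let $A = B$ be a unit-row representation and $f$ be affine; say $f(x) = mx + c$ for two reals $m, c$. Now we have:

$$\begin{aligned}
f\left(\sum_{j=1}^{n} A(g)_{i,j} v_j\right) &= m \sum_{j=1}^{n} A(g)_{i,j} v_j + c \\
&= \sum_{j=1}^{n} A(g)_{i,j} m v_j + \sum_{j=1}^{n} A(g)_{i,j} c \qquad \text{(using the unit-row property)} \\
&= \sum_{j=1}^{n} A(g)_{i,j} f(v_j) \\
&= \sum_{j=1}^{n} B(g)_{i,j} f(v_j).
\end{aligned}$$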
7. Let $A = B$ be arbitrary and $f$ be linear; say $f(x) = mx$ for some real $m$. Trivially we see that:

$$f\left(\sum_{j=1}^{n} A(g)_{i,j} v_j\right) = m \sum_{j=1}^{n} A(g)_{i,j} v_j = \sum_{j=1}^{n} B(g)_{i,j} f(v_j).$$

8. Let $A$ be arbitrary and $B$ be a unit-row representation. Let $f$ be a constant function, say $f(x) = c$. Then:

$$\begin{aligned}
f\left(\sum_{j=1}^{n} A(g)_{i,j} v_j\right) &= c \\
&= \sum_{j=1}^{n} B(g)_{i,j} c \qquad \text{(using the unit-row property)} \\
&= \sum_{j=1}^{n} B(g)_{i,j} f(v_j).
\end{aligned}$$

9. Finally, let $f$ be the zero function. Now both sides of Eq. (A.1) are equal to zero.

In each case, therefore, Eq. (A.1) holds for an arbitrary choice of $i$, $g$ and $v$. By Lemma A.2, $f$ transmutes $A$ into $B$. This completes the first part of the proof. For the second part of the proof, we need to show that no other cases of transmutation are possible (that is, for a t-unexceptional function). This requires a series of preliminary results.
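Several of the cases just verified can be spot-checked numerically. The following Python sketch tests Eq. (A.1) for single matrices rather than whole representations, which is a simplifying assumption; the matrices, the test vector, and the helper names (`matvec`, `transmutes`, `affine`, `power_law`) are all illustrative choices, not taken from the original text. The power-law example uses the exponent $a = 2$ so that $x^a$ is defined for negative $x$.

```python
import math


def matvec(M, v):
    """Multiply the matrix M (list of rows) by the vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transmutes(f, A, B, v, tol=1e-9):
    """Check Eq. (A.1) for one matrix pair: f(A v) elementwise equals B f(v)."""
    lhs = [f(y) for y in matvec(A, v)]
    rhs = matvec(B, [f(x) for x in v])
    return all(math.isclose(l, r, abs_tol=tol) for l, r in zip(lhs, rhs))


def affine(x):
    return 3.0 * x - 2.0


def power_law(x, a=2, b1=1.5, b2=-0.7):
    """The case-4 form: b1*x^a for x > 0, b2*x^a for x < 0, and 0 at x = 0."""
    if x > 0:
        return b1 * x ** a
    if x < 0:
        return b2 * x ** a
    return 0.0


v = [0.3, -1.2, 2.0]
P = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]              # permutation matrix
S = [[0, -1, 0], [0, 0, 1], [-1, 0, 0]]            # signed permutation (inversion)
abs_S = [[abs(e) for e in row] for row in S]       # underlying permutation
U = [[0.5, 0.5, 0], [0, 1, 0], [0.25, 0.25, 0.5]]  # every row sums to 1
D = [[0, 2, 0], [0, 0, 0.5], [3, 0, 0]]            # positive perm-diagonal
D_pow = [[e ** 2 for e in row] for row in D]       # entrywise power, a = 2

assert transmutes(math.exp, P, P, v)        # case 1: any f
assert transmutes(math.tanh, S, S, v)       # case 2: odd f
assert transmutes(math.cos, S, abs_S, v)    # case 3: even f
assert transmutes(power_law, D, D_pow, v)   # case 4: power law
assert transmutes(affine, U, U, v)          # case 6: affine f
assert not transmutes(math.tanh, U, U, v)   # tanh is not affine, so this fails
```

The final assertion illustrates why a sigmoid-like activation cannot be used with a general unit-row matrix: a row with two nonzero entries mixes the inputs before the nonlinearity is applied.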
Lemma A.3 Let $f$ be a function that transmutes the finite-dimensional representation $A$ into the representation $B$. Then either $B$ is a unit-row representation or else $f$ passes through the origin.

Proof Assume $f$ transmutes $A$ into $B$; hence Eq. (A.1) holds. Substituting $v = 0$ we obtain:

$$\forall i \in 1 \ldots n,\ g \in G \qquad f(0) = f(0) \sum_{j=1}^{n} B(g)_{i,j}.$$

Hence either $f(0) = 0$ or $\sum_{j=1}^{n} B(g)_{i,j} = 1$ for all $i$ and $g$. This is the required result.

The rest of the proof proceeds by deduction of the properties of transmutation functions. The following definition will be useful: If $a$ and $b$ are real numbers satisfying the relation

$$\forall x \in \mathbb{R} \qquad f(ax) = b f(x) + (1 - b) f(0) \tag{A.2}$$

then we say that $(a, b)$ is a transmutation pair for $f$. For any given function $f$, we denote by $TP(f)$ the set of all transmutation pairs for $f$.

Lemma A.4 Let $f$ be a nonconstant function. Then the following laws hold:

1. $(0, 0), (1, 1) \in TP(f)$
2. $(a, 0) \in TP(f) \Rightarrow a = 0$
3. $(0, b) \in TP(f) \Rightarrow b = 0$
4. $(a, b), (a, c) \in TP(f) \Rightarrow b = c$
5. $(1, b) \in TP(f) \Rightarrow b = 1$
6. $TP(f) \setminus \{(0, 0)\}$ is a multiplicative subgroup of $\mathbb{R} \times \mathbb{R}$
7. $(-1, b) \in TP(f) \Rightarrow b = \pm 1$
Proof

1. This is immediate from the definition of $TP(f)$.

2. Suppose $(a, 0) \in TP(f)$, that is, $f(ax) = f(0)$ for any $x \in \mathbb{R}$. As $f$ is nonconstant, we must have $a = 0$.

3. Suppose $(0, b) \in TP(f)$, that is, $f(0) = b f(x) + (1 - b) f(0)$, or $b f(x) = b f(0)$, for all $x \in \mathbb{R}$. As $f$ is nonconstant, $b = 0$.

4. Suppose both $(a, b)$ and $(a, c)$ are in $TP(f)$. Then we have

$$b(f(x) - f(0)) = f(ax) - f(0) = c(f(x) - f(0)).$$

As $f$ is nonconstant, we must have $b = c$.

5. Follows directly from laws 1 and 4.

6. Law 1 gives the identity law for groups, and the associative law follows directly from that for real numbers. Now let $(a_1, b_1)$ and $(a_2, b_2)$ denote transmutation pairs. For any real $x$ we now have:

$$\begin{aligned}
f(a_1 a_2 x) &= b_1 f(a_2 x) + (1 - b_1) f(0) \\
&= b_1 b_2 f(x) + b_1 (1 - b_2) f(0) + (1 - b_1) f(0) \\
&= b_1 b_2 f(x) + (1 - b_1 b_2) f(0),
\end{aligned}$$

that is, $(a_1 a_2, b_1 b_2)$ is a transmutation pair. This proves the law of closure. Finally, let $(a, b)$ denote a transmutation pair other than $(0, 0)$. By laws 2 and 3, neither $a$ nor $b$ is 0. We now have

$$\begin{aligned}
f\left(\frac{x}{a}\right) &= f(y) \qquad \text{(defining $y$)} \\
&= \frac{1}{b}\left[ f(ay) - (1 - b) f(0) \right] \\
&= \frac{1}{b} f(x) + \left(1 - \frac{1}{b}\right) f(0),
\end{aligned}$$

so $(1/a, 1/b)$ is a transmutation pair. This proves the law of inverses.

7. Suppose $(-1, b) \in TP(f)$. Then by law 6, $(1, b^2) \in TP(f)$, and so by law 5 $b^2 = 1$; hence $b = \pm 1$. □
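The laws of Lemma A.4 can be illustrated numerically for two simple nonconstant functions. The sketch below is only a finite-sample check of Eq. (A.2), not a proof; the helper name `is_transmutation_pair`, the sample points, and the two example functions are assumptions made for illustration. For the affine function $f(x) = 3x + 2$ the pairs are of the form $(a, a)$, and for $f(x) = x^2$ they are of the form $(a, a^2)$.

```python
import math


def is_transmutation_pair(f, a, b, samples, tol=1e-9):
    """Test f(a*x) == b*f(x) + (1 - b)*f(0) on a finite list of sample points."""
    f0 = f(0.0)
    return all(
        math.isclose(f(a * x), b * f(x) + (1.0 - b) * f0, abs_tol=tol)
        for x in samples
    )


def affine(x):
    return 3.0 * x + 2.0


def square(x):
    return x * x


samples = [-2.0, -0.5, 0.0, 1.0, 3.5]

# Law 1: (0, 0) and (1, 1) are always transmutation pairs.
for f in (affine, square):
    assert is_transmutation_pair(f, 0.0, 0.0, samples)
    assert is_transmutation_pair(f, 1.0, 1.0, samples)

# Law 6 (closure and inverses), using pairs (2, 4) and (-1.5, 2.25) of x^2.
assert is_transmutation_pair(square, 2.0, 4.0, samples)
assert is_transmutation_pair(square, -1.5, 2.25, samples)
assert is_transmutation_pair(square, 2.0 * -1.5, 4.0 * 2.25, samples)
assert is_transmutation_pair(square, 1.0 / 2.0, 1.0 / 4.0, samples)

# Law 7: the partner of -1 is +1 for the even function, -1 for the affine one.
assert is_transmutation_pair(square, -1.0, 1.0, samples)
assert is_transmutation_pair(affine, -1.0, -1.0, samples)

# Law 4: the second component is determined by the first, so (2, 3) fails.
assert not is_transmutation_pair(square, 2.0, 3.0, samples)
```

Note that the affine example does not pass through the origin, so the $(1 - b) f(0)$ term in Eq. (A.2) is genuinely needed there.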
Lemma A.5 Let $f$ be a function that transmutes the finite-dimensional representation $A$ into the representation $B$. Let $a$ denote a sum of any number of distinct entries in any row of a representative matrix of $A$, and let $b$ denote the sum of corresponding entries in the corresponding matrix of $B$. Then $(a, b)$ is a transmutation pair for $f$.

Proof Let $e_i$ denote the $i$th column of the identity matrix. Suppose $a$ is the sum of some entries in row $r$ of matrix $A(g)$ of $A$. Define the subset $S$ of $\{1, \ldots, n\}$ by $a = \sum_{j \in S} A(g)_{r,j}$; thus $b = \sum_{j \in S} B(g)_{r,j}$. Finally, take $v = \sum_{j \in S} x e_j$ for arbitrary $x$. Now from Eq. (A.1) with $i = r$ we have:

$$\begin{aligned}
f\left(\sum_{j \in S} A(g)_{r,j} x\right) &= \sum_{j \in S} B(g)_{r,j} f(x) + \sum_{j \notin S} B(g)_{r,j} f(0) \\
\Rightarrow\quad f(ax) &= b f(x) + \left(\left(\sum_{j=1}^{n} B(g)_{r,j}\right) - b\right) f(0).
\end{aligned}$$

Now by Lemma A.3, either $f(0) = 0$ or $\sum_{j=1}^{n} B(g)_{r,j} = 1$. Hence we can write

$$\forall x \in \mathbb{R} \qquad f(ax) = b f(x) + (1 - b) f(0)$$

as required. □
Lemma A.6 Suppose that $f$ is a nonconstant function that transmutes some non-perm-diagonal finite-dimensional representation $A$ into some representation $B$. Then $TP(f)$ is an additive submonoid of $\mathbb{R} \times \mathbb{R}$.

Proof Let $A(g)$ denote a non-perm-diagonal matrix of $A$, which must exist. As $A(g)$ is not perm-diagonal, there exist two nonzero elements $A(g)_{i,s}$ and $A(g)_{i,t}$ in the same row of $A(g)$. For simplicity of notation throughout this proof, we henceforth write $a_{is}$ for $A(g)_{i,s}$, $a_{it}$ for $A(g)_{i,t}$, and similarly for $B$. By Lemma A.5, $(a_{is}, b_{is})$ and $(a_{it}, b_{it})$ are transmutation pairs for $f$. Also, let $(p, q)$ denote an arbitrary nonzero transmutation pair, for example, $(1, 1)$. Now, into Eq. (A.1) we substitute $v = p x e_s + a_{is} x e_t$, for arbitrary $x$ ($x$ should be regarded as an indeterminate). This gives us:

$$\begin{aligned}
f(a_{is} p x + a_{it} a_{is} x) &= b_{is} f(px) + b_{it} f(a_{is} x) + \sum_{j \neq s,t} b_{i,j} f(0) \\
&= b_{is} f(px) + b_{it} f(a_{is} x) + (1 - (b_{is} + b_{it})) f(0) \qquad \text{(applying Lemma A.3)} \\
\Rightarrow\quad f(a_{is}(px + a_{it} x)) &= b_{is} q f(x) + b_{is}(1 - q) f(0) + b_{it} b_{is} f(x) + b_{it}(1 - b_{is}) f(0) + (1 - (b_{is} + b_{it})) f(0) \\
\Rightarrow\quad b_{is} f((p + a_{it}) x) &= b_{is}(q + b_{it}) f(x) + (1 - (b_{is} q + b_{it} b_{is})) f(0) - (1 - b_{is}) f(0) \\
\Rightarrow\quad b_{is} f((p + a_{it}) x) &= b_{is}(q + b_{it}) f(x) + (b_{is} - (b_{is} q + b_{is} b_{it})) f(0).
\end{aligned}$$

As $a_{is} \neq 0$, law 2 of Lemma A.4 tells us that $b_{is} \neq 0$. Dividing by this gives us:

$$f((p + a_{it}) x) = (q + b_{it}) f(x) + (1 - (q + b_{it})) f(0).$$

Hence $(p + a_{it}, q + b_{it})$ is a transmutation pair. We have deduced that the transmutation pair set is closed under addition of $(a_{it}, b_{it})$. This is a nontrivial result because $(a_{it}, b_{it})$ is not zero.

Now let $(p_1, q_1)$ and $(p_2, q_2)$ be arbitrary transmutation pairs. We mean to show that their sum is also a transmutation pair. We can assume $(p_1, q_1)$ and $(p_2, q_2)$ are nonzero, because otherwise the result is trivial. By law 6 of Lemma A.4, $TP(f)$ is a multiplicative group, and so

$$\left(\frac{p_1 a_{it}}{p_2}, \frac{q_1 b_{it}}{q_2}\right) \in TP(f).$$

Hence from the result already given,

$$\left(\frac{p_1 a_{it}}{p_2} + a_{it}, \frac{q_1 b_{it}}{q_2} + b_{it}\right) \in TP(f).$$

Again $TP(f)$ is a multiplicative group, so $(p_2 / a_{it}, q_2 / b_{it})$ is in $TP(f)$, and furthermore

$$\left(\frac{p_2}{a_{it}}\left(\frac{p_1 a_{it}}{p_2} + a_{it}\right), \frac{q_2}{b_{it}}\left(\frac{q_1 b_{it}}{q_2} + b_{it}\right)\right) \in TP(f),$$

that is, $(p_1 + p_2, q_1 + q_2) \in TP(f)$. In other words, the sum of any two elements in $TP(f)$ is in $TP(f)$, so the transmutation pairs are closed under addition. The associative law for addition holds trivially, and by law 1 of Lemma A.4, $(0, 0) \in TP(f)$, so $TP(f)$ is an additive monoid. □

Note the rather important condition on the lemma stated here, namely that $A$ is not perm-diagonal.

Lemma A.7 If $D$ is dense in $S \subset \mathbb{R}$, $f: \mathbb{R} \to \mathbb{R}$ is continuous at $\beta$ and

$$\forall d \in D,\ x \in \mathbb{R} \qquad f(dx) = d f(x) + (1 - d) f(0)$$

then we can extend the property at $\beta$ to obtain

$$\forall s \in S \qquad f(s\beta) = s f(\beta) + (1 - s) f(0).$$

Proof For $s = 0$, the required result holds trivially. We therefore assume $s \neq 0$. Since $D$ is dense in $S$, for any $s \in S$ and given any $\epsilon > 0$ we can write $s = d(s, \epsilon) + \delta$, where $|\delta| < \epsilon$ and $d(s, \epsilon) \in D$, $d(s, \epsilon) \neq 0$. Thus $s = \lim_{\epsilon \to 0} d(s, \epsilon)$. Writing $d$ for $d(s, \epsilon)$, we now have:

$$\begin{aligned}
f(s\beta) &= \lim_{\epsilon \to 0} f((d + \delta)\beta) \\
&= \lim_{\epsilon \to 0} \left[ d f\left(\left(1 + \frac{\delta}{d}\right)\beta\right) + (1 - d) f(0) \right] \\
&= \left(\lim_{\epsilon \to 0} d\right) \left(\lim_{\epsilon \to 0} f\left(\left(1 + \frac{\delta}{d}\right)\beta\right)\right) + \left(1 - \lim_{\epsilon \to 0} d\right) f(0) \qquad \text{(since all such limits exist)} \\
&= s f(\beta) + (1 - s) f(0) \qquad \text{(by continuity of $f$ at $\beta$).}
\end{aligned}$$

□
Corollary A.1 Let $D$ denote a dense subset of $\mathbb{R}^+$ and $f: \mathbb{R} \to \mathbb{R}$ a function such that

$$\forall d \in D \qquad (d, d) \in TP(f).$$

Assume also that $f$ is continuous at at least two points $\beta_1 > 0$ and $\beta_2 < 0$. Then for any positive real $s$, $(s, s) \in TP(f)$.

Proof By Lemma A.7, we have that, for any positive real $s$,

$$\begin{aligned}
f(s\beta_1) &= s f(\beta_1) + (1 - s) f(0) \\
f(s\beta_2) &= s f(\beta_2) + (1 - s) f(0).
\end{aligned}$$

Now let $x$ and $s$ be any positive real numbers. Applying the first relation with $sx/\beta_1$ and with $x/\beta_1$ in place of $s$ gives

$$\begin{aligned}
f(sx) &= \frac{sx}{\beta_1} f(\beta_1) + \left(1 - \frac{sx}{\beta_1}\right) f(0) \\
&= s\left[\frac{x}{\beta_1} f(\beta_1) + \left(1 - \frac{x}{\beta_1}\right) f(0)\right] + (1 - s) f(0) \\
&= s f(x) + (1 - s) f(0).
\end{aligned}$$

For $x < 0$, we have the same argument using $\beta_2$ instead of $\beta_1$. Finally, for $x = 0$, the same result holds trivially. Thus for any positive real $s$, we have

$$\forall x \in \mathbb{R} \qquad f(sx) = s f(x) + (1 - s) f(0).$$

In other words, $(s, s) \in TP(f)$ for all positive reals $s$. □

We are now ready to attack the proof of the main theorem. Firstly, assume $f$ is constant. By Lemma A.3, either $B$ is a unit-row representation or $f$ is zero, as required. Henceforth, $f$ is assumed to be nonconstant. We have two main cases to consider, depending upon whether or not $A$ is a perm-diagonal representation. We will deal with these rather different cases separately.

1. Suppose $A$ is a perm-diagonal representation. By Lemma A.5,

$$(A(g)_{i,j}, B(g)_{i,j}) \in TP(f)$$

for any $g$, $i$, $j$. By law 3 of Lemma A.4, $B$ must also be a perm-diagonal representation. We now have a number of subcases to consider.
(a) Suppose $A$ is a permutation representation. The matrix entries of $A$ contain only 0's and 1's. By laws 3 and 5 of Lemma A.4, the corresponding entries of $B$ must also be 0 and 1, that is, $B = A$.

(b) Suppose $A$ is an inversion representation. We can further suppose that $A$ is not a permutation representation, having already dealt with this case. By laws 4 and 7 of Lemma A.4, $k = \pm 1$ must (consistently) appear in the matrices for $B$ wherever $-1$ appears in the matrices for $A$, and again 0 and 1 must appear in $B$ wherever they appear in $A$. Hence either $B = A$ or $B$ is the underlying permutation representation of $A$, depending upon whether $k = -1$ or $+1$, respectively. $A$ cannot be a unit-row representation, because unit-row inversion representations are permutation representations. Consequently, by Lemma A.3, $f$ goes through the origin. When $k = -1$ (and $B = A$), $(-1, -1)$ is a transmutation pair, so for all $x$, $f(-x) = -f(x)$, that is, $f$ is odd. When $k = 1$ (and $B$ is the underlying permutation representation of $A$), we have that $(-1, 1)$ is a transmutation pair, that is, $f(-x) = f(x)$, which means that $f$ is even. This completes the proof for the case when $A$ is an inversion representation.

(c) Suppose $A$ is an arbitrary perm-diagonal representation. We can assume that $A$ is not an inversion representation, and hence $A$ is not a unit-row representation, so $f$ goes through the origin. By Lemma A.5 each corresponding pair of nonzero elements $A(g)_{i,j}$, $B(g)_{i,j}$ obeys the law:

$$\forall x \in \mathbb{R} \qquad f(A(g)_{i,j} x) = B(g)_{i,j} f(x).$$

Choose a pair $\tau_1 = A(g)_{i,j}$, $\tau_2 = B(g)_{i,j}$ such that $\tau_1 \notin \{-1, 0, 1\}$. This must be possible, or else $A$ would be an inversion representation. By law 2 of Lemma A.4, $\tau_2 \neq 0$. Here $\tau_1$ and $\tau_2$ satisfy the law:

$$\forall x \in \mathbb{R} \qquad f(\tau_1 x) = \tau_2 f(x).$$

As $f$ is t-unexceptional, it must be of the following form for some $a, b_1, b_2 \in \mathbb{R}$:

$$\forall x \in \mathbb{R} \qquad f(x) = \begin{cases} b_1 x^a & x > 0 \\ b_2 x^a & x < 0 \\ 0 & x = 0, \end{cases} \tag{A.3}$$

because we also know $f(0) = 0$. Now let $i, j \in 1 \ldots n$ and $g \in G$ be such that $A(g)_{i,j} \geq 0$. Then

$$B(g)_{i,j} f(x) = f(A(g)_{i,j} x) = A(g)_{i,j}^a f(x).$$
As $f$ is nonconstant, $B(g)_{i,j} = A(g)_{i,j}^a$. When $A$ is a positive representation (and hence by Lemma A.1, perm-diagonal), we can apply the preceding for any matrix entry $A(g)_{i,j}$, and, hence, $B(g) = p_a(A(g))$ for any $g \in G$. This concludes the proof for positive representations.

When $A$ is not positive, there exists a negative entry $\tau_1$ in some matrix of $A$. Substituting into Eq. (A.3) gives us:

$$f(\tau_1 x) = \begin{cases} b_2 \tau_1^a x^a & x > 0 \\ b_1 \tau_1^a x^a & x < 0 \\ 0 & x = 0. \end{cases}$$

On the other hand, denoting by $\tau_2$ the corresponding entry in the corresponding matrix of $B$, we have:

$$f(\tau_1 x) = \tau_2 f(x) = \begin{cases} \tau_2 b_1 x^a & x > 0 \\ \tau_2 b_2 x^a & x < 0 \\ 0 & x = 0. \end{cases}$$

Equating these for nonzero $x$ tells us that $\tau_2 b_1 = \tau_1^a b_2$ and $\tau_2 b_2 = \tau_1^a b_1$. For both of these to be satisfied, we must have $b_1 = \pm b_2$ and $\tau_2 = \pm \tau_1^a$, the two signs being either both positive or both negative. Furthermore, this argument holds for any negative entry $\tau_1$ in a representative matrix of $A$. In the case where $b_1 = b_2$, $f$ must have the form $f(x) = b x^a$ and $B(g)_{i,j} = A(g)_{i,j}^a$ for all $i$, $j$ and $g$, that is, $B(g) = p_a(A(g))$ for all $g \in G$. In the case where $b_1 = -b_2$, $f$ must have the form $f(x) = \mathrm{sgn}(x)\, b x^a$ and we also have $B(g)_{i,j} = \mathrm{sgn}(A(g)_{i,j}) A(g)_{i,j}^a$ for all $i$, $j$ and $g$, that is, $B(g) = \bar{p}_a(A(g))$ for all $g \in G$. This is our required result. This completes the proof in the case where $A$ is a perm-diagonal representation.

2. We now consider the case when $A$ is not perm-diagonal. As $f$ is nonconstant, law 6 of Lemma A.4 and Lemma A.6 say that $TP(f)$ is a multiplicative subgroup and an additive submonoid of $\mathbb{R} \times \mathbb{R}$. As $(1, 1) \in TP(f)$, all pairs $(q, q)$, where $q$ is positive and rational, must be in $TP(f)$. By Corollary A.1, and as $f$ is t-unexceptional, $(s, s) \in TP(f)$ for any positive real $s$.

By Lemma A.1, $A$ is not a positive representation. Hence $A$ contains a negative entry $k$ in some matrix, for which by Lemma A.5, $(k, k) \in TP(f)$. Again as $TP(f)$ is a multiplicative group, $(r, r) \in TP(f)$ for any nonzero real $r$, but also for $r = 0$. In other words,

$$\forall s, x \in \mathbb{R} \qquad f(sx) = s f(x) + (1 - s) f(0).$$
Also, by Lemma A.5 and law 4 of Lemma A.4, we must have $B = A$. Now define the function $\tilde{f}$ by:

$$\tilde{f}(x) := f(x) - f(0).$$

This new function goes through the origin, and clearly obeys:

$$\forall s, x \in \mathbb{R} \qquad \tilde{f}(sx) = s \tilde{f}(x).$$

Now let $x_1, x_2$ be arbitrary reals. If either is zero, we have $\tilde{f}(x_1 + x_2) = \tilde{f}(x_1) + \tilde{f}(x_2)$ trivially. Otherwise, it follows anyway:

$$\begin{aligned}
\tilde{f}(x_1 + x_2) &= \tilde{f}\left(\left(1 + \frac{x_2}{x_1}\right) x_1\right) \\
&= \tilde{f}(x_1) + \frac{x_2}{x_1} \tilde{f}(x_1) \\
&= \tilde{f}(x_1) + \tilde{f}(x_2).
\end{aligned}$$

Hence $\tilde{f}$ is linear. By the definition of $\tilde{f}$, $f$ itself must be affine. Furthermore, unless $B$ is a unit-row representation, Lemma A.3 tells us that $f(0) = 0$ and so $f$ itself must be linear. This completes the proof in the case where $A$ is not a perm-diagonal representation.