ADVANCES IN IMAGING AND ELECTRON PHYSICS VOLUME 91
EDITOR-IN-CHIEF
PETER W. HAWISES CEMESILaboratoire d’Optique Electronique du Centre National de la Recherche Scientifrque Toulouse, France
ASSOCIATE EDITORS
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
TOM MULVEY Department of Electronic Engineering and Applied Physics Aston University Birmingham, United Kingdom
Advances in
Imaging and Electron Physics EDITEDBY PETER W. HAWKES CEMESILahoratoire d' Optique Electronique du Centre National de la Recherche Scientifique Toulouse, France
VOLUME 91
ACADEMIC PRESS San Diego New York Boston London Sydney Tokyo Toronto
This book is printed on acid-free paper.
@
Copyright 0 1995 by ACADEMIC PRESS, INC. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Academic Press, lnc. A Division of Harcourt Brace & Company 525 B Street, Suite 1900, San Diego, California 92101-4495 United Kingdom Edition published by Academic Press Limited 24-28 Oval Road, London NW 1 7DX
International Standard Serial Number: 1076-5670 International Standard Book Number: 0- 12-014733-5 PRINTED IN THE UNITED STATES OF AMERICA 95 96 9 7 9 8 99 0 0 B B 9 8 7 6
5
4
3 2
I
CONTENTS
CONTRIBUTORS . . . . . . . . . . . . . . . . . . . . . . . PREFACE . . . . . . . . . . . . . . . . . . . . . . . . . .
I. 11. 111. IV.
V.
I. 11. 111. IV. V.
ix xi
Canonical Aberration Theory in Electron Optics Up to Ultrahigh-Order Approximation JIVE XIMEN Introduction . . . . . . . . . . . . . . . . . . . . . . The Harniltonian Function and Its Power-Series Expansion . . . The Eikonal Function and Its Power-Series Expansion . . . . Intrinsic and Combined Aberrations Up to the Ninth-Order Approximation. . . . . . . . . . . . . . . . . . . . . Conclusions . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . .
28 33 35
New Developments on Generalized Information Measures INDER JEET TANEJA Introduction . . . . . . . . . . . . . . . . . . . . . . Unified (r,s)-Information Measures . . . . . . . . . . . . M-Dimensional Unified (r,s)-Divergence Measures . . . . . . Unified (r,s)-Multivariate Entropies . . . . . . . . . . . . Applications . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . .
37 41 75 95 110 132
V
1
4 13
vi
CONTENTS
50 Years of Electronics Introduction R. A. LAWES The Exploitation of Semiconductors B. L. H. WILSON 1. Prehistory . . . . . , . . . . . . . 11. The Conceptual Network . . . . . . 111. Technology . . . . . . . . . . . . IV. The Transistor . . . . . . . . . . . V. Integrated Circuits , , . . . . . . . VI. The Field Effect Transistor . . . . . . V11. The Information Technology Revolution VIII. U.K. Progress in ICs from 1963 . . .
. . .
. .
. . . .
. .
. . . . . .
. .
. . . . . . . . . . . . , . . . . . . . . . . . . . .
. . .
. . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .
142 i42 143 143 146 149 150
I so
The Use and Abuse of 111-V Compounds CYRILHILSUM
1. 11. 111.
IV. V.
I. 11. 111.
IV. V. VI. VII.
Telecommunications: The Last, and the Next, 50 Years JOHN BRAY Today’s World of Telecommunications . . . . . . . , . . . Telecommunications 50 Years Ago . . . . . . . . . . . . Key Developments in the Last 50 Years . . . . . . . . . . Telecommunications in the Next SO Years . . . . . . . . . The Impact of Telecommunications and Information Technology on the Future of Mankind . . . . . . . . . . . . . . . .
Mesoscopic Devices Where Electrons Behave like Light A. J. HOLDEN Introduction . . . . . . . . . . . . . . . . . . . . . Building a Mesoscopic System . . . . . . . . . . . , . A Brief History . . . . . . . . . . . . . . . . . . . Transport Theory . . . . . . . . . . . . . . . . . . . Some Simple Examples . . . . . . . . . . . . . . . . The Aharonov-Bohm Effect . . . . . . . . . . . . . . Two-Terminal Conductance in I-D Quantum Point Contacts .
. ,
. . . .
.
189 191 192
205 210
2 13 214 215 2 IS 2 16 2 18 219
vii
CONTENTS
VIII . Magnetic Phenomena: Edge States . . IX . Devices . . . . . . . . . . . . . X . Summary and Conclusions . . . . . References . . . . . . . . . . . .
1.
I1 . I11. IV. V.
. . . .
. . . .
. . . .
. . . .
. . . . . . . 223 . . . . . . 225 . . . . . . . 227 . . . . . . 228
The Evolution of Electronic Displays. 1942-1992 DERRICK GROVER Introduction . . . . . . . . . . . . . . . . . . . . . . Flat Panel Technology . . . . . . . . . . . . . . . . . . Projection Displays . . . . . . . . . . . . . . . . . . . Three-Dimensional Display . . . . . . . . . . . . . . . Conclusions . . . . . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . .
Gabor’s Pessimistic 1942 View of Electron Microscopy and How He Stumbled on the Nobel Prize T MULVEY I . Introduction . . . . . . . . . . . . . . . . . . . . . . I1 . Electron Physics in Wartime Britain . . . . . . . . . . . . 111. Dangers in the German Reich . . . . . . . . . . . . . . IV. An Enemy Alien with Special Qualifications . . . . . . . . V. The Third Meeting of the Electronics Group. October 1942 . . VI . Space-Charge Correction of Spherical Aberration . . . . . . VII . The Projection Shadow Microscope: An Unrecognized Tool for Holography . . . . . . . . . . . . . . . . . . . . . . VIII . Abbe’s Theory of the Microscope and the TEM . . . . . . . IX . Electron Beam Holography . . . . . . . . . . . . . . . . X . Electron-Optical Trials at the AEI Research Laboratory . . . . XI . Artifact-Free Holography . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . . . .
231 244 252 253 256 256
259 260 267 268 269 274 274 275 276 279 282 283
Early Techniques in Radio Astronomy A . HEWISH
INDEX
. . . . . . . . . . . . . . . . 291
This Page Intentionally Left Blank
CONTRIBUTORS Numbers in parentheses indicate the pages on which the authors' contributions begin. JOHN BRAY(189), The Pump House, Woodbridge, Suffolk IP13 6AH, United Kingdom
DERRICK GROVER(231), 3 The Spinney, Haywards Heath, West Sussex RH16 IPL, United Kingdom
R. A. LAWES(139), Central Microstructure Facility, Rutherford Appleton Laboratory, DIDCOT (Oxon) OX1 1 OQX, United Kingdom A. HEWISH(285), Cavendish Laboratory, Cambridge CB3 OHE, United
Kingdom CYRILHILSUM(171), GEC HIRST Research Center, Borehamwood (Hertz) WD6 IRX, United Kingdom A. J. HOLDEN (2 13), GEC-Marconi Materials Technology, Caswell, Towcester
NN 12 8EQ, United Kingdom T. MULVEY (259), Department of Electronic Engineering and Applied Physics, Aston University, Birmingham B4 7ET, United Kingdom
INDER JEET TANEJA(37). Depanamento de Matemitica, Universidade Federal de Santa Caratina, 88.037-060 Florianopolis, SC, Brazil B. L. H. WILSON(141), The Armada House, Weston, Towcester "12 United Kingdom
8PU,
JIVE XIMEN ( 1 ), Department of Radio-Electronics, Peking University, Beijing 10087 1, China
ix
This Page Intentionally Left Blank
PREFACE
This latest volume of these Advances contains two regular articles and a separate part in celebration of “50 Years of Electronics.” The two regular contributions are by authors whose names are known to readers of the series; J.-y. Ximen has already written an account of his canonical approach to aberration theory, which he extends here to high order aberrations. Although these very high order perturbations are not of immediate interest for short systems such as electron microscopes and similar instruments, they do become important in circular devices and perhaps in very long structures. I. J. Taneja has likewise described his ideas about information measures in these pages and, in the present contribution, he extends and generalizes those notions. The second part grew out of a meeting held by the Electronics Group of the (British) Institute of Physics in 1992 to celebrate their 50th anniversary, an anniversary that was certainly worth marking for the first chairman was John Cockcroft and the first full lecture was delivered by Dennis Gabor. That lecture was on electron optics and it was an apparently insoluble problem of electron lenses that stimulated Gabor to invent holography a few years later. Among the authors of the very early papers describing attempts to implement Gabor’s ideas was Tom Mulvey, who reminisces here about those early days. The other contributors to this collection discuss semiconductors (B. L. H. Wilson and C. Hilsum), mesoscopic systems (A. J. Holden), and displays (D. Grover). Finally, A. J. Hewish tells how electronics wedded to astronomy led to the birth of radio astronomy. I am most grateful to the authors of all these contributions and to Dr. Brian Jones, who first suggested that these Advances would provide a suitable home for these recollections of a most exciting half-century of developments in electronics. As usual, I conclude with a list of forthcoming articles. Two further volumes will follow shortly to ensure rapid publication, and articles scheduled to be included in those volumes are indicated.
ARTICLES FORTHCOMING
Group invariant Fourier transform algorithms
Nanofabrication xi
Y. Abdelatif and colleagues (Vol. 93) H. Ahmed
xii
PREFACE
Use of the hypermatrix Image processing with signal-dependent noise The Wigner distribution Parallel detection Hexagon-based image processing Microscopic imaging with mass-selected secondary ions Nanoemission Magnetic reconnect ion Sampling theory ODE methods The artificial visual system concept Projection methods for image processing The development of electron microscopy in Italy Space-time algebra and electron physics The study of dynamic phenomena in solids using field emission Gabor filters and texture analysis Group algebra in image processing Miniaturization in electron optics Crystal aperture STEM The critical-voltage effect Amorphous semiconductors Stack filtering Median filters Bayesian image analysis RF tubes in space Mirror electron microscopy Relativistic microwave electronics Rough sets The quantum flux parametron The de Broglie-Bohm theory Contrast transfer and crystal images Seismic and electrical tomographic imaging Morphological scale-space operations
D. Antzoulatos H. H. Arsenault M. J. Bastiaans P. E. Batson S. B. M. Bell M. T. Bemius Vu Thien Binh A. Bratenahl and P. J. Baum J. L. Brown J. C. Butcher J. M. Coggins P. L. Combettes G. Donelli C. Doran and colleagues M. Drechsler J . M. H. Du Buf D. Eberly A. Feinerman J. T. Fourie (Vol. 93) A. Fox W. Fuhs M. Gabbouj N. C. Gallagher and E. Coyle S. Geman and D. Geman A. S. Gilmour R. Godehardt V. L. Granatstein J. W. GrzymalaBusse W. Hioe and M. Hosoya P. Holland K . lshizuka P. D. Jackson and colleagues P. Jackway
PREFACE
Algebraic approach to the quantum theory of electron optics Electron holography in conventional and scanning transmission electron microscopy Quantum neurocomputing Applications of speech recognition technology Spin-polarized SEM Sideband imaging Highly anisotropic media High-definition television Regularization Near-field optical imaging SEM image processing Electronic tools in parapsychology Image formation in STEM The Growth of Electron Microscopy Phase retrieval Phase-space treatment of photon beams Image plate Z-contrast in materials science Electron scattering and nuclear structure Multislice theory of electron lenses The wave-particle dualism Electrostatic lenses Scientific work of Reinhold Rudenberg Electron holography X-ray microscopy Accelerator mass spectroscopy Applications of mathematical morphology Set-theoretic methods in image processing Texture analysis Focus-deflection systems and their applications New developments in ferroelectrics Orientation analysis Knowledge-based vision
...
XI11
R. Jagannathan and S . Khan E Kahl and H. Rose S. Kak H. R. Kirby K. Koike W. Krakow C. M. Krowne (Vol. 92) M. Kunt A. Lannes A. Lewis N. C. MacDonald R. L. Moms C. Mory and C. Colliex T. Mulvey (Ed.) (Vol. 94) N. Nakajima (Vol. 93) G . Nemes T. Oikawa and N. Mon S. J. Pennycook G . A. Peterson G . Pozzi (Vol. 93) H. Rauch E H. Read and I. W. Drummond H. G . Rudenberg D. Saldin G . Schmahl J. P. F. Sellschop J. Serra M. I. Sezan H. C. Shen T, Soma J. Toulouse K. Tovey (Vol. 93) J. K. Tsotsos
xiv
PREFACE
Electron gun optics Very high resolution electron microscopy Spin-polarized SEM Morphology on graphs Cathode-ray tube projection TV systems
Image enhancement Signal description The Aharonov-Casher effect
Y. Uchikawa D. van Dyck T. R. van Zandt and R. Browning L. Vincent L. Vriens, T. G . Spanjer, and R. Raue €? Zamperoni (Vol. 92) A. Zayezdny and I. Druckmann A. Zeilinger, E. Rasel, and H. Weinfurter
ADVANCES IN IMAOINO AND ELECTRON PHYSICS,VOL. 91
Canonical Aberration Theory in Electron Optics Up to Ultrahigh-Order Approximation JIYE XIMEN Department of Radio-electronics,Peking University Beijing. China
. .
I. Introduction . . . . . . . . . . . . . . . . . . . II. The Hamiltonian Function and Its Power-Series Expansion . . . . . A. Fundamental Formulas . . . . . . . . . . . . . . . B. Power-Series Expansions for Different-Order Hamiltonian Functions C. Discussion . . . . . . . . . . 111. The Eikonal Function and Its Power-Series Expansion . . . . . . . A. The First-Order Gaussian Trajectory in Hamiltonian Representation . . B. Different-Order Eikonals in Hamiltonian Representation . . . . . C. Canonical Expansion of c4 . . . . . . . . . . . . . . . D. Canonical Expansion of c6 . . . . . . . . . . . . E. Canonical Expansion of c8 . . . . . . . . . . . . . . . F. Canonical Expansion of . . . . . . . . . . . . . G. Discussion . . . . . . . . . . . . . . . . . . . IV. Intrinsic and Combined Aberrations Up to the Ninth-Order Approximation A. Intrinsic Aberrations Up to the Ninth-Order Approximation . . . . B. Combined Aberrations Up to the Ninth-Order Approximation . . . V. Conclusions . . . . . . . . . . . . . . . . . . References . . . . . . . . . . . . . . . . . . . .
.
. . . . . . . . . . . . . . . . .
. . .
.
.
. .
.
. . . .
.
. . . . . . . . . . . . .
. . . . . . . . . . . . .
1 4
4 8 13 13 13 15 16 17 19
22 26 28 28 30 33 35
I. INTRODUCTION In the history of electron optics, theory of aberration played a great role in the development of charged-particle instruments and devices. The theoretical calculation of electron optical aberrations was first published in the early 1930s, by Glaser (1933a, b) using the eikonal method and by Scherzer (1936a, b) using the trajectory method. It was Glaser who established the eikonal aberration theory in his famous book (Glaser, 1952). In the literature, the general theory for calculating higher-order geometrical and chromatic aberrations in Lagrangian representation was further established by Sturrock (1955) and Plies and Typke (1978). As far back as 1957, Wu investigated the fifth-order geometrical aberrations for a rotationally symmetric electromagnetic focusing system, but only derived spherical and distortion aberration coefficients (Wu, 1957). The author (Ximen, 1983, 1
Copyright 0 1995 by Academic Press, Inc. All rights of reproduction in any form reserved.
ISBN 0-12-014733-5
2
JIYE XIMEN
1986) employed the eikonal method to investigate a rotationally symmetric imaging system, an electromagnetic deflection system, an electromagnetic multipole system, and an ion optical system. Later, Li and Ni (1988) proposed the relativistic fifth-order geometric aberration theory. P. W. Hawkes and E. Kasper (Hawkes and Kasper, 1989) have published their encyclopedic book, where the principles of electron and ion optics were presented and the original articles about aberration theory were also reviewed. It is to be noted that most of the studies on the eikonal aberration theory are restricted to the Lagrangian representation using the position- and slope-configuration space. Although Glaser (1952), Sturrock (1953, Hawkes and Kasper (1989) have touched the electron optical Hamiltonian function, the Hamiltonian representation using the position- and momentum-phase space has not been explored to analyze electron optical aberration. It is in the author’s previous papers (Ximen, 1990a, b, 1991a) that electron trajectories have been described in the Hamiltonian representation using generalized position- and momentum-representations, and also electron optical aberrations have been derived by Hamiltonian function up to the sixth-order approximation. Correspondingly, the correction of third-order geometric aberrations has been achieved in both theoretical and practical aspects with the development of high-resolution electron optical instruments in the fields of microelectronic techniques and electron microscopy. In order to further improve the performances of electron optical instruments, it is necessary to investigate higher-order geometric aberrations. Moreover, the technology of microfabrication in the semiconductor industry has been developing increasingly; the thorough study of higher-order chromatic aberrations of electron lenses also has become of great interest. However, in the conventional electron optical higher-order aberration theory, the higher-order aberrations are expanded into very complicated analytical series. Therefore, in the past decade, the most difficult and complicated task of correctly calculating higher- or ultrahigh-order and arbitrary-order aberrations in charged-particle optical systems and accelerators has drawn much attention, and various powerful theoretical tools have been employed in literature. In Dragt’s paper (1981), it was first shown that the Lie algebraic method is a powerful tool for describing nonlinear effects and computing nonlinear orbits in charged-particle optical systems and accelerators. Later, Dragt and Forest (1986) employed Lie algebraic tools for characterizing chargedparticle optical systems and computing some aberrations in electron microscopes up to the fifth order. In Xie and Liu’s paper (1990), an arbitrary-order, approximately
CANONICAL ABERRATION THEORY IN ELECTRON OPTICS
3
analytical solution of a universal nonlinear dynamic system equation is approached by the successive-approximation method. The calculation expression for the arbitrary-order, approximately analytical solution of a universal nonlinear dynamic system equation has a simple and clear recurrent property, and hence it is easy to perform the computation procedure by computer software. In Berz’s paper (1990), the differential algebraic approach for designing and analyzing charged-particle optical systems and accelerators was presented. It allows the computation of transfer maps to arbitrary orders for arbitrary arrangements of electromagnetic fields. The resulting maps can be cast into different forms. For example, in a Hamiltonian system, they can be used to determine the generating function or eikonal representation. Also, various factored Lie operator representations can be directly obtained. Obviously, Lie algebraic theory (Dragt and Forest, 1986), the approximately analytical method (Xie and Liu, 1990), the differential algebraic approach (Berz, 1990), and the canonical theory in the present article have physical equivalences and mutual complements. Based on previous papers (Ximen, 1990a, b; 1991a), the author will now apply the canonical aberration theory to calculate up to the ninthorder canonical aberrations in rotationally symmetrical electromagnetic systems. The remaining sections of this paper are devoted to canonical aberration theory in up to ultrahigh-order approximation. Section I1 derives powerseries expansions of different-order Hamiltonian functions for up to the 10th-order approximation in a rotationally symmetrical electromagnetic system. The computational procedure is greatly facilitated by using the important recursive formulas for straightforwardly deriving higher-order Hamiltonian functions by successive approximation. By using Hamiltonian representation, Section 111 derives eikonal function and its power-series expansions for up to the tenth-order approximation in a rotationally symmetric electromagnetic system. The coefficients in canonical powerseries expansions of different-order eikonal functions can be divided into isotropic coefficients and anisotropic coefficients, respectively, which possess rigorous recursive structure. Having derived all different-order eikonal functions in Section 111, we will move to Section IV, and will manage to calculate both intrinsical and combined aberrations up to the ninth-order approximation by means of a gradient operation on corresponding eikonal function. Finally, in Section V we will draw important theoretical conclusions about canonical aberration theory in up to the ultrahigh-order approximation, and will discuss its possible applications and future developments.
4
JIYE XIMEN
11. THEHAMILTONIAN FUNCTION AND ITSPOWER-SERIES EXPANSION A . Fundamental Formulas Lagrange mechanics (Arnold, 1978) describes a particle's motion in a mechanical system by means of an n-dimensional coordinate space. Let i j = ( q l , ...,q,,) be generalized coordinates in a mechanical system having n degrees of freedom, and 4' = ( q l , ..., 4,) be the generalized velocities. The Euler equation is the electron's trajectory equation:
("'aG)
- -
dt
aL-o. aij
Since 4' is a vector in the n-dimensional coordinate space, so that electron's trajectory equation consists of n second-order equations. Hamiltonian mechanics (Arnold, 1978) describes a particle's trajectory by means of a 2n-dimensional phase space (coordinate-momentum space). In fact, the Hamiltonian equation, consisting of 2n first-order equations, is the electron's trajectory equation: 1.
p
=
--aH aij '
aH
q=-
a3
I
where the Hamiltonian H associated with a Lagrangian L is defined by the Legendre transformation: H(g,
d, t ) = - 4; - L(ij, ij, t ) .
It can be shown that the system of Lagrange's equations (n second-order equations) is equivalent to the system of Hamiltonian equations (2n firstorder equations). The Lagrangian theory allows us to solve a series of important mechanical problems, including problems in charged-particle optics. However, Lagrangian mechanics is contained in Hamiltonian mechanics as a special case. The Hamiltonian theory allows us to solve a series of more general mechanical problems, and it has greater value for seeking approximate solutions to perturbation theory, for understanding the general characteristics of motion in a complicated mechanical system, and finally for developing canonical aberration theory and charged-particle beam optics. It is to be noted that, in the usual Lagrangian-Hamiltonian formulations, the time t plays the role of an independent variable, and all conjugated generalized variables, i.e., the canonical variables qi and p i (i = 1, . .., n), are dependent variables, which are viewed as functions of the independent
CANONICAL ABERRATION THEORY IN ELECTRON OPTICS
5
variable t. In some cases of interest (Dragt, 1981), it is more convenient to take some coordinate to be the independent variable rather than the time. In electron optics, we always choose the z-coordinate along a rectilinear or curvilinear optical axis to be the independent variable for describing the Lagrangian and Hamiltonian functions. Based on the theory of classic mechanics (Arnold, 1978; Goldstein, 1980), the canonical aberration theory has been discussed in the author's previous papers (Ximen, 1990a, b; 1991a). In the present section, a rotationally symmetrical electron optical system will be discussed. Let the transverse component of a position vector and a momentum vector be r' = (x, y ) and 13 = (px,p,,),respectively. The Hamiltonian function H is defined as follows (Glaser, 1952; Sturrock, 1955; Hawkes and Kasper, 1989; Ximen, 1990a, b, 1991a):
where 7- = (-y,x), r = (x2 + y2)1/2,and 4 = (-e/2rn) > 0. In order to establish the canonical aberration theory in up to the 10thorder approximation, the electrostatic potential 4 and magnetic vector potential A will be expanded into power series (Glaser, 1952; Ximen, 1983, 1986) as follows:
4
=
~ ( z -) u 2 ~ " ( z ) -( 7) i
+ agV'8'(r'.3 A
- =
r
4
+ U , V ( ~ ) ( ~ 7)2 '. - u , v ( @ ( ~i)3 . - U I O V ( ' 0 ) ( r ' *7)s +
+B0(z)- b2B6(z)(7.7)
+ b 8 B ~ 8 ) ( 7 * + ...,
- . a ,
(2)
+ b4B04)(7-r')2 - bsBf)(r'.7)3 (3)
where the magnetic vector potential only has an azimuthal component A ; V ( z )and Bo(z)are axial distributions of electrostatic potential and magnetic induction, respectively; and (7- 7) = x2 + y 2 .
'lo
= 36 x
64 x 64 x 100 '
1 - 24 x 120 x 512'
b -
6
JlYE XlMEN
We can expand Hamiltonian Function H into a power series: H=Ho+H*+H4+H,+Hg+H,o..., where H ,
=
(6)
- V ’ / ’ . In Eq. (l), we obtain
=
v[l - P2 + P4
+ pg - PI0 + ’’‘1. (7) (p’ . a = p: + p i , and the quasi-vector product - P6
where the scalar product (7x p3 = xp,, - ypx. Substituting Eqs. (2) and (3) into Eq. (7), we can classify different-order terms and derived the following expressions for P2,,: Second-order terms:
Fourth-order terms:
Eighth-order terms:
Tent h-order terms:
CANONICAL ABERRATION THEORY IN ELECTRON OPTICS
7
Now we derive the recursive formulas for different-order Hamiltonian functions. Using Eqs. (l), (6), and (7), we can obtain the following equations:
The left-hand side of Eq. (14) can be expanded in up to the 10th-order approximation, and then Eq. (14) can be rewritten:
Finally, we obtain the recursive formulas for different-order Hamiltonian functions.
H4 =
2
Ha =
m, H22
-p"fl +
-p8- f l +2
Equations (15) are the most important recursive formulas for straightforward derivation of up to 10th-order Hamiltonian functions by successive approximation, where polynomials P2, (n = 1,2,3,4,5) are given by Eqs. (8)-(12)*
8
JIYE XIMEN
B. Po wer-Series Expansions for Different-Order Harniltonian Functions Substituting polynomials Pzn [n = 1,2, 3,4,5; see Eqs. (8)-(12)] into Eq. ( 1 9 , we derive all power series expansions for different-order Hamiltonian functions. 1. Second- and fourth-order Hamiltonian functions;
H2 = M ( 7 . 7 ) + N -(’”+ V
2Q(7xp3,
where L, M , N, P,Q, K are electromagnetic field distribution functions for describing third-order aberrations (Glaser, 1952; Ximen 1983, 1986): 1
- (V”+ ?,‘Bi)’ - V(4)- 4vB0&
1
N = -21V
M=-(V”+&), 8 V ‘I2 - (V“
+ &)
-
1/2
,
lr;
2. Sixth-order Hamiltonian function:
Hs = L3(7.7)3 + L2(7.7)2-
(’.a + L,(7-7) (p’’a2 (p”a3 + L O T V
v2
where L 3 ,L 2 ,L 1 ,L o ,M I ,M 2 ,M 3 , N1,N 2 ,N3 are corresponding functions for characterizing fifth-order aberrations (Ximen, 1990b, 1991a). It is to be
CANONICAL ABERRATION THEORY IN ELECTRON OPTICS
noted that these functions have been modified and corrected as follows:
9
10
JlYE XlMEN
1 =
J4
v (8)
qBi2)Bi4) qB~B&@ +-+L3M [ - 2 x 36 x (64)2 6 x (32)2- 2 x (12)2x 128 32 L2
LN
1 J2 =
7
[x+
-
1
M2
+ L I M + L2N
,
5 V1/2 = 5 --
J
O-
64N 9
128
q1l2Bi6) = yl/z [-ti x 24 x 128 1
JL3
1 LQ JL2 =yl/~
IT +
MP
NP
MQ
1
JLI =
1 1
JMI
=
+ M2M+ M,N+
+ M3M v1/2 [T + 2 + M 3 N + 2QLO]
JL.O =
KL
P2
1
LP ++ M I M + 2QL3 4
1 1
2QL2 ,
+ M2N + 2QL1 =
1
v'/2
1
yl/z [T + 7 + N I M + 2QM1
+ QP + N2M + N1N + 2QM2
1
+ N 2 N + 2QM3
,
+ M3N1
7
1
7
CANONICAL ABERRATION THEORY IN ELECTRON OPTICS
11
12
JIYE XlMEN
,1 / 2 ~ $ 8 )
IJ4 = - 24 x 120 x 512 133 =
+ 2J4Q + JL3M+ L,P + - '
$k 3 Q + JL3N + JL2M + L 3 Q + L 2 P + 2
+ M 3 P + "4 + 3 3 ,
CANONICAL ABERRATION THEORY IN ELECTRON OPTICS
13
C. Discussion In the canonical theory (Ximen, 1990a, b, 1991a), the power series expressions for Hamiltonian functions are remarkably symmetric with respect to position variables (x, y ) and momentum variables ( p x,p,,). In Eqs. (17), (19), (21), and (23), the power series of W(?,p’,z)only contains the scalar or quasi-vector products (7- 7), (7x p3, which are rotationally invariant quantities. Therefore, employing the Hamiltonian function enables us to avoid introducing rotating coordinate systems and to save performing laborious calculations. The most important result in this section is that we have first derived recursive formulas [see Eq. (15)] for straightforward derivation of up to 10th-order Hamiltonian functions by successive approximation. It is to be noted, only in the fourth-order approximation, that the fourthorder variational function F4 is similar to the fourth-order Hamiltonian H4 (apart from a minus sign). However, in the higher-order approximation, say sixth-, eighth, and 10th-order, F2, is somewhat different from H,, (2n = 6 , 8, 10). Because the power-series expansions for the variational function F and the Hamiltonian function H are based on different principles (Ximen, 1991a), the variational function F is expanded with respect to r‘ and 7‘ in Lagrangian representation, while the Hamiltonian function H is expanded with respect to 7 and p’ in Hamiltonian representation.
@-a,
111. THEEIKONAL FUNCTION AND ITS POWER-SERIES EXPANSION
A . The First-Order Gaussian Trajectory in Hamiltonian Representation According to classic mechanics (Arnold, 1978; Goldstein, 1980), the canonical equation is an electron’s trajectory equation:
r’ I
aH ap’ ’
=-
aH p” = --
a7
a
14
JIYE XIMEN
In the first-order approximation, substituting H2 into Eq. (25), we obtain the first-order Gaussian canonical equations:
Substituting H2 into Eq. (25)’ we get Gaussian trajectory equations:
3;
=
-2iw + 2Qp’-,
3-
= (-Py, P J .
(28)
It is noted that 2Q describes the rotation of the Gaussian electron trajectory. We introduce a rotating rectangular coordinate system indicated by uppercase symbols (Glaser, 1952; Ximen, 1983, 1986):
7 = R’ exp(2i dl
=
Qdz),
p’
=
Fexp (2i
5:
Qdz),
51 = - w g .
5/n,
Therefore, in the rotating rectangular coordinate system, the Gaussian electron trajectory is given by
( f l F ) I + 2MR’ = 0.
(29)
It is proved that in the rotating rectangular coordinate system, (7- 7), (p’ 3,(7 x 3 are rotationally invariant quantities:
-
(7-7) =
(E-E),
=
(F-5),
(7x3
=
(2xP).
By solving Eq. (29), the first-order trajectory and momentum can be expressed as follows: -.
rg = rpra + repa, -t
-t
Zg
=
V
1/2
I’
rpra + V
1/2
I
-t
rap,,
(30)
where rcr,rp are well-known particular solutions to the Gaussian trajectory equation, which satisfy following initial conditions at za : r,(za)
=
0,
rk(za) = V:1’2,
r&,) = 1 ,
rL(za) = 0.
(31)
It is to be stressed that the r,(z) just defined is different from the ordinary one; thus, the Wronsky determinant is given by V1/z(rkrB- r, rb) = 1 .
(32)
Since the first-order quantities ig, Gg can be tacitly understood, the subscript g will be omitted hereafter. From Eqs. (30) and (32), all the scalar
CANONICAL ABERRATION THEORY IN ELECTRON OPTICS
15
and quasi-vector products in the power-series expansions of different-order Hamiltonian functions can be expressed as follows:
+ 2(ia - Ja)rarb+ @a
(7-7) = qa*
(ix p3
=
- Ja)r,”= a + B + C!,
(ia x j&).
B. Different-OrderEikonals in Hamiltonian Representation Based on canonical theory in electron optics (Ximen, 1991a), the canonical aberrations in an electromagnetic round lens can be expressed by following eikonal functions (i.e., functions of optical length):
By utilizing Eq. (33), we can transform the products (7. it), @’.a/V, (ia - J a ) , (i xa Ja), and then obtain the into (ia ia), @’a (ix coefficients in power-series expansions of elm:
-
* = (a‘ + 63’ + C ! ’ ) j =
c
X+”+”=j
j ! (a’)X(a’)”(C!’)V,
l!p!v!
However, all even-order power of the quasi-vector product (7 x a> can be rewritten by using the following identities successively:
a2= (iax (7 x a4= (ia x (ix
=
(ia *
=
(ia* ia)Ga * Ja)
-
(ia
+ (ia- Ja14
-
2(ia ia)(Ja Ja)(ia Ja)2.
Ja)4 *
Ja)2
-
-
By this transforming procedure, the number of independent terms in e2,, can be reduced into a minimum.
16
JIYE XIMEN
C. Canonical Expansion of c4 Substituting Eq. (33) into Eq. (17) and performing integration in the interval [z,, z] as defined in Eq. (34), we obtain the canonical expansion
There are nine coefficients of E ~ qjk, , and cijk*, altogether, that are described by the following universal integral functions:
CANONICAL ABERRATION THEORY IN ELECTRON OPTICS
17
In Eq. (38), 2 = (K,,K ~ K,~ are ) the dummy arguments, which are constituted by different combinations of r, ,r, ,r; , rb :
D. Canonical Expansion of
&6
Substituting Eq. (33) into Eq. (19) and performing integration in the interval [z,, z] as defined in Eq. (34), we obtain the canonical expansion of E ~
where the coefficients of
& 6 , &j,k,
and
&i,k*
are given as follows:
.
18
JIYE XIMEN
In Eq. (42), ( K , ,K ~ I C, ;~of K ~EK ~, ~&i&, are dummy which There are K'16=coefficients , ) the and E ~ ~arguments, altogether, * , that are are rol, ra , r; , rb : constituted by different combinations of described by the following universal integral functions. .
-
-
In Eq. (42), K' = (K,,K ~ K~, ;K ~ K , ~ are ) the dummy arguments, which are constituted by different combinations of rol,ra ,r; , r; :
CANONICAL ABERRATION THEORY IN ELECTRON OPTICS
19
E. Canonical Expansion of &E
Substituting Eq. (33) into Eq. (21) and performing integration in the interval [ z , , z] as defined in Eq. (34), we obtain the canonical expansion of Es:
where the coefficients of e8, cijk, and eijk*, are given as follows:
20
JIYE XIMEN
J za
CANONlCAL ABERRATION THEORY IN ELECTRON OPTICS
21
There are 25 coefficients of E ~ &, i j k , and &ok*, altogether, that are described by the following universal integral functions:
=
[(JL3K1
+3
K
2JL2
2
+ - K33 JL1
)
K4
In Eq. (46), I? = (K,,K ~ K , ~ K, ~ u5, ; K ~ )are the dummy arguments, which are constituted by different combinations of r,, rs, r:, rh:
22
JIYE XIMEN
F. Canonical Expansion of
Substituting Eq. (33) into Eq. (23) and performing integration in the interval [za,z] as defined in Eq, (34), we obtain the canonical expansion of & , 0 :
CANONICAL ABERRATION THEORY IN ELECTRON OPTICS
23
NEWIX aAIf
PZ
There are altogether 36 coefficients of described by the following integral functions:
&ok,
and
&;jk*,
which are
CANONICAL ABERRATION THEORY IN ELECTRON OPTICS
25
J za
In Eq. (50), K' = ( K , ,K ~ I C, ~ K, ~ x5, ;K 6 , K ~ are ) the dummy arguments, which are constituted by different combinations of r, ,r p ,r:, ri :
26
JlYE XIMEN
G . Discussion
We have derived the canonical power-series expansions of the differentorder eikonal e2,, (n = 2 , 3 , 4 , 5 ) with respect to scalar and quasi-vector products (7, * 7,), (5, Za), (7, (7, x Za), which can be universally
- a,),
CANONICAL ABERRATION THEORY IN ELECTRON OPTICS
27
expressed as follows: &2n
=
&ijk(?a ' ? a ) ' ( Z a
Za)J(Ta ' Z a ) k
i+j+k = n
&ijk*(Ta * 7 a I i @ a ' Z a ) j ( ? a
+
' Ja)"
L+j+En-l
I
(?a
x Za)*
(52)
The coefficients &ijk and &ijk* in canonical power-series expansions of the different-order eikonal E,,, can be divided into two groups: The isotropic coefficients &ijk correspond to the homogeneous products of (Ta * Ta), (Za Za), (Ta Fa),without (Ta x Fa),as shown in the first line of Eq. (52), while the anisotropic coefficients &ijk* correspond to the homogeneous (Ta Za), with (Ta x Za), as shown in the products of (Ta * Ta). (Ca second line of Eq. (52). For the eikonal c Z n ,the number of isotropic and anisotropic coefficients &,jk and &ijk* are denoted by Nisoand Naniso:
-
-
Niso= (n + l)(n + 2)/2,
Naniso = n(n
Therefore, the total number of coefficients N,,,
=
+
&ijk
Niso Naniso = (n
and
+ 1)/2.
&ijk*
+ 1)2.
(53)
is equal to Ntot: (54)
In Section 111, it has been proved that the coefficients &ijk and &ijk* possess rigorous recursive structure: For the eikonal c Z n ,its anisotropic coefficients &ijk* originate from the isotropic coefficients &i,k in the lower-order eikonal &2n-2 ; meanwhile its isotropic coefficients &ijk become the anisotropic coefficients &ijk* in the higher-order eikonal E * , , + ~ . However, it is to be emphasized that for different-order eikonals, i.e., E ~ ~ eZn - ~, E,,,+,, , their isotropic and anisotropic coefficients &ijk and &ijk* are expressed by universal integral functions containing different fielddistribution functions in their integrands, which are indicated by a symbol of 1) . ..I\. Eikonal E ~ Six : coefficients &ijk (i + j + k I2) and three coefficients &ijk* (i + j + k II) are expressed by definite integrals FLMN(IIL, M , Nil) and FPQ(IIP,QII) with variable upper limits, see Eqs. (38)-(39); their integrands consist of field-distribution functions L , M , N ; P , Q, K , see Eq. (18). Eikonal E~ : Ten coefficients &ijk (i + j + k I3) and six coefficients &ijk* (i + j + k I2) are expressed by definite integrals F'(IIL,, L,, L , ,Loll), F,(IIM,, M 2 ,M311) and FN(IINI,N,II) with variable upper limits, see Eqs. (42)-(43); their integrands consist of field-distribution functions L,, L 2 ,4 ,L o ;M I ,M 2 ,4; N , , N 2 ,N 3 , see Eq. (20). Eikonale, : Fifteen coefficients &ijk (i + j + k I4) and 10 coefficients &ijk* (i + j + k 5 3) are expressed by definite integrals FJ(IIJ4,J 3 , J2,J , , .loll),
28
JIYE XIMEN
ABERRATIONS UP TO THE IV. INTRINSICAND COMBINED NINTH-ORDER APPROXIMATION
According to canonical aberration theory, knowing eikonal functions , enables ~ us to calculate both intrinsical and combined aberrations up to the ninth-order approximation by means of a gradient operation on the corresponding eikonal function. e2, ...,E
A . Intrinsic Aberrations Up to the Ninth-Order Approximation Having derived eikonal functions e2, ...,eI0in a rotationally symmetrical electromagnetic system, we can calculate the intrinsic aberrations up to the ninth-order approximation with great simplicity by means of a gradient operation on the corresponding eikonal function. In the author’s previous paper (Ximen, 1990b, 1991a), the intrinsical position aberrations and momentum aberrations in the third- and fifth-order approximation have been given as follows:
CANONICAL ABERRATION THEORY IN ELECTRON OPTICS
29
Similarly, the intrinsic position aberrations and momentum aberrations in the seventh- and ninth-order approximations can be given as follows:
Therefore, the most important advantage of canonical aberration theory is that the momentum aberrations are much simpler than the slope aberrations in the same-order approximation (Ximen, 1990b, 1991a). From Eqs. (55)-(SS), it is to be noted that, for the 2n-order eikonal cZn, its isotropic and anisotropic coefficients &ijk and &ijk* essentially coincide with (2n - 1)order isotropic and anisotropic aberration coefficients, respectively. For more general theoretical studies (Ximen, 1990b, 1991a), the canonical position and momentum vectors iandp’ will be combined into the four-dimensional vector X; meanwhile, the differential vector 7‘ and 3’ will be combined into XI: X
X=
Y
(59)
PX PY
Similarly, the canonical position and momentum aberrations A 7 and A$ will be combined into the four-dimensional vector AX:
30
JIYE XIMEN
Thus, the Gaussian trajectory, i.e., Eq. (30), can be transformed into the four-dimensional representation: Xg = GX,;
Introducing an antisymmetric fundamental matrix J (Goldstein, 1980),
we can transfer the Hamiltonian equation, i.e., Eq. (25), into the fourdimensional representation:
X' = J -aH
ax
Up to the ninth-order approximation, substituting H,, H4,H , , H 8 , H l o into Eq. (63), we obtain the canonical position and momentum aberrations in the four-dimensional representation: (AX2n-l)intr =
(H2,Jdz
=
a
GJ-c2,,
(n
=
2 , 3 , 4 , 5 ) . (64)
axa
Equation (64) is the important intrinsic aberration expression, which is general in nature and compact in form. B. Combined Aberrations Up to the Ninth-Order Approximation
The interaction between the lower-order canonical aberration (A?, Ap3 and the proper-order gradient of the Hamiltonian ((a/ai)H,, , (a/ap3H2,)
CANONICAL ABERRATION THEORY IN ELECTRON OPTICS
31
results in the higher-order combined aberrations. In higher-order canonical aberration theory, the combined higher-order aberrations are the most complicated problem. In the previous paper (Ximen, 1990b, 1991a), we succeed in expressing the fifth-order aberrations (AX5)comb by the gradient operation on a well-defined combined eikonal E ~ , . In this section, we will derive combined aberrations up to the ninth-order approximation. An appropriate fifth-order combined aberration expression is given as follows:
We can transfer Eq. (65) into the four-dimensional representation:
a
(AXs)comb = G J -
(66)
Utilizing the following operator identity (Ximen, 1991a),
a
a
(AT3.$+
a&, =
a
ap, ar, -
a&,
a
-.-), ai, ajj,
we obtain the fifth-order combined aberration expression: (AX4comb
=
GJ-
H4 dz.
(67)
It is to be noted that in Eq. (67), the gradient Wax, (i.e., a/a?, and a/@,) must not act on the lower-order aberrations, i.e., ae4/d7, and ac4/afia (Ximen, 1990b, 1991a). According to classical mechanics (Arnold, 1978; Goldstein, 1980), we define the following Poisson bracket operators:
(68)
32
JlYE XIMEN
where
(a&,,,/aia, a&,,/ajja) v = (a/aTa, a/aza).
VE,,, =
( n = 2 , 3 , 4 , ...),
It is evident that the order of magnitude for the Poisson-bracket operator is respectively given by o([v&4,v1) = o(2), @[v&6 v1) =
0(4), o([v&, vl)
= 0(6),
...
(69) Based on all the results in Eqs. (65)-(69), we obtain different-order combined aberrations by means of Poisson bracket operators: 9
It is to be stressed again that, in Eq. (70), the outside-integral gradient must not act onto the lower-order operators Waxa (i.e., a/aia and intrinsic aberrations VE,,, (i.e., AT2,-, and’Ajj2n-l),but exclusively act onto the lower-order gradient of VH,, . We will give further explanation of the operators in Eq. (70). Ve4 : Having the canonical expansion of e4, i.e., Eq. (36), we can perform the gradient operation on c4 term by term, and then obtain six isotropic coefficients &i,k (i + j + k 4 2) and three anisotropic coefficents eijk* (i + j + k Il), which are expressed by definite integrals FLMN(IIL, M , Nil) and FpQ(IIP,Qb, see Eqs. (38)-(39). VH4: Knowing the canonical expansion of H4, which is essentially equal to the derivate of c4, we can perform the gradient operation on H4term by term, and then obtain six isotropic coefficients &bk (i + j + k 5 2) and three anisotropic coeffients &bk* (i + j + k I1). These coefficients are expressed by field-distribution functions L , M , N; P , Q, K , see Eq. (18), and arguments K , , K , , K ’ , which are different combinations of r,, ra, r:, rk. V&6: Having the canonical expansion of E6, i.e., Eq. (40), we can perform the gradient operation on &6 term by term, and then obtain 10 isotropic coefficients E~~~ ( i + j + k I 3) and six anisotropic coefficients &ijk*
33
CANONICAL ABERRATION THEORY IN ELECTRON OPTICS
(i + j +k I2), which are expressed by definite integrals F L ( l l L 3 , L 2 ,L , ,Loll), F M ( I I M I M2, M 3 I I ) and FN(IIN,,N 2 b , see Eqs. (42)-(43). VH6: Knowing the canonical expansion of H 6 , which is essentially equal to the derivate of & 6 , we can perform the gradient operation on H6 term by term, and then obtain 10 isotropic coefficients &bk (i + j + k I3) and six anisotropic coefficients &&* (i + j + k I2). The coefficients are expressed by field-distribution functions L 3 ,L 2 ,L , ,L o ; MI,M 2 ,M3; N , ,N , , N 3 , see Eq. (20), and arguments K ~ K , , K ~ K , ~ K, ~ which , are different combinations of r, , ro, rk , r; . V&g: Having the canonical expansion of & g , i.e., Eq. (44), we can perform the gradient operation on &g term by term, and then obtain 15 isotropic coefficients &ijk (i + j + k I 4) and 10 anisotropic coefficients &ijk* (i + j + k I3), which are expressed by definite integrals FJ(11J4, J3 J2 JI J O / o , FJL(IIJL~ JL2 JLI JLOll>, FJM(/lJMI JMZ JM3II) and FJN(IIJNi9 J,vzII), see Eqs. ( 4 6 ~ 4 7 ) . VHg : Knowing the canonical expansion of Hg , which is essentially equal to the derivate of & g , we can perform the gradient operation on H g term by term, and then obtain 15 isotropic coefficients &bk (i + j + k I4) and 10 anisotropic coefficients &bk* (i + j + k I3). These coefficients are expressed by field distribution functions J4,J 3 , J, ,J , ,J,; J ~ 3 9Jl.29 J L I , J L O ; J M I , J M 2 , J ~ 3 ; J N , , J N 2 , J N 3 ; s e e E q . (221,andarguments K , , K 2 , K ~ K , ~ K , ~ K ,6 , which are different combinations of r,, ro, r:, r;. V E , , : Having the canonical expansion of cI0, i.e., Eq. (48), we can perform the gradient operation on c10 term by term, and then obtain 21 isotropic coefficients &ijk (i + j + k 4 5 ) and 15 anisotropic coefficients &i,k* (i + j + k I4), which are expressed by definite integrals 9
3
t
Y
Y
9
FI( 115' I4 I3 I2 9 I 1 9 I0 11 1, FIJ( llrJ4 I . 3 ,I . 2 I IJ1 I IJOIl), FIL( llr,53 ,IL2 ,IL 1 ILOll ) I F ~ ~ ( I I II M~ 2i 9 I M ~ I Oand F I N ( ~ I I NI ~~z l l ) ,See Eqs. (50)-(51). V H , , : Knowing the canonical expansion of H l o , which is essentially equal to the derivate of E , , , we can perform the gradient operation on h!,, term by term, and then obtain 21 isotropic coefficients &bk (i + j + k 5 5 ) and 15 anisotropic coefficients &bk* (i + j + k 5 4). These coefficients are expressed by field-distribution functions I s , 14,I , , I , , I , ,I,; '54, 'J3 ' 5 2 9 IJI IJO; 'L3 I L 2 9 ILO; I M I I M 2 , I M 3 ; I N 1 I N 2 9 I N 3 ; see Eq. (24), and arguments K ~ K, ~ K , ~ K , ~ K , ~ K ,6 , K , , which are different combinations of r a , rB,r: , r;. 9
9
9
I
9
3
9
9
9
9
3
V. CONCLUSIONS 1. We have derived not only the power-series for Hamiltonian functions up to the tenth-order approximation, but also the important recursive formulas
34
JIYE XlMEN
for straightforward derivation of higher-order Hamiltonian functions by successive approximation. It has been shown that only the fourth-order variational function F4 is similar t o the fourth-order Hamiltonian H4 (apart from a minus sign). However, in the higher-order approximation-say, sixth-, eighth-, and 10th-order-F,, is somewhat different from H,, (2n = 6, 8, lo), because the power-series expansions for the variational function F and the Hamiltonian function H are based on different principles (Ximen, 1990b, 1991a). 2. We have derived the canonical power-series expressions for eikonal functions up to the 10th-order approximation, i.e., E,, (2n = 4, 6, 8, 10). The coefficients cijk and cijk,, in canonical power-series expansions of different-order eikonal c2,, (2n = 4 , 6 , 8 , 10) can be divided into two groups: (n + l)(n + 2)/2 isotropic coefficients cijk, and n(n + 1)/2 anisotropic coefficients cijk*, respectively. It has been proved that the coefficients &i,k and cijk* possess rigorous recursive structure: For different-order eikonals, i.e., c2,-,, c2, ,E , , + ~ , their isotropic and anisotropic coefficients cijk and cijk* exchange in turn. These coefficients are definite integrals expressed by universal integral functions containing different field-distribution functions in their integrands. 3. According to canonical aberration theory, knowing all different-order eikonal functions enables us to calculate both intrinsical and combined aberrations up to the ninth-order approximation by means of a gradient operation on the corresponding-order eikonal function. In a rotationally symmetrical electromagnetic system, having derived eikonal functions E , , ...,c I O ,we can calculate the intrinsic aberrations (A?,,- Jintr and (n = 2 , 3 , 4 , 5 ) up to the ninth-order approximation by means of a gradient operation on the corresponding-order eikonal function. It is to be noted that, for a 2n-order eikonal E,, , its isotropic and anisotropic coefficients &;jk and cijk* essentially coincide with (2n - 1)order isotropic and anisotropic aberration coefficients, respectively. In higher-order canonical aberration theory, to calculate the combined higher-order aberrations is the most complicated problem. In the present study, we have derived combined aberrations up to the ninth-order approximation, and have successfully expressed the combined aberrations (AT2n-l)comband (A$2n-l)comb (n = 2,3,4,5) by successively operating the Poisson bracket operator onto the corresponding-order Hamiltonian function H,, and eikonal function E,, . 4. The canonical aberration theory using Hamiltonian generalized position and momentum representation has several main advantages. First, the momentum aberrations are much simpler than the same-order slope aberrations, so that the calculation of higher-order aberrations is greatly
CANONICAL ABERRATION THEORY IN ELECTRON OPTlCS
35
facilitated. Secondly, the canonical aberration expressions enable us t o calculate position and momentum aberrations, including axial and off-axial aberrations, at any observation plane in an electromagnetic system with rectilinear or curvilinear axes. 5 . In principle, the canonical aberration theory can be utilized to calculate higher than ninth-order canonical aberrations, including intrinsic and combined position and momentum aberrations, in rotationally symmetrical electron optical systems. It is evident that the calculation of ultrahigh-order canonical aberrations is very complicated. However, the theoretical features of the canonical aberration theory, i.e., its conciseness and simplicity, symmetrical property and recursive structure, give us the attractive possibility of calculating ultrahigh-order canonical aberrations by means of computer software, e.g., Reduce and Mathematica. 6. Canonical aberration theory has been applied to calculate higherorder aberrations in different electron optical systems, including numerical computation of third- and fifth-order geometrical aberrations in rotationally symmetrical electrostatic lenses (Ximen, 1990a, b, 1991a; Liu and Ximen, 1993), numerical computation of third-order chromatic aberrations in rotationally symmetrical electrostatic lenses (Ximen, 1991b; Liu and Ximen, 1992), and the calculation of different canonical aberrations from lower order to higher order in 2N-pole electromagnetic multiples (Ximen, 1990~;Shao and Ximen, 1992). ACKNOWLEDGMENTS
This work was supported by the Doctoral Program Foundation of the Institute of Higher Education of China. The author also wishes to thank Prof. Zhixiong Liu of Peking University for participating in numerical computations of higher-order electron optical aberrations. REFERENCES Arnold, V. I . (1978). “Mathematical Method of Classical Mechanics.” Springer-Verlag, New York/Berlin. Berz, M. (1990). Nucl. Instrum. Methods. Phys. Res. A298, 426. Dragt, A . J . (1981). Lectures on nonlinear orbit dynamics. I n “Physics of High Energy Particle Accelerators” (R. A. Carrigan, F. R . Huson, and M. Month, Eds.). Amer. Inst. of Physics, New York. Dragt, A . J . , and Forest, E. (1986). Lie algebraic theory of charged particle optics and electron microscopes. I n “Advances in Electronics and Electron Physics” (P. W. Hawkes, Ed.), Vol. 67. Academic Press, New York.
36
JIYE XIMEN
Glaser, W. (1933a). Z. Physik 81, 647. Glaser, W. (1933b). Z. Physik 83, 104. Glaser, W. (1952). “Grundlagen der Elektronenoptik.” Springer, Wien Goldstein, H. (1980). “Classical Mechanics” (2nd ed.). Addison-Wesley, Reading, Mass. Hawkes, P. W., and Kasper, E. (1989) “Principles of Electron Optics.” Academic Press, New York/London. Li, Y., and Ni, W. (1988). Optik 78, 45. Liu, Z., and Ximen, J. (1992). J. Appl. Phys. 72, 28. Liu, Z., and Ximen, J. (1993). J. Appl. Phys. 74, 5946. Plies, E., and Typke, D. (1978). Z. Nuturforsch. 33a, 1361. Schemer, 0. (1936a). Z. Physik 101, 23. Schemer, 0. (1936b). Z. Physik 101, 593. Shao, Z., and Ximen, J. (1992). J. Appl. Phys. 71, 1588. Sturrock, P. A. (1955). “Static and Dynamic Electron Optics.” Cambridge Univ. Press, London/New York. Wu, M. (1957). Actu Phys. Sinicu 13, 181. Xie, X., and Liu C. (1990). Chinese J . Nucl. Phys. 12, 283. Ximen, J. (1983). “Principles of Electron and Ion Optics and Introduction to Aberration Theory.” Science Press, Beijing. Ximen, J. (1986). Aberration theory in electron and ion optics. In “Advances in Electronics and Electron Physics” (P. W. Hawkes, Ed.), Suppl. 17. Academic Press, New York. Ximen, J. (1990a). Optik 84, 83. Ximen, J. (1990b). J . Appl. Phys. 68, 5963. Ximen, J. (1990~).J. Appl. Phys. 68, 5968. Ximen, J. (1991a). Canonical theory in electron optics. In “Advances in Electronics and Electron Physics” (P. W. Hawkes and B. Kazan, Eds.), Vol. 81, p. 231. Academic Press, New York. Ximen, J. (1991b). J . Appl. Phys. 69, 1962.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 91
New Developments in Generalized Information Measures INDER JEET TANEJA Departamento de Matematica, Universidade Federal de Santa Catarina Florian&polis,Brazil
.
.
I. Introduction . . . . . . . . . . . . . . . . . . . . . 11. Unified (r, s)-Information Measures . . . . . . . . . . . . . . . A. Composition Relations among the Unified (r, s)-Information Measures . . B. Shannon-Gibbs Type Inequalities . . . . . . . . . . C. Inequalities among Unified (r, s)-Measures . . . . . . . . . . . D. Different Kinds of Convexities . . . . . . . . . . . . . . . E. Optimization of Unified (r, s)-Inaccuracies . . . . . . . . . . . F. Properties of Unified (r, s)-Entropy Relative to pmnx and pmin . . . . . 111. M-Dimensional Unified (r, s)-Divergence Measures . . . . . . . . A. Properties of M-Dimensional Unified (r,s)-Divergence Measures . . . . IV. Unified (r,s)-Multivariate Entropies . . . . . . . . . . . . . . . A. Different Forms of Unified (r, s)-Conditional Entropies . . B. Properties of Unified (r, s)-Conditional Entropies . . . . . . . . . C. Unified (r, s)-Mutual Information . . . . . . . . . . . . . . V. Applications . . . . . . . . . . . . . . . . . . . . . . A. Markov Chains . . . . . . . . . . B. Comparison of Experiments . . . . . . . . . . . . . C. Connections of Unified (r, s)-Divergence Measures with the Fisher Measure of Information . . . . . . . . . . . . . . . . . . . . . D. Unified (r,s)-Divergence Measures and the Probability of Error . . . . . References . . . . . . . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
. . . .
.
. . . . . . . . . .
.
37 41 48 53 57 62 71 73 75 82 95 97 98 107 110 111 112 115 120 132
I. INTRODUCTION The concept of Shannon’s (1948) entropy is the central concept of information theory. Sometimes this measure is referred to as the measure of uncertainty. The entropy of a random variable is defined in terms of its probability distribution and can be shown to be a good measure of randomness or uncertainty. Kullback and Leibler (1951) introduced a measure of information associated with two probability distributions of a discrete random variable, famous as relative information. At the same time they also developed the idea of the Jeffreys (1946) invariant, famous as J-divergence. Kerridge (1961) developed another measure similar to Shannon’s entropy involving two probability distributions, calling it the inaccuracy measure. Sibson (1969) studied the idea of information radius, 37
Copyrighl 0 199s by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-014733-5
38
INDER JEET TANEJA
generally referred to in literature as the Jensen difference divergence measure. These five classical measures have found deep applications in statistics. In this chapter, we also introduce a new divergence measure, calling it the arithmetic-geometric mean divergence measure. We have also studied unified (r, s)-generalizations of the preceding six information and/ or divergence measures. The divergence measures-that is, J-divergence, Jensen difference divergence, arithmetic-geometric mean divergence, and their respective unified (r, s)-generalizations-have been studied in the M-dimensional case. Let
be a set of all probability distributions associated with a discrete finite random variable. Let P I ,P z , ...,PM E A, be M-probability distributions. Let ( A , , A 2 , ..., A M ) E A M . It is well known that the Shannon’s entropy satisfies the following two inequalities: H(Pj) 5 H(P, 11 Q k ) ,
V j , k = 1,2, ..., M ,
j # k,
(1)
and M
/ M
with equality iff PI = P2 =
\
... = P M ,where n
H(P) = -
C Pilog2~i i=1
and n
H ( P II Q) = -
C Pi log2 qi
(4)
i= 1
for all P , Q E A,. The measure H ( P ) is the Shannon’s entropy (Shannon, 1948), and the measure H(PII Q) is the inaccuracy (Kerridge, 1961). The inequality (1) is known as the Shannon-Gibbs inequality, and the inequality (2) arises from the concavity property of Shannon’s entropy. The inequality (1) is applied to many important theorems of information theory and statistics. It is also applied in solving interesting mathematical problems. As a result of inequality (l), we have
for every k = 1,2, . . . , M with equality iff PI = P2 =
... = P M .
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
Multiplying both sides of (5) by A & , summing over k using (2), we have M
M
5
C
=
1,2,
39
..., M , and
M
AjAkH(4 IIPk),
(6)
j=l k=l
j =1
with equality iff PI = P2 = = PM. The preceding inequalities (6) admit the following three non-negative differences given by
I(P1,Pz,..., PM) = H
and
M
/ M
\
40
INDER JEET TANEJA
where D(P (1 Q) appearing in (7), (S), and (9) is the relative information (Kullback and Leibler, 1951) given by
From (7), (S), and (9), we conclude that - - - , P M+) T(Pl,P2, - - - , P M=)J(P,,P2, ...,P&f).
(11) In particular when M = 2, PI = P, and P2 = Q,then from (7), (S), and (9) we have the following two-dimensional measures: 1(P1,P2,
respectively. Throughout the paper, it is understood that all the logarithms are base 2. 0 log 0 = 0 log 0/0 = 0. Whenever qi = 0 for some i , the corresponding pi is also zero for that i. The measure (7) is the M-dimensional generalization of the information radius or Jensen difference divergence measure (Sibson, 1969) and was studied for the first time by Burbea and Rao (1982a, b). For simplicity we call it the I-divergence. The measure (8) is the M-dimensional generalization of the J-divergence and was studied for the first time by Toussaint (1978). The measure (9) is presented for the first time. The non-negativity of (9) allows us to conclude an interesting arithmetic and geometric mean inequality, i.e., M
C
j= 1
M
~~p~~ 2
n p?,
v i.
j= 1
Becailse of this, we call it the A & G-divergence or simply the T-divergence.
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
41
In past years, researchers’ interest tended toward one- and two-scalar parametric generalizations of the measures (3), (4), (lo), (12), and (13). RCnyi (1961) was the first to generalize parametrically the measures (3) and (10) simultaneously. Nath (1968) presented the first generalization of the measure (4). Rathie and Sheng (1981) and Burbea and Rao (1982a, b) presented the first parametric generalization of the measure (13). Burbea and Rao (1982a, b) also presented the first one-parametric generalization of the measure (12). Sharma and Gupta (1976) and Sharma and Mittal (1975, 1977) studied two-parametric generalizations of the measures (3), (4), and (10). Taneja (1983, 1987) studied two-parametric generalizations of the measure (13). Taneja (1989) studied two-parametric generalizations of the measure (12). Also, Taneja (1989) unified some of these generalizations and studied their properties, calling them unified (r, s)-information or divergence measures. Here we have obtained some of these unifications via composition of the mean of order t with Box and Cox’s transformation function or its generalized version. Some connections of these generalized measures with CsiszBr’s +divergence measure are also given, leading to relations with Fisher’s information measure. Some properties and applications to different areas of these unified measures are also studied. The areas covered are pattern recognition, comparison of experiments, noiseless coding, Markov chains, etc. Multivariate measures of unified (r, s)-entropy are also studied. The unified (r, s)-measures we have divided in two parts: one on generalizations of the measures (3), (4), and (lo), called unified (r, s)-information measures; and the other on the generalizations of the measures (7), (8) and (9), called M-dimensional unified (r, s)-divergence measures, and in particular, the two-dimensional case, called unified (r, s)-divergence measures. Most of the results presented in this work are the author’s or a joint contribution with the author; some of the work presented was done during author’s stay at the Universidad Complutense de Madrid, Spain, from October 1989 to February 1991, The arithmetic-geometric mean divergence measure and its unified (r, s)-generalizations are presented and studied for the first time in the chapter. 11.
UNIFIED (r, S)-~NFORMATION MEASURES
In this section we study two parametric generalizations of the measures H ( P ) , H(P 11 Q), and D(P I( Q). For the measure H(P 11 Q), we give three different kinds of generalizations. These generalizations are written in unified (r, s)-forms, called respectively unified (r, s)-entropy, unified (r, s)inaccuracies, and unified (r, s)-relative information, where r and s are real parameters. These generalizations are also derived through the composition
42
INDER JEET TANEJA
of the mean of order t and Box and Cox's transformation function, called the generalized mean of order t or the unified (r,$-mean. The ShannonGibbs type inequality has been generalized in different ways. Different kinds of convexities and relationships among the measures are also studied.
(al) Unified ( r , &Entropy The unified (r, s)-entropy measure is given by
[
H,s(P) = (21-s -
w[(
.:>'
s- l)/(r-
i= 1
1)
-
I],
r f 1 , S f 1,
r = l , s + 1, H,'(P) = (1 -
r)-l
log,
c pi) ,
r # 1 , s = 1, r = 1 , s = 1. (15)
(a2) Unified (r, s)-Inaccuracies The three different kinds of unified@,s)-inaccuracy measures are given by - l)-'["Kr(P 11 Q)(s-l)'(r-l) - 11, "H;((p 11 Q) = r # 1 , s z 1,
CY
= 1, 2, and 3, where
43
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
(a3) Unified (r, +Relative Information The unified (r, s)-relative information measure is given by
[(ifilP;q;-y-l)/('-l)
e ( P 11 Q) = (1 - 2'-')-'
-
1],
r # l , s # 1,
G ( P 11 Q)
=
(1 - 21-S)-1[2(S-1)D(pIIQ) - 11, r
=
1, s z I ,
r = 1 , s = 1. (17)
We observe the following particular cases: (i) When r = s = 2, we have
@(P I Q)
=
2b(P II Q)
-
11,
and D:(P II Q)
=
b 2 [ W II Q)1,
where o(P IIQ) = Cy= pfq;' is the well-known Pearson's w2divergence. (ii) When r = s = -5, we have
D E ( P II Q)= (1
- fi)-'[p(P
II Q) -
-2 log&"
II Q)1,
11,
and D:/2(P II Q)
=
where p(P I( Q) = Cy= I p!/2qf/2 is the well-known Bhattacharyya's or Hellinger's distance and -log,[p(P 11 Q)] is the Bhattacharyya's coefficient. (iii) We can write
D~(PI( where U
=
(l/n,
I/)=
n"-'[Xs(P)- x~(u)],
(18)
..., l/n) E An, and
It is customary to study the generalized measures given in (15), (16), and (17) for positive values of r and s. Taneja (1989) and Taneja et al. (1989) studied them for r > 0 and any s. The natural question arises, why keep
44
INDER JEET TANEJA
r > O? Why not study them for r IO? This is our aim in this chapter, i.e., to study measures (15), (16), and (17) for r , s E (-m,m). By considering r c 0, the problem again arises that we can find probability distributions such that the measures (15), (16), and (17) become infinite. To avoid this, we shall redefine the set A,, in the following way:
I cn
= ( p l ,. . . , p , , )
p i = 1 with p i 2 0 for r > 0
i= I
and p i > 0 for r
I0,
I
vi
.
Unless otherwise specified, from now on we shall consider the measures XS(P), "XS(P I( Q) (a = 1,2, and 3) and Bf(P(1 Q) for all P, Q E rA,,, r, s E (-00, 00). We observe that the measures Xs(P), "Xf(P11 Q)(a = 1,2, and 3), and BS(P 11 Q) are continuous with respect to parameters r and s. This allows us to write them in the following simplified forms:
XS(P) = CEIH,"(P)lr# 1, s # I), "XS(PI1Q) = CE("NS(PI1Q)Ir # 1, s # 1)
(a = 1,2, and 3),
and
I z 1, s z 11,
BXP 11 Q)= CEPXP I( Q) r
for all r, s E (-00, m), where CE stands for continuous extension with respect to r and s. Thus, any result holding for r # 1, s # 1 extends for any r and s. In view of this, we shall prove all the results only for Hs(P), "H;(P )I Q) (a = 1,2, and 3), and q ( P 11 Q), r # 1, s # 1. For ( V , W ) E A,, x I',,,let us consider the following generalized measure:
t # O , S #
-
1,
11,
t = O , s # 1, t # O , s = 1, t = O , s = 1, (19)
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
45
for all t, s E (-00, m), where
r, = ( w = ( w , , ..., w,) I wi 2 0, i = 1,2, ...,n) c IR" f o r t < O , w i > O , u i > O f o r i = 1 , 2,..., n. The measures (15), (16), and (17) can be obtained as the particular case of the measure (19) in the following way: (i) Take ui = wi = p i , v i = 1,2, ..., n and t = r - 1 in (19); we get @,S(V11 W ) = XS(P). (ii) Take ui = p i , wi = q i , V i = 1,2, ..., n and t = r - 1 in (19); we get @,S(V11 W ) = 'XSP 11 Q). (iii) Take ui = p i , wi = Q ~ v, i = 1,2, ..., n and t = (r - l)/r, in (19); we get @,S(V11 W) = 'XS(P 11 Q). (iv) Take ui = pI/C;!lp;, wi = q i , v i = 1,2, ...,n and t = 1 - r in (19); we get @,S(V11 W) = ' X 3 P 11 Q). (v) Take ui = p i , wi = p i / q i ,v i = 1,2, ..., nand t = r - 1, in (19); we get @S(V 11 W ) = W' 11 Q). We can write @S(V II W ) = w 4 ( V II W ) ) , where
(20)
t
=
0,
and
for all t, s E (-00, 00) and x E (0, a),for t < 0 in (20), we have ui > 0, V i. The measure (20) is famous as mean of order t (Beckenbach and Bellman, 1971, p. 17) and the measure (21) as Box and Cox's transformation (Box and Cox, 1964). We shall call the measure @,S( V 11 W) the unified (t, +mean or generalized mean of order t. It can be written as
@SWII W ) = CE(4"V II W )I t
f
0, s f 11,
where CE stands for continuous extension with respect to t and s. More details of the measure (19) can be seen in Quesada and Taneja (1993,1994). [See also Taneja (1994) for more studies given in this section.]
46
INDER JEET TANEJA
As a consequence of the properties of mean of order t given in (20) (Beckenbach and Bellman, 1971), we have the following classical inequalities frequently used later in the chapter.
Jensen ’s inequalities:
,I
s( i p i a i y ,
C Pis;
i= 1
2
o<w
i= 1
(22)
(
o > 1 or w < 0,
,!,piai>”.
with equality iff ai = c , v i = 1,2, ...,n, where P = ( p , , p 2 , ..., p , ) and ai are non-negative real numbers.
E
rhn
Holder’s inequalities:
with equality iff for some c, a? = ~ b ~ / V~ i,- where ’ , ai and bi are nonnegative real numbers. For o < 0, ai > 0 , bi > 0 , V i.
Minko wski’s inequalities: l/O 9
i(i
k=l
j=1
w < l,w#O,
1/o
w > 1,
with equality iff ajk are independent of j, i.e., ajk = c k , V j , k , where ajk are non-negative real numbers. For w c 0, ajk > 0 v j , k. The following inequalities will also be used frequently: sza;,
(jlaiY[
w
(25)
1:’
2
c
~ $ 0 ,
a;,
w
> 1,
i= 1
with equality iff at most one ai > 0. Our aim in this section is to generalize the inequality (1) involving the measures (12) and (13). Different kinds of convexities such as convexity, Schur-convexity, pseudoconvexity, and quasi-convexity of the unified
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
47
measures (15), (16), and (17) are studied. Some inequalities among the measures are also obtained. The definition of convexity is well known in the literature, but the other kinds are not so widely known. We shall rewrite these definitions below:
Definition 1 (Convexity). A numerical function 9: A, convex on A,, if for all P, Q E A,,, we have
+
IR (reals) is
1 W ) + M Q )2 W p + PQ), for all 0 < 1, p < 1 with 1 + p = 1. For concave functions, the above inequality is reversed.
Definition 2 (Pseudoconvexity). A numerical function 9: A,, is pseudoconvex on A,, if for all P, Q E A,,, we have
+
I? (reals)
VS(P)(Q- P ) 1 0 implies S(Q) 1 9(P), where V represents the gradient operator. For pseudoconcavity, we have
VS(P)(Q- P ) I0 implies S(Q) I9(P), for all P, Q E A,.
Definition 3 (Quasiconvexity). A numerical function 9: A, quasiconvex on A,, if for all P, Q E A,, we have
+
IR (reals) is
9(Q) I9(P) implies VS(P)(Q- P ) I0. For quasiconcavity, we have
9(Q) 1 6(P) implies Vb(P)(Q- P ) 1 0, for all P, Q E A,.
Definition 4 (Majorization). For all P, Q E A, , we say that P is majorized by Q, i.e., P < Q, if (a)
p11 p 2 1
' - a
lpn;
q, 1 q 2 2
--*
1 q,,
with U
U
C pi s
C qi,
1s o s n ,
i= 1
i= 1
or (b) there exists a doubly stochastic matrix (aik),aik 2 0, i, k = 1,2, with aik = C; = aik = 1 such that n
pi =
C k=l
aikqk,
i = 1,2,
...,n.
...,n
48
INDER JEET TANEJA
Definition 5 (Schur-convexity). A numerical function 9: A,, + IR (reals) is Schur-convex on A,, if P c Q, i.e., P majorized by Q implies 8(P) I9(Q) for all P, Q E A,,. For Schur-concavity, we have
P c Q implies 8(P) 2 9(Q), for all P, Q E A,,. Definition 6 (Schur-convexity in M-dimensional case). Let pi = (PU,PV,...,P.) E A n , Qj = (QU,qv,.-.,qlv.) E An, j = 1,2, . . . , M . A function F: A t + IR (reals) is Schur-convex on A: if ( P I ,P 2 , ...,PM)c ( Q 1 ,Q 2 , ..., Q M ) implies F(Pl,P2, . .,PM)5 F ( Q l , Qz , ...,Qd,where (5,P2, ...,PM)c ( Q 1 ,Q2,..., QM) means that there is a doubly aik = E:= a, = 1 such stochastic matrix [ait),i, t = 1,2, ..., n with that
.
n
pJ.I . =
C t=
aitqjt,
i = 1,2, ..*,n;
j = 1,2,
...,M .
1
Definitions 1-6 and inequalities (22)-(25) can be found in standard books on convex functions and inequalities, such as Beckenbach and Bellman (1971), Marshall and Olkin (1979), Mangasarian (1969), and Roberts and Varberg (1973). A. Composition Relations among the
h i f e d (r, s)-Information Measures
and
for all x
L
0.
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
49
where & ( x ) is as given in (21). The function e,(x) satisfy some interesting properties given by the following result. Result 1.
We have the following:
e s w
20,
05x51,
so,
x 2 1,
with equality iff x = 1. S
lim O,(x)
(ii)
I
1,
=
s+o+
s > 1. O < X I
1,
x = 1,
(iii) S-W
x > 1.
o<x<+, x=1 29 x>+.
(v) (vi) (vii) (viii) (ix)
B,(x) is a convex function in x for s < 2. O,(x) is a concave function in x f o r s > 2. es(x)is a monotonically decreasing function of x. e,(xy) = e,(X) e , ( y ) (FS - i)e&)e,(Y),for U I I X , y E R+. eS(x)is a monotonically decreasing function of s for x 2 1.
+
+
where
el&) = lim1 &(x) = -log,x s-.
50
INDER JEET TANEJA
and
with
I
4)
with equality iff x
=
1 1 ,
s 2 1,
5 1 ,
s I1.
1 or s = 1.
or (0< x with equality iff x
=
5
+ or x
1 1,
s 1 2),
3 or x = 1 or s = 2.
Proof. Parts (i) through (viii) are easy verifications. (ix) For s # 1, we have d
-(e&)) ds
= (-1)(2'-'
-
1)-22(+)s(1n+)(xs-1- 1) xs-
=
(-1)(21-S
- 1)-2[
1
-
+ (2'-'
1 21-Sln2l-s
s- 1
+
- I)-l(x-'x"Inx)
1 - 21s-1
xs-
1
In
xs-
I
1
Since ulna 1 a
- 1,
a 10
(OlogO
=
0)
and
hold, then we conclude that
d
ds(Os(x)) I0
for x
2 1.
This proves that the function e,(x) is monotonically decreasing in s for x 1 1.
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
51
(x) We know that lna 5 a - 1,
a
> 0.
Taking a = x'-I, x > 0, and a = 2'-', and rearranging the terms, we get the required result. (xi) For a > 0, we know that (Hardy et al., 1934,p. 40, Th. 4.2)
- l), s y ( a - l),
2 y(a
y 2 1,
Ocysl.
Take a = 2'-' and y = -log,x; we get the result. (xii) Let us consider the function q,(x) = I - x - +e,(x). Then q; (x) = - 1 - +e; (XI
and q:(x) = -+e:(x), where (21- - l)-l(s - 1)x+2,
s # 1,
e;(x) =
s = 1,
and
(21- - I)-+ - I)(~ - 2)xs-3,
#
1,
s = 1.
In 2 x2 ' Since (s - 1)/(2'-' - 1) < 0 for any s, then
< 2,
> 0,
s
CO,
s>2.
This implies that
< 0,
s
c 2,
> 0,
s
> 2.
v:(x) = -+e:(x)
52
INDER JEET TANEJA
Thus, we conclude that qs(x) is strictly concave for s < 2 and is strictly convex for s 1 2. It will attain its maximum or minimum at qi(x) = 0, i.e., when l/(s-l)
X =
s # 1 , s # 2,
,
“2(7-sl)]
s = 1,
i.e., at this point of x , the function q,(x) attains its maximum or minimum. Also, the only zeros of qs(x)are when x = or x = 1 or s = 2. Using part (ii), we conclude that
x-o+
2(21-s - 1 ) ’
s > 1.
Since
(>O,
s>2,
this gives
This expression, along with other considerations, completes the result. Thus, as a consequence of Result 1, the functions nt,(x) and X , ( x ) satisfy some interesting properties given by the following results. Result 2 . (i) (ii) (iii) (iv)
We have
nt,(x) 2 0 with equality iff x
=
0;
nt,(x) is an increasing function of x ; nt,(x) is strictly convex in x for s < 1; nt,(x) is strictly concave in x f o r s > 1; W X )
[
2
N(s)x,
s 5 1,
I
N(s)x,
s 1 1,
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
53
where
with
N(s) 2
x,
(iv) I X ,
Result 3. (i) (ii) (iii) (iv) (v)
[
< 1,
s
>1,
s>l.
(x 2 1 , s
I1 )
< 1,
or (0 I x
I
1, s 2 l),
( 0 1 x 1 1 , s l~) o r ( x > 1 , s 1). ~
We have
X,(x) L 0 with equality iff x = 0; X,(x) is an increasingfunction of x; X , ( x ) is an increasing function of s; X,(x) is a strictly convex function of x for s > 1; X,(x) is a strictly concave function of x for s < 1 . B. Shannon-Gibbs Type Inequalities
In this subsection, we shall present generalizations of inequality (1) involving measures (15) and (16). Before giving the generalized Shannon-Gibbs type inequalities, we shall prove the non-negativity of the measures (15), ( l a ) , and (17). For it we need the following result.
Result 4. Let G,S(P)= (21-S - l)-lkr(x)(s-l)’(r-l)- 11,
r # 1, s # 1,
where g,(x) is a real function such that g,(x) = 1 , with
GS(x)= CE(G,S(x)I r # 1 , s # 1). The following holds:
(9 If
holds, then
54
INDER JEET TANEJA
(ii) If
holds, then
6x4 2 6 3 Y ) . Proof. Raising both sides of inequalities (29) and (30) by (s - l ) / ( r - l), multiplying by (2'-' - l)-' (s # 1) and simplifying, we get the required result. Proposition 1.
We have
(i) 0 5 XS(P) < 00 with equality iff P = Po; (ii) 0 I"Xs(P11 Q) c 00 (a = 1 , 2, and 3) with equality iff Q = Qo;
0 Ia>S(P)) Q)< 00,
(iii) (al) (a21
-m
r 2 0;
c Ds(P((Q ) 5 0,
r I 0;
with equality iff P = Q or r = 0, where P o or Q" are the probability distributions such that one of the probabilities is one and all others are zero. Proof. For 0
1, we know that 5
q'
[
t
1,
> 0,
t c 0,
2 1,
with equality iff q = 1 . In view of (31), we get the following inequalities: 1,
r c 1,
I1,
r > 1,
2 i= 1
and
with equality in (32) iff P = P o and in (33) to (35) iff Q = Q".
55
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
In (22), take pi
= q i , ai = p i / q i , v i ,
i= 1
and w = r; we get
11,
01r
2 1,
r> 1 orrs0,
with equality iff pi = q i , v i, i.e., P = Q or r = 0. Applying Result 1 over inequalities (32)-(36), we get the required result. The following two propositions give the generalized Shannon-Gibbs type inequalities.
Proposition 2. For m
withequalityiff Q
=
=
2 and 3 , we have
P'orr = 0 f o r a = 2 , a n d P = Q o r r = 0 form = 3.
h o o f . We shall prove the theorem for each value of m separately. For m = 2. In (22) take pi = q i , ai = piq;'", V i, and w = r; we get
with equality iff pig;'/' = c, i.e., pi' = crqj,i.e., iff qi = ~ i ' / C y = ~ p i ' , i = 1,2, ...,n, i.e., iff Q = P' E 'A,, or r = 0. Inequality (38) and Result 4 completes the proof. For m = 3. From (36), we can write
V
i Pi'
i= 1
pig;-',
oI
r < 1, (39)
n
~ f q f - ~ , t > 1 or r
5
0,
with equality iff pi = q i , V i, or r = 0. Inequality (39) and Result 4 complete the proof. It can easily be checked (Kapur, 1987) that the inequality (37) does not hold in genera1 for 01 = 1 . In the following we shall prove that it holds under certain conditions.
56
INDER JEET TANEJA
with equality iff P
=
Q.
Proof. In inequality (23) take ai = pI'(r-l)qly bi = P;'('-~), and o (r - l)/r we get
(i
r/(r- I)
i = 1 Piq;-l)
=
r r O , r # 1,
(i
i=l
P:> 2
(42)
c 41,
r s 0,
i= 1
with equality iff p i = q i , v i, or r = 0. From (40) and (42), we have rr0,rfl rsO i.e.,
r f 1, i.e., f
n
I
1 p;,
r > 1,
i= 1
1 Pi4J-I
(43)
i= 1
2 cpr, i
r < 1,
i =n 1
with equality iff p i = q i , v i. The rest of the proof follows from (43) and Result 4. Corollary 1. If either P < Q, r (41) holds.
I1,
or Q < P , r
2 1,
then the inequality
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
57
Proof. Using Definition 4 and Jensen’s inequalities (22), we can easily check that P < Q implies f
n
O
I
~
I
~
,
i=1
(44) i=1
PI,
2
r z 1o r r s ~ .
i= 1
In view of (42) and (44),we have n
n
C pi&’ i=1
2
C pr, i=1
r I 1.
(45)
Similarly, we can prove that Q < P implies n
n
C
I
i= 1
C pf,
r 2 1.
i= 1
Inequalities (45) and (46) together with Result 4 complete the proof.
Proposition 4.
If P < Q, then
‘ W p11 Q). Proof. If P < Q, then from (44),we have
W(Q) I
(47)
From (42) and (48) we get
(z t q i ,
r I 1.
i= 1
Inequality (46) and Result 4 complete the proof.
C. Inequalities among Unified (r, s)-Measures The measures X ; ( P ) , “XS(P11 Q) (a! = 1, 2, and 3), and D;(P (1 Q) bear some interesting inequalities among themselves, and these are given in the following propositions. Monotonicity with respect to the parameters is also studied.
58
INDER JEET TANEJA
Proposition 5.
We have
with equality iff Q = U, r # 0, r # 1, or i f f r = 0 or r U = (l/n, ..., l/n) E An.
Roof. In inequality (22) take ai = &')Ir
=
1, where
and o = r ; we get
with equality iff qi = l/n, v i , or r = 0. Inequality (51) and Result 4 completes the proof. Proposition 6.
We have
' X X P I1 Q)2 Q:(p 11 Q), with equality iff P
=
(52)
Q = Po.
Proof. Since 3X;(P11 Q) 2 0 for any r and D:(P 11 Q) I 0 for r
I0, it is sufficient to show the result only for r > 0. Dividing both sides of (32) by C:= I p{q:-r, raising by (s - l ) / ( r - 1) and subtracting 1 , we get
El= 1 Pf CI= I Pir qi1-r)
(
(s- I)/@-
1)
- 1
r-1 with equality iff P = P o
) or (r c 1 , -r -
1
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
59
Again raising both sides of (36) by (s - l)/(r - l ) , we get 151
( r > l o r r r O , - ,-,>O) r- 1
I with equality iff P = Q or r = 0. Applying (54) over (53), multiplying both sides by (2'-' - 1)-' (s # 1 ) and simplifying, we get the required result.
Proposition 7.
We have
with equality iff Q
=
P' or r
with equality iff P
=
Q or r
with equality iff P
=
Q.
=
=
0.
0.
(iii)
Proof. (i) Summing a):(P 11 Q) on both sides of (37)for a = 2, and s = 1 , we get the required result. (ii) Taking log( .) on both sides of (42), we get the required result. (iii) Proof follows in view of part (ii).
60
INDER JEET TANEJA
(iv) In equality (23) take ai = q[p:’(r-l), bi = pi’(l-r)qi, and o (r - l)/r; we get q;+l,
r 2 0, r
=
# 1,
(55) 2
C
qJ+’,
r 5 0.
i= 1
Since
1
c q;,
r
I0,
i= 1
then from ( 5 9 , we conclude that
Taking log( .) on both sides of (56), we get the required result. Remark 1. The measures H;(P) and D;(P 11 Q) are one-parametric generalizations of the Shannon’s entropy and of Kullback-Leibler’s relative information, respectively, both studied by RCnyi (1961). The measures “X,‘(P1) Q)(a = 1, 2, and 3) are three different one-parametric generalizations of Kerridge’s (1961) inaccuracy studied by Nath (1968, 1975) and Van der Lubbe (1978), respectively. Proposition 5 gives inequalities among these measures. But, unfortunately, these are not extendable for unified (r, s)information measures. Part (iv) of the preceding proposition has no meaning for 0 < r < 1, because in this case the LHS becomes negative. Also, for 0 < r c 1, part (ii) has meaning, provided the LHS remains positive. Inequality (49) is similar to one studied by Fisher (1977).
Proposition 8.
We have
(i) X s ( P ) and “ X : ( P 1) Q) (a = 1 and 2) are monotonically decreasing functions of r (s fixed). (ii) Ds(P )I Q) is a monotonically increasingfunction of r (sfixed)and of s (rfixed). The proof is based on the following result:
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
Result 5.
61
The function given by
is monotonically decreasing in r, provided f '(r) 2 0, and is monotonically increasing in r, provided f '(r)I 0, where f (r) is a function of r, f '(r) = (d/dr)f ( r ) , and wi 2 0, V i = 1,2, ...,n. Proof of Result 5. We have
where
Since C;= qi ln(qi/pi)2 0 and (s - 1)/(2'-' - 1) < 0 for any s, then from (58) we conclude that
This completes the proof of the result.
Proof of Proposition 8. (i) In (57) take wi = p i , f ( r ) = r - 1; wi = qi, f ( r ) = r - 1; and wi = q i , f ( r ) = (r - l)/r, V i ; we get, respectively, the monotonicity of XS(P), ' X ; ( P 11 Q), and 'XS(P 11 Q) with respect to r.
62
INDER JEET TANEJA
(ii) In (57) take wi = p i / q i ,V i,f ( r ) = r - 1; we get the monotonicity of D;(P 11 Q) with respect to r. It only remains to show the monotonicity of Df(P 11 Q)with respect to s, as follows. We can write
G(P
II Q)= - e m ,
s # I,
(59)
where l/(l-r)
x = (iilPlql-‘)
9
r f 1,
and 6, is as given by (21). In view of relation (59) and Result l(ix), it is sufficient to prove that x 2 1, and it holds by raising both sides of (36) by 1/(1 - r), r # 1. This proves the monotonicity of 9; (P (1 Q) with respect to s.
D. Different Kinds of Convexities This section deals with the study of different kinds of convexities of measures (15), (16), and (17) such as convexity, Schur-convexity, pseudoconvexity, and quasiconvexity.
Proposition 9. We have (i) ‘ X ; ( P11 Q) is a convex function of Q E A; for r < 2, s < 2, and is a concave function of Q E ‘ A nfor r > 2, s > 2. (ii) “X:(P(1 Q) (a = 2 and 3) are convexfunctions of Q E ‘ A nfor r > 0, s < 2, and are concave functions of Q for r < 0, s > 2. The proof is based on the following result.
Result 6. For all P, Q E ,,A,, v, /3 E (-
m, m),
the measure given by
is convex in Q under any one of the following conditions: (i) v > 1 orv 1, (ii) v > 1, v/3 > 1, (iii) v < 1, v/3 < 0,
and is concave in Q under any one of the following conditions: (i) O < v < 1 , 0 < / 3 < 1, (ii) v < 1, 0 < v/3 < 1 .
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
63
= ( p I , p 2 ,..., P a ) E A , QI = (411,. . . , 4 1 n ) E vAn and Q2 = (q21,..., qZn)E v A n . Also consider 0 < A , p < 1 , A + p = 1 . In view of inequality (22), we have
Proof of Result 6. Let P
Raising both sides of (61) by B, we get
Again in view of Jensen’s inequality (22), we have
From (62) and (63), we get AG,B(P II QJ + PGAP II Q2)
64
INDER JEET TANEJA
VS > 1 or v B <
0,
(67)
o < vp<
1.
From (66) and (67), we get
AG,B(P II QJ + @,B(P
I
2
II Q 2 )
G,BP II nQ1 + P Q ~ ) ,
(V
> 1, VS > 1) or (V c
G,BV I J Q i
(V
< 1, 0 < VS < 1).
+PQ~),
1,
VS < 01, (68)
Inequalities (64) and (68) together give the required result.
Proof of Proposition 9. (i) We can write “H,S(PII Q) = (2’-’ - l)-’[“G,S(P11 Q) - 11,
r
# 1, s # 1 (a= 1
and 2),
and ’H,S(P 11 Q ) = (2’-‘ - 1)-1[G,S(P)3G,S(P 11 Q) - 11,
r
# 1,
s # 1,
65
DEVELOPMENTS IN GENERALIZED INFORMATIONMEASURES
where 'GAP II Q)=
(
(s- l)/(r-l)
,
iilPi~f-l)
(s- l)/(r-1)
and
(ii,~h'-r) (s-l)/(l-r)
'GN'
11 Q)=
I
for all r # 1 , s # 1. The rest of the proof follows in view of Result 6. Proposition 10. Xs(P) is a concave function of P in r A , for s 2 r > 0 or s ? 2 - l / r , r > 0, and is convex in rA,, for r < 0, sc 1 , or l c s c 2 - l/r.
The proof is based on the following result. Result 7. For all v, /3 E (-a, a),the measure given by GW) =
(
ilP:)B
is convex in P under any one of the following conditions: (i) v > l o r v < O w i t h / 3 > 1 , (ii) v > 1 , v/3 > 1 , (iii) v < 1 , v/3 < 0,
and is concave in P under any one of the following conditions: (i) O C V C 1 , 0 < / 3 < 1 , (ii) v < 1 , 0 < v/3 < 1 . P
The proof of this result follows from Result 6 by taking Q = P and = U,where U = ( l / n , ..., l / n ) E A,,.
Proof of Proposition 10. We can write Hs(P) = (2'-' - l)-'[G,S(P)- 11,
r
# 1, s # 1 ,
(70)
where
c Pf)
(s- l)/(r-1)
GW)
= (i:l
9
r f
1,Sf
1.
The rest of the proof follows in view of Result 7. Corollary 2. XS(p, 1 - p ) is a concavefunction of P E A2 for (s 2 r > 0) or (s ? 2 - l / r , r > 0 ) or (0 < r I2, s 2 1).
66
INDER JEET TANEJA
The proof for the first two conditions follows from Proposition 10, while for the last condition it follows by simple derivation. Proposition 11.
We have
(i) Xs(P) is a pseudoconcavefunction in P for r > 0 and pseudoconvex in P for r < 0. (ii) ' X s ( P 1) Q) is apseudoconvex function in Q for r < 2 and ispseudoconcave in Q for r > 2. (iii) a3CS(P11 Q) (a = 2 and 3 ) arepseudoconvexfunctions in Q for r > 0 and are pseudoconcave in Q for r < 0. The proof follows immediately from the following two results. Result 8. For all P , Q E vA,,, v E 5
(-00,
a),
C py,
v > 1,
1 py,
v < 1,
i= 1 i=l
2
i=l
implies
Proof of Result 8. It is well known that every concave function is pseudoconcave and every convex function is pseudoconvex (Mangasarian, 1969). Thus, proof of the result follows from the fact that the measure G,!(P)= C;= lpY is convex in P E vAnfor v > 1 or v < 0 and is concave in P E A,, forO
We have the following:
(i) The measure given by (69) is pseudoconvex in Q for ( v > 1 or v < 0 , B > 0) or (0 < v < 1, B < 0) and is pseudoconcave in Q for (0 < v < 1, B > 0) or ( v > 1 or v < 0, B < 0). (ii) The measure given by (60) is pseudoconvex in Q for (v > 1 or v < 0, B > 0) or (0 < v < 1, B < 0) and is pseudoconcave in Q for (0 < v < 1, j?> 0) or ( v > 1 or v < 0, B < 0).
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
67
Proof of Result 9. (i) We have
for all P, Q E "A,,. Inequality (71) can be rewritten as /
n
I V
Cp/,
v>1orvc0,
i= 1 i= 1
1 v cp/,
(74) O
i=l
Raising both sides of (72) by 8, we get
[
?G!(P),
G%Q) sG,!(P),
( v > 1 o r v < O , j ? > O ) o r ( O c v < 1,/3<0),
( O c v c 1,/3>O)or(v> l o r v < 0 , / 3 < 0 ) . (75) In view of (71)-(75), we get the required result. (ii) The proof follows similar lines to that of part (i). The proof of Proposition 11 follows immediately from Result 9.
Remark 2. It is well known (Mangasarian, 1969) that every pseudo convex function is quasiconvex and every pseudoconcave function is quasiconcave. Thus, for the respective values of the parameters r and s in the preceding results, the quasiconvexity (resp. quasiconcavity) follows. Corollary 3.
We have
where U = (l/n,
..., l/n)
E A,,.
The proof is an immediate consequence of Proposition ll(i). It also follows from Proposition l(iii) and (18), by taking Q = U. Proposition 12. We have
(i) Xf(P)is a Schur-concave function in P for r in P for r < 0.
> 0 and Schur-convex
68
INDER JEET TANEJA
(ii) ' X ; ( P 11 Q) is a Schur-convex function in Q for r < 2 and is Schurconcave in Q for r > 2. (iii) a X ; ( P11 Q) (a = 2 and 3) are Schur-convexfunctions in Q for r > 0 and are Schur-concave in Q for r < 0. Again the proof is based on the following result.
Result 10. We have the following: (i) The measure given by (69)is Schur-convex in Q for ( v > 1 or v < 0, /3 > 0) or (0 < v c 1, /3 < 0) and is Schur-concave in Q for (0 < v < 1, /3 < 0 ) or ( v 1 or v < 0, /? < 0). (ii) The measure given by (60)is Schur-convex in Q for ( v > 1 or v < 0 , /3 > 0) or (0 < v < 1, /3 < 0 ) and is Schur-concave in Q for (0 < v < 1, /3 > 0) or ( v > 1 or v c 0, /3 < 0).
Proof of Result 20. (i) Let P = ( p l ,...,p,) E "A,,, Q = ( q l , ..., qn)E " A nbe two probability distributions such that P < Q; then we have to show that IG!(Q),
( v > I, v < 0,I/ > 0) or (0 c v < 1, B c 01,
L G!(Q),
(0 < v < 1, /3 > 0) or ( v > I , v < 0,/3 < 0).
Taking r = v and raising both sides by /3 in (41), we get the required result. (ii) The proof follows on similar lines to that of part (i).
Proposition 13 (Convexity in Pairs). Q ( P 11 Q) is a convex function of the pair of probability distribution (P, Q) E An x An for s 1 r > 0.
Proof. Let us consider the probability distributions P, = ( p l l,p 1 2 ,.. ., P l n ) E A n , QI = (q11,4129 41n) E A n r P2 = ( ~ 2 1~, 2 2 9 P Z ~E )A n , and Q2 = (q21,q22,...,q2n)E An. It is well known (Ferentinos and Papaionnou, 1981) that the measure Cy= pfq!-' is convex in (P, Q) for r > 1 and is concave in (PIQ) for 0 c r < 1. This allows us to write the .-.?
following inequalities: n
n
DEVELOPMENTS I N GENERALIZED INFORMATIONMEASURES
Raising both sides of (76) by (s
-
l)/(r - I), we get (s- l)/(r- 1)
n
A1
C P;iq:Lr
+ ~2
i=l
[ [i
69
C
i= 1
( s - I)/@- 1)
5
+ ~ 2 P 2 i ) r ( ~ l q+l i ~ 2 q 2 i ) l - r ]
( ~ I p l i
i= 1
( r > 1r ,- 51
>O
) (
or O < r < l , -
9
r-1
Again using the fact that the functionf(x) = x' is convex in x for t > 1 or t < 0 and is concave in x for 0 < t < 1 , where x > 0, we conclude the following inequalities: [A,
i
(s- I)/+
i = l P;iq:;r]
1)
i
+ [ ~ zi = l ~
(S-
l)/(r- 1)
i q j L r ] (s- I)/@-
i= 1
i= 1
s- 1
s- 1 < 0. r- 1
> 1 or-
1)
70
lNDER JEET TANEJA
Substracting 1 on both sides of (79), multiplying by (1 - 2l-‘))-’ (s # l), and simplifying, we get
LiDs(P1 II Q i ) +
12DsW2
II Qz) 2 Ds(A1P1 + 12p2 II 11Qi+ 1 2 Q 2 ) ,
(80)
f o r a l l s > r > O , r # 1 , s f 1. In particular, when r = s, the inequalities (80) still hold, which we get directly from (76). The remainder of the proof follows from (80) in view of the continuity of the measures with respect to parameters r and s. Proposition 14 (Schur-Convexity). For all r E (0,00)and s E (-00, 00), D;(P 11 Q) is a Schur-convexfunction of the pair of probability distributions (P, Q) E A n X An*
Proof. Since ( P I ,Q1) < (P2,Q2),then by Definition 6, we have diq:Fr =
(il
aikP2k)(
,%,
(81)
aikq2k)l-r
for all i = 1,2, ..., n. From the relations (81), Holder’s inequalities (23), aik = 1, v k, we get the following inequalities: and the fact that
The remainder of the proof follows from expression (82) and Result 4. Proposition 15. (Generalized Data Processing Inequalities).
%(p(B) 11 Q(B)) 5 % ( p for all r
> 0,
-00
We have
11 Q),
c s c 00, where
P(B) =
C ~ i b l ii i , ~ i b z i *.., , i pibmi) E An =1 i=
( i n= l
1
and
Q(B) =
c qibli, i qibli, i 4ib.i)
( in= 1
i= 1
with B = [bij],v i = 1,2, ..., n, j = 1,2, such that cj”= b, = 1, V i = 1,2, ..., n.
€An,
i= 1
..., m, being a stochastic matrix
The proof of this proposition follows on lines similar to that of Proposition 14.
71
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
E. Optimization of Unified (r, s)-Inaccuracies In this subsection we shall apply Propositions 1l(ii) and 1l(iii) to optimize the unified (r, s)-inaccuracies given in (13) with respect to distribution Q. It is understood that pmax = max(pl, ...,p,) andp,, = min[pl, ...,p,l > 0. Let us write
" Xs (P ) = Opt "Xs(P 11 Q),
(a= 1, 2, and 3),
Q
(83)
where the optimum is taken over all the probability distribution Q E A,, . The optimum stands for the maximum if "Xs(P 11 Q) (a = 1, 2, and 3) are pseudoconcave in Q, and for the minimum if "XS(P 11 Q) (a = 1, 2, and 3) are pseudoconvex in Q, i.e., max "XS(P 11 Q), " X ; ( P )=
"Xs(P 11 Q),
if "Xs(P 11 Q) is pseudoconcave in Q, if "Xs(P 11 Q) is pseudoconvex in Q.
In fact, we have the following proposition. Proposition 16.
We have
11 Q),
'WP) = Opt 'XW' Q
and where
XS(P) = Opt "XS(PI( Q) Q
(84)
(a= 2 and 3),
(85)
(2-r)(s- l)/(r- 1)
(2l-s -
-
11,
i= I
r f 1,s# 1,r#2,
-
C Pi logpi,
i=1
The proof is based on the following result.
r = 1 , s = 1.
72 Result 11.
INDER JEET TANEJA
We have
Consequently,
Proof of Result 22. We shall prove the foregoing result using Lagrange’s method of multipliers. Let us consider n
f(Q)=
C PiqY + A
i=l
Thus,
Substituting this value of qi in the expression C1= piqr,we get the required result. The remainder of the proof follows in view of the fact that the function lpiqris convex in Q for v > 1 or v < 0 and is concave in Q for O
Proof of Proposition 26. In Result 1 1 , take v = r - 1 ; we get (84). Take v = (r - l)/r in (86) or (87); we get (85) for CY = 2. Again take v = 1 - r and P = P‘ in (86) or (87); we get the result of CY = 3. We observe that the inequalities (87) are the same as (38) for v = (r - l)/r, r # 0. As a consequence we have inequalities (37) for CY = 2. Also, the following proposition holds. Proposition 17.
We have
Inequalities (88) are the same as (37) when 1/(2
-
r) = t for
CY
=
2.
Since the measure X i ( P 11 Q ) is neither pseudoconvex nor pseudoconcave in Q , we can’t apply the Langrange’s method of multipliers to optimize it. But we can do it directly, as given in the following proposition.
73
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
Proposition 18.
We have esbmax) 5
'X;(P II Q) 5
esbmin),
where i= 1
'%(p 11 Q) =
s = 1,
and 8,(p), 0 < p 5 1, is as given in (21).
Proof. It follows immediately from the fact that
F. Properties of Unified (r,s)-Entropy Relative to pmaxand pmin In this subsection, we shall present some inequalities for the unified (r,s)entropy relative to the maximum and minimum probability of a probability distribution. One of these inequalities generalizes the well-known Fano inequality. Proposition 19.
We have
X"(P)
IXS(P) 5
xm,
(89)
where X>(P) = lim XS(P) = O s ( p m a x )
(90)
X"_(P) = lim XS(P) = e s ( p m i n ) .
(91)
r-rm
and r + -m
Proof. Applying L'Hospital's rule, we have
Dividing and multiplying this expression by p&, and using the fact that 1,
Pi = Pmax,
0,
Pi Z P m a x ,
74
INDER JEET TANEJA
we get lim H"P) = -logpmax r+oD
(93)
The expression (90) follows in view of (21) and (93). Again dividing the expression (92) by (l/pmin)r, and using the fact that lim(kJ= r - 0 ~ Pmin
[
1,
p.I = pm .i n ~
0,
Pi
f Pmin,
we get lim H,"(P)= -log pmin .
r-
-m
(94)
The expression (91) follows in view of (21) and (94). The inequalities (89) follow from (90) and (91) and Proposition 8(i). Proposition 20.
We have 1-
(9
Pmax
= +XS(P),
under the following conditions : (al) (s I2, l/n IpmiuIt )or (s L 2 , pmax L 3) with r > 0; (a2) (s L r > 0 ) or (s L 2 - l / r , r > 0 ) or (0 < r I2, s L 1) with Pmax 2
3; 1 - Pmin 2
(ii)
+x;(p),
under the following conditions: (b,) (s I2, I pmax I1) or (s 2 2, 0 < pmin I 1) with r < 0 .
+
Proof. Replace x byp,,
in Result l(xii) and apply the RHS of inequalities (86); we get the proof under the conditions (al). Again replacing x by pmin in Result l(xii) and applying the LHS of inequalities (86), we get the proof under the conditions (bl). The proof under conditions (az) follows by concavity of XS(p, 1 - p) given in Corollary 2 (Taneja, 1989). Proposition 21.
We have
The proof of this proposition is based on the following result.
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
Result 12. For 1 5
Q In,
75
we have
r 5 0. \i= 1
i=l
/
Proof of Result 12. Without loss of generality we can reorder the distribution P E A: such that p 1 L p2 L - - - 2 p,,. Define Q E A; with
(Pi
i = r~
+ 1, ...,n. k
We can easily check that q1 1 q2 L L qn and C i = lpi 1 Cf=1 qi, 1 Ik In. Thus, Q c P. Applying Proposition 12(i), we get part (i). Again , p i , q2 = Cy=a+lpi, define the distribution Q E A,, with q1 = q3 = q4 = ... = q,, = 0 (1 IQ In). In this case, we can also check that k k q1 2 q2 L ... L qn and Ci= I p i I Ci= q i , 1 I k 5 n, i.e., P c Q. Again applying Proposition 12(i), we get part (ii).
Proof of Proposition 21. In the preceding lemma take p,, = pmax; we get the required result.
Q
=
n - 1 and
Part (i) of Proposition 21 generalizes the well-known Fano inequality. In view of this, we call them the generalized Fano-type inequalities.
UNIFIED (r, 111. M-DIMENSIONAL
DIVERGENCE MEASURES
In this section, we shall present M-dimensional unified (r, s)-generalizations of the measures information radius, J-divergence, and A & G-divergence measures given by (7), (8), and (9). For each of these measures we have given three different generalizations. Some properties and connections among them are also presented.
76
INDER JEET TANEJA
(al) M-Dimensional Unified (r,s)-Information Radii The M-dimensional unified (r, .+information radii or M-dimensional unified ( r , s)-R-divergences measure is given by ( " l f ( P , , P z , ..., P,),
For a
=
2, we have
r
#
1, s # 1,
77
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
with
i ( c" LjPji)( c" LkPki]-
Z,S(P,,P2, ..., PM) = (1 - 2 1 - y [ i = l
j=1
- 11,
k= 1
S #
1.
For the two-dimensional case, i.e., when M = 2, L1 = L2 = i,P, = P , and P2 = Q, we have 'ZS(P1,P2) = 'Z:(P 11 Q) = (1 - 2'-')-' - 11,
(98)
2ZS(Pl,P2) = 2Z,!(P 11 Q) = (1 - 2I-'))-'
(99)
i=l
and 'Z:(Pl,P2) = 'Z3P 11 Q) = (1 - 2l-9-l
for all r # 1, s # 1. (a2) M-Dimensional Unified (r, s)d-Divergences The three different ways of defining M-dimensional unified (r, s)-Jdivergence measures are given by ("JS(P1,P 2 , ...,PM), "g;(P, P2, .. .,PM) = y
I
f
# 1, s # 1,
"JS(P1,P2, ..., PM),
f
= 1, s # 1,
"J,'(P,,P,, ...,PM),
f
# 1, s = 1,
(101)
78 CY =
INDER JEET TANEJA
1, 2, and 3, where for
IJS(Pi,J'z,.*.~PM)
CY =
1, we have
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
79
with
For the two-dimensional case, i.e., when A4 = 2, A , = A2 P2 = Q, we have
=
i,Pl = P, and
2 JS(P1 P2) 9
=
'JW11 Q)
(l+r)/2 (l-r)/2
Pi
4i
(I+r)/2 (l-r)/2
+PI 2
4i
for all r # 1, s # 1. (a3) M-Dimensional Unified (r, s)-A & G-Divergences The three different ways of defining unified (r, 8)-A & G-divergences or unified (r, s)-T-divergence measures are giveq by
("T;'(P,, P2, ...,PM),
r
# 1,
s
# 1,
80
with
INDER JEET TANEJA
(Y
= 1, 2, and 3, where
For 01 = 2, we have
For
(Y
=
3, we have
and
In particular, we have
where
for
01
= 1, we have
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
81
and
pi + qi
(s- l ) / ( r - 1)
i= 1
for all r # 1, s # 1. We observe that the measures " $ ( P I ,P z , ..., PM) (a! = 1 and 2), "ds(P1,P2, ..., PM) (a! = 1 and 2), and " 3 3 P 1 P2, , ..., PM)(a! = 1 and 2) are continuous with respect to the parameters r and s. This allows us to write them in the following simplified forms: "~;(PI,P . - ~. , ,P M )= CE["IS(Pl,Pz,..., p~)Ir# 1, S # I ) , udS(P1,P2,- . * ~ P M = CE("JS(J'I,P2, ) ..., pM)I r Z 1, s Z 11,
and 'Jr(p1,P2, - - * , P M=) CE[nTs(Pl,P2,...,PM)Ir # 1, s
a s
z I],
(110)
a = 1, 2, and 3, for all r, s E (-00, a),where CE stands for continuous extension with respect to r and s.
82
INDER JEET TANEJA
Also, we can write
A. Properties of M-Dimensional Unified (r,s)-DivergenceMeasures
In this subsection we shall give some properties of M-dimensional unified (r,s)-measures " $ ( P I , ...,PM), "J f ( P l ,...,PM), and "3s(Pl,..., PM) (a = 1 , 2, and 3). Some studies on the measures " $ ( P I , ..., P M) (a! = 1,2, and 3) can be seen in Menendez et al. (1992).
Proposition 22. The measures " $ ( P I , ..., PM) (a = 1 and 2), "J:(Pl,..., PM)(a = 1, 2, and 3), and a3s(P1,..., PM)(a! = 1 , 2, and 3) are non-negative for all r > 0 and any s E (-a, a) and are zero iff p , = p2 = ... = PM. The non-negativity of 3Sf(P1,...,PM)follows under the conditions when either s 1 2 - l / r or s 2 r with r > 0. Proof. (i) For "gf(P,, ...,PM):By Jensen's inequality (22), we have 5
M
C
AjPJi
j=1
[
1
( (c
j=1
AjpjiY,
(1 18)
M
AjPjiY,
j =1
v i = 1,2, ..., n, with equality iff pji = i.e., iff PI = P2 =
..- = PM.
0 < r < 1,
r > 1,
cj"=Ajpji,V i, i.e., pji = cj, V I., .
J,
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
Multiplying both sides of (1 18) by i = 1,2, ..., n, we get
83
(Cf = , &pki)l-r and summing over all
---
with equality iff PI = P2 = = PM. Expression (1 19) along with Result 4 gives the non-negativity of I S g r ( P l ,..., P M ) . The non-negativity of 2ds(Pl,...,PM) follows from relation (96) and Proposition l(iii). The non-negativity of '3;(P1,...,PM) under the conditions specified is due to the concavity property of the unified (r, +entropy Xs(P) given in Proposition 10. (ii) For adS(Pl,P 2 , ...,PM):In view of (36), we can write n
C PjiPiFr i= 1
11,
O
2 1,
r > 1,
---
for all j, k = 1,2, ..., M with equality iff P, = P2 = = PM. Multiplying both sides of (120) by A,&, summing over a l l j and k with j # k , and dividing by M
C AjAk, j,k= 1 j#k
we get
j#k
j#k
Multiplying both sides of (1 18) (up= P $ ) ' - ~and summing over all i, we get
i(5
i=l
j=1
AjPji)(
li
( (F
< 2
j = I
j =1
In view of (3 l), we have
ii
k = 1 P:)'-.
AjpjiJ(
k=l
fi
pkj'-r,
AjpjiJ( k = 1 p$)l-',
0 < r < 1, ( 122)
r > 1.
84
INDER JEET TANEJA
since M
M
j= 1
j = 1
holds always (arithmetic and geometric mean inequality). Multiplying both sides of (123) by C z I Ajpjiand summing over all i, we get
From (122) and (124), we have
Applying Result 4 over (121) and (125), we get the non-negativity of 1 s & ( P I ,P 2 , ..., P M ) and 3 $ ( P 1 ,P 2 , ...,PM), respectively. The nonnegativity of 2 $ ( P l , P 2 , ...,PM) follows from relation (102) and Proposition l(iii). (iii) For "3s(P1,P2,...,PM):By Jensen's inequality (22), we have M
c Ajpji-r
j = 1
[
5
1
( (
0
c r < 1,
j = I
( 126)
Ajpji>'-',
r > 1,
j = 1
v i = 1 , 2,..., n,withequalityiffpj,= ~ ~ l A j p j i , t l i , i . e . , p j i = c j , v i , J , i.e., iff PI = P2 = ... = P M . Multiplying both sides of (126) by (Cf= Akpk;)' and summing over all i = 1,2, ..., n, we get
with equality iff PI = Pz = = PM. Applying Result 4 over the expression (124) and (127), we get the nonnegativity of the measures '3S(P1,P 2 , ..., PM) and '3S(P1,P2, ..., P M ) , respectively. The non-negativity of 233P1,P 2 , ...,PM) follows from relation (107) and Proposition l(iii). Proposition 23. For alf r E (0,w), s E
(-00,
a),we
have
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
85
and
Proof. By Jensen's inequality (22), we have the following three inequalities:
c"
1 -r
c"
j = ] Aj[ i = l Pji( k = ] AkPki)
s- 1
s- 1 < 0; r- 1
1 lor-
5
s-1
]
(s- l ) / ( r - 1)
2
1,
s- 1 lor-<<; r- 1
86
INDER JEET TANEJA
and
s- 1
s- 1 >lor< 0. r- 1
Subtracting 1 , multiplying by (1 - 2'-')-' (1 3 1)-( 133), and simplifying, we get
(s # 1) on both sides of
and
Expressions (134), (135), and (136), along with the continuity of the measures, give (128), (129), and (130), respectively. Proposition 24. For all r E (0, a),s E 3 s dr(P1, - * .
and
Y
&(PI - - PM)2 9
a),we
-PI .-
PM) 2 '9:(P]
3 s
9
(-00,
*
have
9
PM),
(137)
9
PM).
(138)
Proof. In view of (123), we have M
(139) 5
(fi
k=l
r > 1.
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
Multiplying both sides of (139) by
87
cj”=A.jpjjand summing over all i, we get O
r > 1. Inequalities (122) and (140), along with Result 4, give (137) and (138), respectively. The following proposition gives the relation between the measures V S ( P (1 Q)and a3f(P(1 Q) for a = 1 and 2, i.e., only for the two-dimensional case. The extension for the M-dimensional case will be dealt with elsewhere. Proposition 25. For all r
E (O,m),
sE
(-00,
a3XP 11 Q)2 4 ” 9 W 11 Q),
a),(Y = 1 and 2, we have
s2r
> 0,
“ 3 W 11 Q)2 2 ” g W 11 Q). Proof. Using Jensen’s inequality (22), we can write
i.e.,
Similarly, we can write
(141) (142)
88
INDER JEET TANEJA
Adding (143) and (144)' we have
Let
and
Then, from (145), we have
Raising both sides of (146) by (s - l)/(r - l), we get
Again by Jensen's inequality (22)' we have O<-
s- 1 r- 1
s- 1 -?lorr- 1
I1,
5-1 r-1
< 0.
DEVELOPMENTS I N GENERALIZED INFORMATION MEASURES
89
From (147) and (148), we have
i.e.,
I
( r > lr ,- s1
>O
) or (
O
r-1
s- 1 5-1 with -> l o r < 0, r- 1 r-1
( r > l , r' A - 1>O
) (
or O < r < l , -
r- 1
s- 1 s- 1 with -z l o r < 0. r- 1 r-1
Multiplying both sides of (149) by (1 - 2'-')-' 'J,S(P11 Q ) 1 4lR;(P 11 Q),
(s f l), we get s
z r > 0.
Similarly, using (143) and (144) directly instead of (145), we can prove that
*J,"(P11 Q) z 42Rs(P 11 Q),
s2r
> 0.
90
INDER JEET TANEJA
The rest of the proof of inequalities (141) follows from the continuity of the measures with respect to the parameters r and s. In view of (143) and (144), and the fact that A + l 2 '
I -
O
A + l r2 '
A 2 1,
we conclude that Pi
+ 4i
1- r
O
i= 1 1- r
r > 1, and 1-r
O
1- r
r > 1,
respectively. Adding (150) and (151) and dividing by 2, we get O
(152)
r > 1. The inequalities (152) and Result 4 prove the inequalities (142) for a = 1. Similarly, from (150) and (151) along with Result 4, we get (142) for a = 2. Proposition 26. The measures "9S(Pl,..., PM) (a! = 1 and 2), ",$(PI, ..., PM)(a = 1,2 and 3), and "3s(P,,..., PM)(a = 1,2 and 3) are increasing functions of r (sfixed) and of s (rfixed).In particular, when r = s, the result still holds. Some parts of this proposition follow from the next result.
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
91
is monotonically decreasing in r, provided f '(r)2 0, and h monotonically increasing in r, provided f ' ( r ) I 0, where f ( r ) is a function of r, f ' ( r ) = (d/dr)f(r) where W, = ( w j l , w j 2 ,..., win) E rn, with wji > 0 , v i = 1 , 2,..., n ; j = 1,2,..., M . The proof of this result follow similar lines to that of Result 5. The inequality (142) can be seen in Taneja et al. (1991~).
Proof of Proposition 26. Monotonicity of all the measures involved with respect to s follows in view of relations ( l l l ) , (112), and (114)-(117), and Result 3(iii). The monotonicity with respect to r is as follows:
(i) Take wji = pji/cF= &Pki and f ( r ) = r - 1 in (153); we get the monotonicity of the measure ' $ ( P 1 ,P 2 , ...,PM)with respect to r. (ii) Take M
Pi =
c
j,k= 1 j# k
(iii) (iv)
(v) (vi)
M
AjAkPji/
c
AjAk
j.k= 1 j#k
and wi = (p,i/p&i),j # k,f ( r ) = r - 1 in (57); we get the monotonicity of the measure '3S(P1,P2, ...,PM)with respect to r. Take wji = pji/HF=,p$ and f (r) = r - 1 in (153); we get the monotonicity of the measure '3S(P1,P 2 ,...,PM)with respect to r. M Take j = k,pji = pi = cj= Ajpji, Wki = M A,pji/p&i, and f ( r ) = r - 1 in (153); we get the monotonicity of '3S(P1,P2, ...,PM)with respect to r. and f ( r ) = r - 1 Take pi = ZE Ajpji, wi = Ajpji/ny= in (57); we get the monotonicity of '3S(Pl,P2,...,PM)with respect to r. The monotonicity of the measures 2Ss(Pl,P2, ...,PM),*$(P1,P 2 , ...,PM),and 23S(P1,P 2 , ...,PM)with respect to r follows from the relations (96), (102), and (107) and Proposition 8(ii), respectively.
cj=
Proposition 27. For CY = 1 and 2, the measures a$(Pl, ...,PM),"$(PI , ..., PM),and a3s(P1,..., PM)are convex functions of ( P l ,P 2 , ...,P,) E A: for all s 1 r > 0.
92
INDER JEET TANEJA
Proof. Let us consider the probability distributions Pj=(Pjl,~j2r...,Pjn)~An,
Qj=
(4j1,4j2,***,4jn)~An
f o r j = 1,2,. ..,M . We shall prove only the convexity of '$(PI,P 2 , ...,PM), I s &(PI,P2, ..., PM), and I3s(P1,P 2 , ..., PM). The convexity of '$(PI, P2, ..., PM),2$s(Pl,P 2 , ...,PM),and 23s(P1, P2, ..., P,,,,) follows from relations (96), (102), and (107) and Proposition 13, respectively. It is well known (Ferentinos and Papaionnou, 1981) that the measure I p ; q ; - r is convex in (P, Q) for r > 1 and is concave in (PIQ) for 0 < r < 1 . From this we conclude that the measures
and
are convex for r > 1 and are concave for 0 < r < 1 in the distributions (PIP , 2 , ...,PM).This allows us to write the following inequalities:
Pl"yr(PlsP2s * * * ~ P+ M~ 2)9 ' r ( Q I , 5 a'Y,(PIPl
[
2
+ ~ 2 Q 1 ,P2 + ~
Q 2 , . * * , QM) 2 Q 2 ,* * * , P I P+M P~
QM),
O
PI PI PI
(157)
+ P ~ Q ~ ~+PP UI ZPQ Z*~, - * , P I P+MPLZQIM), r > 1,
for all (Y = 1,2, and 3. Raising both sides of (157) by (s - l)/(r - l), we get p ] ["Yr(P],P2,
...,P M ) ] ( s - I)'(r-l)
+ P ~ [ " Y ~Q2( Q .I- -~ Q M ) ~( s - l)/(r9
15 ["'+'r(PIf'l + ~ 2 Q 1 9* - * , P I P+MPZQM)I
[
( r > lr ,- 1e
> O
) (
(s- l ) / ( r - 1) 9
or O < r < l , -s - l < O ) r- 1
1)
93
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
Again using the fact that the functionf(x) = x' is convex in x for t > 1 or t < 0 and is concave in x for 0 c t < 1 , where x > 0, we conclude the following inequalities:
s- 1 >lorr- 1
< 0.
Inequalities (158) and (159) yield pl["Yr(P1,Pz, ..., P&f)](s-l)/(r-l) + PzW'r(Q1, Q z , 5
[
(s- l)/(r- 1) QM)~
(s- l)/(r- 1)
[myr(PIP1+ PZQI * . . , P I P M + PzQM)] 1 >s>r,
+ P z Q i , - . * , P ~ P+MPzQM)] s > r > 1 ors > 1 > r.
9
(160)
(s- l)/(r- 1)
2 ["vr(PlPl
9
Subtracting 1 on both sides of (l60), multiplying by (1 - 21-s))-1(s # 1) and simplifying, we get for (Y = 1 , 2, and 3 as follows:
PM)+ Pzllr(Q1v
PllZr(P1s...s
QM)
*..s
2 'Zr(P1P1 + P Z Q I , . . - , P ~ P+ MP Z Q M ) ,
fil'Jr(P1, 2
..
*
9
PM)+ P Z l J r ( Q 1 s
' J , ( P I P+~~
**
9
(161)
QM)
+
z Q i ,. . . , P ~ P M P Z Q M ) ,
(1 62)
and pi1T(P1,
---
9
PM)+ PzlJr(Q1
--
*
9
QM)
2 'T(P1P1 f P z Q ~ ., - * , P I p M
PZQM),
(163)
respectively, for all s > r > 0, r # 1, s # 1 . In particular, when r = s, the inequalities (160) still hold, which we get directly from (157). The rest of the proof follows from (161), (l62), and (163) and the continuity of the measures with respect to parameters r and s.
Proposition 28. For r E (0, a),s E (-a, a),CY = 1 and 2, the measures "gS(Pl, ..., PM),"$(P1, ...,PM),and "3S(Pl,..., PM) are Schur-convex
94
INDER JEET TANEJA
...,PM)E A,: i.e., ( P l ,..., PM)< ( Q 1 ,...,Q M )implies agS(Pi P2 - - - PM) 5 ngs(QI, Q2 ..., Q M ) , (1 64) *3S(p1,p2 . PM) 5 "$(QI Q2 .- Q M ) , (165)
functions of ( P I ,
9
9
9
9
9
9
and "3S(P1,P2, ..., PM)
5 T ( Q 1 ,
Q2,
...,Q J .
( 166)
Proof. Since ( P I ,P2, ...,PM)< ( Q 1 Q , 2 ,..., QM),then from Definition 6, we can write
and
(E
j = 1 Ajpji](
E
(C c M
k = 1 AkP;Fr)
=
n
j = 1 Aj ? = I
c" i
ai?q?j][ k = l
Ak( t = 1 a i t q t i ) l - r ] ,
(169) f o r a l l i = 1 , 2,..., n. Applying Holder's inequality (23) over the expressions (167), (168), and (l69), summing over all i = 1,2, ..., n, and using the fact that air = 1 , v t = 1 , 2,..., n,weget
i ( c"
i=]
j=1
2
AjPJi)(
k=l lkPkiypr
(5 i(E
t=l
5 ?=I
j = 1 Aj4n>(
M
c
j,k= 1 j#k
Ajqj?)(
k = l Akqkt)l-',
0< <
j=1
AjAkPJiP&r
( 170) k=l
kkqk?)l-r,
r>
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
95
and
F
AjPji](
j = I
Ajqjt](
E
E
k= I
k= 1
respectively. Expressions (170), (171), and (172), along with Result 4, give (164), (165), and (166), respectively, for (Y = 1. The Schur-convexity of the measures 2 s $ ( P I , ..., PM),2$(Pl, ..., PM),and '3;(P1,...,PM)follows from relations (96), (102), and (107) and from Proposition 14, respectively. Proposition 29 (Generalized Data-Processing Inequalities). For r E (0,a), (Y = 1 and 2, we have
s E (-m,m),
"9s (Pl(B),P2(B),
..., PM@))5 " $ ( P I ,
"df(P~(B)v P2(B),
P2 9
9
PM(B))5 "$(Pi P2
9
PM@)) 5 %(Pi
-.-
9
PM),
.-. PM),
and "%(Pi(B),P2(B),
where P,(B) =
rc
.- -
---
9
PM),
~ j i b z i-, - - , Pjibmi) E r A n ,
Pjibli,
i= I
Pz
i= 1
i= 1
v j = 1,2 ,..., M ,
with B = { b i j ] V , i = 1,2, ..., n; j = 1,2, such that bij = 1, v i = 1,2, ..., n.
..., m being a stochastic matrix
The proof follows on lines similar to that of Proposition 28.
IV. UNIFIED (r,S)-MULTIVARIATE ENTROPIES The concept of the generalized entropy measure given earlier needs to be developed for multivariate probability distributions-in particular for bivariate cases, especially for problems of communication that require the analysis of messages sent over a channel and received at the other end. The same is also required in bounding Bayesian probability of error, comparison of experiments, etc. In order to develop this idea, let us Y) taking the values (xi,y j ) , consider a bidimensional random variable (X, i = 1,2, ..., n , j = 1,2, ..., m, with joint and marginal probabilities
96
INDER JEET TANEJA
denoted by
The conditional probability of y, given the value xi of X is denoted by p ( y , [ x i )= Pr(Y = y j IX
=
xi),
v j = I , 2 , ..., m ; i = 1,2, ..., n.
Similarly, the conditional probability of x i , given the value yj of Y, is denoted by p ( x i I y j ) = Pr(X = xil Y = y j ] ,
vi
= 1,2,
..., n ; J = 1,2, ..., m.
Based on these notations, the joint and individual (or marginal) unified (r, s)-entropies can be written as XSV, Y),XS(x), and XXY),
respectively, where XS is as given in (15). As we know, the Shannon’s conditional entropy is given by m
H ( X I Y)
P(yjW(X I Y = Yj), j= 1
where n
H(X IY = yj) = - C
i= 1
xi I yj) logp(xi 1 yj),
v j = I , 2 , ..., m.
There is no unique way to define the conditional generalized entropy. It has been defined in different ways by different authors. We shall specify here four different ways to define the conditional generalized entropy. We shall observe that these approaches in the limiting case reduce to the wellknown Shannon’s conditional entropy. The idea of mutual information has also been generalized for the unified (r, s)-entropy measures. Henceforth, unless otherwise specified, the letters X, Y, Z, ..., etc., will represent discrete, finite random variables.
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
97
A. Different Forms of Unified (r, s)-Conditional Entropies
The unified (r,s)-conditional entropies for a fixed value of a random variable are given by Xs(Y IX = xi) for each i = 1,2, ..., n and X ; ( X I Y = y j ) f o r e a c h j = 1 , 2,...,m. We shall now give four different ways to define unified (r, s)-conditional entropies. The first is based on the natural way, as in Shannon's case. The second and third forms are obtained from some of the expressions appearing in between the first form. The fourth is based on the well-known property of Shannon's entropy. These are given by
98
INDER JEET TANEJA
and
4X;(XI Y ) = XS(X, Y ) - XS(Y), respectively. Also, we observe that 3 X i ( X I Y) = 2X i(X I Y), and 2 X ; ( X I Y) = 1 X i ( X I Y).For r - l = 2 - s, we have ' X ; ( X I Y) = 'XS(X 1 Y ) . In a similar way, we can define the generalized unified (r, s)-conditional and joint entropies for three or more random variables X , Y, Z, ..., etc., such as XS(X, Y, Z), 'WS(X I Y , Z ) , 3CS(X, Y I Z ) (a = 1, 2, 3, and 4).
B. Properties of Unified (r,s)-Conditional Entropies The generalized entropy measures (joint, individual, and conditional) satisfy some interesting properties. These are divided in two subsections, one on bivariate cases and the other on multivariate cases. 1. Bivariate Cases
Proposition 30. For r, s E
(-00,
a),we have
XS(X, Y ) 1 XS(X) or XS(Y).
Proof. In view of inequality (25), we have r > 1,
rn
c
j = 1
for each i = 1,2,
r < 1,
..., n. As cj"= p(xi,y j ) = p ( x i ) ,v i, we have
for each i = 1,2, ..., n. Summing over all i = 1,2, ...,n, we obtain n r n
I
C p(xi)',
r > 1,
C p(xi)',
r < 1.
i= 1
C C P(xi,Yj)'
i=l j=1
L
i =1
Inequality (178) and Result 4 complete the proof.
(178)
99
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
Proposition 31.
We have *Xf(XI Y ) 2 0
(a! =
1, 2, 3, and 4).
Proof. We shall prove this proposition for each value of a separately. For a! = 1 . In this case, we have 1
m
XS(X I Y ) =
C P(yj)XSW I Y = yj)*
j= 1
Since Xs(X I Y = y j ) 2 0 [Proposition l(i)] for j
=
1,2, ...,m, then
'XS(X 1 Y ) 2 0. For a! = 2. In view of inequality (18), we have
i P(xi
2
i= 1
for all j
=
[
r>l,
11,
1Yj)r
r c 1,
1,
1,2, ...,m . Multiplying by p ( y j )and summing over all j, we get
Inequality (179) and Result 4 complete the proof. For a = 3. In inequality (22), take i = j, a, = (If= I Yj)r)l'r, r # 0, and t = r, we get m
Vj,
n
C P O j ) i C= 1 P(xi I YjIr j= 1
I
] ]
l/r r
I[ j ~ l P ( y j ) ( i i ~ p ( x i l y j ) r ) 9
2
[jFl
P(yj)(
iil
P(xi I
o < r < 1, ( 180)
l/r r 9
r > 1 or r < 0.
From (179) and (180), we get
In view of Result 4 and (181), we get 3X:(X I Y) 2 0 for r > 0. For r c 0, the proof follows from Proposition 38(ii), proved later. For a! = 4. It follows immediately from Proposition 30.
100
INDER JEET TANEJA
Proposition 32. IfXand Yare independent random variables, then
XS(X,Y ) = XS(X)+ x;(Y)+
(21-s
-
l)x;(x)x;(Y).
The proof is a simple verification. Proposition 33. For any X and Y, we have
XS(Y) + 2X;(xI Y )+ (21-s - (21-s
-
l)-'(exp,[(l
l)x;(ryx;(x I Y) - s)(~f(Y) + X:(XI Y)I
-
11.
Again, the proof is a simple verification. Proposition 34.
We have
Proof. It is an immediate consequence of inequality (180) and Result 4. Proposition 35.
We have
Proof. In view of Jensen's inequality (22), we have
Subtracting 1 on both sides of (182), multiplying by (2'-' - l)-' (s # l), and simplifying, we get the required result. Proposition 36.
We have IX S ( X )
+ 'X:(Y1 X),
XS(X I Y ) 2
x~(x) + 'x~(Y 1 x),
s-
1
s-
1
s-
1
-111rr- 1 r - 1' s-
1
r515r-1 r - 1'
For proof, refer to Rathie and Taneja (1991).
101
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
Proposition 37.
We have I‘ X s ( X I Y ) ,
“s(X
IY) 2
‘x~(Y 1x1,
s-
1
-111rr- 1 s- 1
r-
r - 1
s-1
r - 1’ 1 r - 1’
s-
IlI-
The proof follows immediately from the definition of 4Xs(XI Y) and Proposition 36.
Remark 4. The conditions (s - l ) / ( r - 1) I1 I r((s - l)/(r - 1)) are equivalent to either r 2 s z 2 - l/r z 1 or r < 0, s 1 2 - l/r, and the conditions (s - l)/(r - 1) 2 1 L r((s - l ) / ( r - 1)) are equivalent to either 1 2 r ~ s 2 2 -l / r , r > O o r s s r < O . Proposition 38.
(i)
W
for a (iii)
We have
X IY )
=
i
1
I
Xs(X),
r > 0 with s L r or s L 2 - -, r
2
Xs(X),
r < 0 with s I 1 or 1 < s s 2
r
2 and 3;
‘X~(X 1Y)
IX s ( X ) ,
2
The equality sign holds
X:(X),
iff X
1
r z s L 2 - - L 1, r sI r
< 0.
and Y are independent random variables.
ProoJ (i) It follows in view of Proposition 34. (ii) In view of Proposition 34, it is sufficient to show for Minkowski’s inequality (24), we can write rn
c
1
- -;
j = I PQj)(
01
= 3. Using
it,
P(xi 1 Yj)‘>’”
r > 1,
r< l,r#O,
102
INDER JEET TANEJA
i.e.,
i.e., f
n
with equality iff p(xi I y,) = p ( x i ) for every i = 1 , 2 , ...,n and j 1 , 2,..., m . The rest of the proof follows from Result 4. (iii) This part follows in view of Proposition 37 and part (i).
=
As a consequence of Propositions 34-38, the following proposition holds: Proposition 39. (i) For s 0
IX S ( X ) I
(ii) For s 2 r
0
I
r < 0, we have
I
3X3X I Y ) I 2X;(X I Y ) I'XS(X 1 Y ) I";(xI Y ) .
< 0 , we have
' X ; ( X I Y ) I 2XS(X I Y ) I3XS(X I Y ) I X;(x).
(iii) For r t s t 2 - l / r 2 1, we have 0 I";(X
IY) I
'XS(X I Y ) 5 XS(X).
(iv) For r < 0, s 2 2 - l / r , we have
2. Multivariate Cases
In this subsection, we shall extend the results studied in Section IV, B, 1 to three or more random variables. Proposition 40.
We have
" X ; ( X ,Y I Z ) t "XS(X I Z) or " X : ( Y 1 Z),
(a, =
1 , 2, 3, and 4).
DEVELOPMENTS IN GENERALIZED INFORMATIONMEASURES
103
Proof. From inequality (25), we can write
for every i = 1,2, ..., n and k = 1,2, ..., 1. As Cy= p ( x i ,y j I zk) = p(xi I zk) for every i = 1,2, ...,n and fixed k, we have r > 1, r < 1, for fixed k and every i. Summing over all i = 1,2, ..., n, we have
for all k = 1,2, ..., I . For 01 = 1 . Applying Result 4 over expression (184), we have
x;<x,Y I 2 = Z k ) 2 x ; ( X 12 = Z k ) .
(185)
Taking expectation with respect to 2 on both sizes of (185), we get ' X S ( X ,Y I Z ) 2
'x;(xI 2).
For 01 = 2. Multiplying (185) by p(zk),summing over all k = 1,2, ...,I , and applying Result 4, we get 2
x;<x,Y 1 2) 2 *XS(XI Z ) .
For 01 = 3. Raising both sides of (185) by l/r, multiplying by p(zk), summing over all k = 1 , ...,I, and applying Result 4, we get
3x;(x, Y 12) 2 3x;(x 12). For
01 =
4. By definition, we can write
x;(x,Y , 2) = x ; ( X ,Z ) + =
X;(Z)
Y I x,Z )
+ 'XS(X I Z ) + 'X;( Y I X , 2)
(186)
104
INDER JEET TANEJA
and
XS(X, Y, 2)
=
XS(2)
+ “S(X,
Y I 2).
(187)
Y I x,2).
(188)
Comparing (186) and (187), we get
4X;(x, Y I 2) = 4X3X 1 2) +
“x:(
Proceeding on lines similar to Proposition 30, we can show that
XS(X, Y, 2) 1 XS(X, 2).
(189)
From (186) and (189), we conclude that
Again from (188) and (190), we conclude the required result. Applying mathematical induction over relation (186), we have the following proposition. Proposition 41.
We have
Proposition 42.
We have
1 r < 0 withs < 1 or 1 < s < 2 - - ; r
for (Y
=
2 and 3 .
The equality sign holds in (i) and (ii) iff X and Yare independent given 2, i-e.9 iff P(xi I z k ) = P(xi I Y j , z k ) or P(x;,Y j I ~ k =) P(X; I z k ) .P ( Y ~1 z k ) , V i,j , k.
105
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
Proof. For r
#
1 , s # 1, we have
I
r
=
P(xi
2HScx I Y , Z ) - ('I-'
-
I)-'[[
E
j = 1
P(yj, z k )
i
i= 1
P(xi I y j , z k ) r
]
0- W V -
1)
-
1]
.
(193)
106
INDER JEET TANEJA
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
107
Thus,
with equality iff p(xi 1 y j , z k ) = p(xi 1 z k ) , v i , j , k , i.e., iff X and Y are independent given 2. The rest of proof follows from Result 4. Proposition 43.
We have
+ lX;(z I X , Y ) ,
s- 1 s- 1 -< l s r r- 1 r- I'
z IX;( Y I X ) + 'X;(Z I x,Y ) ,
s- 1 s-1 -> 1 2 r r- 1 r - 1'
5 'XS( Y I X )
' X s ( y , 7 I \'k I
The proof follows along the lines of that of Proposition 36. C. Unified (r, s)-Mutual Information
In this section, we present six different ways to define the unified (r,s)mutual information. Let us consider
"S:(XA Y ) = X s ( X ) - " X s ( X I Y ) ,
a
=
1, 2, 3, and 4, (195)
and
and a&s(.
A .) =
H,?(.)- "Hs(. I .), etc.
108
INDER JEET TANEJA
We call the measures Y S ( X A Y ) ( a = 1, 2, 3, and 4) the unified (r, s)mutual information among the random variables X and Y. The measures "Ss(X A Y I Z ) (a = 1, 2, and 3) we call the unified (r, s)-mutual information among the random variables X and Y given Z. Proposition 44.
We have
(i) "SS(X A Y ) 2 0 (resp. 5 0) (a = 1, 2, 3, and 4), with equality and Yare independent. (ii) "SS(X A Y I Z ) L 0 (resp. I0 ) (a = 1, 2, and 3) with equality and Yare independent given Z.
iff
X
iff
X
The preceding property holds under the follo wing conditions: (c,) Fora = 1, r > 0 withs 1 r o r s 1 2 - l/r(resp. r < 0 withs I1 or 1 < s I2 - l/r; (cz) For a = 2 and 3 , r > 0, s E ( - m , m ) (resp. r < 0 , s E (-00,m)); (cJ For a = 4, r 2 s 2 2 - l/r L 1 (resp. s Ir < 0) (only forpart (i)).
Proof. Part (i) follows from Proposition 37, and part (ii) follows from Proposition 42. Proposition 45.
"ss(X A
We have
z) + "sS(X A Y I z) = * s s ( X A
Y ) + "ss(X A
(a = 1, 2, and
3).
z IY), (198)
Proof. For a = 1, 2, and 3, we have
"%s(x A z) + "ss(X A Y I z) = X:(X) =
- "Xs(X I z) + "Xs(x I z)
"XS(X 1 Y , Z)
X S ( X ) - "XS(X I Y , Z )
"ss(XA Y ) + "Ss(X A z
1 Y ) = X S ( X ) - "XS(X I Y ) + "XS(X I Y ) - "XS(X I Y,Z ) = XS(X) - "XS(X 1 Y , Z ) =
"ss(X A Y , 2).
Combining (199) and (200), we have the required result. The relation (198) is famous as the additive property.
(2Oo)
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
109
For simplicity, let us write P ~ =Y l
PX x PY = lP(Xi)P(Yj)J,
and
~ ( ~ i , ~ j ) l
for all i = 1,2, ..., n ; j = 1,2, ..., m. We have
F(x A Y ) = H ( X ) - H ( X I Y )
where D(P 11 Q) is as given in (10). We can also write m
where
for a l l j = 1,2, Thus,
..., m. m
F(x A Y ) = DVxY II Px x PY) =
C P(YjP(Px1Y =j II Px).
j= 1
The two other ways to define the unified (r,@-mutual information are given by rn
“%S(XA Y )
with equality (ii)
iff X and
I
2 0,
r > 0,
I 0,
r < 0,
Yare independent.
110
INDER JEET TANEJA
Proof. Part (i) follows from Proposition 1. We shall prove only part (ii). In view of Jensen's inequality (22), we have (s- I)/('-
r - 1
1)
< 1,
s-1
s-1
>lor< 0, r-1
i.e.,
r - 1
s-
1
<
1,
s- 1 < 0. r - 1
> 1 or-
Subtracting 1 on both sides of (203), multiplying by (2'-' - l)-' (s # l), and simplifying, we get the required result.
V. APPLICATIONS This section deals with the applications of the unified (r, s)-information and divergence measures presented in the previous sections. Most of the applications are in statistical information theory, such as Markov chains, comparison of experiments, error bounds, and connections with Fisher's information measure.
111
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
A. Markov Chains
In this subsection, we shall apply the generalized mutual information measures given in Section IV,C to relate them to Markov chains. We shall consider only the positive values of the parameter r, since for negative values the generalized mutual information becomes negative. First, we shall present the concept of Markov chains.
Definition 7. A sequence of random variables X 1 , X 2 ... , forms a Markov chain denoted by X,BX,B .-.if for every i , the random variable Xi+ is conditionally independent of ( X ,,X , , ...,X i - , ) given X i . The randon variables X , Y , and Z form a Markov chain, i.e., xeyez, iff "$(XAY ~ Z=) 0 (a = 1, 2, 3 , 5 , and 6).
Proposition 41.
Proof. Since ( X , Y , Z ) forms a Markov chain, then by definition, we have vi,j,k,
~ ( ~ l x i ,= ~ ~ (jz k ) Ixi),
i.e., AXi
Yj I z k )
= P(xi
I Yi IAxi I ~ k ) ,
v i , j , k.
Thus, in view of Propositions 44(ii) and 46, we get the required result. Proposition 48.
We have
(i) If XBYBZ, then *ss(X A Y ) I"ss(xA Y ) or "gs(Y A
z),
(a = 1, 2, 3, 5 , and 6);
(ii) If XB YBZB W , then "9:(X A W ) I"$(Y A Z ) ,
(a = 1, 2, 3, 5 , and 6),
under the following conditions: (c,) f o r a = 1, s 2 r > 0 o r s 2 2 - l / r > 0; (c,) for a = 2, 3, 5 , and 6, r E (0,oo) and s E
(-oo,oo).
Proof of Proposition 48. (i) In view of Propositions 44 and 45, we have "$(X A Z )
+ "$(X
A
Y I Z ) = "$(X
A
Y),
(204)
(a = 1, 2, and 3).
Expression (204) and Proposition 44(ii) complete the result for a and 3.
=
1,2,
112
INDER JEET TANEJA
In Proposition 15, take P = Pxy, Q = PxXPy:
and use the fact that p(zk 1 x1.’ yI. .) = p(zk I x i ) , v i, j , k , we get the result for a = 6. Similarly, we can prove it for a = 5. (ii) Since X, Y, Z, and W form a Markov chain, then X,Y, and W and Y, 2,and W also form Markov chains. Applying part (i) over the two subMarkov chains, we get the required result.
B. Comparison of Experiments
Let gX = (X, j l x , Pe ; 8 E 0)denote a statistical experiment in which a random variable or random vector X defined on some sample space ‘X is to be observed, and the distribution Pe of X depends on the parameter 0, whose values are unknown and lie in some parameter space 0.We shall assume that there exists a generalized probability density function f (x I 0) for the distribution Pe with respect to a a-finite mesurep. Also, let E denote the corresponding marginal generalized probability density function (gpdf) given by
f(x)=
s
e
f(x I 6 ) 4.
Similarly, if we have two probability distributions sponding gpdf’s are
fi(x) =
1
e
f(xI0)dri,
t l ,tzE E,
the corre-
i = 1,2.
In this context, the relative information, the information radius, and the J-divergence are given by the following:
Relative information :
113
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
Information radius :
J-divergence:
In a similar way, we can write the corresponding unified (r, s)-measures in the integral forms, such as
:Xs(r,
I( r2) = ,9;(<, [It2)
1 1
II (2) = a x 1 II t2) :XXl II r 2 ) = X2SXl II r 2 1 ,4XX, II r 2 1 = :3sG II (2) X5XXl II t 2 ) = X23Xl II (2) X2XXl
[unified (r,s)-relative information],
[unified (r, s)-information radii],
[unified (r,s)-J-divergences].
Consider two arbitrary experiments gX = ( X ,BX,Po;B E 0 ) and gY = (Y, B y , Qe;0 E 0)with the same parameter space 0. Let E denote the class of all prior distributions on the space 0. We shall assume that there exist gpdf's f(x 1 0) and g(y I 19)for the distributions Po and Qe,with respect to some 0-finite measures p and v, respectively. Given two prior distributions E E,let fi(x) denote the marginal gpdf:
r,, r2
1
.e
f(xIe)dti,
i = 1,2,
and let ;Xs (t,( 1 &) ((Y= 1 , 2 , 3 , 4 , and 5 ) denote the generalized divergence measures of information contained in gX for discriminating between fl(x) and f2(x). In this context, we give the following definition:
Definition 8. We ,Say that the experiment gX is preferred to experiment g Y , denoted by gX 2 g Y , if and only if ;wr1
II r 2 ) 1 %s(r, II (2)
for all r l
Y
r2
E
=.
We say that experiment gX and g Y are indifferent, denoted by tix if and only if gX gY and g Y g X .
tiy,
114
INDER JEET TANEJA
Some studies in this direction, including Bayesian and Lehmann approaches, have been undertaken by Pardo et al. (1993a, b), Taneja (1987), Taneja et al. (1991a, b), and Morales et al. (1991). Based on the preceding definition, the following proposition gives interesting properties for the unified measures ,"Xs((, 11 (*) (OL = 1, 2, 3, 4, and 5).
Proposition 49. (i) Let g y be any experiment and g N be the null experiment (i.e., the distribution is independent of 8 a.e. p), then gX i gN. (ii) Given the compound experiment (gX,g Y ) , where gX a f d g Y are the corresponding marginal experiments. Then (gX,g Y ) 2 gX (or g X ) , with indifference i f f f ( y I x , 8) is independent of 8 [respectively f ( y I x , e)] is the conditional gpdf of Y , given X = x and 8 E 0. (iii) Let be the resulting experiment after observing gX n times; then (n) > g ( n - 1 ) . gx - x (iv) Let = ( X ,x , f ( x I 8); 8 E 01 be an experiment and [EiIi be a measurablepartition of x . Let us consider another experiment g Y = ( Y , c, f ( x I 8); 8 E 01 with the o-algebra generated by and with Qe(Ei)= jEi f ( x I 8) dp(x), v i E N. Then gX gY with indifbased on the experiment &$), if it is verified that ference iff T = (&b'yR) &$) gT with indifference iff f ( x I 8) is independent of 8 for almost every x . (v) For all statistics T,= T(&$'))based on the experiment &!$", it is verified that &$!)' 1 g T with indifference i f f T is a sufficient statistic.
9)
e
Now we give the relation between the preceding criteria and Blackwell's criterion. Blackwell's (195 1) definition of comparing two experiments states that experiment gX is sufficient for experiment g Y , denoted by gX Ig Y , if there exists a stochastic transformation of X to a random variable Z ( X ) such that for each 8 E 0 the random variables Z ( X ) and Y have identical distributions. By g Y = ( Y , P y ,Q e ;8 E 01, we shall denote a second statistical experiment for which there exists a gpdf g(y 18) for the distribution Q with respect to a a-finite measure v. According to this definition, if g Y 2 g Y , then there exists a non-negative function h satisfying (cf. DeGroot, 1970, p. 434):
and
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
1 15
rl, r2
If we have two prior distributions E 0, after integrating over 0 and changing the order of integration in (205), we get
gi(y) =
h(y Ix).h(X)dp,
i
= 192.
SX
Let I be any measure of information contained in an experiment. If gX 2 8, implies I, L I,, then we say that gX is at least as informative as gU in terms of measure I. Goel and DeGroot (1979) applied it for relative information. Ferentinos and Papaioannou (1982) applied it for a-order generalizations of directed divergence. Taneja (1987) extended it to various generalizations of the J-divergence measure having one and two scalar parameters. According to this approach, the result for the unified measures ,“XS(r, 11 (a = 1, 2, 3, 4, and 5) are summarized in the following proposition.
r2)
, “ ~ s ( r11 ,r2)2 ;xs(r, 11 r2)( a = 1, 2,
Proposition 50. If gx 2 &, then 3, 4, and 5) for every E E, 0 < r
(,,r2
c 00 and
-00
c s c 00.
The preceding proposition can be proved based on the inequalities given in Proposition 15. For more details refer to Taneja (1987) and Taneja et ai. (1991b). For more studies on comparison of experiments, refer to Pardo et ai. (1993a, b). C. Connections of Unified (r,s)-Divergence Measures with the Fisher Measure of Information
Consider a family 5 = (Po; 0 E 0) of probability measures on a measurable space (X, /Ix) dominated by a finite or a-finite measure p. The parameter space 0 can either be an open subset of the real line or an open subset of an n-dimensional Euclidean space I?“. Let f (x I 0 ) = dP,/dp. Then the Fisher (1925) measure of information is given by if 0 is univariate
r,F(e)=
a
-log f (x
a
I
I e) aej log f(x I e)
,
if
e is n-variate
llnxn
where 11 ( I n x n denotes an n x n matrix and Eo denotes expectation with respect to f ( x I e). Some studies towards Fisher information measure applying differential geometric approach have been successfully carried out by Rao (1945, 1973,
116
INDER JEET TANEJA
1987). Atkinson and Mitchell (1981), Burbea and Rao (1982a), Amari (1984, 1985), Cuadras et al. (1985), Campbell (1985, 1986, 1987), Burbea (1986), Burbea and Oller (1988), Cuadras (1988) etc. A direct approach has been undertaken by Kagan (1963), Vajda (1973, 1989), Aggarwal (1974), Boekee (1978), Ferentinos and Papaioannou (1981), Taneja (1987), and Salicru and Taneja (1993). 1 . Csiszar's +-Divergence and the Fisher Information Matrix
For f , ,f 2 E
r, the CsiszAr's
+-divergence (CsiszAr, 1967) is given by
with +(1) not necessarily zero and +(x) a continuously differentiable, non-negative real function. Following Kagan (1 963) and Ferentinos and Papaioannou (1 98 l), we define
Then Csiszar's parametric matrix is given by where 8 + tei, 6 + tej E 0,i, j = 1 , 2, ..., k and ei = (l,O, ..., 0 ) , e2 = (0, 1 , 0 ..., 0 ) , ..., ek = ( O , O , ..., 1) are the unit vectors. Suppose the following conditions hold: f(x,$) dp<m
f o r a l l $ ~ O a n d i , j =1 , 2,..., k.
(d) The third-order partial derivative of f ( x , 8) with respect to $exists for all 19 E 0 and all x E X.
Based on these considerations the following proposition holds. Proposition 51. If the conditions (a)-(d)are satisfied, then for aN 0 E 0, we have
r;(s)
fl z;(s), 2
if 8 is univariate, (209)
=
4
[S$(0)
+ I:(@],
if 6 is k-variate,
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
1 17
and
Proof. Considering MacLaurin's expansion for the function
and after some algebraic simplifications, we obtain
Thus we get, for the Csiszar information matrix,
In the univariate case, we have
Similar kinds of results can be seen in Aggarwal(1974), Kagan (1963), and Ferentinos and Papaioannou (1981). 2. Unified (r,s)-Divergence Measures and Fisher's Information Matrix Before we shall present the relationships of the unified (r, s)-measures with Csiszar's +divergence measure: (i) For Ss(P )I Q): We can write
Ds(P II Q)= h(D+((PII Q)- 4(1)), where
4(x)
= XI-',
r
# 1,
118
INDER JEET TANEJA
1 19
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
Let us suppose that the following regularity conditions are satisfied: (i) f ( x l e) > 0 for all x E X , 8 E 0; (ii) @/Mi) f (x 1 0 ) exists for all x E X,all 0 E 0,and all i = 1,2, ...,n; ( 5 ) for any A E pX, ( W a d i ) f (x I 0) dp = j A f ( x I 0) dp for all i. Define "3Cs(e) = lim inf "X;( f ( x 10) 11 f ( x I 0 AB-0
+ A@),
a
=
1 , 2 , 3 , 4 , and 5,
where "X's (a = 1 , 2, 3, 4, and 5 ) are the corresponding unified (r, s)divergence measures as specified in the beginning of Section V,B. Then the following theorem holds:
Proposition 52. Let 0 be univariate and let the regularity conditions of the Fisher information measure be satisfied. Also, suppose that
for all B E 0 and that the third-order partial derivative o f f (x I 0) with respect to 8 exists for all 8 E 0 and f o r all x E X . Then I
r x;(e)= -I;@), 2
2
s x,(e) = 3x;(e) = rz;(e),
4
r x,(e) s = 'x;(e) = -I;@),
and 8
for all 6 E 0 , O < r < 00, --m < s < 00. The proof is based on the following result.
Result 15. Let G,h(f,> IIfi) = h ( D , ( f , IIfd
-
+(I)),
where h is a continuous differentiable real function with h(0) = 0 , and D,( f l 11 f 2 ) is given by (206). Suppose the conditions (a)-(d) are satisfied. Thenfor 0 E 0,we have
c,hu;((e))= h'(o)&e),
(21 1)
where G , h ( I W ) = I1Gmh(G(e)) IIk x k
9
120
INDER JEET TANEJA
with
f ( x ,8 + tei) + f ( x , 8 + tej)
G$(l:(8)) = lim inf
2
and I:(@ is the Csiszir’s information matrix given by (208).
froof. Applying L’Hospital’s rule we can easily check that G$(Z;( 8)) = h‘(O)Z;(O).
This gives
q(Gm= IlG$(Z;(e))llk
xk
=
~ ~ h f ( o ) z ~x (k @ ~ ~ k
=
h’(O)l11;(8) h ’(O)Z,C( 8).
=
Ilk x k
This completes the proof of the result. Proof of Proposition 52 follows from Proposition 51, Result 15, and the relationships of the unified (r, s)-divergence measures with Csiszhr’s +divergence measure given by (i)-(v). For more details refer to Salicru and Taneja (1993).
D. Unified (r,s)-Divergence Measures and the Probability of Error Suppose we have n pattern classes X = ( x l ,x,, ..., xn) with a priori probability pi = Pr(X = x i ) , i = 1,2, ..., n, ] p i= 1. Let the feature y on Y have a conditional probability density function p ( y I x i ) , i = 1,2, ...,n. We assume that pi and p ( y I x i ) are completely known. Given a feature y on Y, we can calculate the conditional a posteriori probability p(xi I y ) for each i by the Bayes rule: PiPO I xi ) i = 1 , 2,..., n. = 1 PkP(Y I X k ) ’ If we consider the Bayes decision rule, which chooses the hypotheses (pattern classes) with the largest posterior probability, then the partial probability of error for a given Y = y is expressed by
p(xi I y ) = Pr(X = xi I Y
=
y)=
c;
P(e I Y ) = 1 - m=IP(xl I Y ) ,P ( X 2 I Y ) , - * P(X* I Y ) l Prior to observing Y , the probability of error, P(e), associated with X is defined as the expected probability of error, i.e., 9
P(e) = EY(p(eIy))=
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
121
where p ( y ) = Cy= I p i p ( y I x i ) is the unconditional density of Y evaluated at y . In recent years, researchers have paid attention to the problem of bounding this probability of error for two- or multiple-class problems while taking some information, divergence, and distance measures into consideration (Kanal, 1974; Toussaint, 1974; Chen, 1976; Boekee and Van der Lubbe, 1979;Devijver and Kittler, 1982;Taneja, 1985, 1989; Lin, 1991). Our aim here in this section is to give bounds on the probability of error in terms of some generalized divergence measures involving two scalar parameters. Some relations and bounds with Chernoff's symmetric measure and Bhattacharyya's distance are also considered. Some of the results in this section can be seen in Taneja et af. (1993). 1. Unified (r,s)-I-Divergence Measure and Generalized Conditional Entropies
In this subsection we shall connect the unified (r,s)-I-divergences given in (95)with the generalized entropies studied in Section 111. We have taken all Ai's to be equal. For simplicity, let us denote P;, the probability density function p ( y I x i ) for each i , where
For all r E (0,m) and s E (-00,
I
=
(1
-
m),
we have
122
INDER JEET TANEJA
with 1 s
Zr(X;Y ) = ( 1 - 2l-7-I ) l -r
(s- l ) / ( r - 1 )
r # 1,sz 1.
(214)
Similarly to (214) we can also alternatively consider the following generalization of I-divergence: 2 s
ZJX; Y ) = (1 - 2'-9-l
for all r E (0,m) and s E (-m,m)with r # 1 , s # 1 . In the extended version, the preceding measures are written as "Ss(X;Y ) = CE("Z;(X;Y ) I r # 1, s # l ) ,
(a! =
1 and 2). (216)
It is understood that all the integrals involved exist. Let us consider a more general form of the measure (215) given by
(s- l ) / ( r - 1)
xk)>"dy]
- 11.
r # 1 , s z 1,r>0, with 'S:(X;Y 1) P ) = CE('Z;(X; Y 1) P ) I * f 1 , s # l ) ,
where P = ( p l ,p 2 , ...,p , ) is the prior probability distribution associated with X . In particular, when p i = l/n, v i, i.e., P = U = ( l / n , l/n, ..., l/n), we have 2S;(X;Y 11 U )= ZS;(X;Y ) . (2 19) The following propositions hold: Proposition 53. For all r E (0,w) and s E
(-00,
w),
we have
2SS(X;Y 1) P ) = n"-"x;(U) - 2 x ; ( X1 Y ) ] ,
(220)
123
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
where X s ( U ) is as given in (15) f o r P (173) for a = 2 in the integral form.
=
U and ' X s ( X I Y ) is the same as
froof. In view of continuity of the expression (218) with respect to the parameters r and s, it is sufficient to prove the result only for r # 1, s # 1. By using the relation p ( y 1 xi)pi = p(xi 1 y)p(y),V i = 1,2, ...,n, y E Y , we can write
for all r E (0,00)and s E ( - 0 0 , 0 0 ) with r # 1, s # 1. This completes the proof of the proposition. Proposition 54. For all r E ( 0 , a ) and s E
(-00,
a),we have
'$(x; Y ) = n"-'[xs(U) - ' x ~ (I Yx) ]
(221)
and
Proof of (221) follows from (219) and (220). Proof of (222) follows from (221) and (128). Proposition 55. For all r E (0,00)and s E (-00, a),we have
and
where ' X s ( X I Y ) is the same as (173) for a = 1 in the integral form. Proof of (223) follows from (220) and Proposition 35. The proof of (224) follows from (223) and (219).
124
INDER JEET TANEJA
2. Unified (r, s)-I-Divergence Measures and the Probability of Error In this subsection, we shall give upper bounds on the probability of error in terms of unified (r, s)-I-divergence measures given by (216). Some Fanotype bounds are also given. Proposition 56.
We have the upper bounds P(e) I+ [ x ~ ( un'+) 2Ss(X;Y 11 P ) ] ,
(225)
P(e) I + [ x ~ ( un'-S ) 'Ss(X;Y ) I ,
(226)
and under the following conditions on r and s: (i) (0 < r < 2, s > 2) or (r > 2, s (ii) 0 < r I2, 1 Is I2; (iii) r > 2, 1 < s < 2, P(e) L 3.
Proof.
> 0) with P(e) Ii;
We know that (Taneja, 1989)
P(e) 5 + ' x ~I Y ( x)
(227)
under the following conditions: (c,) r > 0, s > 2, P(e) I+; (c2) (0 < r < 2, 1 I r I 2) or (0 > s 2 2 - l/r, r > 0) or (0 < r O C S S 1,szr); (c3) r > 2, 1 Is I2, P(e) 2 +; (c4) 0 < r < 1, 0 < s < 1, s 1 r, P(e) 2 i; (c5) s 5 0 < r, s s 2 - l/r, p ( e ) 2 3.
I1,
The rest of the proof of (225) follows from (227) or (223), where the conditions (i), (ii), and (iii) follow from (cl)-(c5), by using the fact that s L r > 0. The proof of (226) follows from (227) and (224). Proposition 57. For all s
L
r > 0 , we have the following bounds:
2 ~ s ( ~Y; 11 P ) 2 n"'[Xs(U)
-
X:((P(~))]
(228)
and 2 s
9 , ( ~ Y; ) 2 ns--'[Xs(U)- X s ( ( ~ ( e ) ) ] ,
where
for all r E ( 0 , ~and ) s E (-00,oo).
(229)
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
125
Proof. We know that (Taneja, 1989)
'WXI Y ) 5 W P ( e ) ) ,
(23 1)
for all s 2 r > 0 or s 1 2 - l/r, r > 0, where Xs(P(e))is as given by (230). The proof of (228) follows from (231) and (223). The proof of (229) follows from (231) and (224). 3. Generalized Chernoff Measure, Bhattacharyya Distance and the Probability of Error
For an n-probability density function Pk (i = 1,2, ..., n), let us consider the following measure:
where p ( y I x,)'p(y I xj)'-'dy,
C~CX Y ;) =
r > 0.
(233)
I x,))~'' dy.
(234)
l Y
In particular, when r
=
e, we have ( p ( y I xi)&
B,(X; Y ) = C ; y ( X ;Y ) = IY
The measure (234) is known as the Bhattacharyya distance or affinity (Matusita, 1967). The measure (233) is known as the Chernoff measure (Chernoff, 1952). We call the measure (233) the generalized Chernoff measure. Let us write the measure (232) and (233) in a more general form:
izj and r
repectively . In particular, whenp,
=
l/n, V i , i.e., P = U,we have
1 S'(X, Y 11 U)= -S'(X; Y )
n
(237)
126
INDER JEET TANEJA
and 1
11 U )= -nB i j ( X ; Y ) ,
B,(X;
i#j,
respectively. Simplifying the expressions (235)and (236),we get
and B i j W ; Y II
=
M X i
I Y)P(xj I Y))"*P(Y) d
~ , i +j ,
!Y
respectively. The following proposition gives the relations between the probability of error and the symmetrized Chernoff measure. Proposition 58.
We have
i
1 n
I -S'(P(e)),
S'W; y II P )
2
and S'(X; Y )
1 - S'(P(e)), n
IS'(P(e)),
2
S'(P(e)),
where
+-nn -- 12 P(e),
r > 0.
0
I1,
(239)
r
2 1,
0 < r 5 1, r L 1,
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
127
Using Jensen's inequality (22), we have the following:
"
C
i=l i = ko
and
1 --(xi
n
- 1
I YY
I
1 -P(xiIY)
i = ko
i = ko
Expressions (242), (243), and (244) together give
1 ,
0
< r s 1, (243)
128
INDER JEET TANEJA
is concave in p for 0 < r < 1 and is convex in p for r > 1 . This gives
Thus, (245) and (246) together give (239), while (240) follows from (237) and (239). The following proposition gives relations between the symmetrized Chernoff measure and the Bhattacharyya distance. Proposition 59.
We have
and
i#j
I
1 i#j
2(1
-
r) 2 1 or 2(1
-
r) I0,
129
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
I
<
i#j
2(1 - r) 21 or 2(1 - r) I0, (249)
1 / 2 2(1-r)
n(n 1- 1)
]
i p y - l [ jY(PiP(Y Ixi))(hP(Y PiAY '*I) I
xi)
i,j=l
i t j
0
I2(1 -
,
r) 5 1.
Simplifying the expression (249), we get the result (247). The result (248) follows from (247), (237), and (238) by taking P = U. 4. Unified (r, s)-J-Divergence Measures and the Probability of Error
Let us consider the following unified (r, s)-J-divergence measures for the pair of distributions X and Y :
"33X;Y ) = CE("J,"(X;Y ) :r
# 1, s # 1)
(a = 1 and 2),
(250)
where 'J;(X;Y ) = (1 - 21-s)-' X
n(n - 1)
[ i [1
](s-WV-l)
Y
i,j=l
lxi)b(Y Ixi)l-rdY
- 1) 7
i#j
(25 1)
and 2 s
J r ( X ;Y ) = (1
-
2'-')-'
x [n(n
1)
i,i
jYP(Y Ixi>b(Y1 X i ) l - r
dl
i#j
1)
- 117
(252)
for all r E (0,m) and s E (-m,m) with r We can also write 1
where a>;(&
0- O W -
1) P i ) is as given by (213).
#
n
1 and s # 1.
130
INDER JEET TANEJA
Let us write the measure (252) in a more general form:
2Js(X;y I1 P ) = (1 - 21-"-1
x
[[2 niS 1 i,j=1
(s- l)/(r- 1)
Y (PiP(Y
I xi))'(PjP(u ~ X i ) ) l - r d Y ]
i#j
r#
1,SZ
- 1)
9
1,r>0.
(253) After simplification, the measure (253) stands as follows: Y 11 P)]@-l)'@-l) - 11,
2J,"(X;Y 11 P ) = (1 - 2l-"-'([ns'(X;
r f
1,Sf
l,r>O,
where S'(X; Y I( P) is as given by (235), with
23:(X;Y 11 P ) = CE12J,"(X;Y 11 P ) 1 r # 1, s # 11,
(254)
and in particular
In particular, when p i = l/n, v i , i.e., P = U,we have 2$(X; Y 11 U )= 23S(X; Y).
(257)
An alternative way of generalizing the measure (257) by considering different generalizations of (256) is as follows: 1
33;(x; y II P ) = n-l
s
Y
33stYlP(Y)dY,
where 3$(~)
=
CEI3J,"(y)I r # 1, s # 11,
YEY
(258)
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
131
with
i#j
r # 1 , s f 1,r>0.
The following proposition holds: Proposition 60. For all r E (0,OQ) and s E (-
we have
00, a),
Proof follows by applying Jensen's inequalities (22). Based on these considerations we have the following proposition. Proposition 61.
The following bounds hold:
(i) ' 3 ; ~ Y; II p ) 2 3s(P(e)); (ii) 3 3 s ( ~Y;11 P ) 2 $(P(e)), s 2 r > 0; (iii) ',$(x;Y ) 2 J;(P(e)); (iv) '$(A'; Y ) 2 $(P(e)), s 2 r > 0;
where
d;(P(e)) = C E M W e ) )I r
#
1, s # 11,
with J,S(P(e))= (1 - 21-~)-'l[~r(~(e))](s-1)'(r-') - 11,
r
#
1, s # 1, r > 0.
froof. Part (i) follows from (239). Part (ii) follows from (259). Part (iii) follows from (257) and part (i). Part (iv) follows from (129) and part (iii). The following proposition relates the unified (r, s)-J-divergence measure (250) for a = 2 with the Bhattacharyya distance (234). Proposition 62. For all r E (0,00) and s E
a;(& PI,
0) '3;(x; y II P ) 2 (ii) '3;w; Y ) 2 d m ,
(-00,
a),we have
132
INDER JEET TANEJA
with
1
n
n
i# 1 and Bij(X;Y 11 P ) is given by (236). Proof.
Part (i) follows from (247), and part (ii) follows from (248).
REFERENCES Agarwal, J. (1974). “Sur I’information de Fisher, in Theorie de I’information” (J. Kampe de Feriet, Ed.), pp. 1 1 1-1 17. Springer-Verlag, New York/Berlin. Amari, %-I. (1984). “Differential Geometry of Statistics: Towards New Developments,” In NATO Workshop on Differential Geometry in Statistical Inference, London, April, 9-1 1. Amari, S.-I. (1985). “Differential Geometric Methods in Statistics,” Lecture Notes in Statistics. Springer-Verlag. New York/Berlin. Atkinson, C., and Mitchell, A. F. S. (1981). Rao’s distance measure. Sankhya 43A, 345. Beckenbach, E. F., and Bellman, R. (1971). “Inequalities,” Springer-Verlag, New York/Berlin. Ben-Bassat, B., and Raviv, J. (1978). Rknyi’s entropy and the probability of error. IEEE Trans. Inform. Theor. IT-24,324. Blackwell, D. (1951). “Comparison of Experiments,” Proc. 2nd Berkeley Symp., pp. 93-102. Univ. of California Press, Berkeley. Boekee, D. E. (1978). The D,-information of order s. Trans. 8th Prague ConJ on Inform. Theory, Vol. C, 55. Boekee, D. E., and Van der Lubbe, J. C. A. (1979). Some aspects of error bounds in feature selection. Patte,-n Recognition 11, 353. Box, G. E. P., and Cox, D. R. (1964). An analysis of transformations. J. R. Stat. SOC. Ser. B26, 211. Burbea, J. (1986). Information geometry of probability spaces. Exposif. Moth. 4, 347. Burbea, J., and Oller, J. M. (1988). The information metric for univariate linear elliptic models. Statist. Decision 6, 209. Burbea J., and Rao, C. R. (1982a). Entropy differential metric, distance and divergence measures in probability spaces: A unified approach. J. Mull. Anal. 12, 575. Burbea, J., and Rao, C. R. (1982b). On the convexity of some divergence measures based on entropy functions. IEEE Trans. Inform. Theor. IT-28, 489. Campbell, L. L. (1985). The relation between information theory and the differential geometry approach to statistics. Inform. Sci. 35, 199. Campbell, L. L. (1986). An extended Cencov characterization of the information metric. Proc. Am. Math. SOC.98, 135. Campbell, L. L. (1987). “Information Theory and Differential Geometry,” Tech. Report, 1987-12, Department of Mathematics and Statistics, Queen’s University.
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
133
Chen, C. H. (1976). On information and distance measures, error bounds and feature selection. Inform. Sci. 10, 159. Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on a sum of observations. Ann. Muth. Stutisf. 23, 493. CsiszBr, I. (1967). Information type measures of difference of probability distributions and indirect observations. Studiu Sci. Moth. Hunger 2, 299. Cuadras, C. M. (1988). Distancias estadisticas. Estudkticu Espunholu 30, 295. Cuadras, C. M., Oller, J. M., Arcas, A., and Rios, M. (1985). Mttodos geomdtricos de la estadisticas. Qiiestiio 9(4), 219. DeGroot, M. H. (1970). “Optimal Statistical Decison.” McGraw-Hill, New York. Devijver, P. A., and Kittler, J. V. (1982). “Pattern Recognition: A Statistical Approach.” Prentice-Hall, Englewood Cliffs, N.J. Ferentinos, K., and Papaioannou, T. (1981). New parametric measures of information. Inform. Control 51, 193. Ferentinos, K., and Papaioannou, T. (1982). Information in experiments and sufficiency. J. Statist. Plann. Inferen. 6, 309. Fisher, P. (1977). On some new generalizations of Shannon’s inequality. P U C I ~J.CMoth. 70, 351. Fisher, R. A. (1925). Theory of statistical estimation. Proc. Cumbridge Philos. Sco. 22, 700. Goel, P. K., and DeCroot, M. H. (1979). “Comparison of experiments and information measures. Ann. Statist. 7, 1066. Hardy, G. H., Littlewood, J. E., and Pblya, G. (1934). “Inequalities.” Cambridge Univ. Press, London. Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proc. R. Soc. Ser. A 186, 453. Kailath, T. (1967). The divergence and Bhattacharyya distance measures in signal selection. IEEE Truns. Commun. Tech. COM-15(1),52. Kanal, L. (1974). Patterns in pattern recognition. IEEE Truns. Inform. Theor. 20, 697. Kagan, M. (1963). On the theory of Fisher’s amount of information. Sov. Moth. Dokl. 4,991. Kapur, J . N. (1987). Inaccuracy, entropy and coding theory. Tumkung J. Moth. 18, 35. Kerride, D. F. (1961). Inaccuracy and inference. J. R. Stut. Soc., Ser. B, 184. Kullback, S., and Leibler, R. A. (1951). On information and sufficiency. Ann. Moth. Stutist. 22, 79. Lin, J. (1991). Divergence measures based on Shannon’s entropy. IEEE Truns. Inform. Theor. IT-37, 145. Mangasarian, 0 . L. (1969). “Nonlinear Programming.” Tata/McGraw-Hill, New Delhi/ Bombay. Matusita, K. (1967). On the notion of affinity of several distributions and some of its applications. Ann. Inst. Statist. Moth. 19, 181. Marshall, A. W., and Olkin, 1. (1979). “Inequalities: Theory of Majorization and Its Applications.” Academic Press, New York. Menendez, M. L., Pardo, L., and Taneja, 1. J. (1992). On M-dimensional unified (r, s)-Jensen difference divergence measures and their applications. Kybernetiku 28(4), 309. Morales, D., Taneja, I. J., and Pardo, L. (1991). I-Measures of hypoentropy and comparison of experiments: Bayesian approach. Stufisficu(ltuly) LI(2). 173. Nath, P. (1968). On measures of error in information. J. Moth. Sci. 111, 1. Nath, P. (1975). On coding theorem connected with Rtnyi’s entropy. Inform. Control29.234. Pardo, J . A., Menendez, M. L., Taneja, I. J., and Pardo, L. (1993a). Comparison of experiments based on generalized entropy measures. Commun. Statist. Theor. Methods 22(4), 1113.
134
INDER JEET TANEJA
Pardo, J. A., Pardo, L., Menendez, M. L., and Taneja, 1. J. (1993b). “The generalized entropy measures to the design and comparison of regression experiment in a Bayesian context. Inform. Sci. 73, 93. Quesada, V., and Taneja, I. J. (1993). Order preserving property of unified (r , s)-information measures. Soochow J. Math. 18(1), 379. Quesada, V., and Taneja, I. J. (1994). Generalized mean of order t via Box and Cox’s transformation. Tamkang J. Math., in press. Rao, C. R. (1945). Information and accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. SOC.37, 81. Rao, C. R. (1973). “Linear Statistical Inference and its Applications,” 2nd ed. Wiley, New York. Rao, C. R. (1987). Differential metrics in probability spaces. In “Differential Geometry in Statistical Inference” (S. S. Gupta, Ed.), pp. 217-240. Rathie, P. N., and Sheng, L. T. (1981). The J-divergence of order a . J. Comb. Inform. Syst. Sci. 6, 197. Rathie, P. N., and Taneja, 1. J. (1991). Unified (r,s)-entropy and its bivariate measures. Inform. Sci. 54, 23. Rhyi, A. (1961). “On Measures of Entropy and Information.” In: Proc. 4th Berk. Symp. Math. Statist. and Probl., Vol. 1, pp. 547-461. Univ. of California Press, Berkeley. Roberts, A. W., and Varberg, D. E. (1973). “Convex Functions.” Academic Press, New York. Salicru, M., and Taneja, I. J. (1993). Connections of generalized divergence measures with Fisher information matrix. Inform. Sci. 72, 251. Shannon, C. E. (1948). A mathematical theory of communication. Bell Syst. Tech. J. 27, 379. Sharma, B. D., and Gupta, H. C. (1976). On non-additive measures of inaccuracy. Czech Math. J. 26, 584. Sharma, B. D., and Mittal, D. P. (1975). New nonadditive measure of entropy for discrete probability distributions. J. Mafh. Sci. 10, 122. Sharma, B. D., and Mittal, D. P. (1977). New nonadditive measure of relative information. J. Comb. Inform. Syst. Sci. 2, 122. Sibson, R. (1969). Information radius. Z. Wahrs. und verw Geb. 14, 149. Taneja, 1. J. (1983). On a characterization of J-divergence and its generalizations. J. Comb. Inform. Syst. Sci. 8 , 206. Taneja, 1. J. (1985). Generalized error bounds in pattern recognition. Pattern Recog. Lett. 3, 361. Taneja, I. J. (1987). Statistical aspects of divergence measures. J. Statist. Plann. Inferen. 16, 137. Taneja, 1. J. (1989). On generalized information measures and their applications. Adv. Electron. Electron. Phys. 16, 327. Taneja, 1. J. (1994). On unified (r,s)-information measures. J. Inform. Optim. Sci., in press. Taneja, 1. J., Pardo, L., Morales, D., and Menendez, M. L. (1989). On generalized information and their divergence measures and their applications: A brief review. Quesfiio 13, 47. Taneja, 1. J., Pardo, L., and Morales, D. (1991a). L-Measures of hypoentropy and comparison of experiments: Blackwell and Lhemann approach. Kybernetika 27(5), 413. Taneja, 1. J., Pardo, L., and Morales, D. (1991b). (r,s)-Information radius of type t and comparison of experiments. Aplikace Matematiky 36(6), 440. Taneja, I. J., Pardo, L., and Menendez, M. L. (1991~).Some inequalities among generalized divergence measures. Tamkong J. Math. 22(2), 175. Taneja, I. J., Pardo, L., and Menendez, M. L. (1993). Generalized divergence measures and the probability of error (communication).
DEVELOPMENTS IN GENERALIZED INFORMATION MEASURES
135
Toussaint, G. T. (1974). “On the Divergence Between Two Distributions and the Probability of Misclassification of Several Decision Rules,’’ Proc. 2nd Intern. Joint Conf. on Pattern Recogn., Copenhagen, August, 1-8. Toussaint, G. T. (1978). Probability of error, expected divergence, and the affinity of several distributions. IEEE Trans. Sysf. Man Cybern. SMC-8, 482. Vajda, 1. (1968). Bounds on the minimal error probability on checking a finite or countable number of hypotheses. Inform. Trans. Problems 4, 9. Vajda, 1. (1973). “XZ-Divergenceand generalized Fisher’s information.” In Trans. 6th Prague Conf. on Inform. Theory Statistical Decision Functions and Random Processes, Prague, pp. 873-886. Vajda, I. (1989). “Theory of Statistical Inference and Information.” Kluwer, Dordrecht, The Netherlands. Van der Lubbe, J . C. A. (1978). “On Certain Coding Theorems for the Information of Order a and of Type 8,’’In Proc. 8th Prague Conf., pp. 253-266.
This Page Intentionally Left Blank
50 Years of Electronics
Introduction R. A. Lawes The Exploitation of Semiconductors B. L. H. Wilson The Use and Abuse of 111-V Compounds Cyril Hilsum Telecommunications: The Last, and the Next, 50 Years John Bray Mesoscopic Devices Where Electrons Behave like Light A . J. Holden The Evolution of Electrical Displays, 1942-1992 Derrick Grover Gabor’s Pessimistic 1942 View of Electron Microscopy and How He Stumbled on the Nobel Prize T. Mulvey Early Techniques in Radio Astronomy A . Hewish
Copyright 0 1995 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-014733-5
This Page Intentionally Left Blank
INTRODUCTION
In 1992, the Electronics Group of the Institute of Physics celebrated its 50th anniversary. Even during the troubled times of World War 11, sufficient foresight enabled a resolution to be passed by the Board of the Institute t o form an Electronics Group (28 January 1942). Its first Chairman was Professor John Cockcroft, later to be knighted and awarded the 1951 Nobel Prize for Physics for his work on nuclear transformation by artificial means. The Group’s first full lecture was “Electron Optics,” given by Dr. Dennis Gabor, who was awarded the 1971 Nobel Prize for Physics for his work on holography. At the end of its first year, the Electronics Group had about 200 Members or 10% of the total Institute membership. During the next decade or so, meetings and lectures became increasingly concerned with modern (i.e., semiconductor) electronics, and a lecture by G . Dummer in October 1958 on “Microminiature Electronic Components” gave a foretaste of the integrated circuits to come. Other lectures established the wide interests of the Electronics Group, including “Recent Work on the Theory of the Latent Image” by Sir Neville Mott (1977 Nobel Prize for Physics) and Sir Martin Ryle on “Radio Astronomy” (1974 Nobel Prize for Physics). With such an illustrious beginning, the question of how best to celebrate our 50th Anniversary began to occupy the Group under its then Chairman Dr. Keith Vanner. Keith was largely responsible for organizing the evening and persuading a group of very eminent (and busy!) electronics specialists to attend our celebrations at the 1992 meeting of the British Association in Southampton and to talk about the “Evolution of Electronics since 1942.” Professor Tom Mulvey provided a humorous and technically stimulating account of Professor Gabor’s vision of electron microscopy and his progress toward the Nobel Prize for Physics. Professor Bryan Wilson (“The Exploitation of Semiconductors”) and Professor Cyril Hilsum (“The Use and Abuse of 111-V Compounds”) gave fascinating accounts of the pioneering work on silicon and 111-V compound microelectronics in the U.K. Dr. Holden brought us up to date, and indeed gave a glimpse of a possible future, with his talk on mesoscopic systems. Dr. Derrick Grover reviewed the evolution of displays over the last 50 years, and Dr. John Bray managed to review 50 years of telecommunications, from telephone speech at 40 kHz, through optoelectronic switching, to the electronic newspaper! Finally, Professor Anthony Hewish showed how the emergence of radio astronomy (for his contributions he shared the 1974 Nobel Prize for Physics) relied on the development of electronics in the postwar period. 139
140
INTRODUCTION
Electronics has contributed immensely to science and technology and to modern life over the last half century. The Electronics Group of the IoP has played its part. The presentations of the speakers at our 50th Anniversary are included in this volume and will give the reader a sense of the excitement and achievements of those years. R . A. Lawes Chairman of the Electronics Group, 1992-1994
ADVANCES IN IMAGING AND ELECTRON PHYSICS,VOL. 91
The Exploitation of Semiconductors B. L. H. WILSON Armada House, Weston, Towcesler, United Kingdom
. . . . . . . . . . . . . . Technology . . . . . . . . . . . . The Transistor . . . . . . . . . . . A. Competition between Transistor Types . . B. The Diffused Transistor . . . . . . . C. The Planar Transistor . . . . . . . . D. Discrete Transistors in the Mid-1960s . . . Integrated Circuits . . . . . . . . . . A. Planar Integrated Circuits . . . . . . B. The New Situation . . . . . . . . .
I. Prehistory
. . . . .
.
11. The Conceptual Framework 111.
IV.
V.
. .
. . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .
C. The Integrated Circuit Situation in the Mid-1960s . . VI. The Field Effect Transistor . . . . . . . . . . VII. The Information Technology Revolution . . . . . . VIII. U.K. Progress in ICs from 1963 . . . . . . . . A. The Expanding Opportunity for Integrated Circuits . B. What Went Wrong? . . . . . . . . . . . C. Characteristics of the Industry . . . . . . . . D. Some National Attitudes Compared . . . . . .
.
. . . . . . . . . . . . , . . . . . . . . . . . . . . . . . , .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . .
. .
. . . . . . . .
. .
. . . . . . . . . . . . . . . .
142 142 143 143 144 145 145 146 146 147 148 149 149 150 150
151 IS1 166 168
This paper covers the discovery of the useful properties of semiconductors, and concentrates very largely on silicon integrated circuits, “chips,” whose social and economic importance far transcends the very substantial influence of other semiconductors. An accompanying paper covers gallium arsenide. It is written from the standpoint of the United Kingdom, and attempts to show how the U.K.’s substantial contribution to the early development of silicon ICs has not led to a large industry in the indigenous companies, though there is a substantial industry which manufactures chips on processes developed elsewhere, sustained by inward investment. Some reasons are suggested for this failure. The paper is written from the point of view of someone closely associated with the IC development in Plessey, which was one of the leading companies in U.K. integrated circuit development for nearly 30 years. As such, it does not explicitly recognize the substantial contributions made by many other U.K. companies, such as Phillips, S.T.C., G.E.C., Ferranti, and Inmos. 141
Copyright 0 1995 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-014733-5
142
B. L. H. WILSON
I. PREHISTORY
The first observation of semiconducting properties is sometimes said to be the observation of Faraday of the negative temperature coefficient of resistance in silver sulfide in 1835. Two years later, Rochenfeld was the first to observe rectifying contacts. Although there were many relevant observations throughout the 19th century, particularly of photoconductivity, we will now skip to the early days of radio, where point contact rectifiers on various semiconductors were found to be efficient detectors in the period 1902-1906 (Bose, Pierce); Pickard observed that silicon was among these materials. Though cat’s whiskers became familiar to many until they were largely supplanted by valves, no real scientific development occurred until the Second World War. In the meantime, a substantial semiconductor industry had sprung up to expoit the rectifying properties of copper oxide and selenium, the first selenium rectifiers being produced commercially in 1927. There were, however, a number of anticipations of the (field-effect) transistor between the wars by Lilienfeld and others in the 1920s. The first published scientific account of a device that actually worked was by Hilsch and Pohl in 1935. The geometry of their “transistor” was such that gain could only be obtained below 1 Hz. Some of the early inventors had hit on the correct principle, that charge induced on the opposite plate of a capacitor could be mobile and give current between what we should now call source and drain. They failed to appreciate how easily it could be trapped at surface or interface states. 11. THECONCEPTUAL FRAMEWORK
It is unlikely that any substantial technology could have emerged before an adequate scientific framework had been put in place. The quantum theory of metals was first developed by Sommerfeld and Bloch in 1928, and this was followed by Wilson’s paper on the quantum theory of semiconductors in 1932. It developed a theory of electron and hole conduction in terms of contributions by impurities, which had respectively one more or one less electron than the host semiconductor. Wilson, an Englishman, later published a substantial book on the theory of metals (and semiconductors), which still seems surprisingly modern. He subsequently went into metallurgy as a business and was remarkably successful. At that time, no-one in the West was thinking about what might happen if both electron conduction and hole conduction were present in different parts of the same semiconductor. The Russian physicist Davydov published his theoretical paper
THE EXPLOITATION OF SEMICONDUCTORS
143
in 1938 on the properties of such p-n junctions, including minority carrier injection, but despite the fact that the paper was available in English translation, it required the stimulus of the war for practical demonstration.
111. TECHNOLOGY
The development of radar during the war provided the incentive for a new investigation of the point contact diode detector at what were then felt to be high frequencies. B.T.H. at Rugby contributed notably to the British work, but much the largest and most widely distributed effort took place in the U.S.A., notably at Bell and Purdue. This included the development of techniques for the chemical control of purity of germanium and silicon, the preparation of polycrystalline material, and developments in sawing, polishing, and etching. p-n junctions were studied by Oh1 in 1941. Remarkably, the importance of single-crystal growth was not realized, and even when Teal and Little grew the first germanium crystals in 1948, it was without the support of their laboratory head, Shockley, who led the Bell laboratories team. Subsequently, Teal left Bell for Texas Instruments, where he succeeded in growing single crystals of silicon, providing a technical boost to Texas’s capability in what was destined t o be the most important semiconductor.
IV. THETRANSISTOR As is well known, the first transistor was invented by Bardeen and Brittain in 1947, though the invention was not disclosed until 1948. They were
members of a team at Bell laboratories doing fundamental work on semiconductors, in this instance the study of surface states. To probe the surface, a second point electrode was brought down close to the point of a point contact rectifier, and it was realized that under appropriate circumstances the second wire could act as a control electrode for the current injected from the first. The importance of the discovery was rapidly realized, even though the early transistors were unstable and noisy. During the period 1948-1951, Shockley developed the p-n junction transistor, which overcame many of the early problems. The invention commanded worldwide attention: The prospects of a device without the large bulk, complexity, unreliability, large power drain, and limited life of the valve were quickly realized.
144
B. L. H. WILSON
A . Competition between Transistor Types
An early type of junction transistor t o emerge was the alloy transistor (1950). A thin die or chip of semiconductor, almost always germanium, had fired into its two opposing faces small beads of metal, typically indium. As the molten metal cooled down, dissolved germanium grew in the pits which the molten metal had etched. The regrown regions were doped with indium, and the resultant structure was a p-n-p alloy transistor. Although production required complex jigs made from carbon or steel, and not many transistors were made in a batch, it was comparitively simple, and the process was copied and licensed very widely. More transistors could be made simultaneously if the transistor structure was made during growth of the original crystal, by overdoping the original doping with an impurity of opposite type added to the melt, and then overdoping again. The region with transistor structures was cut from the grown crystal as a thick disk, which was in turn cut up into a large number of long cuboids, which formed the transistor. Although many transistors were formed simultaneously, it was necessary to grow a whole crystal to form them; this grown junction process (1951) was thus extremely uneconomical in material. Furthermore, it was hard to make contact with the narrow base region, and this early type soon became obsolete. A problem with the alloy transistor was that the emitter and collector electrodes were separated by a thickness comparable with that of the original wafer. The frequency response of the transistor was limited by the time it took for minority carriers to diffuse across this distance, several tens of microns, and could not extend much above the audio range. To solve this problem, Philco introduced the electrochemical transistor, in which an individual transistor die was mounted on a special jig so that dimples could be electrochemically etched on both sides, leaving in the transistor region only a thin web of germanium whose thickness could be monitored by its transparency. By reversing the polarity, metals could be plated onto both sides, and these could be fired in to form an alloyed transistor. Frequency responses in the megahertz region were now available. Again the process was widely licensed, though its successful deployment was to be short. Another competing type was successfully developed by Philips from the alloy transistor geometry. The doping type of the alloy dots was now of the same type as the germanium wafer, but the collector metal dot contained a second impurity of opposite type. This impurity was selected because it diffused rapidly from the alloy, producing a thin diffused base region, whose thickness was controlled by time and temperature rather than by jigging.
THE EXPLOITATION OF SEMICONDUCTORS
145
Although these processes produced a substantial industry and made possible portable radios much superior to the valve radio, they suffered from a number of disadvantages. The production required complex jigs which produced only a limited number of transistors; the electrochemical process required a jig for each transistor. Lead attachment and heat sinking were often problems. The transistors were only suitable for use at ordinary temperatures and could scarcely operate above about 75 "C. Later, silicon variants on these types were introduced, which encouraged military use. Further, the exposed semiconductor surface was extremely sensitive to ambient gases, particularly water vapor. Transistors could not be vacuumencapsulated, but were mounted in sealed cans. Some secrecy surrounded formulations for statisfactory surface treatments, but even the best devices had a limited operating life of some tens of thousands of hours.
B. The Diffused Transistor Some of these disadvantages were overcome in the early diffused transistor in silicon (1956). A whole wafer was subject to two diffusion processes. The first, boron, diffusion produced a thin p-type skin over the whole of the wafer, while the second, phosphorus, diffusion covered it with an even thinner n-type skin to give an n-p-n sandwich. The geometry of the transistor was defined by a photographic process. Originally, wax was used as a mask, but this was soon abandoned. Metal was evaporated onto the whole wafer, which was then covered with photoresist. The photoresist was exposed through a photographic mask and developed so that the wafer was covered by exposed circular islands of resist. The wafer was then etched through the diffused regions of the silicon to produce structures reminiscent of the mesas of Arizona, after which the type was named. A similar step, but with a new mask and lighter etch, exposed the p-type layer for contacting, over part of the original mesa. This process, of which there were a number of variants, had several advantages. A large number of identical high-frequency structures were produced simultaneously, and much of the complex jigging was replaced by a photographic process. This was to prove of decisive importance in reducing costs, particularly when a new variant was invented which eventually solved the stability problem.
C. The Planar Transistor

The planar process, which in its developed form is attributed to Hoerni at Fairchild between 1959 and 1961, made use of a process step called oxide masking. Before diffusion, the silicon wafer is thermally oxidized so that it
is covered with an adherent layer of silica. Resist is then spun over the wafer and exposed through a photographic mask. The resist is developed to reveal a pattern of holes in the resist, and the wafer is then dipped in hydrofluoric acid to locally dissolve the silica. After removing the resist, the wafer is placed in an oxidizing furnace for the boron diffusion. The silica masks the boron diffusion except where holes have been etched. The boron diffuses down into the wafer, and for a similar distance sideways under the silica, so that the p-n junction so formed is protected by a layer of oxide. Oxide also grows over the diffused region in the oxidizing ambient of the diffusion furnace. The process is repeated using a second mask with smaller holes for the emitter diffusion of phosphorus, giving a localized diffused region aligned to the original transistor bases. Finally, metal stripes are evaporated, again defined by photographic masks. The wafer is sawn into a large number of identical dice, which are subjected to a mounting and lead attachment process that could be rapidly performed manually with suitable jigs and was later automated. This process produced very reliable high-frequency transistors, with a temperature range suitable for military and professional use, but in other applications it had to compete with established alloy processes. As the collector region could be gold-soldered onto a metal carrier, power transistors were also easily made.
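The control of diffused layers by time and temperature, noted at the end of the previous section and implicit here, follows from the diffusion law (a standard first-order estimate, added for illustration). The junction depth grows only as the square root of the furnace time, while the diffusion coefficient depends exponentially on temperature:

\[
x_j \sim 2\sqrt{Dt}, \qquad D = D_0 \, e^{-E_a/kT}.
\]

Doubling the time thus deepens a junction by only about 40%, whereas a small change in furnace temperature shifts D strongly, which is why temperature control was the critical variable. The same law governs the sideways spread under the oxide, so the lateral penetration is comparable to the vertical depth, leaving the newly formed junction protected beneath the silica.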
D. Discrete Transistors in the Mid-1960s

The mid-1960s saw many firms engaged in transistor manufacture in the U.S.A., Europe, and Japan. Competition and the resultant price erosion were such that manufacture was not always profitable. In addition, there was competition between transistor types. Manufacturers which had licensed processes found that these became obsolescent, and unless the license was merely to back up their own in-house R & D, a new one had to be bought as each process aged. Germanium, despite its shortcomings, was still dominant in volume. However, most large electronics firms felt that this was a technology that they could not afford to neglect, so radical was its influence becoming in key consumer markets, in defence, telecommunications, and computing. This thinking was to be transformed by the rise of the integrated circuit.

V. INTEGRATED CIRCUITS
The idea of the integrated circuit was first put forward by a British defense engineer, G. W. A. Dummer, at a meeting of the American I.R.E. in 1952.
The conception was that a number of components were to be fabricated in a solid block of material, the "solid circuit." Some years elapsed before this vision could be realized. The driving force at the time was "miniaturization" for military purposes. There were two approaches. One was to build up, on a passive substrate such as glass, successive layers of deposited material: gold for interconnections, evaporated nichrome for resistors, and evaporated silicon monoxide for dielectric were commonly used. To these were added active diodes and transistors, often in chip form. This approach was initially easier. Simple metal masks or, later, screen printing were used to differentiate patterns in the various layers. The technique still survives, but did not go on to form a major industry. The more radical approach, the one that best approximated the original solid circuit vision, was to form all the components in a semiconductor chip, in what was now to be called an integrated circuit. The first demonstration of the technique was by Jack Kilby of Texas Instruments, who created a multivibrator circuit entirely from silicon in 1958 (announced 1959), albeit connected by a bird's nest of wires. Serious work in the U.K. began later that year at Plessey's Caswell laboratories. However, the rise of the planar process was to transform the subject.

A. Planar Integrated Circuits
The planar process, with its photographic basis and its use of whole silicon wafers and passivated junctions, was ideal for extension as the basis of integrated circuit technology. Resistors could be formed from the base diffusion, which was of quite high sheet resistance (200 ohms per square). By altering the mask pattern, resistors of a desired value could be defined. Although these resistors were temperature-dependent, in many circuits it was the ratio of resistances that mattered, and this was independent of the sheet resistance. The emitter diffusion was of low sheet resistance and could form one plate of a capacitor, of which the thermally grown silica was the dielectric and the aluminum metallization was the other plate. The aluminum also formed the main interconnect between components, though cross-overs could be effected by using the emitter diffusion. However, all these components were unfortunately interconnected by the use of a common substrate. Some early circuit designs even attempted to overcome this limitation by cutting holes in the substrate. In 1959, Lehovec invented a way to use the reverse-biased p-n junction to isolate the components. The idea was to place each component or group of components on a "land" of n-type material, but to surround that land by a wall of p-type material which was allowed to float with no applied bias.
No net current could flow into the wall, so what little current did flow was a combination of the small leakage current from reverse bias over the majority of the junction with a compensating current from slight forward bias of the remainder. Several ways of realizing this isolation compatible with the then-dominant bipolar transistor process were invented. One of the simplest was to use a p-type substrate on which a thin layer of n-type silicon could be grown by the newly invented process of vapor phase epitaxy. The lands for the components could then be formed by a deep diffusion of boron from the surface down to the p-type substrate. Components could then be made on these lands in the way indicated earlier. More sophisticated versions of the process involved diffusion of the substrate prior to epitaxy: An n-type diffusion which was to appear beneath the transistor collector made the area of the collector an equipotential, while a p-type diffusion coincident with the later deep diffusion eased the demand for vertical penetration of the deep diffusion.
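Before leaving the component repertoire, the resistor remark above can be made concrete (an illustrative calculation; the values are representative rather than from the text). A diffused stripe of sheet resistance R_□, length L, and width W has resistance

\[
R = R_\square \, \frac{L}{W},
\]

so a base diffusion of about 200 ohms per square drawn ten squares long gives roughly 2 kΩ. The ratio of two such resistors depends only on their drawn geometries, not on R_□ itself, which is why ratio-based circuit design survived both processing spread and the temperature dependence noted earlier.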
B. The New Situation

This invention was to have revolutionary implications for electronic economics, though they were only slowly appreciated, particularly outside the integrated circuit industry. All components, transistors, diodes, resistors, and capacitors were defined by photolithography simultaneously. The cost of processing a wafer was the same irrespective of the complexity it contained, so at high yield the cost of each integrated circuit chip was proportional to its area. Smaller components, i.e., transistors and diodes, were therefore cheaper than area-demanding resistors and capacitors, reversing the usual constraints. Circuits were fully designed before fabrication, as breadboarding was no longer desirable or even possible. A new breed of electronic engineer was required, the integrated circuit designer, who later was to be aided by sophisticated tools for layout, circuit analysis, and synthesis. Yields of integrated circuits were low at first, particularly for complex large-area chips, which suffered most from the defects often associated with dust. This led to the use of expensive clean rooms and protective clothing, filtered air, pure reagents, and deionized water, which added greatly to costs. To pack more circuits into the wafer, it was desirable to shrink the size of features on the wafer; this also led to higher-frequency circuits, largely because of a reduction in stray junction capacitance. Although the finer lithography required more expensive equipment, the main effect was to give cheaper, and potentially more complex, chips. The situation is often illustrated by the complexity of memory chips, which has risen exponentially from 1 kbit in 1968 to 16 Mbit today (1992). Most of this 16-thousand-fold increase is associated with a decrease in feature size. It is
matched by a corresponding decrease in price per bit. None of this would have been possible were it not for a corresponding increase in reliability. Though there were problems with integrated circuit reliability in the early days, when these teething troubles were over it was found that integrated circuits were much more reliable than the mass of components that preceded them, partly because of the use of passivated junctions, but mainly because of the elimination of large numbers of interconnects between different metals in formerly conventional assemblies. But we have now anticipated the historical development.
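The economics sketched above can be put into first-order form (a standard yield model, added for illustration; the symbols are not from the original text). If defects of density D₀ fall randomly on dice of area A, a Poisson model gives the yield and hence the cost of a good die:

\[
Y \approx e^{-A D_0}, \qquad \text{cost per good die} \approx \frac{\text{wafer processing cost}}{N_{\text{dies}} \, Y},
\]

so cost rises faster than linearly with area once A D₀ is appreciable, which is why large chips suffered disproportionately from dust. The memory figures quoted above also repay a moment's arithmetic: 1 kbit to 16 Mbit is a factor of 2¹⁴, fourteen doublings in 24 years, or a doubling roughly every 21 months.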
C. The Integrated Circuit Situation in the Mid-1960s

In the United States, the integrated circuit market was dominated by military use, which took 72% of the market in 1965. The hearing aid (1963) was practically the only consumer application. The Minuteman missile provided the first major sustained application. Typical companies included TI, Fairchild, G.M.E., National (1967), and Intel (1968). Some older companies, such as RCA and AT&T, who had dominated semiconductor research earlier, at this time largely ignored the integrated circuit, which was to have such a revolutionary effect on telecommunications. TI introduced standard logic circuits in 1965, a change which led on to the cell library, in which such cells were regarded as elements in the design of larger chips. Japan began IC research in 1964, and by 1965 eight large vertically integrated companies were making ICs. The U.K. was also at that time dominated by military applications. Plessey concentrated on linear circuits and Ferranti on digital. Despite the smaller scale of operation, developments were world-class and did not rely on licensed technology. This is illustrated next by a number of examples, drawn from the experience of one manufacturer, Plessey. This choice reflects the author's knowledge and should not be taken as underrating the contributions of the other indigenous companies. Plessey established the concept of customer design on a standard invariant process in 1964, began commercial sales in 1965, and converted its planar process production line to ICs in 1966, phasing out discretes a year later. A single major technical advance remains to be described.

VI. THE FIELD EFFECT TRANSISTOR
The field effect transistor, FET, in which the charge on one plate of a capacitor modulates the conductance of a semiconductor channel on the other, was demonstrated early by Shockley and Pearson in germanium.
Large-scale development awaited the use of a capacitor formed in the thermal oxide of silicon, when an applied voltage could turn on conductance in a channel beneath it (Fairchild, 1962). Many firms turned to this conceptually simple device, but problems of stability were not solved for several years. Then the problem of accidental contamination of the oxide by impurities was recognized, and techniques were developed to reduce the charge trapped near the silicon/silica interface. It turned out that integrated circuits using this type of FET, the MOSFET, did not require a special isolation process, and thus were smaller and simpler than bipolar circuits. Also a variant, CMOS, which now dominates logic, requires almost no power except when the logic gate is switched.
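The power remark can be quantified by the usual first-order expression for CMOS (standard textbook material, included for illustration). With negligible leakage, a gate dissipates only the energy used to charge and discharge its load capacitance:

\[
P \approx \alpha \, C \, V^2 f,
\]

where C is the switched capacitance, V the supply voltage, f the clock frequency, and α the fraction of gates switching in each cycle. A quiescent gate (α = 0) draws essentially nothing, which is the property that has let CMOS dominate logic.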
VII. THE INFORMATION TECHNOLOGY REVOLUTION

The technology was now in place for the IT revolution: cheap, reliable circuit complexity through photolithography. This was to be complemented by the development of circuit design skills, culminating in efficient right-first-time designs for complex products using CAD suites and a database or library of previous designs. Subsequent developments in integrated circuits have been on an immense financial scale, but have been evolutionary, not revolutionary. They include the use of ion implantation to control the dose of impurity incorporated by diffusion, and the development of multilayer metal conductors. Most developments have been aimed at decreasing the feature size of chips, and hence increasing circuit complexity or decreasing cost. Lithographic tools have been transformed to attain the required dimensional control, to better than a tenth of a micron, and they have been matched by the development of dry etching processes to transfer the fine pattern in resist to the silicon.
VIII. U.K. PROGRESS IN ICs FROM 1963
These will be illustrated by chip designs from Plessey Semiconductors or its research arm at Caswell, Towcester. Recently, Plessey Semiconductors took over Ferranti’s IC operation, and then in turn Plessey was taken over by G.E.C.-Siemens and later allocated to G.E.C. This ultimately brought almost all the remaining manufacture by indigenous companies under one management in G.E.C. Plessey Semiconductors.
A. The Expanding Opportunity for Integrated Circuits
The scope for integrated circuits is now well known, but at any one time it has been difficult to see how large the next steps will be and how radical the effect on industry as a whole. Application to logic was historically the first, and led on to widespread application to computers in science and engineering, banking, and finance. The obvious application to radio began in 1966, and TV followed. More radical was the use in memory, previously dominated by nonvolatile magnetic memory. But semiconductors could show lower costs, and with the development of the 1K DRAM of 1968, the way was open for the virtual extinction of core memory and the use of vastly increased memory at lower costs. Memory is now the largest IC product and, though a fiercely competitive market, leads the technology. The microprocessor followed in 1971 and progressed through more complex processors until the operations of a complete work station can now be obtained on a single chip. To expand further, the industry then needed to satisfy, or often to create, demands for chips from individual consumers rather than in professional markets. Calculators and electronic controls for "white goods" were followed by games computers, word processors, and personal computers, and by great changes in "home entertainment electronics." Automobile electronics can account for as many as 15 microprocessors in a luxury car. Personal communications networks will increase the already large dependence on chips of telephones, answering machines, and fax, apart from the total dependence of the telecommunications network itself on highly specialized products. In the future we are promised chips for bandwidth compression in the videophone and HDTV, machine translation, and speech recognition, while the availability of cheaper nonvolatile semiconductor memory will be a new feature in the memory market. Past experience suggests that new applications, not now widely envisaged, will emerge to fuel more growth.
B. What Went Wrong?
Despite an excellent position in early R & D, and some notable products and processes, the indigenous U.K. effort is unimportant in world production, and there is little sign of the dynamic of new product opportunities creating fresh businesses in electronics and information technology. There have, of course, been many analyses of the decline of the world market share not just of Britain and Europe, but of the West as a whole. We will rehearse some of the arguments.
FIGURE 1. 1962. Early planar designs for a digital integrator.
FIGURE 2. The first successful analog feedback IC amplifier (~5 MHz). The circuit and its conventional discrete realization are shown, together with a 1-in. wafer with a number of potential ICs. 6- and 8-in. diameter wafers are now most commonly used.
FIGURE 3. A chip photograph of the amplifier. Note that the feature size is 25 μm, in contrast to features between 0.5 and 2 μm today. Many lithographic defects apparent on the chip reflect the primitive contact lithography and clean-room practice of the time.
FIGURE 4. 1963. A later version of the same amplifier, incorporating improved interdigitated transistors.
FIGURE 5. A logarithmic amplifier with 165 MHz bandwidth. High-frequency radio amplifiers were a strong feature of Plessey's business in the 1960s and subsequently, and were frequently in advance of U.S. designs.
FIGURE 6. 1966. An rf/if amplifier with 20 dB gain and 150 MHz bandwidth. Note the large area occupied by capacitors (in black). Such designs were intended for military or avionic use.
FIGURE 7. 1967. A divide-by-two circuit, operating at 200 MHz, a high speed at the time, first used in frequency synthesis in military radio. Unlike their discrete counterparts, such circuits were built very largely from active components, which occupied a smaller area than passive components.
FIGURE 8. 1967. A variable decade divider used in frequency synthesis.
FIGURE 9. 1969. A variable two-decade divider using MOS circuitry at 2 MHz. Note how complexity has increased, both as a result of the simpler MOS circuits, and with improving technology. Two layers of metal allow cross-overs to be effected without penalty.
FIGURE 10. 1970. A self-scanned photodiode array, used for high-speed optical character recognition in banks. The array consists of 72 × 5 photodiodes with amplifiers and row and column addressing circuits.
FIGURE 11. 1972. The world's first 1 GHz divider.
FIGURE 12. 1982. A 1,024-element video delay line using charge-coupled device technology. The serpentine configuration avoids discontinuities in charge collection. CCDs are more familiar as the active optical element in video cameras, but can also be used, as here, in signal processing.
FIGURE 13. 1987. A 1 GHz divide-by-four circuit, originally produced as a demonstrator for the 1 μm bipolar process in the U.K. Alvey program. Over the period of 20 years since the earlier example shown in Fig. 7, the operating frequency of state-of-the-art circuits for this application had increased 20-fold, largely as a result of decreases in feature size, with an accompanying shrinkage in the vertical dimension.
FIGURE 14. 1989. A 406,000-transistor asynchronous time multiplex switch, for use in packet switching in telecommunications. The switch uses 1 μm CMOS technology, also developed as part of the Alvey program, and three layers of metal interconnect. Metal interconnect now dominates the surface of the chip, and the development of fine features in the metal and dielectric layers rivals in difficulty the delineation of fine features in the silicon itself.
FIGURE 15. 1990. A 1.3 GHz all-parallel 5-bit analog-to-digital converter. Although most of today's complex chips are largely or completely digital, analog connections are usually necessary to communicate with the outside world.
FIGURE 16. 1992. An array of 70,000 gates, using the unstructured sea-of-gates approach. This illustrates the modern tendency to use so-called semi-custom logic for even very complex logical tasks. The metallization connecting the gates can be designed by the end customer using a basic chip used by many customers. This cuts costs and shortens the design cycle. (Photo, GEC Plessey Semiconductors.)
C. Characteristics of the Industry

The industry suffers from very high R & D costs, as new process variants, often involving smaller feature sizes, make earlier processes obsolescent. Items of production equipment for lithography, dry etching, and ion implantation may cost a million pounds or so, and are now essential for research, too. A minimal sustainable investment will be of the order of ten million pounds per year. These costs are small compared with the capital costs of production, where wafer "fabs" will cost something like a hundred
FIGURE 17. 1992. A microcomputer, combining all the elements of a work station on a single die. The chip is only 8.37 × 7.58 mm and embodies 98,019 0.8 μm CMOS transistors. It contains a 32-bit RISC processor, a memory controller, an I/O controller, a bus interface, and a video/audio controller. (Photo, GEC Plessey Semiconductors.)
million pounds, and plant for the largest volume product, memory, is so demanding that even the largest companies may combine to share costs of the order of five hundred million pounds. If the plant is not to produce only low-margin products, it will require a large team of circuit design engineers employing expensive C.A.D. tools. Firms making such a large investment succeed only when it is coupled with engineering and marketing vision of the highest order; low-profit or loss-making activities are common. One reason for this is rapid technical obsolescence of
processes coupled with price erosion. Time to market is often vital to recover the original investment. It may seem surprising that under these circumstances firms stay in the race, and indeed many have not, or have trimmed their investment. Some major firms such as Intel base their strategy entirely on selling devices on the open market. Some large firms feel that some in-house IC production is essential to underpin their much larger equipment or service businesses: new developments in ICs will render older equipment obsolete or give opportunities for entirely new products, particularly in information technology, which has been entirely dependent on the growth of an underpinning chip capability, a capability that may not be securely available in time to firms without it. Others fear that the periodic shortages of capacity in the industry for standard products, particularly for memory, will limit their sales of equipment. A few firms have manufactured ICs entirely for their own use, but most now feel that their internal market does not offer economies of scale, and offer products on the open market as well. Such firms are therefore faced with ambivalent attitudes to their investment. Fears of loss of competitiveness are felt not just by individual firms, but by national and economic groupings as well. The reaction to this difficult situation has varied greatly in different countries.
D. Some National Attitudes Compared

The U.S.A. was the largest market for ICs ($18B in 1992), and was for a time the largest supplier. Japan became the largest supplier in 1985 because of her dominant position in the memory market, with an increasing penetration of other overseas markets. Japan has, of course, a large internal market for ICs, often for consumer products which will ultimately be exported. Its market is now the largest, at $19.1B. The market for Europe is $16.5B, of which Germany and the U.K. consume the majority. The U.K. has quite a large industry based on inward investment by the U.S.A., and more recently by Japan; the indigenous industry is small and focuses on specialized products, but for worldwide sale. Now it is concentrated in one supplier, GEC Plessey Semiconductors, which is said to be the largest manufacturer of ASICs (application-specific ICs) in Europe, where, unusually, it returns a satisfactory profit. The typical American supplier is a device company, sometimes with other interests, e.g., Motorola. The only major U.S. integrated companies which are manufacturers are AT&T and IBM. By contrast, Japanese semiconductor companies are vertically integrated with equipment companies. The growth of American device companies was much aided by the availability of venture capital, at least
until 1980. In Japan, capital has been readily available from the banks, which sometimes formed part of the company group, at low interest rates. The U.K. has had much higher interest rates, sometimes with inflation. Corporate long-term strategy is, of course, characteristic of Japan, combined with guidance and intervention from MITI, notably in their funding of work in LSI (large scale integration) from 1971, and VLSI (very large scale integration) from 1976 to 1979. This period transformed the competitiveness of the Japanese industry. To combat the VLSI initiative, the U.S. Department of Defense launched the Very High Speed Integrated Circuit (VHSIC) initiative, aiming particularly at increasing chip computational power, but without materially affecting economic competitiveness. Most of the U.S. industry may perhaps be characterized by medium-term opportunism, though the opportunities seized by TI, Intel, and Motorola still keep them among the world's top 10 semiconductor companies. British initiatives have been on a smaller scale and have partly reflected the political character of the government of the day, e.g., Labour support of Inmos. Defense-funded research has been of some importance in the past, but was superseded first by the Alvey initiative and then by funding from the EC Framework program, coupled with reduced funding from the Department of Trade and Industry. Despite having processes, at least in pilot plant, which were little inferior to the competition, there has generally been insufficient emphasis on the range of premium products to exploit the investment to the full. The lesson seems to be that large-scale competitiveness requires simultaneous excellence in processes, products, and marketing. Most of the surviving large companies are vertically integrated, deriving part of their rationale from the chip marketplace and a part from the rest of the organization. This, of course, is an uneasy situation to manage, and runs against modern trends in accounting, which make each part of the organization independently responsible for its own financial performance. As success in chip manufacture seems, on a national scale, to be linked with growth of market share in electronics and information technology as a whole, these challenges to management and government will not go away, but must be solved by countries which wish to retain an electronics industry sharing in the growth of worldwide demand.

ACKNOWLEDGMENT
The author is particularly indebted to Philip Morris’s book A History of the World Semiconductor Industry (Peter Peregrinus, 1990) for many facts outside his personal experience.
The Use and Abuse of III-V Compounds

CYRIL HILSUM

GEC Hirst Research Center, Borehamwood, United Kingdom
A talk can be factual, describing where we are. It can be historical, listing the events that led us here. And it can be emotional, probing the thoughts and aspirations of the people who were the history, who laid the foundations for the present state of the art. The first talk is useful if you are, or might become, engaged in research on that topic. The second is interesting if you enjoy following trains of events. The third is the real kernel of science, because it gives an insight into why things happen and, perhaps more important, why it takes so long for ideas to come to fruition as devices and systems. It can also be extrapolated to other fields. That is particularly significant today, when there is so much said about our failure to exploit basic science, as though there is a simple route which we miss.

This is the story of the III-V compounds, a family of materials with fascinating properties, and of the struggle of scientists to discover these properties and exploit them. It is a story with many twists, with periods of slow progress and great frustration followed by high achievement and immense pleasure. It is a story of exploitation, though much of the use came in unforeseen, and probably unforeseeable, ways. I believe that the field is unique in its richness, in the lessons in physics there are to be learned, but also in the variety of applications we have discovered, and keep discovering. It may also be exceptional in the convoluted pattern of discovery, as though the materials held back their gifts until they felt the scientists had suffered enough. Since the identification of III-V compounds as semiconductors is placed by German historians in the early 1950s, and Russian historians would go back further still, you will understand that I shall be omitting much that happened, concentrating on main threads, and, of course, interpreting in my own way events that I recollect. However, I am known to have a good memory, though it can be selective.

The III-V compounds were invented when excitement over the transistor was being tempered by realization of the limitations of germanium. At that time all devices had to be made in pulled single crystals, and silicon was viewed with great suspicion, since the high growth temperature, over 1400°C, would certainly result in impure crystals. I said the III-V
compounds were "invented," and that may surprise you, since you might think Nature got there first. However, none of the III-V compounds we use occur naturally, and the Siemens Company and Professor Heinrich Welker were able to secure a valid worldwide patent on semiconductor devices in compounds made from
FIGURE 1. Siemens' original III-V patent claims.
FIGURE 2. First claim.
the third and fifth groups of the Periodic Table. This was in 1952 (Fig. 1). You will see that the first claim of the patent is very broad (Fig. 2).

There is a difference between knowledge and understanding, and to follow this story you must appreciate that those who were involved in this research did not bring all of their knowledge of physics to bear. The crucial difference between the two ways in which electrons lost the energy which had excited them was not emphasized. We knew that light atoms gave materials with large energy gaps, and there was also an implicit acknowledgment that a small energy gap led to electrons traveling quickly in the crystal, a high mobility. The heavy atoms that caused this also gave the material a low melting point. A small energy gap gave photosensitivity at long wavelengths, in the infrared, whereas a high gap gave high enough resistance for diodes and transistors. It was natural for Siemens, and shortly afterwards British and American laboratories, to choose two distinct targets for their research: indium antimonide, with a small energy gap, and the binary compounds pivoted around germanium for a rival to silicon (Fig. 3). It is worthy of note that the basic physics we learned on InSb was later exploited in devices made in large energy-gap materials. Let us deal with this before we go back to our primary concern, GaAs.

Indium antimonide was hailed as a new kind of semiconductor, because its electron mobility was so high. Its melting point was conveniently low, and the resultant ease of preparing large, pure single crystals meant that most
FIGURE 3. Section of the periodic table.
laboratories could study fundamental physical phenomena: magnetic, thermoelectric, and, most important, photoelectric effects. Two important applications were pursued, galvanomagnetic devices and infrared photocells. Such photocells were the crucial components in the first really effective guided missiles, the U.S. Sidewinder and the British Blue Jay, and though InSb, with its limited spectral range, is not used in this way today, it remains an excellent photocell material. The defense interest, of course, stimulated much practical work on material purification and theoretical research on electron lifetime and recombination. The pure material was an essential requirement for much of the fundamental physics of the 1960s, a most productive era in semiconductor physics.

Galvanomagnetic effects are based on interactions with both a magnetic and an electric field. A high mobility means that the effects can be demonstrated in low magnetic fields, and if we compare the 3000 cm²/Vs typical of germanium in 1955 with the 70,000 cm²/Vs reported for the newly discovered InSb, you can well understand the rush of devices that resulted. Most of these were functional applications, which have died as digital circuits of considerable complexity have become cheaper. The simple magnetic field probe, based on the Hall effect, proved more viable, and this found many uses. Another interesting application of the Hall probe is the susceptibility meter, shown in Fig. 4. This, the magnetic equivalent of the Wheatstone bridge, is sensitive enough to measure the susceptibility of oxygen in blood. It has never been made commercially.

FIGURE 4. The susceptibility meter.

If power is to be drawn from a magnetic field probe, the resistivity as well as the mobility becomes significant, and a material with a larger energy gap than InSb is favored, because it will have higher resistance. This also gives less temperature drift. Siemens moved to InAs and then, very early in materials science, to the ternary III-V compound InAsP. The other important galvanomagnetic effect is magnetoresistance. The increase in resistance on immersion in a magnetic field is proportional to the square of the product of mobility and magnetic field, so again there is a premium on high-mobility materials such as InSb and InAs. The resistance change on moving a sample in a field gradient is very large, and motions of a few angstroms can be detected (Fig. 5). Some of the galvanomagnetic applications are listed in Table I.

Let us for the moment leave our low-energy-gap research there and return to the initial exploration phase, with the search for an alternative to germanium. There were three clear candidates, once we learned that a move away from Group IV to III-V gave both an increase in melting point and a larger energy gap (Fig. 6). These were GaAs, InP, and AlSb. The obvious choice was AlSb, based on plentiful harmless elements with a low vapor pressure. Who would not prefer it to GaAs, a combination of a rare metal
FIGURE 5. The displacement meter.
TABLE I
GALVANOMAGNETIC APPLICATIONS

Hall Effect
  Application            Drive           Magnetic Field
  Gauss meter            Local battery   Magnet
  Compass                Local battery   Earth
  Susceptibility meter   Oscillator      Magnet
  Clip-on ammeter        Local battery   Current in wire

Magnetoresistance
  Displacement           Oscillator      Magnetic gradient
  Microphone             Local battery   Magnetic gradient
with a toxic explosive vapor? We note that Siemens singled GaAs out in the original patent as much less important than AlSb, and only ninth overall (Fig. 7). Nature gave us the first pointer that we were wrong when we found that our crystals of AlSb disappeared overnight, not stolen by rival researchers, but decomposing into crumbs. Preserving them for the time needed to take measurements was itself a feat, and the results of the painstaking measurements were disappointing. The mobility was abysmally low, a few hundred cm²/Vs. Reluctantly we reduced our candidates to two, and most groups concentrated now on GaAs. There could be no disguising the fact that this was both a difficult and a dangerous area of materials science. The earliest technique of manufacture was to seal Ga and As into a quartz tube, surround the furnace with bricks, and retire to a safe distance. Small crystals could later be extricated from the wreckage, for the tube invariably exploded. We thought that there must
FIGURE 6. Alternatives to silicon.
FIGURE 7. Later Siemens claims.
be a better way, and this was found by Philips in the Netherlands with a two-temperature furnace, one part above the melting point at 1250°C, and the other at 600°C to control the arsenic vapor pressure (Fig. 8). But it was still very difficult for anyone to make reasonably sized single crystals, let alone control their purity. The problems seemed insurmountable, and gradually most foreign laboratories withdrew from work on GaAs, including Siemens, who were now concentrating on the much easier galvanomagnetic applications.

FIGURE 8. The two-temperature furnace.

In the U.K. we were determined to persevere with GaAs. Why we were so determined I do not know. It must have been a mixture of stubbornness and intuition. The "we" here is collective, because no individual saw it through. My own responsibility had been the low-energy-gap materials and applications until about 1961, by which time those working on GaAs were exhausted and dispirited. They moved on at a time when they had cracked perhaps 90% of the problem, leaving my section with 10% of the work, and later much of the credit. One of the things that the group had done to focus interest was to simulate a target, an easily understood goal. Ours was the "Red-Hot Transistor."

FIGURE 9. U.S. sales of Ge and Si.

The applications for such a device must be
minimal, but we had to encapsulate potential advantages over silicon, by then (1962) established as the successor to germanium (Fig. 9). The fact was that we could make no transistors at all, not even lukewarm ones. Transistors need electrons which can move freely in the crystal for an appreciable time, and the lifetime in our samples was so low that it was difficult to measure. As a result no transistor action could be seen for many years. However, there was progress along different, and unexpected, lines. There always was on GaAs. Whereas previously I have been talking about collective work, now I must introduce a personal note. I had been thinking about conductivity in GaAs, and wondered if the fact that there were two sets of energy states in which electrons could travel could give departures from Ohm's Law. My calculations showed that this was indeed so, but this was not just a change in resistance. At high electric fields, 3 kV/cm or more, it appeared that an increase in voltage should actually give a decrease in current, a negative differential resistance. This is shown in Fig. 10. Such behavior had never been observed, and it should be both interesting for physicists and also useful. Others had shown that such a sample would oscillate. If the calculations were correct, a dc voltage of just 3 volts applied to a uniform slice of GaAs, containing no junctions or transistors, would cause the circuit to oscillate at microwave frequencies. Two years after the prediction, the
FIGURE 10. Transferred electron effect.
effects were observed, and today this electron transfer effect gives us the simplest microwave source we have. It is widely used. This part of the history is well documented.

The part that is less well documented is the path from infrared photocells based on InSb to semiconductor lasers based on GaAs. By 1961 we were alerted to the fact that there were two types of III-V compounds, most simply defined by their optical properties. There were those like indium antimonide, in which excited electrons died by giving up their energy as photons, or light. The others, like gallium phosphide, were similar in this respect to silicon and germanium, and the electron energy came out as heat, warming the crystal. The prospect of making a light source intrigued us, so we studied GaAs first, to make a semiconductor lamp for near-infrared emission, and then the alloys GaAsP and AlGaAs for visible radiation. Much materials research followed, and from it emerged the LED which all of you know. The light-emitting diode still remains one of the biggest applications for III-V compounds, and the use of GaAs IR diodes in remote control is also widespread.

By 1962 we had some insight into the processes by which excited electrons gave up their energy, mostly through basic studies on InSb, and we could also make very efficient IR-emitting junction diodes in GaAs. The invention of the gas laser had made semiconductor physicists aware of stimulated emission, and there developed a race to the semiconductor laser, won by General Electric in the States, with IBM a short head behind. The road to optical fiber communications was by no means established then; in fact, there was no application for the device, but that didn't stop everybody from
working on it. A new area of physics became available and popular. GaAs had not given up its perverse ways, however, for we soon discovered that it was one thing to make a laser, and another to keep it working without a plague of black spots developing. These problems did not change the general climate, in that by now it was again respectable to work on GaAs. The rush to the laser had diverted attention away from electronic applications, such as the transferred electron microwave source, and this had to wait until 1965 or so to become recognized as a marketable device. But before then we had our transistor, not red-hot, I hasten to say, but clearly amplifying and oscillating at room temperature. The GaAs transistor we had aimed at in the earlier years was conceived in desire, rather than planned as a necessary addition to the device family. By necessity it was based on the type of transistor first invented, which uses both holes and electrons, and is said to be bipolar. The crucial materials parameter for this transistor is the product of the electron and the hole mobilities, which is about the same for germanium and GaAs, since the hole mobility in compounds is so low (Fig. 11). There are subtleties which today might, and do, lead us to work on such devices, but then it was intuition rather than logic, propaganda rather than persuasion, which
FIGURE 11. Predicted performance of bipolar compound semiconductor transistors.
supported the programs. The later unipolar, or field effect, transistor was different. This uses just electrons. Here you could demonstrate by facts as well as faith that the superior electron mobility of GaAs should give a transistor working to very high frequencies (Fig. 12). The IBM group at Zurich invented the device, but the management soon lost confidence in it, and it was left to the U.K. to do most of the development work. This transistor is now the core of the monolithic microwave integrated circuit, the key component in phased-array radars and satellite communication systems. Incidentally, this shows that the limits given in the viewgraph were very pessimistic, because GaAs transistors work well beyond 10 GHz, the top line of that graph. For those with an interest in physics, it is entertaining to observe that the two rival communication techniques, satellites and optical fibers, each rely on the properties of the direct energy gap, one for the high mobility and low noise, the other for stimulated emission. Thus is strategy vindicated. The years that followed can be described as a time for consolidation, though younger workers in the field may resent such a curt description of some inspired research. The previous work on a high-energy-gap substitute for germanium had yielded a number of electronic devices working at
FIGURE 12.
microwave frequencies, and two optoelectronic devices, the LED and the laser, which were rapidly expanding in their influence and forcing device designers to broaden their materials skills. Emission wavelengths are defined only too closely by energy gaps, yet similar applications may require a variety of wavelengths. This was shown strikingly in optical fiber communications, where the realization that it was possible to launch light a long distance in fibers was coupled with an appreciation of the loss mechanisms. The optimum wavelength for low attenuation and low dispersion moved out further into the infrared, away from wavelengths attainable with GaAs. It was here that the availability of mixed crystals was so valuable. We spoke earlier of the link between atomic weight and energy gap. Heavy atoms are larger, so a compound that emits in the infrared, like InSb, will have a larger lattice spacing than one for the visible, like GaAsP (Fig. 13). The use of a mixed crystal enabled us to fill in the gaps in the spectrum, so that we could get emission at any wavelength we chose. Bulk crystals of the useful mixed compounds cannot be grown, but we learned how to make good thin layers, depositing them on the crystals we could grow, such as GaAs and GaSb. The problem now was that adequate layers
FIGURE 13. (a) Tetrahedral ionic radii (Å) and (b) lattice constants.

  Compound   Lattice constant (Å)
  AlSb       6.14
  GaP        5.45
  GaAs       5.65
  GaSb       6.10
  InP        5.87
  InAs       6.06
  InSb       6.48
FIGURE 14. GaInAsP, a quaternary compound. (Horizontal axis: maximum wavelength of emission or detection, 0.5 to 3.5 microns.)
could be deposited only if the lattice constants roughly matched, and there is a big difference between GaAs and GaSb. The mixed crystal concept was taken almost to the limit of credibility to overcome this problem, with lasers made from quaternary compounds, InGaAsP, the extra degree of freedom enabling one to choose an energy gap while tailoring the lattice constant to match the substrate on which the layer must be deposited. It was here that InP came into its own, for the lattice constant of GaAs does not match that of the useful quaternary (Fig. 14). This subservient role, acting as a support rather than a star, came after InP had been shown to be superior to GaAs as a transferred electron oscillator, particularly at frequencies above 60 GHz. The quaternary lasers were proposed for longer wavelengths at the time when GaAs lasers were shown to be prone to damage, and we naturally anticipated even worse life problems with the more complex materials. Typically, the four-element layers proved easier to make than the two-element ones, and they lived longer. The contradictions within the III-V family seemed never-ending.

Let us pause at this point, the end of the era of conventional devices, and summarize our applications, the use part of the title (Table II). Some of these uses, quite important ones, have not been mentioned before, but you will have to take my word for it that these applications, like the others, were not foreseen by those who started work on the III-V compounds. Certainly we exploited the basic science, but in completely unforeseen ways. We now move rapidly to today.

Most of the excitement in achievement with compound semiconductors came not from out-performing silicon, in making superior transistors, but
TABLE II
APPLICATIONS OF III-V COMPOUNDS

  Material   Device                                      Use
  InSb       Photocell                                   IR detector
             Hall effect probe                           Magnetic measurements
             Magnetoresistance probe                     Magnetic measurements
  InAsP      Hall effect and magnetoresistance probes    Magnetic measurements
  GaP        LED                                         Indicator
  GaAs       IR diode                                    Remote control
             Laser                                       Communications
             Electron transfer diode                     Microwave source
             Field effect transistor                     Microwave amplifier
             Solar cell                                  Power conversion
             Photocathode                                Image intensifier
  GaAsP      LED                                         Indicator, display
in inventing novel devices which could not be made in silicon, such as the LED, the laser, and the transferred electron oscillator. Perhaps this is why I describe the 1970s as a period of consolidation, with perhaps a hint of quiescence, if not boredom. But in that period seeds were being sown for another explosion in original thought, in which we are still immersed.

Improvement in laser performance had come through complexities in device design which required a hitherto undreamed-of capability in depositing layers of compounds on an almost atomic scale (Fig. 15). It was Bell Laboratories who first capitalized on this ability in a revolutionary way, by showing that in thin layers of GaAs and GaAlAs, with a sharp boundary between them, one could separate the electrons from the donor atoms that had supplied them. It had previously been thought inescapable that the transport of electrons in any reasonable concentration must be hindered by the charge on the impurity atoms from which they derived. A good number to remember is that 10¹⁷ donors per cm³ will halve the mobility of the pure crystal (Fig. 16). Now one could almost remove the impurity scattering, too (Fig. 17). It is true that the effect was seen only in a sheet of electrons, the two-dimensional electron gas, but this could still find use in the HEMT, the high electron mobility transistor. Useful this is proving to be, but the main result of this discovery was not its direct implementation, but the realization that we could be seeing new physics, with many of our basic assumptions overturned.

Much of our physics comes from expressions of natural laws which are, of necessity, approximations. They are valid under certain boundary conditions which hold for the experiments by which we seek verification. The simplest example is that of transport through a semiconductor sample,
FIGURE 16. Doping semiconductors in the 1970s. A pure semiconductor contains few electrons. Devices need electrons, so you dope with atoms which will give up one of their own electrons. The doping atoms then become charged, and the charge interferes with the electron motion. The electrons therefore travel slower than in pure material; the reduction factor is equal to 2 when N = 10¹⁷ cm⁻³.

FIGURE 17. Doping semiconductors in the 1980s. A thin layer of material 1 is sandwiched between layers of material 2, which has a higher ionisation energy. Electrons from impurity atoms in 2 slide down the energy slope to 1. These electrons now travel in pure material, so they are fast. The mobility gain at 20°C is 60%.
in which we assume that the terminal electron velocity is determined by an equilibrium between energy gained from the electric field and that lost to collisions with the lattice atoms. But what if our sample is so thin that the carriers reach the anode without having spied an atom? This is no longer an academic question when we can readily make layers 5 nm thick (Fig. 18). We have to think about the significance of electron wavelengths comparable with the thickness of the sample, of the likelihood of tunnelling through energy barriers, of resonant phenomena, of quantum effects. We can exploit this new physics by making new types of diodes and transistors, optically bistable layers, highly photosensitive diodes, quantum-well lasers whose emission wavelength depends on the dimensions as well as the material (Fig. 19). It would be pointless for me to construct a list of all the applications that might develop, anticipating the results of this new physics, because it would almost certainly be inaccurate. The physics we knew at the beginning of my story did not serve us well in the unfolding of the many uses for these versatile materials. The abuse in the title largely came from our inability to interpret the physics effectively and t o see how to fill in the knowledge gaps at the right time. Yet can we feel that our experience has been helpful, that we have learned how to cut corners so that the new physics of semiconductors will be mastered and applied more quickly? There should, of course, be some time attenuation, since there are many more scientists and engineers now interested in the problem than there were 35 years ago. But it has taken nearly 10 years to see the first devices exploiting two-dimensional electron gases. It may well be that there is a natural gestation time between discovery and exploitation which blurs the links. There is now a growing movement to classify basic physics into fields which create wealth and those that are said
FIGURE 18. The new physics. Characteristic electron lengths in semiconductors, from interatomic spacing and unit cell dimensions (~10 Å) through the electron wavelength, mean free path, and Debye length to the depletion width (~1 μm and above): as attainable linewidths shrink, the abrupt space-charge-edge approximation fails, transport becomes retarded, the quantum regime is entered, and finally the effective mass approximation fails at unit cell dimensions.
FIGURE 19. Emission from a quantum-well laser in GaAs. (Horizontal axis: wavelength, 6000 to 9000 Å.)
to be mere academic meditation. Would it were so simple! There are many blind alleys to be trudged, many diversions in short-term problems which tease for a solution. Fortunately, there is also an instinct for the kernel of success, an instinct sharpened by our past tutors and honed by our own experience, which nags us to persevere when managers press for the common-sense way out. The often-repeated saw "GaAs was, is, and always will be the material of the future" was nearly too true to be funny to those who lived and breathed the stuff. There has to be a moral to a story, and a lecture, as long as this. "Seek and ye shall find" is fairly appropriate, but you have to knock pretty hard and very often before the door is opened. Physics does give a true foundation, but the structure that you build on it is limited by your imagination. I hope you have it in plenty.
Telecommunications: The Last, and the Next, 50 Years

JOHN BRAY

The Pump House, Bredfield, Woodbridge, Suffolk, United Kingdom
I. Today's World of Telecommunications
II. Telecommunications 50 Years Ago
III. Key Developments in the Last 50 Years
    A. Coaxial Cable and Microwave Radio-Relay Systems
    B. Invention of the Transistor and the Microchip
    C. The Digital Revolution
    D. Satellite Communication
    E. Optical Fiber Communication
    F. Electronic Computer-Controlled Switching Exchanges
    G. Facsimile, Data Access (Teletext), and Television Conferencing Services
IV. Telecommunications in the Next 50 Years
    A. New and Improved Services
    B. New Technology and System Concepts
V. The Impact of Telecommunications and Information Technology on the Future of Mankind
I. TODAY'S WORLD OF TELECOMMUNICATIONS
The present-day world telecommunication network is the most complex, extensive, and costly of mankind's technological creations and, it could well be claimed, the most useful. Together with sound and television broadcasting, telecommunication - communication at a distance - provides the nervous system essential for the social, economic, and political development of mankind. It enables any user of the world's total of 700 million telephones in homes and offices, ships, aircraft, and motor cars to communicate with any other, regardless of distance. It provides means for the fast distribution of documents and the "electronic" letter mail; it enables man to communicate by data transmission over the network with remote computers and data banks, and computer with computer. The long-distance transmission of television signals is now commonplace, even from the far-distant planets of the solar system. The use of television
for business conferences between offices in distant locations is increasing as more wide-bandwidth channels become available in the world telecommunication network. A whole new art of information access, exchange, and processing - IT or information technology - is also developing whereby users in home or office can gain access to virtually unlimited pages of information, visually displayed, from local or remote data banks via the telecommunication network, and interact with displayed information or send messages on a person-to-person or group-to-group basis (Fig. 1). The future impact of information technology linked to telecommunications could well be immense.
FIGURE 1. The growth of telecommunication services and the matching transmission systems from 1850 to the year 2000. [Chart tracing services from telegraphy (1850) through telephony, telex, facsimile, Datel, Confravision, radiopaging, and radiophone to Prestel, viewphone, viewdata, electronic funds transfer, telemail, home newspapers, and stereo video, carried successively on audio systems, carrier cable systems, coaxial cable, microwave radio, and optical fibre systems.]
By removing the need to travel to communicate, the vast waste of human and material resources needed to provide ever-expanding rail and road facilities for countless millions of commuters traveling every day from homes to city offices could well be minimized; and by diverting much office work from large cities to villages and small towns, the quality of life could be enhanced and the rural economy sustained. This vast development in scale and range of telecommunication services and facilities has become possible mainly by a remarkable and continuous evolution in telecommunication system concepts and supporting technology, much of which has taken place in the last few decades, but of which the scientific and mathematical foundations can be traced back a century or more. The contributions of Michael Faraday in experimental electromagnetism, Clerk Maxwell and Oliver Heaviside in electromagnetic wave propagation theory, and Heinrich Hertz for his discovery of radio waves mark high points in the process of human intellectual creativity. No less important as the electronic age dawned were the contributions of J. J. Thomson and Ernest Rutherford for their discovery of the electron, and Max Planck's concept of the photon as the indivisible quantum of light energy. To assess the advances in telecommunication made during the last 50 years, it is convenient as a starting point to review briefly the technology, system concepts, and services as they were at the end of World War II.
II. TELECOMMUNICATIONS 50 YEARS AGO
In the early 1940s, telecommunication services in the United Kingdom were predominantly for telephone speech and telegrams, with limited low-speed data and facsimile transmission. Telegrams were transmitted using teleprinters operated at 50 bits/s, corresponding to the speed of the average typist. Video transmission hardly existed except for a few miles on cable pairs in London. Intercity transmission of telephone speech was mainly by overhead pole-mounted wires and underground wire-pair cables. These provided groups of 12 or 24 voice circuits, arranged in a frequency-division multiplex of 4 kHz spaced channels, using carrier frequencies of 12 to 60 or 12 to 108 kHz. The first prototype coaxial cable systems, providing some 600 voice circuits between 60 and 2540 kHz, came into service in the early 1940s. Transoceanic telephone communication in the 1940s was provided by beamed short-wave radio systems, limited in capacity to a few tens of voice channels and subject to occasional disruption or poor quality of
transmission due to ionospheric storms. Between the United Kingdom and the U.S.A., a single, more reliable, voice channel was also available on the long-wave transatlantic radiotelephone system. However, it was not until 1956 that a transatlantic submarine cable, providing 35 voice channels, came into service and demonstrated that there was a pent-up demand, given reliable good-quality service. The Victorian telegraph engineers and their successors had built up a worldwide network of submarine telegraph cables, and these, although restricted in speed to a few words per minute, provided a reliable but slow means of communication, used mainly for diplomatic and business purposes. In the 1940s at least a third of telephone switching - that is, the connection of a caller to a called customer - was carried out by operators at manual telephone exchanges in response to a spoken request. The remaining majority of calls were switched in automatic exchanges equipped with Strowger electromechanical switches controlled by dialed impulses originated by the caller. The Strowger switch, invented by a Kansas, U.S.A., undertaker in 1889, remained the backbone of the world's telephone switching systems for several decades. However, it was bulky and slow, and it required considerable labor-intensive maintenance. Customer-dialed switching was at first restricted to a limited area around each exchange, and it was not until 1958 that "Subscriber Trunk Dialing" (STD) became available and enabled long-distance calls to be directly dialed by the caller. Viewed from the 1990s, telecommunication services in the 1940s can be seen as limited in scope and scale, and relatively costly - useful as they were at the time. The ability to reduce the cost of each intercity telephone circuit by providing large numbers of circuits on a common pathway such as a hair-thin glass fiber, the possibility of spanning the oceans by a satellite, and the coming of fast, reliable, and flexible electronic switching were hardly to be envisaged in the 1940s. Nor could be foreseen the growth of a wide range of new visual and data communication services that were to transform the very nature of telecommunications. The war years, 1939-1945, gave a strong impetus to the development of radio and electronic technology for military purposes that later proved of value for civil telecommunications, notably for VHF, UHF, and microwave radio-relay systems.

III. KEY DEVELOPMENTS IN THE LAST 50 YEARS
To a considerable extent the exponential growth of telephone traffic and the burgeoning of new services such as facsimile, data access, teletext, teleconferencing, and video links for television broadcasting were stimulated
and made possible by the creation of new device and materials technology and system concepts during the 1950s, '60s and '70s. To a degree these new services were "technology-led," as distinct from the "software-led" developments, fueled by an ever-increasing public demand for communication, that followed in the 1980s and later years. From a vast range of device innovation and many new system concepts, certain key developments can now be seen as seminal and of critical importance.

A. Coaxial Cable and Microwave Radio-Relay Systems
The rapid growth of intercity telephone communication in the post-war years created a demand for transmission equipment providing several hundreds of telephone circuits on a common cable pair or microwave radio carrier in order to achieve economies of scale. At the same time the television broadcasting service, resumed in London after the end of the war, began to spread in the 1950s to provincial cities throughout the U.K. and created an urgent demand for intercity video links. The prototype frequency-division multiplex coaxial cable system developed at the Post Office Radio and Research Laboratories in the 1930s became the basis for a network of such systems on intercity links throughout the United Kingdom. A family of coaxial cable systems was designed and manufactured by the U.K. telecommunications industry over the years from 1950 to 1980, with telephone circuit capacities increasing from 960 to 10,800 on each 4-mm or 10-mm diameter coaxial pair. These developments made possible a 10-fold reduction in the annual costs of providing each intercity telephone circuit (Fig. 2). Two key developments may be noted that made these advanced coaxial cable systems practicable:

1. the invention in 1927 by H. S. Black of Bell Labs of negative feedback, which stabilized the gain and reduced distortion in cable amplifiers (see the relation sketched after this list); and
2. the development of the quartz crystal filter, also at Bell Labs, which greatly facilitated the close stacking of telephone channels in a 4 kHz spacing frequency-division multiplex.
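Item 1 can be stated compactly (standard feedback theory, recorded here for completeness). For an amplifier of forward gain $A$ with a fraction $\beta$ of the output fed back in antiphase, the closed-loop gain is

$$A_{\mathrm{cl}} = \frac{A}{1 + A\beta} \approx \frac{1}{\beta} \quad (A\beta \gg 1),$$

so the repeater gain is fixed by a stable passive feedback network rather than by the valves themselves, and both variations in $A$ and nonlinear distortion are reduced by the factor $1 + A\beta$.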
The coaxial cable network, designed primarily for multichannel telephony, was, however, not technically well suited to television signal transmission with its critical phase linearity requirements. The urgency of the demand for television distribution to provincial cities gave an impetus to microwave radio-relay system development, since such systems with their general use of hilltop sites could be more quickly installed than roadside cables.
FIGURE 2. Relative annual costs (per 100 km) for communications systems. [Chart spanning the carrier era (24-circuit carrier), the coaxial cable (semiconductor) era, the microwave radio (FM/FDM) era, and the digital era.]
Moreover, microwave radio-relay systems could be designed to accommodate either multichannel f.d.m. telephony, matching the cable f.d.m. multiplex, or television. Much of the pioneering work on microwave radio-relay systems in the 1930s and 1940s was carried out at Bell Labs (U.S.A.), and development in the U.K. in the British Post Office and in the telecommunications industry followed a broadly similar pattern. This resulted in a family of microwave radio-relay systems operating in the 2-, 4-, and 11-GHz frequency bands, using frequency modulation of the radio carrier. A unique contribution of the British development was the traveling-wave tube amplifier as an output stage of the microwave transmitter, as an alternative to the Bell 4-GHz triode valve amplifier. The traveling-wave tube offered far less critical manufacturing tolerances and broader bandwidth, and it was suitable for microwave frequencies at least up to 15 GHz. It was invented by R. Kompfner at the Clarendon Laboratory, Oxford University, during the war years, manufactured by Standard Telephones and Cables, U.K., and first used on the Post Office Manchester to Kirk-o-Shotts microwave link in 1949. Traveling-wave tubes were later widely used in microwave radio-relay systems throughout the world, and in satellite communication and military surveillance systems. Both coaxial cable and microwave radio-relay systems were adapted in the 1980s to digital PCM operation, generally at 140 Mbit/s, as part of the analog-to-digital revolution (see later discussion).
B. Invention of the Transistor and the Microchip
The invention of the transistor by J. Bardeen, W. H. Brattain, and W. Shockley of Bell Telephone Laboratories, U.S.A., in 1948, and the planar integrated circuit - the "microchip" - by J. S. Kilby of Texas Instruments, U.S.A., and R. Noyce of Fairchild Semiconductor, U.S.A., in 1958-1959, began a development in electronics that was to have a profound and continuing impact on telecommunications, on sound and television broadcasting, and on computing throughout the world. The transistor and the microchip enable electronic equipment to be made that is smaller, more reliable, and lower in cost and power consumption than is possible using thermionic valves. By exploiting the properties of semiconductor materials such as silicon with added atoms of other elements, active devices such as amplifiers, switches, and rectifiers can be created in very small physical sizes. The microchip, which may contain a million or more transistors and similar devices on a square centimeter of silicon, enables complex circuit operations to be performed rapidly, reliably, and economically - greatly enhancing, for example, the power of computers to calculate, the service functions available in electronic exchanges, and the quality of color television. Moreover, the microchip is well adapted to highly automated mass production; this has led to decreasing unit costs and high reliability in spite of growing circuit and function complexity. The transistor and the microchip have facilitated the design of larger-capacity land and submarine cable systems, and the design of communication satellites. They have made possible a vast range of customer equipment for computing, communicating, and broadcasting; they gave rise to new electronic industries and changed old ones beyond recognition. And, in particular, they made possible the evolution of communication from analog to digital modes.

C. The Digital Revolution
A “digital” system is one in which information, whether in the form of speech, facsimile, television or data signals, is transmitted or processed by “on” and “off” pulses of a current, radio, or light wave, as compared with an “analog” system in which the continuously varying amplitude of the information signal is conveyed by corresponding amplitude, frequency, or phase modulation of a carrier wave. Digital techniques, which first began to find application in the 1930s, have now revolutionized telecommunications, computing, and the recording of
video signals - television broadcasting may well see the introduction of digital techniques in the future. Early digital systems used amplitude, width, or position modulation of individual pulses of a carrier wave, as in some Army wartime radio systems in which several speech channels were interleaved in time. The key development that transformed digital systems was "pulse code modulation," invented in 1937 by A. H. Reeves, then working with E. M. Deloraine in the laboratories of Le Matériel Téléphonique, Paris. It was an outcome of Reeves's search for a means of overcoming noise and interference in telephone transmission systems. To do so, he envisaged sampling the amplitude of a speech signal at a rate equal to twice or more the highest frequency in the speech wave and converting each amplitude sample into a short coded train or "byte" of on-off pulses. The presence or absence of pulses in each train defined uniquely the amplitude of the original speech sample, usually according to a code in binary notation. Noise or interference not exceeding half the amplitude of each pulse was ignored in the process of detecting whether a given pulse was present or absent. The benefits of pulse code modulation may be summarized as follows:

1. Digitally encoded signals may be transmitted over long distances,
requiring many amplifiers in tandem, or switched in many exchanges, without introducing noise, distortion, or loss of signal strength, whereas analog signals suffer a progressive loss of quality;
2. Many channels and various types of information, e.g., speech, data, facsimile, or video, can be transmitted simultaneously on a common path such as a coaxial or optical fiber cable, or a microwave carrier, without mutual interference;
3. Digital signals may be stored, e.g., on magnetic tape or optical disks, and reproduced repeatedly without loss of quality;
4. Digital signals lend themselves readily to the use of microchips for coding and decoding, storage, and processing, with their advantages of low cost, reliability, and compactness.

The use of PCM digital techniques on an appreciable scale in the British Post Office telecommunication network began in the 1960s when mass-produced microchips became available at acceptable cost. The first applications were to interexchange junction links on wire-pair cables, the digital PCM equipment providing 24 or 30 voice channels on each wire-pair, with 8-bit coding, an 8 kHz sampling rate, and an overall bit rate of 2 Mbit/s. These digital systems were economically and operationally attractive because they gave a large increase in capacity on existing inter-exchange
cables to meet traffic growth without laying new cables. From this modest beginning, larger-capacity PCM digital systems for up to 2,000 telephone channels or a television channel at bit rates of 140 Mbit/s on coaxial cable and microwave radio-relay links were developed in the 1970s. Recognizing the growing importance of digital techniques, the International Telecommunication Union, through its Telephone, Telegraph and Radio Consultative Committees, put in hand the standardization of the basic parameters for a "hierarchy" of PCM systems in steps ranging from 30 to 6,000 telephone channels at pulse bit rates ranging from 2 to 140 Mbit/s, with "slots" for data, facsimile, and television channels. This international cooperation was of great importance in facilitating communication across national boundaries, and in enabling the development of equipment to be carried out in an orderly and efficient manner. At first, digital multichannel telephone systems on intercity coaxial cable links had difficulty in competing economically with the well-established frequency-division multiplex analog systems. This changed significantly when it became possible to switch digitally in trunk exchanges (see later discussion), thereby avoiding the cost of digital-analog conversion equipment. A task force set up by the British Post Office in the 1970s to study modernization of the U.K. telecommunications network made a firm recommendation to "go digital" for the main intercity trunk network, both for transmission and switching, and to plan for "an integrated digital network" in which the benefits of digital operation could be extended to customers for a wide variety of telecommunication services. This far-sighted recommendation has now been given practical effect with the implementation of British Telecom's "Integrated Services Digital Network." ISDN provides a wide and growing range of "dial-up" modes of communication, including telephony, facsimile, videotext, audio and television conferencing, and data-base access and transfer. It also provides "multipoint," i.e., broadcast, operation as well as "person-to-person" communication. The structure is based on the 64 kbit/s pulse rate for PCM digital speech transmission, which can be used in integral multiples for wider-bandwidth services such as group-to-group television conferencing. Bit-rate economy for visual services has been achieved by highly efficient analog-to-digital coding techniques and the removal of redundant picture information. Digital PCM techniques are now firmly established in coaxial cable, microwave radio-relay, optical fiber, and satellite transmission systems and in exchange switching; they are likely to find an increasing role in mobile radio communication and sound and television broadcasting in the future - a remarkable tribute to the inventor of PCM, A. H. Reeves.
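A minimal sketch of Reeves's scheme as described in this section, assuming a plain linear quantizer (real telephone channels use logarithmic A-law or µ-law companding); the test tone and full-scale range are illustrative choices. Multiplying the 8 kHz sampling rate by 8 bits per sample reproduces the 64 kbit/s channel on which the ISDN hierarchy above is built.

```python
# Illustrative PCM encoder: 8 kHz sampling, 8-bit quantization -> 64 kbit/s.
import math

F_S = 8000.0          # sampling rate, Hz (at least twice the ~3.4 kHz speech band)
BITS = 8              # bits per sample
LEVELS = 2 ** BITS    # 256 quantization levels

def pcm_encode(samples):
    """Quantize samples in [-1, 1] to 8-bit codes (plain linear quantizer)."""
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))                       # clip to full scale
        codes.append(int((x + 1.0) / 2.0 * (LEVELS - 1) + 0.5))
    return codes

# One millisecond of a 1 kHz test tone, sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 1000.0 * n / F_S) for n in range(8)]
print(pcm_encode(tone))
print(f"channel bit rate = {F_S * BITS / 1000:.0f} kbit/s")  # 64 kbit/s
```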
D. Satellite Communication

The 2nd of July 1962 saw the first-ever transmission of live television signals across the North Atlantic via the Bell Telephone Laboratories TELSTAR earth-orbiting satellite and the participating earth stations at Andover, Maine, U.S.A. (Bell), Goonhilly, Cornwall (British Post Office), and Pleumeur-Bodou (French PTT). This momentous event heralded the beginning of a worldwide revolution in long-distance communication and broadcasting that has expanded enormously the scope and scale of these services. It has also opened the door to greatly improved mobile radio communication, e.g., to ships and aircraft. The technical advantages of satellites are obvious enough. For telecommunication links, only one repeater/amplifier is required, i.e., in the satellite, compared with the many repeaters on long land and submarine cables, with substantial gains in reliability and quality of transmission. For direct broadcasting, large areas can be covered by a single transmitter, avoiding the "shadow" areas created by hills and mountains when using ground-based transmitters. Furthermore, the relatively wide frequency bands available in the microwave spectrum and the use of highly directional ground station aerials permit a massive growth of telecommunication traffic on a worldwide scale. The economic advantages of satellite compared with terrestrial telecommunication links increase with the distance spanned - however, the high-capacity optical fiber submarine cable (see later discussion) may challenge this advantage. The first proposal for a world satellite communication system was made by Arthur C. Clarke (U.K.) in 1945; he described how a satellite orbiting the Earth in a west-to-east direction at 22,300 miles height above the equator would appear to be stationary to an observer on the ground. He noted that three such satellites, equispaced around the equator and equipped with radio receivers and transmitters, could provide virtually worldwide coverage for point-to-point telecommunication links and for broadcasting. A similar proposal was made in 1955 by a Bell Laboratories (U.S.A.) scientist, J. R. Pierce. Both proposals were ignored until the Russians launched into low Earth orbit the first man-made satellite, Sputnik 1, in 1957 - an achievement that convinced civil and military authorities in America and Europe that satellites were "for real." The challenge was taken up in the U.S.A. by the American Telephone and Telegraph Co. and Bell Telephone Laboratories with their TELSTAR project, and the U.S. National Aeronautics and Space Administration with a similar project, RELAY. The British Post Office and the French PTT
Administration agreed to cooperate by building ground stations and conducting communication tests. The TELSTAR satellite, designed and built by Bell, was launched by NASA into an elliptical orbit from about 600 to 3,500 miles height, orbiting the Earth every 2.5 hours. It thus required precision tracking by the highly directional ground station aerials and was only mutually visible to ground stations on either side of the Atlantic for periods of from 0.5 to 1.0 hours each pass. Nevertheless, the tests that began in July 1962 fully demonstrated that satellites could provide high-quality transmission of color television signals, 600 or more simulated telephone channels, and high-speed data transmission. The age of worldwide satellite communication had begun. But a further step remained - the achievement of Clarke's idea of the stationary orbit, which would avoid the need for tracking and give continuous 24-hour coverage with a single satellite. The credit for this achievement must go to the engineers of the Hughes Aircraft Co. in the U.S.A. - their project SYNCOM satellite was launched into "synchronous," i.e., stationary, orbit in July 1963. From these beginnings, communication satellite development has proceeded apace. Solar to electrical energy conversion systems of higher efficiency provide more power; satellite receiver/transmitter transponders provide 10 or more television channels and 100,000 or more telephone circuits. Directional aerials on satellites, illuminating defined areas of the Earth's surface and concentrating radiated energy where it is most useful, have made possible direct broadcasting from satellites to domestic receiving aerials of modest size. And the exploitation of digital techniques with sophisticated coding is making possible the transmission of high-definition, e.g., 1,250-line, television pictures from satellites to viewers' homes.
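Clarke's 22,300-mile height is not arbitrary; it follows from Kepler's third law, a standard calculation worth recording here. With $\mu = GM_E \approx 3.986 \times 10^{14}\ \mathrm{m^3\,s^{-2}}$ and the orbital period $T$ set equal to one sidereal day ($86{,}164$ s),

$$a = \left(\frac{\mu T^2}{4\pi^2}\right)^{1/3} \approx 42{,}160\ \mathrm{km},$$

and subtracting the Earth's equatorial radius of about 6,378 km leaves an altitude near 35,790 km, i.e., roughly 22,200 miles above the equator.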
E. Optical Fiber Communication

Optical communication systems using light transmitted by hair-thin glass fibers are now assuming a dominant role in intercity and interexchange cable links and in transoceanic submarine cables; the local distribution network between switching exchanges and customers' premises will involve an increasing role for such systems in the future. Suitably designed glass fibers transmitting coherent light offer the possibility of almost unlimited communication capacity. Light with a wavelength of the order of 1 micron has a frequency of 300 million MHz - if only one percent of this is effectively used, the corresponding communication bandwidth is 3 million MHz, wide enough to accommodate millions of telephone circuits or hundreds of television channels on a single fiber.
Furthermore, by using light as a carrier, such systems are immune to interference from electrical machinery and radio transmissions. The compact mechanical form and flexibility of glass fiber cables are particularly useful in the provision and installation processes. But optical fiber communication systems did not reach the stage of operational and commercial practicability without a long and sustained effort that began in 1966 with the publication of a classic paper by K. C. Kao and his colleague G. A. Hockham at Standard Telecommunication Laboratories (U.K.), setting out the scientific design principles of an optical fiber cable. The technological challenges involved in achieving a viable system were substantial. First was the development of low-loss glass, starting from an initial 1,000 dB/km or more, reducing it to a usable 10 dB/km and eventually to less than 1 dB/km. And this had to be achieved in a fiber structure in which the refractive index distribution between an inner core and the glass cladding was closely controlled in order effectively to guide a light wave. Means had to be developed for continuously drawing glass fibers in lengths of 10 km or more as a basis for a practicable and economic manufacturing process. Solid-state laser light sources and photodetectors had to be developed, suitable for operation at pulse bit rates of 100 Mbit/s or more, with the long lives needed for an operational system. And cable jointing and laying techniques suitable for field use had to be created. Important contributions to this work were made in the laboratories of British Telecom, Standard Telecommunication Laboratories, and Southampton University in the U.K., and Corning Glass and Bell Telephone Laboratories in the U.S.A. By 1986 there were more than 65,000 km of 140 Mbit/s optical fiber systems in operation in the British Telecom intercity network, and 565 Mbit/s systems were beginning to come into service. In that year a world-first optical fiber submarine cable commenced operation between the U.K. and Belgium, and others followed across the Irish Sea. None of these required repeaters on the sea bed - a clear illustration of the success achieved in the long scientific and engineering battle to reduce losses in optical fibers. It paved the way for an even more dramatic success in the optical fiber story - the spanning of the North Atlantic by an optical fiber cable in 1988. The first transatlantic optical fiber submarine cable system, TAT 8, was a joint enterprise between the American Telephone and Telegraph Co., British Telecom, and France Telecom; it provides some 40,000 telephone circuits, together with data and video channels, over 12 glass fibers, each carrying 140 Mbit/s pulse trains. The development of optical fiber systems continues; light can now be amplified, e.g., in erbium-doped optically pumped glass fibers, it can be electrically switched from fiber to fiber, and light of different wavelengths
can be multiplexed on the same fiber. Cable losses continue to be reduced, and there is a prospect of a repeaterless transatlantic optical fiber cable. It is clear that there is a continuing and expanding future for the optical system concept that originated with Charles Kao in 1966.
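The effect of the loss reductions recounted above is easiest to see as a simple repeater-spacing budget. The launch power, receiver sensitivity, and margin below are assumed round numbers for illustration only, not the parameters of TAT 8 or any other real system.

```python
# Illustrative optical loss budget: maximum unrepeatered span vs. fiber loss.
LAUNCH_DBM = 0.0         # assumed laser launch power, dBm
SENSITIVITY_DBM = -35.0  # assumed receiver sensitivity, dBm
MARGIN_DB = 5.0          # assumed allowance for joints, aging, repairs, dB

budget_db = LAUNCH_DBM - SENSITIVITY_DBM - MARGIN_DB  # usable loss, dB

for loss_db_per_km in (1000.0, 10.0, 1.0, 0.2):
    span_km = budget_db / loss_db_per_km
    print(f"{loss_db_per_km:7.1f} dB/km -> max span {span_km:8.2f} km")
```

At 1,000 dB/km the span is a few tens of meters; at 10 dB/km a few kilometers; below 1 dB/km it reaches the tens to hundreds of kilometers that made repeaterless cross-channel submarine cables possible.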
F. Electronic Computer-Controlled Switching Exchanges

With more than 700 million telephones in use in the world, and with facsimile, data, and video communication growing at an ever-increasing rate, the problem remaining once transmission paths have been created by cable, radio, or satellite is to enable each user of the world telecommunication network to select another from the millions connected to the network. This problem has been largely solved through the efforts of the many scientists, mathematicians, and engineers who have created today's fast and reliable exchange switching systems. The creation of a worldwide telecommunication network, and especially its switching systems, has necessarily involved international cooperation on a large scale to secure the commonality of system technical characteristics and operating procedures to ensure that both local and long-distance calls can be connected with equal ease. This cooperation has been achieved through the International Telephone and Telegraph Consultative Committee (CCITT) of the International Telecommunication Union, and its success has been an outstanding example of the willingness of nations to work together for the common good when the aims are clearly defined. Exchange switching systems have gone through a long process of evolution, from the early manually operated switchboards connecting limited numbers of users in a local area, through electromechanical switches such as Strowger and Crossbar which were prominent from the 1930s to the 1970s, to the electronic computer-controlled switches which are now an increasing part of the world network. The electromechanical switching systems were essentially "step" systems in which pulses of current generated by the user's rotary telephone dial moved the exchange switch one step at a time to find a wanted connection. Since each switch could accommodate only a hundred or so outgoing lines, banks of switches had to be interconnected to ensure that, as far as possible, there was an unblocked pathway through the exchange. The economic design of electromechanical exchange switching systems to minimize the numbers of switches required became a complex and highly mathematical art known as "trunking and grading." The problems of design became more acute as the number of user lines at each exchange grew into the tens and hundreds of thousands - the "big city" problem.
The "big city" problem was solved by a key development known as "common control," in which the dialed impulses are placed in a temporary store or register at the exchange and then converted to a machine language best adapted to control the interconnecting switches. This principle was later applied in the "stored program" of computer-controlled electronic exchanges. The electromechanical exchange switching systems of the Strowger and Crossbar types had considerable limitations - the switches were bulky and slow, and they required considerable maintenance to avoid contact failures, e.g., through exposure to the atmosphere. Although these problems were minimized by a generation of "reed-relay" exchanges, based on contacts sealed from the atmosphere, the basic limitations remained. The evolution of exchange switching from electromechanical to computer-controlled electronic systems opened the door to major improvements in the speed and reliability of the connection process, but above all it enabled additional service facilities such as abbreviated dialing, call transfer, and itemized billing of call charges to be provided rapidly and efficiently. It greatly facilitated the management of switching exchanges and customers' lines, especially with regard to fault location and correction. And it provided the flexibility to handle the expanding non-voice telecommunication traffic such as data, facsimile, and video. This evolution was made possible by the transistor and the microchip; by the principle of "stored program" control, adapted from computers; and by digital techniques. The microchip and other solid-state devices provide the basis for very fast and reliable switches; they enable temporary and semipermanent "memories" to be made for storing information, e.g., in transient form about call progress and, in more permanent form, for switch control. The latter includes "logic" operations, e.g., the manipulation of information into a different form suitable for switch control. In contrast with earlier "wired logic," solid-state logic can be readily rearranged to accommodate new service requirements. To the Bell Telephone Laboratories (U.S.A.) must go the credit for the first pioneering and major advance in electronic switching systems - the Bell System ESS No. 1 - which commenced service in 1965, then described as the largest single development project ever undertaken by the Laboratories. Although ESS No. 1 did not incorporate large-scale integrated circuits and used a variety of magnetic device technology, it was based on common-control principles which were continued in its successors and implemented more cost-effectively in microchip technology. ESS No. 1 was also one of the first electronic exchanges to incorporate push-button "touch-tone" dialing, providing faster connections than the rotary dial, and which was eventually adopted worldwide. By the 1970s, PCM digital transmission was
beginning to supplement analog frequency-division multiplex transmission on interexchange and intercity routes in the U.S.A., and it became economic to switch digitally, as in the Bell System ESS No. 4 electronic exchange. In the British Post Office, electronic switching research and development had been pursued, in cooperation with the U.K. telecommunications industry, since the 1950s. Two key developments emerged from this work, one leading to digital switching, and the other to a radically new concept of system design and evolution. The application of 24-channel time-division multiplexed PCM digital transmission on interexchange junction routes gave the PO Research Department an incentive to explore digital switching, leading to a field trial at Empress Exchange, London, in 1965 - a world first. The trial system, which switched traffic on six 24-channel PCM digital links between three neighboring exchanges, was formally opened for operational use in 1968. This achievement, and a similar development by Standard Telephones and Cables Ltd. with a field trial at Moorgate Exchange, London, in 1971, provided confidence and a firm basis for the ongoing development of digital switching throughout the United Kingdom telecommunication network. The second key development was the creation of an evolutionary approach to the design of exchange switching systems that became known as "System X." This concept, initiated by the British Post Office and subsequently developed in collaboration with U.K. industry, defines a common framework for a family of exchange switching systems including local, tandem, and trunk exchanges, based on a series of hardware and software subsystems, each carrying out or controlling specific functions in the switching process. A primary aim of System X is to make provision for future changes that are operationally or economically desirable, e.g., to take advantage of new system concepts and advances in device technology, and to provide new services rapidly and reliably. At the same time, System X is designed to work with the existing network on either an "overlay" or a "replacement" basis, so that the network can be updated without disruption. Central to System X design are the following:

1. Stored program control for controlling the operation of the exchange
and performing administrative functions such as maintenance, recording call charges and traffic data.
2. Common channel signalling, using only one out of many channels to control the setting-up and routing of calls.
3. The use of microchip technology.
4. Provision for integrated digital transmission and switching at local, as well as tandem and trunk, levels.
5. Provision for non-voice services such as facsimile, data, Prestel, audio, and video conferencing.
Many of the System X concepts were initiated in the Research Department of the British Post Office and were later given a specific form by a joint PO-U.K. Industry Advisory Group on System Definition, set up in 1968. The subsequent collaborative program for the design and manufacture of System X equipment was initiated in 1976; it subsequently achieved large-scale production and massive utilization in the U.K. and world telecommunication networks.

G. Facsimile, Data Access (Teletext), and Television Conferencing Services
From its specialized beginnings for press services in the pre-War years, facsimile has evolved into a universal worldwide letter and document transmission service for personal and business use. By using the telephone channels of the existing world telecommunication network, with technical and operating procedures standardized internationally through the CCITT, the growth of the service and its wide availability have been unrestricted. Progress has been made to reduce the time and improve the quality of facsimile transmission; the six minutes required to transmit an A4 page in the 1960s was reduced to 3 minutes by 1976. A new generation of facsimile machines is set to reduce the transmission time further to less than 30 seconds by exploiting the 64 kbit/s capability of the integrated services digital network (ISDN) and using sophisticated coding and scanning techniques to remove redundant information in the transmitted signals. By using still higher bit rates in the ISDN, complete pages of newspaper text and half-tone illustrations can be rapidly transmitted-a service of particular value for worldwide newspaper production. The visual presentation of alphanumeric and graphical information on television screens at customers’ premises, derived from central data banks over telephone lines, was first demonstrated by British Telecom’s Research Department in 1974; it became the basis of a new service at first called Viewdata, later BT “Prestel”, available to domestic and business users. The range and number of pages of information that can be accessed from the user’s keyboard are virtually unlimited; they can include, for example, telephone directory “yellow pages,” bus and rail timetables, travel information, share prices, “for sale” items, and “what’s on” at local theaters. Furthermore, since the service is essentially “interactive,” that is, under the control of the user via his or her local keyboard, it can be used for banking and cash transfers, shopping from home, message (telegram) transmission,
travel booking, and similar services. A parallel development was the BBC and ITV "Ceefax" and "Oracle" teletext information services; however, because of the limited space available in the television signal waveform, these offer a restricted number of pages and are not user-interactive. Television conferencing, enabling live group-to-group visual communication with supporting sound and document facsimile transmission, began as the British Post Office "Confravision" service in 1972, at first using standard 625-line television equipment and 5 MHz bandwidth analog cable and microwave radio-relay links. Television conferencing offers substantial reduction in the cost of visual communication for business, political, or social purposes when long distances are involved (Fig. 3). The immediacy of communication it provides is in sharp contrast with the time involved in long-distance travel. Furthermore, it enables groups in more than two widely separated locations to confer simultaneously - a facility offered by no other means of communication. The environmental advantages compared with air travel, for example in lower energy consumption and less pollution, are by no means insignificant. In spite of its obvious advantages, television conferencing is not yet widely used by the business world. Larger-scale use depends on reducing the cost of connection and enabling video links to be "dialed up" as readily as a telephone call. Worldwide communication by satellite, and optical fiber cables for local area, intercity, and transoceanic communication, are, through their large communication capacity, greatly reducing transmission costs and increasing the availability of long-distance video links. Current developments using the integrated services digital network will provide readily dialed-up, cost-effective video links by exploiting the 64 kbit/s mode to provide person-to-person visual communication, and multiples of this bit rate up to about 2 Mbit/s for higher-definition group-to-group conferencing. Sophisticated digital coding techniques have greatly reduced the bit-rate requirements for acceptable picture quality. A European telecommunication authorities' collaborative project, "MIAS" (multipoint multimedia conference system), is planning the technical basis for a worldwide system enabling groups of users at different locations to see and speak with one another, send facsimile documents, exchange data files, and examine high-definition still pictures.
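As a rough cross-check on the page times quoted in this section, the sketch below estimates the time to send one scanned A4 page at various channel rates. The scan dimensions approximate a Group 3 machine, and the 10:1 redundancy-removal ratio is an assumed illustrative figure, not a measured one.

```python
# Rough facsimile timing: one scanned A4 page at various channel bit rates.
PAGE_BITS = 1728 * 2300   # approx. A4 scan, 1 bit per pixel (Group 3-like raster)
COMPRESSION = 10.0        # assumed redundancy-removal ratio (illustrative)

bits_to_send = PAGE_BITS / COMPRESSION

for rate_bit_s in (4800.0, 9600.0, 64000.0):
    seconds = bits_to_send / rate_bit_s
    print(f"{rate_bit_s / 1000:5.1f} kbit/s -> {seconds:6.1f} s per page")
```

With these assumptions a page takes over a minute at analog-modem rates but well under 30 seconds on a 64 kbit/s ISDN channel, consistent with the figures given above.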
IV. TELECOMMUNICATIONS IN THE NEXT 50 YEARS
FIGURE 3. Mr. Punch foresees conference television 100 years ago. It is remarkable that this cartoon in Punch magazine for December 1879 foresaw not only U.K.-Australia conference television, but also large-screen projection in a format not dissimilar to today's high-definition TV.
Telecommunications in the last 50 years has been mainly "technology-led" - that is, the services provided were stimulated by the invention of new devices such as the transistor and the microchip, the coaxial cable and radio-relay, and electromechanical/electronic switching systems. The next 50 years will be mainly "software-led"; this will involve increasingly sophisticated computer-type programs for the control of transmission, switching, and network organization to facilitate the growth of existing services and provide new services. The trend is thus towards an "intelligent network" capable of responding quickly and cost-effectively to a wide range of user requirements, without detailed control by the user. Also important in large telecommunications organizations is the development of software for business and personnel management, the control of maintenance and fault-finding, customer inquiries, and billing - not least to reduce manpower requirements. Although the impact of software development may well be dominant, there are foreseeable technology trends that could enable new services to be created and existing services to be improved.

A. New and Improved Services

1. Integration of Telecommunications and Broadcasting
The wide effective bandwidth of the optical fiber cable offers a hitherto unrivalled capability for providing integrated services on a single cable to customers' homes. The short-range millimetric radio link also offers similar potential. Multichannel broadcast television and radio, a video library service, and telephone, data, facsimile, and television conferencing services - using the integrated services digital network concept as appropriate - could well be combined in a common facility. The economic advantages of a logically engineered, cost-effective integrated system and the benefit to users of such an approach would appear substantial; whether the competing interests of broadcasters, program providers, commercial firms, and cable companies - not to say the wisdom of regulators - will inhibit its creation remains to be seen.

2. Video Library: View What You Want, When You Want
The electronic video library service - in which the viewer has "on-demand" access over cable or millimetric radio to a wide range of recorded video programs and information sources in local or more remote libraries - has not yet progressed beyond experimental systems with limited program choices. The current widespread demand for rented videocassettes indicates a large potential market for such a service - the electronic version could offer immediate viewing selected from an unlimited range of choices
presented on a continuously updated "menu." The video library service could offer the convenience of centralized billing, e.g., according to the quality and duration of the material viewed. Only one video channel between each customer's premises and the video library is required - unlike present cable systems with their tens or hundreds of video channels. The break away from "continuously running" 24-hour conventional broadcasting services, which create a demand for low-cost and often poor-quality fill-in programs occupying a multiplicity of radio and cable channels, would, it is suggested, help to create a high-quality and financially viable viewing service providing far greater customer satisfaction. The transmission and switching technology for such a service already exist - what is needed, for economic as well as operational reasons, is the technology for a solid-state video store, with a capacity of a few hours' recording time, which can be accessed electronically by many viewers at a time.
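An order-of-magnitude bound on the store such a service would need, assuming a two-hour program (an assumption) and the digitally coded video rates mentioned elsewhere in this article:

```python
# Storage needed for one video library title (order of magnitude only).
HOURS = 2.0  # assumed program length
RATES_MBIT_S = {
    "coded conference-quality video": 2.0,
    "140 Mbit/s studio-quality video": 140.0,
}

for name, rate in RATES_MBIT_S.items():
    gigabytes = rate * 1e6 * HOURS * 3600.0 / 8e9
    print(f"{name}: about {gigabytes:.0f} GB for {HOURS:.0f} hours")
```

A few gigabytes per coded title - far beyond the microchip memories of the day, which is why the text identifies the solid-state video store as the missing technology.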
3. Television Conferencing

The integrated services digital network (ISDN) already provides a basis for a "dial-up," cost-effective television conferencing service for person-to-person or group-to-group use, generally using conventional visual display units. However, this falls some distance short of creating a real "you are here" sense of conference participation. This is partly because the single camera in such systems has only a single viewpoint, whereas each of the several viewers in a large spread-out group has a distinctive viewpoint. Furthermore, the presentation lacks three-dimensional visual quality - a successful 3-D system (e.g., hologram), with stereophonic sound, could in the future create a television conferencing system very close to reality.

4. Domestic Television Viewing: Virtual Reality

Innovation in domestic television viewing is likely to proceed in the direction of higher definition, e.g., 1,250-line, and larger rectangular-format viewing screens. A virtually unexplored viewing device is the video analogue of the personal headphone audio system, in the form of miniature viewing screens built into the viewer's spectacles. Since each eye sees a separate picture, three-dimensional viewing becomes readily practicable. The possibilities are interesting - such a device could offer the illusion of being in an unrestricted three-dimensional world, entirely cut off from the viewer's immediate surroundings. The technological base for such a development is already here - miniature liquid-crystal television displays are now state-of-the-art.
A further development-“virtual reality”-makes such a viewing system user-interactive by providing users with sensors enabling them apparently to move about in a 3-D picture scene, but this requires a data base of extremely large capacity.
5. The Electronic Newspaper

The electronic newspaper of the future will probably use a flat display unit based on liquid-crystal and microchip techniques, thin and light enough to be comfortably hand-held, and linked to the telecommunication network by short-range radio link. The displayed information could be transmitted overnight for reasons of economy, and updated as required during the day. Although the technical characteristics and the operational and billing arrangements would need to be standardized, such a system could offer access to a multiplicity of "newspapers," of virtually unlimited scope and variety, of local, national, or overseas origin. The social and environmental benefits could in time be substantial. A major advantage over the present large-scale use of newsprint would be a slowing down of the rate of destruction of the world's forests. The uneconomic use of manpower and the congestion of roads necessitated by today's cumbersome, slow, and inefficient newspaper distribution system would, in time, be replaced by a speedy, energy-efficient, electronic system.

B. New Technology and System Concepts

1. Opto-electronic and Photonic Switching
Many laboratories are now exploring optical switching with a view to closer integration with optical transmission and, ultimately, faster, more flexible, and lower-cost telecommunication systems. One approach is to employ optical switches using photons to replace the semiconductor devices using electrons in conventional switching configurations such as crossbar, but with electronic control of the switching function - hence "opto-electronic." Another approach uses guided light waves in a lithium niobate crystal, which can be deviated from one light guide to another by applied electrical control signals. This principle can be extended to light beams in space, which can be deviated orthogonally to impinge on light-sensitive receptors in a two-dimensional matrix - hence "photonic" switching. Another, radically different, approach is based on light-wavelength multiplexing, in which coherent laser light sources are marginally deviated in frequency to select one of a matrix of frequency-sensitive receptors.
Multistage optical switching networks are well adapted to self-routing data transmission, in which packet-type data with appropriate header addresses can find their own way through complex networks, setting switches as they proceed (a toy illustration of the routing rule is sketched below).
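The self-routing idea can be illustrated with the classic banyan-style rule, in which each switching stage examines one bit of the destination address carried in the packet header. This toy model is a generic sketch of the principle, not a description of any particular optical switch fabric.

```python
# Toy self-routing through log2(N) switching stages (banyan-style rule).
# At stage k, a packet takes the upper output if bit k of its destination
# address (most significant bit first) is 0, and the lower output if it is 1.
def route(dest, n_stages):
    path = []
    for k in range(n_stages):
        bit = (dest >> (n_stages - 1 - k)) & 1
        path.append("lower" if bit else "upper")
    return path

# Route a packet to output 5 (binary 101) in an 8-output network.
print(route(5, 3))   # ['lower', 'upper', 'lower']
```

No central controller is involved: each switch element makes a local decision from the header, which is what suits the scheme to very fast optical fabrics.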
2. The Transparent Network: An Optical "Ether"

Proponents of this concept claim that there is sufficient bandwidth available in optical fiber cables, and the frequency stability achievable in coherent laser light sources is potentially sufficient, for each customer in the national network to be allocated a unique frequency. A desired A-to-B connection would be established by A sending at the appropriate frequency to reach B - analogous to radio communication via the all-pervading radio-wave ether. Clearly, such a scheme would involve light amplification to overcome optical cable and light branching network losses. In an optically transparent network, switching would disappear and there would be freedom to use the transmission capability of each path thus established in whatever mode was desired.

3. The Unlimited Data Base

Nearly all present-day communication systems depend on the ability to store and access information in digital form in memory devices based on the microchip - typically with up to a million or more individual solid-state devices, each corresponding to an information "bit," on one square centimeter of silicon. Access, or "reading," is electronic and fast. Magnetic tapes and optical discs provide means for video signal storage equivalent to some tens of billions of bits, but with relatively slow electro-mechanical or optical-mechanical scanning access. The continuing development of information technology and new services such as the video library will require data bases of even larger capacity and fast access, while the effective implementation of the "virtual reality" concept requires data bases larger by several orders of magnitude than have been achieved so far. To further these objectives, research is looking to solid-state bit-storage elements of molecular and even atomic dimensions; the most recently reported research indicates the possibility of a "single electron" device. If so, the door would be open to fast-access data bases of almost unlimited capacity.
V. THE IMPACT OF TELECOMMUNICATIONS AND INFORMATION TECHNOLOGY ON THE FUTURE OF MANKIND

But perhaps telecommunications is moving into an era where technological innovation is less important than the more effective use of existing information technology for environmental and quality of life improvement.
It is a simple truism that it is no longer necessary to travel to communicate - there are now ample means available to communicate at a distance, person-to-person, group-to-group, and with or between computers and data banks, in a variety of modes including audio, visual, and permanent record (fax) - described by the term "information technology." A great deal of the work that goes on in offices in large cities is concerned with the processing and exchange of information; to carry it out, vast numbers of commuters travel by train and car, day by day, from their homes in the suburbs or other towns. Such travel is wasteful of time, involves the consumption of enormous amounts of irreplaceable fossil fuels, and creates heavy peak demands on road and rail services, leading to congestion and delay. Long-distance intercontinental travel by jet aircraft for business purposes is wasteful of resources, time-consuming, and harmful to the environment. Tele-conferencing offers an economical and rapid alternative, with the unique advantage that it also permits simultaneous communication between three or more widely separated locations. At the present time information technology such as tele-conferencing is being used only on a limited scale to replace travel, for example, by firms with offices in cities and factories in distant locations. The use of facsimile transmission for business purposes is now well established and growing rapidly. What is now needed to make more effective use of information technology is a cooperative and unified study by government, industry, environmentalists, and telecommunication authorities to determine the benefits to the economy and the environment that could be achieved by the large-scale use of information technology to move offices, factories, and other centers of employment away from large cities to locations that are more acceptable environmentally. The innovative use of information technology could have a beneficial effect on the declining rural economy, at present suffering depopulation and the loss of schools, shops, and other facilities. It could stimulate the development of small, environmentally friendly, information-technology-based offices in villages and small towns, used by a single firm or on a shared basis by a number of firms. There are possibilities too for working from home rather than in town-based offices, an activity that includes a new and profitable cottage industry - the preparation of computer software. If such a study reveals that the economic and environmental benefits of the larger-scale use of information technology are both substantial and achievable - as they may well be - means for encouraging the relocation of existing offices, and a location planning policy for new offices, will be needed.
The successful implementation of a study aiming at the large-scale use of information technology and the decentralizing of office activity could well have a major impact on where people live and work and the quality of life they enjoy in the future-together with substantial benefits to the national economy and the environment itself.
Mesoscopic Devices Where Electrons Behave like Light¹
A. J. HOLDEN
GEC-Marconi Materials Technology, Caswell, Towcester, Northamptonshire, United Kingdom
I. Introduction 213
II. Building a Mesoscopic System 214
III. A Brief History 215
IV. Transport Theory 215
V. Some Simple Examples 216
VI. The Aharonov-Bohm Effect 218
VII. Two-Terminal Conductance in 1-D Quantum Point Contacts 219
VIII. Magnetic Phenomena: Edge States 223
IX. Devices 225
X. Summary and Conclusions 227
References 228
I. INTRODUCTION

Mesoscopic systems I will define as physical regions within which carriers do not suffer any quantum-mechanical phase-destroying collisions, bounded by regions which can be described "classically" and contacted by charge reservoirs which act as a source and sink of carriers and of energy. Practical observations of such systems have been reported recently by several groups around the world. Of particular note are the observations of quantized conductance in two-terminal structures, conductance oscillations periodic with applied magnetic or electric field, electron waveguide effects including stub tuners and directional couplers, non-ohmic behavior of ballistic resistors, the anomalous Hall effect, and universal conductance fluctuations in applied electric or magnetic fields. All of these effects make for some fascinating physics, although the idea of a mesoscopic device may be a little premature. Quantum effects in fabricated structures such as quantum wells in optoelectronics and electronics, resonant tunnelling diodes, and hot electron transistors are not new. However, most existing quantum structures rely on transport vertically through the grown layers. I propose to restrict my review to the new "laterally confined" structures, which offer renewed novelty and which, I will argue, have some advantages over the vertical structures for practical devices.

¹ Reprinted from Institute of Physics Publishing (IOP) conference series on Gallium Arsenide.
II. BUILDING A MESOSCOPIC SYSTEM
The key to the realization of mesoscopic systems is the modulation-doped semiconductor heterojunction, which produces a high-mobility two-dimensional electron gas (2DEG). Mobilities in excess of 1,000 m²/Vs have been observed at millikelvin temperatures, with an elastic mean free path of several microns. In such a system, regions a few microns long may sustain carriers which travel "ballistically," retaining their quantum-mechanical phase. Furthermore, carriers may retain phase information, even after undergoing a few collisions, provided that such collisions are not "totally dephasing." In practice, elastic collisions are not dephasing, whereas inelastic collisions are. Leggett (1989) makes a more general definition of a dephasing collision as one in which the quantum state of the environment is changed. If such a dephasing collision occurs, the boundary or environment of the mesoscopic system can no longer be treated classically.

Three factors limit the observation of mesoscopic behavior. Firstly, detection of quantum effects usually requires low carrier densities; this leads to reduced screening of remote impurities and brings the practical elastic mean free path down below 1 μm (Timp et al., 1989). Secondly, for studies of transport in a particular direction, the 2-D system imposes a twofold ensemble averaging, over the carrier energy and the transverse wave-vector k_t. This latter average is disastrous when elastic scattering is present, and so 2-D mesoscopic systems have to be exceptionally small and operate at very low temperature (Bandyopadhyay et al., 1989). Moving to 1-D systems removes the k_t averaging and allows a certain amount of phase-preserving elastic scattering to take place, relaxing the constraints on temperature and size. Most observations of mesoscopic effects have been made in 1-D systems. Finally, the provision of contacts to the system must not violate the mesoscopic region. They must be close enough together to allow phase-coherent transport between them, but far enough apart to prohibit the transmission of evanescent modes.

One-dimensional systems are usually constructed by patterning the surface of a modulation-doped structure using nanolithographic techniques. Split gate structures (Wharam et al., 1988a) or mesa structures (Miller et al., 1989) use surface depletion to laterally confine the 2DEG and produce 1-D channels, waveguides, constrictions, and point contacts in which mesoscopic experiments can be performed.
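To make the link between mobility and mean free path concrete, the following sketch evaluates l = ħk_F·μ/e for a 2DEG, with the Fermi wave-vector obtained from the sheet carrier density. It is a minimal illustration; the density and mobility values are assumptions chosen to be typical of the structures described above, not data from the text.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
E = 1.602176634e-19     # electron charge, C

def mean_free_path(sheet_density_m2, mobility_m2_per_vs):
    """Elastic mean free path l = v_F * tau of a 2DEG.

    Uses k_F = sqrt(2 * pi * n_s) for a spin-degenerate 2DEG and
    tau = m* * mu / e, so l = hbar * k_F * mu / e (the effective
    mass cancels).
    """
    k_f = math.sqrt(2.0 * math.pi * sheet_density_m2)
    return HBAR * k_f * mobility_m2_per_vs / E

# Illustrative (assumed) values for a modulation-doped GaAs/AlGaAs 2DEG:
n_s = 3e15   # sheet density, m^-2
mu = 100.0   # mobility, m^2/Vs (= 1e6 cm^2/Vs)
print(f"mean free path = {mean_free_path(n_s, mu) * 1e6:.1f} um")
# -> about 9 um: "several microns", as quoted above
```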
III. A BRIEF HISTORY
Mesoscopic physics is a very new field. It has its origins in the observation of universal conductance fluctuations (Thornton et al., 1987) and the observation of Aharonov-Bohm (A-B) fluctuations in both metal rings (Webb et al., 1985) and semiconductor quantum wells (Datta et al., 1985), these being the first genuine observations of quantum interference effects in nanofabricated structures. Before this time only the quantum mechanics of perpendicular "quantum well" type structures had been observed. Quantum wires (van Houten et al., 1986) and electron waveguides (Timp et al., 1987) were now studied, with the key development being the fabrication of structures on a scale less than the dephasing scattering length. Buttiker (1986) produced a theory which extended the very early work of Landauer (1970) and has become the blueprint for interpreting mesoscopic systems. This theory has become the "Kirchhoff's laws" for mesoscopic systems (see Section VII). Application of Buttiker theory predicts quantized conductance in ballistic point contacts, and this was first observed by van Wees et al. (1988) and Wharam et al. (1988a). Subsequently, many groups across the world have confirmed the observations and proposed exciting new structures. The whole subject seems to have come of age at the International Symposium on Nanostructure Physics and Fabrication, College Station, Texas, in March 1989, where many of the key observations were reported.
IV. TRANSPORT THEORY

The inclusion of quantum mechanics in transport theory has undergone three distinct phases, each of which has its relevance and application. Since the advent of quantum mechanics, the most widely used and successful theory is the kinetic theory (developed originally for gases), which derives its quantum mechanics through the use of Fermi-Dirac statistics, band structure, and quantum-mechanical scattering cross-sections and then proceeds by treating carriers in what we have come to term a "classical" manner, solving for drift and diffusion, etc. This theory has developed in many forms, with its most recent successes being in large-scale numerical Monte Carlo calculations. In the second phase, attempts have been made to treat transport within a fully consistent quantum-mechanical framework. Of particular note here is the work of Kubo (1986). This theory is used in closed systems with a well-defined Hamiltonian. We would also include in this category the theory of quantum wells and the like, which are trivial but nevertheless
well-known examples of where closed-system, consistent quantum-mechanical solutions have proven to be very useful. A fundamental problem with this type of theory is that it solves for a conservative system and then attempts to apply the solution to a dissipative system. Boundaries are always a problem, since it is physically impossible to include the whole environment in the Hamiltonian. Contacts and leads have to be fudged in afterwards. One application of this technique which we made to GaAs field effect transistor (FET) channels gets around the problem of boundary conditions by considering the channel as an infinite "gutter pipe" which is tipped (source-drain field applied) at time t = 0, with the evolution of a quantum-mechanical carrier traced in time (Holden and Debney, 1985; Sawaki, 1983). In this way, the time evolution of the phonon scattering time constant can be extracted, but the entrance and departure of carriers from the system is not described. Buttiker (1986) pioneered the third phase, which has led to a proper understanding of mesoscopic systems. The mesoscopic system is short compared to the dephasing mean free path L_φ; it is bounded by an environment which can be described classically; and the contacts and leads are explicitly allowed for as reservoirs, providing a source and sink for carriers and energy. The whole sample is treated as a single phase-preserving scatterer, with all the dissipation occurring in the reservoirs feeding current to the sample. We are told how electrons enter and leave the sample, and from this the resistance can be calculated. Buttiker theory predicts quantized conductance in a two-terminal mesoscopic structure (Section VII) and has been extended to give a basic understanding of many other mesoscopic phenomena.
V. SOME SIMPLE EXAMPLES
Three simple examples are given to provide a feel for the real quantum-mechanical effects which can be observed in mesoscopic systems. In Figs. 1 and 2 we show the results of Webb (1989) for a short, narrow, metallic wire and two configurations of wire loop, all in a magnetic field (the basic geometry is shown in the insets, with the magnetic field perpendicular to the conducting plane). As the field varies, the measured resistance fluctuates. In the single wire (Fig. 1a), this fluctuation is random but reproducible. In the single loop (Fig. 2), the fluctuation has a fixed period equal to a change in flux through the ring of h/e, and in the wire with an extra loop outside the classical current path (Fig. 1b), the fluctuations have a low-"frequency" random component with a higher-"frequency" periodic component superimposed.
FIGURE 1. Quantum fluctuations in the conductance of a wire as a function of applied magnetic field. (a) Single wire and (b) wire with remote loop. After Webb (1989).
These are quantum interference phenomena which have no classical analogue. The first example demonstrates universal conductance fluctuations (UCF), where the wave function of a coherent carrier passes along alternative paths through parts of the sample, as shown schematically in Fig. 3. The presence of a magnetic or electric potential changes the effective phase length in each path (see Section VI), and the carrier interferes with itself, causing conductance fluctuations. The resulting fluctuations are random and will, of course, sum to zero in a large sample. In a small sample, these fluctuations can be observed and will be precisely reproduced with each field pass, unless the impurity system or whatever is producing the current paths is changed.
FIGURE 2. Oscillatory fluctuations in the resistance of a 0.8-μm ring as a function of applied magnetic field; oscillation "period" Φ₀ = h/e. After Webb (1989).
FIGURE 3. Alternative paths for coherent carriers lead to microscopic rings whose conductance fluctuates with the flux enclosed-universal conductance fluctuations.
Moving just one impurity in a small sample can have a dramatic effect on the UCF. In the second example, two paths for the wave function are deliberately provided as two arms of a ring. There is now one dominant interference effect determined by the flux through the ring, leading to fluctuations with a single short period in the field H. The third example shows the longer, random-period UCF associated with the wire combined with the shorter, single-period fluctuations associated with the extra small loop. Interference effects associated with the loop are observed clearly, even though the loop is outside the classical current path in the experiment. Mesoscopic VLSI circuit designers take note!
VI. THE AHARONOV-BOHM EFFECT

Following the original studies of Aharonov and Bohm (1959), Webb et al. (1985) and Datta et al. (1985) made the first observations in nanofabricated structures. Figure 4 shows a schematic ring structure. Carrier wave functions entering from the left divide and pass along both arms of the ring. If a magnetic vector potential A and/or a scalar electric potential V is present, the phase of the wave in the two arms differs by an amount

δφ = (2πe/h) ∫_{r₁}^{r₂} (V dt − A · ds),   (1)

where r₁ and r₂ are the positions of the incoming and outgoing waves on the ring. The waves traveling on the two halves of the ring will oscillate in and out of phase with a flux period given by Φ₀ = h/e or a voltage period given by ∫V dt = h/e, where the integral is over the time period that the electron is in contact with the potential. It is important to note that Eq. (1) will give a phase change, provided that a potential exists, even if the actual electromagnetic fields themselves are zero in the current path, i.e., there is no classical force acting on the carriers.
FIGURE 4. Schematic geometry for the Aharonov-Bohm experiment.
This basic theory explains all three phenomena described in Section V and forms the basis for a number of device possibilities. Datta et al. (1985) fabricated an A-B ring in the form of a pair of parallel semiconductor quantum wells using molecular beam epitaxy (MBE) and observed unambiguous conductance oscillations. The rms value for conductance fluctuations is e²/h and is universal.
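As a rough numerical illustration of Eq. (1), the sketch below computes the magnetic-field period ΔB = Φ₀/(πr²) of the h/e Aharonov-Bohm oscillations for a ring of assumed radius; the 0.4 μm value is an assumption chosen to match the scale of the rings discussed above, not a measured parameter.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
E = 1.602176634e-19  # electron charge, C

def ab_field_period(radius_m):
    """Magnetic-field period of h/e Aharonov-Bohm oscillations.

    One oscillation corresponds to one flux quantum h/e threading
    the ring area pi * r^2, so Delta B = (h/e) / (pi * r^2).
    """
    flux_quantum = H / E                     # ~4.14e-15 Wb
    return flux_quantum / (math.pi * radius_m**2)

# Assumed ring radius of 0.4 um (0.8 um diameter, as in Fig. 2):
print(f"Delta B = {ab_field_period(0.4e-6) * 1e3:.1f} mT")
# -> about 8 mT between successive resistance oscillations
```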
VII. TWO-TERMINAL CONDUCTANCE IN 1-D QUANTUM POINT CONTACTS

Some of the most remarkable mesoscopic physics is to be found in the conceptually simple 1-D quantum wire or waveguide. These structures are usually made by imposing a variable lateral constriction over a short distance in a 2DEG, and they can also be viewed as a quantum point contact (QPC). Figure 5 shows a typical split gate approach used by Timp et al. (1989). Application of a negative gate bias depletes the 2DEG layer and produces a 1-D constriction or short waveguide for carriers traveling in the y direction between the two ohmic contacts. The width of the waveguide is controlled by the gate bias causing lateral depletion between the gates (indicated by the dotted lines), which varies the number of quantized states below the Fermi level or, alternatively, the number of allowed electron waveguide lateral modes. Figure 6 shows a simplified "Buttiker" picture of the QPC with a mesoscopic 1-D region contacted at each end by reservoirs.
FIGURE 5. Typical split gate structure. After Timp et al. (1989).
If a small change is made in the relative chemical potentials between the reservoirs, δμ = eV, where V is the applied voltage, then the left-hand reservoir is free to inject carriers into the 1-D guide with velocity v_F, the Fermi velocity. The current so produced will be the product of the charge times the carrier velocity times the density of forward-traveling states times the energy range, i.e.,

I = e × v_F × (1/2)N_1D(E_F) × δμ.   (2)

In one dimension, the density of states in energy (including spin) is

N_1D(E) = 2/(πħv),   (3)

where v is the carrier velocity at energy E.
FIGURE 6. Simple two-terminal picture of mesoscopic conduction leading to the Buttiker picture of quantized conductance.
FIGURE 7. Measured quantized resistance steps in a split gate structure. After Wharam et al. (1988a).
On substituting (3) into (2), the velocities cancel and we have the simple result

I = (2e/h) δμ;   (4)

substituting for δμ yields the conductance of one 1-D level or "channel" as

G = 2e²/h,   (5)

or, for n channels,

G = 2ne²/h.
The number of available channels or modes in the 1-D guide can be varied by varying the constriction width with the gate voltage. This predicts that the conductance will increase in quantized steps with gate voltage as each new channel becomes available. The resulting resistance is shown in Fig. 7, as observed by Wharam et al. (1988a). This conductance quantization was also observed independently by van Wees et al. (1988). Subsequently, work has been done on placing two or more QPCs in series (Wharam et al., 1988b; Beenakker and van Houten, 1989a), leading to the remarkable result that the resistances do not add, but the net resistance is that due to the highest-resistance constriction. This can be easily understood from the preceding analysis, since each QPC will only transmit so many modes according to its width. If the waves remain coherent between the QPCs, then the only modes to pass all the way through will be those transmitted by the QPC with the lowest number of allowed modes. If this is n, then the resistance of the network will be h/2ne², regardless of the number of extra QPCs added which transmit more than n modes. Ohm's law is rewritten.
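A minimal sketch of Eq. (5) and the series rule just described: conductance comes in steps of 2e²/h, and phase-coherent QPCs in series transmit only the modes passed by the narrowest one. The mode counts used below are illustrative assumptions.

```python
H = 6.62607015e-34   # Planck constant, J s
E = 1.602176634e-19  # electron charge, C
G0 = 2.0 * E**2 / H  # conductance quantum 2e^2/h, ~77.5 uS

def conductance(n_modes: int) -> float:
    """Two-terminal conductance of a ballistic QPC with n occupied 1-D modes."""
    return n_modes * G0

def series_resistance(mode_counts: list[int]) -> float:
    """Phase-coherent QPCs in series: only modes passed by every
    constriction get through, so R = h / (2 * n_min * e^2); the
    resistances do not add as Ohm's law would suggest."""
    n_min = min(mode_counts)
    return 1.0 / conductance(n_min)

for n in (1, 2, 3):
    print(f"n = {n}: R = {1.0 / conductance(n) / 1e3:.2f} kOhm")
# n = 1 gives h/2e^2, about 12.9 kOhm
print(f"QPCs passing 3, 5, 8 modes in series: "
      f"{series_resistance([3, 5, 8]) / 1e3:.2f} kOhm")  # = h/(2*3*e^2)
```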
The preceding derivation considers only two contacts and assumes that every subband is transmitted and reflected perfectly. Buttiker (1986) extends these ideas to a multiple-contact system and defines transmission and reflection coefficients between the contacts to take account of the elastic properties of the system, which is treated as a single scattering entity. The resulting formula replaces Kirchhoff's laws for mesoscopic systems. A more conceptually powerful approach due to Laughton et al. (1991) considers a modal analysis of a QPC. Electron waves, now very like light waves in a waveguide, approach the QPC with a number of propagating modes. According to the width of the QPC, the transverse energies of some of the modes will be forced above the Fermi energy, causing them to be reflected. Modes remaining below E_F will be transmitted (a sketch of this mode counting follows Fig. 8). This quasi-optical picture of electron waveguides allows us to imagine other possible analogues such as the directional coupler, using two electron waveguides coupled via a variable potential barrier (del Alamo and Eugster, 1990), or the Fabry-Perot interferometer with electron waves (Smith et al., 1989). Of particular note for the device hunters is the mesoscopic stub tuner (Miller et al., 1989). Figure 8 shows a schematic of the traditional microwave stub tuner together with its mesoscopic analogue. The electron wave direct from the source interferes with a part of itself which has made the round trip down and up the stub. The effective stub length is varied by extending the gate depletion region, and the source-to-drain conductance is seen to oscillate with applied gate voltage. Multimoding in the electron guide causes some multiple periods, but the oscillations have been detected in a fabricated structure (Miller et al., 1989).
FIGURE 8. Microwave stub tuner and its mesoscopic analogue using 1-D electron waveguides. After Miller et al. (1989).
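The mode count for a constriction can be estimated as the number of half Fermi wavelengths fitting across the channel, n = floor(2W/λ_F). The sketch below uses that idealized hard-walled (square-well) assumption; the width and density values are chosen for illustration only.

```python
import math

def mode_count(width_m, sheet_density_m2):
    """Number of occupied 1-D subbands in a hard-walled constriction.

    Transverse levels E_j = (hbar^2 / 2m) * (j * pi / W)^2 lie below
    E_F while j * (lambda_F / 2) < W, i.e. n = floor(2W / lambda_F),
    with lambda_F = 2 * pi / k_F and k_F = sqrt(2 * pi * n_s).
    """
    k_f = math.sqrt(2.0 * math.pi * sheet_density_m2)
    lambda_f = 2.0 * math.pi / k_f
    return int(2.0 * width_m / lambda_f)

n_s = 3e15  # assumed sheet density, m^-2 -> lambda_F ~ 46 nm
for w_nm in (50, 100, 200):
    print(f"W = {w_nm} nm: {mode_count(w_nm * 1e-9, n_s)} modes")
# Squeezing W with the gate bias removes modes one at a time,
# stepping the conductance down by 2e^2/h per mode.
```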
VIII. MAGNETIC PHENOMENA: EDGE STATES
The quantized conductance described in Section VII does not require a magnetic field, but the appearance of conductance quanta equal to 2e²/h is reminiscent of the famous integer quantum Hall effect (von Klitzing et al., 1980), which was first observed in Plessey-fabricated Metal Oxide Semiconductor Field Effect Transistors. The conductance quantization observed in QPCs is not as "sharp" as the quantized Hall effect in magnetic fields, but in mesoscopic structures the quantized Hall effect, and other magnetic field effects, can be understood on the same footing within the Buttiker theory.

Following Buttiker (1988) we consider a 2DEG with boundaries. This could be a 1-D QPC of the kind described in Section VII, but in high magnetic field, quantization will be seen even in comparatively wide devices. The 2-D system (the 2DEG) is further constrained by the edges of the sample, represented by a potential V(x) in the x direction which is flat in the middle and rises sharply at the edges of the sample. We will consider conduction in the y direction. If we now apply a magnetic field in the z direction and consider a separable solution for the wave function of the form

Ψ_jk = e^(iky) f_j(x),   (6)

the solution becomes an eigenvalue problem for f_j (Halperin, 1982) of the form

[−(ħ²/2m) d²/dx² + (1/2)mω_c²(x − x₀)² + V(x)] f_j(x) = E_j(k) f_j(x),   (7)

where

x₀ = −kħ/eB

is the center of the harmonic oscillator "Landau" wave function of order j and ω_c is the cyclotron frequency. If we consider only the central flat region of V(x) (= 0, say), the eigenenergies split into the usual Landau levels. Near the edges of the 2DEG, however, where V(x) rises steeply, the eigenenergies will also begin to rise, as shown in Fig. 9.
FIGURE 9. Quantized Landau levels for a 2DEG in a magnetic field (rectangular confining potential), showing how the energy rises as the center of the harmonic oscillator wave function (x₀) approaches the physical edge of the 2DEG. After Halperin (1982).
For a given position of the Fermi level, the only mobile states will be at the edges of the device, where the Fermi energy cuts the various eigenenergies. These states are at each edge of the waveguide. Buttiker (1988) shows that the mobile carriers at one edge will be traveling in the opposite direction to carriers at the other edge. These so-called edge states have important consequences for mesoscopic structures. First, by arguments similar to those of Section VII, Buttiker (1988) shows that the current fed into an edge state from a reservoir is the same as that fed into a 1-D waveguide state at zero magnetic field, i.e., I = (2e/h)δμ, with a corresponding quantized conductance 2ne²/h, where n is now the number of occupied edge states with positive velocity. Secondly, since states with opposing velocities must remain on opposite sides of the device, any scattering event must transfer a carrier right across the device to produce an effective backscatter. Buttiker (1988) points out that even the effect of inelastic collisions, in an otherwise coherent system, will be suppressed. Buttiker (1988) goes on to extend the preceding analysis to four-contact systems and demonstrates how the integer quantum Hall effect emerges in mesoscopic systems.

Van Wees et al. (1989a) have performed some elegant and fascinating experiments in which they make Hall measurements on a 2DEG in the mesoscopic limit and in high magnetic fields, so that the current is carried in edge states. They form the second voltage and current contacts using QPCs. The QPCs can be set (according to the width of the constriction) to accept or reject certain edge states. This leads to an apparent quantized Hall conduction plateau which is unrelated to the occupied Landau levels in the 2DEG. Edge states which are not accessed by the QPCs do not contribute to the electron transport and hence to the observed Hall conductance. Van Wees et al. (1989b) also demonstrate anomalous suppression of Shubnikov-de Haas resistance oscillations when a QPC is used as a controllable voltage probe. Anomalous Hall effects in mesoscopic structures have been analysed by Beenakker and van Houten (1989b) and Ford (1989) in terms of a "billiard ball" model for carrier transport in mesoscopic structures. This has some analogy with the ray picture in optics, rather than the wave picture used earlier. Using this approach, Ford (1989) was able to explain a series of experiments performed in mesoscopic Hall crosses.
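To put a scale on the Landau quantization in Eq. (7), the sketch below evaluates the cyclotron energy ħω_c = ħeB/m* and the bulk Landau level energies for an assumed GaAs effective mass and field; the values are illustrative, not taken from the experiments cited.

```python
HBAR = 1.054571817e-34  # reduced Planck constant, J s
E = 1.602176634e-19     # electron charge, C
M_E = 9.1093837015e-31  # free electron mass, kg

def landau_levels(b_tesla, m_eff, n_levels):
    """Bulk Landau level energies E_j = (j + 1/2) * hbar * omega_c, in meV.

    Near the sample edge the confining potential V(x) pushes these
    levels up through the Fermi energy, producing the edge states.
    """
    omega_c = E * b_tesla / m_eff  # cyclotron frequency, rad/s
    return [(j + 0.5) * HBAR * omega_c / E * 1e3 for j in range(n_levels)]

# Assumed: GaAs effective mass 0.067 m_e, B = 5 T
levels = landau_levels(5.0, 0.067 * M_E, 3)
print(["%.1f meV" % e for e in levels])
# hbar * omega_c ~ 8.6 meV: far larger than kT at millikelvin temperatures
```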
IX. DEVICES

There are a number of suggestions for specific mesoscopic devices, and we must also be concerned about possible mesoscopic phenomena occurring in existing "classical" devices as dimensions become smaller. Quantum devices in the form of quantum well lasers, resonant tunneling diodes, and related structures are already finding applications. The lateral devices discussed in this review are far less well advanced, but it has been argued (Bandyopadhyay, 1989) that they offer more long-term promise than the vertical structures. Lateral structures should be capable of carrying more current for a given device area than a vertical structure. This is because the large potential barriers which create the quantized systems are the heterojunctions which are grown into the epilayers. The carriers in the vertical devices must tunnel through or be injected over these barriers. In the lateral devices, the carriers move parallel to these large heterojunction barriers and have only to surmount small controlling potentials. Devices need to be able to carry sufficient current in order to charge interconnect and other load capacitors, to generate power for microwave applications, or to drive further logic circuits. A second advantage of lateral structures is their comparatively low threshold voltage. Only small differentials in voltage are needed to produce large phase shifts. In an A-B ring, for example, as little as 1.5 mV is sufficient to produce a current modulation (Bandyopadhyay et al., 1986). The devices which have been most successful to date, vacuum tubes, transistors, etc., have been robust and self-regulating, and are still improving in complexity and speed, often much faster than applications can be found to use the improvements. Small, low-capacitance mesoscopic devices, with high current capability and low threshold voltage, have the potential for very fast operation in high-complexity circuits, but are small, delicate, very sensitive to the control of growth and processing techniques (MBE, nanolithography), and very susceptible to adverse impurity concentrations. In addition, most of the demonstrations to date have been at millikelvin temperatures, with only a few observations as high as liquid nitrogen. Also, the devices have impedances of order h/e² (13 kΩ), which is not well matched to 50 Ω! Some of the mesoscopic phenomena are double-edged. For example, the detection of interference from structures outside the classical current path (Section V) offers exciting possibilities for fast, intelligent interconnecting and "neural net" style cross-talk, but it will be a real headache to the VLSI designer who wants closely packed circuits. Quantum devices are very individual structures, exhibiting universal conductance fluctuations which are the unique signature of a particular impurity distribution. Useful devices, on the other hand, need to be
reproducible, with local fluctuations averaging out over the whole device. In short, the devices will be smaller and faster with lower power consumption, but are likely to have lower yield, be less reliable, and operate at very low temperature. Much work will have to be done to overcome these obstacles. Specific mesoscopic devices proposed in the literature include the "stub tuner" (Miller et al., 1989) (see Section VII), a 1-D quantum wire transistor (Hiramoto et al., 1989), and an A-B interferometer using an electrostatic potential to modulate the current in a conventional 1-D A-B structure (Bandyopadhyay et al., 1989). Simulations of this last device using an electrostatic modulation at liquid nitrogen temperature predict a 70% modulation with a maximum current of I = 2.1 μA for a typical device cross-section of 100 × 100 Ų, corresponding to a very respectable current density of 10⁶ A/cm². This compares favorably with the current densities achieved in high-speed Si and GaAs Heterojunction Bipolar Transistor (HBT) processes and is an order of magnitude higher than typically achieved in resonant tunnelling diodes. The threshold voltage for switching was a very low V_T = 7 mV, and if an optimistic value for capacitance of C = 1 fF is taken, this would imply an extrinsic switching speed of around 3 ps (a rough numerical check of these figures appears at the end of this section). This is comparable to the fastest conventional GaAs devices. Note that this is a genuine attempt to calculate real extrinsic switching speed. The intrinsic speed of such a device could be taken as the transit time for the electrons, which is probably as low as 230 fs. This simulation serves to illustrate a very important point when assessing such devices. Most of the clever physics in new devices improves the intrinsic speed. Here, carrier wave functions are caused to interfere over very short distances, leading to modulation of current over a very small "active" region with very fast transit times. The useful speed of the device, however, is determined by how quickly the device can charge its own and associated capacitance. Even though the mesoscopic device can carry a high current, have a commendably low threshold voltage, and be very small, its predicted speed is only just equal to the very best research "conventional" devices. This, I suggest, is not a winning advantage. Where mesoscopic device concepts may succeed is if they offer additional functionality in a way which dramatically reduces the number of devices required to perform a specific task. As an example from vertical structures, Sen et al. (1988) fabricated a parity generator circuit using a single resonant tunnelling bipolar transistor with multiple states, replacing 24 conventional transistors for the same function. One example from lateral devices is the quantum diffraction FET (QUADFET) proposed by Kriman et al. (1988). Figure 10 is a schematic of the device. A split gate constricts a 2DEG and acts as a diffraction slit. Carriers from the source form a diffraction pattern which is detected by drain fingers as shown.
FIGURE 10. Proposed structure for the quantum diffraction FET. After Kriman et al. (1988) and Bandyopadhyay et al. (1989).
Each finger represents a digital bit, and small modulations of the split gate will change the slit size and modulate the bit pattern. We thus have an instant analog-to-digital converter in one device.
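As a rough check on the switching-speed estimate quoted above for the A-B interferometer, the sketch below evaluates t = CV/I with the figures given in the text (C = 1 fF, V_T = 7 mV, I = 2.1 μA); treating the device as a simple capacitor charged at constant current is, of course, a crude assumption.

```python
def switching_time(capacitance_f, v_threshold, current_a):
    """Extrinsic switching time: time to charge the load capacitance
    through the threshold voltage at the device's drive current."""
    return capacitance_f * v_threshold / current_a

C = 1e-15    # 1 fF load capacitance (the optimistic value from the text)
V_T = 7e-3   # 7 mV threshold voltage
I = 2.1e-6   # 2.1 uA maximum current

print(f"extrinsic switching time ~ {switching_time(C, V_T, I) * 1e12:.1f} ps")
# -> about 3.3 ps, matching the "around 3 ps" quoted above

# Current density check: I over a 100 A x 100 A cross-section
area_cm2 = (100e-8) ** 2  # 100 Angstrom = 1e-6 cm
print(f"J ~ {I / area_cm2:.1e} A/cm^2")  # ~2e6 A/cm^2, of order 10^6
```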
X. SUMMARY AND CONCLUSIONS

The technical achievement of real mesoscopic systems has opened a remarkable new world of semiconductor physics. The key advance is the production of 1-D systems which are short compared to the dephasing mean free path (around 1 μm or more at low temperatures). Genuine quantum interference effects are observed even when some elastic scattering is present. Conductance in 1-D "electron waveguides" is found to be quantized, even in the absence of magnetic fields, and the classical Kirchhoff's and Ohm's laws which govern current flow are replaced by the Buttiker formalism, which describes transmission between contact reservoirs through a single mesoscopic scattering region. Electron waves propagate through 1-D wires like light in a waveguide, producing interference, coupling, and diffraction phenomena. Application of magnetic fields further enhances the effects, with the appearance of edge states, universal conductance fluctuations, Aharonov-Bohm effects, and anomalous Hall effects. Lateral mesoscopic devices offer some hope of higher current handling and lower threshold voltages, but poor yield and very low-temperature operation are severe obstacles. Mesoscopic devices have the best chance to make an impact when they offer novel and increased functionality, not just raw speed. Mesoscopic effects are real and are happening inside conventional devices even now.
Overshoot effects, tunnelling and resonant tunnelling, quantum well effects, and quantum-mechanical transmission and reflection at barriers are all reported phenomena. It is therefore important for device physicists to study such effects, to model them and allow for them in design. For the present, however, we are unlikely to see many of the dramatic quantum interference effects described in this review; for them, you really need one dimension and very low temperatures.
ACKNOWLEDGMENTS

My special thanks go to Graham Rees and John Davies, who helped me with the literature and provided much unpublished information. I also thank Bryan Wilson and all the members of the Theory Group at Caswell for discussions and critical comment and, finally, all the groups throughout the world who have contributed to this exciting field and whose work it has been my privilege to review.
REFERENCES

Aharonov, Y., and Bohm, D. (1959). Phys. Rev. 115, 485.
Bandyopadhyay, S., Bernstein, G. H., and Porod, W. (1989). Proc. Int. Symp. Nanostructure Physics and Fabrication, College Station, Texas (M. A. Reed and W. P. Kirk, Eds.), pp. 183-188.
Bandyopadhyay, S., Datta, S., and Melloch, M. R. (1986). Superlatt. Microstruct. 2, 539.
Beenakker, C. W. J., and van Houten, H. (1989a). Phys. Rev. B 39, 10445.
Beenakker, C. W. J., and van Houten, H. (1989b). Phys. Rev. Lett. 63, 1857.
Buttiker, M. (1986). Phys. Rev. Lett. 57, 1761.
Buttiker, M. (1988). Phys. Rev. B 38, 9375.
Datta, S., Melloch, M. R., Bandyopadhyay, S., Noren, R., Vaziri, M., Miller, M., and Reifenberger, R. (1985). Phys. Rev. Lett. 55, 2344.
del Alamo, J. A., and Eugster, C. C. (1990). Appl. Phys. Lett. 56, 78.
Ford, C. J. B. (1989). Proc. Int. Symp. Nanostructure Physics and Fabrication, College Station, Texas (M. A. Reed and W. P. Kirk, Eds.), pp. 389-394.
Halperin, B. I. (1982). Phys. Rev. B 25, 2185.
Hiramoto, T., Hirakawa, K., Iye, Y., and Ikoma, T. (1989). Appl. Phys. Lett. 54, 2103.
Holden, A. J., and Debney, B. T. (1985). Physica 134B, 132.
van Houten, H., van Wees, B. J., Heijman, M. G. J., and Andre, J. P. (1986). Appl. Phys. Lett. 49, 1781.
von Klitzing, K., Dorda, G., and Pepper, M. (1980). Phys. Rev. Lett. 45, 494.
Kriman, A. M., Bernstein, G. H., Haukness, B. S., and Ferry, D. K. (1988). Proc. 4th Int. Conf. on Superlattices, Microstructures and Microdevices.
Kubo, R. (1986). Science 233, 330.
Landauer, R. (1970). Philos. Mag. 21, 863.
Laughton, M. J., Barker, J. R., Nixon, J. A., and Davies, J. H. (1991). Phys. Rev. B 44, 1150.
Leggett, A. J. (1989). Proc. Int. Symp. Nanostructure Physics and Fabrication, College Station, Texas (M. A. Reed and W. P. Kirk, Eds.), pp. 31-42.
Miller, D. C., Lake, R. K., Datta, S., Lundstrom, M. S., Melloch, M. R., and Reifenberger, R. (1989). Proc. Int. Symp. Nanostructure Physics and Fabrication, College Station, Texas (M. A. Reed and W. P. Kirk, Eds.), pp. 165-174.
Sawaki, N. (1983). J. Phys. C: Solid State Phys. 16, 4611.
Sen, S., Capasso, F., Cho, A. Y., and Sivco, D. L. (1988). Electron. Lett. 24, 1506.
Smith, C. G., Pepper, M., Ahmed, H., Frost, J. E. F., Hasko, D. G., Newbury, R., Peacock, D. C., Ritchie, D. A., and Jones, G. A. C. (1989). J. Phys. Condens. Matter 1, 9035.
Thornton, T. J., Pepper, M., Davies, G. J., and Andrews, D. (1987). Proc. 8th Int. Conf. Physics of Semiconductors (O. Engstrom, Ed.), p. 1503. World Scientific Publishing Co. Pte. Ltd., Singapore.
Timp, G., Chang, A. M., Mankiewich, P., Behringer, R., Cunningham, J. E., Chang, T. Y., and Howard, R. E. (1987). Phys. Rev. Lett. 59, 732.
Timp, G., Behringer, R., Sampere, S., Cunningham, J. E., and Howard, R. E. (1989). Proc. Int. Symp. Nanostructure Physics and Fabrication, College Station, Texas (M. A. Reed and W. P. Kirk, Eds.), pp. 331-345.
Webb, R. A., Washburn, S., Umbach, C. P., and Laibowitz, R. B. (1985). Phys. Rev. Lett. 54, 2696.
Webb, R. A. (1989). Proc. Int. Symp. Nanostructure Physics and Fabrication, College Station, Texas (M. A. Reed and W. P. Kirk, Eds.), pp. 43-54.
van Wees, B. J., van Houten, H., Beenakker, C. W. J., Williamson, J. G., Kouwenhoven, L. P., van der Marel, D., and Foxon, C. T. (1988). Phys. Rev. Lett. 60, 848.
van Wees, B. J., Willems, E. M. M., Harmans, C. J. P. M., Beenakker, C. W. J., van Houten, H., Williamson, J. G., Foxon, C. T., and Harris, J. J. (1989a). Phys. Rev. Lett. 62, 1181.
van Wees, B. J., Kouwenhoven, L. P., Willems, E. M. M., Harmans, C. J. P. M., and Williamson, J. G. (1989b). Proc. Int. Symp. Nanostructure Physics and Fabrication, College Station, Texas (M. A. Reed and W. P. Kirk, Eds.), pp. 361-368.
Wharam, D. A., Thornton, T. J., Newbury, R., Pepper, M., Ahmed, H., Frost, J. E. F., Hasko, D. G., Peacock, D. C., Ritchie, D. A., and Jones, G. A. C. (1988a). J. Phys. C: Solid State Phys. 21, L209.
Wharam, D. A., Pepper, M., Ahmed, H., Frost, J. E. F., Hasko, D. G., Peacock, D. C., Ritchie, D. A., and Jones, G. A. C. (1988b). J. Phys. C: Solid State Phys. 21, L887.
The Evolution of Electronic Displays, 1942-1992
DERRICK GROVER
3 The Spinney, Haywards Heath, West Sussex RH16 1PL, United Kingdom
I. Introduction 231
  A. The Cathode Ray Tube 232
  B. Oscilloscopes 233
  C. Graphic Displays 234
  D. Vacuum Fluorescent Displays 236
  E. Other Developments 237
  F. Storage Technology 237
  G. Visual Display Units 238
  H. Frame Stores 242
II. Flat Panel Technology 244
  A. Light Emitters 245
  B. Subtractive or Light Controller Displays 246
  C. Liquid Crystals with Memory (Coles, 1989) 247
III. Projection Displays 252
IV. Three-Dimensional Displays 253
  A. Pseudo-3-D 253
  B. Stereo Systems 254
  C. Real 3-D Systems 254
  D. Virtual Reality 255
V. Conclusions 256
References 256
I. INTRODUCTION

When I was invited to review the evolution of displays over the last 50 years (Grover, 1992), it was observed that there were "not many people left" who could span that period of time. I was, at the beginning of this period, 10 years old with little interest in displays, and so I did not expect to have any references to refer to for guidance. The answer came from an unexpected source, since in 1957 I was inveigled into subscribing to the Encyclopaedia Britannica, of which I became quite critical. More space should, I thought, have been devoted to science and less to the considerable number of entries for obscure authors and even more obscure American towns. So when I checked on such entries as the cathode ray tube (CRT), television, and radar, I was surprised to find more than 50 pages devoted to
these topics (Encyclopaedia Britannica, 1957). The CRT was already in use in 1942 for radar, oscilloscopes, and television. Of course, during the war years considerable advances were being made in the use of displays. A short review of the evolution of display technology must necessarily ignore many aspects of the art, and it will be biased by my particular experiences. I have been asked to base this paper on my presentation to the BA meeting, when I felt freer to reminisce, rather than on my written contribution, which was more factual. My introduction to electronics occurred during a summer vacation job in 1952 at the research laboratories of STL. I picked up a copy of Thermionic Valve Circuits by Emrys Williams. The techniques described were fascinating, and there were interesting topics such as the Colpitts oscillator and the Puckle timebase. I read the book from cover to cover in the course of the vacation, and my conversion to electronics was complete. The electronic display must be regarded as the most important information medium. Its applications range from the simplest alphanumeric display on a meter or washing machine to the ubiquitous television set and to its role as a window into a radar or computer system. Of particular interest is its role as an information transducer which translates data in a format which is compatible with a computer, or electronic system, to information which is comprehensible to a human. This information can usually be provided quickly, and often in response to the reaction of a person in real time. The limitations in some computer display applications, such as simulation and animation, lie in the considerable quantity of parallel information which must be processed in order to give a realistic impression to the perceptive eye. Ultimately it will provide virtual reality with systems which will simulate the real world ever more exactly, a prospect which undoubtedly will have sociological implications. Some discussion of the problems associated with display development is contained in Grover (1991).
A. The Cathode Ray Tube
The cathode ray tube spans the whole of the 50-year period of this review. It was manufactured in two main forms in which the principal difference lay in the method of deflecting the electron beam. The smaller tubes used in oscilloscopes utilized orthogonal electrostatic plates to give high rates of deflection in the x and y axes in order to monitor high-frequency waveforms. Rise times of the order of 1 nanosecond were possible by the late 1950s, although at this speed the screen size was limited. The sampling
oscilloscope was developed with comparable rise times, and storage tube oscilloscopes could record one-off events. The larger tubes required for radar and television employed magnetic coils for the deflection of the electron beam, and they were relatively slow. Deflection for the television tube was optimized by tuning the horizontal drivers to the line scan rate so that a quasi-oscillatory action was achieved, which minimized the energy expended in deflecting the beam. The principles used in the early tubes are still used today, although refinement in design has given a more linear display with a higher resolving power of 4,000 lines or more. One factor in the improvement in design has been the demands of the color tube. The shadow mask color tube, in which separate electron beams were focused onto each of three color phosphors in a mosaic pattern, was in place in the 1950s. Refinements in manufacture have reduced the costs in real terms to a fraction of the original. In the period 1955 to 1978 the shadow mask tube increased in brightness from 4 to 350 fL, and to 1,000 fL for the projection tube (Chang, 1980). Variations on this design have evolved.

B. Oscilloscopes
The versatility of the cathode ray tube as an analog device has kept it in the forefront of display systems. Various scanning systems can be employed, but it is less usual these days to see the circular scan resulting from applying sinewaves in quadrature to the x-y plates. My first job in the mid-1950s at GEC Hirst Research Center was to utilize this mode of display to show the phase difference between two inputs by generating a blip at the appropriate point on the scan. The completed display was considered to be "very solid-looking." The oscilloscope business in the early years appeared to be dominated by Cossor, and during the 1950s Solartron became a popular choice. Important factors in the decision were the ease of use in terms of measuring rise time and waveform amplitude. Thus, whereas on some oscilloscopes it was necessary to move the waveform with a potentiometer to achieve an accurate measurement, on others a preset graticule enabled the measurement to be read directly. A digital readout of these measurements had to await the 1980s. One of the milestones in the development of oscilloscopes was the introduction of added features to the basic function of the instrument. By about 1957 I was engaged in the design of transistorized digital electronics for a message switching system to be installed by ST&C in the Admiralty. The system clock was defined by the storage drum, which revolved at 50 Hz
and held about 2,000 bits per track. It was necessary to inspect each bit for error, and in particular to check how well the clock track joined up at the end of a revolution. The accuracy of the join in fact dictated the tolerances allowed in the whole system, and its measurement was important. Inspection of the track was solved by the Tektronix oscilloscopes, which had then been introduced, with a time-delayed trigger. It became possible to inspect each of the individual pulses on the clock track. The Tektronix was also the oscilloscope of first choice, since its graticule was accurately calibrated. The rise time was 0.1 microseconds, and the price at the time was in the range £800 to £1,500, equivalent in real terms to £10,000 to £20,000 today. They have evolved in the 1980s into instruments with complete systems incorporating microprocessors and digital readouts. In 1959, I had the problem of designing circuitry for a backup tape store. The jitter in the waveforms derived from this device made it impossible to assess the tolerances required, and the case for the purchase of a storage oscilloscope was made. My next experience of state-of-the-art oscilloscopes came in 1960 when I joined Remington Rand Univac in the U.S.A. I was given the task of designing a pulse generator with a rise time of 1 nanosecond. The oscilloscope provided was known as the EG&G scope, which itself had a rise time of 1 nanosecond. (The full names were difficult to remember, but as I recall they were Edgerton, Germeshausen, and Grier.) The screen was about one inch across, and that, no doubt, was an achievement, considering that the rise time was comparable to the transit time of the electron through the tube. I was not allowed to view the screen directly because it was thought there was a danger of the emission of X-rays due to the high anode voltage. A video camera and television screen were used to relay the picture, and this had the advantage that the screen was magnified to about 10 inches.

C. Graphic Displays
In 1967 I joined Cossor Electronics, by then a subsidiary of Raytheon. Raytheon in the U.S.A. was a pioneer in the development of alphanumeric displays, and its subsidiary provided the outlet in the U.K. The cost of about £10,000-20,000 for these relatively dumb terminals was, by today's standards, prohibitive and would be equivalent to £90,000-180,000 in 1993. The initial application was restricted to airline seat reservation systems, because a lost sale was expensive enough to justify a real-time system. For six months I was situated in Boston in order to liaise with the parent company and help with the transfer of the technology, but by 1968 I had a particular brief to investigate the graphic display developments in the U.S.A. It was the era of Project MAC at MIT, when the bureaucracy of
time-sharing management systems occupied 80% of the computer time and people could not understand where the lost power had gone. Since there were graphic displays being refreshed from the same computer, the computing power available to other users became critical. The cost of refreshed graphics was often in the range £300,000 to £500,000, corresponding to £2.5 million to £4.5 million today. The demands on the computer system accelerated with the important concept of interaction with a display. The research of Ivan Sutherland (1963) had resulted in the Sketchpad system, which was a significant milestone. The computer now held a display file in which the information was associated with a data structure. The light pen was the interactive device, which responded to the flash of ultraviolet at the instant when a line was being drawn on the cathode ray tube. This instant would correspond with the time that the computer was accessing the data structure, and accordingly the operator could pick a line on the screen for the computer to interpret in accordance with its position in the diagram or by other properties defined by attributes associated with the line. It was a time when many people were investigating the optimum data structures to be used for graphics, and it was necessary to burn much midnight oil to understand them. During the 1960s, high-performance calligraphic systems with dedicated computers were developed. The main constraints on the picture quality were the needs to control the movement of the spot, for line drawing on a cathode ray tube, to be linear and of constant luminance. The difficulty arises because of the need to control the movements in x and y so as to be proportional, and at the same rate of progress, over sections of logarithmic curves. The screen had to be blanked out at the beginning of a line to obscure the nonlinearities when switching transistors. On reaching the end of a line, the inertia of the spot (realized as current in an inductor) did not permit the movement to be stopped immediately, and it was necessary to blank out the screen to obscure oscillations. Some displays would show cusps where this was not adequately controlled. The start/stop nature of calligraphic line drawing also meant that significant power was required, and the effect of all these constraints together limited the number of vectors which could be shown on a screen. This was compensated for to some extent by reducing the refresh rate (and causing flicker) or by using long persistent phosphors, which gave more acceptable flicker but which would smear when changes were made to the picture. The advantages of calligraphic displays were the ability to draw a relatively high-resolution line without edge problems (staircasing), and also the direct interaction which was possible when using a light pen. The shadow mask tube was unsuitable for calligraphic displays, and color was introduced with the penetron tube in which two layers of phosphor
were deposited on the screen. By changing the anode voltage, the energy with which the electrons penetrated the phosphors could be changed, and in consequence the color (as many as four colors could be distinguished). The technique introduced difficulties in controlling the registration of the beam because of the change in deflection sensitivity with anode voltage. Postdeflection amplification (PDA) was used to alleviate the problem and was optimized by Ferranti Ltd., who added a voltage pedestal (sic) in order to maintain a tight control on parameters. Attempts were made to increase picture complexity by the use of rear port tubes so that fixed information (e.g., maps) could be projected optically onto the screen and variable data then written by the electron beam. Such displays were unsatisfying, in part because of the low color density of the optical image projected onto a translucent phosphor, and in part because of the mismatch in brightness between the optical and phosphorescent images. The spatial drifts in the two images were also subject to independent factors. An interesting variation was the pipe tube developed by Plessey in the 1960s. The tube was shaped like a smoker's pipe, with the electron gun at the mouthpiece and the phosphor screen at the base of the bowl. The screen was viewed through a glass face plate at the top of the bowl. Since the phosphor was viewed on the same side as the electron beam, it did not suffer dispersion through the phosphor, and a very bright, high-definition image was produced. Trapezium distortion had to be compensated for electronically. In more recent years the demand for flat screens for portable TV sets has resulted in the design of a flat CRT in which electrons are generated at the side and flow parallel to the screen, where they are deflected onto the phosphor faceplate. The beam voltage needs to be kept low in order for the deflection voltages to be kept to reasonable limits. In order to give adequate luminance, channel plate amplifiers have been employed to increase the electron current by a factor of about 800. In an example of this system, the beam was further accelerated by 10 kV to give a raster scan display of 90 fL (Sobel, 1992).

D. Vacuum Fluorescent Displays
Vacuum fluorescent displays are essentially low-voltage CRTs running at a low anode potential of about 100 volts. A wire grid controls the emission from a low-temperature cathode. Recent developments utilize, within the glass envelope, many phosphor-coated anodes, which are themselves connected to the drains of an array of field effect transistors. They are used for very small displays in the viewfinders of camcorders.
E. Other Developments
Low-cost methods of displaying graphics by the use of storage tubes were also being developed at MIT (Stotz and Cheek, 1967). A problem lay in the time required to write a diagram on the screen and still maintain the registration necessary in order for a picture to join up. The answer lay in digital deflection techniques and in solving the problem of drawing the best approximation to an analog line on a grid of digitally defined points. A binary rate multiplier was the name given to the resulting circuits, which chose the increments in x and y to give the best approximation (a sketch of the idea follows at the end of this subsection). Tektronix further developed storage tube graphics to become the widely used graphics terminals of the 1970s, as described later. Other topics of interest were the "Harvard Laboratory for Computer Graphics" lunch-time lectures on computer graphics. A continuing theme was the use of a low-cost printer to produce the thematic mapping which was a feature of their work. The director was at pains to point out how much successful work could be produced with relatively inexpensive equipment. The ink-jet printer made its appearance in 1968 and was demonstrated at an exhibition in Anaheim, California. It was quiet compared with the line and chain printers of the period and made a gentle plopping noise as each character was printed. The printing was, at the time, spidery and quavery and unacceptable for commercial use.
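A minimal software rendering of the binary rate multiplier idea mentioned above: increment pulses are emitted on each axis at rates proportional to the line's slope, so the beam tracks the ideal analog line on a digital grid. This accumulate-and-carry model is an illustration of the technique, not the actual MIT circuit.

```python
def rate_multiplied_line(dx, dy):
    """Approximate a straight line from (0,0) to (dx,dy) on a digital grid.

    Two accumulators overflow at rates proportional to dx and dy; each
    overflow emits a unit step on that axis, mimicking the pulse trains
    of a hardware binary rate multiplier. Assumes dx, dy >= 0.
    """
    steps = max(dx, dy)
    x = y = acc_x = acc_y = 0
    points = [(0, 0)]
    for _ in range(steps):
        acc_x += dx
        acc_y += dy
        if acc_x >= steps:   # x-rate pulse
            acc_x -= steps
            x += 1
        if acc_y >= steps:   # y-rate pulse
            acc_y -= steps
            y += 1
        points.append((x, y))
    return points

print(rate_multiplied_line(5, 2))
# [(0,0), (1,0), (2,0), (3,1), (4,1), (5,2)]: the best grid approximation
# to a shallow line, drawn in equal-length clock steps
```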
F. Storage Technology

Many of the problems of the refreshed calligraphic display were solved by the storage tube, with its ability to show a large number of vectors. It overcame the problems of power dissipation and flicker by drawing vectors relatively slowly and storing them on the screen (Stotz and Cheek, 1967; Van-Dam, 1970). The price paid for this was a relatively low-contrast image, together with the inability to interact using a light pen, or to change the data on the screen without rewriting the total display. The image was stored electrically on a grid behind the screen, independently of the computer. Interaction had to be by spatial cursors using, for example, crossed lines to identify xy coordinates and then relatively cumbersome methods of associating those coordinates with data held in the computer store. By 1979, Tektronix could show 15,000 alphanumerics on the screen or draw vectors at 20,000 cm/s. At the time, various methods were devised for producing hybrid displays incorporating storage capability together with the ability to erase data on
the screen. One such development, for overwriting selected storage tube data with an erase beam, was subject to problems of registration and interference between the erase beam and the original lines of data; these tolerances limited the effective resolution of the picture (Ellis, 1971). An important development (Street, 1974) was the Laser Scan Laboratories system (HRD1), in which a deflected laser beam was used to write on photochromic film which was sensitive to light of a certain wavelength. It darkened when exposed and retained the image for a period of time while being flooded with an irradiating beam for viewing purposes. The deflection system used mirrors controlled with interferometric techniques to within a microradian, and registration of the writing beam was excellent, with a resolution of 10,000 lines. In the early 1970s CRTs were developed in which cathodochromic powders were deposited as a phosphor screen. When these materials are exposed to an electron beam, the crystalline structure is changed to produce a high-contrast stored image due to the generation of color centers. The material could be switched to its original bleached state by the application of light or heat, and an electron beam was used to overwrite data in erase mode. Fatigue of the material after many erasures made it unsuitable for commercial use. The contrast ratio reduced to 50% after about 1,000 cycles. The silicon storage tube was also selectively erasable and provided a resolution of 4096 × 4096. It could be drawn in calligraphic mode and could be read in a raster mode compatible with a television system. Hardware analog control of the reading scan permitted an electronic zoom facility. Other forms of scan converter of lower resolution were also manufactured.

G. Visual Display Units
In parallel with the development of the calligraphic displays in the 1960s came the development of the refreshed display for alphanumeric information, known as the visual display unit (VDU), and its low-cost graphic derivatives. The interactive techniques available were implemented as hardware-controlled cursors, and the memory was a delay line store in which the position of the cursor was defined by an extra bit in each character byte (Jones, 1976). Various refinements were added by creative hardware design. In England I became involved in systems design, and in particular in the design of low-cost graphics based on the technology of the VDU. In those days, before the microprocessor, it was necessary to design all system features in hardware. An example of a Raytheon/Cossor VDU system of the 1960s is shown in Fig. 1 (Jones, 1976).
FIGURE 1. Typical refresh store, showing logic and control characteristics.
The circulating refresh store is a delay line. Since the delay line allowed no pause in the character transfer, it was necessary to use extra character stores for editing. Thus, in normal operation store No. 1 would be in series with the delay line. By switching out store No. 1 at the appropriate moment, a character could be deleted, and this could be repeated at each refresh of the screen. Alternatively, by switching in store No. 2, an extra space would be added to the data stream, into which a character from the keyboard, or communications line interface, could be inserted at each refresh. The position of editing was defined by the cursor position. This was originally identified by an extra bit in the character byte, and the cursor movement was limited to one character position at each refresh. It is curious that it was several years before it was realized that a counter could define the cursor position, which not only saved many bits in storage but also allowed the cursor to be moved by a whole line in one refresh time.

The system was further enhanced by customer requirements. In particular, the LACES system installed at Heathrow airport, England, initiated the facility to define fixed and variable data by the use of further control codes, and the cursor could then be jumped from one data field to another. The delay line later became a semiconductor store. Since it was then possible to stop the progression of the data through the store (for a period of time), editing became a simpler operation.
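The counter-based cursor can be made concrete with a short sketch. This is an illustrative reconstruction, not the Raytheon/Cossor circuit: display_char() and mark_cursor() are hypothetical stand-ins for the video logic. The point is that comparing a running character count against a cursor register replaces the per-character cursor bit and makes a multi-character cursor jump a single arithmetic operation.

```c
#include <stdio.h>

#define CHARS_PER_LINE 64    /* illustrative line length */

/* Hypothetical video-logic stand-ins. */
static void display_char(char c) { putchar(c); }
static void mark_cursor(void)    { putchar('_'); }

static unsigned cursor = 0;  /* cursor register: absolute character position */

/* One refresh pass: characters stream past in order, and the cursor
   is shown wherever the running count matches the cursor register. */
static void refresh_pass(const char *store, unsigned len)
{
    for (unsigned count = 0; count < len; count++) {
        display_char(store[count]);
        if (count == cursor)
            mark_cursor();
    }
    putchar('\n');
}

/* Moving the cursor down a whole line per refresh is now one addition,
   where the single-bit scheme could move it only one character. */
static void cursor_down_line(void) { cursor += CHARS_PER_LINE; }

int main(void)
{
    refresh_pass("HELLO", 5);
    cursor_down_line();
    return 0;
}
```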
FIGURE 2. The operation of a monoscope.
The characters and symbols on many displays in the 1960s had to be generated using tubes known as monoscopes, which gave excellent quality. The monoscope was a form of CRT incorporating a mask painted with the character shapes in a secondary emission material. The operation is illustrated in Fig. 2, where a letter “F” is to be written onto the display screen. The code for the letter would be received from the display store and applied to a D/A converter to deflect the monoscope electron beam and position it at the beginning of the letter. The electron beam in the display tube would meanwhile progress to the next position in the line of text in order to print the letter “F.” A high-frequency waveform (known as a diddle) would then be applied simultaneously to both beams in order to scan the space occupied by the letter. The secondary emission from the monoscope screen was used to turn on the beam current in the display CRT so that the letter “F” was copied. The charactron was an alternative tube in which the symbols were punched through the mask.

Subsequently a semiconductor chip was designed to produce a 5 x 7 character format. The quality was poor by comparison, but the significantly lower costs could not be ignored.
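The 5 x 7 character-generator chip can be pictured as a small read-only memory of row patterns, scanned out one raster line at a time. The sketch below is illustrative rather than a description of any particular chip, and the glyph bitmap is assumed.

```c
#include <stdio.h>

/* A 5 x 7 character generator as a small ROM: each character is seven
   5-bit row patterns, read out one raster row at a time. The glyph
   below is an illustrative letter "F". */
static const unsigned char font_F[7] = {
    0x1F,   /* ##### */
    0x10,   /* #     */
    0x10,   /* #     */
    0x1E,   /* ####  */
    0x10,   /* #     */
    0x10,   /* #     */
    0x10,   /* #     */
};

int main(void)
{
    for (int row = 0; row < 7; row++) {
        for (int col = 4; col >= 0; col--)
            putchar((font_F[row] >> col) & 1 ? '#' : ' ');
        putchar('\n');
    }
    return 0;
}
```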
In parallel with the VDU development were low-cost graphics systems utilizing similar hardware constraints, insofar as they were based on delay line stores and were limited in interactive capability to the use of a cursor (Grover, 1973). In fact, it was possible to adapt the VDU system in interesting ways to produce features useful for graphics, and it became an intellectual challenge to incorporate customers’ requirements into the system. Elsewhere, drum stores were also developed to hold a display file and refresh the display on each rotation of the drum.

There were different views as to the size of the market for low-cost graphics in competition with the Tektronix storage tube display. The main advantages of a VDU-based version were a bright display, real-time interaction, and compatibility with a light pen, but the resolution was lower and the number of vectors more restricted because of the delay line store. I surveyed the market and found that the large utilities and universities were interested. When I discussed the features with an accountant, he declared that he could assess a column of figures more readily than he could see trends in a graph. I also had discussions with a cartoonist, pointing out that he need only draw the extreme positions of his figures and then let the computer calculate the increments in between. He pondered the point for a while and then observed that if at one moment his character was standing beside his horse, and the next moment he was on top of it, then it looked funnier. There were moments when my survey was not very encouraging.

Ultimately it was decided that the market at the time was not sufficiently interested in graphic displays to justify manufacturing in the quantity demanded by alphanumeric displays. A part of the market was being served by VDUs in which symbols could be substituted for alphanumeric characters. It was possible to produce low-resolution diagrams which were sufficient to monitor, for example, control systems.

In 1971 I joined the National Research Development Corporation (NRDC), where my first task was to survey the displays and computer graphics developments in the U.K. A part of this survey was published in 1978. A little later I became chairman of the Displays Specialist Group of the British Computer Society, and during the early 1970s we covered most of the equipment in visual information technology. The topics ranged from printers and graphics displays to a state-of-the-art lecture from RSRE on solid-state technology, when we were given a demonstration of many types of liquid crystal, electroluminescence, and other technologies. The one topic which we did not cover was VDUs (since we thought we knew it all), but Gareth Jones, an editor with IPC Science and Technology Press who was on our committee, encouraged us to prepare a book on the subject (Jones, 1976). We did not know the amount of work it would entail. Teletext appeared about this time, but unfortunately the potential of the emerging technology was underestimated, so that the system was standardized on a limited picture capability.

A review of the price of displays (Chap. 1 in Jones, 1976) is shown in Fig. 3 (prices corrected for inflation to 1993 are also shown). An idea of the evolution in display systems (Grover, 1977) is given in Fig. 4.
FIGURE 3. Price ranges as of 1975 for various displays, from teletypewriters and teletext through storage tube terminals, VDUs, and plasma panels to minicomputer graphics. Prices corrected for inflation to 1993 are also shown.
H. Frame Stores
In the early 1970s it was possible to control a black-and-white picture from a storage array which allocated one bit per pixel of display. Subsequently, multiple arrays were added which could be accessed in parallel to control gray-scale pictures and color. The DTI-funded CAD Centre at Cambridge had been formed, and a significant investment was made in displays and the associated software. One of its developments, known as the Bugstore, incorporated sufficient backing store to hold a grayscale display file running in raster scan mode. The cost of the store was not economic for a commercial display, but the system gave the center a lead in the development of grayscale pictures.

A problem with the calligraphic display had been the computing-intensive task of calculating hidden lines when showing a 3-D picture in a 2-D space; numerous papers were written on the topic. With a raster display, the solution was to write the rearmost plane first and progress towards the front, so that successive planes automatically covered the hidden areas (a minimal sketch of this back-to-front approach is given below).

Most of these developments were overtaken in the mid-1970s by the advent of semiconductor stores of sufficient capacity and speed to hold a television frame, and that technology was perhaps the most significant in the development of low-cost displays. It made it possible to produce a computer add-on for £50 which changed the nation’s cultural attitudes to computers.
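The back-to-front raster technique is what later literature calls the painter's algorithm. The C sketch below is a minimal illustration under assumed data structures (axis-aligned rectangular planes written into a byte-per-pixel frame store), not a reconstruction of any particular system.

```c
#include <string.h>

#define W 640
#define H 480

static unsigned char frame[H][W];   /* one byte per pixel frame store */

struct plane { int x, y, w, h; unsigned char shade; };

/* Fill one rectangular plane into the frame store, clipped to the screen. */
static void paint(const struct plane *p)
{
    for (int r = p->y; r < p->y + p->h; r++)
        for (int c = p->x; c < p->x + p->w; c++)
            if (r >= 0 && r < H && c >= 0 && c < W)
                frame[r][c] = p->shade;
}

/* planes[] must be ordered rearmost first: nearer planes are painted
   later and simply overwrite what lies behind them, so no per-pixel
   hidden-line computation is needed. */
static void render(const struct plane planes[], int n)
{
    memset(frame, 0, sizeof frame);
    for (int i = 0; i < n; i++)
        paint(&planes[i]);
}

int main(void)
{
    struct plane scene[] = {
        { 100, 100, 300, 200, 1 },   /* rear plane  */
        { 200, 150, 150, 100, 2 },   /* front plane */
    };
    render(scene, 2);
    return 0;
}
```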
FIGURE 4. The evolution in display systems as of 1975, tracing high-performance graphics, alphanumeric VDUs, television, hard copy, and solid-state technologies.
This technology introduced an era when computer graphics software techniques advanced rapidly to produce ever more realistic pictures on a CRT screen. The evolution from grayscale to color, texture, multiple reflections, and animation has illustrated the trend of demanding ever more powerful processing capability.

It became necessary to map line drawings onto the pixel positions which most nearly fitted the line, and aliasing was a problem with near-horizontal and near-vertical lines. Anti-aliasing was achieved by simulating the output of a video camera and shading pixels which were partly within the line space (a minimal sketch is given at the end of this section); it made a significant improvement to the picture. Alphanumeric data could be written on a television screen as a dot matrix pattern utilizing chips designed for the purpose, and a pseudo-improvement in resolution could be engineered by staggering the bits required for an oblique line.

Interaction again relied upon a spatial cursor for defining xy position. If a light pen was used, it performed the role of a spatial cursor, since the time of the flash was related to xy coordinates rather than to a position in store. Methods for improving the picture by introducing grayscale, and subsequently color, first utilized planes of data store which could be used for various purposes such as low-resolution color or different overlays on a picture. The acceleration in the development of storage media and computing power in the 1980s gradually solved the problems of color rendering and resolution. Color systems were manufactured with a definition of up to 24 bits/pixel.

The major problem now to be resolved relates to the computing power required to animate pictures for simulation and modeling. Virtual reality is the latest development where the demand exceeds the computing power available. If Northcote Parkinson ever considered the situation, he would doubtless nominate a law to the effect, “The demands of the display industry shall always expand to fill the computing power available.” A more correct version might read, “The demands of the computer graphics industry shall always exceed the computing power that is commercially viable.”
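As an illustration of shading pixels partly within the line space, the sketch below grades each pixel's intensity by its distance from the ideal line. It is a deliberately simple model (it treats the segment as an infinite line and uses a linear falloff), not the camera-simulation filter described above.

```c
#include <math.h>

#define W 640
#define H 480

static unsigned char frame[H][W];   /* 0..255 grayscale frame store */

/* Shade every pixel within one pixel-width of the ideal line, with
   intensity falling off linearly with perpendicular distance.
   Assumes a non-degenerate segment (nonzero length). */
static void aa_line(double x0, double y0, double x1, double y1)
{
    double dx = x1 - x0, dy = y1 - y0;
    double len = sqrt(dx * dx + dy * dy);

    for (int r = 0; r < H; r++) {
        for (int c = 0; c < W; c++) {
            /* perpendicular distance from pixel centre to the line */
            double d = fabs(dy * (c - x0) - dx * (r - y0)) / len;
            if (d < 1.0) {
                unsigned char v = (unsigned char)(255.0 * (1.0 - d));
                if (v > frame[r][c])
                    frame[r][c] = v;   /* keep the brighter value */
            }
        }
    }
}

int main(void)
{
    aa_line(10.0, 20.0, 600.0, 400.0);
    return 0;
}
```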
II. FLAT PANEL TECHNOLOGY

The search for a flat panel technology to replace the physical bulk of the CRT has been pursued for many decades, and by the end of 1968 it was seen as increasingly important to make a display compatible with integrated electronics in terms of both voltage and size. By that time electroluminescent displays had been developed for many years,
and plasma panels were also on the market. The technology which was to become the most important was liquid crystal, on which some work was published in the late 1960s. A particular motivation was the need for a small display compatible with the significant reduction in size being achieved for radar systems. Consequently, from 1970 liquid crystal was investigated seriously as a display medium at the Royal Radar Establishment, later RSRE, in association with the University of Hull.

A. Light Emitters
1. Electroluminescent Displays

Electroluminescent displays were subject to extensive research in the 1950s, and in the mid-1960s high-brightness dc powder phosphor devices were demonstrated. Pulsed dc electroluminescence gave improved brightness and lifetime. The basic structure most used comprised ZnS doped with Mn as a phosphor, sandwiched between two electrodes. The dc powder device is reported to be the cheapest to manufacture. There is a gradual loss of brightness, although this can be compensated for by constant current drives and optical feedback circuits.

Thin-film electroluminescence was demonstrated in the 1970s. Structures excited by an ac current showed promise, and in the late 1980s they were considered a most promising technology, giving high brightness, contrast, and long life, and enabling the largest panels manufactured so far. Fabrication incorporates sputtering, evaporation, and etching, with as many as seven sequential depositions, so the final yield could be low. Displays with pixel counts up to 800,000 could be manufactured, and full-color displays were also made.

2. The Plasma Display

Plasma display panels were being marketed in the 1970s. They consisted of a honeycomb of tiny neon cells. Orthogonal metallic strips plated onto the front and rear of the panel provided the matrix addressing for the cells between them. If the potential between the strips exceeded the strike voltage, a cell was turned on; it would be turned off when the potential difference was reduced below the maintenance voltage. The resolution was limited by the series resistor necessary for each cell. A self-scan panel was developed by Burroughs which used a preferential glow technique to scan the columns automatically, so that the column drive circuits could be dispensed with.

An ac plasma panel suitable for graphic displays was developed by Owens-Illinois. Resolution of 60 lines per inch was possible,
and inherent memory was achieved by driving the orthogonal electrodes with a high-frequency (50-100 kHz) sustaining square wave upon which the strike voltage would be superimposed. Resolution was 512 x 512 points, permitting 4000 characters. Subsequent developments in the 1980s utilized the ultraviolet light emitted from neon (or xenon, for its greater UV content) cells to activate different color phosphors. The technology in 1992 is being considered for large-screen television by virtue of its relatively good manufacturing yield compared with liquid crystal.

3. Light-Emitting Diode Displays
The difficulty in the development of light-emitting diode displays has been to produce adequate brightness across the color spectrum. Quantum efficiency considerations enable red to be generated more cheaply and relatively efficiently. Research has continued over the last two decades to provide sufficient brightness at the blue end of the spectrum. Currently blue diodes are manufactured with a limited brightness of 5 mcd, whereas red diodes can generate in excess of 5 cd (Watanabe, 1989).

B. Subtractive or Light Controller Displays

1. Liquid Crystal
Liquid crystals are dipoles composed of long slender molecules. When the molecule is parallel to the direction of light, the light is relatively unimpeded, but light at right angles is absorbed. A cell in its simplest form utilizes an electric field to align the molecules with the light. On removal of the field, the molecules relax to a scattered state and light is absorbed. If I may be forgiven the analogy, a box of matches closely scattered on a table will obscure that part of the table, but if the matches are glued in position vertically over the same area then the coverage is less and some of the table will be seen. For the analogy to be more correct, the aspect ratio of the match has to be much greater, and there have to be many boxes of matches.

The twisted nematic (TN) cell became the preferred contender because of its better contrast compared with the scattered-light cell. In the TN cell, molecules on one surface are orthogonal to those on the other. This causes a 90° twist which rotates the plane of polarized light by 90°. With a positive liquid crystal, the application of an electric field aligns the molecules parallel to the field, removing the rotation of the light's polarization. Since the polarizers are orthogonal, the cell becomes dark with the application of the field, or light when the field is removed. The cell was driven by 1.5 V ac at low frequency,
and the changeover time was about 100 ms. It was visible in bright ambient light and had a reasonable angle of view (Hilsum, 1981).

The performance of the twisted nematic liquid crystal (contrast ratio 3:1 and a viewing angle of 30°) was improved by doping the material with a cholesteric liquid crystal (Mosely, 1989). This material has a natural helical structure, and supertwist liquid crystal was developed with twist angles of up to 270°. The higher the angle, the better the contrast and the viewability of the display, although cell production was more difficult. The parameters of the supertwist display were also more critical than for others, and a higher value of pretilt angle (>5°) was required.

The early liquid crystals were difficult to multiplex without loss of contrast. The transmission-voltage characteristic was shallow and dependent on temperature. Liquid crystals were developed after 1973 with a steeper characteristic, which enabled a larger number of lines to be multiplexed. Temperature-sensing elements were used to change the voltage drive in sympathy with the change in temperature.
C. Liquid Crystals with Memory (Coles, 1989)

1. Smectic Liquid Crystal

Smectic A liquid crystal devices are interesting for large-area displays because they can be switched in a relatively short time of 100 ms, have long-term memory, and can be multiplexed. Application of a low-frequency ac voltage (10-100 Hz) renders the material optically opaque, and a high frequency of greater than 1 kHz gives a clear homeotropic texture. Alternatively, smectic A can be written with a laser beam which heats the material into its isotropic phase, producing a random array of molecules. Rapid cooling can produce a scattered random texture in the absence of an electric field, or a clear homeotropic texture in the presence of a moderate electric field.

2. Ferroelectric Liquid Crystal (Ross, 1989)
Ferroelectric liquid crystals were developed in the 1980s. The most prominent of these was the chiral smectic C (Sc) phase, in which molecules were chemically engineered to have an electric dipole across the molecular axis. Application of a dc field across a device will switch the molecules in one direction, and this can be reversed with an opposing field to switch them back, in a time of about 10 microseconds. If the cell is used with crossed polarizers, then a dark state can be produced which alternates with a light-transmitting state. The fastest ferroelectric crystals were temperature-sensitive, but slower cells
switching in about 100 microseconds were more tolerant and have been further developed. The memory and multiplexing capability make the ferroelectric liquid crystal a candidate for large-area matrix-addressed displays with high information content. A demonstration ferroelectric display (Thorn-EMI, 1992) with a resolution of 640 x 480 and a line address time of 70 microseconds had a contrast ratio of 11:1 and an operating temperature range of 0-50°C.

3. Polymer Dispersed Liquid Crystal Films (PDLC)
In the later 1980s, liquid crystal films were being produced comprising microdroplets of liquid crystal immersed in a thin polymer film (Spruce and Pringle, 1992). The advantage claimed for this technique was easier fabrication, which overcame the problems of conventional liquid crystals, such as surface alignment, cell filling, and sealing. PDLC shutters do not require polarizers, which, in addition to reducing the light transmission, were considered fragile and costly. Response times of less than 1 ms have been reported. The technology is considered to be more suitable for large areas, since the constant spacing between the electrodes needed for a uniform optical response is automatically maintained by the polymer. The display may also be molded to different shapes. The main disadvantages are the high driving voltage of 30-70 volts, difficulty in multiplexing, and the need for controlled lighting for the scattering display.

4. Active Matrix Addressed Displays

Further attention has been paid to developing devices with nonlinear characteristics which could be placed in series with the cell as an aid to multiplexing. The simplest of these is a varistor material using thin-film technology, followed subsequently by thin-film transistors based upon amorphous silicon. A thin-film transistor switch matrix is shown in Fig. 5. Bidirectional field effect transistors are used. A row of gate electrodes is selected, and the data signals are applied to the columns, which are connected to an electrode of each transistor. In order to maintain zero dc current through the liquid crystal, alternating data pulses are applied to the column, and the electrode is alternately source or drain depending on the polarity. The third electrode is connected to the liquid crystal pixel, which is manifested as a capacitance in the circuit.

Commercial displays were produced in the 1980s giving about 200 lines of display, rising to 400. Poly-silicon gives a higher ratio of on to off current and accordingly permits a greater number of lines to be addressed. The processing temperatures, however, are less compatible with low-cost glass.
FIGURE 5. A thin-film transistor switch matrix, showing the data (column) buses, TFT switches, and liquid crystal pixels.
There have been moves to develop cadmium selenide transistors, which have the potential of addressing up to 4,000 lines of display (Lee et al., 1991). The number of addressable lines depends on the ratio of on to off currents of the transistor switch. The higher the on current, the faster the liquid crystal capacitance can be charged, and therefore the more lines can be addressed in a refresh time. The lower the off current, the slower the discharge, and therefore the longer the time that can elapse before refreshing. A graph (circa 1989) of on current versus addressable lines is shown in Fig. 6. The performance of both poly-silicon and cadmium selenide has been further developed since then.
FIGURE 6. On current versus addressable lines for active matrix addressed displays, ca. 1989.
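The scaling behind Fig. 6 can be made explicit with a back-of-envelope calculation. All component values below are assumed for illustration; the point is only that the line count is proportional to the on/off current ratio.

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative, assumed values for one pixel and its switch. */
    double C    = 1e-12;   /* pixel capacitance, farads           */
    double V    = 5.0;     /* drive voltage, volts                */
    double dV   = 0.5;     /* tolerable droop between refreshes   */
    double Ion  = 1e-6;    /* switch on current, amps             */
    double Ioff = 1e-10;   /* switch off (leakage) current, amps  */

    double t_charge = C * V / Ion;     /* time to write one line     */
    double t_hold   = C * dV / Ioff;   /* time a written pixel holds */

    /* Every line must be written once within the hold time, so the
       addressable line count is (dV/V) * (Ion/Ioff). */
    printf("addressable lines ~ %.0f\n", t_hold / t_charge);
    return 0;
}
```

With these assumed values the estimate comes out at about 1,000 lines, which is the regime the figure shows.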
FIGURE 7. Insulated column drivers.
An important requirement is to prevent short circuits, which obliterate whole lines of pixels and are objectionably noticeable. They particularly occur as a result of a short at the crossover between row and column buses. An idea, elegant in its simplicity, was invented at GEC to overcome this problem. The column buses are printed on the opposite side of the liquid crystal to the row buses, so that it is impossible for a crossover short to occur. A reconfiguration of the circuit was required, as shown in Fig. 7. The gate buses alternate with earth lines on one side of the substrate, but the columns, shown dotted, are on the other side of the liquid crystal. The data pulses are applied to transistor T11 via the capacitance of the liquid crystal C11. There is a price to pay insofar as the capacitance of the circuit is increased, and approximately twice the voltage is required to drive the array.

A plasma switching system, developed at Tektronix, is reported (Grover, 1992) which dispenses with a matrix of transistors by utilizing channels of gas which can be ionized to choose the row of pixels to be addressed. A groove in a glass sheet forms the channel for the gas. A strobe applied to the channel anode causes the gas to ionize and become conducting. A pulse of charge can thereby move from the column electrode via the liquid crystal to earth through the ionized gas. When the gas is not ionized, it becomes an insulator, thereby deselecting the row.

A further constraint at present on addressing large-area displays is the need to address every line in x and y. Mechanical plugs and sockets for this purpose will be either inherently unreliable or expensive. Although improved techniques are being developed to connect the driving circuitry to
the display substrate, the preferred method in the current scenario is to address the substrate serially and process the data for integrated drivers on the same substrate, minimizing the external connections to the display. Silicon technology for this purpose has provided limited results, but it is possible that cadmium selenide technology, with its greater mobility, will make an important contribution.

5. Electrochromic Displays
Electrochromic effects (electrically and electrochemically induced color changes in a material) were reported more than 50 years ago, but the main effort to apply them to display technology has been concentrated into the last 20 years. Among the light controller or subtractive displays, electrochromic technology is potentially the main rival to liquid crystal when fast switching rates are not paramount. It has many attractive properties in its favor, both in its appearance and inherent memory and in its compatibility with conventional semiconductor controlling circuits.

The type of electrochromic cell investigated most is the electrochemical cell, in which ions are transferred through an electrolyte under a drive voltage; the voltage is limited to prevent damage to the cell, so a fast rate of switching cannot be expected. A cell may change state in less than 100 ms or in several seconds, depending on the temperature and the technology used.

The developments in electrochromics are divided into two main areas of application. In one, the emphasis is on a polychromatic cell which will display different colors, and interesting results are being obtained using derivatives of the phthalocyanine dyes. In the second, the emphasis is on the production of monochromatic cells with a solid electrolyte which has adequate optical density and reliability for public information boards (special issue on electrochromic technology, 1988). There are many other forms of electrochromic cell, and the properties of any one type are not necessarily representative of the whole. Thus, cells based on viologen compounds generate an insoluble precipitate at the electrode, and electroplating also comes within the definition of an electrochromic cell.

The main emphasis now appears to be on cells based on lithium tungsten trioxide, which are reported to give robust solid-state cells of acceptable performance and reliability. Changes in color in tungsten trioxide are caused by guest atom additions to the crystal lattice. By introducing additional atoms into the interstitial sites of the lattice structure of a crystalline compound, free carriers are produced, giving electronic transitions in the visible region of the spectrum. The intensity of the color seen is a function of the guest atom density.
An interesting feature is that the cell is a battery, so the voltage across it can be used to monitor the state of the cell. This property is of value if it is necessary to authenticate the information being displayed by a means independent of the driving system. Few other display technologies, other than mechanical displays, have this independent authentication property. The voltage is also a measure of the optical density or grayscale and can be used in a feedback loop if the need should arise.
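A feedback loop of the kind suggested might look like the sketch below. It is purely illustrative: the cell is simulated by a single state variable standing in for a real ADC and current source, and the gain, tolerance, and drive values are all assumed.

```c
#include <stdio.h>

/* Illustrative grayscale feedback loop for an electrochromic cell:
   the cell's terminal voltage tracks its optical density, so charge
   is driven in or out until a target voltage (density) is reached.
   The "cell" here is simulated by one capacitance-like state. */

static double cell_volts = 0.0;              /* simulated cell state */

static double read_cell_voltage(void) { return cell_volts; }

static void drive_cell(double amps)          /* crude simulated physics */
{
    cell_volts += amps * 10.0;               /* assumed gain per step   */
}

static void set_density(double target_volts)
{
    for (;;) {
        double err = target_volts - read_cell_voltage();
        if (err > -0.01 && err < 0.01)
            break;                           /* within tolerance */
        drive_cell(err > 0 ? 1e-3 : -1e-3);  /* bang-bang drive  */
    }
}

int main(void)
{
    set_density(1.2);
    printf("settled at %.2f V\n", read_cell_voltage());
    return 0;
}
```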
6. Electrophoretic Displays

There was much interest in the electrophoretic display in the early 1970s. It used a suspension of colloidal particles in a liquid of contrasting color. Application of an electric field caused the particles to move between front and back electrodes, which could be shaped to form a symbol. It provided a pleasant display with a good angle of view. Power consumption was about 300 microwatts/cm², but it required a drive of about 75 volts. It was, however, subject to coagulation of the particles, and lifetime was inadequate.

III. PROJECTION DISPLAYS
Large-area displays have been of particular interest to the military, entertainment, and advertising industries. In 1943 the first demonstration of the Eidophor projection system was given. Subsequently, color was added by the coincidence of three separate projection systems. It was the first of several light valve designs utilizing the distortion of an oil film by an electron beam.

The light valve system comprises a lens and two combs which are placed so that light from a 2.5 kW xenon lamp passing through the slots of the first comb is blocked by the teeth of the second. An oil film placed between the combs has no effect until a groove is formed in it by the charge from an electron beam. The groove diffracts light around the teeth of the second comb, and this diffraction increases with the depth of the groove and in consequence with the degree of charge imparted by the electron beam. Grayscale is thus possible. Color is achieved by using three systems projected coincidently onto the screen. One modification of the system utilizes the variation of diffraction with wavelength to separate the three primary color images generated from a single light valve.

Liquid crystal light valves have gained in popularity as the performance of the devices has improved, especially since the liquid crystal can be chosen for high-temperature operation. The improved contrast available with active matrix addressed arrays is important. They are considered by many
people to be better for large-area displays, such as HDTV, than wall-mounted displays, because of the problem of yield when manufacturing large-area liquid crystal displays. Liquid crystal light valves may also be optically and thermally addressed. The latter method gives higher resolution but is too slow for video systems.

The saturated colors available from lasers make them a preferred light source, but the popularity of laser projection systems has been limited by the cumbersome size of gas lasers. The development of solid-state lasers is providing more scope for laser projection displays. Two main methods of deflecting the laser beam are employed. Mechanical systems use a continuously rotating mirror in the form of a polygon. Imperfections in the angle of the mirror surface cause jitter; a system developed in the 1980s sensed the errors electronically and applied a correction to the delay of a line of stored data to compensate. Acousto-optic deflectors employ a Bragg cell in which the frequency of the acoustic wave applied to the cell is varied in order to vary the spacing of the diffraction grating formed from the variations in refractive index.

Projection CRTs with useful output have been available since the 1950s, but they suffer from some disadvantages. The color gamut is limited by the phosphor used. The definition of the beam decreases with brightness, and high anode voltages in the range 30-50 kV are necessary. Color requires three coincident systems, and errors in convergence are a problem now being solved by automated procedures, which include holding the parameters in a computer memory and calculating the compensation required under different conditions.
IV. THREE-DIMENSIONAL DISPLAYS

It was part of our function at NRDC, later to become the British Technology Group (BTG), to be in touch with the state of the art. This meant monitoring university developments and projects funded by the research councils, as well as industrial proposals and submissions by private individuals. Potential inventors are attracted to glamorous subjects as yet unsolved by science. Three-dimensional television is such a subject, and I received many curious submissions.

A. Pseudo-3-D
Various pseudo-stereo systems have been shown with the intention of deluding the brain into thinking it is seeing 3-D. Such systems derive
different data from monoscopic sequences of film or video pictures so that movement gives pseudo-stereo. Thus, if the image presented to one eye is delayed by one or more frames compared to the other eye, then objects moving laterally across the screen will appear to be closer than stationary objects. An effect is also obtained by using a rotating color filter so that different colors are delayed by different amounts. These systems present false information, but the brain accepts the illusion and fills in the missing information.
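The frame-delay scheme reduces to a small ring buffer of past frames. The sketch below is an illustrative model (the frame size and a delay of two frames are assumed), not a description of any broadcast system.

```c
#include <string.h>

#define W 320
#define H 240
#define DELAY 2                      /* delay in frames (assumed) */

typedef unsigned char Frame[H][W];

static Frame ring[DELAY + 1];        /* ring buffer of recent frames */
static int head = 0;

/* Feed one monoscopic camera frame; hand back two views. The eye
   seeing the delayed frame lags laterally moving objects, which the
   brain reads as a depth difference. */
static void next_frame(Frame in,
                       const unsigned char **live_eye,
                       const unsigned char **delayed_eye)
{
    memcpy(ring[head], in, sizeof(Frame));
    *live_eye    = &ring[head][0][0];
    *delayed_eye = &ring[(head + 1) % (DELAY + 1)][0][0]; /* oldest slot */
    head = (head + 1) % (DELAY + 1);
}

int main(void)
{
    Frame f = { { 0 } };
    const unsigned char *l, *r;
    for (int i = 0; i < 10; i++)
        next_frame(f, &l, &r);       /* feed a short dummy sequence */
    return 0;
}
```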
B. Stereo Systems

Methods utilizing stereo images have been available for many years, and several techniques have been introduced for producing three-dimensional television. Some have been based on bicolor or polarized spectacles to differentiate the separate stereo images. Methods of generating stereo computer images by reflecting half a screen of data were reported (Ortony, 1971). A time-multiplexed system which utilized switched liquid crystal spectacles to separate alternate TV frames was later developed at Leeds University. The refresh rate of 25 Hz for each eye caused flicker, and an enhanced version running at twice the rate was produced in Japan, but it was then not compatible with a standard TV monitor. Systems which operate without special glasses usually utilize lenticular screens to separate cones of emitted light so that each is visible to only one eye. A very small tolerance in lateral drift is required in such systems, and digitally addressed flat panel technology will be the more suitable by virtue of the inability of the image to drift in x or y.

C. Real 3-D Systems
Holographic displays are outside the scope of this review, since the considerable data required to support them is beyond the capability of electronic systems. It is likely that such resolution is not necessary in practice. Electronic 3-D systems, in which the observer can look around the side of the object, use time-multiplexed data. One of these uses a rotating plane of LEDs on which individual points are flashed at the appropriate instant. In other systems, oscillating mirrors are used to reflect the image projected from a CRT. In one implementation (Lasik, 1976), the distance between a pair of mirrors is altered to change the effective distance of the image from the observer. In another, the focal length of a flexible reflective concave mirror changes in an oscillatory fashion. The resulting distortion is compensated for electronically.
In one development (Travis and Lang, 1991), the different images are time-multiplexed and projected at the angles appropriate for each image. The view is accordingly different for each eye, as needed for a stereoscopic effect, and the view changes with the angle of observation. It is conjectured that an extension of this system, to give 100 different views of a three-dimensional object, would be sufficient for realistic 3-D. Although this would increase the data flow by 100 times compared with a two-dimensional display, it nevertheless appears feasible if computing power is going to increase further by the 1,000 times forecast for the relatively near future. It is likely that optical methods would have to be used both for switching and for addressing the data. Such methods are uneconomical in their utilization of data, since a single viewer would only be seeing a fraction of the data being transmitted at one instant. The provision of devices for monitoring the position of the viewer’s eye could, in principle, be used to reduce the data processing by a large factor. Some of the 3-D techniques are described in more detail in Collender (1986).
D. Virtual Reality

The helmet-mounted displays used to provide separate images to each eye, now known as virtual reality, provide a potentially powerful technique. They are currently restrained by the resolution of the image and by the relatively cumbersome gear. An early system incorporating two CRTs in a helmet was developed at the University of Utah (Vickers, 1974). It was used to draw wire frame diagrams with which the user interacted using a wand. The system monitored the position of the wand and the user’s head. The provision of a form of spectacles with the optics to focus a solid-state display for each eye (with peripheral vision) has been a goal for many years.

These developments are being associated with techniques which monitor the movements of the hand or the eye in order to simulate a real environment. Tactile interaction is a current trend which uses strain gauges and pressure inducers within gloves in order to simulate natural behavior. The computer processing power available is currently insufficient to interpret the interaction and update the image in real time. Delays of 20 ms and more are reported, which are disorienting for the operator.

The virtual reality display of the future is seen as a tool to create tele-existence, defined as the creation of a strong sense of remote presence (for operating robots under remote supervisory control). Under this scheme, the robot becomes an intelligent prosthesis where the human performs the perceptual and cognitive functions needed to recognize patterns, navigate obstacles, and make evaluations (Rheingold, 1991). The developments will
clearly be important as a monitor for keyhole (and finer) surgery and for the manipulation of instruments on a microscopic scale. It is evident that there will be the added problem of monitoring position with an accuracy commensurate with the precision required, perhaps eventually with nanotechnology.
V. CONCLUSIONS
The developments in displays for graphics presentation and interaction are exciting and show no sign of abating. The medium will be the most powerful communications tool available in the foreseeable future. The ability to visualize and manipulate on a microscopic scale will be important for surgery and precision manufacture. The possibility of controlling, and interacting with, events across national boundaries in real time will be of interest to the legislators of the future. The degree to which people will be able to immerse themselves in a world of virtual reality without psychological and sociological repercussions is a further area of interest. Its potential application to simulation would appear to be limited only by the user’s willingness to be subjected to the experience. Much has been left unmentioned, and the reader is referred to the citations in the bibliography, which contain many further references.
REFERENCES

Chang, I. F. (1980). Recent advances in display technologies. Proc. SID, Vol. 21/2.
Coles, H. J. (1989). Introduction and overview of liquid crystal displays. IEE Coll. on “Graphic Display Devices,” Dig. 1989/71, May, 1989.
Collender, R. (1986). 3-D television, movies and computer graphics without glasses. In “Displays,” July, 1986.
Ellis, A. B. E. (1971). A direct view storage tube with selective erasure. IEE Display Conference, Pub. No. 80, pp. 1-5, Sept. 1971.
Encyclopaedia Britannica (1957). Vol. 18, p. 872; Vol. 12, p. 450; Vol. 21, p. 912.
Grover, D. J. (1973). Message orientated interaction graphics. Comput. J. 16(1), 39.
Grover, D. J. (1977). Hardware for visual information. In “Computer Aided Design,” Vol. 9, No. 4, pp. 223-232.
Grover, D. J. (1978). A review of graphics work in the UK. Online Conference on Interactive Computer Graphics, September, 1978.
Grover, D. J. (1991). Constraints on the evolution of display technology. BCS Computer Graphics and Displays Specialist Group seminar, Dec., 1991.
Grover, D. J. (1992). The electronic display: Its evolution and potential. British Association Annual Meeting, Southampton University.
Hilsum, C. (1981). Recent progress on solid state displays. Inst. Physics Conf. Ser. No. 57.
Jones, I. A. (1976). Technology of VDUs. In “Visual Display Units and Their Applications” (D. Grover, Ed.), Chap. 2. IPC Science & Technology Press.
Lasik, O. L. (1976). A three dimensional display with true depth and parallax. SID 76 Digest, pp. 104-105.
Lee, M., Wright, S., Judge, C., and Cheung, P. (1991). High mobility cadmium selenide transistors. Intl. Research Display Conference, San Diego.
Mosely, A. (1989). Supertwist LCD technology. IEE Coll. on “Graphic Display Devices,” Dig. 1989/71, May, 1989.
Ortony, A. (1971). A system for stereo viewing. Comput. J. 14(2), pp. 140-144.
Rheingold, H. (1991). “Virtual Reality.” Secker & Warburg.
Ross, P. W. (1989). Ferroelectric liquid crystal displays. IEE Coll. on “Graphic Display Devices,” Dig. 1989/71, May, 1989.
Sobel, D. (1992). Flat panel displays. In “Electro-Optical Displays” (M. A. Karim, Ed.), Chap. 4. Dekker, New York.
Special issue on electrochromic technology (1988). In “Displays,” Oct., 1988.
Spruce, G., and Pringle, R. D. (1992). Polymer dispersed liquid crystal films. Electron. Commun. J., Vol. 4, No. 2, pp. 91-100.
Stotz, R. H., and Cheek, T. B. (1967). A low cost graphic display for a computer time-sharing console. Project MAC ESL, MIT, July, 1967 (ESL-TM-316).
Street, O. S. B. (1974). The HRDI: A high resolution storage display. CAD 74 Conf., IPC Sci. & Tech. Press, Fiche 408.
Sutherland, I. E. (1963). Sketchpad: A man-machine graphical communication system. Proc. SJCC, 329.
Thorn-EMI (1992). Demonstration.
Travis, A. R. L., and Lang, S. R. (1991). The design and evaluation of a CRT-based autostereoscopic 3-D display. Proc. SID, Vol. 32/4.
Van Dam, A. (1970). Storage tube graphics: A comparison of terminals. Computer Graphics 70, Brunel University.
Vickers, D. L. (1974). Sorcerer’s apprentice: Head-mounted display and wand. Utah Univ. Rep. UTEC-CSc-74-078 (July, 1974), NTIS AD/A 008 787.
Watanabe, M. (1989). JEE, September, 1989, pp. 107-109.
Gabor’s Pessimistic 1942 View of Electron Microscopy and How He Stumbled on the Nobel Prize

T. MULVEY

Department of Electronic Engineering and Applied Physics, Aston University, Birmingham, U.K.
I. Introduction
II. Electron Physics in Wartime Britain
III. Dangers in the German Reich
IV. An Enemy Alien with Special Qualifications
V. The Third Meeting of the Electronics Group, October 1942
VI. Space-Charge Correction of Spherical Aberration
VII. The Projection Shadow Microscope: An Unrecognized Tool for Holography
VIII. Abbe’s Theory of the Microscope and the TEM
IX. Electron Beam Holography
X. Electron-Optical Trials at the AEI Research Laboratory
XI. Artifact-Free Holography
References
I. INTRODUCTION

On the occasion of the 50th anniversary of the Electronics Group of the Institute of Physics, it seems appropriate to recall the scientific and technical content of one of the first lectures given to the newly formed group. I refer to Dennis Gabor’s stimulating 1942 wartime lecture on the new science of “electron optics” and the electron microscope, which eventually led him, by indirections, to the invention of electron beam holography and the Nobel Prize in 1971. In this wide-ranging talk, he attempted to derive, from the sparse information then available, the ultimate resolving power of the electron microscope, an instrument that had just become commercially available in Germany but not in the U.K. With the aid of wonderfully ingenious back-of-the-envelope calculations, Gabor took the audience on an exploratory journey in search of the fundamental resolution limit of this new kind of microscope. The results, based partly on fundamental physics and partly on inspired guesswork, were, however, pessimistic about ever attaining the atomic resolution that seemed possible in view of the small size of the electron.

Gabor attributed his pessimism, almost exclusively, to the malevolent influence of the spherical aberration of the objective lens. The correction of
this aberration became a persistent theme throughout his scientific life, although many of the early pioneers thought they had enough to do at the time making the instrument work at all, since there were many practical difficulties with the early models. For many years Gabor struggled unsuccessfully with this problem, but he could not get it out of his mind. In the Easter holidays of April 1947 he received, quite unexpectedly, a “vision” (cf. Allibone, 1980), referred to later, in which he says he was given, “instantaneously and without effort on his part,” the solution to his problem, namely the concept of holography and image reconstruction.

The word “holography” was coined by Gabor from the Greek “holos” meaning whole and “graphein” meaning to record. A hologram would therefore record, in a single medium such as a photographic plate, the complete electron wave in amplitude and phase. No one had ever even dreamed of this before, and the term became his exclusive trademark. The Rugby vision made him reject all his previous ideas about electron microscopy and pursue holography single-mindedly for the rest of his life, as the most likely means of correcting spherical aberration. At the outset, it seemed to many experts that it was very unlikely to succeed. Fortunately, we have extensive documentation of Gabor’s subsequent tortuous path towards the Nobel Prize, in 1971, for the invention of holography. The prize, surprisingly, was awarded mainly for its optical applications, which Gabor himself did not envisage at the time of his invention! Moreover, it was only well after his death (he died in 1979) that aberration-free atomic resolution, which was his real goal, was achieved for the first time in the electron microscope, with the aid of holography.

With hindsight, at least two other contemporary scientists had come very close to inventing holography, even to the point of unknowingly producing “holograms,” as Gabor was later to call them, but somehow they could not take the decisive intellectual step that was needed to grasp the mind-boggling concept of recording an electron wavefront in amplitude and phase on a photographic plate and then reconstructing the image in some way with visible light, correcting the aberrations in the process.

II. ELECTRON PHYSICS IN WARTIME BRITAIN

In 1942, in wartime Britain, the all-embracing Electronics Group of the Institute of Physics was the only viable forum in the U.K. for discussing the future possibilities of this comparatively new and controversial instrument, the “transmission electron microscope” (TEM). A fairly crude experimental TEM, based on the pioneering work of Knoll and Ruska in Berlin, had,
in fact, been manufactured in the U.K. as early as 1936 for Prof. L. C. Martin et al. (1937) in the Applied Optics Department at Imperial College, by the Research Department of the Metropolitan Vickers Electrical Company (MV) in Manchester. This instrument, the first commercially produced electron microscope in the world, did not, however, surpass the optical microscope in resolving power, nor was it supposed to, as its purpose was primarily to compare optical and electron-optical images. Nevertheless, it set the scene for the later U.K. industrial production of a wide range of electron microscopes. By 1938, however, the imminent prospect of a war with Germany was a primary concern of the government, and the MV Research Laboratory became heavily involved in urgent defense work, such as developing anti-aircraft radio-location networks (radar), as they were then called; further commercial production of electron microscopes had to be shelved in the U.K. until some more favorable time in the future.

Of relevance to the present account is the fact that also in 1936, the famous German electron optics theorist, Otto Scherzer, who was already exerting a strong influence on Gabor’s scientific thinking, was also diverted from electron microscopy in order to take charge of radar development for the Luftwaffe. In the same year, Scherzer (1936) proved mathematically that no arrangement of magnetic or electrostatic fields of axial symmetry could have negative spherical aberration. This was bad news for electron microscopy, since it was already known that even the best electron lenses up to that date had devastatingly high positive spherical aberration compared with optical objective lenses. In these, by the end of the 19th century, after hundreds of years of effort, complete correction of spherical aberration had been achieved by Abbe and his colleagues in Germany by the addition of lens elements having negative spherical aberration.

In a perfect lens, the refractive power of successive radial zones of the lens increases linearly with zone radius. In all axially symmetrical electron lenses, according to Scherzer, the outer zones have an increasingly greater refracting power than those of a perfect lens. No rearrangement of magnetic polepieces or electrode shapes can reverse this behavior. Spherical aberration in an objective lens therefore gravely distorts the wave front, and hence the information content of the wave leaving the specimen. From a geometrical point of view, the disk of confusion in the image due to spherical aberration increases as the cube of the aperture angle, so the easiest way to reduce its effect is to aperture down the ray pencils leaving the object; this is an unsatisfactory compromise, but it is still the method employed in practice today. This procedure introduces a further aberration, that of diffraction, whose associated disk of confusion increases inversely with the size of the aperture. The wave front of the wave leaving the specimen,
whose amplitude and phase carry the detailed information about the object, is then seriously disfigured by the combined effect of these two aberrations, and the lens is unable to reproduce a faithful image of the object in the image plane. The image may well be reasonably sharp, but its appearance on the fluorescent screen will change rapidly as the operator adjusts the focus control of the microscope. Unwanted contrast reversals of fine detail may also occur. Focusing becomes arbitrary, and true atomic resolution cannot be achieved.

Scherzer was a worried man when he arrived at this mathematical result, especially as he was employed by the AEG (Allgemeine Elektrizitäts-Gesellschaft) Company in Germany, who were planning to manufacture electrostatic TEMs. In 1936, and even in 1942, when Gabor gave his lecture, the problem was, in fact, not yet urgent in practical terms; the few electron microscopes that were available were still a long way from atomic resolution, so it needed a strong mathematical-physics background and a vivid imagination to visualize what the ultimate resolution of the TEM might be. In many ways, Gabor’s attitude was similar to that of Abbe working at Carl Zeiss Jena, who, to his astonishment, discovered from his newly derived optical wave theory that any further attempts to improve the resolving power of the Jena optical microscopes would be doomed to failure. In the absence of aberrations, the limit was now set by the nature of light, to a value of around half the wavelength. His fellow designers thought him crazy and continued to try (unsuccessfully) to surpass this “limit.”

Scherzer, however, unlike Abbe, was not downcast by his discovery. He pointed out that there were clever ways around his theorem, and he set them out clearly. They all sounded difficult and expensive! For example, one could abandon lenses of axial symmetry and use quadrupole, hexapole, and octopole structures, which he, indeed, began to investigate both theoretically and experimentally, considering this to be the most likely approach to success. One could also introduce space charge into the optical path; an electron swarm around the axis would weaken the radial force on the electron and, one hoped, the spherical aberration. This was the method that appealed to Gabor, who was an expert on the magnetron. Scherzer spent the rest of his life, and much of that of his gifted students, gnawing away at this problem, but failed to crack it before he died. Our knowledge and practical skills in making such systems are now truly remarkable, but one suspects that the engineering and electrical tolerances required are probably still beyond present nanofabrication capability. The engineering effort since 1936 has been of impressive proportions, but it has not yet proved possible to incorporate such schemes into a practical high-resolution electron microscope, let alone demonstrate a higher resolution than that of a conventional EM.
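The compromise described above can be made quantitative with the standard textbook estimate (added here for the reader; it is not taken from Gabor's lecture itself). Adding the spherical aberration disk to the diffraction disk for an aperture half-angle $\alpha$, wavelength $\lambda$, and spherical aberration coefficient $C_s$,

$$d_s = C_s\,\alpha^3, \qquad d_d \approx \frac{0.61\,\lambda}{\alpha},$$

and minimizing the total over $\alpha$,

$$\frac{d}{d\alpha}\left(C_s\,\alpha^3 + \frac{0.61\,\lambda}{\alpha}\right) = 0 \;\Longrightarrow\; \alpha_{\mathrm{opt}} = \left(\frac{0.61\,\lambda}{3\,C_s}\right)^{1/4}, \qquad d_{\min} \sim C_s^{1/4}\,\lambda^{3/4}.$$

The attainable resolution thus improves only as the fourth root of any reduction in $C_s$, which is why correcting the spherical aberration itself mattered so much.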
Probably the only other person in the world who was equally worried about Scherzer’s conclusion was Dennis Gabor, who set himself the task of solving the problem. Gabor was born in Hungary of gifted parents. He was mainly educated at home, learning English, French, and German from governesses of the appropriate nationality, followed by a private tutor in mathematics and physics. Later on he did attend an elementary school, and then a grammar school. It was a period of extreme boredom for him; he had problems with his teachers, since he knew considerably more about physics, mathematics, and languages than they did. His father had let him buy, from his earliest years, any book he wanted. Gabor was particularly keen, as a boy already fluent in Hungarian, French, German, and English, on advanced German textbooks in physics and mathematics. When he arrived at the University Engineering Department, he found (cf. Allibone, 1980) he had already covered the complete degree course in mathematics and physics. He later did military service in the Hungarian Army, being stationed in Italy in 1918, at the end of World War I. Thanks to his wonderful memory, he soon picked up Italian by memorizing 200 new words each day. Gabor spoke all his foreign languages with correct grammar and syntax, but with the same strong, unmistakable Hungarian accent.

Later he went to the Technische Hochschule Berlin. On graduating, he considered his professor incapable of giving him a sufficiently demanding project, so he opted for a Ph.D. in the TH Berlin High Voltage Laboratory, where, without any previous experience, he designed and built an innovative state-of-the-art high-speed electron beam oscillograph for measuring fast transients caused by lightning discharges in overhead transmission lines. In 1926 he submitted his thesis (Gabor, 1926). This marked the start of his career as an inventor.

Although Gabor could easily pass for a mathematical or experimental physicist or a professional mechanical or electrical engineer, he always regarded himself as an inventor. He and his father were agreed that the raison d’être of an engineer was to be able to invent and realize something startling that nobody had even imagined was possible. It was always his ambition to invent something really striking, that would make people sit up. Holography certainly did that! In addition to having a penetrating intellect, Gabor was a voracious reader with a simply remarkable memory; his mind was teeming with ideas of things to invent, quite unperturbed by the absence of a suitable technology to realize them. As a person he was described by a colleague as having “a thick skin, while remaining a remarkably sensitive man, alert to all new ideas that came his way.” He had immense cerebral information-gathering and processing ability.
[Figure 1 labels: glass tube; sealing compound; to the pump; casing; fluorescent screen; removable cassette; photographic plates; large flat ground joint for sealing off the apparatus]
FIGURE 1. Gabor's iron-shrouded magnetic "lens" for the high-speed oscillograph. (Courtesy of the late Prof. D. Gabor.)
The most significant innovation in Gabor's design of the oscillograph was the iron-shrouded electron beam "converging element," as it was called, shown in Fig. 1. At the time when Gabor submitted his Ph.D. thesis, it had been known for many years that the uniform magnetic field of a long current-carrying solenoid causes a beam of cathode rays (electrons) emanating from a point source to converge to a point located on the same flux line further along the solenoid. This may seem like the action of a converging optical lens, but it isn't, since, as mentioned earlier, a lens must have a unique axis, on which the refracting power is zero, increasing linearly with radial distance from the axis. It so happened that Gabor found it very inconvenient to employ such a solenoid, since this would prevent him from gaining access to the column of his oscillograph. With remarkable intuition, he replaced the long coil by a short solenoid, which he surrounded with an external iron shroud, so that the external stray magnetic field would be reduced and would not cause unwanted beam deflections in other parts of the column. Such an arrangement, he found, had very different properties from those of an infinite solenoid. He had, unknowingly, constructed the first iron-shrouded magnetic electron lens!
This could have been his chance to be regarded by posterity as the father of electron optics and microscopy and a possible candidate for a Nobel Prize. It was not to be! Unfortunately, within the time limit of the Ph.D. program, he was quite unable to understand how this device worked. The token theory he proposed in his thesis was, he admitted, quite inadequate, but he nevertheless had faith in the experimental result. He really kicked himself when, in 1927, he first saw the famous paper of Hans Busch (1927) from Jena, showing, unbelievably, that a short magnetic solenoid acts on a beam of electrons as a converging optical lens does on light. This was a totally unexpected result for Busch himself, and certainly for physicists and cathode ray tube designers at the time. Busch had been working on this problem on and off since his Ph.D. days of 1908-1910, when he had been forced to abandon his original Ph.D. program because a theory of the behavior of magnetic solenoids on electron beams was not available. His genius was to simplify the calculation of the behavior of a narrow (paraxial) bundle of incoming rays traveling almost parallel to the axis. This avoids many mathematical and conceptual problems. By 1922 he had worked out the rotation of the electron beam as it passes through the field of the solenoid (Busch, 1922) and had even used the result to calculate the charge/mass ratio (e/m) of the electron. This gave him the clue to the understanding of the converging action of a short coil on the rotating beam after nearly 20 years of mental effort. Gabor never forgot the impact of Busch's lens paper, especially as Busch's theory seemed so deceptively simple and elegant. The result was also "crazy" and "unexpected," characteristics that always appeal to physicists. Gabor described it to his friends: "Busch's paper was more than an eye-opener; it was like a spark in an explosive mixture!" Busch is rightly regarded as the father of electron optics. However, Busch's own experimental results, obtained before he had derived the theory, did not support his new theory. The theory was therefore published provisionally on its own merits, unsupported by experiment. This may be the reason why Busch did not draw any general conclusions about electron optical instruments from his paper. Ernst Ruska, a young and gifted final-year undergraduate at the TH Berlin, a born experimenter, was asked by his tutor, Max Knoll, to take up the question of Gabor's short solenoid device and the discrepancy between Busch's theory and experiment. Ruska soon discovered the cause, namely, a lack of experimental precision by Busch in defining the position of the source and an inadequate resolution in the fluorescent screen he used. When Ruska had put both matters right, he obtained agreement, to within a few percent, between Busch's theory and experiment (cf. Ruska, 1980). This was bad news for Gabor, who now realized, belatedly, how his own iron-shrouded "lens" worked.
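For reference, Busch's paraxial result can be stated compactly in modern notation (a summary added here, not part of the original account): an axial field distribution B_z(z) at accelerating voltage V acts on paraxial rays as a weak converging lens of focal length f, with an accompanying image rotation φ:

    1/f = (e/8mV) ∫ B_z²(z) dz,        φ = √(e/8mV) ∫ B_z(z) dz,

where e and m are the electronic charge and mass. Since the integrand B_z² is never negative, 1/f is always positive: every such round magnetic lens converges, which is precisely the "crazy" converging action that so surprised Busch's contemporaries.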
FIGURE 2. Schematic diagram of the first two-stage TEM (Knoll and Ruska, 1932). (Courtesy of the late Prof. E. Ruska.)
These results stimulated Ruska himself and his supervisor Max Knoll to build and test in 1931 the first two-stage electron microscope, a simple design (Knoll and Ruska, 1932) shown in Fig. 2, with a magnification of
a mere 13 times, but it worked! By 1933, Ruska had built, without supervision and largely with his own hands, a two-stage electron microscope that handsomely surpassed the resolution of the light microscope (cf. Ruska, 1980). All this activity eventually earned him the Nobel Prize in Physics in 1986. But in 1927, Gabor was very conscious of the fact that if he hadn't been so slow about realizing how his "lens" really worked, it would have been an afternoon's work for him to take a single-stage electron optical image in his electron beam oscillograph, thereby becoming the father of electron optics instead of Busch, and perhaps gaining the Nobel Prize. In fact, Gabor got his Nobel Prize sooner (1971) than did Ruska.

III. DANGERS IN THE GERMAN REICH
By 1932-33, life had become dangerous for all people of Jewish descent in Germany, and soon after in its conquered satellites, as the National Socialist Party gained increasing power. Gabor's contract in Berlin was not renewed in 1932, so he returned to Hungary. The Gabors were of Russian-Jewish extraction, but the whole family had embraced the faith of the Lutheran church at the end of World War I. However, the German non-Aryan laws were subject to wide and arbitrary interpretation, and Gabor decided to emigrate. In 1934, he approached Dr. T. E. Allibone, Head of the Metro-Vick Research Department, who was well acquainted with the TH Berlin oscillograph group and had visited them in Berlin. Gabor mentioned his problem and asked if a position could be found for him. There was no position immediately available in Manchester, but a suitable position was found at the Research Department of the British Thomson-Houston Company (BTH) in Rugby, a sister company of Metro-Vick. He arrived in 1934. By 1936, he had settled down in Rugby and had married Marie Louise Butler, also a BTH employee. The Research Department was concerned with both light optics and electron optics; Gabor worked initially on the design of cathode ray tubes. The optics work was, in any case, no problem; Gabor was also an excellent optics man! He also kept up his personal interest in electron optics and electron microscopy, keeping in touch with the world literature, which at this time was mostly in German, one of his four foreign languages. It was in 1936, in Rugby, for example, that Gabor read Scherzer's definitive paper on spherical aberration, mentioned earlier. Gabor could never refuse to try to solve a problem on the grounds that it looked insoluble to experts in the field. He decided to take this matter up seriously if he could get a chance, and for the next 10 years the problem of making a perfect electron lens was always at the back of his mind. In those days it was quite usual for
researchers to have a “pet subject” that they could pursue in parallel with their main research task. Gabor soon became part of the small electron optics fraternity in the U.K. and began to think about more general problems in electron optics. He thought about the nature of space charge caused by swarms of electrons placed along the axis of a lens and concluded that their electron optical properties would exhibit negative spherical aberration and hence correct the positive aberration of electron lenses. The theory of space charge was also very relevant to the understanding of the magnetron, a device which was to become crucial to the development of U.K. wartime radar. All the time, Gabor felt subconsciously that he should have stuck to his original Ph.D. work, electron optics and electron microscopy. He always felt at home in this subject, kept up his contacts with his old colleagues, and gave papers at European and international conferences on electron microscopy.
IV. AN ENEMY ALIEN WITH SPECIAL QUALIFICATIONS

When war was declared in 1939, Gabor was in a difficult position, since he had seen active service in the Austro-Hungarian Army. Characteristically, he immediately volunteered for military service, an offer that was instantly rejected by the War Office. Instead, he was classified as an enemy alien. Perhaps because he had an English wife and was vouched for by the BTH Company and others, he was placed on the Register of "Enemy Aliens with Special Qualifications," a typical English way of managing things! This meant that he was not to be interned "for the duration," but was allowed to continue in his employment at BTH. The management was, however, held responsible for security matters, and there were restrictions on his travel. He was not allowed access to any classified information, and he was specifically not allowed to enter the main works or the research laboratory. In particular, there was no chance that he could put his extensive knowledge of the theory of magnetrons to a useful purpose in radar. He was not even allowed, for example, to be informed about the plans for the detection of German aircraft. This did not stop him from putting forward a scheme, never implemented, for detecting German aircraft from the infrared emissions from the engine. All this meant that, initially, he had to work from home. Later the management got around the regulations by building him a sizable wooden hut outside the factory security fence, making it much easier for the research managers and others to visit him for consultation and discussion. It also enabled him to spend long stretches of time undisturbed by routine
tasks and to think long and deeply about the Abbe theory of the microscope and its implications for electron microscopy. It seems that Gabor was able, at this time, to load into his mind, or even his subconscious mind, a vast amount of data. This would be processed, even "laundered," over a long period, perhaps years, and eventually a solution would present itself to his conscious mind, without any reference to the original sources of information. Once Gabor had the tentative solution to a problem in his head, it was not difficult for him to arrive at this solution mathematically, making up the theory as he went along, taking hair-raising short cuts with the mathematics. It was, in fact, always difficult even for experts in the subject to follow his explanations, and difficult for experimenters to see how to realize his ideas experimentally. This was very much the case, in 1948, at the AEI Research Laboratory, Aldermaston, in the early stages of electron beam holography. However, in the early years of the war, electron optics was a Tom Tiddler's ground in physics and technology. Speculation about the future prospects for electron microscopy was the order of the day.

V. THE THIRD MEETING OF THE ELECTRONICS GROUP, OCTOBER 1942

This, then, was the background situation in 1942 when the Electronics Group was formed and, at its third meeting, on 31 October 1942, held in Rugby in collaboration with the Midlands Branch of the Institute of Physics, Gabor, the "enemy alien with special qualifications," gave his seminal lecture on "Electron Optics." The meeting took place at the Rugby College of Technology. Five months later, on 4 March 1943, when the course of the war was looking more hopeful for the Allies, his travel restrictions were eased and Gabor was able to give a similar lecture to the Cambridge Physical Society, where he reiterated his belief, expressed in 1942, that electron microscopy had reached a stage where further progress would be slow and difficult. These two lectures were later slightly amplified and published in book form (Gabor, 1946) as The Electron Microscope (Hulton Press). This was a splendid, stimulating account of the subject for physicists and engineers alike, littered with back-of-the-envelope calculations and inspired, stimulating guesses, clearly revealing, with hindsight, the confused but exciting beginnings of electron microscopy in the U.K. at that time, as well as indicating Gabor's scientific outlook of the moment. Its content was clearly the fruit of Gabor's three years of incarceration and isolation from scientific colleagues in his wooden hut in Rugby, with no experimental contact with
the subject and very little contact with the outside scientific world. The monograph was therefore highly individualistic, if not idiosyncratic, and made a big impact on electron microscopists when it came out, inspiring them for many years afterwards. In his preface, dated 22 July 1944 at Rugby, Gabor wrote:

The fundamental development of the subject has now reached a stage beyond which progress is likely to be slow and difficult. At this point, it seems appropriate to look into the future, and to try to explore in imagination the avenues of further development. As this is a proverbially risky undertaking, the author is quite prepared to join the ranks of other, more illustrious prophets who have failed.
He was quite right about the last statement! At this stage he had no idea that in five years' time he was going to change his ideas completely about electron optics and invent the unlikely technique of electron beam holography as a method of correcting spherical aberration in the electron microscope. Gabor began his lecture by asking the same question about the electron microscope that Abbe had asked himself about the light microscope, namely, "What is the fundamental resolution limit of the TEM?" He even followed Abbe's wave optics reasoning, as indeed Boersch, another great pioneer, was doing in Berlin. Both men had pointed themselves in a direction that could lead either of them to invent or discover holography, then a completely unknown concept. Thus, in Chapter 7 of his monograph, entitled "The resolution limit of the uncorrected electron microscope," Gabor looked at the factors that might limit the ultimate resolution of an "uncorrected" electron microscope. Fortunately, much of the theory of the TEM could be borrowed from Abbe's theory of the light microscope, and Gabor adapted it as he went along, proceeding on severely practical lines to calculate the resolution of the TEM. It was known from optical theory, for example, that the resolution d, as limited by spherical aberration and diffraction, is given by
    d = k(C_s λ³)^{1/4},    (1)
where k is a constant between 0.5 and 1, depending on initial assumptions, and C_s is the spherical aberration coefficient; for an electron lens, C_s typically lies between a fifth and a third of the focal length, which itself has a value of a few millimeters or so. Note that Gabor wrote C_s as Cf, where C is a dimensionless constant equal to C_s/f. This can be misleading in the context of resolution, as it is the absolute value of C_s that is important here; the focal length mainly controls lens magnification.
λ is the wavelength of the electron, given by

    λ = 12.2/√V_r,    (2)
if λ is expressed in angstroms and V_r in volts. Here, V_r is the relativistically corrected accelerating voltage, not the voltage actually applied to the electron gun, which is always smaller. Above 50 kV, the wavelength is dramatically reduced by relativistic effects, and this is the reason that would be given today for the improvement of resolution to be expected at higher voltages. However, at that time there was very little practical data available about the aberrations of high-voltage lenses, so there was no general agreement about the expected resolution of the electron microscope at different accelerating voltages. In addition, there was no suitable technology available for constructing high-voltage electron microscopes. Gabor himself had no experience of columns operating at voltages higher than 60 kV, and for some reason he simply didn't believe, at this stage, in the advantages of high accelerating voltages for improving the resolution in a TEM. Some optical theory, however, could be applied to the problem. Thus, Lord Rayleigh's wave optical calculations allowed a designer a path difference error of about a quarter wavelength (λ/4) between the paraxial and marginal rays in the objective lens; this gives k = 0.7 in the preceding equation. Gabor used an empirical random error approach introduced by von Ardenne in Berlin and got k = 1.2. This made Gabor's resolution limit look worse by a factor 1.6, but in 1942 this was not important, since commercial instruments were nowhere near the theoretical limit; a more serious error, however, occurs in connection with the estimation of the effect of accelerating voltage on resolution. If we substitute Eq. (2) into Eq. (1), we get, for k = 0.7,

    d = 4.56 C_s^{1/4} V_r^{-3/8}    (3a)
and for k = 1.2,

    d = 7.5 C_s^{1/4} V_r^{-3/8},    (3b)

i.e., a factor of 1.64 higher, as mentioned earlier. Gabor concluded correctly that Eq. (3b) shows "that in order to obtain a high resolution the most effective method is to raise the potential V." This is blindingly obvious today, with all our practical experience, but for some reason that he didn't explain, Gabor seems to have had an intuitive feeling that, in spite of Eq. (3), 60 kV was the optimum accelerating voltage for a TEM, and he went on to say, "But this [i.e., raising the voltage] is less effective than it first appears, for two reasons. The first is that, in a given magnetic lens field the focal length f is proportional to the applied voltage V." This is unfortunately not true, especially in the relativistic range, i.e., above 60 kV.
Nevertheless, he then rearranged the equation in a slightly different form to prove his point:
    d = 7.5 (C_s/f)^{1/4} (f/V_r)^{1/4} V_r^{-1/8}.    (4)
This is correct, but he then assumed, incorrectly, that he could replace the term f/V_r by the f/V value for a "good" lens at 60 kV. He estimated that this best lens would have a focal length of some 0.3 cm. This gross oversimplification, of course, had the effect of reducing the apparent improvement in resolution at high voltage. He concluded, quite erroneously, that the resolution would hardly change with V, and that "At 60 kV, which can be roughly considered the optimum, the resolution becomes (with the strongest lens that can be realised)"

    d = 9 (C_s/f)^{1/4} [angstroms].    (5)
The value of C_s/f of good lenses was known to lie between 0.2 and 0.3, and so a resolution of some 6 to 6.7 a.u. was obtained as the ultimate resolution for an "uncorrected" TEM; Gabor concluded that the spherical aberration, and not the accelerating voltage, was the main parameter of importance. This is quite an unjustified conclusion, a victory of faith over physics! Gabor was, however, still a bit uneasy with this result, since a resolution of 6 a.u. was far better than anything obtained in any electron microscope hitherto constructed. The chapter concludes without resolving this dilemma; after speculating what other practical parameters, such as electrical and mechanical stability, might be at work, he wrote as follows: "We obtain, therefore, about 12 a.u. for the resolution that can be expected from an uncorrected electron microscope. The best experimental result obtained up to date is twice as much, 24 a.u." No one, as far as I know, ever questioned this curious and uncharacteristic "hand-waving" estimate of the optimum accelerating voltage. In any case, the difficulties and expense of constructing a high-voltage TEM would have been very great at that time.
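A few lines of Python make Gabor's numbers concrete (an illustrative sketch added here; the focal length and C_s/f values are the ones quoted in the text):

    # Evaluate Gabor's resolution estimate, Eqs. (1)-(2), at 60 kV.
    # Units: angstroms for lengths, volts for the corrected voltage V_r.

    def wavelength(V_r):
        # Eq. (2): electron wavelength in angstroms
        return 12.2 / V_r ** 0.5

    def resolution(k, C_s, V_r):
        # Eq. (1): d = k * (C_s * lambda^3)^(1/4)
        return k * (C_s * wavelength(V_r) ** 3) ** 0.25

    V_r = 63.6e3                  # 60 kV gun voltage, relativistically corrected
    f = 0.3e8                     # Gabor's "best" lens: f = 0.3 cm, in angstroms
    for Cs_over_f in (0.2, 0.3):
        d = resolution(1.2, Cs_over_f * f, V_r)   # k = 1.2, von Ardenne's value
        print(f"C_s/f = {Cs_over_f}: d = {d:.1f} a.u.")
    # Prints roughly 6.1 and 6.8 a.u., reproducing the 6 to 6.7 a.u. range
    # that Gabor obtained from his simplified Eq. (5).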
With hindsight, it would have been possible for Gabor, without making ad hoc assumptions, to calculate in a well-defined manner what improvements, if any, one might expect in a TEM by going from 60 kV to 1,000 kV, keeping the same form of lens design. For a given shape of axial field distribution, one can simply scale up the lens properties, focal length, and spherical aberration as one increases the accelerating voltage. It is simply necessary to scale up the size of the lens, keeping the excitation parameter NI/√V_r constant, where NI is the lens excitation in ampere-turns, since in any magnetic circuit the magnetic field distribution remains geometrically similar if the dimensions and the excitation are each scaled by a factor n, provided the B-H properties of the iron remain the same. This rule is exact and holds even if saturation effects take place in the iron. Thus, if we scale a magnetic lens in this way by a factor n, the magnetic field distribution in the scaled-up lens will be geometrically similar to that of the parent lens. Likewise, the electron trajectories will be scaled in the same way. The focal properties, focal length, etc., in the scaled lens will thus be n times as great as those in the original. These focal properties are governed by the parameter NI/√V_r. In fact, to change from a given focal setting of the objective at an accelerating voltage of 60 kV (V_r = 63.6 kV) to the corresponding one at 1,000 kV (V_r = 2,000 kV), one must scale the lens excitation by a factor of about 5.6 and the lens dimensions by the same factor. The focal properties, such as focal length and spherical aberration coefficient, will then also increase by a factor of 5.6. The wavelength, on the other hand, will be reduced by a factor of 5.6, since it varies inversely as the square root of V_r [Eq. (2)]. Thus, from Eq. (3), the wavelength, rather than the spherical aberration coefficient, is the dominant term in the expression for resolution. It readily follows that the resolution of any objective lens, good or bad, will be improved by a factor of about 2.3 in going from 60 kV to 1,000 kV accelerating voltage, a factor well worth having. It seems therefore that Gabor's estimate of the ultimate resolving power of the "uncorrected" TEM was hopelessly pessimistic. At 1,000 kV, for example, he should have found 1.6 a.u. from his accurate formula, rather than the 4.2 a.u. given by his simplified formula. The "brick wall" of physics theory that Gabor often referred to in connection with resolution, and which he was unsuccessfully trying to leap over, was indeed much lower than he thought, but nevertheless ultimately an important obstacle, not so much for our ability to see atoms as for how to interpret the image. This "happy mistake" provided Gabor with an even stronger motivation for correcting spherical aberration than he had in 1936. Ernst Ruska's own "educated guess" of the ultimate resolving power of the "uncorrected" TEM was much more relaxed, namely 3 a.u. at 80 kV, as announced in 1932, after discussions with his supervisor Max Knoll (Knoll and Ruska, 1932); this was, in fact, quite near the mark. They "deduced" this result after estimating the angular width of the ray pencils leaving the specimen in their 80 kV experimental TEM and the known wavelength of the electrons! At that time very little was known about the aberrations of real lenses, so they assumed that they would not be a serious obstacle! This view was probably adequate until about 1947, when Hillier in the U.S.A. proved experimentally that the limiting factor until then had actually been astigmatism caused by defects in the machining of the lens polepieces, and not spherical aberration, as believed by Gabor.
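The scaling argument above is short enough to check numerically (again an illustrative sketch, using only quantities quoted in the text):

    # Scaling a magnetic lens from 60 kV to 1,000 kV with NI/sqrt(V_r) constant:
    # dimensions, focal length, and C_s all scale by n, the wavelength by 1/n.
    V_r1, V_r2 = 63.6e3, 2.0e6     # corrected voltages at 60 kV and 1,000 kV
    n = (V_r2 / V_r1) ** 0.5
    print(f"scale factor n = {n:.1f}")          # about 5.6

    # From Eq. (1), d varies as (C_s * lambda^3)^(1/4); with C_s -> n*C_s and
    # lambda -> lambda/n this becomes a factor (n * n**-3) ** 0.25 = n**-0.5.
    print(f"resolution improved by {n ** 0.5:.2f}x")   # about 2.4, i.e. the
    # factor of roughly 2.3 quoted in the text.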
Ruska was seen to be right in his attitude of ignoring, for the time being, all methods of correcting spherical aberration, while concentrating on reducing lens defects as much as possible by refining conventional lenses. Although Gabor's assertion in his 1942 lecture that spherical aberration was the chief obstacle to be overcome was simply not justified, it would nevertheless, in the end, become a serious obstacle in the quantitative appraisal of atomic-resolution electron microscopy.

VI. SPACE-CHARGE CORRECTION OF SPHERICAL ABERRATION

At the end of the war, Gabor was allowed back into the main works of BTH Rugby and began to undertake experimental as well as theoretical research. He also began to get back into the electron microscope scene, proposing, among other things, to the Metropolitan Vickers Research Laboratory a scheme for correcting spherical aberration in an electron microscope objective lens by means of electron space charge. The scheme was summarily rejected, however, by Metropolitan Vickers when it was realized that the tolerances on positioning the fine axial wire in the lens bore that produced the space charge were simply unattainable. One of the many difficulties found by subsequent intrepid researchers who tried out the idea was that the inevitable statistical fluctuations of the space charge impaired the image quality, and hence the resolution of the image, compared with that from an uncorrected microscope!
VII. THE PROJECTION SHADOW MICROSCOPE: AN UNRECOGNIZED TOOL FOR HOLOGRAPHY

The AEG Company in Berlin, the chief rivals of Siemens und Halske, who promoted the magnetic TEM, had great hopes for the electrostatic electron microscope, because of its conceptual elegance and apparent simplicity. For example, electrostatic lenses were said to be achromatic against variations of accelerating voltage; this is true at low accelerating voltages, but not in the relativistic region above 30 kV. This advantage turned out later to be offset by serious technical drawbacks, such as unwanted voltage flashover between lens electrodes, which limited the accelerating voltage to around 80 kV, too low for materials science specimens. This soon proved commercially fatal, forcing AEG and others to withdraw from the market. Their work was not entirely wasted, however, since it provided Gabor with some of the essential ideas and even the unrecognized tools of holography. One of these was the "electron shadow microscope" of Boersch.
VIII. ABBE’S THEORY OF THE MICROSCOPE AND THE TEM
In 1936, as Otto Scherzer was publishing his paper on spherical aberration, his gifted colleague Hans Boersch, working in the same laboratory, began a remarkable research program to see if Abbe's wave theory of the optical microscope could be applied to the electron microscope. Abbe's theory supposes that the specimen is illuminated with coherent light, i.e., from a small source, capable of producing diffraction effects. On passing through the specimen, the (carrier) wave is encoded (modulated) in amplitude and phase with information about the specimen and displayed as a diffraction pattern (Fourier transform) in the back focal plane of the objective. The wavelets from this diffraction pattern then proceed to interfere at the image plane, producing the familiar image that the eye can readily interpret. Boersch was able to show, to everyone's surprise, including his own, that this is precisely how the TEM works! It is not easy to produce a coherent illuminating beam, and hence the striking edge-diffraction effects, in a standard TEM; but by 1939, Boersch had invented and built a projection (shadow) electron microscope (Boersch, 1939) that used demagnifying electrostatic lenses to create a point-like (spatially coherent) electron source. A thin specimen was placed a short distance away from the source, and so a magnified "shadow projection image" of the specimen was obtained on a distant fluorescent screen or photographic plate (see Fig. 4, later, for a schematic diagram). This microscope, which had no lenses between specimen and image, showed many interesting wave optical effects, such as Fresnel diffraction, never seen before in an electron microscope. A year later, Boersch (1940) published images from his shadow microscope showing Fresnel fringes surrounding the edges of the specimen, as shown in Fig. 3. These fringes arise from interference between the direct wave from the point source and the electron wave scattered from the sharp edge. Hillier (1940) in the U.S.A. also produced such fringes in the same year. These fringes, without Gabor being aware of it at the time of his 1942 lecture, were later to form the basis of his invention of holography. In fact, for many years, Gabor stubbornly refused to accept that such fringes were Fresnel fringes, believing them to be energy loss fringes caused by plasmon losses in the specimen. In the first edition of The Electron Microscope, he strongly attacked Boersch and especially Hillier for making such "unsubstantiated" claims. Later, in 1947, Gabor was forced to recant and apologize to Hillier and Boersch in the preface of the second edition (1948), after he was privately given incontrovertible proof by Hillier and Ramberg (1947) of the true nature of these fringes.
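The scale of such fringes follows from elementary Fresnel diffraction (a standard result, added here for orientation, not part of the original text): for a point source a distance a from the edge and the observation plane a further distance b beyond it, the nth fringe from the geometrical shadow edge lies at a distance proportional to √(nλz), with the reduced distance z = ab/(a + b), the whole pattern being magnified by (a + b)/a at the screen. The minute electron wavelength makes the fringes extremely fine, which is why a point-like, spatially coherent source is essential for seeing them.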
[Figure 3 labels: observation plane; geometrical shadow edge]

FIGURE 3. Boersch's Fresnel diffraction patterns in the electron shadow projection microscope. The first electron "hologram" (Boersch, 1940). (Courtesy of the late Prof. H. Boersch.)
IX. ELECTRON BEAM HOLOGRAPHY

In my opinion, the shock produced by the fatherly advice from Hillier triggered off in Gabor's mind the incredible idea of holography. Gabor suddenly realized that the so-called "shadow image" was a terrible misnomer. These records were Gabor "interferograms," recording not just a shadow of the specimen but, more importantly, the actual wave leaving the specimen, in amplitude and phase. It was Gabor's genius to call such a micrograph a "hologram," establishing the term as his personal trademark. It was an even greater stroke of genius to recognize that if the electron hologram, in the absence of the specimen, were illuminated by a coherent beam of light, of any wavelength, one would see the original object, but afflicted with the inevitable spherical aberration of the electron lens. This didn't matter, however, since the light wave could now be corrected by an optical system. Gabor claimed that this "revelation" or "vision" came to him suddenly and unexpectedly during Easter 1947. The background to Gabor's Easter "vision" was as follows. Easter Monday, the traditional Bank Holiday, fell on 15 April in 1947, and Gabor was still employed in the BTH Research
Laboratory. He and his wife decided to have a game of tennis. As they sat together on a bench in the BTH sports area, waiting patiently for their turn, Gabor related that he was suddenly struck by a "vision" in which "instantly and without any effort on my part, the complete solution of the aberration correction problem was presented to me." The vision, according to Gabor, ran as follows:

Take an electron image that contains the whole information (i.e., amplitude and phase) and then correct it by optical means. To record both amplitude and phase, a coherent background must be supplied by the electron source. The result will be an interferogram. Photograph this and then illuminate the interferogram with coherent optical light and record the reconstructed image on a photographic plate.
Any aberrations in the resulting image, such as defocusing, astigmatism, or spherical aberration, can then be corrected optically, since the interferogram records the wavefront of the original electron wave. It followed as a corollary that Boersch in 1940 had published the first electron holograms with the electron shadow microscope, without, of course, realizing their significance. Hillier, following Boersch's idea, had already made an electron projection microscope, but with magnetic lenses, and had presumably also unwittingly recorded holograms. Furthermore, Boersch (1938) and Bragg (1942) had built optical image reconstruction equipment, inherently capable of carrying out a Fourier transform on a diffraction pattern or hologram. Gabor could suddenly comprehend all this, but even he found it difficult to convey these new ideas to practicing electron microscopists, who were still thinking on traditional particle-optics lines. He also found himself under extreme self-imposed pressure to try out these incredible ideas in electron microscopy in practice. At this point, it should be mentioned that although the Abbe theory of imaging had been a powerful stimulus to scientists such as Boersch and Bragg, they were not strictly the first in the field with the idea of two-stage imaging. The idea had, in fact, been previously published by Wolfke (1920) in Germany, who made the interesting suggestion of a "two-wavelength X-ray microscope." The X-ray diffraction pattern recorded on film would serve as Abbe's primary image, and an optical system would be used to form the secondary image with light. Wolfke's paper was not well documented, and he did not give any experimental results, so his work was soon forgotten. The chief difficulty, of course, is how to record the phase; a photographic plate records intensity, and so the phase is lost. Gabor knew of Boersch's work, but was more impressed by Bragg's results, because Bragg produced
credible images from known crystals. Gabor now realized that the "hologram" contained amplitude and phase information, and so the wave leaving the specimen could be reconstructed optically and the aberrations corrected without prior knowledge of the phases. This was the real breakthrough, without any precedent. In a BTH Research Report of January 1948, Gabor explained to the management:

By Huygens' principle, if an object is illuminated by a coherent wave, the full information on the modifications which the wave has suffered in traversing the object is contained in every wavefront, or in any surface traversed by the wavefront. But the information is in an "unfamiliar," and even partly inaccessible form; partly in the form of amplitudes, partly in the form of phases. A photographic plate will only record the amplitudes, that is to say, only one half of the information. An "image" can only be obtained in certain planes of an optical system, where the phase relations are at least approximately restored. . . . The basic idea of the new instrument is to record the phases of the modified wavefront, by comparing them with the phases of the unmodified wavefront. . . .
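In modern notation (a compact restatement added here, not Gabor's own formulas), the principle is this: if the object wave U_o = |U_o| e^{iφ_o} is superposed on a coherent reference wave U_r = |U_r| e^{iφ_r}, the plate records

    I = |U_r + U_o|² = |U_r|² + |U_o|² + 2|U_r||U_o| cos(φ_o − φ_r),

so the otherwise invisible phase φ_o survives in the cosine cross term. Re-illuminating the developed plate with the reference wave alone regenerates terms proportional to U_o (the reconstructed object wave) and to its complex conjugate U_o* (the unwanted "twin" image that was to plague the in-line scheme).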
This astonishing idea was truly the intellectual property of Gabor, and Gabor alone. His genius was to call this record a "hologram"; at first he had no idea of the astonishing universality of this idea, which would later become apparent for all forms of radiation. Figure 4, taken from Gabor's (1949) comprehensive Royal Society paper, shows schematically the proposed design for the "new instrument," or "diffraction microscope," as Gabor sometimes called it. The two component microscopes proposed were certainly not new. The electron microscope for taking the hologram, as shown in the figure, was identical with Boersch's shadow electron microscope, even to the point of being equipped with electrostatic lenses, shortly to be abandoned in commercial electron microscopes. The optical "reconstruction" microscope followed closely Bragg's "X-ray microscope." The only new instrumental idea, and it was a startling one, was the hologram itself! The accompanying ray diagrams indicate the way in which the hologram reconstruction optics produce two images of the original specimen, a real and a virtual one, axially spaced from each other but unfortunately in line with one another. This later turned out to be the Achilles heel of this otherwise brilliant system. However, Gabor had really thought the whole thing through and stated quite clearly that one really needed a separate, but coherent, reference wave inclined at an angle to the beam illuminating the specimen. At the time, no suitable coherent electron optical beam splitter was available, so he had to compromise by using the same beam for both purposes.
[Figure 4 labels, partly recoverable: electron system (electronic analysis); lens to enlarge hologram in ratio λ_l/λ_e; optical synthesis]

FIGURE 4. Gabor's design concept of the "diffraction electron microscope" and the "optical reconstruction" microscope (Gabor, 1949). Top: electron projection shadow microscope, after Boersch (1939). Bottom: light microscope for viewing the two optical images produced by the electron hologram; λ_e, electron wavelength; λ_l, light-optical wavelength. (Courtesy of the Royal Society.)
It was almost uncanny that all future innovations by both optical and electron optical experts, leading to the removal of these shortcomings and the subsequent stunning practical success of holography, did not involve any principles not foreshadowed in Gabor's paper; and so no one was to share the Nobel Prize with him for holography. In the extensive theory that Gabor gave in his 1949 paper, he indicated that in this "in-line" holography it would unfortunately be impossible to separate the wanted and unwanted images completely, and this might cause image artifacts, which he hoped would not be too severe. He also showed that when the chosen image was corrected optically for spherical aberration, the unwanted interfering image would then have its aberration doubled, perhaps giving rise to further artifacts, depending on the nature of the specimen. Preliminary light optical reconstructions carried out at Rugby on optical holograms, in the absence of a laser (it hadn't yet been invented!), were reproduced in the paper; these indeed exhibited such artifacts, but were inconclusive for an evaluation of the possibility of electron beam holography.
X. ELECTRON-OPTICAL TRIALS AT THE AEI RESEARCH LABORATORY

The electron optical trials of the feasibility of electron in-line holography were carried out shortly afterwards, at the AEI Research Laboratory at Aldermaston, Berkshire, whose Director, T. E. Allibone, was well acquainted
with Gabor's work. Allibone set up a small team at the laboratory under the stimulating leadership of M. E. Haine, with J. Dyson, J. Wakefield, and T. Mulvey, together with strong electronics backup from M. W. Jervis. Haine and Mulvey modified an experimental 100 kV TEM to operate as a Boersch/Hillier shadow microscope, with magnetic lenses. Holograms of modest potential resolution were quickly obtained and reconstructed in an optical bench setup, designed and built by Dyson and Wakefield, in which variable amounts of spherical aberration could be introduced so as to correct the aberration in the hologram. It was soon found that the magnetic shadow microscope has a very limited field of view, caused by chromatic difference of magnification (cf. Haine and Mulvey, 1952). This would have put excessively tight limits on the permissible instability of the accelerating voltage. In addition, extrapolation of the results showed that as atomic resolution was approached, the exposure time would increase by a factor of about 10 thousand! This would certainly call for a field emission gun, which was not yet available in electron microscopy, and in any case a much more stable accelerating voltage. A breakthrough in the team's thinking occurred in 1950, when Haine and Dyson (1950) convinced each other one afternoon that we were wasting our time using a Boersch/Hillier shadow microscope, since, given a coherent source, out-of-focus images in the TEM were also "holograms" and didn't suffer from limitations of field of view! It was easy to convince the other members of the team of the truth of this proposition, but Gabor put up some stiff resistance to the idea at first, thinking that it would not be possible to reconstruct such holograms. It didn't take long for him to change his mind after he had done a few calculations. This startling idea of Haine and Dyson in fact set the scene for all future developments of holography in the TEM, and also revealed the very general nature of electron beam holography and, in particular, the amazingly complex imaging modes of the TEM. Thus, in a focal series taken with coherent illumination, in the presence of spherical aberration, there is no unique "focus"; all the images may be regarded as holograms, from which the original object may be recovered. The TEM holograms were also valuable in other ways, as the hologram itself would enable one to measure precisely the instrumental performance and resolution at the time of taking the hologram. In order to improve the spatial coherence of the TEM by at least an order of magnitude, a copper aperture, adjustable down to a few micrometers in diameter, was placed just below the electron gun. In addition, Haine and Jervis designed and constructed a temperature-stabilized magnetic sector electron velocity analyzer to improve the long-term stability of the accelerating voltage.
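The modern way of putting the Haine-Dyson insight (a summary in present-day notation, not the team's own formulation) is through the wave aberration function of the objective lens: a spatial frequency q in the object wave reaches the image carrying an extra phase

    χ(q) = π λ z q² + (π/2) C_s λ³ q⁴,

where z is the defocus. Changing focus merely changes this known phase factor, so every member of a coherent focal series is an equally valid interference record of the object wave, that is, a hologram from which the wave can in principle be recovered.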
FIGURE 5. Left: The first high-information-limit Gabor "in-line" TEM hologram. Information limit better than 1 nm. Specimen: zinc oxide crystals. Width of field approximately 100 nm. Hologram obtained in 1951 by M. E. Haine and T. Mulvey at the AEI Research Laboratory, Aldermaston, U.K. Right: Optical reconstruction obtained by J. Dyson and J. Wakefield at the same laboratory. Such in-line holograms could not be reconstructed at their information limit because of the unavoidable image artifacts due to the unwanted in-line "second image," foreseen in Gabor's theory, which in practice overshadowed the benefits of being able to correct the spherical aberration (cf. Haine and Mulvey, 1952). (Courtesy of the Journal of the Optical Society of America.)
Using these new ideas and improved techniques, it was possible to record holograms with a well-defined information limit (potential resolution) down to 0.5 nm (5 a.u.), as shown in Fig. 5 (cf. Haine and Mulvey, 1952). However, it proved impossible to reconstruct them optically to anywhere near the information limit of the hologram. It soon became clear that this was due to the image artifact predicted by Gabor, arising from interference effects from the second (unwanted) image. This was particularly apparent at high resolution, since it introduced spurious detail more disturbing than that caused by the spherical aberration of the uncorrected objective lens. Correcting the spherical aberration therefore did not help. From a TEM manufacturer's point of view, it was clear that in-line holography was a splendid diagnostic tool for the quantitative assessment of the imaging properties of a TEM, but was not, at least for the time being, a viable method of improving its resolution. With this diagnostic tool, it also became clear that the resolution of the TEM could still be appreciably improved by better electron optical design
and increased accelerating voltage. Holographic electron microscopy, as such, was therefore put on hold until an improved method of taking the hologram emerged. This was clearly a disappointment for Gabor, as it put his method on a par with that of Scherzer, inasmuch as both methods could correct spherical aberration but could not improve resolving power, because of unwanted artifacts.
XI. ARTIFACT-FREE HOLOGRAPHY

The final technical breakthroughs to artifact-free optical and electron holography were to take many years. The invention of the laser greatly helped the optical aspects, but not those of electron microscopy. The first advance occurred in optics in 1962, when Leith and Upatnieks (1962) hit on the idea of using a Fresnel biprism to separate the reference beam from that passing through the specimen. Their use of a laser source greatly increased the intensity and coherence available. These measures completely solved the artifact problem in optics and produced really stunning 3-D holographic reconstructions, opening many new fields in optical processing. Gabor was awarded the Nobel Prize in Physics in 1971. Sadly, Gabor did not live to see the culmination of his Easter vision about electron microscopy; he died in 1979. The practical achievement of electron beam holography as originally envisaged by Gabor, i.e., atomic resolution in a TEM from a hologram corrected for spherical aberration, was to take place some 10 years after his death and nearly 40 years after his original Easter vision in Rugby. This came about from the invention, almost by accident, in 1945 of the Fresnel electron biprism by Moellenstedt at the AEG Electron Microscope Laboratory in Mosbach, Germany. A comprehensive survey of these events may be found in accounts by Mulvey (1987), Moellenstedt (1991), and, in particular, Lichte (1991), at the University of Tübingen. Lichte designed a miniature electron biprism, which could be placed in the selected area aperture holder beneath the objective lens, for taking an off-axis hologram in a 100 kV Philips TEM equipped with a field emission gun. These measures eliminated the unwanted interaction between the two images from the hologram and reduced the exposure time to less than a minute. It was already known by then that a laser reconstruction of the hologram was too noisy for atomic resolution. Gabor had already suggested in 1947, before the advent of the programmable computer, that the image could be calculated from the hologram, and this was the method that Lichte was compelled to adopt, not without some difficulty in locating a powerful enough machine.
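The reason the biprism removes the twin image can be stated briefly (modern notation, added here, not in the original text): tilting the reference wave by an angle θ with respect to the object wave records

    I(x) = |U_o(x) + e^{2πiθx/λ}|² = 1 + |U_o|² + U_o(x) e^{−2πiθx/λ} + U_o*(x) e^{2πiθx/λ},

so the wanted wave U_o and its conjugate ride on carrier fringes of spatial frequency +θ/λ and −θ/λ, respectively. In the Fourier transform of the hologram the two sidebands are therefore spatially separated, and windowing one of them recovers the complete complex object wave free of the twin image; this, in essence, is the operation Lichte carried out numerically.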
As a matter of interest, he was able to obtain the use of the powerful Martinsried Institute computer during the Easter holiday of 1985, a fitting anniversary for achieving, for the first time, aberration-free atomic resolution and verifying completely all the predictions made by Gabor right from the beginning. The Electronics Group of the Institute of Physics can rightly take pleasure in the fact that they invited Gabor, at a critical time, to review the future of electron microscopy, thereby launching him on a path that led to the invention of holography and the award of the Nobel Prize.
ACKNOWLEDGMENTS

The author wishes to thank the Royal Society for permission to reproduce Fig. 4 and the Journal of the Optical Society of America for permission to reproduce Fig. 5.
REFERENCES

Allibone, T. E. (1980). Royal Society Memoir: Denis Gabor (1900-1979).
Boersch, H. (1938). Z. Tech. Phys. 19, 337.
Boersch, H. (1939). Z. Tech. Phys. 20, 346.
Boersch, H. (1940). Naturwissenschaften 28, 710.
Bragg, W. L. (1942). Nature 149, 470.
Busch, H. (1922). Physikal. Z. 23, 438.
Busch, H. (1927). Arch. Elektrotech. 18, 583.
Gabor, D. (1927). "Kathodenoszillograph." Berlin Thesis. Julius Springer, Berlin.
Gabor, D. (1946). "The Electron Microscope." Hulton Press, London (1st ed. 1946; 2nd ed. 1948, published by Electronic Engineering, London).
Gabor, D. (1949). Proc. R. Soc. A 197, 454.
Haine, M. E., and Dyson, J. (1950). Nature (London) 166, 315.
Haine, M. E., and Mulvey, T. (1952). J. Opt. Soc. Am. 42, 763.
Hillier, J. (1940). Phys. Rev. 58, 842.
Hillier, J., and Ramberg, E. G. (1947). J. Appl. Phys. 18, 48.
Knoll, M., and Ruska, E. (1932). Ann. Phys. 12, 607, 641.
Leith, E. N., and Upatnieks, J. (1962). J. Opt. Soc. Am. 52, 1123.
Lichte, H. (1991). In "Advances in Optical and Electron Microscopy," Vol. 12, p. 25. Academic Press, New York.
Martin, L. C., Whelpton, R. V., and Parnum, D. H. (1937). J. Sci. Instrum. 14, 14.
Moellenstedt, G. (1991). In "Advances in Optical and Electron Microscopy," Vol. 12, p. 1. Academic Press, New York.
Mulvey, T. (1987). Optik 77, 39.
Ruska, E. (1980). "The Early Development of Electron Lenses and Electron Microscopy." Hirzel, Leipzig.
Scherzer, O. (1936). Z. Phys. 101, 593.
Wolfke, M. (1920). Phys. Z. 21, 149.
Early Techniques in Radio Astronomy

A. HEWISH

Cavendish Laboratory, Cambridge, United Kingdom
Radio astronomy in the U.K. essentially began in late February 1942, when anti-aircraft radars operating at meter wavelengths were jammed by intense radio noise emitted from the sun. It was J. S. Hey, working with the Army Operational Research Group, who identified the solar source of the noise signals and who later initiated systematic observations in the immediate postwar years. Following up Jansky's discovery of radio emission from the Milky Way in the U.S.A. in 1932, Hey also located radiation from a source in the constellation of Cygnus, which ultimately led to the identification of the first radio galaxy. Over the years, detailed studies of radio galaxies have revolutionized our picture of the Universe, and this paper outlines the development of the techniques in radio astronomy by which these advances were achieved. Astronomical radio signals are generated by a number of different mechanisms and usually have the character of electrical noise (or "hiss"), the equivalent of white light in conventional astronomy. Frequently the signals are much weaker than the front-end noise from the receiver itself, even when the best low-noise amplifiers are used, and long integration times are necessary for their detection. Small fluctuations of receiver gain are an immediate problem; this was solved by a variety of switching techniques, for example alternating between the signal and a constant noise source, so that an ac output is generated. Suitable filtering then removes the effects of slow variations of receiver gain, allowing long integration times to be achieved. A more fundamental problem concerns the necessity for exceptionally large receiving antennas, both to obtain a detectable signal and to achieve high angular resolution. The latter depends upon the ratio L/D, where L is the wavelength and D the aperture of the antenna. In the earliest work, large values of D were obtained by connecting two antennas separated by a distance D to a single receiver. The earth's rotation causes the resultant signal to vary periodically in response to the changing phase of the two components, and this device is a radio analogue of the Michelson interferometer in optical astronomy. An equivalent system, with a single antenna mounted on a cliff and utilizing Lloyd's mirror interference by reflection from the sea, was employed by Pawsey and his co-workers in Australia.
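In outline (a standard result, stated here in the author's L, D notation, not part of the original text): a point source at angle θ from the plane normal to the baseline reaches the two antennas with a path difference D sin θ, so the combined power varies as

    P(θ) ∝ 1 + cos(2πD sin θ / L),

and as the Earth's rotation carries the source across the sky the output sweeps through fringes of angular period roughly L/D, which is what sets the attainable resolution.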
It was the enhanced angular resolution of a simple east-west interferometer used by Graham-Smith that enabled the Cygnus source to be identified with a distant galaxy. An important early development was Ryle's phase-switching interferometer, in which the outputs from two separated antennas were combined alternately in phase and in anti-phase. The modulated signal corresponds only to signals which are correlated at both antennas, so that sources of angular size much larger than L/D are rejected. This is an advantage when detecting sources against the sky background, and local man-made radio interference is also greatly reduced. The first large-scale sky surveys were carried out by Ryle at Cambridge in the early 1950s, using phase-switching interferometers and antennas of considerable collecting area, economically constructed with reflecting surfaces composed of steel wires stretched between a framework of parabolic pylons (Fig. 1). A more conventional approach to achieving large collecting areas was pioneered by Lovell at Jodrell Bank, who constructed a fully steerable parabolic reflector 250 ft. in diameter which became operational in 1957. A similar reflector of diameter 100 m was completed in Germany in 1970. The latter incorporated the principle of structural homology, whereby flexure under gravity deforms the surface so that it remains parabolic, but of varying focal distance.
FIGURE 1.
This provides a more accurate surface, so that the radio telescope can be operated at much shorter wavelengths. The largest reflector yet built is the 1,000 ft. spherical bowl at Arecibo in Puerto Rico. This fixed bowl was excavated from the limestone hills, and scanning over a limited angular range is achieved by movement of the receiving antenna at the focus. Another means of constructing partially steerable reflectors of large area was that of Kraus, in the U.S.A., who used a tiltable flat surface to direct radiation horizontally into a second focusing reflector having a focal point at ground level. Massive radio telescopes adopting similar ideas were later constructed in France and Russia. By the late 1950s it was evident that the high angular resolution needed to study radio galaxies in detail could not be obtained by conventional methods. Apertures of at least one mile were required. One solution was the unfilled aperture technique adopted in the Mills Cross telescope built in Australia in 1966. Two narrow cylindrical parabolic reflectors, each one mile long and arranged as an east-west, north-south cross, were connected to a correlating receiver, giving an output corresponding to the product of the voltage polar diagrams of the separate antennas. Combining the intersecting fan beams in this way provides a pencil beam of high angular resolution. The rapid development of electronic computers in the 1960s was exploited by Ryle in a still more radical approach known as aperture synthesis, for which he was awarded the Nobel Prize in 1974 (Fig. 2). Steerable antennas of modest size are arranged as correlation interferometers, producing outputs corresponding to components in the two-dimensional Fourier transform of the intensity distribution across the field of view. Inverse transformation of the stored data from interferometer pairs on different baselines, usually employing the Earth's rotation to vary their position angle in relation to the sky, ultimately yields an image of angular resolution L/D, where D is the maximum baseline employed. Knowledge of the phase of the signals is necessary for Fourier inversion of the image, so all the receivers in use at one time require a common local oscillator. The flexibility of aperture synthesis methods, and the power of computer techniques for image analysis and image enhancement where sampling of the Fourier transform by the interferometer pairs is incomplete, produced a revolution in high-resolution radio astronomy.
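In the small-field approximation now standard (a compact summary added here, not the author's text), each interferometer pair with baseline (u, v), measured in wavelengths, records one visibility

    V(u, v) = ∬ I(l, m) e^{−2πi(ul + vm)} dl dm,

where I(l, m) is the sky brightness in direction cosines (l, m); accumulating (u, v) samples as the Earth rotates and inverting the transform yields the image, with angular resolution of order L/D for the longest baseline D.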
FIGURE 2.
The 5-km telescope designed by Ryle and completed in 1972 incorporated eight parabolic reflectors, four of which could be moved along rails to provide variable baselines (Fig. 3). This arrangement provided 16 Fourier components simultaneously, and the high-resolution images obtained gave the first clear evidence that the dumbbell structure of many radio galaxies was caused by beams of energy emitted from active nuclei in the central regions. Some years later came the Very Large Array in the U.S.A., consisting of 27 reflectors arranged in a "Y" configuration, each arm being 13 miles long.
FIGURE 3.
This provides more than 300 Fourier components at the same time. The largest baselines so far employed in aperture synthesis are those of the MERLIN array operated by Jodrell Bank, which extend to nearly 200 kilometers. Direct cable connections between the antennas are impractical at this separation, so radio links with calibrated phase-paths are used. The highest angular resolutions in radio astronomy have been obtained by interferometers spanning intercontinental baselines using the VLBI technique. At this separation, radio links to maintain the phase of interferometer signals cannot be used. Independent local oscillators of high stability provide phase-coherent signals at each antenna, which are recorded and correlated later to provide interferometric data. Fourier inversion to generate the image is no longer straightforward, as the true phase of the components has been lost, but image-analysis techniques using relative phases in the phase closure method supply a partial solution to this difficulty. VLBI methods were initiated by Hanbury Brown and Palmer at Jodrell Bank in the late 1950s and early 1960s. It is noteworthy that radio telescopes operating at wavelengths typically 10⁵ times longer than optical wavelengths now provide images of substantially higher resolution than those available from the largest ground-based optical telescopes. This is due to atmospheric turbulence, which degrades the performance of ground-based optical telescopes. An outstanding challenge for the future is the imaging of very small thermal variations in the microwave background radiation, which will provide clues about the origin of galaxies in the early universe.
An outstanding challenge for the future is the imaging of very small thermal variations in the microwave background radiation, which will provide clues about the origin of galaxies in the early universe. This requires the detection of changes of a few parts in a million of the 2.7 kelvin background temperature on angular scales exceeding several arcminutes, and poses new technical problems. It seems likely that aperture synthesis arrays of small overall size, but operating over very wide bandwidths at a variety of frequencies, will provide the best solution.
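To put that sensitivity in perspective, "a few parts in a million" of 2.7 K is only a few microkelvin. A one-line check, taking three parts per million as an assumed stand-in for "a few":

```python
# Back-of-envelope sensitivity target for microwave background mapping
# (the 3e-6 fraction is an assumed reading of "a few parts in a million").
t_background = 2.7                 # kelvin
delta_t = 3e-6 * t_background      # required temperature sensitivity
print(f"~{delta_t * 1e6:.0f} microkelvin")  # about 8 microkelvin
```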
Index
A
A- and G-divergence, 40
  M-dimensional generalization, 79-82
Abbe's wave theory, 276
Aberration theory, eikonal functions, 1-35
Active matrix addressed display, flat panel technology, 248-251
Additive property, 108
AEG Company, 262, 274
AEI Research Laboratory, 269, 279-282
Aharonov-Bohm effect, mesoscopic physics, 215, 218-219
Aliasing, 244
Alloy transistor, 144
Aluminum antimonide, 175-176
Amplifier
  history, 153-156, 194
  traveling-wave tube amplifier, 194
Analog system, 195
Approximately analytical method, 3
Arithmetic-geometric mean divergence, 38
Astronomy, radioastronomy, 285-290
AT&T, semiconductor history, 149, 168, 198
Australia, radioastronomy, 287

B
Bell Laboratories, semiconductor history, 143, 184, 193-195, 198, 200, 202
Bhattacharyya's coefficient, 43
Bhattacharyya's distance, 43, 125
  symmetrized Chernoff measure, 128-129
  unified (r,s)-J-divergence measure, 131-132
Bivariate entropy, unified (r,s), 95-96, 98-102
Blue Jay missile, 174
British Thompson Houston Co., 267, 268
Broadcasting, 207-209
Bugstore, 242
Burroughs, 245
Büttiker theory, 215, 216, 219, 227
  quantized Hall effect, 223-224

C
Calligraphic systems, 235-236, 237, 242
Canonical aberration theory, eikonal functions, 1-35
Canonical equation, 13
Canonical expansion, of eikonals, 16-28, 34
Cathode ray tube, 231-233, 235-236
  computers, 235-236
  flat screen, 236
  monoscope, 240
  projection displays, 253
  rear-port tube, 236
  storage tubes, 237-238
  vacuum fluorescent displays, 236
CCITT, see International Telephone and Telegraph Consultative Committee
Chernoff measure
  generalized, 125
  symmetrized
    Bhattacharyya's distance, 128-129
    probability of error, 126-128
CMOS, 150
Coaxial cable, 193-194
Combined aberrations, to ninth-order approximations, 30-33, 34
Communications, see Telecommunications
Computer-controlled switching, 164, 201-204
Computer printers, ink-jet printer, 237
Computers
  frame stores, 242, 244
  graphic displays, 234-236, 237, 244
  radioastronomy, 287
  stereo images, 254
  virtual reality, 255
Concavity, 38, 65-69
Conditional entropy, unified (r,s), 95-107, 121-123
Conductance, universal conductance fluctuations, 216-218
Convexity, 46-48, 62-70
  majorization, 47
  in pairs, 68-70
  pseudoconvexity, 47, 66-67
  quasiconvexity, 47, 67
  Schur-convexity, 48, 67-68, 70
Copper oxide, 142
Cossor Electronics, 234, 238-239
Csiszár's information matrix, 117, 120
Csiszár's φ-divergence, 41, 116
D
Differential algebraic method, 3
Diffraction electron microscope, 278-279
Diffused transistor, 145
Digital integrator, 152
Digital revolution, telecommunications, 195-197
Digital system, 195
Discrete transistor, 146
Displacement meter, 175
Divergence measures, 37-38, 41
  A- and G-divergence, 40
    M-dimensional generalization, 79-82
  arithmetic-geometric mean divergence, 38
  I-divergence, 40
  J-divergence, 37, 113
    Bhattacharyya's distance, 131-132
    M-dimensional generalization, 40, 71-79
  Jensen difference divergence, 38
  M-dimensional case, 38-40, 76-82
  T-divergence, 40
    M-dimensional generalization, 79-82
Doping, 144, 185-186
E
Edge states, mesoscopic physics, 223-224
EG&G oscilloscope, 234
Eighth-order Hamiltonian function, power-series expansion, 9-10, 13
Eikonal aberration theory, 1, 2
Eikonal E4, 16-17, 27, 32
Eikonal E6, 17-19, 27, 32-33
Eikonal E8, 19-22, 27-28, 33
Eikonal E10, 22-26, 28, 33
Eikonal functions, power-series expansion, 13-35
Electrochemical transistor, 144
Electrochromic display, flat panel technology, 251-252
Electroluminescence, pulsed dc, 245
Electromechanical switching system, 192, 201-202
Electron, trajectory equation, 4, 13
Electron beam holography, see Holography
Electron biprism, 282
Electronic displays, 231-256
  cathode ray tube, 231-233, 235-238, 240, 253
  flat panel technology, 244-252
    light controller displays, 246-247
    light emitters, 245-246
    liquid crystals with memory, 247-252
  frame store, 242, 244
  graphic display, 234-236, 237, 244
  oscilloscope, 233-234
  projection display, 252-253
  storage technology, 237-238
  three-dimensional display, 252-253
  vacuum fluorescent display, 236
  virtual reality, 208-209, 244, 255-256
  visual display unit, 238-242, 243
Electronic newspaper, 209
Electronic switching system, 202
Electron microscope, 259-283
  diffraction electron microscope, 278-279
  electrostatic electron microscope, 274, 278
  transmission electron microscope, 260-274, 280-282
Electron optics, canonical aberration theory, to ultrahigh-order approximations, 1-35
Electrophoretic display, flat panel technology, 252
Electrostatic electron microscope, 274, 278
Electrostatic potential, canonical aberration theory, to tenth-order approximation, 5
England, see United Kingdom
Entropy, 37
  Shannon's entropy, 37, 38, 96
  unified (r,s)-entropy, 41-42
    bivariate, 95-96, 98-102
    conditional, 95-107, 121-123
    multivariate, 95, 102-107
    properties, 73-75
Exchange switch system, 201-204
F
Facsimile machine, 204
Fairchild, semiconductor history, 145, 149, 150
Ferroelectric liquid crystal, 248
FET, see Field effect transistor
Field effect transistor, 149-150, 216
  vacuum fluorescent displays, 236
Fisher measure of information, unified (r,s)-divergence, 41, 115-120
  Csiszár's φ-divergence, 116-117
Flat panel technology, 244-252
  active matrix addressed display, 248-251
  electrochromic display, 251-252
  electrophoretic display, 252
  light controller display, 246-247
  light emitter, 245-246
  liquid crystal, 246-252
  polymer dispersed liquid crystal film, 248
  subtractive display, 246-247
Flat screen, 236
Fourth-order Hamiltonian function, power-series expansion, 8, 13
Frame store, 242, 244
G
Gabor, Dennis, 259-283
Gallium arsenide, 175-184, 216
Gallium phosphide, 179
Galvanomagnetic devices, 174-175
Gas laser, 179
Gaussian trajectory, in Hamiltonian representation, 13-15, 30
G.E.C. Plessey Semiconductors, 150, 168, 223
Generalized Chernoff measure, 125
Generalized information measures, 37-41
  unified (r,s)-information measures, 41-132
Generalized probability density function, 112
Germanium, semiconductor history, 143, 144, 149, 171, 173
Glass, optical fiber technology, 199-201
Graphic display, 234-236, 237, 244
Grayscale picture, 242, 244, 252
Great Britain, see United Kingdom
Group III-V compounds, semiconductors, 171-188
Guided missiles, 149, 174
H
Hall effect, 175
  anomalous, mesoscopic devices, 224
  quantized, 223-224
Hamiltonian equation, 4
Hamiltonian functions
  power-series expansions, 8-13
  up to tenth-order approximations, 4-13
Hamiltonian mechanics, 4
Hamiltonian representation, Gaussian trajectory, 13-15, 30
Hellinger's distance, 43
HEMT, see High electron mobility transistor
High electron mobility transistor, 186
Hölder's inequalities, 46, 94-95
Holograms, 260, 276, 278, 280
Holographic display, 254
Holography, history, 259-260, 269, 274, 276-283
Hughes Aircraft Co., 199

I
IC, see Integrated circuit
I-divergence, 40
Inaccuracy measure, 37, 38
Indium, 144
Indium antimonide, semiconductor history, 173-175, 179
Inequalities
  Hölder, 46, 94-95
  Jensen, 46, 63, 64, 82-87, 100, 106, 110, 131
  Minkowski, 46, 64, 101, 106
  Shannon-Gibbs, 38, 42, 53-57
  unified (r,s)-measures, 57-62
    M-dimensional generalization, 82-95
Information measures, 37-41
  unified (r,s), 41-132
Information radius, 37-38
  unified (r,s), 113
    M-dimensional, 76-77
Information technology, 150, 190
  impact on the future, 210-212
  unlimited database, 210
Information theory, statistical, 37-41, 110-132
Ink-jet printer, 237
Institute of Physics, Electronics Group, 139-140, 260, 269, 270
Integrated circuit, 146-169
  history, 181
  planar, 147-148, 195
  United Kingdom, 150-151, 168-169
Integrated digital network, 197
Intel, semiconductor history, 169, 195
Interferometer, radioastronomy, 286, 287, 289
International Symposium on Nanostructure Physics and Fabrication, 215
International Telephone and Telegraph Consultative Committee, 201, 204
Intrinsic aberrations, to ninth-order approximations, 28-30, 34
J
Japan, semiconductor industry, 146, 149, 168-169
J-divergence, 37, 113
  Bhattacharyya's distance, 131-132
  M-dimensional generalization, 40, 71-79
Jeffreys invariant, 37
Jensen difference divergence, 38
Jensen's inequalities, 46, 63, 64, 82-87, 100, 106, 110, 131
Jodrell Bank Observatory, 286-287, 289
Junction transistor, 143, 144
K
Kullback-Leibler's relative information, 60
L
LACES system, 239
Lagrange mechanics, 4
Large area display, 250-252, 253
Lasers
  gas laser, 179
  history, 179-180, 182, 184
  holography, 282
  projection systems, 253
  quantum-well laser, 187
Laser Scan Laboratories, 238
Lateral structures, mesoscopic devices, 213-214, 225
LED, see Light-emitting diode
Lens, magnetic, 261-262, 264-265
Lenticular screen, 254
Lie algebraic method, 2-3
Light-emitting diode, 179, 182, 246
  electronic 3-D system, 254
Light pen, 235, 244
Light valve system, 252-253
Liquid crystal, 246-248
  light valve, 252-253
Lithium tungsten trioxide, 251
Logic circuit, 149
M
Magnetic lens, 261-262, 264-265
Magnetic shadow microscope, 280
Magnetic vector potential, canonical aberration theory, to tenth-order approximation, 5
Magnetoresistance, 175
Majorization, 47
Markov chain, unified (r,s)-mutual information measures, 111-112
M-dimensional unified (r,s)-divergence measures, 38-40, 75-95
Measure of uncertainty, 37
Memory, 148, 151
MERLIN array, 289
Mesa structure, 214
Mesoscopic devices, 213-228
  Aharonov-Bohm effect, 215, 218-219
  anomalous Hall effects, 224
  magnetic phenomena, 223-224
Mesoscopic physics, 215
  one-dimensional quantum wire, 219-222
Metropolitan Vickers Electrical Co., 261
MIAS, see Multipoint multimedia conference system
Microchip, telecommunications, 151, 195, 202
Microprocessor, 151
Microscopy
  Abbe's wave theory, 276
  diffraction electron microscope, 278-279
  electrostatic electron microscope, 274, 278
  magnetic shadow microscope, 280
  projection shadow microscope, 274, 278
  transmission electron microscope, 260-274, 280-282
Microwave technology
  history, 181-182
  radioastronomy, 289-290
  radio relay, 193-194
Mills Cross telescope, 287
Minkowski's inequalities, 46, 64, 101, 106
Mixed crystals, semiconductor history, 182-183
Monoscope, 240
MOSFET, 150, 223
Multipoint multimedia conference system, 205
Multivariate entropy, unified (r,s), 95, 102-107

O
One-dimensional quantum wire, 219-222
Optical fiber technology, 179-180, 182
  communications, 199-207
  glass, 199-201
Optically transparent network, 210
Opto-electronic switching, 209
Oscillograph, 263-264
Oscilloscope, 233-234
Overdoping, 144
P
PCM, see Pulse code modulation
PDA, see Post-deflection amplification
PDLC, see Polymer dispersed liquid crystal film
Pearson's χ²-divergence, 43
Penetron tube, 235-236
Phase-switching interferometer, radioastronomy, 286
Philco, semiconductor history, 144
Philips, semiconductor history, 144, 177
Photonic switching, 209
Pipe tube, 236
Planar integrated circuit, 147-148, 195
Planar transistor, 145-146
Plasma display panel, 245-246
Plasma switching system, 250
Plessey Semiconductors, semiconductor history, 141, 147, 149, 150-166, 236
p-n junction, 143, 147
p-n-p alloy transistor, 144
Point contact rectifier, 142, 143
Polymer dispersed liquid crystal film, flat panel technology, 248
Post-deflection amplification, 236
Power-series expansions
  of eikonals, 16-28, 34
  for Hamiltonian functions, 8-13
Probability of error, unified (r,s)-divergence measures, 120-132
Projection display, 252-253
Projection shadow microscope, 274, 278
Project MAC, 234-235
Pseudoconcavity, 66-69
Pseudoconvexity, 47, 66-67
Pseudo-stereo system, 253-254
Pulse code modulation, 196
  digital transmission, 202-204
Pulsed dc electroluminescence, 245
Q
QPC, see Quantum point contact
QUADFET, see Quantum diffraction field effect transistor
Quantum diffraction field effect transistor, 226-227
Quantum mechanical effects, mesoscopic devices, 213, 214, 216-218
Quantum point contact, 219-222, 224
Quantum well, 213, 215
Quantum-well laser, 187
Quantum wire, one-dimensional, 219-222
Quasiconvexity, 47, 67
R
Radar
  flat panel technology, 245
  history, 143, 261, 268
Radioastronomy, 285-290
Raytheon, 234, 238-239
RCA, semiconductor history, 149
Rear port tube, 236
Rectifier, history, 142
Recursive formulas, to tenth-order Hamiltonian functions, 7, 33-34
Refreshed graphics, 235, 237
Relative information, 37
RELAY, 198

S
Satellite communications, 198-199
Schur-convexity, 48, 67-68, 70
Second-order Hamiltonian function, power-series expansion, 8
Selenium, 142
Semiconductor industry, 146, 149-151, 166-169
  United Kingdom, 149-151, 168-169, 177-178, 181
Semiconductor laser, 179
Semiconductors, 142-146, 149-150
  Group III-V compounds, 171-188
  history, 142-166, 171-188
  integrated circuits, 146-149
  mesoscopic devices, 213-227
  modulation-doped semiconductor heterojunction, 214
  quantum theory, 142
Shadow mask color tube, 233, 235-236
Shannon-Gibbs inequalities, 38, 42, 53-57
Shannon's entropy, 37, 38, 96
Sidewinder missile, 174
Siemens Co., semiconductor history, 172-173, 176, 177
Silicon, semiconductor history, 147, 150, 171, 173, 176, 178
Single crystals, semiconductor history, 143
Sixth-order Hamiltonian function, power-series expansion, 8-9, 13
Sketchpad system, 235
Smectic liquid crystals, 247
Spherical aberration, 259-260, 267, 270, 277, 282-283
  space-charge correction, 274
Split gate structure, 214, 220, 227
Sputnik, 198
Stationary orbit, satellite communications, 198-199
Statistical information theory, 37-41
  unified (r,s)-mutual information measures, 110-132
STD, see Subscriber trunk dialing
Stereo imaging, 253-255
Storage technology, 237-238
Storage tube, 237, 238
Strowger switch, 192, 202
Stub tuner, 226
Subscriber trunk dialing, 192
Subtractive display, flat panel technology, 246-247
Susceptibility meter, 174, 175
Switching systems, 164, 192, 201-204
  computer-controlled, 164, 201-204
  electromechanical, 192, 201-202
  electronic, 202
  exchange switch systems, 201-204
  opto-electronic, 209
  photonic, 209
  plasma, 250
  Strowger switch, 192, 202
SYNCOM satellite, 199
Symmetrized Chernoff measure, 126-129
System clock, 233-234

T
TAT 8, 200
T-divergence, 40
  M-dimensional generalization, 79-82
Tektronix, 237, 241, 250
  oscilloscope, 234
Telecommunications, 151, 164
  broadcasting, 207-209
  coaxial cable, 193-194
  computer-controlled switching, 164, 201-204
  digital revolution, 195-197
  electronic newspaper, 209
  facsimile machine, 204
  future, 205-212
  guided missiles, 149, 174
  history, 190-205
  microchip, 151, 195, 202
  microwave radio relay, 193-194
  optical fiber communications, 199-207
  satellite communications, 198-199
  teleconferencing, 150, 189-190, 205, 206, 208, 211
  teletext, 204-205, 241
  transistors, 195
  United Kingdom, 191-194, 196-205
  video library service, 207-208
  virtual reality, 208-209
Teleconferencing, 150, 189-190, 205, 206, 208, 211
Telescope, radioastronomy, 285-290
Teletext, 204-205, 241
Television, 207-209
  flat screen, 236
  frame store, 242
  history, 193-194, 196, 198
  plasma display tube, 245-246
  three-dimensional, 253, 254
  United Kingdom, 193
TELSTAR, 198, 199
TEM, see Transmission electron microscopy
Tenth-order Hamiltonian function, power-series expansion, 11-13
Texas Instruments, semiconductor history, 143, 147, 149, 169, 195
Thin-film electroluminescence, 245
Three-dimensional display, 252-253
TN cell, see Twisted nematic cell
Touch-tone dialing, 202
Transistors, 144-146
  alloy transistor, 144
  diffused transistor, 145
  discrete transistor, 146
  electrochemical transistor, 144
  field effect transistor, 149-150, 216
    vacuum fluorescent display, 236
  gallium arsenide, 180-181
  high electron mobility transistor, 186
  history, 142, 143, 178-188
  junction transistor, 143, 144
  planar transistor, 145-146
  p-n-p alloy transistor, 144
  quantum diffraction FET, 226-227
  telecommunications, 195
Transmission electron microscopy, 260-274, 280-282
Transparent network, 210
Transport theory, quantum mechanics, 215-216
Traveling-wave tube amplifier, 194
Twisted nematic cell, 246
Two-dimensional electron gas, 214, 223
U
Ultrahigh-order approximation, canonical aberration theory, 1-35
Uncertainty measure, 37
Unified (r,s)-divergence measures
  Fisher measure of information, 115-120
  probability of error, 120-132
Unified (r,s)-entropy, 41-42
  bivariate, 95-96, 98-102
  multivariate, 95, 102-107
  properties, 73-75
Unified (r,s)-inaccuracies, 41, 42
  optimization, 71-73
Unified (r,s)-information measures, 41-75
  applications, 110-132
    Fisher measure of information, 115-120
    Markov chains, 111-112
    probability of error, 120-132
  composition relations, 48-53
  convexities, 46-48, 62-70
    majorization, 47
    in pairs, 68-70
    pseudoconvexity, 47, 66-67
    quasiconvexity, 47, 67
    Schur-convexity, 48, 67-68, 70
  inequalities among, 57-62
  M-dimensional, 38-40, 75-95
  mutual information, 107-110
    Markov chains, 111-112
  Shannon-Gibbs inequalities, 42, 53-57
  unified (r,s)-entropy, 41-42
    bivariate, 95-96, 98-102
    multivariate, 95, 102-107
    properties, 73-75
  unified (r,s)-inaccuracies, 41, 42
    optimization, 71-73
Unified (r,s)-information radii, 113
  M-dimensional, 76-77
Unified (r,s)-mutual information, 107-110
  Markov chains, 111-112
Unified (r,s)-relative information, 41, 43
  Kullback-Leibler, 60
United Kingdom
  electron physics during World War II, 260-274
  integrated circuits, 150-151, 168-169
  radioastronomy, 285-290
  semiconductor industry, 149-151, 168-169, 177-178, 181
  telecommunications, 191-194, 196-205
  television, 193
United States
  radioastronomy, 287, 288
  semiconductor industry, 146, 149, 168-169
Universal conductance fluctuations, mesoscopic devices, 216-218
V
Vacuum fluorescent display, 236
VDU, see Visual display unit
Video library service, 207-208
Virtual reality, 208-209, 244, 255-256
Visual display unit, 238-242, 243