PROGRESS IN OPTICS
VOLUME XXXII
EDITORIAL ADVISORY BOARD
G. S. AGARWAL, Hyderabad, India
C. COHEN-TANNOUDJI, Paris, France
V. L. GINZBURG, Moscow, Russia
F. GORI, Rome, Italy
A. KUJAWSKI, Warsaw, Poland
J. PERINA, Olomouc, Czech Republic
R. M. SILLITTO, Edinburgh, Scotland
J. TSUJIUCHI, Chiba, Japan
H. WALTHER, Garching, Germany
PROGRESS IN OPTICS
VOLUME XXXII
EDITED BY
E. WOLF
University of Rochester, N.Y., U.S.A.
Contributors
M. I. CHARNOTSKII, V. L. GINZBURG, J. GOZANI, G. MAINFRAY, C. MANUS, B. P. PAL, V. I. TATARSKII, L. P. YAROSLAVSKY, F. T. S. YU, V. U. ZAVOROTNY
1993
NORTH-HOLLAND
AMSTERDAM · LONDON · NEW YORK · TOKYO
ELSEVIER SCIENCE PUBLISHERS B.V.
SARA BURGERHARTSTRAAT 25
P.O. BOX 211
1000 AE AMSTERDAM
THE NETHERLANDS
ISBN: 0 444 81592 9
© 1993 ELSEVIER SCIENCE PUBLISHERS B.V. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the written permission of the publisher, Elsevier Science Publishers B.V., Copyright & Permissions Department, P.O. Box 521, 1000 AM Amsterdam, The Netherlands.
Special regulations for readers in the U.S.A.: This publication has been registered with the Copyright Clearance Center Inc. (CCC), Salem, Massachusetts. Information can be obtained from the CCC about conditions under which photocopies of parts of this publication may be made in the U.S.A. All other copyright questions, including photocopying outside of the U.S.A., should be referred to the publisher, unless otherwise specified.
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein.
PRINTED ON ACID-FREE PAPER
PRINTED IN THE NETHERLANDS
CONTENTS OF PREVIOUS VOLUMES
VOLUME I (1961)
I. The Modern Development of Hamiltonian Optics, R. J. PEGIS, 1-29
II. Wave Optics and Geometrical Optics in Optical Design, K. MIYAMOTO, 31-66
III. The Intensity Distribution and Total Illumination of Aberration-Free Diffraction Images, R. BARAKAT, 67-108
IV. Light and Information, D. GABOR, 109-153
V. On Basic Analogies and Principal Differences between Optical and Electronic Information, H. WOLTER, 155-210
VI. Interference Color, H. KUBOTA, 211-251
VII. Dynamic Characteristics of Visual Processes, A. FIORENTINI, 253-288
VIII. Modern Alignment Devices, A. C. S. VAN HEEL, 289-329
VOLUME II (1963)
I. Ruling, Testing and Use of Optical Gratings for High-Resolution Spectroscopy, G. W. STROKE, 1-72
II. The Metrological Applications of Diffraction Gratings, J. M. BURCH, 73-108
III. Diffusion Through Non-Uniform Media, R. G. GIOVANELLI, 109-129
IV. Correction of Optical Images by Compensation of Aberrations and by Spatial Frequency Filtering, J. TSUJIUCHI, 131-180
V. Fluctuations of Light Beams, L. MANDEL, 181-248
VI. Methods for Determining Optical Parameters of Thin Films, F. ABELÈS, 249-288
VOLUME III (1964)
I. The Elements of Radiative Transfer, F. KOTTLER, 1-28
II. Apodisation, P. JACQUINOT, B. ROIZEN-DOSSIER, 29-186
III. Matrix Treatment of Partial Coherence, H. GAMO, 187-332
VOLUME IV (1965)
I. Higher Order Aberration Theory, J. FOCKE, 1-36
II. Applications of Shearing Interferometry, O. BRYNGDAHL, 37-83
III. Surface Deterioration of Optical Glasses, K. KINOSITA, 85-143
IV. Optical Constants of Thin Films, P. ROUARD, P. BOUSQUET, 145-197
V. The Miyamoto-Wolf Diffraction Wave, A. RUBINOWICZ, 199-240
VI. Aberration Theory of Gratings and Grating Mountings, W. T. WELFORD, 241-280
VII. Diffraction at a Black Screen, Part I: Kirchhoff's Theory, F. KOTTLER, 281-314
VOLUME V (1966)
I. Optical Pumping, C. COHEN-TANNOUDJI, A. KASTLER, 1-81
II. Non-Linear Optics, P. S. PERSHAN, 83-144
III. Two-Beam Interferometry, W. H. STEEL, 145-197
IV. Instruments for the Measuring of Optical Transfer Functions, K. MURATA, 199-245
V. Light Reflection from Films of Continuously Varying Refractive Index, R. JACOBSSON, 247-286
VI. X-Ray Crystal-Structure Determination as a Branch of Physical Optics, H. LIPSON, C. A. TAYLOR, 287-350
VII. The Wave of a Moving Classical Electron, J. PICHT, 351-370
VOLUME VI (1967)
I. Recent Advances in Holography, E. N. LEITH, J. UPATNIEKS, 1-52
II. Scattering of Light by Rough Surfaces, P. BECKMANN, 53-69
III. Measurement of the Second Order Degree of Coherence, M. FRANÇON, S. MALLICK, 71-104
IV. Design of Zoom Lenses, K. YAMAJI, 105-170
V. Some Applications of Lasers to Interferometry, D. R. HERRIOT, 171-209
VI. Experimental Studies of Intensity Fluctuations in Lasers, J. A. ARMSTRONG, A. W. SMITH, 211-257
VII. Fourier Spectroscopy, G. A. VANASSE, H. SAKAI, 259-330
VIII. Diffraction at a Black Screen, Part II: Electromagnetic Theory, F. KOTTLER, 331-377
VOLUME VII (1969)
I. Multiple-Beam Interference and Natural Modes in Open Resonators, G. KOPPELMAN, 1-66
II. Methods of Synthesis for Dielectric Multilayer Filters, E. DELANO, R. J. PEGIS, 67-137
III. Echoes at Optical Frequencies, I. D. ABELLA, 139-168
IV. Image Formation with Partially Coherent Light, B. J. THOMPSON, 169-230
V. Quasi-Classical Theory of Laser Radiation, A. L. MIKAELIAN, M. L. TER-MIKAELIAN, 231-297
VI. The Photographic Image, S. OOUE, 299-358
VII. Interaction of Very Intense Light with Free Electrons, J. H. EBERLY, 359-415
VOLUME VIII (1970)
I. Synthetic-Aperture Optics, J. W. GOODMAN, 1-50
II. The Optical Performance of the Human Eye, G. A. FRY, 51-131
III. Light Beating Spectroscopy, H. Z. CUMMINS, H. L. SWINNEY, 133-200
IV. Multilayer Antireflection Coatings, A. MUSSET, A. THELEN, 201-237
V. Statistical Properties of Laser Light, H. RISKEN, 239-294
VI. Coherence Theory of Source-Size Compensation in Interference Microscopy, T. YAMAMOTO, 295-341
VII. Vision in Communication, L. LEVI, 343-372
VIII. Theory of Photoelectron Counting, C. L. MEHTA, 373-440
VOLUME IX (1971)
I. Gas Lasers and their Application to Precise Length Measurements, A. L. BLOOM, 1-30
II. Picosecond Laser Pulses, A. J. DEMARIA, 31-71
III. Optical Propagation Through the Turbulent Atmosphere, J. W. STROHBEHN, 73-122
IV. Synthesis of Optical Birefringent Networks, E. O. AMMANN, 123-177
V. Mode Locking in Gas Lasers, L. ALLEN, D. G. C. JONES, 179-234
VI. Crystal Optics with Spatial Dispersion, V. M. AGRANOVICH, V. L. GINZBURG, 235-280
VII. Applications of Optical Methods in the Diffraction Theory of Elastic Waves, K. GNIADEK, J. PETYKIEWICZ, 281-310
VIII. Evaluation, Design and Extrapolation Methods for Optical Signals, Based on Use of the Prolate Functions, B. R. FRIEDEN, 311-407
VOLUME X (1972)
I. Bandwidth Compression of Optical Images, T. S. HUANG, 1-44
II. The Use of Image Tubes as Shutters, R. W. SMITH, 45-87
III. Tools of Theoretical Quantum Optics, M. O. SCULLY, K. G. WHITNEY, 89-135
IV. Field Correctors for Astronomical Telescopes, C. G. WYNNE, 137-164
V. Optical Absorption Strength of Defects in Insulators, D. Y. SMITH, D. L. DEXTER, 165-228
VI. Elastooptic Light Modulation and Deflection, E. K. SITTIG, 229-288
VII. Quantum Detection Theory, C. W. HELSTROM, 289-369
VOLUME XI (1973)
I. Master Equation Methods in Quantum Optics, G. S. AGARWAL, 1-76
II. Recent Developments in Far Infrared Spectroscopic Techniques, H. YOSHINAGA, 77-122
III. Interaction of Light and Acoustic Surface Waves, E. G. LEAN, 123-166
IV. Evanescent Waves in Optical Imaging, O. BRYNGDAHL, 167-221
V. Production of Electron Probes Using a Field Emission Source, A. V. CREWE, 223-246
VI. Hamiltonian Theory of Beam Mode Propagation, J. A. ARNAUD, 247-304
VII. Gradient Index Lenses, E. W. MARCHAND, 305-337
VOLUME XII (1974)
I. Self-Focusing, Self-Trapping, and Self-Phase Modulation of Laser Beams, O. SVELTO, 1-51
II. Self-Induced Transparency, R. E. SLUSHER, 53-100
III. Modulation Techniques in Spectrometry, M. HARWIT, J. A. DECKER JR., 101-162
IV. Interaction of Light with Monomolecular Dye Layers, K. H. DREXHAGE, 163-232
V. The Phase Transition Concept and Coherence in Atomic Emission, R. GRAHAM, 233-286
VI. Beam-Foil Spectroscopy, S. BASHKIN, 287-344
VOLUME XIII (1976)
I. On the Validity of Kirchhoff's Law of Heat Radiation for a Body in a Nonequilibrium Environment, H. P. BALTES, 1-25
II. The Case For and Against Semiclassical Radiation Theory, L. MANDEL, 27-68
III. Objective and Subjective Spherical Aberration Measurements of the Human Eye, W. M. ROSENBLUM, J. L. CHRISTENSEN, 69-91
IV. Interferometric Testing of Smooth Surfaces, G. SCHULZ, J. SCHWIDER, 93-167
V. Self-Focusing of Laser Beams in Plasmas and Semiconductors, M. S. SODHA, A. K. GHATAK, V. K. TRIPATHI, 169-265
VI. Aplanatism and Isoplanatism, W. T. WELFORD, 267-292
VOLUME XIV (1976)
I. The Statistics of Speckle Patterns, J. C. DAINTY, 1-46
II. High-Resolution Techniques in Optical Astronomy, A. LABEYRIE, 47-87
III. Relaxation Phenomena in Rare-Earth Luminescence, L. A. RISEBERG, M. J. WEBER, 89-159
IV. The Ultrafast Optical Kerr Shutter, M. A. DUGUAY, 161-193
V. Holographic Diffraction Gratings, G. SCHMAHL, D. RUDOLPH, 195-244
VI. Photoemission, P. J. VERNIER, 245-325
VII. Optical Fibre Waveguides - A Review, P. J. B. CLARRICOATS, 327-402
VOLUME XV (1977)
I. Theory of Optical Parametric Amplification and Oscillation, W. BRUNNER, H. PAUL, 1-75
II. Optical Properties of Thin Metal Films, P. ROUARD, A. MEESSEN, 77-137
III. Projection-Type Holography, T. OKOSHI, 139-185
IV. Quasi-Optical Techniques of Radio Astronomy, T. W. COLE, 187-244
V. Foundations of the Macroscopic Electromagnetic Theory of Dielectric Media, J. VAN KRANENDONK, J. E. SIPE, 245-350
VOLUME XVI (1978)
I. Laser Selective Photophysics and Photochemistry, V. S. LETOKHOV, 1-69
II. Recent Advances in Phase Profiles Generation, J. J. CLAIR, C. I. ABITBOL, 71-117
III. Computer-Generated Holograms: Techniques and Applications, W.-H. LEE, 119-232
IV. Speckle Interferometry, A. E. ENNOS, 233-288
V. Deformation Invariant, Space-Variant Optical Pattern Recognition, D. CASASENT, D. PSALTIS, 289-356
VI. Light Emission From High-Current Surface-Spark Discharges, R. E. BEVERLY III, 357-411
VII. Semiclassical Radiation Theory Within a Quantum-Mechanical Framework, I. R. SENITZKY, 413-448
VOLUME XVII (1980)
I. Heterodyne Holographic Interferometry, R. DÄNDLIKER, 1-84
II. Doppler-Free Multiphoton Spectroscopy, E. GIACOBINO, B. CAGNAC, 85-161
III. The Mutual Dependence Between Coherence Properties of Light and Nonlinear Optical Processes, M. SCHUBERT, B. WILHELMI, 163-238
IV. Michelson Stellar Interferometry, W. J. TANGO, R. Q. TWISS, 239-277
V. Self-Focusing Media with Variable Index of Refraction, A. L. MIKAELIAN, 279-345
VOLUME XVIII (1980)
I. Graded Index Optical Waveguides: A Review, A. GHATAK, K. THYAGARAJAN, 1-126
II. Photocount Statistics of Radiation Propagating Through Random and Nonlinear Media, J. PEŘINA, 127-203
III. Strong Fluctuations in Light Propagation in a Randomly Inhomogeneous Medium, V. I. TATARSKII, V. U. ZAVOROTNYI, 204-256
IV. Catastrophe Optics: Morphologies of Caustics and their Diffraction Patterns, M. V. BERRY, C. UPSTILL, 257-346
VOLUME XIX (1981)
I. Theory of Intensity Dependent Resonance Light Scattering and Resonance Fluorescence, B. R. MOLLOW, 1-43
II. Surface and Size Effects on the Light Scattering Spectra of Solids, D. L. MILLS, K. R. SUBBASWAMY, 45-137
III. Light Scattering Spectroscopy of Surface Electromagnetic Waves in Solids, S. USHIODA, 139-210
IV. Principles of Optical Data-Processing, H. J. BUTTERWECK, 211-280
V. The Effects of Atmospheric Turbulence in Optical Astronomy, F. RODDIER, 281-376
VOLUME XX (1983)
I. Some New Optical Designs for Ultra-Violet Bidimensional Detection of Astronomical Objects, G. COURTÈS, P. CRUVELLIER, M. DETAILLE, M. SAÏSSE, 1-61
II. Shaping and Analysis of Picosecond Light Pulses, C. FROEHLY, B. COLOMBEAU, M. VAMPOUILLE, 63-153
III. Multi-Photon Scattering Molecular Spectroscopy, S. KIELICH, 155-261
IV. Colour Holography, P. HARIHARAN, 263-324
V. Generation of Tunable Coherent Vacuum-Ultraviolet Radiation, W. JAMROZ, B. P. STOICHEFF, 325-380
VOLUME XXI (1984)
I. Rigorous Vector Theories of Diffraction Gratings, D. MAYSTRE, 1-67
II. Theory of Optical Bistability, L. A. LUGIATO, 69-216
III. The Radon Transform and its Applications, H. H. BARRETT, 217-286
IV. Zone Plate Coded Imaging: Theory and Applications, N. M. CEGLIO, D. W. SWEENEY, 287-354
V. Fluctuations, Instabilities and Chaos in the Laser-Driven Nonlinear Ring Cavity, J. C. ENGLUND, R. R. SNAPP, W. C. SCHIEVE, 355-428
VOLUME XXII (1985)
I. Optical and Electronic Processing of Medical Images, D. MALACARA, 1-76
II. Quantum Fluctuations in Vision, M. A. BOUMAN, W. A. VAN DE GRIND, P. ZUIDEMA, 77-144
III. Spectral and Temporal Fluctuations of Broad-Band Laser Radiation, A. V. MASALOV, 145-196
IV. Holographic Methods of Plasma Diagnostics, G. V. OSTROVSKAYA, Yu. I. OSTROVSKY, 197-270
V. Fringe Formations in Deformation and Vibration Measurements using Laser Light, I. YAMAGUCHI, 271-340
VI. Wave Propagation in Random Media: A Systems Approach, R. L. FANTE, 341-398
VOLUME XXIII (1986)
I. Analytical Techniques for Multiple Scattering from Rough Surfaces, J. A. DESANTO, G. S. BROWN, 1-62
II. Paraxial Theory in Optical Design in Terms of Gaussian Brackets, K. TANAKA, 63-111
III. Optical Films Produced by Ion-Based Techniques, P. J. MARTIN, R. P. NETTERFIELD, 113-182
IV. Electron Holography, A. TONOMURA, 183-220
V. Principles of Optical Processing with Partially Coherent Light, F. T. S. YU, 221-275
VOLUME XXIV (1987)
I. Micro Fresnel Lenses, H. NISHIHARA, T. SUHARA, 1-37
II. Dephasing-Induced Coherent Phenomena, L. ROTHBERG, 39-101
III. Interferometry with Lasers, P. HARIHARAN, 103-164
IV. Unstable Resonator Modes, K. E. OUGHSTUN, 165-387
V. Information Processing with Spatially Incoherent Light, I. GLASER, 389-509
VOLUME XXV (1988)
I. Dynamical Instabilities and Pulsations in Lasers, N. B. ABRAHAM, P. MANDEL, L. M. NARDUCCI, 1-190
II. Coherence in Semiconductor Lasers, M. OHTSU, T. TAKO, 191-278
III. Principles and Design of Optical Arrays, WANG SHAOMIN, L. RONCHI, 279-348
IV. Aspheric Surfaces, G. SCHULZ, 349-415
VOLUME XXVI (1988)
I. Photon Bunching and Antibunching, M. C. TEICH, B. E. A. SALEH, 1-104
II. Nonlinear Optics of Liquid Crystals, I. C. KHOO, 105-161
III. Single-Longitudinal-Mode Semiconductor Lasers, G. P. AGRAWAL, 163-225
IV. Rays and Caustics as Physical Objects, Yu. A. KRAVTSOV, 227-348
V. Phase-Measurement Interferometry Techniques, K. CREATH, 349-393
VOLUME XXVII (1989)
I. The Self-Imaging Phenomenon and Its Applications, K. PATORSKI, 1-108
II. Axicons and Meso-Optical Imaging Devices, L. M. SOROKO, 109-160
III. Nonimaging Optics for Flux Concentration, I. M. BASSETT, W. T. WELFORD, R. WINSTON, 161-226
IV. Nonlinear Wave Propagation in Planar Structures, D. MIHALACHE, M. BERTOLOTTI, C. SIBILIA, 227-313
V. Generalized Holography with Application to Inverse Scattering and Inverse Source Problems, R. P. PORTER, 315-397
VOLUME XXVIII (1990)
I. Digital Holography - Computer-Generated Holograms, O. BRYNGDAHL, F. WYROWSKI, 1-86
II. Quantum Mechanical Limit in Optical Precision Measurement and Communication, Y. YAMAMOTO, S. MACHIDA, S. SAITO, N. IMOTO, T. YANAGAWA, M. KITAGAWA, G. BJÖRK, 87-179
III. The Quantum Coherence Properties of Stimulated Raman Scattering, M. G. RAYMER, I. A. WALMSLEY, 181-270
IV. Advanced Evaluation Techniques in Interferometry, J. SCHWIDER, 271-359
V. Quantum Jumps, R. J. COOK, 361-416
VOLUME XXIX (1991)
I. Optical Waveguide Diffraction Gratings: Coupling between Guided Modes, D. G. HALL, 1-63
II. Enhanced Backscattering in Optics, Yu. N. BARABANENKOV, Yu. A. KRAVTSOV, V. D. OZRIN, A. I. SAICHEV, 65-197
III. Generation and Propagation of Ultrashort Optical Pulses, I. P. CHRISTOV, 199-291
IV. Triple-Correlation Imaging in Optical Astronomy, G. WEIGELT, 293-319
V. Nonlinear Optics in Composite Materials. I. Semiconductor and Metal Crystallites in Dielectrics, C. FLYTZANIS, F. HACHE, M. C. KLEIN, D. RICARD, PH. ROUSSIGNOL, 321-411
VOLUME XXX (1992)
I. Quantum Fluctuations in Optical Systems, S. REYNAUD, A. HEIDMANN, E. GIACOBINO, C. FABRE, 1-85
II. Correlation Holographic and Speckle Interferometry, Yu. I. OSTROVSKY, V. P. SHCHEPINOV, 87-135
III. Localization of Waves in Media with One-Dimensional Disorder, V. D. FREILIKHER, S. A. GREDESKUL, 137-203
IV. Theoretical Foundation of Optical-Soliton Concept in Fibers, Y. KODAMA, A. HASEGAWA, 205-259
V. Cavity Quantum Optics and the Quantum Measurement Process, P. MEYSTRE, 261-355
VOLUME XXXI (1993)
I. Atoms in Strong Fields: Photoionization and Chaos, P. W. MILONNI, B. SUNDARAM, 1-137
II. Light Diffraction by Relief Gratings: A Macroscopic and Microscopic View, E. POPOV, 139-187
III. Optical Amplifiers, N. K. DUTTA, J. R. SIMPSON, 189-226
IV. Adaptive Multilayer Optical Networks, D. PSALTIS, Y. QIAO, 227-261
V. Optical Atoms, R. J. C. SPREEUW, J. P. WOERDMAN, 263-319
VI. Theory of Compton Free Electron Lasers, G. DATTOLI, L. GIANNESSI, A. RENIERI, A. TORRE, 321-412
PREFACE
Once again we are presenting to our readers a volume containing a number of review articles on recent developments in optics and related subjects.
In the first article an account of guided wave optics on silicon is presented. This is a subject of considerable current interest in the broad field of integrated optics and is likely to influence the design and fabrication of various optical components for future use in this field of technology.
The second article gives an overview of the optical implementation of neural networks. It discusses their design, models and architecture.
The following article deals with applications of the path-integral technique to the theory of wave propagation in random media. This technique, first developed in quantum field theory, has been used with considerable success in the last two decades for solutions of problems encountered in classical statistical wave theory. The article mainly considers the use of the path-integral technique in the study of wave propagation in inhomogeneous media, for the calculation of statistical moments and for elucidating the accuracy of various approximate methods.
The next article reviews methods for obtaining information about the relative location of objects in space. It includes an analysis of the potential accuracy and reliability of object location in the presence of additive Gaussian noise and a discussion of optical filters for localization of objects under various circumstances.
The fifth article deals with the broad topic of radiation from uniformly moving sources. It includes discussions of the Vavilov-Cherenkov radiation, the Doppler effect in media, transition radiation and Bremsstrahlung. These phenomena are of particular importance in the electrodynamics of continuous media, especially in a plasma.
In the concluding article nonlinear optical processes in atoms and weakly relativistic plasmas are discussed. The emphasis is on the specific properties of laser radiation that are important for inducing multiphoton processes and on nonlinear interactions of very intense laser pulses with electrons.
The recent volumes in this series reflect a happy consequence of the political changes that have taken place in Eastern Europe in the last few years, namely a greater interaction between scientists from East European countries and Western scientists. We are fortunate to have been able to include in this volume several articles written by leading authorities in their respective fields from the former Soviet Union.
Emil Wolf
Department of Physics and Astronomy
University of Rochester
Rochester, New York 14627, USA
July 1993
CONTENTS
I. GUIDED-WAVE OPTICS ON SILICON: PHYSICS, TECHNOLOGY AND STATUS
by B. P. PAL (NEW DELHI, INDIA)
§ 1. INTRODUCTION
§ 2. PHYSICS AND ANALYSIS OF OPTICAL WAVEGUIDES
    2.1. Planar waveguides
    2.2. Power carried by a guided mode in a planar waveguide
    2.3. Waveguiding in three-dimensional structures
    2.4. Multilayer waveguides
§ 3. TECHNOLOGY OF SILICON-BASED OPTICAL WAVEGUIDES
§ 4. GUIDED-WAVE OPTICAL COMPONENTS ON SILICON
§ 5. ACTIVE WAVEGUIDES ON SILICON
§ 6. CONCLUSIONS
ACKNOWLEDGEMENTS
REFERENCES
II. OPTICAL NEURAL NETWORKS: ARCHITECTURE, DESIGN AND MODELS
by FRANCIS T. S. YU (UNIVERSITY PARK, PA, USA)
§ 1. INTRODUCTION
§ 2. OPTICAL ASSOCIATIVE MEMORY
§ 3. OPTICAL NEURAL NETWORKS
    3.1. Two-dimensional implementation
    3.2. LCTV-based optical neural networks
    3.3. Compact optical neural networks
    3.4. Mirror-array interconnected neural networks
    3.5. Optical disk-based neural networks
§ 4. NEURAL NETWORK MODELS
    4.1. Hopfield model
    4.2. Back-propagation model
    4.3. Orthogonal-projection model
    4.4. Multilevel recognition model
    4.5. Interpattern-association model
    4.6. Hetero-association model
    4.7. Space time-sharing model
§ 5. REDUNDANT INTERCONNECTION NEURAL NETWORKS
    5.1. Redundant-interconnection IPA model
    5.2. Minimum redundant IPA model
    5.3. Simulated and experimental results
§ 6. OPTICAL IMPLEMENTATION OF HAMMING NETS
    6.1. Hamming net model
    6.2. Optical implementation
    6.3. Experimental demonstrations
§ 7. INFORMATION STORAGE CAPACITY
    7.1. Upper bound
    7.2. Lower bound
    7.3. Moment-invariant neurocomputing
§ 8. SELF-ORGANIZING OPTICAL NEURAL NETWORKS
    8.1. Kohonen's feature map
    8.2. Unsupervised learning
§ 9. CONCLUSION
REFERENCES
III. THE THEORY OF OPTIMAL METHODS FOR LOCALIZATION OF OBJECTS IN PICTURES
by L. P. YAROSLAVSKY (BETHESDA, MD, USA)
§ 1. INTRODUCTION
§ 2. THE ACCURACY AND RELIABILITY OF THE LOCALIZATION OF TWO-DIMENSIONAL OBJECTS ON A PLANE
    2.1. Localization of a single object in the presence of additive white Gaussian noise: optimal localization device and two types of localization errors
    2.2. Localization of a single object in the presence of additive white Gaussian noise: potential accuracy of coordinate measurements
    2.3. Localization of a single object in the presence of additive Gaussian noise: measurement accuracy for non-optimal estimator; localization in non-white noise
    2.4. Optimal localization in color pictures
    2.5. Localization of an object in the presence of additive Gaussian noise: reliability of coordinate measurements
    2.6. Localization reliability in the presence of additive white Gaussian noise: more accurate estimation and approximation of the localization error distribution density
    2.7. Localization reliability in the presence of additive white Gaussian noise and multiple outside objects
§ 3. LOCALIZATION OF OBJECTS ON A COMPLEX BACKGROUND WITH A MINIMUM OF ANOMALOUS ERRORS
    3.1. Formulation of the problem
    3.2. Localization of an exactly known object for the spatially homogeneous optimality criterion
    3.3. Localization of inexactly known objects
    3.4. Reliable localization for spatially inhomogeneous objects
    3.5. Reliable localization in blurred pictures
    3.6. Optimal localization in multicomponent pictures with cluttered background
    3.7. Phase-only, binary phase-only, minimum average correlation energy, entropy-optimized, and other filters for optical pattern recognition; reliable localization and picture contours
    3.8. Selection of reference objects from the standpoint of localization reliability
§ 4. CONCLUSION
ACKNOWLEDGEMENTS
REFERENCES
IV. WAVE PROPAGATION THEORIES IN RANDOM MEDIA BASED ON THE PATH-INTEGRAL APPROACH
by M. I. CHARNOTSKII, J. GOZANI, V. I. TATARSKII AND V. U. ZAVOROTNY (BOULDER, CO, USA)
§ 1. INTRODUCTION
§ 2. PROBLEM FORMULATION AND GOVERNING EQUATIONS
2.1. Parabolic equation for a field in a random medium
2.2. Moment equations
2.3. Plane-wave-type fourth-moment equations
2.4. Generalization of the problem
§ 3. INTRODUCTION TO PATH INTEGRALS
3.1. Derivation of path-integral representation of the parabolic equation solution
3.2. Unconditional and conditional path integrals
3.3. The probabilistic interpretation
3.4. Phase-space path integral
§ 4. PATH-INTEGRAL REPRESENTATIONS OF WAVE FIELDS IN INHOMOGENEOUS MEDIA
4.1. Basic unconditional and conditional Feynman path-integral representations
4.2. Velocity representation for path-integral variables
4.3. Variational operator representation
4.4. Plane-wave expansion
4.5. Orthogonal expansion of paths
§ 5. PATH-INTEGRAL REPRESENTATIONS OF MOMENTS
5.1. Second-moment path-integral representations
5.2. Fourth-moment path-integral representations
5.2.1. Spherical wave expansion
5.2.2. The outgoing plane-wave expansion
5.2.3. The incoming plane-wave expansion
5.2.4. The mixed plane-wave expansion
§ 6. THE CONNECTION BETWEEN HEURISTIC APPROXIMATIONS AND PATH-INTEGRAL REPRESENTATIONS
6.1. Heuristic field approximations
6.2. Fourth-moment heuristic approximations
6.3. An orthogonal expansion of the path integral for the fourth moment
§ 7. CONCLUSIONS
ACKNOWLEDGEMENTS
REFERENCES
V. RADIATION BY UNIFORMLY MOVING SOURCES
Vavilov-Cherenkov effect, Doppler effect in a medium, transition radiation and associated phenomena
by V. L. GINZBURG (MOSCOW, RUSSIA)
§ 1. INTRODUCTION
§ 2. VAVILOV-CHERENKOV EFFECT FOR A CHARGE
§ 3. QUANTUM THEORY OF THE VAVILOV-CHERENKOV EFFECT
§ 4. VAVILOV-CHERENKOV RADIATION IN THE CASE OF MOTION IN CHANNELS AND GAPS
§ 5. VAVILOV-CHERENKOV RADIATION FOR ELECTRIC, MAGNETIC AND TOROIDAL DIPOLES
§ 6. CLASSICAL AND QUANTUM THEORIES OF THE DOPPLER EFFECT IN A MEDIUM
§ 7. ACCELERATION RADIATION
§ 8. TRANSITION RADIATION AT THE BOUNDARY BETWEEN TWO MEDIA
§ 9. TRANSITION RADIATION AS A MORE GENERAL PHENOMENON. FORMATION ZONE
§ 10. TRANSITION SCATTERING. TRANSITION BREMSSTRAHLUNG
§ 11. TRANSITION RADIATION, TRANSITION SCATTERING AND TRANSITION BREMSSTRAHLUNG IN A PLASMA
§ 12. CONCLUDING REMARKS
REFERENCES
VI. NONLINEAR OPTICAL PROCESSES IN ATOMS AND IN WEAKLY RELATIVISTIC PLASMAS
by G. MAINFRAY AND C. MANUS (GIF-SUR-YVETTE, FRANCE)
§ 1. INTRODUCTION
§ 2. LASER LIGHT POLARIZATION EFFECTS
§ 3. RESONANCE EFFECTS
§ 4. LASER TEMPORAL-COHERENCE EFFECTS IN NONRESONANT MULTIPHOTON IONIZATION OF ATOMS
4.1. General discussion
4.2. Comparison with experiments
4.2.1. The two-mode case
4.2.2. The multimode case
§ 5. LASER TEMPORAL-COHERENCE EFFECTS IN RESONANT MULTIPHOTON IONIZATION OF ATOMS
§ 6. RELATIVISTIC SELF-FOCUSING OF A LASER PULSE IN A PLASMA
6.1. Recent possibility of observing new physical effects
6.2. Self-trapping of a long laser pulse in a plasma in equilibrium with the laser field
6.3. Self-focusing and self-trapping of ultra-short laser pulses
6.3.1. Self-focusing due to relativistic effects
6.3.2. Relativistic and ponderomotive effects in self-focusing
6.3.3. Potential representation
6.3.4. Recent developments
6.4. General discussion and conclusions
REFERENCES
AUTHOR INDEX
SUBJECT INDEX
CUMULATIVE INDEX
E. WOLF, PROGRESS IN OPTICS XXXII 1993 ELSEVIER SCIENCE PUBLISHERS B.V.
I
GUIDED-WAVE OPTICS ON SILICON: PHYSICS, TECHNOLOGY, AND STATUS* BY
B. P. PAL** Electromagnetic Technology Division, National Institute of Standards and Technology, 325 Broadway, Boulder, CO 80303, USA
* This work was done for the US government and is not subject to copyright. ** Permanent address: Department of Physics, Indian Institute of Technology, New Delhi, 110016, India.
CONTENTS
§ 1. INTRODUCTION
§ 2. PHYSICS AND ANALYSIS OF OPTICAL WAVEGUIDES
§ 3. TECHNOLOGY OF SILICON-BASED OPTICAL WAVEGUIDES
§ 4. GUIDED-WAVE OPTICAL COMPONENTS ON SILICON
§ 5. ACTIVE WAVEGUIDES ON SILICON
§ 6. CONCLUSIONS
ACKNOWLEDGEMENTS
REFERENCES
§ 1. Introduction
The aim of integrated optics is to combine discrete miniature components on a common substrate by means of optical waveguides, much as integrated-circuit technology combines electronic devices on a single substrate. In fact, the term integrated optics was coined by Miller [1969] from this analogy. As an example, Miller proposed a miniature optical repeater that would use guided-wave optics to integrate an optical detector, an amplifier, and a light source on a common substrate. However, there is one essential difference between optical and electronic technologies. Integrated-optical circuits usually require a relatively large length-to-width ratio for a usable device. These are typically a few optical wavelengths in width, but several millimeters to centimeters in length (Papuchon [1986]). Moreover, monolithic integration of several optical components requires key devices like source, detector, modulator, and switch all built around a single material. To fulfill these functions simultaneously, the material must (i) possess a direct band gap, (ii) exhibit strong absorption at the wavelengths of interest, (iii) be optically active, and (iv) have a large electro-optic or acousto-optic coefficient. Since no such single material was available in the 1970s, research and development efforts concentrated on a hybrid technology instead of a monolithic technology. In the hybrid technology, several guided-wave components are built around different materials and then assembled and operated as an integral system. The term integrated optics is also often used generically, although somewhat loosely, to refer to any discrete device, such as a modulator or a directional coupler, that has been formed out of optical waveguides.
The last two decades have witnessed demonstrations of a variety of guided-wave optical devices, such as wavelength division multiplexers/demultiplexers (Aiki, Nakamura and Umeda [1977]), spectrum analyzers (Mergerian, Malarkey, Pautienus, Bradley, Marx, Hutcheson and Kellner [1980], Thylen and Stensland [1982]), analog-to-digital converters (Taylor [1978], Leonberger, Woodward and Spears [1979], Chang and Tsai [1983]), digital correlators (Verber,
Kenan and Busch [1983]), switches and modulators (Schmidt and Alferness [1979]), directional couplers (Papuchon, Combemale, Mathieu, Ostrowsky, Reiber, Roy, Sejourne and Werner [1975], Schmidt and Alferness [1979]), filters (Schmidt and Alferness [1979], Alferness and Buhl [1982]), and signal samplers (Izutsu, Haga and Sueta [1983]). These ranged from passive to active devices and were based on insulators such as LiNbO$_3$ and glass, semiconductors such as GaAs/AlGaAs and InGaAsP/InP, and thin films of silica or doped silica on silicon. Out of these, InGaAsP/InP offers a good choice for monolithic integration of several devices (Merz, Yuan and Vawter [1985]). During the 1980s much effort was spent in realizing guided-wave optical components based on semiconductors, although the technologies based on LiNbO$_3$ (Korotky and Alferness [1987]) and glass (Findakly [1985], Ramaswamy and Srivastava [1988], Hashizume, Seki and Nakoma [1989], Nissim, Beguin, Jansen and Laborde [1989]) developed at a much faster pace. Since high-performance discrete devices are now available from these technologies, a question about the rationale behind integration is sometimes raised. The rationale for integration can be appreciated from the following example. Through a monolithic optoelectronic integrated circuit, a detector-amplifier can be connected with a capacitance as low as 0.2 pF, as opposed to about 1 pF required by the combination of a discrete photodiode and a discrete amplifier (Carney and Hutcheson [1987]). This five-fold decrease in capacitance would yield a corresponding increase in detector bandwidth, since the bandwidth of a detector limited by its RC time constant is given by $(2\pi RC)^{-1}$, where $R$ and $C$ represent the resistance and the capacitance of the detector circuit. Thus optical integration of different devices is expected to yield higher performance due to lower parasitic capacitances and inductances, and fewer interconnecting discrete components.
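The bandwidth argument above can be checked with a few lines of Python. This is only a minimal sketch: the 50 Ω resistance is an assumed illustrative value (the text does not specify $R$), and the function name is hypothetical. Only the ratio of the two bandwidths, which does not depend on $R$, follows from the text.

```python
import math

def rc_bandwidth_hz(resistance_ohm, capacitance_farad):
    """RC-limited bandwidth (2*pi*R*C)^-1 in hertz."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_farad)

R_LOAD = 50.0  # ohms; assumed for illustration, not taken from the text

bw_discrete = rc_bandwidth_hz(R_LOAD, 1.0e-12)    # about 1 pF, discrete parts
bw_integrated = rc_bandwidth_hz(R_LOAD, 0.2e-12)  # 0.2 pF, monolithic circuit

print(f"discrete:   {bw_discrete / 1e9:.1f} GHz")
print(f"integrated: {bw_integrated / 1e9:.1f} GHz")
print(f"bandwidth ratio: {bw_integrated / bw_discrete:.1f}")
```

Whatever value is chosen for `R_LOAD`, the ratio of the two results equals the ratio of the capacitances.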
Furthermore, integration can achieve an increase in the density of functional devices and, often, a lower manufacturing cost. Considerable research activity in this direction recently has been concerned with integration of optical and electronic components on a single III-V semiconductor substrate such as GaAs and InP or ternary/quaternary compounds lattice-matched to these semiconductors and grown on these substrates. This has led to the emergence of optoelectronic integrated circuits (Koren [1989]) in which all optical and electronic functional components are built on a single substrate such as InP, GaAs, Si, Ge, or GaAs on Si. A second option is a hybrid approach in which different functional units can be surface-mounted on a common substrate with guided-wave components as interconnections between them. Due to the maturity of the silicon
technology, which is extensively used for forming large-scale and very large-scale integrated circuits in the area of microelectronics, silica on silicon is a good choice for such hybrid integration. The attractive features of silicon include the following:
- An established electronic substrate for integrated circuit technology. Thus, integrated optics on silicon is synergetic with microelectronics.
- Availability in excellent quality.
- Availability of relatively large silicon wafers (up to about 16 or 20 cm), making it suitable for integrating a number of components on a common substrate.
- Relatively high thermal conductivity (1.6 W cm$^{-1}$ K$^{-1}$), enabling surface mounting of active components like III-V compound laser diodes.
- Transparency over infrared wavelengths ranging from 1.2 to 1.6 μm, which is the lowest-loss wavelength transmission window of silica-based optical fibers. Thus, optical waveguide components on silicon are suitable for efficient coupling to optical fibers in the important wavelength range. Furthermore, since the materials are similar, guided-mode spot sizes of an optical waveguide and a fiber can be matched with simple design procedures.
- Amenability to anisotropic etching (Bean [1978], Kendall [1979], Peterson [1982], Matsuo [1978], Matsuo [1980]), enabling easy fiber attachment through etched mechanical fixtures like V- or U-grooves (Boyd and Sriram [1978], Grand, Denis and Valette [1991]).
- Amenability to dry etching, which allows flexibility in component integration technology.
- Highly developed technology for etching, cutting and dicing, polishing, and photolithography, yielding the potential for mass production.
- Availability at lower costs than III-V compounds.
- Less toxic than III-V compounds.
- Well-developed silica deposition technologies for waveguide fabrication are available.
Despite these attractive features, silicon was not seriously viewed as a potential candidate for integrated optics until recently (Stutius and Streifer [1977], Boyd, Chang, Fan and Ramey [1981], Willander [1983a], Falco, Botineau, Azema, De Micheli and Ostrowsky [1983], Willander [1983b], Kawachi, Yasu and Kobayashi [1983], Boyd, Wu, Zelmon, Neumaan and Timlin [1984], Yamada, Kawachi, Yasu and Kobayashi [1984a, 1984b], Kawachi, Yamada, Yasu and Kobayashi [1985], Lee, Henry, Kazarinov and
Orlowsky [1987], Aarnio, Honkanen and Leppihalme [1987], Hall [1987], Valette [1987, 1988], Hickernell [1988], Soref and Lorenzo [1988], Valette, Renard, Denis, Jadot, Fournier, Philippe, Gidon, Grouillet and Desgranges [1989], Soref and Ritter [1990], Baba, Kokubun and Watanabe [1990], Pal, Singh, Ghatak and Bhattacharya [1990], Kawachi [1990], Valette, Renard, Jadot, Gidon and Erbeia [1990], Tewari, Singh and Pal [1990], Takagi, Jinguji and Kawachi [1991], Adams, Shani, Henry, Kistler, Blonder and Olsson [1991], Kawachi, Miya and Ohmori [1991], Welbourn, Beaumont and Nield [1991], Kokubun, Tamura and Kondo [1991], Henry [1991], Okamoto [1991], Kawachi [1991]). The subject is important because future optical networks will require a large variety of optical components for switching, branching, combining, and wavelength multiplexing and demultiplexing of optical signals and data. The intense interest in this technology prompted this attempt to present a unified description of guided-wave optics on silicon.
§ 2. Physics and Analysis of Optical Waveguides
2.1. PLANAR WAVEGUIDES
The physics of an optical waveguide is best illustrated through a planar or slab geometry. As shown in fig. 1, it involves a guiding region (core) of refractive index $n_1$ surrounded by a cover and a substrate of refractive indices $n_c$ and $n_s$ on each side, with $n_c, n_s < n_1$ (Marcuse [1974], Sodha and Ghatak [1977], Adams [1981], Ghatak [1986], Ghatak and Thyagarajan [1989], Pal [1987]). If $n_c = n_s$, the waveguide is called a symmetric waveguide. We assume the refractive index to vary only along $x$ as

$$
n^2(x) =
\begin{cases}
n_c^2, & x > d \quad \text{(cover)},\\
n_1^2, & 0 < x < d \quad \text{(core)},\\
n_s^2, & x < 0 \quad \text{(substrate)}.
\end{cases}
\tag{2.1}
$$

Fig. 1. Schematic of a planar optical waveguide in which a core of uniform refractive index $n_1$ is surrounded by a cover of refractive index $n_c$ and a substrate of refractive index $n_s$; the refractive index distribution $n(x)$ is also shown.
The governing wave equation is (Ghatak [1986])

$$
\nabla^2 \Psi - \epsilon_0 \mu_0 n^2(x) \frac{\partial^2 \Psi}{\partial t^2} = 0,
\tag{2.2}
$$

where $\epsilon_0$ and $\mu_0$ are the free-space dielectric permittivity and the magnetic permeability (the media are assumed to be nonmagnetic), respectively. Here $\Psi$ stands for either $\mathcal{E}$ or $\mathcal{H}$, and it encompasses both the spatial and temporal variations of the electric and magnetic fields. If we choose the $z$-axis as the direction of propagation (fig. 1), without any loss of generality, spatial variations of the fields will be confined to the $xz$ plane. If the time dependence is $e^{i\omega t}$, the solution of eq. (2.2) is

$$
\Psi(x, z, t) = \psi(x)\, e^{i(\omega t - \beta z)},
\tag{2.3}
$$
where $\beta$ represents the propagation constant. Values of $\beta$ are dictated by the waveguide parameters. The boundary conditions allow only a discrete set of $\beta$'s. The transverse field distributions $E_j(x)$ and $H_j(x)$ (which remain invariant with propagation) corresponding to these discrete $\beta$'s constitute the guided or bound modes of the waveguide. Any arbitrary electromagnetic field incident at the input end of the waveguide can be expanded into a sum over the waveguide's allowed discrete (or guided) modes and a continuum of radiation modes (Marcuse [1974], Ghatak and Thyagarajan [1989]):

$$
\Psi(x, y, z, t) = \sum_p a_p\, \psi_p(x)\, e^{i(\omega t - \beta_p z)} + \int a(\beta)\, \psi_\beta(x)\, e^{i(\omega t - \beta z)}\, d\beta.
\tag{2.4}
$$
In eq. (2.4), $\psi$ on the right-hand side stands for either $E$ or $H$, and $p$ labels a particular mode; the coefficient $a_p$ is such that the power in the $p$th mode is proportional to $|a_p|^2$. Furthermore, these guided modes are mutually orthogonal and are normalized so that they satisfy the orthonormality condition (Ghatak and Thyagarajan [1989])
where $\delta_{pq}$ is the Kronecker delta function:

$$
\delta_{pq} =
\begin{cases}
0 & \text{for } p \neq q,\\
1 & \text{for } p = q.
\end{cases}
\tag{2.6}
$$
The orthonormality condition normalizes the power carried by each mode to 1. The radiation modes also form an orthogonal set, although the orthonormality condition is required to be defined appropriately in terms of the Dirac delta function (Ghatak and Thyagarajan [1989]). For the index distribution given by eq. (2.1), the structure (fig. 1) will support both TE and TM modes. For the TE modes, $E_y$, $H_x$, and $H_z$ are the only nonzero field components, whereas for the TM modes, the corresponding nonzero field components are $H_y$, $E_x$, and $E_z$. The TE modes satisfy the wave equation (Pal [1987])

$$
\frac{d^2 E_y(x)}{dx^2} + \kappa^2(x)\, E_y(x) = 0,
\tag{2.7}
$$

where

$$
\kappa(x) = \left[ k_0^2 n^2(x) - \beta^2 \right]^{1/2}
\tag{2.8}
$$

represents the transverse component of the plane wave vector $k(x)$ ($= k_0 n(x)$). Solutions of eq. (2.7) for the index distribution (eq. (2.1)) can be written as (Pal [1987])
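
$$
E_y(x) =
\begin{cases}
A\, e^{-\gamma_c (x - d)}, & x > d,\\
B \cos(\kappa_p x) + C \sin(\kappa_p x), & 0 < x < d,\\
D\, e^{\gamma_s x}, & x < 0,
\end{cases}
\tag{2.9}
$$

where $A$, $B$, $C$, and $D$ are constants and

$$
\gamma_c^2 = \beta_p^2 - k_0^2 n_c^2, \qquad
\kappa_p^2 = k_0^2 n_1^2 - \beta_p^2, \qquad
\gamma_s^2 = \beta_p^2 - k_0^2 n_s^2.
\tag{2.10}
$$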
For a guided mode the field is oscillatory within the core and decays exponentially in the cover and substrate. Accordingly, $\gamma_c$, $\kappa_p$, and $\gamma_s$ are all real and positive. Hence, for a guided mode

$$
k_0 n_1 > \beta_p > k_0 n_s > k_0 n_c,
\tag{2.11}
$$

where it is assumed that $n_c$ is less than $n_s$. The portion of the field outside the core, which decays exponentially in the cover and the substrate (eq. (2.9)), is known as the evanescent tail of a guided mode; it is used in practice to construct a number of waveguide components, such as directional couplers
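and polarizers. The continuity of $E_y$ and $H_z$ across the film-cover and film-substrate interfaces leads to the following eigenvalue or characteristic equation (Pal [1987])

$$
\tan(\kappa_p d) = \frac{\kappa_p (\gamma_s + \gamma_c)}{\kappa_p^2 - \gamma_s \gamma_c}
\tag{2.12}
$$

or

$$
\kappa_p d = \phi_s + \phi_c + p\pi, \qquad p = 0, 1, 2, \ldots,
\tag{2.13}
$$

where $\tan\phi_s = \gamma_s/\kappa_p$ and $\tan\phi_c = \gamma_c/\kappa_p$. Solutions of eq. (2.13) yield $\beta_0, \beta_1, \beta_2, \ldots$, which correspond to the TE$_0$, TE$_1$, TE$_2$, ... modes of the waveguide. For a symmetric waveguide, eq. (2.12) is transformed to (Pal [1987])

$$
\tan(\kappa_p d) = \frac{2 \kappa_p \gamma_2}{\kappa_p^2 - \gamma_2^2}.
\tag{2.14}
$$

Since $\tan(\kappa_p d)$ can be expanded as $2\tan(\tfrac{1}{2}\kappa_p d)/[1 - \tan^2(\tfrac{1}{2}\kappa_p d)]$, eq. (2.14) may be recast as a quadratic equation in $\tan(\tfrac{1}{2}\kappa_p d)$, the solution of which yields (Pal [1987])

$$
\tfrac{1}{2}\kappa_p d \tan(\tfrac{1}{2}\kappa_p d) = \tfrac{1}{2}\gamma_2 d,
\tag{2.15a}
$$

$$
-\tfrac{1}{2}\kappa_p d \cot(\tfrac{1}{2}\kappa_p d) = \tfrac{1}{2}\gamma_2 d,
\tag{2.15b}
$$

where $\gamma_2^2 = \beta_p^2 - k_0^2 n_2^2$, and $n_s = n_c = n_2$. The modal field associated with the propagation constants yielded by solutions of eq. (2.15a) is (Pal [1987])

$$
E_y(x) = D\, \frac{\cos[\kappa_p (x - \tfrac{1}{2} d)]}{\cos(\tfrac{1}{2}\kappa_p d)},
\tag{2.16}
$$

which is symmetric in $x$ about the mid-plane of the core. The corresponding modal field associated with the propagation constants yielded by solutions of eq. (2.15b) is

$$
E_y(x) = D\, \frac{\sin[\kappa_p (\tfrac{1}{2} d - x)]}{\sin(\tfrac{1}{2}\kappa_p d)}.
\tag{2.17}
$$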
Thus, in a symmetric waveguide the guided TE modes will, in general, consist of symmetric (eq. (2.16)) and antisymmetric (eq. (2.17)) modes. For a quantitative evaluation of the modal fields of a given planar waveguide one must solve the transcendental equation, eq. (2.15), for $\beta$ either graphically or
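numerically. We introduce two dimensionless parameters

$$
V^2 = \tfrac{1}{4}(\kappa_p^2 + \gamma_2^2)\, d^2 = \tfrac{1}{4} k_0^2 d^2 (n_1^2 - n_2^2)
\tag{2.18a}
$$

and

$$
b = \frac{\beta^2/k_0^2 - n_2^2}{n_1^2 - n_2^2}.
\tag{2.18b}
$$

In view of eq. (2.11),

$$
0 < b < 1.
\tag{2.19}
$$

In terms of $V$ and $b$, eqs. (2.15a) and (2.15b) can be rewritten as

$$
\tan\!\left(V\sqrt{1 - b}\right) = \sqrt{\frac{b}{1 - b}} \qquad \text{(symmetric modes)}
\tag{2.20a}
$$

and

$$
\tan\!\left(V\sqrt{1 - b}\right) = -\sqrt{\frac{1 - b}{b}} \qquad \text{(antisymmetric modes)}.
\tag{2.20b}
$$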
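As a numerical alternative to the graphical solution, eqs. (2.20a) and (2.20b) can be solved by bisection. The sketch below is illustrative only. It uses the single phase condition $2V\sqrt{1-b} = 2\tan^{-1}\sqrt{b/(1-b)} + p\pi$, which is eq. (2.13) written for the symmetric guide with $\kappa_p d = 2V\sqrt{1-b}$; even $p$ reproduces eq. (2.20a) and odd $p$ reproduces eq. (2.20b). The function name and the sample values of $V$ are assumptions made for the example.

```python
import math

def te_normalized_b(V, p, tol=1e-12):
    """Return b of the TE_p mode of a symmetric slab, or None if it is cut off.

    Solves 2*V*sqrt(1-b) = 2*atan(sqrt(b/(1-b))) + p*pi by bisection.
    """
    if V <= p * math.pi / 2:          # mode p needs V above p*pi/2
        return None

    def g(b):
        phase = 2.0 * math.atan(math.sqrt(b / (1.0 - b)))
        return 2.0 * V * math.sqrt(1.0 - b) - phase - p * math.pi

    lo, hi = 0.0, 1.0 - 1e-15         # g(lo) > 0 > g(hi); g decreases with b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Sample a few points of the b-V dispersion curves (assumed V values).
for V in (1.0, 2.5, 5.0):
    guided = {}
    for p in range(6):
        b = te_normalized_b(V, p)
        if b is not None:
            guided[p] = round(b, 4)
    print(f"V = {V}: {guided}")
```

Writing both families of modes as one monotonic function of $b$ avoids the poles of the tangent, so each mode has exactly one bracketed root.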
For a given waveguide and operating wavelength, $V$ is known. For that $V$, a plot of the left- and right-hand sides of eqs. (2.20) as a function of $b$ in the range defined by eq. (2.19) on the same figure will yield $b$ (hence $\beta$) through their intersections. The number of intersections also determines the number of modes. As an example, consider a symmetric planar waveguide formed from silicon nitride (Si$_3$N$_4$) as the core layer with silica (SiO$_2$) as the surrounding medium. If we assume the operating wavelength to be 0.6328 μm, the refractive index of Si$_3$N$_4$ is 2.014 and that of SiO$_2$ is 1.458. Since $\lambda$, $n_1$, and $n_2$ are fixed, different values of $V$ correspond to different widths ($d$) of the waveguide. For example, for $V = 1$, $d \approx 0.14$ μm in such a nitride (SiO$_2$/Si$_3$N$_4$/SiO$_2$) planar waveguide. Figure 2 shows universal dispersion curves depicting $b$ as a function of $V$ for the TE modes in a symmetric planar waveguide; values of $b$ at different values of $V$ are obtained by solving eqs. (2.20a) and (2.20b). We find from fig. 2 that for $V = 1$, which corresponds to $d \approx 0.14$ μm, a silicon nitride waveguide will support only one TE mode at 0.6328 μm wavelength. If $d$ is increased to $\approx 0.73$ μm, $V$ increases to $\approx 5$, and the silicon nitride waveguide will support two symmetric and two antisymmetric TE modes at 0.6328 μm wavelength, since $V \approx 5$ lies just above the TE$_3$ cutoff at $3\pi/2 \approx 4.71$. We may state that for $V = 1$, all modes except the TE$_0$ mode are cut off in a planar waveguide. By definition a mode is cut off in a symmetric waveguide when the propagation constant $\beta_p$ equals $k_0 n_2$, which implies that $b = 0$. Thus at cutoff, eqs. (2.20a) and (2.20b) become (Pal [1987])

$$
V_c = \tfrac{1}{2} p \pi, \qquad p = 0, 1, 2, \ldots,
\tag{2.21a}
$$
Fig. 2. Normalized $b$-$V$ dispersion curves of a symmetric planar waveguide. The full curves are the universal curves for TE modes, and the dashed curves (- - -) correspond to TM modes of a SiO$_2$/Si$_3$N$_4$/SiO$_2$ symmetric waveguide for which the refractive indices of SiO$_2$ and Si$_3$N$_4$ are 1.458 and 2.014, respectively, at 0.6328 μm.
where $V_c$ stands for normalized cutoff frequency, i.e., the $V$-number at which a mode is cut off. Even values of $p$ correspond to symmetric TE modes, and odd values to antisymmetric modes. Equation (2.21a) shows that the TE$_0$ mode is never cut off in a symmetric planar waveguide. For design purposes the cutoff condition, eq. (2.21a), can be rewritten in a more useful form (Pal [1987])

$$
\left(\frac{d}{\lambda}\right)_c = \frac{p}{2\,(n_1^2 - n_2^2)^{1/2}}.
\tag{2.21b}
$$

Equation (2.21b) gives the smallest ratio ($d/\lambda$) for which the $p$th mode is supported in a waveguide. For example, for any $d/\lambda$ less than 0.35, a silicon nitride planar waveguide will function as a single-mode (only TE$_0$ mode) waveguide. These calculations can be repeated for the TM modes of an asymmetric planar waveguide. The eigenvalue equation is

$$
\tan(\kappa_p d) = \frac{\kappa_p \left[ (n_1^2/n_s^2)\,\gamma_s + (n_1^2/n_c^2)\,\gamma_c \right]}{\kappa_p^2 - (n_1^4/n_s^2 n_c^2)\,\gamma_s \gamma_c}.
\tag{2.22}
$$

For a symmetric planar waveguide the eigenvalue equations for the TM modes are

$$
\tan\!\left(V\sqrt{1 - b}\right) = \frac{n_1^2}{n_2^2} \sqrt{\frac{b}{1 - b}} \qquad \text{(symmetric modes)}
\tag{2.23a}
$$
TABLE 1

                       TE modes                                              TM modes
                       Asymmetric waveguide            Symmetric waveguide   Asymmetric waveguide                              Symmetric waveguide
Eigenvalue equation    Eqs. (2.12) or (2.13)           Eqs. (2.20a)          Eq. (2.22)                                        Eqs. (2.23a)
                                                       and (2.20b)                                                             and (2.23b)
Cutoff $V_c$           $\tan^{-1}\sqrt{a} + \tfrac{1}{2}p\pi$   $\tfrac{1}{2}p\pi$    $\tan^{-1}(n_1^2\sqrt{a}/n_c^2) + \tfrac{1}{2}p\pi$   $\tfrac{1}{2}p\pi$
($p = 0, 1, 2, \ldots$)
and

$$
\tan\!\left(V\sqrt{1 - b}\right) = -\frac{n_2^2}{n_1^2} \sqrt{\frac{1 - b}{b}} \qquad \text{(antisymmetric modes)}.
\tag{2.23b}
$$
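The TE and TM conditions for the symmetric guide differ only by the factor $n_1^2/n_2^2$, so one routine can handle both. The sketch below is a hedged illustration, not a method taken from the cited references: it solves $2V\sqrt{1-b} = 2\tan^{-1}\!\left(r\sqrt{b/(1-b)}\right) + p\pi$ by bisection, with $r = 1$ for TE and $r = n_1^2/n_2^2$ for TM, which reproduces eqs. (2.23a) and (2.23b) for even and odd $p$. The indices and wavelength are the silicon nitride values quoted in the text; the thickness of 0.145 μm (giving $V \approx 1$), the reading $V = \tfrac{1}{2}k_0 d\,(n_1^2 - n_2^2)^{1/2}$ of eq. (2.18a), and all function names are assumptions of this example.

```python
import math

N_CORE, N_CLAD = 2.014, 1.458      # Si3N4 and SiO2 at 0.6328 um (from the text)
WAVELENGTH_UM = 0.6328

def v_number(d_um):
    """V = (1/2) k0 d sqrt(n1^2 - n2^2), as in eq. (2.18a)."""
    k0 = 2.0 * math.pi / WAVELENGTH_UM
    return 0.5 * k0 * d_um * math.sqrt(N_CORE**2 - N_CLAD**2)

def normalized_b(V, p, ratio=1.0, tol=1e-12):
    """b of mode p; ratio=1 for TE, ratio=n1^2/n2^2 for TM. None if cut off."""
    if V <= p * math.pi / 2:
        return None

    def g(b):
        phase = 2.0 * math.atan(ratio * math.sqrt(b / (1.0 - b)))
        return 2.0 * V * math.sqrt(1.0 - b) - phase - p * math.pi

    lo, hi = 0.0, 1.0 - 1e-15      # g decreases monotonically from g(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def effective_index(b):
    """n_eff = beta/k0, obtained by inverting eq. (2.18b)."""
    return math.sqrt(N_CLAD**2 + b * (N_CORE**2 - N_CLAD**2))

d_um = 0.145                       # assumed thickness, chosen so that V is near 1
V = v_number(d_um)
b_te = normalized_b(V, 0)
b_tm = normalized_b(V, 0, ratio=(N_CORE / N_CLAD) ** 2)
print(f"V = {V:.3f}")
print(f"TE0: b = {b_te:.4f}, n_eff = {effective_index(b_te):.4f}")
print(f"TM0: b = {b_tm:.4f}, n_eff = {effective_index(b_tm):.4f}")
```

Because $r > 1$ for the TM case, the TM mode has the smaller $b$ at a given $V$, which is why the dashed TM curves lie below the TE curves.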
The $b$-$V$ dispersion curves for the TM modes of a silicon nitride planar waveguide are shown in fig. 2 as dashed curves. The ratio of the core-to-cladding refractive index at 0.6328 μm wavelength is about 1.38 in such a waveguide. We list important results for the TE and TM modes of a planar waveguide in table 1. For an asymmetric waveguide, $V_c^{\mathrm{TM}}$ is greater than $V_c^{\mathrm{TE}}$. For the lowest-order mode (Ghatak [1986]),

$$
V_c^{\mathrm{TE}} = \tan^{-1}\sqrt{a}
\qquad \text{and} \qquad
V_c^{\mathrm{TM}} = \tan^{-1}\!\left( n_1^2 \sqrt{a} / n_c^2 \right).
$$

Here $a$ represents the asymmetry parameter, and it is defined as $a = (n_s^2 - n_c^2)/(n_1^2 - n_s^2)$. Thus, for $\tan^{-1}\sqrt{a} < V < \tan^{-1}(n_1^2\sqrt{a}/n_c^2)$, only the TE$_0$ mode is supported in an asymmetric planar waveguide. Such a waveguide, in which all modes except the TE$_0$ mode are cut off, is called a single-polarization, single-mode waveguide (Ghatak [1986], Ghatak and Thyagarajan [1989]). In the case of weakly guiding waveguides, for which $n_1 \approx n_2$, TE and TM modes are nearly degenerate.

2.2. POWER CARRIED BY A GUIDED MODE IN A PLANAR WAVEGUIDE
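The power flux density associated with the electromagnetic field is given by the time average of the corresponding Poynting vector $\mathbf{S}$,

$$
\langle \mathbf{S} \rangle = \langle \mathcal{E} \times \mathcal{H} \rangle = \tfrac{1}{2}\,\mathrm{Re}(\mathcal{E} \times \mathcal{H}^*),
\tag{2.24}
$$

where $\langle \ldots \rangle$ implies time average. In eq. (2.24) both temporal and spatial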
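dependences are assumed to have been included in the field components. As an example, we consider only symmetric TE modes. Simple algebraic manipulations lead to an expression for the net power carried by a symmetric TE mode along the $z$-direction per unit length along $y$ (Pal [1987])

$$
P = \tfrac{1}{2} \int_{-\infty}^{\infty} \mathrm{Re}(\mathcal{E} \times \mathcal{H}^*) \cdot \hat{z}\, dx
= \frac{\beta_p D^2}{2 \omega \mu_0 \cos^2 \xi} \left( \tfrac{1}{2} d + \frac{1}{\gamma_2} \right),
\tag{2.25}
$$

where $\xi = \tfrac{1}{2}\kappa_p d$ and $\hat{z}$ is the unit vector along $z$. Equation (2.25) shows that the guided-mode power is confined to an effective guide half-width of ($\tfrac{1}{2}d + 1/\gamma_2$). We can thus ascribe a confinement factor $\Gamma$ to each mode by the following definition (Pal [1987])

$$
\Gamma = \frac{\text{optical power inside the core}}{\text{total optical power}}
= \frac{\tfrac{1}{2} d + \gamma_2/(\kappa_p^2 + \gamma_2^2)}{\tfrac{1}{2} d + 1/\gamma_2}.
\tag{2.26}
$$

At mode cutoff, $\beta = k_0 n_2$, so $\gamma_2 = 0$, and hence $\Gamma = 0$; the mode, therefore, ceases to be guided inside the core.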
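The confinement factor of the fundamental symmetric TE mode can be evaluated directly from the modal field. The sketch below is a minimal illustration under stated assumptions, not the procedure of Pal [1987]. It assumes that the power density of a TE mode is proportional to $|E_y|^2$, that the core field is eq. (2.16), and that outside the core the field decays from its boundary value $D$ as $\exp(-\gamma_2 s)$, with $s$ the distance from the core. Under these assumptions the core fraction works out to $[\tfrac{1}{2}d + \gamma_2/(\kappa_p^2+\gamma_2^2)]/(\tfrac{1}{2}d + 1/\gamma_2)$, which the code compares with a direct numerical integration. The function names and the values $V = 1$, $d = 0.145$ μm are illustrative.

```python
import math

def fundamental_u(V, tol=1e-13):
    """Solve u*tan(u) = sqrt(V^2 - u^2) for u = kappa*d/2 (eq. (2.15a), lowest mode)."""
    lo, hi = 0.0, min(V, math.pi / 2) * (1.0 - 1e-12)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * math.tan(mid) < math.sqrt(V * V - mid * mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def confinement_closed_form(d, kappa, gamma):
    """Core power fraction in closed form (derived for this example)."""
    return (0.5 * d + gamma / (kappa**2 + gamma**2)) / (0.5 * d + 1.0 / gamma)

def confinement_numeric(d, kappa, gamma, n=20000):
    """Midpoint-rule integral of |E_y|^2 from eq. (2.16), plus the two tails."""
    dx = d / n
    norm = math.cos(0.5 * kappa * d) ** 2
    core = sum(
        math.cos(kappa * ((i + 0.5) * dx - 0.5 * d)) ** 2 for i in range(n)
    ) * dx / norm
    tails = 1.0 / gamma            # two tails, each contributing 1/(2*gamma) for D = 1
    return core / (core + tails)

V, d = 1.0, 0.145                  # d in micrometres; assumed illustrative values
u = fundamental_u(V)
kappa = 2.0 * u / d
gamma = 2.0 * math.sqrt(V * V - u * u) / d
print(f"closed form: {confinement_closed_form(d, kappa, gamma):.6f}")
print(f"numerical:   {confinement_numeric(d, kappa, gamma):.6f}")
```

In terms of the normalized parameters the same fraction can be written as $(V + \sqrt{b})/(V + 1/\sqrt{b})$, which tends to 0 at cutoff ($b \to 0$) and to 1 far from cutoff ($b \to 1$).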
2.3. WAVEGUIDING IN THREE-DIMENSIONAL STRUCTURES
The density of guided-wave components on a substrate can be greatly increased by confining the guided optical energy in both the x and y directions. In contrast to the planar geometry, three-dimensional waveguides (fig. 3) consist of rectangular or near-rectangular cores, which are difficult to analyze. Studies of propagation effects in them generally require extensive
Fig. 3. Some examples of three-dimensional waveguide geometries: (a) raised strip, (b) embedded strip, (c) rib, (d) strip-loaded; in all these geometries the shaded region represents the core.
numerical analyses (Marcatili [1969], Goell [1969]). Hocker and Burns [1977] have, however, proposed a relatively simple and approximate approach, which is called the effective-index method (Knox and Toulios [1970]). This method can be illustrated through the example of an embedded strip waveguide (fig. 3b), in which a core of dimensions $d_a \times d_b$ and refractive index $n_1$ is bounded on three sides by a medium of refractive index $n_3$. The core is assumed to be covered by a medium of refractive index $n_2$. The method starts with the assumption that the waveguide extends infinitely along $y$. We then find the modes of an asymmetric slab waveguide (fig. 4a) in the $xz$ plane consisting of a core of width $d_a$ that has index $n_1$ and is sandwiched between two media of refractive indices $n_2$ and $n_3$. The analysis presented in § 2.1 can be extended to obtain the propagation constant $\beta_p$ from the mode dispersion curve ($b$ versus $V$) of the guided modes for this asymmetric waveguide. Depending on the polarization of the input beam, either TE (electric field along $y$) or TM (magnetic field along $y$) modes will be excited. An effective index $n_{\mathrm{eff}}^{p}$ ($= \beta_p/k_0$) can be associated with the $p$th mode of this waveguide. At the next step the method assumes that the entire asymmetric waveguide along $x$ can be replaced by a core material of index $n_{\mathrm{eff}}^{p}$. In the $yz$ plane we thus obtain a symmetric planar pseudo-waveguide of core refractive index $n_{\mathrm{eff}}^{p}$ and of width $d_b$ surrounded by a medium of index $n_3$ (fig. 4b). Thus, we can study propagation in such a symmetric pseudo-waveguide by finding the propagation constants $\beta_{pq}$ of different modes from the mode dispersion curve (fig. 2). For each value of $p$ there will be $q$ solutions for the effective waveguide structure in the $yz$ plane. Thus, in the effective-index model, each mode is designated with a pair of subscripts: $p$ and $q$. According to Hocker and Burns [1977], the agreement in the values of $\beta_{pq}$ found for the effective-index method and more nearly exact numerical
Fig. 4. Effective-index model: (a) an asymmetric planar waveguide of width $d_a$ in the $xz$ plane with a core of refractive index $n_1$ sandwiched between a cover and a substrate of refractive index $n_2$ and $n_3$, respectively; (b) a symmetric planar waveguide of width $d_b$ in the $yz$ plane with a core of refractive index $n_{\mathrm{eff}}^{p}$ surrounded by a medium of refractive index $n_3$.
methods is quite good for the lowest-order modes and for large aspect ratios (Marcatili [1969], Goell [1969]). In particular, the agreement is extremely good far from cutoff. In view of this and the simplicity of the model, the effective-index method is used extensively in the literature to model propagation in three-dimensional waveguides. In the mid-1980s an alternate and relatively simple technique (Kumar, Thyagarajan and Ghatak [1983]) based on perturbation theory was proposed to deal with such rectangular geometries. This technique was applied to a number of integrated-optical waveguide geometries and devices (Kumar, Thyagarajan and Ghatak [1983], Kumar, Kaul and Ghatak [1985], Varshney and Kumar [1988]). The method relies on choosing a fictitious rectangular optical waveguide, the index profile of which is separable in $x$ and $y$ coordinates, and which closely resembles the actual waveguide except at the corners. Since the index profile is separable in $x$ and $y$, the modal solution to the fictitious waveguide becomes extremely simple. Furthermore, since the real index profile differs little from the hypothetical profile, simple perturbation theory is then applied to obtain the propagation characteristics, and hence the $b$-$V$ dispersion curves of the real waveguide. As an example, consider the ridge waveguide of fig. 5a. It can be approximated as a fictitious waveguide of the form shown in fig. 5b, with index profile (Varshney and Kumar [1988])
Fig. 5. Perturbation theory model to analyze a three-dimensional waveguide: (a) real waveguide (adapted with permission of IEEE from Varshney and Kumar [1988], © 1988 IEEE); (b) fictitious waveguide with a dielectric constant profile separable in $x$ and $y$ coordinates; the dielectric constant profile of the fictitious waveguide differs from the real waveguide only in the shaded regions.
(2.27)
(2.28a)
(2.28b)

The dielectric profile (eq. (2.27)) matches the dielectric profile of the real waveguide everywhere except at the shaded regions shown in fig. 5b. The modal solution to the fictitious waveguide is given by a solution of the scalar wave equation (Kumar, Thyagarajan and Ghatak [1983], Varshney and Kumar [1988])

$$
\frac{\partial^2 \psi}{\partial x^2} + \frac{\partial^2 \psi}{\partial y^2} + \left[ k_0^2 n^2(x, y) - \beta_0^2 \right] \psi = 0,
\tag{2.29}
$$

where $n^2(x, y)$ is the profile of eq. (2.27) and $\beta_0$ is the propagation constant. Equation (2.29) can be converted into two independent equations in $x$ and $y$ by substituting $\psi(x, y) = X(x)\,Y(y)$, and by using the method of separation of variables, (2.30) and (2.31)

where $\beta_0^2 = \beta_1^2 + \beta_2^2 - k_0^2 n_1^2$. The solutions to eq. (2.30) in different regions of the waveguide are (Varshney and Kumar [1988]) (2.32)
Vx= f k o d a J m .
(2.33)
Here, $\theta = 0$ for a mode symmetric in $x$, and $\theta = \tfrac{1}{2}\pi$ for a mode antisymmetric in $x$, and $A$ and $B$ are constants. The solutions to eq. (2.31) in different regions of the waveguide are

$$
Y(y) =
\begin{cases}
A_2 \exp(-\gamma_2 y / t), & y > d_b,\\
A_1 \cos(\gamma_1 y / t) + B_1 \sin(\gamma_1 y / t), & -t < y < d_b,\\
A_0 \exp(\gamma_0 y / t), & y < -t,
\end{cases}
\tag{2.34}
$$
where
$$\gamma_2 = t\sqrt{\ldots}, \qquad \gamma_1 = \sqrt{\ldots}, \qquad V = t k_0 \sqrt{\ldots}.$$
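The constants $\beta_1$ and $\beta_2$ are determined by satisfying the boundary conditions for the dominant field vector. For example, the $E^y_{pq}$ mode will approximate a TE mode in the x-direction and a TM mode in the y-direction. The boundary conditions required to be satisfied by $\psi(x, y)$ (= $E^y_{pq}$ mode) are (Varshney and Kumar [1988])
$$\psi,\ \frac{\partial\psi}{\partial x}\ \text{continuous at } x = \pm\tfrac{1}{2} d_a; \qquad n^2\psi,\ \frac{\partial\psi}{\partial y}\ \text{continuous at } y = -t \text{ and } y = d_b. \tag{2.36}$$
In a similar manner, $\psi(x, y)$ (= $E^x_{pq}$ mode) requires continuity of
$$n^2\psi,\ \frac{\partial\psi}{\partial x}\ \text{at } x = \pm\tfrac{1}{2} d_a; \qquad \psi,\ \frac{\partial\psi}{\partial y}\ \text{at } y = -t \text{ and } y = d_b. \tag{2.37}$$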
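These boundary conditions lead to eigenvalue equations for $\beta_1$ and $\beta_2$ (Varshney and Kumar [1988]):
[Eq. (2.38): eigenvalue equation for $\beta_1$, containing an arctangent term in $c$ and the mode-order term $\tfrac{1}{2}(p - 1)\pi$]
and
[Eq. (2.39): eigenvalue equation for $\beta_2$, containing arctangent terms in $D_{10}$ and $D_{12}$ and the mode-order term $(q - 1)\pi$]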
where $c = n_1^2/n_0^2$ for the $E^x_{pq}$ mode and $c = 1$ for the $E^y_{pq}$ mode, $D_{ij} = n_i^2/n_j^2$ for the $E^y_{pq}$ mode and $D_{ij} = 1$ for the $E^x_{pq}$ mode, and i, j = 0, 1, 2. The solutions of eqs. (2.38) and (2.39) will yield $\beta_1$ and $\beta_2$, and hence $\beta_0$, from the relation $\beta_0^2 = \beta_1^2 + \beta_2^2 - k_0^2 n_1^2$. Thus, from perturbation theory, the propagation constant $\beta$ of the real waveguide will be
(2.40)
where $\Delta\beta$ represents the first-order perturbation correction to $\beta_0$,
(2.41)
Here, $\delta n^2$ is the difference in the dielectric constant distribution between the real and the fictitious waveguides. From figs. 5a and b, we obtain
$$\delta n^2 = \begin{cases} n_1^2 - n_0^2 & \text{for regions (I)}, \\ 0 & \text{otherwise}. \end{cases} \tag{2.42}$$
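As a rough numerical illustration of this first-order correction, the sketch below assumes the standard perturbation result for the scalar wave equation, $\Delta(\beta^2) = k_0^2 \iint \delta n^2 |\psi|^2\,dx\,dy \,/ \iint |\psi|^2\,dx\,dy$. The exact form of eq. (2.41) is not legible here, so this normalization is an assumption. The function name, the Gaussian test field, the grid, and all numerical values are hypothetical and serve only to show the mechanics.

```python
import math


def first_order_beta_sq_shift(psi, delta_n_sq, k0):
    """Return k0^2 * sum(dn2 * |psi|^2) / sum(|psi|^2) on a uniform grid.

    psi and delta_n_sq are equally shaped 2-D lists. On a uniform grid the
    cell area cancels between numerator and denominator, so plain sums suffice.
    """
    num = 0.0
    den = 0.0
    for psi_row, dn_row in zip(psi, delta_n_sq):
        for p, dn in zip(psi_row, dn_row):
            w = abs(p) ** 2
            num += dn * w
            den += w
    return k0 ** 2 * num / den


# Hypothetical numbers; lengths in micrometres.
wavelength = 1.55
k0 = 2.0 * math.pi / wavelength
n_grid, half_width, spot = 201, 10.0, 2.0
coords = [-half_width + 2.0 * half_width * i / (n_grid - 1) for i in range(n_grid)]

# Assumed Gaussian stand-in for the separable mode X(x)Y(y) of the fictitious guide.
psi = [[math.exp(-(x * x + y * y) / (2.0 * spot ** 2)) for y in coords] for x in coords]

# delta n^2 is nonzero only in four corner regions, mimicking the shaded regions of fig. 5b.
delta_n_sq = [[0.01 if (abs(x) > 4.0 and abs(y) > 4.0) else 0.0 for y in coords]
              for x in coords]

shift = first_order_beta_sq_shift(psi, delta_n_sq, k0)
beta0 = 1.45 * k0                     # hypothetical unperturbed propagation constant
beta = math.sqrt(beta0 ** 2 + shift)  # corrected propagation constant
print(f"relative change in beta: {(beta - beta0) / beta0:.3e}")
```

Because the field of a well-confined mode is very small in the corner regions, the computed shift is tiny even when $\delta n^2$ itself is not, which is the behaviour the text describes for modes far from cutoff.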
The perturbation theory yields results that are more accurate than the effective-index method (Kumar, Thyagarajan and Ghatak [1983]). This perturbation technique is useful, in particular, when $\delta n^2$ is small. However, it has been shown that even for semiconductor rib waveguides, in which $n_1 = 3.44$ and $n_0 = 1.0$ (i.e., in which $\delta n^2 \approx 10.8$), the perturbation theory yields results that are in good agreement with other methods, such as the finite-element method or the mode-matching technique (Varshney and Kumar [1988]), which involve extensive numerical analysis. The perturbation results for the variation of $n_{\mathrm{eff}}$ of the scalar fundamental mode ($E_{11}$) of a semiconductor waveguide with width $d_a$ are reproduced in fig. 6 from Varshney and Kumar [1988]. In a realistic silica waveguide the difference $\delta n^2$ is small, about 1.1 3 in the case of a silica-clad phosphosilicate waveguide. Perturbation theory should therefore yield useful results. A comparison between various theories with regard to the coupling length estimation for a silica-based waveguide directional coupler is shown in fig. 7 (Takato, Jinguji, Yasu, Toba and Kawachi [1988]); the agreement between the perturbation results and the experiment is quite good. For a tightly bound mode, the fractional energy of the mode is extremely small in the regions where $\delta n^2$ is
Fig. 6. Effective index of the scalar fundamental mode ($E_{11}$) versus width ($d_a$) of a rib waveguide with $n_0 = 1$, $n_1 = 3.44$, $n_2 = 3.35$, $d_b = 0.2$ µm, and $t = 0.8$ µm. (Reproduced with permission of IEEE from Varshney and Kumar [1988], © 1988 IEEE.)
Fig. 7. A comparison of experiments with different theories for the dependence of the perfect (i.e., 100%) optical power coupling length in a directional coupler on the waveguide separation between two silica channel waveguides (dimensions of 10 × 8 µm² each, with a relative index difference between the core and cladding of 0.24%) at wavelengths of 1.29 and 1.55 µm; (-) point matching method, (- - -) perturbation analysis (Kumar, Kaul and Ghatak [1985]), (-.-.-) Marcatili's method (Marcatili [1969]), and (○) experimental points. (Reproduced with permission of IEEE from Takato, Jinguji, Yasu, Toba and Kawachi [1988], © 1988 IEEE.)
nonzero. Thus, the perturbation technique is more accurate when a mode is far from cutoff. Furthermore, any stress-induced birefringence, which may occur in waveguides made of silica on silicon, can be easily incorporated in the perturbation technique, as Kumar, Shenoy and Thyagarajan [1984] showed.

2.4. MULTILAYER WAVEGUIDES
Optical waveguides are difficult to form on silicon due to the lack of another suitable transparent medium of refractive index higher than that of silicon (n ≈ 3.5). This difficulty can be overcome by growing a layer of silica on silicon before guiding layers such as glass (Boyd, Wu, Zelmon, Neumaan, Timlin and Jackson [1985]) are deposited; the silica layer acts as a buffer layer. Thus, the refractive index profile of a typical composite structure will be as represented in fig. 8. Because of the high-index silicon substrate, the waveguide behaves as a leaky structure unless the buffer layer is thick enough. To reduce mode leakage loss in such a waveguide, the silica buffer layer is grown thick enough to ensure that the evanescent tail of the guided field will be negligible at the interface between the silica buffer layer and the silicon substrate. To achieve this, the silica layer thickness must typically be greater than 4 µm, which requires a long deposition time. The problem of long deposition time can be overcome in a novel waveguide configuration (Duguay, Kokubun, Koch and Pfeiffer [1986], Kokubun, Baba, Sasaki and Iga [1986]). It involves a multilayer planar configuration known as “ARRO
Fig. 8. Schematic of a silicon-based optical waveguide together with its refractive index profile; a buffer SiO2 layer thick enough to reduce leakage loss of guided light into the silicon substrate is introduced before the core is formed.
waveguides”, an acronym for antiresonant reflecting optical waveguide. The layered structure of the waveguide and the corresponding refractive index profile are shown in figs. 9a and b, respectively. The bottom silica layer (~2 µm) is called the second cladding layer, whereas the top silica layer (~4 µm) forms the core of the waveguide. The intermediate high-index layer of about 0.1 µm thickness between these two regions is called the first cladding. Two independent physical phenomena are exploited in this waveguide geometry. The silica-air interface at the top provides a total internal reflecting surface, whereas the high refractive index layer sandwiched between the two silica regions serves as a highly reflecting (>99%) interface. Thus an ARRO waveguide on silicon uses silica as the core, like an optical fiber. The initial experiments on ARRO waveguides involved poly-silicon (poly-Si) as the thin high refractive index layer. However, for experiments at wavelengths less than 1 µm, poly-Si has been replaced by titania (TiO2) as the first cladding layer, since silicon is highly absorptive in this wavelength range (Kokubun, Baba, Sasaki and Iga [1986], Baba, Kokubun, Sasaki and Iga [1988]). The thickness of the first cladding layer is chosen to be small, so that the layer acts as a Fabry-Pérot resonator, and to closely match the antiresonant condition of the resonator. Antiresonances in a Fabry-Pérot etalon are spectrally broad (Ghatak and Thyagarajan [1989]). From the Fabry-Pérot analogy, the waveguide will therefore work over a wide spectral range, and the fabrication tolerance is comfortable. Under optimum conditions, the reflectivity could be almost 99.96% from the set of two interfaces of poly-Si/TiO2-SiO2 and SiO2-Si (Duguay, Kokubun, Koch and Pfeiffer [1986]). Approximate expressions for the optimum thicknesses of the two reflecting layers are (Duguay, Kokubun, Koch and Pfeiffer [1986], Kokubun, Baba, Sasaki and Iga [1986])
Fig. 9. (a) Schematic of an ARRO waveguide geometry; (b) refractive index profile of an ARRO waveguide; (c) refractive index profile of an ARRO-B waveguide (see text).
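and
$$d_2^{\mathrm{opt}} \simeq \tfrac{1}{2} d_{\mathrm{eff}} (2M + 1), \qquad M = 0, 1, 2, \ldots, \tag{2.44}$$
where
(2.45)
and
$$\zeta = \begin{cases} 1 & \text{for TE modes}, \\ n_5^2/n_4^2 & \text{for TM modes}; \end{cases}$$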
$M$, $N$ stand for the orders of the antiresonances. Here, $n_3$, $d_3$ and $n_2$, $d_2$ represent, respectively, the refractive indices and the widths of the first and second claddings; $n_4$ and $d_4$ correspond to the refractive index and width of the core; and $n_5$ is the refractive index of the cover, which is usually air. These results have been derived to yield minimum loss for the fundamental mode in an ARRO waveguide under antiresonant conditions. In an optimum ARRO waveguide configuration, as outlined earlier, the effective index ($\beta/k_0$) and loss minimum ($\alpha^{\min}$) of the fundamental TE0 mode, respectively, are (Duguay, Kokubun, Koch and Pfeiffer [1986], Kokubun, Baba, Sasaki and Iga [1986])
$$\beta/k_0 = n_4 \left[ 1 - \frac{\lambda^2}{4 n_4^2 d_{\mathrm{eff}}^2} \right]^{1/2} \tag{2.46}$$
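and (Duguay, Kokubun, Koch and Pfeiffer [1986], Kokubun, Baba, Sasaki and Iga [1986], Baba, Kokubun, Sasaki and Iga [1988], Baba and Kokubun [1990, 1991])
[Eq. (2.47): $\alpha^{\min}$ expressed through the numerical factor 5428.68, a power of $\lambda/d_{\mathrm{eff}}$, and the polarization factor $X$]
with $X = 1$ for the TE mode and an index ratio involving $n_s$ for the TM mode,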
where $n_s$ is the substrate refractive index, and $\lambda$ and $d_{\mathrm{eff}}$ are measured in micrometers (Baba, Kokubun, Sasaki and Iga [1988]). Equation (2.48) shows that the loss discrepancy between the TE and TM modes is due to the large difference in refractive index between the core and the first cladding, and also between the core and the substrate (Baba, Kokubun, Sasaki and Iga [1988], Baba and Kokubun [1991]). In an alternative method, ARRO waveguides were analyzed through an equivalent transmission line and the transverse resonance method (Jiang, Chrostowski and Fontaine [1989]). If the thicknesses $d_2$ and $d_3$ satisfy the antiresonance conditions, the dispersion relation for the propagation constant of the TE modes is
$$\cot(\gamma_4 d_4) = -\alpha_5/\gamma_4, \tag{2.49}$$
where $\gamma_4$ is the imaginary part of the transverse propagation constant in the core, and $\alpha_5$ is the real part of the transverse propagation constant in air. More recently a novel matrix approach was used to obtain the propagation characteristics of ARRO waveguides (Tewari, Singh and Pal [1990]). As shown in fig. 10, we call the refractive indices of the silicon substrate, the second cladding, the first cladding, the core, and the cover (air) $n_1$, $n_2$, $n_3$, $n_4$, and $n_5$, respectively. The thicknesses of the corresponding regions are $d_1$, $d_2$, $d_3$, $d_4$, and $d_5$. The structure has five homogeneous layers, four interfaces, and five different refractive indices. The recipe of the model involves only a few computational steps (Ghatak, Thyagarajan and Shenoy [1987]). As a first step, a plane wave of amplitude $E_1^+$ is allowed to be incident on the interface between the silicon substrate and the second cladding at some angle of incidence $\theta_1$. The corresponding amplitude reflection and transmission coefficients at the mth interface for TE and TM polarizations are given by (Ghatak, Thyagarajan and Shenoy [1987], Ghatak and Thyagarajan [1989])
Fig. 10. Matrix method to model the propagation characteristics of an ARRO waveguide.
TE polarization:
(2.50)
TM polarization:
$$r_m = \frac{n_{m+1}\cos\theta_m - n_m\cos\theta_{m+1}}{n_{m+1}\cos\theta_m + n_m\cos\theta_{m+1}}, \tag{2.51}$$
respectively. For individual layers the $\theta_m$'s are given by Snell's law at each interface:
$$\beta = k_0 n_1 \sin\theta_1 = k_0 n_2 \sin\theta_2 = k_0 n_3 \sin\theta_3 = k_0 n_4 \sin\theta_4. \tag{2.52}$$
For each $\theta_1$ the values of $r_m$ and $t_m$ are stored for each interface. Appropriate boundary conditions at each interface lead to the matrix equation
(2.53)
where the + and - signs correspond to transmitted and reflected fields at each interface, and $S$ is the product of the 2 × 2 matrices $S_1, S_2, \ldots, S_4$, with
(2.54)
and
$$\delta_m = k_m d_m \cos\theta_m; \qquad k_m = k_0 n_m. \tag{2.55}$$
In view of eq. (2.52), $\delta_m$ can be expressed in terms of $\theta_1$ as
$$\delta_m = k_0 d_m \left( n_m^2 - n_1^2 \sin^2\theta_1 \right)^{1/2}. \tag{2.56}$$
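The per-interface quantities in this recipe can be sketched in a few lines. The block below is only an illustration under stated assumptions: it uses the textbook Fresnel amplitude coefficients (the TE expressions and both $t_m$ are the standard forms, not copied from eqs. (2.50) and (2.51), which are not fully legible here), and it evaluates $\cos\theta_m$ and $\delta_m$ with a complex square root so that evanescent layers are handled. All function names and numerical values are hypothetical.

```python
import cmath
import math


def cos_theta(n_m, n_1, theta_1):
    """cos(theta_m) from Snell's law, n_m sin(theta_m) = n_1 sin(theta_1).

    cmath.sqrt keeps the result defined (purely imaginary for real indices)
    when the wave in layer m is evanescent, i.e. n_1 sin(theta_1) > n_m.
    """
    s = n_1 * math.sin(theta_1) / n_m
    return cmath.sqrt(1.0 - s * s)


def fresnel(n_a, n_b, cos_a, cos_b, polarization="TE"):
    """Textbook amplitude coefficients (r, t) for a wave going from medium a to b."""
    if polarization == "TE":
        denom = n_a * cos_a + n_b * cos_b
        r = (n_a * cos_a - n_b * cos_b) / denom
    elif polarization == "TM":
        denom = n_b * cos_a + n_a * cos_b
        r = (n_b * cos_a - n_a * cos_b) / denom
    else:
        raise ValueError("polarization must be 'TE' or 'TM'")
    t = 2.0 * n_a * cos_a / denom
    return r, t


def phase_thickness(n_m, d_m, n_1, theta_1, k_0):
    """delta_m = k_0 d_m sqrt(n_m^2 - n_1^2 sin^2(theta_1)), cf. eq. (2.56)."""
    return k_0 * d_m * cmath.sqrt(n_m ** 2 - (n_1 * math.sin(theta_1)) ** 2)


# Hypothetical numbers: silicon substrate (real part of the index only) and a
# 2 µm silica second cladding at the He-Ne wavelength; lengths in micrometres.
wavelength = 0.6328
k_0 = 2.0 * math.pi / wavelength
n_1, n_2, d_2 = 3.85, 1.46, 2.0
theta_1 = math.radians(10.0)

cos_1 = cos_theta(n_1, n_1, theta_1)
cos_2 = cos_theta(n_2, n_1, theta_1)
r_te, t_te = fresnel(n_1, n_2, cos_1, cos_2, "TE")
r_tm, t_tm = fresnel(n_1, n_2, cos_1, cos_2, "TM")
delta_2 = phase_thickness(n_2, d_2, n_1, theta_1, k_0)
```

Scanning `theta_1` and storing these values for every interface gives the inputs needed to build the matrices $S_m$; the matrix product itself is not sketched here because eqs. (2.53) and (2.54) are not available in this text.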
Thus, $\delta_m$, $r_m$, $t_m$, and $S_m$ can be calculated for a given $\theta_1$. Since the fifth layer is very thick, the reflected field $E_5^- \approx 0$, and hence from eqs. (2.53)-(2.55) the electric fields in every layer can be obtained in terms of the incident field $E_1^+$. At the next step the mode excitation efficiency $\eta(\beta) = |E_4^+/E_1^+|^2$ is evaluated for the given $\theta_1$ (i.e., for the given $\beta = k_0 n_1 \sin\theta_1$). The process is repeated by scanning the $\beta$-space through a variation in $\theta_1$. A plot of $\eta(\beta)$ reveals resonant peaks that closely resemble Lorentzian functions in shape. For a single-mode guide there is only one such peak. The value of $\beta$ at which the peak appears corresponds to the real part of the propagation constant. The full width at half maximum [FWHM (= $2\Gamma$)] of the Lorentzian represents the leakage power loss coefficient, where $\Gamma$ is the imaginary part of the propagation constant (Ghatak, Thyagarajan and Shenoy [1987], Ghatak and Thyagarajan [1989]). With the derived propagation constant the fields throughout the system can be computed by evaluating the appropriate matrices (eq. (2.54)). The method can be applied to any multilayer structure in which one or more layers has a complex refractive index (Thyagarajan, Diggavi and Ghatak [1987]). For lossless waveguides, $\Gamma \to 0$. This method was used to design and fabricate ARRO waveguides (Tewari, Singh and Pal [1990], Pal, Singh, Ghatak and Bhattacharya [1990]). Calculations of TE mode losses for various parameters (e.g., thicknesses of the individual layers) can be used to optimize an ARRO waveguide. Some results are shown in figs. 11a,b. An important attribute of this matrix method is that, in addition to giving the leakage loss and propagation constant, it allows computation of the corresponding modal field distributions. The feasibility of applying the matrix method to ARRO-B waveguides (Baba and Kokubun [1989]) was also tested (Tewari, Singh and Pal [1990]). These waveguides use a layer of lower refractive index between the two silica layers, rather than a layer of higher refractive index (fig. 9c). In contrast to the ARRO waveguides, an ARRO-B waveguide is polarization insensitive.
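The final step of the matrix-method recipe, reading the real part of the propagation constant and the leakage loss off the excitation-efficiency curve, can be sketched as follows. This is a minimal illustration, assuming a single isolated, well-sampled peak; the function names are hypothetical, and a synthetic Lorentzian stands in for the $\eta(\beta)$ curve that the matrix calculation would produce. The conversion to dB/cm assumes that the guided power decays as $\exp(-2\Gamma z)$, with $\beta$ expressed in µm⁻¹.

```python
import math


def peak_and_fwhm(beta, eta):
    """Return (beta_peak, fwhm) of a single-peaked curve sampled at increasing beta.

    The half-maximum crossings are located by linear interpolation between
    neighbouring samples.
    """
    i_max = max(range(len(eta)), key=eta.__getitem__)
    half = 0.5 * eta[i_max]

    def crossing(i_inside, i_outside):
        # Interpolate between a sample at or above half maximum and its neighbour below.
        b_in, b_out = beta[i_inside], beta[i_outside]
        e_in, e_out = eta[i_inside], eta[i_outside]
        return b_in + (half - e_in) * (b_out - b_in) / (e_out - e_in)

    left = i_max
    while left > 0 and eta[left - 1] >= half:
        left -= 1
    right = i_max
    while right < len(eta) - 1 and eta[right + 1] >= half:
        right += 1
    if left == 0 or right == len(eta) - 1:
        raise ValueError("peak is not fully contained in the scanned range")
    return beta[i_max], crossing(right, right + 1) - crossing(left, left - 1)


def loss_db_per_cm(fwhm_per_um):
    """Convert a power loss coefficient 2*Gamma (in 1/µm) to dB/cm."""
    return 10.0 * math.log10(math.e) * fwhm_per_um * 1.0e4


# Synthetic test curve with hypothetical values (1/µm).
beta_r, gamma = 5.85, 1.0e-6
beta = [beta_r + (i - 1000) * 0.02 * gamma for i in range(2001)]
eta = [gamma ** 2 / ((b - beta_r) ** 2 + gamma ** 2) for b in beta]

beta_peak, fwhm = peak_and_fwhm(beta, eta)
print(f"Re(beta) = {beta_peak:.6f} 1/µm, leakage loss = {loss_db_per_cm(fwhm):.4f} dB/cm")
```

In a real scan the sampling in $\beta$ has to be fine compared with $2\Gamma$, since low-loss modes give very narrow peaks.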
§ 3. Technology of Silicon-Based Optical Waveguides

The technology of silicon-based waveguides before 1985 was reviewed by Boyd, Wu, Zelmon, Neumaan, Timlin and Jackson [1985]. In the earliest attempts to fabricate optical waveguides on silicon, the silicon surface was first thermally oxidized in order to grow a buffer layer of silica (refractive index = 1.46 in the 0.6-0.8 µm wavelength range) before the waveguide core (inorganic polymers or silicon oxynitrides, for example) was formed (Rand and Strandley [1972], Boyd and Chen [1976], Boyd and Chen [1977], Marx, Gottlieb and Brandt [1977]). This silica buffer layer was thick enough to prevent leakage of guided light to the high-index silicon substrate, the complex refractive index of which has a real part of about 3.85 and an imaginary part of about 0.077 in magnitude at the He-Ne wavelength (Stutius and Streifer [1977]). The surface smoothness of the silica buffer layer was usually of about the same quality as the original silicon
Fig. 11. (a) Effective index and loss of the TE0 mode versus second cladding thickness in the poly-Si ARRO waveguide as obtained by the matrix method (see text). (b) Effective index and loss of the TE0 mode versus first cladding thickness in the poly-Si ARRO waveguide as obtained by the matrix method (see text).
wafer surface (Boyd, Wu, Zelmon, Neumaan, Timlin and Jackson [1985]). The core layers were formed by deposition, for example, by chemical vapor deposition (CVD) or radiofrequency (RF) sputtering. In one of the earliest such waveguides with attenuation less than 0.1 dB/cm (for TE0 and TM0 modes at 0.6328 µm), a silicon nitride waveguiding film (~300 nm thick) of refractive index ~2.01 was deposited through low-pressure CVD (Stutius and Streifer [1977]). The thickness of the silica buffer layer varied from 275 to 820 nm from sample to sample. Other successful attempts involved RF
sputtering of 7059 glass* (n = 1.5530) or zinc oxide (ZnO) (n ≈ 2.0) as the guiding layer (Goell and Strandley [1969], Dutta, Jackson and Boyd [1980], Dutta, Jackson, Boyd, Hickernell and Davis [1981], Dutta, Jackson and Boyd [1981]). Laser annealing of the deposited waveguiding film led to a dramatic reduction of attenuation in such waveguides. Typically, the reduction was by a factor of 250 in ZnO and 80 in the 7059 glass waveguides (Boyd, Wu, Zelmon, Neumaan, Timlin and Jackson [1985]). Improvements in the quality of the interface between the guiding layer and the buffer layer of silica led to a loss of only 0.01 dB/cm (Dutta, Jackson, Boyd, Hickernell and Davis [1981], Dutta, Jackson and Boyd [1981], Chen and Boyd [1981], Dutta, Jackson, Boyd, Davis and Hickernell [1982]). Since ZnO is a piezoelectric material, ZnO-based waveguides enable realization of acousto-optic devices with silica on silicon as the substrate (Hickernell, Davis and Richard [1978], Hickernell [1979], Chubachi [1976], Yao, Anderson and August [1979]). Some success in fabricating spectrum analyzers with ZnO waveguides on silicon-based substrates is reported in § 4. Fabrication of silicon oxynitride (SixOyNz) core waveguides was reported recently by Gleine and Müller [1991]. The core layer of about 0.25 µm was deposited by low-pressure CVD (LPCVD) at a pressure of about 3-10 Pa from vapor-phase reactions between SiH2Cl2 (12-20 mL/min), NH3 (0-300 mL/min), and O2 (0-500 mL/min) on top of a 4 µm thick silica layer. The core is finally covered with a silica overlayer. Typically, the losses are less than 0.5 dB/cm at 0.6328 µm. Laser annealing with a pulsed CO2 laser reduces the loss to some tenths of a dB/cm, depending on the irradiation time and intensity of the laser. Laser annealing also led to a decrease of up to about 2% in the refractive index of these near-oxide films, due to a reduction in the stress of the deposited films.
This laser-induced reduction in the refractive index of the deposited films can be exploited to trim silicon-based guided-wave components (Gleine and Müller [1991]). In one study, temperature-independent operation of a single-mode waveguide having the composite structure 7059 glass/SiO2/Si was demonstrated; the guiding layer thickness was 0.368 µm (Chen and Boyd [1981]). Such temperature-independent waveguide operation has potential applications in interferometric sensors based on optical waveguides. There is much recent interest in using silica or doped silica as the waveguiding core. Thermal oxidation of silicon for 24-27 hours yielded graded-index
* A proprietary trade name is used to enable readers to reproduce the experiment; other glasses might work as well or better.
silica waveguides with losses of 0.3-0.4 dB/cm (Zelmon, Jackson, Boyd, Neumaan and Anderson [1983]). The thickness of the grown SiO2 layers varied from 14.4 to 15.8 µm. For these thicknesses, out-of-plane scattering was very low, and the measured transmission loss was attributed mostly to leakage of light to the high-index silicon substrate (Zelmon, Jackson, Boyd, Neumaan and Anderson [1983], Boyd, Wu, Zelmon, Neumaan, Timlin and Jackson [1985]). In subsequent works, phosphosilicate glasses were used to form the guiding layer; they were deposited by CVD through co-oxidation of silane and phosphorus at 400 to 500°C under atmospheric pressure (Neumaan and Boyd [1980, 1981]). By doping with phosphorus, the refractive index of silica can be increased by a few percent. Since optical fibers are made from similar host materials (Pal [1979]), it is possible to tailor the mode-field profile of such waveguides to match closely the LP01-mode profile of silica-based optical fibers. This will ensure good power-coupling efficiency between the waveguide and a single-mode fiber. This advantage of phosphosilicate waveguides has motivated extensive investigations of the fabrication of low-loss phosphosilicate channel waveguides on silicon (Grand, Jadot, Denis, Valette, Fournier and Grouillet [1990]). The composite structure of these waveguides is Si/SiO2/P-doped SiO2/SiO2 (Valette, Gidon and Jadot [1987], Valette [1987, 1988], Valette, Renard, Denis, Jadot, Fournier, Philippe, Gidon, Grouillet and Desgranges [1989], Grand, Jadot, Denis, Valette, Fournier and Grouillet [1990], Valette, Renard, Jadot, Gidon and Erbeia [1990]). All silica and phosphorus-doped silica layers are deposited through a plasma-enhanced CVD (PECVD) process.
Two major attributes of the PECVD technology are that it is a relatively low-temperature process (≈800°C), compatible with well-established microelectronics processing, and that it yields a high average deposition rate of about 40 nm/min (Valette, Jadot, Gidon, Renard, Grand, Fournier, Grouillet, Philippe, Denis, Desgranges, Mulatier and Erbeia [1991]). These waveguides exhibited losses of less than 0.2 dB/cm at 0.633, 0.8 and 1.3 µm wavelengths. However, to achieve low loss at 1.55 µm, the waveguides require thermal annealing at an elevated temperature of about 1000°C for about 3 h. The reproducibility of the waveguides is achieved by maintaining phosphorus doping (at various levels) during deposition of all the layers. During deposition of the core, the phosphorus doping level is kept at 5-10%, whereas for the buffer and cover layers it is between 2 and 3% (Grand, Jadot, Denis, Valette, Fournier and Grouillet [1990]). Typically, a phosphine flow rate of 4 cm³/min leads to a phosphorus doping of about 3% by mass in silica (Valette, Renard, Denis, Jadot, Fournier, Philippe, Gidon, Grouillet and Desgranges [1989]). Dry
etching of the core layers with CHF3 has been used to form low-loss channel waveguides using this technology. The modal spot size is optimized for coupling to fibers at 1.55 µm; such an optimized waveguide structure is shown in fig. 12 (Valette, Jadot, Gidon, Renard, Grand, Fournier, Grouillet, Philippe, Denis, Desgranges, Mulatier and Erbeia [1991]). Fiber-to-waveguide coupling efficiency was also the motivating factor for the fabrication of phosphosilicate glass waveguides on silicon reported by Henry, Blonder and Kazarinov [1989]. In their technology a buffer silica layer of about 15 µm is grown through rapid oxidation of the silicon substrate under high-pressure steam. The phosphosilicate core layers (4-5 × 7 µm²) are deposited by incorporating 6.5-8% of phosphorus in silica through low-pressure chemical vapor deposition (LPCVD) at 680°C. The process involves a chemical reaction between tetraethylorthosilane, ammonia, and phosphine. The cover (~5 µm) consists of phosphorus-doped (~2% P) silica layers, deposited through LPCVD at about 380°C from the same raw materials. Finally, after the depositions are completed, the waveguide is annealed at about 1000°C to relieve strain and to densify the film. The measured fiber-to-fiber coupling losses for these phosphosilicate glass core waveguides at 1.3 and 1.5 µm are shown in fig. 13 (Henry, Blonder and Kazarinov [1989]). Silicon-based waveguides suitable for optical coupling from laser diodes are based on silicon nitride core layers (Henry, Kazarinov, Lee, Orlowsky and Katz [1987], Henry, Blonder and Kazarinov [1989], Shani, Henry, Kistler, Orlowsky and Ackerman [1989]). These waveguides are characterized by a tightly confined modal field due to a relatively large index difference (Δn ≈ 0.55) between the core and the surrounding medium. A silica buffer layer of about 5 µm thickness, which is sufficient to reduce the leakage
Fig. 12. Schematic of planar and channel waveguide geometries with phosphosilicate glass for optimum coupling to optical fibers. (Reproduced from Valette, Jadot, Gidon, Renard, Grand, Fournier, Grouillet, Philippe, Denis, Desgranges, Mulatier and Erbeia [1991] by permission of Kluwer Academic Publishers.)
Fig. 13. Measured fiber-to-fiber loss at 1.3 and 1.5 µm through a phosphosilicate core glass waveguide as a function of the length of the waveguide; new data indicate more recent results on a waveguide, the cross-section of which is shown in the inset. (Reproduced by permission of IEEE from Henry, Blonder and Kazarinov [1989], © 1989 IEEE.)
loss, is realized through deposition of phosphorus-doped glass (~8% of P), followed by annealing at 1000°C, or through high-pressure oxidation (about 2.45 × 10⁶ Pa at 950°C). To form the core layers of silicon nitride (n ≈ 1.97 at 1.2-1.6 µm), LPCVD is employed with dichlorosilane, ammonia, and oxygen as the raw materials (Henry, Blonder and Kazarinov [1989]). Two-dimensional confinement of guided light is achieved by chemical etching with hot (~174°C) phosphoric acid to form a mesa rib waveguide (~4 × 0.12 µm²) on silicon nitride; the etch rate is about 0.1 nm/s (Henry, Kazarinov, Lee, Orlowsky and Katz [1987]). The core is covered with a plasma-deposited, 0.8 µm thick, silica superstrate. With these waveguides, losses less than 0.3 dB in the 1.3-1.6 µm range were reported. Absorption peaks at 1.4 and 1.52 µm associated with hydrogen in the silica and silicon nitride layers, similar to the ones that occur in optical fibers, were identified. These peaks could be reduced substantially, however, by annealing the waveguides at 1100-1200°C. The loss spectrum of a 4 µm wide silicon nitride rib waveguide is shown in fig. 14; the inset shows a Bragg reflector grating (see § 4) on top of the waveguide (Henry, Blonder and Kazarinov [1989], Henry [1991]). Waveguide refractive indices must often be known to an accuracy of a few parts in 10⁴. For example, for a Bragg filter based on a waveguide with a core-cladding index difference of about 0.006, a change of refractive index by $10^{-3}$ will shift the resonant wavelength of the filter by
Fig. 14. Loss spectrum of a SiO2/Si3N4/SiO2 rib waveguide of 4 µm width. The inset shows a Bragg reflector grating developed on the cover. (Reproduced with permission of IEEE from Henry, Blonder and Kazarinov [1989], © 1989 IEEE.)
1 nm. Extensive experimental data for the refractive index dispersion of thermally deposited silica, phosphorus-doped silica, and silicon nitride glasses were reported by Lee, Henry, Orlowsky and Kometani [1988]. Fabrication of both planar and channel waveguides based on the composite structure Si/SiO2/Si3N4/SiO2 was also reported from France (Valette, Renard, Denis, Jadot, Fournier, Philippe, Gidon, Grouillet and Desgranges [1989], Valette, Jadot, Gidon, Renard, Grand, Fournier, Grouillet, Philippe, Denis, Desgranges, Mulatier and Erbeia [1991]). The silica buffer layer (~2 µm) is obtained by thermal oxidation of silicon, whereas the overlayer of silica is obtained through PECVD. The guiding layer of silicon nitride (n = 2.014 at 0.633 µm, and n = 1.997 at 0.8 µm) is relatively thin (typically, 0.08-0.22 µm) and is deposited by LPCVD. The fundamental mode spot size in these waveguides is small; it typically varies between 0.5 and 2 µm. The design key behind this technology is to produce a change in the effective index of the guided mode through controlled dry ion etching of the silica overlayer. Figure 15a reproduces from Valette, Jadot, Gidon, Renard, Grand, Fournier, Grouillet, Philippe, Denis, Desgranges, Mulatier and Erbeia [1991] a curve depicting the change in $n_{\mathrm{eff}}$ (= $\Delta n_{\mathrm{eff}}$), with the thickness of the silica overlayer as a variable. Physically, the nature of the curve can be understood from the fact that varying the thickness of the overlayer modifies the evanescent tail of the modal field distribution. For example, saturation of $\Delta n_{\mathrm{eff}}$ occurs when the silica overlayer thickness
Fig. 15. (a) Theoretical change in the effective index of a single-mode SiO2/Si3N4/SiO2 waveguide as a function of the thickness of the SiO2 overlayer; (b) theoretical change in the effective index of the same waveguide with the thickness of the Si3N4 core layer. (Both figures reproduced from Valette, Jadot, Gidon, Renard, Grand, Fournier, Grouillet, Philippe, Denis, Desgranges, Mulatier and Erbeia [1991] by permission of Kluwer Academic Publishers.)
exceeds the characteristic penetration depth of the mode's evanescent tail. Alternatively, modification of $\Delta n_{\mathrm{eff}}$ can be induced by using another dielectric of refractive index less than that of silica at the etched region. Furthermore, the change in $n_{\mathrm{eff}}$ with respect to a variation in the silicon nitride layer thickness is polarization sensitive, as shown in fig. 15b. The fundamental mode spot size in these waveguides is relatively small, and hence these waveguides are more suitable for coupling to laser diodes than to single-mode fibers.
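The effect of spot-size mismatch on fiber coupling can be estimated with a simple Gaussian overlap. The sketch below is only a rough guide under several assumptions that are not taken from the text: both modes are treated as Gaussians with 1/e field radii, they are perfectly aligned with flat phase fronts, and Fresnel reflection at the facet is ignored. The fiber mode radius of 5 µm and the function name are hypothetical.

```python
import math


def gaussian_coupling(w_x, w_y, w_fiber):
    """Power coupling between an elliptical Gaussian mode (1/e field radii w_x, w_y)
    and a circular Gaussian fiber mode of radius w_fiber, perfectly aligned.

    Each transverse direction contributes a factor 2*w*w_f / (w^2 + w_f^2).
    """
    eta_x = 2.0 * w_x * w_fiber / (w_x ** 2 + w_fiber ** 2)
    eta_y = 2.0 * w_y * w_fiber / (w_y ** 2 + w_fiber ** 2)
    return eta_x * eta_y


w_fiber = 5.0  # assumed fiber mode radius in µm
for spot in (0.5, 1.0, 2.0, 5.0):  # waveguide spot sizes in µm, treated here as radii
    eta = gaussian_coupling(spot, spot, w_fiber)
    print(f"spot {spot:3.1f} µm: coupling {eta:.3f} ({-10.0 * math.log10(eta):.1f} dB loss)")
```

Under these assumptions a spot of 1 µm or less couples only a small fraction of its power into a standard fiber mode, which is consistent with the remark that thin silicon nitride guides suit laser diodes better than single-mode fibers.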
The well-known flame hydrolysis deposition technology of fiber preform fabrication is the basis of an alternative approach to realize several guided-wave components on silicon (Kawachi [1990, 1991]). In this method (Kawachi, Yasu and Edahiro [1983]) a combination of flame hydrolysis deposition (FHD) and reactive ion etching (RIE) is employed to produce channel waveguides with modal field profiles matched to optical fibers. Raw materials in the form of a mixture of silicon tetrachloride (SiCl4) and titanium tetrachloride (TiCl4), or germanium tetrachloride (GeCl4) and SiCl4, are injected into an oxyhydrogen torch and react by flame hydrolysis to produce doped-silica soot, which is deposited on the silicon wafers. A large number of 7.62 cm silicon wafers (up to about 30) can be placed on a turntable of about 100 cm diameter to collect the doped-silica particles. The refractive index of these synthesized glass particles can be controlled through variation in the TiCl4/GeCl4 flow rates. Initially, only SiCl4 is fed into the flame to deposit silica soot to form the buffer layer before the core layers of doped silica are deposited. The deposited porous structure of buffer and core layers is then consolidated by heating in a separate electric furnace at about 1200 to 1300°C. A planar structure is thus formed. Typically, the buffer layer is about 20 µm thick and the core layer about 8 µm. The processing steps sequentially involved in this technology are shown in fig. 16. To form a ridge for channel waveguides, an overlayer of about 2 µm of amorphous silicon (a-Si) is deposited on top of the planar waveguide through magnetron sputtering. The ridge pattern is defined by conventional photolithography, followed by RIE of the overlayer with CBrF3 gas. Subsequently, RIE with a mixture of C2F6 and C2H4 is carried out to etch away the deposited layers, except in the photolithographically defined ridge region, until the buffer layer is exposed.
The core ridge is thus formed and then covered by depositing a thick overlayer of silica through flame hydrolysis. The final product is a buried waveguide (Kawachi [1990]). The thick overlayer enables easy attachment of optical fiber arrays to the waveguide and dicing of the waveguide without damaging the core. Typically, for a single-mode waveguide, the core size is 8 × 8 µm² and the core-cladding index difference (Δn) is 0.25% (Kawachi [1990]). Transmission loss in these waveguides is about 0.1 dB/cm. For components that require high resistance to bending, small cores (6 × 6 µm²) with a relatively high Δn (up to 0.75%) are also reported (Takato, Jinguji, Yasu, Toba and Kawachi [1988]). These waveguides exhibit a somewhat larger transmission loss of about 0.3 dB/cm. These numbers are for cores made of titania-doped silica. Germania (GeO2), whose melting temperature of 1086°C is lower than
Fig. 16. Schematic of the flame hydrolysis deposition technology and its various intermediate steps. (After Takato, Jinguji, Yasu, Toba and Kawachi [1988], © 1988 IEEE.)
the value of 1850°C for titania, has been used to fabricate a long (~40 cm) single-mode waveguide having Δn ≈ 0.75% with fourteen 90° bends, each of 5 mm radius, on a silicon wafer (Kominato, Ohmori, Okazaki and Yasu [1990], Kominato, Ohmori and Onose [1991]). Small amounts of phosphorus pentoxide (P2O5) and boron trioxide (B2O3), if added during deposition of both the core and the cladding, help to reduce the consolidation temperature of the deposited glass (Kominato, Ohmori, Okazaki and Yasu [1990]). Transmission loss, including the bend loss, was measured to be about 0.04 dB/cm at 1.55 µm (Kominato, Ohmori and Onose [1991]). Estimated loss in the straight part of the waveguide was about 0.01 dB/cm. The FHD technique can be readily scaled up to produce highly multimode waveguides with cross-sections of about 40 × 40 µm² and to realize integration of various optoelectronic components (Kawachi, Yasu and Edahiro [1983], Kawachi, Yamada, Yasu and Kobayashi [1985], Terui, Yamada, Kawachi and Kobayashi [1985]). Since the mismatch in the thermal expansion coefficients between doped silica (~0.5 × 10⁻⁶ K⁻¹) and silicon (≈2.5 × 10⁻⁶ K⁻¹) is large, we might expect crack formation with such thick
silica layers. However, stress in silica waveguides is compressive, and this prevents cracking of the waveguides, although it leads to a birefringence of about 4 × 10⁻⁴ between the TM and TE modes (Kawachi [1990]). Two different methods have been proposed to control the birefringence between the TE and TM modes in a silica-based waveguide on silicon. In one method a pair of grooves is etched (through RIE) symmetrically around the waveguiding ridge along its length in the cladding (Kawachi, Takato, Jinguji and Yasu [1987]). These grooves lead to a release of stress on the ridge; birefringence reduces to 0 as the ridge width decreases from about 500 to 50 µm (Kawachi [1990]). Alternatively, birefringence can be controlled by depositing a magnetron-sputtered 6 µm thick film of a-Si on top of the overcladding (Kawachi [1990]). The residual stress of the sputtered a-Si film modifies the birefringence of the waveguide. Once again, the magnitude of the resultant birefringence is determined by the width of the a-Si film. Birefringence increases to about 5.5 × 10⁻⁴ for an a-Si strip width of 50 µm, and then decreases to 2.5 × 10⁻⁴ as the strip width increases to about 200 µm (Kawachi [1990]). Microwave (2.45 GHz) plasma-assisted CVD at about 1000°C was also used to form germania-doped silica and silicon nitride waveguides on a silica substrate (Nourshargh, Starr and McCormack [1986], Nourshargh, Starr and Ong [1989]). In another recent investigation (Sun, Myers, Schmidt and Sumida [1991]), a germania-doped silica channel waveguide formed on silicon through FHD and RIE was transformed into a channel waveguide of circular cross-section by selective etching of the waveguide in a buffered hydrofluoric acid (HF) solution (containing a mixture of 49% HF and 40% NH4F solutions).
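The per-centimetre loss figures quoted for FHD waveguides are easier to compare once they are converted into a total loss and a transmitted power fraction for a given length. The following is a minimal sketch of that arithmetic; the function names are illustrative and not taken from the cited work.

```python
def total_loss_db(loss_db_per_cm: float, length_cm: float) -> float:
    """Total propagation loss in dB for a waveguide of the given length."""
    return loss_db_per_cm * length_cm


def transmitted_fraction(loss_db: float) -> float:
    """Fraction of optical power remaining after a loss given in dB."""
    return 10.0 ** (-loss_db / 10.0)


if __name__ == "__main__":
    # Loss values quoted in the text for a 40 cm long waveguide.
    for label, loss_per_cm in [
        ("germania-doped, bends included", 0.04),
        ("titania-doped, low index difference", 0.1),
        ("titania-doped, high index difference", 0.3),
    ]:
        loss = total_loss_db(loss_per_cm, 40.0)
        print(f"{label}: {loss:.1f} dB, fraction {transmitted_fraction(loss):.3f}")
```

Over the 40 cm path, 0.04 dB/cm amounts to 1.6 dB, so roughly two-thirds of the launched power would survive, whereas 0.3 dB/cm over the same length would amount to 12 dB.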
In view of the circular cross-section of the core, the coupling loss from fiber-to-waveguide-to-fiber drops to about 0.5 dB from 1.8 dB, which is typical for a fiber-to-rectangular core-to-fiber coupling (Sun, Myers, Schmidt and Sumida [1991]). Silicon in combination with silica is used to form multilayer ARRO waveguides, the functional principle of which was discussed in § 2. A second cladding of about 2 µm thickness is grown on the silicon wafer by thermal oxidation (fig. 9a). Subsequently, a thin (~0.1 µm) layer of poly-Si or titania or silicon nitride is deposited by CVD to form the highly reflecting layer. The silica core (~4 µm) is finally deposited on the top by LPCVD. Loss of about 0.4 dB/cm at 1.3 µm for TE modes has been reported (Duguay, Kokubun, Koch and Pfeiffer [1986]). The choice of materials for the first cladding depends on the wavelength. For wavelengths less than 0.9 µm, titania (n ≈ 2.1) is the preferred material because poly-Si is highly absorbing at these wavelengths, whereas for wavelengths longer than 1.1 µm, poly-Si
Fig. 17. Optical absorption spectrum of n-type crystalline silicon; donor concentration N_D is 10¹⁶ cm⁻³. Horizontal axis: wavelength in µm, from 1.0 to 1.6. (Reprinted with permission of Solid State Technology from Soref and Lorenzo [1988].)
(n ≈ 4.5) is more suitable (Kokubun, Baba, Sasaki and Iga [1986]). On the other hand, silicon nitride is expected to be useful in both wavelength ranges (Pal, Singh, Ghatak and Bhattacharya [1990]). In a slightly different configuration (fig. 9c), the first cladding is replaced by a 0.3 µm layer of NA45 glass* of refractive index 1.54 to provide total internal reflection in contrast to pure reflection that occurs in a normal ARRO waveguide (Baba and Kokubun [1989]). In this configuration, which is called an ARRO-B waveguide, all the layers are deposited by radiofrequency magnetron sputtering, and the measured losses for TE0 and TM0 modes were 0.5 and 0.7 dB/cm, respectively, at 0.633 µm (Baba and Kokubun [1989]). All the silicon-based waveguides reported to date used silicon only as a support material. However, due to its high transparency between 1.2 and 1.6 µm (fig. 17), low-loss waveguide components in silicon are attractive for optical fiber systems. Significant work using silicon as a waveguide was reported by Soref and Lorenzo [1985, 1986, 1988] and Soref and Bonnett [1987], exploiting the fact that injection of free carriers reduces the refractive index of a semiconductor. However, the presence of free carriers also induces an increase in absorption coefficient. Free-carrier-induced decrease in the refractive index and increase in the absorption coefficient are given by (Moss [1959], Soref and Bonnett [1987], Lubberts, Burkey, Moser and Trabka [1981])
* A proprietary trade name is used to enable the readers to reproduce the experiment; other glasses might work as well or better.
Fig. 18. Carrier-induced change in the complex refractive index n + ik of silicon with: (a) hole concentration (cm⁻³); (b) number of e-h pairs (cm⁻³). (Reprinted with permission of Solid State Technology from Soref and Lorenzo [1988].)
\[
\Delta n = -\frac{q^{2}\lambda^{2}}{8\pi^{2}c^{2}n\varepsilon_{0}}\left[\frac{N_{e}}{m_{ce}^{*}} + \frac{N_{h}}{m_{ch}^{*}}\right], \tag{3.1}
\]

and

\[
\Delta\alpha = \frac{q^{3}\lambda^{2}}{4\pi^{2}c^{3}n\varepsilon_{0}}\left[\frac{N_{e}}{m_{ce}^{*2}\mu_{e}} + \frac{N_{h}}{m_{ch}^{*2}\mu_{h}}\right], \tag{3.2}
\]

where q, λ, ε0, and n are the electronic charge, optical wavelength, free-space permittivity, and refractive index of pure silicon, and c is the free-space velocity of light. N_e, m*_ce, and µ_e represent free-electron concentration, conductivity effective mass of an electron, and electron mobility, respectively; the corresponding quantities with h in the subscript represent the same characteristic quantities for holes. Free-carrier-induced changes in the real (Δn) and imaginary (Δk) parts of the refractive index, as well as the absorption coefficient (Δα), are plotted as a function of carrier concentration in figs. 18a and b. Thus, optical waveguides can be formed by the epitaxial growth of a layer of lightly doped silicon on a heavily doped silicon substrate. The epilayer forms the core of an optical waveguide suitable for operation at 1.3 µm (Soref and Lorenzo [1985], Soref and Bonnett [1987]). To induce a sufficient change in the refractive index of silicon through carrier doping, carrier concentrations greater than 10¹⁶ cm⁻³ are required. Typically, for N_e = 10'' cm⁻³, Δn is about
-0.9 x at 1.3 µm (Soref and Bonnett [1987]). Fabrication of several slab and channel waveguides with n on n⁺, p on p⁺, n on p⁺, and p on n⁺ has been reported. At 1.3 µm, these multimode waveguides exhibit loss ranging from 5 to 13 dB/cm in the slab and from 15 to 10 dB/cm in the rib channel geometry (Soref and Bonnett [1987], Hall [1987]). Another method of forming optical waveguides in silicon involves the separation by implanted oxygen (SIMOX) technique (Hall [1987], Kurdi and Hall [1988]). Fabrication of a planar waveguide by this technique was recently reported by Weiss, Reed, Toh, Soref and Namavar [1991]. Oxygen ions at a dose of 1.6 × 10¹⁸ cm⁻² and an energy of 160 keV are implanted in silicon, followed by thermal annealing for 6 h at 1300°C. Ion implantation leads to a 0.4 µm thick oxide layer buried under a 0.15 µm silicon layer. Subsequently, a 2 µm thick doped-silicon layer with a carrier concentration of about 10'' cm⁻³ is grown by CVD over the top surface. This layer of doped silicon forms the cover over the intermediate core of undoped silicon (Weiss, Reed, Toh, Soref and Namavar [1991]). Thus, the implanted silica layer essentially separates the core and the substrate, both of which are made of silicon, to prevent leakage loss. Loss measurements made at 1.15 and 1.523 µm with He-Ne lasers indicated a loss minimum of 8 dB/cm for the TE0 mode at 1.15 µm (Weiss, Reed, Toh, Soref and Namavar [1991], Weiss and Reed [1991]). Fabrication of single-mode waveguides in silicon through in-diffusion of Si-Ge alloys was also reported (Splett, Schmidtchen, Schüppert and Petermann [1990]). The process involved diffusion of a GexSi1-x alloy into a well-defined section of the silicon wafer at 1200°C for 65 h; typically, x = 0.5. Single-mode ridge waveguides having x = 0.01 were fabricated through anisotropic etching with a mixture of 100 g of KOH and 100 mL of H2O at 60°C. The best transmission loss was about 3 dB/cm at 1.3 µm (Splett, Schmidtchen, Schüppert and Petermann [1990]).
Fabrication of buried optical waveguides using electron beam irradiation of silica in a slab geometry was recently reported (Barbier, Green and Madden [1991]). An electron energy of about 25 keV was used to irradiate the samples, leading to a waveguide depth of 7.5 µm. The lowest measured loss was about 0.3 dB/cm at 0.6328 µm.
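As a numerical illustration of the free-carrier relations (3.1) and (3.2) used earlier in this section, the sketch below evaluates both expressions in SI units. The effective masses (0.26 and 0.39 times the free-electron mass), the mobilities, and the example carrier concentration are assumed values chosen only for illustration; they are not quoted in the text, and the function names are hypothetical.

```python
import math

# Physical constants (SI units)
Q = 1.602176634e-19       # electronic charge, C
C = 2.99792458e8          # free-space velocity of light, m/s
EPS0 = 8.8541878128e-12   # free-space permittivity, F/m
M0 = 9.1093837015e-31     # free-electron mass, kg


def delta_n(wavelength_m, n, n_e_cm3, n_h_cm3, m_ce=0.26, m_ch=0.39):
    """Refractive index change from eq. (3.1).

    m_ce and m_ch are conductivity effective masses in units of M0
    (assumed values, not taken from the text).
    """
    prefactor = Q**2 * wavelength_m**2 / (8 * math.pi**2 * C**2 * n * EPS0)
    n_e = n_e_cm3 * 1e6   # cm^-3 -> m^-3
    n_h = n_h_cm3 * 1e6
    return -prefactor * (n_e / (m_ce * M0) + n_h / (m_ch * M0))


def delta_alpha_per_cm(wavelength_m, n, n_e_cm3, n_h_cm3,
                       mu_e_cm2, mu_h_cm2, m_ce=0.26, m_ch=0.39):
    """Absorption coefficient change from eq. (3.2), returned in cm^-1.

    Mobilities are given in cm^2/(V s) and converted to m^2/(V s).
    """
    prefactor = Q**3 * wavelength_m**2 / (4 * math.pi**2 * C**3 * n * EPS0)
    n_e = n_e_cm3 * 1e6
    n_h = n_h_cm3 * 1e6
    mu_e = mu_e_cm2 * 1e-4
    mu_h = mu_h_cm2 * 1e-4
    per_metre = prefactor * (n_e / ((m_ce * M0) ** 2 * mu_e)
                             + n_h / ((m_ch * M0) ** 2 * mu_h))
    return per_metre / 100.0


if __name__ == "__main__":
    # Hypothetical example: 1e17 cm^-3 free electrons, no holes, at 1.3 um.
    print(delta_n(1.3e-6, 3.5, 1e17, 0.0))
    print(delta_alpha_per_cm(1.3e-6, 3.5, 1e17, 0.0, 1000.0, 400.0))
```

Both expressions are linear in the carrier concentrations, so a tenfold increase in doping scales Δn and Δα by the same factor within this simple model.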
§ 4. Guided-Wave Optical Components on Silicon
Although the technology of silicon-based integrated optics is relatively new, several functional components and devices have been realized and
reported. They include Fresnel lenses, beam splitters, dispersive mirrors, couplers, polarization devices, spectrum analyzers, displacement sensors, refractive index sensors, wavelength and frequency division multiplexers and demultiplexers, Bragg cells, optical switches, Y branches, X crosses, directional couplers, 8 × 8 star couplers, waveguide arrays, phase shifters, birefringence controllers, resonators, Bragg reflector lasers, and Bragg reflection filters. By means of photolithography two doped-silica FHD waveguides separated by only a few micrometers can be designed on silicon to form a directional coupler (fig. 7) (Kawachi [1990], Okamoto, Takahashi, Suzuki, Sugita and Ohmori [1991]). The intermediate region between the two cores is easily filled with cladding glass during the consolidation step of the FHD process. These couplers are polarization insensitive. Several such guided-wave optical beam splitters suitable for splitting, redirecting, tapping, and combining optical signals have been fabricated with FHD waveguides on silicon (fig. 19) (Kawachi [1990]). The excess loss due to coupling from fiber-to-waveguide-to-fiber in the Y configuration of the beam splitter is about 1 ± 0.5 dB in the 1.2 to 1.6 µm wavelength range. In the directional coupler configuration (fig. 19b), the high wavelength selectivity of the directional couplers allows the beam splitter to operate at 1.3 or 1.55 µm with an excess loss of less than 1 dB. Wavelength division multiplexers and demultiplexers
Fig. 19. Schematic of a silica-based, single-mode (1 × 8) optical waveguide beam splitter in two different configurations: (a) Y-shaped branch (Y-splitters), (b) directional couplers. (After Kawachi [1990].)
based on directional couplers can be constructed either in a single coupler configuration or by combining two directional couplers along the z-direction in a Mach-Zehnder interferometer on silicon (Kobayashi, Kito, Yasu and Kawachi [1989], Kawachi [1990], Jinguji, Takato, Sugita and Kawachi [1990]). In the single-coupler geometry, which is more suitable for realizing large wavelength separation (1.3 and 1.5 µm), the coupler is designed to have zero coupling at one wavelength (λ1) and 100% coupling at the second wavelength (λ2). Thus, as shown in fig. 20a, light at λ1 will exit through the output port of the first waveguide and light at λ2 will exit through the output port of the second waveguide. In the second configuration two directional couplers are joined through two intermediate waveguide arms with a small path difference ΔL between them (fig. 20b). To obtain good wavelength multiplexing and demultiplexing efficiency for arbitrary combinations of λ1 and λ2, submicrometer accuracy is required in ΔL. This accuracy is readily attained through photolithography during the mask preparation (Kawachi [1990]). The temperature coefficient of the refractive index of silica, dn/dT, is about 10⁻⁵ per °C. Thus, a temperature increase of 6.5°C will induce a phase shift of π in the guided mode in a 10 mm silica waveguide. Typically the power consumption will be only 0.5 W. A thin-film chromium heater can be loaded on top of a single-mode buried silica waveguide on silicon (as shown in
Fig. 20. Two different directional coupler configurations for realizing wavelength-division multiplexer or demultiplexer: (a) single directional coupler, (b) Mach-Zehnder interferometer with an extra length ΔL in one arm. (After Kawachi [1990].)
fig. 21) to realize such an in-line thermo-optic phase shifter. Optical frequency division multiplexers or demultiplexers have been fabricated by loading a Mach-Zehnder waveguide interferometer in silica with such a thin-film chromium heater (Toba, Oda, Nosu and Takato [1989], Takato, Kominato, Sugita, Jinguji, Toba and Kawachi [1990]). As shown in fig. 21, two 3 dB waveguide couplers are combined through two waveguide arms with a path difference of ΔL between them. A thin-film heater is loaded on one arm to induce an additional phase shift and, hence, a corresponding additional optical path length difference between the two arms to obtain frequency tuning in the demultiplexer. The path length difference ΔL determines the frequency spacing Δf through (Kawachi [1990])

\[
\Delta f = \frac{c}{2n\,\Delta L}, \tag{4.1}
\]
where n is the refractive index of the waveguide and c is the free-space velocity of light. A path difference (ΔL) of 17 mm will cause Δf to be about 6 GHz; Δf is related to the wavelength spacing Δλ through

\[
\Delta\lambda = \frac{\lambda^{2}}{c}\,\Delta f. \tag{4.2}
\]
Fig. 21. Frequency division multiplexer or demultiplexer with thermo-optic phase shifter on a silica-based Mach-Zehnder waveguide interferometer (labels: thin-film heater; directional couplers; ports 1 to 4). (After Takato, Jinguji, Yasu, Toba and Kawachi [1988], © 1988 IEEE.)
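The numbers quoted for the thermo-optic phase shifter and the Mach-Zehnder frequency division multiplexer can be cross-checked with a few lines of arithmetic. The sketch below assumes a waveguide index of 1.45 and an operating wavelength of 1.3 µm for the phase-shifter estimate; neither value is stated in the text. The wavelength-spacing function uses the standard small-interval relation between frequency and wavelength, and all function names are illustrative.

```python
C = 2.99792458e8  # free-space velocity of light, m/s


def frequency_spacing_hz(n: float, delta_l_m: float) -> float:
    """Frequency spacing of the Mach-Zehnder multiplexer, eq. (4.1)."""
    return C / (2.0 * n * delta_l_m)


def wavelength_spacing_m(delta_f_hz: float, wavelength_m: float) -> float:
    """Wavelength spacing for a small frequency spacing: lambda^2 * df / c."""
    return wavelength_m**2 * delta_f_hz / C


def temperature_rise_for_pi(wavelength_m: float, length_m: float,
                            dn_dt_per_k: float) -> float:
    """Temperature rise giving a pi phase shift over a heated length.

    From (2*pi/lambda) * (dn/dT) * dT * L = pi.
    """
    return wavelength_m / (2.0 * length_m * dn_dt_per_k)


if __name__ == "__main__":
    df = frequency_spacing_hz(n=1.45, delta_l_m=17e-3)       # assumed n
    print(f"frequency spacing: {df / 1e9:.2f} GHz")
    dl = wavelength_spacing_m(6e9, 1.55e-6)
    print(f"wavelength spacing: {dl * 1e9:.3f} nm")
    dt = temperature_rise_for_pi(1.3e-6, 10e-3, 1e-5)         # assumed 1.3 um
    print(f"temperature rise for pi: {dt:.2f} K")
```

With these assumptions the three results agree with the values given in the text: about 6 GHz, about 0.05 nm, and 6.5°C.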
A value of Δf of 6 GHz amounts to a wavelength spacing of about 0.05 nm in the 1.55 µm wavelength band. Frequency tuning in these multiplexers or demultiplexers can be obtained through modulation of the optical path-length difference between the two arms of the interferometer by means of the phase shifter. This thermo-optic effect can also be used to perform switching operations in an otherwise passive silica waveguide, although the switching time is slow (~2 ms) (Takato, Jinguji, Yasu, Toba and Kawachi [1988], Kawachi [1990]). A thermo-optic switch is configured by combining two directional couplers through two intermediate waveguide arms as in the thermo-optic frequency division multiplexer. However, there is one difference. In contrast to loading one of the waveguides as in a frequency division multiplexer, both waveguide arms are loaded with thin-film heaters in a switch. In the absence of any electrical input to the phase shifters, the switch operates in the cross-state. When electrical power corresponding to a π phase shift is supplied to either of the phase shifters, the switch operates in the parallel state. In an alternative scheme a photo-induced change in the refractive index of germania-doped silica has been used in an asymmetric Mach-Zehnder waveguide interferometer-based frequency division multiplexer to obtain frequency tuning (Hibino, Kominato and Ohmori [1991]). A 20 W Ar⁺ ion laser is used to irradiate the waveguide for about an hour, which leads to a maximum refractive index change of about 4 x at 1.3 µm. The frequency shift (Δf) due to laser irradiation relative to the free spectral range (f_s = 25 GHz) of the multiplexer is described by (Hibino, Kominato and Ohmori [1991]),
\[
\frac{\Delta f}{f_{s}} = \frac{2\,\Delta n\,(L + \Delta L)}{\lambda}, \tag{4.3}
\]
where Δn is the photo-induced refractive index change in the lower arm of the asymmetric Mach-Zehnder interferometer, and L is the length of the shorter waveguide between the two 3 dB couplers of the interferometer. Photo-induced change in the refractive index has also been observed in titania-doped silica waveguides (Hibino, Abe, Kominato and Ohmori [1991]). A maximum change in the refractive index induced by an Ar⁺ laser was about 1 x in a Mach-Zehnder waveguide interferometer in which the concentration of titania in the core was about 0.3 mol%. In another experiment, the laser-induced change in the refractive index of a silicon oxynitride-silica waveguide was used to tune the optical power coupling ratio to any arbitrary value in a directional coupler (Gleine and Muller [1991]). A large number of silica-based optical circuits on a single silicon wafer can be combined
either in parallel or in series to form a variety of devices. Examples include a 128-channel selective optical frequency-division-multiplexed filter (Takato, Sugita, Onose, Okazaki, Okuno, Kawachi and Oda [1991]), an 8 × 8 matrix thermo-optic switch built from sixty-four 2 × 2 switching units (Sugita, Okuno, Matsunagu, Kawachi and Ohmori [1990]), and an 8-tap optical transverse filter (Kawachi [1991], Sasamaya, Okuno and Habara [1991]). Performance characteristics of a wide variety of FHD waveguide components including ring resonators on silicon were recently reviewed (Kawachi [1990, 1991]). FHD silica ridge waveguides in which germania and phosphorus pentoxide are used as co-dopants have also been reported to generate the second harmonic of a Q-switched Nd:YAG laser pump at 1.064 µm (Kashyap, Ainslie and Maxwell [1989]). Frequency doubling is due to a nonlinearity similar to the quadrupole interaction known to occur in silica fibers (Osterberg and Margulis [1986]). A 200-fold increase in the yield of the frequency-doubled radiation has been observed when the waveguide is seeded for about an hour with 0.532 µm light. Several other guided-wave components and devices based on waveguides on silicon, such as the spectrum analyzer, Fresnel lenses and mirrors, Bragg cell, displacement sensor, and liquid refractive index sensors, were reviewed by Valette, Jadot, Gidon, Renard, Grand, Fournier, Grouillet, Philippe, Denis, Desgranges, Mulatier and Erbeia [1991]. In the spectrum analyzer, 1.5 µm of ZnO, which is a piezoelectric material, is deposited by magnetron sputtering from a zinc target on top of a silicon nitride waveguide, leading to the composite structure Si/SiO2/Si3N4/SiO2/ZnO. Surface acoustic waves are generated and propagated through the ZnO film by feeding electrical signals to an interdigital transducer (IDT) laid on it, thereby creating a phase grating on the waveguide. The guided waves experience Bragg diffraction by this grating.
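The Bragg diffraction of a guided wave by the surface-acoustic-wave grating can be illustrated with the usual acousto-optic Bragg condition, sin θ_B = λ0 f / (2 n_eff v_a), where f is the drive frequency. The sketch below is only an illustration under assumed parameters: the effective index (1.6), the acoustic velocity (3000 m/s), the optical wavelength, and the focal length are hypothetical and are not taken from the devices described here.

```python
import math


def bragg_angle_rad(rf_frequency_hz: float, wavelength_m: float,
                    n_eff: float, acoustic_velocity_m_s: float) -> float:
    """Bragg angle inside the waveguide for a SAW grating driven at rf_frequency_hz."""
    s = wavelength_m * rf_frequency_hz / (2.0 * n_eff * acoustic_velocity_m_s)
    if not 0.0 <= s <= 1.0:
        raise ValueError("no Bragg angle exists for these parameters")
    return math.asin(s)


def spot_shift_m(f1_hz: float, f2_hz: float, focal_length_m: float,
                 wavelength_m: float, n_eff: float,
                 acoustic_velocity_m_s: float) -> float:
    """Approximate shift of the focused spot between two drive frequencies.

    The diffracted beam is deflected by twice the Bragg angle; for small
    angles the spot moves by focal length times the change in deflection.
    """
    t1 = bragg_angle_rad(f1_hz, wavelength_m, n_eff, acoustic_velocity_m_s)
    t2 = bragg_angle_rad(f2_hz, wavelength_m, n_eff, acoustic_velocity_m_s)
    return focal_length_m * 2.0 * (t2 - t1)


if __name__ == "__main__":
    # Hypothetical parameters for illustration only.
    theta = bragg_angle_rad(500e6, 0.835e-6, 1.6, 3000.0)
    print(f"Bragg angle: {math.degrees(theta):.2f} degrees")
    shift = spot_shift_m(500e6, 503e6, 20e-3, 0.835e-6, 1.6, 3000.0)
    print(f"spot shift for a 3 MHz step: {shift * 1e6:.1f} um")
```

Because the deflection angle grows almost linearly with drive frequency at small angles, each frequency component of the electrical signal maps to its own position in the focal plane.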
The spatial location of the diffracted light spot, scanned by means of a photodiode array, can be correlated with the frequency content of the electrical signal feeding the ZnO piezotransducer. The theory of such acousto-optic devices has been described by Ghatak and Thyagarajan [1989]. Two different versions of such a spectrum analyzer, in-line and folded types, were reported (Valette, Lizet, Mottier, Jadot, Gidon and Renard [1984], Mottier, Valette and Jadot [1986], Valette, Mottier, Lizet and Gidon [1986]). Typical characteristics of these devices operating at 0.835 µm are shown in table 2. The displacement sensor is essentially a Michelson interferometer in which all the optical components, such as lenses, beam splitter, mirrors, and phase shifters, are integrated on the silicon wafer. The interference pattern is detected by a pair of photodetectors butt-coupled to the
TABLE 2
Integrated optical spectrum analyzer: performances of in-line and folded types. (From Valette, Jadot, Gidon, Renard, Grand, Fournier, Grouillet, Philippe, Denis, Desgranges, Mulatier and Erbeia [1991].)

Device parameter        In-line type                         Folded type
Bandwidth (MHz)         180                                  480
Resolution (MHz)        6                                    3 (1.5-2)*
Central freq. (MHz)                                          600-700**
Acoustic mode           Rayleigh                             Sezawa
Integrated lenses       Classical Fresnel lenses             Curved Fresnel lenses
Focal length (mm)       10                                   20
Resolved spots          20                                   160
Dynamic range (dB)      25 (30)                              10-15 (25)
Access time             20 µs                                5 µs (2)
Photodetector array     Serial-parallel CCD, 2 × 10 pixels   Serial-parallel CCD, 16 × 10 pixels
Optical source          HLP 1400 (0.835 µm)                  HLP 1400 (0.835 µm)
* Numbers in parentheses indicate performances possible in the near future.
** Transducer array with 700 MHz central frequency leads to the generation of a third acoustic mode beyond 860 MHz.

wafer (Gidon, Valette and Schweizer [1985], Valette, Renard, Jadot, Gidon and Erbeia [1989], Valette, Renard, Jadot, Gidon and Erbeia [1990]). The working distance of this sensor is 10 cm, and it can make linear distance measurements with an accuracy of 0.1 µm. Furthermore, if a liquid of refractive index less than that of the cover silica layer is placed on the sensing arm of the interferometer, it will induce a phase change in the guided beam. As a result, a fringe shift in the interference pattern will occur, which can be correlated with the refractive index of the sample liquid (Valette, Renard, Jadot, Gidon and Erbeia [1990]). Several optical components, such as Fresnel lenses, mirrors (plane, parabolic, and elliptical), polarization converters or dividers, and phase shifters, have been reported on oxidized silicon substrates through partial or complete etching of the silicon overlayer (Mottier and Valette [1981], Valette, Morque and Mottier [1982], Gidon, Valette and Mottier [1985], Valette, Renard, Denis, Jadot, Fournier, Philippe, Gidon, Grouillet and Desgranges [1989]). U-grooves are etched in silicon to achieve low-loss (≈0.7 dB) connection between single-mode fibers and channel waveguides grown in silicon (Grand, Denis and Valette [1991]). The process uses RIE to create a slot of a width equal to the diameter of the fiber to be connected. This is followed by deep etching of silicon with a dry etching process. The fiber is eventually bonded
to the groove with a glue (Grand, Jadot, Valette, Denis, Fournier and Grouillet [1990]). A set of four U-grooves is formed to construct a four-channel wavelength multiplexer or demultiplexer (fig. 22) (Valette, Gidon and Jadot [1987]). In this device a waveguide Fresnel mirror is incorporated on the silicon substrate to disperse and focus the four wavelengths (channel separation ~20 nm around 1.5 µm) to four spatial locations at one edge of the substrate. The U-grooves are etched at these locations to form microguides and make optical coupling with the four optical fibers. Cross-talk between the channels is less than -15 dB (Grand, Jadot, Valette, Denis, Fournier and Grouillet [1990]). Recently, fabrication of a ring resonator operating at 0.8 µm was also reported. A finesse of 48 ± 1.5 has been achieved at a propagation loss of 0.028 ± 0.009 dB/cm; the effective diameter of the ring is about 3 cm (Bismuth, Gidon, Revol and Valette [1991]). A large number of novel components based on phosphosilicate or silicon nitride-core silica waveguides on silicon were reviewed by Henry, Blonder and Kazarinov [1989]. Examples include star couplers (Dragone, Henry, Kaminow and Kistler [1989], Dragone [1991], Dragone, Edwards and Kistler [1991]), polarization splitters (Shani, Henry, Kistler, Kazarinov and Orlowsky [1990]), four-channel Mach-Zehnder multiplexers (Verbeek, Henry, Olsson, Orlowsky, Kazarinov and Johnson [1988]), Bragg reflector lasers (Olsson, Henry, Kazarinov, Lee, Orlowsky, Johnson, Scotti, Ackerman and Anthony [1988]), and Bragg reflection filters (Henry, Shani, Kistler, Jewell, Pol, Olsson, Kazarinov and Orlowsky [1989a,b]). As shown in fig. 23, in their N × N star coupler geometry (N up to 40), an array of phosphosilicate
Fig. 22. Schematic of a wavelength division multiplexer or demultiplexer integrated with a waveguide Fresnel mirror (input label: λ1 + λ2 + λ3 + λ4). (Reproduced from Valette, Jadot, Gidon, Renard, Grand, Fournier, Grouillet, Philippe, Denis, Desgranges, Mulatier and Erbeia [1991] by permission of Kluwer Academic Publishers.)
glass cores merges into a free-space slab waveguide formed on the silicon substrate. The light injected by the array waveguides into the slab is collected by a corresponding array of waveguides on the other side of the slab. For a 19 × 19 star coupler, fiber-to-fiber loss was only about 3 dB more than the excess loss due to division by 19 (Henry, Blonder and Kazarinov [1989]). The polarization splitter consists of a Mach-Zehnder interferometer formed with phosphosilicate glass waveguides in which one of the arms has an additional 22 nm patch of silicon nitride (fig. 24) (Henry, Blonder and Kazarinov [1989]). The silicon nitride layer is added before deposition of the phosphosilicate glass. The silicon nitride layer is highly birefringent, and induces a larger phase shift in the TE polarization than in the TM polarization.
Fig. 24. Schematic of a polarization splitter based on a silica waveguide Mach-Zehnder interferometer, in which a patch of Si3N4 is introduced in one arm to produce a linear birefringence between the TE and TM modes (labels: Si3N4 layer; 3 dB couplers; TE and TM output ports). (Reproduced with permission of IEEE from Henry, Blonder and Kazarinov [1989], © 1989 IEEE.)
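The routing behaviour of a Mach-Zehnder interferometer built from two 3 dB couplers, on which both the polarization splitter of fig. 24 and the thermo-optic switch rely, can be sketched with a simple transfer-matrix calculation. The sketch assumes ideal, lossless 50:50 couplers and a phase difference applied to the upper arm only; it is an idealized model, not a description of the fabricated devices.

```python
import cmath
import math


def mach_zehnder_outputs(phase_difference_rad: float):
    """Return (bar_power, cross_power) for unit power launched into port 1.

    Each ideal 3 dB coupler maps (a1, a2) to
    ((a1 - 1j*a2)/sqrt(2), (-1j*a1 + a2)/sqrt(2)).
    """
    k = 1.0 / math.sqrt(2.0)
    a1, a2 = 1.0 + 0j, 0j

    # First 3 dB coupler
    b1 = k * (a1 - 1j * a2)
    b2 = k * (-1j * a1 + a2)

    # Extra phase accumulated in the upper arm
    b1 *= cmath.exp(1j * phase_difference_rad)

    # Second 3 dB coupler
    c1 = k * (b1 - 1j * b2)
    c2 = k * (-1j * b1 + b2)

    return abs(c1) ** 2, abs(c2) ** 2


if __name__ == "__main__":
    for phi in (0.0, math.pi, 2.0 * math.pi):
        bar, cross = mach_zehnder_outputs(phi)
        print(f"phase {phi:.3f} rad: bar {bar:.3f}, cross {cross:.3f}")
```

In this model a phase difference of 0 or 2π sends all the light to the cross port, while a phase difference of π returns it to the bar (parallel) port. A polarization splitter follows when the two polarizations see arm phase differences that differ by π.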
The length of the silicon nitride layer is adjusted to induce a 2π phase change in the upper arm so that the TE polarization crosses over to the lower port. The TM polarization is made to undergo a phase change of π only in the upper arm by adjusting the additional length ΔL (fig. 24). Thus, the TE polarization exits through the lower port while the TM mode exits through the other output port. The insertion loss is in the range of 1.6 to 2.0 dB, whereas suppression of the unwanted polarization is 14 to 21 dB. A polarization splitter that has an insertion loss of about 1.5 dB and uses a Y-branch waveguide made with phosphosilicate glass and silicon nitride cores was fabricated by Shani, Henry, Kistler, Kazarinov and Orlowsky [1990]. It is an adiabatic device based on an asymmetric Y-branch. A 7 µm phosphosilicate glass core is gradually tapered to a 5 µm waveguide 5 mm in length, which branches off adiabatically (over a length of 1 mm) into a pair of waveguides: a 55 nm silicon nitride core waveguide and a 7 µm phosphosilicate glass core waveguide. The TE mode can be made to branch to the nitride guide by an appropriate choice of the thickness of the nitride layer, which is birefringent, while the TM mode exits through the phosphosilicate glass waveguide. The silicon nitride waveguide eventually makes a transition to a phosphosilicate glass waveguide. Cross-talk for the unwanted polarization is in the range of -15 to -34 dB at 1.55 µm. A four-channel multiplexer with a channel separation of 7.7 nm was fabricated on silicon by combining three Mach-Zehnder interferometers formed with phosphosilicate glass waveguides, in which the fiber-to-fiber insertion loss is 2.5 dB (Verbeek, Henry, Olsson, Orlowsky, Kazarinov and Johnson [1988]). The multiplexer is almost polarization insensitive. The additional path length ΔL1 is chosen to yield a channel spacing of 14.4 nm between λ1 and λ2 (fig. 25). The other wavelengths λ3 and λ4 are chosen with the same channel spacing. The third Mach-Zehnder
Fig. 25. A four-channel wavelength multiplexer based on Mach-Zehnder interferometric elements in a doped-silica waveguide on silicon. (Reproduced with permission of IEEE from Verbeek, Henry, Olsson, Orlowsky, Kazarinov and Johnson [1988], © 1988 IEEE.)
interferometer combines all the wavelengths into a single channel. All the variables ΔL1, ΔL2, and ΔL3, which are required to a high precision in the multiplexer design, are controlled at the mask development stage. The fiber-to-fiber insertion loss is 2.5 dB. High-resolution Bragg reflection filters that operate at 1.3-1.6 µm require line and space features of the grating to be about 0.25 µm. A spatial frequency-doubling lithography (SFDL) technique with an excimer laser in the deep ultraviolet region is used for patterning such periodic features on silicon waveguides (Olsson, Henry, Kazarinov, Lee and Orlowsky [1987], Henry, Blonder and Kazarinov [1989], Henry, Shani, Kistler, Jewell, Pol, Olsson, Kazarinov and Orlowsky [1989a,b]). The SFDL technique can be used to generate a number of gratings simultaneously on the same mask. In another set of experiments, holographic techniques are employed to pattern gratings of a period of about 0.5 µm on the top surface of a silicon nitride waveguide to form a silicon chip Bragg reflector (fig. 14) (Ackerman, Kwo, Silva and Wagner [1988]). When such a reflector is coupled to a laser diode, single-mode operation of the laser can be attained near the Bragg wavelength (Henry, Blonder and Kazarinov [1989], Olsson, Henry, Kazarinov, Lee, Orlowsky, Johnson, Scotti, Ackerman and Anthony [1988]). Under certain operating conditions, temperature variations can cause a great reduction in the line width of these lasers. Typically, Δν is 1-50 MHz (Henry, Shani, Kistler, Jewell, Pol, Olsson, Kazarinov and Orlowsky [1989a,b]); in one experiment, Δν was 110 kHz (Ackerman, Kwo, Silva and Wagner [1988]). In another version a 1.5 µm laser was coupled to an integrated optic quarter-wave shifted Bragg cavity to realize an ultra-narrow line width (Δν ≈ 135 kHz) resonant optical reflection laser (Olsson, Henry, Kazarinov, Lee, Johnson and Orlowsky [1987]).
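The feature sizes quoted for the Bragg gratings follow from the Bragg condition λ_B = 2 n_eff Λ / m, where Λ is the grating period and m the diffraction order. The short sketch below assumes an effective index of 1.5, which is an illustrative value rather than one given in the text, and takes equal line and space widths.

```python
def bragg_period_m(bragg_wavelength_m: float, n_eff: float, order: int = 1) -> float:
    """Grating period that reflects the given free-space wavelength."""
    return order * bragg_wavelength_m / (2.0 * n_eff)


def line_width_m(period_m: float) -> float:
    """Line (or space) width for a grating with equal lines and spaces."""
    return period_m / 2.0


if __name__ == "__main__":
    n_eff = 1.5  # assumed effective index of the waveguide mode
    for wavelength in (1.3e-6, 1.5e-6, 1.6e-6):
        period = bragg_period_m(wavelength, n_eff)
        print(f"{wavelength * 1e6:.1f} um: period {period * 1e6:.3f} um, "
              f"line width {line_width_m(period) * 1e6:.3f} um")
```

Under this assumption a first-order grating for 1.5 µm has a period of 0.5 µm and lines of 0.25 µm, in line with the dimensions stated for the filters and the holographically patterned reflector.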
Such a laser with Δν as low as 10 kHz was reported by Ackerman, Dabura, Shani, Henry, Kistler, Kazarinov and Kwo [1990]. Electrorefraction or carrier-induced refractive index change in silicon has been used to realize an electro-optic switch in silicon (Soref and Lorenzo [1986, 1988], Soref and Bonnett [1987]). Electrorefraction, which is related to the well-known Franz-Keldysh effect of electro-absorption, arises due to electric field-induced tunneling between the valence and the conduction bands. On the other hand, a carrier injection of 10¹⁸ cm⁻³ can induce a change in the refractive index of silicon. An infrared light modulator has been fabricated by using such a carrier-induced change in the refractive index of silicon (Kanada, Fujisawa and Kikuiri [1986]). Fabrication of an optical power divider with two epitaxially grown, crossed silicon multimode
rib waveguides with a donor concentration (n_D) of about 9 × 10¹⁴ cm⁻³ on a heavily doped (n_D ≈ 3 × 10'' cm⁻³) silicon substrate to operate at 1.3 µm has also been reported (Soref and Lorenzo [1986]).
§ 5. Active Waveguides on Silicon

Considerable progress has been achieved in recent years on rare earth-doped fiber lasers and amplifiers (Urquhart [1988]). Fiber lasers with a variety of rare earth dopants such as neodymium, erbium, thulium, holmium, ytterbium, praseodymium, and samarium, and based on silica and fluorozirconate glasses as host, have already been reported (Digonnet [1990, 1993]). Since the materials involved in silica-based optical waveguides on silicon are very similar to those of low-loss optical fibers (Pal [1979]), it should be possible to realize active components like lasers and amplifiers in silica-based planar waveguides on silicon. Indeed, continuous wave lasing at a wavelength of 1.0515 µm was achieved by fabricating a neodymium-doped silica core ridge waveguide 20 µm wide on a silicon substrate by FHD and RIE (Hibino, Kitagawa, Shimizu, Hanawa and Sugita [1989]). Neodymium is incorporated into the silica waveguide by immersing the FHD soot glass consisting of a network of silica, boron, and phosphorus in an alcohol solution of 0.5% of NdCl3·6H2O before the sintering step. Neodymium ion concentration is estimated to be about 2000 ppm. A 6 µm wide ridge pattern is formed on the sintered core of a neodymium-doped phosphosilicate glass waveguide by RIE followed by deposition of silica overcladding by the FHD process. Light from a Ti:Al2O3 laser tuned to 0.8 µm wavelength and pumped by an Ar⁺ laser is injected into a single-mode fiber, which is butt-joined to the neodymium-doped waveguide as the pump source. A second single-mode fiber is butt-joined to the other end of the waveguide to collect the guided light. The resonator is formed by depositing dielectric mirrors onto the fiber end faces. The lasing threshold is about 150 mW and the slope efficiency is 0.12% (Hibino, Kitagawa, Shimizu, Hanawa and Sugita [1989]).
The FWHM of the lasing peak at 1.0515 µm is about 0.12 nm, and the measured transmission loss at the lasing wavelength is 0.85 dB/cm. Fabrication of a neodymium-doped silica waveguide laser with an 8 µm wide core was recently reported (Hattori, Kitagawa, Ohmori and Kobayashi [1991]). A commercially available laser diode emitting at 0.805 µm is used as the pump, and dielectric mirrors are deposited directly at the waveguide
end faces. The lasing threshold is only 25 mW at a slope efficiency of 1.2%. This was made possible by a careful analysis of the scattering loss induced by neodymium doping and of the dependence of the lasing threshold on the width of the core. Continuous-wave lasing at two wavelengths around 1.589 and 1.604 µm was recently demonstrated with an erbium-doped phosphosilicate glass waveguide fabricated by the same approach through FHD and RIE (Kitagawa, Hattori, Shimizu, Ohmori and Kobayashi [1991]). The erbium ion concentration is estimated to be 8000 ppm. A Ti:Al2O3 laser tuned to 0.98 µm and pumped by an Ar+ laser is used as the pump source. Dielectric mirrors to form the resonator are deposited directly onto the waveguide end faces. The pump power required for lasing is about 49 mW with a slope efficiency of 0.81%. Transmission loss at 1.54 µm is about 0.82 dB/cm. In some earlier experiments, silicon diodes not in a waveguide configuration, after being implanted with erbium ions, had yielded emission of 1.54 µm radiation at room temperature (Ennen, Schneider, Pomrenke and Axmann [1983], Ennen, Pomrenke, Axmann, Eisele, Haydl and Schneider [1985]). This erbium ion implantation scheme was then carried over to silica-based waveguides to realize optical emission in the 1.5 µm wavelength region (Polman, Lidgard, Jacobson, Becker, Kistler, Blonder and Poate [1990], Lidgard, Polman, Jacobson, Blonder, Kistler, Poate and Becker [1991]). In particular, fluorescence was observed at 1.54 µm from phosphosilicate-core silica glass waveguides on a silicon substrate by implanting 3.5 MeV erbium ions into the waveguide at an implantation fluence in the range of 10’’ to 10^16 ions/cm^2. The fluorescence lifetime is typically 10 ms, which is of the same order as that observed in erbium-doped fibers. The radiation at 488 nm from an Ar+ laser is used as the pump. It should be possible to incorporate optically active rare earths in the silica network of an ARRO waveguide on silicon. Work is under way in our laboratory to realize a near-infrared source through such a scheme.
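The threshold and slope-efficiency figures quoted in this section can be compared through the usual idealized picture of a laser characteristic, in which the output is zero below threshold and rises linearly with pump power above it. The short sketch below assumes that linear model; the function names, the 200 mW pump level, and the 1 cm guide length are illustrative assumptions and are not taken from the cited papers.

```python
def output_power_mw(pump_mw, threshold_mw, slope_efficiency):
    """Idealized laser characteristic: zero output below threshold,
    linear in (pump - threshold) above it. Assumed model, not from the source."""
    return max(0.0, slope_efficiency * (pump_mw - threshold_mw))


def transmission_fraction(loss_db_per_cm, length_cm):
    """Fraction of power remaining after propagating length_cm
    in a guide with the given loss in dB/cm."""
    return 10 ** (-loss_db_per_cm * length_cm / 10)


# (threshold in mW, slope efficiency as a fraction), as quoted in the text
REPORTED = {
    "Nd, Ti:Al2O3 pumped (Hibino et al. [1989])": (150.0, 0.0012),
    "Nd, diode pumped (Hattori et al. [1991])": (25.0, 0.012),
    "Er, Ti:Al2O3 pumped (Kitagawa et al. [1991])": (49.0, 0.0081),
}

if __name__ == "__main__":
    pump_mw = 200.0  # hypothetical pump level chosen only for comparison
    for name, (threshold, slope) in REPORTED.items():
        print(name, output_power_mw(pump_mw, threshold, slope), "mW")
    # Hypothetical 1 cm guide at the quoted 0.85 dB/cm loss
    print(transmission_fraction(0.85, 1.0))
```

Under this model the lower threshold and tenfold higher slope efficiency of the diode-pumped device dominate the comparison at any common pump level; whether the quoted pump powers are launched or absorbed values is not stated in the text, so the absolute numbers should be read as indicative only.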
§ 6. Conclusions
Starting from the earliest experiments on silicon-based waveguides in the 1970s, we have described the different technological options, the status of components realized or reported to date, and the theory of propagation in optical waveguides, in both planar and rectangular geometries. The review should provide a comprehensive description for the design, fabrication, and estimation of silicon-based optical waveguides and associated passive and active optical components.
Acknowledgements
This work was supported by the Indo-US Materials Science Collaboration Program. The writing of this review started at IIT, New Delhi; however, a major portion of it was written during my sabbatical year as Visiting Scholar at NIST, Boulder, Colorado. I am grateful to the Fulbright Foundation for providing travel support. Special thanks go to my host R. L. Gallawa for his keen interest, enthusiasm, and encouragement, and for his generosity with time and patience. His constructive criticisms have greatly improved the coherence of the review. I further thank A. K. Ghatak, N. Sanford, and G. P. Agrawal for very useful comments as reviewers for our internal editorial review board, and Matt Young and Don Larson for their helpful editorial suggestions. I had many interesting discussions on silicon-based integrated optics with my erstwhile student Hemant Singh at IIT Delhi. I thank Swagata Deb for her help in some of the numerical calculations on ARRO waveguides. Thanks are also due to Margalene Hartman for her editorial assistance.
References
Aarnio, J., S. Honkanen and M. Leppihalme, 1987, A novel semiconductor process for optoelectronic applications on silicon substrate, in: Proc. Eur. Conf. Optical Communications, ECOC'87, Helsinki, p. 235.
Ackerman, D.A., M.I. Dabura, Y. Shani, C.H. Henry, R.C. Kistler, R.F. Kazarinov and C.Y. Kwo, 1990, Compact hybrid resonant optical reflector lasers with very narrow linewidths, in: Topical Meeting on Integrated Photonics Res., Hilton Head, SC.
Ackerman, D.A., C.Y. Kwo, V.L. Silva and E.J. Wagner, 1988, Compact silicon chip Bragg reflector hybrid laser with 110 kHz line-width, in: 11th Int. Semiconductor Conf., Boston, MA.
Adams, M.J., 1981, An Introduction to Optical Waveguides (Wiley, Chichester).
Adams, R., Y. Shani, C.H. Henry, R.C. Kistler, G.E. Blonder and N.A. Olsson, 1991, Very low-loss phosphorus-doped silica-on-silicon waveguides measured using a ring resonator, in: Opt. Fib. Commun., OFC'91, San Diego, Paper TuF5, p. 22.
Aiki, K., M. Nakamura and J. Umeda, 1977, A frequency multiplexing light source with monolithically integrated distributed feedback diode lasers, IEEE J. Quantum Electron. QE-13, 220.
Alferness, R.C., and L.L. Buhl, 1982, Tunable electro-optic waveguide TE-TM converter/wavelength filter, Appl. Phys. Lett. 40, 861.
Baba, T., and Y. Kokubun, 1989, New polarization-insensitive antiresonant reflecting optical waveguide (ARROW-B), IEEE Photonics Tech. Lett. PTL-1, 232.
Baba, T., and Y. Kokubun, 1990, High efficiency light-coupling from antiresonant reflecting optical waveguide to integrated photodetector using an antireflecting layer, Appl. Opt. 29, 2781.
Baba, T., and Y. Kokubun, 1991, Scattering loss of antiresonant reflecting optical waveguides, IEEE J. Lightwave Tech. LT-9, 590.
Baba, T., Y. Kokubun and H. Watanabe, 1990, Monolithic integration of an ARROW-type demultiplexer and photodetector in the shorter wavelength region, IEEE J. Lightwave Tech. LT-8, 99.
Baba, T., Y. Kokubun, T. Sasaki and K. Iga, 1988, Loss reduction of an ARROW waveguide in shorter wavelength and its stack configuration, IEEE J. Lightwave Tech. LT-6, 1440.
Barbier, D., M. Green and S.J. Madden, 1991, Waveguide fabrication for integrated optics by electron beam irradiation of silica, IEEE J. Lightwave Tech. LT-9, 715.
Bean, K.E., 1978, Anisotropic etching of silicon, IEEE Trans. Electron. Dev. ED-25, 1185.
Bismuth, J., P. Gidon, F. Revol and S. Valette, 1991, Low-loss ring resonators fabricated from silicon based integrated optics technologies, Electron. Lett. 27, 722.
Boyd, J.T., and C.L. Chen, 1976, Integrated optical silicon photodiode array, Appl. Opt. 15, 1389.
Boyd, J.T., and C.L. Chen, 1977, Integrated optical waveguide and charge-coupled device image array, IEEE J. Quantum Electron. QE-13, 282.
Boyd, J.T., and S. Sriram, 1978, Optical coupling from fibers to channel waveguides formed on silicon, Appl. Opt. 17, 895.
Boyd, J.T., S.H. Chang, C.L. Fan and D.A. Ramey, 1981, Integration of photodetectors and optical guided wave structures formed on silicon substrates, Proc. SPIE 272, 98.
Boyd, J.T., R.W. Wu, D.E. Zelmon, A. Neumaan and H.A. Timlin, 1984, Planar and channel optical waveguides utilizing silicon technology, in: Proc. 1st Conf. Integrated Optical Circuit Engineering, Proc. SPIE 517, 100.
Boyd, J.T., R.W. Wu, D.E. Zelmon, A. Neumaan, H.A. Timlin and H.E. Jackson, 1985, Guided wave optical structures utilizing silicon, Opt. Eng. 24, 230.
Carney, J.K., and L.D. Hutcheson, 1987, GaAs-based integrated optoelectronic circuits: Design, development and applications, in: Integrated Optical Circuits and Components: Design and Applications, ed. L.D. Hutcheson (Marcel Dekker Inc., New York) p. 229.
Chang, C.L., and C.S. Tsai, 1983, Electro-optic analog to digital converter using channel waveguide Fabry-Perot modulator array, Appl. Phys. Lett. 43, 22.
Chen, C.L., and J.T. Boyd, 1981, Temperature independent thin film optical waveguide, Appl. Opt. 20, 2280.
Chubachi, N., 1976, ZnO film for acousto-optic devices in nonpiezoelectric substrates, Proc. IEEE 64, 772.
Digonnet, M.J., ed., 1990, Proc. Fiber Laser Sources and Amplifiers II, Sept. 18-19, San Jose, CA (Society of Photo Optical Instrumentation Engineers, Bellingham, USA).
Digonnet, M.J., ed., 1993, Rare Earth Doped Fiber Lasers and Amplifiers (Marcel Dekker, New York).
Dragone, C., 1991, An N x N optical multiplexer using a planar arrangement of two star couplers, IEEE Photonics Tech. Lett. PTL-3, 812.
Dragone, C., C.A. Edwards and R.C. Kistler, 1991, Integrated Optics N x N multiplexer on silicon, IEEE Photonics Tech. Lett. PTL-3, 896.
Dragone, C., C.H. Henry, I.P. Kaminow and R.C. Kistler, 1989, Efficient multichannel integrated optics star coupler on silicon, IEEE Photonics Tech. Lett. PTL-1, 241.
Duguay, A., Y. Kokubun, T.L. Koch and L. Pfeiffer, 1986, Antiresonant reflecting optical waveguides in SiO2-Si multilayer structures, Appl. Phys. Lett. 49, 13.
Dutta, S., H.E. Jackson and J.T. Boyd, 1980, Reduction of scattering from a glass thin film optical waveguide by CO2-laser annealing, Appl. Phys. Lett. 37, 512.
Dutta, S., H.E. Jackson and J.T. Boyd, 1981, Extremely low-loss glass thin-film optical waveguides utilizing surface coating and laser annealing, J. Appl. Phys. 52, 3873.
Dutta, S., H.E. Jackson, J.T. Boyd, F.S. Hickernell and R.L. Davis, 1981, Scattering loss reduction in ZnO optical waveguides by laser annealing, Appl. Phys. Lett. 39, 206.
Dutta, S., H.E. Jackson, J.T. Boyd, R.L. Davis and F.S. Hickernell, 1982, CO2-laser annealing of Si3N4, Nb2O5 and Ta2O5 thin film optical waveguides to achieve scattering loss reduction, IEEE J. Quantum Electron. QE-18, 800.
Ennen, H., G. Pomrenke, A. Axmann, K. Eisele, W. Haydl and J. Schneider, 1985, 1.54 µm electroluminescence of erbium-doped silicon grown by molecular beam epitaxy, Appl. Phys. Lett. 46, 381.
Ennen, H., J. Schneider, G. Pomrenke and A. Axmann, 1983, 1.54 µm luminescence of erbium-implanted III-V semiconductors and silicon, Appl. Phys. Lett. 43, 963.
Falco, C., J. Botineau, A. Azema, M. De Micheli and D.B. Ostrowsky, 1983, Optical properties determination at 10.6 µm of thin semiconducting layers, Appl. Phys. Lett. A 30, 23.
Findalky, T., 1985, Glass waveguides by ion exchange: A review, Opt. Eng. 24, 244.
Ghatak, A.K., 1986, Electromagnetics of integrated optical waveguides, J. Inst. Electron. & Telecom. Eng. (India) 32, 159.
Ghatak, A.K., and K. Thyagarajan, 1989, Optical Electronics (Cambridge University Press, Cambridge).
Ghatak, A.K., K. Thyagarajan and M.R. Shenoy, 1987, Numerical analysis of planar optical waveguides using matrix approach, IEEE J. Lightwave Tech. LT-5, 600.
Gidon, P., S. Valette and P. Schweizer, 1985, Vibration sensor using planar integrated interferometric circuit on oxidized Si substrate, in: Proc. 2nd Int. Conf. on Optical Fiber Sensors, Stuttgart, eds R.Th. Kersten and R. Kist (VDE-Verlag GmbH, Berlin) p. 187.
Gidon, P., S. Valette and P. Mottier, 1985, Integrated lenses on silicon nitride waveguides, Opt. Eng. 24, 235.
Gleine, W., and J. Müller, 1991, Laser trimming of SiON components for integrated optics, IEEE J. Lightwave Tech. LT-9, 1626.
Goell, J.E., 1969, A circular harmonic computer analysis of rectangular dielectric waveguides, Bell Syst. Tech. J. 48, 2133.
Goell, J.E., and R.D. Strandley, 1969, Sputtered glass waveguides for integrated optical circuits, Bell Syst. Tech. J. 48, 3445.
Grand, G., H. Denis and S. Valette, 1991, New method for low cost and efficient optical connections between single-mode fibers and silica guides, Electron. Lett. 27, 16.
Grand, G., J.P. Jadot, H. Denis, S. Valette, A. Fournier and A.M. Grouillet, 1990, Low loss PECVD silicon channel waveguides for optical communications, Electron. Lett. 26, 2135.
Grand, G., J.P. Jadot, S. Valette, H. Denis, A. Fournier and A.M. Grouillet, 1990, Fiber pigtailed wavelength multiplexer/demultiplexer at 1.55 µm integrated on silicon substrate, in: Proc. 8th Ann. Eur. Fiber Optics Communication and Local Area Networks Conf., Munich, June 27-29 (IGI Europe, Boston, MA) p. 108.
Hall, D.G., 1987, Survey of silicon-based integrated optics, IEEE Computer Soc. Mag. (December) p. 25.
Hashizume, H., M. Seki and N. Nakoma, 1989, Polarization insensitive planar waveguide 1 x 4 branching device with good wavelength flattened characteristics by ion exchange, in: Optical Fiber Communication Conf., OFC'89, Houston, TX, paper WM1.
Hattori, K., T. Kitagawa, Y. Ohmori and M. Kobayashi, 1991, Laser-diode pumping of waveguide laser based on Nd-doped silica planar lightwave circuit, IEEE Photonics Tech. Lett. PTL-3, 882.
Henry, C.H., 1991, Integrated optics and hybrid optoelectronic devices on silicon, Tutorial Lecture, in: Optical Fiber Commun. Conf. OFC'91, San Diego, Feb. 18-22.
Henry, C.H., G.E. Blonder and R.F. Kazarinov, 1989, Glass waveguides on silicon for hybrid optical packaging, IEEE J. Lightwave Tech. LT-7, 1530.
Henry, C.H., R.F. Kazarinov, H.J. Lee, K.J. Orlowsky and L.E. Katz, 1987, Low-loss Si3N4-SiO2 optical waveguides on silicon, Appl. Opt. 26, 2621.
Henry, C.H., Y. Shani, R.C. Kistler, T.E. Jewell, V. Pol, N.A. Olsson, R.F. Kazarinov and K.J. Orlowsky, 1989a, Compound Bragg reflection filters made by high resolution deep ultraviolet stepper, in: Optical Fiber Communication Conf. OFC'89, Houston, TX, paper TuBB4.
Henry, C.H., Y. Shani, R.C. Kistler, T.E. Jewell, V. Pol, N.A. Olsson, R.F. Kazarinov and K.J. Orlowsky, 1989b, Compound Bragg reflection filters made by spatial frequency doubling lithography, IEEE J. Lightwave Tech. LT-7, 1379.
Hibino, Y., M. Abe, T. Kominato and Y. Ohmori, 1991, Photoinduced refractive-index changes in TiO2-doped silica optical waveguides on silicon substrate, Electron. Lett. 27, 2294.
Hibino, Y., T. Kominato and Y. Ohmori, 1991, Optical frequency tuning by laser-irradiation in silica-based Mach-Zehnder-type multi/demultiplexers, IEEE Photonics Tech. Lett. PTL-3, 640.
Hibino, Y., T. Kitagawa, M. Shimizu, F. Hanawa and A. Sugita, 1989, Neodymium-doped silica optical waveguide laser on silicon substrate, IEEE Photonics Tech. Lett. PTL-1, 349.
Hickernell, F.S., 1979, An optical measure of the acoustic quality of ZnO thin films, Proc. IEEE Ultrasonics Symp., p. 932.
Hickernell, F.S., 1988, Optical waveguides on silicon, Solid State Technol. (November) p. 83.
Hickernell, F.S., R.L. Davis and F.V. Richard, 1978, The acousto-optic properties of thin film Si3N4, Ta2O5, ZnO, and 7059 glass on oxidized silicon substrates, in: Proc. IEEE Ultrasonics Symp., p. 60.
Hocker, G.B., and W.K. Burns, 1977, Mode dispersion in diffused channel waveguides by the effective index method, Appl. Opt. 16, 113.
Izutsu, M., H. Haga and T. Sueta, 1983, Picosecond signal sampling and multiplication by using integrated tandem light modulators, IEEE J. Lightwave Tech. LT-1, 285.
Jiang, W., J. Chrostowski and M. Fontaine, 1989, Analysis of ARROW waveguides, Opt. Commun. 72, 180.
Jinguji, K., N. Takato, A. Sugita and M. Kawachi, 1990, Mach-Zehnder interferometer type waveguide coupler with wavelength-flattened coupling ratio, Electron. Lett. 26, 1326.
Kanada, S., Y. Fujisawa and K. Kikuiri, 1986, Infrared light modulator of ridge type optical waveguide structure using effect of free carrier absorption, Electron. Lett. 22, 922.
Kashyap, R., B.J. Ainslie and G.D. Maxwell, 1989, Second harmonic generation in GeO2 ridge waveguides, Electron. Lett. 25, 206.
Kawachi, M., 1990, Silica waveguides on silicon and their application to integrated-optic components, Opt. Quantum Electron. 22, 391.
Kawachi, M., 1991, Silica-based planar lightwave circuit technologies, in: Proc. Eur. Conf. Optical Communications, ECOC'91/IOOC'91, Paris, Invited Papers, p. 51.
Kawachi, M., T. Miya and Y. Ohmori, 1991, Silica based planar lightwave circuits for fiber-to-the-home applications, in: Optical Fiber Communication Conf. OFC'91, San Diego, paper TuG1, p. 26.
Kawachi, M., M. Yasu and M. Kobayashi, 1983, Flame hydrolysis deposition of SiO2-TiO2 glass planar optical waveguides on silicon, Jpn. J. Appl. Phys. 22, 1932.
Kawachi, M., M. Yasu and T. Edahiro, 1983, Fabrication of SiO2-TiO2 glass planar optical waveguides by flame hydrolysis deposition, Electron. Lett. 19, 583.
Kawachi, M., N. Takato, K. Jinguji and M. Yasu, 1987, in: Proc. Conf. Optical Fiber Communication/Integrated Optics OFC/IOOC'87, Reno, paper TuQ31.
Kawachi, M., Y. Yamada, M. Yasu and M. Kobayashi, 1985, Guided-wave optical wavelength division multi-demultiplexer using high-silica channel waveguides, Electron. Lett. 21, 314.
Kendall, D.L., 1979, Vertical etching of silicon at very high aspect ratios, Ann. Rev. Mater. Sci., p. 373.
Kitagawa, T., K. Hattori, M. Shimizu, Y. Ohmori and M. Kobayashi, 1991, Guided-wave laser based on erbium-doped silica planar lightwave circuit, Electron. Lett. 27, 334.
Knox, R.M., and P.P. Toulios, 1970, Integrated circuits for the millimeter through optical frequency range, in: Proc. of MRI Symp. on Submillimeter Waves, ed. J. Fox (Polytechnic Press, Brooklyn) p. 497.
Kobayashi, S., T. Kito, M. Yasu and M. Kawachi, 1989, in: Fall Nat. Conv. IECE, Japan, C-265.
Kokubun, Y., T. Baba, T. Sasaki and K. Iga, 1986, Low-loss antiresonant reflecting optical waveguide on Si substrate in visible-wavelength region, Electron. Lett. 22, 892.
Kokubun, Y., S. Tamura and T. Kondo, 1991, Spot-size transformer by ARROW-B waveguides, in: Topical Meeting on Integrated Photonics Research, Monterey, CA, Paper ThA2.
Kominato, T., Y. Ohmori and K. Onose, 1991, Characteristics of high silica single-tap ring resonator fabricated by planar lightwave circuit technology, in: Spring National Conv. IECE, Japan, C-144. In Japanese.
Kominato, T., Y. Ohmori, H. Okazaki and M. Yasu, 1990, Very low-loss GeO2 doped silica waveguide fabricated by flame hydrolysis deposition method, Electron. Lett. 26, 327.
Koren, U., 1989, Optoelectronic integrated circuits, in: Optoelectronic Technology and Lightwave Communications Systems, ed. C. Lin (Van Nostrand Reinhold, New York).
Korotky, S.K., and R.C. Alferness, 1987, Ti-LiNbO3 integrated optic technology: Fundamentals, design considerations, and capabilities, in: Integrated Optical Circuits and Components: Design and Applications, ed. L.D. Hutcheson (Marcel Dekker Inc., New York) p. 169.
Kumar, A., A.N. Kaul and A.K. Ghatak, 1985, Prediction of coupling length in a rectangular core directional coupler: An accurate analysis, Opt. Lett. 10, 86.
Kumar, A., M.R. Shenoy and K. Thyagarajan, 1984, Modes in anisotropic rectangular waveguides: An accurate and simple perturbation approach, IEEE Trans. Microwave Theory & Tech. MTT-32, 1416.
Kumar, A., K. Thyagarajan and A.K. Ghatak, 1983, Analysis of rectangular core dielectric waveguides: An accurate perturbation approach, Opt. Lett. 8, 63.
Kurdi, B.N., and D.G. Hall, 1988, Optical waveguides in oxygen implanted buried oxide silicon-on-insulator structures, Opt. Lett. 13, 175.
Lee, H.J., C.H. Henry, R.F. Kazarinov and K.J. Orlowsky, 1987, Low loss Bragg reflectors on SiO2-Si3N4-SiO2 rib waveguides, Appl. Opt. 26, 2618.
Lee, H.J., C.H. Henry, K.J. Orlowsky and T.Y. Kometani, 1988, Refractive index dispersion of phosphosilicate glass, thermal oxide, and silicon nitride films on silicon, Appl. Opt. 27, 4104.
Leonberger, F.J., C.E. Woodward and D.L. Spears, 1979, Design and development of high-speed electro-optic A/D converter, IEEE Trans. Circuits & Sys. CAS-26, 1125.
Lidgard, A., A. Polman, D.C. Jacobson, G.E. Blonder, R.C. Kistler, J.M. Poate and P.C. Becker, 1991, Fluorescence life time studies of MeV erbium implanted silica glass, Electron. Lett. 27, 993.
Lubberts, G., B.C. Burkey, F. Moser and E.A. Trabka, 1981, Optical properties of phosphorus-doped polycrystalline silicon layers, J. Appl. Phys. 52, 6870.
Marcatili, E.A.J., 1969, Dielectric rectangular waveguides and directional couplers for integrated optics, Bell Syst. Tech. J. 48, 2071.
Marcuse, D., 1974, Theory of Dielectric Optical Waveguides (Academic Press, New York).
Marx, G.E., M. Gottlieb and G.B. Brandt, 1977, Integrated optical detector array, waveguide, and modulator based on silicon technology, IEEE J. Solid-State Circuits SC-12, 10.
Matsuo, S., 1978, Selective etching of SiO2 relative to silicon using CF4 plasma, Jpn. J. Appl. Phys. 17, 235.
Matsuo, S., 1980, Selective etching of SiO2 relative to silicon without undercutting by CBrF3 plasma, Appl. Phys. Lett. 36, 768.
McWright Howerton, M., and T.E. Batchman, 1988, A multi-film waveguide photodetector using hydrogenated amorphous silicon, IEEE J. Lightwave Tech. LT-6, 1856.
Mergerian, D., E.C. Malarkey, R.P. Pautienus, J.C. Bradley, G.E. Marx, L.D. Hutcheson and A.L. Kellner, 1980, Operational integrated optical RF spectrum analyzer, Appl. Opt. 19, 3033.
Merz, J.L., Y.R. Yuan and G.A. Vawter, 1985, Photonics for integrated circuits and communications, Opt. Eng. 24, 214.
Miller, S.E., 1969, Integrated optics: An introduction, Bell Syst. Tech. J. 48, 2059.
Moss, T.S., 1959, Optical Properties of Semiconductors (Butterworths, London).
Mottier, P., and S. Valette, 1981, Integrated Fresnel lens on thermally oxidized Si substrate, Appl. Opt. 20, 1630.
Mottier, P., S. Valette and J.P. Jadot, 1986, Broadband Bragg deflector for optical waveguides on Si-substrates, Opt. & Laser Technol. 18, 89.
Neumaan, A., and J.T. Boyd, 1980, Phosphosilicate glass flow for integrated optics, J. Vac. Sci. & Technol. 17, 529.
Neumaan, A., and J.T. Boyd, 1981, Laser annealing of phosphosilicate glass, J. Vac. Sci. & Technol. 18, 821.
Nissim, C., A. Beguin, R. Jansen and P. Laborde, 1989, Fabrication and characterization of buried single-mode waveguides and couplers made by ion-exchange in glass, Optical Fiber Communication Conf., OFC'89, Houston, TX, Paper WM2.
Nourshargh, N., E.M. Starr and J.S. McCormack, 1986, Plasma deposition of GeO2/SiO2 and Si3N4 waveguides for integrated optics, IEE Proc. J. 133, 264.
Nourshargh, N., E.M. Starr and T.M. Ong, 1989, Integrated optic 1 x 4 splitter in SiO2/GeO2, Electron. Lett. 25, 981.
Okamoto, K., 1991, Recent progress in high-silica planar lightwave circuits, in: Topical Meeting on Integrated Photonics Research, Monterey, CA, April 9-11, Paper ThE1.
Okamoto, K., H. Takahashi, S. Suzuki, A. Sugita and Y. Ohmori, 1991, Design and fabrication of integrated-optic 8 x 8 star coupler, Electron. Lett. 27, 774.
Olsson, N.A., C.H. Henry, R.F. Kazarinov, H.J. Lee, B.H. Johnson and K.J. Orlowsky, 1987, Narrow linewidth 1.5 µm semiconductor laser with a resonant optical reflector, Appl. Phys. Lett. 51, 1141.
Olsson, N.A., C.H. Henry, R.F. Kazarinov, H.J. Lee, K.J. Orlowsky, B.H. Johnson, R.E. Scotti, D.A. Ackerman and P.J. Anthony, 1988, Performance characteristics of 1.5 µm single frequency semiconductor laser with an external waveguide Bragg reflector, IEEE J. Quantum Electron. QE-4, 143.
Olsson, N.A., C.H. Henry, R.F. Kazarinov, H.J. Lee and K.J. Orlowsky, 1987, Relation between chirp and linewidth reduction in external reflector semiconductor lasers, Appl. Phys. Lett. 51, 92.
Osterberg, U., and W. Margulis, 1986, Dye laser pumped by Nd:YAG pulses frequency doubled in a glass optical fiber, Opt. Lett. 11, 195.
Pal, B.P., 1979, Optical communication fiber waveguide fabrication: A review, Fib. Int. Opt. 2, 195.
Pal, B.P., 1987, Lightwave propagation in optical waveguides, in: Fiber Optics and Instrumentation, ed. M.M. Butusov (Mashinistroenie Publisher, Leningrad). In Russian.
Pal, B.P., H. Singh, A.K. Ghatak and A.B. Bhattacharya, 1990, Design and fabrication of ARROW (antiresonant reflecting optic waveguide) on silicon, in: XXIII General Assembly, URSI, Praha, 28 Aug-5 Sept.
Papuchon, M., 1986, Integrated optics, J. Inst. Electron. & Telecom. Eng. (India) 32, 171.
Papuchon, M., Y. Combemale, X. Mathieu, D.B. Ostrowsky, L. Reiber, A.M. Roy, B. Sejourne and M. Werner, 1975, Electrically switched optical directional coupler: COBRA, Appl. Phys. Lett. 27, 289.
Peterson, K.E., 1982, Silicon as a mechanical material, Proc. IEEE 70, 420.
Polman, A., A. Lidgard, D.C. Jacobson, P.C. Becker, R.C. Kistler, G.E. Blonder and J.M. Poate, 1990, 1.54 µm room temperature luminescence of MeV erbium-implanted silica glass, Appl. Phys. Lett. 57, 2859.
Ramaswamy, R.V., and R. Srivastava, 1988, Ion-exchanged glass waveguides: A review, IEEE J. Lightwave Tech. LT-6, 984.
Rand, M.J., and R.D. Strandley, 1972, Silicon oxynitride films on fused silica for optical waveguides, Appl. Opt. 11, 2482.
Sasamaya, K., M. Okuno and K. Habara, 1991, Coherent optical transversal filter using silica-based waveguides for high-speed signal processing, IEEE J. Lightwave Tech. LT-9, 1225.
Schmidt, R.V., and R.C. Alferness, 1979, Directional coupler switches, modulators and filters using alternating Δβ techniques, IEEE Trans. Circuits & Sys. CAS-26, 1099.
Shani, Y., C.H. Henry, R.C. Kistler, K.J. Orlowsky and D.A. Ackerman, 1989, Efficient coupling of a semiconductor laser to an optical fiber by means of a tapered waveguide on silicon, Appl. Phys. Lett. 55, 2389.
Shani, Y., C.H. Henry, R.C. Kistler, R.F. Kazarinov and K.J. Orlowsky, 1990, Integrated optic adiabatic polarization splitter on silicon, Appl. Phys. Lett. 56, 120.
Sodha, M.S., and A.K. Ghatak, 1977, Inhomogeneous Optical Waveguides (Plenum Press, New York).
Soref, R.A., and B.R. Bonnett, 1987, Electro-optical effects in silicon, IEEE J. Quantum Electron. QE-23, 123.
Soref, R.A., and J.P. Lorenzo, 1985, Single-crystal silicon: A new material for 1.3 and 1.6 µm integrated-optical components, Electron. Lett. 21, 953.
Soref, R.A., and J.P. Lorenzo, 1986, All-silicon active and passive guided wave components for λ = 1.3 and 1.6 µm, IEEE J. Quantum Electron. QE-22, 873.
Soref, R.A., and J.P. Lorenzo, 1988, Silicon guided-wave optics, Solid-State Technol. (November) p. 95.
Soref, R.A., and K.J. Ritter, 1990, Silicon antiresonant reflecting optical waveguides, Opt. Lett. 15, 792.
Splett, A., J. Schmidtchen, B. Schüpert and K. Petermann, 1990, Integrated optical channel waveguides in silicon using Si-Ge alloys, in: Proc. SPIE Conf. on Physical Concepts of Materials for Novel Optoelectronic Device Applications, Aachen, Oct. 28-Nov. 2.
Stutius, W., and W. Streifer, 1977, Silicon nitride films on silicon for optical waveguides, Appl. Opt. 16, 3218.
Sugita, A., M. Okuno, T. Matsunagu, M. Kawachi and Y. Ohmori, 1990, Strictly non-blocking 8 x 8 integrated optical matrix switch with silica-based waveguides on silicon substrates, in: Proc. Eur. Conf. Optical Communications ECOC'90, Amsterdam, Paper WeG4, p. 1.
Sun, C.J., W.M. Myers, K.M. Schmidt and S. Sumida, 1991, Silica-based circular cross-sectioned channel waveguides, IEEE Photonics Tech. Lett. PTL-3, 238.
Takagi, A., K. Jinguji and M. Kawachi, 1991, Wavelength-flattened (3 x 3) coupler with silica waveguides on silicon, in: Optical Fiber Communications Conf. OFC'91, San Diego, Paper TuF4, p. 21.
Takato, N., K. Jinguji, M. Yasu, H. Toba and M. Kawachi, 1988, Silica based single-mode waveguides on Si and their application to guided-wave optical interferometers, IEEE J. Lightwave Tech. LT-6, 1003.
Takato, N., T. Kominato, A. Sugita, K. Jinguji, H. Toba and M. Kawachi, 1990, Silica-based integrated optic Mach-Zehnder multi/demultiplexer family with channel spacing of 0.01-250 nm, IEEE J. Select. Areas in Communication 8, 1120.
Takato, N., A. Sugita, K. Onose, H. Okazaki, M. Okuno, M. Kawachi and K. Oda, 1991, 128-channel polarization-insensitive frequency-selective-switch using high-silica waveguides on Si, IEEE Photonics Tech. Lett. PTL-2, 441.
Taylor, H.F., 1978, Guided wave electro-optic device for logic and computation, Appl. Opt. 17, 1493.
Terui, H., Y. Yamada, M. Kawachi and M. Kobayashi, 1985, Hybrid integration of laser diode and high-SiO2 multimode optical channel waveguide on Si, Electron. Lett. 21, 646.
Tewari, R., H. Singh and B.P. Pal, 1990, An accurate numerical technique for the analysis of ARROW waveguides, Opt. Technol. & Microwaves Lett. 3, 305.
Thyagarajan, K., S. Diggavi and A.K. Ghatak, 1987, Analytical investigation of leaky and absorbing planar structures, Opt. Quantum Electron. 19, 131.
Thylen, L., and L. Stensland, 1982, Lens-less integrated optic spectrum analyzer, IEEE J. Quantum Electron. QE-18, 381.
Toba, H., K. Oda, K. Nosu and N. Takato, 1989, 16-channel optical FDM distribution/transmission experiment utilizing multichannel frequency stabilizers and waveguide frequency selection switch, Electron. Lett. 25, 574.
Urquhart, P., 1988, Review of rare-earth doped fiber lasers and amplifiers, IEE Proc. 135, Part J, p. 385.
Valette, S., 1987, Integrated optics on silicon substrate: Application to optical communications, in: Proc. of Horizon de l'Optique 87, Marseille, June.
Valette, S., 1988, State-of-the-art of integrated optics technology at LETI for achieving passive optical components, J. Mod. Opt. 35, 993.
Valette, S., J.P. Jadot, P. Gidon, S. Renard, S. Grand, A. Fournier, A.M. Grouillet, P. Philippe, H. Denis, E. Desgranges, L. Mulatier and C. Erbeia, 1991, Integrated photonic circuits on silicon, in: Novel Silicon Based Technologies, ed. R.A. Levy (Kluwer, Amsterdam) p. 173.
Valette, S., J. Lizet, P. Mottier, J.P. Jadot, P. Gidon and S. Renard, 1984, Integrated optical circuits achieved by planar technology on Si substrates: Application to optical spectrum analyzer, IEE Proc. 131, Part H, p. 325.
Valette, S., P. Mottier, J. Lizet and P. Gidon, 1986, Integrated optics on silicon substrate: A way to achieve complex optical circuits, Proc. SPIE 651, 94.
Valette, S., S. Renard, H. Denis, J.P. Jadot, A. Fournier, P. Philippe, P. Gidon, A.M. Grouillet and E. Desgranges, 1989, Si-based integrated optics technologies, Solid-State Technol. (February) p. 69.
Valette, S., S. Renard, J.P. Jadot, P. Gidon and C. Erbeia, 1989, Silicon based integrated optics technology for optical sensor applications, in: Proc. 5th Int. Conf. on Solid Sensors and Actuators, Transducer'89, Montreux, Switzerland, June 25-30, p. 324.
Valette, S., S. Renard, J.P. Jadot, P. Gidon and C. Erbeia, 1990, Silicon-based integrated optics technology for optical sensor applications, Sensors & Actuators A21-A23, 1087.
Valette, S., P. Gidon and J.P. Jadot, 1987, New integrated optical multiplexer/demultiplexer realized on Si substrate, in: Proc. 4th Eur. Conf. on Integrated Optics, ECIO'87, Glasgow, eds C.D.W. Wilkinson and J. Lamb, p. 145.
Valette, S., A. Morque and P. Mottier, 1982, High performance integrated Fresnel lenses on oxidized Si substrates, Electron. Lett. 18, 13.
Varshney, R.K., and A. Kumar, 1988, A simple and accurate modal analysis of strip loaded optical waveguides with various index profiles, IEEE J. Lightwave Tech. LT-6, 601.
Verbeek, B.H., C.H. Henry, N.A. Olsson, K.J. Orlowsky, R.F. Kazarinov and B.H. Johnson, 1988, Integrated 4-channel Mach-Zehnder multi/demultiplexer fabricated with phosphorus doped SiO2 waveguides on Si, IEEE J. Lightwave Tech. LT-6, 1011.
Verber, C.M., R.P. Kenan and J.R. Busch, 1983, Design and performance of an integrated optical digital correlator, IEEE J. Lightwave Tech. LT-1, 256.
Weiss, B.L., and G.T. Reed, 1991, The transmission properties of optical waveguides in SIMOX structures, Opt. Quantum Electron. QE-23, 1061.
Weiss, B.L., G.T. Reed, S.K. Toh, R.A. Soref and F. Namavar, 1991, Optical waveguides in SIMOX structures, IEEE Photonics Tech. Lett. PTL-3, 19.
Welbourn, D., C. Beaumont and M. Nield, 1991, Directional couplers in a high-index silica waveguide, in: Topical Meeting on Integrated Photonics Research, Monterey, CA, April 9-11, Paper ThA3.
Willander, M., 1983a, Carrier-dependent parameters in a silicon optical waveguide, J. Appl. Phys. 56, 4660.
Willander, M., 1983b, Surface recombination velocity in a silicon optical waveguide, Appl. Phys. A 31, 45.
Yamada, Y., M. Kawachi, M. Yasu and M. Kobayashi, 1984a, Optical fiber coupling to high-silica channel waveguides with fiber guiding grooves, Electron. Lett. 20, 313.
11
REFERENCES
59
Yamada, Y., M. Kawachi, M. Yasu and M. Kobayashi, 1984b, Fabrication of a high silica glass waveguide optical accessor, Electron. Lett. 20, 589. Yao, S.K., D.B.Anderson and R.R. August, 1979, An update of integrated optics on Si substrates, in: Proc. Europ. Conf. on Optical Communications, ECOC'79, Amsterdam. Zelmon, D.E., H.E. Jackson, J.T. Boyd, A. Neumaan and D.B. Anderson, 1983, A low scattering graded index SiO, planar optical waveguide thermally grown on silicon, Appl. Phys. Lett. 42, 565.
E. WOLF, PROGRESS IN OPTICS XXXII © 1993 ELSEVIER SCIENCE PUBLISHERS B.V. ALL RIGHTS RESERVED
OPTICAL NEURAL NETWORKS: ARCHITECTURE, DESIGN AND MODELS
BY
FRANCIS T. S. YU
Department of Electrical and Computer Engineering, The Pennsylvania State University, University Park, PA 16802, USA
CONTENTS
§ 1. INTRODUCTION
§ 2. OPTICAL ASSOCIATIVE MEMORY
§ 3. OPTICAL NEURAL NETWORKS
§ 4. NEURAL NETWORK MODELS
§ 5. REDUNDANT INTERCONNECTION NEURAL NETWORKS
§ 6. OPTICAL IMPLEMENTATION OF HAMMING NETS
§ 7. INFORMATION STORAGE CAPACITY
§ 8. SELF-ORGANIZING OPTICAL NEURAL NETWORKS
§ 9. CONCLUSION
REFERENCES
§ 1. Introduction
Electronic computers can solve computational problems, such as addition, subtraction, multiplication, and division, thousands of times faster and more accurately than human brains. However, cognitive tasks, such as pattern recognition, understanding and speaking a language, retrieving contextual information, and guiding a mechanical hand, can be performed more effectively and efficiently by the human brain. In fact, these tasks are still beyond the reach of modern electronic computers. A human brain consists of billions of neurons that are massively interconnected by synapses. Techniques to simulate artificial neural networks (ANNs) are basically drawn from cognitive psychology and from biological models. Thus, the purpose of neural network research is to simulate the structure of a network of massively interconnected biological neurons, in which the information is processed in a parallel and associative manner.
There are, however, some fundamental differences between digital computers and ANNs. For example, in ANNs every neuron performs simple logical operations, in contrast with digital computers, in which data processing relies on central processing units (CPUs). Moreover, ANNs are capable of learning, improving their performance, adapting to different environments, and coping with disruptions, whereas digital computers are programmed with rigid rules and must be manually reprogrammed to obtain better solutions. Neural networks can process inexact, ambiguous, fuzzy data that do not exactly match the stored information, whereas electronic computers cannot adjust the specific definitions and rules for which they are programmed to accommodate new, inexact, degraded, or contradictory input.
A neural network consists of a collection of processing elements, i.e., neurons. Each neuron has many input signals, but only one output signal, which is fanned out to many pathways that are connected to other neurons.
These pathways interconnect with other neurons to form a network (fig. 1.1). The operation of a neuron is determined by a transfer function that defines the neuron's output as a function of the input signals. Every connection entering a neuron has an adaptive coefficient, called a weight, assigned to it. The weight determines the interconnection strength between neurons, and it can be changed by a learning rule that modifies the weights in response to the input signals and the value supplied by the transfer function. The learning rule allows the response of the neuron to change with time, depending on the nature of the input signals, which means that the network adapts to the environment and organizes the information within itself, as in learning.

Fig. 1.1. A one-layer neural network: input neurons are connected to output neurons through interconnection weights.

In general, two kinds of learning can be distinguished, namely, supervised and unsupervised learning. Supervised learning requires an instructor to supply the network with both input data and desired output data as training exemplars (e.g., references). In other words, the network has to be taught when to learn and when to process information, but it cannot do both at the same time. In unsupervised learning the network is given the input data, but no desired output data are supplied. Rather, after each trial, or series of trials, the network is given an evaluation rule that evaluates its performance. Thus, the network can learn an unknown input during some iterative process, and it exhibits a human-like self-learning ability.
A score of neural network models have been developed, including those of Fukushima [1969], Hopfield [1982], Kohonen [1984], Rumelhart and Zipser [1986], Lippmann [1987], and Carpenter and Grossberg [1987]. Examples of supervised learning models are the Hopfield model, the Perceptron, error-driven back propagation, and the Boltzmann machine. The best known unsupervised learning models are adaptive resonance theory, the Neocognitron, the Madaline, and the Kohonen self-organizing feature map.
In general, a neural network of $N$ neurons has $N^2$ interconnections. The transfer function of a neuron can be described by a nonlinear operation (e.g., a step function), which restricts the neuron output to the binary states 0 or 1, or by a sigmoid function that gives rise to analog
values. Figure 1.2 shows that the state of the $i$th neuron in the network can be represented by an iterative equation, given by

$$u_i(n+1) = f\left[\sum_{j=1}^{N} T_{ij}\,u_j(n)\right], \tag{1.1}$$

where $u_i$ is the activation potential of the $i$th neuron, $T_{ij}$ is the interconnection weight matrix (IWM) (called associative memory) between the $j$th and $i$th neurons, and $f$ is a nonlinear processing operator, for which we assume a thresholding function given by

$$f(x) = \begin{cases} 1, & x > 0, \\ 0, & x \le 0. \end{cases} \tag{1.2}$$

Fig. 1.2. Artificial neuron operation: the input signals are weighted by the interconnection weights, summed, and processed by the neuron to give the output signals.

Thus the operation of a neuron is simply the nonlinear processing of the sum of the weighted input signals, such that the expression can be represented by a matrix-vector product, i.e.,

$$U = f\{TV\}, \tag{1.3}$$

where $U$ is the output state vector of the neural network, $f$ is a nonlinear thresholding operator, $V$ is the input vector, and $T$ is the IWM.
Neural network models have been simulated by conventional computers, but they are best implemented on special-purpose computational hardware. VLSI technology has also been used to design and simulate neural network operations. For a two-layer neural network the number of interconnections is equal to the square of the number of neurons in the network. For example, a fully interconnected network of $10^4$ neurons requires about $10^8$ interconnections, which is beyond the state-of-the-art VLSI technology. On the other hand, optics offers the advantages of parallel processing and massive interconnection capabilities for the design of a large-scale optical neural network (Farhat, Psaltis, Prata and Paek [1985], Lalanne, Taboury, Saget and Chavel [1987], Athale, Szu and Friedlander [1986], Wu, Lu, Xu and Yu [1989], Yu, Lu, Yang and Gregory [1990]). The primary features that stimulated our investigation of optical implementations of neural networks were that light beams which propagate in space do not interfere with each other and that optical systems provide a larger space-bandwidth product.
The first optical implementation of a neural network was proposed by Farhat, Psaltis, Prata and Paek [1985]. Since then, a score of optical neural network architectures have been proposed, including the associative memory using a liquid crystal light valve (LCLV) by Farhat and Psaltis [1987] and Psaltis, Yu, Gu and Lee [1987]; hybrid optical neural networks using a programmable spatial light modulator (SLM) by Johnson, Handschy and Pagano-Stauffer [1987]; and the optical ring resonator by Anderson and Erie [1987]. This chapter gives an overview of the optical implementation of neural networks. It is not intended to cover the vast domain of neural networks, for which studies by Rosenblatt [1962], Fukushima [1988], Hopfield [1988], and Kohonen [1984] are particularly useful.
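The neuron operation of eqs. (1.1) and (1.2), i.e., the thresholded matrix-vector product $U = f\{TV\}$, can be sketched in a few lines of Python. This is an illustrative sketch only: the outer-product rule used in `build_iwm` to construct the IWM, the two 8-neuron patterns, and all function names are assumptions introduced for the example and are not taken from the architectures cited above.

```python
def threshold(x):
    """Thresholding operator f of eq. (1.2): 1 for x > 0, otherwise 0."""
    return 1 if x > 0 else 0


def build_iwm(patterns):
    """Assumed outer-product rule: T_ij = sum over patterns of (2v_i - 1)(2v_j - 1), T_ii = 0."""
    n = len(patterns[0])
    T = [[0] * n for _ in range(n)]
    for p in patterns:
        b = [2 * v - 1 for v in p]  # map binary {0, 1} to bipolar {-1, +1}
        for i in range(n):
            for j in range(n):
                if i != j:
                    T[i][j] += b[i] * b[j]
    return T


def update(T, V):
    """One iteration of eq. (1.1): threshold the weighted sum for every neuron."""
    return [threshold(sum(t * v for t, v in zip(row, V))) for row in T]


def recall(T, V, max_iter=10):
    """Feed the thresholded output back as the next input until it stops changing."""
    for _ in range(max_iter):
        U = update(T, V)
        if U == V:
            break
        V = U
    return V


patterns = [[1, 1, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 1, 1, 1, 0, 0]]
T = build_iwm(patterns)
recalled = recall(T, [1, 1, 0, 0, 0, 0, 0, 0])  # partial version of the first pattern
interconnections = sum(len(row) for row in T)   # N * N entries for N neurons
```

For this small example the partial input settles on the first stored pattern, and the weight matrix holds $N^2 = 64$ entries for $N = 8$ neurons, which illustrates why the interconnection count grows so quickly with network size.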
§ 2. Optical Associative Memory

Much interest has been expressed in the implementation of associative memory using optical techniques, especially the implementation of neural models. Associative memory is a process in which the presence of a complete or partial input directly produces a predetermined output. Here, we consider an electro-optical associative memory using an inner-product model, whereas a neural network is based on an outer-product model; for example, matrix-vector multiplication results in an outer product. It is interesting to note that the inner-product model can be implemented by employing a new class of photonic materials, the so-called electron trapping (ET) materials, as proposed by Jutamulia, Storti, Lindmayer and Seiderman [1991].
An optical correlator has been widely used to perform space-invariant pattern recognition, in which a system is equipped with a memory bank of patterns. The match of an input object with a memory pattern is detected by the presence of a strong correlation peak, which is essentially a space-invariant inner product. The correlation peak, resembling a point source, can be further used to reconstruct a hologram, such that a recalled output can be projected. If the recalled output is identical to the input, the system is
categorized as auto-associative, but, if the recalled output is not the same as the input, the memory is called hetero-associative. It is obvious that to recall different stored memories, the inner product must not be space-invariant. In other words, the location of a noninvariant correlation peak represents the match of the input object with a memory pattern. Therefore, an optical associative memory retrieval process can be decomposed by taking multiple noninvariant inner products in parallel and multiplying them with the corresponding memories.
Liu, Kung and Davis [1986] demonstrated a real-time optical associative retrieval technique, which can be outlined as follows: (1) computation of the inner product of the input-pattern vector with each of the stored memories; (2) multiplication of each inner product with the corresponding memory; (3) summation of these products over all of the memories to produce the result; and (4) subjection of the result to thresholding to produce the output result. If the output result does not belong to one of the associative memories, the thresholded result can be fed back as a new input, continuing this process until it converges to a final output result. Since a correlation peak is proportional to the inner product, this method is equivalent to the holographic method of Paek and Psaltis [1987].
Although the preceding methods may approximate a biological neural process, many iterations may be needed to obtain a convergent output. The use of a two-dimensional optical thresholder can be avoided by changing the order of the thresholding operations, which yields a noniterative association similar to the winner-take-all algorithm. Even though this method may not closely imitate a biological process, it may have some practical applications. The algorithm can be expressed by

$$o(x, y) = m_k(x, y), \tag{2.1}$$

if

$$\iint m_k(x, y)\, i(x, y)\, dx\, dy \ge U, \tag{2.2}$$

where $o(x, y)$ and $i(x, y)$ are the respective output and input patterns, $m_k(x, y)$ is the $k$th stored memory, and $U$ is an arbitrary constant. In fact, this algorithm is similar to that of the symbolic substitution by correlation of Yu and Jutamulia [1987], and is also similar to the method of controlled nonlinearity in the correlation domain proposed by Athale, Szu and Friedlander [1986] as well as the holographic associative memory by Owechko, Dunning, Marom and Soffer [1987].
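The two retrieval strategies described above, the four-step iterative retrieval and the noniterative thresholded association of eq. (2.1), can be compared with a short Python sketch. The stored patterns, the choice of half the maximum summed value as the threshold level in step (4), and the function names are assumptions made for illustration; the optical systems cited above do not prescribe them.

```python
def inner(a, b):
    """Inner product of two patterns given as flat lists."""
    return sum(x * y for x, y in zip(a, b))


def iterative_recall(memories, pattern, max_iter=10):
    """Steps (1)-(4): inner products, weighting, summation, thresholding, with feedback."""
    for _ in range(max_iter):
        weights = [inner(pattern, m) for m in memories]             # step (1)
        summed = [sum(w * m[n] for w, m in zip(weights, memories))  # steps (2) and (3)
                  for n in range(len(pattern))]
        level = max(summed) / 2                                     # assumed threshold level
        output = [1 if s > level else 0 for s in summed]            # step (4)
        if output in memories or output == pattern:
            return output
        pattern = output                                            # feed back as new input
    return pattern


def noniterative_recall(memories, pattern, U):
    """Eq. (2.1): return the stored memory whose inner product with the input reaches U."""
    best = max(memories, key=lambda m: inner(pattern, m))
    return best if inner(pattern, best) >= U else None


memories = [[1, 1, 1, 1, 0, 0, 0, 0],
            [0, 0, 1, 1, 1, 1, 0, 0]]
partial = [1, 1, 1, 0, 0, 0, 0, 0]  # incomplete version of the first memory

recalled_iterative = iterative_recall(memories, partial)
recalled_direct = noniterative_recall(memories, partial, U=3)
```

In the noniterative version the thresholding acts on the inner products rather than on the two-dimensional output, so no optical thresholder over the full pattern is needed; an input that matches no memory well enough simply yields no output.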
In view of this algorithm, two operations are required to obtain an associative memory output $o(x, y)$: (1) the inner product of the input pattern with each memory is computed, and (2) if the inner product equals or exceeds the threshold value, the output pattern is the corresponding memory. Furthermore, by referring to the Parseval theorem, it is obvious that the inner product of the input pattern with the memory in the space domain is equal to the inner product in the Fourier domain, as given by

$$\iint m_k(x, y)\, i(x, y)\, dx\, dy = \frac{1}{4\pi^2} \iint M_k(p, q)\, I(-p, -q)\, dp\, dq \ge U, \tag{2.3}$$

where $M_k(p, q)$ and $I(p, q)$ are the Fourier spectra of $m_k(x, y)$ and $i(x, y)$, respectively, and $U$ is an arbitrary constant. Since $m_k(x, y)$ and $i(x, y)$ are positive real, and their Fourier spectra are, in general, complex, the realization of eq. (2.3) is complicated. However, the association can be performed as follows:

$$\iint |M_k(p, q)|^2\, |I(p, q)|^2\, dp\, dq \ge U', \tag{2.4}$$

where $U'$ is a new threshold value. Note that the left-hand side of eq. (2.4) can be used for object identification and counting, as demonstrated by Jutamulia, Fujii and Asakura [1982], as follows:

$$\iint |M_k(p, q)|^2\, |I(p, q)|^2\, dp\, dq = \mathcal{F}\{|M_k|^2\, |I|^2\}\big|_{x=0,\,y=0}, \tag{2.5}$$

where $\mathcal{F}$ represents the Fourier transform operation. Alternately, eq. (2.5) can be written as

$$\begin{aligned} \{\mathcal{F}[(M_k I^*)(M_k I^*)^*]\}_{x=0,\,y=0} &= \{[\mathcal{F}(M_k I^*)] \odot [\mathcal{F}(M_k I^*)]\}_{x=0,\,y=0} \\ &= \{[m_k \odot i] \odot [m_k \odot i]\}_{x=0,\,y=0}, \end{aligned} \tag{2.6}$$

where $*$ and $\odot$ denote the complex conjugate and the correlation operation, respectively. In this equation the quantity evaluated at the origin is the peak value of the autocorrelation of the cross correlation of $m_k(x, y)$ and $i(x, y)$. The result of eq. (2.6) is proportional to the number of peaks from the first correlation ($m_k \odot i$), which can be used for object identification and counting applications, as pointed out by McAulay, Wang and Ma [1990]. Note that this result can be used for counting specific objects, whereas eq. (2.4) can be used for pattern matching (i.e., counting a specific object as either zero or one).
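The Fourier-domain relations of eqs. (2.3)-(2.6) can be checked numerically on a small example. The sketch below is a one-dimensional, discrete, circular analogue written in plain Python: the integrals become finite sums, the factor $1/4\pi^2$ becomes $1/n$ for an $n$-point transform, and the test signals and helper names (`dft`, `circular_correlation`) are assumptions made only for this illustration.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (adequate for short illustrative signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def circular_correlation(a, b):
    """c[s] = sum_t a[t + s] b[t], with indices taken modulo the signal length."""
    n = len(a)
    return [sum(a[(t + s) % n] * b[t] for t in range(n)) for s in range(n)]


m = [1, 1, 0, 1, 0, 0, 0, 0]     # stored memory (hypothetical)
i_in = [0, 1, 1, 0, 1, 0, 0, 0]  # input: the memory shifted by one sample
n = len(m)
M, I = dft(m), dft(i_in)

# Discrete analogue of eq. (2.3): inner product in space equals inner product of spectra.
space_inner = sum(a * b for a, b in zip(m, i_in))
fourier_inner = sum(Mk * Ik.conjugate() for Mk, Ik in zip(M, I)).real / n

# Discrete analogue of eqs. (2.4)-(2.6): the integrated product of the power spectra
# equals the autocorrelation, at the origin, of the cross correlation of m and i.
power_integral = sum(abs(Mk) ** 2 * abs(Ik) ** 2 for Mk, Ik in zip(M, I)) / n
cross = circular_correlation(m, i_in)
autocorr_at_origin = sum(c * c for c in cross)
```

Because the power-spectrum product discards the phase, `power_integral` is unchanged when the input is shifted, whereas `space_inner` is not; this is the shift-invariant behaviour that makes eq. (2.4) suitable for matching and counting.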
There are two basic matching schemes, namely, XNOR and AND, which in Boolean algebra are related by

$$A\ \mathrm{XNOR}\ B = (A\ \mathrm{AND}\ B)\ \mathrm{OR}\ (\bar{A}\ \mathrm{AND}\ \bar{B}), \tag{2.7}$$

where the overbar denotes the complementary operation. We note that this equation will result in different inner-product calculations, such that the AND scheme performs only half of the matching performed in the XNOR scheme. However, the essence of the information is not reduced, since ($A$ AND $B$) and ($\bar{A}$ AND $\bar{B}$) are not completely independent. In fact, the two matching schemes based on XNOR and AND operations should be applied in different circumstances. For symbolic computation, if the XNOR matching scheme is applied, a digit "0" is equally as good a piece of information as a "1". However, for two-dimensional associative memory dealing with imaging and object recognition, the AND (multiplicative) matching scheme is preferred. In the XNOR scheme the match between a "0" of the background in a memory scene and a "0" of the background in an input scene will contribute positively to the inner product. This contribution is generally not wanted for associative imaging memories, where it is important to obtain a high ratio between the auto-inner product and the cross-inner product.
In a real-world situation the input pattern can be just a uniform distribution. For example, a totally bright input consists of only "1"s. In this case the space-domain XNOR scheme will always detect half-matches between the input and all stored memories at the threshold level, provided that a memory consists of the same number of "1"s and "0"s. However, the AND scheme will always detect full matches for all memories. Consequently, the whole memory will be considered as an associative memory. However, by simply blocking the intense DC component this drawback can be avoided in the Fourier domain approach.
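The difference between the two matching schemes, including the uniform-input case just described, can be illustrated with a few lines of Python. The three 8-bit memories (each with four "1"s and four "0"s) and the function names are hypothetical and chosen only for this example.

```python
def and_match(a, b):
    """AND (multiplicative) inner product: counts positions where both bits are 1."""
    return sum(x & y for x, y in zip(a, b))


def xnor_match(a, b):
    """XNOR inner product: counts positions where the two bits agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


def xnor_via_and(a, b):
    """Eq. (2.7): XNOR matching as AND matching plus AND matching of the complements."""
    comp_a = [1 - x for x in a]
    comp_b = [1 - y for y in b]
    return and_match(a, b) + and_match(comp_a, comp_b)


memories = [[1, 1, 1, 1, 0, 0, 0, 0],
            [1, 0, 1, 0, 1, 0, 1, 0],
            [0, 0, 1, 1, 1, 1, 0, 0]]
bright = [1] * 8  # totally bright (uniform) input

xnor_scores = [xnor_match(bright, m) for m in memories]  # half of the 8 bits each
and_scores = [and_match(bright, m) for m in memories]    # equal to each auto-inner product
auto_scores = [and_match(m, m) for m in memories]
```

With the uniform input every memory scores 4 out of 8 under XNOR (a half-match), while under AND every memory scores as high as its own auto-inner product, so the AND scheme cannot distinguish the bright input from a true match without further measures such as the DC blocking mentioned above.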
§ 3. Optical Neural Networks

3.1. TWO-DIMENSIONAL IMPLEMENTATION
Referring to the one-dimensional iterative equation of eq. (1.1), it is easy to see that a one-layer two-dimensional neural network would require $N^4$ interconnections, for which the iterative equation can be written as

$$U_{lk}(n+1) = f\left[\sum_{i=1}^{N}\sum_{j=1}^{N} T_{lkij}\,U_{ij}(n)\right], \tag{3.1}$$

where $U_{lk}$ represents the state of the $lk$th neuron in an $N \times N$ space and $T_{lkij}$ is a four-dimensional interconnection weight matrix (IWM). Since the IWM can be partitioned into an array of two-dimensional submatrices $T_{11ij}, T_{12ij}, \ldots, T_{NNij}$, a four-dimensional associative memory matrix can be displayed in an $N^2 \times N^2$ two-dimensional format (fig. 3.1).

Fig. 3.1. Partition of a four-dimensional weight matrix $T_{lkij}$ into an array of two-dimensional submatrices.

With this two-dimensional representation, attention has been drawn to the optical implementation of neural networks. Farhat and Psaltis [1987] proposed an optical neural architecture in which a basic matrix-vector optical processor was used. In this system a linear array of LEDs was used as an input feature composer, which is interconnected to a synaptic mask by a lenslet array (fig. 3.2).

Fig. 3.2. Two-dimensional optical neural network.

To provide the network with self-organization and learning capabilities, a fine-resolution programmable spatial light modulator (SLM) with a large number of distinguishable gray levels is needed for the generation of the IWM. However, currently available SLMs have a relatively small space-bandwidth product and very limited gray levels, which make it difficult to implement them in such an optical neural network. Furthermore, the interconnection part of the iterative equation has to be added electronically or by using an array of light-integrating elements, which would slow down the operation or pose severe optical alignment problems. To alleviate these problems, we describe a hybrid optical architecture using a high-resolution video monitor, by Lu, Wu, Xu and Yu [1989] (fig. 3.3).

Fig. 3.3. Schematic diagram of an adaptive optical neural network using a video monitor.

This architecture has the advantage of providing an incoherent light source, and it also alleviates the constraints on the resolution and on the dynamic range of the SLM. However, using a video monitor has other disadvantages: the physical size makes the system rather large, and the curvature of the monitor screen causes some alignment problems. To solve these problems, a compact optical neural network using inexpensive pocket-sized liquid-crystal televisions (LCTVs), by Yu, Lu, Yang and Gregory [1990], will be discussed in the next section. We also note that the major difference of this architecture, when compared with the matrix-vector processor of Farhat and Psaltis, is that the positions of the input feature composer and the synaptic mask have been exchanged. By using this arrangement, optics can perform the summation of the interconnections between the input vector and the memory submatrices.

3.2. LCTV-BASED OPTICAL NEURAL NETWORKS
Liquid-crystal televisions (LCTVs) have been used as viable spatial light modulators (SLMs) in various optical signal processing and computing architectures (Liu, Davis and Lilly [1985], McEwan, Fisher, Rolsma and Lee [1985], Gregory [1986], Young [1986], Tai [1986], and Tam, Yu, Gregory and Juday [1990]). With current advances in LCTV devices, the imaging quality has improved to nearly that of commercially available high-resolution video monitors. For example, the contrast ratio of a recent LCTV with built-in thin-film transistors can be higher than 30:1, and the dynamic range is approximately 16 gray levels and can be continuously adjusted. The optical architecture that we will discuss uses a Hitachi C5-LC1 LCTV to display the IWM and a Seiko LVD-202 LCTV to display the input patterns. The resolutions of these LCTVs are 256 × 420 pixels and 220 × 330 pixels, respectively. An 8 × 8 fully interconnected LCTV neural network is shown in fig. 3.4, in which an 80 W xenon-arc lamp is used as the incoherent source.

Fig. 3.4. An LCTV optical neural network.

The lenslet array consists of 8 × 8 lenses, each of which images one of the IWM submatrices onto LCTV2 to provide the interconnections between the IWM and the input pattern. Thus each of the IWM submatrices is superimposed onto (i.e., multiplied by) the input pattern to establish the interconnection part of eq. (3.1), i.e.,

$$\sum_{i=1}^{N}\sum_{j=1}^{N} T_{lkij}\,U_{ij}(n). \tag{3.2}$$

The transmitted light field after LCTV2 is collected by an imaging lens, which is focused at the lenslet array and images onto a charge-coupled-device (CCD) array detector (fig. 3.5). The signals collected by the CCD camera are then sent to a thresholding circuit, and the final results can be fed back to LCTV2 for the next iteration. Note that the data flow in the optical system is controlled primarily by a microcomputer (PC). For example, the IWM and the input pattern can be written onto LCTV1 and LCTV2 by means of the PC, and the PC can also make decisions based on the output results of the neural network. Thus the LCTV neural network is indeed a programmable, adaptive neural network.
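The partition of the four-dimensional IWM into an $N^2 \times N^2$ display format, and the submatrix-by-submatrix multiplication and summation that the lenslet array performs, can be mimicked numerically. In the following Python sketch the network size, the gray-level weights, the input pattern, and the function names are all hypothetical; the sketch only shows that summing each displayed submatrix against the input pattern gives the same result as the double sum over $i$ and $j$ in eq. (3.1).

```python
N = 3  # neurons per row and per column (assumed small for illustration)

# Hypothetical gray-level weights T[l][k][i][j] in {0, 1, 2} and a binary input U[i][j].
T = [[[[(l + 2 * k + 3 * i + 5 * j) % 3 for j in range(N)] for i in range(N)]
      for k in range(N)] for l in range(N)]
U = [[(i + j) % 2 for j in range(N)] for i in range(N)]


def to_display(T, N):
    """Lay out the N x N array of N x N submatrices as one N^2 x N^2 matrix."""
    big = [[0] * (N * N) for _ in range(N * N)]
    for l in range(N):
        for k in range(N):
            for i in range(N):
                for j in range(N):
                    big[l * N + i][k * N + j] = T[l][k][i][j]
    return big


def interconnect_direct(T, U, N):
    """Double sum over i and j of T_lkij * U_ij for every output neuron (l, k)."""
    return [[sum(T[l][k][i][j] * U[i][j] for i in range(N) for j in range(N))
             for k in range(N)] for l in range(N)]


def interconnect_from_display(big, U, N):
    """Multiply each displayed submatrix by the input pattern and sum it, block by block."""
    return [[sum(big[l * N + i][k * N + j] * U[i][j]
                 for i in range(N) for j in range(N))
             for k in range(N)] for l in range(N)]


def threshold(S, level):
    """Electronic thresholding of the detected array before it is fed back."""
    return [[1 if s > level else 0 for s in row] for row in S]


display = to_display(T, N)
detected = interconnect_from_display(display, U, N)
next_state = threshold(detected, level=4)  # threshold level chosen arbitrarily here
```

The display matrix has $N^4$ entries (81 for $N = 3$), which is the interconnection count quoted for a two-dimensional network of $N \times N$ neurons.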
Fig. 3.5. Optical interconnection using lenslet array (video monitor, lenslet array, input device, imaging lens, output detector).
3.3. COMPACT OPTICAL NEURAL NETWORKS
The preceding section discussed an optical neural network using liquid-crystal televisions (LCTVs) and a large imaging lens. The numerical aperture of the imaging lens is approximately 1/0.7, and the lens is used to image the lenslet array onto the CCD detector. However, the architecture suffers from the disadvantages of low light efficiency, high aberration, and large physical size. To alleviate these shortcomings, we shall discuss a compact architecture, proposed by Yang, Lu and Yu [1990], using tightly cascaded LCTVs (fig. 3.6).

Fig. 3.6. Compact optical neural network using cascaded LCTVs (white light source, LCTV1, LCTV2, CCD).

Let us refer to the two-dimensional iterative equation (3.1), in which the input pattern and the IWM are displayed on the cascaded LCTVs. The emerging light can be made proportional to the product $T_{lkij}U_{ij}(n)$ by distributing the input pattern vectors over the LCTVs (fig. 3.7).

Fig. 3.7. Formats of the input and the IWM: (a) input pattern format, and (b) IWM. The submatrix shown has elements $T_{lk11}, T_{lk12}, \ldots, T_{lkNN}$.

Since the $T_{lkij}U_{ij}(n)$ submatrices are superimposed onto the CCD detector by the lenslet array, the output intensity array is the summation of these submatrices over $i$ and $j$. By thresholding the output signals, the result can be fed back to the input LCTV2 for the next iteration. Thus a closed-loop operation can be obtained with this system. Furthermore, with the elimination of the imaging lens, the system can be built in compact form. By cascading the input pattern with the IWM, the output pattern pixels retain the same shape as the pattern pixels, instead of taking the circular shape of the lenses.

3.4. MIRROR-ARRAY INTERCONNECTED NEURAL NETWORKS
Optical implementation of neural networks has been burgeoning in recent years. A primary reason could be the fact that massive interconnections, based either on lenslet arrays or on holographic interconnections, are available in optics. The distinction between these techniques is that the lenslet-array interconnection neural network is basically an incoherent interconnection system, whereas the holographic interconnection is coherent. The advantage of using the incoherent lenslet interconnection is that coherent artifact noise can be suppressed; however, it suffers from a low light efficiency, which limits large-scale operation. To alleviate this shortcoming, we propose a mirror-array interconnection method, with which a system of high light efficiency can be built. The mirror-array interconnection neural network (fig. 3.8) proposed by Yu, Yang, Yin and Gregory [1991] will be described.

Fig. 3.8. A mirror-array optical interconnected neural network.

Let us assume that the IWM and the input pattern vectors are displayed on two tightly cascaded liquid-crystal televisions, as discussed in the preceding section. The input pattern is distributed in the LCTV using an input pixel element which is of the same size as the IWM submatrix. Thus, the intensity of the light emerging from the cascaded LCTVs is proportional to the $[T_{lkij}U_{ij}(n)]$ submatrices. If we further assume that an $N \times N$ mirror array is affixed onto a parabolic substrate, each of the mirror elements reflects one of the $[TU]$ submatrices, and the reflected submatrices are superimposed onto the CCD array detector. By thresholding the array of output signals, an output pattern can be obtained. Again, this output pattern can be fed back to LCTV1 for the next iteration. We note that by using the mirror array for interconnection, the light efficiency of the system increases by a factor of $N$ as compared with the lenslet-array technique.
Since the reflected $[TU]$ submatrices are not precisely superimposed, the performance of the interconnection may be degraded. Let us now consider a one-dimensional analysis. The position error $\varepsilon$ occurs mostly at the edge of the $[TU]$ submatrices, and can be shown to be
$$\varepsilon = N a \beta^2, \tag{3.3}$$

where $N$ is the number of neurons in a row (or column) of a pattern vector, $a$ is the pixel size, and $\beta$ is the maximum allowable angle of the mirror array, as estimated using a paraxial approximation,

$$\beta = \frac{N^2 a}{2F\cos\theta}, \tag{3.4}$$

where $F$ is the focal length of the parabolic substrate, and $\theta$ is the angle between the incident and reflected light beams. By substituting eq. (3.4) into eq. (3.3), the position error can be written as

$$\varepsilon = \frac{N^5 a^3}{4F^2\cos^2\theta}, \tag{3.5}$$

from which we see that the position error increases as the fifth power of the number of neurons (i.e., $\varepsilon \propto N^5$). Note that the pixel overlapping within the interconnected $[TU]$ submatrices, which is primarily caused by the diffraction effect and the source size of the system, would also affect the performance of the system. For example, an extended light source would produce a divergent beam angle given by

$$\alpha = \frac{s}{2f}, \tag{3.6}$$

where $s$ is the source size and $f$ is the focal length of the collimating lens. Figure 3.9 illustrates the shifting effect of the pixel image, in which the
deviation of the shadow-cast image can be expressed as

$$d_1 = \alpha L, \tag{3.7}$$

where $L$ is the distance from the LCTVs to the CCD detector. Thus the spread of the pixel images can be written as

$$d_2 = \frac{\lambda L}{a}, \tag{3.8}$$

where $\lambda$ is the wavelength of the light source.

Fig. 3.9. Shadow casting configuration: (a) effects due to source size, and (b) effects due to image shifting.

If we restrict the position error and the pixel overlapping to be within one tenth of the pixel size, the constraints can be shown to be

$$\varepsilon = \frac{N^5 a^3}{4F^2\cos^2\theta} < \frac{a}{10}, \tag{3.9}$$

and

$$d = d_1 + d_2 < \frac{a}{10}. \tag{3.10}$$
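As a rough numerical illustration of these tolerance constraints, the following Python sketch evaluates the position error of eq. (3.5) and the pixel spread of eq. (3.10) for one set of made-up parameters. Everything numerical here is hypothetical (pixel size, focal lengths, source size, wavelength, distance, all in millimetres), and two relations are assumptions of this sketch rather than statements from the source: the mirror angle is taken as $N^2 a/(2F\cos\theta)$ so that substitution into eq. (3.3) reproduces eq. (3.5), and the shadow-casting shift is taken as the paraxial value $d_1 = \alpha L$.

```python
import math


def max_mirror_angle(N, a, F, theta):
    """Assumed form of the mirror angle, chosen to be consistent with eqs. (3.3) and (3.5)."""
    return N ** 2 * a / (2 * F * math.cos(theta))


def position_error(N, a, F, theta):
    """Eq. (3.5): position error at the edge of the superimposed submatrices."""
    return N ** 5 * a ** 3 / (4 * F ** 2 * math.cos(theta) ** 2)


def pixel_spread(s, f, L, wavelength, a):
    """Total spread d = d1 + d2 from source size and diffraction."""
    alpha = s / (2 * f)      # divergent beam angle of the extended source
    d1 = alpha * L           # assumed paraxial shadow-casting shift
    d2 = wavelength * L / a  # diffraction spread of a pixel of size a
    return d1 + d2


def max_neurons_per_row(a, F, theta):
    """Largest N for which the position error stays below one tenth of the pixel size."""
    N = 1
    while position_error(N + 1, a, F, theta) < a / 10:
        N += 1
    return N


# Hypothetical parameters, all lengths in millimetres.
a, F, theta = 0.5, 500.0, 0.0
s, f, L, wavelength = 0.5, 500.0, 20.0, 0.5e-3

n_max = max_neurons_per_row(a, F, theta)
eps = position_error(n_max, a, F, theta)
d = pixel_spread(s, f, L, wavelength, a)
spread_ok = d < a / 10
```

Because the error grows as $N^5$, the admissible array size is very sensitive to the pixel size and the focal length: with these particular numbers the limit is 13 neurons per row, and going to 14 already violates the one-tenth-pixel criterion.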
Thus, using these constraints, adverse effects of using the proposed mirror array for interconnections can be minimized.

3.5. OPTICAL DISK-BASED NEURAL NETWORKS
Rapid advances in optical disk (OD) technology have made large-capacity optical information storage available. This section examines an OD-based neural network as proposed by Lu, Choi, Wu, Xu and Yu [1989]. Since an OD stores information in binary form, two neural network models, i.e., the one using binary pattern association (BPA) by Lu, Xu, Wu and Yu [1990] and the three-state pattern association (TPA) by Farhat and Psaltis [1987], are suitable for optical implementation. The IWM for BPA is a binary matrix that can be implemented directly on an OD, whereas the entries of the IWM for TPA are "1" (i.e., excitation), "-1" (i.e., inhibition), or "0" (i.e., no relation), based on the association. Since this IWM can be represented by a positive binary matrix $T^+$ and a negative binary matrix $T^-$, it can be implemented on the two sides of an optical disk. Figure 3.10 shows a proposed OD-based neural network, in which a pulsed laser beam is divided into two orthogonally polarized beams using a polarized beam-splitter (PBS1). Since these beams are directed at the scanning heads, $T^+$ and $T^-$ can be read out in parallel. Note that each scanning head consists of a mirror, a polarized beam-splitter, a quarter-wave plate, and a microscopic lens, which enlarges the readout matrix. Moreover, the scanning heads are designed to travel along the diagonal lines of the disk, which cover the innermost to the outermost tracks, and the polarized beam-splitters (PBS2 and PBS3) are aligned in the same direction as the incoming beams. Since each reading beam passes the quarter-wave plate twice, its polarization rotates by 90°. These orthogonally polarized readout beams are then enlarged by the microscopic lenses L2 and L3, and then superimposed onto the diffuser to form an incoherent IWM. The optical setup behind the diffuser is basically the same as in the previously described optical neural networks.

Fig. 3.10. Optical disk-based neural network architecture. L: lens; M: mirror; PBS: polarized beam splitter; QWP: quarter-wave plate; $T^+$: positive part of IWM; $T^-$: negative part of IWM; $f$: nonlinear thresholding function; PD: photodetector.

To form a closed-loop neural network operation, the electronic signals from the photodetector arrays are sent to an array of electronic postprocessing circuits. Each circuit consists of a buffer, a comparator, and a thresholding network. The output result can be fed back for the next iteration, or sent to the microcomputer for decision making. Several types of commercially available optical disks can be used, such as the read-only CD-ROM, the write-once OD, and the magneto-optical erasable disk.
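The representation of a three-state IWM by two binary matrices can be made concrete with a small Python sketch. The 3 × 3 weight matrix and the function names are hypothetical, and the electronic subtraction of the two detected signals before thresholding is an assumption about how the comparator stage would be used; the sketch only shows that reading $T^+$ and $T^-$ separately and subtracting gives the same result as operating with the signed matrix directly.

```python
def split_iwm(T):
    """Split a three-state IWM (entries 1, 0, -1) into binary matrices T+ and T-."""
    T_pos = [[1 if t > 0 else 0 for t in row] for row in T]
    T_neg = [[1 if t < 0 else 0 for t in row] for row in T]
    return T_pos, T_neg


def matvec(M, v):
    """Matrix-vector product, standing in for the optical summation on the detector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def readout(T_pos, T_neg, v):
    """Detect the two binary channels separately, subtract, then threshold (assumed order)."""
    pos = matvec(T_pos, v)
    neg = matvec(T_neg, v)
    return [1 if p - n > 0 else 0 for p, n in zip(pos, neg)]


T = [[0, 1, -1],
     [1, 0, 1],
     [-1, 1, 0]]  # hypothetical three-state IWM
v = [1, 0, 1]     # binary input pattern

T_pos, T_neg = split_iwm(T)
two_channel = readout(T_pos, T_neg, v)
direct = [1 if s > 0 else 0 for s in matvec(T, v)]
```

Both channels contain only non-negative binary values, which is what allows them to be stored as pits on the disk and read out as light intensities.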
Let us assume that an Optimen 1000TM(a 12 inch diameter, read-only OD) is used. This OD can store as much as 2.05 G b information on both sides. The area on which the laser beam focuses is chosen to be 1 mm', which is primarily limited by the field of view of the microscopic lens (Ll and L3), and a magnification factor of 70 is assumed. Since 1 mmz contains 529 x 529 bits, it can store a (23 x 23)-neuron IWM. The blocks-in-tracks configuration of the OD is shown in fig. 3.10. Although a radial alignment problem of the blocks occurs, this can be compensated for either by using lenses L2 and L3 or by input SLM with the photodetector arrays. To retrieve an association memory, the scanning heads can simply aim at the IWM blocks using a pulse laser. The captured IWM is then imaged onto the diffuser (fig. 3.10). Note that the pulse width and power of the laser are related to the response time and sensitivity of the photodetector array. Since the response time of a typical photodetector is in the 1 ns range, a pulse width in the nanosecond range should be chosen. Let us now consider the rotation of the OD driver. Since the size of a pit is about I pm, in order to avoid blurring the travelling distance of the scanning heads should not exceed 1 pm within the pulse duration (i.e., 1 ns). Thus, the maximum rotation of the OD can be as high as V,, =
x lo9 x 60/(n x 12 x 0.025) = 63 662 rpm,
which is much higher than the rotation of the commercial available OD driver (about 1122 rpm). Based on the OD design parameters described above, one can have about 14 600 blocks on each side of the OD, each block having 529 x 529 bits. Assuming that each block can store 50 associations by using the interpattern association model, there would be about 730 000 associations per disk, i.e., almost five times the 150000 entries of a Webster's Collegiate Dictionary. If the existing OD driver spins at 1122 rpm, the reading heads take about
Fig. 3.11. IWM blocks on an optical disk.
45 μs to move from one block to another. We assume that the array of electronic circuits can perform the postprocessing within this duration and that each block consists of 279 841 connections. The processing rate of the proposed OD neural network is, therefore,
$$V_p = \frac{2 \times 279\,841}{45 \times 10^{-6}} = 12.4 \times 10^{9}\ \text{connections per second}.$$
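The arithmetic behind these estimates can be retraced with a few lines of Python. This is only a sketch of the numbers quoted in the text: the variable names are ours, the inch-to-metre factor 0.025 is the rounded value used above, and reading the factor 2 in the processing rate as the two scanning heads is our interpretation.

```python
import math

# Quantities quoted in the text (variable names are illustrative).
pit_size_m = 1e-6          # pit size, about 1 micrometre
pulse_width_s = 1e-9       # laser pulse width, 1 ns
disk_diameter_in = 12      # 12-inch optical disk
metres_per_inch = 0.025    # rounded conversion factor used in the text

# Maximum rotation: the head may travel at most one pit per pulse.
linear_speed = pit_size_m / pulse_width_s                    # m/s
circumference = math.pi * disk_diameter_in * metres_per_inch # m
rpm_max = linear_speed / circumference * 60

# Storage: blocks per side times associations per block.
blocks_per_side = 14_600
associations_per_block = 50
associations = blocks_per_side * associations_per_block

# Processing rate: a (23 x 23)-neuron IWM has 529 x 529 connections.
connections_per_block = 529 ** 2
block_time_s = 45e-6       # time to move from one block to the next
heads = 2                  # assumed meaning of the factor 2
rate = heads * connections_per_block / block_time_s
speedup = rate / 22e6      # versus 22 x 10^6 connections per second

print(round(rpm_max), associations, connections_per_block)
print(f"{rate:.3e} connections/s, speed-up about {speedup:.0f}")
```

The unrounded speed-up comes out slightly above the figure of about 560 quoted below, which is obtained from the rounded rate of 12.4 × 10⁹ connections per second.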
Compared with the currently available electronic processors (designed for neurocomputing), which can perform about $22 \times 10^{6}$ connections per second, the proposed OD-based neural network has a speed-up factor of about 560. There are, however, critical issues that should be addressed before an OD-based neural network can be realized. The readout head has to be redesigned to read a whole block of an IWM (Psaltis, Yamamura, Neifeld and Kobayashi [1989]). Although existing SLMs still cannot keep up with the processing speed of the proposed system (i.e., about $10^{9}$ Hz), the OD system is suitable for applications such as associative search of a huge database, which does not require frequent changes of the input patterns. Furthermore, the electronic bottleneck in the feedback loop may be alleviated to some extent by using parallel buffers. However, a decision-making circuit for postprocessing data within 2 to 45 μs has to be developed.

§ 4. Neural Network Models
A neural network is a massively parallel data-processing architecture, composed of many simple processing elements interconnected to achieve a collective computational procedure. The essence of a neural net is to embed a series of algorithms into the parallel interconnected network, so that it provides faster (due to a shorter algorithm) and more accurate (due to a distributed structure) computations. Compared with serial computers, neural networks are particularly effective for optimization, association, and recognition.

4.1. HOPFIELD MODEL
An associative memory belongs to the class of fundamental neural networks in which a complete or partial pattern of a known input recalls a predetermined associative output. An associative memory is recalled by taking the matrix-vector product of the distributed memory and the input. If the input is a one-dimensional vector, the distributed memory is then a two-dimensional matrix. If the stored vectors form an orthogonal set, an input that is a member of this set will associate with only one predetermined output. When the input vector is slightly distorted (so that it is no longer a member of the orthogonal set), the desired output cannot be retrieved; the output will instead be an unrecognizable linear combination of several associative memories. To address this problem, Hopfield [1982] proposed a mathematical model in which the desired output can be retrieved from a distorted input after several nonlinear iterations. In the Hopfield model the associative memory retrieval process is equivalent to an iterative thresholded matrix-vector product, expressed by
$$V_i \rightarrow \begin{cases} 1, & \text{if } \sum_{j=1}^{N} T_{ij} V_j \ge 0, \\ 0, & \text{otherwise}, \end{cases} \tag{4.1}$$
where $V_i$ and $V_j$ are the binary output and binary input, respectively. This iterative process is repeated by using the previous output as the new input, until the output converges to be the same as the input. In other words, the iterative process converges on the associative memory, in which the interconnection weight matrix (IWM) is defined as
$$T_{ij} = \begin{cases} \sum_{m=1}^{M} (2V_i^{m} - 1)(2V_j^{m} - 1), & \text{for } i \ne j, \\ 0, & \text{for } i = j, \end{cases} \tag{4.2}$$
where $V_i^{m}$ and $V_j^{m}$ are the ith and jth elements of the mth binary memory vector. Since the memory vector is distributed in the memory matrix, other memories can be overlaid in the same IWM. Therefore, it is merely necessary to calculate the interconnection for each memory, and then add them together to form an associative memory matrix. A problem occurs, however, if the number of stored memories becomes too large, necessitating a substantial increase in the number of neurons. The features of the Hopfield model include the following: (1) construction of the IWM is based on an outer-product operation; (2) the IWM is determined by a supervised learning rule as expressed in eq. (4.2); (3) the iterative equation involves a nonlinear thresholding function; (4) the desired output from a distorted or incomplete input can be obtained after iterations; and (5) the IWM does not necessarily change after
a noniterative learning. The most important advantage of the model is that it provides correct recognitions and associations for distorted or partial input signals. However, since iterations are required, the processing speed is slowed down. As an experimental illustration of the Hopfield model using the LCTV neural network of fig. 3.4, we let the letters A, B, W, and X be stored in the IWM based on the outer-product operation. We assumed that each letter occupies an 8 × 8 pixel array (fig. 4.1a). The positive and negative parts of the IWM are shown in figs. 4.1b and c, respectively. As depicted in fig. 4.1d, a partial image of A is fed to the input LCTV2 of the LCTV neural network. By sequentially displaying the positive and negative parts of the IWM on the LCTV1, the output signal arrays can be picked up by the CCD detector and then sent to the microcomputer for subtraction and thresholding operations. A partially recovered pattern is obtained at the output end, as shown
Fig. 4.1. Experimental results of the Hopfield model: (a) training set; (b), (c) positive and negative weight matrices; and (d) reconstructed result.
in the middle of fig. 4.1d. Since the output pattern is not completely recovered, this result is fed back to the input LCTV2 for the next iteration. A more completely recovered letter is obtained, as depicted on the right-hand side of fig. 4.1d. We note that, to avoid bipolar quantities, the positive and negative parts of the IWM can be combined with a bias, so that a single-step operation can be achieved.

4.2. BACK-PROPAGATION MODEL
The Hopfield model is basically a two-layer (input layer and output layer) neural network, which is very useful for associative memory, but not for complex computations. A classic example is that of an XOR gate, which cannot be implemented by a simple input-output layer network, as pointed out by Rumelhart, Hinton and Williams [1986]. This is also understood from the viewpoint of fundamental logic processing by Hurst [1978]. A neuron is exactly the same as a threshold logic gate, which performs a sum of weighted products followed by thresholding. The AND and OR are vertex gates, and thus can be easily realized using threshold logic gates. However, the XOR gate cannot be realized using a single threshold logic gate, which means that a single neuron cannot perform the XOR operation. Nevertheless, multilayer neural networks are capable of performing XOR or other complex computational tasks. It is not difficult to determine the interconnections for a two-layer network when the values or states of the two layers are known. However, it is very difficult to determine the interconnections between layers in a multilayer network when only the states of the input and output layers are known. Rumelhart, Hinton and Williams [1986] proposed the back-propagation method to address this problem. This method is applied to the formation of correct interconnections for hidden layers in a multilayer network. To generate interconnections for known input-output pairs, random interconnections are assigned as the starting point. During the first phase of operation, the input is presented and propagated forward through the multilayer network to compute the output. The actual output is then compared with the desired output, from which an error signal results. The second phase of operation involves a backward pass through the network, during which the error signal is passed to each layer in the network and the appropriate weight changes are made.
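The two phases just described can be sketched for the XOR example in Python with NumPy. This is a minimal illustration rather than the procedure of any particular reference: the sigmoid response function, the single hidden layer of three neurons, the learning rate, the squared-error measure, and all function names are our assumptions, and a constant third input component is used as a bias. Whether a given run reaches the XOR targets depends on the random starting interconnections and the learning rate.

```python
import numpy as np

def sigmoid(n):
    """Differentiable, nondecreasing response function (assumed choice)."""
    return 1.0 / (1.0 + np.exp(-n))

def forward(x, W1, W2):
    """First phase: propagate the input forward, layer by layer."""
    h = sigmoid(W1 @ x)      # hidden-layer output
    y = sigmoid(W2 @ h)      # network output
    return h, y

def loss(X, D, W1, W2):
    """Half the summed squared error between actual and desired outputs."""
    return 0.5 * sum(float(np.sum((forward(x, W1, W2)[1] - d) ** 2))
                     for x, d in zip(X, D))

def gradients(x, d, W1, W2):
    """Second phase: pass the error signal backward through the layers."""
    h, y = forward(x, W1, W2)
    delta2 = (y - d) * y * (1.0 - y)            # error signal at the output layer
    delta1 = (W2.T @ delta2) * h * (1.0 - h)    # error signal at the hidden layer
    return np.outer(delta1, x), np.outer(delta2, h)

def train(X, D, hidden=3, rate=0.5, epochs=5000, seed=0):
    """Start from random interconnections and correct them iteratively."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(hidden, X.shape[1]))
    W2 = rng.normal(size=(D.shape[1], hidden))
    for _ in range(epochs):
        for x, d in zip(X, D):
            g1, g2 = gradients(x, d, W1, W2)
            W1 -= rate * g1
            W2 -= rate * g2
    return W1, W2

# XOR input-output pairs; the third input component is a constant bias.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)

W1, W2 = train(X, D)
print([float(forward(x, W1, W2)[1][0]) for x in X])
```

Once the interconnections have been formed, evaluating the network for a new input requires only the forward pass.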
The weighted-sum operation for multilayer neural networks is given as follows,
$$n_j = \sum_i w_{ji}\, o_i, \tag{4.3}$$
where $n_j$ is the output before a nonlinear transformation, $w_{ji}$ is the weight of the interconnection, and $o_i$ is the output of the previous layer (ith layer). The output for the jth layer is
$$o_j = f_j(n_j), \tag{4.4}$$
where $f_j$ is a differentiable and nondecreasing response function for a nonlinear transformation. Note that hard thresholding is not differentiable. The output expressed in eq. (4.4) then becomes the input for the next layer, as expressed in eq. (4.3). This computation is continued until the final output is obtained. The final output from this actual computation may not be the same as the desired output, so that the interconnections between layers must be corrected. The important aspects of the back-propagation model are: (1) the interconnection is formed after an iterative error-correction process (as a consequence, no specific explicit algorithm is needed and it is not necessary to write a program as long as the input-output pair is known); and (2) after the correct interconnection is formed, the computation can be performed in one step. Optical implementations for the back-propagation neural network were discussed by Psaltis, Brady and Wagner [1988]. They proposed an architecture using a photorefractive crystal for the implementation of the IWMs, which can be optically corrected or changed. The feed-forward and back-propagation networks were physically the same network. However, no practical SLM has the capability of providing simultaneously the required transfer functions for both the feed-forward and back-propagation signals. Nevertheless, for a hybrid electro-optic system it should be relatively easy to implement the back-propagation neural network. One of the many applications is that by Gorman and Sejnowski [1988], who applied the back-propagation learning algorithm to train a neural network to classify sonar returns from an undersea metal cylinder and a cylindrically shaped rock of comparable size.

4.3. ORTHOGONAL-PROJECTION MODEL
The error-correction ability of the Hopfield model is effective, assuming that the stored vectors are significantly different (i.e., independent). The correction ability decreases rapidly as the number of stored patterns increases. McEliece, Posner, Rodemich and Venkatesh [1987] calculated that to obtain the desired results the number of stored vectors, M, in the
Hopfield model should be sufficiently smaller than the number of neurons, N, in the network, i.e.,
$$M < \frac{N}{4 \ln N}. \tag{4.5}$$
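To make the Hopfield construction and the capacity bound concrete, the following Python sketch builds an IWM, recalls a stored pattern from a distorted input, and evaluates the bound of eq. (4.5) for an (8 × 8)-neuron network. It assumes the usual form of the model: weights $T_{ij} = \sum_m (2V_i^m - 1)(2V_j^m - 1)$ with a zero diagonal, and the synchronous update $V_i \rightarrow 1$ if $\sum_j T_{ij} V_j \ge 0$, else 0. The two 8-element patterns and all function names are hypothetical and chosen only for illustration.

```python
import math
import numpy as np

def hopfield_iwm(patterns):
    """Outer-product IWM from binary (0/1) patterns, with zero diagonal."""
    B = 2 * np.asarray(patterns) - 1      # bipolar (+1/-1) copies
    T = B.T @ B                           # sum of outer products
    np.fill_diagonal(T, 0)
    return T

def hopfield_recall(T, v, max_iter=20):
    """Iterate the thresholded matrix-vector product until the output repeats."""
    v = np.asarray(v)
    for _ in range(max_iter):
        new = (T @ v >= 0).astype(int)
        if np.array_equal(new, v):
            break
        v = new
    return v

def capacity_bound(n):
    """Upper bound of eq. (4.5) on the number of stored vectors."""
    return n / (4 * math.log(n))

# Two hypothetical stored patterns (orthogonal in bipolar form).
p1 = [1, 1, 1, 1, 0, 0, 0, 0]
p2 = [1, 1, 0, 0, 1, 1, 0, 0]
T = hopfield_iwm([p1, p2])

distorted = [1, 1, 1, 0, 0, 0, 0, 0]      # p1 with one element switched off
recalled = hopfield_recall(T, distorted)

print(recalled)
print(capacity_bound(64))                  # bound for an (8 x 8)-neuron network
```

For N = 64 the bound is slightly below four stored vectors, which is the order of magnitude at which the text reports the Hopfield model becoming unstable.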
In practice, however, the stored vectors are generally not fully independent, which produces ambiguous output results. Orthogonalization techniques have been used in associative memory for digital image processing. We shall now use the orthogonal projection (OP) algorithm to improve the error-correction ability of the optical neural network, as described in the preceding section. Let us consider an N-dimensional vector space consisting of a set of M vectors $V^{(m)}$, which will be used to construct an IWM. The basic concept of the OP algorithm is to project each vector $V^{(m_0)}$ within the vector set $\{V^{(m)}\}$ onto the subspace orthogonal to the one spanned by the independent vectors $V^{*(m)}$, $m = 1, 2, \ldots, m_0 - 1$. The orthogonal projection vector can be described by the Gram-Schmidt orthogonalization procedure (Paige and Swift [1961]), so that
$$V^{*(m_0)} = V^{(m_0)} - \sum_{m=1}^{m_0 - 1} \frac{[V^{(m_0)}, V^{*(m)}]}{\|V^{*(m)}\|^{2}}\, V^{*(m)}, \tag{4.6}$$
where $[V^{(m_0)}, V^{*(m)}]$ denotes the inner product, and $\|V^{*(m)}\|$ is the norm of $V^{*(m)}$. Hence, for a given matrix $T^{(m-1)}$, the recursive algorithm for the associative memory matrix $T^{(m)}$ can be expressed as
$$T^{(m)} = \begin{cases} T^{(m-1)} + \dfrac{\left(U^{(m)} - T^{(m-1)} V^{(m)}\right) V^{*(m)\mathrm{T}}}{\|V^{*(m)}\|^{2}}, & \text{if } \|V^{*(m)}\| \ne 0, \\ T^{(m-1)}, & \text{otherwise}, \end{cases} \tag{4.7}$$
where $U^{(m)}$ is the desired output vector, and the initial memory matrix $T^{(0)}$ can be zero or the identity matrix. Using the OP algorithm described above, computer simulations of the proposed optical neural network were conducted, and the result is shown in fig. 4.2. Note that the same English letters (fig. 4.1a) are used for the reference patterns. By applying eqs. (4.6) and (4.7), the constructed positive and negative parts of the IWM are shown in figs. 4.2a and b, respectively. The reconstructions of a partial pattern A by using the OP and Hopfield models are shown in figs. 4.2c and d. The successive patterns in these figures
Fig. 4.2. Orthogonal projection model: (a), (b) positive and negative weight matrices using the OP algorithm; (c) reconstruction using the OP algorithm; (d) reconstruction using the Hopfield model.
represent the output results obtained with successive iterations. From these results we see that the OP algorithm is more robust and converges faster than the Hopfield model. In this example the OP algorithm requires only two iterative operations to obtain the correct result, whereas the Hopfield model converges to a local minimum, which gives rise to an incorrect result (fig. 4.2d). The robustness of an (8 × 8)-neuron, one-layer neural network was also analyzed numerically. The twenty-six letters of the English alphabet (here defined in a grid of 8 × 8 pixels) are used as reference patterns. The average Hamming distance of the reference patterns is about 26 pixels, and the minimum distance is about 4 pixels. Let us assume that the input patterns are embedded in additive random noise, where the input SNRs are chosen to be about 5 dB (i.e., 50% noise), 7 dB (i.e., 33% noise), and 10 dB (i.e., 10% noise), respectively. Figure 4.3 shows the output error pixels against the number of stored patterns for various values of input SNR. From this figure we see that the Hopfield model becomes unstable for a storage capacity beyond five letters. Note that this result is quite consistent with the storage capacity of eq. (4.5), whereas, in the stability range of the Hopfield model, the neural network using the OP algorithm can retrieve all the letters even with 50% input noise. The error-correction ability also decreases substantially as the input noise and number of stored patterns increase. Nevertheless, the OP algorithm provides better error-correction capability when compared with the Hopfield model.

Fig. 4.3. Performance of the Hopfield and OP models (horizontal axis: reference pattern number; total pixel number: 64).

4.4. MULTILEVEL RECOGNITION MODEL
For the case of an ill-conditioned weight matrix, the Hamming distances between the stored vectors are very short, which can produce incorrect results. For example, four capital letters, T, I, O, and G, are stored in the associative memory matrix (fig. 4.4a). Although the input partial image of G contains the main feature, the Hopfield neural network fails to reproduce the correct pattern, and the output converges to an erroneous result (fig. 4.4b). In some cases the output would not converge to a correct result, even when the input is exactly the same as one of the reference patterns. In other words, the Hopfield model is effective only in dealing with independent patterns.
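Whether a set of reference patterns is likely to be ill-conditioned in this sense can be checked beforehand by tabulating the pairwise Hamming distances. The short Python sketch below does this; the three 8-element patterns and the function names are hypothetical, chosen so that two of the patterns differ in a single element.

```python
from itertools import combinations

import numpy as np

def hamming(a, b):
    """Number of elements in which two binary patterns differ."""
    return int(np.sum(np.asarray(a) != np.asarray(b)))

def distance_summary(patterns):
    """Minimum and average Hamming distance over all pattern pairs."""
    dists = [hamming(a, b) for a, b in combinations(patterns, 2)]
    return min(dists), sum(dists) / len(dists)

# Hypothetical reference patterns; p and r are nearly identical.
p = [1, 1, 1, 1, 0, 0, 0, 0]
q = [1, 1, 0, 0, 1, 1, 0, 0]
r = [1, 1, 1, 1, 0, 0, 0, 1]

d_min, d_avg = distance_summary([p, q, r])
print(d_min, d_avg)
```

A small minimum distance relative to the pattern size signals the kind of similar, non-independent patterns for which the Hopfield model is reported to fail.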
Fig. 4.4. (a) Four letters stored in the memory mask; (b) pattern obtained by the Hopfield model.
From the preceding example we see that the smaller the Hamming distances among the stored patterns, the weaker the error-correction ability. However, it is also known that the less information stored in the memory, the more effectively the neural network can correct errors. Using the programmability of the proposed optical neural network, described in the preceding sections, a multilevel recognition (MR) algorithm can be developed. This algorithm adopts a tree search strategy, which increases the error-correction ability by reducing the number of vectors stored in each memory matrix (Hecht-Nielsen [1986]). The MR algorithm first classifies the reference patterns into subgroups, and then develops a tree search strategy based on the similarity (i.e., Hamming distance) to the reference patterns. A smaller number of reference patterns can be stored in the memory matrix built for each subgroup. The MR algorithm then changes the memory matrices with reference to the Hamming distances between the intermediate result and the patterns in different subgroups. In this manner the storage capacity is not limited by the size of the neural network. However, the tradeoff is that the processing speed is slowed down due to the changes in the memory matrices. As an example, we consider the same four letters as before. According to the similarity of the patterns, these letters can be classified into two groups, namely, [O, G] and [T, I]; T and O are arbitrarily selected from these two subgroups to form a root group [T, O], as shown in figs. 4.5a-c. Instead of constructing an IWM $T_{TIOG}$, three submemory matrices, $T_{TO}$, $T_{OG}$, and $T_{TI}$, are composed and stored in the microcomputer. In the first demonstration the memory matrix $T_{TO}$ is displayed on the LCTV1, and an input vector V′, which represents the partial image of G, is presented on the input LCTV2 of the optical neural network.
After converging to a stable state, the output vector V* is compared, using the Hamming distance, with the stored vectors O and T. If the Hamming distance between V* and O is shorter than that between V* and T, the memory matrix $T_{OG}$ will be used to replace the matrix $T_{TO}$. Subsequently, the vector V* is fed into the
Fig. 4.5. (a) Root group; (b), (c) two subgroups of letters used in the MR algorithm; (d), (e) reconstruction of the letter G by $T_{TO}$ and $T_{OG}$, respectively.
network for a new round of iteration, such that the letter G is retrieved, as shown in figs. 4.5d and e. A flow chart to illustrate this tree search operation is shown in fig. 4.6.

4.5. INTERPATTERN-ASSOCIATION MODEL
Fig. 4.6. Flow chart diagram of the MR algorithm.

Neural networks have proved to be very effective for pattern recognition. There are two basic approaches to constructing the IWM for a given set of reference patterns. The first approach is the intrapattern association, which emphasizes the association of elements within each reference pattern. For example, in the Hopfield model the outer products of the reference patterns are summed to form the IWM. This type of approach can create an unstable or ill-conditioned network, if the reference patterns are not mutually independent. The second scheme is the interpattern association (IPA), in which the IWM is constructed by emphasizing the association among the reference patterns. If we suppose that the reference patterns are similar patterns (e.g., human faces, fingerprints, handwritten characters), the special features of the patterns become very important for pattern recognition. It therefore becomes advantageous for the construction of the IWM to consider the relationships between the special and common features of the reference patterns (Lu, Xu, Wu and Yu [1990]). For example, a set of three overlapping patterns A, B, and C is presented in a Venn diagram (fig. 4.7). These patterns can be divided into seven subspaces: I, II, and III are the special subspaces of patterns A, B, and C, respectively; IV, V, and VI are the common subspaces of A and B, B and C, and C and A, respectively; VII is the common subspace of A, B, and C. The rest can be defined as an empty set $\emptyset$. These subspaces can be expressed by the following logic functions:
$$\begin{aligned}
\mathrm{I} &= A \wedge \overline{(B \vee C)}, \\
\mathrm{II} &= B \wedge \overline{(A \vee C)}, \\
\mathrm{III} &= C \wedge \overline{(A \vee B)}, \\
\mathrm{IV} &= (A \wedge B) \wedge \overline{C}, \\
\mathrm{V} &= (B \wedge C) \wedge \overline{A}, \\
\mathrm{VI} &= (C \wedge A) \wedge \overline{B}, \\
\mathrm{VII} &= (A \wedge B \wedge C) \wedge \overline{\emptyset},
\end{aligned} \tag{4.8}$$
where $\wedge$, $\vee$, and the overbar stand for the logic AND, OR, and NOT operations, respectively. Let us now start building the interconnections between input and output neurons. By using this set of logic functions, we can determine their excitatory, inhibitory, and null interconnections. For instance, if an input neuron in the VII subspace is at the “on” state, this neuron can only excite the
Fig. 4.7. Common and special subspaces of three reference patterns.
output neurons within the VII subspace and has no connection with the output neurons in other subspaces. On the other hand, if an input neuron in the V subspace is “on”, it will excite the output neurons in the V and VII subspaces but inhibit the output neurons in the I subspace; the same applies, correspondingly, to input neurons in the IV or VI subspace. Furthermore, if an input neuron is “on” in the I subspace, it will excite all the output neurons in pattern A (i.e., the I, IV, VI, and VII subspaces) but inhibit the output neurons in $(B \vee C) \wedge \overline{A}$ (i.e., the II, III, and V subspaces), and similarly for the other subspaces. By using the induction principle the logic function rule can be extended to M reference patterns, as given by
$$X = P \wedge \overline{Q}, \tag{4.9}$$
where
$$P = p_1 \wedge p_2 \wedge \cdots \wedge p_n, \qquad Q = q_1 \vee q_2 \vee \cdots \vee q_m,$$
where $p_1, p_2, \ldots, p_n$ and $q_1, q_2, \ldots, q_m$ are reference patterns, and n + m = M is the total number of reference patterns. The input neurons in subspace X must excite (i.e., have positive connections with) all the output neurons in the P subspaces, inhibit (i.e., have negative connections with) all output neurons in the $Q \wedge \overline{P'}$ subspaces, where P′ is defined by
$$P' = p_1 \vee p_2 \vee \cdots \vee p_n, \tag{4.10}$$
and have no connection with the output neurons in the remaining subspaces. For simplicity, if the connection strengths (i.e., interconnection weights) are assumed to be equal to 1 for positive connections, −1 for negative connections, and 0 for no connection, then the IPA neural network can be constructed in a simple three-state structure. To illustrate further the construction of the IPA model, A, B, and C are assumed to be three 2 × 2 array patterns (figs. 4.8a-c), and the pixel-pattern relationship is provided in table 4.1. It is apparent from this table that pixel 1 represents the common feature of patterns A, B, and C, pixel 2 is the common feature of A and B, pixel 3 is the common feature of A and C, and pixel 4 represents the special feature of C. By applying the preceding logic operations of eq. (4.8), a three-state interconnection neural network can be constructed (fig. 4.8d). This is a one-layer neural network of four input and four output neurons, in which each neuron is matched to one pixel of the reference patterns. For example, the first input neuron corresponds to pixel 1 of the input pattern and excites
Fig. 4.8. Example of an IPA neural network: (a)-(c) three reference patterns; (d) one-level neural net; and (e) interconnection weight matrix.

TABLE 4.1
Pixel-pattern relationship of three reference patterns.

Pattern   Pixel 1   Pixel 2   Pixel 3   Pixel 4
A         1         1         1         0
B         1         1         0         0
C         1         0         1         1
only the first output neuron. The second input neuron (the corresponding pixel belongs to patterns A and B) excites both the first and second output neurons, whereas it inhibits the fourth output neuron, which belongs to the special subspace of pattern C. We can now partition the four-dimensional IWM into a two-dimensional submatrix array (fig. 4.8e). The IWM can be divided into four blocks, in which each block corresponds to one output neuron. The four elements in one block represent the four neurons at the input end. For example, since all four elements in the upper-left block have a value of 1, any of the four input neurons can excite the first output neuron. In the upper-right block, as another example, the first and third elements are 0, the second element has a value of 1, and the fourth element is −1. Thus the interconnections can be determined such that the first and third input neurons have no connection with the second output neuron, the second input neuron excites
the second output neuron, and the fourth input neuron inhibits the second output neuron. To simplify the logic operations, an equivalence rule is developed which can be used to construct the IWM by examining the pixel-pattern relationships as described below. We stress that these rules for the construction of the IWMs are simple and straightforward, and thus are suitable for computer implementation. Let us define $D_{l,i}$ as a two-dimensional matrix that corresponds to the two-dimensional array in table 4.1, where l and i denote the row and column numbers, respectively. Let $d_i$ be the number of patterns that are in state 1 at the ith pixel; it can be determined by summing the elements in the ith column of table 4.1, i.e.,
$$d_i = \sum_{l=1}^{M} D_{l,i}. \tag{4.11}$$
Let us also define
$$k_{ij} = \sum_{l=1}^{M} D_{l,i}\, D_{l,j}, \tag{4.12}$$
which is the sum of the products of columns i and j in table 4.1. We can then construct an IPA neural network by applying the following logical rules:
(1) If $k_{ij} = \min(d_i, d_j)$: when $d_i < d_j$, pixel i will excite pixel j, but pixel j will not excite pixel i; when $d_i = d_j$, pixels i and j will excite each other; when $d_i > d_j$, pixel j will excite pixel i, but pixel i will not excite pixel j.
(2) If $0 < k_{ij} < \min(d_i, d_j)$, pixels i and j have no connection with each other.
(3) If $k_{ij} = 0$, with $d_i \ne 0$ and $d_j \ne 0$, pixels i and j will inhibit each other, as in the example of fig. 4.8, where the second input neuron inhibits the fourth output neuron.
Differences rather than similarities between patterns are thus used for pattern recognition. Like some other neural network algorithms, the Hopfield model constructs the IWM by correlating the elements within each pattern, ignoring, however, the relationships between the patterns. For example, the IWM of the Hopfield model for three reference patterns A, B, and C can be written in the outer-product representation
$$T = AA^{\mathrm{T}} + BB^{\mathrm{T}} + CC^{\mathrm{T}}, \tag{4.13}$$
where the superscript T represents the transpose of the pattern vectors. If input pattern A is applied to the neural system, the output pattern vector would be
$$V = TA = A(A^{\mathrm{T}}A) + B(B^{\mathrm{T}}A) + C(C^{\mathrm{T}}A), \tag{4.14}$$
in which $A^{\mathrm{T}}A$ represents the autocorrelation of pattern A, whereas $B^{\mathrm{T}}A$ and $C^{\mathrm{T}}A$ are the cross-correlations between A and B, and A and C, respectively. If the differences between patterns A, B, and C (e.g., their Hamming distances) are sufficiently large, the autocorrelation of A, B, or C would be much larger than their cross-correlations, i.e.,
$$A^{\mathrm{T}}A \gg B^{\mathrm{T}}A, \qquad A^{\mathrm{T}}A \gg C^{\mathrm{T}}A. \tag{4.15}$$
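Both constructions can be tried on the three 2 × 2 patterns of table 4.1. The Python sketch below first applies the equivalence rule of eqs. (4.11) and (4.12) to obtain the three-state IPA weights, and then evaluates eqs. (4.13) and (4.14) for the same patterns. It is an illustration under stated assumptions: the function and variable names are ours, rule (3) is taken to mean mutual inhibition (which is what the worked example of fig. 4.8 requires, since the second input neuron inhibits the fourth output neuron), and the binary vectors are used in eq. (4.13) exactly as written there.

```python
import numpy as np

# Table 4.1: rows are patterns A, B, C; columns are pixels 1 to 4.
D = np.array([[1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 1]])

def ipa_iwm(D):
    """Three-state IWM; W[j, i] is the weight from input pixel i to output pixel j."""
    d = D.sum(axis=0)      # eq. (4.11): number of patterns containing each pixel
    k = D.T @ D            # eq. (4.12): number of patterns shared by pixels i and j
    n = D.shape[1]
    W = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if d[i] == 0 or d[j] == 0:
                continue               # a pixel that is never "on" has no connection
            if k[i, j] == 0:
                W[j, i] = -1           # rule (3), read as inhibition (assumption)
            elif k[i, j] == d[i]:
                W[j, i] = 1            # rule (1): k = min(d_i, d_j) with d_i <= d_j
            # remaining cases (rule (2), or d_i > d_j in rule (1)) stay at 0
    return W

W_ipa = ipa_iwm(D)

# Hopfield outer-product IWM, eq. (4.13), and its response to pattern A, eq. (4.14).
A, B, C = D
T_hop = np.outer(A, A) + np.outer(B, B) + np.outer(C, C)
V = T_hop @ A
auto, cross_B, cross_C = int(A @ A), int(B @ A), int(C @ A)

print(W_ipa)
print(V, auto, cross_B, cross_C)
```

The first two rows of `W_ipa` reproduce the upper-left and upper-right blocks described for fig. 4.8e. For these strongly overlapping patterns the autocorrelation (3) is barely larger than the cross-correlations (2 and 2), so the inequalities of eq. (4.15) are far from satisfied.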
Thus we see that pattern A has a larger weighting factor when compared with patterns B and C; for pattern A, patterns B and C can be considered as noise disturbances. It is therefore apparent that by choosing a threshold value, pattern A can be reconstructed at the output end of the neural network. On the other hand, if patterns A, B, and C are very similar, the inequalities of eq. (4.15) no longer hold, meaning that the threshold value cannot be defined, and the Hopfield neural network would become unstable. Computer simulations using an (8 × 8)-neuron neural network were conducted for both the Hopfield and IPA models. The reference patterns considered are the 26 capital letters of the English alphabet arranged according to their similarities. Figure 4.9 shows the error rates as a function of the number of stored reference patterns for various input signal-to-noise ratios (SNR). The Hopfield model becomes unstable at about 4 stored patterns, whereas the IPA model is quite stable up to about 12 stored letters, for an SNR of 7 dB. For a noiseless input, the IPA model can perform even better, producing correct results for all 26 stored letters, whereas the Hopfield model starts making significant errors when the number of reference patterns increases beyond 4.

Fig. 4.9. Comparison of the IPA and Hopfield models (horizontal axis: reference pattern number). For curve I the noise level is 10%, for II 5%, and for III there is no noise.

It is interesting to show the experimental results obtained from the preceding LCTV neural network. We used the letters B, P, and R as the training set for constructing the IWM (fig. 4.10a). The positive and negative parts of the IWM for the IPA model are shown in figs. 4.10b and c, and for the Hopfield IWM, in figs. 4.10d and e. By comparing these two IWMs, it can be seen that the IPA model has fewer interconnections and fewer gray levels. The latter is significant for SLM implementation, because the IPA model requires only three gray levels, whereas the Hopfield model needs 2M + 1 gray levels, where M is the total number of stored patterns. Furthermore, the two models were tested with an input pattern embedded in 30% random noise (SNR = 7 dB), shown in fig. 4.10f; the output results
Fig. 4.10. Experiments on an optical neurocomputer. (a) Three similar reference patterns; (b), (c) positive and negative IWMs of the IPA model; (d),(e) positive and negative IWMs of the Hopfield model; (f) input pattern, SNR = 7 dB; (g) pattern reconstruction using the IPA model; (h) result using the Hopfield model.
are shown in figs. 4.10g and h. Once again, we see that the IPA model performs better, even within the storage capacity range of the Hopfield neural network [i.e., eq. (4.5)].

4.6. HETERO-ASSOCIATION MODEL
A hetero-associative neural network model using interpattern association will now be described. The concept of the IPA model is to determine whether the pixels in the pattern space belong to special or common subspaces, and then decide the excitatory or inhibitory interconnections based on a simple logical relationship. Let us now apply the IPA algorithm to the hetero-association model, as described by Yu, Lu and Yang [1990]. For example, the overlapping input-output training sets (fig. 4.11) can be divided into subspaces, for which I, II, and III are the special subspaces of input patterns A, B, and C, respectively, whereas IV, V, and VI are the common subspaces for A and B, B and C, and C and A, respectively, and VII is the common subspace for A, B, and C. The output pattern space is divided similarly. These subspaces in the input-output spaces can be determined by the following logical functions:
$$\begin{aligned}
\mathrm{I} &= A \wedge \overline{(B \vee C)}, & \mathrm{I}' &= A' \wedge \overline{(B' \vee C')}, \\
\mathrm{II} &= B \wedge \overline{(A \vee C)}, & \mathrm{II}' &= B' \wedge \overline{(A' \vee C')}, \\
\mathrm{III} &= C \wedge \overline{(A \vee B)}, & \mathrm{III}' &= C' \wedge \overline{(A' \vee B')}, \\
\mathrm{IV} &= (A \wedge B) \wedge \overline{C}, & \mathrm{IV}' &= (A' \wedge B') \wedge \overline{C'}, \\
\mathrm{V} &= (B \wedge C) \wedge \overline{A}, & \mathrm{V}' &= (B' \wedge C') \wedge \overline{A'}, \\
\mathrm{VI} &= (C \wedge A) \wedge \overline{B}, & \mathrm{VI}' &= (C' \wedge A') \wedge \overline{B'}, \\
\mathrm{VII} &= (A \wedge B \wedge C) \wedge \overline{\emptyset}, & \mathrm{VII}' &= (A' \wedge B' \wedge C') \wedge \overline{\emptyset},
\end{aligned} \tag{4.16}$$
where $\wedge$, $\vee$, and the overbar stand for the logic AND, OR, and NOT operations, respectively, and $\emptyset$ denotes the empty set. Based on these simple logical operations, a hetero-association IWM can be constructed. Let us consider the input-to-output neuron relationship as a mapping process for every pixel in the input space onto the output space. For example, when a neuron (e.g., representing a pixel) in the input subspace VII is “on”, it implies that this neuron will excite all the neurons within the output subspace VII′, and that it has no connection with the neurons in other output subspaces in $S_2$. However, when a neuron in the input subspace V is “on”, it will excite the neurons in the output subspaces V′ and VII′, but inhibit the neurons in I′. Similar logical operations can also be applied to neurons in subspaces IV and VI.
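The mapping just described can be sketched in Python by carrying the pixel-counting form of the IPA rule over to input-output pairs: an input pixel excites every output pixel whose set of patterns contains its own, and inhibits every output pixel that shares no pattern with it. That carry-over, the function names, and the output patterns A′, B′, and C′ used below are our assumptions for illustration only; the input patterns are those of table 4.1.

```python
import numpy as np

def hetero_ipa_iwm(D_in, D_out):
    """Three-state hetero-association IWM.

    D_in[l, i] = 1 if input pattern l is "on" at input pixel i;
    D_out[l, j] = 1 if the associated output pattern l is "on" at output pixel j.
    W[j, i] is the weight from input pixel i to output pixel j.
    """
    d_in = D_in.sum(axis=0)        # patterns containing each input pixel
    d_out = D_out.sum(axis=0)      # patterns containing each output pixel
    k = D_in.T @ D_out             # patterns shared by input pixel i and output pixel j
    W = np.zeros((D_out.shape[1], D_in.shape[1]), dtype=int)
    for i in range(D_in.shape[1]):
        for j in range(D_out.shape[1]):
            if d_in[i] == 0 or d_out[j] == 0:
                continue           # pixels in no pattern are left unconnected here
            if k[i, j] == d_in[i]:
                W[j, i] = 1        # output pixel lies in every pattern of the input pixel
            elif k[i, j] == 0:
                W[j, i] = -1       # no pattern in common: inhibit
    return W

# Input patterns A, B, C (table 4.1) and hypothetical output patterns A', B', C'.
D_in = np.array([[1, 1, 1, 0],
                 [1, 1, 0, 0],
                 [1, 0, 1, 1]])
D_out = np.array([[1, 1, 0, 0],
                  [1, 0, 1, 0],
                  [1, 0, 0, 1]])

W = hetero_ipa_iwm(D_in, D_out)

# A partial input: only the special pixel of C is "on".
partial_C = np.array([0, 0, 0, 1])
output = (W @ partial_C > 0).astype(int)

print(W)
print(output)
```

In this small example the single special pixel of C is enough to produce the associated output C′, because that pixel excites the pixels of C′ and inhibits the special pixels of A′ and B′.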
Fig. 4.11. Hetero-association model. $S_1$ and $S_2$ are the input and output pattern spaces.
For the case where a neuron is “on” in the input subspace I, it implies that pattern A appears at the input end, for which this neuron can excite all the neurons in output pattern A′ (i.e., subspaces I′, IV′, VI′, and VII′), and inhibit the neurons in subspaces $(B' \vee C') \wedge \overline{A'}$ (i.e., subspaces II′, III′, and V′). Similarly, neurons in subspaces II or III would excite output patterns B′ or C′, and inhibit I′, III′, and VI′, or I′, II′, and IV′, respectively. Thus we see that a hetero-association IWM can be constructed by simple logical rules. By applying the induction principle, the hetero-association logical operations can be extended to M reference patterns, in a way similar to the one described in the preceding section. To illustrate the construction of the hetero-associative IWM, let us assume that A, B, C, A′, B′, and C′ are the input-output training sets (figs. 4.12a and b), and the corresponding input and output pixel-pattern relationships are given in tables 4.2 and 4.3, respectively. As can be seen from the input pattern space, pixel 1 (upper-left) is the common feature of A, B, and C, pixel 2 is the common feature of A and B, pixel 3 is the common feature of A and C, whereas pixel 4 is the special feature of C. Likewise, from the output pattern space, pixel 3 is the special feature of B′, and so on. A three-state interconnection neural network can therefore be constructed (fig. 4.12c). Note that the second output neuron, representing the common feature of A′, B′, and C′, has positive interconnections from all the input neurons, whereas the fourth output neuron, also a common feature of A′, B′, and C′, is subjected to inhibition from all input neurons, presenting the common features of A, B, and C in the input pattern space. The corresponding IWM for the hetero-association is constructed as shown in fig. 4.12d. The hetero-association model was implemented in the preceding LCTV neural network of fig. 3.6 for character translation. A set of input-output training patterns is shown in fig. 4.13a, where the upper and
Fig. 4.12. Construction of a hetero-association IWM using the IPA model: (a) input-output training sets; (b) a three-state neural network; and (c) hetero-association IWM.
TABLE 4.2 Input pixel-pattern relationship.
Pattern   Pixel 1   Pixel 2   Pixel 3   Pixel 4
A         1         1         1         0
B         1         1         0         0
C         1         0         1         1
TABLE 4.3 Output pixel-pattern relationship.
Pattern   Pixel 1   Pixel 2   Pixel 3   Pixel 4
A'        0         1         1         0
B'        0         1         0         0
C'        1         1         1         0
Fig. 4.13. Character translations: (a) English-Chinese character training sets; (b) positive (left) and (c) negative (right) hetero-association IWMs for English-to-Chinese translation; (d), (e) partial input English letter to the translated output Chinese character.
lower rows are the letters of the English alphabet and the corresponding Chinese characters. The positive and negative parts of the hetero-association IWMs that translate English letters into Chinese characters are shown in figs. 4.13b and c. Although an area encoding and biasing method can be used to accommodate the negative values of the IWMs, for simplicity the positive and negative parts of the IWMs are displayed sequentially on LCTV1. Again, the subtraction operations are performed by the microcomputer, and the output results are obtained after thresholding. Figures 4.13d and e show a partial English letter A and its translated Chinese character, obtained with only one iteration. Thus we see that the hetero-association neural network can indeed perform pattern translations.
4.7. SPACE-TIME-SHARING MODEL
For a fully interconnected neural network, every neuron has to be interconnected to all the other neurons; for instance, 1000 neurons would require a million interconnections. The massive interconnection thus requires a very high resolution SLM. However, the resolution of the currently available SLMs is rather limited, which poses an obstacle to the development of a practical optical neural network for large-scale operation. This section discusses a space-time sharing technique, as described by Yu, Yang and Lu [1991], to alleviate this constraint. For an N × N neuron network the iterative equation can be described by eq. (3.1), which we repeat here,
$$U_{lk}(n+1) = f\left[\sum_{i=1}^{N}\sum_{j=1}^{N} T_{lkij}\,U_{ij}(n)\right], \qquad (4.17)$$
where n stands for the nth iteration, f represents a nonlinear operator, $U_{lk}$ and $U_{ij}$ represent the states of the lkth and ijth neurons, respectively, and $T_{lkij}$ is the connection strength from the lkth to the ijth neuron. $[T_{lkij}]$ is the IWM, which can be partitioned into an array of N × N submatrices, as shown in fig. 3.1. The LCTV neural network of fig. 3.4 is used in this discussion, and the IWM and the input pattern are displayed on LCTV1 and LCTV2, respectively. Each lens in the lenslet array images a specific IWM submatrix onto the input LCTV2 to establish the proper interconnections. Let the resolution of LCTV1 be limited to an R × R array of pixels, and let the lenslet array correspond to an L × L array of neurons. If the size of the IWM (i.e., N × N submatrices, each of N × N elements) is larger than the resolution of LCTV1, i.e., $N^2 > R$,
apparently the IWM cannot be represented entirely by LCTV1. In the following, the discussion is divided into two cases.
Case I: For $N^2 > R$ and $LN \le R$, we let D = int(N/L), where int(·) denotes an integer value. We assume that the IWM is partitioned into D × D sub-IWMs, and that each sub-IWM consists of L × L submatrices of size N × N, as shown in fig. 4.14. In this case, D × D sequential operations of the sub-IWMs are required to complete the iterative operation of eq. (4.17). Thus a smaller neural network can handle an input pattern with a larger space-bandwidth product (SBP) through sequential operation of the sub-IWMs.
Case II: For $N^2 > R$ and $LN > R$, we let D = int(N/L) and d = int(LN/R). In this case the submatrices within the sub-IWMs are further divided into d × d submatrices, and the size of each submatrix is (N/d) × (N/d), as shown in fig. 4.15. Thus the iterative equation (4.17) can be written in the following form:
$$U_{lk}(n+1) = f\left[\sum_{p=1}^{d}\sum_{q=1}^{d}\ \sum_{i=(p-1)N/d+1}^{pN/d}\ \sum_{j=(q-1)N/d+1}^{qN/d} T_{lkij}\,U_{ij}(n)\right]. \qquad (4.18)$$
If the input pattern is also partitioned into d × d submatrices and each submatrix is of size (N/d) × (N/d), then by sequentially displaying each of
Fig. 4.14. Partition of the IWM into D x D sub-IWMs, for D = 2.
Fig. 4.15. Partition of a sub-IWM into d × d submatrices, for d = 2: (a) the pqth sub-IWM; (b) d × d smaller submatrices.
the IWM submatrices and the corresponding input submatrices on LCTV1 and LCTV2, respectively, a very large SBP pattern can be processed using a smaller neural network. The price paid for the larger bandwidth operation in this case is that the processing time is prolonged by a factor of $D^2 \times d^2$. Generally speaking, the processing time increases as the square of the space-bandwidth product of the input pattern,
$$t_2 = \left[\frac{N_2 \times N_2}{N_1 \times N_1}\right]^2 t_1 = \left(\frac{N_2}{N_1}\right)^4 t_1, \quad (N_2 > N_1), \qquad (4.19)$$
where $t_2$ and $t_1$ are the processing times for input patterns with $N_2 \times N_2$ and $N_1 \times N_1$ resolution elements, respectively. This relationship is plotted in fig. 4.16. For instance, if the number of resolution elements of the input pattern increases four times in each dimension, i.e., $N_2 = 4N_1$, the processing time would be $4^4 = 256$ times longer. As an experimental demonstration, we show that patterns with 12 × 12 resolution elements can be processed using a 6 × 6 neuron network. A 240 × 480 element color LCTV (LCTV1) is used to display the IWM. Since each color pixel is composed of red, green, and blue elements, the resolution is actually reduced to 240 × 160 pixels. In the experiment, however, we have
Fig. 4.16. Processing time increases as the square of the SBP of the input pattern, for SBP2 = N2 × N2 and SBP1 = N1 × N1 (vertical axis: t2/t1; horizontal axis: SBP2/SBP1).
Fig. 4.17. Processing of a 12 x 12 element pattern by a 6 x 6 neuron network: (a) four reference patterns stored in the IWM; (b) partial input pattern; (c) one of the four 6 x 6 sub-output arrays; and (d) composed output pattern.
used 2 × 2 pixels for each interconnection weight, by which the resolution of LCTV1 is essentially reduced to 120 × 80 elements. With N = 12, L = 6, and R = 80, we have $N^2 = 144 > R$ and $LN = 72 < R$, so that D = int(N/L) = 2 and the IWM can be divided into a 2 × 2 sub-IWM array, the elements of which can be displayed sequentially on LCTV1. Assuming that an input pattern is displayed on LCTV2, the signals collected by the CCD camera can be thresholded and then
Fig. 4.18. Simulated results; processing of a 24 x 24 element pattern by a 6 x 6 neuron network: (a) reference patterns stored in the IWM; (b) partial input pattern; (c) four of the sixteen 6 x 6 sub-output arrays; and (d) composed output pattern.
composed to produce an output pattern, which has an SBP four times larger than that of the optical neural network. One of the experimental results is shown in fig. 4.17. The training set, which consists of four cartoon patterns, is shown in fig. 4.17a; each pattern is limited to a 12 × 12 pixel matrix. Figure 4.17b shows a partial image of the second cartoon figure, with 12 × 5 pixels blocked out, as the input pattern. The sub-IWMs are then displayed sequentially, one by one, on LCTV1. Different parts of the output pattern are obtained, and one of the output parts is shown in fig. 4.17c. The final composed output pattern using this technique is given in fig. 4.17d.
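The sequential operation of Case I can be checked numerically. The following is a minimal sketch under stated assumptions, not the optical procedure itself: the nonlinear operator f is taken to be a hard threshold at zero, the IWM is a randomly generated three-state array rather than one trained on the cartoon patterns, and all function names are hypothetical. It suggests that composing the D × D sub-outputs gives the same result as a single iteration with the full IWM, for N = 12 and L = 6.

```python
import numpy as np


def threshold(x):
    """Assumed nonlinear operator f: hard threshold at zero."""
    return (x > 0).astype(int)


def full_iteration(iwm, pattern):
    """One iteration of eq. (4.17) with the whole IWM available at once."""
    return threshold(np.einsum("lkij,ij->lk", iwm, pattern))


def shared_iteration(iwm, pattern, lenslets):
    """Same iteration, one L x L block of output neurons per pass (Case I)."""
    n = pattern.shape[0]
    d_blocks = n // lenslets  # D = int(N/L); N is assumed divisible by L
    output = np.zeros((n, n), dtype=int)
    for p in range(d_blocks):
        for q in range(d_blocks):
            rows = slice(p * lenslets, (p + 1) * lenslets)
            cols = slice(q * lenslets, (q + 1) * lenslets)
            sub_iwm = iwm[rows, cols]  # L x L submatrices, each N x N
            output[rows, cols] = full_iteration(sub_iwm, pattern)
    return output


rng = np.random.default_rng(0)
N, L = 12, 6
iwm = rng.integers(-1, 2, size=(N, N, N, N))  # weights in {-1, 0, +1}
pattern = rng.integers(0, 2, size=(N, N))     # binary input pattern
assert np.array_equal(full_iteration(iwm, pattern),
                      shared_iteration(iwm, pattern, L))
```

Each pass of the inner loop plays the role of one sub-IWM displayed on LCTV1, so the loop count (here 2 × 2 = 4) is the time cost of the sharing scheme.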
To further demonstrate larger-scale operation, a 24 × 24 neuron IWM is used. Since N = 24, L = 6, R = 80, $N^2 = 576 > R$, and $LN = 144 > R$, we take D = int(N/L) = 4 and d = int(LN/R) = 2. We assume that the IWM is partitioned into 4 × 4 sub-IWMs and that each sub-IWM is divided into 2 × 2 smaller submatrices, as illustrated earlier in fig. 4.15. In this case the input pattern is also divided into 2 × 2 submatrices, and the size of each submatrix is 12 × 12. By sequentially displaying the submatrices of the IWM and the input submatrices on LCTV1 and LCTV2, respectively, a 24 × 24 output pattern can be obtained. Figure 4.18a shows four 24 × 24 pixel cartoon patterns as the training set. A partial image of the third cartoon figure (fig. 4.18b) is used as the input pattern, which is divided into 2 × 2 submatrices of size 12 × 12 during the processing; four of the sixteen 6 × 6 output parts are shown in fig. 4.18c. By going through the whole set of partitions of the IWM, an output pattern is composed, as shown in fig. 4.18d. Notice that the whole process takes $D^2 \times d^2 = 64$ operations using the LCTV neural network.
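The bookkeeping of the two cases can be written out as a short calculation. This is only a sketch: the function names are hypothetical, and int(·) is interpreted here as rounding up, an assumption chosen because it reproduces d = 2 for LN/R = 144/80 in the 24 × 24 example. The last function evaluates the time ratio of eq. (4.19).

```python
import math


def partition_counts(n, lenslets, resolution):
    """Return (D, d, sequential operations) for an N x N network with N^2 > R.

    int(.) is taken as rounding up (an assumption, see the lead-in).
    """
    d_outer = math.ceil(n / lenslets)                  # D = int(N/L)
    if lenslets * n <= resolution:                     # Case I
        d_inner = 1
    else:                                              # Case II
        d_inner = math.ceil(lenslets * n / resolution)  # d = int(LN/R)
    return d_outer, d_inner, (d_outer * d_inner) ** 2


def time_ratio(n2, n1):
    """Eq. (4.19): t2 / t1 = (N2 / N1) ** 4."""
    return (n2 / n1) ** 4


print(partition_counts(12, 6, 80))  # (2, 1, 4)
print(partition_counts(24, 6, 80))  # (4, 2, 64)
print(time_ratio(4, 1))             # 256.0
```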
§ 5. Redundant Interconnection Neural Networks
Although a redundant interconnection neural network would produce better noise immunity, its pattern discriminability would also be reduced. This section examines the performance of a neural network as affected by redundant interconnection (Yang, Lu, Yu and Gregory [1991]). As we have shown in the preceding sections, the excitatory and inhibitory interconnections of an IPA neural network can be determined by using simple logic operations. It appears that the higher-order subsets (i.e., subspaces) can be excited by the lower-order subsets, but not the other way around. Therefore, the common (higher-order) subsets are expected to have more redundant interconnections than the lower-order subsets. Thus, the redundancy level of interconnection can be defined as the difference in the orders of interconnection. It can be seen that by using the IPA algorithm the common features would be enhanced, whereas the special features would be relatively suppressed. Thus, the neural network would be less effective in recognizing patterns of great similarity. For example, the letter “P” has all the common features of the letters “B” and “R”. A letter “B” input to the IPA neural net may produce an erroneous output “P”. There is, however, an advantage to implementing redundant interconnection in a neural network, in that the network would have greater input-noise immunity.
5.1. REDUNDANT-INTERCONNECTION IPA MODEL
This section will discuss a redundant interconnection interpattern association (R-IPA) network. We assume that a set of binary reference patterns, $P_1, P_2, \ldots, P_M$, is stored in the IWM, for which the excitation and inhibition interconnections can be determined by the following logic operation:
$$S_{in} = P_1 P_2 \cdots P_k \bar{P}_{k+1} \bar{P}_{k+2} \cdots \bar{P}_M, \quad (1 \le k \le M), \qquad (5.1)$$
and
$$S_{out} = P_1 P_2 \cdots P_m \bar{P}_{m+1} \bar{P}_{m+2} \cdots \bar{P}_M, \quad (1 \le m \le M). \qquad (5.2)$$
Thus the differences of excitatory layers (ELD) and inhibitory layers (ILD) are given by
$$ELD = m - k, \quad (m > k), \qquad (5.3)$$
and
$$ILD = k - m, \quad (m < k). \qquad (5.4)$$
Since the ELD and the ILD can be computed “bit-by-bit”, they are equal to the number of “1” states obtained from $S_{out} - S_{in}$ and $\bar{S}_{out} - S_{in}$, respectively. A “−1” state in $S_{out} - S_{in}$ or $\bar{S}_{out} - S_{in}$ represents a null interconnection. Thus, the maximum layer differences in the excitatory and inhibitory interconnections would be
$$ELD_{max} = M - 1, \qquad (5.5)$$
and
$$ILD_{max} = M - 2. \qquad (5.6)$$
Let ERL and IRL be the redundancy levels of the excitatory and inhibitory interconnections. If ELD ≤ ERL, the corresponding output neurons will be excited. On the other hand, if ELD > ERL, there is no interconnection between the input and output neurons. Similarly, the output neurons will be inhibited for ILD ≤ IRL, and there is no interconnection between the input and output neurons if ILD > IRL. An example of redundant interconnections is shown in fig. 5.1, in which three binary reference patterns are stored in the memory matrix, and each pixel is assumed to correspond to one neuron. Looking at the relationship between input neuron 3 and output neuron 1, we have $S_1 - S_3 = (0\ 1\ 1)$, which implies ELD = 2. If the ERL is assigned equal to 1, input neuron 3 and
Fig. 5.1. Constructing an R-IPA model neural network: (a) reference patterns; (b) interconnection for ERL = 1 and IRL = 0; (c) interconnection for ERL = 2 and IRL = 1. Legend: +1 (excite), −1 (inhibit).
output neuron 1 should not be interconnected, as shown in fig. 5.1b. However, if ERL = 2, there will be an excitatory interconnection (+1) between these two neurons (fig. 5.1c). Inhibitory interconnections can be determined in the same manner. For instance, with respect to input neuron 3 and output neuron 4, we have $\bar{S}_4 - S_3 = (0\ 1\ 0)$, by which ILD = 1. If we assume IRL = 0, these two neurons should not be interconnected, whereas if we assign IRL = 1, there would be an inhibitory interconnection (−1) from input neuron 3 to output neuron 4 (fig. 5.1c). Furthermore, when input neuron 1 and output neuron 4 are considered,
we have $S_4 - S_1 = (-1\ {-1}\ 0)$ and $\bar{S}_4 - S_1 = (0\ 0\ {-1})$. Since these results yield “−1” states, there should be no interconnection between them, as can be seen in figs. 5.1b and c, respectively. It is apparent that the IPA logic operation can be easily extended to M reference patterns, by which the subset
$$x = P_1 P_2 \cdots P_k\,\overline{(P_{k+1} + P_{k+2} + \cdots + P_M)} = P_1 P_2 \cdots P_k \bar{P}_{k+1} \bar{P}_{k+2} \cdots \bar{P}_M, \quad (1 \le k \le M), \qquad (5.7)$$
will excite all the neurons in the following subsets:
$$P_e = P_1 P_2 \cdots P_m \bar{P}_{m+1} \bar{P}_{m+2} \cdots \bar{P}_M, \quad (m > k), \qquad (5.8)$$
and inhibit all the neurons in the subsets $P_i$ given by
$$P_i = \overline{(P_1 + P_2 + \cdots + P_n)}\,(P_{n+1} P_{n+2} \cdots P_M) = \bar{P}_1 \bar{P}_2 \cdots \bar{P}_n P_{n+1} P_{n+2} \cdots P_M, \quad (n \ge k). \qquad (5.9)$$
It is simple to show that, for $ERL = ERL_{max} = M - 1$ and $IRL = IRL_{max} = M - 2$, the same result can be derived from the R-IPA algorithm, which proves that the IPA neural network is indeed a maximum redundant interconnection network.
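The bit-by-bit rule of this section can be sketched in a few lines. The sketch below rests on several assumptions: the function name and the weight-matrix layout T[output, input] are hypothetical, the layer differences are counted as the number of “1” states in S_out − S_in (excitation) and in the complement of S_out minus S_in (inhibition), and the three 4-pixel patterns are illustrative. Pixels 1, 3, and 4 follow the membership vectors quoted in the text, whereas the membership of pixel 2 is invented for the example.

```python
import numpy as np


def r_ipa_weights(patterns, erl, irl):
    """Three-state R-IPA interconnections, T[out, in] in {-1, 0, +1}."""
    p = np.asarray(patterns, dtype=int)   # shape (M patterns, N pixels)
    s = p.T                               # s[i]: membership vector of pixel i
    n = s.shape[0]
    weights = np.zeros((n, n), dtype=int)
    for i_in in range(n):
        if not s[i_in].any():
            continue                      # pixel never "on": no connections
        for i_out in range(n):
            exc = s[i_out] - s[i_in]            # S_out - S_in
            inh = (1 - s[i_out]) - s[i_in]      # complement of S_out, minus S_in
            if (exc >= 0).all() and exc.sum() <= erl:
                weights[i_out, i_in] = 1        # ELD <= ERL: excite
            elif (inh >= 0).all() and inh.sum() <= irl:
                weights[i_out, i_in] = -1       # ILD <= IRL: inhibit
    return weights


# Hypothetical reference patterns (rows), pixels 1..4 (columns).
patterns = [[1, 1, 1, 0],
            [1, 1, 0, 0],
            [1, 0, 0, 1]]
low = r_ipa_weights(patterns, erl=1, irl=0)
high = r_ipa_weights(patterns, erl=2, irl=1)
print(low[0, 2], high[0, 2])   # input 3 -> output 1: 0 1
print(low[3, 2], high[3, 2])   # input 3 -> output 4: 0 -1
print(low[3, 0], high[3, 0])   # input 1 -> output 4: 0 0
```

With these patterns the three printed pairs agree with the cases discussed for fig. 5.1: the connection from input neuron 3 to output neuron 1 appears only when ERL = 2, the inhibitory connection to output neuron 4 appears only when IRL = 1, and input neuron 1 is never connected to output neuron 4.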
5.2. MINIMUM REDUNDANT IPA MODEL
The logic operation for achieving minimum redundant interconnections, in which the output neurons are excited by the neurons in the same subset and inhibited by the neurons in the opposite subset, was discussed in § 5.1. As there are no interlayer excitatory and inhibitory interconnections, the network is indeed a minimum redundancy interconnection network. In other words, the minimum redundant interconnection IPA (MR-IPA) model is a special case of the R-IPA model. To verify this, we can simply assign a zero-redundancy level to the excitatory and inhibitory interconnections, i.e.,
$$ERL = IRL = 0. \qquad (5.10)$$
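The minimum redundancy interconnection weights for M binary patterns (assuming that each has N pixels) can be written as
$$T_{ij} = h\left[\sum_{m=1}^{M} \left(2P_i^{(m)} - 1\right)\left(2P_j^{(m)} - 1\right)\right], \qquad (5.11)$$
where $P_i^{(m)}$ is the ith pixel of the mth pattern and h[·] is a three-level, hard-limiting function, i.e.,
$$h[x] = \begin{cases} 1, & x = M, \\ 0, & -M < x < M, \\ -1, & x = -M. \end{cases} \qquad (5.12)$$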
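As for the Hopfield model, the interconnection weight can be written as
$$T_{ij} = \begin{cases} \sum_{m=1}^{M} \left(2P_i^{(m)} - 1\right)\left(2P_j^{(m)} - 1\right), & i \ne j, \\ 0, & i = j, \end{cases} \qquad (5.13)$$
where $P_i^{(m)}$ is the ith pixel of the mth stored pattern and T represents a multivalued memory matrix. For practical implementation T is clipped into a three-state function, given by
$$T'_{ij} = g[T_{ij}], \qquad (5.14)$$
where
$$g[x] = \begin{cases} 1, & x \ge G, \\ 0, & -G < x < G, \\ -1, & x \le -G. \end{cases} \qquad (5.15)$$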
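The relation between the two weight rules can be explored numerically. The sketch below is illustrative only and makes explicit assumptions: both rules are taken to start from the sum over the stored patterns of (2P_i − 1)(2P_j − 1), the clipping uses the closed comparisons x ≥ G and x ≤ −G, and the function names and the three 4-pixel patterns are hypothetical. Under these assumptions, clipping at G = M leaves the same off-diagonal weights as the three-level hard limiter, and clipping at G = 0 reduces to the sign of the Hopfield weights.

```python
import numpy as np


def bipolar_correlation(patterns):
    """Sum over patterns of (2P_i - 1)(2P_j - 1); entries lie in [-M, M]."""
    b = 2 * np.asarray(patterns, dtype=int) - 1   # map {0, 1} to {-1, +1}
    return b.T @ b


def mr_ipa_weights(patterns):
    """Hard limiter h: +1 where the sum equals M, -1 where it equals -M."""
    m = len(patterns)
    c = bipolar_correlation(patterns)
    return (c == m).astype(int) - (c == -m).astype(int)


def clipped_hopfield_weights(patterns, g_level):
    """Hopfield weights with zero diagonal, clipped to three states at G."""
    t = bipolar_correlation(patterns)
    np.fill_diagonal(t, 0)
    return (t >= g_level).astype(int) - (t <= -g_level).astype(int)


patterns = [[1, 1, 1, 0],
            [1, 1, 0, 0],
            [1, 0, 0, 1]]             # hypothetical, M = 3 and N = 4
mr_ipa = mr_ipa_weights(patterns)
hopfield = clipped_hopfield_weights(patterns, g_level=len(patterns))
off_diagonal = ~np.eye(4, dtype=bool)
assert np.array_equal(mr_ipa[off_diagonal], hopfield[off_diagonal])
print(mr_ipa)
```

The only difference that remains at G = M is on the diagonal, which is +1 for the hard limiter and 0 for the Hopfield matrix.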
We note that for most cases G = 0. We see further that, as G increases, the number of interconnections decreases. It is therefore apparent that the least interconnected network occurs at G = M. Under this condition, eq. (5.14) would reduce to eq. (5.11) except for the diagonal elements $T_{ii}$, i = 1, 2, 3, ..., N. Thus the MR-IPA interconnection can be derived from either the R-IPA model or the Hopfield model. In other words, the Hopfield model and the R-IPA model would be the same if all redundant interconnections were eliminated. However, the basic distinction between the R-IPA model and the Hopfield model lies in the way in which redundancy is introduced. In the Hopfield model redundancy is introduced based on the information within each of the stored patterns, ignoring interpattern relationships, whereas in the R-IPA model it is based on interpattern associations between the stored patterns. We note further that without redundant interconnections, the MR-IPA neural network would have the lowest input-noise tolerance of all the R-IPA
models. However, the MR-IPA model possesses the highest ability to discriminate between patterns that have a relatively large similarity to one another. Since the MR-IPA neural network has the fewest interconnections, it should be well suited to data-reduction assessment applications, such as weather forecasting, earthquake prediction, etc.
5.3. SIMULATED AND EXPERIMENTAL RESULTS
We first simulate the Hopfield and R-IPA models using various redundancy levels in an optical neural network. We assume that the 26 capital letters of the English alphabet are used as the training set. If the input pattern is contaminated with 20% noise, the output results show that the optimum redundancy level $R_{opt}$ is given by
$$ERL_{opt} = \mathrm{int}(0.3M), \quad IRL_{opt} = ERL_{opt} - 1, \qquad (5.16)$$
where int(·) represents an integer function and M is the number of reference patterns. We stress that the empirical formulas to estimate the optimum redundancy at different noise levels can actually be derived using a larger database simulation. The error rates as a function of the number of stored patterns are plotted in fig. 5.2. We see that the Hopfield, R-IPA, and OR-IPA models can sustain up to four, eight, and ten patterns, respectively. Thus the optimum redundant
Fig. 5.2. Performance under 20% noisy inputs, where I is the Hopfield model, II is the IPA model, and III is the OR-IPA model (ERL = int(0.5M), IRL = ERL − 1). Horizontal axis: number of reference patterns.
Fig. 5.3. Simulated result using the OR-IPA model.
interconnection is capable of improving the performance of a neural network under noisy conditions. Furthermore, a simulated result for the OR-IPA model using 10 capital letters is shown in fig. 5.3. The middle row represents the input patterns with a 20% error rate, whereas the last row represents the reconstructed output patterns. Performance under partial inputs is also provided. The optimum redundancy levels are found to be
$$ERL_{opt} = \mathrm{int}(0.5M), \quad IRL_{opt} = ERL_{opt} - 1. \qquad (5.17)$$
Once again we see that the OR-IPA model performs better than the Hopfield and IPA models, as shown in fig. 5.4.
Fig. 5.4. Performance for partial inputs, where I is the Hopfield model, II is the IPA model, and III is the OR-IPA model (ERL = int(0.3M), IRL = ERL − 1). Horizontal axis: number of reference patterns.
Fig. 5.5. Simulated result using the OR-IPA model.
Since the R-IPA interconnection weights emphasize the interpattern relationship, the features of the stored patterns play an important role in pattern recognition. It is apparent that the stored patterns should retain their main features; otherwise erroneous results would be produced. Figure 5.5 shows a simulated result using the OR-IPA model. The middle row represents a set of partial input letters that contain the major features. The reconstructed letters are shown in the last row, in which nine out of 10 letters can be completely recalled. An experimental result using the LCTV neural network is presented in fig. 5.6, in which the letters B, E, F, P, and R are used as the training set, and each letter is represented by an 8 × 8 pixel array. Figure 5.6b shows a partial input that contains the main feature of the letter B. The results using
Fig. 5.6. Experimental results. (a) Training sets; (b) partial input; (c) obtained by the IPA model; (d) obtained by the OR-IPA model.
IPA and OR-IPA models are given in figs. 5.6c and d, respectively. By comparing these two results, we again see that the OR-IPA model performs better than the IPA and Hopfield models. This section has discussed an R-IPA model to improve the performance of a neural network. Although redundant interconnection is more robust, it reduces the discriminability for pattern recognition. Nevertheless, under noisy and partial input situations, the redundant interconnection network performs better. When compared with the Hopfield and IPA models, the OR-IPA neural network has been shown to improve robustness and pattern discriminability.
§ 6. Optical Implementation of Hamming Nets
It is fair to say that the Hopfield model and the Perceptron are the most frequently used models in optical implementations, as described by, e.g., Psaltis and Farhat [1985], Athale and Stirk [1989], Wang and Jenkins [1990], Hong, Campbell and Yeh [1990], Dunning, Owechko and Soffer [1991], Zhang, Robinson and Johnson [1991], and Lu, Wu, Xu and Yu [1989]. The Hopfield model is a fully interconnected network that requires as many interconnection weights as the square of the number of pixels in the input patterns. For example, if a group of 32 × 32 pixel patterns is stored, there will be more than one million interconnections in the Hopfield network. The capability of optics for implementing such a huge interconnection is limited by the low resolution of the currently available spatial light modulators (SLMs). Although a space-time sharing scheme may alleviate this limitation, it reduces the processing speed significantly. Furthermore, the number of stored patterns in the Hopfield network is severely limited. In addition, if a great number of patterns are stored in the neural net, it would produce spurious results, which correspond to a “no match” pattern (Lippmann [1987]). Even though the Perceptron is trained by the error back-propagation algorithm, it needs a long training time and requires precise detection of the analog output signals, which diminishes the value of using optics. The Hamming net does not suffer from these limitations (Lippmann [1987], Lippmann, Gold and Malpass [1987]), and it works like an optimum image classifier. The output is generated by selecting the class (or exemplar) that has the minimum Hamming distance with respect to the input pattern. The Hamming distance is defined as the number of bits of the input pattern that do not match the exemplar. The number of interconnections in a Hamming net is proportional to the number of input pixels and the number of exemplars, and it has fewer interconnections than the Hopfield model. For instance, with ten 32 × 32 pixel exemplars stored, the Hamming net requires about ten thousand interconnections, instead of the one million required by the Hopfield model. Since the output is selected from the stored exemplars, the Hamming net would not produce any spurious or “no match” result. In fact, the Hamming net is a K-nearest neighbor network. In comparison with the Perceptron, the Hamming net has a shorter training time (Lippmann [1989]), and it does not need precise analog detection at the output domain during the training process. Because of these features, the Hamming net is particularly suitable for large-scale optical implementation. This section discusses a modified Hamming net model that reduces the dynamic range requirement of the SLMs. The optical implementation of the modified Hamming net, as proposed by Yang and Yu [1992], is introduced.
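The interconnection counts quoted above follow from simple arithmetic, which the short sketch below spells out. The function names are hypothetical, and the optional MAXNET term (one weight per ordered pair of exemplar nodes) is an added assumption about how the second layer might be counted.

```python
def hopfield_interconnections(pixels):
    """Fully interconnected net: one weight per ordered pair of pixels."""
    return pixels ** 2


def hamming_interconnections(pixels, exemplars, include_maxnet=False):
    """Hamming layer: pixels x exemplars weights; MAXNET adds exemplars ** 2."""
    count = pixels * exemplars
    if include_maxnet:
        count += exemplars ** 2
    return count


pixels = 32 * 32
print(hopfield_interconnections(pixels))                          # 1048576
print(hamming_interconnections(pixels, 10))                       # 10240
print(hamming_interconnections(pixels, 10, include_maxnet=True))  # 10340
```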
6.1. HAMMING NET MODEL
A Hamming net is essentially a two-layer neural network (fig. 6.1), which can be used as a maximum likelihood image classifier (Lippmann [1987]). The first layer is known as the Hamming layer, which calculates the Hamming distances between the input pattern and each exemplar (i.e., each class), whereas the second layer is known as the “MAXNET” or “winner-take-all” layer, which selects the maximum output node (Lazzaro, Ryckebusch and Mead [1988]). Let M be the number of bipolar exemplars stored in the neural network, in which each exemplar has N pixels. The first layer (i.e., the Hamming layer) has N input and M output neurons, corresponding to the N pixels and M classes, respectively. The interconnection weights in the first layer are determined by
$$w_{ij} = \tfrac{1}{2} X_i^{(j)}, \quad (1 \le i \le N,\ 1 \le j \le M), \qquad (6.1)$$
where $w_{ij}$ is the interconnection weight from the ith input neuron to the jth output neuron, and $X_i^{(j)}$ (which can be either +1 or −1) is the value of the ith pixel in the jth exemplar. The threshold value can be set at
$$\theta_j = \tfrac{1}{2} N, \quad (1 \le j \le M). \qquad (6.2)$$
Thus, when an unknown input pattern is presented at the input, the output
Fig. 6.1. A Hamming net. (Labels, from top to bottom: output; second layer; first layer; input x1, x2, ..., xN.)
of the first layer is given by
$$V_j(0) = \sum_{i=1}^{N} w_{ij} x_i + \theta_j, \quad (1 \le j \le M). \qquad (6.3)$$
When an exemplar matches the input, the output of the neuron representing that exemplar will reach the maximum value N. In contrast, when all the pixels in an exemplar are different from the corresponding pixels of the input pattern, the output of the neuron representing the exemplar will be zero. In general, the output $V_j(0)$ has a value between 0 and N, which is equal to the number of bits of the input pattern that match the bits of the jth exemplar:
$$V_j(0) = N - HD, \quad (1 \le j \le M), \qquad (6.4)$$
where HD is the Hamming distance. Although the Hamming net is theoretically sound, the second layer requires a high dynamic range and successive iterations to produce a maximum output node. However, if the Hamming distance between an exemplar and the input pattern is larger than a certain value (e.g., ½N, the case where more than half of the bits differ), it is not necessary to send a nonzero signal
to the second layer. Following this argument, a modified Hamming net model can be developed as follows. Let us introduce a parameter α (0 < α < 1), such that αN is the maximum Hamming distance that gives rise to a nonzero output signal from the first layer. The modified interconnection weights in the Hamming layer can be written as
$$w_{ij} = \frac{1}{2\alpha} X_i^{(j)}, \quad (1 \le i \le N,\ 1 \le j \le M), \qquad (6.5)$$
for which the threshold value is set at
$$\theta_j = N\left(1 - \frac{1}{2\alpha}\right), \quad (1 \le j \le M). \qquad (6.6)$$
The output of the Hamming layer can therefore be expressed as
$$U_j(0) = f\left(\sum_{i=1}^{N} w_{ij} x_i + \theta_j\right) = \begin{cases} N - HD/\alpha, & HD < \alpha N, \\ 0, & \text{otherwise}, \end{cases} \qquad (6.7)$$
where f(·) is a thresholding function defined as
$$f(x) = \begin{cases} x, & x > 0, \\ 0, & x \le 0. \end{cases} \qquad (6.8)$$
By using this scheme, the Hamming distance between the input and an exemplar can be enlarged by a factor of 1/α, such that the dynamic range requirement for the spatial light modulators (SLMs) and the number of iterative cycles in the MAXNET can be reduced. Thus the overall performance of the modified Hamming net can be improved. However, in practice the parameter α cannot be very small. It should be larger than the input-noise tolerance of the network; otherwise the Hamming layer would produce a zero output at the matching neuron. Let us set α = 0.5 to adapt to the dynamic range of a practical SLM, in which we assume that the input noise is about 20%. In this case the interconnection weights of the Hamming layer are given by
$$w_{ij} = X_i^{(j)}, \qquad (6.9)$$
the threshold level is
$$\theta_j = 0, \qquad (6.10)$$
and the output of the Hamming layer is
$$U_j(0) = \begin{cases} N - 2\,HD, & HD < \tfrac{1}{2} N, \\ 0, & \text{otherwise}. \end{cases} \qquad (6.11)$$
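The effect of α on the first-layer output can be seen in a small numerical sketch. It assumes weights of X/(2α) and a threshold of N(1 − 1/(2α)), which give N − HD for α = 1 and N − 2HD for α = 0.5; the function name and the three 8-pixel bipolar exemplars are hypothetical and are not the exemplars used in the experiments.

```python
import numpy as np


def hamming_layer(exemplars, x, alpha=1.0):
    """First-layer outputs for bipolar exemplars (rows) and a bipolar input x.

    alpha = 1 gives the original layer, N - HD; a smaller alpha gives the
    modified layer, N - HD/alpha, clipped to zero once HD >= alpha * N.
    """
    exemplars = np.asarray(exemplars, dtype=float)
    x = np.asarray(x, dtype=float)
    n = exemplars.shape[1]
    weights = exemplars / (2.0 * alpha)          # assumed w_ij = X_i^(j) / (2 alpha)
    theta = n * (1.0 - 1.0 / (2.0 * alpha))      # assumed threshold
    return np.maximum(weights @ x + theta, 0.0)  # f(x) = x for x > 0, else 0


exemplars = [[1, 1, 1, 1, -1, -1, -1, -1],
             [1, -1, 1, -1, 1, -1, 1, -1],
             [-1, -1, -1, -1, 1, 1, 1, 1]]
x = [1, 1, 1, -1, -1, -1, -1, -1]    # first exemplar with one pixel flipped

print(hamming_layer(exemplars, x))             # [7. 5. 1.]  (HD = 1, 3, 7)
print(hamming_layer(exemplars, x, alpha=0.5))  # [6. 2. 0.]
```

In this example the gap between the best and second-best node grows from 2 to 4, and the poorest match is suppressed entirely before it reaches the second layer.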
Since the MAXNET or winner-take-all layer has M input and M output neurons, the interconnection weight between the jth input and kth output neurons can be written as
$$t_{jk} = \begin{cases} 1, & j = k, \\ -\varepsilon, & j \ne k, \end{cases} \quad (\varepsilon < 1/M,\ 1 \le j, k \le M), \qquad (6.12)$$
where ε is known as the inhibition constant. If the output signal of the Hamming layer is fed to the MAXNET layer, iterations can be carried out given by
$$U_k(n+1) = g\left[U_k(n) - \varepsilon \sum_{j \ne k} U_j(n)\right], \quad (1 \le j, k \le M), \qquad (6.13)$$
where g(·) is a nonlinear operator, for which we assume a sigmoid function that represents the input-output transfer characteristic of the charge-coupled-device (CCD) detector. By successive iterations of the MAXNET, one of the output nodes would retain a higher intensity value, whereas all other nodes will eventually go to zero. It has been proved that the MAXNET will always converge if ε < 1/M, as pointed out by Lippmann, Gold and Malpass [1987]. Thus a maximum output node can always be found. As a K-nearest neighbor classifier, the Hamming net gives rise to only the best match among all the stored exemplars. However, the selected exemplar may not be the same as the input pattern. In other words, if the input pattern does not belong to one of the training exemplars, the output will be the exemplar that has the least Hamming distance with respect to the input. The Hamming net can also be used as an associative memory if the exemplar itself, instead of the intensity value, is presented at the output of the MAXNET.
6.2. OPTICAL IMPLEMENTATION
Figure 6.2 is a schematic diagram of an adaptive optical neural network for Hamming net implementation. The system's design is detailed in § 3.2.
Fig. 6.2. A hybrid optical Hamming net: the output signal is fed back for multilayer operation.
For demonstrations, twelve 8 × 8 pixel exemplars are used in the experiments. Since the number of submatrices in the IWM equals the number of stored exemplars, a circular lenslet array that comprises 12 planoconvex lenses is used, as illustrated. This circular lenslet array is, in fact, matched with the aperture of the imaging lens, such that the primary aberrations can be minimized. In § 6.2 we noted that the IWM is a bipolar matrix in the first layer of the Hamming net. To realize the bipolar multiplication in the optical system, the IWM can be area modulated before being displayed on LCTV1. As shown in fig. 6.3, each pixel can be divided into upper and lower parts. For example, the value “+1” can be encoded with transparent and opaque regions, as shown in fig. 6.3a, for the positive IWM; similarly, the value “−1” can be encoded, as shown in fig. 6.3b, for the negative IWM. The input pattern can be encoded in the same manner as the positive IWM. If we assume that an encoded input pattern is fed to LCTV2, and the encoded positive and negative IWMs are sequentially displayed on LCTV1, then the output intensities representing the positive and negative parts would be sequentially collected by the CCD detector. These two sets of signals are then sent to
Fig. 6.3. Area modulation encoded IWM in the Hamming layer: (a) transparent and opaque encoding for “+1” and “-1” in the positive IWM; and (b) encoded pixel in the negative IWM.
Fig. 6.4. Exemplar set and the IWMs for the Hamming layer: (a) the 12 exemplars; (b) encoded positive IWM; and (c) encoded negative IWM.
the microcomputer for subtraction and thresholding. This array of thresholded signals can be fed back to LCTV2 for the MAXNET (i.e., second layer) operation. Figure 6.4a shows a set of 8 × 8 pixel capital letters used as the exemplars in the Hamming net. The positive and negative encoded parts of the IWM are shown in figs. 6.4b and c, respectively, in which the IWMs are partitioned into 12 submatrices, each of which is represented by an 8 × 8 pixel array. Since the number of input and output neurons in the MAXNET is equal to the number of exemplars, the IWM for the MAXNET is partitioned into 12 submatrices, each of which contains 12 pixels. Area modulation can again be used to encode the bipolar IWM. Figure 6.5a shows that the transparent pixel represents the value “+1” in the positive IWM, whereas the pixel of value “−ε” has only a fraction ε of the transmitted area in the negative IWM. The encoded positive and negative IWMs are shown in figs. 6.5b and c, respectively.
Fig. 6.5. Encoded IWMs in the MAXNET: (a) pixel encoding for “+1” and “−ε”; (b) encoded positive IWM; and (c) encoded negative IWM.
We stress that the modified Hamming net increases the Hamming distances at the input of the MAXNET, which can relax the dynamic range requirement of the SLM and reduce the number of iteration cycles for the MAXNET. The sigmoid function representing the input-output transfer characteristic of the CCD detector would further reduce the number of iterative cycles. Since the dynamic range of the input signal to the MAXNET is rather large, the low-intensity signals would be suppressed by the CCD detector. Note further that a conventional MAXNET algorithm would require about 10 iteration cycles, whereas the proposed hybrid optical system takes only two to three iterations.
6.3. EXPERIMENTAL DEMONSTRATIONS
Demonstrations of the optical Hamming net are illustrated in fig. 6.6, in which figs. 6.6a and e represent two capital letter exemplars “A” and “H” embedded in 20% random noise. The encoded patterns are depicted in
Fig. 6.6. Experimental demonstrations: (a), (e) input patterns embedded in 20% random noise; (b), (f) encoded input patterns; (c), (g) outputs from the first layer; (d), (h) output results obtained from MAXNET after two iterations for “A” and three iterations for “H”, respectively.
figs. 6.6b and f, respectively. The outputs from the first layer are shown in figs. 6.6c and g, respectively. Figure 6.6d shows the output result of “A” from the MAXNET after two iterations, whereas fig. 6.6h represents the output of “H” after three iterations. Comparing these results with the exemplars pictured in fig. 6.4a shows that the Hamming net can be used for pattern classification. By comparison, the Hopfield model implemented with the same optical architecture becomes unstable when storing more than four exemplars. These demonstrations show that the optical Hamming net has a larger processing capacity than the Hopfield net. We note further that the converged result from the MAXNET can also be used to recall the corresponding exemplar. If this recalled exemplar is displayed on an SLM as a final result, the Hamming net can be used as an associative memory. We also note that the MAXNET used in the optical unsupervised learning model (by Lu, Yu and Gregory [1990]) is primarily carried out with a computer, whereas the MAXNET in the optical Hamming net is partially carried out by optics.

In summary, we note that the Hamming net requires fewer interconnections than the fully interconnected Hopfield neural net and, as mentioned earlier, has a rapid training process that requires no analog detection. These features make it particularly suitable for large-scale optical implementation. We also showed that the optical Hamming net can be used as a pattern classifier, or as an associative memory if the convergent result is used to recall the exemplar. An important aspect of the modified Hamming net is that it enlarges the Hamming distances of the output patterns at the first layer. This modification relaxes the dynamic range requirement of the SLMs, and also reduces the number of iteration cycles in the MAXNET.
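The two-layer operation summarized above can be sketched in software. What follows is only an illustration under stated assumptions, not the optical implementation: the toy 8-element exemplars, the function names, the threshold-linear MAXNET nonlinearity (in the style of Lippmann's description) and the default choice ε = 1/(2M) are all hypothetical. The first layer forms each bipolar inner product as the difference between a positive-IWM reading and a negative-IWM reading, mirroring the two sequential exposures and the subtraction in the microcomputer.

```python
def hamming_layer(exemplars, pattern):
    """Matching scores N - (Hamming distance) for bipolar (+1/-1) vectors."""
    n = len(pattern)
    scores = []
    for exemplar in exemplars:
        # The positive and negative IWM parts are read out separately,
        # then subtracted (done by the microcomputer in the text).
        positive = sum(x for w, x in zip(exemplar, pattern) if w > 0)
        negative = sum(x for w, x in zip(exemplar, pattern) if w < 0)
        inner = positive - negative
        scores.append((n + inner) // 2)
    return scores


def maxnet(scores, eps=None, max_iter=1000):
    """Return (index of the winning node, number of iterations used).

    Assumed nonlinearity: f(x) = x for x > 0 and 0 otherwise.
    """
    m = len(scores)
    if eps is None:
        eps = 1.0 / (2 * m)  # assumed value; any 0 < eps < 1/m should do
    y = [float(max(0, s)) for s in scores]
    for iteration in range(max_iter + 1):
        if sum(1 for v in y if v > 0) <= 1:
            return y.index(max(y)), iteration
        total = sum(y)
        # Each node keeps its own value and is inhibited by all the others.
        y = [max(0.0, v - eps * (total - v)) for v in y]
    raise RuntimeError("MAXNET did not converge (tie between maxima?)")


# Hypothetical toy exemplars (not the 8 x 8 letters of fig. 6.4a).
EXEMPLARS = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]
NOISY_INPUT = [1, -1, 1, -1, 1, -1, 1, 1]  # second exemplar, last bit flipped

if __name__ == "__main__":
    first_layer = hamming_layer(EXEMPLARS, NOISY_INPUT)
    print(first_layer, maxnet(first_layer))
```

In this sketch, well-separated first-layer scores let the MAXNET settle in fewer iterations than closely spaced ones, which is the effect attributed above to enlarging the Hamming distances.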
§ 7. Information Storage Capacity
In contrast to a standard memory, in which information is an explicit quantity, the storage of information in an associative memory is a complicated procedure. However, associative memory must be a reasonable model for biological memory, in which a large number of simple connected building blocks (i.e., the neurons) act individually in a random way, but act collectively as an organ when a specific task is to be accomplished. An important step in understanding collective systems is to quantify their ability to store information and carry out computations.
The Hopfield neural network is a model of associative memory that is capable of storing information, as well as carrying out certain computational tasks, such as error correction and speech and pattern recognition. In this section a definition of information capacity is introduced with which upper and lower bounds on the storage capacity of a neural network can be found. We confine the discussion to information stored in the form of stable states, for which an upper bound on the number of vectors that can be made stable in the model can be given (after Li [1990]).

A neural network consists of $N$ neurons, and the $i$th neuron can be in one of two states: $u_i = -1$ (off) or $u_i = +1$ (on). The synaptic connections are undirected, and have strengths that are fixed, real numbers. Define the state vector $u$ to be a binary ($\pm 1$) vector whose $i$th component corresponds to the state of the $i$th neuron. Each neuron examines its input and then decides whether to turn itself on or off in the following manner: Let the interconnection weight matrix (IWM) $T$ be an $N \times N$ real symmetric matrix with zero diagonal (i.e., $T_{ij} = T_{ji}$ and $T_{ii} = 0$), for which the entry $T_{ij}$ represents the strength (which may be negative) of the synaptic connection from the $j$th to the $i$th neuron. We further let $t_i$ be the threshold voltage of the $i$th neuron. If the weighted sum over all of its inputs is greater than, or equal to, $t_i$, the $i$th neuron turns on and its state becomes $+1$. However, if the sum is less than $t_i$, the neuron turns off and its state becomes $-1$. Thus the choice of $T$ and $t$ defines a specific neural network with specific synaptic connection strengths and threshold voltages of the neurons. We assume that the network starts with an initial state, and that it runs with each neuron randomly and independently reevaluating itself. Often the network enters a stable point in state space in which all neurons remain in their current state after evaluating their inputs.
This stable vector constitutes a stored word in the memory, and the basic operation of the network is to converge to a stable state if we initiate it with a nearby state vector. As pointed out in § 4.1, Hopfield proposed a specific scheme of constructing the IWM $T$ that makes a given set of vectors $u^1, \ldots, u^K$ stable states of a neural network. The scheme is based on the sum of the outer products of these vectors. Here no assumption is made about how the matrix $T$ is constructed in terms of the vectors $u^1, \ldots, u^K$, and all the results are valid even if the construction does not follow Hopfield's scheme. Since the neural network represents a memory that stores information, it is appropriate to ask how much information can be stored in a network of $N$ neurons. To define the information storage capacity, we start with a
familiar example. If we have a random access memory (RAM) with $M$ address lines and one data line (an $M \times 1$ RAM, consisting of $2^M$ memory locations, where each location is accessed by an $M$-bit address and contains one bit of stored data), it is obvious that we can store $2^M$ bits. In other words, we can load the $M \times 1$ RAM with a string of binary information in such a way that the whole string can be retrieved from the RAM. Another way to look at it is to consider the string as a single object: we can store and retrieve any string (of length $2^M$) in the $M \times 1$ RAM, and there are $2^{2^M}$ such distinguishable strings. Since the storage capacity can be defined as the logarithm of the number of distinguishable objects, the storage capacity of the $M \times 1$ RAM is

$$C = \log_2 2^{2^M} = 2^M \ \text{bits}. \tag{7.1}$$
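The definition of capacity as the logarithm of the number of distinguishable objects can be checked by brute force for a very small RAM. The sketch below is illustrative only; the function name is hypothetical, and the enumeration is feasible only for a handful of address lines because the number of content strings grows as $2^{2^M}$.

```python
import math
from itertools import product


def ram_capacity_bits(address_lines):
    """Capacity of an M x 1 RAM as log2 of the number of distinguishable contents."""
    locations = 2 ** address_lines
    # Count every distinguishable content string by explicit enumeration.
    distinguishable = sum(1 for _ in product((0, 1), repeat=locations))
    return math.log2(distinguishable)


if __name__ == "__main__":
    for m in (1, 2, 3, 4):
        print(m, ram_capacity_bits(m), 2 ** m)
```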
We shall now apply this definition to the Hopfield model with $N$ neurons, for which the upper and lower bounds of the information storage capacity $C$ can be found.
7.1. UPPER BOUND
The question now becomes: how many different sets of values for $T_{ij}$ and $t_i$ can we distinguish merely by observing the state transition scheme of the neurons? This number is the number of distinguishable networks of $N$ neurons. If this number is $M$, the capacity of the network would be $C = \log_2 M$ bits. The key factor in estimating the number of distinguishable networks is the well-known estimate of the number of switching functions. A switching function of $n$ binary variables $x_1, x_2, \ldots, x_n$ is defined by assigning $-1$ or $+1$ to each of the $2^n$ elements in the set $\{-1, +1\}^n$. Furthermore, a switching function $f(x_1, x_2, \ldots, x_n)$ of $n$ variables is linearly separable if there exists a hyperplane $\Pi$ in the $n$-dimensional space that strictly separates the "on" set $f^{-1}(+1)$ from the "off" set $f^{-1}(-1)$. In other words, $f^{-1}(+1)$ lies on one side of $\Pi$ and $f^{-1}(-1)$ on the other side, and $\Pi \cap \{-1, +1\}^n$ is empty. Linearly separable switching functions are also called threshold functions. Winder [1962] gave the following upper bound on the number $B_n^m$ of threshold functions of $n$ variables defined on $m$ points,

$$B_n^m \le 2 \sum_{i=0}^{n} \binom{m-1}{i}. \tag{7.2}$$
The result is derived in the following manner: Define an $(n+1)$-dimensional space in which the coordinate axes correspond to the weights and to the threshold value. Consider a particular state $u$. This state can be plotted as a hyperplane in the $(n+1)$-space, namely the set of all values of $w_j$ and $t$ that satisfy the following equation:

$$\sum_{j=1}^{n} w_j u_j - t = 0. \tag{7.3}$$
Note that the hyperplane passes through the origin of the coordinate system, and that this plane divides the space into two regions. Weights and threshold values from one of the regions make

$$\sum_{j=1}^{n} w_j u_j - t > 0, \tag{7.4}$$

and correspond to the threshold function on $u$ being equal to $+1$. Weights and threshold values from the other region make

$$\sum_{j=1}^{n} w_j u_j - t < 0, \tag{7.5}$$

and correspond to the threshold function on $u$ being equal to $-1$. Each of the $m$ points gives a similar hyperplane. Thus we have $m$ hyperplanes passing through the origin in the $(n+1)$-space, partitioning the space into a number of regions. Each region corresponds to a threshold function. All points in any one of these regions correspond to values of $w_j$ and $t$ that produce the same threshold function. Two points in different regions correspond to two different functions, since at least one $u$ out of the $m$ $u$'s is mapped to $+1$ by one function and to $-1$ by the other. Therefore, $B_n^m$ is less than, or equal to, the maximum number of regions (called $C_{n+1}^m$) made by $m$ hyperplanes passing through the origin in the $(n+1)$-space.

Assume that $m-1$ hyperplanes have made $C_{n+1}^{m-1}$ regions in the $(n+1)$-space. We add the $m$th hyperplane to make as many more regions as possible. The $m$th hyperplane can intersect the other $m-1$ hyperplanes in, at most, $m-1$ hyperlines. The $m-1$ hyperlines can, at most, partition the $m$th plane into $C_n^{m-1}$ regions, which is the same problem in the $n$-space. Since each region in the $m$th plane is a boundary that divides one region of the $(n+1)$-space into two, we have added $C_n^{m-1}$ regions to the $C_{n+1}^{m-1}$ regions determined by the other $m-1$ planes. This means that

$$C_{n+1}^m = C_{n+1}^{m-1} + C_n^{m-1}. \tag{7.6}$$
The solution of this relation is given by

$$C_{n+1}^m = 2 \sum_{i=0}^{n} \binom{m-1}{i} = 2 \sum_{i=0}^{n} \frac{(m-1)!}{i!\,(m-1-i)!}, \tag{7.7}$$
which is the upper bound for $B_n^m$ of eq. (7.2). If $m = 2^n$, i.e., the threshold function is defined for every binary $n$-vector, then the upper bound for the number of fully defined threshold functions of $n$ variables is given by

$$B_n^{2^n} < \frac{2(n+1)\,2^{n^2}}{n!} < 2^{n^2}. \tag{7.8}$$

The action of each neuron simulates a general threshold function of $N-1$ variables (i.e., the states of all the other neurons), and we have, at most, $2^{(N-1)^2}$ such functions, as shown by Abu-Mostafa and St. Jacques [1985]. Since there are $N$ neurons, there are, at most, $\left(2^{(N-1)^2}\right)^N$ distinguishable networks. The logarithm of this number is an upper bound for the Hopfield information storage capacity $C$, namely,

$$C \le \log_2 \left(2^{(N-1)^2}\right)^N = N(N-1)^2 = O(N^3) \ \text{bits}. \tag{7.9}$$
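The region-counting argument can be checked numerically. The sketch below is an illustration with hypothetical function names: it builds the count from the recurrence "regions made by m hyperplanes in d dimensions = regions made by m - 1 hyperplanes in d dimensions + regions made by m - 1 hyperplanes in d - 1 dimensions", with the assumed boundary values of two regions for a single hyperplane or a one-dimensional space, and compares it with the binomial closed form. It also evaluates the resulting $N(N-1)^2$ capacity bound.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def regions_recursive(m, d):
    """Maximum number of regions cut out of d-space by m hyperplanes through the origin."""
    if m == 1 or d == 1:
        return 2  # assumed boundary values of the recurrence
    return regions_recursive(m - 1, d) + regions_recursive(m - 1, d - 1)


def regions_closed_form(m, d):
    """Closed form 2 * sum_{i=0}^{d-1} C(m-1, i)."""
    return 2 * sum(comb(m - 1, i) for i in range(d))


def winder_bound(n, m):
    """Upper bound on threshold functions of n variables defined on m points."""
    return regions_closed_form(m, n + 1)


def hopfield_upper_bound_bits(num_neurons):
    """Upper bound N (N - 1)^2 on the storage capacity, in bits."""
    return num_neurons * (num_neurons - 1) ** 2


if __name__ == "__main__":
    print(winder_bound(2, 4), winder_bound(3, 8), hopfield_upper_bound_bits(64))
```

For two variables defined on all four binary points the bound evaluates to 14, and for three variables on all eight points it evaluates to 128.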
7.2. LOWER BOUND
It is known from Abu-Mostafa and St. Jacques [1985] that there are at least $2^{0.33 n^2}$ threshold functions of $n$ variables. The symmetry of the IWM $T$ makes the $N$ threshold functions dependent, but we can take the submatrix of $T$ consisting of the first $[N/2]$ rows and the last $[N/2]$ columns, and consider the partial threshold functions defined by this submatrix. Since the entries of this submatrix are independent, we have at least $[N/2]$ functions, each with $n = [N/2]$ variables. Therefore, the number of distinguishable networks is at least $\left(2^{0.33 [N/2]^2}\right)^{[N/2]}$. The logarithm of this number is a lower bound for the storage capacity $C$, namely,

$$C \ge \log_2 \left(2^{0.33 [N/2]^2}\right)^{[N/2]} \approx \frac{0.33}{8} N^3 \ \text{bits}. \tag{7.10}$$
In view of eq. (7.10), we see that an $N$-neuron Hopfield network would have an information storage capacity of the order of $N^3$ bits. The preceding results can be extended to the interpattern association (IPA) model. Since the associative memory $T$ of the IPA model is a three-state interconnection weight matrix, the storage capacity of the IPA model would be higher than that for the Hopfield model. As a rough estimation, we consider that a switching function of $n$ ternary variables is defined by assigning $+1$, $0$, or $-1$ to each of the $3^n$ points in the set $\{-1, 0, +1\}^n$. After performing a derivation similar to that discussed above, the upper bound of the information storage capacity for the IPA model can be shown to be

$$C \le \log_2 3^{N^3} \approx 1.6 N^3 \ \text{bits}. \tag{7.11}$$
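To get a feel for the magnitudes involved, the lower bound for the Hopfield model and the rough upper bound for the IPA model can be evaluated for a given network size. This is a minimal sketch with hypothetical function names; it assumes that $[N/2]$ denotes the integer part of $N/2$ and that logarithms are taken to base 2.

```python
from math import log2


def hopfield_lower_bound_bits(num_neurons):
    """Lower bound 0.33 * [N/2]^3 on the Hopfield storage capacity, in bits."""
    half = num_neurons // 2  # assumed reading of [N/2]
    return 0.33 * half ** 3


def ipa_upper_bound_bits(num_neurons):
    """Rough upper bound N^3 * log2(3) for the three-state IPA model, in bits."""
    return num_neurons ** 3 * log2(3)


if __name__ == "__main__":
    for n in (8, 64, 256):
        print(n, hopfield_lower_bound_bits(n), ipa_upper_bound_bits(n))
```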
7.3. MOMENT-INVARIANT NEUROCOMPUTING
This section examines the feasibility of using the image irradiance moments to replace the Hamming distance, which is generally used as a criterion for convergence in neural computing. The moment $m_{p,q}$ of the image irradiance can be defined as

$$m_{p,q} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^p y^q f(x, y)\, dx\, dy, \tag{7.12}$$

where $f(x, y)$ is assumed to be a continuous, bounded, and non-zero function. It should be noted that $m_{p,q}$ is not invariant to distortions. In order to obtain moments that are invariant to translational change, we define the central moments $\mu_{p,q}$ as given by Hu [1959],

$$\mu_{p,q} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - x_0)^p (y - y_0)^q f(x, y)\, dx\, dy, \tag{7.13}$$

where $(x_0, y_0)$ are the coordinates of the centroid of the image irradiance;

$$x_0 = \frac{m_{1,0}}{m_{0,0}} \quad \text{and} \quad y_0 = \frac{m_{0,1}}{m_{0,0}}, \tag{7.14}$$

with the zero-order moment $m_{0,0} = \mu_{0,0}$ representing the total image irradiance. The above functions $\mu_{p,q}$, however, are not invariant under a scale transformation. To overcome this inadequacy, the scale-normalized moment is introduced, namely,

$$\nu_{p,q} = \frac{\mu_{p,q}}{\mu_{0,0}^{(p+q)/2 + 1}}. \tag{7.15}$$
It is well known that, for the second- and third-order central moments, the following functions are invariant under position, scale, and rotation (Hu [1959]):

$$\phi_4 = (\nu_{3,0} + \nu_{1,2})^2 + (\nu_{2,1} + \nu_{0,3})^2, \qquad \ldots \tag{7.16}$$
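As an illustration of how such invariants behave on a pixel grid, the sketch below computes the first four of Hu's functions in their commonly quoted form. The expressions used for $\phi_1$ to $\phi_3$ are the standard textbook ones and are an assumption here, since only $\phi_4$ is legible above; the function name and the list-of-lists image format are also hypothetical. On a discrete grid the values should be unchanged under integer shifts and 90-degree rotations, up to rounding error.

```python
def hu_invariants(image):
    """Return (phi1, phi2, phi3, phi4) for a 2-D list of non-negative pixel values."""
    pixels = [(x, y, v)
              for y, row in enumerate(image)
              for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pixels)          # total irradiance
    x0 = sum(x * v for x, _, v in pixels) / m00  # centroid coordinates
    y0 = sum(y * v for _, y, v in pixels) / m00

    def nu(p, q):
        # Central moment normalized by m00 ** ((p + q) / 2 + 1).
        mu = sum((x - x0) ** p * (y - y0) ** q * v for x, y, v in pixels)
        return mu / m00 ** ((p + q) / 2 + 1)

    phi1 = nu(2, 0) + nu(0, 2)
    phi2 = (nu(2, 0) - nu(0, 2)) ** 2 + 4 * nu(1, 1) ** 2
    phi3 = (nu(3, 0) - 3 * nu(1, 2)) ** 2 + (3 * nu(2, 1) - nu(0, 3)) ** 2
    phi4 = (nu(3, 0) + nu(1, 2)) ** 2 + (nu(2, 1) + nu(0, 3)) ** 2
    return phi1, phi2, phi3, phi4


if __name__ == "__main__":
    letter_t = [
        [1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
    ]
    print(hu_invariants(letter_t))
```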
We note that the first seven moments provide sufficient discrimination between alphabetical characters, and also permit recognition of multisensor imagery, as shown by Teague [1980]. The Hopfield model can be summarized as follows (see § 4.1): The first step is the assignment of connection weights between neurons, namely,

$$T_{ij} = \sum_{m=1}^{M} f_m(i) f_m(j) \quad (i \ne j), \qquad T_{ii} = 0, \qquad i, j = 1, 2, \ldots, N. \tag{7.17}$$
Here, $T_{ij}$ is the connection weight from neuron $i$ to neuron $j$, and $f_m(i)$ is the $i$th component of the $m$th vector in a set containing $M$ bipolar binary vectors. Every vector in this set represents an exemplar pattern, and every component of a vector corresponds to a pixel in the exemplar pattern. The second step is the initiation of the Hopfield neural network with an unknown input pattern that is described by the vector $[f_{m_0}(1), f_{m_0}(2), \ldots, f_{m_0}(N)]$. This step can be expressed by

$$g_0(i) = f_{m_0}(i), \tag{7.18}$$

where $g_0(i)$ is the output of neuron $i$ at the 0th iteration of the network. The third step describes the update rule,

$$g_{t+1}(j) = \operatorname{sgn}\left[\sum_{i=1}^{N} T_{ij}\, g_t(i)\right], \tag{7.19}$$

where $\operatorname{sgn}[x] = -1$ if $x < 0$ and $+1$ otherwise. The process of iteration is repeated until the outputs remain unchanged for further iterations. Under some conditions the outputs then represent the exemplar pattern that best matches the unknown input. A Hopfield net with 64 neurons and thus 4096 weights was trained to recall four capital letters A, C, E, and T, as demonstrated in fig. 7.1, by Li and Yu [1991]. Every letter is placed in a square unit area (fig. 7.2). The
Fig. 7.1. Reference patterns for a 64 neuron Hopfield net.
Fig. 7.2. Dimension of the reference patterns.
Hamming distances from letter to letter are listed in table 7.1. Seven moment invariants calculated from eq. (7.16) are listed in table 7.2 and the first four of them, together with the Hamming distance, are depicted in fig. 7.3.

TABLE 7.1
Hamming distance from letter to letter.

Letter     A     C     E     T
A          0    40    35    44
C         40     0    29    26
E         35    29     0    33
T         44    26    33     0

Figure 7.3 is divided into three parts: the first part, at the top, demonstrates the whole process of the retrieval of the input pattern in
TABLE 7.2
Low-order moment invariants of the four exemplars. (An exponent shown as [?] is illegible in the source.)

        A                 C                 E                 T
φ1      0.328             0.490             0.433             0.299
φ2      8.40 × 10^-3      1.71 × 10^-3      6.17 × 10^-3      1.74 × 10^-2
φ3      3.86 × 10^-4      7.75 × 10^-4      2.87 × 10^-3      1.67 × 10^[?]
φ4      3.55 × 10^-5      1.36 × 10^-3      3.32 × 10^-3      1.42 × 10^-4
φ5      0                 -1.39 × 10^-5     -1.03 × 10^-5     2.61 × 10^-6
φ6      -3.25 × 10^[?]    -5.60 × 10^[?]    -2.61 × 10^-4     1.87 × 10^-5
φ7      0                 0                 0                 0
[Figure 7.3: output patterns for iterations 1 to 4, with plots of the Hamming distance and of the φ-functions against the iteration number.]
Fig. 7.3. (a) Output patterns as the number of iterations increases; (b) convergence of the Hamming distance; (c) through (f) convergence of the φ-functions.
four iterations in the Hopfield neural network; the second part shows the convergence of the Hamming distance; and the third through sixth parts show the variations of the first four φ-functions over the four iterations. The input binary pattern is a letter T corrupted by random noise, in which each bit is independently reversed from 1 to 0, and vice versa, with a probability near 0.1. The Hamming distance between the corrupted and
uncorrupted letters T is 5 bits. The corrupted pattern is presented at the input end of the Hopfield neural network; the patterns obtained at the output end, from the first to the fourth iteration, are demonstrated in the figure. It can be seen that when the network iterates, the output appears more and more like the correct exemplar pattern, and ultimately converges to it. In other words, the Hamming distance decreases from 5 bits to 0 in four iterations, as can be seen in the following sequence,

$$\text{Hamming distance} = 5, 3, 2, 0, \ldots \ \text{(bits)}. \tag{7.20}$$
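The three steps of the Hopfield procedure, together with the Hamming-distance bookkeeping, can be sketched as follows. This is a minimal illustration and not the 64-neuron experiment: the two 8-element exemplars, the synchronous update of all neurons, and the function names are assumptions made for the example.

```python
def train_hopfield(exemplars):
    """Outer-product weights with a zero diagonal."""
    n = len(exemplars[0])
    return [[0 if i == j else sum(f[i] * f[j] for f in exemplars)
             for j in range(n)] for i in range(n)]


def sgn(x):
    """-1 for negative arguments and +1 otherwise."""
    return -1 if x < 0 else 1


def hamming(a, b):
    """Number of positions in which two vectors differ."""
    return sum(1 for p, q in zip(a, b) if p != q)


def recall(weights, pattern, target, max_iter=20):
    """Iterate until the outputs stop changing; track the distance to the target."""
    n = len(pattern)
    g = list(pattern)
    distances = [hamming(g, target)]
    for _ in range(max_iter):
        # Synchronous update of every neuron (an assumption of this sketch).
        new_g = [sgn(sum(weights[i][j] * g[i] for i in range(n)))
                 for j in range(n)]
        if new_g == g:
            break
        g = new_g
        distances.append(hamming(g, target))
    return g, distances


# Hypothetical toy exemplars (mutually orthogonal bipolar vectors).
EXEMPLAR_A = [1, 1, 1, 1, -1, -1, -1, -1]
EXEMPLAR_B = [1, -1, 1, -1, 1, -1, 1, -1]
CORRUPTED_A = [-1, 1, 1, 1, -1, -1, -1, -1]  # first bit of EXEMPLAR_A reversed

if __name__ == "__main__":
    weights = train_hopfield([EXEMPLAR_A, EXEMPLAR_B])
    print(recall(weights, CORRUPTED_A, EXEMPLAR_A))
```

The list of distances returned by `recall` plays the role of the sequence in eq. (7.20) for this toy case.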
Unfortunately, the Hamming distance is not an image descriptor, and it cannot be used to discriminate between image patterns. For the purpose of pattern recognition we suggest the use of moment invariants to show the convergence of collective computation in neural nets. For example, the function $\phi_1$ for the letter T converges to its stable value 0.299 (see table 7.2) in four iterations, as shown in the sequence

$$\phi_1 = 0.255,\ 0.319,\ 0.277,\ 0.299,\ \ldots. \tag{7.21}$$
Instead of using the Hamming distance sequence expressed in eq. (7.20), one can use the difference between successive values of each moment invariant to show the convergence in neurocomputing, such as

$$|\Delta\phi_1| = 0.064,\ 0.042,\ 0.022,\ 0,\ \ldots, \qquad |\Delta\phi_2| = 0.012,\ 0.011,\ 0.012,\ 0,\ \ldots, \qquad \ldots \tag{7.22}$$
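The first sequence of eq. (7.22) can be reproduced from the values of eq. (7.21) if the difference is read as the change between successive iterations; that reading is an interpretation, adopted here because it matches the printed numbers. In the sketch below the function name is hypothetical, and the converged value 0.299 is repeated once to represent the step at which the output no longer changes.

```python
def successive_differences(values, digits=3):
    """Absolute change between consecutive entries, rounded for display."""
    return [round(abs(b - a), digits) for a, b in zip(values, values[1:])]


# Values of phi_1 from eq. (7.21), with the converged value repeated once.
PHI1_SEQUENCE = [0.255, 0.319, 0.277, 0.299, 0.299]

if __name__ == "__main__":
    print(successive_differences(PHI1_SEQUENCE))
```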
A comparison of the sequence of eq. (7.20) with those of eq. (7.22) shows that both converge to zero when the corrupted pattern is completely retrieved. However, the sequences of eq. (7.22) provide information that enables discrimination between patterns on the basis of an examination of the quantitative difference between the values of the moment invariants and the data listed in table 7.2. The primary advantage is that the code description is replaced by an image description in neurocomputing. In summary, it is known that $\phi_i$ is noise sensitive, and that it is difficult to make a neural network shift or distortion invariant without the use of a large number of training patterns. Nevertheless, we have proposed a method that combines the invariant moments with neurocomputing so that each makes up for the limitations of the other: distortion invariance is provided by the moments, and noise immunity by the neural network.
§ 8. Self-organizing Optical Neural Networks

Strictly speaking, two types of learning processes are used in the human brain: supervised and unsupervised learning (Lippmann [1987]). If a teacher organizes the information and then teaches the students, it is obviously supervised learning. For example, in an artificial neural network (ANN), both the input and the desired output data must be provided as training exemplars. In other words, the ANN has to be taught when to learn and when to process the information. If an unknown object is presented to the ANN during the processing, however, the network may provide an erroneous output result. On the other hand, for unsupervised learning (also called self-learning) the students learn by themselves, relying on some simple rules and their past experiences. In an ANN only the input data are provided, but not the desired output result. After a single trial or a series of trials, an evaluation rule (previously provided to the neural network) is used to evaluate the performance of the network. Thus, we see that the network can adapt and categorize the unknown objects. This kind of self-organizing process is a representation of the self-learning ability of the human brain.

In past decades, various models of ANNs were developed. Some examples of the supervised learning models include the Perceptron, the error-driven back-propagation model (Rumelhart, Hinton and Williams [1986]), the Hopfield model, and the Boltzmann machine. The adaptive resonance theory (ART) (Rumelhart and Zipser [1986]), the Neocognitron (Rosenblatt [1962]), the Madaline, and the Kohonen [1984] self-organizing feature map are among the best known unsupervised learning models. Among these, the Kohonen model is the simplest self-organizing algorithm, capable of performing statistical pattern recognition and classification, and it can be modified for optical neural network implementation, as described by Lu, Yu and Gregory [1990].

8.1. KOHONEN’S FEATURE MAP
Knowledge representation in human brains is generally at different levels of abstraction, and assumes the form of a feature map. The Kohonen model suggests a simple learning rule by adjusting the interconnection weights between input and output neurons, based on the matching score between the input and the memory. A single interconnection layer neural network is defined that consists of N x N input and M x M output neurons forming an input and output vector space (fig. 8.1), in which the nodes are also
[Figure 8.1: an output space of M × M output neurons above an input space of N × N input neurons.]
Fig. 8.1. Single-layer neural net.
laterally interconnected. Let us assume that two-dimensional vectors (i.e., input patterns) are sequentially presented to the neural network, given by

$$x(t) = x_{ij}(t), \qquad i, j = 1, 2, \ldots, N, \tag{8.1}$$
where $t$ represents the time index, such as the iteration number in the discrete-time sequence, and $(i, j)$ specifies the position of the input neuron. Thus, the output vectors of the neural network can be expressed as a weighted sum of the input vectors,

$$y_{lk}(t) = \sum_{i=1}^{N} \sum_{j=1}^{N} m_{lkij}(t)\, x_{ij}(t), \qquad l, k = 1, 2, \ldots, M, \tag{8.2}$$

where $y_{lk}(t)$ represents the state of the $(l, k)$th neuron in the output space and $m_{lkij}(t)$ is the interconnection weight between the $(i, j)$th input and $(l, k)$th output neurons. Equation (8.2) can also be written in matrix inner-product representation as

$$y_{lk}(t) = m_{lk}(t)\, x(t), \tag{8.3}$$
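Fig. 8.2. Memory vectors in the memory matrix space.

where $m_{lk}(t)$ can be considered to be a two-dimensional vector in a four-dimensional memory matrix space, which can be expanded in an array of two-dimensional submatrices (fig. 8.2). Each submatrix can be written in the form

$$m_{lk}(t) = \begin{bmatrix} m_{lk11}(t) & m_{lk12}(t) & \cdots & m_{lk1N}(t) \\ m_{lk21}(t) & m_{lk22}(t) & \cdots & m_{lk2N}(t) \\ \vdots & \vdots & & \vdots \\ m_{lkN1}(t) & m_{lkN2}(t) & \cdots & m_{lkNN}(t) \end{bmatrix}, \tag{8.4}$$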
where the elements in each submatrix represent the associative weight factors from each of the input neurons to one output neuron. Note that, in general, the Kohonen model does not specify the desired output results. Instead, a similarity matching criterion is defined to find the best match between the input vector and the memory vectors, and then to determine the best matching output node. The optimum matching score $d_c$, as defined by Kohonen, can be written as

$$d_c(t) = \| x(t) - m_c(t) \| = \min_{l,k} \| x(t) - m_{lk}(t) \|, \tag{8.5}$$

where $c = (l, k)^*$ represents the node in the output vector space at which the best match occurs, and $\| \cdot \|$ denotes the Euclidean distance operator. After obtaining an optimum matching position $c$, a neighborhood $N_c(t)$ around the node $c$ is further defined for modification (fig. 8.3).

Fig. 8.3. Neighborhood selection using Kohonen's self-learning algorithm: $N_c(t_1)$ is the initial neighborhood; as time proceeds the neighborhood shrinks to $N_c(t_2)$, etc., until it reduces to one memory submatrix.

Note that the memory matrix space is equivalent to the output space, in which each submatrix corresponds to an output node. As time progresses, the neighborhood $N_c(t)$ slowly reduces to the neighborhood that consists of only the selected memory vector $m_c$, as illustrated in fig. 8.3. Furthermore, a simple algorithm is used to update the weighting factors in the neighborhood so that the similarity between the stored memory matrix $m_{lk}(t)$ and the input vector $x(t)$ increases. Note that the input vectors can be binary or analog patterns, but the memory vectors are updated in an analog incremental process. The adaptation formula of the algorithm can be written as

$$m_{lk}(t+1) = \begin{cases} m_{lk}(t) + \alpha(t)\,[x(t) - m_{lk}(t)], & (l, k) \in N_c(t), \\ m_{lk}(t), & \text{otherwise}, \end{cases} \tag{8.6}$$
where $0 < \alpha(t) < 1$ represents a gain sequence, called the learning speed, which is usually a slowly and monotonically decreasing function of $t$. In practice, the learning speed is assumed to be linear, and can be written as

$$\alpha(t) = \alpha(0) - a t, \tag{8.7}$$

where $a$ is the learning rate. We should mention that the point density of the memory vectors tends to approximate the probability density function of the input vectors, which has been proved by Kohonen [1984]. It should be noted that eq. (8.5) can be expanded in the following form:

$$d_c^2(t) = \min_{l,k} \left[ \sum_{i=1}^{N} \sum_{j=1}^{N} x_{ij}^2(t) + \sum_{i=1}^{N} \sum_{j=1}^{N} m_{lkij}^2(t) - 2 \sum_{i=1}^{N} \sum_{j=1}^{N} m_{lkij}(t)\, x_{ij}(t) \right], \tag{8.8}$$

where the first term is a constant with respect to the output position $(l, k)$. If the weight vectors are normalized so that their autocorrelations (i.e., the sum of the squared weights from all inputs to each output node) are identical, the second term also becomes a constant. Under this condition the minimum Euclidean distance occurs whenever the sum in the third term becomes maximum, i.e.,

$$\max_{l,k} \sum_{i=1}^{N} \sum_{j=1}^{N} m_{lkij}(t)\, x_{ij}(t) = \max_{l,k}\, y_{lk}(t), \tag{8.9}$$

which represents the output result of the neural network. In other words, selecting the minimum Euclidean distance between the memory vectors and the input pattern vector is equivalent to finding the maximum output node in the output space.
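The equivalence just described, and one adaptation step with a linearly decreasing learning speed, can be sketched as follows. This is a simplified illustration rather than the optical system: the output nodes are arranged along a line instead of an M × M grid, the memory vectors are flattened lists, the renormalization after each update is an added assumption that keeps the autocorrelations identical, and all function names are hypothetical. The default values 0.02 and 0.00005 are the learning-speed parameters quoted later in this section.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def best_match_by_distance(memory, x):
    """Index of the memory vector at minimum Euclidean distance from x."""
    return min(range(len(memory)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, memory[k])))


def best_match_by_output(memory, x):
    """Index of the memory vector with the maximum inner product with x."""
    return max(range(len(memory)),
               key=lambda k: sum(a * b for a, b in zip(x, memory[k])))


def learning_speed(t, alpha0=0.02, rate=0.00005):
    """Linearly decreasing learning speed, clipped at zero."""
    return max(0.0, alpha0 - rate * t)


def kohonen_step(memory, x, t, radius):
    """Move the memory vectors near the best match toward x; return the best match."""
    c = best_match_by_output(memory, x)
    alpha = learning_speed(t)
    for k in range(len(memory)):
        if abs(k - c) <= radius:  # one-dimensional neighbourhood, for brevity
            moved = [m + alpha * (a - m) for m, a in zip(memory[k], x)]
            memory[k] = normalize(moved)  # assumed renormalization
    return c


if __name__ == "__main__":
    random.seed(0)
    memory = [normalize([random.random() for _ in range(64)]) for _ in range(9)]
    pattern = [random.choice((0.0, 1.0)) for _ in range(64)]
    print(best_match_by_distance(memory, pattern),
          best_match_by_output(memory, pattern))
```

With normalized memory vectors the two selection functions should pick the same node, which is why the inner products delivered by the optics can stand in for an explicit distance calculation.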
The maximum output node can be determined by using extensive lateral inhibition among the output nodes in the output space, which is known as the MAXNET algorithm (Lippmann [1987]). The expression of the inhibition interconnections can be written as

$$y_{lk}(t+1) = f\left[ y_{lk}(t) - \varepsilon \sum_{(p,q) \ne (l,k)} y_{pq}(t) \right], \tag{8.10}$$

where $f$ is a nonlinear operator over the output nodes and $\varepsilon$ is an inhibition constant between the output nodes. For simplicity we assume that the nonlinear operation $f(x)$ is a thresholding function given by

$$f(x) = \begin{cases} 1, & x \ge a, \\ -1, & x < b, \end{cases} \tag{8.11}$$

where $a$ and $b$ are arbitrary constants. By referring to the MAXNET algorithm, we see that each node will excite itself and inhibit the other nodes during each iteration. Finally, only the maximum output node will survive, whereas all of the other nodes will eventually go to zero. Since the MAXNET algorithm will always converge for $\varepsilon < 1/M$, the maximum node can always be found, as shown by Lippmann [1987].

8.2. UNSUPERVISED LEARNING
Before applying the self-organizing model in the LCTV neural network of § 3.2, two major factors must be considered: one is the system components and alignment errors, the other is the effect of the parameters of the self-organizing system. Since the submatrices of the interconnection weight matrix (IWM) are precisely interconnected with the input pattern vector by a lenslet array, the alignment of the interconnections is rather critical. Although the adjustment of lenses is difficult, it is rather simple to shift the IWM submatrices in LCTV1. A set of test patterns can be displayed on LCTV1 and LCTV2. The test patterns are shifted in small steps according to the sharpness of the output pattern detected by the CCD camera. Thus, the interconnection alignment can be self-adjusted by means of a feedback loop. The uniformity and stability of the light illumination also pose some problems for the accuracy of output results. To alleviate the uniformity
problem, a test is performed when a new input pattern vector is presented. During the test, all elements in the memory matrix are set to the same value, i.e.,

$$m_{lkij}(t_0) = 1, \qquad l, k = 1, 2, \ldots, M, \quad i, j = 1, 2, \ldots, N. \tag{8.12}$$

The output array can then be obtained as

$$\hat{y}_{lk}(t_0) = \sum_{i=1}^{N} \sum_{j=1}^{N} m_{lkij}(t_0)\, x_{ij}(t). \tag{8.13}$$

The output array obtained during the self-learning and recognition processes is then divided, element by element, by the uniformity test array $\hat{Y}(t_0)$, giving

$$Y^*(t) = \frac{Y(t)}{\hat{Y}(t_0)}, \tag{8.14}$$

where $Y^*(t)$ is the normalized output array in which the nonuniformity of the light illumination is eliminated.

For experimental demonstrations, four 8 × 8 pixel patterns (i.e., tree, dog, house, and airplane) were sequentially presented at the input LCTV2 (fig. 8.4a). Figure 8.4b shows the initial memory matrix as a random pattern. The first neighboring region is chosen as $N_c(0) = 5$, the initial learning speed is $\alpha(0) = 0.01$, and the learning rate is selected as $a = 0.00025$ per iteration. The output pattern picked up by the CCD camera must be normalized by referring to eq. (8.14). The location of maximum output intensity can then be identified by using the MAXNET algorithm. The memory submatrices
Fig. 8.4. Self-organizing ONN using unsupervised learning: (a) input training set; (b) initial memory matrix space with random noise; and (c) final memory matrix space. The patterns are adapted into the memory matrix; the tree is centered at (1, 8), the dog is at (7, 1), the house is at (7, 7), and the airplane is at (1, 1).
in the neighborhood of the maximum output spot are adjusted based on the adaptation rule of the Kohonen model. The updated memory matrix is then displayed on LCTV1 for the next iteration, and so on. Due to the gray-level limitation in LCTV displays, it may take a few iterations of the learning process to produce one gray-level change that is noticeable by the CCD detector array. Since the inputs are binary patterns, the center memory vector in the neighborhood region eventually converges to the binary input vector. Thus the limited dynamic range of LCTVs would not pose a major problem in our experiment. As shown in fig. 8.4c, after 400 iterations the memory has learned the four input patterns, mapping them around four overlapping regions in the IWM. The centers of these patterns are located at (1, 8), (7, 1), (7, 7), and (1, 1), respectively. One of the major advantages in adaptive pattern recognition is the ability to learn unknown objects in addition to recognizing known ones. This task can be achieved by adding three criteria to the Kohonen learning rule, for which the matching rate R for pattern recognition can be defined as
(8.15)
where $\| \cdot \|$ is the norm operator. Let us assume that two constants $F_1$ and $F_2$ represent the matching and unmatching scores, with $F_1 < 1$, $F_2 < 1$, and $F_1 > F_2$; then the criteria of the learning rule are as follows:
(1) If $R \ge F_1$, the input object matches the memory submatrix at position c; then the neural network recognizes the input object.
(2) If $F_2 < R < F_1$, the input object is identified as a pattern that belongs to the class of patterns (i.e., submatrices) around the location c. This shows that the input object differs from the existing patterns in the class. Then the memory submatrices in the neighborhood of c should be modified to adapt to the new input pattern.
(3) If $R < F_2$, the input object is considered to be an unknown pattern; then a new category is developed to learn the unknown pattern using eq. (8.15).
For illustration, $F_1$ and $F_2$ are selected as $F_1 = 95\%$ and $F_2 = 75\%$. A computer-simulated, self-learning process is illustrated in fig. 8.5, in which four standardized capital letters A, B, C, and D are sequentially presented as the input objects to the neural network. It can be seen that in the memory matrix the locations of the submatrices reflect the similarity among the input
Fig. 8.5. Feature map in the memory matrix space; three similar patterns are located on the same side in this memory matrix, B is centered at (8,4), C is at (6,1), and D is at (8,8), whereas A is at (1,8).
patterns. The grouping of the submatrix locations is known as Kohonen’s feature map, for which the patterns of similar features tend to stay close to each other. For example, B, C, and D are similar patterns; their submatrices are located on the same side of the memory matrix, centered at (8,4), (6,1), and (8,8), respectively, whereas pattern A sits on the opposite side, located at (1,8). It should be noted that the inner products of the memory matrix with the input pattern are performed in parallel by optics, but the MAXNET algorithm, the thresholding, and the adaptive operations are carried out by microcomputer. Our experience showed that the best results in self-organization are obtained when the neighborhood of the memory matrix space is first selected to be a rather wide region, which is then allowed to shrink as time proceeds. The linearly decreasing learning speed $\alpha(t)$ is shown in fig. 8.6. Since $\alpha_1 > \alpha_2$, it is clear that $\alpha_1(t)$ is a faster learning speed than $\alpha_2(t)$. It is obvious that to reach the stable state of the memory matrix, $\alpha_1(t)$ would require fewer iterations than $\alpha_2(t)$. In real life, however, intensive training may not always be effective. With the slower learning speed the memory matrix is more organized and also appears smoother. The price paid is obviously the longer training time required by a slower learning speed. As an example, the memory matrix space adapted from the faster learning speed is shown in fig. 8.7a. After 400 iterations, using $\alpha(0) = 0.02$ and $\alpha = 0.00005$, similar patterns I, J, and T are adapted in the lower left part of the memory matrix,
Fig. 8.6. Fast and slow linear learning speeds (horizontal axis: time sequence).
centered at (5,1), (7,2), and (8,4), respectively, whereas pattern X submatrices occupy the upper right part, centered at (3,8). Since pattern X is very different from the I, J, and T patterns in the memory space, using the slower learning rate, $\alpha = 0.000025$, pushes the pattern X submatrices farther away to the upper right corner, centered at (1,8) after 800 iterations (fig. 8.7b). In this figure we see that the memory space is more topologically organized. When similar patterns are presented at the input of a neural network, a new pattern may occasionally override the old pattern through selecting the same optimum position in the memory matrix space, and the old pattern gradually fades away. This phenomenon is rather similar to the learning
Fig. 8.7. Memory matrix topology using linear learning speeds. (a) Fast learning rate with the centers of the similar patterns I at (5,1), J at (7,2), and T at (8,4), and the center of the different pattern X at (3,8). (b) Slow learning rate, with the pattern X pushed farther away to the upper right corner, centered at (1,8). Notice that the memory matrix for the slow learner becomes topologically more organized and also appears smoother.
process of humans: if one moves to southern Florida for a long period, one may forget the bitterness of the cold winters in the northern United States. An example of this phenomenon has been simulated by computer (fig. 8.8). An input pattern B is first presented to the optical neural network. The memory matrix adapts the pattern around the lower right corner, centered
Fig. 8.8. Topological memory matrices showing learning without erasing the old memory: (a) pattern B in the memory matrix space; (b) pattern B submatrices are erased, after pattern E is learned; (c) forbidden regions in the memory matrix space; and (d) learning new patterns without erasing old ones. The centers of the 11 patterns are B at (8,4), C at (8,6), E at (6,8), F at (8,8), I at (1,8), J at (1,6), T at (3,8), V at (7,1), W at (6,3), X at (1,1), and Y at (1,3).
around (8,8), as shown in fig. 8.8a. A second pattern E is presented to the neural network at a later time. Since pattern E is similar to pattern B, it picks the same spot (8,8) in the memory matrix space as its maximum output point. As can be seen in fig. 8.8b, the memory submatrices of B are eventually taken over by E after 100 iterations. If one wishes to preserve the old memory while learning new knowledge, there are two ways to achieve this goal. First, one can refresh the memory by repeatedly reviewing the previous input patterns, which takes a lot of computational time and may cause confusion. Second, one can set a rule in the learning process: the new pattern may not take the output nodes within certain regions of the old patterns in the memory matrix, i.e., around their center spots. This rule can be expressed as follows: (8.16) where $R_e(t_n)$ consists of the output nodes $d_c(t_0)$, $d_c(t_1)$, ..., $d_c(t_{n-1})$ and their neighboring regions (fig. 8.8c). The radius $r_e$ of the forbidden region can be determined by experience (in our experiments, $r_e = 2$ was selected). After this rule is added to the Kohonen model, the 8 x 8 array neural net can easily remember 11 patterns without erasing similar patterns (fig. 8.8d). The centers of these 11 patterns in the memory matrix are B at (8,4), C at (8,6), E at (6,8), F at (8,8), I at (1,8), J at (1,6), T at (3,8), V at (7,1), W at (6,3), X at (1,1), and Y at (1,3). Although reasonable results can be obtained in a self-organizing optical neural network by adjusting the learning parameters, the best results were achieved through experience. In addition, since the original Kohonen model was not designed for pattern recognition, the ability for self-organization and adaptive pattern recognition can be enhanced by incorporation of more effective models, such as the interpattern association (IPA) model, in constructing the interconnection weight matrix during the learning process.
In summary, we have implemented the Kohonen self-organizing feature map in an optical neural network. This model has been modified with matching criteria, showing that a self-organizing optical neural network has the ability to develop new categories, and to learn the unknowns. The optical neural network is also capable of organizing a feature map and preserving old memory while learning new knowledge. These abilities can be achieved by setting new matching scores, and introducing the concept of forbidden regions in the memory space.
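The ingredients of the modified Kohonen rule summarized above (the three matching criteria, a linearly decreasing learning speed, and forbidden regions around stored centers) can be sketched in software roughly as follows. This is a minimal illustration and not the authors' implementation: all function names are hypothetical, the matching rate R is taken as a given number because its defining equation is not reproduced here, reading the quoted value 0.00005 as the per-iteration decrement of the learning speed is an assumption (0.02/400 = 0.00005), and a square neighborhood is assumed for both the adaptation and the forbidden regions.

```python
import numpy as np

# Hypothetical names throughout. Grid indices are 0-based here,
# whereas the text quotes 1-based positions such as (8,8).

def classify(rate, f_match=0.95, f_unmatch=0.75):
    """Map a matching rate R onto the three learning criteria."""
    if rate >= f_match:
        return "recognized"       # criterion (1)
    if rate > f_unmatch:
        return "adapt"            # criterion (2)
    return "new_category"         # criterion (3); R == f_unmatch also lands here

def learning_rate(t, alpha0=0.02, slope=0.00005):
    """Linearly decreasing learning speed, clipped at zero (assumed form)."""
    return max(alpha0 - slope * t, 0.0)

def forbidden_region(center, radius, shape):
    """Grid nodes within `radius` (square neighborhood) of a stored center."""
    r, c = center
    rows = range(max(r - radius, 0), min(r + radius + 1, shape[0]))
    cols = range(max(c - radius, 0), min(c + radius + 1, shape[1]))
    return {(i, j) for i in rows for j in cols}

def select_winner(memory, x, forbidden=frozenset()):
    """Node with the largest inner product, skipping forbidden nodes."""
    scores = (memory @ x).astype(float)   # shape (rows, cols)
    for i, j in forbidden:
        scores[i, j] = -np.inf
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return int(i), int(j)

def adapt(memory, x, winner, alpha, radius):
    """Move the memory vectors around the winner toward the input x."""
    for i, j in forbidden_region(winner, radius, memory.shape[:2]):
        memory[i, j] += alpha * (x - memory[i, j])
    return memory
```

In such a sketch, `memory` would be a float array of shape (8, 8, N) holding one N-element memory vector per output node; the set passed as `forbidden` would be the union of `forbidden_region(center, 2, (8, 8))` over the centers of previously learned patterns.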
§ 9. Conclusion
We have reviewed the basic principles of neural networks, stressing their design, models, and architectures. In view of the massive interconnection property and parallel processing capabilities of optics, we have presented several optical neural network structures, including liquid-crystal television based neural networks, lenslet-array and mirror-array interconnections, and optical-disk neural nets; several models of associative memories were also discussed, including the Hopfield, back-propagation, orthogonal projection, multilevel algorithm, interpattern-association, and space-time-sharing models. We have shown that the interpattern-association and orthogonal projection models provide greater robustness of performance, in which the storage capacity of the neural network increases dramatically. Although the space-time-sharing model can process a wide space-bandwidth input, the price paid is that the processing time of the network is prolonged. To improve the performance of the neural network, the concept of redundant interconnection was introduced, which enhanced the network performance under noisy conditions. Implementation of a Hamming net in the optical neural network was also discussed. We showed that the Hamming net requires fewer connections than the fully interconnected Hopfield net. An important aspect of the modified Hamming net is to enlarge the Hamming distances of the output patterns, which relaxes the dynamic range requirement and also reduces the number of iteration cycles in the MAXNET. By contrast with the standard memory, storing information in an associative memory is a complicated procedure. Nevertheless, the basic principles of information storage capacity of a neural net were reviewed, with mention of the Hopfield and IPA models. We have stressed the possibility of using image moments rather than the Hamming distance for neural computing, showing that distortion invariance and noise immunity can be achieved in neural networks.
Strictly speaking, artificial neural networks fall into two classes: supervised and unsupervised neural nets. We applied Kohonen’s feature map to the optical neural network, showing that the unsupervised learning model can learn by itself. Despite the widespread application of digital computers, neural networks perform better in cognitive tasks. By exploiting the massive connectivity and parallel processing of optics together with the flexibility of computers, that is, using the strengths of both electronics and photonics, we developed several hybrid optical neural networks, with the hope that this article will provide the basic design concepts and implementation for future research in optical neural networks.
The number of contributors in this field is large, and I apologize for any omission in reference to their work.
E. WOLF, PROGRESS IN OPTICS XXXII © 1993 ELSEVIER SCIENCE PUBLISHERS B.V.
ALL RIGHTS RESERVED
THE THEORY OF OPTIMAL METHODS FOR LOCALIZATION OF OBJECTS IN PICTURES
BY
L. P. YAROSLAVSKY*
Laboratory of Digital Optics, Institute of Information Transmission Problems, Russian Academy of Sciences, Yermolovoy Street 19, 101447 Moscow, Russia
* Present address: Biomedical Engineering and Instrumentation Program, National Center for Research Resources, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20892, USA.
CONTENTS
§ 1. INTRODUCTION
§ 2. THE ACCURACY AND RELIABILITY OF THE LOCALIZATION OF TWO-DIMENSIONAL OBJECTS ON A PLANE
§ 3. LOCALIZATION OF OBJECTS ON A COMPLEX BACKGROUND WITH A MINIMUM OF ANOMALOUS ERRORS
§ 4. CONCLUSION
ACKNOWLEDGEMENT
REFERENCES
§ 1. Introduction
One of the major goals of picture processing is to provide information about the relative location of objects in space. In many applications, detection and localization (i.e., measurement of coordinates) of objects is of extreme practical importance. Almost all tasks in picture processing and interpretation, especially those of object recognition, can be reduced to this problem. A copious literature exists on localization and detection of objects in pictures, but the variety of ideas used to solve this problem is not too rich. Essentially, detection and localization of objects is reduced in all methods to some kind of correlation of the given object with the observed picture and subsequent comparison of the result with a threshold. This approach is usually substantiated by an additive observation model, which treats the observed picture signal as an additive mixture of the desired object signal and signal-independent noise. There are practical reasons for the general adherence to a correlator. The correlation detector-estimator is essentially a version of the so-called linear detector-estimator. Decisions about the presence and coordinates of the desired object are made point-wise on the basis of the signal level in each point of the field at the output of a linear filter acting upon the observed picture. The function of the linear filter in such devices is to transform the signal space in order to enable decision making on the basis of individual signal coordinates of the transformed space rather than using the entire signal. Due to separation into independent linear and point-wise non-linear units, data analysis and implementation of such devices is much simplified. The implementation issue is of particular importance in picture processing because of the enormous number of degrees of freedom in optical and picture signals, which results in computational complexity in data processing.
Fortunately, a very elegant optical implementation of the correlator exists which allows image correlation to be performed at the speed of light. It was proposed by Van der Lugt [1964] almost 30 years ago. This development began an “era” of coherent optical correlators in optical information processing. Even now in almost every issue of Applied Optics, Optics Communications
and other optical journals one can find a paper on optical correlators and related problems. This tells about the topicality of the problem. At the same time, this means that the problem is still open despite the passage of 30 years. What is the reason for that? In the present author’s opinion, there are two reasons. The first is technical: optical correlators remain too imperfect to be able to compete with digital electronic processors in real applications. The second reason is that, until recently, the theoretical basis was not sufficiently developed. The present paper is an attempt to review the current state of the theory of optimal correlators, and, in this way, to contribute to the solution of the problem. The performance characteristics of localization devices can be assessed quantitatively in terms of their accuracy and reliability (Woodward and Davies [1950]). Localization accuracy and reliability are limited by the presence of some random noise in the observed picture signal due to the image sensor’s noise (e.g., graininess of the photomaterial or intrinsic noise of the video camera, etc.), as well as by signal components from outside background objects. This paper presents an analysis of these factors, starting with the simplest case of additive, signal-independent noise and progressing to a treatment of the most general situation of pictures with a cluttered background. Section 2 is devoted to a general analysis of the potential accuracy and the reliability of object localization in the presence of additive Gaussian noise. In § 2.1, the concepts of accuracy and reliability as well as normal and anomalous errors of localization are introduced. The optimality of the conventional correlator (matched filter) is shown for the case of picture observation with additive white Gaussian noise. In § 2.2, the potential accuracy of localization is characterized in terms of the variance of measurement errors along the coordinates.
In § 2.3, these results are extended to the case of mismatched filter and non-white noise, and in § 2.4 they are extended to object localization in color or, generally, in multi-component pictures. The reliability of localization in the presence of additive white Gaussian noise is investigated in § 2.5. The probability of anomalous errors is estimated, its threshold behavior is indicated, and the lower bound of the object signal energy per bit of measurement information is found. In § 2.6 these results are revisited from the more general perspective of the theory of random processes. Finally, in § 2.7 the probability of anomalous localization errors in the presence of additive Gaussian noise and multiple outside objects is estimated. Moreover, the problem of designing devices for localization with minimal probability of anomalous errors in the presence of cluttered background is formulated.
Solution of this problem is discussed in § 3. In § 3.1 the problem obtains a mathematical formulation. In § 3.2 the optimal filter for localization of an exactly known object with a minimal rate of anomalous errors is presented. In § 3.3, this result is extended to the case of an inexactly known object. In § 3.4 the result is further extended to the case of an inhomogeneous localization criterion wherein the importance of the errors is not uniform over the picture area, and in § 3.5 to localization in blurred pictures. In § 3.6, the results are extended to localization in multi-component pictures with a cluttered background. In § 3.7, the discrimination capability of some correlators that are now popular in optical pattern recognition is analyzed and compared with that of an optimal correlator. An explanation is given for the importance of contours and edges in pattern discrimination. Finally, in § 3.8 a strategy for selection of the most reliable reference objects is suggested and discussed on the basis of the theory presented.
§ 2. The Accuracy and Reliability of the Localization of Two-dimensional Objects on a Plane
2.1. LOCALIZATION OF A SINGLE OBJECT IN THE PRESENCE OF ADDITIVE WHITE GAUSSIAN NOISE: OPTIMAL LOCALIZATION DEVICE AND TWO TYPES OF LOCALIZATION ERRORS
Let us first consider the simplest discrete observation model, in which samples $\{b_k\}$ of the observed signal can be regarded as a sum of samples $\{a_k(x_0, y_0)\}$ of the signal from a given object having unknown coordinates $(x_0, y_0)$ and samples $\{n_k\}$ of the interference, or noise,

$$b_k = a_k(x_0, y_0) + n_k. \tag{2.1}$$
Now assume that noise samples are statistically independent of the signal $\{a_k(x_0, y_0)\}$, are non-correlated, and have a Gaussian probability distribution with zero mean and variance $\sigma^2$. This model describes the simplest situation where the only disturbance that interferes with object localization is the noise of the signal sensor. Thermal noise serves as an example, and can usually be regarded as additive, Gaussian, signal-independent and non-correlated. A typical practical task to which such a model corresponds is, e.g., localization of constellations in stellar navigation. Because of random noise, the problem of optimal localization should be treated statistically. We shall seek a way to obtain the statistically best
estimation of the coordinates $(x_0, y_0)$ of the object given samples of the observed signal $\{b_k\}$. The statistically best estimation is known to be the estimation by the maximum a posteriori probability of $(x_0, y_0)$, or its equivalent, $\{a_k(x_0, y_0)\}$ (see, e.g., Wozencraft and Jacobs [1965]). The a posteriori probability of $\{a_k(x_0, y_0)\}$ given $\{b_k\}$ may be found from the Bayes rule in probability theory as

$$P(\{a_k(x_0, y_0)\}/\{b_k\}) = \frac{P(\{a_k(x_0, y_0)\})\, P(\{b_k\}/\{a_k(x_0, y_0)\})}{P(\{b_k\})}, \tag{2.2}$$

where $P(\{\cdot\})$ are the corresponding a priori probabilities and $P(\{\cdot\}/\{\cdot\})$ are the corresponding conditional probabilities. It is evident that for the model of eq. (2.1)

$$P(\{b_k\}/\{a_k(x_0, y_0)\}) = P(\{n_k = b_k - a_k(x_0, y_0)\}), \tag{2.3}$$

and $P(\{b_k\})$ does not depend on $(x_0, y_0)$. Therefore, the optimal estimation of the object coordinates will be

$$(\hat{x}_0, \hat{y}_0) = \arg\max_{(x_0, y_0)} \left\{ P(\{a_k(x_0, y_0)\})\, P(\{n_k = b_k - a_k(x_0, y_0)\}) \right\}. \tag{2.4}$$
Such an estimation is called a maximum a posteriori probability (MAP) estimation. It requires a knowledge of the a priori probabilities of the object coordinates. If the estimation is made without regard to an a priori distribution, or, equivalently, made using the assumption of a uniform a priori distribution, it is called a maximum likelihood (ML) estimation. Let us now write an explicit expression for $P(\{n_k = b_k - a_k(x_0, y_0)\})$. Since by our assumption the samples $\{n_k\}$ are non-correlated Gaussian numbers with variance $\sigma^2$,

(2.5)

where $C$ is an unimportant normalization factor and $K$ is the total number of signal and noise samples. Substituting eq. (2.5) into eq. (2.4) and taking into account that $\exp(\cdot)$ is a monotonic function and that $\sum_{k=0}^{K-1} |b_k|^2$ and $\sum_{k=0}^{K-1} |a_k(x_0, y_0)|^2$ do not depend on the coordinates $(x_0, y_0)$, we obtain

$$(\hat{x}_0, \hat{y}_0) = \arg\max_{(x_0, y_0)} \left\{ \sum_{k=0}^{K-1} b_k a_k(x_0, y_0) + 2\sigma^2 \ln P(x_0, y_0) \right\}. \tag{2.6}$$
According to the theory of discrete representation of signal transforms (see, e.g., Yaroslavsky [1985]),

$$\sum_{k=0}^{K-1} b_k a_k(x_0, y_0) = 4F_xF_y \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} b(x, y)\, a(x - x_0, y - y_0)\, dx\, dy, \tag{2.7}$$

where $b(x, y)$ and $a(x - x_0, y - y_0)$ are the continuous signals that correspond to the sets of samples $\{b_k\}$ and $\{a_k(x_0, y_0)\}$, and $(2F_x, 2F_y)$ are the frequency bandwidths which correspond to the chosen sampling rate for the discrete model of eq. (2.1). Then for continuous signals we obtain for MAP estimation

$$(\hat{x}_0, \hat{y}_0) = \arg\max_{(x_0, y_0)} \left\{ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} b(x, y)\, a(x - x_0, y - y_0)\, dx\, dy + 2N_0 \ln P(x_0, y_0) \right\}, \tag{2.8}$$

where $N_0 = \sigma^2/4F_xF_y$ is the spectral density of the noise, and for ML estimation we obtain

$$(\hat{x}_0, \hat{y}_0) = \arg\max_{(x_0, y_0)} \left\{ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} b(x, y)\, a(x - x_0, y - y_0)\, dx\, dy \right\}. \tag{2.9}$$
Thus, the optimal ML-estimator should calculate the mutual correlation function between the object signal a(x - xo, y - yo) and the observed signal b(x, y) and take the coordinates of the maximum of the correlation pattern as a coordinate estimation. The optimal MAP-estimator also consists of a correlator and a decision making device that locates the maximum in the correlation pattern (fig. 1). The only difference is that the correlation pattern should be biased by the appropriately normalized pattern of the logarithm of the a priori probability distribution of the coordinates of the object. The operation of correlation of the mixture of signal and noise with the copy of the signal is often called matched filtering. The filter which carries
Fig. 1. Block diagram of optimal localization device for localization of objects observed in the presence of additive white Gaussian noise.
out this operation is appropriately called a matched filter. The frequency response of the matched filter, according to the properties of the Fourier transform, is evidently $\alpha^*(f_x, f_y)$, the complex conjugate of the object signal spectrum. A correlation, or matched filter, type of localization device may be implemented easily by optical and holographic means. As was already mentioned, this was recognized at the very early stages of holography (Van der Lugt [1964]). Let us determine the performance characteristics of the optimal ML coordinate estimator for the situation under consideration. In this analysis, we should distinguish between two essentially different types of possible errors of estimation (Yaroslavsky [1972]): (i) small errors of measurements due to distortion of the object shape by the noise when coordinate estimations lie in the vicinity of their actual values, and (ii) large errors due to false localization of the object very far from its actual location due to possible large noise outbursts outside the object. These large errors are similar to the so-called “false alarm” errors in signal detection, or false recognition errors in object recognition. Following the terminology of Kotelnikov [1956], we shall refer to the first type of errors as normal errors because, as we shall see later, their distribution density is very close to a Gaussian one. Errors of the second type we shall call anomalous errors. Normal errors characterize the accuracy of measurements while anomalous errors characterize the measurement reliability.
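The ML estimator described above (correlate the observed picture with a copy of the object, then take the position of the correlation maximum) can be illustrated digitally with a short sketch. It is a minimal example under stated assumptions, not the optical implementation discussed in the text: the function names are hypothetical, the correlation is computed circularly with the FFT, and the target is assumed to be anchored at its top-left sample.

```python
import numpy as np

def correlate(picture, target):
    """Circular cross-correlation of the picture with the target via FFT.

    out[y0, x0] = sum over (y, x) of picture[y, x] * target[y - y0, x - x0],
    with the target zero-padded to the picture size.
    """
    spectrum = np.fft.fft2(picture) * np.conj(np.fft.fft2(target, s=picture.shape))
    return np.real(np.fft.ifft2(spectrum))

def ml_localize(picture, target):
    """ML coordinate estimate: position of the correlation maximum."""
    corr = correlate(picture, target)
    y0, x0 = np.unravel_index(np.argmax(corr), corr.shape)
    return int(y0), int(x0)

def make_picture(target, position, shape, sigma, rng):
    """Observation model: object at `position` plus white Gaussian noise."""
    picture = rng.normal(0.0, sigma, size=shape)
    y0, x0 = position
    h, w = target.shape
    picture[y0:y0 + h, x0:x0 + w] += target
    return picture
```

Multiplying the picture spectrum by the complex conjugate of the target spectrum is the digital counterpart of the matched filter. With small noise the maximum stays at, or next to, the true position (normal errors); as `sigma` grows, the maximum can occasionally jump to a noise outburst far from the object, which is the anomalous error discussed in the text.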
2.2. LOCALIZATION OF A SINGLE OBJECT IN THE PRESENCE OF ADDITIVE WHITE GAUSSIAN NOISE: POTENTIAL ACCURACY OF COORDINATE MEASUREMENTS
Statistical characteristics of normal errors were found by Yaroslavsky [1972, 1992a] from an analysis of the correlator output signal

(2.10a)

where

(2.10b)

$R_a(x, y)$ is the auto-correlation function of the object signal,

$$R_a(x, y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} a(\xi, \eta)\, a(\xi - x, \eta - y)\, d\xi\, d\eta, \tag{2.11a}$$

and

$$R_n(x, y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} n(\xi, \eta)\, a(\xi - x, \eta - y)\, d\xi\, d\eta \tag{2.11b}$$
is a random Gaussian field. It follows from eq. (2.11b) that its correlation function is equal to $N_0 R_a(x, y)$. In the case of normal errors, the maximum of the signal $R(x, y)$ at the correlator output occurs in the close vicinity of the coordinates $(x_0, y_0)$ of a maximum of $R_a(x, y)$. The location of the maximum of $R(x, y)$ can be found from the following system of equations

$$\frac{\partial R(x, y)}{\partial x} = 0; \qquad \frac{\partial R(x, y)}{\partial y} = 0. \tag{2.12}$$
Let the solution of this system be

$$x = x_0 + n_x; \qquad y = y_0 + n_y; \tag{2.13}$$

and assume that the errors $n_x$ and $n_y$ are small. Then after some computations one can obtain for $n_x$ and $n_y$ the following relationships

$$n_x = \frac{D_y}{D_x D_y - D_{xy}^2}\,\nu_x - \frac{D_{xy}}{D_x D_y - D_{xy}^2}\,\nu_y; \tag{2.14a}$$

$$n_y = \frac{D_x}{D_x D_y - D_{xy}^2}\,\nu_y - \frac{D_{xy}}{D_x D_y - D_{xy}^2}\,\nu_x, \tag{2.14b}$$

with

(2.14c)

(2.14d)

(2.14e)

where $E_a$ is the object signal energy given by

$$E_a = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |\alpha(f_x, f_y)|^2\, df_x\, df_y; \tag{2.15}$$

$\alpha(f_x, f_y)$ is the Fourier spectrum of the object signal and is given by

$$\alpha(f_x, f_y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} a(x, y) \exp\{-i2\pi(f_x x + f_y y)\}\, dx\, dy; \tag{2.16}$$

$\overline{f_x^2}$, $\overline{f_y^2}$, and $\overline{f_{xy}^2}$ are the inertial moments of the object signal power spectrum along the corresponding axes,

(2.17a)

(2.17b)

and

(2.17c)

(2.17d)

These relationships hold because the derivatives of the Gaussian random process $R_n(x, y)$, with correlation function $N_0 R_a(x, y)$, are also Gaussian random processes with zero mean and with variances $4\pi^2 N_0 E_a \overline{f_x^2}$ and $4\pi^2 N_0 E_a \overline{f_y^2}$, respectively. Equations (2.14a)-(2.14e) imply that the small errors $\{n_x, n_y\}$ in the determination of the object coordinates by the ML-estimator have a Gaussian
distribution with zero mean and variances given by

(2.18a)

(2.18b)

where

(2.18c)

This means that the variances of the normal errors are determined by the signal-to-noise ratio $E_a/N_0$ and the inertial moments (eqs. (2.17a)-(2.17d)) of the object signal power spectrum. These last are the only characteristics of the object shape that affect its potential localization accuracy. If the object signal power spectrum is symmetrical in relation to the coordinate axes,

$$|\alpha(f_x, f_y)|^2 = |\alpha(f_x, -f_y)|^2 = |\alpha(-f_x, f_y)|^2, \tag{2.19}$$

eqs. (2.18a, b) take simpler forms:

(2.20a)

$$\sigma_{xy}^2 = 0. \tag{2.20b}$$
The last equation means that if the signal power spectrum is axisymmetrical, normal localization errors along the coordinates x and y are non-correlated. This situation arises if the object spectrum $\alpha(f_x, f_y)$, or equivalently, the object signal $a(x, y)$, is a separable function of the coordinates. Equation (2.20a) coincides with the classical relationship for one-dimensional signals (e.g., Wozencraft and Jacobs [1965]). Sometimes it is more convenient to express the variances of the normal errors in terms of the variance of the noise at the matched filter output rather than in terms of the input noise spectral density $N_0$. Since this variance is evidently equal to

(2.21)

eqs. (2.18a, b) may be rewritten as

(2.22a)

(2.22b)
2.3. LOCALIZATION OF A SINGLE OBJECT IN THE PRESENCE OF ADDITIVE GAUSSIAN NOISE: MEASUREMENT ACCURACY FOR NON-OPTIMAL ESTIMATOR; LOCALIZATION IN NON-WHITE NOISE
Implementation of the correlator or matched filter requires exact knowledge of the shape of the object under study. In practice, however, the object shape is often not known with high accuracy, or it must be approximated because of limitations of the implementation. Therefore, it is of interest to estimate the losses in localization accuracy due to deviations of the real filter in the localization device from the matched filter. We shall explore these losses for the one-dimensional or separable spectrum case. Let the frequency response of the filter in the localization device be $H(f_x)$ instead of $\alpha^*(f_x)$, the frequency response of the matched filter. The value of the signal at the output of such a filter will be
$$b(x) = \int_{-\infty}^{\infty} \alpha(f_x) H(f_x) \exp\{-i 2\pi f_x (x - x_0)\}\, df_x + R_{nh}(x), \qquad (2.23)$$
where $R_{nh}(x)$ is the result of filtering the white noise component of the observed signal by the filter $H(f_x)$. The location $(x_0 + n_x)$ of the maximum of this signal in the close vicinity of the point $x_0$ ($n_x$ is small) is defined by
$$\left.\frac{\partial}{\partial x} b(x)\right|_{x = x_0 + n_x} = 0. \qquad (2.24)$$
Since $n_x$ is assumed to be small and $x_0$ is the point of the actual location of the object where the filter response is maximal, we obtain (2.25) where as before $\nu_x = -(\partial/\partial x) R_{nh}(x)$. The power spectrum of the random process $\nu_x$ is evidently $4\pi^2 f_x^2 N_0 |H(f_x)|^2$. Hence, the variance $\sigma_x^2$ of the small
(2.26)
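The third factor in this formula is nothing but a loss factor (LSFR) showing how much the normal error variance for a non-optimal filter exceeds that of an optimal (matched) one,
$$\mathrm{LSFR} = \frac{\int_{-\infty}^{\infty} f_x^2 |H(f_x)|^2\, df_x \int_{-\infty}^{\infty} f_x^2 |\alpha(f_x)|^2\, df_x}{\left( \int_{-\infty}^{\infty} f_x^2\, \alpha(f_x) H(f_x)\, df_x \right)^2}. \qquad (2.27)$$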
According to the Schwarz inequality, LSFR $\ge$ 1. LSFR reaches its minimal value of one if the filter is matched to the object signal; i.e., if $H(f_x) = \alpha^*(f_x)$. To estimate the order of magnitude of the loss factor, we shall present the results of calculation of the LSFR for an object signal of Gaussian shape with spectrum $\alpha(f_x) = \exp(-f_x^2/2f_a^2)$, and for a filter pulse response of a similar shape but of different spread, such that $H(f_x) = \exp(-f_x^2/2f_h^2)$. For such a case, we obtain (2.28)
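The order of magnitude of the loss factor can be checked numerically. The sketch below evaluates the three integrals of the loss factor, taken here as the product of the $f_x^2$-weighted energies of $H$ and $\alpha$ divided by the squared $f_x^2$-weighted integral of $\alpha H$, for the Gaussian pair above, and compares the result with the closed form $[(r + 1/r)/2]^3$, $r = f_a/f_h$, which follows from direct integration of these Gaussians. The spread names `f_a` and `f_h`, the quadrature routine and the integration span are assumptions made for this illustration only.

```python
import math


def integrate(func, lo, hi, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h


def lsfr_numeric(f_a, f_h):
    """Loss factor for Gaussian alpha (spread f_a) and Gaussian H (spread f_h)."""
    def alpha(f):
        return math.exp(-f * f / (2.0 * f_a ** 2))

    def resp(f):
        return math.exp(-f * f / (2.0 * f_h ** 2))

    span = 12.0 * max(f_a, f_h)  # the integrands are negligible beyond this
    energy_h = integrate(lambda f: f * f * resp(f) ** 2, -span, span)
    energy_a = integrate(lambda f: f * f * alpha(f) ** 2, -span, span)
    cross = integrate(lambda f: f * f * alpha(f) * resp(f), -span, span)
    return energy_h * energy_a / cross ** 2


def lsfr_closed_form(ratio):
    """Closed form for the Gaussian pair; ratio = f_a / f_h."""
    return (0.5 * (ratio + 1.0 / ratio)) ** 3


if __name__ == "__main__":
    for r in (1.0, 1.5, 2.0):
        print(r, lsfr_numeric(r, 1.0), lsfr_closed_form(r))
```

For a spread ratio of 1.5 both routines give a loss factor of about 1.27, and for a ratio of 1 the loss factor is one, as expected for a matched filter.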
A graph of LSFR versus the ratio of the object spread to that of the filter pulse response is shown in fig. 2. Figure 2 demonstrates that the losses become noticeable if this ratio exceeds a value of about 1.5; i.e., when the loss factor is approximately equal to 1.27. This means that if the filter in the localization device is not grossly mismatched to the object under localization, the localization accuracy will remain close to its upper limit.
Let us now suppose that the noise is non-white; i.e., its power spectrum is not uniform. Denote the noise power spectrum in this case by $N_0 |H_n(f_x, f_y)|^2$. Evidently, this situation can be reduced to the previous one of white noise if we route the observed signal plus noise mixture through a so-called “whitening filter” with a frequency response of $1/H_n(f_x, f_y)$ (Kotelnikov [1956]). At the output of such a whitening filter, the noise power spectrum becomes uniform with a spectral density $N_0$, while the object signal spectrum becomes equal to $\alpha(f_x, f_y)/|H_n(f_x, f_y)|$. In this case, an ML-optimal
Fig. 2. Comparison of localization accuracy for optimal and non-optimal filters. (Horizontal axis: ratio of the filter bandwidths.)
localization device will consist of the whitening filter, followed by a filter matched to the object signal at the output of the whitening filter, and a device for localization of the signal maximum. The whitening and matched filters may be combined into one optimal filter whose frequency response $H_{opt}(f_x, f_y)$ will be (see Van der Lugt [1964]) (2.29) We thus arrive at the optimal estimator shown in fig. 3. It is obvious that the potential localization accuracy for such an estimator will be defined by the same formulae (2.18a)-(2.18c) as for localization in
Fig. 3. Block diagram of the optimal localization device for objects observed against additive colored Gaussian noise.
white noise, (2.30a) (2.30b) (2.30c) where (2.31a) (2.31b) (2.31c)
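The combination of the whitening and matched filters described in this section can be sketched on sampled frequency responses. The example below assumes, for illustration only, that the whitening filter is $1/H_n$ and that the matched filter is the complex conjugate of the whitened object spectrum; under these assumptions the cascade equals $\alpha^*/|H_n|^2$ at every frequency sample. The function names and the test spectra are hypothetical.

```python
import cmath


def whitening_response(noise_shaping):
    """Frequency response 1/H_n of the whitening filter, sample by sample."""
    return [1.0 / h for h in noise_shaping]


def matched_response(spectrum):
    """Matched filter: complex conjugate of the signal spectrum."""
    return [s.conjugate() for s in spectrum]


def combined_response(object_spectrum, noise_shaping):
    """Single filter alpha* / |H_n|^2 merging whitening and matched filtering."""
    return [a.conjugate() / abs(h) ** 2
            for a, h in zip(object_spectrum, noise_shaping)]


def cascade_response(object_spectrum, noise_shaping):
    """Whitening filter followed by a filter matched to the whitened object."""
    whitening = whitening_response(noise_shaping)
    whitened_object = [a * w for a, w in zip(object_spectrum, whitening)]
    matched = matched_response(whitened_object)
    return [w * m for w, m in zip(whitening, matched)]


# Hypothetical sampled spectra, used only to exercise the functions.
object_spectrum = [cmath.exp(-0.1 * k) * cmath.exp(0.3j * k) for k in range(8)]
noise_shaping = [1.0 + 0.5 * k + 0.2j * k for k in range(8)]

cascade = cascade_response(object_spectrum, noise_shaping)
combined = combined_response(object_spectrum, noise_shaping)
```

The two lists agree to rounding error, which is the sense in which the two filters may be merged into one.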
2.4. OPTIMAL LOCALIZATION IN COLOR PICTURES
The results discussed above for optimal localization in monochrome pictures were extended by Yaroslavsky [1992b] to the case of localization in color or, more generally, multicomponent pictures observed with additive Gaussian noise. Let $\{a_{k,m}(x_0, y_0)\}$ be the mth component of the signal of the object to be localized, m = 1, 2, ..., M, with M the number of components (for color pictures, M = 3 with red, blue and green representing the three components), and let $\{n_{k,m}\}$ and $\{b_{k,m}\}$ be the samples of the corresponding components of the additive Gaussian noise and the observed signal, so that
$$b_{k,m} = a_{k,m}(x_0, y_0) + n_{k,m}. \qquad (2.32)$$
Further assume that the components of the additive noise in each of the M channels, as well as the samples of the same noise components, are
statistically independent. This assumption implies directly that the a posteriori probability of the coordinates (xo, yo), given the observed signal {bk,m}, is
(2.33) where $\sigma_m^2$ denotes the variance of the mth component of the noise. Therefore, the optimal MAP- and ML-estimations are
(2.34a) and (2.34b) For continuous signals, the MAP-estimation is
$\times\, a_m(x - x_0, y - y_0)\, dx\, dy + 2 \ln P(x_0, y_0)\}\}.$ (2.35)
where $N_{0,m} = \sigma_m^2/4F_xF_y$ is the spectral density of the mth component of the noise, and the ML-estimation is
This means that the optimal localization device should consist of M parallel correlators (or matched filters), one for each component of the signal, an adder for weighted summation of the outputs of the correlators, and a unit for the determination of the coordinates of the signal maximum at the output of the adder (fig. 4). The variances of the normal errors that characterize the accuracy of
Fig. 4. Block diagram of the optimal device for localization of objects in multicomponent pictures with additive white noise in each channel.
optimal localization can be found with the same technique as for the single-component signal, (2.37a) (2.37b) with
(2.37e)
(2.37f)
(2.37g)
(2.37h)
(2.37i) (2.37j) In the case of a symmetrical signal, i.e.,
= 0,
(2.38a) (2.38b)
$$\sigma_{xy,M}^2 = 0. \qquad (2.38c)$$
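The structure of the multicomponent localization device (parallel correlators, a weighted adder and a maximum detector) can be sketched for sampled one-dimensional signals as follows. The weights are taken here as the reciprocals of the channel noise variances, which is what independent Gaussian noise in the channels suggests; this choice, the circular boundary handling, the function names and the test data are all assumptions of the sketch.

```python
def circular_correlation(observed, template):
    """Correlate a template with an observed signal for every circular shift."""
    n = len(observed)
    return [sum(t * observed[(shift + k) % n] for k, t in enumerate(template))
            for shift in range(n)]


def localize_multicomponent(observed_channels, template_channels, noise_variances):
    """Sum the per-channel correlator outputs, weighted by 1/variance,
    and return the shift at which the sum is maximal."""
    n = len(observed_channels[0])
    total = [0.0] * n
    for observed, template, variance in zip(
            observed_channels, template_channels, noise_variances):
        for shift, value in enumerate(circular_correlation(observed, template)):
            total[shift] += value / variance
    return max(range(n), key=lambda shift: total[shift])


def place(template, position, length):
    """Put a template into an otherwise empty signal at the given position."""
    signal = [0.0] * length
    for k, t in enumerate(template):
        signal[(position + k) % length] += t
    return signal


# Hypothetical three-component object placed at position 17, noise-free.
templates = [[1.0, 3.0, 2.0], [2.0, 1.0, 0.5], [0.5, 2.5, 1.0]]
variances = [1.0, 4.0, 0.25]
observed = [place(t, 17, 40) for t in templates]
estimate = localize_multicomponent(observed, templates, variances)
```

In this noise-free case the weighted sum has its maximum at the true position, so `estimate` equals 17. Adding noise samples to `observed` turns the example into a small simulation of the device.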
2.5. LOCALIZATION OF AN OBJECT IN THE PRESENCE OF ADDITIVE GAUSSIAN NOISE: RELIABILITY OF COORDINATE MEASUREMENTS
Let us now proceed to characterization of anomalous localization errors. By our definition, anomalous errors occur when the localization device incorrectly locates the object somewhere outside of the area occupied by the object signal at the matched filter output. This takes place if random noise outbursts in the matched filter output exceed the signal value in the point of actual location of the object. Since noise in the output of the matched filter results from filtering the white Gaussian input noise, it is spatially homogeneous. Hence, large noise outbursts and consequent anomalous errors are uniformly distributed over the area where the object is supposed
to be located. This means that we can characterize anomalous errors simply by their rate of occurrence (i.e., probability). The probability of anomalous errors was estimated by Yaroslavsky [1972], who used the following reasoning. Let $\Delta S$ be the area of correlation of the noise at the output of the matched filter, in the sense that it is the area over which the correlation between noise values becomes negligibly small. Since the correlation function of the noise at the output of the matched filter coincides with that of the object signal, $\Delta S$ is of the same order of magnitude as the area occupied by the signal at the output of the matched filter. Therefore, in the area of the search, S, there are of the order of $Q = S/\Delta S$ uncorrelated samples of Gaussian noise with variance $\sigma_n^2 = N_0 E_a$. The probability $P_a$ of anomalous errors is the complement of the probability that none of the (Q - 1) uncorrelated samples of Gaussian noise outside of the object location exceeds the signal value at the point of actual location of the object, which according to the notation of § 1.1 is equal to $R_a(0, 0) + R_n$, where $R_n$ is a Gaussian zero-mean random value with variance $\sigma_n^2$ and (2.39a) In this way, we obtain (2.39b) with
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp(-\tfrac{1}{2} n^2)\, dn. \qquad (2.39c)$$
This formula is, evidently, valid for multicomponent pictures as well, with the corresponding quantities defined as in § 2.4. In communication theory, eq. (2.39b) is known as Kotelnikov's integral and is used to determine the probability of errors in a communication channel with M orthogonal signals and additive white Gaussian noise (see, e.g., Kotelnikov [1956], Wozencraft and Jacobs [1965]). It is illustrated in fig. 5.
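The reasoning above can be turned into a small numerical sketch. It assumes that the signal value at the object location equals $E_a$ and that the noise samples have variance $N_0 E_a$, so that after normalization the probability of anomalous errors takes the form $P_a = 1 - \int \varphi(n)\, [\Phi(n + \sqrt{E_a/N_0})]^{Q-1}\, dn$, with $\varphi$ the standard normal density. This form is an assumption consistent with the verbal description, not a quotation of the lost equation; the integration limits and step count are likewise illustrative choices.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def anomalous_error_probability(snr, q, half_width=10.0, steps=4000):
    """P_a for signal-to-noise ratio snr = E_a/N_0 and q uncorrelated samples.

    Trapezoidal quadrature of phi(n) * Phi(n + sqrt(snr))**(q - 1) over
    [-half_width, half_width]; the complement of the result is returned.
    """
    shift = math.sqrt(snr)
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        n = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        density = math.exp(-0.5 * n * n) / math.sqrt(2.0 * math.pi)
        total += weight * density * std_normal_cdf(n + shift) ** (q - 1)
    return 1.0 - total * h


if __name__ == "__main__":
    for snr in (1.0, 5.0, 10.0, 20.0, 40.0):
        print(snr, anomalous_error_probability(snr, 1000))
```

With zero signal energy all Q samples are statistically identical, so the function returns 1 - 1/Q up to quadrature error. Increasing `snr` for a fixed `q` lowers the returned probability.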
Fig. 5. Probability of anomalous errors as a function of the object signal energy E, the spectral density of noise N and the ratio M of the picture area to the object area. (Horizontal axis: 10 log(E/2N ln M).)
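A remarkable feature of Kotelnikov’s integral is its threshold behaviour for large Q:
$$\lim_{Q \to \infty} P_a = \begin{cases} 1, & \text{if } E_a/N_0 < 2 \ln Q; \\ 0, & \text{if } E_a/N_0 > 2 \ln Q. \end{cases} \qquad (2.40)$$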
This feature implies that if the field of search is large enough in comparison to the size of the object under localization, the probability of anomalous errors may become very high. To minimize this probability, the signal-to-noise ratio should be increased as the field of search increases. A second important implication of eq. (2.40) is that for a given noise intensity there exists a trade-off between the accuracy of localization (defined by the variance of normal errors) and localization reliability (described by the probability of anomalous errors). Increases in accuracy which are achieved by widening the object signal spectrum, with the signal energy being fixed, are accompanied by an increasing probability of anomalous errors. This result arises because widening the spectrum is equivalent to narrowing the signal and consequently increasing the ratio Q of the area of search to the signal extent.
It should also be noted that eq. (2.40) has a general, fundamental meaning. Note that $\log_2 Q$ is the entropy of the results of our measurements. Therefore, eq. (2.40) gives an absolute lower bound for object signal energy per unit of measurement information in the presence of white Gaussian noise,
$$E_a/\log_2 Q > N_0 \ln 2. \qquad (2.41)$$
2.6. LOCALIZATION RELIABILITY IN THE PRESENCE OF ADDITIVE WHITE GAUSSIAN NOISE: MORE ACCURATE ESTIMATION AND APPROXIMATION OF THE LOCALIZATION ERROR DISTRIBUTION DENSITY
As stated in §§ 2.1, 2.2 and 2.5, two characteristic sections may be distinguished in the probability density of the localization error. The section of small errors forms the main mode of the density and has an approximately Gaussian shape; it is accompanied by almost uniform tails which stretch to the borders of the field of search. The dispersion of the main, Gaussian, mode was estimated in § 2.2, and the total volume of the tail section (i.e., the probability of anomalous localization errors) was estimated in § 2.5. In this section we shall further substantiate these estimates using the methods of the theory of Gaussian random processes. These methods were developed in the early forties by Rice [1944, 1945]. We shall also summarize the results presented earlier by formulating a unified expression for the localization error distribution density. For simplicity we consider the one-dimensional case (Yaroslavsky [1970]):
$$b(x - x_0) = a(x - x_0) + n(x). \qquad (2.42)$$
Let $\hat{x}_0$ be the estimate of the object coordinate $x_0$, obtained as the coordinate of the highest maximum of the signal at the output of the matched filter. The probability density $p(\hat{x}_0)$ of $\hat{x}_0$ may be found from the probability density $p(R, \hat{x}_0)$ of the event that at the point $\hat{x}_0$ the random process
$$R(x) = R_a(x - x_0) + R_n(x) \qquad (2.43)$$
at the output of the matched filter has a local maximum with a value R, and from the conditional probability $P(A/R, \hat{x}_0)$ that the process values in all other local maxima over the interval of search, say X, do not exceed R:
$$p(\hat{x}_0) = \int_{-\infty}^{\infty} p(R, \hat{x}_0)\, P(A/R, \hat{x}_0)\, dR. \qquad (2.44)$$
The probability density p ( R , 2,) may be computed as p(R, 2,) dx = -
Sr
p ( R ( i 0 )= R; R”(2o);R ’ ( i 0 )= 0) dR, dR,,
,
p(R(20)= R; R“(2,);R ‘ ( i 0 )= 0) dR“ dR’, (2.45a)
where (2.45b) Since n(x) is supposed to be a Gaussian process it follows that (Rice [1944]) 3/2
P(R,Rx, R x x ) = (2nEa)
7 7 72 -(fx)
{ f x Cfx
11112
(2.46) where
E is the fourth moment of the object signal power spectrum, (2.47)
After substitution of eq. (2.46) into eq. (2.45a) and integration, we obtain
x {4n2E[R - R,(2,)] - R ; ( i , ) )
(2.48a)
where 2 - 3 -2 16 - f x / f x .
(2.48b)
For relatively small localization errors, when (2.49) the following approximations can be used for Ra(g0)and Rk(a0):
After some transformations, we then obtain the formula
with (2.51b) (2.51c) (2.51d) If the signal-to-noise ratio is large enough; i.e., $(E_a/N_0) \gg 1$, then $p(R, \hat{x}_0)$ can be approximated by a very simple expression,
(2.52) Exact calculation of the conditional probability $P(A/R, \hat{x}_0)$ in the general case is a difficult problem, but a reasonably good approximation can be obtained for high signal-to-noise ratios, $(E_a/N_0) \gg 1$. In this case, the value
R of the local maximum at the point $\hat{x}_0$ close to $x_0$ is much higher than the noise variance, and the distribution of the number of local maxima above the level R tends to the Poisson distribution (Cramer and Leadbetter [1967]). In this limiting case, if we neglect the probability of more than one local maximum in the area occupied by the object, we find that
$$P(A/R, \hat{x}_0) \cong \exp[-Q_R (X - \Delta x)], \qquad (2.53)$$
where X is the area of search, $\Delta x$ is the area occupied by the object, and $Q_R$ is the mean number, per unit length, of maxima of the process n(x) exceeding the level R. According to Rice [1944]
For high signal-to-noise ratios, $(E_a/N_0) \gg 1$,
(2.55a) and (2.55b) Therefore, we have in this case
(2.56a) where we denote
0-
1= ( X - A x )
J (z).
(2.56b)
By substituting eqs. (2.52) and (2.56a) into eq. (2.44), we obtain after changing
variables according to eq. (2.51d):
(2.57) Equation (2.57) has a simple physical meaning. First of all, it confirms the conclusion reached in § 2.2, which was that small localization errors are normally distributed with the variance as defined by eqs. (2.18a, b). It also specifies, by eq. (2.49), which errors can be regarded as small. Furthermore, the integral in the formula has the meaning of a complement to the integral of eq. (2.39b) for the estimation of the anomalous localization errors. The factor Q" in eq. (2.56a), defined by eq. (2.56b), represents a more substantiated estimation of the parameter Q which was introduced qualitatively in § 2.5 as the ratio of the area of search to the area occupied by the object. These results can be summarized by the following approximation formula for the distribution density of the localization errors $n_x$ and $n_y$,
(2.58)
which unifies the characterization of the normal (first term in the sum) and anomalous (second term) error components. Recall from § 2.5 that S is the area of the field of search and $\Delta S$ is the area occupied by the object.
2.7. LOCALIZATION RELIABILITY IN THE PRESENCE OF ADDITIVE WHITE GAUSSIAN NOISE AND MULTIPLE OUTSIDE OBJECTS
Let us now suppose that the observed picture signal b(x, y), containing the object signal $a(x - x_0, y - y_0)$ to be located, also contains additive white Gaussian noise n(x, y) and some number Q of outside object signals
$\{a_s(x, y)\}$ which do not overlap each other or the given object (Yaroslavsky [1972]),
A prominent illustration of such a situation is the task of automatic detection of a specific character on a page of printed text. Since the signals of the outside objects are supposed not to overlap the signal of the object under search, it is clear that the filter which ensures the highest localization accuracy; i.e., the lowest normal error variance, will, in this case, be the matched one. The variance of normal errors will be defined by the same formulae (eqs. (2.18a, b)) as in the absence of outside objects. The presence of outside objects affects only the probability of anomalous errors. To estimate the probability of anomalous errors, let us introduce some quantitative characteristics of outside objects. Let P(Q) be the probability of appearance of exactly Q outside objects in the area of search. Let the outside objects form some t classes according to the maximal value $R_{0,q}$ of the cross-correlation function between the given object signal and the signal of the outside object of class q. Let $P(Q_1, Q_2, ..., Q_t)$ be the joint probability of appearance of $Q_1$ objects of the first class, $Q_2$ objects of the second class, etc., given that the total number of outside objects is Q. Evidently, in the presence of outside objects, anomalous errors will occur mainly due to false identification of the given object with one of the outside objects. Then the probability of anomalous errors may be found approximately as the complement of the probability that all signal values at the output of the matched filter at the points of location of the outside objects do not exceed the signal value at the location of the given object. Since the outside objects do not overlap each other or the located object, these values can be regarded as statistically independent. Therefore, by analogy with eq. (2.39b),
(2.60) where $AV_n$ denotes averaging over the Gaussian variable n with zero mean and unit variance. Since the probability distribution of the appearances of
$Q_1, Q_2, ..., Q_t$ from Q is a multinomial one described by (2.61) we obtain
x {l-[,jP.m(E~+n)~}dn}.
(2.62)
Thus for estimating the probability of anomalous errors it is necessary to know: (i) the variance $\sigma_n^2 = N_0 E_a$ of the additive noise at the matched filter output, (ii) the differences between the values of the auto- and cross-correlation peaks, (iii) the probabilities of the outside object for each class, and (iv) the probability distribution of the total number Q of outside objects in the area of search. Some more transparent formulae and practical recommendations can be obtained by considering the particular case in which the probability distribution P(Q) is concentrated around some point $Q_0$ and t = 1. In this case,
This expression coincides with eq. (2.39b) for the probability of anomalous errors, the presence of outside objects being the only difference: the signal-to-noise ratio is now reduced by the cross-correlation peaks of the outside objects. Naturally, the probability of anomalous errors in the case of multiple outside objects also features threshold behavior,
$$\lim_{Q_0 \to \infty} P_a = \begin{cases} 1, & \text{when } \dfrac{E_a - R_{0,1}}{\sqrt{N_0 E_a}} < \sqrt{2 \ln Q_0}; \\ 0, & \text{otherwise.} \end{cases} \qquad (2.64)$$
For instance, for $Q_0$ = 16 to 2048 (note that the number of characters on a standard printed page is about 2000) and $(E_a - R_{0,1})/\sqrt{N_0 E_a} \ge 4.5$, the quantity $P_a$ is less than but it increases rapidly with decreasing signal-to-noise ratio $(E_a - R_{0,1})/\sqrt{N_0 E_a}$. The probability of anomalous errors in the presence of multiple outside objects could therefore be very high and the reliability of object detection could consequently be very low.
In order to increase the reliability, it is necessary to increase the signalto-noise ratio by suppressing the cross-correlation peaks for outside objects in relation to the auto-correlation peak of the object under localization. This requires an appropriate modification of the matched filter. We come to the conclusions that: (i) in the presence of multiple outside objects it is impossible to achieve simultaneously a minimum of normal error variance and a minimum probability of anomalous errors with the same localization device, and (ii) the optimal localization can be obtained in two steps. The first step is optimal localization with a minimal probability of anomalous errors. Here a reliable but not accurate estimation of coordinates is obtained. The second step is localization with minimal variance of normal errors within the small area found in the first step. The optimal estimator for the second step and its quantitative characteristics were discussed in this section. In the next section, we shall discuss the design and features of the minimal anomalous error localization device.
§ 3. Localization of Objects on a Complex Background with a Minimum of Anomalous Errors
3.1. FORMULATION OF THE PROBLEM
In the present and subsequent subsections, we shall discuss the problem of object localization in pictures with a complex background, such as photographs of natural scenes, aerial and space photographs of the Earth, etc. The main problem of localization in this general case is that of anomalous localization errors caused by outside objects in the background. We confine the analysis to a localization device similar to that depicted in fig. 3, consisting of a linear filter and a decision-making unit that determines the coordinates of the absolute signal maximum at the filter output (Yaroslavsky [1975, 1986]). Our aim is now to find the optimal linear filter ensuring a minimum probability of anomalous localization errors. At the outset of this discussion, let us define exactly the notion of optimality. In order to allow for possible nonhomogeneity of the optimality criterion over the picture frame, let us assume that the picture is decomposed into Q fragments of area $S_q$, q = 0, 1, ..., Q - 1 (fig. 6). Let $h_q(b, x_0, y_0)$ be a histogram of the picture signal b(x, y) at the filter output as measured for the qth fragment over the area not occupied by the object (i.e., the background
Fig. 6. Explanation of the basic notations in eqs. (3.1) through (3.3).
area), for the fixed background, and fixed sensor (imaging system) noise, provided that the object is located at the point with coordinates ($x_0$, $y_0$). Further, let $b_0$ be the filter output at the object location (it may be assumed that $b_0 > 0$ without restricting the generality of the argument). Since the localization device under consideration decides upon the coordinates of the desired object via those of the absolute maximum of the signal at the filter output, the integral
then represents the average portion of the points of the qth fragment that can be erroneously taken by the decision unit for the object coordinates. The symbols $AV_{ims}$ and $AV_{bg}$ represent averaging over the sensor’s noise and over possible realizations of the background component of the picture. Generally speaking, $b_0$ should be regarded as a random variable because it depends on such random factors as sensor noise, photographic environment, illumination, object orientation, neighboring objects, etc. In order to take these factors into consideration, we introduce an a priori probability density $p(b_0)$ of $b_0$. Object coordinates should also be regarded as random. Moreover, the weight of the measurement errors in the localization problems
may differ over different picture fragments. To allow for these factors, we introduce weighting functions $w_q(x_0, y_0)$ and $W_q$ characterizing the a priori significance of localization within the qth fragment and for each qth fragment, respectively, (3.2a)
$$\sum_{q=0}^{Q-1} W_q = 1. \qquad (3.2b)$$
The performance of the localization device under consideration may then be described by a weighted mean with respect to $p(b_0)$, $w_q(x_0, y_0)$, and $W_q$ of the integral of eq. (3.1):
A device which provides the minimum value of P will be regarded as optimal on average over sensor noise and the background component of the picture. If we seek a device which would be optimal for the fixed background part of the picture, we should eliminate $AV_{bg}$ (i.e., averaging over the background) from eq. (3.3).
3.2. LOCALIZATION OF AN EXACTLY KNOWN OBJECT FOR THE SPATIALLY HOMOGENEOUS OPTIMALITY CRITERION
Assume that the object under search is exactly defined, which in the present context means that the response of any filter to this object may be exactly determined, or that $p(b_0)$ is a delta function:
$$p(b_0) = \delta(b_0 - \bar{b}_0). \qquad (3.4)$$
Let $\bar{h}_q(b)$ be the histogram averaged within each fragment over ($x_0$, $y_0$):
$$\bar{h}_q(b) = \iint_{S_q} w_q(x_0, y_0)\, h_q(b, x_0, y_0)\, dx_0\, dy_0. \qquad (3.5)$$
Equation (3.3), which defines the localization quality, then becomes
(3.6)
Suppose that the optimality criterion is spatially homogeneous; i.e., that the weights $\{W_q\}$ are independent of q and are equal to 1/Q. Then (3.7)
represents the histogram of the filter output signal as measured over the whole picture and averaged with respect to the unknown coordinates of the object under localization. By substituting eq. (3.7) into eq. (3.6), we obtain (3.8)
Let us now determine the frequency response $H(f_x, f_y)$ of a filter which provides a minimum value of P. In the derivation, we shall follow the ideas of Yaroslavsky [1979, 1986] with minor additional substantiations. The choice of $H(f_x, f_y)$ affects both $\bar{b}_0$ and the histogram $\bar{h}(b)$. Since $\bar{b}_0$ is the filter response at the point of object location, it may be determined through the object Fourier spectrum $\alpha(f_x, f_y)$ as (3.9) The relationship between $\bar{h}(b)$ and $H(f_x, f_y)$ is, generally speaking, of an involved nature. The explicit dependence of $\bar{h}(b)$ on $H(f_x, f_y)$ may be written only for the second moment of the histogram $\bar{h}(b)$ by making use of Parseval’s relation for the Fourier transform,
$$m_2 = \int_{-\infty}^{\infty} b^2\, \bar{h}(b)\, db = \frac{1}{S_1} \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} |\alpha_{bg}(f_x, f_y)|^2\, |H(f_x, f_y)|^2\, df_x\, df_y, \qquad (3.10)$$
where $S_1$ is the search area of the picture at the filter output without the area occupied by the object signal,
$$|\alpha_{bg}(f_x, f_y)|^2 = \iint_{S_1} w(x_0, y_0)\, |\alpha_{bg}^{0}(f_x, f_y)|^2\, dx_0\, dy_0, \qquad (3.11)$$
and $\alpha_{bg}^{0}(f_x, f_y)$ is the Fourier spectrum of the picture with the signal in the area occupied by the desired object set to zero (i.e., the spectrum of the background component of the picture). Therefore, we can only rely upon Tchebyshev’s inequality, well known in probability theory (e.g., Hald [1962]), which connects the probability
for some random variable x to exceed some threshold $b_0$ with its mean value $\bar{x}$ and standard deviation $\sigma$ such that:
$$\mathrm{Probability}(x \ge \bar{x} + b_0 \sigma) \le 1/b_0^2. \qquad (3.12)$$
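The role of Tchebyshev's inequality here is to bound the tail of a histogram using only its first two moments. A minimal sketch of this idea on a sampled data set is given below; the sample values and the function name are hypothetical, and the standard deviation is the population value (division by the number of samples), for which the bound holds for any data set.

```python
import math


def upper_tail_fraction(samples, k):
    """Fraction of samples at or above mean + k * (population standard deviation)."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n)
    threshold = mean + k * std
    return sum(1 for s in samples if s >= threshold) / n


# Hypothetical "filter output" values: a bulk of background samples
# plus two large outbursts.
samples = [((37 * i) % 101) / 10.0 for i in range(101)] + [50.0, 60.0]

for k in (1.5, 2.0, 3.0):
    print(k, upper_tail_fraction(samples, k), 1.0 / k ** 2)
```

In every case the observed tail fraction stays below $1/k^2$, which illustrates why reducing the second moment of the output histogram relative to the signal value at the object location limits the rate of anomalous errors.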
Applying this relationship to eq. (3.8) we can write (3.13) where $\bar{b}$ is the mean value of the histogram $\bar{h}(b)$, which by virtue of the properties of the Fourier transform can be calculated as (3.14)
It follows from this equation that the mean value $\bar{b}$ of the histogram over the background part of the picture is defined by the filter frequency response H(0, 0) at the point ($f_x = 0$, $f_y = 0$). The same value affects the mean value of the signal over the whole picture under filtering. This constant bias of the signal at the filter output is irrelevant for the device which localizes the signal maximum. Therefore, we can choose any value for H(0, 0), and hence disregard $\bar{b}$ in eq. (3.13) without restricting the generality of the analysis. We come to the conclusion that to make the rate P of anomalous errors minimal we should design a filter such that
The solution of this equation follows from Schwarz’s inequality: (3.15) where the asterisk (*) denotes the complex conjugate. One can express $|\alpha_{bg}(f_x, f_y)|^2$ through the spectrum of the observed picture $\beta(f_x, f_y)$ and that of the desired object $\alpha(f_x, f_y)$,
$$\alpha_{bg}(f_x, f_y) = \beta(f_x, f_y) - \alpha(f_x, f_y) \exp[-i 2\pi (f_x x_0 + f_y y_0)]. \qquad (3.16)$$
Then, substitution of eq. (3.16) into eq. (3.11) results in
$$AV_{ims} AV_{bg} |\alpha_{bg}(f)|^2 = AV_{ims} AV_{bg} |\beta(f)|^2 + |\alpha(f)|^2 - W(f)\, \alpha(f)\, AV_{bg} \beta^*(f) - W^*(f)\, \alpha^*(f)\, AV_{bg} \beta(f), \qquad (3.17)$$
where f is ($f_x$, $f_y$), and
$$W(f_x, f_y) = \iint_{S} w(x_0, y_0) \exp[-i 2\pi (f_x x_0 + f_y y_0)]\, dx_0\, dy_0 \qquad (3.18)$$
is the spectrum of the weight function $w(x_0, y_0)$. At this point, it is useful to make some simplifying assumptions. First, note that if the weight function $w(x_0, y_0)$ is approximately uniform over the picture area, its Fourier transform $W(f_x, f_y)$ is close to zero over the entire frequency plane except in the very close vicinity of the point ($f_x = 0$, $f_y = 0$). Therefore, its influence does not extend very far from this point, and is thus irrelevant in filter design. In a first approximation, we can neglect the last two terms in eq. (3.17) for the denominator of the optimal filter frequency response (3.15). Second, the ratio of the power spectrum of the total picture to that of the object under search is, in order of magnitude, equal to the squared ratio of their areas. Usually, the area occupied by the object is much less than that of the picture itself. Therefore, we can often also neglect the second term in eq. (3.17) and use the following approximation for the denominator of eq. (3.15),
$$AV_{ims} AV_{bg} |\alpha_{bg}(f_x, f_y)|^2 \simeq AV_{ims} AV_{bg} |\beta(f_x, f_y)|^2, \qquad (3.19)$$
which implies that (3.20)
Finally, very often we might be interested in the most reliable localization in the given picture; i.e., with a fixed background rather than on average over all possible pictures. In this case we should omit the averaging $AV_{bg}$. Then the optimal filter will be represented as follows (3.21) The filter described by eq. (3.21) is obviously adaptive, since its frequency response is determined by the power spectrum $|\beta(f_x, f_y)|^2$ of the observed picture.
On the other hand, $AV_{bg} |\beta(f_x, f_y)|^2$ is just a statistical power spectrum of the picture ensemble. Therefore, the filter described by eq. (3.20), which involves averaging over realizations of the background, is nothing but the optimal filter (2.29) for non-white additive noise.
As described in § 1, the object recognition problem is similar to that of localization, and anomalous localization errors are equivalent to false recognition errors. Therefore, a localization device which works with a minimal probability of anomalous errors will provide the highest discrimination capability in the object recognition task. The filter described by eq. (3.20) is therefore optimal for recognition as well. The only difference is that, in this case, the average power spectrum $AV_{ims} AV_{bg} |\beta(f_x, f_y)|^2$ of the observed picture should be replaced by the averaged power spectrum $AV_k(|\beta_k(f_x, f_y)|^2)$ of all the objects to be discriminated from the given one, (3.22) The averaging $AV_k$ in eq. (3.22) should be carried out over all the objects, numbered by k, with the averaging weights proportional to the probability of appearance of the corresponding false object or, more generally, to some measure of wrong identification of the given object with the corresponding false one.
From the preceding argument, we see that the ratio of the squared signal value at the optimal filter output at the point of actual location of the object to the variance of the output signal over the rest of the picture can be used as an index of the localization device optimality. We shall refer to this ratio as the “signal-to-noise” ratio (SNR). From the above, it follows that for the optimal adaptive filter described by eq. (3.21), the SNR is given by (3.23) We shall illustrate the advantages and characteristic features of the optimal filter of eq. (3.21) using results obtained by computer simulation. Simulation results for a one-dimensional signal are shown in fig. 7.
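A one-dimensional digital version of this comparison can be sketched as follows. The sketch assumes that the adaptive filter has the frequency response $\alpha^*/|\beta|^2$, with the observed power spectrum used in place of its noise average, and adds a small constant `eps` to the denominator to avoid division by zero; both are assumptions of this illustration. The SNR is computed as defined above, except that only the single sample at the object location is excluded from the variance. All names and the test signal are hypothetical.

```python
import cmath


def dft(signal):
    """Direct discrete Fourier transform (adequate for short signals)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Real part of the inverse discrete Fourier transform."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def filter_outputs(picture, template, eps=1e-9):
    """Outputs of the conventional correlator and of the adaptive filter."""
    n = len(picture)
    alpha = dft(list(template) + [0.0] * (n - len(template)))
    beta = dft(picture)
    correlator = idft_real([a.conjugate() * b for a, b in zip(alpha, beta)])
    adaptive = idft_real([a.conjugate() * b / (abs(b) ** 2 + eps)
                          for a, b in zip(alpha, beta)])
    return correlator, adaptive


def signal_to_noise(output, position):
    """Squared output at the object location over the variance elsewhere."""
    rest = [v for i, v in enumerate(output) if i != position]
    mean = sum(rest) / len(rest)
    variance = sum((v - mean) ** 2 for v in rest) / len(rest)
    return float("inf") if variance == 0.0 else output[position] ** 2 / variance


def place(template, position, length):
    """Put a template into an otherwise empty signal at the given position."""
    signal = [0.0] * length
    for k, t in enumerate(template):
        signal[(position + k) % length] += t
    return signal


# Hypothetical test case: the picture contains only the object, at position 20.
template = [3.0, 1.0, 0.5]
picture = place(template, 20, 64)
correlator, adaptive = filter_outputs(picture, template)
```

In this background-free case both outputs peak at position 20, and the adaptive filter output is close to a single-sample pulse, so its SNR is far higher than that of the correlator. Adding background pulses to `picture` allows the two outputs to be compared in the spirit of fig. 7.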
The reference signal in this experiment was the fragment of the signal marked in fig. 7 as “object”. One can easily see from this figure that the conventional correlator is unable to discriminate the “object” from other pulses in the signal, because the auto-correlation peak is lower than the cross-correlation ones. In contrast, the optimal filter does this successfully: signal outbursts from the heavy
Fig. 7. Comparison of the conventional correlator and the optimal filter for localization of an “object” in a one-dimensional signal (horizontal axis: coordinate).
pulses in the signal are suppressed considerably in comparison to the signal peak from the object.

An experiment similar to the computer simulation just described was carried out by Yaroslavsky [1975, 1979] using a real picture. An aerial photograph (fig. 8) digitized over a square raster of 512 × 512 pixels was used in experiments on localization of 20 test marks of 5 × 5 pixels, which were uniformly dark squares superimposed upon the picture. The disposition of the marks is shown in fig. 9 by numbered squares. As can be seen from this scheme, the test marks were located in structurally different areas of the aerial photograph in order to evaluate the correlator and optimal filter performance under different background conditions. The contrast of the marks was about 25% of the video signal magnitude range. For synthesis of the filter represented by eq. (3.21), the observed power spectrum of the picture |β(f_x, f_y)|² was used as a zero-order approximation for the averaged-over-noise spectrum AV_bg|β(f_x, f_y)|².

A comparison of the output signals for the conventional matched filter and the optimal one is presented in fig. 10. Figure 10 shows (in the downward direction) the graphs of the lines of the initial picture and the outputs of the conventional correlator and optimal filter going through the centers of marks (12) and (15) in fig. 8. One can easily see in the graph of correlator output the auto-correlation peaks of the test spots and false cross-correlation peaks,
Fig. 8. Test aerial photograph with superimposed square marks (Yaroslavsky [1979, 1986]).
including those exceeding the auto-correlation one. These false peaks result in false decisions. Comparison of this graph with the lower one in fig. 10, which presents the same line of the signal in the output of the optimal filter, shows how much the optimal filter facilitates the task of spot localization for the decision making unit.

The optimal adaptive filter can be implemented both digitally and optically. In digital implementation, one can use fast algorithms based on fast Fourier transforms for signal convolution. For optical implementation, a coherent optical system with a non-linear medium placed in the Fourier plane was proposed (Yaroslavsky [1975, 1976]; fig. 11). In this system, when exposed to the power spectrum of the picture, the non-linear medium
Fig. 9. Scheme of mark locations in fig. 8 (Yaroslavsky [1979, 1986]).
implements the denominator of the optimal filter (3.21), if the transparency of the medium is inversely proportional to the intensity of the incident light. Experiments by Dudinov, Kryshtal and Yaroslavsky [1977] with negative recording of the power spectrum of the observed picture on a photographic material with γ = 1 have confirmed the radical improvement in the reliability of object localization in aerial photographs.

3.3. LOCALIZATION OF INEXACTLY KNOWN OBJECTS (Yaroslavsky [1979, 1986])
In the case of an inexactly defined object, there exists uncertainty about the object parameters; i.e., the probability density p(b_0) cannot be regarded
Fig. 10. Graphs of lines of the original picture (fig. 8) video signal (upper), output of the conventional matched filter (middle), and output of the optimal filter (lower); horizontal axis: coordinate (Yaroslavsky [1979, 1986]).
as a delta function. Therefore, the optimal estimator must provide the minimum of the integral
where F(b) is defined by eq. (3.5).
Fig. 11. Schematic diagram of an optimal coherent optical correlator.
Two different possibilities should now be considered. Each possibility depends on implementation restrictions.

(a) An estimator with selection. Decompose the interval of possible values of b_0 into sub-intervals within which p(b_0) may be regarded as constant. Then,
(3.25) where b_0^(k) is a representative of the kth interval and P_k is the area under p(b_0) over the kth interval. Since P_k ≥ 0, the quantity in eq. (3.25) is minimal if (3.26) is minimal. The problem reduces to that of localization of an exactly known object, with the only difference being that now a set of optimal filters given by (3.27) should be generated for each “representative” of all possible object variations. The same argument also relates to the adaptive filters of the type given by eq. (3.21). Of course, the generation and fabrication of multiple filters (as well as multiple filtering itself) requires additional time and hardware, which can
be unacceptable. In this case, the only alternative is adjustment of the filter to an averaged object.

(b) Estimator adjusted to an averaged object. If the variance of the uncertain object parameters is not too large, one can solve the problem as though the object is exactly known, albeit at the expense of a higher rate of anomalous errors. The optimal filter in this case should be corrected with due regard to the object parameter dispersion. In order to show this, change the variables b_1 = b − b_0 and the order of integration in eq. (3.24),
$$P_a = \int_{0}^{\infty} db_1 \int_{-\infty}^{\infty} p(b_0)\, h(b_1 + b_0)\, db_0. \qquad (3.28)$$

The internal integral in eq. (3.28) is a convolution of distributions, or a distribution of the difference of two variables b and b_0. Denote this distribution by h_p(b_1). Its mean value is equal to the difference of the mean values of the distributions p(b_0) and h(b), and its variance is equal to the sum of the variances of these distributions. Therefore, (3.29)
The problem has thus been reduced to that of § 3.2. One can use eq. (3.15) for the optimal filter frequency response with appropriate modifications of its numerator and denominator. Since b̄_0 is the mean value of the filter output at the point of object location, taken over the distribution p(b_0), the complex conjugate spectrum α*(f_x, f_y) of the object in the numerator of eq. (3.15) should be substituted by the complex conjugate object spectrum ᾱ*(f_x, f_y) averaged over p(b_0). The denominator of eq. (3.15) should be modified by addition of the variance of the object spectrum, (3.30)
We can now write the following expression for the optimal-in-average filter frequency response, (3.31)
The variance of the object spectrum, being of the order of magnitude of the object power spectrum |α(f_x, f_y)|², is evidently considerably less than the power spectrum of the total picture. Consequently, for the
denominator of eq. (3.31) we may use the same approximation as that used in eq. (3.20)
(3.32) By eliminating averaging over the background, we obtain the adaptive filter
(3.33) As already mentioned, such an averaged filter cannot provide as large a “signal-to-noise” ratio as a filter adjusted for the exactly known object. Indeed, for each specific object with spectrum α(f_x, f_y), the following inequality holds for the “signal-to-noise” ratio SNR of the filter (3.32),
(3.34) Note that the right-hand side of the inequality is the “signal-to-noise” ratio for the optimal filter matched to the object.
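The inequality can be checked numerically with a short sketch. Several assumptions are made here: the integrals over spatial frequency are replaced by sums over discrete frequency samples, the SNR of a filter H is taken as |Σ Hα|² / Σ |H|²P with P standing for the picture power spectrum, the averaged filter is taken as the conjugate mean object spectrum divided by P, and the bound is the whitened object energy Σ |α|²/P. The random spectra and all names are hypothetical. That no filter can exceed the bound follows from the Cauchy-Schwarz inequality.

```python
import numpy as np

def snr(response, alpha, power):
    """Discrete stand-in for the SNR: |sum(H*alpha)|^2 / sum(|H|^2 * power)."""
    peak = np.abs(np.sum(response * alpha)) ** 2
    noise = np.sum(np.abs(response) ** 2 * power)
    return peak / noise

rng = np.random.default_rng(0)
n = 64
power = 1.0 + 10.0 * rng.random(n)          # hypothetical picture power spectrum
# Hypothetical spectra of eight variants of an inexactly known object.
variants = rng.normal(size=(8, n)) + 1j * rng.normal(size=(8, n))
mean_alpha = variants.mean(axis=0)
averaged_filter = np.conj(mean_alpha) / power   # assumed form of the averaged filter

# SNR bound for each variant: energy of its whitened spectrum.
bounds = np.array([np.sum(np.abs(a) ** 2 / power) for a in variants])
snr_averaged = np.array([snr(averaged_filter, a, power) for a in variants])
snr_matched = np.array([snr(np.conj(a) / power, a, power) for a in variants])

for k in range(len(variants)):
    print(k, snr_averaged[k], snr_matched[k], bounds[k])
```

In this sketch the filter matched to each variant attains the bound, while the averaged filter stays at or below it for every variant.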
3.4. RELIABLE LOCALIZATION FOR SPATIALLY INHOMOGENEOUS OBJECTS
(Yaroslavsky [1979, 19861)
Let us now abandon the assumption of § 3.2 concerning the spatial homogeneity in the optimality criterion. One of the following two ways to attain the minimum of P (eq. (3.3)) may then be chosen, depending on the implementation constraints.

(a) Localization device with a re-adjustable filter. Under given non-negative weights {W_q}, the minimum of P is attained at the minima of all
(3.35) This means that the filter should be re-adjustable for each fragment, filtering being carried out in the fragments within which the averaging in eq. (3.35)
is done. For each fragment, the optimal filter is determined through eqs. (3.13), (3.20), (3.21), (3.32) and (3.33) on the basis of measurements of the observed local spectra of fragments (with allowance for the above reservations about the influence of the object spectrum on the observed picture spectrum). According to eq. (3.3), the fragments do not overlap; this corresponds to the fragment-by-fragment mode of processing. This can be extended naturally to the sliding processing mode based upon an estimate of the current local power spectrum of the picture. Note also that with both fragment-wise and sliding processing, the re-adjustable filter response does not depend on the weights {W_q}.

(b) Localization device with a fixed filter. When there is no possibility of using an adjustable filter with fragment-wise or sliding processing, an alternative way is to design an estimator adjusted to the power spectrum of picture fragments averaged over {W_q}. Indeed, it follows from eq. (3.3) that
$$\bar{P} = \int_{-\infty}^{\infty} p(b_0)\, db_0 \int_{b_0}^{\infty} \bar{h}(b)\, db, \qquad (3.36)$$

where h̄(b) is a histogram averaged over {W_q} and w_q(x_0, y_0). By analogy with eq. (3.31) one may conclude from this that
where (3.37b)
and α_bg^(q)(f_x, f_y) is the spectrum of the qth fragment background component. Thus, the frequency response of the optimal filter in this case depends on the weights {W_q} and is determined by the averaged power spectrum of all fragments.
3.5. RELIABLE LOCALIZATION IN BLURRED PICTURES
In many practical situations the picture in which an object should be located is defocused or blurred by the imaging system. The approach presented can be extended to this case as well (Yaroslavsky [1979, 1986]). Let the picture be distorted by a linear, spatially invariant system with frequency response H_s(f_x, f_y). Obviously, the optimal localization device should be adjusted in this case to an object that was subjected to the same distortions as the observed picture. For instance, the adaptive filter described by eq. (3.21) should be modified as follows:
(3.38) The physical meaning of this formula will be more evident if we represent H_opt(f_x, f_y) in the following equivalent form
(3.39) with |β_0(f_x, f_y)|² being the power spectrum of a hypothetical non-distorted picture. Thus, the optimal filter (3.38) may be regarded as consisting of two filters in cascade
(3.40)
(3.41) (N(f_x, f_y) is the spectral density of additive sensor noise) is nothing but an adaptive filter for correcting picture blur (Yaroslavsky [1987]). The second filter
(3.42) is simply the optimal localization filter for a non-distorted picture. This means that the optimal filter (3.38) may be treated as one performing deblurring and then optimal filtering of the resultant deblurred picture. It will now be instructive to estimate how much picture blur affects the
localization reliability, which can be characterized by the attainable “signal-to-noise” ratio at the optimal filter output. By analogy with eq. (3.23) we can write an expression for the SNR, (3.43)
from which it is evident that if the frequency response of the imaging system does not fall to zero and the noise level is not too high, picture blur does not deteriorate the performance of the optimal filter to a large extent.

3.6. OPTIMAL LOCALIZATION IN MULTICOMPONENT PICTURES WITH CLUTTERED BACKGROUND
Thorough derivation of the structure of an optimal device for object localization in multicomponent pictures requires bulky computations. Therefore, we shall proceed from the analogy between optimal localization in pictures with additive Gaussian noise and that in single-component pictures with cluttered background. Following this analogy, we can conclude that the device for localization in multicomponent pictures with cluttered background should contain: (i) a unit for decorrelating the picture components, (ii) optimal filters in each channel, (iii) an adder for summation of the channel optimal filter outputs, and (iv) a device for localization of the signal maximum at the output of the adder. Thus, we modify the device of fig. 4 for object localization in multicomponent pictures with non-correlated noise in each channel in a way shown in fig. 12. The decorrelating unit in this device is intended for suppressing the interchannel cross-correlation terms in the adder output. It can be implemented as a transcoder multiplying the picture component samples by a decorrelating matrix. The alternative way to perform component decorrelation is to carry out whitening of the component Fourier transform spectrum. The decorrelation operator DECORR, which must be applied in this case to the picture component samples {b,(x, y)}, can be expressed as (3.44)
where DFT{·} and DFT⁻¹{·} are direct and inverse discrete Fourier transforms over the variable m, the number of the components. The optimal filters H_m(f_x, f_y) in the channels can be defined according to eq. (3.21) as (3.45) where ã_m(x, y) and b̃_m(x, y) are the mth outputs of the decorrelating unit for the object and picture signals.

Fig. 12. Block diagram of the optimal device for localization of objects in multicomponent pictures with cluttered background. (Diagram: the 1st to Mth components of the input picture enter the decorrelating unit; its outputs pass through filters 1 to M to the adder, which is followed by the device for localization of the signal maximum.)

3.7. PHASE-ONLY-, BINARY PHASE-ONLY-, MINIMUM AVERAGE CORRELATION ENERGY-, ENTROPY-OPTIMIZED, AND OTHER FILTERS FOR OPTICAL PATTERN RECOGNITION; RELIABLE LOCALIZATION AND PICTURE CONTOURS
Very soon after Van der Lugt's [1964] introduction of holographic matched spatial filters and optical correlators in pattern recognition, it was recognized that the ability of matched filters to discriminate effectively between objects of different classes is far less ideal than their ability to
combat additive Gaussian noise. Recognition of this fact stimulated the search for filters with better discrimination ability. One of the oldest and most popular ideas is that first proposed by Lowenthal and Belvaux [1967], and involves preprocessing of both the picture and the reference object by an appropriate differential operator before they are correlated. This can be accomplished by a corresponding spatial filtering in coherent light, e.g., by using opaque stops or slits centered on the optical axis, or by introducing into the coherent optical correlator an additional spatial filter. If the transmittance of the filter is made proportional to the distance from the optical axis, the filter gives a good approximation to the first derivative of the signal. Another simple method for increasing discrimination by differentiation involves using the non-linearity of the medium on which a holographic matched filter is recorded. It was noted by Van der Lugt [1968] and by Binns, Dickinson and Watrasiewicz [1968] that overexposing the spectrum information from the target, of which the filter is being made, can result in high-pass filtering similar to spatial differentiation. The so-called joint transform correlators, where formation of the matched filter and the filtering operation are carried out simultaneously by using a non-linear medium in the Fourier plane of the coherent optical correlator (Weaver and Goodman [1966]), were recently demonstrated by Javidi [1989] to have better discrimination capability than the matched filter if the non-linearity of the medium is properly chosen.

Fabrication of a matched filter requires, in general, recording of both amplitude and phase information. This is not a trivial process, especially if the filter is to be a computer-generated synthetic one (e.g., an averaged filter, or a filter for localization or recognition of an object given only by its mathematical description).
To simplify filter synthesis, Horner and Gianino [1984] proposed the use of the so-called phase-only filter (POF) instead of the matched filter. The POF is a filter with a constant amplitude transmittance; the phase transmittance is equal to that of the matched filter. The next attempt to simplify filter synthesis involved the so-called binary phase-only filter (BPOF), where only the sign of the real and imaginary parts of the matched filter complex transmittance is recorded (Horner and Leger [1985]). Despite their having been introduced primarily to avoid the synthesis of complex filters and to increase the filter efficiency in using light energy in optical correlators, it was soon apparent that both POF and BPOF have much better discrimination ability than a conventional matched filter. This fact, along with their simplicity of implementation, contributed greatly to the popularity of the POFs.
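The two simplifications just described can be sketched digitally as follows. This is a minimal illustration, assuming that the matched filter is represented by the conjugate discrete Fourier spectrum of the reference object; the random reference image, the array sizes, the small constant `eps` and the function names are hypothetical.

```python
import numpy as np

def matched_filter(reference, shape):
    """Matched-filter frequency response: conjugate spectrum of the reference."""
    return np.conj(np.fft.fft2(reference, s=shape))

def phase_only(matched, eps=1e-12):
    """POF: keep the phase of the matched filter, set the amplitude to one."""
    # eps only protects against division by an exactly zero spectral sample.
    return matched / np.maximum(np.abs(matched), eps)

def binary_phase_only(matched):
    """BPOF: keep only the signs of the real and imaginary parts."""
    return np.sign(matched.real) + 1j * np.sign(matched.imag)

rng = np.random.default_rng(1)
reference = rng.random((8, 8))                  # hypothetical reference object
matched = matched_filter(reference, (32, 32))   # zero-padded to the picture size
pof = phase_only(matched)
bpof = binary_phase_only(matched)
```

Note that `np.sign` returns zero for an exactly zero real or imaginary part, so such samples are zeroed rather than assigned a sign in this sketch.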
The better performance of the POF is usually explained by modeling this filter as a combination of the matched filter with frequency response α*(f) and the inverse filter 1/|α(f)| (see, e.g., Chlanska-Macukow and Nitka [1987]):

$$\mathrm{POF}(f) = \alpha^*(f)/|\alpha(f)|. \qquad (3.46)$$
The inverse filter makes the amplitude response of the POF constant. Being reciprocal to the magnitude of the reference object spectrum, the inverse filter acts as a high-pass or differentiating filter since the magnitude of the picture spectrum usually decreases as the spatial frequency increases. But why should enhancement of the picture's higher spatial frequencies resulting from high-pass filtering or differentiating improve the discrimination capability of the filter? And if an improvement can be achieved, by how much and in what concrete ways should the higher frequencies be enhanced? Without having answered these questions, many different improvements of the POF were proposed:

(i) Amplitude compensated matched filters (ACMF) (Mu, Wang and Wang [1988])

$$\mathrm{ACMF}(f) = \alpha^*(f)/|\alpha(f)|^2; \qquad (3.47)$$

(ii) Amplitude-modulated POF (AMPOF) (Awwal, Karim and Jahan [1990])

$$\mathrm{AMPOF}(f) = \alpha^*(f)/\left(|\alpha(f)|^2 + \varepsilon\right), \qquad (3.48)$$

where ε is a small constant introduced to avoid the need to divide by zero;

(iii) POF with improved signal-to-noise ratio (POFISNR) (Vijaya Kumar and Zouhir Bahri [1989])

$$\mathrm{POFISNR}(f) = \begin{cases} \alpha^*(f)/|\alpha(f)| & \text{for } f \in F, \\ 0 & \text{otherwise,} \end{cases} \qquad (3.49)$$

where the area F is chosen by some optimization procedure to improve the signal-to-noise ratio;

(iv) A family of ternary matched filters (TMF) (e.g., Dickey, Vijaya Kumar, Romero and Connelly [1990]), which are the natural generalization of the BPOF. The real and imaginary parts of the matched filter transmittance are quantized on three levels (1, −1, 0) instead of two, and the location of the areas where transmittance of the filter must be zeroed is again chosen by an optimization procedure to improve the signal-to-noise ratio;
(v) Optimal BPOF (OPOF) (Farn and Goodman [1988])
(3.50) where e(f) = 0, 1 is a binary function and φ_1 and φ_2 are constants chosen from the standpoint of optimization of the filter figures of merit.

It is clear that the problem of filter optimality, in terms of discrimination capability, has been addressed by many researchers. To complete this short review, we shall mention two more approaches to filter discrimination capability optimization.

Mahalanobis, Vijaya Kumar and Casasent [1987] have recently introduced so-called “minimum average correlation energy” (MACE) filters. MACE filters produce sharp output correlation peaks and maximize the ratio of the squared peak value of the correlation function to the average correlation plane energy. In principle, MACE filters are designed for identification (or detection) of multiple targets in the presence of virtually different types of uncertainty of their a priori description. However, for an exactly known object they coincide with the ACMF filters. Mahalanobis and Casasent [1991] demonstrated good performance of the MACE filters in actual experiments. Remarkably, they still needed some preprocessing of the input images in the form of edge enhancement, which modifies the correlation between the images under recognition. The need for preprocessing is, of course, not surprising because the set of possible images to be discriminated is in no way involved in the design of the MACE filter.

Fleisher, Mahlab and Shamir [1990] and Mahlab, Fleisher and Shamir [1990] have proposed another criterion for the synthesis of the optimal filter. They require a strong, narrow peak for a match between the input and filter function as contrasted with uniform signal distribution for a pattern to be rejected. For a pattern to be rejected, the signal distribution over the picture area is treated as a probability density distribution, so the requirement of uniform distribution is equivalent to the requirement of maximization of the entropy of this distribution.
This is why filters obtained according to this criterion are referred to as entropy optimized filters (EOF). Fleisher, Mahlab and Shamir [1990] showed by computer simulation that the discrimination power of the EOF is much better than that of the POF, and is comparable with that of MACE filters. It was also observed experimentally that a dominant feature of the EOF is a substantial enhancement of the high-frequency components of the images. Another important feature of the EOF is its adaptivity, since the entropy criterion also takes into account the patterns to be rejected. Despite these attractive features, optimization by the entropy criterion practically never yields a purely constant signal outside of the detection peak. The question therefore remains: Is it possible to further reduce the probability of a wrong classification which will take place when the remaining outbursts of the filter output signal (plus sensor noise, which is always present in real signals) exceed the detection peak?

All of the filters reviewed in this section can be shown to approximate more-or-less the optimal filters of eqs. (3.21), (3.33) and (3.37a). The formulae for the optimal filters can also provide the answers to questions concerning the importance of high frequencies and contours for pattern recognition. To demonstrate this, we represent the basic formula (eq. (3.21)) for the optimal filter in the following way (3.51)

In such a representation, the optimal filter is regarded as consisting of two filters in cascade. The first one, which is represented by the first factor in eq. (3.51), may be referred to as a whitening filter. This is an appropriate description, because in the case of simultaneous observation of both the object and outside objects within the same picture, this filter makes the picture signal spectrum almost uniform in the filter output. The second filter (the second factor in eq. (3.51)) is obviously a matched filter for the object, predistorted by the same whitening operator. In the case of separate observation of patterns in pattern recognition, the whitening acts as an orthogonalization procedure, as described by Caulfield and Maloney [1969]. If the observed signal contains only the object signal without any wrong patterns, the second filter in eq. (3.51) will be just a phase-only filter, and the entire filter (3.51) will be just an amplitude-compensated phase-only filter, as well as a MACE filter for a single object. As filter output we shall have, of course, a delta function.
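The cascade interpretation and the delta-function output can be illustrated with a small one-dimensional sketch. It assumes that the two factors are the whitening filter 1/|β(f)| and the matched filter α*(f)/|β(f)| for the whitened object, which is consistent with the description above but is not copied from eq. (3.51); the random object, the signal length, the constant `eps` and the function name are hypothetical.

```python
import numpy as np

def whitening_cascade(picture, obj, eps=1e-12):
    """Apply a whitening filter and then a matched filter for the whitened object."""
    beta = np.fft.fft(picture)                  # spectrum of the observed signal
    alpha = np.fft.fft(obj, picture.size)       # spectrum of the zero-padded object
    magnitude = np.maximum(np.abs(beta), eps)   # eps guards against zero samples
    whitened = beta / magnitude                 # first filter: 1/|beta|
    matched = np.conj(alpha) / magnitude        # second filter: alpha*/|beta|
    return np.real(np.fft.ifft(whitened * matched)), whitened

# The observed signal contains only the object, placed at sample 20.
rng = np.random.default_rng(2)
obj = rng.random(16)
position = 20
picture = np.zeros(64)
picture[position:position + obj.size] = obj

output, whitened = whitening_cascade(picture, obj)
print(int(np.argmax(output)), float(output[position]))
```

Since the picture spectrum here equals the object spectrum up to a linear phase, the second filter reduces to a phase-only filter and the output is a unit pulse at the object position, as the text states.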
Further, in pattern recognition the main danger of false recognition is connected to the patterns which closely resemble the given one. In this case, the phase-only filter presents a quite reasonable approximation to the second filter in eq. (3.51), depending on the similarity between the patterns; the other filters mentioned above approximate the optimal filter itself. However, it is dissimilarities in spectra that are most important for the discrimination of patterns. The optimal filter takes advantage of these dissimilarities, while other filters do not. Superiority of the optimal filter to, for instance, POF can be illustrated by the simulation results presented in fig. 13 for the signal of fig. 7
Fig. 13. Comparison of POF and optimal filters (horizontal axis: coordinate).
(Yaroslavsky [1992c]). From top to bottom, the graphs show the output signals of the phase-only and optimal filters. In this one-dimensional simulation experiment the number of signal samples was chosen to be 128. Filtering was carried out by the FFT technique. As an estimate of the denominator of eq. (3.51), the squared spectrum modulus of the total signal was chosen and smoothed by convolution with a rectangular window of five samples for averaging over computer round-off noise.

An important feature of the optimal filter (3.51) is its adaptivity in application for target detection, since its frequency response is determined by the power spectrum of the observed picture. The whitening operation is therefore also adaptive. Due to the fact that usually (but not always!) the picture spectrum decreases with increasing spatial frequencies, whitening results in enhancement of high frequencies (visually, this can be observed as edge enhancement). This is why the recommendations about enhancement of high frequencies work. But in contrast to these ad hoc recommendations, eq. (3.51) explicitly says to what degree, and in which specific way, this enhancement must be done for each specific picture. Graphs of the initial signal, of the whitening filter output signal and of the whitening filter pulse response are shown in fig. 14. Figure 14 illustrates the edge enhancement feature of the optimal filter for the same signal and object as represented in fig. 7. However, if the object to be detected was masked, for instance, by a high-frequency grid, the whitening would not
Fig. 14. Illustration of signal whitening by an optimal filter.
enhance but rather attenuate the high frequencies of the grid! The same phenomenon is illustrated by fig. 15 for the aerial photo of fig. 8 and by fig. 16 for the test pictures of geometrical figures and characters. It is clearly seen from these figures that whitening automatically suppresses all powerful or frequently occurring features of the objects. Examples of these features include low-frequency components of the pictures and even some edges if they occur very often in different patterns, such as vertical and horizontal lines in geometrical figures and characters. On the other hand, whitening enhances dissimilarities in the patterns, like circumferences and corners in geometrical figures and corners and inclined fragments of the characters.

Equations (3.21) and (3.51) also provide a rational explanation of the importance of different frequency bands in the signal. It was shown in § 3.2 (eq. (3.23)) that the signal-to-noise ratio at the output of this filter is equal to: (3.52)
i.e., equal to the energy of the whitened spectrum of the object. Consequently, the higher the energy of the whitened signal, the better the signal-to-noise ratio. It is this relationship that measures the importance of the signal frequencies. In the next subsection we shall discuss how it can be exploited
Fig. 15. The result of “whitening” of the picture of fig. 8 (Yaroslavsky [1979, 1986]).
for selection of the reference objects from the point of view of their potential localization reliability.

3.8. SELECTION OF REFERENCE OBJECTS FROM THE STANDPOINT OF LOCALIZATION RELIABILITY
There exist numerous applications in which the reference object is not assigned and must therefore be selected. This is the case in aerial photo image registration and matching with a map; similar problems occur in stereogrammetry, in robot vision, etc. The question is how to make this choice to best advantage.
Fig. 16. “Whitening” of the test picture consisting of geometrical figures and characters: (a) original picture; (b) the picture after “whitening” (Yaroslavsky [1979, 1986]).
The literature on stereogrammetry recommends taking as reference objects those fragments of the aerial photo that have pronounced local peculiarities such as crossroads, river bends, separately situated buildings, etc. In the literature on pattern recognition and computer vision one can find recommendations to choose as reference objects those fragments of the picture where some informative functions such as local variance of the signal, variance of local gradients, or other similar measures of the fragment detailedness have extreme values. Almost all these recommendations are of a qualitative nature.

The analysis presented in this review provides a more quantitative and accurate approach to this problem, and gives a reasonable explanation of the various recommendations which have been offered. Indeed, eq. (3.52) can be regarded as a precise performance measure for optimal localization. It follows from this equation for the maximum possible “signal-to-noise” ratio in the output of the optimal linear filter that the best reference objects will be those picture fragments which have maximal variance of the “whitened” power spectrum. Due to this feature they provide the greatest response of the optimal filter and, consequently, a minimal rate of false identification errors. Therefore, the reference objects with slowly decreasing spectra (i.e., picture fragments which are visually estimated as containing the most intensive contours) will be the best ones. Experimental verification of this conclusion was obtained by Yaroslavsky [1986]. The results of this investigation are
illustrated in figs. 17 and 18, where the “goodness” of fragments of 16 × 16 pixels in terms of eq. (3.52) is represented by the darkness of the pixels. It may be readily seen that the best fragments are found where the picture has some sharply pronounced local peculiarities (abrupt brightness transitions, textures, etc.). Mostly they are located in the places that one would
Fig. 17. Automatic selection of reference objects in aerial photograph: (a) original picture; (b) result of testing of 16 x 16 pixels fragments (Yaroslavsky [1986]).
Fig. 18. Automatic selection of reference objects in space photograph: (a) original picture; (b) result of testing of 16 x 16 pixels fragments (Yaroslavsky [1986]).
call contours. But it is seen from the figures that not all fragments which we would qualify as containing contours will be the best reference objects, even if the fragments contain intensive brightness transitions. For instance, the vertical edges in fig. 17 or the line of the coast in fig. 18 are not excellent reference objects in spite of their good contrast. This is because there are many similar fragments in the pictures, which means that it would be very difficult to distinguish them one from another. This performance measure is very sensitive to the dissimilarities between picture fragments and could be effectively used for automatic selection of the reference objects in pictures.

The corresponding algorithm for choice of reference objects requires rather cumbersome computations. Therefore, computationally simpler algorithms approximating the exact one are of interest. Experiments by Belinsky and Yaroslavsky [1980] have shown that algorithms for computation of the local signal variance or local mean of the magnitudes of video signal gradients, which are computationally very simple, may be used as such approximating algorithms. But with these algorithms we, of course, lose the adaptivity inherent in the more comprehensive algorithms.
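Both kinds of fragment scoring can be sketched as follows. This is an illustration under assumptions rather than the algorithm used in the cited experiments: the “goodness” of a fragment is taken as the energy of its spectrum divided by the power spectrum of the whole picture (the whitened-spectrum energy described in connection with eq. (3.52)), fragments are non-overlapping blocks of 16 × 16 pixels, and the regularizing constant `eps`, the test picture and the function names are hypothetical.

```python
import numpy as np

def whitened_energy_map(picture, frag=16, eps=1e-6):
    """Score each fragment by the energy of its whitened spectrum."""
    power = np.abs(np.fft.fft2(picture)) ** 2
    power = power + eps * power.max()        # illustrative guard against zeros
    rows, cols = picture.shape[0] // frag, picture.shape[1] // frag
    score = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            block = picture[r * frag:(r + 1) * frag, c * frag:(c + 1) * frag]
            # Fragment spectrum on the frequency grid of the whole picture.
            alpha = np.fft.fft2(block, s=picture.shape)
            score[r, c] = np.sum(np.abs(alpha) ** 2 / power)
    return score

def local_variance_map(picture, frag=16):
    """Computationally simple approximation: signal variance in each fragment."""
    rows, cols = picture.shape[0] // frag, picture.shape[1] // frag
    blocks = picture[:rows * frag, :cols * frag].reshape(rows, frag, cols, frag)
    return blocks.var(axis=(1, 3))

# Hypothetical test picture: flat everywhere except one textured fragment.
rng = np.random.default_rng(3)
picture = np.zeros((64, 64))
picture[16:32, 32:48] = rng.random((16, 16))

score = whitened_energy_map(picture)
variance = local_variance_map(picture)
best = np.unravel_index(np.argmax(score), score.shape)
print("best fragment by whitened energy:", best)
```

The first function needs one picture-sized Fourier transform per fragment, which reflects the computational burden mentioned above, whereas the variance map is cheap but ignores the spectrum of the rest of the picture.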
§ 4. Conclusion

The theory of optimal localization devices, which consist of an optimal adaptive linear filter and a point-wise decision-making unit, seems now to be more or less complete. Nonetheless, experimental experience and intuitive conjecture suggest that in the future, supplementing linear filtering with some adaptive, non-linear processing promises further and perhaps radical improvements in localization reliability.
Acknowledgements

The present paper was motivated mainly by the course of lectures delivered by the author to researchers and students at the Department of Applied Optics of the University of Erlangen, Germany, during the spring semester of 1991, and by numerous discussions of the subject which the author has had with Prof. Dr. A. Lohmann and Prof. Dr. G. Hausler of this University; with Prof. M. Yzuel and her collaborators at the Autonomous University of Barcelona, Spain; with Prof. N. Gallagher and his collaborators at Purdue University, West Lafayette, USA; with Dr. M. Eden and his collaborators at the National Institutes of Health, Bethesda, USA; with Prof. Dr. H. Niemann and the collaborators at the Chair of Informatics 5 (Pattern Recognition) of the University of Erlangen, Germany; and with Prof. Dr.-Ing. R. Schwarte and the collaborators at the Institute of Information Processing of the University of Siegen, Germany. The author is sincerely grateful to them all.
References

Awwal, A.A.S., M.A. Karim and S.R. Jahan, 1990, Appl. Opt. 29, 223.
Belinsky, A.N., and L.P. Yaroslavsky, 1980, Issledovaniya Zemli iz Kosmos 4, 85-91. In Russian.
Binns, R.A., A. Dickinson and B.M. Watrasiewicz, 1968, Appl. Opt. 7, 1047-1051.
Caulfield, H.J., and W.T. Maloney, 1969, Appl. Opt. 8, 2355-2357.
Chlanska-Macukow, K., and T. Nitka, 1987, Opt. Commun. 64, 224-228.
Cramer, H., and M.R. Leadbetter, 1967, Stationary and Related Stochastic Processes. Sample Functions Properties and Applications (Wiley, New York) p. 168.
Dickey, F.M., B.V.K. Vijaya Kumar, L.A. Romero and J.M. Connelly, 1990, Opt. Eng. 29, 994-1001.
Dudinov, V.N., V.A. Kryshtal and L.P. Yaroslavsky, 1977, Geod. Kartogr. 1, 42-47.
Farn, M.W., and J.W. Goodman, 1988, Appl. Opt. 27, 4431-4437.
Fleisher, M., U. Mahlab and J. Shamir, 1990, Appl. Opt. 29, 2091-2098.
Hald, A., 1962, Statistical Theory with Engineering Applications (Wiley, New York) p. 783.
Horner, J.L., and J.L. Gianino, 1984, Appl. Opt. 23, 812-816.
Horner, J.L., and J.R. Leger, 1985, Appl. Opt. 24, 609-611.
Javidi, B., 1989, Appl. Opt. 28, 2358-2367.
Kotelnikov, V.A., 1956, Theory of Potential Noise Immunity (Gosenergoizdat, Moscow-Leningrad). In Russian.
Lowenthal, S., and Y. Belvaux, 1967, Opt. Acta 14(3), 245-258.
Mahalanobis, A., and D. Casasent, 1991, Appl. Opt. 26, 561-572.
Mahalanobis, A., B.V.K. Vijaya Kumar and D. Casasent, 1987, Appl. Opt. 26, 3633-3640.
Mahlab, U., M. Fleisher and J. Shamir, 1990, Opt. Commun. 77, 415-422.
Mu, G.-G., X.-M. Wang and Z.-Q. Wang, 1988, Appl. Opt. 27, 3461-3463.
Rice, S.O., 1944, Bell Syst. Tech. J. 23(3), 282-332.
Rice, S.O., 1945, Bell Syst. Tech. J. 24(1), 46-156.
Van der Lugt, A.B., 1964, IEEE Trans. Inf. Theory IT-10, 139-145.
Van der Lugt, A.B., 1968, Opt. Acta 15, 1-33.
Vijaya Kumar, B.V.K., and Zouhir Bahri, 1989, Appl. Opt. 29, 250-257.
Weaver, C.S., and J.W. Goodman, 1966, Appl. Opt. 5(7), 1248-1249.
Woodward, Ph.M., and I.L. Davies, 1950, Philos. Mag. 41, 1001-1017.
Wozencraft, J.M., and M. Jacobs, 1965, Principles of Communication Engineering (Wiley, New York) p. 720.
Yaroslavsky, L.P., 1970, Radiotekh. & Electron. XV(6), 1169-1173. In Russian.
Yaroslavsky, L.P., 1972, Radiotekh. & Electron. XVII(4), 714-720. In Russian.
Yaroslavsky, L.P., 1975, Vopr. Radioelektron. 8, 70-74. In Russian.
Yaroslavsky, L.P., 1976, Bull. Izobr. 43, 135-137. In Russian.
Yaroslavsky, L.P., 1979, Introduction to Digital Picture Processing (Sov. Radio, Moscow) pp. 195. In Russian.
Yaroslavsky, L.P., 1985, Digital Picture Processing. An Introduction (Springer, Berlin) p. 276.
Yaroslavsky, L.P., 1986, Applied topics of digital optics, in: Advances in Electronics and Electron Physics, Vol. 66, ed. P.W. Hawkes (Academic Press, Orlando, CO) pp. 1-140.
Yaroslavsky, L.P., 1987, Digital Signal Processing in Optics and Holography (Radio i Svyaz, Moscow). In Russian.
Yaroslavsky, L.P., 1992a, Accuracy and reliability of localization of objects in pictures, in: Proc. Symp. on Image Analysis, Uppsala, 10-11 March, 1992, eds O. Eriksson and E. Bengtsson (Centre for Image Analysis, Swedish Society for Automated Image Analysis) ISSN 11006641, pp. 1-8.
Yaroslavsky, L.P., 1992b, Accuracy and reliability of localization of objects in color pictures, in: Remote Sensing and Visualization of Information, Proc. Third Int. Sem. Digital Image Processing in Medicine, DIP-92, Riga, Latvia, 21-25 April, 1992 (Institute of Electronics and Computer Engineering, Latvian Academy of Sciences) pp. 3-6.
Yaroslavsky, L.P., 1992c, Appl. Opt. 31, 1677-1679.
E. WOLF, PROGRESS IN OPTICS XXXII © 1993 ELSEVIER SCIENCE PUBLISHERS B.V. ALL RIGHTS RESERVED

IV

WAVE PROPAGATION THEORIES IN RANDOM MEDIA BASED ON THE PATH-INTEGRAL APPROACH

BY

M. I. CHARNOTSKII, J. GOZANI, V. I. TATARSKII, and V. U. ZAVOROTNY

Cooperative Institute for Research in Environmental Sciences, University of Colorado, NOAA, Boulder, CO 80309-0216, USA
CONTENTS

§ 1. INTRODUCTION
§ 2. PROBLEM FORMULATION AND GOVERNING EQUATIONS
§ 3. INTRODUCTION TO PATH INTEGRALS
§ 4. PATH-INTEGRAL REPRESENTATIONS OF WAVE FIELDS IN INHOMOGENEOUS MEDIA
§ 5. PATH-INTEGRAL REPRESENTATIONS OF MOMENTS
§ 6. THE CONNECTION BETWEEN HEURISTIC APPROXIMATIONS AND PATH-INTEGRAL REPRESENTATIONS
§ 7. CONCLUSIONS
ACKNOWLEDGEMENTS
REFERENCES
§ 1. Introduction

The theory of wave propagation in random media with large-scale (compared with the wavelength) inhomogeneities has been developed over almost half a century. It represents an extended area of study, with important applications in such fields as atmospheric optics, ocean and atmospheric acoustics, radio meteorology, optical and radio astronomy, and plasma physics. Until the mid-1960s, the main efforts were spent working out different approaches based on perturbation theory. Descriptions of those efforts can be found in the books by Chernov [1960] and Tatarskii [1961]. Unfortunately, the results could be applied to only a narrow range of problems, characterized by small fluctuations of the field at the receiving point. However, there exist many important problems that are characterized by strong fluctuations in the field intensity, in spite of the small-angle character of the scattering phenomenon. Examples include light propagation near the ground, sound propagation over long distances in the ocean, and radio-wave propagation in the interstellar medium. To develop an adequate theory for these situations, one must use methods describing multiple small-angle scattering. The parabolic equation has been used for such a description. It was originally used by Leontovich [1944], Leontovich and Fock [1946] and Fock [1950] for the problem of radio-wave propagation near the ground in a regularly inhomogeneous atmosphere. For a random medium, this equation becomes a stochastic parabolic one. The application of quite different approaches (such as the Markov approximation or the local method of small perturbations) to this equation has led to equations for arbitrary statistical moments of the received field (Chernov [1968, 1969], Dolin [1968], Shishov [1968], Beran and Ho [1968], Tatarskii [1969], Klyatskin and Tatarskii [1970a], Klyatskin [1970], Molyneux [1971a], Brown [1972]).
This step represented considerable progress in the development of the theory of strong intensity fluctuations. The main results obtained in that area can be found in numerous monographs and review papers (Tatarskii [1971], Prokhorov, Bunkin, Gochelashvili and Shishov [1975], Klyatskin [1975, 1980], Fante [1975, 1985], Gurvich, Kon, Mironov and Khmelevtsov [1976], Uscinski [1977], Ishimaru [1978], Tatarskii and Zavorotny [1980], Yakushkin [1985], Rytov, Kravtsov and Tatarskii [1988], Kravtsov [1992]). The first moment is the mean field, which is associated with the “coherent” or unscattered part of the wave field, and the second moment is the mutual coherence function, which gives the degree of correlation of the field at two points and determines the angular distribution of the scattered radiation as well as the mean energy density of the radiation. The equations for the mean field and for the coherence function can be solved in a general form for arbitrary initial conditions and medium structure functions (see, e.g., Tatarskii [1971], Ishimaru [1978]). The next significant statistical moment is the fourth moment, because the third moment, like all other odd moments, behaves similarly to the mean field and thus decays rapidly with increasing distance in a random medium. The fourth moment is used to describe intensity fluctuations by determining their variance and spatial-temporal spectrum. However, the corresponding equations do not have full analytical solutions for all ranges of the parameters occurring in the problems. Only asymptotic solutions for the regions of weak and saturated intensity fluctuations have been found. These asymptotic expressions for the fourth statistical field moments were derived by various asymptotic methods applied directly to the moment equations (e.g., Gochelashvili and Shishov [1971, 1974], Yakushkin [1975, 1976]). Significant efforts were made to obtain a numerical solution that may complement the description in the intermediate regime. Most of the numerical solutions presented until now use two-dimensional models of the medium in order to study salient effects. The full-dimensional numerical solution of the fourth-moment parabolic equation was reliably obtained by Gurvich, Elepov, Pokasov, Sabel’feld and Tatarskii [1979a,b] and Martin and Flatté [1988, 1990], as a result of numerical simulation.
Apart from the methods mentioned above, which use differential equations for the statistical moments of the field, there is an alternative method that uses Feynman path integrals, or functional integrals. Functional integration implies integration over some measure. A special measure in the space of continuous functions that gives the infinite-dimensional probability distribution of trajectories of a Brownian particle (Einstein [1905]) was thoroughly studied by Wiener [1923, 1924] (see, e.g., the book by Kac [1959]). Feynman [1948] then introduced the complex measure and the corresponding functional (path) integration in his formulation of non-relativistic quantum theory. Feynman path integrals are now used widely in quantum mechanics, quantum electrodynamics, and statistical mechanics. The best known review papers and books in the field are those by Feynman [1958], Kac [1959], Gel'fand and Yaglom [1960], Feynman and Hibbs [1965], and Fradkin [1966]. The application to wave propagation in random media was first suggested by Klyatskin and Tatarskii [1970b], who presented the solution of both the Helmholtz and the parabolic equation in the form of a Feynman path integral. The method was used for analogous purposes in the works of Molyneux [1971b] and Chow [1972, 1975]. Using asymptotic methods for the evaluation of Feynman path integrals, Zavorotny, Klyatskin and Tatarskii [1977] and Dashen [1979] derived expressions for arbitrary moments of the wave intensity in the regime of saturated fluctuations. The result obtained there for the fourth moment was the same as that obtained in the works of Gochelashvili and Shishov [1971, 1974] and Yakushkin [1975, 1976], who solved the differential equation for this moment. The same asymptotic result for higher statistical moments was obtained by Yakushkin [1978] using the conventional method of Green functions to produce the solution of the moment equation. The above-mentioned asymptotic approaches were improved by using the path-integral technique in a paper by Dubovikov [1984], in which the rigorous asymptotic expansion for the fourth moment was obtained.
The path-integral technique was also used:
(i) by Zavorotny [1978] for deriving corrections to the intensity moments calculated in the Markov approximation and for calculations of two-frequency intensity correlations (Zavorotny [1981]);
(ii) by Rose and Besieris [1979] for analyzing Nth-order multifrequency coherence functions;
(iii) by Flatté, Dashen, Munk, Watson, and Zachariasen [1979], Flatté, Bernstein and Dashen [1983], and Dashen, Flatté and Reynolds [1985] for sound propagation in the ocean;
(iv) by Codona, Creamer, Flatté, Frehlich and Henyey [1986a] for describing wave propagation including anisotropy in a random medium and a deterministic background refractive index;
(v) by Codona, Creamer, Flatté, Frehlich and Henyey [1986b] for deriving the fourth moment of the Green function;
(vi) by Codona and Frehlich [1987], Frehlich [1987], and Charnotskii [1991] for describing scintillations from extended incoherent sources;
(vii) by Lukin and Charnotskii [1985] for the problem of phase conjugating systems working in random media; and
(viii) by Tatarskii and Zavorotny [1986] and Uscinski, Macaskill and Spivak [1986] for deriving various approximate expressions for the statistical moments.
The last application is very important because, despite its apparent complexity, the path-integral technique provides valuable physical insight and is very fruitful in producing new, useful approximations. Moreover, the technique is a tool for understanding how some known but not well-grounded approximations work and what their applicability limits are. Some investigators prefer to use heuristic analytical approximations instead of more substantiated asymptotic expressions for statistical moments, because the heuristic approximations are supposed to work with acceptable accuracy for all displacements and all scattering strength values in a medium. These approximations yield the expression for a statistical moment in the form of an ordinary finite-fold integral that can be calculated analytically or numerically. One of them is the phase approximation of the extended Huygens-Fresnel principle (in Russian literature, this is the phase approximation of the Huygens-Kirchhoff method), which was first used in a paper by Feyzulin and Kravtsov [1967] (see also Lutomirskii and Yura [1971], Yura [1972], Lee, Holmes and Kerr [1976], Fante [1975, 1985]). Tur and Beran [1983] provided a comparison between the results of this approximation and numerical solutions for the case of a two-dimensional random medium. Other, similar, approaches were proposed using plane wave expansions instead of spherical wave expansions (Gochelashvili [1971, 1974], Aksenov and Mironov [1978, 1979]). Mironov [1981] gives a detailed review of these methods in his book. Recently, another approximate approach has been developed for wave propagation in random media which also yields the expression for a statistical moment in the form of an ordinary finite-fold integral. This approach, called the two-scale expansion technique, was proposed by Frankenthal, Beran and Whitman [1982], Beran, Whitman and Frankenthal [1982], Mazar and Beran [1982], Frankenthal, Whitman and Beran [1984], and Macaskill [1983].
Gozani [1987, 1988, 1991] developed another version of the two-scale expansion and showed that it is not an asymptotic technique, as was previously thought. Uscinski [1982, 1985] obtained the same zero-order approximation using an iterative procedure based on the moment equation in the spectral domain. These approaches were applied to various problems of intensity fluctuations of waves propagating in the ocean and atmosphere (Whitman and Beran [1985], Mazar, Gozani and Tur [1985], Gozani [1987, 1988], Beran and Mazar [1987], Whitman and Beran [1988], Whitman and Beran [1992]). Of course, the results produced by all these approximate methods cannot be considered satisfactory for all cases, because sometimes they give controversial or even wrong results. Therefore, their physical justification and the limits of validity should be obtained. To this end, the present article also contains an examination of the connection between the approximate expressions and the exact solutions written in the form of path integrals.
§ 2. Problem Formulation and Governing Equations

2.1. PARABOLIC EQUATION FOR A FIELD IN A RANDOM MEDIUM
Before turning to the path-integral formulation, let us review the basic postulates of wave propagation theory in random media. It is known that for large-scale inhomogeneities (in comparison with the wavelength $\lambda$), it is possible to neglect the backscattering and depolarization of waves. In this case the scalar parabolic equation for the field envelope of a wave propagating in the positive direction of the z-axis is valid (Leontovich [1944], Leontovich and Fock [1946], Fock [1950], Rytov, Kravtsov and Tatarskii [1988]):

$$2ik\,\frac{\partial u}{\partial z} + \Delta u + k^2\,\tilde{\varepsilon}(r, z)\,u = 0, \tag{2.1}$$

where $r = (x, y)$ is the transverse position vector, $\Delta = \partial^2/\partial x^2 + \partial^2/\partial y^2$, $k = 2\pi/\lambda = \omega\sqrt{\bar{\varepsilon}}/c$, $\omega$ is the wave frequency, $\tilde{\varepsilon}(r, z) = (\varepsilon - \bar{\varepsilon})/\bar{\varepsilon}$, $\varepsilon$ is the permittivity of the medium, and the overbar denotes averaging over fluctuations in the medium. The complex envelope $u(r, z)$ of the electric field $E(r, z, t)$ is introduced by $E(r, z, t) = \exp(ikz - i\omega t)\,u(r, z)$. The solution of eq. (2.1) satisfies the initial condition in the plane $z = Z_0$,

$$u(r, z = Z_0) = u_0(r), \tag{2.2}$$

where $u_0(r)$ is the field distribution in the plane $z = Z_0$. In the case of a point source at $(R_0, Z_0)$, we take
$$u_0(r) = \delta(r - R_0), \tag{2.3}$$

where $\delta(r)$ is the two-dimensional Dirac delta function. We denote the field of a spherical wave at the point $(R, Z)$ as $G(R, Z; R_0, Z_0)$. This is the Green function for eqs. (2.1) and (2.2). The field satisfying the general initial condition (2.2) can therefore be represented as the superposition of the fields created by the point sources in the initial plane; this is called the spherical wave expansion (SWE) of the solution:

$$u(R, Z) = \int G(R, Z; R_0, Z_0)\,u_0(R_0)\,d^2R_0. \tag{2.4}$$

It is well known that the Green function is subject to the reciprocity principle; that is, it is invariant under interchange of the source and observation points:

$$G(R, Z; R_0, Z_0) = G(R_0, Z_0; R, Z). \tag{2.5}$$
Introducing the Fourier decomposition of the initial field,

$$u_0(R_0) = \int \hat{u}_0(P_0)\,\exp(iP_0 \cdot R_0)\,d^2P_0, \tag{2.6}$$

which is simply a plane wave expansion of the initial conditions, we can represent eq. (2.4) as

$$u(R, Z) = \int \tilde{G}(R, Z; P_0, Z_0)\,\hat{u}_0(P_0)\,d^2P_0, \tag{2.7}$$

where

$$\tilde{G}(R, Z; P_0, Z_0) = \int G(R, Z; R_0, Z_0)\,\exp(iP_0 \cdot R_0)\,d^2R_0. \tag{2.8}$$
According to eq. (2.4), $\tilde{G}(R, Z; P_0, Z_0)$ is the field of the original plane wave with a transverse wave vector $P_0$, travelling from the plane $z = Z_0$ to the plane $z = Z$. Therefore, we call eq. (2.7) the outgoing plane wave expansion (OPWE) of the field. The Fourier spectrum of the field in the observation plane is defined by

$$\hat{u}(P, Z) = (2\pi)^{-2} \int u(R, Z)\,\exp(-iP \cdot R)\,d^2R. \tag{2.9}$$

Using eq. (2.4), eq. (2.9) can be represented in the form

$$\hat{u}(P, Z) = (2\pi)^{-2} \int \hat{G}(P, Z; R_0, Z_0)\,u_0(R_0)\,d^2R_0, \tag{2.10}$$

where

$$\hat{G}(P, Z; R_0, Z_0) = \int G(R, Z; R_0, Z_0)\,\exp(-iP \cdot R)\,d^2R. \tag{2.11}$$

Because of the reciprocity principle, $\hat{G}(P, Z; R_0, Z_0)$ is the field of an originally plane wave with a transverse wave vector $P$, travelling from the plane $z = Z$ to the plane $z = Z_0$. We call eq. (2.10) the incoming plane wave expansion (IPWE).
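As a concrete illustration of the three expansions, one can take the free-space case $\tilde{\varepsilon} = 0$, which the text does not write out. The formulas below are a sketch under that assumption, using the standard Fresnel kernel of the parabolic equation (2.1); the shorthand $L = Z - Z_0$ and the subscript 0 on the kernels are introduced here only for this example.

$$G_0(R, Z; R_0, Z_0) = \frac{k}{2\pi i L}\,\exp\!\left[\frac{ik\,(R - R_0)^2}{2L}\right],$$

$$\tilde{G}_0(R, Z; P_0, Z_0) = \exp\!\left[iP_0 \cdot R - \frac{iP_0^2 L}{2k}\right], \qquad \hat{G}_0(P, Z; R_0, Z_0) = \exp\!\left[-iP \cdot R_0 - \frac{iP^2 L}{2k}\right].$$

Direct substitution shows that $G_0$ satisfies eq. (2.1) with $\tilde{\varepsilon} = 0$, and that each plane-wave component in eq. (2.7) simply acquires the phase $-P_0^2 L/2k$ over the path. In a random medium the kernels are no longer known in closed form, which is what motivates the moment equations and path-integral representations that follow.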
Equations (2.4), (2.7) or (2.10) permit us to reduce the solution of any problem with arbitrary initial conditions to a plane or spherical wave problem.

2.2. MOMENT EQUATIONS
On the basis of eq. (2.1), equations for the statistical moments,

$$\Gamma_{nm}(r_1, \ldots, r_n; r'_1, \ldots, r'_m, z) = \overline{u(r_1, z) \cdots u(r_n, z)\,u^*(r'_1, z) \cdots u^*(r'_m, z)}, \tag{2.12}$$

were derived, where the overbar denotes the ensemble average over the fluctuations in the medium and $*$ denotes complex conjugation. The equation for $\Gamma_{nm}$, where $n$ and $m$ are arbitrary, was obtained independently by different investigators (Chernov [1968, 1969], Dolin [1968], Shishov [1968], Beran and Ho [1968], Tatarskii [1969], Klyatskin and Tatarskii [1970a], Klyatskin [1970], Molyneux [1971a], Brown [1972]). It was shown (Tatarskii [1969], Klyatskin [1970]) that this equation corresponds to the Markov approximation, for which the actual correlation function of $\tilde{\varepsilon}$,

$$B_\varepsilon(r, z; r', z') = \overline{\tilde{\varepsilon}(r, z)\,\tilde{\varepsilon}(r', z')},$$

is replaced by the effective function

$$B_\varepsilon^{\mathrm{eff}}(r, z; r', z') = \delta(z - z')\,A(r - r', z), \tag{2.13}$$

where

$$A(r - r', z) = \int_{-\infty}^{\infty} B_\varepsilon(r, z; r', z')\,dz'.$$

For Gaussian fluctuations of $\tilde{\varepsilon}$, the equation for $\Gamma_{nm}$ assumes the form (see, e.g., Rytov, Kravtsov and Tatarskii [1988])

$$\frac{\partial \Gamma_{nm}}{\partial z} - \frac{i}{2k}\left[\Delta_1 + \cdots + \Delta_n - \Delta'_1 - \cdots - \Delta'_m\right]\Gamma_{nm} + \tfrac{1}{8}k^2\,Q_{nm}\,\Gamma_{nm} = 0, \tag{2.14}$$

where

$$Q_{nm} = \sum_{i,j=1}^{n} A(r_i - r_j, z) - 2\sum_{i=1}^{n}\sum_{j=1}^{m} A(r_i - r'_j, z) + \sum_{i,j=1}^{m} A(r'_i - r'_j, z), \tag{2.15}$$

with the appropriate initial condition

$$\Gamma_{nm}(r_1, \ldots, r_n; r'_1, \ldots, r'_m, z = Z_0) = \Gamma^0_{nm}(r_1, \ldots, r_n; r'_1, \ldots, r'_m). \tag{2.16}$$
In practice, statistical moments with $n = m$ are more significant than those with $n \neq m$, but moments for which $n = m \geq 3$ are less tractable than those for which $n = m = 1, 2$. The equation for $\Gamma_{10}(r, z) = \overline{u(r, z)}$ can be written as (Tatarskii [1969])

$$\frac{\partial \Gamma_{10}}{\partial z} - \frac{i}{2k}\,\Delta\Gamma_{10} + \tfrac{1}{8}k^2 A(0, z)\,\Gamma_{10} = 0, \tag{2.17}$$

and obviously has a simple analytical solution

$$\Gamma_{10}(r, Z) = \Gamma^0_{10}(r, Z)\,\exp\!\left[-\frac{k^2}{8}\int_{Z_0}^{Z} A(0, z)\,dz\right], \tag{2.18}$$

where $\Gamma^0_{10}(r, Z)$ is the free-space solution. The function $\Gamma_{10}$ describes the coherent, unscattered part of the wave field. It decays quickly with range $Z$ and is usually not of great interest. The same is also true for all $\Gamma_{nm}$ if $n \neq m$. The equation for $\Gamma_{11}(r, r', z) = \overline{u(r, z)\,u^*(r', z)}$ can be written as

$$\frac{\partial \Gamma_{11}}{\partial z} - \frac{i}{2k}(\Delta - \Delta')\Gamma_{11} + \tfrac{1}{4}\pi k^2 H(r - r', z)\,\Gamma_{11} = 0,$$

$$\Gamma_{11}(r, r', z = Z_0) = \Gamma^0_{11}(r, r'), \qquad \pi H(r, z) = A(0, z) - A(r, z). \tag{2.19}$$

This equation has a closed-form analytical solution,

$$\Gamma_{11}(r, r', Z) = \frac{k^2}{4\pi^2 (Z - Z_0)^2} \int\!\!\int dr_0\,dr'_0\;\Gamma^0_{11}(r_0, r'_0)\,\exp\!\left\{\frac{ik}{2(Z - Z_0)}\left[(r - r_0)^2 - (r' - r'_0)^2\right] - \tfrac{1}{2}D(r_0 - r'_0, Z_0; r - r', Z)\right\}, \tag{2.20}$$

where

$$D(r_0, Z_0; r, Z) = \tfrac{1}{2}\pi k^2 \int_{Z_0}^{Z} H\!\left(r\,\frac{z - Z_0}{Z - Z_0} + r_0\,\frac{Z - z}{Z - Z_0},\; z\right) dz. \tag{2.21}$$

Here, $D$ coincides with the complex phase structure function that appears in smooth perturbation theory (Chernov [1960], Tatarskii [1961]). The function $\Gamma_{11}$ describes the mean intensity distribution of wave beams, $\bar{I}(r, Z) = \Gamma_{11}(r, r, Z)$, and their angular spectrum, which determines the resolution and image quality. Also, Dolin [1964] and Walther [1973] have found
the connection between coherence theory, based on $\Gamma_{11}$, and the phenomenological radiative transfer theory (see also Rytov, Kravtsov and Tatarskii [1988]). The fourth-moment function $\Gamma_{22}$ has been the focus of considerable interest in the last two decades. This function, which describes the correlation of intensity fluctuations necessary in optical measurements and radio astronomy, satisfies the equation

$$\frac{\partial \Gamma_{22}}{\partial z} = \frac{i}{2k}\left[\Delta_1 + \Delta_2 - \Delta'_1 - \Delta'_2\right]\Gamma_{22} - \tfrac{1}{4}\pi k^2 F(r_1, r_2, r'_1, r'_2, z)\,\Gamma_{22},$$

$$\Gamma_{22}(r_1, r_2, r'_1, r'_2, z = 0) = \Gamma^0_{22}(r_1, r_2, r'_1, r'_2), \tag{2.22}$$

where $F$ is called the scattering function:

$$F(r_1, r_2, r'_1, r'_2, z) = H(r_1 - r'_1, z) + H(r_2 - r'_2, z) + H(r_1 - r'_2, z) + H(r_2 - r'_1, z) - H(r_1 - r_2, z) - H(r'_1 - r'_2, z). \tag{2.23}$$
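If we introduce the new variables

$$R = \tfrac{1}{4}(r_1 + r_2 + r'_1 + r'_2), \qquad \rho = r_1 + r_2 - r'_1 - r'_2,$$

$$\tilde{r}_1 = \tfrac{1}{2}(r_1 - r_2 - r'_1 + r'_2), \qquad \tilde{r}_2 = \tfrac{1}{2}(r_1 - r_2 + r'_1 - r'_2),$$

and denote $\Gamma_{22}(r_1, r_2, r'_1, r'_2, z) = \Gamma_4(R, \rho, \tilde{r}_1, \tilde{r}_2, z)$, we obtain from eq. (2.22) the equation (omitting the tilde)

$$\frac{\partial \Gamma_4}{\partial z} = \frac{i}{k}\left[\nabla_R \cdot \nabla_\rho + \nabla_1 \cdot \nabla_2\right]\Gamma_4 - \tfrac{1}{4}\pi k^2 F_4(r_1, r_2, \rho, z)\,\Gamma_4, \tag{2.24}$$

where

$$F_4(r_1, r_2, \rho, z) = H(r_1 + \tfrac{1}{2}\rho, z) + H(r_1 - \tfrac{1}{2}\rho, z) + H(r_2 + \tfrac{1}{2}\rho, z) + H(r_2 - \tfrac{1}{2}\rho, z) - H(r_1 + r_2, z) - H(r_1 - r_2, z). \tag{2.25}$$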
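The equivalence of the scattering function (2.23) and its transformed version (2.25) under the change of variables can be spot-checked numerically. The sketch below assumes a purely illustrative even function $H(r) = |r|^{5/3}$ with the $z$ dependence suppressed; this choice and all helper names are hypothetical and not taken from the source.

```python
import math


def H(v):
    """Illustrative even function H(r) = |r|**(5/3); an assumption of this sketch."""
    return math.hypot(v[0], v[1]) ** (5.0 / 3.0)


def comb(*terms):
    """Linear combination of 2-D vectors given as (coefficient, vector) pairs."""
    return (sum(c * v[0] for c, v in terms), sum(c * v[1] for c, v in terms))


def scattering_F(r1, r2, r1p, r2p):
    """Scattering function of eq. (2.23) in the original variables."""
    def diff(a, b):
        return comb((1.0, a), (-1.0, b))
    return (H(diff(r1, r1p)) + H(diff(r2, r2p))
            + H(diff(r1, r2p)) + H(diff(r2, r1p))
            - H(diff(r1, r2)) - H(diff(r1p, r2p)))


def new_variables(r1, r2, r1p, r2p):
    """Change of variables leading to eq. (2.24): (R, rho, r1_tilde, r2_tilde)."""
    R = comb((0.25, r1), (0.25, r2), (0.25, r1p), (0.25, r2p))
    rho = comb((1.0, r1), (1.0, r2), (-1.0, r1p), (-1.0, r2p))
    t1 = comb((0.5, r1), (-0.5, r2), (-0.5, r1p), (0.5, r2p))
    t2 = comb((0.5, r1), (-0.5, r2), (0.5, r1p), (-0.5, r2p))
    return R, rho, t1, t2


def scattering_F4(t1, t2, rho):
    """Scattering function of eq. (2.25) in the new variables."""
    return (H(comb((1.0, t1), (0.5, rho))) + H(comb((1.0, t1), (-0.5, rho)))
            + H(comb((1.0, t2), (0.5, rho))) + H(comb((1.0, t2), (-0.5, rho)))
            - H(comb((1.0, t1), (1.0, t2))) - H(comb((1.0, t1), (-1.0, t2))))


# Four arbitrary transverse points r1, r2, r1', r2'.
points = ((0.3, -1.2), (1.1, 0.4), (-0.7, 0.9), (0.2, 2.0))
_, rho, t1, t2 = new_variables(*points)
difference = abs(scattering_F(*points) - scattering_F4(t1, t2, rho))
```

The agreement relies only on $H$ being even, so any other even test function could be substituted. The same routine also shows that $F$ vanishes when the primed and unprimed points coincide pairwise.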
The important feature of the small-angle approximation for wave propagation in an inhomogeneous medium is the special form of the energy conservation law. One can find (Tatarskii [1971]) that the total energy flux through any plane normal to the propagation direction is constant even for the instantaneous realization of the field; i.e.,

$$\int dr\,I(r, z) = \int dr_0\,I_0(r_0) = \mathrm{const.}, \tag{2.26}$$

where $I(r, z) = u(r, z)\,u^*(r, z)$ is the intensity distribution in any plane $0 < z < Z$. Using the spherical wave expansion (2.4), it can be shown (Klyatskin [1980]) that the necessary and sufficient condition of energy conservation in terms of the Green function is

$$\int G(R, Z; R_0, Z_0)\,G^*(R', Z; R_0, Z_0)\,dR_0 = \delta(R - R'). \tag{2.27}$$

This formula can be treated as an orthogonality relation for the Green function. Averaging of eq. (2.26) obviously leads to $\int dR\,\bar{I}(R, z) = \int dR_0\,I_0(R_0)$ for finite beams, and $\bar{I}(R, z) = I_0 = \mathrm{const.}$ for unbounded waves with a uniform intensity distribution in the initial plane. It is not difficult to verify that eq. (2.20) matches these restrictions. For the fourth field moment, energy conservation results in $\int dR\,dr\,B_I(R, r, z) = 0$ for finite beams and $\int dr\,B_I(r, z) = 0$ for unbounded waves, where $B_I$ is the spatial covariance of the intensity,

$$B_I(R, r, z) = \overline{I(R + \tfrac{1}{2}r, z)\,I(R - \tfrac{1}{2}r, z)} - \bar{I}(R + \tfrac{1}{2}r, z)\,\bar{I}(R - \tfrac{1}{2}r, z),$$

and for plane and spherical waves it does not depend on $R$. It has been shown (Tatarskii [1971]) that the fourth-moment equation (2.24) matches this condition.

2.3. PLANE-WAVE-TYPE FOURTH-MOMENT EQUATIONS
Considering an initial plane wavefront normal to the z-axis, i.e., $u_0(r) = \mathrm{const.}$, simplifies eq. (2.24). The initial condition in this case can be written as $\Gamma_4(R, r_1, r_2, \rho, z = 0) = \mathrm{const}$. It is clear that $\nabla_R \Gamma_4 = 0$ because of the transverse displacement invariance of the equation coefficients and initial conditions. That means that $\rho$ is simply a parameter of the problem, and for most applications related to radiation intensity, we can let $\rho = 0$ in eq. (2.24) and obtain the partial differential equation that we refer to as the plane wave type,

$$\frac{\partial \Gamma_4}{\partial z} = \frac{i}{k}\,\nabla_1 \cdot \nabla_2\,\Gamma_4 - \tfrac{1}{4}\pi k^2 F_4(r_1, r_2, z)\,\Gamma_4, \tag{2.28}$$

with the initial condition $\Gamma_4(r_1, r_2, z = 0) = 1$. Here,

$$\Gamma_4(r_1, r_2, z) = \overline{u(0, z)\,u(r_1 + r_2, z)\,u^*(r_1, z)\,u^*(r_2, z)},$$

$$F_4(r_1, r_2, z) = 2H(r_1, z) + 2H(r_2, z) - H(r_1 + r_2, z) - H(r_1 - r_2, z). \tag{2.29}$$
Shift-invariant initial conditions can be realized not only for the plane wave case but also for the more general case of partially coherent uniform waves. It can also be shown that eq. (2.28) describes the case of the fourth moment for an initial plane wave having a tilt relative to the z-axis, and the case of the mutual coherence function for four plane waves with different slopes. It is possible to simplify the problem significantly for arbitrary initial conditions that obey the full (five-argument) fourth-moment equation (2.24). Let us demonstrate this by making use of the shifted Fourier transform (Tatarskii [1971]). We introduce the Fourier spectrum of the fourth moment with respect to the $R$ argument,

$$\hat{\Gamma}_4(\kappa, r_1, r_2, \rho, z) = (2\pi)^{-2} \int \exp(-i\kappa \cdot R)\,\Gamma_4(R, r_1, r_2, \rho, z)\,d^2R. \tag{2.30}$$

This function satisfies the equation (Tatarskii [1971], Tur and Beran [1982])

$$\frac{\partial \hat{\Gamma}_4}{\partial z} + \frac{1}{k}\,\kappa \cdot \nabla_\rho\,\hat{\Gamma}_4(\kappa, r_1, r_2, \rho, z) = \frac{i}{k}\,\nabla_1 \cdot \nabla_2\,\hat{\Gamma}_4 - \tfrac{1}{4}\pi k^2 F_4(r_1, r_2, \rho, z)\,\hat{\Gamma}_4, \tag{2.31}$$

with the initial condition being the same transformation of $\Gamma^0_4$. Equation (2.31) can be transformed to the plane wave type (Tur and Beran [1982]),

$$\frac{\partial P_4}{\partial z} = \frac{i}{k}\,\nabla_1 \cdot \nabla_2\,P_4 - \tfrac{1}{4}\pi k^2 F_4(r_1, r_2, \rho + \kappa z/k, z)\,P_4(\kappa, r_1, r_2, \rho, z), \tag{2.32}$$

after substituting $\hat{\Gamma}_4(\kappa, r_1, r_2, \rho, z) = P_4(\kappa, r_1, r_2, \rho - \kappa z/k, z)$ and changing variables. The dependence on $\rho$ and $\kappa$ is parametric, so we can set either of them equal to zero. When $\kappa = 0$, eq. (2.30) becomes an average with respect to $R$,

$$\hat{\Gamma}_4(0, r_1, r_2, \rho, z) = (2\pi)^{-2} \int \Gamma_4(R, r_1, r_2, \rho, z)\,d^2R. \tag{2.33}$$

This function was shown to be useful in characterizing the evolution of the spatial moments of a finite beam (Tatarskii [1971]).
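The reduction of eq. (2.31) to eq. (2.32) rests on a chain-rule step that is not spelled out above. A brief sketch, writing $\rho' = \rho - \kappa z/k$ for the shifted argument (a symbol introduced here only for this illustration) and $\hat{\Gamma}_4(\kappa, r_1, r_2, \rho, z) = P_4(\kappa, r_1, r_2, \rho', z)$:

$$\frac{\partial \hat{\Gamma}_4}{\partial z} = \frac{\partial P_4}{\partial z} - \frac{1}{k}\,\kappa \cdot \nabla_{\rho'} P_4, \qquad \frac{1}{k}\,\kappa \cdot \nabla_\rho\,\hat{\Gamma}_4 = \frac{1}{k}\,\kappa \cdot \nabla_{\rho'} P_4.$$

Adding the two expressions, the drift terms cancel and the left-hand side of eq. (2.31) collapses to $\partial P_4/\partial z$. The scattering function is then evaluated at $\rho = \rho' + \kappa z/k$, and renaming $\rho'$ back to $\rho$ gives the plane-wave-type form (2.32), in which $\rho$ and $\kappa$ enter only as parameters.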
2.4. GENERALIZATION OF THE PROBLEM
Certain applied problems considered by wave propagation theory really require only knowledge of the field moments. Examples are the mean intensity profile of a laser beam, the scintillation index of a beam or wave, and the covariance of the intensity and higher intensity moments. Sometimes the specifics of the applied problem necessarily lead to an analysis of some power functionals of fields. Here we outline briefly some examples. The energy flux through a finite-sized aperture (or a signal detected by a finite-sized receiver) in the plane $z = Z$ can be written as

$$P = \int d^2R\;A^2(R)\,u(R, Z)\,u^*(R, Z), \tag{2.34}$$
where $A(R)$ is the aperture field transparency function. The spatial spectrum of the instantaneous intensity of the beam can also be written in the form (2.34) if we let $A(R) = \exp(i\kappa \cdot R)$. The instantaneous intensity distribution in the image plane of an optical system can be written as

$$I_{\mathrm{im}}(R) = \int\!\!\int d^2r\,d^2r'\;A(r)\,A(r')\,u(r, Z)\,u^*(r', Z)\,\exp\!\left[ikR \cdot (r - r')/|Z - Z_0|\right], \tag{2.35}$$
where $u(r', Z)$ is the field in the entrance plane of the aperture, and it can be treated as a generalization of eq. (2.34). The spherical wave expansion (2.4) lets us separate not only the initial conditions but also the operations performed on the field in the final plane from the propagation terms described by the Green function $G$. For example, the image intensity distribution (2.35) can be written as

$$I_{\mathrm{im}}(R) = \int\!\!\int\!\!\int\!\!\int d^2r\,d^2r'\,d^2r_0\,d^2r'_0\;u_0(r_0)\,u_0^*(r'_0)\,G(r, Z; r_0, Z_0)\,G^*(r', Z; r'_0, Z_0)\,A(r)\,A(r')\,\exp\!\left[ikR \cdot (r - r')/|Z - Z_0|\right]. \tag{2.36}$$
The Green function representation is also convenient for small-angle propagation problems in which light is scattered or reflected by a body embedded in a random medium and propagates backward to the source plane. In this case we can use the reciprocity principle (2.5) to describe the field at the point $(r'_0, Z_0)$, reflected by an object placed in the plane $z = Z$ and having a reflectivity distribution $O(r)$:

$$u_{\mathrm{ref}}(r'_0) = \int\!\!\int d^2r\,d^2r_0\;u_0(r_0)\,G(r, Z; r'_0, Z_0)\,G(r, Z; r_0, Z_0)\,O(r). \tag{2.37}$$
The functional form of eqs. (2.36) and (2.37) leads us to a generalization of the problem formulated by eqs. (2.1) and (2.2), and eqs. (2.14) and (2.16). We examine instantaneous values and averages of power-type functionals containing Green functions, related to some slab $Z_0 \leq z \leq Z$ of a random medium, having the form

$$P_{nn}\{G\} = \int d^2r_1 \cdots d^2r_n\,d^2r'_1 \cdots d^2r'_n \int d^2\rho_1 \cdots d^2\rho_n\,d^2\rho'_1 \cdots d^2\rho'_n\; W(\rho_1, \ldots, \rho_n, \rho'_1, \ldots, \rho'_n)\,\mathcal{Y}(r_1, \ldots, r_n, r'_1, \ldots, r'_n)$$
$$\times\; G(r_1, Z; \rho_1, Z_0) \cdots G(r_n, Z; \rho_n, Z_0)\,G^*(r'_1, Z; \rho'_1, Z_0) \cdots G^*(r'_n, Z; \rho'_n, Z_0). \tag{2.38}$$

For example, eq. (2.36) corresponds to $n = 1$ when

$$W(\rho_1, \rho'_1) = u_0(\rho_1)\,u_0^*(\rho'_1), \tag{2.39a}$$

and

$$\mathcal{Y}(r_1, r'_1) = A(r_1)\,A(r'_1)\,\exp\!\left[ikR \cdot (r_1 - r'_1)/|Z - Z_0|\right]. \tag{2.39b}$$

We will further limit ourselves to $n \leq 2$ and, for the sake of simplicity, set $Z_0 = 0$, $Z > 0$ and omit the $Z_0$ dependence of a Green function.
§ 3. Introduction to Path Integrals

3.1. DERIVATION OF PATH-INTEGRAL REPRESENTATION OF THE PARABOLIC EQUATION SOLUTION
au
-=
at
aZu
D 7- V(X, t ) U(X,
ax
t),
U(X, 0) = u’(x).
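Here $D$ is a complex number, with $\operatorname{Re} D > 0$; $V(x, t)$ is an arbitrary “good” function. This diffusion equation was studied from a probabilistic point of view by Einstein, Smoluchowski, Wiener and Kac (see, e.g., Kac [1959]). The following method of deriving the path-integral representation for the solution of eq. (3.1) can be found, e.g., in the review paper by Tatarskii and Zavorotny [1980]. We integrate eq. (3.1) from $t = t_k$ to $t_{k+1} = t_k + \Delta t$ and obtain

$$u(x, t_{k+1}) = \exp\left[\Delta t\, D\, \frac{\partial^2}{\partial x^2} - \Delta t\, V(x, t_k) + o(\Delta t)\right] u(x, t_k). \tag{3.2}$$

Equation (3.2) can be verified by expansion of the exponent in a Taylor series. It can also be represented as

$$u(x, t_{k+1}) = \exp[-\Delta t\, V(x, t_k) + o(\Delta t)]\, \exp\left(\Delta t\, D\, \frac{\partial^2}{\partial x^2}\right) u(x, t_k),$$

because the commutator

$$\left[\Delta t\, D\, \frac{\partial^2}{\partial x^2},\; \Delta t\, V(x, t_k)\right] = O(\Delta t^2).$$

Hence, we can consider these operators as commuting with an accuracy of $O(\Delta t^2)$. Using the Fourier representation of the δ-function, we can write

$$u(x, t_k) = \frac{1}{2\pi} \int d\kappa \int dx'\, \exp[i\kappa(x - x')]\, u(x', t_k).$$

It is easy to calculate the action of the operator $\exp(\Delta t\, D\, \partial^2/\partial x^2)$ on this expression because the operator then takes the form $\exp(-\Delta t\, D\kappa^2)$. We can then calculate the integral over $\kappa$. If $\operatorname{Re} D > 0$, this integral converges and we obtain the simple result

$$\exp\left(\Delta t\, D\, \frac{\partial^2}{\partial x^2}\right) u(x, t_k) = (4\pi D \Delta t)^{-1/2} \int dx'\, \exp\left[-\frac{(x - x')^2}{4 D \Delta t}\right] u(x', t_k). \tag{3.3}$$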
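We then substitute this expression into eq. (3.2) and write $x = x_{k+1}$, $x' = x_k$. We can introduce the factor $\exp[-\Delta t\, V(x, t_k)]$ under the integral sign because it does not depend on $x' = x_k$. The final result is given by

$$u(x_{k+1}, t_{k+1}) = (4\pi D \Delta t)^{-1/2} \int dx_k\, \exp\left[-\frac{(x_{k+1} - x_k)^2}{4 D \Delta t} - \Delta t\, V(x_{k+1}, t_k) + o(\Delta t)\right] u(x_k, t_k). \tag{3.4}$$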
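Thus, we use eq. (3.4) recursively to obtain values of $u$ for the next point, given that the values of $u$ are known for the previous point. Let $x_N = X$, $t_N = T$ be the final point, $t_0 = 0$ be the initial point and $u(x, 0) = u_0(x)$ be known. We then obtain for $\Delta t = T/N$,

$$\begin{aligned} u(X, T) = {} & (4\pi D \Delta t)^{-N/2} \int dx_{N-1} \cdots \int dx_0\, \exp\Big\{ -\frac{1}{4 D \Delta t} \big[ (x_N - x_{N-1})^2 + (x_{N-1} - x_{N-2})^2 + \cdots + (x_1 - x_0)^2 \big] \\ & - \Delta t\, \big[ V(X, t_{N-1}) + V(x_{N-1}, t_{N-2}) + \cdots + V(x_1, 0) \big] + N\, o(\Delta t) \Big\}\, u_0(x_0). \end{aligned} \tag{3.5}$$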
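Taking the limit of eq. (3.5) as $N \to \infty$ and $N\, o(\Delta t) \to 0$ gives the solution to eq. (3.1). The limits of the expressions in the exponent of eq. (3.5) are the following:

$$\lim_{N \to \infty} \frac{1}{\Delta t} \sum_{k=0}^{N-1} (x_{k+1} - x_k)^2 = \int_0^T [\dot{x}(\tau)]^2\, d\tau, \qquad \lim_{N \to \infty} \Delta t \sum_{k=0}^{N-1} V(x_{k+1}, t_k) = \int_0^T V[x(\tau), \tau]\, d\tau.$$

Here we have introduced the continuous function $x(\tau)$ in such a manner that $x(t_k) = x_k$, $x(T) = X$. We also use the notation

$$\mathcal{D}x(\tau) = \lim_{N \to \infty} \prod_{k=0}^{N-1} \frac{dx_k}{(4\pi D \Delta t)^{1/2}}. \tag{3.6}$$

Using all these notations, we can represent eq. (3.5) in the formally closed form

$$u(X, T) = \int^{x(T) = X} \mathcal{D}x(\tau)\, u_0[x(0)]\, \exp\left\{ -\frac{1}{4D} \int_0^T [\dot{x}(\tau)]^2\, d\tau - \int_0^T V[x(\tau), \tau]\, d\tau \right\}, \tag{3.7}$$

where $\dot{x}(\tau) = dx(\tau)/d\tau$. From eq. (3.6) it follows that

$$\int^{x(T) = X} \mathcal{D}x(\tau)\, \exp\left\{ -\frac{1}{4D} \int_0^T [\dot{x}(\tau)]^2\, d\tau \right\} = 1. \tag{3.8}$$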
The right-hand side of eq. (3.7) is known as a functional, or path, integral (Feynman [1948, 1958], Kac [1959], Gel'fand and Yaglom [1960], Feynman and Hibbs [1965]). If $D = D^*$, this functional integral is the Wiener integral (Kac [1959]). If $D$ is pure imaginary, it is the Feynman integral in a configurational (coordinate) space. For wave propagation problems, $D$ is a complex number; when $\operatorname{Re} D \ll \operatorname{Im} D$, the properties of this integral are more similar to those of the Feynman integral.
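As an illustration only, and not part of the original derivation, the one-step recursion of eq. (3.4) can be applied repeatedly on a grid, which is the finite-N form of eq. (3.5). The Python sketch below assumes a uniform grid, a rectangle-rule quadrature for each integral over $x_k$, real $D > 0$, and a Gaussian initial condition $u_0(x) = \exp(-x^2/2a)$, for which the free solution ($V = 0$) is $[a/(a + 2DT)]^{1/2} \exp[-x^2/2(a + 2DT)]$. All function names and parameter values are hypothetical choices made for this example.

```python
import numpy as np

def time_sliced_solution(u0, V, D, T, N, x):
    """Apply the one-step formula N times on a uniform grid x (real D > 0 assumed)."""
    dx = x[1] - x[0]
    dt = T / N
    # Free one-step kernel (4 pi D dt)^(-1/2) exp[-(x_{k+1} - x_k)^2 / (4 D dt)]
    diff = x[:, None] - x[None, :]
    kernel = np.exp(-diff**2 / (4.0 * D * dt)) / np.sqrt(4.0 * np.pi * D * dt)
    u = u0(x).astype(complex)
    for k in range(N):
        # Rectangle-rule integral over x_k, then the factor exp[-dt V(x_{k+1}, t_k)]
        u = np.exp(-dt * V(x, k * dt)) * (kernel @ u) * dx
    return u

def free_gaussian(x, a, D, T):
    """Closed-form solution for V = 0 and u0(x) = exp(-x^2 / (2 a))."""
    return np.sqrt(a / (a + 2.0 * D * T)) * np.exp(-x**2 / (2.0 * (a + 2.0 * D * T)))

# Assumed example parameters (not taken from the text)
x = np.linspace(-12.0, 12.0, 601)
a, D, T, N = 1.0, 0.5, 1.0, 20
u_num = time_sliced_solution(lambda s: np.exp(-s**2 / (2.0 * a)),
                             lambda s, t: np.zeros_like(s), D, T, N, x)
u_ref = free_gaussian(x, a, D, T)
max_err = np.max(np.abs(u_num - u_ref))
```

With a constant potential $V_0$ the same routine should reproduce the free solution multiplied by $\exp(-V_0 T)$, which gives a second simple consistency check.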
3.2. UNCONDITIONAL AND CONDITIONAL PATH INTEGRALS
Following Feynman, we call the type of path integral exemplified by eq. (3.7) the unconditional path integral, although there is one condition on the end of the path included in the definition given by eqs. (3.5) and (3.6). Using the Green function introduced in § 2 for this two-dimensional case, we substitute $u_0(x) = \delta(x - X_0)$ in eq. (3.7) to obtain

$$G(X, T; X_0) = \int^{x(T) = X} \mathcal{D}x(\tau)\, \exp\left\{ -\frac{1}{4D} \int_0^T [\dot{x}(\tau)]^2\, d\tau - \int_0^T V[x(\tau), \tau]\, d\tau \right\} \delta[x(0) - X_0]. \tag{3.9}$$

It is clear that all the paths which really contribute to the integral are restricted by the additional condition $x(0) = X_0$. We may include this restriction in the definition of the integral using, e.g., a discrete formulation analogous to eq. (3.5) with obvious modifications. The result for the Green function can be written as

$$G(X, T; X_0) = \int_{x(0) = X_0}^{x(T) = X} \mathcal{D}x(\tau)\, \exp\left\{ -\frac{1}{4D} \int_0^T [\dot{x}(\tau)]^2\, d\tau - \int_0^T V[x(\tau), \tau]\, d\tau \right\}, \tag{3.10}$$

and the general solution (3.7) takes the form

$$u(X, T) = \int dX_0\, u_0(X_0) \int_{x(0) = X_0}^{x(T) = X} \mathcal{D}x(\tau)\, \exp\left\{ -\frac{1}{4D} \int_0^T [\dot{x}(\tau)]^2\, d\tau - \int_0^T V[x(\tau), \tau]\, d\tau \right\}. \tag{3.11}$$

Following Feynman, we call this type of path integral the conditional path integral. It should be noted that the normalization (3.8) produces some normalization of the conditional path integral. Schematic diagrams of paths for the unconditional and conditional path integrals are shown in figs. 1 and 2, respectively.

Fig. 1. Schematic diagrams of paths for unconditional path integrals.

3.3. THE PROBABILISTIC INTERPRETATION
We see that all the approaches developed in the preceding sections connect fields, or their spectral expansion, between successive planes. The trajectories of the path integral were considered to be deterministic, albeit arbitrary. A different and appealing approach considers the trajectories as Brownian motion and assigns probabilistic weights to the trajectories. The resulting Wiener path integral represents a sum over all these trajectories (Kac [1959]).

Fig. 2. Schematic diagrams of paths for conditional path integrals.
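Let us consider the probability distribution

$$dP(x_0, \ldots, x_{N-1}) = \frac{dx_0}{2\sqrt{\pi D \Delta t}} \cdots \frac{dx_{N-1}}{2\sqrt{\pi D \Delta t}}\, \exp\left\{ -\frac{1}{4 D \Delta t} \sum_{k=0}^{N-1} (x_{k+1} - x_k)^2 \right\}, \tag{3.12}$$

with $D$ real, of the random numbers $x_0, \ldots, x_{N-1}$. Here, $x_N = X$ is the fixed value. This is, of course, the Gaussian probability distribution, and we can use it to represent eq. (3.5) as an expectation of a functional

$$u(X, T) = \left\langle u_0(x_0)\, \exp\left\{ -\Delta t \sum_{k=0}^{N-1} V(x_{k+1}, t_k) + N\, o(\Delta t) \right\} \right\rangle \tag{3.13}$$

over $x_n$ fluctuations, denoted by the brackets $\langle\ \rangle$. In the limit $N \to \infty$, the continuous form of the probability density becomes

$$\mathcal{D}P[x(\tau)] = \exp\left\{ -\frac{1}{4D} \int_0^T \dot{x}^2(\tau)\, d\tau \right\} \mathcal{D}x(\tau), \tag{3.14}$$

and it is properly normalized because of eq. (3.8). Using eq. (3.14), we rewrite the unconditional path integral (3.7) as a conditional expectation,

$$u(X, T) = \left\langle u_0[x(0)]\, \exp\left\{ -\int_0^T V[x(\tau), \tau]\, d\tau \right\} \right\rangle_{x(T) = X}. \tag{3.15}$$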
The subscript $x(T) = X$ denotes that only paths that satisfy the condition $x(T) = X$ are considered. The trajectory $x(\tau)$ in eq. (3.15) is a Gaussian random process, characterized fully by its mean value $\langle x(\tau) \rangle$ and the correlation function $B(\tau', \tau'') = \langle [x(\tau') - \langle x(\tau') \rangle][x(\tau'') - \langle x(\tau'') \rangle] \rangle$. From eq. (3.14), it follows that

$$\langle x(\tau) \rangle = X, \qquad B(\tau', \tau'') = 2D \min(T - \tau', T - \tau''). \tag{3.16}$$
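The statistics (3.16) lend themselves to a direct Monte Carlo check. The following Python sketch is an illustration under stated assumptions rather than a method from the text: it assumes real $D > 0$, builds paths backward from the pinned end point $x(T) = X$ with independent Gaussian increments of variance $2D\,\Delta t$, and estimates the expectation over such paths of $u_0[x(0)] \exp\{-\int V\, d\tau\}$ with a Riemann sum for the integral of $V$. The function names, the Gaussian initial condition and the parameter values are hypothetical.

```python
import numpy as np

def sample_backward_paths(X, D, T, n_steps, n_paths, rng):
    """Paths pinned at x(T) = X; column j holds x(t_j), t_j = j * T / n_steps."""
    dt = T / n_steps
    steps = rng.normal(0.0, np.sqrt(2.0 * D * dt), size=(n_paths, n_steps))
    # x(t_j) = X + (sum of the increments between t_j and T)
    back = np.cumsum(steps[:, ::-1], axis=1)[:, ::-1]
    paths = np.empty((n_paths, n_steps + 1))
    paths[:, :-1] = X + back
    paths[:, -1] = X
    return paths

def feynman_kac(u0, V, X, D, T, n_steps, n_paths, rng):
    """Monte Carlo estimate of <u0[x(0)] exp{-int V dt}> over end-pinned paths."""
    paths = sample_backward_paths(X, D, T, n_steps, n_paths, rng)
    dt = T / n_steps
    t = np.arange(n_steps) * dt
    # Riemann sum with V evaluated at (x_{k+1}, t_k)
    action = dt * np.sum(V(paths[:, 1:], t), axis=1)
    return np.mean(u0(paths[:, 0]) * np.exp(-action))

# Assumed example: V = 0 and u0(x) = exp(-x^2 / (2 a)), which has a closed-form answer
rng = np.random.default_rng(0)
X, D, T, a = 0.7, 0.5, 1.0, 1.0
u0 = lambda x: np.exp(-x**2 / (2.0 * a))
V_zero = lambda x, t: np.zeros_like(x)
estimate = feynman_kac(u0, V_zero, X, D, T, n_steps=50, n_paths=20000, rng=rng)
exact = np.sqrt(a / (a + 2.0 * D * T)) * np.exp(-X**2 / (2.0 * (a + 2.0 * D * T)))
```

The sample variance of the first column of the path array should approach $2DT$, in agreement with $B(0, 0)$ from eq. (3.16). For complex $D$ this sampling has no direct meaning and the sketch does not apply.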
The variance $\sigma^2(\tau) = B(\tau, \tau) = 2D(T - \tau)$ is equal to zero at the point $\tau = T$ and to its maximum value of $2DT$ at the point $\tau = 0$. It corresponds to a Brownian motion beginning at the final point $\tau = T$ and directed “back” (see fig. 3). For the case $D \ne D^*$, the probabilistic interpretation has only a formal meaning. However, a formula obtained using this interpretation will be correct even for the case of complex $D$, if the result has a mathematical meaning for such a $D$. In the case of the conditional path integral (3.9), it is beneficial to include the initial restriction in a continuous form of the probability density function,

$$\mathcal{D}P[x(\tau)] = N^{-1} \exp\left\{ -\frac{1}{4D} \int_0^T \dot{x}^2(\tau)\, d\tau \right\} \delta[x(0) - X_0]\, \mathcal{D}x(\tau). \tag{3.17}$$
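Here $N$ is a normalization coefficient given by

$$N = \int^{x(T) = X} \mathcal{D}x(\tau)\, \exp\left\{ -\frac{1}{4D} \int_0^T [\dot{x}(\tau)]^2\, d\tau \right\} \delta[x(0) - X_0]. \tag{3.18}$$

It is clear that $N$ is simply the well-known free-space Green function of eq. (3.1),

$$N = G_0(X, T; X_0) = (4\pi D T)^{-1/2} \exp\left[ -\frac{(X - X_0)^2}{4DT} \right], \tag{3.19}$$

and we can now represent eq. (3.11) as

$$u(X, T) = \int dX_0\, u_0(X_0)\, G_0(X, T; X_0) \left\langle \exp\left\{ -\int_0^T V[x(\tau), \tau]\, d\tau \right\} \right\rangle_{x(0) = X_0}^{x(T) = X} \tag{3.20}$$

and Green's function as

$$G(X, T; X_0) = G_0(X, T; X_0) \left\langle \exp\left\{ -\int_0^T V[x(\tau), \tau]\, d\tau \right\} \right\rangle_{x(0) = X_0}^{x(T) = X}. \tag{3.21}$$

Fig. 3. The Brownian trajectory in the case of the unconditional path integral.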
We denote the averaging over these random paths by $\langle \ldots \rangle_{x(0) = X_0}^{x(T) = X}$. It can be shown that the new probability density function (3.17) is also a Gaussian one, but it differs from the previous one because both ends of the path are fixed. We can calculate the mean value $\langle x(\tau) \rangle$ and the correlation function $B(\tau', \tau'')$ for this random process. The result is given by

$$\langle x(\tau) \rangle = X_0 + (X - X_0)\, \frac{\tau}{T}, \tag{3.22}$$

$$B(\tau', \tau'') = 2D\, \frac{\min(\tau', \tau'')\, [T - \max(\tau', \tau'')]}{T}. \tag{3.23}$$
The mean value for this conditional process is a linear function. The variance $\sigma^2(\tau) = B(\tau, \tau) = 2D\tau(T - \tau)/T$ vanishes at the ends of the paths (see fig. 4). It is also possible to include the arbitrary initial conditions in the probability density function of paths by

$$\mathcal{D}P[x(\tau)] = N^{-1} \exp\left\{ -\frac{1}{4D} \int_0^T \dot{x}^2(\tau)\, d\tau \right\} u_0[x(0)]\, \mathcal{D}x(\tau). \tag{3.24}$$
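The statistics of the conditional process (both ends fixed) can be checked by sampling. The sketch below is illustrative only: it assumes real $D > 0$ and uses the standard construction of a pinned process from a free Brownian path $w(\tau)$ with variance $2D\tau$, namely $x(\tau) = X_0 + (X - X_0)\tau/T + w(\tau) - (\tau/T)\, w(T)$. Function names and parameter values are hypothetical.

```python
import numpy as np

def sample_bridge_paths(X0, X, D, T, n_steps, n_paths, rng):
    """Sample paths with x(0) = X0 and x(T) = X; returns the time grid and the paths."""
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    steps = rng.normal(0.0, np.sqrt(2.0 * D * dt), size=(n_paths, n_steps))
    # Free Brownian path w(t) with w(0) = 0 and variance 2 D t
    w = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1)
    # Subtract the linear drift to the random end value, then add the straight line X0 -> X
    bridge = w - (t / T) * w[:, -1:]
    return t, X0 + (X - X0) * t / T + bridge

# Assumed example parameters
rng = np.random.default_rng(0)
X0, X, D, T = -1.0, 2.0, 0.5, 2.0
t, paths = sample_bridge_paths(X0, X, D, T, n_steps=40, n_paths=40000, rng=rng)

mid = 20                                    # index of tau = T / 2
var_mid = paths[:, mid].var()               # compare with 2 D (T/2)(T/2) / T = D T / 2
mean_mid = paths[:, mid].mean()             # compare with (X0 + X) / 2
cov_q = np.cov(paths[:, 10], paths[:, 30])[0, 1]   # compare with 2 D (T/4)(T/4) / T
```

The sample variance vanishes at both ends of the path and is largest at the midpoint, as stated above for $\sigma^2(\tau) = 2D\tau(T - \tau)/T$.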
Of course, in the framework of Wiener path-integral theory we need $u_0(x)$ to be a non-negative real function, but keeping in mind the generalization of the case of complex $D$ and the Feynman integral, we can consider complex initial conditions as well. The normalization coefficient $N$ here is given by

$$N = \int^{x(T) = X} \mathcal{D}x(\tau)\, \exp\left\{ -\frac{1}{4D} \int_0^T [\dot{x}(\tau)]^2\, d\tau \right\} u_0[x(0)]. \tag{3.25}$$

Comparing eq. (3.25) with eq. (3.7), we see that $N$ is a free-space solution of eq. (3.1), which we can write readily as

$$N = u^0(X, T) = (4\pi D T)^{-1/2} \int dX_0\, \exp\left[ -\frac{(X - X_0)^2}{4DT} \right] u_0(X_0). \tag{3.26}$$

Now we can represent eq. (3.7) as

$$u(X, T) = u^0(X, T) \left\langle \exp\left\{ -\int_0^T V[x(\tau), \tau]\, d\tau \right\} \right\rangle_{u_0}. \tag{3.27}$$

Fig. 4. The Brownian trajectory in the case of the conditional path integral.
Here, averaging is performed over paths having a random distribution of the initial coordinate $x(0)$ with the “probability density function” $u_0$ and a fixed final coordinate value $x(T) = X$. Keeping in mind the generalization of the problem described in § 2.4, we consider some linear functional of the field in the plane $z = T$, having the form

$$F[u(X, T)] = \int dX\, u(X, T)\, W(X), \tag{3.28}$$

where $W(X)$ is some weighting function. The path-integral representation of the field [eq. (3.7)] allows us to write eq. (3.28) in the form

$$F[u(X, T)] = \int dX\, W(X) \int^{x(T) = X} \mathcal{D}x(\tau)\, u_0[x(0)]\, \exp\left\{ -\frac{1}{4D} \int_0^T [\dot{x}(\tau)]^2\, d\tau - \int_0^T V[x(\tau), \tau]\, d\tau \right\}. \tag{3.29}$$
We include both the initial conditions and the weighting function in the probability density function of paths by

$$\mathcal{D}P[x(\tau)]\, dX = N^{-1}\, W(X)\, \exp\left\{ -\frac{1}{4D} \int_0^T \dot{x}^2(\tau)\, d\tau \right\} u_0[x(0)]\, \mathcal{D}x(\tau)\, dX, \tag{3.30}$$

where according to eq. (3.29), all of the paths $x(\tau)$ have the fixed end value $x(T) = X$, and the averaging over $X$ is separated out. The same set of paths can obviously be described by

$$\mathcal{D}P[x(\tau)] = N^{-1}\, W[x(T)]\, \exp\left\{ -\frac{1}{4D} \int_0^T \dot{x}^2(\tau)\, d\tau \right\} u_0[x(0)]\, \mathcal{D}x(\tau), \tag{3.31}$$

in which no restrictions on the paths are presumed. Similar to eqs. (3.24)-(3.27), we obtain

$$F[u(X, T)] = F[u^0(X, T)] \left\langle \exp\left\{ -\int_0^T V[x(\tau), \tau]\, d\tau \right\} \right\rangle_{W, u_0}, \tag{3.32}$$

where averaging is performed over paths having a random distribution of the initial coordinate $x(0)$ with the “probability density function” $u_0[x(0)]$ and a random distribution of the final coordinate value $x(T)$ with the “probability density function” $W[x(T)]$. These paths and those in eq. (3.27) are generally non-Gaussian random processes unless $u_0$ and $W$ are Gaussian functions. Therefore, the mean value and covariance are not sufficient to describe their statistics. A full statistical description can be obtained through the characteristic functional of a random process (Rytov, Kravtsov and Tatarskii [1988]):

$$\Phi[\eta(\tau)] = \left\langle \exp\left\{ i \int_0^T \eta(\tau)\, x(\tau)\, d\tau \right\} \right\rangle. \tag{3.33}$$
3.4 PHASE-SPACE PATH INTEGRAL
Equation (3.1) can produce yet another type of functional integral, called the phase-space path integral as contrasted to the configuration-space Feynman path integral [eq. (3.7)]. In our field of research, this integral is mainly used to simulate realizations of a wave field propagating in random media (Martin and Flatté [1988, 1990]). Its importance stems also from the simple interpretation it offers to recent analytical approximate solutions (Frankenthal, Beran and Whitman [1982], Beran, Whitman and Frankenthal [1982], Mazar and Beran [1982], Frankenthal, Whitman and Beran [1984], Macaskill [1983], Uscinski [1982, 1985]). In the future it will probably provide an efficient technique for numerical solution of fourth-moment propagation. Assume that we prefer not to calculate the integral with respect to the variable $\kappa$ preceding eq. (3.3). Equation (3.4) will then read

$$u(x_{k+1}, t_{k+1}) = \frac{1}{2\pi} \int d\kappa \int dx_k\, \exp[i\kappa(x_{k+1} - x_k)]\, \exp[-\Delta t\, D\kappa^2 - \Delta t\, V(x_{k+1}, t_k)]\, u(x_k, t_k).$$

Using this relationship recursively produces

$$\begin{aligned} u(X, T) = {} & \int \frac{d\kappa_{N-1}\, dx_{N-1}}{2\pi} \cdots \frac{d\kappa_0\, dx_0}{2\pi}\, \exp[i\kappa_{N-1}(x_N - x_{N-1}) + \cdots + i\kappa_0(x_1 - x_0) \\ & - \Delta t\, H(\kappa_{N-1}, x_{N-1}, t_{N-1}) - \cdots - \Delta t\, H(\kappa_0, x_0, 0)]\, u_0(x_0), \end{aligned} \tag{3.34}$$

where $x_N = X$ and

$$H(\kappa, x, t) = D\kappa^2 + V(x, t) \tag{3.35}$$

is the Hamiltonian of the system, in which $\kappa$, $x$, and $t$ are momentum, position, and time, respectively. As $N \to \infty$ and $\Delta t \to 0$, this integral assumes the formal limiting form

$$u(X, T) = \int^{x(T) = X} \mathcal{D}\kappa(\tau)\, \mathcal{D}x(\tau)\, \exp\left[ i \int_0^T \kappa(\tau)\, \dot{x}(\tau)\, d\tau \right] \exp\left[ -\int_0^T H(\kappa(\tau), x(\tau), \tau)\, d\tau \right] u_0(x_0). \tag{3.36}$$
Thus, in this limit we integrate over all the values of $\kappa(\tau)$ and $x(\tau)$ for $0 < \tau < T$, and the function $x(\tau)$ is subject to the end condition $x(T) = X$. Note that the function $\kappa(\tau)$ has no restrictions at all. The integration measure may be written formally as

$$\mathcal{D}\kappa(\tau)\, \mathcal{D}x(\tau) = \prod_{\tau} \frac{d\kappa(\tau)\, dx(\tau)}{2\pi}, \tag{3.37}$$

which is expressed in terms of the product of Liouville measures over all $\tau$. Since $\mathcal{D}x(\tau)$ is determined by eq. (3.6), the measure of the coupled variable $\kappa$ should be equal to

$$\mathcal{D}\kappa(\tau) = \lim_{N \to \infty} \prod_{k=0}^{N-1} \left( \frac{D \Delta t}{\pi} \right)^{1/2} d\kappa_k. \tag{3.38}$$

The power of this formulation is evidently that it is able to handle general Hamiltonians that are not quadratic in $\kappa$. Practically, however, this class is limited to cases for which the proper ordering of $\kappa$ and $x$ is known. The phase-space path integral is expressed in terms of the classical mechanics canonical invariants: action and Liouville measure on the phase space. We will not pursue the suggestive classical interpretation, since in our field of research this integral is used in its finite-dimensional definition for simulation. For the frequently encountered Hamiltonians that are quadratic in $\kappa$, it is possible to show a simple way to recover the configuration path integral from the phase-space path integral. Completing the square demonstrates that shifting the momentum, $\kappa(\tau) \to \kappa(\tau) + i\dot{x}(\tau)/2D$, splits the integration over $\kappa$ and $x$. The result is

$$u(X, T) = \mathcal{N} \int^{x(T) = X} \mathcal{D}x(\tau)\, u_0[x(0)]\, \exp\left\{ -\frac{1}{4D} \int_0^T [\dot{x}(\tau)]^2\, d\tau - \int_0^T V[x(\tau), \tau]\, d\tau \right\},$$

where

$$\mathcal{N} = \int \mathcal{D}\kappa(\tau)\, \exp\left\{ -D \int_0^T [\kappa(\tau)]^2\, d\tau \right\} = 1.$$

The last integral should be considered in the complex domain. The equality can be proved using the discrete representation and the definition of the corresponding measure [eq. (3.38)].
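The link between the phase-space and configuration-space forms rests on a single Gaussian integral per time step: integrating $\exp[i\kappa s - \Delta t\, D\kappa^2]/2\pi$ over $\kappa$, with $s = x_{k+1} - x_k$, should give the kernel $(4\pi D \Delta t)^{-1/2} \exp[-s^2/4D\Delta t]$ of eq. (3.3). The Python sketch below checks this numerically for a complex $D$ with $\operatorname{Re} D > 0$. It is an illustration only; the truncation of the $\kappa$ range, the grid size and the parameter values are assumptions of this example.

```python
import numpy as np

def kappa_integral(s, D, dt, kappa_max=60.0, n=200001):
    """(1 / 2 pi) * integral over kappa of exp[i kappa s - dt D kappa^2], by a Riemann sum."""
    kappa = np.linspace(-kappa_max, kappa_max, n)
    dk = kappa[1] - kappa[0]
    integrand = np.exp(1j * kappa * s - dt * D * kappa**2)
    return np.sum(integrand) * dk / (2.0 * np.pi)

def gaussian_kernel(s, D, dt):
    """Configuration-space one-step kernel (principal square root, Re D > 0 assumed)."""
    return np.exp(-s**2 / (4.0 * D * dt)) / np.sqrt(4.0 * np.pi * D * dt)

# Assumed example values; the integrand decays like exp(-Re(D) dt kappa^2)
D, dt, s = 0.5 + 0.2j, 0.1, 0.3
lhs = kappa_integral(s, D, dt)
rhs = gaussian_kernel(s, D, dt)
```

For purely imaginary $D$ the integrand no longer decays, and the truncated sum used here would not converge; that case requires the limiting procedure in the complex domain mentioned above.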
§ 4. Path-Integral Representations of Wave Fields in Inhomogeneous Media

In this section, we obtain several different wave field representations, based on some simple transformations of the functional variable and the probabilistic approach to path integrals. All these representations are equivalent, but they give different insights into the problem and can serve as bases for different analytical approximations and numerical techniques.

4.1. BASIC UNCONDITIONAL AND CONDITIONAL FEYNMAN PATH-INTEGRAL REPRESENTATIONS

The parabolic diffusion equation (3.1) has the same form as the parabolic wave equation (2.1), but with the following notational changes:

$$\begin{aligned} & T \to Z > 0, \qquad \tau \to z, \qquad D = i/2k, \\ & x(\tau) \to \boldsymbol{\rho}(z) = [x(z), y(z)], \qquad \mathcal{D}x(\tau) \to \mathcal{D}^2\rho(z) = \mathcal{D}x(z)\, \mathcal{D}y(z), \\ & V(x, t) \to -\tfrac{1}{2} ik\, \tilde{\varepsilon}[\boldsymbol{\rho}, z]. \end{aligned} \tag{4.1}$$
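With $D = i/2k$ the one-step operator $\exp(\Delta z\, D\, \partial^2/\partial x^2)$ becomes multiplication by $\exp(-\Delta z\, D\kappa^2) = \exp(-i\kappa^2 \Delta z/2k)$ in the Fourier domain, and the factor $\exp(-\Delta z\, V)$ becomes the phase factor $\exp(\tfrac{1}{2} ik\tilde{\varepsilon}\, \Delta z)$. The Python sketch below applies these two factors alternately in one transverse dimension. It is a hedged illustration, not the authors' scheme: the periodic FFT grid, the function name `propagate`, the optional `eps` callable and the parameter values are assumptions of this example. For $\tilde{\varepsilon} = 0$ and $u_0 = \exp(-x^2/2a)$ the result is compared with the closed form $[a/(a + iZ/k)]^{1/2} \exp[-x^2/2(a + iZ/k)]$.

```python
import numpy as np

def propagate(u0, x, k, Z, n_steps, eps=None):
    """Alternate the free step exp(dz D d^2/dx^2), D = i/(2k), with the phase factor of eps."""
    D = 1j / (2.0 * k)
    dz = Z / n_steps
    kappa = 2.0 * np.pi * np.fft.fftfreq(x.size, d=x[1] - x[0])
    multiplier = np.exp(-dz * D * kappa**2)      # free-space step in the Fourier domain
    u = u0.astype(complex)
    for m in range(n_steps):
        u = np.fft.ifft(multiplier * np.fft.fft(u))
        if eps is not None:
            # exp(-dz V) with V = -(i k / 2) eps, evaluated at z_m = m dz
            u = u * np.exp(0.5j * k * eps(x, m * dz) * dz)
    return u

# Assumed example: 1-D Gaussian beam in a homogeneous medium
x = np.linspace(-20.0, 20.0, 1024, endpoint=False)
k, Z, a = 10.0, 5.0, 1.0
u_num = propagate(np.exp(-x**2 / (2.0 * a)), x, k, Z, n_steps=25)
q = a + 1j * Z / k                               # a + 2 D Z
u_ref = np.sqrt(a / q) * np.exp(-x**2 / (2.0 * q))
max_err = np.max(np.abs(u_num - u_ref))
```

In two transverse dimensions the prefactor of the Gaussian solution would be $a/(a + iZ/k)$ rather than its square root; the one-dimensional case is used here only to keep the example short.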
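Substituting eq. (4.1) in eq. (3.7), we can write the solution of eq. (2.1) with the initial condition (2.2) in the form of the unconditional Feynman path integral (Chow [1972]),

$$u(\mathbf{R}, Z) = \int^{\boldsymbol{\rho}(Z) = \mathbf{R}} \mathcal{D}^2\rho(z)\, u_0[\boldsymbol{\rho}(0)]\, \exp\left\{ \tfrac{1}{2} ik \int_0^Z |\dot{\boldsymbol{\rho}}(z)|^2\, dz + \tfrac{1}{2} ik \int_0^Z \tilde{\varepsilon}[\boldsymbol{\rho}(z), z]\, dz \right\}. \tag{4.2}$$

Following eq. (3.8), it is clear that this path integral is normalized by the condition

$$\int^{\boldsymbol{\rho}(Z) = \mathbf{R}} \mathcal{D}^2\rho(z)\, \exp\left\{ \tfrac{1}{2} ik \int_0^Z |\dot{\boldsymbol{\rho}}(z)|^2\, dz \right\} = 1. \tag{4.3}$$

Analogously, from eq. (3.11) we can express the field in the form of a conditional Feynman path integral,

$$u(\mathbf{R}, Z) = \int d^2R_0\, u_0(\mathbf{R}_0) \int_{\boldsymbol{\rho}(0) = \mathbf{R}_0}^{\boldsymbol{\rho}(Z) = \mathbf{R}} \mathcal{D}^2\rho(z)\, \exp\left\{ \tfrac{1}{2} ik \int_0^Z |\dot{\boldsymbol{\rho}}(z)|^2\, dz + \tfrac{1}{2} ik \int_0^Z \tilde{\varepsilon}[\boldsymbol{\rho}(z), z]\, dz \right\}. \tag{4.4}$$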
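Substituting $u_0(\boldsymbol{\rho}) = \delta(\boldsymbol{\rho} - \mathbf{R}_0)$ gives us the unconditional path integral form for the Green function

$$G(\mathbf{R}, Z; \mathbf{R}_0) = \int^{\boldsymbol{\rho}(Z) = \mathbf{R}} \mathcal{D}^2\rho(z)\, \delta[\boldsymbol{\rho}(0) - \mathbf{R}_0]\, \exp\left\{ \tfrac{1}{2} ik \int_0^Z |\dot{\boldsymbol{\rho}}(z)|^2\, dz + \tfrac{1}{2} ik \int_0^Z \tilde{\varepsilon}[\boldsymbol{\rho}(z), z]\, dz \right\}. \tag{4.5}$$

As a result of the reciprocity principle (2.5), we can also write the Green function unconditional path-integral form as

$$G(\mathbf{R}, Z; \mathbf{R}_0) = \int^{\boldsymbol{\rho}(0) = \mathbf{R}_0} \mathcal{D}^2\rho(z)\, \delta[\boldsymbol{\rho}(Z) - \mathbf{R}]\, \exp\left\{ \tfrac{1}{2} ik \int_0^Z |\dot{\boldsymbol{\rho}}(z)|^2\, dz + \tfrac{1}{2} ik \int_0^Z \tilde{\varepsilon}[\boldsymbol{\rho}(z), z]\, dz \right\}. \tag{4.6}$$

Comparing eqs. (4.5) and (4.6), we see that path restrictions included in the definition of the unconditional Feynman integral can be replaced by an appropriate δ-function in the integrand. The same is also valid for the conditional integral. Therefore, we can replace the Green function representations (4.5) and (4.6) by

$$G(\mathbf{R}, Z; \mathbf{R}_0) = \int \mathcal{D}^2\rho(z)\, \delta[\boldsymbol{\rho}(0) - \mathbf{R}_0]\, \delta[\boldsymbol{\rho}(Z) - \mathbf{R}]\, \exp\left\{ \tfrac{1}{2} ik \int_0^Z |\dot{\boldsymbol{\rho}}(z)|^2\, dz + \tfrac{1}{2} ik \int_0^Z \tilde{\varepsilon}[\boldsymbol{\rho}(z), z]\, dz \right\}, \tag{4.7}$$

and the general solution (4.4) by

$$u(\mathbf{R}, Z) = \int \mathcal{D}^2\rho(z)\, u_0[\boldsymbol{\rho}(0)]\, \delta[\boldsymbol{\rho}(Z) - \mathbf{R}]\, \exp\left\{ \tfrac{1}{2} ik \int_0^Z |\dot{\boldsymbol{\rho}}(z)|^2\, dz + \tfrac{1}{2} ik \int_0^Z \tilde{\varepsilon}[\boldsymbol{\rho}(z), z]\, dz \right\}. \tag{4.8}$$

In eqs. (4.7) and (4.8), the path arguments $\boldsymbol{\rho}(z)$ are not restricted by the definition, but the normalization is valid only if there is at least one δ-function in the integrand,

$$\int \mathcal{D}^2\rho(z)\, \delta[\boldsymbol{\rho}(0) - \mathbf{R}_0]\, \exp\left\{ \tfrac{1}{2} ik \int_0^Z |\dot{\boldsymbol{\rho}}(z)|^2\, dz \right\} = \int \mathcal{D}^2\rho(z)\, \delta[\boldsymbol{\rho}(Z) - \mathbf{R}]\, \exp\left\{ \tfrac{1}{2} ik \int_0^Z |\dot{\boldsymbol{\rho}}(z)|^2\, dz \right\} = 1. \tag{4.9}$$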
4.2. VELOCITY REPRESENTATION FOR PATH-INTEGRAL VARIABLES

These representations were used frequently in the works of Russian authors; they simply changed the dummy integration variable. For the two-dimensional problem, this new variable can be related to some velocity $v(z) = \dot{x}(z)$. In spite of the three-dimensional character of the problem considered further, we keep this name to distinguish it from the more common coordinate path integral. This new variable seems to be more convenient than a coordinate one for analytical transformations, although this is not so clear from a physical point of view. In the discrete representation (3.5), it can be introduced by

$$v_k = (x_{k+1} - x_k)/\Delta t, \qquad k = 0, \ldots, N - 1; \qquad x_N = X. \tag{4.10}$$

Here we use directly the continuous form to introduce the velocity representation, changing the functional variable in the following manner (Klyatskin [1975, 1980]),

$$\mathbf{v}(z) = \frac{d}{dz}\boldsymbol{\rho}(z), \qquad \boldsymbol{\rho}(Z) = \mathbf{R}, \tag{4.11}$$

which is equivalent to

$$\boldsymbol{\rho}(z) = \mathbf{R} + \int_Z^z \mathbf{v}(\zeta)\, d\zeta. \tag{4.12}$$

Substituting this expression into eq. (4.2) gives

$$u(\mathbf{R}, Z) = \int D^2v(z)\, u_0\left[ \mathbf{R} + \int_Z^0 \mathbf{v}(z)\, dz \right] \exp\left\{ \tfrac{1}{2} ik \int_0^Z v^2(z)\, dz \right\} \exp\left\{ \tfrac{1}{2} ik \int_0^Z \tilde{\varepsilon}\left[ \mathbf{R} + \int_Z^z \mathbf{v}(\zeta)\, d\zeta,\, z \right] dz \right\}. \tag{4.13}$$

Note that the path boundary condition $\boldsymbol{\rho}(Z) = \mathbf{R}$ is included in eq. (4.12), and it is clear from eqs. (4.10) and (3.5) that, by definition, $\mathbf{v}(z)$ is not restricted for any $z$. We use here the new notation $D$ for a differential to emphasize that:

(i) the velocity functional variable $\mathbf{v}(z)$ belongs to a different functional space, and

(ii) the functional integral over velocities is normalized by

$$\int D^2v(z)\, \exp\left[ \tfrac{1}{2} ik \int_0^Z v^2(z)\, dz \right] = 1. \tag{4.14}$$
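Using the velocity integral for the Green function (4.7), we get

$$G(\mathbf{R}, Z; \mathbf{R}_0) = \int D^2v(z)\, \delta\left[ \mathbf{R} - \mathbf{R}_0 - \int_0^Z \mathbf{v}(z)\, dz \right] \exp\left\{ \tfrac{1}{2} ik \left[ \int_0^Z v^2(z)\, dz + \int_0^Z \tilde{\varepsilon}\left( \mathbf{R} + \int_Z^z \mathbf{v}(\zeta)\, d\zeta,\, z \right) dz \right] \right\}. \tag{4.15}$$

Subtracting the slope between the source and the observation point, which is just a shift of variable, i.e., $\mathbf{v}(\zeta) \to \mathbf{v}(\zeta) - (\mathbf{R} - \mathbf{R}_0)/Z$, leads to the following form,

$$G(\mathbf{R}, Z; \mathbf{R}_0) = \exp\left[ \frac{ik|\mathbf{R} - \mathbf{R}_0|^2}{2Z} \right] \int D^2v(z)\, \delta\left[ \int_0^Z \mathbf{v}(z)\, dz \right] \exp\left\{ \tfrac{1}{2} ik \left[ \int_0^Z v^2(z)\, dz + \int_0^Z \tilde{\varepsilon}\left( \mathbf{R}_0 + (\mathbf{R} - \mathbf{R}_0)\frac{z}{Z} + \int_Z^z \mathbf{v}(\zeta)\, d\zeta,\, z \right) dz \right] \right\}. \tag{4.16}$$

Inserting eq. (4.16) into eq. (2.4), we obtain the functional integral form of the SWE,

$$u(\mathbf{R}, Z) = \int d^2R_0\, u_0(\mathbf{R}_0)\, \exp\left[ \frac{ik|\mathbf{R} - \mathbf{R}_0|^2}{2Z} \right] \int D^2v(z)\, \delta\left[ \int_0^Z \mathbf{v}(z)\, dz \right] \exp\left\{ \tfrac{1}{2} ik \left[ \int_0^Z v^2(z)\, dz + \int_0^Z \tilde{\varepsilon}\left( \mathbf{R}_0 + (\mathbf{R} - \mathbf{R}_0)\frac{z}{Z} + \int_Z^z \mathbf{v}(\zeta)\, d\zeta,\, z \right) dz \right] \right\}. \tag{4.17}$$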
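We can check the free-space value of the Green function by calculating eq. (4.16) for $\tilde{\varepsilon}(\mathbf{r}, z) = 0$, which leads to

$$\mathcal{N} = \int D^2v(z)\, \delta\left[ \int_0^Z \mathbf{v}(z)\, dz \right] \exp\left[ \tfrac{1}{2} ik \int_0^Z v^2(z)\, dz \right]. \tag{4.18}$$

Using the Fourier expansion of the δ-function, we can write

$$\mathcal{N} = \frac{1}{4\pi^2} \int d^2p \int D^2v(z)\, \exp\left[ \tfrac{1}{2} ik \int_0^Z v^2(z)\, dz \right] \exp\left[ i\mathbf{p} \cdot \int_0^Z \mathbf{v}(\zeta)\, d\zeta \right].$$

Making the change of the functional variable $\mathbf{v}(\zeta) \to \mathbf{v}(\zeta) - \mathbf{p}/k$ gives

$$\mathcal{N} = \frac{1}{4\pi^2} \int d^2p\, \exp\left( -\frac{ip^2 Z}{2k} \right) \int D^2v(z)\, \exp\left[ \tfrac{1}{2} ik \int_0^Z v^2(z)\, dz \right].$$

Using eq. (4.14) and calculating the two-dimensional integral, we get $\mathcal{N} = k/(2\pi i Z)$, which leads us to the correct value of the free-space Green function,

$$G(\mathbf{R}, Z; \mathbf{R}_0) = \frac{k}{2\pi i Z} \exp\left[ \frac{ik|\mathbf{R} - \mathbf{R}_0|^2}{2Z} \right]. \tag{4.19}$$

4.3. VARIATIONAL OPERATOR REPRESENTATION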
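Sometimes an equivalent functional operator form of the functional integral [eq. (4.2) or (4.13)] is used (see, e.g., Klyatskin and Tatarskii [1970b], Klyatskin [1975, 1980]). We use here a lucid derivation of this form for a two-dimensional problem that was presented in the review by Tatarskii and Zavorotny [1980]. We start from the operator form for a regular integral transformation [eq. (3.3)]. However, we now use the left-hand side of eq. (3.3) in a somewhat modified form,

$$\exp\left( \Delta t\, D\, \frac{\partial^2}{\partial x^2} \right) u(x) = \left. \exp\left( \Delta t\, D\, \frac{\partial^2}{\partial \eta^2} \right) u(x + \eta) \right|_{\eta = 0}. \tag{4.20}$$

To get a corresponding equation for the three-dimensional random medium case, we should change the notation in eq. (4.20), following eq. (4.1),

$$\exp\left( \frac{i \Delta z}{2k}\, \Delta_{\rho} \right) u(\boldsymbol{\rho}, z_m) = \left. \exp\left( \frac{i \Delta z}{2k}\, \frac{\partial^2}{\partial \boldsymbol{\eta}_m^2} \right) u(\boldsymbol{\rho} + \boldsymbol{\eta}_m, z_m) \right|_{\boldsymbol{\eta}_m = 0}, \tag{4.21}$$

where $\Delta z = Z/N$ and $z_m = m\, \Delta z$. If we use the operator form (4.21) in the discrete representation (3.5), written for the three-dimensional case for each regular integral over $\boldsymbol{\rho}$, we get

$$u(\mathbf{R}, Z) = \lim_{N \to \infty} \left. \exp\left[ \frac{i \Delta z}{2k} \sum_{m=1}^{N} \frac{\partial^2}{\partial \boldsymbol{\eta}_m^2} \right] u_0(\mathbf{R} + \boldsymbol{\eta}_1 + \cdots + \boldsymbol{\eta}_N)\, \exp\left[ \tfrac{1}{2} ik\, \Delta z \sum_{j=1}^{N} \tilde{\varepsilon}(\mathbf{R} + \boldsymbol{\eta}_j + \cdots + \boldsymbol{\eta}_N,\, z_j) \right] \right|_{\boldsymbol{\eta} = 0}. \tag{4.22}$$

To perform the transition to the limit, we introduce the piecewise constant function $\mathbf{v}(z_m) = \boldsymbol{\eta}_m/\Delta z$. Then,

$$\boldsymbol{\eta}_j + \boldsymbol{\eta}_{j+1} + \cdots + \boldsymbol{\eta}_N = \sum_{m=j}^{N} \mathbf{v}(z_m)\, \Delta z \to \int_{z_j}^{Z} \mathbf{v}(\zeta)\, d\zeta, \qquad \frac{\partial}{\partial \boldsymbol{\eta}_m} = \frac{1}{\Delta z} \frac{\partial}{\partial \mathbf{v}(z_m)} \to \frac{\delta}{\delta \mathbf{v}(\zeta)}. \tag{4.23}$$

Here, $\delta/\delta \mathbf{v}(\zeta)$ is the symbol of functional differentiation (for more details, see, e.g., Tatarskii [1971]). Then,

$$\lim_{N \to \infty} \prod_{m=1}^{N} \exp\left[ \frac{i \Delta z}{2k} \frac{\partial^2}{\partial \boldsymbol{\eta}_m^2} \right] = \exp\left[ \frac{i}{2k} \int_0^Z \frac{\delta^2}{\delta \mathbf{v}^2(\zeta)}\, d\zeta \right], \tag{4.24}$$

and eq. (4.13) is rewritten in the following operator form (Klyatskin and Tatarskii [1970b], Klyatskin [1975, 1980]),

$$u(\mathbf{R}, Z) = \left. \exp\left[ \frac{i}{2k} \int_0^Z \frac{\delta^2}{\delta \mathbf{v}^2(\zeta)}\, d\zeta \right] u_0\left[ \mathbf{R} + \int_0^Z \mathbf{v}(\zeta)\, d\zeta \right] \exp\left\{ \tfrac{1}{2} ik \int_0^Z \tilde{\varepsilon}\left[ \mathbf{R} + \int_{\zeta}^{Z} \mathbf{v}(\zeta')\, d\zeta',\, \zeta \right] d\zeta \right\} \right|_{\mathbf{v} = 0}. \tag{4.25}$$

It is also possible to obtain eq. (4.22) using a functional Fourier transform of eq. (4.13) (see Klyatskin and Tatarskii [1970b]), or using the method developed in quantum field theory (Fradkin [1966]).

4.4. PLANE-WAVE EXPANSION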
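We note that our transformations from eq. (4.2) to eq. (4.8) allowed us to represent end restrictions by means of δ-functions. We use this simple method to express the boundary conditions of path variables in an implicit way. Applying the Fourier expansion of the δ-function in eq. (4.15), we obtain

$$G(\mathbf{R}, Z; \mathbf{R}_0) = \frac{1}{4\pi^2} \int d^2p\, \exp[i\mathbf{p} \cdot (\mathbf{R} - \mathbf{R}_0)] \int D^2v(z)\, \exp\left[ -i\mathbf{p} \cdot \int_0^Z \mathbf{v}(z)\, dz \right] \exp\left\{ \tfrac{1}{2} ik \left[ \int_0^Z v^2(z)\, dz + \int_0^Z \tilde{\varepsilon}\left( \mathbf{R} + \int_Z^z \mathbf{v}(\zeta)\, d\zeta,\, z \right) dz \right] \right\}. \tag{4.26}$$

If we now use the Fourier transform of the initial condition,

$$u_0(\mathbf{R}_0) = \int d^2P_0\, \tilde{u}_0(\mathbf{P}_0)\, \exp(i\mathbf{P}_0 \cdot \mathbf{R}_0), \tag{4.27}$$

insert eqs. (4.26) and (4.27) into eq. (2.4), and shift the velocity variable $\mathbf{v}(\zeta) \to \mathbf{v}(\zeta) - \mathbf{P}_0/k$, we are able to integrate over $\mathbf{R}_0$. This integration gives $4\pi^2 \delta(\mathbf{P}_0 - \mathbf{p})$, and the final result becomes

$$u(\mathbf{R}, Z) = \int d^2P_0\, \tilde{u}_0(\mathbf{P}_0)\, \exp\left[ i\mathbf{P}_0 \cdot \mathbf{R} - \frac{iP_0^2 Z}{2k} \right] \int D^2v(z)\, \exp\left\{ \tfrac{1}{2} ik \left[ \int_0^Z v^2(z)\, dz + \int_0^Z \tilde{\varepsilon}\left( \mathbf{R} - \mathbf{P}_0 (Z - z)/k + \int_Z^z \mathbf{v}(\zeta)\, d\zeta,\, z \right) dz \right] \right\}. \tag{4.28}$$

Comparing eqs. (4.28) and (2.7), we see that eq. (4.28) is the functional integral form of the OPWE, and

$$\tilde{G}(\mathbf{R}, Z; \mathbf{P}_0) = \exp\left[ i\mathbf{P}_0 \cdot \mathbf{R} - \frac{iP_0^2 Z}{2k} \right] \int D^2v(z)\, \exp\left\{ \tfrac{1}{2} ik \left[ \int_0^Z v^2(z)\, dz + \int_0^Z \tilde{\varepsilon}\left( \mathbf{R} - \mathbf{P}_0 (Z - z)/k + \int_Z^z \mathbf{v}(\zeta)\, d\zeta,\, z \right) dz \right] \right\}, \tag{4.29}$$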
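where $\tilde{G}$ was introduced in eq. (2.8). Equation (4.29) can be obtained directly from eqs. (2.8) and (4.16). Now we obtain another representation of the field. Beginning from eq. (4.17), we can replace

$$\int_Z^z \mathbf{v}(\zeta)\, d\zeta \to \int_0^z \mathbf{v}(\zeta)\, d\zeta \tag{4.30}$$

using the equation $\int_0^Z \mathbf{v}(\zeta)\, d\zeta = 0$ stemming from the δ-function in eq. (4.17). Again, as in the case of eq. (4.26), we use the Fourier expansion of the δ-function and change variables. However, we then perform a Fourier transform on the final field [eq. (4.17)], making it possible to integrate over $\mathbf{R}$. The final result is

$$\tilde{u}(\mathbf{P}, Z) = (2\pi)^{-2} \int d^2R_0\, u_0(\mathbf{R}_0)\, \exp\left[ -i\mathbf{P} \cdot \mathbf{R}_0 - \frac{iP^2 Z}{2k} \right] \int D^2v(z)\, \exp\left\{ \tfrac{1}{2} ik \left[ \int_0^Z v^2(z)\, dz + \int_0^Z \tilde{\varepsilon}\left( \mathbf{R}_0 + \mathbf{P} z/k + \int_0^z \mathbf{v}(\zeta)\, d\zeta,\, z \right) dz \right] \right\}. \tag{4.31}$$

Comparing eqs. (4.31) and (2.10), we see that eq. (4.31) is the functional integral form of the IPWE, and

$$\tilde{G}(\mathbf{P}, Z; \mathbf{R}_0) = (2\pi)^{-2} \exp\left[ -i\mathbf{P} \cdot \mathbf{R}_0 - \frac{iP^2 Z}{2k} \right] \int D^2v(z)\, \exp\left\{ \tfrac{1}{2} ik \left[ \int_0^Z v^2(z)\, dz + \int_0^Z \tilde{\varepsilon}\left( \mathbf{R}_0 + \mathbf{P} z/k + \int_0^z \mathbf{v}(\zeta)\, d\zeta,\, z \right) dz \right] \right\}. \tag{4.32}$$

Although velocity path integration is used in eqs. (4.28) and (4.13), we can restore the coordinate paths $\boldsymbol{\rho}(z)$ by examining the arguments of $\tilde{\varepsilon}$. In the OPWE case,

$$\boldsymbol{\rho}(z) = \mathbf{R} - \mathbf{P}_0 (Z - z)/k + \int_Z^z \mathbf{v}(\zeta)\, d\zeta. \tag{4.33}$$

We see that all paths $\boldsymbol{\rho}(z)$ terminate at the point $(\mathbf{R}, Z)$, and their mean slope over the interval $(0, Z)$ is $\mathbf{P}_0/k$; i.e., it corresponds to the given spatial frequency of the initial field. In the IPWE case,

$$\boldsymbol{\rho}(z) = \mathbf{R}_0 + \mathbf{P} z/k + \int_0^z \mathbf{v}(\zeta)\, d\zeta. \tag{4.34}$$

All the paths start from the point $(\mathbf{R}_0, 0)$, and their mean slope is $\mathbf{P}/k$; i.e., it corresponds to the given spatial frequency of the field in the plane $z = Z$. This geometrical consideration confirms the physical content of IPWE and OPWE.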
4.5. ORTHOGONAL EXPANSION OF PATHS

It is often very useful to expand the path $x(\tau)$ into a series by some set of functions and replace integration over the path space by integration over the series coefficients. In combination with the probabilistic interpretation, this approach can create a variety of analytical approximations and numerical techniques (Tatarskii [1976], Sabel'feld and Tatarskii [1978]). In this section, we intend to demonstrate this idea for the two-dimensional case using the notation of § 3. First we will take two-dimensional versions of eq. (4.2) or (4.13) and represent the path in the form

$$x(\tau) = X + \sum_{k=1}^{\infty} a_k \varphi_k(\tau), \tag{4.35}$$

where $\{\varphi_k(\tau)\}$ is a set of functions such that $\{\dot{\varphi}_k(\tau)\}$ is a complete orthonormal set; i.e.,

$$\int_0^T \dot{\varphi}_k(\tau)\, \dot{\varphi}_l(\tau)\, d\tau = \delta_{k,l}. \tag{4.36}$$

This means that we use the orthogonal expansion not for coordinate paths, but for velocities [eq. (4.11)]. This is more convenient because of the absence of any restrictions on velocities in eq. (4.13) and the simpler form of the first exponential. Therefore, we obtain

$$\dot{x}(\tau) = \sum_{k=1}^{\infty} a_k \dot{\varphi}_k(\tau). \tag{4.37}$$

To fulfil the condition $x(T) = X$, we must satisfy the condition $\varphi_k(T) = 0$; this leads to

$$\varphi_k(\tau) = -\int_{\tau}^{T} \dot{\varphi}_k(\tau')\, d\tau'. \tag{4.38}$$

The coefficients $a_k$ are obtained from

$$a_k = \int_0^T \dot{x}(\tau)\, \dot{\varphi}_k(\tau)\, d\tau. \tag{4.39}$$

Probabilistic interpretation of the path integral leads us to probabilistic interpretation of the Fourier coefficients. It is clear from eq. (4.39) that the
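$\{a_k\}$ are Gaussian numbers with $\langle a_k \rangle = 0$. Using eq. (3.16), it is easy to find that

$$\langle \dot{x}(\tau')\, \dot{x}(\tau'') \rangle = 2D\, \delta(\tau' - \tau''). \tag{4.40}$$

Combining eqs. (4.39) and (4.40), we obtain

$$\langle a_k a_l \rangle = 2D\, \delta_{k,l}. \tag{4.41}$$

Therefore, the numbers $\{a_k\}$ are statistically independent. It is clear that averaging over all paths $x(\tau)$ is the same as averaging over all numbers $\{a_k\}$. Thus, eq. (3.15) takes the form

$$u(X, T) = \left\langle u_0\left[ X + \sum_{k=1}^{\infty} a_k \varphi_k(0) \right] \exp\left\{ -\int_0^T V\left[ X + \sum_{k=1}^{\infty} a_k \varphi_k(\tau),\, \tau \right] d\tau \right\} \right\rangle_{\{a_k\}}. \tag{4.42}$$

Here the averaging operation over each $a_k$ denotes an integration with the weight functions

$$w(a_k) = (4\pi D)^{-1/2} \exp(-a_k^2/4D).$$

This implies that eq. (4.42) can be written in the form

$$u(X, T) = \int \prod_{k=1}^{\infty} \frac{da_k}{(4\pi D)^{1/2}}\, \exp\left( -\frac{1}{4D} \sum_{k=1}^{\infty} a_k^2 \right) u_0\left[ X + \sum_{k=1}^{\infty} a_k \varphi_k(0) \right] \exp\left\{ -\int_0^T V\left[ X + \sum_{k=1}^{\infty} a_k \varphi_k(\tau),\, \tau \right] d\tau \right\}. \tag{4.43}$$

If $D^* \ne D$, this transformation [from eq. (3.13) to eq. (4.41)] has no real probabilistic interpretation, but is correct for $\operatorname{Re} D > 0$. The functions $\{\varphi_k(\tau)\}$ satisfying the conditions of eq. (4.36) can be introduced in different ways. We can use the additional information contained in the path statistics (3.16) to choose which set is the best one (in the mean-square sense). This is the so-called Karhunen-Loève expansion of random processes (Papoulis [1965]). This means that we can obtain the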
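functions $\{\varphi_k(\tau)\}$ from the condition (Tatarskii [1976])

$$\int_0^T B(\tau, \tau')\, \varphi_k(\tau')\, d\tau' = \lambda_k\, \varphi_k(\tau), \qquad 0 < \tau < T. \tag{4.44}$$

If we substitute eq. (3.16) into eq. (4.44), we obtain the eigenvalue problem

$$\ddot{\varphi}_k + \frac{2D}{\lambda_k}\, \varphi_k = 0, \qquad \dot{\varphi}_k(0) = 0, \qquad \varphi_k(T) = 0. \tag{4.45}$$

Its solution normalized by eq. (4.36) is

$$\varphi_k(\tau) = \frac{\sqrt{2T}}{\pi (k - \tfrac{1}{2})} \cos\left[ \pi (k - \tfrac{1}{2})\, \frac{\tau}{T} \right], \qquad \lambda_k = \frac{2DT^2}{\pi^2 (k - \tfrac{1}{2})^2}, \qquad k = 1, 2, \ldots. \tag{4.46}$$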
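A quick numerical check of this expansion is possible. The Python sketch below is illustrative and rests on one reading of eq. (4.46), namely $\varphi_k(\tau) = \sqrt{2T}\, \cos[\pi(k - \tfrac{1}{2})\tau/T]/[\pi(k - \tfrac{1}{2})]$, whose derivatives are orthonormal on $(0, T)$. With independent coefficients of variance $2D$, the series $2D \sum_k \varphi_k(\tau)\varphi_k(\tau')$ truncated at $K$ terms should approach $2D \min(T - \tau, T - \tau')$ of eq. (3.16). Function names, the truncation order and the parameters are hypothetical.

```python
import numpy as np

def kl_functions(tau, T, K):
    """phi_k(tau), k = 1..K, for the process pinned at tau = T (assumed form of the solution)."""
    k = np.arange(1, K + 1)[:, None]
    w = np.pi * (k - 0.5) / T
    return np.sqrt(2.0 / T) * np.cos(w * tau[None, :]) / w

def sample_paths(X, D, T, tau, K, n_paths, rng):
    """Paths x(tau) = X + sum_k a_k phi_k(tau) with independent a_k of variance 2 D (real D)."""
    a = rng.normal(0.0, np.sqrt(2.0 * D), size=(n_paths, K))
    return X + a @ kl_functions(tau, T, K)

# Assumed example parameters
D, T, K = 0.5, 1.0, 4000
tau = np.linspace(0.0, T, 21)
phi = kl_functions(tau, T, K)
cov_series = 2.0 * D * phi.T @ phi                       # truncated expansion of B
cov_exact = 2.0 * D * np.minimum.outer(T - tau, T - tau)
max_dev = np.max(np.abs(cov_series - cov_exact))
```

The truncation error decays only like $1/K$ at $\tau = \tau' = 0$, which reflects the $1/k^2$ decay of the eigenvalues $\lambda_k$.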
It is clear that, for large $k$, the amplitudes of $\{\varphi_k(\tau)\}$ decrease and the significance of high-order terms is lessened. This permits us to approximate eq. (3.15) by using a finite-dimensional integral (Cameron [1951]). It is essential to note that if an appropriate system of $\varphi_k(\tau)$ is chosen, this integral converges faster than the original finite-dimensional integral (3.5) as $k = N \to \infty$ (Sabel'feld and Tatarskii [1978]). For the conditional path integral the expansion of $x(\tau)$ is introduced by the formula

$$x(\tau) = \langle x(\tau) \rangle + \sum_{k=1}^{\infty} a_k \varphi_k(\tau), \tag{4.47}$$

where $\langle x(\tau) \rangle$ is given by eq. (3.22), $\varphi_k(0) = \varphi_k(T) = 0$, and again we assume $\{\dot{\varphi}_k(\tau)\}$ to be a complete orthonormal set; i.e., eq. (4.36) is valid. Thus,

$$\dot{x}(\tau) = \frac{X - X_0}{T} + \sum_{k=1}^{\infty} a_k \dot{\varphi}_k(\tau). \tag{4.48}$$

We multiply this equation by $\dot{\varphi}_k(\tau)$ and integrate from 0 to $T$. The first term vanishes because $\varphi_k(0) = \varphi_k(T) = 0$, and we obtain an equation that is the same as eq. (4.39). For $\langle a_k \rangle$ we obtain

$$\langle a_k \rangle = 0. \tag{4.49}$$

To obtain $\langle a_k a_l \rangle$, we use

$$\langle \dot{x}(\tau')\, \dot{x}(\tau'') \rangle = 2D \left[ \delta(\tau' - \tau'') - \frac{1}{T} \right] + \left( \frac{X - X_0}{T} \right)^2, \tag{4.50}$$

which is the consequence of eq. (3.23). Therefore,

$$\langle a_k a_l \rangle = \int_0^T \int_0^T \langle \dot{x}(\tau')\, \dot{x}(\tau'') \rangle\, \dot{\varphi}_k(\tau')\, \dot{\varphi}_l(\tau'')\, d\tau'\, d\tau'' = 2D\, \delta_{k,l}. \tag{4.51}$$

The random numbers $\{a_k\}$ thus have the same distribution as in the previous case. We can now represent the Green function (3.21) in a form that contains only an averaging over Gaussian random numbers $\{a_k\}$,

$$G(X, T; X_0) = G_0(X, T; X_0) \left\langle \exp\left\{ -\int_0^T V\left[ \langle x(\tau) \rangle + \sum_{k=1}^{\infty} a_k \varphi_k(\tau),\, \tau \right] d\tau \right\} \right\rangle_{\{a_k\}}. \tag{4.52}$$

As the specific choice of $\{\varphi_k(\tau)\}$, we again use the Karhunen-Loève expansion [eq. (4.44)], which in this case results in the eigenvalue problem

$$\ddot{\varphi}_k + \frac{2D}{\lambda_k}\, \varphi_k = 0, \qquad \varphi_k(0) = 0, \qquad \varphi_k(T) = 0. \tag{4.53}$$

Its solution, normalized by eq. (4.36), is given by

$$\varphi_k(\tau) = \frac{\sqrt{2T}}{\pi k} \sin\left( \pi k\, \frac{\tau}{T} \right), \qquad \lambda_k = \frac{2DT^2}{\pi^2 k^2}, \qquad k = 1, 2, \ldots. \tag{4.54}$$

If we substitute $u_0(x) = \delta(x - X_0)$ into eq. (4.43), we obtain another representation of $G(X, T; X_0)$ because the set of functions $\{\varphi_k(\tau)\}$ would be given by eq. (4.46). We believe that this latter representation is not effective for numerical calculation by Monte Carlo methods. The reason is that most of the $\{a_k\}$ realizations will give zero contribution to the integral sum because of the δ-function. To obtain an effective computational technique, one would have to choose the basis in accordance with the initial conditions. In the general case (3.28), the probability density function (3.17) is generally not Gaussian, but the Karhunen-Loève expansion still can be applied.
§ 5. Path-Integral Representations of Moments

Equations for statistical moments of the field (2.14) are of the same parabolic type as the instantaneous field equation (2.1). Thus, the path-integral representation of a solution is available, and is similar to those described in §§ 2 and 3. We can obtain it by applying eqs. (3.2)-(3.6) to eq. (2.14) with some obvious generalizations for the multi-dimensional case. However, here we use the solution of eq. (2.1) in the form of eq. (4.13), which provides a field realization $u(\boldsymbol{\rho}, z)$. Given realizations, we may construct the statistical moments $\Gamma_{n,m}$ defined by eq. (2.12). For averaging, we use the Gaussian δ-correlated model of $\tilde{\varepsilon}$ fluctuations [eq. (2.13)]. The result is
rn,,,(r,, ..., r,, =
r'l, ...,rk, 2)
J D2V1(Z)... D2u,(z) D2u;(z) *.. D2u:,(z) x
r;,,,[rl + Joz r;
+
ul(z) dz, *-., r ,
loz
u,(z) dz, JOZ
u; (z) dz, ...,rk
[u:(z)
+
+ Joz &(z) dr]
+ ... + u;(z) - ui2(z)- ... - u ~ ( z ) ]dz
If we use the SWE in the form eq. (4.17), we obtain
12;
+ + Ir, - rn0l2
x exp - [lrl - rlo12
+ jik
joz + [u:(z)
+ u;(z) - ui2(z)- .... - u:(z)]
dz
It can be shown that eqs. (5.1) and (5.2) satisfy eq. (2.14).

5.1. SECOND-MOMENT PATH-INTEGRAL REPRESENTATIONS

For the second moment, $n = m = 1$, we introduce new variables in eq. (5.2),
I',(R, p, 2)=
j
ID2V ( Z )D2V(Z)I'i
[
x exp ik
[.+ joz
V(Z) dz, p +
loz 1 loz jz2
x exp{-&ck2
joz W(Z)
d~]
V(z). w(z) dz
H[p+
v(C)dC,z]dz}.
(5.3)
Equation (5.3) represents the same value as eq. (2.20), so we hope that this path integral can be evaluated completely. We start from the plane wave case, when $\Gamma_2^0(\mathbf{R}, \boldsymbol{\rho}) = 1$, and the only dependence on $\mathbf{V}(z)$ is in the first exponent. The integration over $\mathbf{V}(z)$ can be carried out using the formula
1
[ joz
D2V(z)exp ik
1
V(z)-w(z) dz = 6, [v(z)],
(5.4)
where the right-hand side is a δ-functional (Novikov [1961]). Equality (5.4) can be obtained by using the finite-dimensional approximation of the functional integral and the normalization (4.14). For the general case, we use the SWE (5.2) as a starting point and obtain
rzm P, Z )
x
s
DzV(z) Dzv(z) 6
[
x exp ik
soz
(
soz
V(z) dz) 6
V(z). w(z) dz
1
(joz
w(z) dz)
1
Here the V(z) dependence is also in the argument of the &function, but we can use the Fourier expansion of the 6-function to rewrite the integral over V(z) as @ [ W ( . ) ] = ( ~ X ) - ~ s d z p DzV(z)
[
x exp ik
soz
V(z). w(z) dz - ip*sozV(z) dz].
Applying eq. (5.4), we get
and after substitution into eq. (5.5) and integration over w(z) and p we get
11 xexp{-$nkz
1 I1
~ o z H [ p o +Z~ ( p - - p o ) , zdz
,
(5.7)
which is the same as eq. (2.21), taking into account the change in notation.
5.2. FOURTH-MOMENT PATH-INTEGRAL REPRESENTATIONS

5.2.1. Spherical wave expansion

The basic fourth-moment functional integral formula can be obtained from eq. (5.2) when $n = m = 2$ and after a change of variables similar to eq. (2.23),
where the scattering function $F_4(\mathbf{r}_1, \mathbf{r}_2, \boldsymbol{\rho}, z)$ has been defined by eq. (2.25). Now we can integrate over the functional variable $\mathbf{V}(z)$ using eq. (5.6), as in the case of the second moment (5.7), to obtain
x exp{fik
joz
[ul(z)~uz(z)] dz
Note that only two functional variables are present in eq. (5.9),and the result depends on R and p as simple (not functional) variables. This circumstance is closely related to the possibility of reducing the full-dimensional fourthmoment problem [eq. (2.24)] to a plane-wave-type problem [eq. (2.38)]. Since the main problem with eq. (5.9) is connected with functional variables, for simplicity’s sake we further limit ourselves to the spatially averaged fourth moment [eq. (2.33)] and set p = po = 0. Then r4(r1,
=
rZ,
z,
jdZrl, dZrzoC ( h 0 , rzo) exp x
Iik
Tb.1 - r10)*(rz- rzo)
J D2U1(Z)DZUZ(Z)6 [jozdz] 6 [jozdz]
x exp[ik
Ul(Z)
joz
ul(z)~uz(z)dz]
UZ(Z)
1
where $\Gamma_4(\mathbf{r}_1, \mathbf{r}_2, Z) = \int d^2R\, \Gamma_4(\mathbf{R}, \mathbf{r}_1, \mathbf{r}_2, 0, Z)$ and the scattering function $F_4(\mathbf{r}_1, \mathbf{r}_2, z)$ has been defined by eq. (2.28). Note that the presence of the variables $\mathbf{v}_1$ and $\mathbf{v}_2$ in the scattering function $F_4$ makes it impossible to calculate eq. (5.10) analytically. In the case of the second moment, only one of the two functional variables was present in $H$. This made it possible to calculate the functional integral analytically, using eqs. (5.4) and (5.6). As was noted in § 1, the method for the evaluation of eq. (5.10) for asymptotically large values of $Z$ was proposed in papers by Zavorotny, Klyatskin and Tatarskii [1977] and by Dashen [1979]. We will not consider such approaches here. We address instead another fruitful idea, which uses the principle of simplification of the corresponding path integral under some reliable assumptions. This allows us to reduce the initial path integral to some ordinary finite-fold integral, which can be calculated analytically or numerically. We might start directly from eqs. (5.8) or (5.10), but there exist additional ways to compose new approximate formulae for the statistical field moments using various expansions described in §§ 2 and 4. According to them, eq. (5.8) represents the fourth moment of the field in the SWE (4.17). To compose the statistical moment of the wave, one can produce two other, different forms by multiplying and averaging OPWE and IPWE forms of the field, presented by eqs. (2.7) and (4.28), and eqs. (2.10) and (4.31), respectively. Several different formulae can be derived using mixed representations in the product of the fields. For convenience, we prefer to use the velocity representations for the variables of the Green function, but coordinate representation can be used as well. It is also possible to derive the forms corresponding to the OPWE and the IPWE directly from the SWE (5.8).

5.2.2. The outgoing plane-wave expansion

As in the case of the fields (2.7) and (4.28) dealt with before, we can exclude the initial boundary conditions on both paths. The first step is Fourier representation of the δ-functions in eq. (5.10), written as
We then shift the functional variables

$$ v_j(\zeta) \to v_j(\zeta) - p_j/k, \qquad j = 1, 2, \tag{5.12} $$

and change the variables

$$ p_i = \kappa_i - (r_i - r_{i0})\,k/Z, \qquad i = 1, 2. \tag{5.13} $$
As a result, we can rewrite eq. (5.10) as
where
=
S d Z x ld2x, exp x
S
rcl * K ,
i
+ ircl * ( r 2- r z 0 )+ irc, *(rl- r l 0 )
D2u1(z)D2u2(z)*expik SoZ
+
rz - u, (' -
k
1
q ( z ) uz(z)dz
SZz (0 u2
di, z] dn}.
(5.15)
This Green function representation of the problem differs from the representation in eq. (5.10). It is more convenient to write eq. (5.15) in terms of the complete Fourier transform of the moment in the initial plane, defined by
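$$ \tilde\Gamma_4^0(\kappa_2, \kappa_1) = \frac{1}{(2\pi)^4} \int d^2r_{10}\, d^2r_{20}\, \Gamma_4^0(r_{10}, r_{20}) \exp(-i\kappa_2 \cdot r_{10} - i\kappa_1 \cdot r_{20}). \tag{5.16} $$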
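We can then integrate over $r_{10}$ and $r_{20}$. This produces a simplified new form of $\Gamma_4$,

$$ \Gamma_4(r_1, r_2, Z) = \int d^2\kappa_1\, d^2\kappa_2\, \tilde\Gamma_4^0(\kappa_2, \kappa_1) \exp\left[ -i\frac{Z}{k}\,\kappa_1 \cdot \kappa_2 + i\kappa_1 \cdot r_2 + i\kappa_2 \cdot r_1 \right] $$
$$ \times \int D^2v_1(z)\, D^2v_2(z) \exp\left\{ ik \int_0^Z v_1(z) \cdot v_2(z)\, dz - \frac{\pi k^2}{4} \int_0^Z F_4\left[ r_1 - \kappa_1 \frac{Z - z}{k} + \int_z^Z v_1(\zeta)\, d\zeta,\; r_2 - \kappa_2 \frac{Z - z}{k} + \int_z^Z v_2(\zeta)\, d\zeta,\; z \right] dz \right\}. \tag{5.17} $$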
Comparing eqs. (5.17) and (5.10), we see that the former contains no end restrictions on the functional variables. Comparing it with eq. (4.28), we see that the arguments of $\tilde\varepsilon$ in eq. (4.28) and of $F_4(\cdot)$ in eq. (5.17) have similar forms. In fact, eq. (5.17) can be derived directly as a fourth moment of eq. (4.28), so we can treat eq. (5.17) as a fourth moment in the OPWE form.

5.2.3. The incoming plane-wave expansion
We can exclude the final boundary condition on both paths in the same way we did to get eq. (4.31). First, we make the replacement in eq. (5.10) of
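which is possible because of the δ-functions in eq. (5.10). We then repeat the transformations from eq. (5.11) through eq. (5.13). The final result is more clearly expressed for the spectrum $\tilde\Gamma_4(\kappa_2, \kappa_1, Z)$ than for $\Gamma_4(r_1, r_2, Z)$. The relationship between these two functions is analogous to eq. (5.16). Following these steps, we have finally

$$ \tilde\Gamma_4(\kappa_2, \kappa_1, Z) = \frac{1}{(2\pi)^4} \int d^2r_{10}\, d^2r_{20}\, \Gamma_4^0(r_{10}, r_{20}) \exp\left[ -i\frac{Z}{k}\,\kappa_1 \cdot \kappa_2 - i\kappa_1 \cdot r_{20} - i\kappa_2 \cdot r_{10} \right] $$
$$ \times \int D^2v_1(z)\, D^2v_2(z) \exp\left\{ ik \int_0^Z v_1(z) \cdot v_2(z)\, dz - \frac{\pi k^2}{4} \int_0^Z F_4\left[ r_{10} + \frac{z}{k}\kappa_1 + \int_0^z v_1(\zeta)\, d\zeta,\; r_{20} + \frac{z}{k}\kappa_2 + \int_0^z v_2(\zeta)\, d\zeta,\; z \right] dz \right\}. \tag{5.18} $$

Again, comparing eqs. (5.18) and (4.31), we see that the arguments of $\tilde\varepsilon$ in eq. (4.31) and of $F_4(\cdot)$ in eq. (5.18) have similar forms. In fact, eq. (5.18) can be derived directly as a fourth moment of eq. (4.31). Therefore, eq. (5.18) is a fourth moment in the IPWE form.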
5.2.4. The mixed plane-wave expansion

In the case of the fourth moment we have one more possible representation. It is a mixed IPWE-OPWE, not for fields but for the fourth moment (5.10). To get this mixed plane-wave expansion (MPWE), we need to make the replacement in eq. (5.1) of
and then to transform the variables according to eqs. (5.12)-(5.14) to obtain
iK1
x
[
+
(r2 - rzO) k2* (rl - rl0)- I
{
D2u1(z)D2u2(z)exp ik
loZ
x1 * x2
ul(z).u2(z) dz
(5.19)

A comparison of eqs. (5.19) and (5.15) shows that the former has the same degree of complexity. Although $F_4(\cdot, \cdot)$ in eq. (5.10) is symmetric with respect to the arguments $r_1$ and $r_2$, in eq. (5.19) it is not. The MPWE is written most conveniently in terms of partial spectral densities of $\Gamma_4$ and $\Gamma_0$, namely,
(5.20)
Substituting eq. (5.20) in eq. (5.19), we get
Z)
f 4 ( r l Y K1,
=
4nZ
SdzKzd2r2, P:(xZ, r20)
[ 4
x exp - i
r
K1
+ iKz - r1
K 2 - i K 1 * rz0 i-
1
rz
(5.21) To represent eq. (5.19) or eq. (5.21) by coordinate variables, we need to introduce
(5.22a) and (5.22b) Note that this change of variables introduces the additional boundary conditions p1(Z)= 0, p2(0)= 0. In this case, eq. (5.21) becomes
1 4n2
= -S
d z ~d2r2, 2 I ? : ( K ~ ,rzO)
{
x exp - ink2
loz
F4 [rl
(zk + p1 (z),
- K~ - z,
(5.23) Equation (5.23) exhibits clearly the mixed nature of the representation, because boundary conditions are enforced on opposite ends of the paths. The mixed representation [eqs. (5.21) and (5.23)] differs from the previous three [eqs. (5.10), (5.17) and (5.18)], because it can neither be obtained as a fourth moment of a field in some form nor as a combination of field representations.
§ 6. The Connection between Heuristic Approximations and Path-Integral Representations

6.1. HEURISTIC FIELD APPROXIMATIONS
Now we can easily connect the exact representation of fields [eqs. (4.17), (4.28), and (4.31)] with the following heuristic analytical approximations: (i) the phase approximation of the Huygens-Kirchhoff method (PAHKM), which was first used by Feyzulin and Kravtsov [1967] and analyzed by Banakh and Mironov [1977]; (ii) the approximation that we call here the phase approximation of the plane wave expansion method (PAPWEM), first used by Gochelashvili [1971]; and (iii) the spectral phase approximation of the Huygens-Kirchhoff method (SPAHKM), described by Aksenov and Mironov [1978]. In a later paper Aksenov and Mironov [1979] used another name for this technique, namely, the phase approximation of the spectral expansion method (PASEM). We consider these to be heuristic approximations because they were derived by replacing some part of an exact expression by an approximate one, primarily for the obvious computational advantages to be gained. However, their limits of applicability have still not been properly investigated.
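First, we show that these approximations have a direct connection with the exact field representations discussed in §§ 2 and 4: the spherical wave expansion (SWE), the outgoing plane wave expansion (OPWE), and the incoming plane wave expansion (IPWE). The main formula of the PAHKM is

$$ u_s(R, Z) = \frac{k}{2\pi i Z} \int d^2R_0\, u_0(R_0) \exp\left[ \frac{ik|R - R_0|^2}{2Z} + \frac{ik}{2} \int_0^Z \tilde\varepsilon\left( R_0 + \frac{z}{Z}(R - R_0),\, z \right) dz \right]. \tag{6.1} $$

It is seen that eq. (6.1) can be derived from the SWE (4.17) if one neglects the dependence on $v(\zeta)$ in the argument of $\tilde\varepsilon$ and integrates over $v(z)$, using eq. (4.18). The Green function approximation corresponding to eq. (6.1) is

$$ G_s(R, Z; R_0) = \frac{k}{2\pi i Z} \exp\left[ \frac{ik|R - R_0|^2}{2Z} + \frac{ik}{2} \int_0^Z \tilde\varepsilon\left( R_0 + \frac{z}{Z}(R - R_0),\, z \right) dz \right], \tag{6.2} $$

and it obviously satisfies the reciprocity condition (2.5). In the same way, the formula of the PAPWEM can be obtained from the OPWE (4.28), again omitting the $v(\zeta)$ dependence in the argument of $\tilde\varepsilon$,

$$ u_p(R, Z) = \int d^2P_0\, \tilde u_0(P_0) \exp\left[ iP_0 \cdot R - \frac{iP_0^2 Z}{2k} + \frac{ik}{2} \int_0^Z \tilde\varepsilon\left( R - (Z - z)P_0/k,\, z \right) dz \right]. \tag{6.3} $$

The Green function approximation corresponding to eq. (6.3) is

$$ G_p(R, Z; R_0) = (2\pi)^{-2} \int d^2P_0 \exp\left[ iP_0 \cdot (R - R_0) - \frac{iP_0^2 Z}{2k} + \frac{ik}{2} \int_0^Z \tilde\varepsilon\left( R - (Z - z)P_0/k,\, z \right) dz \right], \tag{6.4} $$

and the reciprocity condition (2.5) is invalid for $G_p$.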
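The straight-line structure of the PAHKM Green function can be sketched numerically. The following is a minimal illustration, not the authors' code: it evaluates the phase $\tfrac{1}{2}k\int_0^Z \tilde\varepsilon\,dz$ along the straight line from $R_0$ to $R$ with a trapezoidal rule and multiplies it onto the free-space Fresnel kernel. The function names, the sample medium `eps`, and all numerical values are hypothetical. The reciprocity check assumes that condition (2.5), which is not reproduced in this section, amounts to exchanging source and receiver while reversing the medium along $z$.

```python
import cmath
import math


def straight_line_phase(eps, r0, r, Z, k, steps=400):
    """Trapezoidal estimate of 0.5*k * integral_0^Z eps(R0 + (R - R0) z/Z, z) dz."""
    h = Z / steps
    total = 0.0
    for i in range(steps + 1):
        z = i * h
        x = r0[0] + (r[0] - r0[0]) * z / Z
        y = r0[1] + (r[1] - r0[1]) * z / Z
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * eps(x, y, z)
    return 0.5 * k * h * total


def green_free(r0, r, Z, k):
    """Free-space parabolic-equation (Fresnel) kernel."""
    d2 = (r[0] - r0[0]) ** 2 + (r[1] - r0[1]) ** 2
    return k / (2j * math.pi * Z) * cmath.exp(1j * k * d2 / (2.0 * Z))


def green_pahkm(eps, r0, r, Z, k, steps=400):
    """Free-space kernel times the straight-line phase factor."""
    phase = straight_line_phase(eps, r0, r, Z, k, steps)
    return green_free(r0, r, Z, k) * cmath.exp(1j * phase)


# Hypothetical smooth permittivity perturbation and geometry (illustrative values only).
K = 2.0 * math.pi / 1.0e-6   # wavenumber for a 1 micron wavelength
Z_PATH = 1000.0              # path length in metres


def eps(x, y, z):
    return 1.0e-9 * math.sin(30.0 * x + 20.0 * y) * math.cos(z / 150.0)


def eps_reversed(x, y, z):
    # The same medium traversed in the opposite direction.
    return eps(x, y, Z_PATH - z)
```

With $\tilde\varepsilon = 0$ the function `green_pahkm` returns the free-space kernel, and swapping the end points together with `eps_reversed` leaves the value unchanged, because both evaluations integrate the same medium along the same straight line.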
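Considering the IPWE (4.31) and omitting the $v(\zeta)$ dependence in the argument of $\tilde\varepsilon$, we obtain the first term of the SPAHKM,

$$ \tilde u_{sp}(P, Z) = \frac{1}{4\pi^2} \int d^2R_0\, u_0(R_0) \exp\left[ -iP \cdot R_0 - \frac{iP^2 Z}{2k} + \frac{ik}{2} \int_0^Z \tilde\varepsilon\left( R_0 + \frac{z}{k}P,\, z \right) dz \right]. \tag{6.5} $$

The Green function approximation corresponding to eq. (6.5) is

$$ G_{sp}(R, Z; R_0) = \frac{1}{4\pi^2} \int d^2P \exp\left[ iP \cdot (R - R_0) - \frac{iP^2 Z}{2k} + \frac{ik}{2} \int_0^Z \tilde\varepsilon\left( R_0 + \frac{z}{k}P,\, z \right) dz \right], \tag{6.6} $$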
and the reciprocity condition (2.5) is invalid for $G_{sp}$. All these approximations can be called “straight line approximations” because they can be obtained by neglecting all deviations of the paths from a straight line. In terms of the probabilistic interpretations [eqs. (3.15) and (3.20)], this means replacing the averaging of a functional over paths by the value of the functional on the mean path. In terms of orthogonal expansions of paths [eqs. (4.43) and (4.52)], these approximations are equivalent to neglecting all the terms in the expansion but the zero-order terms. Approximations (6.1)-(6.6) were produced heuristically by neglecting the amplitude fluctuations in the corresponding spherical and plane waves and assuming simple geometrical optics formulae for the phase fluctuations. Physically, these approximations ignore the cumulative effect of the perturbed paths on the contribution of the inhomogeneities to the field. The connection between the PAHKM and the functional integral representation was discussed in papers by Zavorotny, Klyatskin and Tatarskii [1977] and Tatarskii and Zavorotny [1986]. It should be noted that although eqs. (4.17), (4.28), and (4.31) are just different forms of the same exact solution of eq. (2.1), approximations (6.1)-(6.6) are quite different if $\tilde\varepsilon \neq 0$. It is obvious that we can obtain a series expansion of the exact path representations of the fields described by eqs. (4.17), (4.28), and (4.31) using
eqs. (6.1)-(6.6) as first terms. For example, we can rewrite eq. (4.17) as
u(R,2) =
j
d2RouO(R0)exp
x exp {iik
x exp {iik
[
[Io2 jo2 + ( u2(z)dz
[2( Ro -E
]j + joz+
iklR - Ro12 2z
W Z )
6
[loZ
u(z) dz]
E(Ro ( R - R o ) z / Z ,z ) dz
( R - R o )z
+ jzz u(5) di, z)
Ro+(R-Ro)--,z z
)Idz}.
(6.7)
We can now expand the last exponent in eq. (6.7) in a series and perform the $v(z)$ integration via the Fourier expansion and eq. (5.6) to obtain the series in which the first term coincides with eq. (6.1). It is important that all these approximations give the exact value of the first [eq. (2.18)] and second [eq. (5.7)] moments.

6.2. FOURTH-MOMENT HEURISTIC APPROXIMATIONS
Now, if we return to the spherical wave expansion for the fourth moment [eq. (5.10)] and omit one of the functional arguments of the function $F_4(\cdot)$, then the path integration becomes trivial, and we obtain the PAHKM formula for the fourth moment $\Gamma_s(r_1, r_2, Z)$
(6.8)
which can also be obtained as the fourth moment of eq. (6.1). This approximation was widely used for a variety of problems (Lutomirskii and Yura [1971], Yura [1972], Lee, Holmes and Kerr [1976], Fante [1975, 1985], Mironov [1981]), although it is clear that it has some drawbacks. First, it
is not difficult to show that eq. (6.8) gives a zero scintillation index for a point source regardless of the strength of the fluctuations in the medium. Second, eq. (6.8) does not satisfy the energy conservation principle (§ 2.2), which here has the form

$$ \int \Gamma_4(0, r_2, Z)\, d^2r_2 = \text{const.} \tag{6.9} $$
So, approximation (6.8) is known to be invalid for the calculation of scintillations for small sources and for large incoherent sources, as well as for the calculation of the spatial covariance of the intensity when there is a large separation between the receiving points. At the same time, eq. (6.1) gives the exact result for the second moment and qualitatively correct asymptotic results for the plane wave scintillation index (Mironov [1981]). The same procedure of omitting the functional arguments applied to the OPWE [eq. (6.3)] gives us the PAPWEM formula for the fourth moment,

$$ \Gamma_p(r_1, r_2, Z) = \int d^2\kappa_1\, d^2\kappa_2\, \tilde\Gamma_4^0(\kappa_2, \kappa_1) \exp\left[ -i\frac{Z}{k}\,\kappa_1 \cdot \kappa_2 + i\kappa_1 \cdot r_2 + i\kappa_2 \cdot r_1 \right] $$
$$ \times \exp\left\{ -\frac{\pi k^2}{4} \int_0^Z F_4\left[ r_1 - \kappa_1 (Z - z)/k,\; r_2 - \kappa_2 (Z - z)/k,\; z \right] dz \right\}, \tag{6.10} $$
which coincides with the fourth moment of eq. (6.3). This approximation does not describe the plane wave scintillation index correctly, but gives qualitatively correct asymptotic results for the point source scintillation index (Mironov [1981]). It can be shown that $\Gamma_p(r_1, r_2, Z)$ does not satisfy the energy conservation principle (6.9), nor does it satisfy the reciprocity principle. Neglecting both the $v(\zeta)$ variables in the IPWE (5.18) gives

$$ \tilde\Gamma_{sp}(\kappa_2, \kappa_1, Z) = \frac{1}{(2\pi)^4} \int d^2r_{10}\, d^2r_{20}\, \Gamma_4^0(r_{10}, r_{20}) \exp\left[ -i\frac{Z}{k}\,\kappa_1 \cdot \kappa_2 - i\kappa_1 \cdot r_{20} - i\kappa_2 \cdot r_{10} \right] $$
$$ \times \exp\left\{ -\frac{\pi k^2}{4} \int_0^Z F_4\left[ r_{10} + \frac{z}{k}\kappa_1,\; r_{20} + \frac{z}{k}\kappa_2,\; z \right] dz \right\}, \tag{6.11} $$
which is the same as the fourth moment of eq. (6.5); i.e., the first term of the
SPAHKM. This gives qualitatively correct asymptotic results for the point source and plane wave scintillation index (Aksenov and Mironov [1978, 1979]), but it needs two more integrations than eq. (6.8) to obtain the second moment of intensity. It can be shown that $\tilde\Gamma_{sp}(\kappa_2, \kappa_1, Z)$ does not satisfy the energy conservation principle [eq. (6.9)], nor does it satisfy the reciprocity principle. The last three formulae written in terms of $\Gamma_4$ can be found in the book by Mironov [1981]. If we omit one of the functional variables $v_i$ in the arguments of $F_4(\cdot)$ in the MPWE (5.21), we can calculate the remaining functional integral. In that case the result is $\tilde\Gamma_{ts}(r_1, \kappa_1, Z)$
=
Z)
1 4n2 s d zt i z d2rzoP ~
x exp{ - i n k z
~ ( Krzo) ~ exp ,
soz
F4[r1
x1
(Z -4 k
- K ~-
1
+ iKZ* r l
K~ - iK1 *rzo Z
Equation (6.12) coincides with Uscinski's solution (Uscinski [1982, 1985]) and the zero-order approximation of the two-scale expansion derived by Macaskill [1983] and by Frankenthal, Whitman and Beran [1984], which is now widely used (Whitman and Beran [1985, 1988, 1992], Mazar, Gozani and Tur [1985], Beran and Mazar [1987], Gozani [1987, 1988]). Gozani [1987, 1988, 1991] has developed an iterative procedure based on the two-scale expansion technique. His solution in terms of a uniformly convergent series shows that the solution is not an asymptotic expansion. We note that the two-scale approximation, unlike the previous three, cannot be described as a fourth moment of some field representation. Another specific feature of eq. (6.12) is the absence of the inherent symmetry of the first two arguments of

$$ \Gamma_{ts}(r_1, r_2, Z) = \int d^2\kappa_1\, \tilde\Gamma_{ts}(r_1, \kappa_1, Z) \exp(i\kappa_1 \cdot r_2), $$

and that $\Gamma_{ts}(0, r_2, Z)$ is a better approximation than $\Gamma_{ts}(r_1, 0, Z)$ for intensity correlation purposes. Because of the symmetry of $r_1$ and $r_2$ in $\Gamma_4(r_1, r_2, Z)$, the energy conservation principle (6.9) obviously has two forms, which can be represented through $\tilde\Gamma_4$ as

$$ \int \tilde\Gamma_4(r_1, \kappa_1, Z)\, d^2r_1\, d^2\kappa_1 = \text{const.}, \tag{6.13} $$
or

$$ \tilde\Gamma_4(r_1 = 0, \kappa_1 = 0, Z) = \text{const.} \tag{6.14} $$
Approximation (6.12) satisfies the energy conservation principle in the form (6.14) but not in the form (6.13), and this seems to be the drawback of the two-scale theory. The reciprocity principle can be satisfied if both the propagation direction and the coordinate variables representation are changed simultaneously. The two-scale approximation gives qualitatively correct asymptotic results for plane and spherical wave scintillation indexes. Our derivation of eq. (6.12) clarifies the nature of the two-scale approximation. It is now clear that this approximation is not an asymptotic solution.

On the basis of extensive experience with these four approximations (mostly the PAHKM and the two-scale method) in various problems of wave propagation through random media, we argue that the outcome of applying these approximations depends strongly on the problem to which they are applied. One can find many examples of applications in which the result was unsuccessful; for instance, the scintillation index of a point source in the PAHKM and of a plane wave in the PAPWEM, which erroneously is zero for an arbitrary level of inhomogeneity. This means that all these approximations describe the intensity fluctuation phenomena unsatisfactorily, and need corrections and improvements. This was noted in many works (Tatarskii and Zavorotny [1986], Tur and Beran [1983], Mironov [1981], Gozani [1987, 1988], Furutsu [1988]) and some methods were provided for these purposes.

6.3. AN ORTHOGONAL EXPANSION OF THE PATH INTEGRAL FOR THE FOURTH MOMENT
In § 4.4, we demonstrated the idea of the orthogonal expansion for the path variables for a two-dimensional case. We now apply this to the case of wave propagation in a three-dimensional random medium. Let us make the notational change (4.1) in eq. (4.2) in order to obtain a field realization $u(r, Z)$ that is governed by the parabolic equation. We get
(6.15)
We now have the vectors $a_n$ instead of numbers, but each component remains a statistically independent Gaussian number. The meaning of the orthogonal functions $\varphi_n(z)$ also remains the same, and all the corresponding material in § 4 is applicable, taking into account the new notation. Using eq. (6.15) together with the Gaussian δ-correlated model for the $\tilde\varepsilon$ fluctuations [eq. (2.13)], we can write the statistical moments $\Gamma_{nm}$. For the fourth moment we obtain
(6.16) where
While we have changed a real diffusion coefficient D to an imaginary one, i/2k, the “averaging” operation ( ) also has changed its meaning. It corresponds to a transition from a Wiener to a Feynman path integral. A more rigorous mathematical development of this transition can be carried out assuming the analyticity of the function F in eq. (6.16) (Tatarskii [1976]). Therefore, taking into account eq. (6.17), the “averaging” operation for given n is determined by
.
x f (..., an,b,, a;, bb, ...) exp{$ik[u:
+ b: - ukz - &’I}.
(6.18)
Formally this operation can be considered as an averaging over “statistically independent Gaussian random numbers” $\{a_{i,n}\}$, $\{b_{i,n}\}$, $\{a'_{i,n}\}$, $\{b'_{i,n}\}$, which are
(6.19)
The $\varphi_n(z)$ functions in eq. (6.17) can be chosen in the same way as presented in § 4, using the Karhunen-Loève expansion. For instance, in the case of an initial plane wave, $\Gamma_{22}^0 = 1$, we get a cosinusoidal expansion for the paths, like eq. (4.46), and in the case of a point source, we get a sinusoidal one, like eq. (4.54). The above example of an orthogonal expansion of the path integral for the fourth moment demonstrates a new approach, in which the path integral is approximated by an n-fold regular oscillatory integral; for $n \to \infty$, it becomes exact. We may write $\Gamma_{22}$ in the form of the spherical wave expansion, similar to the path integral (5.8) but in a coordinate representation, and then use an orthogonal expansion of the path variables. If we retain only the zero-order term of the expansion of the path $\rho_n(z)$, then we finally obtain for $\Gamma_{22}$ an expression similar to eq. (6.8), which corresponds to the phase approximation of the Huygens-Kirchhoff method (PAHKM). To obtain a more precise approximation, one should “plug in” the next term of the path expansion [eq. (6.18)] with n = 1 and add a new integration over $a_1$, $b_1$, and so on. This approach allows us to estimate corrections to the PAHKM, and therefore to estimate the applicability limits of the PAHKM. Calculations of the nth-order approximation for the scintillation index (normalized intensity variance),

$$ m_n^2 = \Gamma_{22}^{(n)}(0, 0, 0, 0, Z) - 1, \tag{6.20} $$

in the case of an initial plane wave and uniform Kolmogorov turbulence, in the regimes of both weak and strong intensity fluctuations, were reported by Tatarskii and Zavorotny [1986]. In a weak-fluctuation regime, the relative error e(n) of the approximation

$$ m_n^2 = [1 - e(n)]\,\beta_0^2 \tag{6.21} $$

is 0.31, 0.12, and 0.07 for n = 0, 1, and 2, respectively, where $\beta_0^2$ is the Rytov variance. In a strong-fluctuation regime, the relative error with respect to the coefficient 0.86 in the asymptotic expression (Gochelashvili and Shishov [1974], Yakushkin [1975], Zavorotny, Klyatskin and Tatarskii [1977])

$$ m^2 = 1 + 0.86\,\beta_0^{-4/5}, \tag{6.22a} $$

i.e.,

$$ m_n^2 = 1 + [1 - e(n)]\,0.86\,\beta_0^{-4/5}, \tag{6.22b} $$

was found to be 0.41, 0.36, and 0.27 for n = 0, 1, and 2, respectively. In agreement with previous studies (Mironov [1981]), we see that n = 0 (i.e., PAHKM) is a crude approximation for a plane wave even in weak scintillations. Further computations for 1 < n < 100 confirmed a slow convergence of the expansion on a sinusoidal basis. It was found by computations that the relative error is

$$ e(n) \approx 0.12\,n^{-0.91}, \tag{6.23a} $$

and

$$ e(n) \approx 0.36\,n^{-0.59}, \tag{6.23b} $$

in the weak- and strong-fluctuation regimes, respectively. Computations for an initially spherical wave likewise revealed slow convergence. The asymptotic expressions to compare are (see, e.g., Prokhorov, Bunkin, Gochelashvili and Shishov [1975], Yakushkin [1985]):
$$ m_n^2 = 0.40\,[1 - e(n)]\,\beta_0^2, \tag{6.24a} $$

and

$$ m_n^2 = 1 + 2.74\,[1 - e(n)]\,\beta_0^{-4/5}, \tag{6.24b} $$

in the weak- and strong-fluctuation regimes, respectively. The case of n = 0, also known as the PAHKM, erroneously predicts no scintillation and thus e(0) = 1, both in the weak- and strong-fluctuation regimes. Further computations for 1 < n < 100 gave a relative error of

$$ e(n) \approx 0.36\,n^{-0.91}, \tag{6.25a} $$

and

$$ e(n) \approx 0.73\,n^{-0.54}, \tag{6.25b} $$

in the weak- and strong-fluctuation regimes, respectively. One of the possible ways to improve convergence is to choose yet another set of orthogonal functions $\{\varphi_n(z)\}$. For instance, if we start from the Walsh functions for $\psi_n(z)$ (Tatarskii [1976]), we get so-called “triangular” functions for $\varphi_n(z)$, the use of which might be more suitable for path expansions, but this problem has not been examined in sufficient detail.
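The practical meaning of the fitted power laws (6.23) and (6.25) can be made concrete with a short calculation. The sketch below is illustrative only: the function and dictionary names are hypothetical, and applying the fits outside the reported range 1 < n < 100 is an extrapolation rather than a result from the text. It tabulates e(n), finds the smallest n that brings the error below a target, and evaluates the plane wave formulas (6.21) and (6.22b), using $\beta_0^{-4/5} = (\beta_0^2)^{-2/5}$.

```python
# (coefficient c, exponent p) of the fits e(n) ~ c * n**(-p), eqs. (6.23) and (6.25).
FITS = {
    ("plane", "weak"): (0.12, 0.91),
    ("plane", "strong"): (0.36, 0.59),
    ("spherical", "weak"): (0.36, 0.91),
    ("spherical", "strong"): (0.73, 0.54),
}


def relative_error(n, wave, regime):
    """Fitted relative error e(n) of the nth-order approximation, for n >= 1."""
    c, p = FITS[(wave, regime)]
    return c * n ** (-p)


def terms_needed(target, wave, regime):
    """Smallest n >= 1 with e(n) <= target (an extrapolation if n reaches 100 or more)."""
    n = 1
    while relative_error(n, wave, regime) > target:
        n += 1
    return n


def plane_wave_index(n, beta0_sq, regime):
    """nth-order plane wave scintillation index from eq. (6.21) or eq. (6.22b)."""
    e = relative_error(n, "plane", regime)
    if regime == "weak":
        return (1.0 - e) * beta0_sq
    return 1.0 + (1.0 - e) * 0.86 * beta0_sq ** (-0.4)
```

For example, `terms_needed(0.05, "plane", "weak")` asks how many terms a 5% error requires in the weak-fluctuation plane wave case, and the same call with `"strong"` shows how much more slowly the strong-fluctuation fit decays.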
An orthogonal expansion of the path integral for the statistical moments can also be used for examining and improving other heuristic methods, like the PAPWEM, the SPAHKM, and the two-scale method.
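The idea of truncating an orthogonal path expansion can be illustrated on a toy case. The sketch below assumes, as a simplification not taken from the text, a real (Wiener) unit-diffusion path pinned at both ends of a unit range interval, whose Karhunen-Loève basis is the sine family; the normalization of eq. (4.54) is not reproduced here, so the coefficients are those of the standard Brownian bridge. It compares the path variance carried by the first N sine modes with the exact value t(1 - t). Zero modes correspond to the straight-line path. The function names are hypothetical.

```python
import math


def pinned_path_variance(t, n_terms):
    """Variance at fractional range t carried by the first n_terms sine modes."""
    return sum(
        2.0 * math.sin(n * math.pi * t) ** 2 / (n * math.pi) ** 2
        for n in range(1, n_terms + 1)
    )


def exact_variance(t):
    """Exact variance of a unit-diffusion path pinned at t = 0 and t = 1."""
    return t * (1.0 - t)


def truncation_error(t, n_terms):
    """Variance missed by truncating the expansion after n_terms modes."""
    return exact_variance(t) - pinned_path_variance(t, n_terms)
```

In this toy model the missed variance at mid-range falls off only roughly as 1/N. This concerns the path variance alone and is not the scintillation-index error e(n) quoted above, but it gives a feel for why a sine basis can converge slowly.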
§ 7. Conclusions

In this article we considered the theory of wave propagation in a continuous large-scale random medium based on the path-integral technique. Attention was devoted mainly to the connection between the path-integral technique and different analytical approximations for solutions of the statistical moment equations, like the phase approximation of the Huygens-Fresnel method and other heuristic approaches that are based on various plane wave expansions. The treatment of the path integral considered here helped us to understand the nature of the two-scale approximation. It is now clear that this approximation is not an asymptotic one. We also examined the probabilistic interpretation of the path-integral representation, and an approach using orthogonal expansions for the path variables was developed on this basis. The latter approach indicates a possible way to improve the various approximations considered in the earlier parts of the article. Here an orthogonal expansion was obtained whose zero-order term leads to the Huygens-Fresnel formula. However, the same idea of orthogonal expansions can be used to improve any other approximate solution considered above. We hope that the whole set of these approximations with the orthogonal expansion corrections will be a rather good instrument for analytical and numerical solutions of problems connected with fourth and perhaps higher moments.

If we look at the advantages of the path-integral technique in wave propagation studies more generally, we note that this technique provides a uniform approach to a variety of propagation problems, and to analytical and computational methods. By using it we may obtain a clear representation of the complex functional of a field, which is impossible to obtain by any other technique. The path integral method was shown to be a powerful tool for asymptotic analysis of statistical field moments.
In the present article we demonstrated how to construct analytical approximations and to estimate their accuracy using path integral representations. One can expect further intensive exploitation of the path integral method, possibly for the creation of numerical techniques, and for expanding its field of application; e.g., to a random medium combined with a regular refractivity background, wave guides, resonators, rough surfaces, etc.
Acknowledgements
This research was performed with support from the Wave Propagation Laboratory (WPL) of the National Oceanic and Atmospheric Administration (NOAA). In particular, the authors thank S. Clifford and R. Hill of WPL for the encouragement to write this review. Notes made by I. Besieris on the preliminary draft are highly appreciated. M. I. Charnotskii, V. I. Tatarskii and V. U. Zavorotny are on leave from the P. N. Lebedev Physical Institute, Russian Academy of Sciences, Moscow, Russia.
References

Aksenov, V.P., and V.L. Mironov, 1978, Spectral expansion method in problems of laser-beam propagation in the turbulent atmosphere, Opt. Lett. 3, 184.
Aksenov, V.P., and V.L. Mironov, 1979, Method of spectral expansions in problems of optical wave propagation in turbulent media, Radiophys. & Quantum Electron. 22, 414.
Banakh, V.A., and V.L. Mironov, 1977, Phase approximation of the Huygens-Kirchhoff method in problems of laser beam propagation in the turbulent atmosphere, Opt. Lett. 1, 172.
Beran, M.J., and T.L. Ho, 1968, Propagation of the fourth-order coherence function in a random medium (a nonperturbative formulation), J. Opt. Soc. Am. 59, 1134.
Beran, M.J., and R. Mazar, 1987, Intensity fluctuations in a quadratic channel, J. Acoust. Soc. Am. 82, 588.
Beran, M.J., A.M. Whitman and S. Frankenthal, 1982, Scattering calculations using the characteristics rays of the coherence function, J. Acoust. Soc. Am. 71, 1124.
Brown Jr, W.P., 1972, Moments equations for waves in random media, J. Opt. Soc. Am. 62, 45.
Cameron, R.H., 1951, A “Simpson rule” for the numerical evaluation of Wiener’s integrals in functional space, Duke Math. J. 18, 111.
Charnotskii, M.I., 1991, Asymptotic analysis of the flux fluctuations averaging and finite-size source scintillations in random media, Waves in Random Media 1, 223.
Chernov, L.A., 1960, Wave Propagation in Random Medium (McGraw-Hill, New York).
Chernov, L.A., 1968, The local method of computing strong fluctuations of the field in the problem of wave propagation in a medium with random inhomogeneities, in: Proc. 6th All-Union Acoustical Conf. (Nauka, Moscow).
Chernov, L.A., 1969, Equations for the statistical moments of the field in a randomly inhomogeneous medium, Sov. Phys. Acoust. 15, 511.
Chow, P.L., 1972, Application of function space integrals to problems in wave propagation in random media, J. Math. Phys. 13, 1224.
Chow, P.L., 1975, A functional phase-integral method and applications to the laser beam propagation in random media, J. Stat. Phys. 12, 93.
Codona, J.L., and R.G. Frehlich, 1987, Scintillation from extended incoherent sources, Radio Sci. 22, 469.
Codona, J.L., D.B. Creamer, S.M. Flatté, R.G. Frehlich and F.S. Henyey, 1986a, Moment-equation and path-integral techniques for wave propagation in random media, J. Math. Phys. 27, 171.
Codona, J.L., D.B. Creamer, S.M. Flatté, R.G. Frehlich and F.S. Henyey, 1986b, Solution for the fourth moment of waves propagating in random media, Radio Sci. 21, 929.
Dashen, R., 1979, Path integrals for waves in random media, J. Math. Phys. 20, 894.
Dashen, R., S.M. Flatté and S.A. Reynolds, 1985, Path-integral treatment of acoustic mutual coherence functions for rays in a sound channel, J. Acoust. Soc. Am. 77, 1716.
Dolin, L.S., 1964, Beam description of weakly-inhomogeneous wave fields, Radiophys. & Quantum Electron. 7, 244.
Dolin, L.S., 1968, Equations for the correlation functions of a wave beam in a randomly-inhomogeneous medium, Radiophys. & Quantum Electron. 11, 486.
Dubovikov, M.M., 1984, Class of non-Gaussian functional integrals, Theor. & Math. Phys. 58, 215.
Einstein, A., 1905, On the movement of small particles suspended in a stationary liquid demanded by the molecular-kinetic theory of heat, Ann. Phys. 17, 549 [Translation: 1956, Investigation on the theory of the Brownian Movement, ed. R. Fürth, translator A.D. Cowper (Dutton, New York)].
Fante, R.L., 1975, Electromagnetic beam propagation in turbulent media, Proc. IEEE 63, 1669.
Fante, R.L., 1985, Wave propagation in random media: A systems approach, in: Progress in Optics, Vol. 22, ed. E. Wolf (North-Holland, Amsterdam) pp. 341-398.
Feynman, R.P., 1948, Space-time approach to non-relativistic quantum mechanics, Rev. Mod. Phys. 20, 367.
Feynman, R.P., 1958, Selected Papers on Quantum Electrodynamics, ed. J. Schwinger (Dover, New York).
Feynman, R.P., and A.R. Hibbs, 1965, Quantum Mechanics and Path Integrals (McGraw-Hill, New York).
Feyzulin, Z.I., and Yu.A. Kravtsov, 1967, Broadening of a laser beam in a turbulent atmosphere, Radiophys. & Quantum Electron. 10, 33.
Flatté, S.M., R. Dashen, W.H. Munk, K.M. Watson and F. Zachariasen, 1979, Sound Transmission Through a Fluctuating Ocean, ed. S.M. Flatté (Cambridge University Press, Cambridge).
Flatté, S.M., D.R. Bernstein and R. Dashen, 1983, Intensity moments by path integral techniques for wave propagation through random media, with application to sound in the ocean, Phys. Fluids 26, 1701.
Fock, V.A., 1950, Theory of radio-wave propagation in an inhomogeneous atmosphere for a raised source, Izv. Akad. Nauk SSSR Ser. Fiz. 14, 70. In Russian. [Translation: 1965, Electromagnetic Diffraction and Propagation Problems (Pergamon, Oxford) ch. 14].
Fradkin, E.S., 1966, Application of functional methods in quantum field theory and quantum statistics, Nucl. Phys. 76, 588.
Frankenthal, S., M.J. Beran and A.M. Whitman, 1982, Caustic corrections using coherence theory, J. Acoust. Soc. Am. 71, 348.
Frankenthal, S., A.M. Whitman and M.J. Beran, 1984, Two-scale solutions for intensity fluctuations in strong scattering, J. Opt. Soc. Am. A 1, 585.
Frehlich, R.G., 1987, Space-time fourth moment of waves propagating in random media, Radio Sci. 22, 481.
Furutsu, K., 1988, Intensity correlation functions of lightwaves in a turbulent medium: An exact version of the two-scale method, Appl. Opt. 27, 2127.
Gel'fand, I.M., and A.M. Yaglom, 1960, Integration in functional spaces and its applications in quantum physics, J. Math. Phys. 1, 48.
Gochelashvili, K.S., 1971, Saturation of the fluctuations of focused radiation in a turbulent medium, Radiophys. & Quantum Electron. 14, 470.
Gochelashvili, K.S., 1974, Propagation of focused laser radiation in a turbulent medium, Sov. J. Quantum Electron. 4, 465.
Gochelashvili, K.S., and V.I. Shishov, 1971, Laser beam scintillation beyond a turbulent layer, Opt. Acta 18, 313.
Gochelashvili, K.S., and V.I. Shishov, 1974, Saturated fluctuations in the laser radiation intensity in a turbulent medium, Sov. Phys.-JETP 39, 605.
Gozani, J., 1987, Improvement in the two-scale solution for wave propagation in a random medium, Opt. Lett. 12, 1.
Gozani, J., 1988, Applicability of the two-scale solution of a wave propagating in a random medium, J. Opt. Soc. Am. B 5, 721.
Gozani, J., 1991, Two-scale expansion of wave propagation in a random medium, Comput. Phys. Commun. 65, 117.
Gurvich, A.S., B.S. Elepov, V.V. Pokasov, K.K. Sabel'feld and V.I. Tatarskii, 1979a, Spatial structure of strong fluctuations of light intensity in a turbulent medium, Radiophys. & Quantum Electron. 22, 135.
Gurvich, A.S., B.S. Elepov, V.V. Pokasov, K.K. Sabel'feld and V.I. Tatarskii, 1979b, Space structure of strong intensity fluctuations of light in a turbulent medium, Opt. Acta 26, 531.
Gurvich, A.S., A.I. Kon, V.L. Mironov and S.S. Khmelevtsov, 1976, Laser Radiation in Turbulent Atmosphere (Nauka, Moscow). In Russian.
Ishimaru, A., 1978, Wave Propagation and Scattering in Random Media (Academic Press, New York).
Kac, M., 1959, Probability and Related Topics in the Physical Sciences (Interscience, New York) ch. 4.
Klyatskin, V.I., 1970, Applicability of the approximation of a Markov random process in problems relating to the propagation of light in a medium with random inhomogeneities, Sov. Phys.-JETP 30, 520.
Klyatskin, V.I., 1975, Statistical Description of Dynamic Systems with Fluctuating Parameters (Nauka, Moscow). In Russian.
Klyatskin, V.I., 1980, Stochastic Equations and Waves in Random Inhomogeneous Media (Nauka, Moscow) chs. 8 and 9. In Russian. [Translation: 1985, Ondes et equations stochastique dans les milieux aleatoirement non-homogenes (Editions de Physiques, Besancon). In French].
Klyatskin, V.I., and V.I. Tatarskii, 1970a, On the theory of the propagation of light beams in a medium having random inhomogeneities, Radiophys. & Quantum Electron. 13, 828.
Klyatskin, V.I., and V.I. Tatarskii, 1970b, The parabolic equation approximation for propagation of waves in a random medium with random inhomogeneities, Sov. Phys.-JETP 31, 335.
Kravtsov, Yu.A., 1992, Propagation of electromagnetic waves through a turbulent atmosphere, Rep. Prog. Phys. 55, 39.
Lee, M.H., J.F. Holmes and J.R. Kerr, 1976, Statistics of speckle propagation through the turbulent atmosphere, J. Opt. Soc. Am. 66, 1164.
Leontovich, M.A., 1944, On the new method for solving of the radio-wave propagation problem, Izv. Akad. Nauk. SSSR Ser. Fiz. 8, 16. In Russian.
Leontovich, M.A., and V.A. Fock, 1946, Solution of the problem of propagation of electromagnetic waves along the earth's surface by the parabolic equation method, Zh. Eksp. & Teor. Fiz. 16, 557. In Russian.
Lukin, V.P., and M.I. Charnotskii, 1985, Reverse wave propagation in a randomly-inhomogeneous medium, Sov. Phys. J. 28, 894.
Lutomirskii, R.F., and H.T. Yura, 1971, Propagation of finite optical beam in an inhomogeneous medium, Appl. Opt. 10, 1652.
Macaskill, C., 1983, An improved solution to the fourth moment equation for intensity fluctuations, Proc. Phys. Soc. London Ser. A 386, 461.
Martin, J.M., and S.M. Flatté, 1988, Intensity images and statistics from numerical simulation of wave propagation in 3-D random media, Appl. Opt. 27, 2111.
IVl
REFERENCES
265
Martin, J.M., and S.M. Flatte, 1990, Simulation of point-source scintillation through threedimensional random media, J. Opt. SOC.Am. A 7,838. Mazar, R., and M.J. Beran, 1982, Intensity corrections in a random medium in the neighbourhood of a caustic, J. Acoust. SOC.Am. 72, 1269. Mazar, R., J. Gozani and M. Tur, 1985, Two-scale solution for the intensity fluctuations of two-frequency wave propagation in a random medium, J. Opt. SOC.Am. A 2,2152. Mironov, V.L., 1981, Propagation of Laser Beam in a Turbulent Atmosphere (Nauka, Moscow). In Russian. Molyneux, J.E., 1971a, Propagation of the Nth-order coherence function in a random medium: The governing equations, J. Opt. SOC.Am. 61, 248. Molyneux, J.E., 1971b, Propagation of the Nth-order coherence funtion in a random medium, 11. General solutions and asymptotic behaviour, J. Opt. SOC.Am. 61, 369. Novikov, E.A., 1961, The solutions of some variational differential equations, Usp. Mat. Nauk SSSR 16, 135. In Russian. Papoulis, A., 1965, Probability, Random Variables, and Stochastic Processes (McGraw-Hill, New York) ch. 13. Prokhorov, A.M., F.V. Bunkin, K.S. Gochelashvili and V.I. Shishov, 1975, Laser irradiance propagation in turbulent media, Proc. IEEE 63, 790. Rose, C.M., and I.M. Besieris, 1979, Nth-order multifrequency coherence functions: A functional path integral approach, J. Math. Phys. 20, 1530. Rytov, S.M., Yu.A. Kravtsov and V.I. Tatarskii, 1988, Principles of Statistical Radiophysics, Vol. 4, Wave Propagation through Random Media (Springer, Berlin). Sabel’feld,K.K., and V.I. Tatarskii, 1978, The approximate evaluation of Wiener path integrals, Sov. Phys. Dokl. 23, 898. Shishov, V.I., 1968, Theory of wave propagation in random media, Radiophys. & Quantum Electron. 11, 500. Tatarskii, V.I., 1961, Wave Propagation in a Turbulent Medium (McGraw-Hill, New York). Tatarskii, V.I., 1969, Light propagation in a medium with random index refraction inhomogeneities in the Markov process approximation, Sov. 
Phys.-JETP 29, 1133. Tatarskii, V.I., 1971, The Effect of the Turbulent Atmosphere on Wave Propagation (National Technical Information Service, Springfield, VA) TT-68-50464. Tatarskii, V.I., 1976, On approximate computation of the path integral, Monte-Carlo Method in Computational Mathematics and Mathematical Physics (Nauka, Novosibirsk) pp. 60-90. Tatarskii, V.I., and V.U. Zavorotny, 1980, Strong fluctuations in light propagation in a randomly inhomogeneous media, in: Progress in Optics, Vol. 18, ed. E. Wolf (North-Holland, Amsterdam) pp. 205-256. Tatarskii, V.I., and V.U. Zavorotny, 1986, On the connection between the extended Huygens-Fresnel principle and the path-integral approximate computation based on orthogonal expansions, Proc. SPIE 642, 276. Tur, M., and M.J. Beran, 1982, Propagation of a finite beam through a random medium, Opt. Lett. 7, 171. Tur, M., and M.J. Beran, 1983, Wave propagation in random media: A comparison of two theories, J. Opt. SOC.Am. 73, 1343. Uscinski, B.J., 1977, The Elements of Wave Propagation in Random Media (McGraw-Hill, New York). Uscinski, B.J., 1982, Intensity fluctuations in a multiple scattering medium. Solution of the fourth moment equation, Proc. R. SOC.London Ser. A 380,137. Uscinski, B.J., 1985, Analytical solution of the fourth-moment equation and interpretation as a set of phase screens, J. Opt. SOC.Am. A 2, 2077. Uscinski, B.J., C. Macaskill and M. Spivak, 1986, Path integral for wave intensity fluctuations in random media, J. Sound Vibration 106, 509.
266
WAVE PROPAGATION IN RANDOM
MEDIA PATH-INTEGRAL APPROACH
CIV
Walther, A., 1973, Radiometry and coherence, J. Opt. SOC.Am. 63, 1622. Whitman, A.M., and M.J. Beran, 1985, On the atmospheric scintillation, J. Opt. SOC.Am. A 2, 2133. Whitman, A.M., and M.J. Beran, 1988, Two-scale solution for atmospheric scintillation from a point source, J. Opt. SOC.Am. A 5, 735. Whitman, A.M., and M.J. Beran, 1992, First-order correction for the scintillation index and correlation of intensity function, J. Opt. SOC.Am. A 9, 974. Wiener, N., 1923, Differential space, J. Math. Phys. 2, 131. Wiener, N., 1924, The average value of a functional, Proc. London Math. SOC.Ser. II,22,454. Yakushkin, I.G., 1975, Asymptotic calculation of field-intensity fluctuations in a turbulent medium for long paths, Radiophys. & Quantum Electron. 18, 1224. Yakushkin, I.G., 1976, Strong intensity fluctuations in the field of a light beam in a turbulent atmosphere, Radiophys. & Quantum Electron. 19, 270. Yakushkin, I.G., 1978, Moments of field propagating in randomly inhomogeneous medium in the limit of saturated fluctuations, Radiophys. & Quantum Electron. 21, 835. Yakushkin, I.G., 1985, Intensity fluctuations during small-angle scattering of wave field, Radiophys. & Quantum Electron. 28, 365. Yura, H.T., 1972, First and second moment of an optical wave propagating in a random medium: Equivalence of the solution of the Dyson and Fkthe-Salpeter equation to that obtained by the Huygens-Fresnel principle, J. Opt. SOC.Am. 69, 1292. Zavorotny, V.U., 1978, Strong fluctuations of electromagnetic waves in a random medium with finite longitudinal correlation of the inhomogeneities, Sov. Phys.-JETP 48, 27. Zavorotny, V.U., 1981, Frequency correlation of large intensity fluctuational in a turbulent medium, Radiophys. & Quantum Electron. 24, 407. Zavorotny, V.U., V.I. Klyatskin and V.I. Tatarskii, 1977, Strong fluctuations of the intensity of electromagnetic waves in randomly inhomogeneous media, Sov. Phys.-JETP 46,252.
E. WOLF, PROGRESS IN OPTICS XXXII © 1993 ELSEVIER SCIENCE PUBLISHERS B.V. ALL RIGHTS RESERVED

V

RADIATION BY UNIFORMLY MOVING SOURCES
Vavilov-Cherenkov effect, Doppler effect in a medium, transition radiation and associated phenomena

BY

V. L. GINZBURG
P. N. Lebedev Physical Institute, Russian Academy of Sciences, 117924 Moscow, Russia
CONTENTS

§ 1. INTRODUCTION
§ 2. VAVILOV-CHERENKOV EFFECT FOR A CHARGE
§ 3. QUANTUM THEORY OF THE VAVILOV-CHERENKOV EFFECT
§ 4. VAVILOV-CHERENKOV RADIATION IN THE CASE OF MOTION IN CHANNELS AND GAPS
§ 5. VAVILOV-CHERENKOV RADIATION FOR ELECTRIC, MAGNETIC AND TOROIDAL DIPOLES
§ 6. CLASSICAL AND QUANTUM THEORIES OF THE DOPPLER EFFECT IN A MEDIUM
§ 7. ACCELERATION RADIATION
§ 8. TRANSITION RADIATION AT THE BOUNDARY BETWEEN TWO MEDIA
§ 9. TRANSITION RADIATION AS A MORE GENERAL PHENOMENON. FORMATION ZONE
§ 10. TRANSITION SCATTERING. TRANSITION BREMSSTRAHLUNG
§ 11. TRANSITION RADIATION, TRANSITION SCATTERING AND TRANSITION BREMSSTRAHLUNG IN A PLASMA
§ 12. CONCLUDING REMARKS
NOTE ADDED IN PROOF
REFERENCES
§ 1. Introduction

Radiation by uniformly and rectilinearly moving sources (particularly charges) is a well-defined subject. It belongs to the field of electrodynamics, but the effects which we will discuss in this article have analogues beyond the domain of electrodynamics; i.e., they have counterparts in acoustics, chromodynamics and, strictly speaking, in any field theory. Although some provisos are needed here (they will be noted below), we will deal with radiation of a source moving uniformly in or near a medium. The most important role is played here by sources with zero eigenfrequency; i.e., sources which are static in the frame of reference in which they are at rest. These sources include charges, various permanent dipoles, etc. The radiation which they generate is the Vavilov-Cherenkov radiation and transition radiation in its various forms. If a source has a non-zero eigenfrequency and moves uniformly in a straight line, it gives rise to the Doppler effect. For motion in vacuo, the Doppler effect is widely known. In a medium, the Doppler effect is appreciably more complicated than in a vacuum, and this situation is usually not discussed in textbooks.

This class of problems is to some extent isolated and relatively poorly understood, mainly because it has been a neglected area of research for some time. In part, this neglect can be attributed to the dogma, prevalent in electrodynamics for a long time, which asserted that a charge moving uniformly does not radiate. We will show that this dogma is false, strictly speaking, even in vacuo.

The radiation of a charge moving uniformly in a medium is closely related to the energy loss by this charge in the medium. In his early papers, N. Bohr [1913, 1915] disregarded the radiation, but mentioned it in a later paper (N. Bohr [1948]). The microscopic approach to charge radiation in a medium can be found in the paper by A. Bohr [1948]. Meanwhile, Frank and Tamm [1937] and Fermi [1940] considered this problem macroscopically, with the properties of the medium characterized by its dielectric permittivity. Such an approach is rather general and useful. A consistent application of the electrodynamics of continuous media to the motion of fast charged particles through matter is presented in ch. 14 of the book by Landau and Lifshitz [1984]. Note, however, that for dipoles and various higher multipoles, the situation is more complicated than for charges (see § 5).

For charges, the role of the medium is essential only for distant collisions with an impact parameter $\rho \gg a$. For a condensed medium $a$ ($\sim 3 \times 10^{-8}$ cm) is an atomic dimension and for a plasma $a$ ($\sim N^{-1/3}$) is the mean distance between particles. Under such circumstances, radiation of waves with a wavelength $\lambda \gg a$ can be considered by the use of the electrodynamic equations for continuous media. Assuming for simplicity that the medium is isotropic, and neglecting spatial dispersion, we can take into account all of the properties of the medium by introducing the dielectric permittivity $\varepsilon(\omega)$ and magnetic permeability $\mu(\omega)$. If the medium is inhomogeneous, $\varepsilon$ and $\mu$ depend on position and, in particular, change discontinuously at the boundary between two media. In the optical frequency range, as well as for the higher frequencies, one can take $\mu = 1$, but for generality and taking into account the “permutation duality principle”*, it is preferable not to make this apparent simplification. Below we use just this approximation of the electrodynamics of continuous media and disregard absorption. Therefore, we shall deal with a transparent medium whose refractive index is $n(\omega) = \sqrt{\varepsilon(\omega)\mu(\omega)}$. In the case of transition radiation, however, we will also consider an absorbing or nontransparent medium (for a nonabsorbing but nontransparent medium, $\varepsilon$ and $\mu$ are real quantities, but $\varepsilon\mu < 0$).

We shall not go into exhaustive detail in this article, especially concerning calculations which can be found, for example, in the books by Ginzburg [1979, 1989], Landau and Lifshitz [1984], Ter-Mikaelyan [1972] and Ginzburg and Tsytovich [1990]. Our references do not claim completeness. I have taken the liberty of citing many of my own papers, for it was not my aim to give a weighted historical review of the problem under discussion (this may rather be a concern of historians of physics). The main part of the present article is based on an article by Ginzburg [1986] but is here extended and updated; § 7 discusses more recent findings.
* In the simplest form, this principle asserts that from the knowledge of the solution of the problem for an electric charge $e$, the solution of an analogous problem for a magnetic monopole $g$ can be obtained by the substitution $E \to H$, $H \to -E$, $e \to g$, $\varepsilon \leftrightarrow \mu$ (here $E = D/\varepsilon$ and $H = B/\mu$ are the electric and magnetic field strengths; see, e.g., Ginzburg [1979, 1989]). However, also see the note made in § 5.
§ 2. Vavilov-Cherenkov Effect* for a Charge
In § 1, we described a dogma - a uniformly moving charge does not radiate. It would seem that this statement is not at all a dogma, but is instead an obvious truth. Indeed, if according to our assumption, a charge moves uniformly in vacuo (in a given inertial frame of reference), one can change over to a frame of reference in which the charge is at rest. However, a charge at rest cannot radiate - it is sufficient to say that it does not have the energy necessary to produce the radiation. Moreover, the absence of radiation in the rest frame also means that no radiation is generated in other inertial frames of reference. The same conclusion can be drawn in the quantum description (see § 3).

However, even for the case of the motion of a charge in vacuo, these remarks are valid only under two important restrictions. Firstly, the motion must be uniform and rectilinear (i.e., both the direction and the magnitude of the velocity $v$ are constant) at all times in the interval $-\infty < t < \infty$. If, however, the charge had been accelerated at some point in the past (this is virtually inevitable in practice), its field may “relax” over a long time; i.e., it may develop into a stationary state and this, in general, is accompanied by radiation (for more details, see ch. 1 of Ginzburg [1979, 1989]). Secondly, no radiation is generated if the velocity of the source is less than the velocity of light, $c = 3 \times 10^{10}$ cm s$^{-1}$. At the same time, even if we ignore the hypothetical tachyons [which probably do not exist (for tachyons, $v > c$)], radiation sources (but not individual particles, such as electrons) can move with superluminal velocities $v > c$, as we will see.

If, however, a charge moves uniformly not in vacuo, but rather in a medium, the absence of radiation is the exception rather than the rule. In fact, a charge does not radiate only if its velocity $v$ is less than the phase velocity $c_{ph}$ for any of the electromagnetic waves that can propagate in a given medium. If $v > c_{ph}$, Vavilov-Cherenkov radiation occurs. Moreover, even if $v < c_{ph}$, the absence of radiation concerns only the situation of a medium at rest which must also be uniform and time-independent. In a nonuniform and/or non-stationary medium, a uniformly moving charge radiates also when $v < c_{ph}$. This is the so-called transition radiation. It is noteworthy that even a charge at rest can radiate in a medium, although this is a rather exotic case (see § 9).
* In the Western literature this effect is called the Cherenkov radiation. However, my colleagues, acquainted with the history of the question, and I believe that the term “Vavilov-Cherenkov effect” is more appropriate (see Frank [1984, 1988]).
Fig. 1. Formation of Vavilov-Cherenkov radiation [$(c/n)t$ is the light path traveled in time $t$; $vt$ is the distance traveled by the particle during the same time].
The possibility that a uniformly moving charge can radiate had, in some form or other, been considered long before the discovery and explanation of the Vavilov-Cherenkov effect (1934-1937). This problem had been discussed by Heaviside [1888], Kelvin [1901], and Sommerfeld [1904]. Sommerfeld, for instance, considered in some detail the following situation: a charge (charged pellet) moves in vacuo at a velocity $v$ = const. and its electromagnetic field is to be determined. For $v < c$, there is no radiation (a stationary problem). For $v > c$, radiation occurs and the radiation front forms a conical surface with an angle $\theta_0$ between the normal to it (the wavevector $k$) and the velocity $v$. In this case,

$$\cos\theta_0 = c/v; \tag{2.1}$$

also refer to fig. 1. One may say that condition (2.1) has a kinematic character - it is the condition for interference of secondary waves excited by the charge along its path. To be more precise, under condition (2.1), waves, which in the given case have a phase velocity $c_{ph} = c$ (vacuum) and which are emitted along the trajectory of the source, are in phase on the appropriate conical surface. This behaviour is evident in fig. 1; after some time $t$, the charge has traversed a path $vt$, and the wave has traversed a path $ct$, because in this case the refractive index $n = 1$. This is in agreement with the Huygens principle and is, of course, especially realistic when we deal with any kind of wave in a medium, but it is well known that the result is also true for a vacuum, notwithstanding the fact that there is no ether. It is clear from what we have said that eq. (2.1), when rewritten in the form

$$\cos\theta_0 = c_{ph}/v, \tag{2.2}$$

holds for any kind of wave and for any static source. However, it can be satisfied only if

$$v > c_{ph}. \tag{2.3}$$
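The kinematic origin of conditions (2.2) and (2.3) can also be seen from a short Fourier argument. The sketch below is an illustration added here, not the derivation used by Sommerfeld or by Frank and Tamm; it assumes a point charge $e$ with zero eigenfrequency and a wave obeying $\omega = c_{ph}k$, with $\theta$ denoting the angle between $\mathbf{k}$ and $\mathbf{v}$.

```latex
% Charge density of a point source moving with constant velocity v,
% expanded in spatial harmonics:
\rho(\mathbf{r},t) = e\,\delta(\mathbf{r}-\mathbf{v}t)
  = \frac{e}{(2\pi)^3}\int
    e^{\,i\mathbf{k}\cdot\mathbf{r}-i(\mathbf{k}\cdot\mathbf{v})t}\,d^3k .

% Each harmonic with wavevector k therefore oscillates at the frequency
\omega = \mathbf{k}\cdot\mathbf{v} = kv\cos\theta .

% A freely propagating wave satisfies \omega = c_{ph}k, so the source can
% feed it resonantly only at the angle
\cos\theta_0 = \frac{\omega}{kv} = \frac{c_{ph}}{v} \le 1
\quad\Longrightarrow\quad v \ge c_{ph} .
```

Read this way, eq. (2.2) states that the source harmonic and the free wave share the same frequency and wavevector, and inequality (2.3) is simply the requirement that $\cos\theta_0$ not exceed unity.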
In acoustics, condition (2.2), where $c_{ph} = u$ is the velocity of sound, has been known for a long time as the Mach condition; the surface of constant phase is known as the Mach cone produced by supersonic bullets, missiles, etc. Since Sommerfeld solved the electrodynamic problem, he obtained eq. (2.1) rather automatically, and he evaluated the radiation intensity. Of course, the radiation intensity is not zero only for $v > c$ [see eqs. (2.1)-(2.3)]. The formula obtained by Sommerfeld corresponds to the Frank-Tamm formula for the intensity of Vavilov-Cherenkov radiation [see eq. (2.6) below] in the case of a nondispersive medium - the vacuum, whose refractive index $n$ is unity.

In a sense, Sommerfeld was “unlucky” - he published his results in 1904, but within a year (1905), the special theory of relativity appeared. According to that theory, the particle momentum is $p = mv/\sqrt{1 - v^2/c^2}$, and it is impossible to accelerate a particle from small velocities to a velocity $v > c$. Moreover, it would seem that the causality requirements also do not allow particles to move with a velocity $v > c$, because such particles would produce superluminal “signals”. It is true that relatively recently scientists have started to consider superluminal particles (tachyons, for which $v > c$). Whether they exist, however, is doubtful because of the requirement of causality, quite apart from the fact that there is no experimental evidence whatsoever to support the tachyon hypothesis. In any case, at the beginning of the century no one, as far as I know, was considering tachyons, and Sommerfeld’s paper was forgotten for a long time. No one appears to have considered transferring Sommerfeld’s results to the case of motion of a charge in a medium. True, even before the appearance of Sommerfeld’s papers, Heaviside [1888, 1889] understood correctly both the possible role of the medium and the role of superluminal “light spots” (see below).
(The Heaviside papers were “rediscovered” in 1974 independently in England and the USSR; for references, see Frank [1988].) However, at that time his papers did not attract due attention. History took a different turn. In 1934 S. I. Vavilov and P. A. Cherenkov observed this radiation and the nature of the radiation was explained in 1937 by Frank and Tamm. For the history of this discovery, see Frank [1984, 1988].

The Vavilov-Cherenkov radiation is the radiation of a uniformly moving charge in a transparent medium with refractive index $n(\omega)$ and, hence, with phase velocity $c_{ph} = c/n(\omega)$. Conditions (2.2) and (2.3) become

$$\cos\theta_0 = c/n(\omega)v, \tag{2.4}$$

and

$$v \ge c/n(\omega). \tag{2.5}$$

Frank and Tamm* obtained the following expression for the energy emitted by a particle of charge $e$ per unit time (i.e., along a pathlength $v$):

$$\frac{dW}{dt} = \frac{e^2 v}{c^2}\int_{c/[n(\omega)v]\le 1}\mu(\omega)\left[1-\frac{c^2}{n^2(\omega)v^2}\right]\omega\,d\omega = \frac{e^2 v}{c^2}\int\mu(\omega)\,[\sin^2\theta_0(\omega)]\,\omega\,d\omega. \tag{2.6}$$
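To give a feeling for the magnitudes implied by eqs. (2.4)-(2.6), the following Python sketch evaluates the emission angle, the threshold, and a photon yield. It is a minimal illustration under stated assumptions, not part of the original calculation: the medium is taken as nondispersive with $\mu = 1$ over the chosen band, the photon number is obtained by dividing the integrand of eq. (2.6) by $\hbar\omega$ and by $v$, and the value $n = 1.33$ (roughly that of water) as well as all function names are hypothetical choices made for this example.

```python
import math

ALPHA = 1.0 / 137.035999   # fine-structure constant e^2/(hbar*c), Gaussian units
ME_C2_MEV = 0.51099895     # electron rest energy in MeV


def threshold_beta(n):
    """Smallest v/c allowed by eq. (2.5), v >= c/n (exceeds 1 when n < 1)."""
    return 1.0 / n


def cherenkov_angle(n, beta):
    """Emission angle theta_0 in radians from eq. (2.4); None below threshold."""
    cos_theta = 1.0 / (n * beta)
    if cos_theta > 1.0:
        return None
    return math.acos(cos_theta)


def threshold_kinetic_energy_mev(n, rest_energy_mev=ME_C2_MEV):
    """Kinetic energy at which a particle reaches v = c/n (requires n > 1)."""
    if n <= 1.0:
        raise ValueError("a particle with v < c cannot satisfy eq. (2.5) when n <= 1")
    beta = threshold_beta(n)
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return (gamma - 1.0) * rest_energy_mev


def photons_per_cm(n, beta, lambda_min_cm, lambda_max_cm):
    """Photons emitted per cm of path between two vacuum wavelengths.

    Assumes constant n and mu = 1, so eq. (2.6) divided by hbar*omega and by v
    integrates to 2*pi*alpha*sin^2(theta_0)*(1/lambda_min - 1/lambda_max).
    """
    cos_theta = 1.0 / (n * beta)
    if cos_theta > 1.0:
        return 0.0
    sin2 = 1.0 - cos_theta ** 2
    return 2.0 * math.pi * ALPHA * sin2 * (1.0 / lambda_min_cm - 1.0 / lambda_max_cm)


if __name__ == "__main__":
    n = 1.33  # illustrative refractive index
    print("threshold v/c:", threshold_beta(n))
    print("threshold kinetic energy (MeV):", threshold_kinetic_energy_mev(n))
    print("angle at v -> c (deg):", math.degrees(cherenkov_angle(n, 1.0)))
    print("photons per cm, 400-700 nm:", photons_per_cm(n, 1.0, 400e-7, 700e-7))
```

For these illustrative inputs the yield comes out at a few hundred visible photons per centimetre of path, and it vanishes for any velocity that violates inequality (2.5).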
* Frank and Tamm learned about the work of Sommerfeld only after they had completed their investigation. In their paper [1937], they assumed from the very start that $\mu = 1$, and therefore $n = \sqrt{\varepsilon}$ [in eqs. (2.4), (2.5) and (2.6), $n = \sqrt{\varepsilon\mu}$].

The Vavilov-Cherenkov radiation now occupies a prominent place in physics. Many papers, books and review articles are devoted to it (see the references cited near the end of § 1 and the literature cited there). Perhaps an even greater role is played here not by the Vavilov-Cherenkov effect in the strict sense of the word [optical emission by a charge which moves uniformly in a medium with velocity $v > c/n(\omega)$], but rather by ideas and analogues related to it. As a typical example we give the interpretation of the so-called Landau damping or, more precisely, the damping of longitudinal (plasma) waves in a collisionless plasma. Landau [1946] concluded that such damping should exist when he was solving the problem with initial data related to the propagation of longitudinal perturbations in a collisionless plasma using the kinetic equation. The collisionless damping which then occurs and which also appears in a number of problems of plasma physics and plasma-like media (e.g., in the case of the “solid-state plasma”, the electron liquid in metals, etc.), can be interpreted (if one considers it from a physical point of view) in different ways. One of them is the following: the condition for collisionless wave absorption by electrons in the plasma, which has the form

$$\omega = \mathbf{k}\cdot\mathbf{v}, \tag{2.7}$$

is simply the condition [eq. (2.2)] for emission of waves. In this case, the waves are longitudinal plasma waves with a phase velocity

$$c_{ph} = \omega/k = c/n_l, \qquad n_l(\omega) = ck/\omega, \tag{2.8}$$
where $n_l(\omega)$ is the refractive index for the longitudinal waves considered. Note that because the velocity of the longitudinal waves $c_{ph}$ is independent of $c$,* the introduction of the refractive index $n_l$ has a rather formal character. The collisionless Landau absorption is thus closely connected with the inverse Vavilov-Cherenkov effect for plasma waves. If recoil is neglected, the kinematic conditions for absorption and for emission of waves are the same. Of course, when dealing with an “external” wave - in this case a longitudinal wave propagating in the plasma - we must consider its interaction not only with a single particle, but with an ensemble of particles. As a result, it is necessary to take into account not only the absorption of the waves, but also their stimulated (induced) emission. It is just this explanation which now makes collisionless absorption completely obvious and understandable to many scientists. (One should, of course, not take these remarks as a denial of the validity of other models; i.e., of another physical language.)

In general it is rather natural that the development of the physics of superluminal sources (i.e., sources which move with a velocity greater than the speed of light in a given medium) has progressed and is likely to progress further still. We shall give here one relatively recent example which has its roots in the rather distant past. Indeed, we have already emphasized that Sommerfeld’s work in this area was not believed to deal with a true physical situation, because it is impossible to satisfy the requirements of eqs. (2.3) and (2.5) in vacuo. In fact, until recently it seemed rather obvious that the Vavilov-Cherenkov radiation could not be produced in vacuo or in media with a refractive index $n(\omega) < 1$ (in particular, in an isotropic plasma under circumstances where the well-known formula $n(\omega) = \sqrt{1 - \omega_p^2/\omega^2}$ is valid). In fact, however, such a conclusion is incorrect, or perhaps simply too rash (see Ginzburg [1979, 1989]).
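The two plasma relations used in this passage can be made concrete with a few lines of Python. This is a hedged sketch, not a plasma-physics library: it assumes Gaussian units, the transverse index $n(\omega) = \sqrt{1 - \omega_p^2/\omega^2}$, and the longitudinal dispersion law $\omega^2 = \omega_p^2 + 3(k_B T/m)k^2$ quoted in the footnote, which holds only for long wavelengths. The density, temperature and function names are illustrative assumptions.

```python
import math

E_ESU = 4.80320e-10   # electron charge, esu
M_E = 9.10938e-28     # electron mass, g
K_B = 1.38065e-16     # Boltzmann constant, erg/K
C = 2.99792e10        # speed of light, cm/s


def plasma_frequency(density_cm3):
    """omega_p from omega_p^2 = 4*pi*e^2*N/m (rad/s)."""
    return math.sqrt(4.0 * math.pi * E_ESU ** 2 * density_cm3 / M_E)


def transverse_index(omega, omega_p):
    """n = sqrt(1 - omega_p^2/omega^2); None if the wave does not propagate."""
    x = 1.0 - (omega_p / omega) ** 2
    return math.sqrt(x) if x > 0.0 else None


def longitudinal_phase_velocity(omega, omega_p, temperature_k):
    """c_ph = omega/k for omega^2 = omega_p^2 + 3*(k_B*T/m)*k^2 (cm/s)."""
    x = 1.0 - (omega_p / omega) ** 2
    if x <= 0.0:
        raise ValueError("longitudinal wave requires omega > omega_p")
    return math.sqrt(3.0 * K_B * temperature_k / M_E / x)


if __name__ == "__main__":
    N, T = 1.0e12, 1.0e4            # illustrative density (cm^-3) and temperature (K)
    wp = plasma_frequency(N)
    w = 1.1 * wp
    n_t = transverse_index(w, wp)
    c_l = longitudinal_phase_velocity(w, wp, T)
    print("omega_p (rad/s):", wp)
    print("transverse n:", n_t, "-> phase velocity / c:", 1.0 / n_t)
    print("longitudinal c_ph / c:", c_l / C)
```

With these inputs the transverse phase velocity exceeds $c$, so an individual particle cannot satisfy inequality (2.5) for transverse waves, whereas the longitudinal phase velocity is a few thermal velocities and condition (2.7) can be met by electrons in the tail of the distribution.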
* For instance, in a well-known approximation in which for longitudinal waves $\omega^2 = \omega_p^2 + 3(k_B T/m)k^2$, the velocity is $c_{ph} = \omega/k = \sqrt{(3k_B T/m)/(1 - \omega_p^2/\omega^2)}$, where $\omega_p^2 = 4\pi e^2 N/m$ is the square of the plasma frequency and $k_B$ is the Boltzmann constant ($N$ is the concentration and $T$ is the temperature of the electrons in the plasma).

Although for the above-mentioned reasons it seems unlikely that “superluminal” particles - tachyons - exist, realistic radiation sources can move at a velocity $v > c$. Moreover, such sources have long been known. We might mention as an example patches of light (light spots), which can move at any velocity, including $v > c$. The same “spots” may also consist of charged particles. They are formed if suitable particle beams are incident upon a metal plate. “Light spots” consist of many particles, which in the “spot” are different at different instants of time; the particles in the “spot” (photons, electrons, etc.) move with the speed of light or with subluminal speed. Therefore, the existence of these “light spots” does not violate the requirements of causality - it is impossible to use them to send signals with superluminal velocities. On the other hand, a charged “light spot” (a moving “patch”) is, in the electrodynamic sense, not at all “worse” than any other macroscopic charge; in this respect, Sommerfeld’s calculation is fully applicable (formally, this is clear from the fact that the current density $j = \rho(r, t)\,v(r, t)$, which corresponds to a distributed charge, can well move in space at a velocity greater than $c$). Optical “light spots” (i.e., those formed by photons) which move with a velocity $v > c$ can also radiate, but in this case nonlinear effects must be taken into account. Briefly, radiation sources moving with a velocity greater than $c$ are quite realistic and thus there appears a possibility to observe the Vavilov-Cherenkov effect both in vacuo (under normal conditions, however, only if a boundary is present), and also in an isotropic plasma.

The Vavilov-Cherenkov effect can also exist in vacuo far from any boundaries if a strong constant magnetic field $B$ is present which is comparable with the well-known critical value $B_c = m^2c^3/e\hbar = 4.4 \times 10^{13}$ Gauss (here $e$ and $m$ are the charge and mass of the electron, respectively). As was already noted in the early 1930s, vacuum behaves like a birefringent medium in the presence of a strong field. In some cases, for weak electromagnetic waves propagating in a strong magnetic field the refractive index $n > 1$ and, hence, uniformly moving charged particles can emit Vavilov-Cherenkov waves.
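The quoted critical field is easy to check numerically. The short Python sketch below evaluates $B_c = m^2c^3/e\hbar$ in Gaussian units; the constants are standard values entered by hand, and the laboratory field used for comparison ($10^5$ Gauss) is an arbitrary illustrative number, not one taken from the text.

```python
# Physical constants in Gaussian (CGS) units.
M_E = 9.1093837e-28      # electron mass, g
C = 2.99792458e10        # speed of light, cm/s
E_ESU = 4.80320471e-10   # electron charge, esu
HBAR = 1.054571817e-27   # reduced Planck constant, erg*s


def critical_field_gauss():
    """Critical field B_c = m^2 c^3 / (e * hbar), in Gauss."""
    return M_E ** 2 * C ** 3 / (E_ESU * HBAR)


def field_ratio(b_gauss):
    """Dimensionless ratio B / B_c for a given field strength in Gauss."""
    return b_gauss / critical_field_gauss()


if __name__ == "__main__":
    print("B_c (Gauss):", critical_field_gauss())
    print("B / B_c for a 1e5 Gauss field:", field_ratio(1.0e5))
```

The ratio for any laboratory-scale field is many orders of magnitude below unity, which indicates why the vacuum effect is of interest mainly where fields comparable with $B_c$ are available.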
We must not be confused here by the fact that, unless we consider motion strictly in the direction of the strong magnetic field, a charged particle can be deflected by the magnetic field. In principle, we can maintain a constant velocity $v$ of a particle by some external means (sources). Moreover, one can formally assume the mass $m$ of the particle to be arbitrarily large, in which case its velocity will then also be constant.

Frank and Tamm [1937] derived formula (2.6) by evaluating the electromagnetic energy flux $S$ through a cylindrical surface surrounding the trajectory of the particle. The present author obtained the same formula (in 1939) by evaluating the change in the energy $dW_{elm}/dt$ of the electromagnetic field per unit time in the whole space (see Ginzburg [1939], and Ginzburg [1979, 1989] ch. 1). Finally, the same result (see A. Bohr [1948], Landau and Lifshitz [1984]) can be obtained by evaluating the work done by the field on the particle per unit time; i.e., the quantity $e\,\mathbf{v}\cdot\mathbf{E}$, where the field $E$ is calculated at the position of the charge ($eE$ represents the radiative friction force; the other parts of the field do not contribute to the corresponding expressions). One should perhaps expect that each of the three methods would give identical results. Indeed, this is the case with the Vavilov-Cherenkov effect. However, in the general case for nonstationary charges in vacuo (and in a medium) and, for example, for transition radiation, the quantities $S$, $dW_{elm}/dt$ and $e\,\mathbf{v}\cdot\mathbf{E}$ are generally different, and we must not forget this (for details, see Ginzburg [1979, 1989], Ginzburg and Tsytovich [1990]).
§ 3. Quantum Theory of the Vavilov-Cherenkov Effect

Let us now turn to the quantum interpretation of the Vavilov-Cherenkov effect. The Vavilov-Cherenkov effect is, as a general rule, described quite satisfactorily in the framework of the classical theory, and the corresponding quantum corrections are not important. However, it seems that from both methodical and physical points of view, the quantum approach is useful and interesting. Perhaps I am influenced here by the fact that this problem was considered in one of my first papers (Ginzburg [1940a]). At that time, when L. D. Landau heard about my paper, he did not think it interesting. In general, let me take this opportunity to note that not only in the arts, but also in science, preferences and tastes are varied and different. I personally like problems connected with the Vavilov-Cherenkov radiation, transition radiation, etc. At the same time, Landau, probably not by chance, considered the quantum theory of the Vavilov-Cherenkov effect uninteresting, for he did not, in general, have any special interest in this effect because, evidently, he did not consider it beautiful. This is not at all a criticism, but simply a statement. Its aim is only to emphasize that, in my opinion, there are no grounds to banish from the scientific literature, as is often done, everything personal, and to aim only at a dry statement of facts and description of formulae.

As we consider the quantum theory of the Vavilov-Cherenkov effect, let us restrict the discussion to obtaining the condition (2.4) for emission and to its quantum generalization. Quantum theory enables us, of course, to derive eq. (2.6) with the appropriate quantum corrections (Ginzburg [1940a]).

How can one explain in quantum-mechanical terms the absence of emission by a charge or by another static source which moves uniformly in vacuo? To do this, it is sufficient to use the energy and momentum conservation laws,

$$E_0 = E_1 + \hbar\omega, \qquad E_{0,1} = \sqrt{m^2c^4 + c^2p_{0,1}^2}, \tag{3.1}$$

$$\mathbf{p}_0 = \mathbf{p}_1 + \hbar\mathbf{k}, \qquad \hbar k = \hbar\omega/c, \tag{3.2}$$

where $E_{0,1}$ and $p_{0,1}$ are the energy and momentum of the charge (source) with rest mass $m$ before (0) and after (1) the emission of a photon of energy $\hbar\omega$ and momentum $\hbar\mathbf{k} = (\hbar\omega/c)(\mathbf{k}/k)$ ($\omega$ is the radiation frequency). One can verify that it is impossible [and this is also clear from eq. (3.5) with $n = 1$] to satisfy eqs. (3.1) and (3.2) for $\omega > 0$.

In order to consider radiation by a source in a medium, one must know what the energy and the momentum of the radiation are (the expression for the energy $E = \sqrt{m^2c^4 + c^2p^2}$ of the source is evidently not changed). It is not quite so simple to do this fully consistently, but on an intuitive level the answer is clear at once. Indeed, the presence of a nonmoving and time-independent medium does not at all affect the frequency $\omega$, and the wavelength in the medium is $\lambda = \lambda_0/n(\omega)$, where $\lambda_0 = 2\pi c/\omega$ is the wavelength in vacuo. In other words, in the medium the wavenumber is $k = 2\pi/\lambda = (\omega/c)\,n(\omega)$. If we agree with this substitution, we must, instead of eq. (3.2), put

$$\mathbf{p}_0 = \mathbf{p}_1 + \hbar\mathbf{k}, \qquad \hbar k = \hbar\omega\,n(\omega)/c. \tag{3.3}$$
A simultaneous solution of eqs. (3.1) and (3.3) leads to the result

$$\cos\theta_0 = \frac{c}{n(\omega)v_0}\left[1 + \frac{\hbar\omega\,(n^2 - 1)}{2mc^2}\sqrt{1 - \frac{v_0^2}{c^2}}\,\right], \tag{3.4}$$

$$\hbar\omega = \frac{2(mc/n)(v_0\cos\theta_0 - c/n)}{(1 - 1/n^2)\sqrt{1 - v_0^2/c^2}}, \tag{3.5}$$

where $\theta_0$ is the angle between $v_0$ and $k$. If

$$\hbar\omega/mc^2 \ll 1 \tag{3.6}$$

[or for a somewhat more general inequality which is clear from eq. (3.4)], which corresponds to the classical limit, eq. (3.4) reduces to eq. (2.4) as one would expect. The classical limit corresponds, apparently, to neglecting the recoil which occurs when a “photon in the medium” with momentum $\hbar k$ is emitted. It is also clear from eq. (3.5) that, since $\omega > 0$ and $\cos\theta_0 \le 1$, emission is possible only for $v_0 > c/n(\omega)$ [see eq. (2.5)].

In the classical limit, where the result [given by eq. (2.4)] does not contain the quantum constant $\hbar$, the quantum calculation has merely a methodical character: it may be convenient, but it is not essential. This is just the case in reality, and the energy and momentum conservation laws can also be formulated in the classical region. We need only to take into account the connection between the emitted energy $W_{elm}$ and the change in the momentum $G$ of the radiation and of the medium. In accordance with eq. (3.3) we must set

$$\mathbf{G} = \frac{W_{elm}\,n}{c}\,\frac{\mathbf{k}}{k}. \tag{3.7}$$
Furthermore, because for a freely moving particle with sufficiently small changes in energy and momentum AE = El - Eo = u Ap = u (PI -po) [indeed, dE/dp = ( d / d p ) J m = c2p/E= u ] , one can put vo x u1 x u. From the conservation laws (3.1) and (3.3) and upon replacing hw by Welm, we obtain AE = Welm= u Ap = (Welmn/c)(ku/k).It is clear that the energy Welmcancels out, and we are thus led to the classical condition for emission; viz., (nu/c)cos O0 = 1 [see eq. (2.4)]. Note that eq. (3.7) or k = hon/c [see eqs. (3.3)] corresponds to writing the energy-momentum tensor in a medium in the Minkowskii form. However, the energy-momentum tensor of the field in a medium has, in fact, the form proposed by Abraham (we mean the simplest case of a nondispersive medium) which is reflected in the existence of the Abraham force which acts upon the medium with a density f" = [(n2 - 1)/4nc](a/at](E x H ) . One can readily show, however, that eq. (3.7) is also valid when the Abraham tensor is used, if one is interested in the total momentum of both the radiation and the medium (for details, see Ginzburg [1979, 19893, where the connection with the phonon momentum in a solid body is also indicated). But it is just that quantity which occurs in the conservation law (3.3) or its classical analogue. Incidentally, the necessity to use just eqs. (3.3) or eq. (3.7) for hk or for G is already clear from the obvious correctness of the result obtained - the classical formula (2.4) for the angle of Vavilov-Cherenkov radiation.
§ 4. Vavilov-Cherenkov Radiation in the Case of Motion in Channels and Gaps

Energy losses due to the Vavilov-Cherenkov radiation contribute to the total losses, called ionization losses. This contribution is usually not
large*. The Vavilov-Cherenkov radiation in a transparent medium is, of course, separated from the total losses, as it can travel far from the trajectory of the source (charge, etc.). It is of interest to consider how one might eliminate practically all of the losses other than those due to the Vavilov-Cherenkov radiation. To this end, a radiating particle (a charge) must move in an empty channel or in a gap in the medium. In this case, the losses due to near collisions are absent altogether, polarization losses are strongly suppressed, and the Vavilov-Cherenkov radiation of waves with a wavelength $\lambda$ changes little under the condition that $\lambda \gg r$, where $r$ is the radius of the channel or the width of the gap. This fact was mentioned by L. I. Mandelstam in 1940 when he spoke at the doctoral degree defense of Cherenkov. The point is that the transverse electromagnetic field of a charge in the direction perpendicular to the source trajectory is formed in a region of the order of the wavelength $\lambda$. If $r \gg a$, the influence of a channel or a gap can be considered on the basis of macroscopic equations, with the usual boundary conditions. Such calculations show (Ginzburg and Frank [1947b]), for example, that in a medium with refractive index $n = 1.5$, the radiation from a charge with $v \to c$ in an empty channel, for which $r/\lambda \sim 0.1$, is weaker by only 10-20% than in a continuous medium. In optics (not to mention the region of longer wavelengths) $r/\lambda \sim 0.1$ for a channel of radius $r \approx 5\times10^{-6}$ cm $\gg a \sim 3\times10^{-8}$ cm. The possibility of using a channel or a gap is not of particular importance for a moving charge, but may be important for an atom or another complex “system” (see § 6) which would simply be destroyed in a continuous medium.
Moreover, we should not ignore the rather obvious possibility that, in order to suppress ionization losses, one can also place the trajectory of a charge or another radiator outside of the medium, but sufficiently close to it [it is clear that the Vavilov-Cherenkov radiation in this case is not too small only if $d/\lambda \lesssim 1$, where $d$ is the distance from the trajectory to the boundary between the medium and the vacuum]. The corresponding electrodynamic problems are, in principle, simple but
* The remaining losses, if we bear in mind distant collisions, are sometimes called polarization losses. These are caused by excitation of longitudinal oscillations, for which $\varepsilon(\omega) = 0$; if the spatial dispersion is taken into account, we have for longitudinal waves $\varepsilon_l(\omega, k) = 0$, where $\varepsilon_l$ is the dielectric permittivity for a longitudinal field. However, these remarks refer only to an isotropic medium. In an anisotropic medium (e.g., a crystal), strictly longitudinal waves propagate only along symmetry axes. For particles moving in other directions, all of the losses, except those due to near collisions, are caused by the Vavilov-Cherenkov radiation (Ginzburg [1958]). Here, we do not touch upon the complication of the model due to absorption. The emission of longitudinal waves can be regarded as the Vavilov-Cherenkov effect for longitudinal waves, especially if absorption is not taken into account.
require rather cumbersome calculations (Ginzburg and Frank [1947b], Bogdankevich and Bolotovskii [1957], Tsytovich [1986]). In this connection and from general methodological considerations, it is instructive to note that in some cases it is efficient to use the reciprocity theorem (see, e.g., Landau and Lifshitz [1984], sect. 89). The application of this theorem makes it possible to establish immediately that sufficiently thin channels or gaps do not influence the Vavilov-Cherenkov radiation by a charge (Ginzburg and Eidman [1959]; see also Ginzburg [1979, 1989], ch. 7). This conclusion repeats what has been said above, but it is not trivial. The important point is that by means of the same reciprocity theorem, and without any special calculations, one can determine that even thin channels or gaps, in general, affect the Vavilov-Cherenkov radiation for dipoles and other multipoles. This problem is discussed in § 5. The reciprocity theorem is, of course, sometimes also useful in the solution of other problems concerning radiators in a medium. For instance, let an oscillator (a dipole of frequency $\omega_0$) be placed in the center of an empty spherical cavity of radius $r \ll \lambda_0 = 2\pi c/\omega_0$ in a medium with permittivity $\varepsilon(\omega_0)$. On the basis of the known solution of the electrostatic problem concerning the field in a spherical cavity, the reciprocity theorem immediately suggests that the radiation field for the oscillator in a cavity differs from the radiation field in the case of a continuous medium by a factor of $3\varepsilon(\omega_0)/[2\varepsilon(\omega_0) + 1]$.
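To give a feeling for the size of the spherical-cavity factor just quoted, the short sketch below tabulates $3\varepsilon/(2\varepsilon + 1)$ for a few permittivities. The function name and the sample values of $\varepsilon$ are illustrative assumptions; the squared value is printed only on the assumption that the radiated power scales as the square of the field factor, which the text states for channels and gaps but not explicitly for the cavity.

```python
def cavity_field_factor(eps):
    """Field factor 3*eps / (2*eps + 1) for a small empty spherical cavity.

    eps is the permittivity of the surrounding medium at the oscillator
    frequency; the cavity radius is assumed to be much smaller than the
    wavelength, as in the text.
    """
    return 3.0 * eps / (2.0 * eps + 1.0)

# Sample permittivities chosen for illustration only (2.25 corresponds to n = 1.5).
for eps in (1.0, 2.25, 10.0, 1.0e6):
    factor = cavity_field_factor(eps)
    print(f"eps = {eps:g}: field factor = {factor:.4f}, squared = {factor**2:.4f}")
```

The factor equals 1 for $\varepsilon = 1$ (no medium) and tends to 3/2 for very large $\varepsilon$, so the cavity changes the field by at most 50% in this model.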
§ 5. Vavilov-Cherenkov Radiation for Electric, Magnetic and Toroidal Dipoles

Condition (2.4), which determines the opening of a cone for the Vavilov-Cherenkov radiation and the very possibility of the appearance of this radiation [see eq. (2.5)], is, as has already been emphasized, of a kinematic or interferential nature. The opening of the cone* is therefore the same for all radiators - charges, dipoles, etc. The intensity of the radiation, its distribution and polarization along the cone depend, of course, on the nature of the radiator. A packet (pulse) of electromagnetic waves can also radiate, but in
* We are dealing here with an optically isotropic medium. In an optically anisotropic medium at a given frequency $\omega$, waves propagate with different values of $n_i(\omega)$; if spatial dispersion is disregarded, there exist only two such waves ($i = 1, 2$; in a uniaxial crystal these are the ordinary and the extraordinary waves). In an anisotropic medium, several cones of Vavilov-Cherenkov radiation exist; these cones are generally not circular (Ginzburg [1940b], Agranovich and Ginzburg [1984]).
the framework of linear electrodynamics the wave packet radiation intensity is equal to zero. An account of nonlinearity leads to the Vavilov-Cherenkov radiation [the role of the source velocity $v$ in eq. (2.4) is played by the group velocity of the packet; $n(\omega)$ is the refractive index for the emitted waves, which are considered in the linear approximation. See Agranovich and Ginzburg [1984], sect. 6.4.1 for greater detail]. We shall consider here the Vavilov-Cherenkov radiation for the case of different dipoles. Although this problem has already been discussed extensively (Ginzburg [1940a], Ginzburg and Eidman [1959], Frank [1942, 1952, 1988]), some questions arose from time to time, and it is only recently that the situation seems to have become rather clear (Ginzburg [1985a], Ginzburg and Tsytovich [1985, 1990]). Let us write the field equations that make it possible to calculate the Vavilov-Cherenkov radiation using macroscopic methods,

$$\mathrm{rot}\,\mathbf{H} = \frac{1}{c}\frac{\partial(\varepsilon\mathbf{E})}{\partial t} + \frac{4\pi}{c}\,\mathbf{j}, \qquad (5.1)$$

$$\mathrm{rot}\,\mathbf{E} = -\frac{1}{c}\frac{\partial(\mu\mathbf{H})}{\partial t}, \qquad (5.2)$$

$$\mathrm{div}\,\varepsilon\mathbf{E} = 4\pi\rho, \qquad \mathrm{div}\,\mu\mathbf{H} = 0. \qquad (5.3)$$
We deal here with a stationary medium whose properties are described by the permittivity $\varepsilon$ and the permeability $\mu$ (taking into account the frequency dispersion). In eqs. (5.1), (5.2), and (5.3), these quantities must be considered as operators, but in the problem at hand, one may at first set both $\varepsilon$ and $\mu$ equal to constants, and in the final result substitute $\varepsilon(\omega)$ and $\mu(\omega)$; see Ginzburg [1979, 1989]. In the case of the Vavilov-Cherenkov effect, the medium is assumed to be everywhere homogeneous (i.e., $\varepsilon$ and $\mu$ do not depend on coordinates). If a source moves uniformly with a velocity $\mathbf{v}$ and has a point charge $e$, an electric moment $\mathbf{p}$, and a magnetic moment $\mathbf{m}$, then

$$\mathbf{j} = e\mathbf{v}\,\delta(\mathbf{r} - \mathbf{v}t) + \frac{\partial}{\partial t}\{\mathbf{p}\,\delta(\mathbf{r} - \mathbf{v}t)\} + c\,\mathrm{rot}\{\mathbf{m}\,\delta(\mathbf{r} - \mathbf{v}t)\}. \qquad (5.4)$$
Here $\mathbf{p}$ and $\mathbf{m}$ are moments in the laboratory frame of reference (by definition, the latter is the frame in which the medium is at rest). In the rest frame of the source, the values $\mathbf{p}'$ and $\mathbf{m}'$ are, of course, different (see below). If only a charge exists, the solution of eqs. (5.1) and (5.2) leads to eq. (2.6). We will not write here all the formulae for the case of dipoles (see Ginzburg [1985a],
Frank [1988]), but will give those which clarify some of the previous uncertainties. For a magnetic dipole $\mathbf{m}_\perp$ perpendicular to the velocity $\mathbf{v}$, we have
(5.5)
The dipole is considered here to be purely magnetic in its rest frame; in the laboratory frame it also possesses an electric dipole moment $\mathbf{p}_\perp = (1/c)[\mathbf{v}\times\mathbf{m}_\perp]$. For clarity, we should recall that in this case $\mathbf{m}' = \mathbf{m}_\perp$ and $\mathbf{p}' = 0$. For an electric dipole $\mathbf{p}_\perp$ perpendicular to $\mathbf{v}$ (in this case, $\mathbf{m}' = 0$ and $\mathbf{m}_\perp = -(1/c)[\mathbf{v}\times\mathbf{p}_\perp]$),
The magnetic dipole considered above [see eqs. (5.4) and (5.5)] is the usual “current” magnetic dipole. In principle, however, there may also exist magnetic dipoles of another type (we will call them “true” magnetic dipoles), which are formed by two magnetic monopoles $+g$ and $-g$. The moment of such a dipole is $\tilde{\mathbf{m}} = g\mathbf{d}$, where $\mathbf{d} = \mathbf{r}_2 - \mathbf{r}_1$, and $\mathbf{r}_2$ and $\mathbf{r}_1$ are the positions of the monopoles $+g$ and $-g$ (see fig. 2). Naturally, for a point dipole, as $d \to 0$, $g\mathbf{d} = \tilde{\mathbf{m}}$. To calculate the fields of magnetic monopoles, dipoles, etc., in a medium we can, in a certain approximation used here, assume that $\mathbf{j} = 0$ and $\rho = 0$ in eqs. (5.1)-(5.3) but add to the right-hand side of eq. (5.2) the term $-(4\pi/c)\mathbf{j}_m$, where $\mathbf{j}_m$ is the current density of the magnetic monopoles. Moreover, eq. (5.3b) takes the form $\mathrm{div}\,\mu\mathbf{H} = 4\pi\rho_m$, where $\rho_m$ is the magnetic charge density*. As discussed in § 1, if the solution of the problem for an electric charge is known, the solution for a magnetic monopole is obtained immediately as a result of the duality principle (for more details, see Ginzburg [1985a]). Thus, for the Vavilov-Cherenkov radiation of a magnetic monopole, where $\mathbf{j}_m = g\mathbf{v}\,\delta(\mathbf{r} - \mathbf{v}t)$, we obtain from eq. (2.6)
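$$\left(\frac{dW}{dt}\right)_g = \frac{g^2 v}{c^2}\int \varepsilon(\omega)\left(1 - \frac{c^2}{n^2(\omega)v^2}\right)\omega\,d\omega. \qquad (5.7)$$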
* As a rule, for magnetic monopoles the electrodynamics of continuous media is more complicated than that discussed here (see Kirzhnits and Losyakov [1985] and Kirzhnits [1987]). For our purposes, this fact is not particularly important.
In the case of a “true” magnetic dipole $\tilde{\mathbf{m}}_\perp$, which is a direct analogue of an electric dipole $\mathbf{p}_\perp$, we obtain from eq. (5.6) the relation
Equation (5.8) differs from eq. (5.5), which refers to the “current” dipole $\mathbf{m}_\perp$. This result (Frank [1952]) seems paradoxical, because the fields of a current dipole and a true magnetic dipole are quite similar, at least outside the dipoles. Actually, “current” and “true” magnetic dipoles are not equivalent - the fields inside them differ (see fig. 2; we mean, of course, extended dipoles, but it is clear that the point dipoles of the indicated types are also different). For instance, the field of a true dipole $\tilde{\mathbf{m}}$ at rest is given by
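$$\mathbf{H} = \frac{3(\tilde{\mathbf{m}}\cdot\mathbf{r})\mathbf{r} - r^2\tilde{\mathbf{m}}}{\mu r^5}, \qquad \mathbf{E} = 0. \qquad (5.9)$$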
Fig. 2. An electric dipole $\mathbf{p}$ (a and b), a “true” magnetic dipole $\tilde{\mathbf{m}}$ (c) and a current magnetic dipole $\mathbf{m}$ (d). The current magnetic dipole is schematically shown here as a rod with magnetization $\mathbf{M}_0$ in a volume $V$; another possible model is a ring carrying a current.
These expressions are quite analogous to those describing the field of an electric dipole,

$$\mathbf{E} = \frac{3(\mathbf{p}\cdot\mathbf{r})\mathbf{r} - r^2\mathbf{p}}{\varepsilon r^5}, \qquad \mathbf{H} = 0. \qquad (5.10)$$
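At the same time, the field of a current dipole is given by

$$\mathbf{H} = \frac{3(\mathbf{m}\cdot\mathbf{r})\mathbf{r} - r^2\mathbf{m}}{r^5} + 4\pi\mathbf{m}\,\delta(\mathbf{r}), \qquad \mathbf{E} = 0. \qquad (5.11)$$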
The absence of the factor $1/\mu$ in eq. (5.11) as compared with eq. (5.9) may be associated with different definitions of the moments $\mathbf{m}$ and $\tilde{\mathbf{m}}$ (Ginzburg [1985a]). The term $4\pi\mathbf{m}\,\delta(\mathbf{r})$ in eq. (5.11) reflects the difference between the dipoles, which is clear from fig. 2. Note that for a current dipole, the field lines are closed; eq. (5.11) is the solution of eq. (5.1) with $\mathbf{j} = c\,\mathrm{rot}\{\mathbf{m}\,\delta(\mathbf{r})\}$. If the macroscopic equations (5.1)-(5.3) and analogous ones are used, and if the permittivity $\varepsilon$ and the permeability $\mu$ are assumed to be independent of the coordinates, then the medium is in effect assumed to occupy the entire space, including the region occupied by the sources themselves. In the case of electric charges and magnetic monopoles (i.e., magnetic charges) the latter fact is inessential. Although other arguments could be put forward, it is sufficient to note that for charges and monopoles moving in thin empty channels and gaps, the Vavilov-Cherenkov radiation remains unchanged (see § 4). In the case of dipoles, however, an arbitrarily thin channel or gap affects, in general, the field and the energy radiated by the source*. For instance, an electric dipole $\mathbf{p}_\perp$ perpendicular to $\mathbf{v}$ and contained in a thin cylindrical channel radiates more than in a continuous medium by a factor of $(2\varepsilon/(\varepsilon + 1))^2$; for a gap, the amplification is characterized by the factor $\varepsilon^2$ (we have set $\mu = 1$ and assumed that in the laboratory frame of reference the magnetic moment $\mathbf{m} = 0$; see Ginzburg and Eidman [1959] and Ginzburg [1979, 1989], ch. 7). This fact shows that the close surroundings are important for radiation by a dipole. Moreover, it was shown by Ginzburg [1952] that a “current” dipole moment radiates like a “true” dipole if the medium inside it is not at rest but is moving with the same velocity $\mathbf{v}$ as the dipole itself. Certain aspects of this problem have been obscure until quite recently; consideration of the Vavilov-Cherenkov radiation for a toroidal dipole
* The presence of thin channels and gaps does not affect the dipoles parallel to the velocity $\mathbf{v}$ and, thus, to the channel axis or to the gap plane. This can be seen fairly readily by means of the reciprocity theorem (Ginzburg and Eidman [1959]).
moment has provided a good insight into the matter (Ginzburg [1985a], Ginzburg and Tsytovich [1985]). Whereas electric and magnetic dipoles are already known to students at an early stage of their training, toroidal dipoles are not even mentioned in theoretical physics courses known to me. Actually, unless toroidal moments are introduced, a multipole expansion is incomplete (Dubovik and Tosunyan [1983]). An obvious example of a toroidal dipole is a “toroid”; i.e., a torus-like solenoid carrying current (fig. 3). If such a system is not charged ($\rho = 0$), it does not possess an electric dipole or higher multipole moment. Furthermore, if an azimuthal current is absent (for this, the winding must be double), the system does not possess a magnetic moment either. Inside the torus, however, the field $\mathbf{H} \neq 0$, and the system has a toroidal dipole moment. It is defined by the expression

$$\mathbf{T} = \frac{1}{10c}\int\{(\mathbf{r}\cdot\mathbf{j})\mathbf{r} - 2r^2\mathbf{j}\}\,d\mathbf{r}, \qquad (5.12)$$
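where $\mathbf{j}$ is the current density. If a toroidal point dipole is at rest, the density of the toroidal moment is $\boldsymbol{\tau} = \mathbf{T}\,\delta(\mathbf{r})$, and

$$\mathbf{j} = c\,\mathrm{rot}\,\mathrm{rot}\{\mathbf{T}\,\delta(\mathbf{r})\}. \qquad (5.13)$$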
If a toroidal dipole is moving in a vacuum at a constant velocity $\mathbf{v}$, then, as is immediately clear from Lorentz transformations, the fields $\mathbf{H}$ and $\mathbf{E}$ outside the dipole are equal to zero, as before. Inside the dipole, the field $\mathbf{H} \neq 0$, and $\mathbf{E} = -(1/c)[\mathbf{v}\times\mathbf{H}]$. The fields of a toroidal dipole moving uniformly in a medium cannot be found by means of Lorentz transformations (in this context, we mean a medium which does not move in the laboratory frame of reference, and in
Fig. 3. Toroid with current. T is the toroidal dipole moment.
the rest frame of the dipole the medium was also considered to be at rest). The calculation on the basis of eqs. (5.1) and (5.2), with the current (5.13), shows (Ginzburg and Tsytovich [1985]) that fields are present outside such a toroidal dipole. Moreover, if condition (2.5) holds, Vavilov-Cherenkov radiation will be generated with the power given by eq. (5.14). The dipole $\mathbf{T}$ is assumed here to be directed parallel to the velocity $\mathbf{v}$. Furthermore, a critical assumption is that only a toroidal dipole exists in the rest frame [i.e., the current density in the rest frame is described by eq. (5.13)]. If in the laboratory frame of reference one has $\mathbf{j} = c\,\mathrm{rot}\,\mathrm{rot}\{\mathbf{T}_l\,\delta(\mathbf{r} - \mathbf{v}t)\}$, then in the rest frame there also exists a quadrupole electric moment. We then obtain an expression for $(dW/dt)_{T_l}$ (Ginzburg [1985a], Ginzburg and Tsytovich [1985]) which differs from eq. (5.14) by the replacement of $(\varepsilon\mu - 1)^2$ by $(\varepsilon\mu)^2$ and, of course, by the replacement of $T$ by $T_l$. By virtue of what has been said, one expects that if a toroidal dipole moves in an empty channel, the Vavilov-Cherenkov radiation vanishes completely. The calculation confirms this conclusion (Tsytovich [1986], Ginzburg [1985b]). Calculation is unnecessary in the case of an empty channel or a gap; the result is obvious, because the field in a vacuum outside a toroidal dipole is equal to zero. Calculation is necessary if the channel is filled with a medium with permittivity $\varepsilon_0$ and permeability $\mu_0$. In this case, for a thin channel we obtain eq. (5.14) with the replacement of $(\varepsilon\mu - 1)^2$ by $(\varepsilon_0\mu_0 - 1)^2$, which vanishes as $\varepsilon_0\mu_0 \to 1$ and, of course, as $\varepsilon_0 \to 1$ and $\mu_0 \to 1$.

What does all this mean? The whole point is that, in the case considered, the medium fills up the toroidal dipole, inside which there is a field. This field polarizes the medium, and this effect persists after the dipole has passed. In other words, the dipole leaves a “trace” which records its passing. This picture is especially obvious for a plasma.
Particles (e.g., electrons or ions) passing through a dipole are deflected inside it by the fields; therefore, the plasma behind the dipole is already perturbed. For a toroidal dipole, this effect manifests itself in a pure form, so to speak. It also takes place for magnetic moments, and due to different fields inside “current” and “true” dipoles [see eqs. (5.9) and (5.11)], the corresponding “traces” in the medium are also different. According to this conclusion, “current” and “true” magnetic dipoles moving in an empty channel or in a gap must radiate quite similarly (Ginzburg and Eidman [1959]), which is confirmed by calculation (Bogdankevich [1960], Tsytovich [1986]).
How will different dipoles radiate when they move in a continuous medium? Since the radiation of dipoles is already influenced by the motion in thin channels, and the fields inside the dipoles are essential, it is clear that not only “far”, but also “near” collisions are responsible for the radiation. If the dipoles are macroscopic, then the passage of medium (e.g., plasma) through them may play some role. But for microscopic (point) dipoles, the role of “near” collisions may only be considered in the framework of microscopic theory, or by taking spatial dispersion into account. This also concerns problems which deal with particle radiation in the channeling process. Fortunately, if we do not touch upon channeling of charged particles, the corresponding problems are at present of no real importance simply because of the smallness of the dipole moments of different particles (neutrons, etc.)*. Although not yet solved, these problems are nonetheless of interest from a methodical point of view. For this reason, and bearing in mind that the question of radiation by moving dipoles has long remained unclear even in a “zero-order approximation”, we have devoted some space to this topic.
§ 6. Classical and Quantum Theories of the Doppler Effect in a Medium
The quantum theory of the Vavilov-Cherenkov effect in the classical limit [eq. (3.6)] does not give anything new, apart from an understanding of the role played by the conservation laws. However, in more complicated cases, quantum theory reveals interesting features even in the classical limit. To illustrate this fact, we will consider the Doppler effect in a medium. First we recall the classical situation, using as an example an oscillator with the eigenfrequency $\omega_{00}$; this is the frequency in the frame of reference in which the oscillator as a whole is at rest. If the oscillator moves in vacuo with a constant velocity $\mathbf{v}$ in the laboratory frame of reference, then in this frame the frequency of the waves emitted by it is equal to

$$\omega(\theta) = \frac{\omega_{00}\sqrt{1 - v^2/c^2}}{1 - (v/c)\cos\theta} = \frac{\omega_0}{1 - (v/c)\cos\theta}, \qquad (6.1)$$
where $\theta$ is the angle between the wavevector $\mathbf{k}$ (in the direction of observation) and $\mathbf{v}$; in eq. (6.1), $\omega_0$ is the oscillator frequency in the laboratory frame.

* Radiation by toroidal dipoles moving in a medium may be of only methodological interest. However, in solid-state theory the situation is quite different; crystals with a toroidal moment of nonzero density are quite specific magnetic substances (see Ginzburg, Gorbatsevich, Kopaev and Volkov [1984]).

Now consider a transparent medium with refractive index $n(\omega)$, which is at rest in the laboratory frame. We should not be surprised that the motion of the source in a medium may lead to large energy losses and, more importantly, to the destruction of the source itself (an excited atom would be an example). As pointed out in § 4, to eliminate losses and destructive collisions one can make an empty gap or an empty channel in the medium, or direct a beam of atoms near the boundary between the medium and vacuum. In particular, if the medium is a gaseous plasma located in a magnetic field, the close-collision losses may be of no significance, even for a continuous medium. In a medium, eq. (6.1) is replaced by

$$\omega(\theta) = \frac{\omega_{00}\sqrt{1 - v^2/c^2}}{|1 - (v/c)n(\omega)\cos\theta|} = \frac{\omega_0}{|1 - (v/c)n(\omega)\cos\theta|}. \qquad (6.2)$$

Equation (6.2) is derived from eq. (6.1) by using the general rule, which consists of replacing $v/c$ by $vn/c$. However, in the term $\sqrt{1 - v^2/c^2}$ one should not, of course, make this substitution, as it does not concern the emission process. One can also obtain eq. (6.2) automatically, i.e., by solving the problem of an oscillator moving in a medium. It is a nontrivial result that absolute values occur in eq. (6.2). Of course, if the motion is subluminal ($v < c/n$), or if in the case of superluminal motion the emission proceeds outside the cone (2.4), i.e., if

$$(v/c)n(\omega)\cos\theta < 1, \qquad (6.3)$$

then we are dealing with the usual, so-called normal Doppler effect. However, the so-called complex Doppler effect due to dispersion (i.e., due to the dependence of $n$ on $\omega$) is also possible in this case. The complex Doppler effect as well as eq. (6.2) with the modulus were first considered by Frank in 1942 (see also Frank [1988]). If the motion is superluminal, then under the condition that
$$(v/c)n(\omega)\cos\theta > 1 \qquad (6.4)$$

[i.e., when there is emission into the cone (2.4), which is often referred to as the “Cherenkov cone”; fig. 4], eq. (6.2) without the modulus would give negative values of the frequency $\omega$. From this observation it is clear that it is necessary to introduce the absolute value signs, but we can verify this requirement by other means (it follows, for example, from the calculations given below).
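The behaviour of eq. (6.2) on either side of the Cherenkov cone can be explored with the following minimal sketch, which assumes a nondispersive medium ($n$ = const.). The function names and the numerical values are illustrative choices made here, not quantities taken from the text.

```python
import math

def doppler_frequency(omega00, beta, n, theta):
    """Frequency emitted at angle theta by an oscillator moving in a medium.

    Implements omega = omega00 * sqrt(1 - beta^2) / |1 - beta * n * cos(theta)|
    for constant n. omega00 is the rest-frame eigenfrequency and beta = v/c.
    """
    omega0 = omega00 * math.sqrt(1.0 - beta**2)   # oscillator frequency in the lab frame
    denom = 1.0 - beta * n * math.cos(theta)
    if denom == 0.0:
        return math.inf                            # exactly on the Cherenkov cone
    return omega0 / abs(denom)

def doppler_regime(beta, n, theta):
    """Classify the emission direction by conditions (6.3) and (6.4)."""
    x = beta * n * math.cos(theta)
    if x < 1.0:
        return "normal"
    if x > 1.0:
        return "anomalous"
    return "cone"

# Illustrative case: beta = 0.9, n = 1.5, so the cone is at acos(1/1.35).
for deg in (0, 20, 40, 60, 90, 180):
    th = math.radians(deg)
    print(deg, doppler_regime(0.9, 1.5, th), round(doppler_frequency(1.0, 0.9, 1.5, th), 4))
```

Setting `n = 1` recovers the vacuum formula (6.1), for which only the normal regime occurs.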
Fig. 4. Regions of normal and anomalous Doppler effect.
In the region (6.4), the Doppler effect is called anomalous. If dispersion is taken into account, the whole picture becomes rather complicated, but here we are interested in another aspect of the problem and will therefore neglect dispersion in the analysis. In this case, it follows from eq. (6.2) with $n(\omega) = n = \mathrm{const.}$ that on the Cherenkov cone [for $(vn/c)\cos\theta = (vn/c)\cos\theta_0 = 1$; see eq. (2.4)], the frequency $\omega(\theta_0) = \infty$, and $\omega(\theta) \to \infty$ as $\theta \to \theta_0$ on both sides of the cone. One cannot say more on the basis of eq. (6.2); it appears that the difference between the normal and the anomalous Doppler effects is not profound. We now turn to a quantum derivation of the formula for the Doppler effect in a medium (Ginzburg and Frank [1947a]). To do this, the conservation laws (3.1) and (3.3) are used, but the particle energies for a charge (or, more generally, for a source without internal degrees of freedom) are replaced by the expression $E_{0,1} = \sqrt{(m + m_{0,1})^2c^4 + c^2p_{0,1}^2}$. In this expression $(m + m_0)c^2 = mc^2 + W_0$ is the total energy of the system (atom) in the lower state 0, and $(m + m_1)c^2 = mc^2 + W_1$ is the same energy in the upper state 1. For simplicity, we consider here a system with two levels or simply discuss a certain transition in an atom, and refer to the state with a larger energy as the upper state [i.e., $W_1 > W_0$, and the frequency of the radiation by an atom at rest is $\omega_{00} = (W_1 - W_0)/\hbar$]. If we now apply the conservation laws in the classical limit [eq. (3.6)] and use the relation $\Delta E = E_1 - E_0 = \mathbf{v}\cdot\Delta\mathbf{p}$, we obtain eq. (6.2). A more general calculation which takes recoil into account was published by Ginzburg and Frank [1947a]. An important fact is completely hidden in the classical derivation of eq. (6.2). This fact is revealed by tracing down the signs (by simple algebra, which we omit); the quantum corrections are not important in this regard. The fact is that in the region of the normal Doppler effect [eq. (6.3)], the emission at frequency $\omega$ corresponds to a transition of the atom from the
upper state 1 to the lower state 0. The direction of the transition is determined by the requirement that the energy $\hbar\omega$ of the emitted quantum be positive; i.e., by the requirement that $\omega > 0$. We have become accustomed to this situation and, naturally, it is the only one realized in vacuum. However, in the case of the anomalous Doppler effect [eq. (6.4)], i.e., if a quantum of energy $\hbar\omega$ is emitted into the Cherenkov cone, the atom must undergo a transition from state 0 upwards to state 1 (fig. 5; see also Ginzburg and Frank [1947a], Frank [1979, 1988] for a more detailed discussion). Of course, there is no contradiction here - the energy needed for the excitation of the radiating system (the atom) and the radiation energy $\hbar\omega$ itself are derived from the kinetic energy of translational motion. It follows that in the case of superluminal motion, $v > c/n$ (the only case in which the anomalous Doppler effect is possible), even if the radiating atom is initially not excited (i.e., it is initially in the lower energy state 0), it becomes excited and, simultaneously, emits quanta inside the Cherenkov cone. Upon the transition to a lower energy state, the excited atom emits quanta at angles $\theta > \theta_0$; i.e., outside of the Cherenkov cone. It appears rather difficult to obtain this unusual picture without a quantum calculation, although it is clearly not the quantum effects but the use of the energy and momentum conservation laws that is important. Having obtained these results by the means used above, we can then confirm the results and develop the theory further using the classical calculations of the radiative friction force acting upon a superluminal oscillator. Specifically, and in accordance with the above conclusion, the waves emitted outside the Cherenkov cone lead to a damping of the oscillator vibrations, whereas the waves emitted inside the cone (the anomalous Doppler effect) excite the
Fig. 5. Transitions between atomic levels 0 and 1 in the case of normal and anomalous Doppler effects.
oscillator vibrations [see Ginzburg [1979, 1989], ch. 7 and the literature cited there for a more detailed discussion of this topic].
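The sign argument of this section can be outlined in a few lines. What follows is only a sketch under stated assumptions: the conservation laws (3.1) and (3.3) are taken in the form written in the first line below, the labels $i$ and $f$ (initial and final state of the atom) are introduced here purely for illustration, and recoil corrections are dropped, as in the classical limit.

```latex
% Assumed form of the conservation laws for emission of one quantum
E_i = E_f + \hbar\omega, \qquad
\mathbf{p}_i = \mathbf{p}_f + \hbar\mathbf{k}, \qquad
k = \frac{\omega n}{c}.

% Energy of the moving atom with internal energy W
E = \sqrt{(mc^2 + W)^2 + c^2p^2}, \qquad
\frac{\partial E}{\partial\mathbf{p}} = \mathbf{v}, \qquad
\frac{\partial E}{\partial W} = \frac{mc^2 + W}{E} = \sqrt{1 - v^2/c^2}.

% First order in the small changes of momentum and internal energy
\hbar\omega = E_i - E_f
  \simeq \mathbf{v}\cdot(\mathbf{p}_i - \mathbf{p}_f) + (W_i - W_f)\sqrt{1 - v^2/c^2}
  = \hbar\omega\,\frac{vn}{c}\cos\theta + (W_i - W_f)\sqrt{1 - v^2/c^2}.

% Hence
\hbar\omega\left[1 - \frac{vn}{c}\cos\theta\right] = (W_i - W_f)\sqrt{1 - v^2/c^2}.
```

Since $\omega > 0$, the bracket and $W_i - W_f$ must have the same sign. Under condition (6.3) the bracket is positive, so $W_i > W_f$ and the atom goes from the upper to the lower level; under condition (6.4) the bracket is negative, so $W_i < W_f$ and the atom is excited. In both cases $|W_i - W_f| = \hbar\omega_{00}$, and the last line reproduces eq. (6.2), including the modulus.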
§ 7. Acceleration Radiation

A great achievement of theoretical physics during the 1970s was the establishment of the fact that black holes, treated within the framework of general relativity with allowance for quantum effects, radiate (Hawking [1974]). The radiation of black holes was found to have the spectrum of a black body with a temperature
$$T_{\rm bh} = \frac{\hbar c^3}{8\pi k_B G M}, \qquad (7.1a)$$
where $M$ is the mass of the black hole, $k_B$ is the Boltzmann constant and $G$ is the gravitational constant. As is known, physical processes in a homogeneous gravitational field proceed as in a uniformly accelerated reference frame (the equivalence principle). In this connection, the behavior of various systems (“detectors”) in uniformly accelerated reference frames was considered (Unruh [1976]). In this case, the “detectors” (atoms, oscillators, etc.) were found to undergo excitation as if they were in a thermal bath or a thermal radiation field with temperature

$$T_a = \frac{\hbar a}{2\pi k_B c}, \qquad (7.1b)$$

where $a$ is the constant acceleration of our reference frame relative to the inertial one. Note that eq. (7.1b) takes the form of eq. (7.1a) if one takes for the acceleration $a$ the acceleration typical of the black hole (the gravitational field “strength”), the so-called surface gravity of the black hole, $\kappa = GM/r_g^2 = c^4/4GM$, where $r_g = 2GM/c^2$ is the gravitational radius of the mass $M$. To a certain extent, the excitation of a “detector” that was initially at rest in a uniformly accelerated reference frame is similar to the quantum radiation of black holes. This similarity attracted a great deal of attention (references can be found in the review by Ginzburg and Frolov [1987]). The cause of the accelerated “detector” excitation and the character of the excitation-induced variation of the state of the quantized field interacting with the detector (a massive or massless scalar field, an electromagnetic field, etc.)
remained unclear for a rather long time. The reason for this is that the calculations had been made in a uniformly accelerated reference frame. However, the problem may well be considered in an inertial frame (i.e., in a Minkowski space) in which a “detector” is uniformly accelerated. With such an approach, Unruh and Wald [1984] established that “detector” excitation is accompanied by the radiation of a field quantum; the latter is sometimes referred to as “acceleration radiation”. Excitation accompanied by radiation is somewhat unusual, and therefore warrants further discussion. The explanation for this phenomenon was provided by Ginzburg and Frolov [1987], who approached the problem by analogy with “system” excitation in the case of the anomalous Doppler effect (see § 6). For a “system”, whether it is an atom or any other “detector”, to undergo a transition into another state with concomitant emission or absorption of a quantum, it is first necessary that the corresponding transition matrix element be nonzero. Second, the laws of conservation of energy and momentum must not be violated by such a transition. If the “system” is at rest, then transitions with radiation are only possible from upper to lower levels; the same holds for the normal Doppler effect. As we have seen, for the anomalous Doppler effect the conservation laws are obeyed only in transitions from lower to higher levels. If the “system” and the radiation are initially at the lowest level, then within the framework of classical physics such a state is stable and, thus, the excitation is a quantum effect. This is particularly obvious in the case of an oscillator, whose lower level corresponds in the classical picture to the absence of oscillations. If a “system” is moving uniformly, the anomalous Doppler effect is possible only when the center of gravity of the “system” has a velocity $v > c/n$; otherwise, the conservation laws will not hold for a transition with excitation.
In the case of a uniform acceleration, and generally for an accelerated motion of the “system”, the conservation laws can hold for any velocity $v$, even in vacuo. In vacuo, the role of the medium, which provides a change in the momentum of the radiation and thus conformity with the conservation laws, is played by those external forces which accelerate the “system” (the “detector”) as a whole. Ginzburg and Frolov [1987] demonstrated this by a direct computation. The energy necessary for the detector excitation and for quantum radiation is due to the work of the external field which accelerates the detector. These remarks explain (clarify physically) the character of acceleration radiation, but the temperature (7.1b) must still be determined. To this end, one can either carry out a direct calculation of the radiation balance or make use of the equivalence principle. This aspect of the problem
is beyond the scope of the present article; see Ginzburg and Frolov [1987] for a detailed discussion. Moreover, the result (7.1b) is a particular case, in the sense that it only refers to uniform acceleration. The remarks made above are more general in character, since they concern an arbitrary acceleration [in this case, of course, the detector distribution over levels is not generally thermal], for which formulae of the type (7.1b) are not valid. The paper by Ginzburg and Frolov [1987] also contains, at the end of its sect. 5, a few remarks concerning the case when the detector is a macroscopic body.
§ 8. Transition Radiation at the Boundary between Two Media
In previous sections of this article, we have seen that when a source (a charge, a dipole or a higher multipole) moves uniformly and rectilinearly in a medium with a velocity $v > c/n(\omega)$, Vavilov-Cherenkov radiation occurs. If the medium is spatially homogeneous and time-independent, only the Vavilov-Cherenkov radiation is possible. If, however, the medium is inhomogeneous and/or changes over time, or if such a medium lies near the trajectory of a source, then the situation changes drastically. Under these circumstances, there generally occurs the so-called transition radiation (in the present context, the term is used in a broad sense). Such radiation is produced when a charge (or another source without eigenfrequency) moves uniformly and in a straight line under nonuniform conditions. It bears repeating that these nonuniform conditions arise in inhomogeneous media, or in media with time-dependent properties, and in the neighborhood of such media. In the general case, transition radiation may, of course, coexist and interfere with the Vavilov-Cherenkov radiation and with radiation due to charge acceleration (i.e., with Bremsstrahlung, synchrotron radiation, etc.). However, to provide a deeper insight into the physics of the problem, we will simplify the discussion by considering the transition radiation alone. Suppose that a charge is moving at a constant velocity*
v < c/n,
(8.1)
* In the presence of Cherenkov or transition radiation, the energy of the charge changes as a general rule. Consequently, the question arises regarding whether or not we can assume the velocity of the charge to be strictly constant. The answer is undoubtedly affirmative for the reason given in § 2. In some problems, one must also take into account the change in the velocity of the source, but that is quite a different question.
under circumstances in which Vavilov-Cherenkov radiation does not occur. If, in addition, we are dealing with a vacuum ($n = 1$), there will be no radiation at all. For the radiation to appear in a vacuum, a charge (or a multipole) must be accelerated; i.e., the parameter $v/c$, which characterizes the radiation, must change. If the medium is transparent, this parameter already has the form $v/c_{ph} = vn(\omega)/c$; it is equal to the ratio of the particle velocity $v$ to the phase velocity of light, $c_{ph} = c/n(\omega)$. However, the crux of the matter is that the parameter $vn/c$ in a medium may change not only with $v$, but also with a change in the phase velocity $c_{ph} = c/n$ along the source trajectory due to a corresponding change of the refractive index. Only the radiation which occurs when the parameter $vn/c$ changes due to changes in $n$ along or close to the trajectory of a source with $v = \mathrm{const.}$ is called transition radiation. To be precise, in the general case of an absorbing medium the role of the refractive index $n$ is played by $\sqrt{\varepsilon} = n + i\kappa$, where $\varepsilon$ is the complex dielectric permittivity of the medium (for simplicity we assume here and below that the medium is nonmagnetic; i.e., that $\mu = 1$). The simplest problem of this kind is a charge crossing the boundary between vacuum and a medium. I. M. Frank and the present author considered in 1944 (see Ginzburg and Frank [1946]) just the simplest kind of transition radiation, which occurs in this case. In a sense, transition radiation is an even simpler effect than the Vavilov-Cherenkov radiation. The fact that generation of transition radiation was revealed with such a delay is apparently due to the same reasons as in the case of the Vavilov-Cherenkov effect. Note that the above explanation of the nature of transition radiation, which associates the occurrence of the radiation with a change in the parameter $vn/c$, is still somewhat formal and in fact requires an insight into the theory of radiation in a medium.
It is therefore not irrelevant to recall the most obvious explanation for the occurrence of transition radiation when a charge crosses the boundary between media. It is well known that the electromagnetic field in the first medium (i.e., in the medium in which the charge moves at a given instant) can be represented as the field of the charge itself and the field due to its “mirror image” moving in the second medium towards the charge. When the charge and its image cross the boundary, from the “point of view” of the first medium, they partly “annihilate” or “reconstruct” themselves, and this leads to the radiation. An especially simple example of this is the case of a charge incident normally on an ideal mirror; when crossing the boundary of the mirror, the charge e and its
[Fig. 6. Transition radiation of a charge e which crosses the boundary between vacuum and a metal (ideal mirror). The charge moves along its trajectory with velocity v; its “mirror image” moves with velocity -v.]
image, $-e$, “annihilate” each other completely or, better expressed, they stop at the boundary [in the sense that the radiation occurring in vacuum is the same as the radiation by an incident charge $e$ and its image $-e$, which stop simultaneously at the boundary (fig. 6)]. To find the emitted energy $W$ in this simplest case, we need not solve a generally rather cumbersome boundary problem, but can use instead a very simple and well-known formula for radiation by charges which sharply change their velocity*; viz.,
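$$W(\omega, \theta, \varphi) = \frac{1}{4\pi^2 c^3} \left| \sum_i e_i \left( \frac{[\mathbf{v}_{i2} \times \mathbf{s}]}{1 - (\mathbf{s} \cdot \mathbf{v}_{i2})/c} - \frac{[\mathbf{v}_{i1} \times \mathbf{s}]}{1 - (\mathbf{s} \cdot \mathbf{v}_{i1})/c} \right) \right|^2, \tag{8.2}$$

where $e_i$ is the charge of the $i$th particle whose velocity changes sharply from $\mathbf{v}_{i1}$ to $\mathbf{v}_{i2}$, and $\mathbf{s} = \mathbf{k}/k$ is the direction of the radiation wavevector characterized by the angles $\theta$ and $\varphi$. The total frequency-dependent energy density of the radiation is $W(\omega) = \int W(\omega, \theta, \varphi) \sin\theta \, d\theta \, d\varphi$, and the total energy is $W = \int W(\omega) \, d\omega$. In a vacuum (in the absence of boundaries, etc.), if one charge $e_1 = e$ stops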
* A sharp change in velocity means that the change occurs in a time $\tau$ that is small compared to the wave period $T = 2\pi/\omega$. This condition is always sufficient, but it is necessary only in the nonrelativistic case. In the general case, the change of the velocity may be assumed to be sharp if it proceeds within a time $\tau \ll t_f$, where $t_f$ is the time in which the radiation is generated. This time is introduced in § 9. For a vacuum, $t_f = 2\pi/(\omega(1 - (v/c)\cos\theta))$, where $\theta$ is the angle between the velocity $v$ and the wavevector $k$ (for vacuum, of course, $k = \omega/c$; in a transparent medium its influence on $t_f$ is reduced to the replacement of $c$ by $c/n(\omega)$, where $n(\omega)$ is the refractive index).
abruptly or accelerates rapidly from rest to the velocity $v$, then [see eq. (8.2)]

$$W(\omega, \theta) = \frac{e^2 v^2 \sin^2\theta}{4\pi^2 c^3 (1 - (v/c)\cos\theta)^2}. \tag{8.3}$$
For transition radiation generated on an ideal mirror, one should assume in eq. (8.2) that the charge $e_1 = e$ moving with velocity $v$ and the charge $e_2 = -e$ moving with velocity $-v$ stop abruptly at the boundary (see fig. 6). As a result [after integration over $\varphi$, which justifies multiplication by $2\pi$], one observes in medium 1 (in vacuo) a radiation with energy
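$$W_1(\omega, \theta) = \frac{e^2 v^2 \sin^2\theta}{\pi^2 c^3 (1 - (v^2/c^2)\cos^2\theta)^2}. \tag{8.4}$$

Here, $\theta$ is the angle between $k$ and $-v$, as shown in fig. 6. In the nonrelativistic case, i.e., for $v \ll c$,

$$W_1(\omega, \theta) = \frac{e^2 v^2 \sin^2\theta}{\pi^2 c^3}, \qquad W_1(\omega) = \frac{4 e^2 v^2}{3\pi c^3}. \tag{8.5a}$$

In the ultrarelativistic case ($v \to c$),

$$W_1(\omega) = \frac{e^2}{\pi c} \ln\frac{2}{1 - v/c} = \frac{2e^2}{\pi c} \ln\frac{2E}{mc^2}. \tag{8.5b}$$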
This result is the same as that for the radiation of a single particle in the same limit [see eq. (8.2)]: this is clear because when $v \to c$, the radiation is along the direction of the velocity of the charge. Therefore, the radiation of the charge $e$ “entering the metal” is not observed. In medium 1 (in vacuo), only the radiation of the mirror image is observed; i.e., radiation of the charge $-e$, with a velocity $-v$. In the nonrelativistic limit, the energy (8.5a) is four times larger than the energy (8.3) for a single charge radiating into the hemisphere of directions (into the vacuum), because for a nonrelativistic velocity the fields of the charge $e$ and of its mirror image $-e$ add; i.e., they double. It is quite obvious that the transition radiation under discussion occurs if a boundary of any media with different “electrical” parameters (such as
dielectric permittivity, refractive index) is crossed. However, analytical attention concentrated initially on the incidence of a charge on a metal (which may not be a perfect mirror) and, hence, on transition radiation, mainly optical, in the “backward” direction, which is observed in vacuum. For relativistic particles of high enough energy it is, however, quite realistic that a particle may pass through a medium and proceed into vacuum. From a theoretical perspective, this problem is equivalent to the previous one, and the corresponding formula for the radiation intensity is derived by simply replacing the velocity $v$ by $-v$ (see below). However, in the calculation of the fields there is, of course, no symmetry in these cases, and the radiation intensities are different when $v$ is replaced by $-v$ (i.e., when the particle enters or leaves the medium during “backward” or “forward” observations, respectively). Under certain conditions, the differences are large. For forward radiation and, in particular, when a particle leaves the medium and enters vacuum, the radiation spectrum has higher frequencies; more specifically, in a condensed medium the transition radiation of relativistic particles can extend into the X-ray part of the spectrum. We will not present here the solution of the corresponding boundary problems; the solution is discussed by Ginzburg and Tsytovich [1990] and in the literature cited in that paper. However, it is appropriate to recall the main formula of Ginzburg and Frank [1946] for the case where medium 1 is vacuum and medium 2 is described by the complex permittivity $\varepsilon$.
$$W_1(\omega, \theta) = \frac{e^2 v^2 \sin^2\theta \cos^2\theta \, \left| (\varepsilon - 1)\left(1 - v^2/c^2 + (v/c)\sqrt{\varepsilon - \sin^2\theta}\right) \right|^2}{\pi^2 c^3 \left| \left(1 - (v^2/c^2)\cos^2\theta\right)\left(1 + (v/c)\sqrt{\varepsilon - \sin^2\theta}\right)\left(\varepsilon\cos\theta + \sqrt{\varepsilon - \sin^2\theta}\right) \right|^2}. \tag{8.6}$$

For an ideal mirror, one can assume that $|\varepsilon| \to \infty$ and eq. (8.6) goes over into eq. (8.4), as it should. The expression for $W_2(\omega, \theta)$, which pertains to the case where the charge $e$ leaves a medium of permittivity $\varepsilon$ (medium 1) and enters vacuum (medium 2) with a velocity $v$, is derived from eq. (8.6) by replacing $v$ with $-v$; the angle $\theta$ is now the angle between $k$ and $v$ [but not between $k$ and $-v$, as in eq. (8.6)]. The replacement of $v$ by $-v$ in eq. (8.6) is not at all trivial: in the denominator there now appears a factor of $(1 - (v/c)\sqrt{\varepsilon - \sin^2\theta})$ instead of $(1 + (v/c)\sqrt{\varepsilon - \sin^2\theta})$. This is just the reason why, as we have mentioned, higher frequencies appear in the radiation spectrum when a particle leaves the medium. One must also take into account that $\varepsilon$ approaches unity for high frequencies. As a result, the total
intensity (integrated over all angles and frequencies) in this case also increases; in the simplest case, the total intensity is proportional to $E/mc^2 = 1/\sqrt{1 - v^2/c^2}$ [see § 10; $E$ is the total energy of a radiating charge of mass $m$]*. This important fact was clarified in 1959 (Barsukov [1959], Garibyan [1960]). It opened up much wider perspectives for the creation of efficient “transition counters” used for particle detection or, more precisely, for determination of relativistic particle velocities. Actually, questions concerning transition counters were encountered earlier in connection with boundaries between media, but transition radiation in the optical region of the spectrum apparently cannot be used for this purpose. As often happens, the apparent potential for a “practical” application (in this case, to high-energy physics) stimulated scientific interest in transition radiation. According to one bibliographical listing, only 14 papers were devoted to transition radiation from 1945/1946 to 1958. In the next 13 years (from 1959 to 1971), as many as 244 papers appeared; since that time, hundreds of papers on this subject have been published. The first published results of experiments in which transition radiation was observed appeared in 1959. Altogether, transition radiation, a rather simple and clear effect in the field of classical electrodynamics, attracted little attention for about 15 years; it now receives great attention, although mainly in connection with transition counters. This topic has been mentioned in a number of review articles, as well as in many papers (for references see, e.g., Ginzburg and Tsytovich [1990], Fabian and Fisher [1980], Kleinknecht [1982]).
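As a numerical cross-check of the formulae of this section, the following minimal Python sketch evaluates the ideal-mirror distribution (8.4) and the Ginzburg-Frank formula (8.6) in units of $e^2/\pi^2 c$, with $\beta = v/c$. It assumes the standard form of both expressions, so the check is only as reliable as that transcription. The function names, the midpoint integration rule, and the closed-form angular integral used for comparison are illustrative choices that do not appear in the text.

```python
import cmath
import math


def w1_mirror(beta, theta):
    """Eq. (8.4), ideal mirror, per unit solid angle, in units of e^2/(pi^2 c)."""
    return beta**2 * math.sin(theta)**2 / (1.0 - beta**2 * math.cos(theta)**2)**2


def w1_ginzburg_frank(beta, theta, eps):
    """Eq. (8.6), vacuum/medium boundary with (complex) permittivity eps, same units."""
    s2 = math.sin(theta)**2
    ct = math.cos(theta)
    root = cmath.sqrt(eps - s2)  # principal branch of sqrt(eps - sin^2 theta)
    num = (eps - 1.0) * (1.0 - beta**2 + beta * root)
    den = (1.0 - beta**2 * ct**2) * (1.0 + beta * root) * (eps * ct + root)
    return beta**2 * s2 * ct**2 * abs(num)**2 / abs(den)**2


def w1_spectral(beta, steps=20000):
    """Integrate eq. (8.4) over the vacuum hemisphere (midpoint rule).

    The result is W_1(omega) in units of e^2/(pi c): the factor 2*pi from the
    integration over phi, divided by pi from the change of units, leaves 2.
    """
    h = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += w1_mirror(beta, theta) * math.sin(theta)
    return 2.0 * total * h


def w1_spectral_closed(beta):
    """Closed form of the same integral (not quoted in the text; used as a check)."""
    log_term = math.log((1.0 + beta) / (1.0 - beta))
    return (1.0 + beta**2) / (2.0 * beta) * log_term - 1.0


if __name__ == "__main__":
    for beta in (0.01, 0.5, 0.9, 0.999):
        print(beta, w1_spectral(beta), w1_spectral_closed(beta), 4.0 * beta**2 / 3.0)
    # |eps| -> infinity should reproduce the ideal mirror, eq. (8.4)
    print(w1_ginzburg_frank(0.5, 1.0, 1e12), w1_mirror(0.5, 1.0))
```

For small $\beta$ the integrated value should approach $4\beta^2/3$, which is eq. (8.5a) in these units. As $\beta \to 1$ the closed form grows like $\ln[2/(1 - \beta)] - 1$, that is, it follows the logarithm of eq. (8.5b) up to a constant of order unity.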
§ 9. Transition Radiation as a More General Phenomenon. Formation Zone
While not denying the importance of investigations connected with transition counters, we would like to emphasize that transition radiation is, in the broad sense, of undoubted value from a general, physical perspective. It
* The fact that for ultrarelativistic particles ($v \to c$) the total forward-radiated energy $W_2$ (e.g., when a particle leaves a medium and enters vacuum) substantially exceeds the total backward-radiated energy $W_1$ (into a vacuum, for example) requires additional explanation. We now deal with real media, whereas for an ideal mirror, as $v \to c$, the forward- and backward-radiated energies are equal. However, if the frequency dispersion (which is always present in a medium, especially at high frequencies) is taken into account, it is not the charge $e$ itself, but its image with a charge $fe$ that radiates backward (as $v \to c$). At high frequencies, $|f| < 1$, whereas the forward radiation is not due to the charge $fe$ but rather to the charge $e$ itself. In the end, therefore, the total forward radiation prevails (as $v \to c$).
develops certain ideas and the “language”, and thereby facilitates further progress in some directions. The situation here is generally similar to the one with the Vavilov-Cherenkov effect. It is perhaps noteworthy that the latter is used directly in Cherenkov counters. Thus far, our attention has been concentrated on the transition radiation which occurs when one or more boundaries between media are crossed. In the latter case, we deal either with an ordered sequence of boundaries (i.e., a system with a definite period) or with randomly distributed boundaries (inhomogeneities)*. Another trend, which has developed for a long time, is based on the fact that any radiation and, in particular, transition radiation with a wavelength $\lambda$ (in vacuum $\lambda = 2\pi c/\omega$) is not formed at a point, but rather in some region (the “formation zone”). The dimension of the formation zone is determined by the wavelength $\lambda$, but can also be appreciably larger. As has already been mentioned, this is the reason why the Vavilov-Cherenkov effect occurs when a particle moves in a vacuum but near a medium (in a channel, a gap, or near a boundary between media). Quite similarly, transition radiation (which in this case is sometimes called diffraction radiation) occurs when a source (charge) moving uniformly in either vacuum or a uniform medium passes close to some obstacles. Examples of such obstacles include metallic or dielectric globules, diaphragms, diffraction gratings, etc. Apart from the above general considerations, questions regarding such transition radiation can also be explained easily on the basis of the method of images. For relativistic particles, when radiation in the direction of their velocity is considered, we find that the formation zone (also variously termed the coherence length, formation length, or formation path) generally increases with increasing particle energy.
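The growth of the formation zone with particle energy can be illustrated numerically. The sketch below is a minimal illustration, assuming that the formation length follows from the $2\pi$ phase-difference definition used in this section, $L_f = \lambda (v/c) / |1 - (vn/c)\cos\theta|$, with $\lambda$ the vacuum wavelength. The function names, parameter names, and the sample wavelength are hypothetical.

```python
import math


def formation_length(wavelength, beta, n=1.0, theta=0.0):
    """Formation length in the same unit as `wavelength`.

    Assumed form: L_f = wavelength * beta / |1 - beta * n * cos(theta)|,
    where beta = v/c, n is the refractive index and theta is the angle
    between the wavevector and the velocity.
    """
    denom = abs(1.0 - beta * n * math.cos(theta))
    if denom == 0.0:
        # Vavilov-Cherenkov condition: the formation zone is unbounded
        return math.inf
    return wavelength * beta / denom


def beta_from_gamma(gamma):
    """v/c for a particle with total energy E = gamma * m * c^2."""
    return math.sqrt(1.0 - 1.0 / gamma**2)


if __name__ == "__main__":
    wavelength = 5.0e-5  # cm, an arbitrary optical wavelength
    for gamma in (1.1, 10.0, 100.0, 1000.0):
        beta = beta_from_gamma(gamma)
        forward = formation_length(wavelength, beta)
        backward = formation_length(wavelength, beta, theta=math.pi)
        print(gamma, forward, 2.0 * wavelength * gamma**2, backward)
```

For forward radiation in vacuum the computed length approaches $2\lambda (E/mc^2)^2$ at large energies, whereas for backward radiation ($\theta = \pi$) it stays below the wavelength.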
For instance, in vacuo the size of the formation zone $L_f$ in the direction of the velocity for a given radiation wavelength $\lambda$ increases in proportion to $(E/mc^2)^2 = (1 - v^2/c^2)^{-1}$, where $E$ is the total charge (source) energy and it is assumed that $E \gg mc^2$. The concept of the radiation formation zone and its size $L_f$, and the concept of the radiation formation time $t_f = L_f/v$, are comparatively little
* Transition radiation in a periodic, inhomogeneous medium has its own specific features. This may be the reason why it is sometimes called resonance radiation (Ter-Mikaelyan [1972]). The use of different terms for different forms of transition radiation may lead to confusion, and for this reason, we advise against the practice. However, to be consistent with the terminology used in the literature, we shall refer to transition radiation in a periodic medium as resonance transition radiation; we shall also refer to it as transition scattering, the term which we prefer.
Fig. 7. The length Lf of the radiation formation zone.
known despite their great importance not only in electrodynamics, but also in high-energy physics as a whole (see Frank [1942], Ter-Mikaelyan [1972] and Berestetskii, Lifshitz and Pitaevskii [1982], sect. 93). Although it is elementary, the derivation of the expressions for $L_f$ and $t_f$ for a source moving with a velocity $v$ in a transparent medium with refractive index $n(\omega)$ and emitting waves at an angle $\theta$ with respect to $v$ seems worth considering here (fig. 7). Suppose that at $t = 0$ the source is located at a point A, and that the phase of the wave emitted by it in the direction $k$ is equal to $\varphi_A$. We define the formation time $t_f$ as the time after which the phase $\varphi_B$ of the wave emitted at point B in the same direction differs by $2\pi$ from the phase $\varphi_A$ of the wave emitted at point A. The phase factor of the wave has the form $\exp(i\varphi) = \exp[i(k \cdot r - \omega t)]$. Since the size of the formation zone is the path $L_f = vt_f$, it is clear (see fig. 7) that
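$$|\omega t_f - k L_f \cos\theta| = 2\pi, \qquad k = \frac{\omega n(\omega)}{c}. \tag{9.1}$$

It follows from eq. (9.1) that

$$t_f = \frac{2\pi}{\omega \, |1 - (vn(\omega)/c)\cos\theta|}, \qquad L_f = vt_f = \frac{2\pi v}{\omega \, |1 - (vn(\omega)/c)\cos\theta|}. \tag{9.2}$$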
It is clear from eqs. (2.4) and (9.1) that for Vavilov-Cherenkov radiation, $L_f = \infty$. The meaning of the formation time $t_f$ is especially obvious when we deal with forward radiation in vacuum. In this case, within the formation time $t_f$, the radiation is ahead of the particle by a wavelength $\lambda$. Indeed, in vacuum for $\theta = 0$, according to eq. (9.2), $t_f = \lambda/[c(1 - v/c)]$ and $(c - v)t_f = \lambda = 2\pi c/\omega$. In a medium, we should replace $c$ by $c_{ph} = c/n$ [because we deal with phase relations], and from the relationship $(c/n - v)t_f = \lambda = 2\pi c/(\omega n)$, we obtain eq. (9.2) for $\theta = 0$. In vacuum, for $\theta = 0$,
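the formation length $L_f = (\lambda v/c)/(1 - v/c)$, and for $v \to c$, we have

$$L_f \approx \frac{2\lambda}{1 - v^2/c^2} = 2\lambda \left(\frac{E}{mc^2}\right)^2.$$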
Note that the length of the formation zone may be chosen to be half of the one used above, because here we are concerned primarily with the definition, and the choice of $L_f$ does not affect the results obtained (e.g., the radiation intensity). It is important to appreciate that for sufficiently high energies (more precisely, when $v \approx c$), the dimensions of the formation zone $L_f$ and the time $t_f$ can increase greatly and exceed the wavelength $\lambda$ and the time $\lambda/c$, respectively. For instance, if the interference terms are neglected, then the intensity of transition radiation from two boundaries (e.g., boundaries of a plate of thickness $d$) may be considered as the sum of the radiation intensities from one boundary only, provided that $L_f \ll d$. If $L_f \gtrsim d$, and especially if $L_f \gg d$, the radiation from a plate differs in an essential way from the radiation from two independent boundaries. The above considerations indicate the nature of the transition radiation formed by means of a sequence of plates or other regularly inhomogeneous media. For a detailed discussion of such radiation, which we have called the resonance transition radiation or transition scattering, see Ter-Mikaelyan [1972] and Ginzburg and Tsytovich [1990].
Another type of transition radiation occurs in a homogeneous medium with time-dependent properties. The essence of the matter is explained most easily in terms of the parameter $vn/c$. As has already been emphasized, for transition radiation to occur (for $v = \mathrm{const.}$), the refractive index $n(\omega)$ must change along the charge trajectory or near to it. This change will also take place if the index $n$ changes in time; e.g., if $n$ increases or decreases sharply at certain instants. It is of interest to note that this kind of transition radiation can occur even for a charge which is at rest relative to the medium.
Indeed, if by applying a magnetic or an electric field, or by some other means, one rapidly changes the medium from an optically isotropic to an anisotropic state, the polarization of the medium that surrounds the fixed charge will lose its spherical symmetry. Such a change in the polarization evidently entails the emission of electromagnetic waves (for references, see Ginzburg and Tsytovich [1990]). Like Vavilov-Cherenkov radiation, transition radiation is also of a very general nature in the sense that it takes place with various kinds of waves.
As an example, we mention transition radiation of acoustic waves which arises when a moving dislocation crosses a grain boundary in a polycrystalline body*. Other problems connected with transition radiation are also of interest in acoustics (see § 12). Of theoretical interest, and potentially of real importance in application to pulsar magnetospheres, is the transition radiation that arises in vacuum in the presence of a strong magnetic field, which leads to nonlinear electrodynamic effects.
§ 10. Transition Scattering. Transition Bremsstrahlung
If transition radiation occurs when a charge moves in a medium with a periodically (e.g., sinusoidally) varying refractive index, it may be called not only transition radiation or resonance transition radiation, but also transition scattering (a terminology which we have already used). Indeed, a dielectric permittivity (refractive index) wave, which can be a standing or a travelling wave, is in this case “scattered” by a moving charge generating electromagnetic (transition) radiation. However, using the term “transition scattering” rather than “transition radiation” would be inappropriate if the effect did not also take place in the limiting case of a charge at rest. In this case, it would be somewhat unnatural to speak of transition radiation, whereas the term “transition scattering” would reflect the essence of the effect; an example of this would be a permittivity wave incident on a nonmoving (fixed) charge, which emits an electromagnetic wave. It is easy to understand this result without invoking the theory of transition radiation. To illustrate this, we will consider an isotropic medium characterized by a dielectric permittivity $\varepsilon$ which depends only on the density of the medium $\rho$. If a longitudinal acoustic wave propagates in this medium, the density $\rho = \rho^{(0)} + \rho^{(1)} \sin(\mathbf{k}_0 \cdot \mathbf{r} - \omega_0 t)$ and, by virtue of what has been said above,

$$\varepsilon = \varepsilon^{(0)} + \varepsilon^{(1)} \sin(\mathbf{k}_0 \cdot \mathbf{r} - \omega_0 t), \tag{10.1}$$
* More frequently, one considers the emission of sound waves occurring when a dislocation reaches a crystal boundary. The closest analogy in this case is not transition radiation but rather the Bremsstrahlung produced when a charge is stopped. If one is concerned with the radiation and not with the “fate” of the source, then transition radiation and Bremsstrahlung (considered with the influence of the boundary taken into account) are in many cases indistinguishable.
[Fig. 8. Schematic picture which characterizes the process of transition radiation formation of a nonmoving (fixed) charge. Labels in the figure: medium; permittivity wave (ε-wave) with its frequency and wavevector; polarization around the charge e; scattered electromagnetic wave with its frequency and wave vector.]
where $\varepsilon^{(1)}$ is the change in $\varepsilon$ caused by the change in $\rho$ (in the simplest case, $\varepsilon^{(1)} = \mathrm{const.} \times \rho^{(1)}$); we have chosen a definite mechanism for the change in $\varepsilon$ (in this case, as being due to a change in the density $\rho$) only to make our consideration concrete and obvious. For the purposes of the following discussion, it is only the presence of the permittivity wave (10.1) in a medium that is important. Suppose that a fixed or an infinitely heavy charge $e$ is placed in the medium. Around this charge there appear an induction and a field, $\mathbf{D}(\mathbf{r}, t) = \varepsilon \mathbf{E}(\mathbf{r}, t)$,

$$\mathbf{D}^{(0)} = \frac{e\mathbf{r}}{r^3}, \qquad \mathbf{E}^{(0)} = \frac{e\mathbf{r}}{\varepsilon^{(0)} r^3}, \tag{10.2}$$
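[with the superscript (0) denoting the “unperturbed” problem of the field of a charge without a permittivity wave]. In the presence of a permittivity wave, in a first approximation (corresponding to the assumption that $\varepsilon^{(1)} \ll \varepsilon^{(0)}$, made here for simplicity), the variable polarization

$$\mathbf{P}^{(1)} = \frac{\varepsilon^{(1)}}{4\pi} \mathbf{E}^{(0)} \sin(\mathbf{k}_0 \cdot \mathbf{r} - \omega_0 t) = \frac{\varepsilon^{(1)} e \mathbf{r}}{4\pi \varepsilon^{(0)} r^3} \sin(\mathbf{k}_0 \cdot \mathbf{r} - \omega_0 t) \tag{10.3}$$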
arises around the charge. Such variable polarization, which possesses no spherical symmetry for $k_0 \neq 0$, results in the appearance of an electromagnetic wave with a frequency $\omega_0$; this wave propagates away from the charge (see fig. 8). The wave number of this wave is $k = 2\pi/\lambda = (\omega_0/c)\sqrt{\varepsilon^{(0)}}$. If, as we have assumed, the permittivity wave is caused by the acoustic wave, then $k \ll k_0 = \omega_0/u$, where $u$ is the velocity of sound (assuming that $u \ll c/\sqrt{\varepsilon^{(0)}}$). The electromagnetic wave which arises in this example can be regarded as scattered in the same sense as for other kinds of scattering; e.g., the
scattering of an electromagnetic wave by an electron at rest (in this case, we mean a rest state only when the effect of an incident wave is disregarded). Transition scattering plays a prominent role in plasma physics (see § 11), and is a rather general phenomenon. It occurs, for example, in vacuum when an electromagnetic or a gravitational wave is incident on a region with a strong constant (static) or quasistationary electromagnetic field (in this case, an outgoing electromagnetic wave is produced; see also § 12). Related to the transition scattering process is transition Bremsstrahlung. It occurs in a medium if a charge $e$ moving uniformly in a straight line passes close to another charge $e'$ which is, for example, at rest. Transition Bremsstrahlung has characteristics similar to those of ordinary Bremsstrahlung, although for its occurrence, an acceleration (a change in the rectilinear trajectory or deceleration) is not necessary. The term “transition Bremsstrahlung” is also justified, because this radiation occurs in particle collisions, and radiation produced in collisions is precisely what is called Bremsstrahlung. Moreover, transition Bremsstrahlung of electrons is described by expressions which are very similar to the corresponding formulae for ordinary Bremsstrahlung. Furthermore, transition Bremsstrahlung interferes with ordinary Bremsstrahlung. However, in contrast to ordinary Bremsstrahlung, transition Bremsstrahlung does not disappear in the limit of infinitely heavy colliding particles. The general theory of Bremsstrahlung of particles in a medium must ultimately take into account transition Bremsstrahlung and its interference with ordinary Bremsstrahlung, just as the general theory of scattering must take into account transition scattering.
It is easy to understand the physical nature of transition Bremsstrahlung if we bear in mind that the field $E$ and the polarization $P = [(\varepsilon - 1)/4\pi]E$ of a uniformly moving charge and, in particular, of a charge at rest can be expanded into waves with a wavevector $k_0$ and a frequency $\omega_0 = (k_0 \cdot v)$, where $v$ is the charge velocity. Generally, these waves are also connected with permittivity waves with the same values of $k_0$ and $\omega_0$. Such permittivity waves “dragged along” by a single charge undergo transition scattering by another charge; electromagnetic radiation, in this case transition Bremsstrahlung, is produced in consequence. Transition Bremsstrahlung may be responsible for the generation of any “normal-mode waves” (excitons, photons, phonons, etc.) which can propagate in the medium considered. In other words, transition Bremsstrahlung is a phenomenon of a rather general nature, just as is transition scattering. In §§ 9-12, we have not cited much of the original literature, and have
referred frequently to the review by Ginzburg and Tsytovich [1990], as the latter is probably accessible to most English-speaking readers. We will mention as an historical note that transition radiation in a nonstationary medium was first considered by the author of this article in 1973. Transition scattering was first considered by Tsytovich and the author in the same year, and transition Bremsstrahlung by Tsytovich in 1973-1976.
§ 11. Transition Radiation, Transition Scattering and Transition Bremsstrahlung in a Plasma
Transition radiation, transition scattering and transition Bremsstrahlung generally play a particularly important role in connection with a plasma. This fact was not immediately appreciated for transition scattering, because in plasma studies one can proceed far by using a microscopic approach and without introducing, for instance, the concept of transition scattering. Nonetheless, the notion that processes corresponding to transition scattering occur in plasmas provided an insight into the situation and facilitated the development in this field (see Ginzburg and Tsytovich [1990]). It is clear from general considerations that in a rarefied (in the limit, a collisionless) plasma, the processes of transition scattering and transition Bremsstrahlung may turn out to be particularly important. Indeed, the above-mentioned transition processes (including transition radiation) occur without any particle acceleration, due to the inhomogeneity of the medium along the trajectory of rectilinearly and uniformly moving charges. Neglecting collisions, the particle (electron and ion) motion in a plasma is, to a first approximation, rectilinear and uniform (the action of mean macroscopic magnetic and electric fields is also neglected in this approximation). Furthermore, various instabilities readily appear in a collisionless plasma, and this leads to the appearance of a background of different kinds of waves. In broad terms, these waves are coupled with permittivity waves or, more precisely, they are, at the same time, permittivity waves which undergo transition scattering. Of particular importance is that such permittivity waves are already coupled with the most typical and most often observed high-frequency (Langmuir) wave in an isotropic plasma. Let us consider this example in more detail. For a collisionless isotropic plasma, if the role of ions is assumed to be negligibly small (see Ginzburg [1979, 1989]), then the longitudinal dielectric
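permittivity is known to have the form

$$\varepsilon_l(\omega, k) = 1 - \frac{\omega_p^2}{\omega^2} - 3\frac{k_B T \omega_p^2 k^2}{m\omega^4}, \qquad \omega_p^2 = \frac{4\pi e^2 N}{m}, \qquad \frac{k_B T}{m} k^2 \ll \omega^2. \tag{11.1}$$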
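The dispersion equation for longitudinal waves, $\varepsilon_l(\omega, k) = 0$, then leads to the following equations for a longitudinal (plasma) wave:

$$\mathbf{E} = \mathbf{E}_0 \cos(\mathbf{k}_0 \cdot \mathbf{r} - \omega_0 t), \qquad [\mathbf{E}_0 \times \mathbf{k}_0] = 0, \qquad \mathbf{E}_0 \cdot \mathbf{k}_0 = E_0 k_0, \qquad \omega_0^2 \approx \omega_p^2 + 3\frac{k_B T}{m} k_0^2. \tag{11.2}$$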
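Furthermore, because $\mathrm{div}\, \mathbf{E} = -E_0 k_0 \sin(\mathbf{k}_0 \cdot \mathbf{r} - \omega_0 t) = -4\pi e N^{(1)}$ and $N = N^{(0)} + N^{(1)}$ (the electron charge is $-e$), it is clear that the longitudinal wave is coupled to the permittivity wave and that the charge $-eN^{(0)}$ is compensated by the ion charge. The relevant equations are

$$\varepsilon = \varepsilon^{(0)} + \varepsilon^{(1)} \sin(\mathbf{k}_0 \cdot \mathbf{r} - \omega_0 t), \qquad \varepsilon^{(1)} = -\frac{4\pi e^2 N^{(1)}}{m\omega_0^2} = -\frac{e k_0 E_0}{m\omega_0^2}. \tag{11.3}$$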
Thus, plasma particles in a plasma wave are affected by the electric field $\mathbf{E}$ of the wave and, at the same time, a permittivity wave is incident on them. Under the influence of the field, electrons oscillate, and Thomson scattering occurs with the known cross-section

$$\sigma_T = \frac{8\pi}{3}\,r_e^2 = \frac{8\pi}{3}\left(\frac{e^2}{mc^2}\right)^2 = 6.65 \times 10^{-25}\ \text{cm}^2. \tag{11.4}$$
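The numerical value in eq. (11.4), and the strong suppression of Thomson scattering for heavy particles, can be checked with a short sketch. The function name and the comparison with a proton are illustrative assumptions; the constants are standard Gaussian-unit values.

```python
import math

E_CHARGE = 4.803e-10   # elementary charge, esu
M_E = 9.109e-28        # electron mass, g
M_P = 1.6726e-24       # proton mass, g
C = 2.998e10           # speed of light, cm/s


def thomson_cross_section(mass=M_E):
    """sigma_T = (8 pi / 3) (e^2 / m c^2)^2 in cm^2 for a particle of the given mass."""
    radius = E_CHARGE**2 / (mass * C**2)   # classical radius of the particle
    return 8.0 * math.pi / 3.0 * radius**2


if __name__ == "__main__":
    sigma_e = thomson_cross_section()
    sigma_p = thomson_cross_section(M_P)
    print(sigma_e)             # electron value, close to 6.65e-25 cm^2
    print(sigma_e / sigma_p)   # scales as (m_p / m_e)^2
```

The cross-section scales as $1/m^2$, so for a proton it is smaller by the square of the mass ratio, which is why Thomson scattering by ions is negligible.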
We are dealing here with a total cross-section for nonpolarized transverse radiation; it is assumed that the electron velocity $v \ll c/n(\omega)$. However, transition scattering occurs simultaneously, and interferes with Thomson scattering. For electrons, both effects are of the same order of magnitude. In this respect, a plasma is rather complicated, because both the spatial and frequency dispersions should be taken into account. The corresponding formulae are given by Ginzburg and Tsytovich [1990]. Here we will only remark that in the case of ions, the role of Thomson scattering is small because of the large ion mass. Under such circumstances, transition scattering dominates; its order of magnitude is generally comparable to that for electrons. The same is true for transition Bremsstrahlung: it is substantial for ions to about the same extent as for electrons; in contrast, ordinary Bremsstrahlung is, in practice, significant only for electrons. At boundaries between media, transition radiation in a plasma is usually
not of great interest, because plasma boundaries are "smeared out". This remark should be qualified, however, because the effect depends on the wavelength of the waves one is dealing with. For long enough waves, even if there are no walls, the plasma boundary may be sharp enough for a detectable transition effect to appear. Moreover, it is known that for high enough frequencies, $\omega^2 \gg \omega_i^2$ (here, $\omega_i$ are eigenfrequencies of the medium), the following plasma formula holds to a good approximation for all media:

$$\varepsilon(\omega) = 1 - \frac{\omega_{p,f}^2}{\omega^2}, \qquad \omega_{p,f}^2 = \frac{4\pi e^2 N_f}{m}, \tag{11.5}$$

where $N_f$ is the volume density of the electrons. In forward radiation, when a relativistic particle leaves the plate, the energy is concentrated mainly in the X-ray range. Therefore, to calculate the total energy $W$ for any medium, one can use eq. (11.5). The result we alluded to in § 6 (the original literature is cited and the calculation is given by Ginzburg and Tsytovich [1990]) is

(11.6)

and the maximum of radiation corresponds to a frequency

(11.7)
According to eq. (9.3) with $\lambda \sim \lambda_m$, the dimension of the formation zone in this case [$\lambda_m \sim 2\pi c/\omega_m \sim (2\pi c/\omega_{p,f})(mc^2/E)$] is equal to

(11.8)
Equation (11.8) takes into account that for an ordinary medium $\omega_{p,f} \sim 10^{16}$-$10^{17}$ s$^{-1}$, and $\lambda_{p,f} = 2\pi c/\omega_{p,f} \sim 10^{-5}$-$10^{-6}$ cm. For high-energy particles, e.g., for protons with $E \sim 10^{15}$ eV, $E/mc^2 \sim 10^6$ and $L_f \sim 10$ cm. Arbitrary as it may be, this example shows how large the formation zone may be. For the sake of completeness, we note that for the boundary between plasma and vacuum [i.e., in the case where eqs. (8.6) and (11.5) are used for $E \gg mc^2$], the total energy of backward radiation (in vacuum) is

(11.9)

and in the region of plasma transparency (i.e., for frequencies $\omega > \omega_p$) the energy is
(11.10)

In § 8, we have already analyzed the causes of such an increase of the energy $W$ with increasing $E/mc^2$, which is slower than that given by eq. (11.6). Even in cases where one boundary is crossed and eq. (11.6) applies, the probability of the appearance of a single transition quantum is, according to eq. (11.7), of the order of $W/\hbar\omega \sim e^2/\hbar c \sim 1/137$. This is the reason why transition counters have many dividing boundaries. The number of boundaries is in turn limited by the necessity to have layers (plates) with a thickness comparable to or exceeding the dimension of the formation zone $L_f$. Despite these limitations, transition counters have their advantages and are used in a number of applications (Fabian and Fisher [1980], Kleinknecht [1982], Ginzburg and Tsytovich [1990]). It seems probable that transition counters and other devices which make use of transition radiation and scattering of various types will find further application in experimental physics.
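The orders of magnitude quoted above can be reproduced with a small sketch. It assumes the estimate $L_f \sim \lambda_{p,f}\,E/mc^2$, which is an assumption inferred from the numbers given in the text ($\lambda_{p,f} \sim 10^{-5}$-$10^{-6}$ cm, $E/mc^2 \sim 10^6$, $L_f \sim 10$ cm) rather than a formula taken from it; the function names are likewise hypothetical.

```python
import math

C = 2.998e10            # speed of light, cm/s
E_CHARGE = 4.803e-10    # elementary charge, esu
HBAR = 1.0546e-27       # reduced Planck constant, erg s
PROTON_REST_EV = 938.272e6


def plasma_wavelength_cm(omega_p):
    """lambda_p = 2 pi c / omega_p, with omega_p in rad/s."""
    return 2.0 * math.pi * C / omega_p


def formation_zone_cm(omega_p, energy_ev, rest_energy_ev):
    """Assumed order-of-magnitude estimate L_f ~ lambda_p * E / (m c^2)."""
    return plasma_wavelength_cm(omega_p) * energy_ev / rest_energy_ev


def fine_structure_constant():
    """e^2 / (hbar c) in Gaussian units; sets the yield of quanta per boundary."""
    return E_CHARGE**2 / (HBAR * C)


if __name__ == "__main__":
    for omega_p in (1e16, 1e17):
        print(omega_p, formation_zone_cm(omega_p, 1e15, PROTON_REST_EV))
    print(1.0 / fine_structure_constant())
```

With roughly one quantum per 137 boundary crossings, a counter needs many boundaries, while each layer must remain at least as thick as the formation zone; the two requirements pull in opposite directions.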
§ 12. Concluding Remarks

As this article demonstrates, the motion of sources at a constant velocity is of great importance in the electrodynamics of continuous media, particularly in a plasma. New problems are frequently encountered in this field. The Vavilov-Cherenkov and the Doppler effects are the best known and most widely studied effects in this area. However, as is evident from the discussion in § 5, unsolved problems remain even in the theory of the Vavilov-Cherenkov effect. Transition radiation, and especially transition scattering, are not well understood by most physicists. This is apparently the reason why many problems concerning transition radiation and scattering have not been thoroughly explored even within the framework of electrodynamics. Problems from surface physics serve as examples, including: (1) transition radiation of different surface waves in different situations (e.g., a charge moving on an inhomogeneous surface consisting of two media, or near this surface), (2) transition scattering of Rayleigh waves and other surface waves on charges or defects on or near the surfaces, and (3) transition radiation and scattering in the case of a rough or regularly
inhomogeneous surface (e.g., a lattice). Even less attention has been given to the corresponding situations in acoustics, hydrodynamics, and the theory of elasticity. For instance, it is known that several kinds of sound waves can propagate in superfluid helium II. First and second sound waves propagate in the bulk material, third sound waves can propagate in films, and fourth sound waves can propagate in helium which occupies a finely porous medium. Several kinds of normal (eigen)waves, which in general are neither purely longitudinal nor purely transverse, can propagate in crystals. Different kinds of sound waves can, in principle, be scattered by various inhomogeneities and perturbations, and can be transformed into waves of other types. We should also mention here the transition radiation (formation) of electron-positron pairs, which occurs when a boundary between two media, or a boundary of an atomic nucleus, is crossed. In principle, transition radiation and transition scattering are possible for any kind of field (particle), with the emission of another kind of field. This is certainly true also of the Vavilov-Cherenkov effect. It is also noteworthy that there are many unresolved questions related to the passage of relativistic and ultrarelativistic particles through crystals. In this process, electron-positron pairs are produced, generating radiation in the X-ray and γ-ray ranges (see Akhiyezer and Shulga [1987], Baryshevskii and Tikhomirov [1989], Baier, Katkov and Strakhovenko [1989], and Caticha [1989, 1992]). To illustrate concretely the usefulness of understanding the physics of transition processes, such as transition scattering, we consider an example. In 1973, a paper appeared which, within the framework of general relativity, considered a charge located at the center of mass of a binary star (two identical, electrically neutral stars moving in a circle relative to their center of mass). Such a nonmoving charge emits electromagnetic waves.
At first glance, this result seems rather unexpected. However, the result is obvious if one knows what transition scattering is and if one bears in mind that in general relativity, a gravitational field affects the electromagnetic properties of vacuum; one can say that vacuum possesses an electric permittivity and a magnetic permeability which depend on the metric tensor $g_{ik}$. One may, therefore, assume that moving stars modulate permittivity and permeability; i.e., they generate permittivity and permeability waves. Consequently, transition scattering of these waves by a non-moving charge occurs, and a "scattered" electromagnetic wave appears. Understanding this fact makes it possible to consider in a particularly simple way problems such as the transformation (scattering) of a gravitational wave on a charge or, more realistically for astrophysics, on a magnetic dipole such as a pulsar (see Ginzburg and Tsytovich [1990]).
Evidently, the insight into the nature and properties of the radiation and scattering processes which take place for uniformly and rectilinearly moving sources (Vavilov-Cherenkov radiation, Doppler effect, transition radiation, transition scattering and related phenomena) is of great importance.
Note Added in Proof
I would like to draw attention to two new articles. Their content is to some extent clear from the titles: (1) Transition radiation of relativistic particles in a magnetized plasma with random inhomogeneities, by Fleishman [1992]; (2) Microwave transition radiation in solar flares and in astrophysics, by Fleishman and Kahler [1992].
References*

Agranovich, V.M., and V.L. Ginzburg, 1984, Crystal Optics with Spatial Dispersion, and Excitons (Springer, Heidelberg).
Akhiyezer, A.I., and N.F. Shulga, 1987, Usp. Fiz. Nauk 151, 385.
Baier, V.N., V.M. Katkov and V.M. Strakhovenko, 1989, Usp. Fiz. Nauk 159, 455.
Barsukov, K.A., 1959, Zh. Eksp. & Teor. Fiz. 37, 1106.
Baryshevskii, V.G., and V.V. Tikhomirov, 1989, Usp. Fiz. Nauk 159, 529.
Berestetskii, V.B., E.M. Lifshitz and L.P. Pitaevskii, 1982, Quantum Electrodynamics (Pergamon, London).
Bogdankevich, L.S., 1960, Sov. Phys. & Tech. Phys. 4, 992.
Bogdankevich, L.S., and B.M. Bolotovskii, 1957, Zh. Eksp. & Teor. Fiz. 32, 1421.
Bohr, N., 1913, Philos. Mag. 25, 10.
Bohr, N., 1915, Philos. Mag. 30, 581.
Bohr, N., 1948, Mat.-Fys. Medd. Dan. Vidensk. Selsk. 18, N8.
Bohr, A., 1948, Mat.-Fys. Medd. Dan. Vidensk. Selsk. 24, N19.
Caticha, A., 1989, Phys. Rev. A 40, 4322.
Caticha, A., 1992, Phys. Rev. B 45, 9541.
Dubovik, V.M., and L.A. Tosunyan, 1983, Fiz. Elem. Chastits & At. Yadra 14, 193.
Fabian, C.W., and H.G. Fisher, 1980, Rep. Progr. Phys. 43, 1003.
Fermi, E., 1940, Phys. Rev. 57, 485.
* Some of the papers mentioned in this list of references were published (or originally published) in Russian. In this connection we note that the journals Zh. Eksp. & Teor. Fiz., Usp. Fiz. Nauk, and some others are now translated into English in the USA (Sov. Phys.-JETP, Sov. Phys. Usp., etc.). For the sake of accuracy, we also mention that in co-authored articles, the authors' names are placed in alphabetic order according to the Russian alphabet. Thus, for example, the succession of names Ginzburg and Frank rather than Frank and Ginzburg is due to the fact that the letter Г (G) in the Russian alphabet precedes the letter Ф (F).
Fleishman, G.D., 1992, Transition radiation of relativistic particles in magnetized plasma with random inhomogeneities, Sov. Phys.-JETP 74, 261.
Fleishman, G.D., and S.W. Kahler, 1992, Microwave transition radiation in solar flares and in astrophysics, Astrophys. J. 394, 688.
Frank, I.M., 1942, Izv. Akad. Nauk SSSR, Ser. Fiz. 6, 3.
Frank, I.M., 1952, in: Vavilov Memorial Volume (Acad. Sci. USSR, Moscow) p. 173.
Frank, I.M., 1979, Usp. Fiz. Nauk 129, 685.
Frank, I.M., 1984, Usp. Fiz. Nauk 143, 111.
Frank, I.M., 1988, Vavilov-Cherenkov Radiation. Theoretical Aspects (Nauka, Moscow).
Frank, I.M., and I.E. Tamm, 1937, C.R. Dokl. Acad. Sci. USSR 14, 109.
Garibyan, G.M., 1960, Sov. Phys.-JETP 10, 372.
Ginzburg, V.L., 1939, C.R. Dokl. Akad. Sci. USSR 24, 131.
Ginzburg, V.L., 1940a, J. Phys. USSR 2, 441.
Ginzburg, V.L., 1940b, J. Phys. USSR 3, 95.
Ginzburg, V.L., 1952, in: Vavilov Memorial Volume (Acad. Sci. USSR, Moscow) p. 193.
Ginzburg, V.L., 1958, Sov. Phys.-JETP 7, 1096.
Ginzburg, V.L., 1979, Theoretical Physics and Astrophysics (Pergamon, London, New York).
Ginzburg, V.L., 1985a, Radiophys. & Quantum Electron. 27, 601.
Ginzburg, V.L., 1985b, Radiophys. & Quantum Electron. 28, 839.
Ginzburg, V.L., 1986, in: The Lessons of Quantum Theory, eds J. de Boer, E. Dal and O. Ulfbeck (Elsevier, Amsterdam) p. 113. Extended version: 1988, Proc. Lebedev Physical Inst., Vol. 176 (Nova, Commack).
Ginzburg, V.L., 1989, Application of Electrodynamics in Theoretical Physics and Astrophysics (Gordon and Breach, New York, London).
Ginzburg, V.L., and V.Ya. Eidman, 1959, Sov. Phys.-JETP 8, 1055.
Ginzburg, V.L., and I.M. Frank, 1946, Zh. Eksp. & Teor. Fiz. 16, 15; Short version of this paper: I.M. Frank and V.L. Ginzburg, 1945, J. Phys. USSR 9, 353.
Ginzburg, V.L., and I.M. Frank, 1947a, Dokl. Akad. Nauk SSSR 56, 583.
Ginzburg, V.L., and I.M. Frank, 1947b, Dokl. Akad. Nauk SSSR 56, 699.
Ginzburg, V.L., and V.P. Frolov, 1987, Sov. Phys. Usp. 30, 1073 [See also: 1986, JETP Lett. 43, 265; Phys. Lett. A 116, 423].
Ginzburg, V.L., and V.N. Tsytovich, 1985, Sov. Phys.-JETP 61, 48.
Ginzburg, V.L., and V.N. Tsytovich, 1990, Transition Radiation and Transition Scattering (Adam Hilger, Bristol). See also: 1979, Phys. Rep. 49, N1.
Ginzburg, V.L., A.A. Gorbatsevich, Yu.V. Kopaev and B.A. Volkov, 1984, Solid State Commun. 50, 339.
Hawking, S.W., 1974, Nature 248, 30.
Heaviside, O., 1888, Electrician (November 23) p. 83. See also: 1889, Philos. Mag. 27, 124; 1912, Electromagnetic Theory, Vol. 3 (The Electrician Publ. Co., London).
Kelvin, Lord, 1901, Philos. Mag. 2, 1.
Kirzhnits, D.A., 1987, Usp. Fiz. Nauk 152, 399.
Kirzhnits, D.A., and V.V. Losyakov, 1985, JETP Lett. 42, 226.
Kleinknecht, K., 1982, Phys. Rep. 84(2), 87.
Landau, L.D., 1946, J. Phys. USSR 10, 25.
Landau, L.D., and E.M. Lifshitz, 1984, Electrodynamics of Continuous Media (Pergamon, London).
Sommerfeld, A., 1904, Göttinger Nachrichten S. 99, S. 363; 1905, S. 201.
Ter-Mikaelyan, M.L., 1972, High-energy Electromagnetic Processes in Condensed Media (Wiley, New York).
Tsytovich, V.N., 1986, Radiophys. & Quantum Electron. 29, 447.
Unruh, W.G., 1976, Phys. Rev. D 14, 870.
Unruh, W.G., and R.M. Wald, 1984, Phys. Rev. D 29, 1047.
E. WOLF, PROGRESS IN OPTICS XXXII © 1993 ELSEVIER SCIENCE PUBLISHERS B.V.
ALL RIGHTS RESERVED
VI
NONLINEAR OPTICAL PROCESSES IN ATOMS AND IN WEAKLY RELATIVISTIC PLASMAS BY
G. MAINFRAY and C. MANUS
Service des Photons, Atomes et Molécules, Bâtiment 522, Centre d'Etudes de Saclay, 91191 Gif-sur-Yvette cedex, France
CONTENTS

§ 1. INTRODUCTION
§ 2. LASER LIGHT POLARIZATION EFFECTS
§ 3. RESONANCE EFFECTS
§ 4. LASER TEMPORAL-COHERENCE EFFECTS IN NONRESONANT MULTIPHOTON IONIZATION OF ATOMS
§ 5. LASER TEMPORAL-COHERENCE EFFECTS IN RESONANT MULTIPHOTON IONIZATION OF ATOMS
§ 6. RELATIVISTIC SELF-FOCUSING OF A LASER PULSE IN A PLASMA
REFERENCES
§ 1. Introduction

The nonlinear interaction of an intense laser pulse with a neutral or ionized gaseous medium gives rise to a variety of interesting effects. The present article consists of two parts. The first emphasizes physical effects which are peculiar to multiphoton ionization of atoms, i.e., to the nonlinear interaction of an intense laser pulse with neutral atoms at low atomic density. The second part is devoted to the nonlinear interaction of an ultra-intense laser pulse with electrons, and to new physical effects which are expected to occur when a picosecond terawatt laser pulse is focused into a plasma. Multiphoton ionization of atoms is a typical example of one of the fields of investigation in atomic physics that lasers have opened up. Multiphoton ionization, which results from the simultaneous absorption of several photons, constitutes a further generalization of single-photon ionization. The N-photon ionization rate of an atom varies as $\sigma_N I^N$, where $\sigma_N$ is a generalized N-photon ionization cross section with units of cm$^{2N}$ s$^{N-1}$ when the laser intensity $I$ is expressed in photons cm$^{-2}$ s$^{-1}$. Although $\sigma_N$ decreases when the nonlinear order N increases, an N-photon ionization process can be observed at any order N if a high enough laser intensity is used. Typically, using a 50 ps pulse from Nd-glass laser radiation at 1060 nm, the four-photon ionization of Cs requires a laser intensity of about $10^{9}$ W/cm$^2$, the 11-photon ionization of Xe about $10^{13}$ W/cm$^2$, and the 22-photon ionization of He an intensity in the $10^{14}$-$10^{15}$ W/cm$^2$ range (L'Huillier, Lompré, Mainfray and Manus [1983]). Clearly, these intensities can be created only by focusing high-power pulsed lasers. The multiphoton ionization of an atom reflects both the characteristics of the laser pulse (polarization, coherence, frequency, intensity) and the properties of the atom perturbed by the intense laser field.
Multiphoton ionization thus constitutes a very favorable method for studying the nonlinear response of an atom in the presence of an intense laser field. Multiphoton ionization of atoms is a very active field of research, as has been emphasized by an extensive series of reviews; see, e.g., Bakos [1974], Lambropoulos [1976], Morellec, Normand and Petite [1982], Mainfray [1980], Chu [1985], Rhodes [1985], Smith and Leuchs [1987], Lambropoulos [1987], Delone and Fedorov [1989], Bruzzese, Sasso and Solimeno [1989], and Mainfray and Manus [1991]. Multiphoton ionization of atoms has also been examined in a number of books (Mittleman [1982], Chin and Lambropoulos [1984], Delone and Krainov [1984], Lin [1984], Faisal [1987], Bandrauk [1988], Gavrila [1992]), conference proceedings (Eberly and Lambropoulos [1978], Lambropoulos and Smith [1984], Smith and Knight [1988], Mainfray and Agostini [1991]), and special issues of journals (Cooke and McIlrath [1987], Burnett and Hutchinson [1989], Kulander and L'Huillier [1990]).

The first part of the present article emphasizes laser radiation properties which induce physical effects peculiar to multiphoton processes; these include laser polarization effects, laser temporal coherence effects, and laser frequency effects which induce resonances in the ionization rate of an atom. The second part of the article is devoted to the nonlinear interaction of an ultra-intense laser pulse with electrons, which induces new optical effects such as relativistic self-focusing of the laser pulse. These effects are expected to be observed in the near future by using terawatt laser pulses produced by recent short-pulse laser technology.
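The photon numbers quoted in the Introduction (4 for Cs, 11 for Xe, 22 for He at 1060 nm) follow from comparing the ionization potential with the photon energy. The sketch below is an illustration only: the ionization potentials are standard tabulated values supplied here as an assumption, not taken from the text, and the function names are hypothetical.

```python
import math

HC_EV_NM = 1239.84   # h*c in eV nm

# First ionization potentials in eV (standard tabulated values, assumed here)
IONIZATION_POTENTIAL_EV = {"Cs": 3.894, "Xe": 12.130, "He": 24.587}


def photon_energy_ev(wavelength_nm):
    """Photon energy E = h c / lambda in eV."""
    return HC_EV_NM / wavelength_nm


def min_photon_number(ip_ev, wavelength_nm):
    """Smallest N such that N photons carry more energy than the ionization potential."""
    return math.floor(ip_ev / photon_energy_ev(wavelength_nm)) + 1


def rate_scaling(order, intensity_factor):
    """Factor by which a rate proportional to sigma_N * I^N changes when I is scaled."""
    return intensity_factor ** order


if __name__ == "__main__":
    for atom, ip in IONIZATION_POTENTIAL_EV.items():
        print(atom, min_photon_number(ip, 1060.0))
    print(rate_scaling(11, 2.0))   # doubling I in an 11-photon process
```

The steep $I^N$ scaling is the reason the measured ion yield is so sensitive to the instantaneous laser intensity.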
§ 2. Laser Light Polarization Effects

Although the rate of single-photon ionization is independent of the polarization of the incident light, laser light polarization effects are peculiar to multiphoton processes. In an N-photon ionization of unpolarized atoms, the Nth-order transition proceeds via intermediate states. During the first transitions, the polarization of photons is partially transferred to the atoms. The basic polarization phenomena can be explained by selection rules governing multiphoton transitions. These rules are a generalization of the selection rules for a single-photon transition. The selection rule for the orbital quantum number $l$ is $\Delta l = \pm 1$. The selection rule for the magnetic quantum number $m$ is $\Delta m = 0$ for linear polarization, and $\Delta m = +1$ and $\Delta m = -1$ for right and left circular polarization, respectively. For the dipole approximation, these rules are illustrated in fig. 1 for a one-electron atom with an initial S state. Clearly, only one channel exists for circular polarization, leading to only one final state, whereas many channels are available for linear polarization, leading to $\tfrac{1}{2}N + 1$ allowed final states if N is even, and $\tfrac{1}{2}(N + 1)$ if N is odd. As a result of these selection rules, the total yield of multiphoton ionization depends on the state of polarization of the incident laser radiation, and a strong dominance of linear over circular
Fig. 1. Schematic representation of the channels of a five-photon transition with linearly (solid line) and circularly (dashed line) polarized laser light.
polarization is expected for large N values (Reiss [1972], Gontier and Trahin [1973], Lambropoulos [1976]). This point has been confirmed by various experiments. For example, the ratio of the five-photon ionization rates of Na atoms by a Nd-glass laser pulse with linear and circular polarization is 2 ± 0.4 (Delone, Manakov, Preobrazhenskii and Rapoport [1976]). The ratio between the 11-photon ionization rates of Xe for linearly and circularly polarized laser light at 1065 nm is measured to be 38 ± 6, and the corresponding ratio is 70 ± 8 for the 13-photon ionization of Kr (Lompré, Mainfray, Manus and Thébault [1977]). Two- and three-photon ionization processes are generally an exception to the above-mentioned rule. For example, the ionization rates for circularly polarized light in the three-photon ionization of Cs and K were found to be larger, by factors of 2.15 ± 0.4 and 2.34 ± 0.2, respectively, than for linearly polarized light (Fox, Kogan and Robinson [1971], Cervenan and Isenor [1974]). At first, it may seem surprising that circular polarization, with fewer available channels, gives a higher rate. The reason is that, in addition to the number of channels, the strength of the matrix elements of the N-photon transitions is also important, especially in the proximity of a resonance, as shown in § 3. The reversal from small circular-polarization dominance to large linear-polarization dominance occurs at N = 4 or 5. These conclusions have been borne out in a number of theoretical papers.
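The counting of final states implied by the dipole selection rules can be checked by direct enumeration. The following sketch starts from an S state ($l = 0$, $m = 0$) and applies $\Delta l = \pm 1$ together with the $\Delta m$ rule for the chosen polarization at each photon absorption; the function name and the polarization labels are illustrative assumptions.

```python
def final_states(n_photons, polarization):
    """Set of (l, m) states reachable from an S state after n_photons dipole steps."""
    delta_m = {"linear": 0, "right": +1, "left": -1}[polarization]
    states = {(0, 0)}
    for _ in range(n_photons):
        reachable = set()
        for l, m in states:
            for l_new in (l - 1, l + 1):          # Delta l = +/- 1
                m_new = m + delta_m               # Delta m fixed by polarization
                if l_new >= 0 and abs(m_new) <= l_new:
                    reachable.add((l_new, m_new))
        states = reachable
    return states


if __name__ == "__main__":
    for n in (4, 5, 11):
        print(n, len(final_states(n, "linear")), len(final_states(n, "right")))
```

For circular polarization every step must raise both $l$ and $|m|$, so a single ladder of states survives; for linear polarization $m$ stays zero and $l$ can move up or down, which gives the larger number of final states.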
§ 3. Resonance Effects

In the N-photon ionization of an atom, the penultimate photon absorbed generally falls in the dense part of the atomic spectrum. Resonant enhancement in the multiphoton ionization rate occurs when the energy of (N - 1) photons is equal to the energy of an atomic state shifted by the intense laser field. Figure 2 shows schematically the four-photon ionization of Cs atoms by a Nd-glass laser pulse through the resonant three-photon excitation of the 6F state. The resonant state is coupled much more strongly to the continuum than to the ground state. This means that the photoionization rate from the resonant state is much greater than the decay rate to the ground state due to stimulated emission. The resonance profile can be modeled by an effective two-level atom with the assumption of a rate-limiting three-photon excitation step followed by rapid photoionization from the excited state. The ionization rate W is given by (Eberly [1979], Zoller [1979], and Gontier and Trahin [1979])

$$W = \frac{2\,\Omega^2\,\Gamma}{(\Delta E - \alpha I)^2 + \Gamma^2},$$

where $\Omega$ is the three-photon Rabi frequency between the ground state and the excited state; $\Omega$ is proportional to $I^{3/2}$. $\Delta E$ is the static resonance detuning given by $\Delta E = 3\omega - \omega_0$, where $\omega$ is the laser frequency and $\omega_0$ the unshifted resonance frequency. $\alpha I$ is the AC Stark shift of the resonant level, with the assumption of a resonance shift proportional to the laser intensity. The width of the resonant state is $\Gamma = \sigma I$, where $\sigma$ is the photoionization cross
Fig. 2. Schematic representation of the four-photon ionization of the Cs atom with a three-photon excitation of the 6F level. ΔE is the resonance detuning.
section from the resonant state. Figure 3 shows the enhancement in the number of ions at the resonance frequency. This result was obtained using a bandwidth-limited, 15 ps laser pulse (Lompré, Mainfray, Manus and Thébault [1978]). Figure 3 shows clearly a shift to shorter wavelengths in the resonance profiles when the laser intensity is increased. As is well known, the exchange of photons between the laser field and the atoms shifts and broadens atomic levels. These effects have been well described within the framework of the dressed atom theory (Cohen-Tannoudji [1967]). The resonance shift $\alpha I$ is linear with respect to the laser intensity, with $\alpha = 2$ cm$^{-1}$/(GW cm$^{-2}$), in excellent agreement with calculations in Cs (Crance [1978], Gontier and Trahin [1980]). As the resonance shift is a linear function of the laser intensity, this means that a photon is absorbed and re-emitted in addition to the four-photon absorption leading to the ionization of the atom. The importance of atomic level shifts induced by the laser field is also
Fig. 3. Resonance profiles of the three-photon resonant four-photon ionization of Cs atoms induced by a 15 ps laser pulse. (Plot: number of ions, on a logarithmic scale, versus laser frequency in cm$^{-1}$, for several laser intensities.)
emphasized through the law which describes the variation of the number $N_i$ of ions as a function of the laser intensity in the neighbourhood of the resonance. As the laser intensity increases, the AC Stark shift can move an excited state into or out of a resonance, depending on the resonance detuning. This effect causes a nonmonotonic dependence of the ionization rate upon the laser intensity. Figure 4 shows the variation of $K = \partial(\log N_i)/\partial(\log I)$ as a function of the static resonance detuning $\Delta E = E_{6F} - E_{6S} - 3E_{ph}$, where $E_{ph}$ is the photon energy. This result was obtained using a single-mode Nd-glass laser pulse which can be tuned from 1056 to 1060 nm (Morellec, Normand and Petite [1976]). For $\Delta E$ values larger than 10 cm$^{-1}$, atomic level shifts are insignificant compared to $\Delta E$, and the $I^4$ law again becomes valid and characterizes the off-resonance four-photon ionization process. A very good agreement between the theoretical and experimental results shown in fig. 4 is obtained when all the experimental parameters are brought into the calculation (Gontier and Trahin [1980]). The damping term in the resonant multiphoton ionization rate comes from the one-photon coupling of the resonant state to the continuum, i.e., $\Gamma = \sigma I$. The amplitude of ion resonance profiles is very large when $\sigma$ and $I$ are small, while resonance profiles can be completely damped and obliterated when $\sigma$ and $I$ are large, especially in multiphoton ionization of rare gases at $10^{13}$-$10^{14}$ W/cm$^2$. However, the photoionization cross section from the excited states of the noble gas atoms decreases approximately as $E_{ph}^{-3}$ as the laser photon energy $E_{ph}$ is increased (Chang and Kim [1982]). As an example,
Fig. 4. The variation of the order of nonlinearity $K = \partial(\log N_i)/\partial(\log I)$ as a function of resonance detuning in the four-photon ionization of Cs with a three-photon excitation of the 6F level. The laser pulse duration is 37 ns. (Plot: K versus resonance detuning in cm$^{-1}$.)
at about $10^{13}$ W/cm$^2$, resonance effects in multiphoton ionization of Kr are clearly observed around 290 nm (Landen, Perry and Campbell [1987]), while resonance effects are so damped at 1060 nm that the resonant enhancement is no longer significant (Lompré, Mainfray and Manus [1980]). The reason is that the photoionization cross section of a given resonant state is approximately 50 times smaller at 290 nm than at 1060 nm, and the damping term is then 50 times larger at 1060 nm than at 290 nm.
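The behaviour described in this section can be explored with a toy model. The sketch below assumes a Lorentzian rate $W = 2\Omega^2\Gamma/[(\Delta E - \alpha I)^2 + \Gamma^2]$ with $\Omega \propto I^{3/2}$ and $\Gamma = \sigma I$; the prefactor, the default coefficients `rabi_coeff` and `sigma`, and the units (detuning in cm$^{-1}$, intensity in GW/cm$^2$) are assumptions for illustration, with only $\alpha = 2$ cm$^{-1}$ per GW/cm$^2$ taken from the text.

```python
import math


def ionization_rate(detuning, intensity, rabi_coeff=1.0, alpha=2.0, sigma=0.1):
    """Toy resonant rate (arbitrary units) with an AC Stark shift alpha * intensity."""
    omega = rabi_coeff * intensity**1.5     # three-photon Rabi frequency ~ I^(3/2)
    gamma = sigma * intensity               # ionization width Gamma = sigma * I
    return 2.0 * omega**2 * gamma / ((detuning - alpha * intensity) ** 2 + gamma**2)


def effective_order(detuning, intensity, step=1e-4):
    """K = d(log W) / d(log I), estimated by a central finite difference."""
    upper = ionization_rate(detuning, intensity * (1.0 + step))
    lower = ionization_rate(detuning, intensity * (1.0 - step))
    return (math.log(upper) - math.log(lower)) / (math.log1p(step) - math.log1p(-step))


def cross_section_ratio(wavelength_long_nm, wavelength_short_nm, exponent=3.0):
    """Ratio sigma(long) / sigma(short) if sigma scales as E_ph^(-exponent)."""
    return (wavelength_long_nm / wavelength_short_nm) ** exponent


if __name__ == "__main__":
    grid = [0.01 * i for i in range(-500, 1001)]
    peak = max(grid, key=lambda d: ionization_rate(d, 1.0))
    print(peak)                               # resonance displaced by alpha * I
    print(effective_order(100.0, 0.01))       # far from resonance: close to 4
    print(effective_order(2.5, 1.0))          # near resonance: departs from 4
    print(cross_section_ratio(1060.0, 290.0))
```

Far from resonance the model recovers the $I^4$ law, while near resonance the Stark shift moves the level relative to the laser frequency and the effective order deviates strongly from 4, as in fig. 4. The cubic scaling of the cross section with wavelength reproduces the factor of about 50 between 1060 nm and 290 nm.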
§ 4. Laser Temporal-Coherence Effects in Nonresonant Multiphoton Ionization of Atoms

4.1. GENERAL DISCUSSION
Multiphoton ionization of atoms is an inherently nonlinear process which depends not simply on the laser intensity but also on the coherence properties of the laser light. Physically, laser coherence effects occur because of correlations in the photon arrival times at the absorbing atom. The radiation generated by a Q-switched laser has a spectrum which consists of a series of evenly spaced frequencies that are the longitudinal modes of the resonant cavity. The frequency spacing is $c/2L$, where $c$ is the velocity of light and $L$ is the optical length of the cavity. The mode spacing is 200 MHz for $L = 75$ cm. The beating between these close frequencies produces a strong modulation in the temporal distribution of the laser intensity. Here we assume the laser to be spatially coherent, as it generates the TEM$_{00}$ mode with a well-defined Gaussian profile of the radial distribution of the intensity. As a result, only laser temporal coherence effects are considered. By putting dispersive elements such as Fabry-Pérot etalons in the oscillator cavity, it is possible to narrow the bandwidth of the laser spectrum and to select one of the longitudinal modes in order to have a single-mode operation of the laser. Figure 5 shows the temporal distribution of a single-mode and a seven-mode laser pulse recorded by a photodiode and an oscilloscope whose combined rise time is 350 ps (Lecompte, Mainfray, Manus and Sanchez [1974]). In multimode operation, modes are oscillating simultaneously but independently of each other. The laser light then fluctuates like a Gaussian source. The fluctuation of the intensity is periodic, with a period determined by the round-trip time of the light in the laser cavity. The same fluctuation pattern is reproduced periodically during a given laser
Fig. 5. The temporal distribution of a Nd-glass laser pulse when the laser operates in a single mode (a) or in seven modes (b).
pulse, as shown in fig. 5b. This stochastic pattern depends on the number of modes and on both the relative phases and intensities of the modes, and varies from shot to shot. The coherence time $\tau_c$ is defined as the characteristic time of the most rapid fluctuation of the laser intensity; $\tau_c$ is given by the Fourier transform of the overall laser bandwidth. Typically, it varies from a few tens of nanoseconds for a single-mode laser pulse as in fig. 5a, to a few picoseconds for a 100-mode pulse. It should be pointed out that the number of ions produced in the N-photon ionization of an atom varies as a function of the instantaneous laser intensity $I$ as $I^N$. The instantaneous laser intensity can no longer be measured when the number of modes becomes larger than about ten, because the coherence time becomes shorter than the temporal resolution of a fast photodiode associated with a large-bandwidth oscilloscope. It is then convenient to consider the laser intensity as a statistical quantity and to deal with average values. The number of ions is then proportional to $\int I^N P(I)\,dI$, where $P(I)\,dI$ is the probability that at any given time $t$ during the laser pulse, the intensity lies between $I$ and $I + dI$. $P(I)$ can be predicted fairly reliably. For
example, in the approximation of an infinite number of independent modes, the statistical properties of the light are expected to be close to those of thermal light. In this case,

$$P(I) = \frac{1}{\bar{I}} \exp\left(-\frac{I}{\bar{I}}\right), \tag{4.1}$$

where $\bar{I}$ is the average value of the laser intensity over the pulse duration, and $\langle I^N \rangle = N!\,\bar{I}^N$, where the brackets denote the ensemble average. It should be noted that time averages are mainly used in stationary fields, while ensemble averages are naturally used for nonstationary fields. On the other hand, for a nonfluctuating coherent light,

$$P_{coh}(I) = \delta(I - \bar{I}), \tag{4.2}$$

where $\delta$ denotes Dirac's delta-function, and $\langle I^N \rangle = \bar{I}^N$. This shows that the nonresonant N-photon ionization rate of an atom induced by an incoherent laser pulse which has a very large number of independent modes is expected to be $N!$ times larger than that induced by a coherent single-mode laser pulse with the same average intensity $\bar{I}$. This also means that the average intensity required to produce a given ionization rate is reduced by a factor $(N!)^{1/N}$ when using an incoherent laser pulse. Ducuing and Bloembergen [1964] wrote the first theoretical paper on statistical fluctuations in nonlinear processes. Subsequently, articles written by Teich and Wolga [1966], Lambropoulos, Kikuchi and Osborn [1966], Shen [1967], Mollow [1968] and Lambropoulos [1968] were devoted to photon correlation enhancement of second harmonic generation and two-photon absorption processes. This work was then extended to N-photon ionization of atoms (Bebb and Gold [1966], Meadors [1966], Agarwal [1970], Tomov and Chirkin [1971], Debethune [1972], Sanchez [1975], Gersten and Mittleman [1976]), followed by a review paper on the statistical properties of broad-band laser radiation (Masalov [1985]). The instantaneous laser intensity seen by atoms can be expressed in the form
$$I(t) = \left| \sum_{m=1}^{M} \alpha_m(t) \exp(i\omega_m t) \right|^2, \tag{4.3}$$

where $M$ is the number of modes, $\alpha_m(t)$ is the time-dependent complex amplitude of the mth mode, $\omega_m$ is its mean angular frequency,

$$\omega_m = \omega_0 + m\Omega, \tag{4.4}$$

and

$$\alpha_m(t) = C_m(t)\, e^{i\theta_m(t)}\, G(t),$$

where $G(t)$ is the slow temporal envelope of the modes. Three different situations can be considered: (i) The amplitudes $C_m$ and phases $\theta_m = 0$ of the modes are time independent. This case corresponds to a mode-locked laser. (ii) The amplitudes $C_m$ are time independent, but the phases $\theta_m(t)$ are random variables distributed uniformly between 0 and $2\pi$. (iii) The amplitudes $C_m(t)$ and phases $\theta_m(t)$ are both random variables, with $C_m(t)$ obeying a Gaussian distribution. According to the standard definition of correlation, one can define the following moment of Nth order,

$$f_N = \frac{\dfrac{1}{T}\displaystyle\int_0^T I^N(t)\,dt}{\left[\dfrac{1}{T}\displaystyle\int_0^T I(t)\,dt\right]^N},$$

where $T$ is the pulse duration. It has been demonstrated that such an expression is nothing but the enhancement of the multiphoton absorption process of the Nth order due to the multimode operation of the laser, compared to the multiphoton absorption of a single-mode laser operating at the same average intensity. Other moments have been considered, such as
where $g_N$ is the Nth-order normalized function at the top value of the mode intensities, and

$$b_N = \frac{\langle J^N \rangle}{\langle J \rangle^N}.$$

In most cases, $g_N = f_N b_N$, where only the moment $f_N$ is connected to the phase correlation properties of the field.
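As a rough numerical check of the statistical argument of this section, the following Python sketch integrates $I^N P(I)$ for an exponential (thermal-light) intensity distribution and compares the result with $N!\,\bar{I}^N$; it also evaluates the factor $(N!)^{1/N}$ by which the required average intensity drops. The function names, the midpoint-rule integrator and the cut-off at 80 times the mean intensity are illustrative assumptions, not part of the original analysis.

```python
import math

def thermal_moment(n, mean_intensity=1.0, upper=80.0, steps=200_000):
    """Midpoint-rule estimate of <I^n> for P(I) = exp(-I/mean)/mean.

    The integral is truncated at `upper` mean intensities (an assumed
    cut-off; the neglected tail is negligible for n <= 11).
    """
    h = upper * mean_intensity / steps
    total = 0.0
    for k in range(steps):
        i = (k + 0.5) * h
        total += i**n * math.exp(-i / mean_intensity) / mean_intensity
    return total * h

def enhancement(n):
    """Ratio <I^n>/<I>^n for thermal light; should be close to n!."""
    return thermal_moment(n) / thermal_moment(1) ** n

def intensity_reduction(n):
    """Factor (n!)^(1/n) by which the average intensity may be lowered."""
    return math.factorial(n) ** (1.0 / n)

if __name__ == "__main__":
    for n in (2, 5, 11):
        print(n, enhancement(n), math.factorial(n), intensity_reduction(n))
```

For an 11-photon process the intensity reduction factor evaluated this way is close to 4.9, even though the rate enhancement itself is of the order of $4 \times 10^7$.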
4.2. COMPARISON WITH EXPERIMENTS
The $N!$ enhancement in the N-photon ionization rate of an atom induced by an incoherent laser pulse was confirmed by Arslanbekov [1976] in the five-photon ionization of Na atoms, and by Lompré, Mainfray, Manus and Marinier [1981] in the four-photon ionization of Cs atoms. Figure 6 shows the variation of the number of Na ions as a function of the average value of the laser intensity produced by a single-mode and a multimode Q-switched Nd-glass laser pulse at 1059 nm (Arslanbekov [1976]). The enhancement factor due to the mode structure is $5! = 120$. Laser temporal-coherence effects have been demonstrated much more dramatically in the example of the 11-photon ionization of Xe atoms, with $11! = 4 \times 10^7$. The number of longitudinal modes of a Q-switched Nd-glass laser at 1043 nm was changed in a controlled manner from 1 to 2, 7, 10, ..., up to 100 modes (Sanchez and Lecompte [1974], Lecompte, Mainfray, Manus and Sanchez [1975]). Figure 7 shows a log-log plot of the variation of the number of ions $N_i$ induced in the 11-photon ionization of Xe atoms as a function of the average laser intensity $\bar{I}$ when the laser operates successively in one mode, two modes with a visibility of 0.6, 7, 10, 30 and 100 modes. The slope $\partial \log N_i / \partial \log \bar{I} = 11 \pm 1$, characteristic of a nonresonant 11-photon ionization, remains constant when the number of modes is changed. Furthermore, experimental points induced by a single-mode laser pulse are perfectly lined up on a straight line with a slope of 11, while experimental points obtained with a multimode laser pulse show scattering. This scattering is due to the variation in both phases and relative intensities of modes from one laser shot to another.

Fig. 6. The variation of the number of Na ions in five-photon ionization as a function of the average laser intensity $\bar{I}$ in arbitrary units of a single-mode (a) or multimode (b) laser pulse.

Fig. 7. The 11-photon ionization of Xe. The variation of the number of Xe$^+$ ions is plotted as a function of the average laser intensity $\bar{I}$ in arbitrary units, when the laser operates in 1, 2, 7, 10, 30 and 100 modes.

4.2.1. The two-mode case

When the laser operates in two adjacent modes, the modulation is purely sinusoidal, as shown in fig. 8a and b (Sanchez and Lecompte [1974]). The period $T$ of the modulation corresponds to a round-trip time of the light in the oscillator cavity. The beating between the two laser frequencies coming from the same source produces a fringe pattern between a maximum and a minimum value of the laser intensity. The modulation depth, at the highest part of the laser pulse, varies from shot to shot depending on the relative intensities of the two modes. The modulation depth is 20% when the ratio between the intensities of the two modes is about $10^{-2}$. A two-mode pulse is a special case for which the phase of the two modes does not play any role. The visibility of the fringe pattern is defined as

$$V = \frac{I_{\max} - I_{\min}}{I_{\max} + I_{\min}}. \tag{4.9}$$

Fig. 8. Temporal modulation of a two-mode laser pulse for two values, (a) and (b), of the ratio between the intensities of the two modes; the ratio is 0.2 in (b).

The instantaneous laser intensity can be recorded accurately for a two-mode case, and is given by

$$I(t) = 1 + V\cos\left(\frac{2\pi t}{T}\right). \tag{4.10}$$

The Nth order moment for a two-mode pulse is given by

$$f_N = \frac{1}{T}\int_0^T \left[1 + V\cos\left(\frac{2\pi t}{T}\right)\right]^N dt. \tag{4.11}$$
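Equation (4.11) is easy to evaluate numerically. The sketch below averages $[1 + V\cos(2\pi t/T)]^N$ over one modulation period and cross-checks the result against the closed-form binomial sum that follows from $\langle \cos^{2k} \rangle = \binom{2k}{k}/4^k$. The helper names are hypothetical, and the relation $V = 2\sqrt{r}/(1+r)$ between the visibility and the intensity ratio $r$ of the two modes is the standard two-beam interference result, added here as an assumption rather than taken from the text.

```python
import math

def two_mode_moment(n, visibility, samples=20_000):
    """Time average of [1 + V cos(2*pi*t/T)]**n over one period T."""
    total = 0.0
    for k in range(samples):
        phase = 2.0 * math.pi * (k + 0.5) / samples
        total += (1.0 + visibility * math.cos(phase)) ** n
    return total / samples

def two_mode_moment_exact(n, visibility):
    """Same average from the binomial expansion (odd powers average to 0)."""
    return sum(
        math.comb(n, 2 * k) * math.comb(2 * k, k) * (visibility / 2.0) ** (2 * k)
        for k in range(n // 2 + 1)
    )

def visibility_from_ratio(r):
    """Fringe visibility for two modes with intensity ratio r (assumed formula)."""
    return 2.0 * math.sqrt(r) / (1.0 + r)

if __name__ == "__main__":
    f11 = two_mode_moment(11, 0.6)
    print(f"f_11(V=0.6) = {f11:.2f}  (log10 = {math.log10(f11):.2f})")
    print(f"closed form  = {two_mode_moment_exact(11, 0.6):.2f}")
    print(f"V for r=0.2  = {visibility_from_ratio(0.2):.3f}")
```

With $N = 11$ and $V = 0.6$ both routines give a moment close to 34, i.e. roughly $10^{1.5}$.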
For $N = 11$ and $V = 0.6$, eq. (4.11) gives $f_{11} \approx 34$. This is in good agreement with the enhancement in the number of ions, $10^{1.6 \pm 0.2}$, induced by a two-mode pulse with visibility $V = 0.6$ compared to that induced by a single-mode pulse with the same average laser intensity.

4.2.2. The multimode case
When the laser oscillates in three or more modes, the stochastic pattern varies from shot to shot. It depends on the number of modes and on both the phases and intensities of the modes. From the experimental data shown in fig. 7, it is possible to determine the enhancement in the number of Xe$^+$ ions due to the mode structure, as shown in fig. 9 (Mainfray [1982]). The number of ions is enhanced by nearly $10^7$ when the number of modes is increased from one to one hundred. This enhancement factor is not far from the maximum theoretical value $11! = 4 \times 10^7$. It should be pointed out that the number of modes necessary to provide the Gaussian character of laser-radiation statistics and to reach the $N!$ asymptotic value depends strongly on the order of the N-photon ionization process. Debethune [1972] has shown (fig. 10) that the higher the order of the N-photon process, the larger is the number of modes required to reach the $N!$ value. Typically, 30 modes are enough for a four-photon process, while nearly 300 modes are necessary in an 11-photon process (Masalov [1985]).

Fig. 9. The variation of the $f_{11}$ moment as a function of the number $M$ of modes. Experimental (full line) and calculated (dashed line) $f_{11}$ moment. In the calculation it is assumed that the phases of the modes are independent.

Fig. 10. The Nth order correlation function $f_N$ normalized to $N!$ as a function of the number $M$ of modes for $N$ = 2, 4, 6, 8, 10 and 12.

Calculations have been performed for the multimode case assuming stochastic independent phases. If it is assumed that all the modes have the same amplitude, $b_N = 1$ and $f_N = g_N$. Tomov and Chirkin [1971] have calculated the enhancement factor $f_N$ given in table 1.

TABLE 1
Efficiency of generation of harmonics by multimode radiation compared to that for single-mode radiation of the same average intensity.

Nonlinear order N     Moment $f_N$
1                     $1$
2                     $2 - 1/M$
3                     $6 - 9/M + 4/M^2$
4                     $24 - 72/M + 82/M^2 - 33/M^3$
5                     $120 - 600/M + 1250/M^2 - 1225/M^3 + 456/M^4$

Modes with Gaussian stochastic distribution have been considered by Lecompte, Mainfray, Manus and Sanchez [1975], Gersten and Mittleman [1976] and Masalov [1976], and lead to the following results

$$f_N = N!\,\frac{M^N (M-1)!}{(N+M-1)!}, \tag{4.12}$$

$$b_N = \frac{(N+M-1)!}{M^N (M-1)!}, \tag{4.13}$$

where $M$ is the number of independent modes. The $f_{11}$ moment calculated in this way is shown by the dashed line in fig. 9. Experimental and calculated $f_{11}$ values tend towards each other and are expected to become identical at an asymptotic value $11!$ for a very large number of modes. Two effects could explain the difference between the experimental and the calculated $f_{11}$ moment. Firstly, the phase relationship of the modes of the laser radiation can be different from that used in calculating the $f_{11}$ moment, where the phases of the modes are assumed to be independent. Secondly, when the laser oscillates in a very large number of modes, the laser spectrum is no longer regular; it consists of a series of bands, each containing 10 modes. The intensities of the modes can be measured, but no information on their relative phases can be obtained. A decisive experiment was carried out to check the influence of the phase relationship when the laser oscillates in seven modes. A cell containing a dye was put in the oscillator cavity to phase-lock the seven modes. Figure 11 shows that the number of Xe$^+$ ions induced when the seven modes are phase-locked is 100 times larger than that induced when the seven modes have random phases. When modes are phase-locked, $b_N = 1$ and $f_N = g_N$. Tomov and Chirkin [1971] have calculated $f_2$, $f_3$ and $f_4$. For large values of $N$, $f_N \simeq 0.5\,M^{N-1}$, i.e., $f_{11} = 1.4 \times 10^8$ for $M = 7$. Comparison of $f_{11}$ thus derived with the experimental data taken from fig. 7 seems to indicate that the difference may be explained by a partial mode locking of the seven modes.
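The independent-mode results can be checked with exact rational arithmetic. The sketch below assumes statistically independent, uniformly distributed phases and evaluates (a) the equal-amplitude moment by counting the phase-matched terms of $\langle |\sum_m e^{i\theta_m}|^{2N} \rangle$, which can be compared with the polynomials of table 1, and (b) the Gaussian-amplitude result of eq. (4.12). The function names are hypothetical, and the counting identity is a standard combinatorial result used here as an assumption about how the table entries arise.

```python
from fractions import Fraction
from math import factorial

def equal_amplitude_moment(n, m):
    """f_N for m equal-amplitude modes with independent random phases.

    <I^n> = (n!)^2 * [x^n] (sum_k x^k / (k!)^2)^m and <I> = m.
    """
    base = [Fraction(1, factorial(k) ** 2) for k in range(n + 1)]
    poly = [Fraction(1)] + [Fraction(0)] * n      # the constant polynomial 1
    for _ in range(m):                            # multiply by `base` m times
        new = [Fraction(0)] * (n + 1)
        for i, a in enumerate(poly):
            if a:
                for j in range(n + 1 - i):
                    new[i + j] += a * base[j]
        poly = new
    return poly[n] * factorial(n) ** 2 / Fraction(m) ** n

# Coefficients of 1, 1/M, 1/M^2, ... as read from table 1.
TABLE_1 = {
    1: [1],
    2: [2, -1],
    3: [6, -9, 4],
    4: [24, -72, 82, -33],
    5: [120, -600, 1250, -1225, 456],
}

def table_moment(n, m):
    """Evaluate the table 1 polynomial in 1/M exactly."""
    return sum(Fraction(c, m ** p) for p, c in enumerate(TABLE_1[n]))

def gaussian_moment(n, m):
    """Eq. (4.12): f_N = N! M^N (M-1)! / (N+M-1)!."""
    return Fraction(factorial(n) * m ** n * factorial(m - 1), factorial(n + m - 1))

if __name__ == "__main__":
    for n in TABLE_1:
        assert all(equal_amplitude_moment(n, m) == table_moment(n, m)
                   for m in range(1, 8))
    for n, m in ((4, 30), (11, 100), (11, 300)):
        print(n, m, float(gaussian_moment(n, m) / factorial(n)))
```

Under these assumptions eq. (4.12) gives $f_N/N!$ of about 0.82 for $N = 4$, $M = 30$ and about 0.83 for $N = 11$, $M = 300$, in line with the statement that roughly 30 and 300 modes are needed in the two cases.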
Fig. 11. The 11-photon ionization of Xe. The variation of the number of Xe$^+$ ions is plotted as a function of the average laser intensity $\bar{I}$ in arbitrary units induced by a seven-mode laser pulse when the seven modes are unlocked (a), or the seven modes are locked (b).
In summary, the N-photon ionization rate W of an atom can be written as
The Nth order autocorrelation function $f_N$ is equal to unity for a single-mode laser pulse and to $N!$ in the limit of an infinite number of independent modes. It is a small correction factor ($\le 2$) for a two-photon process, while it has a dramatic effect in high-order nonlinear processes. In addition, the characteristic ionization time of a nonresonant multiphoton ionization of an atom is extremely short, of the order of a few periods of the laser field. As a result, an atom ionized through multiphoton ionization is a very fast detector as far as the statistical properties of a laser pulse are concerned. The measurement of the Nth-order autocorrelation function $f_N$ is of special interest for the full characterization of laser radiation.
§ 5. Laser Temporal-Coherence Effects in Resonant Multiphoton Ionization of Atoms

The characteristic time taken for the nonresonant ionization of atoms can be as short as a few periods of the laser field. As a result, atoms "see" the intensity fluctuations of a multimode laser pulse and are very sensitive to the statistical properties of laser radiation, as was shown in § 4. In resonant multiphoton ionization of alkali atoms (in the regime of moderate laser intensities ranging from $10^7$ to $10^9$ W/cm$^2$), the characteristic ionization time is governed instead by the lifetime of the resonant atomic state and can be much longer. Consequently, the statistical properties of laser radiation are not expected to enhance dramatically the resonant multiphoton ionization rate. In addition, the laser bandwidth of the multimode laser pulse begins to play a role as soon as it becomes comparable to the resonance width and the resonance detuning. Therefore, laser temporal-coherence effects cannot be investigated independently of the laser bandwidth effects. The problems of laser bandwidth and statistics effects have been investigated theoretically (Kovarskii, Perelman and Todirashku [1976], Armstrong, Lambropoulos and Rahman [1976], Gontier and Trahin [1979], Zoller and Lambropoulos [1980]) and experimentally (Marx, Simons and Allen [1978], Agostini, Georges, Wheatley, Lambropoulos and Levenson [1978]) in two- and three-photon processes. Moreover, the three-photon resonant five-photon ionization of mercury (Poirier, Reif, Normand and Morellec [1984]), and the four-photon resonant five-photon ionization of mercury (Reif, Poirier, Morellec and Normand [1984]) have also been considered. Laser coherence and bandwidth effects have been investigated in detail in the four-photon ionization of Cs atoms, with a three-photon resonance on the 6F level (Lompré, Mainfray, Manus and Marinier [1981], Zoller [1982], Gontier and Trahin [1979]). The sophisticated Nd-glass laser used in this experiment makes it possible to change both the wavelength and the statistics of the light by varying the number of modes from a single mode to about 20 modes, corresponding approximately to coherent and Gaussian light, respectively. The bandwidth of the single-mode laser pulse is so narrow (about 10 MHz) that it is possible to resolve the four components of the resonance due to the hyperfine structure of the ground state and the fine structure of the 6F level of the Cs atom, as shown in fig. 12. The resonance curves obtained with the incoherent laser pulses were observed to be enhanced, shifted and broadened with regard to those induced by coherent pulses with the same average intensity, as shown in fig. 13. In the resonance profile induced by the multimode laser pulse, the 6F fine structure is not resolved because of the bandwidth of the multimode pulse. The incoherent laser pulse induces a specific shift of the resonance curves. Figure 14 shows that the resonance shift induced by a 3 GHz bandwidth laser pulse is $2.8 \pm 0.2$ times larger than that induced by a single-mode pulse. This is in good agreement with calculations performed by Zoller [1982] and Gontier and Trahin [1979]. Zoller [1982] has shown that two laser modes of equal intensity induce a statistical shift as large as half the shift induced by a chaotic multimode laser pulse.

Fig. 12. Schematic representation of the four-photon ionization of the Cs atom when the laser frequency is tuned to the resonant three-photon transition 6S → 6F. The four components of the resonance are due to the hyperfine structure of the 6S ground state and the fine structure of the 6F level.

Fig. 13. The three-photon resonant four-photon ionization of Cs atoms. Resonance profiles were obtained with a single-mode laser pulse (a) or a multimode laser pulse with a 3 GHz bandwidth (b). The average laser intensity for both profiles is $5.6 \times 10^7$ W/cm$^2$, and the laser pulse duration is 50 ns.

Fig. 14. The three-photon resonant four-photon ionization of Cs atoms. Resonance shift, expressed in terms of the energy of the three-photon transition 6S → 6F, induced by a single-mode laser pulse (a), or an incoherent laser pulse (b), plotted as a function of intensity.
§ 6. Relativistic Self-Focusing of a Laser Pulse in a Plasma

6.1. RECENT POSSIBILITY OF OBSERVING NEW PHYSICAL EFFECTS
For a long time the available focused laser intensity remained limited to the $10^{15}$ to $10^{16}$ W/cm$^2$ range. This situation changed at the end of the 1980s because new short-pulse laser technology made possible the production of compact, intense laser sources operating at the terawatt level. The two main approaches are based on excimer lasers, especially KrF lasers at 248 nm (Roberts, Taylor, Lee and Gibson [1988], Endoh, Watanabe, Sarukura and Watanabe [1989], Luk, McPherson, Gibson, Boyer and Rhodes [1989]), and chirped pulse amplification in solid-state amplifiers followed by temporal compression to picosecond or subpicosecond duration (Maine, Strickland, Bado, Pessot and Mourou [1988], Ferray, Lompré, Gobert, L'Huillier, Mainfray, Manus, Sanchez and Gomes [1990], Perry, Patterson and Weston [1990], Sauteret, Husson, Thiell, Seznec, Gary, Migus and Mourou [1991], Yamakawa, Shiraga, Kato and Barty [1991], Sullivan, Hamster, Kapteyn, Gordon, White, Nathel, Blair and Falcone [1991]). These laser sources are capable (after focusing) of producing about $10^{18}$ W/cm$^2$ (Normand, Ferray, Lompré, Gobert, L'Huillier and Mainfray [1990], Taylor, Tallman, Roberts, Lester, Gosnell, Lee and Kyrala [1990], Seznec, Sauteret, Gary, Bechir, Bocher and Migus [1992]), i.e., laser field strengths much in excess of an atomic unit. Electrons oscillating in such fields become weakly relativistic. Laser-matter interactions have never yet been investigated in such an intense field. This makes possible a host of experiments never before thought possible in the laboratory (Mainfray and Manus [1991]). This section is devoted to an analysis of one of the most important of these new effects: the expected relativistic self-focusing of a terawatt laser pulse in a plasma. When all the atoms irradiated by a laser pulse are ionized, a plasma is formed and the interaction is then dominated by the coupling between the photons and the electrons.
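To make these orders of magnitude concrete, the sketch below converts a focused intensity into a peak electric field and into the dimensionless quiver parameter $eE/(m\omega c)$. The SI formulas, the tabulated constants and the two example wavelengths (248 nm for KrF, 1.053 µm as a representative solid-state value) are assumptions added for illustration; only the $10^{18}$ W/cm$^2$ intensity comes from the text.

```python
import math

C = 299_792_458.0             # speed of light, m/s
EPS0 = 8.854_187_8128e-12     # vacuum permittivity, F/m
E_CHARGE = 1.602_176_634e-19  # elementary charge, C
M_E = 9.109_383_7015e-31      # electron mass, kg
E_ATOMIC = 5.142_206_747e11   # atomic unit of electric field, V/m

def peak_field(intensity_w_cm2):
    """Peak field (V/m) of a linearly polarized wave: I = eps0*c*E^2/2."""
    intensity = intensity_w_cm2 * 1.0e4          # W/cm^2 -> W/m^2
    return math.sqrt(2.0 * intensity / (EPS0 * C))

def quiver_parameter(intensity_w_cm2, wavelength_m):
    """Dimensionless amplitude e*E/(m*omega*c); values near 1 mean relativistic motion."""
    omega = 2.0 * math.pi * C / wavelength_m
    return E_CHARGE * peak_field(intensity_w_cm2) / (M_E * omega * C)

if __name__ == "__main__":
    field = peak_field(1.0e18)
    print(f"E = {field:.3e} V/m = {field / E_ATOMIC:.1f} atomic units")
    for name, lam in (("KrF, 248 nm", 248e-9), ("solid state, 1.053 um", 1.053e-6)):
        print(f"{name}: eE/(m omega c) = {quiver_parameter(1.0e18, lam):.2f}")
```

At $10^{18}$ W/cm$^2$ the field is about five atomic units, and the quiver parameter is roughly 0.2 at 248 nm and 0.9 at 1.053 µm, which is the weakly relativistic regime referred to above.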
Many theoretical developments have underlined the influence of the main characteristics of the laser light on the plasma behavior. Various effects have been predicted, such as wakefield generation, harmonic generation by relativistic electrons, electron-positron pair production, etc. Self-focusing of a laser pulse in a plasma has been considered one of the most important physical processes. The reason is that self-focusing of the laser light by the plasma would considerably increase the possibility of maintaining ultra-high laser intensity over a length much longer than the
Rayleigh length by compensation of the natural divergence of the laser beam. This would increase the probability of producing the previously mentioned effects. If the index of refraction of an optical medium is a function of the field intensity, an electromagnetic beam can generate its own dielectric waveguide that causes the beam to self-focus. If this effect overcomes the diffraction, the beam is self-trapped and propagates in the nonlinear optical medium without divergence. Chiao, Garmire and Townes [1964] were the first to predict and estimate such a process in neutral media where the nonlinear term is of the form $n_2 E^2$. The first experimental evidence was given by Lallemand and Bloembergen [1965]. Since then, many theoretical and experimental studies have been made. Our main concern here will be plasma media. Several review papers have been devoted to the subject (Svelto [1974], Sodha, Ghatak and Tripathi [1974, 1976], Hora [1981], Shen [1984]). The nonlinearities can be produced by three different mechanisms: (i) heating of charged particles, (ii) ponderomotive effects, and (iii) relativistic mass change. We will concentrate on the last two mechanisms. Let us first consider the situation where the plasma has attained equilibrium with the radiation field, which corresponds to long laser pulse duration (adiabatic regime).

6.2. SELF-TRAPPING OF A LONG LASER PULSE IN A PLASMA IN EQUILIBRIUM WITH THE LASER FIELD
During the propagation of the electromagnetic beam in the plasma, the ponderomotive force pushes the electrons out of the region of high power. The central rarefied density channel acts as a waveguide for the beam. If the power is high enough the waveguiding effect may overcome the diffraction, and the beam experiences self-trapping (Kaw, Schmidt and Wilcox [19731). The ponderomotive force is
where $\langle\ \rangle$ stands for time averaging and $\phi_p$ is the ponderomotive potential. If $\langle E^2 \rangle$ is time independent and the scale length $L$ is larger than the Debye distance $\lambda_D = (kT/4\pi n_e e^2)^{1/2}$, then

$$n_e = n_i = n_0 \exp\left[\frac{e\phi_p}{T_e + T_i}\right]. \tag{6.2}$$

Macroscopic velocities must be smaller than $C_s$ (speed of sound) and time
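scales must be long compared to $L/C_s$; $n_e$ then obeys the Boltzmann distribution. Equation (6.2) shows the reduction of the plasma density in the central region of the beam, where the intensity is the highest. The ponderomotive force must be balanced by pressure forces. The wave equation for the laser electric field is written, according to Max [1976], as

$$\frac{\partial^2 E}{\partial t^2} - c^2 \nabla^2 E + c^2 \nabla(\nabla \cdot E) = -\omega_{p0}^2\,\frac{n_e}{n_0}\,E, \tag{6.3}$$

where $n_0$ is the electron density for $E^2 \to 0$, and

$$\omega_{p0}^2 = \frac{4\pi n_0 e^2}{m_e}. \tag{6.4}$$

If we set

$$E(x, t) = \tfrac{1}{2}\,\varepsilon(x, t) \exp[i(\omega_0 t - k_0 z)] + \mathrm{c.c.}, \tag{6.5}$$

neglect the term $\nabla(\nabla \cdot E)$ relative to $\nabla^2 E$ and assume that $|\partial^2 \varepsilon/\partial z^2| \ll k_0^2 |\varepsilon|$, we obtain from eqs. (6.2) and (6.3) the envelope equation

$$-2ik_0 c^2 \frac{\partial \varepsilon}{\partial z} + c^2 \nabla_\perp^2 \varepsilon - \Gamma^2 \varepsilon + \omega_{p0}^2\,\varepsilon\,[1 - \exp(-\beta|\varepsilon|^2)] = 0. \tag{6.6}$$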
Here $\Gamma^2$ is the nonlinear wave number shift of the light due to the electron density depression and to the waveguide type propagation. Introducing the eikonal approximation, $\varepsilon(x)$ is written as

$$\varepsilon(x) = \varepsilon_0 \exp(-ik_0 S). \tag{6.9}$$

If we assume (according to Akhmanov, Sukhorukov and Khokhlov [1966]) that $\varepsilon_0$ and $S$ can be written as

$$\varepsilon_0 = \frac{E_0}{f} \exp\left(-\frac{r^2}{a^2 f^2}\right), \tag{6.10}$$

$$S = \frac{r^2}{2}\,\frac{1}{f}\,\frac{df}{dz} + \phi(z), \qquad f = f(z), \tag{6.11}$$
then we have for $\varepsilon_0$ a Gaussian constant-shape ansatz. If we introduce $\varepsilon(x)$ from eq. (6.9) in eq. (6.6), we obtain two equations in $\varepsilon_0$ and $S$. Replacing $\varepsilon_0$ and $S$ by eqs. (6.10) and (6.11) and using the paraxial approximation $r^2 \ll a^2 f^2$, we find eq. (6.12) for the shape factor $f(z)$. The first term on its right-hand side is the diffraction term. The second term is the self-focusing effect induced by the density gradient. When $E_0$ is such that the right-hand side equals zero, self-trapped propagation is obtained, corresponding to a soliton-like solution. The value of the equilibrium beam radius $a_E$, shown in fig. 15, is given by eq. (6.13); its minimum value, eq. (6.14), occurs for $\beta E_0^2 = 1$. The critical power $P_c$ leading to self-trapped propagation can be calculated from eq. (6.12); the value is found to be one quarter of the exact value. The limitation is related to the paraxial approximation, which underestimates the ponderomotive effect. Nevertheless, the qualitative behavior of self-trapping is correctly described. When $a_E$ is different from
Fig. 15. The variation of the equilibrium beam radius $a_E$ for self-trapping as a function of laser light intensity.
$(a_E)_{\min}$, or when the initial divergence $df/dz$ is nonzero, the beam undergoes oscillatory behavior in $z$ (Max [1976]). This behavior is related to the saturation of the nonlinearity at high values of the intensity, i.e., to the influence of the higher-order terms. Catastrophic self-focusing ($f \to 0$) does not occur for the exponential nonlinearity induced by ponderomotive forces. Lam and Lippmann [1977] have revisited the ponderomotive self-focusing problem by applying the moment theory (Vlasov, Petrischev and Talanov [1971]) to the quasi-optical equation (Kaw, Schmidt and Wilcox [1973]) in order to derive $a^2$. They considered the same Gaussian constant-shape approximation. The main difference with respect to Max [1976] concerns the value of $a_E$ in the saturation regime, i.e., for $\beta E_0^2 > 1$. The plasma is then completely expelled from the central region of the beam, the photons propagate freely in an empty cylinder and thus the radius of the beam should not depend strongly on the intensity (fig. 16). Lam and Lippmann [1977] derived the critical power $P_c$ above which self-trapping is obtained,

$$P_c = P_0\,\frac{n_c}{n_e}, \tag{6.15}$$

$$P_0 = \frac{2mc^3 T_e}{e^2} = 34\,T_e(\mathrm{keV})\ \mathrm{MW}. \tag{6.16}$$

For $n_c/n_e = 10^2$ and $T_e = 1$ keV, $P_c = 3.4$ GW. The equilibrium beam radius
Fig. 16. The equilibrium beam radius $a$ as a function of the laser intensity (horizontal axis: $\beta E_0^2$), as calculated by Max [1976] (dotted line) and by Lam and Lippmann [1977] (solid line).
versus power is shown in fig. 17. There is a general qualitative agreement with the results of paraxial ray theory: the minimum beam radius is of the order of $c/\omega_p$, there is a threshold power for the onset of self-trapping, there is no limit on the amount of transmitted power in such "light channels", and the self-trapped beam is stable. The only disagreement lies in the absolute value of the critical power (four times the value deduced from paraxial ray theory). Subbarao and Sodha [1979] analyzed self-trapping using a generalization of the angular spectrum representation of electromagnetic beams. Such a method combined with the paraxial ray approximation provides a correct value of the critical power. Anderson and Bonnedal [1979] have reformulated the problem in terms of a variational principle. The expressions obtained in this way differ from the corresponding expressions obtained by the moment theory and the paraxial approximation theories. Nevertheless, the results concerning equilibrium radii agree exactly with those of the moment theory. Full information is given on the phase variation of the wave, allowing the determination of the nonlinear frequency shift, which is also found by the use of paraxial ray theories (Max [1976]).

Fig. 17. The equilibrium beam radius as a function of the laser power $P$ (horizontal axis: $P/P_c$). $P_c$ is the critical power for self-focusing.

The preceding results are not applicable to the relativistic regime. Yu and Shukla [1978] and Felber [1980] have derived the relativistic equations governing the nonlinear interaction of intense, circularly polarized optical beams with warm, quasi-neutral plasmas. Slowly varying envelope and paraxial-ray approximations are used, allowing the determination of the evolution of the beam width as the beam propagates in the plasma. The steady-state solutions obtained describe the equilibrium between the plasma and the optical beam. The profile of the radial density due to the ponderomotive force acting on the electrons has been determined; the ions follow the electron displacement by charge coupling, and they respond to the ambipolar potential in a time which corresponds to a few transit times of an ion sound wave propagating across the beam radius,
$$n_e \simeq n_i \simeq \exp\left\{-\tfrac{1}{2}\beta\left[(1 + \gamma^2)^{1/2} - 1\right]\right\}, \tag{6.17}$$

with
Equation (6.17) shows that an optical beam creates a rarefied channel in the plasma around the optical axis, while the intensity required to deplete the plasma at the central part of the beam is found to increase with temperature. The critical power $P_c$ above which the beam is self-trapped is given by eq. (6.18), in which the Debye length $\lambda_D$ is given by eq. (6.19). It follows that

$$P_c = 10\,\frac{\omega^2}{\omega_p^2}\,T(\mathrm{keV})\ \mathrm{MW}. \tag{6.20}$$

If we consider $T_e = 1$ keV and $\omega^2/\omega_p^2 = 10^2$, or $n_c/n_e = 10^2$, then $P_c = 1$ GW. Felber [1980] found a value for $P_c$ that is a factor 3.4 smaller than the equivalent value given by Lam and Lippmann [1977]. This difference can be explained by the use of the paraxial approximation in Felber's derivation. He also determined that the beam cannot be focused to a radius smaller than

$$a_{\min}\,\lambda_D = (2.718)^{1/2}\,\frac{c}{\omega_p}. \tag{6.21}$$
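The numerical estimates quoted for the two critical powers can be reproduced in a few lines. The sketch assumes that eq. (6.16) reads $P_0 = 2mc^3 T_e/e^2$ in Gaussian units (written below in SI form as $2cT_e/r_e$, with $r_e$ the classical electron radius and $T_e$ in energy units), that the Lam and Lippmann threshold scales as $P_0\,n_c/n_e$, and that Felber's result is eq. (6.20); the function names and constants are illustrative.

```python
C = 299_792_458.0          # speed of light, m/s
R_E = 2.817_940_3262e-15   # classical electron radius, m
KEV = 1.602_176_634e-16    # 1 keV in joules

def p0_watts(te_kev):
    """P_0 = 2 m c^3 T_e / e^2 (Gaussian) = 2 c T_e / r_e (SI), in watts."""
    return 2.0 * C * te_kev * KEV / R_E

def pc_lam_lippmann(te_kev, nc_over_ne):
    """Critical power P_0 * n_c/n_e (assumed reading of eq. 6.15)."""
    return p0_watts(te_kev) * nc_over_ne

def pc_felber(te_kev, nc_over_ne):
    """Eq. (6.20): 10 (omega/omega_p)^2 T(keV) MW, using n_c/n_e = (omega/omega_p)^2."""
    return 10.0e6 * nc_over_ne * te_kev

if __name__ == "__main__":
    ll = pc_lam_lippmann(1.0, 100.0)
    fe = pc_felber(1.0, 100.0)
    print(f"P_0(1 keV)       = {p0_watts(1.0) / 1e6:.1f} MW")
    print(f"Lam and Lippmann = {ll / 1e9:.2f} GW")
    print(f"Felber           = {fe / 1e9:.2f} GW, ratio = {ll / fe:.2f}")
```

With $T_e = 1$ keV and $n_c/n_e = 10^2$ this gives about 3.4 GW and 1 GW respectively, i.e. the factor of 3.4 mentioned above.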
Fig. 18. Distance $z$ along the optical axis as a function of the beam radius $a$ normalized to the Debye length. A, B and C represent beams that begin with equal radius but with different initial divergences $da/dz$ (zero for beam C). Beam A reaches a waist, then diffracts. Beam B is self-trapped with an oscillating radius. Beam C is self-trapped with a constant radius.
The value of $a_{\min}$ is very similar to the values obtained by Max [1976], Lam and Lippmann [1977], and Anderson and Bonnedal [1979]. Felber [1980] has determined the conditions that should prevail for self-trapping of the photon beam. The condition $P > P_c$ must be supplemented by a sufficiently small divergence of the beam as it is launched into the plasma. If the entrance angle is too large, the beam reaches a waist, then diffracts (fig. 18, curve A). Curves B and C of fig. 18 correspond to self-trapping.

6.3. SELF-FOCUSING AND SELF-TRAPPING OF ULTRA-SHORT LASER PULSES
6.3.1. Self-focusing due to relativistic effects

Let us consider a very different situation related to the interaction of an ultra-short (in the pico- or subpicosecond range) laser pulse with an underdense plasma. The interaction is short enough to prevent the establishment of a thermal equilibrium between the electromagnetic field and the plasma. A different type of self-focusing will be considered, in which the nonlinear
index of refraction change is related to the relativistic mass variation induced by the intense electromagnetic (EM) radiation. The influence of charge displacement due to the ponderomotive force is also taken into consideration in more recent developments. The first study on the subject was presented by Max, Arons and Langdon [1974]. The relativistic mechanism explored by the authors requires only the motion of the plasma electrons, which differs from the point of view previously described, where the dynamics of the plasma as a whole was considered. Relativistic self-modulation and self-focusing were both considered. The plasma is considered as a cold, uniform electron fluid with fixed ion density irradiated by a linearly polarized EM wave $E$,

$$E = \hat{x}\,E_0 \cos\chi_0, \qquad \chi_0 = \omega_0 t - k_0 z. \tag{6.22}$$

Due to their motion in the EM wave, electrons acquire a relativistic Lorentz factor $\gamma_0$,

$$\gamma_0 \simeq 1 + \tfrac{1}{2}(\nu_0 \sin\chi_0)^2, \tag{6.23}$$

where

$$\nu_0 = \frac{eE_0}{mc\omega_0} \ll 1 \quad \text{(weak relativistic limit)}.$$
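The quality of the weakly relativistic expansion in eq. (6.23) can be gauged numerically. The sketch below compares $1 + \tfrac{1}{2}(\nu_0 \sin\chi_0)^2$ with the unexpanded factor $[1 + (\nu_0 \sin\chi_0)^2]^{1/2}$, which is what one obtains if the electron momentum is taken to be purely transverse, $p = mc\,\nu_0 \sin\chi_0$; that closed form and the function names are assumptions made for this illustration.

```python
import math

def gamma_exact(nu0, chi):
    """Lorentz factor for a purely transverse quiver momentum (assumed model)."""
    return math.sqrt(1.0 + (nu0 * math.sin(chi)) ** 2)

def gamma_weak(nu0, chi):
    """Weakly relativistic form of eq. (6.23)."""
    return 1.0 + 0.5 * (nu0 * math.sin(chi)) ** 2

def max_relative_error(nu0, samples=1000):
    """Largest relative deviation of the expansion over one optical cycle."""
    worst = 0.0
    for k in range(samples):
        chi = 2.0 * math.pi * k / samples
        exact = gamma_exact(nu0, chi)
        worst = max(worst, abs(gamma_weak(nu0, chi) - exact) / exact)
    return worst

if __name__ == "__main__":
    for nu0 in (0.1, 0.3, 1.0):
        print(f"nu0 = {nu0}: max relative error = {max_relative_error(nu0):.2e}")
```

In this model the expansion is accurate to about one part in $10^5$ for $\nu_0 = 0.1$, and the error grows to about 6% at $\nu_0 = 1$, where the weak relativistic limit no longer applies.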
The resulting nonlinear index of refraction is given by eq. (6.24), where $\omega_p^2 = 4\pi n e^2/m$ and the numerical coefficient of the nonlinear term depends on the ratio $(\omega_0^2 - \omega_p^2)/(4\omega_0^2 - \omega_p^2)$. The dynamics of self-modulation is studied through the use of a linear instability analysis. The maximum growth rate $\gamma_{\max}$, given by eq. (6.25), is the same for self-focusing and self-modulation, and is proportional to $E_0^2$. Let us consider eq. (6.24). If the light intensity is larger at the center of the beam, which is a reasonable experimental assumption, the index of refraction increases and the phase velocity decreases in this region relative to the outer regions. The wave front curves, enhancing the intensity by focusing the light beam downstream (fig. 19). The self-focusing length has been calculated by Hora [1975] for various
Fig. 19. Schematic representation of the self-focusing of a laser pulse in a plasma.
electron densities and light intensities. An initial plane wave front is assumed at the entrance of the plasma. The wave front moves in steps proportional to the effective wavelength $\lambda = \lambda_0/N(I)$, where $\lambda_0$ is the vacuum wavelength. The relativistic value of the refractive index $N$ depends on the intensity $I$. Such a geometrical argument (illustrated in fig. 19) leads to the determination of a self-focusing length which is minimum at the critical value of the intensity, corresponding to a relativistic threshold for which the oscillation energy of the electron equals $mc^2$. Hora and Kane [1977] and Hora, Kane and Hughes [1978] have proposed an explanation for the high-energy ion emission of surfaces irradiated by strong laser pulses based on self-focusing effects in the laser-produced plasma. Spatschek [1977] has reformulated the same problem on the basis of a variational principle (Whitham [1974]). Up to second order in the electric field of the wave, complete agreement was found with the nonlinear dispersion equation of Max, Arons and Langdon [1974] using the method of Akhmanov, Sukhorukov and Khokhlov [1966]. The focusing of the radiation is calculated; the nonlinear steady state of the beam is shown to be represented by a cubic, nonlinear Schrödinger equation for the electric field amplitude. A self-focusing singularity is found, corresponding to a collapse at the focusing distance, leading locally to an infinite power. This physically
impossible situation disappears when higher order nonlinearities are taken into consideration (Zakharov [1972], Zakharov and Synakh [1975]). The refraction index has been calculated up to the fourth order in the electric field amplitude, eq. (6.26). The last term on the right-hand side of eq. (6.26) is a saturation term which suppresses the collapse. The corresponding nonlinear Schrödinger equation shows that, after the focus, the beam radius oscillates between a minimum (nonzero) and a maximum radius. The first determination of the critical power $P_c$ related to self-focusing due to relativistic effects on the oscillating electrons was given by Schmidt and Horton [1985],

$$P_c = \frac{9}{2}\left(\frac{\omega}{\omega_p}\right)^2 \times 10^9\ \mathrm{W}.$$

When this condition is met, the final laser beam diameter will be of the order of several times $c/\omega_p$.
6.3.2. Relativistic and ponderomotive effects in self-focusing
Up to this point, no charge displacement due to ponderomotive forces was taken into consideration. Sun, Ott, Lee and Guzdar [1987] were the first to consider self-focusing as a result of an increase of the refractive index due both to the increase in the mass of the electrons caused by their relativistic quiver velocity in the light wave and to the depletion of the electron density related to the expulsion of electrons by the ponderomotive force. All their results are time independent. The pulse length considered is so short that ion inertia prevents ion motion during the pulse. The electrons are assumed to be cold. The basic equations are given by the relativistic Maxwell equations with the Lorentz gauge condition combined with the electron momentum and continuity equations. The plasma is assumed to be very underdense; i.e., $\epsilon = \omega_p/\omega \ll 1$. The equations are expanded in $\epsilon$ by means of the multiple scale method, resulting in:
(6.27)
$n = 1 + \nabla_\perp^2 \gamma$, (6.28)
$\gamma = \sqrt{1 + A^2}$, (6.29)
with (6.30) and $N$ being the index of refraction. The left-hand side of eq. (6.27) is the envelope equation; the right-hand side represents the current. The balance between the ponderomotive and radial electrostatic forces determines the electron density n in the channel [see eq. (6.28)]. All quantities are dimensionless. Equations (6.27)-(6.29) have been solved numerically (for Z-independent solutions). In the case where depletion (called cavitation by the authors) is negligible, it may easily be shown that (6.31) with $\gamma(0) = [1 + A^2(\rho = 0)]^{1/2}$; d is the half-diameter of the beam (see eqs. (28) and (29) in Sun, Ott, Lee and Guzdar [1987]). The dispersion equation is (6.32) As A or the power P increases, $\sigma$ decreases. For a certain critical value, $\sigma_c = 0.8778$, the density at $\rho = 0$ becomes zero. This situation is defined as the onset of cavitation (fig. 20). The critical power for self-focusing is obtained by plotting the evolution
Fig. 20. Profiles of the field amplitude $A(\rho)$ and electron density $n(\rho)$ for $\sigma = 0.8778$, where $n(\rho = 0) = 0$ occurs (Sun, Ott, Lee and Guzdar [1987]).
of the radii of the beam against power. It is seen in fig. 21 that when P approaches $P_c$, $a_1$ and $a_2$ go to infinity. $P_c$ thus defines the critical power for the onset of self-focusing in a very demonstrative way. $P_c$ is shown to be
$P_c = 1.62 \times 10^{10} \left(\frac{\omega}{\omega_p}\right)^2$ W. (6.33)
For the definition of the radii, see eqs. (34) and (35) of Sun, Ott, Lee and Guzdar [1987]. The critical power for the onset of cavitation, $P_{c,c}$, is found to be rather close to $P_c$; more precisely:
$P_{c,c}/P_c \approx 1.1$. (6.34)
The modifications of the refractive index caused by relativistic effects and ponderomotive effects become of the same order when P is slightly above $P_c$; the width of the beam channel is then of the order $c/\omega_p$. In the case of a Z-dependent solution, when P is larger than $P_c$, the radius of the beam is shown to oscillate with Z between a nonzero minimum and a maximum value. When $P < P_c$, diffraction is observed. Unfortunately, the authors were unable to investigate cases where cavitation occurs because of numerical instabilities in the routine.
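As a rough numerical illustration of eqs. (6.33) and (6.34), the following Python sketch evaluates the critical power, the cavitation power and the scale length c/omega_p. The function names, the SI constants and the example inputs (a 0.248 micrometre laser in a plasma of 7.5 x 10^20 electrons per cm^3) are assumptions made here for illustration; they are not taken from Sun, Ott, Lee and Guzdar [1987].

```python
import math

# Physical constants in SI units (CODATA values, quoted here as assumptions).
E_CHARGE = 1.602176634e-19   # elementary charge, C
E_MASS = 9.1093837015e-31    # electron rest mass, kg
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
C_LIGHT = 2.99792458e8       # speed of light, m/s


def plasma_frequency(n_e_cm3):
    """Electron plasma angular frequency (rad/s) for a density given in cm^-3."""
    n_e = n_e_cm3 * 1.0e6  # convert to m^-3
    return math.sqrt(n_e * E_CHARGE**2 / (EPS0 * E_MASS))


def laser_frequency(wavelength_um):
    """Laser angular frequency (rad/s) for a vacuum wavelength in micrometres."""
    return 2.0 * math.pi * C_LIGHT / (wavelength_um * 1.0e-6)


def critical_power_watts(wavelength_um, n_e_cm3):
    """Critical power of eq. (6.33): P_c = 1.62e10 (omega/omega_p)^2 W."""
    ratio = laser_frequency(wavelength_um) / plasma_frequency(n_e_cm3)
    return 1.62e10 * ratio**2


def cavitation_power_watts(wavelength_um, n_e_cm3):
    """Onset of cavitation from eq. (6.34): about 1.1 times P_c."""
    return 1.1 * critical_power_watts(wavelength_um, n_e_cm3)


def skin_depth_um(n_e_cm3):
    """Scale length c/omega_p in micrometres."""
    return C_LIGHT / plasma_frequency(n_e_cm3) * 1.0e6


if __name__ == "__main__":
    wavelength_um, density_cm3 = 0.248, 7.5e20  # illustrative inputs
    print("P_c   =", critical_power_watts(wavelength_um, density_cm3), "W")
    print("P_cav =", cavitation_power_watts(wavelength_um, density_cm3), "W")
    print("c/w_p =", skin_depth_um(density_cm3), "um")
```

For these inputs the critical power comes out at a few times 10^11 W and c/omega_p at roughly 0.2 micrometre, which indicates the order of magnitude of the channel width discussed above.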
Fig. 21. The laser beam profile radii $a_1$ and $a_2$ as a function of the power P, where $P_c$ is the lower limit of laser power for self-focusing (Sun, Ott, Lee and Guzdar [1987]).
6.3.3. Potential representation
Sprangle, Tang and Esarey [1987], and Barnes, Kurki-Suonio and Tajima [1987] have written the envelope equations in terms of an effective particle moving in a potential. Such a representation gives an excellent insight into the laser-plasma dynamics. Barnes, Kurki-Suonio and Tajima [1987] have considered self-trapping by formation of a vacuum channel in the plasma produced by the ponderomotive force acting on the electrons, whereas Sprangle, Tang and Esarey [1987] have limited the scope of their investigations to the influence of the nonlinear relativistic term due to the mass change of the oscillating electrons. The methods used to describe the ponderomotive self-focusing effect in the case of equilibrium between radiation field and plasmas, such as those of Kaw, Schmidt and Wilcox [1973], Max [1976], Lam and Lippmann [1977], Anderson and Bonneda [1979], and Felber [1980], are not valid for short laser pulses. Barnes, Kurki-Suonio and Tajima [1987] proposed the following model for the plasma response to the ponderomotive force. The ions are taken to be infinitely massive so that only the electron density can fluctuate,
$n_e = n_0 + \delta n_e$, $n_i = n_0$. (6.35)
The electric field is nonuniform and circularly polarized. According to Lindman and Stroscio [1977], the ponderomotive force Fp is Fp= - b x ,
(6.36)
where (6.37) The ponderomotive force is balanced by the radial electrostatic field E induced by the charge separation,
$F_p = eE$. (6.38)
From the Poisson equation, the normalized electron density is obtained, (6.39) Equations (6.39) and (6.28) are equivalent. By the use of Maxwell's equations and the Lorentz gauge, the wave
equation is derived
(6.40) with $\lambda_p^{-2} = \omega_p^2/c^2$ and $I_0 = |A_0|^2$. The vector potential A is expressed in terms of the eikonal approximation [see eq. (6.9)]. Applying the slowly varying envelope approximation and solving the equation under the paraxial approximation with r = ELI and E < 1, we get to first order in E, a rather complicated expression for the equation of motion for the beam radius a. $F(a, P)$ represents the “force” acting on the beam radius
(6.41) For P = 0,
$\frac{d^2 a}{dZ^2} = \frac{1}{k_0^2 a^3}$, (6.42)
and pure Rayleigh spreading is recovered. As the power increases, the effect of plasma focusing increases. For a certain critical power $P_{crit}$, plasma focusing compensates the Rayleigh spreading term; the equation of motion corresponds to zero force and, consequently, to a constant beam radius. Integrating the force term in eq. (6.41), the equation of motion takes the form
$\frac{d^2 a}{dZ^2} = -\frac{\partial V}{\partial a}(a, P)$, (6.43)
where
$V = -\int F(a, P)\, da$. (6.44)
Two different types of potentials are derived according to whether the power P is below or above $P_{crit}$ (see fig. 22),
(6.45)
A careful study of the motion of a in the potential well leads to behavior which differs, depending on the relative sizes of the quantities P and $P_{crit}$. (a) $P > P_{crit}$. The radius oscillates in the potential well, and the beam is
Fig. 22. The Sagdeev potential. The figure shows the potential for $P > P_c$ (a), and for $P < P_c$ (b). The beam will either self-focus or defocus depending on the initial intensity and divergence of the beam at the entrance to the plasma. If $P > P_c$ and the initial divergence is small enough, the beam will self-trap as represented by point 1 in (a). If the initial divergence is too strong, the beam will defocus. In the case of a very strong initial convergence, the beam will first focus to a small radius and then defocus forever as represented by point 2 in (a). In the case of $P < P_c$, if the beam is initially convergent, it will first focus, and then defocus forever. Otherwise, the beam defocuses as soon as it enters the plasma. These cases are represented by point 3 in (b) (Barnes, Kurki-Suonio and Tajima [1987]).
self-trapped. If the initial convergence da/dZ is too large, the beam will defocus. (b) $P < P_{crit}$. No matter what the initial conditions, the beam cannot be self-trapped. Sprangle, Tang and Esarey [1987] have also derived the envelope equations in terms of an effective particle moving in a potential. Let us recall that their derivation implies no charge displacement, i.e., no effect whatsoever of the ponderomotive force on the electrons. The nonlinear term is due solely to relativistic mass change effects. They have considered a helically
polarized radiation field propagating in a cold, collisionless plasma. The approximate local dispersion relation is
$\omega \approx ck + [c^2 k_\perp^2 + \omega_p^2(r, Z)]/2ck$, (6.46)
with $k_\perp = (k_x^2 + k_y^2)^{1/2}$, and $k_\perp \ll k$. Then
(6.47)
where n(r, Z) is the modified electron density due to the plasma wave, and
$\gamma_\perp(r, Z) = [1 + A_L^2(r, Z)]^{1/2}$, (6.48)
where $\gamma_\perp(r, Z)$ is the relativistic mass factor, and $A_L(r, Z)$ is the normalized laser field amplitude given by
$A_L(r, Z) = \frac{A_{L0} R_0}{R(Z)} \exp(-r^2/R^2(Z))$, (6.49)
where $R_0$ is the minimum spot size in the vacuum, and $A_{L0}$ is the normalized laser field amplitude at the center of the beam. From geometric optics the transverse motion of the electromagnetic rays is established, as well as the transverse ray equations. Various moments of the ray equations are taken in order to evaluate $R^2(Z)$ and determine the envelope equation in terms of an effective particle located at R(Z) moving in a potential V. The shape of the potential depends on a parameter a, in such a way that for a = 1, the diffraction term compensates the self-focusing term. For a > 1, V has a minimum for a certain value of R (fig. 23). For a < 1, the effective potential $V_{eff}$ is a decreasing function of R, with
$a = P/P_{crit}$. (6.50)
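The effective-particle picture can be checked numerically in its simplest limit, the P = 0 case of eq. (6.42), where the "force" reduces to Rayleigh spreading. The Python sketch below assumes the normalization $d^2a/dZ^2 = 1/(k_0^2 a^3)$ and the corresponding potential $V = 1/(2 k_0^2 a^2)$ obtained from $V = -\int F\,da$; the function names, the velocity-Verlet integrator and the dimensionless inputs are illustrative choices, not part of the cited derivations.

```python
import math


def rayleigh_force(a, k0):
    """'Force' on the beam radius for P = 0: F = 1 / (k0^2 a^3)."""
    return 1.0 / (k0**2 * a**3)


def potential(a, k0):
    """Potential V = -integral of F da, with V -> 0 as a -> infinity."""
    return 1.0 / (2.0 * k0**2 * a**2)


def integrate_radius(a0, slope0, k0, z_max, steps=20000):
    """Integrate d^2a/dZ^2 = F(a) from Z = 0 to z_max with velocity Verlet.

    Returns the final radius a and slope da/dZ.
    """
    dz = z_max / steps
    a, v = a0, slope0
    f = rayleigh_force(a, k0)
    for _ in range(steps):
        a += v * dz + 0.5 * f * dz * dz
        f_new = rayleigh_force(a, k0)
        v += 0.5 * (f + f_new) * dz
        f = f_new
    return a, v


def analytic_radius(a0, k0, z):
    """Closed-form solution for zero initial slope: a^2 = a0^2 + z^2/(k0^2 a0^2)."""
    return math.sqrt(a0**2 + z**2 / (k0**2 * a0**2))


if __name__ == "__main__":
    a_num, v_num = integrate_radius(a0=1.0, slope0=0.0, k0=1.0, z_max=5.0)
    print("numerical radius:", a_num)
    print("analytic radius :", analytic_radius(1.0, 1.0, 5.0))
    print("energy drift    :", 0.5 * v_num**2 + potential(a_num, 1.0) - potential(1.0, 1.0))
```

Because this potential decreases monotonically with a, there is no well: a beam launched with a negative initial slope first contracts to a finite minimum radius and then spreads indefinitely, which is the qualitative behavior described for powers below the critical power.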
The general behavior is quite comparable to the one described by Barnes, Kurki-Suonio and Tajima [1987].
6.3.4. Recent developments
Mori, Joshi, Dawson, Forslund and Kindel [1988], Kurki-Suonio, Morrison and Tajima [1989], and Borisov, Borovskiy, Shiryaev, Korobkin, Prokhorov, Solem, Luk, Boyer and Rhodes [1992] have included in their derivations the nonlinear effects of both the relativistic electron mass and the ponderomotive potential due to the electromagnetic wave. Mori, Joshi, Dawson, Forslund and Kindel [1988] have used a two-dimensional particle
Fig. 23. The effective potential V(x) for $a = P/P_{crit} = 5$, with $x = R/(R_0 A_{L0})$. The well minimum occurs at x = 0.4 (Sprangle, Tang and Esarey [1987]).
in-cell periodic simulations for both single- and double-frequency illumination (Gibbon [1990]). The initial conditions are: $T_e$ = 2.5 keV and u/c = 0.56. In the case of double-frequency illumination, the resonantly excited plasma wave enhances the self-focusing of each incident wave. The following important results were obtained: (1) Relativistic self-focusing occurs initially, followed by “ponderomotive blow-out”. (2) No oscillatory motion of the radius of the beam is observed in the x-y Cartesian geometry simulations. (3) At the time when the electrons begin to respond adiabatically to the ponderomotive force, the resulting nonlinearity is important only for scalelengths of the order of $c/\omega_p$. (4) This nonlinearity leads to an intensity threshold rather than a power threshold for self-focusing. (5) At variance with the paper of Sun, Ott, Lee and Guzdar [1987], no evidence was found of complete expulsion of the electrons from the beam region in the case where the ions are motionless. (6) When movable ions are present, the behavior is different. The beam is seen to self-focus, and then expand outwardly again. The subsequent expansion occurs because the plasma is almost completely expelled, leading to little variation of the dielectric constant except at the channel’s wall. The time taken for the channel formation depends on the laser intensity and
beam waist. For subpicosecond and even picosecond lasers, the pulse duration is smaller than the characteristic time of channel formation for very underdense plasmas. It is expected that no strong plasma blow-out would occur. The treatment by Borisov, Borovskiy, Shiryaev, Korobkin, Prokhorov, Solem, Luk, Boyer and Rhodes [1992] is a generalization of the work by Sun, Ott, Lee and Guzdar [1987] and Kurki-Suonio, Morrison and Tajima [1989]. It describes self-channeling including the influence of relativistic and charge-displacement effects, i.e., the propagation of a circularly polarized laser pulse in cold, inhomogeneous plasmas. Four phenomena are taken into consideration: (1) the relativistic increase in the mass of the electrons, (2) the perturbation of the electron density by the ponderomotive force, (3) the diffraction caused by the finite aperture of the beam, and (4) the influence of an inhomogeneous transverse plasma density on the refractive index. The ions are assumed to be frozen inertially in space. The initial radially inhomogeneous plasma density is described by a function f(r). The relativistic Maxwell equations are solved within the Coulomb gauge condition. Several assumptions and approximations are introduced (e.g., pulse length greater than plasma and electromagnetic wavelength),
P, = A
(6.51)
The electron density $n_e$ results from the balance of the ponderomotive and electrostatic forces, (6.52) An analogous expression for $n_e$ has been obtained previously by Sun, Ott, Lee and Guzdar [1987], and Barnes, Kurki-Suonio and Tajima [1987]. The general equation, describing the propagation of the amplitude of the vector potential $a(r, Z)$, has been solved using a great variety of initial conditions. Figure 24 presents the self-channeling of a pulse with a hyper-Gaussian initial transverse intensity distribution and a flat incident wave front in an initially homogeneous plasma, with $I_0 = 3 \times 10^{19}$ W/cm$^2$, $r_0 = 3$ $\mu$m, $\lambda = 0.248$ $\mu$m, and $n_e = 7.5 \times 10^{20}$ cm$^{-3}$. The main conclusions are the following: (1) The cooperative effect between relativistic effects and charge displacement leads to self-channeling of stable high-intensity Z-independent modes.
Fig. 24. Radial dependence, as a function of r ($\mu$m), of the asymptotic solutions for the normalized amplitude $[I_s(r)/I_0]^{1/2}$ and the normalized electron density $N_s(r)/N_0$.
(2) These modes are identified as the lowest eigenmodes of the nonlinear Schrödinger equation. (3) It has been generally accepted that initially sharply focused beams exhibit a single focus followed by infinite divergence of the beam. The present study shows, in significant contrast, that the combined effects of relativistic and charge-displacement mechanisms on initially strongly focused beams lead generally to confined modes of propagation. Brandi, Manus, Mainfray and Lehner [1993] have developed a two-parameter perturbative approach in $\omega_p/\omega$ and $\lambda/2\pi r_0$. Careful account has been taken of the initial inhomogeneity of the plasma and of its influence on the propagation term. The theoretical development generalizes the models of Kurki-Suonio, Morrison and Tajima [1989], Sun, Ott, Lee and Guzdar [1987], and Borisov, Borovskiy, Korobkin, Prokhorov, Shiryaev, Shi, Luk, McPherson, Solem, Boyer and Rhodes [1992]. The main results are the following: (1) A strong influence of the tailoring of the plasma on the value of the critical power has been demonstrated (figs. 25 and 26). If the profile of the radial density is concave, a strong reduction of the critical power can be anticipated, leading to new experimental perspectives. In the case of a convex profile, the increase of the critical power represents a considerable drawback for experimental observations. (2) The mechanisms responsible for the onset of self-focusing are the
Fig. 25. Parabolic radial profile of the electronic density $n(r) = n(0)(1 - \alpha r^2/L_n^2)$ for a concave profile ($\alpha = -1$) leading to focusing (a), a convex profile ($\alpha = +1$) leading to defocusing (b), and a homogeneous profile (c).
relativistic mass variation and the degree of plasma inhomogeneity. The ponderomotive term plays a complementary role in the development of the self-focusing process. It is only for very high values of $\gamma$ that the nonlinear ponderomotive term could initiate the self-focusing process. (3) Diffraction erosion of the leading edge and trailing portion of the pulse corresponding to undercritical power conditions is observed, giving rise to a reduction of the duration of the light pulse propagating in the plasma. Relativistic optical guiding is found to be more effective for long laser pulses than for short pulses (Sprangle, Esarey and Ting [1990], Ting, Esarey and Sprangle [1990]).
6.4. GENERAL DISCUSSION AND CONCLUSIONS
In the case of long pulses, the laser-plasma interaction leads to an equilibrium between the plasma and the photon beam. The electronic density is given by a Boltzmann distribution which shows a depletion at the center of the beam. Ponderomotive self-focusing appears above a certain critical power of the beam. A soliton-like solution is obtained when the diffraction term and the nonlinear ponderomotive term are equal ($P = P_c$). The minimum
Fig. 26. (a) Behavior of the critical power ($P/P_H$) as a function of the inhomogeneity ($\alpha B$); a/h = 0.1. The chosen initial laser power is $P_L = 0.75 P_H$ (horizontal dashed line). (b) Curves I, II, III, IV, V. The beam radius (f) as a function of the axial coordinate z (in units of $r_0$) for $\alpha B$ = 0.1; 0.05; 0; -0.05; -0.1, for curves I-V, respectively. $P_L = 0.75 P_H$, except for curve V where $P_L = 7.5 P_H$. Curves III and IV show that in spite of a laser power $P_L$ less than the critical power $P_H$ for a homogeneous plasma, self-focusing begins to take place when the radial profile of the electron density has a minimum on the laser axis, as shown in fig. 25a. Moreover, curve V, which corresponds to another condition, where the laser power is larger than $P_H$, gives rise to a stronger focusing behavior.
value of the beam radius is very close to the skin depth of the plasma. When different conditions prevail ($P > P_c$), or when the launching angle of the beam in the plasma is large, the propagation of the beam in the plasma is characterized by an oscillatory motion of the beam radius. Different variational theories have been used. Paraxial theories have the advantage of simplicity, but they underestimate the critical power by a factor of 4. This is related to the fact that the central part of the beam corresponds to the region where the ponderomotive effect is maximum. For short pulses, no equilibrium conditions between the photon beam and the plasma can be reached. Under such conditions, self-focusing may arise from the relativistic mass variation of the electron by an increase of the refractive index in the center of the beam and a decrease of the phase velocity relative to its value at the outer parts of the beam. The wave front curves, enhancing the intensity by focusing the light beam downstream. The first calculation was made by Max [1976], who considered a first-order nonlinear term in $E^2$. Extension to a higher-order defocusing term in $E^4$ was achieved by Spatschek [1977]. The influence of charge displacement due to the ponderomotive term was introduced by Sun, Ott, Lee and Guzdar [1987]. Total depletion at a power only slightly greater than the critical power was found. Two groups (Barnes, Kurki-Suonio and Tajima [1987], Sprangle, Tang and Esarey [1987]) have developed a potential formulation in which the envelope equation is expressed in terms of an effective particle moving in a potential. This approach gives an excellent insight into the behavior of the nonlinear propagation of a light beam in plasmas.
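The short-pulse mechanism summarized above can be illustrated with the standard cold-plasma expression for the relativistic refractive index, $N = [1 - (\omega_p/\omega)^2/\gamma]^{1/2}$ with $\gamma = (1 + A^2)^{1/2}$. This form assumes circular polarization and an unperturbed electron density (no ponderomotive depletion); the Gaussian amplitude profile, the function names and the numerical values in the Python sketch below are illustrative assumptions only.

```python
import math


def relativistic_gamma(amplitude):
    """Relativistic factor for a normalized field amplitude A: sqrt(1 + A^2)."""
    return math.sqrt(1.0 + amplitude**2)


def refractive_index(amplitude, eps):
    """Refractive index N for eps = omega_p/omega (eps < 1, underdense plasma).

    Assumes the electron density keeps its unperturbed value.
    """
    return math.sqrt(1.0 - eps**2 / relativistic_gamma(amplitude))


def gaussian_amplitude(r, a_axis, radius):
    """Assumed Gaussian transverse profile of the normalized amplitude."""
    return a_axis * math.exp(-(r / radius) ** 2)


def index_profile(radii, a_axis, radius, eps):
    """Refractive index at each transverse position in `radii`."""
    return [refractive_index(gaussian_amplitude(r, a_axis, radius), eps)
            for r in radii]


if __name__ == "__main__":
    eps = 0.2                      # illustrative omega_p/omega
    radii = [0.0, 0.5, 1.0, 2.0]   # in units of the beam radius
    for r, n in zip(radii, index_profile(radii, 1.0, 1.0, eps)):
        # The phase velocity c/N is lowest on axis, so the wave front curves inward.
        print(f"r = {r:.1f}  N = {n:.5f}  v_phase/c = {1.0 / n:.5f}")
```

The index is largest on the axis, where the amplitude peaks, so the phase velocity there is lowest and the plasma acts as a converging lens; including the density depletion would require the additional charge-displacement relations discussed in section 6.3.2.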
Barnes, Kurki-Suonio and Tajima [1987] considered a complete balance between the ponderomotive force and the radial electrostatic force due to charge separation, but no conclusion was derived concerning the respective roles of relativistic mass variation and charge displacement in the onset of self-focusing. The study of a two-dimensional particle-in-cell simulation brought a host of new information (Mori, Joshi, Dawson, Forslund and Kindel [1988]). Relativistic self-focusing is shown to occur initially, followed by ponderomotive blow-out. At variance with the report of Sun, Ott, Lee and Guzdar [1987], no evidence is found of complete expulsion of electrons from the central beam region in the case where the ions are immobile. Moreover, there is no oscillation in the beam radius in the course of the propagation through the plasma. Somewhat different conclusions were derived by Borisov, Borovskiy, Shiryaev, Korobkin, Prokhorov, Solem, Luk, Boyer and Rhodes [1992].
They found that complete expulsion of electrons was generally observed, and that an initially sharply focused beam leads to a confined mode of propagation, in contrast to what was found previously. Brandi, Manus, Mainfray and Lehner [1993] confirmed the onset of relativistic self-focusing prior to any ponderomotive influence on the beam propagation. One major point is related to the importance of a careful tailoring of the plasma which strongly reduces the critical power. The limitations of all theoretical contributions on the subject were found to be related to the perturbative expansion, for which only the leading term is presently taken into consideration in the different approaches, with the exception of the particle-in-cell simulation. During the period that our article was under review a number of interesting contributions appeared in the literature: Sprangle, Esarey, Krall and Joyce [1992], Antonsen and Mora [1992] and Chen and Sudan [1993]. This fascinating subject needs a combined theoretical and experimental development in order to make substantial progress in the complete understanding of the field.
References
Agarwal, G.S., 1970, Phys. Rev. A 1, 1445.
Agostini, P., A. Georges, S. Wheatley, P. Lambropoulos and M. Levenson, 1978, J. Phys. B 11, 1733.
Akhmanov, S., P. Sukhorukov and R. Khokhlov, 1966, Sov. Phys.-JETP 23, 1025.
Anderson, D., and M. Bonneda, 1979, Phys. Fluids 22, 105.
Antonsen, T., and P. Mora, 1992, Phys. Rev. Lett. 69, 2204.
Armstrong, L., P. Lambropoulos and N. Rahman, 1976, Phys. Rev. Lett. 36, 952.
Arslanbekov, T., 1976, Sov. J. Quantum Electron. 6, 117.
Bakos, J., 1974, Multiphoton Ionization of Atoms (Academic Press, New York); Adv. Electron & Electron. Phys. 36, 57.
Bandrauk, A., ed., 1988, Atomic and Molecular Processes with Short Intense Laser Pulses, NATO ASI, Vol. 171 B (Plenum, New York).
Barnes, D., T. Kurki-Suonio and T. Tajima, 1987, IEEE Trans. Plasma Sci. PS-15, 154.
Bebb, H., and A. Gold, 1966, Phys. Rev. 143, 1.
Borisov, A., A. Borovskiy, V. Korobkin, A.M. Prokhorov, O. Shiryaev, X.M. Shi, T.S. Luk, A. McPherson, J. Solem, K. Boyer and C.K. Rhodes, 1992, Phys. Rev. Lett. 68, 2309.
Borisov, A., A. Borovskiy, O. Shiryaev, V. Korobkin, A.M. Prokhorov, J. Solem, T.S. Luk, K. Boyer and C.K. Rhodes, 1992, Phys. Rev. A 45, 5830.
Brandi, H., C. Manus, G. Mainfray and T. Lehner, 1993, Phys. Rev. E 47, 3780.
Bruzzese, R., A. Sasso and S. Solimeno, 1989, Riv. Nuovo Cimento 12, 1.
Burnett, K., and M. Hutchinson, 1989, J. Mod. Opt. 36, 811.
Cervenan, M., and N. Isenor, 1974, Opt. Commun. 10, 280.
Chang, T.N., and Y.S. Kim, 1982, Phys. Rev. A 26, 2728.
Chen, X.L., and R.N. Sudan, 1993, Phys. Rev. Lett. 70, 2082.
Chiao, R., E. Garmire and C. Townes, 1964, Phys. Rev. Lett. 13, 479.
Chin, S.L., and P. Lambropoulos, eds, 1984, Multiphoton Ionization of Atoms (Academic Press, New York).
Chu, S.I., 1985, Adv. Atom. & Mol. Phys. 21, 197.
Cohen-Tannoudji, C., 1967, Cargese Lectures in Physics, Vol. 2 (Gordon and Breach, New York) p. 347.
Cooke, W., and T. McIlrath, 1987, J. Opt. Soc. Am. B 4, 701.
Crance, M., 1978, J. Phys. B 11, 1931.
Debethune, J.-L., 1972, Nuovo Cimento B 12, 101.
Delone, N.B., and M. Fedorov, 1989, Sov. Phys.-Usp. 32, 500.
Delone, N.B., and V. Krainov, eds, 1984, Atoms in Strong Light Fields, Springer Series in Chemical Physics, Vol. 28 (Springer, Berlin).
Delone, G.A., N. Manakov, M. Preobrazhenskii and L. Rapoport, 1976, Sov. Phys.-JETP 43, 642.
Ducuing, J., and N. Bloembergen, 1964, Phys. Rev. A 133, 1493.
Eberly, J.H., 1979, Phys. Rev. Lett. 42, 1049.
Eberly, J.H., and P. Lambropoulos, eds, 1978, Multiphoton Processes (Wiley, New York).
Endoh, A., M. Watanabe, N. Sarukura and S. Watanabe, 1989, Opt. Lett. 14, 353.
Faisal, F., ed., 1987, Theory of Multiphoton Processes (Plenum, New York).
Felber, F., 1980, Phys. Fluids 23, 1410.
Ferray, M., L.-A. Lompre, O. Gobert, A. L'Huillier, G. Mainfray, C. Manus, A. Sanchez and A. Gomes, 1990, Opt. Commun. 75, 278.
Fox, R., R. Kogan and E. Robinson, 1971, Phys. Rev. Lett. 26, 1416.
Gavrila, M., 1992, Atoms in Intense Radiation Fields. Adv. Atom. & Mol. Opt. Phys., Suppl. 1 (Academic Press, New York).
Gersten, J., and M. Mittleman, 1976, in: Proc. Int. Symp. on Electron and Photon Interaction with Atoms, eds H. Kleinpoppen and M. McDowell (Plenum, New York) p. 553.
Gibbon, P., 1990, Phys. Fluids B 2, 2196.
Gontier, Y., and M. Trahin, 1973, Phys. Rev. A 7, 2069.
Gontier, Y., and M. Trahin, 1979, J. Phys. B 12, 2123.
Gontier, Y., and M. Trahin, 1980, J. Phys. B 13, 259.
Hora, H., 1975, J. Opt. Soc. Am. 65, 882.
Hora, H., ed., 1981, Physics of Laser-Driven Plasmas (Wiley, New York).
Hora, H., and E. Kane, 1977, Appl. Phys. 13, 165.
Hora, H., E. Kane and J. Hughes, 1978, J. Appl. Phys. 49, 923.
Kaw, P., G. Schmidt and T. Wilcox, 1973, Phys. Fluids 16, 1522.
Kovarskii, V., N. Perel'man and S. Todirashku, 1976, Sov. J. Quantum Electron. 6, 980.
Kulander, K., and A. L'Huillier, 1990, J. Opt. Soc. Am. B 7, 502.
Kurki-Suonio, T., P. Morrison and T. Tajima, 1989, Phys. Rev. A 40, 3230.
Lallemand, P., and N. Bloembergen, 1965, Phys. Rev. Lett. 15, 1010.
Lam, J., and B. Lippmann, 1977, Phys. Fluids 20, 1176.
Lambropoulos, P., 1968, Phys. Rev. 168, 1418.
Lambropoulos, P., 1976, Adv. Atom. & Mol. Phys. 12, 87.
Lambropoulos, P., 1987, Comment. Atom. & Mol. Phys. 20, 199.
Lambropoulos, P., and S. Smith, eds, 1984, Multiphoton Processes (Springer, Berlin).
Lambropoulos, P., C. Kikuchi and R. Osborn, 1966, Phys. Rev. 144, 1081.
Landen, O., M. Perry and E. Campbell, 1987, Phys. Rev. Lett. 59, 2558.
Lecompte, C., G. Mainfray, C. Manus and F. Sanchez, 1974, Phys. Rev. Lett. 32, 265.
Lecompte, C., G. Mainfray, C. Manus and F. Sanchez, 1975, Phys. Rev. A 11, 1009.
L'Huillier, A., L.-A. Lompre, G. Mainfray and C. Manus, 1983, J. Phys. B 16, 1363.
Lin, S.H., 1984, Advances in Multiphoton Processes and Spectroscopy (World Scientific, Singapore).
Lindman, E., and M. Stroscio, 1977, Nucl. Fusion 17, 619.
Lompre, L.-A., G. Mainfray and C. Manus, 1980, J. Phys. B 13, 85.
Lompre, L.-A., G. Mainfray, C. Manus and J. Thebault, 1977, Phys. Rev. A 15, 1604.
Lompre, L.-A., G. Mainfray, C. Manus and J. Thebault, 1978, J. Phys. (France) 39, 610.
Lompre, L.-A., G. Mainfray, C. Manus and J.-P. Marinier, 1981, J. Phys. B 14, 4307.
Luk, T.S., A. McPherson, G. Gibson, K. Boyer and C.K. Rhodes, 1989, Opt. Lett. 14, 1113.
Maine, P., D. Strickland, P. Bado, M. Pessot and G. Mourou, 1988, IEEE J. Quantum Electron. QE-24, 398.
Mainfray, G., 1980, Comment. Atom. & Mol. Phys. 9, 87.
Mainfray, G., 1982, J. Phys. Colloq. (France) 43, C2-367.
Mainfray, G., and P. Agostini, eds, 1991, Multiphoton Processes (CEA Press, Saclay).
Mainfray, G., and C. Manus, 1991, Rep. Progr. Phys. 54, 1333.
Marx, B., J. Simons and L. Allen, 1978, J. Phys. B 11, L273.
Masalov, A., 1976, Sov. J. Quantum Electron. 6, 902.
Masalov, A., 1985, in: Progress in Optics, Vol. XXII, ed. E. Wolf (North-Holland, Amsterdam).
Max, C., 1976, Phys. Fluids 19, 74.
Max, C., J. Arons and A. Langdon, 1974, Phys. Rev. Lett. 33, 209.
Meadors, J., 1966, IEEE J. Quantum Electron. QE-2, 638.
Mittleman, M., 1982, Theory of Laser-Atom Interactions (Plenum, New York).
Mollow, B., 1968, Phys. Rev. 175, 1555.
Morellec, J., D. Normand and G. Petite, 1976, Phys. Rev. A 14, 300.
Morellec, J., D. Normand and G. Petite, 1982, Adv. Atom. & Mol. Phys. 18, 97.
Mori, W., C. Joshi, J. Dawson, D. Forslund and J. Kindel, 1988, Phys. Rev. Lett. 60, 1298.
Mostowski, J., 1976, Phys. Lett. A 56, 87.
Normand, D., M. Ferray, L.-A. Lompre, O. Gobert, A. L'Huillier and G. Mainfray, 1990, Opt. Lett. 15, 1400.
Perry, M., F. Patterson and J. Weston, 1990, Opt. Lett. 15, 381.
Poirier, M., J. Reif, D. Normand and J. Morellec, 1984, J. Phys. B 17, 4135.
Reif, J., M. Poirier, J. Morellec and D. Normand, 1984, J. Phys. B 17, 4151.
Reiss, H., 1972, Phys. Rev. Lett. 29, 1129.
Rhodes, C.K., 1985, Science 229, 1345.
Roberts, J., A. Taylor, P.H. Lee and R. Gibson, 1988, Opt. Lett. 13, 734.
Sanchez, F., 1975, Nuovo Cimento B 27, 305.
Sanchez, F., and C. Lecompte, 1974, Appl. Opt. 13, 1071.
Sauteret, C., D. Husson, G. Thiell, S. Seznec, S. Gary, A. Migus and G. Mourou, 1991, Opt. Lett. 16, 238.
Schmidt, G., and W. Horton, 1985, Comments Plasma Phys. & Controlled Fusion 9, 85.
Seznec, S., C. Sauteret, S. Gary, E. Béchir, J.L. Bocher and A. Migus, 1992, Opt. Commun. 87, 331.
Shen, Y.R., 1967, Phys. Rev. 155, 921.
Shen, Y.R., ed., 1984, The Principles of Nonlinear Optics (Wiley, New York).
Smith, S., and P. Knight, eds, 1988, Multiphoton Processes (Cambridge University Press, Cambridge).
Smith, S., and G. Leuchs, 1987, Adv. Atom. & Mol. Phys. 24, 157.
Sodha, M.S., A.K. Ghatak and V. Tripathi, eds, 1974, Self Focusing of Laser Beams in Dielectrics Plasma and Semiconductors (Tata McGraw-Hill, New Delhi).
Sodha, M.S., A.K. Ghatak and V. Tripathi, 1976, in: Progress in Optics, Vol. XIII, ed. E. Wolf (North-Holland, Amsterdam) p. 169.
Spatschek, K., 1977, J. Plasma Phys. 18, 293.
Sprangle, P., E. Esarey, J. Krall and G. Joyce, 1992, Phys. Rev. Lett. 69, 2200.
Sprangle, P., E. Esarey and A. Ting, 1990, Phys. Rev. A 41, 4463.
Sprangle, P., C.M. Tang and E. Esarey, 1987, IEEE Trans. Plasma Sci. PS-15, 145.
Subbarao, D., and M.S. Sodha, 1979, J. Appl. Phys. 50, 4604.
Sullivan, A., H. Hamster, H. Kapteyn, S. Gordon, W. White, H. Nathel, R. Blair and R. Falcone, 1991, Opt. Lett. 16, 1406.
Sun, G.Z., E. Ott, Y.C. Lee and P. Guzdar, 1987, Phys. Fluids 30, 526.
Svelto, O., 1974, in: Progress in Optics, Vol. XII, ed. E. Wolf (North-Holland, Amsterdam) p. 1.
Taylor, A., C. Tallman, J. Roberts, C. Lester, T. Gosnell, P.H. Lee and G. Kyrala, 1990, Opt. Lett. 15, 39.
Teich, M., and G. Wolga, 1966, Phys. Rev. Lett. 16, 625.
Ting, A., E. Esarey and P. Sprangle, 1990, Phys. Fluids B 2, 1390.
Tomov, I., and A. Chirkin, 1971, Sov. J. Quantum Electron. 1, 79.
Vlasov, S., V. Petrischev and V. Talanov, 1971, Sov. Radiophys. & Quantum Electron. 14, 1062.
Whitham, G., ed., 1974, Linear and Nonlinear Waves (Academic Press, New York).
Yamakawa, K., H. Shiraga, Y. Kato and C. Barty, 1991, Opt. Lett. 16, 1593.
Yu, M.Y., and P. Shukla, 1978, Phys. Rev. A 18, 1591.
Zakharov, V., 1972, Sov. Phys.-JETP 35, 908.
Zakharov, V., and S. Synakh, 1975, Sov. Phys.-JETP 41, 465.
Zoller, P., 1979, Phys. Rev. A 19, 1151.
Zoller, P., 1982, J. Phys. B 15, 2911.
Zoller, P., and P. Lambropoulos, 1980, J. Phys. B 13, 69.
AUTHOR INDEX
A Aarnio, J., 6,51 Abe, M., 42,54 Abu-Mostafa, Y.S.,125, 143 Ackerman, D.A., 29,45,48,51,56,57 Adams, M.J., 6,51 Adams, R., 6,5I Agarwal, G.S., 323,358 Agostini, P., 316,333,358,360 Agranovich, V.M., 281,282,311 Aiki, K., 3,51 Ainslie, B.J., 43, 54 Akhiyezer, A.I., 310,311 Akhmanov, S., 337,344,358 Aksenov, V.P., 208,251,256,262 Alferness, R.C., 4,51,55,57 Allen, L., 333,360 Anderson, D., 340,342,348,358 Anderson, D.B., 27,28,59 Anderson, D.Z., 66, 143 Anthony, P.J., 45,48,56 Antonsen, T., 358,358 Armstrong, L., 332,358 Arons, J., 343,344,360 Arslanbekov,T., 325,358 Asakura, T., 68,143 Athale, R.A., 66,67, 113,143 August, R.R., 27,59 Awwal, A.A.S., 191,200 Axmann, A., 50,53 Azema, A., 5,53
B Baba, T., 6,20-23,25,36,51,52,55 Bado, P., 335,360 Baier, V.N., 310,311 Bakos, J., 315,358 Banakh, V.A., 251,262
Bandrauk, A., 316,358 Barbier, D., 38,52 Barnes, D., 348,350,351,353,354,357,358 Barsukov, K.A., 299,311 Barty, C., 335,361 Baryshevskii, V.G., 310,311 Bean, K.E., 5,52 Beaumont, C., 6,58 Bebb, H., 323,358 Bdechir, E., 335,360 Becker, P.C., 50,55,56 Beguin, A,, 4,56 Belinsky, A.N., 199,200 Belvaux, Y.,190,201 Beran, M.J., 205,208,21 I, 215,227,256, 257,262,263,265,266 Berestetskii,V.B., 301,311 Bernstein,D.R., 207,263 Besieris, I.M., 207,265 Bhattacharya, A.B., 6,25,36,56 Binns, R.A., 190,200 Bismuth, J., 45,52 Bloembergen, N., 323,336,359 Blonder, G.E., 6,29-31,45,46,48,50,51, 53,55,56 Bocher, J.L., 335,360 Bogdankevich, L.S., 281,287,311 Bohr, A., 269,276,31 I Bohr, N., 269,311 Bolotovskii,B.M., 281,311 Bonneda, M., 340,342,348,358 Bonnett, B.R., 36-38,48,57 Borisov, A,, 351,353,354,357,358 Borovskiy,A., 351,353,354,351,358 Botineau, J., 5,53 Boyd, J.T., 5,20,25-28,52,56,59 Boyer, K., 335,351,353,354,357,358,360 Bradley, J.C., 3,55 Brady, D., 84,144 363
Brandi, H., 354,358,358 Brandt, G.B., 25,55 Brown Jr, W.P., 205,211,262 Bruzzese, R., 316,358 Buhl, L.L., 4,51 Bunkin, F.V., 205,260,265 Burkey, B.C., 36,55 Burnett, K., 316,358 Burns, W.K., 14,54 Busch, J.R., 3,4,58 C
Cameron, R.H., 239,262 Campbell, E., 32 1,359 Campbell, S., 113,143 Carney, J.K., 4,52 Carpenter, G.A., 64,143 Casasent, D., 192,201 Caticha, A., 3 10,311 Caulfield, H.J., 193,200 Cervenan, M., 3 17,358 Chang, C.L., 3,52 Chang, S.H., 5,52 Chang, T.N., 320,358 Charnotskii, M.I., 207,262,264 Chavel, P., 65,66, 143 Chen, C.L., 25,27,52 Chen, X.L., 358,358 Chernov, L.A., 205,211,212,262 Chiao, R., 336,358 Chin, S.L., 316,359 Chirkin, A., 323,329,330,361 Chlanska-Macukow, K., 191,200 Choi, K., 11,143 Chow, P.L., 207,229,262 Chrostowski, J., 23,54 Chu, S.I., 3 15,359 Chubachi, N., 27,52 Codona, J.L., 207,262,263 Cohen-Tannoudji, C., 3 19,359 Combemale, Y., 4,56 Connelly, J.M., 191,200 Cooke, W., 316,359 Cramer, H., 168,200 Crance, M., 319,359 Creamer, D.B., 207,262,263 D Dashen, R., 207,246,263 Davies, I.L., 148,201
Davis, J.A., 67, 71, 143 Davis, R.L., 27,52,54 Dawson, J., 351,357,360 De Micheli, M., 5,53 Debethune, J.-L., 323,329,359 Delone, G.A., 3 17,359 Delone, N.B., 3 16,359 Denis, H., 5,6,28,29,3 I, 32,43-45,53,58 Desgranges, E., 6,28,29,31,32,43-45,58 Dickey, F.M., 191,200 Dickinson, A,, 190, 200 Diggavi, S., 25,57 Digonnet, M.J., 49,52 D o h , L.S., 205,211,212,263 Dragone, C., 45,52 Dubovik, V.M., 286,311 Dubovikov, M.M., 207,263 Ducuing, J., 323,359 Dudinov, V.N., 181,200 Duguay, A,, 20-22,35,52 Dunning, G.J., 67, 113, 143, 144 Dutta, S.,27,52
E Eberly, J.H., 316,318,359 Edahiro, T., 33,34,54 Edwards, C.A., 45,52 Eidman, V.Ya., 281,282,285,287,312 Einstein, A., 206,263 Eisele, K., 50,53 Elepov, B.S., 206,264 Endoh, A., 335,359 Ennen, H., 50,53 Erbeia, C., 6,28,29, 31, 32,43-45,58 Erie, M.C., 66,143 Esarey, E., 348,350, 352,355,357,358,360, 361
F Fabian, C.W., 299,309,311 Faisal, F., 3 16,359 Falco, C., 5,53 Fan, C.L., 5,52 Fante, R.L., 205,208,254,263 Farhat, N.H., 65,66,70,77, 113,143,144 Farn, M.W., 192,200 Fedorov, M., 316,359 Felber, F., 340-342,348,359 Fermi, E., 269,311
Ferray, M., 335,359,360 Feynman, R.P., 206,207,220,263 Feyzulin, Z.I., 208,251,263 Findalky, T., 4,53 Fisher, A.D., 71,144 Fisher, H.G., 299,309,311 Flatti, S.M., 206,207,227,262-265 Fleisher, M., 192,200,201 Fleishman, G.D., 31 1,312 Fock, V.A., 205,209,263,264 Fontaine, M., 23,54 Forslund, D., 351, 357,360 Fournier, A., 6,28,29, 31,32,43-45,53,58 Fox, R., 3 17,359 Fradkin, E.S., 207,234,263 Frank, I.M., 269,27 I, 273,274,276, 280-284,289-291,295,298,301,312 Frankenthal, S., 208,227,256,262,263 Frehlich, R.G., 207,262,263 Friedlander, C.B., 66,67,143 Frolov, V.P., 292-294,312 Fujii, H., 68, 143 Fujisawa, Y., 48,54 Fukushima, K., 64,66,143 Furutsu, K., 257,263
G Garibyan, G.M., 299,312 Garmire, E., 336,358 Gary, S., 335,360 Gavrila, M., 316,359 Gel’fand, I.M., 207, 220,263 Georges, A,, 333,358 Gersten, J., 323, 329,359 Ghatak,A.K., 6-8, 12, 15, 16, 18, 19,21,23, 25,36,43,53,55-57,336,360 Gianino, J.L., 190,200 Gibbon, P., 352,359 Gibson, G., 335,360 Gibson, R., 335,360 Gidon, P., 6,28,29,31,32,43-45,52,53,58 Ginzburg, V.L., 270,271,275-277,279-283, 285-288,290-295,298,299,302, 306-310,311,312 Gleine, W., 27,42,53 Gobert, O., 335,359,360 Gochelashvili, K.S., 205-208, 251, 259,260, 263-265 Goell, J.E., 14, 15,27,53 Gold, A., 323,358
Gold, B., 113, 117, 143 Comes, A., 335,359 Gontier, Y., 317-320,332-334,359 Goodman, J.W., 190,192,200,201 Gorbatsevich, A.A., 288,312 Gorman, R.P., 84,143 Gosnell, T., 335, 361 Gottlieb, M., 25,55 Gozani, J., 208,256,257,264,265 Grand, G., 5,28,44,45,53 Grand, S., 28, 29, 31, 32,43-45,58 Green, M., 38,52 Gregory, D.A.,66,71,74, 105, 121, 131, 143, I44 Grossberg, S., 64,143 Grouillet, A.M., 6,28,29, 31, 32,43-45,53, 58
Gu, X.G., 66,144 Gurvich, AS., 205,206,264 Guzdar, P., 345-347,352-354,357,361
H Habara, K., 43,57 Haga, H., 4,54 Hald, A,, 175,200 Hall, D.G., 6,38,53,55 Hanawa, F., 49,54 Handschy, M.A., 66,143 Hashizume, H., 4,53 Hattori, K., 49,50,53,54 Hawking, S.W., 292,312 Haydl, W., 50,53 Heaviside, O., 272, 273,312 Hecht-Nielsen, R., 88, 143 Henry, C.H., 5,6,29-31,45-48,51-58 Henyey, F.S., 207,262,263 Hibbs, A.R., 207,220,263 Hibino, Y., 42,49,54 Hickernell, F.S., 6,27,52,54 Hinton, G.E., 83, 131,144 Ho, T.L., 205,211,262 Hocker, G.B., 14,54 Holmes, J.F., 208, 254,264 Hong, J., 113, 143 Honkanen, S., 6,51 Hopfield, J.J., 64, 66, 81, 143 Hora, H., 336,343,344,359 Homer, J.L., 190,200 Horton, W., 345,360 Hu, M.K., 126, 127,143
Hughes, J., 344,359 Hurst, S.L., 83, 143 Husson, D., 335,360 Hutcheson, L.D., 3,4,52,55 Hutchinson, M., 316,358
I Iga, K., 20-23,36,52,55 Isenor, N., 3 17,358 Ishimaru, A., 205,206,264 Izutsu, M., 4,54 J Jackson, H.E., 20,25-28,52,59 Jacobs, M., 150,155, 163,201 Jacobson, D.C., 50,55,56 Jadot, J.P., 6,28,29,31,32,43-45,53,56,58 Jahan, S.R., 191,200 Jansen, R., 4,56 Javidi, B., 190,200 Jenkins, B.K., 113, I44 Jewell, T.E.,45,48,53,54 Jiang, W., 23,54 Jinguji, K., 6, 18, 19, 33-35,40-42,54,57 Johnson, B.H.,45,47,48,56,58 Johnson, K.M., 66,113,143,144 Joshi, C., 351,357,360 Joyce, G., 358,360 Juday, R.D., 71,144 Jutamulia, S., 66-68,143,144
K Kac, M., 206,207,217,220,221,264 Kahler, S.W., 31 I, 312 Kaminow, I.P., 45,52 Kanada, S., 48,54 Kane, E., 344,359 Karim, M.A., 191,200 Kashyap, R., 43,54 Katkov, V.M., 310,311 Kato, Y., 335,361 Katz, L.E., 29,30,53 Kaul, A.N., 15, 19,55 Kaw, P., 336,339,348,359 Kawachi, M., 5,6, 18, 19, 33-35, 39-43,54, 57-59 Kazarinov, R.F., 5,6,29-3 I , 45-48,53-58 Kellner, A.L., 3,55
Kelvin, Lord, 272,312 Kenan, R.P., 3,4,58 Kendall, D.L., 5,54 Kerr, J.R., 208,254,264 Khmelevtsov, S.S., 205,264 Khokhlov, R., 337,344,358 Kikuchi, C., 323,359 Kikuiri, K., 48,54 Kim, Y.S., 320,358 Kindel, J., 351,357,360 Kirzhnits, D.A., 283,312 Kistler, R.C., 6,29,45,47,48, 50,51-57 Kitagawa, T., 49,50,53,54 Kito, T., 40,54 Kleinknecht, K., 299,309,312 Klyatskin, V.I., 205,207,211,214,231,233, 234,246,253,259,264,266 Knight, P., 3 16,360 Knox, R.M., 14,54 Kobayashi, M., 5,34,49,50,53,54,57-59 Kobayashi, S., 40,54,80, I44 Koch, T.L., 20-22,35,52 Kogan, R., 317,359 Kohonen,T, 64,66,131,134,143 Kokubun, Y., 6,20-23,25,35,36,51,52,55 Kometani, T.Y., 31,55 Kominato, T.,34,41,42,54,55,57 Kon, A.I., 205,264 Kondo, T.,6,55 Kopaev, Yu.V., 288,312 Koren, U., 4,55 Korobkin, V., 351,353,354,357,358 Korotky, S.K., 4,55 Kotelnikov, V.A., 152, 157, 163,200 Kovarskii, V., 332,359 Krainov, V., 316,359 Krall, J., 358, 360 Kravtsov, Yu.A., 206,208,209, 21 1,213, 226,25 I , 263-265 Kryshtal, V.A., 181,200 Kulander, K., 316,359 Kumar, A., I5-20,55,58 Kung, S.Y., 61, 143 Kurdi, B.N., 38,55 Kurki-Suonio, T., 348,350,351,353,354, 357,358,359 Kwo, C.Y., 48,51 Kyrala, G., 335,361 L Laborde, P., 4.56 Lalanne, P., 65,66, 143
Lallemand, P., 336,359 Lam, J., 339,341,342,348,359 Lambropoulos, P., 315-317,323,332,333, 358,359,361 Landau, L.D., 270,274,276,281,312 Landen, O.,321,359 Langdon, A., 343,344,360 Lauaro, P., 114,143 Leadbetter, M.R., 168,200 Lecompte, C., 321,325,326,329,359,360 Lee, H., 66,144 Lee, H.J., 5, 6,29-31,45,48,53,55,56 Lee, J.N., 71,144 Lee, M.H., 208,254,264 Lee, P.H., 335,360,361 Lee, Y.C., 345-347,352-354,357,361 Leger, J.R., 190,200 Lehner, T., 354,358,358 Leonberger, F.J., 3,55 Leontovich, M.A., 205,209,264 Leppihalme, M., 6,51 Lester, C., 335,361 Leuchs, G.,3 15,360 Levenson, M., 333,358 L'Huillier, A., 3 15,3 16,335,359,360 Li, Y.,122,127,143 Lidgard, A., 50,55,56 Lifshitz, E.M.,270,276,281,30I,311,312 Lilly, R.A., 71,I43 Lin, S.H., 3 16,359 Lindman, E., 348,360 Lindmayer, J., 66,143 Lippmann, B., 339,341,342,348,359 Lippmann,R.P.,64,113,114, 117, 131, 135, I43 Liu, H.K., 67,71,143 Lizet, J., 43,58 Lomprd, L.-A., 315,317,319,321,325,333, 335,359,360 Lorenzo, J.P., 6,36,37,48,49,57 Losyakov, V.V., 283,312 Lowenthal, S.,190,201 Lu, T.,66,71,73,77,90,96, 100, 105, 113, 121,131,143,144 Lubberts, G., 36,55 Luk, T.S., 335,351,353,354,357,358,360 Lukin, V.P., 207,264 Lutomirskii, R.F., 208,254,264
M Ma, C., 68, 143 Macaskill, C., 207,208,227,256,264,265
Madden, S.J., 38,52 Mahalanobis, A., 192,201 Mahlab, U., 192,200,201 Maine, P., 335,360 Mainfray, G., 315-317,319,321,325,328, 329,333,335,354,358,358-360 Malarkey, E.C., 3,55 Maloney, W.T., 193,200 Malpass, M.L., I 13, 1 17,143 Manakov, N.,317,359 Manus,C., 315-317,319,321,325,329,333, 335,354,358,358-360 Marcatili, E.A.J., 14, 15, 19,55 Marcuse, D., 6,7,55 Margulis, W., 43,56 Marinier, J.-P., 325,333, 360 Marom, E., 67,144 Martin, J.M., 206,227,264,265 Marx, B.,333,360 Marx, G.E., 3,25,55 Masalov, A,, 323,329,360 Mathieu, X.,4,56 Matsunagu, T.,43,57 Matsuo, S.,5,55 Max, C.,337,339,340,342-344,348,357, 360 Maxwell, G.D., 43,54 Mazar, R.,208,227,256,262,265 McAulay, A.D., 68,143 McCormack, J.S., 35,56 McEliece, R.J., 84,143 McEwan, J.A., 71,144 McIlrath, T., 3 16,359 McPherson, A,, 335,354,358,360 Mead, C.,114,143 Meadors, J., 323,360 Mergerian, D., 3,55 Merz, J.L., 4,55 Migus, A., 335,360 Miller, S.E., 3,55 Mironov, V.L., 205,208,251,254-257,260, 262,264,265 Mittleman, M., 316,323,329,359,360 Miya, T.,6,54 Mollow, B.,323,360 Molyneux, J.E., 205,207,211,265 Mora, P.,358,358 Morellec, J., 315, 320,333,360 Mori, W., 351, 357,360 Morque, A., 44,58 Morrison, P., 351,353,354,359
Moser, F., 36,55 Moss, T.S., 36,56 Mottier, P., 43,44,53,56,58 Mourou, G., 335,360 MU,G.-G., 191,201 Mulatier, L., 28,29, 3 I, 32,43-45,58 Muller, J., 27,42,53 Munk, W.H., 207,263 Myers, W.M., 35,57 N Nakamura, M., 3,51 Nakoma, N., 4,53 Namavar, F., 38,58 Neifeld, M.A., 80, 144 Neumaan, A., 5,20,25-28,52,56,59 Nield, M., 6,58 Nissim, C., 4,56 Nitka, T., 191,200 Normand, D., 315,320,333,335,360 Nosu, K., 41,58 Nourshargh, N., 35,56 Novikov, E.A., 242,265
0 Oda, K., 41,43,57,58 Ohmori, Y.,6, 34, 39,42,43,49, 50,53-57 Okamoto, K., 6,39,56 Okazaki, H., 34,43,55,57 Okuno, M., 43,57 Olsson, N.A., 6,45,47,48,51,53,54,56,58 Ong, T.M., 35.56 Onose, K., 34,43,55,57 Orlowsky, K.J., 5,6,29-31,45,47,48, 53-58 Osborn, R., 323,359 Osterberg, U., 43,56 Ostrowsky, D.B., 4, 5,53,56 Ott, E., 345-347,352-354,357,361 Owechko, Y.,67, 113,143,144
P Paek, E.G., 65-67,143,144 Pagano-Stauffer, L.A., 66,143 Paige, L.J., 85, 144 Pal, B.P., 6-1 1, 13,23,25, 28, 36,56,57 Papoulis, A,, 238,265 Papuchon, M., 3,4,56
Patterson, F., 335,360 Pautienus, R.P., 3,55 Perel'man, N., 332,359 Perry, M., 321,335,359,360 Pessot, M., 335,360 Petermann, K., 38,57 Peterson, K.E., 5,56 Petite, G., 3 15,320,360 Petrischev, V., 339,361 Pfeiffer, L., 20-22,35,52 Philippe, P., 6,28, 29, 3 1, 32,43-45,58 Pitaevskii, L.P., 301,311 Poate, J.M., 50,55,56 Poirier, M., 333,360 Pokasov, V.V., 206,264 Pol, V., 45,48,53,54 Polman, A., 50,55,56 Pomrenke, G., 50,53 Posner, E.C., 84,143 Prata, A,, 65,66,143 Preobrazhenskii, M., 317,359 Prokhorov,A.M., 205,260,265,351,353, 354,357,358 Psaltis, D., 65-67, 70, 77, 80, 84, 113, 143, 144
R Rahman, N., 332,358 Ramaswamy, R.V., 4,56 Ramey, D.A., 5,52 Rand, M.J., 25,56 Rapoport, L., 317,359 Reed, G.T., 38,58 Reiber, L., 4,56 Reif, J., 333,360 Reiss, H., 3 17,360 Renard, S., 6,28,29,31,32,43-45,58 Revol, F., 45,52 Reynolds, S.A., 207,263 Rhodes,C.K., 315,335,351,353,354,357, 358,360 Rice, S.O., 165, 166, 168,201 Richard, F.V., 27,54 Ritter, K.J., 6, 57 Roberts, J., 335,360,361 Robinson, E., 317,359 Robinson, M.G., 113,144 Rodemich, E.R., 84,143 Rolsma, P.D., 71, 144 Romero, L.A., 191,200
Rose, C.M., 207,265 Rosenblatt, F., 66, 131,144 Roy, A.M., 4,56 Rumelhart, D.E., 64,83, 131,144 Ryckebusch, M., 114, I43 Rytov, S.M., 206,209,211,213,226,265 S
Sabel'feld, K.K., 206,237,239,264,265 Saget, J.C., 65,66, 143 Sanchez, A,, 335,359 Sanchez, F., 321,323,325,326,329,359,360 Sarukura, N., 335,359 Sasaki, T., 20-23,36,52,55 Sasamaya, K., 43,57 Sasso, A,, 3 16,358 Sauteret, C., 335,360 Schmidt,G., 336,339,345,348,359,360 Schmidt, K.M., 35,57 Schmidt, R.V., 4,57 Schmidtchen, J., 38,57 Schneider, J., 50,53 Schupert, B., 38,57 Schweizer, P., 44,53 Scotti, R.E., 45,48,56 Seiderman, W., 66,143 Sejnowski, T.J., 84, 143 Sejourne, B., 4,56 Seki, M., 4,53 Seznec, S., 335,360 Shamir, J., 192,200,201 Shani, Y., 6,29,45,47,48,51,53,54,57 Shen, Y.R., 323,336,360 Shenoy, M.R., 20,23,25,53,55 Shi, X.M., 354,358 Shimizu, M., 49, 50,54 Shiraga, H., 335,361 Shiryaev, O., 351,353,354,357,358 Shishov, V.I., 205-207, 21 I, 259,260,264, 265 Shukla, P., 340,361 Shulga, N.F., 310,311 Silva, V.L., 48,51 Simons, J., 333,360 Singh, H., 6,23,25,36,56,57 Smith, S., 315, 316,359,360 Sodha, M.S., 6,57,336,340,360,361 Soffer, B.H., 67,113,143,144 Solem, J., 351, 353, 354, 357,358 Solimeno, S., 3 16,358
Sommerfeld, A., 272,273,312 Soref, R.A., 6,36-38,48,49,57,58 Spatschek, K., 344,357,360 Spears, D.L., 3,55 Spivak, M., 207,265 Splett, A,, 38,57 Sprangle, P., 348, 350, 352, 355, 357, 358, 360,361 Sriram, S., 5,52 Srivastava, R., 4,56 St. Jacques, J.M., 125,143 Starr, E.M., 35,56 Stensland, L., 3,58 Stirk, C.W., 113,143 Storti, G.M., 66, 143 Strakhovenko, V.M., 310,311 Strandley, R.D., 25,27,53,56 Streifer, W., 5, 25,26,57 Strickland, D., 335,360 Stroscio, M., 348,360 Stutius, W., 5,25,26,57 Subbarao, D., 340,361 Sudan, R.N., 358,358 Sueta, T., 4,54 Sugita, A., 39-41,43,49,54,56,57 Sukhorukov, P., 337,344,358 Sumida, S., 35,57 Sun, CJ., 35,57 Sun, G.Z., 345-347,352-354,357,361 Suzuki, S., 39,56 Svelto, O., 336,361 Swift, J.D., 85, 144 Synakh, S., 345,361 Szu, H.H., 66,67,143 T Taboury, J., 65,66, 143 Tai, A.M., 71, 144 Tajima, T., 348, 350, 351, 353, 354, 357,358, 359 Takagi, A., 6,57 Takahashi, H., 39,56 Takato, N., 18, 19,33-35,40-43,54,57,58 Talanov, V., 339,361 Tallman, C., 335,361 Tam, E.C., 71,144 Tamm, I.E., 269,274,276,312 Tamura, S., 6,55 Tang, C.M., 348,350,352,357,361 Tatarskii, V.I., 205-207,209,211-215,217,
226,233,234,237,239,246,253,257-260, 264-266 Taylor, A,, 335,360,361 Taylor, H.F., 3,57 Teague, M.R., 127,144 Teich, M., 323,361 Ter-Mikaelyan, M.L., 270,300-302,312 Terui, H., 34,57 Tewari, R., 6,23, 25,57 Thdbault, J., 317, 319,360 Thiell, G., 335,360 Thyagarajan, K., 6-8, 12, 15, 16, 18,20,21, 23,25,43,53,55,57 Thylen, L., 3,58 Tikhomirov, V.V., 310,311 Timlin, H.A., 5,20,25-28,52 Ting, A,, 355,361 Toba, H., 18, 19,33,34,41,42,57,58 Todirashku, S., 332,359 Toh, S.K., 38,58 Tomov, I., 323, 329, 330,361 Tosunyan, L.A., 286,311 Toulios, P.P., 14,54 Townes, C., 336,358 Trabka, E.A., 36,55 Trahin, M., 317-320,332-334.359 Tripathi, V., 336,360 Tsai, C.S., 3,52 Tsytovich, V.N., 270,277,281,282,286,287, 298,299,302,306-3 10,312 Tur, M., 208,215,256,257,265 U Umeda, J., 3,51 Unruh, W.G., 292,293,312 Urquhart, P., 49,58 Uscinski, B.J., 205,207,208,227,256, 265
Vlasov, S., 339,361 Volkov, B.A., 288,312
W Wagner, E.J., 48,51 Wagner, K., 84,144 Wald, R.M., 293,312 Walther, A,, 212,266 Wang, C.H., 113,144 Wang, J., 68, 143 Wang, X.-M., 191,201 191,201 Wang, Z.-Q., Watanabe, H., 6,51 Watanabe, M., 335,359 Watanabe, S., 335,359 Watrasiewicz, B.M., 190,200 Watson, K.M., 207,263 Weaver, C.S., 190,201 Weiss, B.L., 38,58 Welbourn, D., 6,58 Werner, M., 4,56 Weston, J., 335,360 Wheatley, S., 333,358 Whitham, G., 344,361 Whitman, A.M., 208,227,256,262,263,266 Wiener, N., 206,266 Wilcox, T., 336,339,348,359 Willander, M., 5,58 Williams, R.J., 83, 131, 144 Winder, R.O., 123,144 Wolga, G., 323,361 Woodward, C.E., 3,55 Woodward, Ph.M., 148,201 Wozencraft, J.M., 150,155,163,201 WU,R.W., 5,20,25-28,52 Wu,S.,66,71, 77,90, 113, 143, 144
X V Valette, S., 5,6,28,29, 31,32,43-45,52,53, 56,58 Van der Lugt, A.B., 147, 152,158,189, 190, 201 Varshney, R.K., 15-19,58 Vawter, G.A., 4,55 Venkatesh, S.S., 84,143 Verbeek, B.H., 45,47,58 Verber, C.M., 3,4,58 Vijaya Kumar, B.V.K., 191, 192,200,201
Xu, X., 66,71,77,90, 113,143,144
Y Yaglom, A.M., 207,220,263 Yakushkin, I.G., 206,207,259,260,266 Yamada, Y., 5,34,54,57-59 Yamakawa, K., 335,361 Yamamura, A.A., 80,144 Yang, X., 66,71,73, 74,96, 100, 105, 114, 144
Yao, S.K., 27,59 Yaroslavsky, L.P., 151, 152, 159,163, 165, 170,172,175,179-182, 185, 187, 194, 196- 199,200,201 Yaw, M., 5, 18, 19,33-35,40-42,54,55, 57-59 Yeh, P., 113, 143 Yin, S., 74,144 Young, M., 71,144 Yu, F.T.S., 66,67, 71, 73,74,77,90,96, 100, 105,113,114, 121,127,131,143,144 Yu, J., 66, 144 Yu, M.Y., 340,361
Yuan, Y.R., 4,55 Yura, H.T., 208,254,264,266 Z Zachariasen, F., 207,263 Zakharov, V., 345,361 Zavorotny, V.U., 206,207,217,233,246, 253,257,259,265,266
Zelmon, D.E., 5,20,25-28,52,59 Zhang, L., I 13,144 Zipser, D., 64, 131, 144 Zoller, P., 318,333, 334,361 Zouhir Bahri, 191, 201
SUBJECT INDEX
A
AC Stark shift, 318, 320
adaptive filter, 180
associative memory, 70, 117
- -, holographic, 67
- -, optical, 66
B
back-propagation model, 83, 84
Bayes rule, 150
birefringence, 35
- controller, 39
black hole, 292
Boltzmann distribution, 337, 355
- machine, 64, 131
Bragg cell, 39
- reflection filter, 39, 45-48
- reflector laser, 39
Bremsstrahlung, 294
-, transition, 303, 305-307
Brownian motion, 221, 223
C
central processing unit, 63
Cherenkov cone, 289-291
- counter, 300
- radiation, see Vavilov-Cherenkov effect
coherence function, 206
computer vision, 197
correlator, optical, 147, 148, 189
Coulomb gauge condition, 353
D
diffusion equation, 217
directional coupler, 3, 4, 8, 18
Doppler effect, 269, 288-291, 293, 309, 311
dressed atom, 319
E
effective index method, 14, 15, 18
F
Fabry-Perot etalon, 321
- - resonator, 21
Fourier spectrum, 215
- transform, 215
Frank-Tamm formula, 273
Fresnel lens, 44
G
Gaussian noise, 148, 152, 162
- probability distribution, 149, 154, 222
- random numbers, 240, 258
- - process, 154, 223
Green function, 207, 209, 210, 214, 216, 217, 220, 223, 230, 232, 233, 240, 246, 247
H
Hamming distance, 87, 88, 115-117, 121, 126, 128-130, 142
- net, 113-121, 142
Helmholtz equation, 207
hetero-association model, 96
hologram, 66
holography, 152
Hopfield model, 64, 80-87, 89, 93-96, 109, 111, 113, 114, 123, 126, 131
Huygens-Fresnel method, 259, 261
- - principle, 208
Huygens-Kirchhoff method, 208, 251
- principle, 272
I
integrated optics, 3, 38
interpattern-association model, 89, 94, 111, 142
K
Karhunen-Loeve expansion, 238, 240
Kohonen learning rule, 137
- model, 133, 141
- self-organizing feature map, 64, 131, 141, 142
Kotelnikov’s integral, 164
L
Landau damping, 274
liquid-crystal television, 71
lithography, frequency-doubling, 48
Lorentz gauge, 345, 348
- transformation, 286
M
Mach-Zehnder interferometer, 40-42, 46, 47
- - multiplexer, 45
Markov approximation, 205, 211
matched filter, 152, 158, 160, 163, 165, 179, 189, 191, 193
Minkowski space, 293
Monte Carlo method, 240
multilevel recognition model, 87
multiphoton ionization of atoms, 315, 316, 321, 331, 332
- - rate, 332
multiplexer, frequency division, 39, 42
-, wavelength division, 39
N
neural network, 63-66, 78, 97, 100, 142
- -, adaptive, 72
- -, back-propagation, 84
- -, Hopfield, 121, 122, 127, 129, 130
- -, LCTV, 71, 72, 97, 100
- -, models, 80
- -, multilayer, 83
- -, optical, 69-71, 73, 85, 104, 117, 131
- -, - disc-based, 77, 80
- -, - implementation of, 74
- -, redundant interconnection, 105, 106
neuron, 63, 64, 75, 96, 97
O
optical fiber, 5, 49
- information processing, 147
P
parabolic equation, 205-207, 209, 217
paraxial approximation, 340, 341
Parseval relation, 175
- theorem, 68
path-integral, Feynman, 206, 207, 220
-, phase space, 227
- representation, 217, 225, 229, 241, 242, 244, 251, 261
pattern recognition, 63, 66, 197
perceptron, 64, 113, 131
phase-only filter, 193
photoionization cross-section, 318
photolithography, 39, 40
ponderomotive force, 337, 341, 345, 348, 353
- potential, 351
power spectrum, 156, 157, 177, 180, 184, 197
Poynting vector, 12
R
Rabi frequency, 318
radiative transfer theory, 213
random media, phase conjugating in, 207
- -, wave propagation in, 205, 207, 208
Rayleigh length, 336
- spreading, 349
- wave, 309
S
Schrödinger equation, nonlinear, 353
self-focusing, 335, 338, 342-348, 354, 355, 357
soliton, 338
space-bandwidth product, 101
spatial filtering, 190
- light modulator, 66, 70, 113
spectrum analyzer, 3
synchrotron radiation, 294
T
Tchebyshev’s inequality, 175
Thomson scattering, 307
V
Vavilov-Cherenkov effect, 271, 272, 274-277, 282, 288, 300, 309, 310
- - radiation, 269, 271-277, 279-283, 285, 287, 294, 295, 301, 302, 310
Venn diagram, 90
W
Walsh function, 260
waveguide, channel, 29, 34, 35
-, graded-index, 27
-, multilayer, 20
-, multimode, 34
-, optical, 3, 5, 6, 15, 20, 25, 38
-, planar, 6, 12, 31
-, single-mode, 12, 33
Wiener integral, 220
- path-integral, 224
CUMULATIVE INDEX - VOLUMES I-XXXII
ABELÈS, F., Methods for Determining Optical Parameters of Thin Films II, 249
ABELLA, I. D., Echoes at Optical Frequencies VII, 139
ABITBOL, C. I., see J. J. Clair XVI, 71
ABRAHAM, N. B., P. MANDEL, L. M. NARDUCCI, Dynamical Instabilities and Pulsations in Lasers XXV, 1
AGARWAL, G. S., Master Equation Methods in Quantum Optics XI, 1
AGRANOVICH, V. M., V. L. GINZBURG, Crystal Optics with Spatial Dispersion IX, 235
AGRAWAL, G. P., Single-Longitudinal-Mode Semiconductor Lasers XXVI, 163
ALLEN, L., D. G. C. JONES, Mode Locking in Gas Lasers IX, 179
AMMANN, E. O., Synthesis of Optical Birefringent Networks IX, 123
ARMSTRONG, J. A., A. W. SMITH, Experimental Studies of Intensity Fluctuations in Lasers VI, 211
ARNAUD, J. A., Hamiltonian Theory of Beam Mode Propagation XI, 247
BALTES, H. P., On the Validity of Kirchhoff's Law of Heat Radiation for a Body in a Nonequilibrium Environment XIII, 1
BARABANENKOV, Yu. N., Yu. A. KRAVTSOV, V. D. OZRIN, A. I. SAICHEV, Enhanced Backscattering in Optics XXIX, 65
BARAKAT, R., The Intensity Distribution and Total Illumination of Aberration-Free Diffraction Images I, 67
BARRETT, H. H., The Radon Transform and its Applications XXI, 217
BASHKIN, S., Beam-Foil Spectroscopy XII, 287
BASSETT, I. M., W. T. WELFORD, R. WINSTON, Nonimaging Optics for Flux Concentration XXVII, 161
BECKMANN, P., Scattering of Light by Rough Surfaces VI, 53
BERRY, M. V., C. UPSTILL, Catastrophe Optics: Morphologies of Caustics and their Diffraction Patterns XVIII, 257
BERTOLOTTI, M., see D. Mihalache XXVII, 227
BEVERLY III, R. E., Light Emission From High-Current Surface-Spark Discharges XVI, 357
BJORK, G., see Y. Yamamoto XXVIII, 87
BLOOM, A. L., Gas Lasers and their Application to Precise Length Measurements IX, 1
BOUMAN, M. A., W. A. VAN DE GRIND, P. ZUIDEMA, Quantum Fluctuations in Vision XXII, 77
BOUSQUET, P., see P. Rouard IV, 145
BROWN, G. S., see J. A. DeSanto XXIII, 1
BRUNNER, W., H. PAUL, Theory of Optical Parametric Amplification and Oscillation XV, 1
BRYNGDAHL, O., Applications of Shearing Interferometry IV, 37
BRYNGDAHL, O., Evanescent Waves in Optical Imaging XI, 167
BRYNGDAHL, O., F. WYROWSKI, Digital Holography - Computer-Generated Holograms XXVIII, 1
BURCH, J. M., The Metrological Applications of Diffraction Gratings II, 73
BUTTERWECK, H. J., Principles of Optical Data-Processing XIX, 211
CAGNAC, B., see E. Giacobino XVII, 85
CASASENT, D., D. PSALTIS, Deformation Invariant, Space-Variant Optical Pattern Recognition XVI, 289
CEGLIO, N. M., D. W. SWEENEY, Zone Plate Coded Imaging: Theory and Applications XXI, 287
CHARNOTSKII, M. I., J. GOZANI, V. I. TATARSKII and V. U. ZAVOROTNY, Wave Propagation Theories in Random Media Based on the Path-Integral Approach XXXII, 203
CHRISTENSEN, J. L., see W. M. Rosenblum XIII, 69
CHRISTOV, I. P., Generation and Propagation of Ultrashort Optical Pulses XXIX, 199
CLAIR, J. J., C. I. ABITBOL, Recent Advances in Phase Profiles Generation XVI, 71
CLARRICOATS, P. J. B., Optical Fibre Waveguides - A Review XIV, 327
COHEN-TANNOUDJI, C., A. KASTLER, Optical Pumping V, 1
COLE, T. W., Quasi-Optical Techniques of Radio Astronomy XV, 187
COLOMBEAU, B., see C. Froehly XX, 63
COOK, R. J., Quantum Jumps XXVIII, 361
COURTÈS, G., P. CRUVELLIER, M. DETAILLE, M. SAÏSSE, Some New Optical Designs for Ultra-Violet Bidimensional Detection of Astronomical Objects XX, 1
CREATH, K., Phase-Measurement Interferometry Techniques XXVI, 349
CREWE, A. V., Production of Electron Probes Using a Field Emission Source XI, 223
CRUVELLIER, P., see G. Courtès XX, 1
CUMMINS, H. Z., H. L. SWINNEY, Light Beating Spectroscopy VIII, 133
DAINTY, J. C., The Statistics of Speckle Patterns XIV, 1
DANDLIKER, R., Heterodyne Holographic Interferometry XVII, 1
DATTOLI, G., L. GIANNESSI, A. RENIERI, A. TORRE, Theory of Compton Free Electron Lasers XXXI, 321
DECKER JR, J. A., see M. Harwit XII, 101
DELANO, E., R. J. PEGIS, Methods of Synthesis for Dielectric Multilayer Filters VII, 67
DEMARIA, A. J., Picosecond Laser Pulses IX, 31
DESANTO, J. A., G. S. BROWN, Analytical Techniques for Multiple Scattering from Rough Surfaces XXIII, 1
DETAILLE, M., see G. Courtès XX, 1
DEXTER, D. L., see D. Y. Smith X, 165
DREXHAGE, K. H., Interaction of Light with Monomolecular Dye Layers XII, 163
DUGUAY, M. A., The Ultrafast Optical Kerr Shutter XIV, 161
DUTTA, N. K., J. R. SIMPSON, Optical Amplifiers XXXI, 189
EBERLY, J. H., Interaction of Very Intense Light with Free Electrons VII, 359
ENGLUND, J. C., R. R. SNAPP, W. C. SCHIEVE, Fluctuations, Instabilities and Chaos in the Laser-Driven Nonlinear Ring Cavity XXI, 355
ENNOS, A. E., Speckle Interferometry XVI, 233
FABRE, C., see S. Reynaud XXX, 1
FANTE, R. L., Wave Propagation in Random Media: A Systems Approach XXII, 341
FIORENTINI, A., Dynamic Characteristics of Visual Processes I, 253
FLYTZANIS, C., F. HACHE, M. C. KLEIN, D. RICARD, PH. ROUSSIGNOL, Nonlinear Optics in Composite Materials. I. Semiconductor and Metal Crystallites in Dielectrics XXIX, 321
FOCKE, J., Higher Order Aberration Theory IV, 1
FRANCON, M., S. MALLICK, Measurement of the Second Order Degree of Coherence VI, 71
FREILIKHER, V. D., S. A. GREDESKUL, Localization of Waves in Media with One-Dimensional Disorder XXX, 137
FRIEDEN, B. R., Evaluation, Design and Extrapolation Methods for Optical Signals, Based on Use of the Prolate Functions IX, 311
FROEHLY, C., B. COLOMBEAU, M. VAMPOUILLE, Shaping and Analysis of Picosecond Light Pulses XX, 63
FRY, G. A., The Optical Performance of the Human Eye VIII, 51
GABOR, D., Light and Information I, 109
GAMO, H., Matrix Treatment of Partial Coherence III, 187
GHATAK, A., K. THYAGARAJAN, Graded Index Optical Waveguides: A Review XVIII, 1
GHATAK, A. K., see M. S. Sodha XIII, 169
GIACOBINO, E., B. CAGNAC, Doppler-Free Multiphoton Spectroscopy XVII, 85
GIACOBINO, E., see S. Reynaud XXX, 1
GIANNESSI, L., see G. Dattoli XXXI, 321
GINZBURG, V. L., see V. M. Agranovich IX, 235
GINZBURG, V. L., Radiation by Uniformly Moving Sources. Vavilov-Cherenkov Effect, Doppler Effect in a Medium, Transition Radiation and Associated Phenomena XXXII, 267
GIOVANELLI, R. G., Diffusion Through Non-Uniform Media II, 109
GLASER, I., Information Processing with Spatially Incoherent Light XXIV, 389
GNIADEK, K., J. PETYKIEWICZ, Applications of Optical Methods in the Diffraction Theory of Elastic Waves IX, 281
GOODMAN, J. W., Synthetic-Aperture Optics VIII, 1
GOZANI, J., see M. I. Charnotskii XXXII, 203
GRAHAM, R., The Phase Transition Concept and Coherence in Atomic Emission XII, 233
GREDESKUL, S. A., see V. D. Freilikher XXX, 137
HACHE, F., see C. Flytzanis XXIX, 321
HALL, D. G., Optical Waveguide Diffraction Gratings: Coupling between Guided Modes XXIX, 1
HARIHARAN, P., Colour Holography XX, 263
HARIHARAN, P., Interferometry with Lasers XXIV, 103
HARWIT, M., J. A. DECKER JR, Modulation Techniques in Spectrometry XII, 101
HASEGAWA, A., see Y. Kodama XXX, 205
HEIDMANN, A., see S. Reynaud XXX, 1
HELSTROM, C. W., Quantum Detection Theory X, 289
HERRIOT, D. R., Some Applications of Lasers to Interferometry VI, 171
HUANG, T. S., Bandwidth Compression of Optical Images X, 1
IMOTO, N., see Y. Yamamoto XXVIII, 87
JACOBSON, R., Light Reflection from Films of Continuously Varying Refractive Index V, 247
JACQUINOT, P., B. ROIZEN-DOSSIER, Apodisation III, 29
JAMROZ, W., B. P. STOICHEFF, Generation of Tunable Coherent Vacuum-Ultraviolet Radiation XX, 325
JONES, D. G. C., see L. Allen IX, 179
KASTLER, A., see C. Cohen-Tannoudji V, 1
KHOO, I. C., Nonlinear Optics of Liquid Crystals XXVI, 105
KIELICH, S., Multi-Photon Scattering Molecular Spectroscopy XX, 155
KINOSITA, K., Surface Deterioration of Optical Glasses IV, 85
KITAGAWA, M., see Y. Yamamoto XXVIII, 87
KLEIN, M. C., see C. Flytzanis XXIX, 321
KODAMA, Y., A. HASEGAWA, Theoretical Foundation of Optical-Soliton Concept in Fibers XXX, 205
KOPPELMAN, G., Multiple-Beam Interference and Natural Modes in Open Resonators VII, 1
KOTTLER, F., The Elements of Radiative Transfer III, 1
KOTTLER, F., Diffraction at a Black Screen, Part I: Kirchhoff's Theory IV, 281
KOTTLER, F., Diffraction at a Black Screen, Part II: Electromagnetic Theory VI, 331
KRAVTSOV, Yu. A., Rays and Caustics as Physical Objects XXVI, 227
KRAVTSOV, Yu. A., see Yu. N. Barabanenkov XXIX, 65
KUBOTA, H., Interference Color I, 211
LABEYRIE, A., High-Resolution Techniques in Optical Astronomy XIV, 47
LEAN, E. G., Interaction of Light and Acoustic Surface Waves XI, 123
LEE, W.-H., Computer-Generated Holograms: Techniques and Applications XVI, 119
LEITH, E. N., J. UPATNIEKS, Recent Advances in Holography VI, 1
LETOKHOV, V. S., Laser Selective Photophysics and Photochemistry XVI, 1
LEVI, L., Vision in Communication VIII, 343
LIPSON, H., C. A. TAYLOR, X-Ray Crystal-Structure Determination as a Branch of Physical Optics V, 287
LUGIATO, L. A., Theory of Optical Bistability XXI, 69
MACHIDA, S., see Y. Yamamoto XXVIII, 87
MAINFRAY, G., C. MANUS, Nonlinear Processes in Atoms and in Weakly Relativistic Plasmas XXXII, 313
MALACARA, D., Optical and Electronic Processing of Medical Images XXII, 1
MALLICK, S., see M. Francon VI, 71
MANDEL, L., Fluctuations of Light Beams II, 181
MANDEL, L., The Case For and Against Semiclassical Radiation Theory XIII, 27
MANDEL, P., see N. B. Abraham XXV, 1
MANUS, C., see G. Mainfray XXXII, 313
MARCHAND, E. W., Gradient Index Lenses XI, 305
MARTIN, P. J., R. P. NETTERFIELD, Optical Films Produced by Ion-Based Techniques XXIII, 113
MASALOV, A. V., Spectral and Temporal Fluctuations of Broad-Band Laser Radiation XXII, 145
MAYSTRE, D., Rigorous Vector Theories of Diffraction Gratings XXI, 1
MEESSEN, A., see P. Rouard XV, 77
MEHTA, C. L., Theory of Photoelectron Counting VIII, 373
MEYSTRE, P., Cavity Quantum Optics and the Quantum Measurement Process XXX, 261
MIHALACHE, D., M. BERTOLOTTI, C. SIBILIA, Nonlinear Wave Propagation in Planar Structures XXVII, 227
MIKAELIAN, A. L., M. L. TER-MIKAELIAN, Quasi-Classical Theory of Laser Radiation VII, 231
MIKAELIAN, A. L., Self-Focusing Media with Variable Index of Refraction XVII, 279
MILLS, D. L., K. R. SUBBASWAMY, Surface and Size Effects on the Light Scattering Spectra of Solids XIX, 45
MILONNI, P. W., B. SUNDARAM, Atoms in Strong Fields: Photoionization and Chaos XXXI, 1
MIYAMOTO, K., Wave Optics and Geometrical Optics in Optical Design I, 31
MOLLOW, B. R., Theory of Intensity Dependent Resonance Light Scattering and Resonance Fluorescence XIX, 1
MURATA, K., Instruments for the Measuring of Optical Transfer Functions V, 199
MUSSET, A., A. THELEN, Multilayer Antireflection Coatings VIII, 201
NARDUCCI, L. M., see N. B. Abraham XXV, 1
NETTERFIELD, R. P., see P. J. Martin XXIII, 113
NISHIHARA, H., T. SUHARA, Micro Fresnel Lenses XXIV, 1
OHTSU, M., T. TAKO, Coherence in Semiconductor Lasers XXV, 191
OKOSHI, T., Projection-Type Holography XV, 139
OOUE, S., The Photographic Image VII, 299
OSTROVSKAYA, G. V., Yu. I. OSTROVSKY, Holographic Methods of Plasma Diagnostics XXII, 197
OSTROVSKY, Yu. I., see G. V. Ostrovskaya XXII, 197
OSTROVSKY, Yu. I., V. P. SHCHEPINOV, Correlation Holographic and Speckle Interferometry XXX, 87
OUGHSTUN, K. E., Unstable Resonator Modes XXIV, 165
OZRIN, V. D., see Yu. N. Barabanenkov XXIX, 65
PAL, B. P., Guided-Wave Optics on Silicon: Physics, Technology and Status XXXII, 1
PATORSKI, K., The Self-Imaging Phenomenon and Its Applications XXVII, 1
PAUL, H., see W. Brunner XV, 1
PEGIS, R. J., The Modern Development of Hamiltonian Optics I, 1
PEGIS, R. J., see E. Delano VII, 67
PERINA, J., Photocount Statistics of Radiation Propagating Through Random and Nonlinear Media XVIII, 127
PERSHAN, P. S., Non-Linear Optics V, 83
PETYKIEWICZ, J., see K. Gniadek IX, 281
PICHT, J., The Wave of a Moving Classical Electron V, 351
POPOV, E., Light Diffraction by Relief Gratings: A Macroscopic and Microscopic View XXXI, 139
PORTER, R. P., Generalized Holography with Application to Inverse Scattering and Inverse Source Problems XXVII, 315
PSALTIS, D., see D. Casasent XVI, 289
PSALTIS, D., Y. QIAO, Adaptive Multilayer Optical Networks XXXI, 227
QIAO, Y., see D. Psaltis XXXI, 227
RAYMER, M. G., I. A. WALMSLEY, The Quantum Coherence Properties of Stimulated Raman Scattering XXVIII, 181
RENIERI, A., see G. Dattoli XXXI, 321
REYNAUD, S., A. HEIDMANN, E. GIACOBINO, C. FABRE, Quantum Fluctuations in Optical Systems XXX, 1
RICARD, D., see C. Flytzanis XXIX, 321
RISEBERG, L. A., M. J. WEBER, Relaxation Phenomena in Rare-Earth Luminescence XIV, 89
RISKEN, H., Statistical Properties of Laser Light VIII, 239
RODDIER, F., The Effects of Atmospheric Turbulence in Optical Astronomy XIX, 281
ROIZEN-DOSSIER, B., see P. Jacquinot III, 29
RONCHI, L., see Wang Shaomin XXV, 279
ROSENBLUM, W. M., J. L. CHRISTENSEN, Objective and Subjective Spherical Aberration Measurements of the Human Eye XIII, 69
ROTHBERG, L., Dephasing-Induced Coherent Phenomena XXIV, 39
ROUARD, P., P. BOUSQUET, Optical Constants of Thin Films IV, 145
ROUARD, P., A. MEESSEN, Optical Properties of Thin Metal Films XV, 77
ROUSSIGNOL, PH., see C. Flytzanis XXIX, 321
RUBINOWICZ, A., The Miyamoto-Wolf Diffraction Wave IV, 199
RUDOLPH, D., see G. Schmahl XIV, 195
SAICHEV, A. I., see Yu. N. Barabanenkov XXIX, 65
SAÏSSE, M., see G. Courtès XX, 1
SAITO, S., see Y. Yamamoto XXVIII, 87
SAKAI, H., see G. A. Vanasse VI, 259
SALEH, B. E. A., see M. C. Teich XXVI, 1
SCHIEVE, W. C., see J. C. Englund XXI, 355
SCHMAHL, G., D. RUDOLPH, Holographic Diffraction Gratings XIV, 195
SCHUBERT, M., B. WILHELMI, The Mutual Dependence Between Coherence Properties of Light and Nonlinear Optical Processes XVII, 163
SCHULZ, G., J. SCHWIDER, Interferometric Testing of Smooth Surfaces XIII, 93
SCHULZ, G., Aspheric Surfaces XXV, 349
SCHWIDER, J., see G. Schulz XIII, 93
SCHWIDER, J., Advanced Evaluation Techniques in Interferometry XXVIII, 271
SCULLY, M. O., K. G. WHITNEY, Tools of Theoretical Quantum Optics X, 89
SENITZKY, I. R., Semiclassical Radiation Theory Within a Quantum-Mechanical Framework XVI, 413
SHCHEPINOV, V. P., see Yu. I. Ostrovsky XXX, 87
SIBILIA, C., see D. Mihalache XXVII, 227
SIMPSON, J. R., see N. K. Dutta XXXI, 189
SIPE, J. E., see J. Van Kranendonk XV, 245
SITTIG, E. K., Elastooptic Light Modulation and Deflection X, 229
SLUSHER, R. E., Self-Induced Transparency XII, 53
SMITH, A. W., see J. A. Armstrong VI, 211
SMITH, D. Y., D. L. DEXTER, Optical Absorption Strength of Defects in Insulators X, 165
SMITH, R. W., The Use of Image Tubes as Shutters X, 45
SNAPP, R. R., see J. C. Englund XXI, 355
SODHA, M. S., A. K. GHATAK, V. K. TRIPATHI, Self-Focusing of Laser Beams in Plasmas and Semiconductors XIII, 169
SOROKO, L. M., Axicons and Meso-Optical Imaging Devices XXVII, 109
SPREEUW, R. J. C., J. P. WOERDMAN, Optical Atoms XXXI, 263
STEEL, W. H., Two-Beam Interferometry V, 145
STOICHEFF, B. P., see W. Jamroz XX, 325
STROHBEHN, J. W., Optical Propagation Through the Turbulent Atmosphere IX, 73
STROKE, G. W., Ruling, Testing and Use of Optical Gratings for High-Resolution Spectroscopy II, 1
SUBBASWAMY, K. R., see D. L. Mills XIX, 45
SUHARA, T., see H. Nishihara XXIV, 1
SUNDARAM, B., see P. W. Milonni XXXI, 1
SVELTO, O., Self-Focusing, Self-Trapping, and Self-Phase Modulation of Laser Beams XII, 1
SWEENEY, D. W., see N. M. Ceglio XXI, 287
SWINNEY, H. L., see H. Z. Cummins VIII, 133
TAKO, T., see M. Ohtsu XXV, 191
TANAKA, K., Paraxial Theory in Optical Design in Terms of Gaussian Brackets XXIII, 63
TANGO, W. J., R. Q. TWISS, Michelson Stellar Interferometry XVII, 239
TATARSKII, V. I., V. U. ZAVOROTNYI, Strong Fluctuations in Light Propagation in a Randomly Inhomogeneous Medium XVIII, 204
TATARSKII, V. I., see M. I. Charnotskii XXXII, 203
TAYLOR, C. A., see H. Lipson V, 287
TEICH, M. C., B. E. A. SALEH, Photon Bunching and Antibunching XXVI, 1
TER-MIKAELIAN, M. L., see A. L. Mikaelian VII, 231
THELEN, A., see A. Musset VIII, 201
THOMPSON, B. J., Image Formation with Partially Coherent Light VII, 169
THYAGARAJAN, K., see A. Ghatak XVIII, 1
TONOMURA, A., Electron Holography XXIII, 183
TORRE, A., see G. Dattoli XXXI, 321
TRIPATHI, V. K., see M. S. Sodha XIII, 169
TSUJIUCHI, J., Correction of Optical Images by Compensation of Aberrations and by Spatial Frequency Filtering II, 131
TWISS, R. Q., see W. J. Tango XVII, 239
UPATNIEKS, J., see E. N. Leith VI, 1
UPSTILL, C., see M. V. Berry XVIII, 257
USHIODA, S., Light Scattering Spectroscopy of Surface Electromagnetic Waves in Solids XIX, 139
VAMPOUILLE, M., see C. Froehly XX, 63
VAN DE GRIND, W. A., see M. A. Bouman XXII, 77
VAN HEEL, A. C. S., Modern Alignment Devices I, 289
VAN KRANENDONK, J., J. E. SIPE, Foundations of the Macroscopic Electromagnetic Theory of Dielectric Media XV, 245
VANASSE, G. A., H. SAKAI, Fourier Spectroscopy VI, 259
VERNIER, P. J., Photoemission XIV, 245
WALMSLEY, I. A., see M. G. Raymer XXVIII, 181
WANG SHAOMIN, L. RONCHI, Principles and Design of Optical Arrays XXV, 279
WEBER, M. J., see L. A. Riseberg XIV, 89
WEIGELT, G., Triple-Correlation Imaging in Optical Astronomy XXIX, 293
WELFORD, W. T., Aberration Theory of Gratings and Grating Mountings IV, 241
WELFORD, W. T., Aplanatism and Isoplanatism XIII, 267
WELFORD, W. T., see I. M. Bassett XXVII, 161
WHITNEY, K. G., see M. O. Scully X, 89
WILHELMI, B., see M. Schubert XVII, 163
WINSTON, R., see I. M. Bassett XXVII, 161
WOERDMAN, J. P., see R. J. C. Spreeuw XXXI, 263
WOLTER, H., On Basic Analogies and Principal Differences between Optical and Electronic Information I, 155
WYNNE, C. G., Field Correctors for Astronomical Telescopes X, 137
WYROWSKI, F., see O. Bryngdahl XXVIII, 1
YAMAGUCHI, I., Fringe Formations in Deformation and Vibration Measurements using Laser Light XXII, 271
YAMAJI, K., Design of Zoom Lenses VI, 105
YAMAMOTO, T., Coherence Theory of Source-Size Compensation in Interference Microscopy VIII, 295
YAMAMOTO, Y., S. MACHIDA, S. SAITO, N. IMOTO, T. YANAGAWA, M. KITAGAWA, G. BJORK, Quantum Mechanical Limit in Optical Precision Measurement and Communication XXVIII, 87
YANAGAWA, T., see Y. Yamamoto XXVIII, 87
YAROSLAVSKY, L. P., The Theory of Optimal Methods for Localization of Objects in Pictures XXXII, 145
YOSHINAGA, H., Recent Developments in Far Infrared Spectroscopic Techniques XI, 77
YU, F. T. S., Principles of Optical Processing with Partially Coherent Light XXIII, 221
YU, F. T. S., Optical Neural Networks: Architecture, Design and Models XXXII, 61
ZAVOROTNY, V. U., see M. I. Charnotskii XXXII, 203
ZAVOROTNYI, V. U., see V. I. Tatarskii XVIII, 204
ZUIDEMA, P., see M. A. Bouman XXII, 77