ADVANCES IN IMAGING AND ELECTRON PHYSICS
VOLUME 95
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES/Laboratoire d’Optique Electronique du Centre National de la Recherche Scientifique, Toulouse, France
ASSOCIATE EDITORS
BENJAMIN KAZAN Xerox Corporation Palo Alto Research Center Palo Alto, California
TOM MULVEY Department of Electronic Engineering and Applied Physics Aston University Birmingham, United Kingdom
Advances in Imaging and Electron Physics
EDITED BY
PETER W. HAWKES
CEMES/Laboratoire d’Optique Electronique du Centre National de la Recherche Scientifique, Toulouse, France
VOLUME 95
ACADEMIC PRESS San Diego New York Boston London Sydney Tokyo Toronto
This book is printed on acid-free paper.
Copyright © 1996 by ACADEMIC PRESS, INC. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Academic Press, Inc., A Division of Harcourt Brace & Company, 525 B Street, Suite 1900, San Diego, California 92101-4495
United Kingdom Edition published by Academic Press Limited, 24-28 Oval Road, London NW1 7DX
International Standard Serial Number: 1076-5670
International Standard Book Number: 0-12-014737-8
PRINTED IN THE UNITED STATES OF AMERICA
96 97 98 99 00 01 BB 9 8 7 6 5 4 3 2 1
CONTENTS
CONTRIBUTORS ix
PREFACE xi

Ernst Ruska (1906-1988), Designer Extraordinaire of the Electron Microscope: A Memoir
L. LAMBERT AND T. MULVEY
I. Introduction 3
II. Family Background 4
III. The Technische Hochschule Berlin (1928-1933) 13
IV. The Sudden Collapse of the Knoll Group 18
V. Political and Other Setbacks 20
VI. The Wartime and Postwar Era 26
VII. Intervention by the Soviet Union 28
VIII. Modest New Beginning at Siemens 30
IX. An Interrogation Camp in the United Kingdom and Subsequent Detention as a Prisoner of War in “Dustbin,” Taunus 31
X. An Important Turning Point: From Industry Back to Basic Research 37
XI. The Institut für Elektronenmikroskopie 39
XII. Two Distressing Events 42
XIII. A New Challenge 44
XIV. Extramural Activities 47
XV. Relaxation 48
XVI. The Emeritus Professor 50
XVII. Nobel Prize 53
XVIII. Stockholm 57
XIX. Epilogue 59
XX. Sunset 61
References 61

Electron Field Emission from Atom-Sources: Fabrication, Properties, and Applications of Nanotips
VU THIEN BINH, N. GARCIA, AND S. T. PURCELL
I. Introduction 63
II. Electron Emission from a Metal Surface: Summary of the Basic Results 64
III. Electron Emission from Nanotips 81
IV. Applications 112
V. Conclusions 149
References 150

The Convex Feasibility Problem in Image Recovery
P. L. COMBETTES
I. Introduction 156
II. Mathematical Foundations 161
III. Overview of Convex Set Theoretic Image Recovery 172
IV. Construction of Property Sets 187
V. Solving the Convex Feasibility Problem 199
VI. Numerical Examples 235
VII. Summary 259
Appendix: Acronyms 260
References 261

Spacetime Algebra and Electron Physics
CHRIS DORAN, ANTHONY LASENBY, STEPHEN GULL, SHYAMAL SOMAROO, AND ANTHONY CHALLINOR
I. Introduction 272
II. Spacetime Algebra 275
III. Spinors and the Dirac Equation 283
IV. Operators, Monogenics, and the Hydrogen Atom 297
V. Propagators and Scattering Theory 309
VI. Plane Waves at Potential Steps 315
VII. Tunneling Times 332
VIII. Spin Measurements 339
IX. The Multiparticle STA 347
X. Further Applications 374
XI. Conclusions 379
Appendix: The Spherical Monogenic Functions 380
References 383

Texture Representation and Classification: The Feature Frequency Matrix Approach
HELEN C. SHEN AND DURGESH SRIVASTAVA
I. Introduction 387
II. Representation 390
III. Classification Scheme 402
IV. Conclusions 404
References 406

INDEX 409
CONTRIBUTORS
Numbers in parentheses indicate the pages on which the authors’ contributions begin.

VU THIEN BINH (63), Laboratoire d’Emission Electronique, DPM-URA CNRS, Universite Claude Bernard Lyon 1, 69622 Villeurbanne, France
ANTHONY CHALLINOR (271), MRAO, Cavendish Laboratory, Madingley Road, Cambridge CB3 0HE, United Kingdom
P. L. COMBETTES (155), Department of Electrical Engineering, City College and Graduate School, City University of New York, New York, New York 10031
CHRIS DORAN (271), MRAO, Cavendish Laboratory, Madingley Road, Cambridge CB3 0HE, United Kingdom
N. GARCIA (63), Fisica de Sistemas Pequeños, CSIC, Universidad Autonoma de Madrid, CIII, 28049 Madrid, Spain
STEPHEN GULL (271), MRAO, Cavendish Laboratory, Madingley Road, Cambridge CB3 0HE, United Kingdom
L. LAMBERT (3), Department of Electron Microscopy, Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, D-14195 Berlin (Dahlem), Germany
ANTHONY LASENBY (271), MRAO, Cavendish Laboratory, Madingley Road, Cambridge CB3 0HE, United Kingdom
T. MULVEY (3), Department of Electronic Engineering and Applied Physics, Aston University, Birmingham B4 7ET, United Kingdom
S. T. PURCELL (63), Laboratoire d’Emission Electronique, DPM-URA CNRS, Universite Claude Bernard Lyon 1, 69622 Villeurbanne, France
HELEN C. SHEN (387), Department of Computer Science, The Hong Kong University of Science and Technology, Hong Kong
SHYAMAL SOMAROO (271), MRAO, Cavendish Laboratory, Madingley Road, Cambridge CB3 0HE, United Kingdom
DURGESH SRIVASTAVA (387), Department of Computer Science, The Hong Kong University of Science and Technology, Hong Kong
PREFACE
These Advances were launched in 1948 by Ladislaus Marton, who was among the earliest electron microscopists, having begun building such instruments in 1934. He would have thoroughly enjoyed the first contribution to the present volume, a biographical essay on the late Ernst Ruska, Nobel prize-winning inventor of the electron microscope, by Lotte Lambert and Tom Mulvey. Frau Lambert was Ruska’s secretary for 23 years and is hence well placed to tell us not only about the scientist but also about the man. This most enjoyable account takes us from Ruska’s boyhood in Heidelberg, through his studies at the Technische Hochschule in Munich and in Berlin (against his father’s wishes, for engineering studies were distinctly infra dignitatem), to the long saga of the electron microscope. The difficult wartime years and the horrors of the Soviet “liberation” are portrayed vividly. The postwar years saw the building of new electron microscopes, the widespread recognition of their importance and, finally, the Nobel prize ceremony. At the dinner, Ruska sat next to the Queen of Sweden, who was also born in Heidelberg; her father and Ruska had played together as boys! This chapter is full of little-known information, family photographs provided by Ruska’s widow, and anecdotes from numerous sources. I am delighted to publish it here.

The subsequent chapters cover field emission from nanosources, image restoration, the role of geometric algebra in electron physics, and, lastly, texture representation and classification.

In the first of these subsequent chapters, V. T. Binh, N. Garcia, and S. T. Purcell describe in considerable detail the present state of knowledge about field emission of electrons from atom sources. A long section presents the physics of emission from metal surfaces, after which the authors turn to nanotips. In the last section a number of applications are explored, some for the future, whereas others are already being investigated.
This is a clear and thorough account of this important area.

The problems of image restoration have by no means all been solved, despite the considerable progress made during the past two decades. Set-theoretic methods have proved powerful, and it is these that are explained at length in the contribution by P. L. Combettes. This chapter does indeed form a short monograph on the subject, with a section on the mathematical tools needed, a survey of the approach adopted, and sections on the construction of property sets and the solution of the convex feasibility problem, followed by numerical examples. I have no doubt that this scholarly presentation will be most useful.

The next chapter, too, has the character of a monograph. In it, C. Doran, A. Lasenby, S. Gull, S. Somaroo, and A. Challinor explain convincingly the advantages of using spacetime algebra in electron physics. They show that the use of this algebra simplifies the Dirac theory and that the spacetime formulation of this theory facilitates the analysis of all aspects of electron physics. The text takes us from sections in which the ideas and principles are explained and defined formally, through a discussion of “operators, monogenics, and the hydrogen atom,” to propagators and scattering theory, plane waves at potential steps, tunneling times, spin measurements, and the multiparticle spacetime algebra.

The volume ends with an account by H. C. Shen and D. Srivastava of the feature frequency matrix approach to texture representation and classification. This short but well-illustrated text will be invaluable for newcomers to the subject who wish to acquire a rapid grasp of the ideas.

I conclude, as usual, by thanking all the contributors for the trouble they have taken with the preparation of their chapters, and especially for making sure that their texts are accessible to readers who are not specialists in that particular subject. A list of forthcoming surveys follows, and I can confirm that the next volume, 96, will be the volume chronicling “The Growth of Electron Microscopy,” guest-edited on behalf of the International Federation of Societies of Electron Microscopy by one of my Associate Editors, Tom Mulvey. Its successor is already in production, and volume numbers are indicated in the list where possible.
Peter Hawkes
FORTHCOMING ARTICLES
Nanofabrication (H. Ahmed)
Use of the hypermatrix (D. Antzoulatos)
Image processing with signal-dependent noise (H. H. Arsenault)
The Wigner distribution (M. J. Bastiaans)
Discontinuities and image restoration (L. Bedini, E. Salerno, and A. Tonazzini; vol. 97)
Hexagon-based image processing (S. B. M. Bell)
Microscopic imaging with mass-selected secondary ions (M. T. Bernius)
Modern map methods for particle optics (M. Berz and colleagues)
Cadmium selenide field-effect transistors and display (T. P. Brody, A. van Calster, and J. F. Farrell)
ODE methods (J. C. Butcher)
Electron microscopy in mineralogy and geology (P. E. Champness)
Fuzzy morphology (E. R. Dougherty and D. Sinha)
The study of dynamic phenomena in solids using field emission (M. Drechsler)
Gabor filters and texture analysis (J. M. H. Du Buf)
Miniaturization in electron optics (A. Feinerman)
Liquid metal ion sources (R. G. Forbes)
The critical-voltage effect (A. Fox)
Stack filtering (M. Gabbouj)
Median filters (N. C. Gallagher and E. Coyle)
RF tubes in space (A. S. Gilmour)
Relativistic microwave electronics (V. L. Granatstein)
Quantitative particle modeling (D. Greenspan; vol. 97)
The quantum flux parametron (W. Hioe and M. Hosoya)
Structural analysis of quasicrystals (K. Hiraga)
The de Broglie-Bohm theory (P. Holland)
Formal polynomials for image processing (A. Imiya)
Contrast transfer and crystal images (K. Ishizuka)
Seismic and electrical tomographic imaging (P. D. Jackson, D. M. McCann, and S. L. Shedlock)
Morphological scale-space operations (P. Jackway)
Algebraic approach to the quantum theory of electron optics (R. Jagannathan and S. Khan; vol. 97)
Surface relief (J. J. Koenderink and A. J. van Doorn)
Spin-polarized SEM (K. Koike)
Sideband imaging (W. Krakow)
The recursive dyadic Green’s function for ferrite circulators (C. M. Krowne)
Regularization (A. Lannes)
Near-field optical imaging (A. Lewis)
Vector transformation (W. Li)
SEM image processing (N. C. MacDonald)
STEM holography of magnetic specimens (M. Mankos, M. Scheinfein, and J. C. Cowley)
Electron holography of electrostatic fields (G. Matteucci, G. F. Missiroli, and G. Pozzi)
Electronic tools in parapsychology (R. L. Morris)
The Growth of Electron Microscopy (T. Mulvey, ed.; vol. 96)
The Gaussian wavelet transform (R. Navarro, A. Taberno, and G. Cristobal; vol. 97)
Phase-space treatment of photon beams (G. Nemes)
Image plate (T. Oikawa and N. Mori)
Z-contrast in materials science (S. J. Pennycook)
HDTV (E. Petajan)
The wave-particle dualism (H. Rauch)
Scientific work of Reinhold Rudenberg (H. G. Rudenberg)
Electron holography (D. Saldin)
X-ray microscopy (G. Schmahl)
Accelerator mass spectroscopy (J. P. F. Sellschop)
Applications of mathematical morphology (J. Serra)
Set-theoretic methods in image processing (M. I. Sezan)
Wavelet vector transforms (E. A. B. da Silva and D. G. Sampson; vol. 97)
Focus-deflection systems and their applications (T. Soma)
New developments in ferroelectrics (J. Toulouse)
Electron gun optics (Y. Uchikawa)
Very high resolution electron microscopy (D. van Dyck)
Morphology on graphs (L. Vincent)
Ernst Ruska (1955)
Ernst Ruska (1906-1988), Designer Extraordinaire of the Electron Microscope: A Memoir
L. LAMBERT*
Fritz-Haber-Institut der Max-Planck-Gesellschaft, Department of Electron Microscopy, Faradayweg 4-6, D-14195 Berlin (Dahlem), Germany
AND
T. MULVEY
Department of Electronic Engineering and Applied Physics, Aston University, Birmingham B4 7ET, United Kingdom
I. INTRODUCTION
This memoir is intended to shed some light on Ernst Ruska as a Mensch (a human being), as well as on some little-known details of his life that led him to become the original designer and constructor of the electron microscope and, eventually, the leading pacesetter in the design of high-resolution electron microscopes. Under Max Knoll’s supervision at the Technische Hochschule Berlin, he demonstrated experimentally, for the first time, the basic laws of electron optics. In 1931, Knoll and Ruska together produced the crude but effective transmission electron microscope (TEM) shown in Fig. 5. In 1933, working alone, Ruska built the first TEM to surpass the light microscope in resolving power (Fig. 6). Later, at Siemens in Berlin, in close collaboration with his fellow postgraduate student Bodo von Borries and his medically qualified brother Helmut, he produced the first serially manufactured TEM, in a form suitable for immediate laboratory use. This instrument was to revolutionize, among other things, medical diagnostic practice. The rest of Ruska’s life was devoted essentially to perfecting the technology inherent in his 1933 prototype TEM.

This memoir does not attempt a complete survey of his career; rather, it sets out in broad outline the simple, often idiosyncratic way in which Ruska pursued his early vision of the TEM, single-mindedly, from his undergraduate days until his death. It is uncanny to see how straightforwardly, even stubbornly, Ruska followed his route toward his self-imposed goal, step by step, ignoring pessimistic forecasts about the future role of the TEM from both the knowledgeable and the ignorant. There was no stopping halfway or deviating from the path when he encountered obstacles. In all this activity, he seems not to have been motivated by a desire for prestige or monetary reward; rather, he seemed to be urged forward by some hidden driving force. Few competing designers and manufacturing companies could keep up with him for long. The electron microscope was his passion, or possibly his addiction.

* Secretary to E. Ruska.

II. FAMILY BACKGROUND
Ernst Ruska’s roots are fascinating and rich in contrast, on both his father’s and his mother’s side. Among his forebears were famous artists, high-ranking church dignitaries, teachers and scientists, and a female author. The Ruscas came to Germany from Italy via Switzerland. The oldest verifiable church-register entry in Grafenhausen, in the Black Forest region, is the 1748 marriage certificate of Franciscus Josephus Nicolaus Rusca, the “honestus et perdoctus juvenis, ludimagister.” Evidently, the intellectual gifts of this honorable and very learned young man and teacher were transmitted, undiminished, to later generations. Ernst Ruska’s grandfather was an outstandingly versatile teacher in Grafenhausen, and even more pronounced were the eminent qualifications of Julius Ruska, Ernst’s
father, who was a famous Heidelberg professor, a humanist, a historian of science, and an orientalist. His mother, Elisabeth Merx, was a remarkable, strong, and proud woman. She was the daughter of the distinguished theologian and Geheimrat (privy councillor) Adalbert Merx of Giessen and Heidelberg. The saying went: “Der liebe Gott weiss alles, Geheimrat Merx weiss alles besser” (The good Lord knows everything; Privy Councillor Merx knows everything better). From her line also stem several renowned psychiatrists, as well as the author Louise Aston (1814-1871), a German George Sand and the daughter of a church counsellor and superintendent. She struggled against political despotism and for women’s right to self-realization. Because of her radical lifestyle, she was expelled from Berlin in 1846. Thus both favored and burdened by his heredity, Ernst Ruska was born on Christmas Day 1906 in Heidelberg, the fifth of seven children. He spent his early boyhood in a residential suburb, where his parents owned a two-story house that was outstanding in design for those times, since it had central heating and electric light. There was an atmosphere of science and learning around Ernst Ruska from his earliest childhood. A frequent visitor was Uncle Max, the successful Heidelberg astronomer Max Wolf (1869-1932), who discovered 233 minor planets. He was married to the sister of Ernst’s mother.

FIGURE 1. The Ruska family (around 1913): upper row (from left to right), Walter, Father Ruska, Hans; below, Ernst, Elisabeth, Hedwig, Mother Elisabeth Ruska, Maria, Helmut.

The upbringing of the Ruska children was, according to the custom of the time, austere. Father Ruska was extremely ambitious, in particular with respect to his sons’ achievements at school and their later professional education. He tried to instill in them a sense of duty, diligence, and a critical outlook. He brought them up in the belief that in life everything boils down to discipline, and he set an appropriate example. Julius Ruska was an industrious man who worked untiringly in his two big study rooms: one for his scientific interests, such as the history of chemistry (in particular alchemy), botany, mineralogy, and geology; the other for studies relating to classical philosophy. He followed this line systematically for decades. When there were obstacles in his path, he simply swept them out of the way. When his father-in-law, the old Geheimrat Merx, collapsed and died at the open grave after delivering a funeral address for a friend, it was self-evident to Julius Ruska that it fell to him to complete the important work that Merx had left unfinished. This meant that he had to brush up his knowledge of Arabic and learn two additional languages, Coptic and Armenian, which he did at the age of 42. His motto: per aspera ad astra (through hardship to the stars)! Ernst’s mother, who was deeply religious, brought the children up in Spartan fashion. All extravagances were ruled out. Her motto was: “obedience, modesty, and thrift.” She was a woman of iron will, with outstanding organizing ability. She ruled the roost, towering above this huge household, which also included her own mother, who suffered from chronic tuberculosis and needed care. Even though the grandmother had brought along her personal nurse, much additional work was heaped upon an already daunting domestic task, since laundry and tableware
had to be strictly segregated to avoid infection; care also had to be taken that the children did not approach their grandmother too closely. There were two domestic maids, Hanna and Frieda, the latter a pearl beyond price, who later even moved with the Ruska family from Heidelberg to Berlin and stayed with them until her 75th year. The boys had a very close relationship with Frieda, who always had an open ear for their confidences and childish pleasures. There were three girls in the Ruska family, Elisabeth, Maria, and Hedwig, who received the conventional education for girls at that time. Elisabeth became a teacher. Maria took an examination that allowed her to work in a kindergarten; she married very young and, although on account of her fragile health she should perhaps have remained childless, she gave birth to two children and died soon afterward from tuberculosis. Hedwig had some “Pestalozzi-Froebel” social training and later married Bodo von Borries, who was to play an important role with Ernst in the development of the commercial electron microscope of Siemens and Halske.

FIGURE 2. The four Ruska brothers (from left to right): Helmut, Ernst, Walter, Hans.

There were four wonderful boys in the family, the hope and pride of Julius Ruska. He was an ambitious teacher who had, however, fixed ideas about their future professions. First and foremost, he wanted to give them the best possible education. The eldest son, Hans, a brilliant student at the famous Heidelberg College (Humanistisches Gymnasium), soon showed signs of following in his father’s footsteps. He was gifted both in classical languages and in the natural sciences; at the age of 15 he was avidly devouring philosophical books. But he also played the violin and made beautiful sketches. His sisters and brothers looked up to him with adulation. The second son, Walter, was less gifted academically but more gifted manually; he was a born handyman. After leaving the Realschule he became an engineer, and he evidently did well in industry. His employer, the firm Askania, sent him to America to set up an Askania branch, and he never returned to Germany. Walter Ruska later founded the Ruska Instrument Corporation in Houston, Texas, and developed instruments for oil prospecting. The two youngest sons, Ernst and Helmut, stuck together like burrs from early boyhood. They were two completely normal young rascals,
with nothing but fun in their heads. They played foolish pranks, such as throwing “water bombs” onto the street from the upper floor of the house. Later they did even worse; Ernst was probably the instigator, but the younger Helmut was his willing accomplice. They were wild and always too noisy for Father Ruska in his study. Admonitions to keep quiet and to tread softly in the house so that Father Ruska could work were not always heeded, and so they were often sent out into the street, where they would romp about in a frolicsome and most unrestrained manner until Father Ruska summoned them to his room and made them sit still, back to back, on a low wooden stool for up to an hour. While sitting out their punishment they could do nothing except look at their father’s imposing Zeiss light microscope, which they were not even allowed to touch. But strangely enough, instead of growing frustrated with this instrument, they only wanted all the more to look through it. Ernst was impressed by optical instruments from early boyhood. He was fascinated when he listened to the discussions between Uncle Max, his father, and his godfather, the astronomer August Kopff, about the telescope, with which one can see as far as the stars. At the other extreme, he was equally impressed by his father’s optical microscope for observing the minutest objects. He therefore did not really regard sitting there back to back with Helmut as a punishment; at least he could marvel at this microscope and abandon himself to his dreams. The Ruskas moved like a column of soldiers. Regularly on Sundays, the long file of children would follow Mama Ruska in an orderly procession to church. Father Ruska, however, was far too rational and too skeptical to take part in formal religious observance. His main concern was to bring the children closer to nature and to arouse their scientific curiosity.
So on weekends and during holidays he would marshal all the children for excursions of 6 to 8 hours’ duration into the beautiful Odenwald nearby. Equipped with a botanist’s specimen box and a hammer for selecting and trimming stones, he gave them wayside lectures in botany, geology, mineralogy, and ornithology. The girls, and also Helmut, listened open-mouthed and were delighted when they succeeded in correctly classifying some rock sample or in identifying a bird by its call. When the Ruska children entered elementary school and met other children, they suddenly realized that they did not come from poor parents, as they had previously imagined, and that they had already learned quite a lot at home. In many subjects they were far ahead of their age group, and their knowledge of biology almost equaled that of the teacher. Nonetheless, they were still rather timid and seemed to suffer from something of an inferiority complex. Ernst, on the other hand, hated these expeditions. He did not mind the marching, since he enjoyed physical exercise and exertion. But botany
for hours on end seemed to him far too girlish. Much more fascinating were visits to Uncle Max at the Königstuhl Observatory near Heidelberg to look through the telescopes. Most of all, he longed to design and make something with his own hands, with the aid of his metal construction set. At the outbreak of World War I, Ernst was 8 years old. Times were hard. If the family had previously lived in a Spartan way out of their parents’ conviction, they now did so out of necessity. For years, they all suffered hunger and privation. Only the children had some bread before going to school; the grown-ups had to make do with a few potatoes or beets. After the war, Helmut had to be treated for severe furunculosis together with asthma, which kept him from attending school for a considerable time. Wartime also taught the children to lend a helping hand in the house in order to manage without tradesmen. Walter, the handyman, became a key figure. He repaired and even resoled the shoes of the whole family, with Helmut acting as his apprentice, handing him the tools and nails. Ernst was responsible for painting the window frames and the garden fence. He also repaired the bicycles, at first rather sadly, because he did not yet have a bicycle of his own; his turn was not to come until he was 16. All these tasks were carried out without a murmur, rewarded only by trivial pocket money. The girls, of course, helped with the housework, without receiving any pocket money. It would be misleading, however, to think that Ernst’s childhood was all work and no play. He was strong and healthy enough to get out of life everything he could. During those boring educational walks he would collect sharp stones suitable for later clandestine amusement with his brothers in the street. A favorite game involved the local gas street lamps: whoever could destroy both the incandescent gas mantle and the glass of a lamp with a single throw was declared the winner.
During the war (1914-1918), when the family had four soldiers billeted on them, the boys stole cartridges from the soldiers’ covered wagon, knocked out the gunpowder, and made fireworks with it. By far the most adventurous escapade, however, was masterminded by Walter, who at the end of the war “appropriated” a horse from the returning soldiers and thought he could keep it in the garden! The boys tended to relieve their bad consciences with Frieda, who would scold them but not tell their parents, so that their misdemeanours usually did not come to their father’s ears. But if he did learn of them, he disciplined the boys by thrashing them, as was usual in those days. In 1916, a most tragic event rocked the family to its foundations. Ernst discovered Hans hanged in the attic. He had committed suicide at the age of 16, probably because of adolescent problems together with the bad news he had just heard, that many of his friends and classmates
had been killed in the war. As the youngest in his class, he had not yet been called up for military service. He could not face the future. The family was paralyzed. For years, the family atmosphere was tense and depressed. The parents quarreled, torturing themselves, each blaming the other. Life in the house where Hans had hanged himself became psychologically unbearable for the parents; after four years they sold it and moved into another house nearby. Instead of eleven rooms, they now had only nine. Father Ruska buried himself even more deeply in his work. Another problem at the time was the family’s financial situation. Julius Ruska had given up his secure position as a civil servant to devote himself entirely to his favorite private studies. Part of his wife’s fortune had been invested in war loans and now risked being lost. With wise foresight, Mother Ruska insisted that Julius take up at least a part-time position in a school in order to secure a regular income. Inflation then swallowed up the rest of the fortune. Julius Ruska suffered repeated bouts of deep depression. For a man of his qualifications and ambitions, schoolteaching was, of course, unsatisfying. Only much later, at the age of about 60, when he became director of the Historical Institute of Sciences in Berlin, did he fully recover and enjoy some happy years, until World War II broke out. The tragedy of Hans’s death and the hard times in fact welded the family tightly together. In 1916, at the age of 10, Ernst entered the traditional Heidelberg College (Humanistisches Gymnasium). Father Ruska, who was a member of the teaching staff, kept a sharp eye on him and monitored his progress. It was now up to Ernst to fulfill his father’s hopes, since it had become evident that Walter, the second son, preferred a technical college training and thus would not enter an academic career. Ernst was a bright pupil, and he received his best marks in the natural sciences.
But to the annoyance and distress of his father, who was proficient in seven languages, he had a strong aversion to classical languages. He got on well with French, but he thought it foolish to learn dead languages. Of course, he had the ability to learn Latin and Greek. He was intelligent, but he was more interested in other things, and he wanted to do things he found to his liking. In particular, he did not want to be a substitute for Hans. He was Ernst-totally different from Hans. So at an early stage there were often disagreeable scenes at lunchtime. Father Ruska clearly showed his disappointment when Ernst got low marks in Greek or Latin, and Ernst often shed tears at the table on these occasions. From the age of 12 onward, Ernst developed a passion for electrical switchboards. He constructed increasingly complicated machines with his
ERNST RUSKA: A MEMOIR
metal construction set and carried out interesting experiments. Helmut was a useful apprentice, helping to loosen or tighten screws, but was in fact more interested in what could be done with the finished machines than in building them. With this early boyhood collaboration, a fruitful lifelong cooperation was set in motion, to be crowned later with the Paul-Ehrlich and Ludwig-Darmstaedter Prize, bestowed upon the two brothers in 1970, three years before the untimely death of Helmut. On the occasion of Ernst’s confirmation, his Greek teacher, a close friend and former fellow student of his father, was invited. As an outsider, he took a very realistic view, and it seems that he knew Ernst better than Julius did. As a confirmation gift he presented Ernst with the book Hinter Pflug und Schraubstock (“Behind Plough and Vice”) by the Swabian “poet” engineer Max Eyth (1836-1906). This book proved to be an eye-opener and a treasured possession for Ernst; it both confirmed and fueled his desire to become an engineer. To compensate for his inadequate performance in Greek and Latin, Ernst sought satisfaction in other fields, where he was naturally gifted. He was, for example, very good at gymnastics. He was the only one in the class who was able to perform the “giant swing” on the horizontal bar. Whenever possible, he pursued sport at a level well beyond gymnastics at school. Swimming was his passion. Just to swim around for half an hour, as the other boys did, was not enough for Ernst. He set himself a challenging target, and what mattered to him was to reach it. So, as a pupil of about 16, he would take the train to Neckarhausen, from which he swam back to Heidelberg, a distance of some 18 km. This project took five hours, and he allowed himself a first pause below the railway bridge at Neckargemünd, after swimming some 9 km downriver.
From childhood, Ernst was extremely purposeful, spurred on by some hidden driving force, a zest for action to achieve something. As he was only rarely praised by his parents, he deliberately imposed efforts on himself in order to prove something to himself and to boost his self-confidence. In 1925 Ernst graduated from high school. Crossing the famous Heidelberg Neckar bridge after his last day at school, Ernst opened his satchel and, in exuberant mood, threw its entire contents into the river. This earned him a final slap in the face from his father. In spite of the pained eyebrow-raising and ill-concealed sneers of his parents, who regarded studying at a Technical Highschool as beneath the family dignity, Ernst insisted on studying electrical engineering at the Technische Hochschule in Munich. In a last desperate attempt, his father offered him a probationary half-year at Heidelberg University, but Ernst stubbornly refused. He knew what he wanted, and he set about achieving it. First, he undertook six months’ practical training with the firm Brown, Boveri & Cie in Mannheim to demonstrate his firm decision to become an engineer. After that, he entered the Technische Hochschule in Munich.

FIGURE 3. Ernst Ruska as a student (1926).

Here, the austere training in modest living and thrift continued. Ernst received only a small amount of money monthly; he kept a book in which he noted down, in detail, all expenditures. He lived frugally on milk, bread, and cheese in order to have money left over for his regular visits to concerts and theaters. But often there was a deficit by the 22nd of the month. A particularly meager time was the autumn of 1926, when, just before he went home, his bicycle, loaded with a sack full of laundry to be washed at home, was stolen from the university courtyard. The laundry was replaced by his parents, but as he believed the loss was his fault, and as he had the pride of Lucifer, he stinted himself to pay for the new bicycle, and so for quite some time his daily fodder consisted solely of sour milk and bread. Ruska later reported that he never suffered from such material privation. He was later even grateful to his parents for his severe upbringing, and, in retrospect, he was also grateful for the excellent humanistic education at the gymnasium. Ruska, in turn, sent his own children to a humanistic college in Berlin; only his daughter was allowed to learn French instead of Greek. In 1927 his father received a call to become the head of a newly founded Institute for the History of Sciences in Berlin. It was a matter of course
for Ernst to proceed to Berlin immediately after his preliminary examination in Munich in 1928, partly to reduce his expenses, but also because he had strong ties with his parental home. In time, the memory of the unhappy experiences in his childhood faded. Instead, the more pleasant hours were often remembered, for instance, the weekend musical evenings arranged by his father, who played the piano, accompanied by one of his former students, an excellent violinist, who later married Maria. But above all, Ruska later always liked to remember the many interesting evenings in his father’s “open house” in Berlin. It was a meeting place for a small circle of friends, professors and their assistants, who gathered regularly around Julius Ruska to take part in endless discussions on all kinds of scientific topics. In fact, this change of scenery from Heidelberg to Berlin was in a way beneficial for the whole family, at least for the next decade. Now that Julius Ruska was scientifically recognized, he was far more approachable. In the tranquility of the Berlin Castle, where his new office was located, he was now incredibly productive, devoting himself entirely to the work that had become so dear to him: Islamic culture. Ernst and Helmut began to realize why they had always had to walk quietly in the house. Their father had written numerous books, articles, and notes, 243 publications in all up to 1938. Reading the titles in his publication list, one wonders to which scientific discipline he really belonged. Julius himself soon realized in Berlin that he could justly be proud of Ernst’s achievements at the Technical Highschool. When Helmut, as a medical student, wanted to have his own microscope, his father immediately purchased one for him.
III. THE TECHNISCHE HOCHSCHULE BERLIN (1928-1933)
Ernst Ruska’s decision to continue his studies at the Technische Hochschule Berlin worked out for the best. It was indeed a stroke of luck, because he came to the right place at the right moment to encounter the right people. In 1928, at the end of his summer-term lecture on high-voltage technology, Professor A. Matthias announced his project of setting up a small group of students to develop the Braun tube into an efficient cathode-ray oscillograph for measuring the very fast electrical transients that occur in power stations and open-air high-voltage transmission lines. He asked who would be interested in doing this. Ruska, the only one who showed any enthusiasm, immediately volunteered for this task and was accepted. He thus became the youngest collaborator of the group, which was headed by Max Knoll (1897-1969); Ruska was then 22.
His dream had come true. Here, he could experiment to his heart’s content. Both Matthias and Knoll allowed their students much freedom; even side issues could be pursued. This later turned out to be a wise attitude, since the concept of the electron microscope originated as a byproduct of research aimed at improving the cathode-ray oscillograph. The team consisted of five particularly bright young students who, in a friendly atmosphere, openly discussed all day-to-day problems, so that each benefited from the experience of the others. Today it is easy to say that the idea of an electron microscope was in the air in the late 1920s. This may have been true at a purely intellectual level, but how did the situation present itself to Knoll and Ruska in 1928, and what was there to motivate them? There was, of course, de Broglie’s 1924 wave theory of the electron, but neither Knoll nor Ruska had heard of it at the time. There was Busch’s lens theory, but Busch’s theory did not agree with his own experiments. There was Gabor’s partly iron-cased coil, but Gabor, at the time, could not explain how it worked! There was, in fact, a severe discrepancy between Busch’s theory and his experimental results. This was surprising, since Busch was acknowledged more as an experimenter than as a theoretician. At the TH Berlin, Gabor tended to support Busch’s theory, but did not have any reliable experimental data of his own to check it. It was decided that Ernst Ruska should carry out a crucial experiment to verify or perhaps falsify Busch’s theory. This was a tall order for an undergraduate project (Studienarbeit). Suffice it to say here that Ruska’s investigation confirmed the correctness of Busch’s theory within an experimental error of some 5%, a remarkable achievement with the available technology. Full details of this experiment are set out in Ruska (1979, 1980).
This crucial experiment was later cited by the Nobel Prize Committee as one of the grounds for awarding the Nobel Prize to Ruska. The next step for Knoll and Ruska was to see whether the electron image formed by such an (objective) lens could be further magnified by a second (projector) lens. Figure 4 shows the sketch made by Ernst Ruska on March 9, 1931, of this proposed two-stage arrangement. A photograph of the complete construction is shown in Fig. 5. Although the total magnification was a mere 13 times, the principle of successive imaging, precisely analogous to that of the light microscope, was established experimentally. This simple instrument was a true prototype of all succeeding magnetic electron microscopes. Furthermore, it was fully operational in Berlin well before Rüdenberg’s patent on the electron microscope reached the Berlin Patent Office. This first TEM was designed to illustrate fundamental electron optical principles and not to compete with the light microscope, although this step was not far away. For this reason, it was adequate to
FIGURE 4. Sketch made by Ernst Ruska as a student on March 9, 1931, for the construction of a two-stage magnetic lens column to test the feasibility of a compound electron microscope.
FIGURE 5. The first two-stage electron microscope as it was in 1931 at the Technische Hochschule Berlin. This photograph was actually taken on February 8, 1944. M. Knoll (left) and E. Ruska (right).
use iron-free coils, partly because they were easier to make, but also because their magnetic fields could be calculated analytically. After this success, Ruska had to get down to the work of his thesis, namely, to find a way to make iron-shrouded magnetic lenses with much shorter focal lengths and smaller aberrations than are possible with iron-free coils, since he was planning a magnification of some 12,000 times in his next TEM. In this lens development work, he could build on the experimental work of Gabor, who had previously shown, in the same laboratory, the optical advantages of placing a simple iron casing, but without polepieces, around the lens coil of his experimental high-voltage oscillograph. Knoll himself was an idealist, somewhat ivory-towered in outlook. Ruska was very young and inexperienced, and neither of them was skilled in business affairs. Bodo von Borries, Ruska’s co-doctorand, close friend, and later brother-in-law, was chiefly concerned with oscillographs at the time. Ruska and von Borries had already decided in 1932, as a joint effort, to forget the high-voltage oscillograph and to develop the electron microscope into a trustworthy high-resolution microscope. Between them they devised a lens in which the coil was completely surrounded by iron except for a short axial “air gap,” where tapered iron polepieces concentrated the magnetic flux into a small volume of high flux density. Von Borries was businesslike and persuaded Ernst that, as a first step, they should take out a joint patent privately, while Professor Matthias was on holiday. Matthias subsequently agreed to this arrangement. In fact, on March 17, 1932, von Borries and Ruska were granted German Patent 680284 for an iron-polepiece lens based on the above ideas. This patent was to prove very useful commercially later on, when Siemens and Halske undertook the production of electron microscopes. Competitors such as the Allgemeine Elektrizitäts-Gesellschaft (AEG) were more or less forced to adopt electrostatic lenses, which are indeed simpler to manufacture but more difficult to make work reliably, especially at accelerating voltages higher than some 80 kV. Ruska’s Ph.D. thesis was concerned chiefly with the design and construction of such polepiece lenses for the electron microscope. The design experience that Ruska gained in this investigation was later to give the Siemens Company a decisive head start over all other manufacturers. Max Knoll described the atmosphere of his group in the late 1920s in his address of thanks on the occasion of being nominated an Honorary Member of the German Society of Electron Microscopy in September 1967 (Knoll, 1968). Here is an excerpt:
Ich glaube, wir haben in diesem Laboratorium schon damals einen Typ von Forschungsvorhaben verwirklicht, der heute unter dem Namen “Teamwork” bekannt geworden ist, mit dem Unterschied, daß es sich nicht um fertige Wissenschaftler, sondern um junge Studenten handelte, deren Leistungen, wenn sie ohne intensive Anleitung allein für sich arbeiten, meist wesentlich begrenzter sein müssen.
Ich erkläre mir dies dadurch, daß dabei die Studenten durch den engen Umgang mit ihresgleichen und dem Laborleiter nicht nur physikalisch, sondern auch soziologisch etwas Neues lernten (und zwar rascher als ihre älteren Berufsgenossen), nämlich: Die erkenntnisfördernde Wirkung der Vorurteilslosigkeit nicht nur allen physikalischen Erscheinungen gegenüber, sondern auch den oft logisch schwer verständlichen Verhaltensweisen der Berufskollegen gegenüber. Eine solche in jungen Jahren erworbene vorurteilsfreie Haltung des Wissenschaftlers in der Gruppe ist fast immer die Voraussetzung zum späteren Erfolg, und ich bitte Sie, in diesem Sinne die Laudatio weitergeben zu dürfen an meine damaligen jungen Mitarbeiter, insbesondere an ERNST RUSKA.
I believe that in this laboratory we already realized a kind of research project that is known today as “teamwork,” with the difference that we were dealing not with fully trained scientists but with young students, whose achievements would necessarily be considerably more limited if they were to work by themselves without intensive guidance.
I surmise that this is due to the fact that the students are in close contact with their peers as well as with the head of the laboratory. They thus learn something new not only in physics but also sociologically (in fact, they learn it more rapidly than do their older colleagues), namely: the beneficial effect of freedom from prejudice not only toward all physical phenomena but also toward the attitudes of colleagues, which are often hard to understand rationally. Such a prejudice-free attitude, acquired at an early age within a group of scientists, is almost always a precondition for later success, and I would ask you to allow me, in this sense, to pass on the laudation to my former young collaborators, in particular to ERNST RUSKA.
IV. THE SUDDEN COLLAPSE OF THE KNOLL GROUP
Knoll precipitated a crisis when he left the group in April 1932 to take up a position with Telefunken (Berlin), involving development work in the field of television, in which, at the time, electron-beam technology promised to become of decisive importance. While there, he designed and built the first scanning electron microscope, which he also omitted to patent. Nevertheless, Knoll’s group were the founders of electron microscopy as we know it today. Bodo von Borries had completed his dissertation on March 24, 1932. He also left the group in April of that year, to join the Rheinisch-Westfälisches Elektrizitätswerk in Essen. Ernst was still busy with his thesis, so no further progress was possible with the projected high-magnification TEM. However, as soon as Ernst had submitted his thesis, entitled “A Magnetic Lens for the Electron Microscope,” on August 31, 1933, he worked for a month under full pressure in the Technische Hochschule’s High Voltage Institute at Neubabelsberg, assembling a high-magnification TEM. The design was already in his head, and he used as many parts as possible from the previous column, machining the remaining components himself. The column was designed for an incredible top magnification of 12,000×, compared with the 2000× of a top-quality light microscope (electron microscope designers tend to be optimistic by nature). The design was brilliantly simple; all nonessential features were ruthlessly eliminated in order to save time (Fig. 6). It would have been desirable, even important, to have internal photography, but there was no time to design and make a suitable plate camera. The whole instrument was completed in the record time of three to four weeks of frantic effort. When an image first appeared on the screen, the first difficulty was to find a suitable test specimen for measuring the resolution.
FIGURE 6. First (two-stage) electron microscope magnifying more highly than the light microscope. Cross section of the microscope column (redrawn 1976), with labeled parts including the electron gun (insulator, cathode, anode), water-cooled anode aperture, condenser lens, specimen-changing device, objective lens, intermediate screen, projector lens, external camera, and viewing chamber with final screen.

There were, of course, no EM specimen facilities at that time, but Ernst managed to carburize a cotton thread under the intense electron beam; this reduced its diameter considerably but also strengthened it and stabilized it against further damage. On September 25, an image of such a fiber was obtained at a magnification of 8000×. Accurate measurement of resolution was not easy at the time; the estimated resolution was clearly better than that of the best light microscope, though not by a large margin. There was no doubt, however, that Ernst Ruska was to be regarded as the first to cross the resolution barrier of the light microscope, a limit that the great Ernst Abbe had confidently predicted, in 1876, would never be surpassed by any form of imaging device using either corpuscles or X rays. It was also generally acknowledged by light microscopists at the time that the electron microscope might well have a better resolution than the light microscope, but it seemed useless for examining, for example, biological specimens, which would be damaged both by exposure to the vacuum and by the electron beam. These problems were also clear to Ruska himself, but he characteristically ignored them, being confident that he could overcome them by a better design of the microscope and more attention to preparing the specimen. This view proved to be correct, and Ernst Ruska, with his self-built 1933 TEM, may be regarded as the pathfinder for today’s atomic-resolution electron microscopy in all disciplines. This was also the point when he realized that he needed the expertise and help of his brother Helmut in applying the TEM, especially in the medical field, and that of his co-doctorand Bodo von Borries on the technical and entrepreneurial side in producing the TEM on a commercial scale and introducing it into industrial laboratories. Ernst Ruska held both of them in very high regard. Helmut was an important driving force for Ernst right from the beginning.
Their common pursuits during boyhood, continued in their high-school years with chemical experiments and mathematical problems tackled together, led them to seek professional careers in which each would stimulate and support the other. It was not surprising that Helmut, as a future medical doctor, had a vision of what an electron microscope could mean to medicine, and he wanted to be the first to apply this novel and wonderful instrument in this area. He urged and implored Ernst to continue after setbacks. Helmut’s enthusiastic predictions of success in medicine clearly inspired and convinced Ernst to continue on his chosen path.
V. POLITICAL AND OTHER SETBACKS
The year 1933 had started with Hitler’s accession to power, and Ernst Ruska soon witnessed its immediate effects at the Technische Hochschule
Berlin. His co-doctorand, Martin Freundlich, had to hurry to finish his dissertation and emigrate to England. Ruska was not enthusiastic about this new movement; rather, he felt strong resentment against it, as his correspondence clearly shows. Ruska was in fact now unemployed at a time of industrial depression. He was glad when he could take up a position with the Fernseh AG in Berlin-Zehlendorf in December 1933. Nevertheless, he regarded this as a transitional job to bridge the time until a company could be set up in which the development of a commercial supermicroscope could be realized. Ruska eagerly awaited Bodo von Borries’s return to Berlin for a joint attack on the project. Before this could take place, a suitable firm had to be found that would be interested in taking up production of the electron microscope. From May 1934 to December 1936, Ruska and von Borries trudged from pillar to post seeking industrial support, as described in detail by Ruska (1979, 1980). In countless discussions, open lectures, and business negotiations, they made enormous efforts to interest people in industry and research institutions in the TEM, a microscope with a considerably better resolution than the optical microscope, and to convince them of its importance. One can only marvel at the dogged determination and courage with which these two young men kept up the struggle, not giving in when they encountered negative attitudes, which was often the case. They seemed so young and were often considered cranks. There were genuine doubts, of course, as to the value of the electron microscope even if it could be achieved; and there was the big financial question: would such an instrument pay off scientifically and technically? Helmut contributed right from the start to its final success. He set out his ideas on the operational form that such an instrument should take.
It was also the positive professional assessment by his clinical teacher, the far-sighted Professor Siebeck (1883-1969), Medical Director at the Berlin Charité, that finally convinced two big industrial firms, Carl Zeiss and Siemens, at the end of 1936, to take the risk. For strictly practical reasons, Ruska and von Borries chose Siemens. Siemens already held the patents on the transmission electron microscope, taken out by Rüdenberg, to which could be added the patents on iron polepiece lenses, taken out previously by Ruska and von Borries while they were still research students. Moreover, Siemens had more experience than Zeiss in the generation of high-voltage supplies. In parallel with these exciting events, Ruska also experienced some dramatic changes in his private life. In February 1936, Ernst Ruska met a young girl, Irmela Geigis (Fig. 7), daughter of the bank director Carl Geigis and his wife Anne Geigis, née Fellmann, in Schramberg (Black Forest). Her parents had sent her to Berlin for a year to forestall a planned
premature marriage on the part of their daughter. Irmela Geigis was full of life, refreshingly natural and easy to get on with. She had a passion for opera, where she spent most of her spare evenings. She very much liked the pulsating life of Berlin and the friendly Berliners who always smiled at her. It took her quite some time to realize that their smiles were meant for her looped plaits and her hat decorated with a shaving brush! On their first outing, the day after they met, Ernst explained to her the internal construction of an electron microscope. With a small stick, he drew something like his famous sketch of March 9, 1931, shown in Fig. 4, in the dry sand of the Brandenburg March. After this “scientific rendezvous,” Ernst hurried off to have some of his front teeth crowned before going out again with this beautiful 19-year-old. Ernst had not paid much attention to girls up to then, partly for lack of time, partly because he was rather timid. He had been inclined to believe that he had better not marry. Acquiring a family appeared to him at the time as an external factor hampering his freedom; he was too attached to his work. But now he began to think otherwise. He
FIGURE 7. Irmela Geigis (as Ernst Ruska’s fiancée).
wanted to go out every night, but at the moment this was simply not feasible. The Fernseh-AG demanded his full attention to duty, because they were feverishly preparing for the XIth Olympic Games (1936, Berlin), due to be transmitted live on TV in about 30 public exhibition halls. The Fernseh-AG had installed a Farnsworth TV camera with a special scanner at the Marathon Gate, and Ruska had to repair the Braun oscilloscope tubes, which at the time often suffered fractures. So he often had to work at night to “bake” and process new tubes. Moreover, the important final negotiations with Zeiss and Siemens were now under way. In the spring of 1937, just as Ruska and von Borries were about to start their demanding project of developing a commercial supermicroscope (Übermikroskop) at the Siemens Company, Ruska had to do an eight-week spell of military service. He was assigned as a radio operator to the Fernsprechzug Potsdam 4, Neuer Marstall, where he seemed to cut a poor figure. He completed this service with little enthusiasm and even less ambition, not even obtaining the rank of lance corporal. In his thoughts he was still at Siemens with the current production problems. So it happened during drill practice that he would sometimes go off to the left while the rest of the company marched to the right. Moreover, for some obscure reason, he refused to adhere to the strict rule of addressing his company commander in the old-fashioned form (i.e., the third person), as was usual in the army. After the friendly atmosphere of the Technische Hochschule Berlin and his casual dealings with his Siemens director, he was not inclined to kowtow to some young lieutenant. He did not change his conduct even when the commander started to dress him down. More and more often he had to do punitive press-ups, which he did not seem to mind; he rather regarded them as a useful exercise for keeping fit.
More painful, however, was that a planned weekend trip with his fiancée into the Thuringian Forest was cancelled without explanation, though certainly because of his stubbornness. Straight after this military service, Ernst Ruska married Irmela Geigis. Three months later, Bodo von Borries married Ernst Ruska’s sister, Hedwig. Now began a period of the most hectic activity at Siemens, with Ruska and von Borries sparing no effort to attain their goal as soon as possible. They felt strongly committed to Siemens, which was going to invest millions of marks in their project. So they put themselves under enormous pressure to succeed. Additional stress came from the fact that parallel developments were meanwhile under way in several other places: at the AEG in Berlin, led by E. Brüche, and especially in Canada, where Hillier was developing a prototype electron microscope that would later lead to the manufacture of electron microscopes by RCA in the United States.
FIGURE 8. The successful trio (all in their early 30s): Ernst Ruska (above left), Bodo von Borries (above right), and Helmut Ruska (left).
Often Ruska and von Borries telephoned their wives around 9 p.m. to tell them that yet another night shift was impending, because they were still trying to get a good vacuum. The main reason for calling was usually that they were hungry. So, in turn, Irmela Ruska and Hedwig von Borries, who had both given birth in the meantime, had to prepare some meatballs hurriedly and set out with them for the laboratory. They soon had to learn how to adapt to this hand-to-mouth existence and to put their personal wishes to one side. The main problem in this embryonic phase of the electron microscope was how to achieve a good working vacuum. The mercury pump was heated with Bunsen burners. The direct current (DC) for the lens coils was drawn from a gigantic storage battery in an adjacent room that had to be recharged overnight. The exposed high-tension components above the electron gun at the top of the microscope column were shielded by a kind of earthed hip bath. When the instrument was switched off for any reason, a servomotor cranked up an earthed mushroom-shaped metal electrode that short-circuited the charged condensers in the high-voltage supply, to prevent accidental electrocution of the operator! Thus, the approaching wife with the supply of meatballs and other refreshments was usually welcomed with a flash and a bang! Ruska and von Borries were both skillful and lucky in getting hold of expert collaborators. They secured H. O. Müller and F. Krause, two specialists in microscopy, and the theorist W. Glaser from Prague, together with a few, mostly very young, co-workers, who all contributed effectively to the rapid and successful development of the first supermicroscope. Ernst Ruska and Bodo von Borries, although of strongly contrasting temperaments, made perfect partners in the development of the instrument. But this combination became unbeatable when Helmut Ruska joined them.
He was given the use of a microscope, and he set up a service for examining a wide range of specimens. Not only that, he was able to make strong representations to Ernst and Bodo von Borries for improvements in the daily use of the microscope. No other group of designers or manufacturers could compete with this extraordinary troika of very gifted, highly motivated, hard-working young scientists. The financial and practical support that Siemens put into this project was also quite extraordinary. Helmut took over the first 1939 TEM. He carried out the first systematic, comprehensive investigations of biological objects in the TEM and was able to interpret his remarkable findings to the medical community. In 1940, Siemens set up an Institute for Helmut Ruska to carry out research in electron microscopy together with German and foreign scientists. Helmut thus had an enormous influence in making electron microscopy
26
L. LAMBERT AND T. MULVEY
available to the medical profession throughout the world. No other firm in the world put such an effort into electron microscopy as did Siemens in the early days. The company was soon to profit from its bold initiative. As early as 1938, the first two prototypes had been commissioned. By the end of 1939, the first serially produced Siemens electron microscope had been brought into operation at I.G. Farben-Industrie AG at Hoechst. By the end of the war, to the delight of scientific institutions and the Siemens shareholders (Siemens Company), 35 institutes had been equipped with electron microscopes. More than 200 papers had been published during this fruitful period.
VI. THE WARTIME AND POSTWAR ERA
By 1943, the bombing of Berlin by the Allied air forces had become very severe. To escape the bombs, Irmela Ruska fled from Berlin in the summer of 1943 with her three small children, aged 1, 3, and 5 years, to take refuge with her parents in the Black Forest. Ernst's mother, now paralyzed down one side, could not get down to the air-raid shelter. Ruska was therefore very glad that he could also lodge his own parents under the roof of Irmela's parents, just in time, because soon afterward his parents' house in Berlin burned down. Irmela's father also arranged accommodation nearby for Hedwig von Borries and her small children. At Siemens, the production and delivery of electron microscopes continued in spite of the ever-increasing air raids and the call-up of more and more craftsmen for military service. Five employees stood permanent guard in an outdoor shelter to sound air-raid warnings and to fight fires. Moving the EM department to a safer place was discussed several times, but it was hard to find a place that was both suitable and safe. When Helmut Ruska's research laboratory fell victim to the bombing in October 1944, Siemens managed to evacuate Helmut, complete with two electron microscopes and some of his personnel. They found shelter at the Reichsforschungsanstalt on the island of Riems in the Baltic Sea. But nobody could really escape the ravages of war. The Soviet troops advanced toward Berlin much faster than anyone had imagined possible. In February 1945, Bodo von Borries decided to leave Berlin. In a deeply depressed letter to his wife, Ernst Ruska reflected on where his overriding duty lay: with his political obligations to the state and his profession, or with his family. Knowing that his family was in a relatively safe place, Ruska decided to remain in Berlin. He felt that he could not forsake his co-workers; he felt responsible for his staff. This was not an easy decision for him. On March 8, 1945, he went to Schramberg to see his family and his parents, possibly for the last time, and to discuss with his wife the steps to be taken (financial situation, last will, etc.) in case of his death. With a heavy heart, he returned to Berlin. He knew that his name was on the list for the Volkssturm, stage 2, which meant that he could be called up even in the very last days of the war. Ruska wrote in a letter to his wife: "I would rather be killed on the Siemens site together with my colleagues and collaborators." When Bodo von Borries had found a suitable place to install some of the scientific equipment, Ruska and his people rapidly dismantled two electron microscopes and packed up accessories, components, tools, and almost all of the archive material to be evacuated. From early morning to late at night they loaded two railway wagons up to the roof, so that von Borries, when he came on a last visit to Berlin on March 20, 1945, could accompany them, in a last-minute effort, on their way to Westphalia, where he now lived. By the end of the war, at least some of the instruments and workshop equipment had been distributed over three locations. Nobody could foresee which area would be safe, but Berlin was surely the least safe. Nonetheless, almost all the prominent people at Siemens and most of the personnel decided to stay on in Berlin. On April 21, 1945, when people no longer dared to go out on the street, the Siemens emergency service was officially cancelled, but in the various departments numerous employees voluntarily continued this service and took up residence in the Siemens Works to protect the laboratories as far as possible. Ruska, too, confined himself to his laboratory. His regular home was now occupied by 13 homeless people, and in any case he wanted to do all in his power to maintain his laboratory.
Together with his colleague Dr. Nistler, who had just lost his house, he moved provisionally into the laboratory. Dr. Nistler had some knowledge of Russian; this was to prove helpful for survival after the war. With time on his hands, Ruska started to write a long letter to his wife. As there was no possibility of posting the letter, he continued it daily until June 17, 1945, describing the final phase of the war, the "Twilight of the Gods": incessant raids and gunfire on all sides, no electricity, no radio, and the threatening rumble of the Red Army tanks coming closer every day. The Siemens laboratory was situated in the immediate vicinity of the Spandau Citadel, where extremely heavy fighting took place. Ruska's chronicle gives a vivid eyewitness account of this hell, both physical and psychological: the wait for the encirclement, the battle of Berlin, and the final capitulation. It was a day-to-day survival; one could not indulge in the luxury of worrying about the future. Each day's entry in Ruska's letter began with words reassuring his wife that he was fine and that she must not worry. But in fact he was deeply depressed and emotionally shattered by the separation from his family.
VII. INTERVENTION BY THE SOVIET UNION
The Soviet Army had, of course, its intelligence service. They knew where to find Ruska. On May 4, 1945, a Russian major inspected the Siemens laboratory. Ruska convinced him of the great scientific and industrial importance of the Siemens supermicroscope. One instrument was ready for dispatch, and Ruska pleaded for protection of the building; this was granted. Two days later, however, the laboratory was looted by the Soviet Army. Individual components for some 20 supermicroscopes, all the machinery, and everything removable, including light switches, were confiscated. Ruska's heart broke as he witnessed the destruction of his laboratory: all these valuable instruments and components that had survived the bombs were now simply thrown into boxes and onto wagons. He knew they would never be able to put everything together again. The microscope components were taken to a central depot in Weissensee, ready to be shipped off to Moscow later. One supermicroscope was housed in an AEG building in Oberschöneweide in East Berlin. The next day, Ruska was visited by Colonel Kostrow, Director of the Moscow Electrotechnical Institute, together with a major. They told Ruska that they wanted him to go voluntarily to Moscow with 20 of his specialists to set up and lead an institute for electron microscopy. In the name of the Soviet government, Kostrow assured Ruska that, as a reputable scientist, he would receive special treatment. Ruska then had to hand over a list of all the people who had been working with him in the Siemens EM Department. Ruska negotiated diplomatically, at first cautiously testing his position. He consented in principle to go to Moscow, but demanded the right first to find his family, and the same right for any of his co-workers who would join him. The Russians came every day, either to Ruska's house or to Siemens, to negotiate with him and persuade him to go to Moscow.
To gain time, Ruska delayed matters by making vague promises and gave the impression of working out plans for the Moscow institute. Although the negotiations became tougher each day, Ruska did not immediately recognize the imminent danger of being deported, not even when he was asked to move to East Berlin within two days, together with Dr. Nistler and three more collaborators. He had complete confidence in the verbal promises of Kostrow, a fellow scientist. On June 27, 1945, he narrowly escaped deportation thanks to Ilse H., who was living in his flat at the time. After a Soviet military car carrying GPU (secret service) personnel had repeatedly arrived, demanding threateningly to know where Ruska was, she managed to mobilize neighbors to intercept Ruska on his way home and bring him to an agreed hiding place. Miss H. herself was arrested in the middle of the night and taken away for intense interrogation. By the time she was set free, Ruska's house had been broken into and plundered, and his private scientific papers taken. Two days later, the Soviets began to leave the district of Spandau. On July 1, 1945, British troops marched into the district. As soon as Ruska emerged from his hiding place, his first action was to inform Colonel Kostrow, in a letter of July 2, 1945, that he would not go to Moscow, "because the main condition for successful scientific work is the internal and external liberty of man." Now that he knew he had been marked out for forcible removal from his house, the basis for fruitful collaboration with the Russians no longer existed. In this letter he also complained about the bad behavior of the Russians toward the people living in his house, and demanded the immediate return of his private scientific papers. This was effected at the end of July 1945, when Colonel Kostrow wanted to continue "peaceful meetings" with Ruska at the Academy of Sciences in Buch. Kostrow now tried to win Ruska over, if not for Moscow, then at least for Berlin: Ruska was to leave Siemens to set up and lead an institute in East Berlin. Ruska, of course, refused. He did, however, offer to help in setting up the confiscated Siemens supermicroscope in Oberschöneweide, though not for money, as Kostrow had suggested. In return for his help, Ruska wanted one of the confiscated supermicroscopes returned to Siemens.
This, of course, failed to materialize. But by November 1945 he had at least succeeded in extracting from the Russians a milling machine, a lathe, a drilling machine, a bicycle, and a typewriter! Moreover, he obtained the return of blueprints of design documents and an entry permit for an important Siemens EM specialist who had gone to West Germany at the end of the war and wanted to come back to Berlin. Many such things could be "organized" in the chaotic aftermath of the war. But to achieve all this, Ruska had to rush from one place to another. For months, he slept each night wherever his business had taken him that day. He was also consulted on the setting up of a department of instrumental research at the German Academy of Sciences in Buch (East Berlin), in the hope of getting an order for an instrument. The Four Power Commission had originally decided to dissolve the Kaiser-Wilhelm-Gesellschaft. In East Berlin, however, evidently because they had failed to persuade German scientists to go to Russia, the Soviets immediately reactivated all research institutions in their sector. They reopened the Humboldt University and its institutes, some of which were situated in the American sector, and they reopened the former Prussian Academy of Sciences, including the institutes of the Kaiser-Wilhelm-Gesellschaft situated in Berlin-Buch (East). This prompted the Western Allies to rethink their strategy.
VIII. MODEST NEW BEGINNING AT SIEMENS
As no director was left at Siemens (one did not survive his deportation to the Soviet Union, another had been killed, and a third had committed suicide), Siemens reinstated a recently retired director, Dr. Schwenn. He offered Ruska a new contract at a salary of 400 Reichsmarks, which was the black-market price of a loaf of bread. But Ruska accepted unhesitatingly. Scientific work was, of course, hardly feasible. Ernst Ruska was, however, not the type to sit idly by and wait for better times. He rolled up his sleeves and attacked the problems head on. He was empty-handed, without machines or materials, and mostly with new co-workers. The war had inflicted heavy casualties. In particular, the death of H. O. Müller left a great void in the team. While on an official trip to Saxony, he had been hospitalized with a severe lung infection, and he was killed by enemy action in the hospital on April 24, 1945, when the town of Sachsenburg was occupied. In spite of the difficulties, Ruska immediately set about designing a new and improved electron microscope. An old wooden writing desk served as a makeshift microscope bench. The first two winters after the war were bitterly cold. There was no public transport, and the Citadel bridge leading to the Siemens Wernerwerk had been destroyed, so people had to make long detours on foot over the rubble. The laboratory was unheated; all the windows were nailed up with roofing felt or old radiographs. The staff worked in gloves and woolen caps. They crowded periodically around an iron stove and warmed their feet on heated bricks. The ink froze, so they could write only in pencil. Postal communication was not possible for a long time; one had to find someone prepared to walk to the place where a letter was to be delivered. In this way, Irmela Ruska learned in June 1945 that Ernst was still alive.
In August 1945, Ruska took his first chance to get to the Black Forest to rejoin his family, partly on long marches on foot and partly hitch-hiking on lorries. His mother had died in April 1945; his father, now almost blind, was being cared for by Irmela Ruska and her father. Ernst Ruska was very grateful that the war had at least been merciful to his family. On this trip he also visited Bodo von Borries and his family. He had to cross the zonal border illegally; there was no interzone passport at that time. Ernst did not get the one shown in Fig. 9 until October 1946.
FIGURE 9. First Zonal Travel Permit (1946).
IX. AN INTERROGATION CAMP IN THE UNITED KINGDOM AND SUBSEQUENT DETENTION AS A PRISONER OF WAR IN "DUSTBIN," TAUNUS
In March 1946, Ernst Ruska, like many other German scientists, had to go to Wimbledon in the United Kingdom. There he met D. Gabor, whom he had known since 1928 from the Technische Hochschule Berlin, and he got to know Dr. Sayers and M. E. Haine of the Metropolitan Vickers Company. Ruska was interrogated only once. In England, too, attempts were made to recruit him. In his letters to his wife, Ruska described his stay in Wimbledon as most pleasant. He slept in a big dormitory with 12 other scientists. After the preceding years spent alone in Berlin with all their worries, fears, strains, and considerable privations, his stay in Wimbledon was a veritable convalescence. Meals and laundry were taken care of, and he had a lot of time to think, to read, and to concentrate on the future. Highly stimulating discussions took place on the natural sciences, on moral and ethical questions, and on how to avoid another world catastrophe. Ruska regarded Wimbledon as an experience that enriched his life. In Wimbledon he also became acquainted with peanuts. Later on, at parties, Ruska recounted roguishly how he applied his "excavator grab" in England to seize as many peanuts as possible whenever he was invited to help himself. He had been so famished that his suits were two sizes too big! On his return from Wimbledon, Ruska was held for a further five weeks as a prisoner of war (POW) in the "Dustbin" camp (Cransberg in the Taunus), although he was politically blameless. This prolonged detention worried him very much, as it was time to return to Siemens to fulfil his contract. When he left Berlin, he had been given the written assurance of the Inter-Allied Commission that he would be treated as a guest of the British government and that he would be absent from Berlin for only five weeks. Ruska managed to smuggle out a postcard to his wife telling her what to do to get him out. This seemed to her far too complicated. Instead, she went in person to the American military headquarters in Frankfurt-Hoechst and energetically demanded the immediate release of her husband, threatening not to leave the place until it was effected. Ruska was finally transferred to Stuttgart (within the U.S. occupation zone) on June 5, 1946, after promising that he would not go back to Berlin, where he risked deportation. Ruska briefly visited his family and Bodo von Borries and then once again crossed the border illegally to Berlin. He arrived safely back in Berlin on July 8, 1946, and immediately resumed work at Siemens, where 26 newly hired staff members were awaiting him. Helmut Ruska had fled from the island of Riems at the end of the war.
In October 1945, he and his wife Carla arrived in Westphalia, where they lived under the most primitive conditions above a goat shed. In January 1946, Helmut returned to Berlin to start work at the German Academy of Sciences in Buch. He used an old and decrepit supermicroscope for his virus and protein research and impatiently awaited the first new microscope built by his brother. Helmut often went to Siemens to see how things were going, but also to contribute his requirements and ideas to the concept of the new instrument, which was to be improved and at the same time cheaper. The food situation in Berlin, already poor for quite some time, was even worse after the war. Ernst Ruska made ends meet by parting with his belongings. He started with the most dispensable furnishings, such as a record player, for which he obtained a rabbit ready for slaughter, though the rabbit caused him problems of its own. As a nonsmoker, he could exchange his cigarettes for bread. Dr. Nistler dismantled railway sleepers and Ruska chopped the wood so that they could heat a stove. Helmut, a very practical man, together with the employees of the academy, dug up the sewage fields near the academy and planted sugar beet and potatoes. The academy also housed the Institut für Geschwulstforschung (Institute for Tumor Research), where they worked with experimental animals, among them apes. The employees successfully scrounged food for the animals from the Americans. They received milk powder and, every now and then, some meat, which was divided fairly between "man and beast." At the beginning of August 1946, the first real progress, or rather "Interzone success," could be celebrated at Siemens, when four high-vacuum pumps were delivered from Leybold. In September 1946, the EM section moved into better accommodation in Siemensstadt. Although life was still gray and poor, Ernst Ruska was optimistic. He now had three competent design engineers, and everything went better than he had dared to hope. Of course, there were ups and downs, but the new Übermikroskop 100 was under way and orders had even been placed. In 1947 Ernst Ruska's wife was finally able to return to Berlin. The children had to remain in West Germany; the two elder children lived in a boarding school near Munich, and the youngest stayed with the grandparents. In fact, children could not yet come to Berlin, because the bare necessities of life were still lacking. The Berlin blockade caused a further delay, so the Ruska family was not reunited until 1950. The former close cooperation between Ernst Ruska and Bodo von Borries effectively ended after the war, as von Borries preferred to remain in West Germany. Nobody really wanted to go to Berlin at the time. Cooperation with Siemens from Westphalia, as von Borries had planned, proved impracticable. Bodo von Borries also had other ideas about how the work should continue.
He wanted to concentrate on the production of smaller, modest, and cheap instruments to guarantee immediate sales. But as they had to start from scratch anyway, Siemens and Ruska aimed at a considerably improved high-quality 100-kV instrument. In addition, a 60-kV instrument was designed for more modest purposes, and as the next goal they envisaged an instrument operating at up to 220 kV. In the eyes of Siemens, von Borries's close connection to Leitz, Wetzlar, also stood in the way. Bodo von Borries was Director of the Rheinisch-Westfälisches Institut für Übermikroskopie in Düsseldorf. Unfortunately, another political event soon threatened to undo the progress made so far. On June 24, 1948, the Soviets imposed a blockade on West Berlin. The Western sectors were cut off from their energy supply in the East, and the lights literally went out in West Berlin. Moreover, the surrounding Soviet forces cut the city off from the overland routes. To meet the economic needs of more than 2 million West Berliners, the Western Allies flew in over 1.7 million tons of supplies by means of an airlift, in some 278,000 flights; at the peak of the airlift, a plane landed every one or two minutes. Amazingly, alongside coal for the city, dried milk, dried fruit, and dehydrated potatoes (to save weight), heavy components for Ruska's electron microscope were also flown in. Contrary to all gloomy predictions, this difficult period was also overcome. It was still possible to move freely within Berlin from East to West, and Ernst Ruska still helped out at the Academy of Sciences in Buch (East). In December 1948 he finally gave up, because he realized that work in the East was not efficient and no headway was being made. The few available specialists came over to Siemens to work with Ruska in West Berlin. In February 1949, Ernst Ruska's father died in Schramberg, where he had lived with Irmela Ruska's parents. When Irmela Ruska returned to Berlin in 1947, her father had taken over the round-the-clock nursing of the poor old man. Because of the blockade, Ernst Ruska and his wife could not go to the funeral; there was no possibility of getting out of Berlin for private purposes within a couple of days. This visibly distressed Ernst Ruska, and he remained withdrawn at the institute for quite some time. The unstable living conditions in Berlin caused many scientists to leave as soon as they received a good offer from abroad. The United States in particular was interested in highly qualified German scientists. In the first years after the war, it had posted agents in Germany to look after German scientists. The scientists of the former Kaiser-Wilhelm-Institut für Physikalische Chemie (now Fritz-Haber-Institut der Max-Planck-Gesellschaft) were looked after by Dr. Birman from the United States. The few scientists remaining in Berlin, including Ruska, were employed as "technical consultants" at various scientific institutions.
They were all on a list, kept by the U.S. High Commissioner for Germany, of people to be protected in case of a political crisis. Ruska thus felt safe in Berlin and never considered leaving. Financial benefits never attracted him; the only thing that mattered to him was to be free to improve the microscope. His proper stamping ground was Siemens, now more than ever. In 1948, Karl Friedrich Bonhoeffer, then Director at the Deutsche Forschungshochschule (Kaiser-Wilhelm-Institut für Physikalische Chemie und Elektrochemie) in Dahlem, asked him to lead a small group that had been left leaderless when Hartmut Kallmann went to New York. Ruska was to encourage fundamental research toward the further development of electron microscopes while continuing his work at Siemens. This group was the nucleus of his later institute at the Fritz-Haber-Institut. Helmut Ruska also worked in this Dahlem institute, building up and leading a group for micromorphology.
In February 1949, Bodo von Borries suggested that a German Society for Electron Microscopy be founded, with Ernst Ruska as chairman. Busch was to become the first honorary member of the Society. In his opening speech at the first meeting of the DGE (Deutsche Gesellschaft für Elektronenmikroskopie) (Mosbach, April 23, 1949), Ruska welcomed the opportunity to thank Busch personally and in public for his fine work, which had put him (Ruska) on the right path to the electron microscope (Ruska, 1949).
When the Berlin blockade was lifted on May 12, 1949, rapid progress immediately became visible at Siemens. Ruska could be fully content with his situation there; he was exceptionally privileged in his project. He had an electron optics laboratory of his own, a specimen preparation laboratory, a design office, a test workshop, and his own workshop, all on one floor and immediately to hand, a situation unique at Siemens. Today it is difficult to imagine how new electron microscopes could be delivered as early as 1949. This was possible only through enormous effort. People were thankful to have survived. Everybody was creative and lent a willing hand. A spirit of new beginning was in the air; everything was on the point of emerging. Another reason for this success was that Ernst Ruska was evidently gifted at motivating others. It was a pleasure to work with him. His high work ethic, his motivation, and his zest for achieving his goal were contagious. He swept people along with him. The new development period after the war led in 1954 to the first improved microscope, the "ELMISKOP I," a universal high-resolution electromagnetic transmission microscope. Its practical resolution was 0.8 nm and its theoretical resolution below 0.4 nm, a considerable feat for the time. Germany had thus regained its leading position in the field of electron microscopy, closely followed by Japan. This instrument was Ruska's farewell present to Siemens, which sold 1,000 ELMI I instruments by 1965!
¹ Colloquially called "raisin bomber."
FIGURE 10. "Raisin bomber" landing in Tempelhof among Berlin houses. (Photo: Landesbildstelle Berlin.)
FIGURE 11. The ELMI I (1955). Ernst (right) and Helmut (standing) on a visit from the New York Institute of Health, Albany, to inspect and order an ELMI I for the United States. At the controls, Otto Wolff.
X. AN IMPORTANT TURNING POINT: FROM INDUSTRY BACK TO BASIC RESEARCH
In the meantime, drastic changes loomed large at Siemens, where efforts were being made to reorganize the firm. First, the design office, i.e., Ruska's design office, was to become a service facility accessible to everybody. Neither political crisis, nor the hard times after the war, nor any offer, however favorable, had ever tempted Ruska to leave Siemens. His ties to Siemens were very strong, as long as he was free to continue improving the electron microscope. But now he began to feel like a blacksmith deprived of his hammer. Second, and even worse, Siemens was no longer interested in continuing EM research on a large scale; the company was now interested mainly in the production and sale of routine electron microscopes. For Ruska this meant stagnation. He realized that he could pursue his aim of developing an electron microscope with atomic resolution only in an institute for basic research, free from the restrictions of an industrial research laboratory. A most suitable place seemed to be the Fritz-Haber-Institut of the Max-Planck-Gesellschaft in Berlin-Dahlem, where he already had a foothold. Ruska described his situation to Otto Hahn, then president of the MPG, and inquired about the possibility of founding a larger department or an institute at the FHI for the development of high-resolution microscopes and of the relevant preparation methods for important applications. In particular, he proposed an attempt to improve the resolution limit so as to make atoms visible, at a resolution of 2 Å. Ruska was asked to present his ideas to a committee consisting of Bonhoeffer, Bothe, Butenandt, Heisenberg, and von Laue. The committee found that the plan had a sound physical basis and was most promising for many fields of science. The existing group at the FHI led by Ernst Ruska was immediately expanded considerably.
Ruska finished his current work at Siemens and, after 20 fruitful years, gave up his industrial post there to start a second, 20-year scientific career at the Max-Planck-Gesellschaft. In 1955 he was appointed Scientific Member of the Fritz-Haber-Institut and head of an independent department with optimum working conditions and the full support of Max von Laue, Chief Director of the Institute. Two years later, Ruska became a director at the Fritz-Haber-Institut, and a spacious institute, the Institut für Elektronenmikroskopie (IFE), was built for him in several stages. Again, many of his former co-workers from Siemens followed him to the more academic surroundings of Dahlem as soon as they could. The comprehensive exhibition of electron microscopes held on the occasion of the IVth International Congress of Electron Microscopy, in Berlin in 1958, was in fact staged in this new building.
FIGURE 12. The Institute for Electron Microscopy.
During this decade, while Ruska was still taking three steps at a time, two important electron microscopists died after short but severe illnesses: Bodo von Borries (1956) in Düsseldorf, at the age of only 51, leaving Ruska's sister Hedwig a widow with five adolescent children; and Walter Glaser (1960), aged 54, in Vienna. Hans Busch, by contrast, then in his 70s, was still exceptionally fit for his age and followed keenly any progress in electron microscopy. He lived into his 90th year, and it was only in 1973 that he conceded on his last postcard to Ruska: ". . . ich selbst kann mit meinen fast 90 Jahren auf unserem Gebiet leider nicht mehr mithalten . . ." (". . . now at almost 90, I can unfortunately no longer keep up in our field . . .").
FIGURE 13. Presentation of the symbolic key to Ernst Ruska by Adolf Butenandt, President of the Max Planck Society, at the inauguration of the extension to the Fritz Haber Institute (IFE, library and administration building) on October 9, 1963.
Max Knoll, who had gone to RCA in Princeton, New Jersey, in 1948, returned to Germany when he was offered a chair at the Technische Hochschule Munich in 1956. He attended, of course, the Berlin Congress on Electron Microscopy (1958), and from then on Ruska and Knoll kept in contact. They met in Berlin or Munich whenever the occasion arose.
XI. THE INSTITUT FUR ELEKTRONENMIKROSKOPIE Now that Ruska had his own fine institute and the freedom to carry out basic research, he continued his career, aimed at the attainment of atomic
40
L. LAMBERT AND T. MULVEY
1
diaphragm for limiting irradiated 4
I
i
FIGURE14. The single-field condenser-objective. (a) Glaser’s (1940) theoretical concept of the high-resolution “condenser-objective” lens. The specimen is placed at the center of the gap, where the magnetic field is a maximum. The specimen is inserted by means of a side-entry stage. The upper part of the field acts as a powerful condenser lens; the lower part acts as an equally powerful objective lens. The lens excitation and overall volume are twice as large as that of an ordinary objective. High precision of manufacture is essential. (b) The first practical realisation of such a high-resolution lens (Riecke and Ruska, 1966). Left: The side-entry specimen stage and airlock. Right: One of the specimen movement controls. Two large iron yokes terminating in carefully designed iron polepieces create a high field strength in a small airgap. This type of lens is now a key component in highresolution TEM, STEM, and SEM. (Continued on facing page.)
ERNST RUSKA: A MEMOIR
41
resolution. First, electron optical benches were built and experiments were carried out on a large scale to improve the electron microscope, experimentally and theoretically. For instance, investigations were made to optimize the shape and the material of the polepieces; the electron gun had to be improved in brightness; moreover, the luminescence properties of the final image screen had to be improved. Another concern was the reduction of specimen contamination due to carbon formed from hydrocarbon compounds in the residual gas in the microscope column. The hightension stability, the lens current stability, the mechanical stability, and the shielding from magnetic stray fields all had to be improved. All these shortcomings had, for a long time, been factors limiting the resolution. Now Ruska and his main collaborators, some highly qualified scientists in the field and ample expert staff, worked hard on the realization of all these goals. Within the scope of this memoir, it is not possible to mention all these collaborators by name. When all technical conditions for high resolution had been largely realized, Ruska could finally attack the problem of designing an optimum electron objective lens. W. Glaser had calculated theoretically, some 20 years before, the optical data for the so-called single-field condenser objective; the main feature of this lens is that the first half of the field is used as a condenser and the second half as an objective, thus the specimen is positioned at the field maximum. Glaser’s calculations were based on his mathematical model, the “bell-shaped’’ field, which gave no direct guidance how to design the relevant lens structure. Ruska entrusted
W. D. Riecke with the task of analyzing this lens and testing it on an optical bench. As a result, a single-field condenser-objective lens with very small spherical aberration was designed. This lens was then incorporated in an especially well-designed electron microscope and proved remarkably successful. This microscope, the DEEKO 100 (Durchstrahlungs-Elektronenmikroskop mit Einfeld-Kondensor-Objektiv, i.e., transmission electron microscope with single-field condenser-objective), was described at the Sixth International Congress on Electron Microscopy in Kyoto in 1966 (Riecke and Ruska, 1966). Point resolution in the atomic range immediately came within reach, and the enormous importance of the electron microscope was increasingly recognized. Over the years the design of the single-field condenser-objective lens has been refined in detail, to the point where it has been adopted universally in all forms of electron microscope: TEM, STEM, and SEM. This continuous innovation in design marked Ruska out as a designer of outstanding ability and persistence.
XII. TWO DISTRESSING EVENTS
Once more, two important electron microscopists were soon to leave the scene: Max Knoll and Helmut Ruska. Early in 1969, Ernst Ruska met Max Knoll in Munich and was taken aback by his appearance: the symptoms of his advancing Parkinson’s disease were now clearly visible. In the spring of 1969, Knoll suffered a bout of severe influenza, which must have accelerated his old malady. In June 1969 he had to be admitted to the Psychiatric Department of Munich University. Knoll had no close relatives who could have taken care of him, although he had, in fact, been married three times, to two women! After divorcing his first wife, he married her sister, but in 1947 he remarried his first wife. Knoll was an introverted, sensitive person. After the death of his wife in 1961, a good old friend of his, the widow of a general, took care of his special needs: he was diabetic and had to keep to a diabetic diet, and he was also a vegetarian.
Knoll died under very sad circumstances, suffering terrible fears at the end of his life. Fortunately, this old friend sat by his side in the hospital, day and night, until he passed away on November 6, 1969. Max Knoll did not live to see the unveiling, on September 11, 1975, of a memorial tablet at the Technische Universität Berlin stating: In den anliegenden Räumen wurde im Jahre 1931 von Max Knoll und Ernst Ruska das erste Elektronenmikroskop gebaut und erprobt. [In the adjacent rooms Max Knoll and Ernst Ruska built and tested the first electron microscope in 1931.]
FIGURE 15. Max Knoll and Ernst Ruska on the occasion of Knoll being nominated Honorary Member of the German Society for Electron Microscopy (EM Meeting, September 1967, Marburg).
In 1970, Ernst and Helmut Ruska, at the height of their careers, were awarded the Paul Ehrlich and Ludwig Darmstaedter Prize, the most prestigious medical award in Germany, for the “common work of the engineer and the physician.” This was their finest hour! Three years later, on August 30, 1973, Helmut Ruska, too, passed away after a short but severe illness. This was a heavy blow to Ernst, a very painful loss. Not only did he lose his brother; he lost his best and most reliable friend and an important colleague with whom to discuss problems in electron microscopy. It was most fortunate that Ernst and Helmut, coming from two different fields of science, could, in a common effort and in perfect harmony, open up a dimension in microscopy that was previously unimaginable. The careers of the two brothers diverged only for a short period; they soon teamed up again. Time and again they played the ball back into each other’s court: the biologist stimulated the engineer with his ideas and demands, and vice versa. They were both masters at coordinating their work.
FIGURE 16. Mrs. K. Strobel, Minister for Health, presenting the Paul-Ehrlich and Ludwig-Darmstaedter Prize to Ernst Ruska. Beside E. Ruska stands his brother Helmut (1970). (Photo: Lutz Kleinhans, Frankfurt.)
Helmut wished to be buried in Berlin, because it was in Berlin that he had spent the most fruitful and most important years of his scientific career.
XIII. A NEW CHALLENGE
The award of the Paul Ehrlich Prize was no reason for Ernst Ruska to sit back and rest on his laurels at the age of 64. The many important improvements to the electron microscope achieved so far, including even the sophisticated single-field condenser-objective, had not yet proved sufficient to obtain atomic resolution routinely. A point had been reached where something seemed to be hindering further progress. Electron microscopy has often reached plateaus of this kind, caused, for example, by inadequate specimen preparation, unstable electronic supplies, or astigmatism in the images. In this case, the limit turned out to be external mechanical vibration of the building, which caused a random
disturbance of the image. High resolution could be obtained on favorable occasions, but not routinely. In 1969, it had been proposed that the Institut für Elektronenmikroskopie should be moved to a site where external disturbances from traffic or industry would not impair the image resolution. A disused stone quarry near Baden-Baden in the Black Forest was suggested as a suitable site for a high-resolution TEM. Ruska regarded this idea as far-fetched. He argued that it made no sense to go into the countryside to carry out research. What is the use of rural stillness outside when any laboratory, once it is equipped with machinery and has people working in the vicinity of the microscope, generates its own internal noise? However calm the site might be initially, that calm is soon destroyed by the laboratory itself. He objected strongly to the project on the grounds that it ought to be possible to carry out research in a “normally disturbed” environment. In particular, it had to be possible to design electron microscopes that could be used in big cities, in hospitals, and in universities. Ruska’s counterproposal was to get to the root of the problems and eliminate them. Once more Ruska showed himself a fighter: whenever the wind howled in his face, he felt stronger than ever. In the event, the committee deciding on funding for the Baden-Baden project concluded that it was far too ambitious. It would have required an entirely new infrastructure around the new institute, and there was no university nearby. In the end, Ruska’s arguments won the day. As a result, Ruska obtained further funds in 1970 and was once more very active and successful! He immediately embarked on his new project, “the stabilization and shielding of electron microscopes against external mechanical disturbances.” A new building complex was started as an annex to his institute.
Happily, a plot of land was available only a stone’s throw from his institute and even closer to the Underground station, which would provide challenging vibrations. The new building, with two towers, was to incorporate new antivibration foundations to ensure the extreme stability required by the high-resolution electron microscopes then under construction. It was completed in 1974, shortly before Ruska’s retirement. Characteristically, Ruska attacked the basic problem from several directions. The central task was to suppress mechanical deformation of the objective lens’s iron circuit, which carries the magnetic flux. He therefore enclosed the objective lens of the DEEKO 100 in a protective casing. In addition, a massive tetrahedral support was built to stabilize the whole column of the DEEKO 100. But the most striking measure was the construction of a new vibration-isolating foundation for the DEEKO 100 itself.
FIGURE 17. Ernst Ruska watching the steel cage for the DEEKO column being lowered into the double-walled tower (1974).
The DEEKO 100, weighing about 1 metric ton, was housed in a rigid 20-ton steel cage. To protect it from ground disturbances, the cage was suspended as a pendulum inside a double-walled tower, hanging from the roof of the inner tower by three 10-m-long plastic cables. The outer tower largely shielded the inner tower from ground and air movements. The floor of the microscope room, on which the control console and the operator’s chair stood, was attached to the outer tower
FIGURE 18. The Ernst-Ruska-Bau.
so that it had no connection with the inner tower or with the suspended cage that housed the microscope itself. These combined measures solved the problem of external vibration. His 1931 electron microscope had yielded sharp images at a magnification of about 14×; now, in 1975, atomic resolution at a useful magnification of 800,000× was possible. The DEEKO 100 and its suspension system were indeed unique. It was, of course, an expensive solution, but it did solve the problem. It also alerted other designers to the need for electron microscopes that were inherently insensitive to external vibration. When the Nobel Prize was awarded to Ernst Ruska in 1986, this new building was named the “Ernst-Ruska-Bau.”
XIV. EXTRAMURAL ACTIVITIES
Ruska was not keen on traveling, especially to distant countries. He nevertheless attended and chaired many EM meetings in Europe and accepted invitations to visit the United States, Japan (1956), and China (1957), speaking warmly of the latter country. He admired the philosophical calm, wisdom, and innate courtesy of the Chinese, but also the
industrial progress that was being made, and he was a little annoyed by a high-ranking German politician who, for ideological reasons, dismissed any praise of China with the remark “alles nur Potemkinsche Dörfer” (nothing but Potemkin villages). Likewise, Ruska was not overly enthusiastic about honorary posts such as president or chairman, which usually entail a lot of paperwork and organization. He accepted such tasks as a social duty but was glad when he could eventually delegate them to others. From 1949 to 1971, Ernst Ruska lectured on the fundamentals of electron optics and electron microscopy at both the Free University of Berlin and the Technical University of Berlin. These lectures were an onerous burden for him, and he was relieved when they came to an end. He was always suggesting interesting and demanding topics for diploma and doctoral theses. His main concern was always high-resolution electron microscopy, as the DEEKO and similar instruments show; in addition, however, various independent projects were undertaken and led to remarkable results. For instance, a photoelectron microscope was developed that served as the prototype for a commercial instrument made by Balzers AG. This kind of surface microscopy has since been developed further and successfully applied in surface physics. The students were given great freedom, just as Ruska himself had been as a student at the Technische Hochschule Berlin under Knoll and Matthias. Students were usually assigned to a group led by an expert scientist and were provided with good working conditions. Ruska was not the “hail fellow, well met” type, but rather tended to keep a certain distance. He was recognized as an authority, but at the same time he was popular, even loved, as many photos of institute festivities attest (see Fig. 19). He had a reputation for being witty and was very good at repartee.
XV. RELAXATION
To compensate for stress and too much sedentary work, Ruska indulged his old passion, swimming, whenever the opportunity presented itself, which was, however, fairly seldom. On an institute staff outing to Lake Tegel in Berlin in 1954, for instance, he was, as in his younger days, not content to swim around near the shore, or halfway to the Isle of Scharfenberg and then turn back as the others did; this was not his style. He alone swam over to the island and then around it. The diameter of the island is about 2 km. Swimming round this
island is, incidentally, an annual sports competition for Berlin scholars, many of whom give up halfway. Another opportunity for such excess came each year when Ruska spent his holidays in Murnau, Upper Bavaria, where he liked to swim from Seehausen over to the Isle of Wörth, his wife anxiously awaiting his return. She was, of course, not allowed to accompany him in a boat. Ruska had the chest of an athlete, but in 1971, when he was 65, he finally admitted that he should now give up such excesses. Ruska was basically apolitical. He did, however, hold general political views: he was a liberal. Always knee-deep in work, he was not at all interested in political activity or even in joining a political party. He was glad never to have been pestered politically. He did stand up for his convictions, though, and showed personal courage. At no
time did Ruska believe in the Hitler mythology. When, on Kristallnacht, November 9, 1938, he saw a burning synagogue on his way to Siemens, he was very upset and blurted out, “One day the whole of Berlin will be burning.” Some of his Siemens colleagues and friends criticized him for saying this. A few years later, in 1943, Ruska learned that the Jewish wife and children of a senior Siemens official had been refused entry to the air-raid shelter in their apartment block, which was reserved for Aryans; he spontaneously offered them space in his apartment to move in permanently. His own family and the family on the upper floor had been evacuated, so the newcomers even had a cellar to themselves. Again Ruska ignored the warnings of his friends and colleagues to be more prudent. “Nobody can forbid me to have friends in my house,” he argued. The Jewish woman in fact belonged to the small group of “privileged” Jews who were protected by being married to Aryans.² When her husband eventually complained to higher authority, they were given permission to use the public shelter, but they preferred to stay in Ruska’s house, partly because it was situated near woodland and was thus somewhat safer from bombs. Ernst Ruska was sometimes criticized for overreacting to incorrect statements in the literature on the electron microscope, although his own position in electron microscopy was utterly secure. Ruska had an extremely pronounced sense of fair dealing, and he always quoted others correctly. After the death of Bodo von Borries, he saw to it that von Borries received a separate entry in the new edition of the 12-volume Brockhaus Encyclopedia published in 1958. He also repeatedly wrote to editors to correct them when they did not give adequate credit to Knoll, von Borries, Brüche, or anyone else. Naturally, he himself wanted to be treated fairly. Ruska preferred to say bluntly what he thought; he liked a man-to-man argument.
XVI. THE EMERITUS PROFESSOR
Ruska’s entire life was restless and hectic. When his retirement was in sight, his wife was a little worried about how he would cope with the new situation. So she cleverly initiated a new project for him. She persuaded
² Jews married to Aryans whose children were Christians were protected. According to the Berlin Handbook, Lexikon der Bundeshauptstadt, FAB Verlag Berlin, p. 623 (1992), 4700 Jews survived the war in Berlin in “privileged mixed marriages.”
him to build a house in Ticino, Switzerland. This new task would make it necessary to leave Berlin from time to time. Ruska jumped at the idea. He had always been a little envious of his wife’s beautiful property in Murnau in the picturesque landscape of Upper Bavaria, a Landhaus-Villa (country villa) just around the corner from the so-called Russenhaus (Russians’ house), where Wassily Kandinsky had lived with his companion Gabriele Münter, herself a painter. Ruska immediately drew up an outline plan for a bungalow to be built near Lugano. His five-room bungalow in Arosio, some 800 m above sea level, has a magnificent view over Lake Lugano and unmistakably shows the hand of Ernst Ruska. He wanted a holiday house that was both practical and of generous proportions. His bungalow, the Casa Rusca, is indeed the most popular destination for all the Ruska offspring. In 1974 Ernst Ruska retired at the age of 68, but he continued to go daily to his Altenteil (his retirement quarters) at the institute. In 1976 he could finally attack the long-planned project of writing a more comprehensive account of the early history of electron microscopy. The result was his book, published in 1979 (Ruska, 1979) and translated by T. Mulvey in 1980 (Ruska, 1980). Immediately after completing this, another comprehensive task fell to Ruska. Two historic exhibitions were planned for the 2nd International Congress on Cell Biology in Berlin (1980): (1) The Development of the Light Microscope, chaired by Donald E. O h, Oak Ridge, Tennessee; and (2) The Development of the Electron Microscope, chaired by Ernst Ruska. Ruska was asked to construct a full-scale replica of the first Knoll-Ruska electron microscope of 1931, which was to be exhibited in full operation. As all the parts and the original drawings had been lost during World War II, everything had to be redrawn. The Technical University of Berlin was about to celebrate its 100th anniversary in 1979.
When the university heard of the replica of the first microscope, it urged Ruska to have it ready to be shown on this occasion as well. With great effort and commitment, and with the support of the workshops of the Fritz-Haber-Institut, the Bundesanstalt für Materialprüfung, the Technische Universität Berlin, and some optics firms, the body of the first electron microscope was, in fact, assembled in time for the TU anniversary. The complete instrument was finished in the summer of 1980 and put into operation just a few weeks before the congress. At this congress, the historic exhibition “Microscopes and Cell Biology” displayed 150 fine old light microscopes supplied by museums, institutions, and private individuals from many countries. But the highlight of this
exhibition was the reconstruction of the first electron-optical instrument of 1931. Many of the 3200 participants in the congress took the opportunity to see the first electron microscope in operation. After the congress it was transferred to the Deutsches Museum in Munich to complete the Optics Department. At the same time, the second electron microscope, of 1933 (with 0.05-µm resolution), was rebuilt to be shown at the 10th International Congress on Electron Microscopy in Hamburg (1982). When all this work was done, Ruska, now in his mid-70s, began to lean back and relax. He spent only a couple of hours each day at the institute to answer mail and finally spent more time with his family, which had grown large: Ruska had eight grandchildren. For health reasons (his heart and his back increasingly plagued him), he went to Arosio more often, and its pleasant climate did him a lot of good. On a specialist road map he discovered 12 possible routes to get there. Each time, his wife, the driver, was to explore another pass, one time even the Stilfser Joch (2757 m), the highest road pass in the Eastern Alps. In fact, Irmela Ruska would have preferred always to take the same route, the easiest one, but she willingly fulfilled his wish. In the Ticino they repeatedly encountered the name “Rusca.” There is, for instance, a “Castello Rusca” and, in the immediate vicinity of his house, an “Istituto Rusca.” Ruska wondered whether these Ruscas might be among his ancestors. He dug out the genealogical studies his father had once started and continued seeking more information on his family roots. In particular, he wanted to find out who the parents were of his early ancestor from Grafenhausen, the ludimagister (schoolmaster) Franciscus Rusca, who was born on December 9, 1729, ex thoro illegitimo (out of wedlock), baptized by Abbot II of St. Blasien (Black Forest), and then given up for adoption. The facts about his parents are shrouded in mystery. They were recorded only in the secret archives of the St. Blasien monastery and of the court, from which they later “disappeared.” It is a fact, however, that the court painter Carlo Francesco Rusca (1669-1769), who worked for several European royal houses, was called in for work in connection with a modification of the St. Blasien monastery in 1728. Ernst Ruska traveled around to track down this famous Rusca, who also painted Friedrich Wilhelm I, father of Frederick the Great (the large oil painting was recently exhibited in Schloss Sanssouci in Potsdam), and who produced the only existing portrait of Frederick the Great as an adolescent. In the autumn of 1986, for the first time ever, Ruska went to a health resort, Bad Bellingen (South Baden), for treatment of his rheumatism. There he learned the news of the Nobel Prize. Things started to boil up again!
XVII. NOBEL PRIZE
In 1986 Binnig and Rohrer were under consideration for the Nobel Prize for the design of their scanning tunneling microscope. The electron microscope, which had been the basic instrument behind many scientific and technological discoveries in the preceding decades, had until then never been honored with a Nobel Prize, although it had been put forward at regular intervals over the decades. Two difficulties seemed to stand in the way of such an award. One was the existence of the comprehensive patent on the electron microscope filed by Reinhold Rüdenberg in 1931, although he did not play a significant role in the design and development of the instrument. The other was that many of the early pioneers in the field had died in the meantime. In fact, Ernst Ruska and Max Knoll, his supervisor, had designed, constructed, and exhibited a prototype electron microscope in Berlin in 1931, well before the acceptance date of the patent, and so, under German law, were co-users of the patent. Unfortunately, Knoll had died in 1969, so a joint Knoll-Ruska award of the Nobel Prize was not possible. There were, of course, many scientists and engineers who later made significant contributions to the design of electron microscopes. This difficult situation was eventually resolved by the president of the Royal Swedish Academy, Professor Sven Johansson, who instigated a thorough investigation into these controversial scientific and personal issues. The result was the following pronouncement by the Nobel committee: The Royal Swedish Academy of Sciences has decided to award the 1986 Nobel Prize in Physics by one half to Professor Ernst Ruska, Fritz-Haber-Institut der Max-Planck-Gesellschaft, Berlin, Federal Republic of Germany, for his fundamental work in electron optics, and for the design of the first electron microscope; and the other half jointly to Dr. Gerd Binnig and Dr.
Heinrich Rohrer, IBM Research Laboratory, Zurich, Switzerland, for their design of the scanning tunneling microscope (Royal Swedish Academy of Sciences, 1986a).
In more detailed information on the prize, the Academy pointed out: The significance of the electron microscope in different fields of science such as biology and medicine is now fully established: it is one of the most important inventions of this century. Its development began with work carried out by Ruska as a young student at the Berlin Technical University at the end of the 1920's. He found that a magnetic coil could act as a lens for electrons, and that such an electron lens could be used to obtain an image of an object irradiated with electrons. . . . Using two coils in series,
Ruska achieved a magnification of fifteen times. Even though this was a modest result, it nevertheless represents the first prototype of an electron microscope. Ruska subsequently worked purposefully to improve the details, and in 1933 he built what can be described as the first electron microscope in the modern sense, an instrument with considerably better performance than a conventional light microscope’s. (Royal Swedish Academy of Sciences, Nobel Prize citation, 1986b).
Moreover, in his presentation speech at the award ceremony in the Stockholm Concert Hall on December 10, 1986, Johansson pointed out: “Several scientists, among them Hans Busch, Max Knoll and Bodo von Borries, contributed to the development of the electron microscope, but Ernst Ruska deserves to be placed foremost.” Ernst Ruska was content, but he took it with composure, with the wisdom of age. He had not expected the Nobel Prize at this stage, but now he felt great satisfaction that the field of electron microscopy as a whole had finally been recognized. In the banquet speech that he, as the oldest of the three physics laureates, had to give in Stockholm on December 10, 1986, Ernst said: A Nobel prize automatically implies the recognition of the workers in the Laureate’s field. I think that I do not only speak for myself but also for our colleagues when I thank the Committee for honouring today our efforts to elucidate the fine structure of matter. Most Laureates have been accompanied on their way to success by dedicated and diligent assistants who are not in the limelight today. Our sincere gratitude should therefore include all these collaborators.
In Berlin in particular, the news of Ernst Ruska’s Nobel Prize provoked widespread elation. In his broadcast “Zeichen der Zeit – Technik und Forschung heute” (Signs of the Times: Technology and Research Today) on October 18, 1986, Harro Zimmer of the Berlin RIAS (Rundfunk im amerikanischen Sektor, Radio in the American Sector) said of the award: Die Überraschung war perfekt. Kein Sterbenswörtchen war aus den Klausursitzungen des Preis-Komitees in Stockholm an die Außenwelt gedrungen. Am letzten Mittwoch, kurz nach 12.00 Uhr, war es dann soweit: Unter den drei gekürten Physikern war ein Name dabei, der wie eine Bombe einschlug: Ernst Ruska. Ein Mann, der seit sechs Jahrzehnten hier in dieser Stadt ansässig ist, der das biblische Alter von 80 Jahren fast erreicht hat. Ein Forscher, der zu einer Berliner Institution geworden war, sich aber in den letzten Jahren von öffentlichen Aktivitäten zurückgezogen hatte. Die Reaktionen reichten vom flapsigen “Ja, lebt er denn noch?” bis zur ahnungslosen Frage: “Hat er denn den Preis nicht schon längst?”
The surprise was complete. Not a syllable had slipped out of the closed sessions of the Nobel Committee in Stockholm. Last Wednesday, shortly after noon, it was all settled: among the three chosen physicists was one name that exploded like a bomb: Ernst Ruska. A man who has lived in this city for six decades and has almost reached the biblical age of 80. A scientist who had become a Berlin institution, but had withdrawn from public activities in recent years. The reactions ranged from the flippant “What, is he still alive?” to the ill-informed question: “Didn’t he get the prize long ago?”
Ernst Ruska was inundated with congratulatory letters, sent both to the institute and to his private address in Max-Eyth-Strasse, ranging from letters from former primary-school classmates to letters from scientists all over the world and from eminent politicians. Even the commander who had dressed him down during his military service in Potsdam (1937) now sent brief, military-style congratulations. His whole life unfolded before his eyes. Many letters began with the words “You will not remember me...,” but Ernst Ruska recognized them all and answered each letter individually. This obviously gave him much pleasure. There was no question of a standard letter slightly varied for each recipient. In fact, in contrast to his rather dry scientific style, Ernst Ruska now composed witty, humorous letters, or even consoling ones, for quite a number of widows of former employees had written to him. Ruska sympathized with them and tried to give them hope. Particularly touching was his answer to a letter from 7- to 9-year-old children of a primary school in Grafenhausen.
Prof. Dr.-Ing. Ernst Ruska
Berlin-Dahlem, den 4.2.1987
An die 2. und 4. Klasse der Ferdinand-Ruska-Schule Grund- und Hauptschule 7631 Kappel-Grafenhausen
Liebe Schülerinnen und Schüler!
Für Eure zahlreichen Glückwünsche und Eure hübschen Bilder zu meinem Nobelpreis danke ich Euch sehr herzlich. In Bellingen hat es meiner Frau und mir sehr gut gefallen. Südbaden ist schon ein sehr schönes Land. Später sind wir nach Stockholm geflogen, wo ich meinen Preis dann erhalten habe. Es war eine wundervolle Feier. In Schweden gibt es noch einen König und eine Königin, die so schön ist wie eine Märchenkönigin. Aber man kann wirklich neben ihr am Tisch sitzen und sich mit ihr unterhalten. Damit Ihr es auch seht, schicke ich Euch ein großes buntes Bild aus einer Zeitung. Eure Schule trägt den Namen meines 1826 in Grafenhausen geborenen Großvaters, der sich später als Lehrer in Mahlberg, Bernau, Bühl und Badenscheuern durch seine Begabung und seinen großen Fleiß einen sehr guten Ruf erworben hat. Zum Andenken an ihn haben zuerst mein Vater und später ich die Buchpreise für die beste Schülerin und den besten Schüler gestiftet, die jedes Jahr von der Schule abgehen. Ich freue mich über jeden von Euch, der in der Schule und später im Leben durch Begabung und Fleiß Erfolg hat. Dazu ist wichtig, rechtzeitig zu erkennen, wozu man begabt ist. Wenn man für etwas großes Interesse hat, fällt es gar nicht mehr schwer, dafür auch fleißig zu arbeiten. In diesem Sinne wünsche ich Euch allen für die Zukunft Glück und Lebensfreude. Euer
To the 2nd and 4th classes of the Ferdinand-Ruska-Schule, Primary and Secondary School, 7631 Kappel-Grafenhausen
Dear girls and boys,
I thank you most heartily for your many good wishes and lovely pictures on my Nobel Prize. My wife and I greatly enjoyed being in Bellingen. South Baden is certainly a lovely region. Later on we flew to Stockholm, where I received my Prize. It was a wonderful celebration. In Sweden they still have a King and a Queen who is as beautiful as any fairytale Queen. I could even sit next to her at table and chat with her. So that you can see it for yourselves, I am sending you a large colour picture from a newspaper. Your school bears the name of my grandfather, who was born in Grafenhausen in 1826 and who later, as a teacher in Mahlberg, Bernau, Bühl, and Badenscheuern, earned a very good reputation through his gifts and his great industry. In his memory, first my father and later I myself founded the Book Prizes given each year to the best girl pupil and the best boy pupil leaving the school. I take pleasure in every one of you who, at school and later in life, finds success through gifts and hard work. For that, it is important to recognise in time where one’s gifts lie. If one has a great interest in something, it is no longer difficult to work hard at it. In this spirit, I wish you all good luck and the joy of living in the future.
Yours, Ernst Ruska
XVIII. STOCKHOLM
When Ernst Ruska learned that he would be seated beside Her Majesty the Queen at the Banquet, he was happy and proud of this great honor, but at the same time a little worried. His main concern was his physical condition; he had serious problems with his heart and his back at the time. In fact, he had to convalesce in a hospital before going to Stockholm, and it was not clear whether he would be able to make the journey. His other concern was how the conversation with the Queen would go. Those who knew Ernst Ruska will remember that he was a charming, quick-witted companion in a circle of friends and colleagues, but rather reserved and a little stiff toward people he met for the first time. So, before going to Stockholm, he thought about a subject with which to open the conversation with the Queen at the banquet. Fortunately, he remembered that Queen Silvia was German, so conversation would be easy; like Ernst Ruska himself, she had even been born in Heidelberg. Moreover, he remembered that in 1976, when he saw on TV the wedding of King Carl XVI Gustaf of Sweden and Silvia Sommerlath from Heidelberg, he had already wondered whether this beautiful young woman and future queen might belong to the Sommerlath family of Heidelberg whose son Walther often came to his parents’ house to play with his two older brothers, Hans and Walter. Ruska decided to make this his first point of contact at the banquet. Queen Silvia, however, anticipated his problem. At the presentation of the Laureates to the royal family before the banquet, she came up to Ruska, smiled cheerfully, and conveyed warm greetings from her father, Walther Sommerlath, who had telephoned her on October 16, straight after learning from the news that the Nobel
Prize in Physics had been awarded to Ernst Ruska. Walther Sommerlath immediately realized that Ruska was one of the numerous children he used to play with at Mönchhofstrasse 8, the home of his former mathematics teacher, Julius Ruska. So the ice was broken, and at the banquet Queen Silvia and Ernst Ruska reveled in shared memories: Heidelberg then and now. Ernst Ruska told the Queen of the rough times at school in those days and of the strict “morality” then in vogue in Heidelberg, where girls' and boys' high schools were strictly segregated. They found one more thing in common: the Queen was born one day before Christmas Eve and Ruska one day after, so both of them always felt a little cheated during their childhoods. The photograph in Fig. 20 was published in many newspapers; one German paper commented, “Heidelberger unter sich” (Heidelbergers together). The festivities on the occasion of the Nobel Prize coincided with Ruska's 80th birthday and his golden wedding anniversary. One reception followed another, and celebratory colloquia were held. Ernst Ruska sighed, “Everywhere I must be the Festschwein (festive pig).”
FIGURE 20. Ernst Ruska and Queen Silvia at the Banquet after the Prize Presentation on December 11, 1986. (Photo: Action Press, Hamburg.)
ERNST RUSKA: A MEMOIR
XIX. EPILOGUE
This Memoir would lose credibility if one private aspect of Ernst Ruska's life were concealed. Ruska himself would not have wanted this story suppressed, because it was part of his biography and he openly acknowledged it. In fact, the older he became, the stronger was his desire to put his biographical house in order. The story involves intimate personal matters that were not widely known before. It is nevertheless recounted here in some detail in order to describe objectively the facts and the situation in which the three people concerned found themselves. At the end of World War II, in the bleak, hopeless situation in which nobody knew whether he or she would be alive the next day, Ilse H. joined Dr. Nistler and Ruska. She was a young Austrian woman who had been conscripted to work for Siemens, and she was frightened to death about the future. She always kept close on the heels of one of the two men. Together they went through those chaotic days of May 1945. Ernst was deeply depressed and emotionally shattered at being cut off from his family. All that mattered to him, in fact his whole life's work, was gone; there seemed to be no future. This young woman had been brought up as a strict Catholic and was now suffering from depression herself. To Ruska she was like a straw to clutch at. She gave him the courage and the desire to fight for life, and it was she who prevented his deportation to the Soviet Union, as described above. When Ernst returned from the POW camp in July 1946, he found that she was pregnant. He unburdened his troubles on a good friend, Dr. Joseph Jantsch, an Austrian, who offered immediate help. Jantsch, formerly a member of the Jesuit order, had taken his examination in physics and mathematics with Walter Glaser in Prague. During the war, Jantsch had replaced a called-up physicist in Ruska's group at Siemens. After the war, he was needed in the school service, because the previous teachers had either been killed in the war or were Nazis and thus not allowed to teach. Dr.
Jantsch lived with his unmarried sister. They were happy to take in the young woman and her child. It was now clear to Ruska that he must take full responsibility for the young mother and child, and not just financially. Divorcing Irmela was out of the question for Ernst; he did not want to lose his wife and family. He was very much relieved that Irmela, fair and understanding, was able to forgive him. Ernst did not tell lies or make excuses; he discussed all the problems openly with his wife. For years Ilse H. tried, from Austria and from Switzerland, to get back to her family in Austria, but they repudiated her. Even a heart-to-heart talk with her mother was prevented by her brother, who was aiming at a political career in a rural, strongly Catholic
region. Soon afterward, her mother died in an accident. All this weighed heavily on Ernst's mind, and he felt even more responsible for this woman and more strongly bound to her. In 1951 Ernst and Irmela persuaded Ilse to return to Berlin to get a solid education. Two years later a second child was born, causing Irmela even greater chagrin. Ernst knew only that he would have to succeed in gently persuading his wife to accept the situation. Irmela was convinced that she owed Ernst's survival at the end of the war to this young woman, and she also knew that she had to find a humane solution to the problem. But she also knew that she would not give up her husband. Although deeply hurt, she stayed with him to support him. She knew that Ernst could get on with his work only if his mental equilibrium was preserved. Irmela had been young and inexperienced when they married, but she had since grown into the role of the wife of an important man. She had realized from the start that Ernst's passion, or rather “addiction,” was the electron microscope. Throughout her life she put her personal preferences aside, freeing her husband as much as possible to concentrate on his chosen work. Ruska always gratefully acknowledged this. He was well aware that he could not find his way forward without this strong, down-to-earth wife at his side. With her he felt secure; her home was a safe haven from the storm. Irmela reluctantly accepted this second family as a reality. Ernst tried to be as fair as possible to everybody, but he always gave Irmela absolute priority. Ilse H. remained discreetly in the background. From the start, however, Ernst insisted on one rather awkward arrangement. On December 25, which was both Christmas Day and his birthday, he wanted his second family to join in the celebrations and to be integrated into, and recognized by, his first.
These details are given to illustrate Ernst Ruska's character, his strong personality, and his straightforward way of solving problems. He wanted to be able to look both women straight in the eye, and he was himself amazed, and always a little proud, at how he managed to bring everything under one umbrella. This was possible only because both women were scrupulously fair to one another. In 1958 Ilse H. met a man whom she married in 1961, thus founding her own family, with two girls added to her two sons. Ernst and Irmela were the witnesses at the wedding. Nevertheless, the December 25 arrangement persisted right up to Ernst Ruska's death. It was a remarkable act of magnanimity, tolerance, and deep understanding on Irmela's part to go along with all this, regarding it as a “sacrifice to the evils of war.” She was richly rewarded for her generosity and never regretted this tough decision. Even after Ernst's death she did not sever the connection with the second family. Dr. Jantsch and his
sister had gained a large family; their readiness to help was recompensed as well. They were taken care of in their house until their deaths.
XX. SUNSET
The octogenarian Ernst Ruska could look back on a remarkably stormy but fulfilled life. An insidious disease that had been with him for some time now progressed rapidly, and for the first time in his life he did not fight; he was tired. On May 27, 1988, he passed away. What remains are indelible impressions, details associated with his personality: his brisk step, vigorous voice, firm handshake, and roguish smile. In 1939, at the age of 33, Ernst Ruska received his first scientific prize, the Senckenberg Prize. It was followed over the years by a succession of important awards and prizes, and his career was crowned with the Nobel Prize in 1986. Successful as he had been, he always remained a simple and modest man, never showing arrogance. It was not easy to get close to him, but once one had earned his confidence, or even his friendship, one could rely on him utterly. He loved his work, his family, wine, and chamber music; he disliked religious and political fanaticism, intolerance, and insincerity. He was buried beside his brother Helmut in the Waldfriedhof in Berlin-Dahlem.
ACKNOWLEDGMENTS
The authors are deeply indebted to Mrs. Irmela Ruska for most helpful discussions and for making available family documents, photographs, and key sources of information concerning Ernst Ruska's personal and career development. They also wish to thank Professor Elmar Zeitler, director of the Department of Electron Microscopy of the Fritz-Haber-Institute, Berlin, and members of his staff, for their critical encouragement and help in preparing this memoir.
REFERENCES
Knoll, M. (1968). Mikroskopie 23, 70.
Riecke, W. D., and Ruska, E. (1966). In “Sixth Int. Conf. Elec. Microsc., Kyoto 1966” (Uyeda, Ed.), Vol. 1, pp. 19-20. Maruzen, Tokyo.
Royal Swedish Academy of Sciences (15 October 1986a). Nobel Prize citation.
Royal Swedish Academy of Sciences (15 October 1986b). Information, Nobel Prize.
Ruska, E. (1949). Optik 5, 457-459.
Ruska, E. (1979). “Die frühe Entwicklung der Elektronenlinsen und der Elektronenmikroskopie,” Acta Historica Leopoldina, No. 12, pp. 120ff.
Ruska, E. (1980). “The Early Development of Electron Lenses and Electron Microscopy” (Thomas Mulvey, Transl.), S. Hirzel Verlag, Stuttgart, pp. 120ff.
ADVANCES IN IMAGING AND ELECTRON PHYSICS. VOL. 95
Electron Field Emission from Atom-Sources: Fabrication, Properties, and Applications of Nanotips
VU THIEN BINH
Laboratoire d'Emission Electronique, DPM-URA CNRS, Université Claude Bernard Lyon I, 69622 Villeurbanne, France
N. GARCIA
Física de Sistemas Pequeños, CSIC, Universidad Autónoma de Madrid, CIII, 28049 Madrid, Spain
AND
S. T. PURCELL
Laboratoire d'Emission Electronique, DPM-URA CNRS, Université Claude Bernard Lyon I, 69622 Villeurbanne, France

I. Introduction
II. Electron Emission from a Metal Surface: Summary of the Basic Results
    A. Metal/Vacuum Barrier
    B. Emission Currents
    C. Energy Distribution of Emitted Electrons
    D. Current Density Distribution
    E. Current Stability
III. Electron Emission from Nanotips
    A. Experimental Setup and Procedures
    B. Confinement of the Field Emitting Area
    C. Field Emission Characteristics from Nanotips: Experiment
    D. Field Emission Characteristics from Nanotips: Discussion
IV. Applications
    A. Atomic Resolution under FEM
    B. Monochromatic Electron Beam
    C. Local Heating and Cooling by Nottingham Effect
    D. Fresnel Projection Microscopy
    E. Ferromagnetic Nanotips: Atomic Beam Splitter
V. Conclusions
References
Copyright 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.

I. INTRODUCTION
A major instrumental development in the history of electron optics and electron microscopy occurred in the early 1960s, when the field emission
gun (FEG) replaced the thermionic emitter as the electron source in the scanning electron microscope. In 1965, Crewe [1] gave the first experimental demonstration that the FEG could dramatically improve resolution, because it allowed the electron beam to be focused into a probe only a few angstroms in diameter. Further developments can be expected if the field emission (FE) tip can be improved. In particular, this means decreasing (1) the size of the emitting area, (2) the angular dispersion of the emitted flux, (3) the extraction voltage, and (4) the width of the energy distribution of the emitted electrons, and increasing the stability of the emission. This chapter summarizes the improvements in all of these respects that can be obtained by using a nanotip as an FE source. These nanotips are nanoprotrusions of single-atom sharpness, 2 to 5 nm in height, on top of hemispherical base tips, and the whole FE current is emitted from the topmost apex atom [2, 3]. The object of this chapter is to present the consolidated results obtained with controlled field emission from nanotips. In particular, it discusses the specific effects that arise because the source is of atomic size. This chapter does not present an exhaustive review of field emission; many review papers and books on field emission and its applications exist in the literature, going back to its very beginnings [4-8]. However, Section II summarizes the basic results of field emission theory, including a discussion of thermionic emission, in order to place field emission from nanotips in the global context of electron sources. The characteristics of field emission from nanotips are then presented and discussed in Section III. The very specific properties of nanotips are convincing only if they make new advances possible, and Section IV explores their usefulness.
Among other examples, images of synthetic polymers and RNA-based biological molecules at nanometer resolution are presented and discussed. They were obtained with the Fresnel projection microscope (FPM), using the nanotip as an atom-size electron source.
II. ELECTRON EMISSION FROM A METAL SURFACE: SUMMARY OF THE BASIC RESULTS
A. Metal/Vacuum Barrier
Within a metal, an electron current density of roughly $j_i = en_0v_m$ impinges on the inner surface, where $n_0$ is the electron density, $e$ is the electron
FIGURE 1. Potential energy for an electron in the vicinity of a metal surface with and without applied fields (horizontal axis: position, Å). The decrease in the effective barrier due to the Schottky effect is $3.8F_0^{1/2}$.
charge, and $v_m$ is the electron velocity. For $n_0$ between $10^{22}$ and $10^{23}$ cm$^{-3}$ and $v_m = 10^8$ cm s$^{-1}$ near the Fermi energy $E_F$, $j_i$ is about $10^{12}$ A cm$^{-2}$. Only a small fraction of this current escapes from the metal, because of the surface tunneling barrier shown schematically in Fig. 1. This barrier lies above $E_F$ by the work function $\phi$, about 2 to 5 eV at zero applied field. It is modified by an applied electric field $F_0$, which lowers the potential energy outside the metal by $eF_0x$, $x$ being the distance from the tip surface. Near the surface, the emitted electron also experiences an image force, which arises from its attraction to the positive charge it induces in the metal. With the zero of energy at the bottom of the conduction band, the standard form for the potential energy $V(x)$ of an electron at a distance $x$ from the cathode is

$$
V(x) = \begin{cases}
0, & x < x_c,\\
E_F + \phi - eF_0x - \dfrac{e^2}{4x} = E_F + \phi - eF_0x - \dfrac{3.6}{x}, & x > x_c,
\end{cases}
\tag{1}
$$
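with energies in eV, fields in V/Å, and the distance $x$ in Å. The value $x_c = 3.6/(E_F + \phi) \approx 0.3$ Å is chosen such that $V(x_c) = 0$. The potential energy given by Eq. (1) has a maximum at

$$
x_0 = \left(\frac{e}{4F_0}\right)^{1/2} = \left(\frac{3.6}{F_0}\right)^{1/2},
\tag{2}
$$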
giving an effective work function in the presence of the applied field of

$$
\phi_{\mathrm{eff}} = \phi - e^{3/2}F_0^{1/2} = \phi - 3.8F_0^{1/2}.
\tag{3}
$$
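Equations (2) and (3) are simple enough to tabulate directly. The Python sketch below is a minimal illustration. It assumes the practical-unit value $e/(4\pi\varepsilon_0) \approx 14.40$ V·Å, which is the source of the coefficients 3.6 and 3.8 used in the text. It also evaluates the Boltzmann factor $\exp(\Delta\phi/k_BT)$ by which the Schottky lowering $\Delta\phi$ raises a thermionic current. The function names are hypothetical.

```python
import math

# e / (4*pi*eps0) in practical units (V * Angstrom): e^2/(4x) -> 3.6/x eV for x in Angstrom.
E_OVER_4PI_EPS0 = 14.3996
K_B_EV = 8.617333e-5  # Boltzmann constant, eV/K


def barrier_max_position(field: float) -> float:
    """x0 in Angstrom for a field in V/Angstrom, Eq. (2): x0 = (3.6 / F0) ** 0.5."""
    return math.sqrt(E_OVER_4PI_EPS0 / (4.0 * field))


def schottky_lowering(field: float) -> float:
    """Barrier lowering in eV, Eq. (3): e^(3/2) F0^(1/2), i.e. about 3.8 * F0 ** 0.5."""
    return math.sqrt(E_OVER_4PI_EPS0 * field)


def schottky_enhancement(field: float, temperature: float) -> float:
    """Boltzmann factor exp(delta_phi / kB T) by which a thermionic current is raised."""
    return math.exp(schottky_lowering(field) / (K_B_EV * temperature))


if __name__ == "__main__":
    for f_v_per_cm in (1e5, 1e6, 1e7, 1e8):
        f = f_v_per_cm * 1e-8  # 1 V/Angstrom = 1e8 V/cm
        print(f"F0 = {f_v_per_cm:.0e} V/cm: x0 = {barrier_max_position(f):7.1f} A, "
              f"delta_phi = {schottky_lowering(f):.2f} eV, "
              f"enhancement at 2000 K = {schottky_enhancement(f, 2000.0):.3g}")
```

Running the loop shows the two opposite trends behind Tables II and III: $x_0$ falls as $F_0^{-1/2}$, while $\Delta\phi$ rises as $F_0^{1/2}$.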
This reduction of the barrier height by $3.8F_0^{1/2}$ is called the Schottky effect [9]. The barrier width, $\Delta x$, for electrons at $E_F$ can be obtained from Eq. (1) to be
Electrons can be extracted from metal surfaces by two main processes, represented schematically in Fig. 2:
1. Emission of thermally excited electrons with energy greater than the barrier height. This process includes thermionic emission at zero applied field [Fig. 2.I(a)]. It also includes Schottky emission, which is thermionic emission in the presence of an applied field [Fig. 2.I(b)].
2. Tunneling emission through the barrier of electrons with energy lower than the barrier height (Fig. 2.II). This process includes cold field emission for $E < E_F$ at $T = 0$ K. It also includes thermal field emission of thermally excited electrons with $E_F < E < E_F + \phi_{\mathrm{eff}}$ at $T > 0$ K.
The two regimes overlap where the field and temperature are such that electrons are emitted both over and through the barrier. As described in the following sections, the main features of the emission currents can be understood by treating the electrons inside the metal as a free electron gas. This applies both to their temperature and field dependence and to the energy distributions of the emitted electrons.
B. Emission Currents
1. Thermionic Emission
In zero applied field, the thermionic current density $J_x$ is obtained from the flux of all the electrons whose energy normal to the surface is greater than the barrier height $\phi$:
where $f(\mathbf{p}) = 1/\{1 + \exp[(E - E_F)/k_BT]\}$ is the Fermi-Dirac distribution function for electrons with momentum $\mathbf{p}$, $E_{\min}$ is the minimum normal energy needed to pass the barrier, and $k_B$ is the Boltzmann constant. Considering the fact that
FIGURE 2. Schematic diagrams for electron emission from a metal, with the respective energy distributions of the emitted electrons. I. Thermionic emission: (a) without applied field; (b) in the presence of an applied field (Schottky emission). II. Field emission (FE).
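where $m$ is the electron mass, then

$$
J_x = \frac{2e}{mh^3}\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty}\int_{\sqrt{2m(E_F+\phi)}}^{+\infty} p_x\,f(\mathbf{p})\,dp_x\,dp_y\,dp_z .
$$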
The thermionic emission condition for electrons of the metal is

$$
\frac{p_x^2}{2m} \geq E_F + \phi .
\tag{7}
$$
Thus the minimum energy of an electron to be emitted is
$$
E_{\min} = E_F + \phi + \frac{p_y^2 + p_z^2}{2m} .
\tag{8}
$$
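Integration gives

$$
J_x = \frac{2ek_BT}{h^3}\exp\left(-\frac{\phi}{k_BT}\right)\int_{-\infty}^{+\infty}\exp\left(-\frac{p_y^2}{2mk_BT}\right)dp_y\int_{-\infty}^{+\infty}\exp\left(-\frac{p_z^2}{2mk_BT}\right)dp_z ,
$$

$$
J_x = \frac{4\pi mek_B^2}{h^3}\,T^2\exp\left(-\frac{\phi}{k_BT}\right) = A_0T^2\exp\left(-\frac{\phi}{k_BT}\right).
\tag{9}
$$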
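To make Eq. (9) concrete, the sketch below computes the Richardson constant $A_0 = 4\pi mek_B^2/h^3$ from SI values of the physical constants. It then evaluates the zero-field current density for a W cathode with $\phi = 4.5$ eV. It is only a minimal illustration of the formula, not a cathode model. It ignores field effects, surface roughness, and the smaller effective $A_0$ values often measured (compare Table I). The function name is hypothetical.

```python
import math

# SI values of the physical constants (CODATA 2018).
M_E = 9.1093837015e-31   # electron mass, kg
Q_E = 1.602176634e-19    # elementary charge, C
K_B = 1.380649e-23       # Boltzmann constant, J/K
H = 6.62607015e-34       # Planck constant, J s

# Richardson constant A0 = 4*pi*m*e*kB^2/h^3, converted from A m^-2 K^-2 to A cm^-2 K^-2.
A0 = 4.0 * math.pi * M_E * Q_E * K_B**2 / H**3 * 1e-4


def richardson_dushman(phi_ev: float, temperature: float, a0: float = A0) -> float:
    """Thermionic current density in A/cm^2 from Eq. (9): J = A0 T^2 exp(-phi / kB T)."""
    kt_ev = K_B * temperature / Q_E  # thermal energy in eV
    return a0 * temperature**2 * math.exp(-phi_ev / kt_ev)


if __name__ == "__main__":
    print(f"A0 = {A0:.1f} A cm^-2 K^-2")
    for t in (2600.0, 2800.0):
        print(f"W (phi = 4.5 eV) at {t:.0f} K: J = {richardson_dushman(4.5, t):.2f} A/cm^2")
```

With the theoretical $A_0 \approx 120$ A cm$^{-2}$ K$^{-2}$, a clean W surface at 2600 K gives about 1.5 A/cm². This agrees with the 1-3 A/cm² listed for W in Table I. Passing a measured `a0` lets the same function use an empirical Richardson constant instead.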
Equation (9) is the Richardson-Dushman equation. Parameters for several standard emitters are given in Table I. Although Eq. (9) was derived for a step barrier, it has the same form for a field-modified barrier; in that case $\phi$ must be replaced by $\phi_{\mathrm{eff}}$ from Eq. (3). The applied field lowers the barrier height by $\Delta\phi = 3.8F_0^{1/2}$, so the thermionic emission current is enhanced by a factor $\exp(\Delta\phi/k_BT)$. The lowering becomes noticeable for $F_0 > 10^5$ V/cm, as the values in Table II show. This effect is used in the so-called Schottky emission cathodes, which are basically ZrO-covered W(100) tips with radii of about 1 µm. The presence of ZrO also lowers the intrinsic work function from 4.5 eV to 2.8 eV,
TABLE I
PARAMETERS OF SOME THERMIONIC EMISSION CATHODES AT ZERO APPLIED FIELD

                                          W            LaB6         ZrO/W
Work function φ (eV)                      4.5          2.4-2.7      2.8
Richardson constant A0 (A cm⁻² K⁻²)       75-120       30           100
Emission current density j_e (A cm⁻²)     1-3          25           500
Working temperature T (K)                 2600-2900    1400-2000    1800
TABLE II
VARIATION OF THE WORK FUNCTION AND CORRESPONDING EMISSION CURRENT DENSITIES WITH APPLIED FIELD IN THE SCHOTTKY REGIME (φ = 4.5 eV, T = 2000 K)

F0 (V/cm)    F0 (V/Å)    Δφ (eV)    j(F0)/j(F0 = 0)
10⁵          0.001       0.12       2
10⁶          0.01        0.38       8
10⁷          0.1         1.2        800
10⁸          1.0         3.0        1.5 × 10⁹
which gives an emission current density of about 500 A/cm² at 1800 K. This is more than two orders of magnitude larger than that of clean W thermionic cathodes at about 2600 K (see Table I). Although the Schottky effect lowers the potential barrier, the electrons still have to surmount it with their thermal energy, so it is somewhat misleading to call this cathode a field emission Schottky gun.
2. Field Emission
Application of an external electric field lowers the potential barrier and also modifies the position of its maximum, $x_0$, and the barrier width, as shown in Table III. For $F_0 < 10^7$ V/cm, the barrier is wide and practically opaque to the electrons. Appreciable tunneling occurs for $F_0 > (2$-$3) \times 10^7$ V/cm. The tunneling current was named field emission because the cathode does not need to be heated to deliver the electron current. The Fowler-Nordheim (F-N) model describes electron emission from metals under a high electric field. It was developed under four assumptions: the temperature of the metal is 0 K; the free-electron approximation applies inside the metal; the surface is smooth and planar; and, on the vacuum side, the surface potential barrier consists of an image-force potential plus a potential due to the applied electric field $F_0$. The emitted flux is found from the product of two factors: the supply function, which gives the flux of electrons impinging on the barrier, and the transmission
TABLE III
MAXIMUM, x0, AND WIDTH, Δx (AT E_F), OF THE TUNNELING BARRIER (φ = 4.5 eV)

F0 (V/cm)   10⁵     10⁶    10⁷     3 × 10⁷   10⁸
x0 (Å)      60      20     7.0     3.5       2.0
Δx (Å)      4500    450    45.0    15.0      4.5
probability, $D(E_x)$. $D(E_x)$ depends on the height and width of the barrier. The subscript $x$ indicates that the transmission probability depends only on the component of energy normal to the surface. For a free electron gas in the WKB approximation, $D(E_x)$ is given by [10]
Since the electrons tunnel within a narrow energy range near $E_F$, Eq. (10) results in
with

$$
w = \frac{e^{3/2}F_0^{1/2}}{\phi} = \frac{3.8F_0^{1/2}}{\phi}
$$

for $F_0$ in V/Å and $\phi$ in eV. $D_0$ corresponds to the transmission probability at the Fermi level. The functions $t(w)$ and $v(w)$ are nondimensional, slowly varying functions derived from elliptic integrals [11]; they account for the image force during the tunneling process. For FE, $t(w) \approx 1$ and $v(w)$ ranges from 0.4 to 0.8. The tunneling current for a given function $D(E_x)$ is
Substituting $D(E_x)$ from Eq. (11) into the integral gives
$$
J_e = 1.55\times10^{10}\,\frac{F_0^2}{\phi\,t^2(w)}\exp\left[-0.685\,\frac{\phi^{3/2}v(w)}{F_0}\right]
\tag{13}
$$
with $J_e$ in A/cm², $\phi$ in eV, and $F_0$ in V/Å. Table IV gives numerical estimates of the applied fields needed to reach current densities of $10^6$ and $10^7$ A/cm². The current density impinging on the metal surface from the inside is about $10^{12}$ A/cm². Therefore applied fields that give current densities of
TABLE IV
CURRENT DENSITIES AND CORRESPONDING FIELDS FOR VARIOUS WORK FUNCTIONS

φ (eV)   j_e = 10⁶ A/cm²                                       j_e = 10⁷ A/cm²
2        F0 = 1.5 × 10⁷ V/cm, v(w) = 0.3964, t(w) = 1.0751     F0 = 1.8 × 10⁷ V/cm, v(w) = 0.2933, t(w) = 1.0849
3        F0 = 2.9 × 10⁷ V/cm, v(w) = 0.4791, t(w) = 1.0668     F0 = 3.5 × 10⁷ V/cm, v(w) = 0.3890, t(w) = 1.0758
5        F0 = 6.3 × 10⁷ V/cm, v(w) = 0.5749, t(w) = 1.0568     F0 = 7.6 × 10⁷ V/cm, v(w) = 0.5013, t(w) = 1.0646
$10^6$-$10^7$ A/cm² correspond to transmission probabilities of the deformed potential barrier of the order of $10^{-6}$ and $10^{-5}$, respectively.
3. I-V Characteristics: The Fowler-Nordheim Equation
In experimental field emission, the current $I$ is measured as a function of the potential difference $V$ between the tip and the screen. These quantities are related to $J_e$ and $F_0$ by
$$
I = J_eA \qquad\text{and}\qquad F_0 = \frac{V}{kr} = \beta V,
\tag{14a}
$$
where $A$ is the emitting surface area, $k$ and $\beta$ are geometric factors determined by the local geometry of the electron emitter, and $r$ is the tip radius. Equation (13) then becomes
$$
I = 1.55\times10^{-6}\,\frac{\beta^2V^2A}{\phi\,t^2(w)}\exp\left[-0.685\,\frac{\phi^{3/2}v(w)}{\beta V}\right]
\tag{14b}
$$
with $\beta$ in Å$^{-1}$, $A$ in Å$^2$, and $I$ in A. The curve obtained by plotting $\ln(I/V^2)$ versus $1/V$ is called the Fowler-Nordheim plot. It is practically a straight line whose slope is a function of $\phi$ and $\beta$. This behavior is observed experimentally for hemispherical and build-up tips (see Section III). In such cases, F-N plots are used to determine the tip parameters ($\phi$, $\beta$, and $A$) experimentally, or to follow in situ tip sharpening through the variation of $\beta$. At temperatures above 0 K, electron emission from the thermal tail near $E_F$ cannot be ignored. Within the low-temperature approximation,
(i.e., $T \le 1700$ K for W), which implies a negligible thermal tail in the electron distribution at the barrier top, $V = E_F + \phi$, the current is [12]

$$
\frac{I - I_0}{I_0} \approx 1.28\times10^{-8}\,\frac{t^2(w)\,\phi\,T^2}{F_0^2},
$$
where $I_0$ and $I$ are the currents at 0 K and at $T$ (K), respectively. The temperature affects only the preexponential term of the F-N equation, and the increase in current is proportional to $T^2$. The increase is of the order of 5% when the temperature rises from liquid nitrogen temperature (about 77 K) to room temperature (300 K). However, the temperature dependence does not alter the linearity of the F-N plot.
C. Energy Distribution of Emitted Electrons
Let us now discuss the energy distribution of the emitted currents for both thermionic and field emission. The width of the distribution is one of the most important parameters when the electron beam is used in microscopy.
1. Thermionic Emission
At high temperature, the thermal tail of the Fermi-Dirac distribution, $f(E)$, becomes
As the electrons are emitted in all directions within the half-space, the normalized total energy distribution, $F_t(E)$, is given by

$$
F_t(E)\,dE = \frac{E}{(k_BT)^2}\exp\left(-\frac{E}{k_BT}\right)dE .
\tag{17}
$$
The full width at half maximum (FWHM) of this distribution is $\Delta E = 2.45k_BT$, and its mean energy is $\langle E\rangle = 2k_BT$. The maximum of the distribution occurs at $E_{\max} = k_BT$ with respect to the vacuum level. For a cathode temperature of about 2800 K, this gives an energy spread of around 0.6 eV. Experimentally measured distributions are generally much wider, about 2 eV. The difference between these two values is due to additional mechanisms and to the experimental setups used, mainly:
The roughness of the cathode emitting area
The voltage drop across the emitting area of the cathode when it is heated resistively
The Boersch, or space-charge, effect [13], which results from Coulomb interactions among the electrons inside the emitted beam
The stability of the high-voltage power supply, in particular when very high voltages are used
Thermionic emission is therefore characterized by a wide energy distribution. This is a severe handicap for some applications. It is also one of the reasons why field emission sources, which have narrower energy distributions, have come into use.
2. Field Emission
The energy dependence of the electron current density emitted in the field emission process, $J(E)$, is described by the total energy distribution (TED), originally derived by Young for a free electron gas [14]. It turns out to depend simply on the product of a transmission probability factor and the Fermi-Dirac distribution function:
with

$$
B = 1.58\times10^{10}\exp\left[-6.85\times10^{-1}\,\frac{v(w)\,\phi^{3/2}}{F_0}\right]
$$

and

$$
\frac{1}{d} = 1.025\,\frac{t(w)\,\phi^{1/2}}{F_0}.
$$
The maximum of the energy distribution relative to $E_F$ occurs for
and the half-width at $T = 0$ K is given by $\Delta E(0) = d\ln 2$. For $T \neq 0$ K, the expression for $\Delta E$ becomes too complex to be useful. Representative values of the current densities and the peak positions and
TABLE V
CURRENT DENSITIES J (IN A/cm²), TED PEAK POSITIONS Emax AND FWHMs ΔE (IN eV) FOR VARIOUS FIELDS AND TEMPERATURES (φ = 4.5 eV)

T (K)   Quantity   F0 = 10⁷ V/cm   3 × 10⁷ V/cm   5 × 10⁷ V/cm   8 × 10⁷ V/cm   10⁸ V/cm
77      J          1.65 × 10⁻…     4.65 × 10¹     5.27 × 10⁵     1.26 × 10⁸     8.44 × 10⁸
        Emax       0.0114          -0.0195        -0.0230        -0.0262        -0.0280
        ΔE         0.0612          0.138          0.212          0.323          0.396
300     J          2.56 × 10⁻…     4.86 × 10¹     5.35 × 10⁵     1.27 × 10⁸     8.47 × 10⁸
        Emax       0.00942         -0.0364        -0.0520        -0.0654        -0.0716
        ΔE         0.136           0.203          0.281          0.396          0.472
1000    J          -               8.43 × 10¹     6.41 × 10⁵     1.36 × 10⁸     8.85 × 10⁸
        Emax       -               0.0555         -0.0373        -0.0966        -0.121
        ΔE         -               0.458          0.487          0.593          0.669
1500    J          -               -              8.43 × 10⁵     1.50 × 10⁸     9.41 × 10⁸
        Emax       -               -              0.0471         -0.0694        -0.113
        ΔE         -               -              0.680          0.746          0.816
widths are tabulated in Table V. Note that the predicted FWHMs are about 0.3 eV, which generally agrees with experiment. A graphical representation of Eq. (18) is given in Fig. 3. Several characteristics of the TED are worth noting:
1. The high-energy slope (a) is mostly temperature-dependent.
2. The low-energy slope (b) is mostly field-dependent.
3. At a temperature $T^* = d/2k_B$, the average number of FE electrons emitted from below $E_F$ equals the number emitted from above the Fermi level, and $E_{\max} = E_F$. The temperature $T^*$ is called the inversion temperature. For $T < T^*$, most of the field-emitted electrons come from below $E_F$, and $E_{\max} < E_F$. Conversely, for $T > T^*$, more electrons are emitted with energy higher than $E_F$, and the maximum of the energy distribution lies above the Fermi level.
4. For useful current densities (above $10^5$ A/cm²), the width of the energy distribution has a lower limit of about 0.2 eV at 77 K and about 0.3 eV at 300 K.
D. Current Density Distribution
An important parameter for the use of emitted electron beams is the current density in the beam. This has two aspects: the current density at
FIGURE 3. Plot of the theoretical TED from Eq. (18) for $\phi = 4.5$ eV, $F_0 = 0.5$ V/Å, and $T = 300$ K (horizontal axis: energy relative to $E_F$, eV).
the emitting surface, and the current density in the beam at some distance from the emitter, which has been influenced by the local field of the whole tip.
1. Thermionic Emission
Most metals melt before they reach a temperature high enough for thermionic emission. The most widely used thermionic cathode therefore consists of a W wire, 100-200 µm in diameter, bent like a hairpin. Only the bent tip of the filament contributes to the emission, and the emission area is in the range of $10^{-1}$-$10^{-2}$ mm². Sharpened filaments are used to reduce the emitting area. They are made either by electropolishing the hairpin wire directly or by soldering onto a heating wire a small electropolished tip with a small radius of curvature at its apex. Even so, the emission source remains much larger, and the current density much lower, than for FE sources.
2. Field Emission
The current density distribution inside the field-emitted beam is determined by the field distribution over the emitting area, that is, over the tip apex.
To obtain a high electric field $F_0$ at the emitter apex, we use the property that the field near a charged conductor is inversely proportional to the radius of curvature $r$ of its surface [15].
In practice, $\beta$ has to be calculated by taking into account the exact geometry of the blunt tip after each thermal treatment [16]. However, the electric field at the apex of the tip can be estimated to within a factor of 2 by using either the hyperboloidal approximation [17]

$$
F_0 = \frac{2V}{r\ln(4D/r)}
$$
or the paraboloidal approximation [18]

$$
F_0 = \frac{2V}{r\ln(2D/r)}.
$$
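The sketch below puts numbers to these two approximations for an assumed tip: radius 0.1 µm, cathode-anode distance 1 cm, 3 kV applied. As a further assumption that the text does not make, it feeds the apex field into the F-N current density of Eq. (13), with prefactor $1.55\times10^{10}$ and exponent constant 0.685 for $F_0$ in V/Å. In place of the tabulated elliptic-integral values, the Nordheim functions are approximated by Forbes' simple expressions $v(f) \approx 1 - f + (f/6)\ln f$ and $t(f) \approx 1 + f/9 - (f/18)\ln f$, with $f = w^2$. The tip dimensions and function names are illustrative only.

```python
import math


def field_hyperboloid(v: float, r: float, d: float) -> float:
    """Apex field 2V / (r ln(4D/r)); V in volts, r and D in Angstrom, result in V/Angstrom."""
    return 2.0 * v / (r * math.log(4.0 * d / r))


def field_paraboloid(v: float, r: float, d: float) -> float:
    """Apex field 2V / (r ln(2D/r)); same units as field_hyperboloid."""
    return 2.0 * v / (r * math.log(2.0 * d / r))


def fn_current_density(phi: float, field: float) -> float:
    """F-N current density in A/cm^2: 1.55e10 F^2/(phi t^2) exp(-0.685 phi^1.5 v / F).

    phi in eV, field in V/Angstrom. v and t use Forbes' approximations in
    f = w^2, where w = e^(3/2) F^(1/2) / phi, about 3.79 F^(1/2) / phi (an assumption).
    """
    w = 3.7947 * math.sqrt(field) / phi
    f = w * w
    v = 1.0 - f + (f / 6.0) * math.log(f)
    t = 1.0 + f / 9.0 - (f / 18.0) * math.log(f)
    return 1.55e10 * field**2 / (phi * t**2) * math.exp(-0.685 * phi**1.5 * v / field)


if __name__ == "__main__":
    volts, radius, distance = 3000.0, 1000.0, 1e8  # 3 kV, r = 0.1 um, D = 1 cm (in Angstrom)
    f_h = field_hyperboloid(volts, radius, distance)
    f_p = field_paraboloid(volts, radius, distance)
    print(f"hyperboloid: {f_h:.3f} V/A, paraboloid: {f_p:.3f} V/A")
    print(f"J(phi = 4.5 eV) at the hyperboloid field: {fn_current_density(4.5, f_h):.2e} A/cm^2")
```

For this assumed geometry the two approximations give about 0.47 and 0.49 V/Å. They lie within roughly 6% of each other and inside the 0.3-1 V/Å field emission range quoted in the text. As a rough consistency check, `fn_current_density(5.0, 0.63)` returns about $1.0\times10^6$ A/cm², in line with the corresponding entry of Table IV.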
These equations are valid for $r \ll D$, where $r$ is the tip radius and $D$ is the cathode-anode spacing. It is then easy to estimate the voltage needed for field emission ($0.3 < F_0 < 1$ V/Å): a few $10^3$ V is enough if the tip radius is of the order of a few tenths of a µm and the cathode-anode distance is of the order of a cm. To estimate the variation of the current density over the tip, it is necessary to determine how $\beta$ varies as a function of the angle from the tip apex. The tip is usually needle-shaped, so it can generally be modeled by a cone with a hemispherical end of radius $r$, as the image of the simulated tip in Fig. 4 shows [16]. The ratio $\beta(\theta)/\beta_0$, where $\beta_0$ is the apex value, has been given for a similar tip in ref. [15] and is reproduced in Fig. 5. The field variation over the emitting area at the tip end produces a varying current density distribution, $J(\theta)/J(0)$, governed by Eqs. (13)-(14). This distribution is also plotted in Fig. 5 for a constant work function and a field of 0.5 V/Å at the apex. It shows that the FE beam density distribution has a roughly Gaussian shape, with a total opening angle $\theta_c$ of about 200° for $F_0 = 0.5$ V/Å; the angle increases to about 240° when the field increases to 0.7 V/Å. The figure demonstrates that the size of the beam source is controlled principally by the tip geometry. Variations of the work function over the emitting area also affect the current density and are superimposed on this Gaussian distribution. For simplicity, however, this variation is not considered here, since it depends on the specific crystallographic and adsorption state of each tip end. The second effect of the emitter shank on the density distribution is the compression of the lines of force toward the tip axis, which means
FIGURE 4. 3D geometry of a FE tip. The shape is the result of a numerical simulation of the morphological changes by surface diffusion for a tip with a cone angle of 14° [16].
FIGURE 5. Variation of β(θ)/β(0) (from ref. [15]) and of the current density J(θ)/J(0), given by Eqs. (13)-(14), with the angle from the apex (deg, 0-180) of a hemispherical FE tip.
VU THIEN BINH ET AL.
FIGURE 6. Schematic representation of the virtual radial projection point source V for a microtip, relative to the surface apex and its geometric center C.
that the electron trajectories are not radial. Calculations of the electron trajectories [19] have shown that the full beam opening θ_C is reduced to θ, with a ratio 0.5 < θ/θ_C < 0.7.

In effect, the FE e beam is radially emitted from a virtual radial projection point source, V, which is the intersection point of the asymptotes of all the electron trajectories far from the tip (Fig. 6), with a virtual source size of roughly

$$A = 2\pi r^2 \left(1 - \cos\theta_C\right)$$

The virtual point V lies on the tip axis and is shifted behind the geometric center C of the hemispherical apex by a distance at least equal to the tip radius. This compression phenomenon induced by the tip shank can also be treated, in electron-optical terms, as a refractive index, with the tip playing the role of an intrinsic electron lens [20]. The effective beam opening angle from a hemispherical tip is then in the range of 45 to 80°.

E. Current Stability
For both thermionic and field emission sources, the reproducibility and stability of the emission current are determined primarily by the reproducibility and stability of the cathode work function. This can be seen from
ELECTRON FIELD EMISSION FROM ATOM SOURCES
Eqs. (9) and (13), which show that the thermionic and FE current densities depend exponentially on the work function.

1. Thermionic Emission
For thermionic emitters, various low-work-function surface treatments are often employed because they greatly increase the current: for example, depositing on the W surface (φ = 4.5 eV) either a layer of LaB_n (n = 4, 6, 9), which lowers the work function to 2.52-3.35 eV depending on the boron concentration [21], or a layer of ZrO, which reaches φ ≈ 2.8 eV [22]. The decrease in the work function arises from adsorbed surface double layers: a dipole moment μ_ind can be associated with each adsorbate atom. To a first approximation, the corresponding change of the work function due to the adsorbed layer is given by
$$\Delta\phi = 2\pi\,\mu_{\mathrm{ind}}\,N_a\,\theta_a \qquad (25)$$
where N_a is the maximum number of adsorption sites per unit area, θ_a is the fraction of occupied sites, and μ_ind is the adsorbed-atom dipole moment. This equation implies a linear relation between Δφ and both θ_a and μ_ind, i.e., Δφ follows any change in the chemical composition of the first monolayer of the emitter surface. The stability of the emitted current is therefore controlled by the stability of the adsorbed layer. This is a very demanding requirement for hot cathodes working inside an electron gun, especially when the largest possible current density is drawn. Holding the cathode work function at a constant value is thus a very complex technological problem which, together with the difficulty of fabricating homogeneous emitters, has impeded the widespread use of such techniques, for example single-crystal LaB6 cathodes [23].

2. Field Emission
FE currents depend exponentially on φ^{3/2} [Eq. (14)], so the reproducibility and stability of the FE current are strongly affected by adsorption during operation of the emitter. This is the main reason for the regular “regeneration” of the tips in present-day FE guns. Figures 7 and 8 show, as illustrative examples, the variations for hemispherical and buildup tips caused by the adsorption on the surface of gases from the UHV environment. During the first 10 to 20 min the FE current decreases rapidly and regularly to a few percent of its initial value, as adsorbed layers form. The actual duration depends on the working pressure and is traced by a smooth, continuous variation of the FEM pattern. The subsequent increase of the current is due to the formation of multiple localized emitting areas, probably arising from the field-induced formation of small protrusions and from local changes in the work function of the adsorbed layer. The origin can be surface diffusion of the adsorbed atoms in the field gradient, ion bombardment, or both. The long-term behavior is then unpredictable: erratic, locally high-emitting zones appear and subsequently destroy the tip. To avoid this adsorption process, some FEGs operate at temperatures around 1000 K. At such high temperatures the tip is quite insensitive to contamination, and continuous FE for several hours is possible. However, this thermal treatment increases the energy dispersion, as shown in Fig. 9, and causes geometric instability through surface diffusion.

FIGURE 7. Total FE current in UHV for a W<111> hemispherical microtip at fixed applied voltage, measured as a function of time (min) from a flash cleaning.

FIGURE 8. Total FE current in UHV for a buildup microtip at fixed applied voltage, measured as a function of time (min) from a flash cleaning.

FIGURE 9. Measured TEDs from a W microtip at 300 K (FWHM = 0.25 eV) and at 1400 K (FWHM = 0.58 eV), plotted against energy relative to E_F (eV), showing the large increase in energy spread with temperature.
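The exponential sensitivity of the FE current to φ^{3/2} can be made concrete with the elementary Fowler-Nordheim expression. This is a hedged sketch. It uses the textbook form J = (A F²/φ) exp(−B φ^{3/2}/F) with the standard first and second FN constants and omits the image-charge correction, so it stands in for the chapter's Eq. (14) rather than reproducing it. The 5 V/nm field and the 0.1 eV work-function shift attributed to an adsorbed layer are illustrative assumptions.

```python
import math

A_FN = 1.541434e-6  # first Fowler-Nordheim constant, A eV V^-2
B_FN = 6.830890     # second Fowler-Nordheim constant, eV^-3/2 V nm^-1


def fn_current_density(F, phi):
    """Elementary FN current density in A/nm^2 for field F (V/nm) and work function phi (eV).

    No image-charge (Schottky-Nordheim barrier) correction is applied.
    """
    return (A_FN * F**2 / phi) * math.exp(-B_FN * phi**1.5 / F)


F = 5.0             # assumed apex field, V/nm (0.5 V/Angstrom)
phi_clean = 4.5     # clean W work function quoted in the text, eV
phi_adsorbed = 4.6  # assumed small increase caused by an adsorbed layer, eV

ratio = fn_current_density(F, phi_adsorbed) / fn_current_density(F, phi_clean)
print(f"a 0.1 eV work-function increase scales the current by {ratio:.2f}")
```

Under these assumptions a shift of only 0.1 eV changes the emitted current by roughly a third, which is why adsorption dominates the short-term current drift.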
III. ELECTRON EMISSION FROM NANOTIPS

Moving from thermionic to field emission cathodes mainly reduces the emitting area and the energy dispersion of the e beam. This section summarizes the further appreciable improvements over conventional microscopic field emission cathodes that can be obtained by using nanotips as field emission sources. These nanotips consist of 2- to 5-nm-high pyramidal nanoprotrusions of single-atom sharpness on top of hemispherical base tips [2, 3]. Because the emitting area is of atomic size, their measured field emission characteristics differ markedly from the conventional behavior presented above, and some of these differences have not yet been explained. Table VI summarizes the main differences between the field emission characteristics of microscopic tips and nanotips; it shows clearly that nanotips have most of the qualities needed for a substantial improvement of the FEG. These experimental characteristics, as well as the physics of field emission from one atom, will be discussed in order to explain these specific properties.

TABLE VI
COMPARISON OF ELECTRON SOURCE PROPERTIES OF MICROSCOPIC TIPS AND NANOTIPS

| Property | Microscopic tips | Nanotips |
|---|---|---|
| Emitting area, A | 2πr²(1 − cos θ_C) ≈ πr², with r ≥ 2.5 nm | Apex atom |
| Beam opening, θ | 45-80° | 4-6° |
| Stability | Minutes; regular decrease | Hours; discrete jumps |
| I-V characteristics | F-N straight line | Current saturation |
| TED | From conduction band → a peak at E_F | From localized band(s) → localized peak(s) and shift |
| Energy dispersion, ΔE | ΔE ≥ 0.3 eV, increasing with T and F_0 | ΔE ≥ 0.06 eV, peak shifting with F_0 |

The discussion is organized into four main parts:
III.A. A description of the experimental system used for these studies, which includes field electron emission microscopy (FEM), field electron emission spectroscopy (FEES), and field ion microscopy (FIM) in the same chamber.
III.B. A comparison of nanotips with other tips that also exhibit a confinement of the field emission area.
III.C. An overview of the experimental characteristics of the field-emitted beams from nanotips, pointing out the specific properties associated with the atomic size of the emitting area.
III.D. A discussion of the physics of the observed emission properties, taking into account the atomic nature of a nanotip.

A. Experimental Setup and Procedures

Most of the experimental results presented in this chapter were obtained with the installation shown schematically in Fig. 10. This installation allows, in the same ultrahigh-vacuum chamber, in situ tip treatments, field electron emission microscopy (FEM), field electron emission spectroscopy (FEES), and field ion microscopy (FIM). The tip mounting includes both a mechanical movement and electrostatic and magnetic deflection systems, which allow switching between FEM, FEES, and FIM measurements at will within the same environment. The whole device is inside a chamber which has
FIGURE 15. 3D calculated field distribution with atomic resolution over an equipotential surface for a nanotip (p_0 = 4 nm and base diameter = 4 nm). (a) A complete view of the nanoprotrusion on top of the 50-nm-radius base tip. (b) A close view of the nanoprotrusion apex showing the local field enhancement over the topmost atom. The color scale represents the range of variation of the β factor, which is between 10^4 and 10^… cm^-1. (From ref. [34].)

FIGURE 21. AMIE spots observed on the screen for W and Au nanotips. These metallic ion beams come from the ionization of the fast-diffusing atoms moving toward the apex of the protrusions. After quenching, FEM and FIM show the superposition of the three emitting spots due to gas imaging ions (FIM), electrons (FEM), and metallic ions (AMIE). This is shown in the right-side image, obtained with a W nanotip.

FIGURE 10. Experimental setup used for studying the fabrication and emission properties of nanotips. The UHV system contains FEM, FEES, and FIM facilities in the same chamber.
a base vacuum of ~5 × 10^-11 torr and a controlled gas introduction system. The tips used in these studies were electrochemically etched [24] from Pt and Au polycrystalline wires, Fe single-crystal whiskers, and W(111) single-crystal wire. The etched tips were spot-welded onto W loops, allowing the tip temperature to be controlled by Joule heating and by cooling with liquid nitrogen. The temperatures were determined by a combination of optical micropyrometer measurements on the conical tip shank, the heating-loop resistivity values, and fits of the experimental TED spectra. The controlled temperature range available was 80-3500 K. For FEES, a fluorescent screen with a 1-mm-diameter probe hole in its center was placed 2.5 cm from the tip. Any region of the FEM pattern could be studied by the electron energy analyzer by using the tip displacement movement and visual control of the pattern on the screen. The TEDs
were measured with a commercial 135° hemispherical energy analyzer with a nominal resolution of 10 meV, positioned behind the probe hole, whose entrance lenses had been adapted for the FEES measurements. The tip mounting and deflection systems allowed selection of the local zone of the tip apex region to be analyzed, together with alignment of the e beam with the analyzer axis. FEM and FIM observations were made with the standard technique, i.e., a microchannel plate (MCP) in front of a fluorescent screen located 5 cm from the tip. FEM and FIM patterns were recorded by a video camera connected to a tape recorder and a digital image processing system.

B. Confinement of the Field Emitting Area

The first problem to be addressed in improving tip performance is the reduction of the field emission area. As the FE area is governed principally by the tip geometry, and in particular by the apex structure and composition, three approaches can be envisaged for narrowing the FE area at the apex, as depicted in Fig. 11:
1. Decrease the whole tip radius, i.e., produce ultrasharp tips.
2. Confine the emission to a small area by modifying the atomic structure and/or the work function.
3. Confine the field to a small protruding zone, i.e., fabricate buildup tips and nanotips.
FIGURE 11. Schematic diagram showing the three possibilities for FE tips to confine the emission area to nanometer dimensions.
FIGURE 12. FIM images of an electrochemically etched ultrasharp tip with an estimated radius of about 2 nm. (a) The best image voltage (BIV) is on the apex three-atom (111) facet. (b) The BIV is on the zones underneath, with a consequent loss of resolution of the end trimer, in order to show the structure underneath the hemispherical tip end.
The results of the studies of these mechanisms, presented in the following sections, clearly show the advantages of using nanotips to confine the whole emission area to the last apex atom of the nanometric protrusion.

1. Ultrasharp Tips

Ultrasharp tips are tips with end radii of a few nanometers. The field emission confinement is then simply a result of the reduction of the high-field apex area. The tip radius can be decreased to a few nm either ex situ, by electrochemical etching, or in situ, by ion bombardment. The electrochemical tip etching technique [7, 8] can be controlled to produce very sharp tips [24]. Figure 12 shows the FIM image of a W(111) tip with a radius of about 20-25 Å. It was obtained by electrochemical etching in NaOH (2N) with a controlled pulsed AC current, followed by a very gentle field evaporation of the first adsorbed layer after introduction into the vacuum. Note that in this example the (111) plane at the tip apex is a three-atom plane. Another possibility is the in situ etching of initial tips with radii of ~100 nm by ion bombardment during FE under a pressure of … to … torr of Ar or Ne [25]. This technique can be pushed to produce ultrasharp tips with radii of about 10 nm (Fig. 13).
FIGURE 13. Evolution of the tip radius with argon sputtering time (sputter voltage 500 V, sputter current 10 μA; sputter time 0-50 min). The inset is the FEM image at the end of the sputtering cycle. The radii were estimated from the voltages needed to obtain a fixed FE current of 1 × 10^… A.
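The caption does not say how the voltages were converted into radii. One plausible route, sketched here purely as an assumption, is to treat a fixed FE current at constant work function as a roughly fixed apex field F_0 and to invert the hyperboloidal approximation F_0 = 2V/(r ln(4D/r)) for r. The field, spacing, and voltages below are hypothetical.

```python
import math


def radius_from_voltage(V, F0, D, r_lo=1e-10, r_hi=None):
    """Invert V = F0 * r * ln(4D/r) / 2 for the tip radius r (SI units) by bisection.

    r * ln(4D/r) increases monotonically for r < 4D/e, so a single root exists
    in the bracket [r_lo, r_hi] when r_hi is kept well below D.
    """
    if r_hi is None:
        r_hi = 1e-2 * D

    def residual(r):
        return F0 * r * math.log(4.0 * D / r) / 2.0 - V

    if residual(r_lo) > 0 or residual(r_hi) < 0:
        raise ValueError("voltage lies outside the bracketed radius range")
    for _ in range(200):
        mid = 0.5 * (r_lo + r_hi)
        if residual(mid) < 0:
            r_lo = mid
        else:
            r_hi = mid
    return 0.5 * (r_lo + r_hi)


F0 = 0.5e10  # hypothetical apex field giving the fixed current, V/m (0.5 V/Angstrom)
D = 1e-2     # hypothetical tip-screen spacing, m
for V in (1700.0, 1000.0, 400.0):  # hypothetical voltages recorded during sputtering
    print(f"V = {V:6.0f} V  ->  r = {radius_from_voltage(V, F0, D) * 1e9:5.1f} nm")
```

Bisection is used because r ln(4D/r) has no elementary inverse, and its monotonicity for r ≪ D guarantees a unique bracketed root.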
The use of these ultrasharp tips for FE applications calls for the following comments:
1. The production of such a tip, either by electrochemical etching or by in situ sputtering, requires a priori control of the tip radius by FIM, which is not very convenient for FE gun settings.
2. With electrochemical fabrication, the tip must be etched as shortly as possible before its introduction into vacuum, because of possible tip evolution and blunting by corrosion outside the vacuum chamber, especially if the radius is to be kept in the range of several nanometers.
3. The source size for electron emission is roughly determined by the tip radius, which is still at best ~20 × 20 atoms.
4. The beam opening angles are those of a tip ending in a hemispherical cap, i.e., in the range of 45 to 80°, as discussed in Section II.D.2. Therefore, a diaphragm is needed to collimate the beam, which results in a relatively low current density available in the final beam.
5. The emission is very unstable if an ultrasharp electrochemically etched tip is used. To keep its nanometric radius, the initial tip cannot be thermally cleaned, so the adsorbed layer on the shank diffuses instantaneously to the apex when the field is switched off, causing nonreproducibility and large fluctuations during field emission.

2. Local Work Function Decrease
The second technique is based on a significant lowering of the local work function by external deposition of appropriate foreign atoms and/or reorganization of the surface atomic structure [4, 26-29]. Products such as copper phthalocyanine and Ba, Cs, or Zr compounds, for example, are used to enhance the FE over single adsorbed molecules or atoms. This technique has been discussed in detail by these authors and has the following characteristics:
1. The adsorbate is unstable, especially under field emission (see below), owing to the energy transfer between the emitted electrons and the substrate (Nottingham effect). This instability becomes critical in particular for the deposition of an individual atom or molecule at a selected site for a controlled local enhancement [28, 30].
2. The local field enhancement may be created at one atom or molecule [4, 29], but it does not suppress the field emission from the surrounding regions. The emitted electron beam therefore contains the background FEM pattern superimposed on the field emission spots of the adsorbed particles, as is clearly visible in the FEM patterns [4].

For better control of the local modification at the apex, combining the work function decrease with the tip geometry gives a more localized FE area. This is the technique used to obtain the ZrO/W tips [22, 31] used in Schottky emission guns.

3. In Situ Field Sharpening
A procedure for fabricating in situ tips with localized FE over atomic-size areas is the thermal field shaping method, i.e., the use of high-temperature diffusion of surface atoms in the presence of large electric field gradients. Two cases have to be considered:
1. When the applied electric field F is in the range of a few tenths of a V/Å, buildup tips are obtained [32].
2. For F > 1 V/Å, field surface melting occurs and leads to the formation of nanometric protrusions on top of the base tips [2, 3]. Because of their specific protruding geometry, these cathodes were named teton tips or nanotips.
FIGURE 14. (a) Equipotential lines for a hyperbolic tip (radius = 50 Å and applied voltage = 175 V) and for a point charge 2 Å from the apex (from ref. [33b]). (b) Equipotential lines corresponding to the superposition of the potentials given in (a). The dashed region corresponds to the tunneling region. (c) The same as in (b), including the image force correction. In this case the tunnel barrier is lower and the equipotentials near the protrusion are almost flat.
The basic mechanism for the protrusion formation will be considered in the following paragraphs. The first step is to analyze the enhancement of the local field with the protrusion geometry.
a. Local Field Enhancement. The protrusion technique is based on the local field enhancement over nanoprotrusions [19], which leads to the confinement of FE to their apexes. The nanoprotrusions distort and compress the equipotentials in their vicinity. To estimate the field enhancement, let us first consider an analytical approach that uses the superposition [19, 33a] of the potential of a point charge or a dipole on top of a microscopic tip whose potential distribution is described by the hyperboloid-screen geometry. The resulting equipotential line for V = V_0 then defines the whole tip, including the protrusion at the apex (Fig. 14). The field distribution along the tip axis in the presence of such a protrusion was shown to be
where p_0 is the height of the protrusion, z is the distance from the apex of the microscopic base tip, and F_0 is the field at the apex without the protrusion. The field at the top of the protrusion (z = p_0) is ~3 times that of the
substrate in its vicinity; in this model, this value does not depend on the protrusion height. Numerical calculations based on the superposition principle give a more precise 3D potential distribution, at the atomic scale, for the whole tip, consisting of a base tip with a nanoprotrusion at the apex [34]. The base tip is again described by a hyperboloid, but the protrusion is now modeled on the atomic scale by a cluster of electrostatic charges placed at the centers of the spheres at the atomic sites that shape the protrusion. The value of each charge is obtained by minimizing the electrostatic energy of the capacitor formed, on one side, by the hyperboloid and the cluster of spheres, all at the same potential V_tip, and, on the other side, by a plane orthogonal to the tip axis, representing the screen, located a few centimeters away at another fixed potential. The 3D potential distribution for a given tip-screen voltage is then calculated by summing the potentials created by the hyperboloid and all the charges, and the field is derived from this potential distribution. Since this method assumes no symmetry except for the base tip, it yields the potential and field distributions in 3D with atomic resolution for any protrusion shape. Figure 15 (see color plates following page 82) is an example of such a calculation of the 3D field distribution over a conical protrusion with a (111) axis, base radius 2 nm, and height 4 nm, placed on top of a base tip of 50 nm radius. The variation of the parameter β of Eq. (21), which is equivalent to the field distribution, is plotted in Fig. 16 for a cross section through protrusions with different heights and a fixed cone angle of 53° (Fig. 16a), and for different heights and cone angles (Fig. 16b). For small protrusions, an enhancement factor of 3.2 is found for p_0 = 1 nm.
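Equation (26) itself is not reproduced here, but the size-independent factor of ~3 has a well-known textbook analogue: a conducting hemispherical boss of radius a on a flat conductor in a uniform applied field F_0, whose on-axis field is F(z) = F_0[1 + 2(a/z)^3] for z ≥ a, with z measured from the flat surface. The sketch below evaluates that classical model. It is a different geometry from the point-charge-on-hyperboloid model of ref. [19] and is offered only as an illustration.

```python
def boss_axis_field_ratio(z, a):
    """F(z)/F0 on the axis of a conducting hemispherical boss (radius a) on a flat
    conductor in a uniform field F0; z is measured from the flat surface, z >= a."""
    if z < a:
        raise ValueError("z must be on or above the boss apex (z >= a)")
    return 1.0 + 2.0 * (a / z) ** 3


for a_nm in (0.5, 1.0, 2.0):
    apex = boss_axis_field_ratio(a_nm, a_nm)
    one_radius_above = boss_axis_field_ratio(2 * a_nm, a_nm)
    print(f"a = {a_nm} nm: apex ratio {apex:.2f}, one radius above the apex {one_radius_above:.2f}")
```

In this classical model the apex enhancement is exactly 3 whatever the radius, which mirrors the height-independent value of the analytical approach; the field decays quickly above the apex.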
The numerical results confirm the value of the field enhancement estimated from Eq. (26) (see Fig. 16a). This β value is practically constant for p_0 < 1 nm, which emphasizes the role of electrostatic screening by the base tip. A second result must be noted: for a given geometry of the protrusion end (i.e., a constant cone angle of ~53°) and for p_0 ≥ 2 nm, the field at the apex is ~9.5 times that of the substrate in its vicinity. This enhancement factor is much larger than the value of 3 estimated by the analytical approach [Eq. (26)]. As FE is an exponential function of βV, protrusions with p_0 ≥ 2 nm confine the emission area exclusively to the top of the protrusions. This is the essential reason for choosing nanotips as advanced FE sources.

b. Tip Sharpening in the Presence of Applied Field. In this approach, the role of the applied field is to induce a gradient in order to define a
FIGURE 16. (a) Field distribution over the apex of nanotips with conical protrusions of different heights and a fixed cone angle of 53°. (b) Field distribution over the apex of nanotips with conical protrusions of different heights and a fixed base radius of 2 nm. Horizontal axis: distance from apex (nm), -4 to 4. (From ref. [34].)
direction of surface diffusion. As this driving force is effective only if it is applied to diffusing atoms, the protrusion formation has to be performed at temperatures high enough to create mobile atoms on the metal surface. Under the conditions of elevated temperature and field, the surface atoms will migrate from low-electric-field regions toward higher-field regions.
The final geometry of the tip apex is then governed by the equilibrium between two opposing diffusion processes, the first driven by the gradient of the electric field and the second by the capillary forces (gradient of the surface chemical potential) [35]. Depending on the applied electric field, two different tip end geometries can be obtained: buildup tips for F around 0.5 V/Å and nanotips for F larger than 1 V/Å. The specific properties of each of these two profiles are described next.

Buildup Tips. Consider the case when the applied field is around 0.5 V/Å; this procedure is termed the buildup technique [32]. The applied field induces a gradient across some low-index facets, which consequently grow. It can be performed either with positive or
FIGURE 17. FEM of a buildup sequence. (a)-(d) Evolution of the FEM pattern from a hemispherical W(111) tip (a) to a buildup tip (d), due to the enlargement of the three {112} facets under temperature and field. All the patterns are at the same scale, to show the confinement of the emission area to one spot during the buildup.
negative polarity. With negative polarity, the variation of the apex geometry can be followed in time by FEM because the field value is in the FE range. For W(111) tips, the facets that are enlarged are the three {112} planes around the tip axis. Taking the facet enlargement to its limit ends in the intersection of the three facets at a corner at the (111) apex (Fig. 17). This corner can end in one or three atoms and becomes a small triangular facet after a controlled field evaporation. This local region of high curvature creates a predictable local field enhancement [19]. The calculation of the field distribution at the atomic scale gives an enhancement factor of about 1.4 relative to the surrounding field, as shown in Fig. 18 [34]. This enhancement factor is small because the angle between (112) and (111) is only ~20°. It is enough to allow preferential FE over the protruding apex, but not exclusive FE, as indicated by the FEM patterns and by local current measurements over the tip end cap. Figure 19 shows the same FEM pattern at increasing FE voltages and MCP gain, which clearly shows that the apparent confinement is partly an artefact due to the sensitivity of signal detection. Thus, for buildup tips the FE current is not confined to the apex atoms: the buildup only preferentially enhances the FE over the intersection corners of some facets, by a ratio I/I_0 = 15, without being exclusive. Derived methods that increase the local angular beam confinement by combining the buildup with selective work function reduction (with oxygen processing or ZrO coating, for example) [36] are now commonly used in commercial FE guns.
FIGURE 18. Field distribution over the apex of a (111) buildup tip (dashed line) and a nanotip (height p_0 = 2 nm and cap diameter 4 nm), plotted against distance from apex (nm). (From ref. [34].)
FIGURE 19. (a)-(c) Comparison of the FEM pattern of a buildup tip (a) with the corresponding FIM pattern (b), which shows the one-atom boundary between the {112} facets and the three-atom corner forming the tip apex. Increasing the FE voltage and the MCP gain reveals that the emission area is not confined to the topmost atoms, as shown by the FEM pattern (c) of the same buildup tip as in (a).
Teton Tips or Nanotips [2, 3]. To obtain exclusive FE from the protrusion apex, the calculations indicated that a minimum protrusion height of about 2 nm is necessary. A very high atomic mobility is needed to obtain such a protrusion height, as for example in Taylor cone formation with liquid layers [37]. However, in that case the apex of the protrusion is in the micrometer range. To obtain a very sharp apex, protrusion formation by a surface melting mechanism has been introduced [2, 3]. Under these conditions the surface atoms are very mobile while the protrusion underneath is still solid; this is the main difference from classic Taylor cone formation. It is the very high mobility of the surface atoms, driven by the field gradient over the solid substrate, that leads to the formation of nanoprotrusions ending with atomic sharpness. This process is detailed in the following paragraphs.

Field surface melting mechanism. To increase the mobility of the surface atoms only, they must be subjected to an action that lowers their activation energy without affecting the neighboring atoms underneath. This is what happens when a large electric field is applied to a metal surface. For a flat surface, the reduction of the activation barrier for surface diffusion by the field is negligible even at very large applied fields, because the dipole induced by the applied field is small. However, if the surface is rough (with adatoms, vacancies, kinks, steps, etc., as produced by thermal treatment), the permanent dipole moments differ from point to point. This difference is increased by the spreading out of the surface charge [38]. The action of the field is then enhanced on the protruding parts of the surface. The estimation of surface diffusion in the presence of an applied field
can be made by considering the activation energy for surface diffusion in the presence of a field, Q(F) [8]:

$$Q(F) = Q_0 - \tfrac{1}{2}\alpha F^2 - \mu F \qquad (27)$$
where Q_0 is the activation barrier for diffusion at zero field [39], and α and μ are the atomic polarizability and the permanent dipole moment, respectively. The surface diffusion coefficient in the presence of F is given by
where a is the jump distance of the diffusing atoms, taken to be the unit cell (~3 Å), and ν_0 is the attempt frequency (10^12-10^13 s^-1). For a field of 2.55 V/Å, which is approximately the value used for fashioning W nanotips, D_s = 3 × 10^… cm²/s at 1200-1500 K. D_s can also be estimated from the atom flux needed to sustain the experimental atomic metallic ion emission (AMIE) beam of 10^6 ions/s (see below); the value obtained is also in the range of 10^… cm²/s. As the criterion for surface melting is a diffusion coefficient larger than 2 × 10^… cm²/s [40], the surface in the presence of a very high field is then locally melted at about one-third of the bulk melting temperature.

Growth and formation of nanoprotrusions. The high diffusivity allows some existing protrusions to grow in height, because the field gradient acts as a driving force on the thermally induced corrugations; this leads to the formation of nanoprotrusions. The geometry of the nanoprotrusions formed is determined by the equilibrium between the pulling-up by the electric field gradient force and the blunting due to the capillary force. A schematic drawing of this mechanism is given in Fig. 20. When the field enhancement over the apex of these protrusions is high enough, i.e., above a certain height, the last atom is ionized. This gives rise to a metallic ion beam that is regulated by the supply of diffusing surface atoms to the apex under the field gradient. The appearance of such atomic metallic ion emission (AMIE) is detected by the presence of a spot on the screen placed in front of the tip (Fig. 21, see color plates following page 82). By adjusting the two parameters, F and T, the high protrusion formed during AMIE can be made to end in one atom. Note that the AMIE mechanism has been observed experimentally for W, Pt, Au, and Fe, and therefore it can be used for all metallic emitters. Of crucial importance is that the high protrusion geometry remains intact upon quenching.
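Equation (27) can be combined with an Arrhenius expression for D_s to show how a strong field raises surface diffusivity. The chapter's own expression for D_s is not reproduced here, so the sketch assumes the common form D_s = a² ν_0 exp(−Q/kT), with the jump distance (~3 Å) and attempt frequency (10^13 s^-1) quoted in the text. The zero-field barrier, polarizability, and dipole moment are hypothetical placeholders, not tungsten data.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant, eV/K


def activation_energy(F, Q0, alpha, mu):
    """Eq. (27): Q(F) = Q0 - alpha*F**2/2 - mu*F, in eV.

    F in V/Angstrom, mu in e*Angstrom, alpha in e*Angstrom^2/V, so that both
    field terms come out in eV (alpha[e A^2/V] = alpha_volume[A^3] / 14.40).
    """
    return Q0 - 0.5 * alpha * F**2 - mu * F


def surface_diffusion_coefficient(T, Q, a_cm=3e-8, nu0=1e13):
    """Assumed Arrhenius form D_s = a^2 * nu0 * exp(-Q / (k T)), in cm^2/s."""
    return a_cm**2 * nu0 * math.exp(-Q / (K_B * T))


Q0, alpha, mu = 2.0, 0.1, 0.1  # hypothetical values, for illustration only
T = 3695.0 / 3.0               # about one-third of the W melting point (3695 K)
for F in (0.0, 2.55):          # zero field and the nanotip-forming field from the text
    Q = activation_energy(F, Q0, alpha, mu)
    print(f"F = {F:4.2f} V/A: Q = {Q:.2f} eV, D_s = {surface_diffusion_coefficient(T, Q):.1e} cm^2/s")
```

Because D_s(F)/D_s(0) = exp[(αF²/2 + μF)/kT], a barrier reduction of only a few tenths of an eV at ~1200 K raises the diffusivity by roughly two orders of magnitude for these placeholder values.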
After cooling, the resulting protrusions are generally (111) pyramids 2 to 3 nm in size (base and height) ending in one atom. The FIM analysis of such a nanoprotrusion is presented in Fig. 22(I): the sequence shows the FIM images of the apex atom and of the underlying structure obtained by progressive field evaporation.
FIGURE 20. Atomic metallic ion emission and nanotip formation: schematic of the field- and temperature-driven formation of nanometric protrusions on a metal surface. Atomic metallic ion emission (AMIE) from the protrusion apex occurs under a positive field > 1 V/Å and T ≈ one-third of the melting point. Rapid quenching preserves the nanoprotrusion with a one-atom apex.
As mentioned above, 3D calculations of the field distribution for teton tips showed a field enhancement factor of 7 to 10 over the apex atom compared to the base tip [34], which means that all the FE current comes exclusively from the single apex atom. FEM observations of the protrusion tip showed FE only from the protrusion zone, i.e., a FE spot of <2 mm on a screen ~5 cm away. No pattern outside this spot is observed over the whole range of applied voltages used for observing the nanotips [Fig. 22(II)]. From the above comparisons, the nanotip or teton tip geometry is best
96
VU THIEN BINH ET AL.
11. (a)
11. (b)
11. (c)
FIGURE22. 1. FIM analysis of a W nanotip on top of a ( 1 11) base tip. Images (a)-(f) are FIM patterns [Ne is imaging gas except for (e), which is with He] during a progressive field evaporation. They show the topmost atom (a), then a trimmer (b), a small facet (c), the crystalline structure of the nanopyramid (d), and the triangular base of the nanoprotrusion (e). The FIM pattern (f) is obtained with BIV on the base tip and shows the location of the nanoprotrusion (the blurred central zone) within the ending-cap base tip. 11. FE spot from the W nanotip at different voltages (a) 525 V, 1.5 x A, (b) 620 V, 2 X A, (c) 700 V, 1.3 X lo-* A. The spot geometry is the same for the three FEM patterns, and this is an indication of an exclusive FE from the apex of the nanotip.
suited as an electron source in view of the size of the emitting area and the confinement of the electron beam. The confinement of the FE area over the apex atom implies specific FE characteristics, which will be described below.
C. Field Emission Characteristics from Nanotips: Experiment
In this section the measured field emission properties of the nanotips are presented. The measurements include (1) the beam opening angle, (2) the current stability in time, (3) the I-V characteristics, and (4) the total energy distribution. To clarify the presentation, the experimental measurements presented in this section are separated from their interpretation, which is given in Section III.D.
1. Beam Opening Angle
Typical images of the FEM patterns of a nanotip are shown in Fig. 22(II). The emitted electron beam was self-collimated to opening angles of 4–6° [2, 3]. In practice, this beam gives rise to a single, nearly round spot about 0.5 cm in diameter on a fluorescent screen situated 5 cm away. The electron current distribution across the beam was characterized either by direct measurement with a probe hole or by image processing of the intensity of the spot on the fluorescent screen. In both cases the experimental measurements have to be deconvoluted by the measurement function (probe-hole diameter, or MCP gain and fluorescent-screen response) to obtain the actual distribution. For different extraction voltages the distribution is generally Gaussian, as shown by the example in Fig. 23, and the size of the emission spot does not change dramatically over a large range of FE voltages. This indicates that the total FE current comes exclusively from the nanoprotrusion apex even at higher voltages.
FIGURE 23. Measurement of the angular distribution of the FE current of a nanotip by the probe hole technique. [Horizontal axis: beam angle (deg), −10 to 10.]
FIGURE 24. Total current emitted by a W nanotip as a function of time for a current of about 1 nA. [Horizontal axis: time (hours), 0 to 8.]
2. Field Emission Stability
A characteristic example of the nanotip FE current stability at fixed FE voltage (V_App), starting with a clean tip, is shown in Fig. 24. From multiple experimental results, three properties can be highlighted:
1. The current is highly stable over periods of hours for currents ≤1 nA. In the example of Fig. 24, the variation is less than 1% for ~10 h of continuous emission.
2. For higher currents, up to 0.1 µA (Fig. 25), we observed reversible and irreversible jumps of the current between different levels, followed by periods of relative stability. These changes have been shown to be due to current-induced local heating (see Section IV.C).
3. For FE currents higher than 0.1 µA, the probability of destroying or irreversibly changing the protrusion is very high. This constitutes the third characteristic of FE from nanotips: the useful current coming from the apex of the nanoprotrusion has an upper limit of around 0.1 µA.
However, even if the stable working current is ~1 nA, the brightness of the beam is still exceptional, considering that the whole current comes from an atom-size source with a 4–6° beam opening.
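As a rough illustration of the brightness claim, the sketch below evaluates B = I/(AΩ), with emitting area A = πr² and the small-angle solid angle Ω ≈ πα², where α is the half-angle of the beam. The 1-nA current and the 4–6° full opening are taken from the text; the 0.15-nm emitting radius (roughly one atomic radius) is our assumption, and the virtual source size relevant to electron optics may differ from it.

```python
import math

def brightness(current_a, source_radius_m, full_angle_deg):
    """Order-of-magnitude brightness B = I / (A * Omega) in A m^-2 sr^-1.

    A = pi * r^2 is the emitting area; Omega ~ pi * alpha^2 is the solid angle
    of a narrow cone with half-angle alpha (small-angle approximation).
    """
    alpha = math.radians(full_angle_deg) / 2.0
    area = math.pi * source_radius_m ** 2
    solid_angle = math.pi * alpha ** 2
    return current_a / (area * solid_angle)

# 1 nA from the text; the 0.15 nm source radius is a hypothetical value
for angle in (4.0, 6.0):
    b = brightness(1e-9, 0.15e-9, angle)
    print(f"{angle:.0f} deg: B ~ {b:.1e} A m^-2 sr^-1 ({b * 1e-4:.1e} A cm^-2 sr^-1)")
```

Under these assumptions B comes out of order 10¹² A m⁻² sr⁻¹ (10⁸ A cm⁻² sr⁻¹); because B scales as 1/r², the result is very sensitive to the assumed source radius.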
FIGURE 25. Total current emitted by a Pt nanotip as a function of time for a current in the 50-nA range. [Horizontal axis: time (min), 0 to 80.]
3. I-V Characteristics
The variation of the field emission current from a nanotip versus the applied voltage is plotted as ln(I/V²) versus 1/V. An example of such I-V characteristics is shown in Fig. 26. The main difference from the Fowler–Nordheim relation of Eq. (14) is the current saturation at high voltages [2]. This means that, relative to field emission from a metal surface, the emission current falls progressively further below the Fowler–Nordheim prediction as the applied field is increased. This current saturation behavior is not observed for buildup tips (Fig. 27), and it is therefore a signature of exclusive FE from the apex of the nanotip when localized FE is observed.
FIGURE 26. FE I-V characteristics from a nanotip. [Fowler–Nordheim plot; horizontal axis: 1/V (V⁻¹).]
FIGURE 27. [Fowler–Nordheim plot for a W⟨111⟩ build-up tip.]
4. Total Energy Distribution [41]
The experimental procedure for measuring the TEDs of nanotips was as follows: (1) in situ fabrication of a clean microscopic tip of less than 100-nm radius; (2) FEES measurements of this tip centered on the (111) region; (3) fabrication of a single-atom protrusion on top of this tip; (4) FEES study of this protrusion; and (5) destruction of this protrusion by controlled heating of the tip, followed by FEES measurements of the resulting microscopic tip in the (111) region. The TED spectra recorded for microscopic tips during steps (2) and (5) had the same shape, were located at the Fermi level (E_F), and showed the well-known behavior of clean W(111) tips [42]. Two example spectra, measured after the destruction of the protrusion for different values of the applied voltage, are presented in Fig. 28. Basically, one strong peak was observed, with a sharp edge at E_F for any applied voltage V_App. An increase of V_App did not change the position of the Fermi edge; it only broadened the peak on the low-energy side. To fit the spectra we used the classic Eq. (18) for the tunneling current from a free-electron metal, developed by Young [14]. Excellent agreement between the experimental data and theory was found, as shown in Fig. 28. The value of E_F for microscopic tips was constant and was therefore taken as the reference level for all the spectra. The measured FWHMs for the spectra in Fig. 28 were 0.23 eV and 0.30 eV for 1650 V and 2250 V, respectively, a variation that is a consequence of the increase of the slope of the tunnel barrier.
FIGURE 28. TEDs of the W(111) microtip before formation of the nanotip. The fits by Eq. (18) are represented by the solid curve over the experimental points. [Horizontal axis: energy relative to E_F (eV), −1.2 to 0.2.]
The TED spectra were recorded for different V_App after formation of protrusions on top of the microscopic tip. Four salient features, which are not present for microscopic tips, can be discerned in the experimental observations:
1. The spectra are composed solely of well-separated peaks. To show clearly the relation between the peaks and the protrusion height, the evolution of the TEDs versus protrusion height on the same base tip is plotted in Fig. 29. The TED spectra were recorded at different stages during the formation of a protrusion, that is, at increasing height, on top of the microscopic tip. As the height of the protrusion increased, the contribution from the localized band became predominant (curves 2 and 3), then exclusive (curve 4). For build-up (BU) or small-radius tips, standard TEDs were measured (curve 1). In general, the number of peaks and their relative intensities depend on the protrusion geometry and on V_App. Figures 30a and 30b show examples of TED spectra from different nanotips with one and two peaks, respectively.
FIGURE 29. Evolution of the TEDs versus height of the nanoprotrusion, showing the appearance of localized peaks not at E_F. [Legend: W(111); (1) build-up tip; (2)–(4) nanotips with increasing protrusion height. Horizontal axis: energy relative to E_F (eV).]
2. The peaks are not pinned at the Fermi level, and a linear shift of the whole spectrum is observed as a function of V_App. This is illustrated in Fig. 31, which shows the dependence of a two-peak TED on V_App. The positions of the peak maxima as a function of V_App are plotted in Fig. 32. All the data fall on parallel lines, with slopes of 1.65 ± 0.02 meV per applied volt, showing that the separation between the peaks remains constant. The total shift of the peaks over the range of V_App in this experiment was ~0.7 eV. Note that no shift was detectable for microscopic tips for similar changes of V_App. The shifts, and the intensity variations of the peaks during their shift, were reversible; they could be varied reproducibly by changing V_App.
3. None of these peaks could be fitted satisfactorily by Eq. (18). For the same FWHM, the spectra from the microscopic tip have wider tails on the low-energy side and sharper maxima than the peaks from the protrusion.
4. The FWHMs of the peaks vary little with V_App. For example, for the TEDs of Fig. 31, the FWHMs remain ~0.24 eV for all values of V_App.
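For comparison with feature 3, the sketch below evaluates the free-electron TED usually attributed to Young, j(ε) ∝ exp(ε/d)/[1 + exp(ε/kT)], with ε measured from E_F and d = ħeF/(2√(2mφ)). We assume this is the form of Eq. (18) and neglect the image-charge correction; the field and work function are illustrative inputs, not fitted values. The FWHM is computed numerically so that it can be compared with the ~0.24-eV widths quoted above.

```python
import math

HBAR = 1.054571817e-34      # J s
E_CHARGE = 1.602176634e-19  # C
M_E = 9.1093837015e-31      # kg
K_B_EV = 8.617333262e-5     # eV/K

def young_d(field_v_per_angstrom, phi_ev):
    """Energy parameter d (eV) of the free-electron TED; image-charge factor neglected."""
    field = field_v_per_angstrom * 1e10   # V/m
    phi = phi_ev * E_CHARGE               # J
    d_joule = HBAR * E_CHARGE * field / (2.0 * math.sqrt(2.0 * M_E * phi))
    return d_joule / E_CHARGE

def ted(eps_ev, d_ev, temp_k):
    """Unnormalized TED at energy eps (eV) relative to E_F."""
    x = eps_ev / (K_B_EV * temp_k)
    # Two algebraically equivalent forms of the Fermi factor avoid overflow in exp()
    if x > 0:
        return math.exp(eps_ev / d_ev - x) / (1.0 + math.exp(-x))
    return math.exp(eps_ev / d_ev) / (1.0 + math.exp(x))

def fwhm(d_ev, temp_k, lo=-2.0, hi=0.5, n=50001):
    """Numerical FWHM (eV) of the single-peaked TED sampled on a uniform grid."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    vals = [ted(e, d_ev, temp_k) for e in grid]
    half = max(vals) / 2.0
    above = [e for e, v in zip(grid, vals) if v >= half]
    return above[-1] - above[0]

d = young_d(0.5, 4.5)
for t in (1.0, 80.0, 300.0):
    print(f"T = {t:5.0f} K: FWHM = {fwhm(d, t) * 1e3:.0f} meV")
```

With F = 0.5 V/Å and φ = 4.5 eV, d ≈ 0.23 eV; the width tends to d ln 2 (about 0.16 eV) as T → 0 and grows to about 0.26 eV at 300 K. A free-electron peak of this kind always has the long exponential low-energy tail that the protrusion peaks lack.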
A direct relationship exists between FEM patterns and FEES spectra. Adsorption of background gases on top of the protrusion was easily recognized by large instabilities of the emission current and strong modifications of the TEDs. The adsorbates could be removed by application of field and temperature, after which the emission properties were again stable. The FEES spectra could then be the same as before the cleaning, or could show changes in the positions, number, and relative intensities of the peaks for the same V_App, probably reflecting a different geometry of the protrusion and a different nature of the atoms forming the apex.
FIGURE 30. Examples of TEDs from high nanotips with (a) one and (b) two peaks not pinned at E_F. [Horizontal axes: energy relative to E_F (eV).]
FIGURE 31. Evolution of a two-peak TED from a nanotip as a function of applied voltage, showing the displacement with applied field. (From ref. [41].) [Panel label: single-atom protrusion. Horizontal axis: energy relative to E_F (eV), −2.5 to 0.5.]
D. Field Emission Characteristics from Nanotips: Discussion
In this section we connect the specific FE properties measured for the nanotips to the atomic size of the protrusion apex, and argue from these properties that the nanotips are atom sources for the field emission of electrons.
FIGURE 32. Positions of the peaks of Fig. 31 as a function of V_App (900–1500 V). The shift is 1.65 meV per applied volt and is reversible. (From ref. [41].) [Vertical axis: peak energy (eV), about −1.6 to −0.4.]
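The slope extraction behind Fig. 32, and its conversion into a field-penetration depth through Eq. (36) of Section III.D.3, can be sketched as follows. The peak positions are synthetic, noise-free values placed on two parallel lines with the quoted slope of 1.65 meV/V (the offsets are hypothetical), standing in for measured data; β = 5 × 10⁶ and 10 × 10⁶ m⁻¹ are the values quoted in that section. With energies in eV, Eq. (36) reduces to x_s = (dE/dV_App)/β.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def penetration_depth_m(slope_ev_per_volt, beta_per_m):
    """Invert Eq. (36), dE/dV_App = e * x_s * beta; with E in eV the charge e drops out."""
    return slope_ev_per_volt / beta_per_m

# Synthetic two-peak data: hypothetical offsets, slope as quoted in the text
v_app = [900.0 + 100.0 * i for i in range(7)]
peak_a = [-1.5 + 1.65e-3 * (v - 900.0) for v in v_app]
peak_b = [-1.1 + 1.65e-3 * (v - 900.0) for v in v_app]

slope_a, _ = fit_line(v_app, peak_a)
slope_b, _ = fit_line(v_app, peak_b)
separation = [b - a for a, b in zip(peak_a, peak_b)]  # constant for parallel lines
print(f"slopes: {slope_a * 1e3:.2f} and {slope_b * 1e3:.2f} meV/V")
print(f"peak separation: {separation[0]:.2f} eV")
for beta in (5e6, 10e6):
    x_s = penetration_depth_m(slope_a, beta)
    print(f"beta = {beta:.0e} 1/m -> x_s = {x_s * 1e10:.2f} angstrom")
```

With β = 5–10 × 10⁶ m⁻¹ the quoted slope gives x_s ≈ 1.7–3.3 Å, of the same order as the 2–3 Å stated in the text; with real spectra the residuals of `fit_line` would also indicate how well the "parallel lines" description holds.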
1. Self-Collimation of the e-Beam to 4–6°
The beam opening that determines the spot size measured at the projection screen is controlled by two mechanisms: (1) the intrinsic angular spread θ_c of the emission current just after the tunneling barrier at the apex, and (2) the compression of the electron trajectories due to the influence of the emitter shank on the potential distribution. The second effect reduces the initial angular spread by a factor of about 2 or more [Eq. (24)]. This means that the measured angular opening θ_s of the e-beam at the screen, 4–6°, corresponds to an angular spread θ_c at the emitting apex atom of the order of 8–12°. Two factors are important in determining the value of θ_c:
1. The geometric effect, i.e., the radius of curvature of the protruding emitting area
2. The diffraction effect, i.e., the size of the tunneling region, restricted to one atom, compared with the wavelength λ_F of the incoming electrons inside the tip (~4 Å for E_F = 8 eV).
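The ~4 Å value of λ_F follows from treating the electrons at the Fermi level as free electrons with kinetic energy E_F, the same free-electron picture used elsewhere in this section, so that λ_F = h/√(2mE_F). A minimal check:

```python
import math

H = 6.62607015e-34          # Planck constant, J s
M_E = 9.1093837015e-31      # electron mass, kg
E_CHARGE = 1.602176634e-19  # J per eV

def de_broglie_wavelength_m(kinetic_energy_ev):
    """Nonrelativistic de Broglie wavelength lambda = h / sqrt(2 m E)."""
    return H / math.sqrt(2.0 * M_E * kinetic_energy_ev * E_CHARGE)

lam = de_broglie_wavelength_m(8.0)
print(f"lambda_F at E_F = 8 eV: {lam * 1e10:.2f} angstrom")  # prints 4.34 angstrom
```

The result is comparable to the diameter of a single apex atom, which is why diffraction at a one-atom emitting region cannot be neglected.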
a. Geometric Effect. Consider first field emission from a smooth hyperboloidal surface having a small radius of curvature, which simulates the apex of the nanotips. Semiclassical calculations neglecting electron diffraction at the tunneling opening [43] give a full angular spread θ_c of
$$\theta_c = 4\left(\frac{\hbar e F_0}{2\sqrt{2m}\,\phi^{3/2}}\right)^{1/2} \qquad (29)$$
This relation is obtained by using the WKB approximation for the tunneling probability T(E), evaluated at E = E_F, together with the field distribution F(θ) over the apex, which falls off from its axial value F₀ away from the tip axis.
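Eq. (29) can be evaluated directly. The sketch assumes the form θ_c = 4[ħeF₀/(2√(2m)φ^{3/2})]^{1/2}; this is our reading of the equation, chosen because it reproduces the 52° quoted for F₀ = 0.5 V/Å and φ = 4.5 eV, and it is an assumption rather than a transcription of ref. [43].

```python
import math

HBAR = 1.054571817e-34      # J s
M_E = 9.1093837015e-31      # kg
E_CHARGE = 1.602176634e-19  # C

def geometric_spread_rad(field_v_per_angstrom, phi_ev):
    """Full angular spread from the assumed form of Eq. (29)."""
    f0 = field_v_per_angstrom * 1e10  # V/m
    phi = phi_ev * E_CHARGE           # J
    ratio = HBAR * E_CHARGE * f0 / (2.0 * math.sqrt(2.0 * M_E) * phi ** 1.5)
    return 4.0 * math.sqrt(ratio)

theta = geometric_spread_rad(0.5, 4.5)
print(f"theta_c = {math.degrees(theta):.1f} deg")  # prints theta_c = 51.8 deg
```

Since θ_c scales as √F₀, even a doubling of the apex field only widens the geometric spread by about 40%, so geometry alone cannot bring θ_c down to the few degrees observed.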
For F₀ = 0.5 V/Å and φ = 4.5 eV, Eq. (29) gives an angular spread θ_c = 52°. To verify the validity of this approach, the problem was solved using the time-dependent Schrödinger equation (TDSE) for a Gaussian wave packet moving toward a constriction that represents the tunneling apex atom. The TDSE was solved numerically by means of an algorithm based on a fourth-order Trotter formula [44], which gives θ_c = 50°. In conclusion, the geometric factor attached to the nanoprotrusion geometry reduces θ_c by about a factor of 2. The resulting beam opening θ_s is then of the order of 25° for a nanoprotrusion, instead of 45–80° for microscopic tips. However, this does not explain the values of 4–6° measured for θ_s from the nanotips.
b. Diffraction through a Tunnel Barrier. When the tunneling emission comes from a region with dimensions of the same order as the wavelength of the incoming electrons, diffraction through the tunnel barrier must be considered. This is the situation for field emission from the last atom of the nanotips. Without a tunneling barrier, the diffraction can be estimated by using the Heisenberg uncertainty principle. In the presence of a tunnel barrier, the diffraction process requires the solution of the TDSE [45] for the transmission function T_i(E). The angular spread θ_c is then defined as the angle for which the tunnel intensity J(θ) is 1/e of the axial intensity J(0). J(θ) is given by a summation over the contributions of the transmission functions T_i(E).
The summation over i runs over all quantized levels. As a first approximation, each scattered plane wave is considered to be filtered incoherently by the tunneling barrier. Under these assumptions, electrons having a total energy E have a transmission function T_i given by
$$T_i(E) = \left|F_n(k\sin\theta)\right|^2\, T(E\cos^2\theta) \qquad (32)$$
where k = √(2mE)/ħ and F_n(k sin θ) is the Fourier transform of the slit function, which is the diffraction function of the constriction. The calculations give, for a constriction opening between 8 and 20 Å, E_F = 8 eV, n = 1, and a field F = 0.5 V/Å, an angular spread θ_c ≤ 20°. This result expresses two properties that are intrinsic to the tunneling process:
1. Only the electrons within a small energy range ΔE contribute to the current, due to the filtering effect of the triangular tunnel barrier.
2. The transmission probability decays exponentially with the angle.
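Eq. (32) lends itself to a quick numerical estimate. The sketch below assumes, for n = 1, a rectangular slit of width w (so that |F_n|² is a sinc² function), evaluates only electrons at E = E_F (no integration over energy and no sum over levels), and uses the standard WKB transmission of a triangular barrier without image-charge correction, in which the reduced normal energy E_F cos²θ raises the effective barrier by E_F sin²θ. It scans θ for the first angle at which J(θ)/J(0) falls to 1/e and returns the full angle 2θ. These modeling choices are ours, not those of the TDSE calculation cited in the text.

```python
import math

HBAR = 1.054571817e-34      # J s
M_E = 9.1093837015e-31      # kg
E_CHARGE = 1.602176634e-19  # C
# Fowler-Nordheim exponent constant 4*sqrt(2m)/(3*hbar*e), expressed in V/m per eV^(3/2)
B_FN = 4.0 * math.sqrt(2.0 * M_E) / (3.0 * HBAR * E_CHARGE) * E_CHARGE ** 1.5

def transmission(barrier_ev, field_v_per_m):
    """WKB transmission of a triangular barrier of height barrier_ev (no image charge)."""
    return math.exp(-B_FN * barrier_ev ** 1.5 / field_v_per_m)

def slit_factor(theta, k, width_m):
    """Assumed |F_1(k sin theta)|^2 for a rectangular slit: sinc^2, equal to 1 on axis."""
    x = 0.5 * k * width_m * math.sin(theta)
    return 1.0 if x == 0.0 else (math.sin(x) / x) ** 2

def relative_intensity(theta, e_f_ev, phi_ev, field_v_per_m, width_m):
    """J(theta)/J(0) from Eq. (32), for electrons at E = E_F only."""
    k = math.sqrt(2.0 * M_E * e_f_ev * E_CHARGE) / HBAR
    t_theta = transmission(phi_ev + e_f_ev * math.sin(theta) ** 2, field_v_per_m)
    t_axis = transmission(phi_ev, field_v_per_m)
    return slit_factor(theta, k, width_m) * t_theta / t_axis

def full_spread_deg(e_f_ev, phi_ev, field_v_per_m, width_m, steps=18000):
    """Full angle 2*theta at which J(theta)/J(0) first drops below 1/e."""
    for i in range(1, steps + 1):
        theta = i * (math.pi / 2.0) / steps
        if relative_intensity(theta, e_f_ev, phi_ev, field_v_per_m, width_m) < math.exp(-1.0):
            return 2.0 * math.degrees(theta)
    return 180.0

for w_angstrom in (8.0, 20.0):
    spread = full_spread_deg(8.0, 4.5, 0.5e10, w_angstrom * 1e-10)
    print(f"opening {w_angstrom:.0f} A: full spread ~ {spread:.1f} deg")
```

With E_F = 8 eV, φ = 4.5 eV, and F = 0.5 V/Å, openings of 8 and 20 Å give full spreads of roughly 17° and 11°, respectively, consistent with θ_c ≤ 20°. The wider opening diffracts less, so its spread is set mainly by the angular filtering of the barrier.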
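The angular spread θ_c can then be estimated from
$$\exp\left[-\frac{\hbar^2 k^2}{2m\,\Delta E}\,\sin^2\frac{\theta_c}{2}\right] \approx \frac{1}{e} \qquad (33)$$
which gives, for small values of the angular spread,
$$\theta_c \approx 2\sqrt{\frac{\Delta E}{E_F}} \qquad (34)$$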
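Eq. (34) gives a quick closed-form estimate. The sketch assumes the small-angle form θ_c ≈ 2√(ΔE/E_F), which is our reading of the equation. With E_F = 8 eV it spans roughly 13–22° for ΔE = 0.1–0.3 eV, bracketing the 15–20° quoted in the text; dividing by the shank compression factor of about 2 moves it toward the measured few degrees.

```python
import math

def diffraction_spread_deg(delta_e_ev, e_f_ev):
    """Small-angle estimate theta_c = 2 * sqrt(dE / E_F), in degrees (assumed form of Eq. (34))."""
    return math.degrees(2.0 * math.sqrt(delta_e_ev / e_f_ev))

for de in (0.1, 0.2, 0.3):
    theta_c = diffraction_spread_deg(de, 8.0)
    # Assumed shank compression factor of 2, as discussed in the text
    print(f"dE = {de:.1f} eV: theta_c ~ {theta_c:.1f} deg, at screen ~ {theta_c / 2:.1f} deg")
```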
This relation gives, for E_F = 8 eV and ΔE in the range 0.1–0.3 eV, values of θ_c between 15 and 20°, in agreement with the exact TDSE calculations. These values of θ_c are a consequence of diffraction through an atom-size slit with a triangular tunneling barrier. Taking into account the geometric effect and the convergent-lens effect of the tip shank, one can expect these values to be divided by a factor of 2 or more. The resulting values of θ_s are then in the same range as those measured experimentally from the nanotips, i.e., 4 to 6°.
2. Stability
The long-term stability of FE from nanotips can be assessed by comparison with hemispherical and buildup tips. For these latter tips the current stability is governed by the adsorption of residual gas on the emitting area. As noted in Section II.E, hemispherical and buildup tips have limited stability due to the formation of an adsorbed layer that changes the work function (see Figs. 7 and 8). The FE stability of the nanotips shown in Fig. 24 is simply explained if one considers the very small probability that an atom from the surrounding gas phase (~5 × 10^… to 10^… torr) adsorbs on the apex atom. An estimate of the impinging frequency ν can be obtained from the gas kinetics equations alone, which give ν = … s⁻¹ for a surrounding pressure of … torr, i.e., a time interval of ~3 h. This value is in the range of the experimental measurements. When the FE current is increased, the temperature at the protrusion increases due to the Nottingham effect within the localized band structure. This effect has been used to measure the local energy exchange during the FE process from a nanotip, as presented in Section IV.C.
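The gas-kinetics estimate of the impingement interval can be reproduced with the Hertz–Knudsen flux Φ = p/√(2πmk_BT) multiplied by a capture area. In the sketch, the residual gas (H₂ at 300 K), the capture area of the apex atom ((0.3 nm)²), and a unit sticking probability are all our assumptions, and the pressures scanned are illustrative values in the ultrahigh-vacuum range.

```python
import math

K_B = 1.380649e-23          # J/K
AMU = 1.66053906660e-27     # kg
TORR_TO_PA = 133.322368

def impingement_rate(pressure_torr, molar_mass_amu, temp_k, capture_area_m2):
    """Hertz-Knudsen impingement rate (1/s) on an area A: nu = p * A / sqrt(2 pi m k T)."""
    p = pressure_torr * TORR_TO_PA
    m = molar_mass_amu * AMU
    flux = p / math.sqrt(2.0 * math.pi * m * K_B * temp_k)  # molecules m^-2 s^-1
    return flux * capture_area_m2

# Hypothetical inputs: H2 (2 amu) at 300 K, apex-atom capture area (0.3 nm)^2, sticking = 1
for p in (1e-11, 5e-11, 1e-10):
    nu = impingement_rate(p, 2.0, 300.0, (0.3e-9) ** 2)
    print(f"p = {p:.0e} torr: nu = {nu:.1e} 1/s, mean interval = {1.0 / nu / 3600.0:.1f} h")
```

Under these assumptions the mean interval between impacts falls from about 20 h at 10⁻¹¹ torr to about 2 h at 10⁻¹⁰ torr, the same order as the ~3 h quoted above; a heavier residual gas such as CO would lengthen the interval.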
The probability of an atomic-scale rearrangement of the nanotip apex increases at higher temperature, or equivalently at higher FE current, leading to abrupt changes in the total current. This is the cause of the reversible and/or irreversible discrete current jumps observed in Fig. 25. The upper limit for the FE current is also explained by the very large temperature increase during FE, which leads to the destruction of the protrusion by surface diffusion or by local melting.
3. Localized Band Structure
The existence of well-separated peaks in the single-atom TEDs shows that the electrons do not tunnel directly from the bulk Fermi level to the vacuum. The peaks and their shifts suggest the presence of a localized band structure at the tip apex. This idea is further supported by the constant spectral widths and energy gaps observed experimentally. The peaks in the TED spectra then occur because the emitted electrons resonantly tunnel to the vacuum only through these bands. This situation is depicted schematically in Fig. 33 for the case of a one-band TED. Resonant tunneling through the atomic energy levels of adsorbed atoms, broadened by interaction with the underlying surface, was first introduced by Duke and Alferieff [46] and later developed more fully by Gadzuk [47, 48]. It was used to explain the small bumps added to the energy distributions of clean microscopic tips observed in FEES experiments [27] with atoms chemisorbed on metallic surfaces. It must be emphasized that the chemisorbed atoms in these experiments only slightly modified the standard peak of a clean microscopic tip, in contrast to the spectra from the protrusions, which consist solely of well-defined peaks. This latter behavior could originate in the atomic size and shape of single-atom protrusion tips, and in particular in the reduced coordination number of the apex atom compared with a single atom on a flat surface. The shifts of the peaks run counter to metallic behavior of the topmost atom. The linearity of the shifts versus V_App shown in Fig. 32 implies linear shifts versus the applied field F at the cathode surface, because F = βV_App. This shift and its linearity versus the applied voltage are explained by a charge confinement in the region of the topmost atom, which implies a field penetration into the tip. The charge confinement and the field penetration can be estimated with the Thomas–Fermi model of screening [49]. To estimate the field penetration x_s for the protrusion, the expression for the potential of an electric field penetrating into a flat subsurface region is used as a first approximation:
$$V(x) = x_s F \exp\left(-\frac{x}{x_s}\right) \qquad (35)$$
where x is the distance from the surface to a position within the cathode. Thus, taking into account the relation F = βV_App, the energy of the emitted electrons (x = 0) varies linearly with V_App as
$$\Delta E = e\, x_s\, \beta\, V_{App} \qquad (36)$$
Applying Eq. (36) to our experimental result ΔE/ΔV_App = 1.65 meV/V and taking β as 5 to 10 × 10⁶ m⁻¹ for a protrusion of 2–3 nm height [34] gives x_s of 2–3 Å for the single-atom protrusion. This value should be compared with the screening length of a metal surface, which is less than 0.5 Å [49], and with the estimated field penetration for a single adsorbed Ba atom, 1.3–1.7 Å [27]. It is also roughly the size of an atom, which strongly supports the idea that the observed peaks in the TED spectra are related to localized levels at the topmost atom.
Calculations of the electronic structure of metal protrusions for different materials, structures, and geometries (heights) [50] have recently been developed using the tight-binding formalism. The main advantage of this semiempirical method is that complex objects containing nonequivalent atoms can be calculated. The calculations were done for pyramidal protrusions ending in a single atom, for different metals (W, Fe, and Cr), and for different heights and crystallographic orientations. Figure 34 is an example of the results, showing the local density of states of the topmost atom. The main points of interest for this work are as follows:
1. The local density of states on the apex atom of the pyramidal protrusion evolves toward a peak structure when the height p₀ of the protrusion is increased layer by layer, starting from one adsorbed atom on a surface. This electronic structure reaches a steady state for p₀ ≥ 4 layers.
2. The final steady-state electronic structure is characterized by a predominant peak localized 1 eV above the Fermi level. However, the calculations were done for zero applied field and do not take into account a possible field-induced shift. Measurements of many nanotip TEDs show that the bands, extrapolated to zero field, generally lie 1 to 2 eV above E_F, in agreement with the calculations.
3. This local density of states distribution is specific to a protrusion ending in a single atom. The conventional surface LDOS structure is recovered when the last atom is stripped off.
It is premature to compare the calculations too strictly with the experimental results and to expect a fit between the two sets of values. However, the theoretical results confirm most of the specific experimental characteristics of the TED spectra from single-atom nanotips described above. All these results, experimental and theoretical, indicate that the free-electron description of field emission is not valid for the atomic-scale emitting source of a nanotip. Consequently, instead of the commonly used free-electron model, the local density of states specific to each nanoprotrusion must be considered when interpreting experiments with an atomic-scale probe, such as scanning tunneling microscopy and scanning tunneling spectroscopy with atomic resolution [51, 52].
FIGURE 33. Simplified model of FE from single-atom nanotips for two values of V_App. The lightly and darkly shaded bands indicate the position of the band for the two values of V_App.
FIGURE 34. (a) Local density of states of a W surface atom of a semi-infinite W(001) crystal. (b) Local density of states of a W atom at the apex of a (001) pyramid of height ≥ 4 atomic planes above the (001) surface. The vertical line corresponds to E_F. (From ref. [50b].) [Horizontal axes: E (eV), −5 to 5.]
4. Current Saturation in the I-V Characteristics
The current saturation related to the presence of a protrusion was measured at the very beginning of the study of nanotips [2], and it is the signature of a high nanoprotrusion ending in one atom. Different interpretations [2, 33a, 53] were proposed to account for the observed discrepancies between the I-V characteristics of nanotips and the conventional Fowler–Nordheim analysis. Since the conventional Fowler–Nordheim analysis models the tip as a planar surface, the field everywhere outside the tip is constant. For protrusion tips with atomic sharpness, the field in the region around the emitter and away from the apex varies on the scale of the protrusion.
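The conventional Fowler–Nordheim test referred to here can be sketched as follows: plot ln(I/V²) against 1/V, fit a straight line to the low-voltage points, and track how far the high-voltage points fall below its extrapolation. The data are synthetic: an ideal FN law I = AV² exp(−B/V) with hypothetical constants A and B, then capped at an arbitrary level to mimic saturation. The cap is a display device, not a physical model of the nanotip.

```python
import math

def fn_coordinates(voltages, currents):
    """Fowler-Nordheim plot coordinates: x = 1/V, y = ln(I/V^2)."""
    xs = [1.0 / v for v in voltages]
    ys = [math.log(i / v ** 2) for v, i in zip(voltages, currents)]
    return xs, ys

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def saturation_deficit(voltages, currents, n_fit):
    """Fit the n_fit lowest-voltage points and return ln(I_measured / I_extrapolated)
    for every point; negative values at high voltage indicate saturation."""
    vs, cs = zip(*sorted(zip(voltages, currents)))
    xs, ys = fn_coordinates(vs, cs)
    slope, intercept = linear_fit(xs[:n_fit], ys[:n_fit])
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Synthetic ideal FN data with hypothetical A and B: a straight line of slope -B
A, B = 1e-6, 5000.0
volts = [400.0 + 25.0 * i for i in range(9)]
ideal = [A * v ** 2 * math.exp(-B / v) for v in volts]
xs, ys = fn_coordinates(volts, ideal)
slope, _ = linear_fit(xs, ys)
# Cap the current to mimic saturation (illustration only)
capped = [min(i, ideal[4]) for i in ideal]
print([round(d, 2) for d in saturation_deficit(volts, capped, n_fit=4)])
```

For real nanotip data the same deficit curve would start to deviate once the localized-band peaks appear in the TED, which is the correlation shown in Fig. 35.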
FIGURE 35. Nanotip FE I-V characteristics (a) and accompanying TEDs (b) for three different extraction voltages, measured concomitantly: (1) 420 V, (2) 475 V, and (3) 525 V. The TED for 420 V is above E_F. They show the direct relationship between the current saturation and the peak structure. [(a) horizontal axis: 1/V (V⁻¹); (b) horizontal axis: energy relative to E_F (eV).]
It is found, for example, that the field away from the apex decreases rapidly [34], whereas the field is constant for a planar geometry. Consequently, the analysis of the I-V characteristics of nanotips has to take the protruding geometry into account in the calculation of the tunneling barrier. The results [33a, 53], plotted as ln(J/F²) versus 1/F, show a saturation of the current, but only in the very-high-field region. A second factor is the presence of localized peaks in FE from nanotips. The relation between the localized peak(s) in the TED and the current saturation in the I-V characteristics is shown in Fig. 35. These data show clearly that the presence of the peaks and their shifts (Fig. 35b) are concomitant with the current saturation (Fig. 35a). Localized bands that shift with the applied voltage dramatically change the supply function, which could be the predominant cause of the current saturation. The conventional Fowler–Nordheim analysis is therefore not valid for nanotips, because both of its assumptions, a planar tip and a free-electron gas, fail.
IV. APPLICATIONS
As was pointed out in the introduction, the interest in nanotips rests essentially on the new possibilities opened by the specific field emission properties associated with the atomic size of their emitting area. Five current topics that have been developed with nanotips will be presented: FEM with atomic resolution; monochromatic electron sources; energy exchange and its analysis at the atomic scale; low-energy, high-resolution microscopy with the Fresnel projection microscope; and, finally, several new phenomena with ferromagnetic nanotips. The results presented below were obtained in the last few years and are examples of the breakthroughs made possible by nanotips.
A. Atomic Resolution under FEM
Atomic resolution in FEM has been a subject of interest in the FE community since its inception [6]. The conventional approach to estimating the resolution in FEM [6,7] considers the distribution of momenta transverse to the normal emission direction and the wave nature of the electron. Taking these into account gives an effective resolution of the order of 2 nm, well above the size of one atom.
Consider the resolution problem for a three-atom apex of a nanotip. Because of the field enhancement in the vicinity of the three atoms forming the apex of the protrusion, the tunneling barrier presents three minima, located on top of each of these atoms. However, as the distances between these minima are of the order of the wavelength of the tunneling electron, the three emitted beams cannot be considered independent. Electron diffraction and interference between these beams are important, and a full quantum-mechanical approach is therefore necessary. This has been done with a 2D model that mimics as closely as possible the three-atom apex of the nanotip, solved by exact numerical integration of the corresponding TDSE (Fig. 36) [33]. The initial electron wave packet is a Gaussian packet moving in the direction of the nanotip axis, and the field-emitted e-beam is described by the probability distribution of the transmitted wave packet. The calculations show that if the field emission comes from all the atoms of the protrusion apex, strict conditions on the protrusion geometry are necessary to obtain atomic resolution in the FEM patterns. The first condition is that the radius of curvature of the equipotential around the apex should be of the order of the interatomic distance. This requires a protrusion ending in a trimer with a height p₀ much larger than 1 nm. The second condition is that the tilt angle between the tip axis and the direction of maximum field over each apex atom must be in the range 20–30° in the presence of the tunnel barrier. Although the first condition is fulfilled by nanotips, the second is attained only for protrusions ending in three atoms whose height is in the range of 4 to 6 nm [34]. This is illustrated in Fig. 37.
Experimental realization of atomic resolution in FEM for a protrusion ending in three atoms is shown in Fig. 38. The three emitting spots of Fig. 38a are related directly to the three ending atoms. The stability of the current over 5 h demonstrated that the spots came from atomic positions, because larger emitting areas would certainly have shown current decreases due to adsorption (cf. the discussion of stability). Moreover, Fig. 38b shows the FEM pattern of the same nanotip with an adsorbed atom between two of the initial three atoms, indicated by the arrow. The appearance of this additional spot was accompanied by a discrete jump in the FE current, which is further experimental proof of atomic resolution.
FIGURE 36. (continued) (II) Intensity of the wave packet reflected and transmitted by the tunneling model in (I) for a tilt angle of 30°. (a) In the presence of the tunnel barrier, the two coherent waves emitted by the two atoms do not merge, and atomic resolution is observed. (b) In the absence of the tunnel barrier, the two transmitted waves interfere and no atomic resolution can be observed. (From ref. [33b].)
B. Monochromatic Electron Beam
The existence of localized bands in the TEDs from nanotips contrasts with FE from conventional metal emitters, where the electrons come from the wide conduction band and have a distribution tied to the Fermi level (EF). In that case, the width of the distribution is defined essentially by the tunneling barrier, which sets a lower limit of about 0.3 eV (cf. Section II.C.2). Conversely, for nanotips the energy distribution of the FE electrons is governed not only by the tunneling barrier but also by the localized band structure. Modifying this last parameter allows us to narrow the FE energy spread well below the 0.3-eV limit [54]. Examples of TEDs measured for W and Pt nanotips at 293 and 80 K are shown in Fig. 39. The nanotip TEDs are characteristically narrower. For the W nanotip, the measured FWHM was 120 meV at room temperature and 110 meV at 80 K (Fig. 39a). For the Pt nanotip, the measured FWHM was 100 meV at 293 K, decreasing to 64 meV at 80 K (Fig. 39b). These values should be compared with the energy spread of the microtips plotted in the same figures, which was about 0.3 eV and decreased only slightly with temperature. In Fig. 40 the FWHM of the Pt nanotip TEDs is plotted as a function of temperature. It decreases linearly with temperature, extrapolating to 51 meV at zero temperature. These are the experimentally measured values, without any correction for instrumental broadening. Using the conventional deconvolution technique [55], the FWHMs at 293 K, at 80 K, and extrapolated to zero temperature are 90 meV, 43 meV, and 20 meV, respectively. An energy spread of about 20 meV should then be achievable by cooling the tip to liquid-helium temperature, at least 10 times narrower than for standard FE microtips. Thus the use of nanotips allows us to break through the limiting value of about 0.3 eV for cold field emission.
VU THIEN BINH ET AL.
[Figure 37 plot axis: distance from apex (nm), -2 to 2.]
FIGURE 37. Variation of the tilt angle between the tip axis and the direction of the maximum field over each of the atoms forming the trimer apex of a protrusion, as a function of the protrusion height. (a) Calculated 3D field distribution with atomic resolution over an equipotential surface for a trimer nanotip (p0 = 4 nm and base diameter = 4 nm). It shows a local enhancement of the field just over each of the atoms forming the trimer apex. (b) Plot of the β factor showing the local field enhancement over each of the three atoms of the trimer apex for three different heights of the nanoprotrusion. The indicated angles are the tilt angles over the apex atoms. Tilt angles > 20° are obtained for protrusions with a height > 4 nm. (From ref. [34].)
FIGURE 38. (a) FEM pattern showing three emitting spots from a nanoprotrusion. (b) FEM pattern of the same nanotip showing the presence of an adsorbed atom, indicated by the arrow. (From ref. [33a].)
[Figure 39 axes: normalized counts versus energy relative to EF (eV), -1.0 to 0.2 eV. Legend: bulk tip (293 K), FWHM = 270 meV; nanotip (293 K), FWHM = 100 meV; nanotip (80 K).]
FIGURE 39. TEDs of microtips and nanotips at different temperatures: (a) W; (b) Pt.
[Figure 40 axes: FWHM versus temperature (K), 0 to 300 K.]
FIGURE 40. Variation of the FWHM of the TEDs from the Pt nanotip with temperature. The experimental data fall on a straight line.
C. Local Heating and Cooling by Nottingham Effect
During the field electron emission process, energy exchanges take place between the emitted electrons and the cathode surface. These exchange processes, the so-called Nottingham effect [56], can heat or cool the emitter surface when the average energy of the replacement electrons, which is near EF, differs from that of the emitted electrons. The Nottingham effect is negligible for macroscopic tips with apex radii greater than a few tens of nanometers and FE currents < 1 µA [57]. This is partly because both the replacement electrons and the emitted electrons come from energy levels close to EF, so the energy exchanged per emitted electron is limited. The localized peak distribution in the TED and its shift raise the question of local heating or cooling at the nanotips. Experimental measurements show that the FE process induces considerable heating in the case of nanotips and, furthermore, that local temperatures of areas of atomic scale can be measured [58].
ELECTRON FIELD EMISSION FROM ATOM SOURCES
1. Localized Peaks under EF: Heating Effect
The energy exchange in a nanotip during FE is depicted in Fig. 41 for the case of two localized bands. Replacement electrons coming from near EF must fill the levels in each localized band emptied by emission (processes 1 and 2 of Fig. 41), and electrons must also pass from the upper localized band level to the lower one (process 3 of Fig. 41). In all these processes the electrons lose an amount of energy that depends strongly on the number and position of the bands with respect to EF. This implies a dependence on the protrusion geometry and on V_FE as well. Since the energy exchanged per emitted electron can be of the order of electronvolts, this leads to much larger temperature increases at the single-atom apex of a nanotip than the conventional Nottingham effect does. The experimental problem in studying such temperature increases is how to measure the local temperature at the apex of a nanotip during FE. This temperature may be very different from that of the whole tip because of the very small emitting area, so a local temperature probe with atomic-scale resolution is needed. The determination of the local temperatures is based on two effects. The first
FIGURE 41. Simplified potential diagram for nanotip emission, depicting the additional energy-exchange paths during field emission in the presence of localized bands at the apex of the nanotip.
effect is the possibility of a repetitive, back-and-forth motion of a single atom between neighboring atomic sites at the nanotip apex, termed “flip-flop” [59], whose frequency depends on the local temperature. The second effect is that the shape of the TEDs depends on temperature. Using procedures based on these effects, the local temperatures at the apex of a nanoprotrusion were determined for emission currents in the range of … to … A [58]. In the first procedure, the current fluctuations due to the flip-flop of one adsorbed atom between two neighboring sites at a nanotip apex were measured versus the FE current. For single-atom protrusion tips, the total FE current switches between two fixed discrete values that depend on the atomic configuration of the protrusion. Each current level is associated with a particular TED. An example of two TEDs measured during the two states of a flip-flop is shown in Fig. 42. The number of peaks and their relative positions are preserved during the flip-flop, but the TEDs shift as a whole and the relative peak intensities change. The switching between the two spectra is repetitive as long as the flip-flop continues. This phenomenon allows very easy detection of the flip-flop even for total FE currents from single-atom tips in the range of … A. The variation of the number of counts at a fixed energy during a flip-flop process is
FIGURE 42. Effect of a flip-flop process on the TED from a nanotip presenting two bands. The inset shows the change in the total number of counts at one particular energy during a flip-flop. (From ref. [58].)
[Figure 43 plot: flip-flop frequency versus V_FE, 950 to 1075 V.]
FIGURE 43. Frequency of a flip-flop at a nanotip apex versus V_FE. (From ref. [58].)
shown in the inset of Fig. 42. The effect of the emission current on the flip-flop frequency is shown in Fig. 43. For an increase of the FE voltage from 950 V to 1070 V, the flickering frequency increases from ~0.1 Hz to ~11 Hz. This corresponds to a temperature increase of about 30 K for an FE current increase from ~3 × … A to ~9 × … A. The second method is based on the following experimental observation: the shape of the peaks in the TED from single-atom protrusion tips is temperature-dependent. Figure 44a shows a broadening of the high-energy edge of the TED of a nanotip when the temperature is raised in a controlled way with the heating loop, at fixed FE voltage and current. As shown in Fig. 44b, for increasing applied voltage and FE current there is a broadening of the high-energy side due to the emission-induced temperature increase, in addition to the spectral shift characteristic of nanotips. The local temperature increases at the apex of the protrusion tip, found by fitting the spectra for different FE currents, are shown in Fig. 45. The temperature increase can reach ~210 K for ~1 × … A. For higher FE currents the temperature increase is even larger; the protrusion becomes unstable and can be destroyed by local melting for I > ~10^-7 A.
2. Localized Peaks over EF: Cooling Effect
The position of the localized bands of nanotips relative to EF can be controlled by the applied voltage because of the field shift. In particular,
[Figure 44 axes: normalized counts versus energy relative to EF (eV), -1.0 to 0.5 eV.]
FIGURE 44. (a) TEDs from a protrusion tip with one localized band: room temperature and 590 K. The higher temperature is created by the loop heating current. (b) Spectra from the same nanotip for different applied voltages and different emission currents. The spectrum at higher voltage shifts to lower energy [4]. It has been numerically shifted by ΔE to the position of the lower-voltage peak (small dots) to show the broadening of the high-energy side of the spectrum that is related to the temperature increase. (From ref. [58].)
this allows emission from localized levels well above the EF of the support tip. This phenomenon is explained by the partial filling of the bands by the tail of the Fermi sea, which acts as a supply function [60]. In this case, energy conservation in the FE process is obeyed. Experimentally, the linear shift is typically ~0.5 eV over the range of
[Figure 45 axes: temperature (K, up to 250) versus FE current I (nA), 0.0 to 1.2.]
FIGURE 45. Variation in temperature versus FE current as determined by fitting the TEDs from nanotips.
possible applied voltages, which allows us to scan the bands completely through EF in a controlled way. As an example, Fig. 46 shows a narrow band shifting linearly through EF with the applied field. It does not change in form or width, but its intensity drops rapidly as it crosses EF. The experimental results show that emission from a peak above EF is possible if the peak or related band is sharper than the Fermi edge itself.
[Figure 46 axis: energy relative to EF (eV).]
FIGURE 46. FEES spectra as a function of applied field from a W nanotip. For the lower voltages the peak shifts to above EF. (From ref. [60].)
Under such conditions, the peaks above EF can provide significant cooling of the tip through the Nottingham energy exchange, because the emission comes exclusively from electrons above the Fermi level. A side consequence of a band positioned on the high-energy tail is that the FE current becomes strongly temperature-dependent, in complete contrast to normal FE, which has a very weak temperature dependence [see Eq. (15)]. This is simply because the supply function increases as the temperature is raised, and the total current is then proportional to exp(-ΔE/kT) for ΔE ≫ kT.
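The exp(-ΔE/kT) dependence can be checked with a few lines of Python. This is a minimal sketch, not the authors' analysis: it assumes the band is a single sharp level at a hypothetical position ΔE = 0.1 eV above EF, takes the supply as the Fermi-Dirac occupation of that level, and ignores any temperature dependence of the tunneling probability.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant (eV/K)


def fermi_occupation(delta_e_ev: float, temperature_k: float) -> float:
    """Fermi-Dirac occupation of a level lying delta_e_ev above E_F."""
    x = delta_e_ev / (K_B_EV * temperature_k)
    # Numerically stable form: avoids overflow of exp(x) for large x.
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def boltzmann_tail(delta_e_ev: float, temperature_k: float) -> float:
    """Approximation exp(-dE/kT), valid when dE >> kT."""
    return math.exp(-delta_e_ev / (K_B_EV * temperature_k))


if __name__ == "__main__":
    delta_e = 0.1  # hypothetical band position above E_F (eV)
    for t in (80.0, 150.0, 293.0):
        print(f"T = {t:5.1f} K: f = {fermi_occupation(delta_e, t):.3e}, "
              f"exp(-dE/kT) = {boltzmann_tail(delta_e, t):.3e}")
    gain = fermi_occupation(delta_e, 293.0) / fermi_occupation(delta_e, 80.0)
    print(f"supply ratio 293 K / 80 K: {gain:.2e}")
```

For a level 0.1 eV above EF, the occupation (and, in this simple picture, the current) grows by more than four orders of magnitude between 80 K and 293 K, whereas a band below EF stays almost fully occupied at both temperatures.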
D. Fresnel Projection Microscopy
In electron microscopy, efforts to increase resolution have focused mainly on reducing the spherical aberration, the wavelength, and the energy spread of the electron beam. These approaches give excellent results for specimens that are not sensitive to the radiation damage caused by interaction with a high-energy e beam. However, much remains to be done in obtaining high-resolution images of organic specimens. Using the nanotip as an atomic-size electron source in a projection microscope is another approach, which has made it possible to observe carbon and organic nanofibers such as synthetic polymers and RNA [61, 62, 75]. This combination takes advantage of the simplicity and low working voltages (50-300 V) of the projection microscope, and of the unique properties of the field-emission electron beam from nanotips that are related to the atomic size of the sources. The experimental images of nanometric fibers were interpreted as Fresnel diffraction patterns from opaque objects, even for fibers with diameters down to 1.4 nm.
1. The Fresnel Projection Microscope
Projection microscopy was proposed in 1939 by Morton and Ramberg [63] with their point-projector electron microscope. In 1968, E. W. Müller introduced the field ion shadow projection microscope [8, 64], based on the same principle, which is as follows. A greatly magnified shadow of an object (magnification factor ~10^6) can be obtained by exploiting the quasi-radial propagation of field-emitted electrons or ions coming from a tip when the object is placed in the beam path. The projection or shadow microscope is thus essentially a lensless microscope based on the radial propagation of an e beam from a point source (Fig. 47). The image magnification factor M is given by
$$M = \frac{i}{o} \approx \frac{D}{d} \qquad (37)$$
[Figure 47 schematic: projection microscope with tip, object, and MCP screen; magnification = i/o = D/d.]
FIGURE 47. Schematic description of the Fresnel projection microscope. The coherent projection source is a field-emission W nanotip emitting in the range of 200 to 300 V. The image magnification is given, to a first approximation, by the ratio D/d, where d and D are the distances from the virtual projection point to the object and to the screen, respectively.
where i and o are the image and object dimensions and D and d are the distances from the projection point to the screen and to the object, respectively. Equation (37) shows that the magnification increases as the object approaches the projection point and could reach values of 10^7 to 10^6 for projection point-object distances between 10 nm and 100 nm, with the screen located 10 cm away. Thanks to technological developments driven by scanning tunneling microscopy (STM) [65], tip-sample distances of less than 1 nm can now be handled routinely, using piezodrives for controlled nanometric displacements. This has renewed interest in the projection microscope [61, 66, 67]. Among the nanotip characteristics, two are of particular interest for the projection microscope: the atomic size of the emitting area and the
protrusion geometry of the nanotips. We show hereafter that both play a role in the image formation and, therefore, in the analysis of the interaction between the coherent nanosource and the nano-objects.
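Equation (37) can be evaluated directly for the distances quoted above. The sketch below is purely geometric: it ignores the Fresnel fringes, and d must be measured from the virtual projection point rather than from the physical apex. The 1.4-nm object at 28 nm used in the usage lines is borrowed from the carbon-fiber example of Fig. 50 for illustration; the function name is hypothetical.

```python
def projection_magnification(screen_distance_m: float, object_distance_m: float) -> float:
    """Geometric magnification M = D/d of a point-projection (shadow) microscope, Eq. (37)."""
    if object_distance_m <= 0 or screen_distance_m <= object_distance_m:
        raise ValueError("need 0 < d < D")
    return screen_distance_m / object_distance_m


if __name__ == "__main__":
    D = 0.10  # projection point to screen: about 10 cm
    for d_nm in (10.0, 28.0, 100.0):
        m = projection_magnification(D, d_nm * 1e-9)
        print(f"d = {d_nm:5.1f} nm -> M = {m:.2e}")
    # Geometric shadow width on the screen of a 1.4-nm fiber at d = 28 nm
    shadow_mm = 1.4e-9 * projection_magnification(D, 28e-9) * 1e3
    print(f"shadow width ~ {shadow_mm:.1f} mm")
```

The 10-100 nm range reproduces the 10^7 to 10^6 magnifications quoted in the text, and a 1.4-nm fiber at 28 nm casts a geometric shadow about 5 mm wide on a screen 10 cm away.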
a. The Virtual Projection Point. The distribution of the electric field in the apex region of a nanotip distorts the trajectories of the emitted electrons, so the center of the real source at the apex does not coincide with the projection point, or virtual point source [20]. The virtual projection point is defined as the intersection of the asymptotes of the trajectories in the distortion-free zone far from the tip. This is drawn schematically in Fig. 48 for a nanotip and should be compared with the virtual source of a conventional tip in Fig. 6. As a first approximation, even though the distortions depend on the exact geometry of the tip end, the tip is assumed to behave like a lens with a value of the ratio BJB, around 0.5. This means that the minimum distance d_min from the virtual source to the apex is greater than 2r. The schematic drawings in Fig. 48 show that nanotips give a smaller d_min, and therefore higher possible magnifications, than hemispherical microtips (Fig. 7), owing to their protruding geometry. This allows us to work experimentally under Fresnel conditions, as discussed below. Moreover, because the tip-object distance is 100 nm or less, and because in the projection microscope configuration the object also acts as the extracting anode, the FE voltages needed to obtain field emission current are in the range of 50 V to 300 V [68]. Thus,
FIGURE 48. Schematic representation of the virtual radial projection point source V for a nanotip relative to the surface apex and its geometric center C. Both C and V are much closer to the apex than for the hemispherical tip (Fig. 6). (From ref. [62].)
for high-magnification working distances, the projection microscope is intrinsically a low-energy electron microscope.
b. Fresnel versus Fraunhofer Diffraction. The above definition of the magnification treated the projection microscope only from the "geometric" point of view. However, since the FE beam from a nanotip comes from the last single atom, the interaction between a coherent beam and an object must also be considered [69], in other words, the diffraction of the beam by the object and the resulting interference. Electron interference and holography are intimately related. However, the essence of holography [70] is a two-stage process: first, the formation of an interference pattern by adding an intense reference beam to a beam modulated by the specimen; second, the extraction of information about the object from this interference pattern. The exact mechanism of diffractogram formation in a projection microscope has to be settled first, and this requires diffraction and interference theory. This step is necessary before information about the object itself can be extracted with confidence from the diffractograms. In interpreting the images presented here, we consider only the diffraction mechanisms between a source and an object, with the related parameters (source size, object size, source-object distance, wavelength, etc.) that define the incident wavefront geometry and are basic to understanding the interference images. In the projection microscope configuration, the object-screen distance is typically about 10 cm. It is therefore the tip-object distance and the sizes of the source and the object that determine the nature of the resulting diffraction [62]. Because the object dimensions (>1 nm) are much larger than the electron wavelength λ (~0.1 nm), let us consider the classical wave theory of electron optics, which provides a precise formalism for describing the scattering.
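The order of magnitude λ ~ 0.1 nm follows from the de Broglie relation for electrons accelerated through the FE voltage. A minimal sketch, assuming the electrons reach the object with kinetic energy eV, where V is the 50-300 V tip voltage quoted in the text; the relativistic correction is included but is negligible at these energies, and the function name is illustrative.

```python
import math

H = 6.62607015e-34        # Planck constant (J s)
M_E = 9.1093837015e-31    # electron rest mass (kg)
Q_E = 1.602176634e-19     # elementary charge (C)
C_LIGHT = 299_792_458.0   # speed of light (m/s)


def electron_wavelength_nm(voltage_v: float) -> float:
    """Relativistically corrected de Broglie wavelength (nm) after acceleration through voltage_v."""
    if voltage_v <= 0:
        raise ValueError("voltage must be positive")
    energy_j = Q_E * voltage_v
    momentum = math.sqrt(2.0 * M_E * energy_j * (1.0 + energy_j / (2.0 * M_E * C_LIGHT**2)))
    return H / momentum * 1e9


if __name__ == "__main__":
    for v in (50.0, 100.0, 200.0, 300.0):
        print(f"{v:5.0f} V -> lambda = {electron_wavelength_nm(v):.4f} nm")
```

At 300 V this gives about 0.071 nm, the λ = 0.7 Å used in the Fresnel simulations of Figs. 50 and 54, and at 50 V about 0.17 nm, in both cases well below the nanometer-sized objects.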
For illustration, imagine an object O with transmission function $T(y_0, z_0)$, illuminated by a point source V that gives a beam illumination $B(y_0, z_0)$ at the object. Under these conditions, the wavefunction $\Psi(P_s)$ at each point $P_s$ of the image of the object projected onto the screen is given by the Fresnel-Kirchhoff formula with the Helmholtz-Sommerfeld boundary condition [71]:
where $\Psi_0$ is the amplitude of the incident wave, $y_0$ and $z_0$ are the transverse coordinates in the object plane, $k$ is the wave vector, $r_{to}$ and $r_{os}$ are the tip-object and object-screen distances, respectively, $\mathbf{n}$ is the unit vector perpendicular to the object plane $(y, z)$, and the term $[\cos(\mathbf{n}, \mathbf{r}_{os}) + \cos(\mathbf{n}, \mathbf{r}_{to})]/2 = K(\mathbf{n}, \mathbf{r}_{os}, \mathbf{r}_{to})$ is the obliquity factor. The exponential factor describes the spherical waves impinging on and scattered from the object, with their respective direction cosines. $B(y_0, z_0)$ is the beam shape, known from the experimental measurements (Fig. 23) to be Gaussian-like:
where $w = x_{to}\sin\alpha$ is the radius of the beam illumination at the object and $\alpha$ is the beam half-opening angle. $T(y_0, z_0)$ is the transmission function of the mask object; for an opaque object, for example, $T(y_0, z_0)$ is 0 inside the object and 1 outside it. If the object is three-dimensional, its transmission is averaged along $x$, the direction of propagation. The intensity at each point on the screen is then
$$I(P_s) = |\Psi(P_s)|^2 \qquad (40)$$
In the experimental situation,
$$x_{os}^2 \gg y_0^2 + z_0^2 \qquad (41)$$
This implies that
which is precisely the condition for the Fresnel approximation. Introducing Eq. (42) in Eq. (38) gives
$$\cdots \times \exp\left[-\frac{ik\,(y_0 y_s + z_0 z_s)}{x_{os}}\right] \exp\left[\frac{ik\,(y_0^2 + z_0^2)}{2x_{os}}\right] dy_0\,dz_0 \qquad (43)$$
There are two limiting cases for the diffraction. First, when the electron source is small compared with the object and the source-object distance is small, the small-angle (Fresnel) approximation applies, and Eqs. (43) and (40) give a projection image on the screen that is clearly recognizable despite the fringes around its periphery. This is known as Fresnel or near-field diffraction, in which the wavefront can be considered spherical over the object dimension.
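To see how Fresnel fringes arise in this geometry, the sketch below integrates the paraxial (Fresnel) form of the diffraction integral numerically in one dimension. It is not the authors' simulation code, and it rests on simplifying assumptions: the object is an infinitely long opaque strip of width w (so only y matters), the illumination B is taken as uniform instead of Gaussian, the obliquity factor is set to 1, and Babinet's principle is used to write the field as the unobstructed spherical wave minus the contribution of the strip. The default parameters λ = 0.07 nm, w = 1.4 nm, and d = 28 nm follow the carbon-fiber example of Fig. 50, with the screen at 10 cm; the function name is hypothetical.

```python
import cmath
import math


def wire_fresnel_intensity(y_c_nm: float, width_nm: float = 1.4,
                           wavelength_nm: float = 0.07, d_nm: float = 28.0,
                           screen_nm: float = 1.0e8, n_steps: int = 2000) -> float:
    """Paraxial (Fresnel) intensity behind an opaque strip lit by a point source.

    y_c_nm is the screen coordinate projected back onto the object plane;
    the actual screen position is y_c_nm * screen_nm / d_nm.
    The result is normalized so that the unobstructed wave gives 1.
    """
    k = 2.0 * math.pi / wavelength_nm
    obj_to_screen = screen_nm - d_nm
    # Quadratic-phase coefficient from the source->object and object->screen paths
    a = 0.5 * k * (1.0 / d_nm + 1.0 / obj_to_screen)

    # Midpoint-rule integral of exp(i a (y0 - y_c)^2) over the strip [-w/2, w/2]
    h = width_nm / n_steps
    strip = 0j
    for j in range(n_steps):
        y0 = -0.5 * width_nm + (j + 0.5) * h
        strip += cmath.exp(1j * a * (y0 - y_c_nm) ** 2)
    strip *= h

    # Same integral over the whole (unobstructed) plane: sqrt(pi/a) * exp(i pi/4)
    free = math.sqrt(math.pi / a) * cmath.exp(1j * math.pi / 4.0)

    # Babinet: opaque strip = unobstructed wave minus the strip's contribution
    u = 1.0 - strip / free
    return abs(u) ** 2


if __name__ == "__main__":
    for i in range(13):
        y_c = 0.5 * i  # nm, object-plane coordinate
        print(f"y_c = {y_c:4.1f} nm  I = {wire_fresnel_intensity(y_c):.3f}")
```

With these parameters the intensity at the center of the geometric shadow falls to roughly a quarter of the unobstructed value rather than to zero, and it oscillates around 1 outside the shadow; increasing d_nm progressively washes out the recognizable shadow.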
An increase in the source-object distance results in a continuous change in the fringes. For large source-object distances, the projected pattern shrinks considerably and the fringes bear little or no resemblance to the actual object. Beyond this point, changing the source-object distance alters mostly the size of the diffraction pattern and not its shape. This is Fraunhofer or far-field diffraction: the incoming wave is then nearly planar over the extent of the diffracting object. As a practical rule of thumb, Fraunhofer diffraction will prevail over Fresnel diffraction when
$$x_{to} \gg \frac{y_0^2}{\lambda} \quad \text{and} \quad x_{to} \gg \frac{z_0^2}{\lambda} \qquad (44)$$
where $y_0$ and $z_0$ are the object dimensions. Similarly, if the size of the source is of the order of the object dimension, the incoming wave is almost planar over the object, which again means Fraunhofer diffraction. In experimental situations where the electron source is atom-sized, as with the nanotip, the small-angle approximation and Fresnel conditions apply for small tip-sample distances, i.e., high magnification, so the resulting projection diffractograms retain the contour of the mask object. This is a big advantage for experimental observations.
2. Experimental Procedures
In the Fresnel projection microscope [61, 62], the electron point source is a W(111) single-atom nanotip spot-welded to a Joule heating loop and in contact with a liquid nitrogen reservoir. The sample and the object holder are attached to a nanodisplacement system composed of a commercial piezomotor for the x direction and a home-made inertial drive based on a piezotube for the y-z displacements. The overall displacements are in the range of centimeters in the x, y, and z directions. The displacement resolution is given by the minimum bending and elongation of the piezotube, which are in the range of 0.1 nm. The projection image is formed ~10 cm from the tip on a multichannel plate coupled to a fluorescent screen. These images are visualized and analyzed with a digital image acquisition system. The entire microscope is vibration-isolated with a simple pneumatic system, without the internal antivibration system normally employed in STM. Shielding of stray magnetic fields is not needed to obtain nanometric resolution [61, 62]. The absolute dimensions of the samples and the scales given in the figures are measured directly by following the displacement of the projection image on the screen versus the motion of the object due to the
FIGURE 49. Fresnel diffraction patterns (V_FE = 300 V) from nanometric carbon holes and fibers. The illuminated area corresponds to the e beam coming from a W nanotip. (From ref. [61].)
deflections of the sample-holder piezotube with applied voltage. The dimensions of the object are then determined directly for any nanotip-object distance, with an accuracy limited only by the calibration of the piezodrives, whose behavior is now very well known [65]. This procedure removes the uncertainty in the determination of the object dimensions that arises because the position of the virtual projection point source V is not accurately known, owing to the deformation of the electric field lines near the tip apex.
3. Experimental Results
a. Nanometric Carbon Fibers. An example of the effects of nanotip geometry on image formation is illustrated in Fig. 49, which shows diffraction patterns obtained with carbon nanofibers. A comparison with calculated Fresnel diffractograms is given in Fig. 50. The calculated Fresnel diffraction pattern is obtained with the following parameters: λ = 0.7 Å, wire diameter = 14 Å, and point source-object distance = 280 Å. This last value corresponds to the distance from the virtual source (see Fig. 48) to the object and not to the actual distance from the nanotip
FIGURE 50. Fresnel diffraction patterns (V_FE = 300 V) from a nanometric carbon fiber. A nanometric structural defect is indicated by the arrow. (1) Image with the Fresnel projection microscope. (2) Calculated Fresnel fringes from a 1.4-nm-diameter wire illuminated by a beam of λ = 0.7 Å coming from a point source at 28 nm from the wire. (3) Diameter of the wire. (From ref. [62].)
apex to the object. The observation of nanometric details along the fibers and the similarity between experimental and calculated Fresnel diffraction patterns indicate that the nanotips used were nearly ideal coherent point projectors. This is consistent with the simple approach given above. Comparing these results with those of refs. [66, 67] leads to the following considerations. In the case of carbon fibers, for example, a direct comparison with diffraction under Fresnel conditions, as shown in Fig. 50, already gives good agreement. Interpretations in terms of holography [72, 73], although appealing, could be misleading about the nature of the diffracting object. This was recently amply demonstrated when earlier experimental observations [66], which had been interpreted through holographic theories as images of the atomic lattice of the substrate [72, 73], were shown to be merely Fraunhofer diffractograms of multiple ~20 nm carbon holes [74]. Among the diffraction patterns of fibers presented by different authors [66], some cannot be interpreted as Fresnel diffraction. In
these images, the underlying FEM patterns of the tips used to obtain the fringes were composed of multiple spots over the whole screen, which means that the actual source was not limited to one atom. The diffractograms presented in ref. [66] must therefore be interpreted as Fraunhofer diffraction. The same conclusion was reached when some diffraction patterns presented in refs. [66, 72] were reinterpreted in ref. [74]. This confirms that conventional FE tips with an extended electron source area have to be considered as plane-wave sources.
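Condition (44) can be condensed into one dimensionless number, the ratio of $x_{to}$ to $y_0^2/\lambda$, which must be much larger than 1 for Fraunhofer diffraction. A minimal sketch, assuming $y_0$ is the full object width and taking "much larger" to mean a factor of 10 (an arbitrary threshold); the parameter sets are those quoted for the carbon-fiber simulation (Fig. 50) and the RNA simulation (Fig. 54), and the function name is illustrative.

```python
def far_field_ratio(source_object_nm: float, object_size_nm: float, wavelength_nm: float) -> float:
    """Return x_to / (y0**2 / lambda); Fraunhofer diffraction requires a value >> 1 (Eq. 44)."""
    return source_object_nm * wavelength_nm / object_size_nm ** 2


if __name__ == "__main__":
    cases = {
        "carbon fiber, Fig. 50": (28.0, 1.4, 0.07),  # x_to, y0, lambda in nm
        "A-RNA mask, Fig. 54": (50.0, 2.3, 0.07),
    }
    for name, (x_to, y0, lam) in cases.items():
        r = far_field_ratio(x_to, y0, lam)
        regime = "Fraunhofer (far field)" if r > 10 else "not far field: Fresnel regime"
        print(f"{name}: ratio = {r:.2f} -> {regime}")
```

Both ratios are of order 1 or below, so neither configuration is in the far field, which is consistent with the Fresnel interpretation of Figs. 50 and 54.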
b. Organic Molecules. The FPM is thus a low-voltage, high-resolution microscope giving nanometric resolution in the hundred-volt energy range. It is a perfect tool for observing organic materials such as synthetic and biological macromolecules. This prediction is borne out by FPM observations, with nanometric resolution, of synthetic polymers (PS-PVP) [62] and of RNA molecules [75].
Sample Preparation Procedure. Object preparation for FPM differs from that for conventional electron microscopies (transmission or scanning) because of the low-energy observation beam of less than 300 V. In this energy range, samples are opaque when their thickness is greater than 1 nm. This means that the samples have to be prepared as free-standing fibers across holes. The following two-step procedure has proved valuable for organic polymer fibers, synthetic [62] or biological [75].
1. Dissolution of the macromolecules in a solution at a concentration of a few mg/l. The solvent has to be specific to each sample: for example, chloroform is used for the PS + PVP polymers [62] and NaCl solution for RNA [75].
2. Deposition of a 2-µl drop of this solution on a holey-carbon grid. After evaporation of the solvent, the probability of having polymers stretching across a hole is rather large, allowing observation by FPM as shown in Fig. 51. The holey-carbon grid is polarized during the deposition of RNA to assist the anchoring of the molecules on the substrate, owing to the negative charge of the phosphate groups.
Note that no other specimen preparation, such as staining or metal coating, is done.
Polymers [62]. The polymers were a mix of polysulfone of bisphenol A (PS) (95%) and polyvinylpyrrolidone (PVP) (5%). They are the constituents used to fabricate commercial hollow fibers for human dialysis. Figure 52 is an overview image of the main characteristics of the polymer network. Other examples of each of the designated characteristics are shown at higher magnification in Fig. 53. Some conclusions on the polymer behavior can be highlighted.
1. Observations of polymers with sub-nanometer details can be achieved with an e-beam energy in the range of 200-300 V, without any observable degradation of the sample under the beam even after 1 h of observation.
2. The polymer chains self-organize into polyhedral superstructures with fibers of different lengths and diameters; note in particular the nanofibers spanning the polymer holes (Figs. 52 and 53a).
3. When the polymers are not stretched between two anchoring points, they form a clew (Fig. 53c). This feature should be the minimum-energy conformation of the polymer; it is observed only for polymers, not for carbon fibers.
4. The Fresnel diffraction patterns show a periodic variation along the length of the structure (Fig. 53b). This periodic variation is also echoed in the surrounding fringes.
Comparing Figs. 49 and 50 with Fig. 53b clearly shows the differences between the experimental diffraction patterns of a carbon fiber and of a polymer fiber. The periodic structure of the polymer fibers, which induces modulated diffraction fringes, is also present in the diffractograms of the network. This raises the question of how periodic supramolecular structures form from the initial polymer solution. Figure 53b, for example, strongly suggests a twisted shape for the supramolecular fiber structure.
FIGURE 51. Low-magnification FPM image of an RNA network stretching across a micrometer-sized carbon hole. The black hole in the middle of the image is the blind direction of the channel plate. Note that most of the fibers have similar diameters. (From ref. [75].)
FIGURE 52. Overview of a supramolecular network of the polymers (PS-PVP), with some specific structures indicated: 1, polyhedral shape of the network; 2, periodic structure along the fibers; 3, clew. The imaging voltage is 280 V. (From ref. [62].)
RNA [75]. The capability of FPM for high-resolution analysis of soft materials is further confirmed by observations of as-deposited A-RNA molecules. For some images, comparisons are made with simulated Fresnel patterns of the masks sketched as insets in the figures. The objective of these simulations is not to determine the exact geometry of the real objects, but only to show what kind of mask geometry can produce the observed diffractograms. The following points can be highlighted from the experimentally observed diffractograms, presented in Figs. 54 to 60.
The periodic structure along the fibers. The fibers, whose diameter is around 2 nm, show a periodic variation of the fringe intensity along the longitudinal direction. This periodic variation could be observed along the whole fiber length (Fig. 54a). In this figure, the length of the fiber is about 30 nm and the periodicity is ~30 nm. Comparisons between the experimental images (Fig. 54b) and calculated Fresnel
FIGURE 53. Detailed observations of some characteristic features of the polymers. Imaging voltages are between 260 and 275 V. (a) Self-organization into polyhedral supramolecular structures. The polyhedral shape of the holes reflects the presence of nanometric structures of the polymer fibers constituting the network. These structures are also echoed in the rich diffractograms inside the polymer holes, as shown by comparison with the diffractograms of the carbon network in Fig. 49. (b) Periodic supramolecular structure of a polymer fiber. The diffractogram suggests the presence of a twist shape. (c) A polymer clew with its surrounding diffraction pattern. (From ref. [62].)
FIGURE 54. (a) FPM images of a free-standing RNA fiber at two different magnifications, showing a periodic variation of the fringe pattern (the high-magnification part is framed inside the low-magnification image). (b) Comparison with the Fresnel diffractogram: the upper image is the FPM image; the middle diffractogram is the simulated diffraction from the 2D mask shown in the lower part, which mimics the shadow of the A-RNA conformation (diameter of 2.3 nm and period of 3 nm). The numerical simulations use a wavelength of 0.7 Å and a virtual projection source-sample distance of 500 Å. (From ref. [75].)
diffraction from a 2-nm-diameter fiber that mimics the periodic variation of the helix pitch in the A-RNA structure [76] show that the periodic variation of the fringes can only be interpreted as arising from a periodic structure along the fibers that is very close to that of RNA. Secondary structures. Besides the above periodic variation along the fibers, whose diameters were mostly around 2 nm, several other conformations were also noticed. 1. The formation of networks. The linear periodic fiber shown in Fig. 54 is observed when it stretches across a small carbon hole. When
the dimensions of the carbon holes are larger, the characteristic features observed were not single fibers but networks. The network units are polyhedral, with constituent fibers showing a periodic structure and having diameters mostly around 2 nm. The image in Fig. 51 and the high-magnification images of the networks presented in Fig. 55 illustrate this configuration. A cumulative histogram of the angles measured at the crossing points of these networks shows a peak around 120°. This value is also confirmed by the very high percentage of three-branch crossing points. This indicates that the observed network is not due to the superposition of individual fibers during the flattening of a 3D distribution; in that case, the crossing-point angles would be randomly distributed and four-branch links would predominate. This polyhedral net structure, together with the periodic structure along the constituent fibers, points rather to an intrinsic molecular structure, a notion supported by the structure of the crossing points discussed below.
FIGURE 55. High-magnification FPM images giving the characteristic details of RNA networks, indicated by the numbered arrows. (1) 2-nm fiber network and connections without extra material at the crossing point; Fig. 56 compares such connections to a simulated Fresnel diffractogram. (2) High-density connection zone. (3) Fibers less than 2 nm in diameter with a connection without extra material. (4) A loop inside a network, to be compared with the loop along a fiber shown in Fig. 60. (From ref. [75].)
FIGURE 56. Fork separation structure showing a connection without extra material at the crossing point. (a) Calculated Fresnel diffractogram (λ = 0.7 Å, point source-object distance = 500 Å); the mask of the object is shown as an inset (not to the same scale). (b) FPM image of a network connection without extra material shown in Fig. 55. (From ref. [75].)
At the crossing points between the fibers, the fiber size remains constant (Fig. 56), which means that no supplementary matter is present inside the crossing zone. To verify this assertion, we have made Fresnel diffraction simulations for two different cases, with and without supplementary matter at the connection point. The results, presented in Figs. 56 and 57, show very specific fringe patterns for each case. This proposed secondary structure is very similar to the one indicated by Noller et al. [77] for RNA, in which the connections between the different fibers are made without surplus matter, through a splitting and prolongation of the two strands. Connections between the fibers can be very dense; in some places the distance between two connections is very small, as indicated by arrows in Fig. 55. 2. Supercoiled structures. The diameter of some fibers can be greater than 2 nm. Moreover, they can present a very complicated profile, as illustrated by Fig.
FIGURE 57. Four-branch connection point with extra material, showing a diffraction image very different from that in Fig. 56. (a) Calculated Fresnel diffractogram (λ = 0.7 Å, point source-object distance = 500 Å); the mask of the object, not to the same scale, is shown as an inset. (b) FPM image of a connection with extra material, showing fringes inside the connection zone. (From ref. [75].)
58. This conformation suggests a supercoiled structure, an interpretation strongly supported by the presence of a smaller-diameter fiber extending out from these features. When a fiber is cut (Fig. 59), it develops a crooked end at the free extremity. This free-end equilibrium conformation is very different from the clew structure developed by synthetic organic polymers, as shown in Fig. 53c. Loop structures can also be observed, either inside a network (Fig. 55) or as a free-standing loop along a fiber (Fig. 60). To assess the nanoloop hypothesis, a Fresnel simulation has been performed; it shows good agreement between the simulated diffractogram and the observed image. Such loop structures have also been proposed to describe the supertwisted aspect of RNA analyzed by electron microscopy [78].
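The crossing-angle argument for the networks can be illustrated with a small Monte Carlo sketch. It assumes, as a simplification that is not taken from the measurements, that a flattened 3D tangle behaves like pairs of straight fibers with independent, isotropic orientations crossing at a point; the helper names are hypothetical. Each such crossing has four branches, and the angles between adjacent branches spread evenly over 0-180°, whereas a symmetric three-branch node gives three angles of 120°.

```python
import random


def adjacent_branch_angles(directions_deg):
    """Angles (degrees) between angularly adjacent branches leaving one junction."""
    d = sorted(a % 360.0 for a in directions_deg)
    gaps = [d[i + 1] - d[i] for i in range(len(d) - 1)]
    gaps.append(360.0 - d[-1] + d[0])  # wrap-around gap closes the circle
    return gaps


def superposed_crossing_angles(n_crossings, seed=0):
    """Hypothetical model: two straight fibers with random orientations cross
    at a point, so every crossing has four branches (two per fiber)."""
    rng = random.Random(seed)
    angles = []
    for _ in range(n_crossings):
        a, b = rng.uniform(0.0, 180.0), rng.uniform(0.0, 180.0)
        angles += adjacent_branch_angles([a, a + 180.0, b, b + 180.0])
    return angles


def histogram(values, bin_width=30.0, upper=180.0):
    """Counts per bin of the given width on [0, upper]; the top edge joins the last bin."""
    counts = [0] * int(upper / bin_width)
    for v in values:
        idx = min(int(v // bin_width), len(counts) - 1)
        counts[idx] += 1
    return counts


# Symmetric three-branch node: every adjacent angle is 120 degrees.
print(adjacent_branch_angles([90.0, 210.0, 330.0]))  # [120.0, 120.0, 120.0]

# Random superposition: no 120-degree peak, the bin counts come out roughly flat.
print(histogram(superposed_crossing_angles(10_000)))
```

Under this toy model the mean adjacent angle at a crossing is always 90° and the histogram has no preferred value, so a clear peak at 120° together with mostly three-branch nodes is hard to reconcile with random superposition.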
6. Discussion. From the point of view of electron microscopy, the following points should be stressed. Coherence. The whole beam area is covered with sharp diffraction patterns. This is an experimental indication of the high coherence of the field-emitted beam from nanotips at 200-300 V. Moreover, these patterns
FIGURE 58. (A) Low-magnification FPM image of an RNA network; the arrow indicates the presence of a supercoiled fiber. (B) High-magnification FPM images of the supercoiled fiber in the framed zone. (a) The initial structure; the arrow indicates a small-diameter fiber extending from the supercoiled conformation. (b) The same fiber modified after e-beam irradiation, showing a decrease in its diameter. (From ref. [75].)
are indications that nanometer fibers have to be considered as opaque objects under e beams in this energy range. High Resolution. The nanometric resolution of the images obtained with the FPM is within the theoretical limit for the visual detectability of small objects in a statistically noisy image, given by the Rose equation [79], which for the purpose of electron microscopy reads
$$d_0 \ge \frac{5}{C\,(fN)^{1/2}},$$
where d_0 is the characteristic object size, C is the contrast relative to the immediate surroundings (in our case 1), f is the efficiency of "electron utilization" (assumed to be 1), and N is the number of incident electrons per unit area. The images were taken with an exposure time of 1 s, so d_0 is in the range of 2-3 Å.
FIGURE 59. (A) Low-magnification FPM image showing a cut RNA fiber with a crooked free end. (B) FPM image of the framed free-end zone (a), and comparison with the calculated Fresnel diffractogram of a crooked-end fiber shown in the inset. (C) From (a) to (c), FPM image evolution of the free-end conformation under increasing e-beam irradiation doses. (From ref. [75].)
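The Rose estimate can be checked numerically. The sketch below evaluates d_0 = 5/(C (fN)^{1/2}) with C = f = 1, as in the text, after converting N to electrons per Å². The exposure N is an assumption: the RNA imaging flux is not stated in this paragraph, so the 10^16-10^17 electrons cm^-2 s^-1 range quoted for the polymer observations, integrated over 1 s, is borrowed. The function name `rose_min_size` is hypothetical.

```python
import math

ANGSTROM2_PER_CM2 = 1e16  # 1 cm^2 = 1e16 Å^2


def rose_min_size(dose_e_per_cm2, contrast=1.0, efficiency=1.0, k=5.0):
    """Smallest detectable object size d0 (Å) from the Rose criterion
    d0 >= k / (C * sqrt(f * N)), with N converted to electrons per Å^2."""
    n_per_a2 = dose_e_per_cm2 / ANGSTROM2_PER_CM2
    return k / (contrast * math.sqrt(efficiency * n_per_a2))


# Assumed exposure: a flux of 1e16-1e17 e cm^-2 s^-1 collected for 1 s (C = f = 1).
for flux in (1e16, 3e16, 1e17):
    print(f"N = {flux:.0e} e/cm^2 -> d0 = {rose_min_size(flux * 1.0):.2f} Å")
```

With these inputs d_0 runs from 5 Å down to about 1.6 Å; a value of 2-3 Å corresponds to N of roughly 3 × 10^16 to 6 × 10^16 electrons cm^-2.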
FIGURE 60. Loop structure along an RNA fiber. (A) Low-magnification FPM image. (B) Detail of the framed loop (a), and comparison with the Fresnel diffractogram (b) (λ = 0.7 Å, point source-object distance = 500 Å); the inset represents the loop mask, not to the same scale, used for the simulation. (From ref. [75].)
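The mask simulations behind Figs. 54 and 56-60 can be sketched in a few lines of Python/NumPy. This is not the authors' code. It assumes (i) fully opaque fibers, as stated in the Discussion; (ii) the usual Fresnel scaling equivalence, by which a point source at distance z1 from the object with a distant screen behaves like plane-wave illumination propagated over about z1 and then magnified, so the 500-Å source-object distance of the captions is used as the propagation distance; and (iii) the FFT transfer-function method of paraxial Fresnel propagation. The mask (a fiber of 2.3 nm mean diameter whose width is modulated with a 3 nm period) and the names `fresnel_propagate` and `periodic_fiber_mask` are hypothetical. The wavelength helper also shows that λ ≈ 0.7 Å corresponds to roughly 300 V.

```python
import numpy as np

H_PLANCK = 6.62607015e-34   # J s
M_E = 9.1093837015e-31      # kg
Q_E = 1.602176634e-19       # C


def electron_wavelength_angstrom(volts):
    """Non-relativistic de Broglie wavelength, adequate at a few hundred volts."""
    return H_PLANCK / np.sqrt(2.0 * M_E * Q_E * volts) * 1e10


def fresnel_propagate(field, dx, wavelength, z):
    """Paraxial Fresnel propagation of a sampled 2D field over distance z with
    the transfer-function (FFT) method; the constant phase exp(ikz) is dropped.
    dx, wavelength and z must share one length unit (Å here)."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=dx)
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    transfer = np.exp(-1j * np.pi * wavelength * z * (FX**2 + FY**2))
    return np.fft.ifft2(np.fft.fft2(field) * transfer)


def periodic_fiber_mask(n, dx, diameter, period, modulation):
    """Hypothetical mask: an opaque fiber along y whose half-width is modulated
    sinusoidally, a crude stand-in for a helical shadow (0 = opaque, 1 = open)."""
    coords = (np.arange(n) - n // 2) * dx
    X, Y = np.meshgrid(coords, coords)
    half_width = diameter / 2 + modulation * np.cos(2 * np.pi * Y / period)
    return np.where(np.abs(X) <= half_width, 0.0, 1.0)


wavelength = electron_wavelength_angstrom(300.0)  # about 0.71 Å
z_eff = 500.0                                     # Å, distance quoted in the captions
n, dx = 480, 1.0                                  # 480 Å field = 16 periods of 30 Å
# Sampling condition for the transfer-function method: dx >= wavelength * z / (n * dx)
assert dx >= wavelength * z_eff / (n * dx)

mask = periodic_fiber_mask(n, dx, diameter=23.0, period=30.0, modulation=3.0)
intensity = np.abs(fresnel_propagate(mask.astype(complex), dx, wavelength, z_eff)) ** 2
```

Swapping `periodic_fiber_mask` for a mask with a three-branch junction, with or without a filled disk at the node, would give the kind of with/without-extra-material comparison shown in Figs. 56 and 57. The transfer-function method was chosen because the grid here satisfies its sampling condition; for much longer distances an impulse-response formulation is usually preferred.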
Magnetic Stray Field. The sharp diffraction figures obtained are experimental proof that the projection microscope using a nanotip as a coherent nanosource does not need magnetic shielding to perform Fresnel diffraction. This is confirmed by the following estimate of the image blurring due to the stray magnetic field. The measured permanent magnetic
field is about 0.5 gauss (~0.5 × 10^-4 tesla), with an AC stray field B(ω) in the range of 1 to 5 milligauss (~1-5 × 10^-7 tesla) near the microscope chamber. Under these experimental conditions, simple calculations [80] of the deviations of the image at the screen due to the Lorentz force and/or the change of phase due to the vector potential of the stray fields give
$$\Delta(i) = 2 \times 10^{2} \times B \qquad (45)$$
where Δ(i) is in meters and B in tesla. For the measured range of the stray field B(ω), the deviations are from 20 to 100 µm. They are substantially smaller than the fringe widths at the screen, which were in the millimeter range. Thus the blurring does not prevent the observation of the interference fringes, as is fully supported by the experimental results.
Irradiation Effects. Irradiation effects are consequences of collisions between the incoming electrons and the atoms of the specimen. Several main processes can be envisaged: (1) elastic scattering, in which the atom remains in its ground state and the electron conserves its energy but changes direction; (2) inelastic scattering that excites the atom; (3) inelastic scattering that ionizes the atom; and (4) capture of the incident electron by the atom, followed by a multielectron excitation, as in the Auger process, for example. The primary damage process is inelastic scattering, which causes either molecular excitation or ionization. The dissipated energy is either converted to molecular vibrations, raising the temperature, or causes bond scission, such as the loss and diffusion of hydrogen and the production of radicals. The damage depends on the energy dissipated in the specimen per unit volume (J cm^-3) or on the electron dose (q = jτ = en, in C cm^-2), which is proportional to the number of incident electrons per unit area. However, knowledge of the individual damage processes is very poor, because the range of primary and secondary processes is very broad and complex. In practical electron microscopy, damage processes, and in particular mass loss, can be observed by following the evolution of the images or, in our case, of the diffraction patterns under irradiation. From the values of the currents and the dimensions of the objects, the electron exposures during the observations of the polymers (PS-PVP) were in the range of 10^16-10^17 electrons cm^-2 s^-1.
For these polymers, no damage or charging effects were noticed during observations lasting 15 min to 1 h, i.e., for an electron dose in the range of 10^19 to 3.6 × 10^20 electrons cm^-2.
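The numbers in the two preceding paragraphs can be rechecked with simple arithmetic. The sketch takes Eq. (45) exactly as given, Δ(i) = 2 × 10^2 × B with Δ(i) in meters and B in tesla, and does not rederive its geometry-dependent prefactor. For the polymer dose it assumes that the low-flux end pairs with the 15-min observation and the high-flux end with the 1-h observation; the function names are hypothetical.

```python
MICRON = 1e-6  # m
GAUSS = 1e-4   # T


def stray_field_deviation_m(b_tesla):
    """Image deviation at the screen from Eq. (45): Delta = 2e2 * B (m, with B in T)."""
    return 2e2 * b_tesla


def accumulated_dose(flux_e_per_cm2_s, seconds):
    """Electrons per cm^2 delivered by a constant flux over the given time."""
    return flux_e_per_cm2_s * seconds


# AC stray field of 1 to 5 milligauss measured near the chamber
for mg in (1.0, 5.0):
    b = mg * 1e-3 * GAUSS
    print(f"{mg:.0f} mG -> {stray_field_deviation_m(b) / MICRON:.0f} µm")

# Polymer observations: 1e16 e cm^-2 s^-1 for 15 min, 1e17 e cm^-2 s^-1 for 1 h
print(f"{accumulated_dose(1e16, 15 * 60):.1e}")  # 9.0e+18, i.e. ~1e19
print(f"{accumulated_dose(1e17, 60 * 60):.1e}")  # 3.6e+20
```

The stray-field loop prints 20 µm and 100 µm, well below the millimeter-scale fringe widths, which is the comparison the argument rests on.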
Observations of the RNA also show that low-magnification imaging could generally be carried out for extended periods of hours without apparent evolution of the fiber structures. The transformations under irradiation happen only when the object is close enough to the nanotip, in other words at high magnification, which implies a greater irradiation flux. In practice, the modifications are observed when the irradiated sample area covered by the beam opening lies inside a circle less than 50 nm in diameter. When these transformations are observed on the screen, the field emission from the nanotips becomes very unstable, indicating the presence of adsorbed species on the nanotip. This last phenomenon indicates that material evaporates from the fiber during its morphological changes; in other words, the observed transformations are accompanied by a loss of matter from the fibers. The images in Figs. 58 and 59 are examples of the observed modifications of the general RNA structure under irradiation. Figure 58 shows a supercoiled structure becoming thinner under irradiation. Modifications can be observed more clearly when the free extremity of a fiber is exposed to irradiation, as shown in Fig. 59. Studies by electron diffraction and by electron energy loss spectroscopy [81, 82] indicate that, for complete destruction of the different bases of the nucleic acids (adenine, cytosine, guanine, and thymine), electron doses in the range of 5 × 10^-2 to 5 × 10^-1 C cm^-2 are needed, with an incident energy of the order of 20 keV at 300 K. This corresponds to an irradiation dose of 3 × 10^17 to 3 × 10^18 electrons cm^-2. Very few studies exist for energies in the range of 50 to 300 V, but let us use these values and some assumptions to interpret the results. In the experimental procedure, the total current of the incident beam is of the order of 10^-11 A. Let us assume now that the destruction doses are deposited within 1 s.
This means that, with a total current of 10^-11 A, the area irradiated by the e beam must be smaller than a circle of ~150 nm (for a 3 × 10^17 dose) to 50 nm in diameter (for a 3 × 10^18 dose). These values are in the range of our experimental observations. A strict comparison between these values and the experimental observations is not realistic because of the differences, first, in the energy of the incident beam and, second, in the energy transfer to the sample during irradiation, which depends strongly on the specimen-supporting device. For example, in FPM the RNA fibers are freestanding nano-objects, while in the other experiments the fibers are deposited on or embedded in a solid substrate. The comparison nevertheless tells us that, during FPM observation, the organic fibers undergo irradiation damage from the incident electron beam only above a certain threshold flux.
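The beam-diameter estimate above follows from converting the destruction dose to electrons per cm² and asking how large a disk the beam current can cover at that dose within 1 s. A minimal sketch, assuming (as the text does) a total current of about 10^-11 A, a 1-s deposition time, and additionally a uniform current distribution over the disk; the function names are hypothetical.

```python
import math

Q_E = 1.602176634e-19  # C, elementary charge


def dose_electrons_per_cm2(dose_coulomb_per_cm2):
    """Convert a charge dose (C cm^-2) into electrons per cm^2."""
    return dose_coulomb_per_cm2 / Q_E


def max_irradiated_diameter_nm(current_a, seconds, dose_e_per_cm2):
    """Diameter of the disk that just reaches the given dose when the whole
    beam current is spread uniformly over it for `seconds`."""
    electrons = current_a * seconds / Q_E
    area_cm2 = electrons / dose_e_per_cm2
    radius_cm = math.sqrt(area_cm2 / math.pi)
    return 2.0 * radius_cm * 1e7  # 1 cm = 1e7 nm


for q in (5e-2, 5e-1):
    n = dose_electrons_per_cm2(q)
    d = max_irradiated_diameter_nm(1e-11, 1.0, n)
    print(f"{q:.0e} C/cm^2 = {n:.1e} e/cm^2 -> disk diameter {d:.0f} nm")
```

This gives about 3.1 × 10^17 and 3.1 × 10^18 electrons cm^-2, and disk diameters of roughly 160 nm and 50 nm, in line with the ~150 nm and 50 nm quoted above.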
E. Ferromagnetic Nanotips: Atomic Beam Splitter
As the last application of nanotips, we now present field electron emission and atomic metallic ion emission (AMIE) studies of Fe nanotips, performed to investigate possible magnetic interactions and beam properties of ferromagnetic nanotips [83]. The Fe tips are obtained from (111) or (110) Fe whiskers by electrolytic sharpening. The experimental results show unique emission properties specific to magnetic atomic-scale protrusions. In particular, in FE, the electron beam from an Fe nanotip splits when the temperature is lowered from above Tc (1042 K) to below Tc, and this process is reversible upon reheating to above Tc. In addition, the AMIE patterns coming from single protrusions consist of sharp, multiple spots. Neither effect was found with nonmagnetic tips. These results show that magnetic nanotips can be used as atomic-scale beam splitters for electrons and metallic ions when operated at temperatures below Tc. Possible explanations of these phenomena are discussed below. The FEM patterns observed from Fe nanotips are presented in Fig. 61 in two consecutive sequences: cooling (a to e), then heating (f to j). The FE voltage in this example is -2870 V. The experimental procedure was as follows. A stable single-spot pattern coming from a nanoscale protrusion was obtained at ~1100 K; the heating current was then cut, leading to cooling of the tip while the FEM patterns were recorded on video. The sequence of Figs. 61a to 61e illustrates the variation of the FE pattern during cooling of the nanotip from ~1100 K (Fig. 61a) to liquid nitrogen temperature (Fig. 61e). This sequence shows a progressive splitting of the initial singlet spot into a stable doublet as the temperature decreases. The tip was then heated back up to ~1100 K.
Figures 61f to 61j show a progressive merging of the doublet into one singlet spot during the heating sequence. The splitting and merging between singlet and doublet spots are reversible processes. In detail: 1. The splitting/merging of the electron beam(s) occurs when the nanotip temperature crosses a critical temperature near the Curie temperature. This temperature is well below the crystallographic phase-transition temperature Tα-γ, which was checked by FEM observations of an abrupt change in the FE patterns with standard Fe microtips of approximately 100-nm radius. 2. The splitting/merging rates depend on the cooling/heating speed. 3. During the splitting/merging process, the two beams are not of equivalent intensity, although at the final temperature the intensities of
FIGURE 61. Evolution of the field emission pattern from an Fe nanotip as a function of temperature. (a)-(e) Cooling sequence from (a) ~1100 K to (e) liquid nitrogen temperature. (f)-(j) Consecutive heating sequence from (f) liquid nitrogen temperature to (j) 1100 K. The splitting and merging are reversible and are illustrated schematically on the right-hand side. (From ref. [83].)
the two beams can be very similar (e.g., Fig. 61e). The same intensity asymmetry occurs during splitting and merging. 4. As long as the protrusion keeps a certain height and the structure underneath is not destroyed, the process is reversible with temperature, without measurable change in the location of the spot.
FIGURE 62. Sextuplet AMIE spot from an Fe nanotip for two different extracting voltages, with a schematic illustration of the six bright spots that constitute the pattern (AMIE voltages were, respectively, -9.5 kV and -10.5 kV at ~800 K). (From ref. [83].)
5. The splitting angle of the doublet depends on the temperature of the nanotip, with a maximum in the range of 4-6°.
It is important to note that none of these observations were made with W and Au nanotips, nor with standard magnetic Fe microtips. In this last case, this may be because the splitting of the beam occurs only for the atomic-scale geometry of the magnetic protrusion, or because its observation is masked by the spatial resolution limit of the magnetic microtips, which is 2 nm [6]. There are also effects on the AMIE beams that are unique to the magnetic nanotips, which could have the same physical cause as for the electron beams. The big difference from W or Au lies in the particular patterns of the Fe AMIE spots (Fig. 62). A characteristic pattern is a triangular sextuplet spot with sharp edges. The opening angle between two extremities of the triangle is ~6°, and between adjacent spots it is ~2°. The inner triplet is more intense than the outer triplet. This difference in intensity between the internal and external triplets is easily distinguished when the AMIE voltage is lowered, as illustrated in Fig. 62. For W and Au, only single spots with opening angles of 2-4° were observed over the whole AMIE temperature range (see Fig. 21). The multiplet-spot AMIE patterns can move owing to a gradual displacement of the protrusions over the surface of the base tip, and during these displacements their whole initial patterns are conserved. The progressive movement of two AMIE multiplet spots toward each other can lead to a partial overlapping of their patterns; the whole initial patterns of both AMIE spots were conserved even during this partial overlapping. This characteristic
FIGURE 63. Sequence showing the progressive movement toward each other of two sextuplet Fe AMIE spots (-10.5 kV, ~800 K). The sextuplet pattern of each spot is conserved during the displacement and the partial overlapping. The duration of this sequence is a few minutes. (a) Fe AMIE: experimental observations of the overlapping of two sextuplet spots. (b) Schematic representation of the above overlapping process. (From ref. [83].)
is illustrated in Fig. 63, in which two sextuplet-spot AMIE patterns move toward each other until they partially overlap. To explain these data, two general mechanisms, both based on the magnetic phase transition, have been proposed. Interpretations based on a crystallographic phase transition have not been considered, not only because 1100 K is well below Tα-γ, but also because the splitting/merging processes are progressive during the temperature variation, whereas a crystallographic phase transition has to be abrupt. The two proposed mechanisms are as follows. 1. Geometric interpretation. The splitting of the e beam is due to the reversible formation of two protrusions during the magnetic phase transition, and the AMIE multiplet spots are emitted from an aggregate of protrusions. The aggregation or splitting of the protrusions is due to a rearrangement of the structure caused by magnetic interaction at temperatures below Tc. FIM analysis did not give unequivocal answers because of the difficulty of obtaining stable and controlled progressive field evaporation images. However, we find this hypothesis difficult to accept in view of the classical weak magnetic energy (of the order of meV) compared with the structure modification energy (of the order of
eV). Furthermore, the conservation of the AMIE sextuplet patterns during their movement and during their partial overlapping (Fig. 63) would not occur in the case of aggregates of protrusions. 2. Magnetic interpretation. First of all, calculations show that classical explanations based on the deviation of the emitted beams by the magnetic field of the bulk tip can be ruled out: the beam deviations in that case are orders of magnitude too small and cannot produce the observed patterns. This leads to the conclusion that the effect must be due to a very strong magnetic interaction at the atomic scale. This could be a scattering process, a magnetic diffraction (Aharonov-Bohm-like) [84], a Stern-Gerlach-like spin selection [85], or Lorentz-force beam deviations under very large and localized magnetic gradients existing at the apex of the ferromagnetic nanotips. The different patterns observed would then reflect the magnetic state of the different particles (ions or electrons), the 3D field distribution in the atomic-scale apex region, and the nature of the interaction. At present we cannot assess which of these mechanisms, alone or in combination, explain the presented phenomena.
V. CONCLUSIONS
The reduction of the field emission area to a single atom, the main characteristic of the nanotips, is obtained by exploiting the protrusion effect to enhance the field locally over the topmost atom of the nanoprotrusion. The field emission beam, which comes exclusively from this atom, shows specific properties attached to the atomic size of these nanosources. Applications of the intrinsic physical properties of the nanotip open the possibility of atomic resolution in FEM, or of measuring energy exchange down to an atom-size area. The nanotip is a coherent, monochromatic e-beam source. Using the nanotip as a point source in a projection microscope turns it into a versatile, low-energy, high-resolution electron microscope: the Fresnel projection microscope. Most of the main physical properties of the nanotips are explained by considering the physical mechanism of electron tunneling through an atom with a field-deformed triangular barrier. However, some experimentally observed properties, such as the splitting of the e beams and of the AMIE with ferromagnetic nanotips, are still under investigation.
In this chapter the nanotips are presented as e-beam nanosources. They are also AMIE sources, that is, metallic ion sources, with all the specific properties attached to their atomic dimensions (Fig. 20). Furthermore, among the applications that take advantage of these properties but are not covered in this chapter are those related to microguns. Microguns integrate nanotips, used as atom sources of electrons or ions, inside microlens systems [86], with the advantages of a drastic reduction in size and aberrations. These microguns can be standalone field emission gun systems or be part of an array used, for example, as tools for parallel nanowriting or metallic nanodeposition.
ACKNOWLEDGMENTS
It is a pleasure to thank V. Semet for his important participation in this work, as well as L. Bitar for his contributions and fruitful discussions. The contributions of Dr. R. Semet, Dr. Pham Quang Tho, and Prof. E. Taillandier to the choice of the samples (polymer and RNA) and the discussions about the Fresnel projection microscope images are highly appreciated. We acknowledge the technical assistance of the Service Central d'Analyse du CNRS-Département Instrumentation. This work has been supported by European Union contracts (SCIENCE, HCM, and BRITE) and by French and Spanish government agencies.
REFERENCES
1. A. V. Crewe, Conf. on Non-Conventional Electron Microscopy, Cambridge, England (1965); A. V. Crewe, J. Walls, and L. M. Welter, J. Appl. Phys. 39, 5861 (1968).
2. Vu Thien Binh, J. Microsc. 151, 355 (1988); Vu Thien Binh and J. Marien, Surface Sci. 202, L539 (1988).
3. Vu Thien Binh and N. Garcia, J. Physique I 1, 605 (1991); Vu Thien Binh and N. Garcia, Ultramicroscopy 42-44, 80 (1992).
4. E. W. Muller, Ergeb. Exakt. Naturwiss. 27, 290 (1953).
5. (a) L. W. Swanson and A. E. Bell, in “Advances in Electronics and Electron Physics,” XXIII, L. Marton (Ed.), p. 193, Academic Press, New York (1973); (b) A. Modinos, “Field, Thermionic and Secondary Electron Emission Spectroscopy,” Plenum Press, New York (1984).
6. R. H. Good and E. W. Muller, in “Handbuch der Physik,” XXI, p. 176, Springer Verlag, Berlin (1956).
7. R. Gomer, “Field Emission and Field Ionisation,” Harvard Univ. Press, Cambridge, Mass. (1961).
8. E. W. Muller and T. T. Tsong, “Field Ion Microscopy, Principles and Applications,” Elsevier, Amsterdam (1969); E. W. Muller and T. T. Tsong, Prog. Surface Sci. 1, 1 (1974).
9. W. Schottky, Z. Physik 14, 63 (1923).
10. R. H. Fowler and L. Nordheim, Proc. Roy. Soc. Lond. A 119, 173 (1928).
11. L. Nordheim, Proc. Roy. Soc. Lond. A 121, 626 (1928); H. C. Miller, J. Franklin Inst. 282, 382 (1966).
12. E. L. Murphy and R. H. Good, Phys. Rev. 102, 1464 (1956); S. G. Christov, Phys. Status Solidi 17, 11 (1966).
13. H. Boersch, Z. Phys. 139, 115 (1954).
14. R. D. Young, Phys. Rev. 113, 110 (1959).
15. W. P. Dyke and W. W. Dolan, in “Adv. in Electronics and Electron Phys.,” VIII, L. Marton (Ed.), p. 89, Academic Press, New York (1956).
16. Vu Thien Binh, A. Piquet, H. Roux, R. Uzan, and M. Drechsler, Surface Sci. 25, 348 (1971); Vu Thien Binh and R. Uzan, Surface Sci. 179, 540 (1987).
17. C. F. Eyring, S. Mackeown, and R. A. Millikan, Phys. Rev. 31, 900 (1928).
18. J. A. Becker, Bell System Tech. J. 30, 907 (1951).
19. D. J. Rose, J. Appl. Phys. 27, 215 (1956).
20. A review is given in P. W. Hawkes and E. Kasper, “Principles of Electron Optics,” Vol. 2, Applied Geometrical Optics, Academic Press, London (1989).
21. D. M. Goebel, Y. Hirooka, and G. A. Campbell, Rev. Sci. Instrum. 56, 1888 (1985).
22. D. W. Tuggle and L. W. Swanson, J. Vac. Sci. Techn. B 3, 220 (1985).
23. A. N. Broers, J. Appl. Phys. 38, 1991 (1967).
24. Vu Thien Binh, A. Piquet, R. Uzan, and M. Drechsler, Rev. Phys. Appl. 5, 645 (1970).
25. E. W. Muller, Z. Phys. 106, 132 (1937); A. P. Janssen and J. P. Jones, J. Phys. D: Appl. Phys. 4, 118 (1971); H. W. Fink, IBM J. Res. Develop. 30, 460 (1986).
26. L. W. Swanson and L. C. Crouser, in G. A. Somorjai (Ed.), “The Structure and Chemistry of Solid Surfaces,” p. 60-1, Inorganic Materials Research Division Series, John Wiley, New York (1969).
27. E. W. Plummer and R. D. Young, Phys. Rev. B 1, 2088 (1970).
28. L. W. Swanson and L. C. Crouser, Surf. Sci. 23, 1 (1970).
29. H. W. Fink, Phys. Scr. 38, 260 (1988).
30. Ch. Kleint and K. Mockel, Surface Sci. 40, 343 (1973).
31. D. W. Tuggle, J. Z. Li, and L. W. Swanson, J. Microsc. 140, 293 (1985).
32. I. L. Sokolovskaia, J. Tech. Phys. (URSS) 26, 1177 (1956); P. Bettler and C. Charbonnier, Phys. Rev. 119, 85 (1960). These authors were the first to apply an electric field to obtain build-up tips.
33. (a) J. J. Saenz, N. Garcia, Vu Thien Binh, and H. De Raedt, “Scanning Tunneling Microscopy and Related Methods,” NATO-ASI Series E: Appl. Sci., Vol. 184, p. 409, R. J. Behm, N. Garcia, and H. Rohrer (Eds.), Kluwer, Dordrecht (1990); (b) H. De Raedt and K. Michielsen, in “Nanosources and Manipulations of Atoms Under High Fields and Temperatures: Applications,” NATO-ASI Series E: Applied Sciences, Vol. 235, p. 45, Vu Thien Binh, N. Garcia, and K. Dransfeld (Eds.), Kluwer, Dordrecht (1993).
34. D. Atlan, G. Gardet, Vu Thien Binh, N. Garcia, and J. J. Saenz, Ultramicroscopy 42-44, 154 (1992).
35. Review papers on surface diffusion under different driving forces can be found in Vu Thien Binh (Ed.), “Surface Mobilities on Solid Materials, Fundamental Concepts and Applications,” NATO-ASI Series B: Physics, Vol. 86, Plenum Press, New York (1983).
36. L. W. Swanson and L. C. Crouser, J. Appl. Phys. 40, 4741 (1969); L. H. Veneklaasen and B. M. Siegel, J. Appl. Phys. 43, 1600 (1972).
37. G. I. Taylor, Proc. Roy. Soc. Lond. A 280, 383 (1964).
38. R. Smoluchowski, Phys. Rev. 60, 661 (1941).
39. P. Bettler and C. Charbonnier, Phys. Rev. 119, 85 (1960).
40. G. Neumann and G. M. Neumann, in “Surface Self Diffusion of Metals,” Diffusion Monograph Series, No. 1, F. H. Wohlbier (Ed.) (1972) USA; J. G. Dash, Contemp. Phys. 30, 89 (1989).
41. Vu Thien Binh, S. T. Purcell, N. Garcia, and J. Doglioni, Phys. Rev. Lett. 69, 2527 (1992).
42. C. E. Kuyatt and E. W. Plummer, Rev. Sci. Instrum. 43, 108 (1972).
43. P. Serena, L. Escapa, J. J. Saenz, N. Garcia, and H. Rohrer, J. Microscop. 152, 43 (1988).
44. H. De Raedt, Comp. Phys. Rep. 7, 1 (1987).
45. N. Garcia, J. J. Saenz, and H. De Raedt, J. Phys.: Condens. Matter 1, 9931 (1989).
46. C. B. Duke and M. E. Alferieff, J. Chem. Phys. 46, 923 (1967).
47. J. W. Gadzuk, Phys. Rev. B 1, 2110 (1970).
48. J. W. Gadzuk and E. W. Plummer, Rev. Mod. Phys. 45, 487 (1973).
49. See, for example, C. Kittel, “Introduction to Solid State Physics,” Wiley, New York (1968).
50. (a) F. Gautier, H. Ness, and D. Stoeffler, Ultramicroscopy 42-44, 91 (1992); (b) H. Ness, Thesis, Université Louis Pasteur, Strasbourg, France (1995); (c) H. Ness and F. Gautier, J. Phys.: Condens. Matter 7, 6625 (1995).
51. C. J. Chen, Phys. Rev. Lett. 65, 448 (1990); 69, 1656 (1992).
52. N. Garcia, Vu Thien Binh, and S. T. Purcell, Surface Sci. Lett. 293, L884 (1993).
53. P. H. Cutler, J. He, N. M. Miskovsky, T. E. Sullivan, and B. Weiss, J. Vac. Sci. Technol. B 11(2), 387 (1992).
54. (a) Vu Thien Binh, N. Garcia, S. T. Purcell, and V. Semet, in “Nanosources and Manipulations of Atoms Under High Fields and Temperatures: Applications,” NATO-ASI Series E: Applied Sciences, Vol. 235, p. 59, Vu Thien Binh, N. Garcia, and K. Dransfeld (Eds.), Kluwer, Dordrecht (1993); (b) S. T. Purcell, Vu Thien Binh, and N. Garcia, Appl. Phys. Lett. 67, 436 (1995).
55. R. D. Young and C. E. Kuyatt, Rev. Sci. Instrum. 39, 1477 (1968).
56. W. B. Nottingham, Phys. Rev. 59, 908 (1941).
57. L. W. Swanson, L. C. Crouser, and F. M. Charbonnier, Phys. Rev. 151, 327 (1966); for a review on energy exchange during FE, see ref. pa].
58. Vu Thien Binh, S. T. Purcell, G. Gardet, and N. Garcia, Surface Sci. 279, L197 (1992).
59. J. R. Chen and R. Gomer, Surface Sci. 79, 413 (1979).
60. S. T. Purcell, Vu Thien Binh, N. Garcia, M. E. Lin, R. P. Andres, and R. Reifenberger, Phys. Rev. B 15, 17259 (1994).
61. Vu Thien Binh, V. Semet, and N. Garcia, Appl. Phys. Lett. 65, 2493 (1994).
62. Vu Thien Binh, V. Semet, and N. Garcia, Ultramicroscopy (1995), in press; Vu Thien Binh, N. Garcia, and V. Semet, Phil. Trans. R. Soc. Lond. A 350 (1995), in press.
63. G. A. Morton and E. G. Ramberg, Phys. Rev. 56, 705 (1939).
64. E. W. Muller, 15th Field Emission Symposium, Bonn (1968).
65. For a review of STM techniques, see C. Julian Chen, “Introduction to Scanning Tunneling Microscopy,” Oxford Series in Optical and Imaging Sciences, Oxford Univ. Press, New York (1993).
66. H. W. Fink, W. Stocker, and H. Schmid, Phys. Rev. Lett. 65, 1204 (1990); J. Vac. Sci. Technol. B 8, 1323 (1990).
67. J. C. H. Spence, W. Qian, and A. J. Melmed, Ultramicroscopy 52, 473 (1993).
68. R. D. Young, Rev. Sci. Instrum. 37, 275 (1966).
69. N. Garcia and H. Rohrer, J. Phys.: Condens. Matter 1, 3737 (1989).
70. D. Gabor, Nature 161, 777 (1948); P. W. Hawkes and E. Kasper, “Principles of Electron Optics,” Vol. 3, Wave Optics, Academic Press, London (1994).
71. Joseph W. Goodman, in “Introduction to Fourier Optics,” McGraw-Hill Physical and Quantum Electronics Series, McGraw-Hill, New York (1968).
72. H. W. Fink, H. Schmid, H. J. Kreuzer, and A. Wierzbicki, Phys. Rev. Lett. 67, 1543 (1991); H. J. Kreuzer, K. Nakamura, A. Wierzbicki, H. W. Fink, and H. Schmid, Ultramicroscopy 45, 381 (1992).
73. J. C. H. Spence and W. Qian, Phys. Rev. B 45, 10271 (1993).
74. G. M. Shedd, J. Vac. Sci. Technol. A 12, 2595 (1994).
75. Vu Thien Binh, L. Bitar, V. Semet, N. Garcia, and E. Taillandier, submitted.
76. See, for example, D. Voet and J. G. Voet, “Biochemistry,” Wiley, New York (1990).
77. H. F. Noller, Annu. Rev. Biochem. 53, 134 (1984); R. R. Gutell, B. Weiser, C. R. Woese, and H. F. Noller, Prog. Nucleic Acid Res. Mol. Biol. 32, 183 (1985).
78. W. R. Bauer, F. H. C. Crick, and J. H. White, Sci. Am. 243, 118 (1980).
79. A. Rose, Adv. Electronics 1, 131 (1948); “Vision: Human and Electronics,” Plenum Press, New York (1973); R. M. Glaser, “Introduction to Analytical Electron Microscopy,” p. 423, Plenum Press, New York (1979).
80. R. P. Feynman, R. B. Leighton, and M. Sands, “The Feynman Lectures in Physics,” Vol. II, Addison-Wesley, London (1964).
81. L. Reimer and J. Spruth, J. Microsc. Spectr. Electron. 3, 579 (1978).
82. A. V. Crewe, M. Isaacson, and D. Johnson, in “Proc. 28th Annual Meeting of EMSA,” Baton Rouge, La. (1970), p. 264, Claitor’s Publ. Div.
83. Vu Thien Binh and N. Garcia, Surface Sci. 320, L69 (1994).
84. S. Olaru and I. Iovitzu Popescu, Rev. Mod. Phys. 57, 339 (1985).
85. See N. F. Mott and H. S. W. Massey, “Theory of Atomic Collisions,” p. 210, Clarendon Press, Oxford (1965); J. Kessler, “Polarised Electrons,” Vol. 1, Springer-Verlag, Berlin (1976).
86. D. Pribat, Vu Thien Binh, and P. Legagneux, “Électrode de focalisation intégrée pour réseaux de microcathodes à effet de champ et procédé de fabrication,” Patent 9014287 (1990).
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 95
The Convex Feasibility Problem in Image Recovery
P. L. COMBETTES
Department of Electrical Engineering, City College and Graduate School, City University of New York, New York, NY 10031, USA
I. Introduction
   A. The Image Recovery Problem
   B. Optimal Solutions and Point Estimates
   C. Feasible Solutions and Set Theoretic Estimates
   D. The Convex Feasibility Problem
II. Mathematical Foundations
   A. General Notations
   B. Geometrical Properties of Sets
   C. Strong and Weak Topologies
   D. Convex Functionals
   E. Projections
   F. Nonlinear Operators
   G. Fejér-Monotone Sequences
   H. Convex Feasibility in a Product Space
III. Overview of Convex Set Theoretic Image Recovery
   A. Theoretical Framework
   B. Historical Developments
   C. Applications
   D. The Issue of Convexity
IV. Construction of Property Sets
   A. Generalities
   B. Sets Based on Intrinsic Properties of the Image
   C. Sets Based on Properties of the Imaging System
   D. Information Management
V. Solving the Convex Feasibility Problem
   A. Introduction
   B. The Limitations of the POCS Method
   C. Inconsistent Problems
   D. Projection Methods
   E. Extrapolated Method of Parallel Approximate Projections (EMOPAP)
   F. Extrapolated Method of Parallel Subgradient Projections (EMOPSP)
   G. Extrapolated Method of Parallel Nonexpansive Operators (EMOPNO)
   H. Toward Unification
   I. Practical Considerations for Digital Image Processing
VI. Numerical Examples
   A. Recovery with Inconsistent Constraints
   B. Deconvolution with Bounded Uncertainty
   C. Image Restoration with Bounded Noise
   D. Image Restoration via Subgradient Projections
VII. Summary
Appendix: Acronyms
References

Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.

I. INTRODUCTION
A. The Image Recovery Problem

Image recovery is a broad discipline that encompasses the large body of inverse problems in which an image h is to be inferred from the observation of data x consisting of signals physically or mathematically related to it. The importance of image recovery stems from the growing need for visual information in a wide spectrum of environmental, medical, military, industrial, and artistic fields. More specifically, we can mention scientific applications in astronomy, bioengineering, electron microscopy, interferometry, ultrasonic imaging, flow imaging, radiology, surveillance, nondestructive testing, seismology, and satellite imaging. General references on image recovery and its applications are [51, 153], [98], [156], and [159]. Image restoration and image reconstruction are the two main subbranches of image recovery. The term image restoration usually applies to the problem of estimating the original form h of a degraded image x. Hence, in image restoration the data consist of measurements taken directly on the image to be estimated, x being a blurred and noise-corrupted version of h. The blurring operation can be induced by the image transmission medium, e.g., the atmosphere in astronomy, or by the recording device, e.g., an out-of-focus or moving camera. On the other hand, image reconstruction refers to problems in which the data x are indirectly related to the form of the original image h. For example, the term reconstruction would apply to the problem of estimating an image given measurements of its line integrals in tomography or given partial diffraction data in extrapolation problems. Four basic elements are required to solve an image recovery problem:
1. A data formation model;
2. A priori information;
3. A recovery criterion;
4. A solution method.

The data formation model is essentially a model of the imaging system, i.e., a mathematical description of the relation between the original image h and the recorded data x. One of the most common data formation models in image restoration is
$$x = T(h) + u, \qquad (1.1)$$

where the operator T represents the blurring process and u an additive noise component. Within this generic model, various subcategories can be distinguished, according as T is linear or nonlinear, deterministic or stochastic, or according as the noise depends on T(h) or not, etc. Different models can also be considered to reflect situations in which the noise is multiplicative, or in which several noise sources are present, etc. The basic model (1.1) is also appropriate in a number of image reconstruction problems. For instance, T(h) will stand for a low-passed Fourier transform in band-limited extrapolation and a Radon transform in tomography. A data formation model is always accompanied by some a priori knowledge. Thus, in (1.1), information may be available to describe the original image h, the operator T, or the noise u. As emphasized in [170], a priori information is an essential ingredient in recovery problems, even if it is often exploited only partially. The recovery criterion defines the class of images that are acceptable as solutions to the problem. It is chosen by the user on grounds that may include experience, compatibility with the available a priori knowledge, personal convictions on the best way to solve the problem, and ease of implementation. The traditional approach has been to use a criterion of optimality, which usually leads to a single “best” solution. An alternative approach is to use a criterion of feasibility, in which consistency with all prior information and the data defines a set of equally acceptable solutions. This is the framework discussed in this survey. The solution method is a numerical algorithm that produces a solution to the recovery problem, i.e., an image that satisfies the recovery criterion. This computational aspect of image recovery is critical, as it restricts the choice of recovery criteria.
Indeed, a physically founded criterion may yield a numerical problem for which no solution technique is available, and it therefore cannot be adopted. A conceptual formulation of recovery problems in a Hilbert image space $\Xi$ is

$$\min_{a \in \Xi} \Phi(a) \quad \text{subject to constraints } (\Psi_i)_{i \in I}, \qquad (1.2)$$

where the functional $\Phi$ represents the cost to be minimized¹ and where the constraints $(\Psi_i)_{i \in I}$ arise from a priori knowledge and the observed data. A collection of property sets can be defined in $\Xi$ by

$$(\forall i \in I)\quad S_i = \{a \in \Xi \mid a \text{ satisfies } \Psi_i\}. \qquad (1.3)$$

¹ If a cost $\Phi$ is to be maximized, we shall simply minimize $-\Phi$.
The feasibility set for the problem is the class of all images that are consistent with all the constraints, that is,

$$S = \bigcap_{i \in I} S_i = \{a \in \Xi \mid (\forall i \in I)\ a \text{ satisfies } \Psi_i\}. \qquad (1.4)$$
Therefore, (1.2) takes the form

$$\min_{a \in \Xi} \Phi(a) \quad \text{subject to} \quad a \in \bigcap_{i \in I} S_i. \qquad (1.5)$$
This quite general constrained programming problem usually cannot be solved and must therefore be modified. Modification can be made in two directions: in the conventional image recovery framework, one seeks to preserve the notion of an optimal solution, whereas in the set theoretic framework the emphasis is placed on feasibility.
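To make the data formation model (1.1) and the feasibility set (1.4) concrete, here is a minimal sketch in Python. Everything specific in it is an assumption chosen for illustration, not taken from the text: a one-dimensional "image" h, a circular three-tap moving average standing in for the blurring operator T, uniformly distributed noise u with a known bound δ = ‖u‖, and just two property sets (nonnegativity and the residual bound ‖x − T(a)‖ ≤ δ). Each property set is encoded as a membership test "a satisfies Ψ_i", and the feasibility set is their intersection.

```python
import math
import random

def blur(a):
    """Hypothetical blurring operator T: circular three-tap moving average (linear)."""
    n = len(a)
    return [(a[(k - 1) % n] + a[k] + a[(k + 1) % n]) / 3.0 for k in range(n)]

def norm(v):
    """Euclidean norm, standing in for the Hilbert-space norm."""
    return math.sqrt(sum(t * t for t in v))

# Data formation model (1.1): x = T(h) + u, with bounded noise u.
random.seed(0)
h = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0, 0.0]      # assumed original image
u = [random.uniform(-0.1, 0.1) for _ in h]         # assumed noise realization
x = [th + uk for th, uk in zip(blur(h), u)]        # observed data
delta = norm(u)                                    # noise bound, assumed known

def residual(a):
    """Norm of the data misfit x - T(a)."""
    return norm([xk - tk for xk, tk in zip(x, blur(a))])

# Property sets (1.3), each encoded as a test "a satisfies Psi_i".
property_tests = [
    lambda a: all(t >= 0.0 for t in a),            # S_1: nonnegative images
    lambda a: residual(a) <= delta + 1e-12,        # S_2: consistency with the data x
]

def is_feasible(a):
    """Membership in the feasibility set S of (1.4): a lies in every S_i."""
    return all(test(a) for test in property_tests)

print(is_feasible(h))           # True: the original image satisfies every constraint
print(is_feasible([0.0] * 8))   # False: the zero image is inconsistent with x
```

The point of the sketch is the shape of the criterion: no cost is minimized, and any image passing every test, h included, is an equally acceptable solution.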
B. Optimal Solutions and Point Estimates

In most engineering problems the criterion of optimality with respect to a unimodal cost function $\Phi$ has been used to define unique solutions. The systematic quest for optimal solutions, which is now well rooted in the scientific culture, originated in the late 1940s. It has been fueled to a large extent by the conjunction of technological advances in computing machinery and progress in branches of applied mathematics such as optimization theory, numerical analysis, and statistics. Naturally, optimal estimators have also ruled in image recovery, and there is no shortage of definitions of optimality. Thus, researchers have proposed criteria such as minimum cross-entropy [23], regularized least-squares residual [53], maximum likelihood [70, 104, 142], least-squares error [5], maximum a posteriori [166, 167], and other Bayesian techniques [71, 87, 90], maximum entropy [70, 119, 177], and maximum power [168]. Optimal procedures have undoubtedly provided satisfactory solutions in numerous applications. However, certain reservations can be formulated vis-à-vis such approaches. First, the criterion of optimality is inherently subjective, and different criteria may yield different solutions. Thus, some will argue that a maximum likelihood estimate is desirable while others will discount it on account of its many pathologies. Others will argue that the Bayesian framework is better suited to incorporate a priori information. However, it requires a probabilistic model for the original image, a highly debatable issue. Moreover, not all a priori information can be easily described in probabilistic terms, and the resulting prior
distribution is usually too complex to yield a tractable minimization of the resulting conditional expectation. In fact, such pathologies exist for almost every type of estimation procedure and have given rise to many controversies [52, 63, 64, 80, 86, 186]. A second concern with optimal formulations is computational tractability, which requires that (1.5) be simplified by choosing a workable cost function $\Phi$ and getting rid of some, if not all, of the constraints $(\Psi_i)_{i \in I}$. For that reason, one tends to select $\Phi$ on grounds which are seldom related to rational and practical goals reflecting the specificities of the problem at hand. For instance, the least-squares error criterion, which usually yields tractable problems, has been used in countless recovery algorithms although its inadequacy in imaging sciences has long been recognized [7]. In addition, the necessity of ignoring constraints leads to solutions which violate known facts about the original image. In short, optimal procedures often amount in practice to finding an image which is optimal with respect to a standard cost function and likely to lie outside the feasibility set S.
C. Feasible Solutions and Set Theoretic Estimates

The set theoretic approach in estimation is governed by the notion of feasibility [38]. In other words, one recognizes the importance of the constraints in (1.5) and, at the same time, the inherent arbitrariness that surrounds the choice of a relevant cost function $\Phi$. As a result, the recovery problem is posed as a feasibility problem, namely,

$$\text{Find } a \in S = \bigcap_{i \in I} S_i. \qquad (1.6)$$
The restoration criterion thus defined is clear: any image which is consistent with all the information available about the problem and the data is acceptable. The solution to the problem is therefore the set S of feasible images. The main asset of the set theoretic approach is that it allows the incorporation of a broad range of statistical as well as nonstatistical information in the definition of a solution. In the engineering literature, this approach seems to have been first applied to systems theory as a nonstatistical way to incorporate uncertainty in modeling, analysis, estimation, and control problems [38]. In this context, the basic idea of an estimation scheme which yields a set based on available information, rather than a single point, can be traced back to [150]. To this day, image recovery remains the most active field of application of set theoretic estimation. This popularity can be explained by two main factors. First, image recovery problems are typically accompanied by a great deal of qualitative information about the original image that is not easily expressed in purely statistical terms, which is the only form that conventional estimation methods can exploit. The second factor is that in most cases a human observer will judge the quality of the recovered image. Since the human eye is not sensitive to standard mathematical goodness measures, the importance of an optimal recovery, in one sense or another, is significantly diminished.

Set theoretic image recovery departs radically from the conventional framework of Section I.B, in which the primary criterion of acceptability of a solution was its optimality with respect to some cost and where feasibility was of secondary importance. In this regard, a common criticism of the set theoretic approach is that it does not produce a unique solution. This criticism can be answered on several grounds. First, as we have just seen, although it may be gratifying to have obtained the “best” image, optimality claims often have little practical value. At best, if an optimal solution does land in the feasibility set, it can be regarded as a qualitative selection of a feasible solution. Moreover, from a philosophical standpoint, demanding that one, and only one, image be acceptable as a solution in problems which are notoriously affected by uncontrollable factors (e.g., noise, uncertain image formation models) may appear somewhat unwise. Finally, it should be noted that methods which yield unique solutions are usually iterative and their solution depends on a stopping rule. Since there is a whole collection of images that satisfy any given stopping rule, a set of solutions, not a single point, is thus implicitly defined. All in all, uniqueness of a solution is merely a conservative postulate in the tradition of a certain scientific culture, not a universal, philosophically correct, and rational requirement.
D. The Convex Feasibility Problem

So far, we have not put any restrictions on the set theoretic recovery problem (1.6). However, due to the lack of numerical methods for solving feasibility problems in their full generality, we must restrict ourselves to problems yielding closed and convex sets in the Hilbert space $\Xi$. In this case (1.6) is called a convex feasibility problem, and efficient techniques are available to solve it. Requiring convexity is certainly a limitation since, as will be seen in Section III.D, important constraints are not convex in the selected solution space. Fortunately, in many problems, convex constraints suffice to define meaningful feasibility sets. For instance, all linear and affine constraints, as well as linear inequality constraints, lead to convex sets. In addition, a large class of nonlinear constraints are of the convex type. A convex set theoretic image recovery problem involves three steps:
1. Selecting a Hilbertian solution space $\Xi$;
2. Selecting the constraints that yield closed and convex property sets $(S_i)_{i \in I}$ in $\Xi$ and constructing these sets;
3. Solving the convex feasibility problem (1.6).

The selection of a solution space is discussed in Section III, where we provide a general overview of set theoretic image recovery. The construction of convex property sets from various properties of the image to be estimated and of the imaging system is then discussed in Section IV. Section V is devoted to the question of solving convex feasibility problems. Numerical simulations are presented in Section VI to illustrate various theoretical and practical aspects of convex image recovery. The survey is concluded by a brief summary in Section VII. For the convenience of the reader, we have listed some frequently used acronyms in the Appendix. We shall now start with a review of the necessary mathematical background.
II. MATHEMATICAL FOUNDATIONS

We review here the essential elements of analysis that constitute the mathematical foundation of convex set theoretic image recovery. Notations and definitions used throughout the survey are also introduced. Complements and background on general mathematical analysis will be found in [57]. More specialized references are: on weak convergence, [16] and [189]; on convex analysis, [8], [65], [111], and [190]; on projections, [8], [16], [17], and [187]; on nonlinear operators, [75], [111], and [188].

A. General Notations
$\mathbb{C}$ is the set of complex numbers, $\mathbb{R}$ the set of reals, $\mathbb{R}_+$ the set of nonnegative reals, $\mathbb{R}_+^*$ the set of positive reals, $\mathbb{Z}$ the set of integers, $\mathbb{N}$ the set of nonnegative integers, and $\mathbb{N}^*$ the set of positive integers. The complex conjugate of $z \in \mathbb{C}$ is denoted by $\bar{z}$. The family of all subsets of a set S is denoted by $\mathfrak{P}(S)$. Moreover, the cardinality of S is denoted by card S, its complement by $\complement S$, and its indicator function by $1_S$, i.e.,

$$1_S : a \mapsto \begin{cases} 1, & \text{if } a \in S;\\ 0, & \text{if } a \in \complement S. \end{cases} \qquad (2.1)$$
$\Xi$ is a real Hilbert space with scalar product $(\cdot \mid \cdot)$. Its norm is given by $(\forall a \in \Xi)\ \|a\| = \sqrt{(a \mid a)}$ and its distance by $(\forall (a, b) \in \Xi^2)\ d(a, b) = \|a - b\|$. The dimension of $\Xi$ is denoted by dim $\Xi$, the zero vector in $\Xi$ by 0, and the identity operator on $\Xi$ by Id. The boundary of a set S is denoted by $\partial S$. If $S \subset \Xi$ is an affine subspace, the vector space $S^\perp$ is its orthogonal complement. Finally, ${}^t\!M$ denotes the transpose of a matrix M.

B. Geometrical Properties of Sets
A vector subspace is any nonempty subset S of $\Xi$ such that

$$(\forall \alpha \in \mathbb{R})(\forall (a, b) \in S^2)\quad \alpha a + b \in S, \qquad (2.2)$$
and an affine subspace is any set $S = \{a + b \mid a \in V\}$, where V is a vector subspace and $b \in \Xi$. Now let b be a nonzero vector in $\Xi$ and $(\eta, \kappa)$ a pair of real numbers. The set

$$H = \{a \in \Xi \mid (a \mid b) = \kappa\} \qquad (2.3)$$

is a (closed) affine hyperplane, the set

$$\{a \in \Xi \mid (a \mid b) \le \kappa\} \qquad (2.4)$$

a closed affine half-space, and the set

$$\{a \in \Xi \mid \eta \le (a \mid b) \le \kappa\} \qquad (2.5)$$

a closed affine hyperslab. The closed ball of center $r \in \Xi$ and radius $\gamma \in \mathbb{R}_+^*$ is defined as

$$B(r, \gamma) = \{a \in \Xi \mid \|a - r\| \le \gamma\}. \qquad (2.6)$$
Let $f : \mathbb{R}_+ \to \mathbb{R}_+$ be a nondecreasing function that vanishes only at 0. Then S is f-uniformly convex if

$$(\forall (a, b) \in S^2)\quad B\big((a + b)/2,\ f(\|a - b\|)\big) \subset S, \qquad (2.7)$$

which implies that it is bounded, unless $S = \Xi$. All of the above sets are convex, that is,

$$(\forall \alpha \in [0, 1])(\forall (a, b) \in S^2)\quad \alpha a + (1 - \alpha) b \in S. \qquad (2.8)$$
The convex hull of a set S is the smallest convex set containing S. S is called a cone (of vertex 0) if

$$(\forall \alpha \in \mathbb{R}_+^*)(\forall a \in S)\quad \alpha a \in S. \qquad (2.9)$$
A cone S is convex if and only if

$$(\forall (a, b) \in S^2)\quad a + b \in S. \qquad (2.10)$$
One will often have to show that a set is convex. The following proposition gives sufficient conditions for convexity.
Proposition 2.1 [16]. A subset S of $\Xi$ is convex if any of the following conditions holds.
(i) S is an arbitrary intersection of convex sets.
(ii) $S = \{a + b \mid (a, b) \in C_1 \times C_2\}$, where $C_1$ and $C_2$ are convex.
(iii) There exists a convex subset C of a vector space $\Xi'$ and
either a linear operator $T : \Xi \to \Xi'$ such that $S = T^{-1}(C) = \{a \in \Xi \mid T(a) \in C\}$;
or a linear operator $T : \Xi' \to \Xi$ such that $S = T(C) = \{a \in \Xi \mid (\exists a' \in C)\ a = T(a')\}$.
A special case of interest is $\Xi' = \mathbb{R}$, where it is known that the intervals are the only convex sets.
(iv) There exists a convex functional $g : \Xi \to \mathbb{R}$ [i.e., (2.12) holds] and a real number η such that either $S = g^{-1}(]-\infty, \eta])$ or $S = g^{-1}(]-\infty, \eta[)$.

C. Strong and Weak Topologies
A sequence $(a_n)_{n \ge 0} \subset \Xi$ converges to $a \in \Xi$ strongly if $(\|a_n - a\|)_{n \ge 0}$ converges to 0 and weakly if $((a_n - a \mid b))_{n \ge 0}$ converges to 0 for every b in $\Xi$. We shall use the notations $a_n \to a$ and $a_n \rightharpoonup a$ to designate respectively the strong and weak convergence of $(a_n)_{n \ge 0}$ to a. Let S be a subset of $\Xi$. Then S is (strongly) closed if for every sequence $(a_n)_{n \ge 0} \subset S$ we have $a_n \to a \Rightarrow a \in S$. The closure of S is the smallest closed set $\overline{S}$ containing S. S is open if $\complement S$ is closed. The interior of S is the largest open set contained in S. The following proposition gives sufficient conditions for closedness.
Proposition 2.2 [57]. A subset S of $\Xi$ is closed if any of the following conditions holds.
(i) S is a finite union or an arbitrary intersection of closed sets.
(ii) There exists a continuous functional $g : \Xi \to \mathbb{R}$ and a closed set $C \subset \mathbb{R}$ such that $S = g^{-1}(C)$.
(iii) There exists a lower semicontinuous functional $g : \Xi \to \mathbb{R}$ and a real number η such that $S = g^{-1}(]-\infty, \eta])$.
A point $a \in \Xi$ is a strong cluster point of $(a_n)_{n \ge 0}$ if there exists a subsequence of $(a_n)_{n \ge 0}$ converging strongly to a. $S \subset \Xi$ is compact if every sequence with elements in S admits at least one strong cluster point in S. Every compact set is closed and bounded. S is boundedly compact if its intersection with any closed ball is compact.
Proposition 2.3 [57]. A subset S of $\Xi$ is compact if any of the following conditions holds.
(i) S is closed and bounded and dim $\Xi < +\infty$.
(ii) S is a finite union or an arbitrary intersection of compact sets.
(iii) S is a closed subset of a compact set.
(iv) There exists a compact subset K of a (topological) space $\Xi'$ and a continuous operator $T : \Xi' \to \Xi$ such that $S = T(K)$.
(v) $S = \{a + b \mid (a, b) \in C_1 \times C_2\}$, where $C_1$ and $C_2$ are compact.
S is weakly closed if for every sequence $(a_n)_{n \ge 0} \subset S$ we have $a_n \rightharpoonup a \Rightarrow a \in S$. Every weakly closed set is closed and every closed and convex set is weakly closed. A point $a \in \Xi$ is called a weak cluster point of $(a_n)_{n \ge 0}$ if there exists a subsequence of $(a_n)_{n \ge 0}$ converging weakly to a.

Proposition 2.4. Take $(a_n)_{n \ge 0} \subset \Xi$ and $a \in \Xi$. Then the following statements hold.
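(i) If $a_n \rightharpoonup a$, then $(a_n)_{n \ge 0}$ is bounded and $\|a\| \le \liminf_{n \to +\infty} \|a_n\|$.
(ii) If $(a_n)_{n \ge 0}$ is bounded, then it possesses a weak cluster point.
(iii) If $(a_n)_{n \ge 0}$ is bounded and possesses a unique weak cluster point a, then $a_n \rightharpoonup a$.
(iv) If $a_n \rightharpoonup a$ and if $(b_n)_{n \ge 0} \subset \Xi$ satisfies $b_n \to b$, then $(\forall \alpha \in \mathbb{R})\ \alpha a_n + b_n \rightharpoonup \alpha a + b$.
(v) $a_n \to a \Rightarrow a_n \rightharpoonup a$.
(vi) If dim $\Xi < +\infty$, then $a_n \rightharpoonup a \Rightarrow a_n \to a$.
(vii) If $\|a_n\| \to \|a\|$, then $a_n \rightharpoonup a \Rightarrow a_n \to a$.
(viii) If $(a_n)_{n \ge 0} \subset S$, where S is boundedly compact, then $a_n \rightharpoonup a \Rightarrow a_n \to a$.
(ix) If $d(a_n, S) \to 0$, where S is closed and uniformly convex, and if $a_n \rightharpoonup a \in \partial S$, then $a_n \to a$.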
Proof. (i)–(vii): See [189]. (viii): According to (i), $(a_n)_{n \ge 0}$ lies in some closed ball B and therefore in the compact set $S \cap B$. Therefore, it possesses at least one strong cluster point b, say $a_{n_k} \to b$. Then, by (v), $a_{n_k} \rightharpoonup b$ and, since $a_{n_k} \rightharpoonup a$, we obtain a = b. Since $(a_n)_{n \ge 0}$ lies in a compact set and possesses a unique strong cluster point a, we conclude that $a_n \to a$ [57]. (ix): See [110].
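A standard example (ours, not the text's) shows why Proposition 2.4(vii) needs the norm condition: in $\ell^2$, the unit vectors $e_n$ converge weakly to 0, since $(e_n \mid b) = b_n \to 0$ for every square-summable b, yet $\|e_n - 0\| = 1$ for all n, so they do not converge strongly. The sketch below truncates $\ell^2$ to N coordinates (N is an arbitrary assumption) and tests a single vector b with $b_k = 1/k$; it therefore only illustrates the claim, since weak convergence must hold for every b.

```python
import math

N = 2_000  # truncation length of l^2, chosen arbitrarily for the illustration

def e(n):
    """n-th unit vector of the truncated space (1-based index)."""
    v = [0.0] * N
    v[n - 1] = 1.0
    return v

def inner(a, b):
    """Standard scalar product of the truncated space."""
    return sum(x * y for x, y in zip(a, b))

b = [1.0 / k for k in range(1, N + 1)]  # one fixed square-summable test vector

for n in (1, 10, 100, 1000):
    en = e(n)
    # (e_n | b) = 1/n shrinks toward 0, while ||e_n|| stays equal to 1.
    print(n, inner(en, b), math.sqrt(inner(en, en)))
```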
D. Convex Functionals

A functional on $\Xi$ is an operator $g : \Xi \to \mathbb{R}$.² Its sections are the sets

$$(\forall \eta \in \mathbb{R})\quad S_\eta = g^{-1}(]-\infty, \eta]) = \{a \in \Xi \mid g(a) \le \eta\}. \qquad (2.11)$$
The functional g is convex if

$$(\forall \alpha \in [0, 1])(\forall (a, b) \in \Xi^2)\quad g(\alpha a + (1 - \alpha) b) \le \alpha g(a) + (1 - \alpha) g(b). \qquad (2.12)$$
If g is convex, then its sections $(S_\eta)_{\eta \in \mathbb{R}}$ are convex sets. If the sections $(S_\eta)_{\eta \in \mathbb{R}}$ are closed, then g is lower semicontinuous (l.s.c.).

Proposition 2.5 [8, 65]. Let $g : \Xi \to \mathbb{R}$ be a convex functional. Then g is continuous if either of the following properties holds.
(i) dim $\Xi < +\infty$.
(ii) g is l.s.c.
In addition, in case (ii), g is also weak l.s.c. in the sense that

$$a_n \rightharpoonup a \;\Rightarrow\; g(a) \le \liminf_{n \to +\infty} g(a_n). \qquad (2.13)$$
As a corollary of (i) above, we obtain a useful sufficient condition for closedness and convexity of a set in Euclidean (finite dimensional real Hilbert) spaces.
Proposition 2.6. Let $g : \Xi \to \mathbb{R}$ be a convex functional and suppose that dim $\Xi < +\infty$. Then, for every $\eta \in \mathbb{R}$, the set $\{a \in \Xi \mid g(a) \le \eta\}$ is closed and convex.

We shall say that g is lower semiboundedly compact (l.s.b.co.) if for any closed ball B the sets $(S_\eta \cap B)_{\eta \in \mathbb{R}}$ are compact. Now assume that g is convex. The subdifferential of g at a is the set of its subgradients, that is,

$$\partial g(a) = \{t \in \Xi \mid (\forall b \in \Xi)\ (b - a \mid t) \le g(b) - g(a)\}. \qquad (2.14)$$
If g is continuous at a, then it is subdifferentiable at a, i.e., $\partial g(a) \neq \varnothing$. If g is Gâteaux differentiable at a, then there is a unique subgradient, $\nabla g(a)$, called the gradient: $\partial g(a) = \{\nabla g(a)\}$.
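As an illustration of (2.14), with an example of our own choosing, the functional $g(a) = \sum_k |a_k|$ on $\mathbb{R}^n$ is convex and continuous, hence subdifferentiable everywhere, but it is not Gâteaux differentiable at points having a zero coordinate. The componentwise sign vector (with 0 in the zero coordinates) is one of its subgradients. The sketch spot-checks the subgradient inequality at random points b; such a check can reveal an error but does not prove anything.

```python
import random

def g(a):
    """Convex functional g(a) = sum_k |a_k| (the l1 norm on R^n)."""
    return sum(abs(t) for t in a)

def subgradient(a):
    """One element of the subdifferential of g at a: sign(a_k), taking 0 where a_k == 0."""
    return [1.0 if t > 0 else -1.0 if t < 0 else 0.0 for t in a]

def inner(a, b):
    """Standard scalar product on R^n."""
    return sum(x * y for x, y in zip(a, b))

random.seed(1)
a = [0.5, 0.0, -2.0]          # g is not differentiable here (second coordinate is 0)
t = subgradient(a)
violations = 0
for _ in range(1000):
    b = [random.uniform(-3.0, 3.0) for _ in a]
    # Subgradient inequality (2.14): (b - a | t) <= g(b) - g(a)
    if inner([bk - ak for bk, ak in zip(b, a)], t) > g(b) - g(a) + 1e-12:
        violations += 1

print(violations)  # 0: the sign vector satisfies (2.14) at every sampled b
```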
² As a reminder, this notation means that the domain of g is $\Xi$.

E. Projections

Throughout this section, S denotes a nonempty subset of $\Xi$.
1. Distance to a Set

The distance to S is the function $d(\cdot, S)$ defined as

$$(\forall a \in \Xi)\quad d(a, S) = \inf\{d(a, b) \mid b \in S\}. \qquad (2.15)$$
Theorem 2.1 [8, 187]. Suppose that S is closed and convex. Then the functional $d(\cdot, S)^2 : \Xi \to \mathbb{R}_+$ is continuous, convex, and Fréchet differentiable. We have

$$(\forall a \in \Xi)\quad \nabla d(a, S)^2 = 2\big(a - P_S(a)\big) \qquad (2.16)$$
and
(2.17)

2. Projection Operators

The projection operator onto S is the set-valued map
$$\Pi_S : \Xi \to \mathfrak{P}(S) : a \mapsto \{b \in S \mid d(a, b) = d(a, S)\}. \qquad (2.18)$$
In general, $0 \le \operatorname{card} \Pi_S(a) \le +\infty$. S is proximinal if $(\forall a \in \Xi)\ \Pi_S(a) \neq \varnothing$, i.e., every point admits at least one projection onto S, and it is a Chebyshev set if $(\forall a \in \Xi)\ \operatorname{card} \Pi_S(a) = 1$, i.e., every point admits one and only one projection onto S. In the standard Euclidean space, such properties were systematically investigated by Bouligand [15], who called points with more than one projection onto a nonempty closed set the multifurcation points of that set. Erdős later showed that the set of multifurcation points of a nonempty closed set of the Euclidean space has Lebesgue measure zero [66]. The set S is approximately compact if, for every a in $\Xi$, every sequence $(b_n)_{n \ge 0} \subset S$ such that $d(a, b_n) \to d(a, S)$ possesses a strong cluster point in S.

Proposition 2.7 [17, 49]. Each property in the following list implies the next.
(i) S is compact.
(ii) S is boundedly compact.
(iii) S is approximately compact.
(iv) S is proximinal.
(v) S is closed.

In addition, if dim $\Xi < +\infty$, properties (ii) through (v) are equivalent.
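Theorem 2.2 [8, 16]. Suppose that S is closed and convex. Then it is a Chebyshev set: for every $a \in \Xi$ there exists a unique point $P_S(a) \in S$, called the projection of a onto S, such that $d(a, P_S(a)) = d(a, S)$. The projection operator $P_S$ is characterized by the variational inequality

$$(\forall a \in \Xi)(\forall b \in S)\quad (a - P_S(a) \mid b - P_S(a)) \le 0, \qquad (2.19)$$

which becomes

$$(\forall a \in \Xi)(\forall b \in S)\quad (a - P_S(a) \mid b) \le 0 \quad \text{and} \quad (a - P_S(a) \mid P_S(a)) = 0 \qquad (2.20)$$

if S is a cone, and

$$(\forall a \in \Xi)(\forall b \in S)\quad (a - P_S(a) \mid b - P_S(a)) = 0 \quad \text{or} \quad (a - P_S(a) \mid b) = 0 \qquad (2.21)$$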
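according as S is an affine or a vector subspace.

In Euclidean spaces, the class of Chebyshev sets coincides with the class of nonempty closed and convex sets [94]. However, in infinite-dimensional Hilbert spaces, whether every Chebyshev set must be convex is still an open question. A partial answer is that in incomplete pre-Hilbert spaces Chebyshev sets may not be convex [95, 96]. Writing $H^-$ for the half-space (2.4) and $\widetilde{H}$ for the hyperslab (2.5), the projection operators onto the closed and convex sets (2.3)–(2.6) are given, respectively, by

$$(\forall a \in \Xi)\quad P_H(a) = a + \frac{\kappa - (a \mid b)}{\|b\|^2}\, b, \qquad (2.22)$$

$$(\forall a \in \Xi)\quad P_{H^-}(a) = \begin{cases} a + \dfrac{\kappa - (a \mid b)}{\|b\|^2}\, b, & \text{if } (a \mid b) > \kappa;\\[1ex] a, & \text{if } (a \mid b) \le \kappa, \end{cases} \qquad (2.23)$$

$$(\forall a \in \Xi)\quad P_{\widetilde{H}}(a) = \begin{cases} a + \dfrac{\eta - (a \mid b)}{\|b\|^2}\, b, & \text{if } (a \mid b) < \eta;\\[1ex] a, & \text{if } \eta \le (a \mid b) \le \kappa;\\[1ex] a + \dfrac{\kappa - (a \mid b)}{\|b\|^2}\, b, & \text{if } (a \mid b) > \kappa, \end{cases} \qquad (2.24)$$

$$(\forall a \in \Xi)\quad P_{B(r, \gamma)}(a) = \begin{cases} r + \gamma\, \dfrac{a - r}{\|a - r\|}, & \text{if } \|a - r\| > \gamma;\\[1ex] a, & \text{if } \|a - r\| \le \gamma. \end{cases} \qquad (2.25)$$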
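The closed-form projections onto a hyperplane, a half-space, a hyperslab, and a closed ball can be sketched in a few lines. This is a minimal sketch assuming $\Xi = \mathbb{R}^n$ with the standard scalar product and vectors stored as Python lists; the function names are ours, and the hyperslab routine assumes $\eta \le \kappa$.

```python
import math

def inner(a, b):
    """Standard scalar product on R^n."""
    return sum(x * y for x, y in zip(a, b))

def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * xk + yk for xk, yk in zip(x, y)]

def proj_hyperplane(a, b, kappa):
    """Hyperplane {(a|b) = kappa}: a + (kappa - (a|b)) / ||b||^2 * b."""
    return axpy((kappa - inner(a, b)) / inner(b, b), b, a)

def proj_halfspace(a, b, kappa):
    """Half-space {(a|b) <= kappa}: move only when the constraint is violated."""
    return proj_hyperplane(a, b, kappa) if inner(a, b) > kappa else list(a)

def proj_hyperslab(a, b, eta, kappa):
    """Hyperslab {eta <= (a|b) <= kappa}, assuming eta <= kappa."""
    s = inner(a, b)
    if s < eta:
        return proj_hyperplane(a, b, eta)
    if s > kappa:
        return proj_hyperplane(a, b, kappa)
    return list(a)

def proj_ball(a, r, gamma):
    """Closed ball B(r, gamma): radial shrink toward r when a lies outside."""
    d = [ak - rk for ak, rk in zip(a, r)]
    dist = math.sqrt(inner(d, d))
    return axpy(gamma / dist, d, r) if dist > gamma else list(a)

a, b = [3.0, 4.0], [1.0, 1.0]
print(proj_halfspace(a, b, 1.0))       # [0.0, 1.0]
print(proj_ball(a, [0.0, 0.0], 1.0))   # approximately [0.6, 0.8]
```

Each operator only moves points that violate its constraint, which is why projections onto these sets are cheap building blocks for the iterative methods discussed later in the survey.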
3. Relaxed Convex Projections

Let $\lambda \in [0, 2]$ and suppose that S is closed and convex. The relaxed operator of projection onto S is defined as

$$(\forall a \in \Xi)\quad T_S^\lambda(a) = a + \lambda\big(P_S(a) - a\big). \qquad (2.26)$$

For $0 \le \lambda < 1$, $T_S^\lambda(a)$ is an underrelaxed projection, or underprojection; for $\lambda = 1$, $T_S^\lambda(a)$ is an unrelaxed projection, or projection; for $1 < \lambda \le 2$, $T_S^\lambda(a)$ is an overrelaxed projection, or overprojection; for $\lambda = 2$, $T_S^2(a)$ is the reflection of a with respect to S and is denoted by $R_S(a)$ (see Fig. 1).
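The relaxation scheme (2.26) can be sketched directly. In this minimal sketch the closed convex set S is assumed, for illustration only, to be the closed unit ball of $\mathbb{R}^2$; any other projection function could be passed in its place. With a point outside S, λ = 1 returns the projection and λ = 2 the reflection $R_S$.

```python
import math

def proj_ball(a, r, gamma):
    """Projection onto the closed ball B(r, gamma) of R^n, as in (2.25)."""
    d = [ak - rk for ak, rk in zip(a, r)]
    dist = math.sqrt(sum(t * t for t in d))
    if dist <= gamma:
        return list(a)
    return [rk + gamma * dk / dist for rk, dk in zip(r, d)]

def relaxed_proj(a, lam, proj):
    """T_S^lambda(a) = a + lambda * (P_S(a) - a), with lambda in [0, 2], as in (2.26)."""
    if not 0.0 <= lam <= 2.0:
        raise ValueError("relaxation parameter must lie in [0, 2]")
    p = proj(a)
    return [ak + lam * (pk - ak) for ak, pk in zip(a, p)]

P = lambda a: proj_ball(a, [0.0, 0.0], 1.0)   # S = closed unit ball (assumed)
a = [3.0, 0.0]
for lam in (0.5, 1.0, 1.5, 2.0):
    print(lam, relaxed_proj(a, lam, P))
# lam = 1 gives [1.0, 0.0] (projection); lam = 2 gives [-1.0, 0.0] (reflection R_S)
```

Points already in S are left unchanged for every λ, since $P_S(a) = a$ there.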
FIGURE 1. Relaxed projection.

F. Nonlinear Operators

Let $T : \Xi \to \Xi$ be an operator. The set of fixed points of T is

$$\operatorname{Fix} T = \{a \in \Xi \mid T(a) = a\}. \qquad (2.27)$$

T is contractive if

$$(\exists k \in\, ]0, 1[)(\forall (a, b) \in \Xi^2)\quad \|T(a) - T(b)\| \le k\|a - b\|, \qquad (2.28)$$

nonexpansive if

$$(\forall (a, b) \in \Xi^2)\quad \|T(a) - T(b)\| \le \|a - b\|, \qquad (2.29)$$
and firmly nonexpansive if

$$(\forall (a, b) \in \Xi^2)\quad \|T(a) - T(b)\|^2 \le (a - b \mid T(a) - T(b)), \qquad (2.30)$$

or, equivalently, if

$$(\forall (a, b) \in \Xi^2)\quad \|T(a) - T(b)\|^2 \le \|a - b\|^2 - \|(\mathrm{Id} - T)(a) - (\mathrm{Id} - T)(b)\|^2. \qquad (2.31)$$
T is demiclosed if for any sequence $(a_n)_{n \ge 0}$ such that $a_n \rightharpoonup a$ and $T(a_n) \to b$, we have $T(a) = b$. T is demicompact if any bounded sequence $(a_n)_{n \ge 0}$ admits a strong cluster point whenever the sequence $(T(a_n) - a_n)_{n \ge 0}$ converges strongly.
Theorem 2.3 [188]. If T is contractive, it admits one and only one fixed point. Now let C be a nonempty, closed, bounded, and convex subset of $\Xi$ and suppose that $T : C \to C$ is nonexpansive. Then Fix T is nonempty, closed, and convex.
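The first statement of Theorem 2.3 is the Banach contraction principle, whose usual constructive proof iterates $a_{n+1} = T(a_n)$. A minimal sketch, assuming the hypothetical affine contraction $T(a) = a/2 + c$ on $\mathbb{R}^2$ (contraction constant k = 1/2), whose unique fixed point is 2c; the stopping tolerance is an arbitrary choice.

```python
def T(a, c=(1.0, -2.0)):
    """Hypothetical contraction on R^2 with constant k = 1/2: T(a) = a/2 + c."""
    return [0.5 * ak + ck for ak, ck in zip(a, c)]

def fixed_point(op, a0, tol=1e-12, max_iter=1000):
    """Picard iteration a_{n+1} = op(a_n), stopped when successive iterates are close."""
    a = list(a0)
    for _ in range(max_iter):
        b = op(a)
        if max(abs(bk - ak) for ak, bk in zip(a, b)) < tol:
            return b
        a = b
    return a

print(fixed_point(T, [100.0, 100.0]))   # close to [2.0, -4.0], i.e. 2c
```

Because the error is multiplied by k at each step, the iteration converges from any starting point; for a merely nonexpansive T (second statement of the theorem) this simple iteration need not converge, which is one motivation for the relaxed and averaged schemes discussed in the survey.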
Proposition 2.8. Consider the properties:
(a) T is the operator of projection onto a nonempty closed and convex subset F of $\Xi$.
(b) T is firmly nonexpansive.
(c) T is nonexpansive.
(d) Id − T is demiclosed.

Then:
(i) (a) ⇒ (b) ⇒ (c) ⇒ (d).
(ii) Suppose that $F = \operatorname{Fix} T \neq \varnothing$. Then (b) implies

$$(\forall a \in \Xi)\quad (T(a) - a \mid a - P_F(a)) \le -\|a - T(a)\|^2. \qquad (2.32)$$

In addition, (b) ⇒ (a) if and only if $(\forall a \in \Xi)\ T(a) \in F$.
(iii) Suppose that $F = \operatorname{Fix} T \neq \varnothing$. Then (c) implies

$$(\forall a \in \Xi)\quad T(a) \in B\big(P_F(a), \|a - P_F(a)\|\big). \qquad (2.33)$$

(iv) (b) holds if and only if $T = (T' + \mathrm{Id})/2$, where $T' : \Xi \to \Xi$ is nonexpansive.
Proof. (i): (a) ⇒ (b) follows from (2.19) (e.g., [187]), (b) ⇒ (c) follows directly from (2.31), and (c) ⇒ (d) is proved in [19]. (ii): Take any $a \in \Xi$ and let $b = P_F(a)$. Then $T(b) = b$ and (2.30) gives

$$(a - P_F(a) \mid T(a) - P_F(a)) \ge \|T(a) - P_F(a)\|^2, \qquad (2.34)$$
so that we obtain (a - T ( a ) 1 T(a) - P&)) PF(a))I-(la - T ( u ) ( (Therefore ~.
2
0. Hence, (T(a) - a I a -
(2.35) (2.36) which proves (2.32). To prove the second assertion, note that necessity is obvious. As to sufficiency, take any a E E, suppose that T(a) E F, and put b = T(a)in (2.19). Then we get (a - P,(a) I T(a) - PF(a))5 0 which, in view of (2.34), implies T ( a ) = PF(a).(iii): Take any a E E. Then (IT(a)- PF(a)ll = I(T(a)- T(P,(a))(l I[la - PF(a)ll.(iv): see [141] or [187]. Proposition 2.9. Let S be a nonempty, closed, and convex subset of Z. Then for any A E [O, 21 the relaxed projection operator Tk = Id + A(P, - I d ) is nonexpansive.
Proof. Let $\alpha = \lambda/2 \in [0, 1]$. Then $T_S^{\lambda} = (1 - \alpha)\mathrm{Id} + \alpha(2P_S - \mathrm{Id})$. According to Proposition 2.8(i) and (iv), $R_S = 2P_S - \mathrm{Id}$ is nonexpansive. Therefore $T_S^{\lambda}$ is nonexpansive, as a convex combination of the two nonexpansive operators $\mathrm{Id}$ and $R_S$.
Proposition 2.10. Let $P_S$ be the operator of projection onto a nonempty, boundedly compact, and convex subset $S$ of $\Xi$. Then $P_S$ is demicompact.
Proof. Let $(a_n)_{n\geq 0}$ be a bounded sequence. Then, thanks to Proposition 2.4(ii), it admits a weak cluster point $a$, say $a_{n_k} \rightharpoonup a$. Now suppose $P_S(a_n) - a_n \to a' \in \Xi$. Then $P_S(a_{n_k}) - a_{n_k} \to a'$ and, thanks to Proposition 2.4(iv), $P_S(a_{n_k}) \rightharpoonup a + a'$. But $(P_S(a_{n_k}))_{k\geq 0} \subset S$. Therefore Proposition 2.4(viii) implies that $P_S(a_{n_k}) \to a + a'$ and, since $P_S(a_{n_k}) - a_{n_k} \to a'$, it follows that $a_{n_k} \to a$. In words, $(a_n)_{n\geq 0}$ admits a strong cluster point.
G. Fejér-Monotone Sequences
Let $S$ be a nonempty, closed, and convex subset of $\Xi$. A sequence $(a_n)_{n\geq 0}$ is Fejér-monotone with respect to $S$ if
$(\forall n \in \mathbb{N})(\forall a \in S)\ \|a_{n+1} - a\| \leq \|a_n - a\|$. (2.37)
Proposition 2.11 [11, 19]. Suppose that $(a_n)_{n\geq 0}$ is Fejér-monotone with respect to $S$. Then the following properties hold.
THE CONVEX FEASIBILITY PROBLEM
171
(i) $(a_n)_{n\geq 0}$ is bounded and admits at least one weak cluster point.
(ii) If all the weak cluster points of $(a_n)_{n\geq 0}$ lie in $S$, then $(\exists a \in S)\ a_n \rightharpoonup a$.
(iii) If $(a_n)_{n\geq 0}$ admits a strong cluster point $a$ in $S$, then $a_n \to a$.
(iv) If $\operatorname{int} S \neq \emptyset$, then $(a_n)_{n\geq 0}$ converges strongly.
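Fejér monotonicity can be observed on the orbit of a periodic projection method. Each projection is nonexpansive and fixes every point of the intersection, so the distance from the iterates to any point of $S$ cannot increase. The minimal sketch below checks inequality (2.37) along such an orbit. It assumes two hypothetical sets in $\mathbb{R}^2$: the half-plane $\{x \geq 1\}$ and the disk of radius 2 centered at the origin.

```python
import math

def proj_halfplane(a):
    # Hypothetical S_1 = {(x, y) | x >= 1}
    return (max(a[0], 1.0), a[1])

def proj_disk(a, r=2.0):
    # Hypothetical S_2 = {(x, y) | x^2 + y^2 <= r^2}
    n = math.hypot(*a)
    return a if n <= r else (a[0] * r / n, a[1] * r / n)

def pocs_orbit(a0, n_iter=50):
    # Unrelaxed periodic projections onto S_1, S_2, S_1, S_2, ...
    orbit = [a0]
    projectors = [proj_halfplane, proj_disk]
    for n in range(n_iter):
        orbit.append(projectors[n % 2](orbit[-1]))
    return orbit

def is_fejer(orbit, z, eps=1e-12):
    # Check (2.37) for one point z of S, up to a rounding allowance eps.
    d = [math.dist(a, z) for a in orbit]
    return all(d[k + 1] <= d[k] + eps for k in range(len(d) - 1))

orbit = pocs_orbit((-3.0, 4.0))
```

Here the points $(1.5, 0)$ and $(1, 1)$ both lie in $S$. Both should pass `is_fejer`, and the last iterate should lie, up to rounding, in both sets.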
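H. Convex Feasibility in a Product Space
Consider the convex feasibility problem (1.6), and assume that the number of sets is finite, say $\operatorname{card} I = m$. Take a real $m$-tuple $(w_i)_{i\in I}$ such that
$\sum_{i\in I} w_i = 1$ and $(\forall i \in I)\ w_i > 0$, (2.38)
and let
$\boldsymbol{\Xi} = \underbrace{\Xi \times \cdots \times \Xi}_{m \text{ times}}$ (2.39)
be the $m$-fold Cartesian product of the Hilbert space $\Xi$. We shall denote by $\mathbf{a} = (a_1, \ldots, a_m) = (a_i)_{i\in I}$ an $m$-tuple in $\boldsymbol{\Xi}$. $\boldsymbol{\Xi}$ can be made into a Hilbert space by endowing it with the scalar product
$(\forall (\mathbf{a}, \mathbf{b}) \in \boldsymbol{\Xi}^2)\ \langle\langle \mathbf{a} \mid \mathbf{b}\rangle\rangle = \sum_{i\in I} w_i \langle a_i \mid b_i\rangle$. (2.40)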
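The associated norm and distance are given by
$(\forall (\mathbf{a}, \mathbf{b}) \in \boldsymbol{\Xi}^2)\ |||\mathbf{a}||| = \sqrt{\sum_{i\in I} w_i \|a_i\|^2}$ and $\mathbf{d}(\mathbf{a}, \mathbf{b}) = |||\mathbf{a} - \mathbf{b}||| = \sqrt{\sum_{i\in I} w_i \|a_i - b_i\|^2}$. (2.41)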
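Let $\mathbf{S}$ be the Cartesian product of the sets $(S_i)_{i\in I}$, i.e., the closed and convex set
$\mathbf{S} = \times_{i\in I} S_i = \{\mathbf{a} \in \boldsymbol{\Xi} \mid (\forall i \in I)\ a_i \in S_i\}$, (2.42)
and $\mathbf{D}$ be the diagonal vector subspace, i.e.,
$\mathbf{D} = \{(a, \ldots, a) \in \boldsymbol{\Xi} \mid a \in \Xi\}$. (2.43)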
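Thus, to every point $a \in \Xi$ there corresponds a unique point $\mathbf{a} = (a, \ldots, a) \in \mathbf{D}$ and vice versa. With these notations, observe that
$(\forall (\mathbf{a}, \mathbf{b}) \in \mathbf{D}^2)\ \langle\langle \mathbf{a} \mid \mathbf{b}\rangle\rangle = \langle a \mid b\rangle$ and $|||\mathbf{a}||| = \|a\|$. (2.44)
Whence, we obtain immediately the following result.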
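Proposition 2.12. Take $(\mathbf{a}_n)_{n\geq 0} \subset \mathbf{D}$ and $\mathbf{a} \in \mathbf{D}$, in correspondence with $(a_n)_{n\geq 0} \subset \Xi$ and $a \in \Xi$. Then
(i) $\mathbf{a}_n \to \mathbf{a} \Leftrightarrow a_n \to a$.
(ii) $\mathbf{a}_n \rightharpoonup \mathbf{a} \Leftrightarrow a_n \rightharpoonup a$.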
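It is also clear that (2.42) and (2.43) imply
$\mathbf{S} \cap \mathbf{D} = \{(a, \ldots, a) \in \boldsymbol{\Xi} \mid (\forall i \in I)\ a \in S_i\}$ (2.45)
$\phantom{\mathbf{S} \cap \mathbf{D}} = \{(a, \ldots, a) \in \boldsymbol{\Xi} \mid a \in \bigcap_{i\in I} S_i\}$. (2.46)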
Therefore, in the product space $\boldsymbol{\Xi}$, we can reformulate the feasibility problem (1.6) as
Find $\mathbf{a}^* \in \mathbf{S} \cap \mathbf{D}$. (2.47)
This product space characterization of (1.6) was developed by Pierra in [132]. It reduces the m-set problem (1.6) to the simpler problem (2.47), which involves only a vector subspace and a convex set.
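The reduction (2.47) is convenient computationally because both projections in $\boldsymbol{\Xi}$ have closed forms. $P_{\mathbf{S}}$ acts componentwise, $P_{\mathbf{S}}(\mathbf{a}) = (P_{S_i}(a_i))_{i\in I}$. For the weighted scalar product (2.40), $P_{\mathbf{D}}$ replaces every component by the weighted mean $\sum_{i\in I} w_i a_i$, which is the minimizer of $x \mapsto \sum_{i\in I} w_i \|a_i - x\|^2$.

The minimal sketch below alternates these two projections starting from a diagonal point. It assumes two hypothetical half-planes in $\mathbb{R}^2$ and equal weights. On the diagonal representative, one step reads $a \mapsto \sum_{i\in I} w_i P_{S_i}(a)$, a weighted average of projections.

```python
def proj_S(bold_a, projectors):
    # P_S acts componentwise: (P_{S_1}(a_1), ..., P_{S_m}(a_m)).
    return [P(a) for P, a in zip(projectors, bold_a)]

def proj_D(bold_a, weights):
    # P_D for the scalar product (2.40): every component becomes sum_i w_i a_i.
    dim = len(bold_a[0])
    mean = tuple(sum(w * a[k] for w, a in zip(weights, bold_a)) for k in range(dim))
    return [mean] * len(bold_a)

def product_space_iterate(a0, projectors, weights, n_iter):
    bold_a = [tuple(a0)] * len(projectors)   # diagonal point (a0, ..., a0) in D
    for _ in range(n_iter):
        bold_a = proj_D(proj_S(bold_a, projectors), weights)
    return bold_a[0]                         # its representative in Xi

def P1(a):  # hypothetical S_1 = {(x, y) | x >= 1}
    return (max(a[0], 1.0), a[1])

def P2(a):  # hypothetical S_2 = {(x, y) | y >= 1}
    return (a[0], max(a[1], 1.0))

solution = product_space_iterate((0.0, 0.0), [P1, P2], [0.5, 0.5], 80)
```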
III. OVERVIEW OF CONVEX SET THEORETIC IMAGE RECOVERY
In this section, we provide a general overview of convex set theoretic image recovery. We discuss the mathematical formalization and the history of the field, as well as specific applications. Finally, we discuss nonconvex problems.
A. Theoretical Framework
1. Basic Assumptions
Throughout this survey, the image space is a real Hilbert space $\Xi$ with scalar product $\langle\cdot \mid \cdot\rangle$, norm $\|\cdot\|$, and distance $d$. The original image $h$ is described by a family of constraints $(\Psi_i)_{i\in I}$, where $\emptyset \neq I \subset \mathbb{N}$. A family $(S_i)_{i\in I}$ of property sets is constructed in $\Xi$ via (1.3). Their intersection $S$ is nonempty, unless otherwise stated.
2. The Image Space
a. General Model. Let $(Y, \mathcal{A}, \mu)$ be a measure space. For most of our purposes, it will be sufficient to take $\Xi$ as the Hilbert space $\mathcal{L}^2(Y, \mathcal{A}, \mu)$ of (classes of equivalence of) square $\mu$-integrable real-valued functions of two variables on the domain $Y$ [59, 149]. In $\Xi$, the scalar product is defined as
$(\forall (a, b) \in \Xi^2)\ \langle a \mid b\rangle = \int_Y a(\xi) b(\xi)\, \mu(d\xi)$. (3.1)
As we shall see, this representation has the advantage of encompassing analog, discrete, and digital image models.
b. Analog Model. Here, $Y = \mathbb{R}^2$, $\mathcal{A}$ is the associated Borel $\sigma$-algebra, and $\mu$ is the two-dimensional Lebesgue measure. $\Xi$ then becomes the usual space $L^2$ with scalar product
$(\forall (a, b) \in L^2 \times L^2)\ \langle a \mid b\rangle = \iint_{\mathbb{R}^2} a(x, y) b(x, y)\, dx\, dy$. (3.2)
In $L^2$, the Fourier transform operator $\mathcal{F}: a \mapsto \hat{a}$ is defined by
$\hat{a}: \mathbb{R}^2 \to \mathbb{C}, \quad (\nu_1, \nu_2) \mapsto \iint_{\mathbb{R}^2} a(x, y) \exp(-i2\pi(x\nu_1 + y\nu_2))\, dx\, dy$. (3.3)
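c. Discrete Model. Here, $Y = \mathbb{Z}^2$, $\mathcal{A} = \mathfrak{P}(Y)$, and $\mu$ is the counting measure ($\mu: A \mapsto \operatorname{card} A$). $\Xi$ then becomes the usual space $\ell^2$ with scalar product
$(\forall (a, b) \in \ell^2 \times \ell^2)\ \langle a \mid b\rangle = \sum_{m\in\mathbb{Z}} \sum_{n\in\mathbb{Z}} a(m, n) b(m, n)$. (3.4)
In $\ell^2$, the Fourier transform operator $\mathcal{F}: a \mapsto \hat{a}$ is defined by
$\hat{a}: [-1/2, 1/2]^2 \to \mathbb{C}, \quad (\nu_1, \nu_2) \mapsto \sum_{m\in\mathbb{Z}} \sum_{n\in\mathbb{Z}} a(m, n) \exp(-i2\pi(m\nu_1 + n\nu_2))$. (3.5)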
d. Digital Model. In digital image processing applications, we are dealing with finite-extent, $N \times N$ discretized images [138]. Such an image can be represented by an $N \times N$ matrix $[a^{(m,n)}]_{0 \leq m,n \leq N-1}$ whose entries are called pixels. The value of a pixel is called a gray level and represents the brightness of the image at that point. It is usually more convenient to represent an $N \times N$ image by the $N^2$-dimensional vector $a$ obtained by stacking the rows of the image matrix $[a^{(m,n)}]_{0 \leq m,n \leq N-1}$ on top of each other [138]. In other words, the $i$th component of the vector $a$ is the pixel $a^{(m,n)}$, where $i = mN + n$. Consequently, $\Xi$ can be taken to be the standard $N^2$-dimensional Euclidean space $\mathbb{R}^{N^2}$, which is obtained by taking $Y = \{0, \ldots, N^2 - 1\}$, $\mathcal{A} = \mathfrak{P}(Y)$, and $\mu$ as the counting measure in $\mathcal{L}^2(Y, \mathcal{A}, \mu)$. The Fourier transform $\mathcal{F}(a) = \hat{a}$ of a stacked image $a \in \mathbb{R}^{N^2}$ is its two-dimensional discrete Fourier transform (DFT), i.e.,
$\hat{a}: \{0, \ldots, N - 1\}^2 \to \mathbb{C}, \quad (k, l) \mapsto \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} a(mN + n) \exp\!\left(-\frac{i2\pi}{N}(mk + nl)\right)$. (3.6)
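The stacking convention $i = mN + n$ and the DFT (3.6) can be checked on a small case. The minimal sketch below evaluates (3.6) directly, at a cost of $O(N^4)$ operations. It is meant only to make the indexing concrete; in practice an FFT would be used. The $2 \times 2$ test image is hypothetical.

```python
import cmath

def stack_rows(image):
    # Row m, column n of the N x N matrix goes to component i = m*N + n.
    return [pixel for row in image for pixel in row]

def dft2_stacked(a, N):
    # Direct evaluation of (3.6) on a stacked image a of length N*N.
    return {
        (k, l): sum(
            a[m * N + n] * cmath.exp(-2j * cmath.pi * (m * k + n * l) / N)
            for m in range(N)
            for n in range(N)
        )
        for k in range(N)
        for l in range(N)
    }

a = stack_rows([[1.0, 2.0], [3.0, 4.0]])   # -> [1.0, 2.0, 3.0, 4.0]
A = dft2_stacked(a, 2)
```

For this image, $\hat{a}(0,0) = 10$, $\hat{a}(1,0) = -4$, $\hat{a}(0,1) = -2$, and $\hat{a}(1,1) = 0$. Parseval's relation for the unnormalized DFT, $\sum |\hat{a}|^2 = N^2 \sum |a|^2$, holds (both sides equal 120).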
3. Set Theoretic Formulation
A set theoretic image recovery problem is entirely specified by its set theoretic formulation, i.e., the pair $(\Xi, (S_i)_{i\in I})$. The solution, or feasibility, set is $S = \bigcap_{i\in I} S_i$. All the images in $S$ are equally acceptable solutions to the problem. The set theoretic formulation is said to be finite if $\operatorname{card} I < +\infty$ and countable if $\operatorname{card} I = +\infty$ (recall that $I \subset \mathbb{N}$). It is said to be ideal if $S = \{h\}$, meaning that the constraints uniquely define $h$; unfair if $h \notin S$, meaning that $h$ fails to satisfy at least one of the specified constraints; inconsistent if $S = \emptyset$, meaning that at least two of the constraints are incompatible [38] (see Figs. 2-5). Unfair formulations and, a fortiori, inconsistent ones arise when inaccurate or imprecise constraints are present. For instance, most of the sets that will be described in Section IV.B depend on attributes of the original image that may not be known exactly. The same remark also applies to the attributes of the uncertainty process that will be required to construct the sets of Section IV.C. In addition, such sets based on stochastic information will be seen to be confidence regions whose construction depends
FIGURE 2. Ideal set theoretic formulation.
FIGURE 3. Fair set theoretic formulation.
on the specification of a confidence level. If the confidence level is unrealistically low, the sets may not intersect. Inconsistencies may also be due to inadequate data modeling, for instance, when random variations in the point spread function of an imaging system [48] or noise perturbations in
FIGURE 4. Unfair set theoretic formulation.
FIGURE 5. Inconsistent set theoretic formulation.
the data [84, 152] are not taken into account. A method for obtaining meaningful solutions to inconsistent problems will be discussed in Section V.C. The degree of unfeasibility of an image $a \in \Xi$ will be quantified via the proximity function
where the weights $(w_i)_{i\in I}$ are strictly convex, i.e.,
$\sum_{i\in I} w_i = 1$ and $(\forall i \in I)\ w_i > 0$. (3.8)
In other words, the smaller $\Phi(a)$, the more feasible $a$. Note that $\Phi(a) = 0 \Leftrightarrow a \in S$.
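The proximity function can be evaluated through projections, since $d(a, S_i) = \|a - P_{S_i}(a)\|$. The minimal sketch below assumes that $\Phi$ is the weighted sum of squared distances $\sum_{i\in I} w_i d(a, S_i)^2$. The exact normalization used in (3.7) may differ, but this does not affect the property $\Phi(a) = 0 \Leftrightarrow a \in S$. The two half-planes and the weights are hypothetical.

```python
import math

def proximity(a, projectors, weights):
    # Assumed form of (3.7): sum_i w_i * d(a, S_i)^2, with d(a, S_i) = ||a - P_{S_i}(a)||.
    return sum(w * math.dist(a, P(a)) ** 2 for P, w in zip(projectors, weights))

def proj_x_ge_1(a):
    # Hypothetical S_1 = {(x, y) | x >= 1}
    return (max(a[0], 1.0), a[1])

def proj_y_ge_1(a):
    # Hypothetical S_2 = {(x, y) | y >= 1}
    return (a[0], max(a[1], 1.0))

projs, w = [proj_x_ge_1, proj_y_ge_1], [0.5, 0.5]
```

With these choices, $\Phi(0, 0) = 1$, $\Phi(0, 2) = 0.5$ (only $S_1$ is violated), and $\Phi(2, 3) = 0$ because $(2, 3) \in S$.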
B. Historical Developments
It is assumed here that the set theoretic formulation is finite and comprises $m$ sets.
1. Computerized Tomography
The field of computerized tomography can be regarded as the starting point of the set theoretic approach in image recovery in the early 1970s. In computerized tomography, measurements are made of the line integrals of a property of the cross section of an object (e.g., X-ray attenuation) along straight lines obtained by varying the lateral displacement at a given angle. The problem is then to reconstruct the image of the cross section from these measurements taken at various angles [85]. This problem is fundamental in diagnostic medicine but also in an increasing number of nonmedical applications [28]. With proper discretization, the original image can be represented by a vector in $\mathbb{R}^{N^2}$, and the reconstruction problem can be written as a system of $m$ linear equations of the type $\langle a \mid b_i\rangle = \alpha_i$, for $1 \leq i \leq m$. From a set theoretic standpoint, each of these constraints restricts estimates to a hyperplane
$S_i = \{a \in \mathbb{R}^{N^2} \mid \langle a \mid b_i\rangle = \alpha_i\}$, (3.9)
and the problem is then to find a point in their intersection $S$. In [78], a so-called algebraic reconstruction technique (ART) was proposed to this end. It employs the periodic recursion
$(\forall n \in \mathbb{N})\ a_{n+1} = P_{i(n)}(a_n)$ with $i(n) = n \ (\text{modulo } m) + 1$ (3.10)
to generate a feasible solution. In fact, this mathematical method was developed by Kaczmarz in 1937 [97] to solve systems of linear equations. An alternative projection method was then proposed in [74] under the name simultaneous iterative reconstruction technique (SIRT). In this parallel method, the projections onto all the sets are averaged to form the update, namely,
$(\forall n \in \mathbb{N})\ a_{n+1} = \frac{1}{m} \sum_{i\in I} P_i(a_n)$. (3.11)
SIRT is similar to the algorithm devised by Cimmino in 1938 [35] to solve linear systems of equations by successively averaging the reflections of the current iterate with respect to the sets. A problem with the set theoretic formulation (3.9) is that noise and other uncertainty sources are ignored. As a result, it may be unfair or even inconsistent. In order to incorporate these disturbances, the hyperplanes were replaced in [84] by the hyperslabs
$S_i = \{a \in \mathbb{R}^{N^2} \mid \alpha_i - \varepsilon_i \leq \langle a \mid b_i\rangle \leq \alpha_i + \varepsilon_i\}$, (3.12)
where $\varepsilon_i$ is a tolerance factor. This feasibility problem was solved by the Agmon-Motzkin-Schoenberg algorithm for affine inequalities [1, 122],
$(\forall n \in \mathbb{N})\ a_{n+1} = a_n + \lambda(P_{i(n)}(a_n) - a_n)$ with $i(n) = n \ (\text{modulo } m) + 1$ and $0 < \lambda < 2$. (3.13)
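The projection onto a hyperslab (3.12) moves $a$ along $b_i$ only as far as needed to bring $\langle a \mid b_i\rangle$ into $[\alpha_i - \varepsilon_i, \alpha_i + \varepsilon_i]$. The relaxed recursion (3.13) then takes a fraction $\lambda$ of that step. The minimal sketch below assumes a fixed relaxation $\lambda$ and two hypothetical slabs, $0.9 \leq x \leq 1.1$ and $0.9 \leq y \leq 1.1$.

```python
def proj_hyperslab(a, b, alpha, eps):
    # Projection onto {x | alpha - eps <= <x | b> <= alpha + eps}.
    s = sum(ai * bi for ai, bi in zip(a, b))
    target = min(max(s, alpha - eps), alpha + eps)   # clip <a | b> into the slab
    t = (target - s) / sum(bi * bi for bi in b)
    return [ai + t * bi for ai, bi in zip(a, b)]

def ams(a0, rows, rhs, tols, lam, n_iter):
    # Relaxed periodic recursion (3.13) with a constant relaxation 0 < lam < 2.
    if not 0.0 < lam < 2.0:
        raise ValueError("relaxation must lie in ]0, 2[")
    a, m = list(a0), len(rows)
    for n in range(n_iter):
        i = n % m
        p = proj_hyperslab(a, rows[i], rhs[i], tols[i])
        a = [ai + lam * (pi - ai) for ai, pi in zip(a, p)]
    return a

def in_slabs(a, rows, rhs, tols, slack=1e-12):
    return all(al - e - slack <= sum(x * y for x, y in zip(a, b)) <= al + e + slack
               for b, al, e in zip(rows, rhs, tols))

# Hypothetical slabs 0.9 <= x <= 1.1 and 0.9 <= y <= 1.1.
rows, rhs, tols = [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0], [0.1, 0.1]
```

With $\lambda = 1$ the iterates stop on the slab boundaries at $(0.9, 0.9)$. With $\lambda = 1.5$ they overshoot into the interior of the slabs, where they remain.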
Simply stated, this algorithm proceeds as follows: Starting with an initial estimate $a_0$, a sequence is generated, where the new iterate $a_{n+1}$ lies on the segment between the current iterate $a_n$ and its reflection $2P_{i(n)}(a_n) - a_n$ with respect to the set $S_{i(n)}$. The position of $a_{n+1}$ on this segment depends on the value of the relaxation parameter $\lambda$, which determines the step size $\|a_{n+1} - a_n\|$. This framework is discussed further in [%I].
2. The Gerchberg-Papoulis Algorithm
The fundamental problem of estimating an image from partial spatial and spectral information has been the focus of much research in disciplines ranging from crystallography to astronomy. In the absence of any additional information, it can be formalized as the problem of finding an image in $S_1 \cap S_2$, where $S_1$ is the subset of $\Xi$ of all images consistent with the spatial information and $S_2$ that of all images consistent with the spectral information. Although not formulated explicitly in set theoretic terms, the idea of constructing a sequence of points that alternates between $S_1$ and $S_2$ in order to converge to their intersection can be found in [102]. In [72], Gerchberg considered the problem of recovering a finite object from limited diffraction data, i.e., of reconstructing a spatially limited image from partial knowledge of its Fourier transform. The proposed reconstruction method was to alternately resubstitute the known data in both domains. It can be regarded as a method of alternating projections between the affine subspaces
$S_1 = \{a \in \Xi \mid a = 0 \text{ outside } K_1\}$ and $S_2 = \{a \in \Xi \mid \hat{a} = g \text{ on } K_2\}$, (3.14)
where $\hat{a}$ is the Fourier transform of $a$, $g$ is a known function, and $K_1$ and $K_2$ are neighborhoods of the origin in the spatial and spectral domains, respectively. Almost at the same time, Papoulis [128] proposed the same method to solve a dual problem, namely, to reconstruct a band-limited signal which is partially known in the time domain. In this case, the affine
subspaces are of the form
$S_1 = \{a \in \Xi \mid a = g \text{ on } K_1\}$ and $S_2 = \{a \in \Xi \mid \hat{a} = 0 \text{ outside } K_2\}$. (3.15)
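The minimal sketch below alternates projections onto the affine subspaces (3.15) in a hypothetical one-dimensional discrete setting. The signal has length $N = 32$ and is known on the samples $K_1 = \{0, \ldots, 19\}$. Its band is $K_2 = \{k \mid \min(k, N - k) \leq 2\}$ in DFT indices; pairing $k$ with $N - k$ keeps the iterates real. A naive DFT is used so that the example has no dependencies.

Because the true signal lies in both subspaces, the iterates are Fejér-monotone with respect to it. The reconstruction error therefore cannot exceed the error of the zero initial guess.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in range(N)]

def idft(X):
    # Inverse DFT; the real part is kept because the band is conjugate-symmetric.
    N = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N).real
            for n in range(N)]

def proj_known_samples(x, g, K1):
    # P_{S_1}: reset the samples on K1 to the known values g.
    return [g[n] if n in K1 else x[n] for n in range(len(x))]

def proj_band_limited(x, B):
    # P_{S_2}: zero every DFT coefficient outside the band min(k, N - k) <= B.
    N = len(x)
    X = dft(x)
    return idft([X[k] if min(k, N - k) <= B else 0.0 for k in range(N)])

def papoulis(g, K1, B, n_iter):
    x = [0.0] * len(g)
    for _ in range(n_iter):
        x = proj_band_limited(proj_known_samples(x, g, K1), B)
    return x

N, B, K1 = 32, 2, set(range(20))     # hypothetical sizes and known-sample set
h = [math.cos(2 * math.pi * n / N) + 0.5 * math.sin(4 * math.pi * n / N) for n in range(N)]
x = papoulis(h, K1, B, 30)            # only the values of h on K1 are used as data
```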
In the early 1980s, attempts were made to cast the Gerchberg-Papoulis algorithm into larger classes of successive approximation methods amenable to the incorporation of certain types of a priori knowledge [145, 148, 165].
3. Affine Constraints
One of the very first abstract set theoretic approaches to image recovery appeared in [181]. The recovery problem considered in this paper was to find an image $h$ known to belong to a closed subspace $S_1$ of $\Xi$, given that the observed data consist of the projection of $h$ onto another closed subspace $S_2$ of $\Xi$. Under certain conditions, a modified sequence of alternated projections onto $S_1$ and $S_2^{\perp}$ (the orthogonal complement of $S_2$) was shown to converge to $h$. This framework encompasses several basic problems, including that considered by Gerchberg and Papoulis (as discussed in Section III.B.2) and the various extensions considered in [160]. Additional affinely constrained problems can be found in [120]. The following theorem provides the mathematical foundation for such methods. It was proved in [81] for vector subspaces, but the proof can be extended routinely to affine subspaces. In the case of two subspaces, it is known as the alternating projection theorem and is due to von Neumann [176].
Theorem 3.1. Let $(S_i)_{i\in I}$ be a finite family of $m$ closed affine subspaces of $\Xi$ with nonempty intersection $S$. Then every sequence $(a_n)_{n\geq 0}$ constructed as in (3.10) converges strongly to a point in $S$.
4. Arbitrary Convex Constraints: The POCS Algorithm
Despite the apparent disparity in their original formulations, all the above methods share the common objective of producing a solution consistent with a collection of affine or affine inequality constraints. As a result, the associated set theoretic formulations comprise only affine subspaces or half-spaces. The scope of this framework is limited, since many useful constraints encountered in practice are nonaffine, as will be seen in Section IV. Thus, the main motivation for the extension to convex set theoretic formulations is to allow a much larger class of information to be exploited. This extension was made possible by the availability of convex feasibility algorithms.
As appealing as it may seem, the set theoretic approach would be fairly futile if efficient methods were not available to actually solve (1.6). The field of set theoretic image recovery entered a new era when the image processing community became aware of one such method called POCS, for projections onto convex sets. Although POCS had been used in image reconstruction in [106], it is really [184] that popularized the method and established a broad conceptual and computational basis for convex image recovery. The method of POCS, which extends the Kaczmarz (3.10) and Agmon-Motzkin-Schoenberg (3.13) algorithms to arbitrary closed convex sets, is defined by the serial algorithm
$(\forall n \in \mathbb{N})\ a_{n+1} = a_n + \lambda_n\big(P_{n \ (\text{modulo } m) + 1}(a_n) - a_n\big)$, (3.16)
where the relaxation parameters satisfy
$(\forall n \in \mathbb{N})\ \varepsilon \leq \lambda_n \leq 2 - \varepsilon$ with $0 < \varepsilon < 1$. (3.17)
The relaxation parameters provide the flexibility of under- or overprojecting at each iteration. Figs. 6-8 depict orbits generated by POCS for relaxations $\lambda_n = 1.0$, $\lambda_n = 0.5$, and $\lambda_n = 1.5$, respectively.
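The minimal sketch below implements the serial recursion (3.16) with a constant relaxation, which satisfies (3.17) whenever it lies strictly between 0 and 2. It runs the three relaxations used in Figs. 6-8 on two hypothetical sets in $\mathbb{R}^2$: the half-plane $\{x \geq 1\}$ and the disk of radius 2 centered at the origin. It only checks that each orbit ends up in $S$ up to a tolerance; it does not reproduce the figures.

```python
import math

def proj_halfplane(a):
    # Hypothetical S_1 = {(x, y) | x >= 1}
    return (max(a[0], 1.0), a[1])

def proj_disk(a, r=2.0):
    # Hypothetical S_2 = {(x, y) | x^2 + y^2 <= r^2}
    n = math.hypot(*a)
    return a if n <= r else (a[0] * r / n, a[1] * r / n)

def pocs(a0, projectors, lam, n_iter):
    # Serial recursion (3.16) with constant relaxation lam (eps <= lam <= 2 - eps).
    a, m = a0, len(projectors)
    for n in range(n_iter):
        p = projectors[n % m](a)
        a = tuple(ai + lam * (pi - ai) for ai, pi in zip(a, p))
    return a

def feasible(a, tol=1e-6):
    return a[0] >= 1.0 - tol and math.hypot(*a) <= 2.0 + tol

orbits = {lam: pocs((-3.0, 4.0), [proj_halfplane, proj_disk], lam, 400)
          for lam in (1.0, 0.5, 1.5)}
```

In this example the overrelaxed run ($\lambda = 1.5$) lands in the interior of $S$ after a few steps and then stays there. The underrelaxed run approaches $S$ gradually.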
Theorem 3.2. Let $(S_i)_{i\in I}$ be a finite family of $m$ closed and convex subsets of $\Xi$ with nonempty intersection $S$. Then any orbit $(a_n)_{n\geq 0}$ of POCS converges weakly to a point in $S$. In addition, the convergence is strong if any of the following conditions holds.
(i) $(\exists j \in I)\ S_j \cap \operatorname{int}\big(\bigcap_{i\in I\setminus\{j\}} S_i\big) \neq \emptyset$.
(ii) All but possibly one of the sets in $(S_i)_{i\in I}$ are f-uniformly convex.
(iii) $(S_i)_{i\in I}$ is a family of closed affine half-spaces.
(iv) One of the sets in $(S_i)_{i\in I}$ is boundedly compact.
Proof. Weak convergence was proved in [18]. Assertions (i)-(iii) were proved in [79] and assertion (iv) in [163].$^{3}$
$^{3}$The results of [18] and [163] pertain only to the unrelaxed model (3.10), but they still hold true for (3.16)-(3.17).
C. Applications
In this section we briefly indicate some of the image recovery problems that have been approached within the convex set theoretic framework. We somewhat arbitrarily classify them into restoration problems, tomographic reconstruction problems, and other recovery problems. Some of these studies involve only one-dimensional signals, but they can also be applied
FIGURE 6. Unrelaxed POCS algorithm ($\lambda_n = 1.0$).
to images. It should also be mentioned that the majority of these problems have been solved via the unrelaxed version of POCS. Comparative studies of conventional versus set theoretic image recovery in specific problems can be found in [125] and [164].
FIGURE 7. Underrelaxed POCS algorithm ($\lambda_n = 0.5$).
FIGURE 8. Overrelaxed POCS algorithm ($\lambda_n = 1.5$).
1. Restoration Problems
The first application of set theoretic methods in image restoration was demonstrated in [171], where various properties of the noise were shown to produce useful convex sets. The stochastic nature of some blurring functions, such as atmospheric turbulence and camera vibration, has also been addressed using set theoretic methods [48]. Set theoretic restoration in the presence of bounded kernel disturbances and noise was considered in [51]. Sets based on locally adaptive constraints [101] as well as on smoothness constraints [161] have also been proposed. In addition, set theoretic restoration has been used with other statistically based methods, such as Wiener filtering [MI, [157]. Other studies have focused on the restoration of specific types of images, e.g., multiband satellite images [34], character images [103], echographic images [113], diffraction wave fields [120], optical flow fields and electromagnetic fields [158]. In order to best exploit specific a priori information, the set theoretic restoration problem of [146] was posed in a singular value space rather than in the natural image space. Restoration in the presence of an inconsistent set of constraints was considered in [42]. Finally, set theoretic approaches to regularized restoration were proposed in [99] and [144].
2. Tomographic Reconstruction Problems
In the early work discussed in Section III.B.1, the chief objective was to generate an image consistent with the (possibly noisy) projection data
[85]. Various extensions of ART and SIRT relevant to such set theoretic formulations are surveyed in [28] and [26]. More recent work has been geared toward the incorporation of additional constraints relevant to specific situations. For instance, reconstructions must often be performed with limited view data, i.e., with inaccurately measured projections and/or an insufficient number of projections, which will typically result in severe artifacts such as streaking and geometric distortion [140]. In such instances, the set theoretic approach has proven particularly well suited to incorporate a priori knowledge and thereby improve the reconstruction. Thus, a convex set theoretic formulation was used to extrapolate tomographic images reconstructed from a limited range of views in [106] and [151]. In [152], the formulation of [151] was modified to account for noisy data. In [153], POCS was combined with the method of direct Fourier tomography to reconstruct an image from limited-view projection data. Strictly speaking, these approaches are not set theoretic reconstruction methods per se but, rather, syntheses of a reconstruction method and a set theoretic restoration method. In that sense, they should not be regarded as extensions of ART (or SIRT), where the property sets simply translate the requirement that the reconstruction be consistent with the observed projections. In [124], a more sophisticated convex set theoretic formulation was developed by incorporating additional constraints such as known object support and energy boundedness. Other types of constraints can also be imposed, such as consistency of the error between the recorded projection data and the data obtained by reprojecting the reconstructed image with the uncertainty caused by the numerical approximations of the reprojection method [172]. Set theoretic methods have also been used in fan-beam tomography [130]. In the above studies, the solution space is that of the reconstructed image.
In [100], a different set theoretic approach was proposed in which the solution space is the space of Radon transforms of images. A complete set of line integrals consistent with a priori knowledge and the measured line integrals was first obtained by POCS and then used to reconstruct the image via ordinary convolution backprojection. In [178], POCS was used to synthesize the projection matrix from noisy measurements made by a moving array of detectors, and the image was then reconstructed by filtered backprojection.
3. Other Image Recovery Problems
Applications of convex set theoretic image recovery can be found in such fields as electron microscopy [24], speckle interferometry [62], halftone imaging [83], holography [116], and biomagnetic imaging [123]. Other applications include image recovery from multiple frames of sparse data [139], image recovery from nonuniform samples [147, 180], and recovery of images remotely sensed by image-plane detector arrays [162].
D. The Issue of Convexity
The basic objective of the set theoretic approach is to provide a flexible framework for the incorporation of a wide range of information in the recovery process. However, our discussion in this survey is confined to problems in which the constraints yield closed and convex sets in some Hilbert space $\Xi$. As mentioned in Section I.D, the reason for this is quite simple: There does not exist any method that is guaranteed to produce a point in the intersection of sets when at least one of them is not convex. The condition that the sets be closed should not cause concern, since a property set $S_i$ can always be replaced by its closure
$\overline{S_i} = \{a \in \Xi \mid d(a, S_i) = 0\}$. (3.18)
In doing so, one merely adds points which are at distance zero from $S_i$, which will have no significant effect on the solution of a practical problem. The issue of convexity is a more serious one, as many important constraints do not yield convex sets in the desired solution space, which precludes their use. A classical example of a nonconvex set is
$S_i = \{a \in \Xi \mid |\hat{a}|\mathbb{1}_K = |\hat{h}|\mathbb{1}_K\}$, (3.19)
based on the knowledge of the Fourier magnitude of the original image over some frequency band $K$. It arises in various problems in which intensity measurements can be made in the diffraction plane. It is in particular found in the Gerchberg-Saxton method [73] as well as in phase recovery problems [91, 109]. We shall now describe the three main approaches that are presently available to deal with nonconvex problems.
1. Convexification
Convexification is the process of partially enforcing constraints by replacing nonconvex property sets by their convex hulls. This yields a larger set that can still be useful. An example of useful convexification is the set $S_i$ of (4.41). Another example is found in [184], where the set (3.19) is replaced by
$S_i = \{a \in \Xi \mid |\hat{a}|\mathbb{1}_K \leq |\hat{h}|\mathbb{1}_K\}$. (3.20)
In some cases, the convexification process may give trivial results. For instance [49], consider the set of all digital images in $\mathbb{R}^{N^2}$ whose maximum number of nonzero values is known (e.g., star images in astronomy). It turns out that the convex hull of this set is $\mathbb{R}^{N^2}$ itself, which means that the convexification process has eliminated the constraint.
2. New Solution Space
If the available information does not yield convex property sets in the selected Hilbert image space $\Xi$, one may seek a new Hilbertian solution space $\Xi'$. An option is to obtain $\Xi'$ via a (nonlinear) transformation of $\Xi$. Another option is to redefine the vector space structure of the space. Indeed, recall that the structure of a real vector space $V$ is defined by a so-called addition operation $\oplus: V^2 \to V$ and a so-called scalar multiplication operation $\odot: \mathbb{R} \times V \to V$ which satisfy certain axioms [57]. Hence the definition of the convexity of a set $A \subset V$, i.e.,
$(\forall \alpha \in \,]0, 1[)(\forall (a, b) \in A^2)\ (\alpha \odot a) \oplus ((1 - \alpha) \odot b) \in A$, (3.21)
depends on the choice of the operations $\oplus$ and $\odot$. Therefore, by changing the vector space structure, one can render some sets convex. Such a strategy was implemented in [31] in connection with the reconstruction of square-summable discrete signals. The natural space $\ell^2$ with norm $\|a\| = (\sum_i |a(i)|^2)^{1/2}$ was replaced by the new space $\ell^*$ of absolutely summable sequences $a$ whose Fourier transform $\mathcal{F}(a) = \hat{a}$ satisfies $(\forall \nu \in [-1/2, 1/2])\ \hat{a}(\nu) \neq 0$. In $\ell^*$, the operation $\oplus$ was taken to be convolution and the operation $\odot$ was defined as
$(\forall (\alpha, a) \in \mathbb{R} \times \ell^*)\ \alpha \odot a = \mathcal{F}^{-1}(\exp(\alpha \ln(\hat{a})))$. (3.22)
In addition, a pre-Hilbertian structure was defined by the scalar product
$(\forall (a, b) \in \ell^* \times \ell^*)\ \langle a \mid b\rangle_* = \int_{-1/2}^{1/2} \ln(\hat{a}(\nu)) \ln(\hat{b}(\nu))\, d\nu$. (3.23)
It was shown that certain sets, in particular (3.19), that were not convex in $\ell^2$ became convex in $\ell^*$. The space $\ell^*$ also proved useful for reconstruction from bispectral information [30]. In general, a difficulty that arises in a change of solution space is to render the nonconvex sets convex while preserving the convexity of the other sets in the set theoretic formulation.
3. Feasibility with Nonconvex Sets
If the two above approaches turn out to be unsatisfactory, a third option is to try to solve the nonconvex feasibility problem as is. Heuristic attempts have been made to use the periodic projection algorithm (3.10) in the presence of nonconvex sets. This is essentially the approach of [73] and [109]. A formal local convergence result for this method is the following theorem, in which $(\Pi_i)_{i\in I}$ designates the family of set-valued projection operators onto the sets $(S_i)_{i\in I}$, as defined in Section II.E.2.
Theorem 3.3 [49]. Let $(S_i)_{i\in I}$ be a finite family of $m$ approximately compact subsets of $\Xi$ with nonempty bounded intersection $S$, and suppose that one of them, say $S_1$, is boundedly compact. Let $(a_n)_{n\geq 0}$ be any sequence constructed according to the algorithm
$(\forall n \in \mathbb{N})\ a_{n+1} \in \Pi_{i(n)}(a_n)$ with $i(n) = n \ (\text{modulo } m) + 1$, (3.24)
where $a_0$ is a point of attraction of $(S_i)_{i\in I}$ in the sense that
Then $(a_{mn})_{n\geq 0}$ admits at least one strong cluster point, and all of its cluster points lie in $S$. In addition, $(a_{mn})_{n\geq 0}$ converges strongly to a point in $S$ if $\sum_{n\geq 0} d(a_{mn}, S) < +\infty$.
An interpretation of the above result is that convergence takes place locally, i.e., when the initial point $a_0$ is suitably positioned with respect to the property sets. A possible candidate for a starting point $a_0$ is an image which is feasible with respect to all the convex sets (such an image can be obtained by POCS or by any of the methods described in Section V). Let us also mention that, according to Proposition 2.7, if $\dim \Xi < +\infty$, the conditions on the sets in Theorem 3.3 reduce to closedness of the $S_i$'s and boundedness of $S$. A theorem similar to Theorem 3.3 can also be established for the set-valued version of the SIRT algorithm (3.11). Besides projection methods, another approach to nonconvex feasibility problems is via the unconstrained minimization of a functional whose set of global minimizers is contained in $S$, e.g., the proximity function $\Phi$ of (3.7). Note that $\Phi$ will no longer be convex and that only local convergence results should be expected. In digital image recovery, certain stochastic minimization procedures could be contemplated. However, given their prohibitive computational cost in high-dimensional spaces, this approach seems unrealistic at the present time.
IV. CONSTRUCTION OF PROPERTY SETS
A. Generalities
In this section we describe how a priori knowledge and data can be used to generate constraints on the solution and construct property sets in $\Xi$. In general, information may be known a priori or can be extracted a posteriori from the data. An example of possible a priori knowledge is the range of intensity values of the original image. In a recovery problem, a posteriori information can be obtained in various ways. For instance, if the original scene to be restored contains a point source, the blur function can be estimated from the degraded image; moreover, the statistics of flat regions in the degraded image can be used to estimate pertinent noise properties. In image recovery problems, the two main sources of constraints are the intrinsic properties of the original image $h$ and the properties of the imaging system. As demonstrated by the following examples, many useful constraints give rise to closed and convex property sets. These few examples are meant only to illustrate some commonly used constraints; by no means do they exhaust the virtually unlimited list of sets that can be created.
B. Sets Based on Intrinsic Properties of the Image
The importance of spatial and spectral information in image recovery problems has been recognized in countless studies, e.g., [9], [37], [82], [117], [159], and [160]. As we shall now see, such information is to a large extent straightforward to incorporate in the form of convex sets. Other types of constraints will also be considered.
1. Spatial Properties
We provide here examples of sets based on attributes describing the original image $h$ itself in the spatial domain. As a first example, suppose that lower and upper bounds on the amplitude of the original image $h$ are known. This knowledge can be associated with the property set
$S_i = \{a \in \Xi \mid \operatorname{range}(a) \subset [\gamma, \delta]\}$. (4.1)
Another common assumption is that the image has a limited region of support $K$ [72]. The set associated with this information is
$S_i = \{a \in \Xi \mid a = a\mathbb{1}_K\}$. (4.2)
Next, suppose as in [173] that $h$ is known over some domain $K$. Then the corresponding property set is
$S_i = \{a \in \Xi \mid a\mathbb{1}_K = h\mathbb{1}_K\}$. (4.3)
If a bound $\gamma^2$ is available on the energy $\|h\|^2$ of the original image, one can define the set
$S_i = \{a \in \Xi \mid \|a\| \leq \gamma\}$. (4.4)
More generally, when a bound $\gamma$ is available on the maximum deviation of $h$ from a reference image $r$, as in [100], [124], and [157], the associated property set is the ball
$S_i = \{a \in \Xi \mid \|a - r\| \leq \gamma\}$. (4.5)
A further generalization of this type of closed and convex set is
$S_i = \{a \in \Xi \mid \|\mathcal{L}(a) - r\| \leq \gamma\}$, (4.6)
where $\mathcal{L}: \Xi \to \Xi$ is a bounded linear operator. For instance, if $r = 0$ and $\mathcal{L}$ is a differential operator, (4.6) is a set of smooth images; if $r = 0$ and $\mathcal{L} = \mathrm{Id} - \mathcal{T}$, (4.6) is a set of images that are nearly invariant under the operator $\mathcal{T}$. Moment constraints have also been employed [154]. They yield property sets in the form of hyperslabs
$S_i = \{a \in \Xi \mid \gamma \leq \langle a \mid b\rangle \leq \delta\}$. (4.7)
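Each of the sets (4.1)-(4.5) and (4.7) has an explicit projection, which is what makes such sets convenient in POCS. The minimal sketch below writes these projections for stacked digital images stored as Python lists. The function names are hypothetical, and $K$ is a set of pixel indices. The set (4.6) is omitted because, for a general bounded operator $\mathcal{L}$, its projection has no simple closed form.

```python
import math

def proj_range(a, gamma, delta):
    # (4.1): clip every pixel into [gamma, delta].
    return [min(max(x, gamma), delta) for x in a]

def proj_support(a, K):
    # (4.2): zero the pixels outside the support K.
    return [x if i in K else 0.0 for i, x in enumerate(a)]

def proj_known_on(a, h, K):
    # (4.3): replace the pixels on K by those of h.
    return [h[i] if i in K else x for i, x in enumerate(a)]

def proj_ball(a, r, gamma):
    # (4.5), and (4.4) with r = 0: shrink radially toward r if ||a - r|| > gamma.
    d = math.dist(a, r)
    if d <= gamma:
        return list(a)
    return [ri + gamma * (x - ri) / d for x, ri in zip(a, r)]

def proj_hyperslab(a, b, gamma, delta):
    # (4.7): move along b until gamma <= <a | b> <= delta.
    s = sum(x * y for x, y in zip(a, b))
    t = (min(max(s, gamma), delta) - s) / sum(y * y for y in b)
    return [x + t * y for x, y in zip(a, b)]
```

For example, `proj_ball([3.0, 4.0], [0.0, 0.0], 1.0)` returns the point $(0.6, 0.8)$ on the unit sphere.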
2. Spectral Properties
In many problems, certain attributes of the Fourier transform $\mathcal{F}(h) = \hat{h}$ of the original image are available. In optical experiments, they arise from partial measurements in the diffraction plane. In the following, $|\hat{a}|$ is the Fourier magnitude of an image $a$ and $\angle\hat{a}$ its phase. The Fourier transform operator $\mathcal{F}$ is defined in accordance with the space $\Xi$ selected in Section III.A.2. A common assumption in image recovery is that the original image is band-limited [184]. If we designate by $K$ the corresponding low-frequency band, we obtain the property set
$S_i = \{a \in \Xi \mid \hat{a} = \hat{a}\mathbb{1}_K\}$. (4.8)
In [72] and [151], the stronger hypothesis that $\hat{h}$ was known over some frequency band $K$ led to the set
$S_i = \{a \in \Xi \mid \hat{a}\mathbb{1}_K = \hat{h}\mathbb{1}_K\}$. (4.9)
This constraint can be generalized by considering the set of images that match approximately a reference image $r$ over some frequency band $K$,
THE CONVEX FEASIBILITY PROBLEM
189
i.e. [152],
sj = {a E E I Il(ri - ?) 1,11
5 y}.
(4.10)
In particular, the set
sj = {a E E 1 Ilri1,ll
5 y}
(4.1 1)
of images whose energy in a certain frequency band K is within some bound y 2 was proposed in "341. The same study also proposed the closed and convex cone Si = {a E Z I range@) c R,}
(4.12)
of images with nonnegative Fourier transform. The larger set of images with real Fourier transforms had been used previously in [106]. In [42], [108], and "41, knowledge of the phase of h was assumed to construct the set
si = {a E E I Lri = L h } .
(4.13)
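In a discrete setting, the band-limited set (4.8) and the known-band set (4.9) are projected onto by editing transform coefficients. This is a minimal sketch under stated assumptions. The signal is 1-D, and the unnormalized DFT stands in for $\mathcal{F}$. Because that DFT is orthogonal up to a constant factor, zeroing or overwriting coefficients gives the Euclidean projection in the signal domain. For a real-valued result, the band $K$ must be symmetric: $k \in K$ implies $(N-k) \bmod N \in K$. The $O(N^2)$ DFT keeps the example dependency-free.

```python
import cmath

# Sketch: projections onto (4.8) and (4.9) for a 1-D signal, assuming the
# unnormalized DFT plays the role of the Fourier operator. The band must be
# symmetric (k in K implies (N - k) % N in K) for the output to be real.

def dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def idft(coeffs):
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]

def project_bandlimited(a, band):
    """Projection onto (4.8): keep only the DFT coefficients indexed by K."""
    coeffs = [c if k in band else 0.0 for k, c in enumerate(dft(a))]
    return [z.real for z in idft(coeffs)]

def project_known_band(a, h_hat, band):
    """Projection onto (4.9): overwrite the coefficients in K with those of h."""
    coeffs = [h_hat[k] if k in band else c for k, c in enumerate(dft(a))]
    return [z.real for z in idft(coeffs)]
```

With `band = {0}`, `project_bandlimited` replaces every sample by the signal mean. Applying the projection twice gives the same result as applying it once, as expected of a projection.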
3. Other Properties

In the previous subsection, we have seen how sets could be derived from attributes of the Fourier transform $\mathcal{F}(h)$ of the original image. Sets can also be constructed from attributes of other transforms $\mathcal{X}(h)$ of $h$. For instance, $\mathcal{X}$ can be the wavelet transform [32, 114], the bispectral transform [30], the singular value decomposition [146], or a differential operator [162].
C. Sets Based on Properties of the Imaging System

1. Overview
In this section, we describe how information pertaining to the imaging system can be used to construct property sets. The basic principle is as follows. From the data and the knowledge of the deterministic component of the imaging system, one forms an estimation residual which is then constrained to be consistent with those known probabilistic properties of the uncertain components in the system, i.e., measurement noise and, possibly, model uncertainty. As the estimation residual depends on the estimate, one thus obtains property sets in the solution space. Pieces of information relative to quantities such as range, moments, absolute moments, and second-order probabilistic attributes are considered.
In image recovery, the idea of imposing noise-based constraints on the estimation residual was first implemented in the constrained least-squares restoration problem of [89], where the sample second moment of the residual was forced to match that of the noise. This particular constraint has also been employed in other restoration techniques, e.g., [48], [169], and [171]. In the set theoretic deconvolution problem posed in [171], new constraints were introduced by considering other pieces of noise information (mean, outliers, spectral density) under the assumption that the noise was white and Gaussian. Work in this direction was pursued by considering random convolution kernels [48] as well as more general hypotheses on the noise and the imaging system [44, 50]. Some of the sets developed in [171] were reexamined via fuzzy set theory in [36]. In [51], the set theoretic deconvolution problem was studied in the context of bounded-error models, and the only information available about the noise and the disturbances induced by random kernel perturbations consisted of amplitude bounds. Applications involving residual-based property sets can be found in [42], [49], [99], [127], and [144]. The following presentation is a synthesis of the results of [44], [50], [51], and [171] relevant to the construction of convex property sets.

2. Data Formation Model

In this section we introduce our mathematical model for the imaging system.

a. Notations. All the random elements are defined on a probability space $(\Omega, \mathcal{F}, \mathrm{P})$. All r.v.s. are real-valued. The chi-square distribution with $L$ degrees of freedom and mean $L$ is denoted by $\chi^2_L$. For every $p \in \mathbb{R}_+^*$, $\mathcal{L}^p(\mathrm{P})$ denotes the vector space of r.v.s. with finite $p$th absolute moment. The abbreviations a.s. and i.i.d. stand respectively for P-almost surely and independent and identically distributed.

b. General Model. The observed data are discrete and consist of a sequence of r.v.s. $(X_n)_{n\in\mathbb{Z}}$ related to the original image $h$ via the model
$$(\forall n \in \mathbb{Z})\quad X_n = \mathcal{T}_n(h) + V_n, \tag{4.14}$$
where the random operators $(\mathcal{T}_n)_{n\in\mathbb{Z}}$ represent the imaging system and where the random sequence $(V_n)_{n\in\mathbb{Z}}$ represents measurement noise. Furthermore, these operators are decomposable as
$$(\forall n \in \mathbb{Z})\quad \mathcal{T}_n = T_n + \widetilde{T}_n, \tag{4.15}$$
where $T_n$ denotes the known, deterministic component of $\mathcal{T}_n$ and $\widetilde{T}_n$ its unknown component, i.e., the component associated with model uncertainty. $(T_n)_{n\in\mathbb{Z}}$ and $(\widetilde{T}_n)_{n\in\mathbb{Z}}$ are taken to be sequences of a.s. bounded linear random functionals on $\Xi$.⁴ Moreover, the processes $(\widetilde{T}_n(h))_{n\in\mathbb{Z}}$ and $(V_n)_{n\in\mathbb{Z}}$ are second-order, independent from each other, with mean zero. It follows from the Riesz representation theorem that there exists a sequence of $\Xi$-valued random elements $(\tilde t_n)_{n\in\mathbb{Z}}$ such that (4.14) can be expressed as
$$(\forall n \in \mathbb{Z})\quad X_n = T_n(h) + U_n, \tag{4.16}$$
where
$$(\forall n \in \mathbb{Z})\quad U_n = \widetilde{T}_n(h) + V_n = \langle h \mid \tilde t_n \rangle + V_n. \tag{4.17}$$
The process $(U_n)_{n\in\mathbb{Z}}$ will be called the uncertainty process. It stands for the uncertainty arising from the inaccurate model and the noise.

c. Digital Model. If a digital model is assumed (see Section III.A.2.d), then $X_n$ may represent the $n$th pixel of the degraded image in a restoration problem, or a point in a sinogram in a tomographic reconstruction problem. In addition, the $\tilde t_n$s in (4.17) are simply $N^2$-dimensional random vectors. Note that if the vectors $(\tilde t_n)_{n\in\mathbb{Z}}$ are identically distributed with uncorrelated components of variance $v_1$ and if the noise $(V_n)_{n\in\mathbb{Z}}$ is white with power $v_2$, then the power of the uncertainty process reduces to
$$E|U_0|^2 = E|\langle h \mid \tilde t_0 \rangle|^2 + E|V_0|^2 = \|h\|^2 v_1 + v_2. \tag{4.18}$$

d. Remarks. The model (4.16)-(4.17) is far from universal, as it covers only situations in which the known component of the system is linear and the uncertainty is additive and affine. Nonetheless, it adequately approximates many physical systems encountered in imaging science and has the advantage of allowing unmodeled dynamics and random perturbations. In general, the operators $(T_n)_{n\in\mathbb{Z}}$ will represent a known mean component of the system and the operators $(\widetilde{T}_n)_{n\in\mathbb{Z}}$ unknown variations about it. This level of generality is required in various contexts. For instance, in atmospheric imaging, random fluctuations of the index of refraction can seriously degrade the image and must be accounted for. Other examples are found in X-ray imaging, where the image formed by a phosphor screen-film system results from the stochastic amplification and the random scattering of quanta, and in applications where the recording device is subject to random motions. Pertinent statistical descriptions of various imaging systems can be found in [14], [69], [77], and [185]. On the other hand, there also exists a vast body of problems for which the noise-only model
$$(\forall n \in \mathbb{Z})\quad X_n = T_n(h) + V_n \tag{4.19}$$
is adequate. This is actually the model considered in [50], which resulted in a somewhat simpler analysis of the set construction process.

⁴ The $T_n$s are therefore continuous. Linearity of the $T_n$s will guarantee the convexity of the $S_i$s, while continuity of the $T_n$s will guarantee closedness of the $S_i$s [50].

3. Set Construction Method
Given a proposed estimate $a$ of $h$, the residual process $(Y_n(a))_{n\in\mathbb{Z}}$ is defined by
$$(\forall n \in \mathbb{Z})\quad Y_n(a) = X_n - T_n(a). \tag{4.20}$$
According to (4.16), the processes $(Y_n(h))_{n\in\mathbb{Z}}$ and $(U_n)_{n\in\mathbb{Z}}$ are equivalent, i.e.,
$$(\forall n \in \mathbb{Z})\quad Y_n(h) = U_n \quad \text{a.s.} \tag{4.21}$$
Therefore, they share the same probabilistic properties. Consequently, any known probabilistic property $\Psi_i$ of the uncertainty process constrains estimates to lie in the random set
$$S_i = \{a \in \Xi \mid (Y_n(a))_{n\in\mathbb{Z}} \text{ satisfies } \Psi_i\}. \tag{4.22}$$
Of course, this set cannot be utilized directly, since only a finite segment $Y(a) = (X_n - T_n(a))_{1 \le n \le L}$ of the residual process is observable in practice. We therefore replace (4.22) by the property set
$$S_i = \{a \in \Xi \mid Y(a) \text{ is consistent with } \Psi_i\}. \tag{4.23}$$
The above consistency statement can be formulated explicitly via statistical confidence theory. To this end, $\Psi_i$ is associated with a statistic $Q_i(h)$ of $Y(h)$ whose distribution, exact or asymptotic, is determined. The set (4.23) is then rewritten in the more practical form
$$S_i = \{a \in \Xi \mid Q_i(a) \in \Gamma_i\}, \tag{4.24}$$
where the confidence region $\Gamma_i$ is based on the distribution of $Q_i(h)$ and some confidence coefficient $1 - \varepsilon_i$, $\varepsilon_i \in\, ]0, 1[$, i.e.,
$$1 - \varepsilon_i = \mathrm{P}\{\omega \in \Omega \mid h \in S_i(\omega)\} = \mathrm{P}\{\omega \in \Omega \mid Q_i(h, \omega) \in \Gamma_i\}. \tag{4.25}$$
Henceforth, $L$ will designate the length of the sample path.
4. Sets Based on Range Information

The $L$ property sets arising from bounds on the amplitude range of the random variables $(U_n)_{n\in\mathbb{Z}}$ are
$$S_n = \{a \in \Xi \mid |X_n - T_n(a)| \le \delta_n\} \quad \text{for } 1 \le n \le L. \tag{4.26}$$
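Proposition 4.1 [50]. The sets $(S_n)_{1 \le n \le L}$ are closed and convex.

We shall now see how the parameters $(\delta_n)_{1 \le n \le L}$ can be determined from information on $(\langle h \mid \tilde t_n \rangle)_{n\in\mathbb{Z}}$ and $(V_n)_{n\in\mathbb{Z}}$. The energy $\|h\|^2$ of the original image is assumed to be known.

a. Bounded Error Model. Suppose that the random sequences $(\|\tilde t_n\|)_{n\in\mathbb{Z}}$ and $(V_n)_{n\in\mathbb{Z}}$ are a.s. uniformly bounded, say,
$$(\forall n \in \mathbb{Z})\quad \|\tilde t_n\| \le \kappa_1 \quad \text{and} \quad |V_n| \le \kappa_2 \quad \text{a.s.} \tag{4.27}$$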
Then, it follows from (4.17) and the Cauchy-Schwarz inequality that
$$(\forall n \in \mathbb{Z})\quad |U_n| \le |\langle h \mid \tilde t_n \rangle| + |V_n| \tag{4.28}$$
$$\le \|h\| \, \|\tilde t_n\| + |V_n| \tag{4.29}$$
$$\le \|h\| \kappa_1 + \kappa_2 \quad \text{a.s.} \tag{4.30}$$
This property places estimates in the set (4.26), where $\delta_n = \|h\| \kappa_1 + \kappa_2$. Let us note that $h$ lies in each $S_n$ almost surely. These sets can therefore be employed with a 100 percent confidence coefficient.
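The almost-sure membership of $h$ in every $S_n$ can be checked numerically. This sketch rests on illustrative assumptions that are not taken from the text:

- the known component is $T_n(a) = \langle a \mid t_n \rangle$, with given vectors $t_n$;
- each perturbation $\tilde t_n$ has components uniform in $[-c, c]$ with $c = \kappa_1/\sqrt{N}$, which guarantees $\|\tilde t_n\| \le \kappa_1$;
- the noise $V_n$ is uniform in $[-\kappa_2, \kappa_2]$.

```python
import math
import random

# Sketch of the bounded-error model of Section IV.C.4.a. Illustrative
# assumptions: T_n(a) = <a | t_n>; tilde_t_n has components uniform in
# [-c, c] with c = kappa1 / sqrt(N), so ||tilde_t_n|| <= kappa1; and V_n is
# uniform in [-kappa2, kappa2].

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

def bounded_error_delta(h_norm, kappa1, kappa2):
    """delta_n = ||h|| kappa1 + kappa2, as obtained from (4.28)-(4.30)."""
    return h_norm * kappa1 + kappa2

def in_range_set(a, x_n, t_n, delta_n):
    """Membership test for S_n in (4.26)."""
    return abs(x_n - inner(a, t_n)) <= delta_n

def simulate(h, rows, kappa1, kappa2, rng):
    """Draw X_n = <h | t_n> + <h | tilde_t_n> + V_n, following (4.16)-(4.17)."""
    c = kappa1 / math.sqrt(len(h))
    data = []
    for t_n in rows:
        tilde_t = [rng.uniform(-c, c) for _ in h]
        v_n = rng.uniform(-kappa2, kappa2)
        data.append(inner(h, t_n) + inner(h, tilde_t) + v_n)
    return data
```

Whatever the random draws, the true image passes every test `in_range_set(h, X_n, t_n, delta)`. This is the 100 percent confidence property stated above.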
b. I.I.D. Model. Suppose that the $\langle h \mid \tilde t_n \rangle$s and the $V_n$s are i.i.d. and that the distribution functions of $\|\tilde t_n\|$ and $V_n$ are known. Then, by virtue of (4.29), we can find $\delta \in \mathbb{R}$ such that
$$\mathrm{P}\{\omega \in \Omega \mid |U_0(\omega)| \le \delta\} = 1 - \varepsilon_n, \tag{4.31}$$
where $1 - \varepsilon_n$ is our preset confidence coefficient defined in (4.25). Since the $U_n$s are also i.i.d., all the points in the residual path should lie in the confidence interval $[-\delta, \delta]$ with probability $1 - \varepsilon_n$. Therefore, the $L$ sets of images that satisfy this constraint are given in (4.26), where $(\forall n \in \{1, \dots, L\})\ \delta_n = \delta$.

c. General Model. Let us assume that the distribution functions of the r.v.s. $(\|\tilde t_n\|)_{1 \le n \le L}$ are known as well as those of the r.v.s. $(V_n)_{1 \le n \le L}$. Then, thanks to (4.29), we can find parameters $(\delta_n)_{1 \le n \le L}$ for the sets (4.26) such that
$$\mathrm{P}\{\omega \in \Omega \mid |U_n(\omega)| \le \delta_n\} = 1 - \varepsilon_n. \tag{4.32}$$
5. Sets Based on Moment Information

It is assumed that the uncertainty process $(U_n)_{n\in\mathbb{Z}}$ consists of i.i.d. r.v.s. Extensions of the following results to dependent variables are possible thanks to the various central limit theorems that exist for mixing processes (see, e.g., [13]).

a. Mean. The sample mean of the uncertainty process is the statistic
$$M = \frac{1}{L} \sum_{n=1}^{L} U_n. \tag{4.33}$$
A straightforward application of the standard central limit theorem shows that, under our assumptions, $M$ is asymptotically normal with mean zero and variance $\sigma^2 = E|U_0|^2 / L$ [67]. The same property should therefore be satisfied by the residual process. Whence, for a given confidence coefficient, a confidence interval $[-\alpha, \alpha]$ for $M/\sigma$ is determined from the tables of the standard normal distribution by making the normal approximation. The set of images that yield a residual sample mean within this confidence interval is
$$S_M = \Big\{a \in \Xi \;\Big|\; \Big|\sum_{n=1}^{L} \big(X_n - T_n(a)\big)\Big| \le \alpha \sqrt{L \, E|U_0|^2}\Big\}. \tag{4.34}$$
The second moment $E|U_0|^2$ can be obtained from (4.18) if the required conditions are met. In general, the above assumptions on $(U_n)_{n\in\mathbb{Z}}$ are satisfied when the $\langle h \mid \tilde t_n \rangle$s and the $V_n$s are i.i.d. If, in addition, $v_1 = E\|\tilde t_0\|^2$ and $v_2 = E|V_0|^2$ are known, then, thanks to our hypotheses and the Cauchy-Schwarz inequality, the variance of $U_0$ can be majorized by
$$E|U_0|^2 = E|\langle h \mid \tilde t_0 \rangle|^2 + E|V_0|^2 \le \|h\|^2 v_1 + v_2 \tag{4.35}$$
to produce a useful bound in (4.34).
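The mean set (4.34) reduces to one scalar threshold and one residual sum. The sketch below makes three assumptions of its own:

- `statistics.NormalDist` supplies the two-sided quantile $\alpha$ in place of normal tables;
- the known component is $T_n(a) = \langle a \mid t_n \rangle$;
- the power is $\|h\|^2 v_1 + v_2$. This value is exact under the hypotheses of (4.18), where $v_1$ is the per-component variance. It is only an upper bound in (4.35), where $v_1 = E\|\tilde t_0\|^2$.

```python
import math
from statistics import NormalDist

# Sketch of the mean set S_M in (4.34). Assumptions: T_n(a) = <a | t_n>, and
# alpha is the two-sided standard normal quantile for confidence 1 - eps.

def uncertainty_power(h_energy, v1, v2):
    """||h||^2 v1 + v2: exact under the hypotheses of (4.18), a bound in (4.35)."""
    return h_energy * v1 + v2

def mean_set_threshold(L, power, eps):
    """Right-hand side alpha * sqrt(L E|U_0|^2) of (4.34)."""
    alpha = NormalDist().inv_cdf(1.0 - eps / 2.0)
    return alpha * math.sqrt(L * power)

def in_mean_set(a, data, rows, threshold):
    """Membership test |sum_n (X_n - <a | t_n>)| <= threshold."""
    residual_sum = sum(x_n - sum(ai * ti for ai, ti in zip(a, t_n))
                       for x_n, t_n in zip(data, rows))
    return abs(residual_sum) <= threshold
```

For $L = 100$, a power of $1.04$, and a 95 percent coefficient, the threshold is about $1.96\sqrt{104} \approx 19.99$. Replacing the exact power by the bound (4.35) can only enlarge the threshold, so the true image stays in the set with at least the stated asymptotic confidence.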
Proposition 4.2 [50]. $S_M$ is closed and convex.

b. Absolute Moments. Suppose that, for a fixed $p \in [1, +\infty[$, $U_0 \in \mathcal{L}^{2p}(\mathrm{P})$ and that the $p$th and $2p$th absolute moments of $U_0$ are known. The $p$th sample absolute moment of the uncertainty process is the statistic
$$M_p = \frac{1}{L} \sum_{n=1}^{L} |U_n|^p. \tag{4.36}$$
Under the above hypotheses, as the sample size $L$ tends to infinity, $M_p$ is asymptotically normal with mean $E|U_0|^p$ and variance $\sigma_p^2 = \big(E|U_0|^{2p} - E^2|U_0|^p\big)/L$ [67]. Therefore, by invoking the limiting distribution, one can compute a confidence interval $[-\alpha, \alpha]$ for $(M_p - E|U_0|^p)/\sigma_p$ based on
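some confidence coefficient. Hence, the subset of $\Xi$ of images which yield a residual sample absolute moment within the desired confidence interval is
$$S_p = \Big\{a \in \Xi \;\Big|\; \tau_1 \le \sum_{n=1}^{L} |X_n - T_n(a)|^p \le \tau_2\Big\}, \tag{4.37}$$
where
$$\tau_1 = L\big(E|U_0|^p - \alpha \sigma_p\big) \tag{4.38}$$
and
$$\tau_2 = L\big(E|U_0|^p + \alpha \sigma_p\big). \tag{4.39}$$
Let
$$S_p'' = \Big\{a \in \Xi \;\Big|\; \sum_{n=1}^{L} |X_n - T_n(a)|^p < \tau_1\Big\} \tag{4.40}$$
denote the convex deficiency of $S_p$ and
$$S_p' = \Big\{a \in \Xi \;\Big|\; \sum_{n=1}^{L} |X_n - T_n(a)|^p \le \tau_2\Big\} \tag{4.41}$$
its convex hull. Then $S_p = S_p' \setminus S_p''$.

Proposition 4.3 [50]. $S_p'$ is closed and convex.

In the particular case when $p = 2$ and $U_0$ is zero mean Gaussian, the exact distribution of $L M_2 / E|U_0|^2$ is a $\chi^2_L$ [67]. Thus, from the tables of the $\chi^2_L$, one can obtain a value of $\tau_2$ which is more accurate than that resulting from the normal approximation. As an example of computation of the parameter $\tau_2$, consider the case when the $\langle h \mid \tilde t_n \rangle$s and the $V_n$s are i.i.d. and $E\|\tilde t_0\|^{2p}$, $E\|\tilde t_0\|^p$, $E|V_0|^{2p}$, and $E|V_0|^p$ are known. Then the $U_n$s are also i.i.d., and their $p$th absolute moment can be majorized as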
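$$E|U_0|^p \le E\big(|\langle h \mid \tilde t_0 \rangle| + |V_0|\big)^p \tag{4.42}$$
$$\le 2^{p-1}\big(E|\langle h \mid \tilde t_0 \rangle|^p + E|V_0|^p\big) \tag{4.43}$$
$$\le 2^{p-1}\big(\|h\|^p E\|\tilde t_0\|^p + E|V_0|^p\big). \tag{4.44}$$
On the other hand, we can majorize $\sigma_p^2$ by $E|U_0|^{2p}/L$, which can itself be majorized as above.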
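The majorizations above give a computable upper limit $L(E|U_0|^p + \alpha\sigma_p)$ for the residual statistic $\sum_{n=1}^{L} |X_n - T_n(a)|^p$ of the absolute-moment construction. The sketch bounds $E|U_0|^p$ by (4.44). It bounds $E|U_0|^{2p}$ by the same argument with $2p$ in place of $p$, and $\sigma_p^2$ by $E|U_0|^{2p}/L$. All moment inputs are assumed known, as in the i.i.d. example of the text. The function names are illustrative.

```python
import math
from statistics import NormalDist

# Sketch: an upper limit for sum_n |X_n - T_n(a)|^p built from (4.42)-(4.44)
# and sigma_p^2 <= E|U_0|^{2p} / L. Moment inputs are assumed known.

def moment_bound(p, h_norm, t_moment, v_moment):
    """2^(p-1) (||h||^p E||t_0||^p + E|V_0|^p), i.e. the bound (4.44)."""
    return 2.0 ** (p - 1) * (h_norm ** p * t_moment + v_moment)

def upper_limit(L, p, h_norm, t_p, t_2p, v_p, v_2p, eps):
    """L (m_p + alpha sigma_p) with m_p and sigma_p both majorized."""
    alpha = NormalDist().inv_cdf(1.0 - eps / 2.0)
    m_p = moment_bound(p, h_norm, t_p, v_p)
    m_2p = moment_bound(2 * p, h_norm, t_2p, v_2p)
    sigma_p = math.sqrt(m_2p / L)
    return L * (m_p + alpha * sigma_p)

def absolute_moment_statistic(residual, p):
    """sum_n |y_n|^p over the observed residual path."""
    return sum(abs(y) ** p for y in residual)
```

Because each majorization enlarges the limit, the resulting convex set $\{a \mid \sum_n |X_n - T_n(a)|^p \le \text{limit}\}$ contains the set built from the exact moments. The price is a looser constraint.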
6. Sets Based on Second-Order Information

Since the processes are real-valued, the spectral distributions are defined on $[0, 1/2]$ (see [58] for details). It is assumed that $L$ is even (if not, $L/2$ should be replaced by $(L-1)/2$ hereafter).

a. Gaussian White Uncertainty Process

Theorem 4.1 [143]. Let $(U_n)_{n\in\mathbb{Z}}$ be a zero mean Gaussian discrete white noise process with power $\sigma^2$. Define
$$(\forall k \in \{0, \dots, L/2\})\quad I_k = \frac{2}{L} \Big|\sum_{n=1}^{L} U_n \exp(-i 2\pi k n / L)\Big|^2. \tag{4.45}$$
Then
(i) The statistics $(I_k)_{0 \le k \le L/2}$ are independent.
(ii) The statistics $I_0/(2\sigma^2)$ and $I_{L/2}/(2\sigma^2)$ have a $\chi^2_1$ distribution.
(iii) The statistics $(I_k/\sigma^2)_{0 < k < L/2}$ have a $\chi^2_2$ distribution.

Now, suppose that $(U_n)_{n\in\mathbb{Z}}$ satisfies the assumptions of Theorem 4.1. Then, from Theorem 4.1 and the tables of the $\chi^2_1$ and $\chi^2_2$ distributions, one can determine confidence intervals $[0, \rho_1]$ and $[0, \rho_2]$ for the r.v.s in (ii) and (iii), respectively. Consequently, the sets of images that produce a residual path consistent, to within a desired confidence coefficient $1 - \varepsilon_k$, with the whiteness and normality of the uncertainty process are
$$S_k = \Big\{a \in \Xi \;\Big|\; \Big|\sum_{n=1}^{L} \big(X_n - T_n(a)\big) \exp(-i 2\pi k n / L)\Big|^2 \le \delta_k\Big\}, \quad 0 \le k \le L/2, \tag{4.46}$$
where
$$(\forall k \in \{0, \dots, L/2\})\quad \delta_k = \begin{cases} L\sigma^2 \rho_1 & \text{if } k = 0 \text{ or } L/2,\\ L\sigma^2 \rho_2 / 2 & \text{if } 0 < k < L/2. \end{cases} \tag{4.47}$$

Proposition 4.4 [50]. The sets $(S_k)_{0 \le k \le L/2}$ are closed and convex.
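Note that since the $\chi^2_2$ distribution is simply an exponential distribution with parameter $1/2$, we have $\rho_2 = -2\ln(\varepsilon_k)$, and (4.47) reduces to
$$(\forall k \in \{0, \dots, L/2\})\quad \delta_k = \begin{cases} L\sigma^2 \rho_1 & \text{if } k = 0 \text{ or } L/2,\\ -L\sigma^2 \ln(\varepsilon_k) & \text{if } 0 < k < L/2. \end{cases} \tag{4.48}$$
In addition, it should be observed that, for $k = 0$, $S_k$ is essentially the same set as the mean set (4.34).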
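The spectral sets (4.46) with the thresholds (4.48) can be evaluated directly on a residual path. This is a minimal sketch under two simplifying assumptions that are not from the text. The same coefficient $1 - \varepsilon$ is used for every $k$. The $\chi^2_1$ quantile $\rho_1$ is obtained as the square of a two-sided standard normal quantile, rather than read from tables. The function `violated_sets` reports which sets reject a given residual.

```python
import cmath
import math
from statistics import NormalDist

# Sketch of the spectral sets (4.46) with thresholds (4.48) for a white
# Gaussian uncertainty process of known power sigma2. One coefficient
# 1 - eps is used for every k (an illustrative simplification).

def spectral_thresholds(L, sigma2, eps):
    if L % 2:
        raise ValueError("L is assumed even, as in the text")
    z = NormalDist().inv_cdf(1.0 - eps / 2.0)
    rho1 = z * z                      # upper (1 - eps) quantile of chi^2_1
    deltas = []
    for k in range(L // 2 + 1):
        if k in (0, L // 2):
            deltas.append(L * sigma2 * rho1)
        else:
            deltas.append(-L * sigma2 * math.log(eps))   # (4.48)
    return deltas

def spectral_statistics(residual):
    """|sum_{n=1}^{L} y_n exp(-i 2 pi k n / L)|^2 for k = 0, ..., L/2."""
    L = len(residual)
    return [abs(sum(y * cmath.exp(-2j * math.pi * k * n / L)
                    for n, y in enumerate(residual, start=1))) ** 2
            for k in range(L // 2 + 1)]

def violated_sets(residual, sigma2, eps):
    """Indices k whose set S_k in (4.46) rejects the given residual path."""
    stats = spectral_statistics(residual)
    deltas = spectral_thresholds(len(residual), sigma2, eps)
    return [k for k, (s, d) in enumerate(zip(stats, deltas)) if s > d]
```

A constant residual concentrates its energy at $k = 0$ and is rejected only by $S_0$, in line with the remark that $S_0$ acts like the mean set. An alternating residual is rejected only by $S_{L/2}$.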
b. Non-Gaussian White Uncertainty Process. Suppose that $(U_n)_{n\in\mathbb{Z}}$ is a discrete white noise process consisting of i.i.d. r.v.s all distributed as a zero mean r.v. $U_0 \in \mathcal{L}^4(\mathrm{P})$ with variance $\sigma^2$. Then the r.v.s in (ii) and (iii) of Theorem 4.1 are asymptotically distributed as a $\chi^2_1$ and a $\chi^2_2$, respectively [93]. Thus, under relatively mild conditions, the conclusions of Theorem 4.1 hold in an asymptotic sense. Consequently, since $L$ is typically large in image processing applications, the sets $(S_k)_{0 \le k \le L/2}$ of (4.46) can be used.

c. Correlated Uncertainty Process. In this section, we further generalize the analysis by dropping the whiteness assumption. We shall base the construction of a spectral set in this case on the following theorem.
Theorem 4.2 [143]. Let $(U_n)_{n\in\mathbb{Z}}$ be a zero mean, strictly stationary, strongly mixing process with summable second- and fourth-order cumulant functions and spectral density $g$. Let $0 = \nu_0 < \nu_1 < \cdots < \nu_m = 1/2$ and
$$(\forall k \in \{0, \dots, m\})\quad Z_k = \frac{2}{L} \Big|\sum_{n=1}^{L} U_n \exp(-i 2\pi \nu_k n)\Big|^2. \tag{4.49}$$
Then
(i) The statistics $(Z_k)_{0 \le k \le m}$ are asymptotically independent.
(ii) The statistics $Z_0/g(0)$ and $Z_m/g(1/2)$ are asymptotically distributed as a $\chi^2_1$.
(iii) The statistics $(2Z_k/g(\nu_k))_{1 \le k \le m-1}$ are asymptotically distributed as a $\chi^2_2$.

Loosely speaking, Theorem 4.2 states that if the span of dependence of the process is small enough, the results of Theorem 4.1 can be generalized for large $L$. Now suppose that $(U_n)_{n\in\mathbb{Z}}$ satisfies the hypotheses of Theorem 4.2 and that its spectral density $g$ is known at points $0 \le \nu_0 < \nu_1 < \cdots < \nu_m \le 1/2$. Then, given a confidence coefficient $1 - \varepsilon_k$, one can compute the confidence intervals $[0, \rho_1]$ and $[0, \rho_2]$ for the r.v.s in (ii) and (iii), respectively, by invoking their asymptotic properties [as before, note that $\rho_2 = -2\ln(\varepsilon_k)$]. Since $(U_n)_{n\in\mathbb{Z}}$ and $(Y_n(h))_{n\in\mathbb{Z}}$ are equivalent, this leads to the sets
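$$S_k = \Big\{a \in \Xi \;\Big|\; \Big|\sum_{n=1}^{L} \big(X_n - T_n(a)\big) \exp(-i 2\pi \nu_k n)\Big|^2 \le \xi_k\Big\} \quad \text{for } 0 \le k \le m, \tag{4.50}$$
where
$$\xi_0 = \begin{cases} L g(0) \rho_1 / 2 & \text{if } \nu_0 = 0,\\ L g(\nu_0) \rho_2 / 4 & \text{if } \nu_0 > 0, \end{cases} \tag{4.51}$$
$$(\forall k \in \{1, \dots, m-1\})\quad \xi_k = L g(\nu_k) \rho_2 / 4, \tag{4.52}$$
and
$$\xi_m = \begin{cases} L g(\nu_m) \rho_2 / 4 & \text{if } \nu_m < 1/2,\\ L g(1/2) \rho_1 / 2 & \text{if } \nu_m = 1/2. \end{cases} \tag{4.53}$$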
Naturally, Proposition 4.4 still holds for the sets (4.50). The processes $(\langle h \mid \tilde t_n \rangle)_{n\in\mathbb{Z}}$ and $(V_n)_{n\in\mathbb{Z}}$ have mean zero and are independent from each other. Therefore, if they possess spectral densities $g_1$ and $g_2$, respectively, the spectral density of $(U_n)_{n\in\mathbb{Z}}$ will be $g = g_1 + g_2$. In particular, if the $\tilde t_n$s are i.i.d. and if $(V_n)_{n\in\mathbb{Z}}$ is white with power $v_2$, $g$ will be defined as
$$(\forall \nu \in [0, 1/2])\quad g(\nu) = 2\big(E|\langle h \mid \tilde t_0 \rangle|^2 + v_2\big). \tag{4.54}$$
This expression will be evaluated as in (4.18) under suitable hypotheses or majorized as in (4.35) in general.

D. Information Management
In order to produce the most accurate set theoretic estimates, one should exploit all the information available in a given problem. Indeed, the larger the number of sets intersected in (1.6), the smaller the resulting feasibility set $S$. This statement, however, should be tempered by the requirement that the information be utilized efficiently and reliably. To process the available information efficiently, all the constraints that do not contribute to a significantly smaller feasibility set should be discarded, especially if their processing cost is high (meaning, for instance, that a projection method is employed to find a feasible solution and that the projections onto the associated sets are computationally involved). The issue of reliability comes into play when statistical constraints are present, as in Section IV.C. In that case, the feasibility set depends on a realization of the stochastic data process (4.16), and one will obtain a reliable set theoretic formulation only if the confidence level
$$c = \mathrm{P}\{\omega \in \Omega \mid h \in S(\omega)\} \tag{4.55}$$
on the solution set is sufficiently large, say, $c \ge 0.90$. In the jargon of Section III.A.3, $c$ is the probability of obtaining a fair set theoretic formulation. Of course, one has control only over the confidence coefficients $1 - \varepsilon_i$ placed on each property set in (4.25). It should be borne in mind that these coefficients should be determined in terms of the sets used and not preset to some ad hoc value. To illustrate this point, consider the scenario of Section IV.C.4.b and suppose that the $L$ sets (4.26) are to be used. If, as suggested in certain digital image recovery studies, one took $1 - \varepsilon = 0.99$ as a confidence coefficient on each set, one would arrive at an overall confidence of $c = 0.99^L = 0.99^{N^2} \approx 0$ (for a $256 \times 256$ image, $0.99^{65536}$ is about $10^{-286}$). Consequently, such a set theoretic formulation would be unlikely to be fair or even consistent, and would fail to represent the original image reliably. A 99 percent confidence on each set might be acceptable when just a few sets are used, e.g., mean and second moment, but not in large-scale problems. In general, the statistics $(Q_i)_{i\in I}$ defining the property sets (4.26), (4.34), (4.37), (4.46), and (4.50) may be dependent, and the relation between $c$ and the coefficients $1 - \varepsilon_i$ may be difficult to establish when joint distribution functions are not available. Such simultaneous inference problems are discussed in [118]. Coming back to the problem of using information efficiently, let us stress that in the presence of statistical constraints, a trade-off arises in the selection of property sets. Indeed, the confidence coefficient on each set must increase with the number of sets selected in order to maintain a fixed overall confidence in (4.55). Consequently, one ends up intersecting a larger number of larger sets. This certainly increases the complexity of the resulting feasibility problem while possibly having little effect on reducing the feasibility set. For instance, the information that the uncertainty process is white and Gaussian with mean zero and known power leads to an infinite number of sets of type (4.37), since all the absolute moments are then known. Of course, not all of them should be used. Thus, efficiency and reliability appear as two intertwined factors that should be carefully considered in selecting property sets.
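The bookkeeping behind this argument is easy to make concrete. For independent statistics, such as the $L$ i.i.d. range sets of Section IV.C.4.b, $c = \prod_i (1 - \varepsilon_i)$. For dependent statistics, the sketch brings in the Bonferroni inequality $c \ge 1 - \sum_i \varepsilon_i$. That inequality is a standard simultaneous-inference bound, not one taken from the text. The function names are illustrative.

```python
import math

# Sketch: per-set confidence coefficients versus the overall confidence c of
# (4.55). The independent case fits the i.i.d. range sets (4.26); the
# Bonferroni bound c >= 1 - sum_i eps_i is a standard tool for dependent
# statistics brought in for illustration.

def overall_confidence_independent(per_set, count):
    """c = (1 - eps)^count for independent statistics."""
    return per_set ** count

def per_set_confidence_independent(c, count):
    """Per-set 1 - eps giving overall confidence c for independent statistics."""
    return math.exp(math.log(c) / count)

def per_set_confidence_bonferroni(c, count):
    """Per-set 1 - eps guaranteeing overall confidence at least c in general."""
    return 1.0 - (1.0 - c) / count
```

For a $256 \times 256$ image ($L = 65536$ sets), `overall_confidence_independent(0.99, 65536)` is about $10^{-286}$. Reaching $c = 0.9$ instead requires a per-set coefficient of about $0.9999984$, and the Bonferroni route asks for slightly more. Either way, each range set becomes much larger than it would be with a 99 percent coefficient.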
V. SOLVING THE CONVEX FEASIBILITY PROBLEM

A. Introduction

The goal of this section is to describe methods to solve the convex feasibility problem (1.6). Recall that, unless otherwise stated, $(S_i)_{i\in I}$ is a countable family of closed and convex subsets of $\Xi$ with nonempty intersection $S$. The convex feasibility problem is a central problem in applied mathematics [11, 25, 38, 56, 111, 141], which can be formulated in various ways, such as:
1. Finding a common point of closed and convex sets;
2. Finding a common fixed point of nonexpansive operators;
3. Finding a common minimum of convex functionals;
4. Finding a common zero of maximal monotone operators;
5. Solving a system of variational inequalities;
6. Solving a system of convex inequalities.

Surveys of methods for solving such problems can be found in [25] and [38]. Since these surveys were written, the convex feasibility problem has been the focus of a significant research effort. As a result, a good part of the material presented here will be new. It should also be mentioned that two very important papers in this area were published in 1967 by Browder [19] and Gubin et al. [79]. The importance of fundamental concepts such as Fejér monotonicity, admissibility, and bounded regularity was stressed in these papers, and basic proof techniques were established. More recent work has mainly been geared toward various generalizations, especially in the direction of parallel algorithms. In Section V.B we shall first discuss the limitations of the popular POCS algorithm, which will motivate the subsequent developments on alternative algorithms to solve the convex feasibility problem. In Section V.C, we discuss a parallel projection method for solving inconsistent image feasibility problems in a least-squares sense. We then go back to consistent problems and discuss successively projection methods in Section V.D, approximate projection methods in Section V.E, subgradient projection methods in Section V.F, and finally fixed-point methods in Section V.G. These various approaches are considered from a higher perspective in Section V.H. For the sake of completeness, we shall maintain the discussion at a fairly general theoretical level. We are nonetheless aware of the more practical concerns of practicing engineers and scientists, who are interested mainly in digital image processing applications, in which recovery is performed on a digital computer with a finite number of constraints. Section V.I will be devoted to this framework, and a number of practical issues will be discussed there.
A few proofs have been included to illustrate the relevance of certain assumptions and give more theoretical insight into convergence issues.

B. The Limitations of the POCS Method
Let us recall that the POCS algorithm is defined by the iteration process
$$(\forall n \in \mathbb{N})\quad a_{n+1} = a_n + \lambda_n \big(P_{i(n)}(a_n) - a_n\big), \tag{5.1}$$
where the control is periodic, i.e.,
$$(\forall n \in \mathbb{N})\quad i(n) = n \ (\text{modulo } m) + 1 \quad \text{with} \quad m = \operatorname{card} I < +\infty, \tag{5.2}$$
and where the relaxation parameters satisfy
$$(\forall n \in \mathbb{N})\quad \varepsilon \le \lambda_n \le 2 - \varepsilon \quad \text{with} \quad 0 < \varepsilon < 1. \tag{5.3}$$
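Iteration (5.1)-(5.3) takes only a few lines once each projection is available as a function. The sketch below writes the periodic control 0-based and uses a constant relaxation parameter, which satisfies (5.3) for any value in $]0, 2[$. The two sets, a half-plane and a disk in $\mathbb{R}^2$, are illustrative choices.

```python
import math

# Sketch of POCS (5.1)-(5.3): periodic control i(n) = n (mod m) + 1, written
# 0-based below, with a constant relaxation parameter in ]0, 2[.

def pocs(a0, projections, relaxation=1.0, n_iter=1000):
    if not 0.0 < relaxation < 2.0:
        raise ValueError("relaxation must lie in ]0, 2[ to satisfy (5.3)")
    m = len(projections)
    a = list(a0)
    for n in range(n_iter):
        p = projections[n % m](a)       # only one set is activated per step
        a = [x + relaxation * (px - x) for x, px in zip(a, p)]
    return a

# Illustrative sets in R^2: the half-plane {x >= 1} and the disk of radius 2.
def proj_halfplane(a):
    return [max(a[0], 1.0), a[1]]

def proj_disk(a):
    r = math.hypot(a[0], a[1])
    return list(a) if r <= 2.0 else [2.0 * a[0] / r, 2.0 * a[1] / r]
```

Started from $(-3, 3)$, the unrelaxed iteration creeps toward the corner point $(1, \sqrt{3})$. With `relaxation=1.5`, this particular run lands inside the intersection after three steps. The example illustrates, but does not settle, the point made below: there is no general rule telling which relaxation is faster.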
As mentioned in Section III.C, POCS has been the prevalent solution method in convex set theoretic image recovery. It is nonetheless limited in several respects.

1. Serial Structure
A salient feature of POCS is its serial algorithmic structure: at each iteration, only one of the property sets can be activated. Clearly, such a structure does not lend itself naturally to implementations on architectures with parallel processors.

2. Slow Convergence
A problem with POCS which has long been recognized is its slow convergence. Conceptually, the algorithm can be accelerated by properly relaxing the projections at each iteration. Unfortunately, even for simple set theoretic formulations, there is no systematic method for determining $(\lambda_n)_{n\ge0}$ so as to speed up the iterations. For instance, when all the $S_i$s are affine half-spaces, there is no systematic answer as to whether underrelaxations are faster than overrelaxations or vice versa [85, 115]. Likewise, in the studies reported in [159], only heuristic rules for specific problems are given.

3. Inconsistent Problems

The convergence properties of the unrelaxed version of POCS, that is,
$$(\forall n \in \mathbb{N})\quad a_{n+1} = P_{n\,(\text{modulo } m) + 1}(a_n), \tag{5.4}$$
in inconsistent problems were studied in [79] (additional convergence results were recently established in [12]).

Theorem 5.1 [79]. Let $(a_n)_{n\ge0}$ be any sequence generated by (5.4) and suppose that one of the sets in $(S_i)_{1 \le i \le m}$ is bounded. Then there exist points $(\bar a_i)_{1 \le i \le m}$ such that $P_1(\bar a_m) = \bar a_1$ and $P_i(\bar a_{i-1}) = \bar a_i$ for every $i \in \{2, \dots, m\}$. Moreover, for every $i \in \{1, \dots, m\}$, the periodic subsequence $(a_{mn+i})_{n\ge0}$ converges weakly to such a point $\bar a_i \in S_i$.
In the particular case when $m = 2$, this theorem simply states that the sequence $(a_{2n+1})_{n\ge0}$ converges weakly to a point $\bar a_1 \in S_1$ such that $P_1(P_2(\bar a_1)) = \bar a_1$, i.e., to an image that satisfies property $\Psi_1$ and which is closest to satisfying $\Psi_2$ (this result is also discussed in [33], [76], and [183]). Beyond two sets, however, the above result has no useful interpretation and little practical value. It merely indicates that the limit image $\bar a_i$ lies in $S_i$ and, thereby, satisfies $\Psi_i$. Aside from $\Psi_i$, however, the properties of $\bar a_i$ are totally unknown, and there is no guarantee that any of the remaining constraints will be satisfied, even in an approximate sense. Such a solution clearly constitutes a poor approximation of a feasible image. Thus, the convergence behavior of POCS in the inconsistent case is generally unsatisfactory.

4. Countable Set Theoretic Formulations
Countable set theoretic formulations are of great theoretical interest, and they are also encountered in certain analog problems. POCS is limited to finite set theoretic formulations, and it cannot be used in such problems.

C. Inconsistent Problems
In this section, $(S_i)_{i\in I}$ is a finite family of $m$ sets whose intersection may be empty, and the strictly convex weights $(w_i)_{i\in I}$ are those of (3.7)/(3.8). Following [42], we present parallel projection methods to find least-squares solutions to inconsistent convex image feasibility problems. The problem of finding an image that minimizes a weighted average of the squares of the distances to the property sets is reformulated in the product space $\boldsymbol{\Xi}$ of Section II.H. There it is equivalent to finding a point that lies in the diagonal subspace $\boldsymbol{D}$ and at minimum distance from the Cartesian product $\boldsymbol{S}$ of the original sets. A solution is obtained in $\boldsymbol{\Xi}$ via methods of alternating projections, which lead naturally to methods of parallel projections in the original space $\Xi$.

1. Least-Squares Solutions
In inconsistent problems there exists no image possessing exactly all the properties $(\Psi_i)_{i\in I}$, but one can look for an image that satisfies them in some approximate sense. Let us consider the basic feasibility problem of solving a system of $m$ linear equations in $\mathbb{R}^k$. If the system is overdetermined, it is customary to look for a least-squares solution. In set theoretic terms, if $(S_i)_{i\in I}$ represents the family of hyperplanes of $\mathbb{R}^k$ associated with the equations (each normalized so that its coefficient vector has unit norm), this is equivalent to looking for a point $a^*$ which minimizes $\sum_{i\in I} d(a, S_i)^2$, the sum of the squares of the distances to the $S_i$s. Along the same lines, the exact feasibility problem (1.6) can be replaced by the weighted least-squares feasibility problem of minimizing the proximity function $\Phi$ of (3.7), that is,
$$\text{Find } a^* \in G = \{a \in \Xi \mid (\forall b \in \Xi)\ \Phi(a) \le \Phi(b)\}. \tag{5.5}$$
Of course, if $\bigcap_{i\in I} S_i \ne \varnothing$, the minimum value of the proximity function is 0, which is attained only on $G = \bigcap_{i\in I} S_i$, so that (1.6) and (5.5) coincide. In general, (5.5) can be viewed as an extension of (1.6), and $G$ is the set of least-squares solutions of the (possibly inconsistent) image feasibility problem. From an image processing point of view, such solutions are clearly more acceptable and useful than those generated by POCS, whose properties were seen to be elusive. It should be noted that in finite-dimensional spaces, and under certain conditions on the problem, (5.1) can solve (5.5) if the sequence of relaxation parameters $(\lambda_n)_{n\ge0}$ approaches zero [27, 133]. Experimental evidence first suggested this property in the inconsistent tomographic reconstruction problems of [85], where POCS was reported to provide better results with strong underrelaxations than without relaxations, as in (5.4). From a practical viewpoint, however, strong underrelaxations are not desirable, as they impose very small step sizes ($\|a_{n+1} - a_n\| = \lambda_n d(a_n, S_{i(n)})$) and, overall, excessively slow convergence.

2. Alternating Projections in a Product Space
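As shown in Section II.H, the original convex feasibility problem (1.6) can be recast in the $m$-fold product space as the new feasibility problem (2.47) of finding a point $\boldsymbol{a}^*$ common to the product $\boldsymbol{S}$ of the property sets and the diagonal subspace $\boldsymbol{D}$ of $\boldsymbol{\Xi}$. When $\bigcap_{i\in I} S_i = \varnothing$, then $\boldsymbol{S} \cap \boldsymbol{D} = \varnothing$, and the best approximate solution will be to find a point $\boldsymbol{a}^*$ in $\boldsymbol{D}$ which is at minimum distance from $\boldsymbol{S}$. This statement can be formalized by introducing the functional
$$\boldsymbol{\Phi}: \boldsymbol{D} \to \mathbb{R}_+ : \boldsymbol{a} \mapsto \frac{1}{2} d(\boldsymbol{a}, \boldsymbol{S})^2 \tag{5.6}$$
and calling $\boldsymbol{G}$ its set of minimizers.

Proposition 5.1 [42]. In the product space $\boldsymbol{\Xi}$, the weighted least-squares problem (5.5) is equivalent to minimizing $\boldsymbol{\Phi}$, i.e., to solving
$$\text{Find } \boldsymbol{a}^* \in \boldsymbol{G}. \tag{5.7}$$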
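Now let $P_{\boldsymbol{D}}$ and $P_{\boldsymbol{S}}$ be the operators of projection onto the sets $\boldsymbol{D}$ and $\boldsymbol{S}$. Then $\boldsymbol{G} = \operatorname{Fix} P_{\boldsymbol{D}} \circ P_{\boldsymbol{S}}$ [33], and the following theorem provides an alternating projection method to solve (5.7).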
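Theorem 5.2. Suppose that $\boldsymbol{G} \ne \varnothing$. Then, for any $\boldsymbol{a}_0$ in $\boldsymbol{D}$, every sequence of iterates $(\boldsymbol{a}_n)_{n\ge0}$ defined by
$$(\forall n \in \mathbb{N})\quad \boldsymbol{a}_{n+1} = \boldsymbol{a}_n + \lambda_n \big(P_{\boldsymbol{D}} \circ P_{\boldsymbol{S}}(\boldsymbol{a}_n) - \boldsymbol{a}_n\big), \tag{5.8}$$
where the relaxation parameters $(\lambda_n)_{n\ge0}$ satisfy (5.3), converges weakly to a point in $\boldsymbol{G}$.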
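Proof. Let $(\boldsymbol{a}_n)_{n\ge0}$ be any sequence generated by the algorithm. Let $T = P_{\boldsymbol{D}} \circ P_{\boldsymbol{S}}$, and fix $\boldsymbol{c} \in \boldsymbol{G} = \operatorname{Fix} T$ and $n \in \mathbb{N}$. Then $T: \boldsymbol{D} \to \boldsymbol{D}$ is firmly nonexpansive, as shown by the relations
$$(\forall (\boldsymbol{a}, \boldsymbol{b}) \in \boldsymbol{D}^2)\quad |||T(\boldsymbol{a}) - T(\boldsymbol{b})|||^2 \le |||P_{\boldsymbol{S}}(\boldsymbol{a}) - P_{\boldsymbol{S}}(\boldsymbol{b})|||^2 \le \langle\langle \boldsymbol{a} - \boldsymbol{b} \mid P_{\boldsymbol{S}}(\boldsymbol{a}) - P_{\boldsymbol{S}}(\boldsymbol{b}) \rangle\rangle = \langle\langle \boldsymbol{a} - \boldsymbol{b} \mid P_{\boldsymbol{D}}\big(P_{\boldsymbol{S}}(\boldsymbol{a}) - P_{\boldsymbol{S}}(\boldsymbol{b})\big) \rangle\rangle = \langle\langle \boldsymbol{a} - \boldsymbol{b} \mid T(\boldsymbol{a}) - T(\boldsymbol{b}) \rangle\rangle, \tag{5.9}$$
where we have used successively the nonexpansivity of $P_{\boldsymbol{D}}$, then the firm nonexpansivity of $P_{\boldsymbol{S}}$ [see Proposition 2.8(i)], and then (2.21), since $P_{\boldsymbol{D}}$ is linear and $\boldsymbol{a} - \boldsymbol{b} \in \boldsymbol{D}$. As $\boldsymbol{c} \in \operatorname{Fix} T$, (5.9) yields $\langle\langle \boldsymbol{a}_n - \boldsymbol{c} \mid T(\boldsymbol{a}_n) - \boldsymbol{c} \rangle\rangle \ge |||T(\boldsymbol{a}_n) - \boldsymbol{c}|||^2$ and therefore
$$\langle\langle T(\boldsymbol{a}_n) - \boldsymbol{c} \mid T(\boldsymbol{a}_n) - \boldsymbol{a}_n \rangle\rangle \le 0. \tag{5.10}$$
Whence,
$$\langle\langle \boldsymbol{a}_n - \boldsymbol{c} \mid T(\boldsymbol{a}_n) - \boldsymbol{a}_n \rangle\rangle = -|||T(\boldsymbol{a}_n) - \boldsymbol{a}_n|||^2 + \langle\langle T(\boldsymbol{a}_n) - \boldsymbol{c} \mid T(\boldsymbol{a}_n) - \boldsymbol{a}_n \rangle\rangle \le -|||T(\boldsymbol{a}_n) - \boldsymbol{a}_n|||^2. \tag{5.11}$$
Then (5.8), (5.11), and (5.3) imply
$$|||\boldsymbol{a}_{n+1} - \boldsymbol{c}|||^2 = |||\boldsymbol{a}_n - \boldsymbol{c}|||^2 + 2\langle\langle \boldsymbol{a}_n - \boldsymbol{c} \mid \boldsymbol{a}_{n+1} - \boldsymbol{a}_n \rangle\rangle + |||\boldsymbol{a}_{n+1} - \boldsymbol{a}_n|||^2$$
$$= |||\boldsymbol{a}_n - \boldsymbol{c}|||^2 + 2\lambda_n \langle\langle \boldsymbol{a}_n - \boldsymbol{c} \mid T(\boldsymbol{a}_n) - \boldsymbol{a}_n \rangle\rangle + \lambda_n^2 |||T(\boldsymbol{a}_n) - \boldsymbol{a}_n|||^2$$
$$\le |||\boldsymbol{a}_n - \boldsymbol{c}|||^2 - \lambda_n(2 - \lambda_n) |||T(\boldsymbol{a}_n) - \boldsymbol{a}_n|||^2 \tag{5.12}$$
$$\le |||\boldsymbol{a}_n - \boldsymbol{c}|||^2 - \varepsilon^2 |||T(\boldsymbol{a}_n) - \boldsymbol{a}_n|||^2 \tag{5.13}$$
$$\le |||\boldsymbol{a}_n - \boldsymbol{c}|||^2. \tag{5.14}$$
Hence, $(\boldsymbol{a}_n)_{n\ge0}$ is Fejér-monotone with respect to $\boldsymbol{G}$. According to Proposition 2.11, it possesses a weak cluster point $\boldsymbol{a}$, say $\boldsymbol{a}_{k_n} \rightharpoonup \boldsymbol{a}$, and it remains to show that $\boldsymbol{a} \in \boldsymbol{G}$. In view of (5.13), we have
$$|||\boldsymbol{a}_n - T(\boldsymbol{a}_n)|||^2 \le \varepsilon^{-2}\big(|||\boldsymbol{a}_n - \boldsymbol{c}|||^2 - |||\boldsymbol{a}_{n+1} - \boldsymbol{c}|||^2\big). \tag{5.15}$$
But since the nonnegative sequence $(|||\boldsymbol{a}_n - \boldsymbol{c}|||)_{n\ge0}$ is nonincreasing, it converges, and therefore $\boldsymbol{a}_n - T(\boldsymbol{a}_n) \to 0$. According to Proposition 2.8(i), $\mathrm{Id} - T$ is demiclosed, and therefore $(\mathrm{Id} - T)(\boldsymbol{a}) = 0$, since $\boldsymbol{a}_{k_n} \rightharpoonup \boldsymbol{a}$ and $(\mathrm{Id} - T)(\boldsymbol{a}_{k_n}) \to 0$. Whence $\boldsymbol{a} \in \operatorname{Fix} T = \boldsymbol{G}$. ∎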
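Iteration (5.8) can be read back in the original space $\Xi$. This sketch rests on an assumption consistent with the weighted construction of Section II.H but not spelled out here: the product-space inner product is $\langle\langle \boldsymbol{x} \mid \boldsymbol{y} \rangle\rangle = \sum_i w_i \langle x_i \mid y_i \rangle$, with positive weights summing to 1. Under that assumption, $P_{\boldsymbol{D}} \circ P_{\boldsymbol{S}}$ maps the diagonal point $(a, \dots, a)$ to the diagonal point built from $\sum_i w_i P_i(a)$. Iteration (5.8) then becomes $a_{n+1} = a_n + \lambda_n(\sum_i w_i P_i(a_n) - a_n)$, in which all the projections can be computed in parallel. The inconsistent example on the real line is illustrative.

```python
# Sketch of (5.8) in the original space, assuming the weighted product-space
# inner product <<x | y>> = sum_i w_i <x_i | y_i> (weights positive, sum 1),
# so that P_D o P_S acts on diagonal points as a -> sum_i w_i P_i(a).

def ppm(a0, projections, weights, relaxation=1.0, n_iter=200):
    if min(weights) <= 0.0 or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be positive and sum to 1")
    if not 0.0 < relaxation < 2.0:
        raise ValueError("relaxation must lie in ]0, 2[ to satisfy (5.3)")
    a = list(a0)
    for _ in range(n_iter):
        avg = [0.0] * len(a)
        for w, proj in zip(weights, projections):   # independent, parallelizable
            avg = [s + w * p for s, p in zip(avg, proj(a))]
        a = [x + relaxation * (s - x) for x, s in zip(a, avg)]
    return a

# Inconsistent example on the real line: S_1 = ]-inf, 0] and S_2 = [2, +inf[.
def proj_left(a):
    return [min(a[0], 0.0)]

def proj_right(a):
    return [max(a[0], 2.0)]
```

With equal weights, the iterates reach the least-squares point 1, midway between the sets. With weights $(1/4, 3/4)$, they reach 1.5. For contrast, unrelaxed POCS on the same pair alternates between the endpoints 0 and 2, as predicted by Theorem 5.1 for $m = 2$.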
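A pictorial description of (5.8) is given in Fig. 9. First, $\boldsymbol{s}_n = P_{\boldsymbol{S}}(\boldsymbol{a}_n)$ and $\boldsymbol{d}_n = P_{\boldsymbol{D}}(\boldsymbol{s}_n) = P_{\boldsymbol{D}} \circ P_{\boldsymbol{S}}(\boldsymbol{a}_n)$ are computed. Then $\boldsymbol{a}_{n+1}$ is positioned on the segment between $\boldsymbol{a}_n$ and $\boldsymbol{d}_n$ if $\varepsilon \le \lambda_n \le 1$, or on the segment between $\boldsymbol{d}_n$ and $2\boldsymbol{d}_n - \boldsymbol{a}_n$ if $1 \le \lambda_n \le 2 - \varepsilon$. As discussed in Section V.B.3, in the unrelaxed case, Theorem 5.2 follows from Theorem 5.1. A noteworthy property of (5.8) is that it can be viewed as a gradient method, as stated in the following proposition.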
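Proposition 5.2 [42]. Let $(a_n)_{n\ge0}$ be any sequence of iterates in Theorem 5.2. Then $(\boldsymbol{\Phi}(a_n))_{n\ge0}$ decreases until convergence and
$$(\forall n\in\mathbb{N})\quad a_{n+1}=a_n-\lambda_n\nabla_D\boldsymbol{\Phi}(a_n), \tag{5.16}$$
where $\nabla_D$ is the gradient operator in the Hilbert space $D$. Moreover, at iteration $n$, the relaxation parameter which is optimal in terms of bringing $a_{n+1}$ closest to an arbitrary point $a^*$ in $G$ is (for $a_n\notin G$)
$$\lambda_n^*=\frac{\langle\!\langle a^*-a_n\mid P_D\circ P_S(a_n)-a_n\rangle\!\rangle}{|||P_D\circ P_S(a_n)-a_n|||^2}\ge 1. \tag{5.17}$$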
We observe that the optimal relaxation parameter $\lambda_n^*$ depends on a solution point $a^*$, which of course is not known. Hence, optimal relaxations cannot be achieved in practice. However, the above proposition indicates that they are always overrelaxations. Strong convergence of the unrelaxed version of (5.8) can be proved if one makes additional assumptions on $S$, such as compactness [33], finite dimensionality [33], or uniform convexity [79]. The next theorem presents
FIGURE 9. PPM algorithm in the product space. © 1994 IEEE [42], with permission.
a strong convergence result for a variant of (5.8) which does not require special conditions.
Theorem 5.3. Suppose that $G\neq\varnothing$. Then, for any $a_0$ in $D$, every sequence of iterates $(a_n)_{n\ge0}$ defined by
$$(\forall n\in\mathbb{N})\quad a_{n+1}=(1-\alpha_n)a_0+\alpha_n\big(\lambda P_D\circ P_S(a_n)+(1-\lambda)a_n\big), \tag{5.18}$$
where $0<\lambda\le2$ and where $(\alpha_n)_{n\ge0}\subset[0,1[$ satisfies
(5.19)
converges strongly to $P_G(a_0)$.
Proof. Similar to that found in [42], except that we now use the more general conditions (5.19) allowed by a fixed-point theorem of [179].
Note that, as $n$ increases, (5.18) tends to behave like a constant-relaxation version of (5.8). Moreover, a simple example of a sequence $(\alpha_n)_{n\ge0}$ that satisfies (5.19) is
$$(\forall n\in\mathbb{N})\quad\alpha_n=\frac{n}{n+1}. \tag{5.20}$$
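The anchored iteration of Theorem 5.3 can be tried on a toy problem. The sketch below works directly in $\Xi$: by Proposition 5.4 (see (5.22) below), $P_D\circ P_S$ acts on diagonal points as the weighted average $\sum_i w_iP_i$, so (5.18) becomes the signal-space recursion written later as (5.26). The two half-spaces, the weights, the helper `halfspace`, and the iteration count are hypothetical choices; $\alpha_n$ follows the example (5.20), and $a_0=0$, so that the limit $P_G(a_0)$ is the minimum-norm point of $G$.

```python
import numpy as np

def halfspace(u, b):
    """Projector onto the closed half-space {x : <u|x> >= b} (hypothetical helper)."""
    u = np.asarray(u, dtype=float)
    def P(x):
        gap = b - u @ x
        return x + gap * u / (u @ u) if gap > 0 else x
    return P

def anchored_iteration(a0, projections, weights, lam=1.0, n_iter=20000):
    """a_{n+1} = (1 - alpha_n) a0 + alpha_n (lam * d_n + (1 - lam) a_n),
    with d_n = sum_i w_i P_i(a_n) and alpha_n = n / (n + 1) as in (5.20)."""
    a0 = np.asarray(a0, dtype=float)
    a = a0.copy()
    for n in range(n_iter):
        alpha = n / (n + 1)
        d = sum(w * P(a) for w, P in zip(weights, projections))
        a = (1 - alpha) * a0 + alpha * (lam * d + (1 - lam) * a)
    return a

# Hypothetical consistent pair S_1 = {x_1 + x_2 >= 2}, S_2 = {x_1 >= 0}: G is their
# intersection, and the point of G closest to a0 = 0 is (1, 1).
sets = [halfspace([1.0, 1.0], 2.0), halfspace([1.0, 0.0], 0.0)]
a = anchored_iteration(np.zeros(2), sets, [0.5, 0.5])
print(a)
```

In this example the error decays only roughly like $2/n$, which is typical of anchored schemes: the price paid for strong convergence to a specific point is slow progress.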
3. Simultaneous Projection Methods
In the previous section we have solved the least-squares feasibility problem (5.5) in the product space $\boldsymbol{\Xi}$. It remains to reformulate the solution methods in the original signal space $\Xi$, where they will actually be employed. First, we must secure conditions under which (5.5) admits solutions.
Proposition 5.3 [42, 55]. Suppose that either of the following conditions holds:
(i) One of the $S_i$'s is bounded;
(ii) All of the $S_i$'s are closed affine half-spaces.
Then $G\neq\varnothing$. Next, we need a point of passage from $\boldsymbol{\Xi}$ to $\Xi$.
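Proposition 5.4 [132]. We have
$$(\forall a\in D)\quad P_S(a)=(P_i(a))_{i\in I}\quad\text{and}\quad(\forall a\in\boldsymbol{\Xi})\quad P_D(a)=\Big(\sum_{i\in I}w_ia^{(i)},\dots,\sum_{i\in I}w_ia^{(i)}\Big). \tag{5.21}$$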
It follows from this proposition that
$$(\forall a\in D)\quad P_D\circ P_S(a)=\Big(\sum_{i\in I}w_iP_i(a),\dots,\sum_{i\in I}w_iP_i(a)\Big). \tag{5.22}$$
Therefore the alternating projection method (5.8) in $\boldsymbol{\Xi}$ yields the simultaneous projection method
$$(\forall n\in\mathbb{N})\quad a_{n+1}=a_n+\lambda_n\Big(\sum_{i\in I}w_iP_i(a_n)-a_n\Big) \tag{5.23}$$
in $\Xi$. We shall call the algorithm (5.23) with relaxation scheme (5.3) the parallel projection method (PPM). A salient feature of PPM is its parallelism: at every iteration the projections can be computed simultaneously on concurrent processors. Thus, the first phase of an iteration of PPM consists of projecting the current signal $a_n$ onto all the sets, a task which can be distributed among $m$ parallel processors. The second phase is a combination phase in which the projections computed by the $m$ processors are averaged to form $d_n=\sum_{i\in I}w_iP_i(a_n)$. The last phase consists of positioning the new iterate $a_{n+1}$ on the segment between $a_n$ and $2d_n-a_n$. This procedure is illustrated in Fig. 10. The weak convergence of PPM is a direct consequence of Theorem 5.2 and Proposition 2.12.
FIGURE 10. PPM algorithm in the original space. © 1994 IEEE [42], with permission.
Theorem 5.4. Suppose that $G\neq\varnothing$ (see Proposition 5.3). Then every orbit of PPM converges weakly to a point in $G$.
Special cases of PPM have already been studied in the literature via direct approaches in the original space. Thus, Theorem 5.4 generalizes a result of [54], which was restricted to half-spaces in finite-dimensional spaces and could therefore be applied only to linear inequality constraints. It also generalizes a result of […], which assumed constant relaxations in (5.23). The following proposition is a consequence of (2.40) and Propositions 5.2 and 5.4.
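Proposition 5.5. Let $(a_n)_{n\ge0}$ be any orbit of PPM. Then $(\Phi(a_n))_{n\ge0}$ decreases until convergence and
$$(\forall n\in\mathbb{N})\quad a_{n+1}=a_n-\lambda_n\nabla\Phi(a_n), \tag{5.24}$$
where $\nabla$ is the gradient operator in $\Xi$. Moreover, at iteration $n$, the relaxation parameter that will bring $a_{n+1}$ closest to a solution point $a^*$ in $G$ is
$$\lambda_n^*=\frac{\big\langle a^*-a_n\mid\sum_{i\in I}w_iP_i(a_n)-a_n\big\rangle}{\big\|\sum_{i\in I}w_iP_i(a_n)-a_n\big\|^2}. \tag{5.25}$$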
Although the product space formalism is well suited to analyzing and developing projection methods, it is sometimes limited when it comes to strong convergence properties, as it imposes conditions on the whole set $S$. For instance, compactness of $S$ guarantees strong convergence of (5.8) in $\boldsymbol{\Xi}$, but it translates into compactness of all the sets $(S_i)_{i\in I}$ in $\Xi$. As we shall now see, much less restrictive conditions can be obtained via a direct approach in $\Xi$.
Theorem 5.5 [46]. Every orbit of PPM converges strongly to a point in $G$ if any of the following conditions is satisfied.
(i) $(S_i)_{i\in I}$ contains only closed affine half-spaces.
(ii) $(S_i)_{i\in I}$ contains only uniformly convex sets.
(iii) $(S_i)_{i\in I}$ contains a boundedly compact set and a bounded set.
An alternative strong convergence result which does not place any restriction on the sets is the following.
Theorem 5.6. Suppose that $G\neq\varnothing$ (see Proposition 5.3). Then, for any $a_0$ in $\Xi$, every sequence of iterates $(a_n)_{n\ge0}$ defined by
$$(\forall n\in\mathbb{N})\quad a_{n+1}=(1-\alpha_n)a_0+\alpha_n\Big(\lambda\sum_{i\in I}w_iP_i(a_n)+(1-\lambda)a_n\Big), \tag{5.26}$$
where $(\alpha_n)_{n\ge0}$ is as in (5.19) and $0<\lambda\le2$, converges strongly to the projection of $a_0$ onto $G$.
Proof. Thanks to (5.22), (5.18) in $\boldsymbol{\Xi}$ yields (5.26) in $\Xi$. It then follows from Proposition 2.12 that Theorem 5.6 is a corollary of Theorem 5.3.
It is worth noting that (5.26) not only converges strongly to a least-squares-feasible solution but also guarantees that this solution is the closest to the initial point $a_0$. Even in consistent problems, this property is very valuable in certain image recovery applications, when one seeks the best feasible approximation of a reference image $a_0$ [39] (in comparison, the method developed in [105] is limited to the case $m=2$ and is relatively involved).⁵ As an example, one who seeks a least-squares-feasible image with minimum energy can take $a_0$ to be the zero image. It then follows from Theorem 5.6 that the iterations
$$(\forall n\in\mathbb{N})\quad a_{n+1}=\alpha_n\Big(\lambda\sum_{i\in I}w_iP_i(a_n)+(1-\lambda)a_n\Big) \tag{5.27}$$
will converge strongly to the desired solution.
D. Projection Methods
1. Panorama
Although POCS has been the focus of most of the attention in image recovery, other projection methods have been available, some for almost three decades, that overcome some of its shortcomings. We discuss here three frameworks that, in our opinion, contain interesting features.
a. Framework 1: Browder’s Admissible Control. In POCS the control sequence $(i(n))_{n\ge0}$ imposes that the sets be activated in periodic order. As mentioned in Section V.B.4, this periodic control mode can be implemented only when $\operatorname{card}I<+\infty$. An alternative way of defining the control sequence $(i(n))_{n\ge0}$ is to require that each set $S_i$ be activated at least once within any cycle of $M_i$ consecutive iterations, that is,
$$(\forall i\in I)(\exists M_i\in\mathbb{N}^*)(\forall n\in\mathbb{N})\quad i\in\{i(n),\dots,i(n+M_i-1)\}. \tag{5.28}$$
⁵ In consistent problems (i.e., $G=S$), a slight extension of a result of [112] shows that (5.26) can be replaced by $(\forall n\in\mathbb{N})\ a_{n+1}=(1-\alpha_n)a_0+\alpha_nP_1\circ\cdots\circ P_m(a_n)$ in Theorem 5.6. In addition, for both methods, strong convergence to the projection of $a_0$ onto $S$ remains true if each $P_i$ is replaced by any firmly nonexpansive operator $T_i$ such that $\operatorname{Fix}T_i=S_i$ [43, 112].
For $I=\mathbb{N}^*$ and $M_i=2^i$, an example of an admissible control sequence is
$$(i(n))_{n\ge0}=(1,2,1,3,1,2,1,4,1,2,1,3,1,2,1,5,1,2,1,3,1,2,1,4,1,2,1,3,1,2,1,6,1,2,1,3,1,2,1,4,1,2,1,3,1,2,1,5,1,2,1,3,1,2,1,4,1,2,1,3,1,2,1,7,1,2,1,3,1,2,1,4,1,2,1,3,1,2,1,5,1,2,1,3,1,2,1,4,1,2,1,3,1,2,1,6,1,2,1,3,1,2,1,4,1,2,1,3,1,2,\dots). \tag{5.29}$$
It is noted that periodic control is a particular case of admissible control. Hence, the following theorem due to Browder generalizes the weak convergence result of POCS found in Theorem 3.2.
Theorem 5.7 [19]. Suppose that $I$ is any nonempty subset of $\mathbb{N}$. Then every sequence generated by the serial algorithm (5.1) with relaxation strategy (5.3) and admissible control scheme (5.28) converges weakly to a point in $S$.
An even more general control scheme is the so-called chaotic control scheme, which imposes only that every set be used infinitely often, i.e.,
$$(\forall i\in I)(\forall n\in\mathbb{N})\quad i\in\{i(n),i(n+1),\dots\}. \tag{5.30}$$
This condition goes back to the work of Poincaré on boundary problems [134], who gave the following example for $I=\mathbb{N}^*$:
$$(i(n))_{n\ge0}=(1,2,1,2,3,1,2,3,4,1,2,3,4,5,1,2,3,4,5,6,\dots). \tag{5.31}$$
However, the result of Theorem 5.7 no longer holds in this case (even in finite-dimensional spaces [40]), and some restrictions are needed.
Theorem 5.8. Every sequence generated by the unrelaxed version of the serial algorithm (5.1) under chaotic control converges weakly to a point in $S$ if any of the following conditions holds.
(i) $(S_i)_{i\in I}$ is a finite family of closed vector subspaces [4].
(ii) $(S_i)_{i\in I}$ is a finite family containing a weak interior point [61], i.e.,
$$(\exists w\in S)(\forall c\in S)(\exists\rho\in\,]0,+\infty[)\quad w+\rho(w-c)\in S. \tag{5.32}$$
(iii) $\operatorname{card}I=3$ [61].
A result similar to (ii) can also be found in [182]. If instead of merely a weak interior point we require the existence of an interior point for $S$, then strong convergence takes place for countable families.
Theorem 5.9. Suppose that $I$ is any nonempty subset of $\mathbb{N}$ and that $\operatorname{int}S\neq\varnothing$. Then every sequence $(a_n)_{n\ge0}$ generated by the serial algorithm (5.1) with relaxation strategy (5.3) and chaotic control scheme (5.30) converges strongly to a point in $S$.
Proof. First of all, $(a_n)_{n\ge0}$ is Fejér-monotone with respect to $S$. Indeed, by fixing $c\in S$ and following a procedure similar to that of the proof of Theorem 5.2, we arrive at
$$\begin{aligned}
(\forall n\in\mathbb{N})\quad\|a_{n+1}-c\|^2 &\le \|a_n-c\|^2-\varepsilon^2\|P_{i(n)}(a_n)-a_n\|^2 \qquad (5.33)\\
&\le \|a_n-c\|^2. \qquad (5.34)
\end{aligned}$$
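According to Proposition 2.11(iv), there exists a point $a\in\Xi$ such that $a_n\to a$, and it remains to show that $a\in S$. Take an arbitrary $i\in I$. Since the control is chaotic, there exists an increasing sequence $(n_k)_{k\ge0}\subset\mathbb{N}$ such that $(\forall k\in\mathbb{N})\ i=i(n_k)$. Therefore (5.33) yields
$$(\forall k\in\mathbb{N})\quad\|P_i(a_{n_k})-a_{n_k}\|^2\le\varepsilon^{-2}\big(\|a_{n_k}-c\|^2-\|a_{n_k+1}-c\|^2\big). \tag{5.35}$$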
As in the proof of Theorem 5.2, we obtain $P_i(a_{n_k})-a_{n_k}\to0$. But since $a_{n_k}\to a$, we get $P_i(a_{n_k})\to a$. However, $(P_i(a_{n_k}))_{k\ge0}\subset S_i$ and $S_i$ is closed. Therefore $a\in S_i$. Since $i$ was arbitrary, we conclude that $a\in\bigcap_{i\in I}S_i=S$.
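Theorem 5.9 can be illustrated with the serial algorithm (5.1) driven by a chaotic control. In the hypothetical setting below, the sets are the eight half-spaces bounding a regular octagon (so $0\in\operatorname{int}S$), Poincaré's sequence (5.31) is folded onto the indices $0,\dots,7$ (each index still recurs infinitely often), and $\lambda_n=1.5$. The sketch records $\|a_n-c\|$ for $c=0\in S$ to exhibit the Fejér monotonicity (5.34) and checks that the final iterate is feasible.

```python
import numpy as np
from itertools import islice

def poincare_control():
    length = 2
    while True:
        yield from range(1, length + 1)
        length += 1

m = 8                                   # hypothetical: the sides of a regular octagon
U = np.array([[np.cos(k * np.pi / 4), np.sin(k * np.pi / 4)] for k in range(m)])
B = np.ones(m)                          # S_k = {x : <u_k|x> <= 1}, u_k of unit norm

def project(k, x):
    excess = U[k] @ x - B[k]
    return x - excess * U[k] if excess > 0 else x

x = np.array([5.0, 3.0])
lam = 1.5                               # relaxation in [eps, 2 - eps], as in (5.3)
dist_to_c = [np.linalg.norm(x)]         # c = 0 lies in S
for i in islice(poincare_control(), 500):
    k = (i - 1) % m                     # fold the chaotic sequence onto the m sets
    x = x + lam * (project(k, x) - x)   # serial relaxed projection step (5.1)
    dist_to_c.append(np.linalg.norm(x))

print(np.max(U @ x - B) <= 1e-12)
print(all(b <= a + 1e-12 for a, b in zip(dist_to_c, dist_to_c[1:])))
```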
b. Framework 2: Pierra’s Extrapolated Iterations. We have seen in Sections V.B.1 and V.B.2 that POCS suffered from slow convergence and that it was not well suited to take advantage of parallel computing. It would be erroneous, however, to conclude that a parallel projection method is always faster than a serial one just because it can process projections simultaneously as opposed to sequentially. Thus, the parallel algorithm SIRT (3.11) was found to be actually slower than the serial algorithm ART (3.10) in tomographic image reconstruction [85]. In our numerical simulations we have also found that (3.11) is usually slower than unrelaxed POCS (5.4) in a number of problems involving general convex constraints. This fact can be illustrated by comparing Fig. 6 and Fig. 11. An advantage of a parallel projection method such as PPM (5.23) is that it can be accelerated by overrelaxations, which is not true for serial algorithms. In fact, overrelaxations have been reported to accelerate parallel projection methods in a number of studies, e.g., [21], [46], [60], and [92]. To explain this, note that the efficient progression of a general relaxed algorithm of the type $(\forall n\in\mathbb{N})\ a_{n+1}=a_n+\lambda_n(d_n-a_n)$ toward a solution depends on two factors at every iteration $n$:
1. Centering: In order to avoid "zigzagging," the iterations should remain centered with respect to the sets so that the directions taken by the algorithm keep pointing to the solution set $S$.
2. Relaxation: At every iteration, $\lambda_n$ should place $a_{n+1}$ close to $S$ on the ray emanating from $a_n$ and going through $d_n$.
FIGURE 11. SIRT algorithm.
In the case of a serial algorithm such as (5.1), $d_n$ is the projection onto a single set $S_{i(n)}$, and therefore the algorithm will keep moving in different directions and will tend to zigzag. By contrast, since PPM averages the projections, its centering is much better, which takes care of condition 1 above. On the other hand, Proposition 5.2 takes care of condition 2, as it indicates that overrelaxations will bring the update closer to $S$. In PPM, however, overrelaxations were limited to 2 in order to guarantee convergence in inconsistent problems. We shall now follow the work of Pierra [132], who showed that in consistent problems this condition can be bypassed and much larger relaxations can be obtained. In order to define an alternative relaxation strategy, let us return to the product space formalism of Section II.H, in which the convex feasibility problem was seen to reduce to (2.47). Now consider Fig. 12, where $a_n\in D\cap\complement S$, $s_n=P_S(a_n)$, and $d_n=P_D(s_n)=P_D\circ P_S(a_n)$. Let $H_n$ be the affine hyperplane supporting $S$ at $s_n$. Then $H_n$ separates $a_n$ from $S$ and intersects $D$ at a point $e_n$. Note that
$$e_n=a_n+\frac{|||s_n-a_n|||^2}{|||d_n-a_n|||^2}\,(d_n-a_n). \tag{5.37}$$
Hence, returning to the alternating projection method (5.8), an update $a_{n+1}$ on the segment between $a_n$ and $e_n$ will be obtained by taking relaxations up to
$$L_n=\frac{|||P_S(a_n)-a_n|||^2}{|||P_D\circ P_S(a_n)-a_n|||^2}. \tag{5.38}$$
FIGURE 12. EPPM algorithm in the product space.
Note that we always have $L_n\ge1$. Indeed, since $a_n\in D$, the nonexpansivity of $P_D$ yields
$$\begin{aligned}
|||P_D\circ P_S(a_n)-a_n||| &= |||P_D(P_S(a_n))-P_D(a_n)||| \qquad (5.39)\\
&\le |||P_S(a_n)-a_n|||. \qquad (5.40)
\end{aligned}$$
Proposition 5.6 [132]. Every sequence $(a_n)_{n\ge0}\subset D$ constructed as in (5.8) with relaxation strategy
$$(\forall n\in\mathbb{N})\quad\varepsilon\le\lambda_n\le L_n,\quad\text{where }0<\varepsilon<1, \tag{5.41}$$
converges weakly to a point in $S\cap D$.
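We can recast this result in the original image space $\Xi$ via Proposition 5.4 to obtain Pierra's extrapolated parallel projection method (EPPM), which is described by the algorithm (5.23) with relaxation range $\varepsilon\le\lambda_n\le L_n$, where
$$L_n=\begin{cases}\dfrac{\sum_{i\in I}w_i\|P_i(a_n)-a_n\|^2}{\big\|\sum_{i\in I}w_iP_i(a_n)-a_n\big\|^2} & \text{if }a_n\notin S,\\[1ex] 1 & \text{otherwise.}\end{cases} \tag{5.42}$$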
FIGURE 13. EPPM algorithm in the original space.
The weak convergence of EPPM follows immediately from Propositions 5.6 and 2.12.
Theorem 5.10 [132]. Every orbit of EPPM converges weakly to a point in $S$.
It was observed in [132] that the fast convergence of EPPM was due to the large overrelaxations allowed by (5.42). In fact, $L_n$ can attain values much larger than 2 and eliminate the "angle problem" of conventional methods: one can see in Figs. 6 and 11 that the iterations slow down as the angle between the two sets diminishes. On the other hand, EPPM is not sensitive to this problem, as seen in Fig. 12. Figure 13 shows a realization of EPPM with equal weights on the projections and $(\forall n\in\mathbb{N})\ \lambda_n=L_n$. It compares favorably with POCS (Fig. 6) and SIRT (Fig. 11). In order to mitigate the possible zigzagging that could take place with large relaxations and could reduce the effectiveness of the algorithm, it was suggested in [132] to recenter the orbit every 3 iterations by halving the extrapolations, namely,
$$(\forall n\in\mathbb{N})\quad\lambda_n=\begin{cases}L_n/2 & \text{if }n\equiv2\pmod 3,\\ L_n & \text{otherwise.}\end{cases} \tag{5.43}$$
We saw in Proposition 5.5 that PPM was a steepest-descent method for the proximity function of (3.7). By invoking (2.16), EPPM can be written in the form
$$(\forall n\in\mathbb{N})\quad a_{n+1}=a_n-\alpha_n\frac{\Phi(a_n)}{\|\nabla\Phi(a_n)\|^2}\nabla\Phi(a_n)\quad\text{with}\quad\varepsilon\le\alpha_n\le2-\varepsilon, \tag{5.44}$$
which was shown in [40] to be a particular case of the Gauss-Newton method studied in [136]. In that paper the Gauss-Newton approach was reported to converge more efficiently than the standard steepest-descent approach (5.24). This furnishes another justification for the superiority of EPPM over PPM.⁶ It should be noted that EPPM does not generalize PPM, for the relaxation ranges (5.42) are not necessarily wider than (5.3). Indeed, we have seen that the extrapolation parameter $L_n$ is at least equal to 1, but it is not necessarily greater than 2.⁷ Therefore, to unify and extend both PPM and EPPM, we shall now consider relaxations up to $2L_n$, i.e.,
$$(\forall n\in\mathbb{N})\quad\varepsilon\le\lambda_n\le(2-\varepsilon)L_n. \tag{5.45}$$
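This extension will also allow faster convergence in certain problems through the use of larger overrelaxations than those allowed by PPM and EPPM. To justify this extension more rigorously, let us go back to Proposition 5.2. In the consistent case, it states that at iteration $n$ the relaxation parameter which brings $a_{n+1}$ closest to an arbitrary point $a^*$ in $S\cap D$ is
$$\begin{aligned}
\lambda_n^* &= \frac{\langle\!\langle a^*-a_n\mid P_D\circ P_S(a_n)-a_n\rangle\!\rangle}{|||P_D\circ P_S(a_n)-a_n|||^2} \qquad (5.46)\\
&= \frac{\langle\!\langle a^*-a_n\mid P_S(a_n)-a_n\rangle\!\rangle}{|||P_D\circ P_S(a_n)-a_n|||^2} \qquad (5.47)\\
&= \frac{|||P_S(a_n)-a_n|||^2+\langle\!\langle a^*-P_S(a_n)\mid P_S(a_n)-a_n\rangle\!\rangle}{|||P_D\circ P_S(a_n)-a_n|||^2} \qquad (5.48)\\
&= L_n+\frac{\langle\!\langle P_S(a_n)-a_n\mid a^*-P_S(a_n)\rangle\!\rangle}{|||P_D\circ P_S(a_n)-a_n|||^2} \qquad (5.49)\\
&\ge L_n. \qquad (5.50)
\end{aligned}$$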
⁶ This statement applies only to consistent problems. PPM was developed for inconsistent problems, where relaxations cannot be extended beyond 2 (see Fig. 9).
⁷ Cases when $L_n\le2$ can easily be constructed, e.g., [22].
Thus, extending the relaxation range to $]0,2L_n[$ opens the possibility of getting closer to the optimal relaxation parameter, which lies in $[L_n,+\infty[$. Let us also note that, according to (2.21), $\langle\!\langle P_S(a_n)-a_n\mid a^*-P_S(a_n)\rangle\!\rangle=0$ if $S$ is a closed affine subspace. Whence, (5.49) shows that $\lambda_n^*=L_n$ in this case. These results can be routinely transferred to the original space $\Xi$ as follows.
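Proposition 5.7. At iteration $n$, the relaxation parameter that will bring $a_{n+1}$ closest to a solution point $a^*$ in $S$ is
$$\lambda_n^*=L_n+\frac{\sum_{i\in I}w_i\langle P_i(a_n)-a_n\mid a^*-P_i(a_n)\rangle}{\big\|\sum_{i\in I}w_iP_i(a_n)-a_n\big\|^2}. \tag{5.51}$$
In addition, if $(S_i)_{i\in I}$ is a finite family of closed affine subspaces, then
$$\lambda_n^*=L_n. \tag{5.52}$$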
We shall call EPPM2 the algorithm obtained by combining (5.23) and (5.45).
Theorem 5.11 [41]. Every orbit of EPPM2 converges weakly to a point in $S$.
c. Framework 3: Block-Parallel Methods. A limitation of parallel methods such as EPPM2 is that all the sets must be acted upon at each iteration. If the number of sets is larger than the number of concurrent processors available, the implementation of the algorithm will not be fully parallel. At iteration $n$, a flexible adaptation of the computational load to the parallel computing architecture at hand can be obtained by activating only a subfamily $(S_i)_{i\in I_n}$ of property sets. If, in addition, we allow the weights on the projections to vary at each iteration, we obtain an iterative method of the form
$$(\forall n\in\mathbb{N})\quad a_{n+1}=a_n+\lambda_n\Big(\sum_{i\in I_n}w_{i,n}P_i(a_n)-a_n\Big), \tag{5.53}$$
where $(I_n)_{n\ge0}$ is a sequence of subsets of $I$ and $((w_{i,n})_{i\in I_n})_{n\ge0}$ a sequence of convex weights. Let us observe that if only one set is processed at each iteration, say $(\forall n\in\mathbb{N})\ I_n=\{i(n)\}$, then (5.53) reverts to the serial method (5.1). On the other hand, if all the sets are processed at each iteration, i.e., $(\forall n\in\mathbb{N})\ I_n=I$, we obtain the simultaneous projection method (5.23). Algorithms of the general form (5.53) have been proposed with various assumptions on the dimension of $\Xi$, the sets $(S_i)_{i\in I}$, the relaxation parameters $(\lambda_n)_{n\ge0}$, the weights $((w_{i,n})_{i\in I_n})_{n\ge0}$, and the control sequence $(I_n)_{n\ge0}$ [3, 21, 22, 26, 46, 126]. In the next section we present a projection method based on (5.53) which encompasses and generalizes these approaches.
2. Extrapolated Method of Parallel Projections (EMOPP)
Each of the three frameworks discussed in the previous section has an attractive feature. Framework 1 provides flexible control schemes that can handle an infinite number of sets; framework 2 provides extrapolated iterations that converge efficiently; framework 3 provides a flexible management of the property sets that can easily be adapted to the configuration of a parallel computer. These attractive features can be combined into a single algorithm, the extrapolated method of parallel projections (EMOPP), which we now describe. Given an initial point $a_0\in\Xi$ and numbers $C\in\mathbb{N}^*$, $\delta\in\,]0,1/C[$, and $\varepsilon\in\,]0,1[$, EMOPP is defined by the iterative process
$$(\forall n\in\mathbb{N})\quad a_{n+1}=a_n+\lambda_n\Big(\sum_{i\in I_n}w_{i,n}P_i(a_n)-a_n\Big), \tag{5.54}$$
where at each iteration $n$:
(a) The family $I_n$ of indices of selected sets satisfies
$$\varnothing\neq I_n\subset I\quad\text{and}\quad\operatorname{card}\{i\in I_n\mid a_n\notin S_i\}\le C. \tag{5.55}$$
(b) The weights on the projections satisfy
$$\sum_{i\in I_n}w_{i,n}=1\quad\text{and}\quad(\forall i\in I_n)\quad w_{i,n}\ge\delta\,1_{\complement S_i}(a_n). \tag{5.56}$$
(5.57)
Practically, iteration $n$ of EMOPP is performed as follows. First, one selects the sets to be activated; $I_n$ contains the indices of these sets. One then computes the projections $(P_i(a_n))_{i\in I_n}$ of the current iterate $a_n$ onto the selected sets and determines a convex combination $d_n=\sum_{i\in I_n}w_{i,n}P_i(a_n)$ of these projections as well as the extrapolation parameter $L_n$. The position of the new iterate $a_{n+1}$ on the segment between $a_n$ and $a_n+2L_n(d_n-a_n)$ is determined by the relaxation parameter $\lambda_n$.
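The steps of an EMOPP iteration can be sketched as follows. This is an illustration under stated assumptions, not the implementation of [40]: the property sets are twelve hypothetical half-spaces bounding a regular 12-gon, the control is cyclic (5.59) with blocks of three sets, the weights are equal on the violated sets of the block and zero elsewhere (which meets (5.56) whenever $\delta<1/C$), and $L_n$ is assumed to be the EPPM-type ratio $\sum_iw_{i,n}\|P_i(a_n)-a_n\|^2/\|d_n-a_n\|^2$. The relaxation is $\lambda_n=L_n$, which lies in $[\varepsilon,(2-\varepsilon)L_n]$.

```python
import numpy as np

m = 12
U = np.array([[np.cos(2 * np.pi * j / m), np.sin(2 * np.pi * j / m)] for j in range(m)])
B = np.ones(m)            # hypothetical sets S_j = {x : <u_j|x> <= 1}, unit normals u_j

def project(j, x):
    excess = U[j] @ x - B[j]
    return x - excess * U[j] if excess > 0 else x

def emopp_step(x, block, rel=1.0):
    """One EMOPP-style iteration on the block I_n, with lam_n = rel * L_n."""
    violated = [j for j in block if U[j] @ x > B[j]]
    if not violated:
        return x                                   # d_n = x: no move
    w = 1.0 / len(violated)                        # equal weights on violated sets
    p = [project(j, x) for j in violated]
    d = sum(w * pj for pj in p)                    # convex combination d_n
    den = np.sum((d - x) ** 2)
    if den == 0.0:                                 # projections equal x numerically
        return x
    L = sum(w * np.sum((pj - x) ** 2) for pj in p) / den   # assumed form of L_n
    return x + rel * L * (d - x)

blocks = [list(range(3 * k, 3 * k + 3)) for k in range(4)]  # cyclic control, M = 4
x = np.array([5.0, 3.0])
for n in range(400):
    x = emopp_step(x, blocks[n % 4])
print(np.max(U @ x - B))   # <= 0 up to rounding error: x is feasible
```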
Like EPPM2, EMOPP features doubly extrapolated relaxation ranges; like (5.53), it can process variable blocks of sets; and it can be driven by flexible control schemes that extend, in particular, the admissible and chaotic control schemes of the serial algorithm (5.1). The condition (5.56) imposes that the weights add up to 1 and be bounded away from 0 on violated sets. This, in turn, implies that the number of violated sets processed at each iteration must be bounded, whence condition (5.55). It can always be assumed that nonviolated sets are selected, since they can be assigned a 0 weight. Finally, let us observe that $L_n\ge1$ (since the function $\|\cdot\|^2$ is convex) and that $L_n=1$ when fewer than two violated sets are used ($\operatorname{card}\{i\in I_n\mid a_n\notin S_i\}<2$). In this case, the relaxation range reduces to the usual interval $[\varepsilon,2-\varepsilon]$ and no extrapolation takes place.
3. Control
We shall consider the following control strategies for EMOPP. They constitute extensions to parallel projection methods of schemes which have been proposed for serial ones. We shall say that the control is:
(i) Static if all the sets are activated at each iteration, i.e.,
$$(\forall n\in\mathbb{N})\quad I_n=I. \tag{5.58}$$
This control condition goes back to Cimmino's algorithm [35]. It was used in SIRT, PPM, EPPM, and EPPM2.
(ii) Cyclic if there exists a positive integer $M$ such that
$$(\forall n\in\mathbb{N})\quad I=\bigcup_{k=n}^{n+M-1}I_k. \tag{5.59}$$
Thus, if the control is $M$-cyclic, all the sets must be activated at least once within any $M$ consecutive iterations. This condition was utilized in [46] and [126].
(iii) Quasi-cyclic if there exists an increasing sequence $(M_m)_{m\ge0}\subset\mathbb{N}$ such that
$$M_0=0,\quad\sum_{m\ge0}(M_{m+1}-M_m)^{-1}=+\infty,\quad\text{and}\quad(\forall m\in\mathbb{N})\quad I=\bigcup_{k=M_m}^{M_{m+1}-1}I_k. \tag{5.60}$$
In words, if the control is $(M_m)_{m\ge0}$-quasi-cyclic, all the sets are activated at least once within each quasi-cycle of iterations $\{M_m,\dots,M_{m+1}-1\}$. The nonsummability condition imposes that the lengths $(M_{m+1}-M_m)_{m\ge0}$ of the quasi-cycles do not increase too fast eventually. For instance, the linear growth condition $(\forall m\in\mathbb{N})\ M_{m+1}-M_m=\alpha(m+1)$ is acceptable. Quasi-cyclic control was introduced in [175] for a serial method.
(iv) Admissible if there exist positive integers $(M_i)_{i\in I}$ such that
$$(\forall(i,n)\in I\times\mathbb{N})\quad i\in\bigcup_{k=n}^{n+M_i-1}I_k. \tag{5.61}$$
Hence, the set $S_i$ is activated at least once within any $M_i$ consecutive iterations, which extends Browder’s definition (5.28) to the parallel case. Of course, if $\operatorname{card}I<+\infty$, this control mode coincides with the cyclic mode (5.59) for $M=\max_{i\in I}M_i$.
(v) Chaotic if each set is activated infinitely often in the iteration process, i.e.,
$$(\forall i\in I)(\forall n\in\mathbb{N})\quad i\in\bigcup_{k\ge n}I_k. \tag{5.62}$$
This is a direct generalization of (5.30), which was used in the parallel method of [126]. Clearly, static $\Rightarrow$ cyclic $\Rightarrow$ quasi-cyclic $\Rightarrow$ chaotic, and cyclic $\Rightarrow$ admissible $\Rightarrow$ chaotic.
(vi) Coercive if
$$\Big(\exists(i(n))_{n\ge0}\in\mathop{\times}_{n\ge0}I_n\Big)\quad d(a_n,S_{i(n)})\to0\ \Rightarrow\ \sup_{i\in I}d(a_n,S_i)\to0. \tag{5.63}$$
In the serial case, this control mode was proposed in [79] as a generalization of the most-remote set control scheme
$$(\forall n\in\mathbb{N})(\exists i(n)\in I_n)\quad d(a_n,S_{i(n)})=\sup_{i\in I}d(a_n,S_i), \tag{5.64}$$
which is not always applicable when $\operatorname{card}I=+\infty$.
(vii) Chaotically coercive if $(I_n)_{n\ge0}$ contains a subsequence $(I_{n_k})_{k\ge0}$ such that
$$\Big(\exists(i(k))_{k\ge0}\in\mathop{\times}_{k\ge0}I_{n_k}\Big)\quad d(a_{n_k},S_{i(k)})\to0\ \Rightarrow\ \sup_{i\in I}d(a_{n_k},S_i)\to0. \tag{5.65}$$
This condition generalizes (5.63) as well as the control strategy consisting in activating one of the most remote sets infinitely often in the course of the iterations.
4. Convergence Results
In this section we present results on the convergence of EMOPP. As usual, a key step in proving the convergence to a feasible image is (2.37).
Proposition 5.8 [40]. Every orbit of EMOPP is Fejér-monotone with respect to $S$.
The next step is to determine suitable conditions on the control and the sets so that weak or strong convergence to a point in $S$ is actually achieved by every orbit. Let us note that, since the sequence $(\operatorname{card}\{i\in I_n\mid a_n\notin S_i\})_{n\ge0}$ is bounded, quasi-cyclic control cannot be applied to countable set theoretic formulations in general, as it requires that all the sets be activated over a finite number of iterations.
a. Weak Convergence. The following theorem, which generalizes results of [46] as well as Theorems 5.7 and 5.11, appears to be the most general result available on the weak convergence of projection methods.
Theorem 5.12 [40]. Under coercive or admissible control, every orbit of EMOPP converges weakly to a point in $S$.
The next theorem does not guarantee weak convergence, as weak cluster points could exist outside of $S$, but it is nonetheless of interest.
Theorem 5.13 [40]. Each orbit of EMOPP possesses one and only one weak cluster point in $S$ if either of the following conditions holds.
(i) The control is chaotically coercive.
(ii) $\operatorname{card}I<+\infty$ and the control is quasi-cyclic.
In the special case of algorithm (5.1), Theorem 5.13(ii) was obtained in [174].
b. Strong Convergence. Following the terminology of [10], $(S_i)_{i\in I}$ is boundedly regular if for any bounded sequence $(a_n)_{n\ge0}$ we have
$$\sup_{i\in I}d(a_n,S_i)\to0\ \Rightarrow\ d(a_n,S)\to0. \tag{5.66}$$
The concept of bounded regularity was first used extensively in [79] to prove the strong convergence of several serial projection algorithms. Conditions for bounded regularity were previously discussed in [110] in the case of two sets. The importance of this notion stems from the following fact.
Proposition 5.9 [79]. Let $(a_n)_{n\ge0}$ be a Fejér-monotone sequence with respect to $S$ and suppose that $(S_i)_{i\in I}$ is boundedly regular. Then
Since under chaotically coercive or quasi-cyclic control each orbit $(a_n)_{n\ge0}$ contains a suborbit $(a_{n_k})_{k\ge0}$ such that $\sup_{i\in I}d(a_{n_k},S_i)\to0$ [40], Propositions 5.8, 5.9, and 2.11(iii) lead to the following result, which generalizes results of [46] and [132], as well as Theorem 3.2.
Theorem 5.14 [40]. Suppose that $(S_i)_{i\in I}$ is boundedly regular. Then every orbit of EMOPP converges strongly to a point in $S$ if either of the following conditions holds.
(i) The control is chaotically coercive.
(ii) $\operatorname{card}I<+\infty$ and the control is quasi-cyclic.
We now give specific conditions when this theorem can be applied. The following definition is motivated by [110] (see also Proposition 2.4(ix)). We shall say that a set $S_i$ is a Levitin-Polyak set if, for every sequence $(a_n)_{n\ge0}\subset\Xi$ such that $d(a_n,S_i)\to0$, the following property holds: if $a_n\rightharpoonup a$ and $a\in\partial S_i$, then $a_n\to a$. Locally uniformly convex sets, and a fortiori uniformly convex sets, are Levitin-Polyak sets.⁸ However, unlike uniformly convex sets, locally uniformly convex sets need not be bounded [1
Proposition 5.10 [11, 40, 79]. $(S_i)_{i\in I}$ is boundedly regular if any of the following conditions is satisfied.
(i) $(\exists j\in I)\ S_j\cap\big(\bigcap_{i\in I\smallsetminus\{j\}}\operatorname{int}S_i\big)\neq\varnothing$.
(ii) All, except possibly one, of the sets in $(S_i)_{i\in I}$ are $f$-uniformly convex.
(iii) One of the sets in $(S_i)_{i\in I}$ is boundedly compact. In particular: one of the sets in $(S_i)_{i\in I}$ is compact; one of the sets in $(S_i)_{i\in I}$ is contained in a finite-dimensional affine subspace; $\dim\Xi<+\infty$.
(iv) $(S_i)_{i\in I}$ is a finite family and all, except possibly one, of its sets are Levitin-Polyak sets. In particular: $(S_i)_{i\in I}$ is a finite family and all, except possibly one, of its sets are locally uniformly convex; $(S_i)_{i\in I}$ is a finite family and all, except possibly one, of its sets are uniformly convex.
⁸ The fact that a locally uniformly convex set $S_i$ is a Levitin-Polyak set can be proved as follows. Take a sequence $(a_n)_{n\ge0}$ such that $d(a_n,S_i)\to0$ and $a_n\rightharpoonup a\in\partial S_i$. Then $P_i(a_n)\rightharpoonup a$. Now take a point $b\notin S_i$ such that $P_i(b)=a$ and consider the half-space $\{h\in\Xi\mid\langle h-a\mid b-a\rangle\le0\}$, which contains $S_i$ and whose boundary supports $S_i$ at $a$. Since $S_i$ is locally uniformly convex, there exists a nondecreasing function $f\colon\mathbb{R}_+\to\mathbb{R}_+$ that vanishes only at 0 such that $(\forall h\in S_i)\ \langle h-a\mid b-a\rangle\le-f(\|h-a\|)$. Whence, $(\forall n\in\mathbb{N})\ \langle P_i(a_n)-a\mid a-b\rangle\ge f(\|P_i(a_n)-a\|)$. But since $P_i(a_n)\rightharpoonup a$, we get $f(\|P_i(a_n)-a\|)\to0$ and therefore $P_i(a_n)\to a$. As $d(a_n,S_i)\to0$, we conclude that $a_n\to a$. We also note that (ii) in Theorem 5.5 can be generalized to: $(S_i)_{i\in I}$ contains only Levitin-Polyak sets, one of which is bounded.
(v) $(S_i)_{i\in I}$ is a finite family of closed affine subspaces such that $\sum_{i\in I}S_i^\perp$ is closed. In particular: $(S_i)_{i\in I}$ is a finite family of closed affine subspaces, all of which, except possibly one, have finite codimension; $(S_i)_{i\in I}$ is a finite family of closed affine subspaces, all of which, except possibly one, are affine hyperplanes.
(vi) $(S_i)_{i\in I}$ is a finite family of closed polyhedrons (finite intersections of closed affine half-spaces).
We now move to the most general type of control, namely, chaotic control. To obtain strong convergence in that case, the hypotheses on the sets will have to be strengthened. A point $c\in S$ is a strongly regular point of $(S_i)_{i\in I}$ if [126]
Theorem 5.15 [40]. Under chaotic control, every orbit of EMOPP converges strongly to a point in $S$ if any of the following conditions is satisfied.
(i) $(S_i)_{i\in I}$ has a strongly regular point. In particular: $\operatorname{int}S\neq\varnothing$; $(S_i)_{i\in I}$ is a family of $f$-uniformly convex sets.
(ii) $(S_i)_{i\in I}$ is a finite family and one of its sets is boundedly compact. In particular: $(S_i)_{i\in I}$ is a finite family and one of its sets is compact; $(S_i)_{i\in I}$ is a finite family and one of its sets is contained in a finite-dimensional affine subspace; $(S_i)_{i\in I}$ is a finite family and $\dim\Xi<+\infty$.
(iii) $(S_i)_{i\in I}$ is a finite family of closed affine subspaces with finite codimensions. In particular, $(S_i)_{i\in I}$ is a finite family of affine hyperplanes.
(iv) $(S_i)_{i\in I}$ is a finite family of closed polyhedrons. In particular, $(S_i)_{i\in I}$ is a finite family of closed affine half-spaces.
For relaxations only up to $L_n$, Theorem 5.15(i) was established in [126]; in the special case of the unrelaxed version of algorithm (5.1)+(5.30), Theorem 5.15(ii) was proved in [20] under the compactness condition. A related result is the following.
Theorem 5.16 [10]. Suppose that $(S_i)_{i\in I}$ is a finite family and that any of its nonvoid subfamilies $(S_i)_{i\in J\subset I}$ is boundedly regular, in particular:
(i) $(\exists j\in I)\ S_j\cap\big(\bigcap_{i\in I\smallsetminus\{j\}}\operatorname{int}S_i\big)\neq\varnothing$.
(ii) All, except possibly one, of the sets in $(S_i)_{i\in I}$ are boundedly compact.
(iii) Each set in $(S_i)_{i\in I}$ is a closed affine subspace and $\sum_{i\in J}S_i^\perp$ is closed for every $\varnothing\neq J\subset I$.
(iv) $(S_i)_{i\in I}$ is a family of closed polyhedrons.
Then every sequence generated by the unrelaxed version of the serial algorithm (5.1) under chaotic control (5.30) converges strongly to a point in $S$.
E. Extrapolated Method of Parallel Approximate Projections (EMOPAP)
1. Problem Statement
In Section II.E.2 we have given examples of sets whose projection operators admit closed-form expressions. There are many cases, however, when projection operators are not so easy to determine, which constitutes a serious obstacle in the implementation of a projection algorithm. To illustrate this point, consider the problem of projecting a digital image $a$ onto the set
$$S_i=\{b\in\mathbb{R}^{N^2}\mid g_i(b)\le0\}, \tag{5.69}$$
where $g_i$ is a convex functional (in digital image processing, this is typically how sets are specified). For $a\notin S_i$, the projection $P_i(a)$ is obtained by solving the constrained quadratic minimization problem
$$\min\ \tfrac12\|b-a\|^2\quad\text{subject to}\quad g_i(b)=0, \tag{5.70}$$
which can be recast as the problem of minimizing
$$\Psi(b)=\tfrac12\|b-a\|^2+\mu g_i(b), \tag{5.71}$$
where $\mu$ is a Lagrange multiplier to be adjusted so that $g_i(P_i(a))=0$. Assuming that $g_i$ is differentiable, $P_i(a)$ should therefore satisfy
$$\begin{cases}P_i(a)-a+\mu\nabla g_i(P_i(a))=0,\\ g_i(P_i(a))=0.\end{cases} \tag{5.72}$$
If $g_i$ is an affine or quadratic functional, as in the examples (2.3)-(2.6), this system is easily solved. Otherwise, it may require a costly solution method to adjust $\mu$ iteratively. For instance, consider the second moment set
$$S_i=\{a\in\mathbb{R}^{N^2}\mid\|x-Ta\|^2\le\xi^2\} \tag{5.73}$$
of (4.41), where $T$ is an $N^2\times N^2$ matrix.⁹ In this case, (5.72) was solved in [171] via a Newton method initialized at $\mu_0=0$. Another example is the minimum entropy set proposed in [47] for images that exhibit a low level of structure. If we denote by $\ln(a)$ the vector $(\ln a^{(0)},\dots,\ln a^{(N^2-1)})$, this set takes the form¹⁰
$$S_i=\{a\in A\mid-\langle a\mid\ln(a)\rangle\ge\eta\}, \tag{5.74}$$
where $A = \{a \in \mathbb{E}^{N^2} \mid \sum_{i=0}^{N^2-1} a^{(i)} = 1$ and $(\forall i \in \{0,\ldots,N^2-1\})\; a^{(i)} \ge \tau > 0\}$, $\tau$ being a lower bound on the pixel values. The closedness and convexity of this set follow from the convexity of the functional $a \mapsto \langle a \mid \ln(a)\rangle$ on $A$ (see Proposition 2.6). There too, (5.72) must be solved via iterative methods similar to those proposed in [70] and [167] for the maximum entropy method.
A way to circumvent the sometimes tedious computation of projections is to replace them by approximate ones. By an approximate projection of $a_n \in \complement S_i$ onto $S_i$, we shall mean the projection of $a_n$ onto any closed and convex superset $S_{i,n}$ of $S_i$ which does not contain $a_n$.¹¹ A natural candidate for $S_{i,n}$ is a closed affine half-space whose boundary hyperplane $H_{i,n}$ separates $a_n$ from $S_i$ (see Fig. 14). The approximate projection is then simply given by (2.22), meaning that the nonlinear constraint defining $S_i$ has been "affinized." More formally, we shall say that $((S_{i,n})_{i\in I_n})_{n\ge0}$ are approximating sets if they are closed and convex and satisfy
$(\exists \eta \in\, ]0,1[)(\forall n \in \mathbb{N})(\forall i \in I_n)\quad S_i \subset S_{i,n}$ and $d(a_n, S_{i,n}) \ge \eta\, d(a_n, S_i)$. (5.75)
This condition has been used in [2], [11], [45], and [68].
⁹ Note that the set (5.73) has the same analytic expression as (4.6).
¹⁰ In this context, each gray level $a^{(i)}$ is viewed as a probability and $\sum_{i=0}^{N^2-1} a^{(i)} = 1$.
¹¹ The idea of replacing exact projections by approximate ones was actually suggested in [132]. Naturally, this approach will be numerically advantageous if the determination of this superset is less costly than the computation of the exact projection.
THE CONVEX FEASIBILITY PROBLEM
FIGURE 14. Projection onto a separating hyperplane.
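2. Algorithm
Given an initial point $a_0 \in \mathbb{E}$ and numbers $C \in \mathbb{N}^*$, $\delta \in\, ]0, 1/C[$, $\eta \in\, ]0,1[$, and $\varepsilon \in\, ]0,1[$, EMOPAP is defined by the iterative process [40]
$(\forall n \in \mathbb{N})\quad a_{n+1} = a_n + \lambda_n\Big(\sum_{i\in I_n} w_{i,n} P_{i,n}(a_n) - a_n\Big)$, (5.76)
where at each iteration $n$:
(a) The family $I_n$ of indices of selected sets satisfies $\varnothing \neq I_n \subset I$ and $\operatorname{card}\{i \in I_n \mid a_n \notin S_i\} \le C$. (5.77)
(b) $(P_{i,n}(a_n))_{i\in I_n}$ are the projections of $a_n$ onto the approximating sets $(S_{i,n})_{i\in I_n}$ defined by (5.75).
(c) The weights $(w_{i,n})_{i\in I_n}$ conform to (5.56).
(d) The relaxation parameter $\lambda_n$ lies in $[\varepsilon, (2-\varepsilon)L_n]$, where
$L_n = \dfrac{\sum_{i\in I_n} w_{i,n}\|P_{i,n}(a_n) - a_n\|^2}{\|\sum_{i\in I_n} w_{i,n} P_{i,n}(a_n) - a_n\|^2}$ if $a_n \notin \bigcap_{i\in I_n} S_i$, and $L_n = 1$ otherwise. (5.78)
Methods involving projections onto separating hyperplanes have been proposed previously for less general projection algorithms in [2] and [68].
3. Convergence Results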
Theorem 5.17 [40]. Theorems 5.12, 5.13, and 5.14 remain true for EMOPAP. In addition, under chaotic control, every orbit of EMOPAP converges strongly to a point in $S$ if any of the following conditions holds. (i) $\operatorname{int} S \neq \varnothing$. (ii) $(S_i)_{i\in I}$ is a finite family and one of its sets is boundedly compact. In particular: $(S_i)_{i\in I}$ is a finite family and one of its sets is compact; $(S_i)_{i\in I}$ is a finite family and one of its sets is contained in a finite-dimensional affine subspace; $(S_i)_{i\in I}$ is a finite family and $\dim \mathbb{E} < +\infty$. (iii) $(S_i)_{i\in I}$ is a finite family of closed affine subspaces with finite codimensions. In particular, $(S_i)_{i\in I}$ is a finite family of affine hyperplanes. (iv) $(S_i)_{i\in I}$ is a finite family of closed affine half-spaces.
In the finite-dimensional case, part (ii) of this theorem was established in [45] and generalizes results of [2] and [68], which considered $(\lambda_n)_{n\ge0} \subset [\varepsilon, 2-\varepsilon]$.
F. Extrapolated Method of Parallel Subgradient Projections (EMOPSP)
1. Problem Statement
The previous framework gives a lot of latitude in the choice of the approximating supersets. In practice, however, it is often convenient to have at hand a systematic method for determining the separating hyperplanes $((H_{i,n})_{i\in I_n})_{n\ge0}$ in Fig. 14.¹² In this section, we follow [41] and define one such method.
¹² This is essentially the same problem that arises in cutting plane methods in nonlinear programming [107].
First of all, let us observe that a closed and convex property set $S_i$ can always be expressed as the 0-section
$S_i = \{a \in \mathbb{E} \mid g_i(a) \le 0\}$ (5.79)
of a convex, (lower semi-)continuous functional $g_i\colon \mathbb{E} \to \mathbb{R}$. This representation is quite general, since (3.18) indicates that one can always take $g_i = d(\cdot, S_i)$. More practically, let us note that a convex constraint is usually formulated through a convex inequality, which leads directly to (5.79). Now suppose $a_n \in \complement S_i$. Then the closed affine half-space
$S_{i,n} = \{a \in \mathbb{E} \mid \langle a_n - a \mid t_{i,n}\rangle \ge g_i(a_n)\}$, where $t_{i,n} \in \partial g_i(a_n)$, (5.80)
is a valid outer approximation of $S_i$ at iteration $n$. Indeed, $a_n \in S_{i,n}$ would imply $0 \ge g_i(a_n)$, which is impossible since $a_n \notin S_i$. Moreover, take any $a \in S_i$. Then $g_i(a) \le 0$. But, according to (2.14),
$g_i(a_n) \le g_i(a) + \langle a_n - a \mid t_{i,n}\rangle$ (5.81)
$\le \langle a_n - a \mid t_{i,n}\rangle$. (5.82)
Therefore $a \in S_{i,n}$ and $S_i \subset S_{i,n}$. Note that we have
$S_{i,n} = \{a \in \mathbb{E} \mid \langle a \mid t_{i,n}\rangle \le \langle a_n \mid t_{i,n}\rangle - g_i(a_n)\}$. (5.83)
Therefore, the projection of $a_n$ onto $S_{i,n}$ is given by (2.23) and reads
$P_{i,n}(a_n) = a_n - \dfrac{g_i(a_n)}{\|t_{i,n}\|^2}\, t_{i,n}$. (5.84)
This projection is called a subgradient projection. With such projections, only the computation of a subgradient $t_{i,n}$ [or of the gradient $\nabla g_i(a_n)$ if $g_i$ is differentiable at $a_n$] is needed to process the set $S_i$ at iteration $n$, as opposed to the potentially involved exact projection $P_i(a_n)$. It is important to note that subgradient projections generalize the notion of projections. Indeed, if we let $g_i = d(\cdot, S_i)$, then (5.84) yields the exact projections thanks to (2.17). In general, for an arbitrary $a_n \in \mathbb{E}$, the subgradient projection of $a_n$ onto $S_i$ in (5.79) will be defined by
$P_{i,n}(a_n) = a_n - \dfrac{g_i(a_n)}{\|t_{i,n}\|^2}\, t_{i,n}$ if $g_i(a_n) > 0$, and $P_{i,n}(a_n) = a_n$ otherwise, where $t_{i,n} \in \partial g_i(a_n)$, (5.85)
and it can be taken as the conventional projection $P_i(a_n)$ whenever this exact projection is easy to compute.
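As a minimal sketch of (5.85), the function below computes a subgradient projection in plain Python, assuming the caller supplies $g_i$ and one subgradient selection as callables. The names `subgradient_projection`, `g`, and `subgrad` are illustrative and are not taken from [41]; vectors are plain lists of floats.

```python
from typing import Callable, List

Vector = List[float]


def subgradient_projection(a: Vector,
                           g: Callable[[Vector], float],
                           subgrad: Callable[[Vector], Vector]) -> Vector:
    """Subgradient projection (5.85) of a onto S = {b : g(b) <= 0}."""
    ga = g(a)
    if ga <= 0.0:
        # a already satisfies the constraint: (5.85) leaves it unchanged.
        return list(a)
    t = subgrad(a)  # any element of the subdifferential of g at a
    tt = sum(ti * ti for ti in t)
    if tt == 0.0:
        # 0 in dg(a) with g(a) > 0 means g > 0 everywhere, i.e., S is empty.
        raise ValueError("constraint set is empty")
    step = ga / tt
    return [ai - step * ti for ai, ti in zip(a, t)]


# Example: unit disk {b : ||b||^2 - 1 <= 0}, whose gradient is 2b.
disk = lambda b: b[0] ** 2 + b[1] ** 2 - 1.0
disk_grad = lambda b: [2.0 * b[0], 2.0 * b[1]]
print(subgradient_projection([2.0, 0.0], disk, disk_grad))  # [1.25, 0.0]
```

For an affine $g_i$ the result coincides with the exact projection onto the half-space. For the disk, the point $[1.25, 0]$ is still outside the set: a single subgradient projection only moves toward $S_i$, which is why the method is used inside an iterative scheme.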
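2. Examples of Subgradient Projections
We have seen that the projections onto the sets (5.73) and (5.74) needed to be computed iteratively. By contrast, the subgradient projection of an image $a_n$ onto (5.73) is simply obtained via (5.85) as
$P_{i,n}(a_n) = a_n + \dfrac{\|y(a_n)\|^2 - \xi^2}{2\,\|{}^tT y(a_n)\|^2}\, {}^tT y(a_n)$ if $\|y(a_n)\| > \xi$, and $P_{i,n}(a_n) = a_n$ otherwise, (5.86)
where $y(a_n) = x - Ta_n$ and where we have used the identity
$\nabla g_i(a_n) = \nabla\big(\|x - Ta_n\|^2 - \xi^2\big) = -2\,{}^tT(x - Ta_n)$. (5.87)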
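Likewise, the subgradient projection of an image $a_n$ onto (5.74) is obtained via
$P_{i,n}(a_n) = a_n - \dfrac{\langle a_n \mid \ln(a_n)\rangle + \eta}{\|\ln(a_n) + \mathbf{1}\|^2}\,\big(\ln(a_n) + \mathbf{1}\big)$ if $-\langle a_n \mid \ln(a_n)\rangle < \eta$, and $P_{i,n}(a_n) = a_n$ otherwise, (5.88)
where $\mathbf{1}$ denotes the vector of ones in $\mathbb{E}^{N^2}$ and where we have used the identity
$\nabla g_i(a_n) = \nabla\big(\langle a_n \mid \ln(a_n)\rangle + \eta\big) = \ln(a_n) + \mathbf{1}$. (5.89)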
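The closed forms (5.86) and (5.88) translate directly into code. The sketch below uses plain Python lists with a dense matrix `T` stored as a list of rows. The helper names are hypothetical, and the entropy version assumes strictly positive pixels (as guaranteed on $A$ by $\tau > 0$) but does not enforce the normalization $\sum_i a^{(i)} = 1$ that also defines $A$.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(T: Matrix, a: Vector) -> Vector:
    """Compute T a."""
    return [sum(t * v for t, v in zip(row, a)) for row in T]


def rmatvec(T: Matrix, y: Vector) -> Vector:
    """Compute tT y (the transpose of T applied to y)."""
    return [sum(T[k][j] * y[k] for k in range(len(T))) for j in range(len(T[0]))]


def project_second_moment(a: Vector, T: Matrix, x: Vector, xi: float) -> Vector:
    """Subgradient projection (5.86) onto {a : ||x - T a||^2 <= xi^2}."""
    y = [xk - tk for xk, tk in zip(x, matvec(T, a))]  # y(a) = x - T a
    ny2 = sum(v * v for v in y)
    if ny2 <= xi * xi:
        return list(a)
    s = rmatvec(T, y)  # tT y(a) = -grad g(a) / 2, nonzero if the set is nonempty
    c = (ny2 - xi * xi) / (2.0 * sum(v * v for v in s))
    return [ai + c * si for ai, si in zip(a, s)]


def project_entropy(a: Vector, eta: float) -> Vector:
    """Subgradient projection (5.88) onto {a : -<a | ln a> >= eta}; needs a > 0."""
    g = sum(ai * math.log(ai) for ai in a) + eta  # g(a) = <a | ln a> + eta
    if g <= 0.0:
        return list(a)
    t = [math.log(ai) + 1.0 for ai in a]  # grad g(a) = ln(a) + 1
    c = g / sum(v * v for v in t)
    return [ai - c * ti for ai, ti in zip(a, t)]
```

With `T` the identity, `x = 0`, and `xi = 1`, the second moment set reduces to the unit disk, and `project_second_moment([2.0, 0.0], ...)` returns the same point $[1.25, 0]$ as a direct subgradient projection onto the disk.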
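3. Algorithm
Given an initial point $a_0 \in \mathbb{E}$ and numbers $C \in \mathbb{N}^*$, $\delta \in\, ]0, 1/C[$, and $\varepsilon \in\, ]0,1[$, EMOPSP is defined by the iterative process [41]
$(\forall n \in \mathbb{N})\quad a_{n+1} = a_n + \lambda_n\Big(\sum_{i\in I_n} w_{i,n} P_{i,n}(a_n) - a_n\Big)$, (5.90)
where at each iteration $n$:
(a) The family $I_n$ of indices of selected sets satisfies $\varnothing \neq I_n \subset I$ and $\operatorname{card}\{i \in I_n \mid a_n \notin S_i\} \le C$. (5.91)
(b) The subgradient projections $(P_{i,n}(a_n))_{i\in I_n}$ are defined by (5.85).
(c) The weights $(w_{i,n})_{i\in I_n}$ conform to (5.56).
(d) The relaxation parameter $\lambda_n$ lies in $[\varepsilon, (2-\varepsilon)L_n]$, where $L_n$ is as in (5.78).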
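To make the recursion (5.90) concrete, here is a minimal single-processor sketch of EMOPSP for a finite family of constraints $g_i \le 0$. Its choices are assumptions of this sketch: it activates at most `block_size` violated sets per iteration by sweeping circularly through the constraints (one admissible instance of chaotic control, in the spirit of guideline a. of Section V.I.2), uses uniform weights as in (5.102), and sets $\lambda_n = \rho L_n$ clipped to $[\varepsilon, (2-\varepsilon)L_n]$ with a user-chosen factor `rho`. All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# A constraint is a pair (g_i, subgradient selection of g_i); S_i = {a : g_i(a) <= 0}.
Constraint = Tuple[Callable[[Vector], float], Callable[[Vector], Vector]]


def emopsp(a0: Vector, constraints: Sequence[Constraint], block_size: int,
           rho: float = 1.0, eps: float = 0.01, tol: float = 1e-9,
           max_iter: int = 10_000) -> Vector:
    a, m, start, dim = list(a0), len(constraints), 0, len(a0)
    for _ in range(max_iter):
        # Control: collect up to block_size violated sets, sweeping circularly.
        active = []
        for k in range(m):
            i = (start + k) % m
            if constraints[i][0](a) > tol:
                active.append(i)
                if len(active) == block_size:
                    break
        if not active:
            return a  # every constraint holds up to tol
        start = (active[-1] + 1) % m
        w = 1.0 / len(active)  # uniform weights (5.102)
        projs = []
        for i in active:  # subgradient projections (5.85); g_i(a) > 0 here
            g, sub = constraints[i]
            t = sub(a)
            c = g(a) / sum(v * v for v in t)
            projs.append([a[j] - c * t[j] for j in range(dim)])
        d = [w * sum(p[j] for p in projs) - a[j] for j in range(dim)]
        num = w * sum(sum((p[j] - a[j]) ** 2 for j in range(dim)) for p in projs)
        den = sum(v * v for v in d)
        if den == 0.0:
            raise ValueError("average step vanished: constraints look inconsistent")
        L = num / den  # extrapolation bound (5.78); L >= 1 by convexity
        lam = min(max(rho * L, eps), (2.0 - eps) * L)
        a = [a[j] + lam * d[j] for j in range(dim)]
    return a
```

With the two half-planes $a^{(0)} \ge 1$ and $a^{(1)} \ge 1$ and $a_0 = 0$, the first iteration gives $L_0 = 2$, so the extrapolated step lands exactly on the feasible corner $[1, 1]$, whereas the unrelaxed average ($\lambda_0 = 1$) would only reach $[0.5, 0.5]$.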
4. Convergence Results
Recall that, for every $i \in I$, $S_i$ is defined in (5.79) via a (lower semi-)continuous convex functional $g_i\colon \mathbb{E} \to \mathbb{R}$. We shall say that the subdifferentials of $(g_i)_{i\in I}$ are locally uniformly bounded if
$(\forall \gamma \in \mathbb{R}_+^*)(\exists \zeta \in \mathbb{R}_+^*)(\forall i \in I)(\forall a \in B(0,\gamma))\quad \partial g_i(a) \subset B(0,\zeta)$. (5.92)
Theorem 5.18 [41]. Suppose that the subdifferentials of $(g_i)_{i\in I}$ are locally uniformly bounded. Then, under admissible control, every orbit of EMOPSP converges weakly to a point in $S$.
The next theorem pertains to strong convergence under chaotic control. Naturally, additional hypotheses are required.
Theorem 5.19 [41]. Suppose that the subdifferentials of $(g_i)_{i\in I}$ are locally uniformly bounded. Then, under chaotic control, every orbit of EMOPSP converges strongly to a point in $S$ if either of the following conditions is satisfied.
(i) $\operatorname{int} S \neq \varnothing$;
(ii) The family $(g_i)_{i\in I}$ is finite and contains a lower semiboundedly compact functional.
To our knowledge, these results are the most general ones available for the subgradient methods governed by (5.90). In particular, the following corollary of Theorem 5.19(ii) generalizes results of [29], which considered serial, cyclic control, as well as results of [60], which considered static control.
Proposition 5.11. Suppose that $\dim \mathbb{E} < +\infty$ and $\operatorname{card} I < +\infty$. Then, under chaotic control, every orbit of EMOPSP converges (strongly) to a point in $S$.
Proof. If $\dim \mathbb{E} < +\infty$ and $\operatorname{card} I < +\infty$, then $(g_i)_{i\in I}$ satisfies (5.92) [137]. In addition, each $g_i$ is l.s.b.co. by virtue of Proposition 2.3(i).
G. Extrapolated Method of Parallel Nonexpansive Operators (EMOPNO)
1. Problem Statement
Another generalization of the projection framework of Section V.D can be obtained by replacing the projection operators $(P_i)_{i\in I}$ by arbitrary firmly nonexpansive operators $(T_i)_{i\in I}$ such that
$(\forall i \in I)\quad S_i = \operatorname{Fix} T_i$. (5.93)
This framework is of interest when constraints are specified as invariance properties, say, $h = T_i(h)$, where $T_i$ is nonexpansive.¹³ For instance, $T_i$ may be a local rotation or reflection operator to model local symmetries in the image, or a translation operator to model certain periodicities, etc. In such cases, activating the property set $S_i = \operatorname{Fix} T_i$ through the projection operator $P_i$ may be difficult, whereas activating it through the readily available operator $T_i$ is straightforward. In this regard, it should be noted that, by virtue of (2.32), the elementary update $a_{n+1} = T_i(a_n)$ is still a step in the direction of $S_i$.
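As an illustration of the averaging construction $T_i = (T_i' + \mathrm{Id})/2$ recalled in footnote 13, the sketch below takes a hypothetical global mirror-symmetry operator $T'$ (reverse the sample order), which is an isometry and hence nonexpansive, and forms $T = (T' + \mathrm{Id})/2$. It then checks numerically that $T$ fixes symmetric signals and satisfies the firm nonexpansiveness inequality $\|Ta - Tb\|^2 \le \langle Ta - Tb \mid a - b\rangle$ on random pairs. This is a toy check under these assumptions, not a proof.

```python
import random
from typing import Callable, List

Vector = List[float]


def mirror(a: Vector) -> Vector:
    """T': reverse the sample order (an isometry, hence nonexpansive)."""
    return a[::-1]


def averaged(t_prime: Callable[[Vector], Vector]) -> Callable[[Vector], Vector]:
    """Return T = (T' + Id)/2, firmly nonexpansive with Fix T = Fix T'."""
    return lambda a: [(u + v) / 2.0 for u, v in zip(t_prime(a), a)]


def inner(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


T = averaged(mirror)
rng = random.Random(0)
worst = float("-inf")
for _ in range(1000):
    a = [rng.uniform(-1, 1) for _ in range(8)]
    b = [rng.uniform(-1, 1) for _ in range(8)]
    diff_T = [u - v for u, v in zip(T(a), T(b))]
    diff = [u - v for u, v in zip(a, b)]
    # Firm nonexpansiveness: this gap should never be (meaningfully) positive.
    worst = max(worst, inner(diff_T, diff_T) - inner(diff_T, diff))
print(T([1.0, 2.0, 2.0, 1.0]))  # a symmetric signal is a fixed point
print(worst <= 1e-12)
```

Because the mirror is a linear reflection, $T$ here happens to be the exact projection onto the subspace of symmetric signals. For other nonexpansive symmetry operators (a rotation, for instance), the same averaging still yields a firmly nonexpansive operator, but generally not a projection.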
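2. Algorithm
Given an initial point $a_0 \in \mathbb{E}$ and numbers $C \in \mathbb{N}^*$, $\delta \in\, ]0, 1/C[$, and $\varepsilon \in\, ]0, 1/2[$, EMOPNO is defined by the recursion [43]
$(\forall n \in \mathbb{N})\quad a_{n+1} = a_n + \lambda_n\Big(\sum_{i\in I_n} w_{i,n} T_i(a_n) - a_n\Big)$, (5.94)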
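where
(a) The family $I_n$ of indices of selected operators satisfies $\varnothing \neq I_n \subset I$ and $\operatorname{card}\{i \in I_n \mid a_n \notin S_i\} \le C$. (5.95)
(b) The weights $(w_{i,n})_{i\in I_n}$ conform to (5.56).
(c) $\varepsilon \le \lambda_n \le (2-\varepsilon)L_n$, with
$L_n = \dfrac{\sum_{i\in I_n} w_{i,n}\|T_i(a_n) - a_n\|^2}{\|\sum_{i\in I_n} w_{i,n} T_i(a_n) - a_n\|^2}$ if $a_n \notin \bigcap_{i\in I_n} S_i$, and $L_n = 1$ otherwise.
3. Convergence Results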
Theorem 5.20 [43]. Under admissible control, every orbit of EMOPNO converges weakly to a point in $S$. The convergence is strong if $(T_i)_{i\in I}$ contains a demicompact mapping.
Theorem 5.21 [43]. Under chaotic control, every orbit of EMOPNO converges strongly to a point in $S$ if either of the following conditions is satisfied. (i) $\operatorname{int} S \neq \varnothing$. (ii) The family $(T_i)_{i\in I}$ is finite and contains a demicompact mapping.
¹³ We actually address the problem of finding a common fixed point of firmly nonexpansive operators, but it is closely related to that of finding a common fixed point of nonexpansive operators. Indeed, Proposition 2.8(iv) indicates that a nonexpansive operator $T_i'$ can be associated with a firmly nonexpansive operator $T_i = (T_i' + \mathrm{Id})/2$, where, by construction, $\operatorname{Fix} T_i = \operatorname{Fix} T_i'$.
Theorem 5.20 improves upon results of [19], which considered a serial scheme. Theorem 5.20 and Theorem 5.21(i) improve, respectively, upon results of [131] and [121], which both considered the successive approximation scheme $a_{n+1} = T(a_n)$. The above theorems also generalize certain results of Section V.D.4, which were restricted to projection operators. In particular, thanks to Proposition 2.10, condition (ii) above generalizes condition (ii) in Theorem 5.15. Finally, since in finite-dimensional spaces any operator is demicompact, we obtain the following corollary of Theorem 5.21(ii). It generalizes a result of [174], which considered $\operatorname{card} I_n = 1$ in (5.95).
Proposition 5.12. Suppose $\operatorname{card} I < +\infty$ and $\dim \mathbb{E} < +\infty$. Then, under chaotic control, every orbit of EMOPNO converges (strongly) to a point in $S$.
H. Toward Unification
EMOPAP, EMOPSP, and EMOPNO are three separate generalizations of EMOPP which are not related in general. However, given their similar structure, it is natural to contemplate the possibility of unifying them in a single framework. An important step towards unification was made in [11], where, under the more restrictive assumptions $\operatorname{card} I < +\infty$ and $(\lambda_n)_{n\ge0} \subset [\varepsilon, 2-\varepsilon]$, some of the results of Sections V.D-V.G were obtained by investigating a general iterative method for solving convex feasibility problems. The algorithm proposed there was of the form¹⁴
$(\forall n \in \mathbb{N})\quad a_{n+1} = a_n + \lambda_n\Big(\sum_{i\in I_n} w_{i,n} T_{i,n}(a_n) - a_n\Big)$, (5.96)
where at each iteration $n$:
(a) The family $I_n$ of indices of selected sets satisfies $\varnothing \neq I_n \subset I$. (5.97)
(b) $(T_{i,n})_{i\in I_n}$ is a family of firmly nonexpansive operators such that $(\forall i \in I_n)\; S_i \subset \operatorname{Fix} T_{i,n}$. (5.98)
(c) The weights $(w_{i,n})_{i\in I_n}$ satisfy a condition similar to (5.56).
(d) The relaxation parameter $\lambda_n$ lies in $[\varepsilon, 2-\varepsilon]$.
In addition, a so-called focusing condition was introduced to study convergence. It requires that for every $i \in I$ and every subsequence $(a_{n_k})_{k\ge0}$ of an orbit of the algorithm, we have
$\big[\,a_{n_k} \rightharpoonup a,\ \ a_{n_k} - T_{i,n_k}(a_{n_k}) \to 0,\ \ \inf_{k\ge0} w_{i,n_k} > 0\,\big] \;\Rightarrow\; a \in S_i$. (5.99)
¹⁴ Actually, the algorithm of [11] proceeds by averaging relaxed operators, i.e., $(\forall n \in \mathbb{N})\; a_{n+1} = \sum_{i\in I_n} w_{i,n}\big((1-\lambda_{i,n})a_n + \lambda_{i,n} T_{i,n}(a_n)\big)$. But this is equivalent to relaxing averaged operators, as in (5.96).
This study also contains a number of results on geometric convergence rates. It appears reasonable to investigate the algorithms presented above in a single framework described by the recursion (5.96) with (a)-(c), but that would allow, as in Sections V.D-V.G, countable families of property sets under suitable control modes, as well as extrapolated relaxations, i.e.,
(e) $\varepsilon \le \lambda_n \le (2-\varepsilon)L_n$, with
$L_n = \dfrac{\sum_{i\in I_n} w_{i,n}\|T_{i,n}(a_n) - a_n\|^2}{\|\sum_{i\in I_n} w_{i,n} T_{i,n}(a_n) - a_n\|^2}$ if $a_n \notin \bigcap_{i\in I_n} S_i$, and $L_n = 1$ otherwise.
I. Practical Considerations for Digital Image Processing
In this section, we discuss the practical issues pertaining to the numerical realization of the proposed methods on a digital computer. This places us in the context of a finite number of sets $(S_i)_{i\in I}$ in the Euclidean space $\mathbb{E}^{N^2}$. In other words, all the above results should now be viewed from the perspective $\operatorname{card} I < +\infty$ and $\dim \mathbb{E} < +\infty$. Fortunately, this is the context in which the most powerful convergence results were obtained. Inconsistent problems will be considered first.
1. Inconsistent Problems
When the property sets do not intersect, POCS has been seen to be inadequate,¹⁵ and two parallel methods producing weighted least-squares solutions were developed in Section V.C. The method (5.26) is interesting theoretically, for it converges strongly and it provides the closest least-squares solution from a starting point $a_0$. The first aspect is irrelevant in digital processing, since weak and strong convergence modes coincide. As to the second, it may be of interest in certain best-approximation problems, but since our chief interest here is just feasibility, we shall discuss only the other method, namely, PPM (5.23).
¹⁵ It goes without saying that the same is true of any serial method of type (5.1).
First of all, it follows from Theorem 5.4 that any sequence generated by PPM converges to a solution of the weighted least-squares problem (5.5). In practice, PPM will provide an approximate minimum of the proximity function $\Phi$ in a finite number of steps. According to Proposition 5.5, the proximity function decreases at every iteration. Hence, the algorithm can be stopped when negligible improvement in the decrease of $\Phi$ is observed, i.e., whenever the stopping criterion
$\Phi(a_n) - \Phi(a_{n+1}) \le \kappa$ (5.100)
is met for a suitably small positive number $\kappa$. An alternative way of determining the near convergence of the algorithm is to measure the norm of the gradient, which leads to the stopping rule
$\|\nabla\Phi(a_n)\| \le \kappa$. (5.101)
In implementing PPM, one should also be aware of the influence of the weights $(w_i)_{i\in I}$ on solutions. The larger a particular weight $w_i$, the closer the solution to the corresponding set $S_i$. Hence, if some constraints are judged to be more critical than others in defining a least-squares-feasible solution, they should be assigned larger weights. For problems in which no particular group of constraints should be privileged, the weights should be taken to be equal, that is, $w_i = 1/\operatorname{card} I$. We have seen that overrelaxations had an accelerating effect on the algorithm. One could therefore blindly choose relaxations in $[1, 2-\varepsilon]$. However, an explicit relaxation rule can be determined by going back to Proposition 5.5. Since PPM behaves as a steepest-descent method, we can use the relaxation scheme devised by Armijo [6], which consists of successively reducing the relaxation parameter $\lambda_n$ until the inequality $\Phi(a_n) - \Phi(a_{n+1}) \ge \alpha\lambda_n\|\nabla\Phi(a_n)\|^2$ is satisfied (the algorithm below uses $\alpha = 1/2$). In our applications, this adaptation scheme yielded overrelaxations that converged efficiently. Based on numerical experience and the recommendations of [135] regarding Armijo's relaxation scheme, we propose the following algorithm as an efficient practical implementation of PPM.
1. Choose an initial guess $a_0 \in \mathbb{E}$, strictly convex weights $(w_i)_{i\in I}$, and $\kappa \in\, ]0, +\infty[$. Set $n = 0$.
2. Set $\nabla\Phi(a_n) = a_n - \sum_{i\in I} w_i P_i(a_n)$ and $\lambda_n = 1.999$.
3. Set $a_{n+1} = a_n - \lambda_n \nabla\Phi(a_n)$.
4. If $\Phi(a_n) - \Phi(a_{n+1}) < \lambda_n\|\nabla\Phi(a_n)\|^2/2$, set $\lambda_n = 0.75\lambda_n$ and return to 3.
5. If $\Phi(a_n) - \Phi(a_{n+1}) > \kappa$, set $n = n + 1$ and return to 2.
6. Stop.
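Steps 1 to 6 can be transcribed as the following sketch. It assumes that the proximity function is $\Phi(a) = \frac{1}{2}\sum_{i\in I} w_i \|P_i(a) - a\|^2$, the function whose gradient is the expression used in Step 2 when the weights sum to 1, and that the caller supplies the exact projections as callables. The function names and the guard on very small $\lambda_n$ are additions of this sketch.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Projection = Callable[[Vector], Vector]


def proximity(a: Vector, projections: Sequence[Projection], weights: Sequence[float]) -> float:
    """Phi(a) = (1/2) sum_i w_i ||P_i(a) - a||^2."""
    return 0.5 * sum(w * sum((p - x) ** 2 for p, x in zip(P(a), a))
                     for w, P in zip(weights, projections))


def ppm_armijo(a0: Vector, projections: Sequence[Projection], weights: Sequence[float],
               kappa: float = 1e-12, max_iter: int = 10_000) -> Vector:
    a = list(a0)                                                  # Step 1
    phi_a = proximity(a, projections, weights)
    for _ in range(max_iter):
        projs = [P(a) for P in projections]                       # Step 2
        grad = [a[j] - sum(w * p[j] for w, p in zip(weights, projs))
                for j in range(len(a))]
        g2 = sum(v * v for v in grad)
        lam = 1.999
        while True:
            a_next = [x - lam * gx for x, gx in zip(a, grad)]     # Step 3
            phi_next = proximity(a_next, projections, weights)
            # Step 4: accept once the Armijo decrease holds (alpha = 1/2).
            if phi_a - phi_next >= lam * g2 / 2.0 or lam < 1e-12:
                break
            lam *= 0.75
        decrease = phi_a - phi_next
        a, phi_a = a_next, phi_next
        if decrease <= kappa:                                     # Step 5 fails: Step 6
            break
    return a


left = lambda a: [min(a[0], 0.0)]   # projection onto ]-inf, 0]
right = lambda a: [max(a[0], 2.0)]  # projection onto [2, +inf[
sol = ppm_armijo([5.0], [left, right], [0.5, 0.5])
print(sol)
```

The two sets in the usage example do not intersect; with equal weights the weighted least-squares solution is $a = 1$, which the iterates approach from $a_0 = 5$. On the first iteration the trial value $\lambda_n = 1.999$ fails the Armijo test and is accepted only after one reduction, to about 1.5.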
2. Consistent Problems
In consistent problems, we highly recommend that the extrapolated parallel methods of Sections V.D-V.G be used. We shall discuss EMOPSP here, since in practice sets are most frequently specified in the format (5.79). The convergence of EMOPSP is guaranteed by Proposition 5.11, which indicates that the sets can be activated in any order so long as every set is used repeatedly in the course of the iterations. EMOPSP is superior to the widely used POCS algorithm on three counts.
1. It is straightforward to implement on any parallel machine, as the number of activated sets is variable.
2. It converges very efficiently thanks to its extrapolated relaxations.
3. It does not rely on the often-cumbersome computation of exact projections and involves only the evaluation of subgradients.
EMOPSP is faster than POCS in that each iteration has a lower computational cost (item 3) and the whole iterative process converges in a smaller number of steps (item 2). EMOPSP is also very versatile, as all of its parameters can be changed at each iteration (sets selected, approximating supersets, weights on the projections, relaxations). However, a standard implementation can be obtained with the following guidelines [41].
a. Control. If the number $P$ of parallel processors is at least equal to the number $m$ of sets, one can implement the algorithm with static control. It may not be worth activating only the violated sets, since checking for membership in a set is usually done before projecting and there will be no savings in terms of computation. When $m > P$, only violated sets should be activated. The chaotic control mode does not impose any specific scheduling for the processing of the sets but, for the sake of simplicity, one may want to sweep through the constraints circularly and activate blocks of $P$ consecutive violated sets.
b. Weights. Although the weights can be defined in a number of ways and may have some influence on the centering of the algorithm, it is usually best to keep them uniform, that is,
$(\forall n \in \mathbb{N})(\forall i \in I_n)\quad w_{i,n} = 1/\operatorname{card} I_n$. (5.102)
c. Relaxations. Although no general conclusion is intended, our intensive simulations with EMOPSP in various problems have revealed the following behavior. When a small number of sets is used, very large extrapolations (say, $1.5L_n \le \lambda_n \le 1.99L_n$) often create a lot of zigzagging and are not as effective as the centered extrapolations (5.43). On the other hand, large extrapolations accelerate the iterations significantly in larger problems.
d. Stopping Rule. If static control is used with exact projections (in which case EMOPSP reduces to EPPM2), a stopping rule involving $\Phi$, such as (5.100), can be used. In other cases, the exact projections $(P_i(a_n))_{i\in I}$ will not be available at iteration $n$ and alternative stopping criteria must be considered, e.g., $\|a_{n+1} - a_n\| \le \kappa$, or $\sum_{k=0}^{M}\sum_{i\in I_{n-k}} \|P_{i,n-k}(a_{n-k}) - a_{n-k}\|^2 \le \kappa$ for some $M \in \mathbb{N}$, etc.
VI. NUMERICAL EXAMPLES
This section is devoted to concrete numerical examples of convex set theoretic digital signal and image recoveries. The results of previous sections will therefore be applied in the context described in Sections III.A.2.d and V.I.
A. Recovery with Inconsistent Constraints
This example is taken from [42] and illustrates an application of PPM to the set theoretic restoration of a one-dimensional signal in the presence of an inconsistent family of constraints.
1. Experiment
The problem is to deconvolve a noisy discrete-time $N$-point signal, i.e., to estimate the original form of a signal $h$ which has been passed through a linear shift-invariant system and further degraded by addition of noise. The length of the signals is $N = 64$ and the solution space is the $N$-dimensional Euclidean space $\mathbb{E}^N$. The original signal $h$ is shown in Fig. 15. The recorded signal $x$ of Fig. 16 was obtained via the standard convolutional model
$x = Th + u$, (6.1)
in which the $N \times N$ Toeplitz matrix $T$ models a shift-invariant linear blur and $u$ is a vector of noise samples uniformly distributed in $[-\delta, \delta]$, with $\delta = 0.15$. The blurring kernel is a Gaussian function with a variance of 2 samples². If $T_n$ designates the $n$th row of $T$ and $x_n$ the $n$th component of the data vector $x$, (6.1) can be written as
$(\forall n \in \{0, \ldots, N-1\})\quad x_n = \langle T_n \mid h\rangle + u_n$, (6.2)
which is a special case of (4.16).
FIGURE 15. Original signal. © 1994 IEEE [42], with permission.
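A sketch of how data obeying (6.1)-(6.2) can be simulated: the rows $T_n$ of a Toeplitz blur with a sampled Gaussian kernel of variance 2 samples², plus noise drawn uniformly in $[-\delta, \delta]$ with $\delta = 0.15$. The kernel truncation at `half_width` samples, its normalization to unit sum, the zero boundary handling, and the test signal `h` are assumptions made for illustration; [42] does not specify them here.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def gaussian_toeplitz(N: int, variance: float, half_width: int) -> Matrix:
    """N x N Toeplitz blur: T[n][j] depends only on n - j."""
    kernel = [math.exp(-k * k / (2.0 * variance)) for k in range(-half_width, half_width + 1)]
    total = sum(kernel)
    kernel = [v / total for v in kernel]  # assumed unit-gain kernel
    T = [[0.0] * N for _ in range(N)]
    for n in range(N):
        for k in range(-half_width, half_width + 1):
            j = n - k
            if 0 <= j < N:  # samples outside the support are taken as zero
                T[n][j] = kernel[k + half_width]
    return T


def degrade(h: Vector, T: Matrix, delta: float, seed: int = 0) -> Vector:
    """x_n = <T_n | h> + u_n with u_n uniform in [-delta, delta], as in (6.2)."""
    rng = random.Random(seed)
    return [sum(t * v for t, v in zip(row, h)) + rng.uniform(-delta, delta) for row in T]


N, delta = 64, 0.15
T = gaussian_toeplitz(N, variance=2.0, half_width=6)
h = [12.0 if 20 <= i < 30 else 0.0 for i in range(N)]  # hypothetical test signal
x = degrade(h, T, delta)
```

By construction every residual $x_n - \langle T_n \mid h\rangle$ lies in $[-\delta, \delta]$, so the true signal belongs to each of the hyperslabs built from this information below; this is what makes the correctly specified formulation consistent.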
2. Set Theoretic Formulation
The set theoretic formulation for the problem consists of $m = 66$ closed and convex sets. The sets $(S_n)_{0\le n\le N-1}$ are based on the knowledge of the blurring operator $T$ and the information that the noise samples are distributed in $[-\delta, \delta]$. According to the analysis of Section IV.C.4, they take the form of hyperslabs defined by (4.26), namely,
$S_n = \{a \in \mathbb{E}^N \mid x_n - \delta \le \langle T_n \mid a\rangle \le x_n + \delta\}$, for $0 \le n \le N-1$. (6.3)
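Therefore, the projection $P_n(a)$ of a signal $a$ onto $S_n$ is given by (2.24) and reads
$P_n(a) = a + \dfrac{x_n + \delta - \langle T_n \mid a\rangle}{\|T_n\|^2}\, T_n$ if $\langle T_n \mid a\rangle > x_n + \delta$;
$P_n(a) = a + \dfrac{x_n - \delta - \langle T_n \mid a\rangle}{\|T_n\|^2}\, T_n$ if $\langle T_n \mid a\rangle < x_n - \delta$;
$P_n(a) = a$ otherwise. (6.4)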
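The three-case formula (6.4) amounts to clipping the scalar $\langle T_n \mid a\rangle$ to $[x_n - \delta, x_n + \delta]$ and moving along $T_n$ by the corresponding amount. A minimal sketch (the function name is hypothetical), written for general bounds `lo` and `hi` so that it also covers slabs of the form $[x_n - \delta_n, x_n + \delta_n]$ or $[x_n - R, x_n]$:

```python
from typing import List

Vector = List[float]


def project_hyperslab(a: Vector, t_n: Vector, lo: float, hi: float) -> Vector:
    """Project a onto {b : lo <= <t_n | b> <= hi} (assumes t_n != 0 and lo <= hi)."""
    s = sum(t * v for t, v in zip(t_n, a))
    if lo <= s <= hi:
        return list(a)                   # already in the slab
    target = hi if s > hi else lo        # nearest face of the slab
    c = (target - s) / sum(t * t for t in t_n)
    return [v + c * t for v, t in zip(a, t_n)]


# (6.4) with x_n = 1 and delta = 0.5: the slab is 0.5 <= a0 + a1 <= 1.5.
print(project_hyperslab([2.0, 2.0], [1.0, 1.0], 1.0 - 0.5, 1.0 + 0.5))  # [0.75, 0.75]
```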
The next set is constructed by assuming knowledge of the phase of $h$. From (4.13), we obtain
$S_{m-1} = \{a \in \mathbb{E}^N \mid (\forall k \in \{0, \ldots, N-1\})\ \angle\hat{a}(k) = \angle\hat{h}(k)\}$, (6.5)
where $\hat{h}$ is the $N$-point DFT of $h$. Since the DFT operator is an isometry (up to a factor $N$), the projection onto $S_{m-1}$ can be performed in the DFT domain for each frequency individually. It is then easy to show that the projection of a signal $a$ onto $S_{m-1}$ is the signal $P_{m-1}(a) = b$, where for every $k$ in $\{0, \ldots, N-1\}$
$\hat{b}(k) = 0$ if $\cos(\angle\hat{a}(k) - \angle\hat{h}(k)) \le 0$, and $\hat{b}(k) = |\hat{a}(k)| \cos(\angle\hat{a}(k) - \angle\hat{h}(k)) \exp(\imath\angle\hat{h}(k))$ otherwise. (6.6)
The last set arises from the prior knowledge that the components of $h$ are nonnegative and bounded by $\|h\|_\infty = 12$. This leads to the bounded set (4.1) defined by
$S_m = \{a \in \mathbb{E}^N \mid (\forall i \in \{0, \ldots, N-1\})\ 0 \le a^{(i)} \le \|h\|_\infty\}$. (6.7)
The projection of a signal $a$ onto $S_m$ is given by $P_m(a) = b$, where
$(\forall i \in \{0, \ldots, N-1\})\quad b^{(i)} = 0$ if $a^{(i)} < 0$; $b^{(i)} = \|h\|_\infty$ if $a^{(i)} > \|h\|_\infty$; $b^{(i)} = a^{(i)}$ otherwise. (6.8)
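Since $|\hat{a}(k)|\cos(\angle\hat{a}(k) - \angle\hat{h}(k)) = \operatorname{Re}(\hat{a}(k)e^{-\imath\angle\hat{h}(k)})$, the projection (6.6) keeps, frequency by frequency, the component of $\hat{a}(k)$ along the known phase direction when that component is positive. The sketch below implements (6.6) with a naive $O(N^2)$ DFT to stay self-contained (a practical implementation would use an FFT), together with the clipping (6.8). The function names are hypothetical, and frequencies where $\hat{h}(k) = 0$ simply receive whatever phase `cmath.phase` returns.

```python
import cmath
from typing import List, Sequence

Vector = List[float]


def dft(a: Sequence[complex]) -> List[complex]:
    N = len(a)
    return [sum(a[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) for k in range(N)]


def idft(A: Sequence[complex]) -> List[complex]:
    N = len(A)
    return [sum(A[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N for n in range(N)]


def project_phase(a: Vector, phase_h: Sequence[float]) -> Vector:
    """Projection (6.6) onto the signals whose DFT phase equals phase_h."""
    B = []
    for A_k, phi_k in zip(dft(a), phase_h):
        u = cmath.exp(1j * phi_k)          # unit vector with the prescribed phase
        r = (A_k * u.conjugate()).real     # |A_k| cos(angle(A_k) - phi_k)
        B.append(r * u if r > 0 else 0j)
    # Conjugate symmetry is preserved, so the imaginary parts are round-off only.
    return [v.real for v in idft(B)]


def project_box(a: Vector, upper: float) -> Vector:
    """Projection (6.8) onto {a : 0 <= a^(i) <= upper}."""
    return [min(max(v, 0.0), upper) for v in a]


h = [1.0, 2.0, 0.5, 0.0]
phase_h = [cmath.phase(H_k) for H_k in dft(h)]
print([round(v, 9) for v in project_phase([2.0 * v for v in h], phase_h)])
```

A positive multiple of $h$ already has the phase of $h$ and is left unchanged, whereas $-h$ has the opposite phase at every frequency and is sent to the zero signal.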
3. Results
All the algorithms are initialized with the degraded signal, i.e., $a_0 = x$. The feasible signal of Fig. 17 is obtained by POCS. It is seen that most features of $h$ have been fairly well recovered. Next, we introduce inaccuracies in the specifications of the a priori information that will induce an inconsistent set theoretic formulation: the variance of the Gaussian impulse response of the system is taken to be 2.5 samples² instead of 2, the bound on the noise is taken to be 0.1 instead of 0.15, and the phase of $h$ is recorded in 10 dB of background noise. The limiting signal of the subsequence generated by POCS in this case is depicted in Fig. 18. As discussed in Section V.B.3, the only definite property of this signal is to lie in $S_m$ and, thereby, to satisfy the amplitude constraints. The convergence behavior of POCS in the inconsistent case is shown in Fig. 19, where the values taken by the proximity function $(\Phi(a_{mn}))_{n\ge0}$ are plotted. The limiting value of the proximity function (degree of unfeasibility) achieved by POCS is about 0.136. PPM is then employed to produce the restored signal shown in Fig. 20. This least-squares solution to the inconsistent feasibility problem has fewer artifacts than the solution generated by POCS. The sequence $(\Phi(a_n))_{n\ge0}$ produced by PPM is shown in Fig. 21. PPM achieves a much lower asymptotic degree of unfeasibility than POCS, namely about 0.035.
FIGURE 17. Consistent case: deconvolution by POCS. © 1994 IEEE [42], with permission.
FIGURE 18. Inconsistent case: deconvolution by POCS. © 1994 IEEE [42], with permission.
FIGURE 20. Inconsistent case: deconvolution by PPM. © 1994 IEEE [42], with permission.
4. Numerical Performance
Figure 22 depicts the convergence behavior of PPM subjected to various relaxation schemes. In the underrelaxed case A, $\lambda_n$ is drawn randomly from the interval $]0, 1]$; in the unrelaxed case B, the relaxations are equal to 1; in the overrelaxed case C, $\lambda_n$ is drawn randomly from the interval $[1, 2]$; in the adapted case D, $\lambda_n$ is obtained as in Section V.I.1. These plots support the claims of Section V.I.1 to the effect that overrelaxations are more effective than underrelaxations and that Armijo's adapted relaxation scheme is preferable. In all cases, $(\Phi(a_n))_{n\ge0}$ is decreasing, in conformity with Proposition 5.5.
B. Deconvolution with Bounded Uncertainty
We demonstrate an example of deconvolution with bounded kernel disturbances and bounded measurement noise along the lines of [51].
FIGURE 22. Convergence of PPM for various relaxation schemes (A: underrelaxed; B: unrelaxed; C: overrelaxed; D: adapted). © 1994 IEEE [42], with permission.
1. Experiment
In this experiment, the length of the signals is set to $N = 512$. We consider the problem of recovering the original signal $h$ of Fig. 23 from the data signal
$x = Th + \tilde{T}h + w$, (6.9)
where $T$ is an $N \times N$ Toeplitz matrix representing a known shift-invariant linear blur with kernel $b$, $\tilde{T}$ an $N \times N$ matrix representing an unknown shift-variant linear blur with kernel $\tilde{b}$, and $w$ a vector of bounded noise samples. We note that (6.9) is a special case of (4.16)-(4.17), which can be written componentwise as
$(\forall n \in \{0, \ldots, N-1\})\quad x_n = \langle T_n \mid h\rangle + \langle \tilde{T}_n \mid h\rangle + w_n = \langle T_n \mid h\rangle + u_n$, (6.10)
where $T_n$ and $\tilde{T}_n$ denote, respectively, the $n$th rows of $T$ and $\tilde{T}$. The known convolutional kernel $b$ which makes up the rows of $T$ is uniform and has length $l = 16$ points, i.e.,
$(\forall i \in \{0, \ldots, N-1\})\quad b^{(i)} = 1/l$ if $0 \le i \le l-1$, and $b^{(i)} = 0$ otherwise. (6.11)
The unknown blurring kernel $\tilde{b}$ which makes up the rows of $\tilde{T}$ is shift-variant and has the same $l$-point region of support as $b$. Furthermore, for every $n$, a bound on the $\ell^1$-norm of each $\tilde{T}_n$ is available, say, $\|\tilde{T}_n\|_1 \le \alpha_n$. It is also assumed that $h$ is nonnegative with maximum value $\|h\|_\infty = 7.4$. Finally, the absolute bound on the noise samples $(w_n)_{0\le n\le N-1}$ is $\beta = 0.1$.
2. Set Theoretic Formulation
From the above information, a bound on the uncertainty signal samples can be derived as¹⁶
$(\forall n \in \{0, \ldots, N-1\})\quad |u_n| \le \alpha_n \|h\|_\infty + \beta \triangleq \delta_n$. (6.12)
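For the two experiments reported in the results that follow, where $\|h\|_\infty = 7.4$ and $\beta = 0.1$, the bound (6.12) evaluates as follows:

```latex
\delta_n = \alpha_n \|h\|_\infty + \beta =
\begin{cases}
0 \times 7.4 + 0.1 = 0.1, & \alpha_n = 0 \ \text{(known blur)},\\
0.04 \times 7.4 + 0.1 = 0.396, & \alpha_n = 0.04 \ \text{(perturbed blur)}.
\end{cases}
```

Each hyperslab therefore widens from $2\delta_n = 0.2$ to $2\delta_n = 0.792$, nearly a fourfold increase in width.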
As seen in Section VI.A.2, we obtain from (4.26) the sets
$S_n = \{a \in \mathbb{E}^N \mid x_n - \delta_n \le \langle T_n \mid a\rangle \le x_n + \delta_n\}$, for $0 \le n \le N-1$. (6.13)
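The projection of a signal $a$ onto $S_n$ is
$P_n(a) = a + \dfrac{x_n + \delta_n - \langle T_n \mid a\rangle}{\|T_n\|^2}\, T_n$ if $\langle T_n \mid a\rangle > x_n + \delta_n$;
$P_n(a) = a + \dfrac{x_n - \delta_n - \langle T_n \mid a\rangle}{\|T_n\|^2}\, T_n$ if $\langle T_n \mid a\rangle < x_n - \delta_n$;
$P_n(a) = a$ otherwise. (6.14)
The last set $S_N$ is based on the information on the amplitude of $h$, which yields the same set as in (6.7). The feasibility set is $S = \bigcap_{i=0}^{N} S_i$.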
3. Results
As all the projections are easy to evaluate, EMOPP is used with $a_0 = x$ to obtain feasible solutions. First, we simulate an instance when the blur is known exactly, i.e., $\alpha_n = 0$. The degraded signal is shown in Fig. 24 and the set theoretic deconvolution in Fig. 25. Then, we introduce shift-varying perturbations in the blurring kernel, with $\alpha_n = 0.04$, to obtain the degraded signal of Fig. 26, whose restoration is shown in Fig. 27. Clearly, the added uncertainty has increased the bounds $(\delta_n)_{0\le n\le N-1}$ and therefore the feasibility set, which results in a poorer restoration. Besides the knowledge of the component $T$ of the system, the information used in this experiment is limited to upper and lower bounds on the input signal and the noise, and upper bounds on the $\ell^1$-norm of the shift-variant disturbances affecting the blurring kernel. Let us emphasize that no statistical assumption has been made and that the only conventional deconvolution method that could be implemented with such little information would be inverse filtering, which is known to give unacceptable results [5].
¹⁶ In general, such a bound can be obtained via Hölder's inequality as long as the $\ell^p$-norm of $h$ is known as well as bounds $(\alpha_n)_{0\le n\le N-1}$ for the $\ell^q$-norms of the random vectors $(\tilde{T}_n)_{0\le n\le N-1}$, where $p \in [1, +\infty]$ and $1/p + 1/q = 1$. For instance, $p = 2$ was chosen to derive (4.29), which assumed prior knowledge of the energy of $h$.
FIGURE 24. Degraded signal: known blur.
FIGURE 25. Deconvolved signal: known blur.
FIGURE 26. Degraded signal: perturbed blur.
FIGURE 27. Deconvolved signal: perturbed blur.
C. Image Restoration with Bounded Noise
In this section, a two-dimensional version of the previous experiment is investigated. It leads to a set theoretic formulation with $m = 16{,}385$ sets, which will allow us to demonstrate the flexibility of EMOPP in large-scale problems. Such large set theoretic formulations have also been encountered in other studies, e.g., [127], [157], [171], where they were solved using POCS.
1. Preliminaries
All images have $N \times N$ pixels ($N = 128$) and will be represented using stacked-vector notation as in Section III.A.2.d. $\mathbb{E}$ is the usual $N^2$-dimensional Euclidean space $\mathbb{E}^{N^2}$. Every algorithm will be initialized with the degraded image, that is, $a_0 = x$, and the progression of its orbit $(a_n)_{n\ge0}$ will be tracked by plotting the normalized decibel values $(10\log_{10}(\Phi(a_n)/\Phi(a_0)))_{n\ge0}$ of the proximity function (3.7), where
$(\forall i \in I)\quad w_i = 1/(\operatorname{card} I)$. (6.15)
As a practical stopping rule to compare performance, we shall use the criterion
$\Phi(a_n) \le \dfrac{\|h\|^2}{1300\,\operatorname{card} I}$. (6.16)
As seen in Section V.I.2.d, the sequence $(\Phi(a_n))_{n\ge0}$ will usually not be computed in actual applications, but we use it here as we need a pertinent and uniform quantification of the notion of unfeasibility to compare accurately the performance of the algorithms.
2. Experiment
The original image $h$ of Fig. 28 is degraded by convolutional blur with a uniform $7 \times 7$ kernel $b$ and addition of noise. The noise samples are distributed in the interval $[0, R]$ and the resulting blurred image-to-noise ratio is 32 dB. The degraded image $x$ is shown in Fig. 29. It can be written as
$x = Th + u$, (6.17)
where $T$ is the $N^2 \times N^2$ block-Toeplitz matrix associated with the point spread function $b$ [5] and $u$ is a noise vector.
FIGURE 28. Original image.
FIGURE 29. Degraded image: bounded noise.
3. Set Theoretic Formulation
First, we assume that the point spread function $b$ (or, equivalently, $T$) is known. No probabilistic information is available about the noise vector $u$, except that its components lie in $[0, R]$. As before, this information leads to the $N^2$ hyperslabs
$S_n = \{a \in \mathbb{E}^{N^2} \mid 0 \le x_n - \langle T_n \mid a\rangle \le R\}$, for $0 \le n \le N^2-1$, (6.18)
where T, is the nth row of T. Then, by using the fact that the pixel values are nonnegative, the last property set we obtain is the nonnegative orthant SN2
= (R+)N2.
(6.19)
The projection of an image $a$ onto $S_{N^2}$ is simply
$$P_{N^2}(a) = a^+ = {}^t\big[\max\{0, a^{(i)}\}\big]_{0\le i\le N^2-1}. \tag{6.20}$$
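Both kinds of projections used in this formulation have closed forms. The sketch below works on images stored as flat Python lists (an assumption made for readability; a real implementation would exploit the structure of $T$). It projects onto a hyperslab (6.18), rewritten as $x_n - R \le \langle T_n \mid a\rangle \le x_n$, and onto the orthant (6.19) via (6.20). The function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def project_hyperslab(a: Vector, t_row: Sequence[float], x_n: float, R: float) -> Vector:
    """Projection onto S_n = {a : 0 <= x_n - <T_n | a> <= R} of (6.18).

    Equivalently x_n - R <= <T_n | a> <= x_n; t_row (the row T_n) must be nonzero.
    """
    s = sum(ti * ai for ti, ai in zip(t_row, a))
    norm_sq = sum(ti * ti for ti in t_row)
    if s < x_n - R:
        beta = (x_n - R - s) / norm_sq   # move up onto the lower face
    elif s > x_n:
        beta = (x_n - s) / norm_sq       # move down onto the upper face
    else:
        return list(a)                   # constraint already satisfied
    return [ai + beta * ti for ai, ti in zip(a, t_row)]


def project_orthant(a: Vector) -> Vector:
    """Projection (6.20) onto the nonnegative orthant: componentwise max(0, a_i)."""
    return [max(0.0, ai) for ai in a]
```

Each hyperslab projection touches only the support of one row $T_n$, which is why processing thousands of such sets per iteration remains cheap.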
The set theoretic formulation is $(\mathbb{E}^{N^2}, (S_i)_{0\le i\le N^2})$ and it comprises $m = N^2 + 1 = 16{,}385$ sets. Since all the projections are easily computed, EMOPP will be used to solve the feasibility problem.
4. Numerical Performance
POCS (5.4) is implemented by skipping the nonviolated sets so that each iteration actually produces an update. The convergence pattern of POCS is shown in Fig. 30. To implement EMOPP, computer architectures with $P = 8$ and $P = 64$ parallel processors are considered.* At each iteration, the control selects $P$ sets as follows: $S_{N^2}$, if it is violated, and a block of consecutive violated sets in (6.18). In addition, over the iterations, the sets $(S_n)_{0\le n\le N^2-1}$ are swept through in a circular fashion. Three values of the relaxation parameter $\lambda_n$ are considered: $1$, $L_n$, and $1.9L_n$. In Figs. 31 and 32, the corresponding algorithms are labeled EMOPP(1), EMOPP(L), and EMOPP(1.9L), respectively. These plots clearly show the numerical superiority of EMOPP and the remarkable acceleration provided by extrapolated overrelaxations. For instance, the -55 dB mark corresponding to the stopping rule (6.16) was reached by POCS in 44,700 iterations. By contrast, EMOPP(1.9L) reached this point in only 5346 iterations with 8 processors, and in 1168 iterations with 64 processors.
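The control and relaxation strategy just described can be illustrated with a small sketch. It assumes the extrapolated update $a_{n+1} = a_n + \lambda_n\big(\sum_{i\in I_n} w_{i,n}P_i(a_n) - a_n\big)$ with equal weights over the selected sets and $L_n = \sum_{i\in I_n} w_{i,n}\|P_i(a_n) - a_n\|^2 / \|\sum_{i\in I_n} w_{i,n}P_i(a_n) - a_n\|^2$. The precise statement of EMOPP and its admissible relaxation range are those of Section V.D.2. Sequential evaluation stands in for the $P$ processors, and all names below are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Projection = Callable[[Vector], Vector]


def sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def emopp_sketch(a0: Vector, projections: Sequence[Projection], P: int,
                 relax: float = 1.9, extrapolate: bool = True,
                 n_iter: int = 1000, tol: float = 1e-20) -> Vector:
    """Extrapolated parallel projections with a circular 'violated sets' control.

    extrapolate=True uses lambda_n = relax * L_n (relax=1 mimics EMOPP(L),
    relax=1.9 mimics EMOPP(1.9L)); extrapolate=False uses lambda_n = relax.
    """
    a = list(a0)
    m = len(projections)
    start = 0
    for _ in range(n_iter):
        # Control: collect up to P violated sets, sweeping circularly from `start`.
        selected = []
        for k in range(m):
            i = (start + k) % m
            p = projections[i](a)
            if sq_dist(p, a) > tol:
                selected.append(p)
                if len(selected) == P:
                    start = (i + 1) % m
                    break
        if not selected:
            return a  # no violated set: a is (numerically) feasible
        w = 1.0 / len(selected)
        avg = [w * sum(p[j] for p in selected) for j in range(len(a))]
        den = sq_dist(avg, a)
        num = w * sum(sq_dist(p, a) for p in selected)
        L = num / den if den > 0.0 else 1.0  # L_n >= 1 by convexity of ||.||^2
        lam = relax * L if extrapolate else relax
        a = [aj + lam * (vj - aj) for aj, vj in zip(a, avg)]
    return a


# Toy feasibility problem in the plane: x >= 1, y >= 1, x + y <= 3.
def P1(a: Vector) -> Vector:
    return [max(a[0], 1.0), a[1]]


def P2(a: Vector) -> Vector:
    return [a[0], max(a[1], 1.0)]


def P3(a: Vector) -> Vector:
    s = a[0] + a[1]
    if s <= 3.0:
        return list(a)
    return [a[0] + (3.0 - s) / 2.0, a[1] + (3.0 - s) / 2.0]


x_extrap = emopp_sketch([-5.0, -5.0], [P1, P2, P3], P=2)  # like EMOPP(1.9L)
x_plain = emopp_sketch([-5.0, -5.0], [P1, P2, P3], P=2, relax=1.0, extrapolate=False)  # like EMOPP(1)
```

On this toy problem the extrapolated run lands inside the feasible set after a handful of steps, whereas the unrelaxed run only approaches the boundary $x = y = 1$ geometrically, which mirrors the qualitative behavior reported above.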
5. Results
The restored image obtained by EMOPP is shown in Fig. 33. Again, it is important to stress that the only information available about the noise consists of amplitude bounds and that no probabilistic assumption whatsoever has been made. None of the conventional methods could operate with so little information (except inverse filtering, which is unacceptable in the presence of noise).
* Our AT&T Pixel Machines have 64 parallel processors.
FIGURE 30. Convergence of POCS (horizontal axis: iteration index, x 10^4).
FIGURE 32. Convergence of EMOPP: 64 parallel processors (curves include EMOPP(1) and EMOPP(1.9L); horizontal axis: iteration index).
FIGURE 33. Restored image: bounded noise.
6. Bounded versus Unbounded Noise
As was mentioned in Section IV.C.4.a, in the presence of bounded noise the confidence coefficient $c$ on the solution set defined in (4.55) is 100%. A question that naturally arises is what happens when the noise is unbounded. To answer this question, let us assume that the components of the noise vector $u$ in (6.17) are i.i.d. and distributed as a zero mean normal r.v. $U_0$ with known second moment $\sigma^2 = E|U_0|^2$, adjusted so that the blurred image-to-noise ratio is again 32 dB. The degraded image $x$ thus obtained is shown in Fig. 34. According to the results of Section IV.C.4.b, the sets $(S_n)_{0\le n\le N^2-1}$ take the form
$$S_n = \{a\in\mathbb{E}^{N^2} \mid x_n - \alpha\sigma \le \langle T_n\mid a\rangle \le x_n + \alpha\sigma\}\quad\text{for } 0\le n\le N^2-1, \tag{6.21}$$
where $\alpha$ is to be determined from the tables of the standard normal distribution in terms of the confidence coefficient $1-\varepsilon$ placed on each set. Now suppose that we fix the global confidence coefficient at $c = 95\%$ in (4.55). Then, since the noise samples are independent, we must have
$$1-\varepsilon = c^{1/N^2} = 99.999687\%, \tag{6.22}$$
which gives $\alpha = 4.662$. Of course, these sets are “wider” than those obtained in the case of bounded noise. For instance, assume that in the experiment of Section VI.C.2 the noise samples were i.i.d. and distributed
FIGURE 34. Degraded image: Gaussian noise.
uniformly in $[0, R]$ as a r.v. $V_0$ of the same power as $U_0$, i.e., $E|V_0|^2 = \sigma^2$. Then the residual samples were constrained in (6.18) to fall in the interval $[0, R]$, whose length is $R = \sqrt{3}\,\sigma \approx 1.732\sigma$ (since $E|V_0|^2 = R^2/3$). In comparison, for Gaussian noise, they were constrained in (6.21) to fall in a wider interval of length $2\alpha\sigma = 9.324\sigma$. As a result, the restoration obtained in this case with the sets of (6.19) and (6.21) is seen in Fig. 35 to be very poor. We conclude that in the presence of bounded noise the sets (4.26) are quite effective and require minimal information to be constructed. On the other hand, when the noise is unbounded, they must be made large in order to secure a reasonable confidence coefficient $c$ on their intersection. Consequently, they usually fail to describe the original image accurately and must be accompanied by other sets. It should also be noted that a substantial amount of information is required to construct these sets in the case of unbounded noise. For instance, in the above example, the i.i.d. assumption was used and knowledge of the distribution of $U_0$ was assumed. Fortunately, when such information is available, all the sets described in Section IV.C can be constructed to refine the set theoretic formulation, as we shall see in the next section.
FIGURE 35. Restored image: Gaussian noise.
D. Image Restoration via Subgradient Projections
We consider here an application of EMOPSP to set theoretic image restoration. A similar example was presented in [41]. $\mathcal{F}$ is the two-dimensional DFT operator defined in (3.6), and the basic setup is as in Section VI.C.1.
1. Experiment
The experiment is the same as in Section VI.C.6: the degraded image $x$ of Fig. 34 is obtained by convolving the original image $h$ of Fig. 28 with a known uniform $7\times7$ kernel $b$ and adding zero mean white Gaussian noise with power $\sigma^2$. The blurred image-to-noise ratio is 32 dB. The number of parallel processors is $P = 4$.
2. Set Theoretic Formulation
We first assume that the maximum intensity value $\|h\|_\infty$ of $h$ is known, which yields the set
$$S_1 = \{a\in\mathbb{E}^{N^2} \mid (\forall i\in\{0,\dots,N^2-1\})\;\; 0\le a^{(i)}\le \|h\|_\infty\}. \tag{6.23}$$
The projection operator $P_1$ is described in (6.8). Next, we assume that the discrete Fourier transform of $h$ is known over the low-frequency region
$$K' = \{(k,l)\in\{0,\dots,N-1\}^2 \mid 0\le k,l\le M\}, \tag{6.24}$$
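where $M = 21$. Recall that the two-dimensional DFT of a real image possesses the conjugate-symmetry properties
$$(\forall (k,l)\in\{0,\dots,N-1\}^2)\quad \begin{cases} \hat h(k,0) = \overline{\hat h(N-k,0)} & \text{if } k\ne0,\\ \hat h(0,l) = \overline{\hat h(0,N-l)} & \text{if } l\ne0,\\ \hat h(k,l) = \overline{\hat h(N-k,N-l)} & \text{if } kl\ne0. \end{cases} \tag{6.25}$$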
The set $K'$ must therefore be extended accordingly to a set $K$ including all the symmetric pairs. The associated property set is then given by (4.9) as
$$S_2 = \{a\in\mathbb{E}^{N^2} \mid \hat a 1_K = \hat h 1_K\}. \tag{6.26}$$
Note that we have $\hat a = \hat a 1_K + \hat a 1_{\complement K}$. Hence, the projection of $a$ onto $S_2$ is given by
$$P_2(a) = \mathcal{F}^{-1}(\hat h 1_K + \hat a 1_{\complement K}). \tag{6.27}$$
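Projection (6.27) amounts to overwriting the DFT coefficients on $K$ with those of $h$ and inverting. The sketch below uses a naive $O(N^4)$ DFT in pure Python so that it is self-contained (in practice an FFT would be used). It assumes the unnormalized convention $\hat a(k,l) = \sum_{m,n} a(m,n)e^{-2\pi i(km+ln)/N}$ for (3.6), and it builds $K$ from $K'$ by adding the conjugate-symmetric pairs of (6.25). All names are hypothetical.

```python
import cmath
from typing import List, Set, Tuple

Spectrum = List[List[complex]]


def dft2(a: List[float], N: int) -> Spectrum:
    """Unnormalized 2-D DFT of an N x N image stored row-major in a flat list."""
    return [[sum(a[m * N + n] * cmath.exp(-2j * cmath.pi * (k * m + l * n) / N)
                 for m in range(N) for n in range(N))
             for l in range(N)] for k in range(N)]


def idft2(s: Spectrum, N: int) -> List[float]:
    """Inverse of dft2; keeps the real part (s is assumed conjugate-symmetric)."""
    return [(sum(s[k][l] * cmath.exp(2j * cmath.pi * (k * m + l * n) / N)
                 for k in range(N) for l in range(N)) / N ** 2).real
            for m in range(N) for n in range(N)]


def symmetric_region(N: int, M: int) -> Set[Tuple[int, int]]:
    """K' = {(k, l) : 0 <= k, l <= M} extended with the pairs ((N-k) mod N, (N-l) mod N)."""
    K = set()
    for k in range(M + 1):
        for l in range(M + 1):
            K.add((k, l))
            K.add(((N - k) % N, (N - l) % N))
    return K


def project_known_spectrum(a: List[float], h_hat: Spectrum,
                           K: Set[Tuple[int, int]], N: int) -> List[float]:
    """P_2(a): replace the DFT of a by h_hat on K, keep it elsewhere, then invert."""
    a_hat = dft2(a, N)
    for (k, l) in K:
        a_hat[k][l] = h_hat[k][l]
    return idft2(a_hat, N)


# Small illustration with N = 4 and M = 1.
N = 4
h = [float((3 * i) % 7) for i in range(N * N)]
a = [float((5 * i) % 11) - 3.0 for i in range(N * N)]
K = symmetric_region(N, 1)
h_hat = dft2(h, N)
a_hat0 = dft2(a, N)
p = project_known_spectrum(a, h_hat, K, N)
p_hat = dft2(p, N)
```

Because $K$ is closed under $(k,l)\mapsto(N-k,N-l)$, the modified spectrum stays conjugate-symmetric and the projected image is real up to rounding.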
The information that the noise is zero mean, white, and Gaussian with power $\sigma^2$ provides a complete description of its probabilistic structure. Hence, all the sets described in Section IV.C can be constructed. For instance, since the noise samples are i.i.d. with second and fourth moments given, respectively, by $\sigma^2$ and $3\sigma^4$, the second moment set (4.41) becomes
$$S_3 = \{a\in\mathbb{E}^{N^2} \mid \|x - Ta\|^2 \le \xi^2\},\quad\text{where } \xi^2 = N(N + \alpha\sqrt{2})\,\sigma^2. \tag{6.28}$$
(Indeed, $\|u\|^2$ has mean $N^2\sigma^2$ and standard deviation $N\sqrt{2}\,\sigma^2$.)
This set has proven quite useful in several applications, e.g., [48], [171]. Unfortunately, we have seen in Section V.E.1 that its projection operator must be determined iteratively via a costly procedure, which precludes its use in certain applications [127]. However, with EMOPSP, $S_3$ can simply be activated via (5.86), where the subgradient projection of an image $a$ onto $S_3$ was seen to be
otherwise, where $y(a) = x - Ta$ is the residual image. Upon making the standard block-circulant approximation on the matrix $T$ [5], $T$ is diagonalized by the two-dimensional DFT. Hence, the upper line in (6.29) can be computed efficiently via the fast Fourier transform (FFT) as
(6.30) where we have kept the notation $\|\cdot\|$ to designate the norm in the Fourier space, i.e.,
$$(\forall a\in\mathbb{E}^{N^2})\quad \|\hat a\|^2 = \frac{1}{N^2}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1}|\hat a(k,l)|^2. \tag{6.31}$$
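The subgradient projection that replaces the exact projection onto $S_3$ can be sketched as follows. The sketch assumes the standard construction behind (5.86): with the convex function $f(a) = \|x - Ta\|^2 - \xi^2$, whose gradient is $-2T^{\top}y(a)$, the image $a$ is left unchanged when $f(a)\le 0$ and is otherwise moved to $a - f(a)\nabla f(a)/\|\nabla f(a)\|^2$. For clarity it uses an explicit small matrix $T$ instead of the FFT evaluation of (6.30); the names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def subgradient_projection_S3(a: Vector, T: Matrix, x: Vector, xi_sq: float) -> Vector:
    """Subgradient projection onto S_3 = {a : ||x - T a||^2 <= xi_sq}.

    Uses f(a) = ||y(a)||^2 - xi_sq with residual y(a) = x - T a and
    gradient grad f(a) = -2 T^t y(a).
    """
    rows, cols = len(T), len(a)
    y = [x[r] - sum(T[r][c] * a[c] for c in range(cols)) for r in range(rows)]
    f = sum(v * v for v in y) - xi_sq
    if f <= 0.0:
        return list(a)  # a already lies in S_3
    g = [sum(T[r][c] * y[r] for r in range(rows)) for c in range(cols)]  # T^t y(a)
    g_sq = sum(v * v for v in g)
    if g_sq == 0.0:
        raise ValueError("a minimizes ||x - T a|| but lies outside S_3: S_3 is empty")
    # a - f / ||grad f||^2 * grad f with grad f = -2 g gives a + f / (2 ||g||^2) * g
    step = f / (2.0 * g_sq)
    return [ac + step * gc for ac, gc in zip(a, g)]


# Toy example: T z = x for z = [1, 1], so z lies in S_3 for any xi_sq >= 0.
T = [[2.0, 0.0], [1.0, 1.0]]
x = [2.0, 2.0]
z = [1.0, 1.0]
a = [3.0, -1.0]
G = subgradient_projection_S3(a, T, x, 0.5)
```

Unlike the exact projection, this step needs only one residual and one application of $T^{\top}$, and it still moves the iterate strictly closer to every point of $S_3$ whenever $a\notin S_3$.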
The exact computation of $P_3(a)$ proposed in [171] typically requires 10 to 20 iterations of much higher complexity than (6.30). Consequently, the subgradient projection reduces the cost of processing $S_3$ by at least an order of magnitude. To define the last set, which is based on the spectral properties of the noise, let
$$D = \{1,\dots,N/2-1\}\times\{1,\dots,N-1\}. \tag{6.32}$$
Then we can define
$$S_4 = \bigcap_{(k,l)\in D}\{a\in\mathbb{E}^{N^2} \mid |\widehat{y(a)}(k,l)|^2 \le \zeta\},\quad\text{where } \zeta = -N^2\sigma^2\ln\varepsilon. \tag{6.33}$$
We observe that this is not exactly the form in which the sets were given in Section IV.C.6.a. Indeed, it is more convenient here to replace (4.45) by the two-dimensional periodogram
where $(U_n)_{n\in\mathbb{Z}}$ is the noise process. It can be shown that Theorem 4.1(i)-(iii) remains true for the statistics of (6.34), which leads to the above definition of $S_4$. As was done for (6.5), the projection $P_4(a)$ of an image $a$ onto $S_4$ can be performed in the Fourier domain for every frequency pair $(k,l)$ individually. Note that, for any frequencies $(k,l)\in D$ such that $\hat b(k,l)\ne0$, the constraint on the residual can be written as (6.35). The projection onto this ball is given by (2.25). Consequently, by taking (6.25) into account, we obtain
where $(k,l)\in D$ or $(N-k,N-l)\in D$. (6.37)
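The frequency-wise projection onto $S_4$ can be sketched on a DFT array. Assuming that the block-circulant model gives $\widehat{y(a)} = \hat x - \hat b\hat a$, and writing $\zeta$ for the bound in (6.33), each constraint with $\hat b(k,l)\ne0$ reads $|\hat a(k,l) - \hat x(k,l)/\hat b(k,l)| \le \sqrt{\zeta}/|\hat b(k,l)|$, a disc in the complex plane. The projected value is then mirrored to $(N-k,N-l)$ by conjugation, in line with (6.25). The sketch loops over $D$ from (6.32); the function names are hypothetical.

```python
import math
from typing import List

Spectrum = List[List[complex]]


def project_disc(v: complex, center: complex, radius: float) -> complex:
    """Projection of v onto the closed disc {w : |w - center| <= radius}."""
    d = v - center
    r = abs(d)
    if r <= radius:
        return v
    return center + radius * d / r


def project_S4_spectrum(a_hat: Spectrum, x_hat: Spectrum, b_hat: Spectrum,
                        zeta: float, N: int) -> Spectrum:
    """Frequency-wise projection onto S_4 over D = {1..N/2-1} x {1..N-1}.

    Returns a modified copy of a_hat; coefficients outside D and its mirror
    image are left unchanged.
    """
    out = [row[:] for row in a_hat]
    for k in range(1, N // 2):
        for l in range(1, N):
            b = b_hat[k][l]
            if b == 0:
                continue  # the constraint at (k, l) does not involve a
            v = project_disc(out[k][l], x_hat[k][l] / b, math.sqrt(zeta) / abs(b))
            out[k][l] = v
            out[N - k][N - l] = v.conjugate()  # keep the image real, cf. (6.25)
    return out


N = 4
a_hat = [[3 + 4j] * N for _ in range(N)]
x_hat = [[0j] * N for _ in range(N)]
b_hat = [[1 + 0j] * N for _ in range(N)]
out = project_S4_spectrum(a_hat, x_hat, b_hat, 1.0, N)
```

Since the pairs $(N-k,N-l)$ with $(k,l)\in D$ never fall back into $D$, each disc projection is applied exactly once and the mirrored assignment cannot overwrite it.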
To fully specify the sets $S_3$ and $S_4$, it remains to choose the confidence parameters $\alpha$ and $\varepsilon$. To this end, let us impose a global confidence coefficient of $c = 95\%$ on the feasibility set in (4.55) and let us call $p$ the confidence coefficient to be placed on each of $S_3$ and $S_4$. Consider the events
$$A_3 = \{\omega\in\Omega \mid h\in S_3(\omega)\}\quad\text{and}\quad A_4 = \{\omega\in\Omega \mid h\in S_4(\omega)\}. \tag{6.38}$$
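Note that since the statistics (4.36) and (6.34) are not mutually independent, we cannot take $p = \sqrt{c}$. We can nonetheless derive the value of $p$ from the relations
$$\begin{aligned} c &= \mathrm{P}(A_3\cap A_4) && (6.39)\\ &= 1 - \mathrm{P}(\complement A_3\cup\complement A_4) && (6.40)\\ &\ge 1 - \mathrm{P}(\complement A_3) - \mathrm{P}(\complement A_4) && (6.41)\\ &= 2p - 1. && (6.42) \end{aligned}$$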
Hence, taking $p = 97.5\%$ guarantees $c\ge95\%$ and yields $\alpha = 2.241$ in (6.28). Moreover, since the statistics $(I_{k,l})_{(k,l)\in D}$ are independent, the confidence
coefficient $1-\varepsilon$ on the $(N-1)(N/2-1)$ sets defining $S_4$ should satisfy $(1-\varepsilon)^{(N-1)(N/2-1)} = p$, which yields $\varepsilon = 3.164\times10^{-6}$ in (6.33). We have now completely defined the set theoretic formulation $(\mathbb{E}^{N^2}, (S_i)_{1\le i\le4})$ for this problem.
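The numerical values of the confidence parameters quoted in this section can be reproduced with the Python standard library. The sketch assumes, consistently with the quoted values, two-sided standard normal quantiles for $\alpha$ (so that $\mathrm{P}(|Z|\le\alpha)$ equals the per-set confidence), independence where the text invokes it, and the bound $c\ge 2p-1$ of (6.39)-(6.42). Variable names are hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_alpha(conf: float) -> float:
    """alpha such that P(|Z| <= alpha) = conf for a standard normal Z."""
    return NormalDist().inv_cdf(0.5 + conf / 2.0)


N = 128
c = 0.95  # global confidence coefficient in (4.55)

# Hyperslabs (6.21)-(6.22): N^2 independent sets, so 1 - eps = c**(1/N^2).
per_set = math.exp(math.log(c) / N ** 2)
alpha_slab = two_sided_alpha(per_set)          # approximately 4.662
gauss_width = 2.0 * alpha_slab                 # 2*alpha*sigma, in units of sigma
uniform_width = math.sqrt(3.0)                 # R / sigma when E|V_0|^2 = R^2/3 = sigma^2

# S_3 and S_4: c >= 2p - 1 by (6.39)-(6.42), hence p = (1 + c) / 2.
p = (1.0 + c) / 2.0
alpha_S3 = two_sided_alpha(p)                  # approximately 2.241

# S_4: (N - 1)(N/2 - 1) independent frequency-wise sets with (1 - eps)**card_D = p.
card_D = (N - 1) * (N // 2 - 1)
eps_S4 = -math.expm1(math.log(p) / card_D)    # approximately 3.164e-6


def xi_sq(sigma: float) -> float:
    """Bound of (6.28): xi^2 = N (N + alpha sqrt(2)) sigma^2 with alpha = alpha_S3."""
    return N * (N + alpha_S3 * math.sqrt(2.0)) * sigma ** 2
```

Using `math.expm1` avoids the cancellation in $1 - p^{1/\operatorname{card} D}$, which matters because $\varepsilon$ is of order $10^{-6}$.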
3. Numerical Performance
Various subgradient projection methods are compared here. The exact projection operators will be used for the sets $S_1$, $S_2$, and $S_4$, since they admit closed-form expressions. On the other hand, $S_3$ will be activated through its subgradient projection. We shall call subgradient POCS (SPOCS) the subgradient version of (5.4) thus obtained. Since we have $P = 4$ processors and $m = 4$ sets, EMOPSP is implemented with static control. Several relaxation schemes are considered. We shall call EMOPSP(1), EMOPSP(1.9), and EMOPSP(L) the algorithms obtained by taking at each iteration $n$ the relaxations $\lambda_n = 1$, $\lambda_n = 1.9$, and $\lambda_n = L_n$, respectively. Finally, EMOPSP(C) designates the algorithm obtained with the centering technique (5.43). Since the control is static, EMOPSP(1) can be regarded as the subgradient version of SIRT (3.11), EMOPSP(1.9) as an overrelaxed subgradient version of PPM (5.23), and EMOPSP(L) as the fully extrapolated subgradient version of EPPM. The convergence patterns are shown in Fig. 36. We notice that the
FIGURE 36. Convergence of subgradient methods (horizontal axis: iteration index).
FIGURE 37. Restored image.
FIGURE 38. Original image.
FIGURE 39. Degraded image.
FIGURE 40. Restored image.
unrelaxed EMOPSP(1) algorithm is slower than SPOCS and that overrelaxations in EMOPSP(1.9) have an accelerating effect. However, the extrapolated algorithm EMOPSP(L) is much faster, and centering in EMOPSP(C) further accelerates the progression of the iterates towards a solution. For instance, the -51 dB mark corresponding to the stopping rule (6.16) was reached by SPOCS in 64 iterations and by EMOPSP(C) in only 14 iterations.
4. Results
Figure 37 shows the image restored by EMOPSP. To give a more complete demonstration of the effectiveness of this particular set theoretic formulation, the same experiment was repeated on the image of Fig. 38. The degraded image appears in Fig. 39 and its restoration in Fig. 40.
VII. SUMMARY
Every image recovery problem is accompanied by some a priori knowledge. Together with the observed data, this a priori knowledge defines constraints on the solutions to the problem. In the conventional approach, an optimality criterion is introduced to define a unique solution, and computational tractability forces many constraints to be left out of the recovery process. As a result, the end product may violate known properties of the image being estimated. In the set theoretic approach, the notion of feasibility prevails: any image that satisfies all the constraints arising from the data and a priori knowledge is an acceptable solution. A set of solutions is thus defined, whose elements are equally likely to have generated the observed data in light of the available information. The main asset of this framework is the great flexibility it provides in incorporating statistical as well as nonstatistical constraints. In addition, the recovered images thus obtained have, by construction, well-defined, tangible, and meaningful properties, which is often more valuable than satisfying some conceptual optimality criterion.
The focus of this survey has been placed on problems in which the property sets associated with the constraints are closed and convex in some Hilbert image space. In this context, the set theoretic image recovery problem can be abstracted into the problem of finding a common point of convex sets, i.e., into a convex feasibility problem. This framework is certainly limited by the restriction to convex constraints. However, this limitation is advantageously counterbalanced by the existence of efficient
algorithms that are guaranteed to find feasible solutions. In addition, a wide range of useful constraints was seen to yield convex property sets.
The field originated in the early 1970s with the formulation of tomographic reconstruction and band-limited extrapolation problems as affine feasibility problems. Because these approaches lacked a general abstract formalism and powerful analytical tools, their scope remained limited both in the nature of the problems and in the amount of information that could be used. As convex feasibility algorithms entered the image recovery toolbox in the early 1980s, the restriction to subspace and half-space property sets disappeared, and a much wider range of information became exploitable. As a result, the set theoretic approach soon gained widespread recognition and found applications in numerous image recovery problems. Very recently, the field has benefited from renewed interest in the convex feasibility problem on the part of several groups of researchers, and efficient parallel alternatives to the rudimentary POCS algorithm have been proposed. We expect such developments to further broaden the scope of set theoretic image recovery by making it less demanding computationally and therefore more widely applicable.
Naturally, the next logical extension would be to relax the convexity requirement on the property sets. In this regard, the lack of a general-purpose, globally convergent method for solving nonconvex feasibility problems seems to be an insurmountable obstacle. On the other hand, it is quite conceivable that suitable methods could be developed for specific problems. Before closing this survey, the unavoidable question should be posed: When should an image recovery problem be formulated as a feasibility problem rather than an optimization problem? A complete and systematic answer is of course not possible, and it would set the stage for endless philosophical discussions.
In addition, some methods are simply known to work better in certain problems, which makes such a debate a rather academic one. Nonetheless, our view is that any optimization approach is acceptable as long as it yields a feasible image. If not, it simply produces a solution which is inconsistent with known facts about the image being estimated.
APPENDIX: ACRONYMS
ART: algebraic reconstruction technique (Section III.B.1)
EMOPP: extrapolated method of parallel projections (Section V.D.2)
EMOPAP: extrapolated method of parallel approximate projections (Section V.E)
EMOPNO: extrapolated method of parallel nonexpansive operators (Section V.G)
EMOPSP: extrapolated method of parallel subgradient projections (Section V.F)
EPPM: extrapolated parallel projection method (Section V.D.1.b)
EPPM2: (generalized) extrapolated parallel projection method (Section V.D.1.b)
POCS: projection onto convex sets (Section III.B.4)
PPM: parallel projection method (Section V.C.3)
SIRT: simultaneous iterative reconstruction technique (Section III.B.1)
ACKNOWLEDGMENTS
This work was supported by the National Science Foundation under grant MIP-9308609.
REFERENCES
1. S. Agmon, “The relaxation method for linear inequalities,” Canadian Journal of Mathematics, vol. 6, no. 3, pp. 382-392, 1954.
2. R. Aharoni, A. Berman, and Y. Censor, “An interior points algorithm for the convex feasibility problem,” Advances in Applied Mathematics, vol. 4, no. 4, pp. 479-489, December 1983.
3. R. Aharoni and Y. Censor, “Block-iterative methods for parallel computation of solutions to convex feasibility problems,” Linear Algebra and Its Applications, vol. 120, pp. 165-175, August 1989.
4. I. Amemiya and T. Ando, “Convergence of random products of contractions in Hilbert space,” Acta Scientiarum Mathematicarum (Szeged), vol. 26, no. 3, pp. 239-244, 1965.
5. H. C. Andrews and B. R. Hunt, Digital Image Restoration. Englewood Cliffs, NJ: Prentice-Hall, 1977.
6. L. Armijo, “Minimization of functions having Lipschitz continuous first partial derivatives,” Pacific Journal of Mathematics, vol. 16, no. 1, pp. 1-3, 1966.
7. J. Arsac, Transformation de Fourier et Théorie des Distributions. Paris: Dunod, 1961.
8. J. P. Aubin, Optima and Equilibria: An Introduction to Nonlinear Analysis. New York: Springer-Verlag, 1993.
9. R. Barakat and G. Newsam, “Algorithms for reconstruction of partially known, bandlimited Fourier transform pairs from noisy data,” Journal of the Optical Society of America A, vol. 2, no. 11, pp. 2027-2039, November 1985.
10. H. H. Bauschke, “A norm convergence result on random products of relaxed projections in Hilbert space,” Transactions of the American Mathematical Society, vol. 347, no. 4, pp. 1365-1373, April 1995.
11. H. H. Bauschke and J. M. Borwein, “On projection algorithms for solving convex feasibility problems,” accepted for publication in SIAM Review.
12. H. H. Bauschke, J. M. Borwein, and A. S. Lewis, “On the method of cyclic projections for convex sets in Hilbert space,” Research report, Simon Fraser University, 1994.
13. P. Billingsley, Convergence of Probability Measures. New York: Wiley, 1968.
14. J. M. Boone, B. A. Arnold, and J. A. Seibert, “Characterization of the point spread function and modulation transfer function of scattered radiation using a digital imaging system,” Medical Physics, vol. 13, pp. 254-256, 1986.
15. G. Bouligand, Introduction à la Géométrie Infinitésimale Directe. Paris: Vuibert, 1932.
16. N. Bourbaki, Éléments de Mathématique: Espaces Vectoriels Topologiques, Chapitres 1 à 5. Paris: Masson, 1981.
17. D. Braess, Nonlinear Approximation Theory. New York: Springer-Verlag, 1986.
18. L. M. Brègman, “The method of successive projection for finding a common point of convex sets,” Soviet Mathematics-Doklady, vol. 6, no. 3, pp. 688-692, May 1965.
19. F. E. Browder, “Convergence theorems for sequences of nonlinear operators in Banach spaces,” Mathematische Zeitschrift, vol. 100, no. 3, pp. 201-225, July 1967.
20. R. E. Bruck, “Random products of contractions in metric and Banach spaces,” Journal of Mathematical Analysis and Applications, vol. 88, no. 2, pp. 319-332, August 1982.
21. D. Butnariu and Y. Censor, “On the behavior of a block-iterative projection method for solving convex feasibility problems,” International Journal of Computer Mathematics, vol. 34, nos. 1-2, pp. 79-94, 1990.
22. D. Butnariu and Y. Censor, “Strong convergence of almost simultaneous block-iterative projection methods in Hilbert spaces,” Journal of Computational and Applied Mathematics, vol. 53, no. 1, pp. 33-42, July 1994.
23. C. L. Byrne, “Iterative image reconstruction algorithms based on cross-entropy minimization,” IEEE Transactions on Image Processing, vol. 2, no. 1, pp. 96-103, January 1993. (“Erratum and addendum,” vol. 4, no. 2, pp. 226-227, February 1995.)
24. J. M. Carazo and J. L. Carrascosa, “Information recovery in missing angular data cases: An approach by the convex projections method in three dimensions,” Journal of Microscopy, vol. 145, pt. 1, pp. 23-43, January 1987.
25. Y. Censor, “Iterative methods for the convex feasibility problem,” Annals of Discrete Mathematics, vol. 20, pp. 83-91, 1984.
26. Y. Censor, “Parallel application of block-iterative methods in medical imaging and radiation therapy,” Mathematical Programming, vol. 42, no. 2, pp. 307-325, 1988.
27. Y. Censor, P. P. B. Eggermont, and D. Gordon, “Strong underrelaxation in Kaczmarz’s method for inconsistent systems,” Numerische Mathematik, vol. 41, no. 1, pp. 83-92, April 1983.
28. Y. Censor and G. T. Herman, “On some optimization techniques in image reconstruction from projections,” Applied Numerical Mathematics, vol. 3, pp. 365-391, 1987.
29. Y. Censor and A. Lent, “Cyclic subgradient projections,” Mathematical Programming, vol. 24, no. 2, pp. 233-235, 1982.
30. A. E. Cetin, “An iterative algorithm for signal reconstruction from bispectrum,” IEEE Transactions on Signal Processing, vol. 39, no. 12, pp. 2621-2628, December 1991.
31. A. E. Cetin and R. Ansari, “Convolution-based framework for signal recovery and applications,” Journal of the Optical Society of America A, vol. 5, no. 8, pp. 1193-1200, August 1988.
32. A. E. Cetin and R. Ansari, “Signal recovery from wavelet transform maxima,” IEEE Transactions on Signal Processing, vol. 42, no. 1, pp. 194-196, January 1994.
33. W. Cheney and A. A. Goldstein, “Proximity maps for convex sets,” Proceedings of the American Mathematical Society, vol. 10, no. 3, pp. 448-450, June 1959.
34. R. T. Chin, C. L. Yeh, and W. S. Olson, “Restoration of multichannel microwave radiometric images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 7, no. 4, pp. 475-484, July 1985.
35. G. Cimmino, “Calcolo approssimato per le soluzioni dei sistemi di equazioni lineari,” La Ricerca Scientifica (Roma), vol. I, pp. 326-333, 1938.
36. M. R. Civanlar and H. J. Trussell, “Digital signal restoration using fuzzy sets,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, no. 4, pp. 919-936, August 1986.
37. D. Cochran, “Phase and magnitude in normalized images,” IEEE Transactions on Image Processing, vol. 3, no. 6, pp. 858-862, November 1994.
38. P. L. Combettes, “The foundations of set theoretic estimation,” Proceedings of the IEEE, vol. 81, no. 2, pp. 182-208, February 1993.
39. P. L. Combettes, “Signal recovery by best feasible approximation,” IEEE Transactions on Image Processing, vol. 2, no. 2, pp. 269-271, April 1993.
40. P. L. Combettes, “Hilbertian convex feasibility problem: Convergence of projection methods,” accepted for publication in Applied Mathematics and Optimization.
41. P. L. Combettes, “Convex set theoretic image recovery by extrapolated iterations of parallel subgradient projections,” IEEE Transactions on Image Processing, submitted.
42. P. L. Combettes, “Inconsistent signal feasibility problems: Least-squares solutions in a product space,” IEEE Transactions on Signal Processing, vol. 42, no. 11, pp. 2955-2966, November 1994.
43. P. L. Combettes, “Construction d’un point fixe commun à une famille de contractions fermes,” Comptes Rendus de l’Académie des Sciences de Paris, Série I, vol. 320, no. 11, pp. 1385-1390, June 1995.
44. P. L. Combettes, M. Benidir, and B. Picinbono, “A general framework for the incorporation of uncertainty in set theoretic estimation,” Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 349-352, San Francisco, CA, March 23-26, 1992.
45. P. L. Combettes and H. Puh, “Extrapolated projection method for the Euclidean convex feasibility problem,” Technical report, City University of New York, 1993.
46. P. L. Combettes and H. Puh, “Iterations of parallel convex projections in Hilbert spaces,” Numerical Functional Analysis and Optimization, vol. 15, nos. 3-4, pp. 225-243, 1994.
47. P. L. Combettes and H. J. Trussell, “Modèles et algorithmes en vue de la restauration numérique d’images rayons-X,” Proceedings of MARI-Cognitiva Electronic Image, pp. 146-151, Paris, France, May 18-22, 1987.
48. P. L. Combettes and H. J. Trussell, “Methods for digital restoration of signals degraded by a stochastic impulse response,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 3, pp. 393-401, March 1989.
49. P. L. Combettes and H. J. Trussell, “Method of successive projections for finding a common point of sets in metric spaces,” Journal of Optimization Theory and Applications, vol. 67, no. 3, pp. 487-507, December 1990.
50. P. L. Combettes and H. J. Trussell, “The use of noise properties in set theoretic estimation,” IEEE Transactions on Signal Processing, vol. 39, no. 7, pp. 1630-1641, July 1991.
51. P. L. Combettes and H. J. Trussell, “Deconvolution with bounded uncertainty,” International Journal of Adaptive Control and Signal Processing, vol. 9, no. 1, pp. 3-17, January 1995.
52. I. Csiszár, “Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems,” The Annals of Statistics, vol. 19, no. 4, pp. 2032-2066, December 1991.
53. G. Demoment, “Image reconstruction and restoration: Overview of common estimation structures and problems,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 12, pp. 2024-2036, December 1989.
54. A. R. De Pierro and A. N. Iusem, “A simultaneous projections method for linear inequalities,” Linear Algebra and Its Applications, vol. 64, pp. 243-253, January 1985.
55. A. R. De Pierro and A. N. Iusem, “A parallel projection method for finding a common point of a family of convex sets,” Pesquisa Operacional, vol. 5, no. 1, pp. 1-20, July 1985.
56. F. Deutsch, “The method of alternating orthogonal projections,” in Approximation Theory, Spline Functions and Applications (S. P. Singh, Ed.), pp. 105-121. Dordrecht, The Netherlands: Kluwer, 1992.
57. J. A. Dieudonné, Foundations of Modern Analysis, 2nd ed. New York: Academic Press, 1969.
58. J. L. Doob, Stochastic Processes. New York: Wiley, 1953.
59. J. L. Doob, Measure Theory. New York: Springer-Verlag, 1994.
60. L. T. Dos Santos, “A parallel subgradient projections method for the convex feasibility problem,” Journal of Computational and Applied Mathematics, vol. 18, no. 3, pp. 307-320, June 1987.
61. J. M. Dye and S. Reich, “Unrestricted iterations of nonexpansive mappings in Hilbert space,” Nonlinear Analysis-Theory, Methods, and Applications, vol. 18, no. 2, pp. 199-207, January 1992.
62. S. Ebstein, “Stellar speckle interferometry energy spectrum recovery by convex projections,” Applied Optics, vol. 26, no. 8, pp. 1530-1536, April 1987.
63. B. Efron, “Controversies in the foundations of statistics,” American Mathematical Monthly, vol. 85, no. 4, pp. 231-246, April 1978.
64. B. Efron, “Why isn’t everyone a Bayesian?” American Statistician, vol. 40, no. 1, pp. 1-5, February 1986.
65. I. Ekeland and R. Temam, Analyse Convexe et Problèmes Variationnels. Paris: Dunod, 1974.
66. P. Erdős, “Some remarks on the measurability of certain sets,” Bulletin of the American Mathematical Society, vol. 51, no. 10, pp. 728-731, October 1945.
67. M. Fisz, Probability Theory and Mathematical Statistics, 3rd ed. New York: Wiley, 1963.
68. S. D. Flåm and J. Zowe, “Relaxed outer projections, weighted averages, and convex feasibility,” BIT, vol. 30, no. 2, pp. 289-300, 1990.
69. D. L. Fried, “Optical resolution through a randomly inhomogeneous medium for very long and very short exposures,” Journal of the Optical Society of America, vol. 56, no. 10, pp. 1372-1379, October 1966.
70. B. R. Frieden, “Restoring with maximum likelihood and maximum entropy,” Journal of the Optical Society of America, vol. 62, no. 4, pp. 511-518, April 1972.
71. S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721-741, November 1984.
72. R. W. Gerchberg, “Super-resolution through error energy reduction,” Optica Acta, vol. 21, no. 9, pp. 709-720, September 1974.
73. R. W. Gerchberg and W. O. Saxton, “A practical algorithm for the determination of phase from image and diffraction plane pictures,” Optik, vol. 35, no. 2, pp. 237-246, April 1972.
74. P. Gilbert, “Iterative methods for the three-dimensional reconstruction of an object from projections,” Journal of Theoretical Biology, vol. 36, no. 1, pp. 105-117, July 1972.
75. K. Goebel and W. A. Kirk, Topics in Metric Fixed Point Theory. Cambridge: Cambridge University Press, 1990.
76. M. Goldburg and R. J. Marks II, “Signal synthesis in the presence of an inconsistent set of constraints,” IEEE Transactions on Circuits and Systems, vol. 32, no. 7, pp. 647-663, July 1985.
77. J. W. Goodman, Statistical Optics. New York: Wiley Interscience, 1985.
78. R. Gordon, R. Bender, and G. T. Herman, “Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography,” Journal of Theoretical Biology, vol. 29, no. 3, pp. 471-481, December 1970.
79. L. G. Gubin, B. T. Polyak, and E. V. Raik, “The method of projections for finding the common point of convex sets,” USSR Computational Mathematics and Mathematical Physics, vol. 7, no. 6, pp. 1-24, 1967.
80. S. S. Gupta and J. O. Berger (Eds.), Statistical Decision Theory and Related Topics IV, vol. 1. New York: Springer-Verlag, 1988.
81. I. Halperin, “The product of projection operators,” Acta Scientiarum Mathematicarum (Szeged), vol. 23, no. 1, pp. 96-99, 1962.
82. M. H. Hayes, “The reconstruction of a multidimensional sequence from the phase or magnitude of its Fourier transform,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 30, no. 2, pp. 140-154, April 1982.
83. S. Hein and A. Zakhor, “Halftone to continuous-tone conversion of error-diffusion coded images,” IEEE Transactions on Image Processing, vol. 4, no. 2, pp. 208-216, February 1995.
84. G. T. Herman, “A relaxation method for reconstructing objects from noisy X-rays,” Mathematical Programming, vol. 8, no. 1, pp. 1-19, February 1975.
85. G. T. Herman, Image Reconstruction from Projections: The Fundamentals of Computerized Tomography. New York: Academic Press, 1980.
86. G. T. Herman, “Mathematical optimization versus practical performance: A case study based on the maximum entropy criterion in image reconstruction,” Mathematical Programming Study, vol. 20, pp. 96-112, October 1982.
87. G. T. Herman, H. Hurwitz, A. Lent, and H. P. Lung, “On the Bayesian approach to image reconstruction,” Information and Control, vol. 42, no. 1, pp. 60-71, July 1979.
88. G. T. Herman, A. Lent, and P. H. Lutz, “Relaxation methods for image reconstruction,” Communications of the ACM, vol. 21, no. 2, pp. 152-158, February 1978.
89. B. R. Hunt, “The application of constrained least-squares estimation to image restoration by digital computer,” IEEE Transactions on Computers, vol. 22, no. 9, pp. 805-812, September 1973.
90. B. R. Hunt, “Bayesian methods in nonlinear digital image restoration,” IEEE Transactions on Computers, vol. 26, no. 3, pp. 219-229, March 1977.
91. N. E. Hurt, “Signal enhancement and the method of successive projections,” Acta Applicandae Mathematicae, vol. 23, no. 2, pp. 145-162, May 1991.
92. A. N. Iusem and A. R. De Pierro, “Convergence results for an accelerated nonlinear Cimmino algorithm,” Numerische Mathematik, vol. 49, no. 4, pp. 367-378, August 1986.
93. G. M. Jenkins and D. G. Watts, Spectral Analysis and Its Applications. Oakland, CA: Holden-Day, 1968.
94. B. Jessen, “To saetninger om konvekse punktmaengder,” Matematisk Tidsskrift B, vol. 1940, pp. 66-70, 1940.
95. M. Jiang, “On Johnson’s example of a nonconvex Chebyshev set,” Journal of Approximation Theory, vol. 74, no. 2, pp. 152-158, August 1993.
P. L. COMBETTES
96. G. G. Johnson, “A nonconvex set which has the unique nearest point property,” Journal of Approximation Theory, vol. 51, no. 4, pp. 289-332, December 1987.
97. S. Kaczmarz, “Angenäherte Auflösung von Systemen linearer Gleichungen,” Bulletin de l’Academie des Sciences de Pologne, vol. A35, pp. 355-357, 1937.
98. A. K. Katsaggelos (Ed.), Digital Image Restoration. New York: Springer-Verlag, 1991.
99. A. K. Katsaggelos, J. Biemond, R. W. Schafer, and R. M. Mersereau, “A regularized iterative image restoration algorithm,” IEEE Transactions on Signal Processing, vol. 39, no. 4, pp. 914-929, April 1991.
100. H. Kudo and T. Saito, “Sinogram recovery with the method of convex projections for limited-data reconstruction in computed tomography,” Journal of the Optical Society of America A, vol. 8, no. 7, pp. 1148-1160, July 1991.
101. S. S. Kuo and R. J. Mammone, “Image restoration by convex projections using adaptive constraints and the L1 norm,” IEEE Transactions on Signal Processing, vol. 40, no. 1, pp. 159-168, January 1992.
102. H. J. Landau and W. L. Miranker, “The recovery of distorted band-limited signals,” Journal of Mathematical Analysis and Applications, vol. 2, no. 1, pp. 97-104, February 1961.
103. A. M. Landraud, “Image restoration and enhancement of characters, using convex projection methods,” CVGIP: Graphical Models and Image Processing, vol. 53, no. 1, pp. 85-92, January 1991.
104. K. Lange, M. Bahn, and R. Little, “A theoretical study of some maximum likelihood algorithms for emission and transmission tomography,” IEEE Transactions on Medical Imaging, vol. 6, no. 2, pp. 106-114, June 1987.
105. R. M. Leahy and C. E. Goutis, “An optimal technique for constraint-based image restoration and reconstruction,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, no. 6, pp. 1626-1642, December 1986.
106. A. Lent and H. Tuy, “An iterative method for the extrapolation of band-limited functions,” Journal of Mathematical Analysis and Applications, vol. 83, no. 2, pp. 554-565, October 1981.
107. D. G. Luenberger, Linear and Nonlinear Programming, 2nd ed. Redwood City, CA: Addison-Wesley, 1984.
108. A. Levi and H. Stark, “Signal reconstruction from phase by projection onto convex sets,” Journal of the Optical Society of America, vol. 73, no. 6, pp. 810-822, June 1983.
109. A. Levi and H. Stark, “Image restoration by the method of generalized projections with application to restoration from magnitude,” Journal of the Optical Society of America A, vol. 1, no. 9, pp. 932-943, September 1984.
110. E. S. Levitin and B. T. Polyak, “Convergence of minimizing sequences in conditional extremum problems,” Soviet Mathematics-Doklady, vol. 7, no. 3, pp. 764-767, May 1966.
111. J. L. Lions, Quelques Méthodes de Résolution des Problèmes aux Limites Non Linéaires. Paris: Dunod, 1969.
112. P. L. Lions, “Approximation de points fixes de contractions,” Comptes Rendus de l’Academie des Sciences de Paris, vol. A284, no. 21, pp. 1357-1359, June 1977.
113. G. E. Mailloux, F. Langlois, P. Y. Simard, and M. Bertrand, “Restoration of the velocity field of the heart from two-dimensional echograms,” IEEE Transactions on Medical Imaging, vol. 8, no. 2, pp. 143-153, June 1989.
114. S. Mallat and S. Zhong, “Characterization of signals from multiscale edges,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 7, pp. 710-732, July 1992.
THE CONVEX FEASIBILITY PROBLEM
115. J. Mandel, “Convergence of the cyclical relaxation method for linear inequalities,” Mathematical Programming, vol. 30, no. 2, pp. 218-228, 1984.
116. C. P. Mariadassou and B. Yegnanarayana, “Image reconstruction from noisy digital holograms,” IEE Proceedings-F, vol. 137, no. 5, pp. 351-356, October 1990.
117. C. L. Matson, “Fourier spectrum extrapolation and enhancement using support constraints,” IEEE Transactions on Signal Processing, vol. 42, no. 1, pp. 156-163, January 1994.
118. R. G. Miller, Jr., Simultaneous Statistical Inference, 2nd ed. New York: Springer-Verlag, 1981.
119. A. Mohammad-Djafari and G. Demoment, “Maximum entropy image reconstruction in x-ray and diffraction tomography,” IEEE Transactions on Medical Imaging, vol. 7, no. 4, pp. 345-354, December 1988.
120. W. D. Montgomery, “Optical applications on Von Neumann’s alternating-projection theorem,” Optics Letters, vol. 7, no. 1, pp. 1-3, January 1982.
121. J. J. Moreau, “Un cas de convergence des itérées d’une contraction d’un espace hilbertien,” Comptes Rendus de l’Académie des Sciences de Paris, vol. A286, no. 3, pp. 143-144, January 1978.
122. T. S. Motzkin and I. J. Schoenberg, “The relaxation method for linear inequalities,” Canadian Journal of Mathematics, vol. 6, no. 3, pp. 393-404, 1954.
123. S. Oh, C. Ramon, R. J. Marks II, A. C. Nelson, and M. G. Meyer, “Resolution enhancement of biomagnetic images using the method of alternating projections,” IEEE Transactions on Biomedical Engineering, vol. 40, no. 4, pp. 323-328, April 1993.
124. P. Oskoui-Fard and H. Stark, “Tomographic image reconstruction using the theory of convex projections,” IEEE Transactions on Medical Imaging, vol. 7, no. 1, pp. 45-58, March 1988.
125. P. Oskoui and H. Stark, “A comparative study of three reconstruction methods for a limited-view computer tomography problem,” IEEE Transactions on Medical Imaging, vol. 8, no. 1, pp. 43-49, March 1989.
126. N. Ottavy, “Strong convergence of projection-like methods in Hilbert spaces,” Journal of Optimization Theory and Applications, vol. 56, no. 3, pp. 433-461, March 1988.
127. M. K. Ozkan, A. M. Tekalp, and M. I. Sezan, “POCS-based restoration of space-varying blurred images,” IEEE Transactions on Image Processing, vol. 3, no. 4, pp. 450-454, July 1994.
128. A. Papoulis, “A new algorithm in spectral analysis and band-limited extrapolation,” IEEE Transactions on Circuits and Systems, vol. 22, no. 9, pp. 735-742, September 1975.
129. H. Peng and H. Stark, “Signal recovery with similarity constraints,” Journal of the Optical Society of America A, vol. 6, no. 6, pp. 844-851, June 1989.
130. H. Peng and H. Stark, “Image recovery in computer tomography from partial fanbeam data by convex projections,” IEEE Transactions on Medical Imaging, vol. 11, no. 4, pp. 470-478, December 1992.
131. W. V. Petryshyn, “Construction of fixed points of demicompact mappings in Hilbert space,” Journal of Mathematical Analysis and Applications, vol. 14, no. 2, pp. 276-284, May 1966.
132. G. Pierra, “Méthodes de projections parallèles extrapolées relatives à une intersection de convexes,” Rapport de Recherche, INPG, Grenoble, France, September 1975. See also “Decomposition through formalization in a product space,” Mathematical Programming, vol. 28, no. 1, pp. 96-115, January 1984.
133. S. V. Plotnikov, “Cyclic projection on a system of convex sets with empty intersection” (in Russian), in Improper Optimization Problems (I. I. Eremin and V. D. Skarin, Eds.), pp. 60-66. Sverdlovsk: Akademiia Nauk SSSR, 1982.
134. H. Poincaré, “Sur les équations aux dérivées partielles de la physique mathématique,” American Journal of Mathematics, vol. 12, pp. 211-294, 1890.
135. E. Polak, Computational Methods in Optimization: A Unified Approach. New York: Academic Press, 1971.
136. B. T. Polyak, “Minimization of unsmooth functionals,” USSR Computational Mathematics and Mathematical Physics, vol. 9, no. 3, pp. 14-29, 1969.
137. B. T. Polyak, Introduction to Optimization. New York: Optimization Software Inc., 1987.
138. W. K. Pratt, Digital Image Processing, 2nd ed. New York: Wiley, 1991.
139. R. Ramaseshan and B. Yegnanarayana, “Image reconstruction from multiple frames of sparse data,” Multidimensional Systems and Signal Processing, vol. 4, no. 2, pp. 167-179, April 1993.
140. R. Rangayyan, A. P. Dhawan, and R. Gordon, “Algorithms for limited-view computed tomography: An annotated bibliography and a challenge,” Applied Optics, vol. 24, no. 23, pp. 4000-4012, December 1985.
141. R. T. Rockafellar, “Monotone operators and the proximal point algorithm,” SIAM Journal on Control and Optimization, vol. 14, no. 5, pp. 877-898, August 1976.
142. A. J. Rockmore and A. Macovski, “A maximum likelihood approach to emission image reconstruction from projections,” IEEE Transactions on Nuclear Science, vol. 23, no. 4, pp. 1428-1432, August 1976.
143. M. Rosenblatt, Stationary Sequences and Random Fields. Boston, MA: Birkhäuser, 1985.
144. C. Sánchez-Avila, “An adaptive regularized method for deconvolution of signals with edges by convex projections,” IEEE Transactions on Signal Processing, vol. 42, no. 7, pp. 1849-1851, July 1994.
145. J. L. C. Sanz and T. S. Huang, “Unified Hilbert space approach to iterative least-squares linear signal restoration,” Journal of the Optical Society of America, vol. 73, no. 11, pp. 1455-1465, November 1983.
146. O. Sasaki and T. Yamagami, “Image restoration in singular vector space by the method of convex projections,” Applied Optics, vol. 26, no. 7, pp. 1216-1221, April 1987.
147. K. D. Sauer and J. P. Allebach, “Iterative reconstruction of band-limited images from nonuniformly spaced samples,” IEEE Transactions on Circuits and Systems, vol. 34, no. 12, pp. 1497-1506, December 1987.
148. R. W. Schafer, R. M. Mersereau, and M. A. Richards, “Constrained iterative restoration algorithms,” Proceedings of the IEEE, vol. 69, no. 4, pp. 432-450, April 1981.
149. L. Schwartz, Analyse III: Calcul Intégral. Paris: Hermann, 1993.
150. F. C. Schweppe, “Recursive state estimation: Unknown but bounded errors and system inputs,” IEEE Transactions on Automatic Control, vol. 13, no. 1, pp. 22-28, February 1968.
151. M. I. Sezan and H. Stark, “Image restoration by the method of convex projections: Part 2-Applications and numerical results,” IEEE Transactions on Medical Imaging, vol. 1, no. 2, pp. 95-101, October 1982. (“Correction,” vol. 1, no. 3, p. 204, December 1982.)
152. M. I. Sezan and H. Stark, “Image restoration by convex projections in the presence of noise,” Applied Optics, vol. 22, no. 18, pp. 2781-2789, September 1983.
153. M. I. Sezan and H. Stark, “Tomographic image reconstruction from incomplete view data by convex projections and direct Fourier inversion,” IEEE Transactions on Medical Imaging, vol. 3, no. 2, pp. 91-98, June 1984.
154. M. I. Sezan and H. Stark, “Incorporation of a priori moment information into signal recovery and synthesis problems,” Journal of Mathematical Analysis and Applications, vol. 122, no. 1, pp. 172-186, February 1987.
155. M. I. Sezan and A. M. Tekalp, “Adaptive image restoration with artifact suppression using the theory of convex projections,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 1, pp. 181-185, January 1990.
156. M. I. Sezan and A. M. Tekalp, “Survey of recent developments in digital image restoration,” Optical Engineering, vol. 29, no. 5, pp. 393-404, May 1990.
157. M. I. Sezan and H. J. Trussell, “Prototype image constraints for set-theoretic image restoration,” IEEE Transactions on Signal Processing, vol. 39, no. 10, pp. 2275-2285, October 1991.
158. P. Y. Simard and G. E. Mailloux, “Vector field restoration by the method of convex projections,” Computer Vision, Graphics, and Image Processing, vol. 52, no. 3, pp. 360-385, December 1990.
159. H. Stark (Ed.), Image Recovery: Theory and Application. San Diego, CA: Academic Press, 1987.
160. H. Stark, D. Cahana, and H. Webb, “Restoration of arbitrary finite energy optical objects from limited spatial and spectral information,” Journal of the Optical Society of America, vol. 71, no. 6, pp. 635-642, June 1981.
161. H. Stark and E. T. Olsen, “Projection-based image restoration,” Journal of the Optical Society of America A, vol. 9, no. 11, pp. 1914-1919, November 1992.
162. H. Stark and P. Oskoui, “High-resolution image recovery from image-plane arrays, using convex projections,” Journal of the Optical Society of America A, vol. 6, no. 11, pp. 1715-1726, November 1989.
163. W. J. Stiles, “Closest point maps and their product II,” Nieuw Archief voor Wiskunde, vol. 13, no. 3, pp. 212-225, November 1965.
164. A. M. Tekalp and H. J. Trussell, “Comparative study of some statistical and set-theoretic methods for image restoration,” CVGIP: Graphical Models and Image Processing, vol. 53, no. 2, pp. 108-120, March 1991.
165. V. T. Tom, T. F. Quatieri, M. H. Hayes, and J. H. McClellan, “Convergence of iterative nonexpansive signal reconstruction algorithms,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 5, pp. 1052-1058, October 1981.
166. H. J. Trussell, “Notes on linear image restoration by maximizing the a posteriori probability,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 2, pp. 174-176, April 1978.
167. H. J. Trussell, “The relationship between image restoration by maximum a posteriori and maximum entropy methods,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 1, pp. 114-117, February 1980.
168. H. J. Trussell, “Maximum power signal restoration,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, no. 5, pp. 1059-1061, October 1981.
169. H. J. Trussell, “Convergence criteria for iterative restoration methods,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 31, no. 1, pp. 129-136, February 1983.
170. H. J. Trussell, “A priori knowledge in algebraic reconstruction methods,” in Advances in Computer Vision and Image Processing (T. S. Huang, Ed.), vol. 1, pp. 265-316. Greenwich, CT: JAI Press, 1984.
171. H. J. Trussell and M. R. Civanlar, “The feasible solution in signal restoration,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 2, pp. 201-212, April 1984.
172. H. J. Trussell, H. Orun-Ozturk, and M. R. Civanlar, “Errors in reprojection methods in computerized tomography,” IEEE Transactions on Medical Imaging, vol. 6, no. 3, pp. 220-227, September 1987.
173. H. J. Trussell and P. L. Vora, “Bounds on restoration quality using a priori information,” Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1758-1761. New York, NY, April 11-14, 1988.
174. P. Tseng, “On the convergence of products of firmly nonexpansive mappings,” SIAM Journal on Optimization, vol. 2, no. 3, pp. 425-434, August 1992.
175. P. Tseng and D. P. Bertsekas, “Relaxation methods for problems with strictly convex separable costs and linear constraints,” Mathematical Programming, vol. 38, no. 3, pp. 303-321, 1987.
176. J. Von Neumann, “On rings of operators. Reduction theory,” Annals of Mathematics, vol. 50, no. 2, pp. 401-485, April 1949 (the result of interest first appeared in 1933 in lecture notes).
177. S. J. Wernecke and L. R. D’Addario, “Maximum entropy image reconstruction,” IEEE Transactions on Computers, vol. 26, no. 4, pp. 351-364, April 1977.
178. M. N. Wernick and C. T. Chen, “Superresolved tomography by convex projections and detector motion,” Journal of the Optical Society of America A, vol. 9, no. 9, pp. 1547-1553, September 1992.
179. R. Wittmann, “Approximation of fixed points of nonexpansive mappings,” Archiv der Mathematik, vol. 58, no. 5, pp. 486-491, May 1992.
180. S. J. Yeh and H. Stark, “Iterative and one-step reconstruction from nonuniform samples by convex projections,” Journal of the Optical Society of America A, vol. 7, no. 3, pp. 491-499, March 1990.
181. D. C. Youla, “Generalized image restoration by the method of alternating orthogonal projections,” IEEE Transactions on Circuits and Systems, vol. 25, no. 9, pp. 694-702, September 1978.
182. D. C. Youla, “On deterministic convergence of iterations of relaxed projection operators,” Journal of Visual Communication and Image Representation, vol. 1, no. 1, pp. 12-20, September 1990.
183. D. C. Youla and V. Velasco, “Extensions of a result on the synthesis of signals in the presence of inconsistent constraints,” IEEE Transactions on Circuits and Systems, vol. 33, no. 4, pp. 465-468, April 1986.
184. D. C. Youla and H. Webb, “Image restoration by the method of convex projections: Part I-Theory,” IEEE Transactions on Medical Imaging, vol. 1, no. 2, pp. 81-94, October 1982.
185. H. T. Yura and S. G. Hanson, “Second-order statistics for wave propagation through complex optical systems,” Journal of the Optical Society of America A, vol. 6, no. 4, pp. 564-575, April 1989.
186. L. A. Zadeh, “What is optimal?,” IRE Transactions on Information Theory, vol. 4, no. 1, p. 3, March 1958.
187. E. H. Zarantonello, “Projections on convex sets in Hilbert space and spectral theory,” in Contributions to Nonlinear Functional Analysis (E. H. Zarantonello, Ed.), pp. 237-424. New York: Academic Press, 1971.
188. E. Zeidler, Nonlinear Functional Analysis and Its Applications I: Fixed-Point Theorems, 2nd ed. New York: Springer-Verlag, 1993.
189. E. Zeidler, Nonlinear Functional Analysis and Its Applications II/A: Linear Monotone Operators. New York: Springer-Verlag, 1990.
190. E. Zeidler, Nonlinear Functional Analysis and Its Applications III: Variational Methods and Optimization. New York: Springer-Verlag, 1985.
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 95
Spacetime Algebra and Electron Physics
CHRIS DORAN, ANTHONY LASENBY, STEPHEN GULL, SHYAMAL SOMAROO, and ANTHONY CHALLINOR
MRAO, Cavendish Laboratory, Madingley Road, Cambridge CB3 0HE, United Kingdom
I. Introduction . . . 272
II. Spacetime Algebra . . . 275
   A. The Spacetime Split . . . 278
   B. Spacetime Calculus . . . 280
III. Spinors and the Dirac Equation . . . 283
   A. Pauli Spinors . . . 284
   B. Dirac Spinors . . . 288
   C. The Dirac Equation and Observables . . . 292
IV. Operators, Monogenics, and the Hydrogen Atom . . . 297
   A. Hamiltonian Form and the Nonrelativistic Reduction . . . 298
   B. Angular Eigenstates and Monogenic Functions . . . 302
   C. Applications . . . 305
V. Propagators and Scattering Theory . . . 309
   A. Propagation and Characteristic Surfaces . . . 310
   B. Spinor Potentials and Propagators . . . 311
   C. Scattering Theory . . . 312
VI. Plane Waves at Potential Steps . . . 315
   A. Matching Conditions for Traveling Waves . . . 317
   B. Matching onto Evanescent Waves . . . 320
   C. Spin Precession at a Barrier . . . 322
   D. Tunneling of Plane Waves . . . 326
   E. The Klein Paradox . . . 328
VII. Tunneling Times . . . 332
   A. Wavepacket Tunneling . . . 332
   B. Two-Dimensional Simulations . . . 338
VIII. Spin Measurements . . . 339
   A. A Relativistic Model of a Spin Measurement . . . 342
   B. Wavepacket Simulations . . . 344
IX. The Multiparticle STA . . . 347
   A. Two-Particle Pauli States and the Quantum Correlator . . . 352
   B. Comparison with the “Causal” Approach to Nonrelativistic Spin States . . . 358
   C. Relativistic Two-Particle States . . . 361
   D. Multiparticle Wave Equations . . . 364
   E. The Pauli Principle . . . 366
   F. Eight-Dimensional Streamlines and Pauli Exclusion . . . 369
X. Further Applications . . . 374
   A. Classical and Semiclassical Mechanics . . . 374
   B. Grassmann Algebra . . . 377
Copyright 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.
CHRIS DORAN ET AL.
XI. Conclusions . . . 379
Appendix: The Spherical Monogenic Functions . . . 380
References . . . 383
I. INTRODUCTION

This paper surveys the application of “geometric algebra” to the physics of electrons. The mathematical ideas underlying geometric algebra were discovered jointly by Clifford [1] and Grassmann [2] in the late nineteenth century. Their discoveries were made during a period in which mathematicians were uncovering many new algebraic structures (quaternions, matrices, groups, etc.), and the full potential of Clifford and Grassmann’s work was lost as mathematicians concentrated on its algebraic properties. This problem was exacerbated by Clifford’s early death and Grassmann’s lack of recognition during his lifetime. This paper is part of a concerted effort to repair the damage caused by this historical accident. We firmly believe that geometric algebra is the simplest and most coherent language available for mathematical physics, and that it deserves to be understood and used by the physics and engineering communities. Geometric algebra provides a single, unified approach to a vast range of mathematical physics, and formulating and solving a problem in geometric algebra invariably leads to new physical insights. In the series of papers [3]-[6], geometric algebra techniques were applied to a number of areas of physics, including relativistic electrodynamics and Dirac theory. In this paper we extend aspects of that work to encompass a wider range of topics relevant to electron physics. We hope that the work presented here makes a convincing case for the use of geometric algebra in electron physics.

The idea that Clifford algebra provides the framework for a unified language for physics has been advocated most strongly by Hestenes, who is largely responsible for shaping the modern form of the subject. His contribution should be evident from the number and range of citations to his work that punctuate this paper. One of Hestenes’ crucial insights is the role of geometric algebra in the design of mathematical physics [7].
Modern physicists are expected to command an understanding of a vast range of algebraic systems and techniques, a problem that gets progressively worse if one is interested in the theoretical side of the subject. A list of some of the algebraic systems and techniques employed in modern theoretical physics (and especially particle physics) is given in Table I. Hestenes’ point is that every one of the mathematical tools contained in Table I can be expressed within geometric algebra, but the converse is not true. One would be hard-pressed to prove that two of
SPACETIME ALGEBRA AND ELECTRON PHYSICS
TABLE I
SOME ALGEBRAIC SYSTEMS EMPLOYED IN MODERN PHYSICS
Coordinate geometry | Spinor calculus
Complex analysis | Grassmann algebra
Vector analysis | Berezin calculus
Tensor analysis | Differential forms
Lie algebras | Twistors
Clifford algebra | Algebraic topology
the angles in an isosceles triangle are equal using spinor techniques, for example, but the proof is simple in geometric algebra because it encompasses vector geometry. The work of physicists would be considerably simplified if, instead of being separately introduced to the techniques listed in Table I, they were first given a firm basis in geometric algebra. Then, when a new technique is needed, physicists can simply slot it into their existing knowledge of geometric algebra, rather than each new technique sitting on its own, unincorporated into a wider framework. This way, physicists are relieved of the burden of independently discovering the deeper organizational principle underlying our mathematics. Geometric algebra fulfills this task for them.

In the course of this paper we will discuss a number of the algebraic systems listed in Table I, and demonstrate precisely how they fit into the geometric algebra framework. However, the principal aim here is to discuss the application of geometric algebra to electron physics. These applications are limited essentially to physics in Minkowski spacetime, so we restrict our attention to the geometric algebra of spacetime, the spacetime algebra [8]. Our aim is twofold: to show that spacetime algebra simplifies the study of the Dirac theory, and to show that the Dirac theory, once formulated in the spacetime algebra, is a powerful and flexible tool for the analysis of all aspects of electron physics, not just relativistic theory. Accordingly, this paper contains a mixture of formalism and applications.

We begin with an introduction to the spacetime algebra (henceforth the STA), concentrating on how the algebra of the STA is used to encode geometric ideas such as lines, planes, and rotations. The introduction is designed to be self-contained for the purposes of this paper, and its length has been kept to a minimum. A list of references and further reading is given at the end of the introduction.
In Sections III and IV Pauli and Dirac column spinors, and the operators that act on them, are formulated and manipulated in the STA. Once the STA formulation is achieved, matrices are eliminated from Dirac theory, and the Dirac equation can be studied and solved entirely within the real
STA. A significant result of this work is that the unit imaginary of quantum mechanics is eliminated and replaced by a directed plane segment, a bivector. That this is possible has many implications for the interpretation of quantum mechanics [9].

In Sections V, VI, and VII we turn to issues related to the propagation, scattering, and tunneling of Dirac waves. Once the STA form is available, studying the properties of electrons via the Dirac theory is no more complicated than using the nonrelativistic Pauli theory. Indeed, the first-order form of the Dirac theory makes some of the calculations easier than their nonrelativistic counterparts. After establishing various results for the behavior of Dirac waves at potential steps, we study the tunneling of a wavepacket through a potential step. Indirect timing measurements for quantum mechanical tunneling are now available from photon experiments, so it is important to have a solid theoretical understanding of the process. We argue that standard quantum theory has so far failed to provide such an understanding, as it misses two crucial features underlying the tunneling process.

In Section VIII we give a relativistic treatment of a measurement made with a Stern-Gerlach apparatus on a fermion with zero charge and an anomalous magnetic moment. As with tunneling, it is shown that a disjoint set of outcomes is consistent with the causal evolution of a wavepacket implied by the Dirac equation. Wavepacket collapse is therefore not required to explain the results of experiment, as the uncertainty in the final result derives from the uncertainty present in the initial wavepacket. It is also argued that the standard quantum theory interpretation of the measurement performed by a Stern-Gerlach apparatus is unsatisfactory. In the STA, the anticommutation of the Pauli operators merely expresses the fact that they represent orthonormal vectors, so it cannot have any dynamical content.
Accordingly, it should be possible to have simultaneous knowledge of all three components of the spin vector, and we argue that a Stern-Gerlach apparatus is precisely what is needed to achieve this knowledge!

Multiparticle quantum theory is considered in Section IX. We introduce a new device for analysing multiparticle states: the multiparticle STA. This is constructed from a copy of the STA for each particle of interest. The resulting algebraic structure is enormously rich in its properties, and offers the possibility of a geometric understanding of relativistic multiparticle quantum physics. Some applications of the multiparticle STA are given here, including a relativistic treatment of the Pauli exclusion principle. The paper ends with a brief survey of some other applications of the STA to electron physics, followed by a summary of the main conclusions drawn from this paper.
Summation convention and natural units ($\hbar = c = \epsilon_0 = 1$) are employed throughout, except where explicitly stated.
II. SPACETIME ALGEBRA
“Spacetime algebra” is the name given to the geometric (Clifford) algebra generated by Minkowski spacetime. In geometric algebra, vectors are equipped with a product that is associative and distributive over addition. This product has the distinguishing feature that the square of any vector in the algebra is a scalar. A simple rearrangement of the expansion

$$(a + b)^2 = (a + b)(a + b) = a^2 + (ab + ba) + b^2 \tag{2.1}$$

yields

$$ab + ba = (a + b)^2 - a^2 - b^2, \tag{2.2}$$
from which it follows that the symmetrized product of any two vectors is also a scalar. We call this the inner product $a \cdot b$, where

$$a \cdot b = \tfrac{1}{2}(ab + ba). \tag{2.3}$$
The remaining, antisymmetric part of the geometric product is called the outer product $a \wedge b$, where

$$a \wedge b = \tfrac{1}{2}(ab - ba). \tag{2.4}$$
The result of the outer product of two vectors is a bivector, a grade-2 object representing a segment of the plane swept out by the vectors a and b. On combining Eqs. (2.3) and (2.4), we see that the full geometric product of two vectors decomposes as

$$ab = a \cdot b + a \wedge b. \tag{2.5}$$
The essential feature of this product is that it mixes two different types of object: scalars and bivectors. One might now ask how the right-hand side of (2.5) is to be interpreted. The answer is that the addition implied by (2.5) is that used when, for example, a real number is added to an imaginary number to form a complex number. We are all happy with the rules for manipulating complex numbers, and the rules for manipulating mixed-grade combinations are much the same [3]. But why should one be interested in the sum of a scalar and a bivector? The reason is again the same as for complex numbers: algebraic manipulations are simplified considerably by working with general mixed-grade elements (multivectors) instead of working independently with pure-grade elements (scalars, vectors, etc.).
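The decomposition (2.3)-(2.5) can be checked mechanically from nothing more than the rules $e_k^2 = 1$ and $e_j e_k = -e_k e_j$ for an orthonormal frame. The sketch below is not from the text: it assumes a bitmask encoding of basis blades that is common in geometric algebra software (bit k set means $e_{k+1}$ is a factor), works in three-dimensional Euclidean space for simplicity, and uses hypothetical helper names (`geometric_product`, `reorder_sign`, `combine`, `grades`).

```python
from itertools import product

# Hypothetical sketch: a blade is a bitmask (bit k set <=> e_(k+1) is a factor),
# and a multivector is a dict {bitmask: coefficient}.


def reorder_sign(a: int, b: int) -> int:
    """Sign picked up when the product of blades a and b is put into canonical order."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1 if swaps % 2 else 1


def geometric_product(x: dict, y: dict, metric: tuple) -> dict:
    """Geometric product for an orthogonal basis with e_k^2 = metric[k]."""
    out: dict = {}
    for (ba, ca), (bb, cb) in product(x.items(), y.items()):
        coeff = reorder_sign(ba, bb) * ca * cb
        common = ba & bb
        for k, m in enumerate(metric):
            if (common >> k) & 1:
                coeff *= m  # a repeated factor e_k e_k contracts to the scalar e_k^2
        blade = ba ^ bb
        out[blade] = out.get(blade, 0) + coeff
    return {k: v for k, v in out.items() if v != 0}


def combine(x: dict, y: dict, sign: int = 1) -> dict:
    """Return x + sign * y, dropping zero coefficients."""
    out = dict(x)
    for k, v in y.items():
        out[k] = out.get(k, 0) + sign * v
    return {k: v for k, v in out.items() if v != 0}


def grades(x: dict) -> list:
    """Grades present in x (the grade of a blade is its number of set bits)."""
    return sorted({bin(k).count("1") for k in x})


EUCLID3 = (1, 1, 1)                   # e1^2 = e2^2 = e3^2 = 1
a = {0b001: 1, 0b010: 2}              # a = e1 + 2 e2
b = {0b001: 3, 0b010: -1, 0b100: 4}   # b = 3 e1 - e2 + 4 e3

ab = geometric_product(a, b, EUCLID3)
ba = geometric_product(b, a, EUCLID3)
inner = {k: v / 2 for k, v in combine(ab, ba).items()}      # (2.3): symmetric part
outer = {k: v / 2 for k, v in combine(ab, ba, -1).items()}  # (2.4): antisymmetric part

print(grades(ab))  # [0, 2]: ab mixes a scalar and a bivector, as in (2.5)
print(inner)       # {0: 1.0}, i.e. a.b = 3 - 2 + 0 = 1
print(outer)       # {3: -7.0, 5: 4.0, 6: 8.0}: the e1e2, e1e3, e2e3 components
```

With integer inputs the symmetric part comes out exactly as the scalar $a \cdot b$, the antisymmetric part exactly as the three bivector components of $a \wedge b$, and their sum reproduces $ab$.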
An example of how the geometric product of two vectors (2.5) is employed directly is in the description of rotations using geometric algebra. Suppose initially that the vector a is reflected in the hyperplane perpendicular to the unit vector n ($n^2 = 1$). The result of this reflection is the vector

$$a - 2\,(a \cdot n)\,n = a - (an + na)n = -nan. \tag{2.6}$$
The form on the right-hand side is unique to geometric algebra, and is already an improvement on the usual formula on the left-hand side. If one now applies a second reflection in the hyperplane perpendicular to a second unit vector m, the result is the vector

$$-m(-nan)m = mna\,\widetilde{mn}. \tag{2.7}$$
The tilde on the right-hand side denotes the operation of reversion, which simply reverses the order of the vectors in any geometric product,

$$\widetilde{ab \cdots c} = c \cdots ba. \tag{2.8}$$
The combination of two reflections is a rotation in the plane specified by the two reflection axes. We therefore see that a rotation is performed by
$$a \mapsto R a \tilde{R}, \tag{2.9}$$

where

$$R = mn. \tag{2.10}$$
The object R is called a rotor. It has the fundamental property that
$$R\tilde{R} = mnnm = 1. \tag{2.11}$$
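The reflection and rotation formulas (2.6)-(2.11) can be checked numerically. Purely as a computational device (the chapter itself works without matrices), the sketch below assumes the Pauli-matrix representation of the geometric algebra of three-dimensional Euclidean space: the Pauli matrices satisfy $\sigma_j\sigma_k + \sigma_k\sigma_j = 2\delta_{jk}$, so the geometric product becomes the matrix product, and for multivectors with real coefficients reversion becomes the Hermitian adjoint. The vectors chosen and the helper names (`to_matrix`, `to_vector`, `reverse`) are illustrative.

```python
import numpy as np

# Pauli matrices: a faithful representation of the 3D Euclidean geometric algebra,
# used here only as a numerical check.
PAULI = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def to_matrix(v) -> np.ndarray:
    """Represent the vector v1 e1 + v2 e2 + v3 e3 as sum_k v_k sigma_k."""
    return sum(c * s for c, s in zip(v, PAULI))


def to_vector(M: np.ndarray) -> np.ndarray:
    """Read off vector components v_k = tr(M sigma_k) / 2."""
    return np.array([np.trace(M @ s).real / 2 for s in PAULI])


def reverse(M: np.ndarray) -> np.ndarray:
    """Reversion: the Hermitian adjoint in this representation (real coefficients assumed)."""
    return M.conj().T


a = np.array([1.0, 2.0, 3.0])
n = np.array([0.0, 0.0, 1.0])               # unit vector, n^2 = 1
m = np.array([1.0, 0.0, 1.0]) / np.sqrt(2)  # unit vector at 45 degrees to n

A, N, M = to_matrix(a), to_matrix(n), to_matrix(m)

reflected = to_vector(-N @ A @ N)            # (2.6): -nan
expected = a - 2 * np.dot(a, n) * n          # a - 2 (a.n) n
R = M @ N                                    # (2.10): rotor R = mn
rotated = to_vector(R @ A @ reverse(R))      # (2.9): a -> R a R~

print(np.allclose(reflected, expected))        # True
print(np.allclose(R @ reverse(R), np.eye(2)))  # True, as in (2.11)
```

With m at 45 degrees to n, the two reflections compose to a rotation through 90 degrees in the plane of m and n, taking (1, 2, 3) to (3, 2, -1): the component perpendicular to that plane is unchanged and the length of a is preserved.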
Equation (2.9) provides a remarkably compact and efficient formulation for encoding rotations. The formula for R (2.10) shows that a rotor is formed from the geometric product of two unit vectors, so it does indeed consist of the sum of a scalar and a bivector. A rotor can furthermore be written as the exponential of a bivector, $R = \pm\exp(B/2)$, where the bivector B encodes the plane in which the rotation is performed. This naturally generalizes the complex representation of rotations frequently used in two dimensions. Rotors illustrate how mixed-grade objects are frequently employed as operators which act on other quantities in the algebra. The fact that both geometric objects and the operators that act on them are handled in a single unified framework is a central feature of geometric algebra.

The above discussion applies to vector spaces of any dimension. We now turn to the case of specific interest, that of Minkowski spacetime. To make the discussion more concrete, we introduce a set of four basis
SPACETIME ALGEBRA AND ELECTRON PHYSICS
vectors $\{\gamma_\mu\}$, $\mu = 0, \dots, 3$, satisfying
$$\gamma_\mu\cdot\gamma_\nu = \eta_{\mu\nu} = \mathrm{diag}(+\,-\,-\,-). \tag{2.12}$$
The vectors $\{\gamma_\mu\}$ satisfy the same algebraic relations as Dirac's $\gamma$-matrices, but they now form a set of four independent basis vectors for spacetime, not four components of a single vector in an internal “spin space.” When manipulating (geometric) products of these vectors, one simply uses the rule that parallel vectors commute and orthogonal vectors anticommute. This result is clear immediately from Eq. (2.5). From the four vectors $\{\gamma_\mu\}$ we can construct a set of six basis elements for the space of bivectors:
$$\{\gamma_1\gamma_0,\ \gamma_2\gamma_0,\ \gamma_3\gamma_0,\ \gamma_3\gamma_2,\ \gamma_1\gamma_3,\ \gamma_2\gamma_1\}. \tag{2.13}$$
After the bivectors comes the space of grade-3 objects or trivectors. This space is four-dimensional and is spanned by the basis
$$\{\gamma_3\gamma_2\gamma_1,\ \gamma_0\gamma_3\gamma_2,\ \gamma_0\gamma_1\gamma_3,\ \gamma_0\gamma_2\gamma_1\}. \tag{2.14}$$
Finally, there is a single grade-4 element. This is called the pseudoscalar and is given the symbol i, so that
$$i \equiv \gamma_0\gamma_1\gamma_2\gamma_3. \tag{2.15}$$
The symbol $i$ is used because the square of $i$ is $-1$, but the pseudoscalar must not be confused with the unit scalar imaginary employed in quantum mechanics. The pseudoscalar $i$ is a geometrically significant entity and is responsible for the duality operation in the algebra. Furthermore, $i$ anticommutes with odd-grade elements (vectors and trivectors) and commutes only with even-grade elements. The full STA is spanned by the basis
$$1, \quad \{\gamma_\mu\}, \quad \{\sigma_k,\ i\sigma_k\}, \quad \{i\gamma_\mu\}, \quad i, \tag{2.16}$$
where
$$\sigma_k \equiv \gamma_k\gamma_0, \qquad k = 1, 2, 3. \tag{2.17}$$
An arbitrary element of this algebra is called a multivector and, if desired, can be expanded in terms of the basis (2.16). Multivectors in which all elements have the same grade are usually written as $A_r$ to show that $A$ contains only grade-$r$ components. Multivectors inherit an associative product from the geometric product of vectors, and the geometric product of a grade-$r$ multivector $A_r$ with a grade-$s$ multivector $B_s$ decomposes into
$$A_rB_s = \langle AB\rangle_{r+s} + \langle AB\rangle_{r+s-2} + \cdots + \langle AB\rangle_{|r-s|}. \tag{2.18}$$
The symbol $\langle M\rangle_r$ denotes the projection onto the grade-$r$ component of $M$. The projection onto the grade-0 (scalar) component of $M$ is written
CHRIS DORAN ET AL.
as $\langle M\rangle$. The scalar part of a product of multivectors satisfies the cyclic reordering property
$$\langle A\cdots BC\rangle = \langle CA\cdots B\rangle. \tag{2.19}$$
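The “$\cdot$” and “$\wedge$” symbols are retained for the lowest-grade and highest-grade terms of the series (2.18), so that
$$A_r\cdot B_s \equiv \langle A_rB_s\rangle_{|r-s|}, \tag{2.20}$$
$$A_r\wedge B_s \equiv \langle A_rB_s\rangle_{r+s}, \tag{2.21}$$
which are called the inner and outer (or exterior) products, respectively. We also make use of the scalar product, defined by
$$A*B \equiv \langle AB\rangle, \tag{2.22}$$
and the commutator product, defined by
$$A\times B \equiv \tfrac{1}{2}(AB - BA). \tag{2.23}$$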
The associativity of the geometric product ensures that the commutator product satisfies the Jacobi identity
$$A\times(B\times C) + B\times(C\times A) + C\times(A\times B) = 0. \tag{2.24}$$
When manipulating chains of products we employ the operator ordering convention that, in the absence of brackets, inner, outer, and scalar products take precedence over geometric products. As an illustration of the working of these definitions, consider the inner product of a vector $a$ with a bivector $b\wedge c$:
$$a\cdot(b\wedge c) = \langle a\, b\wedge c\rangle_1 = \tfrac{1}{2}\langle abc - acb\rangle_1 = a\cdot b\, c - a\cdot c\, b - \tfrac{1}{2}\langle bac - cab\rangle_1. \tag{2.25}$$
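The vector identity (2.26) that this expansion leads to, together with the grade structure (2.18), the cyclic property (2.19) and the Jacobi identity (2.24), can be spot-checked numerically in the STA. The following minimal sketch assumes a hand-rolled bitmap representation of multivectors with signature $(+,-,-,-)$, where bit $\mu$ stands for $\gamma_\mu$ (not taken from the text), and uses random vectors and multivectors, so it tests the identities at sample points rather than proving them.

```python
import random
from functools import reduce

METRIC = (1, -1, -1, -1)  # gamma_0^2 = +1, gamma_k^2 = -1, Eq. (2.12)


def gp(x, y):
    """Geometric product of multivectors stored as {blade bitmap: coefficient}."""
    out = {}
    for a, ca in x.items():
        for b, cb in y.items():
            # sign from reordering the basis vectors of a and b into canonical order
            t, swaps = a >> 1, 0
            while t:
                swaps += bin(t & b).count("1")
                t >>= 1
            s = -1.0 if swaps & 1 else 1.0
            # every basis vector shared by a and b contributes its square
            for k, sq in enumerate(METRIC):
                if (a & b) >> k & 1:
                    s *= sq
            out[a ^ b] = out.get(a ^ b, 0.0) + s * ca * cb
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def add(*ms):
    out = {}
    for m in ms:
        for k, v in m.items():
            out[k] = out.get(k, 0.0) + v
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def scale(m, c):
    return {k: c * v for k, v in m.items()}


def close(x, y, tol=1e-9):
    return all(abs(v) < tol for v in add(x, scale(y, -1)).values())


def mul(*ms):
    """Geometric product of several multivectors, left to right."""
    return reduce(gp, ms, {0: 1.0})


def grade(m, r):
    """Grade-r projection <m>_r."""
    return {k: v for k, v in m.items() if bin(k).count("1") == r}


def scalar(m):
    return m.get(0, 0.0)


def commutator(x, y):
    """Commutator product (xy - yx)/2, Eq. (2.23)."""
    return scale(add(gp(x, y), scale(gp(y, x), -1)), 0.5)


random.seed(1)
a, b, c = ({1 << mu: random.uniform(-1, 1) for mu in range(4)} for _ in range(3))

b_wedge_c = grade(gp(b, c), 2)                      # Eq. (2.21)
lhs = grade(gp(a, b_wedge_c), 1)                    # a.(b^c), Eq. (2.20)
rhs = add(scale(c, scalar(gp(a, b))), scale(b, -scalar(gp(a, c))))
assert close(lhs, rhs)                              # Eq. (2.26)

# the discarded term <bac - cab>_1 of Eq. (2.25) really vanishes
assert close(grade(add(mul(b, a, c), scale(mul(c, a, b), -1)), 1), {})

# Eq. (2.18): a vector times a bivector has only grades 1 and 3
assert all(bin(k).count("1") in (1, 3) for k in gp(a, b_wedge_c))

# Eqs. (2.19) and (2.24) for general mixed-grade multivectors
A, B, C = ({k: random.uniform(-1, 1) for k in range(16)} for _ in range(3))
assert abs(scalar(mul(A, B, C)) - scalar(mul(C, A, B))) < 1e-9
jacobi = add(commutator(A, commutator(B, C)),
             commutator(B, commutator(C, A)),
             commutator(C, commutator(A, B)))
assert close(jacobi, {})
```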
The quantity $bac - cab$ reverses to give minus itself, so cannot contain a vector part. We therefore obtain the result
$$a\cdot(b\wedge c) = a\cdot b\, c - a\cdot c\, b, \tag{2.26}$$
which is useful in many applications.

A. The Spacetime Split
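The three bivectors $\{\sigma_k\}$ defined in (2.17) satisfy
$$\tfrac{1}{2}(\sigma_j\sigma_k + \sigma_k\sigma_j) = \delta_{jk} \tag{2.27}$$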
and therefore generate the geometric algebra of three-dimensional Euclidean space [3, 10]. This is identified as the algebra for the rest space relative to the timelike vector $\gamma_0$. The full algebra for this space is spanned by the set
$$1, \quad \{\sigma_k\}, \quad \{i\sigma_k\}, \quad i, \tag{2.28}$$
which is identifiable as the even subalgebra of the full STA (2.16). The identification of the algebra of relative space with the even subalgebra of the STA simplifies the transition from relativistic quantities to observables in a given frame. It is apparent from (2.27) that the algebra of the $\{\sigma_k\}$ is isomorphic to the algebra of the Pauli matrices. As with the $\{\gamma_\mu\}$, the $\{\sigma_k\}$ are to be interpreted geometrically as spatial vectors (spacetime bivectors) and not as operators in an abstract spin space. It should be noted that the pseudoscalar employed in (2.28) is the same as that employed in spacetime, since
$$\sigma_1\sigma_2\sigma_3 = \gamma_1\gamma_0\gamma_2\gamma_0\gamma_3\gamma_0 = \gamma_0\gamma_1\gamma_2\gamma_3 = i. \tag{2.29}$$
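The basic algebraic facts quoted so far, namely the relations (2.12), the grade count of the basis (2.16), the commutation behaviour of $i$, the Pauli relations $\sigma_j\sigma_k + \sigma_k\sigma_j = 2\delta_{jk}$ and the identity (2.29), can be confirmed with a few lines of code. This is a sketch assuming a hypothetical bitmap-based implementation of the geometric product with signature $(+,-,-,-)$, where bit $\mu$ stands for $\gamma_\mu$; it is not the authors' method.

```python
from functools import reduce

METRIC = (1, -1, -1, -1)  # gamma_0^2 = +1, gamma_k^2 = -1, Eq. (2.12)


def gp(x, y):
    """Geometric product of multivectors stored as {blade bitmap: coefficient}."""
    out = {}
    for a, ca in x.items():
        for b, cb in y.items():
            # sign from reordering the basis vectors of a and b into canonical order
            t, swaps = a >> 1, 0
            while t:
                swaps += bin(t & b).count("1")
                t >>= 1
            s = -1.0 if swaps & 1 else 1.0
            # every basis vector shared by a and b contributes its square
            for k, sq in enumerate(METRIC):
                if (a & b) >> k & 1:
                    s *= sq
            out[a ^ b] = out.get(a ^ b, 0.0) + s * ca * cb
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def add(*ms):
    out = {}
    for m in ms:
        for k, v in m.items():
            out[k] = out.get(k, 0.0) + v
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def scale(m, c):
    return {k: c * v for k, v in m.items()}


def close(x, y, tol=1e-9):
    return all(abs(v) < tol for v in add(x, scale(y, -1)).values())


def mul(*ms):
    """Geometric product of several multivectors, left to right."""
    return reduce(gp, ms, {0: 1.0})


gamma = [{1 << mu: 1.0} for mu in range(4)]          # gamma_0 ... gamma_3
I = mul(*gamma)                                       # pseudoscalar, Eq. (2.15)
sigma = [gp(gamma[k], gamma[0]) for k in (1, 2, 3)]   # Eq. (2.17)

# Eq. (2.12): (gamma_mu gamma_nu + gamma_nu gamma_mu)/2 = eta_{mu nu}
for mu in range(4):
    for nu in range(4):
        sym = scale(add(gp(gamma[mu], gamma[nu]), gp(gamma[nu], gamma[mu])), 0.5)
        assert close(sym, {0: float(METRIC[mu])} if mu == nu else {})

# 1 + 4 + 6 + 4 + 1 = 16 basis blades, as in (2.13)-(2.16)
counts = [sum(1 for blade in range(16) if bin(blade).count("1") == r) for r in range(5)]
assert counts == [1, 4, 6, 4, 1]

# i^2 = -1; i anticommutes with vectors and commutes with bivectors
assert close(gp(I, I), {0: -1.0})
assert all(close(gp(I, v), scale(gp(v, I), -1)) for v in gamma)
assert all(close(gp(I, s), gp(s, I)) for s in sigma)

# Eq. (2.27): the sigma_k obey the Pauli anticommutation relations
for j in range(3):
    for k in range(3):
        anti = scale(add(gp(sigma[j], sigma[k]), gp(sigma[k], sigma[j])), 0.5)
        assert close(anti, {0: 1.0} if j == k else {})

# Eq. (2.29): the relative-space pseudoscalar equals the spacetime one
assert close(mul(*sigma), I)
```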
The split of the six spacetime bivectors into relative vectors $\{\sigma_k\}$ and relative bivectors $\{i\sigma_k\}$ is a frame-dependent operation: different observers determine different relative spaces. This fact is clearly illustrated using the Faraday bivector $F$. The “spacetime split” [8, 11] of $F$ into the $\gamma_0$ system is made by separating $F$ into parts that anticommute and commute with $\gamma_0$. Thus
$$F = \mathbf{E} + i\mathbf{B}, \tag{2.30}$$
where
$$\mathbf{E} = \tfrac{1}{2}(F - \gamma_0F\gamma_0) \tag{2.31}$$
and
$$i\mathbf{B} = \tfrac{1}{2}(F + \gamma_0F\gamma_0). \tag{2.32}$$
Both $\mathbf{E}$ and $\mathbf{B}$ are spatial vectors in the $\gamma_0$ frame, and $i\mathbf{B}$ is a spatial bivector. Equation (2.30) decomposes $F$ into separate electric and magnetic fields, and the explicit appearance of $\gamma_0$ in the formulas for $\mathbf{E}$ and $\mathbf{B}$ shows how this split is observer-dependent. Where required, relative (or spatial) vectors in the $\gamma_0$ system are written in bold type to record the fact that in the STA they are actually bivectors. This distinguishes them from spacetime vectors, which are left in normal type. No problems arise for the $\{\sigma_k\}$, which are unambiguously spacetime bivectors, and so are left in normal type. When dealing with spatial problems it is useful to define an operation which distinguishes between spatial vectors (such as $\mathbf{E}$) and spatial bivectors (such as $i\mathbf{B}$). (Since both the $\{\sigma_k\}$ and $\{i\sigma_k\}$ are spacetime bivectors, they behave the same under Lorentz-covariant operations.) The required operation is that of spatial reversion which, as it coincides with Hermitian conjugation for matrices, we denote with a dagger. We therefore define
$$M^\dagger \equiv \gamma_0\tilde{M}\gamma_0, \tag{2.33}$$
so that, for example,
$$F^\dagger = \mathbf{E} - i\mathbf{B}. \tag{2.34}$$
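The spacetime split (2.30)-(2.34) of the Faraday bivector can be illustrated numerically: build $F$ from arbitrary electric and magnetic components, recover $\mathbf{E}$ and $i\mathbf{B}$ from the parts of $F$ that anticommute and commute with $\gamma_0$, and check the spatial reversion. The field components below are random illustrative values, and the bitmap multivector representation (bit $\mu$ for $\gamma_\mu$) is an assumption of this sketch.

```python
import random

METRIC = (1, -1, -1, -1)  # gamma_0^2 = +1, gamma_k^2 = -1, Eq. (2.12)


def gp(x, y):
    """Geometric product of multivectors stored as {blade bitmap: coefficient}."""
    out = {}
    for a, ca in x.items():
        for b, cb in y.items():
            # sign from reordering the basis vectors of a and b into canonical order
            t, swaps = a >> 1, 0
            while t:
                swaps += bin(t & b).count("1")
                t >>= 1
            s = -1.0 if swaps & 1 else 1.0
            # every basis vector shared by a and b contributes its square
            for k, sq in enumerate(METRIC):
                if (a & b) >> k & 1:
                    s *= sq
            out[a ^ b] = out.get(a ^ b, 0.0) + s * ca * cb
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def add(*ms):
    out = {}
    for m in ms:
        for k, v in m.items():
            out[k] = out.get(k, 0.0) + v
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def scale(m, c):
    return {k: c * v for k, v in m.items()}


def reverse(m):
    """Reversion: grades 2 and 3 (mod 4) change sign."""
    return {k: -v if bin(k).count("1") % 4 in (2, 3) else v for k, v in m.items()}


def close(x, y, tol=1e-9):
    return all(abs(v) < tol for v in add(x, scale(y, -1)).values())


gamma = [{1 << mu: 1.0} for mu in range(4)]
g0 = gamma[0]
I = gp(gp(gamma[0], gamma[1]), gp(gamma[2], gamma[3]))
sigma = [gp(gamma[k], g0) for k in (1, 2, 3)]

random.seed(2)
e_comp = [random.uniform(-1, 1) for _ in range(3)]   # illustrative E components
b_comp = [random.uniform(-1, 1) for _ in range(3)]   # illustrative B components
E_in = add(*(scale(s, e) for s, e in zip(sigma, e_comp)))
iB_in = add(*(scale(gp(I, s), bk) for s, bk in zip(sigma, b_comp)))
F = add(E_in, iB_in)                                 # Eq. (2.30)

g0Fg0 = gp(gp(g0, F), g0)
E = scale(add(F, scale(g0Fg0, -1)), 0.5)             # Eq. (2.31)
iB = scale(add(F, g0Fg0), 0.5)                       # Eq. (2.32)
assert close(E, E_in) and close(iB, iB_in)
assert close(gp(g0, E), scale(gp(E, g0), -1))        # E anticommutes with gamma_0
assert close(gp(g0, iB), gp(iB, g0))                 # iB commutes with gamma_0


def dagger(M):
    """Spatial reversion M^dagger = gamma_0 M~ gamma_0, Eq. (2.33)."""
    return gp(gp(g0, reverse(M)), g0)


assert close(dagger(F), add(E, scale(iB, -1)))       # Eq. (2.34)
```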
The explicit appearance of $\gamma_0$ in the definition (2.33) shows that spatial reversion is not a Lorentz-covariant operation. When working with purely spatial quantities, we often require that the dot and wedge operations drop down to their three-dimensional definitions. For example, given two spatial vectors $\mathbf{a}$ and $\mathbf{b}$, we would like $\mathbf{a}\wedge\mathbf{b}$ to denote the spatial bivector swept out by $\mathbf{a}$ and $\mathbf{b}$. Accordingly we adopt the convention that, in expressions where both vectors are in bold type, the dot and wedge operations take their three-dimensional meaning. While this convention may look clumsy, it is simple to use in practice and rarely causes any confusion. Spacetime vectors can also be decomposed by a spacetime split, this time resulting in a scalar and a relative vector. The spacetime split of the vector $a$ is achieved via
$$a\gamma_0 = a\cdot\gamma_0 + a\wedge\gamma_0 = a_0 + \mathbf{a}, \tag{2.35}$$
so that $a_0$ is a scalar (the $\gamma_0$-time component of $a$) and $\mathbf{a}$ is the relative spatial vector. For example, the 4-momentum $p$ splits into
$$p\gamma_0 = E + \mathbf{p}, \tag{2.36}$$
where $E$ is the energy in the $\gamma_0$ frame, and $\mathbf{p}$ is the 3-momentum. The definition of the relative vector (2.35) ensures that
$$a\cdot b = \langle a\gamma_0\gamma_0b\rangle = \langle(a_0 + \mathbf{a})(b_0 - \mathbf{b})\rangle = a_0b_0 - \mathbf{a}\cdot\mathbf{b}, \tag{2.37}$$
as required for the inner product in Minkowski spacetime.

B. Spacetime Calculus

The fundamental differential operator on spacetime is the derivative with respect to the position vector $x$. This is known as the vector derivative
and is given the symbol $\nabla$. The vector derivative is defined in terms of its directional derivatives, with the derivative in the $a$ direction of a general multivector $M$ defined by
$$a\cdot\nabla M(x) \equiv \lim_{\epsilon\to 0}\frac{M(x + \epsilon a) - M(x)}{\epsilon}. \tag{2.38}$$
If we now introduce a set of four arbitrary basis vectors $\{e_j\}$, with reciprocal vectors $\{e^k\}$ defined by the equation $e_j\cdot e^k = \delta_j^k$, then the vector derivative assembles from the separate directional derivatives as
$$\nabla = e^je_j\cdot\nabla. \tag{2.39}$$
This definition shows how $\nabla$ acts algebraically as a vector, as well as inheriting a calculus from its directional derivatives. As an explicit example, consider the $\{\gamma_\mu\}$ frame introduced above. In terms of this frame we can write the position vector $x$ as $x^\mu\gamma_\mu$, with $x^0 = t$, $x^1 = x$, etc., and $\{x, y, z\}$ a usual set of Cartesian components for the rest frame of the $\gamma_0$ vector. From the definition (2.38) it is clear that
$$\gamma_\mu\cdot\nabla = \frac{\partial}{\partial x^\mu}, \tag{2.40}$$
which we abbreviate to $\partial_\mu$. From the definition (2.39) we can now write
$$\nabla = \gamma^\mu\partial_\mu = \gamma^0\partial_t + \gamma^1\partial_x + \gamma^2\partial_y + \gamma^3\partial_z, \tag{2.41}$$
which, in the standard matrix language of Dirac theory, is the operator that acts on Dirac spinors. It is not surprising, therefore, that the $\nabla$ operator should play a fundamental role in the STA formulation of the Dirac theory. What is less obvious is that the same operator should also play a fundamental role in the STA formulation of the Maxwell equations [8]. In tensor notation, the Maxwell equations take the form
$$\partial_\mu F^{\mu\nu} = J^\nu, \qquad \partial_{[\mu}F_{\nu\rho]} = 0, \tag{2.42}$$
where $[\ldots]$ denotes total antisymmetrization of the indices inside the brackets. On defining the bivector
$$F \equiv \tfrac{1}{2}F^{\mu\nu}\gamma_\mu\wedge\gamma_\nu \tag{2.43}$$
and the vector $J \equiv J^\mu\gamma_\mu$, the equations (2.42) become
$$\nabla\cdot F = J \tag{2.44}$$
and
$$\nabla\wedge F = 0. \tag{2.45}$$
But we can now utilize the geometric product to combine these separate equations into the single equation
$$\nabla F = J, \tag{2.46}$$
which contains all of the Maxwell equations. We see from (2.46) that the vector derivative plays a central role in Maxwell theory, as well as Dirac theory. The observation that the vector derivative is the sole differential operator required to formulate both Maxwell and Dirac theories is a fundamental insight afforded by the STA. Some consequences of this observation for propagator theory are discussed in [6]. The vector derivative acts on the object to its immediate right unless brackets are present, when it acts on everything in the brackets. Since the vector derivative does not commute with multivectors, it is useful to have a notation for when the derivative acts on a multivector to which it is not adjacent. We use overdots for this, so that in the expression $\dot\nabla A\dot B$ the $\nabla$ operator acts only on $B$. In terms of a frame of vectors we can write
$$\dot\nabla A\dot B = e^jA\, e_j\cdot\nabla B. \tag{2.47}$$
The overdot notation provides a useful means for expressing Leibniz' rule via
$$\nabla(AB) = \dot\nabla\dot AB + \dot\nabla A\dot B. \tag{2.48}$$
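As an illustration of the single equation (2.46), the sketch below evaluates $\nabla F$ by central finite differences for a vacuum plane wave travelling along $\gamma_3$, $F = \cos(\omega(t - z))(\sigma_1 + i\sigma_2)$. This field is chosen here for illustration and is not an example from the text; the step size `h`, the frequency `OMEGA`, the sample point and the bitmap multivector representation are all assumptions of the sketch.

```python
import math

METRIC = (1, -1, -1, -1)  # gamma_0^2 = +1, gamma_k^2 = -1, Eq. (2.12)


def gp(x, y):
    """Geometric product of multivectors stored as {blade bitmap: coefficient}."""
    out = {}
    for a, ca in x.items():
        for b, cb in y.items():
            # sign from reordering the basis vectors of a and b into canonical order
            t, swaps = a >> 1, 0
            while t:
                swaps += bin(t & b).count("1")
                t >>= 1
            s = -1.0 if swaps & 1 else 1.0
            # every basis vector shared by a and b contributes its square
            for k, sq in enumerate(METRIC):
                if (a & b) >> k & 1:
                    s *= sq
            out[a ^ b] = out.get(a ^ b, 0.0) + s * ca * cb
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def add(*ms):
    out = {}
    for m in ms:
        for k, v in m.items():
            out[k] = out.get(k, 0.0) + v
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def scale(m, c):
    return {k: c * v for k, v in m.items()}


def close(x, y, tol=1e-9):
    return all(abs(v) < tol for v in add(x, scale(y, -1)).values())


gamma = [{1 << mu: 1.0} for mu in range(4)]
gamma_up = [gamma[0]] + [scale(v, -1) for v in gamma[1:]]   # reciprocal frame gamma^mu
I = gp(gp(gamma[0], gamma[1]), gp(gamma[2], gamma[3]))
sigma1 = gp(gamma[1], gamma[0])
i_sigma2 = gp(I, gp(gamma[2], gamma[0]))
OMEGA = 2.0   # illustrative angular frequency, units with c = 1


def nabla(field, x, h=1e-5):
    """Vector derivative gamma^mu d_mu F of Eq. (2.41), using central differences."""
    total = {}
    for mu in range(4):
        xp, xm = list(x), list(x)
        xp[mu] += h
        xm[mu] -= h
        d_mu = scale(add(field(xp), scale(field(xm), -1)), 1 / (2 * h))
        total = add(total, gp(gamma_up[mu], d_mu))
    return total


def wave(polarization):
    """F(x) = cos(omega (t - z)) * polarization, a field moving along +z."""
    return lambda x: scale(polarization, math.cos(OMEGA * (x[0] - x[3])))


x0 = [0.3, -0.2, 0.5, 0.1]                       # (t, x, y, z)
light = wave(add(sigma1, i_sigma2))              # E along sigma_1, B along sigma_2
assert close(nabla(light, x0), {}, tol=1e-6)     # Eq. (2.46) with J = 0
electric_only = wave(sigma1)                     # drop the magnetic part
assert not close(nabla(electric_only, x0), {}, tol=1e-6)
```

The wave works because $\nabla F = f'(\gamma_0 + \gamma_3)(1 + \sigma_3)\sigma_1$ and $(\gamma_0 + \gamma_3)(1 + \sigma_3) = (1 + \sigma_3)(1 - \sigma_3)\gamma_0 = 0$; the electric field on its own has no such cancellation.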
The spacetime split of the vector derivative requires some care. We wish to retain the symbol $\boldsymbol{\nabla}$ for the spatial vector derivative, so that
$$\boldsymbol{\nabla} = \sigma_k\partial_k, \qquad k = 1, \dots, 3. \tag{2.49}$$
This definition of $\boldsymbol{\nabla}$ is inconsistent with the definition (2.35), so for the vector derivative we have to remember that
$$\nabla\gamma_0 = \partial_t - \boldsymbol{\nabla}. \tag{2.50}$$
We conclude this introduction with some useful results concerning the vector derivative. We let the dimension of the space of interest be $n$, so that the results are applicable to both space and spacetime. The most basic results are that
$$\nabla x = n \tag{2.51}$$
and that
$$\nabla\wedge\nabla\psi = 0, \tag{2.52}$$
where $\psi$ is an arbitrary multivector field. The latter result follows from the fact that partial derivatives commute. For a grade-$r$ multivector $A_r$, the following results are also useful:
$$\dot\nabla\dot x\cdot A_r = rA_r, \tag{2.53}$$
$$\dot\nabla\dot x\wedge A_r = (n - r)A_r, \tag{2.54}$$
$$\dot\nabla A_r\dot x = (-1)^r(n - 2r)A_r. \tag{2.55}$$
More complicated results can be built up with the aid of Leibniz' rule, for example,
$$\nabla x^2 = \dot\nabla\dot x\cdot x + \dot\nabla x\cdot\dot x = 2x. \tag{2.56}$$
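The identities (2.51)-(2.56) can be checked directly. With $\nabla = e^je_j\cdot\nabla$, the results (2.53)-(2.55) reduce to the frame sums $e^j(e_j\cdot A_r)$, $e^j(e_j\wedge A_r)$ and $e^jA_re_j$, evaluated below for random multivectors of every grade in spacetime ($n = 4$); the last two checks approximate $\nabla$ by central differences. The bitmap representation and the sample values are assumptions of this sketch.

```python
import random

METRIC = (1, -1, -1, -1)  # gamma_0^2 = +1, gamma_k^2 = -1, Eq. (2.12)


def gp(x, y):
    """Geometric product of multivectors stored as {blade bitmap: coefficient}."""
    out = {}
    for a, ca in x.items():
        for b, cb in y.items():
            # sign from reordering the basis vectors of a and b into canonical order
            t, swaps = a >> 1, 0
            while t:
                swaps += bin(t & b).count("1")
                t >>= 1
            s = -1.0 if swaps & 1 else 1.0
            # every basis vector shared by a and b contributes its square
            for k, sq in enumerate(METRIC):
                if (a & b) >> k & 1:
                    s *= sq
            out[a ^ b] = out.get(a ^ b, 0.0) + s * ca * cb
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def add(*ms):
    out = {}
    for m in ms:
        for k, v in m.items():
            out[k] = out.get(k, 0.0) + v
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def scale(m, c):
    return {k: c * v for k, v in m.items()}


def close(x, y, tol=1e-9):
    return all(abs(v) < tol for v in add(x, scale(y, -1)).values())


def grade(m, r):
    return {k: v for k, v in m.items() if bin(k).count("1") == r}


N = 4                                                     # dimension n of spacetime
gamma = [{1 << mu: 1.0} for mu in range(N)]
gamma_up = [gamma[0]] + [scale(v, -1) for v in gamma[1:]]
frame = list(zip(gamma, gamma_up))                        # pairs (e_j, e^j)

random.seed(3)
for r in range(N + 1):
    A = {k: random.uniform(-1, 1) for k in range(16) if bin(k).count("1") == r}
    if r >= 1:   # e_j . A_r is the grade r-1 part of e_j A_r
        inner = add(*(gp(eu, grade(gp(e, A), r - 1)) for e, eu in frame))
        assert close(inner, scale(A, r))                            # Eq. (2.53)
    outer = add(*(gp(eu, grade(gp(e, A), r + 1)) for e, eu in frame))
    assert close(outer, scale(A, N - r))                            # Eq. (2.54)
    sandwich = add(*(gp(gp(eu, A), e) for e, eu in frame))
    assert close(sandwich, scale(A, (-1) ** r * (N - 2 * r)))       # Eq. (2.55)


def nabla(field, x, h=1e-5):
    """gamma^mu d_mu acting on a multivector field, by central differences."""
    total = {}
    for mu in range(N):
        xp, xm = list(x), list(x)
        xp[mu] += h
        xm[mu] -= h
        d_mu = scale(add(field(xp), scale(field(xm), -1)), 1 / (2 * h))
        total = add(total, gp(gamma_up[mu], d_mu))
    return total


def position(x):
    return {1 << mu: x[mu] for mu in range(N)}            # x = x^mu gamma_mu


def x_squared(x):
    return {0: x[0] ** 2 - x[1] ** 2 - x[2] ** 2 - x[3] ** 2}


x0 = [0.7, 0.1, -0.4, 0.2]
assert close(nabla(position, x0), {0: float(N)}, tol=1e-6)                 # Eq. (2.51)
assert close(nabla(x_squared, x0), scale(position(x0), 2.0), tol=1e-6)     # Eq. (2.56)
```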
This concludes our introduction to the spacetime algebra. Further details can be found in Space-Time Algebra by Hestenes [8] and Clifford Algebra to Geometric Calculus by Hestenes and Sobczyk [12]. The latter is a technical exposition of geometric algebra in general and does not deal directly with spacetime physics. A number of papers contain useful introductory material, including those by Hestenes [7, 11, 13] and the series of papers [3-6] written by three of the present authors. Baylis et al. [14] and Vold [15, 16] have also written good introductory pieces, and the books New Foundations for Classical Mechanics by Hestenes [10] and Multivectors and Clifford Algebras in Electrodynamics by Jancewicz [17] provide useful background material. Further work can be found in the three conference proceedings [18-20], though only a handful of papers are directly relevant to the work reviewed in this paper. Of greater interest are the proceedings of the conference entitled The Electron [21], which contains a number of papers dealing with the application of the STA to electron physics.

III. SPINORS AND THE DIRAC EQUATION
In this section we review how both the quantum states and matrix operators of the Pauli and Dirac theories can be formulated within the real STA. This approach to electron theory was initiated by Hestenes [22, 23] and has grown steadily in popularity ever since. We start with a review of the single-electron Pauli theory and then proceed to the Dirac theory. Multiparticle states are considered in Section IX. Before proceeding, it is necessary to explain what we mean by a spinor. The literature is replete with discussions about different types of spinors and their interrelationships and transformation laws. This literature is highly mathematical, and is of very little relevance to electron physics. For our purposes, we define a spinor to be an element of a linear space which is closed under left-sided multiplication by a rotor. Thus spinors are acted on by rotor representations of the rotation group. With this in
mind, we can proceed directly to study the spinors of relevance to physics. Further work relating to the material in this section is contained in [4].

A. Pauli Spinors

We saw in Section II.A that the algebra of the Pauli matrices is precisely that of a set of three orthonormal vectors in space under the geometric product. So the Pauli matrices are simply a matrix representation of the geometric algebra of space. This observation opens up the possibility of eliminating matrices from the Pauli theory in favor of geometrically significant quantities. But what of the operator action of the Pauli matrices on spinors? This too needs to be represented within the geometric algebra of space. To achieve this aim, we recall the standard representation for the Pauli matrices:
$$\hat\sigma_1 = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}, \qquad \hat\sigma_2 = \begin{pmatrix}0 & -j\\ j & 0\end{pmatrix}, \qquad \hat\sigma_3 = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}. \tag{3.1}$$
The overhats distinguish these matrix operators from the $\{\sigma_k\}$ vectors whose algebra they represent. The symbol $i$ is reserved for the pseudoscalar, so the symbol $j$ is used for the scalar unit imaginary employed in quantum theory. The $\{\hat\sigma_k\}$ operators act on 2-component complex spinors
$$|\psi\rangle = \begin{pmatrix}\psi_1\\ \psi_2\end{pmatrix}, \tag{3.2}$$
where $\psi_1$ and $\psi_2$ are complex numbers. Quantum states are written with bras and kets to distinguish them from STA multivectors. The set of $|\psi\rangle$'s forms a two-dimensional complex vector space. To represent these states as multivectors in the STA we therefore need to find a four-dimensional (real) space on which the action of the $\{\hat\sigma_k\}$ operators can be replaced by operations involving the $\{\sigma_k\}$ vectors. There are many ways to achieve this goal, but the simplest is to represent a spinor $|\psi\rangle$ by an element of the even subalgebra of (2.28). This space is spanned by the set $\{1, i\sigma_k\}$, and the column spinor $|\psi\rangle$ is placed in one-to-one correspondence with the (Pauli-)even multivector $\psi = \gamma_0\psi\gamma_0$ through the identification [4, 24]
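$$|\psi\rangle = \begin{pmatrix}a^0 + ja^3\\ -a^2 + ja^1\end{pmatrix} \leftrightarrow \psi = a^0 + a^ki\sigma_k. \tag{3.3}$$
In particular, the spin-up and spin-down basis states become
$$\begin{pmatrix}1\\ 0\end{pmatrix} \leftrightarrow 1 \tag{3.4}$$
and
$$\begin{pmatrix}0\\ 1\end{pmatrix} \leftrightarrow -i\sigma_2. \tag{3.5}$$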
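The action of the quantum operators $\{\hat\sigma_k\}$ and $j$ is now replaced by the operations
$$\hat\sigma_k|\psi\rangle \leftrightarrow \sigma_k\psi\sigma_3 \qquad (k = 1, 2, 3) \tag{3.6}$$
and
$$j|\psi\rangle \leftrightarrow \psi i\sigma_3. \tag{3.7}$$
Verifying these relations is a matter of routine computation; for example,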
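$$\hat\sigma_1|\psi\rangle = \begin{pmatrix}-a^2 + ja^1\\ a^0 + ja^3\end{pmatrix} \leftrightarrow -a^2 + a^1i\sigma_3 - a^0i\sigma_2 + a^3i\sigma_1 = \sigma_1\psi\sigma_3. \tag{3.8}$$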
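The correspondences (3.6) and (3.7) can be verified numerically by mapping a random column spinor into the algebra with (3.3), applying the matrices (3.1), and comparing with the multivector operations. Because the $\{\sigma_k\}$ generate the algebra of three-dimensional space, this sketch works in a three-dimensional Euclidean geometric algebra whose basis vectors stand in for the $\sigma_k$; that substitution, the bitmap representation and the helper names (`to_multivector`, `apply`) are assumptions made here for brevity.

```python
import random

METRIC = (1, 1, 1)   # the relative vectors sigma_k square to +1, Eq. (2.27)


def gp(x, y):
    """Geometric product of multivectors stored as {blade bitmap: coefficient}."""
    out = {}
    for a, ca in x.items():
        for b, cb in y.items():
            # sign from reordering the basis vectors of a and b into canonical order
            t, swaps = a >> 1, 0
            while t:
                swaps += bin(t & b).count("1")
                t >>= 1
            s = -1.0 if swaps & 1 else 1.0
            # every basis vector shared by a and b contributes its square
            for k, sq in enumerate(METRIC):
                if (a & b) >> k & 1:
                    s *= sq
            out[a ^ b] = out.get(a ^ b, 0.0) + s * ca * cb
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def add(*ms):
    out = {}
    for m in ms:
        for k, v in m.items():
            out[k] = out.get(k, 0.0) + v
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def scale(m, c):
    return {k: c * v for k, v in m.items()}


def close(x, y, tol=1e-9):
    return all(abs(v) < tol for v in add(x, scale(y, -1)).values())


sigma = [{1 << k: 1.0} for k in range(3)]            # sigma_1, sigma_2, sigma_3
I = gp(gp(sigma[0], sigma[1]), sigma[2])             # i = sigma_1 sigma_2 sigma_3, Eq. (2.29)
i_sigma = [gp(I, s) for s in sigma]                  # i sigma_k
PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]   # Eq. (3.1)


def to_multivector(col):
    """Map (3.3): (a0 + j a3, -a2 + j a1) -> a0 + a^k i sigma_k."""
    c1, c2 = col
    a = [c2.imag, -c2.real, c1.imag]                 # a1, a2, a3
    return add({0: c1.real}, *(scale(b, ak) for b, ak in zip(i_sigma, a)))


def apply(M, col):
    """Ordinary 2x2 complex matrix acting on a column spinor."""
    return [sum(M[r][c] * col[c] for c in range(2)) for r in range(2)]


random.seed(4)
col = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(2)]
psi = to_multivector(col)

for k in range(3):   # Eq. (3.6): sigma_hat_k |psi>  <->  sigma_k psi sigma_3
    assert close(to_multivector(apply(PAULI[k], col)), gp(gp(sigma[k], psi), sigma[2]))

# Eq. (3.7): j |psi>  <->  psi i sigma_3
assert close(to_multivector([1j * c for c in col]), gp(psi, i_sigma[2]))
```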
We have now achieved our aim. Every expression involving Pauli operators and spinors has an equivalent form in the STA, and all manipulations can be carried out using the properties of the $\{\sigma_k\}$ vectors alone, with no need to introduce an explicit matrix representation. This is far more than just a theoretical nicety. Not only is there considerable advantage in being able to perform the computations required in the Pauli theory without multiplying matrices together, but abstract matrix algebraic manipulations are replaced by relations of clear geometric significance.

1. Pauli Observables
We now turn to a discussion of the observables associated with Pauli spinors. These show how the STA formulation requires a shift in our understanding of what constitutes scalar and vector observables at the quantum level. We first need to construct the STA form of the spinor inner product $\langle\psi|\phi\rangle$. It is sufficient just to consider the real part of the inner product, which is given by
$$\Re\langle\psi|\phi\rangle \leftrightarrow \langle\psi^\dagger\phi\rangle, \tag{3.9}$$
so that, for example,
$$\langle\psi|\psi\rangle \leftrightarrow \langle\psi^\dagger\psi\rangle = \langle(a^0 - ia^j\sigma_j)(a^0 + ia^k\sigma_k)\rangle = (a^0)^2 + a^ka^k. \tag{3.10}$$
(Note that no spatial integral is implied in our use of the bra-ket notation.) Since
$$\langle\psi|\phi\rangle = \Re\langle\psi|\phi\rangle - j\,\Re\langle\psi|j\phi\rangle, \tag{3.11}$$
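the full inner product becomes
$$\langle\psi|\phi\rangle \leftrightarrow \langle\psi^\dagger\phi\rangle - \langle\psi^\dagger\phi i\sigma_3\rangle i\sigma_3 = \langle\psi^\dagger\phi\rangle_S. \tag{3.12}$$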
The right-hand side projects out the $\{1, i\sigma_3\}$ components from the geometric product $\psi^\dagger\phi$. The result of this projection on a multivector $A$ is written $\langle A\rangle_S$. For Pauli-even multivectors this projection has the simple form
$$\langle A\rangle_S = \tfrac{1}{2}(A - i\sigma_3Ai\sigma_3). \tag{3.13}$$
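As an application of (3.12), consider the expectation value of the spin in the $k$ direction,
$$\langle\psi|\hat\sigma_k\psi\rangle \leftrightarrow \langle\psi^\dagger\sigma_k\psi\sigma_3\rangle - \langle\psi^\dagger\sigma_k\psi i\rangle i\sigma_3. \tag{3.14}$$
Since $\psi^\dagger\sigma_k\psi i$ reverses to give minus itself, it has zero scalar part. The right-hand side of (3.14) therefore reduces to
$$\langle\sigma_k\psi\sigma_3\psi^\dagger\rangle = \sigma_k\cdot\langle\psi\sigma_3\psi^\dagger\rangle_v, \tag{3.15}$$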
where $\langle\ldots\rangle_v$ denotes the relative vector component of the term in brackets. (This notation is required because $\langle\ldots\rangle_1$ would denote the spacetime vector part of the term in brackets.) The expression (3.15) has a rather different interpretation in the STA to standard quantum mechanics: it is the $\sigma_k$ component of the vector part of $\psi\sigma_3\psi^\dagger$. As $\psi\sigma_3\psi^\dagger$ is both Pauli-odd and Hermitian-symmetric, it can contain only a relative vector part, so we define the spin-vector $\mathbf{s}$ by
$$\mathbf{s} \equiv \psi\sigma_3\psi^\dagger. \tag{3.16}$$
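A similar numerical check confirms that the spin vector (3.16) reproduces the matrix expectation values through (3.15), and anticipates the density-rotor decomposition (3.19)-(3.23) discussed below. In this sketch the basis vectors of a three-dimensional Euclidean geometric algebra stand in for the $\{\sigma_k\}$, so spatial reversion coincides with ordinary reversion; the random spinor, the bitmap representation and the helper names are illustrative assumptions.

```python
import math
import random

METRIC = (1, 1, 1)   # algebra of the sigma_k; here psi^dagger is plain reversion


def gp(x, y):
    """Geometric product of multivectors stored as {blade bitmap: coefficient}."""
    out = {}
    for a, ca in x.items():
        for b, cb in y.items():
            # sign from reordering the basis vectors of a and b into canonical order
            t, swaps = a >> 1, 0
            while t:
                swaps += bin(t & b).count("1")
                t >>= 1
            s = -1.0 if swaps & 1 else 1.0
            # every basis vector shared by a and b contributes its square
            for k, sq in enumerate(METRIC):
                if (a & b) >> k & 1:
                    s *= sq
            out[a ^ b] = out.get(a ^ b, 0.0) + s * ca * cb
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def add(*ms):
    out = {}
    for m in ms:
        for k, v in m.items():
            out[k] = out.get(k, 0.0) + v
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def scale(m, c):
    return {k: c * v for k, v in m.items()}


def reverse(m):
    """Reversion: grades 2 and 3 (mod 4) change sign."""
    return {k: -v if bin(k).count("1") % 4 in (2, 3) else v for k, v in m.items()}


def close(x, y, tol=1e-9):
    return all(abs(v) < tol for v in add(x, scale(y, -1)).values())


sigma = [{1 << k: 1.0} for k in range(3)]
I = gp(gp(sigma[0], sigma[1]), sigma[2])
i_sigma = [gp(I, s) for s in sigma]
PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]   # Eq. (3.1)


def to_multivector(col):
    """Map (3.3): (a0 + j a3, -a2 + j a1) -> a0 + a^k i sigma_k."""
    c1, c2 = col
    a = [c2.imag, -c2.real, c1.imag]
    return add({0: c1.real}, *(scale(b, ak) for b, ak in zip(i_sigma, a)))


random.seed(5)
col = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(2)]
psi = to_multivector(col)
psi_dag = reverse(psi)

s = gp(gp(psi, sigma[2]), psi_dag)                  # spin vector, Eq. (3.16)
assert all(bin(k).count("1") == 1 for k in s)       # a pure relative vector

for k in range(3):
    # <psi| sigma_hat_k |psi> with matrices equals the sigma_k component of s
    expectation = sum(col[r].conjugate() * PAULI[k][r][c] * col[c]
                      for r in range(2) for c in range(2))
    assert abs(expectation.imag) < 1e-12
    assert abs(expectation.real - s.get(1 << k, 0.0)) < 1e-9      # Eq. (3.15)

rho = gp(psi, psi_dag)[0]                           # Eq. (3.19)
assert abs(rho - (abs(col[0]) ** 2 + abs(col[1]) ** 2)) < 1e-12
R = scale(psi, 1 / math.sqrt(rho))                  # Eq. (3.21)
assert close(gp(R, reverse(R)), {0: 1.0})           # Eq. (3.22)
assert close(s, scale(gp(gp(R, sigma[2]), reverse(R)), rho))      # Eq. (3.23)
```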
(In fact, both spin and angular momentum are better viewed as spatial bivector quantities, so it is usually more convenient to work with $i\mathbf{s}$ instead of $\mathbf{s}$.) The STA approach thus enables us to work with a single vector $\mathbf{s}$, whereas the operator/matrix theory treats only its individual components. We can apply a similar analysis to the momentum operator. The momentum density in the $k$ direction is given by
$$\langle\psi|-j\partial_k\psi\rangle \leftrightarrow -\langle\psi^\dagger\,\sigma_k\cdot\boldsymbol{\nabla}\psi\, i\sigma_3\rangle - \langle\psi^\dagger\,\sigma_k\cdot\boldsymbol{\nabla}\psi\rangle i\sigma_3, \tag{3.17}$$
in which the final term is a total divergence and so is ignored. Recombining with the $\{\sigma_k\}$ vectors, we find that the momentum vector field is given by
$$\mathbf{p} = -\dot{\boldsymbol{\nabla}}\langle\dot\psi i\sigma_3\psi^\dagger\rangle. \tag{3.18}$$
It might appear that we have just played a harmless game by redefining various observables, but in fact something remarkable has happened. The spin vector $\mathbf{s}$ and the momentum $\mathbf{p}$ are both legitimate (i.e., gauge-invariant) quantities constructed from the spinor $\psi$. But standard quantum theory dictates that we cannot simultaneously measure all three components
of $\mathbf{s}$, whereas we can for $\mathbf{p}$. The “proof” of this result is based on the noncommutativity of the $\{\hat\sigma_k\}$ operators. But in the STA formulation, this lack of commutativity merely expresses the fact that the $\{\sigma_k\}$ vectors are orthogonal, a fact of geometry, not of dynamics! Furthermore, given a spinor $\psi$, there is certainly no difficulty in finding the vector $\mathbf{s}$. So how then are we to interpret a spin measurement, as performed by a Stern-Gerlach apparatus, for example? This problem will be treated in detail in Section VIII, but the conclusions are straightforward. A Stern-Gerlach apparatus is not a measuring device; it should really be viewed as a spin polarizer. When a spinor wavepacket with arbitrary initial vector $\mathbf{s}$ enters a Stern-Gerlach apparatus, the wavepacket splits in two and the vector $\mathbf{s}$ rotates to align itself either parallel or antiparallel to the $\mathbf{B}$ field. The two different alignments then separate into the two packets. Hence, in the final beams, the vector $\mathbf{s}$ has been polarized to point in a single direction. So, having passed through the apparatus, all three components of the spin vector $\mathbf{s}$ are known, not just the component in the direction of the $\mathbf{B}$ field. This is a major conceptual shift, yet it is completely consistent with the standard predictions of quantum theory. Similar views have been expressed in the past by advocates of Bohm's “causal” interpretation of quantum theory [25-27]. However, the shift in interpretation described here is due solely to the new understanding of the role of the Pauli matrices which the STA affords. It does not require any of the additional ideas associated with Bohm's interpretation, such as quantum forces and quantum torques.
2. Spinors and Rotations

Further insights into the role of spinors in the Pauli theory are obtained by defining a scalar
$$\rho \equiv \psi\psi^\dagger, \tag{3.19}$$
so that the spinor $\psi$ can be decomposed into
$$\psi = \rho^{1/2}R. \tag{3.20}$$
Here, $R$ is defined as
$$R = \rho^{-1/2}\psi \tag{3.21}$$
and satisfies
$$RR^\dagger = 1. \tag{3.22}$$
In Section II we saw that rotors, which act double-sidedly to generate rotations, satisfy Eq. (3.22). It is not hard to show that, in three dimensions, all even quantities satisfying (3.22) are rotors. It follows from (3.20) that the spin vector $\mathbf{s}$ can now be written as
$$\mathbf{s} = \rho R\sigma_3R^\dagger, \tag{3.23}$$
which demonstrates that the double-sided construction of the expectation value (3.15) contains an instruction to rotate the fixed $\sigma_3$ axis into the spin direction and dilate it. The decomposition of the spinor $\psi$ into a density term $\rho$ and a rotor $R$ suggests that a deeper substructure underlies the Pauli theory. This is a subject which has been frequently discussed by Hestenes [9, 23, 28, 29]. As an example of the insights afforded by this decomposition, one can now “explain” why spinors transform single-sidedly under rotations. If the vector $\mathbf{s}$ is to be rotated to a new vector $R_0\mathbf{s}R_0^\dagger$, then, according to the rotor group combination law, $R$ must transform to $R_0R$. This induces the spinor transformation law
$$\psi \mapsto R_0\psi, \tag{3.24}$$
which is the STA equivalent of the quantum transformation law (3.25), where $\{n_k\}$ are the components of a unit vector. We can also now see why the presence of the $\sigma_3$ vector on the right-hand side of the spinor $\psi$ does not break rotational invariance. All rotations are performed by left-multiplication by a rotor, so the spinor $\psi$ effectively shields the $\sigma_3$ on the right from the transformation. There is a strong analogy with rigid-body mechanics in this observation, which has been discussed by Hestenes [9, 30]. Similar ideas have also been pursued by Dewdney, Holland, and Kyprianidis [31, 32]. We shall see in the next section that this analogy extends to the Dirac theory. The main results of this section are summarized in Table II.
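TABLE II

Pauli matrices: $\hat\sigma_1 = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}$, $\hat\sigma_2 = \begin{pmatrix}0 & -j\\ j & 0\end{pmatrix}$, $\hat\sigma_3 = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}$
Spinor equivalence: $|\psi\rangle = \begin{pmatrix}a^0 + ja^3\\ -a^2 + ja^1\end{pmatrix} \leftrightarrow \psi = a^0 + a^ki\sigma_k$
Observables: $\rho = \psi\psi^\dagger$, $\mathbf{s} = \psi\sigma_3\psi^\dagger$

B. Dirac Spinors

The procedures developed for Pauli spinors extend simply to Dirac spinors. Again, we seek to represent complex column spinors, and the matrix operators acting on them, by multivectors and functions in the STA. Dirac spinors are four-component complex entities, so must be represented by objects containing 8 real degrees of freedom. The representation that turns out to be most convenient for applications is via the 8-dimensional even subalgebra of the STA [23, 33]. If one recalls from Section II.A that the even subalgebra of the STA is isomorphic to the Pauli algebra, we see that what is required is a map between column spinors and elements of the Pauli algebra. To construct such a map we begin with the $\gamma$ matrices in the standard Dirac-Pauli representation [34],
$$\hat\gamma_0 = \begin{pmatrix}I & 0\\ 0 & -I\end{pmatrix}, \qquad \hat\gamma_k = \begin{pmatrix}0 & -\hat\sigma_k\\ \hat\sigma_k & 0\end{pmatrix}, \qquad \hat\gamma_5 = \begin{pmatrix}0 & I\\ I & 0\end{pmatrix}, \tag{3.26}$$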
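where $\hat\gamma_5 = \hat\gamma^5 = -j\hat\gamma_0\hat\gamma_1\hat\gamma_2\hat\gamma_3$ and $I$ is the $2\times 2$ identity matrix. A Dirac column spinor $|\psi\rangle$ is placed in one-to-one correspondence with an 8-component even element of the STA via [24, 35]
$$|\psi\rangle = \begin{pmatrix}a^0 + ja^3\\ -a^2 + ja^1\\ b^0 + jb^3\\ -b^2 + jb^1\end{pmatrix} \leftrightarrow \psi = a^0 + a^ki\sigma_k + (b^0 + b^ki\sigma_k)\sigma_3. \tag{3.27}$$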
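With the spinor $|\psi\rangle$ now replaced by an even multivector, the action of the operators $\{\hat\gamma_\mu, \hat\gamma_5, j\}$ becomes
$$\hat\gamma_\mu|\psi\rangle \leftrightarrow \gamma_\mu\psi\gamma_0 \quad (\mu = 0, \dots, 3), \qquad j|\psi\rangle \leftrightarrow \psi i\sigma_3, \qquad \hat\gamma_5|\psi\rangle \leftrightarrow \psi\sigma_3. \tag{3.28}$$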
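To verify these relations, we note that the map (3.27) can be written more concisely as
$$|\psi\rangle = \begin{pmatrix}|\phi\rangle\\ |\eta\rangle\end{pmatrix} \leftrightarrow \psi = \phi + \eta\sigma_3, \tag{3.29}$$
where $|\phi\rangle$ and $|\eta\rangle$ are two-component spinors, and $\phi$ and $\eta$ are their Pauli-even equivalents, as defined by the map (3.3). We can now see, for example, that
$$\hat\gamma_k|\psi\rangle = \begin{pmatrix}-\hat\sigma_k|\eta\rangle\\ \hat\sigma_k|\phi\rangle\end{pmatrix} \leftrightarrow -\sigma_k\eta\sigma_3 + \sigma_k\phi\sigma_3\sigma_3 = \gamma_k(\phi + \eta\sigma_3)\gamma_0, \tag{3.30}$$
as required. The map (3.29) shows that the split between the “large” and “small” components of the column spinor $|\psi\rangle$ is equivalent to splitting into Pauli-even and Pauli-odd terms in the STA.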
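The Dirac-Pauli correspondences (3.27)-(3.28), and the complex-conjugation rule (3.41) given later, can be tested in the same way: build a random even multivector, read off its column spinor, and compare the matrix action with the multivector action. The sketch assumes the lower-index Dirac-Pauli matrices $\hat\gamma_0 = \mathrm{diag}(I, -I)$, $\hat\gamma_k = \begin{pmatrix}0 & -\hat\sigma_k\\ \hat\sigma_k & 0\end{pmatrix}$ and $\hat\gamma_5 = \begin{pmatrix}0 & I\\ I & 0\end{pmatrix}$, together with the column layout of (3.27); a different sign convention for $\hat\gamma_k$ would change the checks. The bitmap representation and helper names are illustrative.

```python
import random

METRIC = (1, -1, -1, -1)  # gamma_0^2 = +1, gamma_k^2 = -1, Eq. (2.12)


def gp(x, y):
    """Geometric product of multivectors stored as {blade bitmap: coefficient}."""
    out = {}
    for a, ca in x.items():
        for b, cb in y.items():
            t, swaps = a >> 1, 0          # reordering sign
            while t:
                swaps += bin(t & b).count("1")
                t >>= 1
            s = -1.0 if swaps & 1 else 1.0
            for k, sq in enumerate(METRIC):   # squares of shared basis vectors
                if (a & b) >> k & 1:
                    s *= sq
            out[a ^ b] = out.get(a ^ b, 0.0) + s * ca * cb
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def add(*ms):
    out = {}
    for m in ms:
        for k, v in m.items():
            out[k] = out.get(k, 0.0) + v
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def scale(m, c):
    return {k: c * v for k, v in m.items()}


gamma = [{1 << mu: 1.0} for mu in range(4)]
g0 = gamma[0]
I = gp(gp(gamma[0], gamma[1]), gp(gamma[2], gamma[3]))
sigma = [gp(gamma[k], g0) for k in (1, 2, 3)]
i_sigma = [gp(I, s) for s in sigma]


def pauli_even(a):
    """a0 + a^k i sigma_k, as in the map (3.3)."""
    return add({0: a[0]}, *(scale(b, c) for b, c in zip(i_sigma, a[1:])))


def to_column(psi):
    """Read off the column (3.27) from psi = phi + eta sigma_3, Eq. (3.29)."""
    g0psig0 = gp(gp(g0, psi), g0)
    phi = scale(add(psi, g0psig0), 0.5)                             # Pauli-even part
    eta = gp(scale(add(psi, scale(g0psig0, -1)), 0.5), sigma[2])    # strip the sigma_3
    col = []
    for part in (phi, eta):
        a = [part.get(0, 0.0)] + [-gp(part, b).get(0, 0.0) for b in i_sigma]
        col += [complex(a[0], a[3]), complex(-a[2], a[1])]
    return col


P = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
ID, ZERO = [[1, 0], [0, 1]], [[0, 0], [0, 0]]


def neg(M):
    return [[-x for x in row] for row in M]


def block(A, B, C, D):
    """4x4 matrix with 2x2 blocks [[A, B], [C, D]]."""
    return [A[0] + B[0], A[1] + B[1], C[0] + D[0], C[1] + D[1]]


GAMMA_HAT = [block(ID, ZERO, ZERO, neg(ID))] + [block(ZERO, neg(s), s, ZERO) for s in P]
GAMMA5_HAT = block(ZERO, ID, ID, ZERO)


def apply(M, col):
    return [sum(M[r][c] * col[c] for c in range(4)) for r in range(4)]


def same(u, v, tol=1e-9):
    return all(abs(x - y) < tol for x, y in zip(u, v))


random.seed(6)
psi = add(pauli_even([random.uniform(-1, 1) for _ in range(4)]),
          gp(pauli_even([random.uniform(-1, 1) for _ in range(4)]), sigma[2]))
col = to_column(psi)

for mu in range(4):   # Eq. (3.28): gamma_hat_mu |psi>  <->  gamma_mu psi gamma_0
    assert same(apply(GAMMA_HAT[mu], col), to_column(gp(gp(gamma[mu], psi), g0)))
assert same(apply(GAMMA5_HAT, col), to_column(gp(psi, sigma[2])))
assert same([1j * c for c in col], to_column(gp(psi, i_sigma[2])))
# Eq. (3.41): complex conjugation in the Dirac-Pauli representation
assert same([c.conjugate() for c in col], to_column(scale(gp(gp(gamma[2], psi), gamma[2]), -1)))
```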
1. Alternative Representations
All algebraic manipulations can be performed in the STA without ever introducing a matrix representation, so Eqs. (3.27) and (3.28) achieve a map to a representation-free language. However, the explicit map (3.27) between the components of a Dirac spinor and the multivector $\psi$ is relevant only to the Dirac-Pauli matrix representation. A different matrix representation requires a different map, so that the effect of the matrix operators is still given by (3.28). The relevant map is easy to construct given the unitary matrix $\hat S$ which transforms between the matrix representations via
$$\hat\gamma'_\mu = \hat S\hat\gamma_\mu\hat S^\dagger. \tag{3.31}$$
The corresponding spinor transformation is $|\psi\rangle \mapsto \hat S|\psi\rangle$, and the map is constructed by transforming the column spinor $|\psi\rangle'$ in the new representation back to a Dirac-Pauli spinor $\hat S^\dagger|\psi\rangle'$. The spinor $\hat S^\dagger|\psi\rangle'$ is then mapped into the STA in the usual way (3.27). As an example, consider the Weyl representation defined by the matrices [36]
$$\hat\gamma'_0 = \begin{pmatrix}0 & -I\\ -I & 0\end{pmatrix}, \qquad \hat\gamma'_k = \begin{pmatrix}0 & -\hat\sigma_k\\ \hat\sigma_k & 0\end{pmatrix}, \qquad \hat\gamma'_5 = \begin{pmatrix}I & 0\\ 0 & -I\end{pmatrix}. \tag{3.32}$$
The Weyl representation is obtained from the Dirac-Pauli representation by the unitary matrix,
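$$\hat U = \frac{1}{\sqrt{2}}\begin{pmatrix}I & I\\ -I & I\end{pmatrix}. \tag{3.33}$$
A spinor in the Weyl representation is written as
$$|\psi\rangle' = \begin{pmatrix}|\chi\rangle\\ |\bar\eta\rangle\end{pmatrix}, \tag{3.34}$$
where $|\chi\rangle$ and $|\bar\eta\rangle$ are 2-component spinors. Acting on $|\psi\rangle'$ with $\hat U^\dagger$ gives
$$\hat U^\dagger|\psi\rangle' = \frac{1}{\sqrt{2}}\begin{pmatrix}|\chi\rangle - |\bar\eta\rangle\\ |\chi\rangle + |\bar\eta\rangle\end{pmatrix}. \tag{3.35}$$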
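Using Eq. (3.27), this spinor is mapped onto the even element
$$\psi = \frac{1}{\sqrt{2}}(\chi - \bar\eta) + \frac{1}{\sqrt{2}}(\chi + \bar\eta)\sigma_3, \tag{3.36}$$
where $\chi$ and $\bar\eta$ are the Pauli-even equivalents of the 2-component complex spinors $|\chi\rangle$ and $|\bar\eta\rangle$, as defined by Eq. (3.3). The even multivector
$$\psi = \chi\frac{1 + \sigma_3}{\sqrt{2}} - \bar\eta\frac{1 - \sigma_3}{\sqrt{2}} \tag{3.37}$$
is therefore our STA version of the column spinor
$$|\psi\rangle' = \begin{pmatrix}|\chi\rangle\\ |\bar\eta\rangle\end{pmatrix}, \tag{3.38}$$
where $|\psi\rangle'$ is acted on by matrices in the Weyl representation. As a check, we observe that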
and
(3.40)
(Here we have used Eq. (3.7) and the fact that $\gamma_0$ commutes with all Pauli-even elements.) The map (3.36) does indeed have the required properties. While our procedure ensures that the action of the $\{\hat\gamma_\mu, \hat\gamma_5\}$ matrix operators is always given by (3.28), the same is not true of the operation of complex conjugation. Complex conjugation is a representation-dependent
operation, so the STA versions can be different for different representations. For example, complex conjugation in the Dirac-Pauli and Weyl representations is given by
$$|\psi\rangle^* \leftrightarrow -\gamma_2\psi\gamma_2, \tag{3.41}$$
whereas in the Majorana representation complex conjugation leads to the STA operation [4]
$$|\psi\rangle^*_{\mathrm{Maj}} \leftrightarrow \psi\sigma_2. \tag{3.42}$$
Rather than think of (3.41) and (3.42) as different representations of the same operation, however, it is simpler to view them as distinct STA operations that can be performed on the multivector $\psi$.
C. The Dirac Equation and Observables

As a simple application of (3.27) and (3.28), consider the Dirac equation
$$\hat\gamma^\mu(j\partial_\mu - eA_\mu)|\psi\rangle = m|\psi\rangle. \tag{3.43}$$
The STA version of this equation is, after postmultiplication by $\gamma_0$,
$$\nabla\psi i\sigma_3 - eA\psi = m\psi\gamma_0, \tag{3.44}$$
where $\nabla = \gamma^\mu\partial_\mu$ is the spacetime vector derivative (2.41). The STA form of the Dirac equation (3.44) was first discovered by Hestenes [8], and has been discussed by many authors since; see, for example, [35, 37-40]. The translation scheme described here is direct and unambiguous, and the resulting equation is both coordinate-free and representation-free. In manipulating Eq. (3.44) one needs only the algebraic rules for multiplying spacetime multivectors, and the equation can be solved completely without ever introducing a matrix representation. Stripped of the dependence on a matrix representation, Eq. (3.44) expresses the intrinsic geometric content of the Dirac equation. In order to discuss the observables of the Dirac theory, we must first consider the spinor inner product. It is necessary at this point to distinguish between the Hermitian and Dirac adjoints. These are written as
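$$\langle\bar\psi| \;\text{(Dirac adjoint)}, \qquad \langle\psi| \;\text{(Hermitian adjoint)}, \tag{3.45}$$
which are represented in the STA as
$$\langle\bar\psi| \leftrightarrow \tilde\psi, \qquad \langle\psi| \leftrightarrow \psi^\dagger = \gamma_0\tilde\psi\gamma_0. \tag{3.46}$$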
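One can see clearly from these definitions that the Dirac adjoint is Lorentz-invariant, whereas the Hermitian adjoint requires singling out a preferred timelike vector. The inner product is handled as in Eq. (3.12), so that
$$\langle\bar\psi|\phi\rangle \leftrightarrow \langle\tilde\psi\phi\rangle - \langle\tilde\psi\phi i\sigma_3\rangle i\sigma_3, \tag{3.47}$$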
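which is also easily verified by direct calculation. By utilizing (3.47), the STA forms of the Dirac bilinear covariants [36] are readily found. For example,
$$\langle\bar\psi|\hat\gamma_\mu\psi\rangle \leftrightarrow \langle\tilde\psi\gamma_\mu\psi\gamma_0\rangle - \langle\tilde\psi\gamma_\mu\psi i\gamma_3\rangle i\sigma_3 = \gamma_\mu\cdot\langle\psi\gamma_0\tilde\psi\rangle_1 \tag{3.48}$$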
identifies the “observable” as the $\gamma_\mu$ component of the vector $\langle\psi\gamma_0\tilde\psi\rangle_1$. Since the quantity $\psi\gamma_0\tilde\psi$ is odd and reverse-symmetric it can only contain a vector part, so we can define the frame-free vector $J$ by
$$J \equiv \psi\gamma_0\tilde\psi. \tag{3.49}$$
The spinor $\psi$ has a Lorentz-invariant decomposition which generalizes the decomposition of Pauli spinors into a rotation and a density factor (3.20). Since $\psi\tilde\psi$ is even and reverses to give itself, it contains only scalar and pseudoscalar terms. We can therefore define
$$\rho e^{i\beta} \equiv \psi\tilde\psi, \tag{3.50}$$
where both $\rho$ and $\beta$ are scalars. Assuming that $\rho \neq 0$, $\psi$ can now be written as
$$\psi = \rho^{1/2}e^{i\beta/2}R, \tag{3.51}$$
where
$$R = (\rho e^{i\beta})^{-1/2}\psi. \tag{3.52}$$
The even multivector $R$ satisfies $R\tilde R = 1$ and therefore defines a spacetime rotor. The current $J$ (3.49) can now be written as
$$J = \rho v, \tag{3.53}$$
where
$$v \equiv R\gamma_0\tilde R. \tag{3.54}$$
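The decomposition (3.50)-(3.54) can be illustrated for a generic Dirac spinor, here eight random coefficients on the even blades of the STA (an illustrative choice). The sketch checks that $\psi\tilde\psi$ has only scalar and pseudoscalar parts, that $J = \psi\gamma_0\tilde\psi$ is a future-pointing timelike vector with $J^2 = \rho^2$, and that removing the factor $\rho^{1/2}e^{i\beta/2}$ leaves a rotor $R$ with $J = \rho R\gamma_0\tilde R$. The bitmap representation (bit $\mu$ for $\gamma_\mu$, so blade 15 is $i$) is an assumption of this sketch, not the authors' method.

```python
import math
import random

METRIC = (1, -1, -1, -1)  # gamma_0^2 = +1, gamma_k^2 = -1, Eq. (2.12)


def gp(x, y):
    """Geometric product of multivectors stored as {blade bitmap: coefficient}."""
    out = {}
    for a, ca in x.items():
        for b, cb in y.items():
            # sign from reordering the basis vectors of a and b into canonical order
            t, swaps = a >> 1, 0
            while t:
                swaps += bin(t & b).count("1")
                t >>= 1
            s = -1.0 if swaps & 1 else 1.0
            # every basis vector shared by a and b contributes its square
            for k, sq in enumerate(METRIC):
                if (a & b) >> k & 1:
                    s *= sq
            out[a ^ b] = out.get(a ^ b, 0.0) + s * ca * cb
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def add(*ms):
    out = {}
    for m in ms:
        for k, v in m.items():
            out[k] = out.get(k, 0.0) + v
    return {k: v for k, v in out.items() if abs(v) > 1e-12}


def scale(m, c):
    return {k: c * v for k, v in m.items()}


def reverse(m):
    """Reversion: grades 2 and 3 (mod 4) change sign."""
    return {k: -v if bin(k).count("1") % 4 in (2, 3) else v for k, v in m.items()}


def close(x, y, tol=1e-9):
    return all(abs(v) < tol for v in add(x, scale(y, -1)).values())


g0 = {1: 1.0}                                        # gamma_0
PSEUDO = 15                                          # bitmap of i = gamma_0 gamma_1 gamma_2 gamma_3

random.seed(7)
even_blades = [k for k in range(16) if bin(k).count("1") % 2 == 0]
psi = {k: random.uniform(-1, 1) for k in even_blades}    # generic Dirac spinor: 8 reals

psi_psirev = gp(psi, reverse(psi))
assert set(psi_psirev).issubset({0, PSEUDO})             # scalar + pseudoscalar only
scal, pseudo = psi_psirev.get(0, 0.0), psi_psirev.get(PSEUDO, 0.0)
rho, beta = math.hypot(scal, pseudo), math.atan2(pseudo, scal)   # Eq. (3.50)

J = gp(gp(psi, g0), reverse(psi))                        # Eq. (3.49)
assert all(bin(k).count("1") == 1 for k in J)            # J is a pure vector
assert abs(gp(J, J)[0] - rho ** 2) < 1e-9                # J^2 = rho^2 > 0: timelike
assert J[1] > 0                                          # J . gamma_0 > 0: future-pointing

# Eq. (3.52): R = rho^(-1/2) exp(-i beta / 2) psi
phase = {0: math.cos(beta / 2), PSEUDO: -math.sin(beta / 2)}
R = scale(gp(phase, psi), 1 / math.sqrt(rho))
assert close(gp(R, reverse(R)), {0: 1.0})                # R is a rotor
v = gp(gp(R, g0), reverse(R))                            # Eq. (3.54)
assert close(J, scale(v, rho))                           # Eq. (3.53)
```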
The remaining bilinear covariants can be analyzed likewise, and the results are summarized in Table III. The final column of this table employs the quantities
$$s \equiv \psi\gamma_3\tilde\psi \qquad\text{and}\qquad S \equiv \psi i\sigma_3\tilde\psi. \tag{3.55}$$
TABLE III
BILINEAR COVARIANTS IN THE DIRAC THEORY

Bilinear covariant | Standard form | STA equivalent | Frame-free form
Double-sided application of $R$ on a vector $a$ produces a Lorentz transformation [3]. The full Dirac spinor $\psi$ therefore contains an instruction to rotate the fixed $\{\gamma_\mu\}$ frame into the frame of observables. The analogy with rigid-body dynamics first encountered in Section III.A with Pauli spinors therefore extends to the relativistic theory. In particular, the unit vector $v$ (3.54) is both future-pointing and timelike and has been interpreted as defining an electron velocity [9, 23] (see also the critical discussion in [6]). The “$\beta$ factor” appearing in the decomposition of $\psi$ (3.50) has also been the subject of much discussion [6, 9, 38, 39] since, for free-particle states, $\beta$ determines the ratio of particle to antiparticle solutions. It remains unclear whether this idea extends usefully to interacting systems. In Section III.A we argued that, for Pauli spinors, any dynamical consequences derived from the algebraic properties of the Pauli matrices were questionable, since the algebra of the Pauli matrices merely expresses the geometrical relations between a set of orthonormal vectors in space. Precisely the same is true of any consequences inferred from the properties of the Dirac matrices. This observation has the happy consequence of removing one particularly prevalent piece of nonsense: that the observed velocity of an electron must be the speed of light [41, 42]. The “proof” of this result is based on the idea that the velocity operator in the $k$ direction is the $\hat\gamma_k\hat\gamma_0$ matrix. Since the square of this matrix is 1, its eigenvalues must be $\pm 1$. But in the STA the fact that the square of the $\{\hat\gamma_\mu\}$ matrices is $\pm 1$ merely expresses the fact that they form an orthonormal basis. This cannot possibly have any observational consequences. To the extent that one can talk about a velocity in the Dirac theory, the relevant observable must be defined in terms of the current $J$. The $\{\gamma_\mu\}$ vectors play no other role than to pick out the components of this current in a particular frame.
The shift from viewing the $\{\gamma_\mu\}$ as operators to viewing them as an arbitrary, fixed frame is seen clearly in the definition of the current $J$ (3.49). In this
expression it is now the 4J that “operates” to align the yo vector with the observable current. Since 4J transforms single-sidedly under rotations, the fixed initial yo vector is never affected by the rotor and its presence does not violate Lorentz invariance 141. We end this subsection by briefly listing how the C,P , and T symmetries are handled in the STA. Following the conventions of Bjorken and Drell [34], we find that (3.56)
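$$\hat{P}|\psi\rangle \leftrightarrow \gamma_0\psi(\bar{x})\gamma_0, \qquad \hat{C}|\psi\rangle \leftrightarrow \psi\sigma_1, \qquad \hat{T}|\psi\rangle \leftrightarrow i\gamma_0\psi(-\bar{x})\gamma_1, \qquad (3.56)$$

where $\bar{x} = \gamma_0 x\gamma_0$ is (minus) a reflection of $x$ in the timelike $\gamma_0$ axis. The combined CPT symmetry corresponds to

$$\psi(x) \mapsto -i\psi(-x), \qquad (3.57)$$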
so that CPT symmetry does not require singling out a preferred timelike vector. A more complete discussion of the symmetries and conserved quantities of the Dirac theory from the STA viewpoint is given in [5]. There the "multivector derivative" was advocated as a valuable tool for extracting conserved quantities from Lagrangians.

1. Plane-Wave States
In most applications of the Dirac theory, the external fields applied to the electron define a rest frame, which is taken to be the $\gamma_0$ frame. The rotor $R$ then decomposes relative to the $\gamma_0$ vector into a boost $L$ and a rotation $\Phi$,

$$R = L\Phi, \qquad (3.58)$$

where

$$L^\dagger = L, \qquad (3.59)$$

$$\Phi^\dagger = \tilde{\Phi}, \qquad (3.60)$$

and $L\tilde{L} = \Phi\tilde{\Phi} = 1$. A positive-energy plane-wave state is defined by

$$\psi = \psi_0 e^{-i\sigma_3 p\cdot x}, \qquad (3.61)$$

where $\psi_0$ is a constant spinor. From the Dirac equation (3.44) with $A = 0$, it follows that $\psi_0$ satisfies

$$p\psi_0 = m\psi_0\gamma_0. \qquad (3.62)$$
CHRIS DORAN ET AL.
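Postmultiplying by $\tilde{\psi}_0$ and writing $\psi_0\tilde{\psi}_0 = \rho e^{i\beta}$ as in (3.50), we see that

$$p\rho e^{i\beta} = mJ, \qquad (3.63)$$

from which it follows that $\exp(i\beta) = \pm 1$, since both $p$ and $J$ are vectors. Since $p$ has positive energy, we must take the positive solution ($\beta = 0$). It follows that $\psi_0$ is just a rotor with a normalization constant. The boost $L$ determines the momentum by

$$p = mL\gamma_0\tilde{L} = mL^2\gamma_0, \qquad (3.64)$$

which is solved by

$$L = \sqrt{p\gamma_0/m} = \frac{E + m + p\gamma_0}{\sqrt{2m(E + m)}}, \qquad (3.65)$$

where

$$p\gamma_0 = E + \boldsymbol{p}. \qquad (3.66)$$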
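The boost (3.65) can be checked numerically. In the sketch below, relative vectors $\sigma_k = \gamma_k\gamma_0$ are represented by Pauli matrices, so that $p\gamma_0 = E + \boldsymbol{p}$ becomes the $2\times 2$ matrix $E\,\mathbf{1} + p_k\sigma_k$. This representation, the helper names, and the numerical values of $m$ and $\boldsymbol{p}$ are assumptions of the illustration, not the authors' code. The check confirms that $mL^2$ reproduces $p\gamma_0$, which is (3.64) multiplied on the right by $\gamma_0$. It also confirms that $L\tilde{L} = 1$.

```python
import numpy as np

# Pauli matrices represent the relative vectors sigma_k = gamma_k gamma_0
SIGMA = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
I2 = np.eye(2, dtype=complex)

def vec(p):
    """Map a 3-vector p to the matrix p_k sigma_k."""
    return sum(pk * sk for pk, sk in zip(p, SIGMA))

def boost(p, m):
    """L = (E + m + p gamma_0) / sqrt(2m(E + m)), with p gamma_0 = E + bold p."""
    E = np.sqrt(m**2 + np.dot(p, p))
    return ((E + m) * I2 + vec(p)) / np.sqrt(2 * m * (E + m)), E

m = 1.0
p = np.array([0.3, -1.2, 0.7])
L, E = boost(p, m)

lhs = E * I2 + vec(p)     # p gamma_0
rhs = m * L @ L           # m L^2
# For scalar + relative-vector elements the STA reverse corresponds to the
# 2x2 adjugate, so L L~ = det(L) times the identity.
det_L = np.linalg.det(L)
```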
The Pauli rotor $\Phi$ determines the "comoving" spin bivector $\Phi i\sigma_3\tilde{\Phi}$. This is boosted by $L$ to give the spin $S$ as seen in the laboratory frame. As $\Phi\sigma_3\tilde{\Phi}$ gives the relative spin in the rest frame of the particle, we refer to this as the "rest spin." In Section VI.C we show that the rest spin is equivalent to the "polarization" vector defined in the traditional matrix formulation. Negative-energy solutions are constructed in a similar manner, but with an additional factor of $i$ or $\sigma_3$ on the right (the choice of which to use is simply a choice of phase). The usual positive- and negative-energy basis states employed in scattering theory follow the conventions of Itzykson and Zuber [36, Section 2-2]:

Positive energy:

$$\psi^{(+)}(x) = u_r(p)e^{-i\sigma_3 p\cdot x}, \qquad (3.67)$$

Negative energy:

$$\psi^{(-)}(x) = v_r(p)e^{i\sigma_3 p\cdot x}, \qquad (3.68)$$

with

$$u_r(p) = L(p)\chi_r \qquad (3.69)$$

and

$$v_r(p) = L(p)\chi_r\sigma_3. \qquad (3.70)$$

Here $L(p)$ is given by Eq. (3.65) and $\chi_r = \{1, -i\sigma_2\}$ are spin basis states. The decomposition into a boost and a rotor turns out to be very useful in scattering theory, as is demonstrated in Section V. The main results for Dirac operators and spinors are summarized in Table IV.
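A short check of the spin basis states $\chi_r$ follows. It uses the same Pauli-matrix representation as above, with the bivector $i\sigma_2$ represented as `1j` times the matrix $\sigma_2$, which is an assumption of this sketch. The rest spin $\Phi\sigma_3\tilde{\Phi}$ comes out as $+\sigma_3$ for $\chi = 1$ and $-\sigma_3$ for $\chi = -i\sigma_2$.

```python
import numpy as np

s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Spin basis chi_r = {1, -i sigma_2}; the bivector i sigma_2 is represented as 1j * s2
chi = {"up": I2, "down": -1j * s2}

def rest_spin(phi):
    """Phi sigma_3 Phi~; for Pauli rotors the reverse is the Hermitian adjoint here."""
    return phi @ s3 @ phi.conj().T

spins = {name: rest_spin(phi) for name, phi in chi.items()}
```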
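TABLE IV
SUMMARY OF THE MAIN RESULTS FOR THE STA REPRESENTATION OF DIRAC SPINORS^a

Spinor equivalence:
$$|\psi\rangle = \begin{pmatrix} a^0 + ja^3 \\ -a^2 + ja^1 \\ b^0 + jb^3 \\ -b^2 + jb^1 \end{pmatrix} \leftrightarrow \psi = a^0 + a^k i\sigma_k + (b^0 + b^k i\sigma_k)\sigma_3$$

Operator equivalences:
$$\hat{\gamma}_\mu|\psi\rangle \leftrightarrow \gamma_\mu\psi\gamma_0, \qquad j|\psi\rangle \leftrightarrow \psi i\sigma_3$$

Dirac equation:
$$\nabla\psi i\sigma_3 - eA\psi = m\psi\gamma_0$$

^a The matrices and spinor equivalence are for the Dirac-Pauli representation. The spinor equivalences for other representations are constructed via the method outlined in the text.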
IV. OPERATORS, MONOGENICS, AND THE HYDROGEN ATOM
So far, we have seen how the STA enables us to formulate the Dirac equation entirely in the real geometric algebra of spacetime. In so doing, one might worry that contact has been lost with traditional, operator-based techniques, but in fact this is not the case. Operator techniques are easily handled within the STA, and the use of a coordinate-free language greatly simplifies manipulations. The STA furthermore provides a sharper distinction between the roles of scalar and vector operators. This section begins by constructing a Hamiltonian form of the Dirac equation. The standard split into even and odd operators then enables a smooth transition to the nonrelativistic Pauli theory. We next study central fields and construct angular momentum operators that commute with the Hamiltonian. These lead naturally to the construction of the spherical monogenics, a basis set of orthogonal eigenfunctions of the angular momentum operators. We finally apply these techniques to two problems: the hydrogen atom, and the Dirac "oscillator."

A. Hamiltonian Form and the Nonrelativistic Reduction
The problem of how best to formulate operator techniques within the STA is really little more than a question of finding a good notation. We could, of course, borrow the traditional Dirac "bra-ket" notation, but we have already seen that the bilinear covariants are better handled without it. It is easier instead just to juxtapose the operator and the wavefunction on which it acts. But we saw in Section III that the STA operators often act double-sidedly on the spinor $\psi$. This is not a problem, as the only permitted right-sided operations are multiplication by $\gamma_0$ or $i\sigma_3$, and these operations commute. Our notation can therefore safely suppress these right-sided multiplications and lump all operations on the left. The overhat notation is useful to achieve this, and we define

$$\hat{\gamma}_\mu\psi \equiv \gamma_\mu\psi\gamma_0. \qquad (4.1)$$
It should be borne in mind that all operations are now defined in the STA, so the $\hat{\gamma}_\mu$ are not intended to be matrix operators, as they were in Section III.B. It is also useful to have a symbol for the operation of right-sided multiplication by $i\sigma_3$. The symbol $j$ carries the correct connotations of an operator that commutes with all others and squares to $-1$, and we define

$$j\psi \equiv \psi i\sigma_3. \qquad (4.2)$$
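The Dirac equation (3.44) can now be written in the "operator" form

$$j\hat{\nabla}\psi - e\hat{A}\psi = m\psi, \qquad (4.3)$$

where

$$\hat{\nabla}\psi = \nabla\psi\gamma_0 \quad\text{and}\quad \hat{A}\psi = A\psi\gamma_0. \qquad (4.4)$$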
Writing the Dirac equation in the form (4.3) does not add anything new, but does confirm that we have an efficient notation for handling operators in the STA. In many applications we require a Hamiltonian form of the Dirac equation. To express the Dirac equation (3.44) in Hamiltonian form, we simply multiply from the left by $\gamma_0$. The resulting equation, with the dimensional constants temporarily put back in, is

$$j\hbar\partial_t\psi = c\hat{\boldsymbol{p}}\psi + eV\psi - ce\boldsymbol{A}\psi + mc^2\bar{\psi}, \qquad (4.5)$$

where

$$\hat{\boldsymbol{p}}\psi = -j\hbar\boldsymbol{\nabla}\psi, \qquad (4.6)$$

$$\bar{\psi} = \gamma_0\psi\gamma_0, \qquad (4.7)$$

and

$$\gamma_0 A = V - c\boldsymbol{A}. \qquad (4.8)$$

Choosing a Hamiltonian is a noncovariant operation, since it picks out a preferred timelike direction. The Hamiltonian relative to the $\gamma_0$ direction is the operator on the right-hand side of Eq. (4.5). We write this operator with the symbol $\mathcal{H}$.

1. The Pauli Equation
As a first application, we consider the nonrelativistic reduction of the Dirac equation. In most modern texts, the nonrelativistic approximation is carried out via the Foldy-Wouthuysen transformation [34, 36]. While the theoretical motivation for this transformation is clear, it has the defect that the wavefunction is transformed by a unitary operator which is very hard to calculate in all but the simplest cases. A simpler approach, dating back to Feynman [41], is to separate out the fast-oscillating component of the waves and then split into separate equations for the Pauli-even and Pauli-odd components of $\psi$. Thus, we write (with $\hbar = 1$ and the factors of $c$ kept in)

$$\psi = (\phi + \eta)e^{-jmc^2 t}, \qquad (4.9)$$

where $\bar{\phi} = \phi$ and $\bar{\eta} = -\eta$. The Dirac equation (4.5) now splits into the two equations

$$\mathcal{E}\phi - c\mathcal{O}\eta = 0 \qquad (4.10)$$

and

$$(\mathcal{E} + 2mc^2)\eta - c\mathcal{O}\phi = 0, \qquad (4.11)$$
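where

$$\mathcal{E}\phi \equiv (j\partial_t - eV)\phi \qquad (4.12)$$

and

$$\mathcal{O}\phi \equiv (\hat{\boldsymbol{p}} - e\boldsymbol{A})\phi. \qquad (4.13)$$

The formal solution to the second equation (4.11) is

$$\eta = \frac{1}{2mc}\left(1 + \frac{\mathcal{E}}{2mc^2}\right)^{-1}\mathcal{O}\phi, \qquad (4.14)$$

where the inverse on the right-hand side is understood to denote a power series. The power series is well defined in the nonrelativistic limit, as the $\mathcal{E}$ operator is of the order of the nonrelativistic energy. The remaining equation for $\phi$ is

$$\mathcal{E}\phi - \frac{1}{2m}\mathcal{O}\left(1 - \frac{\mathcal{E}}{2mc^2} + \cdots\right)\mathcal{O}\phi = 0, \qquad (4.15)$$

which can be expanded out to the desired order of magnitude. There is little point in going beyond the first relativistic correction, so we approximate (4.15) by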
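$$\mathcal{E}\phi = \frac{\mathcal{O}^2}{2m}\phi - \frac{\mathcal{O}\mathcal{E}\mathcal{O}}{4m^2c^2}\phi. \qquad (4.16)$$

We seek an equation of the form $\mathcal{E}\phi = \mathcal{H}\phi$, where $\mathcal{H}$ is the nonrelativistic Hamiltonian. We therefore need to replace the $\mathcal{O}\mathcal{E}\mathcal{O}$ term in Eq. (4.16) by a term that does not involve $\mathcal{E}$. To do so we would like to utilize the approximate result that

$$\mathcal{E}\phi \approx \frac{\mathcal{O}^2}{2m}\phi, \qquad (4.17)$$

but we cannot use this result directly in the $\mathcal{O}\mathcal{E}\mathcal{O}$ term, since the $\mathcal{E}$ does not operate directly on $\phi$. Instead we employ the operator rearrangement

$$2\mathcal{O}\mathcal{E}\mathcal{O} = [\mathcal{O}, [\mathcal{E}, \mathcal{O}]] + \mathcal{E}\mathcal{O}^2 + \mathcal{O}^2\mathcal{E} \qquad (4.18)$$

to write Eq. (4.16) in the form

$$\mathcal{E}\phi = \frac{\mathcal{O}^2}{2m}\phi - \frac{\mathcal{E}\mathcal{O}^2 + \mathcal{O}^2\mathcal{E}}{8m^2c^2}\phi - \frac{1}{8m^2c^2}[\mathcal{O}, [\mathcal{E}, \mathcal{O}]]\phi. \qquad (4.19)$$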
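Identity (4.18) holds for any pair of operators, so it can be sanity-checked with random matrices. These matrices are only an illustrative stand-in for $\mathcal{E}$ and $\mathcal{O}$, which in the text are differential operators:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# Random complex matrices standing in for the non-commuting operators E and O
E = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
O = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

def comm(a, b):
    """Commutator [a, b] = ab - ba."""
    return a @ b - b @ a

lhs = 2 * O @ E @ O
rhs = comm(O, comm(E, O)) + E @ O @ O + O @ O @ E
```

Expanding $[\mathcal{O}, [\mathcal{E}, \mathcal{O}]] = 2\mathcal{O}\mathcal{E}\mathcal{O} - \mathcal{O}^2\mathcal{E} - \mathcal{E}\mathcal{O}^2$ shows why no commutativity assumption is needed.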
(4.19)
We can now make use of (4.17)to write 802+ = 0284 ==
64 + O(c-2) 2m
(4.20)
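and so approximate (4.16) by

$$\mathcal{E}\phi = \left(\frac{\mathcal{O}^2}{2m} - \frac{\mathcal{O}^4}{8m^3c^2} - \frac{1}{8m^2c^2}[\mathcal{O}, [\mathcal{E}, \mathcal{O}]]\right)\phi, \qquad (4.21)$$

which is valid to order $c^{-2}$. The commutators are easily evaluated, for example,

$$[\mathcal{E}, \mathcal{O}] = -je(\partial_t\boldsymbol{A} + \boldsymbol{\nabla}V) = je\boldsymbol{E}. \qquad (4.22)$$

There are no time derivatives left in this commutator, so we do achieve a sensible nonrelativistic Hamiltonian. The full commutator required in Eq. (4.21) is

$$[\mathcal{O}, [\mathcal{E}, \mathcal{O}]] = [-j\boldsymbol{\nabla} - e\boldsymbol{A}, je\boldsymbol{E}] = e(\boldsymbol{\nabla}\boldsymbol{E}) - 2e\boldsymbol{E}\wedge\boldsymbol{\nabla} - 2je^2\boldsymbol{A}\wedge\boldsymbol{E}, \qquad (4.23)$$

in which the STA formulation ensures that we are manipulating spatial vectors, rather than performing abstract matrix manipulations. The various operators (4.12), (4.13), and (4.23) can now be fed into Eq. (4.21) to yield the STA form of the Pauli equation,

$$\partial_t\psi i\sigma_3 = \frac{1}{2m}(\hat{\boldsymbol{p}} - e\boldsymbol{A})^2\psi + eV\psi - \frac{\hat{\boldsymbol{p}}^4}{8m^3c^2}\psi - \frac{1}{8m^2c^2}\left[e(\boldsymbol{\nabla}\boldsymbol{E} - 2\boldsymbol{E}\wedge\boldsymbol{\nabla})\psi - 2e^2\boldsymbol{A}\wedge\boldsymbol{E}\psi i\sigma_3\right], \qquad (4.24)$$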
which is valid to $O(c^{-2})$. (We have assumed that $|\boldsymbol{A}| \sim c^{-1}$ to replace the $\mathcal{O}^4$ term by $\hat{\boldsymbol{p}}^4$.) Using the translation scheme of Table II, it is straightforward to check that Eq. (4.24) is the same as that found in standard texts [34]. In the standard approach, the geometric product in the $\boldsymbol{\nabla}\boldsymbol{E}$ term of (4.24) is split into a "spin-orbit" term $\boldsymbol{\nabla}\wedge\boldsymbol{E}$ and the "Darwin" term $\boldsymbol{\nabla}\cdot\boldsymbol{E}$. The STA approach reveals that these terms arise from a single source.

A similar approximation scheme can be adopted for the observables of the Dirac theory. For example, the current $\psi\gamma_0\tilde{\psi}$ has a three-vector part

$$\boldsymbol{J} = (\psi\gamma_0\tilde{\psi})\wedge\gamma_0 = \phi\eta^\dagger + \eta\phi^\dagger, \qquad (4.25)$$
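which is approximated to first order by

$$\boldsymbol{J} = -\frac{1}{m}\left(\langle\boldsymbol{\nabla}\psi i\sigma_3\psi^\dagger\rangle_v + e\boldsymbol{A}\psi\psi^\dagger\right). \qquad (4.26)$$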
Not all applications of the Pauli theory correctly identify (4.26) as the conserved current in the Pauli theory, an inconsistency first noted by Hestenes and Gurtler [13] (see also the discussion in [6]).
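The point that the spin-orbit and Darwin terms share one source can be illustrated in the simplest setting: the geometric product of two constant vectors $a$ and $b$. Here $a$ and $b$ are hypothetical stand-ins for $\boldsymbol{\nabla}$ and $\boldsymbol{E}$. In the Pauli-matrix representation assumed here, the product $ab$ splits into the scalar $a\cdot b$, which plays the role of the Darwin-type part. The remainder is the bivector $a\wedge b$, the spin-orbit-type part, which appears as `1j` times the matrix of $a\times b$:

```python
import numpy as np

SIGMA = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
I2 = np.eye(2, dtype=complex)

def vec(a):
    """Map a 3-vector a to the matrix a_k sigma_k."""
    return sum(ak * sk for ak, sk in zip(a, SIGMA))

a = np.array([1.0, 2.0, -0.5])     # stand-in for the derivative direction
b = np.array([0.3, -1.0, 2.0])     # stand-in for the field E
product = vec(a) @ vec(b)                      # geometric product ab
scalar_part = np.trace(product).real / 2       # a . b  (Darwin-type piece)
bivector_part = product - scalar_part * I2     # a ^ b = i(a x b)  (spin-orbit-type piece)
```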
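B. Angular Eigenstates and Monogenic Functions

Returning to the Hamiltonian of Eq. (4.5), let us now consider the problem of a central potential $V = V(r)$, $\boldsymbol{A} = 0$, where $r = |\boldsymbol{x}|$. We seek a set of angular momentum operators which commute with this Hamiltonian. Starting with the scalar operator $B\cdot(\boldsymbol{x}\wedge\boldsymbol{\nabla})$, where $B$ is a spatial bivector, we find that

$$[B\cdot(\boldsymbol{x}\wedge\boldsymbol{\nabla}), \mathcal{H}] = [B\cdot(\boldsymbol{x}\wedge\boldsymbol{\nabla}), -j\boldsymbol{\nabla}] = j\dot{\boldsymbol{\nabla}}B\cdot(\dot{\boldsymbol{x}}\wedge\boldsymbol{\nabla}) = -jB\cdot\boldsymbol{\nabla}. \qquad (4.27)$$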
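But, since $B\cdot\boldsymbol{\nabla} = [B, \boldsymbol{\nabla}]/2$ and $B$ commutes with the rest of $\mathcal{H}$, we can rearrange the commutator into

$$[B\cdot(\boldsymbol{x}\wedge\boldsymbol{\nabla}) - \tfrac{1}{2}B, \mathcal{H}] = 0, \qquad (4.28)$$

which gives us the required operator. Since $B\cdot(\boldsymbol{x}\wedge\boldsymbol{\nabla}) - \tfrac{1}{2}B$ is an anti-Hermitian operator, we define a set of Hermitian operators as

$$J_B \equiv j\left(B\cdot(\boldsymbol{x}\wedge\boldsymbol{\nabla}) - \tfrac{1}{2}B\right). \qquad (4.29)$$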
The extra term of $\tfrac{1}{2}B$ is the term that is conventionally viewed as defining "spin 1/2." However, the geometric algebra derivation shows that the result rests solely on the commutation properties of the $B\cdot(\boldsymbol{x}\wedge\boldsymbol{\nabla})$ and $\boldsymbol{\nabla}$ operators. Furthermore, the factor of one-half required in the $J_B$ operators would be present in a space of any dimension. It follows that the factor of one-half in (4.29) cannot have anything to do with representations of the 3-D rotation group. From the STA point of view, $J_B$ is an operator-valued function of the bivector $B$. In conventional quantum theory, however, we would view the angular momentum operator as a vector with components

$$\hat{J}_i = \hat{L}_i + \tfrac{1}{2}\hat{\Sigma}_i, \qquad (4.30)$$

where $\hat{\Sigma}_i = (j/2)\epsilon_{ijk}\hat{\gamma}_j\hat{\gamma}_k$. The standard notation takes what should be viewed as the sum of a scalar operator and a bivector, and forces it to look like the sum of two vector operators! As well as being conceptually clearer, the STA approach is easier to compute with. For example, it is a simple matter to establish the commutation relation

$$[J_{B_1}, J_{B_2}] = -jJ_{B_1\times B_2}, \qquad (4.31)$$
which forms the starting point for the representation theory of the angular momentum operators.
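The commutator product $B_1\times B_2 = (B_1B_2 - B_2B_1)/2$ appearing in (4.31) maps bivectors to bivectors. In this sketch the bivectors $i\sigma_k$ are represented as `1j` times the Pauli matrices, which is an assumption of the representation. One then finds, for example, $(i\sigma_1)\times(i\sigma_2) = -i\sigma_3$. More generally, $B(a)\times B(b) = -B(a\times b)$ for $B(a) = i\,a_k\sigma_k$:

```python
import numpy as np

SIGMA = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def bivector(a):
    """Bivector i(a_k sigma_k) in the 2x2 representation (pseudoscalar i -> 1j)."""
    return 1j * sum(ak * sk for ak, sk in zip(a, SIGMA))

def commutator_product(X, Y):
    """B1 x B2 = (B1 B2 - B2 B1) / 2."""
    return (X @ Y - Y @ X) / 2

B12 = commutator_product(bivector([1, 0, 0]), bivector([0, 1, 0]))
a = np.array([0.2, -1.1, 0.4])
b = np.array([1.5, 0.3, -0.7])
general = commutator_product(bivector(a), bivector(b))
```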
1. The Spherical Monogenics
The key ingredients in the solution of the Dirac equation for problems with radial symmetry are the spherical monogenics. These are Pauli spinors [even elements of the Pauli algebra (2.28)] which satisfy the eigenvalue equation

$$-\boldsymbol{x}\wedge\boldsymbol{\nabla}\psi = l\psi. \qquad (4.32)$$

Such functions are called spherical monogenics because they are obtained from the "monogenic equation"

$$\boldsymbol{\nabla}\Psi = 0 \qquad (4.33)$$

by separating $\Psi$ into $r^l\psi(\theta, \phi)$. Equation (4.33) generalizes the concept of an analytic function to higher dimensions [7, 12]. To analyze the properties of Eq. (4.32) we first note that

$$[J_B, \boldsymbol{x}\wedge\boldsymbol{\nabla}] = 0, \qquad (4.34)$$
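which is proved in the same manner as Eq. (4.28). It follows that $\psi$ can simultaneously be an eigenstate of the $\boldsymbol{x}\wedge\boldsymbol{\nabla}$ operator and one of the $J_B$ operators. To simplify the notation we now define

$$J_k\psi \equiv J_{i\sigma_k}\psi = \left(i\sigma_k\cdot(\boldsymbol{x}\wedge\boldsymbol{\nabla}) - \tfrac{1}{2}i\sigma_k\right)\psi i\sigma_3. \qquad (4.35)$$

We choose $\psi$ to be an eigenstate of $J_3$, and provisionally write

$$-\boldsymbol{x}\wedge\boldsymbol{\nabla}\psi = l\psi \quad\text{and}\quad J_3\psi = \mu\psi. \qquad (4.36)$$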
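Before proceeding, we must introduce some notation for a spherical-polar coordinate system. We define the $\{r, \theta, \phi\}$ coordinates via

$$r = \sqrt{\boldsymbol{x}^2}, \qquad \cos\theta = \sigma_3\cdot\boldsymbol{x}/r, \qquad \tan\phi = \sigma_2\cdot\boldsymbol{x}/\sigma_1\cdot\boldsymbol{x}. \qquad (4.37)$$

The associated coordinate frame is

$$\begin{aligned} e_r &= \sin\theta(\cos\phi\,\sigma_1 + \sin\phi\,\sigma_2) + \cos\theta\,\sigma_3, \\ e_\theta &= r\cos\theta(\cos\phi\,\sigma_1 + \sin\phi\,\sigma_2) - r\sin\theta\,\sigma_3, \\ e_\phi &= r\sin\theta(-\sin\phi\,\sigma_1 + \cos\phi\,\sigma_2). \end{aligned} \qquad (4.38)$$

From these we define the orthonormal vectors $\{\sigma_r, \sigma_\theta, \sigma_\phi\}$ by

$$\sigma_r = e_r, \qquad \sigma_\theta = e_\theta/r, \qquad \sigma_\phi = e_\phi/(r\sin\theta). \qquad (4.39)$$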
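The $\{\sigma_r, \sigma_\theta, \sigma_\phi\}$ form a right-handed set, since

$$\sigma_r\sigma_\theta\sigma_\phi = i. \qquad (4.40)$$

The vector $\sigma_r$ satisfies

$$\boldsymbol{x}\wedge\boldsymbol{\nabla}\sigma_r = 2\sigma_r. \qquad (4.41)$$

It follows that

$$-\boldsymbol{x}\wedge\boldsymbol{\nabla}(\sigma_r\psi\sigma_3) = -(l + 2)\sigma_r\psi\sigma_3, \qquad (4.42)$$

so, without loss of generality, we can choose $l$ to be positive and recover the negative-$l$ states through multiplying by $\sigma_r$. In addition, since

$$\boldsymbol{x}\wedge\boldsymbol{\nabla}(\boldsymbol{x}\wedge\boldsymbol{\nabla}\psi) = l^2\psi, \qquad (4.43)$$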
we find that

(4.44)

Hence, with respect to a constant basis for the STA, the components of $\psi$ are spherical harmonics and $l$ must be an integer for any physical solution. The next step is to introduce ladder operators to move between different $J_3$ eigenstates. The required analysis is standard, and has been relegated to the Appendix. The conclusions are that, for each value of $l$, the allowed values of the eigenvalues of $J_3$ range from $(l + \tfrac{1}{2})$ to $-(l + \tfrac{1}{2})$. The total degeneracy is therefore $2(l + 1)$. The states can therefore be labeled by two integers $l$ and $m$ such that

$$-\boldsymbol{x}\wedge\boldsymbol{\nabla}\psi_l^m = l\psi_l^m, \qquad l \geq 0, \qquad (4.45)$$

and

$$J_3\psi_l^m = (m + \tfrac{1}{2})\psi_l^m, \qquad -l - 1 \leq m \leq l. \qquad (4.46)$$
Labeling the states in this manner is unconventional, but provides for many simplifications in describing the properties of the $\psi_l^m$. To find an explicit expression for the $\psi_l^m$ we start from the highest-$m$ eigenstate, which is given by

$$\psi_l^l = \sin^l\theta\, e^{l\phi i\sigma_3}, \qquad (4.47)$$

and act on this with the lowering operator $J_-$. This procedure is described in detail in the Appendix. The result is the following, remarkably compact formula:

$$\psi_l^m = \left[(l + m + 1)P_l^m(\cos\theta) - P_l^{m+1}(\cos\theta)\, i\sigma_\phi\right]e^{m\phi i\sigma_3}, \qquad (4.48)$$
where the associated Legendre polynomials follow the conventions of Gradshteyn and Ryzhik [43]. The expression (4.48) offers a considerable improvement over formulas found elsewhere in terms of both compactness and ease of use. The formula (4.48) is valid for non-negative $l$ and both signs of $m$. The positive- and negative-$m$ states are related by

(4.49)
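Formula (4.48) can be tested directly against the monogenic equation (4.33): $\Psi = r^l\psi_l^m$ should satisfy $\boldsymbol{\nabla}\Psi = 0$. The sketch below makes several assumptions. It assumes associated Legendre functions that include the Condon-Shortley phase $(-1)^m$; in our hand checks this sign is what makes the two terms of (4.48) combine into a monogenic. It also represents $\sigma_k$ by Pauli matrices and the pseudoscalar by `1j`, and it handles only $m \geq 0$. The vector derivative is approximated by central differences, so this is a numerical sanity check rather than a proof. The helper names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

def assoc_legendre(l, m, x):
    """P_l^m(x) for integers 0 <= m, with the Condon-Shortley phase (-1)^m.
    Returns 0 for m > l, which is what (4.48) needs at m = l."""
    if m > l:
        return 0.0
    p_mm = 1.0
    for k in range(1, m + 1):                       # (-1)^m (2m - 1)!!
        p_mm *= -(2 * k - 1)
    p_mm *= (1.0 - x * x) ** (m / 2)
    if l == m:
        return p_mm
    p_prev, p_curr = p_mm, x * (2 * m + 1) * p_mm   # P_m^m and P_{m+1}^m
    for ll in range(m + 2, l + 1):                  # standard upward recurrence in l
        p_prev, p_curr = p_curr, ((2 * ll - 1) * x * p_curr
                                  - (ll + m - 1) * p_prev) / (ll - m)
    return p_curr

def monogenic(l, m, point):
    """psi_l^m of Eq. (4.48) as a 2x2 matrix; i sigma_phi -> 1j * sigma_phi."""
    x, y, z = point
    r = np.sqrt(x * x + y * y + z * z)
    cos_t, phi = z / r, np.arctan2(y, x)
    sigma_phi = -np.sin(phi) * s1 + np.cos(phi) * s2
    rotor = np.cos(m * phi) * I2 + np.sin(m * phi) * 1j * s3   # exp(m phi i sigma_3)
    bracket = ((l + m + 1) * assoc_legendre(l, m, cos_t) * I2
               - assoc_legendre(l, m + 1, cos_t) * 1j * sigma_phi)
    return bracket @ rotor

def monogenic_field(l, m, point):
    """Psi = r^l psi_l^m, which should satisfy nabla Psi = 0."""
    return np.linalg.norm(point) ** l * monogenic(l, m, point)

def vector_derivative(field, point, h=1e-5):
    """Central-difference estimate of nabla Psi = sum_k sigma_k d_k Psi."""
    result = np.zeros((2, 2), dtype=complex)
    for k, sk in enumerate((s1, s2, s3)):
        dp = np.zeros(3)
        dp[k] = h
        result += sk @ ((field(point + dp) - field(point - dp)) / (2 * h))
    return result

point = np.array([0.4, -0.7, 0.5])   # generic point off the z axis
residuals = {(l, m): np.abs(vector_derivative(lambda q: monogenic_field(l, m, q),
                                              point)).max()
             for l in range(3) for m in range(l + 1)}
```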
The negative-$l$ states are constructed using (4.42), and the $J_3$ eigenvalues are unchanged by this construction. The possible eigenvalues and degeneracies are summarized in Table V. One curious feature of this table is that we appear to be missing a line for the eigenvalue $l = -1$. In fact, solutions for this case do exist, but they contain singularities which render them unnormalizable. For example, the functions

$$\frac{i\sigma_\phi}{\sin\theta} \quad\text{and}\quad \frac{e^{-\phi i\sigma_3}}{\sin\theta} \qquad (4.50)$$

have $l = -1$ and $J_3$ eigenvalues $+\tfrac{1}{2}$ and $-\tfrac{1}{2}$, respectively. Both solutions are singular along the $z$ axis, however, so they are of limited physical interest.

C. Applications
Having established the properties of the spherical monogenics, we can proceed quickly to the solution of various problems. We have chosen to consider two: the standard case of the hydrogen atom, and the "Dirac oscillator" [44].
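TABLE V
EIGENVALUES AND DEGENERACIES FOR THE MONOGENICS $\psi_l^m$

$l$ | Eigenvalues of $J_3$ | Degeneracy
⋮ | ⋮ | ⋮
2 | $\tfrac{5}{2}, \ldots, -\tfrac{5}{2}$ | 6
1 | $\tfrac{3}{2}, \ldots, -\tfrac{3}{2}$ | 4
0 | $\tfrac{1}{2}, -\tfrac{1}{2}$ | 2
(−1) | (no normalizable states) |
−2 | $\tfrac{1}{2}, -\tfrac{1}{2}$ | 2
−3 | $\tfrac{3}{2}, \ldots, -\tfrac{3}{2}$ | 4
⋮ | ⋮ | ⋮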
1. The Coulomb Problem

The Hamiltonian for this problem is

$$\mathcal{H}\psi = \hat{\boldsymbol{p}}\psi - \frac{Z\alpha}{r}\psi + m\bar{\psi}, \qquad (4.51)$$
where $\alpha = e^2/4\pi$ is the fine-structure constant and $Z$ is the atomic charge. Since the $J_B$ operators commute with $\mathcal{H}$, $\psi$ can be placed in an eigenstate of $J_3$. The operator $J_iJ_i$ must also commute with $\mathcal{H}$, but $\boldsymbol{x}\wedge\boldsymbol{\nabla}$ does not, so both the $\psi_l^m$ and $\sigma_r\psi_l^m\sigma_3$ monogenics are needed in the solution. Though $\boldsymbol{x}\wedge\boldsymbol{\nabla}$ does not commute with $\mathcal{H}$, the operator
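$$K = \hat{\gamma}_0(1 - \boldsymbol{x}\wedge\boldsymbol{\nabla}) \qquad (4.52)$$

does, as follows from

$$[\hat{\gamma}_0(1 - \boldsymbol{x}\wedge\boldsymbol{\nabla}), \boldsymbol{\nabla}] = 2\hat{\gamma}_0\boldsymbol{\nabla} - \hat{\gamma}_0\dot{\boldsymbol{\nabla}}\,\dot{\boldsymbol{x}}\wedge\boldsymbol{\nabla} = 0. \qquad (4.53)$$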
We can therefore work with eigenstates of the $K$ operator, which means that the spatial part of $\psi$ goes either as

$$\psi(\boldsymbol{x}, l + 1) = \psi_l^m u(r) + \sigma_r\psi_l^m v(r)i\sigma_3 \qquad (4.54)$$

or as

$$\psi(\boldsymbol{x}, -(l + 1)) = \sigma_r\psi_l^m\sigma_3 u(r) + \psi_l^m i v(r). \qquad (4.55)$$

In both cases the second label in $\psi(\boldsymbol{x}, \pm(l + 1))$ specifies the eigenvalue of $K$. The functions $u(r)$ and $v(r)$ are initially "complex" superpositions of a scalar and an $i\sigma_3$ term. It turns out, however, that the scalar and $i\sigma_3$ equations decouple, and it is sufficient to treat $u(r)$ and $v(r)$ as scalars. We now insert the trial functions (4.54) and (4.55) into the Hamiltonian (4.51). Using the results that

(4.56)
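and looking for stationary-state solutions of energy $E$, the radial equations reduce to

$$\begin{pmatrix} u' \\ v' \end{pmatrix} = \begin{pmatrix} (\kappa - 1)/r & -(E + Z\alpha/r + m) \\ E + Z\alpha/r - m & (-\kappa - 1)/r \end{pmatrix}\begin{pmatrix} u \\ v \end{pmatrix}, \qquad (4.57)$$

where $\kappa$ is the eigenvalue of $K$. ($\kappa$ is a nonzero positive or negative integer.) The solution of these radial equations can be found in many textbooks (see, for example, [34, 36, 45]). The solutions can be given in terms of confluent hypergeometric functions, and the energy spectrum is obtained from the equation

$$E = m\left[1 + \frac{(Z\alpha)^2}{(n + \nu)^2}\right]^{-1/2}, \qquad (4.58)$$

where $n$ is a non-negative integer and

$$\nu = \left[(l + 1)^2 - (Z\alpha)^2\right]^{1/2}. \qquad (4.59)$$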
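As a quick numerical illustration of (4.58) and (4.59), the $n = 0$, $l = 0$ level reproduces the familiar ground-state energy $m\sqrt{1 - (Z\alpha)^2}$. The $n = 1$, $l = 0$ level approaches the Bohr value for principal quantum number $n + l + 1 = 2$. Units with $m = 1$ and $\alpha \approx 1/137.036$ are assumptions of the sketch, as is the function name.

```python
import math

def dirac_coulomb_energy(n, l, Z, alpha=1 / 137.035999, m=1.0):
    """Eq. (4.58): E = m [1 + (Z alpha)^2 / (n + nu)^2]^(-1/2), with
    nu = sqrt((l + 1)^2 - (Z alpha)^2) from Eq. (4.59) and n = 0, 1, 2, ..."""
    za = Z * alpha
    nu = math.sqrt((l + 1) ** 2 - za ** 2)
    return m / math.sqrt(1.0 + za ** 2 / (n + nu) ** 2)

Z = 1
alpha = 1 / 137.035999
ground = dirac_coulomb_energy(0, 0, Z)
exact_ground = math.sqrt(1 - (Z * alpha) ** 2)

# Nonrelativistic check: binding energy ~ (Z alpha)^2 / (2 N^2) with N = n + l + 1 = 2
binding = 1.0 - dirac_coulomb_energy(1, 0, Z)
bohr = (Z * alpha) ** 2 / (2 * 2 ** 2)
```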
While this analysis does not offer any new results, it should demonstrate how easy it is to manipulate expressions involving the spherical monogenics.

2. The Dirac "Oscillator"

The equation describing a Dirac "oscillator" was introduced as recently as 1989 [44]. The equation is one for a chargeless particle with an anomalous magnetic moment [34] which, in the STA, takes the form

$$\nabla\psi i\sigma_3 - i\mu F\psi\gamma_3 = m\psi\gamma_0. \qquad (4.60)$$
This equation will be met again in Section VIII, where it is used to analyze the effects of a Stern-Gerlach apparatus. The situation describing the Dirac oscillator is one where the $F$ field exerts a linear, confining force described by

$$F = \frac{m\omega}{\mu}\boldsymbol{x}. \qquad (4.61)$$

The Hamiltonian in this case is

$$\mathcal{H}\psi = \hat{\boldsymbol{p}}\psi - jm\omega\boldsymbol{x}\bar{\psi} + m\bar{\psi}. \qquad (4.62)$$
It is a simple matter to verify that this Hamiltonian commutes with both the $J_B$ and $K$ operators defined above, so we can again take the wavefunction to be of the form of Eqs. (4.54) and (4.55). The resulting equations in this case are

$$\begin{pmatrix} u' \\ v' \end{pmatrix} = \begin{pmatrix} (\kappa - 1)/r - m\omega r & -(E + m) \\ E - m & (-\kappa - 1)/r + m\omega r \end{pmatrix}\begin{pmatrix} u \\ v \end{pmatrix}. \qquad (4.63)$$

The equations are simplified by transforming to the dimensionless variable $\rho$,

$$\rho = (m\omega)^{1/2}r, \qquad (4.64)$$

and removing the asymptotic behavior via

$$u = \rho^l e^{-\rho^2/2}u_1 \qquad (4.65)$$
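and

$$v = \rho^l e^{-\rho^2/2}u_2. \qquad (4.66)$$

The analysis is now slightly different for the positive- and negative-$\kappa$ equations, which we consider in turn.

a. Positive $\kappa$. The equations reduce to

$$\begin{pmatrix} u_1' \\ u_2' \end{pmatrix} = \begin{pmatrix} 0 & -(E + m)/\sqrt{m\omega} \\ (E - m)/\sqrt{m\omega} & 2\rho - 2(l + 1)/\rho \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix}, \qquad (4.67)$$

which are solved with the power series

$$u_1 = \sum_{n=0}^{\infty} A_n\rho^{2n} \qquad (4.68)$$

and

$$u_2 = \sum_{n=0}^{\infty} B_n\rho^{2n+1}. \qquad (4.69)$$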
The recursion relations are

$$2nA_n = -\frac{E + m}{\sqrt{m\omega}}B_{n-1} \qquad (4.70)$$

and

$$(2n + 2l + 3)B_n = \frac{E - m}{\sqrt{m\omega}}A_n + 2B_{n-1}, \qquad (4.71)$$

and the requirement that the series terminate produces the eigenvalue spectrum

$$E^2 - m^2 = 4nm\omega, \qquad n = 1, 2, \ldots \qquad (4.72)$$
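The termination condition can be checked by iterating the recursions (4.70) and (4.71) directly. The parameter values $m = 1$, $\omega = 0.25$, $l = 2$ are arbitrary choices for this sketch, and the function name is illustrative. Choosing $E$ from (4.72) with $n = 3$ makes $B_3$ vanish, so the series truncates. A nearby energy that is not an eigenvalue does not truncate.

```python
import math

def positive_kappa_series(E, m, omega, l, n_terms=8):
    """Iterate the positive-kappa recursions (4.70)-(4.71) with A_0 = 1.
    Returns the coefficients A_n (of rho^(2n)) and B_n (of rho^(2n+1))."""
    s = math.sqrt(m * omega)
    A = [1.0]
    B = [(E - m) / s * A[0] / (2 * l + 3)]          # (4.71) at n = 0, with B_{-1} = 0
    for n in range(1, n_terms):
        A.append(-(E + m) / s * B[n - 1] / (2 * n))                            # (4.70)
        B.append(((E - m) / s * A[n] + 2 * B[n - 1]) / (2 * n + 2 * l + 3))  # (4.71)
    return A, B

m, omega, l, N = 1.0, 0.25, 2, 3
E_N = math.sqrt(m**2 + 4 * N * m * omega)           # Eq. (4.72) with n = N
A, B = positive_kappa_series(E_N, m, omega, l)
A_off, B_off = positive_kappa_series(E_N + 0.1, m, omega, l)
```

Combining the two recursions gives $(2n + 2l + 3)B_n = B_{n-1}\left[2 - (E^2 - m^2)/(2nm\omega)\right]$, which is where the condition (4.72) comes from.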
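Remarkably, the energy levels do not depend on $l$, so they are infinitely degenerate!

b. Negative $\kappa$. In this case the equations reduce to

$$\begin{pmatrix} u_1' \\ u_2' \end{pmatrix} = \begin{pmatrix} -2(l + 1)/\rho & -(E + m)/\sqrt{m\omega} \\ (E - m)/\sqrt{m\omega} & 2\rho \end{pmatrix}\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \qquad (4.73)$$

and are solved with the power series

$$u_1 = \sum_{n=0}^{\infty} A_n\rho^{2n+1} \qquad (4.74)$$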
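and

$$u_2 = \sum_{n=0}^{\infty} B_n\rho^{2n}. \qquad (4.75)$$

The recursion relations become

$$(2n + 2l + 3)A_n = -\frac{E + m}{\sqrt{m\omega}}B_n \qquad (4.76)$$

and

$$2nB_n = \frac{E - m}{\sqrt{m\omega}}A_{n-1} + 2B_{n-1}, \qquad (4.77)$$

and this time the eigenvalues are given by the formula

$$E^2 - m^2 = 2(2n + 2l + 1)m\omega, \qquad n = 1, 2, \ldots \qquad (4.78)$$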
The energy $E$ enters the spectrum only through equations for $E^2$. It follows that both positive and negative energies are allowed. The lowest positive-energy state (the ground state) has $E^2 = m^2 + 4m\omega$, leading to a nonrelativistic energy of $2\omega$. The ground state is infinitely degenerate, whereas the first excited state has a degeneracy of 2. The energy spectrum is clearly quite bizarre and does not correspond to any sensible physical system. In particular, this system does not reduce to a simple harmonic oscillator in the nonrelativistic limit. The simple, if bizarre, nature of the energy spectrum is obscured in other approaches [44, 46], which choose a less clear labeling system for the eigenstates.
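The level structure just described can be tabulated by counting states. The sketch lists $(E^2 - m^2)/(m\omega)$ from (4.72) and (4.78) and weights each $l$ by the $2(l + 1)$ values of $J_3$ allowed by (4.46). Truncating $l$ and $n$ at finite values is an assumption of the sketch, needed only to make the infinite degeneracy countable.

```python
from collections import Counter

def oscillator_levels(l_max, n_max):
    """Count states at each value of (E^2 - m^2)/(m omega).

    Each l contributes the 2(l + 1) J3 eigenstates of Eq. (4.46); the cutoffs
    l_max and n_max are artificial and only make the count finite."""
    levels = Counter()
    for l in range(l_max + 1):
        for n in range(1, n_max + 1):
            levels[4 * n] += 2 * (l + 1)                     # positive kappa, Eq. (4.72)
            levels[2 * (2 * n + 2 * l + 1)] += 2 * (l + 1)   # negative kappa, Eq. (4.78)
    return levels

levels = oscillator_levels(l_max=20, n_max=20)
lowest_two = sorted(levels)[:2]
```

The ground level ($E^2 - m^2 = 4m\omega$) grows without bound as `l_max` increases. The next level ($6m\omega$) comes only from $l = 0$, $n = 1$ with negative $\kappa$ and has two states.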
V. PROPAGATORS AND SCATTERING THEORY

In this section we give a brief review of how problems requiring propagators are formulated and solved in the STA. The STA permits a first-order form of both Maxwell and Dirac theories involving the same differential operator: the vector derivative $\nabla$. The key problem is to find Green's functions for the vector derivative that allow us to propagate initial data off some surface. We therefore start by studying the characteristic surfaces of the vector derivative. We then turn to the use of spinor potentials, which were dealt with in greater detail in [6]. The section concludes with a look at single-particle scattering theory. Using the mappings established in Sections III and IV, it is a simple matter to reformulate in the STA the standard matrix approach to scattering problems as described in [34, 36].
The STA approach allows for a number of improvements, however, particularly in the treatment of spin. This is a subject which was first addressed by Hestenes [47], and our presentation closely follows his work.
A. Propagation and Characteristic Surfaces

One of the simplest demonstrations of the insights provided by the STA formulation of both Maxwell and Dirac theories is in the treatment of characteristic surfaces. In the second-order theory, characteristic surfaces are usually found by algebraic methods. Here we show how the same results can be found using a simple geometric argument applied to first-order equations. Suppose, initially, that we have a generic equation of the type
$$\nabla\psi = f(\psi, x), \qquad (5.1)$$

where $\psi(x)$ is any multivector field (not necessarily a spinor field) and $f(\psi, x)$ is some arbitrary, known function. If we are given initial data over some 3-D surface, are there any obstructions to us propagating this information off the surface? If so, the surface is a characteristic surface. We start at a point on the surface and pick three independent vectors $\{a, b, c\}$ tangent to the surface at the chosen point. Knowledge of $\psi$ on the surface enables us to calculate

$$a\cdot\nabla\psi, \qquad b\cdot\nabla\psi, \qquad\text{and}\qquad c\cdot\nabla\psi. \qquad (5.2)$$
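We next form the trivector $a\wedge b\wedge c$ and dualize to define

$$n = ia\wedge b\wedge c. \qquad (5.3)$$

We can now multiply Eq. (5.1) by $n$ and use

$$\begin{aligned} n\nabla\psi &= n\cdot\nabla\psi + n\wedge\nabla\psi = n\cdot\nabla\psi + i(a\wedge b\wedge c)\cdot\nabla\psi \\ &= n\cdot\nabla\psi + i(a\wedge b\, c\cdot\nabla\psi - a\wedge c\, b\cdot\nabla\psi + b\wedge c\, a\cdot\nabla\psi) \end{aligned} \qquad (5.4)$$

to obtain

$$n\cdot\nabla\psi = nf(\psi, x) - i(a\wedge b\, c\cdot\nabla\psi - a\wedge c\, b\cdot\nabla\psi + b\wedge c\, a\cdot\nabla\psi). \qquad (5.5)$$

All of the terms on the right-hand side of Eq. (5.5) are known, so we can find $n\cdot\nabla\psi$ and use this to propagate $\psi$ in the $n$ direction (i.e., off the surface). The only situation in which we fail to propagate, therefore, is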
when $n$ remains in the surface. This occurs when

$$n\wedge(a\wedge b\wedge c) = 0 \;\Rightarrow\; n\wedge(ni) = 0 \;\Rightarrow\; i\,n\cdot n = 0. \qquad (5.6)$$
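The null condition can be illustrated numerically. Instead of the STA dual $n = ia\wedge b\wedge c$, this sketch builds a vector Minkowski-orthogonal to $a$, $b$, $c$ from signed $3\times 3$ minors. This substitution is an assumption of the illustration; the result agrees with the dual up to normalization and sign. For a $t = $ const surface, $n$ is timelike. For $x = $ const it is spacelike. For the null surface $x - t = $ const it is null, so $n$ lies in the surface itself.

```python
import numpy as np

ETA = np.diag([1.0, -1.0, -1.0, -1.0])   # Minkowski metric, signature (+,-,-,-)

def mdot(u, v):
    """Minkowski inner product u . v."""
    return u @ ETA @ v

def surface_normal(a, b, c):
    """A vector Minkowski-orthogonal to a, b and c.

    The covariant components come from signed 3x3 minors (a generalized cross
    product); raising the index with ETA gives n^mu. Up to scale and sign this
    plays the role of the dual n = i a^b^c."""
    M = np.array([a, b, c], dtype=float)
    n_lower = np.array([(-1) ** k * np.linalg.det(np.delete(M, k, axis=1))
                        for k in range(4)])
    return ETA @ n_lower

e_y, e_z = [0, 0, 1, 0], [0, 0, 0, 1]
surfaces = {
    "t = const (spacelike surface)": ([0, 1, 0, 0], e_y, e_z),
    "x = const (timelike surface)": ([1, 0, 0, 0], e_y, e_z),
    "x - t = const (null surface)": ([1, 1, 0, 0], e_y, e_z),
}
normals = {name: surface_normal(*vecs) for name, vecs in surfaces.items()}
n_squared = {name: mdot(n, n) for name, n in normals.items()}
```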
Hence we only fail to propagate when $n^2 = 0$, and it follows immediately that the characteristic surfaces of Eq. (5.1) are null surfaces. This result applies to any first-order equation based on the vector derivative $\nabla$, including the Maxwell and Dirac equations. The fundamental significance of null directions in these theories is transparent in their STA form. Furthermore, the technique extends immediately to a gravitational background, as described in [48].

B. Spinor Potentials and Propagators
A simple method to generate propagators for the Dirac theory is to introduce a spinor potential satisfying a scalar second-order equation. Suppose that $\psi$ satisfies the Dirac equation

$$\nabla\psi i\sigma_3 - m\psi\gamma_0 = 0. \qquad (5.7)$$

Then $\psi$ can be generated from the (odd multivector) potential $\phi$ via

$$\psi = \nabla\phi i\sigma_3 + m\phi\gamma_0, \qquad (5.8)$$

provided that

$$(\nabla^2 + m^2)\phi = 0. \qquad (5.9)$$
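Substituting the potential form of $\psi$ into the Dirac equation shows why the Klein-Gordon condition on $\phi$ suffices. The short derivation below assumes only that $\phi$ is odd, $\gamma_0^2 = 1$, $(i\sigma_3)^2 = -1$, and that $\gamma_0$ commutes with the bivector $i\sigma_3 = \gamma_1\gamma_2$:

```latex
\begin{aligned}
\nabla\psi\, i\sigma_3 - m\psi\gamma_0
  &= \nabla(\nabla\phi\, i\sigma_3 + m\phi\gamma_0)\, i\sigma_3
     - m(\nabla\phi\, i\sigma_3 + m\phi\gamma_0)\gamma_0 \\
  &= \nabla^2\phi\,(i\sigma_3)^2 + m\nabla\phi\,\gamma_0 i\sigma_3
     - m\nabla\phi\, i\sigma_3\gamma_0 - m^2\phi\,\gamma_0^2 \\
  &= -(\nabla^2 + m^2)\phi .
\end{aligned}
```

The two cross terms cancel because $\gamma_0 i\sigma_3 = i\sigma_3\gamma_0$, so the Dirac equation holds exactly when $\phi$ satisfies the Klein-Gordon equation.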
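The standard second-order theory can then be applied to $\phi$, and then used to recover $\psi$. In [6] this technique was applied to constant-energy waves

$$\psi = \psi(\boldsymbol{x})e^{-i\sigma_3 Et}. \qquad (5.10)$$

The Dirac equation then becomes

$$\boldsymbol{\nabla}\psi i\sigma_3 + E\psi - m\bar{\psi} = 0, \qquad (5.11)$$

which is solved by

$$\psi = -\boldsymbol{\nabla}\phi i\sigma_3 + E\phi + m\bar{\phi}, \qquad (5.12)$$

where

(5.13)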
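In this integral the initial data $\psi(\boldsymbol{x}')$ is given over some closed spatial surface with normal $n' = n(\boldsymbol{x}')$, and $p$ and $r$ are defined by

$$p = \sqrt{E^2 - m^2} \quad\text{and}\quad r = |\boldsymbol{x} - \boldsymbol{x}'|. \qquad (5.14)$$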
Similar techniques can be applied to the propagation of electromagnetic waves (see [6] for details).
C. Scattering Theory

We finish this short section with a brief look at how the matrix approach to scattering theory is handled in the STA, closely following the work of Hestenes [47]. We continue to employ the symbol $j$ for $i\sigma_3$ in places where it simplifies the notation. In particular, we employ the $j$ symbol in the exponential terms introduced by Fourier transforming to momentum space. Where the $j$'s play a more significant geometric role, they are left in the $i\sigma_3$ form. We start by replacing the Dirac equation (3.44) with the integral equation

$$\psi(x) = \psi_i(x) + e\int d^4x'\, S_F(x - x')A(x')\psi(x'), \qquad (5.15)$$
where $\psi_i$ is the asymptotic in-state which solves the free-particle equation, and $S_F(x - x')$ is the STA form of the Feynman propagator. Substituting (5.15) into the Dirac equation, we find that $S_F(x - x')$ must satisfy

$$\nabla_x S_F(x - x')M(x')i\sigma_3 - mS_F(x - x')M(x')\gamma_0 = \delta^4(x - x')M(x') \qquad (5.16)$$

for an arbitrary multivector $M(x')$. The solution to this equation is

(5.17)

where, for causal propagation, the $dE$ integral must arrange that positive-frequency waves propagate into the future ($t > t'$) and negative-frequency waves propagate into the past ($t' > t$). The result of performing the $dE$ integral is

(5.18)

where $E = +\sqrt{\boldsymbol{p}^2 + m^2}$ and $M = M(x')$.
(5.19)
we find that, as $t$ tends to $+\infty$, $\psi_{\text{diff}}(x)$ is given by

$$\psi_{\text{diff}}(x) = -e\int\frac{d^3p}{(2\pi)^3}\frac{1}{2E}\int d^4x'\left[pA(x')\psi(x') + mA(x')\psi(x')\gamma_0\right]i\sigma_3 e^{-jp\cdot(x - x')}. \qquad (5.20)$$

We therefore define a set of final states $\psi_f(x)$ by

$$\psi_f(x) = -e\int\frac{d^4x'}{2E_f}\left[p_fA(x')\psi(x') + mA(x')\psi(x')\gamma_0\right]i\sigma_3 e^{-jp_f\cdot(x - x')}, \qquad (5.21)$$

which are plane-wave solutions to the free-field equations with momentum $p_f$. $\psi_{\text{diff}}(x)$ can now be expressed as a superposition of these plane-wave states,

$$\psi_{\text{diff}}(x) = \int\frac{d^3p_f}{(2\pi)^3}\psi_f(x). \qquad (5.22)$$

1. The Born Approximation and Coulomb Scattering
In order to find $\psi_f(x)$ we must evaluate the integral (5.21). In the Born approximation, we simplify the problem by approximating $\psi(x')$ by $\psi_i(x')$. In this case, since $\psi_i(x') = \psi_i e^{-jp_i\cdot x'}$ and

$$m\psi_i\gamma_0 = p_i\psi_i, \qquad (5.23)$$

we can write

$$\psi_f(x) = -e\int\frac{d^4x'}{2E_f}\left[p_fA(x') + A(x')p_i\right]\psi_i i\sigma_3 e^{jq\cdot x'}e^{-jp_f\cdot x}, \qquad (5.24)$$

where

$$q \equiv p_f - p_i. \qquad (5.25)$$

The integral in (5.24) can now be evaluated for any given $A$ field. As a simple application consider Coulomb scattering, for which $A(x')$ is given by

$$A(x') = \frac{Ze}{4\pi|\boldsymbol{x}'|}\gamma_0. \qquad (5.26)$$

Inserting this in (5.24) and carrying out the integrals, we obtain

(5.27)
where

(5.28)

Here, $E = E_f = E_i$ and $\alpha = e^2/(4\pi)$ is the fine-structure constant. The quantity $S_{fi}$ contains all the information about the scattering process. Its magnitude determines the cross section via [47]

(5.29)

and the remainder of $S_{fi}$ determines the change of momentum and spin vectors. This is clear from (5.27), which shows that $S_{fi}$ must contain the rotor $R_f\tilde{R}_i$, where $R_i$ and $R_f$ are the rotors for the initial and final plane-wave states. Substituting (5.28) into (5.29), we immediately recover the Mott scattering cross section

(5.30)

where

(5.31)

The notable feature of this derivation is that no spin sums are required. Instead, all the spin dependence is contained in the directional information in $S_{fi}$. As well as being computationally more efficient, the STA method for organizing cross-section calculations offers deeper insights into the structure of the theory. For example, for Mott scattering the directional information is contained entirely in the quantity [47]

(5.32)

where $L_f$ and $L_i$ are the boosts contained in $R_f$ and $R_i$, respectively. The algebraic structure

$$S_{fi} = p_fM + Mp_i, \qquad (5.33)$$

where $M$ is some odd multivector, is common to many scattering problems.
Since $S_{fi}$ contains the quantity $R_f\tilde{R}_i$, we obtain a spatial rotor by removing the two boost terms. We therefore define the (unnormalized) rotor

$$U'_{fi} = \tilde{L}_f(L_f^2 + \tilde{L}_i^2)L_i = L_fL_i + \tilde{L}_f\tilde{L}_i, \qquad (5.34)$$

so that $U_{fi} = U'_{fi}/|U'_{fi}|$ determines the rotation from initial to final rest spins. A simple calculation gives

(5.35)

hence the rest-spin vector precesses in the $p_f\wedge p_i$ plane through an angle $\delta$, where

$$\tan(\delta/2) = \frac{\sin\theta}{(E + m)/(E - m) + \cos\theta}. \qquad (5.36)$$
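The precession formula can be cross-checked against the rotor itself. This sketch assumes $U' = L_fL_i + \tilde{L}_f\tilde{L}_i$ for (5.34) and $\tan(\delta/2) = \sin\theta/[(E + m)/(E - m) + \cos\theta]$ for (5.36). It represents even STA elements as $2\times 2$ complex matrices, in which the STA reverse of a scalar-plus-bivector element is the matrix adjugate. The helper names are illustrative. Only the magnitude of the rotation angle is compared, not its sense. The check also confirms the limits $\delta \to 0$ (nonrelativistic) and $\delta \to \theta$ (ultrarelativistic).

```python
import numpy as np

SIGMA = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
I2 = np.eye(2, dtype=complex)

def vec(p):
    return sum(pk * sk for pk, sk in zip(p, SIGMA))

def boost(p, m):
    """L = (E + m + p gamma_0) / sqrt(2m(E + m)) as a 2x2 matrix."""
    E = np.sqrt(m**2 + p @ p)
    return ((E + m) * I2 + vec(p)) / np.sqrt(2 * m * (E + m))

def reverse(X):
    """STA reverse of an even element = 2x2 adjugate in this representation."""
    return np.array([[X[1, 1], -X[0, 1]], [-X[1, 0], X[0, 0]]])

def precession_angle_rotor(p_i, p_f, m):
    """Rotation angle of U' = L_f L_i + rev(L_f) rev(L_i) (assumed form of 5.34)."""
    L_i, L_f = boost(p_i, m), boost(p_f, m)
    U = L_f @ L_i + reverse(L_f) @ reverse(L_i)
    scalar = np.trace(U).real / 2
    bivector = U - scalar * I2                      # = 1j * (w . sigma), w real
    w_norm = np.sqrt(np.sum(np.abs(bivector) ** 2) / 2)
    return 2 * np.arctan2(w_norm, scalar)

def precession_angle_formula(E, m, theta):
    """delta from tan(delta/2) = sin(theta) / ((E + m)/(E - m) + cos(theta))."""
    return 2 * np.arctan2(np.sin(theta), (E + m) / (E - m) + np.cos(theta))

m, theta = 1.0, 0.8
results = []
for p_mag in (0.05, 2.0, 500.0):
    E = np.sqrt(m**2 + p_mag**2)
    p_i = np.array([p_mag, 0.0, 0.0])
    p_f = p_mag * np.array([np.cos(theta), np.sin(theta), 0.0])
    results.append((precession_angle_rotor(p_i, p_f, m),
                    precession_angle_formula(E, m, theta)))
```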
While the derivations of the Mott scattering formula and the polarization precession angle are only presented in outline here (further details are contained in [47]), it should be clear that they offer many advantages over the usual derivations [34, 36]. All the features of the scattering are contained in the single multivector $S_{fi}$, the algebraic form of which is very simple. Much work remains, however, if these techniques are to be extended to the whole of QED.
VI. PLANE WAVES AT POTENTIAL STEPS

We now turn to a discussion of the matching of Dirac plane waves at a potential step. The case of perpendicular incidence is a standard problem and is treated in most texts [34, 36, 42]. In order to demonstrate the power of the STA approach, we treat the more general case of oblique incidence, adapting an approach used in electrical engineering to analyze the propagation of electromagnetic waves. A number of applications are given as illustrations, including the tunneling of monochromatic waves and spin precession on total reflection at a barrier. We conclude the section with a discussion of the Klein paradox. The problem of interest is that of plane waves incident on a succession of potential steps. The steps are taken as lying along the $x$ direction, with infinite extent in the $y$ and $z$ directions. Since the spatial components of the incoming and outgoing wavevectors lie in a single plane, the matching problem can be reduced to one in two dimensions. The analysis is simplified further if the wavevectors are taken to lie in the $i\sigma_3$ plane. (Other configurations can always be obtained by applying a rotation.) The arrangement is illustrated in Fig. 1. The waves all oscillate at a single frequency $E$, and the Dirac equation in the $i$th region is

$$(E - eV_i)\psi = -\boldsymbol{\nabla}\psi i\sigma_3 + m\gamma_0\psi\gamma_0. \qquad (6.1)$$
By continuity of $\psi$ at each boundary, the $y$ component of the wavevector, $p_y$, must be the same in all regions. For the $i$th region we define

$$E' = E - eV_i \qquad (6.2)$$

and, depending on the magnitude of $V_i$, the waves in this region will be either traveling or evanescent. For traveling waves we define (dropping the subscripts)

$$p_x^2 = E'^2 - p_y^2 - m^2, \qquad |E - eV| > \sqrt{p_y^2 + m^2}. \qquad (6.3)$$

In terms of the angle of incidence $\phi$ we also have

$$p_x = p\cos\phi, \qquad p_y = p\sin\phi, \qquad E'^2 = p^2 + m^2. \qquad (6.4)$$

For evanescent waves we write

$$\kappa^2 = -E'^2 + p_y^2 + m^2, \qquad |E - eV| < \sqrt{p_y^2 + m^2}. \qquad (6.5)$$

In all the cases that we study, the incoming waves are assumed to be positive-energy traveling waves in a region where $eV < E - \sqrt{p_y^2 + m^2}$.
FIGURE 1. Plane waves at a potential step. The spatial component of the wavevector lies in the $x$-$y$ plane and the step lies in the $y$-$z$ plane.
SPACETIME ALGEBRA AND ELECTRON PHYSICS
317
Recalling the plane-wave solutions found in Section III.C, the traveling waves are given by
where

(6.7)

so that

$$\sinh u = \frac{p}{m} \quad\text{and}\quad \cosh u = \frac{E'}{m}.$$
The transmission and reflection coefficients T and R are scalar + $i\sigma_3$ combinations, always appearing on the right-hand side of the spinor. The fact that $p_y$ is the same in all regions gives the electron equivalent of Snell's law,

$$\sinh u \sin\phi = \text{constant}. \qquad (6.9)$$
The Pauli spinor Φ describes the rest spin of the particle, with Φ = 1 giving spin up and Φ = $-i\sigma_2$ spin down. Other situations are, of course, built from superpositions of these basis states. For these two spin basis states the spin vector is $\pm\sigma_3$, which lies in the plane of the barrier and is perpendicular to the plane of motion. Choosing the states so that the spin is aligned in this manner simplifies the analysis, as the two spin states completely decouple. Many treatments (including one published by some of the present authors [6]) miss this simplification. There are three matching situations to consider, depending on whether the transmitted waves are traveling, evanescent, or in the Klein region ($eV > E + \sqrt{p_y^2 + m^2}$). We consider each of these in turn.
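The three cases can be distinguished numerically using Eqs. (6.2)-(6.5). The sketch below is a minimal illustration that assumes natural units (ħ = c = 1) and real inputs. The helper names `region_kind`, `traveling_params`, and `kappa` are hypothetical and do not come from the text. The sketch also checks the Snell's-law invariant (6.9): in every traveling region, $\sinh u\sin\phi = p_y/m$.

```python
import math

def region_kind(E, eV, py, m):
    """Classify a region using Eqs. (6.2)-(6.5)."""
    Ep = E - eV                            # E' = E - eV, Eq. (6.2)
    threshold = math.sqrt(py**2 + m**2)
    if Ep > threshold:
        return "traveling"                 # positive-energy traveling waves
    if -Ep > threshold:
        return "klein"                     # eV > E + sqrt(py^2 + m^2)
    return "evanescent"

def traveling_params(E, eV, py, m):
    """Return (px, u, phi) in a traveling region, with sinh u = p/m and cosh u = E'/m."""
    Ep = E - eV
    px = math.sqrt(Ep**2 - py**2 - m**2)   # Eq. (6.3)
    p = math.hypot(px, py)
    u = math.asinh(p / m)
    phi = math.atan2(py, px)               # px = p cos(phi), py = p sin(phi), Eq. (6.4)
    return px, u, phi

def kappa(E, eV, py, m):
    """Decay constant in an evanescent region, Eq. (6.5)."""
    Ep = E - eV
    return math.sqrt(py**2 + m**2 - Ep**2)

# Example: m = 1, E = 2, incident at phi = 0.5 from a field-free region.
m, E = 1.0, 2.0
py = math.sqrt(E**2 - m**2) * math.sin(0.5)   # conserved across all steps
for eV in (0.0, 0.3, 1.0, 4.0):
    print(eV, region_kind(E, eV, py, m))
# prints: 0.0 traveling / 0.3 traveling / 1.0 evanescent / 4.0 klein
```

Because $p_y$ is fixed once by the incident wave and reused in every region, the invariant (6.9) is enforced by construction rather than imposed separately.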
A. Matching Conditions for Traveling Waves

The situation of interest here is when there are waves of type (6.6) in both regions. The matching condition in all of these problems is simply that ψ is continuous at the boundary. The work involved is therefore, in principle,
less than for the equivalent nonrelativistic problem. The matching is slightly different for the two spins, so we consider each in turn.

1. Spin Up (Φ = 1)

We simplify the problem initially by taking the boundary at x = 0. Steps at other values of x are then dealt with by inserting suitable phase factors. The matching condition at x = 0 reduces to
(6.10)
Since the equations for the reflection and transmission coefficients involve only scalar and $i\sigma_3$ terms, it is again convenient to replace the $i\sigma_3$ bivector with the symbol j. If we now define the 2 × 2 matrix

$$A_i = \begin{pmatrix}\cosh(u_i/2) & \cosh(u_i/2)\\ \sinh(u_i/2)e^{j\phi_i} & -\sinh(u_i/2)e^{-j\phi_i}\end{pmatrix}, \qquad (6.11)$$
we find that Eq. (6.10) can be written concisely as

$$A_i\begin{pmatrix}T_i\\ R_i\end{pmatrix} = A_{i+1}\begin{pmatrix}T_{i+1}\\ R_{i+1}\end{pmatrix}. \qquad (6.12)$$
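The $A_i$ matrix has a straightforward inverse, so Eq. (6.12) can be easily manipulated to describe various physical situations. For example, consider plane waves incident on a single step. The equation describing this configuration is simply

$$A_1\begin{pmatrix}T_1\\ R_1\end{pmatrix} = A_2\begin{pmatrix}T_2\\ 0\end{pmatrix}, \qquad (6.13)$$

so that

$$\begin{pmatrix}T_1\\ R_1\end{pmatrix} = \frac{T_2}{\sinh u_1\cos\phi_1}\begin{pmatrix}\sinh(u_1/2)\cosh(u_2/2)e^{-j\phi_1} + \cosh(u_1/2)\sinh(u_2/2)e^{j\phi_2}\\ \sinh(u_1/2)\cosh(u_2/2)e^{j\phi_1} - \cosh(u_1/2)\sinh(u_2/2)e^{j\phi_2}\end{pmatrix}, \qquad (6.14)$$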
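from which the reflection and transmission coefficients can be read off. The case of perpendicular incidence is particularly simple, as Eq. (6.12) can be replaced by

$$\begin{pmatrix}T_i\\ R_i\end{pmatrix} = \frac{1}{\sinh u_i}\begin{pmatrix}\sinh\frac12(u_{i+1}+u_i) & \sinh\frac12(u_i-u_{i+1})\\ \sinh\frac12(u_i-u_{i+1}) & \sinh\frac12(u_{i+1}+u_i)\end{pmatrix}\begin{pmatrix}T_{i+1}\\ R_{i+1}\end{pmatrix}, \qquad (6.15)$$

which is valid for all spin orientations. So, for perpendicular incidence, the reflection coefficient $r = R_1/T_1$ and transmission coefficient $t = T_2/T_1$ at a single step are

$$r = \frac{\sinh\frac12(u_1-u_2)}{\sinh\frac12(u_1+u_2)}, \qquad t = \frac{\sinh u_1}{\sinh\frac12(u_1+u_2)}, \qquad (6.16)$$

which agree with the results given in standard texts (and also [35]).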
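The sketch below shows how the matching matrix $A_i$ of (6.11) can be used in practice. Because only scalar + $i\sigma_3$ quantities enter, it represents the bivector $i\sigma_3$ (written j above) by the complex unit `1j`. The sketch solves the single-step problem $A_1(T_1, R_1)^T = A_2(T_2, 0)^T$ numerically and makes two checks:

- At φ = 0 it compares the result with the closed forms $r = \sinh\frac12(u_1-u_2)/\sinh\frac12(u_1+u_2)$ and $t = \sinh u_1/\sinh\frac12(u_1+u_2)$, which follow from inverting $A_1$.
- At oblique incidence it checks conservation of the x-flux. The flux is taken to be proportional to $|T|^2\sinh u\cos\phi$, which is the x-component of $|T|^2(E'\gamma_0 + p)/m$ for each plane wave.

The helper names are hypothetical.

```python
import numpy as np

def A_matrix(u, phi):
    """Matching matrix A of Eq. (6.11); the bivector i*sigma_3 (j) is represented by 1j."""
    c, s = np.cosh(u / 2), np.sinh(u / 2)
    return np.array([[c, c],
                     [s * np.exp(1j * phi), -s * np.exp(-1j * phi)]], dtype=complex)

def single_step(u1, phi1, u2, phi2):
    """Solve A_1 (T1, R1) = A_2 (T2, 0) with T2 = 1; return r = R1/T1 and t = T2/T1."""
    rhs = A_matrix(u2, phi2) @ np.array([1.0, 0.0])
    T1, R1 = np.linalg.solve(A_matrix(u1, phi1), rhs)
    return R1 / T1, 1.0 / T1

def x_flux_balance(u1, phi1, u2, phi2):
    """|r|^2 plus the transmitted-to-incident x-flux ratio; equals 1 if flux is conserved."""
    r, t = single_step(u1, phi1, u2, phi2)
    ratio = (np.sinh(u2) * np.cos(phi2)) / (np.sinh(u1) * np.cos(phi1))
    return abs(r)**2 + ratio * abs(t)**2

u1, u2 = 1.2, 0.7
r, t = single_step(u1, 0.0, u2, 0.0)                            # perpendicular incidence
phi1 = 0.4
phi2 = np.arcsin(np.sinh(u1) * np.sin(phi1) / np.sinh(u2))      # Snell's law, Eq. (6.9)
print(r, t, x_flux_balance(u1, phi1, u2, phi2))
```

Solving the 2 × 2 system with `np.linalg.solve` avoids writing out $A_1^{-1}$ by hand. Longer chains of steps can be handled by multiplying such matrices in sequence.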
2. Spin Down (Φ = $-i\sigma_2$)

The matching equations for the case of opposite spin are
(6.17)
Pulling the $i\sigma_2$ out on the right-hand side just has the effect of complex-conjugating the reflection and transmission coefficients, so the matrix equation (6.10) is unchanged except that it now relates the complex conjugates of the reflection and transmission coefficients. The analog of Eq. (6.12) is therefore

$$A_i\begin{pmatrix}T_i^*\\ R_i^*\end{pmatrix} = A_{i+1}\begin{pmatrix}T_{i+1}^*\\ R_{i+1}^*\end{pmatrix}. \qquad (6.18)$$

As mentioned earlier, the choice of alignment of spin basis states ensures that there is no coupling between them.

One can string together series of barriers by including suitable "propagation" matrices. For example, consider the setup described in Fig. 2. The
FIGURE 2. Plane waves scattering from a barrier. The barrier has height eV and width d. Quantities inside the barrier are labeled with a subscript 1, and the free quantities have no subscripts. The phases are given by $\delta_1 = md\sinh u_1\cos\phi_1$ and $\delta = p_x d$.
matching equations for spin up are, at the first barrier,

$$A\begin{pmatrix}T\\ R\end{pmatrix} = A_1\begin{pmatrix}T_1\\ R_1\end{pmatrix}, \qquad (6.19)$$
and at the second barrier,
Equation (6.20) demonstrates neatly how matrices of the type (6.21)
where $\delta = p_x d$ and d is the distance between steps, can be used to propagate from one step to the next. In this case the problem is reduced to the equation
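$$\begin{pmatrix}T\\ R\end{pmatrix} = \left[\cos\delta_1\, I - j\sin\delta_1\, A^{-1}A_1\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}A_1^{-1}A\right]\begin{pmatrix}T'e^{j\delta}\\ 0\end{pmatrix}, \qquad (6.22)$$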
which quickly yields the reflection and transmission coefficients.

B. Matching onto Evanescent Waves
Before studying matching onto evanescent waves, we must first solve the Dirac equation in the evanescent region. Again, the two spin orientations
behave differently and are treated separately. Taking spin up first and looking at the transmitted (decaying) wave in the evanescent region, the solution takes the form
Substituting this into the Dirac equation yields

$$(E'\gamma_0 + p_y\gamma_2)e^{u\sigma_2/2} = e^{u\sigma_2/2}(m\gamma_0 - \kappa\gamma_2), \qquad (6.24)$$

which is consistent with the definition of κ (6.5). From Eq. (6.24) we find that

$$\tanh\left(\frac{u}{2}\right) = \frac{E' - m}{p_y + \kappa} = \frac{p_y - \kappa}{E' + m}, \qquad (6.25)$$

which completes the solution. For the incoming (growing) wave, we flip the sign of κ. We therefore define $u^\pm$ via

$$\tanh\left(\frac{u^\pm}{2}\right) = \frac{p_y \mp \kappa}{E' + m} \qquad (6.26)$$
and write the outgoing and incoming spin-up waves in the evanescent region as
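If we now consider matching at x = 0, the continuity equation becomes

$$\left[\cosh\tfrac{u}{2} + \sinh\tfrac{u}{2}\,\sigma_1 e^{i\sigma_3\phi}\right]T^\uparrow + \left[\cosh\tfrac{u}{2} - \sinh\tfrac{u}{2}\,\sigma_1 e^{-i\sigma_3\phi}\right]R^\uparrow = \left[\cosh\tfrac{u^+}{2} + \sinh\tfrac{u^+}{2}\,\sigma_2\right]T'^\uparrow + \left[\cosh\tfrac{u^-}{2} + \sinh\tfrac{u^-}{2}\,\sigma_2\right]R'^\uparrow. \qquad (6.28)$$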
On defining the matrix

$$B^+ = \begin{pmatrix}\cosh(u^+/2) & \cosh(u^-/2)\\ j\sinh(u^+/2) & j\sinh(u^-/2)\end{pmatrix}, \qquad (6.29)$$
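we can write Eq. (6.28) compactly as

$$A\begin{pmatrix}T^\uparrow\\ R^\uparrow\end{pmatrix} = B^+\begin{pmatrix}T'^\uparrow\\ R'^\uparrow\end{pmatrix}. \qquad (6.30)$$

Again, either of the matrices can be inverted to analyze various physical situations. For example, the case of total reflection by a step is handled by

$$A\begin{pmatrix}T^\uparrow\\ R^\uparrow\end{pmatrix} = B^+\begin{pmatrix}T'^\uparrow\\ 0\end{pmatrix}, \qquad (6.31)$$

from which one finds the reflection coefficient

$$r^\uparrow = -\frac{\tanh(u^+/2) + \tanh(u/2)\,je^{j\phi}}{\tanh(u^+/2) - \tanh(u/2)\,je^{-j\phi}}, \qquad (6.32)$$

which has $|r^\uparrow| = 1$, as expected. The subscripts on $u_1$, $u_1^\pm$, and $\phi_1$ are all obvious, and have been dropped. The case of spin down requires some sign changes. The spinors in the evanescent region are now given by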
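and, on defining

$$B^- = \begin{pmatrix}\cosh(u^-/2) & \cosh(u^+/2)\\ -j\sinh(u^-/2) & -j\sinh(u^+/2)\end{pmatrix}, \qquad (6.34)$$

the analog of Eq. (6.30) is

$$A^*\begin{pmatrix}T^\downarrow\\ R^\downarrow\end{pmatrix} = B^-\begin{pmatrix}T'^\downarrow\\ R'^\downarrow\end{pmatrix}. \qquad (6.35)$$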
These formulas are now applied to two situations of physical interest.

C. Spin Precession at a Barrier
When a monochromatic wave is incident on a single step of sufficient height that the wave cannot propagate, there is total reflection. In the
preceding section we found that the reflection coefficient for spin up is given by Eq. (6.32), and the analogous calculation for spin down yields

$$r^\downarrow = -\frac{\tanh(u^-/2) - \tanh(u/2)\,je^{-j\phi}}{\tanh(u^-/2) + \tanh(u/2)\,je^{j\phi}}. \qquad (6.36)$$

Both $r^\uparrow$ and $r^\downarrow$ are pure phases, but there is an overall phase difference between the two. If the rest-spin vector $s = \Phi\sigma_3\Phi^\dagger$ is not perpendicular to the plane of incidence, then this phase difference produces a precession of the spin vector. To see how, suppose that the incident wave contains an arbitrary superposition of spin-up and spin-down states,
where

$$\phi = \phi_1 - \phi_2, \qquad \epsilon = \phi_1 + \phi_2, \qquad (6.38)$$
and the final pure-phase term is irrelevant. After reflection, suppose that the separate up and down states receive phase shifts of $\delta^\uparrow$ and $\delta^\downarrow$, respectively. The Pauli spinor in the reflected wave is therefore

where

$$\delta = \delta^\uparrow - \delta^\downarrow, \qquad (6.40)$$
and again there is an irrelevant overall phase. The rest-spin vector for the reflected wave is therefore

$$s_r = \Phi_r\sigma_3\Phi_r^\dagger = e^{i\sigma_3\delta/2}\,s\,e^{-i\sigma_3\delta/2}, \qquad (6.41)$$
so the spin vector precesses in the plane of incidence through an angle $\delta^\uparrow - \delta^\downarrow$. If $\delta^\uparrow$ and $\delta^\downarrow$ are defined for the asymptotic (free) states, then this result for the spin precession is general. To find the precession angle for the case of a single step, we return to the formulas (6.32) and (6.36) and write

$$e^{j\delta} = r^\uparrow r^{\downarrow *} = \frac{\left(\tanh(u^+/2) + \tanh(u/2)je^{j\phi}\right)\left(\tanh(u^-/2) + \tanh(u/2)je^{j\phi}\right)}{\left(\tanh(u^+/2) - \tanh(u/2)je^{-j\phi}\right)\left(\tanh(u^-/2) - \tanh(u/2)je^{-j\phi}\right)}. \qquad (6.42)$$
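If we now recall that

$$\tanh\left(\frac{u}{2}\right) = \frac{p}{E+m}, \qquad \tanh\left(\frac{u^\pm}{2}\right) = \frac{p_y \mp \kappa}{E - eV + m}, \qquad (6.43)$$

we find that

$$e^{j\delta} = e^{2j\phi}\,\frac{m\cos\phi - jE\sin\phi}{m\cos\phi + jE\sin\phi}. \qquad (6.44)$$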
The remarkable feature of this result is that all dependence on the height of the barrier has vanished, so that the precession angle is determined solely by the incident energy and direction. To proceed we write

$$m\cos\phi - jE\sin\phi = \rho e^{j\alpha}, \qquad (6.45)$$

so that

$$\tan\alpha = -\cosh u\tan\phi. \qquad (6.46)$$

Equation (6.44) now yields

$$\tan\left(\frac{\delta}{2} - \phi\right) = -\cosh u\tan\phi, \qquad (6.47)$$

from which we obtain the final result that

$$\tan\left(\frac{\delta}{2}\right) = -\frac{(\cosh u - 1)\tan\phi}{1 + \cosh u\tan^2\phi}. \qquad (6.48)$$
A similar result for the precession angle of the rest-spin vector was obtained by Fradkin and Kashuba [49] using standard techniques. Readers are invited to compare their derivation with the present approach. The formula (6.48) agrees with Eq. (5.36) from Section V.C, since the angle θ employed there is related to the angle of incidence φ by

$$\theta = \pi - 2\phi. \qquad (6.49)$$
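The claim that the barrier height drops out of (6.44) can be checked numerically. The sketch below assumes natural units and represents j by `1j`. It evaluates $r^\uparrow$ from (6.32) and $r^\downarrow$ from (6.36), using the half-angle tangents of (6.43), for two different evanescent steps. It then compares $r^\uparrow r^{\downarrow *}$ with the closed form (6.44) and the half-angle result (6.48). The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def reflection_coefficients(m, E, eV, phi):
    """Total-reflection coefficients r_up, Eq. (6.32), and r_down, Eq. (6.36).

    Natural units; 1j represents j = i*sigma_3. The step must be evanescent,
    i.e. kappa^2 = p_y^2 + m^2 - (E - eV)^2 > 0, Eq. (6.5).
    """
    p = np.sqrt(E**2 - m**2)
    py = p * np.sin(phi)
    Ep = E - eV
    kap = np.sqrt(py**2 + m**2 - Ep**2)
    a = p / (E + m)                    # tanh(u/2), Eq. (6.43)
    b_plus = (py - kap) / (Ep + m)     # tanh(u^+/2), Eq. (6.43)
    b_minus = (py + kap) / (Ep + m)    # tanh(u^-/2), Eq. (6.43)
    e = np.exp(1j * phi)
    r_up = -(b_plus + a * 1j * e) / (b_plus - a * 1j / e)
    r_down = -(b_minus - a * 1j / e) / (b_minus + a * 1j * e)
    return r_up, r_down

def precession_closed_form(m, E, phi):
    """e^{j delta} from Eq. (6.44); note that eV does not appear."""
    z = m * np.cos(phi) - 1j * E * np.sin(phi)
    return np.exp(2j * phi) * z / np.conj(z)

m, E, phi = 1.0, 1.5, 0.6
for eV in (1.2, 2.0):                  # two different evanescent step heights
    r_up, r_down = reflection_coefficients(m, E, eV, phi)
    print(eV, abs(r_up), abs(r_down), r_up * np.conj(r_down))
print(precession_closed_form(m, E, phi))
```

Both step heights should give unit-modulus coefficients and the same product $r^\uparrow r^{\downarrow *}$. That product depends only on $E$ and $\phi$.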
Since the decomposition of the plane-wave spinor into a boost term and a Pauli spinor term is unique to the STA, it is not at all clear how the conventional approach can formulate the idea of the rest spin. In fact, the rest-spin vector is contained in the standard approach in the form of the "polarization operator" [49, 50], which, in the STA, is given by (6.50)
where n is a unit spatial vector. This operator is Hermitian, squares to 1, and commutes with the free-field Hamiltonian. If we consider a free-particle plane-wave state, then the expectation value of the d(σ_i) operator is
=
m EP2
-[L((+,*ppL+ (+,AppL>l‘S,
where L is the boost

$$L(p) = \frac{E + m + p}{\sqrt{2m(E + m)}} \qquad (6.52)$$

and p is replaced by its eigenvalue p. To manipulate Eq. (6.51), we use the facts that L commutes with p and satisfies

$$L^2 = \frac{E + p}{m} \qquad (6.53)$$
to construct
-m +-E2m
(E + m
+ p)cri(E+ m - p ) ] . (6.54)
Since only the relative vector part of this quantity is needed in Eq. (6.51), we are left with E - m ( 2 0 i . p p+ ( E + mI2ui - p u i p ) = 1 ( p 2 a i+ ( E + m)2(+i) 2E(E + m) 2Ep2 -
-
‘+,.
(6.55)
The expectation value of the “polarization” operators is therefore simply (6.56) which just picks out the components of the rest-spin vector, as claimed. For the case of the potential step, d(n) still commutes with the full Hamiltonian when n is perpendicular to the plane of incidence. In their
paper, Fradkin and Kashuba decompose the incident and reflected waves into eigenstates of d(n), which is equivalent to aligning the spin in the manner adopted in this section. As we have stressed, removing the boost and working directly with Φ simplifies many of these manipulations, and removes any need for the polarization operator.
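The boost identities (6.52)-(6.53) can be checked in the familiar 2 × 2 Pauli-matrix representation, where a relative vector $p$ corresponds to $p_k\hat\sigma_k$. The sketch assumes that representation and natural units. The helpers `vec` and `boost` are illustrative names, and the momentum is an arbitrary test value.

```python
import numpy as np

# Pauli matrices: the relative vectors sigma_1, sigma_2, sigma_3 in the 2x2 representation.
SIGMA = (np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex))
I2 = np.eye(2, dtype=complex)

def vec(p):
    """Represent the relative vector p = p_k sigma_k as a 2x2 matrix."""
    return sum(pk * s for pk, s in zip(p, SIGMA))

def boost(p, m):
    """L(p) = (E + m + p)/sqrt(2m(E + m)), Eq. (6.52), with E = sqrt(p^2 + m^2)."""
    E = np.sqrt(np.dot(p, p) + m**2)
    return ((E + m) * I2 + vec(p)) / np.sqrt(2 * m * (E + m)), E

p, m = np.array([0.3, -0.4, 1.2]), 1.0
L, E = boost(p, m)
print(np.allclose(L @ L, (E * I2 + vec(p)) / m))   # Eq. (6.53)
```

In this representation L is a function of $p$ alone, which is why it commutes with $p$. Its determinant is 1, as expected for a pure boost.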
D. Tunneling of Plane Waves

Suppose now that a continuous beam of plane waves is incident on a potential barrier of finite width. We know that, quantum mechanically, some fraction of the wave tunnels through to the other side. In Section VII we address the question, "How long does the tunneling process take?" To answer this we will need to combine plane-wave solutions to construct a wavepacket, so here we give the results for plane waves. The physical setup is illustrated in Fig. 3. The matching equation at the x = 0 boundary is, for spin up,
$$A\begin{pmatrix}T'\\ 0\end{pmatrix} = B^+\begin{pmatrix}T_1\\ R_1\end{pmatrix}, \qquad (6.57)$$
where the A and B+ matrices are as defined in (6.11) and (6.29), respectively. All subscripts can be dropped again, as A always refers to free space and B+ to the barrier region. The matching conditions at x = -d require the inclusion of suitable propagators, and the resulting equation is
Equation (6.58) shows that the relevant propagator matrix for evanescent waves is
0
epKd
(6.59)
The problem now reduces to the matrix equation
(6.60) from which the reflection and transmission coefficients are easily obtained.
FIGURE 3. Schematic representation of plane-wave tunneling (boundaries at x = −d and x = 0).
The applications to tunneling discussed in Section VII deal mainly with perpendicular incidence, so we now specialize to this situation. For perpendicular incidence we can set $u^+ = -u^- = u'$, where

$$\tanh\left(\frac{u'}{2}\right) = -\frac{\kappa}{E' + m}. \qquad (6.61)$$
It follows that the equations for spin up and spin down are the same, and we can remove the up-arrows from the preceding equations. Equation (6.57) now yields

$$\begin{pmatrix}T_1\\ R_1\end{pmatrix} = \frac{T'}{\sinh u'}\begin{pmatrix}\sinh(u'/2)\cosh(u/2) - j\cosh(u'/2)\sinh(u/2)\\ \sinh(u'/2)\cosh(u/2) + j\cosh(u'/2)\sinh(u/2)\end{pmatrix}, \qquad (6.62)$$

and from $T_1$ and $R_1$ the current in the evanescent region can be constructed. The ratio $J_x/J_0$ may be interpreted as defining a "velocity" inside the barrier. The consequences of this idea were discussed in [6], where it was concluded that the tunneling times predicted by this velocity are not related to tunneling times measured for individual particles. The reasons for this are discussed in Section VII. Multiplying out the matrices in Eq. (6.60) is straightforward, and yields
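$$\begin{pmatrix}e^{-jpd}\,T\\ e^{jpd}\,R\end{pmatrix} = \begin{pmatrix}\cosh(\kappa d) - j\sinh(\kappa d)(EE' - m^2)/(\kappa p)\\ -j\sinh(\kappa d)\,eVm/(\kappa p)\end{pmatrix}T', \qquad (6.63)$$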
which solves the problem. The transmission coefficient is

$$t = \frac{\kappa p\,e^{-jdp}}{\kappa p\cosh(\kappa d) - j(p^2 - eVE)\sinh(\kappa d)}, \qquad (6.64)$$

which recovers the familiar nonrelativistic formula in the limit E ≈ m.
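To make (6.64) concrete, the sketch below evaluates $r$ and $t$ for the parameters used in Fig. 4: a 5 eV electron and a 10 eV barrier of width 5 Å. It works in units with ħ = c = 1, converts lengths using the approximate value ħc ≈ 1973.27 eV Å, and rounds the electron rest energy to 511 keV. The value of $r$ is taken as the ratio of the two rows of (6.63), omitting the $e^{\pm jpd}$ factors, so only $|r|$ is meaningful. The sketch makes two checks:

- It checks current conservation, $|r|^2 + |t|^2 = 1$.
- It compares $|t|^2$ with the standard Schrödinger square-barrier result, which the relativistic formula should approach for E ≈ m.

The helper names are hypothetical.

```python
import numpy as np

HBAR_C = 1973.27   # hbar*c in eV*Angstrom (approximate)

def dirac_barrier(E, eV, d, m):
    """Perpendicular-incidence r (up to a phase) and t for a square barrier.

    t follows Eq. (6.64); |r| is the ratio of the two rows of Eq. (6.63).
    Energies in eV, width d in Angstrom; 1j plays the role of j = i*sigma_3.
    Assumes an evanescent barrier, |E - eV| < m.
    """
    Ep = E - eV
    p = np.sqrt(E**2 - m**2)
    kap = np.sqrt(m**2 - Ep**2)
    kd, pd = kap * d / HBAR_C, p * d / HBAR_C       # dimensionless kappa*d and p*d
    sh, ch = np.sinh(kd), np.cosh(kd)
    t = kap * p * np.exp(-1j * pd) / (kap * p * ch - 1j * (p**2 - eV * E) * sh)
    r = (-1j * sh * eV * m / (kap * p)) / (ch - 1j * sh * (E * Ep - m**2) / (kap * p))
    return r, t

def schrodinger_transmission(Ek, V0, d, m):
    """Standard nonrelativistic |t|^2 for a square barrier with Ek < V0."""
    k = np.sqrt(2 * m * Ek) / HBAR_C
    q = np.sqrt(2 * m * (V0 - Ek)) / HBAR_C
    return 1.0 / (1.0 + ((k**2 + q**2) * np.sinh(q * d))**2 / (4 * k**2 * q**2))

m = 511.0e3                                   # electron rest energy in eV (rounded)
r, t = dirac_barrier(m + 5.0, 10.0, 5.0, m)   # 5 eV kinetic energy, 10 eV barrier, 5 Angstrom
print(abs(r)**2 + abs(t)**2)                  # current conservation
print(abs(t)**2, schrodinger_transmission(5.0, 10.0, 5.0, m))
```

The two transmission probabilities differ only at relative order $E_{\rm kin}/m$. The transmitted fraction is of order $10^{-5}$, consistent with the magnified vertical scale used in Fig. 4.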
E. The Klein Paradox

In the Klein region, $eV - E > \sqrt{p_y^2 + m^2}$, traveling-wave solutions exist again. To find these we observe that plane-wave solutions must now satisfy

$$(p - eV\gamma_0)\psi = m\psi\gamma_0 \qquad (6.65)$$

and, as $p - eV\gamma_0$ has a negative time component, $\psi\tilde\psi$ must now be −1. We could achieve this flip by inserting a "β-factor" of the type described in Section III.C, but this would mix the rest-spin states. It is more convenient to work with solutions given by

(6.66)
and (-cos +m1 - sin +m2) uI@e-iu3(Ef-p~-P.~v)R, (6.67)
where the choice of $\sigma_1$ or $\sigma_2$ on the right-hand side of the boost is merely a phase choice. To verify that ψ' is a solution, we write the Dirac equation as
which holds provided that (6.69)
It follows that

$$m\cosh u = eV - E, \qquad m\sinh u = p. \qquad (6.70)$$
The current obtained from ψ' is found to be

$$\psi'\gamma_0\tilde\psi' = (eV - E)\gamma_0 + p_x\gamma_1 - p_y\gamma_2, \qquad (6.71)$$

which is future-pointing (as it must be) and points in the positive x direction. It is in order to obtain the correct direction for the current that the sign of $p_x$ is changed in (6.66) and (6.67). As has been pointed out by various authors [35, 51, 52], some texts on quantum theory miss this argument and match onto a solution inside the barrier with an incoming group velocity [34, 36]. The result is a reflection coefficient greater than 1. This
is interpreted as evidence for pair production, though in fact the effect is due to the choice of boundary conditions. To find the correct reflection and transmission coefficients for an outgoing current, we return to the matching equation which, for spin up, gives
(6.72)
This time we define the matrix

$$C_i = \begin{pmatrix}\sinh(u_i/2)e^{j\phi_i} & -\sinh(u_i/2)e^{-j\phi_i}\\ \cosh(u_i/2) & \cosh(u_i/2)\end{pmatrix}, \qquad (6.73)$$
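so that Eq. (6.72) becomes

$$A_i\begin{pmatrix}T_i\\ R_i\end{pmatrix} = C_{i+1}\begin{pmatrix}T_{i+1}\\ R_{i+1}\end{pmatrix}. \qquad (6.74)$$

It should be noted that

$$C_i = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}A_i. \qquad (6.75)$$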
The corresponding equation for spin down is simply (6.76)

The Klein "paradox" occurs at a single step, for which the matching equation is

$$A_1\begin{pmatrix}T\\ R\end{pmatrix} = C_2\begin{pmatrix}T'\\ 0\end{pmatrix}. \qquad (6.77)$$
Inverting the A,matrix yields cosh(u + u’)/2 sinh(u/2) sinh(u’/2)e2j+- cosh(d2) cosh(u’/2) (6.78)
from which the reflection and transmission coefficients can be read off. (The primed quantities relate to the barrier region, as usual.) In particular, for perpendicular incidence, we recover

$$r = -\frac{\cosh\frac12(u - u')}{\cosh\frac12(u + u')} \quad\text{and}\quad t = \frac{\sinh u}{\cosh\frac12(u + u')}, \qquad (6.79)$$

as found in [35]. The reflection coefficient is always ≤ 1, as it must be from current conservation with these boundary conditions. But, although a reflection coefficient ≤ 1 appears to ease the paradox, some difficulties remain. In particular, the momentum vector inside the barrier points in an opposite direction to the current. A more complete understanding of the Klein barrier requires quantum field theory since, as the barrier height is > 2m, we expect pair creation to occur. An indication that this must be the case comes from an analysis of boson modes based on the Klein-Gordon equation. There one finds that superradiance (r > 1) does occur, which has to be interpreted in terms of particle production. For the fermion case the resulting picture is that electron-positron pairs are created and split apart, with the electrons traveling back out to the left and the positrons moving into the barrier region. If a single electron is incident on such a step, then it is reflected and, according to the Pauli principle, the corresponding pair production mode is suppressed. A complete analysis of the Klein barrier has been given by Manogue [51], to which readers are referred for further details. Manogue concludes that the fermion pair production rate is given by
2 ITiI2,
-
(6.80)
TABLE VI
SUMMARY OF RESULTS FOR PLANE WAVES INCIDENT ON A POTENTIAL STEP^a
(upper/lower signs = spin up/down)

Matching matrices
  Traveling waves: $A = \begin{pmatrix}\cosh(u/2) & \cosh(u/2)\\ \sinh(u/2)e^{j\phi} & -\sinh(u/2)e^{-j\phi}\end{pmatrix}$
  Evanescent waves: $B^\pm = \begin{pmatrix}\cosh(u^\pm/2) & \cosh(u^\mp/2)\\ \pm j\sinh(u^\pm/2) & \pm j\sinh(u^\mp/2)\end{pmatrix}$
  Klein waves: $C = \begin{pmatrix}\sinh(u/2)e^{j\phi} & -\sinh(u/2)e^{-j\phi}\\ \cosh(u/2) & \cosh(u/2)\end{pmatrix}$
  $A^*$, $B^-$, $C^*$ for spin down.

Propagators

^a The waves travel in the x-y plane and the steps lie in the y-z plane. The matching matrices relate T and R on either side of a step.
where the integrals run over the available modes in the Klein region, and the sum runs over the two spin states. This formula gives a production rate per unit time, per unit area, and applies to any shape of barrier. The integrals in (6.80) are not easy to evaluate, but a useful expression can be obtained by assuming that the barrier height is only slightly greater than 2m,

$$eV = 2m(1 + \epsilon). \qquad (6.81)$$

Then, for the case of a single step, we obtain a pair production rate of (6.82)
to leading order in ε. The dimensional term is $m^3$, which, for electrons, corresponds to a rate of lo4* particles per second, per square meter. Such an enormous rate would clearly be difficult to sustain in any physically realistic situation! The results obtained in this section are summarized in Table VI.

VII. TUNNELING TIMES

In this section we study tunneling phenomena. We do so by setting up a wavepacket and examining its evolution as it impinges on a potential barrier. The packet splits into reflected and transmitted parts, and the streamlines of the conserved current show which parts of the initial packet end up being transmitted. The analysis can be used to obtain a distribution of arrival times at some fixed point on the far side of the barrier, which can be compared directly with experiment. The bulk of this section is concerned with packets in one spatial dimension, and compares our approach to other studies of the tunneling-time problem. The section ends with a discussion of the complications introduced in attempting two- or three-dimensional simulations. The study of tunneling neatly combines the solutions found in Section VI with the views on operators and the interpretation of quantum mechanics expressed in Section III. Tunneling also provides a good illustration of how simple it is to study electron physics via the Dirac theory once the STA is available.

A. Wavepacket Tunneling
In Section VI.D we studied tunneling of a continuous plane wave through a potential barrier. It was found that the growing and decaying waves in
the barrier region are given by Eqs. (6.27) for spin up and (6.33) for spin down. Restricting to the case of perpendicular incidence, the amplitudes of the reflected and transmitted waves are given by Eq. (6.62). It follows that, for arbitrary spin, the wavefunction in the barrier region is
+,I
[ (5) + ($)
= cosh
sinh
Q,
cr2u3@u3] e-Kxe-iu3Eta
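where

$$\alpha = \frac{T'}{\sinh u'}\left[\sinh\left(\frac{u'}{2}\right)\cosh\left(\frac{u}{2}\right) - i\sigma_3\cosh\left(\frac{u'}{2}\right)\sinh\left(\frac{u}{2}\right)\right] \qquad (7.2)$$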
and

$$\tanh\left(\frac{u'}{2}\right) = -\frac{\kappa}{E' + m} = -\frac{\kappa}{E - eV + m}, \qquad \kappa^2 = m^2 - E'^2. \qquad (7.3)$$
The current in the barrier region is
IT’I’ mK
+
$Iyot$, = y [ r n 2 e V c o s h ( 2 ~ x ) E’(p2- Eeu)
+~
(7.4) K ~ u - ,mKeV sinh(2~x)(iu,)s l y o ,
from which we can define a "velocity"

$$\frac{dx}{dt} = \frac{J\cdot\gamma^1}{J\cdot\gamma^0} = \frac{p\kappa^2}{m^2eV\cosh(2\kappa x) + E'(p^2 - EeV)}. \qquad (7.5)$$
In fact, the velocity (7.5) does not lead to a sensible definition of a tunneling time for an individual particle [6]. As we shall see shortly, an additional phenomenon underlies wavepacket tunneling, leading to much shorter times than those predicted from (7.5). To study wavepacket tunneling it is useful, initially, to simplify to a one-dimensional problem. To achieve this we must eliminate the transverse current in (7.4) by setting $s = \pm\sigma_1$. This is equivalent to aligning the spin vector to point in the direction of motion. (In this case there is no distinction between the laboratory and comoving spin.) With Φ chosen so that $s = \sigma_1$, it is now a simple matter to superpose solutions at t = 0 to construct a wavepacket centered to the left of the barrier and moving toward the barrier. The wavepacket at later times is then reassembled from the plane-wave states, whose time evolution is known. The density $J_0 = \gamma_0\cdot J$ can then be plotted as a function of time, and the result of such a simulation is illustrated in Fig. 4.
[Figure 4, panels a-d: time component of probability current versus x (Angstroms)]

FIGURE 4. Evolution of the density $J_0$ as a function of time. The initial packet (Fig. 4a) is a Gaussian of width Δk = 0.04 Å⁻¹ and energy 5 eV. The barrier starts at the origin and has width 5 Å and height 10 eV. Figures 4a, 4b, 4c, and 4d show the density profile at times −0.5 × s, −0.1 × s, 0.1 × s, and 0.5 × s, respectively. In all plots the vertical scale to the right of the barrier is multiplied by 10⁴ to enhance the features of the small, transmitted packet.
The Dirac current $J = \psi\gamma_0\tilde\psi$ is conserved even in the presence of an electromagnetic field. It follows that J defines a set of streamlines which never end or cross. Furthermore, the time component of the current is positive-definite, so the tangents to the streamlines are always future-pointing timelike vectors. According to the standard interpretation of quantum mechanics, $J_0(x, t)$ gives the probability density of locating a particle at position x at time t. But, considering a flux tube defined by adjacent streamlines, we find that

$$J_0(t_0, x_0)\,dx_0 = J_0(t_1, x_1)\,dx_1, \qquad (7.6)$$
where $(t_0, x_0)$ and $(t_1, x_1)$ are connected by a streamline. It follows that the density $J_0$ flows along the streamlines without "leaking" between them. So, in order to study the tunneling process, we should follow the streamlines from the initial wavepacket through spacetime. A sample set of these streamlines is shown in Fig. 5. A significant feature of this plot is that a continuously distributed set of initial input conditions has given rise to a disjoint set of outcomes (whether or not a streamline passes through the barrier). Hence the deterministic evolution of the wavepacket alone is able to explain the discrete results expected in a quantum measurement, and all notions of wavefunction collapse are avoided. This is of fundamental significance to the interpretation of quantum mechanics. Some consequences of this view for other areas of quantum measurement have been explored by Dewdney et al. [27] and Vigier et al. [26], though their work was founded in the Bohmian interpretation of nonrelativistic quantum mechanics. The results presented here are, of course, independent of any interpretation: we do not need the apparatus of Bohm/de Broglie theory in order to accept the validity of predictions obtained from the current streamlines.

[Figure 5: streamlines plotted against time (s)]

FIGURE 5. Particle streamlines for the packet evolution shown in Fig. 4. Only the streamlines from the very front of the packet cross the barrier, with the individual streamlines slowing down as they pass through.

The second key feature of the streamline plot in Fig. 5 is that it is only the streamlines starting near the front of the initial wavepacket that pass through the barrier. Relative to the center of the packet, they therefore have a "head start" in their arrival time at some chosen point on the far side of the barrier. Over the front part of the barrier, however, the streamlines slow down considerably, as can be seen by the change in their slope. These two effects, of picking out the front end of the packet and then slowing it down, compete against each other, and it is not immediately obvious which dominates. To establish this, we return to Fig. 4 and look at the positions of the wavepacket peaks. At t = 0.5 × s, the peak of the transmitted packet lies at x = 70 Å, whereas the peak of the initial packet would have been at x = 66 Å had the barrier not been present. In this case, therefore, the peak of the transmitted packet is slightly advanced, a phenomenon often interpreted as showing that tunneling particles speed up, sometimes to velocities greater than c [53]. The plots presented here show that such an interpretation is completely mistaken.
There is no speeding up: all that happens is that only the streamlines from the front of the wavepacket cross the barrier (slowing down in the process), and these reassemble to form a localized packet on the far side. That tunneling particles may be transmitted faster than free particles is due entirely to the spread of the initial wavepacket.

There is considerable interest in the theoretical description of tunneling processes because it is now possible to obtain measurements of the times involved. The clearest experiments conducted to date have concerned photon tunneling [54], where an ingenious two-photon interference technique is used to compare photons that pass through a barrier with photons that follow an unobstructed path. Discussions of the results of photon tunneling experiments usually emphasize packet reshaping, but miss the arguments about the streamlines. Thus many articles concentrate on a comparison of the peaks of the incident and transmitted wavepackets and
discuss whether the experiments show particles traveling at speeds > c [53, 55]. As we have seen, a full relativistic study of the streamlines followed by the electron probability density shows clearly that no superluminal velocities are present. The same result is true for photons, as we will discuss elsewhere.

Ever since the possibility of tunneling was revealed by quantum theory, people have attempted to define how long the process takes. Reviews of the various different approaches to this problem have been given by Hauge and Stovneng [56] and, more recently, by Landauer and Martin [57]. Most approaches attempt to define a single tunneling time for the process, rather than a distribution of possible outcomes as is the case here. Quite why one should believe that it is possible to define a single time in a probabilistic process such as tunneling is unclear, but the view is still regularly expressed in the modern literature. A further flaw in many other approaches is that they attempt to define how long the particle spent in the barrier region, with answers ranging from the implausible (zero time) to the utterly bizarre (imaginary time). From the streamline plot presented here, it is clearly possible to obtain a distribution of the times spent in the barrier for the tunneling particles, and the answers will be relatively long as the particles slow down in the barrier. But such a distribution neglects the fact that the front of the packet is preferentially selected, and anyway does not appear to be accessible to direct experimental measurement. As the recent experiments show [54], it is the arrival time at a point on the far side of the barrier that is measurable, and not the time spent in the barrier.

B. Two-Dimensional Simulations
In the preceding section we simplified the problem in two ways: by assuming perpendicular incidence, and by aligning the spin vector in the direction of motion. For other configurations more complicated two-dimensional or three-dimensional simulations are required. As well as the obvious numerical complications introduced, there are some further difficulties. For the 1-D plots just shown, there was no difficulty in deciding which part of the wavepacket was transmitted and which was reflected, since the bifurcation point occurred at some fixed value of x. For 2-D or 3-D simulations, however, the split of the initial packet into transmitted and reflected parts occurs along a line or over a 2-D surface. Furthermore, this split is spin-dependent: if one constructs a moving wavepacket, one finds that the streamlines circulate around the spin axis [29]. (A similar
circulation phenomenon is found in the ground state of the hydrogen atom [6].) The full picture of how the packet behaves is therefore quite complicated, though qualitatively it is still the front portion of the packet that is transmitted. The results of a 2-D simulation are shown in Fig. 6, and show a number of interesting features. For example, it is the part of the wavepacket that "spins" into the barrier that is predominantly responsible for the transmitted wavepacket. The significance of the spin in the barrier region was clear from Eq. (7.4), which showed that the spin vector generates a transverse current in the barrier region. These transverse currents are clearly displayed in Fig. 6. The motion near the barrier is highly complex, with the appearance of current loops suggesting the formation of vortices. Similar effects have been described by Hirschfelder et al. [58] in the context of the Schrödinger theory. The streamline plots again show a slowing down in the barrier, which offsets the fact that it is the front of the packet that crosses the barrier.
VIII. SPIN MEASUREMENTS

We now turn to a second application of the local observables approach to quantum theory, namely, to determine what happens to a wavepacket when a spin measurement is made. The first attempts to answer this question were made by Dewdney et al. [27, 31], who used the Pauli equation for a particle with zero charge and an anomalous magnetic moment to provide a model for a spin-½ particle in a Stern-Gerlach apparatus. Written in the STA, the relevant equation is

$$\partial_t\Phi\, i\sigma_3 = -\frac{1}{2m}\boldsymbol{\nabla}^2\Phi - \mu B\Phi\sigma_3, \qquad (8.1)$$
and the current employed by Dewdney et al. is

$$J = -\frac{1}{m}\sigma_k\langle\partial_k\Phi\, i\sigma_3\Phi^\dagger\rangle. \qquad (8.2)$$
[Figure 6, panels a-d: streamlines in the x-y plane (a, b); the same streamlines plotted as x against t (c) and y against t (d)]

FIGURE 6. A two-dimensional simulation. The simulation uses a 2-D wavepacket in the x-y plane of width Δk = 0.04 Å⁻¹ and energy 5 eV. The packet is incident perpendicular to the barrier, and the spin vector lies in the +z direction. The barrier is at x = 0 and has width 2.5 Å and height 10 eV. Figures 6a and 6b show streamlines in the x-y plane. Figure 6a shows streamlines about the bifurcation line, illustrating that the left side of the packet, which "spins" into the barrier, is preferentially transmitted. Figure 6b shows streamlines for a set of points near the front of the packet with the same x value. Again, it is the left side of the packet that is transmitted. The reflected trajectories show complex behavior, and in both plots the effect of the transverse current in the barrier is clear. The t-dependence of the streamlines from Fig. 6b is shown in Figs. 6c and 6d. Figure 6c shows the streamlines in t-x space. Since the streamlines were started from the same x position and different y positions, the streamlines start from the same point and then spread out as the individual lines evolve differently. Again, it is possible to infer that the streamlines slow down as they pass through the barrier. Figure 6d shows the t-y evolution of the same streamlines. In all plots distance is measured in Å and time in units of s.

Dewdney et al. parameterize the Pauli spinor Φ in terms of a density and three "Euler angles." In the STA, this parameterization takes the transparent form

$$\Phi = \rho^{1/2}e^{i\sigma_3\phi/2}e^{i\sigma_1\theta/2}e^{i\sigma_3\chi/2}, \qquad (8.3)$$

where the rotor term is precisely that needed to parameterize a rotation in terms of the Euler angles. With this parameterization, it is a simple matter to show that the current becomes

$$J = \frac{\rho}{2m}\left(\boldsymbol{\nabla}\chi + \cos\theta\,\boldsymbol{\nabla}\phi\right). \qquad (8.4)$$
But, as was noted in Section IV.A, the current defined by Equation (8.2) is not consistent with that obtained from the Dirac theory through a nonrelativistic reduction. In fact, the two currents differ by a term in the curl of the spin vector [6, 281. To obtain a fuller understanding of the spin measurement process, an analysis based on the Dirac theory is required. Such an analysis is presented here. As well as dealing with a well-defined current, basing the analysis in the Dirac theory is important if one intends to proceed to study correlated spin measurements performed over spacelike intervals (i.e., to model an EPR-type experiment). To study such systems it is surely essential that one employs relativistic equations so that causality and the structure of spacetime are correctly built in.
A . A Relativistic Model of a Spin Measurement
As is shown in Section IV.C, the modified Dirac equation for a neutral particle with an anomalous magnetic moment p is Vq!iiu3- ipFJiy, = m$yo.
(8.5)
This is the equation we use to study the effects of a spin measurement, and it is not hard to show that Eq. (8.5) reduces to (8.1) in the nonrelativistic limit. Following Dewdney et al. [3 11, we model the effect of a spin measurement by applying an impulsive magnetic field gradient, F = BzS(t)ir3.
(8.6)
The other components of B are ignored, as we are only modeling the behaviour of the packet in the z direction. Around t = 0, Eq. (8.5) is approximated by
a, JI ir, = A P Z W Y , $Y3
3
(8.7)
where Ap = @.
(8.8)
To solve (8.7) we decompose the initial spinor $o into
JIT $(GO - 73q!iOy3),
q!i*
i($O + Y3$0?3).
(8.9)
343
SPACETIME ALGEBRA AND ELECTRON PHYSICS
Equation (8.7) now becomes, for $ 7 ,
a,$ T = ApzS(t)JIt iv3 (8.10) with the opposite sign for JI i . The solution is now straightforward, as the impulse just serves to insert a phase factor into each of JI T and I) : $T +tei~jApz, +&+ q , J e - i q A ~ z + (8.11) If we now suppose that the initial JI consists of a positive-energy j
plane-wave
Jlo = L(p)&is3(P'X-E') then, immediately after the shock, JI is given by
(8.12)
q, = $ T e i u 3 @ . x + A ~ z )+ q, 1eiu-,@'x-A~z)
(8.13)
The spatial dependence of )I is now appropriate to two different values of the 3-momentum, p t and p i , where pJ "p-hpu,.
p"p+Apc+3,
(8.14)
The boost term L(p) corresponds to a different momentum, however, so both positive and negative frequency waves are required for the future evolution. After the shock, the wavefunction therefore propagates as
q, =
,,,'
e-iu3pt.x +
$1 e i u 3 F f . x + +!
e - i u j p l . x + q,!
eiu3Fi.x
,
(8.15)
where (8.16) (8.17)
(8.18)
Both p i and EL are defined similarly. Each term in (8.15) must separately satisfy the free-particle Dirac equation, so it follows that P W
= mq,!Yo
(8.19)
= m+f y o ,
(8.20)
+ q,!.
(8.21)
and -
-pT$1
which are satisfied together with
q,'
= q,!
344
CHRIS DORAN ET AL.
The same set of equations hold for $ $. Dropping the arrows, we find that (8.22) and
(8.23) which hold for both $ and $ $ . The effect of the magnetic shock on a monochromatic wave is to split the wave into four components, each with a distinct momentum. The positive-frequency waves are transmitted by the device and split into two waves, whereas the negative-frequency states are reflected. The appearance of the antiparticle states must ultimately be attributed to pair production, and becomes significant only for large B fields. We examine this effect after looking at more physical situations. B . Wavepacket Simulations
For computational simplicity we take the incident particle to be localized along the field direction only, with no momentum components transverse to the field. This reduces the dimensionality of the problem to one spatial coordinate and the time coordinate. This was the setup considered by Dewdney et al. [31] and is sufficient to demonstrate the salient features of the measurement process. The most obvious difference between this model and a real experiment where the electron is moving is that, in our model, all four packets have group velocities along the field direction. The initial packet is built up from plane-wave solutions of the form $ = eu~j/2@ eiu,(Pz-EO,
(8.24)
which are superposed numerically to form a Gaussian packet. After the impulse, the future evolution is found from Eq. (8.15), and the behaviour of the spin vector and the streamlines can be found for various initial values of @. The results of these simulations are plotted on the next few pages. In Figs. 7 and 8 we plot the evolution of a packet whose initial spin vector points in the u, direction (@ = e~p{-ic+~rr/4}). After the shock, the density splits neatly into two equal-sized packets, and the streamlines bifurcate at the origin. As with the tunneling simulations, we see that disjoint quantum outcomes are entirely consistent with the causal wavepacket evolution defined by the Dirac equation. The plot of the spin vector
Robability density in laboratory system
I
b FIGURE 7. Splitting of a wavepacket caused by an impulsive B-field. The initial packet kg m s-I in momentum space, and receives an impulse of A p = 1 x has a width of 1 x lodz3kg m s-l. Figure 7a shows the probability density J,, at t = 0, 1.3, 2.6, 3.9 x s, with r increasing up the figure. Figure 7b shows streamlines in the ( 1 , z ) plane.
346
CHRIS DORAN ET AL.
sAyo shows that immediately after the shock the spins are disordered,
but after a little time they sort themselves into one of the two packets, with the spin vector pointing in the direction of motion of the deflected packet. These plots are in good qualitative agreement with those obtained by Dewdney et al. [31], who also found that the choice of which packet a streamline enters is determined by its starting position in the incident wavepacket. Figure 9 shows the results of a similar simulation, but with the initial spinor now containing unequal amounts of spin-up and spin-down compo-
SPACETIME ALGEBRA AND ELECTRON PHYSICS
347
nents. This time we observe an asymmetry in the wavepacket split, with more of the density traveling in the spin-up packet. It is a simple matter to compute the ratio of the sizes of the two packets, and to verify that the ratio agrees with the prediction of standard quantum theory. As a final, novel illustration of our approach, we consider a strong shock applied to a packet which is already aligned in the spin-up direction. For a weak shock the entire packet is deflected, but if the shock is sufficiently strong that the antiparticle states have significant amplitude, we find that a second packet is created. The significant feature of Fig. 10 is that the antiparticle states are deflected in the opposite direction, despite the fact that their spin is still oriented in the +z direction. The antiparticle states thus behave as if they have a magnetic moment-to-mass ratio of opposite sign. A more complete understanding of this phenomenon requires a field-theoretic treatment. The appearance of antiparticle states would then be attributed to pair production, with the antiparticle states having the same magnetic moment but the opposite spin. (One of the crucial effects of the field quantisation of fermionic systems is to flip the signs of the charges and spins of antiparticle states.) The conclusions reached in this section are in broad agreement with those of Dewdney et al. From the viewpoint of the local observables of the Dirac wavefunction (the current and spin densities), a Stern-Gerlach apparatus does not fulfil the role of a classical measuring device. Instead, it behaves much more like a polarizer, where the ratio of particles polarized up and down is dependent on the initial wavefunction. The B field dramatically alters the wavefunction and its observables, though in a causal manner that is entirely consistent with the predictions of standard quantum theory. 
The implications of these observations for the interpretation of quantum mechanics are profound, though they are only slowly being absorbed by the wider physics community. (Some of these issues are debated in the collection of essays entitled Quantum Implications [59] and in the recent book by Holland [321.)
IX. THEMULTIPARTICLE STA
So far we have dealt with the application of the STA to single-particle quantum theory. In this section we turn to multiparticle theory. The aim here is to develop the STA approach so that it is capable of encoding multiparticle wavefunctions , and describing the correlations between them. Given the advances in clarity and insight that the STA brings to single-particle quantum mechanics, we expect similar advances in the
Probability density in laboratory system
z I hgstrorn
a Streamlines 3
2
1
E
:0
.
1
0
N
-1
Y -I/
-3
I
0
0.25
0.5
0.75
1
1.25
1.5
1.75
,
Time I lO-"s
b FIGURE9. Splitting of a wavepacket with unequal mixtures of spin-up and spin-down
components. The initial packet has = 1.618 - iu2,so more of the streamlines are deflected upwards, and the bifurcation point lies below the z = 0 plane. The evolution of the spin vector is shown in Figure 9c.
SPACETIME ALGEBRA AND ELECTRON PHYSICS
349
multiparticle case. This is indeed what we have found, although the field is relatively unexplored as yet. Here we highlight some areas where the multiparticle STA promises a new conceptual approach, rather than attempting to reproduce the calculational techniques employed in standard approaches to many-body or many-electron theory. In particular, we concentrate on the unique geometric insights that the multiparticle STA provides-insights that are lost in the matrix theory. A preliminary introduction to the ideas developed here was given in [4], though this is the first occasion that a full relativistic treatment has been presented.
Probability density in laboratory system
z I hgshlirn
a Streamlines
Time / lo-’*s
b FIGURE10. Creation of antiparticle states
by a strong magnetic shock. The impulse used is Ap = 1 X kg m s-’ and the initial packet is entirely spin-up (a= 1). The packet travelling to the left consists of negative energy (antiparticle) states.
SPACETIME ALGEBRA AND ELECTRON PHYSICS
351
The n-particle STA is created simply by taking n sets of basis vectors labels the particle space, and imposing the geometric algebra relations
{y2, where the superscript
. .
Y L Y i + Y$Y;rZ = 0 .
.
. .
r;rZrJ,+ yJ,y;rZ=
2qpv,
i#j i =j.
(9.1)
These relations are summarized in the single formula
The fact that the basis vectors from distinct particle spaces anticommute means that we have constructed a basis for the geometric algebra of a 4n-dimensional configuration space. There is nothing uniquely quantum mechanical in this idea-a system of three classical particles could be described by a set of three trajectories in a single space, or one path in a 9-dimensional space. The extra dimensions serve simply to label the properties of each individual particle, and should not be thought of as existing in anything other than a mathematical sense. This construction enables us, for example, to define a rotor which rotates one particle while leaving all the others fixed. The unique feature of the multiparticle STA is that it implies a separate copy of the time dimension for each particle, as well as the three spatial dimensions. To our knowledge, this is the first attempt to construct a solid conceptual framework for a multitime approach to quantum theory. Clearly, if successful, such an approach will shed light on issues of locality and causality in quantum theory. The {y;} serve to generate a geometric algebra of enormously rich structure. Here we illustrate just a few of the more immediate features of this algebra. It is our belief that the multiparticle STA will prove rich enough to encode all aspects of multiparticle quantum field theory, including the algebra of the fermionic creatiodannihilation operators. Throughout, Roman superscripts are employed to label the particle space in which the object appears. So, for example, JI' and J12 refer to two copies of the same 1-particle object JI, and not to separate, independent objects. Separate objects are given distinct symbols, or subscripts if they represent a quantity such as the current or spin vector, which are vectors in configuration space with different projections into the separate copies of the STA. The absence of superscripts denotes that all objects have been collapsed into a single copy of the STA. 
As always, Roman and Greek subscripts are also used as frame indices, though this does not interfere with the occasional use of subscripts to determine separate projections.
352
CHRIS DORAN ET AL.
A . Two-Particle Pauli States and the Quantum Correlator
As an introduction to the properties of the multiparticle STA, we first consider the 2-particle Pauli algebra and the spin states of pairs of spin4 particles. As in the single-particle case, the 2-particle Pauli algebra is just a subset of the full 2-particle STA. A set of basis vectors is defined by 1(Ti
- YiYO
1 1
(9.3)
2 2
(9.4)
and 2ui
- YiYOi
So, in constructing multiparticle Pauli states, the basis vectors from different particle spaces commute rather than anticommute. Using the elements (1, iui, ici, i u f i u i } as a basis, we can construct 2-particle states. Here we have introduced the abbreviation (9.6)
iuf3 i1C;
since, in most expressions, it is obvious which particle label should be attached to the i. In cases where there is potential for confusion, the particle label is put back on the i. The basis set ( I , ia;, iui , iuf i c i } spans a 16-dimensionalspace, which is twice the dimension of the direct product space of two 2-component complex spinors. For example, the outer-product space of two spin4 states can be built from complex superpositions of the set
which forms a 4-dimensional complex space (8 real dimensions). The dimensionality has doubled because we have not yet taken the complex structure of the spinors into account. While the role of j is played in the two single-particle spaces by right multiplication by iui and iu: , respectively, standard quantum mechanics does not distinguish between these operations. A projection operator must therefore be included to ensure that right multiplication by iu: or ia: reduces to the same operation. If a 2-particle spin state is represented by the multivector then $must satisfy
+,
SPACETIME ALGEBRA AND ELECTRON PHYSICS
353
from which we find that $ j
=
-giu:ia:
+ = $6(1 - iaiiu;).
(9.9)
On defining E
= $(I
- iuiiui),
(9.10)
we find that (9.11)
E2= E,
so right multiplication by E is a projection operation. (The relation E 2 = E means that E is technically referred to as an “idempotent” element.) It follows that the 2-particle state 9 must contain a factor of E on its righthand side. We can further define J = Eiai = Eiu: = &(iu:+ i a ? )
(9.12)
J 2 = -E.
(9.13)
so that Right-sided multiplication by J takes on the role ofjfor multiparticle states. The STA representation of a direct-product 2-particle Pauli spinor is now given by $1+2E,where and +2 are spinors (even multivectors) in their own spaces. A complete basis for 2-particle spin states is provided by
(A) (0) @
(9.14)
* -iuiE
This procedure extends simply to higher multiplicities. All that is required is to find the “quantum correlator” En satisfying E,,ia< = E,,iu: = J,,
for allj , k.
(9.15)
354
CHRIS DORAN ET AL.
En can be constructed by picking out the j all the other spaces to this, so that E,, =
nt(1 -
=
1 space, say, and correlating
iu$iu<).
(9.16)
j=2
The value of En is independent of which of the n spaces is singled out and correlated to. The complex structure is defined by J,, = E,,iw4,
(9.17)
where i d , can be chosen from any of the n spaces. To illustrate this consider the case of n = 3 , where E3 = $(l - iu$im$)(l- iviiv:) = +(I - iu;icr$- icriicr: - icriicr:)
(9.18) (9.19)
and
J3 = + ( i r : + i c : + ic: - icriic$iui).
(9.20)
Both E3 and J3 are symmetric under permutations of their indices. A significant feature of this approach is that all the operations defined for the single-particle STA extend naturally to the multiparticle algebra. The reversion operation, for example, still has precisely the same definition-it simply reverses the order of vectors in any given multivector. The spinor inner product (3.12) also generalizes immediately, to ($9
4)s = (En)-’[($’+>- ($’+ J,,)i~31,
(9.21)
where the right-hand side is projected onto a single copy of the STA. The factor of (En)-’is included so that the state “1” always has unit norm, which matches with the inner product used in the matrix formulation. 1. The Nonrelativistic Singlet State As an application of the formalism outlined above, consider the 2-particle singlet state Is), defined by (9.22) This is represented in the 2-particle STA by the multivector (9.23)
SPACETIME ALGEBRA AND ELECTRON PHYSICS
The properties of
E
355
are more easily seen by writing
s. = t(1
+ icr;iuZ)f(l+ icr~iC+:>V%~,
which shows how E contains the commuting idempotents f ( 1 and t(l + ic$u:). The normalization ensures that ( E , E)S
(9.24)
+ icr$cr:)
= 2(Et&) = 4(+(1
+ icrhb;)$(l + b i b ! ) )
(9.25)
= 1.
The identification of the idempotents in results that i u : E = t(icri -
E
leads immediately to the
icr:>+(1+ iciiu!)V5icri= -ia;.s
(9.26) (9.27)
and hence that
TIE = icrliu!& = -icriicri& = iu;iu:&= -icrie.
(9.28)
If M ' is an arbitrary even element in the Pauli algebra ( M = M o Mkicri), it follows that E satisfies M ' E = MZt&.
+
(9.29)
This now provides a novel demonstration of the rotational invariance of E. Under a joint rotation in 2-particle space, a spinor $ transforms to R 'R2$, where R and R are copies of the same rotor but acting in the two different spaces. From Eq. (9.29) it follows that, under such a rotation, E transforms as
'
H
so that
E
R'R% = R ' R ' ~ = &, ~
(9.30)
is a genuine 2-particle scalar.
2. Nonrelativistic Multiparticle Observables Multiparticle observables are formed in the same way as for single-particle states. Some combination of elements from the fixed (-8 frames is sandwiched between a multiparticle wavefunction $ and its spatial reverse Jlt. An important example of this construction is provided by the multiparticle spin vector. In the matrix formulation, the kth component of the particle1 spin vector is given by
s,, = (JlImA
(9.31)
356
CHRIS DORAN ET AL.
which has the STA equivalent
s,,
=
2n-'((~tcr~~ - a(Jlticr:+)ia3) $
= -2n-'(icr~~icrJI +) = -2qicr;)
*
(9.32)
(GJJlt).
Clearly, the essential quantity is the bivector part of + J J l t , which neatly generalises the single-particle formula. If we denote the result of projecting out from a multivector M the components contained entirely in the ithwe can then write particle space by
(w',
s: = 2"-'($JJlt);.
(9.33)
The various subscripts and superscripts deserve some explanation. On both sides of Eq. (9.33), the superscript a labels the copy of the STA of interest. The subscript on the right-hand side, as usual, labels the fact that we are projecting out the grade-2 components of some multivector. The subscript a on the left-hand side is necessary to distinguish the separate projections of JlJ$+. Had we not included the subscript, then S' and S2 would refer to two copies of the same bivector, whereas St and S: are different bivectors with different components. The reason for including both the subscript and the superscript on S; is that we often want to copy the individual bivectors from one space to another, without changing the components. We can hold all of the individual S: bivectors in a single multiparticle bivector defined by
s = 2"4(+JJlt), .
(9.34)
-
Under ajoint rotation in n-particle space, Jl transforms to R , . * RnJland S therefore transforms to Rl . . . R n S R n t . . . R ' t = R ' S 11 R ' f + ... +RnS;Rnt. (9.35) Each of the separate projections of the spin current is therefore rotated by the same amount, in its own space. That the definition (9.34) is sensible can be checked with the four basis states (9.14). The form of S for each of these is contained in Table VII. Multiparticle observables for the 2particle case are discussed further below. Other observables can be formed using different fixed multivectors. For example, a 2-particle invariant is generated by sandwiching a constant rnultivector Z between the singlet state E ,
M=
(9.36)
SPACETIME ALGEBRA AND ELECTRON PHY SlCS
357
TABLE VII
SPINCURRENTS FOR 2-PARTICLE PAULISTATES Pauli state
Multivector form
Spin current
Taking I: = 1 yields
M
+ icr;iu;)+(I + icriiu:) + icr!io:+ icriicr; + icr$cri),
= &ist= 2$(1
= $(I
(9.37)
which rearranges to give iu:icr; = 2&Et- 1.
(9.38)
This equation contains the essence of the matrix result cigad?;b’ = 2 6 3 : 1 - 6:A8*,
(9.39)
where a, b, a‘, 6‘ label the matrix components. This matrix equation is now seen to express a relationship between 2-particle invariants. Further invariants are obtained by taking Z = i’iz, yielding
M = &i’i2Et
= &‘i2
+ utcr: + u:u; + cr;cr:).
(9.40)
This shows that both and uLcr; are invariants under 2-particle rotations. In standard quantum mechanics these invariants would be thought of as arising from the “inner product” of the spin vectors cij and ci;. Here, we have seen that the invariants arise in a completely different way by looking at the full multivector E E ? . The contents of this section should have demonstrated that the multiparticle STA approach is capable of reproducing most (if not all) of standard multiparticle quantum mechanics. One important result that follows is that the unit scalar imaginaryj can be completely eliminated from quantum mechanics and replaced by geometrically meaningful quantities. This should have significant implications for the interpretation of quantum mechanics.
358
CHRIS DORAN ET AL.
B . Comparison with the “Causal” Approach to Nonrelativistic Spin States As an application of the techniques outlined above, we look at the work of Holland on the “causal interpretation of a system of two spin-; particles’’ [60]. This work attempts to give a nonrelativistic definition of local observables in the higher-dimensional space of a 2-particle wavefunction. As we have seen, such a construction appears naturally in our approach. Holland’s main application is to a Bell inequality-type experiment, with spin measurements carried out on a system of two correlated spin4 particles by Stern-Gerlach experiments at spatially separated positions. Such an analysis, though interesting, will be convincing only if carried out in the fully relativistic domain, where issues of causality and superluminal propagation can be coherently addressed. We intend to carry out such an analysis in the future, using the STA multiparticle methods, and the work below on the observables of a 2-particle system can be seen as part of this aim. The aspect of Holland’s work that concerns us here (his Section 3 and Appendix A) deals with the joint spin space of a system of two nonrelativistic spin-t particles. The aim is to show that “all 8 real degrees of freedom in the two body spinor wavefunction may be interpreted (up to a sign) in terms of the properties of algebraically interconnected Euclidean tensors” [60]. Holland’s working is complex and requires a number of index manipulations and algebraic identities. Furthermore, the meaning of the expressions derived is far from transparent. Using the above techniques, however, the significant results can be derived more efficiently and in such a way that their geometric meaning is made much clearer. Rather than give a line-by-line translation of Holland’s work, we simply state the key results in our notation and prove them. 
Let $ = $E be a 2-particle spinor in the correlated product space of the I-particle spin spaces (the even subalgebra of the Pauli algebra). The observables of this 2-particle system are formed from projections of bilinear products of the form $r$, where r is an element of the 2-particle Pauli spinor algebra. For example, the two 3-dimensional spin vectors, sl , s: , are defined by is: + is; = 241J $ ,
(9.41)
where the right-hand side can be written in the equivalent forms $ J $ = $iv:$= $ia:$.
(9.42)
The formula (9.41) is a special case of Eq. (9.34), where, as we are working in a 2-particle system, the projection onto bivector parts is not required.
359
SPACETIME ALGEBRA AND ELECTRON PHYSICS
The vectors s! and s: correspond to the two spin vectors defined by Holland, with the explicit correspondence to his s,k and S Z k given by
s,k = - ( i s ! ) .(iu;)
and
S2k
= -(is:) * (iu:).
(9.43)
An important relation proved by Holland is that the vectors si and s: are of equal magnitude, (9.44)
( s l y = ($)2 = 2 R - p2,
where P
=2(Wt)
(9.45)
and an explicit form for fl is to be determined. To prove this result, we write the formulas for the components of si and s: (9.43) in the equivalent forms Slk = - 2 ( $ t i u L $ ) - ( i ~ 9
and
Szk = --2($+iu:$)*(iu3.
(9.46) But, in both cases, the term $tiuz$ contains a bivector sandwiched between two idempotents, so is of the form E . . - E . This sandwiching projects out the iu! and iu: components of the full bivector, and ensures that the terms have equal magnitude. The inner products in (9.46) can therefore be dropped and we are left with $
$ = s,kJ,
(9.47)
where the a = 1 , 2 labels the two separate spin vectors. It follows immediately from Eq. (9.47) that ( S y = -2(
(9.48)
tiat $$ tiu;$),
where the a's are not summed. But the quantity $$ contains only scalar and 4-vector components, so we find that iu:$+'iu:
=
iu:$$"iu:= -3($$+)
+ ($$t)4
=
$$i - 2p.
(9.49)
(This result follows immediately from u&zWk = -iu, which is valid for any vector u in the single-particle Pauli algebra.) Inserting Eq. (9.49) back into (9.48), we can now write (Si)2 = (s:)2
= 2p2 - 2($$+$$t),
(9.50)
which shows that s l and s: are indeed of equal magnitude, and enables us to identify R as
n = ;(3p* - 2($J,t$$t)).
(9.51)
360
CHRIS DORAN ET AL.
In addition top, si and si ,Holland defines a tensor Sjkwhose components are given by sjk
= -2(~,~icrficr;1/.1).
(9.52)
This object has the simple frame-free form T = (9J,Y.I.
(9.53)
Among them, p, s! , si and Tpick up 7 of the possible 8 degrees of freedom in t,!J. The remaining freedom lies in the phase, since all of the observables defined above are phase-invariant. Encoding this information caused Holland some difficulty, but in the STA the answer is straightforward, and is actually already contained in the above working. The crucial observation is that, as well as containing only scalar and 4-vector terms, the quantity t,!Jtt,!J is invariant under rotations. So, in addition to the scalar p, the 4vector components of J, tJ, must pick up important rotationally invariant information. Furthermore, since is of the form E ' * E, the 4-vector component of GtJ, contains only two independent terms, which can be taken as a complex combination of the icr$cr; term. This is seen most clearly using an explicit realisation of J,. Suppose that we write t,!J = ( p - iu;q - iw$r + icr:icr;s)E,
(9.54)
where p, q , r, and s are complex combinations of 1 and J . We then find that J , ~ J= , [ p + 2icriiu3ps - qr)]E.
(9.55)
This shows explicitly that the additional complex invariant is given by p s - qr. It is this term that picks up the phase of J,, and we therefore define the complex quantity a = (+icriicr;~,~) - ($iaiicr:+t)icr3,
(9.56)
which is the STA equivalent of the complex scalar p defined by Holland. The complex scalar a is invariant under rotations, and under the phase change $H$eJ+
(9.57)
a transforms as w
e*+icj.
(9.58)
The set {p, a,si, s:, T} encodes all the information contained in the 2particle spinor J,, up to an overall sign. They reproduce the quantities defined by Holland, but their STA derivation makes their properties and geometric origin much clearer.
SPACETIME ALGEBRA AND ELECTRON PHYSICS
36 1
C . Relativistic Two-Particle States The ideas developed for the multiparticle Pauli algebra extend immediately to the relativistic domain. The direct product of the two single-particle spinor spaces (the even subalgebras) now results in a space of 8 x 8 = 64 real dimensions. Unlike the single-particle case, this space is not equivalent to the even subalgebra of the full 8-dimensional algebra. The full algebra is 256-dimensional, and its even subalgebra is therefore 128dimensional. It is not yet clear whether the remaining 64-dimensional space which is not picked up by sums of direct-product states could be of use in constructing 2-particle wavefunctions, and for the remainder of this section we work only with the space obtained from sums of directproduct states. Postmultiplying the direct-product space by the quantum correlator E reduces it to 32 real dimensions, which are equivalent to the 16 complex dimensions employed in standard 2-particle relativistic quantum theory. All the single-particle observables discussed in Section 1II.C extend simply. In particular, we define the vectors (9.59) and (9.60) which are repectively the 2-particle current and spin vector. (The calligraphic symbol 9 is used to avoid confusion with the correlated bivector J . ) We also define the spin bivector S by s = ($Jq&. (9.61) Of particular interest are the new Lorentz-invariant quantities that arise in this approach. From the work of the preceding section, we form the quantity $$, which decomposes into (9.62) 44 = <;G$)o,* +
<4$>,.
The grade-0 and grade-8 terms are the 2-particle generalization of the scalar + pseudoscalar combination +$ = p exp(ip) found at the singleparticle level. Of greater interest are the 4-vector terms. These offer a wealth of Lorentz-invariant 2-particle observables, the meaning of which we are only beginning to appreciate. Such invariants are rarely seen in the traditional matrix approach. 1. The Relativistic Singlet State and Relativistic Invariants Our task here is to find a relativistic analog of the Pauli singlet state discussed in Section 1X.A. Recalling the definition of E (9.23), the property
362
CHRIS DORAN ET AL.
that ensured that
E
was a singlet state was that iaiE
=
k
-ia,&,
=
1 ...3.
(9.63)
In addition to (9.63) a relativistic singlet state, which we will denote as q , must satisfy .
k
a k1 q = -a:q,
1 . . . 3.
=
(9.64)
It follows that q satisfies i ' q = aia!a:q
=
-.2 -a 23 a 22 u 2171--1rl
(9.65) (9.66)
3q
=
i(1 - i i i 2 ) q .
(9.67)
The state q can therefore be constructed by multiplying E by the idempotent 1( 1 - i ' i 2 ) . We therefore define 1 q = ~ - ( 1 - i ' i 2 ) = ( i a ; - iu;)&(l- iu$a$i(l - i 1 i 2 ) ,
v5
(9.68)
which is normalized such that (7, q), = 1 . The invariant q satisfies iaiq = i a : ~ & ( l i 1 i 2 ) = -iaiq
k
=
1
.. . 3
(9.69)
and a kI r j =
-uii1i2q = i2iaiq = -uiq
k=I
... 3 .
(9.70)
These results are summarized by M'q
(9.71)
= M2q,
where M is an even multivector in either the particle-1 or particle-2 STA. The proof that q is a relativistic invariant now reduces to the simple identity R1R2q= RII?'q = q ,
(9.72)
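where R is a single-particle relativistic rotor. Equation (9.71) can be seen as arising from a more primitive relation between vectors in the separate spaces. Using the result that $\gamma_0^1\gamma_0^2$ commutes with η, we can derive

$$\gamma_\mu^1\,\eta\,\gamma_0^1 = \gamma_\mu^1\gamma_0^1\gamma_0^2\,\eta\,\gamma_0^2 = \gamma_0^2(\gamma_\mu\gamma_0)^1\,\eta\,\gamma_0^2 = \gamma_0^2\gamma_0^2\gamma_\mu^2\,\eta\,\gamma_0^2 = \gamma_\mu^2\,\eta\,\gamma_0^2, \tag{9.73}$$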
SPACETIME ALGEBRA AND ELECTRON PHYSICS
363
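and hence we find that, for an arbitrary vector a,

$$a^1\,\eta\,\gamma_0^1 = a^2\,\eta\,\gamma_0^2. \tag{9.74}$$

Equation (9.71) now follows immediately from (9.74) by writing

$$(ab)^1\eta = a^1b^1\,\eta\,\gamma_0^1\gamma_0^1 = a^1b^2\,\eta\,\gamma_0^2\gamma_0^1 = b^2a^1\,\eta\,\gamma_0^1\gamma_0^2 = b^2a^2\,\eta\,\gamma_0^2\gamma_0^2 = (ba)^2\eta. \tag{9.75}$$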
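Equation (9.74) can therefore be viewed as the fundamental property of the relativistic invariant η. From η a number of Lorentz-invariant 2-particle multivectors can be constructed by sandwiching arbitrary multivectors between η and $\tilde{\eta}$. The simplest such object is (summation over k implied)

$$\eta\tilde{\eta} = \epsilon(1 - i^1i^2)\tilde{\epsilon} = \tfrac{1}{4}(1 + i\sigma_1^1 i\sigma_1^2 + i\sigma_2^1 i\sigma_2^2 + i\sigma_3^1 i\sigma_3^2)(1 - i^1i^2) = \tfrac{1}{4}(1 - i^1i^2) - \tfrac{1}{4}(\sigma_k^1\sigma_k^2 - i\sigma_k^1 i\sigma_k^2). \tag{9.76}$$

This contains a scalar + pseudoscalar (grade-8) term, which is obviously invariant, together with the invariant grade-4 multivector $(\sigma_k^1\sigma_k^2 - i\sigma_k^1 i\sigma_k^2)$. The next simplest object is

$$\eta\gamma_0^1\gamma_0^2\tilde{\eta} = \tfrac{1}{4}(1 + i\sigma_1^1 i\sigma_1^2 + i\sigma_2^1 i\sigma_2^2 + i\sigma_3^1 i\sigma_3^2)(1 - i^1i^2)\gamma_0^1\gamma_0^2 = \tfrac{1}{4}(\gamma_0^1\gamma_0^2 + i^1i^2\gamma_k^1\gamma_k^2 - i^1i^2\gamma_0^1\gamma_0^2 - \gamma_k^1\gamma_k^2) = \tfrac{1}{4}(\gamma_0^1\gamma_0^2 - \gamma_k^1\gamma_k^2)(1 - i^1i^2). \tag{9.77}$$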
On defining the bivector

$$K \equiv \gamma_\mu^1\gamma^{\mu 2} \tag{9.78}$$

and the 2-particle pseudoscalar

$$W \equiv i^1 i^2 = i^2 i^1, \tag{9.79}$$

the invariants from (9.77) are simply K and WK. That W is invariant under rotations is obvious, and the invariance of K under joint rotations in the two particle spaces follows from equation (9.77). The bivector K is of the form of a "doubling" bivector discussed in [61], where such bivectors are shown to play an important role in the bivector realization of many Lie algebras.
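The identities (9.63)-(9.75) can be checked numerically in any faithful matrix representation of the 8-generator algebra. The sketch below is an illustration and is not taken from the original work. It makes three assumptions:

- The representation uses 16 × 16 complex matrices, with Dirac matrices for particle 1 and $\gamma_5$-dressed copies for particle 2, so that vectors from different particle spaces anticommute.
- The Pauli singlet has the form $\epsilon = (i\sigma_2^1 - i\sigma_2^2)E/\sqrt{2}$, with $E = \tfrac{1}{2}(1 - i\sigma_3^1 i\sigma_3^2)$.
- The rotor (a boost composed with a rotation) and the vector $a$ are arbitrary illustrative choices.

The script builds η as in (9.68) and reports the residuals of (9.63), (9.70), (9.72) and (9.74). It also checks that $K$ is invariant under a joint rotation and that $W^2 = 1$.

```python
import numpy as np

# Assumed representation: Dirac matrices for particle 1, gamma_5-dressed copies for particle 2.
pauli = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
Z2, I2 = np.zeros((2, 2), dtype=complex), np.eye(2, dtype=complex)
gam = [np.block([[I2, Z2], [Z2, -I2]])] + [np.block([[Z2, p], [-p, Z2]]) for p in pauli]
gam5 = 1j * gam[0] @ gam[1] @ gam[2] @ gam[3]   # squares to +1, anticommutes with each gam[mu]
I4, ONE = np.eye(4, dtype=complex), np.eye(16, dtype=complex)
g1 = [np.kron(g, I4) for g in gam]               # particle-1 vectors gamma_mu^1
g2 = [np.kron(gam5, g) for g in gam]             # particle-2 vectors gamma_mu^2


def space(G):
    """Pseudoscalar i, sigma_k = gamma_k gamma_0 and i sigma_k in one particle space."""
    i = G[0] @ G[1] @ G[2] @ G[3]
    sig = [G[k] @ G[0] for k in (1, 2, 3)]
    return i, sig, [i @ s for s in sig]


def rotor(G, alpha=0.7, theta=1.1):
    """Boost along sigma_1 (rapidity alpha) times a rotation in the i sigma_3 plane (angle theta)."""
    _, sig, isig = space(G)
    boost = np.cosh(alpha / 2) * ONE + np.sinh(alpha / 2) * sig[0]
    rot = np.cos(theta / 2) * ONE - np.sin(theta / 2) * isig[2]
    return boost @ rot


def err(A, B):
    return np.abs(A - B).max()


i1, sig1, isig1 = space(g1)
i2, sig2, isig2 = space(g2)
E = 0.5 * (ONE - isig1[2] @ isig2[2])                # quantum correlator
eps = (isig1[1] - isig2[1]) @ E / np.sqrt(2)         # Pauli singlet (assumed form)
eta = eps @ (ONE - i1 @ i2) / np.sqrt(2)             # relativistic singlet, Eq. (9.68)
K = g1[0] @ g2[0] - sum(g1[k] @ g2[k] for k in (1, 2, 3))   # K = gamma_mu^1 gamma^{mu 2}
W = i1 @ i2                                          # 2-particle pseudoscalar
S = rotor(g1) @ rotor(g2)                            # the same rotor applied in both spaces
a = [0.3, -1.2, 0.5, 2.0]                            # arbitrary components a^mu
a1 = sum(c * g for c, g in zip(a, g1))
a2 = sum(c * g for c, g in zip(a, g2))

checks = {
    "(9.63)": max(err(isig1[k] @ eps, -isig2[k] @ eps) for k in range(3)),
    "(9.70)": max(err(sig1[k] @ eta, -sig2[k] @ eta) for k in range(3)),
    "(9.72)": err(S @ eta, eta),
    "(9.74)": err(a1 @ eta @ g1[0], a2 @ eta @ g2[0]),
    "K invariant": err(S @ K @ np.linalg.inv(S), K),
    "W^2 = 1": err(W @ W, ONE),
}
for name, residual in checks.items():
    print(f"{name:12s} residual {residual:.1e}")
```

Every residual should sit at round-off level. Because the checks are algebraic identities, the particular representation should not matter, provided vectors from different particle spaces anticommute.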
From the definition of K (9.78), we find that

$$K\wedge K = -2\gamma_0^1\gamma_0^2\gamma_k^1\gamma_k^2 + (\gamma_k^1\gamma_k^2)\wedge(\gamma_j^1\gamma_j^2) = 2(\sigma_k^1\sigma_k^2 - i\sigma_k^1 i\sigma_k^2), \tag{9.80}$$

which recovers the grade-4 invariant from (9.76). The full set of 2-particle invariants constructed from K is summarized in Table VIII. These invariants are well known and have been used in constructing phenomenological models of interacting particles [62, 63]. The STA derivation of the invariants is quite new, however, and the fundamental role played by the bivector K is hidden in the matrix formalism.

D. Multiparticle Wave Equations

In order to extend the local-observables approach to quantum theory to the multiparticle domain, we need to construct a relativistic wave equation satisfied by an n-particle wavefunction. This subject is given little attention in the literature, with most textbooks dealing solely with the field-quantized description of an n-particle system. An n-particle wave equation is essential, however, if one aims to give a relativistic description of a bound system (where field quantisation and perturbation theory on their own are insufficient). A description of this approach is given in Chapter 10 of Itzykson and Zuber [36], who deal mainly with the Bethe-Salpeter equation for a relativistic 2-particle system. Written in the STA, this equation becomes

$$(j\nabla^1 - m_1)(j\nabla^2 - m_2)\,\psi(r,s) = I(r,s)\,\psi(r,s), \tag{9.81}$$
where j represents right-sided multiplication by J, I(r, s) is an integral operator representing the interparticle interaction, and

$$\nabla^1 = \gamma^{\mu 1}\frac{\partial}{\partial r^\mu}, \qquad \nabla^2 = \gamma^{\mu 2}\frac{\partial}{\partial s^\mu}, \tag{9.82}$$

with r and s the 4-D positions of the two particles. Strictly, we should have written $\nabla_r^1$ and $\nabla_s^2$ instead of simply $\nabla^1$ and $\nabla^2$. In this case, however, the subscripts can safely be ignored.

TABLE VIII
TWO-PARTICLE RELATIVISTIC INVARIANTS

Invariant    Type of interaction    Grade
1            Scalar                 0
K            Vector                 2
K ∧ K        Bivector               4
WK           Pseudovector           6
W            Pseudoscalar           8

The problem with Eq. (9.81) is that it is not first-order in the 8-dimensional vector derivative $\nabla = \nabla^1 + \nabla^2$. We are therefore unable to generalize many of the simple first-order propagation techniques discussed in Section V. Clearly, we would like to find an alternative to (9.81) which retains the first-order nature of the single-particle Dirac equation. Here we simply assert what we believe to be a good candidate for such an equation, and then work out its consequences. The equation we shall study, for two free spin-½ particles of masses $m_1$ and $m_2$, respectively, is

$$\left(\frac{\nabla^1}{m_1} + \frac{\nabla^2}{m_2}\right)\psi(x)\,(i\gamma_3^1 + i\gamma_3^2) = 2\psi(x). \tag{9.83}$$
We do not assume, a priori, that ψ lies in the correlated subspace of the direct-product space. But, since E commutes with $i\gamma_3^1 + i\gamma_3^2$, any solution to (9.83) can be reduced to a solution in the correlated space simply by right-multiplying by E. Written out explicitly, the vector x in Eq. (9.83) is

$$x = r^1 + s^2 = \gamma_\mu^1 r^\mu + \gamma_\mu^2 s^\mu, \tag{9.84}$$

where $\{r^\mu, s^\mu\}$ are a set of 8 independent components for x. Of course, all particle motions ultimately occur in a single space, in which the vectors r and s label two independent position vectors. We stress that in this approach there are two timelike coordinates, $r^0$ and $s^0$, which is necessary if our 2-particle equation is to be Lorentz-covariant. The derivatives $\nabla^1$ and $\nabla^2$ are as defined by Eq. (9.82), and the 8-dimensional vector derivative $\nabla = \nabla_x$ is given by

$$\nabla = \nabla^1 + \nabla^2. \tag{9.85}$$
Equation (9.83) can be derived from a Lorentz-invariant action integral in 8-dimensional configuration space in which the $1/m_1$ and $1/m_2$ factors enter via a linear distortion of the vector derivative ∇. We write this as

(9.86)

where Z is the linear mapping of vectors to vectors defined by

(9.87)

This distortion is of the type used in the gauge theory approach to gravity developed in [48, 64-66], and it is extremely suggestive that mass enters Eq. (9.83) via this route.
Any candidate 2-particle wave equation must be satisfied by factored states of the form

$$\psi = \phi^1(r^1)\,\chi^2(s^2)\,E, \tag{9.88}$$

where φ and χ are solutions of the separate single-particle Dirac equations,

$$\nabla\phi = -m_1\phi\, i\gamma_3, \qquad \nabla\chi = -m_2\chi\, i\gamma_3. \tag{9.89}$$
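To verify that our Eq. (9.83) meets this requirement, we substitute in the direct-product state (9.88) and use (9.89) to obtain

$$\left(\frac{\nabla^1}{m_1} + \frac{\nabla^2}{m_2}\right)\phi^1\chi^2E = -\phi^1\chi^2E\,(i\gamma_3^1 + i\gamma_3^2), \tag{9.90}$$

where we have used the results that $\nabla^2$ commutes with $\phi^1$ and that $i\gamma_3^1$ commutes with $\chi^2$ and with E. Now, since $i\gamma_3^1$ and $i\gamma_3^2$ anticommute and each squares to −1, we have

$$(i\gamma_3^1 + i\gamma_3^2)(i\gamma_3^1 + i\gamma_3^2) = -2, \tag{9.91}$$

so that

$$\left(\frac{\nabla^1}{m_1} + \frac{\nabla^2}{m_2}\right)\phi^1\chi^2E\,(i\gamma_3^1 + i\gamma_3^2) = 2\phi^1\chi^2E, \tag{9.92}$$

and (9.83) is satisfied. Equation (9.83) is only satisfied by direct-product states as a result of the fact that vectors from separate particle spaces anticommute. Hence, Eq. (9.83) does not have an equivalent expression in terms of the direct-product matrix formulation, which can only form commuting operators from different spaces.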
E. The Pauli Principle

In quantum theory, indistinguishable particles must obey either Fermi-Dirac or Bose-Einstein statistics. For fermions this requirement results in the Pauli exclusion principle that no two particles can occupy a state in which their properties are identical. At the relativistic multiparticle level, the Pauli principle is usually encoded in the anticommutation of the creation and annihilation operators of fermionic field theory. Here we show that the principle can be successfully encoded in a simple geometrical manner at the level of the relativistic wavefunction, without requiring the apparatus of quantum field theory. We start by introducing the grade-4 multivector

$$I = \Gamma_0\Gamma_1\Gamma_2\Gamma_3, \tag{9.93}$$
where

$$\Gamma_\mu = \frac{1}{\sqrt{2}}(\gamma_\mu^1 + \gamma_\mu^2). \tag{9.94}$$

It is a simple matter to verify that I has the properties

(9.95)

and

(9.96)

It follows that I functions as a geometrical version of the particle exchange operator. In particular, acting on the 8-dimensional position vector $x = r^1 + s^2$, we find that

$$IxI = r^2 + s^1, \tag{9.97}$$

where

$$r^2 = \gamma_\mu^2 r^\mu, \qquad s^1 = \gamma_\mu^1 s^\mu. \tag{9.98}$$
So I can certainly be used to interchange the coordinates of particles 1 and 2. But, if I is to play a fundamental role in our version of the Pauli principle, we must first confirm that it is independent of our choice of initial frame. To see that it is, suppose that we start with a rotated frame $\{R\gamma_\mu\tilde{R}\}$ and define

$$\Gamma_\mu' = \frac{1}{\sqrt{2}}(R^1\gamma_\mu^1\tilde{R}^1 + R^2\gamma_\mu^2\tilde{R}^2) = R^1R^2\,\Gamma_\mu\,\tilde{R}^2\tilde{R}^1. \tag{9.99}$$

The new $\Gamma_\mu'$ give rise to the rotated 4-vector (grade-4 multivector)

$$I' = R^1R^2\,I\,\tilde{R}^2\tilde{R}^1. \tag{9.100}$$

But, acting on a bivector in particle space 1, we find that

$$I\,(a^1\wedge b^1)\,I = -(Ia^1I)\wedge(Ib^1I) = -a^2\wedge b^2, \tag{9.101}$$

and the same is true of an arbitrary even element in either space. More generally, sandwiching between two factors of I flips an even element in one particle space to the other particle space and changes its sign, whereas it simply flips an odd element to the other particle space. It follows that

$$IR^2R^1 = R^1IR^1 = R^1R^2I, \tag{9.102}$$

and substituting this into (9.100), we find that I' = I, so I is indeed independent of the chosen orthonormal frame.

We can now use the 4-vector I to encode the Pauli exchange principle geometrically. Let ψ(x) be a wavefunction for two electrons. Our suggested relativistic generalization of the Pauli principle is that ψ(x) should be invariant under the operation

$$\psi(x) \mapsto I\psi(IxI)I. \tag{9.103}$$
For n-particle systems the extension is straightforward: the wavefunction must be invariant under the interchange enforced by the I's constructed from each pair of particles. We must first check that (9.103) is an allowed symmetry of the 2-particle Dirac equation. With x' defined as IxI, it is simple to verify that

$$\nabla_{x'} = \nabla_r^2 + \nabla_s^1 = I\nabla I, \tag{9.104}$$

and hence that

$$\nabla = I\nabla_{x'}I. \tag{9.105}$$

So, assuming that ψ(x) satisfies the 2-particle equation (9.83) with equal masses m, we find that

$$\nabla[I\psi(IxI)I](i\gamma_3^1 + i\gamma_3^2) = -I\nabla_{x'}\psi(x')I(i\gamma_3^1 + i\gamma_3^2) = mI\psi(x')(i\gamma_3^1 + i\gamma_3^2)I(i\gamma_3^1 + i\gamma_3^2). \tag{9.106}$$
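But $i\gamma_3^1 + i\gamma_3^2$ is odd and symmetric under interchange of its particle labels. It follows that

$$I(i\gamma_3^1 + i\gamma_3^2)I = i\gamma_3^1 + i\gamma_3^2, \tag{9.107}$$

which, together with (9.91), gives $(i\gamma_3^1 + i\gamma_3^2)I(i\gamma_3^1 + i\gamma_3^2) = 2I$, and hence

$$\nabla[I\psi(IxI)I](i\gamma_3^1 + i\gamma_3^2) = 2mI\psi(IxI)I. \tag{9.108}$$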
So, if ψ(x) is a solution of the 2-particle equal-mass Dirac equation, then so too is Iψ(IxI)I. Next we must check that the proposed relativistic Pauli principle deals correctly with well-known elementary cases. Suppose that two electrons are in the same spatial state. Then we should expect our principle to enforce the condition that they are in an antisymmetric spin state. For example, consider $i\sigma_2^1 - i\sigma_2^2$, the spin singlet state. We find that

$$I(i\sigma_2^1 - i\sigma_2^2)I = -i\sigma_2^2 + i\sigma_2^1, \tag{9.109}$$

which recovers the original state and is therefore compatible with our principle. On the other hand,

$$I(i\sigma_2^1 + i\sigma_2^2)I = -(i\sigma_2^1 + i\sigma_2^2), \tag{9.110}$$

so no part of this state can be added in to the wavefunction, which again is correct. In conclusion, given some 2-particle solution ψ(x), the corresponding state

$$\psi_I = \psi(x) + I\psi(IxI)I \tag{9.111}$$

still satisfies the Dirac equation and is invariant under $\psi(x) \mapsto I\psi(IxI)I$. We therefore claim that the state $\psi_I$ is the correct relativistic generalization of a state satisfying the Pauli principle. In deference to standard quantum theory, we refer to Eq. (9.111) as an antisymmetrization procedure.

The final issue to address is the Lorentz covariance of the antisymmetrization procedure (9.111). Suppose that we start with an arbitrary wavefunction ψ(x) satisfying the 2-particle equal-mass equation (9.83). If we boost this state via

$$\psi(x) \mapsto \psi'(x) = R^1R^2\,\psi(\tilde{R}^2\tilde{R}^1xR^1R^2), \tag{9.112}$$
then ψ'(x) also satisfies the same equation (9.83). The boosted wavefunction ψ'(x) can be thought of as corresponding to a different observer in relative motion. The boosted state ψ'(x) can also be antisymmetrized to yield a solution satisfying our relativistic Pauli principle. But for this procedure to be covariant, the same state must be obtained if we first antisymmetrize the original ψ(x) and then boost the result. Thus we require that

$$S\psi(\tilde{S}xS) + IS\psi(\tilde{S}IxIS)I = S[\psi(\tilde{S}xS) + I\psi(I\tilde{S}xSI)I], \tag{9.113}$$

where $S = R^1R^2$. Equation (9.113) reduces to the requirement that

$$IS\psi(\tilde{S}IxIS) = SI\psi(I\tilde{S}xSI), \tag{9.114}$$

which is satisfied provided that

$$IS = SI \tag{9.115}$$

or

$$R^1R^2I = IR^1R^2. \tag{9.116}$$

But we proved precisely this equation in demonstrating the frame invariance of I, so our relativistic version of the Pauli principle is Lorentz-invariant. This is important because, rather like the inclusion of the quantum correlator, the Pauli procedure discussed here looks highly nonlocal in character.
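The exchange-operator properties used in this subsection can also be checked numerically. The sketch below is illustrative and assumes the same 16 × 16 complex representation as before: Dirac matrices for particle 1, and $\gamma_5$-dressed copies for particle 2. It builds $\Gamma_\mu$ from (9.94) and I from (9.93), then verifies the following:

- $I^2 = -1$ and $I\gamma_\mu^1 I = \gamma_\mu^2$. These are our own checks; together they give (9.97).
- The sign flip on bivectors, (9.101).
- The frame independence $IS = SI$, (9.115).
- The symmetry (9.107).
- The singlet and triplet behaviour of (9.109)-(9.110).

The rotor is an arbitrary boost composed with a rotation.

```python
import numpy as np

# Assumed representation of the 2-particle algebra (vectors of different spaces anticommute).
pauli = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
Z2, I2 = np.zeros((2, 2), dtype=complex), np.eye(2, dtype=complex)
gam = [np.block([[I2, Z2], [Z2, -I2]])] + [np.block([[Z2, p], [-p, Z2]]) for p in pauli]
gam5 = 1j * gam[0] @ gam[1] @ gam[2] @ gam[3]
g1 = [np.kron(g, np.eye(4, dtype=complex)) for g in gam]
g2 = [np.kron(gam5, g) for g in gam]
ONE = np.eye(16, dtype=complex)


def pseudo(G):
    """Pseudoscalar gamma_0 gamma_1 gamma_2 gamma_3 of one particle space."""
    return G[0] @ G[1] @ G[2] @ G[3]


def i_sigma(G, k):
    """i sigma_k = i gamma_k gamma_0 in one particle space (k = 1, 2, 3)."""
    return pseudo(G) @ G[k] @ G[0]


def rotor(G, alpha=0.7, theta=1.1):
    """Arbitrary rotor: boost along sigma_1 composed with a rotation in the i sigma_3 plane."""
    boost = np.cosh(alpha / 2) * ONE + np.sinh(alpha / 2) * (G[1] @ G[0])
    rot = np.cos(theta / 2) * ONE - np.sin(theta / 2) * i_sigma(G, 3)
    return boost @ rot


def err(A, B):
    return np.abs(A - B).max()


Gam = [(a + b) / np.sqrt(2) for a, b in zip(g1, g2)]   # Eq. (9.94)
Iex = Gam[0] @ Gam[1] @ Gam[2] @ Gam[3]                # exchange operator I, Eq. (9.93)
S = rotor(g1) @ rotor(g2)                              # S = R^1 R^2
odd = pseudo(g1) @ g1[3] + pseudo(g2) @ g2[3]          # i gamma_3^1 + i gamma_3^2
singlet = i_sigma(g1, 2) - i_sigma(g2, 2)
triplet = i_sigma(g1, 2) + i_sigma(g2, 2)

residuals = {
    "I^2 = -1": err(Iex @ Iex, -ONE),
    "I g1 I = g2": max(err(Iex @ g1[m] @ Iex, g2[m]) for m in range(4)),
    "(9.101)": err(Iex @ g1[0] @ g1[1] @ Iex, -(g2[0] @ g2[1])),
    "(9.115)": err(Iex @ S, S @ Iex),
    "(9.107)": err(Iex @ odd @ Iex, odd),
    "(9.109)": err(Iex @ singlet @ Iex, singlet),
    "(9.110)": err(Iex @ triplet @ Iex, -triplet),
}
for name, r in residuals.items():
    print(f"{name:12s} residual {r:.1e}")
```

All residuals should be at round-off level. The check makes concrete why the construction works: sandwiching by I swaps the particle labels of odd elements and swaps them with a sign change for even ones.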
F. Eight-Dimensional Streamlines and Pauli Exclusion

For a single Dirac particle, a characteristic feature of the STA approach is that the probability current is a rotated/dilated version of the $\gamma_0$ vector, $J = \psi\gamma_0\tilde{\psi}$. This current has zero divergence and can therefore be used to define streamlines, as discussed in Section VII. Here we demonstrate how the same idea extends to the 2-particle case. We find that the conserved current is now formed from ψ acting on the $\gamma_0^1 + \gamma_0^2$ vector, and therefore exists in 8-dimensional configuration space. This current can be used to derive streamlines for two particles in correlated motion. This approach should ultimately give better insight into what happens in experiments of the Bell type, where spin measurements on pairs of particles are performed over spacelike separations. We saw in Section VIII how the local-observables viewpoint leads to a radical reinterpretation of what happens in a single spin measurement, and we can expect an equally radical shift to occur in the analysis of spin measurements of correlated particles. As a preliminary step in this direction, here we construct the current for two free particles approaching each other head-on. The streamlines for this current are evaluated and used to study both the effects of the Pauli antisymmetrization and the spin dependence of the trajectories. This work generalizes that of Dewdney et al. [26, 27] to the relativistic domain.

We start with the 2-particle Dirac equation (9.83), and multiply on the right by E to ensure the total wavefunction is in the correlated subspace. Also, since we want to work with the indistinguishable case, we assume that both masses are m. In this case our basic equation is

$$\nabla\psi E(i\gamma_3^1 + i\gamma_3^2) = 2m\psi E \tag{9.117}$$

and, since

$$E(i\gamma_3^1 + i\gamma_3^2) = J(\gamma_0^1 + \gamma_0^2), \tag{9.118}$$

Eq. (9.117) can be written in the equivalent form

$$\nabla\psi E(\gamma_0^1 + \gamma_0^2) = -2m\psi J. \tag{9.119}$$
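Now, assuming that ψ satisfies ψ = ψE, we obtain

$$\nabla\psi(\gamma_0^1 + \gamma_0^2)\tilde{\psi} = -2m\psi J\tilde{\psi}, \tag{9.120}$$

and adding this equation to its reverse (the reverse of $\psi J\tilde{\psi}$ is $-\psi J\tilde{\psi}$) yields

$$\dot{\nabla}\dot{\psi}(\gamma_0^1 + \gamma_0^2)\tilde{\psi} + \psi(\gamma_0^1 + \gamma_0^2)\dot{\tilde{\psi}}\dot{\nabla} = 0, \tag{9.121}$$

where the overdots indicate the quantity on which the derivative acts. The scalar part of this equation gives

$$\nabla\cdot\langle\psi(\gamma_0^1 + \gamma_0^2)\tilde{\psi}\rangle_1 = 0, \tag{9.122}$$

which shows that the current we seek is

$$\mathcal{J} = \langle\psi(\gamma_0^1 + \gamma_0^2)\tilde{\psi}\rangle_1, \tag{9.123}$$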
as defined in Eq. (9.59). The vector $\mathcal{J}$ has components in both particle-1 and particle-2 spaces, which we write as

$$\mathcal{J} = \mathcal{J}^1 + \mathcal{J}^2. \tag{9.124}$$

The current $\mathcal{J}$ is conserved in 8-dimensional space, so its streamlines never cross there. The streamlines of the individual particles, however, are obtained by integrating $\mathcal{J}^1$ and $\mathcal{J}^2$ in ordinary 4-D space, and these can of course cross. An example of this is illustrated in Fig. 11, which shows the streamlines corresponding to distinguishable particles in two Gaussian wavepackets approaching each other head-on. The wavefunction used to produce this figure is just

$$\psi = \phi^1(r^1)\chi^2(s^2)E, \tag{9.125}$$

with φ and χ being Gaussian wavepackets moving in opposite directions. Since the distinguishable case is assumed, no Pauli antisymmetrization is used. The individual currents for each particle are given by

$$\mathcal{J}^1(r,s) = \phi(r)\gamma_0\tilde{\phi}(r)\,\langle\chi(s)\tilde{\chi}(s)\rangle \quad\text{and}\quad \mathcal{J}^2(r,s) = \chi(s)\gamma_0\tilde{\chi}(s)\,\langle\phi(r)\tilde{\phi}(r)\rangle \tag{9.126}$$

Distinguishable Particles
FIGURE 11. Streamlines generated by the unsymmetrized 2-particle wavefunction ψ = φ¹(r¹)χ²(s²)E. Time is shown on the vertical axis. φ and χ are Gaussian wavepackets moving in opposite directions, and the "collision" is arranged to take place at t = 0. Because no antisymmetrization is applied to the wavefunction, the streamlines pass straight through each other.
and, as can be seen, the streamlines (and the wavepackets) simply pass straight through each other. An interesting feature emerges in the individual currents in (9.126). One of the main problems with single-particle Dirac theory is that the current is always positive-definite so, if we wish to interpret it as a charge current, it fails to represent antiparticles correctly. The switch of sign of the current necessary to represent positrons is put into conventional theory essentially "by hand," via the anticommutation and normal ordering rules of fermionic field theory. In Eq. (9.126), however, the norm $\langle\chi\tilde{\chi}\rangle$ of the second state multiplies the current for the first, and vice versa. Since $\langle\chi\tilde{\chi}\rangle$ can be negative, it is possible to obtain currents which flow backward in time. This suggests that the required switch of signs can be accomplished while remaining wholly within a wavefunction-based approach. An apparent problem is that, if only one particle has a negative-norm state (say, for example, χ has $\langle\chi\tilde{\chi}\rangle < 0$), then it is the φ current which is reversed, and not the χ current. However, it is easy to see that this objection is not relevant to indistinguishable particles, and it is to these we now turn.

We now apply the Pauli antisymmetrization procedure of the previous subsection to the wavefunction of Eq. (9.125) so as to obtain a wavefunction applicable to indistinguishable particles. This yields
$$\psi = [\phi^1(r^1)\chi^2(s^2) - \chi^1(r^2)\phi^2(s^1)]E, \tag{9.127}$$

from which we form the individual currents as before. We must next decide which spin states to use for the two particles. We first take both particles to have their spin vectors pointing in the positive z direction, with all motion along the z axis. The resulting streamlines are shown in Fig. 12a. The streamlines now "repel" one another, rather than being able to pass straight through. The corrugated appearance of the lines near the origin arises because the streamlines must pass through a region of highly oscillatory destructive interference: the probability of both particles occupying the same position (the origin) with the same spin state is zero. If instead the particles are put in different spin states, then the streamlines shown in Fig. 12b result. In this case there is no destructive interference near the origin, and the streamlines are smooth there. However, they still repel! The explanation for this lies in the symmetry properties of the 2-particle current. Given that the wavefunction ψ has been antisymmetrized according to our version of the Pauli principle, it is straightforward to show that

$$I\psi(IxI)I = \psi(x). \tag{9.128}$$

It follows that at the same spacetime position, encoded by IxI = x in the 2-particle algebra, the two currents $\mathcal{J}^1$ and $\mathcal{J}^2$ are equal. Hence, if two streamlines ever met, they could never separate again. For the simulations presented here, it follows from the symmetry of the setup that the spatial currents at the origin are both zero. So, as the particles approach the origin, they are forced to slow down. The delay means that they are then swept back in the direction they have just come from by the wavepacket traveling through from the other side. We therefore see that "repulsion" as measured by streamlines has its origin in indistinguishability, and that the spin of the states exerts only a marginal effect.

(a) Spins Aligned    (b) Spins Anti-aligned
FIGURE 12. Streamlines generated by the antisymmetrized 2-particle wavefunction ψ = [φ¹(r¹)χ²(s²) − χ¹(r²)φ²(s¹)]E. The individual wavepackets pass through each other, but the streamlines from separate particles do not cross. Figure 12a has both particles with spins aligned in the +z direction, and Fig. 12b shows particles with opposite spins, with φ in the +z direction and χ in the −z direction. Both wavepackets have energy 527 keV and a spatial spread of ~20 pm. The spatial units are 10⁻¹² m and the units of time are lo-'* s. The effects of the antisymmetrization are important only where there is significant wavepacket overlap.
X. FURTHER APPLICATIONS

In this section we briefly review two further applications of spacetime algebra to important areas of electron physics. The first of these is classical and semiclassical mechanics. As well as simplifying many calculations in quantum mechanics, spacetime algebra is well suited to handling problems in classical mechanics where electrons are treated as point charges following a single trajectory. In recent years there has been considerable interest in finding modifications to the simple classical equations that include the effects of spin without losing the idea of a definite trajectory [30, 68]. One of the aims behind this work is to find a suitable classical model which can be quantized via the path-integral route [69]. One of the more promising candidates is discussed here, and we outline some improvements that could repair some immediate defects. The second application discussed here is to Grassmann algebra and the associated "calculus" introduced by Berezin [70]. Grassmann quantities are employed widely in quantum field theory, and the Berezin calculus plays a crucial role in the path-integral quantization of fermionic systems. Here we outline how many of the calculations can be performed within geometric algebra, and draw attention to some work in the literature. We do not attempt a more detailed analysis of path-integral quantization here.

A. Classical and Semiclassical Mechanics
The Lorentz force law for a point particle with velocity u, mass m, and charge q is

$$\dot{u} = \frac{q}{m}F\cdot u, \tag{10.1}$$

where the dot denotes differentiation with respect to the affine parameter τ and F is the external electromagnetic field bivector. Any future-pointing unit timelike vector can be written in terms of a rotor acting on a fixed vector $\gamma_0$,

$$u(\tau) = R(\tau)\gamma_0\tilde{R}(\tau), \tag{10.2}$$

from which we find

$$\dot{u} = (2\dot{R}\tilde{R})\cdot u. \tag{10.3}$$

The quantity $\dot{R}\tilde{R}$ is a bivector: it is even, and differentiating $R\tilde{R} = 1$ shows that it equals minus its own reverse. We can therefore recover Eq. (10.1) by setting

$$2\dot{R}\tilde{R} = \frac{q}{m}F \tag{10.4}$$

so that

$$\dot{R} = \frac{q}{2m}FR. \tag{10.5}$$
This is not the only equation for R that is consistent with (10.1), since any bivector that commutes with u could be added to F. However, (10.5) is without doubt the simplest equation available. It turns out that Eq. (10.5) is often easier to analyse than (10.1), as was first shown by Hestenes [13]. Furthermore, we can extend this approach to include a classical notion of spin. Let us suppose that, as well as describing the tangent vector u, the rotor R determines how a frame of vectors is transported along the curve. We can then define the spin vector as the unit spatial vector

$$s = R\gamma_3\tilde{R}, \tag{10.6}$$

which matches the definition given for the quantum observable. If we now assume that Eq. (10.5) is valid, we find that the spin vector satisfies the equation

$$\dot{s} = \frac{q}{m}F\cdot s, \tag{10.7}$$

which gives the correct precession equation for a particle of gyromagnetic ratio 2 [13]. It follows that g = 2 can be viewed as the natural value from the viewpoint of the relativistic classical mechanics of a rotating frame, a striking fact that deserves to be more widely known. We can use the same approach to analyze the motion of a particle with a g factor other than 2 by replacing (10.5) with

$$\dot{R} = \frac{q}{2m}\left[FR + \left(\frac{g}{2} - 1\right)RB\right], \tag{10.8}$$

which reproduces the Bargmann-Michel-Telegdi equation employed in the analysis of spin-precession measurements [47].
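For the g = 2 case, the claim that the rotor equation (10.5) reproduces both the Lorentz force law (10.1) and the spin equation (10.7) can be illustrated numerically. This minimal sketch makes the following assumptions:

- The STA is represented by Dirac-representation matrices, with $\sigma_k = \gamma_k\gamma_0$, so that $I\sigma_3 = \gamma_2\gamma_1$.
- The field is a hypothetical constant one, $F = E + IB$ with $E = 0.3\sigma_1$ and $B = 0.7\sigma_3$.
- Units are chosen so that q/m = 1.
- Integration uses a hand-written fourth-order Runge-Kutta stepper.

Route 1 integrates (10.5) and forms $u = R\gamma_0\tilde{R}$ and $s = R\gamma_3\tilde{R}$. Route 2 integrates $\dot{v} = (q/m)F\cdot v$ directly, using $F\cdot v = (Fv - vF)/2$.

```python
import numpy as np

# Dirac-representation gamma matrices (assumed representation), signature (+, -, -, -).
pauli = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
Z2, I2 = np.zeros((2, 2), dtype=complex), np.eye(2, dtype=complex)
G = [np.block([[I2, Z2], [Z2, -I2]])] + [np.block([[Z2, p], [-p, Z2]]) for p in pauli]
METRIC = np.array([1.0, -1.0, -1.0, -1.0])


def components(v):
    """Components v^mu of v = v^mu gamma_mu, from tr(gamma_mu gamma_nu) = 4 eta_{mu nu}."""
    return np.array([np.trace(v @ G[mu]).real / 4 for mu in range(4)]) * METRIC


def rk4(f, y, h, steps):
    """Classical fourth-order Runge-Kutta for an autonomous ODE y' = f(y)."""
    for _ in range(steps):
        k1 = f(y)
        k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2)
        k4 = f(y + h * k3)
        y = y + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
    return y


q_over_m = 1.0
F = 0.3 * G[1] @ G[0] + 0.7 * G[2] @ G[1]   # hypothetical F = 0.3 sigma_1 + 0.7 I sigma_3
h, steps = 1e-3, 2000                       # proper time 0 <= tau <= 2

# Route 1: rotor equation (10.5), starting from R = 1 (particle at rest, spin along gamma_3).
R = rk4(lambda R: 0.5 * q_over_m * F @ R, np.eye(4, dtype=complex), h, steps)
R_rev = np.linalg.inv(R)                    # for a rotor the reverse equals the inverse
u_rotor, s_rotor = R @ G[0] @ R_rev, R @ G[3] @ R_rev   # Eqs. (10.2) and (10.6)

# Route 2: integrate v' = (q/m) F.v directly for u (10.1) and s (10.7).
lorentz = lambda v: q_over_m * 0.5 * (F @ v - v @ F)
u_force = rk4(lorentz, G[0].copy(), h, steps)
s_force = rk4(lorentz, G[3].copy(), h, steps)

u, s = components(u_rotor), components(s_rotor)
print("u (rotor):", np.round(u, 6))
print("u (force):", np.round(components(u_force), 6))
print("u.u =", u @ (METRIC * u), " s.s =", s @ (METRIC * s), " u.s =", u @ (METRIC * s))
```

The two routes agree to integration accuracy. The relations u·u = 1, s·s = −1 and u·s = 0 are preserved automatically, because R only ever rotates the frame. This is one practical advantage of integrating the rotor rather than the vectors separately.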
A remarkable aspect of the Dirac theory is that the current $\psi\gamma_0\tilde{\psi}$ and the momentum (which is defined in terms of the momentum operator) are not necessarily collinear. This suggests that a more realistic classical model for an electron should employ an independent quantity for the momentum, one not necessarily related to the tangent vector to the spacetime trajectory. Such a model was proposed by Barut and Zanghi [68], who did not employ the STA, and was analyzed further in [5] (see also [71]). Written in the STA, the action proposed by Barut and Zanghi takes the form

$$S = \int d\lambda\,\langle\dot{\psi}i\sigma_3\tilde{\psi} + p(\dot{x} - \psi\gamma_0\tilde{\psi}) + qA(x)\psi\gamma_0\tilde{\psi}\rangle, \tag{10.9}$$

where the dynamical variables are x(λ), p(λ) and ψ(λ). Variation with respect to these variables yields the equations [5]

$$\dot{x} = \psi\gamma_0\tilde{\psi}, \tag{10.10}$$

$$\dot{P} = qF\cdot\dot{x}, \tag{10.11}$$

and

$$\dot{\psi}i\sigma_3 = P\psi\gamma_0, \tag{10.12}$$

where

$$P \equiv p - qA. \tag{10.13}$$
These constitute a set of first-order equations so, with x, p, and ψ given for some initial value of λ, the future evolution is uniquely determined. Equations (10.10)-(10.12) contain a number of unsatisfactory features. One does not expect to see P entering the Lorentz force law (10.11), but rather the dynamical variable p. This problem is simply addressed by replacing (10.9) with

$$S_1 = \int d\lambda\,\langle\dot{\psi}i\sigma_3\tilde{\psi} + p(\dot{x} - \psi\gamma_0\tilde{\psi}) + q\dot{x}\cdot A(x)\rangle \tag{10.14}$$

so that the p and ψ equations become

$$\dot{p} = qF\cdot\dot{x} \tag{10.15}$$

and

$$\dot{\psi}i\sigma_3 = p\psi\gamma_0. \tag{10.16}$$

The quantity $p\cdot\dot{x}$ is a constant of the motion, and can be viewed as defining the mass.
A more serious problem remains, however. If we form the spin bivector $S = \psi i\sigma_3\tilde{\psi}$, we find that

$$\dot{S} = 2p\wedge\dot{x}, \tag{10.17}$$

so, if p and $\dot{x}$ are initially collinear, the spin bivector does not precess, even in the presence of an external B field [35]. To solve this problem an extra term must be introduced into the action. The simplest modification is

$$S_2 = \int d\lambda\,\left\langle\dot{\psi}i\sigma_3\tilde{\psi} + p(\dot{x} - \psi\gamma_0\tilde{\psi}) + q\dot{x}\cdot A(x) - \frac{q}{2m}F\psi i\sigma_3\tilde{\psi}\right\rangle, \tag{10.18}$$

which now yields the equations

$$\dot{x} = \psi\gamma_0\tilde{\psi}, \tag{10.19}$$

$$\dot{p} = qF\cdot\dot{x} - \frac{q}{2m}\nabla F(x)\cdot S, \tag{10.20}$$

and

$$\dot{\psi}i\sigma_3 = p\psi\gamma_0 - \frac{q}{2m}F\psi. \tag{10.21}$$

The problem with this system of equations is that m has to be introduced explicitly, and there is nothing to identify this quantity with $p\cdot\dot{x}$. If we assume that $p = m\dot{x}$ and write $v = \dot{x}$, we can recover the pair of equations

$$\dot{S} = \frac{q}{m}F\times S \tag{10.22}$$

and

$$\dot{v} = \frac{q}{m}F\cdot v - \frac{q}{2m^2}\nabla F(x)\cdot S, \tag{10.23}$$

where × denotes the commutator product, $F\times S = \tfrac{1}{2}(FS - SF)$. These equations were studied in [72]. While a satisfactory semiclassical mechanics for an electron still eludes us, it should be clear that the STA is a very useful tool in constructing and analyzing candidate models.
B. Grassmann Algebra

Grassmann algebras play an essential role in many areas of modern quantum theory. However, nearly all calculations with Grassmann algebra can be performed more efficiently with geometric algebra. A set of quantities $\{\xi_i\}$ form a Grassmann algebra if their product is totally antisymmetric:

$$\xi_i\xi_j = -\xi_j\xi_i. \tag{10.24}$$

Examples include fermion creation operators, the fermionic generators of a supersymmetry algebra, and ghost fields in the path integral quantization of non-Abelian gauge theories. Any expression involving the Grassmann variables $\{\xi_i\}$ has a geometric algebra equivalent in which the $\{\xi_i\}$ are replaced by a frame of independent vectors $\{e_i\}$ and the Grassmann product is replaced by the outer (wedge) product [73, 74]. For example, we can make the replacement

$$\xi_i\xi_j \leftrightarrow e_i\wedge e_j. \tag{10.25}$$

This translation on its own clearly does not achieve a great deal, but the geometric algebra form becomes more powerful when we consider the "calculus" defined by Berezin [70]. This calculus is defined by the rules

$$\frac{\partial\xi_j}{\partial\xi_i} = \delta_{ij}, \tag{10.26}$$

(10.27)

together with the "graded Leibniz rule,"

$$\frac{\partial}{\partial\xi_i}(f_1f_2) = \frac{\partial f_1}{\partial\xi_i}f_2 + (-1)^{[f_1]}f_1\frac{\partial f_2}{\partial\xi_i}, \tag{10.28}$$

where $[f_1]$ is the parity (even/odd) of $f_1$. In geometric algebra, the operation of the Grassmann derivatives can be replaced by inner products of the reciprocal frame vectors
(10.29)

so that

(10.30)

Some consequences of this translation procedure were discussed in [73], where it was shown that the geometric product made available by the geometric algebra formulation simplifies many computations. Applications discussed in [73] included "Gauss" integrals, pseudoclassical mechanics, path integrals and Grassmann-Fourier transforms. It was also shown that super-Lie algebras have a very simple representation within geometric algebra. There seems little doubt that the systematic replacement of Grassmann variables with geometric multivectors would considerably enhance our understanding of quantum field theory.
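The translation of the Berezin calculus into geometric algebra can be sketched in a few lines. The sketch makes these assumptions:

- The frame is orthonormal and Euclidean, so the reciprocal frame coincides with the frame ($e^i = e_i$).
- Multivectors are stored as dictionaries keyed by bitmasks.
- All function names are illustrative.

The Grassmann product becomes the outer product, and $\partial/\partial\xi_i$ becomes the inner product $e^i\cdot(\;)$. In this form the graded Leibniz rule (10.28) is just the familiar expansion of a vector inner product over a wedge product.

```python
from itertools import product

# Multivectors are dicts {bitmask: coefficient}; bit i set means e_i is a factor.
# Assumption: orthonormal Euclidean frame, so the reciprocal frame is e^i = e_i.


def reorder_sign(a, b):
    """Sign from reordering the vectors of blade a followed by blade b into ascending order."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1 if swaps % 2 else 1


def clean(M):
    return {k: v for k, v in M.items() if v != 0}


def add(*terms):
    out = {}
    for M in terms:
        for k, v in M.items():
            out[k] = out.get(k, 0) + v
    return clean(out)


def scale(c, M):
    return clean({k: c * v for k, v in M.items()})


def wedge(A, B):
    """Outer product: the geometric-algebra stand-in for the Grassmann product."""
    out = {}
    for (a, x), (b, y) in product(A.items(), B.items()):
        if a & b:                      # a repeated generator gives e_i ^ e_i = 0
            continue
        out[a | b] = out.get(a | b, 0) + reorder_sign(a, b) * x * y
    return clean(out)


def d(i, A):
    """Berezin derivative d/dxi_i realised as the inner product e^i . A."""
    bit, out = 1 << i, {}
    for a, x in A.items():
        if a & bit:                    # remove e_i, with a sign for the vectors in front of it
            out[a ^ bit] = out.get(a ^ bit, 0) + reorder_sign(bit, a) * x
    return clean(out)


def xi(i):
    """Grassmann generator xi_i mapped to the basis vector e_i."""
    return {1 << i: 1}


def parity(A):
    """0 for even, 1 for odd; A must have homogeneous parity."""
    parities = {bin(k).count("1") % 2 for k in A}
    if len(parities) != 1:
        raise ValueError("mixed parity")
    return parities.pop()


x = [xi(i) for i in range(4)]
print(wedge(x[0], x[1]), wedge(x[1], x[0]))   # {3: 1} {3: -1}
print(d(1, wedge(x[0], x[1])))                # {1: -1}, i.e. d(xi_0 xi_1)/d xi_1 = -xi_0
```

Because the coefficients are integers, (10.24), (10.26) and (10.28) can all be tested by exact comparison. For example, one can compare `d(i, wedge(f1, f2))` with the right-hand side of (10.28) for any f1 of definite parity.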
XI. CONCLUSIONS

There is a growing realization that geometric algebra provides a unified and powerful tool for the study of many areas of mathematics, physics, and engineering. The underlying algebraic structure (Clifford algebra) appears in many key areas of physics and geometry [75], and the geometric techniques are finding increasing application in areas as diverse as gravitation theory [64] and robotics [76, 77]. The only impediment to the wider adoption of geometric algebra appears to be physicists' understandable reluctance to adopt new techniques. We hope that the applications discussed in this paper make a convincing case for the use of geometric algebra, and in particular the STA, in electron physics. Unfortunately, by concentrating on a single area of physics, we may not have conveyed the unifying potential of geometric algebra. However, a brief look at other applications should convince one of the wider utility of many of the techniques developed here.

Further work in this field will center on the multiparticle STA. At various points we have discussed using the multiparticle STA to analyze the nonlocality revealed by EPR-type experiments. This is just one of many potential applications of the approach outlined here. Others include following the streamlines for two particles through a scattering event, or using the 3-particle algebra to model pair creation. It will also be of considerable interest to develop simplified techniques for handling more complicated many-body problems. Behind these goals lies the desire to construct an alternative to the current technique of fermionic field quantization. The canonical anticommutation relations imposed there remain mysterious, despite 40 years of discussion of the spin-statistics theorem.

Elsewhere, there is still a clear need to develop the wavepacket approach to tunneling. This is true not only of fermions, but also of photons, on which most of the present experiments are performed.
Looking further afield, the approach to the Dirac equation described in Section IV extends simply to the case of a gravitational background [48]. The wavepacket and multiparticle techniques developed here are essentially all that is required to address issues such as superradiance and pair creation by black holes. Closer to home, the STA is a powerful tool for classical relativistic physics. We dealt briefly with the construction of
classical models for the electron in Section X. Elsewhere, similar techniques have been applied to the study of radiation reaction and the Lorentz-Dirac equation [35]. The range of applicability of geometric algebra is truly vast. We believe that all physicists should be exposed to its benefits and insights.
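APPENDIX: THE SPHERICAL MONOGENIC FUNCTIONS

We begin by assuming that the spherical monogenic is an eigenstate of the $x\wedge\nabla$ and $J_3$ operators, where all operators follow the conventions of Section IV.B. We label this state as ψ(l, μ), so

$$-x\wedge\nabla\,\psi(l,\mu) = l\,\psi(l,\mu), \qquad J_3\,\psi(l,\mu) = \mu\,\psi(l,\mu). \tag{A.1}$$

The $J_i$ operators satisfy

$$J_iJ_i = -\left[(i\sigma_i)\cdot(x\wedge\nabla) - \tfrac{1}{2}i\sigma_i\right]\left[(i\sigma_i)\cdot(x\wedge\nabla) - \tfrac{1}{2}i\sigma_i\right] = \tfrac{3}{4} - x\wedge\nabla + (x\wedge\grave{\nabla}\,x\wedge\nabla), \tag{A.2}$$

where the accent on $\grave{\nabla}$ indicates that the derivative acts on everything to its right. Since

$$(x\wedge\grave{\nabla}\,x\wedge\nabla)\psi = x\wedge\nabla(x\wedge\nabla\psi) - x\wedge\nabla\psi, \tag{A.3}$$

we find that

$$J_iJ_i\,\psi(l,\mu) = \left(\tfrac{3}{4} + 2l + l^2\right)\psi(l,\mu) = \left(l + \tfrac{1}{2}\right)\left(l + \tfrac{3}{2}\right)\psi(l,\mu). \tag{A.4}$$

With the ladder operators $J_+$ and $J_-$ defined by

$$J_+ = J_1 + jJ_2, \qquad J_- = J_1 - jJ_2, \tag{A.5}$$

it is a simple matter to prove the following results:

$$[J_+, J_-] = 2J_3, \qquad [J_\pm, J_3] = \mp J_\pm, \qquad J_iJ_i = J_-J_+ + J_3 + J_3^2, \qquad J_iJ_i = J_+J_- - J_3 + J_3^2. \tag{A.6}$$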
The raising operator $J_+$ increases the eigenvalue of $J_3$ by one. But, for fixed l, μ must ultimately attain some maximum value. Denoting this value as $\mu_+$, we must reach a state for which

$$J_+\psi(l,\mu_+) = 0. \tag{A.7}$$

Acting on this state with $J_iJ_i$ and using one of the results in (A.6), we find that

$$\left(l + \tfrac{1}{2}\right)\left(l + \tfrac{3}{2}\right) = \mu_+(\mu_+ + 1), \tag{A.8}$$

and, as l is non-negative and $\mu_+$ represents an upper bound, it follows that

$$\mu_+ = l + \tfrac{1}{2}. \tag{A.9}$$

There must similarly be a lowest eigenvalue of $J_3$ and a corresponding state with

$$J_-\psi(l,\mu_-) = 0. \tag{A.10}$$

In this case we find that

$$\left(l + \tfrac{1}{2}\right)\left(l + \tfrac{3}{2}\right) = \mu_-(\mu_- - 1) \tag{A.11}$$

and

$$\mu_- = -\left(l + \tfrac{1}{2}\right). \tag{A.12}$$

The spectrum of eigenvalues of $J_3$ therefore ranges from $(l + \tfrac{1}{2})$ to $-(l + \tfrac{1}{2})$, a total of 2(l + 1) states. Since the $J_3$ eigenvalues are always of the form (integer + ½), it is simpler to label the spherical monogenics with a pair of integers. We therefore write the spherical monogenics as $\psi_l^m$, where

$$-x\wedge\nabla\,\psi_l^m = l\,\psi_l^m, \qquad l \geq 0 \tag{A.13}$$

and

$$J_3\,\psi_l^m = \left(m + \tfrac{1}{2}\right)\psi_l^m, \qquad -l - 1 \leq m \leq l. \tag{A.14}$$
To find an explicit form for the $\psi_l^m$, we first construct the highest-m case. This satisfies

$$J_+\psi_l^l = 0 \tag{A.15}$$

and it is not hard to see that this equation is solved by

$$\psi_l^l \propto \sin^l\theta\; e^{l\phi i\sigma_3}. \tag{A.16}$$

Introducing a convenient factor, we write

$$\psi_l^l = (2l + 1)P_l^l(\cos\theta)\,e^{l\phi i\sigma_3}. \tag{A.17}$$

Our convention for the associated Legendre polynomials follows Gradshteyn and Ryzhik [43], so

(A.18)
and we have the following recursion relations:
(A.20)
The lowering operator $J_-$ has the following effect on $\psi$:
$$J_-\psi = \bigl[-\partial_\theta\psi + \cot\theta\,\partial_\phi\psi\, i\sigma_3\bigr]e^{-\phi i\sigma_3} - i\sigma_2\,\tfrac{1}{2}\bigl(\psi + \sigma_3\psi\sigma_3\bigr). \tag{A.21}$$
The latter term just projects out the $\{1, i\sigma_3\}$ terms and multiplies them by $-i\sigma_2$. This is the analog of the lowering matrix in the standard formalism.
(A.22)
(A.23)
(A.24)
Proceeding in this manner, we are led to the following formula for the spherical monogenics:
$$\psi_l^m = \bigl[(l+m+1)\,P_l^m(\cos\theta) - P_l^{m+1}(\cos\theta)\,i\sigma_\phi\bigr]e^{m\phi i\sigma_3}, \tag{A.25}$$
in which $l$ is a positive integer or zero, $m$ ranges from $-(l+1)$ to $l$, and the $P_l^m$ are taken to be zero if $|m| > l$. The positive and negative $m$ states can be related using the result that
$$P_l^{-m}(x) = (-1)^m\,\frac{(l-m)!}{(l+m)!}\,P_l^m(x), \tag{A.26}$$
from which it can be shown that
$$\psi_l^m(-i\sigma_2) = (-1)^m\,\frac{(l+m+1)!}{(l-m)!}\,\psi_l^{-(m+1)}. \tag{A.27}$$
The spherical monogenics presented here are unnormalized. Normalization factors are not hard to compute, and we find that
$$\int_0^{\pi} d\theta \int_0^{2\pi} d\phi\, \sin\theta\; \psi_l^m\,\psi_l^{m\dagger} = 4\pi\,\frac{(l+m+1)!}{(l-m)!}. \tag{A.28}$$
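As a quick consistency check of (A.25) and (A.28), offered here as an illustration rather than as part of the original derivation, take $l = 1$, $m = 1$. Since $P_1^2 = 0$, (A.25) reduces to the form (A.17); since $e^{\phi i\sigma_3}\,(e^{\phi i\sigma_3})^\dagger = 1$ and $\bigl(P_1^1(\cos\theta)\bigr)^2 = \sin^2\theta$ whichever sign convention is used for $P_1^1$, the normalization integral can be done by hand:

$$\begin{aligned} \psi_1^1 &= 3\,P_1^1(\cos\theta)\,e^{\phi i\sigma_3}, \qquad \psi_1^1\psi_1^{1\dagger} = 9\sin^2\theta, \\ \int_0^{\pi} d\theta \int_0^{2\pi} d\phi\,\sin\theta\,\bigl(9\sin^2\theta\bigr) &= 9 \cdot 2\pi \cdot \tfrac{4}{3} = 24\pi = 4\pi\,\frac{3!}{0!}, \end{aligned}$$

in agreement with (A.28) for $l + m + 1 = 3$ and $l - m = 0$.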
REFERENCES
1. W. K. Clifford, Applications of Grassmann's extensive algebra, Am. J. Math. 1, 350 (1878).
2. H. Grassmann, Die Ausdehnungslehre, Enslin, Berlin (1862).
3. S. F. Gull, A. N. Lasenby, and C. J. L. Doran, Imaginary numbers are not real: The geometric algebra of spacetime, Found. Phys. 23(9), 1175 (1993).
4. C. J. L. Doran, A. N. Lasenby, and S. F. Gull, States and operators in the spacetime algebra, Found. Phys. 23(9), 1239 (1993).
5. A. N. Lasenby, C. J. L. Doran, and S. F. Gull, A multivector derivative approach to Lagrangian field theory, Found. Phys. 23(10), 1295 (1993).
6. S. F. Gull, A. N. Lasenby, and C. J. L. Doran, Electron paths, tunnelling and diffraction in the spacetime algebra, Found. Phys. 23(10), 1329 (1993).
7. D. Hestenes, A unified language for mathematics and physics, in J. S. R. Chisholm and A. K. Common (Eds.), Clifford Algebras and Their Applications in Mathematical Physics (1985), p. 1, Reidel, Dordrecht (1986).
8. D. Hestenes, Space-Time Algebra, Gordon & Breach, New York (1966).
9. D. Hestenes, Clifford algebra and the interpretation of quantum mechanics, in J. S. R. Chisholm and A. K. Common (Eds.), Clifford Algebras and Their Applications in Mathematical Physics (1985), p. 321, Reidel, Dordrecht (1986).
10. D. Hestenes, New Foundations for Classical Mechanics, Reidel, Dordrecht (1985).
11. D. Hestenes, Proper particle mechanics, J. Math. Phys. 15(10), 1768 (1974).
12. D. Hestenes and G. Sobczyk, Clifford Algebra to Geometric Calculus, Reidel, Dordrecht (1984).
13. D. Hestenes, Proper dynamics of a rigid point particle, J. Math. Phys. 15(10), 1778 (1974).
14. W. E. Baylis, J. Huschilt, and Jiansu Wei, Why i?, Am. J. Phys. 60(9), 788 (1992).
15. T. G. Vold, An introduction to geometric algebra with an application to rigid body mechanics, Am. J. Phys. 61(6), 491 (1993).
16. T. G. Vold, An introduction to geometric calculus and its application to electrodynamics, Am. J. Phys. 61(6), 505 (1993).
17. B. Jancewicz, Multivectors and Clifford Algebras in Electrodynamics, World Scientific, Singapore (1989).
18. J. S. R. Chisholm and A. K. Common (Eds.), Clifford Algebras and Their Applications in Mathematical Physics (1985), Reidel, Dordrecht (1986).
19. A. Micali, R. Boudet, and J. Helmstetter (Eds.), Clifford Algebras and Their Applications in Mathematical Physics (1989), Kluwer Academic, Dordrecht (1991).
20. F. Brackx, R. Delanghe, and H. Serras (Eds.), Clifford Algebras and Their Applications in Mathematical Physics (1993), Kluwer Academic, Dordrecht (1993).
21. D. Hestenes and A. Weingartshofer (Eds.), The Electron: New Theory and Experiment, Kluwer Academic, Dordrecht (1991).
22. D. Hestenes, Real spinor fields, J. Math. Phys. 8(4), 798 (1967).
23. D. Hestenes, Vectors, spinors, and complex numbers in classical and quantum physics, Am. J. Phys. 39, 1013 (1971).
24. A. N. Lasenby, C. J. L. Doran, and S. F. Gull, 2-spinors, twistors and supersymmetry in the spacetime algebra, in Z. Oziewicz, B. Jancewicz, and A. Borowiec (Eds.), Spinors, Twistors, Clifford Algebras and Quantum Deformations, p. 233, Kluwer Academic, Dordrecht (1993).
25. D. Bohm, R. Schiller, and J. Tiomno, A causal interpretation of the Pauli equation, Nuovo Cim. Suppl. 1, 48 (1955).
26. J. P. Vigier, C. Dewdney, P. R. Holland, and A. Kyprianidis, Causal particle trajectories and the interpretation of quantum mechanics, in B. J. Hiley and F. D. Peat (Eds.), Quantum Implications, p. 169, Routledge, London (1987).
27. C. Dewdney, P. R. Holland, A. Kyprianidis, and J. P. Vigier, Spin and non-locality in quantum mechanics, Nature 336, 536 (1988).
28. D. Hestenes and R. Gurtler, Consistency in the formulation of the Dirac, Pauli and Schrödinger theories, J. Math. Phys. 16(3), 573 (1975).
29. D. Hestenes, Spin and uncertainty in the interpretation of quantum mechanics, Am. J. Phys. 47(5), 399 (1979).
30. D. Hestenes, The zitterbewegung interpretation of quantum mechanics, Found. Phys. 20(10), 1213 (1990).
31. C. Dewdney, P. R. Holland, and A. Kyprianidis, What happens in a spin measurement?, Phys. Lett. A 119(6), 259 (1986).
32. P. R. Holland, The Quantum Theory of Motion, Cambridge University Press, Cambridge (1993).
33. D. Hestenes, Observables, operators, and complex numbers in the Dirac theory, J. Math. Phys. 16(3), 556 (1975).
34. J. D. Bjorken and S. D. Drell, Relativistic Quantum Mechanics, vol. 1, McGraw-Hill, New York (1964).
35. S. F. Gull, Charged particles at potential steps, in A. Weingartshofer and D. Hestenes (Eds.), The Electron, p. 37, Kluwer Academic, Dordrecht (1991).
36. C. Itzykson and J-B. Zuber, Quantum Field Theory, McGraw-Hill, New York (1980).
37. J. D. Hamilton, The Dirac equation and Hestenes' geometric algebra, J. Math. Phys. 25(6), 1823 (1984).
38. R. Boudet, The role of the duality rotation in the Dirac theory, in A. Weingartshofer and D. Hestenes (Eds.), The Electron, p. 83, Kluwer Academic, Dordrecht (1991).
39. H. Krüger, New solutions of the Dirac equation for central fields, in A. Weingartshofer and D. Hestenes (Eds.), The Electron, p. 49, Kluwer Academic, Dordrecht (1991).
40. C. Daviau and G. Lochak, Sur un modèle d'équation spinorielle non linéaire, Ann. de la Fond. L. de Broglie 16(1), 43 (1991).
41. R. P. Feynman, Quantum Electrodynamics, Addison-Wesley, Reading, MA (1961).
42. J. J. Sakurai, Advanced Quantum Mechanics, Addison-Wesley, Reading, MA (1967).
43. I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series and Products, 5th ed., Academic Press, San Diego (1994).
44. M. Moshinsky and A. Szczepaniak, The Dirac oscillator, J. Phys. A: Math. Gen. 22, L817 (1989).
45. W. T. Grandy, Jr., Relativistic Quantum Mechanics of Leptons and Fields, Kluwer Academic, Dordrecht (1991).
46. R. P. Martinez Romero and A. L. Salas-Brito, Conformal invariance in a Dirac oscillator, J. Math. Phys. 33(5), 1831 (1992).
47. D. Hestenes, Geometry of the Dirac theory, in J. Keller (Ed.), The Mathematics of Physical Spacetime, p. 67, UNAM, Mexico (1982).
48. A. N. Lasenby, C. J. L. Doran, and S. F. Gull, Gravity, gauge theories and geometric algebra, submitted to Phys. Rev. D (1995).
49. D. M. Fradkin and R. J. Kashuba, Spatial displacement of electrons due to multiple total reflections, Phys. Rev. D 9(10), 2775 (1974).
50. M. E. Rose, Relativistic Electron Theory, Wiley, New York (1961).
51. C. A. Manogue, The Klein paradox and superradiance, Ann. Phys. 181, 261 (1988).
52. W. Greiner, B. Müller, and J. Rafelski, Quantum Electrodynamics of Strong Fields, Springer-Verlag, Berlin (1985).
53. R. Y. Chiao, P. G. Kwiat, and A. M. Steinberg, Faster than light?, Sci. Am. 269(2), 38 (1993).
54. A. M. Steinberg, P. G. Kwiat, and R. Y. Chiao, Measurement of a single-photon tunneling time, Phys. Rev. Lett. 71(5), 708 (1993).
55. R. Landauer, Light faster than light?, Nature 365, 692 (1993).
56. E. H. Hauge and J. A. Støvneng, Tunnelling times: A critical review, Rev. Mod. Phys. 61(4), 917 (1989).
57. R. Landauer and Th. Martin, Barrier interaction time in tunneling, Rev. Mod. Phys. 66(1), 217 (1994).
58. J. O. Hirschfelder, A. C. Christoph, and W. E. Palke, Quantum mechanical streamlines. I. Square potential barrier, J. Chem. Phys. 61(12), 5435 (1974).
59. B. J. Hiley and F. D. Peat (Eds.), Quantum Implications, Routledge, London (1987).
60. P. R. Holland, Causal interpretation of a system of two spin-½ particles, Phys. Rep. 169(5), 294 (1988).
61. C. J. L. Doran, D. Hestenes, F. Sommen, and N. van Acker, Lie groups as spin groups, J. Math. Phys. 34(8), 3642 (1993).
62. A. P. Galeao and P. Leal Ferreira, General method for reducing the two-body Dirac equation, J. Math. Phys. 33(7), 2618 (1992).
63. Y. Koide, Exactly solvable model of relativistic wave equations and meson spectra, Il Nuovo Cim. 70A(4), 411 (1982).
64. A. N. Lasenby, C. J. L. Doran, and S. F. Gull, Astrophysical and cosmological consequences of a gauge theory of gravity, in N. Sanchez and A. Zichichi (Eds.), Advances in Astrofundamental Physics, Erice 1994, p. 359, World Scientific, Singapore (1995).
65. C. J. L. Doran, A. N. Lasenby, and S. F. Gull, Gravity as a gauge theory in the spacetime algebra, in F. Brackx, R. Delanghe, and H. Serras (Eds.), Clifford Algebras and Their Applications in Mathematical Physics (1993), p. 375, Kluwer Academic, Dordrecht (1993).
66. A. N. Lasenby, C. J. L. Doran, and S. F. Gull, Cosmological consequences of a flat-space theory of gravity, in F. Brackx, R. Delanghe, and H. Serras (Eds.), Clifford Algebras and Their Applications in Mathematical Physics (1993), p. 387, Kluwer Academic, Dordrecht (1993).
67. C. Dewdney, A. Kyprianidis, and J. P. Vigier, Illustration of the causal model of quantum statistics, J. Phys. A 17, L741 (1984).
68. A. O. Barut and N. Zanghi, Classical models of the Dirac electron, Phys. Rev. Lett. 52(23), 2009 (1984).
69. A. O. Barut and I. H. Duru, Path integral formulation of quantum electrodynamics from classical particle trajectories, Phys. Rep. 172(1), 1 (1989).
70. F. A. Berezin, The Method of Second Quantization, Academic Press, San Diego (1966).
71. W. A. Rodrigues, Jr., J. Vaz, Jr., E. Recami, and G. Salesi, About zitterbewegung and electron structure, Phys. Lett. B 318, 623 (1993).
72. J. W. van Holten, On the electrodynamics of spinning particles, Nucl. Phys. B356(3), 3 (1991).
73. A. N. Lasenby, C. J. L. Doran, and S. F. Gull, Grassmann calculus, pseudoclassical mechanics and geometric algebra, J. Math. Phys. 34(8), 3683 (1993).
74. C. J. L. Doran, A. N. Lasenby, and S. F. Gull, Grassmann mechanics, multivector derivatives and geometric algebra, in Z. Oziewicz, B. Jancewicz, and A. Borowiec (Eds.), Spinors, Twistors, Clifford Algebras and Quantum Deformations, p. 215, Kluwer Academic, Dordrecht (1993).
75. H. B. Lawson and M.-L. Michelsohn, Spin Geometry, Princeton Univ. Press, Princeton, NJ (1989).
76. D. Hestenes, Invariant body kinematics: I. Saccadic and compensatory eye movements, Neural Networks 7(1), 65 (1994).
77. D. Hestenes, Invariant body kinematics: II. Reaching and neurogeometry, Neural Networks 7(1), 79 (1994).
ADVANCES IN IMAGING AND ELECTRON PHYSICS, VOL. 95
Texture Representation and Classification: The Feature Frequency Matrix Approach
HELEN C. SHEN and DURGESH SRIVASTAVA
Department of Computer Science, The Hong Kong University of Science & Technology, Hong Kong
I. Introduction
   A. State of the Art
II. Representation
   A. Feature Images
   B. Feature Frequency Matrix
   C. Moment Feature Vector
   D. Partitioned Feature Frequency Vector
   E. Summary
III. Classification Scheme
   A. Distance Measure
   B. Experimental Results
   C. Summary
IV. Conclusions
References
I. INTRODUCTION
Texture is a surface feature that is used implicitly in human visual perception. Quantifying texture meaningfully has challenged many machine vision researchers over the past decades. Unfortunately, researchers have still not agreed on a "comprehensive" definition of the term texture. The Oxford English Dictionary defines texture as
"quality of a surface or substance when felt or looked at; arrangement of threads in textile fabric."
Webster's Dictionary defines it as
"something composed of closely interwoven elements, the structure formed by the threads of a fabric; the disposition or manner of union of the particles of a body or substance; the visual or tactile surface characteristics and appearance of something."
Copyright 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.
Haralick (1979) described texture in terms of two layers: tonal primitives and the spatial organization of those primitives. The spatial organization may be random, or there may be a dependence between primitives, and this dependence may be structural, probabilistic, or functional. Gool et al. (1985) describe texture as a structure composed of a large number of more or less ordered similar elements, none of which draws special attention. Unser (1986) considers texture to be the term used to qualify the surface of a given object or phenomenon, which must be regarded as a neighborhood property of an image point. Amadasun and King (1989) define texture in two aspects, literal and visual: literally, texture refers to the arrangement of the basic constituents of a material, which is depicted by the spatial interrelationships between and/or the spatial arrangement of the image pixels in a digital image; visually, these spatial interrelationships or arrangements of image pixels are seen as changes in the intensity patterns or gray tones.
In our opinion, texture can be described in terms of three aspects: visual, constructional, and informational. The visual aspect concerns perceptual features such as fineness, coarseness, line-likeness, blob-likeness, roughness, and smoothness. The constructional aspect provides notions of the "makeup" of the texture, i.e., texture patterns, which are also referred to as texture primitives. The third aspect relates to the statistical and structural information of the texture in the global sense. In this paper, our emphasis is on representation for the purpose of classifying homogeneous classes of textures. Reconstruction of a textured image is an entirely different research challenge; thus, the ability to reconstruct the image is not considered a criterion of a good representation. Our proposed representational scheme is based on feature frequency matrices (FFMs). These matrices capture the frequency of occurrence of joint feature properties; they are statistical descriptions of certain feature properties of a texture. Information in a texture image is resolution dependent, and so are the feature frequency matrices. Therefore, for a given texture image, a set of FFMs (over several features and resolution levels) can capture various aspects of its statistical, structural, global, and local properties, giving a comprehensive representation of the texture. Before we present the scheme, we briefly review the current state of the art in texture representation.
A. State of the Art
Unfortunately, research activity in texture representation has dwindled in the last few years. In the 1970s, researchers were actively trying to quantify textures. Haralick (1979), Wechsler (1980), and Gool et al. (1985) are among the more recent surveys of this research. Image transformation was among the earliest techniques used in texture analysis. In particular, the discrete Fourier transform was first used by Lendaris and Stanley (1970), and a number of researchers have adapted the approach (Bajcsy and Lieberman, 1976; Eklundh, 1979) to capture texture features from the energy and phase spectra. Haralick et al. (1973) and many others (Conners and Harlow, 1980; Unser, 1986) extract spatial gray-tone dependence (cooccurrence) statistics in the form of matrices. Each matrix is an estimate of the second-order joint conditional probability density function P(i, j | d, θ), which denotes the probability of occurrence of a pair of gray tones (i, j) at a distance of d pixels and an angle θ from each other. From these spatial gray-tone dependence matrices, many single-value features can be extracted to form a feature vector. Local mask methods were proposed by Laws (1979) and Harwood et al. (1985); features are extracted by applying a set of designed masks of varying sizes, each intended to encapsulate specific texture features. The random field model is a popular mathematical tool for image modeling (Chellappa and Jain, 1993); by estimating the model parameters, different texture properties can be extracted. Two-dimensional Markov random field models for texture analysis were proposed by Hassner and Sklansky (1978), Kaneko and Yodogawa (1982), Kashyap et al. (1982), Cross and Jain (1983), and Dunn et al. (1988).
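The cooccurrence estimate P(i, j | d, θ) described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation of Haralick et al. (1973): the function name `cooccurrence` is hypothetical, the pair (d, θ) is assumed to have already been converted by the caller into a pixel offset (dx, dy) (for example, d = 1 and θ = 0° become (1, 0)), and each ordered pair is counted once, whereas the original formulation counts each pair in both orders and so yields a symmetric matrix.

```python
def cooccurrence(image, dx, dy, levels):
    """Estimate P(i, j | d, theta) as a levels x levels table of probabilities.

    image  : list of rows of integer gray tones in range(levels)
    dx, dy : pixel offset standing for (d, theta); the conversion from
             (d, theta) to an offset is left to the caller (assumption)
    Each ordered pair (image[r][c], image[r + dy][c + dx]) is counted once;
    pairs whose second pixel falls outside the image are skipped.
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    # Normalize the pair counts to relative frequencies (the probability estimate).
    return [[n / total for n in row] for row in counts]


if __name__ == "__main__":
    img = [[0, 0, 1],
           [0, 0, 1],
           [0, 2, 2]]
    P = cooccurrence(img, dx=1, dy=0, levels=3)  # horizontal neighbours, d = 1
    for row in P:
        print(row)
```

From such a matrix, the single-value features mentioned above (for example, its entropy or contrast) would then be computed and collected into a feature vector.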
Multichannel filtering has been another popular approach in recent years (Coggins and Jain, 1985; Turner, 1986; Fogel and Sagi, 1989; Unser and Eden, 1989; Bovik et al., 1990; Jain and Farrokhnia, 1991). Spatial frequency and orientation filters are used to generate filtered images; depending on the selected filters, multiresolution features corresponding to visual perceptual properties can be extracted for discrimination and segmentation. He and Wang (1992) devised a texture spectrum to characterize texture images. Shen and Wong (1983) proposed a resolution-dependent frequency mean diagram representation scheme, together with a new metric based on feature events rather than frequency. Our proposed representational scheme can be regarded as a generalization of the gray-tone cooccurrence statistics method and an extension of the frequency mean diagram.
II. REPRESENTATION
Desirable properties of a "good" texture representation include the ability to capture (1) statistical and structural information, (2) meaningful visual aspects, and (3) global and local information of the texture image. The efficacy of a representation is measured by (1) the ease of computing it and (2) the ease of using it to perform classification. Conciseness is also an important factor. With these criteria in mind, we propose a scheme for representing homogeneous texture images. Most existing methodologies try to quantify each visual feature by a single value. On the one hand, describing a visual feature such as coarseness by one value is desirable for its simplicity. On the other hand, we believe that much information is lost in computing that single value. The basis of the proposed scheme is the feature frequency matrix (FFM). It differs from the cooccurrence matrix approach (Haralick et al., 1973) in that it captures the frequency of occurrence of joint feature properties rather than the frequency of cooccurrence of spatial gray tones. A feature frequency matrix is a statistical description of certain properties of a texture, and these feature properties capture the structural information of the texture. Images at different resolution levels are also believed to reveal different information for texture differentiation (Srivastava and Shen, 1995). Therefore, the important "parameters" of the proposed scheme are (1) the types of features to be extracted and (2) the level(s) of resolution. Figure 1 illustrates the full scheme for one texture image. Without loss of generality, and to keep the presentation simple, the following discussion considers three basic features (gray level, magnitude of the gradient, and direction of the gradient) and a single resolution level.
A. Feature Images
People perceive textures in qualitative terms such as fine, coarse, line-like, blob-like, dark, or light, or as having intensity changes in certain directions. Quantifying these properties is the first step in defining a representation. Various attempts have been made in the past (Tamura et al., 1978; Amadasun and King, 1989; Shen and Bie, 1992) to define these visual properties. In this paper, we make no attempt to quantify particular visual properties, because we believe that the choice of feature properties is application dependent.
FIGURE 1. Representation scheme. (Panels: Original Image; Feature Frequency Matrices, 1-D, 2-D, ..., n-D; Partitioned FF Vector; Moment Feature Vector.)
Given an image at a certain resolution level, a set of operators can be defined to extract feature properties. Applying one operator to the image yields a feature image. Formally, we define:
Definition 1: Feature Image (F). Given an N1 × N2 image S at any resolution level and an n × n operator OP, a feature image F is obtained by convolving OP over the image S.
For example, the Sobel gradient operators can be used to obtain the gradient-magnitude and gradient-direction feature images. Figure 2 shows the gray-level feature images (i.e., the digitized images themselves) of 20 samples from the texture album of Brodatz (1968).
B. Feature Frequency Matrix
Statistical descriptions of feature properties carry important information for texture analysis. From the feature images, histograms can be computed to reflect the distributions of particular features. However, these histograms (which we refer to as one-dimensional feature frequency matrices) do not reveal distributions of joint events, i.e., certain structural properties that the texture image possesses. The reason is that one-dimensional feature frequency matrices treat different features independently, whereas visual properties usually correspond to distributions of joint features. Therefore, joint feature occurrences should be extracted from the feature images.
FIGURE 2. Twenty texture images.
Formally, we define feature frequency matrices (FFMs) of different dimensions.
Definition 2: 1-D Feature Frequency Matrix (FFM_1). Given a feature image F, the 1-D feature frequency matrix is defined as a vector FFM_1 whose element ffm(i) is the number of pixels with feature value i in F.
Definition 3: 2-D Feature Frequency Matrix (FFM_2). Given two feature images F1 and F2, the 2-D feature frequency matrix for F1 and F2 is defined as a matrix FFM_2 whose element ffm(i, j) is the number of pixels with feature value i in F1 and feature value j in F2.
Definition 4: Generalized n-D Feature Frequency Matrix (FFM_n). Given n feature images F1, F2, ..., Fn, the generalized n-D feature frequency matrix for these feature images is defined as an n-D array (matrix) FFM_n whose element ffm(k1, k2, ..., kn) is the number of pixels with feature value kj in feature image Fj, j = 1, ..., n.
Note that a lower-dimensional feature frequency matrix can be regarded as a "marginal frequency matrix" of a higher-dimensional one; for example, summing ffm(i, j) over j recovers the 1-D FFM of F1. In principle, for a given image at a certain resolution level, p feature operators yield p 1-D FFMs and p(p - 1)/2 2-D FFMs. Without loss of generality, in this paper we confine our discussion to 1-D and 2-D feature frequency matrices. Furthermore, we consider three basic features: the gray level, the magnitude of the gradient, and the directionality of the gradient. Thus, there are three 1-D FFMs and three 2-D FFMs.
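To make Definitions 1 to 3 concrete, the sketch below builds gradient-magnitude and gradient-direction feature images with the Sobel masks and then counts 1-D and 2-D FFMs. It is an illustration under stated assumptions, not the authors' code: all function names are hypothetical, the quantization (magnitude scaled by its maximum into `mag_levels` bins, direction split into `dir_levels` equal angular bins over [0, 2π)) is our own choice, border pixels are dropped so every feature image is (N1 - 2) × (N2 - 2), and the masks are applied as a correlation (flipping a Sobel mask only reverses the sign of the gradient, which shifts the direction by π).

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_mask(image, mask, r, c):
    """3 x 3 mask response centred on interior pixel (r, c), as a correlation."""
    return sum(mask[a][b] * image[r + a - 1][c + b - 1]
               for a in range(3) for b in range(3))


def gradient_feature_images(image, mag_levels, dir_levels):
    """Quantized gradient-magnitude and gradient-direction feature images.

    Hypothetical quantization: magnitude scaled by its maximum into mag_levels
    bins; direction atan2(gy, gx) mapped to [0, 2*pi) and split into dir_levels bins.
    """
    rows, cols = len(image), len(image[0])
    mags, dirs = [], []
    for r in range(1, rows - 1):
        mrow, drow = [], []
        for c in range(1, cols - 1):
            gx = apply_mask(image, SOBEL_X, r, c)
            gy = apply_mask(image, SOBEL_Y, r, c)
            mrow.append(math.hypot(gx, gy))
            drow.append(math.atan2(gy, gx) % (2 * math.pi))
        mags.append(mrow)
        dirs.append(drow)
    peak = max(max(row) for row in mags) or 1.0  # avoid dividing by zero on flat images
    mag_img = [[min(int(m / peak * mag_levels), mag_levels - 1) for m in row] for row in mags]
    dir_img = [[min(int(t / (2 * math.pi) * dir_levels), dir_levels - 1) for t in row]
               for row in dirs]
    return mag_img, dir_img


def ffm_1d(feature, levels):
    """Definition 2: ffm(i) = number of pixels with feature value i."""
    counts = [0] * levels
    for row in feature:
        for v in row:
            counts[v] += 1
    return counts


def ffm_2d(f1, f2, levels1, levels2):
    """Definition 3: ffm(i, j) = number of pixels with value i in f1 and j in f2."""
    counts = [[0] * levels2 for _ in range(levels1)]
    for row1, row2 in zip(f1, f2):
        for i, j in zip(row1, row2):
            counts[i][j] += 1
    return counts


if __name__ == "__main__":
    image = [[0, 1, 2, 3] for _ in range(4)]       # horizontal ramp, gray levels 0..3
    gray = [row[1:-1] for row in image[1:-1]]      # crop to match the gradient images
    mag, direc = gradient_feature_images(image, mag_levels=4, dir_levels=8)
    print(ffm_1d(mag, 4))                          # [0, 0, 0, 4]
    gd = ffm_2d(gray, direc, 4, 8)
    # Marginal property: summing ffm(i, j) over j recovers the 1-D FFM of the gray image.
    print([sum(row) for row in gd] == ffm_1d(gray, 4))   # True
```

With p feature images, the same `ffm_2d` call applied to every unordered pair gives the p(p - 1)/2 two-dimensional FFMs; for the three features used here, that is the GD, GM, and MD matrices.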
Figures 3, 4, and 5 show the three 1-D FFMs (gray level, i.e., the original image; magnitude of the gradient; and directionality of the gradient) of the 20 texture images in Fig. 2. Figures 6, 7, and 8 show the three 2-D FFMs computed from the three feature images.
C. Moment Feature Vector
The FFMs encapsulate the distributions of "important" features. However, they are usually large and sparse, of the order of 256 × 256 when the gray levels range from 0 to 255. At times, it may be desirable to "compress" an FFM to a "more manageable" size. Moments are the typical measures used to characterize distributions; thus, we propose to define moment feature vectors for the FFMs.
FIGURE 3. 1-D gray-level FFMs.
FIGURE 4. 1-D magnitude of the gradient FFMs of the prototypes. (Panels: D03, D04, D05, D09, D14, D16, D17, D19, D20, D21, D22, D24, D29, D53, D57, D68, D76, D77, D78, D84.)
FIGURE 5. 1-D directionality of the gradient FFMs of the prototypes. (Same 20 panels.)
FIGURE 6. 2-D GD-FFMs of the prototypes.
FIGURE 7. 2-D GM-FFMs of the prototypes.
FIGURE 8. 2-D MD-FFMs of the prototypes.
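Definition 5: Moment Feature Vector for 1-D FFM. The moment feature vector for a 1-D FFM is a quadruple $(\mu, \sigma^2, \xi, \kappa)$, where
$\mu$, the mean, is defined as
$$\mu = \sum_{i=0}^{L-1} i \times \mathrm{ffm}(i);$$
$\sigma^2$, the variance, is defined as
$$\sigma^2 = \sum_{i=0}^{L-1} (i-\mu)^2 \times \mathrm{ffm}(i);$$
$\xi$, the skewness, i.e., the normalized third moment, is defined as
$$\xi = \frac{\sum_{i=0}^{L-1} (i-\mu)^3 \times \mathrm{ffm}(i)}{\sigma^3};$$
$\kappa$, the kurtosis, i.e., the normalized fourth moment, is defined as
$$\kappa = \frac{\sum_{i=0}^{L-1} (i-\mu)^4 \times \mathrm{ffm}(i)}{\sigma^4} - 3.$$
L is the number of gray levels (more generally, the number of quantized feature values). For these sums to be true moments, ffm(i) must be taken as a relative frequency, i.e., the pixel counts divided by the total number of pixels, so that $\sum_i \mathrm{ffm}(i) = 1$.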
Definition 6: Moment Feature Vector for 2-D FFM. The moment feature vector for a 2-D FFM is a quadruple $\bigl(M_{r,s}(\mu_1, \mu_2) \mid r = 1, 2;\ s = 1, 2\bigr)$, where
$$M_{r,s}(\mu_1, \mu_2) = \frac{\sum_{i=0}^{L-1}\sum_{j=0}^{L-1} (i-\mu_1)^r\,(j-\mu_2)^s \times \mathrm{ffm}(i,j)}{\sigma_1^r\,\sigma_2^s},$$
and $\mu_1$ ($\mu_2$) and $\sigma_1$ ($\sigma_2$) are the mean and the standard deviation of the first (second) of the two features used in generating the 2-D FFM.
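Definitions 5 and 6 translate directly into code. The sketch below is a hedged illustration rather than the authors' implementation. The helper names `moments_1d`, `moments_2d`, and `mfv24` are hypothetical. The sketch normalizes counts to relative frequencies before taking moments, so that μ is a true mean. It also assumes that every standard deviation is nonzero. Concatenating the quadruples of three 1-D and three 2-D FFMs gives 6 × 4 = 24 values, which matches the size of the MFV-24 vector used in the experiments.

```python
import math


def _relative(counts):
    """Turn pixel counts into relative frequencies that sum to 1."""
    total = sum(counts)
    return [n / total for n in counts]


def moments_1d(ffm):
    """Definition 5: (mean, variance, skewness, kurtosis) of a 1-D FFM."""
    p = _relative(ffm)
    mu = sum(i * w for i, w in enumerate(p))
    var = sum((i - mu) ** 2 * w for i, w in enumerate(p))
    sigma = math.sqrt(var)  # assumed > 0
    skew = sum((i - mu) ** 3 * w for i, w in enumerate(p)) / sigma ** 3
    kurt = sum((i - mu) ** 4 * w for i, w in enumerate(p)) / sigma ** 4 - 3
    return (mu, var, skew, kurt)


def moments_2d(ffm2):
    """Definition 6: (M11, M12, M21, M22) of a 2-D FFM."""
    total = sum(sum(row) for row in ffm2)
    p = [[n / total for n in row] for row in ffm2]
    rows = [sum(row) for row in p]          # marginal distribution of the first feature
    cols = [sum(col) for col in zip(*p)]    # marginal distribution of the second feature
    mu1 = sum(i * w for i, w in enumerate(rows))
    mu2 = sum(j * w for j, w in enumerate(cols))
    s1 = math.sqrt(sum((i - mu1) ** 2 * w for i, w in enumerate(rows)))
    s2 = math.sqrt(sum((j - mu2) ** 2 * w for j, w in enumerate(cols)))

    def m(r, s):
        num = sum((i - mu1) ** r * (j - mu2) ** s * p[i][j]
                  for i in range(len(p)) for j in range(len(p[0])))
        return num / (s1 ** r * s2 ** s)

    return (m(1, 1), m(1, 2), m(2, 1), m(2, 2))


def mfv24(ffms_1d, ffms_2d):
    """Concatenate the moments of three 1-D and three 2-D FFMs (6 x 4 = 24 values)."""
    vec = []
    for f in ffms_1d:
        vec.extend(moments_1d(f))
    for f in ffms_2d:
        vec.extend(moments_2d(f))
    return vec


if __name__ == "__main__":
    print(moments_1d([1, 0, 1]))          # (1.0, 1.0, 0.0, -2.0)
    print(moments_2d([[1, 0], [0, 1]]))   # M11 = 1: perfectly correlated features
```

Note that M_{1,1} is simply the correlation coefficient between the two features. This is one reason the 2-D quadruple can carry joint (structural) information that the 1-D moments do not capture.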
D. Partitioned Feature Frequency Vector
Moments are mainly meaningful for unimodal distributions, whereas FFMs can be, and usually are, multimodal. Therefore, Bie et al. (1993) proposed a partitioning scheme that "reduces" the size of an FFM while retaining the relevant information. The FFM is partitioned dynamically into submatrices so as to attain maximum entropy; the algorithm presented in Bie et al. (1993) partitions an FFM into a set of submatrices such that the diversity between them is maximized. Different textures have their feature frequency events occur in different submatrices, so the information in the original FFM is encapsulated by the set of submatrices. Figure 9 gives an example of an original FFM and its partitioned version.
FIGURE 9. Partitioned FFM. (One axis: Magnitude.)
Typically, an FFM of size 2048 (64 × 32) can be partitioned into fewer than 100 submatrices. Computing the total frequency in each submatrix then gives a vector called the partitioned feature frequency vector (PFFV). Bie et al. (1993) demonstrated the efficacy of these vectors in classifying textures. One main disadvantage of this representation is its computational complexity; moreover, the proposed algorithm does not guarantee an "optimal" partitioning.
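Once a partition is available, forming the PFFV is just a matter of summing the frequencies inside each submatrix. The sketch below assumes the partition is already given as a list of half-open rectangles (row_start, row_stop, col_start, col_stop). It does not reproduce the entropy-maximizing partitioning algorithm of Bie et al. (1993), and the function name `pffv` is hypothetical.

```python
def pffv(ffm2, partition):
    """Partitioned feature frequency vector: total frequency in each submatrix.

    partition: list of half-open rectangles (row_start, row_stop, col_start, col_stop),
    supplied by the caller (assumption); ideally they tile the FFM without overlap.
    """
    return [sum(ffm2[i][j] for i in range(r0, r1) for j in range(c0, c1))
            for (r0, r1, c0, c1) in partition]


if __name__ == "__main__":
    ffm2 = [[1, 2, 0, 0],
            [3, 4, 0, 0],
            [0, 0, 5, 6],
            [0, 0, 7, 8]]
    quadrants = [(0, 2, 0, 2), (0, 2, 2, 4), (2, 4, 0, 2), (2, 4, 2, 4)]
    vec = pffv(ffm2, quadrants)
    print(vec)                                  # [10, 0, 0, 26]
    # With a partition that tiles the FFM, no frequency is lost:
    print(sum(vec) == sum(map(sum, ffm2)))      # True
```

For the 64 × 32 FFM mentioned above, this step would reduce 2048 entries to one value per submatrix, that is, to fewer than 100 values. The expensive part is therefore finding the partition, not computing the vector.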
E. Summary
The representation scheme presented captures many aspects of a given texture image. Depending on the application, appropriate feature images can be extracted. The ease of obtaining the FFM representation depends primarily on the feature operators chosen; computing the frequencies themselves is reasonably fast, and the moment feature vector can also be obtained in reasonable time. However, despite the attractive information content of the partitioned feature frequency vector, its computational complexity makes it impractical in real applications. In the next section, we demonstrate how easily FFMs can be used to perform classification.
III. CLASSIFICATION SCHEME
Twenty texture images (Fig. 2) from the album of Brodatz (1968) are used. Each image is digitized into 1024 × 1024 pixels with gray levels between 0 and 255. A training set of 256 nonoverlapping samples of 64 × 64 pixels is extracted from each image. Three feature images, namely, the gray level and the magnitude and directionality of the gradient, are used in these experiments, giving three 1-D FFMs and three 2-D FFMs.
A. Distance Measure
A meaningful distance measure is essential for classification. In Shen and Bie (1992) and Bie et al. (1993), Euclidean distance was employed with the moment feature vector and the partitioned feature frequency vector as texture representations. In Shen et al. (1993), a distance measure was presented that is a weighted sum of the differences in the frequencies of corresponding submatrices of the partitioned FFMs. The novelty of this weighted distance measure lies in the process of determining the weights: each submatrix is assigned a weight that represents its discriminatory power. Conceptually, the weighted distance measure is meaningful because the weights are derived from the training samples and the partitioning scheme, and they can be recomputed as more samples are added to the training set. Its classification success rate is higher than that obtained with Euclidean distance. However, we found that this higher correct classification rate comes at great computational expense. Thus far, no attempt has been made to use the FFMs directly in classification; in the past, the assumption that any computation involving FFMs would be "unacceptably" expensive prevented us from doing so. We had also neglected the complexity of computing the partitioned feature frequency vectors. Therefore, our experiments are based on FFMs and Euclidean distance. For comparison, we also consider the moment feature vector.
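The two kinds of distance discussed in this subsection can be contrasted in a short sketch. The Euclidean distance treats an FFM (or any feature vector) as a flat vector. For the weighted measure of Shen et al. (1993), only its general form is shown. The use of absolute differences and the example weights are assumptions, because the weight-determination procedure is not reproduced here. The function names are also hypothetical.

```python
import math


def flatten(matrix):
    """Turn a 2-D FFM into a flat list so it can be compared as a vector."""
    return [x for row in matrix for x in row]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def weighted_distance(u, v, weights):
    """Weighted sum of per-submatrix frequency differences (assumed |a - b| form).

    weights[k] stands for the discriminatory power of submatrix k; how those
    weights are learned from the training samples is not modelled here.
    """
    return sum(w * abs(a - b) for a, b, w in zip(u, v, weights))


if __name__ == "__main__":
    print(euclidean(flatten([[0, 0], [0, 0]]), flatten([[3, 0], [0, 4]])))  # 5.0
    print(weighted_distance([1, 2, 3], [2, 2, 0], [0.5, 1.0, 2.0]))          # 6.5
```

The Euclidean form needs no training beyond storing the representations. The weighted form moves the cost into learning the weights, and the text identifies that step as computationally expensive.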
B. Experimental Results
Two types of representation were used. Type one is the moment feature vector, which consists of four moments from each of the three 1-D FFMs and each of the three 2-D FFMs; it is denoted by MFV-24. Each class is represented by 256 MFV-24 vectors from the training set. Classification is performed by comparing the MFV-24 of a test sample with every MFV-24 in the entire training set; Euclidean distance and the 3-nearest-neighbors rule are applied to derive a decision. Type two is a prototype for each class. The prototype, denoted by MFFM, is generated by averaging each of the three 2-D FFMs over the 256 training samples of the class. MFFM is therefore the set of mean gray-level and directionality (GD), magnitude and directionality (MD), and gray-level and magnitude (GM) FFMs. Classification is performed by comparing the FFMs of a test sample with the prototype of each class in the training set; again Euclidean distance is used, now with the 1-nearest-neighbor rule, to obtain a decision.
FIGURE 10. Test images (panels labeled D70 and D79).
HELEN C. SHEN AND DURGESH SRIVASTAVA
Two sets of experiments were carried out:
Set I. Ten test samples were taken from each of the 20 images, giving a total of 200 test samples. The moment feature vector misclassifies 10 samples, a correct classification rate of 95%, while the prototype misclassifies 6 samples, a correct classification rate of 97%.
Set II. Four other Brodatz images were used to test the efficacy of the representations. One hundred test samples were taken from each of the four images, namely, D28 (beach sand), D35 (lizard skin), D70 (wood grain), and D79 (oriental grass fiber cloth) (Fig. 10). Table 1 gives the classification results when MFV-24 is used, and Table 2 gives the results when MFFM is used. The underlined values indicate "correct" classification, that is, assignment to a visually similar training class. For example, D35 is an image of lizard skin, which is similar to D3 and D22 (reptile skin). Using the moment feature vector, the correct classification rate is 74%, while that obtained with the prototype is 82%. Overall, the mean FFM prototype gives a better correct classification rate. We attribute the poor performance on some of the samples to the resolution problem and to the block size of the samples used.
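The two classification schemes can be sketched as follows, assuming each texture sample has already been reduced to a flat list of numbers (an MFV-24 vector, or the three 2-D FFMs concatenated for the prototype scheme). All function names are hypothetical; this illustrates 3-nearest-neighbors voting over every training vector versus a 1-nearest-neighbor match against per-class mean prototypes, and is not the authors' code.

```python
from collections import Counter
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_classify(sample: Vector, training: List[Tuple[Vector, str]], k: int = 3) -> str:
    """k-nearest-neighbors vote over every training vector (MFV-24 style)."""
    nearest = sorted(training, key=lambda item: euclidean(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tied vote, most_common keeps insertion order, so the closer neighbor's label wins.
    return votes.most_common(1)[0][0]


def build_prototypes(training: List[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Average the flattened FFMs of each class into one prototype (MFFM style)."""
    sums: Dict[str, Vector] = {}
    counts: Counter = Counter()
    for vec, label in training:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def prototype_classify(sample: Vector, prototypes: Dict[str, Vector]) -> str:
    """1-nearest-neighbor rule against one prototype per class."""
    return min(prototypes, key=lambda label: euclidean(sample, prototypes[label]))


def correct_rate(n_test: int, n_misclassified: int) -> float:
    """Percentage of correctly classified test samples."""
    return 100.0 * (n_test - n_misclassified) / n_test
```

For Set I, `correct_rate(200, 10)` and `correct_rate(200, 6)` reproduce the reported 95% and 97%. The prototype scheme stores one vector per class rather than one per training sample, which is why a single nearest prototype suffices.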
C. Summary
From the experiments, we have demonstrated that employing an FFM prototype for each texture class, with Euclidean distance and the 1-nearest-neighbor rule, gives better classification than the moment feature vector representation. Specifically, the advantages are that (1) each texture class is represented by a single prototype, i.e., the mean feature frequency matrices, instead of by one moment vector per training sample; (2) simple Euclidean distance can be used; and (3) a 1-nearest-neighbor rule is applied instead of a 3-nearest-neighbors rule.
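As a rough illustration of advantage (1), assuming the Set I configuration of 20 training classes with 256 training samples each, the number of distance evaluations needed per test sample is:

$$
N_{\text{MFV-24}} = 20 \times 256 = 5120 \quad \text{(3-NN over all training vectors)}, \qquad N_{\text{MFFM}} = 20 \quad \text{(1-NN over prototypes)}.
$$

Each prototype comparison, however, involves every entry of the three 2-D FFMs rather than 24 moments, so the cost of a single comparison is larger; the FFM dimensions are not restated in this section.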
TABLE 1. Classification using moment feature vectors. Rows: test images D28, D35, D70, and D79. Columns: training classes D3, D4, D5, D9, D14, D16, D17, D19, D20, D21, D22, D24, D29, D53, D57, D68, D76, D77, D78, and D84.
TABLE 2. Classification using the FFM prototype. Rows: test images D28, D35, D70, and D79. Columns: training classes, as in Table 1.
IV. CONCLUSIONS
Texture is a surface feature that is difficult to quantify, yet perceiving it is an intuitive capability that the human visual system takes for granted. In this paper, we have presented an approach that captures both statistical and structural information of texture images. By computing the joint occurrence of features, statistical information about structural aspects of textural properties can be obtained. Homogeneity of textures depends heavily on the resolution level, and the proposed scheme also provides a framework in which texture images at different resolutions can be considered. To encapsulate the information content of the feature frequency matrices, two types of feature vector can be extracted: (1) the moment feature vector, which is the standard characterization of any distribution; and (2) the partitioned feature frequency vector, which maintains maximum entropy of a feature frequency matrix. In terms of information content, the moment feature vector retains the least and the partitioned feature frequency vector the most. The moment feature vector, however, is faster to extract and more compact than the partitioned feature frequency vector. In this paper, we have shown that, for classification, using the feature frequency matrices directly with the Euclidean distance yields a consistent and high rate of correct classification. The advantage is a substantial saving: the overhead of extracting the moment feature vector and/or the partitioned feature frequency vector is avoided.
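To make the two feature-vector types concrete, here is a sketch for a single 1-D FFM. It assumes the four moments are the mean, variance, skewness, and kurtosis of the normalized frequency distribution, and it approximates maximum-entropy partitioning by cutting the feature range into contiguous cells of roughly equal mass (equal cell masses maximize the entropy of the cell distribution). Both choices are assumptions; the hierarchical scheme of Bie et al. (1993) and the treatment of 2-D FFMs are not reproduced, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def ffm_moments(freqs: Sequence[float]) -> Tuple[float, float, float, float]:
    """Mean, variance, skewness, kurtosis of a 1-D frequency distribution.

    freqs[i] is the (hypothetical) frequency of feature value i.
    """
    total = float(sum(freqs))
    if total <= 0:
        raise ValueError("frequency vector must have a positive total")
    p = [f / total for f in freqs]
    mean = sum(i * pi for i, pi in enumerate(p))
    var = sum((i - mean) ** 2 * pi for i, pi in enumerate(p))
    if var == 0:
        # Degenerate case (all mass on one value): higher moments set to 0 by convention here.
        return mean, 0.0, 0.0, 0.0
    sd = math.sqrt(var)
    skew = sum(((i - mean) / sd) ** 3 * pi for i, pi in enumerate(p))
    kurt = sum(((i - mean) / sd) ** 4 * pi for i, pi in enumerate(p))
    return mean, var, skew, kurt


def max_entropy_cuts(freqs: Sequence[float], k: int) -> List[int]:
    """Start indices of cells 1..k-1 for a roughly equal-mass split into k cells.

    Greedy quantile rule: with discrete bins it only approximates equal masses,
    and it may return fewer than k-1 cuts when the mass is concentrated.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    total = float(sum(freqs))
    cuts: List[int] = []
    cum = 0.0
    for i, f in enumerate(freqs[:-1]):
        cum += f
        # Open a new cell once the running mass reaches the next 1/k quantile.
        if len(cuts) < k - 1 and cum >= total * (len(cuts) + 1) / k:
            cuts.append(i + 1)
    return cuts


def cell_entropy(freqs: Sequence[float], cuts: Sequence[int]) -> float:
    """Entropy (bits) of the distribution of mass over the partition cells."""
    bounds = [0, *cuts, len(freqs)]
    total = float(sum(freqs))
    h = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        m = sum(freqs[lo:hi]) / total
        if m > 0:
            h -= m * math.log2(m)
    return h
```

Four moments from each of the six FFMs give the 24 components of MFV-24, while the cell frequencies of a partition form a partitioned feature frequency vector whose length grows with the number of cells.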
ACKNOWLEDGMENTS
Part of the research was performed at the Dept. of Systems Design Engineering, University of Waterloo, Canada.
REFERENCES
Amadasun, M., and King, R. (1989). Textural features corresponding to textural properties. IEEE Trans. Syst. Man Cybern. 19(5), 1264-1274.
Bajcsy, R., and Lieberman, L. (1976). Texture gradient as a depth cue. Comput. Graph. Image Process. 5(1), 52-67.
Bie, C., Shen, H., and Chiu, D. (1993). Hierarchical maximum entropy partitioning in texture image analysis. Pattern Recognition Lett. 14, 421-429.
Bovik, A., Clark, M., and Geisler, W. (1990). Multichannel texture analysis using localized spatial filters. IEEE Trans. Pattern Anal. Mach. Intell. 12(1), 55-73.
Brodatz, P. (1968). Textures. Reinhold, New York.
Chellappa, R., and Jain, A., Eds. (1993). Markov Random Fields: Theory and Application. Academic Press, Harcourt Brace Jovanovich, New York.
Coggins, J., and Jain, A. (1985). A spatial filtering approach to texture analysis. Pattern Recognition Lett. 3(5), 195-203.
Conners, R., and Harlow, C. (1980). A theoretical comparison of texture algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 2(3), 204-222.
Cross, G., and Jain, A. (1983). Markov random field texture models. IEEE Trans. Pattern Anal. Mach. Intell. 5(1), 25-39.
Dunn, S., Keizer, R., and Rosenfeld, A. (1988). Random field identification from a sample: Experimental results. Pattern Recognition Lett. 8(1), 15-20.
Eklundh, J. (1979). On the use of Fourier phase features for texture discrimination. Comput. Graph. Image Process. 9(2), 199-201.
Fogel, I., and Sagi, D. (1989). Gabor filters as texture discriminator. Biol. Cybernet. 61, 103-113.
Gool, L., Dewaele, P., and Oosterlinck, A. (1985). Survey: Texture analysis anno 1983. Comput. Vision Graph. Image Process. 29, 336-357.
Haralick, R. (1979). Statistical and structural approaches to texture. Proc. IEEE 67, 786-804.
Haralick, R., Shanmugam, K., and Dinstein, I. (1973). Textural features for image classification. IEEE Trans. Syst. Man Cybern. 3(6), 610-621.
Harwood, D., Subbarao, M., and Davis, L. (1985). Texture classification by local rank correlation. Comput. Vision Graph. Image Process. 32, 404-411.
Hassner, M., and Sklansky, J. (1978). Markov random field models of digitized image texture. In Proc. 4th Int. Joint Conf. on Pattern Recognition, Kyoto, Japan, Nov. 1978, pp. 538-540.
He, D., and Wang, L. (1992). Unsupervised textural classification of images using the texture spectrum. Pattern Recognition 25, 247-255.
Jain, A., and Farrokhnia, F. (1991). Unsupervised texture segmentation using Gabor filters. Pattern Recognition 24, 1167-1186.
Kaneko, H., and Yodogawa, E. (1982). A Markov random field application to texture classification. In Proc. Pattern Recognition and Image Processing, pp. 221-225.
Kashyap, R., Chellappa, R., and Khotanzad, A. (1982). Texture classification using features derived from random field models. Pattern Recognition Lett. 1(1), 43-50.
Laws, K. (1979). Texture energy measures. In Proc. Image Understanding Workshop, pp. 47-51.
Lendaris, G., and Stanley, C. (1970). Diffraction pattern sampling for automatic pattern recognition. Proc. IEEE 58(2), 198-216.
Shen, H., and Bie, C. (1992). Representation of visual textural properties. In C. Archibald, Ed., Advances in Machine Vision: Strategies and Applications, pp. 193-210. World Scientific Press, Series in Computer Science, Singapore.
Shen, H., Bie, C., and Chiu, D. (1993). A texture-based distance measure for classification. Pattern Recognition 26(9), 1429-1437.
Shen, H., and Wong, A. (1983). Generalized texture representation and metric. Comput. Vision Graph. Image Process. 23, 187-206.
Srivastava, D., and Shen, H. (1995). Homogeneous textures: A study using feature frequency matrices. Submitted for publication.
Tamura, H., Mori, S., and Yamawaki, T. (1978). Textural features corresponding to visual perception. IEEE Trans. Syst. Man Cybern. 8(6), 460-473.
Turner, M. (1986). Texture discrimination by Gabor functions. Biol. Cybernet. 55, 77-82.
Unser, M. (1986). Sum and difference histograms for texture classification. IEEE Trans. Pattern Anal. Mach. Intell. 8(1), 118-125.
Unser, M., and Eden, M. (1989). Multiresolution feature extraction and selection for texture segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 11(7), 717-728.
Wechsler, H. (1980). Texture analysis: A survey. Signal Processing 2, 271-282.
Index
A Agmon-Motzkin-Schoenberg algorithm, 178 Algebraic reconstruction technique, 177, 183 ART, see Algebraic reconstruction technique Atomic beam splitter, ferromagnetic nanotip application, 145-149 Atomic metallic ion emission, 94
B Bargmann-Michel-Telegdi equation, 375 Beam splitter, see Atomic beam splitter Block-parallel methods, image recovery projection, 216-217 Born approximation, 313 Browder’s admissible control, image recovery projection, 209-211
C Characteristic surface, spacetime algebra, 310-311 Clifford algebra, see Spacetime algebra Computerized tomography, convex set theoretic image recovery, 177-178, 182-183 Convex feasibility problem, see also Convex set theoretic image recovery convex feasibility in a product space, 171-172 convex functionals, 165 Fejér-monotone sequences, 170-171 geometric properties of Hilbert space sets, 162-163 inconsistent problem solution alternating projections in a product space, 203-206 least-squares solutions, 202-203 simultaneous projection methods, 206-209 nonlinear operators, 168-170 projections distance to a set, 166 operators, 166-167 relaxed convex projections, 168 steps in solution, 161, 199-200 topologies of subsets, 163-164 Convex set theoretic image recovery affine constraints, 179, 259 applications, 180-181, 183-184 restoration problems, 182 tomographic reconstruction, 182-183 basic assumptions, 172 confidence level, 198-199 historical developments, 176-180 image space models analog, 173 digital, 173-174 discrete, 173 general, 172-173 information management, 198-199 nonconvex problem solving convexification, 184-185 feasibility with nonconvex sets, 186 new solution space, 185 property sets in Hilbert space, construction data formation model, 190-192 imaging system properties, 189-198 moment information sets, 194-195 range information sets, 193 second-order information sets, 196-198 spatial properties of image, 187-188 spectral properties of image, 188-189 transformed image, 189 set theoretic formulation, 174-176 Correlated uncertainty process, 197-198 Coulomb problem Hamiltonian, 306 spacetime algebra solution, 306-307 Coulomb scattering, spacetime algebra, 313-314 Current density distribution, 74-76, 78
D Data formation model image recovery problem, 156-157 set construction digital model, 191 general model, 190-191 notation, 190 Dirac equation CPT symmetry handling, 295 Dirac adjoint, 292-293 Hamiltonian form, 299 angular momentum operators, 302 nonrelativistic reduction, 299-301 notation, 298 Pauli theory, 299-301 spherical monogenics, 303-305 Hermitian adjoint, 292-293 observables, 292-296 plane-wave states, 295-296, 316
spacetime algebra version, 292, 297 spinors, 293-296 Dirac oscillator equation, 307 Hamiltonian, 307 negative K equation, 308-309 positive K equation, 308 Dirac plane wave evanescent waves, 316, 320-321 Klein paradox, 328-330, 332 matching at a potential step evanescent waves, 321-322 oblique incidence, 315 perpendicular incidence, 315 traveling waves spin down, 319-320 spin up, 318-319 spacetime algebra equations, 295-296, 316 spin precession at a barrier Pauli spinor in reflected wave, 323 polarization operator, 324-325 precession angle, 323-324 reflection coefficient, 323 rest-spin vector for reflected wave, 323 traveling waves, 316-317 tunneling time, see Tunneling Dirac spinor complex conjugation, 291-292 Dirac-Pauli matrix representation, 289-290 operator action of matrices, 288-289 vector derivative operator, 280-281 Weyl matrix representation, 290-291
E Electron beam, monochromatic beam and nanotip, 115, 118 Electron emission metal surface emission current density distribution, 74-76, 78 current stability, 78-81 current-voltage characteristics, 71-72 electron potential energy, 65-66 energy distribution of electrons, 72-74 extraction processes, 66 field emission, 69-71, 73-76, 78-81 metal/vacuum barrier, 64-66
thermionic emission, 66, 68-69, 72-73, 75, 79 nanotip, see Nanotip Electron microscope, see Transmission electron microscope EMOPAP, see Extrapolated method of parallel approximate projections EMOPNO, see Extrapolated method of parallel nonexpansive operators EMOPP, see Extrapolated method of parallel projections EMOPSP, see Extrapolated method of parallel subgradient projections EPPM, see Extrapolated parallel projection method Extrapolated method of parallel approximate projections algorithm, 225-226 convergence results, 226 image recovery projection, 223-226 problem statement, 223-224 Extrapolated method of parallel nonexpansive operators algorithm, 230 convergence results, 230-231 image recovery projection, 229-231 problem statement, 229-230 Extrapolated method of parallel projections control strategies, 218-219 convergence results, 219-223 image recovery projection, 217-223 image restoration with bounded noise, 248 iteration, 217 Extrapolated method of parallel subgradient projections algorithm, 228, 257 control, 234 convergence results, 229, 234, 256 image recovery projection, 226-229, 252-253 practical considerations, 234-235 problem statement, 226-227 relaxations, 235 set theoretic formulation, 253-256 stopping rule, 235 subgradient projections, 228 superiority to POCS, 234, 257 weights, 234
Extrapolated parallel projection method, image recovery projection, 213-216
F Feasible solutions, image recovery problem, 159-160 Feature frequency matrix, texture representation, 388, 390 classification scheme, 402-403, 405 distance measure, 402, 406 feature image, 390-391 generalized, 393 moment feature vectors, 393, 400, 406 one-dimensional, 393 partitioned feature frequency vector, 400-401, 406 two-dimensional, 393 FEG, see Field emission gun Fejér-monotone sequence, 170-171 FFM, see Feature frequency matrix Field emission gun, 63-64 Field emission theory, see Electron emission Field emission tip, see Nanotip Fowler-Nordheim equation, 69-72, 99, 111-112 FPM, see Fresnel projection microscopy Fresnel-Kirchhoff formula, 127, 128 Fresnel projection microscopy coherence, 139, 140 experimental procedures, 129, 130 field emission current, 126, 127 Fraunhofer diffraction, 129 Fresnel diffraction, 127-129 instrumentation, 124-126 irradiation effects, 143, 144 magnetic stray field, 142, 143 magnification factor, 124, 125 nanotip application, 124-126 nanometric carbon fibers, 130-132 ribonucleic acid, 132, 134, 136, 138, 144 synthetic polymers, 132, 134, 136-139 resolution, 140 sample preparation, 132 virtual projection point, 126, 127
G Geometric algebra, see also Spacetime algebra
discovery, 272 electron physics, 272 Gerchberg-Papoulis algorithm, 178-179 Grassman algebra, quantum theory, 377-379
H Hilbert space, see Convex feasibility problem; Convex set theoretic image recovery
I Image recovery deconvolution with bounded uncertainty experiment, 240-242 results, 243 set theoretic formulation, 243 problem solving convex feasibility problem, see Convex feasibility problem data formation model, 156-157, 190-192 elements required, 156 feasible solutions, 159-160, 259-260 optimal solutions, 158-159, 260 point estimates, 158-159 set theoretic estimates, 159-160 solution method, 157-158 projection methods block-parallel methods, 216-217 Browder’s admissible control, 209-211 extrapolated method of parallel projections, 217-223 extrapolated method of parallel approximate projections, 223-226 extrapolated method of parallel nonexpansive operators, 229-231 extrapolated method of parallel subgradient projections, 226-229 unification of methods, 231-232 Pierra’s extrapolated iteration, 211-216 reconstruction, 156 restoration, 156 image with bounded noise bounded versus unbounded noise, 251-252 experiment, 246 numerical performance, 248
results, 248 set theoretic formulation, 246, 248 subgradient projections experiment, 253 numerical performance, 256-257, 259 results, 259 set theoretic formulation, 253-256
K Klein paradox, 328-330, 332 Knoll, Max, electron microscopy contributions, 13-14, 16-18, 39, 42
L Leibniz’ rule, 283 Lorentz force law, 374
M Maxwell equation, vector derivatives, 281-282 Monogenic function, 303 Mott scattering, spacetime algebra, 314-315
N Nanotip applications, 64, 112, 149 atomic beam splitter, 145-149 atomic resolution under FEM, 112-115 Fresnel projection microscopy, 124-132, 134, 136-140, 142-144 local cooling, 118, 121-124 local heating, 118-121 microguns, 150 monochromatic electron beam, 115, 118 beam opening angle, 97-98 diffraction through a tunnel barrier, 106-107 geometric effect, 105-106 confinement of field emitting area, 84-96 current saturation, 110-112 current-voltage characteristics, 99-100, 110-112 design, 64, 81 experimental setup, 82-84 field emission characteristics, 81-82, 104 stability, 98, 107 in situ field sharpening, 87-88 buildup tips, 91-92 field surface melting mechanism, 93-94 growth and formation of nanoprotrusions, 94-96 local field enhancement, 88-89 sharpening in applied field, 89-91 iron nanotip atomic metallic ion emission, 147-148 beam splitting mechanism, 148-149 field emission, 145 localized band structure, 107-110 local work function decrease, 87 total energy distribution, 100-103, 107-110 ultrasharp tips, 85-87 Nottingham effect defined, 118 nanotip application local cooling, 118, 121-124 local heating, 118-121
O Optimal solutions, image recovery problem, 158-159
P Parallel projection method projection in image recovery, 207-208, 211-212, 215 recovery with inconsistent restraints experiment, 235, 237 numerical performance, 240 results, 238-240 set theoretic formulation, 237-238 Pauli principle eight-dimensional streamlines, 369-372, 374 fermionic field theory, 366 relativistic wavefunctions, 366-369 Pauli spinor momentum density, 286 momentum vector field, 286 observables, 285-287 operator action of matrices, 284-285 parameterization, 339
reflected wave, 323 rotors, 287-288 spherical monogenics, 303-305, 380-383 spin vector, 286 Pauli spin states, two-particle basis vectors, 352 causal approach comparison, 358-360 nonrelativistic multiparticle observables, 355-357 nonrelativistic singlet state, 354-355 quantum correlator, 353-354 Pierra’s extrapolated iteration, image recovery projection, 211-216 POCS, see Projection onto convex sets algorithm Point estimates, image recovery problem, 158-159 Polymers, synthetic, Fresnel projection microscopy, 132, 134, 136-139 PPM, see Parallel projection method Projection onto convex sets algorithm image recovery, 179-181, 183, 200-201, 238-240 image restoration with bounded noise, 248 limitations of convex feasibility problem solving countable set theoretic formulation, 202 inconsistent problems, 201-202 serial structure, 201 slow convergence, 201 practical considerations, 232-234 subgradients, 256-257, 259 Propagation spacetime algebra, 310-311 spinor potentials, 311-312
R Relativistic two-particle states relativistic singlet state and invariants, 361-364 vectors, 361 Ribonucleic acid base damage by electron irradiation, 144 Fresnel projection microscopy, 132, 134, 136, 138, 144 Ruska, Ernst children, 26, 59-61
contributions to minimization of external vibration, 44-47 death, 61 extramural activities, 47-48 family background, 4-13 Knoll's influence, 13-14, 16-18, 39, 42 marriage, 22-23, 59-61 Max-Planck-Gesellschaft experience, 37-38 military service, 23 Nobel Prize, 53-54, 56-58 politics, 48-50 postwar experiences, 28-34 retirement, 50-52 Siemens experience, 25-28, 30, 35-37 single-field condenser-objective development, 41-42 Soviet relationship, 28-30 Technische Hochschule Berlin experience, 13-14, 16-18 transmission electron microscope development, 4, 18, 20, 25, 37
S Scattering theory Born approximation, 313 Coulomb scattering, 313-314 Mott scattering, 314-315 spacetime algebra, 312-315 Schottky emission cathode, 68-69 Set theoretic estimates, image recovery problem, 159-160 Simultaneous iterative reconstruction technique, 177, 183 Single-field condenser-objective, development, 41-42 SIRT, see Simultaneous iterative reconstruction technique Snell's law, 317 Solution method, image recovery problem, 157-158 Spacetime algebra classical and semiclassical mechanics, 374-377 Grassman algebra, 377-379 introduction, 273-278 multiparticle quantum theory, 347, 349, 351
applications, 379 eight-dimensional streamlines and Pauli exclusion, 369-372, 374 multiparticle wave equations, 364-366 notation, 351 Pauli principle, 366-369 relativistic two-particle states, 361-364 two-particle Pauli states, 352-360 multivectors, 277 operators, 276 product types, 275 reversion operation, 276 rotors, 276 spacetime calculus, 280-283 spacetime split, 278-280 spinors, see Spinor vector derivative, 280-282 Spherical monogenic derivatization, 380-383 Pauli spinor, 303-305 Spin measurement Dirac current, 339, 342 relativistic model, 342-344 spacetime algebra, 339, 342 wavepacket simulations, 344, 346-347 Spinor, see also Dirac spinor; Pauli spinor definition, 283-284 types, 283 STA, see Spacetime algebra Stern-Gerlach apparatus, spin polarization, 287, 339
T Texture definition, 387-388 feature image, 390-391 representation cooccurrence matrix approach, 389 feature frequency matrix, 388, 390-391, 393, 400-403, 405-406 image transformation, 389 local mask methods, 389 multichannel filter, 389 properties, 390 Tomography, see Computerized tomography Transmission electron microscope, development, 4, 18, 20, 25, 37 Tunneling Dirac current, 333, 336 time calculation, 326-327, 337-338 two-dimensional simulation, 338-339 wavepacket tunneling, 332-333, 336-338
W
Wave, see Dirac plane wave Wave equations, multiparticle, 364-366 White uncertainty processes, 196-197