Advances in
IMAGING AND ELECTRON PHYSICS VOLUME
160
EDITOR-IN-CHIEF
PETER W. HAWKES CEMES-CNRS Toulouse, France
Advances in
IMAGING AND ELECTRON PHYSICS VOLUME
160 Edited by
PETER W. HAWKES
CEMES-CNRS, Toulouse, France
AMSTERDAM • BOSTON • HEIDELBERG • LONDON NEW YORK • OXFORD • PARIS • SAN DIEGO SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO Academic Press is an imprint of Elsevier
Academic Press is an imprint of Elsevier Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands 32 Jamestown Road, London NW1 7BY, UK 30 Corporate Drive, Suite 400, Burlington, MA 01803, USA 525 B Street, Suite 1900, San Diego, CA 92101-4495, USA First edition 2010 Copyright # 2010, Elsevier Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher. Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email:
[email protected]. Alternatively you can submit your request online by visiting the Elsevier web site at http://www.elsevier.com/ locate/permissions, and selecting Obtaining permission to use Elsevier material. Notice No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made. Library of Congress Cataloging-in-Publication Data A catalog record for this book is available from the Library of Congress British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library ISBN: 978-0-12-381017-5 ISSN: 1076-5670 For information on all Academic Press publications visit our Web site at elsevierdirect.com Printed in the United States of America 10 11 12 10 9 8 7 6 5 4 3 2 1
Contents
Preface Contributors Future Contributions
1. Gamut Mapping
ix xv xvii
1
Zofia Baran´czuk, Joachim Giesen, Klaus Simon, and Peter Zolliker 1. Introduction 1.1. Scope of this Chapter 2. Fundamentals of Color Science 2.1. Human Vision 2.2. Color Spaces 2.3. RGB and XYZ Color Spaces 2.4. CIELAB Color Space 2.5. Color and Image Appearance 3. Gamut Mapping in the Image Reproduction Process 3.1. Model of Visual Image Perception in the Reproduction Environment 3.2. Color Management 4. Psychometrics 4.1. Test Methods 4.2. Data Evaluation 5. Image Quality Measures 5.1. Mean Square Error 5.2. The Measure QDE 5.3. Laplacian Mean Square Error 5.4. The Measure QDLC 5.5. Structural Similarity Index 5.6. Comparing Models 6. Gamut Mapping as an Optimization Problem 6.1. Abstract Optimization Problem 6.2. Concrete Objective Functions and Constraints 7. Conclusion and Future Work References
2 4 4 5 5 6 7 8 9 9 12 14 14 16 21 22 22 22 23 23 24 25 26 27 31 32
v
vi
Contents
2. Color Area Morphology Scale-Spaces
35
Adrian N. Evans 1. 2. 3. 4.
Introduction Area Openings and Closings Area Morphology Scale-Spaces Color Connected Filters 4.1. Color Area Morphology Scale-Spaces 4.2. Convex Color Sieve 4.3. Vector Area Morphology Sieve 4.4. Vector Area Morphology Open-Close Sieve 5. Performance Evaluation 5.1. Implementation and Timings 5.2. Application to Image Segmentation 5.3. Robustness to Noise 6. Conclusions Acknowledgments References
3. Harmonic Holography
35 38 41 43 44 48 50 57 60 61 63 68 70 71 71
75
Ye Pu, Chia-Lung Hsieh, Rachel Grange, and Demetri Psaltis 1. Introduction 2. Scattering of Optical Second Harmonics 2.1. SHG Compared with Two-Photon Fluorescence 2.2. Differences between Bulk and Nanoscale Harmonic Generations 2.3. Emission Power and Scattering Cross Section of Nanoscale SHG 3. Principle of Harmonic Holography 3.1. Coherence of Optical Harmonics 3.2. Harmonic Holography 3.3. Performance Estimation 3.4. Holography with Other Types of Nonlinear Emissions 4. Three-Dimensional Microscopy of Cells with Harmonic Holography 5. Future Developments 6. Conclusion References
4. Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
75 80 81 83 86 93 93 93 100 102 102 105 106 107
113
Gerhard X. Ritter and Gonzalo Urcid 1. 2. 3. 4. 5.
Introduction Basic Concepts from Lattice Theory Properties of LAMs LAMs and Endmembers Application Examples and Results
113 116 118 119 120
Contents
vii
6. The Geometry of F(X) 7. Relationships Among X, WXX, MXX, and F(X) 8. Prelude to Affine Independence 9. Affine-Independent Sets derived from W and M 10. Conclusions and Discussion 11. Appendix: Theorem Proofs References
136 145 148 154 157 158 167
5. Origin and Background of the Invention of the Electron Microscope
171
Reinhold Ru¨denberg 1. Introduction 2. Cathode Ray Tubes in Evolution 3. My Own Career 4. A Case of Infantile Paralysis 5. Limitations of Optical Microscopes 6. Realization of the Electron Microscope 7. Fate of the Invention 8. Building of the Instrument References
6. Origin and Background of the Invention of the Electron Microscope: Commentary and Expanded Notes on Memoir of Reinhold Ru¨denberg
172 173 178 181 183 184 193 195 198
207
H. Gunther Rudenberg and Paul G. Rudenberg 1. Introduction 2. Early Life 2.1. Childhood and Education 2.2. Professional Career 2.3. Family 2.4. Polio 3. The Process of Invention 3.1. Early Patent Applications 3.2. Early Work at Siemens 3.3. Leaving Germany 4. The Fate of the Patent Applications 4.1. Controversy 5. Building the Electrostatic Microscope 6. Recognition and Character Acknowledgments References
208 211 211 214 215 216 218 233 236 244 247 255 262 265 267 267
Contents of Volumes 151–159
287
Index
291
This page intentionally left blank
Preface
The first four chapters of this volume cover topics taken from different aspects of imagery: the relation between representations of colour in different devices; still in the domain of colour, the related area morphology scale-spaces; an attempt to display the time-variation of three-dimensional spatial information by harmonic holography; and the mathematics of a problem in hyperspectral imagery. The last two chapters are of a different nature. The role of Reinhold Ru¨denberg in the early history of the electron microscope has been discussed on many occasions. That he took out the first patent on the electron microscope is not disputed but how did he come by the idea? The last two chapters consist of a memoir written by Ru¨denberg in the 1950s followed by a commentary by his son and grandson. We open with a chapter by Z. Baran´czuk, J. Giesen, K. Simon and P. Zolliker on gamut mapping, the transfer of a range of colours from one medium to another. In order to make their account self-contained, the authors first provide an introduction to colour science before discussing in detail the image reproduction process, psychometrics, quality measures and, finally, mapping as an optimization problem. This is followed by a discussion of colour area morphology scalespaces by A.N. Evans. Area openings and closings (connected operators) are attractive because they generate scale-spaces without altering edges, which makes them invaluable in many applications. Here, an important advance is the application of such operators to colour images. The author guides us through this difficult material with great skill. The third chapter, by Y. Pu, C.-l. Hsieh, R. Grange and D. Psaltis, introduces the subject of harmonic holography. This is a way of creating a fast time sequence of three-dimensional images, the contrast in which is generated by a form of holography known as harmonic holography. This clear presentation will enable the newcomer to this very original technique to understand what is involved. The last of this group of four chapters is by G.X. Ritter (a regular contributor to these Advances) and G. Urcid. Hyperspectral images, from Landsat for example, require interpretation and an essential feature of this is the identification of a set of ‘‘endmembers’’. G.X. Ritter and G. Urcid show that lattice algebra offers a way of determining these. In view of the relatively unfamiliar nature of the concepts employed, they provide a good introduction to lattice algebra before discussing the
ix
x
Preface
problem in detail. This excellent presentation should be invaluable to readers concerned with such problems. The last two chapters need a longer introduction for, without some knowledge of the earliest days of the electron microscope, the reason for their presence here may not be clear. In 1926 and 1927, Hans Busch published two articles in which he demonstrated that the effect of a static rotationally symmetric field on an electron beam is analogous to that of a glass lens on a beam of light and hence that the same quantities – focal length and focal distance – can be used to characterize this effect. It was only a matter of time before an attempt was made to form an enlarged image with such electron lenses, thus paving the way to the development of the electron microscope, and in 1931 Max Knoll and Ernst Ruska obtained promising results with a simple two-lens instrument in the Technische Hochschule in Berlin. We are told by Martin Freundlich, who was also present, that ‘‘Between February and May 1931, the microscope was shown to many interested scientists from the Technical University as well as from industrial laboratories’’ (Freundlich 1963), though I do not think that the word microscope was yet in use. Among these visitors was Max Steenbeck (1904–1981) of the Siemens–Schuckertwerke, who reported what he had been shown to his superior, Reinhold Ru¨denberg. Soon after (the exact date of Steenbeck’s visit is not known), Ru¨denberg applied for patents that effectively gave him priority in the invention of the electron microscope; these patents were applied for at the end of May 1931, a few days before Max Knoll was scheduled to give a public lecture (the Cranz Colloquium) on his work with Ruska. Ru¨denberg published nothing to suggest that he had done any work on an electron microscope, his only appearance in print being an 18-line note in Naturwissenschaften (1932). If this were all, the story would stop here: Ru¨denberg would appear in a footnote in the history books while scientific articles only rarely cite patents (which are not refereed in the sense that journal articles are). But it is not all. In 1943, Ru¨denberg published a description of the circumstances surrounding his invention, explaining that he had been seeking a way of making viruses visible since one of his sons had contracted polio in 1930. It had struck him that electrons could provide the solution and he therefore patented the idea. He wrote: ‘‘I recognised late in 1930 the possibility of making use of electrons for producing largely magnified images of small objects. . .it appeared possible to me with application of electron rays to penetrate deep into the details of the submicroscopic world. . .Since already the first investigations turned out to be very promising, the principal results were expressed in patent applications assigned to the German industrial firm with which I was formerly connected.’’ There is no mention of these ‘‘promising first investigations’’ in the Memoir. Ru¨denberg published nothing more on the early history and died in 1961.
Preface
xi
The information about Steenbeck’s visit and his subsequent report to Ru¨denberg comes from an exchange of letters between him and Max Knoll shortly after World War II, triggered off by an enquiry from the editor of the Encyclopedia of Microscopy, G.L. Clark1. In an article submitted for publication in the Encyclopedia, Ru¨denberg had claimed to be the true inventor of the electron microscope and Erwin Weise had written to Knoll on Clark’s behalf to ask whether such a claim was justified. Knoll in turn wrote to Steenbeck, who made it clear that in his opinion, Ru¨denberg had the idea of an electron microscope as a result of his report. A later paper by Freundlich (1994), which was incidentally first intended for these Advances but, being rather short, was referred to the MSA Bulletin, is of considerable interest here. The Knoll—Steenbeck letters were published (in German and in English translation) in an earlier volume of these Advances (Hawkes, 1985). They were also included in a meticulous analysis of the early patents related to the electron microscope by Ernst Ruska, which was published in German and in English (Ruska, 1984, 1986); this too, like Ruska’s earlier book (Ruska, 1979, 1980), is a key document. The letters gave rise to discussion of the ‘‘vexed question or ‘‘leidige Elektronenmikroskop-Angelegenheit’’: did Ru¨denberg patent the electron microscope as a result of what Steenbeck told him or did he have the idea independently? Opinions are sharply divided, some believing that Ru¨denberg’s patent was indeed inspired by the Knoll–Ruska work, others accepting that he could have had the idea of an electron microscope independently and, on learning that others were on the same track, hastened to patent his idea. It is worth repeating here that, although Steenbeck was convinced that it was his report that resulted in Ru¨denberg’s patents, he also found it possible that Ru¨denberg’s quick mind could have conceived the idea of a microscope independently on the basis of the slender information provided by Steenbeck. The latter was, incidentally, an unconditional admirer of Ru¨denberg as his memoirs attest (Steenbeck, 1977; the relevant passages are quoted by Ruska, 1984, 1986). The publication of the letters led Ru¨denberg’s son Gunther to approach me and I assured him that if he had other evidence to offer I should be willing to publish it in the Advances. Reinhold Ru¨denberg had written a long autobiographical account and Gunther Rudenberg proposed that this should be published, accompanied by explanatory notes by himself. The usual Academic Press contract was sent to him and was signed and returned to the publisher in April 1988. Many years passed, during which Gunther Rudenberg and his brother Hermann published a number of short papers on the subject
1 Knoll in fact mentions the Encyclopedia of Chemistry, also edited by Clark, but it seems safe to assume that it was the Encyclopedia of Microscopy that was intended. No article by Ru¨denberg appears in either encyclopedia.
xii
Preface
(Rudenberg, 1987, 1992; Rudenberg and Rudenberg, 1992, 1994), and towards the end of 2008, Gunther Rudenberg told me that he had nearly finished. But in the New Year, I learned from his son Paul that Gunther Rudenberg had died on 14 January 2009, leaving the commentary well advanced but not yet complete. Paul Rudenberg has now brought the work to a conclusion and it is the Memoir and Commentary that form the last two chapters of this volume. Both the memoir and the commentary go far beyond the ‘‘vexed question’’. The reader will see that they provide a quite full biography of Reinhold Ru¨denberg and his family. A little-known part of the commentary concerns the testimony given in the court case in which Ru¨denberg applied (successfully) for restitution of his patents. Here we are told that Ru¨denberg had prepared a longer publication than the short note in Naturwissenschaften but that permission for publication was not granted by Siemens; we are also told that some experimental work was performed in the Siemens & Halske laboratories but that here again, publication was not authorized. This has already been mentioned by Rudenberg and Rudenberg (1994). There are features of the Memoir that I find surprising. First is the fact that Ru¨denberg does not mention Steenbeck’s report; it would have been so easy to write ‘‘Steenbeck’s report showed that the Knoll group would soon be thinking along the same lines as myself and I therefore decided to protect my idea’’. In the commentary, Gunther and Paul Rudenberg speculate that Steenbeck’s report just didn’t seem important to Ru¨denberg and that it was the imminent public lecture that led him to apply for a patent. A second point concerns Ru¨denberg’s complaint that he was not mentioned in four articles on the early history of the electron microscope that appeared in the wartime years. Reading those articles today, I find it natural that the author of a short note published in 1932, known to have done no experimental work, should be left unmentioned, especially as the main purpose of these four papers was to establish the chronology of the contributions of the Knoll–Ruska–Siemens group and of the AEG group. This is made very clear in a recent study by Falk Mu¨ller of the early work on the electron microscope in Germany (Mu¨ller, 2009). As in any scientific article, the authors cite publications in journals and give no account of the patent history of the instrument; the only patents that are mentioned are an early patent of Stintzing’s (1927, granted 1929), which the AEG group regarded as a precursor; a 1929 patent of Knoll’s; and a magnetic lens patent of von Borries and Ruska’s (1932). Paul Rudenberg also reminds us that Ru¨denberg’s name could not be mentioned because he was a Jew and this makes Ru¨denberg’s observation even more singular. It is not disputed that, as has often been said, Ru¨denberg is the inventor of the electron microscope in patent law. But Ru¨denberg’s patents were first disclosed in December 1932 (in France) and by then, Knoll and Ruska and
Preface
xiii
Bru¨che at AEG had all published or submitted for publication articles on their electron microscopes. Ru¨denberg did, incidentally, attend the Cranz colloquium but did not participate in the discussion or mention his patent application. During the composition of the Commentary, Paul Rudenberg has shown me successive drafts and invited me to comment. The final text and the opinions expressed there are of course his alone. I repeat, the reader must make up his own mind on the ‘‘vexed question’’. He can now read both the letters of Steenbeck and Knoll and the memoir written by Ru¨denberg and I hope that he will find the rest of Ru¨denberg’s story interesting. As always, I express my appreciation of the hard work of all the contributors to this volume and in particular, of their efforts to make difficult and novel material accesible to wide readership. Peter W. Hawkes
REFERENCES Freundlich, M. M. (1963). Origin of the electron microscope. Science 142, 185–188. Freundlich, M. M. (1994). The history of the development of the first high-resolution electron microscope. Remembering the Knoll research team at the Technical University, Berlin, 1927–1934. MSA Bull. 24, 405–415. Hawkes, P. W. (1985). Complementary accounts of the history of electron microscopy. In ‘‘The Beginnings of Electron Microscopy’’, Adv Electron. Electron Phys. Supplement 16, 589–618, at pp. 602–608. Mu¨ller, F. (2009). The birth of a modern instrument and its development in World War II. Electron microscopy in Germany from the 1930s to 1945. In ‘‘Scientific Reseach in World War II. What scientists did in the war,’’ (A. Maas and H. Hooijmaiers, eds.), pp. 121–146. Routledge, London and New York. Rudenberg, H. G. (1987). Nobel Prize. Letter in New Scientist, 2 July 1987, p. 70. Rudenberg, H. G. (1992). The 50 years before the electron microscope: from electron to electron lens – Hans Busch and the ‘‘Go¨ttingen Group’’ Proc. EMSA 50, 1084–1085. Rudenberg, F. H., and Rudenberg, H. G. (1992). Early contributions to the electron microscope: the relevant work and patents of Reinhold Ru¨denberg. Proc. EMSA 50, 1086–1087. Rudenberg, H. G., and Rudenberg, F. H. (1994). Reinhold Ru¨denberg as a physicist – his contributions and patents on the electron microscope, traced back to the ‘‘Go¨ttingen Electron Group.’’ MSA Bull. 24, 572–580. Ru¨denberg, R. (1932). Elektronenmikroskop. Naturwiss. 20, 522. Ru¨denberg, R. (1943). The early history of the electron microscope. J. Appl. Phys. 14, 434–436. Ruska, E. (1979). Die fru¨he Entwicklung der Elektronenlinsen und der Elektronenmikroskopie. Acta Historica Leopoldina, No. 12. Ruska, E. (1980). The Early Development of Electron Lenses and Electron Microscopy (Hirzel, Stuttgart), also published as Supplement 5 to Microscopica Acta. Ruska, E. (1984). Die Entstehung des Elektronenmikroskops. (Zusammenhang zwischen Realisierung und erster Patentanmeldung, Dokumente einer Erfindung.) Arch. Geschichte Naturwiss. 11/12, 524–551.
xiv
Preface
Ruska, E. (1986). The emergence of the electron microscope. Connection between realization and first patent application. Documentation of an invention. J. Ultrastruct. Molec. Struct. Res. 95, 3–28. Steenbeck, M. (1977). Impulse und Wirkungen. Schritte auf meinem Lebensweg. Verlag der Nation, Berlin.
Contributors
Zofia Baran´czuk, Joachim Giesen, Klaus Simon, and Peter Zolliker Swiss Federal Laboratory for Materials Testing and Research (EMPA), Du¨bendorf, Switzerland; and Friedrich-Schiller-Universita¨t, Jena, Germany
1
Adrian N. Evans Department of Electronic and Electrical Engineering, University of Bath, BA2 7AY, United Kingdom
35
Ye Pu, Chia-Lung Hsieh, Rachel Grange, and Demetri Psaltis Optics Laboratory, School of Engineering, E´cole Polytechnique Fe´de´rale de Lausanne (EPFL), STI IOA LO, BM 4.107, Station 17, CH-1015 Lausanne, Switzerland
75
Gerhard X. Ritter and Gonzalo Urcid CISE Department, University of Florida, Gainesville, FL 32611, USA Optics Department, INAOE, Tonantzintla, Pue 72000, Mexico
113
Reinhold Ru¨denbergw Gordon McKay Professor of Electrical Engineering, Emeritus, Harvard University, Cambridge, Massachusetts, USA; and Professor Emeritus and Honorary Senator, Technological University of Berlin, Germany
171
H. Gunther Rudenbergw and Paul G. Rudenberg H. Gunther Rudenberg, the older son of Reinhold Ru¨denberg, is deceased. At the time of the writing of this chapter, he lived in Scarborough, Maine, USA; and Paul G. Rudenberg, son of H. Gunther Rudenberg lives in Les Cayes, Haiti, c/o 259 Foreside Road, Falmouth, Maine 04105, USA
207
w
Deceased
xv
This page intentionally left blank
Future Contributions
S. Ando Gradient operators and edge and corner detection K. Asakura Energy-filtering x-ray PEEM W. Bacsa Optical interference near surfaces, sub-wavelength microscopy and spectroscopic sensors C. Beeli Structure and microscopy of quasicrystals C. Bobisch and R. Mo¨ller Ballistic electron microscopy G. Borgefors Distance transforms Z. Bouchal Non-diffracting optical beams A. Buchau Boundary element or integral equation methods for static and time-dependent problems B. Buchberger Gro¨bner bases E. Cosgriff, P. D. Nellist, L. J. Allen, A. J. d’Alfonso, S. D. Findlay, and A. I. Kirkland Three-dimensional imaging using aberration-corrected scanning confocal electron microscopy T. Cremer Neutron microscopy A. X. Falca˜o The image foresting transform R. H. A. Farias and E. Recami Introduction of a quantum of time (‘‘chronon’’) and its consequences for the electron in quantum and classical physics
xvii
xviii
Future Contributions
R. G. Forbes Liquid metal ion sources C. Fredembach Eigenregions for image classification A. Giannakidis and M. Petrou Conductivity imaging and generalized Radon transform: a review ¨ lzha¨user A. Go Recent advances in electron holography with point sources M. Haschke Micro-XRF excitation in the scanning electron microscope L. Hermi, M. A. Khabou, and M. B. H. Rhouma (vol. 162) Shape recognition based on eigenvalues of the Laplacian M. I. Herrera The development of electron microscopy in Spain M. S. Isaacson Early STEM development K. Ishizuka Contrast transfer and crystal images A. Jacobo Intracavity type II second-harmonic generation for image processing L. Kipp Photon sieves G. Ko¨gel Positron microscopy T. Kohashi Spin-polarized scanning electron microscopy O. L. Krivanek Aberration-corrected STEM R. Leitgeb Fourier domain and time domain optical coherence tomography B. Lencova´ Modern developments in electron optical calculations J.-c. Li Fast Fourier transform calculation of diffraction integrals H. Lichte New developments in electron holography O. Losson, L. Macaire, and Y.-q. Yang Color demosaicing: comparison of color demosaicing methods
Future Contributions
xix
M. Mankos, V. Spasov, and E. Munro (vol. 161) Principles of dual-beam low energy electron microscopy M. Marrocco Discrete diffraction M. Matsuya Calculation of aberration coefficients using Lie algebra S. McVitie Microscopy of magnetic specimens J. D. Mendiola-Santiban˜ez, I. R. Terol-Villalobos, and I. M. Santilla´n-Me´ndez (vol. 161) Determination of adequate parameters for connected morphological contrast mappings through morphological contrast measures I. Moreno Soriano and C. Ferreira (vol. 161) Fractional Fourier transforms and geometrical optics M. A. O’Keefe Electron image simulation D. Oulton and H. Owens Colorimetric imaging D. Paganin and T. Gureyev Intensity-linear methods in inverse imaging N. Papamarkos and A. Kesidis The inverse Hough transform K. S. Pedersen, A. Lee, and M. Nielsen The scale-space properties of natural images R. Shimizu, T. Ikuta, and Y. Takai Defocus image modulation processing in real time S. Shirai CRT gun design methods A. S. Skapin The use of optical and scanning electron microscopy in the study of ancient pigments T. Soma Focus-deflection systems and their applications P. Sussner and M. E. Valle Fuzzy morphological associative memories V. Syrovoy Theory of dense charged particle beams I. Talmon Study of complex fluids by transmission electron microscopy
xx
Future Contributions
M. E. Testorf and M. Fiddy Imaging from scattered electromagnetic fields, investigations into an unsolved problem E. Twerdowski Defocused acoustic transmission microscopy Y. Uchikawa Electron gun optics Z. Umul The boundary diffraction wave V. Velisavljevic and M. Vetterli (vol. 161) Space-frequence quantization using directionlets M. H. F. Wilkinson and G. Ouzounis (vol. 161) Second generation connectivity and attribute filters E. Wolf History and a recent development in the theory of reconstruction of crystalline solids from X-ray diffraction experiments
Chapter
1 Gamut Mapping Zofia Baran´czuk,*,† Joachim Giesen,† Klaus Simon,* and Peter Zolliker*
Contents
1. Introduction 1.1. Scope of this Chapter 2. Fundamentals of Color Science 2.1. Human Vision 2.2. Color Spaces 2.3. RGB and XYZ Color Spaces 2.4. CIELAB Color Space 2.5. Color and Image Appearance 3. Gamut Mapping in the Image Reproduction Process 3.1. Model of Visual Image Perception in the Reproduction Environment 3.2. Color Management 4. Psychometrics 4.1. Test Methods 4.2. Data Evaluation 5. Image Quality Measures 5.1. Mean Square Error 5.2. The Measure QDE 5.3. Laplacian Mean Square Error 5.4. The Measure QDLC 5.5. Structural Similarity Index 5.6. Comparing Models 6. Gamut Mapping as an Optimization Problem 6.1. Abstract Optimization Problem 6.2. Concrete Objective Functions and Constraints
2 4 4 5 5 6 7 8 9 9 12 14 14 16 21 22 22 22 23 23 24 25 26 27
* Swiss Federal Laboratory for Materials Testing and Research (EMPA), Du¨bendorf, Switzerland {
Friedrich-Schiller-Universita¨t, Jena, Germany
Advances in Imaging and Electron Physics, Volume 160, ISSN 1076-5670, DOI: 10.1016/S1076-5670(10)60001-8. Copyright # 2010 Elsevier Inc. All rights reserved.
1
2
Zofia Baran´czuk et al.
7. Conclusion and Future Work References
31 32
1. INTRODUCTION The visual system is the dominant component of human perception; an important part of this system is its ability to discriminate colors. Consequently, visual information in the form of color images is a natural part of modern communication, particularly on the Internet. In modern electronic media, information is usually processed (i.e., captured, stored, transformed, and presented) as digital data. As a consequence, such data require a physical device to become visible. Physical devices have restricted color reproduction capabilities, and color scientists refer to the set of colors that a device or a process can present (monitor), reproduce (printer), capture (camera), or store (computer) as the color gamut of the device. Because of device limitations, a printer is typically not able to reproduce all the colors visible on a display screen. This is illustrated in Figure 1, which shows an original standard red-
FIGURE 1 Demonstration of in-gamut colors for typical printing gamuts. Colors not in gamut are left white: Original sRGB (a), photo paper (b), coated offset paper (c), and newspaper (d).
Gamut Mapping
3
green-blue (sRGB) image and its reproduction by different printer gamuts where all not-reproducible (out-of-gamut) colors are replaced by white. The transformation of color data from one device specification to another one is called gamut mapping. Typically, the transformation is from an input to an output device specification. In traditional photomechanical reproduction, gamut mapping is implicitly given by the physical behavior of the devices. However, desktop publishing and digital reproduction changed the situation fundamentally as the input is now given in digital form, namely, a specification by device-independent color coordinates. Gamut mapping is then realized as a colorimetric function in a device-independent color space. Traditionally, only the final result of gamut mapping has been evaluated by the user, whereas in a completely digital process it is possible to control and evaluate gamut mappings colorimetrically. Gamut mapping is no longer physically determined but must be described mathematically. Consequently, the International Color Consortium (ICC) color management system (ICC-CMS) was introduced in 1993.* This now-dominant software standard describes how to translate device-dependent color coordinates (e.g., RAW-RGB, cyan-magenta-yellow-black [CMYK]) to device-independent color spaces (like CIEXYZ or CIELAB) and vice versa. The basic tools are interpolation tables, known as lookup tables (LUTs), which are stored in device characterization files (profiles). This approach works well for all colors reproducible by a device but out-ofgamut colors are neglected. Over the past two decades, gamut-mapping research evolved from finding the best treatment of out-of-gamut colors to searching for the optimal image-dependent mapping in specific environments. Optimality is defined here by visual image-quality measures that should keep the mapped image as similar as possible to the original. However, no simple definitions exist to describe similarity or image quality, and thus the definition of the objectives of gamut mapping is itself a challenge. This can be addressed by experimental methods assessing human preferences in psychovisual tests. Human perception can be roughly modeled by perceptual attributes such as lightness, saturation (colorfulness), hue, sharpness, and details (contrast). In his book on psychometric scaling, Engeldrum (2000) calls these attributes ‘‘Customer Perception-Nesses.’’ To derive an image-quality measure for optimizing gamut-mapping algorithms, good approximations of these attributes by physical image parameters (also called objective measures) are needed. Furthermore, an appropriate relative weighting of the attributes is important. Consider the following example: Map each in-gamut color onto itself and out-of-gamut color to its closest in-gamut color (using an appropriate * See the ICC website at www.color.org.
4
Zofia Baran´czuk et al.
FIGURE 2 The effect of different gamut mapping strategies: original (a), linear compression (b), and clipping (c).
color space metric). This approach minimizes the average color difference of all pixels in an image. However, colors are always seen in their spatial neighborhood as described by perceptual attributes such as sharpness and contrast, which are not covered by the colorfulness attribute. Hence, the mapped image need not be optimal with respect to those attributes, as illustrated in Figure 2. When the perceived color distances are minimized (i.e., colorfulness is maximized), image details are lost in color-saturated regions. Conversely, details can be preserved by using methods such as linear compression, but then the mapped image usually is desaturated. In general, attributes such as detail and colorfulness cannot be optimally preserved simultaneously. Hence, there is always a trade-off between the different attributes involved; thus, the design of an optimal gamutmapping algorithm needs (1) good psychovisual test data, (2) appropriate image-quality measures, and (3) a versatile gamut-mapping framework allowing multi-attribute optimization.
1.1. Scope of this Chapter A complete overview of the still-active field of gamut mapping is beyond the scope of this chapter. For a general treatment of the topic, see the book on color gamut mapping by Morovic (2008). The goal of this chapter is to provide an overview of gamut mapping considered as an optimization problem. This chapter is structured as follows: In the next section, we review the basics of color science relevant for the subsequent topics. Section 3 describes gamut mapping as part of image reproduction workflow. Section 4 discusses current psychovisual test methods. Section 5 summarizes some image-quality measures relevant for gamut mapping. Finally, in Section 6 we discuss the optimization objectives addressed by popular gamut-mapping algorithms.
2. FUNDAMENTALS OF COLOR SCIENCE This section reviews some facts about human vision, color specification, and color spaces.
Gamut Mapping
5
2.1. Human Vision First, color is not a physical parameter but a sensation (e.g., see Schmidt et al., 2005; Sekuler and Blake, 2002; Wandell, 1994). Derivation of a quantitative color specification requires the correlation of a perceived light stimulus intensity with the magnitude of its sensation. Some aspects, in particular lightness, follow general principles, such as Weber– Fechner’s law or Steven’s power function, but the most interesting one, namely, chroma, does not. Additionally, human sensation is strongly dependent on the actual viewing conditions. For that reason, it is not surprising that the development of device-independent color spaces is a tedious process that cannot be considered completed. However, before going into detail we briefly discuss the retina and its influence on vision. The retina includes several layers of neural cells, especially the photoreceptors, rods, and cones. The rods contribute mostly to vision at low luminance levels (i.e., less than 1 cd/m2), whereas the cones serve vision at higher levels. The ability of photoreceptor cells to adapt their visual sensitivity to the luminance level of the considered scene is called dark adaptation. Roughly speaking, humans differentiate between day and night, which may involve luminance ratios of a factor of 10,000 or more. Consequently, color coordinates usually do not involve physical dimensions. Actually, human vision discriminates relative differences in stimulus intensity. Because there is a lower bound of 1% to 2% for this ability, humans can distinguish only between 50 and 100 grey levels in one scene. Hence, an 8-bit encoding of color coordinates is sufficient.
2.2. Color Spaces The cones can be subdivided into three types: L (long-wavelength), M (middle-wavelength), and S (short-wavelength) according to their peak of spectral responses. Their differences in spectral sensitivity are the basis of color vision and can be modeled by RGB color spaces. But this trichromatic approach of color vision (see Maxwell, 1855/57; von Helmholtz, 1855; Young, 1802), cannot sufficiently explain effects such as opponent colors and perceived color differences. To understand these effects, the neuronal structure of retina cells that are organized into receptive fields needs to be considered. The input signals from the photoreceptor cells are integrated, concentrated, and modified in several neuronal processes and, finally, transmitted to the brain. The resulting output signal contains a luminance, a red-green, and a yellow-blue component. A color space that reflects this representation is the CIELAB space. The following describes the aforementioned color spaces in more detail. Basic to their description are quantitative color descriptions addressed in colorimetry. Colorimetry is the branch of color science,
6
Zofia Baran´czuk et al.
which quantifies and describes physically human color perception (Wyszecki and Stiles, 1982, p. 117), specifying numerically the colors of physically defined visual stimuli in fixed viewing conditions.
2.3. RGB and XYZ Color Spaces Grassmann (1853) observed that empirical color matchings satisfy a mathematical structure he introduced himself some years earlier (Grassmann, 1844). This structure is today known as a vector space. In case of color we observe a three-dimensional (3D) vector space; accordingly, colors can be specified as tristimulus values. There is a strong correlation between the mathematical structure and the underlying physics of light stimuli. A vector can be understood as light source, its length as intensity, and the addition of vectors as the physical mixture of the corresponding light sources. For fixed viewing conditions, this approach was carefully tested and documented by Wright (1928-29) and Guild (1931), respectively. This resulted in the introduction of the standardized color spaces CIERGB and CIEXYZ (see CIE, 1932, for short RGB and XYZ). The primary colors—red, green, and blue—are defined as colors of monochromatic light at wavelengths of 435.1nm (B), 546.1nm (G), and 700 nm, (R). The tristimulus values of monochromatic light at a given wavelength l have been determined and documented as the color-matching functions r (l), g (l), and b(l). These functions allow the calculation of the tristimulus value R, G, and B of an arbitrary light stimulus with given spectral power distribution F(l) as ð ð ð R ¼ k FðlÞr ðlÞdl; G ¼ k FðlÞ g ðlÞdl; B ¼ k FðlÞbðlÞdl; (1.1) where k means a normalizing constant. More popular than RGB is XYZ, which is mathematically derived from RGB by a change of the vector space base (i.e., by a base transformation matrix): 0 0 1 10 1 r ðlÞ x ðlÞ 0:490 0:310 0:200 @ y ðlÞ A ¼ 5:6508@ 0:177 0:812 0:011 A@ g ðlÞ A: (1.2) bðlÞ z ðlÞ 0:000 0:010 0:990 The choice of the XYZ base seems arbitrary but optimizes some technical constraints, for instance, the Y coordinate is identical to the CIE spectral luminance efficiency function also known as photopic luminance efficiency function V(l). Many other well-known RGB color spaces, such as sRGB, Adobe 98-RGB, and ECI-RGB, are also derived from RGB and should be understood as mathematically equivalent. Contrary to these color spaces,
Gamut Mapping
7
CMYK typically denotes a device-dependent specification describing the amount of ink (cyan, magenta, yellow, and black) placed in a raster cell. Because of the subtractive interaction of C, M, Y, and K inside the raster cell, a modeling of the resulting color is much more complex. The CIEXYZ system represents the average ability of humans to discriminate colors under specific viewing conditions; hence it is sometimes also called standard or normal observer.
2.4. CIELAB Color Space Unfortunately, the Euclidean distance in XYZ space does not match the perceived color distances, and thus XYZ is not well suited for gamut mapping. Therefore, two color spaces, CIELUV and CIELAB, have been recommended by the CIE (1978). Approximate correlates with the perceived lightness, chroma, and hue of a stimulus can be easily derived from their coordinates. Although originally both spaces were recommended, CIELAB is almost universally used today, in particular for color measurements. In CIELAB the psychometric lightness L* is defined as 8 sffiffiffiffiffiffi > Y Y > > 116 3 1 16 for 0:008856 > < Y0 Y0 def ¼ (1.3) L ¼ L ðYÞ Y Y > > > for 0 0:008856; 903:29 > : Y0 Y0 where (X0, Y0, Z0) represents the reference white. This definition agrees with Stevens power function and roughly with the Weber–Fechner-law. Then CIELAB contains the a* (red-green) and b* (yellow-blue) coordinates: X Y f (1.4) a ¼ 500 f X0 Y0 Y Z f ; (1.5) b ¼ 200 f Y0 Z0 where ffiffiffiffi 8p 3 < w 16 f ðwÞ ¼ : 7:787 w þ 116
for w > 0:008856 otherwise:
(1.6)
According to the uniformity of CIELAB, color differences are understood as Euclidean distances and denoted as DEx, where the index x indicates the underlying color space. Over the years, some improvements for color differences have been introduced by the CIE, especially DE94
8
Zofia Baran´czuk et al.
(cf. CIE, 1995) and DE00 (cf. CIE, 2001), which are modifications of DELab, in addition to stricter specifications of the viewing conditions. In CIELAB color space we can visualize the difference between source and destination gamut quantitatively. Figure 3 shows typical printing gamuts as colored objects compared with a standard sRGB source gamut (as is typical for a monitor).
2.5. Color and Image Appearance Because viewing conditions play an important role, appropriate models are necessary particularly when colors or images are compared under different conditions. Fairchild (1998) explains this in his textbook: ‘‘Color appearance models aim to extend basic colorimetry to the level specifying the perceived color of stimuli in a wide variety of viewing conditions’’ (p.1). A first step toward a color appearance space is the compensation of the reference white in the CIELAB color space. This allows a first-order compensation for the adaption of the human viewing system to the lighting condition. However, the CIELAB model lacks compensation of other viewing parameters such as surround conditions. It also has some limitations in its color difference metric. These shortcomings led to the development of new color appearance models, such as CIECAM97 and CIECAM02 (cf. CIE, 2004b). The CIECAM02 model allows the calculation of dependences for the six technically defined dimensions of color appearance: brightness, lightness, colorfulness, chroma, saturation, and hue. The model’s parameters compensate for the relevant influence of surround conditions:
FIGURE 3 Typical printer gamuts (colored objects) compared with a standard sRGB gamut (wire frame: newspaper gamut [left], coated offset paper [middle], inkjet printer [right].
Gamut Mapping
9
A factor F determining the degree of adaptation of the human eye to the
lightness
A parameter c compensating for the impact of the surround A factor Nc, the chromatic induction factor.
These parameters are defined for a set of typical viewing conditions as follows: Average for viewing surface colors, Dim for viewing television, and Dark for using a projector in a dark room. One major shortcoming of color appearance models is that they do not directly account for spatial and temporal properties of human vision. They basically treat each pixel in an image as a completely independent stimulus. Thus a new class of models, termed image appearance models, has been developed and is still under development. An interesting approach including spatial aspects of image appearance is the Retinex model introduced by Land (1986). Retinex models are used quite successfully in computer vision; however, they do not accurately model human color perception (Hurlbert et al., 2002). Recently a more sophisticated model, the image color appearance model (iCAM) framework, was proposed by Fairchild and Johnson (2004). Their basic idea is extracting image appearance components from four different images: (1) a high-frequency color image, (2) a low-frequency color image, (3) a low-frequency image grey scale, and (4) a low-frequency image within its surrounding. As illustrated later in this chapter, a better understanding of image appearance is probably the key for the development of gamut-mapping algorithms beyond pixel-for-pixel mappings.
3. GAMUT MAPPING IN THE IMAGE REPRODUCTION PROCESS 3.1. Model of Visual Image Perception in the Reproduction Environment Understanding the requirements of gamut mapping requires a model of the image reproduction process. More specifically, an image has to be matched visually with an image shown on a source device (e.g., liquid crystal display [LCD] monitor) in order to be printed on a destination device (e.g., an inkjet printer). Both devices have their own defined viewing conditions (lighting, viewing angle, and surround). A possible model is shown in Figure 4. The basic components of this model are (1) a characterization of the source and destination device, (2) appearance models for both viewing conditions, and (3) an appropriate metric for the visual image difference. The task of finding the (visually) best match can now be formulated as determining a digital output image that minimizes the perceived image
Model of image perception
Source device characterization
Source color measurements
Appearance model
Source viewing conditions Perceptual image difference measure
FIGURE 4
Destination color measurements
Destination viewing conditions
Destination device characterization
Appearance model
Model of image perception.
Minimize
Image perception situation
Gamut Mapping
11
difference. The achievable quality depends on (1) the accuracy of the involved model components and (2) practical restrictions in solving the minimization problem. Over the years several models have been developed for each of the three components. The following text summarizes important models for the components:
3.1.1. Device Characterization Devices are characterized either by physical models or empirical techniques based on a representative set of test patches. Bala (2003) reviews variety of solutions for monitors, scanner, digital cameras, and printers. For cathode ray tube (CRT) monitors physical models work quite well and are rather simple (Berns et al., 1993). For characterizing printers a few physical models have been developed, for example, the models of Kubelka and Munk (1931), Yule and Nielsen (1951), and Neugebauer (1937). However, in practice the accuracy of physical models, particularly for printing but also for non-CRT monitors, is usually in sufficient because the model does not cover all of the determining parameters. It therefore became common practice to use empirical techniques for device characterization—building LUTs (look up tables) based on measured test patches. This can be conveniently done with the data structures provided by the ICC color management.
3.1.2. Appearance Model If source and destination devices have similar appearance conditions, (e.g., if both images are simulated on a monitor), the choice of appearance model is of minor importance. For most applications, a color appearance model like CIECAM02 or an image appearance model like ICAM can be used.
3.1.3. Perceived Image Difference Measure The most direct method to measure the perceived image difference is by human judgment. Psychovisual testing, which is described in the next section, is a basic tool granting direct access to human judgment. This approach works well if there are only few image variations. For highly parameterized image optimization, psychovisual tests are not feasible and thus appropriate image-quality measures are needed. Such measures provide a good estimate of the perceived image differences. More details on such measures are provided in Section 5. In an image processing workflow derived from a model with components as shown in Figure 4 the processing order of the appearance model and the device characterization is reversed for the destination device (compared with the source device). However, the reversal is not straightforward. A number of problems need to be resolved; for example, in general not all colors are reproducible by the destination device (out-ofgamut colors). For out-of-gamut colors the device characterization is not
12
Zofia Baran´czuk et al.
invertible. Gamut mapping is basically a solution to this problem and the topic of this chapter. Another issue is that the inversion is overdetermined for destination devices with more than three colors. This problem can be handled by using a predefined color separation strategy. Furthermore, the spatial resolution of human vision is usually smaller than the resolution of the device. Thus, the direct inversion of image appearance models is problematic. The following text neglects this issue as it is usually neglected in today’s imaging workflows. An image reproduction workflow can now be modeled as shown in the top row of Figure 5. This model is, for example, realized in the ICC color management workflow that is briefly described in the following subsection.
3.2. Color Management In traditional workflows, the resulting image was more or less regarded as an original. However, the WYSIWYG (what-you-see-is-what-you-get) principle of desktop publishing, changed this point of view. In desktop publishing, every image presented on a display screen is considered a proof. Consequently, the common device-dependent color interpretation was no longer acceptable. Also, desktop publishing is often a distributed process with heavy data exchange over global and local area networks.
Source device characterization
Appearance model
Source color measurements
Source viewing conditions
Gamut mapping
Inverse appearance model
Inverse destination device characterization
Destination viewing conditions
Destination color measurements
Color separation
Destination image (e.g., CMYK)
Source image (e.g., RGB)
Model of image reproduction
ICC input profile Source “Appearance device model” characterization
Gamut mapping
Profile connection space (CIEXYZ CIELAB)
ICC output profile Inverse
Inverse destination Gamut Color “appearance device mapping separation model” characterization
Destination image (e.g., CMYK)
Source image (e.g., RGB)
Realization as ICC color management workflow
FIGURE 5 Model of image reproduction: general model (top) and realization in a ICC color management workflow (bottom).
Gamut Mapping
13
For this reason, color management needs to be based on device-independent color specifications that are organized as open software standards. In 1993 FOGRA, one of the well-known standardization organizations in the graphics industry, convinced leading computer companies to agree on a new software standard, now known as the ICC color management system (ICC-CMS). The ICC-CMS provides a universal transfer function from a device-dependent color specification to a device-independent color space (or vice versa) using a so-called profile connection space (PCS), which is either XYZ or CIELAB. The transfer function is then implemented through LUTs and interpolation. Aside from LUTs, ICC profiles contain data for device characterization and software maintenance. Profiles are usually generated by special software tools such as ProfileMaker (X-Rite, Grand Rapids, MI) which, in particular, evaluate measurements of the test charts and create the LUTs. In order to carry out the LUT interpolation, applications such as Photoshop pass ICC profiles together with the color data to a color management module (CMM) that usually is part of the operating system. It is important to note that ICCCMS always realizes a static mapping from one color space to another. A typical workflow using the ICC color management components is shown in the bottom row of Figure 5. Finally, we are ready to discuss the role of gamut mapping in the color management process. As previously mentioned, its purpose is to handle out-of-gamut colors. In ICC-CMS, gamut mapping is aligned with so-called rendering intents that offer several approaches to find a replacement for an out-of-gamut color. We can distinguish several rendering intents. The absolute colorimetric intent provides no specification for a replacement. This leaves the gamut mapping to the profile creator that usually implements clipping to the nearest reproducible color. The relative colorimetric intent differs from the absolute one just in an additional whitepoint adaption. From a user’s perspective the perceptual rendering intent is the typical case. Here a mapping of all colors, including in-gamut colors, is defined to retain the relationship between colors to obtain better visual image quality. However, in this case the specific mapping algorithm also is left to the profile software provider. Finally, the saturation intent is a variant of the perceptual rendering intent optimized for business graphics. The realization of gamut mapping in ICC-CMS has been criticized for the following reasons: 1. Gamut mapping in ICC-CMS is hidden from the user, which is inconsistent with its accountability for applying a profile. 2. In early versions of ICC color management, gamut mapping was expected to be handled in the output profile. This drawback means that different color gamuts of input devices cannot be taken into account for the gamut mapping. In version 4 of the ICC Profile
14
Zofia Baran´czuk et al.
definition (ICC.1:2003:9, 2003), a reference media gamut has been introduced for perceptual rendering. This allows the gamut mapping to be divided into two parts, first mapping the input data to the reference media gamut, and second mapping from the reference media gamut to the device gamut (see Figure 5, bottom row). 3. From its structure ICC-CMS allows only static pixel-to-pixel gamut mappings between color spaces and therefore has limited visual performance. Alternatives based on image appearance can achieve better visual ratings. 4. From a user’s point of view, gamut mapping and the creation of device-dependent CMYK output data (i.e., transformation to device colors and color separation) should be disassociated. This offers the potential for improved gamut-mapping concepts. The resulting images can be reintegrated in an ICC workflow using the absolute rendering intent.
4. PSYCHOMETRICS 4.1. Test Methods Psychovisual tests are used to subjectively evaluate the quality of images, such as the quality of images transformed by gamut-mapping algorithms. A variety of methods are available to carry out such tests. The aim of the tests is to compare images with respect to perceived quality. Often an interval scale for the images is computed from these comparisons, where a scale value is a measure for the quality of an image. The CIE guidelines for testing gamut-mapping algorithms (CIE, 2004a) provide specific experimental methods, viewing conditions, and reference algorithms. Three kinds of psychovisual tests are recommended for evaluating the quality of gamut-mapping algorithms: pair comparison, rank order, and category judgement. The most widely used method is pair comparison. It is also the easiest for the observers, especially if the differences between the images are small. However, the other methods are also useful for specific problems; in particular, they are typically less time consuming than pair comparisons. The following text briefly describes the different methods.
4.1.1. Pair Comparisons A pair of images from a set A is presented to an observer, who is then asked to choose the one that better fulfills instructions of the test. In the gamut-mapping case, the instructions usually state that the observer should choose the more aesthetic image or the image more similar to the original. In the second case, the original image is shown along with the transformed images. Pair comparison tests have been used, for example, in Farup et al. (2004) or Bonnier et al. (2006). Typically, the data are stored
Gamut Mapping
15
in a frequency matrix F ¼ ( fij), where fij is the number of comparisons where i was preferred over j (i j).
4.1.2. Rank Order An observer is asked to rank the images from the best to the worst according to an attribute defined in the instructions (e.g., similarity to the original or aesthetics of the image). Each ordering immediately provides n(n 1)/2 pair comparisons. The data gathered in a rank order test can also be stored in a frequency matrix. Alternative evaluation methods for rank order data are computing average ranks (Engeldrum, 2000) or distance-based models as described and applied by Millen et al. (2001).
4.1.3. Category Judgment In a category judgment test, observers are asked to assign images to categories. Categories can have descriptive names or just numbers. If the quality of images is considered, category names can be, for example, 5. Excellent, 4. Good, 3. Fair, 2. Poor, and 1. Bad. For example, category judgments have been used in Morovic and Yang (2003).
4.1.4. Media Psychovisual tests can be performed using different media. The typical situation compares an original image on a monitor with printed images using two different gamut-mapping algorithms. Another common technique shows both the original and the mapped images on a monitor. A necessary condition for this method is that the printer gamut is a subset of the monitor gamut. Both methods have advantages and disadvantages. The first method is closer to the targeted application; however, original and mapped images are seen under different viewing conditions, for which appropriate color appearance models need to be known. In the second method, all images are seen under the same viewing condition; thus, color appearance models are less important. Furthermore, the production of a large series of test images is much simpler.
4.1.5. Environment The CIE guidelines recommend fixed viewing conditions for a medium. In the case of a monitor, the number of observers can easily be increased by conducting the test on the Internet. This environment does not afford guaranteed fixed viewing conditions. However, a recent study (Sprow et al., 2009) has shown that gamut-mapping studies using the Internet can yield similar results as studies in a controlled laboratory environment. Internet studies can be an interesting alternative, especially if a large number of observers is required.
16
Zofia Baran´czuk et al.
4.2. Data Evaluation The data gathered in a psychovisual test need to be processed to derive an interval scale for the considered gamut-mapping algorithms. We briefly discuss how to process data gathered in pair comparison tests. For onedimensional problems (an unstructured set of distinct gamut-mapping algorithms), we cover the most frequently used data evaluation method, namely Thurstone’s law of comparative judgment (Case V). Parameterized gamut-mapping algorithms and their analysis in psychovisual tests recently became a focus of research. We review two methods to analyze multidimensional (parameterized) data: multidimensional scaling using just-noticeable differences (JNDs) and conjoint analysis as a generalization of Thurstone’s method.
4.2.1. Thurstone’s Law of Comparative Judgment Thurstone’s law of comparative judgment is a method used to evaluate data obtained in a pair comparison test. Rank order experimental data also can be transformed to pair comparison data and then evaluated by this method. Given is a set A of n stimuli, for example, images mapped with different gamut-mapping algorithms and choice data in the form of a frequency matrix F. Let pij represent the proportion that stimulus i is f
ij preferred over stimulus j, pij ¼ fij þf . This proportion is an indirect measure ji
for the distance of the scale values Si and Sj of items i and j, respectively. S is a vector of scale values. The perceived value of stimulus i is modeled as ui ¼ Si þ ei, where the error ei is a random variable following a Gaussian distribution with mean value 0. It is assumed that for every i, ei follows the same normal distribution N(0, s2), that is, si ¼ s. Since the preferred image is the one with the higher perceived value, the probability of preferring stimulus i over stimulus j is given as P½ui uj ¼ P½Si þ ei Sj þ ej ¼ P½Si Sj ei ej : Since both ei and ej are N(0, s2) distributed and are assumed to be independent, their difference is N(0, 2s2) distributed. Hence, Si Sj ; P½ui uj ¼ P½ei ej Si Sj ¼ F pffiffiffi 2s where F is the cumulative distribution function of the standard normal distribution. By applying F1 to the both sides of the above equation, we get pffiffiffi Si Sj ¼ 2sF1 ðP½ui uj Þ:
17
Gamut Mapping
Using the proportion pij to approximate (P[ui uj]), we construct the matrix Z with pffiffiffi zij ¼ 2sF1 ðpij Þ: (1.7) We assume that the entries zij of the Z matrix are noisy measurements of the true differences Si Sj. If we assume that the noise term e is independent of the pair (i,j), then we get the system of equations Si Sj ¼ zij þ e for i, j ¼ 1 . . . n, i ¼ 6 j. Mosteller (1951a) has already realized that the least squares solution of the system of equations for the scale values Si can be determined by averaging the columns Si ¼
1X zij : n j
(1.8)
In order to fix the arbitrary offset of the scale values, the sum of all scale values Si is assumed to be zero.
4.2.1.1. A generalization of Thurstone’s Method The classical solution of Thurstone’s method assumes that all the zij are known and have approximately the same error. However, zij is not always available for all considered pairs. A solution for the problem with an incomplete matrix has been proposed by Morrisey (1955). The least squares solution can be determined from the set of linear equations Si Sj ¼ zij for every available zij. This system form as z ¼ XS, where 2 3 2 1 1 0 z1;2 6 z1;3 7 61 0 1 6 7 6 6 ⋮ 7 6 6 7 6 6 z1;n 7 61 0 0 6 7 ; x¼6 z¼6 7 6 0 1 1 z 6 2;3 7 6 6 ⋮ 7 6 6 7 6 4 zn1;n 5 40 0 0 0 1 1 1
of equations can be written in matrix 0 0 ⋮ 0 0 ⋮ 0 1
3 0 07 7 7 7 . . . 0 1 7 7; ... 0 07 7 7 7 . . . 1 1 5 ... 1 1 ... ...
0 0
2
3 S1 and S ¼ 4 ⋮ 5: Sn
If the rank of the matrix X is n, we can find the least squares error solution for this smaller set of equations and compute the vector of scale values as S ¼ ðXT XÞ1 XT z:
18
Zofia Baran´czuk et al.
4.2.2. Multi-Attribute Just-Noticeable Differences A just-noticeable difference is the smallest detectable difference between two levels of a particular sensory stimulus. Here we consider JND in quality—we consider the smallest differences that have an effect on the perceived image quality (but not the differences that are noticeable, though their impact on quality is unclear). We now consider a multi-attribute setting with several independent sensory stimuli. The question is how to combine the perceived effect of the different stimuli. Keelan (2002) has introduced a multi-attribute formalism to combine the effects of different stimuli whose main two constraints are as follows: (1) if all JNDs are small, then the effect of the different stimuli just adds up, and (2) when at least one attribute’s (stimulus) JND is large relative to the other JNDs, then the stimuli with small JNDs have little impact on the perceived effect. To satisfy these constraints, Keelan proposes to combine effects using the variable-power Minkowski metric, which is the nth root of the sum of nth powers of differences in each attribute. In the (image) quality context, that is, DQm ¼
X
ðDQi Þnm
n1
m
;
i
where nm is the power of the metric, DQi is the quality change induced by the ith attribute, and DQm is the perceived overall image quality. Keelan also proposes that nm should be a function of the DQi that is close to 1 if all the DQi are small, and larger if at least one of the DQi is large. Note that this formalism assumes that the attributes are scalable. In the image-quality setting, this means that the image with no distortions should have been an element in the multi-attribute space. Typically this is not the case in gamut-mapping tests, because the original image is usually far from the transformed images and the difference to the original cannot be expressed in terms of JNDs.
4.2.3. Multi-Attribute Analysis Using Conjoint Analysis Conjoint analysis is an alternative to multi-attribute JNDs for evaluating parameterized gamut-mapping algorithms. We assume that the (gamutmapping) algorithm has m parameters (attributes), where each parameter takes values in the parameter sets Al,. . .,Am. An algorithm is then given simply by a choice of values for every parameter, that is, it is an element in the Cartesian product Al . . . Am. Note that every algorithm corresponds to a mapped image, and preferential data collection for the algorithms can be done using the corresponding images. Also note that the total number of algorithms grows exponentially with the number of parameters and thus the standard Thurstone approach (or any other single-attribute technique) needs too many comparison.
Gamut Mapping
19
However, using the structure on the parameter space, we can estimate how much each parameter level contributes to the observers’ preferences. The high-level concept is to apply Thurstone’s method to every individual parameter, normalize the scale values across parameters, and finally sum the scale values for parameter values present in the tested algorithm. The normalization step is crucial. Normalization is based on the assumption that the scale values Sij (scale value for parameter i at value (level) j) for any parameter i are drawn from a normal distribution with variance s2i1 and 0 mean. The s2i1 values themselves are drawn from another normal distribution with mean 0 and variance s2i2 . Hence the Sij values are simply drawn from a Nð0; s2i1 þ s2i2 Þ distribution as a convolution of two normal distributions. To make the scale values for the different parameters i comparable, the distributions Nð0; s2i1 þ s2i2 Þ are assumed to be independent of i: These distributions are the same for all i. This assumption is quite reasonable since the scale values for the different parameters are all computed from the same database, namely, pair comparisons between algorithms. We now want to find a rescale value li, such that l2i ðs2i1 þ s2i2 Þ ¼ const: Without loss of generality we can assume that l2i ðs2i1 þ s2i2 Þ ¼ 1: While applying Thurstone’s method we had to fix value si2 arbitrarily, and set to 1 for all parameters i. In the following, we fix the parameter i and drop it from the index. We can estimate s1 from the scale values obtained by Thurstone’s method as follows: vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi uPni Pnj u i¼1 j¼1 ðSi Sj Þ2 fij s1 ¼ s2 t : P i Pnj 2 ni¼1 j¼1 fij As stated previously, s2 was assumed to be 1. Given that Si ¼ 0 and because of l2 ðs21 þ s22 Þ ¼ 1; we get 1 l ¼ sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi : Pn Pn 1þ
i i¼1
2
j
ðSi Sj Þ2 fij
Pni j¼1Pnj i¼1
f j¼1 ij
For the fixed parameter i we now rescale the values si1 ; . . . ; sini computed by Thurstone’s method by the estimated value of li(¼ l) to normalize them. The normalized values for the parameter levels are our partworths that we assume to contribute linearly to the qualityP of an algorithm, that is, the quality value of a stimulus ða1j1 ; . . . amjm Þ 2 A is m i¼1 li Sij , which is the sum of the part-worths of the parameter levels present in the stimulus.
20
Zofia Baran´czuk et al.
4.2.4. Error Analysis A good error estimation is important to gauge the statistical significance of differences between scale values as well as for statistical tests. Different methods can be used to estimate errors. Here we describe two of them: analytical error estimation and experimental error estimation.
4.2.4.1. Analytic Error Estimation Here we demonstrate only the analytic error estimation for Thurstone’s method. Corresponding error calculations can be derived also for the other data evaluation methods. The basic approach is to estimate the error in the image choice process and then propagate the error through the data evaluation steps. The process of choosing one image from the pair of images can be modeled as a Bernoulli trial. The standard deviation for a Bernoulli variable in n trials is given by rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi pij ð1 pij Þ ; spij ¼ n where n is the number of comparisons between two stimuli and p is the relative number of times that the first stimulus was preferred over the second one. We propagate the error using equation (1.7) to compute the errors of the entries Zij in the Z matrix: pffiffiffi d 1 szij ¼ spij 2 F ðpij Þ: dpij Using equation (1.8) the errors of the scale values Si are computed as sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi X 1 s2 : sSi ¼ n j2A;i6¼j zij In practice, it is typically sufficient to assume that the probabilities pij qffiffiffiffiffi 1 are not far from 1/2. Then their standard deviation is spij const ¼ 4N and error of the Z-matrix elements zij can be approximated as rffiffiffiffiffiffiffi rffiffiffiffi 1 pffiffiffi d 1 p 2 F ð0:5Þ ¼ szij
4N dpij N and the error for the scale values Si as rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 1 pðn 1Þ : ss
n N
Gamut Mapping
21
4.2.4.2. Experimental Error Analysis The theoretical error estimations rely on several model assumptions, in particular, independence of the choices. One error estimation method based on a minimum of assumptions samples the error by dividing the choice data randomly into two groups. For both groups the scale values are individually computed and errors are estimated from the differences of the values obtained from both groups. This process is repeated several times and the results are averaged to increase the accuracy of the error estimation. An extension of this method is to divide individual observers (or individual images) into two groups. Error estimation using such biased samplings allows testing whether the choices depend on individual observers (or images).
4.2.5. Testing Model Assumptions In evaluating the data using different methods, we made some assumptions. For example, in Thurstone’s method it was assumed that the parameters are not correlated and that the variances for all parameters are equal. These assumptions are usually tested using Mosteller’s test (Engeldrum, 2000; Mosteller, 1951b). The main assumption in conjoint analysis is the additivity of parameters. A description of how to test this assumption can be found in Giesen et al. (2007).
5. IMAGE QUALITY MEASURES Conducting psychovisual experiments with observers is so far the most popular, but not the only, method for assessing image quality. Imagequality measures are an efficient alternative for comparing the quality of transformed images with an original image. The results do not always exactly match with observers preferences, but they are reliable enough to be treated as a reasonable approximation. A good example of the necessity of methods that require less effort than psychovisual tests is optimizing gamut-mapping algorithms on the basis of their performance on the produced images. This section reviews the most important image-quality measures. In the following description of image-quality measures we always compare two images X and Y with n m pixels. xij e X and yij e Y are color coordinates in both images. If not stated otherwise, we do not distinguish in our notation between a pixel and the color coordinate considered at this pixel. Apart from DE, all measures described below use a one-dimensional (1D) color description, and it is not always clear how to generalize them to three color coordinates. Primarily, one uses just one color coordinate (usually luminance) or combines the different coordinates into a single value. In Baran´czuk et al. (2009) the luminance coordinate was used and provided good results.
22
Zofia Baran´czuk et al.
5.1. Mean Square Error A well-known and intuitive measure is the mean square error (MSE). The MSE is simply the squared pointwise difference between the images X and Y. The corresponding image quality measure QMSE is defined as QMSE ðX; YÞ ¼
n X m 1 X ðxij yij Þ2 ; nm i¼1 j¼1
where xij and yij are the lightness coordinates (e.g., L* in the CIELAB color space) for the points in images X and Y, respectively. This measure is not very accurate in predicting image quality, but this measure, as well as other pointwise measures, reflects the main assumption of gamut-mapping algorithms—mapped colors should be close to the original ones.
5.2. The Measure QDE DE is another pointwise distance measure, similar to the MSE. The difference is that it is computed in a 3D color space instead of a 1D projection. It is defined as the Euclidean distance in CIELAB color space between corresponding pixels in two images. That is, locally at a pixel x 2 X and the corresponding pixel y 2 Y, the DE distance is defined as qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi DEðx; yÞ ¼ ððLx Ly Þ2 þ ðax ay Þ2 þ ðbx by Þ2 Þ: As our image-quality measure QDE we take the average DE over the pixels of the two images, that is, QDE ðX; YÞ ¼
n X m 1 X DEðxij ; yij Þ: nm i¼1 j¼1
DE is a popular image-quality measure because it is easy to compute and has a natural interpretation. In principle, it could be replaced by any more sophisticated color distance measure such as CIECAM02 (CIE, 2004b; Moroney et al., 2002) or DE94 (CIE, 1995; McDonald and Smith, 1995).
5.3. Laplacian Mean Square Error An important attribute of image quality is local contrast. Evaluating local contrast is not included in any pointwise measure. The basic approach of evaluating image local contrast is the Laplacian mean square error (LMSE). The LMSE (see Eskicioglu and Fisher, 1995), is a local measure for the difference in two images. We compute the following quantities at each pixel (with indices 2 i n 1 and 2 j m 1) of X and Y, respectively:
Gamut Mapping
23
Lðxij Þ ¼ xðiþ1Þj þ xði1Þj þ xið jþ1Þ þ xið j1Þ 4xij and Lðyij Þ ¼ yðiþ1Þj þ yði1Þj þ yið jþ1Þ þ yið j1Þ 4yij : The image-quality measure QLMSE is then defined as QLMSE ðX; YÞ ¼
n1 m1 X X 1 ðLðxij Þ Lðyij ÞÞ2 : ðn 2Þðm 2Þ i¼2 j¼2
Another measure considering a local contrast in images is DLC.
5.4. The Measure QDLC The image-quality measure QDLC is based on a local contrast measure. Here the Michelson contrast (see Michelson, 1927) is used as a measure of local contrast. We compute it on a k k patch PX X of the image X as follows: LCðPX Þ ¼
xmax xmin ; xmax þ xmin
where x is a luminance coordinate in XYZ color space (at pixel x 2 PX), and xmax and xmin are the highest value and the lowest value, respectively, of this intensity on the patch PX. Analogously, we can compute the value LC(PY) for the corresponding patch PY in image Y, and define DLCðPX ; PY Þ ¼ LCðPX Þ LCðPY Þ: The image-quality measure QDLC(X, Y) is then finally defined as the measure DLC averaged over all possible k k patches in images X and Y.
5.5. Structural Similarity Index The structural similarity (SSIM) index was introduced by Wang and Bovik (2002) and is defined on quadratic image patches of size k k at the same location within image X and Y. Let PX X be such a patch and PY the corresponding patch for Y. We compute the following quantities for the patches: X X ¼ 1 P x; 2 k x2P X
s PX 2 ¼
k2
X Y ¼ 1 P y; 2 k y2P Y
1 X X Þ2 ; ðx P 1 x2P X
24
Zofia Baran´czuk et al.
sP Y 2 ¼
k2
1 X Y Þ2 ; ðy P 1 y2P Y
and k 1 X X Þðyi P Y Þ: ðxi P 2 k 1 i¼1 2
s PX PY ¼
The SSIM index is then defined as SSIMðPX ; PY Þ ¼
XP Y þ c1 Þð2sP P þ c2 Þ ð2P X Y ; 2 2 2 ðP þ P þ c1 Þðs þ s2 þ c2 Þ X
Y
PX
PY
with two constants c1 and c2. The SSIM index is a product of three factors that describe a linear correlation between both images, consistency of mean luminances, and consistency of contrasts in both images. Hence, optimizing SSIM means optimizing these three factors in the image. From the SSIM index the image-quality measure QSSIM(X, Y) can be defined as the SSIM index averaged over all possible k k patches in the images X and Y. The resulting measure is in the range [1, 1], and the higher the QSSIM value, the more similar are the compared images.
5.6. Comparing Models The previous sections described the most popular methods used to evaluate gamut-mapping algorithms, either with psychovisual tests or using image-quality measures. Each model has a number of parameters. To find the optimal model, a method to compare models is needed—a sort of benchmark. For instance, in Thurstone’s method Gaussian distributions are used for modeling the preferential choice process. We could test different distributions and choose the one that gives the best model. Here we review hit rates and cross-validation as a quite general technique to compare different methods for evaluating gamut-mapping algorithms.
5.6.1. Hit Rate The hit rate is the percentage of correctly predicted choices. In the case of pair comparisons, observers choose the image that they assume to be the better one from each pair. According to the model that is used to describe the comparisons data, one of the images in each pair must at least as good as the other. Now if the observer’s decision was the same as the prediction obtained from the model, we call the prediction correct. If the model’s prediction was different than the observer’s choice, we call this prediction wrong. Assuming that the number of correct predictions is c and number
Gamut Mapping
25
c of wrong ones is w, then the hit rate h is defined as h ¼ cþw . We can compare hit rates for different models and assume that the one with the higher hit rate describes the given dataset better.
5.6.2. Cross-Validation The same data should not be used to derive the model and test it. This problem is circumvented in cross-validation: The dataset is randomly divided into k subsets. Then one of these subsets is chosen as the test set, and the remaining subsets form the training set. The training set is used to derive the model. Then the hit rate can be computed on the test set using the model. This procedure can be repeated k times, using each of k sets as the test set once. This allows determination of k hit rates, from which the mean hit rate is computed. The results can be improved further by repeating the entire procedure several times to get less noisy results.
5.6.2.1. Example: Conjoint Analysis For example, in the approach toward conjoint analysis (as described here) an important part is rescaling of partworths obtained for different parameters. The rescaling method that we have described builds on strong model assumptions. An alternative is to optimize the scalings using hit rates. 5.6.2.2. Example: Individualization In general, a first evaluation uses the entire dataset, and thus the results are averaged over all images and observers. However, different observers may not have the same preferences and the different mapping algorithms may not have have the same impact on the individual images. In that case, individualized models that treat every image or every observer independently may be superior. To derive meaningful individualized models more data are necessary since the number of free parameters that need to be estimated is much larger. Typically, the data tend to be too sparse to compute personalized models directly. In that case, better results can be achieved by using a linear combination of a global and a personalized model. The best weight for the linear combination can be found by testing multiple values and choosing the one with the highest mean hit rate using cross-validation.
6. GAMUT MAPPING AS AN OPTIMIZATION PROBLEM Gamut mapping can be described as a constrained optimization problem. The objective is to find a mapped image that represents an original image as best as possible, that is, we want to minimize the perceived difference of a mapped image and the original image under the constraint of color gamut limitations. This section describes gamut mapping as an optimization problem in abstract terms. We then provide concrete examples that
26
Zofia Baran´czuk et al.
show that many of the existing gamut-mapping algorithms fit into the abstract framework.
6.1. Abstract Optimization Problem Conceptually, a gamut mapping is a function g : Xmn ! Ymn ; where X, Y R3 are 3D color spaces, and m n are image dimensions. That is, every x 2 Xmn is a color image with m n pixels, and xij, 1 i m, 1 j n is the color of the pixel at position (i, j). Let G be the space of all gamut mappings, that is, all mappings g : Xmn ! Ymn. Practically, gamut mapping means devising an algorithm to pick a good/optimal g 2 G. Here, optimality (also goodness) is measured with respect to certain objectives and constraints. Let us first introduce some definitions before we discuss concrete objectives and constraints.
6.1.1. Fixed Mappings
We will call a mapping gf 2 G a fixed mapping if and only if it is induced by the mappings f in the form f : X ! Y. That is, gf ¼ f mn : Xmn ! Ymn—that is, all colored pixels get mapped as prescribed by the mapping f independent of their position in the image.
6.1.2. Global Image Dependency Fixed mappings are very restrictive but are still used in practice because they are fairly easy to compute: Only a good mapping f : X ! Y needs to be defined/computed. A slightly more flexible class of mappings is defined as follows: Given an image x 2 Xmn, the color content C(x) of x consists of all colors that appear in x, that is,
CðxÞ ¼ c 2 X : 9ði; jÞ such that c ¼ xij : Finding a mapping g 2 G is now reduced to computing a mapping fC(x) : X ! Y and g is simply the induced mapping gfCðxÞ:
6.1.3. Local Image Dependency Even less restrictive but still computationally feasible are locally imagedependent gamut mappings. In locally image-dependent gamut mappings, the mapping of every colored pixel xij depends on the color content of a neighborhood N(i, j) {1,. . ., m} {1,. . ., n} of the pixel at (i, j)*. Let Cij ðxÞ ¼ fc 2 X : 9ðk; lÞ 2 Nði; jÞ such that c ¼ xkl g * Typical neighborhoods of a pixel at (i, j ) are {i k, i k þ 1, . . . , i þ k 1, i þ k} { j k, j k þ 1, . . . ,
j þ k 1, j þ k}.
Gamut Mapping
27
be the color content in the neighborhood of xij. Finding a mapping g 2 G is now reduced to computing mappings fcij : X ! Y and g is simply the induced mapping gfcij that maps xij to fcij ðxij Þ:
6.1.4. Family of Parametric Mappings Often it is efficient to restrict the class of allowed mappings to a family of parametric mappings. Then, instead of optimizing over a huge space of possible mappings, the problem is reduced to optimize a few parameters. Of course, the gain in efficiency comes at the price that the theoretically possible optimum can only be approximated, and the approximation quality depends on the set parameters.
6.1.5. Constraints One constraint is mandatory for gamut mapping: All mapped colors must be in the destination gamut. Here we assume that Y is the destination gamut (this constraint is not mentioned explicitly hereafter). In practice, it is necessary to consider additional constraints such as continuity. Sometimes constraints can be incorporated in the objective function but often they must be defined explicitly.
6.1.6. Continuity Note that the notion of continuity assumes that we have a topology on Xmn and Ymn. In case of perceptually uniform color spaces X and Y, this would be the topology induced by the Euclidean metric on R3 and its subsets. In the following, we always assume that the topology on X is induced by some metric dX on R3 and that Xmn carries the product topology, analogously for Y and Ymn. For induced mappings gf continuity boils down to f : X ! Y being continuous. In general, we call g 2 G locally Lipschitz-continuous if dY ðgðxÞij ; gðxÞkl Þ cdX ðxij ; xkl Þ for some constant c > 0 and some neighborhood of (i, j). Note that if g is induced by a continuous map f : X ! Y, then g is locally Lipschitz continuous.
6.2. Concrete Objective Functions and Constraints In the following we discuss how the abstract concepts from the previous subsection can be made concrete by defining objectives and constraints. Any image-quality measure is a suitable candidate for an objective function. In general, the objective function should consider the following aspects: 1. Minimizing the average perceived color difference, that is, DE 2. Avoiding gamut-mapping artefacts
28
Zofia Baran´czuk et al.
3. Conserving local image information 4. Avoiding the enhancement of image artefacts. Accommodating these aspects requires a metric d on X [ Y that allows comparison of the colors in X and Y. Such a metric is available, for example, when both the source gamut X and destination gamut Y are subsets of a common working color space, which we want to assume in the following.
6.2.1. Minimizing the Perceptual Distance Pixel-Wise A natural objective is to minimize the total distance of the colors in the source gamut to their images in the destination gamut. This objective function simply reads as minimizef
m X n X
dðxij ; yij Þ:
i¼1 j¼1
In a psychovisually equidistant color space (where d is the Euclidean metric), optimizing this objective function leads to minimum distance clipping. In our terminology, clipping is a fixed mapping; it does not depend on the local or even global color content in the source gamut (image). Several studies have shown that not all color directions are equally important for gamut mapping. The human eye is more sensitive to hue changes than saturation or lightness. This fact can be accounted for by introducing the constraint that hue must not be changed. We have then the following optimization problem: minimizef
m X n X
dðxij ; yij Þ
i¼1 j¼1
subject to hueðxij Þ ¼ hueðyij Þ: This constrained optimization problem is solved by the hue-preserving minimum distance algorithm (HPMinDE) (Morovic and Sun, 2001). It is one of the two recommended reference gamut-mapping algorithms in the CIE guidelines (CIE, 2004a). Another way to address the difference in the importance of the different color directions is to use a weighted distance measure d with different weights for L, C, and H as proposed by Nakauchi et al. (1999). This again results in an unconstrained optimization problem using the weighted distance measure d instead of the Euclidean as in clipping. As soon as hue preservation becomes an issue, hue in the working color space needs to be defined accurately. Unfortunately, this is only approximatively the case in CIELAB and in CIECAM02. Color spaces
Gamut Mapping
29
with improved hue definition are IPT (Ebner and Fairchild, 1998) and mLab (Marcu, 1998).
6.2.2. Minimizing the Perceived Distance under Detail Preservation and Continuity Constraints Clipping algorithms as described above all lack constraints to prevent loss of details (local image information) or gamut-mapping artefacts. Clipping can create artifacts by mapping widely separated color points to the same point in the destination gamut (Zolliker and Simon, 2006). This can be prevented by introducing an appropriate constraint, that is, minimizef
m X n X
dðxij ; yij Þ
i¼1 j¼1
subject to
c1 dðx; x0 Þ dðy; y0 Þ;
with a defined constant c1 > 0. These constraints impose the limitation that the images y and y’ of color points x and x’, respectively, should not be much closer than the points x and x’. Note that by introducing this constraint the mapping becomes dependent on the color content of the source gamut (image); in our terminology, it becomes globally imagedependent mapping. Unfortunately, this problem is computationally costly to solve. A computationally tractable variant is presented by Giesen et al. (2007). This variant builds on an additional practically important objective—preservation of hues and grey axis (effectively reducing each point mapping from a 3D to a 1D mapping problem). This can be achieved by fixing one point on the grey axis, the so-called focal point, and mapping every color point in the source gamut along the ray from the focal point to the color point. Thus for every color point, only the best position on the intersection of the ray with the destination gamut needs to be found. This is already an easier problem but still computationally too expensive in practice. Hence in Giesen et al. (2007) the objective has been changed to exhausting the destination gamut as good as possible, that is, essentially moving the source color points as little as possible along the respective lines. Because the color points are now mapped independently along their rays, it may happen that nearby color points are mapped far apart (just what we wanted to prevent by the original objective function). We prevent this by imposing a continuity constraint dðy; y0 Þ c2 dðx; x0 Þ; with a defined constant c2 > 0. Finally, the problem is simplified by optimizing only the mapping of color points in the boundary of the image gamut and then interpolating the mapping for the interior points along the rays. The gamut boundary can be obtained as follows: The
30
Zofia Baran´czuk et al.
gamut is approximated by some ‘‘simple’’ covering shape within the working color space, and the boundary of the approximating shape is taken as the gamut boundary. A number of methods are known for computing covering shapes of image gamuts. These include the convex hull of the color points (Kress and Stevens, 1994), the segment maximum method (Morovic and Luo, 2000), alpha shapes (Cholewo and Love, 1999), flow shapes (Giesen et al., 2005), and star shapes (Giesen et al., 2007). Note that the convex hull description usually overestimates the image gamut. Even device gamuts are generally not convex. Alpha shapes and flow shapes aim at mitigating this problem. Therefore, we are looking at SHAPE(C(x)), where C(x) is the image gamut (color content of the image x to be mapped), and SHAPE can be either the convex hull, an alpha shape, flow shape, or segment maximum shape.
6.2.3. Spatial Gamut-Mapping Algorithms In principles using models of perception and human vision allows the design of algorithms that not only minimize the perceived color differences between the original and its reproduction but also preserve spatial image features such as the local contrast. A good example of such a spatial gamut-mapping algorithm has been given by Kimmel et al. (2005), who derive the mapping from a variational approach. The perceptual difference between the original x and the mapped image y is modeled as Dc ¼ g ðyc xc Þ; where c 2 {L, a, b}, * denotes convolution, and g is a normalized Gaussian kernel with a small variance s. The functional that is used in the variational formulation is then given as ð 1 X L a b ðDc þ a,Dc ÞdOÞ; Eðy ; y ; y Þ ¼ 2 c2fL;a;bg O where Dc is the c 2 {L, a, b} color plane of D. This is a powerful and quite general approach. The solution (a stationary point of the functional) can be computed iteratively, although this, to date, has been accomplished only for grey-scale images. Furthermore, halo effects must be avoided by using ad hoc methods because the functional as stated does not capture midrange frequencies, which typically govern halo effects. Another iterative approach has been proposed by McCann (2001), who describes an iterative gamut-mapping technique based on spatial image compression. The objective in McCann’s approach is to optimize ratios of radiances. In both optimization approaches, the main difficulty is defining an appropriate optimization criterion using a perceptual model. Another
Gamut Mapping
31
issue is the computational efficiency that makes use of these algorithms difficult in practical applications. So far, optimization approaches respecting spatial features become so high dimensional that the objective function cannot be optimized directly. This difficulty is probably why heuristic methods using appropriate local filters are much more popular in the literature. Using our terminology, these heuristics fall in the class of local image-dependent methods. Examples include Bala et al. (2001), Morovic and Wang (2003), Zolliker and Simon (2007), Bonnier et al. (2007), and Farup et al. (2007). Most of these concepts can be described in terms of our terminology as families of parametric mappings because they involve a set of parameters that must be optimized, either implicitly in the design step of the algorithm or explicitly through psychovisual data or image-quality measures. An example is the algorithm designed by Zolliker and Simon (2007). This algorithm has the following parameters: the choice of a basic pointwise gamut-mapping algorithm, a reference color distance sc, a reference spatial distance ss, and an overall weight parameter r. To date, this algorithm has been tested only on a very restricted set of parameters: four basic pointwise gamut-mapping algorithms, two values of r (0, 1), and fixed values for sc and ss. Hence it is still possible to find a better parameter setting—for example, through a conjoint analysis study.
7. CONCLUSION AND FUTURE WORK Gamut mapping regarded as a color transformation from the color capacity of one device to another has reached a satisfying degree of maturity. The state of the art is well documented in several publications within the past two decades. However, if gamut mapping is regarded as a color image transformation from the color capacity of one device to another, there still is potential for significant improvement in terms of perceived image quality. We consider the following three major challenges that need to be addressed to further advance the progress of gamut mapping: 1. A better understanding of the goals of gamut mapping. Such goals should be formulated in terms of image-quality measures derived from and validated on solid data from psychophysical measurements. Of particular interest is the influence of different mapping parameters and their interactions on the overall perceived image quality. 2. Definition and solution of optimization problems that reflect the above goals and are manageable in terms of computational resources. Based on today’s knowledge of image appearance, better gamut-mapping solutions should include spatial aspects and image dependencies.
32
Zofia Baran´czuk et al.
3. Today’s color management workflows, such as ICC color management, need to be adapted to efficiently incorporate modern gamutmapping algorithms. In particular, it is important progress beyond the inherent pixel-to-pixel color transformations.
REFERENCES Bala, R. (2003). Device characterization. In ‘‘Digital Color Imaging, The Electrical Engineering and Applied Signal Processing Series,’’ (G. Sharma, ed.), pp. 269–384. CRC Press, Boca Raton, FL. Bala, R., deQueiroz, R., Eschbach, R., and Wu, W. (2001). Gamut mapping to preserve spatial luminance variations. J. Imaging Sci. Technol. 45(5), 436–443. Baran´czuk, Z., Zolliker, P., and Giesen, J. (2009). Image quality measure for evaluating gamut mapping. In ‘‘17th Color Imaging Conference,’’ Vol. 17, pp. 38–43. Society for Imaging Science and Technology/Society for Imaging Display, Albuquerque, NM. Berns, R. S., Motta, R. J., and Gorzynski, M. E. (1993). CRT colorimetry. Part I: theory and practice. Color Res. Appl. 18(5), 299–314. Bonnier, N., Schmitt, F., Brettel, H., and Berche, S. (2006). Evaluation of spatial gamut mapping algorithms. In ‘‘14th Color Imaging Conference,’’ Vol. 14, pp. 56–61. Society for Imaging Science and Technology/Society for Imaging Display, Scottsdale, AZ. Bonnier, N., Schmitt, F., Hull, M., and Leynadier, C. (2007). Spatial and color adaptive gamut mapping algorithms. In ‘‘15th Color Imaging Conference,’’ Vol. 15, pp. 267–272. Society for Imaging Science and Technology/Society for Imaging Display, Albuquerque, NM. Cholewo, T. J., and Love, S. (1999). Gamut boundary determination using alpha-shapes. In ‘‘7th Color Imaging Conference,’’ Vol. 7, pp. 200–204. Society for Imaging Science and Technology/Society for Imaging Display, Scottsdale, AZ. CIE (Commission Internationale de L’Eclaraige [International Commission on Illumination]). (1932). CIE (Comite´ dEtudes sur la Colorime´trie) Proceedings 1931. Cambridge University Press, Cambridge. CIE (1978). Recommendations on Uniform Color Spaces, Color-Difference Equations, Psychmetric Color Terms. Supplement No. 2 of CIE publ. No. 15. Bureau Central de la CIE (Comite´ dEtudes sur la Colorime´trie), Paris. CIE (1995). Industrial Colour-Difference Evaluation, CIE publ. No. 116. Bureau Central de la CIE (Comite´ dEtudes sur la Colorime´trie), Vienna. CIE (2001). Improvement to Industrial Colour-Difference Evaluation, CIE publ. No. 1422001. Bureau Central de la CIE (Comite´ dEtudes sur la Colorime´trie), Vienna. CIE (2004a). Guidelines for the Evaluation, of Gamut Mapping Algorithms, CIE publ. No. 156. Central Bureau of the CIE (Comite´ dEtudes sur la Colorime´trie), Vienna. CIE (2004b). A Color Appearance Model for Color Management Systems: CIECAM02, CIE publ. No. 159. Central Bureau of the CIE (Comite´ dEtudes sur la Colorime´trie), Vienna. Ebner, F., and Fairchild, M. D. (1998). Developement and testing of a color space (IPT) with improved hue uniformity. In ‘‘14th Color Imaging Conference,’’ Vol. 6, pp. 8–13. Society for Imaging Science and Technology/Society for Imaging Display, Scottsdale, AZ. Engeldrum, P. G. (2000). Psychometric Scaling, A Toolkit for Imaging Systems Development. Imcotek Press, Winchester, MA. Eskicioglu, A. M., and Fisher, P. S. (1995). Image quality measures and their performance. IEEE Trans. Commun. 43(12), 2959–2965. Fairchild, M. D. (1998). Color Appearance Models. Addison-Wesley, Boston, MA. Fairchild, M. D., and Johnson, G. M. (2004). iCAM framework for image appearance, differences, and quality. J. Electron. Imaging 13, 126–138.
Gamut Mapping
33
Farup, I., Gatta, C., and Rizzi, A. (2007). A multiscale framework for spatial gamut mapping. IEEE Trans. Image Process. 16(10), 2423–2435. Farup, I., Hardeberg, J. Y., and Amsrud, M. (2004). Enhancing the SGCK colour gamut mapping algorithm. In ‘‘Proc. Second European Conference on Color in Graphics, Imaging and Vision.’’ The Society for Imaging Science and Technology, Aachen, Germany. Giesen, J., Schuberth, E., Simon, K., and Zolliker, P. (2005). Toward image-dependent gamut mapping: fast and accurate gamut boundary determination. In ‘‘Color Imaging X: Processing, Hardcopy and Applications,’’ (R. Eschbach and G. Marcu, eds.), Vol. 5667, pp. 201–210. SPIE/IS&T, San Jose, CA. Giesen, J., Schuberth, E., Simon, K., and Zolliker, P. (2007). Image-dependent gamut mapping as optimization problem. IEEE Trans. Image Process. 16(10), 2401–2410. Grassmann, H. (1844). Die lineare Ausdehnungslehre. Open Court Publishing, Peru, IL Wiegand, Leipzig, 1844; English translation, 1995, by Lloyd Kannenberg, A New Branch of Mathematics. Grassmann, H. (1853). Zur Theorie der Farbenmischung. Poggendorrfs Ann. Physik 89, 69–84. Guild, J. (1931). The colorimetric properties of the spectrum. Phil. Trans. Roy. Soc. London A 230, 149–187. Hurlbert, A., Wolf, K., Rogowitz, B., and Pappas, T. (2002). The contribution of local and global cone-contrasts to colour appearance: a retinex-like model. In ‘‘Proceedings of the SPIE 2002,’’ Vol. 4662, pp. 286–297. San Jose, CA. International Color Consortium (2003). File Format for Color Profiles. ICC, Reston, VA ICC.1:2003:9 (2003). Keelan, B. (2002). Handbook of Image Quality: Characterization and Prediction. Marcel Dekker, New York. Kimmel, R., Shaked, D., Elad, M., and Sobel, I. (2005). Space-dependent color gamut mapping: a variational approach. IEEE Trans. Image Process. 14(6), 796–803. Kress, W., and Stevens, M. (1994). Derivation of 3-dimensional gamut descriptors for graphic arts output devices. In ‘‘TAGA Proceedings,’’ pp. 199–214. Technical Association of the Graphic Arts, Sewickley, PA. Kubelka, P., and Munk, F. (1931). Ein Beitrag zur Optik der Farbanstriche. Zeit. technische Physik 12, 593–601. Land, E. H. (1986). Recent advances in the retinex theory. Vision Res. 26, 7–21. Marcu, G. (1998). Gamut mapping in Munsell constant hue sections. In ‘‘6th Color Imaging Conference,’’ Vol. 6, pp. 159–162. Society for Imaging Science and Technology/Society for Imaging Display, Scottsdale, AZ. Maxwell, J. C. (1855). Experiments on color, as perceived by the eye, with remarks on color blindness. Trans. Roy. Soc. Edinburgh 21, 275–297. McCann, J. J. (2001). Color gamut mapping using spatial comparisons. In ‘‘Proc. SPIE, Color Imaging, VI,’’ Vol. 4300, pp. 126–130. SPIE. McDonald, R., and Smith, K. (1995). CIE94—a new color-difference formula. Journal of the Society of Dyers & Colourists 111(12), 376–379. Michelson, A. A. (1927). Studies in Optics. University of Chicago Press, Chicago. Millen, A. B., Bunge, J., and Handley, J. C. (2001). Ranked data analysis of a gamut-mapping experiment. J. Electron. Imaging 10(2), 399–408. Moroney, N., Fairchild, M., Hunt, R., Li, C., Luo, M., and Newman, T. (2002). The CIECAM02 color appearance model. In ‘‘10th Color Imaging Conference,’’ pp. 23–27. Society for Imaging Science and Technology/Society for Imaging Display, Scottsdale AZ. Morovicˇ, J. (2008). Colour Gamut Mapping. John Wiley & Sons, Chichester, UK. Morovicˇ, J., and Luo, M. R. (2000). Calculating medium and image gamut boundaries for gamut mapping. Color Res. Appl. 25(6), 394–401.
34
Zofia Baran´czuk et al.
Morovicˇ, J., and Sun, P. L. (2001). Non-iterative minimum de gamut clipping. In ‘‘Ninth Color Imaging Conference,’’ Vol. 9, pp. 251–256. The Society for Imaging Science and Technology. Morovicˇ, J., and Wang, Y. (2003). A multi-resolution, full-colour spatial gamut mapping algorithm. In ‘‘11th Color Imaging Conference,’’ Vol. 11, pp. 282–287. The Society for Imaging Science and Technology, Scottsdale AZ. Morovicˇ, J., and Yang, Y. (2003). Influence of test image choice on experimental results. In ‘‘Proceedings of Society for Imaging Science and Technology/Society for Imaging Display 11th Color Imaging Conference, Scottsdale AZ,’’ pp. 143–148. Morrisey, J. H. (1955). New method for the assignment of psychometric scale values from incomplete paired comparisons. J. Opt. Soc. America 45(5), 373–378. Mosteller, F. (1951a). Remarks on the method of paired comparisons: I. The least squares solution assuming equal standard deviations and equal correlations. Psychometrika 16, 3. Mosteller, F. (1951b). Remarks on the method of paired comparisons: III. A test of significance or paired comparisons when equal standard deviations and equal correlations are assumed. Psychometrika 16, 203. Nakauchi, S., Hatanaka, S., and Usui, S. (1999). Color gamut mapping based on a perceptual image difference measure. Color Res. Appl. 24(4), 280–291. Neugebauer, H. E. J. (1937). Die theoretischen Grundlagen des Mehrfarbenbuchdrucks. Zeit. Photophys. Photochem. 36(4), 73–89. Schmidt, R., Lang, F., and Thews, G. (2005). Physiologie des Menschen, ed. 29, Springer, Berlin. Sekuler, R., and Blake, R. (2002). Perception, ed. 4, McGraw-Hill, New York. Sprow, I., Baranczuk, Z., Stamm, T., and Zolliker, P. (2009). Web-based psychometric evaluation of image quality. In ‘‘Image Quality and Systems Performance VI,’’ (S. P. Farnandi and F. Gatkema, eds.), Vol. 7242, p. 72420A. SPIE. ¨ ber die Zusammensetzung von Spektralfarben. Poggendorrfs von Helmholtz, H. (1855). U Ann. Physik 94, 1–28. Wandell, B. (1994). Foundations of Vision: Behavior, Neuroscience and Computation. Sinauer Associates, Sunderland, MA. Wang, Z., and Bovik, A. C. (2002). A universal imaging quality index. IEEE Sgnal Proc. Lett. 9(3), 81–84. Wright, W. D. (1928). A re-determination of the trichromatic coefficients of the spectral colours. Trans. Opt. Soc. London 30, 144–164. Wyszecki, G., and Stiles, W. (1982). Color Science: Concepts and Methods, Quantitative Data and Formulae. ed. 2, John Wiley & Sons, New York. Young, T. (1802). On the theory of light and colours. Philos. Trans. Roy. Soc. London 92, 210–271. Yule, J. A. C., and Nielsen, W. J. (1951). The penetration of light into paper and its effect on halftone reproduction. In ‘‘TAGA Proceedings’’ pp. 65–75. Technical Association of the Graphic Arts, Sewickley, PA. Zolliker, P., and Simon, K. (2006). Continuity of gamut mapping algorithms. J. Electron. Imaging 15(1), 13004. Zolliker, P., and Simon, K. (2007). Retaining local image information in gamut mapping algorithms. IEEE Trans. Image Process 16(3), 664–672.
Chapter
2 Color Area Morphology Scale-Spaces Adrian N. Evans
Contents
1. 2. 3. 4.
Introduction Area Openings and Closings Area Morphology Scale-Spaces Color Connected Filters 4.1. Color Area Morphology Scale-Spaces 4.2. Convex Color Sieve 4.3. Vector Area Morphology Sieve 4.4. Vector Area Morphology Open-Close Sieve 5. Performance Evaluation 5.1. Implementation and Timings 5.2. Application to Image Segmentation 5.3. Robustness to Noise 6. Conclusions Acknowledgments References
35 38 41 43 44 48 50 57 60 61 63 68 70 71 71
1. INTRODUCTION Multiscale image analysis using morphological scale-spaces has many important applications in computer vision, for example, segmentation (Gauch, 1999) and classification (Acton and Mukherjee, 2000). The concept was originally introduced using linear scale-spaces (Koenderink, 1984; Witkin, 1983) in which a succession of images is obtained by blurring the original image using Gaussian filters with increasing scale-space Department of Electronic and Electrical Engineering, University of Bath, BA2 7AY, United Kingdom Advances in Imaging and Electron Physics, Volume 160, ISSN 1076-5670, DOI: 10.1016/S1076-5670(10)60002-X. Copyright # 2010 Elsevier Inc. All rights reserved.
35
36
Adrian N. Evans
parameters. One drawback with this approach is that the locations of image boundaries tend to drift with increasing scale, an effect that can have the undesired result of merging distinct objects and destroying corners (Perona and Malik, 1990). The anisotropic diffusion algorithm of Perona and Malik (1990) was an attempt to address this problem by reducing the blurring in the presence of edges. An alternative approach to Gaussian filters is to use morphological operators in the process of generating the scale-spaces ( Jackway and Deriche, 1996; van den Boomgaard and Smeulders, 1994). Morphological scale-spaces based on successive openings and closings using structuring elements of increasing size can suffer from the same drifting of edges through scale that characterizes linear scale-spaces (Maragos, 1989; Park and Lee, 1996). In contrast, scale-spaces constructed using area morphology operators do not have this problem because area openings and closings are types of connected operators, which work by merging the regions of constant signal. As with scale-spaces based on other connected operators such as opening by reconstruction and levelings (Meyer and Maragos, 2000), area morphology scale-spaces do not introduce any new edges with increasing scale. Further, they do not alter any edges present at finer scales until the scale where they are completely removed is reached (Acton and Mukherjee, 2000). Area openings and closings belong to a class of morphological techniques called connected operators (Salembier and Serra, 1995). A full discussion and introduction to the subject of mathematical morphology is beyond the scope of the chapter and, indeed, is well covered elsewhere. An authoritative description of the mathematical basis of morphology is contained in the books by Serra (1982, 1988), and an excellent practical treatment, including examples of applications, is provided by Soille (2003). For a recent update on the advances in mathematical morphology, particularly those pertaining to the theme of connective segmentation, readers are referred to a chapter from volume 150 of this publication and references therein (Serra, 2008). In common with many other morphological techniques, problems exist in attempting to extend connected operators to color and other multichannel images because of the absence of an unambiguous ordering. Color morphology has received much attention from the research community and was the subject of a recent review paper by Apoula and Lefe`vre (2007). However, for the most part researchers have concentrated on the development of multichannel approaches for structural morphology rather than connected operators, as evidenced by the fact that only 3 of the 98 references in Aptoula and Lefe`vre (2007) consider the particular problem of extending connected operators to multichannel images. The conventional approach to extended morphological operators to color and other multi-channel images is to use marginal ordering, in
Color Area Morphology Scale-Spaces
37
which each channel is filtered independently. Since marginal processing completely disregards any interchannel correlation, when the image to be processed is highly correlated, it is often necessary to change the color space or apply a decorrelating transform before the ordering operation. In addition, as marginal processing can alter the spectral composition of an image, it can result in edge jitter and create colors not present in the original image. Despite these problems, marginal ordering does offer the advantage of allowing any of the grey-scale morphological techniques to be directly applied to multichannel imagery. In addition to marginal ordering, a number of other vectorial ordering techniques can be used. These have the advantage of eliminating the possibility of creating new colors by virtue of treating each pixel as a vector. The ordering schemes can be classified according to the taxonomy of sub-ordering schemes detailed in the classic paper on the ordering of multivariate data by Barnett (1976). In additional to marginal ordering, reduced (or aggregate) ordering, partial ordering, and conditional ordering can be employed and the use of all of these for color morphology has been proposed at some time or other. In particular, there has been much interest in the use of lexicographical ordering, the best-known type of conditional ordering, to define color morphology operations in hue-based color spaces (Hanbury and Serra, 2001a,b; Louverdis et al., 2002). Reduced ordering sorts vectors according to scalar values that are a function of the observations in each channel. Functions that have been investigated include luminance and individual color channels (Comer and Delp, 1999). A reduced ordering scheme based on principal component analysis has also been proposed (Li and Li, 2004). This chapter discusses the development of area morphology scalespaces for color images. Unlike morphological techniques based on the use of structuring elements, area morphology does not use erosions and dilations but instead works by using openings and closings. Therefore, the requirements for extending area scale-spaces to color images are subtly different from those of structural scale-spaces. The specific approach to be detailed is the vector area morphology sieve (VAMS) that first appeared in the literature in 2003 (Evans, 2003a,b). An approach known as the convex color sieve (CCS) (Gibson et al., 2003b) that is similar to the VAMS was developed independently around the same time and both approaches were first presented in June 2003 at the IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing (Evans, 2003a) and Scale Space 2003, respectively. Subsequent work has shown that the two techniques have a common algorithm and differ only in the specific implementation of some of the algorithmic steps (Gimenez and Evans, 2005, 2008). The VAMS was first proposed for analyzing motion vector fields and has also been applied to color images. However, because its approach is generic, in theory it can be extended to images with an arbitrary number of channels.
38
Adrian N. Evans
The format of this chapter is as follows. Grey-scale area operations are introduced in Section 2 and their use in scale-spaces in described in Section 3. Some recently proposed color connected filters used for color area morphology scale-spaces are described in more detail in Section 4. Section 5 contains an evaluation of the performance of the color scalespaces, with particular emphasis on their application to image segmentation. Finally, conclusions are drawn in Section 6. The chapter gathers work from a number of sources; for a bibliography readers are referred to references (Evans, 2003a,b; Evans and Gimenez, 2008; Gimenez, 2007; Gimenez and Evans, 2005, 2008).
2. AREA OPENINGS AND CLOSINGS Openings and closings are two of the fundamental operations in mathematical morphology and, as such, are the building blocks for many other morphological operations (Serra, 1982). The performance of the opening and closing operations depends in large part on the structuring element used. If some a priori knowledge of the size, shape, and orientation of objects within the image is available, then this can be used to select the structuring elements for the filtering operation. An alternative approach applicable when little or no a priori shape information is available is to use an area morphology opening or closing, which removes light or dark structures of a given number of pixels regardless of their shape or orientation. Area openings and closings belong to the class of morphological operations known as connected operators. The first connected operator described in the literature was that of openings by reconstruction (Klein, 1976). The generalization of openings by reconstruction to grey-scale images and the development of efficient algorithms was proposed by Vincent (1993c). The use of reconstruction criteria to constrain the reconstruction process also has been proposed and the development of a fast approximation recently reported (Wilkinson, 2008). Also belonging to the general class of connected operators are levelings, which filter an image by removing or attenuating its contours (Meyer, 2004; Serra, 2008). Area openings and closings work by processing (merging or removing) the flat zones in an image. The flat zones of a signal are the largest connected components where the signal is constant (Salembier and Serra, 1995). If S is a subset of image pixels, then two pixels p and q are said to be connected if there exists a path between them consisting entirely of pixels in S. Typically the path is defined using 4 of 8 connectivity. For any pixel p in S, a connected component of S is the set of pixels that are connected to p in S. Therefore, any pixel p belongs to a flat zone that consists of the set of connected pixels with the same signal value. For grey-scale images the
Color Area Morphology Scale-Spaces
39
flat zones are connected components with the same intensity, whereas for color images they are connected components of constant color. The term area opening was introduced by Vincent (1993a), although as pointed out in Meijster and Wilkinson (2002), area openings and closings first were noted in the literature a year earlier as ‘‘a new type of opening operators (NOP) and closing operators (NCP)’’ (Cheng and Venetsanopoulos, 1992). For a grey-scale image I, the area opening (NOP) and area closing (NCP) operations from Cheng and Venetsanopoulos (1992) can be denoted gal and ’al , respectively, and defined as (2.1) gal ðIÞ ¼ _ ðIBÞ B2Al
and ’al ðIÞ ¼ ^ ðI B ), B2Al
(2.2)
where Al is the set of connected subsets with area l. Examination of Eq. (2.1) reveals that an area opening is the maximum of openings with all possible connected structuring elements of a minimal size (Cheng and Venetsanopoulos, 1992; Vincent, 1993b). An area opening can be considered adaptive in that the structuring element adapts itself to the image structure at every location so that the image is changed as little as possible. Subsequent area opening and closing definitions use a threshold decomposition (Acton and Mukherjee, 2000; Meijster and Wilkinson, 2002) to produce a set of binary images from a grey-scale input by applying a threshold t for t ¼ 0, 1, . . ., L 1, where L is the number of discrete grey levels. An area opening can be applied to each binary image by removing all connected components whose area is less than l. The resulting grey-scale image is reconstructed by a stacking operation, such that the output at each pixel position is the maximum threshold at which the area opened binary image is true. Area closings can be implemented by applying the same operations to the complement image. It is immediately apparent that a naı¨ve implementation of Eq. (2.1) or (2.2) requires an image to be opened (or closed) using all possible connected structuring elements consisting of l pixels. The number of structuring elements involved rapidly becomes unfeasible and although a more efficient algorithm was proposed in Cheng and Venetsanopoulos (1992), it was not until the pixel-queue algorithm was proposed by Vincent (1993a,b) that the application of area openings and closings to grey-scale images became a practical proposition. Figure 1 compares the operation of an area closing with that of a conventional closing using a structuring element. When the image is closed using a 9 9 structuring element (Figure 1b), many of the darker details—for example, the boats’ masts—are removed. In addition, the size
40
Adrian N. Evans
(a)
(b)
(c)
(d)
FIGURE 1 Boats. (a) 576 720-pixel original image, (b) closing with a 9 9 structuring element, and (c) area opening and (d) closing to area l ¼ 81, respectively.
and shape of the structuring element used to probe the image can clearly be seen in the result, which distorts edge positions. Area closing to an area l ¼ 81 (Figure 1d) removes the smaller, darker connected components— for example, from the beach—without altering the positions of any of the other image boundaries. As illustrated by Figure 1c and 1d, area openings and closings work by removing light and dark structures with area less than l, respectively. This observation provides the basis for Vincent’s pixel-queue algorithm, where the light and dark structures to be removed are synonymous with regional maxima and minima, respectively. Regional maxima (resp. minima) are connected sets of pixels whose intensity is greater (resp. less) than that of their connected neighbors. Rather than processing every pixel directly, to perform an area opening the pixel-queue algorithm identifies all regional maxima and places them in a list. Each maximum is then processed by adding the neighbor with the highest grey level and the process is repeated until the desired area size has
Color Area Morphology Scale-Spaces
41
been reached or the region becomes nonmaximal. Full details of this algorithm are provided in Vincent (1993a,b) and a convenient summary is given in Meijster and Wilkinson (2002). Other efficient algorithms have been proposed, including the Max-tree of Salembier et al. (1998) and Tarjan’s union find algorithm (Wilkinson and Roerdink, 2000). A full comparison of the performance of these two fast algorithms with the pixel-queue has demonstrated their advantages (Meijster and Wilkinson, 2002). In particular, the processing times for area closings using the max-tree and union find algorithms do not exhibit the strong dependence of l and image content found with the pixel-queue algorithm. The processing speed for all methods is linear with image size, with the union find generally the best-performing method. Attribute openings and closings are a more generalized approach to area morphology and use attributes other than area to control the filtering action (Breen and Jones, 1996). Both increasing and nonincreasing criteria can be accommodated using attribute openings and thinnings, respectively. Examples of increasing criteria include contrast, volume (Salembier et al., 1996), moments of inertia, and power (Young and Evans, 2003). Nonincreasing attributes allow use of shape criteria.
3. AREA MORPHOLOGY SCALE-SPACES All nonlinear scale-spaces based on morphological operations have at their heart the application of successive morphological operations of increasing size. Using standard openings and closings, Maragos (1989) proposed a morphological scale-space with the important property of not creating any spurious extrema with increasing scale. However, as with other approaches using structuring elements, problems with causality exist when they are applied to more than one dimension ( Jackway and Deriche, 1996; Park and Lee, 1996). In essence, this means that the positions of image edges are not fixed as scale increases and can drift through the scale-space, which is the same problem found with linear scale-spaces based on Gaussian filters. One way to overcome this problem is to replace the structural openings and closings with the equivalent area morphology operations and combine them in an alternating sequential filter (ASF) structure (Bangham et al., 1996). Bangham et al. termed these filters M- or N-sieves, respectively, according to whether the area opening or closing operation was performed first. Acton and Mukherjee (2000) refer to these as area open-close (AOC) and area close-open (ACO) scale-spaces and have used them for scale-space classification. Given an image I, the AOC and ACO image representations at scale l are defined by
42
Adrian N. Evans
AOCl ðIÞ ¼ ’al gal ð’al1 gal1 ð. . . ð’a2 ga2 ð’a1 ga1 ðIÞÞÞÞÞ
(2.3)
AOCl ðIÞ ¼ gal ’al ðgal1 ’al1 ð. . . ðga2 ’a2 ðga1 ’a1 ðIÞÞÞÞÞ;
(2.4)
and
where gal and ’al are the area opening and closing operations to an area l, given by Eqs. (2.1) and (2.2), respectively. These scale-spaces possess some important properties for multiscale image analysis. First, combining area openings and closings in this manner results in a sieve structure that is guaranteed not to produce any new extrema as the scale increases. Second, the ACO and AOC sieves possess the property of strong causality (Acton and Mukherjee, 2000). This means that not only will no new edges be created as scale increases, but the position of existing edges also will not drift through scale-space. The ACO and AOC are both sieves and ASFs, although not all ASFs have the properties of sieves (Bangham et al., 1996). As openings and closings do not commute, the results of sieving an image to area l using AOC and ACO sieves are not guaranteed to be identical, although they are often very similar. Conceptually, the algorithms for the AOC and ACO area morphology sieves described by Eqs. (2.3) and (2.4) have two main steps: (1) the identification of extrema regions and (2) the merging of the extrema with the neighboring region with the closest grey-scale value. These steps are repeated with increasing scale giving rise to the algorithm below: 1. Identify all regional extrema; 2. Merge all scale 1 regional extrema with their nearest neighbor; 3. Repeat step 2 with increasing scale, up to scale ¼ l. This algorithm clearly shows that it is possible to have a clear separation between the stages of identifying and merging the extrema. For grey-scale images the mechanisms used to perform these steps are clear and unambiguous: the extrema are the regional maxima and minima and are merged with the region with the closest intensity value. For color images, different choices can be made for both these stages; these are explored in more detail in Section 4.1. An efficient and convenient way to interact with the scale-space representations derived using area operators is to use a tree-based representation. If either image maxima and minima are to be processed independently, the max-tree or the min-tree of Salembier and Garrido (2000) can be used to represent the connected components of the space. When it is desirable to represent and process both maxima and minima simultaneously, the inclusion tree can be built from the level line image representation
Color Area Morphology Scale-Spaces
43
(Monasse and Guichard, 2000), where the level lines are the boundaries of the level sets of the image. The inclusion tree induces a scale-space image structure in which scale is in terms of the number of pixels (i.e., the area for two-dimensional [2D] image data). A review of region tree representations—namely, the max-tree, min-tree, inclusion tree and binary partition tree—that have been used to create connected operators is provided by, Salembier (2008). Other connected operators can be used instead of area openings and closings within an ASF structure to generate a scale-space representation. Openings and closings by reconstruction are one possibility, but these are far more computationally expensive because unlike the scale-spaces derived using fast area morphological operators (Meijster and Wilkinson, 2002), the algorithms for their computation are not nearly so efficient. As an alternative to using an ASF, levelings can be used to provide a symmetric treatment of the image peaks and valleys (Meyer and Maragos, 2000).
4. COLOR CONNECTED FILTERS The problem of extending connected operators to multichannel images has received only limited attention. Weber and Acton (2004) proposed a color connected filter based on applying marginal ordering in the hue, saturation, and value (HSV) color space that overcomes the lack of ordering in the hue by applying a rotational shift. The filter outperforms redgreen-blue (RGB) marginal connected filtering for impulsive noise reduction but, like all marginal filters, is not vector preserving. Several approaches to vector levelings have been proposed, including separable and nonseparable levelings (Meyer, 2000, and Zanoguera and Meyer, 2002, respectively). A comparison of the performance of vector levelings using lexicographical total orderings for image segmentation has been performed by Angulo and Serra (2003). Related work includes the application of seeded region growing to the quasi-flat zones of multichannel images, as proposed by Brunner and Soille (2007). More recently an approach based on constrained connectivity that considers the grey-scale variations—both along connected paths and within connected components—has been described and results for its extension to color images presented (Soille, 2008). Alternatively, a connective criterion can be used to aggregate regions around a set of seeds. Jump connections provide one such criterion that forms connected regions in which each pixel is less than a jump connection k above a minimum (Serra, 2008). Iterated jumps have been used for color image segmentation by separately segmenting the hue, luminance, and saturation and then using the saturation to choose between the hue and
44
Adrian N. Evans
luminance partitions (Angulo and Serra, 2007). Serra (2009) recently extended this approach to images with an arbitrary number of channels using the unit sphere. An elegant solution for four principal components is also detailed. The remainder of this section is devoted to the extension of area morphological filters to color images and the development of color area morphology scale-spaces. The three scale-spaces described in detail are those based on the CCS, the VAMS, and the vector area morphology open-close sieve (VAMOCS).
4.1. Color Area Morphology Scale-Spaces A number of key challenges must be overcome to extend the grey-scale area morphology scale-space algorithm described in Section 3 to color images. The CCS of Gibson et al. (2003a,b) and the VAMS of Evans (2003a,b) are two approaches to overcoming these challenges. Although these sieves were developed independently, subsequent analysis has shown that they share a common algorithm and differ only in their choices for implementing some of the key choices that must be made at various stages in the algorithm. The algorithm for the filters has the following steps: 1. 2. 3. 4.
Identify all regional extrema Merge all scale 1 regional extrema with their nearest neighbor Repeat steps 1 and 2 until no extrema exist at current scale Repeat steps 1 to 3 with increasing scale, up to scale ¼ l.
Comparison with the grey-scale area morphology scale-space algorithm presented in Section 3 shows that the major difference is the inclusion of an additional step: Repeat steps 1 and 2 until no extrema exist at current scale. This step is required because, unlike the grey-scale case, with color imagery the merging process can create new extrema in the vicinity of the merged regions. Therefore, to ensure that the property of idempotence is preserved, any new extrema with areas less than the current value of l must be processed before moving to the next scale. Several other significant differences between the color and grey-scale sieves exist. First, in grey-scale area morphology extrema are unambiguously identified as either regional maxima or minima. In the color equivalent, extrema are not further classified as maxima or minima and there is no unique method for their identification. Furthermore, as the additional step in the color algorithm requires the extrema identification and merging steps to be repeated at each scale, the performance of the color area morphology sieves is largely determined by the extrema definition used. The approach used by the VAMS to identify color extrema is discussed in detail in Section 4.1.1.
Color Area Morphology Scale-Spaces
45
Similarly, there are a number of choices for the mechanism of selecting the neighboring region with the closest color to merge with in step 2 of the algorithm. Unlike the grey-scale case where the closest intensity is given by the difference in grey-scale, for color images there are a number of norms and distance measures that can be used. Section 4.1.2. provides a brief discussion of some of the possible distance metrics. Finally, the second step of the algorithm contains a further subtlety: the value to be assigned to the merged region. The conventional approach for grey-scale images is to set the merged region to the intensity of the non-extreme region. Salembier and Garrido (2000) have proposed other possibilities that also account for the area of the two merging regions and for color images the freedom to explore various other options may be advantageous.
4.1.1. Color Extrema The problem of vector ordering is found in all areas of multichannel morphology. Likewise, in color area morphological sieves vector ordering provides the mechanism for identifying the extremal pixels. In addition to identifying extrema in its initial step, the color area morphology scalespace sieve algorithm also creates new extrema during the merging process of step 2, which must be subsequently processed. Therefore, the proportion of regions classified as extrema is critical to the sieve’s performance. Extrema definitions that result in a high proportion of image regions being classified as extreme produce sieves that are characterized by an extremely aggressive sieving action, as few regions smaller than the current area survive at each scale. Another view is that extrema are the seeds from which the image simplification process commences, in which case an extrema definition that identifies a relatively low proportion of extreme regions will not significantly alter the image until larger scales. The other significance of the proportion of extreme regions is that it directly relates to processing time at each scale. Some of the various approaches to this problem can be considered with reference to the example set of vectors shown in Figure 2. Applying lexicographical ordering to this vector set results in very different vectors being selected as the maximum and minimum, depending on which channel is given priority. Furthermore, the categorization of vectors as either maxima or minima is not as intuitive as in the scalar case. This problem is addressed by an alterative vector ordering scheme based on reduced ordering; such a scheme has been used in the design of many multichannel filters including the vector median (Astola et al., 1990) and color edge detectors (Trahanias and Venetsanopoulos, 1993a, 1996). For a ! ! ! ! set of n vectors, x 1 ; x 2 ; . . . x n ;, the aggregate distance of x i to the other vectors in the set is denoted di and given by
46
Adrian N. Evans
9 8 7 6 5 4 3 2 1 0
0
1
2
3
4
5
6
7
8
9
FIGURE 2 Example vector set (2, 3), (4, 1), (4, 7), (6, 6), and (8, 2). The vectors’ aggregate distances using Eq. (2.5) with the L2 norm are 18.38, 18.34, 19.11, 17.09, and 21.08, respectively. ' IEEE 2008.
di ¼
n X ! ! k x ðiÞ x k kp ;
i ¼ 1; 2; . . . ; n;
(2.5)
k¼1
where kkp is a vector norm. When the vectors are ordered according to their dis such that d(1) d(2) . . . d(n), the result is the ordered sequence ! ! ! x ð1Þ x ð2Þ . . . x ðnÞ : (2.6) ! ! In this ordering, x ð1Þ is the vector median ð x vm Þ, defined by n n X X ! ! ! ! k x vm x k kp k x i x k kp ; k¼1
i ¼ 1; 2; . . . ; n;
k¼1
and is the vector whose aggregate distance to the other vectors is less than or equal to the aggregate distances of all other vectors. This definition of the vector median forms the basis of the widely used vector median filter (Astola et al., 1990). ! The highest-ranked vector is known as the vector extremum, x ve (Evans, 2003a), such that n n X X ! ! ! ! k x ve x k kp k x i x k kp ; k¼1
k¼1
i ¼ 1; 2; . . . ; n:
(2.7)
Color Area Morphology Scale-Spaces
47
This definition of the vector extremum has also been used by other authors, where it has been termed the vector outlier (Zhu et al., 1999) or the most spectrally singular pixel (Plaza et al., 2002). For the set of vectors shown in Figure 2 the aggregate distances (listed in the figure’s caption), show that the vector median and extremum are (6, 6) and (8, 2), respectively. Although in this example the vector median and extremum are uniquely defined, it should be noted that there is no guarantee that this is always the case. As an extreme example, consider any set of only two vectors, where this ordering results in two vectors being both the median and extremum. To provide an insight into the vector ordering of Eqs. (2.5) and (2.6), it is interesting to compare the results it produces when applied to scalar data with those of a simple sort operation. For example, consider the sorted set of numbers shown below: 25; 32; 71; 139; 153; 208; 231; 233; 244: A simple inspection reveals that the minimum ¼ 25, the median ¼ 153, and the maximum ¼ 244. If the values are then sorted according to their aggregate distances using Eqs. (2.5) and (2.6), then their order is 153; 139; 208; 71; 231; 233; 244; 32; 25: Comparing these two orderings, it can be seen that the (vector) median provides a fulcrum about which the other values are pivoted and, in the scalar case, the highest-ranked value will be either the maximum or the minimum. Therefore, with this ordering, instead of two extrema (the maximum and the minimum) being identified, only a single extremum exists, as given by the highest-ranked vector. For vector data (as in Figure 2), the notion of a vector minimum or maximum is not consistent with an intuitive interpretation and therefore the identification of a single extremum is more rational. For example, in Figure 2 the vector extremum is (8, 2) and is synonymous with an intuitive interpretation of an outlier but cannot be easily classified as a maximum or minimum. Other similar vector orderings that can be used to identify vector extrema have been proposed, such as ranking vectors according to their distance from the vector median and using the convex hull (Gibson et al., 2003b). In terms of area morphology scale-space sieves for color images, the existence of only a single class of extrema is very helpful, as the problems of accommodating both maxima and minima within the filter structure are avoided.
4.1.2. Distance Metrics Lp norms are widely used metrics and therefore are an obvious choice for determining the aggregate distances in Eq. (2.7). However, there are many alternative distance metrics that can be used both to determine the vector
48
Adrian N. Evans
extrema and to identify the neighboring region with the closest color with which to merge in step 2 of the algorithm. In other areas of color image processing the use of the angular difference between two vectors as the distance metric has been proposed, for example, in the vector directional filters of Trahanias and Venetsanopoulos (1993b). ! ! In this approach, for two vectors x i and x j the angular difference Ly is given by 0 1 !! : x x i j Ly ¼ cos1 @ ! ! A: (2.8) k x i kp k x j kp Combined metrics consider both magnitude and direction, thus capturing the advantages of both properties. For example, Androutsos et al. (1998) normalize the direction and magnitude differences by their maximum possible values to give the following combined metric: " ! ! # k x i x j kp 2 (2.9) Lae1 ¼ 1 1 Ly 1 pffiffiffiffiffiffiffiffiffiffiffiffiffi : p 3:2552 Equation (2.9) is advantageous in that only the magnitude difference is used when the two vectors under consideration are collinear. When the magnitudes of both vectors are low, the magnitude term on the right is close to unity and the metric is dominated by the angular difference. However, in such cases a small change in the response in an individual color channel can have significant influence on the angular difference and, as low intensity is often associated with random hue, an alternative combined metric, denoted Lae2, takes the angle from the point RGB ¼ [255, 255, 255], if this is smaller than Ly (Gimenez, 2007).
4.2. Convex Color Sieve The CCS was proposed by Gibson et al. (2003b) as an attempt to extend the grey-scale area morphology sieve of Bangham et al. (1996) to color images. As its name implies, the mechanism used by the CCS to determine image extrema is based on a convex hull. The method proposed was a local convex hull, consisting of a pixel and its connected neighbors. For images, the local neighborhood is given by either the four or eight nearest neighbors’ connectivity, and a pixel is defined as extreme if it lies on the exterior of the local convex hull. Figure 3 shows an example of the local convex hull for the (6, 5) pixel from a two-channel vector image. In this example, all pixels in the set are extreme, as they are all on the exterior of the hull. In fact, this approach results in all except one of the vectors in Figure 3a being classified as extreme. For color images using the RGB color space, with a maximum of five or nine vertices for 4- or 8-connectivity, respectively, degenerate cases result
Color Area Morphology Scale-Spaces
49
(a) 1,0
3,0
8,7
2,0
2,0
2,0
8,–4
2,–1
1,–1
2,0
2,0
2,0
3,0
1,–1
2,0
2,0
2,0
2,0
1,–1
2,–1
2,0
2,0
–6,–6
2,0
1,–1
2,–1
2,0
1,–1
2,0
2,0
3,0
2,0
2,0
2,–1
2,0
2,0
1,–1
1,0
–4,–3
–6,–5 –9,–8 1,–1
(b) 1 0 −1 −2 −3 −4 −5 −6 −7 −8 −9
−9 −8 −7 −6 −5 −4 −3 −2 −1
0
1
2
3
FIGURE 3 Local convex hull example. (a) Two-dimensional vector image and (b) convex hull for (6 5) vector and 8-neighbor connectivity.
where the hull is reduced to either a line or a plane. To accommodate this scenario, the approach in Gibson et al. (2003b) was to define a point as extreme if it lies on the exterior of the reduced dimensionality convex hull. Defining extrema using the local convex hull approach has the advantage that the hull topology is invariant to monotonic scaling and linear axes transformations. However, it typically results in a large proportion of regions being defined as extreme. For example, the color test image Lily (shown in Figure 4) has more than 80% of its pixels classified as extrema. The high proportion of extrema is at odds with an intuitive interpretation of extrema as outliers. There are also many instances of different pixels that are neighbors being labeled as extreme. Although this can occur in the grey-scale case, where maxima and minima regions can be neighbors, this does not provide a satisfactory explanation of the many adjacent extrema. The CCS uses the Euclidean distance to determine the neighbor with which to merge in step 2 of the color area sieve algorithm. However,
50
(a)
Adrian N. Evans
(b)
FIGURE 4 CCS extrema for the Lily test image using 8-nearest neighbor connectivity. (a) Original image and (b) CCS extrema (shown in white).
because it is possible for two or more different colors to be equidistant from a reference color, a further ordering using the difference in luminance was proposed. Any remaining ties are then ordered by their G, R, and B values, respectively. Although not explicitly stated in Gibson et al., (2003b), the convention used to select the color to assign to the combined pixels, after the merging process, follows that of standard grey-scale area morphological sieves— that is, the color of the pixel selected to be merged with. Therefore, when two single-pixel regions are neighbors and both are classified as extreme, the color assigned to the resultant merged region will arbitrarily depend on which extrema is processed first. New extrema created by the merging process are identified and processed by the recursion introduced by step 3 of the algorithm. As with other sieves, the set of scale-space images produced by the CCS can be presented in a tree-based representation. An evaluation of how semantically meaningful the segmentations from the CCS are was undertaken by Gibson et al. (2003a). In this work, the approximately 50 regions were extracted form the scale-space tree, starting from (but excluding) the root. Comparison with sets of human-segmented ground truths from a set of 50 images showed some agreement, although there was better agreement among the humans who performed the segmentations. A comprehensive comparison of segmentation performance produced by the CCS and other color area-morphology scale-spaces is provided in Section 5.2.
4.3. Vector Area Morphology Sieve The VAMS was developed independently of the CCS and was first presented in 2003 (Evans, 2003a,b). Although its algorithmic steps are identical to the CCS, it differs in how the stages are implemented. In particular,
Color Area Morphology Scale-Spaces
51
the mechanism used to identify the extrema is significantly different. As discussed in Section 4.1, because the extrema definition has a major influence on the sieve’s performance the VAMS and CCS can produced substantially different results. The starting point for the VAMS’s extrema definition is the vector ! extrema ð x ve Þ of Eq. (2.7). However, unlike the CCS that applies a binary extreme/non-extreme decision at each individual pixel, the VAMS uses a slightly different approach. First, a scalar image is derived from the vector data by calculating the aggregate difference from each pixel to its neighbors using Eq. (2.5). Then, for each flat zone containing more than one pixel, the aggregate distances are averaged and the mean value assigned to all its pixels. Finally, the extrema are identified as the maxima in the scalar surface of all the aggregate distances. In this approach, the extrema are those regions whose sum of distances to their connected neighbors is greater than that of any neighboring region. This process is illustrated in Figure 5a, which shows the example motion field from Figure 3 with the flat zones of area > 1 clearly identified and the corresponding scalar surface. Note that in comparison with the CCS, where all but one flat zone was classified as extreme, the VAMS produces only four initial extrema. The VAMS uses the Euclidean distance to identify the region with which to merge in step 2 of the algorithm. This approach is very similar to the CCS; the only difference is how ties are resolved, where the VAMS simply uses scan order. New extrema created by the merging process are themselves merged and the scalar image is updated. The sieving process is shown in Figure 5, where parts b through d show the vector field, its flat zones, and the corresponding scalar images up to scale l ¼ 2 when all the extrema have an area 2. In this example, the merged regions are assigned the vector of the non-extreme region, as is conventional. Figure 6 shows the results of applying the VAMS to the Lily test image of Figure 4a. As the scale increases (Figure 6b to d), patches of extreme color are removed from the image—for example, from the lily’s petals— without altering the position of any other image boundaries. Figure 6a shows the initial extrema, which constitute approximately 6% of image regions. Comparing the initial extrema for the VAMS to those produced by the CCS for the same image in Figure 4b, which has more than 80% of the regions classified as extreme, it can be seen that the VAMS has a much less aggressive sieving action. As discussed in Section 4.1, other choices can be made in implementing the various steps of the VAMS algorithm. Some of these possibilities are examined with respect to the noise-reduction performance of the VAMS in the following section.
52
Adrian N. Evans
(a) 1,0
3,0
8,7
2,0
2,0
2,0
10.1
12.0
28.5
4.0
4.0
4.0
8,–4
2,–1
1,–1
2,0
2,0
2,0
21.2
10.1
14.5
4.0
4.0
4.0
3,0
1,–1
2,0
2,0
2,0
2,0
10.9
5.7
4.0
4.0
4.0
4.0
1,–1
2,–1
2,0
2,0
–6,–6
2,0
2.2
3.0
4.0
4.0
33.6
4.0
1,–1
2,–1
2,0
–6,–5 –9,–8 1,–1
2.2
3.0
4.0
33.4
33.7
15.0
1,–1
2,0
2,0
3,0
2,0
2,0
2.2
4.0
4.0
14.5
11.9
11.9
2,–1
2,0
2,0
1,–1
1,0
–4,–3
2.0
4.0
4.0
4.7
7.8
12.5
(b) 1,0
3,0
3,0
2,0
2,0
2,0
4.0
3.3
3.3
3.5
3.5
3.5
3,0
2,–1
1,–1
2,0
2,0
2,0
3.9
4.8
6.1
3.5
3.5
3.5 3.5
3,0
1,–1
2,0
2,0
2,0
2,0
3.9
5.7
3.5
3.5
3.5
1,–1
2,–1
2,0
2,0
–6,–6
2,0
2.2
3.0
3.5
3.5
24.8
3.5
1,–1
2,–1
2,0
–6,–5 –6,–6 1,–1
2.2
3.0
3.5
30.2
24.8
11.4
1,–1
2,0
2,0
3,0
2,0
2,0
2.2
3.5
3.5
14.5
7.2
7.2
2,–1
2,0
2,0
1,–1
1,0
1,0
2.0
3.5
3.5
4.7
1.5
1.5
3,0
3,0
3,0
2,0
2,0
2,0
1.9
1.9
1.9
3.5
3.5
3.5
3,0
2,–1
2,–1
2,0
2,0
2,0
1.9
3.6
3.6
3.5
3.5
3.5
3,0
1,–1
2,0
2,0
2,0
2,0
1.9
5.7
3.5
3.5
3.5
3.5
1,–1
2,–1
2,0
2,0
–6,–6
2,0
2.2
3.0
3.5
3.5
26.5
3.5
1,–1
2,–1
2,0
–6,–6 –6,–6 1,–1
2.2
3.0
3.5
26.5
26.5
11.4
1,–1
2,0
2,0
3,0
2,0
2,0
2.2
3.5
3.5
15.1
7.2
7.2
2,–1
2,0
2,0
1,–1
1,0
1,0
2.0
3.5
3.5
4.7
1.5
1.5
(c)
(d) 3,0
3,0
3,0
2,0
2,0
2,0
1.8
1.8
1.8
3.5
3.5
3.5
3,0
2,–1
2,–1
2,0
2,0
2,0
1.8
2.7
2.7
3.5
3.5
3.5 3.5
3,0
2,–1
2,0
2,0
2,0
2,0
1.8
2.7
3.5
3.5
3.5
1,–1
2,–1
2,0
2,0
–6,–6
2,0
2.2
2.7
3.5
3.5
26.5
3.5
1,–1
2,–1
2,0
–6,–6 –6,–6 1,–1
2.2
2.7
3.5
26.5
26.5
11.4
1,–1
2,0
2,0
3,0
2,0
2,0
2.2
3.5
3.5
15.1
7.2
7.2
2,–1
2,0
2,0
1,–1
1,0
1,0
2.0
3.5
3.5
4.7
1.5
1.5
FIGURE 5 VAMS operation using 4-neighbor connectivity and the L2 norm. (a) Original vector image with flat zones of area > 1 identified using color and corresponding scalar image, with extrema colored red. (b)-(d) Corresponding vector and scalar images as extrema of area 2 are merged.
Color Area Morphology Scale-Spaces
(a)
(b)
(c)
(d)
53
FIGURE 6 VAMS example for Lily test image using 8-connectivity. (a) Initial extrema (black), (b) area ¼ 9, (c) area ¼ 49, and (d) area ¼ 81.
4.3.1. VAMS Parameter Evaluation This section looks at the effect of some of the different choices that can be made at the VAMS algorithm’s steps on the sieve’s performance. In particular, the influence of these on the noise-reduction performance of the VAMS is considered. The first choice to be considered is the distance metric used to form scalar images from the aggregate distances, which consequently affects the proportion of regions classified as image extrema. Figure 7 shows the aggregate distances for the Lily test image using the L2 and Lae2 norms (as described in Section 4.1.2.). These scalar images have 2118 and 2209 initial extrema for the L2 and Lae2 norms, respectively. To evaluate the noise-reduction performance both the normalized mean square error (NMSE) and mean chromaticity error (MCRE) were used, as these are widely used metrics for determining objective error measures for color images (Trahanias and Venetsanopoulos, 1993b). The NMSE is given by
54
Adrian N. Evans
(a)
(b)
FIGURE 7 VAMS aggregate distance images using (a) L2 norm and (b) Lae2 metric.
PM PN NMSE ¼
! f 0 ði; jÞk2 ; PM PN ! 2 i¼0 j¼0 k f ði; jÞk
i¼0
!
j¼0 k f ði; jÞ
(2.10)
! ! where f ði; jÞ and f 0 ði; jÞ are the original and filtered vectors at pixel (i, j), and M and N are the image dimensions. The MCRE is defined as ! ! the distance between the intersection points of f ði; jÞ and f 0 ði; jÞ and the Maxwell triangle. When the triangle is defined on the unit plane, this gives 1 XM XN h ! ! MCRE ¼ ð rði; jÞ r0 ði; jÞÞ2 i¼1 j¼1 MN i1=2 ! ! ! ! þð gði; jÞ g0 ði; jÞÞ2 þ bði; jÞ b0 ði; jÞÞ2 ; (2.11) where r, g, and b are the normalized RGB values (Trahanias and Venetsanopoulos, 1993b). A set of six test images (Lily, Autumn, Sample1, Lenna, Boats, and Baboon) were corrupted with uncorrelated Gaussian (s2 ¼ 1000), 10% impulsive, and mixed noise using the noise model of Viero et al. (1994). The images were sieved using the VAMS with the extrema identified using the L1 and L2 norms, the differences in luminance (LY), angle (Ly), and the combined distance and angle differences Lae1 and Lae2. All images were sieved up to area l ¼ 12 and, for each image, the minimum NMSE between the filtered and the original, noise-free image was found. For all aggregate distances used to identify the extrema, the L2 norm was used to select the region with which to merge in step 2 of the algorithm, and the merged region was assigned the color of the non-extreme region. Figure 8 shows the average NMSE results for the six test images. The MCRE at the scale that produced the minimum NMSE was also recorded and these results are also shown. All choices for the extrema definition
55
Color Area Morphology Scale-Spaces
NMSE/1e−2 15 10
5
MCRE/1e−2 20
Input L1 L2 LY Lq Lae2 Lae1
15
10
3 5
2
3
1
2 0.5 Input
Gaussian Impulsive
Mixed
Input
Gaussian Impulsive
Mixed
FIGURE 8 VAMS results using various distance metrics to calculate the aggregate distances used for extrema calculation. ' IEEE 2008.
result in a VAMS that reduces the image noise, with the reduction greater for impulsive noise than for Gaussian noise. Using the luminance difference LY produced the worst noise-reduction performance for all noise types for both the NMSE and the MCRE. The performance of the L1 and L2 norms was broadly similar, with the L2 norm having a slight advantage for impulsive noise. Both combined distance metrics performed well and had a slightly lower NMSE and MCRE than the L1 and L2 norms for Gaussian and mixed noise. Their results were comparable for impulsive noise. The angular difference Ly produced the lowest MCRE for Gaussian and mixed noise. Also shown are the results for the VAMS applied to the original image, which give a measure of the minimum distortion the sieve will introduce at small scales. The six distance metrics described above were also used to investigate the influence of the distance metric in selecting the region with which to merge on the noise-reduction performance. As before, the noise reduction performance was evaluated using the NMSE and MCRE (Figure 9). Here, regardless of the method used to find the closest region, the L2 norm was used to identify the extrema and distance ties were resolved using luminance. The results show that the angular difference introduced the greatest distortion on the original, noise-free image and also performed poorly for all noise types. For example, with Gaussian noise it does not produce any improvement on the noisy image. The L1 and L2 norms produce slightly lower NMSE than the combined metrics, with the L1 norm performing particularly well for impulsive noise. Overall the noise reductions produced by the VAMS are greater for impulsive noise than for the other noise types.
56
Adrian N. Evans
MCRE/1e−2
NMSE/1e−2 15 10
5
Input L1 L2 LY Lq Lae2 Lae1
20 15
10
3 5
2
3
1
2 0.5 Input
Gaussian Impulsive
Mixed
Input
Gaussian Impulsive
Mixed
FIGURE 9 VAMS results using various distance metrics to select the region with which the extrema will merge. ' IEEE 2008.
The final VAMS choice to be considered is the color to assign to the merged regions. With grey-scale connected sieves, the convention is to assign the merged, region the value of the non-extreme region. Some other alternatives have been suggested for example, Salembier and Garrido (2000) proposed using the value of the largest of the two regions being merged. The color equivalent of this is simply choosing the color of the largest region. Other options also can be considered, such as selecting the mean or the marginal median of the regions being merged. Although this method does not preserve the original vectors, in color morphology this approach has been found advantageous for noise reduction (Aptoula and Lefe`vre, 2007). If vector preservation is desired, an intermediate approach can be used in which the central vector is used as a ‘‘guide’’ and the merged region is then assigned the closest color to the guide from the colors of the input regions. Figure 10 presents the NMSE and MCRE results using the different mechanisms to select the color to assign the merged region. In particular, the merged region is assigned the color of the non-extreme region, the mean color, the marginal median, and the mean and median guided colors, as described above. For all noise types, selecting the color of the non-extreme region results in the lowest NMSE and MCRE. For impulse noise, this can be explained by the fact that the noise gives rise to extrema that should be completely removed from the image without affecting the surrounding values. For Gaussian and mixed noise, the desideratum to completely remove extrema applies to a lesser degree but the nonextreme region’s color still proves the best choice. This evaluation also
Color Area Morphology Scale-Spaces
6
NMSE/1e−2
14
57
MCRE/1e−2
Non−extreme
5
Mean
12
Median
4
Mean-guided
10
Median-guided
8 3 6 2
4
1 0
2 Gaussian Impulsive Mixed
0
Gaussian Impulsive Mixed
FIGURE 10 VAMS results produced using different mechanisms to determine the colors of the merged regions. ' IEEE 2008.
has been performed with other color area morphology scale-space sieves and the non-extreme region proves the best choice in nearly all cases (Gimenez, 2007). For the other choices, the mean and mean-guided color choices perform poorly for impulsive and mixed noise, whereas the marginal median performs better than the median-guided.
4.4. Vector Area Morphology Open-Close Sieve The VAMOCS is a development of the VAMS that aims to improve the performance of the color area morphology scale-space sieve by increasing the number of image extrema (Gimenez and Evans, 2005, 2008). As the extrema act as seeds for the process of image simplification through the merging of flat zones, increasing their numbers should make the filter more aggressive and consequently has the potential for improving the segmentation performance of the sieve. Unlike the CCS, where the use of the local convex hull results in a binary extreme/non-extreme decision for each flat zone, the scalar surface used by the VAMS allows more flexibility to be introduced to the sieve structure. To introduce more extrema, the VAMOCS classifies both the minima and maxima of the scalar surface as extreme. Unlike the maxima, which are true outliers, the minima are those flat zones with a difference in color to their neighboring regions that is less than those of their connected neighbors. For the example vector image in Figure 5a, there are two initial minima with values (1, 0) and (2, 1), in the top-left and bottom-left corners
58
Adrian N. Evans
of the image, respectively. Although the maxima and minima differ conceptually and are processed by openings and closings, respectively, in the VAMOCS structure both types of extrema are treated identically and are merged with their closest neighbor in step 2 of the algorithm. However, the choice of color to assign the merged region may differ according to whether the extremum is a maximum or a minimum. Processing the minima in this manner merges those flat zones whose color is similar to the surrounding colors and creates larger flat zones in regions with little variation in color. This approach is similar to that of Salembier et al. (1998), who proposed the use of a bound on the maximum allowable grey-scale fluctuations in response to the observation that visual entities may not be strictly flat. When the minima regions are merged particular care must be taken with the selection of the color to assign to the merged region. To try to ensure that the choice reflects a color that is representative of the region to be merged the color of the flat zone with the largest area was selected, an approach previously proposed for grey-scale images (Salembier and Garrido, 2000). The VAMOCS also has a further refinement, introduced to improve the segmentation performance. This is to normalize the distance metric for flat zones by their perimeters rather than their areas (Gimenez and Evans, 2005). The VAMOCS extrema using this normalization for the Lily test image are shown in Figure 11. For comparison, the grey-scale maxima and minima for the luminance image are also shown. Both the VAMOCS and the grey-scale sieve define approximately 10% of the pixels as extreme, although the distributions of these extrema differ. The sieve results for increasing the number of image extrema by including the minima of the scalar surface are presented in Figure 12 for the Lily test image. Here, the VAMS, VAMOCS, and CCS were used to sieve the image to scales 10, 100, 1000, and 10,000. For comparison the (a)
(b)
FIGURE 11 Maxima (black) and minima (white) for Lily test image. (a) Grey-scale extrema and (b) VAMOCS extrema.
Color Area Morphology Scale-Spaces
59
VAMOCS closings are also shown, where only the minima are selected as extrema. The figure illustrates that the actions of the VAMOCS openings and closings are complementary; processing the maxima removes outliers while using closings to remove minima extends the image regions that are relatively flat, leaving isolated extrema unaffected. The VAMOCS (column 3 of Figure 12) combines the effects of openings and closings to produce an area morphology color sieve whose filtering action is similar in aggressiveness to the CCS. However, this aggressiveness is achieved with far fewer extrema, as can be seen by comparing Figure 11(b) with the CCS extrema in Figure 4b.
(a)
(b)
(c)
(d)
VAMS
VAMOCS (closings)
VAMOCS (openings+ closings)
CCS
Area = 10.
Area = 100.
Area = 1000.
Area = 10,000.
FIGURE 12 Color sieve results for Lily test image using 8 connectivity. Columns 1 to 4: VAMS, VAMOCS, (closings only) and VAMOCS (combined openings and closings) and CCS.
60
Adrian N. Evans
TABLE 1 Area morphology scale-space sieves extrema for Lily image* Scale
VAMS CCS VAMOCS GS-AOC
1
2
10
50
100
500
1000
2209 41207 33616 41207 4388 41207 4062 41207
1054 37199 12852 16402 2353 33882 1973 36553
151 27547 2211 2878 476 14326 424 28457
21 19660 456 572 112 3856 110 20697
8 16102 223 279 16 1845 63 17040
1 8024 44 53 8 298 11 9114
1 6871 23 25 9 132 7 6150
* Expressed as fractions: number of flat zones classified as extreme to the total number of flat zones.
The proportion of VAMOCS flat zones defined as extreme is further characterized in Table 1 where, for comparison, results for the VAMS, CCS, and grey-scale area open-close (GS-AOC) sieves are also shown. The CCS has 82% of its flat zones initially classified as extreme, and the number of regions rapidly reduces with increasing area until there are only 25 regions when the scale reaches 1000. The VAMOCS has only 10.6% of regions initially defined as extrema, but its aggressive action rapidly reduces the total number of flat zones until only 132 remain at scale 1000. In contrast, the GS-AOC sieve starts with 9.9% of initial extrema but still has more than 600 flat zones at scale 1000. The VAMS has the fewest initial extrema and still has nearly 7000 extrema at scale 1000, confirming its comparatively nonaggressive action.
5. PERFORMANCE EVALUATION The aim of this section is twofold. First, the implementation used for the color sieves is described and the relationship between the extrema definition and the computational complexity investigated. Second, the suitability of the color area-morphology scale-spaces for image segmentation is investigated. Segmentation is one of the most important problems of color image analysis; it has been widely considered elsewhere and has been the subject of several survey papers (Cheng et al., 2001; Lucchese and Mitra, 2000). Color image segmentation was also the focus of Gimenez and Evans (2005, 2008), from which material in this section is drawn. Although only the RGB color space is used in this work, the relationship between color image segmentation and the color space used is a topic in its own right and has been addressed at length by Busin et al. (2008). The use of the color area morphology scale-spaces in other color spaces has been investigated (Gimenez, 2007), as has their use for image noise reduction (Gimenez, 2007; Gimenez and Evans, 2008).
Color Area Morphology Scale-Spaces
61
5.1. Implementation and Timings An approach based on Vincent’s (1993b) pixel-queue algorithm has been proposed to implement the color area-morphology scale-space sieves (Gimenez, 2007; Gimenez and Evans, 2008). Although other, more efficient algorithms exist for grey-scale sieves, the pixel-queue lends itself to a straightforward adaption to the color sieve algorithm presented in Section 4.1. Figure 13 presents the algorithm used to implement the color sieves; a brief explanation of its steps follows. In line 1, all flat zones identified as extreme are placed in a list according to its area, each list being a first-infirst-out (FIFO) queue. The lists are processed from area 1 to l (line 2) until no extrema of area l remain. The use of the while loop in line 3 ensures that any new extrema created by the merging process (line 6) that are smaller than the current area are processed, thus ensuring idempotency. New extrema can be created only in the vicinity of the merging regions, and therefore this check in line 7 can be performed locally. In addition to creating new extrema, the merging process can also result in regions that were classified as extrema that became non-extreme. These cases could be detected and removed from the extrema list. Alternatively, line 5 performs a simple test on each previously identified extrema—to confirm that it is still extreme—before the merging process. The CCS and VAMS implementations both follow this algorithm, differing only in how they identify the extrema. The VAMOCS requires a slight modification as the choice of color to assign to the merged region is different for maxima and minima. Therefore, the extrema must either be placed in a separate list or be distinguished before the merging process— for example, by determining the extrema type in line 5. 1.
Extract all extremum flat zones and place in list extrema(area), according to their area
2.
For area = 1 to l:
3. 4.
While extrema(i) " i = 1, 2, ... , area are not empty For each flat zone in extrema(area) If flat zone is still an extremum,
5. 6.
Merge region with its closest neighbor;
7.
Append any new extrema created by merging process to appropriate list; Else remove flat zone from extrema(area).
8.
FIGURE 13
Color area morphology scale-space sieve algorithm.
62
Adrian N. Evans
The color area morphology scale-space sieve algorithm has several properties in common with its grey-scale counterpart. These include idempotence, strong casuality, and a reduction in the total number of flat zones with increasing scale. However, the outputs of the color sieves are not invariant to the order in which the extrema are merged. A similar situation is found with grey-scale sieves, where different results are obtained according to whether the maxima or minima are processed first (i.e., whether the sieve is an AOC or an ACO). Color sieves typically have many adjacent extrema and the order in which they are processed can have a more significant influence on the final result. Using the above algorithm, the color area morphology sieves were implemented in Cþþ, then compiled and run as mex files within Matlab. The processing times and proportion of regions classified as extreme using 8-connectivity and the L2 norm are shown in Figure 14. Results for the GS-AOC sieve applied to the luminance component are also included. To provide a meaningful comparison, results for a pixel-queue implementation of a GS-AOC sieve are also included, although it should be noted that more computationally efficient algorithms have been developed for the grey-scale case (Meijster and Wilkinson, 2002).
(a)
(b) 102
18 16 14
101
12 Time (s)
Proportion of extrema
VAMS CCS GS-AOC VAMOCS
100
10 8 6
10−1
4 VAMS CCS GS-AOC VAMOCS
10−2 0 10
101
2 102
Area
103
0 100
101
102 103 Area
104
FIGURE 14 Color sieve performance for Lily image. (a) Proportion of regions classified as extreme and (b) processing times.
Color Area Morphology Scale-Spaces
63
The processing times can largely be explained by considering the proportion of extreme regions. The CCS has a high proportion of extrema and the calculation of the convex hull is also relatively expensive. These factors are reflected in a high processing time for low values of area. However, because the total number of regions rapidly reduces with scale (see Table 1) the processing time is relatively constant above an area of 100. The VAMS has a lower proportion of extrema than the GSAOC sieve but has a slightly higher processing time below an area of 1000 due to its extrema definition being more computationally expensive to calculate. Compared to the VAMS, the more complex structure of the VAMOCS is reflected by its higher processing times. However, its total number of regions reduces more rapidly because of its more aggressive action. Therefore, the computational advantage of the VAMOCS increases with increasing scale and it is the quickest color sieve for areas > 1000.
5.2. Application to Image Segmentation The ability of connected filters to simplify images by either completely removing or perfectly preserving edges means that they are well suited for image segmentation, and their application to this problem has been proposed by several authors (for example, see Crespo et al., 1997; Serra, 2008; Soille, 2008). For grey-scale images, a formal relationship between connected operators and segmentation algorithms based on region merging has been established (Gatica-Perez et al., 2001). One useful approach is to derive a tree-based image representation from the scale-space and apply an appropriate pruning technique. This strategy has been demonstrated for scale-spaces from area morphology and other connected operators (Salembier et al., 1998; Salembier and Garrido, 2000). This section explores the potential of the color area morphology scalespaces for image segmentation and contains some material drawn from Gimenez and Evans (2008). The intention here is not to propose a pruning strategy that is optimal in a segmentation sense. Instead, the ability of the color sieves’ tree representations to contain semantically meaningful segmentation is investigated by applying the sieves over a range of scales, such that the sieved images have a predetermined target number of regions remaining, and then quantitatively assessing the resulting segmentations. All experiments were performed in the RGB color space and the L2 norm was used both to determine the extrema and to select the regions with which to merge. Figure 15 shows the trees for 500 regions produced by the GS-AOC and the color sieves on the Lily test image. Although the target number of regions is the same, there are marked differences in the resulting tree structures. To provide an objective measure of the segmentation performance, the Berkeley Segmentation Dataset and Benchmark is used (available
64
Adrian N. Evans
(a)
(b)
GS-AOC (c)
CCS (d)
VAMS
VAMOCS
FIGURE 15 Example trees for Lily test image (Figure 4a), sieved to 500 regions.
from http://www.cs.berkeley.edu/projects/vision/grouping/segbench) (Martin et al., 2001, 2004). The public dataset contains 300 (100 test and 200 training) images and the segmentation results produced by a number of human observers, using both color and grey-scale versions of the images (Martin et al., 2001). An example color image and its human segmentations are shown in Figure 16. Using each collection of human segmentations to provide ground truth, a methodology for quantifying the segmentation performance is also proposed (Martin et al., 2004). This methodology uses precision-recall (P-R) curves to characterize the segmentation performance, where precision measures the probability that a detected boundary pixel is contained in the ground truth and recall is the probability of detecting a true boundary pixel. Although other quantitative evaluation methods exist (for example, those described in Busin et al., 2008), P-R curves are particularly well adapted for color image segmentation. P-R curves for each of the 100 color test images were found by applying the color area morphology scale-space sieves with increasing area until the number of regions was less than a predetermined threshold. For each target number of regions the precision and recall values were recorded. Predetermined thresholds for the number of regions were used
Color Area Morphology Scale-Spaces
(a)
65
(b)
FIGURE 16 Example image from the Berkeley Segmentation Dataset. (a) Koala image and (b) collection of human segmentations.
in preference to explicitly varying the area parameter as it is less dependent on image content and also allows direct comparison with sieves results using other attributes. The approach is also compatible with the dataset ground truth images that contain a small number of equally important regions. To enable a comparison between different P-R curves the F-measure can be calculated for all points on the curves. The F-measure is given by the harmonic mean of precision and recall which can be simply calculated by F-measure ¼
2 Precision Recall ; Precision þ Recall
(2.12)
and the maximum value of the F-measure along a P-R curve provides a single number characterizing the segmentation performance (Martin et al., 2004). Figure 17 presents the P-R curves for the Koala test image shown in Figure 16a produced by the GS-AOC sieve, the CCS, the VAMS, and the VAMOCS. The position on each curve at which the maximum F-measure occurs is also shown and the corresponding values are given in the figure’s caption. The F-measures show that all that color sieves produce a better segmentation performance than the grey-scale sieve, with the CCS and VAMOCS producing the best F-measures of 0.66 and 0.65, respectively. The segmentation results for the maximum F-measures are
66
Adrian N. Evans
(b) 1
1
0.75
0.75 Precision
Precision
(a)
0.5
0.25
0
0.25
0
0.25
0.5 Recall
0.75
0
1
(c)
0
0.25
0.5 Recall
0.75
1
0
0.25
0.5 Recall
0.75
1
(d) 1
1
0.75
0.75 Precision
Precision
0.5
0.5
0.25
0
0.5
0.25
0
0.25
0.5 Recall
0.75
1
0
FIGURE 17 P-R curves for Koala test image. The maximum F-measure is marked by a bold dot in the figure with its exact value given in parentheses as follows: (a) GS-AOC sieve (0.54), (b) CCS (0.66), (c) VAMS (0.60), and (d) VAMOCS (0.65).
shown in Figure 18. These clearly show the improved performance of the color sieves over their grey-scale counterpart. In addition to improved segmentation performances, the color sieves typically achieve their best results with far fewer regions. For example, the CCS and VAMOCS results in Figure 18b and d have between 30 and 40 regions, whereas the GS-AOC segmentation has several thousand regions. When two segmentations have the same maximum F-measure, the segmentation with fewer regions is generally preferable. Therefore, the number of regions is important despite not directly contributing to the segmentation performance. To provide a more comprehensive evaluation the maximum F-measures from the P-R curves for all 100 color test images were found and the average values produced by each sieve calculated; these values are presented in
Color Area Morphology Scale-Spaces
(a)
(b)
(c)
(d)
67
FIGURE 18 Sieve segmentation results corresponding to the maximum F-measures in Figure 17 produced by (a) GS-AOC sieve, (b) CCS, (c) VAMS, and (d) VAMOCS.
Table 2. Also given are the number of regions at which the maximum F-measures occurred. These results confirm the initial impressions gained from the Koala image; compared with the grey-scale sieve, all the color area-morphology scale-spaces produce an improved segmentations with
68
Adrian N. Evans
TABLE 2 Average F-measure for 100 images from the Berkeley dataset* Morphological Sieve
GS-AOC CCS VAMS VAMOCS
Maximum F-measure
Number of Regions
0.37 0.49 0.40 0.51
3000 30 60 70
* Achieved by the CCS, area-normalized VAMS, perimeter-normalized VAMOCS, and the GS-AOC ASF applied to the luminance component.
fewer regions. The maximum F-measures for the color sieves are 0.03–0.14 higher than the grey-scale case and the number of regions is at least an order of magnitude lower. Comparing the performance of the color sieves, the VAMS has the lowest F-measure and this is achieved with, on average, the highest number of regions. The CCS has a significantly better segmentation performance than the VAMS, with an F-measure that is improved by 0.09. However, the inclusion of area closings in addition to openings by the VAMOCS results in the highest F-measure of 0.51, which is 0.02 above that of the CCS, although this is achieved with a slightly increased number of regions. Results using contrast as the attribute instead of area have also been reported (Gimenez and Evans, 2008) and show the same general trends.
5.3. Robustness to Noise One important aspect of segmentation algorithms is their sensitivity to perturbations of the input image, for example, by noise. To this end, the robustness of the color area morphology scale-spaces was investigated by applying the color sieves to original and noise-corrupted images. The resulting segmentations were compared, again using the quantitative evaluation methodology provided by the Berkeley Segmentation Dataset and Benchmark. For each noise-free image the segmentation that produced the maximum F-measure on the P-R curve was used as the best segmentation result. The images were then corrupted by different levels of impulsive and Gaussian noise, and the resulting segmentations were compared to the best noise-free results produced by the same sieves. In practice, this was accomplished by generating P-R curves for the noisy images, using the noise-free segmentation results as the ground truths. The maximum F-measure then provides a measure of how closely the segmentation results of the noisy images match those of the noise-free cases. The results for one image from the dataset are shown in Figure 19. In the figure, the overlaps between the original and noise-corrupted
Color Area Morphology Scale-Spaces
(a)
Original image and collection of human segmentations
(b)
CCS results for impulsive noise (left) and Gaussian noise (right)
(c)
VAMS results for impulsive noise (left) and Gaussian noise (right)
69
(d) VAMOCS results for impulsive noise (left) and Gaussian noise (right)
FIGURE 19 Segmentation results for image corrupted by 10% impulsive noise and Gaussian noise (s2 ¼ 103). Segmentation results for noise-corrupted images are shown in blue, noise-free segmentation results are shown in yellow, and overlaps are shown in black.
segmentations are shown in black and the noise-free and noisy segmentations in yellow and blue, respectively. If the introduction of image noise caused no change in the segmentation result, then the segmentations
70
Adrian N. Evans
would be congruent and only the black overlap would be visible. In practice, the segmentations differ but are difficult to evaluate subjectively, confirming the need for a quantitative evaluation methodology to assess the segmentation performances. The 100 color test images from the dataset were corrupted by three levels of independent impulsive and Gaussian noise and the average F-measure was used to quantify the robustness of the results (Table 3). All color sieves show a decreasing F-measure as the level of noise increases, for both impulses and Gaussian noise. The VAMS has the lowest average F-measure for all noise types and levels, implying that is the least robust. The CCS is the most robust for a low and median level of impulsive noise. The VAMOCS is the most robust for a high level of impulsive noise and all levels of Gaussian noise.
6. CONCLUSIONS The large majority of color morphology work to date has been concerned with structural techniques. In comparison, the development of multichannel connected filters is an area that is somewhat more inchoate though recent contributions—for example, those of Serra (2008) and Soille (2008)—show that it is gathering momentum. Area morphology scalespace sieves have proved useful for image analysis and have been widely applied to image segmentation. Their extension to the color domain offers some attractions but also several problems that must be overcome. Here, a vectorial approach is preferred as simply applying a grey-scale sieve to individual channels can result in color bleeding and perceived edge shifts. Although there is an ordering problem that must be addressed, in multichannel images the distinction between maxima and minima that must be accommodated in the grey-scale case vanishes, resulting in an inherently balanced treatment. TABLE 3 Robustness of color sieve segmentations to (Independent) image noise* Noise Type
0.1% Impulsive 1% Impulsive 10% Impulsive Gaussian (s2 ¼ 101) Gaussian (s2 ¼ 103) Gaussian (s2 ¼ 104)
CCS
VAMS
VAMOCS
0.670 0.590 0.543 0.537 0.490 0.431
0.478 0.452 0.456 0.436 0.452 0.415
0.610 0.577 0.568 0.558 0.499 0.436
* Average F-measures between noisy and noise-free segmentations for 100 images from the Berkeley dataset.
Color Area Morphology Scale-Spaces
71
A detailed description of the underlying algorithm of three such color area morphology sieves—the CCS, the VAMS, and the VAMOCS—has been presented and their different mechanisms for determining extrema described. For the CCS, the extrema definition is binary, resulting in a high proportion of extrema and an aggressive sieving action. However, some discretion in the merging process is possible. The VAMS has additional flexibility resulting from its capacity to accommodate different distance metrics in its extrema definition. Its scalar surface also can be processed using closings, giving rise to the VAMOCS. Although it has an increased proportion of extrema—and hence a more aggressive action— the complexity of the VAMOCS is still comparable with that of the VAMS. The color area morphology filters are idempotent by virtue of the fact that they process any new extrema created by the merging process at each scale. However, they do lack invariance to the order in which the extrema are processed, which may be an issue in some applications. The application of the color area morphology scale-spaces for segmentation has been evaluated using the Berkeley Segmentation Dataset and Benchmark and an improvement on the GS-AOC sieve shown. The VAMOCS produced the best overall segmentation performance and is also the most robust to image noise. Although not competitive with state-of-theart color segmentation techniques, the results show the potential of the scalespaces to generate trees from which semantically meaningful segmentations can be extracted. The scale-spaces are also suitable for use with other multivariate images, for example, from remote sensing applications.
ACKNOWLEDGMENTS The contribution by D. Gimenez during the course of his PhD. work to the material on which this chapter is based is gratefully acknowledged.
REFERENCES Acton, S., and Mukherjee, D. (2000). Scale-space classification using area morphology. IEEE Trans. Image Process. 9(4), 623–635. Androutsos, D., Plataniotis, K., and Venetsanopoulos, A. (1998). Distance measures for color image retrieval. In ‘‘IEEE International Conference on Image Processing,’’ Vol. 2, pp. 770–774. Angulo, J., and Serra, J. (2003). Color segmentation by ordered mergings. In ‘‘IEEE International Conference on Image Processing,’’ Vol. 2, pp. 125–128. Angulo, J., and Serra, J. (2007). Modelling and segmentation of colour images in polar representations. Image Vision Comput. 25(4), 475–495. Aptoula, E., and Lefevre, S. (2007). A comparative study on multivariate mathematical morphology. Pattern Recogn. 40(11), 2914–2929. Astola, J., Haavisto, P., and Neuvo, Y. (1990). Vector median filters. Proc. IEEE 78, 678–689.
72
Adrian N. Evans
Bangham, J., Harvey, R., Ling, P., and Aldridge, R. (1996). Morphological scale-space preserving transforms in many dimensions. J. Electron. Imaging 5(3), 283–299. Barnett, V. (1976). The ordering of multivariate data. J. Roy. Stat. Soc. A 139(Part 3), 318–343. Breen, E., and Jones, R. (1996). Attribute openings, thinnings, and granulometries. Comput. Vision Image Und. 64, 377–389. Brunner, D., and Soille, P. (2007). Iterative area filtering of multichannel images. Image Vision Comput. 25(8), 1352–1364. Busin, L., Vandenbroucke, N., and Macaire, L. (2008). Color spaces and image segmentation. Adv. Imaging Electron Phys. 151, 65–168. Cheng, F., and Venetsanopoulos, A. (1992). An adaptive morphological filter for image processing. IEEE Trans. Image Proc. 1, 533–539. Cheng, H., Jiang, X., Sun, Y., and Wang, J. (2001). Color image segmentation: advances and prospects. Pattern Recogn. 34(12), 2259–2281. Comer, M., and Delp, E. (1999). Morphological operations for color image processing. J. Electron. Imaging 8(3), 279–289. Crespo, J., Schafer, R., Serra, J., Gratin, C., and Meyer, F. (1997). The flat zone approach: a general low-level region merging segmentation method. Signal Process. 62(1), 37–60. Evans, A. N. (2003a). Extending area morphology to multivariate images. In ‘‘Proc. 6th IEEEEURASIP Workshop on Nonlinear Signal and Image Processing’’. Evans, A. N. (2003b). Vector area morphology for motion field smoothing and interpretation. IEE Proc.Vision Image Signal Process 150(4), 219–226. Evans, A. N., and Gimenez, D. (2008). Extending connected operators to colour images. In ‘‘International Conference on Image Processing,’’ pp. 2184–2187. DOI: 10.1109/ ICIP.2008.4712222. Gatica-Perez, D., Gu, C., Sun, M., and Ruiz-Correa, S. (2001). Extensive partition operators, gray-level connected operators, and region merging/classification segmentation algorithms: theoretical links. IEEE Trans. Image Process. 10(9), 1332–1345. Gauch, J. (1999). Image segmentation and analysis via multiscale gradient watershed hierarchies. IEEE Trans. Image Process. 8(1), 69–79. Gibson, S., Bangham, J., and Harvey, R. (2003a). Evaluating a colour scale-space. In ‘‘Proc. British Machine Vision Conference,’’ doi: 10.1.1.108.6725. Gibson, S., Harvey, R., and Finlayson, G. (2003b). Convex colour sieves. In ‘‘Proc. 4th International Conference on Scale Space Methods in Computer Vision,’’ LNCS 2695, pp. 550–563. Gimenez, D. (2007). Colour morphological sieves for scale-space image processing. University of Bath, Ph.D. thesis. Gimenez, D., and Evans, A. N. (2005). Colour morphological scale-spaces for image segmentation. In ‘‘Proc. British Machine Vision Conference,’’ pp. 909–918. Gimenez, D., and Evans, A. N. (2008). An evaluation of area morphology scale-spaces for colour images. Comput. Vision Image Und. 110, 32–42. Hanbury, A., and Serra, J. (2001a). Mathematical morphology in the HLS colour space. In ‘‘12th British Machine Vision Conference,’’ Vol II, pp. 451–460. Hanbury, A., and Serra, J. (2001b). Morphological operators on the unit circle. IEEE Trans. Image Process. 10(12), 1842–1850. Jackway, P., and Deriche, M. (1996). Scale-space properties of the multiscale morphological dilation erosion. IEEE Trans. Pattern Analy. Machine Intell. 18(1), 38–51. Klein, J. C. (1976). Conception et re´alisation d’une unite´ logique pour l’analyse quantitative d’images. Ph.D. thesis, Nancy University, France. Koenderink, J. J. (1984). The structure of images. Biol. Cybern. 50, 363–370. Li, J., and Li, Y. (2004). Multivariate mathematical morphology based on principal component analysis: initial results in building extraction. Int. Arch. Photogrammetry 35(B7), 1168–1173.
Color Area Morphology Scale-Spaces
73
Louverdis, G., Vardavoulia, M., Andreadis, I., and Tsalides, P. (2002). A new approach to morphological color image processing. Pattern Recogn. 35(8), 1733–1741. Lucchese, L., and Mitra, S. (2000). Filtering color images in the xyY color space. In ‘‘IEEE Int. Conf. Image Process.,’’ Vol. III, pp. 501–503. Maragos, P. (1989). Pattern spectrum and multiscale shape representation. IEEE Trans. Pattern Analy. Machine Intell. 11(7), 701–716. Martin, D., Fowlkes, C., and Malik, J. (2004). Learning to detect natural image boundaries using local brightness, colour and texture cues. IEEE Trans. Pattern Analy. Machine Intell. 26(5), 530–549. Martin, D., Fowlkes, C., Tal, D., and Malik, J. (2001). A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ‘‘Proc. 8th Intl. Conf. Comput. Vision,’’ Vol. 2, pp. 416–423. Meijster, A., and Wilkinson, M. (2002). A comparison of algorithms for connected set openings and closings. IEEE Trans. Pattern Analy. Machine Intell. 24(4), 484–494. Meyer, F. (2000). Vector levelings and flattenings. In ‘‘Mathematical Morphology and Its Applications to Image and Signal Processing,’’ ( J. Goutsias, L. Vincent, and D. Bloomberg, eds.) pp. 51–60, Springer US, Kluwer. Meyer, F. (2004). Levelings, image simplification filters for segmentation. J. Math. Imaging and Vision 20(1-2), 59–72. Meyer, F., and Maragos, P. (2000). Nonlinear scale-space representation with morphological levelings. J. Visual Commun. Image R. 11(2), 245–265. Monasse, P., and Guichard, F. (2000). Fast computation of a contrast-invariant image representation. IEEE Trans. Image Process. 9(5), 860–872. Park, K., and Lee, C. (1996). Scale-space using mathematical morphology. IEEE Trans. Pattern Analy. Machine Intell. 18(11), 1121–1126. Perona, P., and Malik, J. (1990). Scale space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Analy. Machine Intell. 12(7), 629–639. Plaza, A., Martı´nez, P., Pe´rez, R., and Plaza, J. (2002). Spatial/spectral endmember extraction by multidimensional morphological operations. IEEE Trans. Geosci. Remote Sensing 40(9), 2025–2041. Salembier, P. (2008). Connected operators based on region-trees. In ‘‘IEEE international conference on image processing,’’ pp. 2176–2179. Salembier, P., and Garrido, L. (2000). Binary partition tree as an efficient representation for image processing, segmentation, and information retrieval. IEEE Trans. Image Process. 9(4), 561–576. Salembier, P., Meyer, F., Brigger, P., and Bouchard, L. (1996). Morphological operators for very low bit rate video coding. In ‘‘IEEE Int. Conf. Image Process,’’ Vol. III, pp. 659–662. Salembier, P., Oliveras, A., and Garrido, L. (1998). Antiextensive connected operators for image and sequence processing. IEEE Trans. Image Process. 7(4), 555–570. Salembier, P., and Serra, J. (1995). Flat zones filtering, connected operators, and filters by reconstruction. IEEE Trans. Image Process. 4(8), 1153–1160. Serra, J. (1982). Image Analysis and Mathematical Morphology. Academic Press, London. Serra, J. (1988). Image Analysis and Mathematical Morphology, Vol. II: Theoretical Advances. Academic Press, London. Serra, J. (2008). Advances in mathematical morphology: segmentation. Adv. Imaging Electron Phys. 50, 185–219. Serra, J. (2009). The ‘‘false colour’’ problem mathematical morphoogy and its application to signal and image processing. 9th International Symposium, ISMM 2009 Groningen, The Netherlands. Lecture notes in computer science, Vol. 5720, pp. 13-23. Springer Berlin/Heidelberg. Soille, P. (2003). Morphological Image Analysis: Principles and Applications. 2 SpringerVerlag, Berlin.
74
Adrian N. Evans
Soille, P. (2008). Constrained connectivity for hierarchical image decomposition and simplification. IEEE Trans. Pattern Analy. Machine Intell. 30(7), 1132–1145. Trahanias, P., and Venetsanopoulos, A. (1993a). Color edge detection using vector order statistics. IEEE Trans. Image Process. 2(2), 259–264. Trahanias, P., and Venetsanopoulos, A. (1996). Vector order statistics operators as color edge detectors. IEEE Trans. Systems, Man Cybern. 26(1), 135–143. Trahanias, P. E., and Venetsanopoulos, A. N. (1993b). Vector directional filters—a new class of multichannel image processing filters. IEEE Trans. Image Process. 2(4), 528–534. van den Boomgaard, R., and Smeulders, A. (1994). The morphological structure of images: the differential equations of morphological scale-space. IEEETrans. Pattern Analy. Machine Intell. 16(11), 1101–1113. Viero, T., Oistamo, K., and Neuvo, Y. (1994). Three-dimensional median-related filters for colour image sequence filtering. IEEE Trans. Circuits Systems Video Technol. 4(2), 129–142. Vincent, L. (1993a). Morphological area openings and closings for grey-scale images. In ‘‘Shape in Picture: Mathematical Description of Shape in Grey-level Images,’’ pp. 196–208. Springer-Verlag, Berlin. Vincent, L. (1993b). Morphological area openings and closings, their efficient implementation and applications In ‘‘Proceedings of the ‘‘EURASIP Workshop on Mathematical Morphological and its Application to Signal Processing,’’ May 1993,’’ pp. 22–27. Barcelona, Spain. Vincent, L. (1993c). Morphological gray scale reconstruction in image analysis: applications and efficient algorithms. IEEE Trans. Image Process. 2, 176–201. Weber, K., and Acton, S. (2004). On connected filters in color image processing. J. Electron. Imaging 13(3), 619–629. Wilkinson, M. (2008). Connected filtering by reconstruction: basis and new advances. In ‘‘IEEE International Conference on Image Processing,’’ pp. 2180–2183. Wilkinson, M. H. F., and Roerdink, J. B. T. M. (2000). Fast morphological attribute operations using Tarjan’s union-find algorithm. In ‘‘Mathematical Morphology and Its Applications to Image and Signal Processing,’’ pp. 311–320. Springer US, Kluwer. Witkin, A. P. (1983). Scale-space filtering. In ‘‘Proc. 8th Int. Joint Conf. Artificial Intelligence,’’ pp. 1019–1022. Young, N., and Evans, A. (2003). Psychovisually tuned attribute operators for pre-processing digital video. IEE Proc. Vision Image Signal Process. 150(5), 277–286. Zanoguera, F., and Meyer, F. (2002). On the implementation of non-separable vector levelings. In ‘‘Mathematical Morphology, Proceedings,’’ (H. Talbot and R. Beare, eds.), pp. 369–377. Commonwealth Scientific and Industrial Research Organisation, Sydney, Australia. Zhu, S.-Y., Plataniotis, K., and Venetsanopoulos, A. (1999). Comprehensive analysis of edge detection in color image processing. Optical Eng. 38(4), 612–625.
Chapter
3 Harmonic Holography Ye Pu, Chia-Lung Hsieh, Rachel Grange, and Demetri Psaltis
Contents
1. Introduction 2. Scattering of Optical Second Harmonics 2.1. SHG Compared with Two-Photon Fluorescence 2.2. Differences between Bulk and Nanoscale Harmonic Generations 2.3. Emission Power and Scattering Cross Section of Nanoscale SHG 3. Principle of Harmonic Holography 3.1. Coherence of Optical Harmonics 3.2. Harmonic Holography 3.3. Performance Estimation 3.4. Holography with Other Types of Nonlinear Emissions 4. Three-Dimensional Microscopy of Cells with Harmonic Holography 5. Future Developments 6. Conclusion References
75 80 81 83 86 93 93 93 100 102 102 105 106 107
1. INTRODUCTION Imaging is one of the most important ways we observe and study our world. However, even the most advanced imaging sensor is not looking at our world in the right way. The dynamic, three-dimensional (3D) world Optics Laboratory, School of Engineering, E´cole Polytechnique Fe´de´rale de Lausanne (EPFL), STI IOA LO, BM 4.107, Station 17, CH-1015 Lausanne, Switzerland Advances in Imaging and Electron Physics, Volume 160, ISSN 1076-5670, DOI: 10.1016/S1076-5670(10)60003-1. Copyright # 2010 Elsevier Inc. All rights reserved.
75
76
Ye Pu et al.
necessitates a four-dimensional (4D) representation for complete knowledge. On the other hand, 3D or 4D image sensors simply do not exist, and a two-dimensional (2D) array is insufficient to represent the real world. Scanning is the most common way to represent a higher-dimensional world using a lower-dimensional sensor. Although scanning in the temporal domain (i.e., time-lapse method) is natural, scanning in any spatial dimension at a finite speed inevitably interferes with the temporal domain, causing distorted results. Indeed, truly distortionless 4D imaging is achievable only through a time sequence of rapid, scanningless 3D images. This riddle is often typified in the study of biology through optical microscopy, where time-lapse studies are often crucial to observe changes or movements of specific molecules in three dimensions inside living cells or tissues. This, in fact, remains one of the grand open challenges in modern science. In addition to the highly complex 3D structures spanning a large range of length scale (Whitesides, 2003), living organisms by their nature are highly dynamic: Molecular processes such as protein, DNA, and RNA conformation changes take place in a timescale ranging from 100 fs to 100 sec (Brauns et al., 2002; Cheatham, 2004; Gilmanshin et al., 1997; Trifonov et al., 2005) while the organisms move and metabolize. Essentially, probing these processes exactly requires a 4D microscopy technique. Modern microscopy techniques have made great advances in probing life processes while compromising dimensionality. Confocal (Brakenhoff et al., 1979) and two-photon laser scanning microscopy (LSM) (Denk et al., 1990) obtain ‘‘optical sectioning’’ image planes by scanning the 2D plane point by point. Three-dimensional image volumes can be obtained by scanning in an additional dimension. Deconvolution wide-field microscopy (Agard et al., 1989) provides similar 3D capability by acquiring images in two dimensions while scanning in the third, and performs deconvolution over the entire image volume. Alternatively, 3D imaging was also achieved by applying the concept of X-ray computed tomography to optical microscopy (Huisken et al., 2004; Kawata, 1994; Sharpe et al., 2002), where the scanning was in the form of sample rotation. Although single-molecule sensitivity has been achieved with these farfield techniques (Nie et al., 1994; Schwille et al., 1999), their spatial resolution is classically limited by diffraction, and temporal resolution is greatly sacrificed due to the scanning. A number of recent techniques have attempted to break the ubiquitous diffraction limit. Hell and Stelzer (1992) developed 4Pi-confocal microscopy that improves the axial resolution by several folds in 3D imaging, where the sample is illuminated with counterpropagating focused light using two opposing objectives. Gustafsson and colleagues (1999) devised I5M based on incoherent interference on both the illumination and imaging sides and achieved better than 100-nm axial resolution. Hell and Wichmann (1994) demonstrated
Harmonic Holography
77
stimulated depletion (STED) microscopy, which recently achieved 5.8-nm resolution in the imaging of nitrogen-vacancy defects in diamond nanocrystals (Rittweger et al., 2009). Betzig et al. (2006) and Rust et al. (2006) developed photo-activated localization microscopy (PALM) and stochastic optical reconstruction microscopy (STORM), respectively, based on photo-switchable fluorophores and evanescent illumination microscopy. The same technique was also developed by Hess et al. (2006) simultaneously. A modified version of STORM was also devised recently that is capable of obtaining 3D images (Huang et al., 2008). Subwavelength resolution was also achieved through structured illumination microscopy (Frohn et al., 2000; Heintzmann et al., 2002; Neil et al., 1997) and its nonlinear version, saturated structured illumination microscopy (Gustafsson, 2005). In an excellent review, Hell (2007) summarizes these far-field optical methods in resolving features smaller than the diffraction limit. Whereas microscopy techniques have made great progress in resolving finer spatial structures in 2D, the temporal resolution for acquiring 3D image frames, or the ‘‘framing time,’’ has been relatively unchanged. This scenario can be seen more clearly if we put the above techniques in a survey map, as illustrated in Figure 1, where we assume an image volume of 1000 1000 1000 pixels is to be acquired. The 2D framing time for all variants of LSM is assumed to be 20 ms since video-rate LSMs have been available (Fan et al., 1999; Nakano, 2002; Rajadhyaksha et al., 1995). The framing time for 3D STORM is not due to scanning but rather the intrinsic integration time required to image sufficient number of fluorophores. The horizontal axis in Figure 1 represents the spatial spanning as the resolvable range in the depth dimension, and the vertical axis is the framing time spent to obtain the 3D image frame. Capabilities of microscopy technologies are represented with bars, whereas the spatiotemporal extent of a few selected biological phenomena is shown as blocks with a dashed outline. Dotted lines show the physical constraints for optical microscopy. Obviously, a phenomenon is only resolvable by a particular technology when the corresponding block falls above the bar representing that technology. Figure 1 suggests that many important processes are not resolvable due to the long 3D framing time, and thus long-term 3D observation of these highly dynamic processes is beyond the reach of the current microscopy techniques. Here we limit our discussion in 3D imaging using far-field optical techniques only. Near-field scanning optical microscopy and many variants of scanning probe microscopy such as scanning probe microscopy are not included due to their 2D nature. Since holography was invented more than six decades ago (Gabor, 1948), an aberration-free, truly 3D microscope has long been sought. In fact, soon after Leith and Upatniek (1964) demonstrated the first 3D image by holography, holographic microscopy was proposed (Leith et al., 1965). With the recent advent of digital holography (Schnars and
78
Ye Pu et al.
Temporal spanning/3D framing time
STED SSIM CLSM SD-CLSM DC-WFM 2PLSM
3D storm
103 s
Mitosis
1s Neural signaling
1 ms 1 ms
DNA dynamics
Protein dynamics
3
Technologies Phenomena of interest
1 ns
2 1 Optical diffraction limit
1 H2 theoretical
1 ps 1 fs
H2 now practical
3 Penetration depth, NIR
H-O-H bend
1 nm
10 nm 0.1 mm 1 mm
2 Penetration depth, UV
10 mm 0.1 mm 1 mm 10 mm
Spatial spanning/vertical range
FIGURE 1 A brief survey on the 3D imaging capabilities of select microscopy techniques and a few examples of biological processes. Phenomena of interest are shown in light grey blocks with a dashed outline; capabilities of microscopy technologies are shown as bars. The process of water molecule bending is shown as an extreme case for reference. 3D framing time refers to the total time required to build one 3D spatiotemporally resolved frame of image data whether by volume scanning or some other mechanism. Calculations of framing time are approximate and are based on a 3D volume consisting of 1000 image planes with 1000 1000 pixels each plane. Methods based on scanning probe microscopy are not included. CLSM, confocal laser scanning microscopy; H2, harmonic holography; SD-CLSM, spinning-disk confocal laser scanning microscopy; 2PLSM, two-photon laser scanning microscopy; DC-WFM, deconvolution wide-field microscopy; SSIM, saturated structured-illumination microscopy; STED, stimulated emission depletion microscopy; STORM: stochastic optical reconstruction microscopy. The original form of photo-activated localization microscopy (PALM) and STORM are not included because of their 2D nature.
Juptner, 1994; Yamaguchi and Zhang, 1997), holographic 3D microscopy received renewed and rapidly increasing attention (Cuche et al., 1999; Dubois et al., 1999; Marquet et al., 2005; Miccio et al., 2007; Xu et al., 2003; Yamaguchi et al., 2001; Zhang and Yamaguchi, 1998). As a ‘‘whole-field’’ technique, holography records all 3D pixels simultaneously in one laser shot without scanning, which is extremely valuable for biomedical imaging. When combined with modern high-repetition rate pulsed laser and fast imaging devices, a time sequence of consecutive 3D images can be captured, forming a 4D microscope. Successful 4D holographic imaging has been achieved recently in the context of fluid velocity measurement (Pu and Meng, 2005).
Harmonic Holography
79
Despite the technical advancements, holographic microscopes are not widely deployed in biomedical research because of their lack of specificity. The interior of a biological cell is an extremely crowded world where substances ranging from a few nanometers to a few microns are densely packed. A holographic microscope would capture all scattering entities in the viewing field faithfully but indiscriminately. On the other hand, the signals of interest (often from small nanostructures like protein molecules) are usually very weak and buried in the strong ambient scatterings from much larger organelles. A key to achieve the specificity required for molecular biomedical imaging is to create a contrast between the useful signal and the ambient scatterings. Many mechanisms of contrast have been devised throughout the history of microscopy, including phase (Zernike, 1942), differential interference (Allen et al., 1981; Nomarski, 1969), wavelength (mostly fluorescence) (Giepmans et al., 2006; Konig, 2000; Michalet et al., 2005; Tsien, 1998), polarization (Aaron et al., 2008), plasmonic resonance ( Jacobsen et al., 2006), and thermal (Cognet et al., 2003) contrast. Recently intrinsic second harmonic response was also used to image certain collagen-rich biological structures without the need for extraneous contrast agents (Campagnola and Loew, 2003; Zipfel et al., 2003). Among these mechanisms of contrast, wavelength contrast in the form of fluorescence is by far the most widely used method in biomedical imaging because of its simplicity and molecular-level sensitivity (Gao et al., 2005; Konig, 2000; Pringle et al., 1991; Weiss, 1999). Fluorescence microscopy is routinely used to image a molecule or nanostructure of interest tagged with fluorescent agents, such as fluorescent dyes, green fluorescent proteins (GFPs) (Chalfie et al., 1994; Prasher et al., 1992; Tsien, 1998), and quantum dots (QDs) (Bruchez et al., 1998; Chan and Nie, 1998; Michalet et al., 2005). By converting the light signal into a different frequency, the unwanted ambient scattering can be easily removed with proper optical filters. However, the incoherent fluorescence signal lacks the capability of 3D representation, and the finite scanning speed constrains these techniques to mostly two dimensions. Attempts to extend fluorescence microscopy into three spatial dimensions over time (4D microscopy) (Gerlich and Ellenberg, 2003) resulted in a great sacrifice of framing time and are thus incapable of capturing dynamic events. In 2008, Pu et al. demonstrated a new holographic principle termed harmonic holography (H2), which devised a mechanism of contrast for 3D imaging through holography. The goal of H2 is not to surpass the existing technologies in spatial resolutions but rather to complement them in 3D framing time (see Figure 1). Although H2 is constrained within the diffraction limit in the spatial domain, its 3D framing time is theoretically limited only by the pulse width of the source laser. Furthermore, the
80
Ye Pu et al.
center of the tagging particles can be resolved with subwavelength accuracy, although two particles at that separation cannot be distinguished from each other (Ghosh and Webb, 1994; Schnapp et al., 1988). These features make H2 potentially a very powerful tool for imaging fast dynamics of complex 3D structures in the cell. Here we describe the principle and application of H2. We first describe the theoretical foundation for scattering of optical harmonics by nanocrystals in Section 2. In Section 3 we then discuss the coherence of optical second harmonics and explain the principle of H2, along with the first experimental demonstration of this principle. In Section 4 we describe the application of H2 in 3D harmonic holographic microscopy for cell imaging. In Section 5 we briefly discuss the future developments in H2. Our conclusion is found in Section 6.
2. SCATTERING OF OPTICAL SECOND HARMONICS When a material system is exposed to an intense optical field E(o), its optical properties such as dielectric constant can be modified by the applied field. Thus, the optical response of the material system is no longer in a linear relationship to the optical field. The polarization density of the material system can be generally modeled with a Taylor expansion series: P ¼ xð1Þ EðoÞ þ xð2Þ EðoÞEðoÞ þ xð3Þ EðoÞEðoÞEðoÞ þ ;
(3.1)
where the coefficient x(n) is the nth-order susceptibility of the material. The second-order response, x(2) E(o)E(o), gives rise to a radiation at exactly twice the frequency of the applied field. First discovered in 1961 by Franken et al., the phenomenon of optical frequency doubling is known as second harmonic generation (SHG). Because SHG is mostly a nonresonant process where Kleinman’s symmetry holds, by convention the tensorial second-order coefficient is often expressed in a contracted notation as x(2) ¼ 2d and the polarization density at the second harmonic frequency is calculated with (Boyd, 2002) Pð2oÞ ¼ 2dEðoÞEðoÞ 2
d11 ¼ 24 d21 d31
d12 d22 d32
2
d13 d23 d33
d14 d24 d34
d15 d25 d35
3 E2x 3 6 E2y 7 7 d16 6 6 E2 7 5 6 7; z d26 6 7 2E E x y 6 7 d36 4 2Ey Ez 5 2Ex Ez
(3.2)
where Ex, Ey, and Ez are field components in the crystal coordinate frame.
Harmonic Holography
81
SHG is specific to materials with noncentrosymmetric crystalline structures only, and x(2) vanishes for all other types of materials. Therefore, SHG satisfies the requirement to serve as a mechanism of contrast for biomedical imaging applications. A sharp contrast is formed when particles of noncentrosymmetric structures are dispersed in a medium of other species, excited at a fundamental frequency, and imaged at the second harmonic frequency. A number of material systems have been investigated for their SHG properties, including ZnO (Johnson et al., 2002; Kachynski et al., 2008), Fe(IO3)3 (Bonacina et al., 2007), KNbO3 (Nakayama et al., 2007), KTiOPO4 (Sandeau et al., 2007; Xuan et al., 2006), and BaTiO3 (Hsieh et al., 2009). This section describes the uniqueness of nanoscale SHG and lays a theoretical foundation for it, with BaTiO3 as one particular example.
2.1. SHG Compared with Two-Photon Fluorescence It is interesting to compare SHG and two-photon fluorescence side by side. Figure 2a shows the electron energy transition diagram ( Jablon´ski energy diagram) for SHG. In an SHG process, the total energy of the photon pair is smaller than the main energy band gap of the material (a)
(b) Singlet state
Excited state wp
S1
~10–15 s w = 2wp
wp S0 Ground state Second harmonic generation
wp S1 wp S0
~10–12 s
Triplet state ~10–9 s Intersystem >10–3s crossing T1 w < 2wp w < 2wp
Ground state Phosphorescence Fluorescence
FIGURE 2 Comparison between SHG and two-photon fluorescence. (a) SHG, where the energy of the photon pair is less than the lowest band gap. The bond electrons undergo only a brief ( a few femtoseconds) virtual transition before they emit a single photon with exactly doubled frequency and transit back to the ground state. Energy is conserved throughout process. (b) Two-photon fluorescence, where the energy of the photon pair is greater than the lowest band gap. The bond electrons absorb the photon pair and transit to a real energy level. After undergoing nonradiative energy decay, the excited electrons stay at the lowest excited state for a few nanoseconds before they emit a single photon with a frequency less than the doubled frequency. The process is lossy and the quantum efficiency is always less than 100%. Straight arrows represent transitions involving an absorption or emission of photons. Wavy arrows indicate transitions that do not involve photons. Note that for clarity a number of nonradiative relaxation pathways are not shown.
82
Ye Pu et al.
system, and thus no real electron energy transition takes place. Instead, the electron that annihilates the photon pair undergoes only a brief ‘‘virtual’’ energy state and returns to the ground state while emitting a single photon with exactly twice as much energy as each of the photons in the photon pair. Thus, energy is conserved throughout the process. Because the emission is strictly frequency doubled, the bandwidth of the SHG output is comparable to that of the input. The strict frequency relationship also means that the second harmonic radiation is coherent to the excitation field (see Section 3.1 for further discussion on the coherence property of the SHG process). Moreover, SHG as a generally nonresonant process is ultrafast because the lifetime of the virtual state is in the order of only a few femtoseconds. The process of two-photon fluorescence is drastically different from and much more complex than that of SHG. The Jablon´ski energy diagram for two-photon fluorescence is illustrated in Figure 2b. In a fluorescence process, real electron energy transition occurs through two-photon absorption since the total energy of the photon pair is greater than the main energy band gap of the material system. The excited electron often goes through a vibrational relaxation to reach the lowest excited state, where it usually dwells for a period of a few nanoseconds before returning to the ground state and emitting a single photon with less energy than the total energy absorbed. Due to the nonradiative vibrational relaxation and a number of other energy-decay pathways not shown in the diagram, such as internal conversion, fluorescence is a lossy process. Because each state contains many densely spaced, fine-energy levels that constantly interact with the thermal ambient, fluorescence emission occurs over a broadened frequency band irrespective of the spectral property of the input light. The emission has no constant relationship to the excitation field, and consequently fluorescence is an optically incoherent process. Furthermore, the excited electron may undergo spin conversion and enters a triplet state through intersystem crossing, resulting in phosphorescence. Although the probability of intersystem crossing is much lower than fluorescence, the long lifetime of the triplet state (in the order of milliseconds to minutes) means that, once entered, the fluorophore undergoes an extended silent period during which it is incapable of fluorescing. In addition, molecules in the triplet state possess a high degree of chemical reactivity, often giving rise to photobleaching and toxic by-products. Molecules in the triplet state themselves are also often toxic due to their chemical reactivity. Photobleaching and phototoxicity are major problems for fluorescent agents in the long-term observation of biomolecular activities. Note that the above discussion of two-photon fluorescence also applies to single-photon fluorescence. When used as a mechanism of contrast in biomedical imaging applications, there are more differences between SHG and two-photon fluorescence
Harmonic Holography
83
to be noted. One particularly important difference not directly related to the electron energy transition is that emissions from many fluorescent agents, such as QDs and GFPs, fluctuate over time (Dickson et al., 1997; Kuno et al., 2000). This ‘‘blinking’’ behavior makes it difficult to track the complex movements of individual tags. SHG signals, on the other hand, are stable and do not blink. Another difference lies in the flexibility of excitation. SHG is largely a nonresonant process, which works well in a broad range of wavelengths and is thus quite flexible on the choice of excitation frequency. Fluorescence, on the other hand, works best when the excitation wavelength falls into its ‘‘excitation band’’ that is usually a few tens of nanometers wide. Furthermore, SHG emission is usually fairly narrowband, whereas fluorescence bandwidth is often many times wider. This gives SHG an edge over fluorescence in rejecting the unwanted autofluorescence from biological components by simply using a narrowband-pass filter. Finally, since the SHG process is finished within a few femtoseconds, electrons are always available for further excitation and the output does not saturate over increasing excitation intensity. Fluorescence, however, does easily saturate because the excited electrons remain in the excited state for a few nanoseconds before returning to the ground state and again becoming available for another excitation. Table 1 summarizes the differences between SHG and various fluorescent agents. It should be noted that the SHG methodology is in many ways a complement to rather than a replacement of its fluorescence counterparts. Some recent developments in the field of fluorescent markers are noteworthy. Wang et al. (2009) synthesized a new type of QDs that do not blink, making such QDs more suitable for molecular imaging. Ow et al. (2005) developed 20–30-nm core-shell structures where multiple fluorescent dye molecules are enclosed in a silica shell. The authors demonstrated that such composite nanoparticles do not blink and are more resistant to photobleaching than fluorescent dye molecules. Fu and colleagues (2007) also showed that the fluorescent diamond nanocrystals with nitrogen-vacancy defects are free of blinking and resistant to photobleaching.
2.2. Differences between Bulk and Nanoscale Harmonic Generations In terms of SHG, perhaps the most frequently asked question after crystal symmetry is the phase-matching condition. Although the symmetry governs whether SHG is possible at all in a given material system, the phase-matching condition controls the efficiency of the SHG output. The SHG process has been well studied in bulk crystals, where it was treated as a wave propagation problem and there is always a clearly defined wave vector for each frequency component involved. The entire process
TABLE 1 Summary of comparison between SHG and fluorescence
a
Methodology
Sensitivity (GM)
Type of Signal
Excitation Lifetime
Signal Saturation
Signal Bleaching
Signal Blinking
Second harmonics Fluorescence/QD Fluorescence/GFP Fluorescence/dye
100–10,000a 2,000–47,000b 7.5c 1–300d
Coherent Incoherent Incoherent Incoherent
<1 fse 7–22 nsf >3.3 nsg 1–5 nsh
No Yes Yes Yes
Noh Yesi Yesj Yesi,k
Noh Yesl,m Yesn Yesk,o
Based on 100-nm crystalline nanospheres. See Figure 5. After Larson et al. (2003). After Xu et al. (1996). d After Albota et al. (1998). e Based on bond electron response time. See Boyd (2002). f After Schlegel et al. (2002). Also see Resch-Genger et al. (2008). g After Chattoraj et al. (1996). h See, for example, Le Xuan et al. (2008) for results of KTiOPO4 nanocrystals. i QDs are up to 100 times more resistant to photobleaching than fluorescent dyes. See Bruchez et al. (1998), and Chan and Nie (1998). j After Patterson et al. (1997). k Recently silica-coated fluorescence dye core-shell structures of 20–30-nm diameter were synthesized, which demonstrates enhanced brightness and photostability. See Ow et al. (2005). l Statistical properties of QD intensity fluctuation have been extensively researched. See, for example, Kuno et al. (2000) and Shimizu et al. (2001). m Recently nonblinking QDs have been developed. See Wang et al. (2009). Diamond nanocrystals with nitrogen-vacancy defects were also demonstrated to possess stable emission intensity; see Fu et al. (2007). n After Dickson et al. (1997) and Haupts et al. (1998). o Blinking of fluorescent dye molecules occurs only at the single-molecule level. Some examples of fluorescent dye molecule-blinking behavior can be found in Yip et al. (1998) and Zondervan et al. (2003). b c
Harmonic Holography
85
must be modeled with coupled wave equations (Boyd, 2002). Because the fairly strict phase-matching condition must be met, SHG in bulk materials is a rather narrow band process, where walk-off of the input frequency leads to output dropping without tuning. Although scattering and propagation are tightly coupled optical phenomena, SHG as a scattering process has been studied only recently (Dadap et al., 2004; de Beer and Roke, 2009). The problem of SHG scattering by a simple sphere—the simplest scattering problem—is depicted in Figure 3. Unlike the SHG process in bulk materials, no single wave vector could represent the scattered field (at either fundamental or second harmonic frequency), which propagates in all directions. This creates an extremely complicated situation for a sphere comparable to or greater than the coherent buildup length of the material (which is usually a few tens of microns), above which phase matching occurs. Thus, the coupled wave equations describing the SHG process must be solved with extensive considerations of the boundary conditions. The problem becomes less stringent as the size of the sphere drops below the coherence length. However, the complexity is still substantial when the size of the sphere is larger than the smallest wavelength involved in the process, due to extensive interference among waves generated from different parts of the sphere. In fact, neither case is solvable without extensive numerical efforts.
FIGURE 3 Scattering in second harmonic frequency. A nanocrystal of noncentrosymmetric structure (tetragonal BaTiO3 as an example) is excited by an external optical field E0 of frequency o. The physical dimension of the nanocrystal is much smaller than the wavelengths involved in the process. The relative dielectric constant of the nanocrystal and the surrounding medium is e2 and e1, respectively. The nanocrystal possesses a second-order nonlinear susceptibility tensor d. The internal polarization p of the nanocrystal is nonlinear to the external field E0, giving rise to a polarization in second harmonic frequency. Because of the small size of the nanocrystal, phase matching does not happen. Therefore, the nanocrystal behaves like a dipole, and the emission in second harmonic frequency is in all directions.
86
Ye Pu et al.
This problem is significantly simplified when the size of the sphere shrinks to a scale that is smaller than the smallest wavelength involved. Because the sphere is so small compared with all wavelengths, the waves generated from different parts are essentially in phase. Thus, the scattering problem can be resolved by using the electrostatic approximation, and the radiation field can be modeled by treating the sphere as an electric dipole (Bohren and Huffman, 1998). The nanoscale SHG scattering is extremely broadband, and the output remains approximately constant in a broad range of input frequency. To a certain extent, the SHG process is as efficient as it can be at the nanoscale where phase matching does not play a role.
2.3. Emission Power and Scattering Cross Section of Nanoscale SHG The SHG emission power from nanoscale objects involving centrosymmetric material systems has been investigated in a number of previous theoretical and experimental works (Dadap et al., 1999; de Beer and Roke, 2009). Because of the material symmetry, SHG from such systems in a homogeneous excitation field is dipole prohibited and is thus in a quadrupole form originating from the surface rather than the body. The dipole form of body and surface contributions exist only in a highly focused field with a large field gradient. Work was also done in the context of hyperRayleigh scattering (Clays and Persoons, 1991; Vance et al., 1998) to quantify the incoherent SHG response from isotropic materials as an ensemble behavior. We now try to calculate the SHG emission power from an individual nanoscale sphere of noncentrosymmetric material given an optical input. Here we assume that the sphere of interest is not too small so that the surface contribution to the SHG response is negligible compared to the volume contribution. Due to the tensor nature of the second-order nonlinear coefficients, the SHG process is highly vectorial and polarization dependent. Because the orientation of the nanocrystals is arbitrary, the input electric field must be projected onto the crystal axis to calculate the total second harmonic emission power. As illustrated in Figure 4, this can be accomplished by transforming the input electric field from the laboratory frame XL-YL-ZL to the crystal frame XC-YC-ZC. Practically, this coordinate transformation is decomposed into three consecutive coordinate rotations: a rotation of ’ around the z-axis, followed by a rotation of y around the x-axis, and followed by a rotation of c around the z-axis. The input electric field (external of the nanocrystal) in the crystal coordinate is then E0 ¼ TEL, where EL is the electric field in the laboratory frame, and the transformation matrix is expressed as
Harmonic Holography
87
XL Ew kw
XL
ZC
j y
q
ZL
YL Laboratory frame
YL
ZL XC Crystal frame
YC
FIGURE 4 Coordinate transformation from the laboratory frame to the crystal frame. The transformation can be considered as a series of coordinate rotations. XL-YL-ZL, laboratory frame. XC-YC-ZC, crystal frame. Transformation from the laboratory frame to the crystal frame consists of three consecutive rotations in the order of j (around Z), y (around X), and c (around Z). The circular planes are for visual aid.
2
cos c T ¼ 4 sin c 0
sin c cos c 0
32 1 0 0 0 5 4 0 cos y 1 0 sin y
32 0 cos ’ sin y 5 4 sin ’ cos y 0
sin ’ cos ’ 0
3 0 0 5: 1 (3.3)
A simplification of the transformation matrix is possible for material systems such as tetragonal BaTiO3 (point group 4 mm), where rotation of c has no effect on the total dipole moment. Thus in the following discussion, by taking c ¼ 0 the transformation matrix is simplified as 2 32 3 1 0 0 cos ’ sin ’ 0 T ¼ 4 0 cos y sin y 5 4 sin ’ cos ’ 0 5: (3.4) 0 sin y cos y 0 0 1 This simplification, however, is not generally applicable and should be exercised with caution. As discussed previously, in a dimension that is much smaller than the wavelengths involved in the SHG process, polarizations at all points of the material are indeed in phase, and the entire nanocrystal can be treated as a radiating dipole. Referring to Figure 3, the internal electric field inside the crystal sphere in the dipole approximation is ( Jackson, 1998) Ei ¼
3e1 E0 ; e2 þ 2e1
(3.5)
where e1 and e2 are the dielectric constant of the inside and outside media, respectively. Note that here we have assumed the refractive index of the nanocrystal is isotropic. This simplification is usually not true but often quite accurate. By using a fully vectorial «2 an exact model can be established
88
Ye Pu et al.
to include birefringence. The polarization density at second harmonic frequency can be found through P(2o) ¼ 2d EiEi. The total dipole moment as a function of the input field E0 at second harmonic frequency is then 2 4 3e1 2d E0 E0 ; (3.6) pð2oÞ ¼ pr 3 e2 þ 2e1 3 where r is the radius of the nanocrystal, and 2d E0E0 term is calculated according to Eq. 3.2. In dipole approximation the total radiation power produced by this dipole moment can be calculated as Wpð2oÞ ¼
c2 Z0 k42 ð2oÞ 2 jp j ; 12p
(3.7)
where c is the speed of light, Z0 ¼ 1=e0 c is the impedance of vacuum, and k2 ¼ 2o=c is the wave number of the second harmonic radiation. Since the orientation of the dipole moment p(2o) (in the crystal frame) is arbitrary, it is often useful to find the dipole components in the laboratory frame. This transformation can be achieved by multiplying p(2o) by the inverse matrix T1, ð2oÞ
pL ð2oÞ pL
¼ T1 pð2oÞ ;
(3.8)
where is the dipole moment in the laboratory frame. The radiation power by the dipole component in XL-, YL-, and ZL-directions can now be obtained separately by substituting the corresponding component of ð2oÞ pL into Eq. 3.8. Although hyperpolarizability in the Gaussian system of units is the norm in the hyper-Rayleigh scattering community, its use is cumbersome and less intuitive in the field of imaging and microscopy where the MKS (meter-kilogram-second) unit system dominates. Here we adopt the concept of two-photon scattering a cross section from the field of two-photon fluorescence and mark the SHG property of nanocrystals with a similarly defined ‘‘second harmonic scattering cross section.’’ Despite the many differences, SHG and two-photon fluorescence share the nature of twophoton interaction, and thus this definition is reasonable. Under this defið2oÞ nition, Wp ¼ sð2oÞ I 2 where s(2o) is the second harmonic scattering cross section, and I is the excitation intensity. Following the convention of twophoton absorption (Albota et al., 1998; Xu et al., 1996), we use the Go¨ppertMayer (GM) unit as the unit for s(2o), 1 GM ¼ 1 1050 cm4 s photon1. The use of the GM unit also enables a direct comparison between SHG scattering and two-photon fluorescence. The definition of intensity used here follows I ¼ njEj2 =ð2Z0 Þ. For a particular geometry, the tensor operation in Eq. 6 can be expressed in a scalar relationship as |2d E0E0| ¼ 2deff|E|2. This leads to the calculation of the second harmonic scattering cross section directly from the material properties and the geometry:
Harmonic Holography
s
ð2oÞ
4 2 3 4 64p 3e1 c Z 0 k2 6 2 ¼ r d eff : n2 27 e2 þ 2e1
89
(3.9)
Note the sixth-order dependence of the cross section on the size of the sphere. To compare this to linear scattering, let us consider the linear cross section (Bohren and Huffman, 1998) 8p e2 e1 2 4 6 ðoÞ k1 r ; (3.10) s ¼ 3 e2 þ 2e1 where k1 ¼ o=c is the wave number of the fundamental frequency. The ratio between the physical cross section in second harmonic and the linear cross section is then 2
d I sð2oÞ I e31 eff : ¼ 1152 2 2 ðoÞ s ðe2 þ 2e1 Þ ðe2 e1 Þ e30 c
(3.11)
As an example, let us consider a spherical BaTiO3 nanocrystal under an excitation intensity of 1 1011 W=cm2 . Assuming d eff ¼ 1:59 1022 C=V2 for BaTiO3 (Boyd, 2002), the above ratio is 7.2 105. Thus at this intensity approximately 70 second harmonic photons are generated for every 1 million fundamental photons scattered. Figure 5 shows sample calculations based on the above theory for select material systems in various orientations. The nanocrystals under consideration were assumed to be spheres of 100-nm diameter. The cross sections are plotted as a function of input polarization angle. The dipole polarizations in x-, y-, and z-directions are shown separately as dotted, dashed, and dotted-dashed curves, respectively. All cross sections are in GM units. We note that, experimentally, this type of measurements can be used to determine the crystal orientation. When imaged using high–numerical aperture (NA) optics, special treatment is needed to handle the randomly oriented dipole moments. This is normally not encountered in traditional imaging techniques where scalar rather than vectorial approaches apply. Here we try to calculate the collection efficiency for radiations from an arbitrarily oriented dipole to find the actual dipole moment from measured image intensity. From the angular distribution of the dipole radiation ( Jackson, 1998), dP c2 Z0 k4 2 jpj sin2 y; ¼ 32p2 dO the collection efficiency for the dipole can be obtained as Ð dP dO ; ¼ dO Wp
(3.12)
(3.13)
90
Ye Pu et al.
q = 90°, j = 0°
q = 90°, j = 90°
q = 45°, j = 45°
CdS (6 mm)
– ZnS (4 3 m)
LiNbO3 (3 m)
BaTiO3 (4 mm)
q = 0°, j = 0°
FIGURE 5 Calculated polarization-dependent second harmonic scattering cross section as a function of input polarization angle for select material systems in various orientations as marked in the top row. Solid curves show the total dipole moment. Dotted, dashed, and dotted-dashed curves show dipole moments in x-, y-, and z-directions, respectively. The nanocrystals in the calculation were assumed to be a sphere of 100-nm diameter. The unit of second harmonic scattering cross section is in Go¨ppert-Mayer (GM).
where Wp is the total radiate power. As illustrated in Figure 6, a dipole moment p of an arbitrary orientation can be considered as the sum of a transverse dipole moment pr and an axial dipole moment pz. The collected radiation power is then P ¼ rPr þ zPz, where Pr and Pz are the radiated power by pr and pz, respectively, calculated according to Eq. 8. Simple analyses indicate that r and z can be expressed analytically as
Harmonic Holography
91
FIGURE 6 Collection efficiency of high-NA objective for an arbitrarily oriented dipole. (a) An arbitrarily oriented dipole p can be decomposed into a transverse component pr and an axial component pz, whose radiation pattern is shown in the upper- and bottom-left corners, respectively. (b) The objective collection efficiency for pr and pz as a function of the aperture angle y. The efficiencies Zr and Zz are taken into account in the theoretical calculations of the cross sections. Experimentally, contributions from pr and pz cannot be separated and were treated equally using Zr.
1 3 1 cos y cos3 y 2 8 8 1 1 1 z ¼ cos y sin2 y cos y; 2 2 4 r ¼
(3.14)
where y ¼ sin1 ðNA=nÞ is the collection angle (i.e., the angular aperture), and n is the refractive index of the surrounding medium of the dipole being imaged. Figure 6b plots r and z as a function of the collection angle. Clearly, at small collection angles imaging of pz is much less efficient than that of pr. As y increases, both r and z increase, but z increases more rapidly than r. The two efficiencies finally reach an equal value of 0.5 at the maximum possible collection angle of p=2. The above theory for the calculation of SHG radiation power was verified through a correlated scanning electron microscopy (SEM) and optical SHG imaging of isolated BaTiO3 nanocrystals. The nanoparticles were first dispersed on an indium-tin-oxide (ITO)-coated glass slide with prescratched coordinates for SEM. After the SEM measurements, optical SHG measurements with exactly one-to-one mapping to the SEM images were performed with the assistance of the prescratched coordinates. The measurements were conducted using a femtosecond Ti:sapphire oscillator operated at 800 nm for excitation with a peak intensity of roughly 1 109 W=cm2 . To obtain a polarization-dependent response, the
92
Ye Pu et al.
polarization of the excitation was rotated and the SHG radiation power was measured for each angle. The sample was immersed in indexmatching oil to work with the oil-immersion objective (1001.4 NA), and the collection angle at this NA is 1.20 radian. The collection efficiencies corresponding to this configuration are r ¼ 0.358 and z ¼ 0.240. A 400-nm interference band-pass filter in conjunction with a colored-glass filter was used to efficiently reject the excitation wavelength. All transmission of the imaging optics was calibrated and factored so that a true, system-independent sample response is obtained. Because the contributions from pr and pz cannot be distinguished experimentally, we treat them equally in the measurements using Zr. This approach slightly underestimates the true cross section when there is a significant pz involved, provided a sufficiently large NA. In the above experimental condition, for example, the real cross section will be 34% smaller than the real value when the dipole consists of purely pz. Results of the correlated SEM-optical measurements are shown in Figure 7b as total SHG scattering cross section in GM units. The crystal orientation, described as values of y and ’, was solved by searching through all possible combinations of the two angles and finding the best match in the angular pattern. For best visualization, the calculation based on the theory at the best orientation match was scaled by a constant factor to fit the maximum value. The measured value of the cross section is 20%
FIGURE 7 Measured polarization-dependent SHG scattering cross section. (a) Scanning electrom microscopy (SEM) image of an isolated BaTiO3 nanocrystal. The diameter measured from the SEM picture was 106 nm. (b) Polarization-dependent measurement of SHG scattering cross section corresponding to the nanocrystal shown in (a). Dots, measurements. Curve, calculated total emission fit to the experimental data, from which the orientation of the nanocrystal was determined as shown in the upper-right corner. Objective collection efficiency for NA ¼ 1.4 and the refractive index of the immersion medium was taken into account. The measured value is 20% lower than the theoretical prediction due to the inability to separate the dipole components in the experiment.
Harmonic Holography
93
lower than the theoretical prediction due to the inability to separate contributions from pr and pz experimentally. When pr and pz in the theoretical calculation were treated equally using Zr, the theory and the experiment matched well.
3. PRINCIPLE OF HARMONIC HOLOGRAPHY In this section, we describe the basic principle of H2, which takes advantage of the unique property of SHG that creates contrast in the coherent domain. H2 captures 3D images of SHG active nanocrystals within a time as short as (in theory) a single laser shot. The need for scanning is entirely eliminated. This fast imaging capability enables H2 to capture very ephemeral yet complex 3D phenomena, which is an extremely useful and highly sought-after asset in the field of biomedical imaging.
3.1. Coherence of Optical Harmonics Despite the new frequency generated in the process, SHG is principally a phenomenon of forced oscillations. The output of the SHG process maintains a clear analytical relationship to the driving optical field (i.e., the fundamental wave), regardless of the underlying material system. At nanometer scale, this relationship is further simplified because of the absence of dispersion and phase matching. In general, SHG radiations emitted from one material system are coherent to the SHG emissions from any other material system provided they are driven by the same excitation field. To demonstrate the coherence between SHG radiations emitted from independent material systems driven by the same field, Figure 8 shows the interference pattern between two independently generated second harmonic waves based on a Mach–Zehnder interferometer setup. One of the waves was a planar wave generated by a beta barium borate (BBO) crystal, and the other was a spherical wave generated by a quartz crystal and focused with a lens. Both crystals were under the excitation of the same femtosecond laser pulses. When the optical path length was carefully matched using a motorized translation stage, a sharp interference pattern was observed from the charge-coupled device (CCD) image.
3.2. Harmonic Holography Based on the coherence property of SHG process, Pu and colleagues (2008) suggested that holograms can be recorded using second harmonic signals as a mechanism of contrast to obtain whole-field 3D images with specificity. The principle of harmonic holography is illustrated in Figure 9.
94
Ye Pu et al.
FIGURE 8 Interference pattern between two independently generated second harmonic waves. One of the waves was a plane wave generated by a BBO crystal, while the other was a spherical wave generated by a quartz plate and focused with a lens. SHG-tagging nanocrystals
w + 2w 2w signal
2w
2w band-pass filter
CCD
ref ere
nc
e
w pump laser pulse
FIGURE 9 Basic principle of harmonic holography. When excited by an intense pump laser pulse of frequency o, a collection of tagging nanocrystals scatters in both o and 2o frequencies. A band-pass filter picks up the 2o signals and rejects the linear scatterings and the pump pulse. An independently frequency-doubled reference interferes with the 2o signals at the CCD camera and forms a hologram. Structures that are not tagged with the nanocrystals are incapable of SHG and scatter only in o. As such, they will not be recorded on the hologram. [Reprinted from Pu et al. (2008) with permission from the Optical Society of America.]
In this scheme, an intense laser pulse at frequency o is delivered to a group of second harmonic–generating nanocrystals, which are tagged to specific parts of the system being studied. The nanocrystals emit scattering signals at both the fundamental (o) and second harmonic (2o) frequencies. A band-pass filter centered at 2o rejects the excitation and scattered light at
Harmonic Holography
95
the fundamental frequency but transmits the second harmonic signal to an imaging sensor. A reference beam at the doubled frequency, independently generated from the same excitation laser pulse, is also delivered to the CCD sensor and a hologram is thus formed. The coherence of the SHG process guarantees that the scattered 2o signals have a deterministic, static phase relationship with the 2o reference. Pu et al. (2008) also made the first demonstration of harmonic holography. Their experimental setup used in the demonstration is shown in Figure 10, where an amplified Ti:sapphire femtosecond laser operating at 810-nm wavelength and 150-fs full width half maximum (FWHM) duration was used as the excitation source. The laser pulses of 2-mJ energy and 10-Hz repetition rate were split into a pump and a reference beam at the beam splitter BS1. Approximately 1 mJ of the energy was delivered to the sample containing randomly dispersed SHG imaging targets. After the boost in intensity through the reversed 3:1 telescope (L3 and L4), the pump intensity at the sample was 1 1011 W=cm2 . This intense pump beam at the fundamental frequency was rejected by the filter (F2), while the weak second harmonic radiations form the samples (the object wave) passed through and were collected by the aspheric objective (NA ¼ 0.5) at a magnification of 25. An independently generated (through a BBO crystal) second harmonic plane wave interfered with the signal wave at
Variable delay w
Pump
BS1
2 mJ 150 fs M2
Reference
M5
M7 M1 L3 L4
BBO F1
S
F2 2w
M6 F3 F4 BS2
M8 ASP OBJ
M3
M4
2w L1
CCD
L2
FIGURE 10 Experimental setup for the first demonstration of harmonic holography. Femtosecond laser pulses of 2-mJ energy and 150-fs pulse width FWHM were split into a pump and a reference beam at BS1. After a variable delay line, the pump carrying the major portion of the energy was shrunk by 3:1 with a reverse telescope (L3, L4) for higher intensity and sent to the sample containing SHG nanocrystal clusters. The second harmonic signals scattered from the nanocrystal clusters were collected by the aspheric objective and steered to the CCD camera. The reference, frequency-doubled through a BBO crystal, interfered with the signal at the CCD plane and forms a hologram. BS1, BS2, Beamsplitter 1 and 2; L1, L2, lenses 1 and 2; ASP OBJ: aspheric objective; M1–M8, mirrors; F1–F4, filters; S, sample. [Reprinted from Pu et al. (2008) with permission from the Optical Society of America.]
96
Ye Pu et al.
the beam splitter BS2 to form holograms that were captured by the CCD camera. To handle the short coherent length of the femtosecond light pulses, which is less than 50 mm, a variable delay line was used to match the optical path lengths of the two second harmonic waves. In the experiment, 100-nm BaTiO3 nanocrystals were used as the imaging target. Since the as-purchased nanocrystals were in a centrosymmetric cubic structure and were unable to produce second harmonic responses, the nanocrystals were heated to 1000 C for 1 hour followed by rapidly cooling. The heating converted the crystal structures into tetragonals and resulted in large clusters with a broad variation in size, as revealed by the SEM images shown in Figure 11. Larger clusters were then collected through sedimentation for better control of the number density of particles and were dispersed randomly on a microscope cover slip. The position of the sample was adjustable to suit different imaging purposes. When the sample was positioned at the imaging focal plane and the reference was shut off, a ‘‘direct’’ (on-focus) image could be captured; when the sample was moved off-focus by a short distance (30 mm in this experiment) and the reference was turned on, a hologram was obtained. In the digital holographic-recording process, the intensity of the reference wave was set to at least 100 times stronger than the read noise of the CCD so that the read noise would not contribute significantly to the overall noise. The digital image captured by the CCD camera was
FIGURE 11 Scanning electron microscopy image of the BaTiO3 nanocrystal cluster sample. The clusters were obtained by heating 100-nm cubic-phase BaTiO3 nanocrystals at 1000 C for 1 hour followed by rapidly cooling. The sintering converted the crystal structure from cubic phase to tetragonal phase while forming clusters with large size variations. [Reprinted from Pu et al. (2008) with permission from the Optical Society of America.]
Harmonic Holography
97
reconstructed numerically to obtain the electric field of the object wave at the sample plane. Because the reference was introduced at a near-zero angle with respect to the optical axis, the hologram was treated as in-line. The reconstruction algorithm was essentially a convolution between the holographic image intensity and the system impulse function (Schnars and Juptner, 2002): Gðx; ; zÞ ¼ hðx; yÞ gðx; y; zÞ;
(3.15)
where G(x, , z) is the reconstructed 3D image intensity at object coordinate (x, , z) , h(x, y) is the intensity of the holographic image captured from the CCD at hologram coordinate (x, y), and g(x, y; z) is the system impulse function at object coordinate z along the optical axis. This convolution was implemented as a sequence of fast Fourier transforms (FFTs) and inverse FFTs. Because the aspheric objective was not designed to accommodate the thick filter F2 (1.7-mm thick), significant spherical aberration was found in the image. To compensate for such aberrations, a ray-tracing–based algorithm was used to computationally generate the accurate impulse function g(x, y; z) from the design parameters published by the manufacturer. This approach allowed for a complete cancellation of the aberration effect and enabled an image reconstruction with near–diffraction-limited performance. Since imaging with second harmonic signals often involves a very low photon count, an accurate estimate of signal levels for quantifying the signal to noise ratio (SNR) becomes difficult because of significant shot noise. Therefore, the actual signal intensity was estimated through a long integration time and linear scaling. An on-focus (direct) second harmonic image of the nanocrystal clusters is shown in Figure 12. The image is acquired with an integration of 100 pulses and serves as an accurate measurement of signal intensity. For the convenience of discussions, each major object was assigned with a unique identification number. A comparison was made in Figure 13 between direct and H2 imaging under identical conditions at integrations of 1, 5, and 10 pulses. The purpose of the experiments at different integration times was to investigate the imaging characteristics as a function of signal intensity rather than the integration time itself. For visualization reasons, each panel in Figure 13 was histogram-stretched so that the darkest 0.01% pixels are black and the brightest 0.01% pixels are white. Magnified images of object no. 7 are shown in the insets. The effect of aberration compensation is clearly seen as the nebulous images in the direct picture become much sharper in the reconstructed holographic image, revealing the fine structure of the nanocrystal clusters through a random phase-matching effect. A clear gain in SNR through reference biasing and aberration compensation was also demonstrated in Figure 13.
98
Ye Pu et al.
6 4
9
7
11
3 10
5 1 2 8 50 mm
FIGURE 12 Second harmonic focal image of the nanocrystal clusters obtained directly from the CCD at an integration of 100 laser pulses. The objects are numbered for the ease of discussion. [Reprinted from Pu et al. (2008) with permission from the Optical Society of America.]
Pu et al. (2008) gave a definition for the SNR of the direct images while taking the shot noise into account: hIt i D ffi; SNRðDÞ ¼ qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðDÞ 2 ðDÞ hIi i D þ ðN DÞ ðDÞ
(3.16)
where D is the digitization sensitivity of the CCD camera in electrons per digit, and N(D) is the device noise. In the equation the effect of detector quantum efficiency is ignored, which affects only the absolute sensitivity but not the SNR at the same level of signal. At the integration time used in this experiment, the read noise of the CCD camera dominates N(D). The authors also gave a definition for the SNR in H2 imaging (again ignoring detector quantum efficiency), where the bias provided by the reference wave renders the device noise negligible: qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðHÞ SNRðHÞ ¼ hIi i D: (3.17) Based on the above two definitions, the overall gain in the SNR through holography over direct imaging is then 0 1 GSNR ¼
SNRðHÞ SNRðDÞ
¼
pffiffiffiB N ðDÞ D C g@1 þ qffiffiffiffiffiffiffiffiffiffiffiA; ðDÞ hIi i
(3.18)
Harmonic Holography
(a)
(b)
(c)
(d)
(e)
(f)
99
FIGURE 13 Comparison between direct and holographic reconstruction images. (a), (c), (e): Direct imaging integrated for 1, 5, and 10 pulses, respectively. (b), (d), (f): Holographic reconstruction of H2 integrated for 1, 5, and 10 pulses, respectively. The scale bar is 50 mm. The insets show magnified image of object no. 7, where the scale bar is 5 mm. The effect of aberration compensation is clear. For consistent visualization, all images are histogram-stretched so that the darkest 0.01% pixels are black and the brightest 0.01% pixels are white. [Reprinted from Pu et al. (2008) with permission from the Optical Society of America.]
where g is the gain of signal intensity from the cancellation of aberra. ðHÞ ðDÞ tion defined as g ¼ hIi i hIi i. In this experiment an NA of 0.45 was achieved, and numerical investigations using a joint approach of ray tracing and wave propagation suggested g ¼ 1.63. The value of g, of course, increases with objective NA because a larger NA results in higher aberrations.
100
Ye Pu et al.
10 SNR
10
(a) (H )
5
SNR
5
(D)
R
SN
2
2 1 1
5 Number of pulses
(b)
GSNR
20
10
1
1
5 Number of pulses
10
FIGURE 14 SNR comparison between direct and reconstructed H2 images. (a) SNR of direct (downward-pointing triangles) and holographic (upward-pointing triangles) images. Open triangles represent the measurement data, solid triangles with lines show the mean values of the measurements, and the dashed lines are theoretical predictions from Eqs. 3.16 and 3.17. (b) Gain of SNR by holography over direct imaging. Open circles are measurement data points, solid circles with line are the mean value, and the dashed line is the theoretical gain calculated from Eq. 3.18. [Reprinted from Pu et al. (2008) with permission from the Optical Society of America.]
The SNR of each identified object in Figure 12 was calculated based ðDÞ on the experimental condition hIi i 2:5=photons=pixel=pulse; D 3 (D) electrons=digit, and N 2.0, and is plotted in Figure 14a (upwardpointing triangles) as a function of integration time, along with the SNR from the direct image (downward-pointing triangles) for comparison. The open triangles represent the measured data points, solid triangles with lines show the mean values of the measurement, and the dashed lines are theoretical predictions from Eqs. 3.16 and 3.17. The gain in SNR, GSNR, is plotted in Figure 14b as a function of the integration time, where open circles are measurement data points, solid circles with line are the mean value, and the dashed line is the theoretical value calculated from Eq. 18. The deviation of measure gain from the theoretical calculation was due to the uncontrolled airflow in the laboratory environment.
3.3. Performance Estimation In terms of two-photon imaging, SHG represents the highest frequency and shortest wavelength possible. Thus, SHG imaging possesses the highest spatial resolution available in any imaging technique involving the two-photon approach. In H2, the transverse and axial resolution is limited only by the NA of the objective when aberration-free imaging conditions are met. In a 100% intensity fall-off criterion, the smallest transverse and axial resolvable units can be estimated as dr ¼ 1.22 l/NA and dz ¼ 2 l/NA2, respectively. This was verified experimentally at 400-nm
Harmonic Holography
101
FIGURE 15 Measured point spread function of H2. (a) Transverse point spread function. (b) axial point spread function. The image was reconstructed from H2 recorded using an Olympus oil-immersion objective (NA ¼ 1.4). The scale bar is 500 nm.
wavelength using an oil-immersion microscope objective of 1.4 NA in a fully index-matched sample (Figure 15). In the measurement, a diffraction limited spot in bright-field microscopy was chosen for the H2 recording. The measured dr and dz are 355 nm and 472 nm, respectively, which agree reasonably well with the theoretical values of 348 nm and 408 nm. Holographic particle images are subject to intrinsic speckle noise (Pu and Meng, 2004). Such noise is due to the interference between outof-focus wave fronts and is thus an inherent property of the 3D image field. Pu and Meng (2004) suggested that the ratio between the signal intensity Is and the standard deviation of the speckle noise sN can be calculated as Is pNA2 ; ¼ 2 sN l np L
(3.19)
where NA is the objective numerical aperture, np is the number density of the particles, and L is the axial dimension of the image field. In H2 microscopy the wavelength refers to the second harmonic wavelength. Although this equation was derived using paraxial approximation, it applies equally well in H2 with high-NA objectives because the same derivation can be performed in the magnified image domain where paraxial condition holds to obtain the same result. For an illustration, let us consider an H2 configuration using water immersion objective with NA ¼ 1.2, where the wavelength in water is l ¼ 301 nm. Assuming we would like to image particles as dense as 0.01 particles/mm3 in a volume of 50-mm depth, the resulting signal to speckle ratio is 100. Equation 19, however, is not the actual SNR of the image due to the interference between the speckle background and the on-focus images. Goodman (1967) gave a good definition of SNR for holographic image with speckle background as Is =sN SNR ¼ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ; 1 þ 2Is =sN which is a monotonic function of the intensity ratio Is =sN .
(3.20)
102
Ye Pu et al.
3.4. Holography with Other Types of Nonlinear Emissions The principle of harmonic holography can be extended to exploit a number of coherent nonlinear optical processes other than SHG. These possibilities include sum frequency generation, difference frequency generation, and coherent anti-Stokes Raman scattering (CARS), among others. In particular, the use of CARS will likely enhance efficient multicolor capabilities in H2 microscopy, as well as highly sensitive 3D tracing of particular chemical compositions.
4. THREE-DIMENSIONAL MICROSCOPY OF CELLS WITH HARMONIC HOLOGRAPHY Harmonic holography makes scan-free 3D bio-imaging possible when SHG-active nanocrystals are labeled on the targets of interest and imaged with the H2 microscope. Based on the principle of H2, Hsieh and coworkers (2009) demonstrated 3D microscopy in biological cells using BaTiO3 nanocrystals as SHG tags. In this context, the SHG-active nanocrystals were termed second harmonic radiative imaging probes (SHRIMPs). The sample under investigation consisted of cultured mammalian (HeLa) cells embedded with SHRIMPs. Since aqueous BaTiO3 nanoparticle dispersion is unstable due to insufficient zeta potential, in this experiment the 30-nm BaTiO3 SHRIMPs suspension was stabilized using aminomethylphosphonic acid. The HeLa cells were then incubated for 24 hours at 37 C with the stabilized SHRIMP particles. The cell sample was then washed with phosphate-buffered saline solution to remove unbound SHRIMPs and fixed with 3.7% paraformaldehyde for study in microscopy. During the incubation, it was believed that the presence of an amine group on the SHRIMP surface encourages an uptake of the particles into the cells through endocytosis (Lorenz et al., 2006). It was also believed that the SHRIMPs would be engulfed nonspecifically into vesicles and packed as clusters randomly by the cells. Pilot studies using twophoton laser scanning microscope (Leica SP5; Leica Systems, Wetzlar, Germany) confirmed the presence of SHRIMPs inside the cells. After 24-hour incubation, the HeLa cells were still alive, showing qualitatively a reasonably good biocompatibility of BaTiO3 nanoparticles as SHRIMPs. Figure 16 shows typical optical sectioning images from the pilot study using two-photon LSM. The sample was a calcein-stained HeLa cell with a 3D distribution of SHRIMPs inside it. The excitation was provided by a femtosecond laser operated at 800-nm wavelength. The calcein dye molecules as a cytoplasm indicator emit two-photon fluorescence signal centered at 515 nm, while the SHRIMPs emit an SHG signal centered at 400 nm. The optical data were collected in two separate spectral channels simultaneously. To present such two-channel data properly in the
Harmonic Holography
103
FIGURE 16 (a)–(d) Dual-channel optical sectioning images of aminophosphonic acid– stabilized BaTiO3 nanocrystals uptaken in a HeLa cell in four select planes. The images were obtained using a laser scanning two-photon microscope with an excitation wavelength of 800 nm. The cell cytoplasm was stained with calcein. The BaTiO3 nanocrystals were stabilized with aminophosphonic acid and were uptaken by the cell through possibly endocytosis. Fluorescence from calcein centered at 515 nm and SHG signals at 400 nm from BaTiO3 nanocrystals were collected using separated channels. In these grey-scale pictures, the fluorescence and the SHG channel were rendered in lower and higher shades, respectively. Note that the aminophosphonic acid–stabilized nanocrystals aggregate into small clusters in the cell environment. The scale bar is 5 mm. [Reprinted from Hsieh et al. (2009) with permission from the Optical Society of America.]
grey-scale pictures, the fluorescence and the SHG channel were rendered in lower and higher shades, respectively. A 3D presence of the SHRIMPs inside the cell cytoplasm is clearly seen in Figure 16. Similarly prepared HeLa cell samples were used for the H2 microscopy experiments. The optical setup of the H2 experiments was similar to that used in Pu et al. (2008). A Ti:sapphire femtosecond oscillator was used as the excitation source, delivering a peak intensity of 1:5 109 W=cm2 on the sample. A highly sensitive electron-multiplying charge-coupled device (EMCCD) was used for digital hologram recording. Figure 17a shows a bright-field transmission image of a cell containing some SHRIMP clusters (not clearly visible). Figure 17b shows a superposition of the SHG
104
Ye Pu et al.
(b)
(a)
(f) 6 5
1 2
5
6
4
3 4
3 (c)
(d)
(e) 2 1 0
10
20 z/mm
30
40
FIGURE 17 3D Images of BaTiO3 nanocrystals in HeLa cells. (a) Bright-field microscopy image of the cells of interest. (b) Superposition of the bright-field microscopy image of the HeLa cell (dark pattern) and the conventional microscopy image of SHG signals from the tagging nanocrystals (bright spots). Note that the aminophosphonic acid–stabilized nanocrystals aggregate into small clusters in the cell environment. A total of six clusters were identified and assigned with numbers, some of which were not shown clearly because of their out-of-focus position. (c)–(e) H2-reconstructed images in three distinct planes at relative depths of 0, 3.12, and 6.24 mm, respectively. The white arrows indicate the SHRIMPs that are in focus, while the grey arrows indicate the SHRIMPs that are out of focus. Due to its larger size and stronger signal, cluster No. 4 is bright in both (c) and (d). (f) Normalized intensity line profiles of the six clusters when they are in focus through digital reconstruction. The scale bar is 5 mm. [Reprinted from Hsieh et al. (2009) with permission from the Optical Society of America.]
image from SHRIMPs taken by conventional wide-field microscopy onto a microscopy image of the same cell (artificially darkened for visualization purposes). Here some on-focus SHRIMPs appeared as bright spots, while out-of-focus ones were not quite visible. Each of the SHRIMPs in Figure 17b was assigned a unique number for the convenience of discussion. The same sample area in Figure 17b was imaged using H2 microscopy. The digital hologram was subsequently processed using a numerical reconstruction algorithm to obtain a stack of images in the object domain. Parts c through e of Figure 17 show the reconstructed image planes on three different relative depths into the sample (0, 3.12, and 6.24 mm, respectively) where distinct clusters become on focus. Since in the intracellular environment the SHRIMP particles aggregated into clusters with varying sizes, the SHG intensities of these clusters varied drastically (in a factor of 10). Figure 17f shows the normalized axial line
Harmonic Holography
105
intensity profiles of the six clusters through numerical reconstruction. The unique contrast-forming property of H2 is clearly demonstrated, and all images show a high SNR. In the above experiment, the acquisition time of the digital hologram was 1 minute, which was limited by the integration time to obtain a detectable second harmonic signal. The H2 acquisition time, which is also the 3D framing time, can be further optimized by increasing the excitation power. The peak intensity used in the experiment was 100 times weaker than the critical intensity affecting the cellular metabolism (Konig et al., 1997), and thus roughly 104 times stronger signal (owing to the quadratic relationship between the signal and the excitation and the property of no saturation in SHG) can be achieved by raising the excitation intensity close to this limit. Furthermore, the collection efficiency at the SHG frequency was only 1% due to many suboptimal optical components used, which can also be significantly improved. In addition, the reference wave in H2 can serve as a coherent, homodyne local oscillator, leading to a gain in SNR and a shot noise–limited performance of detection even at fast readout (Pu et al., 2008). With these optimizations, the current H2 setup should be able to record an entire 3D image within a 1-ms time frame. Ultimately, H2 microscopy could reach a temporal resolution limited only by the laser pulse width (Centurion et al., 2006; Pu et al., 2008).
5. FUTURE DEVELOPMENTS The ultimate goal of H2 is to achieve time-lapse single-shot 3D imaging of nanocrystals at the 10-nm size range on par with QDs. Due to the sixthorder dependence on the particle size, SHG in this size range using the present inorganic crystals is less likely to provide sufficient sensitivity for rapid imaging. The development of highly sensitive materials at the 10nm size range suitable for ultrafast H2 imaging will likely generate major impacts in the field of biomedical imaging. Organic SHG dyes and their polymer crystals (Evans and Lin, 2002; Kanis et al., 1994; Long, 1995; Reeve, et al., 2009; Verbiest et al., 1997) have shown great promise in achieving this goal. Enhancements can also be achieved in the inorganic nanocrystals by enclosing them in resonating plasmonic shells (Oldenburg et al., 1998). Band gap–engineered QDs also showed great enhancements in SHG scattering (Sauvage et al., 2001). Furthermore, GFPlike protein structures have been shown to be SHG active (Asselberghs et al., 2008; Khatchatouriants et al., 2000; Wampler et al., 2008). Eventually, it remains of great interest to genetically encode sensitive, bleachingresistant, blinking-free SHG-active proteins for imaging with H2. Surface functionalization and bioconjugation is also an active area of exploration. The eventual success of H2 depends not only on the optical
106
Ye Pu et al.
properties of the tagging nanocrystals, but also the proper targeting of these nanocrystals to the biomolecules of interest. Although many existing functionalization schemes are suitable for the nanocrystals currently in use, new material systems will probably call for novel ways to functionalize the surface. Multicolor, multiplexed imaging is yet another area of high interest. By definition, SHG from any material system falls in the same frequency band (i.e., the doubled frequency), and wavelength-multiplexed imaging using one excitation wavelength is not feasible (contrary to QDs). However, the wide availability of commercial tunable laser sources offers the opportunity of multiplexing with multiple excitation bands, simultaneously or sequentially. This requires creation of a resonant condition in the SHG-tagging nanocrystals through engineered band gap or plasmonic resonance. Molecular vibrational resonance is also an excellent mechanism for multiplexing through CARS. It should be noted that creating a condition of resonance also results in a drastic enhancement in SHG, which makes such schemes highly attractive.
6. CONCLUSION The mechanisms to achieve contrast play a vital role in the field of biomedical imaging. Second harmonic generation provides a unique means to achieve contrast in the coherent domain, enabling the use of holography in contrast imaging. We have laid the theoretical foundation for nanoscale second harmonic generation as a mechanism of contrast. Second harmonic generation is significantly different from fluorescence, a widely used contrast-forming method in modern-day biomedical imaging. The behavior of second harmonic generation at the nanoscale also drastically differs from its counterpart in the bulk material due to the lack of phase matching. The dipole-like behavior of a nanocrystal that scatters in second harmonic frequency simplifies the estimation of the emission power of second harmonics. The theoretical predictions agree well with the experimental measurements. We have also shown that a new holographic principle—harmonic holography—is promising for constructing an ultrafast 4D contrast microscope with high spatial and temporal resolution. Harmonic holography records holograms using second harmonic signals scattered from SHG nanocrystals and an independently generated second harmonic reference. The first proof-of-principle experiment demonstrated a number of unique advantages of harmonic holography over direct imaging, including numerical aberration compensation and device noise elimination. This principle can also be extended to use other coherent optical processes,
Harmonic Holography
107
such as sum frequency generation, difference frequency generation, and coherent anti-Stokes Raman scattering. We have further demonstrated an application of harmonic holography in the high-resolution 3D microscopy imaging in biological cells using tetragonal phase barium titanate nanocrystals as what was termed second harmonic radiating imaging probes. The coherent second harmonic signal allowed for the scanning-less capture of the 3D distribution of the tagging nanocrystals in the cells by a harmonic holographic microscope. Future research in this young field will be focused on the development of highly sensitive SHG nanoprobes, the surface functionalization and bioconjugation of these nanoprobes, and multicolor multiplexing capabilities.
REFERENCES Aaron, J., de la Rosa, E., Travis, K., Harrison, N., Burt, J., Jose-Yacaman, M., and Sokolov, K. (2008). Polarization microscopy with stellated gold nanoparticles for robust, in-situ monitoring of biomolecules. Opt. Express 16, 2153–2167. Agard, D. A., Hiraoka, Y., Shaw, P., and Sedat, J. W. (1989). Fluorescence microscopy in three dimensions. Method Cell Biol. 30, 353–377. Albota, M. A., Xu, C., and Webb, W. W. (1998). Two-photon fluorescence excitation cross sections of biomolecular probes from 690 to 960 nm. Appl. Opt. 37, 7352–7356. Allen, R. D., Allen, N. S., and Travis, J. L. (1981). Video-enhanced contrast, differential interference contrast (AVEC-DIC) microscopy—a new method capable of analyzing microtubule-related motility in the reticulopodial network of Allogromia laticollaris. Cell Motil. Cytoskel. 1, 291–302. Asselberghs, I., Flors, C., Ferrighi, L., Botek, E., Champagne, B., Mizuno, H., Ando, R., Miyawaki, A., Hofkens, J., Van der Auweraer, M., and Clays, K. (2008). Second-harmonic generation in GFP-like proteins. J. Am. Chem. Soc. 130, 15713–15719. Betzig, E., Patterson, G. H., Sougrat, R., Lindwasser, O. W., Olenych, S., Bonifacino, J. S., Davidson, M. W., Lippincott-Schwartz, J., and Hess, H. F. (2006). Imaging intracellular fluorescent proteins at nanometer resolution. Science 313, 1642–1645. Bohren, C. F., and Huffman, D. R. (1998). Absorption and Scattering of Light by Small Particles. Wiley, New York. Bonacina, L., Mugnier, Y., Courvoisier, F., Le Dantec, R., Extermann, J., Lambert, Y., Boutou, V., Galez, C., and Wolf, J. P. (2007). Polar Fe(IO3)3 nanocrystals as local probes for nonlinear microscopy. Appl. Phys. B. 87, 399–403. Boyd, R. W. (2002). Nonlinear Optics. Academic Press, Boston. Brakenhoff, G. J., Blom, P., and Barends, P. (1979). Confocal scanning light microscopy with high aperture immersion lenses. J. Microsc. 117, 219–232. Brauns, E. B., Madaras, M. L., Coleman, R. S., Murphy, C. J., and Berg, M. A. (2002). Complex local dynamics in DNA on the picosecond and nanosecond time scales. Phys. Rev. Lett. 88, article no. 158101. Bruchez, M., Moronne, M., Gin, P., Weiss, S., and Alivisatos, A. P. (1998). Semiconductor nanocrystals as fluorescent biological labels. Science 281, 2013–2016. Campagnola, P. J., and Loew, L. M. (2003). Second-harmonic imaging microscopy for visualizing biomolecular arrays in cells, tissues and organisms. Nat. Biotech. 21, 1356–1360.
108
Ye Pu et al.
Centurion, M., Pu, Y., and Psaltis, D. (2006). Holographic capture of femtosecond pulse propagation. J. Appl. Phys. 100, article no. 063104. Chalfie, M., Tu, Y., Euskirchen, G., Ward, W. W., and Prasher, D. C. (1994). Green fluorescent protein as a marker for gene-expression. Science 263, 802–805. Chan, W. C. W., and Nie, S. M. (1998). Quantum dot bioconjugates for ultrasensitive nonisotopic detection. Science 281, 2016–2018. Chattoraj, M., King, B. A., Bublitz, G. U., and Boxer, S. G. (1996). Ultra-fast excited state dynamics in green fluorescent protein: Multiple states and proton transfer. Proc. Natl. Acad. Sci. 93, 8362–8367. Cheatham, T. E. (2004). Simulation and modeling of nucleic acid structure, dynamics and interactions. Curr. Opin. Struct. Biol. 14, 360–367. Clays, K., and Persoons, A. (1991). Hyper-Rayleigh scattering in solution. Phys. Rev. Lett. 66, 2980–2983. Cognet, L., Tardin, C., Boyer, D., Choquet, D., Tamarat, P., and Lounis, B. (2003). Single metallic nanoparticle imaging for protein detection in cells. Proc. Natl. Acad. Sci. 100, 11350–11355. Cuche, E., Bevilacqua, F., and Depeursinge, C. (1999). Digital holography for quantitative phase-contrast imaging. Opt. Lett. 24, 291–293. Dadap, J. I., Shan, J., Eisenthal, K. B., and Heinz, T. F. (1999). Second-harmonic Rayleigh scattering from a sphere of centrosymmetric material. Phys. Rev. Lett. 83, 4045–4048. Dadap, J. I., Shan, J., and Heinz, T. F. (2004). Theory of optical second-harmonic generation from a sphere of centrosymmetric material: small-particle limit. J. Opt. Soc. Am. B 21, 1328–1347. de Beer, A. G. F., and Roke, S. (2009). Nonlinear Mie theory for second-harmonic and sumfrequency scattering. Phys. Rev. B 79, article no. 155420. Denk, W., Strickler, J. H., and Webb, W. W. (1990). Two-photon laser scanning fluorescence microscopy. Science 248, 73–76. Dickson, R. M., Cubitt, A. B., Tsien, R. Y., and Moerner, W. E. (1997). On/off blinking and switching behaviour of single molecules of green fluorescent protein. Nature 388, 355–358. Dubois, F., Joannes, L., and Legros, J. C. (1999). Improved three-dimensional imaging with a digital holography microscope with a source of partial spatial coherence. Appl. Opt. 38, 7085–7094. Evans, O. R., and Lin, W. B. (2002). Crystal engineering of NLO materials based on metalorganic coordination networks. Accounts Chem. Res. 35, 511–522. Fan, G. Y., Fujisaki, H., Miyawaki, A., Tsay, R. K., Tsien, R. Y., and Ellisman, M. H. (1999). Video-rate scanning two-photon excitation fluorescence microscopy and ratio imaging with cameleons. Biophys. J. 76, 2412–2420. Franken, P. A., Weinreich, G., Peters, C. W., and Hill, A. E. (1961). Generation of optical harmonics. Phys. Rev. Lett. 7, 118–119. Frohn, J. T., Knapp, H. F., and Stemmer, A. (2000). True optical resolution beyond the Rayleigh limit achieved by standing wave illumination. Proc. Natl. Acad. Sci. 97, 7232–7236. Fu, C. C., Lee, H. Y., Chen, K., Lim, T. S., Wu, H. Y., Lin, P. K., Wei, P. K., Tsao, P. H., Chang, H. C., and Fann, W. (2007). Characterization and application of single fluorescent nanodiamonds as cellular biomarkers. Proc. Natl. Acad. Sci. 104, 727–732. Gabor, D. (1948). A new microscopic principle. Nature 161, 777–778. Gao, X. H., Yang, L. L., Petros, J. A., Marshal, F. F., Simons, J. W., and Nie, S. M. (2005). In vivo molecular and cellular imaging with quantum dots. Curr. Opin. Biotechnol. 16, 63–72. Gerlich, D., and Ellenberg, J. (2003). 4D imaging to assay complex dynamics in live specimens. Nat. Cell Biol. 5, S14–S19.
Harmonic Holography
109
Ghosh, R. N., and Webb, W. W. (1994). Automated detection and tracking of individual and clustered cell surface low density lipoprotein receptor molecules. Biophys. J. 66, 1301–1318. Giepmans, B. N. G., Adams, S. R., Ellisman, M. H., and Tsien, R. Y. (2006). Review—the fluorescent toolbox for assessing protein location and function. Science 312, 217–224. Gilmanshin, R., Williams, S., Callender, R. H., Woodruff, W. H., and Dyer, R. B. (1997). Fast events in protein folding: relaxation dynamics of secondary and tertiary structure in native apomyoglobin. Proc. Natl. Acad. Sci. 94, 3709–3713. Goodman, J. W. (1967). Film-grain noise in wavefront-reconstruction imaging. J. Opt. Soc. Am. 57, 493–502. Gustafsson, M. G. L. (2005). Nonlinear structured-illumination microscopy: wide-field fluorescence imaging with theoretically unlimited resolution. Proc. Natl. Acad. Sci. 102, 13081–13086. Gustafsson, M. G. L., Agard, D. A., and Sedat, J. W. (1999). I5M: 3D widefield light microscopy with better than 100 nm axial resolution. J. Microsc. Oxf. 195, 10–16. Haupts, U., Maiti, S., Schwille, P., and Webb, W. W. (1998). Dynamics of fluorescence fluctuations in green fluorescent protein observed by fluorescence correlation spectroscopy. Proc. Natl. Acad. Sci. 95, 13573–13578. Heintzmann, R., Jovin, T. M., and Cremer, C. (2002). Saturated patterned excitation microscopy—a concept for optical resolution improvement. J. Opt. Soc. Am. A 19, 1599–1609. Hell, S., and Stelzer, E. H. K. (1992). Fundamental improvement of resolution with a 4Piconfocal fluorescence microscope using 2-photon excitation. Opt. Commun. 93, 277–282. Hell, S. W. (2007). Far-field optical nanoscopy. Science 316, 1153–1158. Hell, S. W., and Wichmann, J. (1994). Breaking the diffraction resolution limit by stimulatedemission–stimulated-emission-depletion fluorescence microscopy. Opt. Lett. 19, 780–782. Hess, S. T., Girirajan, T. P. K., and Mason, M. D. (2006). Ultra-high resolution imaging by fluorescence photoactivation localization microscopy. Biophys. J. 91, 4258–4272. Hsieh, C. L., Grange, R., Pu, Y., and Psaltis, D. (2009). Three-dimensional harmonic holographic microcopy using nanoparticles as probes for cell imaging. Opt. Express 17, 2880–2891. Huang, B., Wang, W. Q., Bates, M., and Zhuang, X. W. (2008). Three-dimensional superresolution imaging by stochastic optical reconstruction microscopy. Science 319, 810–813. Huisken, J., Swoger, J., Del Bene, F., Wittbrodt, J., and Stelzer, E. H. K. (2004). Optical sectioning deep inside live embryos by selective plane illumination microscopy. Science 305, 1007–1009. Jackson, J. D. (1998). Classical Electrodynamics, ed. 3. Wiley, New York. Jacobsen, V., Stoller, P., Brunner, C., Vogel, V., and Sandoghdar, V. (2006). Interferometric optical detection and tracking of very small gold nanoparticles at a water-glass interface. Opt. Express 14, 405–414. Johnson, J. C., Yan, H. Q., Schaller, R. D., Petersen, P. B., Yang, P. D., and Saykally, R. J. (2002). Near-field imaging of nonlinear optical mixing in single zinc oxide nanowires. Nano Lett. 2, 279–283. Kachynski, A. V., Kuzmin, A. N., Nyk, M., Roy, I., and Prasad, P. N. (2008). Zinc oxide nanocrystals for nonresonant nonlinear optical microscopy in biology and medicine. J. Phys. Chem. C 112, 10721–10724. Kanis, D. R., Ratner, M. A., and Marks, T. J. (1994). Design and construction of molecular assemblies with large 2nd-order optical nonlinearities—quantum-chemical aspects. Chem. Rev. 94, 195–242. Kawata, S. (1994). The optical computed tomography microscope. Adv. Opt. Electron Microsc. 14, 213–248.
110
Ye Pu et al.
Khatchatouriants, A., Lewis, A., Rothman, Z., Loew, L., and Treinin, M. (2000). GFP is a selective nonlinear optical sensor of electrophysiological processes in. Caenorhabditis elegans. Biophys. J. 79, 2345–2352. Konig, K. (2000). Multiphoton microscopy in life sciences. J. Microsc. Oxf. 200, 83–104. Konig, K., So, P. T. C., Mantulin, W. W., and Gratton, E. (1997). Cellular response to nearinfrared femtosecond laser pulses in two-photon microscopes. Opt. Lett. 22, 135–136. Kuno, M., Fromm, D. P., Hamann, H. F., Gallagher, A., and Nesbitt, D. J. (2000). Nonexponential ‘‘blinking’’ kinetics of single CdSe quantum dots: a universal power law behavior. J. Chem. Phys. 112, 3117–3120. Larson, D. R., Zipfel, W. R., Williams, R. M., Clark, S. W., Bruchez, M. P., Wise, F. W., and Webb, W. W. (2003). Water-soluble quantum dots for multiphoton fluorescence imaging in vivo. Science 300, 1434–1436. Le Xuan, L., Zhou, C., Slablab, A., Chauvat, D., Tard, C., Perruchas, S., Gacoin, T., Villeval, P., and Roch, J. F. (2008). Photostable second-harmonic generation from a single KTiOPO4 nanocrystal for nonlinear microscopy. Small 4, 1332–1336. Leith, E. N., and Upatniek, J. (1964). Wavefront reconstruction with diffused illumination and three-dimensional objects. J. Opt. Soc. Am. 54, 1295–1301. Leith, E. N., Upatniek, J., and Haines, K. A. (1965). Microscopy by wavefront reconstruction. J. Opt. Soc. Am. 55, 981–986. Long, N. J. (1995). Organometallic compounds for nonlinear optics—the search for en-lightenment. Angew Chem. Int. Edt. 34, 21–38. Lorenz, M. R., Holzapfel, V., Musyanovych, A., Nothelfer, K., Walther, P., Frank, H., Landfester, K., Schrezenmeier, H., and Mailander, V. (2006). Uptake of functionalized, fluorescent-labeled polymeric particles in different cell lines and stem cells. Biomaterials 27, 2820–2828. Marquet, P., Rappaz, B., Magistretti, P. J., Cuche, E., Emery, Y., Colomb, T., and Depeursinge, C. (2005). Digital holographic microscopy: a noninvasive contrast imaging technique allowing quantitative visualization of living cells with subwavelength axial accuracy. Opt. Lett. 30, 468–470. Miccio, L., Alfieri, D., Grilli, S., Ferraro, P., Finizio, A., De Petrocellis, L., and Nicola, S. D. (2007). Direct full compensation of the aberrations in quantitative phase microscopy of thin objects by a single digital hologram. Appl. Phys. Lett. 90, article no. 041104. Michalet, X., Pinaud, F. F., Bentolila, L. A., Tsay, J. M., Doose, S., Li, J. J., Sundaresan, G., Wu, A. M., Gambhir, S. S., and Weiss, S. (2005). Quantum dots for live cells, in vivo imaging, and diagnostics. Science 307, 538–544. Nakano, A. (2002). Spinning-disk confocal microscopy—a cutting-edge tool for imaging of membrane traffic. Cell Struct. Funct. 27, 349–355. Nakayama, Y., Pauzauskie, P. J., Radenovic, A., Onorato, R. M., Saykally, R. J., Liphardt, J., and Yang, P. D. (2007). Tunable nanowire nonlinear optical probe. Nature 447, 1098–1101. Neil, M. A. A., Juskaitis, R., and Wilson, T. (1997). Method of obtaining optical sectioning by using structured light in a conventional microscope. Opt. Lett. 22, 1905–1907. Nie, S. M., Chiu, D. T., and Zare, R. N. (1994). Probing individual molecules with confocal fluorescence microscopy. Science 266, 1018–1021. Nomarski, G. (1969). New theory of image formation in differential interference microscopy (DIM). J. Opt. Soc. Am. 59, 1524. Oldenburg, S. J., Averitt, R. D., Westcott, S. L., and Halas, N. J. (1998). Nanoengineering of optical resonances. Chem. Phys. Lett. 288, 243–247. Ow, H., Larson, D. R., Srivastava, M., Baird, B. A., Webb, W. W., and Wiesner, U. (2005). Bright and stable core-shell fluorescent silica nanoparticles. Nano. Lett. 5, 113–117. Patterson, G. H., Knobel, S. M., Sharif, W. D., Kain, S. R., and Piston, D. W. (1997). Use of the green fluorescent protein and its mutants in quantitative fluorescence microscopy. Biophys. J. 73, 2782–2790.
Harmonic Holography
111
Prasher, D. C., Eckenrode, V. K., Ward, W. W., Prendergast, F. G., and Cormier, M. J. (1992). Primary structure of the Aequorea victoria green-fluorescent protein. Gene 111, 229–233. Pringle, J. R., Adams, A. E. M., Drubin, D. G., and Haarer, B. K. (1991). Immunofluorescence methods for yeast. Method Enzymol. 194, 565–602. Pu, Y., Centurion, M., and Psaltis, D. (2008). Harmonic holography: a new holographic principle. Appl. Opt. 47, A103–A110. Pu, Y., and Meng, H. (2004). Intrinsic speckle noise in off-axis particle holography. J. Opt. Soc. Am. A 21, 1221–1230. Pu, Y., and Meng, H. (2005). Four-dimensional dynamic flow measurement by holographic particle image velocimetry. Appl. Opt. 44, 7697–7708. Rajadhyaksha, M., Grossman, M., Esterowitz, D., and Webb, R. H. (1995). In-vivo confocal scanning laser microscopy of human skin–melanin provides strong contrast. J. Invest. Dermatol. 104, 946–952. Reeve, J. E., Collins, H. A., De Mey, K., Kohl, M. M., Thorley, K. J., Paulsen, O., Clays, K., and Anderson, H. L. (2009). Amphiphilic porphyrins for second harmonic generation imaging. J. Am. Chem. Soc. 131, 2758–2759. Resch-Genger, U., Grabolle, M., Cavaliere-Jaricot, S., Nitschke, R., and Nann, T. (2008). Quantum dots versus organic dyes as fluorescent labels. Nat. Meth. 5, 763–775. Rittweger, E., Han, K. Y., Irvine, S. E., Eggeling, C., and Hell, S. W. (2009). STED microscopy reveals crystal colour centres with nanometric resolution. Nat. Photonics 3, 144–147. Rust, M. J., Bates, M., and Zhuang, X. W. (2006). Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nat. Meth. 3, 793–795. Sandeau, N., Le Xuan, L., Chauvat, D., Zhou, C., Roch, J. F., and Brasselet, S. (2007). Defocused imaging of second harmonic generation from a single nanocrystal. Opt. Express 15, 16051–16060. Sauvage, S., Boucaud, P., Brunhes, T., Glotin, F., Prazeres, R., Ortega, J. M., and Gerard, J. M. (2001). Second-harmonic generation resonant with s-p transition in InAs/GaAs selfassembled quantum dots. Phys. Rev. B 63, article no. 113312. Schlegel, G., Bohnenberger, J., Potapova, I., and Mews, A. (2002). Fluorescence decay time of single semiconductor nanocrystals. Phys. Rev. Lett. 88, article no. 137401. Schnapp, B. J., Gelles, J., and Sheetz, M. P. (1988). Nanometer-scale measurements using video light microscopy. Cell Motil. Cytoskel. 10, 47–53. Schnars, U., and Juptner, W. (1994). Direct recording of holograms by a CCD target and numerical reconstruction. Appl. Opt. 33, 179–181. Schnars, U., and Juptner, W. P. O. (2002). Digital recording and numerical reconstruction of holograms. Meas. Sci. Technol. 13, R85–R101. Schwille, P., Haupts, U., Maiti, S., and Webb, W. W. (1999). Molecular dynamics in living cells observed by fluorescence correlation spectroscopy with one- and two-photon excitation. Biophys. J. 77, 2251–2265. Sharpe, J., Ahlgren, U., Perry, P., Hill, B., Ross, A., Hecksher-Sorensen, J., Baldock, R., and Davidson, D. (2002). Optical projection tomography as a tool for 3D microscopy and gene expression studies. Science 296, 541–545. Shimizu, K. T., Neuhauser, R. G., Leatherdale, C. A., Empedocles, S. A., Woo, W. K., and Bawendi, M. G. (2001). Blinking statistics in single semiconductor nanocrystal quantum dots. Phys. Rev. B 63, 5. Trifonov, A., Raytchev, M., Buchvarov, I., Rist, M., Barbaric, J., Wagenknecht, H. A., and Fiebig, T. (2005). Ultrafast energy transfer and structural dynamics in DNA. J. Phys. Chem. B 109, 19490–19495. Tsien, R. Y. (1998). The green fluorescent protein. Annu. Rev. Biochem. 67, 509–544. Vance, F. W., Lemon, B. I., and Hupp, J. T. (1998). Enormous hyper-Rayleigh scattering from nanocrystalline gold particle suspensions. J. Phys. Chem. B 102, 10091–10093.
112
Ye Pu et al.
Verbiest, T., Houbrechts, S., Kauranen, M., Clays, K., and Persoons, A. (1997). Second-order nonlinear optical materials: recent advances in chromophore design. J. Mater. Chem. 7, 2175–2189. Wampler, R. D., Kissick, D. J., Dehen, C. J., Gualtieri, E. J., Grey, J. L., Wang, H. F., Thompson, D. H., Cheng, J. X., and Simpson, G. J. (2008). Selective detection of protein crystals by second harmonic microscopy. J. Am. Chem. Soc. 130, 14076–14077. Wang, X. Y., Ren, X. F., Kahen, K., Hahn, M. A., and Rajeswaran, M. (2009). MaccagnanoZacher, S., Silcox, J., Cragg, G. E., Efros, A. L., and Krauss, T. D. Non-blinking semiconductor nanocrystals. Nature 459, 686–689. Weiss, S. (1999). Fluorescence spectroscopy of single biomolecules. Science 283, 1676–1683. Whitesides, G. M. (2003). The ‘‘right’’ size in nanobiotechnology. Nat. Biotechnol. 21, 1161–1165. Xu, C., Zipfel, W., Shear, J. B., Williams, R. M., and Webb, W. W. (1996). Multiphoton fluorescence excitation: new spectral windows for biological nonlinear microscopy. Proc. Natl. Acad. Sci. 93, 10763–10768. Xu, W., Jericho, M. H., Kreuzer, H. J., and Meinertzhagen, I. A. (2003). Tracking particles in four dimensions with in-line holographic microscopy. Opt. Lett. 28, 164–166. Xuan, L. L., Brasselet, S., Treussart, F., Roch, J. F., Marquier, F., Chauvat, D., Perruchas, S., Tard, C., and Gacoin, T. (2006). Balanced homodyne detection of second-harmonic generation from isolated subwavelength emitters. Appl. Phys. Lett. 89, 121118. Yamaguchi, I., Kato, J., Ohta, S., and Mizuno, J. (2001). Image formation in phase-shifting digital holography and applications to microscopy. Appl. Opt. 40, 6177–6186. Yamaguchi, I., and Zhang, T. (1997). Phase-shifting digital holography. Opt. Lett. 22, 1268–1270. Yip, W. T., Hu, D. H., Yu, J., Vanden Bout, D. A., and Barbara, P. F. (1998). Classifying the photophysical dynamics of single- and multiple-chromophoric molecules by single molecule spectroscopy. J. Phys. Chem. A 102, 7564–7575. Zernike, F. (1942). Phase contrast, a new method for the microsopic observation of transparent objects. Physica 9, 686–698. Zhang, T., and Yamaguchi, I. (1998). Three-dimensional microscopy with phase-shifting digital holography. Opt. Lett. 23, 1221–1223. Zipfel, W. R., Williams, R. M., Christie, R., Nikitin, A. Y., Hyman, B. T., and Webb, W. W. (2003). Live tissue intrinsic emission microscopy using multiphoton-excited native fluorescence and second harmonic generation. Proc. Natl. Acad. Sci. 100, 7075–7080. Zondervan, R., Kulzer, F., Orlinskii, S. B., and Orrit, M. (2003). Photoblinking of rhodamine 6g in poly(vinyl alcohol): radical dark state formed through the triplet. J. Phys. Chem. A 107, 6770–6776.
Chapter
4 Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery Gerhard X. Ritter* and Gonzalo Urcid†
Contents
1. Introduction 2. Basic Concepts from Lattice Theory 3. Properties of LAMs 4. LAMs and Endmembers 5. Application Examples and Results 6. The Geometry of F(X) 7. Relationships Among X, WXX, MXX, and F(X) 8. Prelude to Affine Independence 9. Affine-Independent Sets derived from W and M 10. Conclusions and Discussion 11. Appendix: Theorem Proofs References
113 116 118 119 120 136 145 148 154 157 158 167
1. INTRODUCTION Advances in passive remote sensing has resulted in imaging devices with ever-growing spectral resolution. The high spectral resolution produced by current hyperspectral imaging devices facilitates identification of fundamental materials that make up a remotely sensed scene and thus supports discrimination between them. A typical pixel of a multispectral or hyperspectral image generally represents a region on the ground * CISE Department, University of Florida, Gainesville, FL 32611, USA {
Optics Department, INAOE, Tonantzintla, Pue 72000, Mexico
Advances in Imaging and Electron Physics, Volume 160, ISSN 1076-5670, DOI: 10.1016/S1076-5670(10)60004-3. Copyright # 2010 Elsevier Inc. All rights reserved.
113
114
Gerhard X. Ritter and Gonzalo Urcid
consisting of several square meters. For example, each Landsat Thematic Mapper pixel represents a 30 30 m2 footprint on the ground. Thus, a hyperspectral image pixel can have all or parts of many different objects in it. The collection of measured intensities associated with the pixel is called the spectrum of the pixel. It is, therefore, useful to know the percentage of different, fundamental object parts that are most represented in the spectrum of a given pixel. The most widely used spectral mixing model is the linear mixing model, which assumes that the observed reflectance spectrum of a given pixel is a linear combination of a small number of unique constituent deterministic signatures known as endmembers. This model has been used by a multitude of researchers ever since Adams et al. (1986) analyzed an image of Mars using four endmembers. In the cited reference and various other applications, hyperspectral image segmentation and analysis takes the form of a pattern recognition problem as the segmentation problem reduces to matching the spectra of the hyperspectral image to predetermined spectra stored in a library. In many cases, however, endmembers cannot be determined in advance and must be selected from the image directly by identifying the pixel spectra that are most likely to represent the fundamental materials. This comprises the autonomous endmember detection problem. Unfortunately, the spatial resolution of a sensor often makes it unlikely that any pixel is composed of a single endmember. Thus, the determination of endmembers becomes a search for image pixels with the least contamination from other endmembers. These are also referred to as pure pixels. The pure pixels exhibit maximal reflectance in certain spectral bands and correspond to vertices of a high-dimensional simplex. This simplex, hopefully, encloses most if not all the pixel spectra. In this paper we assume the linear mixing model, which is based on the fact that points on a simplex can be represented as a linear sum of the vertices that determine the simplex, is a reasonable model. The mathematical equations of the model and its constraints are given by x ¼ Sa þ n ¼
m X i¼1
ai si þ n;
m X ai ¼ 1 and ai 0 8i;
(4.1)
i¼1
where x is the measured spectrum of an image pixel, s1, . . . , sm are the m endmember spectra assumed to be affinely independent, a1, . . . , am are the corresponding abundances (percentages of endmember spectra present in x), and n represents an additive noise vector. Endmembers may be obtained from spectral libraries for certain specific materials or autonomously from the image by a variety of techniques (Boardman, 1993; Roberts et al., 1998; Winter, 1999a,b). Autonomous endmember detection has received wide attention because signatures of various objects that may be present in an image are unknown beforehand.
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
115
Boardman (1993) uses the framework of the geometry of convex sets to identify the m þ 1 endmembers as the vertices of the smallest simplex that bounds the measured data. A major problem is that the vertices need not be image pixels (which, in most cases, they are not) and, hence, need not have any physical connection to actual image data. Winter’s N-FindR method (Winter, 1999a,b) is based on inflating a simplex within the dataset to determine the largest simplex inscribed within the data. It is not clear how pixels outside the inscribed data are handled, and the actual algorithm is proprietary and not available in print or on the Web. Additionally, the algorithm, when implemented according to outlines available in the open literature, is computationally intensive despite claims to the contrary. Individual pixels need to be examined and simplex volume recalculated for each image pixel. In contrast, the autonomous endmember determination proposed in this paper is extremely fast and carries little computational overhead. The method is derived from examining a lattice-based autoassociative memory that stores the hyperspectral image cube. Gran˜a et al. (2003) were the first to propose the use of lattice-based auto-associative memories for autonomous endmember determination, as well as an evolutionary-based strategy for end-member discrimination (Gran˜a et al., 2004). In the first approach, strongly related to the present work, they used the notion of morphological independence, which does not necessarily lead to finding an affinely independent set of vectors that in some sense provides a maximal simplex within the dataset. Furthermore, Gran˜a et al.’s algorithm forces the user to choose a starting pixel and different starting pixels can, and often do, produce different results. More recently, Gran˜a et al. (2007, 2009) have made major changes to their original algorithm. Their new algorithm is more in line with the algorithm proposed by Myers (2005). These latter two algorithms are based on recent discoveries of algebraic properties inherent in lattice-based auto-associative memories (Ritter and Gader, 2006), but the endmembers obtained are generally not image pixels and need not have any direct physical connection to the hyperspectral image. Moreover, several of the lattice theory approaches to the endmember detection problem are based on the claim that strong lattice dependence implies affine independence (theorem 11.4 in Ritter and Gader, 2006). However, this claim remains a conjecture as the proof of the claim contains an obvious error in the dimension-reduction argument. Finally, the method described in this paper differs from those described by Myers and Gran˜a as the endmembers we obtain have a physical relationship to the pixels of the hyperspectral image under consideration. Our method will always provide the same sets of endmembers for a given hyperspectral image based on facts established in this paper. This paper is for the most part self-contained. Section 2 introduces some basic concepts of binary lattice operations on real numbers and real-valued vectors, as well as lattice-based matrix operations. The notion of latticeassociative matrix memories (LAMs) and several of their well established
116
Gerhard X. Ritter and Gonzalo Urcid
properties are discussed in Section 3. The connection between LAMs and the hyperspectral data cube is explained in Section 4. The proposed method of extracting endmembers using LAMs is also outlined in this section. Application examples and results are given in Section 5. The remaining sections are devoted to the mathematical foundation that guarantees the correctness of the proposed method. Specifically, Section 6 establishes the geometric classification of the fixed-point sets of LAMs. Although the theorems in this section have been published in Ritter and Gader (2006), we include this brief discussion for sake of completeness and to provide a more thorough understanding of the subsequent sections. Section 7 establishes the relationships among the hyperspectral data cube, the corresponding LAMs, and their fixed-point set. In Section 8 we present theorems and corollaries needed to prove the affine independence of the set of endmembers extracted by the proposed method. The claim of affine independence is made in Section 9 followed by a discussion of the method of extracting endmembers. A final discussion about our proposed methodology and conclusions drawn is presented in Section 10. Long mathematical proofs of a few theorems are included in the Appendix.
2. BASIC CONCEPTS FROM LATTICE THEORY Computational concepts for neural networks based on lattice theory are governed by the bounded lattice-ordered group (R1,_, ^, þ, þ0 ), where R denotes the set of real numbers, R1 ¼ R [{1, 1}, _ and ^ denote the binary operation of maximum and minimum, respectively, þ denotes addition, and þ0 denotes the dual operation of þ defined by aþ0 b ¼ aþb 8a 2 R and b 2 R1, and (1) þ0 1 ¼ 1 þ0 (1) ¼ 1, while (1) þ 1 ¼ 1 þ (1) ¼ 1. If r 2 R1, then its additive conjugate is given by r* ¼ r. Unless stated otherwise, a vector x 2 Rn1 , where Rn1 denotes the n-fold Cartesian product of R1, is always viewed as a column vector; that is, x ¼ (x1, . . ., xn)t, where xi 2 R1 for i ¼ 1, . . . , n and t denotes the transpose. Scalar addition of a vector x 2 Rn1 is defined componentwise. That is, if a 2 R1, then a þ x ¼ (a þ x1, . . ., a þ xn)t. The scalar dual addition a þ0 x is defined similarly. The conjugate of x 2 Rn1 is defined as x* ¼ (x)t. Given two vectors x, y 2 Rn1 , then the maximum and minimum of x and y, denoted by x _ y and x ^ y, respectively, are defined componentwise as (x _ y)i ¼ xi _ yi and (x ^ y)i ¼ xi ^ yi for i ¼ 1, . . ., n. This definition is used whenever both x and y are column vectors or both are row vectors. We note that the following duality holds: x _ y ¼ ðx ^ y Þ
;
x ^ y ¼ ðx _y Þ
(4.2)
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
117
The inequalities x y and x < y mean that xi yi and xi < yi, respectively, where i ¼ 1, . . ., n. Thus, if u ¼ x _ y and v ¼ x ^ y, then v u. As our application domain concerns only real valued vectors, we restrict our discussion to sets of vectors X ¼ fx1 ; . . . ; xk g Rn1 for which xx 2 Rn for x ¼ 1, . . ., k. With this restriction, the operation of scalar addition is self-dual as a þ xx ¼ a þ0 xx 8a 2 R1 and for all x ¼ 1, . . ., k. Henceforth, we suppose that X ¼ {x1, . . ., xk} Rn. A linear minimax combination of vectors from the set X is any vector x 2 Rn1 of the form x ¼ Sðx1 ; . . . ; xk Þ ¼
k _^
ðaxj þ xx Þ;
(4.3)
j2J x¼1
where J is a finite set of indices and axj 2 R1 8j 2 J and 8x ¼ 1, . . ., k. The expression S(x1, . . ., xk) given by Eq. (4.3) is called a linear minimax sum. In particular, any finite expression involving the symbols _, ^, and vectors of form a þ xx, where a 2 R1 and xx 2 X is a linear minimax sum with vectors from X (Ritter and Gader, 2006). A vector x 2 Rn is lattice dependent on X if and only if x ¼ S (x1, . . ., xk) for some linear minimax sum of vectors from X. We shall also use the simpler notation S (X) for S (x1, . . ., xk) whenever the set X under discussion is known. The vector x is said to be lattice independent of X if and only if it is not lattice dependent on X. The set X is said to be lattice independent if and only if 8l 2 {1, . . ., k}, xl is lattice independent of X\{xl} ¼ {xx 2 X: x 6¼ l}. To simplify notation, we define Xl ¼ X\{xl}. Linear minimax sums provide for the definitions of lattice dependence and lattice independence that are in close analogy with these concepts as defined in linear algebra. The set X is linearly independent if and only if xl 6¼ Sxx 2Xl ax xx for any set of constants {ax} R and any l ¼ 1, . . ., k. Analogously, the set X is lattice independent if and only if xl 6¼ S (Xl) for any set of constants {axj} R1 and any l ¼ 1, . . ., k. Given two m n matrices A ¼ (aij) and B ¼ (bij) with entries from R1, then the pointwise maximum, A _ B, of A and B, is the m n matrix C defined by A _ B ¼ C, where cij ¼ aij _ bij. Similarly, the pointwise minimum of two matrices of the same size is defined as A ^ B ¼ C, where cij ¼ aij ^ bij. If A is m p and B is p n, Wpthen the max product of A and B is the matrix C ¼ A _ B, where cij ¼ k¼1 ðaik þ bkj Þ. The min product of p A and B is the matrix C ¼ A ^ B defined by cij ¼ ^k¼1 ðaik þ0 bkj Þ. The two matrix products are collectively referred to as minimax products. A vector x 2 Rn1 is called a max fixed point of A if A _ x ¼ x and a min fixed point of A if A ^ x ¼ x.
118
Gerhard X. Ritter and Gonzalo Urcid
3. PROPERTIES OF LAMs Suppose X ¼ {x1, . . ., xk} Rn and Y ¼ {y1, . . ., yk} Rm are two sets of pattern vectors with desired association given by the diagonal {(xx, yx): x ¼ 1, . . ., k} of X Y. The goal is to store these pattern pairs in some memory that recalls yx when presented with the pattern xx, for x ¼ 1, . . ., k. With each pair of pattern associations (X, Y), two canonical latticeassociative memories denoted by WXY and MXY can be defined. These memories are matrices of size m n and can be expressed in terms of minimax outer products, namely, WXY ¼
k ^
ðyx þ xx Þ ; MXY ¼
x¼1
k _
ðyx þ xx Þ;
(4.4)
x¼1
where the minimax outer product is defined as y þ x ¼ y _ x ¼ y ^ x or, equivalently, as 0 1 y1 x1 . . . y1 xn B C .. (4.5) y þ x ¼ @ ⋮ . ⋮ A: ym x1
. . . ym xn
The entry elements of the min-memory WXY and the max-memory MXY, respectively, are defined by wij ¼
k ^
ðyxi xxj Þ ; mij ¼
x¼1
k _
ðyxi xxj Þ:
(4.6)
x¼1
The memories are called auto-associative memories whenever Y ¼ X. In this case the diagonals wii and mii of the matrices WXX and MXX, respectively, consist entirely of zeros. All entries of these matrices obey the following triangle inequalities. Theorem 3.1. If h, j, ‘ 2 {1, . . ., n}, then wj‘ þ w‘h wjh and mh‘ mhj þ mj‘. This theorem was proven as lemma 5.1 in Ritter and Gader (2006). When speaking of fixed points of the matrices WXX and MXX, we always mean a fixed point of WXX with respect to the operation _ and of MXX with respect to the operation ^ . The strengths and weaknesses of these lattice correlation matrices have been discussed in detail elsewhere (Ritter et al., 1998, 1999, 2003; Ritter and Gader, 2006). The following properties, which were proven in Ritter and Gader (2006), are pertinent to our discussion. In the following, X ¼ {x1, . . ., xk} Rn and x 2 Rn. Property 3.1. WXX _ xx ¼ xx ¼ MXX ^ xx for each x ¼ 1, . . ., k. Property 3.2. WXX _ x ¼ x if and only if MXX ^ x ¼ x.
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
119
Property 3.3. x is a fixed point of WXX if and only if x is lattice dependent on X or, equivalently, x is a fixed point of MXX if and only if x is lattice dependent on X. According to Property 3.1 above, both WXX and MXX are perfect recall memories for uncorrupted input patterns and any positive integer k, no matter how large. Also, Property 3.2 says that WXX and MXX share the same set of fixed points. Hence, every max fixed point of WXX is also a min fixed point of MXX and vice versa. We denote the common set of fixed points by F(X). Property 3.3 provides an algebraic classification of the set F(X).
4. LAMs AND ENDMEMBERS W V Henceforth we let u ¼ kx¼1 xx , and v ¼ kx¼1 xx . The hyperbox determined by v and u is the convex set B(v, u) ¼ {x 2 Rn: vi xi ui, i ¼ 1, . . ., n}. The points u and v are called the maximal and minimal corners of the hyperbox B (v, u), respectively. Obviously, X B (v, u) and by Property 3.1, X F(X). Therefore, X B (v, u) \ F(X). Also, with the memory WXX we associate a collection of n vectors W ¼ {w1, . . ., wn} defined by wj ¼ uj þ vj, where uj denotes the jth coordinate value of u and vj is the vector corresponding to the jth column of WXX. We need to emphasize that the collection W need not be a set in the strict sense as it is possible that for two column vectors wj and w‘ in W with j 6¼ ‘ we may have that wj ¼ w‘. In an analogous fashion, we define a collection M of vectors associated with MXX by defining mj 2 M as mj ¼ vj þ uj, where vj denotes the jth coordinate value of v and uj corresponds to the jth column of MXX. It follows from the detailed analysis presented in Sections 6 to 9 that the elements of the set V ¼ W [ M [ {v, u} are vertices of the convex polytope B (v, u) \ F(X). Thus, if X represents the hyperspectral image cube, then X is completely contained in B (v, u) \ F(X) and lends itself to convex hull analysis using the elements of V. In our approach to endmember determination, we consider the set X to be the total given set of pixels obtained from the hyperspectral image cube. This cube may have been significantly reduced in dimensionality by applications of a chosen technique, such as principal component analysis (PCA), minimum noise fraction transform, or adjacent band removal of highly correlated bands. Such reductions are often necessary in view of storage and computational requirements. For example, the storage space required for a single hyperspectral image can be around 134 Mb (Vane et al., 1993). The first step of the algorithm is to form the memories WXX and MXX ¼ WXX ¼ WtXX and the computation of the vectors v and u. The next step consists of forming the sets W and M from the columns of WXX and MXX, respectively. The set V then contains all possible
120
Gerhard X. Ritter and Gonzalo Urcid
endmembers for analyzing X, with v representing the dark point. To find affinely independent subsets of V we turn our attention to the sets W and M. In many but not necessarily all cases, the sets W and M turn out to be affinely and even linearly independent. Specifically, if wj 6¼ w‘ and mj 6¼ m‘ for all distinct pairs j, ‘ 2 {1, . . ., n}, then W and M are affinely independent (see Theorem 9.1). In this case, we generally have that W [ {v} is also affinely independent. In fact, as Example 7 shows, it is possible that W ¼ W [ {v}. This holds if and only if vi ¼ ui for some i 2 {1, . . ., n}. However, in practice this case will have been eliminated since it means that the ith band is of constant value. Thus, if v is not affinely dependent on W, then W [ {v} spans an n-simplex with the endmember wj containing the maximal measured brightness in the jth band since j wj ¼ uj þ wjj ¼ uj . If wj ¼ w‘ or mj ¼ m‘ for some pair j 6¼ ‘, then a simple ~ or M ~ from W or procedure to obtain a smaller affinely independent set W M, respectively, is described in detail following Theorem 9.1. But in either ~ generally contain too large a number of endmembers. case the sets W or W However, a priori knowledge of the desired number of endmembers or use of alternative endmember reduction methods can be applied to eliminate possibly unimportant endmembers. In the next section we provide one such application.
5. APPLICATION EXAMPLES AND RESULTS The following examples are based on hyperspectral data remotely sensed by the Airborne Visible and Infrared Imaging Spectrometer (AVIRIS) of the Jet Propulsion Laboratory at the National Aeronautics and Space Administration (NASA). The aim of the application examples is to provide enough detail to demonstrate the effectiveness of using the canonical lattice auto-associative memories, WXX and MXX, to determine sets of endmembers from which a subset of final endmembers is selected to accomplish hyperspectral image segmentation by means of constrained linear unmixing. Other methods for autonomous final endmember determination have been described elsewhere, for example, vertex component analysis (VCA) (Nascimento and Bioucas-Dias, 2005), hierarchical Bayesian models (Dobigeon et al., 2007), minimum mutual information (Gran˜a et al., 2007), minimum-volume enclosing simplex (MVES) (Chan, et al., 2009a,b), and pattern elimination based on minimal Chebyshev distance or angle between vector pairs (Ritter et al., 2009). At the end of this section, linear unmixing based on the VCA and MVES approaches is described briefly, including a qualitative comparison of the results obtained between them and the W & M columns method. Example 1. One of the most studied cases in mineralogical and chemical composition has been the mining site of Cuprite, Nevada. The United States
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
121
Geographical Survey (USGS) Spectroscopy Laboratory has produced detailed maps of mineral and chemical compound distribution at the Cuprite site. The left part of Figure 1 shows a grey-scale image obtained from a red-green-blue (RGB) color image with wavelengths of 0.67 mm (red), 0.56 mm (green), and 0.48 mm (blue). The right image of Figure 1 displays a simplified 6–grey-tone map, built with five representative minerals chosen from the mineral groups present in the original USGS image map produced by Clark and Swayze in 1995 (see Table 1; the assigned grey tone is a visual aid to distinguish mineral distribution). The USGS source image map shows only the geographical region of Cuprite where mineral distribution is prominent. It has a width of 534 pixels (of 614 pixels available in a scan line) and is formed with the
FIGURE 1 Left, grey-scale equivalent of an RGB composite image of the mining site of Cuprite, Novada. Right, simplified 6–grey-tone mineral distribution map. The true size of each image is 534 972 pixels. TABLE 1
Representative minerals in the cuprite site
Description
Grey Tone
Area (%)
Alunite GDS96 (250C) Calcite CO 2004 Kaolinite KGa-2 (pxl) Montmorillonite SCa-2.b Muscovite CU93-1 low-Al Unclassified
255 191 223 159 127 0
4.2 22.6 10.8 16.1 27.1 19.2
122
Gerhard X. Ritter and Gonzalo Urcid
lower 358 scan lines of scene no. 3, followed by all 512 scan lines of scene no. 4, and ends with the upper 102 scan lines of scene no. 5 as registered in the 1997 flight. The first 80 pixels were dropped from each scan line to match spatially the region studied by USGS scientists. The AVIRIS device acquires hyperspectral images in 224 channels; however, only 52 noiseless channels that fall within the short-wavelength infrared band are considered for mineral detection. Specifically, channels no. 169 (1.95 mm) to 220 (2.47 mm) are used to match remote spectra against subsampled ground or laboratory USGS spectra, since these last ones are obtained with higher spectral resolution (Clark et al., 2007). However, only 48 AVIRIS channels were selected for spectral matching due to the existing difference in wavelength scale between ground and remote spectrometers. Table 2 gives the correspondence between the wavelength scale used by the AVIRIS imaging spectrometer and the reference scale of a USGS laboratory spectrometer, showing explicitly which channels were not considered. In this example, an artificial test data cube associated with the 6–grey-tone map is composed of ‘‘pure’’ pixel spectra corresponding to the minerals listed in Table 1 and illustrated in Figure 2. Note that each pixel spectra is a 48-dimensional vector. Computation of the lattice auto-associative memories WXX and MXX is quite simple in this case. From Eq. (4.6), e.g., for i, j ¼ 1, . . ., n with n ¼ 48, wij ¼
k ^
ðxxi xxj Þ ¼
x¼1
m ^ ^ p¼1x2Op
ðxxi xxj Þ ¼
m ^
x
x
ðxi p xj p Þ;
(4.7)
p¼1
P where, k ¼ m p¼1 jOp j ¼ 419; 390 (of 519, 048 ¼ 534 972 vectors), is the number of nonzero vectors in the data cube, m ¼ 5 is the number of ‘‘pure’’ pixels, each Op is the family of indices that correspond to pixels describing TABLE 2 AVIRIS and USGS spectrometer wavelength scales in microns (mm). AVIRIS channels not used for spectral sample matching are marked with an X Channel
AVIRIS
USGS
212 213 214 215 216 217 218 219 220
2.388 2.398 2.408 2.418 2.427 2.437 2.447 2.457 2.467
2.386 2.400 X 2.418 X 2.440 X X 2.466
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
123
0.9
Normalized reflectance
0.8 0.7 0.6 0.5 0.4 0.3 0.2 2.00
Alunite Calcite Kaolinite Montmorillonite Muscovite
2.05
2.10
2.15
2.20
2.25
2.30
2.35
2.40
2.45
Wavelength (mm)
FIGURE 2
Spectral curves of the minerals given in Table 1 (true endmembers).
the same mineral, and xp 2 {1, . . ., k} is the first sequential index in the data cube equal to a different spectral vector. The first equality in Eq. (4.7) is a consequence of the associative property of the minimum binary operation, the second equality follows by idempotency. With X reduced to fxx1 ; . . . ; xx5 g, the scaled min- and max-memories are calculated in milliseconds (see Example 2, Section 5), and the 48 columns of each memory matrix give all candidate endmembers. Since this is a test example for which a priori knowledge is available about the hyperspectral image, a simple matching procedure applied to the column vectors wj 2 W and mj 2 M, together with the u and v bounds, immediately yields a subset of final endmembers. Let Y ¼ W [ {u} and, for p ¼ 1, . . ., 5 and q ¼ 1, . . ., 49, let cpq ¼ rðxxp ; yq Þ, where xxp 2 X and yq 2 Y, be the linear correlation coefficients between true and LAM-determined endmembers. In addition, for all p, let mp ¼ max1q49 {cpq} be the maximum correlation coefficient of each pure pixel against all potential endmembers in Y, and compute bpq ¼ ifðcpq ¼ mp ; ifðcpq a; 1; 0Þ; 0Þ;
(4.8)
where a is a parameter used to threshold correlation coefficient values and if(condition, true, false) is the usual if-then-else programming construct expressed as a three-argument function. A binary value, bpq ¼ 1, gives the row and column indices of a ‘‘very good’’ match between a true endmember (row index) and the potential endmember marked as final
124
Gerhard X. Ritter and Gonzalo Urcid
(column index). A similar spectral matching procedure is applied to the set Z ¼ M [ {v}. As it turns out, all five ‘‘pure’’ pixels used to build the artificial data cube are correctly determined from both LAMs as final endmembers, a result in agreement with the theoretical background developed in Sections 7 to 9. Figure 3 displays the set of true and final endmembers, where the curve labels give the specific element in W [ M [ {u, v} whose correlation coefficient, denoted by r, is highest if compared to each true endmember (‘‘pure’’ spectra). Final and true endmember curves are superimposed if a perfect match exists (r ¼ 1) and, to avoid too much overlap, the bottom two curves (muscovite) are displaced 0.1 off their original normalized reflectances. As a final remark, the set {v, m1, m19, m28, w2, u}, where v is taken as the dark point, is affinely independent and its convex hull forms a 5-simplex that encloses X. The corresponding abundance maps are displayed in Figure 4. Example 2. Moffett Field is a remote sensing test site on the bay of San Francisco, a few kilometers north of the city of Mountain View, California. The site includes the Naval Air Station Moffet Field, agriculture fields, water ponds, salt banks, and manmade constructions such as several airplane hangars that today are museums. The hyperspectral data collected by AVIRIS are ideal for water variability, vegetation, and urban studies (Anderson et al., 2007; Kruse
0.9 Calcite r =1
Normalized reflectance
0.8
Alunite r = 0.988
0.6 0.5
u
Kaolinite 2 r = 0.999 w
0.7
m19
0.4
Muscovite r = 0.963 m28
0.3
m. m. r =1 m1
True endmember Final endmember
0.2 2.00
2.05
2.10
2.15 2.20 2.25 2.30 Wavelength(mm)
2.35
2.40
2.45
FIGURE 3 The solid lines represent the final endmembers determined from the set W [ M [ {u, v} and the dashed lines correspond to the given true endmembers; the mineral montmorillonite is abbreviated as m.m.
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
125
FIGURE 4 Top row, left to right: 6–grey-tone mineral distribution map of Cuprite, abundance maps of alunite (m1) and calcite (u). Bottom row, left to right: abundance maps of kaolinite (w2), montmorillonite (m1), and muscovite (m28).
et al., 1997; Richardson et al., 1994). Figure 5 shows a grey-scale equivalent of an RGB color image of scene no. 3 composed with wavelengths of 0.656 mm (red, channel 30), 0.508 mm (green, channel 15), and 0.409 mm (blue, channel 5). Several artificial and natural resources can be extracted from the scene using only 103 noiseless channels that fall within the visible and first half of the short-wavelength infrared bands. Specifically, channels no. 4 (0.40 mm) to 106 (1.34 mm) can be used for endmember determination. Thus, the hyperspectral
126
Gerhard X. Ritter and Gonzalo Urcid
FIGURE 5 Grey-scale equivalent of an RGB composite color image of scene no. 3 of the Moffett Field site in California. True size of image is 614 512 pixels.
cube of reflectance data is formed by 314, 368 pixel spectra (614 512) and each pixel is a 103-dimensional vector. In this case, the specific format of the AVIRIS data file requires as a first step the extraction of all pixel spectra line by line to form set X ¼ {x1, . . ., xk} Rn, where k ¼ 314,368 and n ¼ 103. The second step performs the computation of the vector bounds u and v and the scaled sets W and M, respectively with Eqs. (4.9) and (4.10), for i, j ¼ 1, . . ., n; see also Eq. (4.33). wij ¼
k _ x¼1
mij ¼
xxj þ
k ^ x¼1
xxj þ
k ^
ðxxi xxj Þ;
(4.9)
ðxxi xxj Þ:
(4.10)
x¼1 k _
x¼1
We point out that the scaling operation generates an ‘‘upward spike’’ in endmembers selected from W since wii ¼ ui, or a ‘‘downward spike’’ if endmembers come from M because mii ¼ vi. A simple smoothing procedure to eliminate the anomalous spikes considers the nearest one
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
127
or two spectral samples next to wii or mii (Myers, 2005). It is given, for any i 2 {1, . . ., n}, by 8 , i ¼ 1; y2;1 > > <1 (4.11) yii ¼ 2 ðyi1;i þ yiþ1;i Þ , 1 < i < n; > > : yn1;n , i ¼ n; where y equals w or m. Since Y ¼ W [ {u} or Z ¼ M [ {v} result in matrices of size 103 104, each one gives all 104 column vectors as possible endmembers. The LAMs method always gives a number of ‘‘candidate’’ endmembers that is either equal or slightly less than the spectral dimensionality, but in practice several contiguous columns are highly correlated. Hence, most of these potential endmembers can be discarded using appropriate techniques. As an alternative way to automate endmember screening, we calculate a matrix whose entries are given by the linear correlation coefficients, cpq ¼ R(yp, yq) for p, q ¼ 1, . . ., n, where yp, yq 2 Y (resp. in Z). A threshold t value is then applied on cpq to get a subset of selected endmember pairs with low correlation coefficients. For scene no. 3 of the Moffett Field data, t ¼ 0.005 was applied to Y (resp., t ¼ 0.0005 for Z). The resulting set is first sorted by ascending column index and after elimination of repeated or contiguous indices, the number of potential endmembers is decreased from 104 to 10. On physical ground, it should be clear that a low correlation value between potential endmembers does not necessarily guarantee a clear-cut criterion to obtain ‘‘good’’ endmembers useful to match highresolution laboratory spectra. Nevertheless, the reduction in the number of LAM column vectors is quite useful to find a smaller subset of final endmembers. Another, simpler for the pffiffiffiffiffiffiffiffiffiffiffibut supervised technique pffiffiffiffiffiffiffiffiffiffiffi latticebased approach forms n þ 1 subsets, each with n þ 1 column vectors taken from Y (resp. Z), and then, a representative from each group is selected at random as a final endmember. Thus, in this scheme, the number of possible endmembers is always diminished one order of magnitude. Both techniques provide a reasonable number of approximate true endmembers without sacrificing spectral resolution, which, from a physical point of view, is an advantage compared with data dimensionality reduction techniques such as PCA (Duda, et al., 2000). Figures 6 and 7 display the final endmember curves for scene no. 3 of the Moffett Field site determined, respectively, from the scaled version of the min memory WXX and the max memory MXX. Normalization of reflectance data values in spectral distributions is linearly scaled from the range [50, 12000] to the unit interval [0, 1]. For the problem at hand, the vector bounds u and v, as well as many other potential endmembers, were rejected by the correlation coefficient technique described earlier.
128
Gerhard X. Ritter and Gonzalo Urcid
1.0 0.9
p./f. Water w28
Normalized reflectance
0.8
Vegetation w49
0.7
Manmade
0.6 w4
0.5 0.4 0.3
w78
0.2
Pigmented water
0.1 0.0 0.4
0.5 .5
0.6
0.7
0.8 0.9 1.0 Wavelength (mm)
1.1
1.2
1.3
FIGURE 6 Spectral curves of final endmembers determined from Y, with corresponding resource classification; p./f., ¼ pigmented/fresh.
1.0 0.9
Normalized reflectance
0.8 Manmade/soil
0.7
m1 m24
0.6 0.5
Vegetation
0.4 0.3 m78
0.2 0.1
Algae-laden water & s.s.c.
0.0 0.4
0.5
0.6
0.7
0.8 0.9 1.0 Wavelength(mm)
1.1
1.2
1.3
FIGURE 7 Spectral curves of final endmembers determined from Z, with corresponding resource classification; s.s.c. ¼ suspended sediment concentration.
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
129
For the unmixing stage, recall that Eq. (4.1) is an overdetermined system of linear equations (n >m), subject to the restrictions of full additivity and nonnegativity of abundance coefficients as stated in the same equation. In the present case, in matrix W (resp. M), any two-column vectors are distinct and therefore its columns form a linear independent set of vectors. Also, it happens that the set of final endmembers, E ¼ {w4, w28, w49, w78, m1, m24, m78} W [ M, is a linear independent set whose pseudoinverse matrix is unique. Although the unconstrained solution corresponding to Eq. (4.1), where n ¼ 103 >7 ¼ m, has a single solution, some coefficients turn out to be negative for many pixel spectra and do not sum up to unity. If full additivity is enforced, negative coefficients again appear. Therefore, the best approach consists of imposing nonnegativity for the abundance proportions and P simultaneously relaxing full additivity by considering the inequality m k¼1 ak < 1. Specifically, we use the nonnegative least squares (NNLS) algorithm that solves the problem of minimizing the Euclidean norm expressed by kSa xk2, subject to the condition a 0. The details related to the NNLS algorithm can be found in Ikramov and Matin-far (2006); and Lawson and Hanson (1974). Figures 8 to 14 illustrate the abundance maps corresponding to the seven final endmembers shown in Figures 6 and 7. The maps shown were obtained
FIGURE 8 Abundance map obtained from w4; brighter grey tones indicate artificial or manmade resources (e.g., urban settlements).
130
Gerhard X. Ritter and Gonzalo Urcid
with the NNLS numerical method as implemented in MATLAB (The Math Works, Inc., Natick, MA) and, for visual clarity, each map has been contrast enhanced by the application of a nonlinear increasing function. Note that in the present situation, we are faced with no a priori knowledge about the hyperspectral image content, and therefore, approximate resource classification is possible after the abundance maps are computed. However, a minimal base knowledge of hyperspectral image analysis and a fundamental background in spectral analysis are required for proper recognition of spectral signature characteristics in single, multiple, or mixed resources. For example, in the abundance map shown in Figure 12 (final endmember, m1), the presence of manmade structures or soil is clearly differentiated. Also, observe that the abundance map shown in Figure 11, obtained from final endmember w78, is somewhat similar to the abundance map generated from endmember w28 (see Figure 9) because each spectral distribution provides information about the presence of pigmented water. However, additional pigmented areas in Figure 11 are related to wet soil and wet vegetation. To visualize these ‘‘hidden’’ areas, the abundance image map associated with w78 was enhanced by applying a nonlinear contrast curve
FIGURE 9 Abundance map obtained from w28 showing pigmented and freshwater distribution. The bright regions at the left correspond to evaporation ponds pigmented by red brine shrimp.
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
131
FIGURE 10 Abundance map obtained from w49; brighter grey tones indicate vegetation distribution (e.g., shoreline and inland vegetation).
FIGURE 11 Abundance map obtained from w78 displaying the red-pigmented areas shown in Figure 9. In addition, it shows other pigmented areas of wet soil and wet vegetation.
132
Gerhard X. Ritter and Gonzalo Urcid
FIGURE 12 Abundance map obtained from m1; grey-tone distribution reveals both manmade resources and different kinds of soil.
FIGURE 13 Abundance map obtained from m24. Large, irregular white regions with dark grooves correspond to golf courses.
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
133
FIGURE 14 Abundance map obtained from m78 showing algae-laden water with some concentration of suspended sediment.
defined by the points (0, 40), (25, 115), (70, 172), (130, 195), and (250, 190) over the nominal input-output grey-scale range. Thus, the identification labels that appear in Figure 6 and 7 are the result of an overall spectral scrutiny, aided by a visual match between an RGB ‘‘true’’ color reference image and each final endmember corresponding abundance map obtained by constrained linear unmixing. Precise resource identification would require an exhaustive matching procedure between each final endmember that belongs to E against high-resolution signatures available in professional spectral libraries (Clark, et al., 2007; Hook, 1999). Table 3 resumes the time spent by a low-end personal computer (1.8-GHz processor, 512-Mb RAM, 2-Gb virtual RAM, and 80-Gb hard disk) for each computational task performed in the analysis of scene no. 3 of the Moffet Field hyperspectral image. Note that tasks 5 and 7 in the analysis process are of an interactive nature but rely, respectively, on tasks 4 and 6, which are completely automatic. Specifically, tasks 2, 3, and 4 are the fundamental steps of the LAMs-based technique whose computational effort is minimum in comparison to the processing times needed to perform the other tasks. For a hyperspectral image of size N ¼ p q pixels acquired over n spectral bands, the computational effort required for pixel spectra extraction (task 1) is linear in N since n N. On the other hand, the
134
Gerhard X. Ritter and Gonzalo Urcid
TABLE 3 Processing times for Example 2 Task No./Description
Time (min)
1 2 3 4 5 6 7
11.23 5.65 0.01 2.52 — 32.00 —
Pixel spectra extraction Vector bounds and LAMs computation LAMs linear independence test Correlation-based endmember elimination Final endmember selection NNLS abundance map generation General resource classification
NNLS method needs on the order of nm3 arithmetical operations to find a unique set of abundance coefficients for each pixel spectra. Since m represents the number of final endmembers where m N, it turns out that for hyperspectral image segmentation the computational complexity of the NNLs method (task 6) is nm3 per pixel. The overall computational complexity of the LAMs-based technique (tasks 2, 3, and 4) is proportional to n2, which for values of n of a few hundreds is quite fast (see Table 3). The endmember determination (task 5) relies on a subset of pffiffiffiffiffiffiffiffiffiffiffi 2 n þ 1 candidate extremal vectors generated after task 4 is completed pffiffiffi and, although interactive in nature, m / b nc ‘‘final’’ endmembers can be selected in a lapse of minutes. Finally, general resource classification (task 7) may also be accomplished within minutes whenever a working knowledge in spectral identification or prior experience in hyperspectral image analysis is available for the problem at hand (Jensen, 2007). Two recent approaches to linear spectral unmixing, comparable in performance to N-FindR and based on the geometry of convex sets, are vertex component analysis (VCA) by Nascimento and Bioucas-Dias (2005) and MVES by Chan et al. (2009a,b). VCA is an unsupervised technique that relies on singular value decomposition (SVD) and PCA as subprocedures and assumes the existence of pure pixels. Specifically, VCA exploits the fact that endmembers are vertices of a simplex and that the affine transformation of a simplex is again a simplex. This algorithm iteratively projects data onto a direction orthogonal to the subspace generated by the endmembers already determined. The new endmember spectrum is the extreme of the projection and the main loop continues until all given endmembers are exhausted. Similarly, the MVES is an autonomous technique supported on a linear programming (LP) solver but does not require the existence of pure pixels in the hyperspectral image. However, when pure pixels exist, the MVES technique leads to unique identification of endmembers. In particular, dimension reduction is accomplished by affine set fitting, and Craig’s unmixing criterion (Craig, 1994) is applied to formulate hyperspectral unmixing as an MVES optimization problem.
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
TABLE 4
135
Characteristics of three autonomous linear unmixing techniques
Algorithm
Input
Endmembers
Abundances
Complexity
VCA MVES W&M
X, m X, m X
SVD, PCA APS, LPs LAMs, CMs
M-P inverse M-P inverse LNNS
2n2N þ m3, 2m2N 2n2N þ m3, am2N1.5 n2N(n þ 3), nm3N
Table 4 gives the main characteristics of the VCA and MVES convex geometry–based algorithms, as well as the lattice algebra approach based on the W & M-scaled column vectors. In the third column of Table 4, the numerical procedures used by VCA are SVD for projections onto a subspace of dimension m and PCA for projections onto a subspace of dimension m 1. Algorithm MVES first determines the affine parameters set (APS), solves by LP an initial feasibility problem with linear convex constraints, and iteratively optimizes two LP problems with nonconvex objective functions. The W & M algorithm computes first the scaled min and max LAMs and their corresponding correlation coefficient matrices (CMs) for further endmember discrimination. Abundance coefficients (fourth column) are determined in the VCA and MVES algorithms using the Moore–Penrose pseudoinverse and, as explained earlier, the W & M algorithm makes use of the NNLS method. Computational complexity, listed in the fifth column, is expressed as a function of the number of endmembers (m), the number of pixels or observed spectra (N), the number of iterations (a), and the number of spectral bands (n). The left expression stands for the complexity of the numerical procedures on which each algorithm relies. Thus, 2n2N þ m3 floating-point operations are required by SVD, PCA, or APS. Note that the corresponding complexity to compute both LAMs and their CMs does not depend on m. The right expression in the same column gives the complexity of finding endmembers and calculating their abundance coefficients for all pixels of input image X. We remark that since VCA and MVES require knowing in advance the number m of endmembers to be found, their application to a real hyperspectral image must probe all values of m in a specified interval, for example, from 1 to mmax. Hence, if no a priori information is known about the number of pure pixels existing in a hyperspectral image, the computational performance for finding endmembers andP determining max 2 their P abundance fractions increases, respectively, to 2N m m¼1 m and m max aN 1:5 m¼1 m2 , which are proportional to m3max N and am3max N1:5 (see right expression, fifth column). Therefore, in this case, overall complexity is similar among the three listed techniques. Furthermore, the application of the W & M columns method, briefly explained in Section 4 and
136
Gerhard X. Ritter and Gonzalo Urcid
theoretically founded in the following sections, was validated by the application of the VCA and MVES algorithms to specific subimages of Examples 1 and 2.
6. THE GEOMETRY OF F(X) The algebraic classification of F(X) conveys little information as to the actual shape of F(X) or its geometric relationships to X, WXX, and MXX. These relationships are key in understanding the lattice-based approach to autonomous endmember detection and in proving the affine independence of the endmembers found by our algorithm. The shape characterization of F(X) requires some basic knowledge of vector space theory and high-dimensional Euclidean geometry. Specifically, the notions of linear spaces, direction of linear spaces, and orientation in higher-dimensional spaces are essential. A set E in a vector space V over a field F is called a linear subspace of V if and only if for any two distinct points x, y 2 E the set L(x, y) ¼ {z 2 V: z ¼ lx þ (1 l)y, l 2 F} E. It follows that in vector spaces linear subspaces are equivalent to affine subspaces (but not necessarily vector subspaces) (Critescu, 1977; Hadwiger, 1957). For spaces other than vector spaces, linear subspaces generally have no relation to affine spaces (Batten and Rentelspacher, 1993). In our subsequent discussion, V ¼ Rn, F ¼ R, and the terms linear subspace and affine subspace can be used interchangeably. Several linear subspaces of Rn play a vital role in describing F(X). One such set of linear subspaces consists of lines of form LðxÞ ¼ fy 2 Rn : y ¼ a þ x; a 2 Rg;
(4.12)
where x 2 Rn is fixed. The connection between L(x) and F(X) is given by the following observation. If x 2 F(X), then a þ x 2 F(X) 8 a 2 R (see theorem 6.2 in Ritter and Gader, 2006) and, hence, L(x) F(X). Another set of linear subspaces consists of specific types of hyperplanes. Recall that a hyperplane E in Rn is defined as the set of all points x 2 Rn that satisfy an equation of the form a1 x1 þ a2 x2 þ . . . þ an xn ¼ b;
(4.13)
where the ai’s and b are constants and not all the ai’s are zero. It follows from Eq. (4.13) that E is an (n 1)-dimensional linear subspace of Rn. It will be convenient to associate a set of direction with a given linear subspace E of Rn and an orientation with a given hyperplane. For this purpose we let e1, . . ., en denote the canonical orthonormal basis for Rn; that is, each ej is defined by 1 if i ¼ j j : (4.14) ei ¼ 0 if i 6¼ j
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
137
In addition to the canonical basis vectors, the directional vectors pertinent to our discussion are the vectors e and dij defined by e¼
e1 þ þ en ei ej ; d ; ¼ ij ke1 þ þ en k kei ej k
(4.15)
where i < j, 1 i < n, and 1< j n. Directions and orientations in Rn will be specified in terms of unit vectors emanating from the origin 0 with endpoints lying on the (n 1)P dimensional unit sphere Sn1 ¼ fx 2 Rn : ni¼1 x2i ¼ 1g centered at the origin. Thus, a direction d is uniquely determined by a system of directional cosines that determines the coordinates of d on the unit sphere. This system is defined by cos yi ¼ ei d ¼ di ;
n X
cos2 yi ¼ 1:
(4.16)
i¼1
A directional vector d is a parallel direction, or simply a direction for a kdimensional linear subspace Ek Rn with 0< k < n if and only if for every point x 2 Ek, x þ d 2 Ek. If D(Ek) ¼ {d 2 Sn1: d is a parallel direction for Ek}, then D(Ek) is called the set of parallel directions affiliated with EK and this set forms a (k 1)-dimensional subsphere of Sn1. Also, two linear subspaces Ek and E‘ (k ‘) are said to be parallel if and only if D(Ek) D(E‘). Obviously, all lines of form L(x) ¼ {y 2 Rn: y ¼ a þ x, a 2 R} are parallel one-dimensional (1D) linear subspaces with affiliated parallel directions e and e and D(L(x)) ¼ {e, e} is a 0-dimensional subsphere of Sn1. An oriented hyperplane E(d) is simply a hyperplane E with an associated directional unit vector d that is normal (perpendicular) to E. We note that since the vector d points in the opposite direction of d (i.e., is the antipode of d on the unit sphere), each hyperplane can be endowed with one of two possible directions. Given two oriented hyperplanes E1(d) and E1(v), then E1 and E2 are said to be parallel, denoted by E1 k E2, whenever d ¼ v or, equivalently, whenever d v ¼ 1. If d v 6¼ 1, then E1 \ E2 is a linear subspace of dimension n 2. A special case occurs when d v ¼ 0. In this case, E1 and E2 are said to be perpendicular, denoted by E1 ? E2. An oriented hyperplane E(d) separates Rn into two open half-spaces þ H (d) and H(d) that are bounded by E(d). If E is given by Eq. (4.13), then E can also be expressed in terms of the function f ðxÞ ¼ a1 x1 þ a2 x2 þ . . . þ an xn b ¼ 0: þ
(4.17)
We shall use the convention of identifying H (d) and H (d) with the half-spaces {x 2 Rn: f(x) >0} and {x 2 Rn: f(x)< 0}, respectively. The closure þ(d) ¼ {x 2 Rn: f(x) 0}. Similarly, H (d) ¼ {x 2 Rn: of Hþ(d) is the set H þ (d) ¼ E(d). (d) \ H f(x) 0}. Therefore, H
138
Gerhard X. Ritter and Gonzalo Urcid
It is instructive to illuminate the geometric properties of hyperplanes of type E(dij) as it is the half-spaces of these hyperplanes that determine the shape of F(X) and aid in the autonomous selection of the set of endmembers. For y 2 Rn, there exist exactly [n(n 1)]/2 distinct hyperplanes of type E(dij), where 1 i < j n, that contain the point y (Ritter and Gader, 2006). We let Ey(dij) denote the unique hyperplane of type E(dij) containing y 2 Rn. Thus, if the dimension n ¼ 2 and y ¼ (y1, y2)t 2 R2, then there is only one such hyperplane—namely, the line Ey(d12) ¼ L(y) of slope m ¼ 1 given by x2p¼ffiffiffi x1 þ p b, ffiffiwhere b ¼ y2 y1. Explicitly, E ffi pyffiffi(d ffi 12)phas ffiffiffi orientation d12 ¼ ð1= 2; 1= 2Þ and direction e ¼ ð1= 2; 1= 2Þ. Figure 15 illustrates the three unique hyperplanes Ey(dij) for y 2 R3. Since Ey(dij) ¼ {x 2 Rn: xi xj ¼ yi yj}, it is easy to verify that each of the hyperplanes contains the line L(y) and that the equality \i<jEy(dij) ¼ L(y) holds in general. Another important geometric construction is derived from the following observation. Note that for each pair (i, j) with 1 i < j 3, the hyperplane Ey(dij) in Figure 15 is perpendicular to the (xi, xj)–plane in R3. This holds true in any dimension. The hyperplane Rn1 Rn defined i i n1 n by Ri ¼ fx 2 R : xi ¼ 0g has orientation e , and if ‘ 2 = {i, j}, then dij e‘ ¼ 0. n1 Therefore, Ey ðdij Þ ? R‘ . The case where ‘ 2 {i, j} is of special interest. In this case, our interest . If r denotes the constant r ¼ yj yi, concerns the sets Sy ðdij Þ ¼ Ey ðdij Þ \ Rn1 ‘ then : xj ¼ rg Sy ðd‘j Þ ¼ fx 2 Rn1 ‘
if ‘ ¼ i; and
Sy ðdi‘ Þ ¼ fx 2 Rn1 : xi ¼ rg if ‘ ¼ j: ‘
(4.18) (4.19)
x3
Ey(d13)
Ey(d23) y
x1
x2 L(y) Ey(d12)
FIGURE 15 The three unique hyerplanes containing y 2 R3; their intersection is the line L(y).
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
139
Since r is a constant, Sy(d‘j) is parallel to the hyperplane Rn2 ¼ ‘j n1 n2 fx 2 Rn1 : x ¼ 0g in the space R . Both S (d ) and R have orientaj y ‘j ‘ ‘ ‘j tion ej in Rn1 . Similarly, Sy ðdi‘ Þ k Rn2 ¼ fx 2 Rn1 : xi ¼ 0g and both ‘ i ‘ have orientation ei in Rn1 Sy(di‘) and Rn2 i‘ ‘ . This analysis proves the following theorem. ; (2) Sy ðd‘j Þ k Rn2 in Rn1 Theorem 6.1. (1) If ‘ 2 = {i, j}, then Ey ðdij Þ ? Rn1 ‘ ‘j ‘ j n2 n1 and both are oriented by e ; (3) Sy ðd‘i Þ k R‘i in R‘ and both are oriented by ei; . and (4) Sy ðd‘i Þ ? Sy ðd‘j Þ in Rn1 ‘ The shape of F(X) for X ¼ {x1, . . ., xk} Rn can be derived from , where l ¼ 1, . . ., n. The hyperboxes situated in the linear subspaces Rn1 ‘ maximal and minimal corners of these hyperboxes are directly related to the columns of matrices WXX W and MXX. For each xV ¼ 1, . . ., k and ‘ 2 {1, . . ., n} set xx ð‘Þ ¼ xxl þ xx ; u‘ ¼ kx¼1 xx ð‘Þ, and v ¼ kx¼1 xx ð‘Þ. The hyperbox : determined by v‘ and u‘ is the convex set Bðv‘ ; u‘ Þ ¼ fx 2 Rn1 ‘ v‘i xi u‘i ; i ¼ 1; . . . ; n 1g. The points u‘ and v‘ are called the maximal and minimal corners of the hyperbox B (v‘, u‘), respectively. For any value of ‘, we have the following fundamental relationships between these corners. j j Theorem 6.2. W For i; j 2 f1; . . .W; ng; uij ¼ vi and vWij ¼ ui . W k k k x Proof. uij ¼ x¼1 ½xx ðiÞj ¼ x¼1 ½xi þ xx j ¼ x¼1 ðxxi þ Xjx Þ ¼ kx¼1 V V V ðxxj xxi Þ ¼ kx¼1 ðxxj xxi Þ ¼ kx¼1 ðxxj þ xxi Þ ¼ kx¼1 ½xxj þ xx i ¼ V j j kx¼1 ½Xx ðjÞi ¼ vi . The proof of vij ¼ ui is analogous. □ As an easy consequence, we have Corollary 6.3. Eui ðdij Þ ¼ Evj ðdij Þ and Evi ðdij Þ ¼ Euj ðdij Þ. Proof. Note that uii ¼ 0 ¼ vij . Thus, Eui ðdij Þ ¼ fx 2 Rn : xi ¼ xj þ ðuii j uij Þg ¼ fx 2 Rn : xi ¼ xj uij g ¼fx 2 Rn : xi ¼ xj þ vi g ¼ fx 2 Rn : xi ¼ xj þ j j ðvi vj Þg ¼ Evj ðdij Þ. A similar argument shows that Evi ðdij Þ ¼ Euj ðdij Þ. □ and since they are hyperFor i < j, we have Svi ðdij Þ k Sui ðdij Þ in Rn1 i planes in Rn1 , each separates Rn1 into two open half-spaces. For any i i n1 , we let Fþ y 2 Rn1 y ðdij Þ and Fy ðdij Þ denote the open half-spaces of Ri i associated with the hyperplane Sy(dij). Suppose ‘ ¼ 1 and consider the sets Su1 ðd1j Þ and Sv1 ðd1j Þ. Since Su1 ðd1j Þ is defined by f ðxÞ ¼ xj þ u1j ¼ 0, it follows that for any z 2 Bðv1 ; u1 Þ, v1j zj u1j for j ¼ 1, . . ., n and, hence, f ðzÞ ¼ zj þ u1j ¼ u1j zj 0. þ1 ðd1j Þ and, hence, Bðv1 ; u1 Þ F þ1 ðd1j Þ for j ¼ 2, . . ., n. Therefore, z 2 F u u Similarly, Sv1 ðd1j Þ is defined by f ðxÞ ¼ xj þ v1j ¼ 0. Hence, for any 1 ðd1j Þ for any z 2 Bðv1 ; u1 Þ; f ðzÞ ¼ v1j zj 0. Therefore, Bðv1 ; u1 Þ F v j ¼ 2, . . ., n. This proves that n \ þ1 ðd1j Þ \ F 1 ðd1j Þ: (4.20) Bðv1 ; u1 Þ ½F u v j¼2
On the other hand, if ‘ ¼ n, then the sets Sun ðdin Þ and Svn ðdin Þ are defined by f ðxÞ ¼ xi uni ¼ 0 and f ðxÞ ¼ xi vni ¼ 0, respectively. Thus, if z 2 B (vn, un), then f ðzÞ ¼ zi uni 0 and f ðzÞ ¼ zi vni 0 for i ¼ 1, . . ., n. un ðdin Þ \ F þn ðdin Þ for i ¼ 1, . . ., n 1. Therefore, Hence z 2 F v
140
Gerhard X. Ritter and Gonzalo Urcid
Bðvn ; un Þ
n\ 1
n ðdin Þ \ F þn ðdin Þ: ½F u v
(4.21)
i¼2
Note the reversal of the positive and negative half-planes in Eq. (4.21) versus those in Eq. (4.20). As the next theorem asserts, the subset relation in these two equations can be shown to be equalities and similar results can be obtained for any ‘ 2 {1, . . ., n}. Theorem 6.4. (1) If ‘ ¼ 1, then n \ þ1 ðd1j Þ \ F 1 ðd1j Þ: Bðv1 ; u1 Þ ¼ ½F (4.22) u v j¼2
(2) If ‘ ¼ n, then Bðvn ; un Þ ¼
n1 \
n ðdin Þ \ F þn ðdin Þ: ½F u v
(4.23)
i¼1
(3) If 1< ‘ < n, then Bðv‘ ; u‘ Þ ¼
‘1 \
þ
‘ ðdi‘ Þ \ F ‘ ðdi‘ Þ \ ½F u v
i¼1
n \ j¼‘þ1
þ
‘ ðd‘j Þ \ F ‘ ðd‘j Þ: ½F u v
(4.24)
Proof. We begin by proving Eq. (4.22). In the text we already proved that Bðv1 ; u1 Þ
n \
þ1 ðd1j Þ \ F 1 ðd1j Þ: ½F u v
j¼2
To prove equality we only need to show that n \ þ1 ðd1j Þ \ F 1 ðd1j Þ Bðv1 ; u1 Þ: ½F u v
(4.25)
j¼2
For j ¼ 2, . . ., n, let f1 ðxÞ ¼ xj þ u1j ¼ 0 and f2 ðxÞ ¼ xj þ v1j ¼ 0 denote the functions defining the planes Su1 ðd1j Þ and Sv1 ðd1j Þ, respectively. Then for þ1 ðd1j Þ \ F 1 ðd1j Þ, f2 ðzÞ ¼ v1 zj 0 u1 zj ¼ f1 ðzÞ for j ¼ z 2 \nj¼2 ½F u v j j 1 1 1 1 2, . . ., n. Also, since z 2 Rn1 1 ; z1 ¼ 0 ¼ v1 ¼ u1 . Therefore, vi zi vi for 1 1 i ¼ 1, . . ., n and, hence, z 2 B (v , u ). This proves Eq. (4.25) and, hence, Eq. (4.22). We dispense with the proof of Eq. (4.23) since it is analogous to the proof of Eq. (4.22). The proof of Eq. (4.24) is also similar. The functions defining the planes Su‘ ðdi‘ Þ and Sv‘ ðdi‘ Þ are given by f1 ðxÞ ¼ xi u‘i ¼ 0 and f2 ðxÞ ¼ xi v‘i ¼ 0, respectively. Thus, if z 2 B (v‘, u‘), then f1 ðzÞ ¼ zi u‘i 0 ‘ ðdi Þ \ F þ‘ ðdi‘ Þ. Similarly, the funcand f2 ðzÞ ¼ zi þ v‘i 0. Therefore, z 2 F u v tions defining the planes Su‘ ðdlj Þ and Sv‘ ðdj‘ Þ are given by g1 ðxÞ ¼ xj þ u‘j ¼ 0 and g2 ðxÞ ¼ xj þ v‘j ¼ 0, respectively. Thus, g1 ðzÞ ¼ u‘j zj 0 and þ ‘j Þ \ F ‘ ðd‘j Þ. Since ‘ was arbitrary, g2 ðzÞ ¼ v‘j zj 0. Hence, v T‘1 z 2 F u‘ ðd þ ‘ ðd‘i Þ \ Tn ½F þ‘ ðd‘j Þ \ F ‘ ðd‘j Þ. this implies that z 2 i¼1 ½F u‘ ðdi‘ Þ \ F v u v j¼1
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
141
To prove the converse, let z2
‘1 \
n \
i¼1
j¼‘þ1
‘ ðdi‘ Þ \ F þ‘ ðdi‘ Þ \ ½F u v
þ‘ ðd‘j Þ \ F ‘ ðd‘j Þ: ½F u v
Then z 2 Rn1 and, hence, z‘ ¼ 0 ¼ v‘‘ ¼ u‘‘ . Furthermore, f1(z) 0, f2(z) ‘ 0, g1(z) 0, and g2(z) 0 and, therefore, v‘i zi u‘i for i ¼ 1, . . ., ‘1 and v‘j zj u‘j for j ¼ ‘ þ 1, . . ., n. Thus, v‘i zi u‘i for i ¼ 1, . . ., n which proves that z 2 B (v‘, u‘). Hence the equality in Eq. (4.24) holds. □ Equations (4.22), (4.23), and (4.24) can be written as the single equation \ \ þ ‘ ðdi‘ Þ \ F þ‘ ðdi‘ Þ \ ½F ‘ ðd‘j Þ \ F ‘ ðd‘j Þ; (4.26) Bðv‘ ; u‘ Þ ¼ ½F u v u v ‘<j
i<‘
with the understanding that if ‘ ¼ 1 in Eq. (4.26), then the case i < ‘ does not appear in the equation and only the intersection \‘<j, where j ¼ 2, . . ., n, is being considered. This Tcase corresponds to Eq. (4.22). Similarly, if ‘ ¼ n, then only the intersection i
x2
u3
Su3(d23) B(v3,u3) Sv3(d23)
v3
x1 Sv3(d13) x2
u3
Su3(d23)
B(v3,u3) Sv3(d23)
v3 Sv3(d13)
FIGURE 16
Su3(d13)
x1
Two possible boxes for X R3 and ‘ ¼ 3.
142
Gerhard X. Ritter and Gonzalo Urcid
‘ ðdi‘ Þ, H þ‘ ðdi‘ Þ, H þ‘ ðdj‘ Þ, and H ‘ ðdj‘ Þ are closed half-spaces in Rn If H u v u v determined by the hyperplanes Eu‘ ðdi‘ Þ, Ev‘ ðdi‘ Þ, Eu‘ ðdj‘ Þ, and Ev‘ ðdj‘ Þ, respectively, then Eq. (4.26) can also be written as 0 1 \ \ ‘ ðdi‘ Þ \ H þ‘ ðdi‘ Þ \ ½H þ‘ ðd‘j Þ \ H ‘ ðd‘j ÞA \ Rn1 : Bðv‘ ; u‘ Þ ¼ @ ½H u v u v ‘ ‘<j
i<‘
(4.27) þ‘ þ‘ ðdi‘ Þ ¼ H the observation that F u u þ þ n1 n1 ‘ ðd‘j Þ ¼ H ‘ ðd‘j Þ \ R , and R‘ , F u u ‘
The equation follows from ðdi‘ Þ \ Rn1 ‘ , F v‘ ðdi‘ Þ ¼ H v‘ ðdi‘ Þ \ n1 ‘ ðd‘j Þ ¼ H ‘ ðd‘j Þ \ R . F v v ‘ By setting F‘ ðXÞ ¼ FðXÞ \ Rn1 ‘ , we obtain the following relationship between F‘ (X) and B (v‘, u‘). Theorem 6.5. F‘ (X) B (v‘, u‘). This theorem, proven in Ritter and Gader (2006), simply states that F‘(X) is bounded by the hyperbox. It follows from the definition of F‘ (X) that F(X) ¼ [a2R[a þ Fl (X)], where a þ F‘ (X) ¼ {y 2 Rn: y ¼ a þ x, x 2 F‘ (X)} (see also Ritter and Gader, 2006). Similar considerations show that Ey(dij) ¼ [a2R[a þ Sy(dij)]. Thus, by Theorem 6.5 and Eq. (4.27) we have that \ þ \ ‘ ðdi‘ Þ \ H þ‘ ðdi‘ Þ \ ½H ‘ ðd‘j Þ \ H ‘ ðd‘j Þ: (4.28) FðXÞ ½H u v u v ‘<j
i<‘
To replace the subset relationship by an equality it is necessary to let ‘ range over the set {1, . . ., n}. Theorem 6.6. FðXÞ ¼
n1 \
n \
þi ðdij Þ \ H i ðdij Þ: ½H u v
(4.29)
i¼1 j¼iþ1
Proof. Arbitrarily choose i0 2 {1, . . ., n 1}, j0 2 {2, . . ., n} with i0 < j0. Let ‘ ¼ j0 and x 2 F(X). According to Eq. (4.28), x2
j\ 0 1
j0 ðdij Þ \ H þj0 ðdij Þ \ ½H 0 0 u v
i¼1
n \ j¼j0 þ1
þj0 ðdj j Þ \ H j0 ðdj j Þ: ½H 0 0 u v
In particular, x2
j\ 0 1
j0 ðdij Þ \ H þj0 ðdij Þ ¼ ½H 0 0 u v
i¼1
j\ 0 1
þi ðdij Þ \ H i ðdij Þ; ½H 0 0 u v
i¼1
where the equality follows from Corollary 6.3. Since i0 2 {1, . . ., j0 1}, þi0 ðdi j Þ \ H i0 ðdi j Þ. But since i0 and j0 where arbitrarily chosen, x 2 ½H 0 0 0 0 u v the last equation holds for any pair i 2 {1, . . ., n 1} and Tni < j with Tn1 þ ½ H ðd Þ j 2 {2, T . . ., n}. Therefore, x 2 i ij \ H vi ðdij Þ and, hence u i¼1 j¼iþ1 þ n1 Tn i ðdij Þ \ H i ðdij Þ. ½H FðXÞ i¼1
j¼iþ1
u
v
143
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
Conversely, suppose that x2
n\ 1
n \
i¼1 j¼iþ1
þi ðdij Þ \ H i ðdij Þ: ½H u v
For ‘ ¼ 1, . . ., n, set x(‘) ¼ x‘ þ x, p‘ ¼ x‘ þ v‘, and q‘ ¼ x‘ þ u‘. Since x(‘) 2 B (v‘, u‘), v‘ x(‘) u‘. Therefore, p‘ ¼ x‘ þ v‘ x‘ þ xð‘Þ x‘ þ u‘ ¼ q‘ : Since x ¼ x‘ þ x(‘), we now have p‘ x q‘ 8 ‘ ¼ 1, . . ., n. Hence, p¼
n _
p‘ x
‘¼1
n ^
q‘ ¼ q:
‘¼1
Thus, pi xi qi for i ¼ 1, . . ., n. But pii ¼ xi þ vii ¼ xi ¼ xi þ uii 8i ¼ 1; . . . ; n: W V Therefore, pi ¼ n‘¼1 p‘i ¼ xi ¼W n‘¼1 q‘i ¼ qi for i ¼ 1, . . ., n, which means that p ¼ x ¼ q. It follows that x n‘¼1 p‘ and, hence, x¼
n _
ðx‘ þ v‘ Þ ¼
‘¼1
n _
½x‘ þ
‘¼1
k ^ x¼1
xx ð‘Þ ¼
n ^ k _
ðax;‘ þ xx Þ;
‘¼1 x¼1
where ax;‘ ¼ x‘ xx‘ . This proves that x 2 S (X) and, therefore, x 2 F(X).□ According to this theorem, F(X) is the intersection of particular types of closed half-spaces determined by the hyperplanes Eui ðdij Þ and Evi ðdij Þ, where i < j with i ¼ 1, . . ., n 1 and j ¼ 2, . . ., n. Since half-spaces are convex and the intersection of convex sets is again convex, F(X) is a convex set. The specification of the boundary of F(X), denoted by @F(X), is another consequence of Theorem 6.6. Corollary 6.7. @FðXÞ ¼
n1 [
n [
½Eui ðdij Þ [ Evi ðdij Þ \ FðXÞ:
(4.30)
i¼1 j¼iþ1 n Proof. Let AðXÞ ¼ [n1 i¼1 [j¼iþ1 ½Eui ðdij Þ [ Evi ðdij Þ \ FðXÞ. To prove the corollary we will show that A(X) @F(X) and @F(X) A(X). To accomplish this, we will use the following elementary facts from topology: (1) the intersection of a collection of closed sets is closed, (2) if S is a closed set and Int(S) denotes the interior of S, then @S S, (3) S ¼ Int(S) [ @S, and Int(S) \ @S ¼ . Suppose that x 2 A(X). Then x 2 ½Eu‘ ðd‘k Þ [ Ev‘ ðd‘k Þ \ FðXÞ ¼ ½Eu‘ ðd‘k Þ \ FðXÞ [ ½Ev‘ ðd‘k Þ \ FðXÞ for some pair of integers ‘, k with 1 ‘ < k n. Assume without loss of generality that x 2 Eu‘ ðd‘k Þ \ FðXÞ ¼
144
Gerhard X. Ritter and Gonzalo Urcid
Eu‘ ðd‘k Þ \ ½@FðXÞ [ IntðFðXÞÞ ¼ ½Eu‘ ðd‘k Þ \ @FðXÞ[½Eu‘ ðd‘k Þ \ IntðFðXÞÞ. Since Eu‘ ðd‘k Þ is a support hyperplane of F(X), Eu‘ ðd‘k Þ \ IntðFðXÞÞ ¼ and, hence, x 2 Eu‘ ðd‘k Þ \ @FðXÞ @FðXÞ. Therefore, A(X) @F(X). By Theorem 6.5 F(X) is the intersection of closed half-spaces. Thus, F(X) is closed and @F(X) F(X). It is also obvious that þ n IntðFðXÞÞ ¼ \n1 i¼1 \j¼iþ1 ½Hui ðdij Þ \ Hvi ðdij Þ. Let x 2 @F(X). Since @F(X)\ þ n1 n Int(F(X)) ¼ , x 2 = \i¼1 \j¼iþ1 ½Hui ðdij Þ \ Hvi ðdij Þ FðXÞ. It follows that there exists at least one pair of integers ‘, k with ‘ < k such that þ‘ ðd‘;k Þ \ H ‘ ðd‘;k ÞÞ. But x2 = Huþ‘ ðd;k‘ Þ \ Hv‘ ðd‘;k Þ ¼ Intð½H u v x 2 FðXÞ ¼
n\ 1
n \
i¼1 j¼iþ1
þ‘ ðd‘;k Þ H u
þ
i ðdij Þ \ H i ðdij Þ; ½H u v
‘ ðd‘;k Þ H v
þ‘ ðd‘;k Þ \ H ‘ ðd‘;k Þ [ Intð½H þ‘ ðd‘;k Þ \ ¼ @½H so that x 2 u v u ‘ ðd‘;k ÞÞ ¼ ½Eu‘ ðd‘;k Þ [ Ev‘ ðd‘;k Þ [ ½Hþ‘ ðd‘;k Þ \ H‘ ðd‘;k Þ with x 2 Eu‘ \H v v u ðd‘;k Þ [ Ev‘ ðd‘;k Þ. Since we also have x 2 F(X), x 2 ½Eu‘ ðd‘;k Þ [ Ev‘ ðd‘;k Þ\ FðXÞ AðXÞ. This shows that @F(X) A(X). □ If Y is a convex subset of Rn, then a hyperplane E(v) is said to cut Y if and only if Y \ Hþ(v) ¼ 6 ¼ 6 Y \ H(v). If E(v) intersects the closure of Y but does not cut Y, then E(v) is said to be a support hyperplane of Y. Thus, F(X) is characterized by pairwise parallel support hyperplanes of type E(dij) and the maximum number of these boundary hyperplanes is n(n 1). It is important to note that although F(X) has a non-empty boundary, it is not a bounded set. For every point x 2 F(X), the unbounded line L(x) is a subset of F(X). Example 3. Suppose X ¼ {x1, . . ., x12} R2is defined by x3 ¼ (2.5, 1)t, x1 ¼ (2.5, 3.5)t, x2 ¼ (2, 2)t, 4 t 5 t x ¼ (4, 2) , x ¼ (5, 4) , x6 ¼ (4.5, 5)t, 7 t 8 t x ¼ (4.5, 3.5) , x9 ¼ (3.5, 2)t, x ¼ (4, 3) , 10 t x ¼ (3, 3) , x11 ¼ (4, 3.5)t, x12 ¼ (2.5, 1.5)t. V W12 x t t x 1 Then v1 ¼ 12 x¼1 x ð1Þ ¼ ð0; 2Þ and u ¼ x¼1 x ð1Þ ¼ ð0; 1Þ . Similarly, 2 t 2 t 1 1 2 2 v ¼ (1, 0) and u ¼ (2, 0) . Hence, B (v , u ) and B (v , u ) are the intervals [2, 1] and [1, 2] on the x2-axis and x1-axis, respectively, as shown in Figure 17. The hyperplanes Eu1 ðd12 Þ and Eu1 ðd12 Þ in R2 are the two lines Lðu1 Þ ¼ x1 x2 þ u12 ¼ 0 and Lðv1 Þ ¼ x1 x2 þ v12 ¼ 0, respectively. Obviously, Eu1 ðd12 Þ ¼ Ev2 ðd12 Þ and Ev1 ðd12 Þ ¼ Eu2 ðd12 Þ as established by Corollary 1 ðd12 Þ is bounded by the lines þ1 ðd12 Þ \ H 6.3. The fixed-point set FðXÞ ¼ H u v Eu1 ðd12 Þ and Ev1 ðd12 Þ so that @FðXÞ ¼ ½Eu1 ðd12 Þ [ Ev1 ðd12 Þ \ FðXÞ ¼ Eu1 ðd12 Þ[ Ev1 ðd12 Þ. In the above example, we have Fl (X) ¼ B (vl, ul) for l ¼ 1, 2. This equality always holds in dimension n ¼ 2 but is generally false in dimension n 3 (Ritter and Gader, 2006). Hence, the subset relation in Theorem 6.5 cannot be replaced by an equality.
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
145
x2 x6
5 x1
–
Hv 1(d12)
x10
x8 x7
x12 x2
x9
u1
x4
x3
v2
u2
5 +
Eu1(d12) Ev1(d12)
x5
x11
x1
Hu 1(d12) v1
þ FIGURE 17 The set X R2 and the fixed point set FðXÞ ¼ H u1 ðd12 Þ \ H v1 ðd12 Þ. The fixed point set is the cross-hatched area bounded by the lines Eu1 ðd12 Þ and Ev1 ðd12 Þ of slope 1. The two hyperboxes Bðvl ; ul Þ R1l for ‘ ¼ 1, 2 become closed intervals in R2, which are bounded by the open circles representing v‘ and u‘.
7. RELATIONSHIPS AMONG X, WXX, MXX, AND F(X) If Y Rn is convex and x 2 Y, then x is an extreme point of Y if and only if there do not exist two points y1, y2 2 Y such that y1 6¼ x ¼ 6 y2 and x 2 [y1, y2], where [y1, y2] Y denotes the line interval {z 2 Y: z ¼ ly1 þ (1 l)y2, 0 l 1}. It is well known that a closed bounded convex set is completely specified by its extreme points. In other words, if Y is a closed bounded convex set and Z Y is the set of extreme points of Y, then Z 6¼ and C(Z) ¼ Y, where C(Z) denotes the convex hull of Z (Eggleston, 1963; Hadwiger, 1957). For instance, the points v‘ and u‘ are extreme points of the convex set B (v‘, u‘). However, they are not extreme points of the unbounded set F(X). For example, the point u‘ is a point of the line \ \ Lðu‘ Þ ¼ Eu‘ ðdi‘ Þ \ Eu‘ ðd‘j Þ; (4.31) i<‘
‘<j
so that u‘ 2 [u‘ a, u‘ þ b] L(u‘) @F(X) F(X) for any pair of real numbers a > 0 and b > 0. Similarly, the point v‘ is a point on the line \ \ Lðv‘ Þ ¼ Ev‘ ðdi‘ Þ \ Ev‘ ðd‘j Þ: (4.32) i<‘
‘<j
146
Gerhard X. Ritter and Gonzalo Urcid
j
j
We shall use the notation WXX and MXX to denote the jth column of the matrices WXX and MXX, respectively. The relationship between the hyperboxes of type B (v‘, u‘) and the lattice correlation memories WXX and MXX is given by the next theorem. Theorem 7.1. For ‘ ¼ 1, . . ., n, v‘ ¼ W‘XX and u‘ ¼ M‘XX . Proof. If ‘ 2 {1, . . ., n}, then for i ¼ 1, . . ., n, we have v‘i ¼ Vk Vk Vk x x x ‘ x x Therex¼1 ½x ð‘Þi ¼ x¼1 ðx‘ þ x Þi ¼ x¼1 ðxi x‘ Þ ¼ wi‘ ¼ ðWXX Þi . ‘ ‘ ‘ ‘ fore, v ¼ WXX . The proof that verifies that u ¼ MXX follows a similar argument. □ W V Henceforth, we let u ¼ kx¼1 xx and v ¼ kx¼1 xx . With each memory WXX and MXX we associate, respectively, a set of n-scaled vectors W ¼ {w1, . . ., wn} and M ¼ {m1, . . ., mn} defined by j
j
wj ¼ uj þ WXX and mj ¼ vj þ MXX ;
(4.33)
where uj denotes the jth coordinate value of u, vj denotes the jth coordinate value of v, and j ¼ 1, . . ., n. As a direct consequence of Theorem 7.1, we have Corollary 7.2. wj ¼ uj þ vj and mj ¼ vj þ uj. Also, since the diagonals of WXX and MXX consist entirely of zeros, the j j jth coordinate of the vectors wj and mj is given by wj ¼ uj and mj ¼ vj , respectively. This observation is used in proving the following theorem. W V Theorem 7.3. u ¼ nj¼1 wj and v W ¼ nj¼1 mj . V n j Proof. We prove only that u ¼ j¼1 wj as the proof of v ¼ nj¼1 m Wnis analogous. Arbitrarily choose iW2 {1, . . ., n}. Then ui ¼ ui þ wii j¼1 W j j ðuj þ wij Þ ¼ nj¼1 ½uj þ ðWXX Þi ¼ nj¼1 wi . Thus, ui
n _
j
wi :
(4.34)
j¼1
Wn Wn W W W j ðuj þ wij Þ ¼ nj¼1 ½ðW kx¼1 W xxj Þ þ wij ¼ nj¼1 Wk We xalso haveWn j¼1Wwk i ¼ x j¼1V ½ x¼1 ðxjWþ wijW Þ¼ j¼1 f x¼1 ½xj þ kg¼1 ðxgi xgj Þg nj¼1 f kx¼1 ½xxj þ ðxxi n x xj Þg ¼ j¼1 ð kx¼1 xxi Þ ¼ ui and, therefore, n _ j wi u i
(4.35)
j¼1
W j But inequalities (34) and (35) imply that ui ¼ nj¼1 wi . W Since i was arbitrary, this equality holds for i ¼ 1, . . ., n and, hence, u ¼ nj¼1 wj . □ The next corollary is an easy consequence of Theorem 7.3. Corollary 7.4. If L(w‘) ¼ L(wj), then w‘ ¼ wj. Similarly, if L(ml) ¼ L(mj), then m‘ ¼ mj. Proof. This is trivial if ‘ ¼ j. Suppose ‘ ¼ 6 j. Since w‘ 2 L(wj), w‘ ¼ a þ wj for some a 2 R. If a ¼ 6 0, then either a > 0 or a < 0. WSuppose that a >0. Using Theorem 7.3, we obtain the contradiction w‘j nh¼1 whj ¼ uj < a þ uj ¼ a þ j wj ¼ w‘j . Similarly, if a < 0, then b ¼ a > 0 and we obtain the contradiction
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
147
W j j w‘ nh¼1 wh‘ ¼ u‘ < b þ u‘ ¼ a þ w‘‘ ¼ w‘ . Hence, a ¼ 0. The argument ‘ j ‘ for proving that L(m ) ¼ L(m ) implies m ¼ mj is analogous. Since v xx u 8 x 2 {1, . . ., k}, X B (v, u) and, by Property 3.1, X F(X). Since B (v, u) and F(X) are convex, C(X) B (v, u) and C(X) F(X). Therefore, X CðXÞ Bðv; uÞ \ FðXÞ:
(4.36)
The set B (v, u) \ F(X) is a convex polytope and the elements of W [ M [ {v, u} are vertices of this polytope. This last observation follows from Corollary 7.2, which implies that wj 2 L(vj) and mj 2 L(uj). According to Eqs. (4.31) and (4.32), L(uj) and L(vj) are edges of @F(X). Thus, wj and mj are edge points of F(X). But the hyperplanes Euj ¼ fx 2 Rn : xj ¼ uj g and Evj ¼ fx 2 Rn : xj ¼ vj g both have orientation ej and, therefore, both cut F(X). Since j j wj ¼ uj and mj ¼ vj , then fwj g ¼ Euj \ Lðvj Þ and fmj g ¼ Evj \ Lðuj W Þ. Thus, j j j w and m are vertex points of B (v, u) \ F(X). Also, since wj ¼ uj ¼ kx¼1 xxj , the jth coordinate of wj represents the maximal measured signature in the jth band if the elements of X represent the spectra of pixels of a hyperspectral image. Despite this fact, wj itself need not be an element of X. Example 4. Let X V R2 be the set defined in Example 3. Then Wk x u ¼ x¼1 x ¼ ð5; 5Þt , v ¼ kx¼1 xx ¼ ð2; 1Þt , 0 1 0 2 and MXX ¼ : WXX ¼ 2 0 1 0 Thus, from Eq. (4.33), W ¼ {w1, w2} and M ¼ {m1, m2} are given by w1 ¼ (5, 3)t, w2 ¼ (4, 5)t, m1 ¼ (2, 3)t, and m2 ¼ (3, 1)t. The values w11 ¼ u1 ¼ x51 ¼ 5 and w22 ¼ u2 ¼ x62 ¼ 5 represent the highest first coordinate value and highest second coordinate value of the dataset X. Note also that the set {w1, w2} is linearly independent and the set {v, w1, w2} is affinely independent, where the vector v consists of the lowest coordinate values of the set X. Figure 18 illustrates the geometric relationship between the sets X, C(X), W, M, {v, u}, fW1XX ; W2XX g ¼ fð0; 2Þt ; ð1; 0Þt g, fM1XX ; M2XX g ¼ fð0; 1Þt ; ð2; 0Þt g, and the polygon B (v, u) \ F(X). We conclude this section with the following pertinent fact. Theorem 7.5. WXX ¼ WMM ¼ WWW and MXX ¼ MWW ¼ MMM. ij denote the (i, j)th entry of the matrix WWW. Then Proof. V Let w ij ¼ n‘¼1 ðw‘i w‘j Þ wji wjj ¼ wji uj ¼ uj þ ðWjXX Þi uj ¼ wij . w This ij wij . On the other hand, V according to Theorem 3.1, w shows that w ij Vn n ‘ wi‘ wj‘ for ‘ ¼ 1, . . ., n. Hence, w ðw w Þ ¼ ½ðW Þ ij i‘ j‘ ‘¼1 V ‘¼1 XX i V ij . Thus, ðW‘XX Þj ¼ n‘¼1 ½½u‘ þ ðWXX Þi ½u‘ þ ðW‘XX Þj ¼ n‘¼1 ðw‘i w‘j Þ ¼ w ij and, therefore, wij ¼ w ij . This proves that WXX ¼ WWW. In the wij w ij denote the (i, j)th entry of the matrix WMM. Then m ij ¼ same vein, let m Vn i ‘ ‘ i i i ðm m Þ m w ¼ v m ¼ v ½v þ ðM Þ ¼ m ¼ w . The i i i ji ij XX j ‘¼1 i j j i j last equality follows from the fact that WXX ¼ MXX ¼ ðMtXX Þ (Ritter ij wij . et al., 1999). Thus, m
148
Gerhard X. Ritter and Gonzalo Urcid
x2 w2
5
x6
u x5
x1 m1
w1
x2
x4
v
1
x3
−1
m2
2
5
x1
−2
FIGURE 18 The fixed point set F(X) is the infinite strip bounded by the two lines of slope 1. The shaded area indicates the intersection of F(X) with the 2D box determined by u and v. Note that C({x1, . . ., x6}) ¼ C(X) and C(X) F(X) \ B (v, u). The open circles on the x1 and x2 axes correspond to column vectors of WXX and MXX.
Conversely, using the fact that mji mi‘ mi‘ (Theorem 3.1), we have . ., n. Thus, wij ¼ mji m that mji mV i‘ mj‘ for ‘ ¼ 1, .V Vi‘ mj‘ 8 ‘ and, hence, wij n‘¼1 ðmi‘ mj‘ Þ ¼ n‘¼1 ½ðM‘XX Þi ðM‘XX Þj ¼ n‘¼1 ðm‘i m‘j Þ ¼ ij . It follows that WXX ¼ WMM. The proof ij or, equivalently, wij m m that MXX ¼ MWW ¼ MMM is analogous, with the maximum _ replacing the minimum ^ in the above arguments. □
8. PRELUDE TO AFFINE INDEPENDENCE The linear unmixing theory assumes that the endmembers are affinely independent. The vector wj 2 W has the feature that its jth coordinate corresponds to the highest measured intensity within the jth band of the dataset X. In this sense, the elements of W can be viewed as excellent representatives of endmembers of the data cube X. However, it is important to note that for any pair {wj, mi} the inequalities wj mi or mi wj are j j generally false even though mij wj and mii wi . In fact, when using j hyperspectral data one usually obtains w‘ < mi‘ for several indices ‘. For this reason, various elements of M may represent important endmembers as demonstrated in Section 5. In this section we establish a sequence of theorems and corollaries that will aid in establishing necessary and sufficient conditions for extracting affine independent sets of
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
149
vectors from W and M. The main two theorems affecting the extraction process are Theorems 8.1 and 9.1. Recall that a set S ¼ {s0, s1, . . ., sm} Rn is said to be affinely independent if and only if the set {sj s0: j ¼ 1, . . ., m} is linearly independent. Thus linear independence, by definition, implies affine independence. It follows from the definition of affine independence that any two distinct points are affinely independent, any three non-collinear points are affinely independent, any four non-coplanar points are affinely independent, and in general any m points in Rn, with m n þ 1, are affinely independent if and only if they are not points of a common (m 2)dimensional linear subspace of Rn. The convex hulls of affinely independent points form the simplexes that are key in the linear mixing model. In particular, if S is affinely independent, then C(S) is an m-dimensional simplex, or m-simplex, denoted by sm or hs0, s1, . . ., smi. Thus, a 0-simplex is simply a point, a 1-simplex is a line segment determined by two affinely independent points, a 2-simplex is a triangle determined by three affinely independent points, a 3-simplex is a tetrahedron determined by four affinely independent points, and so on. Specifically, sm ¼ fx 2 Rn : x ¼
m X i¼0
ai si ;
m X
ai ¼ 1; ai 0g:
(4.37)
i¼0
Since any non-empty subset of an affinely independent set is also affinely independent, any non-empty subset of S consisting of j m þ 1 points generates a ( j 1)-dimensional simplex called a ( j 1)- dimensional face of sm. The (m 1)-dimensional face of sm opposite the vertex si is the simplex sim1 whose vertices are the elements of Si ¼ S \ {si} and the m1 boundary of sm is given by @sm ¼ [m . i¼0 si The first theorem of this section provides six equivalent conditions that furnish a computationally simple test for the affine independence of the sets W ¼ {wj: j ¼ 1, . . ., n} and M ¼ {mj: j ¼ 1, . . ., n} (see Appendix for proof). In the following discussion, the symbols wi and mi denote the ith row of W and M, respectively; that is, wi ¼ ðw1i ; w2i ; . . . ; wni Þ. Also, the vector c ¼ (c, c, . . ., c) denotes a constant vector, where c 2 R. Theorem 8.1. If j, ‘ 2 {1, . . ., n}, then the following statements are equivalent: j j (1) wj w‘ ¼ c, (2) v‘j ¼ u‘j , (3) w‘ ¼ wj, (4) mj m‘ ¼ c, (5) v‘ ¼ u‘ , and ‘ j (6) m ¼ m . Corollary 8.2. If j < ‘, then Ewj ðdj‘ Þ k Ew‘ ðdj‘ Þ in Rn and Ewj ðdj‘ Þ ¼ Ew‘ ðdj‘ Þ if and only if wj ¼ w‘. Similarly, Emj ðdj‘ Þ k Em‘ ðdj‘ Þ in Rn and Emj ðdj‘ Þ ¼ Em‘ ðdj‘ Þ if and only if mj ¼ m‘. Proof. We prove only the first part of the corollary as the proof of the second part is similar. That Ewj ðdj‘ Þ k Ew‘ ðdj‘ Þ follows from the fact that both hyperplanes have the same orientation. Now Ewj ðdj‘ Þ ¼ Ew‘ ðdj‘ Þ if j j and only if wj wi‘ ¼ w‘j w‘‘ if and only if uj ðuj þ v‘ Þ ¼ ðu‘ þ v‘j Þ u‘ if j ‘ ‘ ‘ □ and only if v‘ ¼ vj if and only if uj ¼ vj if and only if wj ¼ w‘.
150
Gerhard X. Ritter and Gonzalo Urcid
In various applications examples, one often obtains an overlap in the elements of W and M in that w‘ ¼ mj without affecting the affine independence of either set W or M. Thus, the results of the next corollary may be a little surprising. Corollary 8.3. w‘ ¼ a þ m‘ for some ‘ 2 {1, . . ., n} and a 2 Rn if and only if ‘ w ¼ mj for j ¼ 1, . . ., n. Proof. w‘ ¼ a þ m‘ if and only if u‘ þ v‘j ¼ w‘j ¼ a þ m‘j ¼ a þ v‘ þ u‘j for j ¼ 1, . . ., n. Equivalently, u‘ þ v‘j ¼ a þ v‘ þ u‘j for j ¼ 1; . . . ; n. Setting j ¼ ‘ we obtain u‘ ¼ a þ v‘ and, therefore, ða þ v‘ Þ þ v‘j ¼ ða þ v‘ Þ þ u‘j or v‘j ¼ u‘j for j ¼ 1; . . . ; n. By Theorem 8.1 this is equivalent to w‘ ¼ mj for j ¼ 1, . . ., n. □ In light of subsequent theorems, an important consequence of Theorem 8.1 is that to ascertain that W (or M) is affinely independent, all one has to do is to check that no two vectors of W (or M) are identical. Before stating the remaining theorems of this section we define certain concepts and sets that are critical in the proofs of these theorems. If the subset Wk ¼ fwi1 ; . . . ; wik g W is affinely independent, then it generates a (k 1)-dimensional linear subspace Lk1 Rn defined by Lk1 ¼ fx 2 Rn : x ¼
k X
aj wij ;
j¼1
k X
aj ¼ 1; aj 2 Rg:
(4.38)
j¼1
To simplify notation we set J ¼ {1, . . ., n}, Jj ¼ J \ {j} for j 2 J, Jk ¼ {i1, . . ., i ik} denotes some arbitrary subset of J, and for ij 2 Jk we set Jkj ¼ Jk \fij g. If Wk W is affinely independent, then for ij 2 Jk we define the linear of Lk1 by subspace Lk2 ij X X Lk2 ¼ fx 2 Rn : x ¼ ai‘ wi‘ ; ai‘ ¼ 1g: (4.39) ij ij
i‘ 2Jk
ij
i‘ 2Jk
Thus we may view Lk1 as being generated by the vertices of sk1 and being generated by the vertices of sk2 so that sk2 ¼ Lk2 \ sk1 . ij ij ij To further simplify notation, we also assume without loss of generality that the elements of Jk are ordered so that i1 < i2 <
< ik and set Wj ¼ W \ {wj}. In an analogous fashion we let Fk1 denote the (k 1)-dimensional linear subspace generated by an affinely independent subset Mk ¼ fmi1 ; . . . ; mik g M. In the subsequent discussion it is convenient to use the notion of the join of two sets. Given two subsets Y and Z of Rn, the join of Y and Z, denoted by Y * Z, is defined by Y Z ¼ fay þ bz : y 2 Y; z 2 Z; a; b 0; a þ b ¼ 1g: (4.40) Lk2 ij
A consequence of this definition is that the join operation is commutative and associative. If Y ¼ {y} is a one-point set, then y * Z shall represent {y} * Z.
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
151
To clarify the preceding concepts and sets under discussion, we consider a few simple examples. Suppose that k ¼ n and W is affinely independent. In this case, Jn ¼ J and Ln1 is a hyperplane in Rn. Since {wj} ¼ Ln1 \ L(wj), Ln1 cuts the edges of @F(X) and, hence, Ln1 cuts F(X). We set {pj} ¼ Ln1 \ L(mj), P ¼ {pj: j ¼ 1, . . ., n}, sn1 ¼ hw1, . . ., wni, and let i ; . . . ; wn i ¼ sn2 hw1 ; . . . ; w denote the (n 2)-dimensional face of sn1 i i i means that the vertex wi opposite the vertex w . Here the symbol w n2 has been deleted so that the vertices of si are the elements of the set Wi ¼ W \ {wi}. Using the elements of P we form n new sets ti by defining . Similarly, whenever M is affinely independent, then Fn1 ti ¼ pi sn2 i denotes the hyperplane generated by M, Q ¼ {qj: j ¼ 1, . . ., n}, where {qj} ¼ i ; . . . ; mn i, and ¼ hm1 ; . . . ; m Fn1 \ L(wj), sn1 ¼ hm1, . . ., mni, sn2 i i n2 0 i ti ¼ q si . If n ¼ 2, then obviously ti ¼ si ¼ p ¼ wj for i, j 2 {1, 2} with i 6¼ j. If n 3 and X represents a random real dataset, then generally pi 6¼ wj and qi 6¼ mj. The following example should help to further clarify some of these concepts. Example 5. Let X ¼ {x1, . . ., x5}, where x1 ¼ (5, 9, 5)t, x2 ¼ (4, 5, 5)t, x3 ¼ (9, 10, 2)t, x4 ¼ (10, 9, 1)t, and x5 ¼ (2, 7, 7)t. It is easy to verify that X is a subset of the plane 4x1 x2þ5x3 ¼ 36 in R3. Computing v ¼ (2, 5, 1)t, u ¼ (10, 10, 7)t, WXX, MXX, W, and M, one obtains w1 ¼ (10, 9, 1)t, w2 ¼ (5, 10, 2)t, w3 ¼ (2, 7, 7)t, 6 0¼ 6 jMj, W m1 ¼ w3, m2 ¼ (6, 5, 5)t, and m3 ¼ w1. Since the determinants jWj ¼ and M are linearly independent and therefore affinely independent. The plane L2 generated by W is given by 4x1 þ 11x2 þ 9x3 ¼ 148. Since W L2, m1 ¼ w3, and m3 ¼ w1, we have p1 ¼ w3and p3 ¼ w1. Solving the equation p2 ¼ L(m2) \ L2 yields p2 ¼ (7, 6, 6)t ¼ m2 þ 1. Figure 19 illustrates the geometric relationship between t2 ¼ p2 s12 , s2, and W in L2. Also, the same figure illustrates the fact that since the hyperplanes of type Ewj ðd‘j Þ; Emj ðd‘j Þ ¼ Ew‘ ðd‘j Þ; Ewj ðdj‘ Þ, and Emj ðdj‘ Þ ¼ Ew‘ ðdj‘ Þ constitute support hyperplanes of F(X) and Ln1cuts F(X), Sn n1 n1 we have FðXÞ \ L S¼ CðW [ PÞ ¼ s [ i¼1 ti . Similarly, FðXÞ \ Fn1 ¼ n n1 [ i¼1 ti . These equalities are also an easy consequence of CðM [ QÞ ¼ s Theorem 6.6. If k ¼ 2 and n 2, then L1 is simply the line determined by the two distinct points wi1 and wi2 . Since hwi1 ; wi2 i L1 and hwi1 ; wi2 i FðXÞ, i1 ðdi i Þ \ H þ i2 ðdi i Þ; E i1 ðdi i Þk hwi1 ; wi2 i L1 \ FðXÞ. Also, FðXÞ H 1 2 1 2 w 1 2 w w Ewi2 ðdi1 i2 Þ with Ewi1 ðdi1 i2 Þ \ Ewi2 ðdi1 i2 Þ ¼ , and thus L1 \ ½Ewi1 ðdi1 i2 Þ i1 ðdi i Þ \ H þ i2 [Ewi2 ðdi1 i2 Þ ¼ fwi1 ; wi2 g. Therefore, L1 \ FðXÞ L1 \ ½H 1 2 w w i1 i2 1 i1 i2 1 ðdi1 i2 Þ ¼ hw ; w i and, hence, L \ FðXÞ ¼ hw ; w i ¼ s . This representation of Lk1 \ F(X) changes somewhat when k >2. Considering the case k ¼ 3, then there are three disjoint pairs of parallel hyperplanes with each pair associated with an edge of s2 ¼ hwi1 ; wi2 ; wi3 i—namely, Ewi1 ðdi1 i2 Þ k Ewi2 ðdi1 i2 Þ; Ewi1 ðdi1 i3 Þ k Ewi3 ðdi1 i3 Þ, and Ewi2 ðdi2 i3 Þ k Ewi3 ðdi2 i3 Þ þ i2 ðdi i Þ \ ½H i1 ðdi i Þ \ H þ i3 ðdi i Þ \½H i2 i1 ðdi i Þ \ H with FðXÞ ½H w 1 2 1 2 1 3 1 3 w w w w þ i i i3 ðdi i Þ. Note that the edge hw j ; w ‘ i is associated with the pair ðdi2 i3 Þ \ H 2 3 w fEwij ðdij i‘ Þ; Ewi‘ ðdij i‘ Þg, where ii < il and {ij, il} {i1, i2, i3}. In contrast to the
152
Gerhard X. Ritter and Gonzalo Urcid
Ew2 (d12) ∩ L2
Ew3 (d13) ∩ L2
w2
E(u2) ∩ L2
s2
Ew1 (d12) ∩ L2 w1
s 21 Ew2 (d23) ∩ L2 w3
t2 E(v2) ∩ L2
E(v1) ∩ L2
p2
Ew1 (d13) ∩ L2 Ew3 (d23) ∩ L2
E(u1) ∩ L2
FIGURE 19 The plane L2 ¼ fx 2 R3 : 4x1 þ 11x2 þ 9x3 ¼ 148g and various intersections of L2 with the planes Ewj ðd‘j Þ; Ewj ðdj‘ Þ, and the planes Eðui Þ and Eðvi Þ defined by xi ¼ ui and xi ¼ vi, respectively. These latter hyperplanes are determined by the sides of B (v, u).
case k ¼ 2, here we have Lðmi1 Þ Emi1 ðdi1 i2 Þ \ Emi1 ðdi1 i3 Þ ¼ Ewi2 ðdi1 i2 Þ \ Ewi3 ðdi1 i3 Þ and, similarly, Lðmi2 Þ Ewi1 ðdi1 i2 Þ \ Ewi3 ðdi2 i3 Þ and Lðmi3 Þ ij ij 2 Ewi1 ðdi1 i3 Þ \ Ewi2 ðdi2 i3 Þ. Setting fp S3g ¼ Lðm 2Þ \ L for j ¼2 1, 2, 3 and ij 1 2 tij ¼ p sij , we have s [ j¼1 tij L \ FðXÞ L \ ½ðH wi1 ðdi1 i2 Þ þ i2 ðdi i ÞÞ \ ðH i1 ðdi i Þ \ H þ i3 ðdi i ÞÞ \ ðH i2 ðdi i Þ \ H þ i3 ðdi i ÞÞ ¼ s2 [ \ 1 2 1S 3 1 3 2 3 2 3 w w w w w SH 3 3 2 2 j¼1 tij . Therefore, s [ j¼1 tij ¼ L \ FðXÞ and 2 \ 3 \ ij ðdi i Þ \ H þ i‘ ðdi i Þ: L2 \ FðXÞ ¼ L2 \ ½H (4.41) j ‘ j ‘ w w j¼1‘¼jþ1
In general, there exist k(k 1)/2 pairs of disjoint parallel hyperplanes supporting the simplex sk1 ¼ hwi1 ; . . . ; wik i, each associated with a unique 1D face of sk1. Specifically, these hyperplanes are given by Ewij ðdij ijþ1 Þ k Ewijþ1 ðdij ijþ1 Þ; . . . ; Ewij ðdij ik Þ k Ewik ðdij ik Þ for 1 j < k or Ewij ðdij1 ij Þk Ewij1 ðdij1 ij Þ; . . . ; Ewij ðdi1 ij ÞkEwi1 ðdi1 ij Þ for 1< j k. Again, setting ij ij k1 fp Þ \ Lk1 for j ¼ 1; . . . ; k and tij ¼ pij sk2 [ ij , one obtains s Sk g ¼ Lðmk1 t ¼ L \ FðXÞ as claimed in Corollary 8.6. This fact could also be j¼1 ij proven directly as above for the cases k ¼ 2 and k ¼ 3. If Wk is affinely independent and Pk ¼ fpij : j ¼ 1; . . . ; kg, then since Wk [ Pk F(X) and Wk [ PSk Lk1, we have Wk [ Pk Lk1 \ F(X). Hence, by convexity, sk1 [ kj¼1 tij ¼ Lk1 \ FðXÞ. Using the identities Ewij ðdij i‘ Þ ¼ Evij ðdij i‘ Þ and Ewi ‘ ðdij i‘ Þ ¼ Euij ðdij i‘ Þ, then in view of Theorem 6.6, we have
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
Lk1 \ FðXÞ Lk1 \
k1 \
k \
j¼1 ‘¼jþ1
þ
153
ij ðdi i Þ \ ½H ij ðdi i Þ ½H j ‘ j ‘ v u
k k \ \ [ T k1 ij ðdi i Þ \ ½H þ i‘ ðdi i Þ ¼ sk1 [ ti : ¼ Lk1 ½H j ‘ j ‘ j w w j¼1 ‘¼jþ1
(4.42)
j¼1
Sk
Therefore, sk1 [ j¼1 tij ¼ Lk1 \ FðXÞ. The subset relation in Eq. (4.42) follows from Theorem 6.6 and the last equality in the equation follows from the classification of the set tij in Theorem 8.4 (see Appendix þ ij ðij ; ‘Þ ¼ H þ ij ðdi ‘ Þ \ for proof). By setting Swij ðij ; ‘Þ ¼ Ewij ðdij ‘ Þ \ Lk1 , S j w w ij ðij ; ‘Þ ¼ H ij ðdi ‘ Þ \ Lk1 , and so on, Eq. (4.42) can then be Lk1 , S j w w rewritten as Lk1 \ FðXÞ ¼
k\ 1
k \
þ
ij ðij ; i‘ Þ \ S ij ðij ; i‘ Þ: ½S v u
(4.43)
j¼1 ‘¼jþ1
Equation (4.43) represents the same results in the linear subspace Lk1 of Rn as does Theorem 6.6 in Rn. Of course, analogous reasoning applies to affine-independent subsets of Mk ¼ fmi1 ; . . . ; mik g M and Fk1 \ F(X) as well. The next theorem classifies the sets tij and tij and the vertex points pij and qij . Theorem 8.4. If Wk is affinely independent, then for a given integer ij 2 Jk, ij ; . . . ; wik ; pij i or tij ¼ sk2 and either tij is the (k 1)-simplex hwi1 ; . . . ; w ij pij ¼ wir for some ir 2 Jk \ {ij}. Similarly, if Mk is affinely independent, then tij is ij ; . . . ; mk ; qij i or tij ¼ sk2 either the (k 1)-simplex hmi1 ; . . . ; m and qij ¼ mir ij for some ir 2 Jk \ {ij}. The notion of a hypertriangle plays an important role in the construction of the proof of the final theorem of this section. Suppose S1, S2, and S3 are three distinct (k 2)-dimensional linear subspaces of Lk1. If the sets P ¼ S1 \ S2, Q ¼ S1 \ S3, and R ¼ S2 \ S3 are non-empty and P \ Q ¼ P \ R ¼ Q \ R ¼ , then the compact region T Lk1 bounded by S1, S2, and S3 is called a hypertriangle with corners P, Q, and R and sides P * Q, P * R, and Q * R. It is an easy exercise to show that the set L[P, Q] ¼ {x 2 Rn: x ¼ a1p þ a2q, p 2 P, q 2 Q, a1 þ a2 ¼ 1} is the linear space S1. Similarly, L[P, R] ¼ S2 and L[Q,R] ¼ S3. Theorem 8.5. If Wk is affinely independent and wx 2 W, then wx 2 Lk1if and only if wx 2 Wk. Similarly, if Mk is affinely independent and mx 2 M, then mx 2 Fk1if and only if mx 2 Mk. A consequence of Theorem 8.5 (see Appendix for proof) is the geometric classification of the sets F(X) \ Lk1 and F(X) \ Fk1. Corollary 8.6. FðXÞ \ Lk1 ¼ sk1 [
k [ j¼1
tij
and
FðXÞ \ Fk1 ¼ sk1 [
k [ tij : j¼1
154
Gerhard X. Ritter and Gonzalo Urcid
Proof. Note that for k < n Jk does not cut F(X). However, for each ij 2 Jk, Lk1 does cut the edges Lðvij Þ and Lðuij Þ at cut points wij and pij , respectively. According to Theorem 8.5, the cut points of the edges of F(X) that get cut by Lk1 are exactly the elements of Wk [ Pk. Since F(X) \ Lk1 is a convex polytope (Ritter and Gader, 2006) with vertices the elements of Wk [ Pk, we have F(X) \ Lk1 ¼ C(Wk [ Pk), the convex S hull of Wk [ Pk. But by definition of sk1 and tij ; CðWk [ Pk Þ ¼ sk¼1 [ k‘¼1 ti‘ . □
9. AFFINE-INDEPENDENT SETS DERIVED FROM W AND M The theorems and corollaries given in the preceding section provide the tools necessary for proving the affine independence of sets derived from W and M by our proposed method. As before, Jk denotes an arbitrary nonempty subset of J. Theorem 9.1. Wk W is affinely independent if and only if wij 6¼ wi‘ for all distinct pairs {ij, i‘} Jk. Similarly, Mk M is affinely independent if and only if mij 6¼ mi‘ for all distinct pairs {ij, i‘} Jk. Proof. Obviously, if Wk is affinely independent, then wij 6¼ wi‘ for all distinct pairs {ij, i‘} Jk. To prove the converse, suppose that wij 6¼ wi‘ for all distinct pairs {ij, i‘} Jk. The proof is trivial if k ¼ 1 or k ¼ 2. If k ¼ 3, let ij 2 Jk and set J2 ¼ {i‘, ir} ¼ Jk \ {ij}. Since W2 is affinely independent, the set fwi‘ ; wir g determines a unique line L1. If wij 2 L1 , then according to Theorem 8.5 either wij ¼ wir or wij ¼ wi‘ . But this contradicts our hypoth= L1 and, esis that wij 6¼ wi‘ for all distinct pairs {ij, i‘} Jk. Therefore wij 2 ij i‘ ir hence, fw ; w ; w g ¼ Jk is affinely independent. We now proceed by induction. Assume that the theorem is true for any subset Wk1 W that satisfies the hypothesis, where k 1 3 and 4 k n. Suppose that Wk also satisfies the hypothesis. Again, let ij 2 Jk be arbitrary and put Jk1 ¼ Jk \ {ij} and Wk1 ¼ Wk fwij g. Since Wk satisfies the hypothesis, so does Wk1 Wk. Thus, by the induction hypothesis, Wk1 is affinely independent and generates a unique (k 2)-dimensional linear space Lk2. If wij 2 Lk2 , then again according to Theorem 8.5, wij ¼ wi‘ for some wi‘ 2 Wk1 with ij 6¼ i‘. But this again contradicts the hypothesis. It follows that wij 2 = Lk2 . Since wij was arbitrarily chosen, we have that Wk is affinely independent. The proof of the second part of the theorem follows the same reasoning and we dispense with the details. □ This theorem, in conjunction with Theorem 8.1, provides a simple and efficient method for deriving a set of affine-independent vectors from W and M. Specifically, if wj 6¼ w‘ 8 {j, ‘} J, then any non-empty subset of W can serve as a possible set of endmembers. However, our experiments with real data have shown that wj ¼ w‘ does occur quite frequently. In this latter case, the following simple elimination of columns and rows (bands) from W results in an affinely independent set of vectors. For j 2 J and each
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
155
‘ 2 Jj for which wj ¼ w‘, remove the ‘th column and the ‘th row whenever wj w‘ ¼ c >0 or remove the jth column and jth row whenever wj w‘ ¼ j c 0. Note that this amounts to checking whether w1 w‘1 > 0. In either of these two cases, the jth and ‘th bands carry the same information and we are simply removing the darker band. An additional step is to remove the corresponding jth or ‘th coordinate from the vectors v and u. Repeat this procedure until no two columns are equal. If W and {~ v, u˜} denote the resulting sets obtained from W and {v, u}, respectively, then no two columns of W are equal. Additionally, it is an easy exercise to show that j ¼ u j 2 W, . Thus, ~ j for all w ~ j denotes the jth column of WW ˜j þ v w where v W ~ by Theorem 9.1, W is affinely independent. The procedure for reducing M is identical. Obviously, the data cube X also needs to be reduced accordingly in its dimensionality. Specifically, supposing again that , then accordingWto Theorem 8.1 we have that v‘j ¼ u‘j or, equivawj ¼ w‘V lently, x ðxx‘ þ xxj Þ ¼ x ðxx‘ þ xxj Þ, so that ðxx‘ þ xxj Þ ¼ ðxl‘ þ xlj Þ ¼ c8x; l 2 f1; . . . ; kg. Solving for xxj results in xxj ¼ xx‘ þ c. Since this latter equation holds for x ¼ 1, . . ., k, it follows that the jth band and the ‘th band in the data cube X are perfectly correlated modulo a constant and therefore carry the same information. Thus, whenever one removes the ‘th row from W, one also needs to reduce the ‘th band from X to obtain a new data cube X of (n 1)-dimensional vectors. Analogous procedures are applied for c 0 and when reducing M. The following simple example clarifies this procedure. Example 6. Let X ¼ {x1, x2, x3} R4, where x1 ¼ (4, 8, 10, 5)t, x2 ¼ (7, 11, 13, 8)t, x3 ¼ (7, 4, 8, 8)t. Then v ¼ (4, 4, 8, 5)t, u ¼ (7, 11, 13, 8)t, 0 1 0 1 0 4 6 1 7 7 7 7 B 3 0 4 4 C B C C and W ¼ B 4 11 9 4 C: WXX ¼ B (4.44) @ 1 A @ 2 0 0 8 13 13 8 A 1 3 5 0 8 8 8 8 Thus, w1 ¼ w4 with w11 w41 ¼ 1 < 0. Therefore, 0 1 0 1 0 4 4 11 9 4 @ ¼ @ 2 0 0 A; (4.45) W ¼ 13 13 8 A; WW W 3 5 0 8 8 8 v ¼ (4, 8, 5)t, u˜ ¼ (11, 13, 8)t, and X ¼ { x 1, x 2, x 3} R3, where x 1 ¼ (8, 10, 5)t, 2 t t ¼W . It is also interesting to note x ¼ (11, 13, 8) , x 3 ¼ (4, 8, 8) , and WW W XX that the first and last row of W are constant, but when computing M one obtains 0 1 4 7 7 4 B 18 4 6 8 C C M¼B (4.46) @ 10 8 8 10 A: 5 8 8 5 Although m1 ¼ m4 with m11 ¼ m41 ¼ 1 as predicted by Theorem 8.1, here m ¼ w1 ¼ w4and M has no constant row. 2
156
Gerhard X. Ritter and Gonzalo Urcid
As this example demonstrates, eliminating the jth row and jth column from W to obtain W corresponds to elimination of the jth row and jth . Had we eliminated the jth row and ‘th column from WXX to obtain WW W column instead, then the elements (vectors) of the set W would be the same as those in the example but their labels would be different. That is, we would obtain 0 1 4 11 9 W ¼ @ 8 13 13 A (4.47) 8 8 8 1 1 ¼ (4, 8, 8)t instead of w ¼ (11, 13, 8)t as in the example. so that w ~ given by Eq. 4.47 to compute W results in However, using the set W WW the same matrix as given in Eq. 4.45. Hence the proposed method for deriving W and X simplifies bookkeeping (tracking of labels) and avoids recomputing WW~ W~ . Generally, in most endmember detection schemes, endmembers form a linearly independent set of m points in Rn and a dark point is chosen to obtain a large m-simplex containing most or all of X. For the m þ 1 points to be affinely independent, one must ensure that the dark point is not an element of the hyperplane spanned by the (m 1)-simplex defined by the m-affine–independent points. Generally, the sets W[ {~ v} and M[ {u˜} are affine independent. If W¼W[ {~ v}, then of course W[ {~ v} is affinely independent. Example 7. If X ¼ {x1, x2}, with x1 ¼ (2, 3, 4, 1)t and x2 ¼ (4, 3, 4.5, 2)t, then v ¼ x1, u ¼ x2, and 0 1 0 1 0 1 2 1 4 2 2:5 3 B 1 0 1:5 1 C B 3 C C; hence W ¼ B 3 3 3 C WXX ¼ B @ 0:5 1 @ 4:5 4 4:5 3:5 A: 0 2:5 A 2 2 3 0 2 1 1:5 2
Thus, v ¼ w2 and W ¼ W [ {v}. Note that in this example v2 ¼ u2. The next theorem shows that this holds in general. That is, if W is affine independent, then v 2 W is possible only if vi ¼ ui for some i 2 J. Theorem 9.2. Suppose vij < uij for all ij 2 Jk. If Wk W is affinely independent, then v 2 = sk1. Similarly, if Mk M is affinely independent, then k1 u2 =s . Proof. Obviously, v 2 = Wk for otherwise v ¼ wij for some ij 2 Jk. But then ij vij ¼ wij ¼ uij , contrary to our hypothesis that vij < uij . Thus, v is not a vertex of sk1. Observe that this proves the theorem for k ¼ 1. Next we show that v 2 = hwij ; wi‘ i for all pairs {ij, i‘} Jk. Suppose to the contrary that ij i‘ v 2 hw ; w i for some pair {ij, i‘}. Then v ¼ lwij þ ð1 lÞwi‘ with 0 < l < 1
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
157
and, therefore, vij ¼ luij þ wii‘j lwii‘j . Since l > 0; vij wii‘j 0, and uij wii‘j 0, we now have 0 vij wii‘j ¼ lðuij wii‘j Þ 0:
(4.48)
Thus, vij wii‘j ¼ 0 ¼ uij wii‘j and, hence, vij ¼ uij . But this contradicts = s1 for any 1D face of sk1. Also, the hypothesis that vij < uij . Therefore, v 2 observe that this proves the theorem for k ¼ 2. We now proceed by induction. Suppose that 3 k n and that for some integer x with 2 x < k, v 2 = sj for any j-dimensional face of sk1, with 0 j x. Assume that there exists a subset fwi1 ; . . . ; wixþ2 g Wk such that v 2 sxþ1 ¼ hwi1 ; . . . ; wixþ2 i. By the induction hypothesis x xþ1 v2 = @sxþ1 ¼ [xþ2 . Let L ‘¼1 si‘ . Therefore, v must be an interior point of s x i1 i2 be the line determined by fw ; vg and fxg ¼ L \ si1 ¼ L \ hw ; . . . ; wixþ2 i. Since x 2 sxi1 and wi1 is the vertex of sxþ1 opposite the face sxi1 , we have hwi1 ; vi hwi1 ; xi and v 2 hwi1 ; xi. Therefore, v ¼ lwi1 þ ð1 lÞx with i 0 < l < 1 and, hence, x‘ ¼ lw‘j þ ð1 lÞx‘ for ‘ ¼ 1, . . ., n. But x 2 sxi1 Bðu; vÞ, which means that v‘ x‘ u‘ for ‘ ¼ 1, . . ., n. Thus, 0 vi1 xi1 ¼ lui1 lxi1 ¼ lðui1 xi1 Þ 0:
(4.49)
But this is possible only if vi1 xi1 ¼ ui1 xi1 or, equivalently, if vi1 ¼ ui1 . Again, this contradicts the hypothesis that vij < uij . This shows that our assumption that v 2 sxþ1 must be wrong. Therefore, v 2 = sxþ1 and, k1 k1 =s is analogous. □ hence, v 2 = s . The proof that u 2 Since vi ¼ ui for some i 2 J means that the output of the ith channel is constant, this phenomenon will probably not occur as the ith band will most likely have been eliminated during preprocessing of the data. Hence, we may not encounter the case v 2 Wk. Thus, we would have v 2 = sk1. But k1 the question remains whether it is possible to have v 2 L . In our experimentation with real data this case has never been encountered. The S reason may be that if v 2 = Wk and v 2 Lk1, then v 2 FðXÞ \ Lk1 ¼ sk1 [ kj¼1 tij since by Property 3.3, v 2 F(X). Therefore, v 2 tij n sk1 for some ij 2 Jk. But in experimentation with data we always obtained a þ v 2 Lk1 with a > 0. Hence, we conjecture that if v < u, then v 2 = Lk1. Thus, we believe that the choice of v for the dark point remains a good one, especially in view of the fact that it represents the low values of the actual data.
10. CONCLUSIONS AND DISCUSSION A novel technique was established to determine a representative set of final endmembers in a given hyperspectral image. The mathematical foundation of the autonomous stage in the proposed procedure is based on lattice algebra or, more specifically, on several algebraic and geometric properties derived from the set of fixed points of LAM transforms.
158
Gerhard X. Ritter and Gonzalo Urcid
Given a set X of n-dimensional pixel spectra, we have shown that sets W and M derived from the column vectors of the LAMs WXX and MXX, respectively, provide all the potential endmembers and that these endmembers are linked to the physical data by the fact that the jth coordinate of the endmember wj corresponds to the maximal measured brightness in the jth band and the jth coordinate of mj corresponds to the minimal measured brightness in the jth band. Another important contribution of the proposed method is the fast computational speed of autonomously deriving the scaled sets W and M. Only a single-pass scan of the hyperspectral image cube X is required to construct the scaled sets from the respective canonical LAMs. In a second stage, determination of final end-members relies on appropriate reduction techniques, such as, discrimination by noncontiguous maximal difference correlation coefficient or random selection of a single representative endmember in contiguous equal-sized blocks of candidate endmembers. We point out that extraction of final endmembers is problem dependent and required a certain degree of a priori knowledge for proper spectral identification of resource materials against available spectral libraries. In the final stage, abundance coefficients were computed using the nonnegative least squares numerical algorithm from which resource distribution images have been illustrated for a typical application example in hyperspectral imagery. For comparison of basic characteristics between recently developed approaches for linear unmixing and validation of the given examples, we included a brief description of the vertex component analysis and the minimum-volume enclosing simplex algorithms.
11. APPENDIX: THEOREM PROOFS Proof of Theorem 8.1. The case where j ¼ ‘ is trivial. Thus, we assume that j ¼ 6 ‘. We first show that statement (1) implies statement (2). If wj w‘ ¼ c, then whj wh‘ ¼ c for j j h ¼ 1, . . ., n. In particular, for h ¼ j we have wj w‘ ¼ c and for h ¼ ‘ we have w‘j w‘‘ ¼ c. In view of Theorem 6.2 and Corollary 7.2, we have j
u‘j ¼ v‘ j
j
j
¼ ðuj þ vj Þ ðuj þ v‘ Þ ¼ c ¼ ðu‘ þ v‘j Þ ðu‘ þ v‘‘ Þ;
(4.50)
j
or v‘ ¼ v‘ and, hence, v‘j ¼ u‘j . We now show that (2) implies (1). Suppose that v‘j ¼ u‘j . As a consequence of Theorem 3.1, we obtain wj‘ wjh w‘h and mj‘ mh‘ mhj
(4.51)
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
159
for h ¼ 1, . . ., n. Using this equation, Theorem 6.2 and Corollary 7.2, we get, for h ¼ 1, . . ., n j
j
v‘ ¼ wj‘ wjh w‘h ¼ vhj vh‘ ¼ uh þ u‘h
(4.52)
j
¼ u‘h uh ¼ mh‘ mhj mj‘ ¼ u‘j :
But v‘j ¼ u‘j . Thus, vhj vh‘ ¼ u‘j for h ¼ 1, . . ., n. Hence, by Corollary 7.2, wh‘ ¼ ðuh þ vhj Þ ðuh þ vh‘ Þ ¼ vhj vh‘ ¼ u‘j for h ¼ 1, . . ., n. This shows that wj wl ¼ c, where c ¼ u‘j . Therefore (1) is equivalent to (2), which will be denoted by (1) (2). The fact that (2) (5) is trivial: v‘j ¼ u‘j if and only if v‘j ¼ u‘j if and j j only if u‘ ¼ v‘ , where the last equivalence follows from Theorem 6.2. Therefore, (2) (5) and since (1) (2), we now have (1) (2) (5). We now prove that statement (5) implies statement (3). Suppose that j j v‘ ¼ u‘ . Using Theorem 3.1, we again have that, for i ¼ 1, . . ., n,
whj
j
j
u‘ ¼ m‘j mij mi‘ ¼ w‘i wji w‘j ¼ v‘ : j v‘
(4.53)
j u‘
Using the supposition ¼ replaces the inequalities by equalities in the equation. Applying Theorem 6.2 and Corollary 7.2 to the equalities j j j shows that u‘ ¼ ui u‘i ¼ v‘ . Thus, wi w‘i ¼ ðvj þ ui Þ ðv‘ þ u‘i Þ ¼ ðvj v‘ Þ þ v‘ j
j
j
(4.54)
or wi w‘i ¼ ðvj v‘ Þ þ v‘ for i ¼ 1, . . ., n. Therefore, wj ¼ a þ w‘, where j a ¼ ðvj v‘ Þ þ v‘ . According to Corollary 7.4 we now have wj ¼ w‘. This shows that (5) implies (3). Our next step shows that statement (3) implies statement (2). By j assumption, w‘i ¼ wi for i ¼ 1, . . ., n. Thus, for i ¼ l and i ¼ j we obtain j
j
u‘ ¼ w‘‘ ¼ w‘ ¼ uj þ v‘ and uj ¼ wj ¼ w‘j ¼ u‘ þ v‘j ; j
j
j
(4.55)
j
respectively. Therefore, uj ¼ u‘ þ v‘j ¼ ðuj þ v‘ Þ þ v‘j and, hence, j j j 0 ¼ uj ðuj þ v‘ Þ þ v‘j ¼ v‘ þ v‘j . Consequently, v‘j ¼ v‘ ¼ u‘j , where the very last equality follows from Theorem 6.2. This proves that (3) implies (2). Since (3) implies (2) and (2) (5), we have that statement (3) implies statement (5). We already have shown that (5) implies (3). Hence (3) (5). Therefore, (1) (2) (3) (5). The argument of showing that (5) implies (6) is similar to proving that j j (5) implies (3). Using Eq. (4.53) and the equality v‘ ¼ u‘ , we have that j
j
u‘ ¼ m‘j ¼ mij mi‘ ¼ w‘i wji ¼ w‘j ¼ v‘ j
(4.56)
j
for i ¼ 1, . . ., n. It follows that mij mi‘ ¼ ui u‘i ¼ v‘ for i ¼ 1, . . ., n. Therefore, mi m‘i ¼ ðvj þ ui Þ ðv‘ þ u‘i Þ ¼ ðvj v‘ Þ þ v‘ j
j
j
(4.57)
160
Gerhard X. Ritter and Gonzalo Urcid
j
j
or mi ¼ a þ m‘i , where a ¼ ðvj v‘ Þ þ v‘ and i ¼ 1, . . ., n. Hence, mj ¼ a þ m‘. Again, according to Corollary 7.4, we now have mj ¼ m‘. This shows that (5) implies (6). Equation (4.56) also provides the fact that ui‘ uij ¼ j w‘i wji ¼ v‘ for i ¼ 1, . . ., n. Hence mij mi‘ ¼ ðvi þ uij Þ ðvi þ ui‘ Þ ¼ j i i uj u‘ ¼ c, where c ¼ v‘ . This shows that (5) also implies (4). To show the converse—namely, that (4) implies (5)—we show that (4) implies (2). Since (2) (5), this will prove that (4) implies (5). Suppose that mj m‘ ¼ c. Then m‘j ¼ mi‘ for i ¼ 1, . . ., n. For i ¼ j we have j j j vj m‘ ¼ vj ðvj þ u‘ Þ ¼ u‘ ¼ c, while for i ¼ ‘ we obtain j ‘ ‘ ‘ mj v‘ ¼ ðv‘ þ uj Þ v‘ ¼ uj ¼ c. Hence, u‘j ¼ u‘ ¼ v‘j . We have now established that (1) (2) (3) (4) (5). We have left to show that (6) implies (5). To do so, we will show that j (6) implies (2) since (2) (5). Let m‘ ¼ mj. Then m‘i ¼ mi for i ¼ 1; . . . ; n. For j j i ¼ ‘, we have v‘ ¼ m‘ ¼ vj þ u‘ , while for i ¼ j we get vj ¼ m‘j ¼ v‘ þ u‘j . j j Therefore, vj ¼ ðvj þ u‘ Þ þ u‘j , or u‘ ¼ u‘j and, hence, v‘j ¼ u‘j . Thus, (6) implies (5). Since we have already shown that (5) implies (6), we now have (5) (6). Thus, we have (1) (2) (3) (4) (5) (6). □ Proof of Theorem 8.4. If k ¼ 1, the proof is trivial and for k ¼ 2 the proof follows trivially from the discussion preceding Theorem 8.4. Suppose k 3 and ij 2 Jk. The fact that mij 2 FðXÞ and Lðmij Þ FðXÞ follows from either Theorem 6.6 or from theorem 4.1 in Ritter and Gader (2006). Thus, pij 2 FðXÞ and, hence, pij lies between the hyperplanes Ewir ðdi‘ ir Þ and Ewi‘ ðdi‘ ir Þ for any pair i‘, ir 2 Jk \ {ij} with i‘ < ir. More precisely, in view þi‘ ðdi i Þ \ H i‘ ðdi i Þ ¼ H þ ir ðdi i Þ of Eq. (4.28), we have that pij 2 FðXÞ H ‘ r ‘ r ‘ r u v w \ H wi‘ ðdi‘ ir Þ 8i‘ ; ir 2 Jk fij g, where the last equality is a consequence of Corollaries 6.3 and 7.2. þ ir ðdi i Þ \ H i‘ ðdi i Þ ¼ Ewir ðdi i Þ [ E i‘ ðdi i Þ for some pair If pij 2 @½H ‘ r ‘ r ‘ r ‘ r w w w of integers i‘ < ir, then according to Corollary 8.2, Ewir ðdi‘ ir Þ \ Ewi‘ ðdi‘ ir Þ ¼ since wi‘ 6¼ wir . Therefore, either pij 2 Ewir ðdi‘ ir Þ or pij 2 Ewi‘ ðdi‘ ir Þ. Suppose without loss of generality that pij 2 Ewir ðdi‘ ir Þ. In this case, we have pij 2 Ewir ðdi‘ ir Þ \ Ewm ðdmij Þ8m < ij ;
(4.58)
pij 2 Ewir ðdi‘ ir Þ \ Ewm ðdij m Þ8m > ij :
(4.59)
Of course, if ij ¼ 1, then only Eq. (4.58) applies, and if ij ¼ k, then only Eq. (4.59) is being considered. We now prove that wir 2 Ewm ðdmij Þ8m < ij and wir 2 Ewm ðdij m Þ8m > ij . Let m 2 J \ {ij} with m < ij. Then there are two subcases that must be considered: ij < ir and ir < ij. If ij < ir, set L1 ¼ Ewir ðdi‘ ir Þ \ Ewm ðdmij Þ and note that pij 2 L1 . Let L2 ¼ Ewir ðdi‘ ir Þ \ Ewir ðdij ir Þ. Then pij 2 L2 and L1 and L2 are (n 2)-dimensional linear subspaces of Ewir ðdi‘ ir Þ with L1 k L2 in Ewir ðdi‘ ir Þ. Since pij 2 L1 \ L2 , we have that L1 ¼ L2. Hence wir 2 L2 ¼ L1 Ewm ðdmij Þ and, therefore, wir 2 Ewm ðdmij Þ
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
161
8m < ij < ir : If ir < ij , let L1 be defined as previously and set L2 ¼ Ewir ðdir i‘ Þ \ Ewir ðdir ij Þ. Again, L1 k L2 in Ewir ðdi‘ ir Þ and pij 2 L1 \ L2 . Thus, wir 2 L2 ¼ L1 Ewm ðdmij Þ and, hence, wir 2 Ewm ðdmij Þ8m < ij . The proof that wir 2 Ewm ðdij m Þ8m > ij is similar. Again, there are two subcases: ij < ir and ir < ij. If ij < ir, then for a given m > ij, set L1 ¼ Ewir ðdi‘ ir Þ \ Ewm ðdij m Þ and L2 ¼ Ewir ðdi‘ ir Þ \ Ewir ðdi‘ ir Þ to obtain wir 2 L2 ¼ L1 Ewm ðdij m Þ. If ir < ij, let L1 be defined as previously and set L2 ¼ Ewir ðdi‘ ir Þ \ Ewir ðdir ij Þ. Then again we have wir 2 L2 ¼ L1 Ewm ðdij m Þ. It now follows that \ \ Lðwir Þ ¼ Ewm ðdmij Þ \ Ewm ðdij m Þ ij <m
m
¼
\
m
Emij ðdmij Þ \
\
ij <m
Emij ðdij m Þ ¼ Lðmij Þ:
Therefore, if Lk1 denotes the (k 1)-dimensional linear space generated by Wk, then fwir g ¼ Lðwir Þ \ Lk1 ¼ Lðmij Þ \ Lk1 ¼ fa þ mij g ¼ fpij g or wir ¼ pij . If pij 2 Ewi‘ ðdi‘ ir Þ, then an analogous argument shows that wi‘ ¼ pij . This proves the theorem in case pij 2 Ewir ðdi‘ ir Þ [ Ewi‘ ðdi‘ ir Þ for some pair of integers il < ir in Jk \ {ij}. We now assume the case where pij 2 = Ewir ðdi‘ ir Þ [ Ewi‘ ðdi‘ ir Þ for all pairs of integers {i‘, ir} Jk \ {ij} with i‘ < ir. To show that pij is affinely = Lk2 independent of Wk fwij g, we will show that pij 2 ij . Assume to the ij k2 ij ij = Wk n fw g Lk2 for otherwise contrary that p 2 Lij . Note that p 2 ij pij ¼ wi‘ for some wi‘ 2 Wk n fwij g. But then pij 2 Ewi‘ ðdis i‘ Þ [ Ewis ðdis i‘ Þ for any is 2 Jk \ {ij, i‘} if is < i‘ or pij 2 Ewi‘ ðdi‘ is Þ [ Ewis ðdi‘ is Þ if i‘ < is. But = Ewir ðdi‘ ir Þ [ each of these possibilities contradicts our assumption that pij 2 Ew‘ ðdi‘ ir Þ for all pairs of integers {i‘, ir} Jk \ {ij} with i‘ < ir. Suppose {i‘, ir} Jk \ {ij} with i‘ < ir. If ij < i‘ < ir, let Sij i‘ ¼ Lk2 \ ij i‘ i‘ \ E ðd Þ. Then w 2 = S for otherwise w 2 Ewi‘ ðdij i‘ Þ and Sij ir ¼ Lk2 i r ij ir ij ir w ij Ewir ðdij ir Þ and, hence, fpij ; wi‘ g L1 ¼ Ewi‘ ðdij i‘ Þ \ Ewir ðdij ir Þ and wi‘ 2 L2 ¼ Ewi‘ ðdi‘ ir Þ \ Ewi‘ ðdij i‘ Þ with L1 k L2 in Ewi‘ ðdij i‘ Þ. But w 2 L1 \ L2 , which means that L1 ¼ L2. Thus, pij 2 L1 ¼ L2 Ewi‘ ðdi‘ ir Þ Ewi‘ ðdi‘ ir Þ [ Ewir ðdi‘ ir Þ, contrary to our assumption that pij 2 = Ewir ðdi‘ ir Þ [ Ewi‘ ðdi‘ ir Þ for all pairs of integers {i‘, ir} Jk \ {ij} with i‘ < ir. For analogous reasons = Sij i‘ . As illustrated in Figure 20, the set Sij i‘ separates we also have wir 2 þ i‘ ðdi i Þ \ Lk2 and A ¼ Lk2 into two components—namely, Aþ ¼ H j ‘ w ij ij k2 H wi‘ ðdij i‘ Þ \ Lij —so that according to Eq. (4.28), FðXÞ \ Lk2 Aþ . ij þ Observe that Sij i‘ ¼ @A ¼ @A is a support hyperplane of FðXÞ \ Lk2 ij in Lk2 . Similarly, the set Sij ir separates Lk2 into two components, Bþ ¼ i i j j þ ir ðdi i Þ \ Lk2 and B ¼ H ir ðdi i Þ \ Lk2 , with FðXÞ \ Lk2 Bþ . Thus, H j r j r w w ij ij ij þ þ if C1 ¼ A \ B , C2 ¼ A \ Bþ, C3 ¼ Aþ \ B, and C4 ¼ A \ B, then C1 . FðXÞ \ Lk2 ij Now, if k 4, then there exists an integer is 2 Jk \ {ij, i‘, ir} with is is k2 C1 . Reasoning as before, wis 2 Lk2 ij . Since w 2 FðXÞ, w 2 FðXÞ \ Lij is it is easy to verify that w 2 = Sij i‘ [ Sij ir and wi‘ , wir 2 = Sis ij ¼ Lk2 \ Ewis ðdis ij Þ ij
162
Gerhard X. Ritter and Gonzalo Urcid
Sisij C4
B+ C3
pij
C2 wil
A+ wir Sijir
wis C1 A+ ∩ B+
Sijil
FIGURE 20 Separation of wir from wil by Sis ij in Lk2 ij .
if is < ij or wi‘ , wir 2 = Sij is if ij < is. It follows that wis lies in the interior of C1. is The sets Sis ij for is < ij and Sij is for ij < is separate Lk2 ij . Specifically, since w lies in the interior of C1, the set Sis ij separates C2 ðSij ir \ Sij i‘ Þ from C3 ðSij ir \ Sij i‘ Þ and the same holds for Sij is if ij < is. Since wi‘ 2 = Sij ir , wi‘ 2 = Sij ir \ Sij i‘ , but i‘ i‘ w 2 @C2 C2 . Therefore, w 2 C2 n ðSij ir \ Sij i‘ Þ and, similarly, i‘ wir 2 C3 ðSij ir \ Sij i‘ Þ. Hence, Sis ij separates wi‘ from Wir in Lk2 ij . Since w 2 þ k2 k2 i ij ðdi i Þ \ L we must have w r 2 H ij ðdis ij Þ \ Lk2 FðXÞ \ Lij H s j ij ij w w which leads to the contradiction that wir2 = FðXÞ \ Lk2 ij . Applying the same reasoning in the cases where ij < is and i‘ < ij also leads to contramust be wrong and, dictions. Therefore, our assumption that pij 2 Lk2 ij hence, pij is affinely independent of W fwij g. Thus, tij ¼ pij sk2 is a ij (k 1)-dimensional simplex for k 4. If k ¼ 3, then Jk \ {ij} ¼ {i‘, ir} and we assume without loss of generality that i‘ < ir. In this special case, Lk2 ¼ L1ij is the line generated by {i‘, ir} and ij 1 1 FðXÞ \ Lij ¼ hi‘ ; i‘ i ¼ s . Thus, if pij 2 L1ij , then since pij 2 L1ij \ FðXÞ, pij must be in the interior of s1 as pij 2 = fi‘ ; i‘ g. But since fpij ; wi‘ g 1 ij ir Ewi‘ ðdij i‘ Þ \ Lij and fp ; w g Ewir ðdij ir Þ \ L1ij and L1ij is a line, we have L1ij Ewi‘ ðdij i‘ Þ \ Ewir ðdij ir Þ. Therefore, fwi‘ ; wir g L1ij L1 ¼ Ewi‘ ðdij i‘ Þ \ Ewir ðdij ir Þ. But L2 ¼ Ewi‘ ðdij i‘ Þ \ Ewir ðdij ir Þ k L1 in Ewi‘ ðdij i‘ Þ and since wi‘ 2 L1 \ L2 , L1 ¼ L2. Thus, we have wir 2 L1 ¼ L2 Ewi‘ ðdi‘ ir Þ and, by applying Corollary 8.2, wir ¼ wi‘ . This contradicts the fact that Wk is affinely inde= L1ij and pij is affinely independent of Wk n fwij g. pendent. Therefore, pij 2
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
163
This proves that if pij 2 = Ewir ðdi‘ ir Þ [ Ewi‘ ðdi‘ ir Þ for all pairs of integers ij ; . . . ; wik ; pij i {i‘, ir} Jk \ {ij} with i‘ < ir, then tij ¼ pij sk2 ¼ hwi1 ; . . . ; w ij is a (k 1)-dimensional simplex for any affinely independent set Wk and 1 k n. It should now be clear that if Mk M is affinely independent, then the proof that tij is either sk2 and qij ¼ mir for some ir 2 Jk \ {ij} or tij ¼ pij ij k2 sij is a (k 1)-dimensional simplex, follows from a similar analysis for □ the case of Wk. Proof of Theorem 8.5. Obviously, if wx 2 Wk, then wx 2 Lk1 Conversely, suppose that wx 2 k1 L . To show that this implies that wx 2 Wk, we proceed S in three steps. Step 1. We show that if wx 2 Lk1, then wx 2 sk1 [ k‘¼1 ti‘ . S To prove this step, assume to the contrary that wx 2 = sk1 [ k‘¼1 ti‘ . ij Also suppose that for some ij 2 Jk ; tij 6¼ sk2 ij . In this case, let fi‘ ; ir g Jk and assume, without loss of generality, that x < ij < i‘ < ir. If wx 2 Swi‘ ðij ; i‘ Þ, then fpij ; wx g L1 ¼ Ewi‘ ðdij i‘ Þ \ Ewx ðdxi‘ Þ and pij 2 L2 ¼ Ewi‘ ðdij i‘ Þ \ Epij ðdij i Þ for i > ij. Since L1 k L2 in Ewi‘ ðdij i‘ Þ and pij 2 L1 \ L2 ; L1 ¼ L2 . Thus, wx 2 L2 Epij ðdij i Þ. Since i > ij was arbitrary, i 1
j we have that wx 2 \ni¼ij þ1 Epij ðdij i Þ. Similarly, wx 2 \i¼1 Epij ðdiij Þ. Therefore,
fwx g ¼ Lk1 S \ Lðpij Þ ¼ fpij g, which is contrary to our assumption that = sk1 [ k‘¼1 ti‘ . Thus, we cannot have wx 2 Swi‘ ðij ; i‘ Þ. If wx 2 wx 2 Swir ðij ; ir Þ, then an analogous argument shows that wx ¼ pij 2 sk1 [ k [ ti‘ . This again contradicts our assumption. Therefore, if wx 2 = sk1 [ S‘¼1 k x 2 = Swi‘ ðij ; i‘ Þ [ Swir ðij ; ir Þ. ‘¼1 ti‘ , then w þ i‘ ðij ; i‘ Þ \ S þ ir ðij ; ir Þ; Pi ¼ S i‘ ðij ; i‘ Þ \ Swir ðij ; ir Þ; S1 i ðij ; i‘ Þ ¼ Put C ¼ S j w w w w‘ 1 Swi‘ ðij ; i‘ Þ \ C, and Swir ðij ; ir Þ ¼ Swir ðij ; ir Þ \ C. Thus, wx 2 F(X) \ Lk1 C, wx 2 = @C ¼ S1wi‘ ðij ; i‘ Þ [ S1wir ðij ; ir Þ; wi‘ 2 S1wi‘ ðij ; i‘ ÞnPij , and wir 2 S1wir ðij ; ir Þn = @C, wx is an interior point of C. Also, since Pij . Since wx 2 ij p 2 Swx ðx; ij Þ; Swx ðx; ij Þ \ @C ¼ Pij . Therefore, Swx ðx; ij Þ separates C into þ x ðx; ij Þ \ C and C ¼ S x ðx; ij Þ \ C with either wi‘ 2 S1 i ðij ; i‘ Þn Cþ ¼ S w w w‘ þ ir 1 Pij C and w 2 Swir ðij ; ir Þn Pij C or vice versa. If wi‘ 2 Cþ , then wi‘ 2 = FðXÞ, and if wir 2 Cþ , then wir 2 = FðXÞ. Thus, either case S will result = sk1 [ k‘¼1 ti‘ must in a contradiction. Hence, our assumption that wx 2 be wrong if there exists a tij with tij 6¼ sk2 ij . k2 We now suppose that t ¼ s for j ¼ 1, . . ., k. Equivalently, ij S ij we suppose that sk1 [ k‘¼1 ti‘ ¼ sk1 . According to Theorem 8.4, we i have that for a given ij 2 Jk, there exists i‘ 2 Jkj such that pi‘ ¼ wij . But ij ir then w 2 Swir ðir ; i‘ Þ8r < ‘ and, hence, hw ; wij i Swir ðir ; i‘ Þ. Similarly, hwij ; wis i Swis ði‘ ; is Þ8s > ‘. We need to remark that if ‘ ¼ 1 (i.e., pi1 ¼ wij ), then simply choose any r >1 with r 6¼ s and use hwij ; wir i and hwij ; wis i in the succeeding argument. The case ‘ ¼ k is handled in a þ is ði‘ ; is Þ. Using an argument ir ðir ; i‘ Þ \ S similar fashion. Now let C ¼ S w w analogous to the one presented at the beginning of the proof of Step 1, it is = Swir ðir ; i‘ Þ [ Swis ði‘ ; is Þ Thus, wx 2 = @C and, now easy to show that wx 2
164
Gerhard X. Ritter and Gonzalo Urcid
therefore, wx is an interior point of C. Setting Pi‘ ¼ Swir ðir ; i‘ Þ \ Swis ði‘ ; is Þ; þ x ðx; i‘ Þ \ S1wis ði‘ ; is Þ ¼ Swis ði‘ ; is Þ \ C, and S1wir ðir ; i‘ Þ ¼ Swir ðir ; i‘ Þ \ C; Cþ ¼ S w C and C ¼ S wx ðx; i‘ Þ \ C, we again obtain the contradiction that one of is ir thatS our assumption that wx 2 = sk1 [ w Sk or w is not in F(X). It now follows k x k1 [ ‘¼1 ti‘ which proves Step 1. ‘¼1 ti‘ must be false. Hence, w 2 s Step 2. If wx 2 sk1, then wx 2 Wk. To prove Step 2 we assume to the contrary that wx 2 = Wk. In this case, x w 6¼ s0 for any 0-dimensional face of sk1 since the elements of Wk are the vertices of sk1. Observe also that this is impossible if k ¼ 1 since then sk1 ¼ s0 and {s0} ¼ Wk. Next suppose that k >1 and that wx 2 s1 ¼ hwi‘ ; wir i for some 1D face = fwi‘ ; wir g wk , wx must be an interior of sk1 with i‘ < ir. Since wx2 1 point of s . Let L denote the line determined by fwi‘ ; wir g and suppose that x < i‘. Since s1 F(X) and Ewx ðdxi‘ Þ is a support hyperplane of FðXÞ; Ewx ðdxi‘ Þ does not cut s1. Thus, there exists a point x 2 s1 Ewx ðdxi‘ Þ with x 6¼ wx. Since fwx ; xg s1 \ Ewx ðdxi‘ Þ, {wx, x} also determines the line L and, therefore, L Ewx ðdxi‘ Þ. But then wi‘ 2 Ewx ðdxi‘ Þ k Ewi‘ ðdxi‘ Þ and, hence, Ewx ðdxi‘ Þ ¼ Ewi‘ ðdxi‘ Þ. By Corollary 8.2, wx ¼ wi‘ 2 Wk , contrary to our assumption that wx 2 = Wk. If i2 < x, then using the hyperplane Ewx ðdi‘ x Þ in the above argument instead also yields the contradiction that wx ¼ wi‘ 2 Wk . It follows that if k ¼ 1 or k ¼ 2, then the theorem is true. Furthermore, under the assumption that wx 2 = Wk we have that wx 2 = s1 for k1 for 2 k n. any 1D face of s We now proceed by induction. Suppose that wx 2 = sj for any j-dimenk1 sional face of s , where 2 z < k 1. Assume that wx 2 szþ1 ¼ hwj1 ; . . . ; wjzþ2 i for some subset {j1, . . ., jjþ2} Jk with j1 < j2 < < jzþ2. By z x the induction hypothesis, wx 2 = @szþ1 ¼ [zþ2 ‘¼1 sj‘ . Thus, w must be an interior zþ1 point of s . Let j‘ 2 {j1, . . ., jzþ2} and suppose that x < j‘. Since Ewx ðdxj‘ Þ is a support hyperplane of F(X) and of szþ1 FðXÞ; Ewx ðdxj‘ Þ does not cut szþ1. Hence szþ1 Ewx ðdxj‘ Þ. But then wj‘ 2 Ewx ðdxj‘ Þ k Ewj‘ ðdxj‘ Þ and, therefore, Ewx ðdxj‘ Þ ¼ Ewj‘ ðdxj‘ Þ. Again by Corollary 8.2, wx ¼ wj‘ 2 Wk , contrary to = Wk. If j‘ < x, then using the same argument we our assumption that wx 2 again obtain that Ewx ðdj‘ x Þ ¼ Ewj‘ ðdj‘ x Þ and, hence, wx ¼ wj‘ 2 Wk . Therefore, our assumption that wx 2 szþ1 must be wrong. This proves that if wx 2 = = sz for any z-dimensional face of sk1 with j ¼ 1, . . ., k 1. Wk, then wx 2 = sk1, which is contrary to our assumption that wx 2 sk1. Therefore, wx 2 k1 This proves that if wx 2 sS , then wx 2 Wk. k Step 3. If wx 2 sk1 [ ‘¼1 ti‘ , then wx 2 Wk. = Wk. Then according to Again we assume the contrary, namely, wx 2 the proof of Step 2, wx 2 = sk1. Hence, we must have wx 2 ti‘ n sk1 for some i‘ 2 Jk. We first prove that wx 6¼ pi‘ . 6 pi‘ and let fir ; is g Jki‘ with ir < is. In this Assume to the contrary that wx ¼ case, there are 12 possible ordered arrangements of the integers in the set {ir, is, i‘, x}; namely, i‘ < ir < is < x, or x < ir < is < i‘, or ir < is < i‘ < x, and so on. For the
165
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
remainder of the current analysis, suppose that i‘ < ir < is < x. Since Wk is affinely independent, wir ¼ 6 wis and, hence, Swir ðir ; is Þ \ Swis ðir ; is Þ ¼ with Swir ðir ; is Þ k Swis ðir ; is Þ in Lk1. It follows from Eq. (4.28) that pi‘ 2 FðXÞ \ ir ðir ; is Þ \ S þ is ðir ; is Þ. If pi‘ 2 @½S ir ðir ; is Þ \ S þ is ðir ; is Þ ¼ Swir ðir ; is Þ[ Lk1 S w w w w Swis ðir ; is Þ Ewir ðdir is Þ [ Ewis ðdir is Þ, then according to Theorem 8.4, ti‘ ¼ sk2 sk1 . But then wx 2 ti‘ n sk1 ¼ , which is impossible. Therei‘ x ir ðir ; is Þ \ S þ is ðir ; is Þ and pi‘ lies between Swir ðir ; is Þ and fore, w ¼ pi‘ 2 = @½S w w ði ; i Þ \ Sþ ði ; i Þ. Next, define two Swis ðir ; is Þ or, more precisely, pi‘ 2 S w ir r s wis r s k1 hypertriangles in L as follows: ir ðir ; is Þ \ S ir ði‘ ; ir Þ \ S þ is ði‘ ; is Þ; T1 ¼ S w w w is ði‘ ; is Þ S w
T2 ¼
\
þ ir ði‘ ; ir Þ S w
\
þ is ðir ; is Þ: S w
(4.60) (4.61)
The corners of T1 are given by Pi‘ ¼ Swir ði‘ ; ir Þ \ Swis ði‘ ; is Þ; Qir ¼ Swir ði‘ ; ir Þ \ Swir ðir ; is Þ, and Ri‘ ¼ Swir ðir ; is Þ \ Swis ði‘ ; is Þ. Similarly, the corners of T2 are given by Pi‘ ; Qis ¼ Swis ðir ; is Þ \ Swis ði‘ ; is Þ, and Rir ¼ Swis ðir ; is Þ \ Swir ði‘ ; ir Þ as illustrated in Figure 21. If pis 2 = sk1 , then define a third hypertriangle, þ ir ðir ; is Þ \ S þ ir ði‘ ; ir Þ \ S i‘ ði‘ ; ir Þ; T3 ¼ S w w w
(4.62)
with corners Pis ¼ Swir ðir ; is Þ \ Swi‘ ði‘ ; is Þ; Qir , and Ris ¼ Swir ði‘ ; ir Þ \ Swi‘ ði‘ ; is Þ. A consequence of these definitions is that Lk1 \ F(X) does not intersect the interior of Ti for i ¼ 1, 2, 3. Now consider the linear space Swx ðir ; xÞ. If Swx ðir ; xÞ \ Qis 6¼ , then there exists a point x 2 Swx ðir ; xÞ \ ½Swis ðir ; is Þ \ Swis ði‘ ; is Þ. Thus, x 2 L1 ¼ Ewis ðdir is Þ \ Ewis ðdi‘ is Þ and x 2 L2 ¼ Ewis ðdir is Þ \ Ewx ðdir x Þ. Swis(il,is)
R ir T2 Q is wis Pir
R il
Pil T1
til
s k-1
wir
Swil(il,is)
Q ir T3
R is
Pis wil
Swis(ir,is)
FIGURE 21
Swir(il,ir)
Swil(il,ir) Swir(ir,is)
The hypertriangles T1, T2, and T3 and their relationship to sk1 [ ti‘ .
166
Gerhard X. Ritter and Gonzalo Urcid
Since L1 k L2 in Ewis ðdir is Þ and x 2 L1 \ L2, L1 ¼ L2. But we also have x 2 L3 ¼ Ewx ðdir x Þ \ Ewis ðdi‘ is Þ with L2 k L3 in Ewx ðdir x Þ and, therefore, L1 ¼ L2 ¼ L3. Finally, since wx ¼ pi‘ and Ewis ðdi‘ is Þ ¼ Epi‘ ðdi‘ is Þ, we have L3 ¼ Epi‘ ðdir x Þ \ Epi‘ ðdi‘ is Þ. Thus, pi‘ 2 L3 ¼ L1 and wis 2 L1 ¼ L3 . It now follows that pi‘ ¼ wis , contrary to the fact that ti‘ n sk1 6¼ . Consequently, Swx ðir ; xÞ \ Qis ¼ and Swx ðir ; xÞ \ Ri‘ ¼ . Obviously, Swx ðir ; xÞ \ Qir ¼ since Swx ðir ; xÞ k Swir ðir ; xÞ in Lk1 , Swx ðir ; xÞ \ Swir ðir ; xÞ ¼ , and Swir ðir ; xÞ \ Qir 6¼ . Now since Ewx ðdir x Þ is a support hyperplane of F(X), Swir ðir ; xÞ does not cut F(X) \ Lk1. Hence, Swx ðir ; xÞ \ ½ðQis Pi‘ ÞnðQis [ Pi‘ Þ ¼ ¼ Swx ðir ; xÞ \ ½ðQir Pi‘ ÞnðQir [ Pi‘ Þ, Swx ðir ; xÞ \ ½ðQis Pi‘ Þ [ ðQir Pi‘ Þ ¼ Pi‘ , and Swx ðir ; xÞ \ ðti‘ nfpi‘ g ¼ . 6 ¼ 6 Swx ðir ; xÞ \ ½ðQir Ri‘ Þn Therefore, Swx ðir ; xÞ \ ½ðQis Rir ÞnðQis [ Rir Þ ¼ ðQir [ Ri‘ Þ. An analogous argument shows that Swir ðir ; xÞ \ ½ðQir Pis ÞnðQir [ Pis Þ ¼ ¼ Swir ðir ; xÞ \ ½ðQir Pi‘ ÞnðQir [ Pi‘ Þ;Swir ðir ; xÞ \ ½ðQir Pis Þ [ ðQir Pi‘ Þ ¼ Qir , and Swir ðir ; xÞ \ ½ðPi‘ Ri‘ ÞnðPi‘ [ Ri‘ Þ 6¼ . Since Swir ðir ; xÞ \ T1 separates Pi‘ from ðQir Pi‘ ÞnðQir [ Pi‘ Þ; Swir ðir ; xÞ \ Swx ðir ; xÞ 6¼ and, hence, Ewir ðdir x Þ \ Ewx ðdir x Þ 6¼ . Thus, by Corollary 8.2, wx ¼ wir 2 Wk , which contradicts our assumption that wx 2 = Wk. Hence, our assumption that wx ¼ pi‘ must be false in case tis nsk1 6¼ . Next suppose that tis nsk1 ¼ and consider the following six convex components of Lk1: C1 C2 C3 K1 K2 K3
þ ir ðir ; is Þ \ S þ ir ði‘ ; ir Þ; ¼ S w w ir ði‘ ; ir Þ \ S þ i‘ ði‘ ; is Þ; ¼ S w w þ ir ðir ; is Þ \ S i‘ ði‘ ; is Þ; ¼ S w w ir ðir ; is Þ \ S ir ði‘ ; ir Þ; ¼ S w w þ ir ði‘ ; ir Þ \ S þ i‘ ði‘ ; is Þ; ¼ S w w ir ðir ; is Þ \ S þ i‘ ði‘ ; i‘ Þ: ¼ S w w
Then Ci \ Ki ¼ Qir for i ¼ 1, 2, 3, [3i¼1 ðCi [ Ki Þ ¼ Lk1 ; T1 K1 ; T2 [ ðFðXÞ \ Lk1 Þ K2 , and [3i¼1 ðCi [ Ki ÞnK2 \ FðXÞ ¼ . Figure 22 illustrates this setup in case n ¼ 4 and k ¼ 3. We now consider the relationships between Swir ðir ; xÞ and the sets Ci [ Ki for i ¼ 1, 2, 3. Using our previous analysis of the intersections of two (k 2)dimensional hyperplanes in Lk1, it is easy to verify that Qir ¼ Swir ðir ; xÞ \ Swir ðir ; is Þ ¼ Swir ðir ; xÞ \ Swir ði‘ ; ir Þ ¼ Swir ðir ; xÞ \ Swi‘ ði‘ ; is Þ. Therefore, Swir ðir ; xÞ Ci [ Ki for some i 2 {1, 2, 3}. Clearly, since Swir ðir ; xÞ does not cut 6 C1 [ K1 , for otherF(X) \ Lk1, Swir ðir ; xÞ \ ðC2 [ K2 Þ ¼ Qir . Also Swir ðir ; xÞ wise Swir ðir ; xÞ must intersect Swx ðir ; xÞ in the interior of T1. Thus, the only possibility remaining is that Swir ðir ; xÞ C3 [ K3 . In this case, consider the hypertriangle T C3 with corners P ¼ Swir ðir ; xÞ \ Swis ði‘ ; is Þ; Qir , and Ri‘ . 6 and Swx ðir ; xÞ k6 Swis ði‘ ; is Þ, we Since Swx ðir ; xÞ \ ½ðQir Ri‘ ÞnðQir [ Ri‘ Þ ¼ must have Swx ðir ; xÞ \ ðP Qir Þ ¼ 6 . But this yields the contradiction that 6 . Hence our assumption that wx ¼ pi‘ must be false. Swx ðir ; xÞ \ Swir ðir ; xÞ ¼
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
Ril
K1 K2 Qis
Swis(il,is)
T2
Pil
wis
til
C3 T1
s k-1
P ir
167
wil
Sws(il,s)
Qir
Swil(il,is) wir
C1
C2 Swir(il,ir)
K3 Swis(ir,is) Swir(ir,is)
Swil(il,ir)
FIGURE 22 C1, C2, C3, K1, K2, and K3 are six convex components of Lk1 when tis sk1 ¼ for n ¼ 4 and k ¼ 3.
Since wx 2 ti‘ nsk1 and wx 6¼ pi‘ , we now have wx 2 ti‘ nðsk1 \ fpi‘ gÞ or, equivalently, wx is not a vertex of ti‘ . By following the steps used in the proof of the fact that wx 2 = sk1 in Step 2, we obtain that wx 2 = ti‘ . More specifically, it is easy to show that wx cannot be an interior point of any 1D face s1 ¼ hwir ; pi‘ i of ti‘ . Then using the same induction step as in Step 2 and assuming that wx 2 szþ1 ¼ hwj1 ; . . . ; wjzþ1 ; pi‘ i for some arbitrary z subset fj1 ; . . . ; jzþ1 g Jki‘ and wx 2 = @szþ1 ¼ [zþ1 where r¼1 sjr , z z1 jr ; . . . ; wjzþ1 ; pi‘ i, will lead to the contradiction sjr ¼ sjr pi‘ ¼ hwj1 ; . . . ; w that wx 2 = sjþ1. Therefore, wx 2 = ti‘ and, hence, wx 2 = sk1 [ \k‘¼1 ti‘ which is x contrary to the hypothesis of Step 3. Thus, w 2 Wk. If the order of the integers i‘, ir, is, and x, is different than the one used in the above argument—say, for example, that ir < is < i‘ < x—then simple relabeling of the objects used in the proof will also yield the conclusion that wx 2 Wk. For instance, the surfaces Swi‘ ði‘ ; ir Þ and Swir ði‘ ; ir Þ would be replaced by Swir ðir ; i‘ Þ and Swi‘ ðir ; i‘ Þ, respectively. For all possible order arrangements, only Swir ðir ; is Þ and Swis ðir ; is Þ remain the same since we had ir < is. As this relabeling amounts to simple bookkeeping, we dispense with repeating the above proof for the remaining cases of possible orders of the elements of the set {i‘, ir, is, x}. The proof that mx 2 F k1 if and only if □ mx 2 Mk is analogous.
REFERENCES Adams, J. B., Smith, M. O., and Johnson, P. E. (1986). Spectral mixture modeling: a new analysis of rock and soil types at the Viking Lander 1 site. J. Geophys. Res. 91(B8), 8098–8112.
168
Gerhard X. Ritter and Gonzalo Urcid
Anderson, J. C., Gosink, L., Duchaineau, M. A., and Joy, K. I. (2007). Feature identification and extraction in function fields. In ‘‘Proceedings Eurographics/IEEE-VGTC Symposium on Visualization,’’ pp. 195–201. Batten, L. M., and Rentelspacher, A. (1993). The Theory of Finite Linear Spaces. Cambridge University Press, Cambridge, UK. Boardman, J. W. (1993). Automated spectral unmixing of AVIRIS data using convex geometry concepts. In ‘‘Summaries, Fourth JPL Airborne Geoscience Workshop,’’ Vol. 1, pp. 11–14. JPL Publication 93-26. Chan, T- H., Chi, C- Y., Huang, Y- M., and Ma, W- K. (2009a). Convex analysis based minimum-volume enclosing simplex algorithm for hyperspectral unmixing. In ‘‘Proceedings of IEEE, International Conference on Acoustics, Speech, and Signal Processing,’’ pp. 1089–1092. Chan, T- H., Chi, C- Y., Huang, Y- M., and Ma, W- K. (2009b). A convex analysis based minimum-volume enclosing simplex algorithm for hyperspectral unmixing. IEEE Trans. Signal Process 57(11), 4418–4432. Clark, R. N., Swayze, G. A., Wise, R., Livo, E., Hoefen, T., Kokaly, R., and Sutley, S. J. (2007). USGS digital spectral library splib06a: U.S. Geological Survey, Digital Data Series 231. http://speclab.cr.usgs.gov/spectral.lib06. Craig, M. D. (1994). Minimum-volume transforms for remotely sensed data. IEEE Trans. Geosci. Remote Sensing 32(3), 542–552. Critescu, R. (1977). Topological Vector Spaces. Nordhoff International Publishing, Leyden, Holland. Dobigeon, N., Tourneret, J. Y., and Chang, C. I. (2007). Semi-Supervised Linear Spectral Unmixing using a Hierarchical Bayesian Model for Hyperspectral Imagery, pp. 1–34. IRIT-ENSEEHIT-Te´SA, France, Technical Report. Duda, R. O., Hart, P. E., and Stork, D. G. (2000). Pattern Classification, 2 ed. John Wiley and Sons, New York. Eggleston, H. G. (1963). Convexity. Cambridge University Press, Cambridge. Gran˜a, M., Herna´ndez, C., and Gallego, J. (2004). A single individual evolutionary strategy for endmember search in hyperspectral images. Inform. Sci. 161(3–4), 181–197. Gran˜a, M., Jime´nez, J. L., and C. Herna´ndez, C. (2007). Lattice independence, autoassociative morphological memories and unsupervised segmentation of hyperspectral images. In ‘‘Proceedings 10th Joint Conference on Information Sciences,’’ pp. 1624–1631. Gran˜a, M., Sussner, P., and Ritter, G. X. (2003). Associative morphological memories for endmember determination in spectral unmixing. In ‘‘Proceedings of IEEE, International Conference on Fuzzy Systems,’’ pp. 1285–1290. Gran˜a, M., Villaverde, I., Maldonado, J. O., and Herna´ndez, C. (2009). Two lattice computing approaches for the unsupervised segmentation of hyperspectral images. Neurocomputing 72(10–12), 2111–2120. ¨ ber Inhalt, Oberfla¨che und Isoperimetrie. Springer, Berlin. Hadwiger, H. (1957). Vorlesungen U Hook, S. J. (1999). ASTER spectral library: Jet Propulsion Laboratory. California Institute of Technology, Pasadena, CA, http://speclib.jpl.nasa.gov. Ikramov, Kh.D., and Matin-far, M. (2006). Computer-algebra implementation of the least squares method on the nonnegative orthant. J. Math. Sci. 132(2), 156–159. Jensen, J. R. (2007). Remote Sensing of the Environment: An Earth Resource Perspective, 2 ed. (Prentice-Hall Series in Geographic Information Systems), Upper Saddle River, NJ, Prentice-Hall. Kruse, F. A., Richardson, L. L., and Ambrosia, V. G. (1997). Techniques developed for geological analysis of hyperspectral data applied to near-shore hyperspectral ocean data. In ‘‘Proceedings IV International Conference on Remote Sensing for Marine and Coastal Environments,’’ I, pp. 233–246. Environmental Research Institute of Michigan, Ann Arbor, MI.
Lattice Algebra Approach to Endmember Determination in Hyperspectral Imagery
169
Lawson, C. L., and Hanson, R. J. (1974). Solving Least Squares Problems. Prentice-Hall, Englewood Cliffs, NJ. Myers, D. S. (2005). Hyperspectral Endmember Detection Using Morphological Autoassociative Memories, pp. 1–51. M.S. thesis, University of Florida, Gainesville, FL. Nascimento, J. M. P., and Bioucas-Dias, J. M. (2005). Vertex component analysis: a fast algorithm to unmix hyperspectral data. IEEE Trans. Geosci. Remote Sensing 43(4), 898–910. Richardson, L. L., Buison, D., Liu, C. J., and Ambrosia, G., V (1994). The detection of algal photosynthetic accessory pigments using AVIRIS spectral data. Marine Technol. Soc. J. 28, 10–21. Ritter, G. X., Dı´az de Le´on, J. L., and Sussmer, P. (1999). Morphological bidirectional associative memories. Neural Netw. 12, 851–867. Ritter, G. X., and Gader, P. (2006). Fixed points of lattice transforms and lattice associative memories. Adv. Imaging Electr. Phys. 144, 165–242. Ritter, G. X., Sussner, P., and Dı´az de Le´on, J. L. (1998). Morphological associative memories. IEEE Trans. Neural Netw. 9(2), 281–293. Ritter, G. X., Urcid, G., and Iancu, L. (2003). Reconstruction of patterns from noisy inputs using morphological associative memories. J. Math. Imaging Vision 19(2), 95–111. Ritter, G. X., Urcid, G., and Schmalz, M. S. (2009). Autonomous single-pass endmember approximation using lattice auto-associative memories. Neurocomputing 72(10–12), 2101–2110. Roberts, D. A., Gardner, M., Church, R., Ustin, S., Sheer, G., and Green, R. O. (1998). Mapping chaparral in the Santa Monica Mountains using multiple endmember spectral mixture models. Remote Sensing Environ. 65(3), 267–279. Vane, G., Green, R. O., Chrien, T. G., Enmark, H. T., Hansen, E. G., and Porter, W. M. (1993). The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS). Remote Sensing Environ. 44, 127–143. Winter, M. E. (1999a). Fast autonomous spectral endmember determination in hyperspectral data. In ‘‘Proceedings 13th International Conference on Applied Remote Sensing,’’ 2, pp. 337–344. Winter, M. E. (1999b). An algorithm for fast autonomous spectral endmember determination in hyperspectral analysis. In ‘‘Imaging Spectrometry, Proceedings of SPIE,’’ 3753, 266–275.
This page intentionally left blank
Chapter
5 Origin and Background of the Invention of the Electron Microscope* Reinhold Ru¨denbergw
Contents
1. Introduction 2. Cathode Ray Tubes in Evolution 3. My Own Career 4. A Case of Infantile Paralysis 5. Limitations of Optical Microscopes 6. Realization of the Electron Microscope 7. Fate of the Invention 8. Building of the Instrument References
172 173 178 181 183 184 193 195 198
* The original manuscript for this article is from the Papers of Reinhold Ru¨denberg. 1901-1964 (Harvard
University Archives, Pusey Library (Cambridge, MA), HUG 4756.31). This text is the latest of several drafts written by Reinhold Ru¨denberg. Some earlier drafts are still extant and contain a few variant readings. These are noted and included here in their original form when they seemed significantly different from Ru¨denberg’s final version. This text reproduces the 1959 text and figures of Ru¨denberg’s memoir exactly. However, some of the original references have been expanded, titles and closing page numbers have been added, and a few corrections have been made. For a general introduction to Chapters 5 and 6, see the Preface to this volume, especially pp. x–xiii. Gordon McKay Professor of Electrical Engineering, Emeritus, Harvard University, Cambridge, Massachusetts, USA; and Professor Emeritus and Honorary Senator, Technological University of Berlin, Germany w
Deceased
Advances in Imaging and Electron Physics, Volume 160, ISSN 1076-5670, DOI: 10.1016/S1076-5670(10)60005-5. Copyright # 2010 Elsevier Inc. All rights reserved.
171
172
Reinhold Ru¨denberg
1. INTRODUCTION{ During the past decades a number of deep-searching investigations have been published about the mechanism of the mental creation of significant inventions in the various areas of knowledge,1a,b in contrast to the more or less technical advancement of constructions in well-known fields. The question has been raised, whether and in which way entirely new realizable thoughts or concepts may be created without strictly logical deductions, be they conscious or unconscious ones, and which role mere intuition may play herewith. Unfortunately many of the inventors either were too busy, or even have died, before their inventions became fully developed, and therefore have not been in a position to describe themselves in detail the mental procedures which have led to their achievements. It then is very difficult to penetrate later by means of clues and deductions from outside into the thoughts which the inventor may have had before. An example of the opening of a new branch of research which formerly was entirely unknown is presented by the advent of the electron microscope, which by now has grown into an indispensable instrument for the pursuit of progress and discoveries in many areas of the experimental sciences and engineering. Since I have participated in the origin of this instrument as inventor, my description of the background and the beginning of its development may be of interest not only for the physicist or engineer but also for the psychologist and historian. Naturally it is necessary for such a description to analyze very accurately the previous volume of knowledge and experience in the mother areas in such detail that an understanding is gained for the broad foundation on which this disclosure occurred. Therefore it seems not sufficient to me to consider only the general concepts and interrelations as we see them in retrospect, but it shall be tried to depict the actual physical facts and events in their true historical sequence and in such specific words that all scientists, and I hope nonscientists, may understand the chronological flow of the developments. In part I am following with this a lecture which I gave before a broader audience some time ago at the occasion of the centenary of the founding of the Engineering School of Harvard University. I must also report in a condensed way about my personal circumstances, since otherwise the mental path of development to the electron microscope could not be completely understood.
{
Summary from orginal manuscript: About thirty years ago it was realized that electron rays can be employed for imaging purposes in the magnification of such minute details of nature which are invisible in the lightoptical microscope. The way in which this discovery originated and was utilized, the background of the art of electronics as known at the time, and of a number of personal circumstances of the inventor, are presented here for the first time and references are given for every step.
Origin and Background of the Invention of the Electron Microscope
173
2. CATHODE RAY TUBES IN EVOLUTION Just a century ago J. Plu¨cker2 in 1859, W. Hittorf3 in 1869, and W. Crookes4 in 1879, contrived the cathode ray tube consisting of a vacuum vessel in which high-voltage electricity was discharged between a negative pole, the cathode, and the positive anode. All of these explorers observed that rays emerged in straight lines from the cathode surfaces, which could be seen from outside due to the fluorescence of the low-pressure gas content of the tube. Even shadows of obstacles to the cathode rays could be observed on an internal fluorescent screen, proving their rectilinear spreading. These tubes were the precursors of those used in many modern applications. It was suspected from the outset, and we know today for certain, that these cathode rays consist of swarms of very minute negative electrical particles, or electrons. From such crude beginnings many scientists in physics and engineering derived consequently revolutionary ways, means, and materials. The pioneers mentioned had already observed a deflection of their rays under the effect of an external magnet. In 1881, and successively in 1901 and 1902, E. Riecke5a-c showed mathematically that free electrically charged particles subjected to uniform magnetic or electric fields, or both, stream in circular, helical, or cycloidic curves, depending on the direction and intensity of those fields. In 1890, A. Schuster6 produced and measured such spiral paths of cathode rays in a magnetic field traversing the vacuum tube. P. Lenard7 in 1894 found that cathode rays could pass through a thin aluminum window into the outer space which gave an indication for the smallness of the electron particles, finding their way through the window’s molecules. And in 1895, W. Ro¨ntgen8 discovered the X-rays which emanate from any solid material hit by cathode rays in the vacuum vessel. Very soon, in 1897, J. J. Thomson9 proved by ingenious experiments with cathode rays deflected in vacuum tubes, the existence and the behavior of electrons as the elementary corpuscles of electricity. Independently, also in 1897, very similar experiments were performed by E. Wiechert10 with lower accuracy, and by W. Kaufmann11a-c with higher precision, both leading to the same results and thus confirming this amazing fact by a multitude of evidences. By applying the magnetic field of a coil transversely to the stream or beam of electrons in a long vacuum tube, similarly as already done in Schuster’s early experiments, F. Braun12 in 1897 developed the cathode ray deflection tube bearing his name, as a measuring instrument for rapidly changing electric currents. And in 1898, H. Ebert13 showed that an electric field between charged deflection plates across the tube may be used to
174
Reinhold Ru¨denberg
FIGURE 1 Cathode ray tube outline.
measure similarly electric voltages. Both these means today are widely applied in the millions of television tubes and oscilloscopes of every kind. The principal scheme of such a tube as shown in Figure 1 embodies a neck, containing the negative cathode emitting electron rays which pass the positive anode and then can be deflected either by an electromagnetic coil or by a pair of electrostatic plates, both across the neck. It further contains a flared funnel terminated by an extended screen on which the cathode rays impinge, thus tracing a luminous path on its fluorescent layer if the current in the coil or the voltage across the plates or both are changing with time. During the early period, a sharp spot of cathode rays falling on a fluorescent screen as necessary for precise measurements was produced by a small aperture in an intermediate mask, as also shown in Figure 1. However, E. Wiechert14 showed in 1899 that a much more efficient concentration of the electron beam could be produced by an electric coil around the beam producing an axially directed magnetic field, so that under its force the rays stream in long helixes of slight inclination to the axis, as generally analyzed long ago by Riecke, and all of them hit the screen on a narrow spot. In 1905, R. Rankin15 used a similar but shorter coil to the same effect, and this has been repeated by many others thereafter. Much scientific and engineering work has been performed in the following years on special problems connected with cathode ray tubes. After T. A. Edison16 in 1884 and J. A. Fleming17 in 1890 had discovered the valve effect of a hot cathode in a vacuum, O. W. Richardson18 in 1901 analyzed successfully the laws of thermionic electron emission from an incandescent cathode. And in 1904 A. Wehnelt19 explored this experimentally and found a much more abundant emission from oxide-coated cathodes, even at moderate temperature.
Origin and Background of the Invention of the Electron Microscope
175
In 1906 there began the development of the amplifier triode independently initiated by L. de Forest20a,b and by R. von Lieben,21 who both used lattices or grids between cathode and anode to control by their charge the intensity of the electron flow. This created a far-reaching revolution in the entire communication field, which nowadays goes under the collective name of ‘‘Electronics.’’ If this was on the one hand an ever more important application of electrons to practical engineering, there followed on the other hand in 1912 the fundamental discovery by M. von Laue22a-c and W. L. Bragg23 of the wave character of electronically created X-rays, with wave lengths much shorter than light or even than ultraviolet rays. At the same time this gave a clear demonstration of the molecular fine-structure of crystals to which these explorers subjected their irradiating X-rays. Returning to cathode ray tubes proper, A. Wehnelt24 devised in 1908 a coaxial cylindrical electrode adjacent to the cathode, in order to change the concentration of the cathode rays, and allowing to control the intensity of the electron beam emitted. This was an electrostatic counterpart to the magnetic concentration coils mentioned above. J. E. Lilienfeld25 in a patent of 1915, but published in 1923, devised for a similar purpose a series of apertured electrodes of increasing potential between cathode and anode along the axis. This scheme was varied by J. B. Johanson26 in 1922 to the use of a disk and a cylinder of appropriate voltages, and by L. T. Jones with H. G. Tasker27 in 1924 to the application of a cap and a plate, both in tubes containing rest-gases for additional concentration of the beam by effect of free ions. Finally W. Rogowski28 with W. Grosser in 1925 and R. H. George29 in 1929 used various shapes and potentials of cathode and anode and of intermediate electrodes in order to form a ray concentrating field distribution, and C. A. Sabbah30 in 1926 as well as M. von Ardenne,31 V. K. Zworykin,32a,b and P. E. L. Chevallier,33 all of them in 1930, repeated this in several consecutive steps. All these designs were planned to produce a controllable sharp luminous spot on the screen for use in deflection tubes of oscilloscopes. Among the many investigators who used for the same purpose Wiechert’s concentration coil coaxial to the electron beam, some published their success, as did C. Samson34 in 1918 and H. Behnken35 in 1922, without presenting any significant new results in this respect. However, D. Gabor,36 working in 1927 on cathode ray oscillographs for high voltage and rapid action, improved considerably the effect of magnetic concentration coils by enclosing them in iron, leaving an active central gap with greatly increasing magnetic field strength in the electron zone. Finally in 1931 E. Ruska with M. Knoll37 showed how to compute such magnetic concentration coils for condensing electron rays to a spot.
176
Reinhold Ru¨denberg
FIGURE 2
Cathode ray tube schematic.
A complete cathode ray tube now looked like Figure 2 in which the basic elements are marked with the names of their principal developers. This is still the main scheme of present day oscilloscopes and television tubes. Whereas most of these innovations had resulted from experimental work in the field, three remarkable theoretical developments ensured a much deeper insight into the mechanism of action of electrons in electromagnetic fields of force. The first goes back to 1903, when K. Schwarzschild38 investigated the general performance of charged particles in such fields and found very precise mathematical expressions connecting the problem to general Hamiltonian mechanics and optics. The others restricted themselves to the effects of coaxial concentration fields. W. Rogowski39 with W. Gro¨sser in 1925 proved graphically that a bundle of electron rays emitted from a cathode point on the axis of a relatively short coil before entering its magnetic field, is deflected circularly within radial planes, and therefore after leaving the coil will return to the axis at a common point farther out, under the condition that the coil of proper excitation is located in the middle between these two points. Such a design actually gave a considerable intensification of the luminous spot on the fluorescent screen. Then, in 1926 and 1927 H. Busch40a,b investigated the same problem analytically and showed that, without the last mentioned condition, quite generally the position of the concentration point is determined by Gauss’ optical law, only depending on the intensity of the magnetic field. Thus, a concentration coil has a definite focal length, for which he gave a formulation, and acts on electron rays in the same way as ‘‘a condensing glass lens’’ (Sammellinse) acts on light rays. With electrons also, a finite spot of emission on the cathode becomes enlarged or reduced on the screen proportional to the distances from the coil.
Origin and Background of the Invention of the Electron Microscope
177
Busch’s analysis was restricted to homogeneous magnetic fields of either very long or very short coils. However it was indicated that with non-homogeneous fields the well known aberrations of optical lenses with light rays have their counterpart with concentration coils and electron beams. In using all these results it now was possible for him to greatly sharpen the fluorescent spot. Regarding electrostatic fields, Busch also had analytically investigated their general effect on electron beams, but as he explained himself41 ten years later: ‘‘without however drawing the conclusion of the existence of an electric lens as an independent imaging organ.’’ Actually, this latter was done for concentration purposes by M. Knoll42 in a patent of 1929 published in 1940, and for image formation, which later will be explained in detail, by myself89b,89d in a patent of 1931, first published in 1932, the two disclosures thus overcutting each other. Both these patents described among others 3-aperture electrodes for focusing an electron beam independent of all other influences. In parallel to this gradually increasing knowledge of the internal electron ray behavior in vacuum vessels, a great number of investigators, particularly within the period from 1920 to 1930, perfected the usage of the original Braun tube into a cathode ray oscillograph by refining its design and the external circuit performance to those of a reliable measuring instrument. These were especially A. Dufour43 and J. Fallou44 in France; D. A. Keys,45 A. B. Wood,46 N. V. Kipping,47 and J. T. McGregor-Norris48 in England; H. Norinder49 in Sweden; K. Berger50 in Switzerland; H. L. Ryan,51 E. L. Chaffee,52 F. Bedell53 with H. J. Reich, E. S. Lee,54 and K. B. McEachron55 in the U.S.A; J. Zenneck,56 H. Th. Simon,57 E. Madelung,58 W. Rogowski59a-c with E. Flegler, D. Gabor,60a,b A. Matthias,61 M. Knoll,62a,b and W. Krug63a,b in Germany, and also myself.64a,b They all tried to obtain a well-controlled tiny luminous spot on the fluorescent screen in their deflection tubes of various kinds, and these were used in that period predominantly to move the spot rapidly over the screen in order to indicate or to measure voltage or current or time, or to show their interrelations by tracing a luminous curve on the screen. This was the state of the art around 1930. A very extensive and thorough research into the international literature and patents in the field of cathode ray devices until 1931 performed by E. Alberti,65 a member of the German Patent Office, resulted in no different picture. In the meantime, deep-rooted concepts related to the characteristic properties of electrons had been disclosed, which changed gradually the entire philosophy of physics. In 1924 to 1925, L. de Broglie66a-c and in 1926 E. Schro¨dinger67 published their revolutionary theories of the wave
178
Reinhold Ru¨denberg
concept of electrons and other particles, and in 1927 C. J. Davisson68 with L. H. Germer and in 1928 G. P. Thomson69 had verified by diffraction experiments with crystals and thin foils the wavelike behavior of electron rays. These now have to be conceived of in a dual way, as particles as well as waves, the frequency and wave length of the latter depending on the velocity of the electrons in space. As viewed a posteriori, one should believe that every theoretical of experimental scientist aware of such developments as depicted before could now have easily conceived and realized the idea of an electron microscope. However, none of them, not even the many inventors and developers as named above in the cathode ray oscillograph field, have actually disclosed in any way such an inventive step before the year 1931. They had concentrated their attention perhaps too much on the production of a minute cathode ray spot in its movement over a luminous screen. One inventor, H. Stintzing,70 had suggested in 1927 to count or register minute particles by means of X-rays, gamma-rays, or corpuscular rays— without, however, indicating anything like producing a magnified real image of them. Another scientist, E. Bru¨che,71 thinking in broader lines and later becoming one of the foremost experts in electron optics, explained still in 1930: ‘‘Quite generally, the magnetic electron deflection is analogous to the refraction, the electric deflection is analogous to the reflection of light,’’ without drawing any conclusion in our direction from this half-truth, half-fallacy. Also for myself, a stimulation from outside and coming from an entirely different direction, proved to be necessary to convey to me through a toilsome process of thinking the recognition of the possibility of the creation of an electron microscope. In order to explain this in detail it will be necessary to look briefly into the development of my own career as a scientific engineer. I should say at this point that, although only a part of the aforementioned publications had been known to me at that time, I always was well aware of the more important achievements in the general domain of electrons and electromagnetic fields, which I considered as one of the scientific foundations of the area of electrical engineering.
3. MY OWN CAREER From youth on I felt attracted to the wonders of electricity, a field which then was still in its infancy. When in High School (German ‘‘Gymnasium’’) until 1901 I had built a two-way Morse Telegraph from our playroom to a garden cottage, and later for a science fair a self-excited Siemens generator.
Origin and Background of the Invention of the Electron Microscope
179
While in the upper classes, I constructed a Ru¨hmkorff inductor to operate a Roentgen tube, and also a Righi transmitter with Marconi receiver, so that I became quite familiar with X-rays as well as with radio waves. The making of electric and other machinery I learned after high school during a year in the workshops of several machine factories. After studying from 1901 to 1906 electrical and mechanical engineering at the Institute of Technology in my native town of Hannover, Germany, terminating with a Doctoral dissertation on Eddy-Current Formation,72 I served from 1906 to 1908 with the University of Go¨ttingen as an assistant to L. Prandtl and an instructor in the applied mechanics laboratory. Besides, I did some research on high-frequency production in the electrical laboratory of the University, and I had an opportunity of taking courses in the newly developed field of electron theory with M. Abraham and E. Wiechert, then two outstanding experts in this area, and also with K. Schwarzschild in astronomy and related problems. In 1908 I joined the industrial firm of Siemens-Schuckertwerke in Berlin as a design engineer for electric commutator motors. In 1914 I became chief engineer for alternating current machines and in 1919 for all types of electric power machinery. In between I served through three years at half time with the patent department of this firm, learning also the essential tools of that field. In 1923 I advanced to the position of Chief Electrical Engineer. As such I was in charge of further developments in the field of electric high-power engineering (Starkstrom-technik), whereas the adjacent field of low-power engineering (Schwachstromtechnik) was covered by the sister firm of Siemens & Halske. During the time with Siemens-Schuckertwerke I had an opportunity to develop a number of new electrical designs in their manufacturing area, for example, three-phase shunt motors with brush control of speed,73a direct inner cooling of high-voltage generator windings,73b1,b2 one of the two first 60,000 kW turbine generators,73c induction motors with automatic starting by rotor eddy currents,73d smooth hollow conductors for high-voltage transmission lines,73e high-frequency carrier telephony on long power lines,73f automatic relay systems for distant interconnected power stations,73g1,g2 and many others in the electric power field. For numerous of these achievements patents were granted in many countries as it is usual with industrial enterprises. Further developments and experiences I condensed in 1923 into a book on Transients in Electric Power Systems,74a,b the fourth edition of which was published in the English, German, Russian, and Romanian languages. Other investigations in theory and practice I have published by papers in engineering journals75 and by lectures in scientific societies. Among these were production and propagation of shock waves from explosions,76a preferred number series for standards of materials and
180
Reinhold Ru¨denberg
machines,76b the cause of inductive interference between power and communication lines,76c the concept of natural power, or surge impedance loading, of long-distance transmission lines,76d generation of electric power from heat by forcing ionized vapor across a magnetic field,76e and quality control as based on statistical theory.76f In 1913 I joined the Institute of Technology in Berlin as a lecturer (Privatdozent) on Special Problems in Electric Power Engineering, in order to instruct promising students in such modern developments as named above, which were not yet included in the standard courses. For this I was awarded an Honorary Professorship in 1927.* And in 1929 I had been called for similar lectures as a visiting professor to the Massachusetts Institute of Technology in Cambridge. Despite my practical work in industry and teaching I always tried to remain in contact with its foundations and even to perform basic research where possible. For example, while experimenting with the generation of high-frequency currents by machinery,77 I analyzed quantitatively the fundamental mechanism of the reception of electric waves in antennas,78 and these early results are still widely used in this field. Forced by the Nazi system, I left Germany in 1936 and after an association through about three years with the British General Electric Co. in England, I was called in 1938 to Harvard University in the U.S.A., where I served as Gordon McKay Professor of Electrical Engineering until my retirement in 1952. My principal knowledge of the theory and practice of electrons stems from my early time in Go¨ttingen where I was close to a number of scientists as mentioned above in the historical evolution of cathode ray tubes (Riecke, Wiechert, Simon, Schwarzschild, Madelung, Busch). Others in that group I met later in Berlin (Wehnelt, Rogowski, Matthias, Gabor, von Laue, Zenneck, Schro¨dinger), and I kept well in touch with the development of this fascinating electron field. In Go¨ttingen laboratories when working with Braun’s cathode ray tubes used as measuring instruments, I had occasionally observed strange patterns drifting over the luminescent screen, appearing and disappearing at random, which could not be explained by any known means, and which I almost had forgotten later on. Nowadays I surmise that these patterns must have been magnified electron images of the structure of the cathode ray surface, but at that early date nobody dared to think of such a fantastic explanation.
* May be a mistranslation of the German title of ‘‘ Honorarprofessor.’’
Origin and Background of the Invention of the Electron Microscope
181
From early Ru¨denberg draft July 3, 1945.
4. A CASE OF INFANTILE PARALYSIS Now it happened in the autumn of 1930, that my youngest son Hermann at the age of nearly three years was stricken with poliomyelitis. Many discussions with our physicians and particularly extensive conversations with Oskar Minkowski,79a-c an uncle of my wife and an outstanding medical scientist of diabetes fame, revealed to me that almost nothing was known about the presumptive germ, or virus, of this disease, except that it was filterable through the finest capillaries, but evaded any observation under the optical microscope. It thus must be of submicroscopic minuteness. This amazing fact and its significance for science and health gave me no rest in my thoughts. During many sleepless nights, tortured by the fate of my son, agonizing fantasies came and went, how to find ways to examine these minute germs, how possibly to attack them in order to attain healing
182
Reinhold Ru¨denberg
or at least a standstill of the disease. Certainly an agent finer than light had to be found to make these tiny viruses of immeasurable size visible to the human eye. X-rays, which von Laue and Bragg had shown to be waves like light, only of very minute length, had to be discarded since no means could be devised to influence their paths like those of visible light by glass lenses or curved mirrors. One early morning however, about November 1930, I told my wife that it might be conceivable in principle to recognize such viruses by the use of electrons and possibly even visualize them by magnifying imaging. For, electrons as elementary building stones of our material world and constituents of the atoms would probably be much smaller in size than the assumed virus particles. But how to realize such vague ideas for the recognition of minute germs, much smaller than the length of the shortest light waves and therefore completely outside the realm of our visible world? Only the drifting patterns in the vacuum tubes years back in Go¨ttingen seemed to me to indicate that electrons might actually serve as carriers of image patterns, if the contrasts of these could be imposed in some way on an electron beam. And this did not appear impossible to me, since Lenard’s early experiments had shown that thin aluminum windows could be penetrated by the electron particles of cathode rays which, as was known now, have sizes of only one millionth of one millionth of a centimeter. Even then, ways and means had to be found to magnify these patterns, perhaps by diverging the electron rays very widely in a cathode ray tube, or even first converging them in order to simulate the effect of glass lenses in the light microscope. Finally a fairly conclusive conception developed: Impinging or traversing beams of electrons, or even rays of positive particles, or protons, said to be still tinier than electrons, may be changed in the minute structure of their cross-sectional arrangement by the contrasts in density and shape of a group of viruses, in a similar way as a light beam is changed and obtains an information by the specimen in an optical microscope. Then, by subjecting the electron beam to proper electric or magnetic fields of force, arranged around the beam, the electron rays might be influenced to diverge their inherent pattern into a greatly magnified image on a screen. If this scheme should be realizable, both my wife who is a biologist, and myself were convinced toward the end of 1930 that a new tool for penetrating deeper than before into the secrets of nature would be created, not only for visualizing of viruses and other medical germs, but also for the recognition and perception on a new level of observation of the minutest structures and performances in many other areas of science, which never could be revealed by optical microscopes.
Origin and Background of the Invention of the Electron Microscope
183
5. LIMITATIONS OF OPTICAL MICROSCOPES Hence, let us look shortly into the history and evolution of light microscopy. The ancient Chaldeans at 1000 BC had known already the concentration of light rays through a glass or water sphere to a crude burning spot, or focus. More than 2000 years later, towards 1300 AD, glass spectacles80 were invented as a visual aid for aging human eyes. A. van Leeuwenhoek,81 initiating the science of bacteriology, used by 1695 still a simple magnifying glass, ground by himself, although the compound microscope with two lenses in cascade had been disclosed about 1600, some three hundred years later than the spectacles. However, after this invention, all endeavors to usefully enhance the magnification of the compound microscope82 beyond a certain limit, or to increase its resolving power beyond that of the bacterial world, failed completely, in spite of making even finer glass lenses of greatly reduced optical defects. It was nearly three hundred more years until E. Abbe83 in 1873 and H. von Helmholtz84 in 1874 presented the actual reason for this limitation, namely the finite length of light waves, which obviously cannot probe or scan any object smaller than they are themselves. With light waves of the order of 105 cm length and a power of resolution of the human eye near to 102 cm, the limit of useful magnification of any light microscope is about 102/105 ¼ 103, or a thousand times. With oil immersion of the lens or use of ultraviolet light one can go a little further. Any still higher magnification only gives patterns of empty content in which the details are missing. This is the physical reason why virus particles could not be seen by any microscope in 1930. The visible world, even with the help of the most powerful optical microscope, was restricted to phenomena larger than about 105 cm—1/10,000 millimeter—in size. However, electrons are very small corpuscles as shown by Thomson, Wiechert, Kaufmann, and their successors. We may conceive of them by thinking of an electrically charged sphere and steadily reducing its size until it reaches the limit of 1012 cm, and then removing the sphere while leaving the charge. This remainder is the tiny electron, and de Broglie and Schro¨dinger have shown that it acts not only as a particle, but also as a packet of waves of about 109 cm in length. This is 10,000 times smaller than light waves and hence, if we can use electrons as imaging agents we should be able to penetrate that much farther into the minute details of nature.
184
Reinhold Ru¨denberg
6. REALIZATION OF THE ELECTRON MICROSCOPE Back in the winter of 1930 and 1931, while seeking for ways and means to employ electrons as carriers of virus patterns which should be made visible, I was led to believe that electromagnetic fields, if properly devised, might be able to influence the rays of electrons in such a way that they would reproduce on a greatly enlarged scale the intrinsic contrasts imposed previously on the beam by a minute object.
From Ru¨denberg draft April 8, 1946.
In order not to distort during the expansion the pattern now contained in the cross section of the electron beam, it was necessary to arrange for electric forces on the individual electrons, which are strictly in proportion to their distance from the axis.
Origin and Background of the Invention of the Electron Microscope
185
From draft July 3, 1945.
Thus, the intensity of the deflecting electromagnetic field must be proportional to the separation of any point from the beam axis. Many different structures of electric or magnetic lines of force were devised in order to find useful embodiments for this purpose. Finally, circular rings or diaphragms, coaxially arranged around the electron beam and properly charged electrically, were found to give near the center of their openings a ray deflecting field strength very well proportional to the distance from the center. Figure 3 represents such an apertured diaphragm showing by arrows and solid lines the directions and lines of force E developing in the inner space and by dotted lines the paths of electron rays entering this zone from the left. If this electron lens is negatively charged, the electrons are repelled from the outer metallic diaphragm and they all converge under the electric forces toward a common focus F and diverge after passing through this. A positive charge contrariwise will diverge after passing through this. A positive charge contrariwise will diverge the impinging electron beam immediately. It appeared particularly useful to compose such electrostatic electron lenses of two or three parts of opposite charge and arranged
186
Reinhold Ru¨denberg
FIGURE 3
Apertured diaphragm—focusing.
FIGURE 4
Magnetic focusing coil.
symmetrically along the beam, in order to obtain an independent or neutral electron lens which did not influence the axial velocity of the electron beam passing through the lens as a whole. Or, a magnetic coil coaxial to the electron beam may be employed. With this, the force on the electrons is perpendicular to their velocity v and to the magnetic field strength H of the coil. Thus, the direction of force is within the equipotential surfaces of the field, and is again proportional to the separation from the axis. This is represented in Figure 4 where the magnetic field lines are shown solid, the equipotentials dashed, and the electron paths dotted again. Here the electrons in the rays always move radially converging toward the axis, but they also diverge after passing through the focus. Simultaneously the magnetic forces rotate the emerging beam around the axis. It is obvious that these two types of ‘‘electron lenses,’’ the electrostatic and the magnetic ones, correspond to their general action directly to the ‘‘concentration electrodes’’ from Wehnelt via George until Chevallier and to the ‘‘concentration coils’’ from Wiechert via Busch until Gabor. Only that
Origin and Background of the Invention of the Electron Microscope
187
now the condition resulting for the lenses, in order to obtain a highly defined divergence of the electron rays, had to be stressed as an absolute necessity for our problem, in order to expand the beam radially up to 100,000 times and more. This is very different from condensing a beam to a small dot so often done and explored before, as described in the foregoing historical part. Then, the electron rays had been concentrated to a spot without asking for their inherent structure. Now, however, the cross-sectional fine structure of the beam was of main concern and its preservation during a very wide divergence of the rays became the goal of the endeavor. Further, the experiments and the analyses of the former investigators referred to electron rays emanating from a single spot on the axis of coils or diaphragms, which should return to another common spot on the axis. My problem, however, was to allow all the extra-axial rays in the minute cross-sectional pattern of the beam to expand widely from the axis without detrimental distortion of the pattern. This effect had been analyzed mathematically much later by L. Posener,100 working under M. von Laue and J. Picht. My own analysis showed that electron lenses useful for these purposes should have a free aperture circularly surrounded by magnetic or electric material, and that this aperture should be relatively large as compared to the active parts of the electron beam, in order to secure a field strength as proportional as possible to the radius. For, the edges and rims of the material at the contour of the aperture bring about distortions of the field which should be far removed from the center portion of the lenses. Or, the contour of the aperture had to be shaped in a specific way to avoid such distortions, a problem which I had postponed for mathematical solution to a later date. All this is in contrast to some electron lenses later suggested by other designers85 who simulated by net-screens the curved surfaces of optical glass lenses, devices which have not worked properly in electron microscopes. Since now the condition for perfectly imaging electron lenses have been found and since it was apparent that such electric or magnetic lenses could be built with sufficient precision, it became certain that a high magnification in scale of all intrinsic contrasts in a field of viruses or other minute particles would be possible by effect of electron rays, and this enlarged pattern finally could be made visible on a luminescent screen. In order to penetrate into contrasts of sub-microscopic minuteness it appeared favorable to apply several steps of magnification in cascade, similar to the action of the optical compound microscope, since otherwise an excessively long tube* would be necessary for the enlargement here intended. The ultimate resolution of the original object would not
* ‘‘tubus’’ in all drafts of the manuscript.
188
Reinhold Ru¨denberg
be affected hereby, since this depends only upon the size and the wave properties of the individual electrons forming the rays. The marvel is that such electron lenses not only can condense the fire of electrons to a small burning spot or focus, as did the concentration coils and concentration diaphragms of the former explorers, but that, when properly shaped and located, electron lenses will pick up in ultramicroscopic action the physical cross-sectional pattern of any object in the path of their rays, if only the specimen is thin enough to allow a part of the electrons to pass through without too much distortion. Further, that the lens effect then can project the minute contrasts of the pattern into a real electron image in an extended plane where it may be made visible on a fluorescent screen. And finally, that such a projected real electron image can be picked up again by a subsequent electron lens and further be magnified many more times by use of the same process. All this was found out, when I brooded over viruses in the winter of 1930 to 1931. In 1927 E. E. Watson86 had shown that under the repelling forces among themselves of the negative electrons in a beam, these actually do never concentrate into a point, but contract only to a smallest cross-section and expand again thereafter. Thus a sharp optical focus, as known between converging and diverging straight rays of light, will never exist with electrons. Happily enough, this does not disturb the lens effect as used for multiple magnification of images, because the focus proper is not used for this process. Now, to build up a complete electron microscope necessitates to combine a number of the elements described before in their interaction with electrons. This may be shown by means of Figure 5. In an evacuated vessel we provide a source of electrons, usually a high-voltage heated
FIGURE 5
Electron microscope schematic.
Origin and Background of the Invention of the Electron Microscope
189
FIGURE 6 Concentration of spot: illuminator.
cathode and an apertured anode, which defines the direction of the emerging electron beam and thus the axis of the instrument. Then we arrange for a condensing electron lens in order to concentrate the beam on the specimen, or to illuminate our object. An objective electron lens, closely behind the object, captures the minute intrinsic structure of the thin specimen and magnifies this pattern, say 100 times or more, very precisely into an invisible but real electron image in the vacuum space. Then a projection electron lens magnifies this image again, and possibly a further electron lens repeats this image magnification. Thus there will finally be produced on a fluorescent screen a visible image which may be 5000, 10,000, 50,000, or more times the original size of the structure of the object, depending on the strength of the electron lenses. All the lenses following the specimen must be of the most perfect design in order to enable a precise image formation at such magnifications, and thus revealing the finest details of the specimen. A light optical device may be used to further enlarge the image up to a total of 100,000 or even 1,000,000 times with full resolution of the details. If one wants to compare such an electron microscope with the state of knowledge before its invention, the two sketches of Figures 5 and 6 may give an idea of the difference. The previous art to concentrate cathode rays to a tiny spot as in Figure 6 is now used merely to illuminate the specimen to be explored in Figure 5. However, the exploration itself of its finest details by further use of precisely acting electron lenses as just described seemed to me an entirely novel development.
190
Reinhold Ru¨denberg
From draft July 3, 1945.
Many other details useful for perfect operation of an electron microscope were described for the first time in the specifications of several patent applications which had been disclosed in the spring of 1931 to the German Patent Office.
From draft July 3, 1945.
In most countries such patents rest a fairly long time until their publication, and hence the first printing appeared late in 1932 in France. Similarly, the scientific periodicals do not contain these publications, as is very often the case with inventions.
Origin and Background of the Invention of the Electron Microscope
191
Only a short reference87 to my work by myself had been printed in 1932. From a later publication88 in 1943 I am presenting here some excerpts from the respective United States patents containing their scientific contents, so that the reader, without studying the complete specifications, may gain an idea in which way the invention has been described at the time of its disclosure. The first three excerpts from the main U.S. Patent No. 2,058,914 (Figures 7, 8, and 9) show some diagrams and portions of the text in which the general principle of electron microscopy is explained, consisting in the application of coaxial field lenses in order to render an electron beam convergent or divergent, and arranging of such magnetic or electric electron lenses in analogy to optical instruments. Furthermore the quantitative condition of spatial fields for perfect electronic lens formation is presented, namely, proportionality of the radial field strength to the distance from the axis, and the means to attain this. In addition, there is presented the condition for correct image formation of a physical object by electronic converging as well as diverging lenses, to be used for transmitted or for reflected electron rays.
FIGURE 7
General principle of electron microscope.
FIGURE 8 Condition for lens-forming fields.
192
Reinhold Ru¨denberg
FIGURE 9 Condition for image formation. Magnifiers, microscopes and telescopes constructed in this manner enable observations to be made which are impossible with optical investigations and also permit, regarding the order of magnitude, a considerably greater enlargement to be obtained than is possible with optical instruments, whose resolving power is limited by the wave-length of light. This limitation does not exist with magnifiers operating with electron rays, as the wavelength of these rays is several orders of magnitude smaller than that of light.
FIGURE 10 Complete electron microscope with extended limitation of resolution. Legend: 21 ¼ vacuum-tube; 22¼ fluorescent screen; 23 ¼ Lenard window; 24 ¼ object; 29 ¼ vacuum-tube; 30 ¼ hot cathode; 31 ¼ electrostatic stop (electron lens); 32 ¼ Lenard window; 33 and 34 ¼ electrostatic stop (electron lens); 35¼ envelope.
The fourth excerpt (Figure 10) shows a complete electron microscope as revealed in the above patent for magnification of the image in a number of steps by use of several field lenses in cascade, and shows further details for objects which cannot stand the high vacuum. The text explains that the limitation of the resolution with electrons and their waves is extended far beyond the wave length of optical light. The fifth excerpt (Figure 11) shows the combination of electronic and optical magnification, enabling the electron beam to perform its main objective under the most favorable conditions, namely, the resolution of the minute submicroscopic structure of the specimen and projecting it to a pattern visible by light, leaving a subsequent magnification to an optical microscope with its high qualities developed over the centuries for this purpose.
Origin and Background of the Invention of the Electron Microscope
FIGURE 11
193
Multiplication of optical and electron magnification.
FIGURE 12
Design of neutral electrostatic lenses.
Finally, taken from U.S. Patent No. 2,070,319, the sixth excerpt (Figure 12) shows the design of neutral electron lenses by use of several electrodes having two or three coaxial apertures in their stops which by their opposite charges are restricted to a local action in space. They constitute a lumped electron field lens without interfering with the axial velocity of the beam, as is advantageous for a well-controlled design.
7. FATE OF THE INVENTION After having worked out all these details of an electron microscope, many of them quantitatively, in order to be certain not to slip into the realm of fantasy, I informed an old friend and colleague of mine, L. Fischer, at that time head of the patent department of the Siemens firms in Berlin, with whom I had worked confidently through many years, of my invention, in order to learn of his opinion about such a fundamentally new instrument
194
Reinhold Ru¨denberg
and its prospects. He was so enthusiastic about these novel ideas that he suggested for matter of acceleration an application for a patent within the frame of Siemens-Schuckertwerke. This was done by one of their experts on May 29, 1931, with the Patent Office of the German Reich. By a mere chance the same patent expert, C. Avery, prepared a year later after his emigration to this country in a New York patent office* the U.S. applications for this invention so that these contain very well all the viewpoints of my original disclosures, whereas the German patent applications later under the Nazi regime had been considerably watered down. The U.S. patents89a,b were granted in 1936 and 1937, both with German priority of May 30, 1931. Corresponding patents have been granted in France89c,d in 1932, in England89e in 1933, in Switzerland89f in 1934, and in Austria89g also in 1934, all of them with the priority of the original application as just named.{ The German patents89h1-5,i1-5 themselves have been greatly retarded, due in part to Nazi measures, to the Second World War, and to its aftermath, and were finally published in unduly subdivided shape from 1950 to 1954. Strangely enough, they had been delayed so long that legally they were expiring just after they had been granted. I had prepared an early description in a scientific journal of the new principles incorporated in the electron microscope on the basis of my disclosures. However, the executives of the Siemens firms objected to such a publication.
From draft April 8, 1946. 87
Only a short note without any details was conceded in 1932. One of their reasons apparently was that they did not want to be connected with a
* Curt Abraham (Avery) oversaw the Ru¨denberg U.S. patent applications from the Siemens and Halske patent office, but they were executed by the Knight Brothers firm in New York. Abraham emigrated from Germany in February, 1937. { U.S. patent 2,058,914 was based on five related applications with the German priority dates May 30, June 26, and June 27, 1931, and two on March 30, 1932, but according to convention the published patent carried the first priority date. The corresponding French, Swiss, and Austrian patents were published listing all five German priority dates.
Origin and Background of the Invention of the Electron Microscope
195
mere fantasy. This was also indicated by the refusal of any further development of the invention in their appertaining departments. Under the reign of the Nazi philosophy at that time I could only yield to their decisions. In the following years, particularly during the hazardous preparation of my emigration from Germany in 1936 and the subsequent settlement of my family first in England and then in America, I could not work on the further development of the electron microscope, since in those trying times quite different obligations were predominant for me. It was only toward 1947 when I assisted an industrial firm to develop their electron microscope that I could return to such work,90 primarily by detailed investigations on the effects of electromagnetic fields in and about electron lenses, the subject which I had postponed in the former period. The reaction under the Nazi regime with respect to my early work had been clearly shown by some publications91a-d during that period in German scientific journals about the history of the electron microscope, in all of which my contribution to this field was not even mentioned but suppressed, although it was well known to the authors. It seems that the first objective recognition in Germany of my inventorship has been published only long after the Nazi time in a historical survey by D. Gabor92 in 1957. In this country, on the other hand, the Stevens Institute of Technology93 in Hoboken, New Jersey, awarded me in 1946 an Honor medallion for this invention ‘‘outlining both the electrostatic and the electromagnetic lens electron microscope.’’ In the beginning of the Second World War the U.S. patents had been vested in the Alien Property Custodian from the Siemens-Schuckertwerke in Germany in whose name they had been previously issued. However, by decision of the Federal Court of Boston94a,b in 1947 the patents were restored to me since on the basis of their origin and disclosure they were considered a free invention and therefore my property from the very outset. After this, the industrial firms then making or selling electron microscopes in this country secured licenses on the invention. The reader might ask for the fate of my son whose illness gave rise to the chain of thoughts described before. He recovered reasonably well after we had treated him in accord with our medical advisers without any plaster cast as was conventional at the time, but in the beginning with high-frequency diathermy and for many years after this by massage with gymnastics and resistance exercises.
8. BUILDING OF THE INSTRUMENT The construction and application of present-day electron microscopes were brought about by a great number of developers, most of them working independently in the beginning. It seems, in retrospect, that the time was ripe for the idea to employ electrons instead of light for magnification.
196
Reinhold Ru¨denberg
The first group at the Institute of Technology in Berlin, namely, M. Knoll with E. Ruska,95a,b used cathode ray oscilloscopes at hand, modifying them for the new purpose. Others, inspired by such success, approached the building of electronic enlargers along different lines. The efficiency of these tenacious practical workers will always be highly respected. We owe to them a design of the electron microscope which has been copied nearly everywhere. They have deposed their results in numerous papers and patents. When their original dates and data of their disclosures had been published,91c it became evident that their earliest endeavors had been revealed a short time after I had disclosed my invention89 as described above. M. Knoll95a,b with E. Ruska in 1932, and later E. Bru¨che96a with H. Johannson96b also in 1932, described their first apparatus in Germany, the former succeeding with magnetic lenses in a ca. 15-fold magnification of metallic contours and wire screens at the anode and other stops, the latter observing with electric lenses the emission structure of the cathode surface. In 1934 E. Ruska97 surpassed for the first time with an actual electron microscope the resolution limit of optical microscopes. Also in 1934 L. Marton98 in Belgium succeeded in magnification of biological material, showing how this can be made to withstand the vacuum and the heating by the electron beam. Analytically, C. J. Davisson99a,b with C. J. Calbick in 1931 had attempted to derive, and in 1932 succeeded to publish, a formulation for the focal length of an electrostatic single-aperture lens as originated much earlier by Wehnelt and Lilienfeld. When L. Posener100 in 1933 analyzed the imaging of extra-axial points by electron lenses, O. Scherzer,101 W. Glaser,102a-c and E. Bru¨che,103 all from 1933 on, promoted greatly the finer theory of geometrical electron optics and lens defects. L. C. Martin104 with R. V. Whelpton described their first instrument in England in 1937, A. Prebus with J. Hillier105 did the same in Canada in 1939, and M. Siegbahn106 also in 1939 in Sweden, all of these using magnetic lenses. H. Mahl107 showed, however, in 1939 that electrostatic lenses in his microscope worked just as well, and he was followed in 1940 by M. von Ardenne,108a,b both in Germany, with a universal instrument attaining the highest resolution of that time. P. Grivet109a,b with H. Bruck in 1946 built instruments with electrostatic lenses in France, and J. B. Le Poole110 in 1947 in Holland came back to magnetic lenses. Each of these designers added a number of fine details in his own way to the same basic plan of the electron microscope. Commercial instruments were brought out through the endeavors of B. von Borries with E. Ruska111 in 1939 by Siemens and Halske in Germany, and by the work of L. Marton, J. Hillier, and V. K. Zworykin,112 the Radio Corporation of America had developed theirs in 1941 in
Origin and Background of the Invention of the Electron Microscope
197
the United States. They were followed by the Metrovick Electrical Company113 in 1947 in England, and by the Philips Lamp Company114 in 1950 in Holland. All these instruments were built with two or three magnetic magnifying lenses, whereas Carl Zeiss115 in Germany in 1951 and Farrand Optical Co.116 in the U.S.A. in 1952 developed commercial instruments with electrostatic lenses. Also in Switzerland, in Russia, and in Japan electron microscopes have been built since about 1940. All these instruments followed the lines as disclosed in my original patents. An important progress in the quality of electron lenses was initiated in 1946, when J. Hillier with E. G. Ramberg117 showed that the main cause of lens defects at that time had been a very slight departure of the contour of the aperture of the lenses from an exact circle, remaining even with the best manufacturing processes. Consequently they devised effective means for compensation of these errors, which greatly improved the resolving power of the electron microscope. Successively a number of major achievements were made in the preparation of the specimen, adapting it to the new technique of viewing and magnifying in the vacuum of the electron microscope. H. Mahl118 introduced in 1940 the replica method using a thin transparent film to copy the surface structure of opaque materials which could not be observed directly. This initiated the use of electron microscopy in the metallographic and related fields. R. C. Williams119 with R. W. G. Wyckoff developed in 1944 the method of shadow-casting of relief surfaces by metal vapor, thereby greatly enhancing the contrasts in hardly visible thin structures. Mainly by use of proper embedding materials, by appropriate staining methods, and by cutting specimen sections adequate in thinness to the power of resolution inherent in the electron microscope, further vast improvements in the observation technique have been achieved. At this point we will break off the early history of the electron microscopes, which now are widely used in all industrially and scientifically interested countries throughout the world. They have led to many new insights and discoveries in the minute behavior of nature in physics and chemistry, in biology and medicine, in metallurgy, and numerous other fields of the natural sciences. While viruses of many types had been recognized in the electron microscope soon after its inception, just the polio virus resisted for many years a clear identification. Only in recent years its shape and arrangement could be unveiled due to progress in the technique of its culture. In Figure 13 this virus is employed in order to give at least one actual example of the resolving power of the electron microscope by an image which recently was made by R. C. Williams120 in the Virus Laboratory of W. M. Stanley at the University of California. The magnification is 100,000 times and shows in addition to many individual particles a big cluster of
198
FIGURE 13 Biol.
Reinhold Ru¨denberg
Virus image (Williams/Stanley120). Reprinted with permission of the Soc. Exp.
viruses constituting a regular array of the same character as is usually built up by molecules to form a crystal. This similarity between living viruses and inanimate molecules is one of the spectacular achievements demonstrated by the electron microscope.
REFERENCES 1a. J. Hadamard, An Essay on the Psychology of Invention in the Mathematical Field, Princeton University Press, Princeton, 1945. 1b. K.R. Popper, The Logic of Scientific Discovery, Basic Books, New York, 1959. 2. J. Plu¨cker, Fortgesetzte Beobachtungen u¨ber die elektrische Entladung in gasverdu¨nnten Ra¨umen, Ann. Physik Chem. 183 (1859) 77–113. 3. W. Hittorf, Ueber die Elektricita¨tsleitung der Gase, Ann. Physik 212 (1869) 1–31 ,197–234. 4. W. Crookes, On the illumination of lines of molecular pressure, and the trajectory of molecules, Phil. Mag. 7 (1879) 57–64. 5a. E. Riecke, Ueber die Bewegung eines elektrischen Theilchens in einem homogenen magnetischen Felde und das negative elektrische Glimmlicht, Ann. Physik 249 (1881) 191–194. 5b. E. Riecke, Bewegung eines elektrischen Teilchens in einem Felde elektrostatischer und elektromagnetischer Kraft, Ann. Physik 309 (1901) 378–387. 5c. E. Riecke, Zur Bewegung eines elektrischen Teilchens in einem constanten elektromagnetischen Felde, Ann. Physik 312 (1902) 401–407. 6. A. Schuster, The discharge of electricity through gases, Proc. Roy. Soc. 47 (1890) 526–561. 7. P. Lenard, Ueber Kathodenstrahlen in Gasen von atmospha¨rischem Druck und im a¨ussersten Vakuum, Ann. Physik Chem. 287 (1894) 225–267. 8. W. Ro¨ntgen, Ueber eine neue Art von Strahlen I. & II. Sitzungsber. d. Wu¨rzburger, Physik.-Med. Gesellsch. 18 (I)) (1895-6) 137–141; (II), 11–16, 17–19.
Origin and Background of the Invention of the Electron Microscope
199
9. J.J. Thomson, Cathode rays, Phil. Mag. 44 (1897) 293–316. ¨ ber das Wesen der Elektricita¨t; II. Experimentelles u¨ber die Kathoden10. E. Wiechert, U strahlen, Sitzungsber. phys.-oekon. Gesellsch. Ko¨nigsberg 38 (1897) 3–12 ,12–16. 11a. W. Kaufmann, Die magnetische Ablenkbarkeit der Kathodenstrahlen und ihre Abha¨ngigkeit vom Entladungspotential, Ann. Physik 297 (1897) 544–552. 11b. W. Kaufmann, E. Aschkinass, Ueber die Deflexion der Kathodenstrahlen, Ann. Physik Chem. 298 (1897) 588–595. 11c. W. Kaufmann, Nachtrag zu der Abhandlung: ‘‘Die Magnetische Ablenkbarkeit der Kathodenstrahlen etc.’’ Ann. Physik Chem. 298 (1897) 596–598. 12. F. Braun, Ueber ein Verfahren zur Demonstration und zum Studium des zeitlichen Verlaufes variabler Stro¨me, Ann. Physik 296 (1897) 552–559. 13. H. Ebert, Das Verhalten der Kathodenstrahlen in electrischen Wechselfeldern, Ann. Physik 300 (1898) 240–261. 14. E. Wiechert, Experimentelle Untersuchungen u¨ber die Geschwindigkeit und die magnetische Ablenkbarkeit der Kathodenstrahlen, Ann. Physik 305 (1899) 739–766. 15. R. Rankin, The cathode-ray oscillograph, Electric Club J. 2 (1905) 620–631. 16. T.A. Edison, E.J. Houston, Trans. Am. Inst. Elec. Eng. 1 (1884) 1. [Editorial note. No paper by Edison and Houston is present in the early volumes of these Transactions. However, a paper by Houston fits Ru¨denberg’s description very closely: Houston, E.J. (1884). Notes on phenomena in incandescent lamps. Trans. Am. Inst. Elec Eng., i, Paper No. 2, 8 pp.]. 17. J.A. Fleming, On electric discharge between electrodes at different temperatures in air and in high vacua, Proc. Roy. Soc. 47 (1890) 118–126. 18. O.W. Richardson, Negative radiation from hot platinum, Proc. Cambridge Philos. Soc. 11 (1901) 286–295. ¨ ber den Austritt negativer Ionen aus glu¨henden Metallverbindungen 19. A. Wehnelt, U und damit zusammenha¨ngende Erscheinungen, Ann. Physik 319 (1904) 425–468. 20a. L. de Forest, Device for amplifying feeble electrical currents, (1906). U.S. patent 841,387, filed 1906. 20b. L. de Forest, Space telegraphy, (1907). U.S. patent 879,532, filed 1907. 21. R. von Lieben, Kathodenstrahlenrelais, (1906). German patent 179,807, filed 1906. 22a. M. von Laue, W. Friedrich, P. Knipping, Interferenzerscheinungen bei Ro¨ntgenstrahlen, Mu¨nchener Sitzungsber. (1912) 303–322 and 363. 22b. M. von Laue, Eine quantitative Pru¨fung der Theorie fu¨r die Interferenzerscheinungen bei Ro¨ntgenstrahlen, Ann. Physik 346 (1913) 989–1002. 22c. W. Friedrich, P. Knipping, M. von Laue, Interferenzerscheinungen bei Ro¨ntgenstrahlen, Ann. Phys. 346 (1913) 971–988. 23. W.H. Bragg, W.L. Bragg, The reflection of x-rays by crystals, Proc. Roy. Soc. A88 (1913) 428–438, and A89, 246–248. 24. W. Westphal, Potentialmessungen an glu¨henden Oxydkathoden Verh, Dtsch. Phys. Ges. 6 (1908) 401–405. 25. J.E. Lilienfeld, Oszillographenroehre, (1915). German patent 373,834 of 1915, published 1923. 26. J.B. Johanson, A low-voltage cathode ray oscillograph, J. Opt. Soc. Am. 6 (701) (1922). 27. L.T. Jones, H.G. Tasker, A thermionic Braun tube with electrostatic focusing, J. Opt. Soc. Am. 9 (1924) 471–475. ¨ ber einen lichtstarken Glu¨hkathodenoszillographen fu¨r 28. W. Rogowski, W. Gro¨sser, U Außenaufnahme rasch verlaufender Vorga¨nge, Arch. Electrotech. 15 (1925) 377–384. 29. R.H. George, Trans. Am. Inst. Elec Eng. 48 (1929) 884. [A new type of hot cathode oscillograph and its application to the automatic recording of lightning and switching urges. J. Am. Inst. Elec Eng. 48, 534–538]. 30. C.A. Sabbah, Space discharge apparatus, (1926). U.S. patent 1,927,807 of 1926, publ. 1933.
200
Reinhold Ru¨denberg
31. M. von Ardenne, Die Braunsche Ro¨hre als Fernsehempfa¨nger, Fernsehen 1 (1930) 193–302. 32a. V.K. Zworykin, Television system, (1930). U.S. patent 2,084,364 of 1930, published 1937. 32b. V.K. Zworykin, Electron multiplier, (1930). U.S. patent 2,249,552 of 1930, published 1941. 33. P.E.L. Chevallier, Kinescope, (1930). U.S. patents 2,021,252 and 2,021,253 of 1930, published 1935. ¨ ber ein Braunsches Rohr mit Glu¨hkathode und einige Anwendungen 34. C. Samson, U desselben, Ann. Physik 55 (1918) 608–632. 35. H. Behnken, Untersuchungen an Hochspannungstransformatoren mit dem Braunschen Rohr Arch, Elektrotech. 11 (1922) 131–139. 36. D. Gabor, Oszillographieren von Wanderwellen mit dem Kathoden-Oszillographen. Forschungshefte Studienges. Ho¨chstspannungsanlagen No.1 (1927) 7–46. 37. E. Ruska, M. Knoll, Die magnetische Sammelspule fu¨r schnelle Elektronenstrahlen, Z. Tech. Phys. 12 (1931) 389–400, 448. 38. K. Schwarzschild, [Two forms of the principle of least action in the theory of electrons. The elementary electrodynamic force], Nachr. K. Gesell. Wiss. Go¨ttingen, Math.-Phys. Klasse 3 (1903) 126–131. ¨ ber einen lichtstarken Glu¨hkathodenoszillographen fu¨r 39. W. Rogowski, W. Gro¨sser, U Außenaufnahme rasch verlaufender Vorga¨nge, Arch. Elektrotech. 15 (1925) 377–384. 40a. H. Busch, Berechnung der Bahn von Kathodenstrahlen im axialsymmetrischen elektromagnetischen Felde, Ann. Physik 386 (1926) 974–993. ¨ ber die Wirkunsweise der Konzentrierungsspule bei der Braunschen 40b. H. Busch, U Ro¨hre, Arch. Elektrotech. 18 (1927) 583–594. 41. H. Busch, Grundlagen und Entwicklung der Elektronenoptik, Z. Techn. Phys. 17 (1936) 584–588. 42. M. Knoll, Vorrichtung zur Konzentrierung des Elektronenstrahls eines Kathodenstrahloszillographen, (1929). German patent 690,809 of 1929, published 1940. 43. A. Dufour, Sur une oscillographe cathodique, C. R. Acad. Sci. Paris 158 (1914) 1339–1341. 44. J. Fallou, Nouvelle contribution expe´rimentale a` l’e´tude des surtensions dans les transformateur, Bull. Soc. Franc¸. Electr. 6 (1926) 1245–1269. 45. D.A. Keys, A piezoelectric method of measuring explosion pressures, Phil. Mag. 42 (1921) 474–488. 46. A.B. Wood, The cathode-ray oscillograph, Proc. Phys. Soc. 35 (1923) 109–124. 47. N.V. Kipping, Investigations with the cathode-ray oscillograph, Wireless World 13 (1923/4) 309, 705. 48. J.T. MacGregor-Morris, R. R. Mines, Measurements in electrical engineering by means of kathode rays, J. Instn. Electr. Engr. 63 (1925) 1056–1107. 49. H. Norinder, Katodstra˚lro¨rets anva¨ndning som ho¨gfrekvensoscillograf sa¨rskilt fo¨r underso¨kning av vandringsva˚gor, Teknisk Tidskrift 55 (1925) 152–157. 50. K. Berger, Ueber die Weiterentwicklung des Kathodenstrahl-Oszillographen von Dufour zur Ermo¨glichung der Aufnahme von Gewittererscheinungen, sowie anderer Vorga¨nge ku¨rzester Dauer and Der Kathodenstrahl-Oszillograph als Registrierinstrument, speziell fu¨r raschverlaufende Vorga¨nge, Bull. Schweiz. Elektrotech. Verein 19 (1928) 292–301, 688–694. 51. H.J. Ryan, Cathode-ray wave tracer, Trans. Am. Inst. Elec. Engr. 20 (1903) 1417–1430. 52. E.L. Chaffee, New method of impact excitation, Proc. Am. Acad. Arts Sci. 47 (1911) 267–312. 53. F. Bedell, H.J. Reich, Trans. Am Inst. Elec. Engr. 46 (1927) 546. [The oscilloscope: a stabilized kathode-ray oscillograph with linear time-axis. J. Am Inst. Elec. Engr. 46, 563–567].
Origin and Background of the Invention of the Electron Microscope
201
54. E.S. Lee, Kathode-ray oscillographs and their uses, General Electric Rev. 31 (1928) 404–412. 55. K.B. McEachron, Trans. Am. Inst. Elec Engr. 48 1929, p. 953; [Kathode-ray oscillograph studies of artificial lightning surges. J. Am Inst. Elec. Engr. 46, 374–378]. 56. J. Zenneck, Eine Methode zur Demonstration und Photographie von Stromcurven, Ann. Physik 305 (1899) 838–853. ¨ ber die Dynamik der Lichtbogenvorga¨nge und u¨ber Lichtbogenhyster57. H. Th Simon, U esis, Physik. Z. 6 (1905) 297–319. 58. E. Madelung, Neue Verwendungsarten der Braunschen Ro¨hre zur Untersuchung der magnetischen und dielektrischen Hysteresis, Physik. Z. 8 (1907) 72–75. 59a. W. Rogowski, E. Flegler, Neue Vorschla¨ge zur Verbesserung des KathodenstrahlOszillographen, Arch. Elektrotech 9 (1920) 115–120; and Einige Versuche mit einem verbesserten Kathodenstrahl-Oszillographen, ibid., 120–126. 59b. W. Rogowski, E. Flegler, Die Wanderwelle nach Aufnahmen mit dem Kathodenoszillographen, Arch. Elektrotech. 14 (1925) 529–530. 59c. W. Rogowski, E. Flegler, Eine neue Bauart des Kathodenoszillographen, Arch. Elektrotech. 18 (1927) 513–524. 60a. D. Gabor, Oszillographieren von Wanderwellen, Arch. Elektrotech. 16 (1926) 296–302. 60b. D. Gabor, Fortschritte im Oszillographieren von Wanderwellen, Arch. Elektrotech. 18 (1927) 48–55. 61. A. Matthias, M. Knoll, H. Knoblauch, Kathodenstrahloszillographen liegender Bauart, Z. Techn. Phys. 11 (1930) 276–282. 62a. M. Knoll, Vakuumtechnishe Neuerungen an Kathodenoszillographen, Z. Techn. Phys. 10 (1929) 294–299. 62b. M. Knoll, H. Knoblauch, B. von Borries, Fortschritte am Kathodenstrahloszillographen durch Dauerbetrieb mit Metallentladungsro¨hren und durch Außenphotographie sehr kurzzeitiger Vorga¨nge, Elektrotech. Z. 51 (1930) 966–970. 63a. W. Krug, Das Verhalten von Stoßschaltungen nach Aufnahmen mit dem Kathodenstrahl-Oszillographen, Elektrotech. Z. 50 (1929) 681–685. 63b. W. Krug, Eine Sprungschaltung fu¨r Sperr- und Zeitkreise fu¨r Kathodenstrahl-Oszillographen, Elektrotech. Z. 51 (1930) 605–609. 64a. R. Ru¨denberg, Oszillograph zur Aufnahme schnell veraenderlicher Erscheinungen, (1924) German patent 429,926 of 1924, published 1926. 64b. R. Ru¨denberg, Oscillograph, (1924). U.S. patent 1,695,719 of 1924, published 1928. 65. E. Alberti, Braun’sche Kathodenstrahlro¨hren und ihre Anwendung, Springer, Berlin, 1932. 66a. L. de Broglie, Recherches sur la The´orie des Quanta, Thesis, Masson, Paris, 1924 . 66b. L. de Broglie, Recherches sur la the´orie des quanta, Ann. Physique 3 (1925) 22–128. 67. H. Schro¨dinger, Quantisierung als Eigenwertproblem, Ann. Physik 384 (1926) 361–376 489–527, and 386, 109–139. 68. C.J. Davisson, L.H. Germer, Diffraction of electrons by a nickel crystal, Phys. Rev. 30 (1927) 705–740. 69. G.P. Thomson, Experiments on the diffraction of cathode rays, Proc. Roy. Soc. A117 (1928) 600–609, and A119, 651–663. 70. H. Stintzing, Verfahren und Einrichtung zum automatischen Nachweis, Messung und Zahlung von Einzelteilchen beliebiger Art, Form und Gro¨ße, (1927) German patent 485,155 of 1927, published 1929. 71. E. Bru¨che, Strahlen langsamer Elektronen und ihre technische Anwendung, in: W. Petersen (Ed.), Forschung und Technik, Springer, Berlin, 1930, p. 29. 72. R. Ru¨denberg, Energie der Wirbelstro¨me in elektrischen Bremsen und Dynamomaschinen, Enke, Stuttgart, 1906.
202
Reinhold Ru¨denberg
73a. R. Ru¨denberg, Proce´de´ et dispositif pour la regulation des transformateurs de fre´quence avec excitation par le stator, (1909) French patent 412,767 of 1909, published 1910 and German Gebrauchs-Muster 450,489 of 1910. 73b1. R. Ru¨denberg, (1913). U.S. patent 1,285,398 of 1913, published 1918. 73b2. R. Ru¨denberg, A.A. Zehrung, Cooling dynamos, (1915). U.S. patent 1,170,192 of 1915, published 1916. 73c. R. Ru¨denberg, Built for Rein. [sic] Westf, Elektr. Werk, Cologne, 1916. 73d. R. Ru¨denberg, Asynchronmotoren mit Selbstanlauf durch tertia¨re Wirbelstro¨me, Elektrotech. Z. 39 (1918) 483–486, 493–495, and 501–504. 73e. R. Ru¨denberg, Verseilter Hochspannungsleiter, (1919). German patent 398,516 of 1919, published 1924. 73f. R. Ru¨denberg, Leitungsanordnung fu¨r Betriebstelephonie auf Starkstromlinien, (1919). German patent 380,307 of 1919, published 1923. 73g1. R. Ru¨denberg, Anordnung zur Regelung von Wechselstromfernleitungen, (1928). German patent 592,309 of 1928, published 1934. 73g2. R. Ru¨denberg, Alternating-current long-distance line, (1928). U.S. patent 1,947,061 of 1928, published 1934. 74a. R. Ru¨denberg, Elektrische Schaltvorga¨nge und verwandte Sto¨rungserscheinungen in Starkstromanlagen, Springer, Berlin, 1923 [4th edition, 1953]. 74b. R. Ru¨denberg, Transient Performance of Electric Power Systems, McGraw Hill, New York, 1950 [later edition, MIT Press, Cambridge, MA, and London, 1970]. 75. R. Ru¨denberg, List of publications, Elektrotech. Z. 79A (1958) 99–100. ¨ ber die Fortpflanzungsgeschwindigkeit und Impulssa¨rke von 76a. R. Ru¨denberg, U Verdichtungssto¨ßen, Artillerist. Monatshefte 10 (113-114) (1916) 237–316. ¨ ber den Entwurf technischer Modellreihen, Z. Verein Deutsch. Ingen. 76b. R. Ru¨denberg, U 62 (1918) 406–412. 76c. R. Ru¨denberg, Die Ausbreitung der Erdstro¨me in der Umgebung von Wechselstromleitungen, Z. Angew. Math. Mech. 5 (1925) 361–389. 76d. R. Ru¨denberg, Das Verhalten elektrischer Kraftwerke und Netze beim Zusammenschloß, Elektrotech. Z. 50 (1929) 970–984. 76e. R. Ru¨denberg, Thermoelectric apparatus, (1926). U.S. patent 1,717,413 of 1926, published 1929. 76f. R. Ru¨denberg, Die Beurteilung elektrischer Maschinen und Apparate durch Toleranzen auf Grund statistischer Methoden, Z. Angew. Math. Mech. 9 (1929) 318–334. 77. R. Ru¨denberg, Eine Methode zur Erzeugung von Wechselstro¨men beliebiger Periodenzahl, Physik. Z. 8 (1907) 668–672. 78. R. Ru¨denberg, Der Empfang elektrischer Wellen in der drahtlosen Telegraphie, Ann. Physik 330 (1908) 446–466. 79a. B.A. Houssay, The discovery of pancreatic diabetes: the role of Oskar Minkowski, Diabetes 1 (1952) 112–116. 79b. M. Nothmann, Oskar Minkowski (1858–1931), N. Engl. J. Med. 259 (1958) 1276–1277. 79c. M. Nothman, Gedenktage. Oskar Minkowski 1858–1931, Deutsche Medizin. Wochenschr. 84 (120) (1959). 80. Spectacles invented ca. 1300, Encyclopaedia Britannica. ‘‘Light History,’’ par. 6. 81. A. van Leeuwenhoek, Arcana Naturae Detecta., Delphis Batavorum, 1695. 82. Compound microscope invented ca. 1600, Encyclopaedia Britannica, ‘‘Light History,’’ par. 7. 83. E. Abbe, Beitra¨ge zur Theorie des Mikroskops und der mikroskopisches Wahrnehmung, Arch. Mikrosk. Anat. 9 (1873) 413–468. 84. H. von Helmholtz, Die theoretische Grenze fu¨r die Leistungsfa¨higkeit der Mikroskope, Ann. Physik Jubelband (1874) 557–584.
Origin and Background of the Invention of the Electron Microscope
203
85. M. Knoll, E. Ruska, Beitrag zur geometrischen Elektronenoptik. I, II., Ann. Physik 404 (1932) 607–640, 641–661. 86. E.E. Watson, Dispersion of an electron beam, Phil. Mag. 3 (1927) 849–857. 87. R. Ru¨denberg, Elektronenmikroskop, Naturwissenschaften 20 (1932) 552. 88. R. Ru¨denberg, The early history of the electron microscope, J. Appl. Phys. 14 (1943) 434–436. 89a. R. Ru¨denberg, Apparatus for producing images of objects, (1931). U.S. patent 2,058,914 of May 30, June 26 and 27, 1931, and March 30, 1932, published Oct. 27, 1936. 89b. R. Ru¨denberg, Apparatus for influencing the character of electron rays, (1931). U.S. patent 2,070,319 of May 30, 1931, published Feb. 9, 1937. 89c. R. Ru¨denberg, Dispositif pour obtenir des images d’objets, (1931). French patent 737,716 of May 30, June 26 and 27, 1931, and March 30, 1932, published Dec. 15, 1932. 89d. R. Ru¨denberg, Disposition pour influencer la nature de rayons e´lectroniques, (1931). French patent 737,816 of May 30, 1931, published Dec. 16. 1932. 89e. R. Ru¨denberg, Improvements in or relating to cathode-ray tubes, (1931). British patent 402.781 of May 30, 1931, published Nov. 30, 1933. 89f. R. Ru¨denberg, Einrichtung zum Abbilden von Gegensta¨nden, (1931). Swiss patent 165,549 of May 30, June 26 and 27, 1931, and March 30, 1932, published April 2, 1934. 89g. R. Ru¨denberg, Einrichtung zum Abbilden von Gegensta¨nden, (1931). Austrian patent 137,611 of May 30, June 26 and 27, 1931, and March 30, 1932, published May 25, 1934. 89h1. R. Ru¨denberg, Anordnung zur vergro¨ßerten Abbildung von Gegensta¨nden mittels Elektronenstrahlen und mittels den Gang der Elektronenstrahlen beeinflussender elektrostatischer oder elektromagnetischer Felder, (1931). German patent 895,635 of May 30, 1931, published 1953. 89h2. R. Ru¨denberg, Anordnung zum vergro¨ßerten Abbildung von Gegensta¨nden mittels Elektronenstrahlen, (1931). German patent 906,737, of May 30, 1931, published 1954. 89h3. R. Ru¨denberg, Anordnung zur Beeinflussung des Verlaufs von Elektronenstrahlen durch elektrisch geladene Feldblenden, (1931). German patent 889,660, of May 30, 1931, published 1953. 89h4. R. Ru¨denberg, Anordnung zur Steuerung der Intensita¨t der durch eine Lochblende hindurchtretenden Elektronenstrahlen, (1931). German patent 754,259 of May 30, 1931, granted 1942, not published. 89h5. R. Ru¨denberg, Anordnung zur Beeinflussung des konvergenten Verlaufs von Elektronenstrahlen durch Platten-oder ringfo¨rmige elektrisch geladene Feldblende, (1931). German patent 758,391 of May 30, 1931, granted 1939, not published. 89i1. R. Ru¨denberg, Einrichtung zum Abbilden von Gegensta¨nden, (1931). German patent 911,996 of 1931, published 1954. 89i2. R. Ru¨denberg, Einrichtung zum vergro¨ßerten Abbilden von Gegensta¨nden durch Elektronenstrahlen, (1931). German patent 916,838 of 1931, published 1954. 89i3. R. Ru¨denberg, Einrichtung zum vergro¨ßerten Abbilden von Gegensta¨nden durch Elektronenstrahlen, (1932). German patent 916,839 of 1932, published 1954. 89i4. R. Ru¨denberg, Einrichtung zum vergro¨ßerten Abbilden von Gegensta¨nden durch Elektronenstrahlen (Elektronenmikroskop)., (1932). German patent 916,841 of 1932, published 1954. 89i5. R. Ru¨denberg, Anordnung zum Beeinflussen des Characters von Elektronenstrahlen durch elektrostatisch aufgeladene Doppelblenden, (1932). German patent 915,253 of 1932, published 1954. 90. R. Ru¨denberg, Electron lenses of hyperbolic field structure, J. Franklin Inst 246 (1948) 311–339, 377–408. 91a. A. Mathias, Bemerkung zur Entstehung des Elektronenmikroskops, Physik. Z. 43 (1942) 129–130. 91b. E. Bru¨che, Zum Entstehen des Elektronenmikroskops, Physik. Z. 44 (1943) 176–180.
204
Reinhold Ru¨denberg
91c. K. Ku¨pfmu¨ller, Zur Geschichte des Elektronenmikroskops, Physik. Z. 45 (1944) 47–51. 91d. B. von Borries, E. Ruska, Neue Beitra¨ge zur Entwicklungsgeschichte der Elektronen¨ bermikroscopie, Physik. Z. 45 (1944) 314–326. mikroskopie und der U 92. D. Gabor, Die Entwicklungsgeschichte des Elektronenmikroskops, Elektrotech. Z. A78 (1957) 522–530. 93. ‘‘Here and there’’, Reinhold Rudenberg has received the Honor Award of Stevens Institute of Technology ‘‘for notable achievement in the field of electron optics as the inventor of the electron microscope.’’ J. Appl. Phys. 17 (1946) 408. 94a. Civil Action No. 3873 of June 24, Rudenberg v. Clark, National Archives & Records Administration, Waltham, MA, 1947. 94b. C. Wyzanski, Federal Court of Boston, Fed. Supp. 72 (1947) 381. 95a. M. Knoll, E. Ruska, Beitrag zur geometrischen Elektronenoptik, I, II, Ann. Physik 404 (1932) 607–640, 641–661. 95b. M. Knoll, E. Ruska, Das Elektronenmikroskop, Z. Physik 78 (1932) 318–339. 96a. E. Bru¨che, Elektronenmikroskop, Naturwissenschaften 20 (1932) 49, and 353. 96b. E. Bru¨che, H. Johannson, Elektronenoptik und Elektronenmikroskop, Naturwissenschaften 20 (1932) 353–358. ¨ ber Fortschritte im Bau und in der Leistung des magnetischen Elektro97. E. Ruska, U nenmikroskops, Z. Physik 87 (1934) 580–602. 98. L. Marton, La microscopie e´lectronique des objest biologiques, Bull. Cl. Sci. Acad. Roy Belgique 20 (1934) 439–446. 99a. C.J. Davisson, C.J. Calbick, Electron lenses, Phys. Rev. 38 (1931) 585. 99b. C.J. Davisson, C.J. Calbick, Electron lenses, Phys. Rev. 42 (1932) 580. 100. L. Posener, Zur Theorie des Elektronenmikroskops, Z. Physik 80 (1933) 813–818. 101. O. Scherzer, Zur Theorie der elektronenoptischen Linsenfehler, Z. Physik 80 (1933) 193–202. ¨ ber geometrisch-optische Abbildung durch Elektronenstrahlen, Z. Physik 102a. W. Glaser, U 80 (1933) 451–464. 102b. W. Glaser, Zur geometrischen Elektronenoptik des axialsymmetrischen elektromagnetischen Feldes, Z. Physik 81 (1933) 647–686. 102c. W. Glaser, Theorie des Elektronenmikroskopes, Z. Physik 83 (1933) 104–122. 103. E. Bru¨che, O. Scherzer, Geometrische Elektronenoptik, Springer, Berlin, 1934. 104. L.C. Martin, R.V. Whelpton, D.H. Parnum, A new electron microscope, J. Sci. Instrum. 14 (1937) 14–24. 105. A. Prebus, J. Hillier, The constrution of a magnetic electron microscope for high resolving power, Can. J. Res. 17A (1939) 49–63. 106. M. Siegbahn, Vetenskapsakadamiens Forskningsinstitut fo¨r Experimentell Fysik, Nordisk Familjeboks Manedskro¨na 2 (1939) 571–576. ¨ ber das elektrostatische Elektronenmikroskop hoher Auflo¨sung, Z. Techn. 107. M. Mahl, U Phys. 20 (1939) 316–317. ¨ ber ein Universal-Elektronenmikroskop fu¨r Hellfeld-, Dunkelfeld108a. M. von Ardenne, U und Stereobild-Betrieb, Z. Physik 115 (1940) 339–368. 108b. M. von Ardenne, Abbildung feinster Einzelteilchen insbesondere von Molekulen mit dem Universal-Elektronenmikroskop, Z. Physik. Chem. 187A (1940) 1–12. 109a. P. Grivet, H. Bruck, Le microscope e´lectronique e´lectrostatique, Ann. Radioe´lec. 1 (1946) 293–310. 109b. P. Grivet, The French electrostatic microscope, Ann. Radioe´lec. 3 (1948) 144–145. 110. J.B. Le Poole, A new electron microscope with continuously varying magnification, Philips Techn. Rev. 9 (1947) 33–46. ¨ bermikroskopes, 111. B. von Borries, E. Ruska, Aufbau und Leistung des Siemens-U Z. Wiss. Mikrosk. 56 (1939) 317–333.
Origin and Background of the Invention of the Electron Microscope
205
112. V.K. Zworykin, J. Hillier, A.W. Vance, An electron microscope for practical laboratory service, Trans. Inst. El. Engr. 60 (1941) 157–161. 113. M.E. Haine, The design and construction of a new electron microscope, J. Inst. Electr. Eng. 94 (I) (1947) 447–462. 114. A.C. van Dorsten, H. Nieuwdorp, A. Verhoeff, The Philips electron microscope EM 100, Philips Techn. Rev. 12 (1950) 33–64. 115. O. Rang, H. Schluge, Aufbau des AEG-ZEISS Elektronenmikroskops EM8, A.E.G.-Mitteilg 41 (1951) 151–154, and 285. 116. C.E. Hall, Introduction to Electron Microscopy, McGraw-Hill, New York, 1953, p. 178, 223. 117. J. Hillier, E.G. Ramberg, The magnetic electron microscope objective: contour phenomena and the attainment of high resolving power, J. Appl. Phys. 18 (1947) 48–71. ¨ bermikros118. M. Mahl, Metallkundliche Untersuchungen mit dem elektrostatischen U kop, Z. Techn. Physik 21 (1940) 17–19. 119. R.C. Williams, R.W.G. Wyckoff, The thickness of electron microscopic objects, J. Appl. Phys. 15 (1944) 712–716. 120. C.E. Schwerdt, R.C. Williams, W.M. Stanley, F.C. Schaffer, M.E. McClain, Morphology of the type II poliomyelitis virus (MEF1) as determined by electron microscopy, Proc. Soc. Exp. Biol. Med. 86 (1954) 310–312.
This page intentionally left blank
Chapter
6 Origin and Background of the Invention of the Electron Microscope: Commentary and Expanded Notes on Memoir of Reinhold Ru¨denberg* H. Gunther Rudenberg† and Paul G. Rudenberg‡
Contents
1. Introduction 2. Early Life 2.1. Childhood and Education 2.2. Professional Career 2.3. Family 2.4. Polio 3. The Process of Invention 3.1. Early Patent Applications 3.2. Early Work at Siemens 3.3. Leaving Germany 4. The Fate of the Patent Applications 4.1. Controversy 5. Building the Electrostatic Microscope 6. Recognition and Character Acknowledgments References
208 211 211 214 215 216 218 233 236 244 247 255 262 265 267 267
* Reinhold Ru¨denberg’s memoir begins on p. 171 of this volume {
{
H. Gunther Rudenberg, the older son of Reinhold Ru¨denberg, is deceased. At the time of the writing of this chapter, he lived in Scarborough, Maine, USA Paul G. Rudenberg, son of H. Gunther Rudenberg lives in Les Cayes, Haiti, c/o 259 Foreside Road, Falmouth, Maine 04105, USA
Advances in Imaging and Electron Physics, Volume 160, ISSN 1076-5670, DOI: 10.1016/S1076-5670(10)60006-7. Copyright # 2010 Elsevier Inc. All rights reserved.
207
208
H. Gunther Rudenberg and Paul G. Rudenberg
1. INTRODUCTION Reinhold Ru¨denberg, well known for his leadership of electric power technology at Siemens in Germany, was one of the foremost electrical engineers of his time (Goetzeler and Feldtkeller, 1994; Weiher, 1976; Schoen, 1985). His books were known for their exceptionally clear explanations of complex concepts and are now considered classics in their field. He wrote more than a hundred articles that contained analyses of problems of electrical engineering and transients (Jacotett et al., 1958) and taught these subjects both in Germany and later in the United States. Ru¨denberg (Figure 1) was also a prolific inventor, with several hundred patents and innovations in both the electrical and related fields. While many of his innovations are incremental, others show a brilliant flash of insight. Ru¨denberg is little known for his two fundamental and comprehensive patents of May 30, 1931 on the electron microscope, yet he saw this invention as his most significant and far-reaching contribution to society. Until now, few scholars have accurately presented Reinhold Ru¨denberg’s place in the early history of the electron microscope. At times, editors or workers in the field have instead attacked his role. Many provide incomplete information, or speculation, about his life and work, often because published information is lacking. Ru¨denberg’s story of how he came to conceptualize and delineate his electron microscope is a chronicle of the many facets of insight that came
FIGURE 1 Reinhold Ru¨denberg.
Origin and Background of the Invention of the Electron Microscope Commentary
209
together to give birth to this invention. Since the genealogy of the intellectual origins of his invention has never been published in detail, his memoir represents a valuable addition to the historical record of this important instrument. In the process of describing how his ideas developed, ‘‘Origin and Background of the Invention of the Electron Microscope’’ presents a panorama of early twentieth-century electron physics (Ru¨denberg, 1959). Parts of the story are told with such brevity that a reader could easily underestimate Ru¨denberg’s sense of the relevance of each name and accomplishment cited. The Ru¨denberg memoir does not fit the traditional categories of scientists’ autobiographies and memoirs. It is neither a full-life autobiography nor a study of patterns of innovation. Instead, Reinhold Ru¨denberg’s memoir focuses on a single invention and describes its background and development out of the prior experiences and education of the inventor. Ru¨denberg presents only those events in his life he believed had contributed to his conception of this single important invention. He thought his description would interest both other inventors and those interested in probing the process of innovation and invention, beyond those interested in the history of the electron microscope. The manuscript began as eight pages specifically describing the stimulus and source of his ideas and his related personal history, in preparation for court testimony. His U.S. patents on the electrostatic lens and the electron microscope, assigned to the German firm SiemensSchuckertwerke (SSW) with Ru¨denberg as inventor, had been confiscated as ‘‘alien property’’ by the U.S. government at the beginning of World War II. In 1945, as a new U.S. citizen, Ru¨denberg began to assemble the information necessary to have his patents restored. However, having had to leave Germany abruptly in 1936 under the guise of a business trip, Ru¨denberg had been unable to take his personal papers with him. Most were shipped later, but these were regrettably incomplete. In preparing for and in support of his testimony, Ru¨denberg had re-collected much background material about his early contact with pioneers regarding the nature and behavior of the electron, and about his own work and career. In 1947, the Federal District Court in Boston (Wyzanski, 1947a)1 assigned him ownership of his two U.S. patents (Ru¨denberg, 1931k,o) on the electron microscope derived from his German patent applications of May 30, 1931, and of June 1931 and March 1932.
1
The associated court records, henceforth referred to as ‘‘Rudenberg v Clark,’’ contain many details on the background of the inventor and his invention. See also http://scholar.google.com/scholar_case?case=12199444461003792188
210
H. Gunther Rudenberg and Paul G. Rudenberg
Ru¨denberg added to the manuscript in preparation for a lecture (Ru¨denberg 1948b) for the Centennial Celebration of the Scientific Schools at Harvard University. Also in 1948, I. B. Cohen, managing editor of the Review of the History and Philosophy of Science, suggested that he add ‘‘a general account of all the work done in the early days’’ and include ‘‘in particular the conditions under which the idea came to your mind’’ (Cohen, 1948). So Ru¨denberg began to expand his manuscript and to rearrange this information and add considerable historical references about earlier research on cathode rays that had led to discovery of the electron.2 He continued to refine his manuscript, in many ways a memoir of his career, and completed it in 1959. He had hoped as well to end the rumors and negative myths that were circulating at times surrounding the invention, which appeared to be based on lack of information about his background, the source of his invention, and specific details of his original patent applications. Even in the expanded version, the style is terse and matter-of-fact. He mentions only the names of his achievements or important events in his life, without revealing the underlying human drama that accompanied several of the more noteworthy episodes. In 1958, Ru¨denberg sent his manuscript to several editors, but each replied that the manuscript was too long or specialized for publication in their respective journals. Yet to him shortening it would remove a considerable portion of facts he deemed important. After his death (Dec. 25, 1961), his widow, Lily, deposited many of his papers, including this memoir, in the Harvard University Archives to make them available for historical research. The present commentary on Ru¨denberg’s manuscript seeks to augment Ru¨denberg’s writing with background from his son Gunther’s own recollection of discussions with him, with details supporting the statements he makes, and with relevant archival or historical information that might help clarify the Ru¨denberg manuscript regarding the development of his ideas, their implementation at Siemens, and the subsequent history of his patents. A section describing Ru¨denberg’s connection with the Farrand electrostatic microscope beginning in the 1940s is included because, though this was well after the important early period, Ru¨denberg appreciated this opportunity to contribute more directly to the building of an electrostatic electron microscope.
2
In some parts the manuscript became as a result less detailed and personal, so the present authors have added text from earlier drafts, always noting these additions, and reminding readers that these were not part of Ru¨denberg’s final draft.
Origin and Background of the Invention of the Electron Microscope Commentary
211
2. EARLY LIFE 2.1. Childhood and Education Reinhold Ru¨denberg was born on February 4, 1883, in Hannover, Germany. His mother, Elsbeth (1858–1947), was the fourth of eight children of Levi Herzfeld (1810–1884), chief rabbi of the nearby Duchy of Brunswick (Braunschweig). His father, Georg Ru¨denberg (1837–1918), owned a factory for processing down and down goods located in Wu¨lfel, a small and pleasant suburb a few miles south of Hannover near the placid river Leine. One hundred twenty years later, the city of Hannover has now grown such that it has almost swallowed this once-rural village. The decade of 1888–98 was an exciting period for a young boy interested in all things electrical. At that time the local newspapers presented many such discoveries in popular accounts. Within one decade each discovery in physics or electricity followed another almost every year: Hertzian waves in 1888; Branly’s coherer for detecting these in 1890; Marconi’s radio transmissions over ever-greater distances, starting in 1894; Ro¨ntgen’s discovery of X-rays in 1895; the establishment of the character of the electron by Wiechert and Thomson 1896–97; Braun’s invention of the cathode ray tube for displaying electrical signals, also in 1897; Becquerel’s discovery in 1896 of radioactivity; and Marie and Pierre Curie’s detailed investigations of radioactivity in 1898. Ru¨denberg avidly read stories and journal articles about such electrical and physical phenomena and often tried relevant experiments himself. Ru¨denberg’s science teacher at the Leibniz Gymnasium, Mr. Mu¨ller, who had become both a mentor and a friend, realized that his pupil had already read about much of what he was teaching. Consequently, during Ru¨denberg’s last year of high school, his teacher allowed him to sit in the back of the classroom and read current physics journals during class. As a junior he hand-wound a large inductor and with this built a Ru¨hmkorff induction coil.3 He used this to generate and experiment with Hertzian waves, and to provide the high voltage to power an X-ray tube—his first experiment, at age 16, with a type of electron beam device. Ru¨denberg excelled in his studies of electrical engineering at the Hannover Technical University (then Technische Hochschule Hannover) from 1901 to 1906. While still a student at Hannover he applied for his first patent, on a radio oscillator, which was granted in 1905. He also published the first of more than a hundred technical articles, showing an early aptitude for clear and concise technical exposition. His studies alternated
3
Now at the Collection of Historical Scientific Instruments at Harvard University.
212
H. Gunther Rudenberg and Paul G. Rudenberg
with several periods of practical employment in nearby machine factories and electrical companies. This was both to gain field experience and to earn money for his tuition and to help meet his family’s extremely tight budget at home. At the close of that century electrical engineering was a brand-new field, and most teachers were drawn to this infant field from other, related professions. One of his professors, also his thesis advisor at Hannover, was Wilhelm Kohlrausch (1855–1936), who had been educated as a physicist. Kohlrausch thus taught electrical engineering from the standpoint of physical fundamentals and frequently brought theoretical analyses to bear on his topics. This approach both suited Ru¨denberg’s perspective and provided him with the beginnings of his own lifelong approach to engineering problems of first analyzing the applicable fundamental relationships. Reinhold Ru¨denberg had already completed his doctoral thesis (Ru¨denberg, 1906) by the time he finished his undergraduate studies. He submitted this thesis on the date of receiving his undergraduate degree in electrical engineering. After a short delay because of such an unusually early submission, the faculty accepted his thesis, and he received his Dr.-Ing. (doctorate in engineering) degree with distinction in 1906. As soon as Ru¨denberg had completed his studies at Hannover, Professor Louis (Ludwig) Prandtl, an expert on aerodynamics, asked Ru¨denberg to work for him at Go¨ttingen University as Assistant and Lecturer in Applied Mechanics. At Hannover, Prandtl had been Ru¨denberg’s professor in courses on mechanics and hydrodynamics. When Prandtl was called to Go¨ttingen University in 1904, he remembered his former student. Ru¨denberg gladly accepted this position in the Institute of Applied Mechanics, which Prandtl directed, and spent the next two years there as lecturer and laboratory instructor. He also helped his professor do research on hydraulics and aerodynamics, and participated in building and operating optical and other instrumentation for Prandtl’s wind tunnel, the first of its kind. Ru¨denberg used this opportunity to take courses in advanced physics offered at Go¨ttingen University where he studied alongside Busch and Barkhausen. In 1907, Ru¨denberg took the celebrated course on ‘‘Advanced Electrodynamics’’ taught by Professor Emil Wiechert (Wiechert, 1896, 1897).4 Since the establishment of the character of the electron a decade earlier, many professors and postgraduate students at Go¨ttingen did theoretical or experimental research on the properties of 4
From his theoretical analysis of X-ray emission, Wiechert deduces that an electron before impact has extremely high velocity (later also Sommerfeld’s ‘‘Bremsstrahlen’’ theory), and thus must have a very small mass and be a subatomic particle.
Origin and Background of the Invention of the Electron Microscope Commentary
213
this newly recognized particle (Pyenson, 1979; Rudenberg and Rudenberg, 1994). Ru¨denberg also studied under astronomer Karl Schwarzschild5 and electron theory under Max Abraham6. Ru¨denberg spent two years in the University’s applications-oriented academic culture established by mathematician Felix Klein (Klein and Riecke, 1900). In many ways, both the courses and the friendships he formed with professors, fellow instructors, and research students in the amiable academic atmosphere found at Go¨ttingen had a deep influence on his subsequent professional life (Figure 2). It is not surprising that he drew from these rich experiences to develop his ideas toward the invention of the electron microscope.
FIGURE 2 Akademischer Mittagstisch, Go¨ttingen, 1907. According to handwritten note, from left : 2nd, Reinhold Ru¨denberg; 4th, Heinrich Barkhausen; 5th, Karl Schwarzschild. From right: 2nd, Ludwig Prandtl; 4th, Adolf Bestelmeyer; 5th, thought to be Hans Busch.
5 Karl Schwarzschild (1873–1916), astronomer and physicist. From 1901 until 1909, he was a professor at Go¨ttingen and director of the Astronomical Observatory. Schwarzschild published on electrodynamics of the electron and on the Hamiltonian equations of geometrical optics during his time at Go¨ttingen, both later highly pertinent to electron optics. In 1909, he became director of the Potsdam Observatory close to Berlin. 6 Notes from his courses with Minkowski, Wiechert, and Schwarzschild have been recovered and preserved.
214
H. Gunther Rudenberg and Paul G. Rudenberg
2.2. Professional Career In 1908, Ru¨denberg left Go¨ttingen University to follow his original goal to practice electrical engineering in an industrial setting. He moved to Berlin, where he joined Siemens-Schuckertwerke (SSW),7 starting as a Design Engineer in the Alternating Current Equipment Division. Before long Siemens management recognized his technical and managerial capabilities and advanced him rapidly in their firm. As early as 1910 he was appointed head of the department in which he had started in 1908. Other positions followed, leading to his appointment as Chief Electrical Engineer of SSW in 1923, where he supervised design and engineering functions of the firm. Ru¨denberg spent two years in the Siemens Patent Department, where he both learned much about the process and patented some of his own designs. From then on, he contributed what turned out to be a steady stream of inventions that grew to several hundred by the time he left Siemens (Ru¨denberg, 1936).8 These ranged from incremental improvements to major advances or new concepts, not only in the field of electrical machines but also in related areas. In his memoir, Ru¨denberg cites only a few of his own most important ones, primarily those relating to cathode ray tubes or electron apparatus most relevant to his electron microscope concepts. Based on his Go¨ttingen experience, Ru¨denberg recognized the importance of basic research for the advancement of technology. Consequently, he suggested that SSW establish a Research and Development function aimed at electrical power equipment and systems. When Siemens management agreed, they created a new Scientific Department (Wissenschaftliche Abteilung) and made Ru¨denberg its director, in parallel with his function as Chief Electrical Engineer. Ru¨denberg headed this group at SSW for thirteen years until his emigration in 1936, contributing during that time to the growth of Siemens with developments from his highvoltage laboratory (Trendelenburg, 1975). Throughout his life, Ru¨denberg loved to teach, enabling him to share his experience, knowledge, and approach with others. He was renowned for his clear descriptions in both lectures and writing. While a teaching assistant with Prandtl, Ru¨denberg also taught an evening course on precision mechanics for nearby companies around Go¨ttingen. The course had been organized by Felix Klein to provide support to their local industries. Once in Berlin, he taught a weekly course on electrical engineering at the
7
Siemens-Schuckertwerke (SSW), a major German manufacturer of electric power machinery and power systems, since 1952 was joined with other divisions and affiliates into a single Siemens company. The German Patent Office lists patents on their website, http://depatisnet.dpma.de, but some of the patent images are not searchable by the names of inventors.
8
Origin and Background of the Invention of the Electron Microscope Commentary
215
Technische Hochschule Charlottenburg (now Technical University Berlin), going there after a day’s work at Siemens. He became Lecturer (habilitiert als Privatdozent) at the Technical University in 1913, Professor in 1919, and ‘‘Honorarprofessor’’ in 1927 (Ru¨denberg, 1959, p. 13). During his professional life at Siemens, he was often called upon to lecture on electrical engineering topics, especially the behavior of transients, throughout Europe. In February and March of 1929, he traveled to the United States to be a guest lecturer on ‘‘Special Topics in Electrical Engineering’’ at Massachusetts Institute of Technology. There Ru¨denberg met a young graduate student, Truman Gray, who guided the new visitor from Germany around MIT.9 By an unusual coincidence Gray was doing experimental research to improve the concentration and spot size of the electron beam in a cathode ray tube and showed his thesis apparatus to Ru¨denberg . On seeing Gray’s research apparatus, Ru¨denberg mentioned his own experiences in Go¨ttingen twenty years earlier; he also described the 1926 and 1927 papers by his friend Hans Busch on the focusing properties of axially symmetric magnetic coils on electron beams. Apparently, news of Busch’s lens formulas had not yet reached the research group at MIT at the time of Ru¨denberg ‘s visit. Ru¨denberg taught at Harvard from 1939 to his retirement in 1952, teaching while he continued to develop and patent his invention ideas. In his later years, he made a teaching tour of several countries in South America. He began developing his ideas related to the conversion of radiation to electricity, a differential density microscope, imaging with ultrasound, and harnessing the tides for production of electricity.
2.3. Family Lily Minkowski (1898–1983) had first met the young lecturer Ru¨denberg during his work and studies at Go¨ttingen (1906–1908), where her father also taught. After Professor Minkowski’s death in 1909, his widow, Auguste, had moved to Berlin with her two young daughters, Lily and Ruth, to live closer to her brothers and her other relatives. The Minkowskis met Ru¨denberg again in 1916 at the funeral of astronomer Karl Schwarzschild, their mutual friend from Go¨ttingen, near the observatory in Potsdam. After the Minkowskis learned that Ru¨denberg now also lived in Berlin, Mrs. Minkowski often invited him for dinner at their home in Gru¨newald, a suburb of Berlin. For Reinhold and Lily, acquaintance grew to love (Figure 3). To Lily, Reinhold’s proposal at gardens near the Landwehrkanal, was ‘‘like in . . . an old German folksong’’ and they were married in the 9 See Gray (1929). His design was for greatest resolution, minimum spot size, and empirical optimum for anode shape to minimize spot size, but not yet using Busch’s theory.
216
H. Gunther Rudenberg and Paul G. Rudenberg
FIGURE 3 Lily and Reinhold, engaged (1919).
Minkowski home in September of 1919 (L. Ru¨denberg, 1983, p. 3a). Reinhold and Lily settled in Gru¨newald, where Hermann Gunther was born in 1920, Angelika in 1922, and Frank Hermann in 1927 (Figure 4). In 1927, Lily began to study botany at Berlin University. Once she had begun her studies, the family always had an optical microscope in the house. Later, when the family lived in the United States in the 1940s, Lily became interested in plant genetics and began to volunteer in the Biology Department at the Gray Herbarium at Harvard University. There she investigated the chromosomes of little-studied plants at high optical magnifications, often using a phase-contrast microscope.
2.4. Polio The family enjoyed many traveling vacations together. However, soon after a family trip to Domburg, Holland, in August 1930, the Ru¨denberg’s youngest son, F. Hermann, developed a fever and paralysis. Ru¨denberg (1959) wrote: ‘‘Now it happened in the autumn of 1930, that my youngest son Hermann at the age of nearly three years was stricken with poliomyelitis’’. From his son Gunther’s viewpoint: Even this statement leaves out the cold fear the word ‘‘polio’’ or ‘‘infantile paralysis’’ evoked in my parents at the time my young
Origin and Background of the Invention of the Electron Microscope Commentary
FIGURE 4
217
Ru¨denberg children, Gunther, Hermann, and Angelika (ca. 1932).
brother fell ill in 1930. At the time some 15–25% of those infected with polio died when the paralysis affected their breathing. This possibility worried my parents very much. At that time in Germany, thousands contracted this disease every year. While some recovered with only minor after-effects, many others remained permanently crippled and had to wear heavy steel braces, use crutches, or even had to be confined to a wheelchair for the rest of their lives. It is important to understand that when my brother contracted ‘‘infantile paralysis,’’ there was no known cure for this fatal disease, no preventive vaccine, no clear way of diagnosis, and no method to visualize the infective agent’’ (Rudenberg, 1986). In her unpublished memoir, Lily Ru¨denberg describes the events as follows: From an overnight stop about one day’s drive from home we telephoned with Fra¨ulein Klara [who had taken the children home by
218
H. Gunther Rudenberg and Paul G. Rudenberg
train ahead of their parents] to find out how the children were. She said, ‘‘please don’t worry, little Bubchen has been sick and we had Dr. Michaelis.’’ When we arrived home in the afternoon she received us at the gate, her face white as a sheet.. . . When [Hermann] was told to get up, he could not do it, he had no strength in his knee, no command over one leg and foot, but also some additional weakness in the small of his back. When Dr. Walter Michaelis returned, he diagnosed it right as polio. . . the fear of having him paralyzed for life haunted us. . . we turned to [Uncle Oskar] for clearer concepts (L. Ru¨denberg, 1983, p. 27). Uncle Oskar had also told us, that doctors thought that many cases of polio were never diagnosed as such, because the patients seemed to have the flu without any special after-effects. Others had light, others severe or even deadly paralysis a consequence. We were also very worried that Gunther or Angela might similarly be afflicted.. . . During that year quite a few cases of infantile paralysis had occurred in Europe, many of them with much more tragic effect. This was an incentive for Reinhold to try to help his fellowmen by furthering an understanding of the disease (L. Ru¨denberg, 1983, p. 30).
3. THE PROCESS OF INVENTION As a father, Ru¨denberg desired a cure for his son. As a scientist, Ru¨denberg sought answers, such as where his son could have picked up the infection and the latest ideas about the disease (Figure 5). Stimulated by discussions with a number of consulting physicians, Ru¨denberg learned more about the disease: These consultants informed me not only on my request about the steps to be taken with my son but about the medical character of this illness which, according to their statement, was based on a virus which could not be seen even through the best microscope because of its minuteness, which only could be assumed was present by the fact that the illness apparently was an infectious illness. I tried to help the medical profession by thinking over in which manner possibly such minute particle as apparently the viruses were could be made visible (Ru¨denberg, 1947, pp. 49–50). Having been convinced that visualizing the submicroscopic polio virus would be useful to the diagnosis and perhaps treatment of cases of polio, Ru¨denberg set out toward this goal. In those weeks Reinhold had a pad of paper at his bedside table. When he could not sleep, because he worried about the boy, he
Origin and Background of the Invention of the Electron Microscope Commentary
219
FIGURE 5 Carbon copy of a letter dated August 23, 1930, from R. Ru¨denberg to a physician. ‘‘We tried to find out where Hermann could have picked up this dreaded infection and corresponded about it with the town doctor of Domburg’’ (L. Ru¨denberg, 1983, p. 30).
would ponder his invention and make drawings (L. Ru¨denberg, 1983, p. 30). Ru¨denberg had acquired an excellent working knowledge of optics, mainly during his years at Go¨ttingen. He had been required to design optical instruments for the wind tunnel during his work with Professor Prandtl. Ru¨denberg had further studied multilens optics under the celebrated astronomer Karl Schwarzschild (Ru¨denberg, 1907). In the late 1920s his wife studied botany and the family acquired a microscope for her studies. Ru¨denberg had also cultivated a strong friendship with
220
H. Gunther Rudenberg and Paul G. Rudenberg
Max Born. Physics professor Sigmondy had in 1908 shown Ru¨denberg an instrument he had developed together with Dr. Siedentopf at the Zeiss works related to the ultraviolet microscope. ‘‘These instruments were in 1930 the latest development in microscope optics’’ (Ru¨denberg, 1947, p. 50). With Oskar Minkowski’s explanations Ru¨denberg understood that one key to imaging the polio virus was the question of resolution—to overcome the limits caused by the relatively large wavelength of light. From this point in his pondering and through to his final designs his main goal was to find a method to overcome this limitation and to maintain a much higher resolution. He sought another form of electro-magnetic or other type of radiation which, like light, would travel in straight lines, was absorbed or reflected by thin objects, and could, by way of silhouette images, ‘‘catch minute details of an object’’ (Ru¨denberg, 1959). It was not sure to me from the beginning whether it was possible to use particle rays for the purpose of imaging. Up to that time nobody had tried to do that, and all imaging was done with the elements of light optics based on waves and wave lengths of light (Ru¨denberg, 1947, p. 51). Perhaps recalling his youthful experiments with a Ru¨hmkorff induction coil and a borrowed X-ray tube, Ru¨denberg considered whether X-rays, with their shorter wavelength, might be used for imaging a virus. However, he rejected the concept of using X-rays: It was known by this time that X-rays had very much smaller wave lengths than light. But I did not succeed in finding a concept by which X-rays and images of X-rays could be magnified by a factor of more than 2500 because of the absence in nature, apparently, of any lens like element to image X-ray structures (Ru¨denberg, 1947, p. 50). In addition, from his well-grounded education in physics, Ru¨denberg knew the well-known experiments of Crookes, Hittorf, and Lenard, which showed that cathode rays not only traveled in straight lines in what Ru¨denberg called ‘‘fieldless spaces’’ but also would be absorbed or scattered by an object placed between the cathode and the fluorescent screen10. The rays would diverge rectilinearly, as his patent application would state, to produce a shadow image of that object on the screen. I had worked with these particles together with colleagues on the basis of the establishments mentioned before and knew some of their properties and a bit of their behavior. I tried to develop a mechanism by which such electrons could be directed to a group, a sample of the viruses of the group of presumptive viruses, and to
10
See also Busch (1922). He mentions the possibility of an image (scharfes Bild des Diaphragmas), or sharp shadow image (scharfe Schattenbild), of an apertured diaphragm on a fluorescent screen.
Origin and Background of the Invention of the Electron Microscope Commentary
221
develop further the devices by which the paths of electrons could be controlled and made so regular that it was possible to form magnified images of the structure of these various samples to a detail or minuteness of contrast which apparently could not be observed by light fields, which compared with electrons were comparative giants (Ru¨denberg, 1947, pp. 50, 51). In the 1920s Ru¨denberg was in correspondence with Rogowski, and more commonly with his friend D. Gabor, with whom he continued to exchange offprints of the papers they each authored for more than 20 years (Figure 6). Gabor (1947) wrote to Ru¨denberg much later as follows: . . . your book on Transients . . . came out the first time when I was a student at the T. H. Berlin, and it has influenced my whole thinking on the subject. It was probably the main reason for my choosing
FIGURE 6 One of the many papers Gabor sent to his friend Ru¨denberg. Ru¨denberg and Gabor later began a dialog regarding the development of the electron microscope that extended through the 1950s.
222
H. Gunther Rudenberg and Paul G. Rudenberg
traveling waves and the cathode ray oscillograph for my doctor’s thesis. I wish only I had stuck to the cathode ray tube a little longer (Gabor, 1947). Ru¨denberg, an acknowledged authority on the use of the high-voltage oscillograph, had worked with this instrument throughout his professional life, especially for examining the waveforms of transient voltages in his power systems engineering. He had patented some of his own related designs as well. In December 1924, he patented an improvement for controlling the spot movement on the screen for observing transient phenomena (Ru¨denberg, 1924). On January 31, 1931, he had applied for a patent on ‘‘Electron tube with a part of the tube wall forming the anode’’ in which he described an electron beam-emitting electron tube with an elongated helical cathode and a cylindrical anode-forming metal container, creating a cylindrical magnetic field that decreased radial flow of the electrons and increased the axial component of the beam (Ru¨denberg, 1931a). Previously at Go¨ttingen, he had the opportunity of learning about and performing experiments with cathode ray tubes and observing the influence of magnetic fields on the cathode rays in such evacuated tubes. ‘‘These phenomena were discovered a few years ago, around 1900, by two professors of that university, at which I studied from 1905 to 1908’’ (Ru¨denberg, 1947, p. 30). From these Go¨ttingen experiments, and those of his colleague Hans Busch, he later recalled the strange shadows or images that had appeared in cathode ray tubes when attempting to focus them. Thus in his microscope patent one finds: This enlargement or magnification of shadows has hitherto been regarded only as a secondary phenomenon of minor technical importance. The reduction has in the past only been employed to obtain a sharply defined luminous spot (focus) (Ru¨denberg, 1931g,h,i,k). Hans Busch had sent Ru¨denberg copies of his 1926 and 1927 papers on the lens action (optical properties) of axially symmetric fields in cathode ray tubes oscillographs (Busch, 1926). These profound papers, which transformed electron ballistics to electron optics, had an incredible impact on everyone working with cathode rays and cathode ray oscillographs. Gabor described the atmosphere as follows: Busch’s [1927] paper was more than an eye-opener, it was almost like a spark in an explosive mixture. In 1927 the situation in physics was such that nothing more than the words ‘‘electron lens’’ were needed to start a real burst of activity (Gabor, 1942, p. 3). Geometrical optics was widely understood at the time, but no one had previously published the concept that it could be applied to electron beams. Hans Busch’s analysis of the lens effect of magnetic coils (and to some extent
Origin and Background of the Invention of the Electron Microscope Commentary
223
FIGURE 7 From U.S. patent application 613,857: ‘‘a magnetic stricture coil, the length of which is small compared with the length of the path of the beam of rays’’ (Ru¨denberg, 1931i).
electrostatic fields) was crucial to Ru¨denberg’s early ideas on electron imaging; thus, he included as the first illustration in his imaging patent a schematic related to Busch’s concept, showing a stricture coil focusing to a spot (and expanding from there)11 (Figure 7), and the following paragraph: In cathode ray oscillographs, electro-magnetic fields surrounding the beam concentrically and causing a stricture of the beam are often employed. These fields have an effect on the beam of rays similar to that of an optical lens on a ray of light (Ru¨denberg, 1931i). It is unclear whether in 1930 or early 1931 Ru¨denberg was familiar with the 1924 DeBroglie work on the wave nature of electrons. A phrase suggesting this knowledge is found in his U.S. microscope patent application, signed and dated May 1932, but not in the 1953 German text, presumably derived from May 1931 (Ru¨denberg, 1931c). However, Ru¨denberg, a meticulous student of Abraham, Wiechert, and Schwarzschild, clearly knew the character of the electron—and probably much of what was known at that time of electron behavior in electromagnetic fields (Ru¨denberg, 1907). Ru¨denberg had settled on a method to overcome the resolution limit posed by light by choosing to work with cathode rays. The key issues became how they could be used to form and to greatly magnify an electron optical image of an object. As Mulvey (1967) points out, with magnetic fields, the mental step from a focused burning spot to an electron image is not an obvious one, partly because ‘‘it is not obvious that a magnetic solenoid is capable of forming an image at all.. . . [A] magnetic field cannot change the velocity of an electron, it can only alter its direction. The concept of a magnetic field as a medium of anisotropic refractive
11 In Ru¨denberg’s portrayal of his imaging ideas, diagrams of electron lenses capable of imaging show a verticle arrow for the specimen and an inverted arrow for its electron image, while his electron lenses for focusing a cathode beam do not show this.
224
H. Gunther Rudenberg and Paul G. Rudenberg
index does not lead naturally to the invention of an electron microscope. The Busch papers offered no promise of the desired high resolution, even of a real cathode image. Busch had based his calculations on the nonexistent, infinitely long stricture coils, and infinitely short coils, because under these conditions a homogeneous magnetic field could be hypothesized. In contrast, ‘‘the existence and possibility of electrostatic lenses,’’ could be explained directly from an understanding of the Hamiltonian analogy of geometric optics (Gabor, 1942, p. 6). For this reason, though Ru¨denberg’s ideas were dependent on the Busch idea of an ‘‘electron lens,’’ he began to seek an alternative method to produce a homogeneous field of force to manipulate a cathode beam to produce a real image of an object. Through twenty years of work around cathode ray devices and vacuum tubes, Ru¨denberg became familiar with calculating their various electromagnetic forces. For example, in June 1925, Ru¨denberg patented a thermionic radio tube, including calculations of electrostatic field strength according to the radius (e ¼ C/r) of two coaxial cylindrical anodes surrounding the cathode in a high-voltage tube. The fields of force were equalized to reduce damage to the tube in several designs (Ru¨denberg, 1926). Students of cathode ray devices of the time knew that the angle of deflection of the rays of the beam depended on the square of the velocity of the beam, on electron mass, and on the radial electromagnetic force produced by an axially symmetric field (tan y ¼ el E/m v2). Ru¨denberg perceived that for electron imaging, in order to cause a cathode ray beam of homogeneous velocity to reach a focal point after passing through an object, and then to expand again containing in exact optical proportions and high-resolution, the silhouette image of that object, the field of force on the beam must be directly proportional to the distance of the rays of the beam from the central axis (Ru¨denberg, 1931b,c,i,n). When the electrons of a cathode ray stream through the diaphragm, they are repelled by it and have the tendency to move toward the axis. They are, therefore, deflected from their original paths, which were assumed to be parallel, towards the axis and made to meet in a convergent pencil or bundle in a focal point o, after which the beam becomes divergent. As the radial component of the field strength of the diaphragm is zero in its axis and at first increases outwards linearly, the individual electrons will be all the more deflected from their paths the greater their distance from the axis of the beam and of the diaphragm. Hereby all the rays are made to meet in the same focal point o. To obtain with sufficient accuracy this proportionality of the radial field strength to the distance from the axis, it is advisable to make the opening considerably larger than the width of the original beam (Ru¨denberg, 1931i,n,c). Busch later writes on the importance of creating a field that obeys the proportionality principle ‘‘dass die ablenkende Kraft proportional mit dem Achsenabstand r zunehmen muss’’ to focus and to image:
Origin and Background of the Invention of the Electron Microscope Commentary
225
. . . a radial electric field (a cylindrical condenser) or a circular magnetic field (current-carrying wire). Both arrangements, which have often been investigated up to quite recently, are however— quite apart from the very perturbing shadow effect of the wires in the axis in both arrangements—essentially unusable because they do not satisfy one important condition: for genuine concentration, it is necessary for all the rays in the beam to be reunited at the same point. For this to be valid for both the inner and outer rays, the outer rays must experience a stronger force; this implies that the deflecting force must be proportional to distance r from the axis. In both the above arrangements, however, the deflecting force is proportional to 1/r and they cannot therefore be regarded as a solution of the aim to produce a concentrating or even imaging system (Busch, 1936, p. 585). Ru¨denberg reasoned, and worked, as follows: . . . the intensity of the deflecting electro-magnetic field must be proportional to the separation of any point from the beam axis. Many different structures of electric lines of force were devised in order to find an embodiment of an electron lens, namely a shape of electrode giving the radial condition required. Finally, circular rings or apertures in discs, both concentrically arranged around the electron beam, were found to give near the center of the opening a deflecting field-strength proportional to the distance from the axis. Thus, an electrostatic lens for electron beams was disclosed. A short concentric magnetic coil as already known by H. Busch’s publication also appeared very useful as a magnetic electron lens (Ru¨denberg, 1945a, p. 5). For Ru¨denberg , the closest thing to the ‘‘infinitely short’’ field of force suitable for the task of imaging minute shadows was the apertured, electrostatically charged metal diaphragm (Figure 8). He determined, perhaps using the Laplace equation, that the fields of force around this charged apertured diaphragm were such that close to the center of the aperture, near the beam axis, a field directly proportional to distance from
FIGURE 8 Diagram of Ru¨denberg’s electrostatic lens from DE 906737(1931c), U.S. application 613857(1931i), and corresponding European patents.
226
H. Gunther Rudenberg and Paul G. Rudenberg
the axis could be found. These electron lenses must also be of proper shape such that their force is exerted on the beam only within an axially narrow zone. To obtain a sharp lens- or focus-effect, forms of electrodes or diaphragms with edges forming such curves are employed that their effect upon the electron beam passing through them only takes place within a narrowly limited zone in that within that range the radial field strength is as proportional as possible to the distance from the axis, whilst the actual field strength to the right and left of the diaphragm is symmetrical in its course, in order that the additional axial acceleration and retardation of the electrons cancel each other in their effect (Ru¨denberg, 1931b,n). So with the properly controlled beam, he reasoned that enlarged imaging would be possible. In similar fashion, Davisson and Calbick (1931, 1932), by mathematic formulation alone, calculated the focal length of an electrically charged cylinder and an apertured diaphragm. Some question whether Ru¨denberg could make that jump theoretically, instead of experimentally, from Busch’s description of electron optical concentration of cathode rays to actual imaging of an object. In favor of Ru¨denberg’s descriptions of this inventive step, one must consider the following: (1) The Busch papers reminded him of his much earlier laboratory experience, some of it together with fellow graduate students, of which he recalled enlarged shadows produced in Braun tubes. (2) For Ru¨denberg, electron images were nothing more than these enlarged shadow-silhouettes, as is clear in the first paragraphs of his initial patent applications (Ru¨denberg, 1931i, this volume, p. 272). His model of imaging at the time was rather simple: After traversing the object, the cross section of the electron beam contains nothing more than the object’s ‘‘shadow pattern’’ in the form of ‘‘diversified electron density,’’ and this pattern would expand rectilinearly. (3) Most importantly, at the time that Ru¨denberg disclosed the idea forming an enlarged image of an object using lenses consisting of charged, apertured diaphragms, there were no known laboratory or published examples of this particular method of imaging. Both Knoll (1929 patent, not published until 1940) and George (1929) had described apertured diaphragm focusing devices for oscillographs, but neither referred to the possibility of forming an image of an object with such a device (Mulvey, 1962). Ruska had rejected apertured diaphragms in favor of curved and straight mesh screens. Ru¨denberg must have arrived at this concept, used throughout the history of electrostatic electron microscopes, from his own theoretical analyses. Since it was clear to Ru¨denberg that a charged diaphragm would affect the velocity of the beam, and that this might influence image resolution,
Origin and Background of the Invention of the Electron Microscope Commentary
227
FIGURE 9 Neutral lens designs from DE 889660 (Ru¨denberg, 1931b) and U.S. patent application 615260, Apparatus for influencing the character of electron rays (Ru¨denberg 1931n).
he devised several combinations of apertured diaphragms and rings that would have no overall effect on the axial potential (Figure 9). He arrived at the lens design of a central apertured charged diaphragm or ring surrounded in an axially symmetric fashion by two apertured diaphragms of an opposite charge, thus returning the electron beam to its previous velocity. Thus he devised what he claimed to be the first design for a multiple electrode, apertured diaphragm, ‘‘neutral’’ or ‘‘einzel’’ lens intended for enlarged imaging of an object. This lens became the basis of his microscope designs and can be seen in a variety of constructed electrostatic microscopes and other electron optical instruments, including the Farrand microscope, on which he consulted twenty years later.12 Ru¨denberg disclosed his lens design as useful in a variety of electron optical devices, including oscillographs designed to measure rapid occurrences, and for a ‘‘microscope’’ with which ‘‘a much greater magnification can be obtained than with our optical microscopes’’ (Ru¨denberg, 1931b,n). He later summarized his electrostatic lens design as follows: 1. The aperture of a charged diaphragm symmetrically to the beam axis is sufficiently large for the passage of the entire beam. (. . . the focusing occurs with low aberration and high definition) 2. The electrodes are polarized and spaced with respect to the source, so that their focusing charge is either positive or negative. 3. The beam is convergently focused by use of a negatively charged diaphragm. 4. The diaphragm is composed of a plurality of concentric elements. 5. Two concentric elements of different sized apertures are used. 6. One central element is surrounded by an outer element both charged at different potentials.
12 See also the Siemens and Halske design of an apertured diaphragm lens electrostatic microscope with diffraction capabilities (Ruska and Jaentsch, 1942).
228
H. Gunther Rudenberg and Paul G. Rudenberg
7. The shape of the electrode is so that they affect the beam in a narrowly limited zone, the axial field being symmetrical to cancel out axial acceleration and retardation of the electrons (Ru¨denberg, ca. 1950). Initially Ru¨denberg was not much interested in the magnetic stricture coil, except in principle, but was instead trying to devise a more ideal field of force for focusing with minimal aberration. For most early workers, the implication of Gabor’s improved iron-clad coils with regard to axial length and focal length had not yet been realized. But Ru¨denberg was on a different path—arriving at the electrically charged diaphragm for imaging, of which the forces and his understanding of them were more conducive to his present mental process. He also explained that for use in oscillographs his charged diaphragms have, ‘‘contrary to the known magnetic influence through stricture fields, the advantage of having very much smaller time constants for their action, and are therefore capable of responding correctly as to quantity to exceedingly rapidly occurring influences’’ (Ru¨denberg, 1931n). In a poorly understood aspect of his earliest designs, Ru¨denberg chose to include only electrostatic lenses in the descriptions and figures of his earliest microscope objective and projection lenses, reserving Busch-type magnetic coils for manipulating the beam before it reached the object. He explained this later: This minuteness necessitates proper qualities of the electron lenses with respect to their imaging properties in order to obtain the desired resolution of the details of the objects. Objective and projection electron lenses, and any other intermediate lenses of such type have to fulfill certain quality conditions in order to obtain this aim. For the condensing lens, used to concentrate the electrons on the object, this is not necessary. The condition for such precision imaging of the objective and projection lenses is, that the electromagnetic forces on the electron rays are strictly proportional to the separation from the axis through the entire width of the electron beam. Such lenses either may be built by proper shaping of their surface; or they must have an opening which is large as compared with the width of the electron beam carrying the pattern. With a beam of such width and concentration coils of such dimensions as used by Busch, merely acting analogous to condensing lenses for light, it would be impossible to develop an electron microscope having the power of resolution sufficient to magnify submicroscopic objects (Ru¨denberg, 1945b). Ru¨denberg’s lack of confidence in then-available magnetic coils with their significant axial width is mirrored by comments of those later defending his patent applications before the U.S. patent office:
Origin and Background of the Invention of the Electron Microscope Commentary
229
. . . [E]lectrostatically charged diaphragms . . . in the applicant’s case may be made as narrow disks with a cross-section most favorable for the desired electrostatic field. . . The winding of [magnetic] coils by the very nature and purpose of the coils, requires a certain length of space in the direction of the beam axis.. . . It is very difficult to obtain with such coils a very well defined deflection of the rays of the beam on account of the length of this field in axial direction (Ru¨denberg, 1931j). No figures of magnetic imaging lenses were included in Ru¨denberg’s initial disclosures. His lack of emphasis on magnetic lenses in these early patent applications may have become clear to others. Two outstanding pioneers in the field, Ernst Ruska and Bodo von Borries, perhaps became aware by March of 1932 of the Siemens applications. Ruska much later stated, ‘‘Even before Ru¨denberg’s patent applications were known, the author had understood that Siemens had applied for patents for the electron microscope’’ and ‘‘no Ru¨denberg application showed a picture of or offered any proposal for how magnetic fields were to be developed in order to obtain electron lenses of short focal length’’ (Ruska, 1986). They hastened to apply for a patent of their iron-shrouded pole-piece lenses (patent DE 679857, submitted March 17, 1932). Besides the shortened focal length, these lenses produced a magnetic field of narrowed axial width, which may have increased their capacity to produce a high-resolution electron optical image. Throughout the commercial history of the instrument, magnetic lenses have been favored for a variety of reasons, including fewer problems with lens aberrations. In 1993, Dr. Gertrude Rempfer, who had worked in development of electron microscopes for almost 50 years, saw this tendency as artificial since high-vacuum and high-quality lens materials were available (Rempfer, 1993). Instability in power source voltage, problematic in the first decades of electron microscopy, was also less of an issue with electrostatic lenses that could be powered by the same source as the cathode. Ru¨denberg included in both the initial lens and first imaging patents a new method for velocity filtering of the electron beam (monochromator) before it illuminated the object (Figure 10). This again would help preserve high resolution by reducing chromatic aberration. By way of two axially symmetric electrostatic or magnetic fields, separate from the anode of the beam source, and a smaller neutral aperture to block rays of other velocities, beam velocity was selected. The object plane, where a specimen would be inserted, was placed after the velocity selector and/or condenser. With ‘‘the object not being simply the cathode or anode diaphragm,’’ this design differed greatly from existing shadowing beam apertures.
230
H. Gunther Rudenberg and Paul G. Rudenberg
FIGURE 10 The Ru¨denberg velocity selector b1 and b3, charged diaphragms; b2, aperture; 0, 01, 02, focal points. Magnetic coils could replace the charged diaphragms. (From U.S. patent application 615260.)
FIGURE 11 Schematic representation of the Ru¨denberg microscope as of May 30, 1931, including velocity selector and objective lens, from patents DE 906737, U.S. 2,058,914, FR737716, and CH165549 (Ru¨denberg 1931c,g,h,i,l). (a1, electrostatic stop; a2, charged stop; 01, 02, focal spot; C, zero potential or charged stop; a3, stop, d object to be magnified; a4, stop charged to converge the rays) ‘‘behind the stop a4 further stops may be placed.’’
Ru¨denberg combined the various elements of his design, describing and proposing an apparatus with elements to maintain high resolution that he thought would be capable of visualizing the polio virus, shown schematically in Figure 11. Ru¨denberg proposed combining his electrostatic lenses in an arrangement of cascading projection lenses, instead of using one lens of exceptionally short focal length and therefore exceptionally high voltage. These, he suggested, could produce the desired high magnification of an object without making the instrument excessively long or reducing resolution. Ru¨denberg also described electrostatic fields of positive bias in relation to the cathode to produce a diverging field, which he saw useful for magnified electron imaging. This design, including use of the diverging fields in the cascading projection lenses, caused objections to the announced German patent in 1939, which resurfaced in 1953 before the patent was finally granted (Eisenzapf, 1953a). This has remained one controversial aspect of his design, suggesting to some that his ideas were patented before enough experiments were performed. Through 1953 Ru¨denberg
Origin and Background of the Invention of the Electron Microscope Commentary
231
maintained that cascading diverging lenses were feasible in a compound field design resembling that used in a cathode lens. The realization of a cascade of diverging lenses is in reality very simple. These must be positively charged and for this purpose a voltage that is positive and increases in the direction of the electron motion must be applied to them. If, for example, the cathode is held at –10000 volts and the anode of the beam source as well as the object d are both at zero volts, then the four lenses shown in Figure 2 could be held at the following voltages: b1 at þ2000V, b2 at þ4000V, b3 at þ6000V and b4, þ8000V. Between each lens there is thus a positive potential increase of 2000 volts. This information may be of value if further developments are made (Ru¨denberg, 1953a). It has since been shown possible to produce a diverging three-aperture electrostatic lens by placing an electrically conductive, electron-transparent film or fine screen over at least one of the two outer apertures with the center electrode of positive potential with respect to the end electrodes. Such diverging electrostatic lenses are over corrected for spherical aberration and were advantageous in combination with an under corrected divergent lens (Fleming, 1953). Ru¨denberg added two innovations to improve his microscope design for examining living biological specimens—testimony to his desire to visualize the polio virus and his early confidence in the potential of the instrument (Ru¨denberg 1931e,f). First, he proposed a means for reducing the velocity of the electron beam just before the beam arrived at the specimen, and then reaccelerating the beam immediately thereafter. His June 27, 1931, patent stated ‘‘when examining living objects. . . slowly moving electrons and low vacuum may be more advantageous to avoid the destruction of the object to be examined and to obtain good contrasts’’ (Ru¨denberg, 1931f,k). He included a separate compartment for the specimen that was kept at lower vacuum to allow the biological, or perhaps living specimens, to remain intact. He also wrote in 1932 regarding the work within the Siemens group, that ‘‘our main goal is to image submicroscopic stationary or moving objects multiply magnified. If they cannot be exposed to full high vacuum they may be exposed and observed through transparent windows’’ (Ru¨denberg, 1932). It was several years before others were convinced that useful images of organic specimens could be obtained in an electron microscope. In his memoir, Ru¨denberg states that ‘‘magnetic lenses could be built with sufficient precision’’ to meet the conditions he had defined for imaging. Despite reservations about the capability of existing magnetic stricture coils to image precisely, Ru¨denberg generalized his design to include all ‘‘electro-magnetic fields of force,’’ elaborating a set of overall principles of the high-resolution electron microscope and incorporated
232
H. Gunther Rudenberg and Paul G. Rudenberg
these into his microscope application 48483, dated May 30, 1931. These allowed for, and apparently legally protected, designs that included all types of electromagnetic fields that corresponded to his general principles (i.e., radially proportional, homogeneous fields of force and parallel, homogeneous electron beams). According to my invention, the effect, similar to that of lenses in optics, produced by fields of force surrounding concentrically a beam of electrons and exerting a radial influence on the same, is utilized for enlarging objects in a manner corresponding to that obtained with optical magnifying glasses and microscopes. For this purpose, the object to be enlarged is exposed to an electron ray or a beam of rays, and before or behind the object, the beam is by means of lens-like acting fields made convergent or divergent (Ru¨denberg, 1932i,k). Thus, the final Ru¨denberg microscope design disclosed in May of 1931 consisted of (1) an electrostatic multi-lens system designed to produce a high-resolution, high-magnification microscope, as well as (2) a set of general principles for electron microscopic design applicable to both electrostatic and magnetic lenses. In his 1945 description of his patents, he emphasized high resolution and high magnification as follows: An electron-objective-lens greatly magnifies the intrinsic structure or pattern to be studied of an object to a real image, this image being constituted of a pattern of diversified electron density as impressed on the cross sectional area of the electron beam by the actual pattern of the object. There is formed in the empty space a greatly magnified real image of the object-pattern, and this is enlarged again 1, 2, or more times in cascade to ever increasing magnification before its details are finally made visible to the human eye on a luminescent screen. The minute pattern of an object is captured by variations in the cross sectional density of space charge of an electron beam and the entire cross section of the beam is expanded with sharply defined magnification of the pattern until good visibility is attained (Ru¨denberg, 1945b). T. Mulvey (1962) summarized Ru¨denberg’s accomplishments as follows: ‘‘Ru¨denberg’s patent expressed two important ideas in print for the first time: i) the overcoming of the resolution limit set by the wavelength of light by means of an image of an object formed by electrons, and ii) the means of successive magnification of that image by a series of electron lenses so that the resolution could be realized in practice.’’ In another account, It is quite obvious that applicant is the first who has invented an electronic microscope, that is to say, a device in which the electron
Origin and Background of the Invention of the Electron Microscope Commentary
233
beam is affected electrostatically by elements in the same manner in which a light beam is affected by lenses, and in which this arrangement is used for magnifying objects. By suitably electrostatically charging a number of diaphragms a cathode beam can serve for magnifying a microscopic object to an extent to which it was not possible heretofore with ordinary optical microscopes. That is a discovery of the first order, and there is nothing contained in the literature so far as applicant is aware which has suggested such an idea’’ (Knight Brothers, 1935).
3.1. Early Patent Applications On May 27, 1931, Ru¨denberg disclosed his ideas by way of two pages of penciled drawings to his friend Dr. Ludwig Fischer, the head of the Siemens patent department. Ru¨denberg later stated, I showed him by means of some figures, some drawings, quickly made drawings and explanations, in which way I thought such a device would work. I had to disclose the invention to my company, anyway, and I selected Dr. Fischer because he was, from the time I worked at Siemens and Halske under him, a close friend of mine, and I wanted that he, with his wide experience in engineering and physics—he was a Doctor of Physics—to criticize my invention and tell me if he could where this invention was based on real things, or possibly something in the clouds’’ (Ru¨denberg, 1947, p. 52–53). Dr. Fischer was enthusiastic and saw some urgency in getting the applications to the patent office. ‘‘He suggested to apply for a German patent immediately because he considered this invention and the instrument which might be produced on the basis of that invention highly important’’ (Ru¨denberg, 1947, p. 56–57). This urgency has been much discussed, leading to much speculation. Ru¨denberg suggests that he went to Fischer more to get his reaction to his ideas and that Dr. Fischer wished to act quickly (Ru¨denberg, 1947, p. 52–53). Fischer clearly saw the implications of this invention. As Gabor had pointed out, the time in electron optics was an ‘‘explosive’’ one and there were many pioneer workers13. An inventor in this field would want to set out his or her ideas promptly. The upcoming Cranz Colloquium with the title ‘‘Berechnungsgrundlagen und neuere Aufu¨hrungsformen des Kathodenstrahloszillographen’’14 could have been the stimulus, but Ru¨denberg never states this. Others associated with the electron microscope
13 14
Often forgotten in this group, Leo Szilard laid out his microscope ideas in a patent application of July 1931. Principles of design and new ways to construct the cathode ray oscillograph,
234
H. Gunther Rudenberg and Paul G. Rudenberg
expressed their need to act in this period in a similar rapid fashion (Ruska, 1986).15 A young electrical engineer in the Siemens patent department, Mr. Curt Abraham16 (later called C. Avery), was called in to help prepare the applications for the German patent office. Ru¨denberg had felt a little hesitation with respect to involving SSW because, although he was required to disclose all inventions to his employer (Wyzanski 1947c, page 5), he hoped to preserve his rights to an invention he had developed at home. However, Dr. Fischer reassured him about this. Ru¨denberg explained to Mr. Abraham ‘‘the essence and the details of the electron microscope, the lenses, the specimen, the screen and their relation to each other, which I had in mind, and he produced a specification and claims for the German patent application’’ (Ru¨denberg, 1947, p. 57). In testimony under oath in 1947, Avery described the events as follows: A (Avery). When I entered Doctor Fischer’s office both people were talking about the matter, and after a while they were ready to deal with me. Then Doctor Fischer said—of course I do not remember the exact words. The Court. I do not expect you to, after 15 years. A (continuing). But I have a vivid memory of the whole thing, because it was quite exceptional. He said, ‘‘here is an invention that is somewhat unusual and ought to be given immediate attention, and Professor Rudenberg will explain it to you.’’ From then on, Professor Rudenberg talked with me and Doctor Fischer was more or less just a spectator; and Professor Rudenberg handed me two pencil sketches, each having a number of figures, and then he explained to me the principles of one or two inventions, depending how one looks at those things. Both were related to the electron microscope. Q. What was the origin of these inventions disclosed to you? A. The origin of these inventions had not been disclosed to me, not during the entire period of my activities in Germany at Siemens.
15
To understand the pressures on these early workers, it is informative to read Dr. Ruska’s letter of March 16, 1932, to his supervisor Professor Matthias, where he explains to his supervisor his urgency to apply for patents when he had received a report that ‘‘certain matters pertaining to the area of electron microscopy are about to be commercially explored’’ at Siemens and that ‘‘priority could be lost even by delaying the application until tomorrow’’ (Ruska, 1986). 16 Dr. Curt Abraham, an electrical engineer (doctorate, Berlin Institute of Technology, 1934) who studied some commercial and patent law as well, was employed in the Patent Department of Siemens and Halske from 1927 to 1937, primarily associated with the work of the research laboratories. In February 1937, he immigrated to the United States and in 1939 his name was changed by order of the New York Supreme Court to Curt M. Avery. Avery worked as a patent attorney for the Westinghouse Electric Corporation and later with the Knight Brothers Firm. He authored Das US-Patent, published in 1967.
Origin and Background of the Invention of the Electron Microscope Commentary
235
Q. What were your instructions at the close of the conference? A. At the close of the conference I had a talk with Doctor Fischer about how to treat those inventions. Q. In Doctor Rudenberg’s presence? A. That I do not know definitely. He may have been there, but I just don’t know. And Doctor Fischer told me to treat those two intended applications as if they were company applications. I mentioned to him whether they should be filed for S. and H. or S.S.W., because they were obviously out of line for S.S.W. He told me, ‘‘No, you file them for S.S.W.’’ The Court. The reference to it being clearly out of line with S.S.W. products will go out and be disregarded by the Court. Q. Did you understand the inventions? A. After they were explained to me, I understood them perfectly. Q. Had you previously prepared any patent applications on the inventions or Dr. Rudenberg? A. None at all. Q. Did your assignment to prepare applications on these inventions take precedence over your other work? A. Yes, I dropped practically everything that could be dropped, and started immediately to work on these applications. Q. I assume you prepared the patent applications. Did you prepare anything besides the specifications and claims? A. I prepared first this notice of an invention, and then prepared the specifications and claims. Q. Which is Defendant’s Exhibit I here? A. I also transmitted those two sheets of sketches to the draftsmen, and had drawings made (Avery, 1947, p. 185–186). On May 27 to 29, a Siemens ‘‘Erfindungsmeldung,’’ or ‘‘Notice of Invention’’ and two patent applications on the Ru¨denberg disclosure were prepared by Mr. Abraham. The first application, titled, ‘‘Anordnung zum Beeinflussen des Charakters von Elektronenstrahlen,’’ and numbered at Siemens, ‘‘48476,’’ Apparatus for Influencing the Character of Electron Rays, described and claimed Ru¨denberg’s single- and multiple- electrode apertured diaphragm electrostatic lenses for use to achieve a sharper and brighter writing spot in cathode ray devices, for a beam velocity selector, to control beam intensity, or to build a microscope of higher-than-optical magnification (Ru¨denberg, 1931b,n). A second application, originally titled ‘‘Einrichtung zum Abbilden von Gegenstanden’’ and numbered ‘‘48483,’’ Apparatus for the Imaging of Objects, described and claimed general high resolution electron microscope principles, and developed more specifically the use of Ru¨denberg’s electrostatic lenses to create a high-resolution, very high-magnification, image of an object via an electron beam. These two applications were developed with some further consultation of Ru¨denberg and were submitted to the German patent office in Berlin by Mr. Abraham on May 30, 1931.
236
H. Gunther Rudenberg and Paul G. Rudenberg
Siemens filed two additional patent applications with Ru¨denberg listed as inventor on June 26, 1931, and two more on March 30, 1932. It is not known when Ru¨denberg had disclosed these ideas. These applications included the designs for protecting biological specimens and for subsequent magnification by means of an optical microscope. In May 1932, applications corresponding to the two German applications of May 1931 were submitted in six countries: France, Austria, Switzerland, Britain, the Netherlands, and United States (Appendix 3). In these submissions the four later German patent applications were combined with initial application no. 48483. For the two U.S. applications, Ru¨denberg signed before the U.S. consulate on May 14, 1932, that he was the sole inventor of the contents of the applications (Ru¨denberg, 1931i,n). Besides the very short mention in Die Naturwissenschaften in May (Ru¨denberg, 1932), the first public announcement of the Ru¨denberg patents took place in Paris in early October 1932, when two patents, the electrostatic lens application and the combined microscope patent application, were displayed for comments. They were published on December 15 and 16 of that year. These publications were available by December 1932 and spoke clearly of Ru¨denberg’s design of a higher-than-optical resolution microscope based on electron lenses. Patents were granted on the combined microscope patent application in Austria (1934), Switzerland (1934), and the United States (1936). In the Netherlands, the original May 30 German application 48483 was again split off and perhaps combined with 48476, producing, interestingly, a 1938 patent titled ‘‘Inrichting voor het vergroot afbeelden met behulp van een electronenmicroscoop met electrostatische lenzen17 (Ru¨denberg, 1931m). In England and the United States, the electrostatic lens application was patented in the mid-1930s.
3.2. Early Work at Siemens Ru¨denberg’s microscope design was based on ideas he developed over the winter of 1930–31. In the pattern of other Ru¨denberg inventions, he had not yet built the apparatus at the time of his patent applications. He explained this later: The Court. I am going to interrupt because the chronology is quite important. Had you by that time in any way incorporated that theory in any kind of apparatus to demonstrate it? The Witness [Ru¨denberg]. No.
17
‘‘Apparatus for magnified images using an electron microscope with electrostatic lenses.’’
Origin and Background of the Invention of the Electron Microscope Commentary
237
The Court. Embodying it? The Witness. No. We were used throughout Germany to describe an invention as accurately as possible, in order to throw light possibly on the disclosure before such apparatus were built, because of economy. . . The Court. Just put it in your own words instead of saying ‘‘disclosure’’ because I am not quite clear yet. The Witness. I was very used to think an invention through to the very end before it was put into the real material by building an apparatus (Ru¨denberg, 1947, p. 51–52). The Ru¨denberg patent applications were distributed widely at SSW and the sister firm, Siemens and Halske (Ru¨denberg, 1947, pp. 203–205, Avery, 1947, p. 71). After filing his German patent applications of May of 1931, Ru¨denberg began practical analysis. In an early draft of his memoir, Ru¨denberg wrote that he requested two of his assistants, Dr. Max Steenbeck and Dr. G. Berthold, to carry out some initial experiments with electrostatic lenses, but after some initial failure they soon lost interest (Ru¨denberg, 1945a). Steenbeck in 1960 had reported that ‘‘He thus applied for a patent for this idea, without me knowing anything about it. If he had spoken to me about it, I would have found the suggestion nonsensical at that time anyway’’ (Steenbeck, 1960). Dr. Avery also remembered work in Ru¨denberg’s department, but not laboratory development, by Dr. Pohlhausen. On April 15, 1932, a patent application of Dr. Pohlhausen was filed by SSW (application 51698 of April 15, 1932, entitled ‘‘Bestimmte Form Electrostatischer Blenden’’18) but it was later withdrawn (Avery, 1964). On June 18, 1932, R. Swinne, who worked under Ru¨denberg at SSW, applied for two additional patents regarding the electron microscope: DE 915843, ‘‘Einrichtung zum Abbilden von Gegenstaenden,’’ and DE 916830, ‘‘Einrichtung zum vergroesserten Abbilden von Gegenstaenden in zwei Stufen.’’ The development of the microscope at SSW was soon stopped in its tracks by company policy. In 1931, Siemens Director Schwenn had not allowed Ru¨denberg a budget for research to develop an electron microscope he thought ‘‘hypothetical, a brainstorm,’’ certainly not within his Scientific Department for Electric Power Systems at SSW. Any experimental work on electron lenses and an electron microscope would have been under the umbrella of Siemens and Halske, designated to work on electrical and electronic instruments and communications equipment (Weigend, 1946). In this period of financial instability, a committee of managers that included directors of SSW and Siemens and Halske carefully allocated funds and determined research priorities for each Siemens laboratory. Ru¨denberg had discussed the issue with Dr. Schwenn on one occasion. However, ‘‘Schwenn’s position was that he did not see any
18
‘‘Specific Form of Electrostatic Apertures.’’
238
H. Gunther Rudenberg and Paul G. Rudenberg
commercial value in the electron microscope, and therefore refused to develop it commercially in his department for electrical instruments’’ and he said, ‘‘Besides, the fundamental patents are within the Siemens group, and so if somebody else should develop the same thing, we are protected’’ (Ru¨denberg, 1947, p. 142). Because in 1939 all Austrian patents became German property, Siemens was protected not only by their German applications, but by the Austrian patent (Ru¨denberg, 1931l), which contained five of Ru¨denberg’s early microscope applications. However, at Siemens and Halske, work on a microscope began. Soon after the Ru¨denberg patent May 30, 1931, applications, ‘‘the first electron microscope was constructed and tested in the Research Laboratories at Berlin Siemensstadt, then under the direction of Prof. Dr. Gerdien. The work in the Research Laboratories was done by Dr. Lu¨bcke,’’ chief physicist in the research laboratory on arc discharge tubes (Avery, 1964). Avery reports that The particular initial tubes involved were rather tall, about 1 meter or more. The first microscope was provided with a tubular glass vessel similar in shape and size. The lenses were magnetic. They produced an intermediate image and magnified it on a screen or photographic plate. The first object used for testing purposes was the incandescent cathode of the electron gun. The cathode had a planar surface with a scratched-in grid pattern. The first magnifications were moderate but clearly showed the workability of the principle (Avery, 1964). Dr. Avery had offered similar testimony in the 1947 case: Q. Do you know how Doctor Luebcke19 of Siemens and Halske came to make a reduction to practice of one or more of Doctor Rudenberg’s inventions? A. [Avery]. Yes, I do, because I prosecuted all other patent applications for Luebcke at that time. When the two German applications of Professor Rudenberg were filed, copies were distributed among other divisions and departments of Siemens and Halske and Siemens-Schuckertwerke that might possibly be interested in these inventions. Since I was working for the research laboratories, I sent them a copy.
19 Dr. Ernst Lu¨bcke, born 1890 (Wolfenbu¨ttel), received his Dr. Phil. in Physics, U. Go¨ttingen, in 1916. He worked at at the Siemens Forschungs-Laboratorium (1925-33) in the development of industrial acoustics, and later at the Dynamoworks of SSW (1933-1945). His research included, in part, gas discharge cathode ray tubes for the generation of high frequency sound. Dr. Lu¨bcke also taught at the Tech. Hochschule Braunschweig(1929-45). He contributed to the measurement, standardization, and reduction of machinery noise, and created the ‘‘Lu¨bcke rating curves’’. With L. Cremer, E. Meyer and others, he published ‘‘Schallabwehr im Bau- und Maschinenwesen’’ in 1940, opening the field of machine acoustics. Other writings included the psychology of hearing and underwater sound transmission. Lu¨bcke was appointed Professor in Experimental Physics at the Univ. of Rostock in 1946. However, he was forced to leave for the USSR that year, and remained until 1953. In 1953 Lu¨bcke began teaching at the Institute for Technical Accoustics at the Technische Hochschule, Berlin. He died in Berlin in 1971. (source: Univ. of Rostock, Catalog of Professors). Throughout the court testimony, the anglicized version of Dr. Ernst Lu¨bcke’s name is used.
Origin and Background of the Invention of the Electron Microscope Commentary
239
Q. Would you specify what research laboratories you have in mind? A. I am speaking of Siemens and Halske, doing more or less scientific research, as distinguished from the laboratories doing shop work or shop developments. Q. And you were previously explaining how Dr. Luebcke came to be interested in this subject? A. Well, Doctor Luebcke had a laboratory within the research laboratories, engaging in the development of a gas-filled electronic tube for generating high-frequency, intended for medical purposes, and he had at his disposal several glass blowers of exceptional skill, because his particular line required glass-blowing to a very large extent. And when he received a copy, well, actually I think, I don’t know when he received the copy—when I came to him about two or three months later I saw in his laboratory an electron microscope. It consisted predominantly of glass, with two electro-magnetic coils or lenses, and lots of conduits leading into and out of the structure, and connected with a pump that was running continuously during the operation; and he explained to me that he was trying to find out whether the concepts disclosed in these applications would work. Q. Did you see the results achieved? A. Yes, I saw a group of photographs that were quite excellent, as to what was intended to show. As a matter of fact, they were much better than many photographs taken later by himself and others. Q. Were Doctor Luebcke’s results published do you know? A. They were not published. He intended to publish them, and submitted a draft of publications to Siemens and Halske, but the company was strictly against any publication of this line, much to his dismay. He was very peeved about that. Q. Apart from prosecutions of applications on inventions by Doctor Rudenberg, what was the next subsequent patent work in which you were involved which related to electron microscopes? A. Aside from Rudenberg’s inventions? After filing a total of six applications for Professor Rudenberg, I filed in April of 1932, two applications on inventions made by Luebcke in the course of his developments (Avery, 1947). In 1964 Avery recalled further details of Dr. Lu¨bcke’s developments during 1932: Dr. Lubcke knew of [Ru¨denberg’s] disclosure. When I told him about it, he already had knowledge thereof and had already started developing or testing a microscope. The development at the Research Laboratory resulted in various modifications of the microscope. These included a reflection microscope, also vacuum lock devices for inserting specimens. Corresponding German patent
240
H. Gunther Rudenberg and Paul G. Rudenberg
FIGURE 12 Siemens patent list on the electron microscope, date unknown, carrying the initials Ab/vWn. Note that there are two Ru¨denberg patent applications listed with the date May 30, 1931, one by Ru¨denberg’s associate Dr. Pohlhausen, and two by Dr. Lu¨bcke, dated April 1932. The two Swinne applications, and Ru¨denberg’s patent on neutral electrostatic lenses (DE 915253), are not listed.
applications were filed, two of them in March [April] of 1932 [Figure 12]. This development work had nothing to do with the apparent simultaneous activities at the Technical College (Avery, 1964). In communication with Lily Ru¨denberg, Dr. Lu¨bcke remembered other influences (Lu¨bcke, 1964). He recalled demonstrating his work
Origin and Background of the Invention of the Electron Microscope Commentary
241
early in the summer of 1931 at a Colloquium sponsored by the head of the Research Laboratory, Dr. Gerdien.20 He also remembered attending a meeting of the Niedersachsen district association of the German Physics Society where Mr. Houtermans and others presented ‘‘Emission from Hot Cathodes by means of an Electron Microscope’’ (Knoll et al., 1932). Beyond Lu¨bcke’s applications, Siemens and Halske applied for a patent for a diffraction electron microscope on May 15, 1932 (no inventor listed). This patent, granted in 1954, describes the use of an electron microscope with magnetic or electrostatic lenses to produce an enlarged ‘‘structure image’’ from the diffraction patterns of crystalline specimens, (Siemens and Halske, patent DE 909156, 1932). Dr. Ferdinand Trendelenburg investigated the use of cathode rays for electron diffraction at the Research Laboratories, but it is not known if he was involved in this patent. Little information is available on the early electron microscopes developed in the Siemens Forschungs-Laboratorium in 1931 and 1932, though their history would provide a more complete picture of the early development of the instrument. Dr. Lu¨bcke had planned to publish his work in the Journal of High Frequency and Electron Acoustics, but his draft was withdrawn before publication. Siemens and Halske had reached some agreements that electron optics was supposed to be developed by the firm AEG (Lu¨bcke, 1964).21 According to Avery, The Research-Laboratories did work for both Siemens and Halske and Siemens-Schuckertwerke. Dr. Lu¨bcke was eager to publish a paper on the microscope but S and H refused to permit this. My knowledge of this information is secondary. I only remember vividly that Lu¨bcke was extremely upset because he was not permitted to publish’’ (Avery, 1964). Ru¨denberg had also prepared a contribution for publication on the scientific aspects of his electron microscope design and received a similar response.22 The management of Siemens reduced this manuscript to a very short note, published in Die Naturwissenschaften (Ru¨denberg, 1932). An informal agreement between the management of Siemens and that of the firm AEG, such that Siemens would not go further into the field of electron optics, was in effect as early as 1932 (Mu¨ller, 2008), and in late 1932 all work on the microscope was apparently halted. This agreement may have related to the work of the AEG Research Institute in electron
20 The research laboratory leadership cultivated close contacts with university colleagues and published its own journal, Wissenschaftliche Veroo¨ffentlichungen aus dem Siemens Konzern. 21 Allgemeine Elektrisita¨t Gesellschaft (AEG), a German electric power engineering and electrical equipment firm, though a competitor of Siemens, also engaged in a number or cooperative research and development agreements with the Siemens companies. 22 Once in the United States, Ru¨denberg published the scientific aspects of his patents in the Journal of Applied Physics in 1943.
242
H. Gunther Rudenberg and Paul G. Rudenberg
optics. Ru¨denberg seemed to be unaware of this agreement, but clearly felt the frustration of neither being unable to continue work on his invention nor publish the associated scientific aspects. However, despite Ru¨denberg’s lack of publication on the details of his microscope other than in patents, the early work on electron microscopes in the Berlin research laboratories of Siemens and Halske (the company which later built the first commercial electron microscope), laboratory work that began with the distribution of the first Ru¨denberg patent applications in 1931, leaves open for discussion the question of whether Ru¨denberg played a part in the early development of the electron microscope. By 1933, the new regime had come to power and although Ru¨denberg was a highly esteemed engineer at Siemens, he felt that it would have been highly imprudent to oppose or try to change the management edict against research on an electron microscope at the Siemens companies: ‘‘Under the reign of the Nazi philosophy at that time I could only yield to their decisions.’’ In addition, discussions to clarify Ru¨denberg’s patent relationship with Siemens were ‘‘made impossible because the Nazi regime then in power had restricted my freedom of action and finally led to my dismissal from S.S.W’’ (Ru¨denberg, 1946b). Ru¨denberg, forced into silence in the wake of one of his most exciting new ideas, was clearly disheartened with these developments. Despite this situation, E. Ruska, who had built his first higher-thanoptical electron microscope with magnetic lenses in 1933, may have met with Ru¨denberg in September of 1935 to discuss their work and the issue of patent priority. After correspondence between the two initiated by Ruska in the summer (Ruska, 1986), a meeting between them appears to have been planned for Monday, September 16, 1935, at 10 AM with the subject, ‘‘Electron microscope’’ (Figure 13) followed by a meeting on September 20 of Ru¨denberg with Dr. Lu¨bcke, and the next day with Wolf (perhaps Dipl. Ing. Wolf of the patent department). Neither Ru¨denberg nor Ruska later mentioned the meeting nor what was discussed.23 After Ru¨denberg had emigrated, Ruska and von Borries visited Siemens and Halske to present their patent applications dealing with electron microscopes (Avery 1947, pp. 205–206). Avery was asked to study the offered applications and give an opinion as to the chance of obtaining patents, on the question of dependency on the earlier applications, and perhaps on their commercial value. By late 1936, AEG appeared to be involved in electron microscope production and Siemens and Halske reentered the field.
23
The meeting, though scheduled, may not have taken place; however, other calendar entries show that Ru¨denberg had a habit of crossing out any postponed or canceled meetings.
(a)
(b)
FIGURE 13 Reinhold Ru¨denberg’s 1935 Siemens diary (a), The diary lists meetings the week of September 15, 1935, with Dr. Ruska (16th), Lu¨bcke (20th), and Wolf (21st) (b). NOTE: To this day, only the 1935 and 1936 diaries have been found.
244
H. Gunther Rudenberg and Paul G. Rudenberg
Ru¨denberg, however, had to leave Siemens’ work on the electron microscope to others. He had a much more pressing concern—his life and the lives of his family might be in danger.
3.3. Leaving Germany Discerning and negotiating the impending dangerous situation in Germany, and then moving his family to new countries and twice reestablishing his work life, took its toll on opportunities to do practical work on this important invention. He was not able to return to the task until his consultancy on the Farrand microscope, twelve years later. As early as 1932 there appeared to be tension among employees based on politics and culture affecting their relationships. In many situations, Ru¨denberg’s ability to implement his work was affected by the political situation. He avoided contact or conversation with certain former colleagues. In his statement honoring Ru¨denberg’s 25 years with the Siemens firms in 1933, Carl Frederich von Siemens lamented the country’s situation that ‘‘has paralyzed so completely the sense of security, and has inflicted bitter wrong on deserving men’’ (von Siemens, 1933). Another colleague wrote to the Ru¨denbergs wishing ‘‘that the unfavorable tendencies of the moment, which have also impeded Reinhold’s professional effectiveness, be of restricted duration’’ (Ru¨denberg, L., 1983). Reinhold used to discuss these problems with von Siemens. During the first years of the Nazi regime Carl Friedrich tried to persuade Reinhold to be patient, that it would all blow over and later seem to be like a bad dream, that he would be able to keep Reinhold in his present job and would protect him, as long as Reinhold would be modest and would abstain from trips outside of Germany in his capacity as Chief Electrical Engineer of the firm (Ru¨denberg, L., 1983). By 1935 the situation deteriorated professionally and personally such that there was no choice but to take radical steps. Ru¨denberg (1959) writes: ‘‘Forced by the Nazi system, I left Germany in 1936.’’ This brief statement provides no clue to the deep anxieties the Ru¨denbergs felt for many years as they debated, prepared to leave, and then finally departed Germany. Ru¨denberg’s teaching position at the Berlin Technical University had been withdrawn and he was prohibited from entering the grounds (but after the war, he was made an honorary senator of the school). After these incidents and then the Nazi Nuremberg laws of 1935,24 24
September 15, 1935, promulgation of the ‘‘Nuremberg Laws for Protection of German Blood and Honor.’’ This was the new German (Nazi) National Law of Citizenship. It included definition of the term ‘‘Jew,’’, Aryan requirement for all official appointments, and many other restrictions on Germans of Jewish descent.
Origin and Background of the Invention of the Electron Microscope Commentary
245
which relegated Jews to the status of second-class citizens or worse, Ru¨denberg said it took him two days and two nights to resolve the matter in his mind that the family should leave Germany as soon as practical. Ru¨denberg had been watched in his business travels, in February or March of 1936, when he had spoken with a former neighbor, Alfred Kerr, in Paris during a convention to find out about his daughter, who had been ill. A Siemens colleague had seen them together, and in the usual questioning by German police after such a trip, reported this. The police wanted to know what horrible thing Ru¨denberg was planning with Kerr. He was not able to convince them that he merely bumped into the old neighbor and said hello (Howard, 1985; Ru¨denberg, L. 1983). As a result of this incident, Ru¨denberg’s passport was apparently confiscated by the police. However, through the careful work of the Siemens management, they were able to have the passport returned. He told only Carl Friedrich von Siemens, the head of the firm, whom he always regarded with great loyalty, and two of his colleagues at SSW about his decision to leave: Personnel Director Dr. Dietrich von Witzleben, and Dr. Fessel. Each knew Ru¨denberg well from his long service at the firm. Now, with the new Nuremberg laws, they understood and supported his intention to leave Germany, offering to help as judiciously as they could within the political environment of the time. In support, they provided him with a letter to justify his ‘‘business’’ trip to the authorities at the German border. The letter directed him to travel to London on behalf of the Siemens company to explore prospects of licensing foreign patents held by Siemens and to try to obtain such license payments in foreign currency. At that time, the Reich government was very eager to bring in as much money as possible from abroad to help fund purchases for secret war preparations. At C. F. von Siemen’s recommendation, Ru¨denberg had made preparatory contacts with the British General Electric Company. Ru¨denberg’s excellent professional reputation was known throughout Europe. With a letter of recommendation from chairman von Siemens, Ru¨denberg was quickly hired as an in-house consulting engineer, to work at the company’s headquarters in Wembley, north of London. On the April weekend before he had planned to leave, chairman C. F. von Siemens invited Ru¨denberg and Lily to a formal dinner party at the von Siemens home. This dinner was given to honor Ru¨denberg in a covert farewell party, although this fact was never mentioned. Also present were the von Witzlebens and the Japanese ambassador to Germany. The ambassador’s presence made this an international goodwill dinner, a strategy meant to avoid possible rumors. In this way, their hosts made this dinner with the Jewish couple politically credible. A few days later, Ru¨denberg attended a business appointment at Siemens in the morning. Calling attention to his plans by cleaning out
246
H. Gunther Rudenberg and Paul G. Rudenberg
FIGURE 14
Telegram from Reinhold Ru¨denberg to Lily Ru¨denberg, April 3, 1936.
his desk and his office was out of the question—papers and notes useful later in defense of his creative work were likely there.25 Ru¨denberg quietly, presumably without being able to say farewell to anyone, even his trusted assistants, took a taxi to the Charlottenburg train station in Berlin to catch the overnight train to Amsterdam and London. His daughter Angelika had previously been sent to school in France and his oldest son, Gunther, to Felsted School, near Stansted, England. Ru¨denberg traveled alone, to avoid attention, ostensibly on his business trip. To reinforce this strategy he carried a full round-trip ticket, the unused return portion of which he would later mail back from London. On the next day, Lily received a telegram from Amsterdam, thus indicating her husband’s safe crossing of the border into Holland (Figure 14). The telegram asked her to come to Amsterdam at once because Gunther’s school required this. That day Lily took the next available train with 7-year-old Hermann. They joined Ru¨denberg in Amsterdam without any disturbing incident as the train crossed the border out of Germany. From there, they traveled together to London to reunite the children and start a new life. They soon
25
The authors have worked with the Siemens Corporate Archives in Munich, which has repeatedly tried to locate such documents, including invention drawings that might have been left in his office, as well as the initial patent applications 48456 and 48483, but to this date none have been located. The intervening war, and the subsequent occupation, led to huge losses of archival material.
Origin and Background of the Invention of the Electron Microscope Commentary
247
found a house in nearby Northwood and expected to stay in England from then on. The Siemens firm graciously, and perhaps with some risk, packed up the family belongings at their Charlottenburg home and shipped them to England. Unfortunately, it seems that remaining personal papers from Ru¨denberg’s office were probably not similarly sorted and shipped, as many notes and letters from the early 1930s that would shed light on the controversy that ensued later have never been located. Ru¨denberg began work at the British General Electric Company and lectured at Queen Mary College in London. Beginning in 1935, and while in England, Ru¨denberg’s health was poor. He had not known whether he and his family would escape alive, and he was uncertain if he would find suitable employment again. The trauma of the recent years, plus the knowledge about so many family and friends in danger in Germany, took their toll on his health. His physician recommended that he officially cut his ties with Siemens and his work in Germany (Ru¨denberg, 1938). Unexpectedly, early in 1938, an American friend, Westinghouse engineer Joseph Slepian, who was returning from a European trip, visited Ru¨denberg in England. On the request of Dean Westergaard, Slepian asked if Ru¨denberg would be willing to take the post of Gordon McKay Professor of Electrical Engineering at the Graduate School of Engineering at Harvard University, where the previous department head had passed away. The university invited him to pay a visit for a guest lecture at the Harvard Engineering School and discuss this appointment further, after which Ru¨denberg accepted the new position. So now, the family prepared to move again, at the end of September 1938. The ocean crossing was rather rough as the S.S. Volendam (Holland America Line) encountered the last gasps of a New England hurricane at sea. The arrival date of October 16 has from that time been kept as a kind of family ‘‘Thanksgiving day’’ after the many years of uncertainty.
4. THE FATE OF THE PATENT APPLICATIONS A year after the family arrived in the United States, war broke out between Germany and England, and in December 1941, following the Japanese attacks on Pearl Harbor, the United States also entered this war. In 1942, as part of the war effort, patents and all other intellectual property of German firms were subject to a vesting order of the U.S. Alien Property Custodian (Figure 15). The two Ru¨denberg US patents on the electron microscope, officially assigned to Siemens, were confiscated by the Alien Property Custodian. In 1942, Ru¨denberg filed a first claim for his share in licensing, but no action was taken. Eventually he had to initiate legal proceedings to have
248
H. Gunther Rudenberg and Paul G. Rudenberg
FIGURE 15 Newspaper article on use of German patents by US industry, including those describing the electron microscope.
them restored. Judge Wyzanski later wrote, ‘‘You will also observe that the United States Government in this hotly-contested case would have proved if it could have proved, that Ru¨denberg was not in fact the inventor and would have proved, if it could have proved, that Ru¨denberg’s invention was not highly original and creative’’ (Wyzanski, 1961). However, in 1947 the two U.S. patents were restored to Ru¨denberg in title and ownership as inventor (Wyzanski, 1947a). Ru¨denberg felt, and wrote in his memoir, that the scientific ideas contained in his applications, widely distributed at Siemens and published in 1932 in his French patents, should have been mentioned in the work of four early German authors on electron optics (Ru¨denberg, 1959), several of which referenced patents as well as papers. He blames Nazi
Origin and Background of the Invention of the Electron Microscope Commentary
249
laws of the 1930s for forcing this omission. In testimony, director of the Siemens patent department in 1938, Dr. Paul Weigend, agreed that this is a reason for the lack of citing of Ru¨denberg’s accomplishments by another author. However he also states that ‘‘only those inventors are named which have practically worked on the performance of the electron microscope’’ (Weigend, 1946). D. Gabor later wrote to Ru¨denberg in 1946 that another reason could have been that the ideas were expressed in the form of patents, not papers: I am afraid, it can happen only too easily that scientists who file patents instead of writing papers find themselves forgotten. In my case this was a matter of not noticing the patents, but others disregard patents as a matter of principle. For instance in Appleton’s Encyclopedia article on Thermionic Tubes Edison’s name is not even mentioned. A similar thing has happened to me recently. In 1940 I have invented and patented a lens system which three years later was reinvented by the Russian Maksutov and described in the Journal of the American Optical Society, and in the Scientific American. The latter journal even refused to call attention to my patent, when I brought it to their notice. It seems to be more or less accepted practice nowadays in the scientific world, that patents are priority before the Law, but not in science. This being so I shall draw my own consequences and publish preliminary notices whenever possible (Gabor, 1946 [italics added to journal titles]). However, Fruendlich, who worked under Max Knoll in the hightension laboratory, reported that although his doctoral thesis was printed and ready to be bound together with those of four other graduate students in the lab, only three appeared in the research journal of the Studiengesellschaft (fu¨r Ho¨chstspannungsanlagen). Those of Lubszynski and himself ‘‘were suppressed due to Nazi persecution’’ (Freundlich, 1994). To the present authors it appears that (1) Ru¨denberg’s description of conditions to maintain higher-than-optical resolution in electron imaging, (2) his early proposed use in a microscope of single- and multiple-electrode apertured diaphragm electrostatic lenses, (3) his design of the first velocity selector in an electron microscope, and (4) the first mention of conditions for electron imaging of submicroscopic living specimens were significant. However, these innovations are not mentioned in scientific literature in association with Ru¨denberg’s name. Three of the four early German authors Ru¨denberg mentions do cite certain patents. Bru¨che (1943) mentions the 1927 patent of H. Stintzing, suggesting his invention as a precursor of the scanning electron microscope, and the M. Knoll (1929) design for an electrostatic einzel lens for oscillographs. Ku¨pfmu¨ller (1944), using the same format for both journal references and patents, mentions both the Knoll patent and the Ruska and von Borries (1932) magnetic pole piece lens. Yet neither mentions
250
H. Gunther Rudenberg and Paul G. Rudenberg
Ru¨denberg’s patent on the first description of one- and three-electrode apertured lenses for the imaging of objects. However, in 1953, Gabor states ‘‘. . . if one considered the electron microscope as an invention, the following dates would be of great importance’’ and lists the dates of the patent applications of H. Stintzing (May 14, 1927), M. Knoll (November 10, 1929), and R. Ru¨denberg (May 30, 1931). T. Mulvey (1962) also includes Ru¨denberg’s patents, as well as those of Knoll and Stintzing, in his ‘‘essentially historical’’ treatment of the origins of the electron microscope, stating patent specifications should be mentioned ‘‘especially when such references can throw light on the state of scientific opinion at any given time.’’ More recent literature on the early history of the microscope includes the significant patents (Lin Qing, 1995). Thus, it seems probable that some of the early German authors writing on the invention and early history of the physical instrument might have also cited these relevant Ru¨denberg patents if there were not restrictions on mentioning non-Aryan accomplishments between 1935 and 1945. In Germany, the final fate of the two applications was a prolonged and complicated one. The twenty- two-year process moving toward the granting of these patents produced some battle scars in the statements and claims. In the years 1936 to 1939, as the result of interactions between the Reich patent office and Siemens after Ru¨denberg had left Germany, these applications were reworked and divided into five parts, with some important changes in language.26 The original applications of May 30, 1931, have, regrettably, not been located either at the Siemens Archives or the German patent office. For this discussion, it is assumed that the first pages of the U.S. applications of May 1932, which strongly resemble the published patents issued in France, Austria, and Switzerland, are the closest known documents to the original German applications. In his court testimony in 1947, Ru¨denberg specifically clarified that the German patent applications were separate from the German patents: ‘‘I don’t know of German patents, but only of German patent applications on the electron microscope.’’ He also stated that the German applications were ‘‘practically identical to the two American patents at issue’’ (Ru¨denberg, 1947, p. 142). However, the claims of the U.S. patents had been given a new form, different than their 1932 application, to correspond to requirements of U.S. patent law. Thus, the original patent claims of May 30, 1931, may be best understood by looking at the French and Swiss Ru¨denberg patents together with the 1932 U.S. applications.
26
The Siemens patent department has suggested that patents were often subdivided if the result would produce several separate instruments (personal communication, August 2009).
Origin and Background of the Invention of the Electron Microscope Commentary
251
The journey of the German applications began as follows: The electrostatic lens application, Siemens no. 48476, Arrangement for influencing the character of electron rays, was divided into two or three sections, probably in late 1935. Two parts were resubmitted to the German patent office in early 1936 and patents granted in 1939 and 1942. The two patents derived from the one original application were: DE 754259 Arrangement to control the intensity of electron rays passing through an aperture, which described and claimed the method of controlling electron beam intensity and the beam velocity by way of electrostatic diaphragms, and DE 758391 Arrangement for influencing the course of convergent electron rays with flat- or ring-shaped electrostatically charged apertures, which claimed Ru¨denberg’s principles of design of electrostatic lenses, and their various forms. By comparison with the more complete foreign patents it is possible to determine that these two published patents carried some of the drawings and text from the original application, and some common text. Although these two patents were granted and printed in a draft form, they were not published, and interestingly, at no point mention the name of the inventor. Much later, in 1953, a third patent derived from the same electrostatic lens application 48476 was granted and published, DE 889660, which claimed the single- and three-aperture einzel lenses, and which overlapped greatly with 758391 and may have replaced this patent. The second May 30 initial patent application, ‘‘Apparatus for the Imaging of Objects,’’ Siemens application number 48483, had a more prolonged and difficult history. In 1937, this application had been split into two separate applications. Ru¨denberg was not in Germany at the time. ‘‘Arrangement for enlarged imaging of objects by means of electron rays’’ [which described an electrostatic lens microscope (Ru¨denberg, 1931c)], and ‘‘Arrangement for enlarged imaging of objects by means of electron rays and influencing the course of the electron rays through electrostatic or electromagnetic fields’’ (Ru¨denberg, 1931d). The latter of the two was first displayed in Germany on May 31, 1938. This application was over the years subject to fierce opposition, as can be seen from the list below. The television manufacturer Fernseh A. G., the Berlin company Gotthard Sachseberg Zentral G.m.b.H, and the electron microscope producer AEG presented objections (Eisenzapf, 1953b).
252
H. Gunther Rudenberg and Paul G. Rudenberg
The list of patents and papers presented by these companies and by the patent examiners in 1938–39 challenging this application were as follows: Patents DE 595784 (Rogowski) DE 373834 and application 30642 (Lilienfeld) DE 431220 (Rogowski) FR 644776 (Sabbah) GB 295710 (Rogowski) GB 315362 (Tihangi) FR 737716 (Ru¨denberg) US 2058914 (Ru¨denberg) OE 137611 (Ru¨denberg) Papers Busch, H., Archiv fu¨r Elektrotechnik, 1927 Schmidt, G. C. , book ‘‘Die Kathodenstrahlen,’’ 1904 and 1907 Bruche and Scherzer, ‘‘Geometrische Elektronenoptik,’’ 1934 pp. 87–89 and 168–169 Scherzer, Zeitschrift fu¨r Physik , no. 101, 1936 p. 593 Bru¨che, Archiv fu¨r Elektrotechnik, no. 27, 1933, pp. 266, 273 Ruska and Knoll, Zeitshrift fu¨r Tech. Physik, no. 12, 1931, pages 389, 398, 399 Zeitschrift fu¨r Tech. Physik , no. 17, 1936, p. 596 Rusch, M., Annalen der Physik, no. 80, 1926, pages 707–727 Draht et al. , Zeitshrift fu¨r Tech. Physik, no. 52, 1928, pages 8–15 Bru¨che in von W. Peterson, Forschung und Technik, 1930, p. 28 (Eisenzapf, 1953b) A striking aspect of the list of challenges is that three Ru¨denberg patents were presented. As shown below, the German patent application eventually differed from its foreign counterparts, and in this period it may not have carried the Ru¨denberg name. The new title of this application (a title that apparently was derived from the summary claim of the French combined microscope patent, FR737716 ‘‘Dispositif pour obtenir des images d’objets’’) emphasized the general principle that both electrostatic and ‘‘electromagnetic’’ fields are used to influence the microscope electron beam. In fact, in the text of one of the published German patents derived from this application, use of ‘‘electromagnetic fields’’ to produce images is stated as prior art: To be able to produce an enlarged image of an object by means of electron rays, it is known that the bundle of electron rays is allowed to pass through a magnetic coil. The rays are pushed by the magnetic field into the axis of the beam which coincides with the axis of the coil, so that the rays converge in a focal spot or focal point.
Origin and Background of the Invention of the Electron Microscope Commentary
253
The magnetic coil therefore acts for the electron rays like a convergent lens in optics (DE 895,635). In the corresponding French patent (as well as the Austrian and Swiss patents), the individual claims following this summary specify that the beam is made homogeneous by electrostatic or magnetic fields, but Ru¨denberg also clearly specifies that ‘‘the fields causing enlargement are created by fields of electrostatic charge surrounding the rays in a symmetrical fashion.’’27,28 In the U.S. patent application from 1932, and most of the foreign patents, the inventor includes a paragraph that traces the roots of his or her idea: In cathode-ray oscillographs, electro-magnetic fields surrounding the beam concentrically and causing a stricture of the beam are often employed. These fields have an effect on the beam of rays similar to that of an optical lens on a beam of light; they many cause the rays of the beam to become convergent or divergent. Consequently, with such stricture fields an enlarged or a reduced luminous image of the cathode ray beam is, according to circumstances, also obtained on the fluorescent screen. This enlargement or magnification of shadows has hitherto been regarded only as a secondary phenomenon of minor technical importance in the use of electron rays. The reduction has in the past only been employed to obtain a sharply defined luminous spot (focus) (Ru¨denberg, 1931i). However, this paragraph and the figure representing a concentrating coil were no longer found in the German patents published in 1953 and 1954. The 1927 Busch paper key to this history—and Bru¨che (1930)—were cited as references in the German patent. By the time of publication in 1953, the concept of an ‘‘enlarged shadow of an object impervious to the rays’’ found in the early U.S. application and the foreign patents was replaced by that of a magnified image produced by penetrating the object.29 In patent 895635 the figure showing the Ru¨denberg electrostatic lens was also removed. At one point the changes 27 Translation of patent FR737716, page 6 claim 2e. ‘‘Les champs effectuant l’agrandissement sont engendre´s par des e´crans a` charge e´lectrostatique , qui entourent le rayon de manie`re pratiquement syme´trique. See also Swiss patent CH165549 claim 5. Note on the French claim 2a, an apparent error in the text of the claim places the monochromator after the object. 28 In his late patent application (Aug. 15, 1932), granted and published in 1954 as DE 915253, Ru¨denberg includes one magnetic collecting lens in the figures but the patent discussion is entirely on his electrostatic lenses. 29 U.S. application 613857 (1932) Owing to the fact that in fieldless spaces electron rays travel, similar to light rays, in straight paths, an object impervious to the rays, when placed in a divergent bundle or beam of rays, produces an enlarged shadow on a fluorescent screen onto which the rays are projected.
DE 895635 (published 1953) Because electron rays travel, similar to light rays, in straight lines in fieldless spaces, they can be used to enlarge images of objects. The electron rays can thereby be emitted directly from the object, the image of which is to be magnified, be reflected by it or can penetrate the object.
254
H. Gunther Rudenberg and Paul G. Rudenberg
appear to more specifically increase the emphasis on ‘‘electromagnetic’’ lenses. The difference of negative and positive electrostatic stops mentioned in the early application (1932) had been modified by 1953 to ‘‘the electromagnetic electron lens and the negatively and positively charged electrostatic lenses.’’30 Ru¨denberg mentions electron imaging with magnetic lenses in his memoir and papers, and the U.S. 1932 patent application also claims the use of ‘‘at least one magnet coil surrounding the beam substantially symmetrically’’ (Ru¨denberg, 1931i). However, in the other foreign patents of this early date, Ru¨denberg describes the magnetic coil as being used primarily to make the beam homogeneous, or parallel (Ru¨denberg. 1931g,h). These splits and subsequent changes leading to the German published patents are most likely a response to the demands of the German patent examiners to the intense challenges posed to these applications over many years, as well as a result of the turmoil of the war years. They were made in years of renewed electron microscope activity at Siemens without the involvement of Dr. Ru¨denberg, who had emigrated. In summary, they emphasize concepts that Ru¨denberg did not emphasize in his 1932 foreign applications. Any attempt to understand Ru¨denberg’s original ideas and claims should not depend alone on the German patents published in the 1950s, without at least also considering the two unpublished German patents, the much earlier U.S. applications, and the foreign patents of the 1930s. The Siemens patent department attempted to keep Ru¨denberg informed throughout the history of the German patents, noting when patents were announced, challenged, or finally granted. When Ru¨denberg questioned the procedures, especially when he heard of the splits, he would dialog with the Siemens patent department. However, he was not aware when the U.S. patents were granted in 1936 and 1937. By 1954, all of the microscope patent applications were granted as German patents, albeit in the split and modified form. After the final patents were granted, Ru¨denberg summarized and titled his patents with priority dates of May 30, 1931, as shown in Figure 16. Ru¨denberg
30
U.S. application 613857 (1932 ) The negative stop, therefore, has the same effect as a convex lens in optics, and the positive stop has that of a concave lens. By combining stops of that kind, all the devices known to optics and based on converting or diverging beams can be imitated for electron rays. It is in this manner possible, for example, to make a microscope or a telescope for use with direct or reflected electron rays. DE 895635 (published 1953) Because, as described above, the electromagnetic electron lens and the negatively charged electrostatic electron lens correspond to a convergence lens in optics, while the positively charged electrostatic electron lens corresponds to a divergence lens in optics, all of the devices known in optics that are based on convergent or divergent beam bundles can be imitated for electron beams by juxtaposing lenses of this type. It is therefore possible in this manner to construct a microscope or telescope that accepts direct or reflected electron rays.
Origin and Background of the Invention of the Electron Microscope Commentary
255
FIGURE 16 Ru¨denberg’s personal notes on the five German patents carrying the May, 31, 1931, priority date: 889660 Electrostatic Lens with 2 or 3 apertures; 895635 Compound Microscope or Telescope; 758391 Electrostatic Lens, 2 electrode (and 1 plane?); 754259 Intensity Controller; and 906737 Electron Microscope with Electrostatic Lenses (Ru¨denberg, 1954).
was encouraged that they were finally granted in Germany, and related this to his friend Dr. D. Gabor (Figure 17): With all legal challenges overcome the twenty-year journey of the patents in Germany was complete.
4.1. Controversy To Ru¨denberg, granting of the German patents in 1953 and 1954 was important, not only as a statement of the primacy of his ideas, but also because the state of his German patent applications before 1953 had cast doubts in commercial spheres. RCA was one of the early U.S. firms to produce a commercial electron microscope, protected by U.S. patents held by the Alien Property Custodian. In 1945, H.G. Grover, assistant to the vice president of RCA, wrote of ‘‘the probability that the question of [patent] ownership will not be decided for a long time and might not be entirely clear’’ (Grover, 1945). RCA, which expressed doubt about the validity of the Ru¨denberg patents, sent investigators to Berlin in 1947 to request that the Russian officials look for Siemens and Halske records that might shed light on the subject (Wyzanski, 1947b). A year later, after the
(a)
(b)
FIGURE 17
Letters of October 13 (a) and November 16, 1953 (b), from Ru¨denberg to Gabor (Ru¨denberg, 1953c,d).
Origin and Background of the Invention of the Electron Microscope Commentary
257
two U.S. patents were restored to Ru¨denberg, RCA delayed in taking a license under them because of ‘‘the disorganized state of German patent office records which we wish to examine to determine why German patents were not granted to Rudenberg’’ (Grover, 1948). However, the RCA reluctance to license based on the German patent situation in the 1940s added a further reason for Ru¨denberg to be satisfied with the 1953 and 1954 conclusions of the German patent office. Although he held numerous patents, Ru¨denberg saw this invention as his most significant contribution to humanity, and he displayed a copy of his microscope patent together with an electron micrograph of the polio virus prominently in his study. This heartfelt and forthright claim, which implied he felt others were aware of and employed his original ideas, was quite irritating to his critics, even stimulating them to write against his patent priority.31 In particular, although Ru¨denberg mentioned Dr. Max Knoll’s important work,32 at four different points in his memoir, it was never as coinventor of the electron microscope (Ru¨denberg, 1959). This is due in part to Ru¨denberg’s view that only a design for higher-than-optical resolution ¨ bermikroskop’’) should be considered an ‘‘electron instrument (an ‘‘U microscope.’’ Ru¨denberg had sent his memoir to George Clark, the editor of the Encyclopedia of Microscopy. Dr. Knoll, perhaps having read his friend Clark’s assessment of Ru¨denberg’s memoir draft, contacted Ru¨denberg twice by way of a patent lawyer, Mr. Theodore Hafner, in New York to discuss the issue of priority of invention (Figure 18). During Ru¨denberg’s lifetime, questions about the source or stimulus of Ru¨denberg’s electron microscope disclosures had primarily taken the form of rumors, similar to those begun about many non-Aryan scientists in the politically tumultuous 1930s. D. Gabor, who had been present after the war for at least one interview of German scientists, wrote to Ru¨denberg that ‘‘apparently the Nazis are very much incensed with your patents’’ (Gabor, 1946). Ru¨denberg refused to respond to the speculation or become involved in the controversy. Ute Siemens, widow of Georg Siemens, wrote to Lily Ru¨denberg in 1963 that von Siemens had felt at the time that such rumors were not consistent with the character of 31 Borasky, private communication to H. G. Rudenberg, October 23,1963; H.G. Rudenberg Archive: ‘‘[Professor Knoll and Dr. Freundlich] are at a loss in trying to understand why the obituary for your father published in Physics Today, cited the invention of the electron microscope as your father’s most significant achievement and contribution to science, almost to the exclusion of the many other significant contributions to knowledge made by him in a long and illustrious career’’ [italicized title added]. 32 See Knoll and Ruska (1932a). This paper was based mostly on material from an open seminar talk by Professor Max Knoll, which described his student Ernst Ruska’s work, that had been given on June 4, 1931, at the TH Charlottenburg (now TU Berlin). Additional material about work done after the seminar was included in the published paper. The paper described Ruska’s experimental investigation of geometrical electron optics in a cathode ray oscillograph that had been adapted to investigate experimentally the lens theory published by Hans Busch in 1927. In later years, the authors referred to this paper as defining an electron microscope.
258
H. Gunther Rudenberg and Paul G. Rudenberg
FIGURE 18 Letter from Dr. Knoll’s friend, Mr. Theodore Hafner, to Prof. Dr. Reinhold Ru¨denberg, May 27, 1960.
Origin and Background of the Invention of the Electron Microscope Commentary
FIGURE 19
259
Reply of Reinhold Ru¨denberg to Mr. Hafner, September 29, 1960.
Ru¨denberg (Siemens, 1963). Ru¨denberg had repeated the details of the mental process and the timing of his invention consistently over the years in journals, sworn court testimony, and in his memoir (Ru¨denberg, 1943, 1947, 1959). However, Ru¨denberg wrote to Mr. Hafner (Figure 19) expressing regret that such a discussion meeting had not taken place many years earlier, perhaps during the years when Dr. Knoll had been at Princeton University and RCA. Ru¨denberg had proposed to meet with Mr. Hafner in early December. It is not clear if this meeting took place or if some common understanding was reached. Probably not: Letters questioning the source of Ru¨denberg’s patents began circulating in electron microscopy circles beginning soon after Ru¨denberg’s death in December 1961 and eventually were published in 1985 (Hawkes, 1985). In October 1960, after Ru¨denberg had responded to Mr. Hafner, Dr. Knoll wrote to Dr. Max Steenbeck, who had worked under Ru¨denberg at SSW (Knoll, 1960b). Dr. Knoll felt that at the time of Ru¨denberg’s 1931 microscope patent applications, Knoll had already been building electron microscopes for ‘‘1½ to two years’’ (Knoll, 1960b). In a separate letter he stated of the Ruska/Knoll apparatus, ‘‘with the electron microscope close
260
H. Gunther Rudenberg and Paul G. Rudenberg
at my desk, I succeeded with [Ruska] in obtaining pictures of considerable magnification’’ (Knoll, 1960a). Siemens, AEG, and other company representatives had paid regular visits to, and received reports from, Knoll’s Cathode Ray Oscillograph Laboratory, which was partly supported by the ‘‘Studiengesellchaft fu¨r Ho¨chstspannungsanlagen,’’ an industrial research organization of which Siemens (and AEG) were members (Knoll, 1960a). It seems likely that information and suggestions were exchanged in both directions during these visits. Dr. Steenbeck may have been one of those regular visitors to observe the work of the laboratory. Dr. Knoll received a reply from Dr. Steenbeck in November 1960. In his response to Knoll, Dr. Steenbeck gives his remembrances of the Ruska/ Knoll apparatus of the spring of 1931. Specifically, Steenbeck was ‘‘particularly struck by the experimental demonstrations of the optical laws especially with a view to their use for cathode ray oscillographs,’’ and he found the Ruska experiments ‘‘a thoroughly solid and interesting fundamental study, of practical importance for sharp imaging in oscillographs’’ and a ‘‘convincing proof of the Busch lens formula’’ (Steenbeck, 1960). Steenbeck wrote that he recalled a magnetic coil ‘‘formed a magnified image of a fine wire mesh, through which the electron beam passed, by means of a relatively short-focal-length coil’’; and that ‘‘Distortion was clearly present in the image of the mesh.’’ Perhaps ‘‘during this same visit’’ he saw severely distorted images of the anode aperture by use of electrostatic lenses of curved and straight wire mesh (Steenbeck, 1960).33 It is not made clear in his descriptions if he saw two-stage magnification or the imaging of a second mesh at a distinct object plane. These observations mirror Mulvey’s analysis of the Knoll/Ruska experiments. Mulvey wrote that their subsequent publication (Knoll and Ruska, 1932a) ‘‘laid a sound foundation for the magnetic transmission instrument; the same cannot be said for the electrostatic instrument. In this paper the quality of the images obtained with the electrostatic lens was poor compared with those found with the magnetic lenses. This was mainly due to the use of the ‘Kugelkondensator’ lens . . ., an unsuitable design for electron microscopy’’ (Mulvey, 1962). Even so, Dr. Steenbeck speculates decisively on the direction of Ru¨denberg’s thoughts at that time, suggesting that without experimental evidence of the Busch lens formula, Ru¨denberg’s electron microscope ideas (and the subsequent patent disclosure) must have come from Steenbeck’s report of his visit to the Technische Hochschule. However,
33
This description should be contrasted with C. Avery’s descriptions of the photographs coming from the early Siemens apparatus in the Lu¨bcke laboratory some months later (this volume, p. 239).
Origin and Background of the Invention of the Electron Microscope Commentary
261
Steenbeck himself ‘‘did not grasp at that time that a useful magnification could in practice be achieved, significantly extending and widening the domain accessible in (light) optics’’ and ‘‘never thought of its practical exploitation in an electron microscope, and should probably have dismissed such an idea as pure fantasy; particularly from the instinctive expectation that space charges in the electron beam would exclude sharp image formation’’ (Steenbeck, 1960).34 Neither the discussions during his visit to the Knoll/Ruska laboratory nor the descriptions at the June 4 Cranz Colloquium convinced Dr. Steenbeck that ‘‘images of things could be produced by electrons, essentially surpassing the possibilities of (light) optics were possible.’’ In his memoir, which clearly covers the ideas and experiences that brought him to the electron microscope, and in his other writings, Ru¨denberg does not mention any such report of Dr. Steenbeck to him regarding his visit to Knoll’s laboratory. As mentioned above, during Ru¨denberg’s lifetime, questions about the source of his inspiration for the electron microscope had primarily taken the form of rumors that began in a politically volatile period. A written record of the visit or subsequent report from Siemens, the Technical University, or the Studiengesellchaft might help solve the dilemma of its date and content, but none has been found. Perhaps Ru¨denberg did not recall such a report by Dr. Steenbeck, but more likely he did not consider it an influence on his thinking at that late date on an electron microscope designed primarily with electrostatic lenses. The period in question was nine months after his son had contracted poliomyelitis, a period over which he reports developing his imaging concepts as distinct from Busch’s focusing concepts. Ru¨denberg left only several short comments that help clarify his views about Dr. Knoll’s subsequent lecture at the Cranz Colloquium. Gabor had written, ‘‘Ruska mentioned. . . that Knoll and he had already micrographs at the time when you filed your patents. I am not in a position to check this statement’’ (Gabor, 1946). Only with penciled comment in the margin, Ru¨denberg annotated this letter ‘‘Is not so! After my appl. filed he gave lecture without all that!!’’ In Ru¨denberg’s unpublished ‘‘Statement about the early history of the electron microscope’’ (Ru¨denberg, 1961a), he wrote that the first footnote of the paper ‘‘Das Elektronenmikroskop’’ (Knoll and Ruska, 1932b) states that the paper ‘‘was preliminarily presented in a lecture of June 4, 1931 and in addition relates this to the second paper [Knoll and Ruska, 1932a] above’’. However Ru¨denberg, calling himself ‘‘an eyewitness’’ to the lecture, argues that ‘‘only the main
34
For Steenbeck it was instead from the Bru¨che paper on cathode images eight months later that he ‘‘recognized the fundamentally new possibilities in electron-optical imaging.’’
262
H. Gunther Rudenberg and Paul G. Rudenberg
contents of paper No. 2 [1932a] and not of paper no. 3 [1932b] had been presented’’ (Ru¨denberg, 1961a). According to Steenbeck’s report, Ruska and Knoll had produced magnified images of anode screens made with a magnetic coil and others with a charged wire mesh, and had made calculations proving the Busch lens formula. In contrast, Ru¨denberg had created a fundamental design for higher-than-optical imaging of a living specimen at a distinct object plane, primarily with single and multiple-electrode charged apertured diaphragms and a monochromated beam.35 These important developments of Knoll and Ruska, observed by Steenbeck, and the unique designs which Ru¨denberg disclosed in his early patent applications, appear different enough both in idea and in history for one to reasonably conclude that two (groups of) scientists were on distinctly separate and original paths leading them toward a similar and profoundly important result. With all due respect to the opinions of these pioneers, it seems likely that by the spring of 1931, due to the foundations laid by Hans Busch, Gabor, and others, ideas about electron optical imaging filled the air of Berlin, and these three outstanding scientists each played a part in the birth of the invaluable instrument we have today.
5. BUILDING THE ELECTROSTATIC MICROSCOPE After emigrating from Germany, working as a power engineer in England, and then as a full-time professor in the United States, Ru¨denberg had neither the opportunity nor the funds to begin practical work on an electron microscope. However, this situation changed when Clair L. Farrand invited him in 1945 to assist the Farrand Optical Company, Inc. (FOCI) with the design and development of a new electron microscope. The company had unique skills in the fabrication of precise optical instruments and lenses. C. L. Farrand had read one of Ru¨denberg’s articles and examined his U.S. patents. Shortly after this he asked Dr. Ru¨denberg to consult for FOCI on the microscope, and later after the restitution of his patents, to license the firm under these patents. Farrand also greatly assisted Ru¨denberg in the patent restitution process. The proposal was to build an electrostatic instrument that was to have a smaller size, be especially useful for intermediate magnifications, and be simpler to operate than other models available at the time. It was hoped that this might be more useful for industrial laboratories compared with the rather large and complex electromagnetic instruments that
35
In contrast, as discussed, to what were presented in the German patents of the 1950s.
Origin and Background of the Invention of the Electron Microscope Commentary
263
by this time were available from Siemens, Phillips, RCA, and others (Ru¨denberg, 1960). A small group of scientists and engineers was assembled to develop an electrostatic lens electron microscope at FOCI. The project was directed by Dr. Philip Nolan and included on the staff mathematician Robert Rempfer and physicists Gertrude Fleming Rempfer and Edith Asherman, as well as consulting engineer Ru¨denberg. The construction and development of the prototype instrument took place at Farrand’s main plant in the Bronx, New York. Ru¨denberg provided periodic reports to the team, ranging from an initial analysis of the technical history of developments in electron microscopes in Germany and elsewhere up to that date, to an in-depth analysis of electrostatic microscope design principles and principles of magnetic shielding.36 The Farrand electrostatic lens (Figure 20) was based on the basic threeelectrode apertured diaphragm type described in the Ru¨denberg lens patent but modified to produce a state-of-the-art design (Rudenberg, 1987). Various unipotential and other lenses were systematically studied in the FOCI lab with a protocol designed in part by Ru¨denberg (Rempfer, 2009). Ru¨denberg had also encouraged a design based on his calculations of the ideal hyperbolic lens that he believed would produce a perfect lens field with no aberration. However, it turned out that even these lenses were subject to spherical aberration, thought to be due to the variation in time that electron rays were exposed to lens forces. These lenses also could not be readily constructed. Two of these electrodes must have equipotential hyperboloidal surfaces, which were difficult and expensive to machine at the time. Extending the field across the lens axis required electrically conductive, electron-transparent films over the aperture of the end electrodes. Ru¨denberg’s ideas formed the basis of an extensive publication summarizing the focal properties and aberrations of his hyperbolic lenses
FIGURE 20 Farrand three-electrode electrostatic lens design, as of June 1947. This design was the basis for an excellent laboratory instrument.
36
Many of these reports have been preserved in the H. G. Rudenberg Archive.
264
H. Gunther Rudenberg and Paul G. Rudenberg
FIGURE 21 Farrand photomicrograph of ‘‘Gold-shadowed Pseudomonas’’ published on the cover of Journal of Applied Physics (volume 20, January 1949).
(Ru¨denberg, 1948a). For the Farrand scientific team, calculations of these ‘‘ideal lenses’’ were useful mainly as a point of departure to derive certain aspects of actual low-aberration lens design. The experimental use of the electron-transparent films across the lens aperture, and the analysis of the properties of hypothetical diverging hyperbolic fields, eventually helped produce a design for a divergent three-aperture lens that, combined with a convergent lens, enabled correction of spherical aberration (Fleming, 1953). A factor degrading image resolution appeared during the early days of the project. Ru¨denberg suspected stray magnetic fields were affecting the sharpness of the electron images. He believed that these fields were caused by the considerable ground currents flowing back to generating stations as New York subways moved along nearby tracks, creating weak magnetic fields These could degrade image sharpness unless effectively shielded from the beam. This led to an examination (Ru¨denberg, 1946a) of the efficacy of the magnetic shield surrounding the beam column. Further shielding of the microscope was carried out. Ru¨denberg (1949) also published a detailed analysis of the electron gun for illuminating samples. After the usual early difficulties, the performance of the Farrand electrostatic lenses in 1948 was found to be excellent, and the initial laboratory instrument was quite successful. Photomicrographs from the laboratory model (Figure 21) were presented at the Philadelphia meeting
Origin and Background of the Invention of the Electron Microscope Commentary
265
of the Electron Microscopy Society of America in December 1947, generating substantial interest in the instrument, especially among small industrial enterprises. (G. Rempfer [2009] remembers that these were the best images presented at the meeting, partly because the team had devised a method to eliminate astigmatism in the three-electrode lenses.) By 1948 a ˚ was attained (Reisner, 1989). resolution of about 15 A Reports to C. L. Farrand show that Ru¨denberg remained interested in the theoretical, experimental, and even commercial aspects through the life of the project. By 1952 two initial units of the Farrand microscope had been completed. After a visit to the factory in August 1953, Ru¨denberg reported his reactions to the microscope to Mr. Farrand, with which images of 225,000 magnification with resolution ranging from 50 to 20 ˚ had been taken. His written report mentions image distortion could not A be explained by the external magnetic fields. Experimental analysis suggested this was due to vibrations picked up from the environment, which were magnified by the unusual horizontal column design, even with the extensive rubber insulation of the frame and microscope. Despite this shortcoming, Ru¨denberg agreed with Nolan that the pilot commercial model was ready for display and suggested to Mr. Farrand that the commercial microscope itself be exhibited at the Electron Microscopy Society Meeting in November of 1953 (Ru¨denberg, 1953b). In the end, however, the Farrand electrostatic microscope was not successful commercially—there were few, if any, orders for units. The excessively long duration of the developments and the horizontal column design were deemed to be contributing factors. By the early 1950s the Farrand company was swamped with orders of specialized optical equipment for the Korean War, and eventually the electron microscope project and any further marketing efforts were discontinued. This was discouraging to Ru¨denberg and to the entire scientific team of the project (Rempfer, 2009).
6. RECOGNITION AND CHARACTER Throughout his lifetime, Ru¨denberg received many honors, primarily for his work and developments in electrical engineering or electric power systems. A few of these are noted here. In 1911 Ru¨denberg was awarded the Montefiore Prize by the Institut Montefiore in Belgium, and in 1921 he received the honorary degree of Doctor of Engineering from the Institute of Technology in Karlsruhe. In 1946, Stevens Institute presented him their Medal and Honor Award for ‘‘Notable achievement in the field of electron optics as the inventor of the electron microscope.’’ In 1949 he was awarded the 8th Cedergren Medal, conferred every five years for ‘‘high merits in the Arts and Sciences of Electricity’’ by the Swedish Royal Governors of the Universities of Technology and the professors of the
266
H. Gunther Rudenberg and Paul G. Rudenberg
Department of Electrical Engineering of the Royal Institute of Technology. In 1956 the Technical University Berlin elected Ru¨denberg an Honorary Senator for his dedication to teaching at their faculty. In 1957 the German Federal Government awarded him the Grand Cross of Meritorious Service for his many contributions to the country of his birth.37 The journal Elektrotechnische Zeitschrift commemorated his 75th birthday with an issue listing his many awards and publications (Jacotett et al., 1958). In 1961 Ru¨denberg received the Elliott-Cresson Medal (now part of the Benjamin Franklin Award) from the Franklin Institute in Philadelphia for his work in the performance of electric power systems. Posthumously the Siemens firm named him a Siemens ‘‘pioneer’’ and inaugurated the Reinhold-Ru¨denberg Laboratory for Low Voltage Switching Technology in his honor. Ru¨denberg’s students, colleagues, and friends admired and remembered him for his unusual clarity of thought in his writings, his love of teaching, and his lively personal interest in each of their well-being. Ru¨denberg had an intense work ethic and a wonderfully keen mind, and a talent for cutting through directly to the core of a knotty problem in remarkable clarity. With this he could quickly perceive a likely direction to explore, or even recognize a suitable solution, often before his colleagues who were close at hand. This trait was admired by his friends but occasionally viewed with irritation by some of his detractors. In his memoir, Ru¨denberg has left out all references to the deep emotions that accompanied some of his work and life experiences along the path culminating in his invention of the electron microscope. He tells his story, virtually an autobiography of a major part of his life, with brevity and understatement. Certain unhappy or joyful events are mentioned only in a few words without further description (for example, his young son’s diagnosis with infantile paralysis; the difficult events surrounding his and his family’s departure from their home in Germany, and in contrast, his excitement when he first saw an electron micrograph of a tobacco mosaic virus, and later of a polio virus crystal). Ru¨denberg expressed disappointment but not bitterness about not receiving more recognition for his accomplishments in the electron microscopy field. He, in fact, often included the developments of other workers, even his detractors, in his descriptions of his own work (Ru¨denberg, 1948b, 1959). Ru¨denberg had the reputation for being honest down to minute details.38 Because many details have been lost due to the events in Germany, it is important to consider this reputation in weighing the 37
1957, Große Verdienstkreuz des Verdienstordens der Bundesrepublik Deutschland, ‘‘Pour le Merite.’’ Ru¨denberg once tried unsuccessfully to return funds to his insurance company when a valuable ring of Lily’s, previously claimed to the company as lost, was later found (Angelika Rudenberg Howard, 1983).
38
Origin and Background of the Invention of the Electron Microscope Commentary
267
words of his memoir. In 1947 Judge Charles Wyzanski summarized Ru¨denberg’s character after extensive testimony in the proceeding to reclaim ownership of his two U.S. patents on the electron microscope, which best describes to an independent observer Ru¨denberg’s nature and his meticulous presentation of the events surrounding his ideas and work: From the moment he (Reinhold Ru¨denberg) took the stand it was evident to all in the courtroom that the plaintiff adhered to the highest traditions of the scientific profession of which he is admittedly an outstanding member. He consistently showed the deepest concern for the truth. Each of his statements was made with scrupulous accuracy—nothing was underlined or omitted out of self-interest or other extraneous considerations (Wyzanski, 1947a, p. 2). This commitment of Reinhold Ru¨denberg to science, to invention, to fairness, and to the truth should always be considered in any attempt to understand his unique contributions to the early history of this important invention.
ACKNOWLEDGMENTS We are most grateful to all those who assisted with the development and improvement of this commentary: Ms. Alexandra Kinter, Siemens Corporate Archives; Dr. Angela Clark (granddaughter of R. Ru¨denberg); Ms. Kathleen Duff; Dr. Elizabeth Rudenberg (grandaughter); Dr. Gertrude Rempfer; Dr. Douglas J. Taatjes; Dr. Charles Lyman; Rev. Volker Schnu¨ll; Dr. David Finkelhor; Mr. Jim Rudenberg (grandson); Staff of the Harvard University Archives; Mr. Walter Hickey and Staff, National Archives and Records Administration, Waltham, Massachusetts; and Staff of the National Archives and Records Administration, College Park, Maryland. Special thanks to Marguerite, Nathaniel and Peter Rudenberg.
REFERENCES* Avery, C. (1947). Testimony in Rudenberg vs. Tom Clark, Attorney General. Civil Action No. 3873. Box 1469. National Archives and Records Administration, Waltham, MA. Avery, C. (1964). Letter to Lily Ru¨denberg, re: Electron microscope patent, May 27, 1964.* Bru¨che, E. (1930). Strahlen langsamer Elektronen und ihre technische Anwendung. In ‘‘Forschung und Technik,’’ (W. Petersen, ed.), p. 29. Springer, Berlin. Busch, H. (1922). Eine neue Methode zur e/m-Betstimmung. Physik. Zeitschr. 23, 438–441. Busch, H. (1926). Berechnung der Bahn von Kathodenstrahlen im axialsymmetrischen elektromagnetischen Felde. Ann. Physik 386, 974–993. ¨ ber die Wirkungsweise der Konzentrierungsspule bei der Braunschen Busch, H. (1927). U Ro¨hre. Arch. Elektrotech. 18, 583–594. Busch, H. (1936). Grundlagen und Entwicklung der Elektronenoptik. Z. Techn. Phys. 17, 584–588. Cohen, I. B. (1948). Letter to R. Ru¨denberg,. April 12, 1948.* Davisson, C. J., and Calbick, C. J. (1931). Electron lenses. Phys. Rev. 38, 585. *The noted documents are found in the H. G. Rudenberg Archive, Scarborough, Maine. Requests regarding them may be made to Paul Rudenberg, 259 Foreside Road, Falmouth, Maine, 04105.
268
H. Gunther Rudenberg and Paul G. Rudenberg
Davisson, C. J., and Calbick, C. J. (1932). Electron lenses. Phys. Rev. 42, 580. [Error in 1931 abstract corrected]. Eisenzapf, K. (1953a). Letter to R. Ru¨denberg. July 20, 1953.* Eisenzapf, K. (1953b). Das in den Einspruchen gegen die Anmeldung B. 48483 genannte material, July 18, 1953.* Fleming, G. (1953). U.S. patent 2740919. Electron lens. Filed June 25, 1953, published April 3, 1956. Fleming, G., Rempfer, R., Asherman, E., and Nolan, P. (1947). Recent electronmicrographs obtained with a new electrostatic electron microscope. Proceedings of the Electron Microscope Society of America Meeting, December 11–13, 1947. Freundlich, M. M. (1994). The history of the development of the first high-resolution electron microscope. MSA Bull. 24, 405–415. Gabor, D. (1942). Electron optics. Electron. Eng. 15, 295–299, 328–331, 337, 372–374. [Originally given as a lecture, delivered October 31, 1942 at the Midlands Branch of the Institute of Physics]. Gabor, D. (1946). Letter to R. Ru¨denberg, June 21, 1946.* Gabor, D. (1947). Letter to R. Ru¨denberg, December 23, 1947.* Gabor, D. (1953). The History of the Development of Electron Microscopes. Lecture at the Bournemouth Meeting of the Institute of Physics, May 29, 1953. Gray, T. (1929). Development of a Hot-Cathode High-Voltage Cathode-Ray Oscillograph. MS thesis, Massachusetts Institute of Technology Library Archives. George, R. H. (1929). New type of hot kathode-ray oscillograph. J. Am. Inst. Elec. Eng. 49, 534–538. Goetzeler, H., and Feldtkeller, E., et al. (eds.), (1994). Pionere der Wissenschaft bei Siemens: beruflicher Werdegang und wichtigste Ergebnisse. Publicus MCD. Verlag, Erlangen. Grover, H. G. (1948). Letter to attorney Willis Taylor (cc), April 30, 1948.* Grover, H. G. (1945). Letter to attorney Willis Taylor (cc), March 26, 1945.* Hawkes, P. (1985). The beginnings of electron microscopy. Adv. Elec. Electron Phys. Suppl. 16, 602–608. [The letters were circulated, but only published after both authors had died]. Howard, Angelika (1985). Memories. Unpublished (ca 1985).* Jacottet, P., and Strigel, R. (1958). Reinhold Ru¨denberg zum 75 Geburtstag. Elektrotechnische Zeitschrift A 79, 97–100. ¨ ber angewandte Mathematik und Physik in ihrer BedeuKlein, F., and Riecke, E. (1900). U tung fu¨r den Unterricht anden ho¨heren Schulen. [On applied mathematics and physics and their importance for higher education]. B. G. Teubner. Leipzig. Knight Brothers Firm (1935). Statement regarding patent application from June 28, 1935. Patent wrapper of U.S. application 613857. National Archives and Records Administration [NARA], College Park, MD. Knoll, M. (1929). Vorrichtung zur Konzentrierung des Elektronenstrahls eines Kathodenstrahloszillographen. German patent 690809. Filed 1929, published 1940. Knoll, M. (1960a). Letter to Prof. Erwin Weise, March 7, 1960. Borasky Collection, Smithsonian Institution, Washington, DC. Knoll, M. (1960b). Letter to Max Steenbeck. Adv. Elec. Electron Phys. Suppl. 16, 602–608. Knoll, M., Houtermans, F. G., and Schulze, W. (1932). Untersuchung der Emissionaverteilung an Gluhkathoden mit dem magnetischen Electronenmikroskop. [First presented by von F. Houtermans, February 14, 1932, for the Tagung des Gauvereins Niedersachsen der Phys. Ges]. Z. Physik 78, 340. Knoll, M., and Ruska, E. (1932a). Beitrag zur geometrischen Elektronenoptik. Ann. Physik V 12, 607–640, 641–701. Knoll, M., and Ruska, E. (1932b). Das Elektronenmikroskop. Z. Phys. 78, 318–339. Ku¨pfmu¨ller, K. (1944). Zur Geschichte des Elektronenmikroskops. Physikalische Zeitschrift 45, 47–51.
Origin and Background of the Invention of the Electron Microscope Commentary
269
Lin Qing (1995). Zur Fru¨hgeschichte des Elektronenmikroskops. GNT-Verlag, Stuttgart. Lu¨bcke, E. (1964). Personal communication with Lily Ru¨denberg. July 1964.* Mu¨ller, F. (2008). The birth of a modern instrument and its development during World War II. In ‘‘Scientific Research in World War II,’’ (A. Maas and H. Hooijmaijers, eds.), pp. 121–146. London and New York. Mulvey, T. (1962). Origins and historical development of the electron microscope. Br. J. Appl. Phys. 13, 197–207. Mulvey, T. (1967). The history of the electron microscope. In ‘‘Historical Aspects of Microscopy,’’ (G. Bradbury and G. Turner, eds.), pp. 201–227. Heffer, Cambridge. Pyenson, L. (1979). Physics in the Shadow of Mathematics: The Go¨ttingen Electron-Theory Seminar of 1905. Arch. Hist. Exact Sci. 21, 55–89. Reisner, J. H. (1989). An early history of the electron microscope in the United States. Adv. Elec. Electron Phys. 73, 133–231. Rempfer, G. (1993). Electrostatic electron microscopy in the 1940s and today. MSA Bull. 23, 153–158. Rempfer, G. (2009). Personal Communication with Paul G. Rudenberg, August 2009. Rudenberg, H. G. (1986). Impressions of the family experience of the diagnosis of polio. Unpublished (ca. 1986).* Rudenberg, H. G. (1987). The Farrand microscope years. Unpublished manuscript.* Ru¨denberg, Lily (1983). Historiches. Unpublished.* Rudenberg, H. G., and Rudenberg, F. H. (1994). Reinhold Ru¨denberg as a physicist—his contributions and patents on the electron microscope, traced back to the ‘‘Go¨ttingen Electron Group.’’ MSA Bull. 24, 572–580. Ru¨denberg, R. (1906). Energie der Wirbelstro¨me in elektrischen Bremsen und Dynamomaschinen [Eddy Current Formation]. Dissertation, Dr. Ing. at TU Hannover. F. Enke, Stuttgart. Ru¨denberg, R. (1907). Go¨ttingen course notes. Reinhold Rudenberg papers. Harvard University Archives Pusey Library, HUG 4753. Ru¨denberg, R. (1924). Oszillograph zur Aufnahme schnell veraenderlicher Erscheinungen. German patent 429926 of 1924, published 1926. Ru¨denberg, R. (1925). U.S. patent 1,695,719. Oscillograph. Filed December 1, 1925. In Germany, 1924, published 1928. Ru¨denberg, R. (1926). U.S. patent 1,937,846. Thermionic tube. Filed April 12, 1926. In Germany, 1925, published December 5, 1933. Ru¨denberg, R. (1931a). German patent 719112. Elektronenroehre mit einen Teil der Roehrenwandung bildender. Anode. Filed January 20, 1931, published March 30, 1942. Ru¨denberg, R. (1931b). German patent 889660. Anordnung zur Beeinflussung des Verlaufs von Elektronenstrahlen durch elektrisch geladene Feldblenden. Filed May 31, 1931, published 1953. [Siemens application 48476]. Ru¨denberg, R. (1931c). German patent 906737. Anordnung zum vergro¨ßerten Abbildung von Gegensta¨nden mittels Elektronenstrahlen. Filed May 31, 1931, published in Germany March 18, 1954. [Siemens application 48483]. Ru¨denberg, R. (1931d). German patent 895635. Anordnung zur vergro¨ßerten Abbildung von Gegensta¨nden mittels Elektronenstrahlen und mittels den Gang der Elektronenstrahlen beeinflussender elektrostatischer oder elektromagnetischer Felder. Filed May 30, 1931, published November 5, 1953 [derived from Siemens application 48483]. Ru¨denberg, R. (1931e). German patent 916838. Einrichtung zum vergro¨ßerten Abbilden von Gegensta¨nden durch Elektronenstrahlen. Filed June 27, 1931, published August 19, 1954. [Siemens application 48804]. Ru¨denberg, R. (1931f). German patent 911996. Einrichtung zum Abbilden von Gegensta¨nden. Filed June 28, 1931, published May 24, 1954. [Siemens application 48806].
270
H. Gunther Rudenberg and Paul G. Rudenberg
Ru¨denberg, R. (1931g). Swiss patent 165549. Einrichtung zum Abbilden von Gegensta¨nden. Filed 17 May, 1932. In Germany, May 30, June 27, June 28, 1931, and March 30, 1932, published April 2, 1934. [Siemens application numbers 48483, 48804, 48806, 51733, 51858]. Ru¨denberg, R. (1931h). French patent 737716. Dispositif pur obtenir des images d’objets. Filed May 27, 1932, published December 15, 1932. [Siemens application numbers 48483, 48804, 48806, 51733, 51858]. Ru¨denberg, R. (1931i). U.S. Patent application 613,857. Apparatus for producing images of objects. May 27, 1932. NARA, College Park, MD. [Siemens application numbers 48483, 48804, 48806, 51733, 51858]. Ru¨denberg, R. (1931j). Patent wrapper of U.S. patent application 613,857. Apparatus for producing images of objects. Filed May 27, 1932-October 27, 1936, NARA, College Park, MD. Ru¨denberg, R. (1931k). U.S. patent 2,058,914. Apparatus for the imaging of objects. Filed May 27, 1932, published Oct. 27, 1936. In Germany, May 30, June 26 and 27, 1931, and March 30, 1932. [Siemens application numbers 48483, 48804, 48806, 51733, 51858]. Ru¨denberg, R. (1931l). Austrian patent 137611. Einrichtung zum Abbilden von Gegensta¨nden. Filed May 30, 1932. In Germany, May 30, June 26 and 27, 1931, and March 30, 1932, published May 25, 1934. [Siemens application numbers 48483, 48804, 48806, 51733, 51858]. Ru¨denberg, R. (1931m). Netherlands patent 43263. Inrichtung voor het vergroot afbeelden met behulp van een electronenmicroscoop met electrostatiche lenzen. Filed May 30, 1932. In Germany, May 30, 1931, published June 15, 1938. [Siemens application 48483 (48476)]. Ru¨denberg, R. (1931n). U.S. Patent application 615,260. Apparatus for influencing the character of electron rays. Filed June 3, 1932, NARA, College Park, MD. In Germany, May 30, 1931. [Siemens application number 48476]. Ru¨denberg, R. (1931o). U.S. Patent 2,070,319. Apparatus for influencing the character of electron rays. Filed June 3, 1932. In Germany, May 30, 1931, published February 9, 1937. [Siemens application number 48476]. Ru¨denberg, R. (1932). Elektronenmikroskop. Naturwissenschaften 20, 522. Ru¨denberg, R. (1936). Verzeichnis eigener Patente (List of own patents to 1936). Harvard University Archives, Pusey Library, HUG 4756. Ru¨denberg, R. (1938). Letter to Carl v. Siemens. 1938.* Ru¨denberg, R. (1943). The early history of the electron microscope. J. Appl. Phys. 14, 434. Ru¨denberg, R. (1945a). Origin and background of the invention of the electron microscope. Draft of July 3, 1945.* Ru¨denberg, R. (1945b). Notes on my U.S. patents. nos. 2058914 and 2070319 as of May 30, 1931 (ca. 1945).* Ru¨denberg, R. (1946a). Stray field shielding for electron microscopes, its effects and defects.. Internal report to C. Farrand, 1946.* Ru¨denberg, R. (1946b). Letter to Willis Taylor,. December 6, 1946 (cc).* Ru¨denberg, R. (1947). Testimony in Rudenberg vs. Tom Clark. Attorney General (1947). Civil Action No. 3873. NARA, Waltham, MA, Box 1469. Ru¨denberg, R. (1948a). Electron lenses of hyperbolic field structure. J. Franklin Inst. 246, 311–339, 377–408. Ru¨denberg, R. (1948b). The Story of the Electron Microscope. Lecture at the Centennial Celebration of the Scientific Schools at Harvard University, February 13, 1948. Papers of Reinhold Ru¨denberg, 1901–1964. Harvard University Archives, HUG 4756. Ru¨denberg, R. (1949). The electron gun for illuminating samples. Internal report. Farrand Optical Company, Inc. Ru¨denberg, R. (1950). Multiple element electrostatic lenses. according to U.S. patent 2,070,319. Unpublished.* Ru¨denberg, R. (1953a). Letter to Eisenzapf, September 1953 (cc).* Ru¨denberg, R. (1953b). Letter and report to C. L. Farrand,. September 2, 1953 (cc).* Ru¨denberg, R. (1953c). Letter to D. Gabor, October 13, 1953 (cc).*
Origin and Background of the Invention of the Electron Microscope Commentary
271
Ru¨denberg, R. (1953d). Letter to D. Gabor,. November 13, 1954 (cc).* Ru¨denberg, R. (1954). El Mi patent list. Unpublished.* Ru¨denberg, R. (1959). Origin and Background of the Invention of the Electron Microscope. Papers of Reinhold Ru¨denberg, 1901–1964. Harvard University Archives, HUG 4756.31. Ru¨denberg, R. (1960). Personal communication to H. G. Rudenberg ca. 1960. Ru¨denberg, R. (1961a). Statement about the early history of the electron microscope. Unpublished.* Ru¨denberg, R. (1961b). Letter to Dr. Zane Price, December 1961(cc)* [NOTE: The letter was drafted by Ru¨denberg and typed and sent posthumously by his son]. Ruska, E. (1986). The emergence of the electron microscope. J. Ultrast. Mol. Struct. Res. 95, 3–28. Ruska, E., and Knoll, M. (1931). Die magnetische Sammelspule fu¨r schnelle Elektronenstrahlen. Z. Techn. Physik 12, 389–399, 448. Ruska, E., and von Borries, Bodo (1932). German patent 680284. Magnetische Sammellinse kurzer Feldlaenge, Filed, March 16, 1932, published August 25, 1939. Ruska, E., and Jaentsch, O. (1942). Mikroskop mit elektrostatischen Linsen fu¨r Elektronen und Ionenstrahlen. German patent 911,058, filed 1942, published May 10, 1954. [Also published as Microscopes avec lentilles e´lectrostatiques pour rayons e´lectroniques et ioniques. French patent 901,934, published August 9, 1945]. Schoen, L. (1985). Ru¨denberg, Reinhold. Neue Deutsche Biographie. Bavarian Academy of Sciences and Humanities. Duncker & Humblot Berlin 22, 210–212. Siemens, and Halske, A. G. (1932). German patent DE 909,156. Einrichtung zur Erzeugung von Gefu¨gebildern mit Elektronenstrahlen Zur Herstellung von Gefu¨gebildern der Oberfla¨che kristalliner Haufwerke. Filed on May 15, 1932, published 1954. Siemens Patent Department (date unknown). List of Rudenberg patents. Siemens Corporate Archives, Munich. Siemens, U. (1963). Letter to Lily Ru¨denberg. December 13, 1963.* Steenbeck, M. (1960). Letter to M. Knoll, November 8, 1960. Adv. Elec. Electron Phys. Suppl. 16, 602–608. Trendelenburg, F. (1975). Aus d. Geschichte d. Forschung im Hause Siemens. [From the history of research at Siemens]. VDI Verlag Nr.31, Du¨sseldorf. von Siemens, C. F. (1933). Statement given in honor of R. Ru¨denberg’s 25 years at Siemens, October, 14, 1933, Munich.* Weigend, P. (1946). Deposition June 15 and 24, 1946, in New York. Civil Action 3873. NARA, Waltham, MA. Weiher, S. von (1976). Ru¨denberg, Reinhold. Dictionary of Scientific Biography. Charles Scribner, New York, 11, 588–589. Weichert, E. (1896). Die Theorie der Elektrodynamik und die Ro¨ntgensche Entdeckung. ¨ kon. [Electrodynamic Theory and the Roentgen Discovery] Schriften der Phys. O Gesellschaft Zu Ko¨nigsberg 37, 1–48. Wiechert, E. (1897). I. Ueber das Wesen der Elektricita¨t. II. Experimentelles u¨ber die Kathodenstrahlen. Schriften der physikalisch-o¨konomischen ( January 1897, issued 1898). Gesellschaft zu Ko¨nigsberg 38, 3–16. Wyzanski, C. (1947a). Opinion, June 24, 1947 in Civil Action 3873. Reinhold Rudenberg v. Tom C. Clark, Attorney General. District Court of Massachusetts. No. 72 Federal Supplement, 381. Wyzanski, C. (1947b). Notice of defendant to hold record open, May 7, 1947, in Civil Action 3873. Reinhold Rudenberg v. Tom C. Clark, Attorney General. District Court of Massachusetts. NARA, Waltham, MA, Box 1469. Wyzanski, C. (1947c). Stipulation of facts, May, 1947 in Civil Action 3873. Reinhold Rudenberg v. Tom C. Clark, Attorney General. District Court of Massachusetts. NARA, Waltham, MA, Box 1469. Wyzanski, C. (1961). Letter to Dr. Jacob Fine, June 16, 1961.*
272
H. Gunther Rudenberg and Paul G. Rudenberg
APPENDIX 1. U.S. PATENT APPLICATION 613857 Apparatus For Producing Images Of Objects1
Reinhold Ru¨denberg, Berlin-Grunewald, Germany Assor. To Siemens–Schuckertwerke Aktiengesellschaft of Berlin-Siemensstadt, Germany, a Corp. of Germany Application filed, complete, May 27, 1932 In Germany, May 30, 19312 My invention relates to apparatus for producing images of objects. Owing to the fact that in field-less spaces electron rays travel, similar to light rays, in straight paths, an object impervious to the rays, when placed in a divergent bundle or beam of rays, produces on a fluorescent screen an enlarged shadow. In cathode ray oscillographs, electro-magnetic fields surrounding the beam concentrically and causing a stricture of the beam are often employed. These fields have an effect on the ray similar to that of an optical lens on a ray of light they can make a beam of rays become convergent or divergent. Consequently, with such stricture fields an enlarged or reduced luminous image of the cathode ray3 is, according to circumstances, also obtained on the fluorescent screen. This enlargement or magnification of shadows has hitherto been regarded only as a secondary phenomenon of minor technical importance of the electron rays. The reduction has in the past only been employed to obtain a sharply defined luminous spot. (focus) According to my invention, the effect, similar to that of lenses in optics, produced by fields of force surrounding concentrically a beam of electrons and exerting a radial influence on same, is utilized for enlarging objects in a manner corresponding to that obtained with optical magnifying glasses and microscopes. For this purpose the object to be enlarged is exposed to an electron ray or a beam of rays, and before or behind the object, the beam is by means of lens-like acting fields made convergent or divergent. In figures 1 to 5 of the drawings the principle of the invention is explained.4
1
National Archives and Records Administration, College Park, Maryland. U.S. application 613,857, filed on May 27, 1932, combined five German patent applications: three dated May 30, June 26, and June 27, 1931, and two dated March 30, 1932. By convention, the U.S. application was given the first priority date. Only the part of U.S. application 613,857 determined to correspond to the Siemens application no. 48483, which was submitted to the German patent office on May 30, 1931, is reprinted here. 3 Changed to ‘‘cathode ray beam’’, January 1933. 4 Sentence omitted here which refers to figures 6 to 11, derived from the later German patent applications. 2
Origin and Background of the Invention of the Electron Microscope Commentary
273
Figure 1 shows a magnet stricture coil, the breath5 of which is small compared with the length of the path of the beam of rays. The rays are pressed by the magnetic field towards the axis of the coil, coinciding with the axis of the rays6, so that the rays meet in a focal spot or a focal line. In the case of homogeneous rays they meet in one focal point. The same effect can according to the present invention also be obtained by means of electrostatically charged stops or diaphragms disposed in the main symmetrically round the direction of the rays. In Fig. 2 a stop of that kind is illustrated. The electrostatic field is represented by its lines of force. The charge on the stop a is assumed to be negative. When the electrons of a beam or bundle of cathode rays pass through the stop they are repelled by the same. They are, therefore, deflected from their original paths, which are assumed to be parallel, towards the center and are assembled to a convergent beam meeting in the focal point o, beyond which they then diverge. As the radial component of the field strength of the stop is zero in its axis and increases outwardly in linear proportion to the distance, the degree of deflection of the electron rays also increases with the distance from the axis of the rays and the stop. Hereby all the rays are made to meet in the same focal spot. To obtain with sufficient accuracy this proportional relation of the radial field strength to the distance from the axis, it is advisable to make the opening of the stop considerably larger than the width of the original rays7, or by choosing a special shape for the electrodes, to give the field the suitable shape. If the potential of the diaphragm is assumed to be positive instead of negative, the electron rays will be attracted by the stop. The beam of rays which till then were moving in parallel paths then becomes divergent, as shown in Fig. 3. The negative stop, therefore, has the same effect as a convex lens in optics, and the positive stop has that of a concave lens. By combining stops of that kind, all the devices known to optics and based on converging or diverging beams can be imitated for electron rays. It is in this manner possible, for example, to make a microscope or a telescope for use with direct or reflected electron rays. Magnifiers, microscopes and telescopes constructed in this manner enable observations to be made which are impossible with optical investigations and also permit, regarding the order of magnitude, a considerably greater enlargement to be obtained than is possible with optical instruments, whose resolving power is limited by the wavelength of light. This limitation does not exist with magnifiers operating with electron rays (as the wavelength of these rays is several orders of magnitude smaller than that of light).8
5
Corrected to ‘‘breadth’’, January 1933. From the German ‘‘Strahlenachse’’, Changed to ‘‘axis of the beam’’, January 1933. Changed to ‘‘beam’’, January 1933. 8 Note, the phrase in parentheses is found in the foreign patents (Netherlands, Switzerland, Austria, United States) but not found in the two German microscope patents with priority date May 1931, published in 1953/4. 6 7
274
H. Gunther Rudenberg and Paul G. Rudenberg
As the deflection of the electron rays depends upon their velocity, a sharply defined focus is only obtained when operating with homogeneous cathode rays. It is therefore advisable to make the beam homogeneous before exposing to its rays the object to be magnified to the beam. The homogeneity can be obtained in the known manner, for example by making the rays pass through several velocity stops having the same potential. According to my invention, the homogeneity can also be obtained by utilizing the lens-effect of radially acting magnetic or electrostatic fields. An example for this is illustrated in Fig. 4. From the cathode K a diverging and non-homogeneous bundle of rays is discharged. This bundle is made to pass through the electrostatic stop a1, and so directed as to be almost parallel. The charged stop a2 concentrates the bundle of rays. At the focal spot o1, corresponding to the rays of the bundle having the desired velocity, an aperture stop c is placed. The stop c may be charged or be made to have the potential zero. As the bundle of rays also contains rays having different velocities, the focal point would not be sharply defined, if the stop c were omitted. The individual focal points of the rays9 having different velocities would be distributed along a long distance of the axis of the rays. The narrow aperture stop c, however, keeps back all the rays whose focal points are otherwise situated tham at o. Consequently the bundle of rays passing through the stop c is divergent and contains only, or for the greater part, rays having a definite velocity. The divergent and homogeneous bundle of rays leaving the stop c is made parallel by another stop a3. Behind the stop a3 the object to be magnified is placed in the path of the rays and is enlarged by means of a fourth charged stop a4, as shown in Fig. 2. In the example illustrated in Fig. 4, a separate stop is for clearness’ sake employed every time the beam is influenced. It is, however, possible to reduce the number of stops, or, according to circumstances, to increase it. Behind the stop a4, further stops may be placed which together with the stop a4, having the effect of a magnifying glass, produce a magnification similar to that obtained with a microscope or a telescope. An example of this kind of arrangement is shown in Fig. 5. The homogeneous parallel beam or bundle of electron rays penetrates the body d to be examined and then passes through a series of divergence stops b1, b2, b3, b4. The first stop b increases the crosssectional area of the beam to a certain value, the second diaphragm b2 acts on a small part of the increased area of the beam and re-increases same, etc. In this manner a cascade-like magnification is obtained. Assuming that the stops all have the same coefficient of magnification, the magnification obtained with n stops is equal to the coefficient of magnification to the nth power. With this kind of cascade arrangement large magnifications can be obtained, without it being necessary for the field of the individual stops to have an inconveniently high field intensity. Similar
9
Changed to ‘‘beam’’, January 1933.
Origin and Background of the Invention of the Electron Microscope Commentary
275
results can also be obtained with convergence stops or with arrangements containing both convergence and divergence stops. The object d to be magnified may consist, for example, of a thin layer or film to be observed, which, whilst weaking10 the electron rays more or less, allows them to pass through it. Arrangements according to the invention are also applicable in which the object to be examined is itself a source of electron rays, the latter being either directly produced rays or reflected rays or secondary rays. In that case it is possible with the use of charged diaphragms to obtain images corresponding to the real or virtual images in optics. According to the invention, microscopes and telescopes based on such images can also be constructed.11 I claim as my invention: 1. Apparatus for producing images of objects to any desired scale, comprising a source producing a beam of electron rays, means for producing a field influencing the beam of electron rays in a manner corresponding to that in which light rays are influenced by optical lenses, means for receiving the image produced by the influenced beam. 2. In combination with an apparatus according to claim 1, at least one magnet coil surrounding the beam substantially symmetrically. 3. In combination with an apparatus according to claim 1, at least one electrically charged diaphragm surrounding the beam substantially symmetrically. 4. In combination with an apparatus according to claim 1, at least one device to make the beam parallel. 5. In combination with an apparatus according to claim 1, a field making the beam parallel, said field being located between the source of electrons and the object of which the image is to be produced. 6. In combination with an apparatus according to claim 1, a plurality of means for producing fields influencing the beam radially, said means being arranged the one behind the other and causing a cascade-like increase of the magnification12. 7. In an apparatus according to claim 1, a source of electrons consisting of a hot cathode. 8. Apparatus according to claim 1, in which the source of the electrons is itself the object of which the image is to be produced.
10
Corrected to ‘‘weakening’’ January 1933. ‘‘According to the invention. . .’’ The two German patents of 1953 and 1954 that are derived from the German application 48483 of May 30, 1931, each end with this same sentence. The U.S. application 613,857, which includes four later German patent applications, continues with five more pages of text before the patent claims. 12 After this claim, the corresponding French, Swiss, and Austrian patents list the following claim: ‘‘Apparatus according to claim 1, that the enlargement causative fields are produced by electrostatically charged diaphragms that surround the beam essentially symmetrically.’’ In the U.S. application and patent, this specificity is found only in the figures and the text. 11
276
H. Gunther Rudenberg and Paul G. Rudenberg
9. In combination with an apparatus according to claim 1, means for producing fields influencing the beam radially, said means being arranged the one after the other and so dimensioned that real or virtual images are produced in a manner corresponding to that in which such images are produced in optical microscopes and telescopes.13
13
The Austrian patent 137611 lists this as the final claim with the German priority date of May 30, 1931. The U.S. application 613,857 lists 14 additional claims that appear to correspond to the four later German patent applications.
Origin and Background of the Invention of the Electron Microscope Commentary
277
APPENDIX 2. U.S. PATENT APPLICATION 6152601 Apparatus for Influencing the Character of Electron Rays
Reinhold Ru¨denberg, Berlin-Grunewald, Germany Assor. To Siemens–Schuckertwerke Aktiengesellschaft Siemensstadt, Germany, a Corp. of Germany
of
Berlin-
Application filed, complete June 3, 1932 In Germany May 30, 1931 My invention relates to improvements in apparatus for influencing the character of electron rays. It is known to be possible to control by means of electrical and magnetic fields the course in space of electron rays, as produced and employed for example, in cathode ray tubes according to Fig. 1. As an example, in Fig. 1, deflecting plates a1 and a2 are shown which can deflect the rays in two different directions and through the control of which, therefore, any kind of figure can be traced on the screen. The present invention relates to an arrangement by the means of which not so much the path of the electron ray as its character can be influenced. For this purpose according to the invention electrostatically controlled diaphragms disposed substantially in symmetrical arrangements around the direction of the beam and making the electron ray convergent, divergent, or parallel, are employed. In Fig. 1 diaphragms of that kind are indicated by b1 and b2. The charged diaphragms have, contrary to the known magnetic influence through stricture fields, the advantage of having very much smaller time constants for their action and therefore of following correctly as to quantity exceedingly rapid phenomena or occurrences. Through the variable convergence of divergence of the individual paths of the electron beam brought about by the diaphragms, the width of the beam can be altered. It can be made narrower or broader and, therefore, its radial extension be increased or diminished. The homogeneity of the velocity of the electrons in the beam can also be increased. By such means, on the one hand, much more clearly defined light spots than hitherto obtainable can be produced on the screen s, and, on the other hand, the light spots can be blurred to a certain degree. It is furthermore possible to regulate the intensity of the beam and the light within wide limits and one is able to control all these operations with practically any desired speed from the outside through the application of variable voltages.
1
National Archives and Record Administration, College Park, Maryland.
278
H. Gunther Rudenberg and Paul G. Rudenberg
In Fig. 2, the electrostatic field of such a charged diaphragm is represented by its lines of force. The charge is, for example, assumed to be negative. When the electrons of a cathode ray stream through the diaphragm, they are repelled by it and have the tendency to move toward the axis. They are, therefore, deflected from their original paths, which were assumed to be parallel, towards the axis and made to meet in a convergent pencil or bundle in a focal point o, after which the beam becomes divergent. As the radial component of the field strength of the diaphragm is zero in its axis and at first increases outwards linearly, the individual electrons will be all the more deflected from their paths the greater their distance from the axis of the beam and of the diaphragm. Hereby all the rays are made to meet in the same focal point o. To obtain with sufficient accuracy this proportionality of the radial field strength to the distance from the axis, it is advisable to make the opening considerably larger than the width of the original beam. This arrangement may be used to define more sharply the spot of light on the screen of Fig. 1. It is with same also possible to increase the brightness of the spot of light, for the pencil of electron rays can now be obtained not only by cutting out, by means of a diaphragm, the center parallel part of a bundle of rays diverging from the cathode k in all directions, but also by employing a diaphragm of the said kind with a suitably wide opening and a correctly chosen voltage and placing it relatively close to the cathode, where it collects the rays and makes them parallel or convergent, as shown in Fig. 3. By this means an exceedingly greater brightness of light is obtained on the screen and the cathode beam is much better utilized. If the potential of the diaphragm is made positive instead of negative, the individual electrons of the beam will be drawn toward the diaphragm. The parallel beam then becomes a diverging beam as shown in Fig. 4. Whilst the negative diaphragm has the effect of a convex lens in optics, the positive diaphragm has the effect of a diverging lens. By combining such diaphragms, all the instruments known in optics and based on converging and diverging beams can be imitated for electron rays and thereby a series of interesting and useful results be obtained. It is in this manner possible, for example, to build a microscope for use with direct or reflected electron rays, and with which a much greater magnification can be obtained than with our optical microscopes, limited in this respect by the wave length of light. According to the invention, said possibilities of influencing a bundle or beam of electron rays similarly to the way light rays are acted upon by optical lenses, are made use of in the following manner to influence the bundle of rays in accordance with the phenomenon to be reproduced. Close to a focal point, an aperture diaphragm is placed which only permits rays of a certain velocity to pass. An arrangement of that kind is illustrated in Fig. 5. As the deflection of each individual electron depends upon its velocity, only rays of the same velocity meet in the focal point o.
Origin and Background of the Invention of the Electron Microscope Commentary
279
For each velocity occurring in a non-homogeneous beam of electrons, a different focal point is obtained. For example, for a beam having a range of velocities from v1 to v2, focal points are obtained which according to Fig. 5 are located between 01 and 02. If at a spot o within the said focal line, an aperture diaphragm b2, without a charge or made to have the potential zero, is placed, only rays having the definite velocity v will be able to pass through the aperture whereas the others will strike against the aperture diaphragm and be kept back by the same. The charge thereby imparted to the aperture diaphragm can easily be led away. Behind the diaphragm, a homogeneous beam of to a great extent uniform velocity is obtained. If this beam is now deflected from its direction in space by deflecting plates a1 and a2 or by magnetic fields, the spot of light will in its movements always retain the same sharpness owing to the deflecting fields being able to deflect all parts of the beam only the same angle of deflection, depending upon their field strength and upon the uniform velocity of the rays. By varying the negative strength of the controllable diaphragm b1, the velocity of the beam passing through the aperture diaphragm b2 can be varied at will. It can, therefore, easily be adjusted to a value, at which the velocities occurring in the beam can be utilized to best advantage. Should it be desired to make the beam leaving the diaphragm in a diverging form parallel again, all that is required is to place a second negatively charged convergence diaphragm b3 at the same distance from the focal point, and to impart to it the same potential, as b1. The diaphragm b3 must therefore be capable of being adjusted and controlled simultaneously with diaphragm b1. In a similar manner the intensity of the beam may be varied. Let us assume that, in Fig. 5, the beam coming from the left has an approximately homogeneous velocity, then it will only be able to pass through the opening of the aperture diaphragm b2 when the controlling diaphragm b1 has a quite definite voltage. At every other voltage of the diaphragm b1, large quantities of the beam will strike against the aperture diaphragm itself and cannot pass through its opening. It is, therefore, possible by controlling the voltage, to vary to a great extent the intensity of the beam passing through the diaphragm. By choosing the opening of the diaphragm b2, suitably both with respect to size and shape, the change in intensity of the beam passing through it can be made to be completely or approximately proportional to the variation of the voltage of the diaphragm b1. The intensity of the beam can, therefore, be controlled in a simple manner and at any desired velocity. As the intensity of the impression of the light on the luminous screen is not exactly proportional to the intensity of the beam, and as the effect on the eye or on the photographic plate or on any other receiver is not proportional to the light intensity of the luminous screen, all these relations can be taken into account in choosing the shape of the diaphragm o, in order to obtain a proportional or other functional relation between the last impression and the controlling voltage of the diaphragm b1.
280
H. Gunther Rudenberg and Paul G. Rudenberg
In Fig. 6 is illustrated in which manner the various effects of the electrostatic diaphragms may be combined. From the cathode k, a divergent beam is emitted, which passes through the electrostatic diaphragm b1 and is made almost parallel. The charged diaphragm b2 concentrates the beam and throws through the opening in the uncharged aperture diaphragm b3 only the rays having the desired velocity. Under circumstances b1 and b2 can be joined into one. Onto the diaphragm b4, a homogeneous divergent beam falls, which through the pulsating charge is passed to the diaphragm b5, where only the variations in intensity pass through the opening. The charged diaphragm b6 concentrates onto the luminous screen s the divergent beam coming from b5. The diaphragm b6 maybe controlled in dependence on b4, in order again to concentrate on the luminous screen s also the rays partly screened off and having a different focal point outside of b5. By means of the deflecting plates a1 and a2, the homogeneous beam varying in its intensity can now be easily and accurately moved in the direction of the various coordinates. Regarding the shape to be given to the electrostatic diaphragm, there are various possibilities producing differing results. The simplest form is that of the disk-diaphragm according to Fig. 7 with a round opening. Another shape is represented by a circular ring, as in Fig. 8, also producing an axially symmetrical field, which is proportional to the distance from the axis. By combining the ring with the disk-diaphragm, as in Fig. 9, the two being similarly or contrarily charged or only having a difference in voltage, the course of the field can be varied in the effective deflecting zone. A similar variation is also obtained by means of a combination of a circular ring with a surrounding differently charged casing as in Fig. 10. To obtain a sharp lens- or focus- effect, forms of electrodes or diaphragms with edges forming such curves are employed that their effect upon the electron beam passing through them only takes place within a narrowly limited zone in that within that range the radial field strength is as proportional as possible to the distance from the axis, whilst the actual field strength to the right and left of the diaphragm is symmetrical in its course, in order that the additional axial acceleration and retardation of the electrons cancel each other in their effect. By means of the described arrangements, it is possible to obtain clear and sharp images on the luminous screen s, said images giving, when the fields of the diaphragms and the deflecting fields are suitably adjusted and controlled, the impression of stationary moving pictures. I claim as my invention: 1. Arrangement for influencing the character of electron rays by means of electrostatically charged diaphragms disposed substantially symmetrically round the direction of the rays and whose potential is controlled by a phenomenon to be produced, characterized by the placing, close to a focal
Origin and Background of the Invention of the Electron Microscope Commentary
2. 3.
4. 5.
6.
7. 8.
9.
281
point produced by a controllable diaphragm, of an aperture diaphragm charged or maintained at the potential zero with a small opening which substantially only permits electron rays of a definite velocity corresponding to the focal point in the opening of the diaphragm to pass. Arrangement according to claim 1, characterized by the placing behind the aperture diaphragm of a further charged diaphragm which at this place makes divergent beams become convergent or parallel. Arrangement according to claim 2, characterized by the charged diaphragm located behind the aperture diaphragm being connected to a varying potential depending upon the potential of the controlled diaphragm located on the opposite side of the aperture diaphragm. Arrangement according to claim 3, characterized by the controllable diaphragm having the same potential and being at the same distance from the uncharged aperture diaphragm. Arrangement according to claims 3 and 4, characterized by the deflecting plates being placed behind (see in the direction from the cathode) the controllable diaphragm located at both sides of the aperture diaphragm and acting upon the cathode ray-beam varying in intensity and made homogeneous. Arrangement according to claim 1, characterized by the charged diaphragm being so formed and dimensioned that the radial component of its field strength increases completely or approximately in proportion to the distance from the axis of the diaphragm and of the electron beam. Arrangement according to claim 6 characterized by the opening in the charged diaphragm being considerably larger than, for example more than twice the size of, the sectional area of the beam. Arrangement for influencing electron rays by means of electrostatically charged diaphragms surrounding the beam substantially concentrically, particularly according to claim 1, characterized by the charged diaphragms for obtaining a desired course of the field being composed of a plurality of parts to which potentials differing from one another can be imparted. Arrangement for influencing electron rays by means of electrostatically charged diaphragms surrounding the beam substantially concentrically, particularly according to claim 1, characterized by the employment of with respect to the electron beam positively charged diaphragms which make the bundle of rays divergent.
282
H. Gunther Rudenberg and Paul G. Rudenberg
Origin and Background of the Invention of the Electron Microscope Commentary
283
APPENDIX 3 R. Ru¨denberg: Patent Applications and Patents on the Electron Microscope
Siemens No. a
48476
Date Appl. Filed
Priority Date
Patent Country and No.
No change
May 30, 1931
May 31, 1931
Germany 889660
64721
January 1936
May 31, 1931
Germany 758391
64723
January 1936
May 31, 1931
Germany 754259
64722
May 30, 1931
NA
None granted
48476 - Foreign submissions
May 30, 1932
May 30, 1931
England 402781
Jun 3, 1932 Dec 16, 1932
May 30, 1931 May 30, 1931
United States 2,070,319 France 737816
New Siemens No.
Published Title Anordnung zum Beeinflussen des Verlaufs von Elektronenstrahlen durch elektrisch geladene Feldblenden Anordnung zur Beeinflussen des konvergenten Verlaufes von Elektronenstrahlendurch plattenoder ringfo¨rmige elektrisch geladene Feldblenden Anordnung zur Steuerung der Intensita¨t der durch eine Lochblende hindurchtretenden Elektronenstrahlen Anordnung zum Beeinflussen des Charakters des konvergenten oder divergenten Verlaufs von Elektronenstrahlen durch elektrostatisch geladene Feldblenden Improvements in or relating to cathode ray tubes Apparatus for influencing the character of electron rays Disposition pour influencer la nature de rayons e´lectroniques
English Translation of Title Arrangement for influencing the course of electron rays by electrically charged apertures Arrangement for influencing the course of convergent electron rays with plateor ring-shaped electrostatically charged apertures Arrangement to control the intensity of electron rays passing through through an aperture Arrangement for influencing the nature of the convergent or divergent course of electron rays by electrostatically charged apertures
Date Application Displayed
Date Patent Published
March 7, 1940 and Dec 18, 1952
Sep 14, 1953
May 17, 1939
Feb 26, 1942
Aug 31, 1944
NA
NA
Accepted Nov 30, 1933 Feb 9, 1937 Apparatus for influencing the character of electron rays
Oct 10, 1932
Dec 16, 1932
48483b
48483 (Continued)
No change
May 30, 1931
May 31, 1931
Germany 895635
Anordnung zur vergro¨ßerten Abbildung von Gegensta¨nden mittels Elektronenstrahlen und mittels den Gang der Elektronenstrahlen beeinflussender elektrostatischer oder elektromagnetischer Felder
69881
May 30, 1931
May 31, 1931*
Germany 906737
Anordnung zum vergro¨ßerten Abbildung von Gegensta¨nden mittels Elektronenstrahlen
Foreign submission 48483 only, after split **
May 30, 1932
May 30, 1931
Netherlands 43263
Inrichting voor het vergroot afbeelden met behulp van een electronenmicroscoop met electrostatiche lenzen
Apparatus for enlarged imaging, using an electron microscope with electrostatic lenses
Foreign submissions (of 48483 combined with 48804, 48806, 52556, 51733
May 27, 1932 May 17, 1932 May 30, 1932 May 27, 1932
Combined
France 737716
Combined
Apparatus for obtaining images of objects Apparatus for the imaging of objects Apparatus for the imaging of objects
May 30, 1931
Switzerland 165549 Austria 137611 United States 2,058,914
Dispositif pour obtenir des images d’objets Einrichtung zum Abbilden von Gegensta¨nden Einrichtung zum Abbilden von Gegensta¨nden Apparatus for producing images of objects Einrichtung zum vergro¨ßerten Abbilden von Gegensta¨nden durch Elektronenstrahlen Einrichtung zum Abbilden von Gegensta¨nden
Apparatus for the enlarged imaging of objects through electron rays Apparatus for the imaging of objects
Combined
48804
No change
Jun 26, 1931
Jun 27, 1931*
Germany 916838
48806
No change
Jun 27, 1931
Jun 28, 1931
Germany 911996
Arrangement for the enlarged imaging of objects by means of electron rays and influencing the course of the electron rays through electrostatic or electromagnetic fields Arrangement for the enlarged imaging of objects by means of electron rays
March 31, 1938 and Feb. 12, 1953
Nov 5, 1953
Jun 25, 1953
Mar 18, 1954
Jan 15, 1938
Oct 10, 1932
Dec 15, 1932 Apr 2, 1934
May 25, 1934
Jan 15, 1934 Oct 27, 1936
Oct 1, 1953
Jul 8, 1954
March 31, 1938 and May 7, 1953
Apr 8, 1954
(continued)
(continued)
a
Date Appl. Filed
Priority Date
Patent Country and No.
Siemens No.
New Siemens No.
51733
No change
Mar 31, 1932
Mar 31, 1932*
Germany 916839
51858
No change
Mar 31, 1932
Mar 31, 1932*
Germany 916841
52556
No change
Aug 13, 1932
Aug 13, 1932*
Germany 915253
Published Title Einrichtung zum vergro¨ßerten Abbilden von Gegensta¨nden durch Elektronenstrahlen Einrichtung zum vergro¨ßerten Abbilden von Gegensta¨nden durch Elektronenstrahlen (Elektronenmikroskop) Anordnung zum Beeinflussen des Charakters von Elektronenstrahlen durch elektrostatisch aufgeladene Doppelblenden
Original title of application 48476: Anordnung zum Beeinflussen des Charakters von Elektronenstrahlen Original title of application 48483: Enrichtung zum Abbilden von Gegensta¨nden * Five additional years were added to the license period of this patent. ** See also Netherlands patents No.49378, 57032. b
Date Application Displayed
Date Patent Published
Apparatus for the enlarged imaging of objects through electron rays Apparatus for the enlarged imaging of objects through electron rays
Nov 12, 1953
Jul 8, 1954
Oct 29, 1953
Jul 8, 1954
Arrangement for influencing the character of electron rays through electrostatically charged double diaphragms
Sept 24, 1953
Jun 10, 1954
English Translation of Title
Contents of Volumes 151–159
VOLUME 1511 C. Bontus and T. Ko¨hler, Reconstruction algorithms for computed tomography L. Busin, N. Vandenbroucke, and L. Macaire, Color spaces and image segmentation G. R. Easley and F. Colonna, Generalized discrete Radon transforms and applications to image processing T. Radlicˇka, Lie agebraic methods in charged particle optics V. Randle, Recent developments in electron backscatter diffraction
VOLUME 152 N. S. T. Hirata, Stack filters: from definition to design algorithms S. A. Khan, The Foldy–Wouthuysen transformation technique in optics S. Morfu, P. Marquie´, B. Nofie´le´, and D. Ginhac, Nonlinear systems for image processing T. Nitta, Complex-valued neural network and complex-valued backpropagation learning algorithm J. Bobin, J.-L. Starck, Y. Moudden, and M. J. Fadili, Blind source separation: the sparsity revoloution R. L. Withers, ‘‘Disorder’’: structured diffuse scattering and local crystal chemistry
VOLUME 153 Aberration-corrected Electron Microscopy H. Rose, History of direct aberration correction M. Haider, H. Mu¨ller, and S. Uhlemann, Present and future hexapole aberration correctors for high-resolution electron microscopy
1 Lists of the contents of volumes 100–149 are to be found in volume 150; the entire series can be searched on ScienceDirect.com
287
288
Contents of Volumes 151–159
O. L. Krivanek, N. Dellby, R. J. Kyse, M. F. Murfitt, C. S. Own, and Z. S. Szilagyi, Advances in aberration-corrected scanning transmission electron microscopy and electron energy-loss spectroscopy P. E. Batson, First results using the Nion third-order scanning transmission electron microscope corrector A. L. Bleloch, Scanning transmission electron microscopy and electron energy loss spectroscopy: mapping materials atom by atom F. Houdellier, M. Hy¨tch, F. Hu¨e, and E. Snoeck, Aberration correction with the SACTEM-Toulouse: from imaging to diffraction B. Kabius and H. Rose, Novel aberration correction concepts A. I. Kirkland, P. D. Nellist, L.-Y. Chang, and S. J. Haigh, Aberration-corrected imaging in conventional transmission electron microscopy and scanning transmission electron microscopy S. J. Pennycook, M. F. Chisholm, A. R. Lupini, M. Varela, K. van Benthem, A. Y. Borisevich, M. P. Oxley, W. Luo, and S. T. Pantelides, Materials applications of aberration-corrected scanning transmission electron microscopy N. Tanaka, Spherical aberration-corrected transmission electron microscopy for nanomaterials K. Urban, L. Houben, C.-L. Jia, M. Lentzen, S.-B. Mi, A. Thust, and K. Tillmann, Atomic-resolution aberration-corrected transmission electron microscopy Y. Zhu and J. Wall, Aberration-corrected electron microscopes at Brookhaven National Laboratory
VOLUME 154 H. F. Harmuth and B. Meffert, Dirac’s difference equation and the physics of finite differences
VOLUME 155 D. Greenfield and M. Monastyrskiy, Selected problems of computational charged particle optics
VOLUME 156 V. Argyriou and M. Petrou, Photometric stereo: an overview F. Brackx, N. de Schepper, and F. Sommen, The Fourier transform in Clifford analysis N. de Jonge, Carbon nanotube electron sources for electron microscopes E. Recami and M. Zamboni-Rached, Localized waves: a review
Contents of Volumes 151–159
289
VOLUME 157 M. I. Yavor, Optics of charged particle analyzers
VOLUME 158 P. Dombi, Surface plasmon-enhanced photoemission and electron acceleration with ultrashort laser pulses B. J. Ford, Did physics matter to the pioneers of microscopy? J. Gilles, Image decomposition: Theory, numerical schemes, and performance evaluation S. Svensson, The reverse fuzzy distance transform and its use when studying the shape of macromolecules from cryo-electron tomographic data M. van Droogenbroeck, Anchors of morphological operators and algebraic openings D. Yang, S. Kumar, and H. Wang, Temporal filtering technique using time lenses for optical transmission systems
VOLUME 159 Cold Field Emission and the Scanning Transmission Electron Microscope A. V. Crewe, The work of Albert Victor Crewe on the scanning transmission electron microscope and related topics L. W. Swanson and G. A. Schwind, A review of the cold-field electron cathode Joseph S. Wall, Martha N. Simon, and James F. Hainfeld, History of the STEM at Brookhaven National Laboratory Hiromi Inada, Hiroshi Kakibayashi, Shigeto Isakozawa, Takahito Hashimoto, Toshie Yaguchi, and Kuniyasu Nakamura, Hitachi’s development of cold-field emission scanning transmission electron microscopes P. W. Hawkes, Two commercial STEMs: The Siemens ST100F and the AEI STEM-1 Ian R. M. Wardell and Peter E. Bovey, A history of Vacuum Generators’ 100-kV scanning transmission electron microscope H. S. von Harrach, Development of the 300-kV Vacuum Generator STEM (1985–1996) Bernard Jouffrey, On the high-voltage STEM project in Toulouse (MEBATH) Andreas Engel, Scanning transmission electron microscopy: biological applications K. C. A. Smith, STEM at Cambridge University: reminiscences and reflections from the 1950s and 1960s
This page intentionally left blank
Index
A Airborne visible and infrared imaging spectrometer (AVIRIS) correlation coefficient, 123–124 endmember screening linear correlation coefficients, 127 scaling operation, 126–127 spectral curves, 127–128 final endmember, 120 lattice auto-associative memories, 122–123 linear unmixing techniques characteristics, 135 complexity, 135–136 final endmember, 120 W & M algorithm, 135 Moffett Field site, 124–126 nonnegative least squares (NNLS) algorithm, 129–130 pixel spectra, computational effort, 133–134 processing times, 133, 134 USGS source image map, 121–122 Alternating sequential filter (ASF), 41 Aperture diaphragm, 278–279, 282 Area close-open (ACO), 41 Area open-close (AOC), 41 Auto-associative memories, 118 B Berkeley segmentation dataset, 65 Blinking behavior, fluorescent agents, 83 Bounded lattice-ordered group, 116 C Cathode ray oscillographs, 272 Cathode ray tube (CRT), 11, 277, 282 amplifier triode, 175 Busch’s analysis, 177 electron beam, 175 electrons properties, 177–178 parts, 173 schematics, 176 spiral ray paths, 173
thermionic electron emission, 174 Color area morphology scale spaces area openings and closings connected operators, 38 grey-scale images, 39 structuring elements, 39–40 color connected filters color area morphology scale-spaces, 44–48 convex color sieve, 48–50 vector area morphology open-close sieve, 57–60 vector area morphology sieve, 50–57 Gaussian filters, 36 image segmentation Berkeley segmentation dataset, 63–65 grey-scale sieve, 67–68 Koala test image, 65 lily test image, 64 maximum F-measures, 66–68 P-R curves, 64–65 segmentation performance, 66–68 tree-based image representation, 63 implementation and timings color sieve algorithm, 61–62 GS-AOC sieve, 62–63 pixel-queue algorithm, 61 marginal ordering, 37 morphological operations ACO and AOC sieves, 42 scale-space, 42–43 noise-corrupted images images segmentation, 68 impulsive and Gaussian noise, 68–69 vectorial approach, 70 Color connected filters color area morphology scale-spaces color extrema, 45–47 distance metrics, 47–48 convex color sieve CCS extrema, 50 convex hull, 48–49 vector area morphology open-close sieve flat zone, 58
291
292
Index
Color connected filters (cont.) lily test image, 58 maxima (black) and minima (white), 58–59 VAMOCS extrema, 57–58 vector area morphology sieve definition, 51 parameter evaluation, 53–57 Color management module (CMM), 13 Convex color sieve (CCS), 37 D Disk-diaphragm, 280, 283 Diverging beam, 278, 282 E Electro-magnetic fields, 272 Electron microscope, 284–286 apertured diaphragm, 185 electromagnetic field, 185 electron lenses, 186–187 fluorescent screen, 189 general principle, 191 image formation, 192 infantile paralysis poliomyelitis, 181 instrument cathode ray oscilloscopes, 196 electron lens, 197 magnetic lens, 196–197 opaque materials, 197 virus image, 197–198 invention Nazi regime, 195 Siemens-Schuckertwerke, 194 inventions, 172 lens-forming fields, 191 optical microscopes, 183 Ru¨denberg design engineer, 179 lecturer, 180 schooling, 178–179 working, 180–181 schematic, 188 Electron rays, 273 Electrostatically charged diaphragms, 278, 280–282 Electrostatic field, 278 F First-in first-out (FIFO), 61
G Gamut mapping color science CIELAB color space, 7–8 color and image appearance, 8–9 color spaces, 5–6 human vision, 5 RGB and XYZ color spaces, 6–7 image quality measures comparing models, 24–25 Laplacian mean square error, 22–23 mean square error, 22 measure QDE, 22 measure QDLC, 23 structural similarity index, 23–24 image reproduction process color management, 12–14 visual image perception model, 9–12 optimization problem constraints and continuity, 27 fixed mappings, 26 global image dependency, 26 local image dependency, 26–27 objective functions and constraints, 27–31 parametric mappings, 27 psychometrics data evaluation, 16–21 test methods, 14–15 Gaussian noise, 68–69 Grey-scale area morphology, 44 Grey-scale area open-close (GS-AOC), 60 H Harmonic holography microscopy. See also Second harmonic generation (SHG) basic principle, 93–95 coherent nonlinear optical process, 102 description, 79–80 3D imaging capabilities, 77, 78 far-field technique, 76–77 fluorescence microscopy, 79 framing time, 77 holography, 77–78 4D imaging, 75–76 vs. direct imaging, 97–100 3D microscopic cell imaging acquisition time, 105 bright-field transmission image, 103
Index
intensity profiles, 104–105 optical sectioning, two-photon LSM, 102–103 reconstructed image, 104 sample preparation, 102 experimental setup, 95–96 multiplexed imaging, 106 nanocrystal clusters, 96–98 particle size, 105 surface functionalization and bioconjugation, 105–106 optical harmonic coherence, 93, 94 performance estimation point spread function, 100–101 speckle noise, 101 reconstruction algorithm, 96–97 signal to noise ratio (SNR) direct vs. H2 imaging, 100 gain, 98–99 shot noise, 97–98 ultrafast 4D contrast imaging, 106–107 Hyperspectral imagery, lattice algebra approach affinely independent theorem proofs, 158–167 application examples mineralogical and chemical composition, 120–124 Moffett field site, 124–136 autonomous endmember detection, 114–115 endmembers, 114 F(X) geometry canonical basis vector, 136 constructive theorem, 138–145 convex polytope and vertices, 147 directional vectors, 137 directions and orientations, 137–139 extreme points, 145 hyperboxes and lattice correlation memories, 146 linear subspaces, 136 lattice-associative matrix memories (LAMs), 115 endmembers, 119–120 properties, 118–119 lattice correlation memories (W and M) affine-independent sets, 154–157 X and F(X), 145–148 lattice theory bounded lattice-ordered group, 116
293
lattice dependent and independent vector, 117 linear minimax combination, 117 pointwise maximum and minimum, 117 scalar addition, 116–117 linear mixing model, 114 linear unmixing techniques (see Linear unmixing techniques) pixel spectrum, 113–114 spectral resolution, 113–114 I ICC color management system (ICC-CMS), 13 Image color appearance model (iCAM), 9 Image reproduction process color management, 12–14 visual image perception model, 9–10 appearance model, 11 device characterization, 11 perceived image difference measure, 11–12 Impulsive noise, 68–69 Infantile paralysis, 181–182 International color consortium (ICC), 3 Inventions, microscopes and telescopes, 275–276 J Jablon´ski energy diagram, 81, 82 Just-noticeable differences (JNDs), 16 K Koala test image, 65 L Laplacian mean square error (LMSE), 22 Lattice algebra approach, hyperspectral imagery affinely independent theorem proofs, 158–167 application examples mineralogical and chemical composition, 120–124 Moffett field site, 124–136 autonomous endmember detection, 114–115 endmembers, 114 F(X) geometry
294
Index
Lattice algebra approach, hyperspectral imagery (cont.) canonical basis vector, 136 constructive theorem, 138–145 convex polytope and vertices, 147 directional vectors, 137 directions and orientations, 137–139 extreme points, 145 hyperboxes and lattice correlation memories, 146 linear subspaces, 136 lattice-associative matrix memories (LAMs), 115 endmembers, 119–120 properties, 118–119 lattice correlation memory (W and M) affine-independent sets, 154–157 X and F(X), 145–148 lattice theory bounded lattice-ordered group, 116 lattice dependent and independent vector, 117 linear minimax combination, 117 pointwise maximum and minimum, 117 scalar addition, 116–117 linear mixing model, 114 linear unmixing techniques (see Linear unmixing techniques) pixel spectrum, 113–114 spectral resolution, 113–114 Lattice-associative matrix memories (LAMs). See also Linear unmixing techniques computational complexity, 133–134 endmembers, 119–120 properties, 118–119 Lily test image, 58, 64 Linear minimax sums, 117 Linear unmixing techniques affine independence sets, 148–154 characteristics, 135 complexity, 135–136 final endmember, 120 VCA and MVES approach performance, 134 W & M algorithm, 135–136 Liquid crystal display (LCD), 9 M Magnetic focusing coil, 186 Magnet stricture coil, 273 Mean chromaticity error (MCRE), 53
Minimum-volume enclosing simplex (MVES), 134–136 Moffett field site, AVIRIS abundance maps fresh water distribution, 130 golf courses, 132 man-made resources, 129, 132 suspended sediment concentration, 133 vegetation distribution, 131 wet soil and wet vegetation, 131 automate endmember screening, 127 final endmember spectral curves, 127–128 gray-scale images, 124–126 hyperspectral image analysis, 133–134 linear spectral unmixing (see Linear unmixing techniques) nonnegative least squares (NNLS) algorithm, 129–130 N Nanoscale materials emission power, 86–88 high-numerical aperture (NA) optics, 89–93 scattering cross-section, 85–86, 88–89 SHG scattering, 85–86 Neutral electrostatic lenses, 193 Nonnegative least squares (NNLS) algorithm, 129–130 Normalized mean square error (NMSE), 53 O Optical and electron magnification, 193 Optical microscopes, 183 Optimization problem, gamut mapping constraints and continuity, 27 fixed mappings, 26 global image dependency, 26 local image dependency, 26–27 objective functions and constraints perceived distance minimizing, 29–30 perceptual distance pixel-wise, 28–29 spatial gamut-mapping algorithms, 30–31 parametric mappings, 27 P Patent applications, Ru¨denberg Alien Property Custodian, 247–248
Index
Apparatus for the Imaging of Objects, 251 cathode-ray oscillographs, 252 controversy, 255, 257–262 electrostatic lens application, 251 enlarged shadow, 253 invention process Anordnung zum Beeinflussen des Charakters von Elektronenstrahlen, 235 combined microscope patent application, 236 Einrichtung zum Abbilden von Gegenstanden, 235 specification and claims, German patent application, 234–235 magnet coil, 254 magnetic pole piece lens, 249 Nazi persecution, 249 negatively and positively charged electrostatic lenses, 254 object enlarged image, electron rays, 252 Perfect recall memories, 119 Pixel-queue algorithm, 61 Precision-recall (P-R) curves, 64 Profile connection space (PCS), 13 Psychometrics data evaluation analytic error estimation, 20 conjoint analysis, 18–19 experimental error analysis, 21 just-noticeable differences, 18 testing model assumptions, 21 Thurstone’s law, 16–17 test methods media and environment, 15 pair comparisons, 14–15 rank order and category judgment, 15 R Ru¨denberg, Reinhold childhood and education advanced electrodynamics course, 212 Hannover Technical University, 211 Ru¨hmkorff induction coil, 211 electron microscope (see Electron microscope) electrostatic microscope Farrand electrostatic lens, 263 Farrand Optical Company, Inc. (FOCI), 262–263
295
focal properties and aberrations, hyperbolic lenses, 263 gold-shadowed Pseudomonas, Farrand photomicrograph, 264 family, 215–216 invention process diversified electron density, 226 early work, Siemens, 236–244 electro-magnetic fields of force, 231 electron beam-emitting electron tube, 222 electron lens, 222, 224–226, 228 electrostatically charged metal diaphragm, 225, 229 enlarged shadows, 226 Farrand microscope, 227 fieldless spaces, 220 high resolution and high magnification, microscope, 232 high-voltage oscillograph, 222 infection and latest ideas, polio disease, 218–219 iron-clad coils, 228 leaving Germany, 244–247 magnetic field, 229 magnetic stricture coil, 223 microscope, schematic representation, 230 neutral lens designs, 227 object, minute pattern, 232 Ru¨hmkorff induction coil, 220 sharply defined luminous spot, 222 silhouette images, 220, 224 treatment, polio cases, 218 velocity filtering, electron beam, 229–230 patent applications Alien Property Custodian, 247–248 Anordnung zum Beeinflussen des Charakters von Elektronenstrahlen, 235 Apparatus for Imaging of Objects, 251 cathode-ray oscillographs, 252 combined microscope patent application, Austria, 236 controversy, 255, 257–262 Einrichtung zum Abbilden von Gegenstanden, 235 electrostatic lens application, 251 enlarged shadow, 253 magnet coil, 254 magnetic pole piece lens, 249
296
Index
Ru¨denberg, Reinhold (cont.) Nazi persecution, 249 negatively and positively charged electrostatic lenses, 254 object enlarged image, electron rays, 252 specification and claims, German patent application, 234–235 polio, 216–218 professional career chief electrical engineer, 214 design engineer, alternating current equipment division, 214 director, Wissenschaftliche Abteilung, 214 Honorarprofessor, 215 recognition and character, 265–267 S Second harmonic generation (SHG). See also Harmonic holography microscopy bulk materials, 83, 85 contrast imaging, 106 coupled wave equations, 85 electron energy transition, 81–82 frequency doubling, 81–82 intersystem crossing, 82 materials, 81 nanoprobes, 107 nanoscale materials emission power, 86–88 high-numerical aperture (NA) optics, 89–93 scattering cross section, 85–86, 88–89 phase-matching condition, 83, 85 phenomenon, 80 vs. two-photon fluorescence, 84 biomedical imaging application, 82–83 electron energy transition, 81–82 Second harmonic radiative imaging probes (SHRIMP) BaTiO3 nanoparticles, 102 bright-field transmission image, 103–104 3D distribution, 102–103 definition, 102
reconstructed images, 104–105 Siemens-Schuckertwerke frame, 194 Singular value decomposition (SVD), 134–135 Standard red-green-blue (sRGB), 2–3 Structural similarity (SSIM) index, 23 T Thurstone’s method, 17 Two-dimensional vector image, 49 Two-photon fluorescence blinking behaviour, 83 chemical reactivity, 82 contrast agents, biomedical application, 82–83 excitation band, 83 Jablon´ski energy diagram, 81, 82 markers, 83 photobleaching and phototoxicity, 82 quantum dots (QDs), 83 vibrational relaxation, 82 U United States Geographical Survey (USGS) spectrometer, Cuprite site grey-scale image, 121 minerals abundance maps, 124, 125 distribution, 121 spectral curves, 122–124 wavelength scale, 122 V Vector area morphology open-close sieve (VAMOCS), 44 Vector area morphology sieve (VAMS), 37 definition, 51 parameter evaluation aggregate distance images, 53–54 Gaussian noise, 55 Maxwell triangle, 54 MCRE and NMSE, 54–56 merged regions, 56 non-extreme region, 56 Vertex component analysis (VCA), 134–136. See also Linear unmixing techniques
Zofia Baran´czuk et al., Figure 1 Demonstration of in-gamut colors for typical printing gamuts. Colors not in gamut are left white: Original sRGB (a), photo paper (b), coated offset paper (c), and newspaper (d).
Zofia Baran´czuk et al., Figure 2 The effect of different gamut mapping strategies: original (a), linear compression (b), and clipping (c).
Zofia Baran´czuk et al., Figure 3 Typical printer gamuts (colored objects) compared with a standard sRGB gamut (wire frame: newspaper gamut [left], coated offset paper [middle], inkjet printer [right].
(a) 1,0
3,0
8,7
2,0
2,0
2,0
10.1
12.0
28.5
4.0
4.0
4.0
8,–4
2,–1
1,–1
2,0
2,0
2,0
21.2
10.1
14.5
4.0
4.0
4.0 4.0
3,0
1,–1
2,0
2,0
2,0
2,0
10.9
5.7
4.0
4.0
4.0
1,–1
2,–1
2,0
2,0
–6,–6
2,0
2.2
3.0
4.0
4.0
33.6
4.0
1,–1
2,–1
2,0
–6,–5 –9,–8 1,–1
2.2
3.0
4.0
33.4
33.7
15.0
1,–1
2,0
2,0
3,0
2,0
2,0
2.2
4.0
4.0
14.5
11.9
11.9
2,–1
2,0
2,0
1,–1
1,0
–4,–3
2.0
4.0
4.0
4.7
7.8
12.5
(b) 1,0
3,0
3,0
2,0
2,0
2,0
4.0
3.3
3.3
3.5
3.5
3.5
3,0
2,–1
1,–1
2,0
2,0
2,0
3.9
4.8
6.1
3.5
3.5
3.5 3.5
3,0
1,–1
2,0
2,0
2,0
2,0
3.9
5.7
3.5
3.5
3.5
1,–1
2,–1
2,0
2,0
–6,–6
2,0
2.2
3.0
3.5
3.5
24.8
3.5
1,–1
2,–1
2,0
–6,–5 –6,–6 1,–1
2.2
3.0
3.5
30.2
24.8
11.4
1,–1
2,0
2,0
3,0
2,0
2,0
2.2
3.5
3.5
14.5
7.2
7.2
2,–1
2,0
2,0
1,–1
1,0
1,0
2.0
3.5
3.5
4.7
1.5
1.5
(c) 3,0
3,0
3,0
2,0
2,0
2,0
1.9
1.9
1.9
3.5
3.5
3.5
3,0
2,–1
2,–1
2,0
2,0
2,0
1.9
3.6
3.6
3.5
3.5
3.5 3.5
3,0
1,–1
2,0
2,0
2,0
2,0
1.9
5.7
3.5
3.5
3.5
1,–1
2,–1
2,0
2,0
–6,–6
2,0
2.2
3.0
3.5
3.5
26.5
3.5
1,–1
2,–1
2,0
–6,–6 –6,–6 1,–1
2.2
3.0
3.5
26.5
26.5
11.4
1,–1
2,0
2,0
3,0
2,0
2,0
2.2
3.5
3.5
15.1
7.2
7.2
2,–1
2,0
2,0
1,–1
1,0
1,0
2.0
3.5
3.5
4.7
1.5
1.5
(d) 3,0
3,0
3,0
2,0
2,0
2,0
1.8
1.8
1.8
3.5
3.5
3.5
3,0
2,–1
2,–1
2,0
2,0
2,0
1.8
2.7
2.7
3.5
3.5
3.5 3.5
3,0
2,–1
2,0
2,0
2,0
2,0
1.8
2.7
3.5
3.5
3.5
1,–1
2,–1
2,0
2,0
–6,–6
2,0
2.2
2.7
3.5
3.5
26.5
3.5
1,–1
2,–1
2,0
–6,–6 –6,–6 1,–1
2.2
2.7
3.5
26.5
26.5
11.4
1,–1
2,0
2,0
3,0
2,0
2,0
2.2
3.5
3.5
15.1
7.2
7.2
2,–1
2,0
2,0
1,–1
1,0
1,0
2.0
3.5
3.5
4.7
1.5
1.5
Adrian N. Evans, Figure 5 VAMS operation using 4-neighbor connectivity and the L2 norm. (a) Original vector image with flat zones of area > 1 identified using color and corresponding scalar image, with extrema colored red. (b)-(d) Corresponding vector and scalar images as extrema of area 2 are merged.
(a)
Original image and collection of human segmentations
(b)
CCS results for impulsive noise (left) and Gaussian noise (right)
(c)
VAMS results for impulsive noise (left) and Gaussian noise (right)
(d) VAMOCS results for impulsive noise (left) and Gaussian noise (right)
Adrian N. Evans, Figure 19 Segmentation results for image corrupted by 10% impulsive noise and Gaussian noise (s2 ¼ 103). Segmentation results for noise-corrupted images are shown in blue, noise-free segmentation results are shown in yellow, and overlaps are shown in black.